I did some more thinking about my holographic filesystem ideas and some reading on coding theory and I realized that most of the complexity of the system actually comes from the desire to make it work for read-write. If we design it more narrowly for read-only media like DVD-R discs, and tie less closely to the operating system's filesystem code, the result can be much more compatible and keep most of the advantages. The new idea is that instead of being a full filesystem, it would come as two utilities: one that appends extra redundant blocks to an existing filesystem image so that it will be more recoverable in the future, and another that performs the recovery. If no blocks in the main image are lost, then the filesystem can be read normally by software that doesn't know about the recovery scheme, just ignoring the extra data at the end.
My initial plan had been that reliable blocks (RBs) should be arranged in smallish groups, and then some type of coding theory magic should map them into slightly larger smallish groups of unreliable blocks (URBs). For instance, you can map two RBs onto three URBs such that you can lose any one URB and still recover the two original RBs. The way that works is that you store the two RBs intact into the first two URBs, and then the third URB stores the parity (XOR sum) of the first two. If you lose that one, fine, you still have the two originals; if you lose one of the originals, you XOR the other into the parity block and recover the lost one. That scheme generalizes: you can store n RBs in n+1 URBs by using the last URB for parity of all the others, and then if you lose any one URB you can recover it from the others. This is the scheme RAID 3, 4, and 5 use (with varying decisions about how to lay out the blocks across the drives). RAID 6 uses two parity blocks, but they're identical: the idea is to provide backup while you're busy replacing a failed drive. RAID 6 with two discs for parity cannot necessarily recover from losing two blocks in the set at once; it only provides a better safety margin for recovering from two close-together single-block failures.
If you use bigger groups of blocks it's possible to do better. In particular, if you want to be able to recover from loss of two URBs really at once, then it appears the smallest group you can use is one based on the Hamming (7,4) code, where you add three parity blocks to four RBs to fill up seven URBs. The seven URBs store RBs A, B, C, D, parity of A+B+D, parity of B+C+D, and parity of A+C+D. Note that that way every RB gets included in at least three URBs, so if you lose any two URBs you're guaranteed to still have one containing each RB, and if you do the math it's possible to prove that with the loss of any two URBs you can still XOR together some combination of the others to recover each RB.
These kinds of setups are called erasure codes. Note that they are not the same as the more popular kind of error-correcting code, which correct bit-flip errors. If you read the blocks under one of these schemes and they come back corrupted and you don't know it, you can lose data. Bit flipping is not the usual failure mode I see on media; the drive always gives you the block correctly if it gives you the block at all, but every so often it throws up its heads and says it can't produce the block at all. Note that RAID also doesn't protect against bit-flipping; it just uses the simple parity erasure code on however many drives.
Now, here are a few things. People have been studying erasure codes in great detail for a long time and there's a lot of mathematical theory for designing and using them. However, the problems that people study in relation to erasure codes are actually a lot harder than the ones that come up if I just want to read my old archive discs. Cutting-edge erasure code work is on topics like "What if a whole lot, like 30%, of the blocks go bad?"; "If I want to read just one RB, what is the absolute minimum number of URBs I have to read, on average, to get it?"; and "What's the minimum number of logic gates I have to fit on my ASIC chip to decode this in hardware?" None of those are of prime importance in recovering a deteriorated backup disc, where you probably lost only a few blocks, you want to recover it all at once, and you don't mind doing a fair bit of extra computation this one time.
One important thing about erasure codes is that they get better at larger numbers of blocks. Add one parity URB to two RBs and you can replace one lost URB; but if you add one parity URB to one thousand RBs you can replace any one of them. More importantly, if you split your thousand RBs into pairs and add one parity URB to each pair, you end up storing them in 1500 URBs but you can still be hosed if you lose two blocks out of a group of three. But if you treated your thousand RBs as a single group and stored 500 parity URBs corresponding to intelligently-chosen combinations of RBs from all over the disc, then you end up with a ridiculously large safety margin - enough to correct far more than two lost blocks no matter where they happen to be.
So one conclusion from the coding theory stuff is that we don't really want to be splitting blocks into small groups like of three or five. Instead it would be better to combine all the blocks into one big group and then add redundancy to the whole group. My original plan had been to use lots of small groups by analogy with RAID and to make it easy to update a few blocks at a time. But if we're burning the filesystem image onto a DVD-R, we don't need to make it easy to update a few blocks at a time. We can take the original image and then just add blocks to add redundancy.
Two other interesting revelations from coding theory. First, it's the usual case that you can design one of these codes to just store the RBs unmodified in the first however many URBs; so we could start with an ISO-9660 or UDF image, leave it unchanged, and then add extra blocks at the end containing error-recovery redundancy. That way the reader doesn't need to know about the redundancy. We can just build an ISO image to burn as we normally would, run an "add redundancy" utility to extend it to the size of the medium with redundant blocks, and then software that doesn't know the score only sees a normal ISO-9660 filesystem with harmless garbage at the end. If we later read it and get errors, we can run the decoder utility that scans the disc, reads the extra blocks, and attempts to extract an error-free ISO image which we can drop on a hard drive or burn to another disc. Compatibility for the win.
Second revelation: within broad limits, and as long as you don't need ultra-high theoretical performance, it doesn't really matter how you decide which data blocks go into which parity blocks. Even if you did want ultra-high theoretical performance, you're not smart enough. Pushing the limits is an area of active research, we haven't achieved optimality yet, and the best known schemes are way too complicated to really be useful in this application, bearing in mind that we want something simple to implement and verify so that we can be sure of having a working implementation of it on the other end when it's time to recover old data. You can use a random number generator.
So here's the new scheme: you just tack a bunch of parity blocks onto the end of the ISO or UDF image. You use a pseudorandom number generator to match each data block in the image with several (five might be about right) parity blocks, and each parity block with several image blocks. The parity block is just the XOR sum of the corresponding data blocks.
When you go to read it, you first try to read any given block from the main data area. If you can't, then you find a parity block containing that data block, read all the other data blocks associated with that parity block, and use them to recover the missing data block. If you can't (because of some of those data blocks, or the parity block itself, being bad also) then you try a different parity block. You get five (or however many) chances because each data block is mentioned in five parity blocks. Ideally not too many data blocks are mentioned in each parity block, but that's a trade-off because if you don't have very many parity blocks you may be stuck putting a lot of data into each one. In general though your chances are pretty good and improve the more parity blocks you have.
Now, you do need a pseudorandom many-to-many mapping between data blocks and parity blocks so that for every data block you know which parity blocks it goes into and for each parity block you know which data blocks go into it. I have specific ideas on how to do that reasonably efficiently - involving modular arithmetic and a technique borrowed from cryptography called an asymmetric Feistel network - but I think I'll leave description of that process for another time. I'd rather go ahead and try implementing it.
Do not enter a fake email address. If you don't want to provide one, just leave it blank. Comments with fake email addresses will be deleted.
This form is for posting public comments to be read by other people who visit this Web site. If you have a software support question, or other material directed to the page author instead of to the general public, please send email instead.
All the data you enter, and your IP address, will be saved and displayed. Don't enter secret information. HTML is not accepted; it will be displayed as plain text. Your comment will only be added if you enter valid data in all required fields; if it isn't, use the back button and try again.
I, and I alone, reserve the right to remove postings for any reason.
jim m from 71.160.155.200 at Fri, 25 Apr 2008 15:19:01 +0000:
I found dvdisaster.net/en/index.php again, it seems to be very similar to what you were looking for on the simplified file holography version.
A librarian I spoke with had concerns about digital backups and said uncompressed TIFF images of book pages stored on premium media was the best method, I thought detection and then correction of problems was better.
Personally every backup and dvd-video I author has a md5sum of the directory tree just so I can detect errors.
I was thinking that md5 will offer very good detection of errors and I have not yet had a CD-R CD-RW DVD+R DVD-R DVD+RW or DVD-RW fail to hold data, once burned successfully.
Even the disks that were on sale for $4.95 per 50 DVDs and kept in cakeboxes.
My scripts also run a full md5 checksum of every file on the disk after burning and I just toss disks that fail, and burn a new one.
I only have about 500 DVDs of data backed up currently but I checked every one when burned, and rechecked recently.