So, here we are. It's 3:45pm on Wednesday, and I'm home again after a relatively brief trip to Waterloo. I wanted to be there this morning for a meet-n-greet meeting with some new students, but I found that (because of two and a half hours' sleep last night) I wasn't getting any work done, so I left shortly after noon.
As of 6:15am when I departed, the RAID build was in progress and estimated to finish in four hours, so about 10:15. It has certainly finished now. One other thing to report from the intervening time is that I managed to get through on the phone to the fellow from TigerDirect who had called me, and it turned out he wasn't phoning about the RAM RMA after all! It's just that they have some sort of AI that flags customers who look like they might be buying for a business, to get a personal call inviting them to use the company's B2B service. I wasn't a business and so he didn't have much to say to me. He listened politely to my description of the RAM troubles, but that clearly wasn't his department and he didn't and couldn't tell me anything I didn't already know about it.
So I shipped the RAM back to TigerDirect. I'd been holding off until I was able to talk to their representative, in case it was going to turn out that he'd tell me not to ship it back for some reason. It remains to be seen how much of it they'll reimburse.
Now, as the lady said to the tinker, let's have another round.
At the moment, the hardware side of this installation all seems to be under control, and the data is safely secured, though still not easily accessible. My next priority is to render tetsu into a state where I can administer it from remote; that way I'll be able to work on it wherever I am, instead of only from home. After that, the rest of the networking, in particular email; and then it'll be the long process of "settling in" with all my work and play on the new installation.
First things first: now that I have all the RAID arrays stable, I have to build the next levels of the filesystem on them. That will mean an ext4 filesystem for root on the RAID1 - directly without LVM, so that it'll be easier to boot and recover it; a swap file on the RAID0; a physical volume on the RAID5 to be added to the group that contains the existing RAID6; a bunch of logical volumes in the volume group for the different parts of my installation; and ext4 filesystems in those.
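The plan above amounts to commands along these lines (the device, group, and volume names here are illustrative placeholders, not necessarily the ones I actually used):

```shell
# Root: ext4 directly on the RAID1, no LVM in the way:
mkfs.ext4 /dev/md/root

# Swap file on a filesystem on the RAID0:
mkfs.ext4 /dev/md/scratch
mount /dev/md/scratch /scratch
dd if=/dev/zero of=/scratch/swap bs=1M count=4096
mkswap /scratch/swap

# The new RAID5 becomes a physical volume and joins the
# volume group that already holds the RAID6:
pvcreate /dev/md/newraid5
vgextend vg /dev/md/newraid5

# Logical volumes for the pieces of the installation, ext4 on each:
lvcreate -L 100G -n home vg
mkfs.ext4 /dev/vg/home
```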
The old system was running ReiserFS throughout. Say what you will about the ReiserFS maintenance situation, the tipping point for me was actually reading about a known issue: if you have ReiserFS filesystem images in files on a ReiserFS filesystem, and then it crashes, there's a significant likelihood the repair process will cross-link the crashed filesystem with the images stored on it. The idea of that being possible upsets me. It looks like ext4 is the current filesystem of choice for serious Linux installations. Now, off I go to build volumes and filesystems.
4:25: Running Slackware setup now. Let's see how much of this RAID and LVM configuration it manages to auto-detect...
5:10: It seemed to recognize it, so the next hurdle will be whether it can retain the configuration past a reboot. I just finished selecting packages (nearly everything in the distribution except the really flagrant bloat that I'm certain I'll never use) and the software install is running now, slurping packages from the USB key to the assorted logical volumes spread among the RAIDs.
5:33: The installer basically terminated successfully, but LILO refused to install. Also, the installer seemed to install a non-initrd kernel. Both these things will have to be fixed before the system can boot under its own power. For the moment, I'm rebooting with the install key and a kernel command line pointing it at the first partition of the first hard drive (which is one of the mirrored copies of my actual root filesystem). That may end up screwing up the RAID mirroring, but it's only a gig and will be easy to fix later. The hope is that from here I can get enough of the "here is how to bring up RAID and LVM" stuff configured and built into the initrd, that I can create a hard-drive-based boot sequence.
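For reference, the Slackware installer's boot prompt accepts a line along these lines to boot an already-installed root instead of the installer itself (the kernel label and partition are examples; the exact label depends on the release):

```
huge.s root=/dev/sda1 rdinit= ro
```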
5:42: Kernel panic!
5:57: Leftover taco mixture from yesterday is heating up. On the silicon front, it looks like the installer's boot prompt is not capable of booting into my installed environment with one partition of the mirror set specified on the kernel command line. So the next thing to try is booting into the regular installer, doing a chroot to the installed environment, and trying to get all the boot stuff set up from within that.
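The chroot dance goes roughly like this (assuming the root array assembles as /dev/md/root and mounting under /mnt; both names are placeholders):

```shell
mount /dev/md/root /mnt              # the installed root filesystem
mount --bind /proc /mnt/proc         # kernel interfaces the boot tools expect
mount --bind /sys  /mnt/sys
mount --bind /dev  /mnt/dev
chroot /mnt /bin/bash
# ...fix up the boot configuration from inside...
exit
# unmount in reverse order before rebooting:
umount /mnt/dev /mnt/sys /mnt/proc /mnt
```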
7:09: Looks like there are two main issues. One, to boot this installation properly I need an initrd; and the default Slackware one won't work because it doesn't support RAID arrays with nonstandard device names. I had given mine mnemonics instead of using /dev/md/0 and such, in order to avoid silly errors of forgetting the numbers in the future. In a chrooted environment, with a little bit of fooling around, I can run the mkinitrd program to assemble a new initrd image with the necessary change, which consists of adding the "--auto=md" option to the init script where it calls mdadm to bring up the RAID. I took the opportunity to also bake my RAID configuration into the initrd instead of scanning for it, to reduce silliness if I happen to put drives from other RAIDs into the computer in the future. (Disabling autodetect, as I mentioned yesterday.)
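Roughly, that fix looks like this (the initrd-tree path is Slackware's convention for mkinitrd's working copy; the exact wording of the init script's mdadm call is from memory and may differ):

```shell
# Record the arrays in a static config instead of relying on scanning:
mdadm --detail --scan > /etc/mdadm.conf

# mkinitrd keeps its working tree under /boot/initrd-tree; put the
# config where the initrd's init script will find it:
cp /etc/mdadm.conf /boot/initrd-tree/etc/mdadm.conf

# Then edit /boot/initrd-tree/init so the mdadm invocation reads
# something like:
#   mdadm --assemble --scan --auto=md
# and repack the image with mkinitrd.
```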
The other issue is that LILO will only boot from a RAID with a version 0.9 superblock, and I'd selected 1.2 throughout. This is a little more serious, because it means starting the RAID1 array over from scratch, and I've already installed Slackware on it. Worse, the 0.9 superblock consumes more space, so the re-created array is 64K smaller; I can't just copy a filesystem image from the old array onto the new one. Instead I have to save the contents, build a fresh filesystem after re-initializing the array, and copy everything back file by file. Not really a problem in itself, but it has many opportunities for error, and I managed to screw myself up enough with mounting things inside the chrooted environment, ending chroot, and then being unable to unmount them, that another reboot has become necessary.
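The rebuild sketch, with placeholder device names and a hypothetical stash location (somewhere with room for a gig of files, a scratch logical volume, say):

```shell
# Save the installed files first; re-creating the array destroys them:
mount /dev/md/root /mnt/old
cp -a /mnt/old/. /stash/rootsave/
umount /mnt/old
mdadm --stop /dev/md/root

# Re-create with the old-style superblock that LILO can boot from:
mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 \
      /dev/sda1 /dev/sdb1

# The new array is slightly smaller, so make a fresh filesystem and
# copy the contents back file by file, not as an image:
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/new
cp -a /stash/rootsave/. /mnt/new/
```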
8:28: After much fooling around with LILO and reading a lot of unhelpful "help" about the dreaded "Fatal: map file must be on the boot RAID partition" error, it appears that one issue is that even with a version 0.9 RAID superblock, I'm simply not allowed to use a nonstandard name for the device file of a RAID from which LILO will boot. So the new plan is to make just that one RAID be /dev/md0; the others can keep their 1.2 superblocks and nice names, and it's not too huge an imposition to remember that there is just ONE with a nonmnemonic name and it's the boot partition. Unfortunately, discovering and working on this has required a number of reboots from the USB key, and those take a long time because the USB key is slow to read. So it's an annoying process of trial and error. I'm really looking forward to when this thing can boot from its hard drives - even though at that point it'll also probably be unnecessary to reboot it anymore anyway.
8:49: Okay, that did the trick for LILO. I was just making it too complicated for myself by trying to use a mnemonic name for the device file. However, the kernel is still panicking, this time apparently because it can't find "init." That's probably just a matter of needing the proper kernel command line - I used a lilo.conf copied over from the old machine, which didn't use an initrd, and now that one is necessary, I probably haven't quite the right options set for using it. So, more reboots remain necessary, but I think we're over the hump.
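For anyone playing along at home, the lilo.conf shape that this points toward is something like the following (an illustrative fragment, not my exact file; the kernel and initrd paths are Slackware's usual ones):

```
boot = /dev/md0
raid-extra-boot = mbr-only
image = /boot/vmlinuz-generic
  initrd = /boot/initrd.gz
  root = /dev/md0
  label = Linux
  read-only
```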
9:37: Getting closer. It appears, after much trial and error, that a big part of my problem is that I wasn't passing the command-line options to mkinitrd (not the kernel) to tell it to include the proper modules in the initrd to start up RAID, LVM, and so on. There was also the problem that I was using the Slackware "huge" kernel (which is intended to obviate the need for an initrd) and you can't actually do that and have an initrd too - the combination sums to more than some BIOS address limit. Since I'm using an initrd I don't need a huge kernel (the drivers can go on the initrd instead). With a "generic" kernel and the proper mkinitrd options, I get a lot farther - it still fails to find init, but it at least finds a shell and runs that (not that it helps much, but it shows progress). From error messages immediately before it goes into the shell, it looks like the system is still trying to use my mnemonic RAID names that I abandoned. Fortunately, that should be easy to fix: recursive grep the tree of files going into the initrd, find where there is still some kind of config file mentioning the old name, and change it. In fact, I think I can guess where it'll be - but I will do the recursive grep anyway to make sure I don't miss any.
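The working invocation ends up along these lines (the kernel version and module list here are examples; the modules are whatever the generic kernel leaves out that's needed to reach root):

```shell
# -R pulls in RAID support, -L pulls in LVM, -m names the modules the
# initrd must load before it can find the root filesystem:
mkinitrd -c -k 2.6.33.4 -f ext4 -r /dev/md0 \
         -m ext4:raid1:raid456 -R -L

# Re-run LILO so the new initrd is mapped:
lilo
```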
10:04: The machine has successfully booted under its own power.
11:28: I am making this update from my own user ID on the new machine. With the home directories from opal moved onto the new installation of Slackware on tetsu, a lot of things have broken, and it's going to take a while to fix all the annoying configuration glitches that result; but at least now I'm up and running again, and connecting to the Net.
I've configured OpenSSH's sshd and the firewall to allow me to log in and work on the machine from remote; with luck, tomorrow I'll be able to get email up and running again, which is the next priority item.
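On Slackware that amounts to little more than this (the iptables line is an example rule; the real one depends on whatever firewall script the machine uses):

```shell
# Slackware starts sshd at boot if its rc script is executable:
chmod +x /etc/rc.d/rc.sshd
/etc/rc.d/rc.sshd start

# Let new inbound SSH connections through the firewall:
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -j ACCEPT
```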