We get signal?

Wed 15 Sep 2010 by mskala

So, here we are. It's 3:45pm on Wednesday, and I'm home again after a relatively brief trip to Waterloo. I wanted to be there this morning for a meet-n-greet meeting with some new students, but I found that (because of two and a half hours' sleep last night) I wasn't getting any work done, so I left shortly after noon.

As of 6:15am when I departed, the RAID build was in progress and estimated to finish in four hours, so about 10:15. It has certainly finished now. One other thing to report from the intervening time is that I managed to get through on the phone to the fellow from TigerDirect who had called me, and it turned out he wasn't phoning about the RAM RMA after all! It's just that they have some sort of AI that flags customers who look like they might be buying for a business, so that they get a personal call inviting them to use the company's B2B service. I wasn't a business and so he didn't have much to say to me. He listened politely to my description of the RAM troubles, but that clearly wasn't his department and he didn't and couldn't tell me anything I didn't already know about it.

So I shipped the RAM back to TigerDirect. I'd been holding off until I was able to talk to their representative, in case it was going to turn out that he'd tell me not to ship it back for some reason. It remains to be seen how much of it they'll reimburse.

Now, as the lady said to the tinker, let's have another round.

At the moment, the hardware side of this installation all seems to be under control, and the data is safely secured, though still not easily accessible. My next priority is to render tetsu into a state where I can administer it remotely; that way I'll be able to work on it wherever I am at the time, instead of only from home. After that comes other networking, in particular email; and then it'll be the long process of "settling in" with all my work and play on the new installation.

First things first: now that I have all the RAID arrays stable, I have to build the next levels of the filesystem on them. That will mean an ext4 filesystem for root on the RAID1 - directly without LVM, so that it'll be easier to boot and recover it; a swap file on the RAID0; a physical volume on the RAID5 to be added to the group that contains the existing RAID6; a bunch of logical volumes in the volume group for the different parts of my installation; and ext4 filesystems in those.
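To keep the layering straight, here is a rough model of the planned stack in Python. It is purely illustrative: what sits on each RAID level follows the plan above, but the logical-volume names are hypothetical placeholders, and nothing here touches real devices.

```python
# Illustrative model of the planned storage stack (nothing here touches disks).
# What sits on each array follows the plan; the LV names are hypothetical.
ARRAYS = {
    "RAID1": "ext4 root filesystem, directly on the array",
    "RAID0": "swap",
    "RAID5": "LVM physical volume (new)",
    "RAID6": "LVM physical volume (existing)",
}

# Hypothetical logical volumes carved out of the shared volume group.
LOGICAL_VOLUMES = {"home": "ext4", "media": "ext4", "scratch": "ext4"}


def volume_group_members(arrays):
    """Return the arrays that contribute a physical volume to the volume group."""
    return sorted(name for name, use in arrays.items()
                  if use.startswith("LVM physical volume"))


def describe(arrays, lvs):
    """Print each array's direct contents, then the LVs layered over the VG."""
    for name, use in arrays.items():
        print(f"{name}: {use}")
    members = " + ".join(volume_group_members(arrays))
    for lv, fs in lvs.items():
        print(f"VG on {members} -> LV {lv}: {fs}")


if __name__ == "__main__":
    describe(ARRAYS, LOGICAL_VOLUMES)
```

Keeping root off LVM means the boot loader and a rescue disk only have to understand the RAID1, not the volume group spread across the RAID5 and RAID6.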

The old system was running ReiserFS throughout. Say what you will about the ReiserFS maintenance situation; the tipping point for me was actually reading about a known issue: if you keep ReiserFS filesystem images in files on a ReiserFS filesystem and that filesystem crashes, there's a significant likelihood the repair process will cross-link the crashed filesystem with the images stored in files on it. The idea that this is even possible upsets me. It looks like ext4 is the current filesystem of choice for serious Linux installations. Now, off I go to build volumes and filesystems.

4:25: Running Slackware setup now. Let's see how much of this RAID and LVM configuration it manages to auto-detect...

5:10: It seemed to recognize it, so the next hurdle will be whether it can retain the configuration past a reboot. I just finished selecting packages (nearly everything in the distribution except the really flagrant bloat that I'm certain I'll never use) and the software install is running now, slurping packages from the USB key to the assorted logical volumes spread among the RAIDs.

5:33: The installer basically terminated successfully, but LILO refused to install. Also, the installer seemed to install a non-initrd kernel. Both these things will have to be fixed before the system can boot under its own power. For the moment, I'm rebooting with the install key and a kernel command line pointing it at the first partition of the first hard drive (which is one of the mirrored copies of my actual root filesystem). That may end up screwing up the RAID mirroring, but it's only a gig and will be easy to fix later. The hope is that from here I can get enough of the "here is how to bring up RAID and LVM" stuff configured and built into the initrd that I can create a hard-drive-based boot sequence.

5:42: Kernel panic!

5:57: Leftover taco mixture from yesterday is heating up. On the silicon front, it looks like the installer's boot prompt is not capable of booting into my installed environment with one partition of the mirror set specified on the kernel command line. So the next thing to try is booting into the regular installer, doing a chroot to the installed environment, and trying to get all the boot stuff set up from within that.

7:09: Looks like there are two main issues. One, to boot this installation properly I need an initrd; and the default Slackware one won't work because it doesn't support RAID arrays with nonstandard device names. I had given mine mnemonics instead of using /dev/md/0 and such, in order to avoid silly errors of forgetting the numbers in the future. In a chrooted environment, with a little bit of fooling around, I can run the mkinitrd program to assemble a new initrd image with the necessary change, which consists of adding the "--auto=md" option to the init script where it calls mdadm to bring up the RAID. I took the opportunity to also bake my RAID configuration into the initrd instead of scanning for it, to reduce silliness if I happen to put drives from other RAIDs into the computer in the future. (Disabling autodetect, as I mentioned yesterday.)
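The init-script change amounts to a one-line text edit. Below is a minimal Python sketch of it, assuming the script calls mdadm in assemble mode as `mdadm -A ...` or `mdadm --assemble ...`; the actual line in Slackware's initrd init may be written differently, and the function name `add_auto_md` is hypothetical. It appends `--auto=md` to any such call that doesn't already carry an `--auto` option.

```python
import re

# Matches an mdadm assemble call (-A or --assemble) on a line that does not
# already mention --auto; group 1 is the command prefix, group 2 the rest.
_ASSEMBLE = re.compile(
    r"^([ \t]*mdadm[ \t]+(?:-A|--assemble)\b)(?!.*--auto)(.*)$",
    re.MULTILINE,
)


def add_auto_md(script):
    """Return `script` with --auto=md inserted after each matching mdadm -A call."""
    return _ASSEMBLE.sub(r"\1 --auto=md\2", script)


if __name__ == "__main__":
    sample = "mdadm -E -s > /etc/mdadm.conf\nmdadm -A -s\n"
    print(add_auto_md(sample))
```

Applied to a line such as `mdadm -A -s`, it yields `mdadm -A --auto=md -s`. Running it a second time changes nothing, because the lookahead skips lines that already mention `--auto`.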

The other issue is that LILO will only boot from a RAID with a version 0.9 superblock, and I'd selected 1.2 throughout. This is a little more serious, because it means I have to recreate the RAID1 array from scratch, and I've already installed Slackware on the old one. Worse, the 0.9 superblock consumes more space, so the resulting array is 64K smaller. That means I can't just copy a filesystem image from the old array onto the new one; I have to save the contents, build a fresh filesystem after re-initializing the array, and copy the contents back file by file. Not really a problem, but it has many opportunities for error, and I managed to screw myself up enough by mounting things inside the chrooted environment, ending the chroot, and then being unable to unmount them, that another reboot has become necessary.
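Here is why the 0.9 array comes out smaller. With a version 0.90 superblock, md keeps its metadata in a reserved 64 KiB block at the end of each member device, aligned to a 64 KiB boundary. The sketch below models that size rule on the kernel's `MD_NEW_SIZE_SECTORS` macro (in `md_p.h`). Treat it as an approximation. It does not model how much space a 1.2 superblock reserves at the start of the device (its data offset), which depends on the mdadm version.

```python
SECTOR_BYTES = 512
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR_BYTES  # 128 sectors = 64 KiB


def v090_data_sectors(device_sectors):
    """Usable sectors on a member device carrying a v0.90 md superblock.

    Round the device size down to a 64 KiB boundary, then give up the last
    64 KiB block to the superblock, as MD_NEW_SIZE_SECTORS does.
    """
    aligned = device_sectors & ~(MD_RESERVED_SECTORS - 1)
    return aligned - MD_RESERVED_SECTORS


if __name__ == "__main__":
    one_gib = 1024 ** 3 // SECTOR_BYTES  # 2,097,152 sectors
    lost = (one_gib - v090_data_sectors(one_gib)) * SECTOR_BYTES
    print(f"{lost // 1024} KiB reserved at the end of a 1 GiB member")
```

For a member partition of exactly 1 GiB (2,097,152 sectors), this gives 2,097,024 usable sectors, so 64 KiB is lost. Any slack beyond the last 64 KiB boundary is lost as well.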

8:28: After much fooling around with LILO and reading a lot of unhelpful "help" about the dreaded "Fatal: map file must be on the boot RAID partition" error, it appears that one issue is that even with a version 0.9 RAID superblock, I'm simply not allowed to use a nonstandard name for the device file of a RAID from which LILO will boot. So the new plan is to make just that one RAID /dev/md0; the others can keep their 1.2 superblocks and nice names, and it's not too huge an imposition to remember that there is just ONE with a nonmnemonic name and it's the boot partition. Unfortunately, discovering and working on this has required a number of reboots from the USB key, and those take a long time because the USB key is slow. So it's an annoying process of trial and error. I'm really looking forward to when this thing can boot from its hard drives, even though by then it probably won't need rebooting much anyway.

8:49: Okay, that did the trick for LILO. I was just making it too complicated for myself by trying to use a mnemonic name for the device file. However, the kernel is still panicking, this time apparently because it can't find "init." That's probably just a matter of needing the proper kernel command line. I used a lilo.conf copied over from the old machine, which didn't use an initrd, and now that one is necessary, I probably don't have quite the right options set for using it. So, more reboots remain necessary, but I think we're over the hump.

9:37: Getting closer. It appears, after much trial and error, that a big part of my problem is that I wasn't passing the command-line options to mkinitrd (not the kernel) to tell it to include the proper modules in the initrd to start up RAID, LVM, and so on. There was also the problem that I was using the Slackware "huge" kernel (which is intended to obviate the need for an initrd) and you can't actually do that and have an initrd too - the combination sums to more than some BIOS address limit. Since I'm using an initrd I don't need a huge kernel (the drivers can go on the initrd instead). With a "generic" kernel and the proper mkinitrd options, I get a lot farther - it still fails to find init, but it at least finds a shell and runs that (not that it helps much, but it shows progress). From error messages immediately before it goes into the shell, it looks like the system is still trying to use my mnemonic RAID names that I abandoned. Fortunately, that should be easy to fix: recursive grep the tree of files going into the initrd, find where there is still some kind of config file mentioning the old name, and change it. In fact, I think I can guess where it'll be - but I will do the recursive grep anyway to make sure I don't miss any.
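The recursive search can be done directly with `grep -rn`. For illustration, here is an equivalent Python sketch. It takes the initrd source tree as an argument (on Slackware this is typically `/boot/initrd-tree`), and the old device name searched for in the example is a hypothetical placeholder.

```python
import os


def find_mentions(root, needles):
    """Walk `root` and return (path, line_number, line) for every line that
    contains any of `needles`, roughly like `grep -rn`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            # Skip symlinks and non-regular files such as device nodes in /dev.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    for lineno, raw in enumerate(f, start=1):
                        text = raw.decode("utf-8", errors="replace")
                        if any(needle in text for needle in needles):
                            hits.append((path, lineno, text.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
    return hits


if __name__ == "__main__":
    # Hypothetical old mnemonic name; substitute whatever was actually used.
    for path, lineno, line in find_mentions("/boot/initrd-tree", ["/dev/md/boot"]):
        print(f"{path}:{lineno}: {line}")
```

Reading files as bytes and decoding leniently lets the search pass over binaries in the tree (busybox, modules) without crashing. A match inside a binary is still reported, just as `grep` would report it.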

10:04: The machine has successfully booted under its own power.

11:28: I am making this update from my own user ID on the new machine. With the home directories from opal moved onto the new installation of Slackware on tetsu, a lot of things have broken, and it's going to take a while to fix all the annoying configuration glitches that result; but at least now I'm up and running again, and connecting to the Net.

I've configured OpenSSHD and the firewall to allow me to log in and work on the machine from remote; with luck, tomorrow I'll be able to get email up and running again, which is the next priority item.

6 comments

Owen
Seems like a lot of work just to transfer your giant pr0n stash to new b0xen.

Remember that it's not really your computer until you bleed on it.

Thanks for turning me on to that "For Intel/For AMD" ram bullshit. I'll keep that in mind next time I build a system. Owen - 2010-09-16 00:21
Steven
What is this "for xyz" crap? RAM is RAM; if you meet the timing/ECC and interface specifications (72-pin SIMMs don't fit in a 30-pin slot), it should just work. I don't want commodities like RAM growing into little fiefdoms.

Pretty soon I'll need to buy special Crisco-compatible sugar. Steven - 2010-09-16 22:41
Owen
Yeah, and you'll get sued if you don't have a license for the recipe. Owen - 2010-09-16 22:42
Matt
I don't know what the electrical difference is between the two, but I know that the Corsair CMX8GX3M4A1333C9 modules did not work in my ECS A790GXM-AD3 motherboard, and the G.SKILL F3-10666CL9D-4GBRL "Ripjaws" modules did work. Both are claimed to be DDR3-1333 PC-10666 CL9-9-9-24 1.5v; both are 2G per module; the Corsair came in a package of four modules and the G.SKILL came in packages of two modules (I got two packages) for a total of 8G either way. The only difference in spec between them is that the Corsair modules are described in the marketing literature as "for Intel’s new Core i5 and Core i7 dual channel DDR3 processors" whereas the G.SKILL modules are described as "for Intel LGA1156 Core i5 & i7 CPUs and AMD AM3 CPUs." (This motherboard is AMD-based.)

The fact that one works and one doesn't (and this is consistent over all four modules) suggests there's definitely *some* difference in spec. I don't know what it might be. Maybe something to do with the voltage? If the Corsair modules are just a little more sensitive to getting the wrong voltage, and the motherboard is giving them a slightly wrong voltage, maybe that makes the difference. I don't know for sure that it's really caused by the brand of CPU, though it seems to be correlated with that.

Note I'm not using a tweaked voltage. In the case of the Corsair modules, I could not boot as far as the BIOS settings to tweak the voltage even if I wanted to. Matt - 2010-09-16 22:56
Matt
BTW, Steven, you sound like an old geezer talking about 72-pin and 30-pin SIMMs. :-) Matt - 2010-09-16 23:08
Steven
All right, DIMM vs. SO-DIMM then.

And Owen, you don't need to worry, that Apple pie is licensed. Steven - 2010-09-19 20:41