[Novalug] Grub/EFI on CentOS6/RHEL6

Bryan J Smith b.j.smith@ieee.org
Mon Feb 22 11:25:53 EST 2016


Short Answer ...

  man 8 sfdisk
  /BACKING UP THE PARTITION TABLE


Long Answer ...

  Did you dump (make a backup of) the failed
  drive's GPT disk label, and then re-apply it
  to the new, replacement drive?  E.g.,

    # sfdisk -d /dev/sda > /boot/efi/dump_sda.sfdisk

    [ Yes, _always_ do this, keep it safe! ]

    # sfdisk /dev/sda < /boot/efi/dump_sda.sfdisk
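
Since a stray redirect can wipe out the only good copy of the dump, here is a minimal sketch of a wrapper that refuses to overwrite an existing dump file. The function name backup_label and the SFDISK override variable are made up for illustration; only "sfdisk -d" comes from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: dump a disk label without clobbering an earlier dump.
# SFDISK is an assumed override (handy for a dry run); it defaults to sfdisk.
SFDISK=${SFDISK:-sfdisk}

backup_label() {
    dev=$1
    out=$2
    # Refuse to overwrite an existing dump; it may be the only good copy.
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        return 1
    fi
    # Write to a temporary name first so a failed dump leaves nothing behind.
    if "$SFDISK" -d "$dev" > "$out.tmp"; then
        mv "$out.tmp" "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}

# Example (as root): backup_label /dev/sda /boot/efi/dump_sda.sfdisk
```

This assumes the installed sfdisk understands GPT labels, which older util-linux releases may not. If it does not, "sgdisk --backup=FILE /dev/sda" and "sgdisk --load-backup=FILE /dev/sda" from the gdisk package are a possible alternative.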


  If not, then the uEFI and/or GRUB are having
  trouble because _all_ your UUIDs in the
  GPT disk label have changed. ;)
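
To see which UUID is involved, it may help to decode the HD(...) entry from the quoted grub.conf. The sketch below assumes that entry follows the EFI device-path convention of HD(partition, start, size, partition GUID), with start and size given as hexadecimal counts of 512-byte sectors. The function name decode_hd is made up for illustration.

```shell
#!/bin/sh
# Decode an EFI-style "HD(part,start,size,guid)" device string.
# Assumption: start and size are hex counts of 512-byte sectors.
decode_hd() {
    # Strip the "HD(" prefix and the ")" suffix.
    fields=${1#"HD("}
    fields=${fields%")"}
    # Split the remainder on commas into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $fields
    IFS=$old_ifs
    part=$1
    start_bytes=$(( 0x$2 * 512 ))
    size_bytes=$(( 0x$3 * 512 ))
    guid=$4
    echo "partition=$part start_bytes=$start_bytes size_bytes=$size_bytes guid=$guid"
}

decode_hd 'HD(1,800,64000,cf11aab3-7612-4a79-b585-52bf196db82f)'
# prints: partition=1 start_bytes=1048576 size_bytes=209715200 guid=cf11aab3-7612-4a79-b585-52bf196db82f
```

Under that assumption, 1048576 bytes and 209715200 bytes line up with the 1049kB start and 210MB size that parted reports for partition 1. The last field would then be the GPT partition's unique GUID rather than a filesystem UUID. If the gdisk package is installed, "sgdisk -i 1 /dev/sda" prints a "Partition unique GUID" line that can be compared against it (ignoring case).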

-- bjs


On Mon, Feb 22, 2016 at 11:17 AM, Peter Larsen via Novalug <
novalug@firemountain.net> wrote:

> Ok - I've gotten to the point where I've confused myself to the extent
> that I doubt even the basics ... time to write things up, and perhaps
> that will provide me with the answer I know is out there ....
>
> I have a relatively old RHEL6/CentOS6 box which plays a central role as
> router, DNS and other core network services for my home network. Because
> of this, I've always run the server in a RAID1 configuration (and yes,
> it's the server that one of the NEW drives went "poof" in that I posted
> about here some weeks ago). As to the RMA, I got a new disk - same
> model, just bigger *sigh*. Anyway - even with a single drive in the box,
> I noticed to my horror that the box would not boot automatically anymore
> - grub would simply go to the Grub> prompt and only because I've done
> this like 100x I know the commands to boot it manually (hardest part is
> the UUID for the md0 device - more about that below).
>
> The mobo supports UEFI but I've set the board to legacy/UEFI mode. When
> the drive failed, it was in pure UEFI mode and somehow the drives were
> in IDE mode, not AHCI, which I don't get. Obviously I have forgotten to
> change some settings last time I did a BIOS update on the box.
>
> Each drive has the exact same setup - gpt partitions with 3 partitions:
> # parted /dev/sda print
> Model: ATA WDC WD3003FZEX-0 (scsi)
> Disk /dev/sda: 3001GB
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number  Start   End     Size    File system  Name     Flags
>  1      1049kB  211MB   210MB   fat16                 boot
>  2      211MB   1285MB  1074MB  ext4                  raid
>  3      1285MB  1000GB  999GB                         raid
>
> 1 is the UEFI boot - very small, and basically contains two files -
> grub.conf and grub.efi - no more.
> 2 is /boot - which is a mirrored raid (raid 1) with sdb
> 3 is LVM - which is a mirrored raid (raid 1) with sdb
>
> While legacy grub doesn't support RAID0 (striping), I've never had an
> issue with mirroring except that I have to manually configure grub when
> the primary drive fails (this is why hardware raid ROCKS - the OS will
> recover regardless of which drive fails - and no, not the fake-raid
> stuff that Intel includes on the boards - DON'T TOUCH IT!). If I do a
> file -s /dev/sda2 the drive reports as being ext4, which is what I
> expect on small RAID1 drives. In other words, the system does not know
> on boot that /boot is a RAID device and it shouldn't care.
>
> When I run grub-install I use /dev/sda1 as the target device because
> that's where grub.conf is. But this is where I have started to have real
> doubts and confused myself. sda2 is what contains all stage 1.5 and
> stage2 grub files.  They're not on /dev/sda1 which I think could explain
> that grub falls into grub> - but I actually think that would cause grub
> to completely fail since the command prompt is present in the last stage
> of grub. Here's a quick piece of grub.conf as it's currently configured:
>
> boot=/dev/sda1
> device (hd0) HD(1,800,64000,cf11aab3-7612-4a79-b585-52bf196db82f)
> default=0
> timeout=5
> splashimage=(hd0,1)/grub/splash.xpm.gz
> hiddenmenu
> # password --encrypted $6$kEMErPh6jjKtcbWz$wqCPtBw1FdqA11ncsyp4F5kfeL/iLEx80myQct07N.283uv88I8SPThUFDDbLaZWhoz4ItvT5VpYECpaBKuQ71
> title Red Hat Enterprise Linux Server (2.6.32-573.18.1.el6.x86_64)
>     root (hd0,1)
>     kernel /vmlinuz-2.6.32-573.18.1.el6.x86_64 ro
> root=/dev/mapper/dilbert-lvroot rd_LVM_LV=dilbert/lvswap rd_NO_LUKS
> LANG=en_US.UTF-8 rd_LVM_LV=dilbert/lvroot SYSFONT=latarcyrheb-sun16
> crashkernel=auto rd_MD_UUID=c81eb941:8281acc8:299dd4c2:a0ee0824
> KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
>
> First - the problem was that this file had NOT been updated in at least
> 4 months. Not sure why - and it just shows how infrequent the server is
> booted because all the pointers were to kernels that no longer were
> present in /boot.  Note that the kernel line is pretty standard except
> that it passes in the MD UUID for the root device - otherwise the kernel
> cannot  mount root (it first has to activate MD then LVM). Typing the
> grub lines as they are above directly into grub WORKS - the system
> boots just fine. But grub doesn't see it on boot.
>
> So something is wrong here - and of course grub isn't very helpful when
> it just exits into its grub> prompt.  Basically I think the problem is
> in the first two lines.  boot= doesn't point to sda2 - but it's not the
> boot partition, so when there are two of them (UEFI is in play here)
> which one should it point to?  Second, the device name has a UUID that I
> cannot find anywhere.  The file system on /dev/sda1 and /dev/sda2 have
> both been wiped. sda failed this summer, and last month sdb failed. I've
> duplicated the sda1 and sdb2 by simply dd'ing the partitions from disk
> to disk, and I really regret that now. For one, fstab referred to a UUID
> that existed on two disks - it wasn't unique!
>
> So the question to the group is - what UUID should the "device" command
> point to - if it's hd0,1 it would be /boot (sda2). And what boot device
> should boot point to - the UEFI that has the boot code in the partition
> but not the stage2 part of grub? Recall that sda2 does not have boot
> data in the partition header. So it's not a boot partition. And I cannot
> find documentation on the HD part of the device command at all. Not sure
> what the 800 and 64000 refers to?
>
> Right now my thoughts are that the uuid just needs to be corrected in
> the device command. Since this would explain why it used to work and why
> it doesn't work now. But I've confused myself so much the last week
> while being angry at failing disks left and right - I need some
> confirmation on this :)
>
> --
> Regards
>   Peter Larsen
>
> **********************************************************************
> The Novalug mailing list is hosted by firemountain.net.
>
> To unsubscribe or change delivery options:
> http://www.firemountain.net/mailman/listinfo/novalug
>



-- 
Bryan J Smith - http://www.linkedin.com/in/bjsmith


