Grub2 part 1, the preparation
The current Illumos x86 boot system is based on grub-0.97 (grub 1): the kernel supports the Multiboot 1 specification, boots on BIOS systems only, and uses the MBR partitioning scheme.
Main pain points:
- MBR partitioning is limited: the MBR scheme supports disks only up to 2TB.
- The setup is complicated to manage: Illumos uses a VTOC label inside an MBR (fdisk) partition, and the zpool command does not implement MBR partition management for whole-disk setups. Complicated setups lead to many different ways to make mistakes, and as real life has proven, people use every opportunity to actually make those mistakes on real systems.
- No support for UEFI. This is currently not a big issue by itself, as in many cases the user can use BIOS instead.
- No support for the Multiboot 2 specification. This is not an issue for users, but it is for OS development, as MB2 has, in my opinion, a much cleaner interface for implementing various features used in the kernel boot and load phases.
As current boot support is based on Grub, the existing boot code (the MB1 interface) and the tools to manage boot environments are built around the grub boot manager. Since Grub has evolved to support new hardware, has new features, and does work, integrating Grub2 is possible with the least effort and opens up a path towards the adoption of new interfaces and developments. It also does not cut off options for alternate boot mechanisms.
The very first step in this project was to select and verify a grub2 version to base the work upon. Since Illumos is a fork of Solaris from the OpenSolaris project, the first test was done using the Grub 1.99 code drop from Oracle Solaris 11.1. Oracle builds this version of grub with gcc 4.5 (with custom patches); the current gcc-4.4.4 based Illumos development compiler was unable to build it due to issues with the assembler files. However, gcc-4.7 was able to build this version of grub, and it was possible to boot an unmodified Illumos kernel. Still, since that grub code drop is really old, it is not a good idea to use it as a base; as a proof of concept, though, it did what was expected.
The second test was with the Grub 1.99 code drop from Oracle Solaris 11.2. Same result: gcc-4.4.4 does not build it without modifications, and even with some grub updates integrated by Oracle, it is still not really current.
The currently used Grub version (2.02) is from git://git.savannah.gnu.org/grub.git with some updates and fixes applied (mostly for the configure phase, but also zfs feature support). gcc-4.4.4 is used to build it, and it is up to date with recent fixes and updates from the grub developers. I have been using this version of grub to boot both from the network and from disk, so the results from basic tests are good.
The current Illumos (disk) boot is supported from a zfs pool (the root pool, rpool) built on a VTOC slice inside a Solaris2 partition. Such a pool is created by a distribution-specific installer, usually without redundancy. Illumos supports only mirroring for root pool redundancy, no striping. Some installers (such as AI) do support setting up a mirrored boot pool at install time.
The suggested first task for an admin is to set up a mirror for the root pool. This is manual work: set up MBR partitioning with fdisk, build a correct VTOC label with format (or fmthard), then use the zpool command to attach the mirror side.
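For illustration, a hedged sketch of that manual mirror setup; the device names (c1t0d0 as the existing boot disk, c1t1d0 as the new mirror side) and the pool name rpool are examples only:

```shell
# Copy the fdisk (MBR) partition table from the existing boot disk
# to the new disk.
fdisk -W /tmp/fdisk.table /dev/rdsk/c1t0d0p0
fdisk -F /tmp/fdisk.table /dev/rdsk/c1t1d0p0

# Replicate the VTOC label onto the new disk.
prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2

# Attach the new slice as a mirror side of the root pool.
zpool attach rpool c1t0d0s0 c1t1d0s0

# Install the (grub 1) boot blocks on the new mirror side.
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
```

Every step above is a separate opportunity for the mistakes mentioned earlier, which is exactly what the whole-disk boot pool support is meant to remove.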
The grub menu file is located at <root pool mount point>/boot/grub and is named menu.lst. There are two tools to manage this menu:
- The boot environments (BEs) are managed by beadm, which updates the grub config as well.
- bootadm can be used to manage the menu.lst file directly.
The grub menu is also used by the halt/reboot commands and can be updated via the uadmin() system call to create a temporary boot entry that boots the kernel with specified command line option(s).
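A hedged sketch of how this looks from the command line (menu entry numbers and kernel paths will vary per system):

```shell
# Inspect the current grub menu as bootadm sees it.
bootadm list-menu

# Ask reboot to create a temporary boot entry that boots the given
# kernel with extra flags (-k loads kmdb, -v is verbose boot); the
# entry is used once for the next boot.
reboot -- '/platform/i86pc/kernel/amd64/unix -kv'
```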
To support grub menu management, current Illumos has three (3!) alternative menu management implementations: beadm has its own implementation in libbe, bootadm has its own, and there is also libgrubmgmt, used by halt/reboot and svc.startd.
And finally, disk partitioning is also managed by the zpool command, to simplify whole-disk data pool management (by creating and using EFI disk labels for data pools).
The objective of this project is to address the first two pain points: to get 2+TB disk boot support and to simplify boot setup management by using grub2 as the boot loader. It will also serve as a door opener for other possible features.
Today, the support for 2+TB disks is provided by using EFI disk labels.
Grub 2 implements booting from EFI-labelled disks in two variants:
- for BIOS systems, the "BIOS boot" partition is used (as raw space to store the bootloader image)
- for UEFI systems, the "EFI System" partition is used (a FAT32 file system with bootloader files for different vendors).
The very first step in addressing those issues is to give the OS the ability to recognise and manage the BIOS boot and EFI System partitions; since Illumos uses a translation to the VTOC table, this translation should support those partitions as well. This step is done, see issue #5119.
Since Illumos is already able to build and manage whole-disk data pools with the zpool command, it should also be able to build and manage whole-disk root or boot pools. Also, as all pool management operations are implemented via the zpool command, the current implementation also has checks to ensure the boot pool is built according to the requirements for bootable pools.
To make it possible to create and manage bootable pools with EFI disk labels, we will need the following updates:
1. zpool create should have a way to specify that we are about to create a bootable [whole disk] pool.
2. libzfs should support creating EFI labels for disks in bootable pools.
3. libzfs checks for bootable pools should be updated to allow EFI-labelled disks.
4. zpool autoreplace support.
Implementing (1) is relatively easy, following the approach of Oracle Solaris (to maintain a familiar interface) by adding a -B switch to the zpool create command. Note that "zpool create -B" is only about creating whole-disk setups: it tells the zpool command to create the boot and pool partitions for us. If the user needs to support multi-OS boot setups, the partitioning must be done manually, the -B switch is not used, and the proper disk device must be given as the argument to zpool create.
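A hedged sketch of the two usage modes (disk and pool names are examples; -B is the proposed switch described above):

```shell
# Whole-disk bootable pool: with -B, zpool creates both the boot
# partition and the pool partition on an EFI-labelled disk for us.
zpool create -B rpool c1t0d0

# Multi-OS setup: the disk is partitioned manually beforehand, -B is
# not used, and the prepared slice is passed explicitly.
zpool create rpool c1t0d0s0
```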
Implementing (2) depends on the system platform type: a BIOS-based system should get a "BIOS boot" partition, and a UEFI system an "EFI System" partition. In fact, in both cases the location and size of the partition can be the same, so the only difference is the partition type.
Implementing (3) is a bit more complicated, as the current libzfs code assumes slice 0 for whole-disk setups. Implementing (4) is a straightforward update to the syseventd mod_zfs module.
Although there are no requirements about partition ordering, location, and size for boot partitions, the suggestion and good practice is to create the boot partition as the first partition (slice 0). To avoid possible alignment issues, it is suggested to start the slice at block 256 (see the libzfs_pool.c comment above NEW_START_BLOCK). As for size, I'm again following the Oracle Solaris approach of using 256MB, which should be more than enough to provide space for boot blocks even when different boot loaders are used on a UEFI system.
Here is an example of an EFI label for a BIOS system boot disk:
Current partition table (original):
Total disk sectors available: 33537981 + 16384 (reserved sectors)

Part   Tag          Flag   First Sector   Size       Last Sector
0      BIOS_boot    wm     256            256.00MB   524543
1      usr          wm     524544         15.74GB    33538014
2      unassigned   wm     0              0          0
3      unassigned   wm     0              0          0
4      unassigned   wm     0              0          0
5      unassigned   wm     0              0          0
6      unassigned   wm     0              0          0
8      reserved     wm     33538015       8.00MB     33554398
The registered feature request for the zpool/libzfs change is #5125, and the proposed implementation is available as a webrev.
Note, the proposed implementation does allow mixed MBR/EFI pool setups for migration purposes. However, since the boot partition on an EFI disk takes 256MB, the layout will most likely need more space than the MBR partitioning layout, and the zpool command will not allow mirroring identical disks with different partitioning schemes. If that is the case, "beadm create -p poolname" can provide an alternate way to migrate.
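A hedged sketch of that migration path (pool, disk, and BE names are examples): create a new EFI-labelled bootable pool, then recreate the boot environment on it with beadm:

```shell
# Create the new whole-disk bootable pool on the second disk,
# using the proposed -B switch.
zpool create -B newpool c1t1d0

# Recreate the boot environment on the new pool; beadm copies the
# BE datasets there and updates the boot configuration.
beadm create -p newpool newbe
```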
The second required zfs pool related change is a kernel zfs module update to allow setting the "bootfs" property on whole-disk boot pools. The current kernel module, when it receives an ioctl() to update the "bootfs" property, checks that the property is not being set on a whole-disk setup. Since bootadm does attempt to set this property, the zfs kernel module really needs to be updated accordingly. One possible approach is to implement a check of the disk label and only allow setting bootfs if the disk has a "BIOS boot" or "EFI System" partition. The registered feature request is #5120 and the implementation is not yet published.
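For context, this is the property the tools need to be able to set; a hedged example with hypothetical pool and dataset names (on a whole-disk pool this currently fails in the kernel check described above):

```shell
# Tell the boot loader which dataset holds the root file system.
zpool set bootfs=rpool/ROOT/mybe rpool

# Verify the property value.
zpool get bootfs rpool
```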
The updates described above can be counted as prerequisites for the final stage, which is the update of the management tools to support grub2 configuration and the inclusion of grub2 itself.
It's still being written ;)