Daily Source Reading: Boot Process - Stage 1, EFI edition [OpenBSD]
This seems to be a recurring theme. Yes, we will look at OpenBSD
first. I took a look at FreeBSD also, but the boot process there is
a bit more involved due to SYSINIT. We will get to it, but for
getting our feet wet, we will start with OpenBSD.
Disclaimer
I have been racking my brains about where to start. Do we start with the kernel's main function and work our way backwards to the bootloader? Or do we start on the ground floor? In the end, I think it makes more sense to start with the bootloader.
For brevity's sake, we will only look at the amd64 architecture and only
use UEFI instead of the classic old MBR.
I will also very likely get a lot of this wrong. I am not the most
knowledgeable about UEFI. My last experience writing an OS was
with good old BIOS boot. So, I might skip over some of the very
UEFI-specific parts.
Even the EFI boot process involves some assembly, though. Apart from
the small entry stub, we will not go into that for now. It would take
a lot more time and currently exceeds my daily bandwidth by a truckload.
With that out of the way, here goes nothing. Let's dive in and understand the boot process.
An OS to boot
From what I gathered about EFI, you provide it with a binary whose
entry point (after a short assembly stub) ends up in an efi_main.
That is where we start off. Its only goal is to get our boot loader
going and kick shit off. So, let's start there.
We power up our computer, and then what happens? In the days of
yore, the BIOS would have gone through all bootable devices in the
order specified in its settings, checked the first sector of each for
a boot sector and tried to load that (a very watered-down explanation).
Nowadays, most computers use UEFI. It's its own system in a way.
In short, it looks for EFI system partitions (which are FAT formatted),
and unless a boot entry is stored in its NVRAM settings, it falls
back to a default path such as /efi/boot/bootx64.efi. This is our
first bootloader.
There is a caveat though. The bootloader is expected to be a PE
(i.e. Windows-style) executable, and UEFI calls it using Microsoft's
x64 calling convention, which differs from SysV (what BSD uses).
Calling conventions
Without going into too much detail, a calling convention is an
agreement on how functions pass arguments at the assembly level.
Microsoft's x64 calling convention, for example, passes the first 4
integer or pointer arguments to a function in the CPU registers RCX,
RDX, R8 and R9, in that order.
SysV, on the other hand, uses RDI, RSI, RDX and RCX for the first four,
in that order (plus R8 and R9 for the fifth and sixth).
We want to write most of our code in SysV style, and our toolchains
are set up for that. But UEFI expects the Microsoft convention. What do we do?
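To make the mismatch concrete, here is a small C sketch (my own illustration, not OpenBSD code; ms_x64_reg, sysv_reg and sysv_slot_of_ms_arg are hypothetical helper names). It lists the integer argument registers of both conventions and answers one question: if the firmware passes an argument the Microsoft way, which argument would a SysV function think it is? It only covers integer/pointer arguments and ignores floating point and Microsoft's shadow space.

```c
#include <stddef.h>
#include <string.h>

/*
 * Integer/pointer argument registers, in order, for the two x86-64
 * calling conventions. Floating-point arguments and Microsoft's 32-byte
 * "shadow space" are deliberately left out of this sketch.
 */
static const char *const ms_x64_args[] = { "rcx", "rdx", "r8", "r9" };
static const char *const sysv_args[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Register carrying integer argument `i` (0-based), or "stack". */
const char *ms_x64_reg(size_t i)
{
	return i < NELEMS(ms_x64_args) ? ms_x64_args[i] : "stack";
}

const char *sysv_reg(size_t i)
{
	return i < NELEMS(sysv_args) ? sysv_args[i] : "stack";
}

/*
 * If a Microsoft-ABI caller passes argument `i`, which SysV argument
 * (0-based) would a SysV callee read that register as? -1 if none.
 */
int sysv_slot_of_ms_arg(size_t i)
{
	const char *reg = ms_x64_reg(i);

	for (size_t j = 0; j < NELEMS(sysv_args); j++)
		if (strcmp(reg, sysv_args[j]) == 0)
			return (int)j;
	return -1;
}
```

For example, sysv_slot_of_ms_arg(0) is 3 and sysv_slot_of_ms_arg(1) is 2: a SysV efi_main called straight from the firmware would find the image handle in its fourth argument, the system table in its third, and garbage in the two it actually reads.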
Cast the magic spell from hither to yonder
```asm
	.text
	.align	4
	.globl	_start
_start:
	subq	$8, %rsp
```
First we align the stack pointer (rsp) to 16 bytes. That will make
things easier later, because SysV expects the stack to be 16-byte
aligned at every call. We only adjust it by 8 bytes, because the UEFI
firmware called us: rsp was 16-byte aligned before that call, and the
call then put an 8-byte return address on the stack.
```asm
	pushq	%rcx
	pushq	%rdx
```
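Now that our stack is properly aligned, we push the RCX and RDX
registers onto the stack. Remember, that is where the Microsoft
convention passes the first two arguments: the image handle and the
system table. So effectively we are just stashing those two arguments
on the stack. Pushing 16 bytes also keeps the stack aligned.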
```asm
0:
	lea	ImageBase(%rip), %rdi
	lea	_DYNAMIC(%rip), %rsi
```
Now we load the address of ImageBase and the _DYNAMIC section from
our binary and put them in rdi and rsi respectively. We are
preparing them as the first 2 arguments for an upcoming call.
```asm
	popq	%rcx
	popq	%rdx
	pushq	%rcx
	pushq	%rdx
```
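Now we pop the two values we pushed earlier (the original arguments)
and immediately push them back. It looks like we are just peeking at
the values, but note the order: the stack is last in, first out, so
rcx now holds what was originally in rdx (the system table) and rdx
holds the original rcx (the image handle). Pushing them back in this
order leaves the image handle on top of the stack, which matters for
the pops after the call. rdx and rcx also happen to be the third and
fourth SysV argument registers; self_reloc only takes two arguments,
so it simply ignores them.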
```asm
	call	self_reloc
```
Now we have all our arguments ready and call self_reloc. We
will look at it next, so bear with me for a bit.
```asm
	popq	%rdi
	popq	%rsi
```
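With the call done and everything relocated, we pop the two
arguments from the stack into rdi and rsi: the image handle lands in
rdi and the system table in rsi, exactly where SysV expects the first
two arguments (remember…calling convention).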
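To convince myself the stack dance actually works out, here is a toy C model of _start (my own simulation, not OpenBSD code; struct cpu and model_start are hypothetical names). Registers are plain variables, the stack is an array that grows downwards, and rsp is tracked as a number so we can check the 16-byte alignment at both call sites. The entry rsp is an assumption: any value that is 8 modulo 16, which is what a call from an aligned stack leaves behind.

```c
#include <stdint.h>

#define STACK_WORDS 8

/* Toy machine: the registers _start touches plus a downward-growing stack. */
struct cpu {
	uint64_t rdi, rsi, rdx, rcx;
	uint64_t rsp;			/* simulated address, only for alignment checks */
	uint64_t stack[STACK_WORDS];	/* stack[sp] is the top of the stack */
	int sp;				/* index into stack[], counts down on push */
	int misaligned_calls;		/* calls made while rsp % 16 != 0 */
};

static void push(struct cpu *c, uint64_t v)
{
	c->rsp -= 8;
	c->stack[--c->sp] = v;
}

static uint64_t pop(struct cpu *c)
{
	c->rsp += 8;
	return c->stack[c->sp++];
}

/* Both ABIs want rsp 16-byte aligned just before a call instruction. */
static void check_call(struct cpu *c)
{
	if (c->rsp % 16 != 0)
		c->misaligned_calls++;
}

/*
 * Replay _start. `image` and `systab` arrive in RCX and RDX (Microsoft
 * ABI); `entry_rsp` is rsp right after the firmware's call instruction.
 */
void model_start(struct cpu *c, uint64_t image, uint64_t systab,
    uint64_t entry_rsp)
{
	c->sp = STACK_WORDS;
	c->rsp = entry_rsp;
	c->rcx = image;
	c->rdx = systab;
	c->rdi = c->rsi = 0;
	c->misaligned_calls = 0;

	c->rsp -= 8;		/* subq $8, %rsp: padding only, no value stored */
	push(c, c->rcx);	/* pushq %rcx */
	push(c, c->rdx);	/* pushq %rdx */

	c->rcx = pop(c);	/* popq %rcx: gets the old RDX (systab) */
	c->rdx = pop(c);	/* popq %rdx: gets the old RCX (image) */
	push(c, c->rcx);	/* pushq %rcx */
	push(c, c->rdx);	/* pushq %rdx: image is now on top */

	check_call(c);		/* call self_reloc */

	c->rdi = pop(c);	/* popq %rdi: image */
	c->rsi = pop(c);	/* popq %rsi: systab */

	check_call(c);		/* call efi_main(image, systab) */
	c->rsp += 8;		/* addq $8, %rsp */
}
```

Calling model_start(&c, 0x1111, 0x2222, 0x7ff8) ends with c.rdi == 0x1111 (image), c.rsi == 0x2222 (system table), no misaligned calls and rsp back at 0x7ff8. Drop the pop/push pair in the middle and rdi and rsi come out swapped.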
```asm
	call	efi_main
	addq	$8, %rsp
```
We are ready to call efi_main, our actual boot loader. Once it returns,
we are nice and restore the stack to how it was when we started, so
the final ret hands efi_main's status (still in rax) back to the firmware.
```asm
.exit:
	ret

	/*
	 * hand-craft a dummy .reloc section so EFI knows it's a relocatable
	 * executable:
	 */
	.data
	.section .reloc, "a"
	.long	0
	.long	10
	.word	0
```
As the comment states, this is just a dummy section so that UEFI is
happy with us.
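The dummy .reloc contents are small enough to decode by hand. As far as I understand the PE/COFF format, a base relocation block is a 4-byte page RVA, a 4-byte block size that includes that 8-byte header, and then 16-bit entries whose top 4 bits are the relocation type. So .long 0 / .long 10 / .word 0 is a single block for page 0 with exactly one entry of type 0 (IMAGE_REL_BASED_ABSOLUTE), which loaders skip as padding. The firmware sees a relocation table, but nothing gets patched; the real fix-ups happen in self_reloc. This C sketch (function names are hypothetical) just serializes those bytes and decodes an entry.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Serialize the same base relocation block the start.S stub emits.
 * All fields are little-endian.
 */
size_t build_dummy_reloc(uint8_t out[10])
{
	const uint32_t page_rva = 0;		/* .long 0 */
	const uint32_t block_size = 8 + 2;	/* .long 10: header + one entry */
	const uint16_t entry = 0;		/* .word 0: type 0, offset 0 */

	for (int i = 0; i < 4; i++) {
		out[i] = (uint8_t)(page_rva >> (8 * i));
		out[4 + i] = (uint8_t)(block_size >> (8 * i));
	}
	out[8] = (uint8_t)(entry & 0xff);
	out[9] = (uint8_t)(entry >> 8);
	return block_size;
}

/* Top 4 bits: relocation type. Low 12 bits: offset within the page. */
unsigned reloc_type(uint16_t entry) { return entry >> 12; }
unsigned reloc_offset(uint16_t entry) { return entry & 0x0fff; }
```

For comparison, an entry like 0xA123 would be type 10 (IMAGE_REL_BASED_DIR64, a real 64-bit fix-up) at offset 0x123; the stub's entry of 0 decodes to type 0 and does nothing.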
Relocation, relocation, relocation
```c
/*
 * A simple elf relocator.
 */
void
self_reloc(Elf_Addr baseaddr, ElfW_Dyn *dynamic)
{
	Elf_Word relsz, relent;
	Elf_Addr *newaddr;
	ElfW_Rel *rel = NULL;
	ElfW_Dyn *dynp;

	/*
	 * Find the relocation address, its size and the relocation entry.
	 */
	relsz = 0;
	relent = 0;
	for (dynp = dynamic; dynp->d_tag != DT_NULL; dynp++) {
		switch (dynp->d_tag) {
		case DT_REL:
		case DT_RELA:
			rel = (ElfW_Rel *)(dynp->d_un.d_ptr + baseaddr);
			break;
		case DT_RELSZ:
		case DT_RELASZ:
			relsz = dynp->d_un.d_val;
			break;
		case DT_RELENT:
		case DT_RELAENT:
			relent = dynp->d_un.d_val;
			break;
		default:
			break;
		}
	}
```
This walks the _DYNAMIC array looking for three pieces of information:
- where the relocation table is
- how big it is
- how big each entry is
```c
	/*
	 * Perform the actual relocation. We rely on the object having been
	 * linked at 0, so that the difference between the load and link
	 * address is the same as the load address.
	 */
	for (; relsz > 0; relsz -= relent) {
		switch (ELFW_R_TYPE(rel->r_info)) {
		case RELOC_TYPE_NONE:
			/* No relocation needs be performed. */
			break;

		case RELOC_TYPE_RELATIVE:
			newaddr = (Elf_Addr *)(rel->r_offset + baseaddr);
#ifdef ELF_RELA
			/* Addend relative to the base address. */
			*newaddr = baseaddr + rel->r_addend;
#else
			/* Address relative to the base address. */
			*newaddr += baseaddr;
#endif
			break;
		default:
			/* XXX: do we need other relocations ? */
			break;
		}
		rel = (ElfW_Rel *) ((caddr_t) rel + relent);
	}
}
```
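The comment mostly explains what it does. Since the binary was linked at address 0, each RELATIVE relocation patches the location at r_offset + baseaddr: with addends (RELA) it stores baseaddr + r_addend there, otherwise it adds baseaddr to the value already stored. All other relocation types are ignored.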
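Here is a self-contained C sketch of the RELA case of that loop, written against minimal stand-in structs instead of the real ELF headers (struct rela, R_RELATIVE and apply_relative are hypothetical names; 8 is the value of R_X86_64_RELATIVE). Unlike self_reloc, it keeps the buffer's address and the pretend load address separate so the result is predictable, and it iterates over an entry count instead of relsz/relent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_RELATIVE 8	/* R_X86_64_RELATIVE */

/* Stand-in for Elf64_Rela. */
struct rela {
	uint64_t r_offset;	/* where to patch, relative to the image start */
	uint64_t r_info;	/* low 32 bits: relocation type */
	int64_t  r_addend;	/* link-time value of the pointer */
};

/*
 * Apply RELATIVE relocations to an image linked at address 0. `image`
 * is the buffer we patch, `base` the address it is considered loaded at.
 */
void apply_relative(uint8_t *image, uint64_t base, const struct rela *rel,
    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t value;

		/* ELF64_R_TYPE(): the type lives in the low 32 bits */
		if ((uint32_t)rel[i].r_info != R_RELATIVE)
			continue;	/* NONE and everything else: skip */
		value = base + (uint64_t)rel[i].r_addend;
		/* memcpy avoids alignment/aliasing trouble with the raw buffer */
		memcpy(image + rel[i].r_offset, &value, sizeof value);
	}
}
```

With a RELATIVE entry at offset 8 and addend 0x40, apply_relative(image, 0x100000, ...) stores 0x100040 there: load address plus link-time value, exactly what you want for a binary linked at 0.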
The main stage
So, we have the first few stumbling blocks out of the way. We have
switched over to our own calling convention, are no longer reliant on
the Microsoft one, and have patched up our addresses. Time to jump into
the next stage of booting.
```c
EFI_STATUS
efi_main(EFI_HANDLE image, EFI_SYSTEM_TABLE *systab)
{
	extern char		*progname;
	EFI_LOADED_IMAGE	*imgp;
	EFI_DEVICE_PATH		*dp0 = NULL, *dp;
	EFI_STATUS		 status;
	EFI_PHYSICAL_ADDRESS	 stack;

	ST = systab;
	BS = ST->BootServices;
	RS = ST->RuntimeServices;
	IH = image;
```
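From the start, it looks a lot easier than the boot process of yore.
We can actually use C code, because there are actual EFI
structures we can rely on.
We start by stashing a few of those in short global names: ST for the
system table, BS for the boot services, RS for the runtime services
and IH for our image handle.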
```c
	/* disable reset by watchdog after 5 minutes */
	BS->SetWatchdogTimer(0, 0, 0, NULL);

	efi_video_init();
	efi_heap_init();
```
The first part explains itself: the firmware arms a watchdog timer
before starting us, and we don't wanna be reset by it while we are
still busy loading. Then we go ahead and grab us some screen time
to print on and prepare some memory to work with.
We will explore these two functions later. First we will dive into the boot itself.
```c
	status = BS->HandleProtocol(image, &imgp_guid, (void **)&imgp);
	if (status == EFI_SUCCESS)
		status = BS->HandleProtocol(imgp->DeviceHandle, &devp_guid,
		    (void **)&dp0);
	if (status == EFI_SUCCESS) {
		for (dp = dp0; !IsDevicePathEnd(dp);
		    dp = NextDevicePathNode(dp)) {
			if (DevicePathType(dp) == MEDIA_DEVICE_PATH &&
			    (DevicePathSubType(dp) == MEDIA_HARDDRIVE_DP ||
			    DevicePathSubType(dp) == MEDIA_CDROM_DP)) {
				bios_bootdev =
				    (DevicePathSubType(dp) == MEDIA_CDROM_DP)
				    ? 0x1e0 : 0x80;
				efi_bootdp = dp0;
				break;
			} else if (DevicePathType(dp) == MESSAGING_DEVICE_PATH &&
			    DevicePathSubType(dp) == MSG_MAC_ADDR_DP) {
				bios_bootdev = 0x0;
				efi_bootdp = dp0;
				break;
			}
		}
	}
```
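It looks long, but all it does is figure out what kind of device we
were loaded from. It asks UEFI for our own loaded image, then for the
device path of the device that image came from, and walks that path
node by node. A hard drive or CD-ROM node sets bios_bootdev to 0x80 or
0x1e0 respectively; a MAC address node means we were booted over the
network, and bios_bootdev becomes 0. In all three cases, the device
path is remembered in efi_bootdp.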
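Device paths are just a packed list of variable-length nodes, which makes the walk easy to model. Here is a simplified C sketch over a raw byte buffer instead of the firmware's EFI_DEVICE_PATH structs and the IsDevicePathEnd/NextDevicePathNode macros. The constants are the UEFI spec values as I understand them, and classify_bootdev is a hypothetical name. It returns the same values efi_main stores in bios_bootdev (0x80 happens to be the classic BIOS number for the first hard disk).

```c
#include <stddef.h>
#include <stdint.h>

/* Device path node types/subtypes from the UEFI spec (subset). */
#define MESSAGING_DEVICE_PATH	0x03
#define  MSG_MAC_ADDR_DP	0x0b
#define MEDIA_DEVICE_PATH	0x04
#define  MEDIA_HARDDRIVE_DP	0x01
#define  MEDIA_CDROM_DP		0x02
#define END_DEVICE_PATH_TYPE	0x7f
#define  END_ENTIRE_DP		0xff

/*
 * Every node starts with a 4-byte header: Type, SubType and a 16-bit
 * little-endian Length covering the whole node, header included.
 */
static unsigned node_len(const uint8_t *n) { return n[2] | (n[3] << 8); }

/*
 * Walk a device path and classify the boot device: 0x80 for a hard
 * drive, 0x1e0 for a CD-ROM, 0x0 for a network (MAC address) boot.
 * Returns -1 if the path ends without a matching node.
 */
int classify_bootdev(const uint8_t *dp)
{
	for (;;) {
		uint8_t type = dp[0], sub = dp[1];

		if (type == END_DEVICE_PATH_TYPE && sub == END_ENTIRE_DP)
			return -1;
		if (type == MEDIA_DEVICE_PATH &&
		    (sub == MEDIA_HARDDRIVE_DP || sub == MEDIA_CDROM_DP))
			return sub == MEDIA_CDROM_DP ? 0x1e0 : 0x80;
		if (type == MESSAGING_DEVICE_PATH && sub == MSG_MAC_ADDR_DP)
			return 0x0;
		if (node_len(dp) < 4)	/* malformed node: don't loop forever */
			return -1;
		dp += node_len(dp);
	}
}
```

On a disk boot, the path typically starts with hardware nodes (PCI and friends), so the walk skips past them and stops at the hard drive node.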
```c
#ifdef __amd64__
	/* allocate run_i386_start() on heap */
	if ((run_i386 = alloc(run_i386_size)) == NULL)
		panic("alloc() failed");
	memcpy(run_i386, run_i386_start, run_i386_size);
#endif
```
As far as I could find, run_i386_start is:
run_i386(_start) is to call the loaded kernel's start() with 32bit segment mode from x64 mode. %rdi == loaded start address, %rsi == kernel start address
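But I am not too versed in segment mode and all that. The gist, going by that comment: the loader runs in 64-bit mode, but the kernel's start() wants to be entered in 32-bit mode, so this small trampoline is copied onto the heap to make that switch right before jumping into the kernel.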
```c
#ifdef __amd64__
	progname = "BOOTX64";
#else
	progname = "BOOTIA32";
#endif
```
This is really just setting the name of the executable. Moving on.
```c
	/*
	 * Move the stack before calling boot().  UEFI on some machines
	 * locate the stack on our kernel load address.
	 */
	stack = heap + heapsiz;
#if defined(__amd64__)
	asm("movq %0, %%rsp;"
	    "mov %1, %%edi;"
	    "call boot;"
	    :: "r"(stack - 32), "r"(bios_bootdev));
#else
	asm("movl %0, %%esp;"
	    "movl %1, (%%esp);"
	    "call boot;"
	    :: "r"(stack - 32), "r"(bios_bootdev));
#endif
	/* must not reach here */
	return (EFI_SUCCESS);
}
```
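As the comment states, we need to move the stack before calling boot: some firmware places our initial stack right where the kernel will be loaded, so loading the kernel would trash the very stack we are running on. Instead, we point the stack pointer 32 bytes below the end of the heap we allocated ourselves (heap + heapsiz).
Then we call boot directly from inline assembly, passing bios_bootdev in edi, the first SysV argument register (on i386 it goes on the stack instead). boot never returns, hence the "must not reach here" comment.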
Boot it is
Here it is: the one and only boot function. For the most part, it
tries to be architecture-independent, as we are done with (most of) the
architecture-specific parts of our boot process. There is one small
exception at the beginning, though.
```c
void
boot(dev_t bootdev)
{
	int fd, isupgrade = 0;
	int try = 0, st;
	uint64_t marks[MARK_MAX];

	machdep();

	snprintf(prog_ident, sizeof(prog_ident),
	    ">> OpenBSD/" MACHINE " %s %s", progname, version);
	printf("%s\n", prog_ident);
```
See that machdep() there? It is responsible for setting up
our framebuffer and other devices. The code depends on the
architecture and on which boot method we used. Going into it
would exceed the scope of this post, but we might take a peek at some point.
```c
	devboot(bootdev, cmd.bootdev);
	strlcpy(cmd.image, kernelfile, sizeof(cmd.image));
	cmd.boothowto = 0;
	cmd.conf = "/etc/boot.conf";
	cmd.timeout = boottimeout;
```
Here we are just preparing the cmd structure used later: the boot
device, the default kernel image, the config file and the timeout.
devboot just translates the boot device into its string representation
(something like hd0a).
```c
	if (upgrade()) {
		strlcpy(cmd.image, "/bsd.upgrade", sizeof(cmd.image));
		printf("upgrade detected: switching to %s\n", cmd.image);
		isupgrade = 1;
	}
```
If we detect an upgrade, we change the image source to point to
/bsd.upgrade.
Quick interlude for upgrade
```c
int
upgrade(void)
{
	struct stat sb;

	if (stat(qualify(("/bsd.upgrade")), &sb) < 0)
		return 0;
	if ((sb.st_mode & S_IXUSR) == 0) {
		printf("/bsd.upgrade is not u+x\n");
		return 0;
	}
	return 1;
}
```
This is all it does. It checks whether /bsd.upgrade exists and is
executable by its owner (u+x). If so, there is an upgrade waiting for us.
Back to the boot code:
```c
	st = read_conf();
```
This reads and applies any commands specified in /etc/boot.conf.
```c
#ifdef HIBERNATE
	int bootdev_has_hibernate(void);

	if (bootdev_has_hibernate()) {
		strlcpy(cmd.image, "/bsd.booted", sizeof(cmd.image));
		printf("unhibernate detected: switching to %s\n", cmd.image);
		cmd.boothowto |= RB_UNHIBERNATE;
	}
#endif
```
If the loader was built with hibernation support AND the boot device
holds a hibernated system, we switch to /bsd.booted and set
RB_UNHIBERNATE so the kernel knows to resume.
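```c
	if (!bootprompt)
		snprintf(cmd.path, sizeof cmd.path, "%s:%s",
		    cmd.bootdev, cmd.image);

	while (1) {
		/* no boot.conf, or no boot cmd in there */
		if (bootprompt && st <= 0) {
			do {
				printf("boot> ");
			} while(!getcmd());
		}
```
Now we enter the main loop, which keeps going until a kernel loads successfully. If there is no boot.conf, or it contains no boot command, we drop to the familiar boot> prompt and wait for input.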
```c
		if (loadrandom(BOOTRANDOM, rnddata, sizeof(rnddata)) == 0)
			cmd.boothowto |= RB_GOODRANDOM;
#ifdef MDRANDOM
		if (mdrandom(rnddata, sizeof(rnddata)) == 0)
			cmd.boothowto |= RB_GOODRANDOM;
#endif
#ifdef FWRANDOM
		if (fwrandom(rnddata, sizeof(rnddata)) == 0)
			cmd.boothowto |= RB_GOODRANDOM;
#endif
		rc4_keysetup(&randomctx, rnddata, sizeof rnddata);
		rc4_skip(&randomctx, 1536);
```
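Paranoia! But in a good way. Depending on which sources are compiled
in, we gather randomness from up to 3 different sources: a seed file
on disk (BOOTRANDOM), a machine-dependent source (MDRANDOM) and the
firmware (FWRANDOM). If any of them delivers, the RB_GOODRANDOM flag
tells the kernel it got a good seed.
This is both to give the kernel a good start for its RNG and
to initialize ASLR (Address Space Layout Randomization).
The collected bytes also key the boot loader's own RC4-based generator;
rc4_skip then throws away the first 1536 bytes of its output, since the
start of an RC4 keystream is known to be biased.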
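The RC4 part is easy to sketch as well. This is plain textbook RC4 with a keystream discard, to show what keying plus rc4_skip(&randomctx, 1536) amounts to; rc4_setup, rc4_byte and rc4_discard are my own names, and this is not OpenBSD's rc4.c. Discarding the start of the keystream is the classic mitigation (often called RC4-drop) for the known biases in RC4's first output bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* RC4 state: a permutation of 0..255 plus two indices. */
struct rc4 {
	uint8_t S[256];
	uint8_t i, j;
};

/* Key-scheduling algorithm (KSA). */
void rc4_setup(struct rc4 *c, const uint8_t *key, size_t keylen)
{
	uint8_t j = 0, t;

	for (int k = 0; k < 256; k++)
		c->S[k] = (uint8_t)k;
	for (int k = 0; k < 256; k++) {
		j = (uint8_t)(j + c->S[k] + key[k % keylen]);
		t = c->S[k]; c->S[k] = c->S[j]; c->S[j] = t;
	}
	c->i = c->j = 0;
}

/* Pseudo-random generation: one keystream byte per call. */
uint8_t rc4_byte(struct rc4 *c)
{
	uint8_t t;

	c->i++;					/* wraps at 256 */
	c->j = (uint8_t)(c->j + c->S[c->i]);
	t = c->S[c->i]; c->S[c->i] = c->S[c->j]; c->S[c->j] = t;
	return c->S[(uint8_t)(c->S[c->i] + c->S[c->j])];
}

/* Throw away the first n keystream bytes, like rc4_skip(&ctx, 1536). */
void rc4_discard(struct rc4 *c, size_t n)
{
	while (n--)
		(void)rc4_byte(c);
}
```

With the well-known test key "Key", the first keystream bytes are 0xEB, 0x9F, 0x77, a handy check that the permutation logic is right.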
```c
		st = 0;
		bootprompt = 1;	/* allow reselect should we fail */

		printf("booting %s: ", cmd.path);
		marks[MARK_START] = 0;
		if ((fd = loadfile(cmd.path, marks, LOAD_ALL)) != -1) {

			/* Prevent re-upgrade: chmod a-x bsd.upgrade */
			if (isupgrade) {
				struct stat st;

				if (fstat(fd, &st) == 0) {
					st.st_mode &= ~(S_IXUSR|S_IXGRP|S_IXOTH);
					if (fchmod(fd, st.st_mode) == -1)
						printf("fchmod a-x %s: failed\n",
						    cmd.path);
				}
			}
			close(fd);
			break;
		}
```
Now we actually load the kernel with loadfile. It reads the
file, places its code and data in memory, and records in marks where
everything ended up (entry point, symbols, end of the image, etc.).
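The interesting part in here is the if (isupgrade) block. If we are
doing an upgrade and /bsd.upgrade loaded successfully, we strip all
execute bits from the file (chmod a-x). That way, if the upgrade fails
and the machine reboots, upgrade() no longer finds an executable
/bsd.upgrade, so the loader does not try to upgrade again and falls
back to the regular kernel.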
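The whole one-shot upgrade trick fits in two small functions. This is a host-side POSIX sketch, not boot loader code: it uses the normal stat(2) and chmod(2) on a path instead of the loader's own stat/fstat/fchmod on an open file, and upgrade_pending and disarm_upgrade are hypothetical names.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Same test as upgrade(): the file must exist and be u+x. */
int upgrade_pending(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) < 0)
		return 0;
	return (sb.st_mode & S_IXUSR) != 0;
}

/*
 * "chmod a-x" after a successful load, so the next boot no longer sees
 * a pending upgrade. Only permission bits are passed on to chmod().
 */
int disarm_upgrade(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) < 0 ||
	    chmod(path, (sb.st_mode & 07777) &
	    ~(S_IXUSR | S_IXGRP | S_IXOTH)) == -1) {
		printf("chmod a-x %s: failed\n", path);
		return -1;
	}
	return 0;
}
```

Run upgrade_pending on a freshly created 0755 file and it returns 1; after disarm_upgrade it returns 0, which is exactly what the next boot sees.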
```c
		kernelfile = KERNEL;
		try++;
		strlcpy(cmd.image, kernelfile, sizeof(cmd.image));
		printf(" failed(%d). will try %s\n", errno, kernelfile);

		if (try < 2) {
			if (cmd.timeout > 0)
				cmd.timeout++;
		} else {
			if (cmd.timeout)
				printf("Turning timeout off.\n");
			cmd.timeout = 0;
		}
	}
```
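If loading failed, we fall back to KERNEL (usually /bsd) and
try again, incrementing a counter to keep track of how many
times we tried. On the first failure, the prompt timeout is bumped by
one; from the second failure on, the timeout is switched off entirely,
so the loader waits at the boot> prompt instead of retrying on its own.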
```c
	/* exec */
	run_loadfile(marks, cmd.boothowto);
}
```
And voila, with everything loaded, we actually run the kernel and jump
into it. run_loadfile will set up all arguments for the kernel and
then call ExitBootServices() to tell UEFI we are done and taking
over control.
Conclusion
Well, there it is…the mythical boot process. Sure, there is a lot more code overall, but the core gist of it is right here.
At this point, we are ready to actually run the kernel. And surprise: that is the next thing we will look at.