Daily Source Reading: Boot Process - Stage 1, EFI edition [OpenBSD]

This seems to be a recurring theme. Yes, we will look at OpenBSD first. I took a look at FreeBSD also, but the boot process there is a bit more involved due to SYSINIT. We will get to it, but for getting our feet wet, we will start with OpenBSD.

Disclaimer

I have been racking my brain about where to start. Do we start with the kernel main function and work our way backwards to the bootloader? Or do we start on the ground floor? In the end, I think it makes more sense to start with the bootloader.

For brevity's sake, we will only look at the amd64 architecture, and only at UEFI booting instead of the classic old MBR.

I will also very likely get a lot of this wrong. I am not the most knowledgeable about UEFI. My last experience writing an OS was with good old BIOS boot. So, some of the very UEFI specific parts I might skip over.

Even the EFI boot process involves some assembly, though. Apart from the small entry stub, we will not go into that at all for now. It would take a lot more time and also currently exceeds my daily bandwidth by a truckload.

With that out of the way, here goes nothing. Let's dive in and understand the boot process.

An OS to boot

From what I gathered about EFI, you hand it a binary and the firmware jumps to that binary's entry point, which in OpenBSD's case ends up calling efi_main. That is where we start off. Its only goal is to get our boot loader going and kick shit off. So, let's start there.

We power up our computer, and then what happens? In the times of yore, the BIOS would look at all bootable devices in the order specified in its settings, check the first sector of each for a boot sector and try to load that (very watered-down explanation here).

Nowadays, most computers use UEFI, which is its own little system in a way. In short, it looks for FAT-formatted EFI System Partitions and, unless a boot entry is stored in its NVRAM settings, falls back to a default path such as /efi/boot/bootx64.efi. This is our first bootloader.

There is a caveat though. The bootloader is expected to be a PE (i.e. Windows-style) executable, and UEFI calls it using Microsoft's x64 calling convention, which differs from the System V (SysV) convention that BSD uses.

Calling conventions

Without going into too much detail, a calling convention is an agreement on how functions pass arguments (and return values) at the assembly level. The Microsoft x64 calling convention that UEFI uses, for example, passes the first four integer or pointer arguments to a function in the CPU registers RCX, RDX, R8 and R9, in that order.

SysV, on the other hand, uses RDI, RSI, RDX, RCX, R8 and R9 for the first six, in that order.

We want to write most of our code in SysV style, and our toolchains are laid out for that. But UEFI expects PE. What do we do?

Cast the magic spell from hither to yonder

    .text
    .align 4

    .globl _start
_start:
    subq $8, %rsp

First we adjust the stack pointer (rsp) so that it is 16-byte aligned. That makes life easier later, because SysV expects rsp to be 16-byte aligned at every call instruction.

We only adjust it by 8 bytes, because the UEFI firmware calls us: the stack was 16-byte aligned before that call, and the call itself already put an 8-byte return address on the stack.

    pushq %rcx
    pushq %rdx

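Now that our stack is properly aligned, we take the RCX and RDX registers and push them onto the stack. Remember, that is where the Microsoft convention passes the first two arguments, which for efi_main are ImageHandle and SystemTable. So effectively we are just saving the first two arguments on the stack. Two 8-byte pushes also keep the stack 16-byte aligned.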

0:
    lea ImageBase(%rip), %rdi
    lea _DYNAMIC(%rip), %rsi

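Now we load the addresses of ImageBase (the start of our image) and the _DYNAMIC section (the ELF dynamic section of our binary) into rdi and rsi respectively. Because lea with %rip is relative to the current instruction, this yields the addresses where the firmware actually loaded us, not where we were linked. We are preparing them as the first two arguments for an upcoming call.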

    popq %rcx
    popq %rdx
    pushq %rcx
    pushq %rdx

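Now we pop the two values we pushed earlier (the original arguments) back into rcx and rdx and immediately push them onto the stack again. It looks like we are just peeking at them, but there is a subtle twist: the stack is last-in, first-out, so the pops hand the values back in reverse order (rcx now holds SystemTable and rdx holds ImageHandle). Pushing them again therefore stores them swapped, with ImageHandle now on top. That is exactly the order the two popq instructions after the call will need.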

    call self_reloc

Now we are ready with all our arguments and go call self_reloc. We will look at that next, so bear with me for a bit.

    popq %rdi
    popq %rsi

With the call done and everything relocated, we pop the two saved arguments off the stack: ImageHandle into rdi and SystemTable into rsi, which is where SysV expects the first two arguments (remember…calling convention).

    call efi_main
    addq $8, %rsp

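We are ready to call efi_main, our actual boot loader. Once it returns, we are nice and restore the stack to how it was when we started; ret then hands efi_main's return value (in rax under both conventions) back to the firmware.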

.exit:
    ret

    /*
     * hand-craft a dummy .reloc section so EFI knows it's a relocatable
     * executable:
     */

    .data
    .section .reloc, "a"
    .long   0
    .long   10
    .word   0

As the comment states, this is just a dummy section so that UEFI is happy with us.

Relocation, relocation, relocation

/*
 * A simple elf relocator.
 */
void
self_reloc(Elf_Addr baseaddr, ElfW_Dyn *dynamic)
{
    Elf_Word relsz, relent;
    Elf_Addr *newaddr;
    ElfW_Rel *rel = NULL;
    ElfW_Dyn *dynp;

    /*
     * Find the relocation address, its size and the relocation entry.
     */
    relsz = 0;
    relent = 0;
    for (dynp = dynamic; dynp->d_tag != DT_NULL; dynp++) {
        switch (dynp->d_tag) {
        case DT_REL:
        case DT_RELA:
            rel = (ElfW_Rel *)(dynp->d_un.d_ptr + baseaddr);
            break;
        case DT_RELSZ:
        case DT_RELASZ:
            relsz = dynp->d_un.d_val;
            break;
        case DT_RELENT:
        case DT_RELAENT:
            relent = dynp->d_un.d_val;
            break;
        default:
            break;
        }
    }

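This walks the _DYNAMIC array until its DT_NULL terminator, looking for three pieces of information (accepting both the REL and the RELA flavor of each tag):

where the relocation table is (DT_REL/DT_RELA)
how big it is in bytes (DT_RELSZ/DT_RELASZ)
how big each entry is (DT_RELENT/DT_RELAENT)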

    /*
     * Perform the actual relocation. We rely on the object having been
     * linked at 0, so that the difference between the load and link
     * address is the same as the load address.
     */
    for (; relsz > 0; relsz -= relent) {
        switch (ELFW_R_TYPE(rel->r_info)) {
        case RELOC_TYPE_NONE:
            /* No relocation needs be performed. */
            break;

        case RELOC_TYPE_RELATIVE:
            newaddr = (Elf_Addr *)(rel->r_offset + baseaddr);
#ifdef ELF_RELA
            /* Addend relative to the base address. */
            *newaddr = baseaddr + rel->r_addend;
#else
            /* Address relative to the base address. */
            *newaddr += baseaddr;
#endif
            break;
        default:
            /* XXX: do we need other relocations ? */
            break;
        }
        rel = (ElfW_Rel *) ((caddr_t) rel + relent);
    }
}

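The comment explains most of it. For every RELATIVE entry, the slot at baseaddr + r_offset is set to baseaddr + r_addend (with RELA, as on amd64) or simply has baseaddr added to it (with REL). Because the image was linked at address 0, adding the load address is all a relative relocation needs. Any other relocation type is silently skipped.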

The main stage

So, we have the first few stumbling blocks out of the way. We have switched over to our own calling convention, no longer depend on the PE one, and have patched up our addresses. Time to jump into the next stage of booting.

EFI_STATUS
efi_main(EFI_HANDLE image, EFI_SYSTEM_TABLE *systab)
{
    extern char     *progname;
    EFI_LOADED_IMAGE    *imgp;
    EFI_DEVICE_PATH     *dp0 = NULL, *dp;
    EFI_STATUS       status;
    EFI_PHYSICAL_ADDRESS     stack;

    ST = systab;
    BS = ST->BootServices;
    RS = ST->RuntimeServices;
    IH = image;

From the start, this looks a lot easier than the boot process of yore. We can actually use C code, because there are real EFI structures we can rely on.

We start by stashing a few of those in short global handles: ST (the system table), BS (boot services), RS (runtime services) and IH (our image handle).

    /* disable reset by watchdog after 5 minutes */
    BS->SetWatchdogTimer(0, 0, 0, NULL);

    efi_video_init();
    efi_heap_init();

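The first part explains itself: before starting a boot application, UEFI firmware arms a five-minute watchdog that resets the machine unless it is disarmed or ExitBootServices() is called in time, and a timeout of 0 disables it. We don't want to be bothered by that. Then we go ahead and grab us some screen time to print on and prepare some memory to work within.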

These two functions we will explore later. First we will dive into the boot itself.

    status = BS->HandleProtocol(image, &imgp_guid, (void **)&imgp);
    if (status == EFI_SUCCESS)
        status = BS->HandleProtocol(imgp->DeviceHandle, &devp_guid,
            (void **)&dp0);
    if (status == EFI_SUCCESS) {
        for (dp = dp0; !IsDevicePathEnd(dp);
            dp = NextDevicePathNode(dp)) {
            if (DevicePathType(dp) == MEDIA_DEVICE_PATH &&
                (DevicePathSubType(dp) == MEDIA_HARDDRIVE_DP ||
                DevicePathSubType(dp) == MEDIA_CDROM_DP)) {
                bios_bootdev =
                    (DevicePathSubType(dp) == MEDIA_CDROM_DP)
                    ? 0x1e0 : 0x80;
                efi_bootdp = dp0;
                break;
            } else if (DevicePathType(dp) == MESSAGING_DEVICE_PATH &&
                DevicePathSubType(dp) == MSG_MAC_ADDR_DP) {
                bios_bootdev = 0x0;
                efi_bootdp = dp0;
                break;
            }
        }
    }

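It looks long, but all it does is ask UEFI which device our loader image was loaded from (the loaded-image protocol gives us its DeviceHandle), fetch that device's path, and walk the path node by node. If it finds a hard-drive or CD-ROM media node, or a MAC address node (a network boot), it records the kind of boot device in bios_bootdev using BIOS-style numbers (0x80 for the first hard disk, 0x1e0 for CD-ROM, 0x0 for network) and remembers the full path in efi_bootdp.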

#ifdef __amd64__
    /* allocate run_i386_start() on heap */
    if ((run_i386 = alloc(run_i386_size)) == NULL)
        panic("alloc() failed");
    memcpy(run_i386, run_i386_start, run_i386_size);
#endif

As far as I could find, run_i386_start is:

run_i386(_start) is to call the loaded kernel's start() with
32bit segment mode from x64 mode.
%rdi == loaded start address, %rsi == kernel start address

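I am not too versed in segment mode and all that, but the gist seems to be this: the amd64 kernel's start() expects to be entered in 32-bit mode, so this small trampoline switches the CPU out of 64-bit long mode before jumping to the kernel. It is copied into a heap allocation up front, presumably so that it sits somewhere the kernel image will not be loaded over.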

#ifdef __amd64__
    progname = "BOOTX64";
#else
    progname = "BOOTIA32";
#endif

This is really just setting the name of the executable. Moving on.

    /*
     * Move the stack before calling boot().  UEFI on some machines
     * locate the stack on our kernel load address.
     */
    stack = heap + heapsiz;
#if defined(__amd64__)
    asm("movq   %0, %%rsp;"
        "mov    %1, %%edi;"
        "call   boot;"
        :: "r"(stack - 32), "r"(bios_bootdev));
#else
    asm("movl   %0, %%esp;"
        "movl   %1, (%%esp);"
        "call   boot;"
        :: "r"(stack - 32), "r"(bios_bootdev));
#endif
    /* must not reach here */
    return (EFI_SUCCESS);
}

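As the comment states, some UEFI implementations place the stack right where our kernel is going to be loaded, so loading the kernel would overwrite the very stack we are running on. We therefore switch to a fresh stack at the top of our own heap (heap + heapsiz, since the stack grows downwards, minus 32 bytes of slack) and pass bios_bootdev as the first argument to boot(): in edi on amd64, on the stack on i386.

Once we are done with that, we can call boot. It never returns, hence the "must not reach here".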

Boot it is

Here it is. The one and only boot function. For the most part, it tries to be architecture independent, as we are done with (most) architecture specific stuff of our boot process. There is one small exception though at the beginning.

void
boot(dev_t bootdev)
{
    int fd, isupgrade = 0;
    int try = 0, st;
    uint64_t marks[MARK_MAX];

    machdep();

    snprintf(prog_ident, sizeof(prog_ident),
        ">> OpenBSD/" MACHINE " %s %s", progname, version);
    printf("%s\n", prog_ident);

See that machdep() there? It is responsible for setting up our framebuffer and other devices. The code depends on the architecture and on the boot method we used. Going into it would exceed the scope of this post, but we might take a peek at some point.

    devboot(bootdev, cmd.bootdev);
    strlcpy(cmd.image, kernelfile, sizeof(cmd.image));
    cmd.boothowto = 0;
    cmd.conf = "/etc/boot.conf";
    cmd.timeout = boottimeout;

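Here we are just preparing the cmd structure used later: boot device, kernel image name, boot flags, config file and prompt timeout. devboot just translates the boot device number into its string representation (something like hd0a).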

    if (upgrade()) {
        strlcpy(cmd.image, "/bsd.upgrade", sizeof(cmd.image));
        printf("upgrade detected: switching to %s\n", cmd.image);
        isupgrade = 1;
    }

If we detect an upgrade, we change the image source to point to /bsd.upgrade.

Quick interlude for `upgrade`

int
upgrade(void)
{
    struct stat sb;

    if (stat(qualify(("/bsd.upgrade")), &sb) < 0)
        return 0;
    if ((sb.st_mode & S_IXUSR) == 0) {
        printf("/bsd.upgrade is not u+x\n");
        return 0;
    }
    return 1;
}

This is all it does. It checks whether /bsd.upgrade exists on the boot device (qualify() prefixes the device name) and whether it is executable by its owner (u+x). If both hold, there is an upgrade waiting for us.

Back to the boot code:

    st = read_conf();

This reads and applies any commands specified in /etc/boot.conf.

#ifdef HIBERNATE
    int bootdev_has_hibernate(void);

    if (bootdev_has_hibernate()) {
        strlcpy(cmd.image, "/bsd.booted", sizeof(cmd.image));
        printf("unhibernate detected: switching to %s\n", cmd.image);
        cmd.boothowto |= RB_UNHIBERNATE;
    }
#endif

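If the boot disk holds a hibernation image to resume from, we load /bsd.booted instead (as far as I can tell, the copy of the kernel that was running when the machine hibernated) and set RB_UNHIBERNATE so the kernel knows to resume.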

    if (!bootprompt)
        snprintf(cmd.path, sizeof cmd.path, "%s:%s",
            cmd.bootdev, cmd.image);

    while (1) {
        /* no boot.conf, or no boot cmd in there */
        if (bootprompt && st <= 0) {
            do {
                printf("boot> ");
            } while(!getcmd());
        }

        if (loadrandom(BOOTRANDOM, rnddata, sizeof(rnddata)) == 0)
            cmd.boothowto |= RB_GOODRANDOM;
#ifdef MDRANDOM
        if (mdrandom(rnddata, sizeof(rnddata)) == 0)
            cmd.boothowto |= RB_GOODRANDOM;
#endif
#ifdef FWRANDOM
        if (fwrandom(rnddata, sizeof(rnddata)) == 0)
            cmd.boothowto |= RB_GOODRANDOM;
#endif
        rc4_keysetup(&randomctx, rnddata, sizeof rnddata);
        rc4_skip(&randomctx, 1536);

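Paranoia! But in a good way. Depending on which sources are compiled in, we gather randomness from up to three places: a seed file on disk (loadrandom with BOOTRANDOM), a machine-dependent source (mdrandom) and the firmware (fwrandom). If any of them succeeds, RB_GOODRANDOM tells the kernel that it is getting a well-seeded start for its RNG. The collected bytes also key an RC4 stream for the boot loader's own use, with the first 1536 bytes of keystream thrown away because early RC4 output is known to be biased.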

        st = 0;
        bootprompt = 1; /* allow reselect should we fail */

        printf("booting %s: ", cmd.path);
        marks[MARK_START] = 0;
        if ((fd = loadfile(cmd.path, marks, LOAD_ALL)) != -1) {

            /* Prevent re-upgrade: chmod a-x bsd.upgrade */
            if (isupgrade) {
                struct stat st;

                if (fstat(fd, &st) == 0) {
                    st.st_mode &= ~(S_IXUSR|S_IXGRP|S_IXOTH);
                    if (fchmod(fd, st.st_mode) == -1)
                        printf("fchmod a-x %s: failed\n",
                            cmd.path);
                }
            }
            close(fd);
            break;
        }

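Now we actually load the kernel with loadfile. It reads the ELF file, copies its segments into place and records the interesting addresses (entry point, start and end of the loaded image, symbol table and so on) in marks, so that we know where everything ended up.

One detail in here is the if (isupgrade) block. If we are booting an upgrade, we remove the executable bits from /bsd.upgrade right after loading it. Should the upgrade then fail and the machine reboot, upgrade() no longer finds an executable /bsd.upgrade, so we boot the regular kernel instead of looping into the upgrade again.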

        kernelfile = KERNEL;
        try++;
        strlcpy(cmd.image, kernelfile, sizeof(cmd.image));
        printf(" failed(%d). will try %s\n", errno, kernelfile);

        if (try < 2) {
            if (cmd.timeout > 0)
                cmd.timeout++;
        } else {
            if (cmd.timeout)
                printf("Turning timeout off.\n");
            cmd.timeout = 0;
        }
    }

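If loading failed, we fall back to the default KERNEL (usually /bsd), count the attempt, and go around the loop again. Since bootprompt is now set and st is 0, the next round drops us at the boot> prompt. After the first failure a positive timeout is bumped by one; from the second failure on, the timeout is switched off entirely, so the prompt waits for us instead of timing out.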

    /* exec */
    run_loadfile(marks, cmd.boothowto);
}

And voila, with everything loaded, we actually run the kernel and jump into it. run_loadfile will set up all arguments for the kernel and then call ExitBootServices() to tell UEFI we are done and taking over control.

Conclusion

Well, there it is…the mythical boot process. Sure, the overall code is a lot more, but the core gist of it is right here.

At this point, we are ready to actually run the kernel. And surprise: that is the next thing we will look at.