Daily Source Reading: `ed` [Part 1 - Overview]

`ed`

Time for something bigger. Not insanely large, but definitely more than an hour or two to read through. That is why I am going to split this up into several parts.

As has been a running theme for the first few days, we will start by looking at the OpenBSD version. I expect the code to be a bit easier to read. I don't expect the FreeBSD version to differ much, but we will see.

This time around I will definitely not go through each line; that would be too much.

We will look at the overall structure and grab a few interesting bits to look at.

OpenBSD

Signal setup

The first setup regarding signals we see is:

if (isatty(STDIN_FILENO)) {
    handle_winch(SIGWINCH);
    signal(SIGWINCH, handle_winch);
}

If we search for it, we will find at the end of main.c the following implementation:

void
handle_winch(int signo)
{
    int save_errno = errno;
    struct winsize ws;      /* window size structure */

    if (ioctl(STDIN_FILENO, TIOCGWINSZ, &ws) == 0) {
        if (ws.ws_row > 2)
            rows = ws.ws_row - 2;
        if (ws.ws_col > 8)
            cols = ws.ws_col - 8;
    }
    errno = save_errno;
}

So, handle_winch is first called directly to pick up the current window size, and is then installed as the SIGWINCH handler so that any later terminal size change updates rows and cols as well. This is needed for proper printing and line breaks.
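To make the clamping easier to follow, it can be pulled apart from the ioctl call. This is only a minimal sketch, not the OpenBSD code: apply_winsize and query_winsize are hypothetical names, and the margins of 2 rows and 8 columns are simply copied from the handler above.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: derive the usable rows/cols from a window size.
 * Values are left untouched when the terminal is too small. */
void
apply_winsize(const struct winsize *ws, int *rows, int *cols)
{
    assert(ws != NULL && rows != NULL && cols != NULL);
    if (ws->ws_row > 2)
        *rows = ws->ws_row - 2;
    if (ws->ws_col > 8)
        *cols = ws->ws_col - 8;
}

/* Hypothetical helper: ask the terminal behind fd for its size.
 * Returns 0 on success, -1 if the ioctl fails (e.g. fd is not a tty). */
int
query_winsize(int fd, int *rows, int *cols)
{
    struct winsize ws;

    if (ioctl(fd, TIOCGWINSZ, &ws) != 0)
        return -1;
    apply_winsize(&ws, rows, cols);
    return 0;
}
```

Keeping the previous values when the window is tiny means the printing code never has to deal with zero or negative dimensions.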

signal(SIGHUP, signal_hup);
siginterrupt(SIGHUP, 1);
signal(SIGQUIT, SIG_IGN);
signal(SIGINT, signal_int);

Here, we set up a few signal handlers. For example, we opt to ignore the SIGQUIT (CTRL-\) signal. At the end we install signal_int as the handler for SIGINT (CTRL-C).
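The siginterrupt(SIGHUP, 1) call is easy to skim over. On BSD systems signal() installs handlers with restart semantics, and siginterrupt switches that off for one signal so that a blocking read returns with EINTR instead of silently resuming. A rough equivalent with sigaction might look like this; install_handler, note_signal and got_signal are hypothetical names for illustration, not something ed defines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: install fn for signo. With restart == 0, blocking
 * system calls fail with EINTR when the signal arrives, which is what
 * siginterrupt(signo, 1) asks for. Returns 0 on success, -1 on error. */
int
install_handler(int signo, void (*fn)(int), int restart)
{
    struct sigaction sa;

    assert(fn != SIG_ERR);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fn;             /* may also be SIG_IGN or SIG_DFL */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}

static volatile sig_atomic_t got_signal;

/* Example handler: only records which signal arrived. */
void
note_signal(int signo)
{
    got_signal = signo;
}
```

With that helper, the four lines above would roughly become install_handler(SIGHUP, signal_hup, 0), install_handler(SIGQUIT, SIG_IGN, 1) and install_handler(SIGINT, signal_int, 1).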

Lastly, there is

if (sigsetjmp(env, 1)) {
    status = -1;
    fputs("\n?\n", stdout);
    seterrmsg("interrupt");
} else {

Here, we save our current context, and should we ever get hit with an interrupt (for example CTRL-C), we jump back to this point.

This happens in handle_int:

void
handle_int(int signo)
{
    if (!sigactive)
        _exit(1);
    sigint = 0;
    siglongjmp(env, -1);
}

called from signal_int:

void
signal_int(int signo)
{
    if (mutex)
        sigint = 1;
    else
        handle_int(signo);  /* XXX quite unsafe */
}

I do not know the exact details, but my best guess is that if we accidentally kick off an expensive regexp to substitute things in a large file, we want to be able to kill it, but stay within ed and continue editing.

I do wonder, though, what the XXX quite unsafe comment is about; I currently have no idea why or how it is unsafe.
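The jump-back mechanism is easier to see in isolation. The following is a minimal sketch of the same idea, not ed's code: run_interruptible, on_int, slow_work and quick_work are hypothetical names, and slow_work simply raises SIGINT on itself to stand in for a user pressing CTRL-C during a long command.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf env;                  /* recovery point, like ed's env */
static volatile sig_atomic_t active;    /* nonzero once env is valid */

static void
on_int(int signo)
{
    (void)signo;
    if (!active)
        _exit(1);           /* nothing to jump back to yet */
    siglongjmp(env, 1);
}

/* Hypothetical driver: runs work() and returns 0 if it finished,
 * or -1 if SIGINT cut it short. */
int
run_interruptible(void (*work)(void))
{
    struct sigaction sa;
    int rc;

    assert(work != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_int;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGINT, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    if (sigsetjmp(env, 1)) {    /* nonzero: we arrived via siglongjmp */
        active = 0;
        return -1;
    }
    active = 1;
    work();
    active = 0;
    return 0;
}

/* Stand-ins for a long-running and a short command. */
void slow_work(void) { raise(SIGINT); }
void quick_work(void) { }
```

The second argument of 1 to sigsetjmp saves the signal mask, so SIGINT is deliverable again after the jump and the next command can be interrupted too.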
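One plausible reading of that comment (an assumption, the source does not elaborate): siglongjmp abandons whatever the program was doing when the signal arrived. If that happened to be inside malloc or a stdio call, neither of which is async-signal-safe, their internal state could be left half-updated. The usual conservative alternative is a handler that only sets a flag, which the program then polls at safe points, much like the mutex branch of signal_int. A minimal sketch with hypothetical names (flag_int, install_flag_handler, do_steps):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t interrupted;

/* Handler that does the bare minimum: remember that SIGINT arrived. */
void
flag_int(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* Hypothetical helper: returns 0 on success, -1 on error. */
int
install_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = flag_int;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical worker: runs up to n steps and polls the flag between
 * steps. interrupt_at simulates CTRL-C arriving during that step
 * (pass -1 for no interrupt). Returns the number of completed steps. */
int
do_steps(int n, int interrupt_at)
{
    int done = 0;

    assert(n >= 0);
    interrupted = 0;
    while (done < n && !interrupted) {
        if (done == interrupt_at)
            raise(SIGINT);  /* the flag is set, nothing is abandoned */
        done++;             /* the current step still finishes cleanly */
    }
    return done;
}
```

The trade-off is responsiveness: the work only stops at the next poll, whereas the siglongjmp approach stops immediately.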

Main loop

Now that the initial setup is done, we hop into the main loop. In BSD style, this is usually done with a bare for loop:

for (;;) {
}

The early parts are straightforward. Print an error message if the status indicates an error and the garrulous flag is set. (I actually had to look that word up. Interesting choice for a variable name; verbose didn't make the cut?)

Emit the custom prompt if one was set.

Then we read the next input via get_tty_line. This function does some escape handling and finally returns the number of characters read.

if ((status = extract_addr_range()) >= 0 &&
    (status = exec_command()) >= 0)
    if (!status || (status &&
        (status = display_lines(current_addr, current_addr,
        status)) >= 0))
        continue;

Finally, we try to extract the address range and apply the command we entered. If everything looks good, we display lines if needed and then continue to the next round of the loop.

The rest of the loop is mostly just error handling and printing warnings.
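Stripped of all detail, the shape of the loop is: read a line, run it, continue on success, print ? on failure. A toy version of that shape, with entirely made-up names and status codes (toy_exec, toy_loop, TOY_OK and friends are not from ed):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { TOY_OK = 0, TOY_ERR = -1, TOY_QUIT = -2 };

/* Toy command set: 'p' succeeds, 'q' quits, anything else is an error. */
static int
toy_exec(const char *line)
{
    switch (line[0]) {
    case 'p':
        return TOY_OK;
    case 'q':
        return TOY_QUIT;
    default:
        return TOY_ERR;
    }
}

/* Reads commands from in until EOF or 'q', writes "?" to out for each
 * failed command, and returns the number of failures. */
int
toy_loop(FILE *in, FILE *out)
{
    char line[256];
    int status, errors = 0;

    assert(in != NULL && out != NULL);
    for (;;) {
        if (fgets(line, sizeof(line), in) == NULL)
            break;
        status = toy_exec(line);
        if (status == TOY_QUIT)
            break;
        if (status >= 0)
            continue;       /* happy path: straight to the next command */
        fputs("?\n", out);
        errors++;
    }
    return errors;
}
```

As in ed, success takes the short way round via continue, and everything below it in the loop body is error reporting.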

/* extract_addr_range: get line addresses from the command buffer until an
   illegal address is seen; return status */
int
extract_addr_range(void)
{
    int addr;

    addr_cnt = 0;
    first_addr = second_addr = current_addr;
    while ((addr = next_addr()) >= 0) {
        addr_cnt++;
        first_addr = second_addr;
        second_addr = addr;
        if (*ibufp != ',' && *ibufp != ';')
            break;
        else if (*ibufp++ == ';')
            current_addr = addr;
    }
    if ((addr_cnt = min(addr_cnt, 2)) == 1 || second_addr != addr)
        first_addr = second_addr;
    return (addr == ERR) ? ERR : 0;
}

It looks a bit convoluted at first glance, but it's actually not that bad. next_addr is a tad long, but it is just straightforward parsing of address ranges, handling things like %, ,, ; and other special address forms.

The main part is obviously exec_command.

In there, we just look at the command character (and possibly the characters after it) and apply the appropriate actions. Let's look at a few:
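To see how first_addr and second_addr slide along as addresses are read, here is a cut-down sketch that only understands plain line numbers separated by , or ;. The name toy_range is hypothetical and all of next_addr's special forms (%, $, searches, offsets) and its error handling are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical, simplified range parser. Fills *first and *second,
 * updates *current after a ';', and returns how many addresses were
 * seen, capped at 2. */
int
toy_range(const char *s, int *current, int *first, int *second)
{
    int count = 0;

    assert(s != NULL && current != NULL && first != NULL && second != NULL);
    *first = *second = *current;
    while (isdigit((unsigned char)*s)) {
        char *end;
        int addr = (int)strtol(s, &end, 10);

        s = end;
        count++;
        *first = *second;       /* the previous address becomes the start */
        *second = addr;         /* the newest address becomes the end */
        if (*s != ',' && *s != ';')
            break;
        if (*s++ == ';')
            *current = addr;    /* ';' also moves the current line */
    }
    if (count == 1)
        *first = *second;       /* a single address is a one-line range */
    return count > 2 ? 2 : count;
}
```

With the current line at 1, the input 3,7 yields the range 3 to 7, a lone 5 yields 5 to 5, and 1,2,3 keeps only the last two addresses.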

switch ((c = (unsigned char)*ibufp++)) {
case 'a':
    GET_COMMAND_SUFFIX();
    if (!isglobal) clear_undo_stack();
    if (append_lines(second_addr) < 0)
        return ERR;
    break;
case 'c':
    if (check_addr_range(current_addr, current_addr) < 0)
        return ERR;
    GET_COMMAND_SUFFIX();
    if (!isglobal) clear_undo_stack();
    if (delete_lines(first_addr, second_addr) < 0 ||
        append_lines(current_addr) < 0)
        return ERR;
    break;
case 'd':
    if (check_addr_range(current_addr, current_addr) < 0)
        return ERR;
    GET_COMMAND_SUFFIX();
    if (!isglobal) clear_undo_stack();
    if (delete_lines(first_addr, second_addr) < 0)
        return ERR;
    else if ((addr = INC_MOD(current_addr, addr_last)) != 0)
        current_addr = addr;
    break;

It's nothing fancy or complicated, just straight up applying deletes, appends and other operations.
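The INC_MOD step in the d case decides where the current line ends up after a delete. Here is a toy model of that step on a fixed array of strings. It rests on two assumptions that this post has not verified against the source: that INC_MOD returns the next address or 0 when it would run past the last line, and that delete_lines leaves current_addr just before the deleted range. All names (toy_buf, toy_delete, TOY_INC_MOD) are made up.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 16

struct toy_buf {
    const char *line[TOY_MAX + 1];  /* 1-based; slot 0 is unused */
    int last;                       /* address of the last line */
    int current;                    /* current address */
};

/* Assumed behaviour of INC_MOD: successor of l, or 0 past the last line k. */
#define TOY_INC_MOD(l, k) ((l) + 1 > (k) ? 0 : (l) + 1)

/* Delete lines from..to (inclusive) and pick the new current line the
 * way the 'd' case does. Returns 0, or -1 for an invalid range. */
int
toy_delete(struct toy_buf *b, int from, int to)
{
    int addr;

    assert(b != NULL);
    if (from < 1 || to > b->last || from > to)
        return -1;
    memmove(&b->line[from], &b->line[to + 1],
        (size_t)(b->last - to) * sizeof(b->line[0]));
    b->last -= to - from + 1;
    b->current = from - 1;          /* assumed: just before the gap */
    if ((addr = TOY_INC_MOD(b->current, b->last)) != 0)
        b->current = addr;          /* step onto the line after the gap */
    return 0;
}
```

Under those assumptions, deleting from the middle moves the current line to the one that followed the deleted range, while deleting the tail of the buffer leaves it on the new last line.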

The exact implementations of at least a few of these will be in the next part, so we blast past these for now to concentrate on the bigger picture.

FreeBSD

FreeBSD seems to have very similar code, with a few minor differences at first glance.

For example, the signal handling here is controlled more heavily by preprocessor definitions.

#ifdef SIGWINCH
    handle_winch(SIGWINCH);
    if (isatty(0)) signal(SIGWINCH, handle_winch);
#endif
    signal(SIGHUP, signal_hup);
    signal(SIGQUIT, SIG_IGN);
    signal(SIGINT, signal_int);
#ifdef _POSIX_SOURCE
    if ((status = sigsetjmp(env, 1)))
#else
    if ((status = setjmp(env)))
#endif

For example, it falls back to setjmp instead of sigsetjmp when _POSIX_SOURCE is not defined, and it skips the SIGWINCH handling entirely if the platform does not define that signal.
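The practical difference between the two branches is the signal mask. While a handler runs, its own signal is blocked, and jumping out of the handler only unblocks it again if the saved context includes the mask. sigsetjmp lets you choose via its second argument; whether plain setjmp saves the mask depends on the platform. A small sketch that makes the effect visible (blocked_after_jump and jump_out are hypothetical names, and SIGUSR1 stands in for SIGINT):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jmp_env;

static void
jump_out(int signo)
{
    (void)signo;
    siglongjmp(jmp_env, 1);
}

/* Returns 1 if SIGUSR1 is still blocked after jumping out of its
 * handler, 0 if it is not, and -1 on a setup error. */
int
blocked_after_jump(int savemask)
{
    struct sigaction sa;
    sigset_t set;

    assert(savemask == 0 || savemask == 1);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = jump_out;
    sigemptyset(&sa.sa_mask);
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 ||
        sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;

    if (sigsetjmp(jmp_env, savemask) == 0)
        raise(SIGUSR1);     /* handler runs with SIGUSR1 blocked, then jumps */

    /* Query the current mask without changing it. */
    if (sigprocmask(SIG_BLOCK, NULL, &set) != 0)
        return -1;
    return sigismember(&set, SIGUSR1);
}
```

With the mask saved, the signal is deliverable again after the jump. Without it, the signal stays blocked, which in an editor would mean CTRL-C works exactly once.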

Conclusion

To be fair, ed has been around for decades, so it's no surprise that the implementation in both is very similar. For today, we only managed to get the general overview, as I want to keep these posts light, at 5-10 minute reads. We will see what to explore next: some of the command implementations, the buffers, or the regexp engine.

Stay tuned, and feel free to let me know on my Mastodon account about any feedback or any specific parts you want to see next in the series.
