Daily Source Reading: ed [Part 1 - Overview]
ed
Time for something bigger. Not insanely large, but definitely more than an hour or two to read through. That is why I am going to split this up into several parts.
As has been a running theme for the first few days, we will start by
looking at the OpenBSD version. I expect the code to be a bit
easier to read there, and I don't expect many differences from the
FreeBSD version, but we will see.
This time around I will definitely not go through each line; that would be too much.
We will look at the overall structure and grab a few interesting bits to look at.
OpenBSD
Signal setup
The first signal-related setup we see is:
if (isatty(STDIN_FILENO)) {
	handle_winch(SIGWINCH);
	signal(SIGWINCH, handle_winch);
}
If we search for handle_winch, we find the following implementation at the end of main.c:
void handle_winch(int signo)
{
	int save_errno = errno;
	struct winsize ws;	/* window size structure */

	if (ioctl(STDIN_FILENO, TIOCGWINSZ, &ws) == 0) {
		if (ws.ws_row > 2)
			rows = ws.ws_row - 2;
		if (ws.ws_col > 8)
			cols = ws.ws_col - 8;
	}
	errno = save_errno;
}
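So the direct call to handle_winch queries the current window size once at startup, and registering the same function as the SIGWINCH handler keeps rows and cols up to date whenever the terminal is resized. This is needed for proper printing and line breaks. Note that the handler saves and restores errno: it can run at any point, and the ioctl call could otherwise clobber an errno value that the interrupted code is about to inspect.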
signal(SIGHUP, signal_hup);
siginterrupt(SIGHUP, 1);
signal(SIGQUIT, SIG_IGN);
signal(SIGINT, signal_int);
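Here, we set up a few more signal handlers. SIGHUP goes to
signal_hup, and siginterrupt(SIGHUP, 1) makes system calls
interrupted by SIGHUP fail with EINTR instead of being restarted.
We opt to ignore the SIGQUIT (CTRL-\) signal, and finally we
install signal_int as the handler for SIGINT (CTRL-C).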
Lastly, there is:
if (sigsetjmp(env, 1)) {
	status = -1;
	fputs("\n?\n", stdout);
	seterrmsg("interrupt");
} else {
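Here, sigsetjmp saves our current context, including the signal mask (that is what the nonzero second argument requests). On the direct call it returns 0 and we take the else branch. Should we ever get hit with an interrupt (for example CTRL-C), execution jumps back to this point, sigsetjmp returns a nonzero value, and we report the interrupt with a "?".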
This happens in handle_int:
void handle_int(int signo)
{
	if (!sigactive)
		_exit(1);
	sigint = 0;
	siglongjmp(env, -1);
}
which is called from signal_int:
void signal_int(int signo)
{
	if (mutex)
		sigint = 1;
	else
		handle_int(signo);	/* XXX quite unsafe */
}
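I do not know the exact details, but my best guess is that if we
accidentally kick off an expensive regexp to substitute things in a
large file, we want to be able to kill it, but stay within ed and
continue editing. The mutex check also means that a CTRL-C arriving
while mutex is set is only recorded in sigint instead of jumping
out immediately.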
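I do wonder, though, what the "XXX quite unsafe" comment is about; I am not sure what exactly the author had in mind. One plausible reading: jumping out of a signal handler with siglongjmp abandons whatever the interrupted code was doing, and if that was a call that is not async-signal-safe (malloc or stdio, for example), POSIX leaves the behavior undefined.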
Main loop
Now that the initial setup is done, we hop into the main loop. In
BSD style, this is usually written as:
for (;;) {
}
The early parts are straightforward. We print an error message if
the status indicates an error and we are garrulous. (I actually had
to look that word up. Interesting choice for a variable name; did
verbose not make the cut?) The flag is toggled with the H command.
Next, we emit the custom prompt if one was set.
Then we read the next input via get_tty_line. This function does
some escape handling and finally returns the number of characters
read.
if ((status = extract_addr_range()) >= 0 &&
    (status = exec_command()) >= 0)
	if (!status || (status &&
	    (status = display_lines(current_addr, current_addr, status)) >= 0))
		continue;
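Finally, we try to extract the address range and execute the command we entered. exec_command returns a negative value on error; otherwise the result is either 0 or a set of print flags (from a command suffix such as p, l or n), which is passed on to display_lines as its last argument. If everything looks good, we display lines if needed and then continue with the next round of the loop. (The status && inside the condition is redundant, since the !status on the left of || already covers the zero case.)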
The rest of the loop is mostly just error handling and printing warnings.
/* extract_addr_range: get line addresses from the command buffer until an illegal address is seen; return status */
int extract_addr_range(void)
{
	int addr;

	addr_cnt = 0;
	first_addr = second_addr = current_addr;
	while ((addr = next_addr()) >= 0) {
		addr_cnt++;
		first_addr = second_addr;
		second_addr = addr;
		if (*ibufp != ',' && *ibufp != ';')
			break;
		else if (*ibufp++ == ';')
			current_addr = addr;
	}
	if ((addr_cnt = min(addr_cnt, 2)) == 1 || second_addr != addr)
		first_addr = second_addr;
	return (addr == ERR) ? ERR : 0;
}
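It looks a bit convoluted at first glance, but it's actually not that
bad. next_addr is a tad long, but it is just simple, straight-up
parsing of addresses, handling things like %, the comma (,), the
semicolon (;) and other special address forms. extract_addr_range
keeps calling it and shifts each new address into second_addr, so
first_addr always holds the previous one; a ; additionally makes the
address just parsed the current line before the next one is read.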
The main part is, obviously, exec_command.
In there, we look at the command character (and possibly the characters following it) and apply the appropriate action. Let's look at a few:
switch ((c = (unsigned char)*ibufp++)) {
case 'a':
	GET_COMMAND_SUFFIX();
	if (!isglobal)
		clear_undo_stack();
	if (append_lines(second_addr) < 0)
		return ERR;
	break;
case 'c':
	if (check_addr_range(current_addr, current_addr) < 0)
		return ERR;
	GET_COMMAND_SUFFIX();
	if (!isglobal)
		clear_undo_stack();
	if (delete_lines(first_addr, second_addr) < 0 ||
	    append_lines(current_addr) < 0)
		return ERR;
	break;
case 'd':
	if (check_addr_range(current_addr, current_addr) < 0)
		return ERR;
	GET_COMMAND_SUFFIX();
	if (!isglobal)
		clear_undo_stack();
	if (delete_lines(first_addr, second_addr) < 0)
		return ERR;
	else if ((addr = INC_MOD(current_addr, addr_last)) != 0)
		current_addr = addr;
	break;
It's nothing fancy or complicated, just straight-up deletes, appends and other operations.
The exact implementations of at least a few of these will be covered in the next part, so we blast past them for now to concentrate on the bigger picture.
FreeBSD
FreeBSD's code looks very similar, with a few minor differences
at first glance.
For example, the signal handling here is wrapped in preprocessor conditionals:
#ifdef SIGWINCH
handle_winch(SIGWINCH);
if (isatty(0))
	signal(SIGWINCH, handle_winch);
#endif
signal(SIGHUP, signal_hup);
signal(SIGQUIT, SIG_IGN);
signal(SIGINT, signal_int);
#ifdef _POSIX_SOURCE
if ((status = sigsetjmp(env, 1)))
#else
if ((status = setjmp(env)))
#endif
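It only sets up SIGWINCH handling if the platform defines that
signal, and it uses sigsetjmp only when _POSIX_SOURCE is defined,
falling back to plain setjmp otherwise. That difference matters
because POSIX leaves it unspecified whether plain setjmp saves the
signal mask, and the mask is exactly what is at stake when leaving a
signal handler with a jump. There is also no siginterrupt call for
SIGHUP, and handle_winch is called once even when standard input is
not a terminal; only the handler registration depends on isatty(0).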
Conclusion
To be fair, ed has been around for decades, so it's no surprise
that the implementations on both systems are very similar. For
today, we only managed to get the general overview, since I want to
keep these posts light, 5-10 minute reads. What we explore next is
still open: either some of the command implementations, the buffers,
or the regexp engine.
Stay tuned, and feel free to reach out on my Mastodon account with any feedback or with specific parts you would like to see next in the series.