Daily Source Reading: `ed` [Part 2 - Buffers]

`ed` buffers

With the general overview out of the way, we are going to look at buffers. We will look at commands soon enough, but they make more sense once we understand how ed handles its buffers.

/* Line node */
typedef struct  line {
    struct line *q_forw;
    struct line *q_back;
    off_t       seek;       /* address of line in scratch buffer */
    int     len;        /* length of line */
} line_t;

Many editors use ROPE or other fancy structures to optimize performance and certain operations. ed on the other hand only cares about lines, as it is a line editor.

Note here that the line_t structure does NOT hold any text. It merely keeps track of where we are in the buffer and how long the line is. It's just a doubly-linked linked list of buffer positions basically.

First we'll look at how we actually retrieve data from a buffer.

/* get_sbuf_line: get a line of text from the scratch file; return pointer
   to the text */
char *
get_sbuf_line(line_t *lp)
{
  static char *sfbuf = NULL;    /* buffer */
  static int sfbufsz = 0;       /* buffer size */
  int len;

  if (lp == &buffer_head)
    return NULL;

We pass in a line_t, basically the position and length in the buffer. If it's the list head, we return nothing, as the head is just a marker to designate the beginning (and end) of the list.

seek_write = 1;             /* force seek on write */
/* out of position */
if (sfseek != lp->seek) {
    sfseek = lp->seek;
    if (fseeko(sfp, sfseek, SEEK_SET) == -1) {
        perror(NULL);
        seterrmsg("cannot seek temp file");
        return NULL;
    }
}

We seem to have an actual line, so we do a quick check if our current position in the buffer matches up with the line. If it doesn't, we need to seek in the file to get to the correct location.

    len = lp->len;
    REALLOC(sfbuf, sfbufsz, len + 1, NULL);
    if (fread(sfbuf, sizeof(char), len, sfp) != len) {
        perror(NULL);
        seterrmsg("cannot read temp file");
        return NULL;
    }
    sfseek += len;              /* update file position */
    sfbuf[len] = '\0';
    return sfbuf;
}

A quick check how much data we need to read and resizing the read buffer accordingly, then we are good to go and just read the entire chunk from the file and be a good citizen and null-terminate the string.

For the opposite, writing a line to the buffer, we'll skip the boring parts and just look at the main part.

/* out of position */
if (seek_write) {
    if (fseek(sfp, 0L, SEEK_END) == -1) {
        perror(NULL);
        seterrmsg("cannot seek temp file");
        free(lp);
        return NULL;
    }
    sfseek = ftello(sfp);
    seek_write = 0;
}
/* assert: SPL1() */
if (fwrite(cs, sizeof(char), len, sfp) != len) {
    sfseek = -1;
    perror(NULL);
    seterrmsg("cannot write temp file");
    free(lp);
    return NULL;
}
lp->len = len;
lp->seek  = sfseek;
add_line_node(lp);
sfseek += len;          /* update file position */
return ++s;

Again, quickly making sure we are at the right location (if seek_write is enabled), and then we simply fwrite the data to the buffer, set the length and position for the line and add to our list with add_line_node.

One observation here is that the code asserts and expects that SPL1() has been called to disable interrupts. We wouldn't wanna get interrupted while actually writing to the file.

add_line_node itself reveals one nice small optimization. Let's take a look

/* add_line_node: add a line node in the editor buffer after the current line */
void
add_line_node(line_t *lp)
{
    line_t *cp;

    /* this get_addressed_line_node last! */
    cp = get_addressed_line_node(current_addr);
    INSQUE(lp, cp);
    addr_last++;
    current_addr++;
}

The trick here is in get_addressed_lin_node.

/* get_addressed_line_node: return pointer to a line node in the editor buffer */
line_t *
get_addressed_line_node(int n)
{
    static line_t *lp = &buffer_head;
    static int on = 0;

    SPL1();
    if (n > on) {
        if (n <= (on + addr_last) >> 1)
            for (; on < n; on++)
                lp = lp->q_forw;
        else {
            lp = buffer_head.q_back;
            for (on = addr_last; on > n; on--)
                lp = lp->q_back;
        }
    } else {
        if (n >= on >> 1)
            for (; on > n; on--)
                lp = lp->q_back;
        else {
            lp = &buffer_head;
            for (on = 0; on < n; on++)
                lp = lp->q_forw;
        }
    }
    SPL0();
    return lp;
}

See that static variable on? It helps us keep track of where we saw the last address we visited. This little piece of information helps us decide if we have a better chance looking forwards or backwards through the list.

We also have a way to check which line a line_t is actually on. All it requires is to find out where in the list it is.

/* get_line_node_addr: return line number of pointer */
int
get_line_node_addr(line_t *lp)
{
    line_t *cp = &buffer_head;
    int n = 0;

    while (cp != lp && (cp = cp->q_forw) != &buffer_head)
        n++;
    if (n && cp == &buffer_head) {
        seterrmsg("invalid address");
        return ERR;
    }
    return n;
}

Maybe not the most performant way to go about it (think of reaaaally large files), but also not worth it to optimize for this. This version requires zero bookkeeping and less memory.

I will spare the code for opening and closing buffers, this is just straight up boilerplate.. Only thing of note is that ed always uses scratch buffers that it puts into

#define SCRATCH_TEMPLATE      "/tmp/ed.XXXXXXXXXX"

using mkstemp, at least in this implementation.

Lastly, we need to initialize our scratch buffer.

static unsigned char ctab[256];     /* character translation table */

/* init_buffers: open scratch buffer; initialize line queue */
void
init_buffers(void)
{
    int i = 0;

    /* Read stdin one character at a time to avoid i/o contention
       with shell escapes invoked by nonterminal input, e.g.,
       ed - <<EOF
       !cat
       hello, world
       EOF */
    setvbuf(stdin, NULL, _IONBF, 0);
    if (open_sbuf() < 0)
        quit(2);
    REQUE(&buffer_head, &buffer_head);
    for (i = 0; i < 256; i++)
        ctab[i] = i;
}

stdin gets set to unbuffered, so each character immediately gets passed in instead of waiting for a while block. Then we initialize the line list and fill the ctab…

Wait..the what? It's yet another small quick optimization that is nice to show at the end.

ed sometimes needs to convert \0 null terminators to newlines and the other way around.

/* translit_text: translate characters in a string */
char *
translit_text(char *s, int len, int from, int to)
{
    static int i = 0;

    unsigned char *us;

    ctab[i] = i;            /* restore table to initial state */
    ctab[i = from] = to;
    for (us = (unsigned char *) s; len-- > 0; us++)
        *us = ctab[*us];
    return s;
}

Neat quick code that apart from the loop condition gets by without any further conditionals. It seems to be only used for translating between null terminators and newlines, but could also be used to for example replace every a with a t.

Conclusion

That wraps it up for today. ed keeps it simple here. No fancy =ROPE=s or equally difficult data structures. Just hold on to some lines info, keep the data in the file buffer and use as little memory as possible.

For tomorrow, we are probably going to look at commands finally. After that we might take a look at undo.

Feel free to drop some feedback over on Mastodon and see you for the next part.

(PS: To stick true to it, this blog post has been entirely written in ed instead of my usual Emacs session.)

Daily Source Reading: ed [Part 2 - Buffers]

ed buffers

Conclusion

Daily Source Reading: `ed` [Part 2 - Buffers]

`ed` buffers