Daily Source Reading: `ed` [Part 4 - Regexps]

`ed` regexps

The final stretch! We are very close to ending our journey in the wonderful land of ed. We have looked at its overall structure, its buffers and its commands. Now it is time for the one command we skipped in the last part: s, the substitution command.

do {
    switch (*ibufp) {
    case '\n':
        sflags |=SGF;
        break;
    case 'g':
        sflags |= SGG;
        ibufp++;
        break;
    case 'p':
        sflags |= SGP;
        ibufp++;
        break;
    case 'r':
        sflags |= SGR;
        ibufp++;
        break;
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        STRTOI(sgnum, ibufp);
        sflags |= SGF;
        sgflag &= ~GSG;     /* override GSG */
        break;
    default:
        if (sflags) {
            seterrmsg("invalid command suffix");
            return ERR;
        }
    }
} while (sflags && *ibufp != '\n');

This first part performs some additional option parsing for the s command. It behaves much like what you'd expect from getopt or the like.

These flags only matter when we are repeating a substitution. A bare s repeats the last substitution, and the suffixes modify that repeat. For example, sg repeats the last substitution with its global flag toggled.
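To see the shape of that loop in isolation, here is a minimal sketch that parses the same suffix characters from a plain string. The flag names mirror the excerpt, but their values, the function name parse_repeat_suffix and the use of strtol in place of the STRTOI macro are assumptions for illustration, not the OpenBSD code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag values; the real ones live in ed's headers. */
enum { SGG = 1, SGP = 2, SGR = 4, SGF = 8 };

/*
 * Parse a repeat suffix such as "gp\n" or "3\n".
 * Returns the collected flags, 0 if the input is not a repeat
 * (for example "/foo/bar/\n"), or -1 on an invalid suffix.
 * *count receives the numeric argument, if there is one.
 */
static int
parse_repeat_suffix(const char *s, long *count)
{
    int flags = 0;
    char *end;

    assert(s != NULL && count != NULL);
    do {
        switch (*s) {
        case '\n':
            flags |= SGF;           /* bare "s": plain repeat */
            break;
        case 'g':
            flags |= SGG;
            s++;
            break;
        case 'p':
            flags |= SGP;
            s++;
            break;
        case 'r':
            flags |= SGR;
            s++;
            break;
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            *count = strtol(s, &end, 10);
            s = end;                /* continue after the digits */
            flags |= SGF;
            break;
        default:
            if (flags)              /* junk after a valid suffix */
                return -1;
        }
    } while (flags && *s != '\n');
    return flags;
}
```

Note how the default case only complains once a flag has been seen. An unknown first character simply means "this is not a repeat", and the regular pattern parsing takes over.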

if (sflags && !pat) {
    seterrmsg("no previous substitution");
    return ERR;
} else if (sflags & SGG)
    sgnum = 0;      /* override numeric arg */
if (*ibufp != '\n' && *(ibufp + 1) == '\n') {
    seterrmsg("invalid pattern delimiter");
    return ERR;
}

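Then it's off to some quick sanity checking. To repeat something, we require a previous substitution to have happened, so a missing pattern is an error. The g suffix also overrides any numeric argument, and a delimiter that is immediately followed by the end of the line is rejected.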

tpat = pat;
SPL1();
if ((!sflags || (sflags & SGR)) &&
    (tpat = get_compiled_pattern()) == NULL) {
    SPL0();
    return ERR;
} else if (tpat != pat) {
    if (pat) {
        regfree(pat);
        free(pat);
    }
    pat = tpat;
    patlock = 1;        /* reserve pattern */
}
SPL0();

First we hold off interrupts with SPL1 so we can proceed without any disturbance, and let them back in with SPL0 once we are done. If we are not repeating anything, or we are repeating with the r flag, we compile the pattern anew via get_compiled_pattern. In some implementations this is a hand-crafted regexp engine. OpenBSD uses the regex library here to keep it simple. We don't have to reinvent everything.

If the new pattern is different from the old one, we release the old one and save the new pattern into it.
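The compile-then-swap idea is easier to see without the surrounding bookkeeping. The following is a minimal sketch using regcomp and regfree from the regex library. The name set_pattern is made up, and the interrupt handling and patlock from the excerpt are left out on purpose.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

static regex_t *pat;    /* last successfully compiled pattern, or NULL */

/*
 * Compile `src` and, on success, make it the current pattern.
 * On failure the previous pattern is left untouched.
 * Returns 0 on success, -1 on error.
 */
static int
set_pattern(const char *src)
{
    regex_t *tpat;

    assert(src != NULL);
    tpat = malloc(sizeof(*tpat));
    if (tpat == NULL)
        return -1;
    if (regcomp(tpat, src, 0) != 0) {   /* cflags 0: basic regexps */
        free(tpat);
        return -1;
    }
    if (pat != NULL) {                  /* release the old pattern */
        regfree(pat);
        free(pat);
    }
    pat = tpat;
    return 0;
}
```

Compiling into a temporary first is the important bit: a typo in the new pattern must not cost you the old one, otherwise a failed s command would also break the next repeat.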

if (!sflags && extract_subst_tail(&sgflag, &sgnum) < 0)
    return ERR;
else if (isglobal)
    sgflag |= GLB;
else
    sgflag &= ~GLB;
if (sflags & SGG)
    sgflag ^= GSG;
if (sflags & SGP) {
    sgflag ^= GPR;
    sgflag &= ~(GLS | GNP);
}

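So, if we are not repeating anything (i.e. sflags is not set), we extract the substitution text and any flags after it. The text we want to replace was already taken care of in the pattern compilation part. If we are repeating, the g and p suffixes flip the corresponding flags of the previous substitution instead.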
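The ^= in the excerpt is worth a second look, because it means the repeat suffixes toggle the remembered flags rather than set them. A minimal sketch of just that step, with made-up bit values and a made-up function name apply_toggles:

```c
#include <assert.h>

/* Hypothetical bit values; ed defines its own. */
enum { GSG = 1, GPR = 2, GLS = 4, GNP = 8 };    /* remembered flags */
enum { SGG = 1, SGP = 2 };                      /* repeat suffixes */

/*
 * Apply the "g" and "p" repeat suffixes to the flags remembered
 * from the previous substitution and return the result.
 */
static int
apply_toggles(int sgflag, int sflags)
{
    /* Only the two toggling suffixes are modelled here. */
    assert((sflags & ~(SGG | SGP)) == 0);

    if (sflags & SGG)
        sgflag ^= GSG;              /* global on -> off, off -> on */
    if (sflags & SGP) {
        sgflag ^= GPR;              /* same for printing */
        sgflag &= ~(GLS | GNP);     /* drop the other print styles */
    }
    return sgflag;
}
```

So if the last substitution was already global, repeating it with sg runs it non-globally, which can be a bit of a surprise the first time.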

do {
    switch (*ibufp) {
    case 'p':
        sgflag |= GPR;
        ibufp++;
        break;
    case 'l':
        sgflag |= GLS;
        ibufp++;
        break;
    case 'n':
        sgflag |= GNP;
        ibufp++;
        break;
    default:
        n++;
    }
} while (!n);

This is plain parsing of any trailing print-related flags. Not interesting, but it's there. Much of the apparent "complexity" of the s command code stems from the option parsing.

if (check_addr_range(current_addr, current_addr) < 0)
    return ERR;
GET_COMMAND_SUFFIX();
if (!isglobal) clear_undo_stack();
if (search_and_replace(pat, sgflag, sgnum) < 0)
    return ERR;
break;

And here finally we check if we are good to go and then perform our search_and_replace. The implementation of this one exceeds my concentration today (came down with a nasty cold). But as we have seen, the code is not dark magic and does not require any blood sacrifices.
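For the curious, here is roughly what a search-and-replace over a single line can look like with the regex library. This is a minimal sketch and not ed's search_and_replace: the function name substitute, the literal-only replacement text (no & or \1 handling) and the up-front worst-case allocation are all assumptions made to keep it short. Empty matches are handled just well enough to make the loop terminate.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace matches of `re` in `line` with the literal text `rep`.
 * nth == 0 replaces every match, nth > 0 replaces only the nth one.
 * Returns a malloc'd string the caller must free, or NULL if the
 * allocation fails.
 */
static char *
substitute(const regex_t *re, const char *line, const char *rep, int nth)
{
    size_t n = strlen(line), replen = strlen(rep), len = 0;
    /* Worst case: a replacement at every position, plus the line. */
    char *out = malloc((n + 1) * (replen + 1) + 1);
    const char *p = line;
    regmatch_t m;
    int seen = 0;

    assert(nth >= 0);
    if (out == NULL)
        return NULL;
    /* After the first round, p no longer points at a line start. */
    while (regexec(re, p, 1, &m, p == line ? 0 : REG_NOTBOL) == 0) {
        size_t so = (size_t)m.rm_so, eo = (size_t)m.rm_eo;
        int hit = (nth == 0 || ++seen == nth);

        memcpy(out + len, p, so);               /* text before match */
        len += so;
        if (hit) {
            memcpy(out + len, rep, replen);     /* the replacement */
            len += replen;
        } else {
            memcpy(out + len, p + so, eo - so); /* keep match as is */
            len += eo - so;
        }
        p += eo;
        if (hit && nth != 0)
            break;                              /* nth match done */
        if (eo == so) {                         /* empty match */
            if (*p == '\0')
                break;
            out[len++] = *p++;                  /* step one char on */
        }
    }
    strcpy(out + len, p);                       /* rest of the line */
    return out;
}
```

With a pattern compiled from "o", calling substitute on "foo boo" with replacement "0" and nth set to 0 gives "f00 b00", while nth set to 3 only touches the third match and gives "foo b0o".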

Shorter version

If we break all that code down into a simple digestible version, it would look something like this (very, very simplified):

case 's':
    parse_repeat_flags();
    compile_or_reuse_pattern();
    parse_substitution_tail();
    apply_repeat_flag_toggles();
    execute_substitution();
    break;

Conclusion

I think this is it for now in our trip through ed.

I highly recommend trying to use it for a bit. Build some experience with it. There are systems that don't even have vi installed, so knowing ed can potentially save your bacon when your network switch gets hosed, or in other obscure scenarios.

It will also explain the heritage of a few commands in vi and vim.

What's up next? I haven't decided yet. Since I'm sick at the moment, it will probably be something small, yet interesting. I am thinking maybe the yes command. You'd be surprised what stark differences there can be between implementations.
