Daily Source Reading: `yes`

`yes`

Yes (see what I did there?), really. You know, that command that endlessly prints y, or whatever argument you give it instead?

Surely there can't be much code to read, huh? True, but also false as we'll see.

The reason why we picked yes today is two-fold:

I am still a bit under the weather, so I can't concentrate on reading longer code
The difference in how FreeBSD and OpenBSD implement it is quite interesting

So without further ado, let's just jump right into it.

OpenBSD

As usual, we start with the one that is likely simpler:

int
main(int argc, char *argv[])
{
    if (pledge("stdio", NULL) == -1)
        err(1, "pledge");

    if (argc > 1)
        for (;;)
            puts(argv[1]);
    else
        for (;;)
            puts("y");
}

Yep, barring license header and includes, this is it. All of it. I don't really think I have to explain this code. Can't be easier than this.

FreeBSD

int
main(int argc, char **argv)
{
    char buf[8192];
    char y[2] = { 'y', '\n' };
    char * exp = y;
    size_t buflen = 0;
    size_t explen = sizeof(y);
    size_t more;
    ssize_t ret;

    if (caph_limit_stdio() < 0 || caph_enter() < 0)
        err(1, "capsicum");

    if (argc > 1) {
        exp = argv[1];
        explen = strlen(exp) + 1;
        exp[explen - 1] = '\n';
    }

    if (explen <= sizeof(buf)) {
        while (buflen < sizeof(buf) - explen) {
            memcpy(buf + buflen, exp, explen);
            buflen += explen;
        }
        exp = buf;
        explen = buflen;
    }

    more = explen;
    while ((ret = write(STDOUT_FILENO, exp + (explen - more), more)) > 0)
        if ((more -= ret) == 0)
            more = explen;

    err(1, "stdout");
    /*NOTREACHED*/
}

Talk about over-optimization. I am a bit at a loss for words. I am very sure that there are reasons for this, but this feels unnecessarily long.

So, let's step through it, even if it's not much.

char buf[8192];
char y[2] = { 'y', '\n' };
char * exp = y;
size_t buflen = 0;
size_t explen = sizeof(y);
size_t more;
ssize_t ret;

Mostly just some helper variables and an 8 KB buffer… yes, you read that right. 8 KILObytes!

if (caph_limit_stdio() < 0 || caph_enter() < 0)
    err(1, "capsicum");

if (argc > 1) {
    exp = argv[1];
    explen = strlen(exp) + 1;
    exp[explen - 1] = '\n';
}

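We quickly drop privileges down to stdio only: caph_limit_stdio() restricts the standard descriptors and caph_enter() puts the process into Capsicum capability mode. Then we overwrite the default expression if an argument is given, swapping its terminating NUL for a newline.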
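The newline assignment works in place: `strlen(exp) + 1` counts the terminating NUL, and that NUL is then replaced, so the argument is reused without any copying. A minimal sketch of the same trick on an ordinary mutable buffer follows. The helper name `nul_to_newline` is made up for illustration and is not part of the FreeBSD source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not FreeBSD code: turn a NUL-terminated string
 * into a newline-terminated byte sequence in place and return its length.
 * The result is no longer a C string, so it must only be used together
 * with the returned length.
 */
size_t
nul_to_newline(char *s)
{
    size_t len = strlen(s) + 1;   /* count the terminating NUL as well */

    s[len - 1] = '\n';            /* replace the NUL with a newline */
    return len;
}
```

This is also why the rest of the program carries `explen` around everywhere: after this step the length is the only thing that says where the expression ends.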

if (explen <= sizeof(buf)) {
    while (buflen < sizeof(buf) - explen) {
        memcpy(buf + buflen, exp, explen);
        buflen += explen;
    }
    exp = buf;
    explen = buflen;
}

This is the part where we start optimizing. We fill the 8 KB buffer with whole copies of the expression until it is (nearly) full, so that a single write() later emits many lines at once.
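To see how much of the buffer actually gets used, the fill loop can be pulled out into a small function. This is only a sketch under my own naming (`fill_repeated` and its parameters are hypothetical), and it assumes 0 < patlen <= bufsize, which matches the guard around the original loop.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the loop above: pack whole copies of
 * pat (patlen bytes) into buf (bufsize bytes) and return the number of
 * bytes used. Assumes 0 < patlen <= bufsize.
 */
size_t
fill_repeated(char *buf, size_t bufsize, const char *pat, size_t patlen)
{
    size_t used = 0;

    /* Same strict comparison as the original: stop once another copy
       would reach the end of the buffer. */
    while (used < bufsize - patlen) {
        memcpy(buf + used, pat, patlen);
        used += patlen;
    }
    return used;
}
```

With the default two-byte "y\n" and an 8192-byte buffer this comes out at 8190 bytes, so 4095 lines per write() call.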

more = explen;
while ((ret = write(STDOUT_FILENO, exp + (explen - more), more)) > 0)
    if ((more -= ret) == 0)
        more = explen;

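Finally we just loop and keep writing the entire buffer to stdout. The more counter tracks how many bytes of the buffer are still pending, so a short write resumes where it left off instead of starting over. The loop only ends when write() fails or returns 0, at which point err() bails out.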
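The short-write handling is easier to see in isolation. Below is a rough sketch of the same bookkeeping for writing a buffer exactly once. The function `write_all` is my own illustration, not something from the FreeBSD source, and unlike the original it returns instead of looping forever.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not FreeBSD code: write all len bytes of buf to fd,
 * resuming after short writes. Returns 0 on success, -1 if write() fails
 * or makes no progress.
 */
int
write_all(int fd, const char *buf, size_t len)
{
    size_t more = len;   /* bytes still to be written */
    ssize_t ret;

    while (more > 0) {
        /* len - more is the offset of the first unwritten byte */
        ret = write(fd, buf + (len - more), more);
        if (ret <= 0)
            return -1;
        more -= (size_t)ret;
    }
    return 0;
}
```

The FreeBSD loop does the same thing, except that it resets more to explen whenever it reaches zero, which is what turns "write the buffer once" into "write the buffer forever".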

Conclusion

Sure, nowadays 8 KB is nothing. But is it really necessary to reserve 8 KB for the yes command? That is like 1/40th of ed on a fresh scratch buffer.

This shows again the difference between OpenBSD and FreeBSD: simplicity vs. performance. Neither is better than the other; it depends on your use case.

In terms of maintainability and understanding, OpenBSD wins here, and that is regardless of use case.

Addendum

So, I went and took a look at the GNU version…all I can say is that FreeBSD looks tame next to that.

Behold:

int
main (int argc, char **argv)
{
  /* REMOVED INIT STEPS */

  /* Buffer data locally once, rather than having the
     large overhead of stdio buffering each item.  */
  size_t bufalloc = 0;
  bool reuse_operand_strings = true;
  char **operandp = operands;
  do
    {
      size_t operand_len = strlen (*operandp);
      bufalloc += operand_len + 1;
      if (operandp + 1 < operand_lim
          && *operandp + operand_len + 1 != operandp[1])
        reuse_operand_strings = false;
    }
  while (++operandp < operand_lim);

  /* Improve performance by using a buffer size greater than BUFSIZ / 2.  */
  if (bufalloc <= BUFSIZ / 2)
    {
      bufalloc = BUFSIZ;
      reuse_operand_strings = false;
    }

#if defined __CHERI__
  /* Cheri capability bounds do not allow for this.  */
  reuse_operand_strings = false;
#endif

  /* Fill the buffer with one copy of the output.  If possible, reuse
     the operands strings; this wins when the buffer would be large.  */
  char *buf = reuse_operand_strings ? *operands : xmalloc (bufalloc);
  size_t bufused = 0;
  operandp = operands;
  do
    {
      size_t operand_len = strlen (*operandp);
      if (! reuse_operand_strings)
        memcpy (buf + bufused, *operandp, operand_len);
      bufused += operand_len;
      buf[bufused++] = ' ';
    }
  while (++operandp < operand_lim);
  buf[bufused - 1] = '\n';

  /* If a larger buffer was allocated, fill it by repeating the buffer
     contents.  */
  size_t copysize = bufused;
  for (size_t copies = bufalloc / copysize; --copies; )
    {
      memcpy (buf + bufused, buf, copysize);
      bufused += copysize;
    }

  /* Repeatedly output the buffer until there is a write error; then fail.  */
  while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
    continue;
  error (0, errno, _("standard output"));
  main_exit (EXIT_FAILURE);
}

I already cut out the argument parsing and initialization, and yet it is still way too long for yes. Seriously, what possible use case requires this?