Daily Source Reading: unveil
unveil
As promised at the end of yesterday's article, today we are going
to take a look at unveil, the spiritual sibling to pledge.
Uncloaking the files
int
sys_unveil(struct proc *p, void *v, register_t *retval)
{
	struct sys_unveil_args /* {
		syscallarg(const char *) path;
		syscallarg(const char *) permissions;
	} */ *uap = v;
	struct process *pr = p->p_p;
	char *pathname, *c;
	struct nameidata nd;
	size_t pathlen;
	char permissions[5];
	int error, allow;
A typical start for a syscall, forcing a void pointer into the arg structure and declaring all the variables we need.
	if (SCARG(uap, path) == NULL && SCARG(uap, permissions) == NULL) {
		pr->ps_uvdone = 1;
		return (0);
	}
	if (pr->ps_uvdone != 0)
		return EPERM;
Should both arguments to unveil be NULL, it means we are done
calling unveil. So we set the process' unveil done flag and exit.
A similar effect could be achieved by calling pledge without
"unveil" in its promises.
If the flag is already set, we return with a permissions error (EPERM).
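To make the locking behaviour concrete, here is a minimal userland sketch of just these two checks. The names sk_process and sk_unveil_gate are made up for illustration and are not part of the kernel. Only the flag handling is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-process state; only the flag matters. */
struct sk_process {
	int uvdone;	/* plays the role of ps_uvdone */
};

/*
 * Models only the first two checks of sys_unveil:
 * (NULL, NULL) sets the flag, any real request after that fails.
 */
int
sk_unveil_gate(struct sk_process *pr, const char *path,
    const char *permissions)
{
	if (path == NULL && permissions == NULL) {
		pr->uvdone = 1;
		return 0;
	}
	if (pr->uvdone != 0)
		return EPERM;
	return 0;	/* the real syscall would carry on from here */
}
```

Note the order of the checks: calling with two NULLs a second time still returns 0, only real unveil requests are refused once the flag is set.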
	error = copyinstr(SCARG(uap, permissions), permissions,
	    sizeof(permissions), NULL);
	if (error)
		return (error);
This is a part I kinda ignored yesterday, I just hopped over this
copyinstr. Today I finally looked it up: it copies a NUL-terminated
string from user address space into kernel space, up to a given
maximum length. Makes sense, as we want to read the string arguments
to unveil and we are in kernel land.
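For a feel of the semantics, here is a rough userland sketch of a bounded string copy in the spirit of copyinstr. sk_copystr is a made-up name, and the sketch skips the part that makes the real thing special, which is safely handling faults on user addresses. It only shows the length handling: the reported count includes the terminating NUL, and a string that does not fit yields ENAMETOOLONG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most len bytes (including the NUL) from src to dst.
 * On success, *done (if not NULL) holds the bytes copied, NUL included.
 */
int
sk_copystr(const char *src, char *dst, size_t len, size_t *done)
{
	size_t i;

	for (i = 0; i < len; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			if (done != NULL)
				*done = i + 1;
			return 0;
		}
	}
	if (done != NULL)
		*done = len;
	return ENAMETOOLONG;	/* ran out of room before the NUL */
}
```

With a 5 byte buffer like permissions above, "rwxc" fits exactly (four characters plus NUL), and anything longer is rejected.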
	/*
	 * System calls in other threads may sleep between unveil
	 * datastructure inspections -- this is the simplest way to
	 * provide consistency
	 */
	single_thread_set(p, SINGLE_UNWIND);
	pathname = pool_get(&namei_pool, PR_WAITOK);
	error = copyinstr(SCARG(uap, path), pathname, MAXPATHLEN, &pathlen);
	if (error)
		goto end;
The comment explains the first part. Pause other threads in this process, just to be safe.
Then we go on and get us some memory for a pathname from the namei
pool (a resource pool for, have a guess….pathnames, that's right).
Once we got our memory, we copy in the pathname argument from user
space.
#ifdef KTRACE
	if (KTRPOINT(p, KTR_STRUCT))
		ktrstruct(p, "unveil", permissions, strlen(permissions));
#endif
Let's hop over the diagnostics here.
	if (pathlen < 2) {
		error = EINVAL;
		goto end;
	}
	/* find root "/" or "//" */
	for (c = pathname; *c != '\0'; c++) {
		if (*c != '/')
			break;
	}
	if (*c == '\0')
		/* root directory */
		NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | SAVENAME,
		    UIO_SYSSPACE, pathname, p);
	else
		NDINIT(&nd, CREATE, FOLLOW | LOCKLEAF | LOCKPARENT | SAVENAME,
		    UIO_SYSSPACE, pathname, p);
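First, pathlen counts the terminating NUL, so anything below 2 means
an empty path, which is rejected with EINVAL. Then comes a quick
check whether the path is the root directory ("/", "//" and so on).
Non-root paths are looked up with CREATE, because the path we are
unveiling might not exist yet.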
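The slash loop is easy to lift out and play with. A small sketch, with sk_is_root_path being a name I made up for it:

```c
#include <assert.h>

/*
 * Returns 1 if pathname consists of nothing but '/' characters,
 * 0 otherwise. Same loop as in sys_unveil, just wrapped in a function.
 */
int
sk_is_root_path(const char *pathname)
{
	const char *c;

	for (c = pathname; *c != '\0'; c++) {
		if (*c != '/')
			break;
	}
	return *c == '\0';
}
```

An empty string would also count as "root" here, which is why the pathlen check has to come first in the syscall.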
	nd.ni_pledge = PLEDGE_UNVEIL;
	if ((error = namei(&nd)) != 0)
		goto end;
Now that we have our namei data, we go ahead and use namei to
convert it into a vnode.
(Interlude: I have to point out here how magnificent the man-pages
are. Instead of having to go and find the function in the source to
understand what it does, I literally just type man namei and get a
detailed explanation. This is what makes reading the BSD code so fun,
especially OpenBSD.)
	/*
	 * XXX Any access to the file or directory will allow us to
	 * pledge path it
	 */
	allow = ((nd.ni_vp &&
	    (VOP_ACCESS(nd.ni_vp, VREAD, p->p_ucred, p) == 0 ||
	    VOP_ACCESS(nd.ni_vp, VWRITE, p->p_ucred, p) == 0 ||
	    VOP_ACCESS(nd.ni_vp, VEXEC, p->p_ucred, p) == 0)) ||
	    (nd.ni_dvp &&
	    (VOP_ACCESS(nd.ni_dvp, VREAD, p->p_ucred, p) == 0 ||
	    VOP_ACCESS(nd.ni_dvp, VWRITE, p->p_ucred, p) == 0 ||
	    VOP_ACCESS(nd.ni_dvp, VEXEC, p->p_ucred, p) == 0)));
What a nice multi-line boolean expression… but what it boils down
to is that we check that the process actually has access to the
vnode or its parent directory. Any kind of access. Say we unveil a
path with "you can read this path", but it later turns out that we
don't have the permissions to read it and can only execute it: that
is not unveil's problem to deal with. unveil only cares whether we
can access that vnode in any form or fashion.
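The same "any access will do" idea can be approximated from userland with access(2). This is only a loose analogy: sk_any_access is a made-up helper, it looks at a path instead of a vnode, it ignores the parent directory, and access(2) checks against the real user ID rather than the credentials the kernel passes to VOP_ACCESS.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Returns 1 if the calling user may read, write or execute path
 * in any way, 0 if all three checks fail.
 */
int
sk_any_access(const char *path)
{
	return access(path, R_OK) == 0 ||
	    access(path, W_OK) == 0 ||
	    access(path, X_OK) == 0;
}
```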
	/* release lock from namei, but keep ref */
	if (nd.ni_vp)
		VOP_UNLOCK(nd.ni_vp);
	if (nd.ni_dvp && nd.ni_dvp != nd.ni_vp)
		VOP_UNLOCK(nd.ni_dvp);
	if (allow)
		error = unveil_add(p, &nd, permissions);
	else
		error = EPERM;
Because we passed LOCKLEAF (and, for non-root paths, LOCKPARENT) to
NDINIT earlier, namei handed the vnodes back locked. So we unlock
them (not release), but as the comment says, we hold on to the
references for now.
We hold on to them because we are not done with our unveil call and
don't want the vnodes to be fully released yet.
If our check earlier (if access is allowed) was successful, we finally
pass our path to unveil_add (just wait a moment, we'll get to that
one). Otherwise we bail out with EPERM.
	/* release vref from namei, but not vref from unveil_add */
	if (nd.ni_vp)
		vrele(nd.ni_vp);
	if (nd.ni_dvp)
		vrele(nd.ni_dvp);
	pool_put(&namei_pool, nd.ni_cnd.cn_pnbuf);
end:
	pool_put(&namei_pool, pathname);
	single_thread_clear(p);
	return (error);
}
We are done. So we can now release the nodes as unveil_add now
holds on to its own data. And because we are good citizens, we also
yield back the data we allocated from the pool.
Behind the veil, ehm, curtain
Alright, strap in, this is a bit of a long one. Grab some water, some rations, and off we go.
This is what our syscall that we just looked at does at the end. It
takes the namei data (so, the path basically) and the permissions we
wanted and adds it to the process' unveil data.
int
unveil_add(struct proc *p, struct nameidata *ndp, const char *permissions)
{
	struct process *pr = p->p_p;
	struct vnode *vp;
	struct unveil *uv;
	int directory_add;
	int ret = EINVAL;
	u_char flags;
Booooooring boilerplate, … skip (but a good reference to have while reading).
	KASSERT(ISSET(ndp->ni_cnd.cn_flags, HASBUF)); /* must have SAVENAME */
	if (unveil_parsepermissions(permissions, &flags) == -1)
		goto done;
A quick check that the SAVENAME we passed to NDINIT earlier really
took effect: HASBUF tells us that the pathname buffer is still
attached to the namei data.
Then we parse the permissions (in the current version, a string with
any of the characters "rwxc"). This is straightforward, so we are
not even going to take a look, but if you are interested, you
absolutely should. One thing we have learned in this series so far is
that even boring functions can end up being interesting.
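If you don't feel like digging it up right now, here is a minimal sketch of what such a parser could look like. This is not the kernel's unveil_parsepermissions, just my guess at its shape: the SK_ flag names and their bit values are placeholders, and the only thing taken from the code above is the contract of returning -1 on bad input and filling in a flags byte otherwise.

```c
#include <assert.h>

/* Placeholder bit values, chosen for this sketch only. */
#define SK_UNVEIL_READ		0x01
#define SK_UNVEIL_WRITE		0x02
#define SK_UNVEIL_CREATE	0x04
#define SK_UNVEIL_EXEC		0x08

/*
 * Turn a string made of 'r', 'w', 'x' and 'c' into a bitmask.
 * Returns 0 on success, -1 on any other character.
 */
int
sk_parse_permissions(const char *permissions, unsigned char *flags)
{
	unsigned char bits = 0;
	const char *cp;

	for (cp = permissions; *cp != '\0'; cp++) {
		switch (*cp) {
		case 'r':
			bits |= SK_UNVEIL_READ;
			break;
		case 'w':
			bits |= SK_UNVEIL_WRITE;
			break;
		case 'x':
			bits |= SK_UNVEIL_EXEC;
			break;
		case 'c':
			bits |= SK_UNVEIL_CREATE;
			break;
		default:
			return -1;
		}
	}
	*flags = bits;
	return 0;
}
```

In this sketch an empty string parses fine and yields no flags at all, and *flags is left untouched when parsing fails.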
	if (pr->ps_uvpaths == NULL) {
		pr->ps_uvpaths = mallocarray(UNVEIL_MAX_VNODES,
		    sizeof(struct unveil), M_PROC, M_WAITOK|M_ZERO);
	}
	if (pr->ps_uvvcount >= UNVEIL_MAX_VNODES ||
	    pr->ps_uvncount >= UNVEIL_MAX_NAMES) {
		ret = E2BIG;
		goto done;
	}
Right…so ps_uvpaths seems to be where we store all the unveiled
paths. If we never call unveil, there's no reason to carry that
data. But once we do, we allocate as many as allowed
(UNVEIL_MAX_VNODES is 128 at the time of this writing).
Some additional error checking to make sure we are not overshooting the size and we are good to go.
That was the warm-up and everything is prepared now. Finally time for the good stuff.
	/* Are we a directory? or something else */
	directory_add = ndp->ni_vp != NULL && ndp->ni_vp->v_type == VDIR;
	if (directory_add)
		vp = ndp->ni_vp;
	else
		vp = ndp->ni_dvp;
	KASSERT(vp->v_type == VDIR);
	vref(vp);
	vp->v_uvcount++;
The comment here helps. We check if we are unveiling a directory or a terminal node (a single entry inside a directory rather than the directory itself).
If it is a directory, we grab ni_vp, which is the direct vnode for
that directory. If it is a terminal node, we grab the node for its
parent directory and keep the terminal's name alongside it (so the
node for the directory, plus the entry's name).
Then we take a reference to it to make sure it doesn't just vanish,
and we increase the number of unveils for the vnode.
	if ((uv = unveil_lookup(vp, pr, NULL)) != NULL) {
		/*
		 * We already have unveiled this directory
		 * vnode
		 */
		vp->v_uvcount--;
		vrele(vp);
After checking if we already have an unveil entry for that node, we
can go and drop the extra reference we just took.
But, there are some special cases to consider, even if we already have that node.
		/*
		 * If we are adding a directory which was already
		 * unveiled containing only specific terminals,
		 * unrestrict it.
		 */
		if (directory_add) {
			DPRINTF("unveil: %s(%d): updating directory vnode %p"
			    " to unrestricted uvcount %d\n",
			    pr->ps_comm, pr->ps_pid, vp, vp->v_uvcount);
			if (!unveil_setflags(&uv->uv_flags, flags))
				ret = EPERM;
			else
				ret = 0;
			goto done;
		}
Say we already unveiled only the pf.conf file inside /etc before. That
means we already have a vnode for /etc with the "pf.conf"
terminal. But if we then unveil /etc itself, it's the same
node, but with ALL of the directory.
So we need to widen the scope of the unveiled area.
The other case:
		/*
		 * If we are adding a terminal that is already unveiled, just
		 * replace the flags and we are done
		 */
		if (!directory_add) {
			struct unvname *tname;
			if ((tname = unveil_namelookup(uv,
			    ndp->ni_cnd.cn_nameptr)) != NULL) {
				DPRINTF("unveil: %s(%d): changing flags for %s"
				    "in vnode %p, uvcount %d\n",
				    pr->ps_comm, pr->ps_pid, tname->un_name, vp,
				    vp->v_uvcount);
				if (!unveil_setflags(&tname->un_flags, flags))
					ret = EPERM;
				else
					ret = 0;
				goto done;
			}
		}
Say after our previous call, we unveil the same terminal again, this
time adding "w" to its permissions. That is similar to pledge:
widening permissions is not allowed, so if unveil_setflags refuses,
we return EPERM.
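The "no widening" rule as described here fits in a few lines. To be clear, this is a sketch of the rule as I understand it, not the body of the kernel's unveil_setflags, which we have not looked at and which may well be more permissive. sk_setflags is a made-up name. The return convention (non-zero on success) is taken from how the caller above uses it.

```c
#include <assert.h>

/*
 * Replace *flags with nflags, but only if nflags asks for nothing
 * that *flags did not already allow.
 * Returns 1 on success, 0 if the request would widen permissions.
 */
int
sk_setflags(unsigned char *flags, unsigned char nflags)
{
	/* any bit set in nflags but clear in *flags is a widening */
	if ((~(*flags) & nflags) != 0)
		return 0;
	*flags = nflags;
	return 1;
}
```

So going from "rw" down to "r" works, while going from "r" up to "rw" is refused and leaves the old flags in place.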
	} else {
		/*
		 * New unveil involving this directory vnode.
		 */
		uv = unveil_add_vnode(p, vp);
	}
If we haven't seen that vnode yet, we simply add it with
unveil_add_vnode. It's not long, so we will add that to our pile.
	/*
	 * At this stage with have a unveil in uv with a vnode for a
	 * directory. If the component we are adding is a directory,
	 * we are done. Otherwise, we add the component name the name
	 * list in uv.
	 */
	if (directory_add) {
		uv->uv_flags = flags;
		ret = 0;
		DPRINTF("unveil: %s(%d): added unrestricted directory vnode %p"
		    ", uvcount %d\n",
		    pr->ps_comm, pr->ps_pid, vp, vp->v_uvcount);
		goto done;
	}
	if (unveil_add_name(uv, ndp->ni_cnd.cn_nameptr, flags))
		pr->ps_uvncount++;
	ret = 0;
	DPRINTF("unveil: %s(%d): added name %s beneath %s vnode %p,"
	    " uvcount %d\n",
	    pr->ps_comm, pr->ps_pid, ndp->ni_cnd.cn_nameptr,
	    uv->uv_flags ? "unrestricted" : "restricted",
	    vp, vp->v_uvcount);
done:
	return ret;
}
I think the comment here explains it more concisely than I could, so let's move on.
The little helpers
struct unveil *
unveil_add_vnode(struct proc *p, struct vnode *vp)
{
	struct process *pr = p->p_p;
	struct unveil *uv = NULL;
	ssize_t i;
	KASSERT(pr->ps_uvvcount < UNVEIL_MAX_VNODES);
	uv = &pr->ps_uvpaths[pr->ps_uvvcount++];
	rw_init(&uv->uv_lock, "unveil");
	RBT_INIT(unvname_rbt, &uv->uv_names);
	uv->uv_vp = vp;
	uv->uv_flags = 0;
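Grab the next free unveil slot and initialize its read-write lock (rw_init only sets the lock up, it does not take it). Then we initialize a red-black tree for the terminal names, store the vnode, and start out with no flags.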
	/* find out what we are covered by */
	uv->uv_cover = unveil_find_cover(vp, p);
	/*
	 * Find anyone covered by what we are covered by
	 * and re-check what covers them (we could have
	 * interposed a cover)
	 */
	for (i = 0; i < pr->ps_uvvcount - 1; i++) {
		if (pr->ps_uvpaths[i].uv_cover == uv->uv_cover)
			pr->ps_uvpaths[i].uv_cover =
			    unveil_find_cover(pr->ps_uvpaths[i].uv_vp, p);
	}
	return (uv);
}
Right, from what I can gather about uv_cover, it's this:
- unveil("/foo", "rw"), now /foo is covered
- any access to say /foo/bar/something will have to walk up to check if it hits something that covers it. In this case /foo
- when we do unveil("/foo/bar", "r"), we have to add /foo/bar in between that path, so now /foo/bar/something is covered by /foo/bar
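To check my own understanding of the cover idea, here is a toy version that works on plain path strings instead of vnodes. The kernel walks up through parent vnodes, so treat this purely as an illustration: sk_find and sk_find_cover are made-up names, and the fixed buffer size is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of path in the unveiled list, or -1. */
static int
sk_find(const char *const *unveiled, size_t n, const char *path)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(unveiled[i], path) == 0)
			return (int)i;
	}
	return -1;
}

/*
 * Walk up from path one component at a time and return the index of
 * the first unveiled ancestor, or -1 if nothing covers it.
 */
int
sk_find_cover(const char *const *unveiled, size_t n, const char *path)
{
	char buf[256];
	char *slash;
	int idx;

	if (strlen(path) >= sizeof(buf))
		return -1;
	strcpy(buf, path);

	for (;;) {
		slash = strrchr(buf, '/');
		if (slash == NULL)
			return -1;
		if (slash == buf) {
			if (buf[1] == '\0')
				return -1;	/* already at "/" */
			buf[1] = '\0';		/* parent is "/" */
		} else {
			*slash = '\0';		/* strip last component */
		}
		if ((idx = sk_find(unveiled, n, buf)) != -1)
			return idx;
	}
}
```

With only /foo unveiled, /foo/bar/something is covered by /foo. Once /foo/bar is added, it becomes the cover of /foo/bar/something, and /foo/bar itself is covered by /foo.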
Conclusion
This one was a bit harder to get through. There are more components and moving parts involved. But again, it mostly seems to be a simple, yet elegant, solution.
But wait…what's missing here? When do we actually check if a file access is allowed? Well, that we will look at this weekend. One of the involved functions is a bit longer, so we will split that into the next post.
In case I have gotten something egregiously wrong, feel free to yell at me over on Mastodon. Any other feedback also welcome.