This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

scx: Close a small race window in the enable path which can lead to use-after-free #205

Merged
merged 1 commit into sched_ext on May 18, 2024

Conversation

htejun
Collaborator

@htejun htejun commented May 17, 2024

scx_ops_enable() was iterating all tasks without locking each task's rq and then calling get_task_struct() on them. However, this can race against the task free path: the task struct itself is still accessible, but its refcnt may already have reached zero by the time get_task_struct() is called. This triggers the following refcnt warning.

WARNING: CPU: 11 PID: 2834521 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
...
RIP: 0010:refcount_warn_saturate+0x7d/0xe0
...
Call Trace:
bpf_scx_reg+0x851/0x890
bpf_struct_ops_link_create+0xbd/0xe0
__sys_bpf+0x2862/0x2d20
__x64_sys_bpf+0x18/0x20
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0

This can also affect other loops, as they can call sched functions on tasks which are already destroyed. Let's fix it by making scx_task_iter_filtered_locked() also filter out TASK_DEAD tasks, and by switching the scx_task_iter_filtered() usage in scx_ops_enable() to scx_task_iter_filtered_locked(). The iterator's next function then always tests TASK_DEAD while holding the task's rq lock, and the rq lock guarantees that the task won't be freed, closing the race window.

@htejun htejun requested a review from Byte-Lab May 17, 2024 23:44
Collaborator

@Byte-Lab Byte-Lab left a comment


Nice debugging

@Byte-Lab Byte-Lab merged commit fb1406f into sched_ext May 18, 2024
1 check passed
@Byte-Lab Byte-Lab deleted the htejun/fix-task-iter-race branch May 18, 2024 01:40