From 615ecfb8917ee0583aeab01e712ab5dd3830c5f0 Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthalion6@gmail.com> Date: Fri, 20 Dec 2024 11:10:30 +0100 Subject: [PATCH] Keep event thread after execve Currently when a thread from a thread group is doing execve, we expect that the kernel will reassign its tid at the end to the group leader, and simulate this behavior in the parser accordingly. The final result is all the threads in the thread group, except the leader, are removed from the cache. But looks like under certain circumstances it's possible to end up in a situation when the kernel is not doing the reassignment, yet the syscall ends successfully. This leads to a crash, since the parser removes the thread associated with the execve_x event, which will be accessed later during post processing -- and everything is expose in use-after-free. It's hard to reproduce artificially, but there are crash reports from the field, demonstrating the problem and confirming the patch fixes the crash. So far the issue was discovered only on ppc64le (Power10 to be more precise). To handle this, keep the event thread in place. Note, that tid here comes from the BPF probe directly, where it's captured via bpf_get_current_task/_btf. This means that the tid is the one really reported by the kernel, so keeping it represents the current state precisely. Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> --- userspace/libsinsp/parsers.cpp | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/userspace/libsinsp/parsers.cpp b/userspace/libsinsp/parsers.cpp index 639aeb0d09..21584d0f26 100644 --- a/userspace/libsinsp/parsers.cpp +++ b/userspace/libsinsp/parsers.cpp @@ -2255,8 +2255,18 @@ void sinsp_parser::parse_execve_exit(sinsp_evt *evt) { auto thread_ptr = thread.lock().get(); /* we don't want to remove the main thread since it is the one * running in this parser! + * + * Also make sure the thread to be removed is not the one + * associated with the event. Under normal conditions this + * should not happen, since the kernel will reassign tid before + * returning from the exec syscall. But there are crash reports, + * indicating possibility the original tid is kept in place, but + * the syscall still returns a success. + * + * To handle such cases gracefully, keep the event thread. */ - if(thread_ptr == nullptr || thread_ptr->is_main_thread()) { + if(thread_ptr == nullptr || thread_ptr->is_main_thread() || + thread_ptr->m_tid == evt->get_tinfo()->m_tid) { continue; } m_inspector->remove_thread(thread_ptr->m_tid);