lots of zombie processes? #106

wideglide · 2020-08-10T16:23:21Z

I think Angora may be failing because it does not reap its zombie child processes: they fill up the process table until no new processes can be launched. The fork server does appear to read the status of its child processes, so is there another place that spawns processes and never collects their exit status?

Do you know where this might be originating from?

Here's an example with base64 from LAVA-M; the number of defunct processes keeps growing over time:

<snip>
81689 ?        Zs     0:00 [base64] <defunct>
81708 ?        Zs     0:00 [base64] <defunct>
81738 ?        Zs     0:00 [base64] <defunct>
81757 ?        Zs     0:00 [base64] <defunct>
81778 ?        Zs     0:00 [base64] <defunct>
81794 ?        Zs     0:00 [base64] <defunct>
81808 ?        Zs     0:00 [base64] <defunct>
81842 ?        Zs     0:00 [base64] <defunct>
81898 ?        Zs     0:00 [base64] <defunct>
81900 ?        Zs     0:00 [base64] <defunct>
user@d-9-9-2:/dev/shm/fuzz/angora/7051337.7/who$ ps ax | grep '\[base64\] <defunct>'  | wc -l
1304

And here's the error log from another instance on the same host:

 WARN  angora::executor::forksrv  > Fail to read child_id -- Interrupted system call (os error 4)
 WARN  angora::executor::forksrv  > Fail to read child_id -- Interrupted system call (os error 4)
 WARN  angora::executor::forksrv  > Fail to read child_id -- Interrupted system call (os error 4)
 WARN  angora::executor::forksrv  > Fail to read child_id -- Interrupted system call (os error 4)
 ERROR angora::executor::forksrv  > FATAL: Failed to spawn child. Reason: Resource temporarily unavailable (os error 11)
thread '<unnamed>' panicked at 'explicit panic', fuzzer/src/executor/forksrv.rs:66:17
stack backtrace:
 WARN  angora::executor::forksrv  > Unable to request new process from frok server! -1
 ERROR angora::executor::forksrv  > FATAL: Failed to spawn child. Reason: Resource temporarily unavailable (os error 11)
thread '<unnamed>' panicked at 'explicit panic', fuzzer/src/executor/forksrv.rs:66:17
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1537
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:218
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:486
  11: std::panicking::begin_panic
  12: angora::executor::forksrv::Forksrv::new
  13: angora::executor::executor::Executor::rebind_forksrv
  14: angora::executor::executor::Executor::run
  15: angora::search::afl::AFLFuzz::run
  16: angora::fuzz_loop::fuzz_loop
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1537
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:218
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:486
  11: std::panicking::begin_panic
  12: angora::executor::forksrv::Forksrv::new
  13: angora::executor::executor::Executor::rebind_forksrv
  14: angora::executor::executor::Executor::run_with_cond
  15: angora::search::gd::GdSearch::cal_gradient
  16: angora::search::gd::GdSearch::run
  17: angora::fuzz_loop::fuzz_loop
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
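The two raw codes in this log are Linux errno values: `os error 4` is EINTR (a blocking call interrupted by a signal) and `os error 11` is EAGAIN. The fork(2) man page lists EAGAIN for hitting RLIMIT_NPROC, threads-max or pid_max, and zombies keep their PID and process-table slot until they are reaped, which fits the hypothesis above. A minimal sketch mapping those codes through Rust's `std::io::ErrorKind`; `describe` is a hypothetical helper and the Linux errno numbering is an assumption:

```rust
use std::io::{self, ErrorKind};

/// Hypothetical helper: explain a raw OS error code from the log.
/// Assumes Linux errno numbering (EINTR = 4, EAGAIN = 11).
fn describe(raw: i32) -> &'static str {
    match io::Error::from_raw_os_error(raw).kind() {
        // EINTR: a blocking syscall (e.g. a socket read) was interrupted
        // by a signal before it completed; retrying is usually fine.
        ErrorKind::Interrupted => "interrupted by a signal; retry the call",
        // EAGAIN when spawning: a process/thread limit was hit. Unreaped
        // zombies still hold their PIDs, so they can cause this.
        ErrorKind::WouldBlock => "resource temporarily unavailable; process limit likely reached",
        _ => "other error",
    }
}

fn main() {
    for raw in [4, 11] {
        println!("os error {raw}: {}", describe(raw));
    }
}
```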

wideglide · 2020-08-20T19:03:38Z

Another instance, with about 6K defunct processes per fuzzing process (17,727 in total across the three fuzzers below):

localuser@bot12b:~/archive/logs$ ps ax | wc -l
18278
localuser@bot12b:~/archive/logs$ ps ax | grep '\[duk\] <defunct>' | wc -l
5909
localuser@bot12b:~/archive/logs$ ps ax | grep '\[jq\] <defunct>' | wc -l
11817
localuser@bot12b:~/archive/logs$ ps ax | grep ' <defunct>' | wc -l
17727
localuser@bot12b:~/archive/logs$ pgrep -af angora/bin/fuzzer
4940 /angora/bin/fuzzer --sync_afl -i inputs -o outputs -t ./lava-ang/bin/jq.tt -j 2 --time_limit 9.0 -- ./lava-ang/bin/jq @@
30609 /angora/bin/fuzzer --sync_afl -i inputs -o outputs -t ./lava-ang/bin/jq.tt -j 2 --time_limit 9.0 -- ./lava-ang/bin/jq @@
31005 /angora/bin/fuzzer --sync_afl -i inputs -o outputs -t ./lava-ang/bin/duk.tt -j 2 --time_limit 2.0 -- ./lava-ang/bin/duk @@
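To attribute the defunct processes to individual fuzzers, they can be grouped by parent PID and compared with the `pgrep` output. A minimal, Linux-only sketch that reads `/proc/<pid>/stat`; `parse_stat` and `zombies_by_parent` are hypothetical helpers, not part of Angora. If the zombies are direct children of the fuzzer, their parent PIDs should match the fuzzer PIDs listed above:

```rust
use std::collections::BTreeMap;
use std::fs;

/// Parse (comm, state, ppid) from a /proc/<pid>/stat line. The command name
/// is in parentheses and may contain spaces or ')', so split on the last ')'.
fn parse_stat(line: &str) -> Option<(String, char, u32)> {
    let open = line.find('(')?;
    let close = line.rfind(')')?;
    let comm = line.get(open + 1..close)?.to_string();
    let mut rest = line.get(close + 1..)?.split_whitespace();
    let state = rest.next()?.chars().next()?;
    let ppid = rest.next()?.parse().ok()?;
    Some((comm, state, ppid))
}

/// Count zombie ('Z' state) entries per (parent PID, command name).
fn zombies_by_parent<'a>(stats: impl IntoIterator<Item = &'a str>) -> BTreeMap<(u32, String), usize> {
    let mut counts = BTreeMap::new();
    for (comm, state, ppid) in stats.into_iter().filter_map(parse_stat) {
        if state == 'Z' {
            *counts.entry((ppid, comm)).or_insert(0) += 1;
        }
    }
    counts
}

fn main() -> std::io::Result<()> {
    // Every numeric directory under /proc is a process.
    let mut stats = Vec::new();
    for entry in fs::read_dir("/proc")? {
        let path = entry?.path();
        let is_pid = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.bytes().all(|b| b.is_ascii_digit()));
        if is_pid {
            // A process may vanish between read_dir and this read; skip it.
            if let Ok(s) = fs::read_to_string(path.join("stat")) {
                stats.push(s);
            }
        }
    }
    for ((ppid, comm), n) in zombies_by_parent(stats.iter().map(String::as_str)) {
        println!("{n:>6} zombie(s) of {comm} under parent {ppid}");
    }
    Ok(())
}
```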

229c9cf0 · 2025-02-21T14:08:37Z

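fuzzer/src/executor/forksrv.rs:62 spawns the fork-server child process but immediately discards the returned `Child` handle (`Ok(_) => ()`). Dropping a `std::process::Child` neither kills the process nor waits for it, so nothing ever collects its exit status and it stays a zombie once it exits.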
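A minimal, Linux-only sketch of that behaviour, assuming `/proc` is mounted and a `true` binary is on `PATH`; `proc_state` and `wait_for_state` are hypothetical helpers, not Angora code. It checks the process state after dropping one `Child` handle unwaited and after calling `wait()` on another:

```rust
use std::fs;
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

/// One-letter state ('R', 'S', 'Z', ...) of `pid` from /proc/<pid>/stat,
/// or None once the process has been reaped and its entry is gone.
fn proc_state(pid: u32) -> Option<char> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    // The state field follows the parenthesised command name.
    let after_comm = stat.get(stat.rfind(')')? + 1..)?;
    after_comm.split_whitespace().next()?.chars().next()
}

/// Poll until `pid` reaches state `want`, giving up after roughly `tries * 10ms`.
fn wait_for_state(pid: u32, want: Option<char>, tries: u32) -> Option<char> {
    let mut state = proc_state(pid);
    for _ in 0..tries {
        if state == want {
            break;
        }
        sleep(Duration::from_millis(10));
        state = proc_state(pid);
    }
    state
}

fn main() -> std::io::Result<()> {
    // Like `Ok(_) => ()`: spawn, then drop the Child handle without waiting.
    let pid = Command::new("true").spawn()?.id();
    // Nothing waits on it, so after exiting it remains a zombie ('Z').
    println!("dropped handle: {:?}", wait_for_state(pid, Some('Z'), 200));

    // Keeping the handle and calling wait() reaps it; its /proc entry disappears.
    let mut child = Command::new("true").spawn()?;
    let pid = child.id();
    let status = child.wait()?;
    println!("waited ({status}): {:?}", proc_state(pid));
    Ok(())
}
```

The first child stays defunct until the demo itself exits, which mirrors how the zombies accumulate for as long as the fuzzer runs.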

I'm seeing roughly one zombie per minute, which suggests this isn't a hot path. Keeping the `Child` handle and calling `wait()` on it in `Drop` should fix the issue without adding significant overhead. That said, I haven't tested this beyond "it still runs and zombies no longer accumulate", I don't really know Rust, I haven't read the rest of the code, yadda yadda, so there may well be unintended side effects.

One potential issue I noticed: zombies sometimes persist for up to ≈40s before they're collected. This might be fine and just mean that the process has already exited while `Drop` hasn't run yet to collect it. It could also mean that something in the parent process blocks somewhere and slows everything down. No clue / don't care; it's plausibly still better than PID exhaustion over time.
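If the fork server has not exited by the time `Drop` runs, `Child::wait()` blocks until it does, which is one way the parent could stall. A bounded alternative, as a sketch only: `reap_with_timeout`, the grace period and the 10 ms poll interval are hypothetical, and this is untested against Angora, where killing the fork server early may or may not be safe:

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Reap `child` without blocking indefinitely: poll try_wait() until `grace`
/// elapses, then kill() it and wait() so it cannot linger as a zombie.
fn reap_with_timeout(child: &mut Child, grace: Duration) -> io::Result<ExitStatus> {
    let deadline = Instant::now() + grace;
    loop {
        // Ok(Some(status)) once the child has exited (this also reaps it);
        // Ok(None) while it is still running.
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        if Instant::now() >= deadline {
            break;
        }
        sleep(Duration::from_millis(10));
    }
    // Still running after the grace period: SIGKILL it, then collect the status.
    child.kill()?;
    child.wait()
}

fn main() -> io::Result<()> {
    // A child that exits on its own is reaped within the grace period.
    let mut quick = Command::new("true").spawn()?;
    println!("quick: {}", reap_with_timeout(&mut quick, Duration::from_secs(1))?);
    // A child that outlives the grace period is killed, then reaped.
    let mut slow = Command::new("sleep").arg("30").spawn()?;
    println!("slow: {}", reap_with_timeout(&mut slow, Duration::from_millis(100))?);
    Ok(())
}
```

Because `try_wait()` never blocks, the worst-case delay in `Drop` is the grace period plus the time needed to collect a SIGKILLed process.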

---
 fuzzer/src/executor/forksrv.rs | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fuzzer/src/executor/forksrv.rs b/fuzzer/src/executor/forksrv.rs
index bec8098..0dc209d 100644
--- a/fuzzer/src/executor/forksrv.rs
+++ b/fuzzer/src/executor/forksrv.rs
@@ -23,6 +23,7 @@ pub struct Forksrv {
     path: String,
     pub socket: UnixStream,
     uses_asan: bool,
+    child: std::process::Child,
 }

 impl Forksrv {
@@ -48,7 +49,7 @@ impl Forksrv {
         let mut envs_fk = envs.clone();
         envs_fk.insert(ENABLE_FORKSRV.to_string(), String::from("TRUE"));
         envs_fk.insert(FORKSRV_SOCKET_PATH_VAR.to_string(), socket_path.to_owned());
-        match Command::new(&target.0)
+        let child = match Command::new(&target.0)
             .args(&target.1)
             .stdin(Stdio::null())
             .envs(&envs_fk)
@@ -59,7 +60,7 @@ impl Forksrv {
             .pipe_stdin(fd, is_stdin)
             .spawn()
         {
-            Ok(_) => (),
+            Ok(child) => child,
             Err(e) => {
                 error!("FATAL: Failed to spawn child. Reason: {}", e);
                 panic!();
@@ -88,6 +89,7 @@ impl Forksrv {
             path: socket_path.to_owned(),
             socket,
             uses_asan,
+            child,
         }
     }

@@ -167,6 +169,10 @@ impl Drop for Forksrv {
         if self.socket.write(&fin).is_err() {
             debug!("Fail to write socket !!  FIN ");
         }
+        match self.child.wait() {
+            Ok(s) => debug!("Forksrv child reaped with status {:?}", s),
+            Err(e) => warn!("Forksrv child wait failure: {}", e),
+        }
         let path = Path::new(&self.path);
         if path.exists() {
             if fs::remove_file(&self.path).is_err() {
