Use async pid fd instead of blocking waitid to wait for a child process to exit #745
base: main
LGTM!
I still need to figure out why CI is failing. I think the http_poxy container is not receiving the SIGINT for some reason, though I don't see how that could relate to this change. I'll keep digging tomorrow.
…ss to exit

Signed-off-by: Jorge Prendes <[email protected]>
Great! Thanks for working on this.
What I'd like to see, perhaps as a follow-up, is some unit tests for edge cases like:
- What happens if the PID does not exist
- What happens if the original PID is reused by a new process after the first one exits
- What happens when calling `wait()` after containerd-shim has already reaped the process
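On the PID-reuse case: a pidfd refers to one specific process, not to a numeric PID, so once the fd is open, reuse of the number by an unrelated process cannot redirect the wait. The remaining window is between learning the PID and calling `pidfd_open`. The other two cases can be exercised directly. Below is a minimal, Linux-only sketch of such tests. It declares `syscall` and `waitid` from libc directly instead of using the `libc`/`nix` crates the PR appears to use, and it assumes x86_64 or aarch64 (where `pidfd_open` is syscall 434) and a kernel of 5.4 or newer (for `P_PIDFD`). The helper names `pidfd_open` and `waitid_pidfd` are illustrative, not the PR's functions.

```rust
use std::io;
use std::os::fd::{AsRawFd, FromRawFd, OwnedFd, RawFd};
use std::os::raw::{c_int, c_long, c_uint, c_void};

extern "C" {
    fn syscall(num: c_long, ...) -> c_long;
    fn waitid(idtype: c_uint, id: c_uint, infop: *mut c_void, options: c_int) -> c_int;
}

const SYS_PIDFD_OPEN: c_long = 434; // pidfd_open(2) on x86_64 and aarch64, Linux >= 5.3
const P_PIDFD: c_uint = 3; // waitid(2) id type for pidfds, Linux >= 5.4
const WEXITED: c_int = 4;
const ESRCH: i32 = 3;
const ECHILD: i32 = 10;

/// Open a pidfd for `pid` and take ownership of the returned descriptor.
fn pidfd_open(pid: i32) -> io::Result<OwnedFd> {
    let fd = unsafe { syscall(SYS_PIDFD_OPEN, pid as c_long, 0 as c_long) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(unsafe { OwnedFd::from_raw_fd(fd as RawFd) })
}

/// Reap the process behind `fd`, returning the raw errno on failure.
fn waitid_pidfd(fd: &OwnedFd) -> Result<(), i32> {
    // siginfo_t is 128 bytes on Linux; this check only needs the return code.
    let mut info = [0u64; 16];
    let rc = unsafe {
        waitid(P_PIDFD, fd.as_raw_fd() as c_uint, info.as_mut_ptr().cast(), WEXITED)
    };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error().raw_os_error().unwrap_or(0))
    }
}
```

A PID above the kernel's `PID_MAX_LIMIT` (for example `i32::MAX`) can never exist, so `pidfd_open` should fail with `ESRCH`. For the "already reaped" case, `std::process::Child::wait` can stand in for the containerd-shim reaper: after it runs, `waitid` on the still-open pidfd fails with `ECHILD`, which is exactly the branch the PR handles.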
    return Err(std::io::Error::last_os_error().into());
}
let fd = unsafe { OwnedFd::from_raw_fd(pidfd as RawFd) };
let subs = monitor_subscribe(Topic::Pid)?;
question: shouldn't we start the subscription monitor before the syscalls?
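The ordering matters if the event source only delivers events to subscribers that already exist at publish time: if the reaper publishes the exit between the syscall and the subscription, the event is lost. The in-memory sketch below illustrates that failure mode. `Monitor`, `subscribe`, `publish` and `find_exit` are hypothetical stand-ins, not containerd-shim's actual `monitor_subscribe` API, and whether the real monitor replays past events is not established here.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical exit-event monitor: events go only to subscribers that
/// exist when `publish` is called (no replay of past events).
struct Monitor {
    subscribers: Mutex<Vec<Sender<(i32, i32)>>>, // (pid, exit status)
}

impl Monitor {
    fn new() -> Self {
        Monitor { subscribers: Mutex::new(Vec::new()) }
    }

    fn subscribe(&self) -> Receiver<(i32, i32)> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    fn publish(&self, pid: i32, status: i32) {
        for tx in self.subscribers.lock().unwrap().iter() {
            // A dropped receiver is not an error for the publisher.
            let _ = tx.send((pid, status));
        }
    }
}

/// Drain already-buffered events (without blocking) looking for `pid`.
fn find_exit(rx: &Receiver<(i32, i32)>, pid: i32) -> Option<i32> {
    rx.try_iter().find(|&(p, _)| p == pid).map(|(_, status)| status)
}
```

Subscribing after the reaper has published finds nothing, while subscribing first leaves the event buffered in the channel until the waiter looks for it.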
Err(Errno::ECHILD) => {
    // The process has already been reaped by the containerd-shim reaper.
    // Get the status from there.
    let status = wait_pid(self.pid, self.subs);
If containerd has already reaped the process, will this block the async runtime forever?
If so, I would suggest running this in a `spawn_blocking` call. I am not entirely sure whether containerd-shim handles the subscription, or how reliable that is.
Just to clarify: this PR does not resolve that issue, right? If so, could you please open an issue against the repo so that we can track it.
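`tokio::task::spawn_blocking` moves a blocking call onto a dedicated thread pool so it cannot stall the async worker threads, but by itself it does not bound how long the caller waits. Since the concern is a wait that may never return, the std-only sketch below shows the same offloading idea plus a timeout. `run_blocking_with_timeout` is a hypothetical helper, not part of runwasi or tokio; in tokio the analogous composition would wrap the `spawn_blocking` handle in `tokio::time::timeout`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run a blocking call on its own thread and wait at most `timeout` for it.
/// Returns None on timeout. The worker thread cannot be cancelled and keeps
/// running, but the caller is no longer stuck behind it.
fn run_blocking_with_timeout<T, F>(f: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(timeout).ok()
}
```

The trade-off is that a wait which truly never returns still pins one thread; the timeout only gives the caller a chance to report an error instead of hanging.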
This PR replaces the blocking `waitid` call with an async implementation based on a pidfd (process file descriptor).
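A pidfd becomes readable once the process it refers to exits, and that readiness is what lets the wait become async. The blocking `poll` sketch below demonstrates the primitive. An async runtime would register the same fd with its reactor (for example `tokio::io::unix::AsyncFd` and its `readable()` method) instead of parking a thread. This is Linux-only and assumes x86_64 or aarch64 (syscall number 434) and Linux 5.3 or newer. `PollFd` mirrors `struct pollfd`, and `wait_exit_readiness` is an illustrative name.

```rust
use std::io;
use std::os::fd::{AsRawFd, FromRawFd, OwnedFd, RawFd};
use std::os::raw::{c_int, c_long, c_short, c_ulong};

/// Mirrors `struct pollfd` from <poll.h>.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: c_short,
    revents: c_short,
}

extern "C" {
    fn syscall(num: c_long, ...) -> c_long;
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: c_int) -> c_int;
}

const SYS_PIDFD_OPEN: c_long = 434; // pidfd_open(2) on x86_64 and aarch64
const POLLIN: c_short = 0x1;

/// Open a pidfd for `pid` and take ownership of the returned descriptor.
fn pidfd_open(pid: i32) -> io::Result<OwnedFd> {
    let fd = unsafe { syscall(SYS_PIDFD_OPEN, pid as c_long, 0 as c_long) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(unsafe { OwnedFd::from_raw_fd(fd as RawFd) })
}

/// Ok(true) once the process behind `fd` has exited, Ok(false) if
/// `timeout_ms` elapses first. Does not reap the process.
fn wait_exit_readiness(fd: &OwnedFd, timeout_ms: c_int) -> io::Result<bool> {
    let mut pfd = PollFd { fd: fd.as_raw_fd(), events: POLLIN, revents: 0 };
    let rc = unsafe { poll(&mut pfd, 1, timeout_ms) };
    if rc < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(rc == 1 && (pfd.revents & POLLIN) != 0)
}
```

Readiness only signals the exit. The status still has to be collected with `waitid`, which is where the race with containerd-shim's reaper comes in.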
There is a race condition with containerd-shim, which also reaps child processes. If the child process has already been reaped, the implementation queries containerd-shim for the process's exit status.
Note that this race condition is already present in the current implementation, but there runwasi's blocking `waitid` is very likely to win the race; going async evens out the odds between runwasi and containerd-shim.
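The reap-or-fall-back flow can be sketched as follows: try `waitid(P_PIDFD, …)` and, if another reaper won (`ECHILD`), ask a fallback for the status. This is a minimal, Linux-only sketch under stated assumptions. The `fallback` closure is a hypothetical stand-in for querying containerd-shim. The `siginfo_t` field offsets assume the generic Linux layout (x86_64, aarch64). Reporting signal deaths as 128 + signal is a shell convention chosen here, not something the PR specifies.

```rust
use std::io;
use std::os::fd::{AsRawFd, FromRawFd, OwnedFd, RawFd};
use std::os::raw::{c_int, c_long, c_uint, c_void};

extern "C" {
    fn syscall(num: c_long, ...) -> c_long;
    fn waitid(idtype: c_uint, id: c_uint, infop: *mut c_void, options: c_int) -> c_int;
}

const SYS_PIDFD_OPEN: c_long = 434; // x86_64 / aarch64, Linux >= 5.3
const P_PIDFD: c_uint = 3; // Linux >= 5.4
const WEXITED: c_int = 4;
const ECHILD: i32 = 10;
const CLD_EXITED: i32 = 1;
// si_status as an i32 index: the SIGCHLD union starts at byte 16 on 64-bit
// targets (12 on 32-bit) and si_status follows si_pid and si_uid.
const SI_STATUS: usize = if cfg!(target_pointer_width = "64") { 6 } else { 5 };

/// 128-byte siginfo_t buffer, aligned like the kernel struct.
#[repr(C, align(8))]
struct SigInfo([i32; 32]);

fn pidfd_open(pid: i32) -> io::Result<OwnedFd> {
    let fd = unsafe { syscall(SYS_PIDFD_OPEN, pid as c_long, 0 as c_long) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(unsafe { OwnedFd::from_raw_fd(fd as RawFd) })
}

/// Reap the child behind `fd` and return its exit status; if another
/// reaper already collected it (ECHILD), defer to `fallback` instead.
fn exit_status(fd: &OwnedFd, fallback: impl FnOnce() -> io::Result<i32>) -> io::Result<i32> {
    let mut info = SigInfo([0; 32]);
    let infop = (&mut info as *mut SigInfo).cast::<c_void>();
    let rc = unsafe { waitid(P_PIDFD, fd.as_raw_fd() as c_uint, infop, WEXITED) };
    if rc == 0 {
        let (si_code, si_status) = (info.0[2], info.0[SI_STATUS]);
        // Normal exit: si_status is the exit code. Killed or dumped:
        // si_status is the signal number, reported as 128 + signal.
        return Ok(if si_code == CLD_EXITED { si_status } else { 128 + si_status });
    }
    let err = io::Error::last_os_error();
    if err.raw_os_error() == Some(ECHILD) {
        fallback()
    } else {
        Err(err)
    }
}
```

When runwasi wins the race, the status comes from `waitid` itself. When `Child::wait` (standing in for the shim's reaper) gets there first, the fallback's answer is returned instead.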
This PR is in preparation for moving the whole shim implementation to async.