Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(jstzd): listen to os signals #686

Merged
merged 1 commit into from
Dec 11, 2024
Merged

Conversation

huancheng-trili
Copy link
Collaborator

@huancheng-trili huancheng-trili commented Nov 25, 2024

Context

Part of JSTZ-189.

JSTZ-189

Description

Terminate jstzd on SIGTERM and SIGINT.

Note that we still need to find a way to shut down all processes when SIGKILL hits the server process. What happens now is that the subprocesses become orphans after the server is terminated.

Manually testing the PR

  • Integration test: added back default_config with SIGTERM and added one test case with SIGINT

@huancheng-trili huancheng-trili force-pushed the huanchengchang-jstz-168 branch 5 times, most recently from c259d03 to 797bb89 Compare November 26, 2024 15:20
@huancheng-trili huancheng-trili force-pushed the huanchengchang-jstz-189 branch 5 times, most recently from 9b5966c to 9fae526 Compare November 26, 2024 16:28
Copy link

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 47.07%. Comparing base (c1cfe35) to head (a44e3cd).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/jstzd/src/lib.rs 0.00% 2 Missing ⚠️
Files with missing lines Coverage Δ
crates/jstzd/src/lib.rs 22.72% <0.00%> (-5.85%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1cfe35...a44e3cd. Read the comment docs.

@huancheng-trili huancheng-trili force-pushed the huanchengchang-jstz-168 branch 2 times, most recently from b7a7a2a to 620f38f Compare November 27, 2024 14:52
@huancheng-trili huancheng-trili force-pushed the huanchengchang-jstz-189 branch 2 times, most recently from 53b85d5 to 58fb599 Compare November 27, 2024 16:29
@huancheng-trili huancheng-trili marked this pull request as ready for review November 28, 2024 11:48
@zcabter zcabter self-requested a review December 2, 2024 11:55
Copy link
Collaborator

@ryutamago ryutamago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!
For handling the SIGKILL, i'm thikning we could assgin subprocesses to the same process group as the parent?
if we could set the process group ID of the main server process to its own PID, and make sure all the octez tasks inherits it, when the jstzd server process is killed, it should clean up all the sub processses.
I found a crate nix where we can use unix specfic process management feature!
wdyt ? @huancheng-trili @zcabter

@huancheng-trili
Copy link
Collaborator Author

Killing the server process won't take down the entire process group. Child processes do have a new parent but they keep running in the same group. It's tricky because we don't have control over the octez executables and thus we cannot make them listen to anything relevant. (btw with the current setup octez process are already in the same process group as the server process.)

If we want to guarantee that the sandbox is clean, we can use a container with everything running inside and create a wrapper server that handles path mapping and spawns the container, but that's a bit messy. For now we can probably just ignore this issue and deal with it later on.

@ryutamago
Copy link
Collaborator

ryutamago commented Dec 2, 2024

@huancheng-trili

Killing the server process won't take down the entire process group

you are right here.

i've done more digging into it and seems like we could set the PR_SET_PDEATHSIG for Child Processes:
if we modify the the spawning of child processes to include a pre_exec closure that sets the PR_SET_PDEATHSIG to say SIGTERM, This allows that if the parent process is terminated (e.g., via SIGKILL), the child processes receive a SIGTERM and can shut down gracefully. The downside of this approach is that we will have to implment the listener for all of the octez child processes (node, baker, rollup)

@huancheng-trili huancheng-trili force-pushed the huanchengchang-jstz-168 branch from 27a422d to 4fd53ec Compare December 2, 2024 17:06
@zcabter
Copy link
Collaborator

zcabter commented Dec 4, 2024

We discussed potential solutions offline, namely requiring a sidecar process. I think we can put SIGKILL support on the backlog for now

@huancheng-trili huancheng-trili force-pushed the huanchengchang-jstz-168 branch 4 times, most recently from 2da8de5 to c7f9a1a Compare December 10, 2024 12:03
Base automatically changed from huanchengchang-jstz-168 to main December 11, 2024 09:55
@huancheng-trili huancheng-trili merged commit 686cffc into main Dec 11, 2024
4 of 5 checks passed
@huancheng-trili huancheng-trili deleted the huanchengchang-jstz-189 branch December 11, 2024 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants