Use systemd watchdog to restart a node #2004

Closed
wants to merge 15 commits into from

shawn-zil (Contributor) commented on Dec 12, 2024

This uses application-layer detection to determine whether a node is stuck. If it is, systemd is triggered to restart the service.
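
For readers unfamiliar with the mechanism: under the systemd watchdog protocol, a service periodically sends `WATCHDOG=1` to the datagram socket named in `$NOTIFY_SOCKET`, and systemd restarts the unit if the pings stop for longer than `WatchdogSec=` (given a suitable `Restart=` setting such as `on-watchdog`). The following is a minimal Python sketch of that idea, not the code in this PR, which is not shown here. The `ProgressWatchdog` class, its `stall_timeout` parameter, and the notion of "progress" are hypothetical names chosen for illustration.

```python
import os
import socket
import time
from typing import Optional


def sd_notify(message: str, socket_path: Optional[str] = None) -> bool:
    """Send one notification datagram to systemd.

    Returns False when no notify socket is configured, for example when
    the process is not started by systemd.
    """
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


class ProgressWatchdog:
    """Hypothetical helper: ping systemd only while the node makes progress."""

    def __init__(self, stall_timeout: float, clock=time.monotonic):
        self.stall_timeout = stall_timeout
        self.clock = clock
        self.last_progress = clock()

    def record_progress(self) -> None:
        # Call this whenever the application-level health signal advances
        # (assumption: something like "a new block was processed").
        self.last_progress = self.clock()

    def is_stuck(self) -> bool:
        return self.clock() - self.last_progress > self.stall_timeout

    def tick(self, socket_path: Optional[str] = None) -> bool:
        """Send a keep-alive unless the node looks stuck.

        Withholding the ping lets systemd's watchdog timer expire,
        which is what causes the restart.
        """
        if self.is_stuck():
            return False
        return sd_notify("WATCHDOG=1", socket_path)
```

In this sketch `tick()` would be called on a timer at an interval comfortably shorter than `WatchdogSec=`, commonly about half of it.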

shawn-zil linked an issue on Dec 12, 2024 that may be closed by this pull request
shawn-zil changed the title from "Use systemd watchdog to restart nodes" to "Use systemd watchdog to restart a node" on Dec 12, 2024

🐰 Bencher Report

Branch: 1917-watchdog
Testbed: self-hosted

| Benchmark | Latency, ns (Result Δ%) | Upper Boundary, ns (Limit %) |
|---|---|---|
| process-empty/process-empty | 9,361,700.00 (+1.29%) | 10,732,542.32 (87.23%) |
| produce-full/produce-full | 1,947,200,000.00 (-18.90%) | 3,205,577,511.14 (60.74%) |

shawn-zil (Contributor, Author) commented

It turns out that this is the wrong mechanism: the service runs inside Docker, and systemd is external to the container. Closing this and reworking it to perform an internal restart of the node instead.
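
As a rough illustration of what an "internal restart" could look like, the sketch below supervises a node task inside the same process and restarts it when its heartbeat goes stale. It is an assumption-laden example, not the approach taken in the follow-up work, which is not shown in this thread. The names `supervise`, `run_node`, `heartbeat`, `stall_timeout`, `check_interval`, and `max_restarts` are all hypothetical.

```python
import asyncio
import time


async def supervise(run_node, stall_timeout, check_interval, max_restarts):
    """Run `run_node(heartbeat)` and restart it if the heartbeat goes stale.

    `run_node` is a hypothetical coroutine function that calls `heartbeat()`
    whenever it makes progress. Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        last_beat = [time.monotonic()]

        def heartbeat(beat=last_beat):
            beat[0] = time.monotonic()

        task = asyncio.create_task(run_node(heartbeat))
        # Poll until the node either finishes or stops reporting progress.
        while not task.done():
            await asyncio.sleep(check_interval)
            if time.monotonic() - last_beat[0] > stall_timeout:
                break

        if task.done():
            task.result()  # re-raise a crash rather than hiding it
            return restarts

        # The node is stuck: cancel it and wait for the cancellation to land.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        if restarts >= max_restarts:
            raise RuntimeError("node kept stalling; giving up")
        restarts += 1
```

Unlike the systemd watchdog, this keeps detection and recovery inside the container, so it does not depend on a service manager being reachable from the process.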

Successfully merging this pull request may close these issues.

Detect and restart stuck nodes