Looking for a way to check the healthcheckers/checkers #2523
Comments
If the keepalived checker process is freezing, then we need to find the cause of that. However, since v2.0.10 is so old (over 6 years), the first thing to do is to upgrade keepalived to the current version (v2.3.2) and see if the problem still exists there. (You state that the checker process has died, but that is clearly not the case: it still exists, since you can send a signal to it.)
The next thing to understand is whether the keepalived checker process has totally stopped working, or whether it has stopped running one (or more) healthcheckers. You could try executing …
Something you could do to help identify whether the checker process has frozen/died is to add a CHECK_MISC checker to one (or more) real servers, and make the script that it runs simply write the current time to a file (it could be even simpler and just touch a file, and the file could be monitored by executing …).
Between kill -USR1 and adding a CHECK_MISC script, I think there is already enough in keepalived to determine whether it is still running. If the keepalived checker process really is freezing or losing checkers in the current version, then we will need to identify and resolve the cause of that, rather than putting effort into adding an API.
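To illustrate the touch-a-file idea, here is a minimal sketch of the monitoring side, assuming a checker script already touches a heartbeat file on every run (keepalived's configuration keyword for that checker type is MISC_CHECK). The function names, and any path or age threshold you pass in, are hypothetical and not part of keepalived.

```python
import os
import time


def heartbeat_age(path, now=None):
    """Return seconds since `path` was last modified, or None if it is missing."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return now - mtime


def checker_is_alive(path, max_age):
    """True if the heartbeat file exists and was touched within `max_age` seconds.

    `path` is whatever file the checker script touches (an assumption of this
    sketch); `max_age` should be comfortably larger than the checker interval.
    """
    age = heartbeat_age(path)
    return age is not None and age <= max_age
```

A monitoring agent could call checker_is_alive() periodically and raise an alert when it returns False. A stale file only shows that the script has not run recently; it does not say why.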
@tchernomax Do you have any update on this?
problem
Sometimes Keepalived_healthcheckers (the healthchecker of keepalived) stops working without dying or writing anything to the logs.
example
Here, between Jan 02 10:47:14 and Jan 03 18:38:23, server [10.36.6.35]:80 "came back to life", but Keepalived_healthcheckers was stuck/frozen and didn't notice it.
At Jan 03 18:38:23 I ran kill -9 $(cat /run/checkers.pid); the healthchecker respawned and everything came back to normal ([10.36.6.35]:80 came back into the backend).
note
kill -9 … is ok
solution/feature I would like
As the freeze is completely silent, I wonder if there is some signal, socket API, or anything else that would allow me to check the health of the healthcheckers.
My goal is to create a monitoring probe to check the liveness of the healthcheckers.
I looked at the code (current master) but didn't find anything.
I think this feature would benefit not only me, but also others.
Thank you in advance
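For reference, the simplest kind of probe might look like the sketch below, assuming a Unix host and the /run/checkers.pid file mentioned above. It reads the PID and sends signal 0, which only tests whether the process exists. It would therefore report a frozen-but-present checker as alive, so it can tell "dead" from "running" but not "frozen" from "healthy". The function names are hypothetical.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in `pid_file`, or None if it is missing or malformed."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def process_exists(pid):
    """True if a process with this PID exists (Unix only).

    Signal 0 delivers nothing; the kernel only performs the existence and
    permission checks.
    """
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

For example, process_exists(read_pid("/run/checkers.pid")) returns False when the PID file is absent or the process is gone.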