[Question]: How to handle Watchdog, which are always critical #61
Comments
Hi, thanks for the feedback. I think you raise an interesting point regarding "dead man's switch" alerts, which are a common use case. Any ideas are welcome. Cheers,
Just offloading some thoughts:
I find the idea of excluding useful, but there shouldn't be any 'firing' checks in Prometheus (except Watchdog). Perhaps we can differentiate between the two.

Watchdog
How do other alert managers deal with this? Is there a standard? Hard-code it: if Watchdog exists, flip its state, and otherwise do not consider it?

Feature "Filter/Exclude"
Why this is useful: see prometheus-community/helm-charts#5025. The default ruleset of 'Prometheus Rules' contains the check 'PrometheusNotConnectedToAlertmanagers' and others, which alert because I have deactivated Alertmanager. My solution: I now maintain this 'Prometheus Rules' ruleset myself and have deleted 5 of the rules.

My values from "kube-prometheus-stack" (shortened):
Many are false because we use k3s.
I've been thinking about this some more: a mixture of filter/exclude flags and a flag to define the expected alert state should do it. I will do some experiments this week and report back.
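The combination described above (exclude filters plus an expected alert state) can be pictured with a minimal sketch. This is not the plugin's implementation: the function `evaluate`, its parameters `exclude` and `expect_firing`, the alert name `SomeOtherAlert`, and the mapping of alert states to exit codes are all assumptions made for illustration.

```python
import re

# Conventional monitoring-plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(alerts, exclude=(), expect_firing=()):
    """Return an exit code for a list of (alert_name, state) tuples.

    exclude:       regex patterns; matching alerts are ignored entirely.
    expect_firing: alert names that must be firing (dead man's switch);
                   their absence is CRITICAL, their presence is OK.
    """
    status = OK
    seen_firing = set()
    for name, state in alerts:
        if state == "firing":
            seen_firing.add(name)
        # Expected-firing and excluded alerts never raise the status here.
        if name in expect_firing or any(re.fullmatch(p, name) for p in exclude):
            continue
        if state == "firing":
            status = max(status, CRITICAL)
        elif state == "pending":
            status = max(status, WARNING)
    # Inverted logic: a dead man's switch is healthy only while it fires.
    for name in expect_firing:
        if name not in seen_firing:
            status = CRITICAL
    return status


if __name__ == "__main__":
    alerts = [
        ("Watchdog", "firing"),
        ("PrometheusNotConnectedToAlertmanagers", "firing"),
    ]
    print(evaluate(alerts,
                   exclude=[r"PrometheusNotConnected.*"],
                   expect_firing=["Watchdog"]))
```

With this logic a firing Watchdog no longer forces a critical result, while a missing Watchdog does, which matches the "meta monitoring" intent of a dead man's switch.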
@wattebausch I started to work on this; you can check out the code in #67. It adds the option to exclude certain alerts from the list. I'm not 100% sure yet whether this is sufficient. Since a Watchdog/Deadmanswitch is itself a kind of "meta monitoring", one could argue that it checks itself, because you want to get an alert if it doesn't do its work. Am I wrong? Does this make sense?
@RincewindsHat any thoughts?
Works from the command line.
Yes. Possibly as a later feature, and work with the exclude for now?
Ask a question
Hello everyone,
We use Prometheus for our Kubernetes cluster. As described in your article (https://blog.netways.de/blog/2023/07/25/check_prometheus-ist-jetzt-oeffentlich-verfuegbar/), we define all rules there, and we use the default rules such as general.rules.
We would like to use your plugin, but we have a question about how you handle the watchdog. This is set to active by default.
https://runbooks.prometheus-operator.dev/runbooks/general/watchdog
(If not firing then it should alert external systems that this alerting system is no longer working.)
If we now query ‘alerts’ in this way, we always have a critical state.
Thanks for your plugin, it looks great.
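The situation described in the question can be reproduced with a small sketch. It assumes a check that treats every firing alert as critical; this mapping is an assumption for illustration, not a description of the plugin's code. The sample payload is hand-written in the shape returned by Prometheus' `/api/v1/alerts` endpoint, and `naive_status` is a hypothetical helper.

```python
import json

# Conventional monitoring-plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hand-written sample in the shape of a /api/v1/alerts response,
# containing only the always-firing Watchdog alert.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "alerts": [
      {"labels": {"alertname": "Watchdog"}, "state": "firing"}
    ]
  }
}
""")


def naive_status(response):
    """Map any firing alert to CRITICAL and any pending alert to WARNING."""
    status = OK
    for alert in response["data"]["alerts"]:
        if alert["state"] == "firing":
            status = max(status, CRITICAL)
        elif alert["state"] == "pending":
            status = max(status, WARNING)
    return status


if __name__ == "__main__":
    # Watchdog fires by design, so the naive check is always critical.
    print(naive_status(SAMPLE_RESPONSE))  # prints 2
```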