Split control-plane vs non-control plane alerts #50

Open
brancz opened this issue Jul 30, 2018 · 3 comments

Comments

@brancz
Member

brancz commented Jul 30, 2018

We include the kubernetes-mixin for monitoring in the kube-prometheus stack, and a common point of frustration is that all alerts are always shipped, even on managed Kubernetes clusters such as GKE or AKS. On those clusters it is often not possible to retrieve the metrics needed to monitor the control plane components.

While it would be possible to hand-pick or filter alerts, my feeling is that splitting the alerts into these two groups would also be beneficial in a world where a single Prometheus server is not sufficient to monitor an entire cluster, or in multi-tenant Kubernetes environments. In these scenarios we are seeing people assign a Prometheus server per tenant (typically made up of one or more namespaces), and that tenant's responsibility is not to monitor the Kubernetes cluster itself, but primarily its workload.
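For reference, filtering by hand today could look roughly like the sketch below. It assumes the mixin exposes its alerts as `prometheusAlerts.groups`; the `mixin` stand-in, the group names, and the `controlPlaneGroups` list are made up for illustration and are not the real contents of the kubernetes-mixin.

```jsonnet
// Stand-in for the imported mixin. In practice this would be an import;
// the group names here are illustrative assumptions, rules are elided.
local mixin = {
  prometheusAlerts+:: {
    groups+: [
      { name: 'kubernetes-apps', rules: [] },
      { name: 'kubernetes-resources', rules: [] },
      { name: 'kubernetes-system', rules: [] },
    ],
  },
};

// Hypothetical list of groups whose metrics are unavailable on a managed cluster.
local controlPlaneGroups = ['kubernetes-system'];

// Override the group list, keeping only groups that are not in the list above.
local filtered = mixin {
  prometheusAlerts+:: {
    groups: std.filter(
      function(group) std.count(controlPlaneGroups, group.name) == 0,
      super.groups
    ),
  },
};

filtered.prometheusAlerts
```

This works, but every user has to maintain their own list of group names, which is the part a split in the mixin itself would take off their hands.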

This would not be a breaking change, as the entrypoint (as in the .libsonnet file imported by people) for the alerting rules would stay the same.
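A minimal sketch of what I have in mind, assuming the alerts keep living under `prometheusAlerts.groups`. The two objects stand in for separate `.libsonnet` files; their names and the group names are hypothetical and only there to show the shape.

```jsonnet
// Would live in its own file, e.g. a workload-only alerts file (name hypothetical).
local workloadAlerts = {
  prometheusAlerts+:: {
    groups+: [
      { name: 'kubernetes-apps', rules: [] },  // rules elided
      { name: 'kubernetes-resources', rules: [] },
    ],
  },
};

// Would live in a second file holding only control-plane alerts (name hypothetical).
local controlPlaneAlerts = {
  prometheusAlerts+:: {
    groups+: [
      { name: 'kubernetes-system-apiserver', rules: [] },
      { name: 'kubernetes-system-scheduler', rules: [] },
    ],
  },
};

// The existing entrypoint simply composes both, so current imports are unaffected.
local allAlerts = workloadAlerts + controlPlaneAlerts;

{
  all: allAlerts.prometheusAlerts,  // what users get today
  workloadOnly: workloadAlerts.prometheusAlerts,  // what a tenant or managed cluster would import
}
```

Because `groups+:` appends, composing the two pieces yields the same group list as before, while a per-tenant Prometheus or a GKE/AKS user could import only the workload part.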

@tomwilkie @metalmatze

@metalmatze
Member

I'm all for it. Running clusters on GKE and Kubermatic, I've run into this as well and simply silenced the control-plane alerts for a year.

@karlskewes

Maybe we could apply the same include/exclude grouping to the Service Monitors here? https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/jsonnet/kube-prometheus/prometheus/prometheus.libsonnet#L221


This issue has not had any activity in the past 30 days, so the
stale label has been added to it.

  • The stale label will be removed if there is new activity
  • The issue will be closed in 7 days if there is no new activity
  • Add the keepalive label to exempt this issue from the stale check action

Thank you for your contributions!

@github-actions github-actions bot added stale and removed stale labels Dec 10, 2024