title | draft | date
---|---|---
Adopters | false | 2021-03-08T23:50:39+01:00
This document tracks people and use cases for the Prometheus Operator in production. By creating a list of production use cases, we hope to build a community of advisors we can reach out to who have experience with various Prometheus Operator applications, operating environments, and cluster sizes. The Prometheus Operator development team may reach out periodically to check in on how the Prometheus Operator is working in the field and update this list.
Go ahead and add your organization to the list.
Environments: AWS, Azure, Google Cloud
Uses kube-prometheus: Yes
Details (optional):
- Every environment (internal and customer) leverages the Prometheus Operator for deploying metrics. Kube Prometheus is used for cluster metrics, which are managed by an HA Prometheus StatefulSet running the Thanos sidecar (sketched below). Thanos is used to aggregate and query across multi-region/cluster environments. Alertmanager is used to page on-call SREs.
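As an illustration of this pattern, here is a minimal sketch of a Prometheus custom resource with two replicas and the Thanos sidecar enabled. The names, namespace, and object-storage secret are hypothetical and not taken from this adopter's setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                      # hypothetical name
  namespace: monitoring
spec:
  replicas: 2                    # HA pair, managed as a StatefulSet by the operator
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}     # select all ServiceMonitors in the namespace
  thanos:
    objectStorageConfig:         # secret holding the Thanos object storage config
      name: thanos-objstore-config
      key: thanos.yaml
```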
European Laboratory for Particle Physics
Environments: On-premises
Prometheus is used extensively as part of the CERN Kubernetes infrastructure, both managed and unmanaged. Metrics deployment is managed by the community-owned kube-prometheus-stack Helm chart. Be sure to check our blog.
Details:
- 400+ Kubernetes clusters, with cluster sizes ranging from a few nodes to ~100s
Significant usage also exists outside Kubernetes for generic service and infrastructure monitoring.
Environments: Bare Metal, OpenNebula
Uses kube-prometheus: Yes
Details:
- Multiple K8s clusters with Prometheus deployed through the Prometheus Operator
- Several of our own Ceph clusters providing metrics via the Ceph MGR Prometheus module
- Several customer Ceph clusters pushing metrics via an external Pushgateway to our central monitoring instances
- Thanos Receiver connected to our own S3 storage
Environments: AWS
Uses kube-prometheus: Yes
Details:
- Operator installed on each Kubernetes cluster, with Thanos aggregating metrics from a central query endpoint
- Two Prometheus instances per cluster
- Loose coupling between Kubernetes cluster administrators, who manage alerting sinks, and service owners, who define alerts for their services (see the PrometheusRule sketch after this list)
- 800K samples/s
- 30M active series
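One way this decoupling can work in practice (a sketch, not this adopter's exact configuration): service owners ship a PrometheusRule object alongside their workload, while cluster administrators keep ownership of the Alertmanager configuration that routes the resulting alerts to sinks. The names, labels, and expression below are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-service-alerts   # hypothetical, owned by the service team
  namespace: example-team
  labels:
    prometheus: k8s              # matched by the Prometheus ruleSelector
    role: alert-rules
spec:
  groups:
    - name: example-service.rules
      rules:
        - alert: ExampleServiceHighErrorRate
          expr: sum(rate(http_requests_total{job="example-service",code=~"5.."}[5m])) > 1
          for: 10m
          labels:
            severity: page       # routing to alerting sinks stays with cluster admins
          annotations:
            summary: example-service is serving 5xx errors
```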
Environments: AWS, Azure, Bare Metal
Uses kube-prometheus: No
Details (optional):
- HA Pair of Prometheus
- 25k samples/s
- 750k active series
Environments: AWS, Azure, Google Cloud, Bare Metal
Uses kube-prometheus: Yes
Deckhouse is a Kubernetes Platform. Its clusters, running on any infrastructure, are provided with a monitoring system based on highly available Prometheus and the Prometheus Operator. Essential metrics are preconfigured out of the box to ensure monitoring at all levels, from hardware and Kubernetes internals to the functionality of the platform's modules. The monitoring-custom module simplifies adding custom metrics for user applications. Deckhouse also hosts a dedicated Prometheus instance in each cluster to store downsampled metric series for longer periods.
Environments: Bare Metal
Uses kube-prometheus: Yes
Details (optional):
- HA Pair of Prometheus
- 340000 samples/s
- 14.3M active series
Environments: AWS, Azure, Bare Metal
Uses kube-prometheus: Yes (with additional tight Giant Swarm integrations)
Details:
- One Prometheus Operator per management cluster and one Prometheus instance per workload cluster
- Customers can also install kube-prometheus for their workloads using our App Platform
- 760000 samples/s
- 35M active series
Environments: Google Cloud
Uses kube-prometheus: Yes (with additional Gitpod mixins)
Details:
- One Prometheus instance per cluster (8 so far)
- 20000 samples/s
- 1M active series
Environments: iflytek Cloud, etc
Uses kube-prometheus: Yes
Details (optional):
- One Prometheus Operator per management cluster and one Prometheus instance per workload cluster
- 700000 samples/s
- 30M active series
Environments: AWS, Azure
Uses kube-prometheus: Yes
Details (optional):
- Multiple remote K8s clusters in which we have Prometheus deployed through the Prometheus Operator.
- These remote Prometheus instances push cluster metrics to a central Thanos Receiver, which is connected to S3 storage (see the remote write sketch after this list).
- On top of Thanos, we have Grafana for dashboarding and visualization.
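A minimal sketch of the push side of such a setup, assuming a Prometheus resource managed by the operator and a hypothetical Thanos Receive endpoint URL:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: cluster-metrics          # hypothetical name
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  remoteWrite:
    - url: https://thanos-receive.example.com/api/v1/receive   # central Thanos Receiver
```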
https://kinvolk.io/lokomotive-kubernetes/
Environments: AKS, AWS, Bare Metal, Equinix Metal
Uses kube-prometheus: Yes
Details:
- Self-hosted (control plane runs as pods inside the cluster)
- Deploys full K8s stack (as a distro) or managed Kubernetes (currently only AKS supported)
- Deployed by Kinvolk for its own hosted infrastructure (including Flatcar Container Linux update server), as well as by Kinvolk customers and community users
Environments: AWS
Uses kube-prometheus: Yes
Details:
- One Prometheus Operator in our platform cluster and one Prometheus instance per workload cluster
- 17k samples/s
- 841k active series
Environments: AWS
Uses kube-prometheus: Yes
Details:
- All Mattermost clusters use the Prometheus Operator with Thanos sidecar for cluster monitoring and central Thanos query component to gather all data.
- 977k samples/s
- 29.4M active series
Environment: Google Cloud
Uses kube-prometheus: Yes
Details:
- 100k samples/s
- 1M active series
Environments: AWS, Azure, Google Cloud, Bare Metal
Uses kube-prometheus: Yes (with additional tight OpenShift integrations)
This is a meta user; please feel free to document specific OpenShift users!
All OpenShift clusters use the Prometheus Operator to manage the cluster monitoring stack as well as user workload monitoring. This means the Prometheus Operator's users include all OpenShift customers.
Environments: AWS, Google Cloud
Uses kube-prometheus: No
Opstrace installations use the Prometheus Operator internally to collect metrics and to alert. Opstrace users also often use the Prometheus Operator to scrape their own applications and remote_write those metrics to Opstrace, as sketched below.
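For the scraping part, a typical example is a ServiceMonitor like the following; the app label, namespace, and port name are hypothetical and only illustrate the general Prometheus Operator pattern, not Opstrace specifics.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app              # hypothetical application
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app           # matches the Service exposing the metrics port
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: web                  # named port on the Service
      interval: 30s
```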
Environment: Google Cloud
Uses kube-prometheus: Yes
Details:
- HA Pair of Prometheus
- 4000 samples/s
- 100k active series
Environments: EKS, GKE, AKS, and self-hosted Kubernetes
Uses kube-prometheus: Yes
We're an open source project that builds upon the awesome Prometheus Operator. We run automated playbooks in response to Prometheus alerts and other events in your cluster. For example, you can automatically fetch logs and send them to Slack when a Prometheus alert occurs. All it takes is this YAML:
```yaml
triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping
actions:
  - logs_enricher: {}
sinks:
  - slack
```
Environment: AWS
Uses kube-prometheus: Yes
Details (optional):
- HA Pairs of Prometheus
- 25000 samples/s
- 1.2M active series
suse.com/products/suse-rancher
Environments: RKE, RKE2, K3s, Windows, AWS, Azure, Google Cloud, Bare Metal, etc.
Uses kube-prometheus: Yes
Rancher Monitoring supports use cases for Prometheus Operator across various different cluster types and setups that are managed via the Rancher product. All Rancher users that install Monitoring V2 deploy this chart.
For more information, please see how Rancher monitoring works.
The open-source rancher-monitoring Helm chart (based on kube-prometheus-stack) can be found at rancher/charts.
Environments: OpenStack, VMware vCloud
Uses kube-prometheus: Yes
Details:
- All Kubernetes clusters use one Prometheus Operator instance with remote write enabled
- Prometheus instances push metrics to central H/A VirtualMetric, which gathers all data from clusters in 3 different data centers
- Grafana is used for dashboarding and visualization
- 7.50M samples/s
- 190M active series
Environments: Bare Metal
Uses kube-prometheus: Yes
Details (optional):
- HA Pair of Prometheus
- 786000 samples/s
- 23.6M active series
Environments: AWS, Azure, Google Cloud, cloudscale.ch, Exoscale, Swisscom
Uses kube-prometheus: Yes
Details (optional):
- A huge fleet of OpenShift and Kubernetes clusters, each using Prometheus Operator
- All managed by Project Syn, leveraging Commodore Components like component-rancher-monitoring which re-uses Prometheus Operator
Environments: AWS, Google Cloud
Uses kube-prometheus: Yes
Details (optional):
- The Prometheus Operator provides real-time monitoring of all our runners. With Alertmanager, we promptly receive notifications about any cluster issues, allowing for swift resolution before they have the chance to escalate into outages. Grafana allows us to quickly visualize our clusters' health and performance metrics.
- ~6k samples/s
- ~500k active series
Environments: Kubernetes, AWS (via some EC2)
Uses kube-prometheus: No
Details (optional):
- About 30 HA pairs of sharded Promethei across 10 environments, wired together with Thanos
- The Operator also helps us seamlessly manage anywhere between 600 and 1500 short-lived Prometheus instances for our "integration" Kubernetes cluster.
- ~15mn samples/s
- ~200mn active series
Environments: AWS, Azure, Google Cloud, Bare Metal, etc
Uses kube-prometheus: Yes | No
Details (optional):
- HA Pair of Prometheus
- 1000 samples/s (query: `rate(prometheus_tsdb_head_samples_appended_total[5m])`)
- 10k active series (query: `prometheus_tsdb_head_series`)