Note: This CHANGELOG is only for the monitoring team to track all monitoring-related changes. Please see the OpenShift release notes for official changes.
- #1299 Expose /api/v1/labels endpoint for Thanos query.
- #1402 Drop pod-centric cAdvisor metrics that are available at slice level.
- #1399 Rename ThanosSidecarUnhealthy to ThanosSidecarNoConnectionToStartedPrometheus and make it resilient to WAL replays.
- #1446 Bump Grafana version to 7.5.11
- #1312 Support label to exclude namespaces from user-workload monitoring.
- #1308 Expose remote_write to user for in-cluster deployment and UWM.
- #1241 Add config option to disable Grafana deployment.
- #1278 Add EnforcedTargetLimit option for user-workload Prometheus.
- #1291 Drop high-cardinality cAdvisor metrics via kube-prometheus #1250
- #1270 Show a message in the degraded condition when Platform Monitoring Prometheus runs without persistent storage.
- #1241 Allow configuring additional Alertmanagers in User Workload Prometheus and Thanos Ruler.
- #1293 Allow disabling the local Alertmanager.
- #1310 Update alert configs: fewer critical alerts, with more accurate triggering conditions.
- #1324 Allow filtering by job in 'Prometheus/Overview' dashboard.
- #1087 Decrease alert severity to "warning" for ThanosQueryHttpRequestQueryErrorRateHigh and ThanosQueryHttpRequestQueryRangeErrorRateHigh alerts.
- #1087 Increase "for" duration to 1 hour for all Thanos query alerts.
- #1087 Remove ThanosQueryInstantLatencyHigh and ThanosQueryRangeLatencyHigh alerts.
- #1090 Decrease alert severity to "warning" for all Thanos sidecar alerts.
- #1090 Increase "for" duration to 1 hour for all Thanos sidecar alerts.
- #1093 Bump kube-state-metrics to major new release v2.0.0-rc.1. This changes many metrics and flags; see the kube-state-metrics CHANGELOG for full changes.
- #1126 Remove deprecated techPreviewUserWorkload field from CMO's configmap.
- #1136 Add recording rule for builds by strategy
- #1210 Bump Grafana version to 7.5.5
- #963 Bump mixins to include new etcd alerts
  - Added etcdBackendQuotaLowSpace, etcdExcessiveDatabaseGrowth, and etcdHighFsyncDurations critical alerts.
  - Adjusted NodeClockNotSynchronising, NodeNetworkReceiveErrs, and NodeNetworkTransmitErrs alerts.
- #962 Enable namespace by pod and pod total networking Grafana dashboards.
- #959 Remove memory limits from prometheus-config-reloader in user workload monitoring
- #969 Bump Thanos v0.16.0
- #970 Bump prometheus-operator v0.43.0.
- #971 Enable `hwmon` in node-exporter for hardware sensor data collection
- #983 Remove deprecated user workload configuration
- #995 Add logLevel config field to Thanos Query.
- #993 Add metrics + alerts for Thanos sidecars.
- #1013 #1018 Bump and pin jsonnet dependencies:
  - prometheus-operator v0.44.1
  - Thanos: v0.17.2
  - kube-prometheus: release-0.7
- #936 Bump prometheus-operator 0.42.1
- #928 Bump prometheus-operator 0.42:
- #714 Validate new/updated PrometheusRule custom resources against the prometheus-operator rule validation API.
- #799 Rules federation support.
- #800 Collect metrics and implement alerting rules for Thanos querier.
- #804 Allow user workload monitoring configuration ConfigMap to be created in openshift-user-workload-monitoring namespace.
- #736 Expose /api/v1/rules endpoint of Thanos Querier via the 9093 TCP port with multi-tenancy support.
- #854 Change KubeQuotaExceeded to KubeQuotaFullyUsed.
- #859 Remove the `hostport` parameter from the configuration.
- #859 Allow users to configure EnforcedSampleLimit for User workload monitoring Prometheus tenant.
- #894 Bump jsonnet dependencies:
  - kubernetes-mixin: kubernetes-monitoring/kubernetes-mixin#475: alerts: adjust error message accordingly to recent change
  - prometheus-operator: prometheus-operator/kube-prometheus#610: Add PrometheusOperatorListErrors and fix PrometheusOperatorWatchErrors threshold
  - etcd: etcd-io/etcd#12122: Documentation/etcd-mixin: Reformulate alerting rules to use `without` rather than `by`
  - kubelet: prometheus-operator/kube-prometheus#623: Add scraping of endpoint for kubelet probe metrics
  - thanos: thanos-io/thanos#2374: mixin: Added critical Rules alerts.
- #898 Bump jsonnet dependencies for kube-mixin:
  - Adjusts severity levels of many alerts from critical to warning as they were cause-based alerts
  - Adjusts KubeStatefulSetUpdateNotRolledOut, KubeDaemonSetRolloutStuck
  - Removes KubeAPILatencyHigh and KubeAPIErrorsHigh