Update GCS transfer alert to ignore archive-mlab-oti (#939)
* Update gcs transfer alert to ignore archive-mlab-oti
* Filter on datatypes that have had some data recently
stephen-soltesz authored Aug 12, 2022
1 parent ecc9119 commit 4cd94b5
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions config/federation/prometheus/alerts.yml
@@ -225,7 +225,7 @@ groups:
#
# The following alerts are based on the exact queries that mlab-ns runs to
# determine the state of NDT services.
-# https://github.com/m-lab/mlab-ns/blob/master/server/mlabns/util/prometheus_status.py
+# https://github.com/m-lab/mlab-ns/blob/main/server/mlabns/util/prometheus_status.py
#
# The expression in the denominator of the queries can be read as:
# count `probe_success`es _unless_ the node is in GMX _and_ the node does not
@@ -658,15 +658,15 @@ groups:
# GCS Transfer SLO
#
# We run daily GCS transfers between project buckets and to the public archive.
-# See: https://github.com/m-lab/gcp-config/blob/master/daily-archive-transfers.yaml
+# See: https://github.com/m-lab/gcp-config/blob/main/daily-archive-transfers.yaml
#
# This alert enforces that daily transfers are working for all datatypes.
# Periodic delays are expected either to data volume or GCS Transfer service
# variance, so the expression must be firing for over 36h.
-- alert: GCSTransfers_ArchiveFilesDoNotMatchOrMissing
+- alert: GCSTransfers_ArchiveFilesMissing
expr: |
-sum(increase(gcs_archive_files_total{bucket="archive-mlab-oti"}[1d]) - ignoring(bucket)
-increase(gcs_archive_files_total{bucket="archive-measurement-lab"}[1d]) != 0)
+sum by (experiment, datatype) (increase(gcs_archive_files_total{bucket="archive-measurement-lab"}[1d])) == 0
+and (sum by (experiment, datatype) (increase(gcs_archive_files_total{bucket="archive-measurement-lab"}[1d] offset 1d)) > 0)
OR
absent(gcs_archive_files_total)
for: 36h
@@ -676,7 +676,7 @@ groups:
cluster: prometheus-federation
annotations:
summary: GCS Transfers may not include all files.
-description: https://github.com/m-lab/ops-tracker/wiki/Alerts-&-Troubleshooting#GCSTransfers_ArchiveFilesDoNotMatchOrMissing
+description: https://github.com/m-lab/ops-tracker/wiki/Alerts-&-Troubleshooting#GCSTransfers_ArchiveFilesMissing

# Pipeline: GCS Archives Not Found in BigQuery
#
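
Note on the new `GCSTransfers_ArchiveFilesMissing` expression: it no longer compares two buckets. Instead it flags any `(experiment, datatype)` pair whose file count in `archive-measurement-lab` did not increase over the last day but did increase over the day before, or fires when `gcs_archive_files_total` is absent altogether. The following is a minimal Python sketch of that selection logic, not the Prometheus implementation. The function names and the sample label values are hypothetical, and the `for: 36h` hold period is not modeled.

```python
from typing import Dict, Set, Tuple

# (experiment, datatype) label pair, mirroring `sum by (experiment, datatype)`.
Key = Tuple[str, str]


def stalled_datatypes(today: Dict[Key, float], yesterday: Dict[Key, float]) -> Set[Key]:
    """Pairs with zero increase in the last day and a positive increase the day before.

    A pair missing from `today` is treated like a series with no samples:
    it cannot satisfy `== 0`, so it is never selected here.
    """
    zero_today = {key for key, files in today.items() if files == 0}
    active_yesterday = {key for key, files in yesterday.items() if files > 0}
    # PromQL `and` keeps left-hand elements that have a matching label set on the right.
    return zero_today & active_yesterday


def alert_condition(today: Dict[Key, float], yesterday: Dict[Key, float],
                    metric_present: bool) -> bool:
    """True when the rule expression would return at least one element."""
    # `... OR absent(gcs_archive_files_total)` covers the metric disappearing entirely.
    return (not metric_present) or bool(stalled_datatypes(today, yesterday))
```

Under this model a datatype that was idle on both days does not trigger the alert, which matches the commit message's intent to filter on datatypes that have had some data recently.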