Add disk space alerts to loki, prometheus #88

sed-i · 2023-11-21T03:29:57Z

Bug Description

When Loki fills up the disk, the COS charms fail.
In that state, juju status may look like this:

Model               Controller  Cloud/Region        Version  SLA          Timestamp
cos-lite-load-test  uk8s        microk8s/localhost  3.1.6    unsupported  03:26:19Z

App            Version  Status   Scale  Charm                         Channel  Rev  Address         Exposed  Message
alertmanager   0.25.0   active       1  alertmanager-k8s              edge      96  10.152.183.245  no       
catalogue               active       1  catalogue-k8s                 edge      31  10.152.183.95   no       
cos-config     3.5.0    active       1  cos-configuration-k8s         edge      39  10.152.183.106  no       
grafana        9.2.1    active       1  grafana-k8s                   edge      93  10.152.183.29   no       
loki           2.8.4    waiting    0/1  loki-k8s                      edge     104  10.152.183.180  no       waiting for units to settle down
prometheus     2.47.2   waiting    0/1  prometheus-k8s                edge     156  10.152.183.54   no       waiting for units to settle down
scrape-config  n/a      active       1  prometheus-scrape-config-k8s  edge      44  10.152.183.33   no       
scrape-target  n/a      active       1  prometheus-scrape-target-k8s  edge      31  10.152.183.159  no       
traefik                 waiting    0/1  traefik-k8s                   edge     164  10.128.0.6      no       waiting for units to settle down

Unit              Workload  Agent  Address       Ports  Message
alertmanager/0*   active    idle   10.1.174.152         
catalogue/0*      active    idle   10.1.174.140         
cos-config/0*     active    idle   10.1.174.144         
grafana/0*        active    idle   10.1.174.186         
loki/0            unknown   lost   10.1.174.171         agent lost, see 'juju show-status-log loki/0'
prometheus/0      unknown   lost   10.1.174.160         agent lost, see 'juju show-status-log prometheus/0'
scrape-config/0*  active    idle   10.1.174.147         
scrape-target/0*  active    idle   10.1.174.150         
traefik/0         unknown   lost   10.1.174.184         agent lost, see 'juju show-status-log traefik/0'

and juju debug-log doesn't show anything obvious.

It would be handy for the charms to ship built-in disk space alerts (for example, based on predict_linear).
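To illustrate the idea behind a predict_linear-style alert, here is a minimal Python sketch that fits a least-squares line to recent free-space samples and checks whether the extrapolated value drops below zero within a given horizon. It is not the Prometheus implementation and not an actual alert rule. The function names, the sample format, and the choice to extrapolate from the last sample's timestamp are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, free space in any unit)


def predict_linear(samples: Sequence[Sample], horizon_s: float) -> float:
    """Fit a least-squares line to the samples and return the value it
    predicts horizon_s seconds after the last sample (hypothetical helper)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + horizon_s) + intercept


def disk_will_fill(samples: Sequence[Sample], horizon_s: float) -> bool:
    """True if the fitted trend predicts no free space left within horizon_s."""
    return predict_linear(samples, horizon_s) < 0


if __name__ == "__main__":
    # Made-up data: free space (GiB) shrinking by 1 GiB per hour.
    history = [(0.0, 4.0), (3600.0, 3.0), (7200.0, 2.0)]
    print(disk_will_fill(history, 4 * 3600))  # trend crosses zero within 4 hours
    print(disk_will_fill(history, 1 * 3600))  # still free space after 1 hour
```

An alert built this way fires before the disk is actually full, which is the point of the request: the operator gets a warning while the units are still responsive.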

To Reproduce

Keep Loki running until disk space is exhausted.

Environment

Not limited to a particular environment.

Relevant log output

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        97G   96G  1.4G  99% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           3.2G   11M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      105M  6.1M   99M   6% /boot/efi
tmpfs           1.6G  4.0K  1.6G   1% /run/user/1000

Additional context

No response

sed-i added Type: Bug Status: Triage labels Nov 21, 2023
