- alert: RedfishProcessorHealthNotOk
  expr: redfish_processor_info{health != "OK"}
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Redfish processor health not OK. (instance {{ $labels.instance }})
    description: |
      Redfish processor health not OK.
      LABELS = {{ $labels }}
If the processor health is not available, we assign it the value "NA". Because "NA" is not "OK", this also triggers a critical alert, which may be unnecessary when the processor health status simply cannot be queried.
If needed, we could create a separate alert with a lower severity for the case where the resource health data is not available (value "NA").
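
One way the split could look is sketched below. This is only an illustration, not the project's actual rule file: the alert name `RedfishProcessorHealthNotAvailable` and the `warning` severity are assumptions, and the sketch assumes "NA" is the only placeholder value the exporter writes into the `health` label.

```yaml
# Critical only when the health is reported and is something other than "OK".
# The regex matcher excludes both "OK" and the "NA" placeholder.
- alert: RedfishProcessorHealthNotOk
  expr: redfish_processor_info{health!~"OK|NA"}
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Redfish processor health not OK. (instance {{ $labels.instance }})
    description: |
      Redfish processor health not OK.
      LABELS = {{ $labels }}

# Hypothetical lower-severity alert for the case where the health could not be queried.
- alert: RedfishProcessorHealthNotAvailable
  expr: redfish_processor_info{health="NA"}
  for: 0m
  labels:
    severity: warning  # assumed severity, to be agreed on
  annotations:
    summary: Redfish processor health not available. (instance {{ $labels.instance }})
    description: |
      Redfish processor health could not be queried.
      LABELS = {{ $labels }}
```

The same pattern would presumably apply to the other Redfish resources, since their metrics are created in a similar way.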
Currently, this is how the Redfish metrics are created for the processor resource (it is similar for other Redfish resources):
https://github.com/canonical/prometheus-hardware-exporter/blob/main/prometheus_hardware_exporter/collectors/redfish.py#L197
And this is what the alert rule corresponding to that metric looks like:
https://github.com/canonical/hardware-observer-operator/blob/master/src/prometheus_alert_rules/redfish.yaml#L40