sanecz/prometheus-rasdaemon-exporter

Prometheus Rasdaemon exporter

Introduction

This is a Prometheus exporter for rasdaemon.

It reads the events that rasdaemon records in its database (see the --db argument) and exposes them as Prometheus metrics.

Rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool. Rasdaemon is available at http://git.infradead.org/users/mchehab/rasdaemon.git

By default prometheus-rasdaemon-exporter listens on 0.0.0.0:.

Installation

Using apt:

apt-get install prometheus-rasdaemon-exporter

Using pip:

pip install prometheus-rasdaemon-exporter

Running

A minimal invocation looks like:

prometheus-rasdaemon-exporter

prometheus-rasdaemon-exporter supports the following arguments:

  --db DB              Path to rasdaemon DB
  --address ADDRESS    Address on which to expose metrics and the web interface
  --port PORT          Port on which to expose metrics and the web interface
  --collector-all      Enable/Disable collecting all errors (default: False)
  --collector-aer      Enable/Disable collecting AER errors (default: True)
  --collector-mce      Enable/Disable collecting MCE errors (default: True)
  --collector-mc       Enable/Disable collecting MC errors (default: True)
  --collector-extlog   Enable/Disable collecting EXTLOG errors (default: False)
  --collector-devlink  Enable/Disable collecting DEVLINK errors (default: False)
  --collector-disk     Enable/Disable collecting DISK errors (default: False)
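
Once the exporter is running, a quick way to confirm that it is serving data is to fetch the metrics page from the address and port passed on the command line. The following is a minimal sketch using only the Python standard library. The `/metrics` path is an assumption based on the usual Prometheus convention, and `metrics_url` and `fetch_metrics` are hypothetical helper names that are not part of this project.

```python
from urllib.request import urlopen


def metrics_url(address, port, path="/metrics"):
    """Build the URL to scrape. The /metrics path is an assumption
    (the usual Prometheus convention), not confirmed by this README."""
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{address}]" if ":" in address else address
    return f"http://{host}:{port}{path}"


def fetch_metrics(address, port, timeout=5.0):
    """Return the raw text served by the exporter at address:port."""
    with urlopen(metrics_url(address, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, if the exporter was started with `--address 127.0.0.1` and a `--port` of your choosing, `print(fetch_metrics("127.0.0.1", port))` with that same port value should print output similar to the Metrics section.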

Metrics

Metrics and labels are returned with the same names as used in the rasdaemon database. To avoid collisions, they are prefixed with their type (aer, mc, mce, ...). By default, only AER, MC and MCE records are exported.

# HELP aer_events_total Total of AER Events occured
# TYPE aer_events_total counter
aer_events_total{aer_dev_name="0000:02:00.0",aer_err_msg="Bad TLP",aer_err_type="Corrected"} 3.0
aer_events_total{aer_dev_name="0000:02:00.0",aer_err_msg="Receiver Error, Bad TLP",aer_err_type="Corrected"} 1.0
aer_events_total{aer_dev_name="0000:02:00.0",aer_err_msg="",aer_err_type="Uncorrected (Fatal)"} 4.0
aer_events_total{aer_dev_name="0000:02:00.0",aer_err_msg="Completer Abort, Malformed TLP TLP Header: 00000004 00000005 00000006 00000007",aer_err_type="Uncorrected (Non-Fatal)"} 3.0
# HELP mce_record_total Total of MCE record occured
# TYPE mce_record_total counter
mce_record_total{msg="MEMORY CONTROLLER MS_CHANNEL1_ERR Transaction: Memory scrubbing error Corrected patrol scrub error"} 149.0
mce_record_total{msg="MEMORY CONTROLLER RD_CHANNEL1_ERR Transaction: Memory read error"} 5107.0
mce_record_total{msg="MEMORY CONTROLLER RD_CHANNEL1_ERR Transaction: Memory read error Corrected memory read error"} 4958.0
# HELP mc_events_total Total of Memory Controller events occured
# TYPE mc_events_total counter
mc_events_total{err_type="Corrected",label="DIMM_P0_B0"} 2.0
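
The output above can also be consumed outside Prometheus, for instance in a quick health-check script. The sketch below parses a shortened copy of that sample and sums the counters per metric name. It is a simplified illustration that uses only the standard library, not a full parser for the Prometheus exposition format, and `parse_samples` and `totals_by_metric` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Shortened copy of the sample output shown above.
SAMPLE = '''\
# HELP aer_events_total Total of AER Events occured
# TYPE aer_events_total counter
aer_events_total{aer_dev_name="0000:02:00.0",aer_err_msg="Bad TLP",aer_err_type="Corrected"} 3.0
aer_events_total{aer_dev_name="0000:02:00.0",aer_err_msg="",aer_err_type="Uncorrected (Fatal)"} 4.0
# HELP mc_events_total Total of Memory Controller events occured
# TYPE mc_events_total counter
mc_events_total{err_type="Corrected",label="DIMM_P0_B0"} 2.0
'''

# name{labels} value  (the label block is optional)
SAMPLE_RE = re.compile(r'^([A-Za-z_:][\w:]*)(?:\{(.*)\})?\s+(\S+)$')
# key="value", where the value may contain commas, spaces and escaped quotes
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def totals_by_metric(text):
    """Sum all sample values that share a metric name."""
    totals = defaultdict(float)
    for name, _labels, value in parse_samples(text):
        totals[name] += value
    return dict(totals)


if __name__ == "__main__":
    for name, total in totals_by_metric(SAMPLE).items():
        print(f"{name}: {total:g}")
```

Matching label values with a regular expression rather than splitting on commas matters here, because values such as "Receiver Error, Bad TLP" contain commas themselves.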
