Export unit substates #12
Comments
👍 thanks for this. IMO it makes sense and I like the

While it's not optimal, I agree that the cardinality for substates is a bit much. There will still be issues with disappearing metrics. My only concern is that there will be different behavior between the two labels.

One additional thought. It would be straightforward to add a feature flag, but IMO maintaining the boilerplate code (a list of all possible substates) is not worth it for a feature that might only be used by a hypothetical advanced user. It would be better to wait for someone to request something like this before adding the feature prematurely.
We currently export unit states, but we do not export the unit substate. Substates often include much more actionable information than states, such as why a unit is inactive (e.g. did it stop with an error, was it killed, did it stop without an error, etc.). Note that a unit's possible substates depend on the type of the unit: different types (`service`, `mount`, etc.) have different possible substates. See the large list below for all possible combinations on systemd v237 (and be aware that different systemd versions have added or removed substates as needed).

Exporting substates would be useful to support querying, graphing, and possibly alerting by substate, e.g.

`sum(systemd_unit_state{state="inactive"}) by (type, substate)`
As I see it, there are two reasonable ways to expose this substate information:

1. Add a `substate` label to the `systemd_unit_state` metric.
2. Add new per-type metrics, each with a `substate` label. For example, `systemd_mount_state{name="foo.mount", substate="mounted"}`
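To make the difference between the two options concrete, here is a minimal sketch of the series each one would produce for a single unit. It is written in Python for brevity and is not exporter code; the helper names are hypothetical, and the `systemd_<type>_state` naming for option 2 is an assumption generalized from the single `systemd_mount_state` example.

```python
def unit_type(name: str) -> str:
    """Unit type taken from the name suffix, e.g. 'foo.mount' -> 'mount'."""
    return name.rsplit(".", 1)[-1]


def option1_series(name: str, state: str, substate: str) -> tuple[str, dict[str, str]]:
    """Option 1: keep the single systemd_unit_state metric and add a substate label."""
    return "systemd_unit_state", {
        "name": name,
        "type": unit_type(name),
        "state": state,
        "substate": substate,
    }


def option2_series(name: str, substate: str) -> tuple[str, dict[str, str]]:
    """Option 2 (assumed naming scheme): one metric per unit type, with a substate label."""
    return f"systemd_{unit_type(name)}_state", {"name": name, "substate": substate}


def render(metric: str, labels: dict[str, str]) -> str:
    """Render a series in the selector style used in this issue."""
    body = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{metric}{{{body}}}"
```

For `foo.mount` in the `mounted` substate, `render(*option2_series("foo.mount", "mounted"))` gives `systemd_mount_state{name="foo.mount", substate="mounted"}`. Option 1 keeps one metric name and carries the unit type as a label, so a single query can still span every unit type.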
IMO adding a new label to `systemd_unit_state` makes the most sense, but other opinions are welcome.

Regardless of approach, I do not think we should follow the standard Prometheus guideline of exporting all possible values of `substate` as 0-value timeseries. The cardinality explosion is ridiculous: for each service unit, we would be exporting approx. 6 states × 16 substates = 96 timeseries.

Instead, we would add the current `substate` as a label on each metric. When the substate changes, this would create a new timeseries. For example, `systemd_unit_state{name="ssh.service", type="service", state="inactive", substate="failed"}` would be distinct from `systemd_unit_state{name="ssh.service", type="service", state="inactive", substate="dead"}`. This might require aggregation in PromQL queries. However, as we already export one timeseries per state, this may be an easy transition (e.g. convert `by (state)` into `by (state, substate)`). Feedback welcome on this.

Regarding exporter performance, the good news is that we already receive substate information from dbus. It is included in every `dbus.UnitStatus`, so there is effectively zero performance penalty for adding it as a new label.

List of states and substates on one of my systems. Note: different systemd versions will have different lists of substates.
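A minimal sketch of the proposed labelling, again in Python for brevity (the real `dbus.UnitStatus` is a Go type from go-systemd). `UnitStatus` below is a hypothetical stand-in carrying only the `Name`, `ActiveState` and `SubState` fields, and the six-entry `ACTIVE_STATES` list is an assumption based on the ActiveState values systemd v237 reports. It shows that each unit still exports one series per state, now tagged with its current substate, compared with the 6 × 16 = 96 series the zero-value approach would need.

```python
from dataclasses import dataclass

# Assumption: the six ActiveState values reported by systemd v237.
ACTIVE_STATES = ["active", "reloading", "inactive", "failed", "activating", "deactivating"]


@dataclass
class UnitStatus:
    """Hypothetical stand-in for the dbus.UnitStatus fields this sketch uses."""
    name: str
    active_state: str
    sub_state: str


def state_samples(unit: UnitStatus) -> list[tuple[dict[str, str], float]]:
    """One series per state: 1.0 for the current state, 0.0 otherwise.

    Every series carries only the unit's *current* substate, so a unit costs
    len(ACTIVE_STATES) series rather than one per (state, substate) pair.
    """
    unit_type = unit.name.rsplit(".", 1)[-1]
    return [
        (
            {"name": unit.name, "type": unit_type, "state": state, "substate": unit.sub_state},
            1.0 if state == unit.active_state else 0.0,
        )
        for state in ACTIVE_STATES
    ]


def zero_value_series_count(num_states: int, num_substates: int) -> int:
    """Series per unit if every (state, substate) pair were exported as a 0/1 series."""
    return num_states * num_substates


if __name__ == "__main__":
    before = state_samples(UnitStatus("ssh.service", "inactive", "dead"))
    after = state_samples(UnitStatus("ssh.service", "failed", "failed"))
    print(len(before), zero_value_series_count(6, 16))  # 6 96
    # A substate change yields new label sets, i.e. new timeseries.
    print(before[0][0] != after[0][0])  # True
```

Because the substate is a label rather than a value, a change from `dead` to `failed` retires the old label sets and starts new ones. That is the "disappearing metrics" trade-off raised in the comments, accepted here in exchange for the much lower per-unit cardinality.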