Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hardware metrics markdown? #258

Closed
wants to merge 1 commit into from

Conversation

trask
Copy link
Member

@trask trask commented Aug 15, 2023

I may not have understood something here, but figured I'd send this PR to find out.

@trask trask requested review from a team August 15, 2023 17:41
@trask trask requested a review from bertysentry August 15, 2023 17:41
@trask trask changed the title Fix hardware metrics? Fix hardware metrics markdown? Aug 15, 2023
Copy link
Contributor

@bertysentry bertysentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the aggregation of hw.errors and hw.status over different hardware types is meaningful, we should keep it under the same metric. Thank you!

@@ -93,11 +93,11 @@ Examples: physical server, switch or disk array.
**Description:** A battery in a computer system or an UPS.

| Name | Description | Units | Instrument Type ([*](/docs/general/metrics.md#instrument-types)) | Value Type | Attribute Key(s) | Attribute Values |
| ------------------------- | ----------------------------------------------------------------------------- | ----- | ------------------------------------------------- | ---------- | --------------------------------------------------------------------------- | ----------------------------------------------------- |
|---------------------------| ----------------------------------------------------------------------------- | ----- | ------------------------------------------------- | ---------- | --------------------------------------------------------------------------- | ----------------------------------------------------- |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this change is made on purpose (other columns don't follow the same pattern), and same for other similar changes in table formatting.

| `hw.battery.charge` | Remaining fraction of battery charge | 1 | Gauge | Double | | |
| `hw.battery.charge.limit` | Lower limit of battery charge fraction to ensure proper operation | 1 | Gauge | Double | `limit_type` (Recommended) | `critical`, `throttled`, `degraded` |
| `hw.battery.time_left` | Time left before battery is completely charged or discharged | s | Gauge | Int | `state` (Conditionally Required, if the battery is charging or discharging) | `charging`, `discharging` |
| `hw.status` | Operational status: `1` (true) or `0` (false) for each of the possible states | | UpDownCounter | Int | `state` (**Required**) | `ok`, `degraded`, `failed`, `charging`, `discharging` |
| `hw.battery.status` | Operational status: `1` (true) or `0` (false) for each of the possible states | | UpDownCounter | Int | `state` (**Required**) | `ok`, `degraded`, `failed`, `charging`, `discharging` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is absolutely no need to use namespaces everywhere.

It's much easier to query hw.status{state="failed"} to cover all devices, rather than hw.battery.status{state="failed"} + hw.cpu.status{state="failed"} + hw.memory.status{state="failed"} + ...

Same goes for all metrics that are common to different classes of hardware devices.

BTW, the very first version of the specification used namespaces heavily, and we simplified it and I think it's great. Let's not go backward and get it more complicated.

@trask
Copy link
Member Author

trask commented Aug 22, 2023

As the aggregation of hw.errors and hw.status over different hardware types is meaningful, we should keep it under the same metric. Thank you!

got it, should I remove all the duplicate definitions of hw.errors and hw.status instruments then?

@bertysentry
Copy link
Contributor

@trask You could remove the duplicate definitions (i.e. the description and unit). However, I recommend that hw.errors and hw.status are still explicitly listed for the different types of devices. Also, some types of devices include additional values for the error type or state attributes, so this information must remain.

@trask
Copy link
Member Author

trask commented Aug 22, 2023

ok thanks for answering my questions @bertysentry, I'm going to close this, I suspect it may need to be revisited when converting the hardware metrics to yaml anyways (my understanding is that there can only be a single definition of the metric in yaml)

@trask trask closed this Aug 22, 2023
@trask trask deleted the fix-hardware-metrics branch August 22, 2023 18:41
@joaopgrassi
Copy link
Member

(my understanding is that there can only be a single definition of the metric in yaml)

@trask I guess you meant there can only be a single attribute definition in yaml? If so, yes that is correct. if, for example, hw.status has the same meaning on all metrics, we can easily achieve it by defining it in its own namespace hw.* and then re-used across all metrics.

BUT: If the attribute has different a different semantic/meaning/values on different metrics, then we are going to run into problems, as via ref it's not possible to override the member values, only the brief/requirement levels AFAIK. In that case, it will have to be defined under each namespace, like you did in this PR.

@bertysentry since you worked on this initially, can you answer if such attributes have the same meaning across all metrics they are used? That will help when migrating this conventions to YAML.

@bertysentry
Copy link
Contributor

@joaopgrassi The attributes do have the same meaning across all device types. However, we may want to add an additional values that are specific a type of device. Example: we may want to specify the value charging for the state attribute in the hw.status metric, that is applicable to batteries only (charging).

In any case, we shouldn't add namespaces because of the tooling (YAML to Markdown)! 😅

@joaopgrassi
Copy link
Member

The attributes do have the same meaning across all device types. However, we may want to add an additional values that are specific a type of device. Example: we may want to specify the value charging for the state attribute in the hw.status metric, that is applicable to batteries only (charging).

I see, thanks for checking. If the attributes that are "common" have the same semantic meaning in all metrics it should be fine then leaving as is - they are already in the hw.* namespace anyway so I guess fulfill the decision on #51 (comment).

But we still have the problem that re-defining enum values (or overwriting them) is currently not supported. So, if we want to keep the attribute as today, we would need to implement it open-telemetry/weaver#479 before we migrate these conventions to YAML.

@trask
Copy link
Member Author

trask commented Aug 23, 2023

(my understanding is that there can only be a single definition of the metric in yaml)

@trask I guess you meant there can only be a single attribute definition in yaml?

@joaopgrassi hw.status is a metric (instrument), not a metric attribute

@joaopgrassi
Copy link
Member

joaopgrassi commented Aug 24, 2023

@joaopgrassi hw.status is a metric (instrument), not a metric attribute

A my mistake, I see it now. I thought it was just an attribute repeated over other metrics.

Then it's really a problem. I see that hw.status is a metric in the "general" namespace hw.* but it also appears down the other namespaces, with different set of attribute values for state. The way the page is today, I'm not sure we can migrate it to YAML without introducing a different namespace in all of them =/

The only solution I can see is: Don't include hw.status in each namespace (as is today) and only have it under the hw. namespace once. Then we can instruct in the page that other namespaces can also record it, but adding the corresponding hw.type.

@bertysentry
Copy link
Contributor

@joaopgrassi Thank you for the clarification! I insist that we will not add a different namespace for each device type for the hw.status and hw.errors metrics.

Any chance we can add this ability to the yaml-2-markdown converter?

@joaopgrassi
Copy link
Member

joaopgrassi commented Aug 24, 2023

Well, everything is possible if we want to do it 😁 , the question is more if it makes sense. The tool today is heavily structured around unique ids to avoid clashes which makes sense to me. I think we can get away with it by doing what I mentioned above and have a mix of automatic vs manual things in the markdown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants