[chore] System Semantic Conventions Non-Normative Guidance #1618

braydonk · 2024-11-26T15:01:39Z

Changes

This PR adds non-normative guidance from the System Semantic Conventions Working Group. This is added in a new `groups` folder in `non-normative`, and a `system` subfolder in `groups`. The docs written here were already discussed in a Google doc where we were originally collaborating on this, a link to which can be shared directly if needed.

Merge requirement checklist

CONTRIBUTING.md guidelines followed.
[N/A] Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
[N/A] schema-next.yaml updated with changes to existing conventions.

lmolkova

I really like this doc!

I don't think we have similar precedents of "why we designed it in this way" documented (the closest analogy is OTEP), but I wish we had more of these.
We might find a better place for it within the repo over time if we end up with more docs like this.

lmolkova · 2024-11-26T22:06:43Z

docs/non-normative/groups/system/use-cases.md

+
+## **Host**
+
+A user should be able to monitor the health of a host, including monitoring resource consumption, unexpected errors due to resource exhaustion or malfunction of core components of a host or fleet of hosts (network stack, memory, CPU…).


> unexpected errors due to resource exhaustion

Not sure if we have anything defined today, or whether there is anything general we can provide, but it'd be nice to have some OS network/hardware/etc. errors and show them on dashboards and alerts.

braydonk · 2024-11-27T14:32:39Z

Did a first pass of easy comments to address, will make some time soon to go through the comments that require more thought!

ChrsMark

LGTM with a question/suggestion.

ChrsMark · 2024-11-28T09:07:59Z

docs/non-normative/groups/system/design-philosophy.md

+* General disk and network metrics  
+* Universal system/process information (names, identifiers, basic specs)
+
+Some Specialist Class examples:


While the whole description of the rationale here is exactly how it should be, I think we are missing a set of rules/guidelines/sanity checks that would help somebody in the future decide which directory a metric or attribute falls into. This might not be easy to define because of the nature of the problem, but maybe it would be worth adding a section at the bottom suggesting how this kind of situation should be handled in the future.

braydonk · Nov 28, 2024

I do have a case study below for `process.linux.cgroup`; perhaps I can adapt this to more general rules?

braydonk · Dec 19, 2024

Done in 487af83

mx-psi · 2024-11-29T09:12:15Z

docs/non-normative/groups/system/use-cases.md

+
+## **Host**
+
+A user should be able to monitor the health of a host, including monitoring resource consumption, unexpected errors due to resource exhaustion or malfunction of core components of a host or fleet of hosts (network stack, memory, CPU…).


We have the `system.network.errors` metric; I don't think we have anything else. I don't know if there is a way to retrieve this for other resources: as far as I know, libraries like psutil don't provide it for memory or disk. Still, I think the existing metrics cover the case of troubleshooting resource exhaustion/malfunction.
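
For illustration, per-interface error counters could be mapped onto `system.network.errors` data points roughly as follows. This is a minimal sketch and not part of the PR: the helper `network_error_points` and the `FakeCounters` stand-in are hypothetical, and the attribute keys `system.device` and `network.io.direction` are assumptions that may differ between versions of the conventions. The `errin`/`errout` field names mirror the per-interface tuples returned by `psutil.net_io_counters(pernic=True)`.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names psutil uses for
# per-interface counters (errin = receive errors, errout = transmit errors).
FakeCounters = namedtuple("FakeCounters", ["errin", "errout"])


def network_error_points(per_nic):
    """Turn per-interface error counters into (attributes, value) pairs."""
    points = []
    for device, counters in sorted(per_nic.items()):
        for direction, value in (("receive", counters.errin),
                                 ("transmit", counters.errout)):
            attributes = {
                "system.device": device,            # assumed attribute key
                "network.io.direction": direction,  # assumed attribute key
            }
            points.append((attributes, value))
    return points


# Sample input; one interface with three receive errors and no transmit errors.
sample = {"eth0": FakeCounters(errin=3, errout=0)}
points = network_error_points(sample)
```

With psutil installed, passing `psutil.net_io_counters(pernic=True)` instead of `sample` should work the same way, since only the `errin` and `errout` fields are read.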

mx-psi · 2024-11-29T09:13:59Z

docs/non-normative/groups/system/use-cases.md

+* Machine name  
+* ID (relevant to its context, could be a cloud provider ID or just base machine ID)  
+* OS information (platform, version, architecture, etc)  
+* Number of CPU cores  


Maybe this can be "CPU information" instead? We have a bunch of those here

mx-psi

Approving, I left a few non-blocking comments above :)

mx-psi · 2024-11-29T11:27:33Z

I marked #1403 and #1578 to be closed by this PR, please let me know if this is not right

jsuereth

I love writing this down.

I think we should move the "Two Class Design Strategy" categorization into the general non-normative guidance for all semantic conventions to follow.

mx-psi · 2024-12-19T10:43:59Z

What is missing for this to be merged?

braydonk · 2024-12-19T13:30:19Z

I'm finishing up edits for the remaining open comments, will be pushing this morning.

braydonk · 2024-12-19T18:27:07Z

I've pushed up two new commits:

487af83: Addresses review comments. I will re-request review from those who still had open comments.

01f43e9: To address the issue with the markdown files having really long lines, I have set up Prettier to apply to these markdown files and wrap them at 80 characters. Did this in a separate commit so it wasn't too difficult to see exactly how I addressed open comments.

lmolkova · 2024-12-21T01:08:30Z

docs/non-normative/groups/system/design-philosophy.md

+For example, there may be `process.linux`, `process.windows`, or `process.posix`
+names for metrics and attributes. We will not have root `linux.*`, `windows.*`,
+or `posix.*` namespaces. This is because of the principle we’re trying to uphold
+from the [Namespaces section](#namespaces); we still want the instrumentation
+source to be represented by the root namespace of the attribute/metric. If we
+had OS root namespaces, different sources like `system`, `process`, etc. could
+get very tangled within each OS namespace, defeating the intended design
+philosophy.
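
The rule in the quoted paragraph can be sketched as a small check. This is only an illustration, not tooling from the PR: the function `has_os_root_namespace`, the `OS_NAMESPACES` set, and the example name `linux.cgroup` are hypothetical.

```python
# OS names that, per the quoted paragraph, may appear as a sub-namespace
# (process.linux.*) but not as a root namespace (linux.*).
OS_NAMESPACES = {"linux", "windows", "posix"}


def has_os_root_namespace(name: str) -> bool:
    """Return True if a metric/attribute name starts with an OS namespace."""
    root = name.split(".", 1)[0]
    return root in OS_NAMESPACES


# Names whose root is the instrumentation source pass; an OS root is flagged.
results = {
    name: has_os_root_namespace(name)
    for name in ("process.linux.cgroup", "linux.cgroup", "system.network.errors")
}
```

Here `process.linux.cgroup` keeps `process` as its root, so it is not flagged, while the hypothetical `linux.cgroup` is.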


I'm curious what specific problems there would be if we gave up on the prefix and used the OS name as a root.

I'm trying to document the naming patterns we have in #1708, and I'm actually struggling to understand what benefit the domain prefix brings.

E.g. what should I do if I want to describe a property of the OS that's indifferent to the instrumentation point/source? Which namespace would I use?

lmolkova · 2024-12-21T01:14:08Z

PTAL at the related #1707 - it's my attempt to document overall semconv guidance (only attribute definition so far). There are some intersections.

braydonk requested review from a team as code owners November 26, 2024 15:01

braydonk requested a review from a team November 26, 2024 15:01

braydonk changed the title ~~System Semantic Conventions Non-Normative Guidance~~ [chore] System Semantic Conventions Non-Normative Guidance Nov 26, 2024

braydonk added the Skip Changelog and area:system labels Nov 26, 2024

mx-psi self-requested a review November 26, 2024 15:56

lmolkova approved these changes Nov 26, 2024

braydonk force-pushed the system_semconv_non_normative branch from e980f13 to e051e87 Compare November 27, 2024 14:30

ChrsMark approved these changes Nov 28, 2024

mx-psi reviewed Nov 29, 2024

mx-psi approved these changes Nov 29, 2024

AlexanderWert approved these changes Nov 29, 2024

This was linked to issues Nov 29, 2024

[non-normative] Write document with guidance behind naming of system conventions #1578

Open

Clarify OS specific system attributes/metrics namespace #1403

Open

jsuereth approved these changes Dec 6, 2024

christophe-kamphaus-jemmic approved these changes Dec 6, 2024

trask reviewed Dec 6, 2024

trask approved these changes Dec 6, 2024

braydonk added 7 commits December 19, 2024 18:25

add new docs folder to CODEOWNERS

6900c91

change old Bucket verbiage to Class

69a97da

address typo and nit comments

2183af1

add additional relevant discussion link

3c3c385

address review comments

487af83

wrap lines to 80 characters in system non-normative

01f43e9

braydonk force-pushed the system_semconv_non_normative branch from e051e87 to 01f43e9 Compare December 19, 2024 18:25

braydonk requested review from lmolkova, ChrsMark and trask December 19, 2024 18:27

trask approved these changes Dec 19, 2024

lmolkova mentioned this pull request Dec 21, 2024

Add system-specific naming guidance #1708

Open

3 tasks

lmolkova reviewed Dec 21, 2024
