Add agent resource type #396

jlegoff · 2022-12-14T12:15:40Z

What are you trying to achieve?

I'd like to define semantic conventions for agent resources.

Additional context.

Agents are a key part of the software stack, and need to be monitored just as any other component. Several vendors already offer self-monitoring capabilities, for instance:

The OpenTelemetry Collector also offers a set of best practices for monitoring.

While agents can be considered services, we might want to add additional attributes to define them in a more specific manner. Possible examples include:

agent.type: com.dynatrace.one_agent, com.newrelic.infra_agent, io.opentelemetry.collector
agent.version
agent.distro: github.com/signalfx/splunk-otel-collector
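
To make the three proposed attributes concrete, here is a minimal sketch of what a Collector's own resource might carry if they were adopted. The agent.* keys are only the names proposed in this issue, not an accepted convention; the service.name value, the version number, and the describe_agent helper are made up for illustration.

```python
# Hypothetical resource attributes for a Collector reporting its own telemetry.
# The agent.* keys are the ones proposed in this issue; they are not an
# accepted semantic convention.
collector_resource = {
    "service.name": "edge-gateway",  # made-up, operator-chosen name
    "agent.type": "io.opentelemetry.collector",
    "agent.version": "0.68.0",  # made-up version
    "agent.distro": "github.com/signalfx/splunk-otel-collector",
}


def describe_agent(resource: dict) -> str:
    """Build a short label from the proposed agent.* attributes."""
    agent_type = resource.get("agent.type", "unknown")
    version = resource.get("agent.version", "unknown")
    distro = resource.get("agent.distro")
    label = f"{agent_type} {version}"
    # The distro is optional, so only append it when present.
    return f"{label} ({distro})" if distro else label


print(describe_agent(collector_resource))
```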

Note that this was first discussed in the context of OpAMP in this issue. However, since agent self-monitoring happens outside of the context of OpAMP, I think it makes sense to define semantic conventions in this repo.


tigrannajaryan · 2022-12-14T21:13:34Z

Several vendors already offer self-monitoring capabilities, for instance:

@jlegoff I think it would be very valuable to learn what attributes these vendors capture for self-monitoring purposes.

tigrannajaryan · 2022-12-14T21:17:33Z

@open-telemetry/specs-approvers this issue is about adding semantic conventions for recording Collector attributes.

Some questions:

Do we think this belongs to Otel semantic conventions?
Is agent. the right namespace? We could use otelcol. but using agent. generalizes it and allows other agents to also use it, which I think is desirable.
Do we need agent.version, or is service.version, as recommended by OpAMP, enough?

jlegoff · 2022-12-15T11:43:08Z

I think it would be very valuable to learn what attributes these vendors capture for self-monitoring purposes

From what I'm seeing, some vendors include the agent version. The agent type seems to be mostly omitted but it can be deduced from the context. For instance, the GCP ops agent includes an uptime metric containing the version. The namespace of the metric (agent.googleapis.com/agent/) gives you a hint that this is the Ops Agent.

A couple of examples:

Dynatrace includes a dt.oneagent.agent_type attribute, but I'm not seeing any reference to the version in the public docs
Grafana includes version and branch for the build info metric. The metric prefix (grafana_metrics_enterprise) tells you the agent type.
GCP ops agent only adds the version, as mentioned above
New Relic includes the agent name and plans to include the version as well

tigrannajaryan · 2022-12-15T14:35:26Z

@jlegoff thanks for the examples.

Here is also ECS with its agent. fields: https://www.elastic.co/guide/en/ecs/current/ecs-agent.html. I think agent.type is similar to what we want.

They also have several other attributes which I think diverge from our recommendation to use service.* attributes (e.g. I see correspondence of agent.version->service.version, agent.id->service.instance.id).

I am not sure introducing a new set of attributes for version/id just for agents is necessary when we can use service attributes. This may be justified if we think that calling the agent a "service" is wrong for some reason (but I don't know why it would be wrong).

jlegoff · 2023-03-21T08:49:19Z

Hi @arminru, I was wondering if you had a chance to look at this. Do you think adding an agent resource type makes sense?

arminru · 2023-03-21T16:30:45Z

Hey @jlegoff!

Do you think the service.name and service.version attributes suggested by @tigrannajaryan above would be suitable or are there reasons not to use them and introduce dedicated attributes instead?
How do you imagine this data to be reported? Would the agents themselves be instrumented with OTel?

Note that there are plans to merge ECS into OTel semantic conventions (see open-telemetry/oteps#222), so there would in any case be discussions about whether the agent.* attributes defined in ECS should be included or whether they are redundant in OTel.

jlegoff · 2023-03-22T09:42:36Z

@arminru regarding service.name I think it makes sense to use it when the agent is a service, as is the case for the collector. I do think it also makes sense to rely on a specific attribute to know which type of agent is sending the data. For instance, we shouldn't rely on the name of the service being io.opentelemetry.collector to know it's a collector, because users can change the name. Or they could have several collector services with different purposes and names - but their type should be the same.

I'm less sure about service.version. I'm thinking there may be reporting agents that are not services, in which case this field would not be set. Though it's true that, for OTel agents, we have the telemetry.sdk attributes.

How do you imagine this data to be reported? Would the agents themselves be instrumented with OTel?

In the case of the collector, we can use the self-monitoring capabilities.

Note that there are plans to merge ECS into OTel semantic conventions (see open-telemetry/oteps#222), so there would in any case be discussions about whether the agent.* attributes defined in ECS should be included or whether they are redundant in OTel.

I'm trying to find the agent attributes in the OTEP but I can't find them. In any case, wouldn't it make sense to prefix ECS attributes with aws.ecs, or something similar, to avoid clashes?

tigrannajaryan · 2023-03-24T19:33:06Z

This issue probably needs to be generalized a bit beyond just the needs of agents. Many other pieces of technology have a "type", but can also have a more specialized "name" in the particular context in which they are used.

For example I may be using PostgreSQL database for the purpose of storing online orders information. In that case the type of the service can be "postgresql" and the name of service may be "ordersdb".

I would like to explore the possibility of introducing service.type as an optional Service attribute. The service.type would describe the service as it is known by its developers, while service.name will continue to be the name of the service as it is known by its operators. This is primarily applicable to third-party services, where the people who develop the service and the people who run it are different. For first-party services the distinction likely is not applicable, and in that case service.type can either be missing or be set equal to service.name.

We would recommend using reverse FQDN for service.type and so for the Collector we would use service.type=io.opentelemetry.collector and for PostgreSQL we would use service.type=org.postgresql.

Similarly we may introduce service.distro. For example PostgreSQL has a bunch of forks and derived databases which this attribute can indicate.
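
As a rough illustration of the distinction, the sketch below groups resources by service.type while keeping the operator-chosen service.name values. The attribute values and the group_by_type helper are hypothetical, and falling back to service.name when service.type is missing is just one way to read the first-party case described above.

```python
from collections import defaultdict

# Made-up resources: two PostgreSQL deployments with different logical names,
# one Collector, and one first-party service without service.type.
resources = [
    {"service.name": "ordersdb", "service.type": "org.postgresql"},
    {"service.name": "usersdb", "service.type": "org.postgresql"},
    {"service.name": "edge-gateway", "service.type": "io.opentelemetry.collector"},
    {"service.name": "checkout"},
]


def group_by_type(items: list) -> dict:
    """Group service names by service.type, falling back to service.name."""
    groups = defaultdict(list)
    for item in items:
        name = item["service.name"]
        # Assumption: a missing service.type is treated as equal to service.name.
        groups[item.get("service.type", name)].append(name)
    return dict(groups)


print(group_by_type(resources))
```

With such a grouping, a backend could treat both PostgreSQL deployments alike even though their operators named them differently.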

jlegoff · 2023-03-31T13:00:57Z

I think this would work well in the case of standalone agents such as the collector, which was the motivation for this issue.

jaronoff97 · 2023-07-27T18:07:58Z

Yeah +1 to this, it's important to the operator group that we can distinguish between a collector OpAMP client and an operator OpAMP client. A respective server functions on different configuration (as per the spec). Something like being able to specify agent.type would make this possible. It would also be useful if we had defined constants for supported clients (collector and operator).

jaronoff97 · 2023-10-12T16:00:29Z

@arminru I was wondering if there had been any discussion or decisions made here? Would love to get this added to the docket if possible :) thank you!

arminru · 2023-10-12T16:28:37Z

@jaronoff97 I'm not aware of any further discussion. I'll move it over to the semconv repo where this fits better and might get more attention.

jaronoff97 · 2023-11-07T22:28:04Z

Hey @AlexanderWert I was wondering if there were any updates on this? With the merging of the OpAMP bridge and the OpAMP extension it's become more important to have a semconv to distinguish between these two agent types as part of their identifying attributes.

tigrannajaryan · 2023-11-23T02:43:48Z

Submitted this issue to discuss in semconv: #554

Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

## Problem Description

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol). Collector's `service.name` can be overridden by the operator using the `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name, so it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

## Proposed Change

This is a request for comments for adding the following Recommended, experimental Resource semantic conventions:

- `service.type` - an FQDN that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc.

Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

Note that having a separate `service.type` allows OpAMP, if wanted by the operator, to manage the same type of agents in a similar way even though their `service.name` values may be different due to the different logical roles they have.

Another example with NGINX: `service.type` will be set to com.nginx by NGINX developers, while `service.name` is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.

Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol). Collector's `service.name` can be overridden by the operator using the `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name, so it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

Changes
=======

This change adds `service.type` as a Recommended, experimental Resource semantic convention. The value is a string in reverse domain notation that uniquely identifies the type of the service (the type of the product deployed as the service), e.g. io.opentelemetry.collector, io.redis, etc.

Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

For OpAMP, having a separate `service.type` allows OpAMP, if desired by the operator, to manage the same type of agents in a similar way even though their `service.name` values may be different due to the different logical roles they have.

An example unrelated to OpAMP, when using NGINX: `service.type` will be set to "com.nginx", while `service.name` is set to "api-gateway", denoting the logical role that the particular NGINX deployment serves in this particular system.
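
The uniqueness remark about the two triplets can be illustrated with a small sketch. The two resources below are made up: the same Collector product is deployed under two logical names in one namespace, and both happen to use the same instance id, which is assumed to be acceptable because the names differ. The name_key and type_key helpers are hypothetical.

```python
def name_key(resource: dict) -> tuple:
    """(service.namespace, service.name, service.instance.id) triplet."""
    return (
        resource.get("service.namespace"),
        resource["service.name"],
        resource["service.instance.id"],
    )


def type_key(resource: dict) -> tuple:
    """(service.namespace, service.type, service.instance.id) triplet."""
    return (
        resource.get("service.namespace"),
        resource["service.type"],
        resource["service.instance.id"],
    )


# Made-up resources: one product, two logical roles, same instance id.
host_agent = {
    "service.namespace": "shop",
    "service.name": "host-agent",
    "service.type": "io.opentelemetry.collector",
    "service.instance.id": "1",
}
gateway = dict(host_agent, **{"service.name": "gateway"})

print(name_key(host_agent) != name_key(gateway))  # name-based triplets differ
print(type_key(host_agent) == type_key(gateway))  # type-based triplets collide
```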

tigrannajaryan · 2024-02-20T21:39:07Z

All, the PR that adds service.type is created, but I and others have doubts that this is the right way. Please comment on the PR with arguments in favour or against it.

Resolves open-telemetry#396

Resolves open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

We need a way to record more information about agents than is currently possible using existing semantic conventions. Otel Collector in particular today uses the service.name, service.instance.id and service.version attributes to report its own telemetry. These are useful but not sufficient; in particular, we are missing the information about which distribution of Otel Collector it is.

The agent.type/agent.version/agent.id conventions are also aligned with ECS: https://www.elastic.co/guide/en/ecs/current/ecs-agent.html

With the introduction of these conventions the following attributes change in Otel Collector's own telemetry output:

service.name -> agent.type
service.version -> agent.version
service.instance.id -> agent.id

agent.distro will be added as one more property, the equivalent of which did not exist in the past.
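
The attribute change described in this commit message could be sketched as a simple key rename. This is only an illustration of the mapping listed above, not code from the Collector; the to_agent_attributes helper and all attribute values are hypothetical.

```python
# Mapping taken from the commit message above.
RENAMES = {
    "service.name": "agent.type",
    "service.version": "agent.version",
    "service.instance.id": "agent.id",
}


def to_agent_attributes(resource: dict, distro: str = None) -> dict:
    """Rename service.* keys to the proposed agent.* keys (hypothetical helper)."""
    # Keys without a mapping are passed through unchanged.
    renamed = {RENAMES.get(key, key): value for key, value in resource.items()}
    if distro is not None:
        # agent.distro has no service.* equivalent, so it is added separately.
        renamed["agent.distro"] = distro
    return renamed


# Made-up values for illustration.
old_resource = {
    "service.name": "otelcontribcol",
    "service.version": "0.98.0",
    "service.instance.id": "example-instance-id",
}
print(to_agent_attributes(old_resource, distro="example.distro"))
```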

github-actions bot assigned arminru Dec 14, 2022

arminru added the enhancement New feature or request label Dec 14, 2022

arminru transferred this issue from open-telemetry/opentelemetry-specification Oct 12, 2023

github-actions bot assigned AlexanderWert Oct 12, 2023

tigrannajaryan mentioned this issue Nov 23, 2023

Request for comments: service.type and service.distro Resource attributes #554

Closed

tigrannajaryan mentioned this issue Dec 1, 2023

Add service.type experimental Resource attribute #575

Closed

3 tasks

github-actions bot added the Stale label Feb 19, 2024

github-actions bot removed the Stale label Feb 24, 2024

tigrannajaryan added a commit to tigrannajaryan/semantic-conventions that referenced this issue Apr 17, 2024

Add agent semantic conventions

cdbd4a9

Resolves open-telemetry#396

tigrannajaryan linked a pull request Apr 23, 2024 that will close this issue

Add agent semantic conventions #950

Draft

3 tasks
