Add service.type experimental Resource attribute #575
Would the value of `service.type` always be `io.opentelemetry.collector` for an OpenTelemetry Collector regardless of the distro?
Yes. I also proposed

**[1]:** MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words `service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled service). It is preferable for the ID to be persistent and stay the same for the lifetime of the service instance, however it is acceptable that the ID is ephemeral and changes during important lifetime events for the service (e.g. service restarts). If the service has no inherent unique ID that can be used as the value of this attribute it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID (services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations).

**[2]:** A string value having a meaning that helps to distinguish a group of services, for example the team name that owns a group of services. `service.name` is expected to be unique within the same namespace. If `service.namespace` is not specified in the Resource then `service.name` is expected to be unique for all services that have no explicit namespace defined (so the empty/unspecified namespace is simply one more valid namespace). Zero-length namespace string is assumed equal to unspecified namespace.

**[3]:** The `service.type` identifies the product that is deployed as the service. The same product may be simultaneously deployed multiple times on the same observable infrastructure. In this case each of these deployments will typically have a distinct `service.name` to help identify the logical role of the particular deployment, however their `service.type` will be the same and will help identify the deployed product.
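To make notes [1] to [3] concrete, here is a minimal sketch of the Resource attributes two deployments of the same product might carry. It uses plain dictionaries rather than any SDK; the helper name `build_service_resource` and the `service.name` values are hypothetical, and `service.type` is only the experimental attribute proposed in this PR.

```python
import uuid


def build_service_resource(name, service_type, namespace="", instance_id=None):
    """Return a dict of Resource attributes for one service instance.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    attrs = {
        "service.name": name,          # logical role, chosen by the operator
        "service.type": service_type,  # deployed product (proposed attribute)
        # Note [1]: fall back to a random Version 4 UUID when the service
        # has no inherent unique ID.
        "service.instance.id": instance_id or str(uuid.uuid4()),
    }
    # Note [2]: a zero-length namespace is equal to an unspecified one,
    # so the attribute is simply omitted in that case.
    if namespace:
        attrs["service.namespace"] = namespace
    return attrs


# The same Collector product deployed in two different logical roles.
agent = build_service_resource("host-agent", "io.opentelemetry.collector")
gateway = build_service_resource("gateway", "io.opentelemetry.collector")
```

Both resources share a `service.type` while their `service.name` and `service.instance.id` values differ, which is the situation note [3] describes.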
Should we explicitly mention the possibility of different distros or flavors of the same component here? Even if we don't have a convention for this on the first iteration
As described here, it plays well with the definition ECS has for `service.type`.
To avoid confusion I don't want to mention it in semconv until we have a clear understanding of how we want distros/flavours to be recorded. I think it can be done in future PRs.
Two thoughts, otherwise this looks good to me. Adding a role feels like a worthy follow-up if we want it.
@open-telemetry/specs-approvers please take a look.
Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

Collector's `service.name` can be overridden by the operator using `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name to some extent can serve that purpose but nothing prevents different service types from having the same executable file name, it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

Changes
=======

This change adds `service.type` as a Recommended, experimental Resource semantic convention.

The value is a string in reverse domain notation that uniquely identifies the type of the service (the type of the product deployed as the service), e.g. io.opentelemetry.collector, io.redis, etc.

Unlike (service.namespace,service.name,service.instance.id) triplet the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

For OpAMP having a separate `service.type` allows OpAMP, if desired by the operator, to manage the same type of agents in a similar way even though their `service.name` values may be different due to different logical roles they have.

An example unrelated to OpAMP, when using NGINX: `service.type` will be set to "com.nginx", while `service.name` is set to "api-gateway", denoting the logical role that the particular NGINX deployment serves in this particular system.
@open-telemetry/specs-approvers please take a look. If it looks good I will resolve the conflicts. |
There is no definition of "type" in this PR. I think if we don't want to be rigorous in defining semantic meaning of an attribute then what's the value in the attribute, just reserving its name? Without a clear definition people would start putting different data into it.
PR says "the same product". Is the same binary the same product? What if they are configured differently? Can different binaries be the same product?
I agree it is not defined very precisely. A couple additional ways to think about this:
Do you think it would be useful to add these explanations? |
@tigrannajaryan tbh I am not convinced of the usefulness of

At Uber we initially allowed services to hardcode their service names when initializing telemetry SDKs. This was later recognized as a big mistake because those service identities had meaning not just to telemetry, but needed (or at least it was highly desirable) to match other domains, e.g. identities that cluster manager was using, permissions, etc. So we made a coordinated effort to remove hardcoded service names and instead inject them based on env variables defined by the cluster manager. The cluster manager also supported other dimensions in the data model, such as deployment groups, regional jobs, etc. - all of those were injected into resource attributes automatically. I am assuming this model is not unique to Uber. And in this model, I don't see where

To play devil's advocate, per your clarification the collector as agent and collector as collector (as called out in the PR description) would still be the same service name and service type, since it's the same binary, the same output telemetry schema, and similar configuration. So why not use
+1, these additional dimensions (e.g. data center, geo location, availability zone, whether it is a private cloud or public cloud) keep evolving and are normally owned by other systems rather than observability. I think if "service.name" cannot be used as the primary key, it defeats the purpose; continuing to introduce more dimensions (e.g. "service.type" or "service.category" or something else) will make things more complicated without solving the issue in the end. What I learned at Microsoft: we used to have too many dimensions defined in the observability system, and in the end only these worked well after many years:
@yurishkuro The Generally speaking the end user can set for example
I think this is the equivalent of what I am proposing, simply with different attribute names. I am suggesting a (service.type, service.name) pair, you are suggesting a (service.name, service.role) pair. I see no semantic difference between these two proposals; only the attribute names differ. I don't mind using different attribute names if we agree that we need a pair and that the single attribute we have is not enough.
I think this is true only for first-party services. If the person (or the team) who develops and who deploys the service are the same then I agree with you, most likely they will choose the same values for

My focus is not the first-party services though. I am thinking about third-party services where the developer (the person who builds the service) and the operator (the person who deploys the service) are different and are disconnected from one another, are not part of the same organization.

Let's go back to the NGINX example. The developers of NGINX will define that

Let's say I expect a telemetry backend to provide a specialized dashboard that is built for NGINX. How does the backend know that this dashboard is applicable? It cannot look at the The
@reyang I think this is not enough to handle the use case I described above with NGINX. If you see a way, can you clarify how using these attributes could make the NGINX-specific dashboard activate?

--

To everyone: What's interesting is that we have these additional dimensions in semconv already, but they are domain-specific. For example The I think it is a choice that we need to make.

Use domain-specific attributes

We use domain-specific, different attributes to signify the type of the service (type as it is known at build time). This is what we do currently (e.g. db.system, messaging.system, webengine.name).

Pros

Cons

If we choose this approach I will close this PR and will create a new one to add

Settle on one service.type

We settle on one attribute (service.type) that is the same for all domains.

Pros

Cons
I would like to hear some more pros and cons on these choices since it is not clear to me what’s the best way. |
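For readers weighing the two options, the NGINX dashboard scenario discussed above could look roughly like this on the backend side if a single `service.type` attribute existed. This is only a sketch under that assumption: the registry `DASHBOARDS_BY_TYPE`, the dashboard identifiers, and `select_dashboard` are hypothetical and do not describe any real backend.

```python
# Hypothetical registry: product type (reverse domain notation) -> dashboard.
DASHBOARDS_BY_TYPE = {
    "com.nginx": "nginx-overview",
    "io.opentelemetry.collector": "otel-collector-overview",
}


def select_dashboard(resource_attrs):
    """Pick a product-specific dashboard from Resource attributes.

    Only service.type is consulted. service.name is operator-chosen
    (e.g. "api-gateway"), so it cannot identify the product.
    Returns None when no product-specific dashboard applies.
    """
    return DASHBOARDS_BY_TYPE.get(resource_attrs.get("service.type"))


nginx_resource = {"service.name": "api-gateway", "service.type": "com.nginx"}
first_party_resource = {"service.name": "checkout"}

nginx_dashboard = select_dashboard(nginx_resource)
fallback_dashboard = select_dashboard(first_party_resource)
```

With domain-specific attributes instead, the same lookup would have to check several keys (such as `db.system`, `messaging.system`, `webengine.name`) in turn.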
The difference is that I was not suggesting to standardize on

I don't think the OpAMP use case is a good fit here since it's about a handshake mechanism, not about tagging telemetry of a binary.

Ack on the NGINX use case. But it brings me back to my first question: if this field is meant to provide a classification mechanism, it needs a better definition and a well-defined value domain, since you want it to be a vendor-neutral identifier. If the only objective of this
@yurishkuro I will set aside the OpAMP case for now since I agree with you, it has other possible solutions (e.g. Let's focus on the telemetry shape for now and decide if we want it to be roughly described by one attribute
I am not sure that this is a better way. Let's say for example there are hundreds of 3rd party products that produce telemetry according to Otel semconv and they don't deviate from that semconv. In that case the telemetry from these products will reference the base Otel Schema

Nevertheless the shape of telemetry produced by each of these products can be very different since each product can use a very different subset of standard Otel semconv. If we require that the shape of the telemetry produced by each product is uniquely described by its Schema URL we are essentially forcing the products to have an extended schema derived from Otel Schema. That is an unnecessary burden. Publishing and maintaining an extended schema is a job that is best avoided unless there are strong reasons: it requires running a highly available http server that can serve the schema files. Compared to that including a single FQDN value for
Do you think specifying that the value is the reverse FQDN of the product is not precise enough? It is vendor-neutral, ensures no collisions (provided guidelines are followed), is unique enough and is easy to understand. Are you looking for more guidelines on which FQDN to use for a particular product? (e.g. Collector vs Collector Contrib - should they use the same FQDN and which one). |
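As a rough illustration of what "reverse FQDN of the product" could mean as a value domain, here is a syntactic check. The PR does not define an exact grammar, so the rules below (lowercase labels of letters, digits, and inner hyphens, with at least two labels) are an assumption, and `looks_like_reverse_domain` is a hypothetical name.

```python
import re

# Assumed grammar: one DNS-style label, lowercase only.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
# At least two labels separated by dots, e.g. "com.nginx".
_REVERSE_DOMAIN = re.compile(rf"{_LABEL}(?:\.{_LABEL})+")


def looks_like_reverse_domain(value):
    """Return True if value is shaped like reverse domain notation.

    This checks syntax only. It cannot tell whether the domain actually
    belongs to the vendor, or whether two products share a type.
    """
    return _REVERSE_DOMAIN.fullmatch(value) is not None


examples = ["io.opentelemetry.collector", "com.nginx", "io.redis", "nginx"]
results = {value: looks_like_reverse_domain(value) for value in examples}
```

A check like this answers the format question only. Which FQDN a given product should use (for example Collector versus Collector Contrib) would still need the guidelines discussed here.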
Not specifically on the format, but on what criteria need to be satisfied to produce the same or different FQDNs. For example, jaeger-v1 and jaeger-v2 telemetry shapes are likely going to be significantly different (because of the architecture change), so should both binaries still produce the same or different FQDNs? I can see arguments either way (since "Version" can be a separate resource attribute anyway). Or another example: jaeger-v2 is going to be a single binary that can work as either jaeger-collector or jaeger-query (from v1 nomenclature). Again, the telemetry is going to be pretty different, but it's the same binary and the same version - so same FQDN or not?
Yes, the same
Same From the perspective of what to put into |
Fair enough. None of these nuances & explanations are coming through in the description though. I don't like putting things into the spec that are so vague that you need a whole separate FAQ to explain how to use them. |
Any reason to generalize the attribute? Would anything break if applications that deploy the same binary to multiple services, need to build dashboards based on the same template, etc. defined a custom resource attribute for themselves? The use case sounds like a niche one based on my (potentially limited) experience.
I think this is a fair question. So far I do not see a huge number of supporting voices for a common |
I am happy to rewrite the PR and add these to semconv, but I would like to first see if there is good support for the attribute at all (as suggested by @lmolkova).
I stand by my earlier comment:
|
Isn't all of this already possible by using the Instrumentation Scope for this use case? https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#get-a-tracer Also, a thing I don't really get is:
IIUC, the From the use cases I read here, It seems the PR is somewhat backwards to me. To me, I'd understand it more like: |
@joaopgrassi The same product deployed as multiple roles likely still has largely the same configuration settings, which need to be set to mostly the same values. It is useful to be able to specify this common product-specific (
Hard to tell. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |