From 639453f1b852c80bbc9bbcd74bbb201aa8512cb0 Mon Sep 17 00:00:00 2001
From: Dan Jaglowski
Date: Wed, 9 Oct 2024 15:56:50 -0400
Subject: [PATCH 01/17] RFC - Auto-instrumentation of pipeline components

---
 docs/rfcs/component-universal-telemetry.md | 119 +++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 docs/rfcs/component-universal-telemetry.md

diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md
new file mode 100644
index 00000000000..4e349281312
--- /dev/null
+++ b/docs/rfcs/component-universal-telemetry.md
@@ -0,0 +1,119 @@
+# Auto-Instrumented Component Telemetry
+
+## Motivation
+
+The collector should be observable and this must naturally include observability of its pipeline components. It is understood that each _type_ (`filelog`, `batch`, etc) of component may emit telemetry describing its internal workings, and that these internally derived signals may vary greatly based on the concerns and maturity of each component. Naturally though, the collector should also describe the behavior of components using broadly normalized telemetry. A major challenge in this pursuit is that there must be a clear mechanism by which such telemetry can be automatically captured. Therefore, this RFC is first and foremost a proposal for a _mechanism_. Then, based on what _can_ be captured by this mechanism, the RFC describes specific metrics, spans, and logs which can be broadly normalized.

## Goals

1. Articulate a mechanism which enables us to _automatically_ capture telemetry from _all pipeline components_.
2. Define attributes that are (A) specific enough to describe individual component [_instances_](https://github.com/open-telemetry/opentelemetry-collector/issues/10534) and (B) consistent enough for correlation across signals.
3. Define specific metrics for each kind of pipeline component.
4. Define specific spans for processors and connectors.
5. 
Define specific logs for all kinds of pipeline component. + +### Mechanism + +The mechanism of telemetry capture should be _external_ to components. Specifically, we should observe telemetry at each point where a component passes data to another component, and, at each point where a component consumes data from another component. In terms of the component graph, this means that every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the consuming component. Importantly, each layer generates telemetry which is ascribed to a single component instance, so by having two layers per edge we can describe both sides of each handoff independently. In the case of processors and connectors, the appropriate layers can act in concert (e.g. record the start and end of a span). + +### Attributes + +All signals should use the following attributes: + +#### Receivers + +- `otel.component.kind`: `receiver` +- `otel.component.id`: The component ID +- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ALL`** + +#### Processors + +- `otel.component.kind`: `processor` +- `otel.component.id`: The component ID +- `otel.pipeline.id`: The pipeline ID, **OR `ALL`** +- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ALL`** + +#### Exporters + +- `otel.component.kind`: `exporter` +- `otel.component.id`: The component ID +- `otel.signal`: `logs`, `metrics` `traces`, **OR `ALL`** + +#### Connectors + +- `otel.component.kind`: `connector` +- `otel.component.id`: The component ID +- `otel.signal`: `logs->logs`, `logs->metrics`, `logs->traces`, `metrics->logs`, `metrics->metrics`, etc, **OR `ALL`** + +Notes: The use of `ALL` is based on the assumption that components are instanced either in the default way, or, as a single instance per configuration (e.g. otlp receiver). + +### Metrics + +There are two straightforward measurements that can be made on any pdata: + +1. A count of "items" (spans, data points, or log records). 
These are low cost but broadly useful, so they should be enabled by default. +2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11). These are high cost to compute, so by default they should be disabled (and not calculated). + +The location of these measurements can be described in terms of whether the data is "incoming" or "outgoing", from the perspective of the component to which the telemetry is ascribed. + +1. Incoming measurements are attributed to the component which is _consuming_ the data. +2. Outgoing measurements are attributed to the component which is _producing_ the data. + +For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to whether or not the function call returned an error. Outgoing measurements will be recorded with `outcome` as `failure` when the next consumer returns an error, and `success` otherwise. Likewise, incoming measurements will be recorded with `outcome` as `failure` when the component itself returns an error, and `success` otherwise. + +```yaml + otelcol_component_incoming_items: + enabled: true + description: Number of items passed to the component. + unit: "{items}" + sum: + value_type: int + monotonic: true + otelcol_component_outgoing_items: + enabled: true + description: Number of items emitted from the component. + unit: "{items}" + sum: + value_type: int + monotonic: true + + otelcol_component_incoming_size: + enabled: false + description: Size of items passed to the component. + unit: "By" + sum: + value_type: int + monotonic: true + otelcol_component_outgoing_size: + enabled: false + description: Size of items emitted from the component. + unit: "By" + sum: + value_type: int + monotonic: true +``` + +### Spans + +A span should be recorded for each execution of a processor or connector. 
The instrumentation layers adjacent to these components can start and end the span as appropriate. + +### Logs + +Metrics and spans provide most of the observability we need but there are some gaps which logs can fill. For example, we can record spans for processors and connectors but logs are useful for capturing precise timing as it relates to data produced by receivers and consumed by exporters. Additionally, although metrics would describe the overall item counts, it is helpful in some cases to record more granular events. e.g. If an outgoing batch of 10,000 spans results in an error, but 100 batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric reports only that a 50% success rate is observed. + +For security and performance reasons, it would not be appropriate to log the contents of telemetry. + +It's very easy for logs to become too noisy. Even if errors are occurring frequently in the data pipeline, they may only be of interest to many users if they are not handled automatically. + +With the above considerations, this proposal includes only that we add a DEBUG log for each individual outcome. This should be sufficient for detailed troubleshooting but does not impact users otherwise. + +In the future, it may be helpful to define triggers for reporting repeated failures at a higher severity level. e.g. N number of failures in a row, or a moving average success %. For now, the criteria and necessary configurability is unclear so this is mentioned only as an example of future possibilities. 
+ +### Additional context + +This proposal pulls from a number of issues and PRs: + +- [Demonstrate graph-based metrics](https://github.com/open-telemetry/opentelemetry-collector/pull/11311) +- [Attributes for component instancing](https://github.com/open-telemetry/opentelemetry-collector/issues/11179) +- [Simple processor metrics](https://github.com/open-telemetry/opentelemetry-collector/issues/10708) +- [Component instancing is complicated](https://github.com/open-telemetry/opentelemetry-collector/issues/10534) From 2a486ffc52f3c6e991147b38e2209374130ada6b Mon Sep 17 00:00:00 2001 From: Daniel Jaglowski Date: Thu, 10 Oct 2024 11:45:32 -0500 Subject: [PATCH 02/17] Update docs/rfcs/component-universal-telemetry.md Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com> --- docs/rfcs/component-universal-telemetry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 4e349281312..aeacdb646d7 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -14,7 +14,7 @@ The collector should be observable and this must naturally include observability ### Mechanism -The mechanism of telemetry capture should be _external_ to components. Specifically, we should observe telemetry at each point where a component passes data to another component, and, at each point where a component consumes data from another component. In terms of the component graph, this means that every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the consuming component. Importantly, each layer generates telemetry which is ascribed to a single component instance, so by having two layers per edge we can describe both sides of each handoff independently. In the case of processors and connectors, the appropriate layers can act in concert (e.g. record the start and end of a span). 
+The mechanism of telemetry capture should be _external_ to components. Specifically, we should observe telemetry at each point where a component passes data to another component, and, at each point where a component consumes data from another component. In terms of the component graph, every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the consuming component. Importantly, each layer generates telemetry ascribed to a single component instance, so by having two layers per edge we can describe both sides of each handoff independently. In the case of processors and connectors, the appropriate layers can act in concert (e.g. record the start and end of a span). ### Attributes From d3500437437cf8f649d323ba988f379c7f4a38de Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Thu, 10 Oct 2024 13:16:50 -0400 Subject: [PATCH 03/17] Feedback --- docs/rfcs/component-universal-telemetry.md | 29 +++++++++++----------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index aeacdb646d7..5bc2329106c 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -2,19 +2,18 @@ ## Motivation -The collector should be observable and this must naturally include observability of its pipeline components. It is understood that each _type_ (`filelog`, `batch`, etc) of component may emit telemetry describing its internal workings, and that these internally derived signals may vary greatly based on the concerns and maturity of each component. Naturally though, the collector should also describe the behavior of components using broadly normalized telemetry. A major challenge in pursuit is that there must be a clear mechanism by which such telemetry can be automatically captured. Therefore, this RFC is first and foremost a proposal for a _mechanism_. 
Then, based on what _can_ be captured by this mechanism, the RFC describes specific metrics, spans, and logs which can be broadly normalized. +The collector should be observable and this must naturally include observability of its pipeline components. It is understood that each _type_ (`filelog`, `batch`, etc) of component may emit telemetry describing its internal workings, and that these internally derived signals may vary greatly based on the concerns and maturity of each component. Naturally though, the collector should also describe the behavior of components using broadly normalized telemetry. A major challenge in pursuit is that there must be a clear mechanism by which such telemetry can be automatically captured. Therefore, this RFC is first and foremost a proposal for a _mechanism_. Then, based on what _can_ be captured by this mechanism, the RFC describes specific metrics and logs which can be broadly normalized. ## Goals 1. Articulate a mechanism which enables us to _automatically_ capture telemetry from _all pipeline components_. 2. Define attributes that are (A) specific enough to describe individual component [_instances_](https://github.com/open-telemetry/opentelemetry-collector/issues/10534) and (B) consistent enough for correlation across signals. 3. Define specific metrics for each kind of pipeline component. -4. Define specific spans for processors and connectors. -5. Define specific logs for all kinds of pipeline component. +4. Define specific logs for all kinds of pipeline component. ### Mechanism -The mechanism of telemetry capture should be _external_ to components. Specifically, we should observe telemetry at each point where a component passes data to another component, and, at each point where a component consumes data from another component. In terms of the component graph, every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the consuming component. 
Importantly, each layer generates telemetry ascribed to a single component instance, so by having two layers per edge we can describe both sides of each handoff independently. In the case of processors and connectors, the appropriate layers can act in concert (e.g. record the start and end of a span). +The mechanism of telemetry capture should be _external_ to components. Specifically, we should observe telemetry at each point where a component passes data to another component, and, at each point where a component consumes data from another component. In terms of the component graph, every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the consuming component. Importantly, each layer generates telemetry ascribed to a single component instance, so by having two layers per edge we can describe both sides of each handoff independently. ### Attributes @@ -24,28 +23,28 @@ All signals should use the following attributes: - `otel.component.kind`: `receiver` - `otel.component.id`: The component ID -- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ALL`** +- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ANY`** #### Processors - `otel.component.kind`: `processor` - `otel.component.id`: The component ID -- `otel.pipeline.id`: The pipeline ID, **OR `ALL`** -- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ALL`** +- `otel.pipeline.id`: The pipeline ID, **OR `ANY`** +- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ANY`** #### Exporters - `otel.component.kind`: `exporter` - `otel.component.id`: The component ID -- `otel.signal`: `logs`, `metrics` `traces`, **OR `ALL`** +- `otel.signal`: `logs`, `metrics` `traces`, **OR `ANY`** #### Connectors - `otel.component.kind`: `connector` - `otel.component.id`: The component ID -- `otel.signal`: `logs->logs`, `logs->metrics`, `logs->traces`, `metrics->logs`, `metrics->metrics`, etc, **OR `ALL`** +- `otel.signal`: `logs->logs`, `logs->metrics`, `logs->traces`, 
`metrics->logs`, `metrics->metrics`, etc, **OR `ANY`**
 
-Notes: The use of `ALL` is based on the assumption that components are instanced either in the default way, or, as a single instance per configuration (e.g. otlp receiver).
+Notes: The use of `ANY` indicates that values are not associated with a particular signal or pipeline. This is used when a component enforces non-standard instancing patterns. For example, the `otlp` receiver is a singleton, so the values are aggregated across signals. Similarly, the `memory_limiter` processor is a singleton, so the values are aggregated across pipelines.
 
 ### Metrics
 
@@ -93,13 +92,9 @@ For both metrics, an `outcome` attribute with possible values `success` and `fai
     monotonic: true
 ```
 
-### Spans
-
-A span should be recorded for each execution of a processor or connector. The instrumentation layers adjacent to these components can start and end the span as appropriate.
-
 ### Logs
 
-Metrics and spans provide most of the observability we need but there are some gaps which logs can fill. For example, we can record spans for processors and connectors but logs are useful for capturing precise timing as it relates to data produced by receivers and consumed by exporters. Additionally, although metrics would describe the overall item counts, it is helpful in some cases to record more granular events. e.g. If an outgoing batch of 10,000 spans results in an error, but 100 batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric reports only that a 50% success rate is observed.
+Metrics provide most of the observability we need but there are some gaps which logs can fill. Although metrics would describe the overall item counts, it is helpful in some cases to record more granular events. e.g. 
If an outgoing batch of 10,000 spans results in an error, but 100 batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric reports only that a 50% success rate is observed. For security and performance reasons, it would not be appropriate to log the contents of telemetry. @@ -109,6 +104,10 @@ With the above considerations, this proposal includes only that we add a DEBUG l In the future, it may be helpful to define triggers for reporting repeated failures at a higher severity level. e.g. N number of failures in a row, or a moving average success %. For now, the criteria and necessary configurability is unclear so this is mentioned only as an example of future possibilities. +### Spans + +It is not clear that any spans can be captured automatically with the proposed mechanism. We have the ability to insert instrumentation both before and after processors and connectors. However, we generally cannot assume a 1:1 relationship between incoming and outgoing data. 
+ ### Additional context This proposal pulls from a number of issues and PRs: From 9a2a9258da33642622c8c2b4f6f8df4ecaaf580b Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Fri, 11 Oct 2024 15:09:05 -0400 Subject: [PATCH 04/17] Feedback --- docs/rfcs/component-universal-telemetry.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 5bc2329106c..94f34b79a14 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -23,28 +23,29 @@ All signals should use the following attributes: - `otel.component.kind`: `receiver` - `otel.component.id`: The component ID -- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ANY`** +- `otel.signal`: `logs`, `metrics`, `traces` #### Processors - `otel.component.kind`: `processor` - `otel.component.id`: The component ID -- `otel.pipeline.id`: The pipeline ID, **OR `ANY`** -- `otel.signal`: `logs`, `metrics`, `traces`, **OR `ANY`** +- `otel.pipeline.id`: The pipeline ID +- `otel.signal`: `logs`, `metrics`, `traces` #### Exporters - `otel.component.kind`: `exporter` - `otel.component.id`: The component ID -- `otel.signal`: `logs`, `metrics` `traces`, **OR `ANY`** +- `otel.signal`: `logs`, `metrics` `traces` #### Connectors - `otel.component.kind`: `connector` - `otel.component.id`: The component ID -- `otel.signal`: `logs->logs`, `logs->metrics`, `logs->traces`, `metrics->logs`, `metrics->metrics`, etc, **OR `ANY`** +- `otel.signal`: `logs`, `metrics` `traces` +- `otel.output.signal`: `logs`, `metrics` `traces` -Notes: The use of `ANY` indicates that values are not associated with a particular signal or pipeline. This is used when a component enforces non-standard instancing patterns. For example, the `otlp` receiver isa singleton, so the values are aggregated across signals. 
Similarly, the `memory_limiter` processor is a singleton, so the values are aggregated across pipelines.
+Note: The `otel.signal`, `otel.output.signal`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances are unified by the component implementation. For example, the `otlp` receiver is a singleton, so its telemetry is not specific to a signal. Similarly, the `memory_limiter` processor is a singleton, so its telemetry is not specific to a pipeline.
 
 ### Metrics
 
@@ -58,7 +59,7 @@ The location of these measurements can be described in terms of whether the data
 1. Incoming measurements are attributed to the component which is _consuming_ the data.
 2. Outgoing measurements are attributed to the component which is _producing_ the data.
 
-For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to whether or not the function call returned an error. Outgoing measurements will be recorded with `outcome` as `failure` when the next consumer returns an error, and `success` otherwise. Likewise, incoming measurements will be recorded with `outcome` as `failure` when the component itself returns an error, and `success` otherwise.
+For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to whether or not the corresponding function call returned an error. Specifically, incoming measurements will be recorded with `outcome` as `failure` when a call from the previous component to the `ConsumeX` function returns an error, and `success` otherwise. Likewise, outgoing measurements will be recorded with `outcome` as `failure` when a call to the next consumer's `ConsumeX` function returns an error, and `success` otherwise. 
```yaml otelcol_component_incoming_items: From 9d0cc6de723821817c56695dec7d6036ddb32e7e Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Wed, 16 Oct 2024 15:24:17 -0400 Subject: [PATCH 05/17] Broaden scope and convert to evolving consensus --- docs/rfcs/component-universal-telemetry.md | 93 +++++++++++++++------- 1 file changed, 66 insertions(+), 27 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 94f34b79a14..8fed5e062bc 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -1,65 +1,96 @@ -# Auto-Instrumented Component Telemetry +# Pipeline Component Telemetry -## Motivation +## Motivation and Scope -The collector should be observable and this must naturally include observability of its pipeline components. It is understood that each _type_ (`filelog`, `batch`, etc) of component may emit telemetry describing its internal workings, and that these internally derived signals may vary greatly based on the concerns and maturity of each component. Naturally though, the collector should also describe the behavior of components using broadly normalized telemetry. A major challenge in pursuit is that there must be a clear mechanism by which such telemetry can be automatically captured. Therefore, this RFC is first and foremost a proposal for a _mechanism_. Then, based on what _can_ be captured by this mechanism, the RFC describes specific metrics and logs which can be broadly normalized. +The collector should be observable and this must naturally include observability of its pipeline components. Pipeline components +are those components of the collector which directly interact with data, specifically receivers, processors, exporters, and connectors. 

It is understood that each _type_ (`filelog`, `batch`, etc) of component may emit telemetry describing its internal workings,
and that these internally derived signals may vary greatly based on the concerns and maturity of each component. Naturally
though, there is much we can do to normalize the telemetry emitted from and about pipeline components.

Two major challenges in pursuit of broadly normalized telemetry are (1) consistent attributes, and (2) automatic capture.

This RFC represents an evolving consensus about the desired end state of component telemetry. It does _not_ claim
to describe the final state of all component telemetry, but rather seeks to document some specific aspects. It proposes a set of
attributes which are both necessary and sufficient to identify components and their instances. It also articulates one specific
mechanism by which some telemetry can be automatically captured. Finally, it describes some specific metrics and logs which should
be automatically captured for each kind of pipeline component.

## Goals

1. Define attributes that are (A) specific enough to describe individual component [_instances_](https://github.com/open-telemetry/opentelemetry-collector/issues/10534)
   and (B) consistent enough for correlation across signals.
2. Articulate a mechanism which enables us to _automatically_ capture telemetry from _all pipeline components_.
3. Define specific metrics for each kind of pipeline component.
4. Define specific logs for all kinds of pipeline component.

### Mechanism

The mechanism of telemetry capture should be _external_ to components. 
Specifically, we should observe telemetry at each point where a component passes data to another component, and, at each point where a component consumes data from another component. In terms of the component graph, every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the consuming component. Importantly, each layer generates telemetry ascribed to a single component instance, so by having two layers per edge we can describe both sides of each handoff independently.

## Attributes

All signals should use the following attributes:

### Receivers

- `otel.component.kind`: `receiver`
- `otel.component.id`: The component ID
- `otel.signal`: `logs`, `metrics`, `traces`

### Processors

- `otel.component.kind`: `processor`
- `otel.component.id`: The component ID
- `otel.pipeline.id`: The pipeline ID
- `otel.signal`: `logs`, `metrics`, `traces`

### Exporters

- `otel.component.kind`: `exporter`
- `otel.component.id`: The component ID
- `otel.signal`: `logs`, `metrics`, `traces`

### Connectors

- `otel.component.kind`: `connector`
- `otel.component.id`: The component ID
- `otel.signal`: `logs`, `metrics`, `traces`
- `otel.output.signal`: `logs`, `metrics`, `traces`

Note: The `otel.signal`, `otel.output.signal`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances
are unified by the component implementation. For example, the `otlp` receiver is a singleton, so its telemetry is not specific to a signal. 
+Similarly, the `memory_limiter` processor is a singleton, so its telemetry is not specific to a pipeline. + +## Auto-Instrumentation Mechanism + +The mechanism of telemetry capture should be _external_ to components. Specifically, we should observe telemetry at each point where a +component passes data to another component, and, at each point where a component consumes data from another component. In terms of the +component graph, every _edge_ in the graph will have two layers of instrumentation - one for the producing component and one for the +consuming component. Importantly, each layer generates telemetry ascribed to a single component instance, so by having two layers per +edge we can describe both sides of each handoff independently. + +Telemetry captured by this mechanism should be associated with an instrumentation scope corresponding to the package which implements +the mechanism. Currently, that package is `service/internal/graph`, but this may change in the future. Notably, this telemetry is not +ascribed to individual component packages, both because the instrumentation scope is intended to describe the origin of the telemetry, +and because no mechanism is presently identified which would allow us to determine the characteristics of a component-specific scope. -### Metrics +### Auto-Instrumented Metrics There are two straightforward measurements that can be made on any pdata: 1. A count of "items" (spans, data points, or log records). These are low cost but broadly useful, so they should be enabled by default. -2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11). These are high cost to compute, so by default they should be disabled (and not calculated). +2. 
A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11).
   These are high cost to compute, so by default they should be disabled (and not calculated).

The location of these measurements can be described in terms of whether the data is "incoming" or "outgoing", from the perspective of the
component to which the telemetry is ascribed.

1. Incoming measurements are attributed to the component which is _consuming_ the data.
2. Outgoing measurements are attributed to the component which is _producing_ the data.

For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to
whether or not the corresponding function call returned an error. Specifically, incoming measurements will be recorded with `outcome` as
`failure` when a call from the previous component to the `ConsumeX` function returns an error, and `success` otherwise. Likewise, outgoing
measurements will be recorded with `outcome` as `failure` when a call to the next consumer's `ConsumeX` function returns an error, and
`success` otherwise. 
```yaml otelcol_component_incoming_items: @@ -93,23 +124,31 @@ For both metrics, an `outcome` attribute with possible values `success` and `fai monotonic: true ``` -### Logs +### Auto-Instrumented Logs -Metrics provide most of the observability we need but there are some gaps which logs can fill. Although metrics would describe the overall item counts, it is helpful in some cases to record more granular events. e.g. If an outgoing batch of 10,000 spans results in an error, but 100 batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric reports only that a 50% success rate is observed. +Metrics provide most of the observability we need but there are some gaps which logs can fill. Although metrics would describe the overall +item counts, it is helpful in some cases to record more granular events. e.g. If an outgoing batch of 10,000 spans results in an error, but +100 batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric +reports only that a 50% success rate is observed. For security and performance reasons, it would not be appropriate to log the contents of telemetry. -It's very easy for logs to become too noisy. Even if errors are occurring frequently in the data pipeline, they may only be of interest to many users if they are not handled automatically. +It's very easy for logs to become too noisy. Even if errors are occurring frequently in the data pipeline, they may only be of interest to +many users if they are not handled automatically. -With the above considerations, this proposal includes only that we add a DEBUG log for each individual outcome. This should be sufficient for detailed troubleshooting but does not impact users otherwise. +With the above considerations, this proposal includes only that we add a DEBUG log for each individual outcome. 
This should be sufficient for +detailed troubleshooting but does not impact users otherwise. -In the future, it may be helpful to define triggers for reporting repeated failures at a higher severity level. e.g. N number of failures in a row, or a moving average success %. For now, the criteria and necessary configurability is unclear so this is mentioned only as an example of future possibilities. +In the future, it may be helpful to define triggers for reporting repeated failures at a higher severity level. e.g. N number of failures in +a row, or a moving average success %. For now, the criteria and necessary configurability is unclear so this is mentioned only as an example +of future possibilities. -### Spans +### Auto-Instrumented Spans -It is not clear that any spans can be captured automatically with the proposed mechanism. We have the ability to insert instrumentation both before and after processors and connectors. However, we generally cannot assume a 1:1 relationship between incoming and outgoing data. +It is not clear that any spans can be captured automatically with the proposed mechanism. We have the ability to insert instrumentation both +before and after processors and connectors. However, we generally cannot assume a 1:1 relationship between incoming and outgoing data. 
-### Additional context +## Additional Context This proposal pulls from a number of issues and PRs: From 14b0ba19423d2f23a1a3b68969cd05a131baae6f Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Mon, 21 Oct 2024 10:41:28 -0400 Subject: [PATCH 06/17] Update names to consumed and produced --- docs/rfcs/component-universal-telemetry.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 8fed5e062bc..702b58dde6c 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -80,27 +80,27 @@ There are two straightforward measurements that can be made on any pdata: 2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11). These are high cost to compute, so by default they should be disabled (and not calculated). -The location of these measurements can be described in terms of whether the data is "incoming" or "outgoing", from the perspective of the +The location of these measurements can be described in terms of whether the data is "consumed" or "produced", from the perspective of the component to which the telemetry is ascribed. 1. Incoming measurements are attributed to the component which is _consuming_ the data. 2. Outgoing measurements are attributed to the component which is _producing_ the data. For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to -whether or not the corresponding function call returned an error. Specifically, incoming measurements will be recorded with `outcome` as -`failure` when a call from the previous component the `ConsumeX` function returns an error, and `success` otherwise. Likewise, outgoing +whether or not the corresponding function call returned an error. 
Specifically, consumed measurements will be recorded with `outcome` as
+`failure` when a call from the previous component to the `ConsumeX` function returns an error, and `success` otherwise. Likewise, produced
 measurements will be recorded with `outcome` as `failure` when a call to the next consumer's `ConsumeX` function returns an error, and
 `success` otherwise.
 ```yaml
-  otelcol_component_incoming_items:
+  otelcol_component_consumed_items:
     enabled: true
     description: Number of items passed to the component.
     unit: "{items}"
     sum:
       value_type: int
       monotonic: true
-  otelcol_component_outgoing_items:
+  otelcol_component_produced_items:
     enabled: true
     description: Number of items emitted from the component.
     unit: "{items}"
@@ -108,14 +108,14 @@ measurements will be recorded with `outcome` as `failure` when a call to the nex
       value_type: int
       monotonic: true
-  otelcol_component_incoming_size:
+  otelcol_component_consumed_size:
     enabled: false
     description: Size of items passed to the component.
     unit: "By"
     sum:
       value_type: int
       monotonic: true
-  otelcol_component_outgoing_size:
+  otelcol_component_produced_size:
     enabled: false
     description: Size of items emitted from the component.
     unit: "By"
@@ -127,7 +127,7 @@ measurements will be recorded with `outcome` as `failure` when a call to the nex
 ### Auto-Instrumented Logs
 Metrics provide most of the observability we need but there are some gaps which logs can fill. Although metrics would describe the overall
-item counts, it is helpful in some cases to record more granular events. e.g. If an outgoing batch of 10,000 spans results in an error, but
+item counts, it is helpful in some cases to record more granular events. e.g. If a produced batch of 10,000 spans results in an error, but
 100 batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric
 reports only that a 50% success rate is observed.
@@ -146,7 +146,7 @@ of future possibilities.
### Auto-Instrumented Spans It is not clear that any spans can be captured automatically with the proposed mechanism. We have the ability to insert instrumentation both -before and after processors and connectors. However, we generally cannot assume a 1:1 relationship between incoming and outgoing data. +before and after processors and connectors. However, we generally cannot assume a 1:1 relationship between consumed and produced data. ## Additional Context From 9cc449a2b1521549375909aaf33b9bf7b82e0fa3 Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Wed, 23 Oct 2024 14:08:58 -0400 Subject: [PATCH 07/17] Change proposed metric names to use '.' instead of '_' --- docs/rfcs/component-universal-telemetry.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 702b58dde6c..94372c015bb 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -93,14 +93,14 @@ measurements will be recorded with `outcome` as `failure` when a call to the nex `success` otherwise. ```yaml - otelcol_component_consumed_items: + otelcol.component.consumed.items: enabled: true description: Number of items passed to the component. unit: "{items}" sum: value_type: int monotonic: true - otelcol_component_produced_items: + otelcol.component.produced.items: enabled: true description: Number of items emitted from the component. unit: "{items}" @@ -108,14 +108,14 @@ measurements will be recorded with `outcome` as `failure` when a call to the nex value_type: int monotonic: true - otelcol_component_consumed_size: + otelcol.component.consumed.size: enabled: false description: Size of items passed to the component. unit: "By" sum: value_type: int monotonic: true - otelcol_component_produced_size: + otelcol.component.produced.size: enabled: false description: Size of items emitted from the component. 
unit: "By" From f03ce85e564b9f5927dd6e3f9b50d2804d7c61f4 Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Wed, 23 Oct 2024 14:41:46 -0400 Subject: [PATCH 08/17] Separate metrics by component kind --- docs/rfcs/component-universal-telemetry.md | 78 ++++++++++++++++++---- 1 file changed, 66 insertions(+), 12 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 94372c015bb..9ef5347ffd5 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -81,10 +81,8 @@ There are two straightforward measurements that can be made on any pdata: These are high cost to compute, so by default they should be disabled (and not calculated). The location of these measurements can be described in terms of whether the data is "consumed" or "produced", from the perspective of the -component to which the telemetry is ascribed. - -1. Incoming measurements are attributed to the component which is _consuming_ the data. -2. Outgoing measurements are attributed to the component which is _producing_ the data. +component to which the telemetry is attributed. Metrics which contain the term "procuded" describe data which is emitted from the component, +while metrics which contain the term "consumed" describe data which is received by the component. For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to whether or not the corresponding function call returned an error. Specifically, consumed measurements will be recorded with `outcome` as @@ -93,31 +91,87 @@ measurements will be recorded with `outcome` as `failure` when a call to the nex `success` otherwise. ```yaml - otelcol.component.consumed.items: + otelcol.receiver.produced.items: + enabled: true + description: Number of items emitted from the receiver. 
+ unit: "{items}" + sum: + value_type: int + monotonic: true + otelcol.processor.consumed.items: + enabled: true + description: Number of items passed to the processor. + unit: "{items}" + sum: + value_type: int + monotonic: true + otelcol.processor.produced.items: + enabled: true + description: Number of items emitted from the processor. + unit: "{items}" + sum: + value_type: int + monotonic: true + otelcol.connector.consumed.items: + enabled: true + description: Number of items passed to the connector. + unit: "{items}" + sum: + value_type: int + monotonic: true + otelcol.connector.produced.items: enabled: true - description: Number of items passed to the component. + description: Number of items emitted from the connector. unit: "{items}" sum: value_type: int monotonic: true - otelcol.component.produced.items: + otelcol.exporter.consumed.items: enabled: true - description: Number of items emitted from the component. + description: Number of items passed to the exporter. unit: "{items}" sum: value_type: int monotonic: true - otelcol.component.consumed.size: + otelcol.receiver.produced.size: + enabled: false + description: Size of items emitted from the receiver. + unit: "By" + sum: + value_type: int + monotonic: true + otelcol.processor.consumed.size: + enabled: false + description: Size of items passed to the processor. + unit: "By" + sum: + value_type: int + monotonic: true + otelcol.processor.produced.size: + enabled: false + description: Size of items emitted from the processor. + unit: "By" + sum: + value_type: int + monotonic: true + otelcol.connector.consumed.size: + enabled: false + description: Size of items passed to the connector. + unit: "By" + sum: + value_type: int + monotonic: true + otelcol.connector.produced.size: enabled: false - description: Size of items passed to the component. + description: Size of items emitted from the connector. 
unit: "By" sum: value_type: int monotonic: true - otelcol.component.produced.size: + otelcol.exporter.consumed.size: enabled: false - description: Size of items emitted from the component. + description: Size of items passed to the exporter. unit: "By" sum: value_type: int From 3a911350029dd90d83f4d09c4973fc5b469b259a Mon Sep 17 00:00:00 2001 From: Daniel Jaglowski Date: Thu, 24 Oct 2024 09:52:28 -0500 Subject: [PATCH 09/17] Add profiles as attribute value Co-authored-by: Damien Mathieu <42@dmathieu.com> --- docs/rfcs/component-universal-telemetry.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 9ef5347ffd5..38d9f5ec396 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -33,27 +33,27 @@ All signals should use the following attributes: - `otel.component.kind`: `receiver` - `otel.component.id`: The component ID -- `otel.signal`: `logs`, `metrics`, `traces` +- `otel.signal`: `logs`, `metrics`, `traces`, `profiles` ### Processors - `otel.component.kind`: `processor` - `otel.component.id`: The component ID - `otel.pipeline.id`: The pipeline ID -- `otel.signal`: `logs`, `metrics`, `traces` +- `otel.signal`: `logs`, `metrics`, `traces`, `profiles` ### Exporters - `otel.component.kind`: `exporter` - `otel.component.id`: The component ID -- `otel.signal`: `logs`, `metrics` `traces` +- `otel.signal`: `logs`, `metrics` `traces`, `profiles` ### Connectors - `otel.component.kind`: `connector` - `otel.component.id`: The component ID - `otel.signal`: `logs`, `metrics` `traces` -- `otel.output.signal`: `logs`, `metrics` `traces` +- `otel.output.signal`: `logs`, `metrics` `traces`, `profiles` Note: The `otel.signal`, `otel.output.signal`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances are unified by the component implementation. 
For example, the `otlp` receiver is a singleton, so its telemetry is not specific to a signal. From ecda69571f08e5bb16287643b00bed430cb8c09a Mon Sep 17 00:00:00 2001 From: Daniel Jaglowski Date: Wed, 30 Oct 2024 08:04:26 -0500 Subject: [PATCH 10/17] Update docs/rfcs/component-universal-telemetry.md Co-authored-by: William Dumont --- docs/rfcs/component-universal-telemetry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 38d9f5ec396..a69a6d652dd 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -81,7 +81,7 @@ There are two straightforward measurements that can be made on any pdata: These are high cost to compute, so by default they should be disabled (and not calculated). The location of these measurements can be described in terms of whether the data is "consumed" or "produced", from the perspective of the -component to which the telemetry is attributed. Metrics which contain the term "procuded" describe data which is emitted from the component, +component to which the telemetry is attributed. Metrics which contain the term "produced" describe data which is emitted from the component, while metrics which contain the term "consumed" describe data which is received by the component. 
For both metrics, an `outcome` attribute with possible values `success` and `failure` should be automatically recorded, corresponding to From e87f245dcee0a3b8bc02cf08d11baf29c445480e Mon Sep 17 00:00:00 2001 From: Daniel Jaglowski Date: Mon, 4 Nov 2024 11:15:52 -0600 Subject: [PATCH 11/17] Update docs/rfcs/component-universal-telemetry.md Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com> --- docs/rfcs/component-universal-telemetry.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index a69a6d652dd..e5f9269f420 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -187,8 +187,8 @@ reports only that a 50% success rate is observed. For security and performance reasons, it would not be appropriate to log the contents of telemetry. -It's very easy for logs to become too noisy. Even if errors are occurring frequently in the data pipeline, they may only be of interest to -many users if they are not handled automatically. +It's very easy for logs to become too noisy. Even if errors are occurring frequently in the data pipeline, only the errors that are not +handled automatically will be of interest to most users. With the above considerations, this proposal includes only that we add a DEBUG log for each individual outcome. This should be sufficient for detailed troubleshooting but does not impact users otherwise. 
From 9645cf04667825f9f49e1281f1ccc5d8982bbc98 Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Mon, 4 Nov 2024 16:39:44 -0500 Subject: [PATCH 12/17] Change 'otel.output.signal' to 'otel.signal.output' --- docs/rfcs/component-universal-telemetry.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index e5f9269f420..bc61d5a3cee 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -53,9 +53,9 @@ All signals should use the following attributes: - `otel.component.kind`: `connector` - `otel.component.id`: The component ID - `otel.signal`: `logs`, `metrics` `traces` -- `otel.output.signal`: `logs`, `metrics` `traces`, `profiles` +- `otel.signal.output`: `logs`, `metrics` `traces`, `profiles` -Note: The `otel.signal`, `otel.output.signal`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances +Note: The `otel.signal`, `otel.signal.output`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances are unified by the component implementation. For example, the `otlp` receiver is a singleton, so its telemetry is not specific to a signal. Similarly, the `memory_limiter` processor is a singleton, so its telemetry is not specific to a pipeline. 
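The note about singleton instances implies that the instrumentation layer builds each attribute set conditionally, omitting `otel.pipeline.id` or `otel.signal` when a single instance serves every pipeline or signal. A small sketch of that assembly, using the attribute names as they stand at this point in the series (the helper itself is hypothetical):

```go
package main

import "fmt"

// instanceAttrs assembles the attribute set for one component instance.
// An empty pipelineID or signal models the singleton case described in
// the note above, where the corresponding attribute is omitted entirely.
func instanceAttrs(kind, id, pipelineID, signal string) map[string]string {
	attrs := map[string]string{
		"otel.component.kind": kind,
		"otel.component.id":   id,
	}
	if pipelineID != "" {
		attrs["otel.pipeline.id"] = pipelineID
	}
	if signal != "" {
		attrs["otel.signal"] = signal
	}
	return attrs
}

func main() {
	// A per-pipeline processor instance carries all four attributes...
	fmt.Println(instanceAttrs("processor", "batch", "traces", "traces"))
	// ...while a singleton such as memory_limiter omits the pipeline ID.
	fmt.Println(instanceAttrs("processor", "memory_limiter", "", "traces"))
}
```

Keeping the omission logic in one place is what makes the attribute sets consistent enough for cross-signal correlation.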
From eda769992d6f86a126b57aa42e31e16a64939819 Mon Sep 17 00:00:00 2001
From: Dan Jaglowski
Date: Thu, 7 Nov 2024 09:24:22 -0500
Subject: [PATCH 13/17] Change 'otel.*' to 'otelcol.*'

---
 docs/rfcs/component-universal-telemetry.md | 30 +++++++++++-----------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md
index bc61d5a3cee..fa573d56aef 100644
--- a/docs/rfcs/component-universal-telemetry.md
+++ b/docs/rfcs/component-universal-telemetry.md
@@ -31,31 +31,31 @@ All signals should use the following attributes:
 ### Receivers
-- `otel.component.kind`: `receiver`
-- `otel.component.id`: The component ID
-- `otel.signal`: `logs`, `metrics`, `traces`, `profiles`
+- `otelcol.component.kind`: `receiver`
+- `otelcol.component.id`: The component ID
+- `otelcol.signal`: `logs`, `metrics`, `traces`, `profiles`
 ### Processors
-- `otel.component.kind`: `processor`
-- `otel.component.id`: The component ID
-- `otel.pipeline.id`: The pipeline ID
-- `otel.signal`: `logs`, `metrics`, `traces`, `profiles`
+- `otelcol.component.kind`: `processor`
+- `otelcol.component.id`: The component ID
+- `otelcol.pipeline.id`: The pipeline ID
+- `otelcol.signal`: `logs`, `metrics`, `traces`, `profiles`
 ### Exporters
-- `otel.component.kind`: `exporter`
-- `otel.component.id`: The component ID
-- `otel.signal`: `logs`, `metrics` `traces`, `profiles`
+- `otelcol.component.kind`: `exporter`
+- `otelcol.component.id`: The component ID
+- `otelcol.signal`: `logs`, `metrics`, `traces`, `profiles`
 ### Connectors
-- `otel.component.kind`: `connector`
-- `otel.component.id`: The component ID
-- `otel.signal`: `logs`, `metrics` `traces`
-- `otel.signal.output`: `logs`, `metrics` `traces`, `profiles`
+- `otelcol.component.kind`: `connector`
+- `otelcol.component.id`: The component ID
+- `otelcol.signal`: `logs`, `metrics`, `traces`
+- `otelcol.signal.output`: `logs`, `metrics`, `traces`, `profiles`
-Note: The 
`otel.signal`, `otel.signal.output`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances +Note: The `otelcol.signal`, `otelcol.signal.output`, or `otelcol.pipeline.id` attributes may be omitted if the corresponding component instances are unified by the component implementation. For example, the `otlp` receiver is a singleton, so its telemetry is not specific to a signal. Similarly, the `memory_limiter` processor is a singleton, so its telemetry is not specific to a pipeline. From 5d5407845e3a5059babfcb1fdeace965d49f2680 Mon Sep 17 00:00:00 2001 From: Daniel Jaglowski Date: Thu, 21 Nov 2024 08:57:46 -0600 Subject: [PATCH 14/17] Update docs/rfcs/component-universal-telemetry.md --- docs/rfcs/component-universal-telemetry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index fa573d56aef..6c59e9c4585 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -78,7 +78,7 @@ There are two straightforward measurements that can be made on any pdata: 1. A count of "items" (spans, data points, or log records). These are low cost but broadly useful, so they should be enabled by default. 2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11). - These are high cost to compute, so by default they should be disabled (and not calculated). + These may be high cost to compute, so by default they should be disabled (and not calculated). This default setting may change in the future if it is demonstrated that the cost is generally acceptable. The location of these measurements can be described in terms of whether the data is "consumed" or "produced", from the perspective of the component to which the telemetry is attributed. 
Metrics which contain the term "produced" describe data which is emitted from the component, From 95bafe9d3af4325cda26bbaf7dcfc320495013af Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Thu, 21 Nov 2024 10:11:52 -0500 Subject: [PATCH 15/17] Change unit "items" to "item" --- docs/rfcs/component-universal-telemetry.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 6c59e9c4585..a628ad65fd0 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -94,42 +94,42 @@ measurements will be recorded with `outcome` as `failure` when a call to the nex otelcol.receiver.produced.items: enabled: true description: Number of items emitted from the receiver. - unit: "{items}" + unit: "{item}" sum: value_type: int monotonic: true otelcol.processor.consumed.items: enabled: true description: Number of items passed to the processor. - unit: "{items}" + unit: "{item}" sum: value_type: int monotonic: true otelcol.processor.produced.items: enabled: true description: Number of items emitted from the processor. - unit: "{items}" + unit: "{item}" sum: value_type: int monotonic: true otelcol.connector.consumed.items: enabled: true description: Number of items passed to the connector. - unit: "{items}" + unit: "{item}" sum: value_type: int monotonic: true otelcol.connector.produced.items: enabled: true description: Number of items emitted from the connector. - unit: "{items}" + unit: "{item}" sum: value_type: int monotonic: true otelcol.exporter.consumed.items: enabled: true description: Number of items passed to the exporter. 
- unit: "{items}"
+ unit: "{item}"
 sum:
 value_type: int
 monotonic: true

From a7a15e56598ad34e412e8b62839e28510fcf30aa Mon Sep 17 00:00:00 2001
From: Dan Jaglowski
Date: Thu, 21 Nov 2024 10:57:20 -0500
Subject: [PATCH 16/17] Add section about instrumentation scope

---
 docs/rfcs/component-universal-telemetry.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md
index a628ad65fd0..1f4e63a70da 100644
--- a/docs/rfcs/component-universal-telemetry.md
+++ b/docs/rfcs/component-universal-telemetry.md
@@ -72,6 +72,12 @@ the mechanism. Currently, that package is `service/internal/graph`, but this may
 ascribed to individual component packages, both because the instrumentation scope is intended to describe the origin of the telemetry, and
 because no mechanism is presently identified which would allow us to determine the characteristics of a component-specific scope.
+### Instrumentation Scope
+
+All telemetry described in this RFC should include a scope name which corresponds to the package which implements the telemetry. If the
+package is internal, then the scope name should be that of the module which contains the package. For example,
+`go.opentelemetry.io/service` should be used instead of `go.opentelemetry.io/service/internal/graph`.
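The scope-name rule above (internal packages report the containing module's path) can be expressed as a tiny helper. The path-splitting heuristic here is an assumption for illustration only; a real implementation would take module boundaries from build metadata rather than string matching:

```go
package main

import (
	"fmt"
	"strings"
)

// scopeName derives the instrumentation scope name for a package: a
// non-internal package reports its own import path, while an internal
// package reports the path of the containing module, per the rule above.
func scopeName(pkgPath string) string {
	if i := strings.Index(pkgPath, "/internal/"); i >= 0 {
		return pkgPath[:i]
	}
	return strings.TrimSuffix(pkgPath, "/internal")
}

func main() {
	fmt.Println(scopeName("go.opentelemetry.io/service/internal/graph"))
	// go.opentelemetry.io/service
}
```

The derived name would then be passed when obtaining the meter, tracer, and logger used by the auto-instrumentation layer.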
+ ### Auto-Instrumented Metrics There are two straightforward measurements that can be made on any pdata: From 02584b0746777216437e5c4d153e42b4a8d94966 Mon Sep 17 00:00:00 2001 From: Dan Jaglowski Date: Thu, 21 Nov 2024 11:03:32 -0500 Subject: [PATCH 17/17] Fix markdown link check --- docs/rfcs/component-universal-telemetry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/rfcs/component-universal-telemetry.md b/docs/rfcs/component-universal-telemetry.md index 1f4e63a70da..4a721fbad1b 100644 --- a/docs/rfcs/component-universal-telemetry.md +++ b/docs/rfcs/component-universal-telemetry.md @@ -83,7 +83,7 @@ package is internal, then the scope name should be that of the module which cont There are two straightforward measurements that can be made on any pdata: 1. A count of "items" (spans, data points, or log records). These are low cost but broadly useful, so they should be enabled by default. -2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11). +2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#l11). These may be high cost to compute, so by default they should be disabled (and not calculated). This default setting may change in the future if it is demonstrated that the cost is generally acceptable. The location of these measurements can be described in terms of whether the data is "consumed" or "produced", from the perspective of the