Simple processor metrics #10708

djaglowski · 2024-07-23T17:53:05Z

Is your feature request related to a problem? Please describe.

We currently define a set of metrics for processors which cannot be automatically recorded by the collector framework, and as far as I can tell are not actually recorded by any processors in core or contrib. In short, the metrics describe the number of items accepted, dropped, inserted, and refused. (I believe these metrics are loosely inspired by proposals such as open-telemetry/oteps#259, but no such design has actually landed.)

We also have proposals such as #10355 which would measure size of data according to the same categories, but again, it is not clear that there is any path to automatically recording these values. In the case of size metrics, it would additionally be very difficult for processor developers to accurately capture such measurements because it's unclear how to attribute inserted, dropped, or refused sizes as a portion of an overall payload.

Describe the solution you'd like

Looking at this from a collector implementation perspective, what we clearly can observe automatically, is the data being passed in and out of each processor. It would be difficult to determine precisely which parts of that data were inserted or deleted, but we can certainly count the number of items and measure the size of the payload (according to a standard such as ProtoMarshaler.Sizer()).

Therefore, I propose that we should define new metrics in processorhelper which are automatically recorded. If another more nuanced proposal eventually lands, we can add those later.

Importantly, in order to handle error propagation consistently, these metrics would be recorded after the processor successfully passes data to the next consumer. This can be accomplished within processorhelper by looking at data before and after the processing function, and taking into consideration the result of passing the data along to the next consumer.

Specifically, I propose the following metrics per data type. I'm showing only logs but assume there would be equivalents for metrics and traces.

Incoming and outgoing counts.

    otelcol_processor_incoming_log_records: # or processor_consumed_log_records
      enabled: true
      description: Number of log records passed to the processor.
      unit: "{records}"
      sum:
        value_type: int
        monotonic: true
    otelcol_processor_outgoing_log_records: # or processor_emitted_log_records
      enabled: true
      description: Number of log records emitted from the processor.
      unit: "{records}"
      sum:
        value_type: int
        monotonic: true

Incoming and outgoing sizes. These would be measured only if enabled.

    otelcol_processor_incoming_log_size: # or processor_consumed_log_size
      enabled: false # by default
      description: Total size of log records passed to the processor.
      unit: "By"
      sum:
        value_type: int
        monotonic: true
    otelcol_processor_outgoing_log_size: # or processor_emitted_log_size
      enabled: false # by default
      description: Total size of log records emitted from the processor.
      unit: "By"
      sum:
        value_type: int
        monotonic: true

The text was updated successfully, but these errors were encountered:

yurishkuro · 2024-07-23T19:32:48Z

Therefore, I propose that we should define new metrics in processorhelper which are automatically recorded. If another more nuanced proposal eventually lands, we can add those later.

I would caution against automatically recording size, it's not that cheap to compute and the value of such metric is lower than plain counts.

djaglowski · 2024-07-23T19:45:46Z

I would caution against automatically recording size, it's not that cheap to compute and the value of such metric is lower than plain counts.

I agree. To clarify, I would expect that size metrics are disabled by default and the cost should only be incurred if enabled.

Edit: What I mean by "automatically" here is that the individual components do not need to be concerned with recording them. They would be handled in a standard way by the collector's processorhelper package.

atoulme · 2024-07-23T21:04:36Z

I have been working on offering guidance for metrics exposed by the collector here: #10365

From that guidance, my question would be: what attributes do you want to set on those metrics, and what cardinality will they have? It looks like you have none defined in your code sample, which would make identifying the processor used difficult.

@codeboten has a RFC out for internal telemetry https://github.com/open-telemetry/opentelemetry-collector/pull/10695/files

From that RFC, should we consider to prefix metrics with otelcol_?

djaglowski · 2024-07-23T21:22:25Z

what attributes do you want to set on those metrics, and what cardinality will they have? It looks like you have none defined in your code sample, which would make identifying the processor used difficult.

Generally, I would expect that the attributes should identify the specific instance of the processor within the collector. Today, that can be established with the following info:

Component ID as specified in the config. e.g. batch or batch/1
Pipeline ID in which the processor is used.

That said, the pipeline ID may be unnecessary if we implement #10664. The reason is because if we instance processors by data type instead of pipeline, then the only identifying info in the pipeline ID is the data type itself but all proposed processor metrics are already differentiated by data type. Therefore, I do not believe we'd need anything besides the component ID, which we already have in processorhelper.ObsReport.

Thanks for pointing out the RFC - I will take a look at it and consider how this proposal might fit.

#### Description Implements ingoing and outgoing counts as described in #10708.

) #### Description Implements ingoing and outgoing counts as described in open-telemetry#10708.

djaglowski · 2024-09-11T15:34:07Z

Discussing with @codeboten and @mx-psi, we think it would make sense to add an outcome dimension to the outgoing metrics. In the short term, this would have two possible values, e.g. success of failure, purely based on whether an error is returned by the next consumer. Longer term, we may add additional granularity to failures based on the types of errors returned by the next consumer. In any case, it's important that all details are captured automatically so that we consistency. Someone may also propose additional metrics at a later time which are manually recorded by component authors.

danelson · 2024-09-27T13:26:30Z

Not sure this is the right place to ask this but is there a plan to provide parity with refused/dropped as it relates to the memory limiter processor? I might be missing something but I don't think I can reason about back pressure and data loss in the same way I was able to previously.

djaglowski · 2024-10-03T14:30:11Z

Not sure this is the right place to ask this but is there a plan to provide parity with refused/dropped as it relates to the memory limiter processor? I might be missing something but I don't think I can reason about back pressure and data loss in the same way I was able to previously.

It probably requires a separate proposal but I doubt there would be resistance to the idea. This proposal isn't arguing against that. I am just advocating for doing what can be done for all processors so that we have some uniform metrics.

Generally speaking, we should add useful metrics to individual component, so I think you could propose to re-add refused/dropped. I also think refused/dropped may be universally relevant, even if theycannot be automatically recorded. Therefore, perhaps the end state is that we formalize the semantics of those metrics, but leave implementation to component authors. Personally, I think we need to get the basics in place before there is bandwidth to deal with the formal semantics, but having refused/dropped where they already were seems reasonable to me.

djaglowski · 2024-10-09T13:21:29Z

Closing in favor of #11343

djaglowski added this to the Self observability milestone Jul 23, 2024

djaglowski mentioned this issue Jul 24, 2024

Document collector's internal telemetry #10695

Merged

djaglowski mentioned this issue Aug 12, 2024

[processor/logdedup] Add aggregated log record histogram metric open-telemetry/opentelemetry-collector-contrib#34581

Merged

This was referenced Aug 19, 2024

Processor telemetry should include Pipeline ID as attribute #10908

Closed

Add ingoing and outgoing counts to processorhelper #10910

Merged

codeboten pushed a commit that referenced this issue Sep 9, 2024

Add ingoing and outgoing counts to processorhelper (#10910)

fa34718

#### Description Implements ingoing and outgoing counts as described in #10708.

dmathieu pushed a commit to dmathieu/opentelemetry-collector that referenced this issue Sep 10, 2024

Add ingoing and outgoing counts to processorhelper (open-telemetry#10910

5480910

) #### Description Implements ingoing and outgoing counts as described in open-telemetry#10708.

This was referenced Sep 10, 2024

Revert "[processorhelper] deprecating unused methods" #11114

Closed

[processorhelper] deprecating unused methods #11083

Merged

djaglowski mentioned this issue Oct 3, 2024

Auto-instrumented pipeline components #11343

Closed

djaglowski closed this as completed Oct 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simple processor metrics #10708

Simple processor metrics #10708

djaglowski commented Jul 23, 2024 •

edited

Loading

yurishkuro commented Jul 23, 2024

djaglowski commented Jul 23, 2024 •

edited

Loading

atoulme commented Jul 23, 2024

djaglowski commented Jul 23, 2024

djaglowski commented Sep 11, 2024

danelson commented Sep 27, 2024

djaglowski commented Oct 3, 2024

djaglowski commented Oct 9, 2024

Simple processor metrics #10708

Simple processor metrics #10708

Comments

djaglowski commented Jul 23, 2024 • edited Loading

yurishkuro commented Jul 23, 2024

djaglowski commented Jul 23, 2024 • edited Loading

atoulme commented Jul 23, 2024

djaglowski commented Jul 23, 2024

djaglowski commented Sep 11, 2024

danelson commented Sep 27, 2024

djaglowski commented Oct 3, 2024

djaglowski commented Oct 9, 2024

djaglowski commented Jul 23, 2024 •

edited

Loading

djaglowski commented Jul 23, 2024 •

edited

Loading