Amend Pipeline Component Telemetry RFC to add a "rejected" outcome #11956

jade-guiton-dd · 2024-12-18T14:36:28Z

Context

The Pipeline Component Telemetry RFC was recently accepted (#11406). The document states the following regarding error monitoring:

For both [consumed and produced] metrics, an outcome attribute with possible values success and failure should be automatically recorded, corresponding to whether or not the corresponding function call returned an error. Specifically, consumed measurements will be recorded with outcome as failure when a call from the previous component to the ConsumeX function returns an error, and success otherwise. Likewise, produced measurements will be recorded with outcome as failure when a call to the next consumer's ConsumeX function returns an error, and success otherwise.

The observability requirements for stable pipeline components were also recently added (#11772). The document states the following regarding error monitoring:

The goal is to be able to easily pinpoint the source of data loss in the Collector pipeline, so this should either:

only include errors internal to the component, or;

allow distinguishing said errors from ones originating in an external service, or propagated from downstream Collector components.

Because errors are typically propagated across ConsumeX calls in a pipeline (except for components with an internal queue like processor/batch), the error observability mechanism proposed by the RFC implies that Pipeline Telemetry will record failures for every component interface upstream of the component that actually emitted the error. This does not match the goals set out in the observability requirements, and makes it much harder to tell from the emitted telemetry which component errors are coming from.
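To make the problem concrete, here is a minimal sketch of the behavior described above. It is not the Collector's actual instrumentation code: `consumeFunc`, `observe`, and `runNaive` are hypothetical names, and a plain map stands in for the recorded metrics. Only the exporter fails, yet both instrumented interfaces end up with outcome=failure.

```go
package main

import (
	"errors"
	"fmt"
)

// consumeFunc is a hypothetical stand-in for a component's ConsumeX method.
type consumeFunc func() error

// observe wraps next and records outcome=failure whenever the call returns
// an error, regardless of where in the pipeline the error originated.
func observe(name string, next consumeFunc, outcomes map[string]string) consumeFunc {
	return func() error {
		err := next()
		if err != nil {
			outcomes[name] = "failure"
		} else {
			outcomes[name] = "success"
		}
		return err
	}
}

// runNaive builds processor -> exporter. Only the exporter fails; the
// processor propagates the error unchanged.
func runNaive() map[string]string {
	outcomes := map[string]string{}
	exporter := observe("exporter", func() error {
		return errors.New("backend unavailable")
	}, outcomes)
	processor := observe("processor", func() error {
		return exporter() // propagate the downstream error as-is
	}, outcomes)
	_ = processor()
	return outcomes
}

func main() {
	outcomes := runNaive()
	// Both interfaces report a failure, so the telemetry alone cannot
	// tell which component emitted the error.
	fmt.Println(outcomes["processor"], outcomes["exporter"]) // prints "failure failure"
}
```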

Description

This PR amends the Pipeline Component Telemetry RFC with the following:

- restrict the outcome=failure value to cases where the error comes from the very next component (the component on which ConsumeX was called);
- add a third possible value for the outcome attribute: rejected, for cases where an error observed at an interface comes from further downstream (the component did not "fail", but its output was "rejected");
- propose a mechanism to determine which of the two values should be used:
  - The current proposal is for the pipeline instrumentation layer to wrap errors in an unexported downstream struct, which upstream layers could check for with errors.As to know that the error has already been "assigned" to a component. This is the same mechanism currently used for tracking permanent vs. retryable errors.
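The proposed mechanism could be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the type name `downstreamError` and the helpers `instrument` and `runDemo` are hypothetical, and a map stands in for the recorded outcome attribute. The first instrumentation layer to see an unmarked error records failure and wraps it; every layer further upstream finds the marker with `errors.As` and records rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// downstreamError (hypothetical name) marks an error as already attributed
// to a component further down the pipeline.
type downstreamError struct {
	inner error
}

func (e downstreamError) Error() string { return e.inner.Error() }
func (e downstreamError) Unwrap() error { return e.inner }

// consumeFunc is a hypothetical stand-in for a component's ConsumeX method.
type consumeFunc func() error

// instrument wraps the ConsumeX call made on the named component and
// records one of: success, failure, rejected.
func instrument(name string, next consumeFunc, outcomes map[string]string) consumeFunc {
	return func() error {
		err := next()
		if err == nil {
			outcomes[name] = "success"
			return nil
		}
		var ds downstreamError
		if errors.As(err, &ds) {
			// Already attributed further downstream: this component did
			// not fail, its output was rejected.
			outcomes[name] = "rejected"
			return err
		}
		// Unmarked error: it originated in this component. Mark it so
		// that upstream layers do not count it as their own failure.
		outcomes[name] = "failure"
		return downstreamError{inner: err}
	}
}

// runDemo builds processor -> exporter, where only the exporter fails.
func runDemo() (map[string]string, error) {
	outcomes := map[string]string{}
	exporter := instrument("exporter", func() error {
		return errors.New("backend unavailable")
	}, outcomes)
	processor := instrument("processor", func() error {
		return exporter() // propagate the downstream error as-is
	}, outcomes)
	err := processor()
	return outcomes, err
}

func main() {
	outcomes, err := runDemo()
	fmt.Println(outcomes["exporter"], outcomes["processor"], err)
	// prints "failure rejected backend unavailable"
}
```

Because the wrapper implements `Unwrap`, callers can still reach the original error with `errors.Is` or `errors.As`, so the marker should not interfere with other wrapped-error checks such as permanent vs. retryable.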

The proposed naming convention and mechanism are up for debate.

codecov · 2024-12-18T15:01:10Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.62%. Comparing base (be406d3) to head (9b2707a).

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #11956   +/-   ##
=======================================
  Coverage   91.62%   91.62%           
=======================================
  Files         447      447           
  Lines       23729    23729           
=======================================
  Hits        21741    21741           
  Misses       1613     1613           
  Partials      375      375


Updated Pipeline Instrumentation RFC to add a "rejected" outcome

9b2707a

jade-guiton-dd added the "Skip Changelog" (PRs that do not require a CHANGELOG.md entry) and "Skip Contrib Tests" labels Dec 18, 2024

jade-guiton-dd requested a review from djaglowski December 18, 2024 14:37
