Amend Pipeline Component Telemetry RFC to add a "rejected" outcome #11956
+14
−5
Context
The Pipeline Component Telemetry RFC was recently accepted (#11406). The document states the following regarding error monitoring:
The observability requirements for stable pipeline components were also recently added (#11772). The document states the following regarding error monitoring:
Because errors are typically propagated across ConsumeX calls in a pipeline (except for components with an internal queue, like processor/batch), the error observability mechanism proposed by the RFC implies that Pipeline Telemetry will record failures for every component interface upstream of the component that actually emitted the error. This does not match the goals set out in the observability requirements, and makes it much harder to tell from the emitted telemetry which component the errors are coming from.
Description
This PR amends the Pipeline Component Telemetry RFC with the following:
- restrict the outcome=failure value to cases where the error comes from the very next component (the component on which ConsumeX was called);
- add a new possible value for the outcome attribute: rejected, for cases where an error observed at an interface comes from further downstream (the component did not "fail", but its output was "rejected");
- wrap errors that have already been attributed in a downstream struct, which upstream layers could check for with errors.As to know the error has already been "assigned" to a component. This is the same mechanism currently used for tracking permanent vs. retryable errors.

The proposed naming convention and mechanism are up for debate.