Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/transform] Add drop function to the transform processor #16297

Closed
wants to merge 9 commits into from

Conversation

TylerHelmuth
Copy link
Member

@TylerHelmuth TylerHelmuth commented Nov 14, 2022

Description:
This PR adds a drop function to the transform processor.

Removing items from lists is a special type of transformation that the OTTL cannot handle itself via a function. This is because OTTL functions operate on a specific item in a context but drop requires access to the slice that the item resides in. For this reason, we move the responsibility of dropping the item to the transform processor itself (utilizing pdata methods).

This PR makes the following assumptions:

  • drop is the only OTTL function with a return value that the transform processor cares about. This assumption may change in the future, but until there is a second identified scenario, this assumption helps keep executeStatements simple.
  • If a executeStatements returns an error that means that means we are done executing ContextStatements for the current transform context. If a drop function appeared lower in the ContextStatements list it will be ignored since it will not be executed. Statements executed before the err will still take affect.
  • If a Metric ever has an empty DataPoints slice, the metric will be dropped. If a Scope ever has an empty Span/Metric/Log slice it will be dropped. If a Resource ever has an empty Scope slice it will be dropped. An empty SpanEvents slice will not cause a Span to be dropped.

Depends on #16251

Link to tracking Issue:
Closes #7151
Closes #13578
Closes #13579
Closes #13580
Closes #13581
Closes #5679

Testing:
Added unit tests

Documentation:
Added drop to the README

@github-actions github-actions bot added the processor/transform Transform processor label Nov 14, 2022
@TylerHelmuth TylerHelmuth requested a review from kovrus November 14, 2022 19:58
@TylerHelmuth TylerHelmuth force-pushed the tp-drop branch 3 times, most recently from e20bc7a to e3ba42f Compare November 15, 2022 19:18
@TylerHelmuth TylerHelmuth marked this pull request as ready for review November 15, 2022 19:18
@TylerHelmuth TylerHelmuth requested a review from a team November 15, 2022 19:18
Copy link
Member

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, I've added a couple of non-blocking queries.

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It's too bad this can't be part of OTTL. It seems likely to be representative of class of transformations. I am right in thinking that in order to allow this in OTTL, we'd need the ability to drill down from a context, or contexts specifically at the various slice levels?

@TylerHelmuth
Copy link
Member Author

It's too bad this can't be part of OTTL. It seems likely to be representative of class of transformations. I am right in thinking that in order to allow this in OTTL, we'd need the ability to drill down from a context, or contexts specifically at the various slice levels?

@djaglowski Yes that is correct. I think it will be possible to move the transformprocessor's concept of ContextStatements into the OTTL, which would allow drop to live there too, but drop will always need to rely on the implementor to do the dropping. If ContextStatements lives in OTTL then at least it is still the OTTL doing it even if the ottlfunc cannot.

We're starting with the ContextStatement concept in the transform processor since it isn't a public api and therefore will be easier to modify the ContextStatement API as we iterate through the concept.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few minor requests, but overall this is solid. Great to see this finally land.

processor/transformprocessor/README.md Outdated Show resolved Hide resolved
processor/transformprocessor/README.md Outdated Show resolved Hide resolved
Copy link
Member

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with me 👍🏻

@TylerHelmuth TylerHelmuth added the ready to merge Code review completed; ready to merge by maintainers label Nov 18, 2022
Copy link
Member

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy to have this land.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this functionality for sure. I have another idea though, can we have a separate processor for this, since it is very dangerous functionality in general?

My preference would be if we can adopt the conditions from ottl in the current filter processor and have the current "filter" processor for this. A second best (maybe easier to implement) we can deprecate filter processor, and create a "dropprocessor".

@bogdandrutu
Copy link
Member

It's too bad this can't be part of OTTL. It seems likely to be representative of class of transformations. I am right in thinking that in order to allow this in OTTL, we'd need the ability to drill down from a context, or contexts specifically at the various slice levels?

Kind of disagree, since "dropping" is not a transformation of the telemetry itself :) Also not sure if we need a "re-use" for this functionality.

@cartermp
Copy link
Contributor

@bogdandrutu re this:

since it is very dangerous functionality in general?

It can be, yes, but stepping back a bit .. what are the goals of OTTL in the first place, and does this fit or not fit with those goals? My understanding is that there's a general desire to cut down on the number of bespoke processors so that there's a standard way to specify any number of different kinds of transformations on data in transit. Is dropping certain things not within the purview? If not then maybe it'd be helpful to clarify what is. It's a ton of work to go through and implement this in OTTL only to have to rip it out and try to fit it into another processor.

@bogdandrutu
Copy link
Member

If not then maybe it'd be helpful to clarify what is. It's a ton of work to go through and implement this in OTTL only to have to rip it out and try to fit it into another processor.

I want to reuse everything in OTTL (conditions, etc.) I discussed with @TylerHelmuth, don't believe should be that much work... I may miss something.

@bogdandrutu
Copy link
Member

My understanding is that there's a general desire to cut down on the number of bespoke processors so that there's a standard way to specify any number of different kinds of transformations on data in transit. Is dropping certain things not within the purview?

Dropping out of all the functionality is the most dangerous, and the one that I want people to have lots of attention when they use it. Because of that I believe it deserves a different treatment :)

@TylerHelmuth
Copy link
Member Author

Here is the start of a PR to see what this is like in the filterprocessor: #16369. I still believe the drop functionality should live in the transform processor (where it can only be used if a statement explicitly uses it), but if it were to live in the filter processor this is how it would look.

Things to consider about implementing ottl in the filterprocessor:

  1. An implementation of the OTTL in the filter processor should be conditions only; the filter processor should not support regular transformation of data. If a user needs to transform their data prior to processing, such as parsing json from logs or attributes, that should be done in the transform processor. In these situations the collector configs will include a transformprocessor that preprocesses the data and a filter processor that drops it. The filterprocessor configuration will depend on the output of the filterprocessor. I don't love having to "remember the state of my data" between components.
  2. If users want to transform remaining data after processing they will need to use the transform processor. Based on considation 1, this will mean pipelines like [transformprocessor/prep-for-drop, filterprocessor, transformprocessor] in certain situations.
  3. The filterprocessor will be forced to execute conditions from "top" to "bottom". We need to make sure to explain the implications of this to users so they understand what is happening when they define multiple conditions for the same signal (resource and span and spanevent filters).
  4. The OTTL does not fit into the processors existing configuration/processing. A new configuration and processing logic will need to be added and the old stuff deprecated and then removed in a future PR.

@bogdandrutu
Copy link
Member

An implementation of the OTTL in the filter processor should be conditions only; the filter processor should not support regular transformation of data. If a user needs to transform their data prior to processing, such as parsing json from logs or attributes, that should be done in the transform processor. In these situations the collector configs will include a transformprocessor that preprocesses the data and a filter processor that drops it. The filterprocessor configuration will depend on the output of the filterprocessor. I don't love having to "remember the state of my data" between components.

You do have now to remember that body is JSON, so you need to keep the state between services. I don't understand this, and also as mentioned, I don't think the right design is to have one processor that does everything, will get too messy for configuration and code. Now you changed the iteration to use "RemoveIf", then in the future if we need something else like sorting how would you do?

If users want to transform remaining data after processing they will need to use the transform processor. Based on considation 1, this will mean pipelines like [transformprocessor/prep-for-drop, filterprocessor, transformprocessor] in certain situations.

I still believe this is an edge case.

The filterprocessor will be forced to execute conditions from "top" to "bottom". We need to make sure to explain the implications of this to users so they understand what is happening when they define multiple conditions for the same signal (resource and span and spanevent filters).

I don't thing "dropping" resource makes sense at all. Since a "Resource" is not a telemetry.

The OTTL does not fit into the processors existing configuration/processing. A new configuration and processing logic will need to be added and the old stuff deprecated and then removed in a future PR.

That needs to happen anyway since we don't want duplicate code. So you want to just add the "fancy" new way, but we still need to do cleanups as maintainers of this codebase.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a clear path to remove filter processor, this new functionality should not be added to not confuse users.

@cartermp
Copy link
Contributor

Hmmm. I'd find it very confusing to be able to only get access to a subset of OTTL in the transform processor, but then access to an additional kind of functionality in a different one. That may be "safer", but this kind of non-orthogonality of the language introduces a new cognitive cost for end-users.

(this is also getting into the mushy-gushy world of language design, where things are just as touchy-feely and philosophical as they are technical)

@TylerHelmuth TylerHelmuth removed the ready to merge Code review completed; ready to merge by maintainers label Nov 21, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Dec 6, 2022

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 6, 2022
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/transform Transform processor Stale
Projects
None yet
9 participants