Add a fallback merge strategy #21928

Open
john-from-corelight opened this issue Dec 2, 2024 · 1 comment
Labels
transform: reduce (Anything `reduce` transform related)
type: feature (A value-adding code addition that introduce new functionality.)

Comments


john-from-corelight commented Dec 2, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

As a Vector user with many log schemas, some of them dynamic, I rely on merge_strategies in the reduce transform. The default merge strategies work well when the log schema is strictly defined. However, when events carry additional fields that are not listed in merge_strategies, I would like to be able to set the fallback merge strategy per reduce transform. This would deviate from the existing pattern, in which a default is applied per data type.

Attempted Solutions

The only solution I have right now to keep unwanted data from sneaking through a reduce transform is to follow it with a "remap" transform that drops every field not on an "allow" list.

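# Build a new event that contains only the allow-listed fields.
new_event = {}
allow_list = ["field_1", "field_2"]
for_each(allow_list) -> |_index, value| {
    # `get` is fallible, so capture the error instead of aborting.
    # Missing fields come back as null and are skipped.
    val, err = get(., [value])
    if !is_null(val) {
        new_event = set!(new_event, [value], val)
    }
}
# Replace the event with the filtered copy.
. = new_event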

Proposal

The following config snippet is an example:

[transforms.my_reduce]
type = "reduce"
inputs = ["my_input"]
merge_strategies.field_1 = "discard"
merge_strategies.field_2 = "sum"
# New field
merge_strategies.default = "discard"

merge_strategies.default would be the new option. It lets the user define the merge strategy for any field not explicitly listed. I have seen cases where the per-type default merge strategy produces undesired behavior, for example summing port numbers because they are integers.

I would also like merge_strategies.default to accept a "drop" value. This would drop any field not explicitly listed in merge_strategies, giving tighter control over the event being produced.
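To make the intended semantics concrete, here is a rough Python model of the proposal. It is only a sketch, not Vector's implementation: `reduce_events`, `type_default`, and the `dest_port` field are hypothetical names, only the "sum" and "discard" strategies are modeled, the per-type defaults are simplified (numbers are summed, everything else keeps the first value; timestamp handling is omitted), and both the `default` fallback and the "drop" value are the proposed behavior rather than existing options.

```python
from typing import Any, Dict, List, Optional

DROP = "drop"  # proposed value, not an existing Vector merge strategy


def type_default(value: Any) -> str:
    """Simplified model of today's per-type defaults (assumption)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "sum"      # numeric values are summed
    return "discard"      # otherwise keep the first value seen


def reduce_events(
    events: List[Dict[str, Any]],
    strategies: Dict[str, str],
    default: Optional[str] = None,
) -> Dict[str, Any]:
    """Merge events field by field.

    Strategy lookup order: explicit per-field strategy, then the proposed
    `default` fallback, then the per-type default.
    """
    merged: Dict[str, Any] = {}
    for event in events:
        for field, value in event.items():
            strategy = strategies.get(field) or default or type_default(value)
            if strategy == DROP:
                continue                  # field never reaches the output
            if field not in merged:
                merged[field] = value     # first value seen for this field
            elif strategy == "sum":
                merged[field] += value
            elif strategy == "discard":
                pass                      # keep the first value
            else:
                raise ValueError(f"unsupported strategy in this sketch: {strategy}")
    return merged


events = [
    {"field_1": "a", "field_2": 1, "dest_port": 443},
    {"field_1": "b", "field_2": 2, "dest_port": 443},
]
strategies = {"field_1": "discard", "field_2": "sum"}

per_type = reduce_events(events, strategies)                   # dest_port is summed
keep_first = reduce_events(events, strategies, default="discard")
dropped = reduce_events(events, strategies, default="drop")    # dest_port is removed
```

With no fallback, the unlisted `dest_port` field is summed like any other integer. With `default="discard"` it keeps its first value, and with `default="drop"` it is removed from the reduced event entirely.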

References

No response

Version

0.41.1

@john-from-corelight john-from-corelight added the type: feature A value-adding code addition that introduce new functionality. label Dec 2, 2024
@jszwedko jszwedko added the transform: reduce Anything `reduce` transform related label Dec 2, 2024
jszwedko (Member) commented Dec 2, 2024

This seems like a useful enhancement, thanks for the detailed request @john-from-corelight!
