-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle multiline processors #2063
Handle multiline processors #2063
Conversation
test integrations |
Created or updated PR in integrations repository to test this version. Check elastic/integrations#10964 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution!
In the build in integrations there is a failure that could be related: https://buildkite.com/elastic/integrations/builds/15324#0191b3b5-8906-4c82-ae84-994684452522/1582-1875
ERROR: Caused by: Line 149 is out of range in the file packages/teleport/data_stream/audit/elasticsearch/ingest_pipeline/default.yml (lines: 148)
And, is this change intended to fix #2045? If it is I think we also need to ignore empty and commented lines as mentioned in #2045 (comment).
Co-authored-by: Jaime Soriano Pastor <[email protected]>
Co-authored-by: Jaime Soriano Pastor <[email protected]>
test integrations |
Created or updated PR in integrations repository to test this version. Check elastic/integrations#11003 |
one XML is generated with additional |
This could be related to this change, right? |
Probably as the failing XML is one created by the pipeline (the one I touched), I am running the integration tests on a fresh branch from main to be sure. |
I think I found the issue. The function is counting multilines here because of the |
processors: | ||
- script: | ||
description: Drops null/empty values recursively. | ||
tag: script_drop_null_empty_values | ||
lang: painless | ||
source: "def a = b \n; def b = 2; \n" | ||
`), | ||
expected: []Processor{ | ||
{Type: "script", FirstLine: 3, LastLine: 7}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating this test example with two processors, I think it does not report the expected lines in the coverage.
For instance if this test is modified to include two processors whose definition is multiline like this
content: []byte(`---
processors:
- script:
description: Drops null/empty values recursively.
tag: script_drop_null_empty_values
lang: painless
source: "def a = b \n; def b = 2; \n"
- script:
lang: painless
source: "def a = b \n; def b = 2; \n def c = 4; \n def d = 5; \n"
- set:
field: "foo"
value: "bar"
`),
This gives this result:
[]ingest.Processor{ingest.Processor{Type:"script", FirstLine:3, LastLine:9}, ingest.Processor{Type:"script", FirstLine:8, LastLine:13}, ingest.Processor{Type:"set", FirstLine:11, LastLine:13}}
But according to that:
- the second processors starts (line 8) before the first processor finishes (line 9)
- the second processor finishes in the same line as the third processor.
Should it be the result this ?
[]ingest.Processor{ingest.Processor{Type:"script", FirstLine:3, LastLine:7}, ingest.Processor{Type:"script", FirstLine:8, LastLine:10}, ingest.Processor{Type:"script", FirstLine:11, LastLine:13}}
I think to keep the coverage reports reporting the right lines, if that scenario happens (a processors is defined as oneline with some `\n` in it), that should be reported as just one line. Since in the YAML file, it is in the same line too. If that is not the case, probably the coverage reports would set incorrect lines as covered or not covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to revise the logic for determining the end of a processor. Simply counting line breaks doesn't work well, as in cases like:
- grok:
tag: Extract header
field: message
trim_value: " \n\n"
will count, for example, 5 lines instead of 4.
One other option is to consider that a processor ends the line before the next one starts. However, I am running into issues when setting the LastLine number for the last processor. I can't just set it to the end of the file because there could be other elements afterward, like:
- rename:
field: source.as.organization_name
target_field: source.as.organization.name
ignore_missing: true
on_failure:
- set:
field: error.message
value: '{{ _ingest.on_failure_message }}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is, when using yaml.Unmarshal
, the YAML parser will interpret
"a = 1\nb = 2"
as
a = 1
b = 2
So I can't really base the logic on lines breaks
548c938
to
9d2902f
Compare
test integrations |
Created or updated PR in integrations repository to test this version. Check elastic/integrations#11032 |
test integrations |
Created or updated PR in integrations repository to test this version. Check elastic/integrations#11032 |
Finally got a successful build on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking to the changes here, I wonder if we should do #2045 first, so empty and commented lines are not included in coverage.
Co-authored-by: Mario Rodriguez Molins <[email protected]>
💚 Build Succeeded
History
cc @haetamoudi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @mrodm please review if your last comments have been addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Thanks @haetamoudi !
The
LastLine
attribute is used to determine the end of a processor block in YAML files.Currently, for YAML files such as:
The script processor is incorrectly identified with:
When it should be:
LastLine
is used to identify the end of a processor when calculating code coverage.Right now, it’s not accurately reflecting multiline processors, which can make code coverage look worse than it is. This can cause issues with PRs in the integrations repo and might block merges.
To fix this, we can consider a processor ends when the next one starts.
To identify the end of the last processor:
on_failure
node)Code coverage test results
Before the changes:
Code coverage link
the content of the
script
node is not considered as part of the processorcomments and empty lines inside the processors node are marked as
uncovered
After the changes:
Code coverage link
script
node is considered part of the processorFixes #2111