[pkg/stanza/fileconsumer] Emit logs in batches #35455
andrzej-stencel added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Sep 27, 2024
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This was referenced Sep 27, 2024
Removing |
andrzej-stencel
added a commit
to andrzej-stencel/opentelemetry-collector-contrib
that referenced
this issue
Oct 18, 2024
This refactors the `Reader::ReadToEnd` method by separating reading the file's header from reading the file's contents. This results in very similar code in the `readHeader` and `readContents` methods, which was previously deduplicated at the cost of slightly higher complexity. The bug could be fixed without separating header reading from contents reading, but I hope this separation will make it easier to implement content batching in the Reader (open-telemetry#35455). Content batching was my original motivation for these code changes. I only discovered the problem with record counting when reading the code.
The pull request #35870 is a first step: it refactors the code of the Reader to make the introduction of batching easier.
djaglowski
pushed a commit
that referenced
this issue
Oct 28, 2024
#### Description

Fixes #35869 by refactoring the `Reader::ReadToEnd` method.

This refactors the `Reader::ReadToEnd` method by separating reading the file's header from reading the file's contents. This results in very similar code in the `readHeader` and `readContents` methods, which was previously deduplicated at the cost of slightly higher complexity. The bug could be fixed without separating header reading from contents reading, but I hope this separation will make it easier to implement content batching in the Reader (#35455). Content batching was my original motivation for these code changes. I only discovered the problem with record counting when reading the code.

#### Link to tracking issue

Fixes #35869

#### Testing

In the first commit I have added tests that document the erroneous behavior. In the second commit I have fixed the bug and corrected the tests.

#### Documentation

Added changelog entry.
This was referenced Nov 8, 2024
The PR that actually introduces batching: |
Component(s)
pkg/stanza/fileconsumer
Is your feature request related to a problem? Please describe.
This issue is created as a result of discussion in #31074.
The Stanza adapter's LogEmitter has a 100-log buffer that is a source of data loss during non-graceful collector shutdown. One solution is to remove this buffer, but doing so would have a severe performance impact (see #35454). In the case of the File Log receiver, this impact could be alleviated by implementing batching earlier in the Stanza pipeline - in the File consumer.
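To make the idea of batching in the File consumer more concrete, here is a minimal sketch in Go. It is not the actual `fileconsumer` code: the names `emitFunc`, `readInBatches` and `collectBatches` are hypothetical, and `bufio.Scanner` line splitting stands in for the real tokenization. It only illustrates the general shape of a reader that hands tokens downstream one batch at a time instead of one call per log line.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// emitFunc is a hypothetical callback that receives a whole batch of tokens.
// It is an assumption for this sketch, not the real fileconsumer emit signature.
type emitFunc func(tokens []string)

// readInBatches scans r line by line and calls emit once per batch of up to
// maxBatchSize lines, instead of once per line.
func readInBatches(r io.Reader, maxBatchSize int, emit emitFunc) error {
	if maxBatchSize < 1 {
		return fmt.Errorf("maxBatchSize must be positive, got %d", maxBatchSize)
	}
	scanner := bufio.NewScanner(r)
	batch := make([]string, 0, maxBatchSize)
	for scanner.Scan() {
		batch = append(batch, scanner.Text())
		if len(batch) == maxBatchSize {
			emit(batch)
			// Allocate a fresh slice so the receiver may safely keep the old one.
			batch = make([]string, 0, maxBatchSize)
		}
	}
	// Flush whatever is left when the end of the input is reached.
	if len(batch) > 0 {
		emit(batch)
	}
	return scanner.Err()
}

// collectBatches is a small helper for demonstration: it reads input and
// returns every emitted batch.
func collectBatches(input string, maxBatchSize int) ([][]string, error) {
	var batches [][]string
	err := readInBatches(strings.NewReader(input), maxBatchSize, func(tokens []string) {
		batches = append(batches, tokens)
	})
	return batches, err
}

func main() {
	batches, err := collectBatches("a\nb\nc\nd\ne\n", 2)
	if err != nil {
		panic(err)
	}
	for i, b := range batches {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```

With five input lines and a batch size of two, the callback is invoked three times (two full batches and one partial batch) rather than five times. Flushing the partial batch at the end of the input means no log waits in the reader after it reaches the end of the file, which matters for the data-loss concern described above.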
Describe the solution you'd like
From #31074 (comment):
and further in #31074 (comment):
Describe alternatives you've considered
Additional context
See #31074 (comment) and the following comments.