kubernetes_events input seems to create high cpu usage #9787
Comments
I have been experiencing the same behaviour. It goes away when you do not use the db for checkpointing. However, as I understand it, that may lead to some data loss. A proper solution from the maintainers for this issue would be much appreciated.
This is indeed a workaround, but one I would rather not have to use. Thanks for sharing it though, I wasn't aware that it was caused by the db.
We are encountering the same issue and analyzed it by looking into the sqlite db. We saw that the entries are not deleted by the cleanup code. The CPU usage seems to be high because there is no index on the uid column, so the duplicate check (run for each processed event) takes really long once the database grows. I guess that the bug is located in the calculation of the retention_time_ago:
The stored timestamp has nanosecond precision, but in line 652 the timestamp gets divided by 1 billion. Instead, the retention time should be multiplied by 1 billion in line 658, as it is done here:
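The two effects described in this comment (a cleanup cutoff in the wrong unit, and a uid lookup without an index) can be illustrated with a small, self-contained sketch. This is not fluent-bit's code or schema: the table name `events`, its columns, the index name, and all function names are hypothetical, and the "mixed units" variant only assumes that the cutoff is computed in seconds while the stored timestamps are in nanoseconds.

```python
import sqlite3

NS_PER_S = 1_000_000_000


def make_db(now_ns, ages_s):
    """Create an in-memory table with one row per age (seconds before now).

    Hypothetical schema for illustration only; `created` is stored in nanoseconds.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, uid TEXT, created INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events (uid, created) VALUES (?, ?)",
        [(f"uid-{i}", now_ns - age * NS_PER_S) for i, age in enumerate(ages_s)],
    )
    return conn


def cleanup_mixed_units(conn, now_ns, retention_s):
    """Cutoff in seconds compared against nanosecond timestamps (assumed bug)."""
    cutoff = now_ns // NS_PER_S - retention_s
    cur = conn.execute("DELETE FROM events WHERE created <= ?", (cutoff,))
    return cur.rowcount


def cleanup_same_units(conn, now_ns, retention_s):
    """Cutoff in nanoseconds: the retention time is multiplied by 1 billion."""
    cutoff = now_ns - retention_s * NS_PER_S
    cur = conn.execute("DELETE FROM events WHERE created <= ?", (cutoff,))
    return cur.rowcount


def lookup_plan(conn):
    """Return SQLite's query plan text for a duplicate check by uid."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM events WHERE uid = ?", ("uid-0",)
    ).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the plan detail


if __name__ == "__main__":
    now_ns = 1_700_000_000 * NS_PER_S
    conn = make_db(now_ns, [10, 7200, 86400])  # 10 s, 2 h and 1 day old

    print("deleted (mixed units):", cleanup_mixed_units(conn, now_ns, 3600))
    print("deleted (same units): ", cleanup_same_units(conn, now_ns, 3600))

    print("plan without index:", lookup_plan(conn))
    conn.execute("CREATE INDEX idx_events_uid ON events (uid)")
    print("plan with index:   ", lookup_plan(conn))
```

With the mixed-unit cutoff, every stored timestamp is far larger than the cutoff, so nothing is ever deleted and the table keeps growing. Without an index on uid, each duplicate check scans the whole table, which would match the CPU usage rising as the database grows.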
Nice find @cm-rudolph! Can you create a PR so the fluent team becomes aware of this issue (and its fix)?
Bug Report
Describe the bug
I have deployed a fluent-bit via a Deployment whose only job is to gather kubernetes_events and output them somewhere. This fluent-bit seems to have an issue where, over a timespan of a few minutes to sometimes multiple hours, the CPU usage goes to 1 (100% of one core).
The deployment only has a request of >1, no limit set, and the node has a lot of spare cpu capacity (32 core system).
My other fluent-bits which are gathering logs and outputting to the same output do not seem to have this issue.
There are no custom parsers in custom_parsers.conf.
I do use the helm chart of fluent-bit with these values:
I can also see on the node that it is fluent-bit itself causing the cpu usage and not the config watcher or hot-reload mechanism:
To Reproduce
fluent-bit-k8s-events-sync
first)
Expected behavior
CPU usage should correlate with the number of events produced
Screenshots
Your Environment
fluent-bit-k8s-events-sync
that can be used to create the db sync
Additional context
It seems that fluent-bit is still processing events and writing them to the output, but I haven't checked whether they are complete.
I see this behavior across all our clusters, except those where the output is running inside the same cluster (the output's hostname is an internal kubernetes service in this case).