Missed events in a short-lived container running as a CronJob #2013
Comments
Could you try to disable the K8s metadata support by removing Thanks in advance. |
Thanks for getting back so fast. I followed your advice and disabled We are curious, when falco fails to enrich metadata and doesn't find the pod or the container, is it silently dropping the events? Can we log the failed state or be aware of it, please? Thank you! |
I had a suspect, and you confirmed it :)
Falco should not drop in this case. When some metadata is missing, the expected behavior is to emit the alert without the missing metadata. The cause of the wrong behavior may be either:
OR
I guess the first. Let me know if you can try. Anyway, whatever the case, I think we have to address this issue. /cc @jasondellaluce |
Yep, I agree with Leo: I'd try with threshold: 0 to see if the event is actually dropped. |
I have tried with the following setup/changes. I have enabled back
I left the cluster to run for more than 15 minutes (with CronJobs executing every 3 minutes). The last log events from falco remained:
Please let me know if I should look somewhere else or apply any other changes. Thank you! |
Hey @cpungasc If there was no event drop, it is likely a bug. We need to investigate more. Thank you for your detailed report 🙏 Out of curiosity: why is the Falco version redacted? Have you modified Falco or the driver? |
Hi! |
Hey @leogr Sure, let me know if I can help in any other way. |
Hi @FedeDP Yes, here are the relevant rules:
Hope this helps. |
Hi! A bit of context: both container (docker/podman/cri) and k8s metadata are collected in separate threads and their values is In the following output:
(ie: the test run with |
Ran the test once more. The output was:
|
Ok, so we got some event drops.
@leogr any thought? |
I guess that k8s support is blocking the main thread for a short amount of time, OR it lets the Falco process go over 100% CPU usage so that the kernel immediately switches to another process. Either situation can make Falco drop events. We definitely need to investigate. |
Issues go stale after 90d of inactivity. Mark the issue as fresh with Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle stale |
/remove-lifecycle stale |
cc @falcosecurity/falco-maintainers any updates? |
I'd tag @deepskyblue86, who did lots of research and fixes on short-lived containers, right? (Thank you!) |
Actually I didn't experience any drop so far, just dealing with container creation and container metadata lookup in libs. |
Stale issues rot after 30d of inactivity. Mark the issue as fresh with Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue with Mark the issue as fresh with Provide feedback via https://github.com/falcosecurity/community. |
@poiana: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@abroglesc: Reopened this issue. In response to this:
Working on that here #2973 |
Rotten issues close after 30d of inactivity. Reopen the issue with Mark the issue as fresh with Provide feedback via https://github.com/falcosecurity/community. |
@poiana: Closing this issue. In response to this:
/remove-lifecycle rotten |
The new |
This should be solved by Falco 0.37.0! Feel free to reopen if this is still an issue |
Describe the bug
Falco didn't capture event(s) in a container running as a CronJob.
How to reproduce it
Create a shell script similar to test.sh:
Use a CronJob to execute test.sh:
Expected behaviour
The rule set in use is correct. This was confirmed by observing the same type of events for containers that run for a longer period of time than the container used to run the test case (in test.sh).
Observe (at least):
Environment
Cloud provider or hardware configuration:
GKE
OS:
Kubernetes; part of a daemon set running in a non-privileged container.
Additional context
Introducing a delay (e.g. sleep 10) before touch "/etc/test.txt" eliminates the issue.
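For readers who want to try the delay workaround, here is a minimal sketch of how it could be scripted. It is a hypothetical stand-in, not the reporter's actual test.sh, which is not reproduced in this thread. The function name run_test and its two parameters are assumptions made for illustration; only the sleep 10 delay and the /etc/test.txt path come from the report.

```shell
#!/bin/sh
# Hypothetical stand-in for test.sh; the reporter's real script is not shown in this thread.
# run_test is an illustrative name, not part of Falco or of the original report.
run_test() {
    delay_seconds="${1:-10}"           # delay before the write; 10 mirrors the "sleep 10" workaround
    target_file="${2:-/etc/test.txt}"  # file path quoted in the report

    # Keep the container alive for a few seconds before the monitored write.
    sleep "$delay_seconds"

    # The write under /etc that the rule set is expected to flag.
    touch "$target_file"
}
```

Calling run_test with no arguments sleeps for 10 seconds and then touches /etc/test.txt, while run_test 0 approximates the failing case with no delay. Writing under /etc typically needs root inside the container, so a different target path can be passed as the second argument when experimenting locally.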