Falco 0.38.0 - k8s specific fields are not populated any more #3243

Open
networkhell opened this issue Jun 10, 2024 · 28 comments · Fixed by falcosecurity/libs#1907

@networkhell

After upgrading to Falco 0.38.0, some k8s-specific fields are no longer populated, e.g. k8s.ns.name and k8s.pod.name.

The environment is k8s 1.28.6 with the following runtime components:

  • docker-ce 24.0.2
  • containerd 1.6.21
  • cri-dockerd 0.3.4

Deploy Falco 0.38.0 via the manifests with the default config. Trigger any alert whose output contains k8s-specific fields, e.g. spawn a shell in a container.

When a rule is triggered, I expect the relevant fields to be populated from the container runtime, but the k8s.* fields are missing after the upgrade to 0.38.0:

14:37:02.679088469: Notice A shell was spawned in a container with an attached terminal (evt_type=execve user=root user_uid=1000 user_loginuid=-1 process=bash proc_exepath=/usr/bin/bash parent=runc command=bash terminal=34816 exe_flags=0 container_id=ce69f7e51afe container_image=harbor.***/hub.docker.com-proxy/library/python container_image_tag=3.12-slim container_name=k8s_***-service-python_***-oauth-service-5995bb9788-fllrf_management_b8968793-8b38-42fd-b2cf-1681edb9f99e_0 k8s_ns=<NA> k8s_pod_name=<NA>)
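The container_name above still encodes the pod name and namespace. A minimal sketch, assuming the naming convention that dockershim and cri-dockerd use for Kubernetes-managed containers (k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>); parse_k8s_container_name is a hypothetical helper, and this is not how Falco fills the k8s.* fields (those come from the CRI labels):

```python
# Sketch: decode a dockershim/cri-dockerd style container name.
# Assumed convention: k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>.
# Kubernetes names and UIDs cannot contain "_", so splitting on it is safe.
from typing import NamedTuple, Optional


class K8sContainerName(NamedTuple):
    container: str
    pod: str
    namespace: str
    pod_uid: str
    attempt: int


def parse_k8s_container_name(name: str) -> Optional[K8sContainerName]:
    """Return the metadata encoded in the name, or None if it does not match."""
    parts = name.lstrip("/").split("_")  # docker API names may start with "/"
    if len(parts) != 6 or parts[0] != "k8s" or not parts[5].isdigit():
        return None
    _, container, pod, namespace, pod_uid, attempt = parts
    return K8sContainerName(container, pod, namespace, pod_uid, int(attempt))


if __name__ == "__main__":
    name = ("k8s_***-service-python_***-oauth-service-5995bb9788-fllrf_"
            "management_b8968793-8b38-42fd-b2cf-1681edb9f99e_0")
    print(parse_k8s_container_name(name))
```

For the alert above this yields namespace "management", which is exactly what k8s_ns should have shown.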

Environment

  • Falco version:
Mon Jun 10 09:08:19 2024: Falco version: 0.38.0 (x86_64)
Mon Jun 10 09:08:19 2024: Falco initialized with configuration files:
Mon Jun 10 09:08:19 2024:    /etc/falco/falco.yaml
Mon Jun 10 09:08:19 2024: System info: Linux version 6.1.0-21-amd64 ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03)
Falco version: 0.38.0
Libs version:  0.17.1
Plugin API:    3.5.0
Engine:        0.40.0
Driver:
  API version:    8.0.0
  Schema version: 2.0.0
  Default driver: 7.2.0+driver
  • System info:
Mon Jun 10 09:09:00 2024: Falco version: 0.38.0 (x86_64)
Mon Jun 10 09:09:00 2024: Falco initialized with configuration files:
Mon Jun 10 09:09:00 2024:    /etc/falco/falco.yaml
Mon Jun 10 09:09:00 2024: System info: Linux version 6.1.0-21-amd64 ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03)
Mon Jun 10 09:09:00 2024: Loading rules from file /etc/falco/falco_rules.yaml
Mon Jun 10 09:09:00 2024: Loading rules from file /etc/falco/rules.d
Mon Jun 10 09:09:00 2024: /etc/falco/rules.d: Ok, with warnings
4 Warnings:
In rules content: (/etc/falco/rules.d:0:0)
    rule 'Read sensitive file untrusted': (/etc/falco/rules.d:7:2)
------
- rule: Read sensitive file untrusted
  ^
------
LOAD_DEPRECATED_ITEM (Used deprecated item): 'append' key is deprecated. Add an 'append' entry (e.g. 'condition: append') under 'override' instead.
In rules content: (/etc/falco/rules.d:0:0)
    rule 'Contact K8S API Server From Container': (/etc/falco/rules.d:31:2)
------
- rule: Contact K8S API Server From Container
  ^
------
LOAD_DEPRECATED_ITEM (Used deprecated item): 'append' key is deprecated. Add an 'append' entry (e.g. 'condition: append') under 'override' instead.
In rules content: (/etc/falco/rules.d:0:0)
    rule 'Redirect STDOUT/STDIN to Network Connection in Container': (/etc/falco/rules.d:48:2)
------
- rule: Redirect STDOUT/STDIN to Network Connection in Container
  ^
------
LOAD_DEPRECATED_ITEM (Used deprecated item): 'append' key is deprecated. Add an 'append' entry (e.g. 'condition: append') under 'override' instead.
In rules content: (/etc/falco/rules.d:0:0)
    rule 'Clear Log Activities': (/etc/falco/rules.d:57:2)
------
- rule: Clear Log Activities
  ^
------
LOAD_DEPRECATED_ITEM (Used deprecated item): 'append' key is deprecated. Add an 'append' entry (e.g. 'condition: append') under 'override' instead.
{
  "machine": "x86_64",
  "nodename": "falco-lfjvt",
  "release": "6.1.0-21-amd64",
  "sysname": "Linux",
  "version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03)"
}
  • Cloud provider or hardware configuration:
    k8s on premise (kubespray 2.24.1)
  • OS:
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel:
    Linux k8s-master01vt-nbg6.senacor-lbb.noris.de 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
  • Installation method:
    Kubernetes Manifest files from example Repo
@incertum
Contributor

Hi @networkhell happy to help debugging.

First of all wanted to provide feedback that for me it continues to work very well after upgrading to Falco 0.38.0, so let's try to get to the bottom of what it could be.

Could you share some statistics on how often the k8s fields are missing? Are they missing for all events, even though you have the container fields?

Here are the relevant source code parts:

Maybe even try all k8s fields https://falco.org/docs/reference/rules/supported-fields/#field-class-k8s
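A rough way to collect those statistics from Falco's text output, sketched in Python. It assumes alerts in the text format quoted in this issue, with container_id, k8s_ns and k8s_pod_name in the output; parse_fields and summarize are hypothetical helpers:

```python
# Sketch: count alerts that ran in a container but lack k8s metadata.
# Assumes Falco text output with key=value fields; values containing spaces
# (e.g. command=) are not split exactly, which does not affect the fields used.
import re
import sys
from typing import Dict, Iterable

FIELD_RE = re.compile(r"(\w+)=([^\s)]*)")


def parse_fields(alert: str) -> Dict[str, str]:
    """Map each key=value pair to its value; later duplicates win."""
    return dict(FIELD_RE.findall(alert))


def summarize(alerts: Iterable[str]) -> Dict[str, int]:
    stats = {"alerts": 0, "in_container": 0, "k8s_missing": 0}
    for line in alerts:
        fields = parse_fields(line)
        if "evt_type" not in fields:
            continue  # not an alert line
        stats["alerts"] += 1
        cid = fields.get("container_id", "")
        if cid in ("", "host", "<NA>"):
            continue  # host process or unresolved container
        stats["in_container"] += 1
        if "<NA>" in (fields.get("k8s_ns", "<NA>"), fields.get("k8s_pod_name", "<NA>")):
            stats["k8s_missing"] += 1
    return stats


if __name__ == "__main__":
    # Usage: kubectl logs -n falco <falco-pod> | python3 falco_k8s_stats.py
    print(summarize(sys.stdin))
```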

@networkhell
Author

networkhell commented Jun 12, 2024

Thanks for getting back to me @incertum!

Whenever I run Falco 0.38.0, the k8s.* fields are never populated: they are always <NA>, while the container fields are always populated.
There is one thing I noticed while testing with crictl: I have to use unix:///var/run/cri-dockerd.sock as cri socket. Containerd socket does not work.
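A quick way to check which CRI sockets are actually present on a node (or under HOST_ROOT inside the Falco pod). A sketch: the candidate list simply combines the paths mentioned in this thread, and probe_sockets is a hypothetical helper:

```python
# Sketch: report which candidate CRI sockets exist and are UNIX sockets.
# The candidate list just collects the paths mentioned in this thread.
import os
import stat
from typing import Dict, Iterable

CANDIDATES = [
    "/run/containerd/containerd.sock",
    "/run/crio/crio.sock",
    "/run/k3s/containerd/containerd.sock",
    "/var/run/cri-dockerd.sock",
]


def probe_sockets(paths: Iterable[str], host_root: str = "") -> Dict[str, bool]:
    """Map each path to True if it exists under host_root and is a socket."""
    result: Dict[str, bool] = {}
    for path in paths:
        full = os.path.join(host_root, path.lstrip("/")) if host_root else path
        try:
            result[path] = stat.S_ISSOCK(os.stat(full).st_mode)
        except OSError:  # missing path or no permission
            result[path] = False
    return result


if __name__ == "__main__":
    found = probe_sockets(CANDIDATES, os.environ.get("HOST_ROOT", ""))
    for path, is_sock in found.items():
        print(("found   " if is_sock else "missing ") + path)
```

If /var/run is an absolute symlink to /run on the host, a path under /host/var/run is resolved in the caller's own mount namespace and may point back into the container's filesystem, so probing the /run form of the path is safer.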

crictl output (issued on the k8s node):

crictl -r unix:///var/run/cri-dockerd.sock inspect da940514d5f22
{
  "status": {
    "id": "da940514d5f2240780141367da424cbbc48d10bf35563bb0c2097b4b0fc9ddfd",
    "metadata": {
      "attempt": 0,
      "name": "falcoctl-artifact-follow"
    },
    "state": "CONTAINER_RUNNING",
    "createdAt": "2024-06-12T10:06:52.862952477+02:00",
    "startedAt": "2024-06-12T10:06:52.997678056+02:00",
    "finishedAt": "0001-01-01T00:00:00Z",
    "exitCode": 0,
    "image": {
      "annotations": {},
      "image": "harbor.***/hub.docker.com-proxy/falcosecurity/falcoctl:0.8.0",
      "userSpecifiedImage": ""
    },
    "imageRef": "docker-pullable://harbor.***/hub.docker.com-proxy/falcosecurity/falcoctl@sha256:6ec71dea6a5962a27b9ca4746809574bb9d7de50643c7434d7c81058aaecde3a",
    "reason": "",
    "message": "",
    "labels": {
      "io.kubernetes.container.name": "falcoctl-artifact-follow",
      "io.kubernetes.pod.name": "falco-wc8xk",
      "io.kubernetes.pod.namespace": "falco",
      "io.kubernetes.pod.uid": "18ce76bd-29ee-4421-83c3-ff56e7de7cf8"
    },
    "annotations": {
      "io.kubernetes.container.hash": "52596eac",
      "io.kubernetes.container.restartCount": "0",
      "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
      "io.kubernetes.container.terminationMessagePolicy": "File",
      "io.kubernetes.pod.terminationGracePeriod": "30"
    },
    "mounts": [
      {
        "containerPath": "/plugins",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~empty-dir/plugins-install-dir",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/rulesfiles",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~empty-dir/rulesfiles-install-dir",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/etc/falcoctl",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~configmap/falcoctl-config-volume",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": true,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/var/run/secrets/kubernetes.io/serviceaccount",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~projected/kube-api-access-6fsfh",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": true,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/etc/hosts",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/etc-hosts",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/dev/termination-log",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/containers/falcoctl-artifact-follow/0f6b0a2c",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      }
    ],
    "logPath": "/var/log/pods/falco_falco-wc8xk_18ce76bd-29ee-4421-83c3-ff56e7de7cf8/falcoctl-artifact-follow/0.log",
    "resources": null
  },
  "info": {
    "sandboxID": "c3c8b3f1706bd7a038f786406ad45ec1bd92b8f391ccb61ca4a3dcafe39189a3",
    "pid": 3573572
  }
}
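To confirm the pod metadata is present on the CRI side, the labels can be pulled out of that JSON directly. A sketch; the label keys are the ones shown above, while the mapping onto Falco field names only illustrates which label would back which field:

```python
# Sketch: extract pod metadata labels from `crictl inspect` JSON.
# Label keys are taken from the output above; the Falco field names on the
# right only illustrate which label would back which field.
import json
import sys
from typing import Any, Dict

LABEL_TO_FIELD = {
    "io.kubernetes.pod.namespace": "k8s.ns.name",
    "io.kubernetes.pod.name": "k8s.pod.name",
    "io.kubernetes.pod.uid": "k8s.pod.uid",
}


def k8s_fields_from_inspect(doc: Dict[str, Any]) -> Dict[str, str]:
    """Return field -> value, using "<NA>" like Falco when a label is absent."""
    labels = (doc.get("status") or {}).get("labels") or {}
    return {field: labels.get(label, "<NA>") for label, field in LABEL_TO_FIELD.items()}


if __name__ == "__main__":
    # Usage: crictl -r unix:///var/run/cri-dockerd.sock inspect <id> | python3 crictl_labels.py
    print(json.dumps(k8s_fields_from_inspect(json.load(sys.stdin)), indent=2))
```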

crictl output with the containerd socket:

crictl -r unix:///run/containerd/containerd.sock ps
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD

So I tried the following, without success: run the Falco pods with the following args and the cri-dockerd socket mounted at /host/var/run/cri-dockerd.sock:

          args:
            - /usr/bin/falco
            - --cri
            - /var/run/cri-dockerd.sock
            - -pk 
          env:
            - name: HOST_ROOT
              value: /host

As soon as I roll back to 0.37.1 with the default args:

          args:
            - /usr/bin/falco
            - --cri
            - /run/containerd/containerd.sock
            - --cri
            - /run/crio/crio.sock
            - -pk

the fields are populated again.

So is there any way I can debug this in detail within the falco container? Or does some work need to be done to officially support cri-dockerd as a CRI interface?

@incertum
Contributor

incertum commented Jun 12, 2024

Thanks for providing the crictl output. The labels are there.

There is one thing I noticed while testing with crictl: I have to use unix:///var/run/cri-dockerd.sock as cri socket. Containerd socket does not work.

The regression you mention is puzzling. Plus, you also have 2 runtimes running, right? We definitely touched the container engines during the last releases. Maybe the regression is something very subtle with respect to the container engine type and/or the fact that you run these 2? Maybe it now enters the docker container engine logic rather than the CRI logic, even though you pass the --cri socket. I don't know yet.

Btw, we have never tested it with /var/run/cri-dockerd.sock; we only run tests and unit tests with containerd and CRI-O, the 2 runtimes we primarily support for Kubernetes. Docker runtime support is internally a separate container engine in Falco and does not contain any k8s logic.

Not sure we can fix this for the immediate next patch release, Falco 0.38.1, because we are always very careful when touching the container engine; it can break easily. CC @falcosecurity/falco-maintainers

Edit: In addition, the new k8smeta plugin is an alternative way to get k8s enrichment, just FYI.

So is there any way I can debug this in detail within the falco container?

You would likely need to compile from source and sprinkle in more debug statements here and there, but you can certainly try running with the libs logger in debug mode.


What does your /etc/crictl.yaml show? [When I toggle runtimes for local tests, I always edit that file.]

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock

@incertum
Contributor

I installed cri-dockerd real quick and used crictl to spin up a pod, and confirmed that it does get categorized as docker ...

$ sudo crictl run container-config.json pod-config.json
d0e07ff4adbabb8cec6f75d022c711c9cb7487d085c5c277760bd8172d8366d6

$ sudo crictl pods
POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
d49a88ccc29a9       32 seconds ago      Ready               nginx-sandbox       default             1                   (default)

It also can't be decoupled from the docker service.

06-12 22:16:16.362950 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No sandbox label found, not copying liveness/readiness probes
06-12 22:16:16.362968 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No liveness/readiness probes found
06-12 22:16:16.363033 docker_async (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): parse returning true
06-12 22:16:16.363147 docker_async (d0e07ff4adba), secondary (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): Secondary fetch successful
06-12 22:16:16.363204 docker_async (d0e07ff4adba): parse returning true
06-12 22:16:16.363306 docker_async (d0e07ff4adba): Source callback result=1
06-12 22:16:16.363585 notify_new_container (d0e07ff4adba): created CONTAINER_JSON event, queuing to inspector

The lines above are libs logger debug output, so you should be able to get similar logs when you enable the libs logger.
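To see which engines look up the same container, the debug lines can be grouped by container ID. A sketch that assumes the line format shown above (MM-DD timestamp, engine name, container ID in parentheses); the set of engine names is an assumption based on the engines named in this thread:

```python
# Sketch: group libs logger debug lines by container ID and engine, to spot
# containers that more than one engine tries to resolve.
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed line shape: "MM-DD HH:MM:SS.ffffff <engine> (<container id>): message"
LINE_RE = re.compile(r"^\d{2}-\d{2} [\d:.]+ (?P<engine>[a-z_]+) \((?P<cid>[0-9a-f]+)\)")
ENGINES = {"docker", "cri", "podman", "lxc", "libvirt_lxc", "bpm"}


def engines_per_container(lines: Iterable[str]) -> Dict[str, Set[str]]:
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        engine = m.group("engine")
        if engine.endswith("_async"):
            engine = engine[: -len("_async")]  # docker_async -> docker
        if engine in ENGINES:
            seen[m.group("cid")[:12]].add(engine)  # normalize to 12-char IDs
    return dict(seen)


if __name__ == "__main__":
    for cid, engines in engines_per_container(sys.stdin).items():
        flag = "  <- looked up by several engines" if len(engines) > 1 else ""
        print(cid, sorted(engines), flag)
```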

Maybe the fact that it worked before was a lucky accident, and now that we have cleaned up the code a bit, it doesn't work anymore. Let me check what would need to be done to support this scenario.

@incertum
Contributor

@networkhell I opened a WIP PR. The issue was that the docker cgroups layout was not supported by our internal CRI container engine. However, right now we would do lookups against both the docker and the cri-dockerd sockets ...

We need a design discussion among the maintainers to see how we can best support cri-dockerd, because the above can cause a few issues.

CC @gnosek @therealbobo

06-12 22:54:53.338290 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No sandbox label found, not copying liveness/readiness probes
06-12 22:54:53.338298 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No liveness/readiness probes found
06-12 22:54:53.338327 docker_async (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): parse returning true
06-12 22:54:53.338376 docker_async (d0e07ff4adba), secondary (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): Secondary fetch successful
06-12 22:54:53.338403 docker_async (d0e07ff4adba): parse returning true
06-12 22:54:53.338449 docker_async (d0e07ff4adba): Source callback result=1
06-12 22:54:53.338507 Mesos container [d0e07ff4adba],thread [1514227], has likely malformed mesos task id [], ignoring
06-12 22:54:53.338514 cri (d0e07ff4adba): Performing lookup
06-12 22:54:53.338525 cri_async (d0e07ff4adba): Starting synchronous lookup
06-12 22:54:53.340012 cri (d0e07ff4adba): ContainerStatusResponse status error message: ()
06-12 22:54:53.340051 cri (d0e07ff4adba): parse_cri_image: image_ref=docker-pullable://busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7, digest_start=26
06-12 22:54:53.340068 cri (d0e07ff4adba): parse_cri_image: have_digest=1 image_name=docker-pullable://busybox
06-12 22:54:53.340080 cri (d0e07ff4adba): parse_cri_image: tag=, pulling tag from busybox:latest
06-12 22:54:53.340106 cri (d0e07ff4adba): parse_cri_image: repo=docker-pullable://busybox tag=latest image=docker-pullable://busybox:latest digest=sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
06-12 22:54:53.340128 (cgroup-limits) mem cgroup for container [d0e07ff4adba]: /proc/1/root/sys/fs/cgroup//system.slice/docker-d0e07ff4adbabb8cec6f75d022c711c9cb7487d085c5c277760bd8172d8366d6.scope
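The last line shows the docker-style systemd cgroup path (docker-<id>.scope) that the CRI engine did not recognize. A simplified sketch of extracting a container ID from a few common cgroup layouts; the patterns are illustrative and are not Falco's actual matcher:

```python
# Sketch: extract a container ID from common cgroup path layouts and shorten
# it to 12 characters, the form shown in container_id in this thread.
import re
from typing import Optional

CGROUP_PATTERNS = [
    re.compile(r"/docker-([0-9a-f]{64})\.scope"),          # systemd driver: docker, cri-dockerd
    re.compile(r"/cri-containerd-([0-9a-f]{64})\.scope"),  # systemd driver: containerd CRI
    re.compile(r"/crio-([0-9a-f]{64})\.scope"),            # systemd driver: CRI-O
    re.compile(r"/([0-9a-f]{64})(?:/|$)"),                 # cgroupfs driver: .../<id>
]


def container_id_from_cgroup(path: str) -> Optional[str]:
    for pattern in CGROUP_PATTERNS:
        m = pattern.search(path)
        if m:
            return m.group(1)[:12]
    return None


if __name__ == "__main__":
    print(container_id_from_cgroup(
        "/sys/fs/cgroup/system.slice/docker-d0e07ff4adbabb8cec6f75d022c711c9"
        "cb7487d085c5c277760bd8172d8366d6.scope"))
```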

@networkhell
Author

networkhell commented Jun 13, 2024 via email

@incertum
Contributor

Thanks for the additional info. I believe we should support cri docker, because we also support docker and who knows maybe it becomes more relevant in the future.

Just need to check and find a way to make sure that we do not look up the same container from 2 sockets, that's all. Not like it involves lots of code changes or so.
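One possible shape for "look up each container from only one socket", sketched in Python. ContainerResolver and the lookup callback are hypothetical and are not Falco's internal API:

```python
# Hypothetical sketch: resolve each container ID at most once, trying the
# configured sockets in order and remembering which one answered.
from typing import Callable, Dict, Iterable, Optional

Lookup = Callable[[str, str], Optional[dict]]  # (socket, container_id) -> metadata or None


class ContainerResolver:
    def __init__(self, sockets: Iterable[str], lookup: Lookup):
        self._sockets = list(sockets)
        self._lookup = lookup
        self._cache: Dict[str, Optional[dict]] = {}
        self.owner: Dict[str, str] = {}  # container_id -> socket that answered

    def resolve(self, container_id: str) -> Optional[dict]:
        if container_id in self._cache:
            return self._cache[container_id]  # never query a second socket
        meta = None
        for sock in self._sockets:
            meta = self._lookup(sock, container_id)
            if meta is not None:
                self.owner[container_id] = sock
                break
        self._cache[container_id] = meta
        return meta
```

Caching a failed lookup (None) forever is a simplification; a real implementation would need to retry, because a container may not be visible on any socket yet when its first event arrives.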

@leogr thoughts? However it definitely wouldn't be part of the next patch release.

@leogr
Member

leogr commented Jun 17, 2024

Thanks for the additional info. I believe we should support cri docker, because we also support docker and who knows maybe it becomes more relevant in the future.

I have no idea if this may become relevant. We should dig into it.

@leogr thoughts? However it definitely wouldn't be part of the next patch release.

Totally agree. Let's see in falcosecurity/libs#1907 and target it for libs 0.18 (Falco 0.39)

@incertum
Copy link
Contributor

@leogr I believe exposing container engines configs in falco.yaml can not only help here, but also make the configuration more versatile. For example, I never liked that all container engines are enabled and the end user has no control at all. Plus, for some deployment scenarios it will be better not to need CLI flags (e.g. cri, disable-cri-async) and instead have the option to configure everything in falco.yaml, similar to other settings.

Basically, if we have explicit enabled tags for each container engine we can easily support cri-dockerd while ensuring we do not look up the same container from 2 sockets.

We have a few options:

  1. Follow the plugin configs approach
enable_container_engines: ["docker", "cri", ...]
container_engines:
 - name: docker
 - name: cri
   cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
   disable-cri-async: false
  2. Similar to 1., but with an explicit enabled tag per engine.
container_engines:
 - name: docker
   enabled: true
 - name: cri
   enabled: true
   cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
   disable-cri-async: false
  3. ... ?

Defaults will of course remain the same.

@incertum incertum self-assigned this Jun 17, 2024
@leogr
Member

leogr commented Jun 20, 2024

@leogr I believe exposing container engines configs in falco.yaml can not only help here, but also make the configuration more versatile.

Totally agree 👍

Basically, if we have explicit enabled tags for each container engine we can easily support cri-dockerd while ensuring we do not look up the same container from 2 sockets.

We have a few options:

  1. Follow the plugin configs approach
enable_container_engines: ["docker", "cri", ...]
container_engines:
 - name: docker
 - name: cri
   cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
   disable-cri-async: false
  2. Similar to 1., but with an explicit enabled tag per engine.
container_engines:
 - name: docker
   enabled: true
 - name: cri
   enabled: true
   cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
   disable-cri-async: false

I prefer 2 over 1. Anyway, we will still have the issue that's not easy to use lists with -o from the command line (cc @LucaGuerra).

The option 3 might be:

container_engines:
    docker:
      enabled: true
    cri:
      enabled: true
      cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
      disable-cri-async: false
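The -o concern can be illustrated with a hypothetical override helper: with the map layout of option 3, every setting has a stable dotted path such as container_engines.docker.enabled, while a list layout offers no name to address. This is a sketch only; Falco's real -o parsing may behave differently:

```python
# Hypothetical sketch: apply a dotted "key.path=value" override to a nested
# config, to show why map-keyed sections are easier to target than lists.
from typing import Any, Dict


def apply_override(config: Dict[str, Any], override: str) -> None:
    key, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node: Any = config
    for part in parents:
        node = node.get(part) if isinstance(node, dict) else None
        if not isinstance(node, dict):
            raise KeyError(f"cannot descend into {part!r} of {key!r}")
    # Minimal value handling: only booleans are converted, the rest stays a string.
    node[leaf] = {"true": True, "false": False}.get(raw, raw)
```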

That being said, I believe it's time to open a dedicated issue for this :)

@incertum
Contributor

I might like option 3; it seems shorter. Yes, let me open a dedicated issue.

@incertum
Contributor

/milestone 0.39.0

@poiana poiana added this to the 0.39.0 milestone Jun 20, 2024
@incertum
Contributor

incertum commented Jun 27, 2024

@networkhell

Once (1) falcosecurity/libs#1907 and (2) #3266 are merged, you could test the master Falco container with the following config. It is important to disable docker.

container_engines:
  docker:
    enabled: false
  cri:
    enabled: true
    sockets: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
    disable_async: false
  podman:
    enabled: false
  lxc:
    enabled: false
  libvirt_lxc:
    enabled: false
  bpm:
    enabled: false
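A small sketch of how such a mapping could be sanity-checked before rollout. The keys mirror the config above; the check "docker disabled when a cri-dockerd socket is listed" reflects the advice in this comment, and treating a missing enabled key as enabled is an assumption:

```python
# Sketch: list enabled engines in a container_engines mapping and flag the
# double-lookup setup discussed in this thread (docker engine + cri-dockerd).
from typing import Any, Dict, List

Engines = Dict[str, Dict[str, Any]]


def enabled_engines(engines: Engines) -> List[str]:
    # Assumption: a missing "enabled" key counts as enabled (all-on default).
    return [name for name, cfg in engines.items() if cfg.get("enabled", True)]


def check_cri_dockerd(engines: Engines) -> List[str]:
    cri = engines.get("cri", {})
    on = set(enabled_engines(engines))
    uses_cri_dockerd = "cri" in on and any("cri-dockerd" in s for s in cri.get("sockets", []))
    if uses_cri_dockerd and "docker" in on:
        return ["docker engine and a cri-dockerd socket are both enabled; "
                "containers may be resolved twice"]
    return []


if __name__ == "__main__":
    config = {
        "docker": {"enabled": False},
        "cri": {"enabled": True, "disable_async": False,
                "sockets": ["/run/containerd/containerd.sock", "/run/crio/crio.sock",
                            "/run/cri-dockerd.sock"]},
        "podman": {"enabled": False},
    }
    print(enabled_engines(config), check_cri_dockerd(config))
```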

@networkhell
Author

@incertum I had the chance to test the current master / main images just now. But unfortunately the problem is not solved.

Additions to Falco config:
container_engines:
  docker:
    enabled: false
  cri:
    enabled: true
    sockets: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/cri-dockerd.sock"]
    disable_async: false
  podman:
    enabled: false
  lxc:
    enabled: false
  libvirt_lxc:
    enabled: false
  bpm:
    enabled: false

Sample log output:
09:26:41.301038994: Notice Unexpected connection to K8s API Server from container (connection=10.233.116.107:57454->10.233.0.1:443 lport=443 rport=57454 fd_type=ipv4 fd_proto=fd.l4proto evt_type=connect user=<NA> user_uid=65532 user_loginuid=-1 process=cluster-proport proc_exepath=/cluster-proportional-autoscaler parent=containerd-shim command=cluster-proport --namespace=kube-system --default-params={"linear":{"preventSinglePointFailure":true,"coresPerReplica":256,"nodesPerReplica":16,"min":2}} --logtostderr=true --v=2 --configmap=dns-autoscaler --target=Deployment/coredns terminal=0 exe_flags=<NA> container_id= container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)
09:27:05.187856865: Notice A shell was spawned in a container with an attached terminal (evt_type=execve user=root user_uid=0 user_loginuid=-1 process=sh proc_exepath=/usr/bin/dash parent=runc command=sh terminal=34817 exe_flags=EXE_WRITABLE container_id= container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)

As you can see in the logs, neither the k8s fields nor the container-specific fields are populated. So, if you agree, I would like to re-open this issue.

@leogr
Member

leogr commented Sep 9, 2024

/reopen
cc @FedeDP @Andreagit97

@poiana poiana reopened this Sep 9, 2024
@poiana
Contributor

poiana commented Sep 9, 2024

@leogr: Reopened this issue.

In response to this:

/reopen
cc @FedeDP @Andreagit97

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@incertum
Contributor

I believe we need to bump libs in falco first. We will ping you here when we do it the next time.

@FedeDP
Contributor

FedeDP commented Sep 24, 2024

Hey @networkhell, we just released Falco 0.39.0-rc3, which should become the final Falco 0.39.0 release in a week.
Care to test with it? You'll find packages under the *-dev folders of https://download.falco.org/packages/ ,
and the docker images with the 0.39.0-rc3 tag.

@networkhell
Author

Hi @FedeDP @incertum,

I tested just now with the suggested version and the results are not promising...

Full falco config:

base_syscalls:
  custom_set: []
  repair: false
buffered_outputs: false
container_engines:
  docker:
    enabled: false
  cri:
    enabled: true
    sockets: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/cri-dockerd.sock"]
    disable_async: false 
  podman:
    enabled: false
  lxc:
    enabled: false
  libvirt_lxc:
    enabled: false
  bpm:
    enabled: false
engine:
  kind: modern_ebpf
  kmod:
    buf_size_preset: 4
    drop_failed_exit: false
  ebpf:
    # path to the elf file to load.
    probe: ${HOME}/.falco/falco-bpf.o
    buf_size_preset: 4
    drop_failed_exit: false
  modern_ebpf:
    cpus_for_each_buffer: 2
    buf_size_preset: 4
    drop_failed_exit: false
  replay:
    # path to the capture file to replay (eg: /path/to/file.scap)
    capture_file: ""
  gvisor:
    # A Falco-compatible configuration file can be generated with
    # '--gvisor-generate-config' and utilized for both runsc and Falco.
    config: ""
    # Set gVisor root directory for storage of container state when used
    # in conjunction with 'gvisor.config'. The 'gvisor.root' to be passed
    # is the one usually passed to 'runsc --root' flag.
    root: ""
file_output:
  enabled: false
  filename: ./events.txt
  keep_alive: false
grpc:
  bind_address: unix:///host/var/run/falco/falco.sock
  enabled: true
  threadiness: 8
grpc_output:
  enabled: true
http_output:
  ca_bundle: ""
  ca_cert: ""
  ca_path: /etc/ssl/certs
  client_cert: /etc/ssl/certs/client.crt
  client_key: /etc/ssl/certs/client.key
  echo: false
  enabled: false
  insecure: false
  mtls: false
  url: ""
  user_agent: falcosecurity/falco
json_include_output_property: true
json_include_tags_property: true
json_output: false
libs_logger:
  enabled: false
  severity: debug
load_plugins: []
log_level: info
log_stderr: true
log_syslog: true
metrics:
  convert_memory_to_mb: true
  enabled: false
  include_empty_values: false
  interval: 1h
  kernel_event_counters_enabled: true
  libbpf_stats_enabled: true
  output_rule: true
  resource_utilization_enabled: true
output_timeout: 2000
plugins:
- library_path: libcloudtrail.so
  name: cloudtrail
- init_config: ""
  library_path: libjson.so
  name: json
priority: debug
program_output:
  enabled: false
  keep_alive: false
  program: 'jq ''{text: .output}'' | curl -d @- -X POST https://hooks.slack.com/services/XXX'
rule_matching: first
rules_files:
- /etc/falco/falco_rules.yaml
- /etc/falco/falco_rules.local.yaml
- /etc/falco/rules.d
stdout_output:
  enabled: true
syscall_event_drops:
  actions:
  - log
  - alert
  max_burst: 1
  rate: 0.03333
  simulate_drops: false
  threshold: 0.1
syscall_event_timeouts:
  max_consecutives: 1000
syslog_output:
  enabled: true
time_format_iso_8601: false
watch_config_files: true
webserver:
  enabled: true
  k8s_healthz_endpoint: /healthz
  listen_port: 8765
  ssl_certificate: /etc/falco/falco.pem
  ssl_enabled: false
  threadiness: 0

Log output:

Wed Sep 25 10:15:47 2024: Falco version: 0.39.0-rc3 (x86_64)
Wed Sep 25 10:15:47 2024: Falco initialized with configuration files:
Wed Sep 25 10:15:47 2024:    /etc/falco/falco.yaml | schema validation: ok
Wed Sep 25 10:15:47 2024: System info: Linux version 6.1.0-25-amd64 ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26)
Wed Sep 25 10:15:47 2024: Loading rules from:
Wed Sep 25 10:15:47 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
Wed Sep 25 10:15:47 2024:    /etc/falco/rules.d | schema validation: ok
Wed Sep 25 10:15:47 2024: /etc/falco/rules.d: Ok, with warnings
4 Warnings:
....
<omitted>
...
Wed Sep 25 10:15:47 2024: Hostname value has been overridden via environment variable to: k8s-master03vt-nbg6.***
Wed Sep 25 10:15:47 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Wed Sep 25 10:15:47 2024: gRPC server threadiness equals to 8
Wed Sep 25 10:15:47 2024: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Wed Sep 25 10:15:47 2024: Starting gRPC server at unix:///host/var/run/falco/falco.sock
Wed Sep 25 10:15:47 2024: Loaded event sources: syscall
Wed Sep 25 10:15:47 2024: Enabled event sources: syscall
Wed Sep 25 10:15:47 2024: Opening 'syscall' source with modern BPF probe.
Wed Sep 25 10:15:47 2024: One ring buffer every '2' CPUs.
10:16:22.924296762: Notice A shell was spawned in a container with an attached terminal (evt_type=execve user=root user_uid=0 user_loginuid=-1 process=sh proc_exepath=/usr/bin/dash parent=containerd-shim command=sh terminal=34817 exe_flags=EXE_WRITABLE|EXE_LOWER_LAYER container_id=72e76a55d5fd container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)

The environment is basically unchanged:

  • k8s: 1.29.7
  • Kernel: 6.1.0-25-amd64 (Debian 12)
  • Docker Version: 26.1.2 + cri-dockerd

Sorry for the bad news.

@FedeDP
Contributor

FedeDP commented Sep 26, 2024

Sorry for the bad news.

:D doesn't matter, thanks for testing indeed!
So, I think we are out of time for the current release cycle, and the issue is a bit hard to reproduce; moving it to the next milestone and 🙏 to fix it asap.
Sorry for the inconvenience!

/milestone 0.40.0

@poiana poiana modified the milestones: 0.39.0, 0.40.0 Sep 26, 2024
@leogr
Member

leogr commented Sep 26, 2024

Hey @networkhell

First of all, thanks for your feedback.

I just got a question about the following:

10:16:22.924296762: Notice A shell was spawned in a container with an attached terminal (evt_type=execve user=root user_uid=0 user_loginuid=-1 process=sh proc_exepath=/usr/bin/dash parent=containerd-shim command=sh terminal=34817 exe_flags=EXE_WRITABLE|EXE_LOWER_LAYER container_id=72e76a55d5fd container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)

Was it a short-lived container? 🤔

@networkhell
Copy link
Author

Hi @leogr,

Was it a short-lived container? 🤔

I tested this with one of the Falco containers itself. I guess the pod's uptime was somewhere between 3 and 10 minutes before my tests. What is your definition of short-lived? ;-)

@alacuku
Member

alacuku commented Sep 27, 2024

Hey @networkhell have you considered using the k8s-metacollector for Kubernetes metadata?

@networkhell
Copy link
Author

Hey @networkhell have you considered using the k8s-metacollector for Kubernetes metadata?

@alacuku not yet - could it help to populate basic fields coming from the runtime such as container_name or k8s_pod_name? If yes I will give it a try.

@alacuku
Member

alacuku commented Sep 30, 2024

@networkhell, at the following link you can find the fields provided by the k8smeta plugin: https://github.com/falcosecurity/plugins/tree/main/plugins/k8smeta#supported-fields

@leogr
Member

leogr commented Sep 30, 2024

Could #2700 (comment) be related? 🤔

@networkhell
Copy link
Author

@alacuku thanks for the hint. I guess this will work for us if we adjust our alerts to use the metacollector fields.
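If alerts are moved to the k8smeta plugin, their output strings need the plugin's field names. A sketch; the k8smeta field names below are assumptions to be checked against the supported-fields list linked above, and to_k8smeta is a hypothetical helper:

```python
# Sketch: rewrite %k8s.* output fields to their k8smeta counterparts.
# The mapping is an assumption; check every target name against the k8smeta
# supported-fields list before relying on it. Unmapped fields are left as-is.
import re

K8S_TO_K8SMETA = {
    "k8s.ns.name": "k8smeta.ns.name",
    "k8s.pod.name": "k8smeta.pod.name",
    "k8s.pod.uid": "k8smeta.pod.uid",
}

_FIELD_RE = re.compile(r"%(k8s\.[a-z_.]+)")  # only "%k8s." fields, so already-migrated ones are untouched


def to_k8smeta(output: str) -> str:
    return _FIELD_RE.sub(lambda m: "%" + K8S_TO_K8SMETA.get(m.group(1), m.group(1)), output)


if __name__ == "__main__":
    print(to_k8smeta("container_id=%container.id k8s_ns=%k8s.ns.name k8s_pod_name=%k8s.pod.name"))
```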

@leogr @incertum in the meantime, we decided to drop the docker runtime in our production k8s clusters in favour of containerd, so this issue will not really bother us in the future. More importantly, I will lose access to my test environments set up with this combination of tools within the next two weeks.

So if nobody else cares about k8s + docker as runtime feel free to close this issue.

@poiana
Contributor

poiana commented Jan 5, 2025

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

6 participants