[v1.2 regression] `SCMP_ACT_NOTIFY` rule for `fcntl` causes runc to hang, before connecting to the seccomp listener agent #4328

AkihiroSuda · 2024-06-28T05:06:05Z

nerdctl run --rm --annotation nerdctl/bypass4netns=1 alpine echo hi works with runc v1.1.13, but hangs with runc v1.2.0-rc.2.

Reproduction steps:

Install nerdctl-full v2.0.0-rc.0 from https://github.com/containerd/nerdctl/releases/tag/v2.0.0-rc.0
Run the following commands

containerd-rootless-setuptool.sh install
containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run --rm --annotation nerdctl/bypass4netns=1 alpine echo hi

The text was updated successfully, but these errors were encountered:

AkihiroSuda · 2024-06-28T05:23:20Z

Looks like seccomping fcntl with SCMP_ACT_NOTIFY is no longer possible with runc v1.2
https://github.com/rootless-containers/bypass4netns/blob/v0.4.1/pkg/oci/oci.go#L14

Minimal repro:

Install https://github.com/rootless-containers/bypass4netns (v0.4.1)
Run bypass4netns
Run runc spec
Apply the following changes to config.json:

--- config.json.OLD     2024-06-28 14:27:41.625930763 +0900
+++ config.json 2024-06-28 14:28:02.228529889 +0900
@@ -173,6 +173,23 @@
       "/proc/irq",
       "/proc/sys",
       "/proc/sysrq-trigger"
-    ]
+    ],
+    "seccomp": {
+      "defaultAction": "SCMP_ACT_ALLOW",
+      "architectures": [
+        "SCMP_ARCH_X86_64",
+        "SCMP_ARCH_X86",
+        "SCMP_ARCH_X32"
+      ],
+      "listenerPath": "/run/user/1001/bypass4netns.sock",
+      "syscalls": [
+        {
+          "names": [
+            "fcntl"
+          ],
+          "action": "SCMP_ACT_NOTIFY"
+        }
+      ]
+    }
   }
 }

Run rootlesskit runc run foo (hangs with runc v1.2.0-rc.2)

AkihiroSuda · 2024-06-28T05:56:30Z

Regression in 20b95f2 libcontainer: seccomp: pass around *os.File for notifyfd (PR #3982)

Seems to be related to https://cs.opensource.google/go/go/+/refs/tags/go1.22.4:src/os/file_unix.go;l=113

rata · 2024-06-28T10:31:04Z

This is borderline for me. Is it a regression? I mean, I feel like we should be allowed to change which syscalls we do after the seccomp policies are applied and before the container entrypoint exec. Especially since we are using go, that we don't have so much control on which syscalls are used.

In fact, this is a great example, it is not documented that os.File uses fcntl and they can probably change it in different versions of go. So it's hard for us to make promises, IMHO, when using such a high-level language.

Can't this be handled properly in the seccomp agent? I think it should be easy to handle if the agent does: (a) accept all the syscalls on fds it doesn't know about (those where created before the agent received the message on the unix socket, so those are runc fds), or (b) accept all the syscalls before the first exec.

I think one of those should be possible (IMHO the latter is more robust, I'd go for that)

@AkihiroSuda what do you think?

AkihiroSuda · 2024-06-28T14:31:48Z

cc @naoki9911 (maintainer of https://github.com/rootless-containers/bypass4netns)

accept all the syscalls before the first exec.

How can the agent detect the first exec?

rata · 2024-06-28T14:38:01Z

@AkihiroSuda if you notify on the exec syscall, then you can see if exec was done or not and accept all before processing an exec, after the exec do whatever the agent wants to do.

If you do not want to handle exec, then just accepting all on unknown fds will do the trick too for this issue.

rata · 2024-06-28T14:47:13Z

But I honestly don't think this is something we want to fix in runc. I'm okay if the PR to fix this is very small, but it seems like the agent should handle this better (also, maybe the agent doesn't work with youki or other runtimes until it has a more robust handling of this).

In fact, looking at the golang code you shared (https://cs.opensource.google/go/go/+/refs/tags/go1.22.4:src/os/file_unix.go;l=113) if only the agent returns an error, instead of hanging the syscall forever, it might just work fine. If fcntl returns an error, the runtime doesn't abort it at all in that function.

I do think a better handling in the agent as I proposed in the comments above is better, but even this small fix might just cut it for this case.

naoki9911 · 2024-06-29T04:02:02Z

I confirmed the issue.

bypass4netns always allows fcntl exec.
It only records fcntl's arguments.

With runc v 1.1.13, bypass4netns hooks fcntl correctly.

$ .local/bin/runc -v
runc version 1.1.13
commit: v1.1.13-0-g58aa9203
spec: 1.0.2-dev
go: go1.22.4
libseccomp: 2.5.1

$ rootlesskit ./.local/bin/runc run foo
exec /bin/sh: no such file or directory
[rootlesskit:child ] error: command [./.local/bin/runc run foo] exited: exit status 1
[rootlesskit:parent] error: child exited: exit status 1

$ ./bypass4netns --trace
INFO[0000] Trace mode enabled
INFO[0000] SocketPath: /run/user/1002/bypass4netns.sock
INFO[0000] 127.0.0.0/8 is added to ignore
INFO[0000] Waiting for seccomp file descriptors
INFO[0014] accept connection
INFO[0014] Received new seccomp fd: 8
INFO[0014] tracer is disabled
INFO[0014] background task is ready. start to handle
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 3 0 0 0 0]
DEBU[0014] process is registered                         pid=564221 sockfd=4 syscall=fcntl
DEBU[0014] got sockfd=10
DEBU[0014] failed to get socket args err="getsockopt(SO_DOMAIN) failed: socket operation on non-socket"  pid=564221 sockfd=4 syscall=fcntl
DEBU[0014] socket is registered (state=NotBypassable)    pid=564221 sockfd=4 syscall=fcntl
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 4 34816 0 0 0]
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 3 0 0 0 0]
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 4 32768 0 0 0]

But with runc v1.2.0-rc.2, bypass4nets does not receive any notifications.

$ ./runc-amd64-v1.2.0-rc.2 -v
runc version 1.2.0-rc.2
commit: v1.2.0-rc.2-0-gf2d2ee5e-dirty
spec: 1.2.0
go: go1.22.3
libseccomp: 2.5.5

$ rootlesskit ./runc-amd64-v1.2.0-rc.2 run foo
<- Hangs here

$ ./bypass4netns --trace
INFO[0000] Trace mode enabled
INFO[0000] SocketPath: /run/user/1002/bypass4netns.sock
INFO[0000] 127.0.0.0/8 is added to ignore
INFO[0000] Waiting for seccomp file descriptors
<- No new notify connection received.

bypass4netns does not receive notify fd with runc v1.2.0-rc.2.
I changed the hooking syscall to connect, bypass4netns successfully accepts notify fd.

      "syscalls": [
        {
          "names": [
-            "fcntl"
+           "connect"
         ],
          "action": "SCMP_ACT_NOTIFY"
        }
      ]

$ rootlesskit ./runc-amd64-v1.2.0-rc.2 run foo
exec /bin/sh: no such file or directory
[rootlesskit:child ] error: command [./runc-amd64-v1.2.0-rc.2 run foo] exited: exit status 255
[rootlesskit:parent] error: child exited: exit status 255

$ ./bypass4netns --trace
INFO[0000] Trace mode enabled
INFO[0000] SocketPath: /run/user/1002/bypass4netns.sock
INFO[0000] 127.0.0.0/8 is added to ignore
INFO[0000] Waiting for seccomp file descriptors
INFO[0002] accept connection <- accepted new connection.
INFO[0002] Received new seccomp fd: 8
INFO[0002] tracer is disabled
INFO[0002] background task is ready. start to handle

I think this issue occurs in the establishment of seccomp notification connection.
Filtering fcntl may be causing some 'dead-lock' state in it.

rata · 2024-07-01T09:08:25Z

Thinking about it more, we can probably break other setups changing the syscalls we do at that point. I think it is unlikely we do, but we can check if it's easy to fix this. Wanna take a look and open a PR?

(for the agent, though, I think the changes we talked about are the way to go anyways)

AkihiroSuda · 2024-07-03T08:28:55Z

PR:

Revert "libcontainer: seccomp: pass around *os.File for notifyfd" #4337

AkihiroSuda added the kind/bug label Jun 28, 2024

This was referenced Jun 28, 2024

Blockers for v1.2.0 #4114

Closed

update runc (1.2.0) containerd/nerdctl#3153

Merged

AkihiroSuda added this to the 1.2.0 milestone Jun 28, 2024

AkihiroSuda changed the title ~~[v1.2 regression] nerdctl run --annotation nerdctl/bypass4netns=1 hangs~~ [v1.2 regression] nerdctl run --annotation nerdctl/bypass4netns=1 hangs (due to SCMP_ACT_NOTIFY with fnctl) Jun 28, 2024

AkihiroSuda mentioned this issue Jun 28, 2024

Nsexec spring cleaning part I #3982

Merged

AkihiroSuda mentioned this issue Jun 28, 2024

hook for fcntl is incompatible with runc v1.2 (rc) rootless-containers/bypass4netns#86

Closed

AkihiroSuda changed the title ~~[v1.2 regression] nerdctl run --annotation nerdctl/bypass4netns=1 hangs (due to SCMP_ACT_NOTIFY with fnctl)~~ [v1.2 regression] SCMP_ACT_NOTIFY rule for fcntl causes runc to hang, before connecting to the seccomp listener agent Jun 29, 2024

AkihiroSuda mentioned this issue Jul 3, 2024

Revert "libcontainer: seccomp: pass around *os.File for notifyfd" #4337

Merged

AkihiroSuda closed this as completed in #4337 Jul 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v1.2 regression] `SCMP_ACT_NOTIFY` rule for `fcntl` causes runc to hang, before connecting to the seccomp listener agent #4328

[v1.2 regression] `SCMP_ACT_NOTIFY` rule for `fcntl` causes runc to hang, before connecting to the seccomp listener agent #4328

AkihiroSuda commented Jun 28, 2024

AkihiroSuda commented Jun 28, 2024 •

edited

Loading

AkihiroSuda commented Jun 28, 2024 •

edited

Loading

rata commented Jun 28, 2024 •

edited

Loading

AkihiroSuda commented Jun 28, 2024

rata commented Jun 28, 2024

rata commented Jun 28, 2024 •

edited

Loading

naoki9911 commented Jun 29, 2024

rata commented Jul 1, 2024

AkihiroSuda commented Jul 3, 2024

[v1.2 regression] SCMP_ACT_NOTIFY rule for fcntl causes runc to hang, before connecting to the seccomp listener agent #4328

[v1.2 regression] SCMP_ACT_NOTIFY rule for fcntl causes runc to hang, before connecting to the seccomp listener agent #4328

Comments

AkihiroSuda commented Jun 28, 2024

AkihiroSuda commented Jun 28, 2024 • edited Loading

AkihiroSuda commented Jun 28, 2024 • edited Loading

rata commented Jun 28, 2024 • edited Loading

AkihiroSuda commented Jun 28, 2024

rata commented Jun 28, 2024

rata commented Jun 28, 2024 • edited Loading

naoki9911 commented Jun 29, 2024

rata commented Jul 1, 2024

AkihiroSuda commented Jul 3, 2024

[v1.2 regression] `SCMP_ACT_NOTIFY` rule for `fcntl` causes runc to hang, before connecting to the seccomp listener agent #4328

[v1.2 regression] `SCMP_ACT_NOTIFY` rule for `fcntl` causes runc to hang, before connecting to the seccomp listener agent #4328

AkihiroSuda commented Jun 28, 2024 •

edited

Loading

AkihiroSuda commented Jun 28, 2024 •

edited

Loading

rata commented Jun 28, 2024 •

edited

Loading

rata commented Jun 28, 2024 •

edited

Loading