Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.2 regression] SCMP_ACT_NOTIFY rule for fcntl causes runc to hang, before connecting to the seccomp listener agent #4328

Closed
Tracked by #4114
AkihiroSuda opened this issue Jun 28, 2024 · 9 comments · Fixed by #4337
Labels
Milestone

Comments

@AkihiroSuda
Copy link
Member

nerdctl run --rm --annotation nerdctl/bypass4netns=1 alpine echo hi works with runc v1.1.13, but hangs with runc v1.2.0-rc.2.

Reproduction steps:

containerd-rootless-setuptool.sh install
containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run --rm --annotation nerdctl/bypass4netns=1 alpine echo hi
@AkihiroSuda AkihiroSuda added this to the 1.2.0 milestone Jun 28, 2024
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Jun 28, 2024

Looks like seccomping fcntl with SCMP_ACT_NOTIFY is no longer possible with runc v1.2
https://github.com/rootless-containers/bypass4netns/blob/v0.4.1/pkg/oci/oci.go#L14


Minimal repro:

--- config.json.OLD     2024-06-28 14:27:41.625930763 +0900
+++ config.json 2024-06-28 14:28:02.228529889 +0900
@@ -173,6 +173,23 @@
       "/proc/irq",
       "/proc/sys",
       "/proc/sysrq-trigger"
-    ]
+    ],
+    "seccomp": {
+      "defaultAction": "SCMP_ACT_ALLOW",
+      "architectures": [
+        "SCMP_ARCH_X86_64",
+        "SCMP_ARCH_X86",
+        "SCMP_ARCH_X32"
+      ],
+      "listenerPath": "/run/user/1001/bypass4netns.sock",
+      "syscalls": [
+        {
+          "names": [
+            "fcntl"
+          ],
+          "action": "SCMP_ACT_NOTIFY"
+        }
+      ]
+    }
   }
 }
  • Run rootlesskit runc run foo (hangs with runc v1.2.0-rc.2)

@AkihiroSuda AkihiroSuda changed the title [v1.2 regression] nerdctl run --annotation nerdctl/bypass4netns=1 hangs [v1.2 regression] nerdctl run --annotation nerdctl/bypass4netns=1 hangs (due to SCMP_ACT_NOTIFY with fnctl) Jun 28, 2024
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Jun 28, 2024

Regression in 20b95f2 libcontainer: seccomp: pass around *os.File for notifyfd (PR #3982)

Seems to be related to https://cs.opensource.google/go/go/+/refs/tags/go1.22.4:src/os/file_unix.go;l=113

@rata
Copy link
Member

rata commented Jun 28, 2024

This is borderline for me. Is it a regression? I mean, I feel like we should be allowed to change which syscalls we do after the seccomp policies are applied and before the container entrypoint exec. Especially since we are using go, that we don't have so much control on which syscalls are used.

In fact, this is a great example, it is not documented that os.File uses fcntl and they can probably change it in different versions of go. So it's hard for us to make promises, IMHO, when using such a high-level language.

Can't this be handled properly in the seccomp agent? I think it should be easy to handle if the agent does: (a) accept all the syscalls on fds it doesn't know about (those where created before the agent received the message on the unix socket, so those are runc fds), or (b) accept all the syscalls before the first exec.

I think one of those should be possible (IMHO the latter is more robust, I'd go for that)

@AkihiroSuda what do you think?

@AkihiroSuda
Copy link
Member Author

cc @naoki9911 (maintainer of https://github.com/rootless-containers/bypass4netns)

accept all the syscalls before the first exec.

How can the agent detect the first exec?

@rata
Copy link
Member

rata commented Jun 28, 2024

@AkihiroSuda if you notify on the exec syscall, then you can see if exec was done or not and accept all before processing an exec, after the exec do whatever the agent wants to do.

If you do not want to handle exec, then just accepting all on unknown fds will do the trick too for this issue.

@rata
Copy link
Member

rata commented Jun 28, 2024

But I honestly don't think this is something we want to fix in runc. I'm okay if the PR to fix this is very small, but it seems like the agent should handle this better (also, maybe the agent doesn't work with youki or other runtimes until it has a more robust handling of this).

In fact, looking at the golang code you shared (https://cs.opensource.google/go/go/+/refs/tags/go1.22.4:src/os/file_unix.go;l=113) if only the agent returns an error, instead of hanging the syscall forever, it might just work fine. If fcntl returns an error, the runtime doesn't abort it at all in that function.

I do think a better handling in the agent as I proposed in the comments above is better, but even this small fix might just cut it for this case.

@naoki9911
Copy link

I confirmed the issue.

bypass4netns always allows fcntl exec.
It only records fcntl's arguments.

With runc v 1.1.13, bypass4netns hooks fcntl correctly.

$ .local/bin/runc -v
runc version 1.1.13
commit: v1.1.13-0-g58aa9203
spec: 1.0.2-dev
go: go1.22.4
libseccomp: 2.5.1

$ rootlesskit ./.local/bin/runc run foo
exec /bin/sh: no such file or directory
[rootlesskit:child ] error: command [./.local/bin/runc run foo] exited: exit status 1
[rootlesskit:parent] error: child exited: exit status 1

$ ./bypass4netns --trace
INFO[0000] Trace mode enabled
INFO[0000] SocketPath: /run/user/1002/bypass4netns.sock
INFO[0000] 127.0.0.0/8 is added to ignore
INFO[0000] Waiting for seccomp file descriptors
INFO[0014] accept connection
INFO[0014] Received new seccomp fd: 8
INFO[0014] tracer is disabled
INFO[0014] background task is ready. start to handle
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 3 0 0 0 0]
DEBU[0014] process is registered                         pid=564221 sockfd=4 syscall=fcntl
DEBU[0014] got sockfd=10
DEBU[0014] failed to get socket args err="getsockopt(SO_DOMAIN) failed: socket operation on non-socket"  pid=564221 sockfd=4 syscall=fcntl
DEBU[0014] socket is registered (state=NotBypassable)    pid=564221 sockfd=4 syscall=fcntl
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 4 34816 0 0 0]
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 3 0 0 0 0]
TRAC[0014] Received syscall "fcntl", pid 564221, arch "amd64", args [4 4 32768 0 0 0]

But with runc v1.2.0-rc.2, bypass4nets does not receive any notifications.

$ ./runc-amd64-v1.2.0-rc.2 -v
runc version 1.2.0-rc.2
commit: v1.2.0-rc.2-0-gf2d2ee5e-dirty
spec: 1.2.0
go: go1.22.3
libseccomp: 2.5.5

$ rootlesskit ./runc-amd64-v1.2.0-rc.2 run foo
<- Hangs here

$ ./bypass4netns --trace
INFO[0000] Trace mode enabled
INFO[0000] SocketPath: /run/user/1002/bypass4netns.sock
INFO[0000] 127.0.0.0/8 is added to ignore
INFO[0000] Waiting for seccomp file descriptors
<- No new notify connection received.

bypass4netns does not receive notify fd with runc v1.2.0-rc.2.
I changed the hooking syscall to connect, bypass4netns successfully accepts notify fd.

      "syscalls": [
        {
          "names": [
-            "fcntl"
+           "connect"
         ],
          "action": "SCMP_ACT_NOTIFY"
        }
      ]
$ rootlesskit ./runc-amd64-v1.2.0-rc.2 run foo
exec /bin/sh: no such file or directory
[rootlesskit:child ] error: command [./runc-amd64-v1.2.0-rc.2 run foo] exited: exit status 255
[rootlesskit:parent] error: child exited: exit status 255

$ ./bypass4netns --trace
INFO[0000] Trace mode enabled
INFO[0000] SocketPath: /run/user/1002/bypass4netns.sock
INFO[0000] 127.0.0.0/8 is added to ignore
INFO[0000] Waiting for seccomp file descriptors
INFO[0002] accept connection <- accepted new connection.
INFO[0002] Received new seccomp fd: 8
INFO[0002] tracer is disabled
INFO[0002] background task is ready. start to handle

I think this issue occurs in the establishment of seccomp notification connection.
Filtering fcntl may be causing some 'dead-lock' state in it.

@AkihiroSuda AkihiroSuda changed the title [v1.2 regression] nerdctl run --annotation nerdctl/bypass4netns=1 hangs (due to SCMP_ACT_NOTIFY with fnctl) [v1.2 regression] SCMP_ACT_NOTIFY rule for fcntl causes runc to hang, before connecting to the seccomp listener agent Jun 29, 2024
@rata
Copy link
Member

rata commented Jul 1, 2024

Thinking about it more, we can probably break other setups changing the syscalls we do at that point. I think it is unlikely we do, but we can check if it's easy to fix this. Wanna take a look and open a PR?

(for the agent, though, I think the changes we talked about are the way to go anyways)

@AkihiroSuda
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants