-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Launch cockpit-session via socket activation on /run/cockpit/session #16808
Launch cockpit-session via socket activation on /run/cockpit/session #16808
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
aaefe97
to
7ebf9e6
Compare
This comment was marked as outdated.
This comment was marked as outdated.
7ebf9e6
to
1330a6a
Compare
1330a6a
to
ca43353
Compare
This comment was marked as resolved.
This comment was marked as resolved.
ca43353
to
9ed95a5
Compare
9ed95a5
to
6778062
Compare
6778062
to
f1e5a6a
Compare
6b7ddb3
to
8d11f0d
Compare
0de7b45
to
ff915c8
Compare
ff915c8
to
286aa1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall some conversation about moving the check for the newline at the end of the cgroup name closer to the spot where we check that the cert_pem
file starts with that line. Just want to mention that to make sure it doesn't fall through the cracks by accident.
Otherwise, this all looks pretty great!
I can't formally review this, and I still request a close reading by someone who isn't the two of us, anyway.
Thanks!
} | ||
|
||
/* this is an inout parameter, be extra suspicious */ | ||
assert (ucred_len >= sizeof ucred); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd handle this in a &&
or ||
clause on the if()
around the getsockopt()
call as well. Of course this is a non-op, since we know, but if we're gonna be extra suspicious, we should do it in a way that won't get removed by compiler flags.
Conceptually, this feels like part of the return value of that call, so it makes sense to check it in the same place (to me).
Feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion. Changed.
@@ -1116,7 +1118,12 @@ cockpit_session_launch (CockpitAuth *self, | |||
g_str_equal (type, "tls-cert")) | |||
{ | |||
if (command == NULL && unix_path == NULL) | |||
command = cockpit_ws_session_program; | |||
{ | |||
if (cockpit_ws_session_program) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed the fallback code path but the commit message still claims "Fall back to calling cockpit-session directly for custom setups."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the message, thanks!
char path[100]; | ||
int r = snprintf (path, sizeof path, "/proc/%lu", (unsigned long) pid); | ||
if (r < 0 || r >= sizeof path) | ||
errx (EX, "memory error"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"non-euclidean memory error" ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the 10% extra that you get on special days!
src/session/client-certificate.c
Outdated
errx (EX, "memory error"); | ||
int fd = open (path, O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC); | ||
if (fd < 0) | ||
err (EX, "failed to open %s", path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called the Hubble tension!
(Fixed. Apparently my space bar bounces..)
if (len >= bufsize) | ||
errx (EX, "proc file %s exceeds buffer size %zu", name, bufsize); | ||
buffer[len] = '\0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads very nicely now.
286aa1c
to
32d55a9
Compare
|
||
/* read_proc_pid_cgroup() already ensures that, but just in case we refactor this: this is *essential* for the | ||
* subsequent comparison */ | ||
if (ws_cgroup[ws_cgroup_length - 1] != '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for allowing me to be pedantic about this <3
09ed600
to
c6695e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/session/client-certificate.c
Outdated
} | ||
else { | ||
ws_pid_dirfd = open_proc_pid (getpid ()); | ||
debug ("failed to read stdin peer credentials: %m; not in socket mode, reading cgroup from my own pid %d", ws_pid_dirfd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets removed in a later commit. Can we squash that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly me being picky: Technically, that commit (session: Support Unix socket mode for client certificates) still requires the fallback. It is made obsolete by "ws: connect to cockpit-session via socket", but that one also cannot land before. So it's a very short transitionary period (just over two commits), but nevertheless for "I can build and test or revert a random commit" correctness I think we should keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Then let me review it: errno might be overwritten here, no? And it prints a FD instead of a PID, is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, that's in the wrong order, the debug() should come before the open_proc_pid() of course. Thanks, let me fix that completely dead piece of code!
When cockpit-session's stdin is a Unix socket, it is being spawned by cockpit-ws through [email protected]. In that case it doesn't make sense to look at its own cgroup, but we need to check the cgroup of the socket peer (i.e. cockpit-ws). We must guard against PID recycling attacks: 1. Mallory logs into cockpit, gets ws pid M, and hacks ws: connect to session, forks, keeps the session fd in a different process, and kills pid M. 3. Mallory waits until Alice logs in again and happens to get ws pid M (which can happen with a sufficient number of forks, social engineering, and some luck). cockpit-session checks that pid M is in cgroup /cockpit/alice, and starts an alice session for Mallory's ws. (Note: SO_PEERCRED gives you pid/uid/gid at the time connect() was made.) Thus require that the peer (ws) must have started earlier than cockpit-session. This is the same approach that polkit uses as a fallback if pidfds are not available: https://github.com/polkit-org/polkit/blob/main/src/polkit/polkitunixprocess.c Note that pidfds don't help us: There is no API to directly get from a pidfd to a cgroup, startup time, or /proc/<pid> dirfd, this has to happen via `pidfd_getpid()` and opening /proc/pid. But that's exactly what we want to avoid, and thus is pointless (they are also only available since kernel 6.5).
Unless it's otherwise specified in the configuration file, we now spawn cockpit-session by connecting to /run/cockpit/session if that exists. We leave the cockpit_ws_session_program variable in place to allow the tests to override things. Update the unit files for cockpit-ws to ensure that the socket is available when cockpit-ws is running. Adjust TestConnection.testBasic accordingly: When running cockpit-session via unix socket activation, its group permissions are irrelevant. Break the socket instead. Also adjust the reverse proxy tests which start the `cockpit-ws` binary directly to ensure that the session socket is aware. A custom production setup which doesn't use cockpit.socket will have to `Requires/After=cockpit-session.socket` as well to continue to function (as running cockpit-ws as root is undesirable). Drop `testAuthUnixPath`, as that's now the default. Instead, add a new `testAuthDirectSession` which tests a custom cockpit-ws setup that directly runs cockpit-session. Co-Authored-By: Martin Pitt <[email protected]>
systemd spawns this for us now, so we don't need the setuid bit anymore. Clean up the statoverride in the Debian packaging on upgrades.
This avoids an alternative code path which is unlikely to happen in practice, and which we don't test anywhere.
c6695e3
to
72bfb6c
Compare
🌟 |
|
||
/* Guard against pid recycling: If a malicious user captures ws, keeps the socket in a forked child and exits | ||
* the original pid, they can trap a different user to login, get the old pid (pointing ot their cgroup), and | ||
* capture their session. To prevent that, require that ws must have started earlier than ourselves. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably already aware, but just in case, in polkit there were CVEs due to comparing processes in this way for authentication purposes, as there are ways for an attacker to control the start timestamp as recorded by the kernel, by holding the forking process at a particular time. See comment at: https://github.com/polkit-org/polkit/blob/main/src/polkit/polkitunixprocess.c#L59
These days we use pid fds which cannot be recycled, and only fallback on heuristics on old kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment. We discussed this an awful lot. Please see this thread:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to know a bit more about what the pidfd peercred thing actually solves: does it mean that for as long as the kernel is willing to return that pidfd to us that it then won't reissue that pid? That would be a very big help indeed, but it would also mean that we don't actually need to use the pidfd to gain that benefit...
Or (in the case that the process that called connect()
is gone) will it issue us a pidfd that points to a process that doesn't exist anymore and possibly the pid of that process already got used?
I admittedly stopped looking into the polkit code after @martinpitt determined that it wasn't available on many of our target platforms and expressed reservations about implementing the fallback, but it would be useful to know how this ought to work, and I do believe that it will be a worthwhile thing to pursue in the future.
In the meantime, though, please understand that this code is only relevant on systems that support logging in with TLS client certificates, and this is only one part of a defence-in-depth strategy:
- only
cockpit-ws
can connect to the socket, as we control access to it via its group ownership - we use a "nonce" system: a randomly-generated certificate file name is created in
cockpit-tls
. That nonce is passed only to the process that's supposed to be logging in using that client certificate. That file is written to a temporary directory only accessible by root, and lives only as long as the TCP connection is open. In order to convincecockpit-session
that you are allowed to login, you need to know the name of the file in that directory. - the first line of that file contains the name of the cgroup of the process that is expected to be logging in using that file and we perform that double check based on the cgroup lookup we're discussing here.
We also limit the time that the login process is allowed to linger to 60 seconds in an attempt to make it more difficult to wait for a recycled PID.
One of the attack scenarios that we consider possible in Cockpit is that one legitimate user could exploit a weakness in the large C code base of cockpit-ws
to take over control of the process and use it to gain control over a logged-in session of another user. This is the main reason for why we go out of our way to isolate client-certificate-identified sessions from each other. But: I notice that we could be doing more limiting what cockpit-ws
is able to do, via the large number of sandboxing options available in systemd unit files. I'm not sure, for example, that it wouldn't be possible for one cockpit-ws
instance to gain control of another via ptrace()
— they're running as the same user. We restrict things quite a lot with our custom selinux policy, but it would be good to look into locking things down with systemd. I'll open an issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to know a bit more about what the pidfd peercred thing actually solves: does it mean that for as long as the kernel is willing to return that pidfd to us that it then won't reissue that pid? That would be a very big help indeed, but it would also mean that we don't actually need to use the pidfd to gain that benefit...
A pidfd can never be recycled, when the process exits, it will simply become invalid and can no longer be resolved to a pid, even if a process with the same pid has since appeared. With recent kernels you can use statx() and compare directly two pidfds for equality, we do this in systemd now for example. On slightly older kernels you get two pidfds, you resolve each pidfd -> pid, compare the pid, and then afterwards check again that both pidfds are still valid - more laborious than statx(), but equally safe against all races.
You can get the pidfd of a socket peer via SO_PEERPIDFD, this is safe because it's the kernel that gives you the pidfd, so you can be always sure it refers to the process that started the communication over the socket, it cannot be faked.
In summary, it is a feature designed exactly to avoid the kind of race conditions and vulnerabilities that have plagued polkit and other system software that need to authenticate processes.
In the meantime, though, please understand that this code is only relevant on systems that support logging in with TLS client certificates, and this is only one part of a defence-in-depth strategy
Of course, it's entirely up to you, and it is dependent on use cases, threat model, etc. I only mentioned because I found the link to this PR on Mastodon, and having had to deal with all of this recently in polkit, I thought of giving a heads-up, as it's a fairly recent kernel feature so it's not very well known yet. But if it's not necessary for the use case here, that's entirely fine, this was just intended as an FYI, not an RFE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluca Thanks! My main woe is that just getting a pidfd doesn't help us much. We need to know the peer's startup time (currently parsed from /proc/pid/stat) and cgroup (/proc/pid/cgroup), and pidfds have neither these APIs nor "give me the corresponding /proc/pid".
If SO_PEERPIDFD is guaranteed to give the pid fd at the time of connect(), and resolving it fails once it's gone, that is a good hardening. We'd still have to resolve it to a pid and open /proc/pid, but it would replace the time comparison on newer kernels -- and we'd keep the time comparison as a fallback on older ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@allisonkarlitskaya yes, that's what I meant with "it would replace the time comparison on newer kernels"
@mvollmer that's how I understand it, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to know the peer's startup time (currently parsed from /proc/pid/stat) and cgroup (/proc/pid/cgroup), and pidfds have neither these APIs nor "give me the corresponding /proc/pid".
Is that needed for reasons other than comparing two processes for equality?
If SO_PEERPIDFD is guaranteed to give the pid fd at the time of connect(), and resolving it fails once it's gone, that is a good hardening. We'd still have to resolve it to a pid and open /proc/pid, but it would replace the time comparison on newer kernels -- and we'd keep the time comparison as a fallback on older ones.
Yes that is the case - the kernel will give you the pidfd of the process that started the connect, and of nothing else, and it can be relied on. The normal flow is that you carry the pidfd around, and you resolve it to a pid (there's a glibc API to do that in 2.39, and there's an IOCTL in kernel 6.13, to avoid manual string parsing) at the time when you actually use the pid: if you can get a pid then the process is still alive and it's guaranteed to be the same, if you get an error then the process is gone and you can't resolve it to a pid anymore.
If you want the cgroup from that, then systemd has APIs to get the unit via a pidfd, so that it can be done again race-free. There is ongoing work to add another SO_PEER option to get the cgroup id directly from a socket, but I don't think that has landed in the kernel yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't need /stat any more, but we very much need the cgroup. That's the reason for doing all this dance 😁 (we use that to identify the correct service instance for the TLS certificate that we receive, and encode the cert's SHA into the unit's instance name)
Thanks for confirming the pidfd mechanics! That's how I understood it, and that will actually be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in the future we'll hopefully have a SO_PEERCGROUP or so, work was in progress. For now one can ask systemd to translate from pidfd to cgroup in various ways, like sd_pidfd_get_cgroup()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact if you want the unit, there's APIs to get that directly, via D-Bus
This paves the way for removing the static cockpit users in #16811, and fixes cockpit on bootc images.
A nice side effect of this is that we can now connect to unix sockets from cockpitauth, which is useful for https://github.com/allisonkarlitskaya/cockpit-cloud
read()
call (no streamFILE*
any more)pass memfd with login information from ws to sessionnot required. That information comes from cockpit-session to cockpit-bridge, which is unaffected by this change.$COCKPIT_REMOTE_PEER
from ws to sessioncockpit-session@<id>.service
fails with error 5/NOTINSTALLED initially (e.g.TestTerminal.testBasic
)TestConnection.testBasic
)Fixes #21201
Release note
Increased webserver sandboxing, no more
setuid
binaryThe outward-facing Cockpit webserver components run at a low privilege level. Previously, when logging into the session, we'd use a setuid helper program to help us regain root privileges to allow us to login as the target user. This program has always had a restricted set of permissions (via group ownership) that only allowed it to be executed by the Cockpit webserver, but it was still a
setuid
binary on the system.As of this release, this binary is no longer marked
setuid
and is instead started via systemd socket activation. The Cockpit webserver connects to it via a protected UNIX socket in/run
. The fact that the login session (which needs to be able to perform administration operations on the OS) is no longer a direct descendent of the webserver process has also enabled us to dramatically increase the amount of sandboxing of the webserver.The fact that we no longer have a binary that needs to be physically present in the filesystem with a special group ID means that we were also able to remove the last of our users that we create when Cockpit is installed. All Cockpit components now run as dynamic users created at startup via the
DynamicUser=
systemd feature. Old systems might still have acockpit-ws
user present (and very old systems might have TLS certificates owned by this user) and we'll leave this alone, but it's now possible to delete this user.