-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
session: Pass remote peer address in authorize message, add authorize response timeout, robustify client certificate mapping errors #21292
session: Pass remote peer address in authorize message, add authorize response timeout, robustify client certificate mapping errors #21292
Conversation
7cdcf4f
to
4ab9266
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks pretty good. Some very minor nitpicks below.
The one thing I find most concerning is the signal 14 message. This is unexpected and it requires a better explanation.
src/session/session-utils.c
Outdated
@@ -164,6 +164,24 @@ write_control_end (void) | |||
auth_msg_size = 0; | |||
} | |||
|
|||
__attribute__((__noreturn__)) void | |||
exit_init_problem (const char *problem, const char *message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra whitespace after the comma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/session/session.c
Outdated
|
||
free (payload); | ||
exit (5); | ||
exit_init_problem (problem, message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same odd double whitespace :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...not fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, fixed the last bit.
@@ -401,15 +400,7 @@ exit_init_problem (int result_code) | |||
else | |||
message = pam_strerror (NULL, result_code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a guarantee that PAM won't someday have a \
or "
in their messages? I know this is an existing issue, so feel free to punt on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running out of time this morning, but good point. It's not a new problem, so I'd punt it for this PR. But I made a TODO entry for myself and also in #16808.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed here after all. Running out of razor blades!
@@ -56,7 +56,7 @@ read_proc_self_cgroup (size_t *out_length) | |||
if (fp == NULL) | |||
{ | |||
warn ("Failed to open /proc/self/cgroup"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you want to keep the copy for the journal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as we don't see the protocol messages in the journal on failures, so the warn()/fail() are for tests and admins.
src/session/client-certificate.c
Outdated
@@ -285,7 +285,7 @@ sssd_map_certificate (const char *certificate, char** username) | |||
* | |||
* Read the given certificate file, ensure that it belongs to our own cgroup, and ask | |||
* sssd to map it to a user. If everything matches as expected, return the user name. | |||
* Otherwise return %NULL, a warning message will already have been logged. | |||
* Otherwise exit the process with sending an appropriate error to the protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to the protocol" is a bit weird as language. "To stdout using the cockpit protocol" perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
warnx ("Could not determine cgroup of this process"); | ||
return NULL; | ||
} | ||
assert (ws_cgroup); | ||
/* A simple prefix comparison is appropriate here because ws_cgroup | ||
* will contain exactly one newline (at the end), and the expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost feel like this is the place to make the extra check to ensure that ws_cgroup definitely ends with \n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#16808 changes this a lot, so I'd prefer to make the validation changes there.
test/verify/check-system-realms
Outdated
@@ -279,6 +279,8 @@ class CommonTests: | |||
# (this should be fixed properly in cockpit-ws) | |||
self.allow_journal_messages('.*session timed out during authentication') | |||
self.allow_journal_messages('.*authentication timed out') | |||
# 14 == SIGALRM | |||
self.allow_journal_messages('.*bridge program failed: Child process killed by signal 14') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong. If your signal handler is working properly then we shouldn't see this message. It should exit cleanly via your _exit(EX)
call. Please look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, yes, I didn't pay enough attention to this 🙈 This is doubly nonsense, as the bridge gets killed with that alarm, not cockpit -session. I. e. that alarm(60) somehow gets inherited.
I see one obvious bug here, there is a signal (SIGALRM, SIG_DFL);
call later on which I missed. However, after fixing that, I still get that error. It almost looks like starting a PAM session inherits the current alarms and signal handlers, and that of course is really not intended.
This confirms my initial skepticism about making the range of that alert(60) so wide. I reverted it to a narrower scope of waiting for and parsing the initial authorize dialog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our original issue here is that this message is misleading. This journal message is coming from -ws
(cockpitpipetransport.c:125
), which prints it after observing the exit status of cockpit-session
.
The child doesn't inherit alarm()
across a fork()
. This is the documented behaviour per POSIX, Linux, and I tested that it's true.
Somehow the -session
process itself is the thing that's getting killed by the signal. Which means that there's still something unexplained going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as said on Matrix it's definitively not our spawn_and_wait()
. That came after the alarm disarming even in the previous version of the patch. Hence my suspicion that it's the PAM session start, because what else could it be? If that's session itself (and a message from ws), then our signal handler is somehow not working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I did test the signal handler, with alarm(1) and an extra sleep in ws (I put the patch somewhere, but can't find it quickly -- too many threads). That worked as expected, including the correct message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note: This piece of the patch is entirely gone now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to chase this down long enough to have thoroughly disproven my own original theory of what was going on here and to have (re-)learned with a high degree of certainty exactly what the process model is here.
The tl;dr: of it all is that a million things use alarm()
for all kinds of reasons, and many of them attempt to restore the previous state. I've been chasing ghosts through glibc with the locking functions in the utmp code and the password database and haven't been able to come up with anything, but I'm convinced enough at this point that the culprit is definitely in-process.
I also don't care anymore. It would have been nice to expand this to cover more code, but my concerns about the user being able to delay the other parts of the code are completely theoretical, and definitely "not our bug". Thanks for putting up with me on this. I'm not satisfied with the (lack of) answer but I'm sure enough about it to know that nothing deeply "weird" is happening here, and even if it is, it's not our fault.
src/ws/cockpitauth.c
Outdated
@@ -937,7 +939,7 @@ build_session_credentials (CockpitAuth *self, | |||
} | |||
} | |||
|
|||
remote_peer = cockpit_web_request_get_remote_address (request); | |||
gchar *remote_peer = cockpit_web_request_get_remote_address (request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda wondering why you didn't spring for g_autofree
here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, been hacking on cockpit-session for too long. Used that while at it.
The various perform_*() functions all assume a non-NULL rhost, as several functions such as `btmp_log()` do unchecked strncpy() on them. Add assertions to (1) make them fail more gracefully and usefully), and (2) document that requirement more explicitly. This is currently guaranteed by having an explicit fallback to `""` if `$COCKPIT_REMOTE_PEER` isn't set.
For hysterical raisins this was a leftover in session-utils (and even declared *twice* for the lolz), but it's only being used in session.c (which has all the PAM stuff). It fits much more neatly besides `last_txt_msg`.
4ab9266
to
819a745
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual: one serious problem and a small collection of minor/optionals.
@@ -164,6 +164,24 @@ write_control_end (void) | |||
auth_msg_size = 0; | |||
} | |||
|
|||
__attribute__((__noreturn__)) void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't catch this before, but <stdnoreturn.h>
is a thing since C11. Since C23 apparently you should write [[noreturn]]
without the #include
(which is now deprecated). That syntax is a bit of an eyebrow-raise for me, to be honest, but hey... no time like the present...
Feel free to pick either of those, or to ignore this request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use -std=gnu18
, so too new for [[noreturn]]
I figure. find /usr/include/ -name stdno*.h
shows no such entry. Just that libc apparently puts that attribute after the declaration, not before. Nevertheless, as this has been there for ages, I'd rather not touch it here and open yet another Baustelle.
src/session/session.c
Outdated
|
||
free (payload); | ||
exit (5); | ||
exit_init_problem (problem, message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...not fixed?
@@ -957,6 +968,10 @@ main (int argc, | |||
if (!cockpit_authorize_type (response, &type)) | |||
errx (EX, "invalid authorization header received"); | |||
|
|||
/* stop time-bounding */ | |||
alarm (0); | |||
sigaction (SIGALRM, &(struct sigaction) { .sa_handler = SIG_DFL, .sa_flags = 0 }, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant: ignored signals survive across execve()
but those with handlers are reset to default (since the handler isn't present in the new executable image).
Although I suppose we keep our own version of ourselves around to wait for the session to exit to perform cleanup.
I'm really curious about why lifting to here is required, though, and it's certainly not ideal. At exactly which point does PAM call fork()
? And which side are "we" on after it does that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alarm(2) says nothing about auto-resetting the alarm handler. I want to make really damn sure that we don't spawn the session with a non-default SIGALRM, so I really want to keep this. Maybe it's redundant, but then so are a lot of other things in session (just count the useless free() calls :grin).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that logic.
src/session/session.c
Outdated
/* we send the message through the Cockpit protocol, make sure we don't break JSON encoding */ | ||
char *message_sanitized = strdupx (message); | ||
char *c; | ||
while ((c = strchr (message, '"')) != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make a mutable copy of the string into message_sanitized
and then you modify the original message
before passing the unmodified copy to exit_init_problem()
:) char*
unsafety issues strike again via the (terrible) signature of strchr()
which takes a const char *
and returns a char *
. I seriously start to hate C.
Shame we don't have g_strcanon
here.
The two passes over the string bother me and I'd have written it more like
for (int i = 0; message[i]; i++)
if (message[i] != '\\') ... else if ...
but I bet yours is actually faster in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, copy pasta! Fixed.
We want to use that facility for non-PAM code (like certificate checking). Rename the remainder in session.c to `exit_pam_init_problem()` to clarify its purpose.
Instead of passing them up to the caller, make the internal functions immediately exit cockpit-session with an appropriate protocol error, using our new `exit_init_problem()`. This doesn't do much yet, but this code is going to become more complex soon, let's not take chances of missing some error handling.
Just like in cockpit-wsinstance-*.socket
cockpit_session_launch() doesn't set this env if the remote host is unknown. That is never the case in practice at least in our tests, but callers should still be aware of it.
ws times out the authorization attempt after one minute (`cockpit_ws_auth_response_timeout`). Introduce a similar timeout on the session side. This makes PID recycling attacks harder, as their victim now has to log in within one minute. Drop the resetting of SIGQUIT. Nothing changes it before, `signal()` is discouraged, let's not mix it with our new `sigaction()` calls. That code has been there since the initial commit, so we don't have a justification for it.
Passing the remote peer from ws to cockpit-session via the `$COCKPIT_REMOTE_PEER` environment variable does not work in unix socket mode. So make that part of the protocol instead and attach it to the authorize response.
Any `"` or `\` in PAM error messages break the protocol, as we do naïve JSON encoding. Replace them with `'` and `/` respectively. (There are no known cases of this today, but let's be safe).
58958f6
to
984625e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the fixes. This is ready to go now!
/* we send the message through the Cockpit protocol, make sure we don't break JSON encoding */ | ||
char *message_sanitized = strdupx (message); | ||
char *c; | ||
while ((c = strchr (message_sanitized, '"')) != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look close enough the first time around (because of the more obvious bug) but my understanding of this code wasn't correct: you're rescanning the entire string on every loop iteration, not just starting again from the point of c
.
This is valid though, because the previous "match" will no longer exist, since you've replaced it.
Split out from #16808 plus some cockpit-session error code path strengthening suggested by @allisonkarlitskaya in #16808 (comment) .