SSH: incorrect channel cache update when closing the channel client-side #9102

yarisx · 2024-11-22T17:16:10Z

Describe the bug
When there are multiple channels open on one connection, and one of the channels gets closed one of the issues can happen:

if the user channel handler process closes the channel and immediately exits then it is possible that SSH connection handler (upon receiving 'DOWN' message) removes the corresponding channel entry from the cache before the proper close message exchange completes. In this case if anything arrives from the peer (in our case it is either 'exit-status' or 'exit-signal' message) the crash ensues because of failed cache lookup. This was partially fixed by PR ssh: avoid crash upon exit-signal #8995, but the root cause of the erroneous cache eviction is still there
when the previous issue is mitigated by the fix from ssh: avoid crash upon exit-signal #8995 then there is a possibility to send the 'channel-close' message twice: once upon receiving 'exit-signal' from the peer and then - when the peer replies with its own 'channel-close', our side again sends 'channel-close', thus breaking the protocol. With some bad luck this may lead to the peer closing the connection with any channels that are still alive on that connection.

To Reproduce
Create an SSH connection with two channels, run some traffic through both channels, then while one channel is still in active use, make the user channel handler to close the channel and exit immediately. See #8929 for the description of the scenario when the first issue happens. The second issue is quite easy to reproduce - just use a custom OpenSSH subsystem on the server side, then kill -PIPE the subsystem (which can be a shell script), on my machine every second test of this scenario leads to the "protocol error" from OpenSSH because the second 'channel-close' is received for a channel that does not exist on the server.

Expected behavior
When a channel is closed the correct procedure is implemented, that is - each side sends 'channel-close' once. The 'exit-signal' or 'exit-status' are not channel-closing messages.

Affected versions
The OTP versions that are affected by this bug.

Additional context
The example of server-side OpenSSH trace for the second issue:

debug1: session_exit_message: session 0 channel 0 pid 19880 signal 13
debug1: session_exit_message: release channel 0
debug2: channel 0: write failed
debug2: chan_shutdown_write: channel 0: (i3 o0 sock -1 wfd 11 efd -1 [closed])
debug2: channel 0: send eow
debug2: channel 0: output open -> closed
debug2: channel 0: send close
debug3: send packet: type 97
debug3: channel 0: will not send data after close
debug3: receive packet: type 97
debug2: channel 0: rcvd close
debug3: receive packet: type 97
debug2: channel 0: rcvd close
channel 0: protocol error: close rcvd twice
debug3: channel 0: will not send data after close
debug2: channel 0: is dead
debug2: channel 0: gc: notify user
debug1: session_by_channel: session 0 channel 0
debug1: session_close_by_channel: channel 0 child 0

Note the "received packet: type 97" repeated.

The text was updated successfully, but these errors were encountered:

yarisx added the bug Issue is reported as a bug label Nov 22, 2024

yarisx mentioned this issue Nov 22, 2024

Fix channel close procedure when the peer dies or our handler goes down #9103

Open

IngelaAndin added the team:PS Assigned to OTP team PS label Nov 25, 2024

u3s self-assigned this Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SSH: incorrect channel cache update when closing the channel client-side #9102

SSH: incorrect channel cache update when closing the channel client-side #9102

yarisx commented Nov 22, 2024

SSH: incorrect channel cache update when closing the channel client-side #9102

SSH: incorrect channel cache update when closing the channel client-side #9102

Comments

yarisx commented Nov 22, 2024