Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition with many SSH channels open, unhelpful errors #2

Open
sbrl opened this issue Jan 30, 2025 · 2 comments
Open

Race condition with many SSH channels open, unhelpful errors #2

sbrl opened this issue Jan 30, 2025 · 2 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed

Comments

@sbrl
Copy link

sbrl commented Jan 30, 2025

Hello,

I am developing an application in which I need to make a very large number of concurrent SSH connections at once (regularly 180+ individual connections is the end goal), but I am having difficulty scaling my application.

Whenever I run it with just ~2 connections at once, it runs fine. However, if I run it with more than 2-3 connections at once with concurrent async interaction between multiple connections, I get crashes like this one:

Ssh channel closed, check why the socket was closed or lost connection
Error: Ssh channel closed, check why the socket was closed or lost connection
    at Channel.<anonymous> (file:///home/sbrl/Documents/repos/PROJECT_NAME/node_modules/hivessh/dist/SshExec.js:190:41)
    at Object.onceWrapper (node:events:628:26)
    at Channel.emit (node:events:525:35)
    at Channel.doClose (/home/sbrl/Documents/repos/PROJECT_NAME/node_modules/ssh2/lib/utils.js:101:21)
    at Object.onceWrapper (node:events:627:28)
    at Channel.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1696:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)Details so far: 

.....but it fails to tell me what the error is, and I given that there is a stack break I cannot tell which connection has the problem or which call to a hivessh method actually had the problem.

This error is randomly thrown at random points across my entire codebase, making debugging impactical. In esssence, I'm doing something like this (greatly simplified):

const sshcs = [ do_ssh_connect(...), do_ssh_connect(...) ];
await Promise.all(sshcs.map(sshc => sshc.exec(...));

....just with lots of .exec calls.

Empirical evidence suggests that this is a bug with .exec(), and NOT with sshc.sftp.*, as I have yet to see a crash from the SFTP subsystem.

In other words, I suspect that hivessh has a race condition when you execute multiple commands at the same time asynchronously across multiple SshHosts.


To this end, I suggest that the above error message be updated to include the reason why the socket was closed or lost connection.

Inspecting an SshHost instance reveals this might be referring to SshHost.closeErr, but the message is unclear as to precisely where one should go looking.

For example, the error could instead read:

Error: Ssh channel closed. Reason: <reason here>

...for example, making something up:

Error: Ssh channel closed. Reason: Connection closed by server
@NobleMajo
Copy link
Owner

NobleMajo commented Jan 31, 2025

Hi @sbrl,
nice to see u again!!

I will test it with 3-4 parallel executions later. I also will track down the error, error message building and the origin error from the ssh2-library.

Can you meanwhile confirm that this is not a ssh2-library bug? I mean by checking their issues and check it with their callback function.

@NobleMajo NobleMajo self-assigned this Jan 31, 2025
@NobleMajo NobleMajo added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Jan 31, 2025
@NobleMajo
Copy link
Owner

NobleMajo commented Feb 2, 2025

Hi,
I've tried to reproduce your bug using 3 Hosts, but not succeeding.
Everything seems to work for me with concurrent sleep executions and also with sftp.

I have added a concurrent multihost test file. 37b5098
You can configure as many hosts as you like by following the env var pattern of the .env.temp file .
Just copy it locally to .env: cp .env.temp .env.
Then use npm run build && npm run testl to run a test against the defined hosts.
(After the bun migration it is, bun test )

The e2e execution tests are defined at the end of the e2e-concurrent-exec.test.ts file.

If you know how to reproduce this error, please reply immediately.

(UPDATE):
I found a very similar bug and hopefully fixed yours by handling the closing of the ssh2 exec channel in a different way.
Its on both branches 👍🏻
Please check if the change solves the problem for you.

cya 👍🏻

@NobleMajo NobleMajo reopened this Feb 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants