Race condition with many SSH channels open, unhelpful errors #2
Labels: bug, enhancement, help wanted
Hello,
I am developing an application that needs to hold a very large number of concurrent SSH connections (the end goal is regularly 180+ individual connections), but I am having difficulty scaling it.
With just ~2 connections at once, it runs fine. However, with more than 2-3 connections at once and concurrent async interaction between them, I get crashes like this one:
...but it fails to tell me what the error is, and given that there is a stack break, I cannot tell which connection has the problem or which call to a `hivessh` method actually failed.
This error is thrown at seemingly random points across my entire codebase, making debugging impractical. In essence, I'm doing something like this (greatly simplified):
...just with lots of `.exec` calls.
Empirical evidence suggests that this is a bug with `.exec()`, and NOT with `sshc.sftp.*`, as I have yet to see a crash from the SFTP subsystem.
In other words, I suspect that `hivessh` has a race condition when you execute multiple commands at the same time asynchronously across multiple `SshHost`s.
To this end, I suggest that the above error message be updated to include the reason why the socket was closed or lost connection.
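Until the message itself carries that reason, a caller-side workaround might look like the sketch below. It does not use any `hivessh` API: `ContextError` and `withContext` are hypothetical names made up for illustration, and the assumption is simply that each SSH call returns a promise that rejects on failure. The idea is to tag every rejection with a label naming the connection and the call, and to record the call site before the first `await`, so a failure can be traced back even when the original stack is cut off.

```typescript
// Hypothetical helper, not part of hivessh: an error that remembers
// which connection/call it came from and keeps the original failure.
class ContextError extends Error {
  readonly label: string;
  readonly original: unknown;

  constructor(label: string, original: unknown) {
    super(`[${label}] ${original instanceof Error ? original.message : String(original)}`);
    this.name = "ContextError";
    this.label = label;
    this.original = original;
  }
}

// Runs `op` and, if it rejects, rethrows the failure wrapped in a
// ContextError carrying `label` and the stack of the calling code.
async function withContext<T>(label: string, op: () => Promise<T>): Promise<T> {
  // Captured synchronously, before any await, so it still points at the caller.
  const callSite = new Error("call site").stack;
  try {
    return await op();
  } catch (err) {
    const wrapped = new ContextError(label, err);
    if (callSite) {
      wrapped.stack = `${wrapped.stack ?? wrapped.message}\nCalled from:\n${callSite}`;
    }
    throw wrapped;
  }
}
```

Wrapping each `.exec()` call as `withContext("host-17: exec", () => /* the call */)` would then at least show which connection and which call the error belongs to, with the underlying error still reachable through `original`.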
Inspecting an `SshHost` instance reveals this might be referring to `SshHost.closeErr`, but the message is unclear as to precisely where one should go looking.
For example, the error could instead read:
...for example, making something up: