Update libssh to 0.11.1 #3735

Open
wants to merge 13 commits into
base: main

Sploder12
Contributor

@Sploder12 Sploder12 commented Oct 16, 2024

Sets the timeout to infinite, since going without messages for extended periods is expected behavior. The updated libssh fixes the timeout being ignored.
Sets the pty mode explicitly to default, since the updated libssh inherits settings from stdin.

Fixes #3745

MULTI-1528

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 26.66667% with 22 lines in your changes missing coverage. Please review.

Project coverage is 88.86%. Comparing base (f01004e) to head (431ff31).
Report is 14 commits behind head on main.

Files with missing lines                 Patch %    Lines
src/platform/platform_unix.cpp           0.00%      19 Missing ⚠️
src/sshfs_mount/sshfs_mount.cpp          75.00%     2 Missing ⚠️
src/platform/console/unix_console.cpp    0.00%      1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3735      +/-   ##
==========================================
- Coverage   88.94%   88.86%   -0.09%     
==========================================
  Files         256      255       -1     
  Lines       14584    14602      +18     
==========================================
+ Hits        12972    12976       +4     
- Misses       1612     1626      +14     

@Sploder12 Sploder12 marked this pull request as ready for review October 18, 2024 18:54
@ricab ricab requested a review from georgeliao October 21, 2024 09:06

Collaborator

@ricab ricab left a comment

Nice work Trevor, thanks. I ask only for some unit tests for the new code. Other than that, +1 on my secondary review.

@Sploder12
Contributor Author

Some reproduction steps for the issues.

Formatting:

  1. Run multipass launch -n <instance-name>
  2. Run multipass shell <instance-name>
  3. Observe incorrectly formatted shell and no ECHO

Mounts:

  1. Run multipass launch -n <instance-name>
  2. Run multipass mount <source> <target>
  3. Wait ~30 seconds after mount finishes mounting
  4. Run multipass shell <instance-name> (GUI doesn't have shell formatting issues)
  5. Attempt to navigate to the files in <target>
  6. Run ls or cd
  7. Observe shell hang

@sharder996 sharder996 added this to the 1.15.0 milestone Oct 30, 2024
@Sploder12 Sploder12 force-pushed the update-libssh branch 2 times, most recently from de690f1 to 96d1a01 Compare November 27, 2024 20:17
@ricab ricab modified the milestones: 1.15.0, 1.15.1 Nov 29, 2024

Contributor

@georgeliao georgeliao left a comment

Hi @Sploder12

Thanks for the good work, the fix seems to be solid. I only have a few minor comments and questions.

src/platform/console/unix_console.cpp (resolved)
src/sshfs_mount/sshfs_mount.cpp (outdated, resolved)
const std::string& target,
const mp::id_mappings& gid_mappings,
const mp::id_mappings& uid_mappings)
: running{true},

Contributor

Should running be set to true inside the sftp_thread, as a way of localizing the variable?

It should not be true until the sftp_server is running, so setting it just before sftp_server->run(); may be more appropriate. What do you think?

Collaborator

I had the same intuition, but I believe this needs to be alive right after construction, so that the watchdog doesn't quit before the thread starts.

To avoid confusion, one alternative would be to have an enum to distinguish the running state, instead of a bool. Something like "unstarted, running, stopped". Then in the watchdog condition, look for stopped. The logic would be the same, but it would help readers and future devs not move this and introduce a bug. WDYT?

Contributor Author

Ricardo's belief is correct. The watchdog would immediately exit if the thread takes a while to start. I think the enum approach would work well though.

Contributor

@georgeliao georgeliao Dec 4, 2024

OK, I see. Then the enum approach is fine.
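
The tri-state idea agreed on in this thread could look roughly like the following. This is a minimal sketch, not the code in the PR: MountState, watchdog_should_quit, and run_mount_sketch are hypothetical names, the sleep stands in for sftp_server->run(), and the real watchdog also reacts to signals.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical tri-state replacing the `running` bool.
enum class MountState { unstarted, running, stopped };

// The watchdog quits only once the worker has finished, so a worker that is
// slow to start (state still `unstarted`) does not make it quit early.
inline bool watchdog_should_quit(const std::atomic<MountState>& state)
{
    return state.load() == MountState::stopped;
}

// Runs a stand-in worker and polls the way a watchdog might.
// Returns the number of polls made before the worker stopped.
inline int run_mount_sketch()
{
    std::atomic<MountState> state{MountState::unstarted};

    std::thread worker{[&state] {
        state = MountState::running; // set just before the server loop would run
        std::this_thread::sleep_for(std::chrono::milliseconds(20)); // stand-in for the server loop
        state = MountState::stopped;
    }};

    int polls = 0;
    while (!watchdog_should_quit(state))
    {
        ++polls;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    worker.join();
    assert(state.load() == MountState::stopped);
    return polls;
}
```

With this shape, moving the `running` assignment into the thread no longer risks an early watchdog exit, because the watchdog condition only matches `stopped`.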

src/sshfs_mount/sshfs_mount.h (outdated, resolved)
: session{ssh_new(), ssh_free}, mut{}
{
if (session == nullptr)
throw mp::SSHException("could not allocate ssh session");

- const long timeout_secs = std::chrono::duration_cast<std::chrono::seconds>(timeout).count();
+ const long timeout_secs = std::numeric_limits<long>::max();

Contributor

About this change: I think timeout_secs is used to set SSH_OPTIONS_TIMEOUT, and that option is the wait time for establishing the SSH session connection (I am not entirely sure about this, and clear documentation is hard to find). Therefore, having a reasonable value (like the original 20 seconds) seems to make sense here.

Besides, with your watchdog fix (which essentially checks whether the sftp_thread is running, together with the SIGQUIT, SIGTERM, and SIGHUP signals), the program already behaves correctly. So I doubt whether we still need the timeout change here. What do you think?

Collaborator

Wasn't this a keep-alive timeout? A better var name might help here.

Contributor Author

Unfortunately, the program will not behave correctly with a reasonable timeout. After the timeout expires, the sshfs_server process will exit; the watchdog fix makes it so that it doesn't hang. The option behaves like a keep-alive timeout, but libssh has no keep-alive messages.

Contributor

OK, so it acts like a keep-alive timeout rather than a wait time for establishing the SSH session connection. Interesting. It is a weird option to offer users, though.

Contributor

@georgeliao georgeliao Dec 4, 2024

I do still wonder about one thing, though: SSH_OPTIONS_TIMEOUT is an SSH option. If it were a keep-alive timeout, wouldn't it bound how long the SSH session stays alive? If so, shouldn't multipass shell hang even without a mount?

Contributor Author

multipass shell doesn't hang because it's set up to reconnect automatically. You can see this behavior in #3810, where the SSH session restarts every 20 seconds. Also, with normal use it's much more likely that there is some SSH traffic within the 20 seconds, whereas SFTP is likely to see no traffic for long periods of time.
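
To make the two lines of the diff concrete, here is a small sketch that evaluates both expressions in isolation. It assumes the 20-second value mentioned in this thread and a millisecond-typed timeout; the helper names are hypothetical. That libssh treats a very large value as effectively infinite is taken from the PR description, not verified here.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Old line of the diff: convert the caller's timeout to whole seconds.
// duration_cast truncates toward zero, so 20999 ms becomes 20 s.
inline long bounded_timeout_secs(std::chrono::milliseconds timeout)
{
    return static_cast<long>(std::chrono::duration_cast<std::chrono::seconds>(timeout).count());
}

// New line of the diff: the largest value a long can hold.
inline long unbounded_timeout_secs()
{
    return std::numeric_limits<long>::max();
}

// Usage example comparing the two values.
inline void timeout_examples()
{
    assert(bounded_timeout_secs(std::chrono::milliseconds(20000)) == 20);
    assert(unbounded_timeout_secs() > bounded_timeout_secs(std::chrono::milliseconds(20000)));
}
```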

src/platform/platform_unix.cpp (resolved)

Collaborator

@ricab ricab left a comment

Great PR @Sploder12! And nice review as well @georgeliao 😃

A few comments on my secondary review.

src/platform/platform_unix.cpp (outdated, resolved)
include/multipass/platform.h (resolved)

@Sploder12 Sploder12 marked this pull request as draft December 4, 2024 19:28
@Sploder12 Sploder12 marked this pull request as ready for review December 9, 2024 21:41
int sig = SIGUSR2;

// A signal generator that triggers after `timeout`
AutoJoinThread signaler([&sig_mtx, &sig_cv, &sig, &timeout, signalee = pthread_self()] {

Contributor

@georgeliao georgeliao Dec 11, 2024

OK, we should replace this in-house utility once std::jthread is available.
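
For readers unfamiliar with the pattern: an auto-joining thread wrapper joins in its destructor, which std::jthread (C++20) does out of the box. The class below is a hypothetical stand-in written for illustration under C++17; it is not Multipass's AutoJoinThread, whose actual interface is not shown in this conversation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical minimal wrapper: joins the thread on destruction.
class JoiningThread
{
public:
    template <typename Callable>
    explicit JoiningThread(Callable&& f) : worker{std::forward<Callable>(f)}
    {
    }

    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

    ~JoiningThread()
    {
        if (worker.joinable())
            worker.join();
    }

private:
    std::thread worker;
};

// Returns true if the worker finished by the time the wrapper left scope.
inline bool worker_finished_before_scope_exit()
{
    std::atomic<bool> done{false};
    {
        JoiningThread t{[&done] { done = true; }};
    } // the destructor joins here, so `done` must be set afterwards
    assert(done.load());
    return done.load();
}
```

With C++20, the inner scope could declare a std::jthread instead and drop the wrapper class entirely.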

}

{
std::unique_lock lock(sig_mtx);

Contributor

Maybe use std::lock_guard<std::mutex> as opposed to std::unique_lock since we do not need to move the lock into a function.
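
A brief sketch of the distinction behind this suggestion, assuming the block in question only holds the mutex for the duration of a scope. std::lock_guard is enough in that case, whereas std::condition_variable::wait_for takes a std::unique_lock because it has to unlock and relock the mutex while waiting. SignalBox and both functions are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state; sig == 0 means "no signal recorded yet".
struct SignalBox
{
    std::mutex mtx;
    std::condition_variable cv;
    int sig = 0;
};

// Scoped locking only: lock_guard suffices.
inline void set_signal(SignalBox& box, int value)
{
    assert(value != 0); // 0 is reserved for "no signal yet"
    {
        std::lock_guard<std::mutex> lock{box.mtx};
        box.sig = value;
    }
    box.cv.notify_all();
}

// Waiting on a condition variable: wait_for requires a unique_lock.
// Returns true if a signal was recorded before the timeout expired.
inline bool wait_for_signal(SignalBox& box, std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lock{box.mtx};
    return box.cv.wait_for(lock, timeout, [&box] { return box.sig != 0; });
}
```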

std::unique_lock lock(sig_mtx);
while (sig == SIGUSR2)
{
auto status = sig_cv.wait_for(lock, timeout);

Contributor

I think you can use the wait_for overload that takes a predicate to check sig != SIGUSR2, which would simplify the code here.

Contributor Author

Good catch. Something like this should work:

while (!sig_cv.wait_for(lock, timeout, [&sig]() { return sig != SIGUSR2; })) {
    pthread_kill(signalee, SIGUSR2);
}
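
A self-contained sketch of the loop above, with pthread_kill replaced by a counter so that it runs without any signal handling. kPending stands in for SIGUSR2 and kReceived for a real signal number; all names are hypothetical. The point is only that the predicate overload of wait_for returns false when the timeout expires with the predicate still unsatisfied, and true once sig changes.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

constexpr int kPending = -1;  // stand-in for SIGUSR2
constexpr int kReceived = 15; // stand-in for a real signal number

// Returns how many times the timeout fired before `sig` changed.
inline int count_timeouts_until_signal()
{
    std::mutex sig_mtx;
    std::condition_variable sig_cv;
    int sig = kPending;
    int timeouts = 0;

    std::thread signaler{[&] {
        std::unique_lock<std::mutex> lock{sig_mtx};
        // wait_for returns false when the timeout expires and sig is still pending
        while (!sig_cv.wait_for(lock, std::chrono::milliseconds(5), [&sig] { return sig != kPending; }))
            ++timeouts; // the loop above calls pthread_kill(signalee, SIGUSR2) here
    }};

    // Let a few timeouts elapse, then record a "signal" and wake the signaler.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    {
        std::lock_guard<std::mutex> lock{sig_mtx};
        sig = kReceived;
    }
    sig_cv.notify_all();
    signaler.join();

    assert(sig == kReceived);
    return timeouts;
}
```

The predicate form also handles spurious wakeups, since wait_for re-checks the predicate before returning.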

Development

Successfully merging this pull request may close these issues.

SSHFS watchdog doesn't notice thread exit
4 participants