Update libssh to 0.11.1 #3735

Merged
merged 21 commits into from
Jan 20, 2025

@Sploder12 Sploder12 commented Oct 16, 2024

Sets the timeout to infinite, since going without messages for extended periods is expected behavior. The updated libssh no longer ignores the timeout.
Sets the pty mode explicitly to default, since the updated libssh inherits settings from stdin.

Fixes #3745

MULTI-1528

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 67.85714% with 9 lines in your changes missing coverage. Please review.

Project coverage is 89.07%. Comparing base (cbcb5e7) to head (d844a00).
Report is 24 commits behind head on main.

Files with missing lines Patch % Lines
src/platform/platform_unix.cpp 64.70% 6 Missing ⚠️
src/sshfs_mount/sshfs_mount.cpp 75.00% 2 Missing ⚠️
src/platform/console/unix_console.cpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3735      +/-   ##
==========================================
+ Coverage   89.02%   89.07%   +0.05%     
==========================================
  Files         255      254       -1     
  Lines       14583    14598      +15     
==========================================
+ Hits        12982    13003      +21     
+ Misses       1601     1595       -6     


@Sploder12 Sploder12 marked this pull request as ready for review October 18, 2024 18:54
@ricab ricab requested a review from georgeliao October 21, 2024 09:06
Collaborator

@ricab ricab left a comment

Nice work Trevor, thanks. I ask only for some unit tests for the new code. Other than that, +1 on my secondary review.

@Sploder12
Contributor Author

Some reproduction steps for the issues.

Formatting:

  1. Run multipass launch -n <instance-name>
  2. Run multipass shell <instance-name>
  3. Observe incorrectly formatted shell and no ECHO

Mounts:

  1. Run multipass launch -n <instance-name>
  2. Run multipass mount <source> <target>
  3. Wait ~30 seconds after mount finishes mounting
  4. Run multipass shell <instance-name> (GUI doesn't have shell formatting issues)
  5. Attempt to navigate to the files in <target>
  6. Run ls or cd
  7. Observe shell hang

@sharder996 sharder996 added this to the 1.15.0 milestone Oct 30, 2024
@Sploder12 Sploder12 force-pushed the update-libssh branch 2 times, most recently from de690f1 to 96d1a01 Compare November 27, 2024 20:17
@ricab ricab modified the milestones: 1.15.0, 1.15.1 Nov 29, 2024
Contributor

@georgeliao georgeliao left a comment

Hi @Sploder12

Thanks for the good work. The fix seems solid; I only have a few minor comments and questions.

src/platform/console/unix_console.cpp
src/sshfs_mount/sshfs_mount.cpp Outdated
const std::string& target,
const mp::id_mappings& gid_mappings,
const mp::id_mappings& uid_mappings)
: running{true},
Contributor

Should setting running to true happen in the sftp_thread, as a way of localizing the variable?

It should not be true until the sftp_server is running, so maybe before sftp_server->run(); is more appropriate. What do you think?

Collaborator

I had the same intuition, but I believe this needs to be alive right after construction, so that the watchdog doesn't quit before the thread starts.

To avoid confusion, one alternative would be to have an enum to distinguish the running state, instead of a bool. Something like "unstarted, running, stopped". Then in the watchdog condition, look for stopped. The logic would be the same, but it would help readers and future devs not move this and introduce a bug. WDYT?

Contributor Author

Ricardo's belief is correct. The watchdog would immediately exit if the thread takes a while to start. I think the enum approach would work well though.

Contributor

@georgeliao georgeliao Dec 4, 2024

OK, I see. Then the enum approach is fine.
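For readers following this thread, the enum idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `ServerState`, `MountState`, and the member function names are hypothetical and are not taken from the Multipass sources. The point is that the watchdog keeps waiting until the state is explicitly `stopped`, so a thread that is slow to start does not make it quit early.

```cpp
#include <atomic>

// Hypothetical names, chosen for illustration only.
enum class ServerState
{
    unstarted, // constructed, but the SFTP thread has not begun running yet
    running,   // the SFTP thread is serving
    stopped    // the SFTP thread has finished
};

class MountState
{
public:
    void mark_running()
    {
        state.store(ServerState::running);
    }

    void mark_stopped()
    {
        state.store(ServerState::stopped);
    }

    // The watchdog condition looks only for `stopped`, so both `unstarted`
    // and `running` keep the watchdog alive.
    bool watchdog_should_continue() const
    {
        return state.load() != ServerState::stopped;
    }

private:
    std::atomic<ServerState> state{ServerState::unstarted};
};
```

Compared with a bool initialized to true, the initial value here says what it means, which makes it harder to move the initialization by mistake later.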

src/sshfs_mount/sshfs_mount.h Outdated
: session{ssh_new(), ssh_free}, mut{}
{
if (session == nullptr)
throw mp::SSHException("could not allocate ssh session");

- const long timeout_secs = std::chrono::duration_cast<std::chrono::seconds>(timeout).count();
+ const long timeout_secs = std::numeric_limits<long>::max();
Contributor

About this change: I think timeout_secs is used to set SSH_OPTIONS_TIMEOUT, and that option is the wait time for establishing the SSH session connection (I'm not entirely sure about this, and clear documentation is hard to find). If so, a reasonable value, like the original 20 seconds, seems to make sense here.

Besides, with your watchdog fix (which essentially checks whether the sftp_thread is running, alongside the SIGQUIT, SIGTERM, and SIGHUP signals), the program already behaves correctly. So I doubt we still need the timeout change here. What do you think?

Collaborator

Wasn't this a keep-alive timeout? A better var name might help here.

Contributor Author

Unfortunately, the program will not behave correctly with a reasonable timeout. After the timeout expires, the sshfs_server process will exit; the watchdog fix makes it so that it doesn't hang. It behaves like a keep-alive timeout, but libssh has no keep-alive messages.

Contributor

OK, so it looks like a keep-alive timeout rather than a wait time for establishing the SSH session connection. Interesting. It is a weird option to offer users, though.

Contributor

@georgeliao georgeliao Dec 4, 2024

I do have a question here, though: SSH_OPTIONS_TIMEOUT is an SSH option. If it were a keep-alive timeout, should it be the timeout for keeping the SSH session alive? If so, should multipass shell hang even without a mount?

Contributor Author

multipass shell doesn't hang since it's set up to automatically reconnect. You can see this behavior in #3810, where the SSH session restarts every 20 seconds. Also, with normal use there is much more likely to be some SSH traffic within the 20 seconds, whereas SFTP is likely to get no traffic for long periods of time.
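To make the one-line diff in this thread concrete, here is a small sketch of the two ways `timeout_secs` is computed before and after the change. The function names are hypothetical. The only libssh call referenced (in a comment) is `ssh_options_set` with `SSH_OPTIONS_TIMEOUT`, which takes a pointer to a `long` holding seconds; how libssh interprets that value is exactly what the thread is debating, so the sketch makes no claim about it.

```cpp
#include <chrono>
#include <limits>

// Old behavior in the diff: derive the value from a chrono duration
// (for example, the 20 seconds mentioned in the discussion).
long bounded_timeout_secs(std::chrono::milliseconds timeout)
{
    return static_cast<long>(std::chrono::duration_cast<std::chrono::seconds>(timeout).count());
}

// New behavior in the diff: the largest representable value, which is
// effectively "never time out".
constexpr long unbounded_timeout_secs()
{
    return std::numeric_limits<long>::max();
}

// Either value would then be handed to libssh, along the lines of:
//   long timeout_secs = unbounded_timeout_secs();
//   ssh_options_set(session, SSH_OPTIONS_TIMEOUT, &timeout_secs);
```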

src/platform/platform_unix.cpp Outdated
Collaborator

@ricab ricab left a comment

Great PR @Sploder12! And nice review as well @georgeliao 😃

A few comments on my secondary review.

src/platform/platform_unix.cpp Outdated
include/multipass/platform.h
@Sploder12 Sploder12 marked this pull request as draft December 4, 2024 19:28
@Sploder12 Sploder12 marked this pull request as ready for review December 9, 2024 21:41
int sig = SIGUSR2;

// A signal generator that triggers after `timeout`
AutoJoinThread signaler([&sig_mtx, &sig_cv, &sig, &timeout, signalee = pthread_self()] {
Contributor

@georgeliao georgeliao Dec 11, 2024

OK, we need to replace this in-house utility when std::jthread is available.
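For context, an auto-joining thread wrapper of the kind mentioned here can be sketched in a few lines. This is an assumption-laden illustration, not Multipass's actual `AutoJoinThread`: the class name `JoiningThread` is hypothetical. It covers only the join-on-destruction part of what C++20's `std::jthread` offers (no stop token).

```cpp
#include <thread>
#include <utility>

// Hypothetical stand-in for an in-house auto-joining thread.
class JoiningThread
{
public:
    template <typename Callable, typename... Args>
    explicit JoiningThread(Callable&& f, Args&&... args)
        : thread{std::forward<Callable>(f), std::forward<Args>(args)...}
    {
    }

    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

    // Joining in the destructor means the thread can never outlive the
    // scope that owns it, and std::terminate is not triggered by a
    // still-joinable std::thread being destroyed.
    ~JoiningThread()
    {
        if (thread.joinable())
            thread.join();
    }

private:
    std::thread thread;
};
```

With C++20 available, `std::jthread worker{callable};` would give the same scope-bound join without a custom class.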

include/multipass/platform.h Outdated
include/multipass/platform.h
Collaborator

@ricab ricab left a comment

Hey Trevor, the code looks good to me, but I would still love to see some unit tests on that make_quit_watchdog. That stuff is quite complex and you did a nice job of it :) However, the logic is inherently tricky and error-prone (now and in the future).

An option to make testing easier would be to use the Timer class, which is already tested (at least partially). Let me know what you think.

src/platform/platform_unix.cpp Outdated
});

// wait on signals and condition
int ret = SIGUSR2;
Collaborator

Can this name be improved a little, please? This is not returned anywhere. I know it is used for the output of sigwait, but I would expect ret to be reserved for something we return, not the return of a function we call. Perhaps latest_sig or something of the sort. Feel free to come up with something better.
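As a rough picture of the wait loop under discussion, the sketch below blocks a set of signals, has a helper thread send a periodic SIGUSR2 wake-up, and uses `sigwait` to pick up either a quit signal or a wake-up, re-checking a condition after each wake-up. It is a simplified sketch, not the PR's `make_quit_watchdog`: `wait_for_quit`, `keep_running`, `period`, and `latest_signal` are hypothetical names, and the real code uses a condition variable and a mutex that are omitted here. The POSIX calls used (`pthread_sigmask`, `pthread_kill`, `sigwait`, `sigpending`) are standard.

```cpp
#include <pthread.h>
#include <signal.h>

#include <atomic>
#include <chrono>
#include <initializer_list>
#include <thread>

// Hypothetical helper. Blocks the calling thread until a quit signal arrives
// or `keep_running` becomes false. Returns the quit signal received, or 0 if
// the condition ended the wait. The signals stay blocked in the calling
// thread afterwards.
int wait_for_quit(const std::atomic<bool>& keep_running, std::chrono::milliseconds period)
{
    sigset_t sigset;
    sigemptyset(&sigset);
    for (int sig : {SIGQUIT, SIGTERM, SIGHUP, SIGUSR2})
        sigaddset(&sigset, sig);

    // Signals must be blocked so that sigwait can receive them synchronously.
    pthread_sigmask(SIG_BLOCK, &sigset, nullptr);

    // A periodic SIGUSR2 wakes the waiter so it can re-check the condition.
    std::atomic<bool> done{false};
    std::thread ticker([&done, period, signalee = pthread_self()] {
        while (!done)
        {
            std::this_thread::sleep_for(period);
            pthread_kill(signalee, SIGUSR2);
        }
    });

    // Named for what it holds (the last signal seen), not for a return value.
    int latest_signal = SIGUSR2;
    while (latest_signal == SIGUSR2 && keep_running)
        sigwait(&sigset, &latest_signal);

    done = true;
    ticker.join();

    // Drain a wake-up that may have been sent after the loop ended.
    sigset_t pending, wakeup;
    sigemptyset(&wakeup);
    sigaddset(&wakeup, SIGUSR2);
    int ignored = 0;
    if (sigpending(&pending) == 0 && sigismember(&pending, SIGUSR2) == 1)
        sigwait(&wakeup, &ignored);

    return latest_signal == SIGUSR2 ? 0 : latest_signal;
}
```

Polling with a fixed period is a simplification chosen to keep the sketch short; it trades a little latency for not needing the extra synchronization the PR's version has.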

src/platform/platform_unix.cpp Outdated
src/platform/platform_unix.cpp Outdated
src/platform/platform_unix.cpp Outdated
@ricab ricab mentioned this pull request Dec 17, 2024
@Sploder12
Contributor Author

@ricab I can try to write unit tests. Timer won't help much, since the main issue is the signals. Integration tests would also be needed to test this properly; sending SIGTERM is a bit too dangerous for a unit test. Where and how would I write an integration test?

@Sploder12 Sploder12 marked this pull request as draft December 17, 2024 15:28
@Sploder12 Sploder12 marked this pull request as ready for review December 17, 2024 20:23
georgeliao previously approved these changes Jan 13, 2025
Collaborator

@ricab ricab left a comment

Good stuff Trevor, thanks! This is pretty much there, I have only a couple last renaming requests. They're not a big deal... I am hoping the IDE can rename everything for you, but let me know if this is troublesome for any reason.

include/multipass/platform_unix.h Outdated
include/multipass/platform_unix.h Outdated
include/multipass/platform_unix.h Outdated
Comment on lines 165 to 178
int mp::platform::SignalWrapper::mask_signals(int how, const sigset_t* sigset, sigset_t* old_set) const
{
return pthread_sigmask(how, sigset, old_set);
}

int mp::platform::SignalWrapper::send(pthread_t target, int signal) const
{
return pthread_kill(target, signal);
}

int mp::platform::SignalWrapper::wait(const sigset_t& sigset, int& got) const
{
return sigwait(std::addressof(sigset), std::addressof(got));
}
Collaborator

For the record, it is fine for these to go untested by unit tests, since they are themselves mere wrappers to enable testing (mocking) elsewhere.
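To illustrate why thin wrappers like these enable mocking, here is a minimal sketch of the pattern. It assumes a virtual interface and a hand-written fake; `SignalSender`, `RecordingSender`, and `notify_quit` are hypothetical names, and Multipass's actual `SignalWrapper` and its mock may be structured differently.

```cpp
#include <pthread.h>
#include <signal.h>

#include <vector>

// Hypothetical interface: production code forwards to the POSIX call.
struct SignalSender
{
    virtual ~SignalSender() = default;

    virtual int send(pthread_t target, int signal) const
    {
        return pthread_kill(target, signal);
    }
};

// Test double: records what would have been sent instead of signalling.
struct RecordingSender : SignalSender
{
    mutable std::vector<int> sent;

    int send(pthread_t, int signal) const override
    {
        sent.push_back(signal);
        return 0;
    }
};

// Code under test depends only on the interface, so a unit test can check
// its behavior without delivering a real SIGTERM to anything.
int notify_quit(const SignalSender& sender, pthread_t target)
{
    return sender.send(target, SIGTERM);
}
```

The forwarding member in `SignalSender` is the part that stays untested, which matches the comment above: it contains no logic of its own.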

@Sploder12 Sploder12 dismissed stale reviews from ricab and georgeliao via f7d3205 January 16, 2025 20:26
Collaborator

@ricab ricab left a comment

LGTM!

@ricab ricab added this pull request to the merge queue Jan 17, 2025
@ricab
Collaborator

ricab commented Jan 17, 2025

This is going to fail because it needs the other side. After we confirm that is the only issue, we can merge manually.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 17, 2025
@ricab ricab merged commit 5ee1dab into main Jan 20, 2025
13 of 14 checks passed
@ricab ricab deleted the update-libssh branch January 20, 2025 11:01
Development

Successfully merging this pull request may close these issues.

SSHFS watchdog doesn't notice thread exit
4 participants