Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some backend (i.e. opendal:*, rclone) implementations are incompatible with async features (i.e. Webdav) which result in runtime crashes #1181

Open
lostb1t opened this issue Jul 20, 2024 · 7 comments
Assignees
Labels
A-backends Area: Related to backends in `rustic-core` A-core Area: Generally related to `rustic_core` A-filesystem Area: Related to the Filesystem in one way or the other C-bug Category: Something isn't working as expected I-panic Issue: `rustic` panicked

Comments

@lostb1t
Copy link

lostb1t commented Jul 20, 2024

When browsing to a specific snapshot the application crashes (using opendal S3)

The application panicked (crashed).
Message:  Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive asynchronous tasks.
Location: /cargo/registry/src/index.crates.io-6f17d22bba15001f/opendal-0.44.2/src/layers/blocking.rs:264
@github-actions github-actions bot added the S-triage Status: Waiting for a maintainer to triage this issue/PR label Jul 20, 2024
@lostb1t lostb1t changed the title Webdav: requesting snapshot content results in crqsh Webdav: requesting snapshot content results in crash Jul 20, 2024
@nardoor nardoor added C-bug Category: Something isn't working as expected I-panic Issue: `rustic` panicked A-filesystem Area: Related to the Filesystem in one way or the other and removed S-triage Status: Waiting for a maintainer to triage this issue/PR labels Aug 20, 2024
@nardoor
Copy link
Contributor

nardoor commented Aug 22, 2024

To keep you updated, I could reproduce the issue using opendal:fs.
The fix is not trivial, I'll keep this issue updated when I reach it.

@enboig
Copy link

enboig commented Sep 30, 2024

Same problem using sftp

@nardoor nardoor changed the title Webdav: requesting snapshot content results in crash Some backend (i.e. opendal:*, sftb) implementations are incompatible with async features (i.e. Webdav) which result in runtime crashes Nov 17, 2024
@nardoor
Copy link
Contributor

nardoor commented Nov 17, 2024

I renamed the issue to be more generic.

Problem insights:

To give some insights the problem is the following:

  1. (rustic_core) All our backend traits (and implementations) in rustic-rs/rustic_core are sync
  2. (rustic_core) Some crates used to implement these backends are async but provide a sync compatibility using block_on. This compatibility impl is used to impl the backend traits.
  3. (rustic_rs) Webdav feature deploys an async server
  4. (rustic_rs) When using Webdav with backends that use block_on (see 2.) we do the following ~callstack:
- main
- tokio runtime for webdav
- async webdav server handle
- opendal backend read
- `block_on` blocking operator read
- runtime block on <crash>

Remediation, fixing:

I see 2 main steps to advance on that issue:

  1. (~ simple) avoid crash by disabling incompatible configuration:
  • list async feature incompatible backends and fail early when using webdav with them
  • make sure rustic_rs as no other async features such as webdav
  1. see with the team for the bigger fix:
  • refactor backend traits into async traits? this is huge breaking change and a lot of work.. (in a nutshell, all rustic_core and rustic_backend would need a ~rewrite.
    howevre to me it would make sense because the backend operations are async by nature (network, disk, ...)
  • some other way? ...

@nardoor
Copy link
Contributor

nardoor commented Nov 17, 2024

Using rclone as backend with webdav feature result in a similar (but different) crash:

The application panicked (crashed).
Message:  Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.
Location: /home/nardor/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/blocking/shutdown.rs:51

This happens because reqwest (used under the hood) creates a runtime and drops it for every request.

@nardoor nardoor changed the title Some backend (i.e. opendal:*, sftb) implementations are incompatible with async features (i.e. Webdav) which result in runtime crashes Some backend (i.e. opendal:*, rclone) implementations are incompatible with async features (i.e. Webdav) which result in runtime crashes Nov 17, 2024
@nardoor
Copy link
Contributor

nardoor commented Nov 17, 2024

See the above PR for the step 1 fix, once rustic_core is updated, I'll update rustic.

@simonsan simonsan added A-backends Area: Related to backends in `rustic-core` A-core Area: Generally related to `rustic_core` labels Nov 17, 2024
github-merge-queue bot pushed a commit to rustic-rs/rustic_core that referenced this issue Nov 18, 2024
…tibility (#355)

Add `getters` to warn about `async_incompatible` backends.

Using the type system or fixing all async compatibility issue is big and
structural work.
To avoid runtime crash for our users I suggest using this getter as a
temporary fix.

tracking issue: rustic-rs/rustic#1181 (see for more details)

---------

Co-authored-by: simonsan <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Nov 21, 2024
…tibility and error out (#1355)

Use `getters` to early exit webdav when using `async_incompatible`
backends.

Using the type system or fixing all async compatibility issue is big and
structural work.
To avoid runtime crash for our users I suggest using this getter as a
temporary fix.

tracking issue: #1181 (see for
more details)

> As far as I could tell, `webdav` feature was the only one spawning a
`runtime` in `rustic` and using it.

### Before merging this one:
- [x] rustic_core (core and backend) must be updated to a release that
includes rustic-rs/rustic_core#355.

---------

Co-authored-by: simonsan <[email protected]>
@aawsome
Copy link
Member

aawsome commented Nov 22, 2024

Actually the current webdav problem can be solved by just changing the implementation of WebDavFs - by using a spawned thread (std::thread::spawn) which calls all rustic_core functionality sync and then implement the async trait method by communicating with this thread using channels.

I made a POC implementation which needs quite some polish but already works.

@nardoor
Copy link
Contributor

nardoor commented Nov 22, 2024

That's a good workaround! @aawsome

github-merge-queue bot pushed a commit that referenced this issue Nov 23, 2024
…#1361)

Solves the webdav problem reported in #1181

---------

Signed-off-by: simonsan <[email protected]>
Co-authored-by: simonsan <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Nov 24, 2024
…1365)

(after reverting #1361)

solves the webdav problem reported in #1181

Co-authored-by: simonsan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backends Area: Related to backends in `rustic-core` A-core Area: Generally related to `rustic_core` A-filesystem Area: Related to the Filesystem in one way or the other C-bug Category: Something isn't working as expected I-panic Issue: `rustic` panicked
Projects
None yet
Development

No branches or pull requests

5 participants