Drop ObjectStore & async/await. Use Channels instead. Focus entirely (for now) on io_uring for local file storage. #93

Closed
3 of 7 tasks
JackKelly opened this issue Feb 28, 2024 · 1 comment
JackKelly commented Feb 28, 2024

Why drop async / await and drop ObjectStore?

  • LSIO's API is already diverging from ObjectStore. For example:
    • LSIO uses aligned buffers instead of Vec (Use O_DIRECT #51).
    • We want GetRanges to return each byte range as soon as that byte range is ready (instead of waiting until all byte ranges are ready). A problem with the simple "tokio + rayon" code in Try interleaving compute with IO #37 is that store.get_ranges(filename, ranges).await does not resolve until all ranges are available.
    • LSIO might want to re-use buffers (Re-use buffers #38).
  • Tokio is a relatively heavyweight dependency. Recent flamegraphs show tokio taking up a lot of the runtime.
  • object_store_adaptor.rs is itself quite heavyweight, and it introduces two different types of Operation.
  • In the future, I'd still love to extend LSIO to support Windows I/O Rings and macOS kqueue. But those platforms probably have similar abstractions to io_uring, and there's little point bogging the code down right now with unnecessary abstractions.
  • It'd be fairly easy to create a new adaptor for async.
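
To make the "aligned buffers instead of Vec" point concrete, here is a minimal sketch of an aligned, owned byte buffer built only on std::alloc. The name AlignedBuf and its methods are hypothetical and are not LSIO's actual types; the 512-byte alignment in the usage note is an assumption, since the alignment O_DIRECT requires depends on the device and filesystem.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;
use std::slice;

/// Hypothetical aligned buffer (illustrative only, not LSIO's real type).
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocates `len` zeroed bytes whose start address is a multiple of `align`.
    /// `align` must be a power of two and `len` must be non-zero.
    pub fn new(len: usize, align: usize) -> Self {
        assert!(len > 0, "zero-sized buffers are not supported in this sketch");
        let layout = Layout::from_size_align(len, align).expect("invalid size or alignment");
        // SAFETY: `layout` has a non-zero size, as checked above.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).expect("allocation failed");
        Self { ptr, layout }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialised (zeroed) bytes that we own.
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
    }
}

// SAFETY: the buffer uniquely owns its allocation, so it can move between threads.
unsafe impl Send for AlignedBuf {}
```

For example, AlignedBuf::new(4096, 512) gives a 4 KiB buffer starting on a 512-byte boundary. Because the type is Send, a buffer like this could in principle be handed back to the IO threads for re-use rather than being freed, which is the kind of thing Vec-based ObjectStore signatures make awkward.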

How?

Basically, what's already proposed in #61. LSIO would have a threadpool. The user would send operations to LSIO's threadpool via a channel.
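
A rough sketch of that shape, using only the standard library: the user sends operations down one channel, a small pool of worker threads picks them up, and each completed byte range comes back on a second channel as soon as it is ready. Everything named here (GetRange, Completion, spawn_pool) is hypothetical, and the blocking read_exact_at call is only a stand-in for submitting to io_uring; this is not the design from #61 itself.

```rust
use std::fs::File;
use std::io;
use std::ops::Range;
use std::os::unix::fs::FileExt;
use std::path::PathBuf;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical operation: read one byte range from one file.
pub struct GetRange {
    pub path: PathBuf,
    pub range: Range<u64>,
}

/// Hypothetical completion: the requested range plus its bytes (or an error).
pub struct Completion {
    pub range: Range<u64>,
    pub result: io::Result<Vec<u8>>,
}

/// Blocking positional read. ASSUMPTION: stands in for an io_uring submission.
fn read_range(op: &GetRange) -> io::Result<Vec<u8>> {
    let file = File::open(&op.path)?;
    let len = (op.range.end - op.range.start) as usize;
    let mut buf = vec![0u8; len];
    file.read_exact_at(&mut buf, op.range.start)?;
    Ok(buf)
}

/// Spawns `n_workers` threads and returns (operation sender, completion receiver).
pub fn spawn_pool(n_workers: usize) -> (Sender<GetRange>, Receiver<Completion>) {
    let (op_tx, op_rx) = channel::<GetRange>();
    let (done_tx, done_rx) = channel::<Completion>();
    // std's Receiver is single-consumer, so the workers share it behind a Mutex.
    let op_rx = Arc::new(Mutex::new(op_rx));
    for _ in 0..n_workers {
        let op_rx = Arc::clone(&op_rx);
        let done_tx = done_tx.clone();
        thread::spawn(move || loop {
            // The lock is held only while waiting for the next operation.
            let next = op_rx.lock().unwrap().recv();
            let op = match next {
                Ok(op) => op,
                Err(_) => break, // every operation sender has been dropped
            };
            let result = read_range(&op);
            // Send each range back as soon as it is ready.
            if done_tx.send(Completion { range: op.range, result }).is_err() {
                break; // the user dropped the completion receiver
            }
        });
    }
    (op_tx, done_rx)
}
```

The user sends GetRange values on the Sender, drops it when there is nothing more to request, and iterates over the Receiver to handle completions in whatever order they finish. No async runtime is involved on either side.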

Why completely drop async/await? Why not stay async and emit a Stream of completed operations?

Reasons for dropping async/await completely:

  • TL;DR: AFAICT, async is not necessary for this use-case, and async only adds computational overhead and code complexity 🙂
  • The io_uring loading code isn't async: it's a single thread right now, and it'll soon be a threadpool. The compute code won't be async (because we don't want to block an async thread). Yes, we could have a Stream connecting the IO and compute. But why add async glue between the two halves if it's not necessary?

Reasons for using Stream:

  • The StreamExt trait (e.g. from the futures crate) has some handy methods which, at first glance, might make it easy to compose steps of computation. But we can't run blocking computation in the async threadpool!
  • Makes it easier to interact with async code??
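
For comparison, a plain std channel already gives some of that composability without async: a Receiver can be turned into a blocking Iterator, so ordinary adaptors like filter and map can chain compute steps. The sketch below is only illustrative; Chunk, checksum, process and example are hypothetical names, and the checksum is a placeholder for real blocking compute such as decompression.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical stand-in for one completed read.
pub struct Chunk {
    pub id: usize,
    pub bytes: Vec<u8>,
}

/// Placeholder blocking compute step: sums the bytes of a chunk.
fn checksum(chunk: &Chunk) -> u64 {
    chunk.bytes.iter().map(|&b| b as u64).sum()
}

/// Consumes chunks as they arrive and composes steps with Iterator adaptors.
pub fn process(completions: Receiver<Chunk>) -> Vec<(usize, u64)> {
    completions
        .into_iter() // blocks for each chunk; ends once every sender is dropped
        .filter(|c| !c.bytes.is_empty())
        .map(|c| (c.id, checksum(&c)))
        .collect()
}

/// Drives `process` from a single producer thread that stands in for the IO side.
pub fn example() -> Vec<(usize, u64)> {
    let (tx, rx) = channel();
    let producer = thread::spawn(move || {
        for id in 0..3 {
            tx.send(Chunk { id, bytes: vec![id as u8; 4] }).unwrap();
        }
        // `tx` is dropped here, which ends the iterator in `process`.
    });
    let out = process(rx);
    producer.join().unwrap();
    out
}
```

This runs the compute steps on the calling thread, one chunk at a time. Spreading the compute across cores would need something extra on the consuming side (several consumer threads, or a data-parallelism library), which this sketch does not attempt.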

Use-cases to consider

UPDATE: Use-cases are moved to issue #104.

TODO

JackKelly self-assigned this Feb 28, 2024
JackKelly moved this to Todo in light-speed-io Feb 28, 2024
JackKelly changed the title from "Drop ObjectStore & async/await. Use Channels / async iterators instead." to "Drop ObjectStore & async/await. Use Channels / async iterators instead. Focus entirely (for now) on io_uring for local file storage." Feb 28, 2024
JackKelly changed the title from "Drop ObjectStore & async/await. Use Channels / async iterators instead. Focus entirely (for now) on io_uring for local file storage." to "Drop ObjectStore & async/await. Use Channels instead. Focus entirely (for now) on io_uring for local file storage." Feb 28, 2024
JackKelly moved this from Todo to In Progress in light-speed-io Mar 15, 2024
@JackKelly

This issue is superseded by #122

@github-project-automation github-project-automation bot moved this from In Progress to Done in light-speed-io Apr 3, 2024