
Define a new BatchObjectStore trait. Start with just a single method: get_batch(Vec<operations>) -> Stream. #32

Closed
Tracked by #28
JackKelly opened this issue Jan 29, 2024 · 1 comment
Labels: enhancement (New feature or request), performance (Improvements to runtime performance)

JackKelly (Owner) commented Jan 29, 2024

Why define a new BatchObjectStore trait?

  • To benchmark IoUringLocal::get_batch (#34) against other implementations. I have a hunch that, when we're dealing with millions of operations, creating one Future per operation, and waking all those Futures, may add significant overhead.
  • The existing ObjectStore API lacks some functionality that we need:
    • Reading into an existing buffer. (UPDATE: I don't think this is viable. O_DIRECT requires the start and end of the buffer to be aligned to 512-byte boundaries.)
    • When reading multiple byte_ranges of a file, we want each byte_range to be returned to the user as soon as it's ready. ObjectStore::get_ranges only returns once all the byte_ranges have been read.
      • Yes, we could call ObjectStore::get_range multiple times. But that limits LSIO's ability to optimise the reads.
      • Unless we add an IoUringLocal::submit function, such that no operations are submitted to io_uring until submit is called? When submit is called, LSIO would first optimise all the operations submitted so far.
    • But perhaps the solution here isn't a new BatchObjectStore trait, but instead is (both sketched after this list):
      • a new ObjectStoreWithBuffer trait, which defines a bunch of get_with_buffer methods;
      • a new WaitForSubmit trait, with a submit method?
  • We'd like to return a Stream (aka AsyncIterator) of buffers. Then we can have a separate crate which applies an arbitrary processing function (such as decompression) to a Stream of buffers, in parallel across CPU cores (New crate: apply a user-supplied function to a Stream of buffers #26).
    • This should make life easier for users, compared with the slightly fiddly code needed to interleave compute with IO when each IO operation is represented by its own Future (see code in Try interleaving compute with IO #37).
    • But maybe we don't need a BatchObjectStore trait; maybe we can just write a utility function which takes a Vec<Future> and returns a Stream? (Also sketched after this list.)
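
One possible shape for those two alternative traits. This is only a sketch: the method signatures, and the use of the object_store crate's Path and Result types, are my assumptions; only the trait and method names come from the bullets above.

```rust
use std::ops::Range;

use object_store::{path::Path, Result};

trait ObjectStoreWithBuffer {
    /// Read `range` of the object at `path` into a caller-supplied buffer.
    /// (Per the UPDATE above, O_DIRECT would require `buffer` to be
    /// 512-byte aligned, which may make this unviable.)
    async fn get_with_buffer(
        &self,
        path: &Path,
        range: Range<usize>,
        buffer: &mut [u8],
    ) -> Result<()>;
}

trait WaitForSubmit {
    /// Operations queued so far are not passed to the backend (e.g.
    /// io_uring) until `submit` is called, giving LSIO the chance to
    /// optimise the whole batch first.
    fn submit(&self) -> Result<()>;
}
```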
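And on the utility-function idea: the futures crate already gets most of the way there, because FuturesUnordered collects futures and implements Stream, yielding each output as soon as it completes. A minimal sketch (the function name futures_to_stream is mine):

```rust
use futures::stream::{FuturesUnordered, Stream};
use std::future::Future;

/// Turn a Vec of futures into a Stream that yields each output as soon
/// as its future completes, regardless of submission order.
fn futures_to_stream<F: Future>(futures: Vec<F>) -> impl Stream<Item = F::Output> {
    futures.into_iter().collect::<FuturesUnordered<F>>()
}
```

Note this doesn't remove the one-Future-per-operation overhead; it only changes how completions are consumed.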

So, in conclusion, I think the main reason for wanting a new BatchObjectStore trait is that it might perform better. All the other reasons for wanting a BatchObjectStore trait can be achieved in a less intrusive fashion.

So, I should implement an MVP BatchObjectStore, just to benchmark it.
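
To make the MVP concrete, here's a sketch of what the trait might look like. Only get_batch(Vec<operations>) -> Stream comes from the issue title; the GetOperation enum, and the use of Bytes, BoxStream, and the object_store crate's Path and Result, are assumptions for illustration:

```rust
use std::ops::Range;

use bytes::Bytes;
use futures::stream::BoxStream;
use object_store::{path::Path, Result};

/// Hypothetical operation type; the issue title only specifies
/// `get_batch(Vec<operations>) -> Stream`, so this enum is a guess.
pub enum GetOperation {
    /// Read the whole object at `path`.
    Get { path: Path },
    /// Read one byte range of the object at `path`.
    GetRange { path: Path, range: Range<usize> },
}

pub trait BatchObjectStore {
    /// Submit a whole batch of operations in one call, so the
    /// implementation (e.g. io_uring) can optimise across the batch
    /// instead of creating one Future per operation. The returned
    /// Stream yields each buffer as soon as it is ready.
    fn get_batch(&self, operations: Vec<GetOperation>) -> BoxStream<'_, Result<Bytes>>;
}
```

Returning a single Stream of buffers would also line up with #26 (applying a user-supplied function to a Stream of buffers).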

JackKelly (Owner) commented:

Planning to do #93 instead of defining a BatchObjectStore.

github-project-automation bot moved this from Todo to Done in light-speed-io on Mar 12, 2024
JackKelly closed this as not planned on Mar 13, 2024