
Define a new BatchObjectStore trait. Start with just a single method: get_batch(Vec<operations>) -> Stream. #32

Closed
Tracked by #28
JackKelly opened this issue Jan 29, 2024 · 1 comment
Labels: enhancement (New feature or request), performance (Improvements to runtime performance)

JackKelly (Owner) commented Jan 29, 2024

Why define a new BatchObjectStore trait?

  • To benchmark IoUringLocal::get_batch (#34) against other implementations. I have a hunch that, when we're dealing with millions of operations, creating one Future per operation, and waking all those Futures, may add significant overhead.
  • The existing ObjectStore API lacks some functionality that we need:
    • Reading into an existing buffer. (UPDATE: I don't think this is viable. O_DIRECT requires the start and end of the buffer to be aligned to 512-byte boundaries.)
    • When reading multiple byte_ranges of a file, we want each byte_range to be returned to the user as soon as it's ready. ObjectStore::get_ranges only returns once all the byte_ranges have been read.
      • Yes, we could call ObjectStore::get_range multiple times. But that limits LSIO's ability to optimise the reads.
      • Unless we add an IoUringLocal::submit function, such that no operations are submitted to io_uring until submit is called? When submit is called, LSIO would first optimise all the operations submitted so far.
    • But perhaps the solution here isn't a new BatchObjectStore trait, but instead is (both sketched after this list):
      • a new ObjectStoreWithBuffer trait, which defines a bunch of get_with_buffer methods;
      • a new WaitForSubmit trait, with a submit method?
  • We'd like to return a Stream (aka AsyncIterator) of buffers. Then we can have a separate crate which applies an arbitrary processing function (such as decompression) to a Stream of buffers, in parallel across CPU cores (New crate: apply a user-supplied function to a Stream of buffers #26).
    • This should make life easier for users, compared with the slightly fiddly code needed to interleave compute with IO when each IO operation is represented by its own Future (see code in Try interleaving compute with IO #37).
    • But maybe we don't need a BatchObjectStore trait; maybe we can just write a utility function which takes a Vec<Future> and returns a Stream? (Also sketched after this list.)
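
One possible shape for those two alternative traits. This is only a sketch: the method signatures, and the use of the object_store crate's Path and Result types, are my assumptions; only the trait and method names come from the bullets above.

```rust
use std::ops::Range;

use object_store::{path::Path, Result};

trait ObjectStoreWithBuffer {
    /// Read `range` of the object at `path` into a caller-supplied buffer.
    /// (Per the UPDATE above, O_DIRECT would require `buffer` to be
    /// 512-byte aligned, which may make this unviable.)
    async fn get_with_buffer(
        &self,
        path: &Path,
        range: Range<usize>,
        buffer: &mut [u8],
    ) -> Result<()>;
}

trait WaitForSubmit {
    /// Operations queued so far are not passed to the backend (e.g.
    /// io_uring) until `submit` is called, giving LSIO the chance to
    /// optimise the whole batch first.
    fn submit(&self) -> Result<()>;
}
```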
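And on the utility-function idea: the futures crate already gets most of the way there, because FuturesUnordered collects futures and implements Stream, yielding each output as soon as it completes. A minimal sketch (the function name futures_to_stream is mine):

```rust
use futures::stream::{FuturesUnordered, Stream};
use std::future::Future;

/// Turn a Vec of futures into a Stream that yields each output as soon
/// as its future completes, regardless of submission order.
fn futures_to_stream<F: Future>(futures: Vec<F>) -> impl Stream<Item = F::Output> {
    futures.into_iter().collect::<FuturesUnordered<F>>()
}
```

Note this doesn't remove the one-Future-per-operation overhead; it only changes how completions are consumed.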

So, in conclusion, I think the main reason for wanting a new BatchObjectStore trait is that it might perform better. All the other reasons for wanting a BatchObjectStore trait can be achieved in a less intrusive fashion.

So, I should implement an MVP BatchObjectStore, just to benchmark it.
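
To make the MVP concrete, here's a sketch of what the trait might look like. Only get_batch(Vec<operations>) -> Stream comes from the issue title; the GetOperation enum, and the use of Bytes, BoxStream, and the object_store crate's Path and Result, are assumptions for illustration:

```rust
use std::ops::Range;

use bytes::Bytes;
use futures::stream::BoxStream;
use object_store::{path::Path, Result};

/// Hypothetical operation type; the issue title only specifies
/// `get_batch(Vec<operations>) -> Stream`, so this enum is a guess.
pub enum GetOperation {
    /// Read the whole object at `path`.
    Get { path: Path },
    /// Read one byte range of the object at `path`.
    GetRange { path: Path, range: Range<usize> },
}

pub trait BatchObjectStore {
    /// Submit a whole batch of operations in one call, so the
    /// implementation (e.g. io_uring) can optimise across the batch
    /// instead of creating one Future per operation. The returned
    /// Stream yields each buffer as soon as it is ready.
    fn get_batch(&self, operations: Vec<GetOperation>) -> BoxStream<'_, Result<Bytes>>;
}
```

Returning a single Stream of buffers would also line up with #26 (applying a user-supplied function to a Stream of buffers).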

JackKelly (Owner) commented:

Planning to do #93 instead of defining a BatchObjectStore.

github-project-automation bot moved this from Todo to Done in light-speed-io on Mar 12, 2024
JackKelly closed this as not planned on Mar 13, 2024