Define custom `IoUringLocal::get*` methods which use io_uring under the hood. #30
Comments
…git add the appropriate files earlier, sorry! #30
…ration to any backend. About to try to resolve the problem of getting the output back! #30
…g good. Next step is to actually implement the logic for getting a file in prep_operation_for_io_uring.rs. #30
Maybe the "IO backend" should always accept a channel of … Then we can easily use that backend for … The … Although we need to think about what happens if we optimise the reads. But maybe I'll ignore that for now!
…asks in flight and no messages in the channel. #30
Getting there! Some next steps: … (UPDATE: Moved the next steps into the comment at the top of this issue.)
This is really interesting! I've been looking at the other comments you made related to this, as well as the implementation (which is still a work in progress, of course), and I had a couple of high-level questions, just to make sure I follow (again, at a high level; the finer details are quite tricky, I'm sure). First, it sounds like, if this is the solution you go with, …

My other question is also somewhat general, but motivated by a specific use case. When submitting a lot of IO operations in a batch, and assuming a lot (64, 128, something like that) of operations are pushed to the SQ at once, can there ever be a way to preserve some sort of order in which operations will complete? The use case I have in mind is consuming data from a Zarr store with more than one array. Say I have a store with N "variables" (each stored in its own 2D array, with all N arrays being exactly the same size, since they represent different variables on the same spatial grid), and let's say I have a use case where I can't do anything with "data chunk x.y" until I have the data for chunk files x.y for all N variables (there are N such files). Could the approach described here be problematic? What I mean is, if I use …
Hi @maximedion2! Thanks so much for your interest in this project! You're the first person (outside of my colleagues in Open Climate Fix) to comment on this repo 🙂

The first thing to mention is that the API is still very much in flux. I'm working on this code quite actively at the moment, and almost every day of work on this project brings new insights into how this could work.

A quick bit of news: just today I benchmarked this (very early) code. As far as I can tell, LSIO's …

OK, to answer your questions:
It's enforced within LSIO, in this (messy) code.
Yes, exactly, the … You can also see how this works, from the user's perspective, in this simple benchmark code. We first get a vector of 1,000 filenames. Then we fill a second vector with the 1,000 …
Great question! The short answer is that, if I extend the …
Of course! This is a pretty interesting project, I'm trying to learn as much as I can from it and to see if anything here can apply to this (slightly pie-in-the-sky) project I'm working on (not nearly as much as I'd like)! Good to know early benchmarks are promising!
Hmm, yeah, that makes sense. Can I ask, though: under the hood, would groups be handled by only submitting one group of operations at a time to the SQ (assuming the group fits in the 32 operations; if not, it could be split)? If someone calls this method for grouped operations with, say, 100 groups with only 2 operations per group, would that basically kill the potential performance gains from using io_uring?

BTW, sorry, I have a quick, unrelated question, since I think you and your team use Zarr a lot. What Python package do you currently use for it? I'm looking into version 3, and I'm trying to produce some sample data to help with testing my project, but …
Great question! The short answer is: I don't yet know the performance implications of submitting many groups, where each group has a small number of operations. But I'm very keen to make it work well 🙂. I know some people need every file to be delivered to their application in strict order, and so some people will have 1 operation per group! I haven't yet figured out exactly how I'll implement this. But I've started a new issue (#68) to start fleshing out some ideas about implementing groups.
In Open Climate Fix, we mostly use Zarr-Python v2, and we use Zarr arrays which conform to v2 of the Zarr file format specification. It is a bit confusing, but it's important to distinguish between the version of the Zarr-Python package and the version of the Zarr file format spec. My understanding is that Zarr-Python v2 doesn't fully conform to v3 of the Zarr file format spec. But folks are working hard on version 3.0 of the Zarr-Python package, and Zarr-Python v3.0 should fully support v3 of the Zarr file format spec! If you want a Python package that outputs data that conforms to v3 of the Zarr file format spec, then try Zarrita.
Ah, I see, so there's an … Oh, and thanks for the info on Zarr-Python. I briefly checked, and Zarrita does indeed conform to the spec; that's good to know.
Methods to implement:

- `get`
- `get_range`
- `get_ranges`
Maybe it'd work something like this:

- `IoUringLocal::new` would start a new operating system thread that owns an `io_uring` instance. One option with `io_uring` would be for the kernel to have a thread which checks the submission queue, so we don't have to do a system call for every file.
- `io_uring_local.get()` (on the main thread) would: create a `Future` which defaults to return `Pending` when polled, and which contains the details of the IO operation; wrap the `Future` in a `Mutex` inside an `Arc`; send the `Arc<Mutex<Future>>` to the worker thread via a channel; and return from `get()`. (See the custom `Future` in the Rust paperback book. I think we need to wrap the `Future` in an `Arc` because my code and the executor will both try to read the `Future`.)
- `user_data`
would be a raw pointer to the `Arc`. (I'm not sure if we should use `Arc::into_raw` and `Arc::from_raw`, or `Arc::as_ptr`, which gets a raw pointer to the data.)
- Keep the in-flight futures in a `Vec<Option<&Future>>`. That `Vec` will be the length of the SQ. Reuse entries in the `Vec` when CQEs arrive.
- How do we know which `Vec` entries can be re-used (because CQEs might arrive in any order)? Have a simple ring buffer of the `Vec` indices that are available. I think we can use `VecDeque` as a ring buffer. (If not, we could just have a fixed-length `Vec` with `front: usize` and `back: usize` variables which index into the `Vec`. Or use `ringbuf::LocalRb`.)
- How the `io_uring` code works right now: for a `read`, we'd stat the file, create a buffer, then submit a chain of open-read-close ops to io_uring (see Chain `<open><read><close>` #1). (So I guess we'd need to wait for no more than 61 SQEs to be in flight, so we can submit 3 SQEs?)
- Implement the `ObjectStore` operations, perhaps re-using `object_store` structs like `GetResult` and io_uring structs like `Entry`.
- For now, `stat` using the "normal" blocking method. But, in the future, it'd be nice to use io_uring to `stat`, too. Not sure if we should use one `io_uring` for `stat`ing and another for `read`ing, or just use a single io_uring?
- When a CQE arrives: get the `Future` (or find it in the map or `Vec`); modify the `Future`; set the `Future` to return `Ready` when next polled; call `Future.wake()`; and make sure the `Arc` reference count is correct.

TODO:
- `ObjectStore::get*` return types
- `Pin` (in the Rust paperback book): do we need to `Pin` the `Future`?
- Build the `squeue::Entry` in `prep_operation_for_io_uring`.
- Track the number of `OperationWithCallbacks` in flight in `thread.rs` (#42)
- Call `execute_callback` when the data arrives from io_uring
- Chain `<open><read><close>` (#1)