
look into using Rayon for the IO thread pool. Can we associate an IoUring instance with each thread? Can we publish tasks to other threads? #103

Closed
Tracked by #93
JackKelly opened this issue Mar 13, 2024 · 3 comments
Assignees: JackKelly
Labels: performance (Improvements to runtime performance), question (Further information is requested)
Comments

JackKelly (Owner) commented Mar 13, 2024

I'll use this "issue" to document my research into using Rayon for the IO threadpool.

Why use Rayon's threadpool for the IO threadpool?

In #61, I'm talking myself into building my own work-stealing threadpool (using crossbeam::deque). But this hand-built threadpool is starting to sound a lot like Rayon's threadpool! So maybe I can just use Rayon's threadpool? (And hence save myself a bunch of work. And, more importantly, end up with code which is more robust & easier to maintain!)

Related:

JackKelly (Owner) commented Mar 13, 2024

Can we associate exactly one IoUring instance with each Rayon thread?

This StackOverflow answer suggests that, yes, we can "use the scoped ThreadLocal from the thread_local crate" to "ensure that you only use as many objects as there are threads".

ThreadLocal is declared as struct ThreadLocal<T: Send>, and IoUring implements Send, so that fits.
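A minimal sketch of that pattern (assuming the thread_local and io-uring crates; the urings name, the u64 op type and the queue depth of 64 are arbitrary choices of mine). Wrapping the ring in a RefCell gives us mutability through the shared reference that get_or hands back, and RefCell<IoUring> is still Send:

```rust
use std::cell::RefCell;

use io_uring::IoUring;
use rayon::prelude::*;
use thread_local::ThreadLocal;

fn main() {
    // One ring per Rayon worker thread, created lazily on first use.
    // `ThreadLocal<T>` requires `T: Send`; `RefCell<IoUring>` satisfies
    // that, and gives us mutability through the `&T` that `get_or` returns.
    let urings: ThreadLocal<RefCell<IoUring>> = ThreadLocal::new();

    (0..1024u64).into_par_iter().for_each(|_op| {
        let uring = urings.get_or(|| RefCell::new(IoUring::new(64).unwrap()));
        let mut uring = uring.borrow_mut();
        // ... push an SQE for this op (elided), then submit and reap:
        let _ = uring.submit();
        for _cqe in uring.completion() {}
    });
}
```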

Or, actually, maybe for_each_init is better. This almost works! Except I can't see how to drop in_tx2 (and if we can't drop in_tx2, then in_rx.into_iter() will never end!).
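For concreteness, here's roughly the shape I mean (channel and variable names invented; SQE construction elided). One caveat from Rayon's docs: the init closure may run more than once per thread, so unlike ThreadLocal it doesn't strictly guarantee one ring per thread:

```rust
use crossbeam_channel::unbounded;
use io_uring::IoUring;
use rayon::iter::{ParallelBridge, ParallelIterator};

fn main() {
    let (in_tx, in_rx) = unbounded::<u64>();
    for op in 0..1024u64 {
        in_tx.send(op).unwrap();
    }
    // The receiver's iterator only finishes once every sender is dropped.
    drop(in_tx);

    in_rx.into_iter().par_bridge().for_each_init(
        // NB: Rayon may call this `init` closure more than once per thread,
        // so it does not guarantee exactly one ring per thread.
        || IoUring::new(64).expect("failed to create ring"),
        |uring, _op| {
            // ... push an SQE for this op (elided), then submit and reap:
            let _ = uring.submit();
            for _cqe in uring.completion() {}
        },
    );
    // The snag: if this closure also captured a cloned sender (an `in_tx2`)
    // so that workers could re-submit follow-up ops, that clone could never
    // be dropped from inside the loop, so `into_iter()` would never end.
}
```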

JackKelly (Owner) commented:

OK, I've figured out how to spawn tasks into the current threadpool. Here's some code which I think provides the full framework for making this work.

Some things that would need to be added to that code:

  • We'd have a custom struct, probably exactly like the Worker struct we currently have.
  • Within our Rayon closure, we'd use thread_local_storage.get_or(|| Worker::new()) to create exactly one Worker per thread.
  • As in the current code, the closure we pass to pool.spawn(|| { in_rx.into_iter().par_bridge().for_each(|op| { ... }) }) would start by looping through that uring's completion queue. It would then submit any strictly local (non-stealable) ops to its uring. Then, if the uring has space, it would submit the new op, without blocking; if the uring doesn't have space, it would either add the op to the uring's strictly local queue or spawn it to the threadpool. Finally, it would check the completion queue again.
  • After the for_each statement, we'd pool.broadcast to all threads, telling each one to loop through its own completion queue. (A sketch of this whole framework follows this list.)
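Here's a rough sketch of how those pieces might fit together. The Worker fields, the u64 op type and the queue depth are placeholders, and real SQE construction is elided. It also deviates from the list above in one way: it uses pool.install rather than pool.spawn, so the thread-local state needn't be 'static and the final broadcast runs from outside the pool:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

use crossbeam_channel::Receiver;
use io_uring::IoUring;
use rayon::iter::{ParallelBridge, ParallelIterator};
use thread_local::ThreadLocal;

/// Hypothetical per-thread state, mirroring the existing `Worker` struct.
struct Worker {
    uring: IoUring,
    /// Strictly local (non-stealable) ops queued while the ring was full.
    local_queue: VecDeque<u64>,
}

impl Worker {
    fn new() -> Self {
        Self {
            uring: IoUring::new(64).expect("failed to create ring"),
            local_queue: VecDeque::new(),
        }
    }

    fn process(&mut self, op: u64) {
        // 1. Reap anything already waiting on the completion queue.
        for _cqe in self.uring.completion() {}
        // 2. If the submission queue is full, keep `op` on the strictly
        //    local queue; otherwise submit it (actual SQE construction,
        //    and draining `local_queue` first, are elided in this sketch).
        if self.uring.submission().is_full() {
            self.local_queue.push_back(op);
        } else {
            let _ = self.uring.submit();
        }
        // 3. Check the completion queue again before returning.
        for _cqe in self.uring.completion() {}
    }
}

fn run(pool: &rayon::ThreadPool, in_rx: Receiver<u64>) {
    let workers: ThreadLocal<RefCell<Worker>> = ThreadLocal::new();

    // Blocks until the channel is drained (i.e. all senders are dropped).
    pool.install(|| {
        in_rx.into_iter().par_bridge().for_each(|op| {
            // Exactly one `Worker` (and hence one ring) per pool thread.
            let worker = workers.get_or(|| RefCell::new(Worker::new()));
            worker.borrow_mut().process(op);
        });
    });

    // After the for_each: ask every pool thread to sweep its own CQ.
    pool.broadcast(|_| {
        if let Some(worker) = workers.get() {
            for _cqe in worker.borrow_mut().uring.completion() {}
        }
    });
}
```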

JackKelly (Owner) commented Mar 14, 2024

I was worrying about what would happen if we get to the end of the tasks in the queue but never hang up the sender: the last few CQEs would never be reaped.

I think the solution is for the user to always submit a vector of ops. Then, when we get to the end of the vec, we broadcast to all threads to finish draining their completion queues. This is the only time threads should wait on the CQ.
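A sketch of that design, under the same assumptions as before (names hypothetical; tracking a per-thread in_flight count is one way, not necessarily the way, to know when each ring is fully drained):

```rust
use std::cell::RefCell;

use io_uring::IoUring;
use rayon::prelude::*;
use thread_local::ThreadLocal;

/// Hypothetical per-thread state: the ring plus a count of ops submitted
/// but not yet reaped, so the final drain knows when it can stop waiting.
struct Worker {
    uring: IoUring,
    in_flight: usize,
}

/// The user hands over all ops up front, so there's no sender to hang up:
/// when the `par_iter` ends, we know no new ops will arrive.
fn submit_all(pool: &rayon::ThreadPool, ops: Vec<u64>) {
    let workers: ThreadLocal<RefCell<Worker>> = ThreadLocal::new();

    pool.install(|| {
        ops.into_par_iter().for_each(|_op| {
            let worker = workers.get_or(|| {
                RefCell::new(Worker {
                    uring: IoUring::new(64).expect("failed to create ring"),
                    in_flight: 0,
                })
            });
            let mut w = worker.borrow_mut();
            // ... push an SQE and bump `w.in_flight` (elided), then submit
            // and opportunistically reap -- but never block here:
            let _ = w.uring.submit();
            let reaped = w.uring.completion().count();
            w.in_flight -= reaped;
        });
    });

    // End of the vec: broadcast so each thread blocks until its own ring's
    // remaining completions arrive. The only place threads wait on the CQ.
    pool.broadcast(|_| {
        if let Some(worker) = workers.get() {
            let mut w = worker.borrow_mut();
            while w.in_flight > 0 {
                w.uring.submit_and_wait(1).expect("submit_and_wait failed");
                let reaped = w.uring.completion().count();
                w.in_flight -= reaped;
            }
        }
    });
}
```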
