Batch-read that can take ownership of consumed items #56
Comments
I guess this should be possible to implement. The question is: what should happen to un-consumed items? In the current API this decision is explicit, making it implicit may cause confusion. What about batch-writing? I guess this should also allow moves for symmetry. Did you happen to have a look at #52? And just out of curiosity: how large are your batches?
|
PS: I'm just starting to realize that the work-around will not work because |
My batches are not that big (generally a few dozen at most), but I am popping very frequently (this is a task queue for an async task executor, so the work items are extremely small and there are many thousands every second under load). |
What about a closure-based API? Something like this:
pub fn consume_chunk_with<F>(&mut self, n: usize, f: F) -> Result<(), ChunkError>
where
    F: Fn(T),
{
    todo!()
}
The closure would be called exactly @nullchinchilla Would this work in your case? Alternatively, the closure could return something to indicate an early return. I think something like this could also work on the producer side. |
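To make the closure-based idea concrete, here is a minimal sketch using only the standard library. `ToyConsumer`, `TooFewItems` and `closure_demo` are hypothetical names invented for this illustration, and a `VecDeque` stands in for the real ring buffer storage. The sketch also assumes an `FnMut(T)` bound instead of the `Fn(T)` shown above, so that the closure can collect the items it receives.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the consumer half of a ring buffer.
/// A `VecDeque` replaces the real lock-free storage.
pub struct ToyConsumer<T> {
    queue: VecDeque<T>,
}

/// Hypothetical error: fewer items than requested are available.
/// The payload is the number of items that are currently available.
#[derive(Debug, PartialEq)]
pub struct TooFewItems(pub usize);

impl<T> ToyConsumer<T> {
    pub fn new(items: impl IntoIterator<Item = T>) -> Self {
        Self { queue: items.into_iter().collect() }
    }

    /// Moves exactly `n` items out of the queue, passing each one to `f` by value.
    /// If fewer than `n` items are available, nothing is consumed.
    pub fn consume_chunk_with<F>(&mut self, n: usize, mut f: F) -> Result<(), TooFewItems>
    where
        F: FnMut(T), // Assumption: `FnMut` rather than `Fn`, so `f` may mutate captured state.
    {
        if self.queue.len() < n {
            return Err(TooFewItems(self.queue.len()));
        }
        for item in self.queue.drain(..n) {
            f(item);
        }
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.queue.len()
    }
}

/// Takes two non-`Copy` items out of a queue of three.
pub fn closure_demo() -> (Vec<String>, usize) {
    let mut consumer = ToyConsumer::new(["a", "b", "c"].map(String::from));
    let mut taken = Vec::new();
    consumer
        .consume_chunk_with(2, |job| taken.push(job))
        .unwrap();
    (taken, consumer.len())
}
```

With this shape, the question of what happens to un-consumed items does not arise, because the closure is always handed every item of the chunk.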
Regarding my questions above (#56 (comment)) But still, I don't know how this should look on the producer side, since I don't think we can use an |
How about the following (pseudo-Rust)? The crux, I guess, is in the signature: the iterator is not moved, only borrowed mutably, but calling the iterator moves an item "out of the iterator".
impl<T> Producer<T> {
    pub fn extend<I>(&mut self, iterator: &mut I) // Bikeshedding the name for now.
    where
        I: Iterator<Item = T>,
    {
        // Doesn't really exist, but suppose this gives a `&mut [MaybeUninit<T>]`
        // to the potentially uninitialized data in the ring buffer.
        let writing_slice = self.get_writing_slice();
        let mut number_of_items_written = 0;
        // The slots come first in the zip, so no item is pulled from the iterator once the slice is full.
        for (output, input) in writing_slice.iter_mut().zip(iterator) {
            // I have no idea if this is the correct way to work with `MaybeUninit`, but you probably know.
            // `MaybeUninit::write` moves `input` into the slot, so no `mem::forget` is needed.
            output.write(input);
            number_of_items_written += 1;
        }
        self.advance(number_of_items_written); // Or something like that.
    }
} |
With "I don't think we can use an Iterator here" I meant "we can't provide an iterator (for moving items into it)", but of course we could take an iterator (for moving items out of it). The main question here is: should we take the I don't really have a use case for this, so I don't know which API would make most sense. What do you think @nullchinchilla? |
Ah. I see. Just my two cents here: I would take the size_hint() from the iterator since the caller can usually limit the number of items that the iterator gives, so there's no need to specify it twice. E.g. in the case of an iterator that drains a
The latter only makes sense when the type does not implement Just my two cents. I'm also curious to hear @nullchinchilla 's view. |
For my particular use case I actually would prefer a |
The problem with this is that the user has to initially provide a slice filled with valid I think it would be better to provide an API that truly allows moving items without having to deal with any dummy items. I think slices are not the right tool for this job. I agree that an additional slice-copying API would be useful (see #57 for discussion), but only for
I'm not sure if I like this. On the one hand, I like to be explicit about the chunk size. I've tried to implement an experimental iterator API in the branch https://github.com/mgeier-forks/rtrb/tree/iterators. It provides a
pub fn populate<I>(self, iter: &mut I) -> usize
where
    I: Iterator<Item = T>;
It can be used like this:
let mut it = /* some iterator */;
if let Ok(mut chunk) = producer.write_chunk_uninit(25) {
    chunk.populate(&mut it);
}
I'm not sure if it would be better to take ownership of the iterator, but I think this way it is more obvious that the iterator might not be completely drained. The iterator might be re-used in a future call. On the reading side (which was actually requested in this issue), I've implemented
if let Ok(chunk) = consumer.read_chunk(25) {
    let v: Vec<_> = chunk.into_iter().collect();
}
... or like this:
if let Ok(chunk) = consumer.read_chunk(25) {
    for item in chunk {
        // do something with item
    }
}
Only the actually iterated items are moved out of the ring buffer. If fewer than chunk size items are iterated, the remaining ones stay in the queue (i.e. they are not dropped). I've also made an alternative experimental closure-based API in the branch https://github.com/mgeier-forks/rtrb/tree/consume-chunk-with. |
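The reading-side behaviour described above (only iterated items leave the queue) can be modelled with the standard library alone. This is a rough sketch and not the code from the branch: `ToyReadChunk` and `partial_iteration_demo` are hypothetical names, and a `VecDeque` again stands in for the ring buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a read chunk: it borrows the queue and
/// yields at most `remaining` items, removing each one only when asked for it.
pub struct ToyReadChunk<'a, T> {
    queue: &'a mut VecDeque<T>,
    remaining: usize,
}

impl<'a, T> ToyReadChunk<'a, T> {
    /// Returns `None` if fewer than `n` items are available.
    pub fn new(queue: &'a mut VecDeque<T>, n: usize) -> Option<Self> {
        if queue.len() < n {
            None
        } else {
            Some(Self { queue, remaining: n })
        }
    }
}

impl<T> Iterator for ToyReadChunk<'_, T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        // The item is moved out of the queue only at this point.
        self.queue.pop_front()
    }
}

/// Requests a chunk of four items but iterates over only two of them.
pub fn partial_iteration_demo() -> (Vec<u32>, Vec<u32>) {
    let mut queue: VecDeque<u32> = (1..=5).collect();
    let mut moved_out = Vec::new();
    if let Some(chunk) = ToyReadChunk::new(&mut queue, 4) {
        // `take(2)` stops after two calls to `next`, so items 3 and 4 are never touched.
        for item in chunk.take(2) {
            moved_out.push(item);
        }
    }
    (moved_out, queue.into_iter().collect())
}
```

In this model, dropping the chunk early simply leaves the untouched items in the queue, which matches the "remaining ones stay in the queue" rule stated in the comment.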
I very recently (in the last few days) learned that it is not necessary to require Therefore, functions can in general take ownership of iterator parameters without worrying about that. This makes the call sites simpler in the cases where the iterator is not needed afterwards (and doesn't require all iterators to be But even better, knowing this we can also use For now, I'm using this signature in #60:
impl<T> WriteChunkUninit<'_, T> {
    pub fn populate<I>(self, iter: I) -> usize
    where
        I: IntoIterator<Item = T>,
    {
        let mut iter = iter.into_iter();
        ...
    }
}
This allows passing arrays with What do you think about this API? Any alternative names for |
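As an illustration of why an `IntoIterator` parameter taken by value is still flexible: a mutable reference to an iterator is itself an iterator, so callers can pass `&mut it` and keep using `it` afterwards. The sketch below uses only the standard library; `populate_slots` and `populate_demo` are hypothetical helpers written for this example and are not the `populate` method from #60.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: fills `slots` from `iter` and returns
/// how many slots were initialized.
pub fn populate_slots<T, I>(slots: &mut [MaybeUninit<T>], iter: I) -> usize
where
    I: IntoIterator<Item = T>,
{
    let mut written = 0;
    // The slots come first in the zip, so no item is pulled from `iter`
    // once all slots are full.
    for (slot, item) in slots.iter_mut().zip(iter) {
        // Moves `item` into the slot without reading the old contents.
        slot.write(item);
        written += 1;
    }
    written
}

/// Calls the helper once with an array and once with a borrowed iterator.
pub fn populate_demo() -> (usize, usize, Vec<u32>) {
    let mut slots: [MaybeUninit<u32>; 3] = [MaybeUninit::uninit(); 3];

    // An array can be passed by value.
    let from_array = populate_slots(&mut slots[..2], [10, 20]);

    // `&mut it` is also an iterator, so `it` stays usable afterwards.
    let mut it = 1..=5;
    let from_iter = populate_slots(&mut slots, &mut it);
    let leftover: Vec<u32> = it.collect();

    (from_array, from_iter, leftover)
}
```

Here the second call fills three slots and leaves the last two values of the range in `it`, so the iterator can be reused for a later chunk.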
Working with I don't have the energy right now to do a proper review of #60, but I trust you it has been thought-out quite well. |
Thanks @PieterPenninckx! I'm not in a hurry and I think I will wait some more time before making the next release (which will be a breaking change). @nullchinchilla, do you have any further suggestions? |
FYI, I've changed the name I will wait for a few more days, but if there are no further comments, I'll merge that. |
Can a method to take ownership of the items in a ReadChunk be added? I'm not sure what the best API would be (perhaps a consume_into_iter()?) since taking ownership means commit must be called, but such a method would be crucial for efficiently batch-reading non-Copy types.
(Context: using rtrb as a primitive in a scheduler, so it contains non-Clone non-Copy jobs rather than u8s or anything of that sort)
as a primitive in a scheduler, so it contains non-Clone non-Copy jobs rather than u8s or anything of that sort)The text was updated successfully, but these errors were encountered: