overload make_device_uvector_async for bool type #14062
Signed-off-by: Suraj Aralihalli <[email protected]>
…verted to host_span implicitly Signed-off-by: Suraj Aralihalli <[email protected]>
I’d like to discuss design a bit here. I have three primary concerns:
- Currently we are working around the problems in the linked issue with `std::vector<bool>` support by using `thrust::host_vector<bool>`, which is not bit-packed. That seems to be a sufficient workaround for the use cases I would imagine for this utility (mostly in writing tests). If we know of specific cases where that workaround is insufficient, we should consider and document those more closely before implementing a fix. This is something we probably should discuss on the linked issue.
- This PR appears to have an overload that would fit any type with a `value_type` member. This is probably not constraining enough. If we must support `std::vector<bool>` types, we should explicitly overload for that type.
- Performance: Using rmm to set every element individually in a for loop will consume a very large amount of time because it will not batch the host-device data transfers. If we choose to implement a specific overload for bit-packed containers like `std::vector<bool>`, we must use copying methods that can handle the whole range at once (perhaps with postprocessing to expand it to a final result) rather than copying one element at a time.
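The second and third concerns can be illustrated with a host-only sketch. It is not the libcudf implementation: `copy_bools_batched` is a hypothetical name, the function is overloaded explicitly for `std::vector<bool>` rather than templated on any container with a `value_type`, and a single `std::memcpy` stands in for the one batched host-to-device transfer a real version would issue on a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of libcudf or rmm.
// Accepts only std::vector<bool>, so it cannot accidentally match other
// containers that merely expose a value_type member.
inline void copy_bools_batched(std::vector<bool> const& src, std::uint8_t* dst)
{
  if (src.empty()) { return; }

  // Step 1: expand the bit-packed source into a contiguous staging buffer
  // with one byte per element. This loop runs entirely on the host.
  std::vector<std::uint8_t> staging(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    staging[i] = src[i] ? 1 : 0;
  }

  // Step 2: move the whole range with one bulk copy. Assumption: in device
  // code this memcpy would be a single asynchronous host-to-device transfer
  // instead of one transfer per element.
  std::memcpy(dst, staging.data(), staging.size());
}
```

The per-element work stays on the host, where it is cheap, and the transfer count no longer grows with the number of elements.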
Thanks @bdice for reviewing my PR.
@SurajAralihalli I agree with most of your analysis here. I am inclined to say that the corresponding issue should be closed without changes to libcudf, because users should be expected to use That conclusion eliminates the need to discuss most of the rest of your comment about alternative plans -- but I think you're seeing the right tradeoffs here. Unfortunately, there aren't any simple ways to handle host-device copies of
Thank you @SurajAralihalli for taking a careful look at this issue!
This PR addresses issue #13454.

Currently, `make_device_uvector_async` doesn't support copying containers that can't be implicitly converted to `host_span`. This includes `vector<bool>`, as it stores 1 bit per bool instead of 1 byte per bool. In these cases we have to resort to copying data one element at a time asynchronously.

In this PR, `make_device_uvector_async` has been overloaded to support all these special containers (not just `vector<bool>`). `rmm::device_uvector`'s `set_element_async` is leveraged to copy data asynchronously.

Signed-off-by: Suraj Aralihalli [email protected]