overload make_device_uvector_async for bool type #14062

SurajAralihalli · 2023-09-07T22:46:18Z

This PR addresses the issue #13454.

Currently, make_device_uvector_async doesn't support copying the containers that can't be implicitly converted to host_span. This includes vector<bool> as it stores 1-bit-per-bool instead of 1-byte-pe-bool. In these cases we have to resort to copying data one element at a time asynchronously.

In this PR make_device_uvector_async has been overloaded to support all these special containers (not just vector<bool>). rmm::device_uvector's set_element_async is leveraged to copy data asynchronously.

Signed-off-by: Suraj Aralihalli [email protected]

Signed-off-by: Suraj Aralihalli <[email protected]>

copy-pr-bot · 2023-09-07T22:46:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…verted to host_span implicitly Signed-off-by: Suraj Aralihalli <[email protected]>

bdice

I’d like to discuss design a bit here. I have three primary concerns:

Currently we are working around the problems in the linked issue with std::vector<bool> support by using thrust::host_vector<bool> which is not bit-packed. That seems to be a sufficient workaround for the uses cases I would imagine for this utility (mostly in writing tests). If we know of specific cases where that workaround is insufficient, we should consider and document those more closely before implementing a fix. This is something we probably should discuss on the linked issue.
This PR appears to have an overload that would fit any type with a value_type member. This is probably not constraining enough. If we must support std::vector<bool> types, we should explicitly overload for that type.
Performance: Using rmm to set every element individually in a for loop will consume a very large amount of time because it will not batch the host-device data transfers. If we choose to implement a specific overload for bitpacked containers like std::vector<bool>, we must use copying methods that can handle the whole range at once (perhaps with postprocessing to expand it to a final result) rather than copying one element at a time.

SurajAralihalli · 2023-09-11T16:58:47Z

Thanks @bdice for reviewing my PR.

Yes, while working on the PR, I realized that thrust::host_vector<bool> is a sufficient workaround for the problems in the linked issue with std::vector<bool>, as it is host_span compatible. If a fix is not required at the moment, we can consider closing issue #13454, or I can add a summary of our discussion if it needs to remain open for documentation reasons.
The PR overloads all the containers that can't be implicitly converted to a host_span (!std::is_convertible_v). If we aren't aware of any other types, other than std::vector<bool>, that are host_span incompatible and need a fix, we can limit the scope to std::vector<bool>.
I see the performance impact because of using a for loop. I was looking for ways to avoid using a for loop for copying. By 'postprocessing to expand it to a final result,' do you mean to copy the bit-packed data to device memory and transform it into a compatible form in a later step, harnessing GPUs? vector<bool> vec.data() is a deleted method, so we need to find a suitable way to transfer the bit-packed data to device memory without iterating. It would be interesting to investigate this further.

bdice · 2023-09-11T21:38:46Z

@SurajAralihalli I agree with most of your analysis here. I am inclined to say that the corresponding issue should be closed without changes to libcudf, because users should be expected to use thrust::host_vector<bool>. I'll cross-link this discussion on that issue and close both this PR and the corresponding issue.

That conclusion eliminates the need to discuss most of the rest of your comment about alternative plans -- but I think you're seeing the right tradeoffs here. Unfortunately, there aren't any simple ways to handle host-device copies of std::vector<bool> besides using iterators to do a full copy on the host into a new data structure before transferring to device. References:

GregoryKimball · 2023-09-11T22:04:14Z

Thank you @SurajAralihalli for taking a careful look at this issue!

overload make_device_uvector_async for bool type

795ca3c

Signed-off-by: Suraj Aralihalli <[email protected]>

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 7, 2023

GregoryKimball assigned karthikeyann Sep 9, 2023

extend make_device_uvector_async to all containers that cannot be con…

06dec6f

…verted to host_span implicitly Signed-off-by: Suraj Aralihalli <[email protected]>

SurajAralihalli marked this pull request as ready for review September 11, 2023 07:03

SurajAralihalli requested a review from a team as a code owner September 11, 2023 07:03

SurajAralihalli requested review from bdice and mythrocks September 11, 2023 07:03

bdice requested changes Sep 11, 2023

View reviewed changes

GregoryKimball unassigned karthikeyann Sep 11, 2023

bdice closed this Sep 11, 2023

bdice mentioned this pull request Sep 11, 2023

[FEA] Specialize make_device_uvector_async to work with std::vector<bool> #13454

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

overload make_device_uvector_async for bool type #14062

overload make_device_uvector_async for bool type #14062

SurajAralihalli commented Sep 7, 2023 •

edited

Loading

copy-pr-bot bot commented Sep 7, 2023

bdice left a comment •

edited

Loading

SurajAralihalli commented Sep 11, 2023 •

edited

Loading

bdice commented Sep 11, 2023

GregoryKimball commented Sep 11, 2023

overload make_device_uvector_async for bool type #14062

overload make_device_uvector_async for bool type #14062

Conversation

SurajAralihalli commented Sep 7, 2023 • edited Loading

copy-pr-bot bot commented Sep 7, 2023

bdice left a comment • edited Loading

Choose a reason for hiding this comment

SurajAralihalli commented Sep 11, 2023 • edited Loading

bdice commented Sep 11, 2023

GregoryKimball commented Sep 11, 2023

SurajAralihalli commented Sep 7, 2023 •

edited

Loading

bdice left a comment •

edited

Loading

SurajAralihalli commented Sep 11, 2023 •

edited

Loading