Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overload make_device_uvector_async for bool type #14062

Conversation

SurajAralihalli
Copy link
Contributor

@SurajAralihalli SurajAralihalli commented Sep 7, 2023

This PR addresses the issue #13454.

Currently, make_device_uvector_async doesn't support copying the containers that can't be implicitly converted to host_span. This includes vector<bool> as it stores 1-bit-per-bool instead of 1-byte-pe-bool. In these cases we have to resort to copying data one element at a time asynchronously.

In this PR make_device_uvector_async has been overloaded to support all these special containers (not just vector<bool>). rmm::device_uvector's set_element_async is leveraged to copy data asynchronously.

Signed-off-by: Suraj Aralihalli [email protected]

@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 7, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 7, 2023
…verted to host_span implicitly

Signed-off-by: Suraj Aralihalli <[email protected]>
@SurajAralihalli SurajAralihalli marked this pull request as ready for review September 11, 2023 07:03
@SurajAralihalli SurajAralihalli requested a review from a team as a code owner September 11, 2023 07:03
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d like to discuss design a bit here. I have three primary concerns:

  1. Currently we are working around the problems in the linked issue with std::vector<bool> support by using thrust::host_vector<bool> which is not bit-packed. That seems to be a sufficient workaround for the uses cases I would imagine for this utility (mostly in writing tests). If we know of specific cases where that workaround is insufficient, we should consider and document those more closely before implementing a fix. This is something we probably should discuss on the linked issue.
  2. This PR appears to have an overload that would fit any type with a value_type member. This is probably not constraining enough. If we must support std::vector<bool> types, we should explicitly overload for that type.
  3. Performance: Using rmm to set every element individually in a for loop will consume a very large amount of time because it will not batch the host-device data transfers. If we choose to implement a specific overload for bitpacked containers like std::vector<bool>, we must use copying methods that can handle the whole range at once (perhaps with postprocessing to expand it to a final result) rather than copying one element at a time.

@SurajAralihalli
Copy link
Contributor Author

SurajAralihalli commented Sep 11, 2023

Thanks @bdice for reviewing my PR.

  1. Yes, while working on the PR, I realized that thrust::host_vector<bool> is a sufficient workaround for the problems in the linked issue with std::vector<bool>, as it is host_span compatible. If a fix is not required at the moment, we can consider closing issue #13454, or I can add a summary of our discussion if it needs to remain open for documentation reasons.

  2. The PR overloads all the containers that can't be implicitly converted to a host_span (!std::is_convertible_v). If we aren't aware of any other types, other than std::vector<bool>, that are host_span incompatible and need a fix, we can limit the scope to std::vector<bool>.

  3. I see the performance impact because of using a for loop. I was looking for ways to avoid using a for loop for copying. By 'postprocessing to expand it to a final result,' do you mean to copy the bit-packed data to device memory and transform it into a compatible form in a later step, harnessing GPUs? vector<bool> vec.data() is a deleted method, so we need to find a suitable way to transfer the bit-packed data to device memory without iterating. It would be interesting to investigate this further.

@bdice
Copy link
Contributor

bdice commented Sep 11, 2023

@SurajAralihalli I agree with most of your analysis here. I am inclined to say that the corresponding issue should be closed without changes to libcudf, because users should be expected to use thrust::host_vector<bool>. I'll cross-link this discussion on that issue and close both this PR and the corresponding issue.

That conclusion eliminates the need to discuss most of the rest of your comment about alternative plans -- but I think you're seeing the right tradeoffs here. Unfortunately, there aren't any simple ways to handle host-device copies of std::vector<bool> besides using iterators to do a full copy on the host into a new data structure before transferring to device. References:

@GregoryKimball
Copy link
Contributor

Thank you @SurajAralihalli for taking a careful look at this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants