-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port thrust::merge[_by_key]
to CUB
#1817
Conversation
a7a72b0
to
f0cdf6a
Compare
thrust::merge
to CUBthrust::merge[_by_key]
to CUB
153ced5
to
2b64f25
Compare
868412d
to
797ba58
Compare
8cdf145
to
253b8fb
Compare
df3cbe7
to
7851e7c
Compare
A few instructions are generated in different order
This changes the generated SASS
Fixes NVIDIA#1763 Co-authored-by: Georgii Evtushenko <[email protected]>
Co-authored-by: Georgii Evtushenko <[email protected]>
Co-authored-by: Georgii Evtushenko <[email protected]>
Co-authored-by: Georgii Evtushenko <[email protected]>
30223f8
to
f2e51af
Compare
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Left some final comments, let's merge when those are addressed.
🟨 CI finished in 8h 56m: Pass: 99%/250 | Total: 6d 01h | Avg: 34m 58s | Max: 1h 13m | Hits: 62%/248858
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | pycuda |
🏃 Runner counts (total jobs: 250)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
41 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟩 CI finished in 11h 04m: Pass: 100%/250 | Total: 6d 02h | Avg: 35m 05s | Max: 1h 13m | Hits: 62%/250036
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | pycuda |
🏃 Runner counts (total jobs: 250)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
41 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
// Cannot check output iterators, since they could be discard iterators, which do not have the right value_type | ||
static_assert(::cuda::std::is_same<cub::detail::value_t<KeyIt2>, key_t>::value, ""); | ||
static_assert(::cuda::std::is_same<cub::detail::value_t<ValueIt2>, value_t>::value, ""); | ||
static_assert(::cuda::std::__invokable<CompareOp, key_t, key_t>::value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe those are not the right constraints and they have broken cuDF
Why do we require all keys / values to have the exact same type? Shouldnt it be sufficient to ensure that the CompareOp is invocable with the passed in types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is what the documentation of thrust::merge
says:
InputIterator1 and InputIterator2 have the same value_type
Apparently, the iterators for the values don't need to have the same value_type
s.
As long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantees of equal value types to restore the old behavior of the thrust implementation replaced in NVIDIA#1817.
As long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantees of equal value types to restore the old behavior of the thrust implementation replaced in NVIDIA#1817.
* Add a cuDF inspired test for merge_by_key * Allow CUB MergePath to support iterators with different value types * Allow different input value types for merge, as long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantees of equal value types to restore the old behavior of the thrust implementation replaced in #1817.
* Refactor thrust/CUB merge * Port thurst::merge[_by_key] to cub::DeviceMerge Fixes NVIDIA#1763 Co-authored-by: Georgii Evtushenko <[email protected]>
* Add a cuDF inspired test for merge_by_key * Allow CUB MergePath to support iterators with different value types * Allow different input value types for merge, as long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantees of equal value types to restore the old behavior of the thrust implementation replaced in NVIDIA#1817.
* Refactor thrust/CUB merge * Port thurst::merge[_by_key] to cub::DeviceMerge Fixes NVIDIA#1763 Co-authored-by: Georgii Evtushenko <[email protected]>
* Add a cuDF inspired test for merge_by_key * Allow CUB MergePath to support iterators with different value types * Allow different input value types for merge, as long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantees of equal value types to restore the old behavior of the thrust implementation replaced in NVIDIA#1817.
This PR ports the CUDA implementation of
thrust::merge[_by_key]
to CUB.CUB already has an implementation of merge sort, also based on the merge path algorithm, so some infrastructure is taken from there, replacing the thrust implementation. However, whenever I was not sure what to pick, I preferred the thrust implementation and will attempt a unification in a separate PR after benchmarks are available. This especially concerns the agents.
The following new constructs are added:
cub::DeviceMerge
withMergeKeys
andMergePairs
, including documentation.cub::detail::DispatchMerge
with policy selection, fallback agent, vsmem implementation. Including a set of tuning policies taken over from the thrust implementation.cub::detail::AgentPartitionMergePath
implementing merge path partitioning.cub::detail::AgentMergeNoSort
implementing the device side merging. Very similar to merge sortsAgentMerge
and I plan to unifying the implementations in a separate step.A full tuning is probably necessary which will be conducted in a separate PR.
Compile-time of
thrust.test.merge
:Benchmark of `thrust.bench.merge` :
Running
nvbench_compare.py base.json branch.json
Fixes: #1763
Please merge before:
merge_by_key
andmerge_key_value
tests #1824TODO:
cub::DeviceMerge
correctly uses the fallback policycub::DeviceMerge
correctly uses virtual shared memorythrust::merge
compile-time before and after this PR