Do not require merge to have identical value types for both inputs #2054

Closed
wants to merge 2 commits into from

Conversation

miscco
Collaborator

@miscco miscco commented Jul 24, 2024

Fixes some minor issues that break cuDF

I am happy to discuss what to do with the static asserts.

I did not catch it during review, but I believe the asserts are too strict.

Most likely we actually want something along the lines of is_assignable<>.

Our beloved `tuple_of_iterator_references` does not like ternary operators, because the ternary operator does not understand the underlying value types of the proxies.
@miscco miscco requested review from a team as code owners July 24, 2024 07:51
@miscco miscco added the thrust, bug: functional, and cub labels and removed the thrust label Jul 24, 2024
static_assert(::cuda::std::is_same<cub::detail::value_t<KeyIt2>, key_t>::value, "");
static_assert(::cuda::std::is_same<cub::detail::value_t<ValueIt2>, value_t>::value, "");
static_assert(::cuda::std::__invokable<CompareOp, key_t, key_t>::value,
static_assert(::cuda::std::__invokable<CompareOp, cub::detail::value_t<KeyIt1>, cub::detail::value_t<KeyIt1>>::value,
Collaborator

@bernhardmgruber as discussed in #1817 (comment), this is a breaking change and should go into device code.

Collaborator

You seem to have added the static assert in the device code but haven't removed it from the dispatch.

Contributor

We discussed this static assertion about the return type:

::cuda::std::is_convertible<typename ::cuda::std::__invoke_of<CompareOp, key_t, key_t>::type, bool>::value

which I have removed. The offending static assert here was not commented on in PR #1817.

I thought querying whether a comparison operator is invocable with a given set of arguments should be fine in host code; only asking for the return type is problematic. My reasoning was that it is much more user-friendly to get a compile error here when the user passes a wrong operator than many layers deeper in the kernel code.

Collaborator

Only asking for the return type is problematic.

Any form of introspection of the signature is problematic, including:

Introspecting the parameter type of operator() is only supported in device code.

Let's move the check to device code.

static_assert(::cuda::std::is_same<cub::detail::value_t<KeyIt2>, key_t>::value, "");
static_assert(::cuda::std::is_same<cub::detail::value_t<ValueIt2>, value_t>::value, "");
static_assert(::cuda::std::__invokable<CompareOp, key_t, key_t>::value,
static_assert(::cuda::std::__invokable<CompareOp, cub::detail::value_t<KeyIt1>, cub::detail::value_t<KeyIt1>>::value,
Collaborator

@gevtushenko gevtushenko Jul 24, 2024

critical: see #1817 (comment)

Suggested change
static_assert(::cuda::std::__invokable<CompareOp, cub::detail::value_t<KeyIt1>, cub::detail::value_t<KeyIt1>>::value,

Contributor

Ahh, you just marked the second static_assert on the review, but your textual comment applied to both. I understand now.

// Cannot check output iterators, since they could be discard iterators, which do not have the right value_type
static_assert(::cuda::std::is_same<cub::detail::value_t<KeyIt2>, key_t>::value, "");
static_assert(::cuda::std::is_same<cub::detail::value_t<ValueIt2>, value_t>::value, "");
Collaborator

suggestion: @miscco we require this in our docs:

InputIterator1 and InputIterator2 have the same value_type

If we don't want to enforce that, we should relax requirements in the docs (both Thrust and CUB) and add tests. I think it's fine to merge this PR without doing that, but we should at least file an issue.

Collaborator Author

So people are already doing it as we have seen with cuDF.

I believe we should think about something in the direction of is_assignable<value_t<OutIt>&, value_t<InIt1>> with the obvious caveat that it needs to work for discard iterators.

Contributor

I would argue it should be is_assignable<reference_t<OutIt>, value_t<InIt1>>. That would handle proxy references.
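
To make the suggestion concrete, here is a minimal host-only sketch of such a check, written against the standard library rather than CUB. The aliases `value_t` and `reference_t` and the helper `can_write_from` are hypothetical names for this illustration (CUB has its own `cub::detail::value_t`), and the sketch does not address the discard iterator caveat mentioned above.

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical aliases for this sketch, built on std::iterator_traits.
template <class It>
using value_t = typename std::iterator_traits<It>::value_type;
template <class It>
using reference_t = typename std::iterator_traits<It>::reference;

// True if a value read from InIt can be assigned through what
// dereferencing OutIt yields (a real reference or a proxy object).
template <class OutIt, class InIt>
constexpr bool can_write_from()
{
  return std::is_assignable<reference_t<OutIt>, value_t<InIt>>::value;
}

// Differing value types are accepted as long as the assignment compiles.
static_assert(can_write_from<long*, const int*>(), "int should be writable to long");

// Proxy reference: std::vector<bool>::reference is a class type, not bool&.
static_assert(can_write_from<std::vector<bool>::iterator, const bool*>(),
              "bool should be writable through the vector<bool> proxy");

// Unrelated types are still rejected.
struct opaque {};
static_assert(!can_write_from<opaque*, const int*>(), "int is not writable to opaque");
```

Compared with `is_same`, this accepts any pair of types for which the write compiles, which is closer to what the algorithm actually needs from its output iterator.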

@bernhardmgruber
Contributor

bernhardmgruber commented Jul 24, 2024

Can we please have a test for this, so this bug is not introduced again?

I will add a test.

Contributor

🟨 CI finished in 10h 54m: Pass: 99%/250 | Total: 5d 08h | Avg: 30m 44s | Max: 1h 40m | Hits: 81%/248859
  • 🟨 thrust: Pass: 99%/118 | Total: 1d 23h | Avg: 23m 56s | Max: 49m 21s | Hits: 75%/137735

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total:  1d 19h | Avg: 23m 51s | Max: 49m 21s | Hits:  75%/128315
      🟩 arm64              Pass: 100%/8   | Total:  3h 20m | Avg: 25m 01s | Max: 27m 45s | Hits:  72%/9420  
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  5h 55m | Avg: 23m 41s | Max: 45m 08s | Hits:  72%/17660 
      🟩 11.8               Pass: 100%/3   | Total:  1h 38m | Avg: 32m 46s | Max: 35m 01s | Hits:  72%/3534  
      🔍 12.5               Pass:  99%/100 | Total:  1d 15h | Avg: 23m 42s | Max: 49m 21s | Hits:  76%/116541
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 12s | Hits:  71%/2354  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 55m | Avg: 23m 41s | Max: 45m 08s | Hits:  72%/17660 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 38m | Avg: 32m 46s | Max: 35m 01s | Hits:  72%/3534  
      🔍 nvcc12.5           Pass:  98%/98  | Total:  1d 14h | Avg: 23m 40s | Max: 49m 21s | Hits:  76%/114187
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 12s | Hits:  71%/2354  
      🔍 nvcc               Pass:  99%/116 | Total:  1d 22h | Avg: 23m 54s | Max: 49m 21s | Hits:  75%/135381
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 21m | Avg: 23m 39s | Max: 28m 53s | Hits:  72%/7062  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 20m | Avg: 26m 51s | Max: 29m 20s | Hits:  72%/3531  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 39m | Avg: 24m 49s | Max: 25m 58s | Hits:  72%/4708  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 39m | Avg: 24m 50s | Max: 27m 02s | Hits:  72%/4708  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 37m | Avg: 24m 24s | Max: 25m 51s | Hits:  72%/4708  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 11s | Max: 26m 27s | Hits:  72%/4708  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 32s | Max: 29m 13s | Hits:  72%/4708  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 41m | Avg: 25m 25s | Max: 29m 13s | Hits:  72%/4708  
      🔍 Clang17            Pass:  94%/18  | Total:  5h 21m | Avg: 17m 52s | Max: 26m 43s | Hits:  83%/20009 
      🟩 GCC6               Pass: 100%/2   | Total: 43m 22s | Avg: 21m 41s | Max: 22m 22s | Hits:  72%/2354  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 16m | Avg: 22m 43s | Max: 27m 10s | Hits:  72%/7068  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 23m | Avg: 23m 52s | Max: 27m 21s | Hits:  72%/7068  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 39m | Avg: 26m 37s | Max: 30m 44s | Hits:  64%/7068  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 50m | Avg: 27m 30s | Max: 30m 42s | Hits:  61%/4712  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 25m | Avg: 29m 20s | Max: 35m 01s | Hits:  72%/8246  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 52m | Avg: 28m 09s | Max: 30m 48s | Hits:  71%/4712  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 40m | Avg: 17m 01s | Max: 27m 45s | Hits:  83%/23560 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 33m | Avg: 31m 18s | Max: 33m 16s | Hits:  72%/3540  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 08s | Avg: 45m 08s | Max: 45m 08s | Hits:  70%/1173  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 33m | Avg: 46m 41s | Max: 49m 21s | Hits:  70%/2346  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 15m | Avg: 32m 36s | Max: 48m 19s | Hits:  84%/7038  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  98%/51  | Total: 19h 05m | Avg: 22m 27s | Max: 29m 20s | Hits:  76%/58850 
      🟩 GCC                Pass: 100%/55  | Total: 20h 51m | Avg: 22m 44s | Max: 35m 01s | Hits:  74%/64788 
      🟩 Intel              Pass: 100%/3   | Total:  1h 33m | Avg: 31m 18s | Max: 33m 16s | Hits:  72%/3540  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 34m | Avg: 37m 07s | Max: 49m 21s | Hits:  79%/10557 
    🔍 jobs: TestGPU 🔍
      🟩 Build              Pass: 100%/99  | Total:  1d 19h | Avg: 26m 30s | Max: 49m 21s | Hits:  71%/116553
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 41m | Avg:  9m 11s | Max: 18m 54s | Hits:  99%/12939 
      🔍 TestGPU            Pass:  87%/8   | Total:  1h 38m | Avg: 12m 19s | Max: 14m 04s | Hits:  99%/8243  
    🔍 std: 11 🔍
      🔍 11                 Pass:  96%/30  | Total:  9h 57m | Avg: 19m 55s | Max: 28m 40s | Hits:  75%/34151 
      🟩 14                 Pass: 100%/34  | Total: 14h 36m | Avg: 25m 46s | Max: 47m 11s | Hits:  74%/40020 
      🟩 17                 Pass: 100%/33  | Total: 14h 16m | Avg: 25m 56s | Max: 49m 21s | Hits:  75%/38847 
      🟩 20                 Pass: 100%/21  | Total:  8h 14m | Avg: 23m 32s | Max: 48m 19s | Hits:  78%/24717 
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total:  1d 23h | Avg: 23m 56s | Max: 49m 21s | Hits:  75%/137735
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 38m | Avg: 32m 46s | Max: 35m 01s | Hits:  72%/3534  
      🟩 90a                Pass: 100%/4   | Total: 58m 50s | Avg: 14m 42s | Max: 15m 34s | Hits:  71%/4712  
    
  • 🟩 cub: Pass: 100%/131 | Total: 3d 08h | Avg: 37m 00s | Max: 1h 40m | Hits: 89%/111124

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  3d 02h | Avg: 36m 20s | Max:  1h 40m | Hits:  90%/104188
      🟩 arm64              Pass: 100%/8   | Total:  6h 18m | Avg: 47m 20s | Max: 52m 39s | Hits:  83%/6936  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  9h 01m | Avg: 36m 04s | Max: 48m 26s | Hits:  89%/11792 
      🟩 11.8               Pass: 100%/3   | Total:  2h 37m | Avg: 52m 30s | Max: 52m 56s | Hits:  86%/2601  
      🟩 12.5               Pass: 100%/113 | Total:  2d 21h | Avg: 36m 43s | Max:  1h 40m | Hits:  90%/96731 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 30m 01s | Avg: 15m 00s | Max: 15m 33s | Hits:  89%/1436  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  9h 01m | Avg: 36m 04s | Max: 48m 26s | Hits:  89%/11792 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 37m | Avg: 52m 30s | Max: 52m 56s | Hits:  86%/2601  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  2d 20h | Avg: 37m 07s | Max:  1h 40m | Hits:  90%/95295 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 30m 01s | Avg: 15m 00s | Max: 15m 33s | Hits:  89%/1436  
      🟩 nvcc               Pass: 100%/129 | Total:  3d 08h | Avg: 37m 21s | Max:  1h 40m | Hits:  89%/109688
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 49m | Avg: 38m 14s | Max: 43m 36s | Hits:  88%/4980  
      🟩 Clang10            Pass: 100%/3   | Total:  2h 03m | Avg: 41m 09s | Max: 41m 49s | Hits:  86%/2607  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 45m | Avg: 41m 16s | Max: 43m 53s | Hits:  86%/3476  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 38m | Avg: 39m 44s | Max: 41m 04s | Hits:  86%/3476  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 43m | Avg: 40m 50s | Max: 42m 51s | Hits:  86%/3476  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 44m | Avg: 41m 05s | Max: 43m 15s | Hits:  86%/3476  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 43m | Avg: 40m 53s | Max: 45m 02s | Hits:  86%/3468  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 45m | Avg: 41m 28s | Max: 42m 01s | Hits:  86%/3468  
      🟩 Clang17            Pass: 100%/26  | Total: 13h 38m | Avg: 31m 29s | Max:  1h 39m | Hits:  95%/22244 
      🟩 GCC6               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 37s | Max: 35m 44s | Hits:  89%/1582  
      🟩 GCC7               Pass: 100%/6   | Total:  3h 43m | Avg: 37m 18s | Max: 39m 58s | Hits:  87%/4983  
      🟩 GCC8               Pass: 100%/6   | Total:  3h 52m | Avg: 38m 45s | Max: 43m 15s | Hits:  87%/4983  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 48m | Avg: 38m 07s | Max: 42m 54s | Hits:  87%/4983  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 48m | Avg: 42m 14s | Max: 43m 14s | Hits:  86%/3476  
      🟩 GCC11              Pass: 100%/7   | Total:  5h 22m | Avg: 46m 05s | Max: 52m 56s | Hits:  86%/6069  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 41m | Avg: 40m 18s | Max: 40m 58s | Hits:  86%/3468  
      🟩 GCC13              Pass: 100%/28  | Total: 14h 09m | Avg: 30m 20s | Max:  1h 40m | Hits:  91%/24276 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 13m | Avg: 44m 27s | Max: 46m 30s | Hits:  89%/2379  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 26s | Avg: 48m 26s | Max: 48m 26s | Hits:  88%/709   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 16s | Max: 51m 36s | Hits:  88%/1418  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 34m | Avg: 51m 36s | Max: 52m 44s | Hits:  88%/2127  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 11h | Avg: 36m 29s | Max:  1h 39m | Hits:  90%/50671 
      🟩 GCC                Pass: 100%/63  | Total:  1d 13h | Avg: 35m 49s | Max:  1h 40m | Hits:  89%/53820 
      🟩 Intel              Pass: 100%/3   | Total:  2h 13m | Avg: 44m 27s | Max: 46m 30s | Hits:  89%/2379  
      🟩 MSVC               Pass: 100%/6   | Total:  5h 05m | Avg: 50m 57s | Max: 52m 44s | Hits:  88%/4254  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  3d 08h | Avg: 37m 00s | Max:  1h 40m | Hits:  89%/111124
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 18h | Avg: 40m 22s | Max: 52m 56s | Hits:  86%/83380 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 47m | Avg: 20m 55s | Max: 25m 59s | Hits:  99%/6936  
      🟩 GraphCapture       Pass: 100%/8   | Total:  5h 00m | Avg: 37m 34s | Max:  1h 40m | Hits:  99%/6936  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 29m | Avg: 18m 44s | Max: 21m 48s | Hits:  99%/6936  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 54m | Avg: 29m 21s | Max: 48m 46s | Hits:  99%/6936  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 37m | Avg: 52m 30s | Max: 52m 56s | Hits:  86%/2601  
      🟩 90a                Pass: 100%/4   | Total:  1h 16m | Avg: 19m 06s | Max: 22m 58s | Hits:  79%/3468  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 21h 29m | Avg: 37m 54s | Max:  1h 40m | Hits:  89%/29047 
      🟩 14                 Pass: 100%/37  | Total:  1d 00h | Avg: 38m 58s | Max:  1h 39m | Hits:  90%/31174 
      🟩 17                 Pass: 100%/36  | Total: 21h 32m | Avg: 35m 54s | Max: 52m 44s | Hits:  90%/30392 
      🟩 20                 Pass: 100%/24  | Total: 13h 44m | Avg: 34m 21s | Max: 50m 23s | Hits:  90%/20511 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 12m 03s | Avg: 12m 03s | Max: 12m 03s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@bernhardmgruber
Contributor

I will add a test.

Ok, this opened a huge can of worms. I tried with key1_type = int and key2_type = unsigned. The problem is that calling the comparison functor converts both keys to unsigned, which wraps negative values of the first key around to large positive values and leads to wrong comparison results. Furthermore, several parts of the algorithm need a single key type for local arrays in registers. E.g.:

    key_type keys_loc[items_per_thread];
    gmem_to_reg<threads_per_block, IsFullTile>(
      keys_loc, keys1_in + keys1_beg, keys2_in + keys2_beg, num_keys1, num_keys2);
    reg_to_shared<threads_per_block>(&storage.keys_shared[0], keys_loc);

Here, keys_loc and storage.keys_shared[0] use a single key type. We could make this the destination key type, so the input keys would be converted during load. However, we specify that the binary comparator is called on elements of the input sequence, but now we would be calling it (after loading to registers) on types from the output sequence. So the comparison operator would need to be able to compare any pair of key1, key2 and key_out types (we also call compare_op(key1, key2) and compare_op(key2, key1)).

This all started to sound a bit too scary, and I think we may only allow a subset of these combinations.
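
The int/unsigned problem described above can be reproduced on the host with plain C++. This is only an illustrative sketch: `less_as_unsigned` and `less_as_wide` are hypothetical helpers, not part of CUB, and the second one assumes `long long` is wide enough to hold every `int` and `unsigned` value (true on common platforms).

```cpp
#include <functional>
#include <type_traits>

// The usual arithmetic conversions pick unsigned as the common type.
static_assert(std::is_same<std::common_type<int, unsigned>::type, unsigned>::value,
              "int and unsigned are compared as unsigned");

// Mimics a comparator with a single key type of unsigned:
// a negative `a` is converted to a large unsigned value before comparing.
inline bool less_as_unsigned(int a, unsigned b)
{
  return std::less<unsigned>{}(a, b);
}

// Compares in a wider signed type instead, so no value wraps around.
// Assumption: long long can represent all int and unsigned values.
inline bool less_as_wide(int a, unsigned b)
{
  return static_cast<long long>(a) < static_cast<long long>(b);
}
```

With these definitions, `less_as_unsigned(-1, 1u)` is `false` because `-1` becomes the largest `unsigned` value, while `less_as_wide(-1, 1u)` is `true`.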

For completeness, it seems std::merge can handle all those cases though.
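
As a rough illustration of the std::merge behaviour mentioned here, the following host-only sketch merges an `int` range and an `unsigned` range into a `long long` output. The function name `merge_mixed` and the choice of a comparator taking `long long` are assumptions made for this example, not something taken from the PR.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Merges two sorted ranges with different value types into a third type.
inline std::vector<long long> merge_mixed(const std::vector<int>& a,
                                          const std::vector<unsigned>& b)
{
  std::vector<long long> out;
  out.reserve(a.size() + b.size());
  // The comparator receives elements of both inputs in either order.
  // Taking long long converts each argument on its own, so a negative
  // int is never reinterpreted as unsigned.
  std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out),
             [](long long x, long long y) { return x < y; });
  return out;
}
```

For example, `merge_mixed({-1, 5}, {0u, 7u})` yields `{-1, 0, 5, 7}`. The comparator is applied directly to the dereferenced input elements, which is why no single intermediate key type is required.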

@elstehle
Collaborator

Here, keys_loc and storage.keys_shared[0] use a single key type. We could make this the destination key type, so the input keys would be converted during load. However, we specify that the binary comparator is called on elements of the input sequence, but now we would be calling it (after loading to registers) on types from the output sequence. So the comparison operator would need to be able to compare any pair of key1, key2 and key_out types (we also call compare_op(key1, key2) and compare_op(key2, key1)).

This all started to sound a bit too scary, and I think we may only allow a subset of these combinations.

For completeness, it seems std::merge can handle all those cases though.

I'm late to the party but I totally share that concern and started working on a reproducer yesterday:
https://godbolt.org/z/or8TdGa64

I think ultimately, the right thing would be to have a specialization for the case where the key1 and key2 types differ. The specialization will be slower, but I think it's a valid use case.

For now, I would like to avoid opening up more than we supported before the refactor, so that users do not form the wrong expectations. But I'm afraid we were already quite permissive with regard to different types, right?

@bernhardmgruber
Contributor

I have opened #2075 which mostly supersedes this PR. I tried to fix as little as possible to make the cuDF use case work.

@miscco
Collaborator Author

miscco commented Jul 25, 2024

closing in favor of #2075

@miscco miscco closed this Jul 25, 2024
@miscco miscco deleted the ensure_rapids_builds branch July 25, 2024 15:04