[CUDAX] Add copy_bytes and fill_bytes overloads for mdspan #2932
🟨 CI finished in 43m 38s: Pass: 81%/54 | Total: 4h 27m | Avg: 4m 57s | Max: 18m 23s | Hits: 82%/246
Modifications in project?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

🏃 Runner counts (total jobs: 54)
| # | Runner |
|---|---|
| 43 | linux-amd64-cpu16 |
| 5 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
for (typename _SrcExtents::rank_type __i = 0; __i < __src.rank(); __i++)
{
  if (__src.extent(__i)
      != static_cast<typename _SrcExtents::index_type>(
        __dst.extent((static_cast<typename _DstExtents::rank_type>(__i)))))
  {
    _CUDA_VSTD::__throw_invalid_argument("Copy destination size differs from the source");
  }
}
Nitpick: I would say this should rather be a function that returns a bool.
We can then decide whether we want an `if (!func()) __throw_invalid_argument`
or a `_CCCL_ASSERT(__func(), "...")`.
I moved it to a separate function. The failure mode here is pretty bad, so I left it as an exception instead of an assert.
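For readers following the thread, the refactor discussed above might look roughly like the sketch below. It is not the code from the PR: `toy_extents`, `extents_match`, and `check_copy_extents` are hypothetical names, and `toy_extents` is a stand-in that models only `rank()` and `extent(i)` so the example compiles without CUDA or `mdspan`. The explicit rank comparison is also an assumption; the hunk above only loops over the source rank.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for an mdspan extents type. Only rank() and
// extent(i) are modelled, which is all the check below needs.
template <std::size_t Rank>
struct toy_extents
{
  std::size_t values[Rank];
  static constexpr std::size_t rank() noexcept { return Rank; }
  constexpr std::size_t extent(std::size_t i) const noexcept { return values[i]; }
};

// Predicate form of the loop: true when the ranks agree and every
// runtime extent of the source equals the matching destination extent.
template <class SrcExtents, class DstExtents>
constexpr bool extents_match(const SrcExtents& src, const DstExtents& dst) noexcept
{
  if (SrcExtents::rank() != DstExtents::rank())
  {
    return false;
  }
  for (std::size_t i = 0; i < SrcExtents::rank(); ++i)
  {
    if (src.extent(i) != dst.extent(i))
    {
      return false;
    }
  }
  return true;
}

// Call site that keeps the exception, as chosen in the reply above.
// An assert-based variant would wrap the same predicate instead.
template <class SrcExtents, class DstExtents>
void check_copy_extents(const SrcExtents& src, const DstExtents& dst)
{
  if (!extents_match(src, dst))
  {
    throw std::invalid_argument("Copy destination size differs from the source");
  }
}
```

Keeping the comparison in a `bool` function leaves the throw-versus-assert policy to the caller, which is the point of the nitpick.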
🟩 CI finished in 2h 00m: Pass: 100%/54 | Total: 4h 29m | Avg: 4m 59s | Max: 17m 52s | Hits: 84%/246
I was hoping we could get #2306 done, but it looks like this only copies between two identical layouts instead of arbitrary ones 🥲
This PR adds `copy_bytes` and `fill_bytes` overloads operating on `mdspan`s. Each input needs to be one of the following: a `cuda::std::mdspan` instance, a type that launch-transforms to one, or a type that implicitly converts to one and exposes the `mdspan` template arguments as member aliases so that the destination type can be discovered (the last case will most likely be `mdarray`).

For `copy_bytes`, this version does not try to do anything clever to match shapes. Source and destination layouts need to be the same and the extents need to be compatible, which means any combination of static or dynamic extents is accepted as long as each runtime extent is the same.

More test cases will be added once the `mdarray` type is available.
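To illustrate the "any combination of static or dynamic extents" rule, here is a small sketch of how compatibility between two extent lists could be pre-screened at compile time. This is an assumption layered on top of the description: the PR text only states that runtime extents must match, and `dyn`, `static_shape`, `static_compatible`, and `shapes_compatible` are hypothetical names, with `dyn` playing the role of a dynamic-extent marker.

```cpp
#include <cstddef>

// Hypothetical marker for "extent known only at run time".
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Two extents can possibly agree at run time if either one is dynamic
// or both are static and equal.
constexpr bool static_compatible(std::size_t a, std::size_t b) noexcept
{
  return a == dyn || b == dyn || a == b;
}

// Hypothetical compile-time list of extents (static values or dyn).
template <std::size_t... Es>
struct static_shape
{};

// True when both shapes have the same rank and every pair of extents
// is compatible in the sense above.
template <std::size_t... As, std::size_t... Bs>
constexpr bool shapes_compatible(static_shape<As...>, static_shape<Bs...>) noexcept
{
  if constexpr (sizeof...(As) != sizeof...(Bs))
  {
    return false;
  }
  else
  {
    return (static_compatible(As, Bs) && ...);
  }
}
```

A shape pair that passes this screen still needs the runtime comparison, because a dynamic extent may turn out to differ from its counterpart.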