[CUDAX] Add copy_bytes and fill_bytes overloads for mdspan #2932

Open
wants to merge 9 commits into
base: main

Conversation

pciolkosz
Contributor

This PR adds copy_bytes and fill_bytes overloads operating on mdspans. Each input must either be a cuda::std::mdspan instance, launch-transform to one, or implicitly convert to one while exposing the mdspan template arguments as member aliases, so that the destination type can be discovered (the last case will most likely be mdarray).
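
To illustrate the third case, here is a minimal sketch of how a type could be recognized as "carrying the mdspan template arguments as member aliases". It is not taken from the PR: the trait name `has_mdspan_aliases` and the `toy_mdarray` type are hypothetical, and the sketch assumes the aliases in question are the four that mdspan itself exposes (`element_type`, `extents_type`, `layout_type`, `accessor_type`).

```cpp
#include <cassert>
#include <type_traits>

// Primary template: by default a type does not advertise mdspan arguments.
template <class T, class = void>
struct has_mdspan_aliases : std::false_type
{};

// Selected only when all four member aliases exist (SFINAE via std::void_t).
template <class T>
struct has_mdspan_aliases<T,
                          std::void_t<typename T::element_type,
                                      typename T::extents_type,
                                      typename T::layout_type,
                                      typename T::accessor_type>> : std::true_type
{};

template <class T>
inline constexpr bool has_mdspan_aliases_v = has_mdspan_aliases<T>::value;

// Placeholder tag types, for illustration only.
struct toy_extents_tag {};
struct toy_layout_tag {};
struct toy_accessor_tag {};

// Hypothetical owning container that names the mdspan it would convert to.
struct toy_mdarray
{
  using element_type  = float;
  using extents_type  = toy_extents_tag;
  using layout_type   = toy_layout_tag;
  using accessor_type = toy_accessor_tag;
};

// A type without the aliases: the target mdspan type cannot be discovered.
struct not_a_view
{};
```

With such a trait, an overload could rebuild the target mdspan type from the four aliases before converting, which is presumably why the aliases are required in addition to the implicit conversion.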

For copy_bytes, this version does not try to do anything clever to match shapes. The source and destination layouts must be the same and the extents must be compatible: any combination of static and dynamic extents is accepted, as long as each extent has the same value at runtime.
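
As a rough sketch of what "compatible extents" means here, the snippet below compares two extents objects dimension by dimension, regardless of which dimensions are static and which are dynamic. `toy_extents`, `dyn` and `extents_compatible` are hypothetical names for a host-only stand-in, not the cudax or libcu++ types, and treating a rank mismatch as "incompatible" is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sentinel marking an extent whose value is only known at run time
// (plays the role of dynamic_extent in mdspan).
constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Hypothetical stand-in for an mdspan extents type, for illustration only.
template <std::size_t... Static>
struct toy_extents
{
  static_assert(sizeof...(Static) > 0, "rank 0 is not modelled in this sketch");

  // Runtime values; only read at positions where the static extent is dyn.
  std::array<std::size_t, sizeof...(Static)> runtime{};

  static constexpr std::size_t rank()
  {
    return sizeof...(Static);
  }

  constexpr std::size_t extent(std::size_t i) const
  {
    constexpr std::size_t fixed[] = {Static...};
    return fixed[i] == dyn ? runtime[i] : fixed[i];
  }
};

// True when both extents have the same rank and every extent value matches,
// no matter whether a given dimension is static or dynamic on either side.
template <class Src, class Dst>
constexpr bool extents_compatible(const Src& src, const Dst& dst)
{
  if (Src::rank() != Dst::rank())
  {
    return false;
  }
  for (std::size_t i = 0; i < Src::rank(); ++i)
  {
    if (src.extent(i) != dst.extent(i))
    {
      return false;
    }
  }
  return true;
}
```

For example, `toy_extents<dyn, 4>` holding a runtime value of 3 is compatible with `toy_extents<3, dyn>` holding a runtime value of 4, since both describe a 3 x 4 shape.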

More test cases will be added once mdarray type is available.

pciolkosz requested a review from a team as a code owner on November 22, 2024 02:53
Contributor

🟨 CI finished in 43m 38s: Pass: 81%/54 | Total: 4h 27m | Avg: 4m 57s | Max: 18m 23s | Hits: 82%/246
  • 🟨 cudax: Pass: 81%/54 | Total: 4h 27m | Avg: 4m 57s | Max: 18m 23s | Hits: 82%/246

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  80%/50  | Total:  4h 16m | Avg:  5m 07s | Max: 18m 23s | Hits:  82%/246   
      🟩 arm64              Pass: 100%/4   | Total: 11m 14s | Avg:  2m 48s | Max:  2m 59s
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/30  | Total:  2h 05m | Avg:  4m 11s | Max: 16m 55s
      🔍 GCC                Pass:  50%/20  | Total:  1h 43m | Avg:  5m 11s | Max: 18m 23s
      🟩 MSVC               Pass: 100%/2   | Total: 25m 42s | Avg: 12m 51s | Max: 15m 20s | Hits:  82%/246   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 04s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  79%/49  | Total:  3h 02m | Avg:  3m 42s | Max: 15m 20s | Hits:  82%/246   
      🟩 Test               Pass: 100%/5   | Total:  1h 25m | Avg: 17m 06s | Max: 18m 23s
    🟨 ctk
      🟨 12.0               Pass:  73%/19  | Total:  1h 38m | Avg:  5m 10s | Max: 18m 02s | Hits:  82%/123   
      🟩 12.5               Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 04s
      🟨 12.6               Pass:  84%/33  | Total:  2h 37m | Avg:  4m 45s | Max: 18m 23s | Hits:  82%/123   
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  73%/19  | Total:  1h 38m | Avg:  5m 10s | Max: 18m 02s | Hits:  82%/123   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 04s
      🟨 nvcc12.6           Pass:  84%/33  | Total:  2h 37m | Avg:  4m 45s | Max: 18m 23s | Hits:  82%/123   
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  6m 50s | Avg:  3m 25s | Max:  3m 40s
      🟩 Clang10            Pass: 100%/2   | Total:  7m 41s | Avg:  3m 50s | Max:  4m 19s
      🟩 Clang11            Pass: 100%/4   | Total: 12m 40s | Avg:  3m 10s | Max:  3m 30s
      🟩 Clang12            Pass: 100%/4   | Total: 12m 48s | Avg:  3m 12s | Max:  3m 18s
      🟩 Clang13            Pass: 100%/4   | Total: 12m 34s | Avg:  3m 08s | Max:  3m 23s
      🟩 Clang14            Pass: 100%/4   | Total: 26m 28s | Avg:  6m 37s | Max: 16m 08s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 59s | Avg:  3m 29s | Max:  3m 30s
      🟩 Clang16            Pass: 100%/4   | Total: 12m 35s | Avg:  3m 08s | Max:  3m 29s
      🟩 Clang17            Pass: 100%/2   | Total:  6m 54s | Avg:  3m 27s | Max:  3m 31s
      🟩 Clang18            Pass: 100%/2   | Total: 20m 23s | Avg: 10m 11s | Max: 16m 55s
      🟥 GCC9               Pass:   0%/2   | Total:  5m 51s | Avg:  2m 55s | Max:  3m 07s
      🟥 GCC10              Pass:   0%/4   | Total: 12m 22s | Avg:  3m 05s | Max:  3m 25s
      🟥 GCC11              Pass:   0%/4   | Total: 12m 11s | Avg:  3m 02s | Max:  3m 14s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 05m | Avg:  9m 19s | Max: 18m 23s
      🟩 GCC13              Pass: 100%/3   | Total:  8m 08s | Avg:  2m 42s | Max:  2m 50s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s | Hits:  82%/123   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s | Hits:  82%/123   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 04s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  81%/54  | Total:  4h 27m | Avg:  4m 57s | Max: 18m 23s | Hits:  82%/246   
    🟨 gpu
      🟨 v100               Pass:  81%/54  | Total:  4h 27m | Avg:  4m 57s | Max: 18m 23s | Hits:  82%/246   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 39s | Avg:  2m 39s | Max:  2m 39s
      🟩 90a                Pass: 100%/1   | Total:  2m 41s | Avg:  2m 41s | Max:  2m 41s
    🟨 std
      🟨 17                 Pass:  79%/29  | Total:  2h 06m | Avg:  4m 21s | Max: 18m 23s
      🟨 20                 Pass:  84%/25  | Total:  2h 21m | Avg:  5m 38s | Max: 16m 55s | Hits:  82%/246   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Comment on lines 109 to 117
for (typename _SrcExtents::rank_type __i = 0; __i < __src.rank(); __i++)
{
  if (__src.extent(__i)
      != static_cast<typename _SrcExtents::index_type>(
        __dst.extent((static_cast<typename _DstExtents::rank_type>(__i)))))
  {
    _CUDA_VSTD::__throw_invalid_argument("Copy destination size differs from the source");
  }
}
Collaborator

Nitpick: I would rather make this a function that returns a bool.

We can then decide whether we want an if (!func()) __throw_invalid_argument or a _CCCL_ASSERT(__func(), "...")

Contributor Author

I moved it to a separate function. The failure mode here is pretty bad, so I left it as an exception instead of an assert.
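
The outcome of this thread can be sketched roughly as follows: a predicate that returns a bool, with the caller choosing the failure policy (here, an exception). This is a host-only illustration, not the code from the PR. `view2d`, `same_extents` and `copy_bytes_checked` are hypothetical names, and `std::memcpy` merely stands in for the actual device copy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical 2D view over ints, standing in for an mdspan.
struct view2d
{
  int* data;
  std::array<std::size_t, 2> extents;

  std::size_t extent(std::size_t i) const
  {
    return extents[i];
  }

  std::size_t size_bytes() const
  {
    return extents[0] * extents[1] * sizeof(int);
  }
};

// The check as a predicate: it reports a mismatch and leaves the failure
// policy (exception or assertion) to the caller.
bool same_extents(const view2d& src, const view2d& dst)
{
  for (std::size_t i = 0; i < src.extents.size(); ++i)
  {
    if (src.extent(i) != dst.extent(i))
    {
      return false;
    }
  }
  return true;
}

// Caller picks the exception: copying into a smaller destination would
// write out of bounds, so a check that vanishes in release builds is risky.
void copy_bytes_checked(const view2d& src, view2d dst)
{
  if (!same_extents(src, dst))
  {
    throw std::invalid_argument("Copy destination size differs from the source");
  }
  std::memcpy(dst.data, src.data, src.size_bytes()); // stand-in for the real copy
}
```

Switching to the assertion policy would only change the call site, for instance to `assert(same_extents(src, dst));`, while the predicate stays the same.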

Contributor

🟩 CI finished in 2h 00m: Pass: 100%/54 | Total: 4h 29m | Avg: 4m 59s | Max: 17m 52s | Hits: 84%/246
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 29m | Avg: 4m 59s | Max: 17m 52s | Hits: 84%/246

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 17m | Avg:  5m 08s | Max: 17m 52s | Hits:  84%/246   
      🟩 arm64              Pass: 100%/4   | Total: 12m 16s | Avg:  3m 04s | Max:  3m 24s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 37m | Avg:  5m 09s | Max: 17m 52s | Hits:  84%/123   
      🟩 12.5               Pass: 100%/2   | Total: 10m 02s | Avg:  5m 01s | Max:  5m 02s
      🟩 12.6               Pass: 100%/33  | Total:  2h 41m | Avg:  4m 53s | Max: 17m 41s | Hits:  84%/123   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 37m | Avg:  5m 09s | Max: 17m 52s | Hits:  84%/123   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 02s | Avg:  5m 01s | Max:  5m 02s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 41m | Avg:  4m 53s | Max: 17m 41s | Hits:  84%/123   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 29m | Avg:  4m 59s | Max: 17m 52s | Hits:  84%/246   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  7m 05s | Avg:  3m 32s | Max:  3m 56s
      🟩 Clang10            Pass: 100%/2   | Total:  7m 53s | Avg:  3m 56s | Max:  4m 08s
      🟩 Clang11            Pass: 100%/4   | Total: 13m 13s | Avg:  3m 18s | Max:  3m 29s
      🟩 Clang12            Pass: 100%/4   | Total: 14m 08s | Avg:  3m 32s | Max:  3m 47s
      🟩 Clang13            Pass: 100%/4   | Total: 14m 00s | Avg:  3m 30s | Max:  3m 46s
      🟩 Clang14            Pass: 100%/4   | Total: 28m 14s | Avg:  7m 03s | Max: 17m 52s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 48s | Avg:  3m 24s | Max:  3m 34s
      🟩 Clang16            Pass: 100%/4   | Total: 13m 45s | Avg:  3m 26s | Max:  3m 46s
      🟩 Clang17            Pass: 100%/2   | Total:  6m 52s | Avg:  3m 26s | Max:  3m 31s
      🟩 Clang18            Pass: 100%/2   | Total: 20m 42s | Avg: 10m 21s | Max: 17m 10s
      🟩 GCC9               Pass: 100%/2   | Total:  6m 29s | Avg:  3m 14s | Max:  3m 39s
      🟩 GCC10              Pass: 100%/4   | Total: 13m 37s | Avg:  3m 24s | Max:  3m 38s
      🟩 GCC11              Pass: 100%/4   | Total: 14m 01s | Avg:  3m 30s | Max:  3m 41s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 05m | Avg:  9m 20s | Max: 17m 41s
      🟩 GCC13              Pass: 100%/3   | Total:  8m 22s | Avg:  2m 47s | Max:  2m 50s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 33s | Avg:  9m 33s | Max:  9m 33s | Hits:  84%/123   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 22s | Avg:  9m 22s | Max:  9m 22s | Hits:  84%/123   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 02s | Avg:  5m 01s | Max:  5m 02s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 12m | Avg:  4m 25s | Max: 17m 52s
      🟩 GCC                Pass: 100%/20  | Total:  1h 47m | Avg:  5m 23s | Max: 17m 41s
      🟩 MSVC               Pass: 100%/2   | Total: 18m 55s | Avg:  9m 27s | Max:  9m 33s | Hits:  84%/246   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 02s | Avg:  5m 01s | Max:  5m 02s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 29m | Avg:  4m 59s | Max: 17m 52s | Hits:  84%/246   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 02m | Avg:  3m 44s | Max:  9m 33s | Hits:  84%/246   
      🟩 Test               Pass: 100%/5   | Total:  1h 26m | Avg: 17m 19s | Max: 17m 52s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 44s | Avg:  2m 44s | Max:  2m 44s
      🟩 90a                Pass: 100%/1   | Total:  2m 50s | Avg:  2m 50s | Max:  2m 50s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 09m | Avg:  4m 27s | Max: 17m 11s
      🟩 20                 Pass: 100%/25  | Total:  2h 20m | Avg:  5m 36s | Max: 17m 52s | Hits:  84%/246   
    


@leofang
Member

leofang commented Nov 23, 2024

I was hoping this would get #2306 done, but it looks like it only copies between two identical layouts rather than arbitrary ones 🥲

Labels
None yet
Projects
Status: In Review
3 participants