Fix DeviceSegmentedSort NVTX range name #2857

Merged (1 commit) on Nov 18, 2024

Conversation

davidwendt (Contributor)

Description

Fixes the NVTX range name used by cub::DeviceSegmentedSort.
This confused me for way too long when looking at an nsys trace.
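For context, an NVTX range is a named push/pop pair that nsys draws as a labeled bar on the timeline. Whatever string is passed when the range is opened is the only label the profiler shows, so a wrong name silently mislabels the algorithm in the trace. The sketch below is a minimal, stdlib-only stand-in for an RAII scoped range. It is not CUB's code: in real CUDA code this role would be played by something like `nvtx3::scoped_range` from the NVTX v3 C++ header `nvtx3/nvtx3.hpp`. The names `ScopedRange`, `trace_log`, `open_ranges` and `segmented_sort_keys`, and the range label itself, are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the profiler's view of the trace:
// each entry is "push:<name>" or "pop:<name>".
inline std::vector<std::string>& trace_log() {
  static std::vector<std::string> log;
  return log;
}

// Currently open ranges, innermost last.
inline std::vector<std::string>& open_ranges() {
  static std::vector<std::string> stack;
  return stack;
}

// RAII range: pushes a named range on construction and pops it on destruction,
// mirroring the push/pop behaviour of a scoped NVTX range.
class ScopedRange {
public:
  explicit ScopedRange(std::string name) : name_(std::move(name)) {
    open_ranges().push_back(name_);
    trace_log().push_back("push:" + name_);
  }
  ~ScopedRange() {
    // Ranges must close in LIFO order.
    assert(!open_ranges().empty() && open_ranges().back() == name_);
    open_ranges().pop_back();
    trace_log().push_back("pop:" + name_);
  }
  ScopedRange(const ScopedRange&) = delete;
  ScopedRange& operator=(const ScopedRange&) = delete;

private:
  std::string name_;
};

// Hypothetical entry point: the string literal is all the profiler sees, so a
// name copied from another algorithm would mislabel this call in the trace.
inline void segmented_sort_keys() {
  ScopedRange range{"cub::DeviceSegmentedSort::SortKeys"};
  // ... kernel launches would go here ...
}
```

Because the label is a plain string chosen at the call site, nothing in the compiler or the profiler checks that it matches the function it wraps; the only safeguard is reviewing the name, which is exactly the kind of mismatch this PR corrects.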

Checklist

  • I am familiar with the Contributing Guidelines
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot commented Nov 18, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


bernhardmgruber (Contributor)

Good catch, thank you for proposing the PR!

miscco enabled auto-merge (squash) on November 18, 2024 16:34

miscco (Collaborator) commented Nov 18, 2024

/ok to test

Contributor

🟩 CI finished in 2h 09m: Pass: 100%/222 | Total: 2d 04h | Avg: 14m 08s | Max: 1h 35m | Hits: 93%/16144
  • 🟩 cub: Pass: 100%/110 | Total: 1d 07h | Avg: 17m 23s | Max: 1h 35m | Hits: 87%/2964

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total:  1d 06h | Avg: 17m 46s | Max:  1h 35m | Hits:  87%/2964  
      🟩 arm64              Pass: 100%/8   | Total:  1h 40m | Avg: 12m 30s | Max: 15m 00s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  3h 19m | Avg: 13m 17s | Max: 51m 34s | Hits:  58%/741   
      🟩 11.8               Pass: 100%/3   | Total: 36m 50s | Avg: 12m 16s | Max: 13m 00s
      🟩 12.5               Pass: 100%/4   | Total:  4h 18m | Avg:  1h 04m | Max:  1h 05m
      🟩 12.6               Pass: 100%/88  | Total: 23h 38m | Avg: 16m 07s | Max:  1h 35m | Hits:  97%/2223  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 32m 37s | Avg:  8m 09s | Max:  9m 02s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  3h 19m | Avg: 13m 17s | Max: 51m 34s | Hits:  58%/741   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 36m 50s | Avg: 12m 16s | Max: 13m 00s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  4h 18m | Avg:  1h 04m | Max:  1h 05m
      🟩 nvcc12.6           Pass: 100%/84  | Total: 23h 06m | Avg: 16m 30s | Max:  1h 35m | Hits:  97%/2223  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 32m 37s | Avg:  8m 09s | Max:  9m 02s
      🟩 nvcc               Pass: 100%/106 | Total:  1d 07h | Avg: 17m 44s | Max:  1h 35m | Hits:  87%/2964  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 35m | Avg: 15m 52s | Max: 19m 12s
      🟩 Clang10            Pass: 100%/3   | Total: 51m 57s | Avg: 17m 19s | Max: 18m 16s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 03m | Avg: 15m 59s | Max: 17m 31s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 04m | Avg: 16m 07s | Max: 16m 58s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 04m | Avg: 16m 11s | Max: 17m 21s
      🟩 Clang14            Pass: 100%/4   | Total: 39m 53s | Avg:  9m 58s | Max: 11m 26s
      🟩 Clang15            Pass: 100%/4   | Total: 40m 19s | Avg: 10m 04s | Max: 11m 14s
      🟩 Clang16            Pass: 100%/4   | Total: 40m 00s | Avg: 10m 00s | Max: 11m 11s
      🟩 Clang17            Pass: 100%/4   | Total: 40m 53s | Avg: 10m 13s | Max: 10m 55s
      🟩 Clang18            Pass: 100%/11  | Total:  2h 33m | Avg: 13m 56s | Max: 33m 24s
      🟩 GCC6               Pass: 100%/2   | Total: 20m 08s | Avg: 10m 04s | Max: 10m 09s
      🟩 GCC7               Pass: 100%/6   | Total: 58m 04s | Avg:  9m 40s | Max: 11m 37s
      🟩 GCC8               Pass: 100%/6   | Total: 59m 09s | Avg:  9m 51s | Max: 10m 27s
      🟩 GCC9               Pass: 100%/6   | Total: 58m 02s | Avg:  9m 40s | Max: 10m 27s
      🟩 GCC10              Pass: 100%/4   | Total: 38m 54s | Avg:  9m 43s | Max: 10m 00s
      🟩 GCC11              Pass: 100%/7   | Total:  1h 17m | Avg: 11m 07s | Max: 13m 00s
      🟩 GCC12              Pass: 100%/4   | Total: 41m 06s | Avg: 10m 16s | Max: 10m 52s
      🟩 GCC13              Pass: 100%/16  | Total:  6h 09m | Avg: 23m 05s | Max:  1h 35m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 54m | Avg: 58m 06s | Max:  1h 00m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 34s | Avg: 51m 34s | Max: 51m 34s | Hits:  58%/741   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 33m 40s | Avg: 16m 50s | Max: 18m 04s | Hits:  97%/1482  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 18m 16s | Avg: 18m 16s | Max: 18m 16s | Hits:  97%/741   
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  4h 18m | Avg:  1h 04m | Max:  1h 05m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total: 10h 54m | Avg: 13m 38s | Max: 33m 24s
      🟩 GCC                Pass: 100%/51  | Total: 12h 02m | Avg: 14m 10s | Max:  1h 35m
      🟩 Intel              Pass: 100%/3   | Total:  2h 54m | Avg: 58m 06s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/4   | Total:  1h 43m | Avg: 25m 52s | Max: 51m 34s | Hits:  87%/2964  
      🟩 NVHPC              Pass: 100%/4   | Total:  4h 18m | Avg:  1h 04m | Max:  1h 05m
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total:  1d 07h | Avg: 17m 23s | Max:  1h 35m | Hits:  87%/2964  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total:  1d 02h | Avg: 15m 28s | Max:  1h 05m | Hits:  87%/2964  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 06s | Avg: 19m 06s | Max: 19m 06s
      🟩 GraphCapture       Pass: 100%/1   | Total:  1h 27m | Avg:  1h 27m | Max:  1h 27m
      🟩 HostLaunch         Pass: 100%/3   | Total:  2h 29m | Avg: 49m 45s | Max:  1h 35m
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 18m | Avg: 26m 14s | Max: 33m 24s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 36m 50s | Avg: 12m 16s | Max: 13m 00s
      🟩 90a                Pass: 100%/4   | Total: 25m 33s | Avg:  6m 23s | Max:  6m 32s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  8h 03m | Avg: 16m 07s | Max:  1h 05m
      🟩 14                 Pass: 100%/29  | Total:  8h 17m | Avg: 17m 09s | Max:  1h 04m | Hits:  78%/1482  
      🟩 17                 Pass: 100%/27  | Total:  6h 31m | Avg: 14m 29s | Max:  1h 02m | Hits:  97%/741   
      🟩 20                 Pass: 100%/24  | Total:  9h 01m | Avg: 22m 33s | Max:  1h 35m | Hits:  97%/741   
    
  • 🟩 thrust: Pass: 100%/109 | Total: 19h 59m | Avg: 11m 00s | Max: 1h 02m | Hits: 95%/13180

    🟩 cpu
      🟩 amd64              Pass: 100%/101 | Total: 19h 16m | Avg: 11m 26s | Max:  1h 02m | Hits:  95%/13180 
      🟩 arm64              Pass: 100%/8   | Total: 43m 12s | Avg:  5m 24s | Max:  6m 14s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  2h 54m | Avg: 11m 36s | Max: 58m 47s | Hits:  76%/2636  
      🟩 11.8               Pass: 100%/3   | Total: 18m 08s | Avg:  6m 02s | Max:  6m 34s
      🟩 12.5               Pass: 100%/4   | Total:  3h 53m | Avg: 58m 29s | Max:  1h 02m
      🟩 12.6               Pass: 100%/87  | Total: 12h 53m | Avg:  8m 53s | Max: 43m 24s | Hits:  99%/10544 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 23m 47s | Avg:  5m 56s | Max:  7m 04s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  2h 54m | Avg: 11m 36s | Max: 58m 47s | Hits:  76%/2636  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 18m 08s | Avg:  6m 02s | Max:  6m 34s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  3h 53m | Avg: 58m 29s | Max:  1h 02m
      🟩 nvcc12.6           Pass: 100%/83  | Total: 12h 29m | Avg:  9m 01s | Max: 43m 24s | Hits:  99%/10544 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 23m 47s | Avg:  5m 56s | Max:  7m 04s
      🟩 nvcc               Pass: 100%/105 | Total: 19h 35m | Avg: 11m 11s | Max:  1h 02m | Hits:  95%/13180 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 02m | Avg: 10m 20s | Max: 11m 37s
      🟩 Clang10            Pass: 100%/3   | Total: 32m 37s | Avg: 10m 52s | Max: 12m 10s
      🟩 Clang11            Pass: 100%/4   | Total: 37m 49s | Avg:  9m 27s | Max:  9m 59s
      🟩 Clang12            Pass: 100%/4   | Total: 39m 16s | Avg:  9m 49s | Max: 10m 25s
      🟩 Clang13            Pass: 100%/4   | Total: 37m 38s | Avg:  9m 24s | Max:  9m 56s
      🟩 Clang14            Pass: 100%/4   | Total: 25m 36s | Avg:  6m 24s | Max:  6m 51s
      🟩 Clang15            Pass: 100%/4   | Total: 24m 47s | Avg:  6m 11s | Max:  6m 29s
      🟩 Clang16            Pass: 100%/4   | Total: 23m 05s | Avg:  5m 46s | Max:  6m 14s
      🟩 Clang17            Pass: 100%/4   | Total: 24m 23s | Avg:  6m 05s | Max:  6m 33s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 10m | Avg:  6m 23s | Max: 12m 37s
      🟩 GCC6               Pass: 100%/2   | Total:  9m 33s | Avg:  4m 46s | Max:  4m 51s
      🟩 GCC7               Pass: 100%/6   | Total: 32m 45s | Avg:  5m 27s | Max:  6m 12s
      🟩 GCC8               Pass: 100%/6   | Total: 32m 54s | Avg:  5m 29s | Max:  6m 46s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 06m | Avg: 11m 01s | Max: 37m 31s
      🟩 GCC10              Pass: 100%/4   | Total: 23m 27s | Avg:  5m 51s | Max:  6m 11s
      🟩 GCC11              Pass: 100%/7   | Total: 43m 23s | Avg:  6m 11s | Max:  7m 16s
      🟩 GCC12              Pass: 100%/4   | Total: 27m 21s | Avg:  6m 50s | Max:  7m 28s
      🟩 GCC13              Pass: 100%/14  | Total:  1h 41m | Avg:  7m 13s | Max: 14m 50s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 00m | Avg: 40m 07s | Max: 43m 24s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 58m 47s | Avg: 58m 47s | Max: 58m 47s | Hits:  76%/2636  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 30m 50s | Avg: 15m 25s | Max: 15m 46s | Hits:  99%/5272  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 41m 25s | Avg: 20m 42s | Max: 24m 08s | Hits:  99%/5272  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  3h 53m | Avg: 58m 29s | Max:  1h 02m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  6h 17m | Avg:  7m 51s | Max: 12m 37s
      🟩 GCC                Pass: 100%/49  | Total:  5h 36m | Avg:  6m 52s | Max: 37m 31s
      🟩 Intel              Pass: 100%/3   | Total:  2h 00m | Avg: 40m 07s | Max: 43m 24s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 11m | Avg: 26m 12s | Max: 58m 47s | Hits:  95%/13180 
      🟩 NVHPC              Pass: 100%/4   | Total:  3h 53m | Avg: 58m 29s | Max:  1h 02m
    🟩 gpu
      🟩 v100               Pass: 100%/109 | Total: 19h 59m | Avg: 11m 00s | Max:  1h 02m | Hits:  95%/13180 
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total: 18h 30m | Avg: 10m 53s | Max:  1h 02m | Hits:  93%/10544 
      🟩 TestCPU            Pass: 100%/4   | Total: 48m 33s | Avg: 12m 08s | Max: 24m 08s | Hits:  99%/2636  
      🟩 TestGPU            Pass: 100%/3   | Total: 39m 55s | Avg: 13m 18s | Max: 14m 50s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 18m 08s | Avg:  6m 02s | Max:  6m 34s
      🟩 90a                Pass: 100%/4   | Total: 21m 33s | Avg:  5m 23s | Max:  5m 38s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  4h 34m | Avg:  9m 09s | Max: 52m 49s
      🟩 14                 Pass: 100%/29  | Total:  6h 19m | Avg: 13m 05s | Max:  1h 01m | Hits:  88%/5272  
      🟩 17                 Pass: 100%/27  | Total:  4h 51m | Avg: 10m 48s | Max:  1h 02m | Hits:  99%/2636  
      🟩 20                 Pass: 100%/23  | Total:  4h 13m | Avg: 11m 00s | Max: 57m 22s | Hits:  99%/5272  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 24s | Avg: 5m 12s | Max: 8m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  8m 05s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 19s | Avg:  2m 19s | Max:  2m 19s
      🟩 Test               Pass: 100%/1   | Total:  8m 05s | Avg:  8m 05s | Max:  8m 05s
    
  • 🟩 python: Pass: 100%/1 | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 09s | Avg: 15m 09s | Max: 15m 09s
    

👃 Inspect Changes

Modifications in project?

  Project                      Modified
  CCCL Infrastructure
  libcu++
  CUB                          +/-
  Thrust
  CUDA Experimental
  python
  CCCL C Parallel Library
  Catch2Helper

Modifications in project or dependencies?

  Project                      Modified
  CCCL Infrastructure
  libcu++
  CUB                          +/-
  Thrust                       +/-
  CUDA Experimental
  python                       +/-
  CCCL C Parallel Library      +/-
  Catch2Helper                 +/-

🏃‍ Runner counts (total jobs: 222)

  Count  Runner
  184    linux-amd64-cpu16
  16     linux-arm64-cpu16
  13     linux-amd64-gpu-v100-latest-1
  9      windows-amd64-cpu16

miscco merged commit 5b804f7 into NVIDIA:main on Nov 18, 2024
238 checks passed
davidwendt deleted the fix-segsort-nvtx-name branch on November 18, 2024 18:53
trxcllnt pushed a commit to trxcllnt/cccl that referenced this pull request Nov 23, 2024
Labels: None yet
Projects: Status: Done

3 participants