Add internal wrapper for cuda driver APIs #2070

pciolkosz · 2024-07-24T23:47:25Z

Adds an internal header that loads CUDA driver API functions through the CUDA runtime.
It also adds the first few entries needed for current-context management.

Each new wrapper should be a function that loads the driver entry point with CUDAX_GET_DRIVER_FUNCTION and then calls it with the appropriate arguments.
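
A minimal sketch of that wrapper pattern, built so that it compiles without the CUDA toolkit. Everything prefixed with `mock_`, plus `get_entry_point`, `get_driver_function`, `current_context` and `ctx_push`, is hypothetical and only stands in for the real driver types, the runtime's entry point lookup and CUDAX_GET_DRIVER_FUNCTION; it is not the code from this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Stand-ins for CUresult and CUcontext so the sketch builds without the CUDA toolkit.
enum mock_result
{
  MOCK_SUCCESS             = 0,
  MOCK_ERROR_INVALID_VALUE = 1
};
using mock_context = int*;

// Hypothetical "driver" state: the most recently pushed context.
inline mock_context& current_context()
{
  static mock_context ctx = nullptr;
  return ctx;
}

// Hypothetical driver entry point with the shape of cuCtxPushCurrent.
inline mock_result mock_ctx_push_current(mock_context ctx)
{
  if (ctx == nullptr)
  {
    return MOCK_ERROR_INVALID_VALUE;
  }
  current_context() = ctx;
  return MOCK_SUCCESS;
}

// Type-erased function pointer, as handed back by a by-name lookup.
using generic_fn = void (*)();

// Hypothetical lookup standing in for the runtime's entry point query.
inline generic_fn get_entry_point(const std::string& name)
{
  static const std::unordered_map<std::string, generic_fn> table{
    {"cuCtxPushCurrent", reinterpret_cast<generic_fn>(&mock_ctx_push_current)}};
  const auto it = table.find(name);
  if (it == table.end())
  {
    throw std::runtime_error("Failed to get driver function " + name);
  }
  return it->second;
}

// Plays the role of CUDAX_GET_DRIVER_FUNCTION: look up by name, restore the real type.
template <class Fn>
Fn get_driver_function(const char* name)
{
  return reinterpret_cast<Fn>(get_entry_point(name));
}

// Calls the entry point and turns a non-success status into an exception.
template <class Fn, class... Args>
void call_driver_fn(Fn fn, const char* err_msg, Args... args)
{
  assert(fn != nullptr);
  if (fn(args...) != MOCK_SUCCESS)
  {
    throw std::runtime_error(err_msg);
  }
}

// One wrapper per driver function: load once, then forward the arguments.
inline void ctx_push(mock_context ctx)
{
  static auto driver_fn = get_driver_function<mock_result (*)(mock_context)>("cuCtxPushCurrent");
  call_driver_fn(driver_fn, "Failed to push context", ctx);
}
```

The function-local `static` means the lookup runs once per wrapper, on first use, and later calls reuse the cached pointer.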

github-actions · 2024-07-25T02:36:13Z

🟩 CI finished in 2h 47m: Pass: 100%/56 | Total: 2h 35m | Avg: 2m 46s | Max: 11m 50s | Hits: 90%/1693

🟩 cudax: Pass: 100%/55 | Total: 2h 23m | Avg: 2m 36s | Max: 8m 06s | Hits: 90%/1693

🟩 cpu
  🟩 amd64              Pass: 100%/51  | Total:  2h 14m | Avg:  2m 37s | Max:  8m 06s | Hits:  90%/1569  
  🟩 arm64              Pass: 100%/4   | Total:  9m 29s | Avg:  2m 22s | Max:  2m 43s | Hits:  90%/124   
🟩 ctk
  🟩 12.0               Pass: 100%/23  | Total:  1h 00m | Avg:  2m 38s | Max:  8m 06s | Hits:  90%/707   
  🟩 12.5               Pass: 100%/32  | Total:  1h 22m | Avg:  2m 35s | Max:  6m 31s | Hits:  91%/986   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 00m | Avg:  2m 38s | Max:  8m 06s | Hits:  90%/707   
  🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 22m | Avg:  2m 35s | Max:  6m 31s | Hits:  91%/986   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/55  | Total:  2h 23m | Avg:  2m 36s | Max:  8m 06s | Hits:  90%/1693  
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  4m 20s | Avg:  2m 10s | Max:  2m 13s | Hits:  93%/62    
  🟩 Clang10            Pass: 100%/2   | Total:  4m 16s | Avg:  2m 08s | Max:  2m 08s | Hits:  93%/62    
  🟩 Clang11            Pass: 100%/4   | Total:  7m 59s | Avg:  1m 59s | Max:  2m 10s | Hits:  93%/124   
  🟩 Clang12            Pass: 100%/4   | Total:  8m 58s | Avg:  2m 14s | Max:  2m 28s | Hits:  93%/124   
  🟩 Clang13            Pass: 100%/4   | Total:  8m 32s | Avg:  2m 08s | Max:  2m 22s | Hits:  93%/124   
  🟩 Clang14            Pass: 100%/6   | Total: 16m 30s | Avg:  2m 45s | Max:  4m 14s | Hits:  95%/186   
  🟩 Clang15            Pass: 100%/2   | Total:  4m 29s | Avg:  2m 14s | Max:  2m 18s | Hits:  93%/62    
  🟩 Clang16            Pass: 100%/6   | Total: 18m 37s | Avg:  3m 06s | Max:  4m 28s | Hits:  95%/186   
  🟩 GCC9               Pass: 100%/2   | Total:  4m 09s | Avg:  2m 04s | Max:  2m 06s | Hits:  87%/62    
  🟩 GCC10              Pass: 100%/4   | Total:  7m 46s | Avg:  1m 56s | Max:  2m 12s | Hits:  87%/124   
  🟩 GCC11              Pass: 100%/4   | Total:  8m 10s | Avg:  2m 02s | Max:  2m 11s | Hits:  87%/124   
  🟩 GCC12              Pass: 100%/12  | Total: 32m 39s | Avg:  2m 43s | Max:  4m 38s | Hits:  89%/372   
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s | Hits:  93%/31    
  🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 06s | Avg:  8m 06s | Max:  8m 06s | Hits:  60%/25    
  🟩 MSVC14.39          Pass: 100%/1   | Total:  6m 31s | Avg:  6m 31s | Max:  6m 31s | Hits:  60%/25    
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  1h 13m | Avg:  2m 27s | Max:  4m 28s | Hits:  94%/930   
  🟩 GCC                Pass: 100%/22  | Total: 52m 44s | Avg:  2m 23s | Max:  4m 38s | Hits:  88%/682   
  🟩 Intel              Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s | Hits:  93%/31    
  🟩 MSVC               Pass: 100%/2   | Total: 14m 37s | Avg:  7m 18s | Max:  8m 06s | Hits:  60%/50    
🟩 gpu
  🟩 v100               Pass: 100%/55  | Total:  2h 23m | Avg:  2m 36s | Max:  8m 06s | Hits:  90%/1693  
🟩 jobs
  🟩 Build              Pass: 100%/47  | Total:  1h 50m | Avg:  2m 21s | Max:  8m 06s | Hits:  89%/1445  
  🟩 Test               Pass: 100%/8   | Total: 33m 10s | Avg:  4m 08s | Max:  4m 38s | Hits:  96%/248   
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  1m 49s | Avg:  1m 49s | Max:  1m 49s | Hits:  87%/31    
  🟩 90a                Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s | Hits:  87%/31    
🟩 std
  🟩 17                 Pass: 100%/31  | Total:  1h 13m | Avg:  2m 22s | Max:  4m 28s | Hits:  91%/961   
  🟩 20                 Pass: 100%/24  | Total:  1h 10m | Avg:  2m 55s | Max:  8m 06s | Hits:  89%/732

🟩 pycuda: Pass: 100%/1 | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 ctk
  🟩 12.5               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 cudacxx
  🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	pycuda

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
+/-	pycuda

🏃‍ Runner counts (total jobs: 56)

#	Runner
41	`linux-amd64-cpu16`
9	`linux-amd64-gpu-v100-latest-1`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`

miscco · 2024-07-25T05:59:42Z

cudax/include/cuda/experimental/__utility/driver_api.cuh

+  if (status != CUDA_SUCCESS)
+  {
+    ::cuda::__throw_cuda_error(static_cast<cudaError_t>(status), err_msg);
+  }


Do we want something like _CCCL_TRY_CUDA_API?

It could also be a function.

This function should be more or less equivalent to _CCCL_TRY_CUDA_API; am I missing some key difference here? I would have no issue turning it into a macro instead if that's preferred.

I believe a function is "cleaner" than a macro, but the macro cannot go away, as we cannot depend on cudax.

Otherwise we would need to move the function into libcu++.

We might need two separate functions/macros, because the driver API returns CUresult and the runtime returns cudaError_t.

But these have the same values, so maybe we can add a cast to _CCCL_TRY_CUDA_API and remove this function 🤔
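
A rough sketch of the "add a cast" idea from the last comment, assuming (as that comment states) that the two status enums share numeric values. The `mock_` types and `throw_on_error` are hypothetical stand-ins for CUresult, cudaError_t and the shared check; the real macro is not reproduced here.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for CUresult and cudaError_t, which are distinct enum types.
enum mock_driver_result
{
  MOCK_DRIVER_SUCCESS             = 0,
  MOCK_DRIVER_ERROR_INVALID_VALUE = 1
};
enum mock_runtime_error
{
  mock_runtime_success       = 0,
  mock_runtime_invalid_value = 1
};

// Exception that records the numeric status alongside the message.
struct mock_cuda_error : std::runtime_error
{
  mock_cuda_error(int status, const char* msg)
      : std::runtime_error(msg)
      , code(status)
  {}
  int code;
};

// Single check for both APIs. The cast is only sound while the two enums
// keep using the same numeric value for the same error.
template <class Status>
void throw_on_error(Status status, const char* msg)
{
  assert(msg != nullptr);
  const auto as_runtime = static_cast<mock_runtime_error>(status);
  if (as_runtime != mock_runtime_success)
  {
    throw mock_cuda_error(static_cast<int>(as_runtime), msg);
  }
}
```

With one templated check, call sites for either API look the same, at the cost of relying on the value correspondence between the two enums.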

ericniebler · 2024-07-25T17:38:23Z

cudax/include/cuda/experimental/__utility/driver_api.cuh

+{
+  static auto driver_fn = CUDAX_GET_DRIVER_FUNCTION(cuCtxPushCurrent);
+  call_driver_fn(driver_fn, "Failed to push context", ctx);
+}


Why are we dynamically loading these functions instead of including <cuda.h> and linking to libcuda?

Otherwise we would need to require the -lcuda compilation flag. Loading dynamically is more in line with the current CUDA runtime, which does not require that flag. There are compatibility reasons why the current CUDA runtime does that, and we probably want the same thing.

Directly linking to libcuda.so means that any consuming library would only run on machines with the CUDA driver installed. As a result, any application with runtime logic to dispatch to CUDA vs. CPU based on hardware support would fail to load when launched on a machine without the CUDA driver.

From a build engineer's standpoint, linking to libcuda.so should never happen.

cool thanks. i knew there must be a reason. TIL
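
To illustrate the failure mode discussed in this thread, here is a small Linux-only sketch of run-time dispatch that probes for the driver library instead of linking to it. It uses plain `dlopen`/`dlsym` rather than the runtime-based lookup this PR adds, and `has_symbol` and `pick_backend` are hypothetical names. The point is only that a missing driver becomes a recoverable condition instead of a load-time failure.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Returns true if `library` can be opened at run time and exports `symbol`.
inline bool has_symbol(const char* library, const char* symbol)
{
  assert(library != nullptr && symbol != nullptr);
  void* handle = dlopen(library, RTLD_LAZY);
  if (handle == nullptr)
  {
    return false; // library not installed: report it instead of failing to start
  }
  const bool found = dlsym(handle, symbol) != nullptr;
  dlclose(handle);
  return found;
}

// Chooses a backend name based on whether the driver library is present.
inline std::string pick_backend(const char* driver_library = "libcuda.so.1")
{
  return has_symbol(driver_library, "cuInit") ? "cuda" : "cpu";
}
```

On older glibc versions this needs `-ldl` at link time. An executable linked directly against libcuda.so would never reach `pick_backend` on a machine without the driver, because the dynamic loader would reject it first.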

miscco · 2024-07-30T07:42:46Z

cudax/test/CMakeLists.txt

@@ -57,4 +57,8 @@ foreach(cn_target IN LISTS cudax_TARGETS)
    launch/configuration.cu
  )
  target_compile_options(${test_target} PRIVATE $<$<COMPILE_LANG_AND_ID:CUDA,NVIDIA>:--extended-lambda>)
+
+  Cudax_add_catch2_test(test_target misc_tests ${cn_target}


Suggested change

-  Cudax_add_catch2_test(test_target misc_tests ${cn_target}
+  cudax_add_catch2_test(test_target misc_tests ${cn_target}

github-actions · 2024-07-31T01:30:37Z

🟩 CI finished in 2h 28m: Pass: 100%/56 | Total: 2h 31m | Avg: 2m 42s | Max: 12m 40s | Hits: 97%/2408

🟩 cudax: Pass: 100%/55 | Total: 2h 19m | Avg: 2m 31s | Max: 6m 44s | Hits: 97%/2408

🟩 cpu
  🟩 amd64              Pass: 100%/51  | Total:  2h 10m | Avg:  2m 33s | Max:  6m 44s | Hits:  97%/2232  
  🟩 arm64              Pass: 100%/4   | Total:  8m 33s | Avg:  2m 08s | Max:  3m 00s | Hits:  97%/176   
🟩 ctk
  🟩 12.0               Pass: 100%/23  | Total: 59m 01s | Avg:  2m 33s | Max:  6m 24s | Hits:  97%/1006  
  🟩 12.5               Pass: 100%/32  | Total:  1h 20m | Avg:  2m 30s | Max:  6m 44s | Hits:  97%/1402  
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/23  | Total: 59m 01s | Avg:  2m 33s | Max:  6m 24s | Hits:  97%/1006  
  🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 20m | Avg:  2m 30s | Max:  6m 44s | Hits:  97%/1402  
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/55  | Total:  2h 19m | Avg:  2m 31s | Max:  6m 44s | Hits:  97%/2408  
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  4m 13s | Avg:  2m 06s | Max:  2m 07s | Hits: 100%/88    
  🟩 Clang10            Pass: 100%/2   | Total:  4m 05s | Avg:  2m 02s | Max:  2m 04s | Hits: 100%/88    
  🟩 Clang11            Pass: 100%/4   | Total:  8m 00s | Avg:  2m 00s | Max:  2m 05s | Hits: 100%/176   
  🟩 Clang12            Pass: 100%/4   | Total:  8m 12s | Avg:  2m 03s | Max:  2m 13s | Hits: 100%/176   
  🟩 Clang13            Pass: 100%/4   | Total:  8m 33s | Avg:  2m 08s | Max:  2m 14s | Hits: 100%/176   
  🟩 Clang14            Pass: 100%/6   | Total: 16m 33s | Avg:  2m 45s | Max:  4m 36s | Hits: 100%/264   
  🟩 Clang15            Pass: 100%/2   | Total:  4m 16s | Avg:  2m 08s | Max:  2m 10s | Hits: 100%/88    
  🟩 Clang16            Pass: 100%/6   | Total: 18m 48s | Avg:  3m 08s | Max:  4m 51s | Hits: 100%/264   
  🟩 GCC9               Pass: 100%/2   | Total:  3m 37s | Avg:  1m 48s | Max:  1m 52s | Hits:  95%/88    
  🟩 GCC10              Pass: 100%/4   | Total:  7m 59s | Avg:  1m 59s | Max:  2m 04s | Hits:  95%/176   
  🟩 GCC11              Pass: 100%/4   | Total:  7m 26s | Avg:  1m 51s | Max:  2m 06s | Hits:  95%/176   
  🟩 GCC12              Pass: 100%/12  | Total: 31m 52s | Avg:  2m 39s | Max:  4m 56s | Hits:  95%/528   
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 30s | Avg:  2m 30s | Max:  2m 30s | Hits: 100%/44    
  🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s | Hits:  78%/38    
  🟩 MSVC14.39          Pass: 100%/1   | Total:  6m 44s | Avg:  6m 44s | Max:  6m 44s | Hits:  78%/38    
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  1h 12m | Avg:  2m 25s | Max:  4m 51s | Hits: 100%/1320  
  🟩 GCC                Pass: 100%/22  | Total: 50m 54s | Avg:  2m 18s | Max:  4m 56s | Hits:  95%/968   
  🟩 Intel              Pass: 100%/1   | Total:  2m 30s | Avg:  2m 30s | Max:  2m 30s | Hits: 100%/44    
  🟩 MSVC               Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max:  6m 44s | Hits:  78%/76    
🟩 gpu
  🟩 v100               Pass: 100%/55  | Total:  2h 19m | Avg:  2m 31s | Max:  6m 44s | Hits:  97%/2408  
🟩 jobs
  🟩 Build              Pass: 100%/47  | Total:  1h 44m | Avg:  2m 13s | Max:  6m 44s | Hits:  97%/2056  
  🟩 Test               Pass: 100%/8   | Total: 34m 16s | Avg:  4m 17s | Max:  4m 56s | Hits:  97%/352   
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  1m 48s | Avg:  1m 48s | Max:  1m 48s | Hits:  95%/44    
  🟩 90a                Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s | Hits:  95%/44    
🟩 std
  🟩 17                 Pass: 100%/31  | Total:  1h 11m | Avg:  2m 18s | Max:  4m 21s | Hits:  98%/1364  
  🟩 20                 Pass: 100%/24  | Total:  1h 07m | Avg:  2m 49s | Max:  6m 44s | Hits:  96%/1044

🟩 pycuda: Pass: 100%/1 | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 ctk
  🟩 12.5               Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 cudacxx
  🟩 nvcc12.5           Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 12m 40s | Avg: 12m 40s | Max: 12m 40s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	pycuda

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
+/-	pycuda

🏃‍ Runner counts (total jobs: 56)

#	Runner
41	`linux-amd64-cpu16`
9	`linux-amd64-gpu-v100-latest-1`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`

* Add a header to interact with driver APIs
* Add a test for the driver API interaction
* Format
* Fix formatting

pciolkosz added 3 commits July 24, 2024 12:53

Add a header to interact with driver APIs

c123102

Add a test for the driver API interaction

8357b5a

Format

242e135

pciolkosz requested review from a team as code owners July 24, 2024 23:47

pciolkosz requested review from robertmaynard and gonidelis July 24, 2024 23:47

pciolkosz self-assigned this Jul 24, 2024

pciolkosz requested a review from ericniebler July 24, 2024 23:52

pciolkosz linked an issue Jul 25, 2024 that may be closed by this pull request

Add internal wrapper for CUDA driver APIs #2042

Closed

miscco approved these changes Jul 25, 2024

ericniebler reviewed Jul 25, 2024

miscco reviewed Jul 30, 2024

pciolkosz added 2 commits July 30, 2024 15:43

Merge branch 'main' into 2042-add-internal-wrapper-for-cuda-driver-apis

3f93e77

Fix formatting

4b7bb4c

pciolkosz merged commit 7a3dae7 into NVIDIA:main Jul 31, 2024
70 checks passed

pciolkosz added a commit to pciolkosz/cccl that referenced this pull request Aug 4, 2024

Add internal wrapper for cuda driver APIs (NVIDIA#2070)

15d0c19

* Add a header to interact with driver APIs
* Add a test for the driver API interaction
* Format
* Fix formatting

pciolkosz added a commit to pciolkosz/cccl that referenced this pull request Aug 4, 2024

Add internal wrapper for cuda driver APIs (NVIDIA#2070)

6f2de8b

* Add a header to interact with driver APIs
* Add a test for the driver API interaction
* Format
* Fix formatting
