
[CUDA][E2E] bindless_images/read_norm_types.cpp failed on SYCL Nightly #13281

Open
uditagarwal97 opened this issue Apr 4, 2024 · 4 comments
Labels
bug (Something isn't working), cuda (CUDA back-end), sycl-bindless-images (SYCL Bindless Images)

Comments

@uditagarwal97
Contributor

Describe the bug

The E2E test bindless_images/read_norm_types.cpp fails on the CUDA back-end in the SYCL Nightly run.
Refer: https://github.com/intel/llvm/actions/runs/8548669532/job/23442092835

2024-04-04T03:48:41.2565096Z FAIL: SYCL :: bindless_images/read_norm_types.cpp (1910 of 1967)
2024-04-04T03:48:41.2566644Z ******************** TEST 'SYCL :: bindless_images/read_norm_types.cpp' FAILED ********************
2024-04-04T03:48:41.2567397Z Exit Code: 1
2024-04-04T03:48:41.2567596Z 
2024-04-04T03:48:41.2567761Z Command Output (stdout):
2024-04-04T03:48:41.2568176Z --
2024-04-04T03:48:41.2568489Z # RUN: at line 4
2024-04-04T03:48:41.2571279Z /__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/read_norm_types.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2575803Z # executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/read_norm_types.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2596880Z # note: command had no output on stdout or stderr
2024-04-04T03:48:41.2597862Z # RUN: at line 5
2024-04-04T03:48:41.2609779Z env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu  /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2613354Z # executed command: env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2615612Z # .---command stderr------------
2024-04-04T03:48:41.2616385Z # | Some test cases failed
2024-04-04T03:48:41.2617158Z # `-----------------------------
2024-04-04T03:48:41.2617953Z # error: command failed with exit status: 1
2024-04-04T03:48:41.2618584Z 
2024-04-04T03:48:41.2618842Z --

To reproduce

DPC++ version: fe44bad

Environment

sycl-ls --verbose output:

2024-04-04T03:22:44.5499901Z ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
2024-04-04T03:22:44.5501510Z [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5502289Z 
2024-04-04T03:22:44.5502372Z Platforms: 1
2024-04-04T03:22:44.5502585Z Platform [#1]:
2024-04-04T03:22:44.5503010Z     Version  : CUDA 12.1
2024-04-04T03:22:44.5503546Z     Name     : NVIDIA CUDA BACKEND
2024-04-04T03:22:44.5504233Z     Vendor   : NVIDIA Corporation
2024-04-04T03:22:44.5504845Z     Devices  : 1
2024-04-04T03:22:44.5505458Z         Device [#0]:
2024-04-04T03:22:44.5506177Z         Type       : gpu
2024-04-04T03:22:44.5506842Z         Version    : 8.6
2024-04-04T03:22:44.5507577Z         Name       : NVIDIA GeForce RTX 3090
2024-04-04T03:22:44.5508469Z         Vendor     : NVIDIA Corporation
2024-04-04T03:22:44.5509292Z         Driver     : CUDA 12.1
2024-04-04T03:22:44.5517816Z         Aspects    : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph
2024-04-04T03:22:44.5536144Z         info::device::sub_group_sizes: 32
2024-04-04T03:22:44.5537445Z default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5539394Z accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5541078Z cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5542629Z gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5544340Z custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5545985Z custom_selector(cpu)    : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5546682Z custom_selector(acc)    : No device of requested type available. -1 (PI_ERRO...

Additional context

No response

uditagarwal97 added the bug (Something isn't working) and cuda (CUDA back-end) labels Apr 4, 2024
@uditagarwal97
Contributor Author

@JackAKirk FYI

Seanst98 added the sycl-bindless-images (SYCL Bindless Images) label Apr 4, 2024
@Seanst98
Contributor

Seanst98 commented Apr 4, 2024

Thank you for this report. We're aware of the issue and are tracking it internally.

@Seanst98
Contributor

For now, we've XFAILed the test so that it doesn't affect CI while we continue to investigate the issue. See #13425.

@Seanst98
Contributor

Seanst98 commented May 9, 2024

Please see this PR, which fixes the test failure.

Calling get_global_id(x) on the nd_item with x >= NDims (a dimension index the item does not have) causes the kernel to crash. Moving each get_global_id(x) call inside the if constexpr branches that select on the dimension fixes the issue.

Note: the SYCL spec does not define exactly what should happen when get_global_id(x) is called with x >= NDims, so I have opened an issue with the Khronos SYCL-Docs project.
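To illustrate the fix described above, here is a minimal host-only C++17 sketch of the pattern. It deliberately does not use SYCL: MockNdItem, linear_index_unguarded and linear_index_guarded are hypothetical names, the mock throws std::out_of_range where the real CUDA kernel crashed, and the x/y/z linearization is an assumption for illustration, not the actual code from read_norm_types.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical host-side stand-in for sycl::nd_item<Dims>. A Dims-dimensional
// item only has valid dimension indices 0 .. Dims-1; anything else throws here
// (on the CUDA device, the equivalent out-of-range query crashed the kernel).
template <int Dims>
struct MockNdItem {
  std::array<std::size_t, Dims> global_id;

  std::size_t get_global_id(int dim) const {
    if (dim < 0 || dim >= Dims) {
      throw std::out_of_range("get_global_id: dimension index out of range");
    }
    return global_id[dim];
  }
};

// Problematic pattern: every id is queried up front, whatever Dims is.
template <int Dims>
std::size_t linear_index_unguarded(const MockNdItem<Dims>& item,
                                   [[maybe_unused]] std::size_t width,
                                   [[maybe_unused]] std::size_t height) {
  std::size_t x = item.get_global_id(0);
  std::size_t y = item.get_global_id(1);  // out of range when Dims == 1
  std::size_t z = item.get_global_id(2);  // out of range when Dims < 3
  if constexpr (Dims == 1) {
    return x;
  } else if constexpr (Dims == 2) {
    return y * width + x;
  } else {
    return (z * height + y) * width + x;
  }
}

// Fixed pattern: each id is queried only inside the if constexpr branch for a
// dimensionality that actually has that index.
template <int Dims>
std::size_t linear_index_guarded(const MockNdItem<Dims>& item,
                                 [[maybe_unused]] std::size_t width,
                                 [[maybe_unused]] std::size_t height) {
  static_assert(Dims >= 1 && Dims <= 3, "images are 1D, 2D or 3D");
  if constexpr (Dims == 1) {
    return item.get_global_id(0);
  } else if constexpr (Dims == 2) {
    std::size_t x = item.get_global_id(0);
    std::size_t y = item.get_global_id(1);
    return y * width + x;
  } else {
    std::size_t x = item.get_global_id(0);
    std::size_t y = item.get_global_id(1);
    std::size_t z = item.get_global_id(2);
    return (z * height + y) * width + x;
  }
}
```

In real SYCL code the same shape applies to sycl::nd_item<NDims>::get_global_id(int): because the branches are if constexpr, the out-of-range queries are never even instantiated for lower-dimensional kernels.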

martygrant pushed a commit that referenced this issue May 10, 2024
Fix the read_norm_types test failure by moving the calculation of work
item ids inside dimension selecting constexprs. get_global_id(x) where x
is greater than NDims in the nd_item causes the kernel to crash.

Remove XFAIL from the test.

Also, fixup the test as it was not sampling the whole image correctly or
validating every element in the image.

Addresses issue: #13281

2 participants