Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL Track Finding, main branch (2024.11.14.) #773

Closed

Conversation

krasznaa
Copy link
Member

This is finally a CKF algorithm for SYCL. 🥳

Still some technical stuff to set up, I just wanted to open it already, so people would be aware. 😄 (I'll open at least one PR off of this first, which should go in before this one.)

@krasznaa krasznaa added the sycl Changes related to SYCL label Nov 15, 2024
@@ -36,7 +36,7 @@ struct apply_interaction_payload {
* @brief View object to the vector of boolean-like integers describing
* whether each parameter is live. Has the same size as \ref params_view
*/
vecmem::data::vector_view<const unsigned int> params_liveness_view;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this change? CUDA doesn't natively support 8-bit loads so I'm a bit worried about the performance implications of this. Also is there a reason to use char and not unsigned char?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use these values as bools. Using 32-bits where we only need 1, seems very silly.

Of course bool doesn't work. 😦 But our convention in the offline code is to use char when we need "boolean information", but bool can't be used.

Also, take this into account: https://github.com/acts-project/traccc/blob/main/device/cuda/src/finding/finding_algorithm.cu#L154

What do you think is actually getting set for this buffer with that operation? 😏 Because it's not 0x1 values in the unsigned int variables...

In any case, I can't see why we shouldn't go for this. Even if NVIDIA always copies at least 16 bits, right now we move 32 bits in all cases. Even though we only need 1. If some of the loads are next to each other, this could still win us a little bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think is actually getting set for this buffer with that operation? 😏 Because it's not 0x1 values in the unsigned int variables...

It's setting a non-zero value; I don't see the problem?

Copy link
Member

@stephenswat stephenswat Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I can't see why we shouldn't go for this. Even if NVIDIA always copies at least 16 bits, right now we move 32 bits in all cases. Even though we only need 1. If some of the loads are next to each other, this could still win us a little bit.

Think of what happens if four adjacent threads want to write their chars to global memory at the same time.

I seriously do not see why we need to change this to save what boils down to 240 kilobytes...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically the code worked. But you can't argue that we're not spending more time with memsetting these values, and using more global memory on it, then we need to. 🤔

Comment on lines +17 to +39
unsigned int inline TRACCC_DEVICE getLocalThreadId() const {
return threadIdx.x;
}

std::size_t inline TRACCC_DEVICE getLocalThreadIdX() const {
unsigned int inline TRACCC_DEVICE getLocalThreadIdX() const {
return threadIdx.x;
}

std::size_t inline TRACCC_DEVICE getGlobalThreadId() const {
unsigned int inline TRACCC_DEVICE getGlobalThreadId() const {
return threadIdx.x + blockIdx.x * blockDim.x;
}

std::size_t inline TRACCC_DEVICE getGlobalThreadIdX() const {
unsigned int inline TRACCC_DEVICE getGlobalThreadIdX() const {
return threadIdx.x + blockIdx.x * blockDim.x;
}

std::size_t inline TRACCC_DEVICE getBlockIdX() const { return blockIdx.x; }
unsigned int inline TRACCC_DEVICE getBlockIdX() const { return blockIdx.x; }

std::size_t inline TRACCC_DEVICE getBlockDimX() const { return blockDim.x; }
unsigned int inline TRACCC_DEVICE getBlockDimX() const {
return blockDim.x;
}

std::size_t inline TRACCC_DEVICE getGridDimX() const { return gridDim.x; }
unsigned int inline TRACCC_DEVICE getGridDimX() const { return gridDim.x; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? Why? 😕

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all unsigned int-s natively. If we return them as std::size_t, I'll have to add a whole lot of static_cast-s in the kernels to make sure that we wouldn't use unsigned long-s where unsigned int is expected.

As it turns out, our CUDA build does not look for type conversions / narrowings. oneAPI does. So I got a lot of warnings from such things.

Copy link
Member

@stephenswat stephenswat Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but why not just add the static_casts then? getGlobalThreadIdX() can return a number larger than an unsigned integer can store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I am not sure why the OneAPI compiler even sees a type that is supposed to be used only in CUDA code. That's worrying. 😟

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernels were/are receiving std::size_t values for the thread ID. And then they use those values in a lot of functions that expect unsigned int values. That's where I was getting the warnings from.

But as long as I'm modifying the signature of the traccc::device:: functions, I thought I'd also have a look at what's going on in the CUDA code... 🤔

return threadIdx.x + blockIdx.x * blockDim.x;
}

std::size_t inline TRACCC_DEVICE getGlobalThreadIdX() const {
unsigned int inline TRACCC_DEVICE getGlobalThreadIdX() const {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think it would be fair to put in an assertion into this function that this value would not be larger than unsigned int. At the same time, notice that the calculation on line 30 works with unsigned int itself. If the result of that overflows, the std::size_t return type will not do anything to help. 🤔

@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from 0e3fddc to 2bacf76 Compare November 16, 2024 07:11
@krasznaa
Copy link
Member Author

Unfortunately the situation is that the AMD backend of oneAPI doesn't seem to work with the code. 😦 I now tried a number of incantations, but linking libtrraccc_sycl.so always fails in some way for that backend. 😦

Of course I've been developing the code with the NVIDIA backend. Where it compiles/links fine. (Note that there are no runtime tests in the code just yet.) With the Intel backend one needs to be a bit careful as well, but at least the build can succeed with that one. But with the AMD backend there seem to be some bigger issues. 😢

...
[build] lld: error: undefined symbol: llvm.amdgcn.readfirstlane.i32
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-6b3473.out.lto.o:(typeinfo name for traccc::sycl::kernels::fill_prefix_sum)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-6b3473.out.lto.o:(typeinfo name for traccc::sycl::kernels::fill_prefix_sum)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-6b3473.out.lto.o:(typeinfo name for traccc::sycl::kernels::fill_prefix_sum)
[build] >>> referenced 45 more times
[build] llvm-foreach: 
[build] lld: error: undefined symbol: llvm.amdgcn.readfirstlane.i32
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-1227a4.out.lto.o:(typeinfo name for traccc::sycl::kernels::estimate_track_params)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-1227a4.out.lto.o:(typeinfo name for traccc::sycl::kernels::estimate_track_params)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-1227a4.out.lto.o:(typeinfo name for traccc::sycl::kernels::estimate_track_params)
[build] >>> referenced 225 more times
[build] llvm-foreach: 
[build] lld: error: undefined symbol: llvm.amdgcn.readfirstlane.i32
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-b47285.out.lto.o:(typeinfo name for traccc::sycl::kernels::count_grid_capacities)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-b47285.out.lto.o:(typeinfo name for traccc::sycl::kernels::count_grid_capacities)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-b47285.out.lto.o:(typeinfo name for traccc::sycl::kernels::count_grid_capacities)
[build] >>> referenced 189 more times
[build] llvm-foreach: 
[build] lld: error: undefined symbol: llvm.amdgcn.readfirstlane.i32
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-b9db0d.out.lto.o:(typeinfo name for vecmem::data::vector_buffer<traccc::spacepoint> traccc::sycl::details::silicon_pixel_spacepoint_formation<detray::detector<detray::telescope_metadata<detray::rectangle2D>, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>>(detray::detector<detray::telescope_metadata<detray::rectangle2D>, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>::view_type const&, vecmem::data::vector_view<traccc::measurement const> const&, std::pmr::memory_resource&, vecmem::copy&, sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>))
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-b9db0d.out.lto.o:(typeinfo name for vecmem::data::vector_buffer<traccc::spacepoint> traccc::sycl::details::silicon_pixel_spacepoint_formation<detray::detector<detray::telescope_metadata<detray::rectangle2D>, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>>(detray::detector<detray::telescope_metadata<detray::rectangle2D>, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>::view_type const&, vecmem::data::vector_view<traccc::measurement const> const&, std::pmr::memory_resource&, vecmem::copy&, sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>))
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-b9db0d.out.lto.o:(typeinfo name for vecmem::data::vector_buffer<traccc::spacepoint> traccc::sycl::details::silicon_pixel_spacepoint_formation<detray::detector<detray::telescope_metadata<detray::rectangle2D>, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>>(detray::detector<detray::telescope_metadata<detray::rectangle2D>, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>::view_type const&, vecmem::data::vector_view<traccc::measurement const> const&, std::pmr::memory_resource&, vecmem::copy&, sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>))
[build] >>> referenced 189 more times
[build] llvm-foreach: 
[build] lld: error: undefined symbol: llvm.amdgcn.readfirstlane.i32
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-7fca83.out.lto.o:(typeinfo name for vecmem::data::vector_buffer<traccc::spacepoint> traccc::sycl::details::silicon_pixel_spacepoint_formation<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>>(detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>::view_type const&, vecmem::data::vector_view<traccc::measurement const> const&, std::pmr::memory_resource&, vecmem::copy&, sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>))
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-7fca83.out.lto.o:(typeinfo name for vecmem::data::vector_buffer<traccc::spacepoint> traccc::sycl::details::silicon_pixel_spacepoint_formation<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>>(detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>::view_type const&, vecmem::data::vector_view<traccc::measurement const> const&, std::pmr::memory_resource&, vecmem::copy&, sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>))
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-7fca83.out.lto.o:(typeinfo name for vecmem::data::vector_buffer<traccc::spacepoint> traccc::sycl::details::silicon_pixel_spacepoint_formation<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>>(detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap>>::view_type const&, vecmem::data::vector_view<traccc::measurement const> const&, std::pmr::memory_resource&, vecmem::copy&, sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>))
[build] >>> referenced 189 more times
[build] llvm-foreach: 
[build] lld: error: undefined symbol: llvm.amdgcn.readfirstlane.i32
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-9efc36.out.lto.o:(typeinfo name for traccc::sycl::kernels::count_doublets)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-9efc36.out.lto.o:(typeinfo name for traccc::sycl::kernels::count_doublets)
[build] >>> referenced by /tmp/silicon_pixel_spacepoint_formation_algorithm-gfx803-29d251-9efc36.out.lto.o:(typeinfo name for traccc::sycl::kernels::count_doublets)
[build] >>> referenced 1269 more times
...

@krasznaa
Copy link
Member Author

Some interesting stuff going on. 🤔 After fixing yet one more bug in the common track finding code, I can now get "this far":

[bash][Celeborn]:out > ./build/sycl-fp32/bin/traccc_test_sycl --gtest_filter="*Ckf*"
Running main() from /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *Ckf*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0 (489 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1
WARNING: No entries in volume finder

Detector check: OK
unknown file: Failure
C++ exception with description "Exceeded the number of registers available on the hardware.
        The number registers per work-group cannot exceed 65536 for this kernel on this device.
        The kernel uses 245 registers per work-item for a total of 512 work-items per work-group.
 -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)" thrown in the test body.

[  FAILED  ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1, where GetParam() = ("toy_n_particles_10000", { 0, 0, 0 }, { 0, 0, 0 }, { 1, 100 }, { -4, 4 }, { -3.14159274, 3.14159274 }, 12-byte object <0D-00 00-00 6C-63 D8-3D 00-00 80-BF>, 10000, 1, false) (5304 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2
WARNING: No entries in volume finder

Detector check: OK
unknown file: Failure
C++ exception with description "Exceeded the number of registers available on the hardware.
        The number registers per work-group cannot exceed 65536 for this kernel on this device.
        The kernel uses 245 registers per work-item for a total of 512 work-items per work-group.
 -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)" thrown in the test body.

[  FAILED  ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2, where GetParam() = ("toy_n_particles_10000_random_charge", { 0, 0, 0 }, { 0, 0, 0 }, { 1, 100 }, { -4, 4 }, { -3.14159274, 3.14159274 }, 12-byte object <0D-00 00-00 6C-63 D8-3D 00-00 80-BF>, 10000, 1, true) (4559 ms)
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests (10353 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (10354 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1, where GetParam() = ("toy_n_particles_10000", { 0, 0, 0 }, { 0, 0, 0 }, { 1, 100 }, { -4, 4 }, { -3.14159274, 3.14159274 }, 12-byte object <0D-00 00-00 6C-63 D8-3D 00-00 80-BF>, 10000, 1, false)
[  FAILED  ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2, where GetParam() = ("toy_n_particles_10000_random_charge", { 0, 0, 0 }, { 0, 0, 0 }, { 1, 100 }, { -4, 4 }, { -3.14159274, 3.14159274 }, 12-byte object <0D-00 00-00 6C-63 D8-3D 00-00 80-BF>, 10000, 1, true)

 2 FAILED TESTS
[bash][Celeborn]:out >

Surprisingly, we run out of registers on this piece of code:

                oneapi::dpl::sort_by_key(policy, keys_device.begin(),
                                         keys_device.end(),
                                         param_ids_device.begin());

Which I'm trying to get some help with in: uxlfoundation/oneDPL#1936

Track finding is hard... 😦

@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from ed91591 to 665ea4a Compare November 20, 2024 20:33
@krasznaa
Copy link
Member Author

🎆 The unit test now succeeds with the NVIDIA backend! 🎆

[bash][Legolas]:out > ./build/sycl-fp32/bin/traccc_test_sycl --gtest_filter="*Ckf*"
Running main() from /data/ssd-1tb/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *Ckf*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0 (321 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1 (4181 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2 (4162 ms)
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests (8665 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (8665 ms total)
[  PASSED  ] 3 tests.
[bash][Legolas]:out >

But the PR still probably needs to stay open until we switch to oneAPI 2025.0.0 in the CI, to make the Intel backended build work as well. 🤔

@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch 4 times, most recently from 89717e4 to 8504f7f Compare November 29, 2024 19:45
@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from 8504f7f to 24241b9 Compare December 2, 2024 13:37
@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from 24241b9 to f3de0c6 Compare December 3, 2024 08:21
@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from f3de0c6 to 65ffad6 Compare December 3, 2024 12:15
@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from d5ee90c to 8f9b751 Compare December 6, 2024 12:49
@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch 3 times, most recently from 032f844 to 6486a22 Compare December 11, 2024 13:21
@krasznaa
Copy link
Member Author

Let me catch people up. The SYCL code seems to be functional finally, now that I fixed the very-very silly mistake that I made while translating it from CUDA. I can now reliably run the CKF unit test on our NVIDIA and AMD GPUs.

[bash][pcadp04]:traccc > ./build-sycl-nvidia/bin/traccc_test_sycl --gtest_filter="*Ckf*"
Running main() from /mnt/hdd1/krasznaa/projects/traccc/build-sycl-nvidia/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *Ckf*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0
Created test queue on device: NVIDIA RTX A5000
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0 (895 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1
Created test queue on device: NVIDIA RTX A5000
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1 (4840 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2
Created test queue on device: NVIDIA RTX A5000
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2 (4821 ms)
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests (10557 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (10557 ms total)
[  PASSED  ] 3 tests.
[bash][pcadp04]:traccc >
[bash][pcadp04]:traccc > ./build-sycl-amd/bin/traccc_test_sycl --gtest_filter="*Ckf*"
Running main() from /mnt/hdd1/krasznaa/projects/traccc/build-sycl-amd/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *Ckf*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0
Created test queue on device: AMD Radeon RX 6700 XT
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0 (816 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1
Created test queue on device: AMD Radeon RX 6700 XT
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1 (4972 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2
Created test queue on device: AMD Radeon RX 6700 XT
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2 (4927 ms)
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests (10717 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (10717 ms total)
[  PASSED  ] 3 tests.
[bash][pcadp04]:traccc >

I can also run the unit test successfully on a CPU.

[bash][pcadp04]:traccc > ONEAPI_DEVICE_SELECTOR=opencl:cpu ./build-sycl-intel/bin/traccc_test_sycl --gtest_filter="*Ckf*"
Running main() from /mnt/hdd1/krasznaa/projects/traccc/build-sycl-intel/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *Ckf*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0
Created test queue on device: AMD EPYC 7413 24-Core Processor
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0 (1027 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1
Created test queue on device: AMD EPYC 7413 24-Core Processor
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1 (5204 ms)
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2
Created test queue on device: AMD EPYC 7413 24-Core Processor
WARNING: No entries in volume finder

Detector check: OK
[       OK ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/2 (5110 ms)
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests (11342 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (11342 ms total)
[  PASSED  ] 3 tests.
[bash][pcadp04]:traccc >

But unfortunately the SYCL runtime is having a very hard time with JIT-ing the track finding code for our Intel GPU. 😦

[bash][pcadp04]:traccc > ONEAPI_DEVICE_SELECTOR=opencl:gpu ./build-sycl-intel/bin/traccc_test_sycl --gtest_filter="*Ckf*"
Running main() from /mnt/hdd1/krasznaa/projects/traccc/build-sycl-intel/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *Ckf*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from SYCLCkfToyDetectorValidation/CkfToyDetectorTests
[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/0
Created test queue on device: Intel(R) Data Center GPU Flex 170
WARNING: No entries in volume finder

Detector check: OK

(It is just stuck at this point, spinning the CPU at full tilt.) This unfortunately was to be expected. 😦 Since last week, when I tried to ahead-of-time compile the code specifically for our Intel GPU, the build had to be shut down after 3 hours. (It was still trying to link libtraccc_sycl.so after such a long time.)

So we'll have to have some heart-to-heart with the Intel developers. But at least the other backends seem functional by now. 🥳

@krasznaa
Copy link
Member Author

Just to show... After >1 hour of sitting on linking libtraccc_sycl.so, I see:

image

I'll keep the build running, let's see if it ever finishes... 🤔

The code doesn't do anything, and is not used by anything yet.
Made the "track liveness" buffers use char instead of unsigned int.
Since they store boolean information, char is enough. And it also
makes memset(...) do a more expected thing on the buffers.

Updated device::make_barcode_sequence not to narrow std::size_t
into unsigned int. (Which oneAPI doesn't like.)
Mainly to avoid type conversion / narrowing in the code, but also
to simplify it slightly in some places.
Trying to avoid confusion at runtime about which kernel is which.
Just as a convenience method for seeing in the SYCL unit tests which
exact device is used.
Taught traccc::sycl::test_queue how to figure out what sort
of a queue it is. So that the CKF test could be skipped on
OpenCL and Level-0 backends as long as those are still not
working.
@krasznaa krasznaa force-pushed the SYCLTrackFinding-main-20241114 branch from 8155284 to 65fd176 Compare January 7, 2025 10:35
Copy link

sonarqubecloud bot commented Jan 7, 2025

@krasznaa
Copy link
Member Author

krasznaa commented Jan 9, 2025

Replaced by #811.

@krasznaa krasznaa closed this Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sycl Changes related to SYCL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants