Thread ID Cleanup, main branch (2025.01.09.) #810
Made all of them into "private headers", and added automated tests that they would fulfill the appropriate concept.
While also cleaning up the includes of the files a bit.
I generally dislike the idea that, because vecmem only supports 32-bit integers, we should downcast all our accesses in a 64-bit memory space to 32 bits at the site where those indices are generated, rather than generating them at the appropriate size and downcasting only where necessary.
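To illustrate the concern, here is a minimal host-side C++ sketch, assuming (as the comment implies) that the global index type is 32 bits wide. The `launch_coords` struct and the `global_index32`, `global_index64` and `narrow_index` helpers are hypothetical names made up for this example; they are not part of traccc or vecmem, and the struct merely stands in for CUDA's built-in `blockIdx`, `blockDim` and `threadIdx` values.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the 32-bit launch coordinates of a 1D kernel.
struct launch_coords {
    std::uint32_t block_idx;
    std::uint32_t block_dim;
    std::uint32_t thread_idx;
};

// Index generated entirely in 32-bit arithmetic: the product wraps
// around modulo 2^32 for sufficiently large launches.
inline std::uint32_t global_index32(const launch_coords& c) {
    return c.block_idx * c.block_dim + c.thread_idx;
}

// Index generated at 64 bits: the first operand is widened before the
// multiplication, so the whole expression is evaluated in 64 bits.
inline std::uint64_t global_index64(const launch_coords& c) {
    return static_cast<std::uint64_t>(c.block_idx) * c.block_dim +
           c.thread_idx;
}

// Narrowing done only at the point where a 32-bit index is really
// required, with a debug-mode check that the value fits.
inline std::uint32_t narrow_index(std::uint64_t i) {
    assert(i <= std::numeric_limits<std::uint32_t>::max());
    return static_cast<std::uint32_t>(i);
}
```

With 2^23 blocks of 1024 threads, the 32-bit version wraps around to a small value, while the 64-bit version keeps the real index and leaves the decision to narrow to the call site.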
/// Function creating a global index in a 1D CUDA kernel
__device__ inline device::global_index_t global_index1() {

    return blockIdx.x * blockDim.x + threadIdx.x;
Also, I don't think we need this; the thread identifier classes already serve this purpose. 😕
I thought about it. But note that `global_index_t` is sort of "its own thing" in the code; it's not directly tied to the `thread_id` classes. Those are still allowed to return any integer if they really need to. (That's another story by itself.)
This just seemed like a nicely readable way of expressing what we want. 🤔 In both the CUDA and SYCL code.
Following #808, I thought it would be time to clean up the rest of the device functions as well with how they receive thread identifiers.
- […] `traccc::device::global_index_t`;
- […] `static_cast`-s from these functions;
- […] `thread_id` and/or `barrier` object, now do so using constant references;
- […] `const` functions. So I didn't see much reason for using non-const references here. 🤔

After this, I went and harmonized the CUDA, SYCL and ALPAKA codes a little as well.
- Made all `thread_id` types to be private headers in their libraries;
- Added automated tests that they fulfill the `traccc::device::concepts::thread_id1` concept;
- […] `traccc::cuda::details::global_index1()` and `traccc::sycl::details::global_index(...)` as helper functions for generating `traccc::device::global_index_t` values;

While doing all of this, I tried to fix up the includes in all the touched files a bit. Since many of them were doing very questionable things. (Including way more files than necessary, hiding missing includes in some of the common device headers.)
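As a rough illustration of the const-reference point, here is a minimal C++ sketch. Everything in it is an assumption made for the example: the `global_index_t` alias only stands in for `traccc::device::global_index_t` (assumed here to be 32 bits wide), and `fake_thread_id1`, its member functions and `in_range` are hypothetical names, not the actual traccc interfaces. It also does not reproduce the `traccc::device::concepts::thread_id1` check itself.

```cpp
#include <cstdint>

// Assumption: a 32-bit alias standing in for traccc::device::global_index_t.
using global_index_t = std::uint32_t;

// Hypothetical 1D thread identifier. All of its accessors are const,
// so nothing is gained by passing it around as a non-const reference.
class fake_thread_id1 {
public:
    constexpr fake_thread_id1(global_index_t block, global_index_t block_dim,
                              global_index_t thread)
        : m_block(block), m_block_dim(block_dim), m_thread(thread) {}

    // Index of the thread inside its block.
    constexpr global_index_t local_x() const { return m_thread; }
    // Index of the thread inside the whole 1D launch.
    constexpr global_index_t global_x() const {
        return m_block * m_block_dim + m_thread;
    }

private:
    global_index_t m_block;
    global_index_t m_block_dim;
    global_index_t m_thread;
};

// Hypothetical device-function-style helper: it receives the thread
// identifier by constant reference and only calls const accessors on it.
template <typename thread_id_t>
constexpr bool in_range(const thread_id_t& tid, global_index_t n) {
    return tid.global_x() < n;
}
```

Because the helper is a template over the identifier type, the same function body could be shared by CUDA, SYCL and ALPAKA identifier types, provided they expose the same accessors.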
I'm pretty happy with these updates myself, but am interested in your opinions.