The COBAHH example uses more than 32 registers per thread even in single precision, which reduces the theoretical occupancy of the stateupdate kernel (see #266). I'm wondering if there is an easy way to reduce register usage by changing the way the code is generated. Currently, many intermediate variables are produced (the `lio` variables in the generated code). I assume this is optimized for C++ performance, i.e. to generate code with as few operations as possible. Is there a way to optimize instead for keeping as few intermediate results in registers as possible? On the GPU, reaching 100% theoretical occupancy would be much more important than reducing the number of arithmetic operations.
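To make the "more than 32 registers per thread" threshold concrete, here is a rough, register-only occupancy estimate. It is a sketch under stated assumptions, not the official CUDA occupancy calculator. It assumes 65,536 registers and at most 64 resident warps (2048 threads) per SM, and registers allocated per warp in chunks of 256. These limits are typical of, e.g., compute capability 7.0 and 8.0 GPUs. It ignores shared memory, block-size granularity and per-block limits.

```python
import math

# Assumed per-SM limits (e.g. compute capability 7.0 / 8.0); check your GPU's specs.
REGISTERS_PER_SM = 65536
MAX_WARPS_PER_SM = 64      # 2048 resident threads / 32 threads per warp
WARP_SIZE = 32
REG_ALLOC_UNIT = 256       # registers are allocated per warp in chunks of 256


def theoretical_occupancy(regs_per_thread: int) -> float:
    """Fraction of an SM's warp slots that can be filled, limited only by registers.

    Simplified model: ignores shared memory, block-size granularity and
    per-block limits, which can lower occupancy further.
    """
    # Round the per-warp register demand up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * WARP_SIZE / REG_ALLOC_UNIT) * REG_ALLOC_UNIT
    # Number of whole warps whose registers fit into the SM's register file.
    warps_by_registers = REGISTERS_PER_SM // regs_per_warp
    return min(warps_by_registers, MAX_WARPS_PER_SM) / MAX_WARPS_PER_SM


for regs in (32, 40, 64):
    print(f"{regs} registers/thread -> {theoretical_occupancy(regs):.0%} occupancy")
# 32 registers/thread -> 100% occupancy
# 40 registers/thread -> 80% occupancy
# 64 registers/thread -> 50% occupancy
```

Under these assumptions, 32 registers per thread is exactly the budget that fills all 64 warp slots (64 × 32 × 32 = 65,536). Any kernel needing more registers drops below 100% theoretical occupancy.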
Try disabling loop-invariant optimizations (in Brian2, the `codegen.loop_invariant_optimisations` preference). They make sense for C++, where constants shared by all iterations of a loop are precomputed once to reduce computation time inside the loop. They make no sense on the GPU, where each thread computes those constants anyway, and they likely increase register usage.
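The argument about where hoisting pays off can be illustrated with a toy Python model. The names, including `_lio_1` (styled after the `lio` variables), are hypothetical, and this is not Brian2's or Brian2CUDA's generated code. It applies an update v ← v + (dt/tau)·(E_L − v) to several elements and counts how often the loop-invariant scalar `dt / tau` is evaluated. In the C++-style loop it is computed once. In the GPU-style model every "thread" runs the whole kernel body, so the invariant is evaluated once per thread even when hoisted to the top of the kernel. Register pressure itself is not modeled here.

```python
def cpp_style_loop(v, dt, tau, E_L):
    """C++-style: the loop invariant dt/tau is hoisted and computed once."""
    evaluations = 0
    _lio_1 = dt / tau                # computed once for the whole loop
    evaluations += 1
    out = [vi + _lio_1 * (E_L - vi) for vi in v]
    return out, evaluations


def gpu_style_kernel(v, dt, tau, E_L):
    """GPU-style: each iteration stands in for one CUDA thread running the kernel body."""
    evaluations = 0
    out = []
    for vi in v:
        _lio_1 = dt / tau            # hoisted within the kernel, but every thread still computes it
        evaluations += 1
        out.append(vi + _lio_1 * (E_L - vi))
    return out, evaluations


v = [-70.0, -60.0, -50.0]
cpu_out, cpu_evals = cpp_style_loop(v, dt=0.1, tau=10.0, E_L=-70.0)
gpu_out, gpu_evals = gpu_style_kernel(v, dt=0.1, tau=10.0, E_L=-70.0)
print(cpu_evals, gpu_evals)  # 1 3
```

Both versions produce the same values. Only the C++-style loop saves work through hoisting. On the GPU, the hoisted value is recomputed per thread and occupies a register for the rest of the kernel body.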