Fix GPU compilation bugs that required val_unrolled_reduce workaround #18
After experimenting in CliMA/ClimaAtmos.jl#3313, I've found that two changes are needed to remove the `val_unrolled_reduce` workaround for GPUs: removal of internal function calls with keyword arguments and forced specialization on function arguments.

The first of these changes also seems to marginally decrease the amount of time and memory required for compilation of some unit tests, with the decrease most clearly noticeable in the "Very Long Iterators" comparison table. This confirms a long-standing suspicion that function calls with keyword arguments make compilation of low-level kernels more difficult.

I'm somewhat surprised that the second of these changes is necessary, given that everything is already inlined and it makes no observable difference for compilation on CPUs. Just to be on the safe side, though, I've added forced specialization for every function exported by this package.

Hopefully these changes will be enough to make UnrolledUtilities fully type-stable on GPUs.
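
For readers unfamiliar with the two patterns, here is a minimal sketch of what they look like in Julia. The function names (`mapreduce_kw`, `mapreduce_pos`, and their helpers) are hypothetical and chosen only for illustration; this is not the actual code changed in this PR. The "before" version passes `init` through an internal keyword argument and leaves its function arguments unannotated; the "after" version uses positional arguments only and binds each function argument to a type parameter, which is the documented way to force Julia to specialize on it.

```julia
# Hypothetical "before" version: the internal helper is called with a
# keyword argument, and the function arguments `f` and `op` are only passed
# through to other calls, so Julia's heuristics may skip specializing on them.
_reduce_kw(op, itr::Tuple; init) =
    isempty(itr) ? init :
    _reduce_kw(op, Base.tail(itr); init = op(init, first(itr)))

mapreduce_kw(f, op, itr::Tuple; init) =
    _reduce_kw(op, map(f, itr); init = init)

# Hypothetical "after" version: no internal keyword arguments, and every
# function argument has a type parameter (`::F`, `::O`) to force
# specialization on the concrete function type.
@inline _reduce_pos(op::O, init, itr::Tuple{}) where {O} = init
@inline _reduce_pos(op::O, init, itr::Tuple) where {O} =
    _reduce_pos(op, op(init, first(itr)), Base.tail(itr))

@inline mapreduce_pos(f::F, op::O, init, itr::Tuple) where {F, O} =
    _reduce_pos(op, init, map(f, itr))

# Both versions compute the same value; they differ only in how the
# compiler is asked to handle the calls.
result_kw = mapreduce_kw(abs2, +, (1, 2, 3); init = 0)
result_pos = mapreduce_pos(abs2, +, 0, (1, 2, 3))
```

On CPUs the two versions will likely infer and compile to the same thing once inlined, which matches the observation above; the sketch only shows the shape of the rewrite, not a reproduction of the GPU failure.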