Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GPU compilation bugs that required val_unrolled_reduce workaround #18

Merged
merged 1 commit into from
Oct 22, 2024

Conversation

dennisYatunin
Copy link
Member

@dennisYatunin dennisYatunin commented Oct 18, 2024

After experimenting in CliMA/ClimaAtmos.jl#3313, I've found that two changes are needed to remove the val_unrolled_reduce workaround for GPUs: removal of internal function calls with keyword arguments and forced specialization on function arguments.

The first of these changes also seems to marginally decrease the amount of time and memory required for compilation of some unit tests, with the decrease most clearly noticeable in the Very Long Iterators comparison table. This confirms a long-standing suspicion that function calls with keyword arguments make compilation of low-level kernels more difficult.

I'm somewhat surprised that the second of these changes is necessary, given that everything is already inlined and it makes no observable difference for compilation on CPUs. Just to be on the safe side, though, I've added forced specialization for every function exported by this package.

Hopefully these changes will be enough to make UnrolledUtilities fully type-stable on GPUs.

Copy link
Member

@charleskawczynski charleskawczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine by me. I'm curious if the necessary piece was specialization, removing the keyword, or both

@dennisYatunin
Copy link
Member Author

dennisYatunin commented Oct 22, 2024

The necessary piece is the specialization; without it the GPU gravity wave example in ClimaAtmos would encounter a type instability during compilation. The kwarg change came from two observations: GPU compilation would encounter a type instability when I used the kwarg method for unrolled_reduce instead of the non-kwarg method in the gravity wave example, and on average many of the unit tests in this package took slightly longer to compile on CPUs when kwarg methods were used in internal calls to unrolled_reduce. So, the kwarg change is intended to fix future GPU compilation errors, rather than any current issue.

@dennisYatunin dennisYatunin merged commit 8fd1653 into main Oct 22, 2024
9 checks passed
@dennisYatunin dennisYatunin deleted the dy/gpu_fixes branch October 22, 2024 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants