Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify inlining of unrolled_reduce for non-orographic gravity wave #3313

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

dennisYatunin
Copy link
Member

@dennisYatunin dennisYatunin commented Sep 18, 2024

This PR replaces the call to unrolled_reduce(op, Val(nc), init) in the non-orographic gravity wave parametrization with a forcibly non-inlined call to unrolled_reduce(op, StaticOneTo(nc), init).

The original solution was introduced in CliMA/UnrolledUtilities.jl#10 and CliMA/UnrolledUtilities.jl#12 in order to prevent compilation of 3 unit tests in CI from hanging (nogw_test_3d.jl, nogw_test_mima.jl, and nogw_test_single_column.jl), but it does not have a direct equivalent in the standard library. The new version of this unrolled function call, which achieves the same goal of preventing compilation from hanging, is exactly equivalent to reduce(op, Base.OneTo(nc); init) (and also to reduce(op, StaticOneTo(nc); init)). The new version also makes it clear that inlining of the unrolled function call is what causes compilation to hang in this case.

Note 1: This does not constitute a general rule of thumb, as it is possible to construct similar examples where not inlining is what causes compilation to hang. For example, if itr is a StaticBitVector that contains 256 Bools, then the following code hangs without inlining of unrolled_reduce:

unrolled_reduce(((itr′, i)->unrolled_reduce(((itr′′, j)->unrolled_reduce(((itr′′′, k)->Base.setindex(itr′′′, !(itr′′′[min(i, j, k)]), k)), StaticOneTo(length(itr′′)); init = itr′′)), StaticOneTo(length(itr′)); init = itr′)), StaticOneTo(length(itr)); init = itr)

Inlining has been observed to improve compilation of several unit tests in UnrolledUtilities, whereas not inlining has only been observed to improve compilation of this one particular example in ClimaAtmos. So, UnrolledUtilities will continue to inline everything by default, and @noinline can be used on a case-by-case basis when compilation is too slow.

Note 2: Although the equivalent non-unrolled function call uses init as a keyword argument, and although unrolled_reduce supports using init as a keyword argument, this is not done here in order to ensure GPU compatibility. If the non-orographic gravity wave parameterization calls unrolled_reduce(op, StaticOneTo(nc); init), the GPU compiler claims that it cannot compile the resulting kernel because it contains a dynamic function call.

@Xiaoyang-Xie, the compilation mystery is finally resolved!

@dennisYatunin
Copy link
Member Author

Looks like CliMA/UnrolledUtilities.jl#13 didn't help with the compilation hanging issue. I will now try out CliMA/UnrolledUtilities.jl#15 instead.

@dennisYatunin
Copy link
Member Author

CliMA/UnrolledUtilities.jl#15 also didn't fix the issue. Now trying out CliMA/UnrolledUtilities.jl#16. We might need to combine these two PRs before we have a solution, but maybe this will work on its own.

@dennisYatunin dennisYatunin force-pushed the dy/unrolled_inlining branch 8 times, most recently from 0753c18 to f98fb70 Compare October 16, 2024 23:36
@dennisYatunin dennisYatunin marked this pull request as ready for review October 16, 2024 23:36
@dennisYatunin dennisYatunin changed the title Try using new UnrolledUtilities branch Clarify inlining of unrolled_reduce for non-orographic gravity wave Oct 17, 2024
@dennisYatunin dennisYatunin added this pull request to the merge queue Oct 23, 2024
Merged via the queue into main with commit 09f7aff Oct 23, 2024
15 of 16 checks passed
@dennisYatunin dennisYatunin deleted the dy/unrolled_inlining branch October 23, 2024 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant