Clarify inlining of unrolled_reduce for non-orographic gravity wave #3313
This PR replaces the call to `unrolled_reduce(op, Val(nc), init)` in the non-orographic gravity wave parametrization with a forcibly non-inlined call to `unrolled_reduce(op, StaticOneTo(nc), init)`.

The original solution was introduced in CliMA/UnrolledUtilities.jl#10 and CliMA/UnrolledUtilities.jl#12 in order to prevent compilation of 3 unit tests in CI from hanging (`nogw_test_3d.jl`, `nogw_test_mima.jl`, and `nogw_test_single_column.jl`), but it does not have a direct equivalent in the standard library. The new version of this unrolled function call, which achieves the same goal of preventing compilation from hanging, is exactly equivalent to `reduce(op, Base.OneTo(nc); init)` (and also to `reduce(op, StaticOneTo(nc); init)`). The new version also makes it clear that inlining of the unrolled function call is what causes compilation to hang in this case.

Note 1: This does not constitute a general rule of thumb, as it is possible to construct similar examples where not inlining is what causes compilation to hang. For example, if `itr` is a `StaticBitVector` that contains 256 `Bool`s, then the following code hangs without inlining of `unrolled_reduce`:

Inlining has been observed to improve compilation of several unit tests in UnrolledUtilities, whereas not inlining has only been observed to improve compilation of this one particular example in ClimaAtmos. So, UnrolledUtilities will continue to inline everything by default, and `@noinline` can be used on a case-by-case basis when compilation is too slow.

Note 2: Although the equivalent non-unrolled function call uses `init` as a keyword argument, and although `unrolled_reduce` supports using `init` as a keyword argument, this is not done here in order to ensure GPU compatibility. If the non-orographic gravity wave parameterization calls `unrolled_reduce(op, StaticOneTo(nc); init)`, the GPU compiler claims that it cannot compile the resulting kernel because it contains a dynamic function call.

@Xiaoyang-Xie, the compilation mystery is finally resolved!
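
For readers unfamiliar with the pattern, here is a minimal sketch of the replacement described above. It is an illustration under stated assumptions, not the code in this PR: `nc = 4`, `op = +`, and `init = 0` are placeholder values, and the call-site form of `@noinline` (available in Julia 1.8 and later) is assumed to be how the call is kept from inlining.

```julia
using UnrolledUtilities: unrolled_reduce, StaticOneTo

# Placeholder values, chosen only for illustration.
nc = 4
op = +
init = 0

# Call-site @noinline (Julia >= 1.8) asks the compiler not to inline this call.
# init is passed positionally, as in the description above.
unrolled_result = @noinline unrolled_reduce(op, StaticOneTo(nc), init)

# Standard-library equivalent named in the description.
base_result = reduce(op, Base.OneTo(nc); init)
```

With these placeholder values both calls sum the integers from 1 to `nc`, so the two results should agree; the sketch shows only the call shape and says nothing about compile times.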