-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move MPI and CUDA to Julia extensions #75
Conversation
fd63a69
to
cab5160
Compare
a6b83c7
to
0e711e9
Compare
0e711e9
to
b0d754e
Compare
3204aa9
to
3bf2e6f
Compare
3bf2e6f
to
71d8fa7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
c95e994
to
0a816f7
Compare
I added a second commit to try to output more informative log/error messages. @charleskawczynski could you have a second look? I also tried testing this in ClimaCore/ClimaAtmos, but it is a big pain. ClimaComms has tendrils everywhere, and bumping to 0.6 makes updating really tricky because of circular dependencies of the test environments. To run with ClimaAtmos, I had to check out all the branches and dev this way:
I also had to comment out I couldn't get this to work on buildkite because I need to dev everything at the same time and I couldn't make that work. I tried with ClimaAtmos on 2 gpus on the cluster with:
And this works (adjusting one thing for the new remapper) I couldn't perform a more comprehesive test, but things are looking code from the interactive tests I've performed. So, I think we should release ClimaComms 0.6 and move the various other packages one by one. For ClimaUtilities, ClimaTimesteppers, RRTMGP, ClimaCoreTempestRemap the update is easy and non breaking, so we should move them first. Then, we can probably move to ClimaCore and release 0.14, and re-update ClimaUtilities, ClimaTimeSteppers, RRTMGP, ClimaCoreTempestRemap to support ClimaCore 0.14. Finally, we can update ClimaCoupler, ClimaAtmos, and ClimaLand to ClimaCore 0.14 and ClimaComms 0.6 |
Mostly so that we can print a log message informing the user we loaded MPI/CUDA
0a816f7
to
53b3b9f
Compare
Every package that depends on ClimaComms depends on CUDA and MPI. As a result, user of dowstream packages have to obtain CUDA and MPI even if they don't cannot or don't want to use either of the two.
This PR moves CUDA and MPI to extensions, opening the path to move support for CUDA and MPI in downstream packages to extensions too.
The benefits of using extensions are reduced instantiation and load time (eg, in every GitHub Action runner), but also establishes a pattern of supporting different backends/accelerators in a more modular way instead of entangling everything together. The downside is that extensions are not complete and there are things that cannot be done at the moment. One example is defining new types in the main module (without
evaling
). This is manifest in this PR, where I pushed the definition ofMPISendRecvGraphContext
andMPIPersistentSendRecvGraphContext
to the extension. A second downside is that extensions are a little brittle right now. For example, there can be errors in precompilation. That said, supporting backends like CUDA is precisely what extensioins were designed for.Closes #74
Closes #4
Closes #69