Currently, both ClimaAtmos and ClimaLand have cache variables that, when the models are coupled, depend on the cache of the other model. This complicates coupling: these cache variables are computed correctly only if operations happen in a particular order, and that order is currently ill-defined. Unless these orderings are handled correctly, the component models will be inconsistent with one another at the beginning of our coupled simulations.
In typical simulations we may accept this as spin-up time that we can disregard later on, but for restarts it leads to inexactness: a restarted simulation will produce different results from one that is not restarted. As we look towards long, computationally intensive runs on Derecho (where we have a walltime limit per run), we will need reliable and exact restarts.
Note that this issue does not come up when restarting individual component models, because we can first read initial conditions from forcing data (which does not rely on the model being run), then use these to compute the state and cache of the model. In this case, all cache variables depend on the known forcing data and state, so they can be computed correctly.
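As a toy illustration of the restart problem (not ClimaAtmos or ClimaLand code), the sketch below couples two one-variable "models" whose cached fluxes each read the other model's previously cached flux. All names, coefficients, and the checkpoint behavior (state is saved, cache is rebuilt on restart) are assumptions made for the example.

```python
def init_cache(state):
    """Build the caches from state alone, as happens at a (re)start."""
    return {
        "atmos_flux": 0.5 * (state["land_T"] - state["atmos_T"]),
        "land_flux": 0.5 * (state["atmos_T"] - state["land_T"]),
    }


def step(state, cache):
    """Advance both toy models one step, then refresh both caches."""
    new_state = {
        "atmos_T": state["atmos_T"] + 0.1 * cache["atmos_flux"],
        "land_T": state["land_T"] + 0.1 * cache["land_flux"],
    }
    # Each refreshed entry also reads the OTHER model's previous cache entry,
    # so the cache carries history that the state alone does not contain.
    new_cache = {
        "atmos_flux": 0.5 * (new_state["land_T"] - new_state["atmos_T"])
        + 0.1 * cache["land_flux"],
        "land_flux": 0.5 * (new_state["atmos_T"] - new_state["land_T"])
        + 0.1 * cache["atmos_flux"],
    }
    return new_state, new_cache


def run(state, n_steps):
    """Run from a state; the cache is rebuilt because checkpoints hold state only."""
    cache = init_cache(state)
    for _ in range(n_steps):
        state, cache = step(state, cache)
    return state


initial = {"atmos_T": 280.0, "land_T": 290.0}
continuous = run(initial, 10)
restarted = run(run(initial, 5), 5)  # stop after 5 steps, restart, run 5 more
print(continuous["atmos_T"] - restarted["atmos_T"])  # nonzero: restart is inexact
```

In this sketch the mismatch arises because the rebuilt cache cannot reproduce the value that the continuous run had accumulated from the other model's cache.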
This SDI essentially communicates a new guideline about the role of the cache in our models: where possible, we should remove variables from our caches and replace each access with a function that computes the value on the fly. This is better suited to running simulations on GPUs, which have strong computational ability but limited memory.
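A minimal sketch of this guideline, using a hypothetical two-variable toy model rather than the real packages: the fluxes are plain functions of the current state instead of cache entries, so nothing beyond the state has to be saved or recomputed in a particular order at restart. The function names and coefficients are assumptions for illustration.

```python
def atmos_flux(state):
    """Flux into the toy atmosphere, computed on the fly from state."""
    return 0.5 * (state["land_T"] - state["atmos_T"])


def land_flux(state):
    """Flux into the toy land model, computed on the fly from state."""
    return 0.5 * (state["atmos_T"] - state["land_T"])


def step(state):
    """Advance both toy models one step; there is no cache to refresh."""
    return {
        "atmos_T": state["atmos_T"] + 0.1 * atmos_flux(state),
        "land_T": state["land_T"] + 0.1 * land_flux(state),
    }


def run(state, n_steps):
    for _ in range(n_steps):
        state = step(state)
    return state


initial = {"atmos_T": 280.0, "land_T": 290.0}
continuous = run(initial, 10)
restarted = run(run(initial, 5), 5)  # stop after 5 steps, restart, run 5 more
print(continuous == restarted)
```

Because every derived quantity depends only on the state, the restarted run repeats exactly the same floating-point operations as the continuous one. The trade-off is that each flux is recomputed at every use instead of being read from memory.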
Cost/Benefits/Risks
Costs: developer time, potentially worse CPU performance
Benefits: simplified coupling and model structure, potentially better GPU performance
Risks: we may still end up with interdependence even after cleaning up what we can
Reduce ClimaAtmos and ClimaLand caches where possible
Identify existing interdependencies between ClimaAtmos and ClimaLand cache variables
Note that some interdependencies are inevitable because they are rooted in the physical processes being modeled (e.g. surface albedo and atmospheric radiation feedbacks). Where an ordering is necessary, we should let the physics guide the order of operations.
Identify the minimal set of interdependencies between cache variables
Identify roadblocks for coupling within the caches
Track how these changes affect performance
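One way to approach the identification tasks above is to write the dependencies down as a graph, list the edges that cross the model boundary, and ask for a valid update order. The sketch below uses Python's standard `graphlib`; the variable names and the dependencies between them are hypothetical placeholders, not an inventory of the real caches.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: "model.variable" -> variables it is computed from.
deps = {
    "land.surface_albedo": {"land.state"},
    "atmos.radiative_fluxes": {"atmos.state", "land.surface_albedo"},
    "land.surface_temperature": {"land.state", "atmos.radiative_fluxes"},
    "atmos.surface_conditions": {"atmos.state", "land.surface_temperature"},
}


def model_of(name):
    """Return the model prefix of a 'model.variable' name."""
    return name.split(".", 1)[0]


def cross_model_edges(graph):
    """List (variable, dependency) pairs that cross the model boundary."""
    return sorted(
        (var, dep)
        for var, var_deps in graph.items()
        for dep in var_deps
        if model_of(var) != model_of(dep)
    )


def update_order(graph):
    """Return an order with dependencies first, or None if there is a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


edges = cross_model_edges(deps)
order = update_order(deps)
print(edges)
print(order)

# Adding a dependency that closes a loop leaves no well-defined order.
cyclic = dict(deps)
cyclic["land.surface_albedo"] = {"land.state", "atmos.surface_conditions"}
print(update_order(cyclic))
```

The cross-model edges are the candidates to remove or to order by the physics; a cycle means that no ordering of cache updates alone can make every variable consistent.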
Inputs
Existing models and coupling
Results and Deliverables
Schematic demonstrating the interdependencies between ClimaAtmos and ClimaLand cache variables
Identified minimal set of required interdependencies
Quantitative results of performance changes on both CPU and GPU due to these changes
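As a rough sketch of how the cached and on-the-fly variants of a single quantity could be timed against each other, the example below uses Python's standard `timeit`. The albedo formula, array size, and repeat counts are assumptions for illustration; real measurements would be made on the models themselves, on both CPU and GPU.

```python
import timeit


def albedo(soil_moisture):
    """Hypothetical albedo formula, used only to have something to compute."""
    return [0.3 - 0.1 * m for m in soil_moisture]


soil_moisture = [i / 1000 for i in range(1000)]
cache = {"albedo": albedo(soil_moisture)}  # precomputed once and stored


def use_cached():
    """Read the stored values: less computation, more memory held."""
    return sum(cache["albedo"])


def use_on_the_fly():
    """Recompute the values at the point of use: more computation, no storage."""
    return sum(albedo(soil_moisture))


# Take the minimum over several repeats to reduce timing noise.
t_cached = min(timeit.repeat(use_cached, number=200, repeat=5))
t_on_the_fly = min(timeit.repeat(use_on_the_fly, number=200, repeat=5))
print(f"cached: {t_cached:.6f} s, on the fly: {t_on_the_fly:.6f} s")
```

Both variants return the same value, so the comparison isolates the cost of recomputation against the cost of storage.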
This looks necessary and valuable, and I am looking forward to seeing these changes realized. Please keep me in the loop on conceptual questions (e.g., what we should or should not cache, and questions involving the order of operations).
The Climate Modeling Alliance
Software Design Issue 📜
Purpose
People and Personnel
Lead: @juliasloan25 @Sbozzolo
Collaborators: @charleskawczynski @kmdeck @szy21 @trontrytel
Components
Inputs
Results and Deliverables
SDI Revision Log
22 Nov 2024: SDI created by @juliasloan25
CC
@tapios @sriharshakandala @charleskawczynski @cmbengue
Scope of Work
Tasks