O1.3.4a Ensure that model caches are diagnostic and avoid complicated interdependencies #1098

juliasloan25 · 2024-11-23T00:18:37Z

The Climate Modeling Alliance

Software Design Issue 📜

Purpose

Currently, both ClimaAtmos and ClimaLand have cache variables that depend on the other model's cache when coupled. This complicates coupling: the cache variables are only computed correctly if they are updated in a particular order, and that order is ill-defined. If these orderings are not handled correctly, the component models are inconsistent with each other at the beginning of our coupled simulations.

In typical simulations we can accept this as spin-up time and disregard it later on, but for restarts it leads to inexactness: a restarted simulation gives different results from one that is not restarted. As we look towards long, computationally intensive runs on Derecho (where we have a walltime limit per run), we will need reliable and exact restarts.
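To make the restart problem concrete, here is a toy sketch in Python. It is not ClimaAtmos or ClimaLand code: the variable names (`T_atmos`, `T_land`, `albedo`, `rad`) and all formulas are invented for illustration. The only assumption it encodes is the one described above, namely that one model's cache variable is computed from the other model's cached value, so the caches cannot be rebuilt from the saved state alone.

```python
def update_caches(state, cache):
    # Hypothetical coupling: the land albedo reads the radiation cached by the
    # atmosphere on the previous update; the atmosphere then reads the new albedo.
    albedo = 0.2 + 0.01 * state["T_land"] + 0.1 * cache["rad"]
    rad = (1.0 - albedo) * state["T_atmos"]
    return {"albedo": albedo, "rad": rad}


def step(state, cache, dt=0.1):
    cache = update_caches(state, cache)
    new_state = {
        "T_atmos": state["T_atmos"] - dt * cache["rad"],
        "T_land": state["T_land"] + dt * cache["rad"],
    }
    return new_state, cache


def run(state, cache, n_steps):
    for _ in range(n_steps):
        state, cache = step(state, cache)
    return state, cache


initial_state = {"T_atmos": 1.0, "T_land": 1.0}
initial_cache = {"albedo": 0.0, "rad": 0.0}  # placeholder values at startup

# Uninterrupted run of 10 steps.
continuous_state, _ = run(initial_state, initial_cache, 10)

# Restarted run: only the state is saved after 5 steps, and the cache is
# re-initialized with the same placeholders used at startup.
saved_state, _ = run(initial_state, initial_cache, 5)
restarted_state, _ = run(saved_state, initial_cache, 5)

restart_is_exact = restarted_state == continuous_state
```

In this sketch `restart_is_exact` is `False`: the first cache update after the restart sees a placeholder `rad` instead of the value the uninterrupted run had at that point.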

Note that this issue does not come up when restarting individual component models, because we can first read initial conditions from forcing data (which does not rely on the model being run), then use these to compute the state and cache of the model. In this case, all cache variables depend on the known forcing data and state, so they can be computed correctly.

This SDI essentially communicates a new guideline on the role of a cache in our models: where possible, we should remove variables from our caches and replace accesses to them with on-the-fly functional computation. This is better suited to running simulations on GPUs, which have strong computational ability but limited memory.
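As a minimal sketch of what this guideline means in practice, the Python toy below computes a diagnostic quantity from the state each time it is needed instead of storing it between steps. The `State` fields, the `surface_flux` formula, and the coefficient `k` are hypothetical and chosen only for illustration; they are not taken from either model.

```python
from dataclasses import dataclass


@dataclass
class State:
    T_atmos: float
    T_land: float


def surface_flux(state, k=0.5):
    """Diagnostic quantity: a pure function of the state (hypothetical formula)."""
    return k * (state.T_land - state.T_atmos)


def step(state, dt=0.1):
    # Computed on the fly; nothing is carried over from the previous step.
    flux = surface_flux(state)
    return State(state.T_atmos + dt * flux, state.T_land - dt * flux)


def run(state, n_steps):
    for _ in range(n_steps):
        state = step(state)
    return state


continuous = run(State(280.0, 290.0), 10)

# Restart from the saved state alone: there is no cache to reconstruct.
restarted = run(run(State(280.0, 290.0), 5), 5)
```

Because every step depends only on the state, the restarted run performs exactly the same arithmetic as the uninterrupted one, and `continuous == restarted` holds. The trade-off is the one listed under costs: `surface_flux` is recomputed at every use rather than read from memory.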

Cost/Benefits/Risks

Costs: developer time, potentially worse CPU performance
Benefits: simplified coupling and model structure, potentially better GPU performance
Risks: we may still end up with interdependence even after cleaning up what we can

People and Personnel

Lead: @juliasloan25 @Sbozzolo
Collaborators: @charleskawczynski @kmdeck @szy21 @trontrytel

Components

Reduce ClimaAtmos and ClimaLand caches where possible
Identify existing interdependencies between ClimaAtmos and ClimaLand cache variables
- Note that some interdependencies are inevitable because they are rooted in the physical processes being modeled (e.g. surface albedo and atmospheric radiation feedbacks). We should let the physics guide the order of operations when an ordering is necessary.
Identify minimal set of interdependencies between cache variables
Identify roadblocks for coupling within the caches
Track how these changes affect performance
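One possible way to approach the identification components above is to write the cache dependencies down as a graph and let a topological sort propose an update order. The Python sketch below uses the standard-library `graphlib` module; the variable names in `deps` are hypothetical placeholders, not actual ClimaAtmos or ClimaLand cache fields.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical cache variables, each mapped to the quantities it depends on.
deps = {
    "land.albedo": {"land.state"},
    "atmos.radiation": {"atmos.state", "land.albedo"},
    "land.surface_flux": {"land.state", "atmos.radiation"},
}


def model_of(name):
    # "land.albedo" -> "land"
    return name.split(".")[0]


def cross_model_edges(deps):
    """List (variable, dependency) pairs that cross the model boundary."""
    return sorted(
        (var, dep)
        for var, var_deps in deps.items()
        for dep in var_deps
        if model_of(var) != model_of(dep)
    )


def update_order(deps):
    """Return an order in which every variable comes after its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"circular cache dependency: {err.args[1]}") from err


edges = cross_model_edges(deps)
order = update_order(deps)
```

Here `edges` lists the two dependencies that cross between models, and `order` places `land.albedo` before `atmos.radiation` before `land.surface_flux`. A circular dependency raises an error instead of silently picking an order, which flags the cases where the physics has to decide.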

Inputs

Existing models and coupling

Results and Deliverables

Schematic demonstrating the interdependencies between ClimaAtmos and ClimaLand cache variables
Identified minimal set of required interdependencies
Quantitative results of performance changes on both CPU and GPU due to these changes

SDI Revision Log

22 Nov 2024: SDI created by @juliasloan25

CC

@tapios @sriharshakandala @charleskawczynski @cmbengue

Scope of Work

Tasks

Remove ClimaAtmos cache variables where possible (roadblocks are compiled in "Cache elimination limitations", ClimaAtmos.jl#3454)
Remove ClimaLand cache variables where possible
Identify existing interdependencies between ClimaAtmos and ClimaLand cache variables
Identify minimal set of interdependencies between cache variables
Track GPU + CPU performance as caches change

tapios · 2024-11-23T18:30:21Z

This looks necessary and valuable and I am looking forward to seeing these changes being realized. Please keep me in the loop on conceptual questions (e.g., what we should cache or not, and questions involving order of operation).

juliasloan25 added the 🏅 SDI Software Design Issue label Nov 23, 2024

juliasloan25 assigned juliasloan25, trontrytel, charleskawczynski, Sbozzolo, szy21 and kmdeck Nov 23, 2024
