
Pangu Improvements #656

Open · wants to merge 12 commits into main

Conversation

@dallasfoster (Collaborator) commented Aug 27, 2024

Modulus Pull Request

Description

This PR adds the following features/changes to the Pangu model and training script:

  1. Configurable number of constant, surface, and atmosphere variables in the model.
  2. Configurable number of upsampled and downsampled transformer blocks.
  3. Gradient checkpointing support in the Pangu processor (encoder/decoder) layers (see the sketch after this list).
  4. An improved training script with better static-capture support, multistep rollout, a validation function, and a weighted loss function.
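
As a rough illustration of item 3, the sketch below shows one generic way to apply gradient checkpointing to a stack of transformer blocks with torch.utils.checkpoint; the class and argument names are placeholders, not the Pangu processor's actual code.

    import torch
    from torch.utils.checkpoint import checkpoint

    class CheckpointedBlocks(torch.nn.Module):
        # Placeholder stack of blocks; the real Pangu encoder/decoder layers differ.
        def __init__(self, blocks, use_checkpoint=True):
            super().__init__()
            self.blocks = torch.nn.ModuleList(blocks)
            self.use_checkpoint = use_checkpoint

        def forward(self, x):
            for block in self.blocks:
                if self.use_checkpoint and self.training:
                    # Recompute this block's activations during the backward pass
                    # instead of storing them, trading extra compute for lower memory.
                    x = checkpoint(block, x, use_reentrant=False)
                else:
                    x = block(x)
            return x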

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

@dallasfoster self-assigned this Aug 27, 2024

@dallasfoster (Collaborator Author): /blossom-ci (posted five times)

@dallasfoster (Collaborator Author): Depends on #660


### Changed

- Refactored CorrDiff training recipe for improved usability
- Refactored Pangu model for better extensibility and gradient checkpointing support.
  Some of these changes are not backward compatible.

Collaborator (review comment):
Perhaps comment on what specifically is not backward compatible? Is it just the removal of the prepare_input routine from the Pangu model?

outpred = my_model(invar_)
loss += loss_func(outpred, outvar[b : b + 1, t], weights) / batch_size
invar_ = outpred

Collaborator (review comment):

Is there a reason this inner multistep loop cannot be run with batch_size > 1? It would be nice to support batched rollout training if it could fit in memory.
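
To make the suggestion concrete, here is a minimal sketch of what a batched rollout could look like, reusing the names from the quoted snippet above; the tensor shapes and the num_steps argument are assumptions, not the script's actual interface.

    import torch

    def batched_rollout_loss(my_model, loss_func, invar, outvar, weights, num_steps):
        # invar:  (B, C, H, W) initial states for the whole batch
        # outvar: (B, T, C, H, W) ground-truth targets for the T rollout steps
        invar_ = invar
        loss = torch.zeros((), device=invar.device)
        for t in range(num_steps):
            outpred = my_model(invar_)                    # advance every sample one step
            loss = loss + loss_func(outpred, outvar[:, t], weights)
            invar_ = outpred                              # autoregressive feedback
        return loss / num_steps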

@@ -203,122 +262,197 @@ def main(cfg: DictConfig) -> None:
)
torch.cuda.current_stream().wait_stream(ddps)

# pangu_model = torch.compile(pangu_model, mode = "max-autotune")

Collaborator (review comment):

Can drop these commented-out .compile statements if they're unused.

import torch

from ..layers import DownSample3D, FuserLayer, UpSample3D
from ..module import Module

Collaborator (review comment):
Prefer direct imports here

@pzharrington (Collaborator)

I added some minor suggestions; overall this looks great, though. I found the LambdaLR scheme with the custom Hydra resolver a bit convoluted; maybe a ConstantLR would be simpler and more readable while achieving the same effect. However, it is nice to have the example in there if someone wants to do more custom scheduling.
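
For reference, a minimal side-by-side of the two scheduler options mentioned above; the model, optimizer, and numeric values are placeholders for illustration, not the recipe's actual configuration.

    import torch
    from torch.optim.lr_scheduler import ConstantLR, LambdaLR

    # Placeholder model/optimizers purely for illustration.
    model = torch.nn.Linear(4, 4)

    # LambdaLR: the per-step LR factor comes from an arbitrary user-supplied callable.
    opt_a = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched_a = LambdaLR(opt_a, lr_lambda=lambda step: 1.0)  # flat schedule via a lambda

    # ConstantLR: scales the base LR by `factor` for the first `total_iters` steps,
    # then keeps the base LR -- simpler when a (near-)constant LR is all that is needed.
    opt_b = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched_b = ConstantLR(opt_b, factor=0.5, total_iters=1000)

    for _ in range(3):   # stand-in for the training loop
        opt_a.step()
        sched_a.step()
        opt_b.step()
        sched_b.step()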
