Showing 20 changed files with 4,025 additions and 237 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,61 +1,38 @@ | ||
# ChaosBench - A benchmark for long-term forecasting of chaotic systems | ||
ChaosBench is a benchmark project to improve long-term forecasting of chaotic systems, in particular subseasonal-to-seasonal (S2S) weather. Current features include: | ||
|
||
## 1. Benchmark and Dataset | ||
|
||
- __Input:__ ERA5 Reanalysis (1979-2022) | ||
|
||
- __Target:__ The following table lists the 48 variables (channels) that are available for physics-based models. Note that the __Input__ ERA5 observations contain __ALL__ fields, including the unchecked boxes: | ||
|
||
Parameters/Levels (hPa) | 1000 | 925 | 850 | 700 | 500 | 300 | 200 | 100 | 50 | 10 | ||
:---------------------- | :----| :---| :---| :---| :---| :---| :---| :---| :--| :-| | ||
Geopotential height, z ($gpm$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
Specific humidity, q ($kg kg^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | ||
Temperature, t ($K$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
U component of wind, u ($ms^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
V component of wind, v ($ms^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
Vertical velocity, w ($Pas^{-1}$) | | | | | ✓ | | | | | | | ||
|
||
- __Baselines:__ | ||
- Physics-based models: | ||
- [x] UKMO: UK Meteorological Office | ||
- [x] NCEP: National Centers for Environmental Prediction | ||
- [x] CMA: China Meteorological Administration | ||
- [x] ECMWF: European Centre for Medium-Range Weather Forecasts | ||
- Data-driven models: | ||
- [x] Lagged-Autoencoder | ||
- [x] Fourier Neural Operator (FNO) | ||
- [x] ResNet | ||
- [x] UNet | ||
- [x] ViT/ClimaX | ||
- [x] PanguWeather | ||
- [x] FourCastNetV2 | ||
|
||
## 2. Metrics | ||
We divide our metrics into two classes: (1) vision-based (ML) metrics, which cover the evaluation used in conventional computer vision and forecasting tasks, and (2) physics-based metrics, which aim to produce more physically faithful and explainable data-driven forecasts. | ||
|
||
- __Vision-based:__ | ||
- [x] RMSE | ||
- [x] Bias | ||
- [x] Anomaly Correlation Coefficient (ACC) | ||
- [x] Multiscale Structural Similarity Index (MS-SSIM) | ||
- __Physics-based:__ | ||
- [x] Spectral Divergence (SpecDiv) | ||
- [x] Spectral Residual (SpecRes) | ||
|
||
|
||
## 3. Tasks | ||
We present two tasks. In both, the model still takes the __FULL__ 60 variables as input, but the benchmarking is done on either __ALL__ or a __SUBSET__ of the target variable(s). | ||
|
||
- __Task 1: Full Dynamics Prediction.__ | ||
It targets __ALL__ channels simultaneously. This task is generally harder but is useful for building a model that emulates the entire set of weather conditions. | ||
|
||
- __Task 2: Sparse Dynamics Prediction.__ | ||
It targets a __SUBSET__ of channel(s). This task is useful for building long-term forecasting models for specific variables, such as near-surface temperature (t-1000) or near-surface humidity (q-1000). | ||
|
||
## 4. Getting Started | ||
You can learn more about how to use our benchmark through the Jupyter notebooks under the `notebooks` directory. They cover the following topics: | ||
- `01*_dataset_exploration` | ||
- `02*_modeling` | ||
- `03*_training` | ||
- `04*_evaluation` | ||
# ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction | ||
|
||
|
||
ChaosBench is a benchmark project to improve long-term forecasting of chaotic systems, in particular subseasonal-to-seasonal (S2S) climate, using ML approaches. | ||
|
||
Homepage 🔗: https://leap-stc.github.io/ChaosBench | ||
|
||
Paper 📚: https://arxiv.org/ | ||
|
||
Dataset 🤗: https://huggingface.co/datasets/juannat7/ChaosBench | ||
|
||
|
||
## Features | ||
|
||
[Figure: Overview of ChaosBench] | ||
|
||
1️⃣ __Extended Observations__. Spanning 45 years (1979-2023) of ERA5 reanalysis | ||
|
||
2️⃣ __Diverse Baselines__. Wide selection of physics-based forecasts from leading national agencies in Europe, the UK, America, and Asia | ||
|
||
3️⃣ __Differentiable Physics Metrics__. Introduces two differentiable physics-based metrics to minimize the decay of power spectra at long forecasting horizons (blurriness) | ||
|
||
4️⃣ __Large-Scale Benchmarking__. Systematic evaluation of state-of-the-art ML-based weather models such as PanguWeather, FourCastNetV2, ViT/ClimaX, and GraphCast | ||
|
||
|
||
## Getting Started | ||
- [Quickstart](https://leap-stc.github.io/ChaosBench/quickstart.html) | ||
- [Dataset Overview](https://leap-stc.github.io/ChaosBench/dataset.html) | ||
- [Task Overview](https://leap-stc.github.io/ChaosBench/task.html) | ||
|
||
|
||
## Build Your Own Model | ||
- [Training](https://leap-stc.github.io/ChaosBench/training.html) | ||
- [Evaluation](https://leap-stc.github.io/ChaosBench/evaluation.html) | ||
|
||
## Benchmarking | ||
- [Baseline Models](https://leap-stc.github.io/ChaosBench/baseline.html) | ||
- [Leaderboard](https://leap-stc.github.io/ChaosBench/leaderboard.html) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# Baseline Models | ||
We differentiate between physics-based and data-driven models. The former are illustrated in the figure below. | ||
|
||
<div style="text-align: center;"> | ||
<img src="../docs/scheme/chaosbench_scheme-physics-model.jpg" style="width:300px;"/> | ||
</div> | ||
|
||
## Model Definition | ||
- __Physics-Based Models__: | ||
- [x] UKMO: UK Meteorological Office | ||
- [x] NCEP: National Centers for Environmental Prediction | ||
- [x] CMA: China Meteorological Administration | ||
- [x] ECMWF: European Centre for Medium-Range Weather Forecasts | ||
|
||
- __Data-Driven Models__: | ||
- [x] Lagged-Autoencoder | ||
- [x] Fourier Neural Operator (FNO) | ||
- [x] ResNet | ||
- [x] UNet | ||
- [x] ViT/ClimaX | ||
- [x] PanguWeather | ||
- [x] FourCastNetV2 | ||
- [x] GraphCast | ||
|
||
## Model Checkpoints | ||
Checkpoints for data-driven models are accessible from [here](https://huggingface.co/datasets/juannat7/ChaosBench/tree/main/logs) | ||
|
||
- Data-driven models are indicated by the `_s2s` suffix (e.g., `unet_s2s`). | ||
|
||
- The hyperparameter specifications are located in `version_xx/lightning_logs/hparams.yaml`. The hyperparameters encode the following: | ||
|
||
- `lead_time` (default: 1): arbitrary lead time $\Delta T$ used to fine-tune the model (direct approach) | ||
- `n_step` (default: 1): number of autoregressive steps, s (autoregressive approach) | ||
- `only_headline`: if false, optimize for Task 1; if true, optimize for Task 2 | ||
- `batch_size`: the batch size used for training | ||
- `train_years`: list of years used for training | ||
- `val_years`: list of years used for validation | ||
- `epochs`: number of training epochs | ||
- `input_size`: number of input channels | ||
- `learning_rate`: size of the update step at each iteration | ||
- `model_name`: the name of the model used for consistency | ||
- `num_workers`: number of workers used in the dataloader | ||
- `output_size`: number of output channels | ||
- `t_max`: number of cosine learning rate scheduler cycle | ||
|
||
__NOTE__: For each data-driven model, there are 4 checkpoints: | ||
|
||
1. Version 0 - Task 1; autoregressive up to 1 day ahead | ||
2. Version 1 - Task 1; autoregressive up to 5 days ahead | ||
3. Version 2 - Task 2; autoregressive up to 1 day ahead | ||
4. Version 3 - Task 2; autoregressive up to 5 days ahead | ||
|
||
Only `unet_s2s` has additional checkpoints. These are used to compare the `direct` and `autoregressive` training approaches described in the paper. In particular, the `direct` models have the following version numbers: | ||
1. Version {0, 4, 5, 6, 7, 8, 9, 10, 11, 12} - Task 1 | ||
2. Version {2, 13, 14, 15, 16, 17, 18, 19, 20, 21} - Task 2 | ||
|
||
Each element of a set corresponds, in order, to the checkpoint optimized for one $\Delta T \in \{1, 5, 10, 15, 20, 25, 30, 35, 40, 44\}$.
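
The correspondence between version numbers and lead times can be written as a small lookup. This is only an illustrative sketch: it assumes the version numbers are listed in the same order as the $\Delta T$ values, and the names `LEAD_TIMES`, `DIRECT_VERSIONS`, and `direct_version` are hypothetical helpers, not part of the ChaosBench code.

```python
# Lead times (in days) used for the direct `unet_s2s` checkpoints.
LEAD_TIMES = [1, 5, 10, 15, 20, 25, 30, 35, 40, 44]

# Version numbers per task, copied from the lists above.
DIRECT_VERSIONS = {
    1: [0, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    2: [2, 13, 14, 15, 16, 17, 18, 19, 20, 21],
}


def direct_version(task_num: int, lead_time: int) -> int:
    """Return the checkpoint version for a task and lead time.

    Assumption: versions are listed in the same order as LEAD_TIMES.
    """
    versions = DIRECT_VERSIONS[task_num]
    return versions[LEAD_TIMES.index(lead_time)]


print(direct_version(1, 1))   # Task 1, 1-day lead
print(direct_version(2, 44))  # Task 2, 44-day lead
```

An unknown lead time raises `ValueError` from `list.index`, which makes a mistyped $\Delta T$ fail loudly instead of silently selecting the wrong checkpoint.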
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Dataset Information | ||
|
||
> __NOTE__: Hands-on exploration of the ChaosBench dataset in `notebooks/01a_s2s_data_exploration.ipynb` | ||
1. __Input:__ ERA5 Reanalysis (1979-2023) | ||
|
||
2. __Target:__ The following table lists the 48 variables (channels) that are available for physics-based models. Note that the __Input__ ERA5 observations contain __ALL__ fields, including the unchecked boxes: | ||
|
||
Parameters/Levels (hPa) | 1000 | 925 | 850 | 700 | 500 | 300 | 200 | 100 | 50 | 10 | ||
:---------------------- | :----| :---| :---| :---| :---| :---| :---| :---| :--| :-| | ||
Geopotential height, z ($gpm$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
Specific humidity, q ($kg kg^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | ||
Temperature, t ($K$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
U component of wind, u ($ms^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
V component of wind, v ($ms^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | ||
Vertical velocity, w ($Pas^{-1}$) | | | | | ✓ | | | | | | | ||
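
To make the channel counts concrete, the table can be enumerated in a few lines of Python. This is a sketch under assumptions: it uses the `<parameter>-<level>` naming that appears elsewhere in these docs (e.g. `z-500`), the ordering of channels is arbitrary, and `PHYSICS_LEVELS` and `channel_names` are hypothetical names rather than ChaosBench identifiers.

```python
# Pressure levels (hPa), in the column order of the table above.
LEVELS = [1000, 925, 850, 700, 500, 300, 200, 100, 50, 10]

# Levels with a checked box for physics-based models, per parameter.
PHYSICS_LEVELS = {
    "z": LEVELS,
    "q": [1000, 925, 850, 700, 500, 300, 200],
    "t": LEVELS,
    "u": LEVELS,
    "v": LEVELS,
    "w": [500],
}


def channel_names(levels_by_param):
    """Flatten {parameter: levels} into names such as 'z-500'."""
    return [f"{p}-{lvl}" for p, lvls in levels_by_param.items() for lvl in lvls]


physics_channels = channel_names(PHYSICS_LEVELS)
# ERA5 input contains every parameter at every level.
era5_channels = channel_names({p: LEVELS for p in PHYSICS_LEVELS})

print(len(physics_channels), len(era5_channels))  # 48 60
```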
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Evaluation | ||
|
||
After training your model, you can evaluate it by running: | ||
|
||
1. __Autoregressive__ | ||
``` | ||
python eval_iter.py --model_name <YOUR_MODEL>_s2s --eval_years 2023 --version_num <VERSION_NUM> | ||
``` | ||
|
||
2. __Direct__ | ||
``` | ||
python eval_direct.py --model_name <YOUR_MODEL>_s2s --eval_years 2023 --version_nums <VERSION_NUM> --task_num <TASK_NUM> | ||
``` | ||
|
||
Where `<VERSION_NUM(S)>` corresponds to the version(s) that `pytorch_lightning` generated during training. | ||
|
||
__For example__, in our `unet_s2s` baseline model, we can run: | ||
|
||
- Autoregressive: `python eval_iter.py --model_name unet_s2s --eval_years 2023 --version_num 0` | ||
|
||
- Direct: `python eval_direct.py --model_name unet_s2s --eval_years 2023 --version_nums 0 4 5 6 7 8 9 10 11 12 --task_num 1` | ||
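
If you launch evaluations from a script, the direct command can be assembled programmatically. The sketch below only uses the flags shown above. The helper name `eval_direct_command` is hypothetical, and the function merely builds the argument list without running anything.

```python
import shlex


def eval_direct_command(model_name, eval_years, version_nums, task_num):
    """Build the argument list for the documented `eval_direct.py` call."""
    return [
        "python", "eval_direct.py",
        "--model_name", model_name,
        "--eval_years", str(eval_years),
        "--version_nums", *map(str, version_nums),
        "--task_num", str(task_num),
    ]


cmd = eval_direct_command("unet_s2s", 2023, [0, 4, 5, 6, 7, 8, 9, 10, 11, 12], 1)
print(shlex.join(cmd))
# To execute from the repository root: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a model name ever contains special characters.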
|
||
|
||
## Accessing Baseline Scores | ||
You can access the complete scores (in `.csv` format) for data-driven models, physics-based models, climatology, and persistence [here](https://huggingface.co/datasets/juannat7/ChaosBench/tree/main/logs). Below is a snippet from `logs/climatology/eval/rmse_climatology.csv`, where each row holds the `<METRIC>` (here `RMSE`) at one future timestep and each column is a channel. | ||
|
||
| z-10 | z-50 | z-100 | z-200 | z-300 | ... | w-1000 | | ||
|----------|----------|----------|----------|----------|-----|----------| | ||
| 539.7944 | 285.9499 | 215.14742| 186.43161| 166.28784| ... | 0.07912156| | ||
| 538.9591 | 285.43832| 214.82317| 186.23743| 166.16902| ... | 0.07907272| | ||
| 538.1366 | 284.96063| 214.51791| 186.04941| 166.04732| ... | 0.07903882| | ||
| ... | ... | ... | ... | ... | ... | ... | |
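
A score file of this shape can be read with the standard library. The sketch below is a minimal illustration, assuming the file is comma-separated with a header row of channel names and no extra index column (the actual files may differ). The sample values are copied from the snippet above, and `load_scores` is a hypothetical helper.

```python
import csv
import io

# Three channels and three timesteps copied from the snippet above.
SAMPLE = """z-10,z-50,z-100
539.7944,285.9499,215.14742
538.9591,285.43832,214.82317
538.1366,284.96063,214.51791
"""


def load_scores(fileobj):
    """Return {channel: [score at timestep 1, timestep 2, ...]}."""
    reader = csv.DictReader(fileobj)
    scores = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            scores[name].append(float(value))
    return scores


scores = load_scores(io.StringIO(SAMPLE))
print(scores["z-10"])  # scores for z-10, one entry per timestep
```

For a downloaded file, replace the `io.StringIO` object with `open("logs/climatology/eval/rmse_climatology.csv", newline="")`.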
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Leaderboard | ||
|
||
We divide our metrics into two classes: (1) vision-based (ML) metrics, which cover the evaluation used in conventional computer vision and forecasting tasks, and (2) physics-based metrics, which aim to produce more physically faithful and explainable data-driven forecasts. | ||
|
||
1. __Vision-based:__ | ||
- [x] RMSE | ||
- [x] Bias | ||
- [x] Anomaly Correlation Coefficient (ACC) | ||
- [x] Multiscale Structural Similarity Index (MS-SSIM) | ||
2. __Physics-based:__ | ||
- [x] Spectral Divergence (SpecDiv) | ||
- [x] Spectral Residual (SpecRes) | ||
|
||
|
||
Every model (data-driven, physics-based, etc.) has a folder named `eval/`. It contains one `.csv` file per metric (e.g., SpecDiv, RMSE). Each file contains scores for all channels in question (e.g., all 60 for Task 1, an arbitrary n for Task 2, or 48 for physics-based models) across a 44-day lead time.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Quickstart | ||
|
||
**Step 1**: Clone the [ChaosBench](https://github.com/leap-stc/ChaosBench) GitHub repository | ||
|
||
**Step 2**: Create a local directory to store your data, e.g., | ||
``` | ||
cd ChaosBench | ||
mkdir data | ||
``` | ||
|
||
**Step 3**: Navigate to `chaosbench/config.py` and change the field `DATA_DIR = /<YOUR_WORKING_DIR>/ChaosBench/data` (_Provide absolute path_) | ||
|
||
**Step 4**: Initialize the space by running | ||
``` | ||
cd /<YOUR_WORKING_DIR>/ChaosBench/data/ | ||
wget https://huggingface.co/datasets/juannat7/ChaosBench/resolve/main/process.sh | ||
chmod +x process.sh | ||
``` | ||
**Step 5**: Download the data | ||
|
||
``` | ||
# NOTE: you can also run each line one at a time to retrieve individual datasets | ||
./process.sh era5 # Required: For input ERA5 data | ||
./process.sh climatology # Required: For climatology | ||
./process.sh ukmo # Optional: For simulation from UKMO | ||
./process.sh ncep # Optional: For simulation from NCEP | ||
./process.sh cma # Optional: For simulation from CMA | ||
./process.sh ecmwf # Optional: For simulation from ECMWF | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Task Overview | ||
|
||
We present __TWO__ tasks. In both, the model still takes as __inputs the FULL__ 60 variables, but the benchmarking __targets ALL or a SUBSET__ of the variable(s). | ||
|
||
1. __Task 1️⃣: Full Dynamics Prediction.__ | ||
It targets __ALL__ channels simultaneously. This task is generally harder but is useful for building a model that emulates the entire set of weather conditions. | ||
|
||
2. __Task 2️⃣: Sparse Dynamics Prediction.__ | ||
It targets a __SUBSET__ of channel(s). This task is useful for building long-term forecasting models for specific variables, such as near-surface temperature (t-1000) or near-surface humidity (q-1000). | ||
|
||
__NOTE__: Before training your own model (see the [training instructions](https://leap-stc.github.io/ChaosBench/training.html)), you can specify the task you are optimizing for by changing the `only_headline` field in the `chaosbench/configs/<YOUR_MODEL>_s2s.yaml` file: | ||
|
||
- Task 1️⃣: `only_headline: False` | ||
|
||
- Task 2️⃣: `only_headline: True`. By default, it optimizes on {t-850, z-500, q-700}. To change this, modify the `HEADLINE_VARS` field in `chaosbench/config.py` | ||
|
||
In addition, we also provide flags to train the model either __autoregressively__ or __directly__. | ||
|
||
- Autoregressive: Uses the current output as the next model input. The number of iterative steps is defined in the `n_step: <N_STEP>` field. For our baselines, we set `N_STEP = 5`. | ||
|
||
- Direct: Directly targets a specific time in the future. The lead time is specified in the `lead_time: <LEAD_TIME>` field. Ensure that `n_step: 1` in this case. For our baselines, we set `<LEAD_TIME>` $\in \{1, 5, 10, 15, 20, 25, 30, 35, 40, 44\}$.
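
The difference between the two approaches is easiest to see in how a forecast is produced. The following toy sketch is not ChaosBench code: the "models" are stand-in functions that simply halve each channel per day, and `autoregressive_rollout` and `direct_forecast` are hypothetical names used only to illustrate the control flow.

```python
from typing import Callable, List

State = List[float]


def autoregressive_rollout(step: Callable[[State], State], x0: State, n_step: int) -> List[State]:
    """Apply a one-step model repeatedly, feeding each output back as input."""
    states = []
    x = x0
    for _ in range(n_step):
        x = step(x)
        states.append(x)
    return states


def direct_forecast(model_at_lead: Callable[[State], State], x0: State) -> State:
    """Apply a model trained for one fixed lead time in a single call."""
    return model_at_lead(x0)


# Toy stand-ins for trained networks: each "day" halves every channel.
def one_day_model(x: State) -> State:
    return [0.5 * v for v in x]


def five_day_model(x: State) -> State:
    return [0.5 ** 5 * v for v in x]


x0 = [32.0, 64.0]
rollout = autoregressive_rollout(one_day_model, x0, n_step=5)
direct = direct_forecast(five_day_model, x0)
print(rollout[-1], direct)  # both reach the 5-day state
```

In the autoregressive case every intermediate day is available (and errors can accumulate across steps), while the direct case needs one separately trained model per lead time.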
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Training | ||
|
||
> __NOTE__: Hands-on modeling and training workflow in `notebooks/02a_s2s_modeling.ipynb` and `notebooks/03a_s2s_train.ipynb` | ||
This page outlines how to implement your own data-driven model. Several examples, including ED, FNO, ResNet, and UNet, are provided. | ||
|
||
**Step 1**: Define your model class in `chaosbench/models/<YOUR_MODEL>.py`. At present, we only support models built with `PyTorch` | ||
|
||
**Step 2**: Initialize your model in `chaosbench/models/model.py`, in the `__init__` method of the `S2SBenchmarkModel` class | ||
|
||
**Step 3**: Write a configuration file in `chaosbench/configs/<YOUR_MODEL>_s2s.yaml`. We recommend reading the definitions of the [hyperparameters](https://leap-stc.github.io/ChaosBench/baseline.html) and the different [tasks](https://leap-stc.github.io/ChaosBench/task.html) before training. Also set `model_name: <YOUR_MODEL>_s2s` to ensure correct pathing | ||
|
||
- Task 1️⃣ (autoregressive): `only_headline: False ; n_step: <N_STEP>` | ||
- Task 1️⃣ (direct): `only_headline: False ; n_step: 1 ; lead_time: <LEAD_TIME>` | ||
|
||
- Task 2️⃣ (autoregressive): `only_headline: True ; n_step: <N_STEP>` | ||
- Task 2️⃣ (direct): `only_headline: True ; n_step: 1 ; lead_time: <LEAD_TIME>` | ||
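
The four combinations above can be summarized as a small function that returns the relevant config fields. This is a sketch for illustration only: `task_config` is a hypothetical helper, and the returned dictionary covers just the three fields named above, not a complete configuration file.

```python
def task_config(task_num, approach, n_step=1, lead_time=1):
    """Return the config fields that select a task and training approach."""
    if task_num not in (1, 2):
        raise ValueError("task_num must be 1 or 2")
    # Task 2 optimizes only the headline variables.
    config = {"only_headline": task_num == 2}
    if approach == "autoregressive":
        config["n_step"] = n_step
    elif approach == "direct":
        config["n_step"] = 1  # direct training uses a single step
        config["lead_time"] = lead_time
    else:
        raise ValueError("approach must be 'autoregressive' or 'direct'")
    return config


print(task_config(1, "autoregressive", n_step=5))
print(task_config(2, "direct", lead_time=10))
```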
|
||
|
||
**Step 4**: Train by running `python train.py --config_filepath chaosbench/configs/<YOUR_MODEL>_s2s.yaml` | ||
|
||
**Step 5**: Done! | ||
|
||
__NOTE__: Remember to replace `<YOUR_MODEL>` with your own model name, e.g., `unet`. Checkpoints and logs will be generated automatically in `logs/<YOUR_MODEL>_s2s/`.