deploy: 291aaef
juannat7 committed Jan 26, 2024
1 parent 01d72ca commit 6106eb9
Showing 20 changed files with 4,025 additions and 237 deletions.
218 changes: 47 additions & 171 deletions README.html


99 changes: 38 additions & 61 deletions _sources/README.md
# ChaosBench - A benchmark for long-term forecasting of chaotic systems
ChaosBench is a benchmark project to improve long-term forecasting of chaotic systems, in particular subseasonal-to-seasonal (S2S) weather. Current features include:

## 1. Benchmark and Dataset

- __Input:__ ERA5 Reanalysis (1979-2022)

- __Target:__ The following table indicates the 48 variables (channels) that are available for physics-based models. Note that the __Input__ ERA5 observations contain __ALL__ fields, including the unchecked boxes:

Parameters/Levels (hPa) | 1000 | 925 | 850 | 700 | 500 | 300 | 200 | 100 | 50 | 10
:---------------------- | :----| :---| :---| :---| :---| :---| :---| :---| :--| :-|
Geopotential height, z ($gpm$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Specific humidity, q ($kg kg^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |   |   |   |
Temperature, t ($K$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
U component of wind, u ($ms^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
V component of wind, v ($ms^{-1}$) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Vertical velocity, w ($Pas^{-1}$) |   |   |   |   | ✓ |   |   |   |   |   |

- __Baselines:__
- Physics-based models:
- [x] UKMO: UK Meteorological Office
- [x] NCEP: National Centers for Environmental Prediction
- [x] CMA: China Meteorological Administration
- [x] ECMWF: European Centre for Medium-Range Weather Forecasts
- Data-driven models:
- [x] Lagged-Autoencoder
- [x] Fourier Neural Operator (FNO)
- [x] ResNet
- [x] UNet
- [x] ViT/ClimaX
- [x] PanguWeather
- [x] FourCastNetV2

## 2. Metrics
We divide our metrics into 2 classes: (1) ML-based, which covers evaluations used in conventional computer vision and forecasting tasks, and (2) physics-based, which aims to construct more physically faithful and explainable data-driven forecasts.

- __Vision-based:__
- [x] RMSE
- [x] Bias
- [x] Anomaly Correlation Coefficient (ACC)
- [x] Multiscale Structural Similarity Index (MS-SSIM)
- __Physics-based:__
- [x] Spectral Divergence (SpecDiv)
- [x] Spectral Residual (SpecRes)


## 3. Tasks
We present two tasks, where the model still takes as inputs the __FULL__ 60 variables, but the benchmarking is done on either __ALL__ or a __SUBSET__ of target variable(s).

- __Task 1: Full Dynamics Prediction.__
It is aimed at __ALL__ target channels simultaneously. This task is generally harder to perform but is useful for building a model that emulates the entire set of weather conditions.

- __Task 2: Sparse Dynamics Prediction.__
It is aimed at a __SUBSET__ of target channel(s). This task is useful for building long-term forecasting models for specific variables, such as near-surface temperature (t-1000) or near-surface humidity (q-1000).

## 4. Getting Started
You can learn more about how to use our benchmark through the Jupyter notebooks in the `notebooks` directory. They cover the following topics:
- `01*_dataset_exploration`
- `02*_modeling`
- `03*_training`
- `04*_evaluation`
# ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction


ChaosBench is a benchmark project to improve long-term forecasting of chaotic systems, in particular subseasonal-to-seasonal (S2S) climate, using ML approaches.

Homepage 🔗: https://leap-stc.github.io/ChaosBench

Paper 📚: https://arxiv.org/

Dataset 🤗: https://huggingface.co/datasets/juannat7/ChaosBench


## Features

![Overview of ChaosBench](docs/scheme/chaosbench_scheme.jpg)

1️⃣ __Extended Observations__. Spanning 45 years (1979-2023) of ERA5 reanalysis

2️⃣ __Diverse Baselines__. Wide selection of physics-based forecasts from leading national agencies in Europe, the UK, America, and Asia

3️⃣ __Differentiable Physics Metrics__. Introduces two differentiable physics-based metrics that counteract the decay of power spectra at long forecasting horizons (blurriness)

4️⃣ __Large-Scale Benchmarking__. Systematic evaluation of state-of-the-art ML-based weather models, including PanguWeather, FourCastNetV2, ViT/ClimaX, and GraphCast


## Getting Started
- [Quickstart](https://leap-stc.github.io/ChaosBench/quickstart.html)
- [Dataset Overview](https://leap-stc.github.io/ChaosBench/dataset.html)
- [Task Overview](https://leap-stc.github.io/ChaosBench/task.html)


## Build Your Own Model
- [Training](https://leap-stc.github.io/ChaosBench/training.html)
- [Evaluation](https://leap-stc.github.io/ChaosBench/evaluation.html)

## Benchmarking
- [Baseline Models](https://leap-stc.github.io/ChaosBench/baseline.html)
- [Leaderboard](https://leap-stc.github.io/ChaosBench/leaderboard.html)
57 changes: 57 additions & 0 deletions _sources/baseline.md
# Baseline Models
We differentiate between physics-based and data-driven models. The former is illustrated succinctly in the figure below.

<div style="text-align: center;">
<img src="../docs/scheme/chaosbench_scheme-physics-model.jpg" style="width:300px;"/>
</div>

## Model Definition
- __Physics-Based Models__:
- [x] UKMO: UK Meteorological Office
- [x] NCEP: National Centers for Environmental Prediction
- [x] CMA: China Meteorological Administration
- [x] ECMWF: European Centre for Medium-Range Weather Forecasts

- __Data-Driven Models__:
- [x] Lagged-Autoencoder
- [x] Fourier Neural Operator (FNO)
- [x] ResNet
- [x] UNet
- [x] ViT/ClimaX
- [x] PanguWeather
- [x] FourCastNetV2
- [x] GraphCast

## Model Checkpoints
Checkpoints for data-driven models are accessible [here](https://huggingface.co/datasets/juannat7/ChaosBench/tree/main/logs).

- Data-driven models are indicated by the `_s2s` suffix (e.g., `unet_s2s`).

- The hyperparameter specifications are located in `version_xx/lightning_logs/hparams.yaml`. The hyperparameters encode the following:

- `lead_time` (default: 1): arbitrary delta_t used to finetune the model, for the direct approach
- `n_step` (default: 1): number of autoregressive steps, s, for the autoregressive approach
- `only_headline`: if false, optimize for Task 1; if true, for Task 2
- `batch_size`: batch size used for training
- `train_years`: list of years used for training
- `val_years`: list of years used for validation
- `epochs`: number of epochs
- `input_size`: number of input channels
- `learning_rate`: update step size at each iteration
- `model_name`: name of the model, used for consistent pathing
- `num_workers`: number of workers used by the dataloader
- `output_size`: number of output channels
- `t_max`: cycle length of the cosine learning rate scheduler

__NOTE__: You will notice that for each data-driven model, there are 4 checkpoints.

1. Version 0 - Task 1; autoregressive up to 1-day ahead
2. Version 1 - Task 1; autoregressive up to 5-day ahead
3. Version 2 - Task 2; autoregressive up to 1-day ahead
4. Version 3 - Task 2; autoregressive up to 5-day ahead

Only for `unet_s2s` do we have many more checkpoints. These are used to study the effect of the `direct` vs. `autoregressive` training approaches described in the paper. In particular, the `direct` models have the following version numbers:
1. Version {0, 4, 5, 6, 7, 8, 9, 10, 11, 12} - Task 1
2. Version {2, 13, 14, 15, 16, 17, 18, 19, 20, 21} - Task 2

Each element in the array corresponds to the checkpoint optimized for the matching $\Delta T \in \{1, 5, 10, 15, 20, 25, 30, 35, 40, 44\}$ (in days).
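This version-to-lead-time correspondence can be sketched in Python. The version numbers and $\Delta T$ values are copied from this section; the element-by-element pairing is our reading of the text above:

```python
# Direct `unet_s2s` checkpoints: each version targets one lead time (days)
DELTA_T = [1, 5, 10, 15, 20, 25, 30, 35, 40, 44]
TASK1_VERSIONS = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12]
TASK2_VERSIONS = [2, 13, 14, 15, 16, 17, 18, 19, 20, 21]

# Pair each version with its lead time, element by element
task1_lead = dict(zip(TASK1_VERSIONS, DELTA_T))
task2_lead = dict(zip(TASK2_VERSIONS, DELTA_T))

print(task1_lead[0])   # → 1  (version 0 is the 1-day direct model)
print(task2_lead[21])  # → 44 (version 21 is the 44-day direct model)
```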
17 changes: 17 additions & 0 deletions _sources/dataset.md
# Dataset Information

> __NOTE__: Hands-on exploration of the ChaosBench dataset is provided in `notebooks/01a_s2s_data_exploration.ipynb`

1. __Input:__ ERA5 Reanalysis (1979-2023)

2. __Target:__ The following table indicates the 48 variables (channels) that are available for physics-based models. Note that the __Input__ ERA5 observations contain __ALL__ fields, including the unchecked boxes:

Parameters/Levels (hPa) | 1000 | 925 | 850 | 700 | 500 | 300 | 200 | 100 | 50 | 10
:---------------------- | :----| :---| :---| :---| :---| :---| :---| :---| :--| :-|
Geopotential height, z ($gpm$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |
Specific humidity, q ($kg kg^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &nbsp; | &nbsp; | &nbsp; |
Temperature, t ($K$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |
U component of wind, u ($ms^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |
V component of wind, v ($ms^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |
Vertical velocity, w ($Pas^{-1}$) | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &check; | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &nbsp; |
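As a sanity check, the 48 channels in the table can be enumerated programmatically. The channel names below follow the `<param>-<level>` convention used elsewhere in these docs (e.g., `t-850`):

```python
# Enumerate the 48 physics-model channels from the table above
levels = [1000, 925, 850, 700, 500, 300, 200, 100, 50, 10]

channels = []
channels += [f"z-{l}" for l in levels]       # geopotential height: all 10 levels
channels += [f"q-{l}" for l in levels[:7]]   # specific humidity: 1000-200 hPa only
channels += [f"t-{l}" for l in levels]       # temperature: all 10 levels
channels += [f"u-{l}" for l in levels]       # zonal wind: all 10 levels
channels += [f"v-{l}" for l in levels]       # meridional wind: all 10 levels
channels += ["w-500"]                        # vertical velocity: 500 hPa only

print(len(channels))  # → 48
```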

32 changes: 32 additions & 0 deletions _sources/evaluation.md
# Evaluation

After training your model, you can simply perform evaluation by running:

1. __Autoregressive__
```
python eval_iter.py --model_name <YOUR_MODEL>_s2s --eval_years 2023 --version_num <VERSION_NUM>
```

2. __Direct__
```
python eval_direct.py --model_name <YOUR_MODEL>_s2s --eval_years 2023 --version_nums <VERSION_NUM> --task_num <TASK_NUM>
```

where `<VERSION_NUM(S)>` corresponds to the version(s) that `pytorch_lightning` generated during training.

__For example__, in our `unet_s2s` baseline model, we can run:

- Autoregressive: `python eval_iter.py --model_name unet_s2s --eval_years 2023 --version_num 0`

- Direct: `python eval_direct.py --model_name unet_s2s --eval_years 2023 --version_nums 0 4 5 6 7 8 9 10 11 12 --task_num 1`


## Accessing Baseline Scores
You can access the complete scores (in `.csv` format) for data-driven models, physics-based models, climatology, and persistence [here](https://huggingface.co/datasets/juannat7/ChaosBench/tree/main/logs). Below is a snippet from `logs/climatology/eval/rmse_climatology.csv`; each row holds the metric values (here, RMSE) at one future timestep, and each column corresponds to a channel.

| z-10 | z-50 | z-100 | z-200 | z-300 | ... | w-1000 |
|----------|----------|----------|----------|----------|-----|----------|
| 539.7944 | 285.9499 | 215.14742| 186.43161| 166.28784| ... | 0.07912156|
| 538.9591 | 285.43832| 214.82317| 186.23743| 166.16902| ... | 0.07907272|
| 538.1366 | 284.96063| 214.51791| 186.04941| 166.04732| ... | 0.07903882|
| ... | ... | ... | ... | ... | ... | ... |
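A score file of this shape can be read with the standard library alone. The sketch below parses an inline copy of the first two rows, with values taken from the snippet above and columns truncated for brevity:

```python
import csv
import io

# First two rows of rmse_climatology.csv, truncated to three channels
sample = """z-10,z-50,z-100
539.7944,285.9499,215.14742
538.9591,285.43832,214.82317
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Row index = future timestep; column name = channel
print(float(rows[0]["z-10"]))  # → 539.7944
```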
15 changes: 15 additions & 0 deletions _sources/leaderboard.md
# Leaderboard

We divide our metrics into 2 classes: (1) ML-based, which covers evaluations used in conventional computer vision and forecasting tasks, and (2) physics-based, which aims to construct more physically faithful and explainable data-driven forecasts.

1. __Vision-based:__
- [x] RMSE
- [x] Bias
- [x] Anomaly Correlation Coefficient (ACC)
- [x] Multiscale Structural Similarity Index (MS-SSIM)
2. __Physics-based:__
- [x] Spectral Divergence (SpecDiv)
- [x] Spectral Residual (SpecRes)
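To make the physics-based class concrete, here is a heavily hedged sketch of a SpecDiv-style score: a KL-like divergence between the normalized power spectra of forecast and target signals. The exact definition is in the ChaosBench paper; everything below (function names, the 1-D setting, the normalization) is illustrative only.

```python
import cmath
import math

def power_spectrum(x):
    """Normalized power spectrum of a 1-D signal via a naive DFT."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        spec.append(abs(s) ** 2)
    total = sum(spec) or 1.0
    return [p / total for p in spec]

def spec_div(pred, target, eps=1e-12):
    """KL-like divergence between target and predicted spectra (illustrative)."""
    sp, st = power_spectrum(pred), power_spectrum(target)
    return sum(t * math.log((t + eps) / (p + eps)) for p, t in zip(sp, st))

# A forecast with the correct spectrum has zero spectral divergence
print(spec_div([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # → 0.0
```

Because the divergence grows when the forecast spectrum loses power at some wavenumbers, a score of this family penalizes the blurring that plagues long-horizon data-driven forecasts.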


For all models (data-driven, physics-based, etc.), there is a folder named `eval/` containing one `.csv` file per metric (e.g., SpecDiv, RMSE). Each file holds scores for all channels in question (the entire 60 for Task 1, an arbitrary n for Task 2, or 48 for physics-based models) across the 44-day lead time.
31 changes: 31 additions & 0 deletions _sources/quickstart.md
# Quickstart

**Step 1**: Clone the [ChaosBench](https://github.com/leap-stc/ChaosBench) Github repository

**Step 2**: Create a local directory to store your data, e.g.,
```
cd ChaosBench
mkdir data
```

**Step 3**: Navigate to `chaosbench/config.py` and change the field `DATA_DIR = /<YOUR_WORKING_DIR>/ChaosBench/data` (_Provide absolute path_)
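Since `config.py` expects an absolute path, a quick check can catch mistakes early. The path below is a placeholder; substitute your own working directory:

```python
from pathlib import Path

# Placeholder; replace with your actual /<YOUR_WORKING_DIR>/ChaosBench/data
DATA_DIR = "/home/user/ChaosBench/data"

print(Path(DATA_DIR).is_absolute())  # → True
```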

**Step 4**: Initialize the space by running
```
cd /<YOUR_WORKING_DIR>/ChaosBench/data/
wget https://huggingface.co/datasets/juannat7/ChaosBench/blob/main/process.sh
chmod +x process.sh
```
**Step 5**: Download the data

```
# NOTE: you can also run each line one at a time to retrieve individual dataset
./process.sh era5 # Required: For input ERA5 data
./process.sh climatology # Required: For climatology
./process.sh ukmo # Optional: For simulation from UKMO
./process.sh ncep # Optional: For simulation from NCEP
./process.sh cma # Optional: For simulation from CMA
./process.sh ecmwf # Optional: For simulation from ECMWF
```

21 changes: 21 additions & 0 deletions _sources/task.md
# Task Overview

We present __TWO__ tasks, where the model still takes as __inputs the FULL__ 60 variables, but the benchmarking __targets either ALL or a SUBSET__ of the variables.

1. __Task 1️⃣: Full Dynamics Prediction.__
It is aimed at __ALL__ target channels simultaneously. This task is generally harder to perform but is useful for building a model that emulates the entire set of weather conditions.

2. __Task 2️⃣: Sparse Dynamics Prediction.__
It is aimed at a __SUBSET__ of target channel(s). This task is useful for building long-term forecasting models for specific variables, such as near-surface temperature (t-1000) or near-surface humidity (q-1000).

__NOTE__: Before training your own model ([instructions here](https://leap-stc.github.io/ChaosBench/training.html)), you can specify the task you are optimizing for by changing the `only_headline` field in the `chaosbench/configs/<YOUR_MODEL>_s2s.yaml` file:

- Task 1️⃣: `only_headline: False`

- Task 2️⃣: `only_headline: True`. By default, it optimizes for {t-850, z-500, q-700}. To change this, modify the `HEADLINE_VARS` field in `chaosbench/config.py`

In addition, we also provide flags to train the model either __autoregressively__ or __directly__.

- Autoregressive: Uses the current output as the next model input. The number of iterative steps is set by the `n_step: <N_STEP>` field. For our baselines, we set `N_STEP = 5`.

- Direct: Directly targets a specific time in the future. The lead time can be specified in the `lead_time: <LEAD_TIME>` field; ensure that `n_step: 1` in this case. For our baselines, we set `<LEAD_TIME>` $\in \{1, 5, 10, 15, 20, 25, 30, 35, 40, 44\}$
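The two strategies can be contrasted schematically. The `model` stub below is a placeholder with toy dynamics, not the ChaosBench API:

```python
# Placeholder one-step "dynamics"; real models live in chaosbench/models/
def model(x, lead_time=1):
    return x + lead_time  # toy update so the two modes are visible

def autoregressive_forecast(x0, n_step):
    """Feed each output back as the next input, n_step times (n_step: <N_STEP>)."""
    x = x0
    for _ in range(n_step):
        x = model(x)
    return x

def direct_forecast(x0, lead_time):
    """Jump straight to the target lead time (lead_time: <LEAD_TIME>, n_step: 1)."""
    return model(x0, lead_time=lead_time)

print(autoregressive_forecast(0, 5))  # → 5  (five 1-step updates)
print(direct_forecast(0, 44))         # → 44 (one 44-day jump)
```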
24 changes: 24 additions & 0 deletions _sources/training.md
# Training

> __NOTE__: Hands-on modeling and training workflows are provided in `notebooks/02a_s2s_modeling.ipynb` and `notebooks/03a_s2s_train.ipynb`

We outline how to implement your own data-driven model. Several examples, including ED, FNO, ResNet, and UNet, are provided.

**Step 1**: Define your model class in `chaosbench/models/<YOUR_MODEL>.py`. At present, we only support models built with `PyTorch`

**Step 2**: Initialize your model in `chaosbench/models/model.py` under `__init__` method in `S2SBenchmarkModel` class

**Step 3**: Write a configuration file in `chaosbench/configs/<YOUR_MODEL>_s2s.yaml`. We recommend reading the details on the definition of [hyperparameters](https://leap-stc.github.io/ChaosBench/baseline.html) and the different [tasks](https://leap-stc.github.io/ChaosBench/task.html) before training. Also change `model_name: <YOUR_MODEL>_s2s` to ensure correct pathing

- Task 1️⃣ (autoregressive): `only_headline: False ; n_step: <N_STEP>`
- Task 1️⃣ (direct): `only_headline: False ; n_step: 1 ; lead_time: <LEAD_TIME>`

- Task 2️⃣ (autoregressive): `only_headline: True ; n_step: <N_STEP>`
- Task 2️⃣ (direct): `only_headline: True ; n_step: 1 ; lead_time: <LEAD_TIME>`
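The four combinations above, written as config fragments mirrored in Python. Values follow the baseline settings described in the [Task Overview](https://leap-stc.github.io/ChaosBench/task.html): `n_step = 5` for autoregressive training, and `lead_time` drawn from the baseline set, with 44 used here only as an example:

```python
# Task/strategy combinations as dicts mirroring the YAML fields
task1_autoregressive = {"only_headline": False, "n_step": 5}
task1_direct         = {"only_headline": False, "n_step": 1, "lead_time": 44}
task2_autoregressive = {"only_headline": True,  "n_step": 5}
task2_direct         = {"only_headline": True,  "n_step": 1, "lead_time": 44}

# Direct training always keeps n_step at 1
assert task1_direct["n_step"] == task2_direct["n_step"] == 1
```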


**Step 4**: Train by running `python train.py --config_filepath chaosbench/configs/<YOUR_MODEL>_s2s.yaml`

**Step 5**: Done!

__NOTE__: Remember to replace `<YOUR_MODEL>` with your own model name, e.g., `unet`. Checkpoints and logs will be generated automatically in `logs/<YOUR_MODEL>_s2s/`.