deploy: 291aaef

leap-stc · Jan 26, 2024 · 6106eb9 · 6106eb9
1 parent 01d72ca
commit 6106eb9

Showing 20 changed files with 4,025 additions and 237 deletions.
diff --git a/README.html b/README.html
diff --git a/_sources/README.md b/_sources/README.md
@@ -1,61 +1,38 @@
-# ChaosBench - A benchmark for long-term forecasting of chaotic systems
-ChaosBench is a benchmark project to improve long-term forecasting of chaotic systems, in particular subseasonal-to-seasonal (S2S) weather. Current features include:
-
-## 1. Benchmark and Dataset
-
-- __Input:__ ERA5 Reanalysis (1979-2022)
-
-- __Target:__ The following table indicates the 48 variables (channels) that are available for physics-based models. Note that the __Input__ ERA5 observations contain __ALL__ fields, including the unchecked boxes:
-
-    Parameters/Levels (hPa) | 1000 | 925 | 850 | 700 | 500 | 300 | 200 | 100 | 50 | 10
-    :---------------------- | :----| :---| :---| :---| :---| :---| :---| :---| :--| :-|
-    Geopotential height, z ($gpm$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
-    Specific humidity, q ($kg kg^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &nbsp; | &nbsp; | &nbsp; |  
-    Temperature, t ($K$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
-    U component of wind, u ($ms^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
-    V component of wind, v ($ms^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
-    Vertical velocity, w ($Pas^{-1}$) | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &check; | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &nbsp; |  
-
-- __Baselines:__
-    - Physics-based models:
-        - [x] UKMO: UK Meteorological Office
-        - [x] NCEP: National Centers for Environmental Prediction
-        - [x] CMA: China Meteorological Administration
-        - [x] ECMWF: European Centre for Medium-Range Weather Forecasts
-    - Data-driven models:
-        - [x] Lagged-Autoencoder
-        - [x] Fourier Neural Operator (FNO)
-        - [x] ResNet
-        - [x] UNet
-        - [x] ViT/ClimaX
-        - [x] PanguWeather
-        - [x] Fourcastnetv2
-
-## 2. Metrics
-We divide our metrics into 2 classes: (1) ML-based, which cover evaluation used in conventional computer vision and forecasting tasks, (2) Physics-based, which are aimed to construct a more physically-faithful and explainable data-driven forecast.
-
-- __Vision-based:__
-    - [x] RMSE
-    - [x] Bias
-    - [x] Anomaly Correlation Coefficient (ACC)
-    - [x] Multiscale Structural Similarity Index (MS-SSIM)
-- __Physics-based:__
-    - [x] Spectral Divergence (SpecDiv)
-    - [x] Spectral Residual (SpecRes)
-
-
-## 3. Tasks
-We present two tasks. In both, the model takes the __FULL__ set of 60 variables as input, but the benchmarking is done on either __ALL__ or a __SUBSET__ of target variable(s).
-
-- __Task 1: Full Dynamics Prediction.__
-It is aimed at __ALL__ target channels simultaneously. This task is generally harder to perform but is useful to build a model that emulates the entire weather conditions.
-
-- __Task 2: Sparse Dynamics Prediction.__
-It is aimed at a __SUBSET__ of target channel(s). This task is useful to build long-term forecasting model for specific variables, such as near-surface temperature (t-1000) or near-surface humidity (q-1000). 
-
-## 4. Getting Started
-You can learn more about how to use our benchmark product through the following Jupyter notebooks under the `notebooks` directory. It covers topics ranging from:
-- `01*_dataset_exploration`
-- `02*_modeling`
-- `03*_training`
-- `04*_evaluation`
+# ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction
+
+
+ChaosBench is a benchmark project to improve long-term forecasting of chaotic systems, in particular subseasonal-to-seasonal (S2S) climate, using ML approaches.
+
+Homepage 🔗: https://leap-stc.github.io/ChaosBench
+
+Paper 📚: https://arxiv.org/
+
+Dataset 🤗: https://huggingface.co/datasets/juannat7/ChaosBench 
+
+
+## Features
+
+![Overview of ChaosBench](docs/scheme/chaosbench_scheme.jpg)
+
+1️⃣ __Extended Observations__. Spanning over 45 years (1979 - 2023) of ERA5 reanalysis
+
+2️⃣ __Diverse Baselines__. Wide selection of physics-based forecasts from leading national agencies in Europe, the UK, America, and Asia
+
+3️⃣ __Differentiable Physics Metrics__. Introduces two differentiable physics-based metrics to minimize the decay of power spectra at long forecasting horizons (blurriness)
+
+4️⃣ __Large-Scale Benchmarking__. Systematic evaluation of state-of-the-art ML-based weather models such as PanguWeather, FourcastNetV2, ViT/ClimaX, and GraphCast
+
+
+## Getting Started
+- [Quickstart](https://leap-stc.github.io/ChaosBench/quickstart.html)
+- [Dataset Overview](https://leap-stc.github.io/ChaosBench/dataset.html)
+- [Task Overview](https://leap-stc.github.io/ChaosBench/task.html)
+
+
+## Build Your Own Model
+- [Training](https://leap-stc.github.io/ChaosBench/training.html)
+- [Evaluation](https://leap-stc.github.io/ChaosBench/evaluation.html)
+
+## Benchmarking
+- [Baseline Models](https://leap-stc.github.io/ChaosBench/baseline.html)
+- [Leaderboard](https://leap-stc.github.io/ChaosBench/leaderboard.html)
diff --git a/_sources/baseline.md b/_sources/baseline.md
@@ -0,0 +1,57 @@
+# Baseline Models
+We differentiate between physics-based and data-driven models. The former is illustrated succinctly in the figure below.
+
+<div style="text-align: center;">
+    <img src="../docs/scheme/chaosbench_scheme-physics-model.jpg" style="width:300px;"/>
+</div>
+
+## Model Definition
+- __Physics-Based Models__:
+    - [x] UKMO: UK Meteorological Office
+    - [x] NCEP: National Centers for Environmental Prediction
+    - [x] CMA: China Meteorological Administration
+    - [x] ECMWF: European Centre for Medium-Range Weather Forecasts
+
+- __Data-Driven Models__:
+    - [x] Lagged-Autoencoder
+    - [x] Fourier Neural Operator (FNO)
+    - [x] ResNet
+    - [x] UNet
+    - [x] ViT/ClimaX
+    - [x] PanguWeather
+    - [x] Fourcastnetv2
+    - [x] GraphCast
+
+## Model Checkpoints
+Checkpoints for data-driven models are available [here](https://huggingface.co/datasets/juannat7/ChaosBench/tree/main/logs).
+
+- Data-driven models are indicated by the `_s2s` suffix (e.g., `unet_s2s`). 
+
+- The hyperparameter specifications are located in `version_xx/lightning_logs/hparams.yaml`. The hyperparameters encode the following:
+
+    - `lead_time` (default: 1): the lead time $\Delta T$ that the model is fine-tuned to predict, for the direct approach
+    - `n_step` (default: 1): number of autoregressive steps, s, for the autoregressive approach
+    - `only_headline`: if false, optimize for Task 1; if true, optimize for Task 2
+    - `batch_size`: the batch size used for training
+    - `train_years`: list of years used for training
+    - `val_years`: list of years used for validation
+    - `epochs`: number of training epochs
+    - `input_size`: number of input channels
+    - `learning_rate`: step size of each optimizer update
+    - `model_name`: the name of the model, used for consistent pathing
+    - `num_workers`: number of workers used by the dataloader
+    - `output_size`: number of output channels
+    - `t_max`: number of cycles of the cosine learning-rate scheduler
+
+__NOTE__: You will notice that for each data-driven model, there are 4 checkpoints. 
+
+1. Version 0 - Task 1; autoregressive up to 1-day ahead
+2. Version 1 - Task 1; autoregressive up to 5-day ahead
+3. Version 2 - Task 2; autoregressive up to 1-day ahead
+4. Version 3 - Task 2; autoregressive up to 5-day ahead
+
+Only `unet_s2s` has additional checkpoints. They are used to compare the `direct` and `autoregressive` training approaches described in the paper. The `direct` models have the following version numbers:
+1. Version {0, 4, 5, 6, 7, 8, 9, 10, 11, 12} - Task 1
+2. Version {2, 13, 14, 15, 16, 17, 18, 19, 20, 21} - Task 2
+
+Within each set, the versions are listed in order and correspond, element by element, to checkpoints optimized for $\Delta T \in \{1, 5, 10, 15, 20, 25, 30, 35, 40, 44\}$.
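The version-to-lead-time pairing for the direct `unet_s2s` checkpoints can be sketched in a few lines of Python. This is an illustration rather than code from the repository: the helper names `direct_version` and `eval_direct_command` are hypothetical, and the sketch assumes the versions pair positionally with the listed $\Delta T$ values.

```python
# Lead times (Delta T) used for the direct baselines, as listed above.
LEAD_TIMES = [1, 5, 10, 15, 20, 25, 30, 35, 40, 44]

# Version numbers of the direct `unet_s2s` checkpoints, keyed by task number.
DIRECT_VERSIONS = {
    1: [0, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    2: [2, 13, 14, 15, 16, 17, 18, 19, 20, 21],
}


def direct_version(task_num: int, lead_time: int) -> int:
    """Return the checkpoint version for a task and lead time.

    Assumes the i-th version in each set matches the i-th lead time.
    """
    return DIRECT_VERSIONS[task_num][LEAD_TIMES.index(lead_time)]


def eval_direct_command(task_num: int, eval_year: int = 2023) -> str:
    """Build the direct-evaluation command for the `unet_s2s` baseline."""
    versions = " ".join(str(v) for v in DIRECT_VERSIONS[task_num])
    return (
        f"python eval_direct.py --model_name unet_s2s --eval_years {eval_year} "
        f"--version_nums {versions} --task_num {task_num}"
    )


print(direct_version(task_num=2, lead_time=10))
print(eval_direct_command(task_num=1))
```

Keeping the two lists side by side makes it harder to pass a version set that does not match the task number.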
diff --git a/_sources/dataset.md b/_sources/dataset.md
@@ -0,0 +1,17 @@
+# Dataset Information
+
+> __NOTE__: For hands-on exploration of the ChaosBench dataset, see `notebooks/01a_s2s_data_exploration.ipynb`
+
+1. __Input:__ ERA5 Reanalysis (1979-2023)
+
+2. __Target:__ The following table indicates the 48 variables (channels) that are available for physics-based models. Note that the __Input__ ERA5 observations contain __ALL__ fields, including the unchecked boxes:
+
+    Parameters/Levels (hPa) | 1000 | 925 | 850 | 700 | 500 | 300 | 200 | 100 | 50 | 10
+    :---------------------- | :----| :---| :---| :---| :---| :---| :---| :---| :--| :-|
+    Geopotential height, z ($gpm$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
+    Specific humidity, q ($kg kg^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &nbsp; | &nbsp; | &nbsp; |  
+    Temperature, t ($K$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
+    U component of wind, u ($ms^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
+    V component of wind, v ($ms^{-1}$) | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; | &check; |  
+    Vertical velocity, w ($Pas^{-1}$) | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &check; | &nbsp; | &nbsp; | &nbsp; | &nbsp; | &nbsp; |  
+
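The channel counts in the table can be checked with a short Python sketch. It assumes channels are named `<parameter>-<level>` (as in `t-850` or `z-500`); the ordering used here is illustrative and may differ from the ordering in the actual dataset.

```python
# Pressure levels (hPa) and parameters from the table above.
LEVELS = [1000, 925, 850, 700, 500, 300, 200, 100, 50, 10]
PARAMS = ["z", "q", "t", "u", "v", "w"]

# Levels with a check mark, i.e. available from the physics-based models.
AVAILABLE = {
    "z": LEVELS,
    "q": [1000, 925, 850, 700, 500, 300, 200],
    "t": LEVELS,
    "u": LEVELS,
    "v": LEVELS,
    "w": [500],
}

# ERA5 input: every parameter at every level.
input_channels = [f"{p}-{lvl}" for p in PARAMS for lvl in LEVELS]

# Physics-based target: only the checked cells.
target_channels = [f"{p}-{lvl}" for p in PARAMS for lvl in AVAILABLE[p]]

print(len(input_channels), len(target_channels))  # 60 48
```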
diff --git a/_sources/evaluation.md b/_sources/evaluation.md
@@ -0,0 +1,32 @@
+# Evaluation
+
+After training your model, you can simply perform evaluation by running:
+
+1. __Autoregressive__
+```
+python eval_iter.py --model_name <YOUR_MODEL>_s2s --eval_years 2023 --version_num <VERSION_NUM>
+```
+
+2. __Direct__
+```
+python eval_direct.py --model_name <YOUR_MODEL>_s2s --eval_years 2023 --version_nums <VERSION_NUM> --task_num <TASK_NUM>
+```
+
+Where `<VERSION_NUM(S)>` corresponds to the version(s) that `pytorch_lightning` generated during training.
+
+__For example__, in our `unet_s2s` baseline model, we can run:
+
+- Autoregressive: `python eval_iter.py --model_name unet_s2s --eval_years 2023 --version_num 0`
+
+- Direct: `python eval_direct.py --model_name unet_s2s --eval_years 2023 --version_nums 0 4 5 6 7 8 9 10 11 12 --task_num 1`
+
+
+## Accessing Baseline Scores
+You can access the complete scores (in `.csv` format) for data-driven models, physics-based models, climatology, and persistence [here](https://huggingface.co/datasets/juannat7/ChaosBench/tree/main/logs). Below is a snippet from `logs/climatology/eval/rmse_climatology.csv`, where each column is a channel and each row holds the `<METRIC>` value (here `RMSE`) at one future timestep.
+
+| z-10     | z-50     | z-100    | z-200    | z-300    | ... | w-1000   |
+|----------|----------|----------|----------|----------|-----|----------|
+| 539.7944 | 285.9499 | 215.14742| 186.43161| 166.28784| ... | 0.07912156|
+| 538.9591 | 285.43832| 214.82317| 186.23743| 166.16902| ... | 0.07907272|
+| 538.1366 | 284.96063| 214.51791| 186.04941| 166.04732| ... | 0.07903882|
+| ...      | ...      | ...      | ...      | ...      | ... | ...      |
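A score file of this shape can be read with the Python standard library. The sketch below parses an inline sample copied from the snippet (elided columns omitted) and assumes the file has a header row of channel names and no index column; check the actual file before relying on that. The function name `load_scores` is hypothetical.

```python
import csv
import io

# Rows copied from the snippet above; elided columns are left out.
SAMPLE = """z-10,z-50,z-100,z-200,z-300,w-1000
539.7944,285.9499,215.14742,186.43161,166.28784,0.07912156
538.9591,285.43832,214.82317,186.23743,166.16902,0.07907272
538.1366,284.96063,214.51791,186.04941,166.04732,0.07903882
"""


def load_scores(handle):
    """Return {channel: [score at timestep 1, 2, ...]} from a metric CSV."""
    reader = csv.DictReader(handle)
    scores = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            scores[name].append(float(value))
    return scores


# For a downloaded file, pass open(path, newline="") instead of StringIO.
scores = load_scores(io.StringIO(SAMPLE))
print(scores["z-10"])
```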
diff --git a/_sources/leaderboard.md b/_sources/leaderboard.md
@@ -0,0 +1,15 @@
+# Leaderboard
+
+We divide our metrics into 2 classes: (1) vision-based, which cover evaluation used in conventional computer vision and forecasting tasks, and (2) physics-based, which aim to make data-driven forecasts more physically faithful and explainable.
+
+1. __Vision-based:__
+    - [x] RMSE
+    - [x] Bias
+    - [x] Anomaly Correlation Coefficient (ACC)
+    - [x] Multiscale Structural Similarity Index (MS-SSIM)
+2. __Physics-based:__
+    - [x] Spectral Divergence (SpecDiv)
+    - [x] Spectral Residual (SpecRes)
+
+
+For every model (data-driven, physics-based, etc.), there is a folder named `eval/` that contains one `.csv` file per metric (e.g., SpecDiv, RMSE). Each file holds the scores for all channels in question (e.g., all 60 for Task 1, an arbitrary n for Task 2, or 48 for physics-based models) across the 44-day lead time.
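For orientation, the three simplest vision-based metrics can be sketched with their textbook definitions. This is not the ChaosBench implementation: flat Python lists stand in for a gridded field, no latitude weighting is applied, and the function names are hypothetical.

```python
import math


def rmse(pred, truth):
    """Root-mean-square error over all grid points."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def bias(pred, truth):
    """Mean signed error (prediction minus truth)."""
    return sum(p - t for p, t in zip(pred, truth)) / len(pred)


def acc(pred, truth, clim):
    """Anomaly correlation: correlation of departures from climatology."""
    pred_anom = [p - c for p, c in zip(pred, clim)]
    true_anom = [t - c for t, c in zip(truth, clim)]
    num = sum(a * b for a, b in zip(pred_anom, true_anom))
    den = math.sqrt(sum(a * a for a in pred_anom) * sum(b * b for b in true_anom))
    return num / den


# Toy values, made up for illustration only.
pred = [1.0, 2.0, 4.0]
truth = [1.0, 2.0, 3.0]
clim = [2.0, 2.0, 2.0]

print(rmse(pred, truth), bias(pred, truth), acc(pred, truth, clim))
```

RMSE and bias share units with the variable, whereas ACC is dimensionless and bounded by 1 in magnitude.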
diff --git a/_sources/quickstart.md b/_sources/quickstart.md
@@ -0,0 +1,31 @@
+# Quickstart
+
+**Step 1**: Clone the [ChaosBench](https://github.com/leap-stc/ChaosBench) GitHub repository
+
+**Step 2**: Create a local directory to store your data, e.g.,
+```
+cd ChaosBench
+mkdir data
+```
+
+**Step 3**: Navigate to `chaosbench/config.py` and change the field `DATA_DIR = /<YOUR_WORKING_DIR>/ChaosBench/data` (_Provide absolute path_)
+
+**Step 4**: Initialize the space by running
+```
+cd /<YOUR_WORKING_DIR>/ChaosBench/data/
+wget https://huggingface.co/datasets/juannat7/ChaosBench/resolve/main/process.sh
+chmod +x process.sh
+```
+**Step 5**: Download the data 
+
+```
+# NOTE: you can also run each line one at a time to retrieve individual dataset
+
+./process.sh era5            # Required: For input ERA5 data
+./process.sh climatology     # Required: For climatology
+./process.sh ukmo            # Optional: For simulation from UKMO
+./process.sh ncep            # Optional: For simulation from NCEP
+./process.sh cma             # Optional: For simulation from CMA
+./process.sh ecmwf           # Optional: For simulation from ECMWF
+```
+
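Step 3 asks for an absolute path. A small Python sketch can compute one from any working directory; the helper name `data_dir_for` is hypothetical, and whether `chaosbench/config.py` expects a plain string or a `Path` object is an assumption to verify against the file itself.

```python
from pathlib import Path


def data_dir_for(working_dir: str) -> Path:
    """Absolute path of ChaosBench/data under the given working directory."""
    return (Path(working_dir).expanduser() / "ChaosBench" / "data").resolve()


# "." stands in for <YOUR_WORKING_DIR>; "~" and relative paths are expanded.
data_dir = data_dir_for(".")

# Line to copy into chaosbench/config.py (string form assumed).
print(f'DATA_DIR = "{data_dir}"')
```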
diff --git a/_sources/task.md b/_sources/task.md
@@ -0,0 +1,21 @@
+# Task Overview
+
+We present __TWO__ tasks. In both, the model takes the __FULL__ set of 60 variables as __input__, but the benchmarking __targets ALL or a SUBSET__ of the variable(s).
+
+1. __Task 1️⃣: Full Dynamics Prediction.__
+It is aimed at __ALL__ target channels simultaneously. This task is generally harder to perform but is useful for building a model that emulates the full state of the weather.
+
+2. __Task 2️⃣: Sparse Dynamics Prediction.__
+It is aimed at a __SUBSET__ of target channel(s). This task is useful for building long-term forecasting models for specific variables, such as near-surface temperature (t-1000) or near-surface humidity (q-1000).
+
+__NOTE__: Before training your own model (see the [instructions here](https://leap-stc.github.io/ChaosBench/training.html)), you can specify the task you are optimizing for by changing the `only_headline` field in the `chaosbench/configs/<YOUR_MODEL>_s2s.yaml` file:
+
+- Task 1️⃣: `only_headline: False`
+
+- Task 2️⃣: `only_headline: True`. By default, it optimizes for {t-850, z-500, q-700}. To change this, modify the `HEADLINE_VARS` field in `chaosbench/config.py`
+
+In addition, we also provide flags to train the model either __autoregressively__ or __directly__. 
+
+- Autoregressive: The current output is used as the next model input. The number of iterative steps is defined in the `n_step: <N_STEP>` field. For our baselines, we set `N_STEP = 5`.
+
+- Direct: The model directly targets a specific time in the future. The lead time is specified in the `lead_time: <LEAD_TIME>` field. Ensure that `n_step: 1` in this case. For our baselines, we set `<LEAD_TIME>` $\in \{1, 5, 10, 15, 20, 25, 30, 35, 40, 44\}$.
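The difference between the two approaches can be shown with toy models in plain Python. Everything here is a stand-in: `toy_step` and `make_toy_direct` are hypothetical functions that halve the state once per step, not ChaosBench models, and a state is just a list of floats.

```python
from typing import Callable, Dict, List

State = List[float]


def autoregressive_rollout(
    step_model: Callable[[State], State], x0: State, n_step: int
) -> List[State]:
    """Apply a one-step model n_step times, feeding each output back in."""
    states = []
    x = x0
    for _ in range(n_step):
        x = step_model(x)
        states.append(x)
    return states


def direct_forecast(
    lead_models: Dict[int, Callable[[State], State]], x0: State, lead_time: int
) -> State:
    """Jump straight to the target time with the model trained for it."""
    return lead_models[lead_time](x0)


def toy_step(x: State) -> State:
    """Toy one-step dynamics: halve every value."""
    return [0.5 * v for v in x]


def make_toy_direct(k: int) -> Callable[[State], State]:
    """Toy direct model for lead time k: apply the halving k times at once."""
    return lambda x: [0.5 ** k * v for v in x]


lead_models = {k: make_toy_direct(k) for k in (1, 5)}
x0 = [8.0, 16.0]

rollout = autoregressive_rollout(toy_step, x0, n_step=5)
print(rollout[-1])
print(direct_forecast(lead_models, x0, lead_time=5))
```

With these toy dynamics both routes agree exactly. With learned models they generally do not, because the autoregressive route compounds its own errors at every step while the direct route needs one model per lead time.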
diff --git a/_sources/training.md b/_sources/training.md
@@ -0,0 +1,24 @@
+# Training
+
+> __NOTE__: For a hands-on modeling and training workflow, see `notebooks/02a_s2s_modeling.ipynb` and `notebooks/03a_s2s_train.ipynb`
+
+This page outlines how to implement your own data-driven model. Several examples, including ED, FNO, ResNet, and UNet, are provided.
+
+**Step 1**: Define your model class in `chaosbench/models/<YOUR_MODEL>.py`. At present, we only support models built with `PyTorch`.
+
+**Step 2**: Initialize your model in `chaosbench/models/model.py`, in the `__init__` method of the `S2SBenchmarkModel` class.
+
+**Step 3**: Write a configuration file in `chaosbench/configs/<YOUR_MODEL>_s2s.yaml`. We recommend reading the definitions of the [hyperparameters](https://leap-stc.github.io/ChaosBench/baseline.html) and the different [tasks](https://leap-stc.github.io/ChaosBench/task.html) before training. Also set `model_name: <YOUR_MODEL>_s2s` to ensure correct pathing.
+
+- Task 1️⃣ (autoregressive): `only_headline: False ; n_step: <N_STEP>`
+- Task 1️⃣ (direct): `only_headline: False ; n_step: 1 ; lead_time: <LEAD_TIME>`
+
+- Task 2️⃣ (autoregressive): `only_headline: True ; n_step: <N_STEP>`
+- Task 2️⃣ (direct): `only_headline: True ; n_step: 1 ; lead_time: <LEAD_TIME>`
+
+
+**Step 4**: Train by running `python train.py --config_filepath chaosbench/configs/<YOUR_MODEL>_s2s.yaml`  
+
+**Step 5**: Done! 
+
+__NOTE__: Remember to replace `<YOUR_MODEL>` with your own model name, e.g., `unet`. Checkpoints and logs are generated automatically in `logs/<YOUR_MODEL>_s2s/`.
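The four task and approach combinations from Step 3 can be summarized as a small Python helper that returns the relevant config fields. The function `task_config` is hypothetical, the remaining hyperparameters (batch size, years, and so on) are omitted, and the real YAML layout may nest these fields differently.

```python
def task_config(
    model: str, task: int, approach: str, n_step: int = 1, lead_time: int = 1
) -> dict:
    """Return the task-related config fields for one training run."""
    if task not in (1, 2):
        raise ValueError("task must be 1 or 2")
    if approach not in ("autoregressive", "direct"):
        raise ValueError("approach must be 'autoregressive' or 'direct'")

    config = {
        "model_name": f"{model}_s2s",
        "only_headline": task == 2,  # False for Task 1, True for Task 2
    }
    if approach == "direct":
        config["n_step"] = 1  # direct training uses a single step
        config["lead_time"] = lead_time
    else:
        config["n_step"] = n_step
        config["lead_time"] = 1  # assumed: left at its documented default
    return config


print(task_config("unet", task=1, approach="autoregressive", n_step=5))
print(task_config("unet", task=2, approach="direct", lead_time=10))
```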