V0.4.0 dev #55

Merged
merged 47 commits on Jan 15, 2025
Commits
5cd09f3
Fixes #14: Dynamically generate seed block in yaml
bnubald Feb 28, 2024
6e21f1d
Merge pull request #33 from icenet-ai/14_dynamic_seed_gen
bnubald Feb 28, 2024
5e86f99
Fixes #36: Rename seed variables in ENVS.example
bnubald Mar 7, 2024
372f9aa
Merge pull request #37 from bnubald/36_fix_var_name
bnubald Mar 7, 2024
d2098e9
Dev #38: adding support for incremental HPC environment installation …
JimCircadian Mar 22, 2024
bef3bd7
Dev #38: sticking in some stubs for dawn use
JimCircadian Mar 23, 2024
2fd5ae4
Fixes #14: Dynamically generate seed block in yaml
bnubald Feb 28, 2024
100d399
Fixes #36: Rename seed variables in ENVS.example
bnubald Mar 7, 2024
3a15c62
Dev #38: adding support for incremental HPC environment installation …
JimCircadian Mar 22, 2024
a8f1193
Dev #38: sticking in some stubs for dawn use
JimCircadian Mar 23, 2024
adb234f
Dev #39: highlighting what the intention is for specifying basic pip …
JimCircadian Mar 28, 2024
9721c2f
Merge branch '38_dawn' into v0.3.0_dev
JimCircadian Apr 12, 2024
3e4f928
Removing explicit icenet dependency, that's not necessary under pip (a…
JimCircadian Apr 12, 2024
0d5bf55
Fixes #39: sorted this out properly
JimCircadian Apr 12, 2024
81ca63c
Version of python bump
JimCircadian Jun 19, 2024
fa29ce5
Development rationalisation to support 0.4 development
JimCircadian Jul 22, 2024
515333d
Removing unnecessary pinning
JimCircadian Aug 16, 2024
0d91837
Sorting out new preprocess config gitignore
JimCircadian Aug 16, 2024
735cbf7
Dev #53: reorganising structure of scripts
JimCircadian Aug 20, 2024
1316e6a
Messed up gitignore, getting rid of ENVS
JimCircadian Aug 20, 2024
8d39b50
Dev #53: Adapted for new structure of environmental-forecasting train…
JimCircadian Aug 21, 2024
7679464
Dev #53: finalised scripting of prep_training_data
JimCircadian Aug 21, 2024
e6673fe
Dev #53: implementation for new structure of training runs
JimCircadian Aug 21, 2024
13204a2
Updating refs for creation of links
JimCircadian Aug 21, 2024
f2bd7d7
Training data working for both hemispheres
JimCircadian Aug 22, 2024
3016b5d
Dev #53: implementing prediction and more comprehensive lifecycle, BU…
JimCircadian Aug 27, 2024
ddc593c
Remapping lag and lead to the forecasting processing
JimCircadian Aug 29, 2024
c9bfb38
Correcting for localised processed data store
JimCircadian Aug 29, 2024
a9a34c7
Mask data ref was missing
JimCircadian Aug 29, 2024
2c2ee7a
Updated for much more efficient copying and processing of prediction …
JimCircadian Aug 30, 2024
0dcdf7b
Fixes #53: last amendment for patching model name correctly?
JimCircadian Aug 30, 2024
461e18e
Forgot to complete the full configuration naming in prediction copies
JimCircadian Aug 30, 2024
15fa949
Further fixing of changes to model delimiters
JimCircadian Aug 30, 2024
a183f41
Clearing up some cruft and giving the option to supply extra args
JimCircadian Aug 30, 2024
7a6d2b1
Updating for variable temporal lengths and resolutions
JimCircadian Sep 2, 2024
97f137b
Updating hemi regex
JimCircadian Sep 2, 2024
8344fdd
Updated for producing op assets with environmental forecasting
JimCircadian Sep 3, 2024
0a5960b
Updating plotting commands
JimCircadian Sep 4, 2024
4923cf9
Updating template dates
JimCircadian Sep 4, 2024
1e63b01
Validating and sorting out spatial interpolation
JimCircadian Sep 5, 2024
795e9c7
Clearing some comments and TODOs
JimCircadian Sep 6, 2024
2d3ad5e
AMSR2 dataset generation now working
JimCircadian Jan 10, 2025
11c4b20
Restrict the amount of copying on regrid for AMSR
JimCircadian Jan 10, 2025
8ae686a
Changing name
JimCircadian Jan 10, 2025
bf97ad6
ENVS
JimCircadian Jan 10, 2025
735d96e
Adding comments for transfer
JimCircadian Jan 14, 2025
acf37e6
Updating for revised split names
JimCircadian Jan 15, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -21,6 +21,7 @@ __pycache__/
/wandb/
*test*.json
dataset_config.*.json
processed.*.json
loader.*.json
*.csv
*.err
@@ -29,9 +30,11 @@ loader.*.json
*.npy
*.out
tmp.*
*.swp
*test*
*.png

ENVS
!ENVS.example
ENVS.*

1 change: 0 additions & 1 deletion ENVS

This file was deleted.

11 changes: 11 additions & 0 deletions ENVS.example
@@ -61,6 +61,12 @@ DEMO_PIPELINE_VAL_END="2022-2-14"
DEMO_PIPELINE_TEST_START="2022-2-15"
DEMO_PIPELINE_TEST_END="2022-2-28"

##
# Training & Prediction ensemble run seeds
#
DEMO_PIPELINE_ENSEMBLE_TRAIN_SEEDS="42,46"
DEMO_PIPELINE_ENSEMBLE_PREDICT_SEEDS="42,46"

##
# The prefix to use for training date ranges
#
@@ -83,6 +89,9 @@ VAL_END_NAME="${PREFIX}_VAL_END"
TEST_START_NAME="${PREFIX}_TEST_START"
TEST_END_NAME="${PREFIX}_TEST_END"

ENSEMBLE_TRAIN_SEEDS_NAME="${PREFIX}_ENSEMBLE_TRAIN_SEEDS"
ENSEMBLE_PREDICT_SEEDS_NAME="${PREFIX}_ENSEMBLE_PREDICT_SEEDS"

# What are we exporting

export TRAIN_START=${!TRAIN_START_NAME}
@@ -92,3 +101,5 @@ export VAL_END=${!VAL_END_NAME}
export TEST_START=${!TEST_START_NAME}
export TEST_END=${!TEST_END_NAME}

export ENSEMBLE_TRAIN_SEEDS=${!ENSEMBLE_TRAIN_SEEDS_NAME}
export ENSEMBLE_PREDICT_SEEDS=${!ENSEMBLE_PREDICT_SEEDS_NAME}
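
The new seed variables follow the same pattern as the date ranges above: a `${PREFIX}_...` name is composed first, then resolved with bash indirect expansion (`${!NAME}`). A minimal sketch of that mechanism and of splitting the comma-separated seed list — only the `${!NAME}` expansion and the "42,46" format come from the diff above, the consuming loop is a hypothetical illustration:

```bash
#!/usr/bin/env bash
# Sketch of the prefix + indirect-expansion pattern used in ENVS.example.
PREFIX="DEMO_PIPELINE"
DEMO_PIPELINE_ENSEMBLE_TRAIN_SEEDS="42,46"

ENSEMBLE_TRAIN_SEEDS_NAME="${PREFIX}_ENSEMBLE_TRAIN_SEEDS"
export ENSEMBLE_TRAIN_SEEDS=${!ENSEMBLE_TRAIN_SEEDS_NAME}    # -> "42,46"

# Hypothetical consumer: split the comma-separated list and act on each seed.
IFS=',' read -ra SEEDS <<< "$ENSEMBLE_TRAIN_SEEDS"
for seed in "${SEEDS[@]}"; do
    echo "would launch a training run with seed ${seed}"
done
```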
10 changes: 0 additions & 10 deletions condense.slurm.sh

This file was deleted.

File renamed without changes.
18 changes: 14 additions & 4 deletions ensemble/predict.tmpl.yaml
@@ -14,6 +14,7 @@ ensemble:
- ../../../processed
- ../../../results
mem: 224gb
cluster: pvc

pre_process: []
post_process: []
@@ -24,7 +25,6 @@ ensemble:
- icenet_predict.sh.j2
email: [email protected]
job_file: icenet_predict.sh
cluster: short
nodes: 1
ntasks: 8
length: 00:30:00
@@ -38,11 +38,21 @@ ensemble:
- name: execute
args:
cmd: /usr/bin/ln -s ../../data
- name: execute
args:
cmd: /usr/bin/ln -s ../../processed
- name: execute
args:
cmd: /usr/bin/ln -s ../../processed_data
- name: execute
args:
cmd: /usr/bin/ln -s ../../ref.osisaf.north.nc
- name: execute
args:
cmd: /usr/bin/ln -s ../../ref.osisaf.south.nc
pre_run: []
runs:
- seed: 42
- seed: 46
- seed: 45
- seed: SEEDS
post_run: []
post_batch:
- name: execute
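
The hard-coded seed list in the runs block is replaced with a single `- seed: SEEDS` placeholder, presumably so a wrapper can expand it from the `ENSEMBLE_PREDICT_SEEDS` value exported by ENVS before the file is handed to model-ensembler. A hedged sketch of one way such an expansion could look — the script below is an assumption, not the substitution the pipeline actually performs:

```bash
#!/usr/bin/env bash
# Hypothetical expansion of the "- seed: SEEDS" placeholder into one run entry
# per seed. A sketch only; the PR does not show the real substitution step.
set -euo pipefail

source ENVS    # provides ENSEMBLE_PREDICT_SEEDS, e.g. "42,46"

awk -v seeds="$ENSEMBLE_PREDICT_SEEDS" '
    /- seed: SEEDS/ {
        indent = $0
        sub(/- seed: SEEDS.*/, "", indent)          # keep the placeholder indentation
        n = split(seeds, arr, ",")
        for (i = 1; i <= n; i++) print indent "- seed: " arr[i]
        next
    }
    { print }
' ensemble/predict.tmpl.yaml > ensemble/predict.yaml
```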
5 changes: 5 additions & 0 deletions ensemble/template/dawn.sh
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

module purge
module load default-dawn
module load intelpython-conda
2 changes: 1 addition & 1 deletion ensemble/template/icenet_predict.sh.j2
@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/bash -l
#SBATCH --output={{ run.dir }}/predict.%j.%N.{{ run.seed }}.out
#SBATCH --error={{ run.dir }}/predict.%j.%N.{{ run.seed }}.err
#SBATCH --chdir={{ run.dir }}
24 changes: 21 additions & 3 deletions ensemble/template/icenet_train.sh.j2
@@ -1,4 +1,5 @@
#!/bin/bash
#!/bin/bash -l
{% if run.cluster != "test" %}
#SBATCH --output={{ run.dir }}/train.%j.%N.{{ run.seed }}.out
#SBATCH --error={{ run.dir }}/train.%j.%N.{{ run.seed }}.err
#SBATCH --chdir={{ run.dir }}
@@ -15,6 +16,7 @@
#SBATCH --cpus-per-task={{ run.ntasks }}
#SBATCH --mem={{ run.mem }}
{% if run.nodelist %}#SBATCH --nodelist={{ run.nodelist }}{% endif %}
{% endif %}

cd {{ run.dir }}

@@ -36,8 +38,24 @@ echo "START `date +%F\ %T`"
source $PREP_SCRIPT
conda activate $ICENET_CONDA

# TODO: run.arg_filter_factor comes from ENVS now
COMMAND="icenet_train -v {{ run.arg_dataset }} {{ run.name }} {{ run.seed }} $TRAIN_STATIC_ARGS -b {{ run.arg_batch }} -e {{ run.arg_epochs }} -m -qs {{ run.arg_queue }} -w {{ run.ntasks }} -s {{ run.arg_strategy }} {% if run.arg_preload %} -p results/networks/{{ run.name }}/{{ run.name }}.network_{{ run.arg_preload }}.{{ run.seed }}.h5 {% endif %}{% if run.arg_filter_factor %} -n {{ run.arg_filter_factor }}{% endif %}"
PRELOAD=""
FINAL_WEIGHTS="results/networks/{{ run.name }}/{{ run.name }}.network_{{ run.preload }}.{{ run.seed }}.h5"
CHECKPOINT_WEIGHTS="`ls results/networks/{{ run.name }}/checkpoint.{{ run.name }}.network_{{ run.preload }}.{{ run.seed }}.*.keras 2>/dev/null`"

# TODO: do we have keras / h5 weight multi-handling in place in library?
if [ -f $FINAL_WEIGHTS ]; then
echo "Preloading from previously trained network $FINAL_WEIGHTS"
PRELOAD="-p $FINAL_WEIGHTS"
elif [ ! -z "$CHECKPOINT_WEIGHTS" ]; then
CHECKPOINT_FILE=`echo "$CHECKPOINT_WEIGHTS" | sort | head -n 1`
echo "Preloading from checkpoint file $CHECKPOINT_FILE"
PRELOAD="-p $CHECKPOINT_FILE"
fi

COMMAND="icenet_train_tensorflow -v \
$TRAIN_STATIC_ARGS \
-b {{ run.batch }} -e {{ run.epochs }} -n $FILTER_FACTOR -s {{ run.strategy }} \
$PRELOAD {{ run.dataset }} {{ run.name }} {{ run.seed }} "

echo "Running $COMMAND"
eval $COMMAND
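
With the renamed template variables (`run.dataset`, `run.batch`, `run.epochs`, `run.strategy`) the rendered command reduces to something like the line below; the concrete names and numbers are illustrative only, while `TRAIN_STATIC_ARGS` and `FILTER_FACTOR` are expected to arrive via the sourced environment, per the TODO in the template:

```bash
# Illustrative rendering of the COMMAND assembled in icenet_train.sh.j2 above.
# Dataset/run names and the seed are placeholders; the flags mirror the template.
TRAIN_STATIC_ARGS=""      # populated from ENVS in a real run
FILTER_FACTOR=1           # likewise expected from ENVS (see the TODO above)
PRELOAD=""                # or "-p <weights file>" when prior weights/checkpoints exist

COMMAND="icenet_train_tensorflow -v \
    $TRAIN_STATIC_ARGS \
    -b 4 -e 100 -n $FILTER_FACTOR -s default \
    $PRELOAD example_dataset example_run 42"
echo "Running $COMMAND"
```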
41 changes: 16 additions & 25 deletions ensemble/train.tmpl.yaml
@@ -1,22 +1,27 @@
---
ensemble:
vars:
arg_batch: 4
arg_dataset: DATASET
arg_epochs: 100
arg_filter_factor: 1
arg_queue: 2
arg_strategy: default
batch: 4
cluster: dummy
dataset: DATASET
email: [email protected]
epochs: 100
filter_factor: 1
gpus: 1
length: "1-00:00:00"
mem: 128gb
nodes: 1
ntasks: 2
preload: DATASET
strategy: default
symlinks:
- ../../../data
- ../../../ENVS*
- ../../../loader.LOADER.json
- ../../../LOADER
- ../../../dataset_config.DATASET.json
- ../../../network_datasets
- ../../../processed
- ../../../results
gpus: 1
mem: 128gb

pre_process:
- name: execute
@@ -29,13 +34,8 @@ ensemble:
templatedir: ../template
templates:
- icenet_train.sh.j2
email: [email protected]
job_file: icenet_train.sh
cluster: gpu
nodes: 1
ntasks: NTASKS
length: 4-00:00:00
maxruns: 5
maxruns: MAXJOBS
maxjobs: MAXJOBS

batches:
@@ -44,16 +44,7 @@
pre_batch: []
pre_run: []
runs:
- seed: 42
- seed: 46
- seed: 45
- seed: 17
- seed: 24
- seed: 84
- seed: 83
- seed: 16
- seed: 5
- seed: 3
- seed: SEEDS
post_run: []
post_batch:
- name: execute
13 changes: 13 additions & 0 deletions environment.dawn.yml
@@ -0,0 +1,13 @@
channels:
- conda-forge
- defaults
dependencies:
- cartopy
- eccodes
- ffmpeg
- hdf5
- netcdf4
- openh264
- xarray
pip:
- model-ensembler
10 changes: 6 additions & 4 deletions environment.yml
@@ -1,15 +1,17 @@
channels:
- conda-forge
- defaults
dependencies:
- cartopy
- cudatoolkit=11.2
- cudnn=8.1.0
- cudatoolkit
- cudnn
- eccodes
- ffmpeg
- hdf5
- ipykernel
- netcdf4
- openh264
- python=3.8
- python=3.9
- pip
- xarray
- pip:
- model-ensembler
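
Both environment files are consumed in the standard conda way; a quick sketch (the environment names are illustrative and passed with `-n`, since neither file shown here pins a name):

```bash
# Create the general pipeline environment from the updated environment.yml
conda env create -n icenet-pipeline -f environment.yml
conda activate icenet-pipeline

# On Dawn, after the module setup in ensemble/template/dawn.sh, the slimmer
# Dawn-specific file would be used instead (name again illustrative):
conda env create -n icenet-dawn -f environment.dawn.yml
```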
14 changes: 0 additions & 14 deletions loader_test_dates.sh

This file was deleted.
