This repository contains code to generate training data, and to train and interpret the neural network used in L. Denby (2020), collected in a Python module called `convml_tt`. From version v0.7.0 it was rewritten to use pytorch-lightning rather than fastai v1, to adopt best practices and make it easier to modify and carry out further research on the technique.
The easiest way to work with `convml_tt` is to set up a conda environment with the necessary dependencies and then install `convml_tt` into this environment with pip.
1. Check out `convml_tt` from github:

   ```bash
   git clone https://github.com/leifdenby/convml_tt
   cd convml_tt
   ```
2. Install dependencies

   To train the model and carry out the model interpretation a number of Python modules are needed. All the necessary dependencies can be installed with conda. Once conda is installed you can create an environment, depending on whether you will be doing GPU- or CPU-based training.

   For GPU-based training:

   ```bash
   conda env create -f environment-gpu.yml
   conda activate convml_tt
   ```

   For CPU-based training:

   ```bash
   conda env create -f environment-cpu.yml
   conda activate convml_tt
   ```
3. Install `convml_tt`

   Once you have a conda environment set up and activated you can install `convml_tt` through pip with:

   ```bash
   pip install .
   ```
You will now have `convml_tt` available whenever you activate the `convml_tt` conda environment. This installs the base components of `convml_tt`, which enable training the model on an existing triplet-dataset and making predictions with a trained model. To produce training data for `convml_tt` more dependencies are required, depending on the kind of input data you want to use (see "Creating training data" below).
NOTE ON DEVELOPING `convml_tt`: if you plan on modifying the `convml_tt` code yourself you should add the `-e` flag above (i.e. use `pip install -e .`) so that any changes you make are automatically picked up.
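To quickly check that the installation worked you can try importing the package (just a sanity check, not part of the official instructions):

```bash
python -c "import convml_tt"
```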
Below are details on how to obtain training data and how to train the model.
A few example training datasets can be downloaded using the following command:

```bash
python -m convml_tt.data.examples
```
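The same example datasets can also be fetched from within Python. This is a minimal sketch assuming the `fetch_example_dataset` helper and `ExampleData` enum in `convml_tt.data.examples` (and the `TINY10` member name); check the module contents if these have changed:

```python
# Download one of the example triplet datasets and return its local path.
# NB: ExampleData.TINY10 is an assumed member name for a small example dataset.
from convml_tt.data.examples import ExampleData, fetch_example_dataset

data_path = fetch_example_dataset(dataset=ExampleData.TINY10)
print(data_path)
```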
To generate tile images from your netCDF data use the following command:

```bash
python -m luigi --module convml_tt.data.nc_satelitte BuildImages --local-scheduler --ExtractNcFiles-path="<your-path>"
```

This generates the images in the `tmp` directory of this repository; use the absolute path to that directory in the EUREC4A notebook.
NB: dataset creation doesn't currently work as it is being refactored.
To work with satellite data you will need packages that can read this data, reproject it and plot it on maps. These require some system libraries that can be difficult to install using only pip, but can easily be installed with conda into your `convml_tt` environment:

```bash
conda install -c conda-forge xesmf cartopy
```

And then use pip to install the matching Python packages:

```bash
pip install ".[sattiles]"
```
TODO: complete rest of guide talking about processing pipeline and downloading satellite data
You can use the CLI (Command Line Interface) to train the model:

```bash
python -m convml_tt.trainer data_dir
```

where `data_dir` is the path of the dataset you want to use. There are a number of optional command flags available, for example to select how many GPUs to train on, or `--log-to-wandb` to log the training process to Weights & Biases. For a list of all the available flags use `-h`.
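For example (the dataset path below is hypothetical; check `-h` for the exact flags supported by your version):

```bash
# train on a triplet dataset and log the run to Weights & Biases
python -m convml_tt.trainer ~/datasets/triplets --log-to-wandb
```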
Training can also be done interactively, for example in a Jupyter notebook; you can see some simple examples of what commands to use by looking at the automated tests in `tests/`, and the sketch below.
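As a rough outline of what interactive training can look like (the class names `TripletTrainerModel` and `TripletTrainerDataModule` and their argument names are assumptions based on the test-suite and may differ in your version):

```python
import pytorch_lightning as pl

# NB: the names below are assumptions based on the test-suite; check tests/
# for the exact API in your version of convml_tt.
from convml_tt.data.examples import ExampleData, fetch_example_dataset
from convml_tt.system import TripletTrainerDataModule, TripletTrainerModel

model = TripletTrainerModel(pretrained=True)
data_path = fetch_example_dataset(dataset=ExampleData.TINY10)
datamodule = TripletTrainerDataModule(data_dir=data_path, batch_size=4)

trainer = pl.Trainer(max_epochs=5)
trainer.fit(model=model, datamodule=datamodule)
```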
Finally, there are detailed notes on how to train on the ARC3 HPC cluster at University of Leeds (in `doc/README.ARC3.md`) and on the JASMIN analysis cluster.
There are currently two types of plots that I use for interpreting the embeddings that the model produces: a dendrogram, with example tiles plotted for each leaf-node class, and a scatter plot of two embedding dimensions annotated with example tiles so that the actual tiles can be visualised.
There is an example of how to make these plots, and how to easily generate an embedding (or encoding) vector for each example tile, in `example_notebooks/model_interpretation`. Again, this notebook expects the directory layout mentioned above.
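In rough outline, generating an embedding vector for a set of tiles amounts to loading a trained model from a checkpoint and running the tiles through its forward pass. The sketch below assumes the `TripletTrainerModel` class name and a checkpoint path for illustration; see the notebook for the exact helper functions:

```python
import torch

# NB: the class name and checkpoint path are assumptions for illustration;
# see example_notebooks/model_interpretation for the real recipe.
from convml_tt.system import TripletTrainerModel

model = TripletTrainerModel.load_from_checkpoint("path/to/checkpoint.ckpt")
model.eval()

# a batch of 8 RGB tiles, e.g. 256x256 pixels each
tiles = torch.rand(8, 3, 256, 256)
with torch.no_grad():
    embeddings = model(tiles)  # shape: (8, n_embedding_dims)
```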
If you encounter an import problem with this library, follow these steps (after activating the conda environment):

```bash
cd
git clone https://github.com/adobe/antialiased-cnns
cd antialiased-cnns
pip install -r requirements.txt
pip install antialiased-cnns
```