This repo contains code to evaluate spatiotemporal predictions with Optimal Transport (OT). In contrast to standard evaluation metrics (MSE, MAE etc), the OT error is a spatial metric that evaluates the spatial distribution of the errors.
Install the package via pypi:
pip install geot
You might have to install a suitable torch version first. If you want to use Optimal Transport as a loss function for training a DL model, make sure to install torch for cuda.
Alternatively, install from source:
git clone https://github.com/mie-lab/geospatialOT.git
cd geospatialOT
conda create -n geot_env
conda activate geot_env
pip install .
Check out our tutorial to get started with simple examples.
What is the goal?
- Evaluate and compare ML models for geospatial prediction tasks
- Assess their spatial goodness - measuring how well their predictions match the spatial distribution of the ground truth (standard metrics, e.g. MSE, ignore the spatial location; usually just the average error across locations is reported)
How does it work?
- The difference between the spatial distribution of the ground truth observations and the predicted spatial distribution is measured using Optimal Transport, i.e., the Wasserstein Distance or Earth Mover's Distance (EMD)
- For a simple explanation, check out the EMD between signatures. Our method is based on discrete OT, where the sample points are spatial locations and the mass at these points are the observations or predictions. OT finds the minimal cost transportation matrix, i.e. the mass that must be transported between each pair of locations to align the predicted with the observed distribution.
- In practice, we provide an interface to other OT python packages for the specific use case on geospatial data. Here, the input is usually just the
observations
at a set of location, thepredictions
and thecost matrix
. The output is the OT error (a single number) or the optimal transport matrix T. - Apart from evaluation, OT can be used as a loss function via the Sinkhorn loss.
What are typical use cases?
- This method is relevant any application where the spatial distribution of the prediction errors matters.
- Typical examples are applications where prediction errors are associated with relocation or reallocation costs. For example, errors in bike sharing demand prediction lead to users relocating to another station that is further away, etc
- Further examples: predicting wildfire spread or glacier retreat, forecasting weather or traffic, estimating poverty or crime rates, etc.
How to use your own data?
- The tutorial notebook provides examples using random values. You can adapt the code to use your own data by replacing the variables such as
locations
,observations
,predictions
andcost_matrix
. - If you have trouble, open an issue or get in touch!