Repository for the Probabilistic Timeseries Forecasting Challenge. This challenge focuses on quantile forecasting for timeseries data in Germany and Karlsruhe. Results can be found here.
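Forecasts are inherently uncertain, so it is important to quantify this uncertainty. Instead of a single point estimate, the goal is to predict the distribution of future values, or at least a summary of it. Quantiles are a relatively straightforward way to describe such a distribution. For example, the 0.5 quantile is the median, and the 0.025 and 0.975 quantiles together bound a central 95% prediction interval.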
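To illustrate how quantile forecasts can be scored, here is a minimal sketch of the pinball (quantile) loss. This is a standard scoring rule for quantile forecasts, and its expected value is minimized by the true quantile. This README does not say which score the challenge uses. The quantile levels and numbers below are assumed for illustration only, and pinball_loss and mean_quantile_score are hypothetical helper names, not functions from this repository.

```python
from typing import Mapping


def pinball_loss(y_true: float, y_pred: float, alpha: float) -> float:
    """Pinball (quantile) loss of a single forecast at quantile level alpha in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is penalized with weight alpha,
    # over-prediction (diff < 0) with weight 1 - alpha.
    return alpha * diff if diff >= 0 else (alpha - 1) * diff


def mean_quantile_score(y_true: float, quantile_preds: Mapping[float, float]) -> float:
    """Average pinball loss over all predicted quantile levels (lower is better)."""
    losses = [pinball_loss(y_true, pred, alpha) for alpha, pred in quantile_preds.items()]
    return sum(losses) / len(losses)


# Hypothetical forecast: quantile level -> predicted value (made-up numbers).
forecast = {0.025: 80.0, 0.25: 95.0, 0.5: 100.0, 0.75: 105.0, 0.975: 120.0}
score = mean_quantile_score(110.0, forecast)
print(f"mean pinball loss: {score:.2f}")  # prints "mean pinball loss: 2.70"
```

The asymmetric weights make the loss directional. For a high level such as 0.975, under-predicting is expensive and over-predicting is cheap, which pushes that forecast towards the upper tail.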
The challenge is based on two datasets, bikes and energy. A third dataset, no2, is also available but was not selected for forecast submissions.
DVC-tracked parameters, metrics, and plots can be found on DVC Studio.
First, clone the repository:
git clone https://github.com/MoritzM00/proba-forecasting
cd proba-forecasting
Then follow the setup instructions below to set up a development environment.
Finally, reproduce the results by running:
dvc pull
dvc repro
The data pipeline is fully automated with DVC, using its data and experiment versioning as well as its caching and remote storage capabilities.
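Here is how the caching works in outline. DVC records content hashes of each stage's dependencies and outputs in dvc.lock, and dvc repro skips stages whose recorded state still matches. The following is a heavily simplified sketch of that idea, not DVC's implementation or API. It assumes a plain dictionary in place of dvc.lock and considers only file dependencies, whereas DVC also tracks parameters and the stage command. needs_rerun and record are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, MutableMapping


def file_md5(path: Path) -> str:
    """Content hash of a file (DVC also uses MD5 hashes by default)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_rerun(deps: Iterable[Path], lock: MutableMapping[str, str]) -> bool:
    """True if any dependency is unrecorded or its content changed since the last run."""
    return any(lock.get(str(dep)) != file_md5(dep) for dep in deps)


def record(deps: Iterable[Path], lock: MutableMapping[str, str]) -> None:
    """Store the current hashes, as a successful run updates dvc.lock."""
    for dep in deps:
        lock[str(dep)] = file_md5(dep)


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"  # hypothetical dependency of a prepare stage
    raw.write_text("date,count\n2024-01-01,42\n")
    lock: dict[str, str] = {}

    first = needs_rerun([raw], lock)   # True: the stage has never run
    record([raw], lock)
    second = needs_rerun([raw], lock)  # False: inputs unchanged, so the stage is skipped
    raw.write_text("date,count\n2024-01-01,43\n")
    third = needs_rerun([raw], lock)   # True: the content changed, so the stage re-runs
```

Hashing contents rather than comparing timestamps means that touching a file without changing it does not trigger a re-run.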
The pipeline can be visualized locally with dvc dag, or on the web via the project's DagsHub page:
flowchart TD
node1["eval@bikes"]
node2["eval@energy"]
node3["prepare@bikes"]
node4["prepare@energy"]
node5["submit"]
node6["train@bikes"]
node7["train@energy"]
node3-->node1
node3-->node6
node4-->node2
node4-->node7
node6-->node1
node6-->node5
node7-->node2
node7-->node5
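The flowchart's edges can be written down as a dependency mapping, which makes the required ordering explicit. This sketch uses the standard-library graphlib module (Python 3.9+) to group stages into "waves", where each wave contains the stages whose dependencies have all completed. DVC computes its own execution order, so this only illustrates the dependency structure, not how dvc repro schedules stages. execution_waves is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each stage mapped to the stages it depends on, following the flowchart's edges.
DEPENDENCIES = {
    "prepare@bikes": set(),
    "prepare@energy": set(),
    "train@bikes": {"prepare@bikes"},
    "train@energy": {"prepare@energy"},
    "eval@bikes": {"prepare@bikes", "train@bikes"},
    "eval@energy": {"prepare@energy", "train@energy"},
    "submit": {"train@bikes", "train@energy"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group stages so that every stage appears after all of its dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies of these stages are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
    print(i, wave)
```

With these edges the sketch prints three waves: both prepare stages, then both train stages, then both eval stages together with submit. Note that submit does not depend on eval, so it only needs the trained models.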
The pipeline consists of four stages:
1. prepare: Downloads and preprocesses the data.
2. train: Trains and saves the models.
3. eval: Evaluates the models using time series cross-validation with expanding time windows.
4. submit: Creates out-of-sample forecasts in the format required by the forecasting challenge.
Stages 1-3 are run once per dataset, that is, for bikes and for energy. The submit stage then depends on the trained models of both datasets.
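Expanding-window cross-validation, as used in the eval stage, can be sketched as follows. Each fold trains on all observations up to a cutoff and tests on the next horizon observations. The cutoff then moves forward from fold to fold, so the training window grows and never contains data from the test period. The fold count, horizon, and minimum training size are hypothetical parameters, and expanding_window_splits is not the repository's evaluation code. One library alternative is scikit-learn's TimeSeriesSplit, whose training windows also expand by default.

```python
from typing import Iterator, Tuple


def expanding_window_splits(
    n_samples: int, n_folds: int, horizon: int, min_train: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for expanding-window time series CV.

    Every training window starts at index 0; each fold moves the cutoff forward by
    `horizon`, so the training window grows and always precedes the test window.
    """
    first_cutoff = n_samples - n_folds * horizon  # the last test window ends at the series end
    if first_cutoff < min_train:
        raise ValueError("not enough observations for the requested folds")
    for fold in range(n_folds):
        cutoff = first_cutoff + fold * horizon
        yield range(0, cutoff), range(cutoff, cutoff + horizon)


# Hypothetical example: 10 observations, 3 folds, forecast horizon of 2 steps.
for train, test in expanding_window_splits(n_samples=10, n_folds=3, horizon=2, min_train=3):
    print(f"train {train.start}..{train.stop - 1} -> test {test.start}..{test.stop - 1}")
```

This prints the folds train 0..3 -> test 4..5, train 0..5 -> test 6..7, and train 0..7 -> test 8..9. Unlike shuffled k-fold cross-validation, no fold ever trains on observations that come after its test window.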
This guide shows how to reproduce the results of the challenge.
- Install uv
- Set up the environment:
make setup
source .venv/bin/activate
Run
dvc pull
dvc repro
to reproduce the results, or equivalently
uv run dvc pull
uv run dvc repro
if you did not activate the virtual environment.
dvc pull first pulls all the data (including experiments) from the remote storage, and dvc repro then runs the pipeline to reproduce the results.
The documentation is automatically deployed to GitHub Pages.
To view the documentation locally, run:
make docs_view
This project was generated with the Light-weight Python Template by Moritz Mistol.