MoritzM00/proba-forecasting

Probabilistic Timeseries Forecasting Challenge

Repository for the Probabilistic Timeseries Forecasting Challenge, which focuses on quantile forecasting for timeseries data from Germany and Karlsruhe. Results can be found here.

Forecasts are inherently uncertain, and it is important to quantify this uncertainty. The goal is to predict a distribution of future values rather than a single point estimate. Quantiles are a relatively straightforward way to describe such a distribution. The challenge is based on two datasets: bikes and energy. A third dataset, no2, is also available but was not selected for forecast submissions.
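To make the idea of a quantile forecast concrete, here is a minimal sketch of the pinball (quantile) loss, a common way to score a predicted quantile against observed values. The function name `pinball_loss` and the example numbers are illustrative assumptions; this README does not state which metric the challenge or this repository actually uses.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball loss of predictions y_pred at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (y above q) is weighted by tau,
        # over-prediction (y below q) by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


# Hypothetical observations and a constant forecast of 12, scored at two levels.
observed = [10.0, 12.0, 14.0]
forecast = [12.0, 12.0, 12.0]
for tau in (0.5, 0.9):
    print(tau, pinball_loss(observed, forecast, tau))
```

For a high quantile level such as 0.9, the loss penalizes observations above the forecast much more heavily than observations below it, which pushes a good 0.9-quantile forecast toward the upper end of the distribution.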

DVC-tracked parameters, metrics, and plots can be found on DVC Studio.

Quickstart

First, clone the repository:

git clone https://github.com/MoritzM00/proba-forecasting
cd proba-forecasting

Then follow the instructions here to set up a dev environment.

Finally, reproduce the results by running:

dvc pull
dvc repro

Data Pipeline

The data pipeline is fully automated using DVC's data and experiment versioning, as well as caching and remote storage capabilities. The pipeline can be visualized using dvc dag, or via the web using the project's DagsHub location:

flowchart TD
        node1["eval@bikes"]
        node2["eval@energy"]
        node3["prepare@bikes"]
        node4["prepare@energy"]
        node5["submit"]
        node6["train@bikes"]
        node7["train@energy"]
        node3-->node1
        node3-->node6
        node4-->node2
        node4-->node7
        node6-->node1
        node6-->node5
        node7-->node2
        node7-->node5
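DVC works out the execution order from the stage dependencies itself. Purely as an illustration, the edges of the flowchart above can be written down as a dependency mapping and sorted with Python's standard `graphlib` module. The `dependencies` dictionary is transcribed from the flowchart, not read from the repository's `dvc.yaml`.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on
# (transcribed by hand from the flowchart edges).
dependencies = {
    "train@bikes": {"prepare@bikes"},
    "train@energy": {"prepare@energy"},
    "eval@bikes": {"prepare@bikes", "train@bikes"},
    "eval@energy": {"prepare@energy", "train@energy"},
    "submit": {"train@bikes", "train@energy"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order produced this way runs each prepare stage before the matching train stage, and submit only after both models are trained.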

The pipeline consists of four stages:

  1. prepare: Downloads and preprocesses the data.
  2. train: Trains and saves the models.
  3. eval: Evaluates the models using timeseries cross-validation with expanding time windows.
  4. submit: Creates out-of-sample forecasts in the required format for this forecasting challenge.

Stages 1-3 are run for two datasets: bikes and energy.
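The expanding-window cross-validation used in the eval stage can be sketched as follows: every fold trains on all observations up to a cutoff and tests on the block right after it, and the cutoff moves forward from fold to fold. The function `expanding_window_splits` and its parameters are hypothetical and only illustrate the scheme; the repository's actual splitter and settings may differ.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train, test) index ranges with a training window that grows each fold."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        # Training always starts at index 0, so the window expands over time.
        yield range(0, test_start), range(test_start, test_start + test_size)


for train, test in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    print(f"train=[{train.start}, {train.stop})  test=[{test.start}, {test.stop})")
```

Because the test block always lies after the training window, no fold uses future observations to predict the past.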

Development Guide

This guide shows how to reproduce the results of the challenge.

Set up the environment

  1. Install uv
  2. Set up the environment:
make setup
source .venv/bin/activate

Reproduce the results

Run

dvc pull
dvc repro

to reproduce the results, or equivalently

uv run dvc pull
uv run dvc repro

if you did not activate the virtual environment.

dvc pull first pulls all the data (including experiments) from the remote storage, and dvc repro then runs the pipeline to reproduce the results.

Documentation

The documentation is automatically deployed to GitHub Pages.

To view the documentation locally, run:

make docs_view

Credits

This project was generated with the Light-weight Python Template by Moritz Mistol.