Probabilistic Timeseries Forecasting Challenge

Repository for the Probabilistic Timeseries Forecasting Challenge. This challenge focuses on quantile forecasting for timeseries data in Germany and Karlsruhe. Results can be found here.

Forecasts are inherently uncertain, and it is important to quantify this uncertainty. The goal is therefore to predict the distribution of future values, or summaries of it, rather than just a single point estimate. Quantiles are a relatively straightforward way to quantify such uncertainty: for example, the 0.5 quantile is the median, and the 0.025 and 0.975 quantiles together bound a central 95% prediction interval. The challenge is based on two datasets: bikes and energy. A third dataset, no2, is also available but was not selected for forecast submissions.
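To illustrate how quantile forecasts can be scored, here is a minimal Python sketch of the pinball (quantile) loss, a common scoring rule for quantile forecasts. This README does not say which metric the challenge uses, so the choice of metric, the function names (pinball_loss, mean_quantile_score) and the example quantile levels are all assumptions for illustration.

```python
# Assumption: pinball loss is used here only as an example scoring rule;
# the README does not specify the challenge's metric.


def pinball_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Pinball (quantile) loss of a single quantile forecast y_pred at level tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by tau, over-prediction by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_quantile_score(y_true: float, quantile_preds: dict[float, float]) -> float:
    """Average pinball loss over several quantile levels for one observation."""
    if not quantile_preds:
        raise ValueError("quantile_preds must not be empty")
    losses = [pinball_loss(y_true, pred, tau) for tau, pred in quantile_preds.items()]
    return sum(losses) / len(losses)


# Hypothetical forecast: quantile level -> predicted value.
forecast = {0.025: 80.0, 0.25: 95.0, 0.5: 100.0, 0.75: 105.0, 0.975: 120.0}
print(mean_quantile_score(110.0, forecast))
```

Lower scores are better; since the loss is asymmetric, each quantile level is pushed toward the value that the observations fall below with probability tau.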

DVC-tracked parameters, metrics, and plots can be found on DVC Studio.

Quickstart

First, clone the repository:

git clone https://github.com/MoritzM00/proba-forecasting
cd proba-forecasting

Then follow the instructions in the Development Guide below to set up a development environment.

Finally, reproduce the results by running:

dvc pull
dvc repro

Data Pipeline

The data pipeline is fully automated using DVC's data and experiment versioning, caching, and remote storage capabilities. The pipeline can be visualized locally with dvc dag, or on the web via the project's DagsHub page:

flowchart TD
        node1["eval@bikes"]
        node2["eval@energy"]
        node3["prepare@bikes"]
        node4["prepare@energy"]
        node5["submit"]
        node6["train@bikes"]
        node7["train@energy"]
        node3-->node1
        node3-->node6
        node4-->node2
        node4-->node7
        node6-->node1
        node6-->node5
        node7-->node2
        node7-->node5
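The arrows in the flowchart encode stage dependencies. As a rough illustration of how such a graph fixes the execution order (not how DVC is implemented internally), the same graph can be transcribed by hand into Python and ordered with the standard library's graphlib.TopologicalSorter. The PIPELINE dictionary below is a hypothetical transcription of the diagram, not something read from dvc.yaml.

```python
from graphlib import TopologicalSorter

# Hand-written transcription of the flowchart: each stage maps to the set of
# stages it depends on (its predecessors).
PIPELINE: dict[str, set[str]] = {
    "prepare@bikes": set(),
    "prepare@energy": set(),
    "train@bikes": {"prepare@bikes"},
    "train@energy": {"prepare@energy"},
    "eval@bikes": {"prepare@bikes", "train@bikes"},
    "eval@energy": {"prepare@energy", "train@energy"},
    "submit": {"train@bikes", "train@energy"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
print(order)  # exact order may vary, but every stage follows its dependencies
```

Any order produced this way runs both prepare stages before their train stages and runs submit only after both train stages have finished.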

The pipeline consists of four stages:

prepare: Downloads and preprocesses the data.
train: Trains and saves the models.
eval: Evaluates the models using time series cross-validation with expanding windows.
submit: Creates out-of-sample forecasts in the format required by the forecasting challenge.

Stages 1-3 (prepare, train, eval) are run separately for each of the two datasets, bikes and energy; submit then depends on the train stages of both.
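The idea behind expanding-window cross-validation in the eval stage can be sketched in plain Python. This is a hypothetical helper: expanding_window_splits and its parameters (initial_window, horizon, step) are not taken from the project's code, which may use a library splitter instead. Each fold trains on all observations up to a cutoff and tests on the next horizon observations, and the cutoff moves forward by step between folds.

```python
from collections.abc import Iterator


def expanding_window_splits(
    n_samples: int, initial_window: int, horizon: int, step: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges for expanding-window time series CV.

    Hypothetical helper: the training window always starts at index 0 and grows
    by `step` each fold; the test window is the `horizon` observations right after it.
    """
    if min(initial_window, horizon, step) < 1:
        raise ValueError("initial_window, horizon and step must be positive")
    train_end = initial_window
    # Stop once a full test window no longer fits into the series.
    while train_end + horizon <= n_samples:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


for train_idx, test_idx in expanding_window_splits(
    n_samples=10, initial_window=4, horizon=2, step=2
):
    print(f"train={list(train_idx)} test={list(test_idx)}")
# train=[0, 1, 2, 3] test=[4, 5]
# train=[0, 1, 2, 3, 4, 5] test=[6, 7]
# train=[0, 1, 2, 3, 4, 5, 6, 7] test=[8, 9]
```

Because the training window never drops old observations and the test window always lies strictly after it, no fold evaluates on data that precedes its training data, which mirrors how forecasts are made in practice.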

Development Guide

This guide shows how to reproduce the results of the challenge.

Set up the environment

Install uv, then set up the environment:

make setup
source .venv/bin/activate

Reproduce the results

Run

dvc pull
dvc repro

to reproduce the results, or equivalently

uv run dvc pull
uv run dvc repro

if you did not activate the virtual environment.

dvc pull first downloads all DVC-tracked data (including experiments) from the remote storage, and dvc repro then runs the pipeline to reproduce the results, re-executing only those stages whose dependencies or parameters have changed.

Documentation

The documentation is automatically deployed to GitHub Pages.

To view the documentation locally, run:

make docs_view

Credits

This project was generated with the Light-weight Python Template by Moritz Mistol.
