Data Loading for Benchmarking #33

Closed
6 tasks done
fredmontet opened this issue Dec 4, 2023 · 2 comments
Labels: module (Related to data science helper modules)

Comments

@fredmontet
Collaborator

fredmontet commented Dec 4, 2023

Goal

Develop a common way to load data for benchmarking purposes

Description

One of the issues with benchmarking is that datasets differ in structure. The onTime benchmarking module aims to solve this problem in three ways. First, by making common datasets available in the library. Second, by providing helper methods to create custom datasets and use them alongside any other 'onTime compatible' ones. Third, by following the existing 'TimeEval' dataset structure, as suggested by Jonathan.

Tasks

  • Check if something could be used from TimeEval[1]
  • Check the TimeEval data format[1].
  • Create a directory called data in https://github.com/fredmontet/ontime/tree/develop/src/ontime/module to make this new module
  • Add some basic datasets, see if those in Darts could be used.
  • Develop helper methods to add custom datasets
  • Make an example of how to use the developed method in a notebook

Reference

[1] https://timeeval.readthedocs.io/en/latest/

@fredmontet fredmontet added the module Related to data science helper modules label Dec 4, 2023
@fredmontet fredmontet added this to the v0.7 - Benchmarking milestone Dec 4, 2023
@fredmontet fredmontet added this to onTime Dec 4, 2023
@fredmontet fredmontet moved this to Todo in onTime Dec 4, 2023
@JunodCharlie
Collaborator

Check if something could be used from TimeEval
TimeEval has a file converter script that could be useful, but it's meant to be used on the command line, so it'd need to be adapted to fit into ontime.

The benchmarking part itself looks interesting as well, though the doc for this part is still a WIP.

Check the TimeEval data format
The file format is basically a comma-separated CSV with headers and an index column (integers or timestamps) (doc link).
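As an illustration of the format described above, here is a minimal sketch that reads a comma-separated CSV with a header row and an index column holding either integers or timestamps. It uses only the Python standard library. The function names `parse_index` and `read_indexed_csv`, the column names, and the sample values are assumptions made for this example; they are not part of TimeEval or ontime.

```python
import csv
import io
from datetime import datetime


def parse_index(value):
    """Return an int for integer indices, otherwise a datetime parsed from ISO text."""
    try:
        return int(value)
    except ValueError:
        return datetime.fromisoformat(value)


def read_indexed_csv(text):
    """Read a comma-separated CSV whose first column is the index.

    Returns (column_names, index, rows); each row is a list of floats.
    Hypothetical helper, written only to illustrate the layout.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line holds the column names
    index, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        index.append(parse_index(record[0]))
        rows.append([float(v) for v in record[1:]])
    return header[1:], index, rows


# Illustrative data only: column names and values are made up.
sample = """timestamp,sensor_a,sensor_b
2023-12-04 00:00:00,1.5,0
2023-12-04 00:01:00,2.5,1
"""

columns, index, rows = read_indexed_csv(sample)
```

The same function accepts an integer index (for example `0`, `1`, `2` in the first column), since `parse_index` tries `int` before falling back to a timestamp.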

@JunodCharlie JunodCharlie closed this as not planned Dec 11, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in onTime Dec 11, 2023
@JunodCharlie JunodCharlie reopened this Dec 11, 2023
JunodCharlie added a commit that referenced this issue Dec 19, 2023
…timeseries and/or convert other data formats to ontime timeseries have been added to the TimeSeries class (it made more sense to me to put them there, though they can easily be moved to another module if needed). Usage examples have been added to the related notebook. The class DatasetLoader has been modified as well to better accommodate loading datasets from different sources. It now offers datasets from darts and from openml. Usage examples are available in the related notebook.
@fredmontet
Collaborator Author

Since issue #41 has been created, this one can be closed.
