How can I train on two or more different ForecastDFDataset objects at the same time? #151

Open
yhlee52 opened this issue Oct 10, 2024 · 4 comments

@yhlee52

yhlee52 commented Oct 10, 2024

I am working with data spread across multiple CSV files. The columns have the same meaning in every file, but each file was collected in a different environment. Because concatenating the individual CSV files into a single dataset produces inappropriate samples at the merge points, I would like to know how to create ForecastDFDataset objects from the individual CSV files and combine them for a single training run. Is there a way to merge the generated ForecastDFDataset objects into one, or to pass multiple ForecastDFDataset objects to the Trainer? For example, as 'train_datasets' below:
self.trainer = Trainer(
    model=self.model,
    args=self.training_args,
    train_dataset=train_datasets,
    eval_dataset=eval_dataset,
)

@Dylan0211

I also faced this issue, and my solution is to use ConcatDataset from torch (torch.utils.data.ConcatDataset) to combine the ForecastDFDataset objects.
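For anyone reading later, here is a minimal sketch of the ConcatDataset approach. It does not use ForecastDFDataset itself: `WindowDataset` is a hypothetical stand-in that cuts sliding windows from a single series, and the field names `past_values` / `future_values` and the toy data are assumptions for illustration only. The point is that each file gets its own dataset, so no window spans the boundary between two files.

```python
import torch
from torch.utils.data import ConcatDataset, Dataset


class WindowDataset(Dataset):
    """Hypothetical stand-in for ForecastDFDataset: sliding windows over ONE series."""

    def __init__(self, series, context_length, prediction_length):
        self.series = torch.as_tensor(series, dtype=torch.float32)
        self.context_length = context_length
        self.prediction_length = prediction_length

    def __len__(self):
        # Number of complete (context + prediction) windows in this series.
        n = len(self.series) - self.context_length - self.prediction_length + 1
        return max(n, 0)

    def __getitem__(self, idx):
        split = idx + self.context_length
        end = split + self.prediction_length
        return {
            "past_values": self.series[idx:split],
            "future_values": self.series[split:end],
        }


# Toy data standing in for two CSV files from different environments.
series_a = [float(i) for i in range(10)]        # 0.0 .. 9.0
series_b = [float(i) for i in range(100, 108)]  # 100.0 .. 107.0

ds_a = WindowDataset(series_a, context_length=4, prediction_length=2)
ds_b = WindowDataset(series_b, context_length=4, prediction_length=2)

# ConcatDataset chains the per-file datasets: indices 0..len(ds_a)-1 map to
# ds_a, the rest map to ds_b. Windows are built per file, never across files.
train_datasets = ConcatDataset([ds_a, ds_b])
```

The combined object is an ordinary map-style dataset, so it can be passed as `train_dataset=train_datasets` in the Trainer call from the original question.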

@yhlee52
Author

yhlee52 commented Oct 11, 2024

I tried the method you suggested and it seems to work as intended.
I was thinking of a difficult method, but I had no idea there was such a simple way.
Thank you very much @Dylan0211 !

@wgifford
Collaborator

wgifford commented Oct 11, 2024

Hi @yhlee52 @Dylan0211

This functionality is built into ForecastDFDataset -- it supports distinguishing time series by IDs. You can add an ID column (calling it "id" for example) to each of your CSV files (each CSV receiving a unique ID value) before you concatenate them, and then use the id_columns = ["id"] parameter to point to this column when you instantiate ForecastDFDataset. Internally ForecastDFDataset will use ConcatDataset to separate the time series.

Other components in our library support the id_columns argument to make working with multi-time series datasets easier.
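A rough sketch of the ID-column approach described above, using pandas. The column names `timestamp` and `value`, the ID values `env_a` / `env_b`, and the in-memory CSV text are all assumptions for illustration; only the `id` column and `id_columns=["id"]` come from this thread.

```python
import io

import pandas as pd

# Toy stand-ins for two CSV files; in practice, pass file paths to pd.read_csv.
csv_files = {
    "env_a": io.StringIO("timestamp,value\n2024-01-01,1.0\n2024-01-02,2.0\n"),
    "env_b": io.StringIO("timestamp,value\n2024-03-01,10.0\n2024-03-02,11.0\n"),
}

frames = []
for series_id, source in csv_files.items():
    df = pd.read_csv(source, parse_dates=["timestamp"])
    df["id"] = series_id  # every row of a file gets that file's unique ID
    frames.append(df)

# One DataFrame holding all files, with rows still distinguishable by "id".
combined = pd.concat(frames, ignore_index=True)

# The combined frame would then be passed to ForecastDFDataset together with
# id_columns=["id"]; the remaining arguments (timestamp column, target columns,
# context and prediction lengths) depend on your existing setup and are not
# shown here.
```

With this layout, the per-file separation is handled by the dataset class rather than by building and combining the datasets manually.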

@yhlee52
Author

yhlee52 commented Oct 21, 2024

Hi @wgifford, that is a very helpful feature! I will update my code to handle my data with this argument. Thank you!
