Allow for huggingface dataset streaming #16

Open
harrykeightley opened this issue Oct 17, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@harrykeightley
Contributor

It's currently a bit of a hassle to have to download a potentially large dataset from Hugging Face before being able to train on it. The datasets library can instead return an IterableDataset or IterableDatasetDict (by passing streaming=True to load_dataset), which streams the data in rather than blocking until the whole download has finished.

These classes are handled differently from the regular Dataset and DatasetDict, though (for example, they don't support len() or random access by index), so we'll need to decide whether this should be supported.
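To make the difference concrete, here is a minimal sketch of how code could treat both kinds of dataset uniformly. The helper names `load_split`, `known_length`, and `head` are hypothetical and not part of this project or of the datasets library; the only real library call is `load_dataset` with its `split` and `streaming` arguments.

```python
from itertools import islice
from typing import Any, Iterable, List, Optional


def load_split(name: str, split: str = "train", streaming: bool = False):
    """Hypothetical wrapper: load one split, optionally in streaming mode.

    With streaming=True, load_dataset returns an IterableDataset instead of
    a Dataset, so nothing is downloaded up front.
    """
    # Imported lazily so the helpers below can be used without the library.
    from datasets import load_dataset

    return load_dataset(name, split=split, streaming=streaming)


def known_length(dataset: Iterable[Any]) -> Optional[int]:
    """Return len(dataset) if the dataset is sized, else None.

    Streamed datasets do not define __len__, so len() raises TypeError.
    """
    try:
        return len(dataset)  # type: ignore[arg-type]
    except TypeError:
        return None


def head(dataset: Iterable[Any], n: int) -> List[Any]:
    """Return the first n examples by iterating, never by indexing.

    Iteration is supported by both regular and streamed datasets, whereas
    dataset[i] only works on the regular kind.
    """
    return list(islice(iter(dataset), n))
```

Anything that needs the dataset size ahead of time (progress bars, epoch-based schedulers, random train/test splits) would have to cope with `known_length` returning `None`, which is probably the main thing to weigh when deciding whether to support streaming.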
