
Accelerate online dataset prefetching if needed #175

Closed
fotstrt opened this issue Mar 2, 2023 · 2 comments
Labels: enhancement (New feature or request), Trainer Server

Comments

fotstrt (Contributor) commented Mar 2, 2023

Currently, prefetching the next partition of keys and samples is done synchronously by each data worker. It is not yet clear whether this blocks or delays training. If it does, we will probably need some kind of multithreading or asyncio support.
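A minimal sketch of the idea, assuming a hypothetical `fetch_partition` callable standing in for the data worker's synchronous fetch from the selector (names are illustrative, not Modyn's actual API): a background thread fetches the next partition while the training loop consumes the current one, so the fetch latency is hidden instead of blocking training.

```python
import queue
import threading


class PartitionPrefetcher:
    """Iterate over partitions while a background thread fetches ahead.

    fetch_fn: hypothetical synchronous fetch, partition index -> samples.
    num_partitions: total number of partitions to iterate over.
    """

    def __init__(self, fetch_fn, num_partitions):
        self._fetch_fn = fetch_fn
        self._num_partitions = num_partitions

    def __iter__(self):
        # maxsize=1 means at most one partition is prefetched ahead of
        # the one currently being consumed by the training loop.
        q = queue.Queue(maxsize=1)
        sentinel = object()

        def producer():
            for idx in range(self._num_partitions):
                # The blocking fetch now happens off the training thread.
                q.put(self._fetch_fn(idx))
            q.put(sentinel)

        threading.Thread(target=producer, daemon=True).start()
        while (partition := q.get()) is not sentinel:
            yield partition
```

For example, `list(PartitionPrefetcher(lambda i: [i, i], 3))` yields the three partitions in order while each fetch overlaps with consumption of the previous partition.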

MaxiBoether (Contributor) commented:

This also affects the EvaluationDataset introduced in #269.

MaxiBoether added a commit that referenced this issue Oct 12, 2023
In order to hide the latency of data transfer from the selector during
training, we add a configurable number of partitions to be prefetched
in the dataloader. Furthermore, we change the storage and selector to
use processes instead of threads for serving gRPC requests.

Solves #175.
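The commit message above mentions a configurable number of prefetched partitions. One way to express that knob, sketched here with illustrative names rather than the actual Modyn dataloader API, is a bounded queue whose size is the prefetch depth: the producer thread can run at most `num_prefetched_partitions` fetches ahead of consumption.

```python
import queue
import threading


def iter_with_prefetch(fetch_fn, partition_ids, num_prefetched_partitions=2):
    """Yield partitions, keeping up to num_prefetched_partitions fetched ahead.

    fetch_fn: hypothetical synchronous fetch, partition id -> samples.
    num_prefetched_partitions: configurable prefetch depth; the bounded
    queue applies backpressure so memory use stays proportional to it.
    """
    q = queue.Queue(maxsize=num_prefetched_partitions)
    sentinel = object()

    def producer():
        for pid in partition_ids:
            q.put(fetch_fn(pid))  # blocks once the prefetch budget is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (partition := q.get()) is not sentinel:
        yield partition
```

With depth 1 this degenerates to single-partition lookahead; a larger depth trades memory for more tolerance to fetch-latency spikes.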
MaxiBoether (Contributor) commented:

Solved (for now) by #301.
