
Accelerate online dataset prefetching if needed #175

Closed
fotstrt opened this issue Mar 2, 2023 · 2 comments
Labels: enhancement (New feature or request), Trainer Server

Comments

fotstrt (Contributor) commented Mar 2, 2023

Currently, prefetching the next partition of keys and samples is done synchronously by each data worker. It is not yet clear whether this blocks or delays training. If it does, we will probably need some kind of multithreading or asyncio support.
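A minimal sketch of the idea, assuming a hypothetical `fetch_partition` callable standing in for the data worker's synchronous fetch from the selector (names are illustrative, not Modyn's actual API): a background thread fetches the next partition while the training loop consumes the current one, so the fetch latency is hidden instead of blocking training.

```python
import queue
import threading


class PartitionPrefetcher:
    """Iterate over partitions while a background thread fetches ahead.

    fetch_fn: hypothetical synchronous fetch, partition index -> samples.
    num_partitions: total number of partitions to iterate over.
    """

    def __init__(self, fetch_fn, num_partitions):
        self._fetch_fn = fetch_fn
        self._num_partitions = num_partitions

    def __iter__(self):
        # maxsize=1 means at most one partition is prefetched ahead of
        # the one currently being consumed by the training loop.
        q = queue.Queue(maxsize=1)
        sentinel = object()

        def producer():
            for idx in range(self._num_partitions):
                # The blocking fetch now happens off the training thread.
                q.put(self._fetch_fn(idx))
            q.put(sentinel)

        threading.Thread(target=producer, daemon=True).start()
        while (partition := q.get()) is not sentinel:
            yield partition
```

For example, `list(PartitionPrefetcher(lambda i: [i, i], 3))` yields the three partitions in order while each fetch overlaps with consumption of the previous partition.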

MaxiBoether (Contributor) commented:

This also affects the EvaluationDataset introduced in #269.

MaxiBoether added a commit that referenced this issue Oct 12, 2023
In order to hide the latency of data transfer from the selector during
training, we add a configurable number of partitions to be prefetched
in the dataloader. Furthermore, we change the storage and selector to
use processes instead of threads for serving gRPC requests.

Solves #175.
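The commit message above mentions a configurable number of prefetched partitions. One way to express that knob, sketched here with illustrative names rather than the actual Modyn dataloader API, is a bounded queue whose size is the prefetch depth: the producer thread can run at most `num_prefetched_partitions` fetches ahead of consumption.

```python
import queue
import threading


def iter_with_prefetch(fetch_fn, partition_ids, num_prefetched_partitions=2):
    """Yield partitions, keeping up to num_prefetched_partitions fetched ahead.

    fetch_fn: hypothetical synchronous fetch, partition id -> samples.
    num_prefetched_partitions: configurable prefetch depth; the bounded
    queue applies backpressure so memory use stays proportional to it.
    """
    q = queue.Queue(maxsize=num_prefetched_partitions)
    sentinel = object()

    def producer():
        for pid in partition_ids:
            q.put(fetch_fn(pid))  # blocks once the prefetch budget is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (partition := q.get()) is not sentinel:
        yield partition
```

With depth 1 this degenerates to single-partition lookahead; a larger depth trades memory for more tolerance to fetch-latency spikes.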
MaxiBoether (Contributor) commented:

Solved (for now) by #301.
