H5Dataset with PyTorch DataLoader #4
Hello, I wanted to use the H5Dataset with PyTorch's DataLoader to load data with multiple workers.
Replies: 1 comment
Hello 👋

The advantage of `DataLoader` with `num_workers > 0` is that the data processing is concurrent with the main process. Therefore, if a loop iteration takes longer than it takes to fetch a batch and transfer it to the main process, the next iteration will not have to wait for data. However, if iterations are fast, the overhead added by the transfer of data between processes could outweigh the benefits.

Here is an example (`train.h5` contains 1M samples) where using a `DataLoader` is worthwhile. The effect is accentuated by the (very) large batch size.

You can see that the time per iteration is closer to 1s in the second case.
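To make the overlap argument concrete without needing an HDF5 file, here is a minimal toy sketch using only the standard library. It is not `DataLoader` itself: a background thread stands in for the worker processes, and `fetch_batch`, `step`, `FETCH_TIME` and `STEP_TIME` are hypothetical names and values chosen for illustration. It only models the case where an iteration takes longer than a fetch.

```python
import queue
import threading
import time

FETCH_TIME = 0.02  # assumed cost of loading one batch (seconds)
STEP_TIME = 0.03   # assumed cost of one loop iteration (seconds)
NUM_BATCHES = 10


def fetch_batch(i):
    """Pretend to read and collate batch i."""
    time.sleep(FETCH_TIME)
    return i


def step(batch):
    """Pretend to run one loop iteration on a batch."""
    time.sleep(STEP_TIME)


def run_sequential():
    """Fetch and process in the same thread, similar in spirit to num_workers=0."""
    start = time.perf_counter()
    for i in range(NUM_BATCHES):
        step(fetch_batch(i))
    return time.perf_counter() - start


def run_prefetched(depth=2):
    """Fetch in a background thread while the main thread processes."""
    batches = queue.Queue(maxsize=depth)

    def worker():
        for i in range(NUM_BATCHES):
            batches.put(fetch_batch(i))
        batches.put(None)  # sentinel: no more batches

    threading.Thread(target=worker, daemon=True).start()
    start = time.perf_counter()
    while (batch := batches.get()) is not None:
        step(batch)
    return time.perf_counter() - start


sequential = run_sequential()
prefetched = run_prefetched()
print(f"sequential: {sequential:.2f}s, prefetched: {prefetched:.2f}s")
```

In the sequential loop every iteration pays for both the fetch and the step, whereas in the prefetched loop only the first fetch is waited for, so the total time approaches `NUM_BATCHES * STEP_TIME`. The toy does not model the cost of moving data between processes, which is the overhead that can cancel the gain when iterations are fast.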