
Use multiple CUDA streams for true overlap of compute and data transfer on the GPU. #115

Open
JasonMoho opened this issue Sep 15, 2022 · 0 comments
Assignees: JasonMoho
Labels: enhancement (New feature or request)

Comments

@JasonMoho (Collaborator) commented Sep 15, 2022

Is your feature request related to a problem? Please describe.
Currently, H2D and D2H data transfers run on the default CUDA stream, which is also used by the model's forward/backward pass computation. Because copies and kernels are serialized on a single stream, the GPU's copy engines sit idle during compute, which limits GPU utilization and reduces training throughput by a small amount.

By placing computation and data transfers on separate CUDA streams, the copies can overlap with kernel execution and our processing pipeline can be better optimized.

SALIENT is an example of a system that uses a well-optimized pipeline for training GNNs for node classification. Its separate-CUDA-stream transfer implementation is located here: https://github.com/MITIBMxGraph/SALIENT/blob/master/fast_trainer/transferers.py#L22
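
For reference, here is a minimal sketch of the pattern in PyTorch (not the SALIENT code and not our trainer's actual API; the batch layout and helper names are placeholders): issue the H2D copies on a dedicated stream, then make the compute stream wait on that stream before any kernel touches the data.

```python
# Minimal sketch, assuming PyTorch and a dict-of-tensors batch.
import torch

device = torch.device("cuda")
transfer_stream = torch.cuda.Stream()  # side stream dedicated to H2D copies

def async_to_device(cpu_batch):
    """Issue non-blocking H2D copies on the side stream.

    The source tensors must be in pinned (page-locked) memory, e.g. via
    DataLoader(pin_memory=True); otherwise the copy falls back to a
    synchronous one and nothing overlaps.
    """
    with torch.cuda.stream(transfer_stream):
        return {k: t.to(device, non_blocking=True) for k, t in cpu_batch.items()}

def train_step(model, optimizer, loss_fn, gpu_batch):
    # Kernels run on the default stream; make it wait for the copies first.
    torch.cuda.current_stream().wait_stream(transfer_stream)
    # Tell the caching allocator these tensors are now used by the compute
    # stream, so their memory is not reused while kernels still read it.
    for t in gpu_batch.values():
        t.record_stream(torch.cuda.current_stream())
    optimizer.zero_grad()
    loss = loss_fn(model(gpu_batch["x"]), gpu_batch["y"])
    loss.backward()
    optimizer.step()
    return loss
```

To actually get overlap, the copy for batch N+1 needs to be in flight while the compute for batch N runs, e.g. by prefetching one batch ahead in the training loop, which is the pattern the linked SALIENT transferer is built around.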

@JasonMoho JasonMoho added the enhancement New feature or request label Sep 15, 2022
@JasonMoho JasonMoho self-assigned this Sep 15, 2022