Is your feature request related to a problem? Please describe.
Currently, H2D (host-to-device) and D2H (device-to-host) data transfers run on the default CUDA stream, which is also used for the model's forward and backward passes. Because work queued on a single stream executes in order, transfers cannot overlap with computation, which limits GPU utilization and reduces training throughput by a small amount.
Moving data transfers to a CUDA stream separate from the one used for computation would let the two overlap and make the processing pipeline more efficient.
SALIENT is an example of a system that uses a well-optimized pipeline to train GNNs for node classification. Its implementation of transfers on a separate CUDA stream is here: https://github.com/MITIBMxGraph/SALIENT/blob/master/fast_trainer/transferers.py#L22
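To make the request concrete, here is a minimal sketch of the idea in PyTorch. It is not SALIENT's code and not a proposed final design: `prefetch_to_device` is a hypothetical helper name, batches are assumed to be plain CPU tensors, and only the H2D direction is shown. The copy of the next batch is queued on a side stream before the current batch is handed to the caller, so it can overlap with the forward/backward pass on the current stream.

```python
import torch


def prefetch_to_device(batches, device):
    """Yield each batch on `device`, staging the next one on a side stream.

    Hypothetical helper for illustration only; assumes every batch is a
    single CPU tensor.
    """
    device = torch.device(device)
    if device.type != "cuda":
        # No CUDA streams without a GPU: fall back to a plain copy.
        for batch in batches:
            yield batch.to(device)
        return

    copy_stream = torch.cuda.Stream(device=device)  # dedicated H2D stream
    it = iter(batches)

    def stage_next():
        batch = next(it, None)
        if batch is None:
            return None
        with torch.cuda.stream(copy_stream):
            # Pinned (page-locked) host memory lets the copy run
            # asynchronously with respect to the host.
            return batch.pin_memory().to(device, non_blocking=True)

    staged = stage_next()
    while staged is not None:
        compute_stream = torch.cuda.current_stream(device)
        # Work queued on the compute stream after this point waits for
        # the copy that was queued on the side stream.
        compute_stream.wait_stream(copy_stream)
        # Tell the caching allocator that this tensor is used on the
        # compute stream, so its memory is not reused too early.
        staged.record_stream(compute_stream)
        ready = staged
        # Queue the next copy before yielding, so it can overlap with
        # the caller's computation on `ready`.
        staged = stage_next()
        yield ready
```

A training loop would then iterate with `for x in prefetch_to_device(loader, "cuda"): ...`. How much this helps depends on whether the model's kernels leave room for the copy engine to run concurrently, so the gain should be measured rather than assumed.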