
Use multiple CUDA streams for true overlap of compute and data transfer on the GPU. #115

Open
JasonMoho opened this issue Sep 15, 2022 · 0 comments
Assignees: JasonMoho
Labels: enhancement (New feature or request)

Comments

@JasonMoho (Collaborator) commented Sep 15, 2022

Is your feature request related to a problem? Please describe.
Currently, H2D and D2H data transfers run on the default CUDA stream, which is also used by the model's forward/backward pass computation. Because copies and kernels are serialized on a single stream, the GPU's copy engines sit idle during compute, which limits GPU utilization and reduces training throughput by a small amount.

By placing computation and data transfers on separate CUDA streams, the copies can overlap with kernel execution and our processing pipeline can be better optimized.

SALIENT is an example of a system that uses a well-optimized pipeline for training GNNs for node classification. Its separate-CUDA-stream transfer implementation is located here: https://github.com/MITIBMxGraph/SALIENT/blob/master/fast_trainer/transferers.py#L22
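
For reference, here is a minimal sketch of the pattern in PyTorch (not the SALIENT code and not our trainer's actual API; the batch layout and helper names are placeholders): issue the H2D copies on a dedicated stream, then make the compute stream wait on that stream before any kernel touches the data.

```python
# Minimal sketch, assuming PyTorch and a dict-of-tensors batch.
import torch

device = torch.device("cuda")
transfer_stream = torch.cuda.Stream()  # side stream dedicated to H2D copies

def async_to_device(cpu_batch):
    """Issue non-blocking H2D copies on the side stream.

    The source tensors must be in pinned (page-locked) memory, e.g. via
    DataLoader(pin_memory=True); otherwise the copy falls back to a
    synchronous one and nothing overlaps.
    """
    with torch.cuda.stream(transfer_stream):
        return {k: t.to(device, non_blocking=True) for k, t in cpu_batch.items()}

def train_step(model, optimizer, loss_fn, gpu_batch):
    # Kernels run on the default stream; make it wait for the copies first.
    torch.cuda.current_stream().wait_stream(transfer_stream)
    # Tell the caching allocator these tensors are now used by the compute
    # stream, so their memory is not reused while kernels still read it.
    for t in gpu_batch.values():
        t.record_stream(torch.cuda.current_stream())
    optimizer.zero_grad()
    loss = loss_fn(model(gpu_batch["x"]), gpu_batch["y"])
    loss.backward()
    optimizer.step()
    return loss
```

To actually get overlap, the copy for batch N+1 needs to be in flight while the compute for batch N runs, e.g. by prefetching one batch ahead in the training loop, which is the pattern the linked SALIENT transferer is built around.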

@JasonMoho JasonMoho added the enhancement New feature or request label Sep 15, 2022
@JasonMoho JasonMoho self-assigned this Sep 15, 2022