I am looking for materials on how GPU memory is used in DeepSpeed for the model-parallel, multi-GPU training setting (i.e., the case where the weights do not all fit into a single GPU's memory even with DeepSpeed applied).
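To make the question concrete, here is a minimal, framework-free sketch of the idea I am asking about: ZeRO-style partitioning, where each rank owns only a shard of the flat parameter list, so per-GPU memory scales roughly as 1/world_size of the full model. The function name `partition_params` is hypothetical and just illustrates the partitioning arithmetic, not DeepSpeed's actual implementation.

```python
def partition_params(num_params, world_size):
    # ZeRO-style sharding: each rank owns a contiguous slice of the flat
    # parameter list; earlier ranks absorb the remainder one element each.
    base, rem = divmod(num_params, world_size)
    shards = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

# 10 parameters split across 4 GPUs:
shards = partition_params(10, 4)
# rank 0 -> params 0..2, rank 1 -> 3..5, rank 2 -> 6..7, rank 3 -> 8..9
```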
Does DeepSpeed provide an inference or serving API for a model-parallel, multi-GPU environment? Because the weights are partitioned in the model-parallel setting, I think serving is non-trivial. (For now, though, I am only looking to call a typical inference function within the training process.)
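To illustrate why I think this is non-trivial: with partitioned weights, a forward pass first needs the full weights assembled on each rank, e.g. via an all-gather collective. The sketch below only simulates that collective with plain Python lists; it is not DeepSpeed's API, and `all_gather_sim` is a made-up name for illustration.

```python
def all_gather_sim(shards):
    # Simulated all-gather: every rank contributes its shard, and the
    # concatenation reconstructs the full weight vector on each rank.
    full = []
    for shard in shards:
        full.extend(shard)
    return full

# 4 "ranks", each holding a 2-element shard of an 8-element weight vector.
shards = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
weights = all_gather_sim(shards)
# Only after this gather can an ordinary single-device forward pass run,
# which is why a serving API on top of partitioned weights is non-trivial.
```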
Thank you.
Hi,
From my current understanding, the following post only visualizes the data-parallel, multi-GPU setting: https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/. Any reference materials that explain the internals would be appreciated (either DeepSpeed docs or external docs are fine).
Thank you.