I am looking for materials on how GPU memory is used in DeepSpeed for the model-parallel, multi-GPU training setting (i.e., the case where the weights do not all fit into a single GPU's memory even with DeepSpeed applied).
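To make the question concrete, here is a minimal, framework-free sketch of the idea I am asking about: ZeRO-style partitioning, where each rank owns only a shard of the flat parameter list, so per-GPU memory scales roughly as 1/world_size of the full model. The function name `partition_params` is hypothetical and just illustrates the partitioning arithmetic, not DeepSpeed's actual implementation.

```python
def partition_params(num_params, world_size):
    # ZeRO-style sharding: each rank owns a contiguous slice of the flat
    # parameter list; earlier ranks absorb the remainder one element each.
    base, rem = divmod(num_params, world_size)
    shards = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

# 10 parameters split across 4 GPUs:
shards = partition_params(10, 4)
# rank 0 -> params 0..2, rank 1 -> 3..5, rank 2 -> 6..7, rank 3 -> 8..9
```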
Does DeepSpeed provide an inference or serving API for a model-parallel, multi-GPU environment? Because the weights are partitioned in the model-parallel setting, I think serving is non-trivial. (For now, though, I am only looking to call a typical inference function within the training process.)
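To illustrate why I think this is non-trivial: with partitioned weights, a forward pass first needs the full weights assembled on each rank, e.g. via an all-gather collective. The sketch below only simulates that collective with plain Python lists; it is not DeepSpeed's API, and `all_gather_sim` is a made-up name for illustration.

```python
def all_gather_sim(shards):
    # Simulated all-gather: every rank contributes its shard, and the
    # concatenation reconstructs the full weight vector on each rank.
    full = []
    for shard in shards:
        full.extend(shard)
    return full

# 4 "ranks", each holding a 2-element shard of an 8-element weight vector.
shards = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
weights = all_gather_sim(shards)
# Only after this gather can an ordinary single-device forward pass run,
# which is why a serving API on top of partitioned weights is non-trivial.
```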
Thank you.
Hi,
From my current understanding, the following post only visualizes the data-parallel, multi-GPU setting: https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/. Any reference materials that explain the internals would be appreciated (either DeepSpeed docs or external docs are fine).
Thank you.