Add CPU offloading capabilities #10
Conversation
Signed-off-by: Keith Valin <[email protected]>
Thank you for this! LGTM, PTAL @Maxusmusti
Thanks Keith. For now I think it is sufficient to just provide the offload_optimizer and offload_param options; the deepspeed_optimizer should be picked automatically: no optimizer offload => FusedAdam, and CPU optimizer offload => DeepSpeedCPUAdam. Otherwise we'll need more error checking, e.g. using CPU optimizer offload with the FusedAdam optimizer is not supported.
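The automatic selection described above could be sketched like this (a minimal sketch: the optimizer names FusedAdam and DeepSpeedCPUAdam come from DeepSpeed, but this helper and its signature are hypothetical, not part of the PR):

```python
def select_optimizer(offload_optimizer: str) -> str:
    """Pick the optimizer implied by the offload setting.

    DeepSpeedCPUAdam is required when optimizer state lives on the CPU;
    FusedAdam is the fast GPU-resident path when nothing is offloaded.
    """
    if offload_optimizer == "cpu":
        return "DeepSpeedCPUAdam"
    if offload_optimizer == "none":
        return "FusedAdam"
    raise ValueError(f"unsupported offload_optimizer: {offload_optimizer!r}")
```

Deriving the optimizer from the offload setting, as suggested, removes the invalid combination (CPU offload + FusedAdam) by construction instead of via error checking.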
…er param Signed-off-by: Keith Valin <[email protected]>
"offload_param": {"device": "none"}, | ||
"offload_optimizer": {"device": "none"}, | ||
"offload_param": {"device": offload_param}, | ||
"offload_optimizer": {"device": offload_optimizer}, |
We're using ZeRO-2, but parameter offloading is only available in ZeRO-3? Sounds like `offload_param` won't actually do anything with `stage = 2`?
That is correct; I wouldn't allow `offload_param` to be configurable here for that reason.
We have another PR coming, #12, that will add offloading more robustly. Feedback there would be appreciated!
```diff
@@ -270,6 +276,8 @@ def main(args):
     )
     parser.add_argument("--is_granite", action="store_true")
     parser.add_argument("--max_batch_len", type=int, default=60000)
+    parser.add_argument("--offload_optimizer", type=str, default="none", choices=["none", "cpu"])
+    parser.add_argument("--offload_param", type=str, default="none", choices=["none", "cpu"])
```
We shouldn't allow this; it is not supported by ZeRO-2.
As mentioned above, the feature has been implemented in PR #12.
This PR allows DeepSpeed to offload some GPU memory and compute requirements to the CPU. This drastically lowers the VRAM required to run the training backend by moving data to system memory. With this, training can be run on a single GH200 with 96 GiB of VRAM.
More information: https://www.deepspeed.ai/tutorials/zero-offload/
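A back-of-envelope illustration of why optimizer offload saves so much VRAM: under mixed precision, Adam keeps roughly 12 bytes of fp32 state per parameter (master weights, momentum, and variance), all of which CPU offload moves to system memory. This is a rough estimate, not DeepSpeed's exact accounting:

```python
def adam_state_bytes(num_params: int) -> int:
    """Approximate Adam optimizer-state footprint under mixed precision:
    4 bytes each for fp32 master weights, momentum, and variance."""
    return num_params * 12

# A 7B-parameter model carries roughly 84 GB of optimizer state,
# which CPU offload relocates from VRAM to host memory.
saved = adam_state_bytes(7_000_000_000)  # 84_000_000_000 bytes
```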