
Add CPU offloading capabilities #10

Closed
wants to merge 2 commits

Conversation


@kdvalin kdvalin commented May 9, 2024

This PR allows DeepSpeed to offload some GPU compute requirements to the CPU. Moving data to system memory drastically lowers the VRAM required to run the training backend. With this, training can be run on a single GH200 with 96 GiB of VRAM.

More information: https://www.deepspeed.ai/tutorials/zero-offload/
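For reference, a ZeRO stage-2 config with CPU optimizer offload has roughly this shape (a sketch based on the linked ZeRO-Offload tutorial; the values are illustrative, not the exact config this PR produces):

```python
# Rough shape of a DeepSpeed ZeRO stage-2 config with CPU optimizer
# offload, following the ZeRO-Offload tutorial linked above. Values
# are illustrative, not the exact config this PR generates.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 2,
        # Move optimizer state and computation to system memory.
        "offload_optimizer": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
}
```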

@n1hility
Member

Thank you for this! LGTM

PTAL @Maxusmusti

@mnmehta

mnmehta commented May 20, 2024

Thanks Keith. For now I think it is sufficient to provide just the offload_optimizer and offload_param options; the deepspeed_optimizer should be picked automatically: no optimizer offload => FusedAdam, and CPU optimizer offload => DeepSpeedCPUAdam. Otherwise we'll need more error checking, e.g. using CPU optimizer offload with the FusedAdam optimizer is not supported.

"offload_param": {"device": "none"},
"offload_optimizer": {"device": "none"},
"offload_param": {"device": offload_param},
"offload_optimizer": {"device": offload_optimizer},

We're using ZeRO-2 but parameter offloading is only available in ZeRO-3? Sounds like offload_param won't actually do anything with stage = 2?

Contributor

That is correct; I wouldn't allow "offload_param" to be configurable here for that reason.

@aldopareja
Member

We have another PR coming (#12) that will add offloading more robustly. Feedback there would be appreciated!

@@ -270,6 +276,8 @@ def main(args):
)
parser.add_argument("--is_granite", action="store_true")
parser.add_argument("--max_batch_len", type=int, default=60000)
parser.add_argument("--offload_optimizer", type=str, default="none", choices=["none", "cpu"])
parser.add_argument("--offload_param", type=str, default="none", choices=["none", "cpu"])
Contributor

We shouldn't allow this; it is not supported by ZeRO-2.
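A minimal sketch of the reduced argument surface this comment implies: with ZeRO stage 2, parameter offload is unavailable, so only the optimizer flag is exposed (a standalone parser for illustration; the real flag would be added to the script's existing parser):

```python
import argparse

# Standalone sketch of the reviewer's suggestion: drop --offload_param
# (unsupported under ZeRO stage 2) and keep only --offload_optimizer.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--offload_optimizer", type=str, default="none", choices=["none", "cpu"]
)

# Example invocation with CPU offload enabled.
args = parser.parse_args(["--offload_optimizer", "cpu"])
```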

@Maxusmusti
Contributor

As mentioned above, the feature has been implemented in PR #12

@Maxusmusti Maxusmusti closed this Jun 19, 2024
7 participants