Add CPU offloading capabilities #10
Conversation
Signed-off-by: Keith Valin <[email protected]>
Thank you for this! LGTM, PTAL @Maxusmusti
Thanks Keith. For now I think it is sufficient to just provide the offload_optimizer and offload_param options; the deepspeed_optimizer should be picked automatically: no optimizer offload => FusedAdam, and CPU optimizer offload => DeepSpeedCPUAdam. Otherwise we'll need more error checking, e.g. using CPU optimizer offload with the FusedAdam optimizer is not supported.
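The automatic selection described above could be sketched like this (a minimal sketch: the optimizer names FusedAdam and DeepSpeedCPUAdam come from DeepSpeed, but this helper and its signature are hypothetical, not part of the PR):

```python
def select_optimizer(offload_optimizer: str) -> str:
    """Pick the optimizer implied by the offload setting.

    DeepSpeedCPUAdam is required when optimizer state lives on the CPU;
    FusedAdam is the fast GPU-resident path when nothing is offloaded.
    """
    if offload_optimizer == "cpu":
        return "DeepSpeedCPUAdam"
    if offload_optimizer == "none":
        return "FusedAdam"
    raise ValueError(f"unsupported offload_optimizer: {offload_optimizer!r}")
```

Deriving the optimizer from the offload setting, as suggested, removes the invalid combination (CPU offload + FusedAdam) by construction instead of via error checking.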
…er param Signed-off-by: Keith Valin <[email protected]>
"offload_param": {"device": "none"}, | ||
"offload_optimizer": {"device": "none"}, | ||
"offload_param": {"device": offload_param}, | ||
"offload_optimizer": {"device": offload_optimizer}, |
We're using ZeRO-2, but parameter offloading is only available in ZeRO-3? Sounds like `offload_param` won't actually do anything with `stage = 2`?
That is correct; I wouldn't allow `offload_param` to be configurable here for that reason.
We have another PR coming, #12, that will add offloading more robustly. Feedback there would be appreciated!
```diff
@@ -270,6 +276,8 @@ def main(args):
     )
     parser.add_argument("--is_granite", action="store_true")
     parser.add_argument("--max_batch_len", type=int, default=60000)
+    parser.add_argument("--offload_optimizer", type=str, default="none", choices=["none", "cpu"])
+    parser.add_argument("--offload_param", type=str, default="none", choices=["none", "cpu"])
```
We shouldn't allow this; it is not supported by ZeRO-2.
As mentioned above, the feature has been implemented in PR #12.
This PR allows DeepSpeed to offload some GPU memory and compute requirements to the CPU. This drastically lowers the VRAM required to run the training backend by moving data to system memory. With this, training can be run on a single GH200 with 96 GiB of VRAM.
More information: https://www.deepspeed.ai/tutorials/zero-offload/
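A back-of-envelope illustration of why optimizer offload saves so much VRAM: under mixed precision, Adam keeps roughly 12 bytes of fp32 state per parameter (master weights, momentum, and variance), all of which CPU offload moves to system memory. This is a rough estimate, not DeepSpeed's exact accounting:

```python
def adam_state_bytes(num_params: int) -> int:
    """Approximate Adam optimizer-state footprint under mixed precision:
    4 bytes each for fp32 master weights, momentum, and variance."""
    return num_params * 12

# A 7B-parameter model carries roughly 84 GB of optimizer state,
# which CPU offload relocates from VRAM to host memory.
saved = adam_state_bytes(7_000_000_000)  # 84_000_000_000 bytes
```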