[BUG] stage3 costs more GPU VRAM than stage 2, why? #5153

Open
zhangvia opened this issue Feb 19, 2024 · 0 comments
Labels
bug Something isn't working training

@zhangvia

Describe the bug
I'm using DeepSpeed to train Animate Anyone. I want to use stage 3 to reduce GPU VRAM usage. In theory, stage 3 should save more memory than stage 2, but in practice it doesn't.
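The "in theory" expectation comes from the per-GPU model-state accounting in the ZeRO paper. The sketch below is an illustration, not DeepSpeed's own estimator. It assumes mixed-precision Adam: 2 bytes/param for fp16 weights, 2 for fp16 gradients, and K = 12 for fp32 optimizer states. The function name `zero_model_state_bytes` is hypothetical. The formula covers only parameters, gradients and optimizer states, not activations, communication buffers or fragmentation, so it cannot by itself predict the observed VRAM. Note that with a single GPU (N = 1), stages 2 and 3 come out identical.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU bytes for model states under ZeRO, using the ZeRO paper's accounting.

    Assumes mixed-precision Adam: fp16 params (2 B/param), fp16 grads (2 B/param),
    fp32 optimizer states (k = 12 B/param). Activations and buffers are NOT included.
    """
    psi, n = num_params, num_gpus
    if stage == 0:  # plain data parallel: everything replicated
        return (2 + 2 + k) * psi
    if stage == 1:  # optimizer states partitioned
        return 2 * psi + 2 * psi + k * psi / n
    if stage == 2:  # optimizer states + gradients partitioned
        return 2 * psi + (2 + k) * psi / n
    if stage == 3:  # optimizer states + gradients + parameters partitioned
        return (2 + 2 + k) * psi / n
    raise ValueError("stage must be 0, 1, 2 or 3")


# Example: a hypothetical 1B-parameter model on 8 GPUs, printed in GB (1e9 bytes).
for s in range(4):
    print(f"stage {s}: {zero_model_state_bytes(1_000_000_000, 8, s) / 1e9:.2f} GB")
```

For 8 GPUs this gives 16.00, 5.50, 3.75 and 2.00 GB for stages 0 to 3, so stage 3's advantage over stage 2 depends on N and disappears at N = 1.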

To Reproduce
Use this repo.
Use DeepSpeed with Accelerate to train stage 2 (the second training stage of this model, not DeepSpeed's stage 2). You will find that DeepSpeed stage 3 uses more memory than DeepSpeed stage 2. Also, CPU offload does not help reduce memory either, which confuses me.

@zhangvia zhangvia added bug Something isn't working training labels Feb 19, 2024