CUDA error when running with multiple GPUs #454
I found that adding `args.local_rank if hasattr(args, 'local_rank') else -1` to the `DeepSpeedTransformerConfig` in BingBertSquad solves this problem.
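For reference, a minimal sketch of that workaround (the config values other than `local_rank` are illustrative placeholders, not BingBertSquad's actual settings, and the exact `DeepSpeedTransformerConfig` fields may vary by DeepSpeed version):

```python
import argparse
from deepspeed.ops.transformer import DeepSpeedTransformerConfig

# `args` stands in for BingBertSquad's parsed arguments; the deepspeed
# launcher injects --local_rank into each worker process.
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1)
args, _ = parser.parse_known_args()

# The workaround: forward local_rank to the kernel config so each
# process binds the kernel to its own GPU instead of the default one.
# All other values below are illustrative placeholders.
config = DeepSpeedTransformerConfig(
    batch_size=8,
    hidden_size=768,
    intermediate_size=3072,
    heads=12,
    attn_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    num_hidden_layers=12,
    initializer_range=0.02,
    local_rank=args.local_rank if hasattr(args, 'local_rank') else -1,
    fp16=True,
    pre_layer_norm=False,
)
```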
@YeDeming thanks for using DeepSpeed. Sorry about this issue, but I am glad you found a resolution. From your solution, it seems that args.local_rank == -1 in the 2-GPU case. Can you please confirm that by logging the value of args.local_rank at startup?
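A minimal way to check, assuming the usual argparse setup that the deepspeed launcher drives (it passes `--local_rank` to each worker process):

```python
import argparse

# Log the rank each process received from the deepspeed launcher.
# If this prints -1 on a multi-GPU run, the argument never made it
# into args, which would explain the crash.
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1)
args, _ = parser.parse_known_args()
print(f"startup: args.local_rank = {args.local_rank}")
```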
It seems that you forgot to pass `local_rank=...` into the `DeepSpeedTransformerConfig`. The problem also appears in nvidia_modelingpreln.py.
Hi @YeDeming, thanks for pointing this out. Yes, you are right, we need this argument passed to the kernel. It was not needed previously because the device was set before creating the model, but after refactoring the code we forgot to pass it to the kernel. I have made a PR to fix this: deepspeedai/DeepSpeedExamples#58. Thanks.
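For readers hitting the same crash, the failure mode is consistent with the rank never reaching the kernel: with `local_rank` left at its default of -1, every process ends up on the same default CUDA device. A small illustration of that mapping (my own sketch, not code from the PR):

```python
import torch

# Illustration: how local_rank selects the kernel's device. With -1,
# every process lands on the current default device, which collides
# when two processes each expect their own GPU.
def kernel_device(local_rank: int) -> torch.device:
    if local_rank == -1:
        return torch.device('cuda')          # current default device
    return torch.device('cuda', local_rank)  # this process's own GPU

print(kernel_device(-1))  # cuda
print(kernel_device(1))   # cuda:1
```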
Thanks for open-sourcing this great code!
I tried to load a HuggingFace checkpoint and run the BingBertSquad example with the DeepSpeed transformer kernel.
The script:
I ran it in two environments:
(1) 1080 Ti with the provided Docker image:
1 GPU with fp32 --> success
1 GPU with fp16 --> NaN
2 GPUs with fp32 --> error
(2) TITAN RTX, installed manually with install.sh:
1 GPU with fp16 --> success
2 GPUs with fp16 --> error
The error on the RTX server is shown below (it is similar to the error on the 1080 Ti server):
Looking forward to your reply!
Best wishes,
Deming Ye