[QUESTION] Why does Megatron-LM use the gloo backend when creating parallel groups? #1177
Replies: 4 comments
-
Most of the groups use the default backend.
-
Marking as stale. No activity in 60 days.
-
Hi, I found some code that is hard-coded to use gloo, for example https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L557, but I ran into an issue when creating the gloo backend. If I manually change every "gloo" to "nccl", it works. What is the impact of this change? Is it okay to replace every "gloo" with "nccl"? Thank you.
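For context on what the backend choice affects: in PyTorch, each process group carries its own backend, and the backend determines which tensors its collectives accept. NCCL only handles CUDA tensors, while gloo handles CPU tensors. The sketch below is a minimal, single-process illustration of creating an extra group with an explicit backend via `torch.distributed.new_group`. The function name `demo_gloo_side_group` and the file-based rendezvous are assumptions made for illustration, and whether Megatron-LM's hard-coded gloo groups are used for CPU-side collectives is not confirmed anywhere in this thread.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def demo_gloo_side_group() -> float:
    """Create a gloo group next to the default group and reduce a CPU tensor.

    Hypothetical helper for illustration only; not Megatron-LM code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # File-based rendezvous so the sketch runs as a single process
        # without a launcher (an assumption made for this example).
        rendezvous = os.path.join(tmp, "rendezvous")
        dist.init_process_group(
            backend="gloo",
            init_method=f"file://{rendezvous}",
            rank=0,
            world_size=1,
        )
        try:
            # The backend is chosen per group, independently of the default group.
            cpu_group = dist.new_group(ranks=[0], backend="gloo")
            t = torch.ones(4)  # CPU tensor: supported by gloo, not by NCCL
            dist.all_reduce(t, group=cpu_group)
            return t.sum().item()
        finally:
            dist.destroy_process_group()
```

If such a group were created with `backend="nccl"` instead, any collective issued on it with CPU tensors would be expected to fail, so swapping the backend is only safe if nothing communicates CPU tensors over that group.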
-
Marking as stale. No activity in 60 days.
-
Your question
Why does Megatron-LM use the gloo backend, rather than the value passed by --distributed-backend, when creating parallel groups?