[XLA:GPU] Add support for NCCL ncclCommInitRankScalable API #21273
CC @ezhulenev
// Returns true if this clique is a subset of `other`: both cliques have the
// same `stream_id` and all clique devices are part of `other` clique.
bool IsSubsetOf(const CliqueKey& other) const final;

// Returns a copy of the key (subkey) with the root device properly set given
// nroots and root_seq_id. The subkey is used to generate a NcclCliqueId.
I feel that a bit more detail would be helpful. I don't have the context, but the problem is that after reading this documentation I still have no idea what's going on :) (cc @frgossen)
I tried to be a bit more verbose. Let me know if it is better.
The NCCL doc is at https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#creating-a-communicator-using-multiple-nccluniqueids
`ncclCommInitRankScalable` enables the initialization of communicators via multiple roots, which improves init performance at large scale. The maximum number of ranks associated with a root rank to initialize a NCCL communicator can be tuned via `--xla_gpu_nccl_init_max_rank_per_root_ratio`. The default is 128 ranks per root.
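To make the ratio concrete, here is a minimal sketch of how a ranks-per-root limit could translate into a number of roots, and how root ranks might be spread across a clique. The helper names `NumRoots` and `RootRanks` are hypothetical, and the even spacing of roots is an assumption for illustration; neither is taken from this PR's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not XLA's actual code): the smallest number of roots
// such that no root serves more than `max_ranks_per_root` ranks.
int64_t NumRoots(int64_t nranks, int64_t max_ranks_per_root) {
  assert(nranks > 0 && max_ranks_per_root > 0);
  // Ceiling division.
  return (nranks + max_ranks_per_root - 1) / max_ranks_per_root;
}

// Hypothetical helper: picks `nroots` root ranks spaced evenly over
// [0, nranks). The even spacing is an assumption made for this sketch.
std::vector<int64_t> RootRanks(int64_t nranks, int64_t nroots) {
  assert(nranks > 0 && nroots > 0 && nroots <= nranks);
  std::vector<int64_t> roots;
  roots.reserve(nroots);
  for (int64_t root_seq_id = 0; root_seq_id < nroots; ++root_seq_id) {
    roots.push_back(root_seq_id * nranks / nroots);
  }
  return roots;
}
```

Under these assumptions, a 1024-rank clique with the default ratio of 128 would use 8 roots, while any clique of 128 ranks or fewer would keep a single root, as with the non-scalable init path.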