
why deepspeed transformer only adjust init range on rank 0? #253

Open
LiweiPeng opened this issue Jun 4, 2020 · 5 comments

Comments

@LiweiPeng

Thanks for open-sourcing your DeepSpeed transformer code.

In deepspeed_cuda.py, in the function init_transformer_weights, the adjust-init-range feature is applied on rank 0 only. Can you explain why only rank 0? Shouldn't it be applied on all ranks? Thanks.

@samyam
Contributor

samyam commented Jun 4, 2020

@LiweiPeng: The adjust init range happens on rank 0, but since the weights are broadcast to all the ranks in deepspeed.initialize, it gets propagated to every rank.
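The mechanism described above can be illustrated with a small self-contained simulation (this is not DeepSpeed's actual code; the names, the world size, and the "halve the std dev" adjustment are all made up for illustration). Each simulated rank initializes its own weights, only rank 0 applies the adjusted init range, and a startup broadcast then copies rank 0's parameters to every rank, which is why the adjustment only needs to happen on rank 0:

```python
import random

WORLD_SIZE = 4
BASE_STD = 0.02
ADJUSTED_STD = BASE_STD / 2  # pretend "adjust init range" halves the std dev

def init_weights(rank, n=8):
    """Each rank initializes independently; only rank 0 adjusts the range."""
    rng = random.Random(rank)  # different seed per rank, like separate processes
    std = ADJUSTED_STD if rank == 0 else BASE_STD
    return [rng.gauss(0.0, std) for _ in range(n)]

def broadcast_from_rank0(all_params):
    """Stand-in for the startup broadcast (e.g. what deepspeed.initialize does):
    every rank overwrites its parameters with rank 0's copy."""
    src = all_params[0]
    return [list(src) for _ in all_params]

params = [init_weights(r) for r in range(WORLD_SIZE)]
assert params[1] != params[0]  # ranks are out of sync before the broadcast

params = broadcast_from_rank0(params)
# After the broadcast, every rank holds rank 0's (adjusted) weights.
assert all(p == params[0] for p in params)
```

In a real data-parallel setup the broadcast would be a collective such as PyTorch's `torch.distributed.broadcast` with `src=0`, but the synchronization effect is the same: whatever rank 0 initialized wins.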

@LiweiPeng
Author

@samyam Thanks for the quick response. Since the DeepSpeed transformer code is an excellent feature on its own, suppose someone wants to use just the transformer kernel without the rest of DeepSpeed's functionality: is adjusting the init range on rank 0 only still correct? Thanks.

@samyam
Contributor

samyam commented Jun 4, 2020

@LiweiPeng I think so. Regardless of what you use for data parallelism (DeepSpeed or something else), it must do a broadcast at the beginning of training to make sure the parameters are in sync across ranks. The one issue I can imagine is an implementation that broadcasts from an arbitrary source rank instead of rank 0, but that would be a pretty unusual implementation decision.

@samyam
Contributor

samyam commented Jun 4, 2020

@LiweiPeng I do understand, though, that it's a bit confusing. Do you have a suggestion for clarifying this, other than just adding a comment?

@LiweiPeng
Author

Thanks for the clarification. I recommend adding a comment about this. I couldn't come up with a better approach, because it depends on the user's implementation.
