why deepspeed transformer only adjust init range on rank 0? #253
Comments
@LiweiPeng : The adjust init range happens on rank 0, but since the weights are broadcast to all the ranks in deepspeed.initialize, it will be propagated to all the ranks.
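A minimal sketch of the pattern being described (not the actual DeepSpeed source; the function name, arguments, and scaling rule here are illustrative assumptions): the init-range adjustment is gated on rank 0, and the parameter broadcast performed later during engine setup propagates the adjusted weights to every rank.

```python
import torch
import torch.distributed as dist

def init_transformer_weights_sketch(weight: torch.Tensor,
                                     num_layers: int,
                                     adjust_init_range: bool) -> torch.Tensor:
    """Illustrative only: initialize a weight tensor, shrinking the std on rank 0
    when adjust_init_range is set; the other ranks pick up the adjusted values
    once parameters are broadcast from rank 0 (e.g. by deepspeed.initialize)."""
    std = 0.02
    if adjust_init_range and dist.get_rank() == 0:
        # hypothetical depth-based rescaling, applied only on rank 0
        std = std / (2.0 * num_layers) ** 0.5
    weight.data.normal_(mean=0.0, std=std)
    return weight
```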
@samyam Thanks for the quick response. Since the DeepSpeed transformer code is an excellent feature on its own, in case someone wants to use only the transformer code without the rest of DeepSpeed's functionality, is the adjust init range still applied only on rank 0? Thanks.
@LiweiPeng I think so. Regardless of what you use for data parallelism (DeepSpeed or something else), it must do a broadcast at the beginning of training to make sure the parameters across different ranks are in sync. The one issue I can imagine is if the broadcast happens from an arbitrary source rank rather than rank 0, but that would be a pretty weird implementation decision.
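For concreteness, a minimal sketch (assuming a PyTorch model and an already-initialized torch.distributed process group; the helper name is made up) of the rank-0 broadcast that a data-parallel setup typically performs at the start of training, which is what makes the rank-0-only adjustment reach every rank:

```python
import torch
import torch.distributed as dist

def sync_parameters_from_rank0(model: torch.nn.Module) -> None:
    """Overwrite every rank's parameters and buffers with rank 0's copies."""
    for param in model.parameters():
        dist.broadcast(param.data, src=0)
    for buf in model.buffers():
        dist.broadcast(buf.data, src=0)
```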
@LiweiPeng I do understand, though, that it's a bit confusing. Do you have a suggestion for clarification other than just adding a comment?
Thanks for the clarification. I recommend adding a comment for this. I couldn't figure out a better way because it depends on the user's implementation.
Thanks for open-sourcing your DeepSpeed transformer code.
In deepspeed_cuda.py, in the init_transformer_weights function, the adjust init range feature is applied to rank 0 only. Can you explain why it is applied on rank 0 only? Shouldn't it be applied to all ranks? Thanks.