Question about communication time calculation for ZeRO algorithms #22
Unanswered
skyshine102
asked this question in
Q&A
Replies: 0 comments
Hi,
I am studying the communication cost of ZeRO algorithms. I found that the current implementation of llm-analysis only estimates the fwd pass. Do you plan to extend the estimation to the bwd pass in the future, or is this unnecessary?
llm-analysis/llm_analysis/analysis.py
Line 1218 in 9932ff4
I'm not 100% sure whether the communication time of the bwd pass is bounded by 2x that of the fwd pass. According to a communication cost table in the paper "Rethinking Memory and Communication Costs for Efficient Large Language Model Training", this seems to be true, though.
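As a rough sanity check on the 2x figure, here is a minimal sketch of the per-GPU communication volume for ZeRO stage 3 only, following the accounting in the original ZeRO paper: one parameter all-gather in the fwd pass, then a second parameter all-gather plus a gradient reduce-scatter in the bwd pass. The function name `zero3_comm_volume` and its parameters are hypothetical and are not part of llm-analysis. It assumes ring-style collectives that move `(dp_size - 1) / dp_size` of the tensor per GPU, and that gradients use the same dtype as the parameters. It counts bytes, not time, so it says nothing about latency or overlap with compute.

```python
def zero3_comm_volume(num_params: float, dp_size: int, bytes_per_elem: int = 2) -> dict:
    """Estimate per-GPU communication volume (bytes) for one ZeRO stage 3 step.

    Hypothetical helper for illustration; not an llm-analysis API.
    Assumes ring all-gather / reduce-scatter and gradients in the same
    dtype as the parameters.
    """
    if dp_size < 2:
        raise ValueError("dp_size must be at least 2 for any communication to occur")

    # Each ring collective moves (N - 1) / N of the full tensor per GPU.
    one_collective = num_params * bytes_per_elem * (dp_size - 1) / dp_size

    fwd_bytes = one_collective        # all-gather parameters before forward compute
    bwd_bytes = 2 * one_collective    # all-gather parameters again + reduce-scatter gradients

    return {
        "fwd_bytes": fwd_bytes,
        "bwd_bytes": bwd_bytes,
        "total_bytes": fwd_bytes + bwd_bytes,
    }


if __name__ == "__main__":
    vol = zero3_comm_volume(num_params=7e9, dp_size=8)
    print(vol["bwd_bytes"] / vol["fwd_bytes"])
```

Under these assumptions the bwd volume is exactly twice the fwd volume. Whether the bwd time is also bounded by 2x depends on how much of each collective overlaps with compute, which this sketch does not model.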
It would be helpful if you could comment on this topic and share your thoughts on the design choice behind it. Thank you for your wonderful tool!