[QUESTION] Megatron-LM DistributedOptimizer or NeMo MegatronDistributedFusedAdam Optimizer? #1091
Unanswered · TJ-Solergibert asked this question in Q&A · 0 replies
Hi,
After going through both Megatron-LM and NeMo, I've found that the NeMo configs select the MegatronDistributedFusedAdam optimizer from the NeMo framework by default, while Megatron-LM ships its own DistributedOptimizer. The NeMo one is based on Apex's DistributedFusedAdam, which incorporates ZeRO-2. I would like to know which one is better, both in terms of throughput and model performance, and which configuration you would recommend for large-scale training.

Thanks!
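For context, here is a rough sketch of how I understand each one gets enabled; the Megatron-LM flag name and the Apex import path are my assumptions from reading the two codebases, not something confirmed by the docs:

```python
# Sketch only: how each optimizer is selected, as I understand it.
#
# Megatron-LM side (assumption: flag name as in recent Megatron-LM releases):
#   python pretrain_gpt.py ... --use-distributed-optimizer
#
# NeMo side: MegatronDistributedFusedAdam wraps Apex's DistributedFusedAdam,
# which shards optimizer state across the data-parallel ranks (ZeRO-style).
# Import path assumed from Apex contrib; torch.distributed must be
# initialized before the first step().
import torch
from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam


def build_optimizer(model: torch.nn.Module) -> DistributedFusedAdam:
    # Used like a regular torch Adam-style optimizer, but with the optimizer
    # state (and gradient reduction) distributed across ranks.
    return DistributedFusedAdam(
        model.parameters(),
        lr=1e-4,
        betas=(0.9, 0.95),
        weight_decay=0.1,
    )
```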