[QUESTION] Megatron-LM DistributedOptimizer or NeMo MegatronDistributedFusedAdam Optimizer? #1091
Unanswered · TJ-Solergibert asked this question in Q&A · 0 replies
Hi,
After going through both Megatron-LM and NeMo, I've found that the NeMo configs select the MegatronDistributedFusedAdam optimizer from the NeMo framework by default, while Megatron-LM ships its own DistributedOptimizer. The NeMo one is based on Apex's DistributedFusedAdam, which incorporates ZeRO-2. I would like to know which one is better, both in terms of throughput and model performance, and which configuration you would recommend for large-scale training.

Thanks!
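For context, here is a rough sketch of how I understand each one gets enabled; the Megatron-LM flag name and the Apex import path are my assumptions from reading the two codebases, not something confirmed by the docs:

```python
# Sketch only: how each optimizer is selected, as I understand it.
#
# Megatron-LM side (assumption: flag name as in recent Megatron-LM releases):
#   python pretrain_gpt.py ... --use-distributed-optimizer
#
# NeMo side: MegatronDistributedFusedAdam wraps Apex's DistributedFusedAdam,
# which shards optimizer state across the data-parallel ranks (ZeRO-style).
# Import path assumed from Apex contrib; torch.distributed must be
# initialized before the first step().
import torch
from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam


def build_optimizer(model: torch.nn.Module) -> DistributedFusedAdam:
    # Used like a regular torch Adam-style optimizer, but with the optimizer
    # state (and gradient reduction) distributed across ranks.
    return DistributedFusedAdam(
        model.parameters(),
        lr=1e-4,
        betas=(0.9, 0.95),
        weight_decay=0.1,
    )
```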