# Example Analyses with Megatron-LM Models

Below is Table 2 from the paper *Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM*.

*(Image: Table 2 in Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM.)*

## Training Analysis

llm-analysis is run with the setups described in the paper, and its outputs match the *Training time for 300B tokens (days)* reported in Table 2 for the different configurations.
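These numbers can also be sanity-checked against the paper's own approximation for end-to-end training time, time ≈ 8TP/(nX), where T is the number of training tokens, P the parameter count, n the number of GPUs, and X the achieved per-GPU throughput in FLOP/s. A minimal sketch below applies it to the 175B-parameter row; the specific inputs (1024 GPUs at 140 teraFLOP/s achieved per GPU) are my reading of the paper's reported setup, so treat them as assumptions:

```python
# Back-of-the-envelope check of Table 2 using the paper's approximation
# time ~= 8*T*P / (n*X): 6*T*P FLOPs for forward+backward plus 2*T*P for
# activation recomputation, divided by aggregate achieved throughput.

def training_days(tokens: float, params: float,
                  num_gpus: int, flops_per_gpu: float) -> float:
    """Estimated end-to-end training time in days via 8*T*P / (n*X)."""
    seconds = 8 * tokens * params / (num_gpus * flops_per_gpu)
    return seconds / 86400  # seconds per day

# Assumed 175B-model setup: 300B tokens, 1024 GPUs, 140 teraFLOP/s achieved per GPU.
days = training_days(tokens=300e9, params=175e9,
                     num_gpus=1024, flops_per_gpu=140e12)
print(f"{days:.1f} days")  # ~34 days, consistent with the 175B row of Table 2
```

The same function applied to the other rows' parameter counts, GPU counts, and achieved throughputs should reproduce the remaining *Training time for 300B tokens (days)* entries to similar precision.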

## References