Replies: 2 comments
-
The mcore model uses modules under megatron.core. Most of its layers are optimized with Transformer Engine.
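To make the distinction concrete, here is a minimal sketch of how a training script might pick between the two implementations. It assumes the choice is driven by a single boolean flag. The class names, the `Args` container, and the `use_mcore_models` field are placeholders written for this illustration and are not taken from pretrain_gpt.py itself.

```python
from dataclasses import dataclass


class LegacyGPTModel:
    """Hypothetical stand-in for the older, non-mcore GPT implementation."""
    backend = "legacy"


class McoreGPTModel:
    """Hypothetical stand-in for a GPT model built from megatron.core modules."""
    backend = "mcore"


@dataclass
class Args:
    # Assumed flag name, used only for this sketch.
    use_mcore_models: bool = False


def model_provider(args: Args):
    """Return the model implementation selected by the flag."""
    if args.use_mcore_models:
        return McoreGPTModel()
    return LegacyGPTModel()


if __name__ == "__main__":
    print(model_provider(Args(use_mcore_models=True)).backend)
    print(model_provider(Args()).backend)
```

In a setup like this the rest of the training loop stays the same, and only the module implementations behind the model change.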
-
Marking as stale. No activity in 60 days.
-
What is the difference between running pretrain_gpt.py with and without the mcore model?
pretrain_gpt.py#L33