[QUESTION] Does Megatron-LM support Flash Attention for BERT and T5 pretraining? #979
Unanswered
Leo-T-Zang asked this question in Q&A
Replies: 2 comments · 1 reply
-
@shanmugamr1992 Please help answer this question. Thank you!
-
Megatron-LM, when you use the mcore models, will support Flash Attention in the next couple of weeks.
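
While the question is still open, here is a minimal sketch of what non-causal (encoder-style) attention through a FlashAttention-capable kernel looks like, using PyTorch's `scaled_dot_product_attention` as a stand-in. This is only an illustration of the feature being asked about, not Megatron-LM's actual code path; all shapes and variable names below are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical BERT-style (bidirectional) encoder shapes -- illustrative only,
# not taken from Megatron-LM.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
batch, heads, seq_len, head_dim = 2, 12, 512, 64

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Encoder self-attention is non-causal (is_causal=False); PyTorch dispatches to a
# FlashAttention kernel when the device, dtype, and mask configuration allow it.
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
print(out.shape)  # torch.Size([2, 12, 512, 64])
```

The non-causal path matters here because BERT and T5 encoders attend bidirectionally, unlike GPT-style decoders.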
-
My question:
Does Megatron-LM support Flash Attention for BERT and T5 pretraining? If so, where is the code that specifically supports this feature?
Thanks!!!