[QUESTION] How to incorporate MoE into hybrid Mamba efficiently #1243
Unanswered · sunying2018 asked this question in Q&A
Hi,
The provided pretrain_mamba.py uses a Mamba stack spec that mixes Transformer (attention), Mamba, and MLP layers. Is there an easier way to incorporate MoE so that it replaces some of the MLP layers? I found that args.spec conflicts with simply setting the MoE arguments. What considerations require these two to be mutually exclusive? (A rough sketch of what I mean is below.) Thank you so much!
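
To make the question concrete, this is roughly what I was hoping to do. It is only a sketch: the module paths, the `mlp_layer` field on the stack submodules, and the idea of dropping an `MoELayer` spec into the MLP slot are my assumptions and may not match the current Megatron-LM code.

```python
from dataclasses import replace

from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.moe.moe_layer import MoELayer
from megatron.core.models.mamba.mamba_layer_specs import mamba_stack_spec


def hybrid_spec_with_moe() -> ModuleSpec:
    """Copy the hybrid Mamba stack spec, swapping the dense MLP in the
    MLP-layer slot (the '-' entries of --hybrid-override-pattern) for MoE."""
    # Assumed: the stack submodules expose an `mlp_layer` ModuleSpec whose
    # TransformerLayerSubmodules carry the dense `mlp` entry to replace.
    mlp_layer = mamba_stack_spec.submodules.mlp_layer
    # MoE expert/router submodules are omitted here; a real spec would need them.
    moe_mlp = ModuleSpec(module=MoELayer)
    moe_layer = replace(
        mlp_layer, submodules=replace(mlp_layer.submodules, mlp=moe_mlp)
    )
    return replace(
        mamba_stack_spec,
        submodules=replace(mamba_stack_spec.submodules, mlp_layer=moe_layer),
    )
```

Is building a custom spec like this the intended route when args.spec is set, rather than passing the MoE args directly?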