[QUESTION] How to incorporate MoE into hybrid Mamba efficiently #1243
Unanswered · sunying2018 asked this question in Q&A
Hi,
The provided pretrain_mamba.py uses a Mamba stack spec that mixes Transformer (attention), Mamba, and MLP layers. Is there an easier way to incorporate MoE so that it replaces some of the MLP layers? I found that args.spec conflicts with simply setting the MoE arguments. What considerations require these two to be mutually exclusive? (A rough sketch of what I mean is below.) Thank you so much!
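
To make the question concrete, this is roughly what I was hoping to do. It is only a sketch: the module paths, the `mlp_layer` field on the stack submodules, and the idea of dropping an `MoELayer` spec into the MLP slot are my assumptions and may not match the current Megatron-LM code.

```python
from dataclasses import replace

from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.moe.moe_layer import MoELayer
from megatron.core.models.mamba.mamba_layer_specs import mamba_stack_spec


def hybrid_spec_with_moe() -> ModuleSpec:
    """Copy the hybrid Mamba stack spec, swapping the dense MLP in the
    MLP-layer slot (the '-' entries of --hybrid-override-pattern) for MoE."""
    # Assumed: the stack submodules expose an `mlp_layer` ModuleSpec whose
    # TransformerLayerSubmodules carry the dense `mlp` entry to replace.
    mlp_layer = mamba_stack_spec.submodules.mlp_layer
    # MoE expert/router submodules are omitted here; a real spec would need them.
    moe_mlp = ModuleSpec(module=MoELayer)
    moe_layer = replace(
        mlp_layer, submodules=replace(mlp_layer.submodules, mlp=moe_mlp)
    )
    return replace(
        mamba_stack_spec,
        submodules=replace(mamba_stack_spec.submodules, mlp_layer=moe_layer),
    )
```

Is building a custom spec like this the intended route when args.spec is set, rather than passing the MoE args directly?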