[QUESTION] Setting num-attention-heads=0 for Mamba #1194
Unanswered — zixianwang2022 asked this question in Q&A
Your question
Hi, it seems I trigger many assertion errors when trying to train a pure Mamba2 model without any attention by setting NUM_ATTENTION_HEADS=0. Can I just give NUM_ATTENTION_HEADS an arbitrary nonzero number to avoid triggering the assertions? I don't see the errors when I do that.
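For context, here is a minimal sketch (with hypothetical names, not Megatron's actual code) of the kind of configuration check that fails when the head count is zero: the per-head dimension is computed by dividing the hidden size by the number of attention heads, so a value of 0 trips an assertion (or a ZeroDivisionError) even if no attention layer is ever built.

```python
def check_attention_config(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head dimension, mirroring typical transformer config checks.

    Hypothetical illustration only: real frameworks run similar validation
    at startup, before any layers are instantiated.
    """
    assert num_attention_heads > 0, "num_attention_heads must be positive"
    assert hidden_size % num_attention_heads == 0, (
        "hidden_size must be divisible by num_attention_heads"
    )
    return hidden_size // num_attention_heads


# A nonzero placeholder passes the checks even in a pure-Mamba stack
# where attention layers are never actually constructed:
print(check_attention_config(4096, 32))  # head_dim = 128
```

This is why a "random" nonzero value silences the assertions: the check only validates the arithmetic, not whether attention layers exist in the chosen layer pattern.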