update: support Qwen2-57B-A14B #7835
Conversation
You should fix up llama.cpp. Specifically, here: line 5808 in c9ee711;
do the same as DeepSeekV2 here: line 4484 in c9ee711;
and here: line 6352 in c9ee711;
and here: line 5017 in c9ee711.
I'm sorry, as I'm not that well versed in this topic, can you please explain why those changes need to happen? I'm already setting the expert feed-forward size from the convert-hf-to-gguf.py script.
In the first line I referenced you can see that it doesn't actually read the expert feed-forward size you are setting; it's calculated from the feed-forward length and the number of used experts. For backwards and forwards compatibility you need to read that value from metadata if it exists, and do the calculation if it doesn't.
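(For illustration only: a minimal Python sketch of that fallback pattern, assuming the {arch}.expert_feed_forward_length key name and the old n_ff / n_expert_used derivation; llama.cpp itself does this in C++ while loading hparams, so the helper below is hypothetical.)

```python
# Hypothetical sketch: prefer the value stored in GGUF metadata, and fall back to the
# old derivation for files converted before the key existed.
def resolve_n_ff_exp(metadata: dict, arch: str, n_ff: int, n_expert_used: int) -> int:
    key = f"{arch}.expert_feed_forward_length"  # LLM_KV_EXPERT_FEED_FORWARD_LENGTH
    if key in metadata:                         # newer GGUFs: written by the converter
        return int(metadata[key])
    return n_ff // n_expert_used                # older GGUFs: keep the previous calculation

# Older file without the key (Qwen1.5-MoE-A2.7B-like numbers): 5632 // 4 -> 1408
print(resolve_n_ff_exp({}, "qwen2moe", n_ff=5632, n_expert_used=4))
# Newer file with the key (Qwen2-57B-A14B-like numbers): -> 2560
print(resolve_n_ff_exp({"qwen2moe.expert_feed_forward_length": 2560}, "qwen2moe",
                       n_ff=18944, n_expert_used=8))
```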
Ah, I see. Thank you for the explanation. I will add the changes as soon as I get an opportunity.
previously, expert ff was taken from n_ff (intermediate size), but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH; n_ff_exp and n_ff_shared_exp are now properly calculated
I've pushed the llama.cpp changes as requested and cleaned up the convert-hf-to-gguf.py file, as the workaround was not needed anymore.
Fix the lint checks (EditorConfig and flake8).
@CISC could you review/test the changes?
convert-hf-to-gguf.py (outdated)
if (shared_expert_intermediate_size := self.find_hparam(["shared_expert_intermediate_size", "intermediate_size", "n_inner"])) is not None:
    self.gguf_writer.add_feed_forward_length(shared_expert_intermediate_size)
    logger.info(f"gguf: feed forward length = {shared_expert_intermediate_size}")
Why was this removed?
Not removed, but cleaned up.

add_feed_forward_length is called by super().set_gguf_parameters(), as it was before my PR. add_expert_feed_forward_length is the only new thing added, and it is the one taken into account here https://github.com/ggerganov/llama.cpp/pull/7835/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5814 and here https://github.com/ggerganov/llama.cpp/pull/7835/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5820

Since LLM_KV_EXPERT_FEED_FORWARD_LENGTH is the only one taken into account, it felt irrelevant to set LLM_KV_FEED_FORWARD_LENGTH from shared_expert_intermediate_size.
I see.

However, looking at Qwen1.5-MoE-A2.7B's config.json:

"intermediate_size": 5632,
"moe_intermediate_size": 1408,
"shared_expert_intermediate_size": 5632,
"num_experts_per_tok": 4,

and Qwen2-57B-A14B has the following values:

"intermediate_size": 18944,
"moe_intermediate_size": 2560,
"shared_expert_intermediate_size": 20480,
"num_experts_per_tok": 8,

Although I'm still not sure what Qwen2's intermediate_size refers to, since we are now keeping that as feed_forward_length, would it not make sense to store shared_expert_intermediate_size also?
Not sure as what, though. @ggerganov, any thoughts?
Trying to see if there are any good docs on these values...
I'm unsure what intermediate_size is either. For Qwen1.5-MoE-A2.7B the values look correct; that's why I'm inclined to say that intermediate_size = 18944 could have been a mistake.

As for storing shared_expert_intermediate_size, I see no specific slot for it, apart from overriding LLM_KV_FEED_FORWARD_LENGTH, which might not be the best option (even though it's unused).
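For what it's worth, a quick arithmetic check of the two configs quoted above (purely illustrative; it only restates the config.json numbers):

```python
# Illustrative check of the config.json values quoted earlier in this thread.
configs = {
    "Qwen1.5-MoE-A2.7B": {"intermediate_size": 5632, "moe_intermediate_size": 1408,
                          "shared_expert_intermediate_size": 5632, "num_experts_per_tok": 4},
    "Qwen2-57B-A14B":    {"intermediate_size": 18944, "moe_intermediate_size": 2560,
                          "shared_expert_intermediate_size": 20480, "num_experts_per_tok": 8},
}

for name, c in configs.items():
    routed = c["moe_intermediate_size"] * c["num_experts_per_tok"]
    # Qwen1.5-MoE-A2.7B: 1408 * 4 = 5632, matching both intermediate_size and
    # shared_expert_intermediate_size. Qwen2-57B-A14B: 2560 * 8 = 20480, which matches
    # shared_expert_intermediate_size but not intermediate_size (18944).
    print(name, routed, c["intermediate_size"], c["shared_expert_intermediate_size"])
```

If that pattern is meaningful, it would fit the suspicion that Qwen2's intermediate_size does not describe the MoE expert FFN at all.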
@CISC What do you think the next steps should be?

We can leave this PR as is, since it works, and then start a new one with a proper impl? Otherwise, adding mlp_only_layers: int[], shared_expert_intermediate_size: int and decoder_sparse_step: int to the config (and converter) is probably out of scope.
I think adding a shared_expert_feed_forward_length is within the scope of this PR. The sparse layer stuff should be another PR.
Ok, would you like to simply override LLM_KV_FEED_FORWARD_LENGTH with shared_expert_intermediate_size, or create a new one, say LLM_KV_SHARED_EXPERT_FEED_FORWARD_LENGTH?
The latter since you need to keep the original value for the other PR. Or, well, at least it will make things a little less confusing.
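A rough sketch of what the converter side of such a new key could look like; the key string and the helper below are assumptions for illustration, not the final gguf-py naming:

```python
# Hypothetical converter-side sketch: write the shared expert FFN size under its own
# key instead of overriding {arch}.feed_forward_length.
def moe_ffn_metadata(arch: str, hparams: dict) -> dict:
    kv = {}
    if (n_ff_exp := hparams.get("moe_intermediate_size")) is not None:
        kv[f"{arch}.expert_feed_forward_length"] = n_ff_exp
    if (n_ff_shexp := hparams.get("shared_expert_intermediate_size")) is not None:
        # assumed key name, mirroring the proposed shared-expert feed-forward-length KV
        kv[f"{arch}.expert_shared_feed_forward_length"] = n_ff_shexp
    return kv

print(moe_ffn_metadata("qwen2moe", {"moe_intermediate_size": 2560,
                                    "shared_expert_intermediate_size": 20480}))
```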
Re: mlp_only_layers: int[] and decoder_sparse_step: int

@legraphista I've had to deal with models which have both MoE layers and MLP-only layers in #7531 (Jamba). A new metadata key-value pair is not needed to identify these layers. The easy way is to check for the presence of layers.ffn_gate_inp, which is only there on MoE layers; when building the compute graph, build_llama can be a good inspiration for how to check this per-layer.
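A language-agnostic illustration of that per-layer check (all names below are placeholders; in llama.cpp this branch lives in the C++ graph-building code):

```python
# Placeholder sketch of the idea: no extra metadata needed, just branch on whether the
# per-layer MoE router tensor (ffn_gate_inp) is present.
class Layer:
    def __init__(self, ffn_gate_inp=None):
        self.ffn_gate_inp = ffn_gate_inp  # router weights, only present on MoE layers

def build_ffn(layer: Layer) -> str:
    if layer.ffn_gate_inp is not None:
        return "moe_ffn"    # route tokens through experts
    return "dense_ffn"      # plain MLP layer

layers = [Layer(ffn_gate_inp="router"), Layer(), Layer(ffn_gate_inp="router")]
print([build_ffn(l) for l in layers])  # ['moe_ffn', 'dense_ffn', 'moe_ffn']
```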
I've updated the …
LGTM
{ LLM_KV_EXPERT_SHARED_COUNT, "%s.expert_shared_count" },
{ LLM_KV_EXPERT_WEIGHTS_SCALE, "%s.expert_weights_scale" },
{ LLM_KV_POOLING_TYPE , "%s.pooling_type" },
{ LLM_KV_LOGIT_SCALE, "%s.logit_scale" },

{ LLM_KV_ATTENTION_HEAD_COUNT, "%s.attention.head_count" },
You might have to align all of them, before and after, up to @ggerganov
previously, expert ff was taken from n_ff (intermediate size), but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH; n_ff_exp and n_ff_shexp are now properly calculated
Good to go! :)
We can merge after @compilade approves
Added support for keys moe_intermediate_size and shared_expert_intermediate_size for Qwen2-57B-A14B.

Caveat: since self.gguf_writer.add_feed_forward_length was getting called by super().set_gguf_parameters(), I had to copy-paste the existing code to update it properly.
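For context, a self-contained sketch of the cleaned-up shape discussed above (calling super() and only adding the expert-specific key), with stand-ins for the converter's Model base class and GGUFWriter; these are assumptions for illustration, not the actual convert-hf-to-gguf.py code:

```python
# Sketch of the override pattern: let the base class keep writing feed_forward_length
# from intermediate_size, then add only the MoE-specific key on top.
class BaseModelSketch:                      # stand-in for the converter's Model base class
    def __init__(self, hparams, gguf_writer):
        self.hparams, self.gguf_writer = hparams, gguf_writer
    def set_gguf_parameters(self):
        self.gguf_writer["feed_forward_length"] = self.hparams["intermediate_size"]

class Qwen2MoeModelSketch(BaseModelSketch):
    def set_gguf_parameters(self):
        super().set_gguf_parameters()       # keep the base behaviour instead of copy-pasting it
        if (n_ff_exp := self.hparams.get("moe_intermediate_size")) is not None:
            self.gguf_writer["expert_feed_forward_length"] = n_ff_exp

writer = {}
Qwen2MoeModelSketch({"intermediate_size": 18944, "moe_intermediate_size": 2560},
                    writer).set_gguf_parameters()
print(writer)  # {'feed_forward_length': 18944, 'expert_feed_forward_length': 2560}
```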