Add support for Microsoft Phi-4 model #10817
base: master
Conversation
llama : use regular (not a sliding window) attention mask for Phi-4 model
src/llama.cpp
Outdated
@@ -12839,7 +12839,13 @@ struct llm_build_context {
        struct ggml_tensor * inp_pos = build_inp_pos();

        // KQ_mask (mask for 1 head, it will be broadcasted to all heads)
        struct ggml_tensor * KQ_mask_swa = build_inp_KQ_mask_swa();
        struct ggml_tensor * KQ_mask = nullptr;
        if (model.name == "Phi 4") {
I think a better solution would be to check if hparams.n_swa != 0.
I modified my patch to explicitly store a zero sliding_window in case it's null in config.json, and to use the zero value to distinguish Phi-4 from other Phi-3-based models.
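The conversion-side idea can be sketched as follows. This is an illustrative standalone helper, not the actual convert_hf_to_gguf.py code (the function name resolve_sliding_window is hypothetical): a null or missing sliding_window in config.json is stored as 0, which later signals "no sliding window" to the loader.

```python
def resolve_sliding_window(config: dict) -> int:
    """Map a null/missing sliding_window to 0.

    Phi-4 configs carry "sliding_window": null, so they resolve to 0;
    Phi-3 configs carry a positive window length, which is kept as-is.
    The zero value is what the C++ side can later test for (n_swa == 0).
    """
    value = config.get("sliding_window")
    return 0 if value is None else int(value)
```

With this mapping the existing n_swa validation logic needs no special-casing on the model name; the distinguishing fact lives in the stored hparam itself.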
convert_hf_to_gguf.py
Outdated
if self.metadata.name == "Phi 4":
    return self._set_vocab_gpt2()
Alternatively, self._set_vocab_gpt2() could be called when tokenizer.model is missing here, regardless of the model name.
I modified the solution to check the value of tokenizer_class from tokenizer_config.json and to call self._set_vocab_gpt2() if it is GPT2Tokenizer.
This PR adds support for the Microsoft Phi-4 model. Fixes #10814.

Current solution is to:
- Explicitly store a zero sliding_window hparam if it's null in config.json. This allows the old Phi-3 n_swa validation logic to work without any changes.
- If n_swa is 0, use a regular KQ mask instead of a sliding window KQ mask in build_phi3().

Originally, the model name value from general.name ("Phi 4") was used to trigger behavior specific to the Phi-4 model:
1. Using the GPT2 vocab during model conversion
2. Ignoring the sliding_window hparam during model conversion
3. Skipping the sliding window length check (n_swa == 0) in build_phi3()
4. Creating a regular KQ mask instead of a sliding window KQ mask in build_phi3()

Let me know if there is any better way to differentiate Phi-4 from other models based on the PHI3 architecture.
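The mask-selection rule the PR converges on can be summarized in a short sketch. The real logic lives in C++ in build_phi3(); this hypothetical Python function only illustrates the decision, with n_swa == 0 meaning "no sliding window" (Phi-4) and any positive value keeping Phi-3's windowed attention:

```python
def select_kq_mask(n_swa: int) -> str:
    """Pick the attention mask type from the stored sliding-window length.

    n_swa == 0 comes from a null sliding_window in config.json (Phi-4),
    so a regular full-attention KQ mask is used; a positive n_swa keeps
    the sliding window KQ mask used by Phi-3 models.
    """
    if n_swa < 0:
        raise ValueError("n_swa must be non-negative")
    return "regular" if n_swa == 0 else "sliding_window"
```

Because the decision depends only on the stored hparam, no string comparison against general.name is needed at graph-build time.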