Add Deepseek MoE v1 & GigaChat models #10827

Merged: 15 commits, Dec 15, 2024

@Inf1delis Inf1delis commented Dec 14, 2024

Self-reported review complexity:

  • Medium

This PR adds support for the DeepSeek MoE v1 models (Base and Instruct) and for the new GigaChat models (Base and Instruct). Since GigaChat is based on the DeepSeek MoE v1 architecture, the changes for that model are limited to the tokenizer.

@github-actions github-actions bot added testing Everything test related python python script changes labels Dec 14, 2024
@Inf1delis
Contributor Author

@ggerganov Hi! I think this PR is ready. Could you take a look?

Owner

@ggerganov ggerganov left a comment

Minor suggestions: move the new DeepSeek (DS) code so that it sits before the DeepSeek V2 (DS2) code.

@@ -3506,6 +3509,97 @@ def prepare_tensors(self):
raise ValueError(f"Unprocessed experts: {experts}")


@Model.register("DeepseekForCausalLM")
class DeepseekModel(Model):
Owner

Move before DeepseekV2Model above

src/llama.cpp Outdated
}
}
} break;
case LLM_ARCH_DEEPSEEK:
Owner

Move before case LLM_ARCH_DEEPSEEK2 above.

@Inf1delis Inf1delis requested a review from ggerganov December 15, 2024 13:29
@Inf1delis
Contributor Author

Thank you for your suggestions! I hadn't noticed that.
The changes have been made: the new DS code is now placed before DeepseekV2Model and before case LLM_ARCH_DEEPSEEK2.
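
For readers unfamiliar with the `@Model.register("DeepseekForCausalLM")` line in the diff above, here is a minimal sketch of a decorator-based registry of that general shape. It is not the project's actual code: the `_registry` dictionary, the `from_architecture` helper, the empty class bodies, and the `"DeepseekV2ForCausalLM"` key are assumptions made for illustration. In a registry like this, lookup is keyed by the architecture name, so placing `DeepseekModel` before `DeepseekV2Model` would be a matter of file organization rather than behavior.

```python
from typing import Callable, Dict, Type


class Model:
    # Hypothetical registry: architecture name -> converter class.
    _registry: Dict[str, Type["Model"]] = {}

    @classmethod
    def register(cls, *names: str) -> Callable[[Type["Model"]], Type["Model"]]:
        """Return a class decorator that records the class under each name."""
        def wrapper(model_cls: Type["Model"]) -> Type["Model"]:
            for name in names:
                cls._registry[name] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def from_architecture(cls, name: str) -> Type["Model"]:
        """Look up a converter class by architecture name (hypothetical helper)."""
        try:
            return cls._registry[name]
        except KeyError:
            raise NotImplementedError(f"Architecture {name!r} not supported") from None


# Declared before the V2 class, mirroring the ordering requested in review.
@Model.register("DeepseekForCausalLM")
class DeepseekModel(Model):
    pass


# The architecture name used here is an assumption for the sketch.
@Model.register("DeepseekV2ForCausalLM")
class DeepseekV2Model(Model):
    pass
```

Calling `Model.from_architecture("DeepseekForCausalLM")` in this sketch returns `DeepseekModel` regardless of the order in which the two classes are declared.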

@ggerganov ggerganov merged commit a097415 into ggerganov:master Dec 15, 2024
1 check passed
netrunnereve pushed a commit to netrunnereve/llama.cpp that referenced this pull request Dec 16, 2024
* Add deepseek v1 arch & gigachat template

* improve template code

* add readme

* delete comments

* remove comment

* fix format

* lint llama.cpp

* fix order of deepseek and deepseek2, move gigachat template to the end of func

* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need

* remove comments

* move deepseek above deepseek2

* change placement of gigachat chat template