Add support for larger Granite Code Models (20B, 34B) #7324

Merged: 2 commits merged into ggerganov:master on May 18, 2024

sroecker
Contributor

The larger Granite Code Models (20B and 34B) are based on Starcoder.
One difference, though, is that they use tied word embeddings.
This change should not break the existing Starcoder models.
A bit more work is required to support all Granite code models; see:
#7116
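
To illustrate what tied word embeddings mean for a loader, here is a minimal sketch in Python. It is not the code from this PR: the helper names `resolve_output_weight` and `logits` are hypothetical, and the tensor names `token_embd.weight` and `output.weight` are used here as illustrative dictionary keys. The idea is that when a checkpoint ships no separate output projection, the token embedding matrix is reused for it, while checkpoints that do ship one are unaffected.

```python
def resolve_output_weight(tensors):
    """Return the output projection, falling back to the token embeddings.

    `tensors` is a plain dict of name -> matrix (list of rows); the names
    are illustrative assumptions, not taken from the PR diff.
    """
    # Prefer a dedicated output projection if the checkpoint ships one.
    out = tensors.get("output.weight")
    if out is None:
        # Tied embeddings: reuse the token embedding matrix as the output head.
        out = tensors["token_embd.weight"]
    return out


def logits(hidden, weight):
    """Compute one logit per vocabulary row as a dot product with `hidden`."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


# Toy data: 3 vocabulary entries, embedding dimension 2.
embd = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tied = {"token_embd.weight": embd}
untied = {
    "token_embd.weight": embd,
    "output.weight": [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
}

print(logits([1.0, 2.0], resolve_output_weight(tied)))
print(logits([1.0, 2.0], resolve_output_weight(untied)))
```

Because the fallback only triggers when the output tensor is absent, a model that provides its own output projection keeps using it, which matches the claim that existing Starcoder models should not break.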

sroecker added 2 commits May 15, 2024 18:08
Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116

A few things still remain to be fixed.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
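
The override argument above has the shape `KEY=TYPE:VALUE`. As a rough sketch of how such a string breaks down, the Python snippet below parses it into a key and a typed value. This is an illustration only, not llama.cpp's parser: the function name `parse_override_kv` is hypothetical, and the set of type names it accepts (`bool`, `int`, `float`, `str`) is an assumption.

```python
def parse_override_kv(arg):
    """Split a KEY=TYPE:VALUE string into (key, typed_value).

    Hypothetical helper for illustration; the accepted type names are
    an assumption, not a statement of what llama.cpp supports.
    """
    key, sep, typed = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=TYPE:VALUE, got {arg!r}")

    type_name, sep, raw = typed.partition(":")
    if not sep:
        raise ValueError(f"missing TYPE: prefix in {typed!r}")

    if type_name == "bool":
        if raw not in ("true", "false"):
            raise ValueError(f"invalid bool value {raw!r}")
        return key, raw == "true"
    if type_name == "int":
        return key, int(raw)
    if type_name == "float":
        return key, float(raw)
    if type_name == "str":
        return key, raw
    raise ValueError(f"unknown type {type_name!r}")


key, value = parse_override_kv("tokenizer.ggml.add_bos_token=bool:false")
print(key, value)
```

Here the override sets the metadata key `tokenizer.ggml.add_bos_token` to a boolean false, replacing whatever value is stored in the model file for that key.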
@mofosyne added the "Review Complexity : Medium" (generally require more time to grok but manageable by beginner to medium expertise level) and "model" (model specific) labels on May 16, 2024
@ggerganov merged commit 0f98acf into ggerganov:master on May 18, 2024
61 of 64 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request May 18, 2024