Granite three support #608
Conversation
I was waiting for this. Thanks a lot for your hard work, mate @gabe-l-hart
Thanks for doing this @gabe-l-hart. And thanks for the link @DK013. I appreciate you both! -Brad
I did my own llamafile build with this branch and was able to use IBM Granite 3.0 8B Instruct. Thank you again @gabe-l-hart!
Hi @jart! I wanted to check in and see if this PR is something you would consider for upstream merging. I see that you use llama.cpp/README.llamafile to track the version of `llama.cpp`.
I have been wanting to try it but wasn't getting enough time to sit and resolve the errors on a Windows machine. @BradHutchings would you mind sharing your build so I can run some tests as well? Thanks in advance!
I'll try to shoot a video and send a link to the repositories I want to run; right now I'm running a light version consisting of one exe file. My system:
Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 processors)
128 GB RAM
Windows 10 Pro, Version 22H2
Installation date: 06.04.2024
OS build: 19045.5131
Windows Feature Experience Pack 1000.19060.1000.0
@DK013 My llamafile builds are here: https://huggingface.co/bradhutchings/DemoMachine-LLMs
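In case it helps with testing, here is a hedged sketch of fetching and running one of those builds from a shell; the file name is a placeholder for whatever is actually in that repo, and on Windows the downloaded file additionally needs an `.exe` extension (builds over 4 GB keep the GGUF weights external and are passed with `-m`):

```sh
# Placeholder file name; substitute an actual .llamafile from the repo linked above.
curl -L -o granite-3.0-8b-instruct.llamafile \
  "https://huggingface.co/bradhutchings/DemoMachine-LLMs/resolve/main/granite-3.0-8b-instruct.llamafile"
chmod +x granite-3.0-8b-instruct.llamafile
./granite-3.0-8b-instruct.llamafile -p "tell me a story"
```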
This is a direct port of the work done in llama.cpp: ggerganov/llama.cpp#9412. Branch: GraniteThreeSupport. Signed-off-by: Gabe Goodhart <[email protected]>
This is a direct port of the work done in llama.cpp: ggerganov/llama.cpp#9438. Branch: GraniteThreeSupport. Signed-off-by: Gabe Goodhart <[email protected]>
This is a port of the work done in llama.cpp, with a slight tweak for the tool call response: ggerganov/llama.cpp#10013. Branch: GraniteThreeSupport. Signed-off-by: Gabe Goodhart <[email protected]>
Description
This PR adds support for the `"granite"` and `"granitemoe"` architectures in order to support IBM's Granite 3.0. The changes mirror those added in `llama.cpp` upstream:

- `"granite"`: IBM Granite Architecture ggerganov/llama.cpp#9412
- `"granitemoe"`: IBM Granite MoE Architecture ggerganov/llama.cpp#9438

These models are currently available via HuggingFace and Ollama:

- `granite3-dense` (`"granite"`): https://ollama.com/library/granite3-dense
- `granite3-moe` (`"granitemoe"`): https://ollama.com/library/granite3-moe
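As a quick sanity check (not part of this PR), the architecture a GGUF will dispatch on can be read from its `general.architecture` metadata. A minimal sketch assuming the `gguf` Python package from llama.cpp's `gguf-py` is installed, with a placeholder blob name:

```sh
# Read the architecture key from the GGUF pulled via ollama (blob name is a placeholder).
# Granite 3.0 dense models report "granite"; the MoE models report "granitemoe".
pip install gguf
gguf-dump --no-tensors "$HOME/.ollama/models/blobs/sha256-<blob-hash>" | grep general.architecture
```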
Testing
I did my development on a Mac M3 without `gmake` natively installed. To avoid a system-level install, I wrapped my dev environment in `docker` with the following two scripts:

- `build_dockerized.sh`
- `build_in_docker.sh`

With these scripts, my workflow was (a consolidated sketch of these steps follows the list):

1. Fetch a model with `ollama pull`, then grab the `$HOME/.ollama/models/blobs/...` blob for the GGUF file
2. Build the docker image (`./build_dockerized.sh`)
3. Build the `llamafile` inside docker (`./build_in_docker.sh /models/granite-3.0-2b-instruct.Q4_K_M.gguf granite3-dense-2b`)
4. Run the `llamafile` outside the docker shell (`./granite3-dense-2b.llamafile -p "tell me a story"`)
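For reference, here is the same workflow as one copy-pasteable sequence. The Ollama tag, blob hash, and the assumption that a local `models/` directory is what gets mounted at `/models` inside the container are illustrative, not taken from the PR:

```sh
# Illustrative end-to-end run of the workflow above; only the two build scripts
# come from this PR, everything else (tag, blob hash, models/ layout) is assumed.
ollama pull granite3-dense:2b
cp "$HOME/.ollama/models/blobs/sha256-<blob-hash>" models/granite-3.0-2b-instruct.Q4_K_M.gguf
./build_dockerized.sh                                # build the docker image
./build_in_docker.sh /models/granite-3.0-2b-instruct.Q4_K_M.gguf granite3-dense-2b
./granite3-dense-2b.llamafile -p "tell me a story"   # run outside the docker shell
```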
Open Questions
Solved! I found the PR added after mine in `llama.cpp` to update the chat template to support `"granite"`: ggerganov/llama.cpp#10013

When running in interactive mode, the chat template seems to be using different special tokens besides those defined in the `chat_template` metadata in the GGUF file. I haven't dug enough yet to understand if this is something that can be pulled automatically from the GGUF, or if there's an additional place where the Granite architectures will need to explicitly indicate their chat templates.
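A hedged sketch of what that template looks like in practice, based on the Granite 3.0 role markers added in ggerganov/llama.cpp#10013; the binary name comes from the Testing steps above and the flags are the standard llama.cpp CLI options:

```sh
# Granite 3.0 role markers per ggerganov/llama.cpp#10013, passed as a raw prompt;
# -e expands the \n escape, --temp 0 just makes the output deterministic.
./granite3-dense-2b.llamafile -e --temp 0 -p \
  "<|start_of_role|>user<|end_of_role|>tell me a story<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
```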