
Granite three support #608

Open

gabe-l-hart wants to merge 3 commits into main from GraniteThreeSupport

Conversation

@gabe-l-hart commented Nov 4, 2024

Description

This PR adds support for the "granite" and "granitemoe" architectures in order to support IBM's Granite 3.0. The changes mirror those added in llama.cpp upstream (ggerganov/llama.cpp#9412 and ggerganov/llama.cpp#9438).

These models are currently available via HuggingFace and Ollama.
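For example, fetching a quantized build could look like this (the Ollama tag and HF repo below are illustrative placeholders, not official names from this PR):

# Pull a community quantization via Ollama (tag name assumed)
ollama pull granite3-dense:2b

# Or download a community GGUF from HuggingFace (repo name is a placeholder)
huggingface-cli download <user>/granite-3.0-2b-instruct-GGUF \
    granite-3.0-2b-instruct.Q4_K_M.gguf --local-dir .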

Testing

I did my development on a Mac M3 without gmake natively installed. To avoid a system-level install, I wrapped my dev environment in Docker with the following two scripts:

build_dockerized.sh
#!/usr/bin/env bash

# Run from the directory this script lives in (the repo root)
cd "$(dirname "${BASH_SOURCE[0]}")"

# Build the builder image, then drop into an interactive shell inside it
# with the repo mounted at /src and local models at /models
docker buildx build . -t llamafile-builder:latest --load
docker run --rm -it --entrypoint bash -w /src \
    -v "$PWD:/src" -v "$HOME/models:/models" \
    llamafile-builder:latest
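The Dockerfile referenced by the build above isn't shown here; a minimal sketch of what the builder image might contain (the base image and package list are my assumptions, not part of this PR) is:

# Hypothetical builder image: just enough to fetch the cosmocc toolchain
# and run the llamafile Makefile
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        make curl ca-certificates unzip zip \
    && rm -rf /var/lib/apt/lists/*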
build_in_docker.sh
#!/usr/bin/env bash

# Usage: build_in_docker.sh <gguf_file> [model_name]
gguf_file="$1"
if [ $# -ge 2 ]
then
    model_name="$2"
else
    model_name="$(basename "$gguf_file" | cut -d'.' -f 1)"
fi
echo "Model Name: $model_name"

# Build (NOTE: First build may fail due to the need to download tools)
make -j || make -j

# Install the built binaries
make install PREFIX=/usr/local

# Make a temp dir to work in
start_dir="$PWD"
temp_dir="$(mktemp -d)"
cd "$temp_dir"

# Copy over the model and base binary
echo "Copying source materials..."
cp "$gguf_file" .
cp "$(which llamafile)" "$model_name.llamafile"

# Make the .args file (the trailing "..." is llamafile's placeholder that
# lets extra command-line arguments pass through at runtime)
echo "Making .args file..."
echo "-m
$(basename "$gguf_file")
--host
0.0.0.0
-ngl
9999
..." > .args

# Pack it all together
echo "Packing with zipalign..."
zipalign -j0 "$model_name.llamafile" "$(basename "$gguf_file")" .args

# Move it back to the root dir
mv "$model_name.llamafile" "$start_dir/"
echo "DONE"

With these scripts, my workflow was:

  1. Download pre-quantized versions of the models (e.g. ollama pull, then grab the $HOME/.ollama/models/blobs/... blob for the GGUF file; see the sketch after this list)
    • NOTE: IBM does not currently host official quantized versions, but there are many community quantizations available on HF (dense, moe)
  2. Launch the docker build shell (./build_dockerized.sh)
  3. Build the llamafile inside (./build_in_docker.sh /models/granite-3.0-2b-instruct.Q4_K_M.gguf granite3-dense-2b)
  4. Run the llamafile outside the docker shell (./granite3-dense-2b.llamafile -p "tell me a story")
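For step 1, finding which blob holds the GGUF can be scripted by reading the pulled model's manifest. A rough sketch, assuming a default Ollama install with jq available (the model/tag names and on-disk layout are assumptions about current Ollama, not anything in this PR):

#!/usr/bin/env bash
# Print the path of the GGUF weights blob for a pulled Ollama model
model=granite3-dense
tag=2b
manifest="$HOME/.ollama/models/manifests/registry.ollama.ai/library/$model/$tag"

# The weights layer has mediaType "application/vnd.ollama.image.model";
# its digest (sha256:<hex>) maps to a blob file named sha256-<hex>
digest="$(jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' "$manifest")"
echo "$HOME/.ollama/models/blobs/${digest/:/-}"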

Open Questions

Solved! I found the PR that landed in llama.cpp after mine, updating the chat template to support "granite": ggerganov/llama.cpp#10013

When running in interactive mode, the chat template seems to use special tokens different from those defined in the chat_template metadata in the GGUF file. I haven't dug in enough yet to understand whether this is something that can be pulled automatically from the GGUF, or whether there's an additional place where the Granite architectures will need to explicitly indicate their chat templates.
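For anyone poking at this, the template actually baked into the GGUF can be dumped with the gguf-dump tool from llama.cpp's gguf Python package (the tool and metadata key come from upstream gguf-py, not from this PR):

# Install the gguf package, then look for the embedded Jinja template
# stored under the metadata key "tokenizer.chat_template"
pip install gguf
gguf-dump /models/granite-3.0-2b-instruct.Q4_K_M.gguf | grep chat_template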

@DK013 commented Nov 6, 2024

I was waiting for this. Thanks a lot for your hard work, mate @gabe-l-hart

@BradHutchings

Thanks for doing this @gabe-l-hart. And thanks for the link @DK013. I appreciate you both!

-Brad

@BradHutchings

I did my own llamafile build with this branch and was able to use IBM Granite 3.0 8B Instruct. Thank you again @gabe-l-hart!

@gabe-l-hart (Author)

Hi @jart! I wanted to check in and see if this PR is something you would consider for upstream merging. I see that you use llama.cpp/README.llamafile to track the version of llama.cpp being used and the list of local modifications on top. I didn't see a clean way to re-bump the commit and apply those deltas, but I'd be happy to re-do this change set to be a full llama.cpp bump if that's preferred.

@DK013 commented Nov 24, 2024

> I did my own llamafile build with this branch and was able to use IBM Granite 3.0 8B Instruct. Thank you again @gabe-l-hart!

I have been wanting to try it but haven't had enough time to sit down and resolve the errors on a Windows machine. @BradHutchings, would you mind sharing your build so I can run some tests as well? Thanks in advance!

@pawel665j commented Nov 24, 2024 via email

@BradHutchings

@DK013 My llamafile builds are here: https://huggingface.co/bradhutchings/DemoMachine-LLMs

Commits (3)

1. This is a port of the work done in llama.cpp directly:
   ggerganov/llama.cpp#9412

   Branch: GraniteThreeSupport

   Signed-off-by: Gabe Goodhart <[email protected]>

2. This is a port of the work done in llama.cpp directly:
   ggerganov/llama.cpp#9438

   Branch: GraniteThreeSupport

   Signed-off-by: Gabe Goodhart <[email protected]>

3. This is a port of the work done in llama.cpp with a slight tweak for the
   tool call response:
   ggerganov/llama.cpp#10013

   Branch: GraniteThreeSupport

   Signed-off-by: Gabe Goodhart <[email protected]>