Provide .GGUF files? #7

Would it be possible to provide a full range of GGUF files for these wicked models? I'm trying to convert the 3B myself, but running into issues.
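For anyone else attempting the conversion, here is a rough sketch of the usual llama.cpp route. It assumes a local llama.cpp checkout that already includes Granite support (see the comments below about a pending PR); the repo id, paths, and output file name are illustrative, not confirmed commands from this thread.

```python
# Sketch: converting a Granite checkpoint to GGUF with llama.cpp's converter.
# Assumes llama.cpp is cloned locally and its Python requirements are
# installed; the converter script's name can differ between llama.cpp versions.
import subprocess
from huggingface_hub import snapshot_download

# Fetch the Hugging Face checkpoint (repo id assumed from the Granite collection).
model_dir = snapshot_download("ibm-granite/granite-3b-code-instruct")

# Emit an fp16 GGUF first; quantization is a separate step.
subprocess.run(
    [
        "python", "llama.cpp/convert-hf-to-gguf.py",
        model_dir,
        "--outfile", "granite-3b-code-instruct-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```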
@flatsiedatsie Some folks in the community have already started working on this.
I've made a couple of GGUFs here: https://huggingface.co/YorkieOH10/granite-8b-code-instruct-Q8_0-GGUF They're waiting on support in llama.cpp before they can be used.
Thanks @YorkieDev.
Also, not sure if it's hard, but it would be awesome if you could create GGUFs of all the models.
@mayank31398 That's correct; once that PR's been merged, GGUFs will work. I can get some more quants of them made up. From my understanding, it's a scale of quality versus size: Q8 is heavier to run, while Q4 is lighter and is what the majority of folks use. For more info on GGUF models: https://huggingface.co/docs/hub/gguf
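To illustrate that tradeoff, here is a hedged sketch of how several quant levels are typically produced with llama.cpp's quantize tool; the input file and the quant list are assumptions, not the exact commands used for the uploads above.

```python
# Sketch: generating several quantization levels from one fp16 GGUF.
# The binary is named llama-quantize in newer llama.cpp builds.
import subprocess

QUANTS = ["Q8_0", "Q5_K_M", "Q4_K_M"]  # roughly heavier/better -> lighter/smaller

for q in QUANTS:
    subprocess.run(
        [
            "llama.cpp/quantize",                  # built via `make` in llama.cpp
            "granite-3b-code-instruct-f16.gguf",   # fp16 GGUF from the converter
            f"granite-3b-code-instruct-{q}.gguf",
            q,
        ],
        check=True,
    )
```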
Thanks for the explanation.
Having a wider range of quants would rock. For example, I would like to use Granite in a browser-based project. In that use case it's optimal when a file is below 2 GB in size, and the Q4 quant you provided is just above that threshold.
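As a quick way to check which files fit that budget, here is a small sketch using the Hugging Face Hub API; the repo id is the one linked earlier in the thread, and the 2 GB cutoff mirrors the browser constraint described above.

```python
# Sketch: list GGUF files in a repo and keep those under the ~2 GiB budget.
from huggingface_hub import HfApi

LIMIT = 2 * 1024**3  # ~2 GiB budget for the browser use case

info = HfApi().model_info(
    "YorkieOH10/granite-8b-code-instruct-Q8_0-GGUF", files_metadata=True
)
for f in info.siblings:
    if f.rfilename.endswith(".gguf") and f.size and f.size < LIMIT:
        print(f"{f.rfilename}: {f.size / 1024**3:.2f} GiB")
```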
GGUFs are now available: https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330
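For completeness, a minimal sketch of loading one of the published GGUFs with llama-cpp-python; the model file name is a placeholder for whichever quant you download from the collection.

```python
# Sketch: running a downloaded Granite GGUF via llama-cpp-python
# (pip install llama-cpp-python). The model file name is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="granite-3b-code-instruct.Q4_K_M.gguf", n_ctx=4096)
out = llm("def fibonacci(n):", max_tokens=64)
print(out["choices"][0]["text"])
```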