
Provide .GGUF files? #7

Closed
flatsiedatsie opened this issue May 9, 2024 · 8 comments

@flatsiedatsie

Would it be possible to provide a full range of GGUF files for these wicked models?

I'm trying to convert the 3B myself, but running into issues.
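
For reference, this is roughly the standard llama.cpp conversion flow I'm attempting (a sketch; paths and file names are illustrative). It fails while llama.cpp support for the Granite architecture is still pending:

```sh
# Standard HF -> GGUF conversion with llama.cpp (paths are illustrative)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Produce an f16 GGUF from a local copy of the 3B checkpoint
python convert-hf-to-gguf.py ../granite-3b-code-instruct \
    --outfile granite-3b-code-instruct-f16.gguf --outtype f16
```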

@mayank31398
Member

@flatsiedatsie Some folks in the community have already started working on this.
Relevant discussion: ggerganov/llama.cpp#7116

@YorkieDev

@flatsiedatsie @mayank31398

I've made a couple of GGUFs here:

https://huggingface.co/YorkieOH10/granite-8b-code-instruct-Q8_0-GGUF
https://huggingface.co/YorkieOH10/granite-8b-code-instruct-Q4_K_M-GGUF
https://huggingface.co/YorkieOH10/granite-34b-code-instruct-Q8_0-GGUF

But they're waiting on support in llama.cpp before they can be used.

@mayank31398
Member

Thanks @YorkieDev
I am not very familiar with GGUF, but what's the difference between Q8_0 and Q4_K_M?
Also, is this the PR that needs to merge? ggerganov/llama.cpp#7116

@mayank31398
Member


Also, not sure if it's hard, but it would be awesome if you could create GGUFs of all the models.

@YorkieDev

@mayank31398 That's correct; once that PR has been merged, the GGUFs will work. I can get some more quants made for them.

From my understanding, it's a quality/size tradeoff: Q8 is heavier to run, but Q4 is lighter and is what the majority of folks use.

For more info on GGUF models: https://huggingface.co/docs/hub/gguf
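
For concreteness, this is roughly how such quants are produced with llama.cpp's quantize tool once a base GGUF exists (a sketch; file names are illustrative, and the bits-per-weight figures are approximate):

```sh
# Quantize an f16 GGUF down to the formats above
# Q8_0: ~8.5 bits per weight, near-lossless but the largest of the quants
./quantize granite-8b-f16.gguf granite-8b-Q8_0.gguf Q8_0

# Q4_K_M: ~4.8 bits per weight, roughly half the size for a small quality hit
./quantize granite-8b-f16.gguf granite-8b-Q4_K_M.gguf Q4_K_M
```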

@mayank31398
Member

Thanks for the explanation.

@flatsiedatsie
Author

Having a wider range of quants would rock.

For example, I would like to use Granite in a browser-based project. In that use case it's optimal when a file is below 2 GB in size, and the Q4 quant you provided is just above that threshold.
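
As a back-of-envelope estimate (assuming approximate bits-per-weight figures), GGUF size is roughly parameters × bits-per-weight / 8:

```sh
# Rough GGUF size: params * bits-per-weight / 8
python3 -c "print(f'{3e9 * 4.85 / 8 / 1e9:.1f} GB')"   # 3B at Q4_K_M: ~1.8 GB
python3 -c "print(f'{8e9 * 4.85 / 8 / 1e9:.1f} GB')"   # 8B at Q4_K_M: ~4.8 GB
```

So the 3B at around Q4 sits right at that threshold.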

@mayank31398
Member
