Feature Request: GLM-4 9B Support #7778
Comments
You can try chatllm.cpp, which supports GLM-4.
Can confirm this works, and it's cool 😎 It would be good to get this functionality into llama.cpp too, if only for the GPU acceleration.
Well, chatllm.cpp is CPU-only. Why not try the transformers version in fp16? llama.cpp GPU support for GLM-4 would be great, and then quantized versions would appear, which would be even more convenient to run. GLM-4 looks comparable to, or better than, Llama 3, maybe even best-in-class for now.
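For reference, here is a minimal sketch of what the fp16 transformers path suggested above might look like. The model ID THUDM/glm-4-9b-chat is an assumption based on the public GLM-4 release, and the snippet uses only standard transformers APIs:

```python
# Minimal sketch: GLM-4 9B in fp16 via Hugging Face transformers.
# The model ID below is an assumption based on the public release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4-9b-chat"  # assumed HF repo name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16, as suggested above
    device_map="auto",          # place layers on available GPU(s)
    trust_remote_code=True,     # GLM-4 ships custom modeling code
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```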
We might have this feature soon: #8031
This issue was closed because it has been inactive for 14 days since being marked as stale.
Any updates?
I saw it's merged, but does it work with llama-cpp-python, and how do I get the vision stuff working in GGUF?
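In case it helps: llama-cpp-python picks up new architectures when it updates its bundled llama.cpp, so a sufficiently recent release should be able to load a converted GLM-4 GGUF. A minimal sketch, assuming a locally converted file (the path below is hypothetical):

```python
# Hedged sketch: loading a GLM-4 GGUF with llama-cpp-python.
# The model path is hypothetical; GLM-4 support depends on the bundled
# llama.cpp version being new enough.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-4-9b-chat-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU if built with GPU support
    n_ctx=8192,       # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}]
)
print(out["choices"][0]["message"]["content"])
```

I can't speak to the vision side; the merged support discussed here covers the text model, as far as I know.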
Feature Description
It would be really cool to have support for these models, which were released today; they have some very impressive benchmarks. I've also been trying the model out in a Hugging Face Space myself and noticed that it speaks many languages fluently and is knowledgeable on many topics. Thank you for your time.
Here are the download links:
Here is the English README: README_en.md
Motivation
The motivation for this feature is found in some of the model's technical highlights:
Here are some of the results (benchmark figures in the original post): the needle-in-a-haystack challenge and LongBench.
Possible Implementation
We might be able to use some of the code from: #6999.
There is also chatglm.cpp, but it doesn't support GLM-4.