Add the new multi-modal model of Mistral AI: pixtral-12b #3535
Comments
Since yesterday, vLLM has InternVL2 support. :-)
I guess that would already work with llama.cpp GGUF models if/when it gets supported there (see also ggerganov/llama.cpp#9440). I'd change the focus of this one to be more generic and add support for multimodal with vLLM. Examples: https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py
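For reference, a minimal sketch of what vLLM offline inference with Pixtral looks like, along the lines of the linked example. The model name, prompt, and image URL below are placeholders, and the exact API surface may vary between vLLM versions:

```python
from vllm import LLM, SamplingParams

# Placeholder image; swap in any publicly reachable URL.
image_url = "https://example.com/sample.jpg"

# Pixtral ships with a mistral-format tokenizer, hence tokenizer_mode="mistral".
llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

# OpenAI-style chat messages with an image_url part for the multimodal input.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```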
vLLM already has Llama 3.2 support: vllm-project/vllm#8811. Georgi wrote two weeks ago:
See also: ggerganov/llama.cpp#9455
BTW: "(Coming very soon) 11B and 90B Vision models: 11B and 90B models support image reasoning use cases, such as document-level understanding including charts and graphs and captioning of images."
That would be interesting to see, given upstream (llama.cpp) is still working on it: ggerganov/llama.cpp#9643
It seems they are working on that independently: ollama/ollama#6963
That looks like only the Go side of things to fit the images. The real backend changes seem to be in ollama/ollama#6965
Oh, yes. Wrong link.
Add the new multi-modal model of Mistral AI, pixtral-12b:
https://huggingface.co/mistral-community/pixtral-12b-240910
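If/when this model gets wired up behind the project's OpenAI-compatible API, a client request could look roughly like the sketch below. The endpoint, model name, image URL, and API key are all placeholders; this is the standard OpenAI-style vision payload, not a confirmed interface of this project:

```python
from openai import OpenAI

# Placeholders: point the client at the local server once pixtral-12b is available.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="pixtral-12b",  # hypothetical model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```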