
Add the new Multi-Modal model of mistral AI: pixtral-12b #3535

Open
SuperPat45 opened this issue Sep 12, 2024 · 9 comments
Labels: enhancement (New feature or request), roadmap



SuperPat45 commented Sep 12, 2024

Add the new multi-modal model from Mistral AI, pixtral-12b:

https://huggingface.co/mistral-community/pixtral-12b-240910
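
For context, supporting such a model means accepting OpenAI-style multimodal chat requests. Below is a minimal sketch of what a client call could look like, assuming a local OpenAI-compatible endpoint at http://localhost:8080/v1 and a hypothetical model name "pixtral-12b" (both placeholders, not confirmed in this issue):

```python
# Sketch only: endpoint, API key, and model name are assumptions, not part of this issue.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="pixtral-12b",  # hypothetical name from the local model configuration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                # Standard OpenAI-style image content part (URL or data URI)
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```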

SuperPat45 added the enhancement label on Sep 12, 2024

AlexM4H commented Sep 13, 2024

Since yesterday, vLLM has InternVL2 support. :-)

vllm-project/vllm/releases/tag/v0.6.1

mudler added the roadmap label on Sep 13, 2024

mudler (Owner) commented Sep 13, 2024

I guess that would already work with llama.cpp GGUF models if/when it gets supported there (see also ggerganov/llama.cpp#9440).

I'd change the focus of this one to be more generic: add support for multimodal models with vLLM. Examples (a minimal sketch follows the links below):

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py
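
For reference, a minimal sketch along the lines of the linked offline_inference_pixtral.py example, assuming a vLLM version with Pixtral support (>= 0.6.1) and enough GPU memory for the 12B weights; details may differ across vLLM versions:

```python
# Condensed from the linked vLLM Pixtral example; image URL and prompt are placeholders.
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Pixtral-12B-2409"
sampling_params = SamplingParams(max_tokens=512)

# tokenizer_mode="mistral" is needed for the Mistral-format Pixtral checkpoint
llm = LLM(model=model_name, tokenizer_mode="mistral")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/237/200/300"}},
        ],
    }
]

# Offline multimodal chat: vLLM handles image fetching and preprocessing
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```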


AlexM4H commented Sep 26, 2024

vLLM already has Llama 3.2 support: vllm-project/vllm#8811

Georgi wrote two weeks ago:
"Not much has changes since the issue was created. We need contributions to improve the existing vision code and people to maintain it. There is interest to reintroduce full multimodal support, but there are other things with higher priority that are currently worked upon by the core maintainers of the project."
(ggerganov/llama.cpp#8010 (comment))

mudler (Owner) commented Sep 26, 2024

See also: ggerganov/llama.cpp#9455


AlexM4H commented Sep 26, 2024

BTW: "(Coming very soon) 11B and 90B Vision models

11B and 90B models support image reasoning use cases, such as document-level understanding including charts and graphs and captioning of images."

(https://ollama.com/blog/llama3.2)

mudler (Owner) commented Sep 26, 2024

BTW: "(Coming very soon) 11B and 90B Vision models

11B and 90B models support image reasoning use cases, such as document-level understanding including charts and graphs and captioning of images."

(https://ollama.com/blog/llama3.2)

That would be interesting to see, given upstream (llama.cpp) is still working on it: ggerganov/llama.cpp#9643


AlexM4H commented Sep 26, 2024

It seems they are working on that independently: ollama/ollama#6963

mudler (Owner) commented Sep 26, 2024

> It seems they are working on that independently: ollama/ollama#6963

That looks like only the Go side of things to fit the images; the real backend changes seem to be in ollama/ollama#6965.


AlexM4H commented Sep 26, 2024

> It seems they are working on that independently: ollama/ollama#6963
>
> That looks like only the Go side of things to fit the images; the real backend changes seem to be in ollama/ollama#6965.

Oh, yes. Wrong link.

Projects: None yet
Development: No branches or pull requests
3 participants