Add support for Phi-3.5-vision-instruct #9209

abetlen · 2024-08-27T22:24:56Z

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision.

The biggest challenge here was porting over the hd_transform that's performed on the image embeddings before the projector. This requires doing a 6d reshape-permute which I had to adapt to work with a series of 4d reshape-permutes. It looks like a mess but it does work (I tested this section against numpy).
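To illustrate the kind of rewrite described above, here is a minimal NumPy sketch of one 6D reshape-permute expressed with arrays of at most four dimensions (ggml tensors are limited to four). It is not the code from this PR: the function names, the 2x2 patch-merge layout, and the toy sizes are assumptions chosen for illustration. The idea is that neighbouring axes the permutation leaves adjacent and in order can be folded together before the transpose.

```python
import numpy as np


def merge_2x2_6d(x: np.ndarray, h: int) -> np.ndarray:
    """Reference version: 2x2 patch merge via a 6D reshape-permute.

    x has shape (n, h*h, c); the result has shape (n, (h//2)**2, 4*c).
    """
    n, _, c = x.shape
    half = h // 2
    return (
        x.reshape(n, half, 2, half, 2, c)   # (n, i, p, j, q, c)
        .transpose(0, 1, 3, 2, 4, 5)        # (n, i, j, p, q, c)
        .reshape(n, half * half, 4 * c)
    )


def merge_2x2_4d(x: np.ndarray, h: int) -> np.ndarray:
    """Same result using only a 4D intermediate.

    The permutation only swaps axes p and j, so (n, i) can be folded
    into one leading axis and (q, c) into one trailing axis.
    """
    n, _, c = x.shape
    half = h // 2
    y = x.reshape(n * half, 2, half, 2 * c)  # ((n, i), p, j, (q, c))
    y = y.transpose(0, 2, 1, 3)              # ((n, i), j, p, (q, c))
    return y.reshape(n, half * half, 4 * c)


# Toy check: 2 crops, a 4x4 grid of patches, 3 channels (hypothetical sizes).
x = np.arange(2 * 16 * 3, dtype=np.float32).reshape(2, 16, 3)
assert np.array_equal(merge_2x2_6d(x, 4), merge_2x2_4d(x, 4))
```

A permute that moves more than one pair of axes may need several such 4D steps in sequence, which is presumably why the ported version looks messier than the original.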

Converted GGUFs can be found here

Prompt format

<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n
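As a small sketch of how this template might be filled in from Python, assuming the `\n` sequences stand for real newlines and that a single image is referenced as `<|image_1|>` (the helper name `build_phi35_vision_prompt` is hypothetical, not part of llama.cpp):

```python
def build_phi35_vision_prompt(prompt: str) -> str:
    """Wrap a user prompt in the single-image chat format shown above."""
    return f"<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n"


# Example: the text the model would receive for one user turn with one image.
text = build_phi35_vision_prompt("Describe this image.")
```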

Update
Just realised this still needs work to implement the hd transform in the preprocessing steps as well, so I'll keep it as a draft for now.

ThiloteE · 2024-08-28T17:02:39Z

Partially resolves #9119

ayttop · 2024-08-28T17:52:19Z

Where is the Phi-3.5-MoE-instruct GGUF?

Milor123 · 2024-09-01T01:15:54Z

@abetlen, I've run into this bug.

Could you please explain how I should use this prompt format in an ollama Modelfile:

<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n

I've created it using

my Modelfile:

PARAMETER num_ctx 12000
FROM Phi-3.5-3.8B-vision-instruct-Q8_0.gguf
FROM Phi-3.5-3.8B-vision-instruct-mmproj-F16.gguf
TEMPLATE "<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n"
PARAMETER stop <INST>
PARAMETER stop </INST>
PARAMETER stop <|end|>

but when I run it, this happens:

ollama run Phi-3.5-3.8B-vision-instruct-Q8_0             
Error: llama runner process has terminated: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

What should I change? I am running ollama from Docker.

abetlen added 2 commits August 27, 2024 18:11

Add support for Phi3-vision-instruct

dc0625a

Merge remote-tracking branch 'origin' into add-support-for-phi3-vision

951f1d9

github-actions bot added the examples label Aug 27, 2024

abetlen changed the title ~~Add support for Phi3-vision-instruct~~ Add support for Phi3.5-vision-instruct Aug 27, 2024

abetlen changed the title ~~Add support for Phi3.5-vision-instruct~~ Add support for Phi-3.5-vision-instruct Aug 27, 2024

mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Aug 30, 2024

This was referenced Aug 31, 2024

FR: Phi-3-vision-128k-instruct implementation #7444

Closed

Microsoft Phi-3.5 models ollama/ollama#6449

Open
