Add support for Phi-3.5-vision-instruct #9209

Draft
wants to merge 2 commits into base: master

abetlen (Collaborator) commented Aug 27, 2024

Adds support for Phi-3.5-vision-instruct.

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision.

The biggest challenge here was porting the hd_transform that is applied to the image embeddings before the projector. It requires a 6D reshape-permute, which I had to adapt into a series of 4D reshape-permutes. It looks like a mess, but it works (I tested this section against NumPy).
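For readers unfamiliar with the trick, here is a minimal NumPy sketch of how a 6D reshape-permute can be expressed using only 4D arrays. It is not the code from this PR: the function names, the 2x2 patch-merge layout, and the toy shapes are assumptions chosen for illustration. The idea is that dimensions which stay adjacent and in the same order across the permute can be folded together, so the permute itself only ever sees four axes.

```python
import numpy as np


def merge_2x2_6d(x: np.ndarray, h: int) -> np.ndarray:
    """Reference version: 2x2 patch merge using a 6D reshape-permute.

    x has shape (n, h*h, c): n crops, an h x h grid of patches, c channels.
    (Hypothetical layout, assumed for this sketch.)
    """
    n, l, c = x.shape
    assert l == h * h and h % 2 == 0
    y = x.reshape(n, h // 2, 2, h // 2, 2, c)   # (n, i, pi, j, pj, c)
    y = y.transpose(0, 1, 3, 2, 4, 5)           # (n, i, j, pi, pj, c)
    return y.reshape(n, (h // 2) ** 2, 4 * c)


def merge_2x2_4d(x: np.ndarray, h: int) -> np.ndarray:
    """Same result, but no intermediate array has more than 4 dimensions.

    (n, i) stay adjacent at the front and (pj, c) stay adjacent at the back,
    so each pair is folded into a single axis before permuting.
    """
    n, l, c = x.shape
    assert l == h * h and h % 2 == 0
    y = x.reshape(n * (h // 2), 2, h // 2, 2 * c)  # ((n, i), pi, j, (pj, c))
    y = y.transpose(0, 2, 1, 3)                    # ((n, i), j, pi, (pj, c))
    return y.reshape(n, (h // 2) ** 2, 4 * c)


# Toy check on a small tensor: both versions should agree exactly.
x = np.arange(2 * 16 * 3, dtype=np.float32).reshape(2, 16, 3)
same = np.array_equal(merge_2x2_6d(x, 4), merge_2x2_4d(x, 4))
```

Comparing the two functions on random or `arange` inputs is the same kind of check as the NumPy test mentioned above.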

Converted GGUFs can be found here

Prompt format

<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n
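As a small illustration, the prompt format can be filled in with a helper like the one below. The function name `build_prompt` is hypothetical, and the sketch assumes that each `\n` in the template is a real newline character and that exactly one image is attached.

```python
def build_prompt(prompt: str) -> str:
    """Wrap a user prompt in the single-image chat format shown above.

    Assumption: "\n" in the template denotes an actual newline character.
    """
    return f"<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n"


example = build_prompt("Describe the image.")
```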

Update
Just realised this still needs work: the hd transform also has to be implemented in the preprocessing steps. I will keep this as a draft for now.

@abetlen abetlen changed the title Add support for Phi3-vision-instruct Add support for Phi3.5-vision-instruct Aug 27, 2024
@abetlen abetlen changed the title Add support for Phi3.5-vision-instruct Add support for Phi-3.5-vision-instruct Aug 27, 2024
@ThiloteE

Partially resolves #9119

ayttop commented Aug 28, 2024

Where is the Phi-3.5-MoE-instruct GGUF?

@mofosyne mofosyne added the "Review Complexity : Medium" label (generally requires more time to grok but manageable by beginner to medium expertise level) Aug 30, 2024
Milor123 commented Sep 1, 2024

@abetlen, bro, I've got this bug.

Could you please explain how I should use this prompt format in an Ollama Modelfile?

<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n

I created the model using my Modelfile:

PARAMETER num_ctx 12000
FROM Phi-3.5-3.8B-vision-instruct-Q8_0.gguf
FROM Phi-3.5-3.8B-vision-instruct-mmproj-F16.gguf
TEMPLATE "<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n"
PARAMETER stop <INST>
PARAMETER stop </INST>
PARAMETER stop <|end|>

But when I run it, this happens:

ollama run Phi-3.5-3.8B-vision-instruct-Q8_0             
Error: llama runner process has terminated: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

What should I change? I am running Ollama from Docker.

Labels: examples, Review Complexity : Medium
5 participants