
FR: Phi-3-vision-128k-instruct implementation #7444

Closed
mirek190 opened this issue May 21, 2024 · 23 comments
Labels
enhancement (New feature or request), stale

Comments

@mirek190

That model is insane for its size ....

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

@mirek190 mirek190 added the enhancement New feature or request label May 21, 2024
@Galunid Galunid changed the title LLAMACPP must get implementation for Phi-3-vision-128k-instruct FR: Phi-3-vision-128k-instruct implementation May 21, 2024
@simsi-andy

Is it natively supported once someone converts it to gguf?

@4onen

4onen commented May 25, 2024

Is it natively supported once someone converts it to gguf?

Someone has to write the code to run such a model in llama.cpp. Then it would be a model you could convert to GGUF. Until then, no.

@mirek190
Author

I'm patiently waiting for someone to do that ...😭

@HaoHoo

HaoHoo commented May 27, 2024

I've tried to convert the Phi-3-vision-128k-instruct HF model to a GGUF model, but it looks like the current version of llama.cpp does not support the vision model (model.vision_embed_tokens, etc.) in Phi-3v. After I added "Phi3VForCausalLM" to convert-hf-to-gguf.py as a copy of "Phi3ForCausalLM", the run looks like this:

...
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{{'<|' + message['role'] + '|>' + '
' + message['content'] + '<|end|>
' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|>
' -}}{% endif %}
INFO:hf-to-gguf:Exporting model to 'converted.bin'
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {3072, 32064}
...
...
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 330, in write
self.write_tensors()
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 266, in write_tensors
for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 233, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 184, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.vision_embed_tokens.glb_GN'

Tensor names like 'model.vision_embed_tokens.glb_GN' are not listed in the "TensorNameMap" of the tensor_mapping.py file. The additional vision tensors in Phi-3v can be seen here:
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main?show_file_info=model.safetensors.index.json

Is it possible to make llama.cpp support multimodal models like LLaVA and Phi-3v?
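
A minimal sketch of the workaround described above, assuming the May 2024 layout of convert-hf-to-gguf.py (converter classes registered via @Model.register, with Phi3MiniModel handling "Phi3ForCausalLM"); it drops the unmappable vision tensors, so at best it yields a text-only GGUF:

```python
# Sketch only, not a tested patch: reuse the Phi-3 converter for the
# vision checkpoint and drop the vision tower tensors that have no
# TensorNameMap entry.
@Model.register("Phi3VForCausalLM")
class Phi3VModel(Phi3MiniModel):
    model_arch = gguf.MODEL_ARCH.PHI3

    def modify_tensors(self, data_torch, name, bid):
        # model.vision_embed_tokens.* (the CLIP tower, glb_GN, sub_GN, ...)
        # has no mapping yet; skipping it avoids the ValueError above.
        if name.startswith("model.vision_embed_tokens."):
            return []
        return super().modify_tensors(data_torch, name, bid)
```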

@DenisSergeevitch

The model is very good for its size at the OCR task; looking forward to using it in GGUF format.

@HaoHoo

HaoHoo commented May 30, 2024

Hi @ggerganov, Phi-3 vision is similar to LLaVA: it combines Phi-3 with the CLIP-ViT-Large-patch14-336 model. Is it possible to support converting it from HF to GGUF?

@anuran-roy

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

@HaoHoo

HaoHoo commented Jun 1, 2024

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct combines a Phi3 model with a CLIP model: the Phi3 part can be detected and converted, but the CLIP part can't be converted by convert-hf-to-gguf.py, which fails with the tensor-mapping error.

@anuran-roy

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:
ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct combines a Phi3 model with a CLIP model: the Phi3 part can be detected and converted, but the CLIP part can't be converted by convert-hf-to-gguf.py, which fails with the tensor-mapping error.

I did exactly that, as mentioned in the messages above in this issue, and got the exact same problem. Are there any workarounds for this, e.g. if we could somehow decouple the two models?

@farris

farris commented Jun 2, 2024

You can use examples/llava/llava-surgery-v2.py to separate out the CLIP model; I was able to modify it to do so successfully. I'm a bit stuck on the rest... the easiest way to do this, imo, is to modify the code under examples/llava/ to accept the Phi-3 base model and this hacked-off CLIP encoder.
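
For reference, the core of the surgery idea is just splitting the checkpoint on tensor names. A rough sketch, not a substitute for the actual llava-surgery-v2.py logic (input file names are from the Phi-3-vision repo; output names are made up here):

```python
# Illustrative split of the Phi-3-vision checkpoint into text and
# vision halves, keyed on the vision_embed_tokens prefix seen in the
# conversion error above. Output filenames are hypothetical.
from safetensors.torch import load_file, save_file

state = {}
for part in ("model-00001-of-00002.safetensors",
             "model-00002-of-00002.safetensors"):
    state.update(load_file(part))

vision = {k: v for k, v in state.items() if "vision_embed_tokens" in k}
text   = {k: v for k, v in state.items() if "vision_embed_tokens" not in k}

save_file(vision, "phi3v.vision.safetensors")
save_file(text, "phi3v.text.safetensors")
```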

@farris

farris commented Jun 3, 2024

#7705 👁️

@BrainSlugs83

Would it be possible to use a parameter in the GGUF header to indicate that the file contains two sets of tensor data?

I feel like the typical user will expect to use a single GGUF file.
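
The header side of that idea is cheap, since GGUF metadata is free-form key/value pairs. A sketch with gguf-py (the key name is invented here, and nothing in llama.cpp would read it today):

```python
# Sketch: mark a combined file via a custom KV flag. The key is
# hypothetical; reader support would still have to be written.
from gguf import GGUFWriter

writer = GGUFWriter("phi3v-combined.gguf", arch="phi3")
writer.add_bool("phi3v.has_vision_encoder", True)  # hypothetical key
# ...add_tensor() calls for both the text and the vision tensors...
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```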

@github-actions github-actions bot added the stale label Jul 28, 2024
@Aisuko
Contributor

Aisuko commented Aug 4, 2024

bad bot

@muzhig

muzhig commented Aug 4, 2024

sad but true

@github-actions github-actions bot removed the stale label Aug 6, 2024
@coder543

New release of Phi-3.5-vision-instruct today: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

(As well as a 16x3.8B MoE and an updated version of the basic Phi-3.5-mini)

@stygmate

+1 for support

@Milor123

Milor123 commented Aug 27, 2024

@coder543 And can it be converted to GGUF? And can the vision model be used?

@coder543

@Milor123 Nope… that’s why this issue exists.

@simsi-andy

simsi-andy commented Aug 27, 2024

abetlen has already converted it and is working on an experimental branch: https://huggingface.co/abetlen/Phi-3.5-vision-instruct-gguf

@daboe01
Contributor

daboe01 commented Aug 31, 2024

#9209

@ayttop

ayttop commented Sep 2, 2024

Is there code to use Phi-3.5-vision-instruct-gguf with an image locally on llama-cpp-python?
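
For the API shape only: llama-cpp-python's LLaVA-style handler is the closest existing path, sketched below (both file names are hypothetical). Phi-3.5-vision support is exactly what this issue is asking for, so this shows the intended usage, not a working recipe:

```python
# Hedged sketch of llama-cpp-python's multimodal API as it exists for
# LLaVA-style models; expect it to fail for Phi-3.5-vision on current builds.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

llm = Llama(
    model_path="Phi-3.5-vision-instruct.gguf",  # hypothetical file
    chat_handler=Llava15ChatHandler(clip_model_path="mmproj.gguf"),  # hypothetical file
    n_ctx=4096,
)
resp = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///tmp/page.png"}},
        {"type": "text", "text": "OCR this page."},
    ],
}])
print(resp["choices"][0]["message"]["content"])
```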

@github-actions github-actions bot added the stale label Oct 3, 2024
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

@gcapnias

Is the issue actually resolved, or did the bot just close it anyway?

Regards,
