
FR: Phi-3-vision-128k-instruct implementation #7444

Closed
mirek190 opened this issue May 21, 2024 · 23 comments
Labels
enhancement (New feature or request), stale

Comments

@mirek190

That model is insane for its size ....

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

@mirek190 mirek190 added the enhancement New feature or request label May 21, 2024
@Galunid Galunid changed the title LLAMACPP must get implementation for Phi-3-vision-128k-instruct FR: Phi-3-vision-128k-instruct implementation May 21, 2024
@simsi-andy

Is it natively supported once someone converts it to gguf?

@4onen

4onen commented May 25, 2024

Is it natively supported once someone converts it to gguf?

Someone has to write the code to run such a model in llama.cpp. Then it would be a model you could convert to GGUF. Until then, no.

@mirek190
Author

I'm patiently waiting for someone to do that ...😭

@HaoHoo

HaoHoo commented May 27, 2024

I've tried to convert the Phi-3-vision-128k-instruct HF model to a GGUF model, but it looks like the current version of llama.cpp does not support the vision model (model.vision_embed_tokens, etc.) in Phi-3v. After I added "Phi3VForCausalLM" to convert-hf-to-gguf.py as a copy of "Phi3ForCausalLM", the run looks like this:

...
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{{'<|' + message['role'] + '|>' + '
' + message['content'] + '<|end|>
' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|>
' -}}{% endif %}
INFO:hf-to-gguf:Exporting model to 'converted.bin'
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {3072, 32064}
...
...
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 330, in write
self.write_tensors()
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 266, in write_tensors
for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 233, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 184, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.vision_embed_tokens.glb_GN'

Tensor names like 'model.vision_embed_tokens.glb_GN' are not listed in the "TensorNameMap" of the tensor_mapping.py file. The additional vision tensors in Phi-3v can be seen here:
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main?show_file_info=model.safetensors.index.json

Is it possible to make llama.cpp support multimodal models like LLaVA and Phi-3v?
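
A minimal sketch of the workaround described above, assuming the May 2024 layout of convert-hf-to-gguf.py (converter classes registered via @Model.register, with Phi3MiniModel handling "Phi3ForCausalLM"); it drops the unmappable vision tensors, so at best it yields a text-only GGUF:

```python
# Sketch only, not a tested patch: reuse the Phi-3 converter for the
# vision checkpoint and drop the vision tower tensors that have no
# TensorNameMap entry.
@Model.register("Phi3VForCausalLM")
class Phi3VModel(Phi3MiniModel):
    model_arch = gguf.MODEL_ARCH.PHI3

    def modify_tensors(self, data_torch, name, bid):
        # model.vision_embed_tokens.* (the CLIP tower, glb_GN, sub_GN, ...)
        # has no mapping yet; skipping it avoids the ValueError above.
        if name.startswith("model.vision_embed_tokens."):
            return []
        return super().modify_tensors(data_torch, name, bid)
```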

@DenisSergeevitch

The model is very good for its size at the OCR task; looking forward to using it in GGUF format.

@HaoHoo

HaoHoo commented May 30, 2024

Hi @ggerganov, Phi-3 vision is similar to LLaVA: it combines Phi-3 with the CLIP-ViT-Large-patch14-336 model. Is it possible to support converting it from HF to GGUF?

@anuran-roy

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

@HaoHoo

HaoHoo commented Jun 1, 2024

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct combines a Phi3 model with a CLIP model: the Phi3 part can be detected and converted, but the CLIP part can't be converted by convert-hf-to-gguf.py, which fails with the tensor-mapping error.

@anuran-roy

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:
ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct combines a Phi3 model with a CLIP model: the Phi3 part can be detected and converted, but the CLIP part can't be converted by convert-hf-to-gguf.py, which fails with the tensor-mapping error.

I did exactly that, as mentioned in the messages above in this issue, and got the exact same problem. Are there any workarounds for this, e.g. if we could somehow decouple the two models?

@farris

farris commented Jun 2, 2024

You can use examples/llava/llava-surgery-v2.py to separate out the CLIP model; I was able to modify it to do so successfully. I'm a bit stuck on the rest... the easiest way to do this, imo, is to modify the code under examples/llava/ to accept the Phi-3 base model and this hacked-off CLIP encoder.
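
For reference, the core of the surgery idea is just splitting the checkpoint on tensor names. A rough sketch, not a substitute for the actual llava-surgery-v2.py logic (input file names are from the Phi-3-vision repo; output names are made up here):

```python
# Illustrative split of the Phi-3-vision checkpoint into text and
# vision halves, keyed on the vision_embed_tokens prefix seen in the
# conversion error above. Output filenames are hypothetical.
from safetensors.torch import load_file, save_file

state = {}
for part in ("model-00001-of-00002.safetensors",
             "model-00002-of-00002.safetensors"):
    state.update(load_file(part))

vision = {k: v for k, v in state.items() if "vision_embed_tokens" in k}
text   = {k: v for k, v in state.items() if "vision_embed_tokens" not in k}

save_file(vision, "phi3v.vision.safetensors")
save_file(text, "phi3v.text.safetensors")
```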

@farris

farris commented Jun 3, 2024

#7705 👁️

@BrainSlugs83

Would it be possible to use a parameter in the GGUF header to indicate that the file contains two sets of tensor data?

I feel like the typical user will expect to use a single GGUF file.
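
The header side of that idea is cheap, since GGUF metadata is free-form key/value pairs. A sketch with gguf-py (the key name is invented here, and nothing in llama.cpp would read it today):

```python
# Sketch: mark a combined file via a custom KV flag. The key is
# hypothetical; reader support would still have to be written.
from gguf import GGUFWriter

writer = GGUFWriter("phi3v-combined.gguf", arch="phi3")
writer.add_bool("phi3v.has_vision_encoder", True)  # hypothetical key
# ...add_tensor() calls for both the text and the vision tensors...
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```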

@github-actions github-actions bot added the stale label Jul 28, 2024
@Aisuko
Contributor

Aisuko commented Aug 4, 2024

bad bot

@muzhig

muzhig commented Aug 4, 2024

sad but true

@github-actions github-actions bot removed the stale label Aug 6, 2024
@coder543

New release of Phi-3.5-vision-instruct today: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

(As well as a 16x3.8B MoE and an updated version of the basic Phi-3.5-mini)

@stygmate

+1 for support

@Milor123

Milor123 commented Aug 27, 2024

@coder543 And can it be converted to GGUF? And can the vision model be used?

@coder543

@Milor123 Nope… that’s why this issue exists.

@simsi-andy

simsi-andy commented Aug 27, 2024

abetlen has already converted it and is working on an experimental branch: https://huggingface.co/abetlen/Phi-3.5-vision-instruct-gguf

@daboe01
Contributor

daboe01 commented Aug 31, 2024

#9209

@ayttop

ayttop commented Sep 2, 2024

Is there code to use Phi-3.5-vision-instruct-gguf with an image locally on llama-cpp-python?
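
For the API shape only: llama-cpp-python's LLaVA-style handler is the closest existing path, sketched below (both file names are hypothetical). Phi-3.5-vision support is exactly what this issue is asking for, so this shows the intended usage, not a working recipe:

```python
# Hedged sketch of llama-cpp-python's multimodal API as it exists for
# LLaVA-style models; expect it to fail for Phi-3.5-vision on current builds.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

llm = Llama(
    model_path="Phi-3.5-vision-instruct.gguf",  # hypothetical file
    chat_handler=Llava15ChatHandler(clip_model_path="mmproj.gguf"),  # hypothetical file
    n_ctx=4096,
)
resp = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///tmp/page.png"}},
        {"type": "text", "text": "OCR this page."},
    ],
}])
print(resp["choices"][0]["message"]["content"])
```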

@github-actions github-actions bot added the stale label Oct 3, 2024
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

@gcapnias

Is the issue actually resolved, or did the bot just close it anyway?

Regards,
