Question for "llava-hf/llava-1.5-7b-hf" #1

Open
sev777 opened this issue Jan 6, 2025 · 3 comments

sev777 commented Jan 6, 2025

Hi, I am trying to run `contribution_visual_reps.py`, but I cannot get past `inpt = vllm.get_llm_input_embeds([prompt], [img])` because of the following error:
```python
# 5. Fill the embeddings corresponding to the images. Anything that is still zeros needs filling
image_to_overwrite = torch.all(final_embedding == 0, dim=-1)
image_to_overwrite &= image_to_overwrite.cumsum(-1) - 1 >= nb_image_pad[:, None].to(target_device)

if image_to_overwrite.sum() != image_features.shape[:-1].numel():
    raise ValueError(
        f"The input provided to the model are wrong. The number of image tokens is {torch.sum(special_image_token_mask)} while"
        f" the number of image given to the model is {num_images}. This prevents correct indexing and breaks batch generation."
    )
```

And when I inspect the values: `image_to_overwrite.sum()` gives `tensor(331776, device='cuda:0')`, while `image_features.shape[:-1].numel()` is `576`.

Did you run into the same issue? Also, which version of transformers are you using? Mine is 4.46.2.
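
For reference, here is a small diagnostic sketch (the prompt string and the dummy image are only illustrative) that prints the installed transformers version and counts how many `<image>` placeholder tokens the processor actually emits. LLaVA-1.5 produces 24 × 24 = 576 patch features per image, and 331776 = 576 × 576, so the mismatch above looks like the placeholder being expanded a second time somewhere in the pipeline, a behaviour that changed between transformers releases.

```python
# Diagnostic sketch: check the transformers version and how many <image>
# tokens the processor emits for a single image. The prompt text and the
# dummy 336x336 image are placeholders for illustration only.
import transformers
from PIL import Image
from transformers import AutoProcessor

print("transformers:", transformers.__version__)  # 4.43.0 vs 4.46.2 behave differently here

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
img = Image.new("RGB", (336, 336))
inputs = processor(
    text="USER: <image>\nDescribe the image. ASSISTANT:",
    images=img,
    return_tensors="pt",
)

image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
n_image_tokens = int((inputs.input_ids == image_token_id).sum())
# Older processors keep a single <image> token and let the model expand it to
# 576 patch positions; newer ones expand it to 576 tokens up front. If both
# expansions happen, the counts no longer match image_features (576 per image).
print("image tokens in input_ids:", n_image_tokens)
```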

qizhou000 (Owner) commented

Hello! My Version: 4.43.0

sev777 commented Jan 6, 2025

> Hello! My Version: 4.43.0

Thanks. And what is the shape of `image_to_overwrite`? I get a shape of 1×331776, which looks incorrect.

sev777 closed this as completed Jan 6, 2025
sev777 reopened this Jan 6, 2025
qizhou000 (Owner) commented

I wrap all VLLMs in a wrapper class based on `BaseVLLMForEdit` to share the various functions required for VLLM editing. The `get_llm_input_embeds` function needs to be implemented specifically for each VLLM, taking `[prompt]` and `[img]` as inputs and outputting the corresponding embeddings, which can then be fed directly into the language transformer of the VLLM. In `contribution_visual_reps.py`, I used LLaVA as an example. Therefore, I suggest you debug the `get_llm_input_embeds` function I wrote for LLaVA, which is located in `editor/vllms_for_edit/llava/llava.py`.
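
For anyone debugging this, below is a rough, self-contained sketch of what a `get_llm_input_embeds` for `llava-hf/llava-1.5-7b-hf` can look like. It is not the actual code in `editor/vllms_for_edit/llava/llava.py`: the class name `LlavaSketch`, the return values, and the merging strategy (a plain masked scatter of projected patch features into the `<image>` positions) are assumptions for illustration, and whether the `<image>` placeholder arrives pre-expanded depends on the installed transformers version.

```python
# Illustrative sketch only; names and return format are assumptions, not the
# repo's BaseVLLMForEdit interface.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration


class LlavaSketch:
    def __init__(self, name="llava-hf/llava-1.5-7b-hf", device="cuda"):
        self.device = device
        self.processor = AutoProcessor.from_pretrained(name)
        self.model = LlavaForConditionalGeneration.from_pretrained(
            name, torch_dtype=torch.float16
        ).to(device)

    @torch.no_grad()
    def get_llm_input_embeds(self, prompts, imgs):
        cfg = self.model.config
        inputs = self.processor(text=prompts, images=imgs, return_tensors="pt").to(self.device)

        # Text token embeddings from the language model's embedding table.
        inputs_embeds = self.model.get_input_embeddings()(inputs.input_ids)

        # Vision features: CLIP hidden states -> select layer, drop CLS ->
        # project into the LLM's hidden size (576 patch embeddings per image
        # for LLaVA-1.5 at 336x336 resolution with 14x14 patches).
        vision_out = self.model.vision_tower(inputs.pixel_values, output_hidden_states=True)
        feats = vision_out.hidden_states[cfg.vision_feature_layer]
        if cfg.vision_feature_select_strategy == "default":
            feats = feats[:, 1:]
        image_features = self.model.multi_modal_projector(feats)

        # Overwrite the <image> placeholder positions with the projected
        # visual features. This assumes the processor has already expanded
        # each <image> placeholder into one token per patch (newer processors
        # do; older ones keep a single token and expand it inside the model).
        image_mask = (inputs.input_ids == cfg.image_token_index).unsqueeze(-1)
        inputs_embeds = inputs_embeds.masked_scatter(
            image_mask, image_features.to(inputs_embeds.dtype)
        )
        return inputs_embeds, inputs.attention_mask
```

The version-sensitive part is exactly who expands the placeholder: if both the processor and the model's legacy merge path try to do it, the number of image-token positions no longer matches the 576 patch features per image, which is consistent with the `ValueError` quoted above.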
