Always have same response #21
Comments
How did you get the tokenizer? Regarding your problem, I think it may be because you are using model.llm, which is just the LLaMA part; in that case the Whisper and CLIP parts are not used. From what I understand, we should run the model through its full forward pass rather than through model.llm alone.
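To make the point above concrete, here is a minimal toy sketch (pure Python, hypothetical class and method names, not the project's actual API) of why calling the LLM submodule directly drops the image: only the full model's generate path routes the image through the vision encoder before the LLM ever sees a token.

```python
class ToyLLM:
    """Stand-in for the LLaMA part: it only sees the tokens it is given."""
    def generate(self, tokens):
        return f"response conditioned on {len(tokens)} tokens"

class ToyMultimodalModel:
    """Stand-in for the full model: vision encoder + projection + LLM."""
    def __init__(self):
        self.llm = ToyLLM()

    def encode_image(self, image):
        # Placeholder for the CLIP/Whisper encoders plus projection layer.
        return [f"<img:{p}>" for p in image]

    def generate(self, text_tokens, image=None):
        tokens = list(text_tokens)
        if image is not None:
            # Prepend projected image tokens so the LLM can attend to them.
            tokens = self.encode_image(image) + tokens
        return self.llm.generate(tokens)

model = ToyMultimodalModel()
prompt = ["What", "is", "in", "the", "picture", "?"]
image = [1, 2, 3]

print(model.generate(prompt, image=image))  # image tokens reach the LLM (9 tokens)
print(model.llm.generate(prompt))           # image silently dropped (6 tokens)
```

Calling `model.llm.generate` bypasses `encode_image` entirely, so the text-only behavior described in this thread is exactly what that toy model would produce.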
Hi, thanks for sharing the information. We are currently checking it.
Hi @chatsci, see Lines 466 to 489 in d03e59d and Lines 952 to 963 in d03e59d.
I call the functions inside those snippets, and I'm pretty sure that the input tokens for the LLM contain image tokens. While conducting tests, I noticed that the model appears to disregard the image input and generates responses based only on the text portion.
Hi, thanks for sharing this information with us. I think the possible reason could be an incompatibility issue within the code. As I'm currently traveling, I will look into it as soon as my travel is finished. Would you mind sending the code you used to my email, [email protected], so I can take a look?
Hey @lyuchenyang, I have been experiencing the same issue during inference. Are there any updates on this? Thank you.
Hi, I have loaded your pre-trained weights and tried some instructions. However, I found that the model responds with the same answer no matter what image I give it. For the same prompt, the model always replies:

There are 5000 in the picture.

It seems the model simply ignores any multi-modal inputs and replies based on the text alone. Did I do anything wrong? Thank you.
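A quick way to confirm this symptom is to hold the prompt fixed, vary only the image, and check whether the output changes. The sketch below is a hypothetical helper (the stand-in `generate` lambdas are not the project's API; the real call would be the model's own generate function, with deterministic decoding assumed):

```python
def image_is_ignored(generate_fn, prompt, image_a, image_b, n_trials=3):
    """Return True if outputs are identical across two different images,
    which suggests the vision path is not influencing generation."""
    outs_a = [generate_fn(prompt, image_a) for _ in range(n_trials)]
    outs_b = [generate_fn(prompt, image_b) for _ in range(n_trials)]
    # All trials and both images collapse to one single output string.
    return set(outs_a) == set(outs_b) == {outs_a[0]}

# Demo with stand-in generators illustrating the two cases:
broken = lambda prompt, image: "There are 5000 in the picture."
working = lambda prompt, image: f"I see {sum(image)} objects."

print(image_is_ignored(broken, "What is in the picture?", [1, 2], [9, 9]))
print(image_is_ignored(working, "What is in the picture?", [1, 2], [9, 9]))
```

If the check returns True on the real model, the image embeddings are either never injected into the LLM input or are being overwritten somewhere along the pipeline, which matches the behavior reported in this thread.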