About Video Feature Project #35

Open
xiaokj37 opened this issue Aug 1, 2024 · 1 comment

Comments

@xiaokj37

xiaokj37 commented Aug 1, 2024

First of all, thank you very much for open-sourcing your work.
According to your paper, VTimeLLM projects the image cls token into the LLM embedding space.
I would like to ask where this part is implemented in the code.
Looking forward to your reply.

@huangb23
Owner

For training, we pre-extract the cls embedding of each frame and project it using the mm_projector in the class VTimeLLMMetaModel. The relevant code can be found in model/vtimellm_arch.py. Additionally, you can refer to inference.py for the code related to extracting the cls embedding.
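
For readers who want a picture of the data flow before opening the repository, here is a minimal sketch of the projection step described above. It is not the code from model/vtimellm_arch.py: the class name `FrameProjector`, the choice of a single linear layer, and all three sizes (frame count, feature width, LLM hidden width) are assumptions made for illustration. It only shows pre-extracted per-frame cls embeddings being mapped into the LLM embedding dimension.

```python
import torch
import torch.nn as nn

# Placeholder sizes chosen for illustration only; check the repository
# config for the values VTimeLLM actually uses.
NUM_FRAMES = 100   # frames sampled per video (assumption)
FEAT_DIM = 768     # width of each pre-extracted cls embedding (assumption)
LLM_DIM = 4096     # hidden size of the language model (assumption)


class FrameProjector(nn.Module):
    """Hypothetical stand-in for mm_projector, assumed to be one linear layer."""

    def __init__(self, feat_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim), one cls embedding per frame.
        # nn.Linear acts on the last dimension, so every frame is projected
        # independently into the LLM embedding space.
        return self.proj(frame_feats)


torch.manual_seed(0)
# Random tensors stand in for cls embeddings that were extracted offline.
frame_feats = torch.randn(2, NUM_FRAMES, FEAT_DIM)
projector = FrameProjector(FEAT_DIM, LLM_DIM)
video_tokens = projector(frame_feats)
print(video_tokens.shape)  # torch.Size([2, 100, 4096])
```

In this sketch each frame becomes one token-sized vector, so a video contributes `NUM_FRAMES` embeddings that can be placed alongside the text token embeddings.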
