First of all, thank you very much for open-sourcing your work.
According to your paper, VTimeLLM projects the image CLS token into the LLM embedding space.
I would like to ask where this part is implemented in the code.
Looking forward to your reply.
For training, we pre-extract the cls embedding of each frame and project it using the mm_projector in the class VTimeLLMMetaModel. The relevant code can be found in model/vtimellm_arch.py. Additionally, you can refer to inference.py for the code related to extracting the cls embedding.
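To make the data flow concrete, here is a minimal, dependency-free sketch of the idea: one CLS feature vector per frame is mapped by a projector into the LLM embedding dimension, giving one token embedding per frame. It assumes the projector is a single linear layer, which this thread does not confirm. The names `make_linear` and `project` and the toy dimensions are hypothetical and are not taken from model/vtimellm_arch.py; the actual code operates on PyTorch tensors.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Return (weight, bias) for a hypothetical linear projector."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(frame_features, weight, bias):
    """Apply y = W x + b to each frame's CLS feature vector."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b
         for row, b in zip(weight, bias)]
        for feat in frame_features
    ]


# Toy sizes for illustration only (assumed, not the repository's real dimensions).
NUM_FRAMES, VISION_DIM, LLM_DIM = 4, 8, 16

# Stand-in for the pre-extracted per-frame CLS embeddings.
rng = random.Random(1)
cls_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)]
                for _ in range(NUM_FRAMES)]

weight, bias = make_linear(VISION_DIM, LLM_DIM)
frame_tokens = project(cls_features, weight, bias)

# One projected embedding per frame, each in the LLM embedding dimension.
print(len(frame_tokens), len(frame_tokens[0]))  # 4 16
```

The projected vectors are what would be placed into the LLM input sequence alongside the text token embeddings.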