About Video Feature Project #35

xiaokj37 · 2024-08-01T03:25:44Z

First of all, thank you very much for open-sourcing your work.
According to your paper, VTimeLLM projects the CLS token of each image into the LLM embedding space.
I would like to ask where this part is implemented in the code.
Looking forward to your reply.

huangb23 · 2024-08-21T14:51:34Z

For training, we pre-extract the cls embedding of each frame and project it using the mm_projector in the class VTimeLLMMetaModel. The relevant code can be found in model/vtimellm_arch.py. Additionally, you can refer to inference.py for the code related to extracting the cls embedding.
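
As a rough illustration of the projection step described above, here is a minimal sketch, not the repository's actual code. It assumes the projector is a single linear layer; the class name FrameProjector, the frame count, and both dimensions are hypothetical placeholders, and a random tensor stands in for the pre-extracted per-frame cls embeddings. See model/vtimellm_arch.py for the real mm_projector.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions); the real values depend on the
# visual encoder and the LLM that are actually used.
NUM_FRAMES = 100       # assumed number of sampled frames per video
CLS_DIM = 768          # assumed width of each frame's cls embedding
LLM_HIDDEN_DIM = 4096  # assumed width of the LLM token embeddings


class FrameProjector(nn.Module):
    """Hypothetical stand-in for an mm_projector-style layer."""

    def __init__(self, in_dim: int, out_dim: int) -> None:
        super().__init__()
        # Assumption: a single linear map from visual space to LLM space.
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, frame_cls: torch.Tensor) -> torch.Tensor:
        # (batch, num_frames, in_dim) -> (batch, num_frames, out_dim)
        return self.proj(frame_cls)


torch.manual_seed(0)
# Random tensor standing in for pre-extracted per-frame cls embeddings.
frame_cls = torch.randn(1, NUM_FRAMES, CLS_DIM)
projector = FrameProjector(CLS_DIM, LLM_HIDDEN_DIM)
video_tokens = projector(frame_cls)
print(video_tokens.shape)
```

Each frame ends up as one vector of the LLM's embedding width, so a video becomes a short sequence of frame tokens that can be placed alongside the text token embeddings.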
