This is the official repository of our papers:
- "HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics"
- "BREASE: Bridging Episodes and Semantics, A Novel Framework for Long-Form Video Understanding" (ECCVW'24).
- [2024.08.24] ⌨️ Our short paper "BREASE: Bridging Episodes and Semantics, A Novel Framework for Long-Form Video Understanding" has been accepted by the EVAL-FoMo workshop at ECCV'24.
You can install HERMES by running:

```bash
git clone https://github.com/joslefaure/HERMES.git
cd HERMES
pip install -e .
```
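If you prefer to do this inside an isolated conda environment, here is a minimal sketch (the environment name and Python version are assumptions, not pinned by this repo):

```bash
# Assumed setup: create and activate a fresh conda environment first.
# "hermes" and Python 3.9 are placeholders; check the repo's requirements.
conda create -n hermes python=3.9 -y
conda activate hermes
```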
- Download the train data (if you want to finetune HERMES) from here and the test data from here.
- Extract the frames at 10 FPS and organize them as follows (a frame-extraction sketch follows the tree):
```
data
└── moviechat
    ├── annotation
    └── frames
        └── {video_id}
            ├── frame000001.jpg
            └── ...
```
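If you need a starting point for the extraction step, here is a minimal sketch using ffmpeg; the video directory, the `.mp4` extension, and the loop itself are assumptions, so adapt them to your data:

```bash
# Hypothetical frame-extraction sketch: dump each video's frames at 10 FPS
# into data/moviechat/frames/{video_id}/. Paths and extension are placeholders.
VIDEO_DIR=path/to/raw/videos
for video in "$VIDEO_DIR"/*.mp4; do
    id=$(basename "$video" .mp4)
    mkdir -p "data/moviechat/frames/$id"
    # -vf fps=10 resamples to 10 FPS; %06d matches frame000001.jpg naming
    ffmpeg -i "$video" -vf fps=10 "data/moviechat/frames/$id/frame%06d.jpg"
done
```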
We use Vicuna-v1.1 as our pre-trained LLM weights (we report results using the 7B model only); you can download it from this link. We also load bert-base-uncased locally, so download it from there as well, and arrange everything in this format:
```
llm
├── vicuna-7b
├── vicuna-13b
└── bert-base-uncased
```
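As one possible way to fetch the BERT weights into that layout, here is a sketch using a plain git clone from the Hugging Face Hub (git-lfs is assumed to be installed; Vicuna weights must be obtained per their own release instructions):

```bash
# Assumed approach: clone bert-base-uncased from the Hugging Face Hub into llm/.
git lfs install
git clone https://huggingface.co/bert-base-uncased llm/bert-base-uncased
```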
We run inference on 4 V100 GPUs (32 GB).
First, add your OpenAI API key to the environment: `export OPENAI_API_KEY='sk-*****'`. This is only needed for the MovieChat dataset, as we use GPT-3.5 for scoring; for the other datasets, we report top-1 accuracy.
```bash
# Zero-shot
bash run_scripts/moviechat/test.sh

# Fully-supervised
bash run_scripts/moviechat/test.sh path/to/your/model.pth
```
The same applies to the other datasets; all the scripts are included in `run_scripts/`.
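For example, evaluation on another dataset follows the same pattern (`{dataset}` below is a placeholder for any dataset folder that exists under `run_scripts/`):

```bash
bash run_scripts/{dataset}/test.sh                         # zero-shot
bash run_scripts/{dataset}/test.sh path/to/your/model.pth  # fully-supervised
```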
Coming Soon
We train the model on 8 V100 GPUs (32 GB).

```bash
bash run_scripts/{dataset}/train.sh
```
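For instance, to fine-tune on MovieChat (assuming the script layout shown above):

```bash
bash run_scripts/moviechat/train.sh
```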
If you find our code or our paper useful for your research, please [★star] this repo and [cite] the following paper:
```bibtex
@misc{faure2024bridgingepisodessemanticsnovel,
    title={Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding},
    author={Gueter Josmy Faure and Jia-Fong Yeh and Min-Hung Chen and Hung-Ting Su and Winston H. Hsu and Shang-Hong Lai},
    year={2024},
    eprint={2408.17443},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2408.17443},
}
```
We thank the authors of the following repositories for open-sourcing their code.