Training VTimeLLM

VTimeLLM adopts a three-stage training strategy. Follow the instructions below to train the VTimeLLM-7B model.

Download the CLIP and Vicuna v1.5 weights and place them in the 'checkpoints' directory.
Download the stage1 dataset from this link, and download the stage2 and stage3 datasets from the Tsinghua Cloud. Place them in the 'data' directory. The resulting layout should look like this:

- VTimeLLM
    - checkpoints
        - clip
            - ViT-L-14.pt
        - vicuna-7b-v1.5
            - pytorch_model-00001-of-00002.bin
            - ...
    - data
        - blip_laion_cc_sbu_558k.json
        - stage2.json
        - stage3.json
    - scripts
        - stage1.sh
        - stage2.sh
        - stage3.sh
        - ...
    - vtimellm
    - ...
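Before starting a long training run, it can help to confirm that the files named in the tree above are in place. The following is a minimal sketch of such a check. `check_layout` is a hypothetical helper, not part of the VTimeLLM repository, and it only tests the paths listed explicitly above (entries elided with `...` are not checked).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the VTimeLLM repository): report which
# of the paths listed in the directory tree are missing under a given root.
check_layout() {
    local root="${1:-.}"
    local missing=0
    local path
    for path in \
        checkpoints/clip/ViT-L-14.pt \
        checkpoints/vicuna-7b-v1.5 \
        data/blip_laion_cc_sbu_558k.json \
        data/stage2.json \
        data/stage3.json \
        scripts/stage1.sh \
        scripts/stage2.sh \
        scripts/stage3.sh
    do
        # -e accepts both files and directories
        if [ ! -e "$root/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    # Exit status is 0 when everything was found, 1 otherwise
    return "$missing"
}
```

For example, `check_layout VTimeLLM && echo "layout looks complete"` prints one `missing:` line per absent path and the confirmation message only when none are missing.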

If you want to train a Chinese version, you can download the ChatGLM3-6b model and the translated Chinese dataset.

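Download the pre-extracted features from the Tsinghua Cloud and extract them. The stage2 archive is split into parts, so concatenate the parts before extracting: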

tar -xzvf stage1.tar.gz
cat stage2_part_* > stage2.tar.gz
tar -xzvf stage2.tar.gz
tar -xzvf stage3.tar.gz
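If one of the stage2 parts is missing or truncated, the joined archive will fail partway through extraction. The sketch below wraps the same `cat` and `tar` steps in a function that lists the archive first and only extracts if the listing succeeds. `extract_split` is a hypothetical helper name, and it assumes the parts sit in the current directory and follow the `<prefix>_part_*` naming used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join <prefix>_part_* into <prefix>.tar.gz,
# verify the result is readable, then extract it.
extract_split() {
    local prefix="$1"
    # The shell expands the glob in sorted order, which matches the part order
    cat "${prefix}"_part_* > "${prefix}.tar.gz" || return 1
    # Listing the contents fails on a truncated or corrupt archive
    tar -tzf "${prefix}.tar.gz" > /dev/null || return 1
    tar -xzf "${prefix}.tar.gz"
}
```

With the parts downloaded, `extract_split stage2` would replace the `cat` and `tar` commands for stage2.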

Train the three stages in order. Before running each stage, set '--feat_folder' in its script to the feature folder for that stage.

cd VTimeLLM
bash scripts/stage1.sh
bash scripts/stage2.sh
bash scripts/stage3.sh
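Run as separate commands, a later stage will still start even if an earlier one fails. The sketch below runs the scripts in order and stops at the first failure. It also prints each script's `--feat_folder` line beforehand so a stale path is easy to spot. `run_stages` is a hypothetical wrapper, not part of the repository, and it assumes `--feat_folder` appears literally in each stage script, as the instructions above suggest.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the given stage scripts in order and stop
# as soon as one of them exits with a non-zero status.
run_stages() {
    local stage
    for stage in "$@"; do
        # Show the configured feature folder before launching the stage
        grep -- '--feat_folder' "$stage" || echo "note: no --feat_folder in $stage"
        bash "$stage" || { echo "failed: $stage" >&2; return 1; }
    done
}
```

From inside the VTimeLLM directory, this would be called as `run_stages scripts/stage1.sh scripts/stage2.sh scripts/stage3.sh`.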