VTimeLLM adopts a three-stage training strategy. Please follow the instructions below to train VTimeLLM-7B model.
-
Download clip and Vicuna v1.5 weights, and place them into the 'checkpoints' directory.
-
Download stage1 dataset from this link, and download stage2 and stage3 dataset from the Tsinghua Cloud. Place them into the 'data' directory.
- VTimeLLM
- checkpoints
- clip
- ViT-L-14.pt
- vicuna-7b-v1.5
- pytorch_model-00001-of-00002.bin
- ...
- data
- blip_laion_cc_sbu_558k.json
- stage2.json
- stage3.json
- scripts
- stage1.sh
- stage2.sh
- stage3.sh
- ...
- vtimellm
- ...
If you want to train a Chinese version, you can download the ChatGLM3-6b model and the translated Chinese dataset.
- Download the pre-extracted features from the Tsinghua Cloud.
tar -xzvf stage1.tar.gz
cat stage2_part_* > stage2.tar.gz
tar -xzvf stage2.tar.gz
tar -xzvf stage3.tar.gz
- Train in three stages sequentially, and make sure to modify '--feat_folder' in the script to the corresponding feature folder for each stage.
cd VTimeLLM
bash scripts/stage1.sh
bash scripts/stage2.sh
bash scripts/stage3.sh