Adapted from XiYe20/STDiffProject (paper: arXiv link)
- Install the custom diffusers library
git clone https://github.com/XiYe20/CustomDiffusers.git
cd CustomDiffusers
pip install -e .
- Install the requirements
pip install -r requirements.txt
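Optionally, verify that the editable install of the custom diffusers fork is the one picked up by Python (this only assumes the fork keeps the standard diffusers package name):
# print the version and install location of the imported diffusers package
python -c "import diffusers; print(diffusers.__version__, diffusers.__file__)"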
The dataset consists of 13000 video clips of 22 frames each in the unlabeled folder. The val and train folders are used for inference. The dataset is available here: https://drive.google.com/file/d/1iYTFuf4DgxgYQzTQ_2da1vC9es_niPRr/view?usp=drive_link (a quick sanity check of the extracted layout is sketched after the folder structure below).
Folder Structure
unlabeled/
    video_02000/
        image_0.png
        image_1.png
        ...
        image_21.png
    video_02001/
        ...
    video_...
train/
    ...
val/
    ...
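After extracting the archive, the layout can be sanity-checked from the dataset root; the counts below follow the numbers stated above, and video_02000 is simply the example clip shown in the tree:
# expected: 13000 clip folders in the unlabeled split
ls unlabeled | wc -l
# expected: 22 frames per clip (image_0.png ... image_21.png)
ls unlabeled/video_02000/*.png | wc -l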
Note that there are also masks.npy files, which are intended for segmentation.
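If you want to look at the masks, their exact location is not listed above, so locate one first and then inspect it (the path in the second command is a placeholder):
# list a few of the segmentation mask files
find . -name "masks.npy" | head -n 3
# inspect the array shape of one mask file (replace the placeholder path)
python -c "import numpy as np; print(np.load('PATH/TO/masks.npy').shape)"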
Similar to the STDiff project, Hugging Face Accelerate is used for training. The configuration files are located inside stdiff/configs.
- Check train.sh: set the visible GPUs and num_processes, and modify the config.yaml file (a sketch of a typical train.sh is shown after the training command below)
- Training
. ./train.sh
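For reference, a typical train.sh wrapping Accelerate might look like the sketch below; the training script path, the --config flag, and the config file name are assumptions, so take the real ones from the repository:
# make two GPUs visible to the job (adjust to your machine)
export CUDA_VISIBLE_DEVICES=0,1
# one process per GPU; script path and --config flag are assumptions, check the repo
accelerate launch --multi_gpu --num_processes 2 stdiff/train.py --config stdiff/configs/config.yaml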
- Check inference.sh and modify the config.yaml file for inference (a sketch is shown after the test command below)
- Test
. ./inference.sh
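Similarly, inference.sh is expected to wrap an accelerate launch call that uses the settings from the inference config.yaml; the script name below is an assumption:
# single-GPU inference; restrict GPU visibility accordingly
export CUDA_VISIBLE_DEVICES=0
# launch the inference/test script through Accelerate (script path is an assumption)
accelerate launch --num_processes 1 stdiff/inference.py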