By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li.
This repo is the official PyTorch implementation of FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.
- Python >= 3.6
- PyTorch >= 1.0 and the corresponding torchvision (https://pytorch.org/)
- Clone this repo:
git clone https://github.com/ruiliu-ai/FuseFormer.git
- Install other packages:
cd FuseFormer
pip install -r requirements.txt
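Before training or testing, it can help to confirm that the environment meets the prerequisites listed above. The following is a minimal sketch and not part of this repo: `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the parsing assumes a version string that starts with `major.minor` (for example `1.9.0+cu111`).

```python
import re
import sys


def parse_version(text):
    """Extract the leading (major, minor) pair from a version string."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(text))
    return int(match.group(1)), int(match.group(2))


def meets_minimum(found, required):
    """Return True if the (major, minor) tuple `found` is at least `required`."""
    return tuple(found) >= tuple(required)


def check_environment():
    """Report whether Python >= 3.6 and PyTorch >= 1.0 are available."""
    py_found = tuple(sys.version_info[:2])
    py_ok = meets_minimum(py_found, (3, 6))
    print("Python {}.{}: {}".format(py_found[0], py_found[1], "ok" if py_ok else "too old"))
    try:
        import torch  # imported lazily so the check itself runs without PyTorch
    except ImportError:
        print("PyTorch: not installed")
        return False
    torch_ok = meets_minimum(parse_version(torch.__version__), (1, 0))
    print("PyTorch {}: {}".format(torch.__version__, "ok" if torch_ok else "too old"))
    return py_ok and torch_ok
```

Calling `check_environment()` from a Python shell prints one line per requirement and returns `True` only when both are satisfied. It does not check torchvision or the packages in requirements.txt.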
Create the data folder and download the datasets (YouTube-VOS and DAVIS) into it.
mkdir data
To train a model with the YouTube-VOS config, run:
python train.py -c configs/youtube-vos.json
Download the pre-trained model into the checkpoints folder.
mkdir checkpoints
python test.py -c checkpoints/fuseformer.pth -v data/DAVIS/JPEGImages/blackswan -m data/DAVIS/Annotations/blackswan
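The test command above takes a folder of video frames (`-v`) and a folder of masks (`-m`). A mismatch between the two is an easy mistake to make, so the sketch below checks that every frame has a mask with the same file stem before the test script is run. It is an illustration under stated assumptions, not code from this repo: `pair_frames_and_masks` is a hypothetical helper, and the `.jpg` frame and `.png` mask extensions follow the usual DAVIS layout. Whether `test.py` itself requires matching names is not stated here.

```python
from pathlib import Path


def pair_frames_and_masks(video_dir, mask_dir, frame_ext=".jpg", mask_ext=".png"):
    """Return sorted (frame, mask) path pairs, matched by file stem.

    Assumes frames are *.jpg and masks are *.png with identical stems
    (e.g. 00000.jpg and 00000.png); adjust the extensions if your data differs.
    """
    frames = {p.stem: p for p in Path(video_dir).glob("*" + frame_ext)}
    masks = {p.stem: p for p in Path(mask_dir).glob("*" + mask_ext)}

    # Stems present on one side only indicate an incomplete sequence.
    missing_masks = sorted(set(frames) - set(masks))
    missing_frames = sorted(set(masks) - set(frames))
    if missing_masks or missing_frames:
        raise ValueError(
            "frames without masks: {}; masks without frames: {}".format(
                missing_masks, missing_frames
            )
        )
    return [(frames[stem], masks[stem]) for stem in sorted(frames)]
```

For the example sequence, `pair_frames_and_masks("data/DAVIS/JPEGImages/blackswan", "data/DAVIS/Annotations/blackswan")` returns the matched pairs in frame order, or raises `ValueError` naming the unmatched stems.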
If you find FuseFormer useful in your research, please consider citing:
@InProceedings{Liu_2021_FuseFormer,
title={FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting},
author={Liu, Rui and Deng, Hanming and Huang, Yangyi and Shi, Xiaoyu and Lu, Lewei and Sun, Wenxiu and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
booktitle={International Conference on Computer Vision (ICCV)},
year={2021}
}
This code borrows heavily from the video inpainting framework Spatial-Temporal Transformer Network (STTN).