This is the official fine-tuning implementation of our video object segmentation (VOS) approach, DropSeg, for the CVPR 2023 paper DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.
* Thanks to the great STCN library, which helped us quickly implement the DropMAE VOS fine-tuning. This repository mainly follows the STCN repository.
* The proposed DropSeg uses pairs of frames for offline VOS training and achieves SOTA results on existing VOS benchmarks with one-shot evaluation.
Anaconda is used to create the Python environment. The installation mainly follows DropMAE and partially follows STCN. The full list of packages can be found in environment.yaml.
We follow the same data preparation steps used in STCN. Download both the DAVIS and YouTube-19 datasets and organize them as follows:
├── DAVIS
│   ├── 2016
│   │   ├── Annotations
│   │   └── ...
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── YouTube
│   ├── all_frames
│   │   └── valid_all_frames
│   ├── train
│   ├── train_480p
│   └── valid
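Before training, it can help to confirm that the datasets are laid out as in the tree above. The following is a minimal sketch, not part of this repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the list only covers the directories named explicitly in the tree (entries shown as `...` are not checked).

```python
from pathlib import Path

# Directories named explicitly in the tree above, relative to the data root.
# Entries elided as "..." in the tree are intentionally not listed here.
EXPECTED_DIRS = [
    "DAVIS/2016/Annotations",
    "DAVIS/2017/test-dev/Annotations",
    "DAVIS/2017/trainval/Annotations",
    "YouTube/all_frames/valid_all_frames",
    "YouTube/train",
    "YouTube/train_480p",
    "YouTube/valid",
]


def missing_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs` on the folder that contains `DAVIS` and `YouTube` returns an empty list when every listed directory is present. Where that folder should sit relative to this repository is not checked here.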
Download a pre-trained DropMAE model from DropMAE (e.g., K700-800E), then start fine-tuning with:
python -m torch.distributed.launch --master_port 9842 --nproc_per_node=8 train_dropseg.py --pretrained_net_path pretrained_model_path --id retrain_s03 --stage 3
Here, --pretrained_net_path is the path to your downloaded pre-trained model.
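If you launch training from a script, for example to vary the number of GPUs, the command above can be assembled programmatically. This is a small sketch under stated assumptions: the helper `build_train_command` is hypothetical and not part of this repository, and it only reuses the flags shown in the command above with the same default values. Whether the training script needs other settings adjusted for a different GPU count is not covered here.

```python
import shlex


def build_train_command(pretrained_net_path, nproc_per_node=8,
                        master_port=9842, run_id="retrain_s03", stage=3):
    """Assemble the training command as an argument list.

    Only the flags from the README command are used; the defaults mirror it.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        f"--nproc_per_node={nproc_per_node}",
        "train_dropseg.py",
        "--pretrained_net_path", str(pretrained_net_path),
        "--id", run_id,
        "--stage", str(stage),
    ]


# Print the command for inspection; pass the list to subprocess.run to execute it.
cmd = build_train_command("pretrained_model_path")
print(shlex.join(cmd))
```

With the default arguments, the printed line reproduces the command given above.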
Download the DropSeg model here, and run the evaluation with the following commands. All evaluations are done at 480p resolution.
python submit_eval_davis17.py --davis_path path_to_davis17_dataset
python submit_eval_davis16.py --davis_path path_to_davis16_dataset
After running the above evaluation, the qualitative results are saved in the root project directory. You can use the offline evaluation toolkit (https://github.com/davisvideochallenge/davis2017-evaluation) to get the validation performance on DAVIS-16/17. For test-dev on DAVIS-17, use the online evaluation server instead.
- Thanks to the STCN library for the convenient implementation.
If our work is useful for your research, please consider citing:
@inproceedings{dropmae2023,
title={DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks},
author={Qiangqiang Wu and Tianyu Yang and Ziquan Liu and Baoyuan Wu and Ying Shan and Antoni B. Chan},
booktitle={CVPR},
year={2023}
}