This directory contains the configs and results of PVTv2. More examples can be found in the original repository. Please consider using MMDetection's configs when training new models.
| Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP | Config | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| ATSS | PVTv2-B2-Li | ImageNet-1K | 3x | Yes | 48.9 | - | config | log & model |
| ATSS | PVTv2-B2 | ImageNet-1K | 3x | Yes | 49.9 | - | config | log & model |
| GFL | PVTv2-B2-Li | ImageNet-1K | 3x | Yes | 49.2 | - | config | log & model |
| GFL | PVTv2-B2 | ImageNet-1K | 3x | Yes | 50.2 | - | config | log & model |
| Sparse R-CNN | PVTv2-B2-Li | ImageNet-1K | 3x | Yes | 48.9 | - | config | log & model |
| Sparse R-CNN | PVTv2-B2 | ImageNet-1K | 3x | Yes | 50.1 | - | config | log & model |
| Cascade Mask R-CNN | PVTv2-B2-Li | ImageNet-1K | 3x | Yes | 50.9 | 44.0 | config | log & model |
| Cascade Mask R-CNN | PVTv2-B2 | ImageNet-1K | 3x | Yes | 51.1 | 44.4 | config | log & model |
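To try one of the released models, a single image can be run through MMDetection's high-level inference API. The snippet below is a minimal sketch: both file paths are hypothetical placeholders for the config and checkpoint linked in the table above.

```python
# Minimal single-image inference sketch using MMDetection's high-level API.
# NOTE: both paths are hypothetical placeholders; substitute the actual
# config file and the checkpoint downloaded from the table above.
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/pvtv2/atss_pvtv2_b2_fpn_3x_coco.py'  # hypothetical path
checkpoint_file = 'checkpoints/atss_pvtv2_b2.pth'           # hypothetical path

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')  # per-class detection results
```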
The current configs use mixed-precision training via MMCV by default. Please install PyTorch >= 1.6.0 to use `torch.cuda.amp`. If you observe a performance difference from apex (which the original authors used), please raise an issue; otherwise, we will remove the apex code path.
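For reference, MMCV-based mixed precision is typically switched on through a top-level `fp16` field in the config, from which the fp16 optimizer hook is built. A minimal sketch (the static loss scale of 512.0 is an assumed, commonly used value; newer MMCV versions also accept `loss_scale='dynamic'`):

```python
# Enable MMCV-based mixed-precision training in a config (minimal sketch).
# loss_scale=512.0 is an assumed, commonly used static scale; newer MMCV
# versions also accept loss_scale='dynamic' for dynamic loss scaling.
fp16 = dict(loss_scale=512.0)
```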
<details>
<summary>Click me to use apex</summary>

To install apex, run:

```shell
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user
```
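If the build succeeded, apex's `amp` module should import cleanly; the one-liner below is a quick sanity check (not part of the original instructions):

```python
# Raises ImportError if apex was not installed correctly.
from apex import amp  # noqa: F401
```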
Then modify the configs as follows:

```python
# Switch from MMCV's native amp to the apex-based runner and optimizer hook.
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)  # 36 epochs = 3x schedule
fp16 = None  # disable MMCV's torch.cuda.amp path
optimizer_config = dict(
    type='ApexOptimizerHook',
    update_interval=1,  # optimizer step every iteration (no gradient accumulation)
    grad_clip=None,
    coalesce=True,      # gradient all-reduce bucketing options
    bucket_size_mb=-1,
    use_fp16=True,
)
```

</details>
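Note that `EpochBasedRunnerAmp` and `ApexOptimizerHook` are not upstream MMCV classes; they are expected to be provided by this repository's custom extensions, so make sure those are importable before launching training.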
PVTv1

```bibtex
@misc{wang2021pyramid,
    title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
    author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
    year={2021},
    eprint={2102.12122},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

PVTv2

```bibtex
@misc{wang2021pvtv2,
    title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
    author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
    year={2021},
    eprint={2106.13797},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```