Pyramid Vision Transformer (PVT)

Introduction

This directory contains the configs and results of PVTv2. You can find more examples in the original repository.

Please consider using mmdet's configs when training new models.

Results and Models

Method              Backbone     Pretrain     Lr schd  Aug  box AP  mask AP  Config  Download
ATSS                PVTv2-B2-Li  ImageNet-1K  3x       Yes  48.9    -        config  log & model
ATSS                PVTv2-B2     ImageNet-1K  3x       Yes  49.9    -        config  log & model
GFL                 PVTv2-B2-Li  ImageNet-1K  3x       Yes  49.2    -        config  log & model
GFL                 PVTv2-B2     ImageNet-1K  3x       Yes  50.2    -        config  log & model
Sparse R-CNN        PVTv2-B2-Li  ImageNet-1K  3x       Yes  48.9    -        config  log & model
Sparse R-CNN        PVTv2-B2     ImageNet-1K  3x       Yes  50.1    -        config  log & model
Cascade Mask R-CNN  PVTv2-B2-Li  ImageNet-1K  3x       Yes  50.9    44.0     config  log & model
Cascade Mask R-CNN  PVTv2-B2     ImageNet-1K  3x       Yes  51.1    44.4     config  log & model
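Each detector appears with both backbone variants, so one quick way to read the table is as a per-method box AP gain of PVTv2-B2 over PVTv2-B2-Li. A small sketch; the numbers are copied from the table above, and `b2_gain` is a hypothetical helper, not part of this repo:

```python
# Box AP values copied from the results table: (PVTv2-B2-Li, PVTv2-B2), 3x schedule.
BOX_AP = {
    "ATSS": (48.9, 49.9),
    "GFL": (49.2, 50.2),
    "Sparse R-CNN": (48.9, 50.1),
    "Cascade Mask R-CNN": (50.9, 51.1),
}


def b2_gain(table):
    """Hypothetical helper: box AP gain of PVTv2-B2 over PVTv2-B2-Li per method.

    Results are rounded to one decimal place to hide floating-point noise.
    """
    return {method: round(b2 - li, 1) for method, (li, b2) in table.items()}


if __name__ == "__main__":
    for method, gain in b2_gain(BOX_AP).items():
        print(f"{method}: {gain:+.1f} box AP")
```

For these rows the gain is 1.0 (ATSS), 1.0 (GFL), 1.2 (Sparse R-CNN) and 0.2 (Cascade Mask R-CNN) box AP; for Cascade Mask R-CNN, mask AP also moves from 44.0 to 44.4.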

Usage

Mixed Precision Training

The current configs use mixed precision training via MMCV by default. Please install PyTorch >= 1.6.0 to use torch.cuda.amp.
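Version strings compare incorrectly as plain text ("1.10.0" sorts before "1.6.0"), so a check for the PyTorch >= 1.6.0 requirement should compare numeric components. A minimal sketch; `supports_torch_amp` is a hypothetical helper, not an MMCV or PyTorch API:

```python
import re


def supports_torch_amp(version, minimum=(1, 6, 0)):
    """Hypothetical helper: True if a PyTorch version string is at least `minimum`.

    Compares numeric components, so "1.10.0" is correctly treated as newer than "1.6.0".
    """
    # Drop local build tags such as "+cu113".
    public = version.split("+", 1)[0]
    # Keep the leading integer of each dotted component; pre-release suffixes
    # are ignored, so "0a0" or "0rc1" both count as 0.
    parts = []
    for piece in public.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "2.1" to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts) >= tuple(minimum)
```

In a real environment you would call it as `supports_torch_amp(torch.__version__)`. Because pre-release suffixes are ignored, `1.6.0a0` is treated as 1.6.0; `packaging.version.Version` is a stricter alternative if the `packaging` package is installed.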

If you observe a performance difference compared with apex (used by the original authors), please raise an issue. Otherwise, we will clean up the apex-related code.

Using apex (optional)

To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user
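After installing, one way to confirm that the build produced its compiled extensions is to look them up without importing apex. A sketch using only the standard library; the module names `apex_C` (from `--cpp_ext`) and `amp_C` (from `--cuda_ext`) are our reading of apex's setup.py and are an assumption that may not hold for every apex version:

```python
from importlib.util import find_spec

# Assumed module names: apex_C is built by --cpp_ext and amp_C by --cuda_ext
# in apex's setup.py; check your apex version if these report missing unexpectedly.
APEX_MODULES = ("apex", "apex_C", "amp_C")


def apex_build_status(modules=APEX_MODULES):
    """Map each top-level module name to whether Python can locate it (nothing is imported)."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in apex_build_status().items():
        print(f"{name}: {'found' if found else 'missing'}")
```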

Modify configs with the following code:

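# 36 epochs corresponds to the 3x schedule used in the results table
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
# Disable MMCV's built-in mixed precision so the apex hook below is used instead
fp16 = None
optimizer_config = dict(
    type='ApexOptimizerHook',
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,  # let apex handle mixed precision
)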
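The `fp16 = None` line matters: as we understand mmdet 2.x's `train_detector`, a non-None `fp16` entry makes it build an `Fp16OptimizerHook` from `optimizer_config`, so `ApexOptimizerHook` would never be used. The sketch below models that selection on plain dicts; `apply_apex_overrides` and `chosen_optimizer_hook` are hypothetical helpers, and the base config is illustrative rather than one of this directory's configs:

```python
import copy

# The apex settings from the config snippet, as plain dicts.
APEX_OVERRIDES = {
    "runner": dict(type="EpochBasedRunnerAmp", max_epochs=36),
    "fp16": None,
    "optimizer_config": dict(
        type="ApexOptimizerHook",
        update_interval=1,
        grad_clip=None,
        coalesce=True,
        bucket_size_mb=-1,
        use_fp16=True,
    ),
}


def apply_apex_overrides(cfg):
    """Hypothetical helper: return a copy of `cfg` with the apex settings applied."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg.update(copy.deepcopy(APEX_OVERRIDES))
    return new_cfg


def chosen_optimizer_hook(cfg):
    """Simplified model (an assumption) of mmdet 2.x's optimizer hook selection."""
    if cfg.get("fp16") is not None:
        # MMCV's native mixed precision wraps optimizer_config and takes precedence.
        return "Fp16OptimizerHook"
    # Otherwise the hook named in optimizer_config is built (OptimizerHook if unnamed).
    return cfg["optimizer_config"].get("type", "OptimizerHook")


if __name__ == "__main__":
    # Illustrative base config using MMCV's native fp16 (not taken from this repo).
    base = {
        "runner": dict(type="EpochBasedRunner", max_epochs=36),
        "fp16": dict(loss_scale=512.0),
        "optimizer_config": dict(grad_clip=None),
    }
    print(chosen_optimizer_hook(base))                        # Fp16OptimizerHook
    print(chosen_optimizer_hook(apply_apex_overrides(base)))  # ApexOptimizerHook
```

Under this model, changing only `optimizer_config` while leaving `fp16` set would still select `Fp16OptimizerHook`, which is why the snippet sets both.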

Citation

PVTv1

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

PVTv2

@misc{wang2021pvtv2,
      title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2106.13797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}