F-ViT: Build Open-Vocabulary Object Detectors Upon Frozen CLIP ViTs

Requirements

The detection framework is built upon MMDetection 2.x. To install MMCV v1.7.0 and MMDetection v2.28.1 from source, run:

cd ~/your/project/directory
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.7.0
MMCV_WITH_OPS=1 pip install -e . -v
cd ..
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v2.28.1
pip install -e . -v

For other installation methods, please refer to the official websites of MMCV and MMDetection.
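As an optional sanity check, you can confirm that the expected versions are picked up:

python -c "import mmcv, mmdet; print(mmcv.__version__, mmdet.__version__)"
# expected output: 1.7.0 2.28.1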

Data Preparation

The main experiments are conducted on the COCO and LVIS datasets. We also perform transfer evaluation on Objects365v1. Please prepare the datasets and organize them as follows:

CLIPSelf/F-ViT
├── data         # use a soft link to save disk storage
    ├── coco
        ├── annotations
            ├── instances_val2017.json       # for transfer evaluation
        ├── train2017
        ├── val2017
        ├── zero-shot         # obtain these files from the Drive
            ├── instances_val2017_all_2.json
            ├── instances_train2017_seen_2_65_cat.json
    ├── lvis_v1
        ├── annotations
            ├── lvis_v1_train_seen_1203_cat.json  # obtain this file from the Drive
            ├── lvis_v1_val.json
        ├── train2017    # the same images as COCO
        ├── val2017      # the same images as COCO
    ├── Objects365v1
        ├── objects365_reorder_val.json         # obtain this file from the Drive
        ├── val

For open-vocabulary detection, we provide the preprocessed json files in the Drive. Put instances_val2017_all_2.json and instances_train2017_seen_2_65_cat.json under data/coco/zero-shot/, lvis_v1_train_seen_1203_cat.json under data/lvis_v1/annotations/, and objects365_reorder_val.json under data/Objects365v1/.
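A minimal sketch of setting up this layout with soft links (the /path/to/... sources are placeholders for wherever your datasets actually live):

cd CLIPSelf/F-ViT
mkdir -p data/coco/zero-shot data/lvis_v1/annotations data/Objects365v1
# link the image folders instead of copying them
ln -s /path/to/coco/train2017   data/coco/train2017
ln -s /path/to/coco/val2017     data/coco/val2017
ln -s /path/to/coco/annotations data/coco/annotations
# LVIS v1 reuses the COCO images
ln -s /path/to/coco/train2017   data/lvis_v1/train2017
ln -s /path/to/coco/val2017     data/lvis_v1/val2017
ln -s /path/to/objects365v1/val data/Objects365v1/val
# finally, place the json files downloaded from the Drive as described above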

CLIPSelf Checkpoints

Obtain the checkpoints from the Drive and organize them as follows:

CLIPSelf/F-ViT
├── checkpoints  # use a soft link to save disk storage
    ├── eva_vitb16_coco_clipself_patches.pt     # 1
    ├── eva_vitb16_coco_clipself_proposals.pt   # 2
    ├── eva_vitb16_coco_regionclip.pt           # 3
    ├── eva_vitl14_coco_clipself_patches.pt     # 4
    ├── eva_vitl14_coco_clipself_proposals.pt   # 5
    ├── eva_vitl14_coco_regionclip.pt           # 6
    ├── eva_vitb16_lvis_clipself_patches.pt     # 7
    ├── eva_vitl14_lvis_clipself_patches.pt     # 8

Detectors

The detectors on OV-COCO are summarized as follows:

| # | Backbone | CLIP Refinement | Proposals | AP50 (novel) | Config | Checkpoint |
|---|----------|-----------------|-----------|--------------|--------|------------|
| 1 | ViT-B/16 | CLIPSelf        | -         | 33.6         | cfg    | model      |
| 2 | ViT-B/16 | CLIPSelf        | +         | 37.6         | cfg    | model      |
| 3 | ViT-B/16 | RegionCLIP      | +         | 34.4         | cfg    | model      |
| 4 | ViT-L/14 | CLIPSelf        | -         | 38.4         | cfg    | model      |
| 5 | ViT-L/14 | CLIPSelf        | +         | 44.3         | cfg    | model      |
| 6 | ViT-L/14 | RegionCLIP      | +         | 38.7         | cfg    | model      |

The detectors on OV-LVIS are summarized as follows:

| # | Backbone | CLIP Refinement | Proposals | mAPr | Config | Checkpoint |
|---|----------|-----------------|-----------|------|--------|------------|
| 7 | ViT-B/16 | CLIPSelf        | -         | 25.3 | cfg    | model      |
| 8 | ViT-L/14 | CLIPSelf        | -         | 34.9 | cfg    | model      |

Test

We provide the checkpoints of the object detectors in the Drive. They can be organized as follows:

CLIPSelf/F-ViT
├── checkpoints  # use a soft link to save disk storage
    ├── fvit_eva_vitb16_ovcoco_clipself_patches.pth     # 1
    ├── fvit_eva_vitb16_ovcoco_clipself_proposals.pth   # 2
    ├── fvit_eva_vitb16_ovcoco_regionclip.pth           # 3
    ├── fvit_eva_vitb16_ovlvis_clipself_patches.pth     # 4
    ├── fvit_eva_vitl14_ovcoco_clipself_patches.pth     # 5
    ├── fvit_eva_vitl14_ovcoco_clipself_proposals.pth   # 6
    ├── fvit_eva_vitl14_ovcoco_regionclip.pth           # 7
    ├── fvit_eva_vitl14_ovlvis_clipself_patches.pth     # 8

An example of evaluation on OV-COCO using 8 GPUs:

bash dist_test.sh configs/ov_coco/fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_eva_clipself_proposals.py \
     checkpoints/fvit_eva_vitb16_ovcoco_clipself_proposals.pth  8  \
     --work-dir your/working/directory --eval bbox
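If only a single GPU is available, the evaluation can also be launched through the underlying test script directly (a sketch; the exact script path, test.py vs. tools/test.py, depends on the repository layout that dist_test.sh wraps):

python test.py configs/ov_coco/fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_eva_clipself_proposals.py \
       checkpoints/fvit_eva_vitb16_ovcoco_clipself_proposals.pth \
       --work-dir your/working/directory --eval bbox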

An example of evaluation on OV-LVIS using 8 GPUs:

bash dist_test.sh configs/ov_lvis/fvit_vitl14_upsample_fpn_bs64_4x_ovlvis_eva_clipself_patches.py \
     checkpoints/fvit_eva_vitl14_ovlvis_clipself_patches.pth   8  \
     --work-dir your/working/directory --eval segm

Transfer

Transfer evaluation on COCO:

bash dist_test.sh configs/transfer/fvit_vitl14_upsample_fpn_transfer2coco.py \
     checkpoints/fvit_eva_vitl14_ovlvis_clipself_patches.pth  8  \
     --work-dir your/working/directory --eval bbox

Transfer evaluation on Objects365v1:

bash dist_test.sh configs/transfer/fvit_vitl14_upsample_fpn_transfer2objects365v1.py \
     checkpoints/fvit_eva_vitl14_ovlvis_clipself_patches.pth   8  \
     --work-dir your/working/directory --eval bbox

Train

Prepare the CLIPSelf/RegionCLIP checkpoints as described in the CLIPSelf Checkpoints section. An example of training on OV-COCO with 8 GPUs:

bash dist_train.sh  configs/ov_coco/fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_eva_clipself_proposals.py \
                   8 --work-dir your/working/directory

An example of training on OV-LVIS with 8 GPUs:

bash dist_train.sh configs/ov_lvis/fvit_vitl14_upsample_fpn_bs64_4x_ovlvis_eva_clipself_patches.py \
                  8 --work-dir your/working/directory
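If a run is interrupted, it can usually be resumed from the most recent checkpoint with MMDetection's standard --resume-from flag (a sketch; it assumes dist_train.sh forwards extra arguments to train.py, as the stock MMDetection launcher does):

bash dist_train.sh configs/ov_lvis/fvit_vitl14_upsample_fpn_bs64_4x_ovlvis_eva_clipself_patches.py \
     8 --work-dir your/working/directory \
     --resume-from your/working/directory/latest.pth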

To use multiple machines (e.g., 2x8=16 GPUs) to expedite training on OV-LVIS, refer to the MMDetection tutorial on distributed training; a sketch is given below. We have set auto_scale_lr = dict(enable=True, base_batch_size=64) in the config files, so the learning rate is scaled automatically according to the total batch size.
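A sketch of a two-machine (2x8 GPU) launch, following the standard MMDetection multi-node recipe; it assumes dist_train.sh reads the NNODES/NODE_RANK/PORT/MASTER_ADDR environment variables like the stock MMDetection script:

# on the first machine (rank 0)
NNODES=2 NODE_RANK=0 PORT=29500 MASTER_ADDR=<ip-of-first-machine> \
bash dist_train.sh configs/ov_lvis/fvit_vitl14_upsample_fpn_bs64_4x_ovlvis_eva_clipself_patches.py \
     8 --work-dir your/working/directory

# on the second machine (rank 1)
NNODES=2 NODE_RANK=1 PORT=29500 MASTER_ADDR=<ip-of-first-machine> \
bash dist_train.sh configs/ov_lvis/fvit_vitl14_upsample_fpn_bs64_4x_ovlvis_eva_clipself_patches.py \
     8 --work-dir your/working/directory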