Name		Name	Last commit message	Last commit date
parent directory ..
.circleci		.circleci
.github		.github
configs		configs
demo		demo
detectron2		detectron2
dev		dev
docker		docker
docs		docs
projects		projects
tests		tests
tools		tools
.clang-format		.clang-format
.flake8		.flake8
.gitignore		.gitignore
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

README.md

EVA-02: Object Detection & Instance Segmentation

We provide evaluation and training code on Object365, COCO and LVIS datasets. All model weights related to object detection and instance segmentation are available for the community.

Table of Contents

EVA-02: Object Detection & Instance Segmentation

EVA-02 Model Card

EVA-02 uses ViTDet + Cascade Mask RCNN as the object detection and instance segmentation head. We mainly evaluate EVA-02 on COCO and LVIS val set.

To avoid data contamination, all LVIS models are initialized using IN-21K MIM pre-trained EVA-02. Refer to our paper for details.

Head-to-head Comparison

COCO

model name	init. ckpt	LSJ crop size	batch size	iter	AP box	AP mask	config	weight
`eva02_B_coco_bsl`	`eva02_B_pt_in21k_p14to16`	`1024x1024`	128	60k	55.5	47.1	link	🤗 HF link
`eva02_L_coco_bsl`	`eva02_L_pt_m38m_p14to16`	`1024x1024`	144	60k	59.2	50.8	link	🤗 HF link

LVIS

model name	init. ckpt	LSJ crop size	batch size	iter	AP box	AP mask	config	weight
`eva02_B_lvis_bsl`	`eva02_B_pt_in21k_p14to16`	`1024x1024`	128	50k	47.1	41.4	link	🤗 HF link
`eva02_L_lvis_bsl`	`eva02_L_pt_in21k_p14to16`	`1024x1024`	128	40k	55.3	48.6	link	🤗 HF link

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

model name	init. ckpt	LSJ crop size	batch size	iter	AP box	AP mask	config	weight
`eva02_B_coco_sys`	`eva02_B_pt_in21k_p14to16`	`1536x1536`	128	60k	58.9	50.7	link	🤗 HF link
`eva02_L_coco_sys`	`eva02_L_pt_m38m_p14to16`	`1536x1536`	128	60k	62.3	53.8	link	🤗 HF link

LVIS

model name	init. ckpt	LSJ crop size	batch size	iter	AP box	AP mask	config	weight
`eva02_L_lvis_sys`	`eva02_L_pt_in21k_p14to16`	`1536x1536`	128	40k	60.1	53.5	link	🤗 HF link

Object365 Intermediate Fine-tuning

model name	init. ckpt	LSJ crop size	batch size	iter	config	weight
`eva02_L_m38m_to_o365`	`eva02_L_pt_m38m_p14to16`	`1536x1536`	160	400k	link	🤗 HF link
`eva02_L_in21k_to_o365`	`eva02_L_pt_in21k_p14to16`	`1536x1536`	160	400k	link	🤗 HF link

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

model name	init. ckpt	LSJ crop size	batch size	iter	AP box	AP mask	config	weight
`eva02_L_coco_det_sys_o365`	`eva02_L_m38m_to_o365`	`1536x1536`	64	40k	64.1	54.3	link	🤗 HF link
`eva02_L_coco_seg_sys_o365`	`eva02_L_m38m_to_o365`	`1536x1536`	64	40k	63.9	55.4	link	🤗 HF link

We use different checkpoints (from the same training job) for object detection and instance segmentation tasks, since the instance segmentation part is not pre-trained on O365 and converges slower on COCO.

LVIS

model name	init. ckpt	LSJ crop size	batch size	iter	AP box	AP mask	config	weight
`eva02_L_lvis_sys_o365`	`eva02_L_in21k_to_o365`	`1536x1536`	64	70k	65.2	57.3	link	🤗 HF link

Setup

Environment

First, setup EVA-02 pre-training & image classification environment, and install mmcv==1.7.1 for soft-nms.

Then, build EVA-02 det / Detectron2 from source:

cd /path/to/EVA-02/det
python -m pip install -e .

Data

For Object365 (O365) dataset, download it from here. The file structure of O365 should look like:

o365
├── annotations
│   ├── zhiyuan_objv2_train.json
│   └── zhiyuan_objv2_val.json
├── images
│   ├── patch0
│   ├── patch1
│       ...
│   └── patch50
└── ...

For COCO and LVIS datasets, please follow the official guidelines in Detectron2.

Overall, the structure of DETECTRON2_DATASETS should look like:

DETECTRON2_DATASETS
├── o365
├── coco
├── lvis
└── ...

EVA-02 pre-trained weight

MIM Pre-trained EVA-02

model name	#params	MIM pt dataset	MIM pt epochs	weight
`eva02_B_pt_in21k_p14to16`	86M	IN-21K	150	🤗 HF link
`eva02_L_pt_in21k_p14to16`	304M	IN-21K	150	🤗 HF link
`eva02_L_pt_m38m_p14to16`	304M	Merged-38M	56	🤗 HF link

eva02_psz14to16 models interpolate the kernel size of patch_embed from 14x14 to 16x16, and interpolate the pos_embed from 16x16 to 14x14. This is useful for object detection, instance segmentation & semantic segmentation tasks.

O365 Intermediate Fine-tuned EVA-02

Please see here.

Evaluation

Head-to-head Comparison

COCO

Evaluate eva02_B_coco_bsl on COCO val using a single node with 4 gpus.

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_b_4attn_1024_lrd0p7.py \
 --eval-only \
 train.init_checkpoint=/path/to/eva02_B_coco_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
55.5017,74.8972,60.2801,36.3313,60.9265,72.7190
Task: segm
AP,AP50,AP75,APs,APm,APl
47.0794,71.8778,50.4881,25.9719,51.6530,67.5334

Evaluate eva02_L_coco_bsl on COCO val using a single node with 4 gpus.

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_l_4attn_1024_lrd0p8.py \
 --eval-only \
 train.init_checkpoint=/path/to/eva02_L_coco_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
59.1550,78.6420,64.0970,41.9209,64.4478,75.3628
Task: segm
AP,AP50,AP75,APs,APm,APl
50.7923,75.8581,55.2317,30.4149,55.2931,70.3713

LVIS

Evaluate eva02_B_lvis_bsl on LVIS val using a single node with 4 gpus.

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_lvis/eva2_lvis_cascade_mask_rcnn_vitdet_b_4attn_1024_lrd0p7.py \
 --eval-only \
 train.init_checkpoint=/share/project/yxf/open/eva02/det/eva02_B_lvis_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
47.1304,62.8418,49.3729,34.5095,58.6703,69.0790,36.3894,47.9805,50.9023
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
41.3575,60.4043,44.2597,27.0808,54.2237,66.4821,32.2175,42.5939,43.9947

Evaluate eva02_L_lvis_bsl on LVIS val using a single node with 4 gpus.

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_lvis/eva2_lvis_cascade_mask_rcnn_vitdet_l_4attn_1024_lrd0p8.py \
 --eval-only \
 train.init_checkpoint=/share/project/yxf/open/eva02/det/eva02_L_lvis_bsl.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
55.2866,71.9707,58.1774,41.9690,67.2664,74.6449,50.6344,56.7946,55.6481
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
48.5964,69.3752,52.5725,33.2063,61.7856,71.2531,45.5442,50.2401,48.1034

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

Evaluate eva02_B_coco_sys on COCO val using a single node with 4 gpus.

# evaluate object detection performance w/o maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_b_6attn_win32_1536_lrd0p7.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 train.init_checkpoint=/path/to/eva02_B_coco_sys.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
58.9331,77.8558,64.5855,42.1048,63.6015,74.3375

# evaluate instance segmentation performance w/ maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_b_6attn_win32_1536_lrd0p7.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_B_coco_sys.pth

Expected results:

Task: segm
AP,AP50,AP75,APs,APm,APl
50.6875,74.8027,55.4802,30.5407,54.2940,69.7923

Evaluate eva02_L_coco_sys on COCO val using a single node with 4 gpus.

# evaluate object detection performance w/o maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_l_8attn_win32_1536_lrd0p8.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 train.init_checkpoint=/path/to/eva02_L_coco_sys.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
62.2848,80.8003,68.0974,45.8547,66.7479,78.0017

# evaluate instance segmentation performance w/ maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_coco/eva2_coco_cascade_mask_rcnn_vitdet_l_8attn_win32_1536_lrd0p8.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_L_coco_sys.pth

Expected results:

Task: segm
AP,AP50,AP75,APs,APm,APl
53.8006,78.2082,59.0474,34.2192,57.6212,72.6909

LVIS

Evaluate eva02_L_lvis_sys on LVIS val using a single node with 4 gpus.

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_mim_to_lvis/eva2_lvis_cascade_mask_rcnn_vitdet_l_8attn_win32_1536_lrd0p8.py \
 --eval-only \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_L_lvis_sys.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
60.0506,74.5881,63.7702,48.6573,70.7069,77.6001,53.1295,61.7127,61.2374
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
53.5098,72.7459,57.9503,40.0757,65.5123,73.5845,47.5871,55.2739,54.1442

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

Evaluate eva02_L_coco_det_sys_o365 on COCO val using a single node with 4 gpus.

# evaluate object detection performance w/o maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva2_o365_to_coco_cascade_mask_rcnn_vitdet_l_8attn_1536_lrd0p8.py \
 --eval-only \
 train.model_ema.use_ema_weights_for_eval_only=True \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 train.init_checkpoint=/path/to/eva02_L_coco_det_sys_o365.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl
64.1442,82.1375,70.2666,48.8727,68.5649,78.1375

Evaluate eva02_L_coco_seg_sys_o365 on COCO val using a single node with 4 gpus.

# evaluate instance segmentation performance w/ maskness
python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva2_o365_to_coco_cascade_mask_rcnn_vitdet_l_8attn_1536_lrd0p8.py \
 --eval-only \
 train.model_ema.use_ema_weights_for_eval_only=True \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.override_score_thresh=0.0 \
 model.roi_heads.maskness_thresh=0.5 \
 train.init_checkpoint=/path/to/eva02_L_coco_seg_sys_o365.pth

Expected results:

Task: segm
AP,AP50,AP75,APs,APm,APl
55.3584,79.7107,61.4651,36.4061,59.0486,73.1774

LVIS

Evaluate eva02_L_lvis_sys_o365 on LVIS val using a single node with 4 gpus.

python tools/lazyconfig_train_net.py \
 --num-gpus 4  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
 --config-file projects/ViTDet/configs/eva2_o365_to_lvis/eva2_o365_to_lvis_cascade_mask_rcnn_vitdet_l_8attn_1536_lrd0p8.py \
 --eval-only \
 train.model_ema.use_ema_weights_for_eval_only=True \
 model.roi_heads.use_soft_nms=True \
 model.roi_heads.class_wise=True \
 model.roi_heads.method=linear \
 model.roi_heads.iou_threshold=0.6 \
 model.roi_heads.maskness_thresh=0.5 \
 dataloader.evaluator.max_dets_per_image=1000 \
 train.init_checkpoint=/path/to/eva02_L_lvis_sys_o365.pth

Expected results:

Task: bbox
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
65.2244,78.9298,68.9737,55.3842,75.1310,79.8346,59.7070,66.8476,65.8378
Task: segm
AP,AP50,AP75,APs,APm,APl,APr,APc,APf
57.3209,76.9294,62.1691,44.9472,68.7592,74.9193,52.4748,58.9491,57.6336

Training

Please select config.py and the corresponding init_checkpoint.pth based on Model Card.

All configs can be trained with:

python tools/lazyconfig_train_net.py \
    --num-gpus 8  --num-machines ${WORLD_SIZE} --machine-rank ${RANK} --dist-url "tcp://$MASTER_ADDR:60900" \
    --config-file /path/to/config.py \
    train.init_checkpoint=/path/to/init_checkpoint.pth \
    train.output_dir=/path/to/output

Acknowledgment

EVA-02 object detection and instance segmentation are built upon Detectron2.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

det

det

README.md

EVA-02: Object Detection & Instance Segmentation

EVA-02 Model Card

Head-to-head Comparison

COCO

LVIS

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

LVIS

Object365 Intermediate Fine-tuning

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

LVIS

Setup

Environment

Data

EVA-02 pre-trained weight

MIM Pre-trained EVA-02

O365 Intermediate Fine-tuned EVA-02

Evaluation

Head-to-head Comparison

COCO

LVIS

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

LVIS

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

LVIS

Training

Acknowledgment

Files

det

Directory actions

More options

Directory actions

More options

Latest commit

History

det

Folders and files

parent directory

README.md

EVA-02: Object Detection & Instance Segmentation

EVA-02 Model Card

Head-to-head Comparison

COCO

LVIS

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

LVIS

Object365 Intermediate Fine-tuning

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

LVIS

Setup

Environment

Data

EVA-02 pre-trained weight

MIM Pre-trained EVA-02

O365 Intermediate Fine-tuned EVA-02

Evaluation

Head-to-head Comparison

COCO

LVIS

System-level Comparisons w/o O365 Intermediate Fine-tuning

COCO

LVIS

System-level Comparisons w/ O365 Intermediate Fine-tuning

COCO

LVIS

Training

Acknowledgment