OV-PARTS: Towards Open-Vocabulary Part Segmentation

Meng Wei, Xiaoyu Yue, Wenwei Zhang, Xihui Liu, Shu Kong, Jiangmiao Pang*
Shanghai AI Laboratory, The University of Hong Kong, The University of Sydney, University of Macau, Texas A&M University

🏠 About

OV-PARTS is a benchmark for Open-Vocabulary Part Segmentation that leverages the capabilities of large-scale Vision-Language Models (VLMs).

  • Benchmark Datasets: Refined versions of two publicly available datasets:

    • Pascal-Part-116
    • ADE20K-Part-234

  • Benchmark Tasks: Three tasks that provide insights into the analogical reasoning, open-granularity, and few-shot adaptation abilities of models.

    • Generalized Zero-Shot Part Segmentation: this benchmark task aims to assess the model’s capability to generalize part segmentation from seen objects to related unseen objects.
    • Cross-Dataset Part Segmentation: beyond zero-shot generalization, this benchmark task assesses the model’s capability to generalize part segmentation across datasets with different granularity levels.
    • Few-Shot Part Segmentation: this benchmark task aims to assess the model’s fast adaptation capability.
  • Benchmark Baselines: Baselines built on existing two-stage and one-stage object-level open-vocabulary segmentation methods, including ZSseg, CLIPSeg, and CATSeg.

🔥 News

We are organizing the Open Vocabulary Part Segmentation (OV-PARTS) Challenge at the Visual Perception via Learning in an Open World (VPLOW) Workshop. Please check our website for details!

🛠 Getting Started

Installation

  1. Clone this repository

    git clone https://github.com/OpenRobotLab/OV_PARTS.git
    cd OV_PARTS
  2. Create a conda environment with Python 3.8+ and install the Python requirements

    conda create -n ovparts python=3.8
    conda activate ovparts
    pip install -r requirements.txt

Data Preparation

After downloading the two benchmark datasets, extract the archives with the following commands and place the extracted folders under the "Datasets" directory.

tar -xzf PascalPart116.tar.gz
tar -xzf ADE20KPart234.tar.gz
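
If you prefer to script this step, the following is a minimal sketch using Python's standard tarfile module. It assumes both archives sit in the current working directory and that each archive already contains its top-level dataset folder; the helper name extract_archives is hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path

# Archive names as given in the commands above (assumed to be in the working directory).
ARCHIVES = ["PascalPart116.tar.gz", "ADE20KPart234.tar.gz"]


def extract_archives(archives, dest="Datasets"):
    """Extract each .tar.gz archive into `dest`, creating the folder if needed.

    Returns the sorted top-level entry names found in each archive, so you can
    check that they match the expected dataset folder names.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        with tarfile.open(str(archive), "r:gz") as tar:
            tar.extractall(path=str(dest_dir))
            # Member names inside a tar always use "/" as the separator.
            extracted.extend(sorted({m.name.split("/")[0] for m in tar.getmembers()}))
    return extracted
```

For example, extract_archives(ARCHIVES) unpacks both downloads into Datasets/ and returns the top-level folder names it found.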

The Datasets folder should follow this structure:

Datasets/
├─Pascal-Part-116/
│ ├─train_16shot.json
│ ├─images/
│ │ ├─train/
│ │ └─val/
│ ├─annotations_detectron2_obj/
│ │ ├─train/
│ │ └─val/
│ └─annotations_detectron2_part/
│   ├─train/
│   └─val/
└─ADE20K-Part-234/
  ├─images/
  │ ├─training/
  │ └─validation/
  ├─train_16shot.json
  ├─ade20k_instance_train.json
  ├─ade20k_instance_val.json
  └─annotations_detectron2_part/
    ├─training/
    └─validation/
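
A quick way to confirm the layout is to check each expected path. The sketch below transcribes the tree above into a dictionary; missing_paths is a hypothetical helper, not part of the repository, and it only checks that the paths exist, not what they contain.

```python
from pathlib import Path

# Expected entries per dataset, transcribed from the directory tree above.
EXPECTED = {
    "Pascal-Part-116": [
        "train_16shot.json",
        "images/train", "images/val",
        "annotations_detectron2_obj/train", "annotations_detectron2_obj/val",
        "annotations_detectron2_part/train", "annotations_detectron2_part/val",
    ],
    "ADE20K-Part-234": [
        "images/training", "images/validation",
        "train_16shot.json",
        "ade20k_instance_train.json", "ade20k_instance_val.json",
        "annotations_detectron2_part/training", "annotations_detectron2_part/validation",
    ],
}


def missing_paths(root="Datasets"):
    """Return the expected paths (relative to `root`) that do not exist yet."""
    root_dir = Path(root)
    missing = []
    for dataset, entries in EXPECTED.items():
        for entry in entries:
            if not (root_dir / dataset / entry).exists():
                missing.append(str(Path(dataset) / entry))
    return missing
```

An empty list from missing_paths() means every file and folder in the tree is present.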

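Create the {train/val}_{obj/part}_label_count.json files for Pascal-Part-116 by running the following command for each combination of split ({train/val}) and annotation type ({obj/part}):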

python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_{obj/part}/{train/val} Datasets/Pascal-Part-116/annotations_detectron2_part/{train/val}_{obj/part}_label_count.json
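
The placeholders expand to four separate invocations. A minimal sketch that builds them with itertools.product, following the paths in the command above; label_count_commands and run_all are hypothetical helpers, and the sketch assumes it is run from the repository root.

```python
import itertools
import subprocess

ROOT = "Datasets/Pascal-Part-116"
SCRIPT = "baselines/data/datasets/mask_cls_collect.py"


def label_count_commands(root=ROOT):
    """Expand {train/val} x {obj/part} into four argument lists."""
    commands = []
    for split, kind in itertools.product(["train", "val"], ["obj", "part"]):
        src = f"{root}/annotations_detectron2_{kind}/{split}"
        # As in the README command, every output file goes to annotations_detectron2_part/.
        out = f"{root}/annotations_detectron2_part/{split}_{kind}_label_count.json"
        commands.append(["python", SCRIPT, src, out])
    return commands


def run_all(commands):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(label_count_commands()) produces all four label-count files in one go.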

Training

  1. Training the two-stage baseline ZSseg+.

    First, download the CLIP model fine-tuned with CPTCoOp.

    Then run the training command:

    python train_net.py --num-gpus 8 --config-file configs/${SETTING}/zsseg+_R50_coop_${DATASET}.yaml
  2. Training the one-stage baselines CLIPSeg and CATSeg.

    First, download the pre-trained object-level models of CLIPSeg and CATSeg and place them under the "pretrain_weights" directory.

    Model | Pre-trained checkpoint
    CLIPSeg | download
    CATSeg | download

    Then run the training command:

    # For CATSeg.
    python train_net.py --num-gpus 8 --config-file configs/${SETTING}/catseg_${DATASET}.yaml

    # For CLIPSeg.
    python train_net.py --num-gpus 8 --config-file configs/${SETTING}/clipseg_${DATASET}.yaml
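
The ${SETTING} and ${DATASET} placeholders are not enumerated in this README, so the sketch below treats them as plain strings. The values in the usage example ("zero_shot", "pascal_part") are hypothetical; check the configs/ directory for the real folder and file names. train_command and as_shell are illustrative helpers, not repository code.

```python
import shlex

# Config-file patterns copied from the training commands above.
CONFIG_PATTERNS = {
    "zsseg+": "configs/{setting}/zsseg+_R50_coop_{dataset}.yaml",
    "catseg": "configs/{setting}/catseg_{dataset}.yaml",
    "clipseg": "configs/{setting}/clipseg_{dataset}.yaml",
}


def train_command(baseline, setting, dataset, num_gpus=8):
    """Build the train_net.py argument list for one baseline/setting/dataset."""
    if baseline not in CONFIG_PATTERNS:
        raise KeyError(f"unknown baseline {baseline!r}; expected one of {sorted(CONFIG_PATTERNS)}")
    config = CONFIG_PATTERNS[baseline].format(setting=setting, dataset=dataset)
    return ["python", "train_net.py", "--num-gpus", str(num_gpus), "--config-file", config]


def as_shell(cmd):
    """Render an argument list as one shell-quoted command line (Python 3.8+)."""
    return shlex.join(cmd)
```

For example, as_shell(train_command("catseg", "zero_shot", "pascal_part")) yields python train_net.py --num-gpus 8 --config-file configs/zero_shot/catseg_pascal_part.yaml. Keeping the command as a list lets you pass it unchanged to subprocess.run.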

Evaluation

We provide the trained weights for the three baseline models reported in the paper.

Model | Setting | Pascal-Part-116 checkpoint | ADE20K-Part-234 checkpoint
ZSseg+ | Zero-shot | download | download
CLIPSeg | Zero-shot | download | download
CATSeg | Zero-shot | download | download
CLIPSeg | Few-shot | download | download
CLIPSeg | Cross-dataset | - | download

To evaluate a trained model, append --eval-only and MODEL.WEIGHTS ${WEIGHT_PATH} to the training command.

For example:

  python train_net.py --num-gpus 8 --config-file configs/${SETTING}/catseg_${DATASET}.yaml --eval-only MODEL.WEIGHTS ${WEIGHT_PATH}
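
Evaluation reuses the training command and appends two things: the --eval-only flag and a MODEL.WEIGHTS override pointing at a checkpoint. A minimal sketch of that transformation, where eval_command is a hypothetical helper and the weight path is whichever checkpoint you downloaded or trained.

```python
def eval_command(train_cmd, weight_path):
    """Turn a training argument list into an evaluation one.

    Appends --eval-only followed by the MODEL.WEIGHTS override, in the same
    order as the example command. The input list is left unmodified.
    """
    if "--eval-only" in train_cmd:
        raise ValueError("command already contains --eval-only")
    return list(train_cmd) + ["--eval-only", "MODEL.WEIGHTS", str(weight_path)]
```

Returning a new list keeps the original training command reusable for other checkpoints.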

📝 Benchmark Results

  • Zero-shot performance of the two-stage and one-stage baselines on Pascal-Part-116

    Model | Backbone | Finetuning | Oracle-Obj Seen | Oracle-Obj Unseen | Oracle-Obj Harmonic | Pred-Obj Seen | Pred-Obj Unseen | Pred-Obj Harmonic
    Fully-Supervised
    MaskFormer | ResNet-50 | - | 55.28 | 52.14 | - | 53.07 | 47.82 | -
    Two-Stage Baselines
    ZSseg | ResNet-50 | - | 49.35 | 12.57 | 20.04 | 40.80 | 12.07 | 18.63
    ZSseg+ | ResNet-50 | CPTCoOp | 55.33 | 19.17 | 28.48 | 54.23 | 17.10 | 26.00
    ZSseg+ | ResNet-50 | CPTCoCoOp | 54.43 | 19.04 | 28.21 | 53.31 | 16.08 | 24.71
    ZSseg+ | ResNet-101c | CPTCoOp | 57.88 | 21.93 | 31.81 | 56.87 | 20.29 | 29.91
    One-Stage Baselines
    CATSeg | ResNet-101 & ViT-B/16 | - | 14.89 | 10.29 | 12.17 | 13.65 | 7.73 | 9.87
    CATSeg | ResNet-101 & ViT-B/16 | B+D | 43.97 | 26.11 | 32.76 | 41.65 | 26.08 | 32.07
    CLIPSeg | ViT-B/16 | - | 22.33 | 19.73 | 20.95 | 14.32 | 10.52 | 12.13
    CLIPSeg | ViT-B/16 | VA+L+F+D | 48.68 | 27.37 | 35.04 | 44.57 | 27.79 | 34.24
  • Zero-shot performance of the two-stage and one-stage baselines on ADE20K-Part-234

    Model | Backbone | Finetuning | Oracle-Obj Seen | Oracle-Obj Unseen | Oracle-Obj Harmonic | Pred-Obj Seen | Pred-Obj Unseen | Pred-Obj Harmonic
    Fully-Supervised
    MaskFormer | ResNet-50 | - | 46.25 | 47.86 | - | 35.52 | 16.56 | -
    Two-Stage Baselines
    ZSseg+ | ResNet-50 | CPTCoOp | 43.19 | 27.84 | 33.85 | 21.30 | 5.60 | 8.87
    ZSseg+ | ResNet-50 | CPTCoCoOp | 39.67 | 25.15 | 30.78 | 19.52 | 2.98 | 5.17
    ZSseg+ | ResNet-101c | CPTCoOp | 43.41 | 25.70 | 32.28 | 21.42 | 3.33 | 5.76
    One-Stage Baselines
    CATSeg | ResNet-101 & ViT-B/16 | - | 11.49 | 8.56 | 9.81 | 6.30 | 3.79 | 4.73
    CATSeg | ResNet-101 & ViT-B/16 | B+D | 31.40 | 25.77 | 28.31 | 20.23 | 8.27 | 11.74
    CLIPSeg | ViT-B/16 | - | 15.27 | 18.01 | 16.53 | 5.00 | 3.36 | 4.02
    CLIPSeg | ViT-B/16 | VA+L+F+D | 38.96 | 29.65 | 33.67 | 24.80 | 6.24 | 9.98
  • Cross-Dataset performance of models trained on the source dataset ADE20K-Part-234 and tested on the target dataset Pascal-Part-116.

    Model | Source Oracle-Obj | Source Pred-Obj | Target Oracle-Obj | Target Pred-Obj
    CATSeg | 27.95 | 17.22 | 16.00 | 14.72
    CLIPSeg VA+L+F | 35.01 | 21.74 | 16.18 | 11.70
    CLIPSeg VA+L+F+D | 37.76 | 21.87 | 19.69 | 13.88
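
The Harmonic columns in the zero-shot tables are consistent with the harmonic mean of the Seen and Unseen scores, H = 2·S·U / (S + U), which is high only when a model does well on both seen and unseen classes. The sketch below recomputes a few reported values; because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def harmonic_mean(seen, unseen):
    """Harmonic mean of the seen and unseen scores; 0.0 if both are 0."""
    if seen + unseen == 0:
        return 0.0
    return 2 * seen * unseen / (seen + unseen)


# (seen, unseen, reported harmonic) triples copied from the zero-shot tables.
ROWS = {
    "Pascal-Part-116 ZSseg ResNet-50 (Oracle-Obj)": (49.35, 12.57, 20.04),
    "Pascal-Part-116 CLIPSeg VA+L+F+D (Oracle-Obj)": (48.68, 27.37, 35.04),
    "ADE20K-Part-234 CLIPSeg VA+L+F+D (Oracle-Obj)": (38.96, 29.65, 33.67),
}

for name, (seen, unseen, reported) in ROWS.items():
    print(f"{name}: computed {harmonic_mean(seen, unseen):.2f}, reported {reported:.2f}")
```

The harmonic mean is used instead of the arithmetic mean so that a strong Seen score cannot hide a weak Unseen score.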

🔗 Citation

If you find our work helpful, please cite:

@inproceedings{wei2023ov,
  title={OV-PARTS: Towards Open-Vocabulary Part Segmentation},
  author={Wei, Meng and Yue, Xiaoyu and Zhang, Wenwei and Kong, Shu and Liu, Xihui and Pang, Jiangmiao},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}

👏 Acknowledgements

We would like to express our gratitude to the open-source projects and their contributors, including ZSseg, CATSeg, and CLIPSeg. Their work has greatly contributed to the development of our codebase.