OVM3D-Det (NeurIPS 2024)

Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data

Rui Huang, Henry Zheng, Yan Wang, Zhuofan Xia, Marco Pavone, Gao Huang

[Project Page] [arXiv] [BibTeX]

We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det, which trains detectors using only RGB images, making it both cost-effective and scalable to publicly available data.

Table of Contents

  1. Installation
  2. Data
  3. Pseudo-Label Generation
  4. Test
  5. Training
  6. Citing
  7. Acknowledgement

Installation

We follow the main dependencies of Cube R-CNN and have added dependencies for UniDepth and Grounded-SAM.

# set up a new environment
conda create -n ovm3d python=3.10
conda activate ovm3d

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c fvcore -c iopath -c conda-forge -c pytorch3d -c pytorch fvcore iopath pytorch3d

# OpenCV, COCO, detectron2
pip install cython opencv-python
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

cd third_party
git clone git@github.com:facebookresearch/detectron2.git
python -m pip install -e detectron2
 
# other dependencies
conda install -c conda-forge scipy seaborn

# install dependencies for Unidepth
cd UniDepth
pip install -e .

# install dependencies for Grounded-Segment-Anything
cd ../Grounded-Segment-Anything 
python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO

pip install scikit-learn
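
After installation, a quick sanity check can confirm that the main dependencies import correctly. The module names below (detectron2, pytorch3d, segment_anything, groundingdino) are assumed from the packages installed above; adjust them if your local installs expose different names.

# optional sanity check: verify the core dependencies import (module names assumed)
python -c "import torch; print('torch', torch.__version__, 'CUDA available:', torch.cuda.is_available())"
python -c "import detectron2, pytorch3d; print('detectron2 and pytorch3d OK')"
python -c "import segment_anything, groundingdino; print('Grounded-SAM components OK')"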

Data

We utilize four datasets from Omni3D: KITTI, nuScenes, SUN RGB-D and ARKitScenes. For detailed instructions on downloading and setting up the images and annotations, please refer to the Omni3D data guide.
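
For orientation, below is a hypothetical layout of the datasets folder, assuming the per-dataset image roots and shared annotation folder used by the Omni3D convention; the authoritative folder names are those given in the Omni3D data guide.

datasets/
├── Omni3D/          # Omni3D json annotations
├── KITTI_object/    # KITTI images
├── nuScenes/        # nuScenes images
├── SUNRGBD/         # SUN RGB-D images
└── ARKitScenes/     # ARKitScenes images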

Pseudo-Label Generation

We provide pre-generated pseudo labels here for the training and validation sets. Please place them in the datasets folder. To generate pseudo labels yourself, run:

bash scripts/generate_pseudo_label.sh DATASET

Specifically:

DATASET=$1

# Step 1: Predict depth using UniDepth
CUDA_VISIBLE_DEVICES=0 python third_party/UniDepth/run_unidepth.py --dataset $DATASET

# Step 2: Segment novel objects as well as the ground using Grounded-SAM
CUDA_VISIBLE_DEVICES=0 python third_party/Grounded-Segment-Anything/grounded_sam_detect.py --dataset $DATASET
CUDA_VISIBLE_DEVICES=0 python third_party/Grounded-Segment-Anything/grounded_sam_detect_ground.py --dataset $DATASET

# Step 3: Generate pseudo 3D bounding boxes
python tools/generate_pseudo_bbox.py \
  --config-file configs/Base_Omni3D_${DATASET}.yaml \
  OUTPUT_DIR output/generate_pseudo_label

# Step 4: Convert to COCO dataset format
python tools/transform_to_coco.py --dataset_name $DATASET

Replace DATASET with the name of the dataset you are working with (one of KITTI, nuScenes, SUNRGBD, or ARKitScenes).
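
For example, to generate pseudo labels for KITTI (the dataset token is assumed to match the suffix of the corresponding config file, e.g. configs/Base_Omni3D_KITTI.yaml):

bash scripts/generate_pseudo_label.sh KITTI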

Test

To evaluate the trained models, download the pre-trained models and place them in the checkpoints folder.

Dataset       Link
KITTI         Google Drive
nuScenes      Google Drive
SUNRGBD       Google Drive
ARKitScenes   Google Drive
Then run:

bash scripts/test.sh DATASET
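
For example, to evaluate the SUNRGBD checkpoint downloaded above (assuming the dataset token matches the name used in the table):

bash scripts/test.sh SUNRGBD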

Training

To train the model from scratch, run:

bash scripts/train.sh DATASET
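
For example, training on the KITTI pseudo labels generated earlier:

bash scripts/train.sh KITTI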

Citing

If you find this repo helpful, please consider citing us.

@inproceedings{huang2024training,
    title={Training an Open-Vocabulary Monocular 3D Detection Model without 3D Data},
    author={Rui Huang and Henry Zheng and Yan Wang and Zhuofan Xia and Marco Pavone and Gao Huang},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
}

Acknowledgement

We build upon the source code of Cube R-CNN, UniDepth, Grounded-SAM, WeakM3D, and OV-3DET. We sincerely thank the authors for their efforts.
