European Conference on Computer Vision (ECCV) 2024
- Our answer to the title is YES: we introduce SyncOOD to access the open-world knowledge encapsulated in off-the-shelf foundation models by synthesizing meaningful OOD data.
- SyncOOD provides an automatic, transparent, controllable, and low-cost pipeline for synthesizing scene-level images containing novel objects with annotated bounding boxes via image editing.
- The synthetic OOD samples are filtered and used to augment the training of a lightweight, plug-and-play OOD detector, effectively optimizing the in-distribution (ID) / out-of-distribution (OOD) decision boundaries with minimal data usage.
- Explore more in our ECCV 2024 paper: Can OOD Object Detectors Learn from Foundation Models?
This repository contains the code of SyncOOD in two parts:
- Synthesize Novel Samples for OOD object detection and more open-world tasks (coming soon).
- Train an OOD Detector for achieving state-of-the-art OOD object detection (coming soon).
Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, such as Stable Diffusion, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples, thereby enhancing OOD object detection. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models. This offers the model access to open-world knowledge encapsulated within off-the-shelf foundation models. The synthetic OOD samples are then employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/OOD decision boundaries. Extensive experiments across multiple benchmarks demonstrate that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal synthetic data usage.
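As a concrete illustration of the lightweight, plug-and-play detector, here is a minimal sketch, assuming it is a small MLP scoring head over per-box features from a frozen base detector; the names (`OODHead`, `box_feats`), dimensions, and loss are illustrative rather than the exact implementation in this repository.

```python
import torch
import torch.nn as nn

class OODHead(nn.Module):
    """Lightweight, plug-and-play head: maps a per-box feature from a
    frozen detector to a single ID-vs-OOD logit."""
    def __init__(self, feat_dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, box_feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(box_feats).squeeze(-1)  # higher => more likely ID

# One training step on ID boxes plus filtered synthetic OOD boxes
# (random tensors stand in for real detector features).
head = OODHead()
id_feats, ood_feats = torch.randn(32, 1024), torch.randn(32, 1024)
logits = head(torch.cat([id_feats, ood_feats]))
labels = torch.cat([torch.ones(32), torch.zeros(32)])  # 1 = ID, 0 = OOD
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```

Trained this way, the head sharpens the ID/OOD decision boundary without modifying the base detector.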
- We investigate and unlock the potential of text-to-image generative models trained on large-scale open-set data for synthesizing OOD objects in object detection tasks.
- We introduce an automated data curation process for obtaining controllable, annotated scene-level synthetic OOD images for OOD object detection, which uses LLMs for novel concept discovery and visual foundation models for data annotation and filtering (a minimal sketch follows this list).
- We discover that maintaining ID/OOD image context consistency and obtaining more accurate OOD annotation bounding boxes are crucial for synthesized data to be effective in OOD object detection.
- Comprehensive experiments on multiple benchmarks demonstrate the effectiveness of our method, as we significantly outperform existing state-of-the-art approaches while using minimal synthetic data.
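To make the curation process concrete, here is a minimal sketch, assuming Hugging Face diffusers for Stable Diffusion inpainting and the official segment-anything package for box refinement. The fixed concept list stands in for LLM-proposed novel concepts, the model id and checkpoint path are placeholders, and prompts and filtering criteria are simplified relative to the actual pipeline described in the paper.

```python
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
from segment_anything import SamPredictor, sam_model_registry

# Stand-in for LLM-proposed novel concepts (e.g., queried from ChatGPT).
novel_concepts = ["accordion", "lawnmower", "telescope"]

# Off-the-shelf foundation models; ids/paths are illustrative.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting"
)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

def synthesize_ood(image: Image.Image, mask: Image.Image, box, concept: str):
    """Edit an ID object region into a novel concept while keeping the
    surrounding scene context, then refine the box with SAM."""
    edited = pipe(
        prompt=f"a photo of a {concept}", image=image, mask_image=mask
    ).images[0]
    # Refine the annotation: prompt SAM with the original box and take
    # the tight bounding box of the highest-scoring mask.
    predictor.set_image(np.array(edited))
    masks, scores, _ = predictor.predict(box=np.array(box))
    ys, xs = np.nonzero(masks[scores.argmax()])
    refined_box = [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
    # A filtering pass (e.g., discarding low-quality or unchanged edits)
    # would follow before the sample enters training.
    return edited, refined_box
```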
If you find this work useful, please consider citing:
@InProceedings{liu2024can,
    author    = {Liu, Jiahui and Wen, Xin and Zhao, Shizhen and Chen, Yingxian and Qi, Xiaojuan},
    title     = {Can OOD Object Detectors Learn from Foundation Models?},
    booktitle = {European Conference on Computer Vision},
    year      = {2024}
}
- This repository builds on the work of Du et al. (ICLR 2022) and Wilson et al. (ICCV 2023). Please support their work.
- This work is powered by Detectron2, Stable-Diffusion, ChatGPT, and Segment-Anything. Thanks to these projects.
We aim to develop an automatic, transparent, controllable, and low-cost pipeline for synthesizing scene-level images containing novel objects, providing COCO-format annotations to support 1) training OOD detectors and 2) exploring more general open-world tasks (coming soon).
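For reference, each synthesized image comes with a standard COCO-style record along these lines (file name, ids, and category are illustrative; boxes follow the COCO [x, y, width, height] convention):

```python
# Minimal COCO-format annotation for one synthesized image.
coco_record = {
    "images": [
        {"id": 1, "file_name": "synced_000001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 91,                    # a novel (OOD) concept id
            "bbox": [120.0, 80.0, 200.0, 160.0],  # [x, y, width, height]
            "area": 200.0 * 160.0,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 91, "name": "novel_object"}],
}
```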
We use synthetic out-of-distribution (OOD) samples together with the original in-distribution (ID) samples to train a lightweight, plug-and-play OOD detector in a highly efficient way, achieving state-of-the-art OOD object detection.
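At test time, plugging the detector in amounts to scoring each detected box and thresholding. A minimal sketch, where the linear layer stands in for the trained scoring head and the threshold value is illustrative:

```python
import torch
import torch.nn as nn

head = nn.Linear(1024, 1)  # stand-in for the trained ID-vs-OOD scoring head

@torch.no_grad()
def flag_ood(head: nn.Module, box_feats: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Return True for boxes whose ID-probability falls below tau."""
    probs = torch.sigmoid(head(box_feats)).squeeze(-1)  # 1 = ID, 0 = OOD
    return probs < tau

is_ood = flag_ood(head, torch.randn(8, 1024))  # per-box OOD flags
```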
We mainly conduct the experiments on Ubuntu 20.04 with GeForce RTX 3090 GPUs (coming soon).