Sketch Query Guided Object Detection

Overview

This repository introduces a novel approach to Sketch-guided Object Detection (SGOD) in which multiple objects are localized from sketches with spatial awareness, with the aim of improving object detection in complex scenes. Traditional methods detect a single object at a time from a single sketch. This approach extends detection to multiple objects that have meaningful spatial relationships with each other. The method is built on the DEtection TRansformer (DETR) model, with significant modifications that integrate both photo and sketch features into the detection pipeline. The result is a more flexible and spatially aware object detection system.

The core contribution of this research is support for multi-sketch query guidance. Users draw multiple sketches, and the model processes them to localize the corresponding objects while preserving their spatial alignment.

Key Features

Multiple Object Detection with Spatial Awareness: Users can query complex scenes and detect multiple objects while considering their spatial relationships (e.g., "dog to the right of a person").
Sketch-Guided Object Localization: Unlike traditional object detection models, this approach uses sketches to guide the localization of objects in natural images.
DETR-based Architecture: We leverage the DETR model with custom modifications to handle sketches in combination with photo features for object localization.

Changes Made to the Original DETR Code

1. Modifications in `models/detr.py`

Original Code:

Feature Combination: In the original DETR implementation, backbone features from a single input image are projected and passed to the transformer. In this work, DETR is modified to process each input image together with its sketch input.

New Additions:

Feature Combination:
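```python
# Element-wise fusion: both tensors must have the same shape, e.g. (batch, channels, height, width)
src = photo_features + sketch_features
```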
The primary modification is the element-wise addition of photo_features and sketch_features. In the original DETR, the photo features are processed in isolation; here, photo and sketch features are merged so that information from both inputs contributes to object localization. We also compare this against other fusion methods, such as concatenation with early or late fusion.
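To make the difference between additive and concatenation fusion concrete, here is a minimal sketch in NumPy (used in place of PyTorch so it runs anywhere). It assumes photo and sketch features share DETR's backbone layout (batch, channels, height, width); the sizes and the helper names `fuse_add` and `fuse_concat` are hypothetical and not taken from this repository.

```python
import numpy as np

# Hypothetical sizes for illustration: batch 2, 256 channels, 25 x 38 feature map.
rng = np.random.default_rng(0)
photo_features = rng.standard_normal((2, 256, 25, 38)).astype(np.float32)
sketch_features = rng.standard_normal((2, 256, 25, 38)).astype(np.float32)


def fuse_add(photo, sketch):
    """Hypothetical helper: element-wise addition; shapes must match and channels are unchanged."""
    if photo.shape != sketch.shape:
        raise ValueError(f"shape mismatch: {photo.shape} vs {sketch.shape}")
    return photo + sketch


def fuse_concat(photo, sketch):
    """Hypothetical helper: concatenate along the channel axis; the channel count doubles."""
    return np.concatenate([photo, sketch], axis=1)


src_add = fuse_add(photo_features, sketch_features)
src_cat = fuse_concat(photo_features, sketch_features)
print(src_add.shape)  # (2, 256, 25, 38)
print(src_cat.shape)  # (2, 512, 25, 38)
```

Addition keeps the channel count, so the fused map can go through the existing projection unchanged; concatenation doubles it, so a projection back to the transformer's hidden size would be needed first.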
Updated Forward Pass:
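```python
# Alternative: pass the photo features directly.
# hs = self.transformer(self.input_proj(photo), self.input_proj_(sketch), mask, self.query_embed.weight, pos[-1])[0]
# Fused variant: pass src = photo_features + sketch_features.
hs = self.transformer(self.input_proj(src), self.input_proj_(sketch), mask, self.query_embed.weight, pos[-1])[0]
```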
The forward pass now gives the transformer two inputs: the projected photo features (or the fused features `src`) and the projected sketch features, so that both photo and sketch information is used for detection.

2. Modifications in `models/transformer.py`

Original Code:

Projection Layers: The original code uses projection layers to map the input features to a suitable dimension for the transformer. It does not specifically account for handling both photo and sketch features in the same pipeline.

New Additions:

New Linear Layers:
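```python
import torch.nn as nn

# Alternative sketch projection (only one definition of input_proj_ should be active):
# self.input_proj_ = nn.Linear(1000, 100)
self.input_proj_ = nn.Linear(2100, 512)  # projects sketch features
self.input_proj = nn.Linear(512, 100)    # projects photo (or fused) features
```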
These layers project the sketch features (`input_proj_`) and the photo or fused features (`input_proj`) so that both inputs enter the transformer in a shared feature space.
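For readers less familiar with `nn.Linear`, the NumPy sketch below reproduces its forward computation (`x @ W.T + b` on the last axis) to show which input sizes the layers above accept. The input tensors and their shapes are hypothetical; only the (in_features, out_features) pairs come from the layer definitions.

```python
import numpy as np


def linear(x, weight, bias):
    """NumPy equivalent of torch.nn.Linear's forward pass: x @ weight.T + bias on the last axis."""
    return x @ weight.T + bias


rng = np.random.default_rng(0)
# Weights follow nn.Linear's (out_features, in_features) convention.
w_sketch, b_sketch = rng.standard_normal((512, 2100)), np.zeros(512)  # like nn.Linear(2100, 512)
w_photo, b_photo = rng.standard_normal((100, 512)), np.zeros(100)     # like nn.Linear(512, 100)

# Hypothetical inputs shaped (batch, tokens, features); only the last dimension must match.
sketch = rng.standard_normal((4, 7, 2100))
photo = rng.standard_normal((4, 7, 512))

sketch_proj = linear(sketch, w_sketch, b_sketch)
photo_proj = linear(photo, w_photo, b_photo)
print(sketch_proj.shape)  # (4, 7, 512)
print(photo_proj.shape)   # (4, 7, 100)
```

Because `nn.Linear` only acts on the last dimension, the sketch features must end in 2100 values and the photo features in 512 values for these layers to apply.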
Updated Target Tensor Size:
```python
import torch
target = torch.zeros(2000, bs, c)  # (sequence length, batch size, channels), DETR's decoder layout
```
The first dimension of the target tensor (the sequence length in DETR's (sequence, batch, channel) layout) has been set to 2000 because the model now processes a larger set of combined photo and sketch features.
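The sketch below illustrates DETR's (sequence, batch, channel) convention with NumPy. Upstream DETR flattens each (batch, channels, height, width) feature map into a sequence of height x width tokens, and its decoder target is `torch.zeros_like(query_embed)` with shape (num_queries, batch, channels). The sizes here are hypothetical; only the 2000-long first dimension comes from the change described above.

```python
import numpy as np

bs, c, h, w = 2, 256, 25, 38  # hypothetical sizes
src = np.arange(bs * c * h * w, dtype=np.float32).reshape(bs, c, h, w)

# Same as DETR's src.flatten(2).permute(2, 0, 1): (bs, c, h, w) -> (h * w, bs, c)
seq = src.reshape(bs, c, h * w).transpose(2, 0, 1)
print(seq.shape)  # (950, 2, 256)

# Decoder target in the same (sequence, batch, channel) layout, using the 2000-long first dimension.
target = np.zeros((2000, bs, c), dtype=np.float32)
print(target.shape)  # (2000, 2, 256)
```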
Modified Forward Pass to Handle Sketch: The forward pass was adjusted to include both the photo and sketch features, with the encoder now accepting both feature types:
```
memory = self.encoder(src, sketch, src_key_padding_mask=mask, pos=pos_embed)
```
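The encoder internals are not described here, so the following is only one possible way an encoder layer could combine two token streams: single-head scaled dot-product cross-attention in which photo tokens attend to sketch tokens, written in NumPy. This is an assumption for illustration, not the repository's encoder; it omits learned projections, multiple heads, and normalization, and the name `cross_attend` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(photo_tokens, sketch_tokens):
    """Assumed illustration, not the repository's encoder: photo tokens attend to sketch tokens.

    photo_tokens: (S_p, bs, c); sketch_tokens: (S_s, bs, c), DETR's (seq, batch, channel) layout.
    No learned projections; the attended sketch context is added to each photo token (residual).
    """
    q = photo_tokens.transpose(1, 0, 2)   # (bs, S_p, c)
    k = sketch_tokens.transpose(1, 0, 2)  # (bs, S_s, c); also used as values for brevity
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (bs, S_p, S_s)
    context = softmax(scores, axis=-1) @ k  # (bs, S_p, c)
    return (q + context).transpose(1, 0, 2)  # back to (S_p, bs, c)


rng = np.random.default_rng(0)
photo = rng.standard_normal((950, 2, 256))  # hypothetical photo token sequence
sketch = rng.standard_normal((100, 2, 256))  # hypothetical sketch token sequence
out = cross_attend(photo, sketch)
print(out.shape)  # (950, 2, 256)
```

The output keeps the photo sequence length, so a layer like this could slot into a DETR-style encoder without changing downstream shapes.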

Setup and Installation

To set up and run the project, follow these steps:

Clone the Repository:

```
git clone https://github.com/deepwilson/Sketch-Query-Guided-Object-Detection.git
cd Sketch-Query-Guided-Object-Detection
```

Install Dependencies:

```
pip install -r requirements.txt
```

Dataset Preparation:

Set up the Dataset: Place the SketchyCOCO dataset in the directory ../sketch_detr_data/data/sketchyCOCO/Scene/.
Run the Data Preparation Script: Run the script to prepare the dataset:
```
python prepare_single_instance_sketch_data_COCO.py
```
This will process the dataset and generate the necessary JSON files (trainInTrain.json, valInTrain.json) in the sketch_retrieval_dataset/single_instance_dataset folder.
Verify Output: After running the script, verify that the dataset is saved as a COCO JSON file and images are placed correctly in the output directory.

The script processes the sketch and bounding box annotations and prepares the data for training and evaluation.
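As a quick aid for the verification step, here is a minimal sketch that checks the standard COCO detection fields in the generated JSON files. It assumes the files follow the standard COCO layout (`images`, `annotations`, `categories`, with `bbox` as [x, y, width, height]); any sketch-specific fields the preparation script adds are not checked, and the function names are hypothetical.

```python
import json

REQUIRED_KEYS = {"images", "annotations", "categories"}


def check_coco_annotations(data):
    """Hypothetical helper: return problems in a COCO-style detection dict (empty list if none).

    Checks only standard COCO fields; sketch-specific fields are ignored.
    """
    missing = REQUIRED_KEYS - set(data)
    if missing:
        return [f"missing top-level keys: {sorted(missing)}"]
    problems = []
    image_ids = {img["id"] for img in data["images"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann.get('id')}: unknown image_id {ann['image_id']}")
        _, _, box_w, box_h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if box_w <= 0 or box_h <= 0:
            problems.append(f"annotation {ann.get('id')}: non-positive box size {ann['bbox']}")
    return problems


def check_coco_file(path):
    """Hypothetical helper: load a COCO JSON file and run check_coco_annotations on it."""
    with open(path) as f:
        return check_coco_annotations(json.load(f))
```

For example, `check_coco_file("sketch_retrieval_dataset/single_instance_dataset/trainInTrain.json")` returns an empty list when none of these checks fail.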

Ensure you have the Sketch-guided Object Detection (SGOD) dataset in the following structure:

```
data/
  └── sketches_single_instance/
        ├── trainInTrain.json
        └── valInTrain.json
```
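Assuming the directory layout shown above, a small helper can confirm that both annotation files are in place before training. `missing_dataset_files` is a hypothetical name, not part of the repository.

```python
from pathlib import Path

EXPECTED_FILES = ("trainInTrain.json", "valInTrain.json")


def missing_dataset_files(root="data/sketches_single_instance"):
    """Hypothetical helper: return the expected annotation files that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    print("dataset OK" if not missing else f"missing: {missing}")
```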

Running the Code:

Inference:

```
python3 test.py --data_path data/Sketch/paper_version/val --resume checkpoint/checkpoint.pth
```
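The inference script's post-processing is not documented here. Upstream DETR returns `pred_logits` (with one extra "no object" class per query) and `pred_boxes` (normalized center x, center y, width, height). Assuming this fork keeps that output format, a NumPy sketch of turning one image's outputs into thresholded pixel boxes looks like this; `postprocess` and its default `threshold` of 0.7 are hypothetical choices.

```python
import numpy as np


def postprocess(pred_logits, pred_boxes, img_w, img_h, threshold=0.7):
    """Hypothetical helper: keep confident detections from one image's DETR-style outputs.

    pred_logits: (num_queries, num_classes + 1); the last column is DETR's "no object" class.
    pred_boxes:  (num_queries, 4) normalized (cx, cy, w, h) in [0, 1].
    Returns (scores, labels, boxes) with boxes as (x1, y1, x2, y2) in pixels.
    """
    logits = pred_logits - pred_logits.max(axis=-1, keepdims=True)  # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    probs = probs[:, :-1]  # drop the "no object" column
    scores, labels = probs.max(axis=-1), probs.argmax(axis=-1)
    keep = scores > threshold
    cx, cy, w, h = pred_boxes[keep].T
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)
    boxes = boxes * np.array([img_w, img_h, img_w, img_h])  # scale normalized coords to pixels
    return scores[keep], labels[keep], boxes
```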

Evaluation:

```
python3 main.py --num_workers 14 --batch_size 4 --device "cuda:0" --eval --resume checkpoint/checkpoint.pth
```
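Upstream DETR's `--eval` mode reports COCO-style average precision, which counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box meets a threshold, averaged over thresholds from 0.50 to 0.95 in steps of 0.05. Assuming this fork keeps that evaluator, the sketch below shows the IoU computation itself; `box_iou` is a hypothetical helper, not the evaluator's code.

```python
def box_iou(a, b):
    """Hypothetical helper: IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO AP thresholds: 0.50, 0.55, ..., 0.95
coco_iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
iou = box_iou((0, 0, 10, 10), (1, 0, 11, 10))
passed = [t for t in coco_iou_thresholds if iou >= t]
print(round(iou, 3), passed)  # 0.818 [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8]
```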

Training:

```
python3 main.py --num_workers 14 --batch_size 4 --device "cuda:0" --resume detr-r50-e632da11.pth
```

Results

The following images demonstrate the effectiveness of the Sketch Query Guided Object Detection approach. These visual results show how the model is able to detect objects and their spatial relationships based on sketches, as well as the architecture of the modified DETR model used in this study.

Example 1: Multi-object Detection with Spatial Alignment

This image shows how a sketch can guide the model to detect and localize objects in an image based on the user's query.

Example 2: Detecting only sketch-related objects despite multiple instances of the same label

In this example, the model identifies only the instances drawn by the user, even though multiple objects of the same label exist in the image, while preserving their spatial relationships as defined by the user's sketch.

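Example 3: Detecting only sketch-related objects despite multiple instances of the same label

Another example in which only the sketched instances are detected, even though other objects of the same label appear in the image.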

Example 4: Example of Spatial Alignment

The model is capable of detecting multiple objects and maintaining the spatial arrangement as defined by the sketch input.

Model Architecture

This image illustrates the architecture of the modified DETR model that incorporates sketch-guided object detection with spatial awareness. The key modification is the introduction of a query canvas for users to draw multiple instances, which enables the model to detect objects based on spatial alignment.
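To illustrate the idea of a query canvas, here is a minimal NumPy sketch that places several individually drawn sketches on one blank canvas at chosen offsets, so that their relative positions encode the intended spatial layout. The function `compose_query_canvas`, the canvas size, and the white-background convention are assumptions for illustration; the repository's canvas handling may differ.

```python
import numpy as np


def compose_query_canvas(sketches, positions, canvas_hw=(256, 256)):
    """Hypothetical helper: paste sketches onto one blank canvas at (top, left) pixel offsets.

    sketches: list of 2-D uint8 arrays (white background = 255, darker strokes).
    positions: list of (top, left) offsets; their relative placement encodes the spatial layout.
    Where sketches overlap, the darker pixel is kept so strokes from both survive.
    """
    canvas = np.full(canvas_hw, 255, dtype=np.uint8)
    for sketch, (top, left) in zip(sketches, positions):
        h, w = sketch.shape
        if top < 0 or left < 0 or top + h > canvas_hw[0] or left + w > canvas_hw[1]:
            raise ValueError(f"sketch of size {(h, w)} at {(top, left)} does not fit the canvas")
        region = canvas[top:top + h, left:left + w]  # a view into the canvas
        np.minimum(region, sketch, out=region)       # writes the darker pixels in place
    return canvas
```

For example, placing a person sketch at offset (40, 20) and a dog sketch at (120, 160) would express "dog to the right of a person".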

Citation

If you find this work useful, please cite the following:

```
@thesis{Aricatt_Song_Chowdhury_2023,
  title  = {Sketch Query Guided Object Detection},
  author = {Aricatt, Deep Wilson and Song, Yi-Zhe and Chowdhury, Pinaki Nath},
  year   = {2023}
}
```

License

This project is licensed under the MIT License. See the LICENSE file for more information.
