
Super-CLEVR-3D Dataset

Note: This dataset is published as part of the NeurIPS'23 paper "3D-Aware Visual Question Answering about Parts, Poses and Occlusions". Please refer to the paper for a more detailed explanation and the motivation behind this dataset. The full project for this paper can be found at 3D-Aware-VQA.

About

Super-CLEVR-3D is a visual question answering (VQA) dataset in which the questions ask about the explicit 3D configuration of the objects in the images (i.e. 3D poses, parts, and occlusion). It consists of objects from 5 categories: aeroplanes, buses, bicycles, cars, and motorbikes. The rendered objects come from the CGPart dataset, with the same settings as the Super-CLEVR dataset.

How to Download

| Name | Download Link | Description |
| --- | --- | --- |
| Images | images.zip | 30k images in total. The first 20k are used for training, the next 5k for validation, and the last 5k for testing. |
| Annotations | scenes.json | The corresponding annotations for the objects in each scene. |
| Questions | questions.zip | Consists of 4 question files: questions/superclevr_questions_obj_occlusion.json, questions/superclevr_questions_occlusion.json, questions/superclevr_questions_parts.json, questions/superclevr_questions_pose.json. |
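
If you need to reproduce the 20k/5k/5k split described above, a minimal Python sketch is shown below. It only partitions 0-based image indices by position; the filename pattern and on-disk layout are not specified here and would need to match your unzipped data.

# Minimal sketch of the 20k / 5k / 5k split (first 20k train, next 5k val, last 5k test).
NUM_IMAGES = 30_000
TRAIN_END, VAL_END = 20_000, 25_000

def split_of(index: int) -> str:
    """Return the split name for a 0-based image index."""
    if index < TRAIN_END:
        return "train"
    if index < VAL_END:
        return "val"
    return "test"

splits = {"train": [], "val": [], "test": []}
for i in range(NUM_IMAGES):
    splits[split_of(i)].append(i)

print({k: len(v) for k, v in splits.items()})  # {'train': 20000, 'val': 5000, 'test': 5000}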

Inspect the dataset

This notebook shows how to load the questions and images after you download the data.
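
For a quick look without the notebook, the sketch below loads one of the question files and the image it refers to. The top-level "questions" key and the "question"/"answer"/"image_filename" fields follow the usual CLEVR-style layout and are assumptions here, as is the images/ directory name; check the notebook for the exact format.

import json
from PIL import Image  # pip install pillow

QUESTIONS_FILE = "questions/superclevr_questions_pose.json"  # any of the 4 question files
IMAGE_DIR = "images"                                         # hypothetical unzip location

with open(QUESTIONS_FILE) as f:
    data = json.load(f)

# CLEVR-style question files keep the list under a top-level "questions" key.
questions = data["questions"] if isinstance(data, dict) else data

q = questions[0]
print(q.get("question"), "->", q.get("answer"))

# Questions typically reference their image by filename.
if q.get("image_filename"):
    img = Image.open(f"{IMAGE_DIR}/{q['image_filename']}")
    print(img.size)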

How to generate the data yourself

1. Generate Images and Scene Annotations

The script for image generation is scripts/render_images_3D.sh. Please read this documentation for more instructions.

2. Generate Questions

As introduced in the paper, we include three types of questions: pose, parts and occlusion.

  1. Pose questions
cd question_generation

START_IDX=0
python generate_questions_pose.py \
   --input_scene_file ../output/ver_mask_new/superCLEVR_scenes_210k_occlusion.json \
   --scene_start_idx ${START_IDX} \
   --num_scenes 20000 \
   --instances_per_template 1 \
   --templates_per_image 10 \
   --metadata_file metadata_pose.json \
   --output_questions_file ../output/superclevr_questions_pose_no_red_2.json \
   --template_dir super_clevr_pose \
   --remove_redundant 1.0
  2. Occlusion questions
cd question_generation

START_IDX=0
python generate_questions.py \
   --input_scene_file ../output/ver_mask_new/superCLEVR_scenes_210k_occlusion.json \
   --scene_start_idx ${START_IDX} \
   --num_scenes 21000 \
   --instances_per_template 1 \
   --templates_per_image 10 \
   --metadata_file metadata_part_occlusion.json \
   --output_questions_file ../output/superclevr_questions_occlusion_210k.json \
   --template_dir super_clevr_occlusion_new \
   --remove_redundant 1.0

Set --template_dir super_clevr_occlusion_new to generate occlusion questions with parts, or --template_dir super_clevr_object_occlusion to generate occlusion questions without parts.
