This is a dedicated repository for the paper:
PlotQA: Reasoning over Scientific Plots
Pritha Ganguly,
Nitesh Methani,
Mitesh Khapra,
Pratyush Kumar
The paper addresses question answering over a very specific class of images, namely scientific plots such as bar plots, line plots, and dot-line plots. This work is to be presented at WACV 2020.
PlotQA is a VQA dataset with 28.9 million question-answer pairs grounded over 224,377 plots. The plots are built from real-world data sources, and the questions are generated from crowd-sourced question templates.
Existing synthetic datasets for reasoning over plots (FigureQA, DVQA) lack variability in data labels, real-valued data, and complex reasoning questions. Consequently, the models proposed for these datasets do not fully address the challenge of reasoning over plots. In particular, they assume that the answer comes either from a small, fixed-size vocabulary or from a bounding box within the image. In practice, this assumption is unrealistic: many questions require reasoning and thus have real-valued answers that appear neither in a small, fixed-size vocabulary nor in the image. In this work, we aim to bridge this gap between existing datasets and real-world plots by introducing PlotQA. Further, 80.76% of the out-of-vocabulary (OOV) questions in PlotQA have answers that are not in a fixed vocabulary.
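To illustrate why a fixed answer vocabulary falls short, the sketch below counts how many {plot, question, answer} triplets have answers outside a given vocabulary. It is a minimal sketch under stated assumptions: the `PlotQATriplet` record, its field names, the `oov_fraction` helper, the vocabulary, and the toy questions are all hypothetical and made up for illustration. They do not reflect PlotQA's actual file format or the procedure used to compute the 80.76% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotQATriplet:
    """Hypothetical record for one {plot, question, answer} triplet."""
    plot_id: int   # hypothetical identifier of the plot image
    question: str
    answer: str


def oov_fraction(triplets, vocabulary):
    """Return the fraction of triplets whose answer is not in `vocabulary`."""
    if not triplets:
        return 0.0
    oov = sum(1 for t in triplets if t.answer not in vocabulary)
    return oov / len(triplets)


# Toy, made-up data (NOT taken from PlotQA).
vocab = {"yes", "no", "2010", "Brazil"}
toy = [
    PlotQATriplet(0, "Is the value of Brazil in 2010 greater than 5?", "yes"),
    PlotQATriplet(0, "What is the average value of Brazil across years?", "5.37"),
    PlotQATriplet(1, "In which year is the value the highest?", "2010"),
    PlotQATriplet(1, "What is the difference between the max and min values?", "12.48"),
]

print(f"{oov_fraction(toy, vocab):.2%}")  # 50.00%
```

Here, the two questions that require arithmetic over the plotted values produce real-valued answers ("5.37", "12.48") that no small vocabulary would contain. Answer-classification models built around a fixed answer set cannot produce such answers.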
A few examples of the {plot, question, answer} triplets from the PlotQA dataset are given below:
To download the dataset, click here.
To check the code for our proposed pipeline, click here.
All the datasets created as part of this work are released under a CC-BY-4.0 license and all models & code are released under an MIT license.
Please cite the following if you use the PlotQA dataset in your work:
@InProceedings{Methani_2020_WACV,
author = {Methani, Nitesh and Ganguly, Pritha and Khapra, Mitesh M. and Kumar, Pratyush},
title = {PlotQA: Reasoning over Scientific Plots},
booktitle = {The IEEE Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}
If you have any questions, suggestions, or comments about the dataset or the paper, feel free to contact us: Nitesh Methani ([email protected]), Pritha Ganguly ([email protected]).