This repository releases the curated **ACQUIRED** dataset for counterfactual question answering on real-life videos. For more details, please check out our EMNLP 2023 paper.
**ACQUIRED** consists of 3.9K annotated videos, encompassing a wide range of event types and incorporating both first- and third-person viewpoints, which ensures a focus on real-world diversity. Each video is annotated with questions spanning three distinct dimensions of reasoning: *physical*, *social*, and *temporal*. Together these comprehensively evaluate a model's counterfactual reasoning abilities along multiple aspects.
Please download the zip file from this Google Drive link and unzip it directly here. The unzipped folder structure should look like below:
```
acquired_dataset
├── ego4d
│   ├── 002d2729-df71-438d-8396-5895b349e8fd
│   ├── 01db7c39-a512-4bac-b284-dff8c7360e80
│   └── ...
└── oopsqa
```
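As a quick sanity check after unzipping, you can count the entries in each domain folder (a minimal sketch; it assumes the archive was extracted into the current working directory under its default name):

```python
import os

root = "acquired_dataset"
for domain in sorted(os.listdir(root)):  # e.g., "ego4d", "oopsqa"
    count = len(os.listdir(os.path.join(root, domain)))
    print(f"{domain}: {count} entries")
```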
The main splits of the dataset are prepared as `.json` files under the folder `Dataset`, which contains `train.json`, `val.json`, and `test.json`, the official splits used in the above paper.
Please follow the instructions in `Demo.ipynb` to visualize the data samples and inspect the structure of the dataset in more depth.
Generally, each data point in a `{split}.json` file under the folder `Dataset` will have fields like below (as an entry of a list of `dict`s):
```json
{
    "video_id": ...,
    "domain": ...,
    "type": "Counterfactual",
    "question": ...,
    "answer1": ...,
    "answer2": ...,
    "correct_answer_key": "answer{1/2}",
    "video_url": "url_of_the_video.mp4",
    "video_path": "path/to/video.mp4"
},
```
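For example, here is a minimal sketch for loading a split and reading one entry (assuming the default `Dataset/` layout described above):

```python
import json

# Load the official training split (a list of dicts, one per question).
with open("Dataset/train.json", "r") as f:
    train_data = json.load(f)

sample = train_data[0]
print(sample["question"])
# `correct_answer_key` tells you which of the two answers is correct.
print(sample[sample["correct_answer_key"]])
```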
For the correspondence between each data point and its video, please refer to the field `video_path`, which is the relative path under the root directory of the downloaded folder (i.e., `acquired_dataset` in this example, if you did not change the downloaded folder name).
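For instance, a minimal sketch for resolving a sample's video file on disk (again assuming the unzipped `acquired_dataset` folder sits in the current working directory under its default name):

```python
import json
import os

VIDEO_ROOT = "acquired_dataset"  # default name of the unzipped folder

with open("Dataset/train.json", "r") as f:
    sample = json.load(f)[0]

# `video_path` is stored relative to the dataset root.
abs_video_path = os.path.join(VIDEO_ROOT, sample["video_path"])
print(abs_video_path, os.path.exists(abs_video_path))
```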
If you find our curated resource useful, please cite our paper using:
```bibtex
@inproceedings{wu2023acquired,
    title = {ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos},
    author = {Wu*, Te-Lin and Dou*, Zi-Yi and Hu*, Qingyuan and Hou, Yu and Chandra, Nischal Reddy and Freedman, Marjorie and Weischedel, Ralph and Peng, Nanyun},
    booktitle = {The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2023}
}
```