This folder contains the following data:
waldo_and_wenda.csv – Waldo and Wenda benchmark for HHI understanding
imsitu-hhi.txt – IDs for imSitu-HHI subset of the imSitu dataset
phhi.csv – pHHI (pseudo-labels indicating HHI) for the Who's Waldo dataset
See the sections below for instructions on using these, as well as download instructions for synthetic caption data.
The file waldo_and_wenda.csv contains metadata and ground-truth annotations for the 1,000-item Waldo and Wenda HHI understanding benchmark. The source column indicates which dataset each image comes from:
ww – Who's Waldo (300 items)
cc – Conceptual Captions (400 items, from val set)
coco – Microsoft COCO (300 items, from val2014 set)
WW images can be obtained by requesting access to the WW dataset (see its homepage for details). CC and COCO images are available via the listed URLs (CC source). We do not reproduce image files here; see each dataset for its respective licensing details and see below for the licensing of our additions.
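As a quick sanity check on the per-source counts listed above, the following minimal sketch tallies the source column. It assumes the CSV has a header row naming the source column and is UTF-8 encoded; the helper name count_by_source is hypothetical and not part of this repository.

```python
import csv
from collections import Counter


def count_by_source(csv_file):
    """Count benchmark items per value of the `source` column.

    `csv_file` is an open text-mode file object. Assumes the first row
    is a header that includes a column named `source`.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["source"] for row in reader)


# Example usage (assumes waldo_and_wenda.csv is in the working directory):
#
#     with open("waldo_and_wenda.csv", newline="", encoding="utf-8") as f:
#         print(count_by_source(f))
```

If the file matches the description above, the counts for ww, cc, and coco should sum to 1,000.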
The caption column provides ground-truth captions from the source datasets. Note that WW captions have named person entities replaced with an underscore, and COCO samples use the first reference from the original dataset as the listed caption.
The id column contains a unique identifier for each item. For those from WW and COCO, these are the original identifiers from those datasets. For items from CC, these are the first five digits of the MD5 hash of the corresponding image URL.
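To match a CC item back to its image URL, the identifier can be recomputed from the URL. The sketch below is one plausible reading of the description above, not a confirmed implementation: it assumes the URL is hashed as UTF-8 bytes and that "digits" means the leading characters of the hexadecimal digest. The function name cc_item_id is hypothetical.

```python
import hashlib


def cc_item_id(image_url: str, length: int = 5) -> str:
    """Return the first `length` characters of the MD5 hex digest of a URL.

    Assumptions (not confirmed by the dataset description): the URL is
    encoded as UTF-8 with no normalization, and the identifier is taken
    from the hexadecimal form of the digest.
    """
    digest = hashlib.md5(image_url.encode("utf-8")).hexdigest()
    return digest[:length]
```

If the recomputed values do not match the id column, the URL encoding or digest format likely differs from these assumptions.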
The file imsitu-hhi.txt lists the items from the imSitu dataset that comprise the imSitu-HHI subset as described in our paper.
The file phhi.csv contains HHI pseudo-labels for relevant items in the Who's Waldo dataset. These have been preprocessed as described in our paper, including steps to avoid overlap with the test items in Waldo and Wenda.
Alternatively, you may generate these yourself using the code in the pseudo-labeling subrepo.
For image data, please request access to Who's Waldo as described above.
You may download the synthetic caption data synthetic_captions.csv.gz (used for training the summarization model) at this link.
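The downloaded file is a gzip-compressed CSV and can be read without decompressing it to disk first. Its column layout is not documented here, so this minimal sketch only returns the first row for inspection; it assumes UTF-8 encoding and that the first row is a header, and the helper name read_header is hypothetical.

```python
import csv
import gzip


def read_header(path):
    """Return the first row of a gzip-compressed CSV file as a list of strings.

    Assumes the file is UTF-8 encoded and that its first row is a header.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


# Example usage (assumes the file has been downloaded to the working directory):
#
#     print(read_header("synthetic_captions.csv.gz"))
```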
Data from the Who's Waldo, Conceptual Captions, Microsoft COCO, and imSitu datasets are licensed according to the licensing terms of each respective dataset. We license our data contributions (ground-truth and pseudo-label annotations) under the non-commercial CC BY-NC-SA 4.0 license.