Visit the project page to find the download links for our annotations.
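The annotation files are JSON Lines (`.jsonl`) files, as the `vidlns` folder in the recommended layout shows. Below is a minimal loader sketch. It assumes only the standard JSON Lines convention of one JSON object per line, because this page does not describe the record fields. `iter_jsonl` is a hypothetical helper and not part of the repository.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the file and line number to make broken downloads easy to spot.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err


# Hypothetical usage (the record fields are not specified here):
# records = list(iter_jsonl("video-localized-narratives/data/vidlns/OVIS_train.jsonl"))
# print(len(records), sorted(records[0].keys()))
```

Streaming the file line by line avoids loading a large annotation file into memory all at once.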

Note that we do not provide the raw videos or frames. You have to download them by following the instructions of the original datasets:

OVIS
UVO (v1.0)
Oops

You can use the script provided with the UVO dataset to extract frames from the video files.

We recommend the following directory structure for the data (only partially shown):

video-localized-narratives/data/
├── vidlns
│   ├── OVIS_train.jsonl
│   ├── UVO_sparse_train.jsonl
│   ├── UVO_sparse_val.jsonl
│   ├── UVO_dense_train.jsonl
│   ├── UVO_dense_val.jsonl
│   ├── oops_train.jsonl
│   ├── oops_val.jsonl
├── frames
│   ├── OVIS_train
│   │   │── 5d24e4ea
│   │   │   │── img_0000001.jpg
│   │   │   │── ...
│   │   │── ...
│   ├── UVO_sparse_train
│   │   │── zxE180Fndow
│   │   │   │── 0.png
│   │   │   │── ...
│   │   │   │── 299.png
│   ├── UVO_sparse_val
│   │   │── ...
│   ├── oops_train
│   │   │── Your Tooth Is Missing - Best Fails of the Week (November 2017) _ FailArmy9
│   │   │   │── 000000.png
│   │   │   │── ...
│   │   │   │── 000249.png
│   │   │── ...
│   ├── oops_val
│   │   │── ...
├── recordings (optional)
│   ├── OVIS_train
│   │   │── 0_0.webm
│   │   │── 0_1.webm
│   │   │── ...
│   ├── UVO_sparse_train
│   │   │── 0_0.webm
│   │   │── ...
├── videos (optional)
│   ├── OVIS_train
│   │   │── 5d24e4ea.mp4
│   │   │── ...
│   ├── ...
├── vng (optional)
│   ├── OVIS_VNG
│   │   │── extra_masks
│   │   │   │── train
│   │   │   │   │── extra_masks.json
│   │   │   │── test
│   │   │   │   │── extra_masks.json
│   │   │── meta_expressions
│   │   │   │── train
│   │   │   │   │── meta_expressions.json
│   │   │   │── test
│   │   │   │   │── meta_expressions.json
│   │   │── orig_masks
│   │   │   │── train
│   │   │   │   │── annotations_train.json (download this from OVIS)
│   │   │   │── test
│   │   │   │   │── annotations_train.json (download this from OVIS; the train annotations are also used for the test set)
│   ├── UVO_VNG
│   │   │── extra_masks
│   │   │   │── train
│   │   │   │   │── extra_masks.json
│   │   │   │── test
│   │   │   │   │── extra_masks.json
│   │   │── meta_expressions
│   │   │   │── train
│   │   │   │   │── meta_expressions.json
│   │   │   │── test
│   │   │   │   │── meta_expressions.json
│   │   │── orig_masks
│   │   │   │── train
│   │   │   │   │── UVO_sparse_train_video.json (download this from UVO)
│   │   │   │── val
│   │   │   │   │── UVO_sparse_val_video.json (download this from UVO)
├── videoqa (optional)
│   ├── text_output
│   ├── location_output

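The frame folders in this layout use three naming schemes. OVIS uses `img_0000001.jpg`, UVO uses `0.png` to `299.png`, and Oops uses `000000.png`. The OVIS and Oops names are zero-padded, but the UVO names are not. A plain lexicographic sort would therefore put `10.png` before `2.png`. The sketch below orders frames by the integer in the file name. It assumes every frame file name ends in its frame number. `sorted_frames` is a hypothetical helper and not part of the repository.

```python
import re
from pathlib import Path

# Matches the trailing frame number of all three naming schemes in the layout:
# "img_0000001.jpg" (OVIS), "0.png" (UVO), "000000.png" (Oops).
_FRAME_INDEX = re.compile(r"(\d+)\.(?:jpg|jpeg|png)$", re.IGNORECASE)


def sorted_frames(video_dir):
    """Return the frame image paths in video_dir, ordered by frame number."""
    frames = []
    for p in Path(video_dir).iterdir():
        m = _FRAME_INDEX.search(p.name)
        if p.is_file() and m:  # skip anything that is not a numbered image file
            frames.append((int(m.group(1)), p))
    frames.sort(key=lambda item: item[0])
    return [p for _, p in frames]


# Hypothetical usage:
# frames = sorted_frames("video-localized-narratives/data/frames/UVO_sparse_train/zxE180Fndow")
```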
Note that the recordings (audio files) and videos (.mp4 video files) folders are optional and mainly used for visualization. The vng folder is only necessary if you are interested in the Video Narrative Grounding task, and the videoqa folder is only necessary if you are interested in the Video Question-Answering task.
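Before training or evaluating, a quick sanity check of the data root can catch a missing folder early. The sketch below treats `vidlns` and `frames` as required, because they are the only top-level folders not marked optional in the recommended layout. It reports which of the optional folders are present. `check_data_root` and the folder-role table are hypothetical illustrations, not part of the repository.

```python
from pathlib import Path

# Assumption: vidlns/ and frames/ are required because the layout does not mark them optional.
REQUIRED = ("vidlns", "frames")
# Optional folders and what they are used for, as described above.
OPTIONAL = {
    "recordings": "audio recordings, for visualization",
    "videos": ".mp4 videos, for visualization",
    "vng": "Video Narrative Grounding",
    "videoqa": "Video Question-Answering",
}


def check_data_root(root):
    """Raise if a required folder is missing; return the optional folders found."""
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(
            f"missing required folder(s) under {root}: {', '.join(missing)}"
        )
    return [name for name in OPTIONAL if (root / name).is_dir()]


# Hypothetical usage:
# for name in check_data_root("video-localized-narratives/data"):
#     print(f"found optional folder {name!r} ({OPTIONAL[name]})")
```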
