
Visit the project page to find the download links for our annotations.
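The annotation files in the recommended layout are JSON Lines files, with one JSON object per line (e.g. `vidlns/OVIS_train.jsonl`). Below is a minimal sketch for reading one using only the Python standard library. `read_jsonl` is a hypothetical helper, not part of this project. Because the record schema is not described here, the helper makes no assumptions about field names.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str | Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file.

    Hypothetical helper: it only parses JSON Lines and does not assume
    any particular annotation schema.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending file and 1-based line number.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err
```

For example, `next(read_jsonl("video-localized-narratives/data/vidlns/OVIS_train.jsonl"))` returns the first record as a dict. Reading lazily like this avoids loading a whole file into memory at once.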

Note that we do not provide the raw videos or frames. You have to download them by following the instructions of the original datasets (OVIS, UVO, and Oops).

You can use this script, provided with the UVO dataset, to extract frames from the video files.
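If you extract frames yourself, it helps to save them under the file names shown in the layout below, because code that reads frames will presumably expect those names. The sketch below maps a 0-based frame index to a path. The naming patterns are inferred only from the example file names in the tree: `img_0000001.jpg` for OVIS (assumed to start at 1), `0.png` to `299.png` for UVO, and `000000.png` for Oops. `frame_path` is a hypothetical helper and is not part of this project or of the UVO script.

```python
from pathlib import Path

# Hypothetical naming table, inferred from the example layout only:
# dataset prefix -> (filename pattern, offset added to a 0-based index).
_FRAME_PATTERNS = {
    "OVIS": ("img_{:07d}.jpg", 1),  # img_0000001.jpg, ... (assumed 1-based)
    "UVO": ("{:d}.png", 0),         # 0.png ... 299.png (0-based, unpadded)
    "oops": ("{:06d}.png", 0),      # 000000.png ... 000249.png (0-based)
}


def frame_path(root, split, video_id, frame_idx):
    """Return root/frames/<split>/<video_id>/<filename> for a 0-based frame index.

    The dataset is taken from the split prefix, e.g. "UVO_sparse_train" -> "UVO".
    """
    dataset = split.split("_", 1)[0]
    try:
        pattern, offset = _FRAME_PATTERNS[dataset]
    except KeyError:
        raise ValueError(f"unknown dataset prefix in split {split!r}") from None
    return Path(root) / "frames" / split / video_id / pattern.format(frame_idx + offset)
```

For example, `frame_path("video-localized-narratives/data", "UVO_sparse_train", "zxE180Fndow", 299)` points to `.../frames/UVO_sparse_train/zxE180Fndow/299.png`. If your extraction tool numbers frames differently, adjust the table rather than renaming files by hand.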

We recommend the following structure for the data (partial listing):

video-localized-narratives/data/
├── vidlns
│   ├── OVIS_train.jsonl
│   ├── UVO_sparse_train.jsonl
│   ├── UVO_sparse_val.jsonl
│   ├── UVO_dense_train.jsonl
│   ├── UVO_dense_val.jsonl
│   ├── oops_train.jsonl
│   ├── oops_val.jsonl
├── frames
│   ├── OVIS_train
│   │   │── 5d24e4ea
│   │   │   │── img_0000001.jpg
│   │   │   │── ...
│   │   │── ...
│   ├── UVO_sparse_train
│   │   │── zxE180Fndow
│   │   │   │── 0.png
│   │   │   │── ...
│   │   │   │── 299.png
│   ├── UVO_sparse_val
│   │   │── ...
│   ├── oops_train
│   │   │── Your Tooth Is Missing - Best Fails of the Week (November 2017) _ FailArmy9
│   │   │   │── 000000.png
│   │   │   │── ...
│   │   │   │── 000249.png
│   │   │── ...
│   ├── oops_val
│   │   │── ...
├── recordings (optional)
│   ├── OVIS_train
│   │   │── 0_0.webm
│   │   │── 0_1.webm
│   │   │── ...
│   ├── UVO_sparse_train
│   │   │── 0_0.webm
│   │   │── ...
├── videos (optional)
│   ├── OVIS_train
│   │   │── 5d24e4ea.mp4
│   │   │── ...
│   ├── ...
├── vng (optional)
│   ├── OVIS_VNG
│   │   │── extra_masks
│   │   │   │── train
│   │   │   │   │── extra_masks.json
│   │   │   │── test
│   │   │   │   │── extra_masks.json
│   │   │── meta_expressions
│   │   │   │── train
│   │   │   │   │── meta_expressions.json
│   │   │   │── test
│   │   │   │   │── meta_expressions.json
│   │   │── orig_masks
│   │   │   │── train
│   │   │   │   │── annotations_train.json (download this from OVIS)
│   │   │   │── test
│   │   │   │   │── annotations_train.json (download this from OVIS; the train annotations are also used for the test set)
│   ├── UVO_VNG
│   │   │── extra_masks
│   │   │   │── train
│   │   │   │   │── extra_masks.json
│   │   │   │── test
│   │   │   │   │── extra_masks.json
│   │   │── meta_expressions
│   │   │   │── train
│   │   │   │   │── meta_expressions.json
│   │   │   │── test
│   │   │   │   │── meta_expressions.json
│   │   │── orig_masks
│   │   │   │── train
│   │   │   │   │── UVO_sparse_train_video.json (download this from UVO)
│   │   │   │── val
│   │   │   │   │── UVO_sparse_val_video.json (download this from UVO)
├── videoqa (optional)
│   ├── text_output
│   ├── location_output
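The `orig_masks` files in the `vng` part of the tree are downloaded separately. For OVIS, the same `annotations_train.json` goes into both `train/` and `test/`. Below is a minimal sketch that copies downloaded files into place. The target table is read directly off the tree above. `place_orig_masks` and its `downloads` argument (original filename mapped to where you saved it) are hypothetical and not part of this project.

```python
import shutil
from pathlib import Path

# Target locations read off the vng/ part of the recommended layout:
# (benchmark folder, split folder) -> expected filename.
ORIG_MASK_TARGETS = {
    ("OVIS_VNG", "train"): "annotations_train.json",
    ("OVIS_VNG", "test"): "annotations_train.json",  # OVIS train annotations reused for test
    ("UVO_VNG", "train"): "UVO_sparse_train_video.json",
    ("UVO_VNG", "val"): "UVO_sparse_val_video.json",
}


def place_orig_masks(downloads, data_root):
    """Copy downloaded annotation files into vng/<benchmark>/orig_masks/<split>/.

    `downloads` maps an original filename (e.g. "annotations_train.json") to
    the local path where it was saved. Returns the destination paths written.
    """
    written = []
    for (benchmark, split), filename in ORIG_MASK_TARGETS.items():
        src = downloads.get(filename)
        if src is None:
            continue  # this file was not downloaded; skip its target(s)
        dst = Path(data_root) / "vng" / benchmark / "orig_masks" / split / filename
        dst.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

Copying rather than symlinking keeps the sketch portable across operating systems. A symlink would save disk space if that matters for you.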

Note that the recordings (audio files) and videos (.mp4 video files) folders are optional and mainly used for visualization. The vng folder is only necessary if you are interested in the Video Narrative Grounding task, and the videoqa folder is only necessary if you are interested in the Video Question-Answering task.
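Following the note above, a quick sanity check can report which top-level folders are present before you run anything. This minimal sketch treats `vidlns/` and `frames/` as required. That is our reading, since only the other four folders are described as optional. `check_layout` is a hypothetical helper, not part of this project.

```python
from pathlib import Path

# Our reading of the note above: vidlns/ and frames/ are always needed;
# recordings/ and videos/ are for visualization, vng/ is for Video Narrative
# Grounding, and videoqa/ is for Video Question-Answering.
REQUIRED = ("vidlns", "frames")
OPTIONAL = ("recordings", "videos", "vng", "videoqa")


def check_layout(root):
    """Return {folder name: exists?} for the top-level data folders under root.

    Raises FileNotFoundError if any required folder is missing.
    """
    root = Path(root)
    status = {name: (root / name).is_dir() for name in REQUIRED + OPTIONAL}
    missing = [name for name in REQUIRED if not status[name]]
    if missing:
        raise FileNotFoundError(
            f"missing required folder(s) under {root}: {', '.join(missing)}"
        )
    return status
```

For example, `check_layout("video-localized-narratives/data")` returns a dict such as `{"vidlns": True, "frames": True, "vng": False, ...}`. You can then decide whether the optional parts you need are in place.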