简体中文 | English

MSR-VTT Preparation

Introduction

MSR-VTT (Microsoft Research Video to Text) is a large-scale dataset of videos paired with text descriptions. It contains 10,000 video clips from 20 categories, and each clip is annotated with 20 English sentences. We use 9,000 clips for training and 1,000 for testing. For more details, please refer to the website: MSRVTT

Download for T2VLAD

T2VLAD doc

For ease of use, we provide pre-extracted video features.

First, run the following command in the applications/T2VLAD/data directory to download the dataset:

bash download_features.sh

After downloading, the files in the data directory are organized as follows:

├── data
│   ├── MSR-VTT
│   │   ├── raw-captions.pkl
│   │   ├── train_list_jsfusion.txt
│   │   ├── val_list_jsfusion.txt
│   │   ├── aggregated_text_feats
│   │   │   ├── w2v_MSRVTT_openAIGPT.pickle
│   │   ├── mmt_feats
│   │   │   ├── features.audio.pkl
│   │   │   ├── features.face_agg.pkl
│   │   │   ├── features.flos_agg.pkl
│   │   │   ├── features.ocr.pkl
│   │   │   ├── features.rgb_agg.pkl
│   │   │   ├── features.s3d.pkl
│   │   │   ├── features.scene.pkl
│   │   │   ├── features.speech.pkl
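
A partial download is easy to miss, so it can help to check the layout before training. The following is a minimal sketch, not part of the T2VLAD code: the helper name `find_missing` and the default root `data/MSR-VTT` are assumptions for illustration, and the file names are copied verbatim from the listing above.

```python
from pathlib import Path

# Relative paths copied from the directory listing above.
EXPECTED_FILES = [
    "raw-captions.pkl",
    "train_list_jsfusion.txt",
    "val_list_jsfusion.txt",
    "aggregated_text_feats/w2v_MSRVTT_openAIGPT.pickle",
    "mmt_feats/features.audio.pkl",
    "mmt_feats/features.face_agg.pkl",
    "mmt_feats/features.flos_agg.pkl",
    "mmt_feats/features.ocr.pkl",
    "mmt_feats/features.rgb_agg.pkl",
    "mmt_feats/features.s3d.pkl",
    "mmt_feats/features.scene.pkl",
    "mmt_feats/features.speech.pkl",
]


def find_missing(root="data/MSR-VTT"):
    """Return the expected relative paths that are not regular files under root.

    `find_missing` is a hypothetical helper, and the default root assumes the
    check runs from the directory that contains `data`.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

# Example: print(find_missing()) shows an empty list when the download is complete.
```

An empty list means every file in the listing is present; any returned path points at a file to download again.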

Download for ActBERT

ActBERT doc

Download data features:

wget https://videotag.bj.bcebos.com/Data/ActBERT/msrvtt_test.lmdb.tar
wget https://videotag.bj.bcebos.com/Data/ActBERT/MSRVTT_JSFUSION_test.csv

Extract msrvtt_test.lmdb.tar:

tar -xvf msrvtt_test.lmdb.tar
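
If the `tar` command is unavailable, or it is unclear whether the archive is gzip-compressed, the extraction can also be done from Python. This is a minimal sketch using only the standard library; the function name `extract_archive` is hypothetical, and the `r:*` mode lets `tarfile` detect the compression on its own.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="."):
    """Extract a tar archive into dest and return the member names.

    `extract_archive` is a hypothetical helper for illustration.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" reads plain, gzip, bz2, or xz archives transparently.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names

# Example: extract_archive("msrvtt_test.lmdb.tar", "data/MSR-VTT")
```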

The files in the data directory are organized as follows:

├── data
│   ├── MSR-VTT
│   │   ├── MSRVTT_JSFUSION_test.csv
│   │   ├── msrvtt_test.lmdb
│   │   │   ├── data.mdb
│   │   │   ├── lock.mdb
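
To confirm that the CSV downloaded intact, a quick look at its header and row count is usually enough. The sketch below makes no claim about the column names in MSRVTT_JSFUSION_test.csv. It assumes only that the file is UTF-8 text with a header row, and `summarize_csv` is a hypothetical helper.

```python
import csv


def summarize_csv(path):
    """Return (header, number_of_data_rows) for a CSV file.

    Assumes the first row is a header; `summarize_csv` is a hypothetical helper.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # empty list if the file is empty
        n_rows = sum(1 for _ in reader)  # count the remaining rows
    return header, n_rows

# Example: header, n = summarize_csv("data/MSR-VTT/MSRVTT_JSFUSION_test.csv")
```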

Reference

  • Valentin Gabeur, Chen Sun, Karteek Alahari, and Cordelia Schmid. Multi-modal transformer for video retrieval. In ECCV, 2020.