This repository contains the code and data for our paper: Diagnosing Vision-and-Language Navigation: What Really Matters.
We cover three VLN datasets and nine agents in our study.
- data_processing/
  - process_instructions/: scripts to prepare/download instructions
  - Matterport3DSimulator/: a copy of the Matterport3D simulator, plus scripts to prepare/download R2R/RxR image features
- r2r/: data and code for experiments on Room-to-Room (R2R), for indoor VLN
- rxr/: data and code for experiments on Room-Across-Room (RxR), for indoor VLN
- touchdown/: data and code for experiments on Touchdown, for outdoor VLN
To download the code along with the bundled submodules, clone the repository recursively:

```bash
git clone --recursive https://github.com/VegB/Diagnose_VLN
```
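If the repository was cloned without `--recursive`, the bundled submodules (such as Matterport3DSimulator) will be left as empty directories. A standard git fallback (not specific to this repository) is to fetch them afterwards:

```shell
# Run inside the cloned Diagnose_VLN directory to fetch any
# submodules that were skipped during the initial clone.
git submodule update --init --recursive
```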
We describe the detailed environment setup for each model in the corresponding directory. For instance, guidance on setting up R2R-EnvDrop can be found in its subdirectory.
We thank the authors of Matterport3DSimulator, R2R-EnvDrop, FAST, Recurrent-VLN-BERT, PREVALENT_R2R, CLIP-ViL-VLN, VLN-HAMT, RCONCAT, ARC, and VLN-Transformer for sharing their code!