[arXiv]
[video presentation at ICCV]
- PyTorch 1.8 or higher
- CLIP (install with `pip install git+https://github.com/openai/CLIP.git`)
- transformers (install with `pip install transformers`)
- cococaption
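
If the dependencies are set up correctly, a quick sanity check like the following should run without errors. This is a minimal sketch: the `ViT-B/16` and `distilgpt2` checkpoint names are assumptions, not necessarily the ones this repo uses.

```python
import torch
import clip
from transformers import GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a CLIP checkpoint to confirm the install works; "ViT-B/16" is an
# assumption -- check clip_model.py for the variant actually used.
clip_model, preprocess = clip.load("ViT-B/16", device=device)

# Load a GPT-2 tokenizer to confirm transformers works; "distilgpt2" is
# likewise an assumption, not necessarily this repo's checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")

print("CLIP and transformers load correctly on", device)
```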
- COCO
- MPI. Rename to `mpi`
- Flickr30K. Rename to `flickr30k`
- VCR
- ImageNet (ILSVRC2012). Rename to `ImageNet`
- Visual Genome v1.2. Rename to `VG_100K`
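
After downloading and renaming, a small check like the following can confirm the folder layout. This is a sketch only: the `images` root and the `COCO`/`VCR` folder names are assumptions, since the list above only specifies names for the renamed folders.

```python
from pathlib import Path

# Hypothetical images root -- adjust to wherever you keep the datasets.
IMAGES_ROOT = Path("images")

# Expected folder names; COCO and VCR names are assumptions,
# the rest follow the renames described above.
EXPECTED = ["COCO", "mpi", "flickr30k", "VCR", "ImageNet", "VG_100K"]

for name in EXPECTED:
    status = "ok" if (IMAGES_ROOT / name).is_dir() else "MISSING"
    print(f"{name:12s} {status}")
```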
The training and test data (combined for all datasets) can be found here.
The annotations in the format that cococaption expects can be found here. Please place them inside the cococaption folder.
- `train_nlx.py`: script for training only
- `test_datasets.py`: script for validation/testing of all epochs on all 7 NLE tasks
- `clip_model.py`: script for the vision backbone we use (the CLIP visual encoder)
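
As a rough illustration of what the CLIP visual encoder provides, the sketch below extracts image features with the `clip` package. The `ViT-B/16` variant and the `example.jpg` path are assumptions; `clip_model.py` defines the backbone actually used.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# "ViT-B/16" is an assumption; see clip_model.py for the exact variant.
model, preprocess = clip.load("ViT-B/16", device=device)

# Preprocess a sample image and extract its visual embedding.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    features = model.encode_image(image)

print(features.shape)  # e.g. torch.Size([1, 512]) for ViT-B/16
```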