This repository contains PyTorch evaluation code, training code and pretrained models for LeViT.
The models offer competitive speed/accuracy tradeoffs.
For details, see "LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference" by Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou and Matthijs Douze.
If you use this code for a paper please cite:
@InProceedings{Graham_2021_ICCV,
author = {Graham, Benjamin and El-Nouby, Alaaeldin and Touvron, Hugo and Stock, Pierre and Joulin, Armand and Jegou, Herve and Douze, Matthijs},
title = {LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {12259-12269}
}
We provide baseline LeViT models trained with distillation on ImageNet 2012.
name | acc@1 | acc@5 | #FLOPs | #params | url |
---|---|---|---|---|---|
LeViT-128S | 76.6 | 92.9 | 305M | 7.8M | model |
LeViT-128 | 78.6 | 94.0 | 406M | 9.2M | model |
LeViT-192 | 80.0 | 94.7 | 658M | 11M | model |
LeViT-256 | 81.6 | 95.4 | 1120M | 19M | model |
LeViT-384 | 82.6 | 96.0 | 2353M | 39M | model |
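The table can be read as a menu of speed/accuracy options. As a minimal sketch, the snippet below copies the table values into a list and picks the most accurate model that fits a FLOP budget. The helper `best_under_budget` is hypothetical and not part of this repository, and FLOPs are only a proxy for actual inference speed.

```python
# Values copied from the table above:
# (name, top-1 accuracy in %, FLOPs in millions, parameters in millions).
LEVIT_MODELS = [
    ("LeViT-128S", 76.6, 305, 7.8),
    ("LeViT-128", 78.6, 406, 9.2),
    ("LeViT-192", 80.0, 658, 11),
    ("LeViT-256", 81.6, 1120, 19),
    ("LeViT-384", 82.6, 2353, 39),
]


def best_under_budget(max_mflops):
    """Hypothetical helper: most accurate model within the FLOP budget, or None."""
    candidates = [m for m in LEVIT_MODELS if m[2] <= max_mflops]
    return max(candidates, key=lambda m: m[1], default=None)


print(best_under_budget(700))
```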
First, clone the repository locally:
git clone https://github.com/facebookresearch/levit.git
Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm):
conda install -c pytorch pytorch torchvision
pip install timm
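To confirm that the installed versions meet the minimums stated above, something like the following can be run after installation. This is a sketch that only uses the Python standard library (Python 3.8+). The helper names are hypothetical, and it assumes the packages expose their metadata under the distribution names `torch` and `torchvision`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Leading numeric part of a version string, e.g. '1.7.0+cu110' -> (1, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad to three components so that '1.7' compares equal to '1.7.0'.
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


# Minimum versions taken from the installation instructions above.
MINIMUMS = {"torch": "1.7.0", "torchvision": "0.8.1"}

for package, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed (need {minimum}+)")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"too old, need {minimum}+"
    print(f"{package} {installed}: {status}")
```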
Download and extract ImageNet train and val images from http://image-net.org/.
The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val/ folder respectively:
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
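Before launching a long run, it can help to check that the dataset folder matches this layout. The sketch below uses only the standard library. `check_imagenet_layout` is a hypothetical helper, not part of this repository. It also checks that train/ and val/ contain the same class folders, since ImageFolder assigns class indices from the sorted folder names of each split independently.

```python
from pathlib import Path


def class_names(split_dir):
    """Names of the class subfolders directly under a split directory."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def check_imagenet_layout(root):
    """Hypothetical helper: return a list of problems (empty if the layout looks right)."""
    root = Path(root)
    problems = []
    for split in ("train", "val"):
        if not (root / split).is_dir():
            problems.append(f"missing folder: {split}/")
    if problems:
        return problems

    train_classes = set(class_names(root / "train"))
    val_classes = set(class_names(root / "val"))
    if not train_classes:
        problems.append("train/ contains no class folders")
    if train_classes != val_classes:
        problems.append("train/ and val/ have different class folders")
    return problems
```

For example, `check_imagenet_layout("/path/to/imagenet")` returns an empty list when both splits exist and share the same class folders.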
To evaluate a pre-trained LeViT-256 model on ImageNet val with a single GPU, run:
python main.py --eval --model LeViT_256 --data-path /path/to/imagenet
This should give:
* Acc@1 81.636 Acc@5 95.424 loss 0.750
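When evaluation is scripted, the summary line can be parsed and compared with the values in the model table. The following is a sketch that assumes the line has exactly the format shown above. `parse_eval_summary` is a hypothetical helper, not part of this repository.

```python
import re

# Matches a summary line of the form "* Acc@1 81.636 Acc@5 95.424 loss 0.750".
SUMMARY_RE = re.compile(r"Acc@1\s+([\d.]+)\s+Acc@5\s+([\d.]+)\s+loss\s+([\d.]+)")


def parse_eval_summary(line):
    """Hypothetical helper: extract top-1, top-5 and loss from the summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"no evaluation summary found in: {line!r}")
    acc1, acc5, loss = (float(group) for group in match.groups())
    return {"acc1": acc1, "acc5": acc5, "loss": loss}


result = parse_eval_summary("* Acc@1 81.636 Acc@5 95.424 loss 0.750")

# The model table lists LeViT-256 at 81.6 / 95.4, rounded to one decimal.
assert abs(result["acc1"] - 81.6) < 0.05
assert abs(result["acc5"] - 95.4) < 0.05
print(result)
```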
To train LeViT-256 on ImageNet with hard distillation on a single node with 8 GPUs, run:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model LeViT_256 --data-path /path/to/imagenet --output_dir /path/to/save
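With `--use_env`, `torch.distributed.launch` passes the local rank to each worker through the `LOCAL_RANK` environment variable instead of a `--local_rank` command-line argument, alongside `RANK` and `WORLD_SIZE`. The sketch below shows how a worker process could read these values. `read_distributed_env` is a hypothetical helper, and this is not necessarily how main.py handles them.

```python
import os


def read_distributed_env(environ=os.environ):
    """Hypothetical helper: read the launcher's variables, or None if absent."""
    if "RANK" not in environ or "WORLD_SIZE" not in environ:
        return None  # not started by a distributed launcher
    return {
        "rank": int(environ["RANK"]),              # global index of this process
        "world_size": int(environ["WORLD_SIZE"]),  # total number of processes
        "local_rank": int(environ.get("LOCAL_RANK", 0)),  # index on this node
    }


print(read_distributed_env())
```

Run as a plain script with none of these variables set, this prints `None`. Under the launcher command above, each of the 8 workers would see its own rank values.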
Distributed training is available via Slurm and submitit:
pip install submitit
To train a LeViT-256 model on ImageNet on one node with 8 GPUs:
python run_with_submitit.py --model LeViT_256 --data-path /path/to/imagenet
This repository is released under the Apache 2.0 license as found in the LICENSE file.
We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.