LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

This repository contains PyTorch evaluation code, training code and pretrained models for LeViT.

LeViT models obtain competitive tradeoffs between speed and precision.

For details see LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference by Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou and Matthijs Douze.

If you use this code for a paper please cite:

@InProceedings{Graham_2021_ICCV,
    author    = {Graham, Benjamin and El-Nouby, Alaaeldin and Touvron, Hugo and Stock, Pierre and Joulin, Armand and Jegou, Herve and Douze, Matthijs},
    title     = {LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {12259-12269}
}

Model Zoo

We provide baseline LeViT models trained with distillation on ImageNet 2012.

name	acc@1	acc@5	#FLOPs	#params	url
LeViT-128S	76.6	92.9	305M	7.8M	model
LeViT-128	78.6	94.0	406M	9.2M	model
LeViT-192	80.0	94.7	658M	11M	model
LeViT-256	81.6	95.4	1120M	19M	model
LeViT-384	82.6	96.0	2353M	39M	model
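As a rough illustration of the tradeoff in the table above, the sketch below picks the most accurate model that fits a given FLOP budget. The `MODEL_ZOO` list and the `best_under_budget` helper are hypothetical names introduced here, not part of the repository, and FLOPs are only a proxy for measured inference speed.

```python
# (name, top-1 accuracy in %, FLOPs in millions), copied from the Model Zoo table.
# MODEL_ZOO and best_under_budget are illustrative names, not repository code.
MODEL_ZOO = [
    ("LeViT-128S", 76.6, 305),
    ("LeViT-128", 78.6, 406),
    ("LeViT-192", 80.0, 658),
    ("LeViT-256", 81.6, 1120),
    ("LeViT-384", 82.6, 2353),
]


def best_under_budget(max_mflops):
    """Return the name of the most accurate model within the FLOP budget.

    Returns None when no model in the table fits the budget.
    """
    candidates = [m for m in MODEL_ZOO if m[2] <= max_mflops]
    if not candidates:
        return None
    # Highest top-1 accuracy among the models that fit.
    return max(candidates, key=lambda m: m[1])[0]


if __name__ == "__main__":
    print(best_under_budget(700))
```

Actual latency depends on hardware and batch size, so a budget expressed in FLOPs should be treated as a first filter only.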

Usage

First, clone the repository locally:

git clone https://github.com/facebookresearch/levit.git

Then, install PyTorch 1.7.0+, torchvision 0.8.1+ and pytorch-image-models (timm):

conda install -c pytorch pytorch torchvision
pip install timm

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data are expected to be in the train/ and val/ folders respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
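Before launching a long run, it can help to confirm that the dataset folder follows this layout. The following is a minimal sketch using only the Python standard library; `check_imagefolder_layout` is a hypothetical helper, not part of this repository, and it only checks folder names, not image contents.

```python
from pathlib import Path


def check_imagefolder_layout(root):
    """Return the sorted class names shared by train/ and val/ under root.

    Raises ValueError if a split directory is missing, has no class
    folders, or if the two splits disagree on the set of classes.
    This is an illustrative check, not code from the repository.
    """
    root = Path(root)
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise ValueError(f"missing directory: {split_dir}")
        # ImageFolder treats each immediate subdirectory as one class.
        names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not names:
            raise ValueError(f"no class folders in {split_dir}")
        classes[split] = names
    if classes["train"] != classes["val"]:
        raise ValueError("train/ and val/ contain different class folders")
    return classes["train"]
```

For example, `check_imagefolder_layout("/path/to/imagenet")` would return the list of class folder names when both splits are consistent.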

Evaluation

To evaluate a pre-trained LeViT-256 model on the ImageNet val set on a single GPU, run:

python main.py --eval --model LeViT_256 --data-path /path/to/imagenet

This should give:

* Acc@1 81.636 Acc@5 95.424 loss 0.750
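To compare an evaluation run against the Model Zoo numbers automatically, the summary line can be parsed with a regular expression. This is a sketch that assumes the line format shown above; `parse_eval_summary` and `matches_reference` are hypothetical helpers, and the 0.1-point tolerance is an arbitrary choice.

```python
import re

# Matches the summary line printed at the end of evaluation,
# e.g. "* Acc@1 81.636 Acc@5 95.424 loss 0.750".
SUMMARY_RE = re.compile(
    r"Acc@1\s+([\d.]+)\s+Acc@5\s+([\d.]+)\s+loss\s+([\d.]+)"
)


def parse_eval_summary(line):
    """Extract top-1 accuracy, top-5 accuracy and loss from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not an evaluation summary line: {line!r}")
    acc1, acc5, loss = (float(g) for g in match.groups())
    return {"acc1": acc1, "acc5": acc5, "loss": loss}


def matches_reference(summary, ref_acc1, ref_acc5, tol=0.1):
    """Check both accuracies against reference values within tol points."""
    return (
        abs(summary["acc1"] - ref_acc1) <= tol
        and abs(summary["acc5"] - ref_acc5) <= tol
    )


if __name__ == "__main__":
    result = parse_eval_summary("* Acc@1 81.636 Acc@5 95.424 loss 0.750")
    # Reference values for LeViT-256 taken from the Model Zoo table.
    print(matches_reference(result, ref_acc1=81.6, ref_acc5=95.4))
```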

Training

To train LeViT-256 on ImageNet with hard distillation on a single node with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model LeViT_256 --data-path /path/to/imagenet --output_dir /path/to/save

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit

To train a LeViT-256 model on ImageNet on one node with 8 GPUs, run:

python run_with_submitit.py --model LeViT_256 --data-path /path/to/imagenet

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.
