This is an official implementation of MiniViT, including Mini-DeiT and Mini-Swin.
[CVPR'2022] - MiniViT: Compressing Vision Transformers with Weight Multiplexing
MiniViT is a new compression framework that reduces the number of parameters in vision transformers while retaining the same performance. The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks. Specifically, we share the weights across layers and impose a transformation on the shared weights to increase diversity. Weight distillation over self-attention is also applied to transfer knowledge from large-scale ViT models to the weight-multiplexed compact models.
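To make the parameter saving from weight sharing concrete, the following is a minimal counting sketch, not the code in this repo. It assumes a standard pre-norm transformer block (fused QKV projection, output projection, two-layer MLP with ratio 4, biases everywhere) and assumes that LayerNorms and the per-layer transformation are kept separate for each layer. The names `block_params`, `stack_params`, `share_every`, and `transform_params` are hypothetical, and the transformation budget is a placeholder rather than a number taken from the paper. Patch embedding, position embedding, and the classification head are not counted.

```python
def block_params(dim: int, mlp_ratio: int = 4) -> dict:
    """Parameter counts for one standard pre-norm transformer block (with biases)."""
    hidden = mlp_ratio * dim
    attn = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)  # fused QKV + output projection
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)  # two linear layers
    norms = 2 * (2 * dim)                                 # two LayerNorms (scale and shift)
    return {"attn": attn, "mlp": mlp, "norms": norms}


def stack_params(dim: int, depth: int, share_every: int = 1,
                 transform_params: int = 0) -> int:
    """Block parameters of a `depth`-layer stack where every `share_every`
    consecutive blocks reuse one set of attention/MLP weights.

    Assumption: LayerNorms and the per-layer transformation
    (`transform_params`, a hypothetical budget) are NOT shared.
    """
    if depth % share_every != 0:
        raise ValueError("depth must be divisible by share_every")
    p = block_params(dim)
    groups = depth // share_every                    # number of distinct weight sets
    shared = groups * (p["attn"] + p["mlp"])
    per_layer = depth * (p["norms"] + transform_params)
    return shared + per_layer


if __name__ == "__main__":
    dim, depth = 768, 12  # DeiT-B-like width and depth
    baseline = stack_params(dim, depth)
    multiplexed = stack_params(dim, depth, share_every=12, transform_params=10_000)
    print(f"no sharing:        {baseline / 1e6:.1f}M block parameters")
    print(f"share all 12:      {multiplexed / 1e6:.1f}M block parameters")
    print(f"reduction factor:  {baseline / multiplexed:.1f}x")
```

Because the attention and MLP matrices dominate the count, sharing them leaves the total close to the cost of a single block plus a small per-layer overhead. The transformation applied to the shared weights is what lets each layer behave differently at that small extra cost.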
- Accurate: MiniViT reduces the size of Swin-B by 48% while achieving 1.0% better Top-1 accuracy on ImageNet.
- Small: MiniViT can compress DeiT-B (86M) to 9M (9.7x) without seriously compromising accuracy.
For evaluation, we provide the checkpoints of our models in the following table.
| Model | Params. | Input | Top-1 Acc. % | Top-5 Acc. % | Download link |
|---|---|---|---|---|---|
| Mini-DeiT-Ti | 3M | 224x224 | 73.0 | 91.6 | model, log |
| Mini-DeiT-S | 11M | 224x224 | 80.9 | 95.6 | model, log |
| Mini-DeiT-B | 44M | 224x224 | 83.2 | 96.5 | model, log |
| Mini-DeiT-B | 44M | 384x384 | 84.9 | 97.2 | model, log |
| Mini-Swin-T | 12M | 224x224 | 81.3 | 95.7 | model, log |
| Mini-Swin-S | 26M | 224x224 | 83.9 | 97.0 | model, log |
| Mini-Swin-B | 46M | 224x224 | 84.5 | 97.3 | model, log |
| Mini-Swin-B | 47M | 384x384 | 85.5 | 97.6 | model, log |
- For Mini-DeiT, please see Mini-DeiT for detailed instructions.
- For Mini-Swin, please see Mini-Swin for a quick start.
If this repo is helpful to you, please consider citing it. Thank you! :)
@InProceedings{MiniViT,
title = {MiniViT: Compressing Vision Transformers With Weight Multiplexing},
author = {Zhang, Jinnian and Peng, Houwen and Wu, Kan and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {12145-12154}
}