Yuxin Fang<sup>2,1</sup>, Quan Sun<sup>1</sup>, Xinggang Wang<sup>2</sup>, Tiejun Huang<sup>1</sup>, Xinlong Wang<sup>1</sup>, Yue Cao<sup>1</sup>
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.
With an updated plain Transformer architecture as well as extensive pre-training from an open & accessible giant CLIP vision encoder, EVA-02 demonstrates superior performance compared to prior state-of-the-art approaches across various representative vision tasks, while utilizing significantly fewer parameters and compute budgets.
Notably, using exclusively publicly accessible training data, EVA-02 with only 304M parameters achieves a phenomenal 90.0% fine-tuning top-1 accuracy on the ImageNet-1K val set. Additionally, EVA-02-CLIP reaches up to 80.4% zero-shot top-1 accuracy on ImageNet-1K, outperforming the previous largest & best open-sourced CLIP with only ~1/6 the parameters and ~1/6 the image-text training data.
We offer four EVA-02 variants in various model sizes, ranging from 6M to 304M parameters, all with impressive performance.
We hope our efforts enable a broader range of the research community to advance the field in a more efficient, affordable and equitable manner.
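To make the pre-training objective above concrete, here is a minimal PyTorch-style sketch of MIM with a frozen CLIP teacher: masked patch features predicted by the student are regressed onto the corresponding CLIP vision features. The function name, tensor shapes, and the cosine-similarity loss are illustrative assumptions, not the exact EVA-02 implementation.

```python
import torch
import torch.nn.functional as F

def mim_feature_distillation_loss(student_feats, teacher_feats, mask):
    """Regress masked student patch features onto frozen CLIP teacher features.

    student_feats: (B, N, D) patch features predicted by the student ViT
    teacher_feats: (B, N, D) patch features from a frozen CLIP vision encoder
    mask:          (B, N) boolean mask, True where the patch was masked out

    Shapes and names are illustrative; the actual EVA-02 code may differ.
    """
    # Normalize both sides so the objective is a cosine-similarity regression.
    s = F.normalize(student_feats[mask], dim=-1)
    t = F.normalize(teacher_feats[mask], dim=-1)
    # Negative cosine similarity, averaged over all masked patches.
    return -(s * t).sum(dim=-1).mean()

# Toy usage with random tensors standing in for real model outputs.
B, N, D = 2, 196, 1024
student_feats = torch.randn(B, N, D, requires_grad=True)
with torch.no_grad():
    teacher_feats = torch.randn(B, N, D)   # frozen CLIP teacher output
mask = torch.rand(B, N) < 0.4              # ~40% of patches masked out
loss = mim_feature_distillation_loss(student_feats, teacher_feats, mask)
loss.backward()
```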
- Pre-training
- Image Classification
- Object Detection & Instance Segmentation
- Semantic Segmentation
- CLIP
- If you would like to use / fine-tune EVA-02 in your project, please start with a shorter schedule & a smaller learning rate (compared with the baseline setting).
- Using EVA-02 as a feature extractor: #56.
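For reference, below is a minimal sketch of using EVA-02 as a frozen feature extractor via timm. The model name `eva02_base_patch14_224.mim_in22k` is an assumption; check the timm model registry or issue #56 for the exact checkpoints you need.

```python
import timm
import torch

# Create an EVA-02 backbone without a classification head (num_classes=0),
# so forward() returns pooled features. The model name below is illustrative;
# list available variants with timm.list_models('eva02*').
model = timm.create_model('eva02_base_patch14_224.mim_in22k',
                          pretrained=True, num_classes=0).eval()

# For real images, build the matching preprocessing with
# timm.data.create_transform(**timm.data.resolve_model_data_config(model)).
dummy = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed image batch
with torch.no_grad():
    feats = model(dummy)            # (1, feature_dim) pooled features
print(feats.shape)
```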
@article{eva02,
title={EVA-02: A Visual Representation for Neon Genesis},
author={Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
journal={Image and Vision Computing},
pages={105171},
year={2024},
publisher={Elsevier}
}
EVA-02 is built upon EVA-01, BEiT, BEiTv2, CLIP, MAE, timm, DeepSpeed, Apex, xFormers, detectron2, mmcv, mmdet, mmseg, ViT-Adapter, detrex, and rotary-embedding-torch.
For help, issues, or bug reports related to EVA-02, please open a GitHub issue with the label EVA-02. Let's build a better & stronger EVA-02 together :)
We are hiring at all levels at the BAAI Vision Team, including full-time researchers, engineers, and interns. If you are interested in working with us on foundation models, self-supervised learning, and multimodal learning, please contact Yue Cao ([email protected]) and Xinlong Wang ([email protected]).