
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions


ant-research/DreamLIP


DreamLIP: Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
Project Page | Paper | Data

📰 News

  • [2024/11/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of COYO24M/LAION49M are released on Hugging Face.
  • [2024/08/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of CC3M/CC12M/YFCC15M are released on Hugging Face.
  • [2024/07/16] Uploaded pretrained ViT-B/16 weights trained on CC3M, CC12M, YFCC15M, and merged-30M (long captions from ShareGPT4V)!
  • [2024/07/08] DreamLIP is accepted by ECCV 2024!

💡 Highlights

  • 🔥 Exploring how language-image pre-training could benefit from long captions.
  • 🔥 Strong improvements on image-text retrieval, semantic segmentation, and image understanding in MLLMs.
  • 🔥 DreamLIP trained with 30M image-text pairs achieves performance on par with, or even better than, CLIP trained with 400M pairs.

🎨 In-Progress

  • Release long captions of CC3M, CC12M, YFCC15M, COYO24M and LAION49M.
  • Release training code.

🏝️ Overview of supported long captions:

  • Long captions of supported datasets (5): CC3M, CC12M, YFCC15M, LAION49M, COYO24M
  • Long captions from MLLMs (3): InstructBLIP, LLaVA-1.5, ShareGPT4V

Generated Long Captions

Dataset  | Hugging Face dataset
-------- | ----------------------
CC3M     | Raw/Long/Short Caption
CC12M    | Raw/Long/Short Caption
YFCC15M  | Raw/Long/Short Caption
LAION49M | Long Caption
COYO24M  | Long Caption
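CLIP-style text encoders commonly have a 77-token context, so a long caption usually cannot be encoded in one pass. A minimal sketch of one way to still use it, assuming the long caption is split into sentences on `.`, `!` or `?` and a few sentences are randomly sampled as separate text positives for the same image. The helpers `split_sentences` and `sample_subcaptions` are hypothetical names for illustration and are not taken from the DreamLIP training code.

```python
import random
import re


def split_sentences(long_caption: str) -> list[str]:
    """Split a long caption into sentences after '.', '!' or '?' (a simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", long_caption.strip())
    return [p.strip() for p in parts if p.strip()]


def sample_subcaptions(long_caption: str, k: int, rng: random.Random) -> list[str]:
    """Sample up to k distinct sentences to pair with one image as extra positives."""
    sentences = split_sentences(long_caption)
    if len(sentences) <= k:
        # Short captions are used whole; nothing to sample from.
        return sentences
    return rng.sample(sentences, k)


caption = (
    "A brown dog runs across a grassy field. "
    "A red frisbee flies above it. "
    "Trees line the background under a cloudy sky."
)
rng = random.Random(0)  # fixed seed so the sampling is reproducible
print(split_sentences(caption))
print(sample_subcaptions(caption, k=2, rng=rng))
```

Each sampled sentence fits comfortably in a short text context, and re-sampling every epoch would expose the encoder to different parts of the same long caption.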

Pretrained checkpoints

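Dataset | Model    | ShareGPT4V captions | InstructBLIP + LLaVA-1.5 + ShareGPT4V captions
------- | -------- | ------------------- | ----------------------------------------------
CC3M    | ViT-B/16 | Link                | TODO
CC12M   | ViT-B/16 | Link                | TODO
YFCC15M | ViT-B/16 | Link                | TODO
CC30M   | ViT-B/16 | Link                | TODO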

📣 Instructions

Environment installation

pip install -r requirements.txt

Evaluate zero-shot classification

bash eval_zs.sh
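For reference, CLIP-style zero-shot classification embeds one text prompt per class name and assigns each image to the class whose text embedding has the highest cosine similarity with the image embedding. The sketch below shows only that scoring step on toy vectors in plain Python; the embeddings, the prompt wording, and the `zero_shot_predict` helper are illustrative assumptions and do not reflect what `eval_zs.sh` actually runs.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a (non-zero) vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_predict(image_emb: list[float], class_text_embs: list[list[float]]):
    """Return (index of best-matching class, cosine similarity to every class)."""
    img = normalize(image_emb)
    sims = []
    for text_emb in class_text_embs:
        txt = normalize(text_emb)
        sims.append(sum(a * b for a, b in zip(img, txt)))
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims


# Toy embeddings standing in for encoder outputs of prompts like "a photo of a {class}."
class_names = ["cat", "dog", "car"]
text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.2, 0.9, 0.1]

idx, sims = zero_shot_predict(image_emb, text_embs)
print(class_names[idx])  # prints "dog"
```

Scaling the similarities by a temperature and applying softmax, as CLIP does to obtain probabilities, does not change the argmax, so it is omitted here.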

License

This project is released under the Creative Commons CC-BY-4.0 license.

📖 Citation

We open-source this library to facilitate research in the community. If you find our work useful and use the codebase in your projects, please cite it as follows:

@inproceedings{DreamLIP,
  title={DreamLIP: Language-Image Pre-training with Long Captions},
  author={Zheng, Kecheng and Zhang, Yifei and Wu, Wei and Lu, Fan and Ma, Shuailei and Jin, Xin and Chen, Wei and Shen, Yujun},
  booktitle={ECCV},
  year={2024}
}

Acknowledgements

This project is built on open_clip; thanks to its authors for the great work! We also thank InstructBLIP, ShareGPT4V, and LLaVA for their pretrained models and code.
