commit 5601b77 (1 parent: 2167089)
Showing 40 changed files with 7,893 additions and 10 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
------------------------------ LICENSE for Lumos ------------------------------ | ||
|
||
Copyright (c) 2024 Ant Group. | ||
|
||
MIT License | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
<p align="center"> | ||
<img src="asset/logo.gif" height=120> | ||
</p> | ||
|
||
### <div align="center"> Learning Visual Generative Priors without Text</div> | ||
<div style="text-align: center;"> | ||
<a href="https://scholar.google.com/citations?user=dNhzCu4AAAAJ&hl=zh-CN">Shuailei Ma*</a><sup>1</sup>, | ||
<a href="https://zkcys001.github.io/">Kecheng Zheng*</a><sup>2</sup>, | ||
<a href="https://ieeexplore.ieee.org/author/37836204100">Ying Wei✉️</a><sup>1</sup>, <a href="https://weiwu-ww.github.io/">Wei Wu</a><sup>2</sup>, <a href="https://scholar.google.com/citations?user=ILpxpfwAAAAJ&hl=zh-CN">Fan Lu</a><sup>2</sup>, | ||
<a href="https://scholar.google.com/citations?hl=en&user=rQKkIykAAAAJ">Yifei Zhang</a><sup>3</sup>,<a href="https://scholar.google.com/citations?user=UHCDCRMAAAAJ&hl=en">Chen-Wei Xie</a><sup>4</sup>, | ||
<a href="https://scholar.google.com/citations?user=BwdpTiQAAAAJ&hl=zh-CN">Biao Gong</a><sup>2</sup>, | ||
<a href="https://scholar.google.com/citations?user=-ACBm-gAAAAJ&hl=zh-TW">Jiapeng Zhu</a><sup>5</sup>, | ||
<a href="https://shenyujun.github.io/">Yujun Shen✉️</a><sup>2</sup> <br> | ||
<sup>1</sup>Northeastern University, China <sup>2</sup>Ant Group <sup>3</sup>SJTU <sup>4</sup>Alibaba Group <sup>5</sup>HKUST <br> | ||
<sup>*</sup>equal contribution <sup>✉️</sup>corresponding author | ||
</div> | ||
<br> | ||
<div style="text-align: center;"> | ||
<a href=""><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:Lumos&color=red&logo=arxiv"></a>   | ||
<a href="https://github.com/xiaomabufei.github.io/lumos/"><img src="https://img.shields.io/badge/Project-Website-blue"></a>   | ||
<a href=""><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=yellow"></a>   | ||
</div> | ||
|
||
|
||
## 📝 Content | ||
* [Update Log](#📣-update-log) | ||
* [Abstract](#🪄✨-abstract) | ||
* [Setup](#️⚙️-setup) | ||
* [License](#🕊️-license) | ||
* [Citation](#📖-citation) | ||
* [Acknowledgement](#❤️-acknowledgement) | ||
|
||
|
||
## 📣 Update Log | ||
- [2024.11.21] 🎉 Here comes Lumos! We release the code and Gradio demos for Lumos-I2I and Lumos-T2I. | ||
|
||
## 🪄✨ Abstract | ||
<!-- <b>TL; DR: <font color="purple">Lumos</font> is a Transformer-based diffusion model.</b> --> | ||
|
||
<details><summary>CLICK for the full abstract</summary> | ||
Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive. | ||
We argue that grasping the cross-modality alignment is not a necessity for a sound visual generative prior, whose focus should be on texture modeling. | ||
Such a philosophy inspires us to study image-to-image (I2I) generation, where models can learn from in-the-wild images in a self-supervised manner. | ||
We first develop a pure vision-based training framework, Lumos, and confirm the feasibility and the scalability of learning I2I models. | ||
We then find that, as an upstream task of T2I, our I2I model serves as a more foundational visual prior and achieves on-par or better performance than existing T2I models while using only 1/10 of the text-image pairs for fine-tuning. | ||
We further demonstrate the superiority of I2I priors over T2I priors on some text-irrelevant visual generative tasks, like image-to-3D and image-to-video. | ||
</details> | ||
|
||
![Visualization of various downstream tasks of Lumos](asset/teaser.png) | ||
|
||
|
||
## ⚙️ Setup | ||
Follow the guide below to set up the environment. | ||
- Python >= 3.9 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html)) | ||
- [PyTorch >= 2.2.1+cu11.8](https://pytorch.org/) | ||
- A virtual environment is recommended | ||
|
||
Install the required dependencies with the following commands. | ||
|
||
1. Clone the repository | ||
``` | ||
git clone https://github.com/xiaomabufei/lumos.git | ||
cd lumos | ||
``` | ||
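2. Download the model checkpoints, then return to the repository root so the later commands resolve their relative paths | ||
``` | ||
mkdir ./checkpoints && cd ./checkpoints | ||
git lfs install | ||
git clone https://huggingface.co/Xiaomabufei/lumos | ||
cd .. | ||
``` | ||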
3. Create the conda environment | ||
``` | ||
conda create -n lumos python=3.9 -y | ||
conda activate lumos | ||
``` | ||
4. Install PyTorch with GPU support | ||
``` | ||
pip install torch==2.2.1+cu118 torchvision==0.17.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html | ||
``` | ||
5. Install the xformers version that matches your PyTorch and CUDA versions | ||
``` | ||
pip install -U xformers==0.0.25 | ||
``` | ||
6. Install the remaining dependencies | ||
``` | ||
pip install -r requirements.txt | ||
``` | ||
7. Run the Lumos image interpolation demo | ||
``` | ||
python gradio_demos/lumos_I2I.py | ||
``` | ||
8. Run the Lumos text-to-image generation demo | ||
``` | ||
python gradio_demos/lumos_T2I.py | ||
``` | ||
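
After the installation steps, a quick sanity check can catch version mismatches before launching the demos. The following is a minimal sketch, not part of the repository: the pinned versions are taken from the install commands above, while `EXPECTED`, `base_version` and `check_environment` are hypothetical helper names introduced for illustration.

```python
import sys
from importlib import metadata

# Versions copied from the install commands above; the check itself is illustrative.
EXPECTED = {"torch": "2.2.1", "torchvision": "0.17.1", "xformers": "0.0.25"}


def base_version(version: str) -> str:
    """Drop a local build tag such as '+cu118' from a version string."""
    return version.split("+", 1)[0]


def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python >= 3.9 required, found {sys.version.split()[0]}")
    for package, wanted in expected.items():
        try:
            # Reads the installed distribution metadata without importing the package.
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {wanted})")
            continue
        if base_version(found) != wanted:
            problems.append(f"{package} {found} found, expected {wanted}")
    return problems
```

Calling `check_environment()` inside the activated `lumos` environment and printing each returned entry lists anything that differs from the pinned versions. The check only compares version strings; it does not verify that a GPU is visible to PyTorch.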
If you are in mainland China, you can run `export HF_ENDPOINT=https://hf-mirror.com` to use a Hugging Face mirror, which makes it easier to download the checkpoints needed to run the system. | ||
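
The mirror setting can also be applied from Python, which may be convenient when fetching the checkpoints programmatically instead of with `git lfs`. This is a hedged sketch under two assumptions: the `huggingface_hub` package is installed (the README does not confirm it is in `requirements.txt`), and `./checkpoints/lumos` is the target directory, matching where the `git clone` step places the files. `configure_hf_mirror` and `download_checkpoints` are hypothetical helper names.

```python
import os


def configure_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Point Hugging Face downloads at a mirror unless HF_ENDPOINT is already set."""
    # setdefault keeps any value that was already exported in the shell.
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


def download_checkpoints(repo_id: str = "Xiaomabufei/lumos",
                         local_dir: str = "./checkpoints/lumos") -> str:
    """Download the checkpoint repository and return the local path."""
    # Imported here, after the environment is configured, because
    # huggingface_hub reads HF_ENDPOINT when it is first imported.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Call `configure_hf_mirror()` first and `download_checkpoints()` afterwards. If `huggingface_hub` was already imported earlier in the same process, the new endpoint may not take effect, so exporting the variable in the shell remains the simpler route.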
## 🕊️ License | ||
This repository is released under the MIT License as found in the [LICENSE](LICENSE) file. | ||
## 📖 Citation | ||
Don't forget to cite this source if it proves useful in your research! | ||
```bibtex | ||
@article{Lumos2024, | ||
title={Learning Visual Generative Priors without Text}, | ||
author={Ma, Shuailei and Zheng, Kecheng and Wei, Ying and Wu, Wei and Lu, Fan and Zhang, Yifei and Xie, Chen-Wei and Gong, Biao and Zhu, Jiapeng and Shen, Yujun}, | ||
year={2024}, | ||
eprint={arxiv}, | ||
archivePrefix={arXiv}, | ||
primaryClass={cs.CV}} | ||
``` | ||
|
||
|
||
## ❤️ Acknowledgement | ||
<!-- ## 🤗 <a name="acknowledgement"></a>Acknowledgement --> | ||
Our implementation is based on [DiT](https://github.com/facebookresearch/DiT), [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) and [DINO](https://github.com/facebookresearch/dino). We thank the authors for their remarkable contributions and released code! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
A close-up of a vibrant, fully bloomed red rose with dew drops on its petals |
Binary file added
BIN
+134 KB
...sing a round bed, earth-toned decor, and a cluttered, yet charming ambiance.jpg
Binary file removed
BIN
-1.23 MB
...of an apartment with stairs leading up to it, cozy and warm interior design.png
Binary file removed
BIN
-89.5 KB
... symbol without any words in the results , peaky blinders why hat and smoke.jpg