Scribble-Guided Diffusion for
Training-free Text-to-Image Generation

(*Equal contribution)

arXiv Project page

This is the official implementation of Scribble-Guided-Diffusion.


Abstract

Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent. Existing approaches using textual inputs combined with bounding boxes or region masks fall short in providing precise spatial guidance, often leading to misaligned or unintended object orientation. To address these limitations, we propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that utilizes simple user-provided scribbles as visual prompts to guide image generation. However, incorporating scribbles into diffusion models presents challenges due to their sparse and thin nature, making it difficult to ensure accurate orientation alignment. To overcome these challenges, we introduce moment alignment and scribble propagation, which allow for more effective and flexible alignment between generated images and scribble inputs. Experimental results on the PASCAL-Scribble dataset demonstrate significant improvements in spatial control and consistency, showcasing the effectiveness of scribble-based guidance in diffusion models. Please check the paper here: Scribble-Guided Diffusion for Training-free Text-to-Image Generation


News & Updates

  • [TBA] ✨ User-friendly scribble drawing tool will be released soon.

  • [TBA] ✨ Huggingface-based code will be released soon.

  • [2024/09/13] 🌟 LDM-based code was released.


Architecture

[Figure: architecture diagram]


Setup

First, create and activate a new conda environment:

conda create --name highlight-guided python==3.8.0
conda activate highlight-guided
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

Next, install the necessary dependencies:

pip install -r environments/requirements_all.txt
# if this does not work, try the following
pip install -r environments/requirements.txt

Install additional libraries:

pip install git+https://github.com/CompVis/taming-transformers.git
pip install git+https://github.com/openai/CLIP.git

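Download the GLIGEN model trained with box-grounding tokens and text, and place it in checkpoints/gligen. The inference command in this README expects the weights at checkpoints/gligen/text-box/diffusion_pytorch_model.bin.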
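Before running inference, it can help to confirm the checkpoint landed where the command expects it. The snippet below is a minimal sketch, not part of this repository: `checkpoint_ready` is a hypothetical helper, and the default path is simply the one used in the inference command in this README.

```python
from pathlib import Path

# Path copied from the inference command in this README.
DEFAULT_CKPT = Path("checkpoints/gligen/text-box/diffusion_pytorch_model.bin")


def checkpoint_ready(path=DEFAULT_CKPT):
    """Return True if the checkpoint file exists and is non-empty.

    Hypothetical helper for illustration; it does not validate the
    contents of the file, only that something was downloaded there.
    """
    path = Path(path)
    return path.is_file() and path.stat().st_size > 0
```

Calling `checkpoint_ready()` from the repository root returns `False` when the download is missing or was saved under a different folder name.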

Inference

To create scribbles for guidance:

python draw_scribble.py

Detailed instructions for drawing and saving scribbles will be added later.

After drawing the scribbles, save the images in a */strokes directory, for example:

examples/example1/strokes

Ensure the directory structure matches the paths in the configuration file. For instance, in configs/config.json:

"stroke_dir": "examples/example1/strokes",
"save_scribble_dir": "examples/example1/scribbles",
"save_mask_dir": "examples/example1/masks",
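A mismatch between these paths and the folders on disk is easy to miss, so a quick check before inference may save a failed run. The following is a minimal sketch under stated assumptions: `prepare_dirs` is a hypothetical helper (not part of this repository), the three keys are assumed to sit at the top level of config.json, and strokes are assumed to be PNG or JPEG files.

```python
import json
from pathlib import Path

# Keys taken from the config.json fragment above.
DIR_KEYS = ("stroke_dir", "save_scribble_dir", "save_mask_dir")
# Assumption: stroke images use one of these extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def prepare_dirs(config_path):
    """Check the stroke directory and create the output directories.

    Returns the sorted list of stroke image paths found in stroke_dir.
    """
    config = json.loads(Path(config_path).read_text())

    missing = [key for key in DIR_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")

    stroke_dir = Path(config["stroke_dir"])
    if not stroke_dir.is_dir():
        raise FileNotFoundError(f"stroke_dir does not exist: {stroke_dir}")

    # Output folders are created on demand; existing ones are left untouched.
    for key in ("save_scribble_dir", "save_mask_dir"):
        Path(config[key]).mkdir(parents=True, exist_ok=True)

    return sorted(
        p for p in stroke_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Running `prepare_dirs("configs/config.json")` from the repository root lists the stroke images that will be picked up; an empty list suggests the scribbles were saved somewhere else.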

To run with user-provided text prompts:

python inference.py --ckpt checkpoints/gligen/text-box/diffusion_pytorch_model.bin

To use the default configuration file:

python inference.py --config configs/config.json

Scribble Tool

We will provide a more user-friendly and intuitive scribble drawing tool in the future.


Acknowledgments

This project is built on the following resources:

  • Attention Refocusing: This is the baseline model we used in our paper.

  • GLIGEN: Our code is built upon the foundational work provided by GLIGEN.


Related Works

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Dense Text-to-Image Generation with Attention Modulation