Diffusion Features (DIFT)

This repository contains the code for the paper "Emergent Correspondence from Image Diffusion".

Project Page | Paper | Colab Demo

Prerequisites

On a Linux machine, you can either set up the Python environment with the following commands:

conda env create -f environment.yml
conda activate dift

or create a new conda environment and install the packages manually using the shell commands in setup_env.sh.

Interactive Demo: Give it a Try!

We provide an interactive Jupyter notebook, demo.ipynb, that demonstrates the semantic correspondence established by DIFT, and you can try it on your own images. After loading two images, left-click on a point of interest in the source image on the left. After 1 or 2 seconds, the corresponding point in the target image is displayed as a red point on the right, together with a heatmap showing the per-pixel cosine distance computed using DIFT. Here are two examples, on a cat and a guitar:

If you don't have a local GPU, you can also use the provided Colab Demo.
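The matching step behind the demo can be illustrated with a small, dependency-free sketch: take the feature vector at the clicked source location, compute its cosine distance to every location of the target feature map, and pick the location with the smallest distance. This is not the notebook's code. The function names `cosine_distance` and `match_point`, the nested-list feature maps, and the toy values are assumptions made for illustration; a real pipeline would work on torch tensors.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def match_point(src_feats, tgt_feats, src_rc):
    """Find the target location whose feature is closest to the source feature.

    src_feats, tgt_feats: feature maps as nested lists, indexed [row][col],
        where each entry is a feature vector (hypothetical layout).
    src_rc: (row, col) of the clicked point on the source feature map.
    Returns ((row, col), heatmap); heatmap[row][col] is the cosine distance
    between the source feature and the target feature at that location.
    """
    r, c = src_rc
    query = src_feats[r][c]
    heatmap = [[cosine_distance(query, vec) for vec in row] for row in tgt_feats]
    best = min(
        ((i, j) for i, row in enumerate(heatmap) for j in range(len(row))),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    return best, heatmap

# Toy 2x2 feature maps with 2-channel features (made-up values).
src = [[[1.0, 0.0], [0.0, 1.0]],
       [[1.0, 1.0], [-1.0, 0.0]]]
tgt = [[[0.0, 2.0], [-3.0, 0.1]],
       [[2.0, 0.1], [1.0, 1.0]]]

best, heat = match_point(src, tgt, (0, 0))
print(best)  # (1, 0)
```

The returned heatmap plays the role of the per-pixel distance map shown in the demo, at feature-map resolution rather than image resolution.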

Extract DIFT for a given image

Use the following command to extract DIFT from a given image and save it as a torch tensor. By default, the arguments are set to the values used in the semantic correspondence tasks.

python extract_dift.py \
    --input_path ./assets/cat.png \
    --output_path dift_cat.pt \
    --img_size 768 768 \
    --t 261 \
    --up_ft_index 1 \
    --prompt 'a photo of a cat' \
    --ensemble_size 8

Here is an explanation of each argument:

input_path: path to the input image file.
output_path: path to save the output features as torch tensor.
img_size: the width and height to which the image is resized before being fed into the diffusion model. If set to 0, no resizing is performed and the original image size is kept. The default is [768, 768]. Decrease this if you run into memory issues.
t: the diffusion time step, an integer in the range [0, 1000]. The default is t=261, as used for semantic correspondence.
up_ft_index: the index of the U-Net upsampling block from which the feature map is extracted, one of [0, 1, 2, 3]. The default is up_ft_index=1, as used for semantic correspondence.
prompt: the prompt used in the diffusion model.
ensemble_size: the number of repeated copies of the image in each batch used to compute the features. The default is ensemble_size=8. Reduce this value if you run into memory issues.
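When extracting features for several images, it can be convenient to assemble the command programmatically. The sketch below only builds the argument list from the flags documented above and checks the documented value ranges; the helper name `build_extract_command` is hypothetical and not part of this repository.

```python
import shlex

def build_extract_command(input_path, output_path, prompt,
                          img_size=(768, 768), t=261,
                          up_ft_index=1, ensemble_size=8):
    """Assemble an extract_dift.py argument list (hypothetical helper).

    Defaults mirror the values documented above for semantic correspondence.
    """
    if not isinstance(t, int) or not 0 <= t <= 1000:
        raise ValueError("t must be an integer in [0, 1000]")
    if up_ft_index not in (0, 1, 2, 3):
        raise ValueError("up_ft_index must be one of 0, 1, 2, 3")
    width, height = img_size
    return [
        "python", "extract_dift.py",
        "--input_path", input_path,
        "--output_path", output_path,
        "--img_size", str(width), str(height),
        "--t", str(t),
        "--up_ft_index", str(up_ft_index),
        "--prompt", prompt,
        "--ensemble_size", str(ensemble_size),
    ]

cmd = build_extract_command("./assets/cat.png", "dift_cat.pt", "a photo of a cat")
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with prompts that contain spaces.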

The spatial size of the output DIFT tensor is determined by both img_size and up_ft_index. If up_ft_index=0, each side of the output is 1/32 of the corresponding side of img_size; if up_ft_index=1, it is 1/16; if up_ft_index=2 or 3, it is 1/8.
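As a quick check of the arithmetic, the relationship between img_size, up_ft_index, and the feature-map size can be written as a small helper. The function name `dift_spatial_size` is hypothetical, and the use of integer division for sizes that are not multiples of the factor is an assumption of this sketch, not something confirmed against the repository code.

```python
# Downsampling factor for each up_ft_index, as stated in the text.
DOWNSAMPLE_FACTOR = {0: 32, 1: 16, 2: 8, 3: 8}

def dift_spatial_size(img_size, up_ft_index):
    """Return the (width, height) of the DIFT feature map.

    img_size: (width, height) of the resized input image.
    Integer division is an assumption for sizes not divisible by the factor.
    """
    if up_ft_index not in DOWNSAMPLE_FACTOR:
        raise ValueError("up_ft_index must be one of 0, 1, 2, 3")
    factor = DOWNSAMPLE_FACTOR[up_ft_index]
    width, height = img_size
    return width // factor, height // factor

print(dift_spatial_size((768, 768), 1))  # (48, 48)
```

With the default settings (a 768 x 768 input and up_ft_index=1), the feature map is therefore 48 x 48.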

Application: Edit Propagation

Using DIFT, we can propagate edits in one image to other images that share semantic correspondences, even across categories and domains:

Check out more videos and visualizations on the project page.
