
How to make image inversion more precise? #20

Open
hmartiro opened this issue Oct 15, 2022 · 8 comments

hmartiro commented Oct 15, 2022

Fantastic work on this project @bloc97!

I'm able to get super impressive results with prompt editing. However, when doing img2img I find that the results degrade greatly. For example, here I'm editing the prompt to change the image to a charcoal drawing, which works well. But if I pass in the initial image generated from the original prompt, I can't find any parameter values that get anywhere close to the quality of the prompt edit without an initial image. I'm seeing the same issues as stock SD: either the macro structure of the initial image is lost, or the prompt edit has little to no effect.

The reason I want this is to edit real images and to build edits on top of each other. I realize this may be unsolved and may depend on how well the network understands the scene content, but I'm very interested in your thoughts and suggestions here, as I think it would be incredibly powerful.

img_original = stablediffusion(
    prompt="a fantasy landscape with a maple forest",
    steps=50,
    seed=42,
)

img_prompt_edit = stablediffusion(
    prompt="a fantasy landscape with a maple forest",
    prompt_edit="a charcoal sketch of a fantasy landscape with a maple forest",
    steps=50,
    seed=42,
)

img_init_image = stablediffusion(
    prompt="a fantasy landscape with a maple forest",
    prompt_edit="a charcoal sketch of a fantasy landscape with a maple forest",
    steps=50,
    seed=42,
    init_image=img_original,
    init_image_strength=0.6,
)

[images: original generation, prompt-edited generation, and degraded img2img result]


bloc97 commented Oct 15, 2022

You can try the method in InverseCrossAttention_Release.ipynb. However, this works very well only on images that were generated by stable diffusion. For real images, inversion with high CFG is currently an unsolved problem. Sometimes you get good results, sometimes you don't: images with uniform content are usually easy to reconstruct (e.g. objects/faces on a white background); otherwise the reconstruction can focus on the background and distort the intended object. Also, images that would never be generated by the model given the input prompt fare quite poorly.
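For context, the guidance_scale in these calls is the standard classifier-free guidance (CFG) weight: the noise prediction is extrapolated from the unconditional prediction toward the conditional one, so high scales magnify any error in an inverted latent. A minimal sketch of the extrapolation (the function name is mine, not the repo's):

```python
def guided_eps(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the conditional one. A scale of 1.0 is purely
    conditional; larger values overshoot, which amplifies inversion error."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# With a high scale, a modest conditional/unconditional gap is blown up:
print(guided_eps(0.0, 1.0, 5.0))  # -> 5.0
```

This is why a lower guidance_scale (5.0 or 3.0 here, instead of the usual 7.5) tends to make reconstruction more faithful.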
Here's what I get with the following code:

gen_latents = inversestablediffusion(input_image, "a fantasy landscape with a maple forest", refine_iterations=10, guidance_scale=5.0)
stablediffusion("a fantasy landscape with a maple forest", guidance_scale=5.0, init_latents=gen_latents)
stablediffusion("a fantasy landscape with a maple forest", prompt_edit="a charcoal sketch of a fantasy landscape with a maple forest", guidance_scale=5.0, init_latents=gen_latents)
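My reading of what inversestablediffusion does, for anyone following along (there's no accompanying paper): it runs the deterministic DDIM update backwards, and since the exact backward step depends on the noise prediction at the yet-unknown latent, it appears to refine a naive estimate by fixed-point iteration — which is what refine_iterations would control. A self-contained 1-D toy under those assumptions (eps_model and the alpha schedule are made-up stand-ins, not the repo's code):

```python
import math

# Toy stand-in for the U-Net's noise prediction; the real model depends on
# the latent, the timestep, and the prompt embedding.
def eps_model(x, t):
    return 0.3 * x

# Made-up monotone schedule for cumulative alphas; alpha_bar[0] = 1 (clean).
alpha_bar = [1.0 - 0.02 * t for t in range(11)]

def ddim_step(x_t, a_t, a_prev, eps):
    """One deterministic DDIM denoising step, t -> t-1."""
    x0_pred = (x_t - math.sqrt(1 - a_t) * eps) / math.sqrt(a_t)
    return math.sqrt(a_prev) * x0_pred + math.sqrt(1 - a_prev) * eps

def invert_step(x_prev, a_t, a_prev, t, refine_iterations=5):
    """Recover x_t from x_{t-1}. The exact backward step needs eps(x_t),
    which is unknown, so start from the naive guess eps(x_{t-1}) and
    refine by fixed-point iteration."""
    x_t = x_prev
    for _ in range(refine_iterations):
        eps = eps_model(x_t, t)
        x0_pred = (x_prev - math.sqrt(1 - a_prev) * eps) / math.sqrt(a_prev)
        x_t = math.sqrt(a_t) * x0_pred + math.sqrt(1 - a_t) * eps
    return x_t

# Round trip: invert a "clean" latent up to full noise, then denoise back.
x0 = 0.8
x = x0
for t in range(1, len(alpha_bar)):          # inversion: t = 1..10
    x = invert_step(x, alpha_bar[t], alpha_bar[t - 1], t)
for t in range(len(alpha_bar) - 1, 0, -1):  # sampling: t = 10..1
    x = ddim_step(x, alpha_bar[t], alpha_bar[t - 1], eps_model(x, t))
print(abs(x - x0))  # reconstruction error is tiny
```

With too few refinement iterations the backward step is only approximate, which is one source of the reconstruction drift discussed below.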

Reconstruction: [image]
Edit: [image]


bloc97 commented Oct 15, 2022

Examples where inversion doesn't work well:
Left is the original image; right is the reconstruction from the prompt.

https://www.pexels.com/photo/tray-of-pumpkins-on-a-knitted-sweater-5429788/

gen_latents = inversestablediffusion(input_image, "pumpkins in a tray seen from above", refine_iterations=10, refine_skip=0.4, guidance_scale=3.0)
stablediffusion("pumpkins in a tray seen from above", guidance_scale=3.0, init_latents=gen_latents)

[image: original (left) vs. reconstruction (right)]

https://www.pexels.com/photo/trees-at-the-park-under-clear-sky-3227735/

gen_latents = inversestablediffusion(input_image, "a temperate forest in autumn", refine_iterations=10, refine_skip=0.5, guidance_scale=3.0)
stablediffusion("a temperate forest in autumn", guidance_scale=3.0, init_latents=gen_latents)

[image: original (left) vs. reconstruction (right)]

hmartiro commented:

That's good input @bloc97, thank you. I spent most of today experimenting with the inverse method on real images and got unpredictable results, as you described, although with a couple of great examples.

Have you thought about whether fine-tuning the model or using an embedding (like textual inversion or DreamBooth) could help when applied to a particular domain? For example, if one were to take a video of a particular forest or scene and fine-tune on frames from it, would it then be possible to do precise prompt-to-prompt editing of real images close to that distribution?


bloc97 commented Oct 16, 2022

> Have you thought about whether fine-tuning the model or using an embedding (like textual inversion or DreamBooth)

That might actually work! A better reconstruction usually allows for better editing...


bloc97 commented Oct 18, 2022

@hmartiro There's Google's Imagic paper that just got released. https://arxiv.org/abs/2210.09276

From the paper, it seems that inverting the prompt embeddings as well (not just the latents) yields even better results. They also fine-tune the model on the inverted embeddings so that it reconstructs the input image better.
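As described in the paper, Imagic has three stages: (1) optimize the text embedding so the frozen model reconstructs the input image, (2) fine-tune the model around that embedding, (3) interpolate between the optimized and target embeddings to apply the edit. A deliberately tiny 1-D caricature of stages 1 and 3 — the linear "generator" and every number here are stand-ins, not a diffusion model:

```python
def generator(e):
    """Stand-in for the frozen diffusion model conditioned on embedding e."""
    return 2.0 * e + 1.0

def optimize_embedding(e_init, target, lr=0.1, steps=200):
    """Stage 1: gradient descent on the reconstruction loss (g(e) - target)^2."""
    e = e_init
    for _ in range(steps):
        grad = 2.0 * (generator(e) - target) * 2.0  # chain rule through g
        e -= lr * grad
    return e

target_image = 5.0  # the "image" to reconstruct
e_target = 3.0      # embedding of the edit prompt

# Stage 1: start from the target prompt's embedding and optimize it until
# the frozen generator reproduces the input image.
e_opt = optimize_embedding(e_init=e_target, target=target_image)

# Stage 3: interpolate toward the target embedding to apply the edit while
# staying close to the embedding that reconstructs the original.
eta = 0.7
e_edit = (1 - eta) * e_opt + eta * e_target
```

Stage 2 (fine-tuning the model weights themselves around e_opt) is what lets the real method keep the reconstruction faithful even after interpolation.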

hmartiro commented:

Very spicy @bloc97! I'll take a look at that paper.

I did end up fine-tuning with the DreamBooth approach on a novel object type. I haven't tried this cross-attention method on that model yet, but the similar "img2img alternate" approach in the automatic1111 repo did greatly improve prompt editing once the model understood the class. It's definitely still immature, though, and results are sporadic.

hmartiro commented:

Also see: https://text2live.github.io/

So far it appears very slow to run, since it requires extensive training on a single image, but the results are very exciting. I really like the idea of the separate edit layer that gets composited on top of the original. Do you think such an approach would have value with your method?
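For what it's worth, the edit-layer idea reduces to standard alpha compositing: the network predicts an RGBA layer, and the final frame is that layer blended over the original, so regions where the layer is transparent are preserved exactly. A minimal per-pixel sketch in pure Python (nothing here is Text2LIVE's actual code):

```python
def composite(base_rgb, edit_rgba):
    """Alpha-composite an RGBA edit layer over an RGB base image.
    Where the layer's alpha is 0, the original pixel passes through exactly."""
    out = []
    for (r, g, b), (er, eg, eb, a) in zip(base_rgb, edit_rgba):
        out.append((r * (1 - a) + er * a,
                    g * (1 - a) + eg * a,
                    b * (1 - a) + eb * a))
    return out

base = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
layer = [(0.0, 0.0, 0.0, 0.0),   # alpha 0: pixel unchanged
         (0.0, 0.0, 1.0, 0.5)]   # alpha 0.5: half-blend toward blue
print(composite(base, layer))    # [(1.0, 0.0, 0.0), (0.0, 0.5, 0.5)]
```

The appeal for editing is exactly this pass-through property: untouched regions can't drift, unlike full-image regeneration.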

KevinGoodman commented:

Thanks for the amazing work, but I'm a bit confused about this inversion... I wonder if there is a corresponding paper/article that elaborates on this process.
