
Question about original google implementation with stable diffusion #26

Open
ethansmith2000 opened this issue Dec 11, 2022 · 3 comments

Comments

@ethansmith2000

Hi bloc, firstly thank you for your great work!
I've been spending a lot of time trying to implement Google's original release in a custom pipeline with diffusers. I figured it wouldn't be too difficult, since they have an example there running with SD that looks pretty good, but I'm getting very strange results even though everything seems to be in working order. I considered that it might be because I was using SD1.5 while they used SD1.4, but I don't think there were any architecture changes between those versions that would cause this?

Could you elaborate a bit more on the changes you made to get it to work with stable?

@bloc97
Owner

bloc97 commented Dec 11, 2022

Hi, I'm not too sure about the code differences between my implementation and the original, as this repo's code is not a modification of the authors' code but an independent implementation written from scratch (there was no official implementation when this repo was made). However, I might be able to help spot common problems. What exactly are the "strange results" you are getting?

Could you elaborate a bit more on the changes you made to get it to work with stable?

The main difference between Imagen and Stable Diffusion is that Stable Diffusion has an additional attn1 self-attention layer that is very important for image generation, while in the paper they only modify the attn2 cross-attention layer. The fix in this case is simply to also edit and/or copy the attn1 maps at the same time as the attn2 maps.
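To make the idea concrete, here is a minimal, self-contained sketch (numpy instead of the actual diffusers UNet modules, which are just stand-ins here) of what "injecting" recorded attention maps means: on the source pass you record the attention probabilities for both attn1 (self-attention over latent tokens) and attn2 (cross-attention over text embeddings), and on the edited pass you overwrite the freshly computed probabilities with the recorded ones. The shapes and the `attention` helper are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, stored_probs=None):
    """Scaled dot-product attention that can optionally reuse
    attention probabilities recorded from another pass."""
    probs = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if stored_probs is not None:
        # Inject the source pass's maps instead of the fresh ones.
        probs = stored_probs
    return probs @ v, probs

rng = np.random.default_rng(0)
h = rng.normal(size=(16, 8))    # stand-in for latent tokens
ctx = rng.normal(size=(4, 8))   # stand-in for text embeddings

# Source pass: record maps for BOTH attn1 (self) and attn2 (cross).
_, self_probs = attention(h, h, h)        # attn1
_, cross_probs = attention(h, ctx, ctx)   # attn2

# Edited pass: reuse the recorded maps so the layout is preserved,
# while the values (h2 / ctx2) come from the edited generation.
h2 = rng.normal(size=(16, 8))
ctx2 = rng.normal(size=(4, 8))
out1, _ = attention(h2, h2, h2, stored_probs=self_probs)
out2, _ = attention(h2, ctx2, ctx2, stored_probs=cross_probs)
```

In a real diffusers pipeline the same replacement would be done inside the attention processors of every `attn1` and `attn2` module, for the timesteps where injection is enabled.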

@ethansmith2000
Author

ethansmith2000 commented Dec 12, 2022

The functions they use search through all named attn layers of the model and make the modifications as needed for both self- and cross-attention, so I would think that shouldn't be the problem? Here is the link to their demo with SD: https://github.com/google/prompt-to-prompt/blob/main/prompt-to-prompt_stable.ipynb

Here is an example using the original prompt "a panda at a picnic" with the target prompt "a dog at a picnic" (the replace method requires that only one word is altered):
[Screenshot: output with the replacement edit applied]

Meanwhile, this is the original output I get on that seed pre-injection, as well as post-injection when I set the attn replace steps to 0:
[Screenshot: pre-injection output on the same seed]

The only thing I can think of is that the example was done with SD1.4, but that doesn't seem like it would affect it. Additionally, since the edits take place entirely in the UNet, I haven't looked into what happens at any other part of the process, but I could definitely be missing something.

Are there any variables you'd recommend printing out? I'm pretty new to the lower-level parts of attention, so I don't have a great idea of where to start. Thank you for your reply!

@ethansmith2000
Author

Never mind, got it working! I didn't realize that the prompt that goes into the text encoder has to be the new (edited) one. I'll be trying your repo as well afterwards.
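For anyone hitting the same wall, a minimal self-contained sketch of the fix (stub `encode` and `cross_attn` functions are illustrative stand-ins for the CLIP text encoder and a UNet cross-attention layer, not real library calls): the edited pass must condition on the embeddings of the *target* prompt, while only the attention probabilities are carried over from the source pass.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode(prompt, dim=8):
    # Stand-in for the text encoder: deterministic per-prompt embeddings.
    rng = np.random.default_rng(sum(prompt.encode()))
    return rng.normal(size=(4, dim))

def cross_attn(h, ctx, probs=None):
    # Stand-in for an attn2 layer with optional injected probabilities.
    if probs is None:
        probs = softmax(h @ ctx.T / np.sqrt(h.shape[-1]))
    return probs @ ctx, probs

h = np.random.default_rng(1).normal(size=(16, 8))  # stand-in latents

# Pass 1: the SOURCE prompt conditions the model; record its maps.
src = encode("a panda at a picnic")
_, src_probs = cross_attn(h, src)

# Pass 2: the text encoder receives the EDITED prompt; only the
# attention probabilities come from pass 1. Feeding the source
# prompt here again was the bug.
tgt = encode("a dog at a picnic")
out, _ = cross_attn(h, tgt, probs=src_probs)  # values come from tgt
```

The injected maps keep the spatial layout of the source image, while the new text embeddings supply the changed content.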
