
Question about original google implementation with stable diffusion #26

Open
ethansmith2000 opened this issue Dec 11, 2022 · 3 comments

Comments

@ethansmith2000

Hi bloc, firstly thank you for your great work!
I've been spending a lot of time trying to implement Google's original release in a custom pipeline with diffusers. I figured it wouldn't be too difficult, since they have an example there running with SD that looks pretty good, but I'm getting very strange results even though everything seems to be in working order. I considered that it might be because I was using SD1.5 while they used SD1.4, but I don't think there were any architecture changes between those versions that would cause this?

Could you elaborate a bit more on the changes you made to get it to work with stable?

@bloc97
Owner

bloc97 commented Dec 11, 2022

Hi, I'm not too sure about the code differences between my implementation and the original, as this repo's code is not a modification of the authors' code but an independent implementation written from scratch (there was no official implementation when this repo was made). However, I might be able to help spot common problems. What exactly are the "strange results" you are getting?

Could you elaborate a bit more on the changes you made to get it to work with stable?

The main difference between Imagen and Stable Diffusion is that Stable Diffusion has an additional attn1 self-attention layer that is very important for image generation, while in the paper they only modify the attn2 cross-attention layer. The fix in this case is simply to also edit and/or copy the attn1 maps at the same time as the attn2 maps.
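To make the idea concrete, here is a minimal, self-contained sketch (numpy instead of the actual diffusers UNet modules, which are just stand-ins here) of what "injecting" recorded attention maps means: on the source pass you record the attention probabilities for both attn1 (self-attention over latent tokens) and attn2 (cross-attention over text embeddings), and on the edited pass you overwrite the freshly computed probabilities with the recorded ones. The shapes and the `attention` helper are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, stored_probs=None):
    """Scaled dot-product attention that can optionally reuse
    attention probabilities recorded from another pass."""
    probs = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if stored_probs is not None:
        # Inject the source pass's maps instead of the fresh ones.
        probs = stored_probs
    return probs @ v, probs

rng = np.random.default_rng(0)
h = rng.normal(size=(16, 8))    # stand-in for latent tokens
ctx = rng.normal(size=(4, 8))   # stand-in for text embeddings

# Source pass: record maps for BOTH attn1 (self) and attn2 (cross).
_, self_probs = attention(h, h, h)        # attn1
_, cross_probs = attention(h, ctx, ctx)   # attn2

# Edited pass: reuse the recorded maps so the layout is preserved,
# while the values (h2 / ctx2) come from the edited generation.
h2 = rng.normal(size=(16, 8))
ctx2 = rng.normal(size=(4, 8))
out1, _ = attention(h2, h2, h2, stored_probs=self_probs)
out2, _ = attention(h2, ctx2, ctx2, stored_probs=cross_probs)
```

In a real diffusers pipeline the same replacement would be done inside the attention processors of every `attn1` and `attn2` module, for the timesteps where injection is enabled.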

@ethansmith2000
Author

ethansmith2000 commented Dec 12, 2022

The functions they use search through all named attn layers of the model and make the modifications as needed for both self- and cross-attention, so I would think that shouldn't be the problem? Here is the link to their demo with SD: https://github.com/google/prompt-to-prompt/blob/main/prompt-to-prompt_stable.ipynb

Here is an example using the original prompt "a panda at a picnic" with the target prompt "a dog at a picnic" (the replace method requires that only one word is altered):
[Screenshot: output with the replacement edit applied]

Meanwhile, this is the original output I get on that seed pre-injection, as well as post-injection when I set the attn replace steps to 0:
[Screenshot: pre-injection output on the same seed]

The only thing I can think of is that the example was done with SD1.4, but that doesn't seem like it would affect it. Additionally, since the edits take place entirely in the UNet, I haven't looked into what happens at any other part of the process, but I could definitely be missing something.

Are there any variables you'd recommend printing out? I'm pretty new to the lower-level parts of attention, so I don't have a great idea of where to start. Thank you for your reply!

@ethansmith2000
Author

Never mind, got it working! I didn't realize that the prompt that goes into the text encoder has to be the new (edited) one. I'll be trying your repo as well afterwards.
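For anyone hitting the same wall, a minimal self-contained sketch of the fix (stub `encode` and `cross_attn` functions are illustrative stand-ins for the CLIP text encoder and a UNet cross-attention layer, not real library calls): the edited pass must condition on the embeddings of the *target* prompt, while only the attention probabilities are carried over from the source pass.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode(prompt, dim=8):
    # Stand-in for the text encoder: deterministic per-prompt embeddings.
    rng = np.random.default_rng(sum(prompt.encode()))
    return rng.normal(size=(4, dim))

def cross_attn(h, ctx, probs=None):
    # Stand-in for an attn2 layer with optional injected probabilities.
    if probs is None:
        probs = softmax(h @ ctx.T / np.sqrt(h.shape[-1]))
    return probs @ ctx, probs

h = np.random.default_rng(1).normal(size=(16, 8))  # stand-in latents

# Pass 1: the SOURCE prompt conditions the model; record its maps.
src = encode("a panda at a picnic")
_, src_probs = cross_attn(h, src)

# Pass 2: the text encoder receives the EDITED prompt; only the
# attention probabilities come from pass 1. Feeding the source
# prompt here again was the bug.
tgt = encode("a dog at a picnic")
out, _ = cross_attn(h, tgt, probs=src_probs)  # values come from tgt
```

The injected maps keep the spatial layout of the source image, while the new text embeddings supply the changed content.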
