
An observation #30

Open
sameerKgp opened this issue Aug 22, 2023 · 3 comments

Comments

@sameerKgp

Hi, thanks for the code.
I have observed that in the examples you provided, even if I directly use the cross-attention from the edited prompt by commenting out the line "attn_slice = attn_slice * (1 - self.last_attn_slice_mask) + new_attn_slice * self.last_attn_slice_mask", I get the same result in most cases. I checked cases where words are replaced or new phrases like "in winter" are added, so it seems the cross-attention editing is not having any effect. Please comment on this. Thanks.
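For reference, a minimal standalone sketch of what that line does (tensor shapes and the mask layout here are assumptions for illustration): it keeps the original prompt's cross-attention everywhere except at masked token positions, where the edited prompt's attention is swapped in, so commenting it out means the edited prompt's attention is used wholesale.

```python
import torch

# Hypothetical shapes: (heads, query_pixels, text_tokens)
attn_slice = torch.rand(8, 64, 77)      # cross-attention from the original prompt
new_attn_slice = torch.rand(8, 64, 77)  # cross-attention from the edited prompt
mask = torch.zeros(77)
mask[5] = 1.0  # e.g. the position of a replaced/added token (assumed index)

# The blend from the quoted line: original attention everywhere,
# edited attention only where the mask is set.
blended = attn_slice * (1 - mask) + new_attn_slice * mask

# With the line commented out, the edit branch effectively uses its own
# attention unmodified:
direct = new_attn_slice
```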

@sameerKgp
Author

I think there is a bug in the "stablediffusion" function of CrossAttention_Release_NoImages.py: the same latent is being used for both the noise_cond and the noise_cond_edit prediction at every step, but they should be different. With this change, it gives the same results as the official code. Attaching a screenshot of the correction:
[Screenshot: corrected_sd_p2p]
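A minimal diffusers-style sketch of the described fix (function and variable names are assumptions, not the repo's actual code; attention saving/injection and classifier-free guidance are omitted): the original and edited prompts must each carry their own latent through the sampling loop.

```python
import torch

@torch.no_grad()
def sample_with_edit(unet, scheduler, emb_cond, emb_cond_edit, latent):
    # Hypothetical sketch: each prompt branch evolves its own latent.
    latent_edit = latent.clone()
    for t in scheduler.timesteps:
        # Original-prompt branch predicts noise from, and steps, its own latent.
        noise_cond = unet(latent, t, encoder_hidden_states=emb_cond).sample
        latent = scheduler.step(noise_cond, t, latent).prev_sample

        # Edited-prompt branch: the bug was reusing `latent` here;
        # it must instead predict from and step `latent_edit`.
        noise_cond_edit = unet(latent_edit, t,
                               encoder_hidden_states=emb_cond_edit).sample
        latent_edit = scheduler.step(noise_cond_edit, t, latent_edit).prev_sample
    return latent_edit
```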

@bloc97
Owner

bloc97 commented Aug 28, 2023

Hi, thanks for catching the mistake! The official repo code was released after mine, and I probably misunderstood this part of the algorithm from the paper... I haven't had time to revisit the algorithm since I originally wrote it.

Does this change improve the quality of the generations? If you don't mind, feel free to open a pull request or fork this repo...

Edit: Also, I'm just stunned that the method was working despite this bug. I don't quite understand what you mean by "the cross attention editing is not having any effect": if you added "in winter" to an SD prompt without using this repo, the entire image would change, but with this repo it doesn't, so cross-attention does seem to have an effect.

@sameerKgp
Author

I meant that if you just replace the self-attention maps for the first 20 or so steps and use the cross-attention maps from the edit prompt only, it also gives similar results. That is just an observation, not a problem with the code. Self-attention seems to be more important for preserving the scene layout in many cases.
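A rough sketch of that schedule (the 20-step cutoff and flag names are assumptions for illustration, not constants from this repo):

```python
# Inject the source pass's self-attention maps only for the first ~20 steps,
# while cross-attention comes from the edit prompt at every step.
NUM_STEPS = 50
SELF_ATTN_INJECT_STEPS = 20  # assumed cutoff

for step in range(NUM_STEPS):
    use_saved_self_attn = step < SELF_ATTN_INJECT_STEPS  # early steps only
    use_edit_cross_attn = True  # always take cross-attention from the edit
    # (in the real loop these flags would gate the attention-layer hooks)
```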
