About the finite difference gradient descent method #25

Open
BrandonHanx opened this issue Dec 3, 2022 · 1 comment

@BrandonHanx

Hi @bloc97 ,

Thanks for your great work.

Do you know of any other papers or implementations that use finite difference gradient descent for inversion?
I would like more references for this approach.

Also, could you please give more hints about the magic number tless?

@bloc97
Owner

bloc97 commented Dec 4, 2022

Hi, unfortunately there isn't any reference for the "ad hoc" method I used to compensate for CFG, but I can give a quick explanation. If you have more questions, we can discuss this further...

Because the ODEs used in diffusion models are somewhat sensitive to initial conditions, using the CFG "vector" at t-1 to invert and find the latent at t does not give the correct answer (this shows up in the fact that a generated image cannot always be inverted back to its latent when the CFG is high). The correct answer comes from finding the CFG vector at t that yields the correct t-1 latent, but since we do not know the latent at t in the first place, how can we find that CFG vector?
One solution is a gradient descent approximation: we first use the wrong CFG vector (at t-1) to get an approximation of the latent at t, then do a forward diffusion pass to re-obtain the latent at t-1. We can then compute the difference from the known t-1 latent and use gradient descent on the CFG vector.

In my simple implementation, I assume that the latent landscape near our point of interest (the latent at t) is a convex and smooth function (which is most likely wrong), so I do gradient descent directly on the latent at t, using the difference between the ground truth and the predicted t-1 latent. (The numerically correct method would be to do backprop through the model twice, but that would be too slow...) The solution provided here is literally an approximation of an approximation, but it works quite well for images generated by Stable Diffusion. In my tests, images produced with a CFG of up to 5.5 can be inverted reasonably well. For real images, the results are satisfactory in most cases up to a CFG of 4.5, but some images cannot be inverted at all.
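
To make the idea above concrete, here is a minimal toy sketch in NumPy. It is not the code from this repository and does not use a real diffusion model: `forward_step` is a made-up nonlinear map standing in for one sampler step from t to t-1, and all names and parameters (`naive_invert`, `refine_invert`, `guidance`, `h`, `lr`, `steps`) are hypothetical. It only illustrates the loop described above: guess the latent at t using the term evaluated at t-1, step forward again, and correct the latent at t by the mismatch at t-1.

```python
import numpy as np


def forward_step(x_t, guidance=5.0, h=0.1):
    """Toy stand-in for one sampler step from t to t-1 (assumption, not a real model)."""
    return x_t - h * guidance * np.tanh(x_t)


def naive_invert(x_prev, guidance=5.0, h=0.1):
    """Invert using the guidance term evaluated at t-1, i.e. the 'wrong' vector."""
    return x_prev + h * guidance * np.tanh(x_prev)


def refine_invert(x_prev, steps=50, lr=1.0, guidance=5.0, h=0.1):
    """Correct the naive guess by repeatedly stepping forward and comparing at t-1."""
    x_t = naive_invert(x_prev, guidance, h)
    for _ in range(steps):
        # Mismatch between the re-obtained t-1 latent and the known one.
        residual = forward_step(x_t, guidance, h) - x_prev
        # Use the mismatch directly as the descent direction on the latent at t.
        # This treats the step's Jacobian as close to identity, which holds for
        # this toy map but is only an assumption for a real model.
        x_t = x_t - lr * residual
    return x_t


rng = np.random.default_rng(0)
x_true = rng.normal(size=8)      # the latent at t we pretend not to know
x_prev = forward_step(x_true)    # the known latent at t-1

x_naive = naive_invert(x_prev)
x_refined = refine_invert(x_prev)

naive_err = np.abs(forward_step(x_naive) - x_prev).max()
refined_err = np.abs(forward_step(x_refined) - x_prev).max()
print("naive mismatch at t-1:  ", naive_err)
print("refined mismatch at t-1:", refined_err)
```

In this toy setting the update contracts because the map is close to the identity. With a real model and high CFG that assumption can fail, which is consistent with some images not being invertible.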

As for the magic number, it was found empirically. If tless is not used, the result sometimes diverges when re-diffusing the inverted latent, and you get a completely grey image.
