loss implementation differs from paper #128

Open
maximeraafat opened this issue May 16, 2022 · 1 comment
maximeraafat commented May 16, 2022

Hi,

Thanks for this amazing implementation! I have a question concerning the loss implementation, as it seems to differ from the original equations.
The screenshot below shows the GAN loss as presented in the paper:

[paper_losses — screenshot of the paper's hinge-loss equations, transcribed here:]

L_D = -E[ min(0, -1 + D(x)) ] - E[ min(0, -1 - D(x̂)) ]
L_G = -E[ D(x̂) ]

  • in red, the discriminator loss (D loss) on real images,
  • in green, the D loss on fake (generated) images,
  • and in blue, the generator loss (G loss) on fake images.

This makes sense to me, since D is assumed to output values between 0 and 1 (0 = fake, 1 = real):

  • in red, we want D to output 1 for real images → if D indeed outputs 1 for a real image, then -min(0, -1 + D(x)) = 0, which is indeed the minimum achievable,
  • in green, we want D to output 0 for fake images (from the discriminator's perspective) → if D indeed outputs 0 for a fake image, then -min(0, -1 - D(x̂)) = 1, which is the minimum achievable if D only outputs values between 0 and 1,
  • in blue, we want D to output 1 for fake images (from the generator's perspective): the equation follows directly.
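For concreteness, the min form above is just the usual relu form of the hinge loss: -min(0, -1 + D(x)) = relu(1 - D(x)) and -min(0, -1 - D(x̂)) = relu(1 + D(x̂)). A quick numeric check of the two ideal cases from the list above (my own snippet, not from either codebase):

```python
import torch
import torch.nn.functional as F

d_real = torch.tensor(1.0)  # assume D(x) = 1 on a real image
d_fake = torch.tensor(0.0)  # assume D(x̂) = 0 on a fake image

loss_real = F.relu(1.0 - d_real)  # = -min(0, -1 + D(x))  -> 0.0
loss_fake = F.relu(1.0 + d_fake)  # = -min(0, -1 - D(x̂)) -> 1.0

print(loss_real.item(), loss_fake.item())  # 0.0 1.0
```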

Now, the way the authors implement this in the code provided in the supplementary materials of the paper is as follows (the colors match those in the picture above):

[og_code_loss_d_real, og_code_loss_d_fake, og_code_loss_g — screenshots of the paper's supplementary code for the three loss terms]

Except for the odd randomness involved (already explained in #11), their implementation is a one-to-one match with the paper's equations.
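For readers without the supplementary materials, the three terms look roughly like this (a paraphrased sketch, not an exact copy; the randomized target 0.8 + 0.2 · rand in place of 1 is the randomness discussed in #11):

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator outputs, for illustration only
pred_real = torch.randn(8)  # D(x) on a batch of real images
pred_fake = torch.randn(8)  # D(x̂) on a batch of generated images

# red:   -min(0, -1 + D(x)) == relu(1 - D(x)), with 1 replaced by a random target in [0.8, 1.0]
err_d_real = F.relu(torch.rand_like(pred_real) * 0.2 + 0.8 - pred_real).mean()

# green: -min(0, -1 - D(x̂)) == relu(1 + D(x̂)), same randomized target
err_d_fake = F.relu(torch.rand_like(pred_fake) * 0.2 + 0.8 + pred_fake).mean()

# blue:  the generator minimizes -D(x̂)
err_g = -pred_fake.mean()
```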


The way it is implemented in this repo, however, is quite different, and I do not understand why.

[lighweight_gan_losses — screenshot of the loss implementation in this repo]
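Transcribed roughly from the screenshot (as I read it; names and details may differ slightly from the actual repo code):

```python
import torch.nn.functional as F

def hinge_loss(real, fake):
    # red:   relu(1 + D(x))  -- minimized when D(x) <= -1 on real images
    # green: relu(1 - D(x̂)) -- minimized when D(x̂) >= 1 on fake images
    return (F.relu(1 + real) + F.relu(1 - fake)).mean()

def gen_hinge_loss(fake):
    # blue: the generator simply minimizes D(x̂)
    return fake.mean()
```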

Let's start with the discriminator loss:

  • in red, you want D to output values as small as possible (negative, if allowed) to make this term as small as possible (0 if D can output negative values),
  • in green, you want D to output values as large as possible (greater than or equal to 1) to cancel this term out as well.

For the generator loss:

  • in blue, you want the opposite of green, that is, for D to output values as small as possible.

This implementation seems meaningful and yields coherent results (as the examples show). It also seems to me that D is not limited to outputting values between 0 and 1, but can output any real value (I might be wrong). I am just wondering: why this choice? Could you perhaps elaborate on why you decided to implement the loss differently from the original paper?

@iScriptLex

I think it was just taken from some other article. You can see some elements of WGAN-GP in this code, such as a simplified realization of the gradient penalty.
Also, this code supports multiple losses (the user can use a dual contrastive loss instead of the hinge loss). It may have been implemented this way so that a single training-loop implementation can serve several loss functions.
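For context, the simplified gradient penalty referred to appears to be applied directly on real images rather than on real/fake interpolations as in the original WGAN-GP. A rough sketch of that idea (my own paraphrase, not the repo's exact code):

```python
import torch

def gradient_penalty(images, output, weight=10.0):
    # Gradient of D's output w.r.t. its input images
    # (images must have requires_grad=True when D is applied)
    gradients = torch.autograd.grad(
        outputs=output, inputs=images,
        grad_outputs=torch.ones_like(output),
        create_graph=True, retain_graph=True, only_inputs=True,
    )[0]
    gradients = gradients.reshape(gradients.size(0), -1)
    # WGAN-GP-style penalty pushing the gradient norm toward 1,
    # but evaluated at real samples only (no interpolation step)
    return weight * ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```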
