Questions on Loss scale, Hyperparameters #4
Thanks for sharing this easy-to-follow code.

I am currently applying WGAN to learning a text distribution. Here are my questions regarding WGAN.

Question 1. In Figure 3, the losses of the MLP and the DCGAN look comparable. However, I think the scale of the loss can vary depending on the weight initialization scale and the model size. (Please correct me if I am wrong.) If so, what would be the right way to compare the learning results of two different models?

Question 2. Could you share which hyperparameters WGAN is sensitive to? For example: weight initialization scale (0.02), clamping threshold (0.01), batch size (64), model size, #D/G steps (5), lr (0.00005).

Thank you
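A minimal sketch of where each of the hyperparameters in Question 2 enters the standard WGAN training loop, in PyTorch. `netD`, `netG`, and `dataloader` are assumed to be defined elsewhere, and for brevity the same real batch is reused across critic steps (the reference code draws a fresh batch for each):

```python
import torch
import torch.optim as optim

lr, clamp, Diters, nz = 5e-5, 0.01, 5, 100    # paper defaults from Question 2
optimizerD = optim.RMSprop(netD.parameters(), lr=lr)  # hypothetical critic
optimizerG = optim.RMSprop(netG.parameters(), lr=lr)  # hypothetical generator

for real, _ in dataloader:
    # Train the critic Diters times per generator step.
    for _ in range(Diters):
        for p in netD.parameters():
            p.data.clamp_(-clamp, clamp)      # weight clipping (Lipschitz constraint)
        optimizerD.zero_grad()
        fake = netG(torch.randn(real.size(0), nz, 1, 1)).detach()
        # The critic maximizes E[D(real)] - E[D(fake)], so minimize the negative.
        lossD = -(netD(real).mean() - netD(fake).mean())
        lossD.backward()
        optimizerD.step()
    # One generator step: minimize -E[D(G(z))].
    optimizerG.zero_grad()
    lossG = -netD(netG(torch.randn(real.size(0), nz, 1, 1))).mean()
    lossG.backward()
    optimizerG.step()
```

Note that `-lossD` is the critic's estimate of the Wasserstein distance, the quantity the paper's training curves plot; its scale depends on the critic's capacity and clipping range, which is exactly the concern raised in Question 1.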
Comments

Oops, the paper already touches on Question 1: "However, we do not claim that this is a new method to quantitatively evaluate …"
I can comment a bit on this: @FeepingCreature and I have been trying out WGAN for modeling anime images (specifically 64px cropped faces, since attempts at larger images and more diverse datasets failed totally even with increased learning rates / discriminator steps). So far we've found that batch size doesn't seem especially important; model size and image size are very important (64px works great, but 128px struggles to get anywhere, and we've had better results enlarging the model while keeping it at 64px); learning rate is important, and higher than the defaults doesn't seem to work well; and #D/G steps or …

Personally, I'm still wondering what it'll take to get unsupervised GANs to generate really diverse scenes on the level of StackGAN. I thought perhaps regular DCGANs could do it, except that they diverged before learning; but while, very impressively, none of my WGANs have diverged (they've just plateaued and stopped learning), they're still limited to highly homogeneous image sets.
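On "enlarging the model while keeping it at 64px": in this repo that amounts to raising the `ngf`/`ndf` feature-map widths (the `main.py` defaults are 64). A hypothetical sketch using this repo's `models/dcgan.py`; check the constructor signature against your checkout:

```python
import models.dcgan as dcgan

# Widen the generator and critic at 64px instead of moving to 128px.
# Positional arguments assumed here: isize, nz, nc, ngf-or-ndf, ngpu.
netG = dcgan.DCGAN_G(64, 100, 3, 128, 1)
netD = dcgan.DCGAN_D(64, 100, 3, 128, 1)
```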
@gmkim90 I also have some experience here to share: I have likewise observed improvements in image quality with no decrease in the discriminator loss, although I am still testing whether it is necessary for me to increase Diters further. Leaving the learning rate and Diters at their defaults never lowers my loss on the datasets I'm running (which may be due to not enough Diters).
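On Diters: note that the reference `main.py` does not keep it fixed. The critic is trained much longer early on and periodically thereafter, so its loss stays a reliable Wasserstein estimate before the generator chases it. Roughly (check against your copy):

```python
# gen_iterations counts generator updates; Diters is recomputed each outer loop.
if gen_iterations < 25 or gen_iterations % 500 == 0:
    Diters = 100    # long critic warm-up, repeated periodically
else:
    Diters = 5      # the --Diters default
```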
I have similar questions, but mine concern the Wasserstein distance itself. I trained WGAN, WGAN-GP, and WGAN-div with a DCGAN architecture on the CelebA dataset at 64×64 image size (with the default hyperparameters recommended in the papers).
Please contact me if you have any advice, thank you!
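Since WGAN-GP came up: there the weight clipping above is replaced by a gradient penalty on random interpolates between real and fake samples. A minimal PyTorch sketch, assuming a critic `netD` that returns per-sample scores and detached `real`/`fake` batches; `lambda_gp = 10` is the value suggested in the WGAN-GP paper:

```python
import torch

def gradient_penalty(netD, real, fake, lambda_gp=10.0):
    """Penalize (||grad_x D(x_hat)||_2 - 1)^2 at points between real and fake."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_hat = netD(x_hat)
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True)[0]
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
```

This term is added to the critic loss in place of the `clamp_` calls, so the loss scales of the three variants won't generally be directly comparable.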