
Discriminator Loss converges to 0 while Generator loss pretty high #133

demiahmed opened this issue Jun 28, 2022 · 3 comments

I am trying to train on a custom image dataset for about 600,000 epochs. At about the halfway point, my D_loss converges to 0 while my G_loss stays put at 2.5.

My evaluation outputs are slowly starting to fade out to either black or white.

Is there anything I could do to tweak my model, either by increasing the threshold for the Discriminator or by training only the Generator?


iScriptLex commented Jun 28, 2022

This is a form of vanishing gradients in a GAN. It means the generator has reached its limit on your dataset and is starting to rearrange its capacity by dropping some rare modes. So, with each iteration the generator's output will lose more and more diversity. Like this:
[screenshot: a grid of generated samples that look nearly identical]
Technically the output images are not identical, but they look too similar and contain only a few of the dataset's features.
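
To illustrate the vanishing-gradient point, here is a toy PyTorch snippet (not this repo's code, just the textbook minimax loss) showing that once the discriminator confidently rejects fakes, almost no gradient reaches the generator:

```python
import torch

# Toy illustration only (not this project's loss): with the original minimax
# objective the generator minimises log(1 - sigmoid(logit)), where `logit` is
# the discriminator's raw score for a generated image. As the discriminator
# gets confident (logit very negative, D loss near 0), the gradient reaching
# the generator shrinks toward zero; that is the vanishing gradient above.
for logit in (0.0, -2.0, -6.0, -12.0):
    a = torch.tensor(logit, requires_grad=True)
    loss = torch.log(1 - torch.sigmoid(a))
    loss.backward()
    # d/da log(1 - sigmoid(a)) = -sigmoid(a), which -> 0 as D rejects fakes
    print(f"D(G(z)) = {torch.sigmoid(a).item():.6f}   grad = {a.grad.item():.6f}")
```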

It could mean that your dataset is too complicated, unbalanced or just too small.

There are several ways to deal with it.

  1. Improve your dataset: add more images, remove outliers that differ too much from most of the pictures, etc.
  2. Reduce the learning rate: --learning-rate 1e-4 or even --learning-rate 1e-5 (of course, it should not be reduced from the start of training, but only once your discriminator loss drops too low).
  3. Continue your training with an increased batch size: --batch-size 64
    If you don't have enough VRAM for that, use gradient accumulation with your original batch size:
    --gradient-accumulate-every 2
  4. Use TTUR (two time-scale update rule). This GAN contains code for it, but for some reason it is not exposed in the list of input parameters, so you have to modify cli.py yourself (see the sketch just after this list).

In cli.py, after the line def train_from_folder( add this to the parameter list:
ttur_mult = 1.0,
and after model_args = dict( add ttur_mult = ttur_mult, to that dict.

Then, use it like this:
--ttur-mult 2.0

  5. Add more augmentation: --aug-prob 0.6 or even --aug-prob 0.8
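
To make point 4 concrete, here is a minimal sketch of that cli.py change; everything except the ttur_mult lines is an illustrative placeholder for whatever your version of train_from_folder already contains:

```python
# cli.py (sketch only): the parameters shown besides ttur_mult are
# illustrative; keep whatever train_from_folder already contains in your copy.
def train_from_folder(
    data = './data',
    results_dir = './results',
    # ... all other existing parameters stay here ...
    ttur_mult = 1.0,            # new: exposes the existing TTUR multiplier
):
    model_args = dict(
        # ... all other existing arguments stay here ...
        ttur_mult = ttur_mult,  # forward the new flag into the model config
    )
```

Assuming the repo's TTUR code works the usual way, --ttur-mult 2.0 then trains the discriminator with twice the learning rate of the generator.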

Other methods greatly depend on your dataset and require code modifications (such as some kinds of regularizations during the training process).


demiahmed commented Jun 29, 2022

Thanks for all the suggestions. I am trying out a combination of all measures.

My default --gradient-accumulate-every is 4. Does higher gradient accumulation imitate a larger batch size?

I'm using an RTX 3080 with 10 GB of VRAM and a dataset of 4.3k images, so I can't push my batch size beyond 8.

iScriptLex commented:

> Does higher gradient accumulation imitate a larger batch size?

Yes, it does. You can set --gradient-accumulate-every 8 or even more.
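
To see why, here is a toy PyTorch sketch (not this repo's actual training loop) of what gradient accumulation does:

```python
import torch

# Toy sketch (not this repo's training loop): gradients from several small
# micro-batches are summed before a single optimizer step, which approximates
# one large batch without needing the extra VRAM.
model = torch.nn.Linear(128, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
accumulate_every = 8          # e.g. --gradient-accumulate-every 8

opt.zero_grad()
for step in range(1, 1001):
    x = torch.randn(8, 128)                # micro-batch of 8 (fits in 10 GB)
    loss = model(x).mean()
    (loss / accumulate_every).backward()   # scale so summed grads average out
    if step % accumulate_every == 0:
        opt.step()                         # effective batch size: 8 * 8 = 64
        opt.zero_grad()
```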
