
Discriminator Loss converges to 0 while Generator loss pretty high #133

demiahmed opened this issue Jun 28, 2022 · 3 comments

I am trying to train on a custom image dataset for about 600,000 epochs. At about the halfway point, my D_loss converges to 0 while my G_loss stays put at 2.5.

My evaluation outputs are slowly starting to fade out to either black or white.

Is there anything I could do to tweak my model, either by increasing the threshold for the Discriminator or by training only the Generator?


iScriptLex commented Jun 28, 2022

This is a form of vanishing gradients in a GAN. It means the generator has reached its limit on your dataset and is starting to rearrange its capacity by dropping some rare modes. So, with each iteration the generator's output will lose more and more diversity. Like this:
[screenshot: a grid of generated samples that look nearly identical]
Technically the output images are not identical, but they look too similar and contain only a few of the dataset's features.
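
To illustrate the vanishing-gradient point, here is a toy PyTorch snippet (not this repo's code, just the textbook minimax loss) showing that once the discriminator confidently rejects fakes, almost no gradient reaches the generator:

```python
import torch

# Toy illustration only (not this project's loss): with the original minimax
# objective the generator minimises log(1 - sigmoid(logit)), where `logit` is
# the discriminator's raw score for a generated image. As the discriminator
# gets confident (logit very negative, D loss near 0), the gradient reaching
# the generator shrinks toward zero; that is the vanishing gradient above.
for logit in (0.0, -2.0, -6.0, -12.0):
    a = torch.tensor(logit, requires_grad=True)
    loss = torch.log(1 - torch.sigmoid(a))
    loss.backward()
    # d/da log(1 - sigmoid(a)) = -sigmoid(a), which -> 0 as D rejects fakes
    print(f"D(G(z)) = {torch.sigmoid(a).item():.6f}   grad = {a.grad.item():.6f}")
```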

It could mean that your dataset is too complicated, unbalanced or just too small.

There are several ways to deal with it.

  1. Improve your dataset: add more images, remove outliers that differ too much from most of the pictures, etc.
  2. Reduce the learning rate: --learning-rate 1e-4 or even --learning-rate 1e-5 (of course, it should not be reduced from the start of training, but only once your discriminator loss drops too low).
  3. Continue your training with an increased batch size: --batch-size 64
    If you don't have enough VRAM for that, use gradient accumulation with your original batch size:
    --gradient-accumulate-every 2
  4. Use TTUR (two time-scale update rule). This GAN contains code for it, but for some reason it is not exposed in the list of input parameters, so you have to modify cli.py yourself (see the sketch just after this list).

In cli.py, after the line def train_from_folder( add this to the parameter list:
ttur_mult = 1.0,
and after model_args = dict( add ttur_mult = ttur_mult, to that dict.

Then, use it like this:
--ttur-mult 2.0

  5. Add more augmentation: --aug-prob 0.6 or even --aug-prob 0.8
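
To make point 4 concrete, here is a minimal sketch of that cli.py change; everything except the ttur_mult lines is an illustrative placeholder for whatever your version of train_from_folder already contains:

```python
# cli.py (sketch only): the parameters shown besides ttur_mult are
# illustrative; keep whatever train_from_folder already contains in your copy.
def train_from_folder(
    data = './data',
    results_dir = './results',
    # ... all other existing parameters stay here ...
    ttur_mult = 1.0,            # new: exposes the existing TTUR multiplier
):
    model_args = dict(
        # ... all other existing arguments stay here ...
        ttur_mult = ttur_mult,  # forward the new flag into the model config
    )
```

Assuming the repo's TTUR code works the usual way, --ttur-mult 2.0 then trains the discriminator with twice the learning rate of the generator.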

Other methods greatly depend on your dataset and require code modifications (such as some kinds of regularizations during the training process).


demiahmed commented Jun 29, 2022

Thanks for all the suggestions. I am trying out a combination of all measures.

My default --gradient-accumulate-every is 4. Does higher gradient accumulation imitate a larger batch size?

I'm using an RTX 3080 with 10 GB of VRAM and a dataset of 4.3k images, so I can't push my batch size beyond 8.

iScriptLex commented:

> Does higher gradient accumulation imitate a larger batch size?

Yes, it does. You can set --gradient-accumulate-every 8 or even more.
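
To see why, here is a toy PyTorch sketch (not this repo's actual training loop) of what gradient accumulation does:

```python
import torch

# Toy sketch (not this repo's training loop): gradients from several small
# micro-batches are summed before a single optimizer step, which approximates
# one large batch without needing the extra VRAM.
model = torch.nn.Linear(128, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
accumulate_every = 8          # e.g. --gradient-accumulate-every 8

opt.zero_grad()
for step in range(1, 1001):
    x = torch.randn(8, 128)                # micro-batch of 8 (fits in 10 GB)
    loss = model(x).mean()
    (loss / accumulate_every).backward()   # scale so summed grads average out
    if step % accumulate_every == 0:
        opt.step()                         # effective batch size: 8 * 8 = 64
        opt.zero_grad()
```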
