It seems training will collapse when applying another Dataset #19

Open
schoengzc opened this issue Dec 16, 2018 · 18 comments

Comments

@schoengzc

I have trained the model using another content dataset with the given style images of Monet, but training soon collapses and the model outputs entirely black stylized images.
I've tried removing the image augmentation step and the scipy.misc.imresize() call, but it still does not work with this content dataset (150,000 jpg images, generally 1800+ pixels).
Could you please give me some tips or suggestions about this issue, such as trying another learning rate or discriminator success rate?
Thanks in advance for your time.

@dimakot55
Member

dimakot55 commented Dec 16, 2018

We have also experienced similar issues with some datasets. To the best of my knowledge, this is caused by one of the following:

  1. Numerical instability of the loss function self.loss = sce_criterion in model.py (sce_criterion is defined in module.py), which relies on tf.nn.sigmoid_cross_entropy_with_logits and therefore involves computing exp(x). This is unlikely, though, since the TF team has checked a function that is so commonly used in practice.
  2. More likely, in my opinion, an overflow of some convolutional kernel weight somewhere inside the network, which then corrupts all the other weights in a single update step. Frankly, I'm still not sure about this explanation.
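
For reference, the TensorFlow documentation for tf.nn.sigmoid_cross_entropy_with_logits describes the loss as being computed in the form max(x, 0) - x * z + log(1 + exp(-|x|)), which avoids overflow in exp(x). The plain-Python sketch below compares that form with a naive implementation. The function names are made up for illustration, and none of this is code from the repository.

```python
import math


def naive_sigmoid_cross_entropy(x, z):
    """Direct formula: breaks for large |x| (exp overflow or log(0))."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -z * math.log(p) - (1.0 - z) * math.log(1.0 - p)


def stable_sigmoid_cross_entropy(x, z):
    """max(x, 0) - x*z + log(1 + exp(-|x|)): exp only sees arguments <= 0."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


# Both versions agree for a moderate logit.
print(naive_sigmoid_cross_entropy(2.0, 1.0))
print(stable_sigmoid_cross_entropy(2.0, 1.0))

# Only the stable version survives an extreme logit.
print(stable_sigmoid_cross_entropy(-1000.0, 1.0))
```

With a logit of -1000 the naive version raises OverflowError, while the stable form still returns a finite loss, which is consistent with the first cause being the less likely one.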

We've noticed that this happens when training on style datasets of especially complicated artists (those where local structure and texture are less prominent, and the composition and content of the painting matter most).
An easy trick that helped us with this issue: restart training from the last saved step at which the model had not yet been corrupted.
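
A minimal sketch of that trick, assuming you can load the weights saved at each step and inspect them: pick the latest step whose values are all finite and restart from it. The `snapshots` dictionary here is a hypothetical in-memory stand-in for weights read from the saved checkpoints, not something the repository provides.

```python
import math


def last_finite_step(snapshots):
    """Return the highest step whose weights are all finite, or None."""
    good_steps = [
        step
        for step, weights in snapshots.items()
        if all(math.isfinite(w) for w in weights)
    ]
    return max(good_steps) if good_steps else None


# Hypothetical data: the weights saved at step 30000 already contain a NaN.
snapshots = {
    10000: [0.10, -0.30],
    20000: [0.20, 1.50],
    30000: [float("nan"), 0.40],
}
print(last_finite_step(snapshots))
```

In practice the values would come from whatever checkpoint format the training script writes; the selection logic stays the same.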

@schoengzc
Author

schoengzc commented Dec 17, 2018

Thanks very much for your detailed reply.
By the way, the performance of your style transfer is amazingly good. I hope I can learn more and solve the unstable training issue someday.

@narayansundararajan123

I am running into the same issue of entirely black stylized output when I train on an art style dataset of 144 black-and-white paintings of different sizes. I used --image_size=256 due to the limitations of my hardware and ran the training for 30000 iterations. I would really appreciate some help, especially on how to restart training from the point of corruption, or on other things that would be good to try. Thanks a lot.

@eps696

eps696 commented Jan 25, 2019

i've managed to solve the collapsed black output problem by:

  1. adding dropout to the residual blocks (in the training phase only):
        def residual_block(x, dim, k=3, s=1, dropout=0, name='res'):
            . . . 
            if dropout > 0: y = tf.nn.dropout(y, keep_prob = 1-dropout)
            return y + x

        # stack 9 residual blocks
        nf = features.get_shape().as_list()[-1]
        r1 = residual_block(features, nf, dropout=0,       name='g_r1')
        r2 = residual_block(r1,       nf, dropout=dropout, name='g_r2')
        r3 = residual_block(r2,       nf, dropout=dropout, name='g_r3')
        r4 = residual_block(r3,       nf, dropout=dropout, name='g_r4')
        r5 = residual_block(r4,       nf, dropout=dropout, name='g_r5')
        r6 = residual_block(r5,       nf, dropout=dropout, name='g_r6')
        r7 = residual_block(r6,       nf, dropout=dropout, name='g_r7')
        r8 = residual_block(r7,       nf, dropout=dropout, name='g_r8')
        r9 = residual_block(r8,       nf, dropout=0,       name='g_r9')

  2. adding progressive soft labels to the discriminator losses:
        def ones(x, key):
            return tf.ones_like(x) - tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
        def zeros(x, key):
            return tf.zeros_like(x) + tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
            
        # Discriminator losses - ones for original styles, otherwise zero
        in_style_D_loss  = {key: loss(pred,  ones(pred, key)) * s_weight[key] for key, pred in zip(in_style_D_pred.keys(),  in_style_D_pred.values())}
        in_content_D_loss  = {key: loss(pred, zeros(pred, key)) * s_weight[key] for key, pred in zip(in_content_D_pred.keys(),  in_content_D_pred.values())}
        out_content_D_loss = {key: loss(pred, zeros(pred, key)) * s_weight[key] for key, pred in zip(out_content_D_pred.keys(), out_content_D_pred.values())}

i also removed the win-rate-based training schedule for now (leaving just one G pass and one D pass per iteration, with no accuracy calculation), but will check again later whether it was an issue
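
To make the effect of the progressive soft labels concrete, the plain-Python sketch below mirrors the `ones`/`zeros` helpers above for a single label. It assumes the discriminator scale keys end in a digit (such as 's0' to 's6'), so that `float(key[-1]) * 0.03` widens the noise range at deeper scales. The helper names are illustrative only.

```python
import random

NOISE_STEP = 0.03  # same per-scale step as in the snippet above


def soft_one(key, rng=random):
    """Label for real style images: 1 minus uniform noise in [0, digit * 0.03]."""
    return 1.0 - rng.uniform(0.0, float(key[-1]) * NOISE_STEP)


def soft_zero(key, rng=random):
    """Label for content/stylized images: 0 plus uniform noise in [0, digit * 0.03]."""
    return 0.0 + rng.uniform(0.0, float(key[-1]) * NOISE_STEP)


# Print the label range each (assumed) scale key would produce.
for key in ["s0", "s1", "s3", "s5", "s6"]:
    width = float(key[-1]) * NOISE_STEP
    print("%s: real labels in [%.2f, 1.00], fake labels in [0.00, %.2f]"
          % (key, 1.0 - width, width))
```

The first scale keeps hard 0/1 labels, while the deepest one can see targets as soft as roughly 0.82 and 0.18, which makes it harder for the discriminator to become overconfident.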

@narayansundararajan123

Great! Would it be possible to share the modified files so that I can quickly rerun the training on my art dataset and see whether it works?

@eps696

eps696 commented Jan 25, 2019

@narayansundararajan123 well, i've refactored the whole code quite heavily in a way that i'm more used to, so it's rather different from the original repo now, including names, vars, module structure, utility functions, etc.
i will try to apply the same changes to the original code and post those pieces if the snippets in the post above are not enough. alas, i don't really use git, so i cannot provide a proper fork.

the applied changes are fairly standard GAN tricks to 'slow down' or 'distract' the discriminator when it trains much faster than the generator (which is the reason for the collapse; it is clearly visible in the behaviour of the D losses in tensorboard).

and btw, i also completely removed the accuracy calculation and the win-rate-based training schedule, because the model never converged with it (and converged perfectly without).

@eps696

eps696 commented Jan 26, 2019

@narayansundararajan123 ok, let's try these quick updates for the original code:

module.py, in decoder()

        def residule_block(x, dim, ks=3, s=1, dropout=False, name='res'):
            p = int((ks - 1) / 2)
            y = tf.pad(x, [[0, 0], [p, p], [p, p], [0, 0]], "REFLECT")
            y = instance_norm(conv2d(y, dim, ks, s, padding='VALID', name=name+'_c1'), name+'_bn1')
            y = tf.pad(tf.nn.relu(y), [[0, 0], [p, p], [p, p], [0, 0]], "REFLECT")
            y = instance_norm(conv2d(y, dim, ks, s, padding='VALID', name=name+'_c2'), name+'_bn2')
            if dropout is True and options.is_training is True: 
                y = tf.nn.dropout(y, 0.5)
            return y + x

        # Now stack 9 residual blocks
        num_kernels = features.get_shape().as_list()[-1]
        r1 = residule_block(features, num_kernels, name='g_r1')
        r2 = residule_block(r1, num_kernels, dropout=True, name='g_r2')
        r3 = residule_block(r2, num_kernels, dropout=True, name='g_r3')
        r4 = residule_block(r3, num_kernels, dropout=True, name='g_r4')
        r5 = residule_block(r4, num_kernels, dropout=True, name='g_r5')
        r6 = residule_block(r5, num_kernels, dropout=True, name='g_r6')
        r7 = residule_block(r6, num_kernels, dropout=True, name='g_r7')
        r8 = residule_block(r7, num_kernels, dropout=True, name='g_r8')
        r9 = residule_block(r8, num_kernels, name='g_r9')

model.py, in _build_model()

            def ones(x, key):
                return tf.ones_like(x) - tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
            def zeros(x, key):
                return tf.zeros_like(x) + tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
                
            self.input_painting_discr_loss = {key: self.loss(pred, ones(pred, key)) * scale_weight[key]
                                              for key, pred in zip(self.input_painting_discr_predictions.keys(),
                                                                   self.input_painting_discr_predictions.values())}
            self.input_photo_discr_loss = {key: self.loss(pred, zeros(pred, key)) * scale_weight[key]
                                           for key, pred in zip(self.input_photo_discr_predictions.keys(),
                                                                self.input_photo_discr_predictions.values())}
            self.output_photo_discr_loss = {key: self.loss(pred, zeros(pred, key)) * scale_weight[key]
                                            for key, pred in zip(self.output_photo_discr_predictions.keys(),
                                                                 self.output_photo_discr_predictions.values())}

model.py, in train()

replace this

            if discr_success >= win_rate:
                # Train generator
                _, summary_all, gener_acc_ = self.sess.run(
                    [self.g_optim_step, self.summary_merged_all, self.gener_acc],
                    feed_dict={
                        self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                        self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                        self.lr: self.options.lr
                    })
                discr_success = discr_success * (1. - alpha) + alpha * (1. - gener_acc_)
            else:
                # Train discriminator.
                _, summary_all, discr_acc_ = self.sess.run(
                    [self.d_optim_step, self.summary_merged_all, self.discr_acc],
                    feed_dict={
                        self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                        self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                        self.lr: self.options.lr
                    })
                discr_success = discr_success * (1. - alpha) + alpha * discr_acc_

with this

            # Train generator
            _, summary_all = self.sess.run(
                [self.g_optim_step, self.summary_merged_all],
                feed_dict={
                    self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                    self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                    self.lr: self.options.lr
                })
            # Train discriminator.
            _, summary_all = self.sess.run(
                [self.d_optim_step, self.summary_merged_all],
                feed_dict={
                    self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                    self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                    self.lr: self.options.lr
                })

if you use the last 'fix', you can also comment out everything related to accuracy measurement/reporting.
alas, i cannot make a test run with it, because i don't have that huge places dataset (i use another, smaller one). let me know how it goes on your side
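
To see why the win-rate schedule can matter, here is a small plain-Python simulation of the branch that the replacement above removes. It is only a sketch: the accuracies are fed in by hand, and the defaults win_rate=0.8 and alpha=0.05 are assumptions about the original options rather than values taken from this thread.

```python
def winrate_schedule(accuracies, win_rate=0.8, alpha=0.05):
    """Return 'G' or 'D' per step, following the original if/else in train().

    accuracies: iterable of (gener_acc, discr_acc) pairs, one per step.
    """
    discr_success = win_rate  # assumed starting value
    plan = []
    for gener_acc, discr_acc in accuracies:
        if discr_success >= win_rate:
            plan.append("G")  # generator step
            discr_success = discr_success * (1. - alpha) + alpha * (1. - gener_acc)
        else:
            plan.append("D")  # discriminator step
            discr_success = discr_success * (1. - alpha) + alpha * discr_acc
    return "".join(plan)


# Discriminator always wins: only the generator gets trained.
print(winrate_schedule([(0.0, 1.0)] * 5))
# Generator always wins: only the discriminator gets trained after step one.
print(winrate_schedule([(1.0, 0.0)] * 5))
```

In this toy run the schedule locks onto one network whenever the other keeps winning. The replacement above instead runs one G step and one D step per iteration regardless of accuracy.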

@narayansundararajan123

Thanks so much! Will try and let you know.

@narayansundararajan123

Hi

I tried training again on the art style dataset of 144 black-and-white paintings of different sizes, using --image_size=256, and ran the training for 30000 iterations with the new modifications. Unfortunately, I am still running into the same issue: the stylized output images are entirely black. Is there anything I might have missed beyond the modifications above, or do you have other suggestions for solving this issue?

@eps696

eps696 commented Feb 12, 2019

@narayansundararajan123 the other changes were quite subtle (like tweaking the loss weights for D and G separately), so i don't think they really matter. i also swapped some technical ops (like data loading) for ones i'm used to, but that was for easier reading/maintenance; i doubt it could affect the result.
could you share your dataset so that i can try it on my side (if it's not private, of course)?

@narayansundararajan123

Thanks. I also noticed that beyond 210000 iterations, when the model has likely gone off, I get the following warning when I run inference, and the stylized output images are black:

RuntimeWarning: invalid value encountered in reduce
return umr_maximum(a, axis, None, out, keepdims, initial)

I can also share the dataset if you could send me an email at [email protected].

@eps696

eps696 commented Feb 13, 2019

haven't seen such warnings.
in fact, my fixes are not a 100% remedy: i was also getting black output on some datasets, but it happened much later than with the original code (like, 200k vs 1020k). these tricks just stabilize training for a longer time; whether the model converges within that period is a separate question in each case

@andrew194

@eps696 Can you tell me what discriminator, transformer loss, and feature loss weights you used? I can't get the GAN to converge.

@eps696

eps696 commented Mar 10, 2019

@andrew194
s_d_weight = {"s0": 1., "s1": 1., "s3": 0.5, "s5": 0.5, "s6": 0.5}
s_g_weight = {"s0": 1., "s1": 0.7, "s3": 0.3, "s5": 0.3, "s6": 0.3}
kept feature loss as in the original code (l1_loss * 100)

@andrew194

@eps696 Thanks! Did you also use 1 for the discriminator loss weight?

@eps696

eps696 commented Mar 10, 2019

didn't quite catch what you mean by 'use 1'
s_d_weight are discriminator loss weights
s_g_weight are generator loss weights
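
As a rough illustration of how such per-scale weights enter a total loss, the sketch below applies the two dictionaries quoted above to a made-up set of per-scale losses. The `weighted_total` helper and the unit loss values are hypothetical, and plain floats stand in for the TensorFlow tensors used in the real model.

```python
s_d_weight = {"s0": 1., "s1": 1., "s3": 0.5, "s5": 0.5, "s6": 0.5}
s_g_weight = {"s0": 1., "s1": 0.7, "s3": 0.3, "s5": 0.3, "s6": 0.3}


def weighted_total(per_scale_loss, weights):
    """Sum the per-scale losses, each multiplied by its scale weight."""
    return sum(per_scale_loss[key] * weights[key] for key in weights)


# Hypothetical case: every discriminator scale reports the same loss of 1.0.
unit_losses = {key: 1.0 for key in s_d_weight}
print("D total:", weighted_total(unit_losses, s_d_weight))
print("G total:", weighted_total(unit_losses, s_g_weight))
```

With equal per-scale losses, the generator's adversarial term ends up smaller than the discriminator's, because the deeper scales are down-weighted more strongly in s_g_weight.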

@andrew194

Sorry, I was referring to the optimizer:
self.d_optim_step = tf.train.AdamOptimizer(self.lr).minimize(loss=self.options.discr_loss_weight * self.discr_loss,var_list=[self.discr_vars])

@eps696

eps696 commented Mar 10, 2019

the per-scale weights are already applied to the losses beforehand, so there's no need for another multiplier.
here is my code (var names are different, but should be pretty obvious):

        # Discriminator losses - ones for original styles, otherwise zero
        in_s_D_loss  = {key: loss(pred,  ones(pred, key)) * s_d_weight[key] for key, pred in zip(in_s_D_pred.keys(),  in_s_D_pred.values())}
        in_c_D_loss  = {key: loss(pred, zeros(pred, key)) * s_d_weight[key] for key, pred in zip(in_c_D_pred.keys(),  in_c_D_pred.values())}
        out_c_D_loss = {key: loss(pred, zeros(pred, key)) * s_d_weight[key] for key, pred in zip(out_c_D_pred.keys(), out_c_D_pred.values())}

        D_loss = tf.add_n(list(in_s_D_loss.values())) + \
                 tf.add_n(list(in_c_D_loss.values())) + \
                 tf.add_n(list(out_c_D_loss.values()))

        # Generator loss - ones for output images
        out_c_G_loss = {key: loss(pred, tf.ones_like(pred)) * s_g_weight[key] for key, pred in zip(out_c_D_pred.keys(), out_c_D_pred.values())}
        G_loss = tf.add_n(list(out_c_G_loss.values()))

        # Image loss.
        img_loss = mse_loss(t_block(out_c, 10), t_block(in_c, 10))

        # Features loss.
        feat_loss = l1_loss(out_c_feat, in_c_feat) 

        t_vars = tf.trainable_variables()
        D_vars = [var for var in t_vars if 'discriminator' in var.name]
        G_vars = [var for var in t_vars if 'encoder' in var.name or 'decoder' in var.name]

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

        with tf.control_dependencies(update_ops):
            D_opt_step = tf.train.AdamOptimizer(a.lr).minimize(D_loss, var_list = [D_vars])
            G_opt_step = tf.train.AdamOptimizer(a.lr).minimize(G_loss + img_loss*100 + feat_loss*100, var_list=[G_vars])
