It seems training will collapse when applying another Dataset #19

schoengzc · 2018-12-16T08:18:12Z

I have trained the model on a different content dataset with the provided Monet style images, but training soon collapses and the output is entirely black stylized images.
I've tried disabling the image augmentation and scipy.misc.imresize(), but it still does not work with this content dataset (150,000 jpg images, generally 1800+ pixels).
Could you please give me some tips or suggestions on this issue, such as trying a different learning rate or discriminator success rate?
Thanks in advance for your time.

dimakot55 · 2018-12-16T23:41:45Z

We have also experienced similar issues on some datasets. To the best of my knowledge, this is caused by one of the following:

1. Numerical instability of the loss function self.loss = sce_criterion in model.py (sce_criterion is defined in module.py), which relies on tf.nn.sigmoid_cross_entropy_with_logits and therefore involves computing exp(x). This is unlikely, though, since the TF team has surely tested such a commonly used function.
2. More likely, in my opinion, an overflow of some convolutional kernel weight somewhere inside the network, which corrupts all the other weights in a single update step. Frankly, I'm still not sure about this explanation.

We've noticed that this happens when training on style datasets of especially complicated artists (those where local structure and texture are less prominent, and the composition and content of the painting matter most).
An easy trick that helped us with this issue: restart training from the last saved checkpoint at which the model was not yet corrupted.
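
As a side note on the first hypothesis, here is a minimal plain-Python sketch (not the repository's code) of why the TensorFlow op is unlikely to be the culprit. The TensorFlow documentation for tf.nn.sigmoid_cross_entropy_with_logits describes the loss in the rearranged form max(x, 0) - x*z + log(1 + exp(-|x|)), which never evaluates exp of a large positive number. The function names below are illustrative only.

```python
import math


def naive_sigmoid_cross_entropy(logit, label):
    """Textbook form: -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x))."""
    # math.exp(-logit) overflows when logit is a large negative number,
    # and log(1 - p) hits log(0) when logit is a large positive number.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)


def stable_sigmoid_cross_entropy(logit, label):
    """Rearranged form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    # exp is only ever applied to a non-positive number, so it cannot overflow.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
```

For moderate logits both functions agree; for extreme logits only the rearranged form returns a finite value, which is consistent with the suspicion that the instability comes from elsewhere.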

schoengzc · 2018-12-17T06:36:16Z

Thanks very much for your detailed reply.
BTW, I have to say the performance of your style transfer is amazingly good. I hope I can learn more and solve the unstable training issues some day.

narayansundararajan123 · 2019-01-24T02:32:30Z

I am running into the same issue of entirely black stylized output images when I train on an art style dataset of 144 black and white paintings of different sizes. I used --image_size=256 due to the limitations of my hardware and ran the training for 30000 iterations. I would really appreciate some help, especially on how to restart training from the point of corruption, or on other things that would be good to try. Thanks much.
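
One possible way to pick the restart point, sketched under assumptions: if loss values are logged per step and the corruption shows up as a non-finite loss, the checkpoint to resume from is the newest one saved before the first NaN/inf. The log format and the helper name last_good_checkpoint are hypothetical, not part of the repository, and a collapse to black images need not coincide with a non-finite loss.

```python
import math


def last_good_checkpoint(loss_log, checkpoint_steps):
    """Return the newest checkpoint step saved before the first non-finite loss.

    loss_log: list of (step, loss) pairs in training order (hypothetical format).
    checkpoint_steps: steps at which checkpoints were saved.
    Returns None when no checkpoint precedes the corruption.
    """
    # First step at which the logged loss is NaN or infinite, if any.
    bad_step = next(
        (step for step, loss in loss_log if not math.isfinite(loss)), None
    )
    if bad_step is None:
        # Nothing went wrong in the log: the newest checkpoint is fine.
        return max(checkpoint_steps, default=None)
    good = [s for s in checkpoint_steps if s < bad_step]
    return max(good, default=None)
```

Since the weights may already be degrading some time before the loss turns non-finite, it can be worth going back one checkpoint further than the one this returns.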

eps696 · 2019-01-25T08:05:34Z

i've managed to solve the collapsed black output problem by:

adding dropouts to residual blocks (on training phase only):

        def residual_block(x, dim, k=3, s=1, dropout=0, name='res'):
            . . . 
            if dropout > 0: y = tf.nn.dropout(y, keep_prob = 1-dropout)
            return y + x

        # stack 9 residual blocks
        nf = features.get_shape().as_list()[-1]
        r1 = residual_block(features, nf, dropout=0,       name='g_r1')
        r2 = residual_block(r1,       nf, dropout=dropout, name='g_r2')
        r3 = residual_block(r2,       nf, dropout=dropout, name='g_r3')
        r4 = residual_block(r3,       nf, dropout=dropout, name='g_r4')
        r5 = residual_block(r4,       nf, dropout=dropout, name='g_r5')
        r6 = residual_block(r5,       nf, dropout=dropout, name='g_r6')
        r7 = residual_block(r6,       nf, dropout=dropout, name='g_r7')
        r8 = residual_block(r7,       nf, dropout=dropout, name='g_r8')
        r9 = residual_block(r8,       nf, dropout=0,       name='g_r9')

adding progressive soft labels to discriminator losses:

        def ones(x, key):
            return tf.ones_like(x) - tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
        def zeros(x, key):
            return tf.zeros_like(x) + tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
            
        # Discriminator losses - ones for original styles, otherwise zero
        in_style_D_loss  = {key: loss(pred,  ones(pred, key)) * s_weight[key] for key, pred in zip(in_style_D_pred.keys(),  in_style_D_pred.values())}
        in_content_D_loss  = {key: loss(pred, zeros(pred, key)) * s_weight[key] for key, pred in zip(in_content_D_pred.keys(),  in_content_D_pred.values())}
        out_content_D_loss = {key: loss(pred, zeros(pred, key)) * s_weight[key] for key, pred in zip(out_content_D_pred.keys(), out_content_D_pred.values())}

i also removed the winrate-based training schedule for now (leaving just one G and one D pass per iteration, no accuracy calculation), but will check later whether it was actually an issue
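
To make the 'progressive' part of the soft labels concrete, here is a small plain-Python sketch of the same arithmetic as ones()/zeros() above, applied to single values instead of tensors. The noise width is float(key[-1]) * 0.03, so it grows with the digit at the end of the scale key. The helper names are illustrative, and the key names "s0" to "s6" are taken from the weight dictionaries posted later in this thread.

```python
import random


def noise_width(key):
    """Upper bound of the label noise for a scale key such as 's3'."""
    return float(key[-1]) * 0.03


def soft_one(key, rng):
    """Target for 'real' predictions: 1 minus uniform noise."""
    return 1.0 - rng.uniform(0.0, noise_width(key))


def soft_zero(key, rng):
    """Target for 'fake' predictions: 0 plus uniform noise."""
    return 0.0 + rng.uniform(0.0, noise_width(key))


def label_ranges(keys):
    """Map each key to (lowest possible 'one' label, highest possible 'zero' label)."""
    return {k: (1.0 - noise_width(k), noise_width(k)) for k in keys}
```

With these keys, scale "s0" keeps hard 0/1 labels, while "s6" gets targets spread over roughly 0.82 to 1.0 and 0.0 to 0.18.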

narayansundararajan123 · 2019-01-25T21:12:16Z

Great! Would it be possible to share the modified files, so that I can quickly rerun the training on my art dataset and see whether it works?

eps696 · 2019-01-25T23:50:10Z

@narayansundararajan123 well, i've refactored the whole code quite a lot, in a way that i'm more used to, so it's rather different from the original repo now, including names, vars, module structure, utility functions, etc.
i will try to apply the same changes to the original code and post those pieces, if the snippets in the post above are not enough. alas, i don't really use git, so i cannot provide a proper fork.

the applied changes are fairly standard GAN tricks to 'slow down' or 'distract' the discriminator when it trains much faster than the generator (which is the reason for the collapse; it is clearly visible in the behaviour of the D losses in tensorboard).

and btw i also completely removed all accuracy calculation and the winrate-based training schedule, because the model never converged with it (and converged perfectly without).

eps696 · 2019-01-26T00:41:29Z

@narayansundararajan123 ok, let's try these quick updates for the original code:

module.py, in decoder()

        def residule_block(x, dim, ks=3, s=1, dropout=False, name='res'):
            p = int((ks - 1) / 2)
            y = tf.pad(x, [[0, 0], [p, p], [p, p], [0, 0]], "REFLECT")
            y = instance_norm(conv2d(y, dim, ks, s, padding='VALID', name=name+'_c1'), name+'_bn1')
            y = tf.pad(tf.nn.relu(y), [[0, 0], [p, p], [p, p], [0, 0]], "REFLECT")
            y = instance_norm(conv2d(y, dim, ks, s, padding='VALID', name=name+'_c2'), name+'_bn2')
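            # dropout on the residual branch only, and only while training
            # (the second argument of tf.nn.dropout is keep_prob in TF 1.x)
            if dropout and options.is_training:
                y = tf.nn.dropout(y, 0.5)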
            return y + x

        # Now stack 9 residual blocks
        num_kernels = features.get_shape().as_list()[-1]
        r1 = residule_block(features, num_kernels, name='g_r1')
        r2 = residule_block(r1, num_kernels, dropout=True, name='g_r2')
        r3 = residule_block(r2, num_kernels, dropout=True, name='g_r3')
        r4 = residule_block(r3, num_kernels, dropout=True, name='g_r4')
        r5 = residule_block(r4, num_kernels, dropout=True, name='g_r5')
        r6 = residule_block(r5, num_kernels, dropout=True, name='g_r6')
        r7 = residule_block(r6, num_kernels, dropout=True, name='g_r7')
        r8 = residule_block(r7, num_kernels, dropout=True, name='g_r8')
        r9 = residule_block(r8, num_kernels, name='g_r9')
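
For readers unfamiliar with what the dropout call does inside the block, here is a minimal plain-Python sketch, assuming the TF 1.x behaviour where each element is kept with probability keep_prob and the survivors are scaled by 1/keep_prob. Only the residual branch y is dropped; the skip connection x passes through untouched. The function names are illustrative, not from the repository.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Keep each element with probability keep_prob, scaled by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def residual_with_dropout(x, y, keep_prob, rng):
    """Residual sum where only the branch output y goes through dropout."""
    dropped = inverted_dropout(y, keep_prob, rng)
    return [xi + di for xi, di in zip(x, dropped)]
```

Because of the 1/keep_prob scaling, the expected value of the branch output is unchanged, so nothing needs to be rescaled when dropout is switched off at inference time.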

model.py, in _build_model()

            def ones(x, key):
                return tf.ones_like(x) - tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
            def zeros(x, key):
                return tf.zeros_like(x) + tf.random_uniform(tf.shape(x), 0., float(key[-1]) * 0.03)
                
            self.input_painting_discr_loss = {key: self.loss(pred, ones(pred, key)) * scale_weight[key]
                                              for key, pred in zip(self.input_painting_discr_predictions.keys(),
                                                                   self.input_painting_discr_predictions.values())}
            self.input_photo_discr_loss = {key: self.loss(pred, zeros(pred, key)) * scale_weight[key]
                                           for key, pred in zip(self.input_photo_discr_predictions.keys(),
                                                                self.input_photo_discr_predictions.values())}
            self.output_photo_discr_loss = {key: self.loss(pred, zeros(pred, key)) * scale_weight[key]
                                            for key, pred in zip(self.output_photo_discr_predictions.keys(),
                                                                 self.output_photo_discr_predictions.values())}

model.py, in train()

replace this

            if discr_success >= win_rate:
                # Train generator
                _, summary_all, gener_acc_ = self.sess.run(
                    [self.g_optim_step, self.summary_merged_all, self.gener_acc],
                    feed_dict={
                        self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                        self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                        self.lr: self.options.lr
                    })
                discr_success = discr_success * (1. - alpha) + alpha * (1. - gener_acc_)
            else:
                # Train discriminator.
                _, summary_all, discr_acc_ = self.sess.run(
                    [self.d_optim_step, self.summary_merged_all, self.discr_acc],
                    feed_dict={
                        self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                        self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                        self.lr: self.options.lr
                    })
                discr_success = discr_success * (1. - alpha) + alpha * discr_acc_

with this

            # Train generator
            _, summary_all = self.sess.run(
                [self.g_optim_step, self.summary_merged_all],
                feed_dict={
                    self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                    self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                    self.lr: self.options.lr
                })
            # Train discriminator.
            _, summary_all = self.sess.run(
                [self.d_optim_step, self.summary_merged_all],
                feed_dict={
                    self.input_painting: normalize_arr_of_imgs(batch_art['image']),
                    self.input_photo: normalize_arr_of_imgs(batch_content['image']),
                    self.lr: self.options.lr
                })

if you use the last 'fix', you can also comment out everything related to accuracy measurement/reporting.
alas, i cannot do a test run with it, because i don't have that huge places dataset (i use another, smaller one). let me know how it goes on your side
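
To illustrate what the removed schedule does, here is a plain-Python sketch of the same update rule as in the 'replace this' snippet above, with the accuracies held constant purely for illustration. The default values of win_rate and alpha and the starting value of discr_success are assumptions, since they are not shown in this thread.

```python
def simulate_winrate_schedule(discr_acc, gener_acc, steps, win_rate=0.8, alpha=0.05):
    """Count generator and discriminator updates under the win-rate schedule.

    discr_acc and gener_acc are constants here; in real training they
    change at every step.
    """
    discr_success = win_rate  # assumed starting value
    g_steps = d_steps = 0
    for _ in range(steps):
        if discr_success >= win_rate:
            # Discriminator is "winning": train the generator.
            g_steps += 1
            discr_success = discr_success * (1.0 - alpha) + alpha * (1.0 - gener_acc)
        else:
            # Discriminator is "losing": train the discriminator.
            d_steps += 1
            discr_success = discr_success * (1.0 - alpha) + alpha * discr_acc
    return g_steps, d_steps
```

With constant accuracies the moving average settles on one side of win_rate, so one of the two networks gets almost all of the updates. The replacement code avoids this by running one G step and one D step on every iteration.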

narayansundararajan123 · 2019-01-26T03:16:12Z

Thanks so much! Will try and let you know.

narayansundararajan123 · 2019-02-12T18:44:50Z

Hi

I tried training again on the art style dataset of 144 black and white paintings of different sizes, with --image_size=256, and ran the training for 30000 iterations with the new modifications. Unfortunately I am still running into the same issue of entirely black stylized output images. Is there anything I might have missed implementing besides the modifications above, or do you have other suggestions for solving this issue?

eps696 · 2019-02-12T21:11:57Z

@narayansundararajan123 other changes were quite subtle (like tweaking loss weights for D and G separately), so i don't think they really matter. i also changed some technical ops (like loading data) for the ones i'm used to, but this was done for easier reading/maintaining, i doubt it could affect the result.
could you share your dataset so that i can try it on my side (if it's not private, of course)?

narayansundararajan123 · 2019-02-12T23:36:43Z

Thanks. I also noticed that beyond 210000 iterations, when the model has likely diverged, I get the following warning

RuntimeWarning: invalid value encountered in reduce
return umr_maximum(a, axis, None, out, keepdims, initial)

when I run inference, and the stylized output images are black.

I can also share the dataset if you could send me an email at [email protected].

eps696 · 2019-02-13T00:17:50Z

haven't seen such warnings..
in fact, my fixes are not a 100% remedy: i was also getting black output on some datasets, but it happened much later than with the original code (like ~200k vs 10~20k). these tricks just stabilize training for a longer time; whether the model converges within that period is a separate question in every case

andrew194 · 2019-03-10T10:20:06Z

@eps696 Can you tell me which discriminator, transformer loss and feature loss weights you used? I can't get the GAN to converge.

eps696 · 2019-03-10T11:06:57Z

@andrew194
s_d_weight = {"s0": 1., "s1": 1., "s3": 0.5, "s5": 0.5, "s6": 0.5}
s_g_weight = {"s0": 1., "s1": 0.7, "s3": 0.3, "s5": 0.3, "s6": 0.3}
kept feature loss as in the original code (l1_loss * 100)

andrew194 · 2019-03-10T11:31:25Z

@eps696 Thanks! Did you also use 1 for the discriminator loss weight?

eps696 · 2019-03-10T11:37:34Z

didn't quite catch what you mean by 'use 1'
s_d_weight are discriminator loss weights
s_g_weight are generator loss weights

andrew194 · 2019-03-10T11:40:56Z

Sorry, I was referring to the optimizer:
self.d_optim_step = tf.train.AdamOptimizer(self.lr).minimize(loss=self.options.discr_loss_weight * self.discr_loss,var_list=[self.discr_vars])

eps696 · 2019-03-10T12:11:56Z

the per-scale weights are already applied to the losses beforehand, so there is no need for another multiplier.
here is my code (var names are different, but should be pretty obvious):

        # Discriminator losses - ones for original styles, otherwise zero
        in_s_D_loss  = {key: loss(pred,  ones(pred, key)) * s_d_weight[key] for key, pred in zip(in_s_D_pred.keys(),  in_s_D_pred.values())}
        in_c_D_loss  = {key: loss(pred, zeros(pred, key)) * s_d_weight[key] for key, pred in zip(in_c_D_pred.keys(),  in_c_D_pred.values())}
        out_c_D_loss = {key: loss(pred, zeros(pred, key)) * s_d_weight[key] for key, pred in zip(out_c_D_pred.keys(), out_c_D_pred.values())}

        D_loss = tf.add_n(list(in_s_D_loss.values())) + \
                 tf.add_n(list(in_c_D_loss.values())) + \
                 tf.add_n(list(out_c_D_loss.values()))

        # Generator loss - ones for output images
        out_c_G_loss = {key: loss(pred, tf.ones_like(pred)) * s_g_weight[key] for key, pred in zip(out_c_D_pred.keys(), out_c_D_pred.values())}
        G_loss = tf.add_n(list(out_c_G_loss.values()))

        # Image loss.
        img_loss = mse_loss(t_block(out_c, 10), t_block(in_c, 10))

        # Features loss.
        feat_loss = l1_loss(out_c_feat, in_c_feat) 

        t_vars = tf.trainable_variables()
        D_vars = [var for var in t_vars if 'discriminator' in var.name]
        G_vars = [var for var in t_vars if 'encoder' in var.name or 'decoder' in var.name]

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

        with tf.control_dependencies(update_ops):
            D_opt_step = tf.train.AdamOptimizer(a.lr).minimize(D_loss, var_list = [D_vars])
            G_opt_step = tf.train.AdamOptimizer(a.lr).minimize(G_loss + img_loss*100 + feat_loss*100, var_list=[G_vars])
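
As a worked example of how these weights combine, here is a plain-Python sketch using the s_d_weight and s_g_weight dictionaries quoted above and scalar stand-ins for the per-scale losses. The helper names and the loss values used in the usage note are hypothetical and only illustrate the arithmetic.

```python
s_d_weight = {"s0": 1.0, "s1": 1.0, "s3": 0.5, "s5": 0.5, "s6": 0.5}
s_g_weight = {"s0": 1.0, "s1": 0.7, "s3": 0.3, "s5": 0.3, "s6": 0.3}


def weighted_sum(per_scale_loss, weights):
    """Sum of per-scale losses, each multiplied by its scale weight."""
    return sum(per_scale_loss[key] * weights[key] for key in per_scale_loss)


def generator_objective(adv_per_scale, img_loss, feat_loss):
    """G_loss + img_loss*100 + feat_loss*100, as in the minimize() call above."""
    return weighted_sum(adv_per_scale, s_g_weight) + img_loss * 100 + feat_loss * 100
```

If every per-scale loss were 1.0, each discriminator term would sum to 3.5 and the adversarial generator term to 2.6, so the discriminator terms weigh more in total. The multipliers of 100 on the image and feature losses then set how much the generator objective favours reconstruction over fooling the discriminator.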
