Thanks for the code and paper.
WS and WS+GN work well compared to BN when training with fp32.
But have you tried WS or WS+GN when training with fp16 (e.g. from apex)? Should it still work as well as it does in the experiments with BN, or with BN and fp16?
We haven't done any experiments with half-precision computation, but we expect to see similar improvements if only the computation precision is changed.
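For anyone who wants to try it, here is a minimal mixed-precision sketch. It uses torch.cuda.amp rather than apex, and the model below is only a generic GroupNorm stand-in with an assumed `loader` of batches, not the repo's actual network:

```python
import torch
import torch.nn as nn

# Placeholder model: in practice this would be the repo's network with
# weight-standardized convs followed by GroupNorm.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.GroupNorm(8, 32),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()  # keeps fp16 gradients from underflowing

for images, labels in loader:  # `loader` is assumed to yield image/label batches
    images, labels = images.cuda(), labels.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass runs in mixed precision
        loss = nn.functional.cross_entropy(model(images), labels)
    scaler.scale(loss).backward()        # backward pass on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```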
@joe-siyuan-qiao Thanks for the reply!
Have you tried epsilon values other than 1e-5 in weight = (weight - mean) / (std + eps)?
E.g., would a larger epsilon such as 1e-3 or 1e-4 degrade the quality?
We haven't tried other values. 1e-5 is a commonly used eps in PyTorch, but I guess other values would not change the results much as long as they are not too big or too small.
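For reference, a minimal sketch of where eps enters weight standardization, written as a Conv2d subclass; the class name and the eps keyword argument are illustrative rather than the repo's exact API:

```python
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d that standardizes its weights before every forward pass."""
    def __init__(self, *args, eps=1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps  # the epsilon under discussion

    def forward(self, x):
        w = self.weight
        # Per output filter: subtract the mean, then divide by (std + eps).
        mean = w.view(w.size(0), -1).mean(dim=1).view(-1, 1, 1, 1)
        std = w.view(w.size(0), -1).std(dim=1).view(-1, 1, 1, 1)
        w = (w - mean) / (std + self.eps)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# e.g. a larger epsilon to compare against the default 1e-5
conv = WSConv2d(16, 32, kernel_size=3, padding=1, eps=1e-3)
```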