Thanks for the code and paper.
WS and WS+GN work well compared to BN when training with fp32.
But have you tried WS or WS+GN when training with fp16 (e.g. from apex)? Should it still work as well as it does in the experiments with BN, or with BN and fp16?
We haven't done any experiments with half-precision computation, but we expect to see similar improvements if only the computation precision is changed.
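For anyone who wants to try it, here is a minimal mixed-precision sketch. It uses torch.cuda.amp rather than apex, and the model below is only a generic GroupNorm stand-in with an assumed `loader` of batches, not the repo's actual network:

```python
import torch
import torch.nn as nn

# Placeholder model: in practice this would be the repo's network with
# weight-standardized convs followed by GroupNorm.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.GroupNorm(8, 32),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()  # keeps fp16 gradients from underflowing

for images, labels in loader:  # `loader` is assumed to yield image/label batches
    images, labels = images.cuda(), labels.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass runs in mixed precision
        loss = nn.functional.cross_entropy(model(images), labels)
    scaler.scale(loss).backward()        # backward pass on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```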
@joe-siyuan-qiao Thanks for the reply!
Have you tried epsilon values other than 1e-5 in weight = (weight - mean) / (std + eps)?
E.g., would a larger epsilon such as 1e-3 or 1e-4 degrade the quality?
We haven't tried other values. 1e-5 is a commonly used eps in PyTorch, but I guess other values would not change the results much as long as they are not too big or too small.
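For reference, a minimal sketch of where eps enters weight standardization, written as a Conv2d subclass; the class name and the eps keyword argument are illustrative rather than the repo's exact API:

```python
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d that standardizes its weights before every forward pass."""
    def __init__(self, *args, eps=1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps  # the epsilon under discussion

    def forward(self, x):
        w = self.weight
        # Per output filter: subtract the mean, then divide by (std + eps).
        mean = w.view(w.size(0), -1).mean(dim=1).view(-1, 1, 1, 1)
        std = w.view(w.size(0), -1).std(dim=1).view(-1, 1, 1, 1)
        w = (w - mean) / (std + self.eps)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# e.g. a larger epsilon to compare against the default 1e-5
conv = WSConv2d(16, 32, kernel_size=3, padding=1, eps=1e-3)
```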