WD or WD+GN with fp16 #19

Open
vadik6666 opened this issue Feb 27, 2020 · 3 comments
vadik6666 commented Feb 27, 2020

Thanks for the code and paper.
WD or WD+GN works well in comparison to BN when training with fp32.
But have you tried WD or WD+GN when training with fp16 (from apex)? Should it still work as well as in the experiments with BN, or with BN and fp16?
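For reference, a minimal sketch of what a Weight-Standardized conv could look like with the statistics kept in fp32 (a plain PyTorch module written from the paper's formula, not the repo's exact code; the class name WSConv2d and the eps argument are just for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization.
    Mean/std are computed in fp32 so the standardization stays
    numerically stable even if the rest of the model runs in fp16."""
    def __init__(self, *args, eps=1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps

    def forward(self, x):
        w = self.weight.float()                                    # fp32 statistics
        mean = w.view(w.size(0), -1).mean(dim=1).view(-1, 1, 1, 1)
        std = w.view(w.size(0), -1).std(dim=1).view(-1, 1, 1, 1)
        w = (w - mean) / (std + self.eps)
        return F.conv2d(x, w.to(self.weight.dtype), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

If I understand apex amp (O1) correctly, the parameters themselves stay in fp32, so the standardization should not be the fragile part; it is mainly the forward convolution that runs in fp16.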

joe-siyuan-qiao (Owner) commented

We haven't done any experiments with half-precision computation, but we expect to see similar improvements if only the computation precision is changed.

vadik6666 (Author) commented

@joe-siyuan-qiao Thanks for the reply!
Have you tried epsilon values other than 1e-5 in weight = weight / (std + eps)?
E.g., will the quality degrade if epsilon is larger, say 1e-3 or 1e-4?

joe-siyuan-qiao (Owner) commented

We haven't tried other values. 1e-5 is a commonly used eps in PyTorch, but I would guess other values will not change the results much as long as they are neither too big nor too small.
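A quick sanity check (illustrative only, not from the repo): eps only starts to matter once it is comparable to the per-channel std of the weights, which for common conv initializations is roughly on the order of 1e-1:

```python
import torch

# Hypothetical kernel with a typical initialization scale (~0.1 std).
w = torch.randn(64, 3, 3, 3) * 0.1
std = w.view(w.size(0), -1).std(dim=1)        # per-output-channel std

for eps in (1e-5, 1e-4, 1e-3):
    # Extra shrinkage of the standardized weights caused by eps.
    shrink = (std / (std + eps)).min().item()
    print(f"eps={eps:g}: worst-case scale factor {shrink:.4f}")
```

With std around 0.1, even eps=1e-3 only changes the standardized weights by roughly 1%, which is consistent with the expectation that moderately larger values should be harmless.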
