Two problems about paper #24
Comments
I have the same query after reading the paper. Can the authors please comment?
I also don't understand the first question. :( For the second one, I think the key is to show that a convex-looking region in the projected surface is also relatively convex in the original surface. A small absolute ratio |λ_min / λ_max| means the maximum eigenvalue dominates the minimum eigenvalue (which may be negative), i.e. the positive curvature is dominant. So a convex-looking region of the projected surface that also has a small absolute ratio corresponds to a region that is close to convex in the original surface.
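To make the eigenvalue argument concrete, here is a minimal sketch (not from the paper's code) of how the ratio |λ_min / λ_max| behaves; the small symmetric matrices below are toy stand-ins for the true loss Hessian at a point on the 2D slice:

```python
import numpy as np

def min_max_eig_ratio(hessian):
    """Return |lambda_min / lambda_max| for a symmetric Hessian."""
    eigvals = np.linalg.eigvalsh(hessian)      # eigenvalues in ascending order
    lam_min, lam_max = eigvals[0], eigvals[-1]
    return abs(lam_min / lam_max)

# Nearly convex point: tiny negative curvature in one direction.
H_convex_like = np.diag([-0.01, 2.0, 5.0])
# Strongly non-convex point: large negative curvature.
H_nonconvex = np.diag([-4.0, 2.0, 5.0])

print(min_max_eig_ratio(H_convex_like))  # small ratio -> positive curvature dominates
print(min_max_eig_ratio(H_nonconvex))    # large ratio -> visibly non-convex
```

A small ratio says any negative curvature is negligible compared to the positive curvature, which is why such regions look (and effectively are) convex.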
Not affiliated with the paper, but in non-convex optimization it is generally believed that wider minima generalize better than sharp minima. This clip from Leo Dirac (start at 16:30) conveys the intuition. The paper's results capture this empirically.
So if the loss surface we get with Filter-Wise Normalization is flatter, does it imply better generalization? Is there any explanation or mathematical proof?
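For anyone unfamiliar with the term, here is a minimal sketch of the filter-wise normalization idea as I understand it from the paper (this is my own illustration, not the repo's exact code): draw a random direction with the same shape as the trained weights, then rescale each filter of the direction to match the norm of the corresponding filter of the weights, so the slice is comparable across networks with different weight scales.

```python
import torch

def filter_normalized_direction(model):
    """Random direction with each filter rescaled to the norm of the
    corresponding filter of the trained weights (filter-wise normalization)."""
    direction = []
    for param in model.parameters():
        d = torch.randn_like(param)
        if param.dim() <= 1:
            # Biases / BatchNorm parameters: commonly left unperturbed (set to zero).
            d.zero_()
        else:
            # Treat each output channel (filter) separately.
            for d_filt, w_filt in zip(d, param):
                d_filt.mul_(w_filt.norm() / (d_filt.norm() + 1e-10))
        direction.append(d)
    return direction
```

Perturbing the weights along two such directions gives the 2D slice whose flatness is then compared across models; normalization only makes the comparison scale-invariant, it does not by itself prove that flatness implies generalization.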
Thank you for sharing the results of your work.
This is a really impressive paper and your response is appreciated.