
Two problems about paper #24

Open

seanM29 opened this issue Jun 11, 2019 · 3 comments

Comments

@seanM29 commented Jun 11, 2019

  • In the paper, the only difference between Figure 6 (c) and (d) is that (d) is much flatter than (c), and the test error of (d) is smaller than that of (c). So if a loss surface, obtained with filter-wise normalization, is flatter, will it generalize better? Is there any explanation or mathematical proof? (See the sketch after this comment.)
  • In the final part of Section 6, the paper uses the ratio (minimum eigenvalue of the Hessian / maximum eigenvalue of the Hessian) to characterize convexity: a larger value indicates a more non-convex region, and a smaller value indicates a more convex region. Why?

Thank you for sharing the results of your work. This is a really impressive paper, and your response is appreciated.
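
For context on the first question: filter-wise normalization rescales each filter of a random direction so that its norm matches the norm of the corresponding filter in the trained weights, removing scale ambiguity before plotting. A minimal PyTorch sketch of the idea (the function name, argument layout, and the zeroing of bias/BatchNorm directions here are illustrative assumptions, not the repo's actual API):

```python
import torch

def filter_normalize(direction, weights):
    """Rescale each filter d_f of a random direction so that
    ||d_f|| matches ||w_f|| for the corresponding filter of the
    trained weights. `direction` and `weights` are parallel lists
    of tensors, e.g. [p.data for p in model.parameters()].
    """
    for d, w in zip(direction, weights):
        if d.dim() <= 1:
            # Biases / BatchNorm vectors: one common choice is to
            # zero the direction for these parameters.
            d.zero_()
            continue
        for d_f, w_f in zip(d, w):  # first dim indexes filters
            d_f.mul_(w_f.norm() / (d_f.norm() + 1e-10))
    return direction

# Usage sketch (model is a hypothetical trained network):
# d = [torch.randn_like(p) for p in model.parameters()]
# d = filter_normalize(d, [p.data for p in model.parameters()])
```
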
@Jamesswiz

I have the same query after reading the paper.

Can the authors please comment?

@liiliiliil

I also don't understand the first question. : (

For the second one, I think the key is to show that a convex-looking region in the projected surface is also relatively convex in the original surface. A small absolute ratio |λ_min / λ_max| means the maximum eigenvalue is large compared to the minimum eigenvalue, which may be negative; that is, the positive eigenvalue dominates. So a convex-looking region of the projected surface that has a small absolute ratio is a relatively convex region of the original surface.
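
A tiny numeric illustration of this reading of the ratio (toy 2×2 Hessians; the values are invented purely for illustration):

```python
import numpy as np

# Toy Hessians at two points on a 2-D loss surface.
H_convex_like = np.array([[4.0, 0.0], [0.0, 0.5]])   # both curvatures positive
H_saddle_like = np.array([[4.0, 0.0], [0.0, -3.0]])  # strong negative curvature

for name, H in [("convex-looking", H_convex_like),
                ("saddle-like", H_saddle_like)]:
    eig = np.linalg.eigvalsh(H)
    ratio = abs(eig.min() / eig.max())
    print(f"{name}: lambda_min={eig.min():.1f}, "
          f"lambda_max={eig.max():.1f}, |ratio|={ratio:.2f}")

# convex-looking: |ratio| = 0.12 -> positive curvature dominates
# saddle-like:    |ratio| = 0.75 -> large negative curvature
#                                   relative to the positive one
```
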

@knowlen

knowlen commented Feb 26, 2021

Not affiliated with the paper, but in non-convex optimization it is generally believed that wide minima generalize better than sharp minima. This clip from Leo Dirac (start at 16:30) conveys the intuition. The paper's results capture it empirically.
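
One toy way to see that intuition (a 1-D sketch under the common assumption that the test loss surface is a slightly shifted copy of the training surface; this example is not from the paper):

```python
# Two 1-D "minima" with the same minimum loss but different curvature.
def sharp(w):  # high curvature: narrow valley
    return 50.0 * w ** 2

def flat(w):   # low curvature: wide valley
    return 0.5 * w ** 2

# If the test surface is the training surface shifted by a small amount,
# the sharp minimum pays a far larger penalty for that shift.
shift = 0.2
print("sharp minimum, loss after shift:", sharp(shift))  # 2.0
print("flat minimum,  loss after shift:", flat(shift))   # 0.02
```
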
