Add better image for resnet schematic
Vlad Feinberg committed Nov 12, 2023
1 parent e18d0db commit 269831e
Showing 8 changed files with 670 additions and 1 deletion.
@@ -40,7 +40,7 @@ The magic of AdaGrad's analysis is in identifying that the running root-mean

The problem here is that our gradients are correlated across dimensions, and decorrelating them requires the full whitening matrix \\(H_t^{-1}\\) pictured above. Unfortunately, this is a showstopper for all but the smallest problems. The full-matrix AdaGrad analysis states that the optimal preconditioner is the inverse matrix square root of the gradient covariance \\(C\_t=\sum\_{s\le t} g\_sg\_s^\top\\), the running sum of gradient outer products. This matrix would require petabytes to represent in modern neural networks!
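To make the cost concrete, here is a minimal numpy sketch of the full-matrix AdaGrad step described above (the function name, learning rate, and epsilon are illustrative choices, not from the post):

```python
import numpy as np

def full_matrix_adagrad_step(x, g, C, lr=0.5, eps=1e-8):
    """One full-matrix AdaGrad step, preconditioning by C_t^{-1/2},
    where C_t is the running sum of gradient outer products."""
    C = C + np.outer(g, g)  # C_t = sum_{s<=t} g_s g_s^T
    w, V = np.linalg.eigh(C)  # eigendecomposition of the symmetric C_t
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T  # C_t^{-1/2}
    x = x - lr * inv_sqrt @ g
    return x, C

# Why this fails at scale: the preconditioner is d x d, so a model with
# 10^9 parameters would need 10^18 entries -- petabytes of memory.
```

The eigendecomposition alone is cubic in the parameter count, which is why the structured approximation below matters.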

-![unrealistic resnet](/assets/2023-08-18/schematic.jpg){: .center-image-half }
+![unrealistic resnet](/assets/2023-08-18/resnet-img/resnet-schematic.png){: .center-image-half }

Enter Shampoo. Shampoo tells us that for convex functions with matrix-shaped inputs, we can use a structured approximation to the full covariance \\(C\_t\\) instead (DNNs are non-convex functions of multiple matrix-shaped inputs, but the convex-inspired approach seems to work!). In particular, given a weight matrix of shape \\(a\times b\\), rather than using the full flattened gradient \\(\textbf{g}\_t\in\mathbb{R}^{ab}\\), whose outer product is a matrix of size \\(ab\times ab\\), we can work with the matrix-shaped gradient \\(G\_t\in\mathbb{R}^{a\times b}\\) and approximate the covariance by a Kronecker product of its two much smaller outer products. Specifically, we set

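The diff truncates before the definition, but the standard Shampoo update this sets up can be sketched in a few lines of numpy (a minimal sketch of the published Shampoo recipe; the function names, learning rate, and epsilon are illustrative):

```python
import numpy as np

def _inv_quarter_root(M, eps=1e-8):
    # Symmetric PSD M -> M^{-1/4} via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag((w + eps) ** -0.25) @ V.T

def shampoo_step(X, G, L, R, lr=0.5):
    """One Shampoo step for an a x b weight matrix X with gradient G.
    L (a x a) and R (b x b) are the small Kronecker factors that stand
    in for the ab x ab covariance C_t."""
    L = L + G @ G.T  # left statistics: sum of G_s G_s^T
    R = R + G.T @ G  # right statistics: sum of G_s^T G_s
    X = X - lr * _inv_quarter_root(L) @ G @ _inv_quarter_root(R)
    return X, L, R
```

Note the memory win: the two factors cost \\(a^2 + b^2\\) entries instead of \\((ab)^2\\) for the full covariance, and the pair of inverse fourth roots plays the role of the single inverse square root in full-matrix AdaGrad.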
3 changes: 3 additions & 0 deletions assets/2023-08-18/resnet-img/resnet.aux
@@ -0,0 +1,3 @@
+\relax
+\providecommand \oddpage@label [2]{}
+\gdef \@abspage@last{1}
