Built site for gh-pages

UBC-STAT · Nov 19, 2024 · 549efa9
1 parent 0798a9e
commit 549efa9

Showing 5 changed files with 91 additions and 51 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-396fd312
+0cb2f234
diff --git a/schedule/slides/23-nnets-other.html b/schedule/slides/23-nnets-other.html
@@ -398,7 +398,7 @@
 <h2>23 Neural nets - generalization</h2>
 <p><span class="secondary">Stat 406</span></p>
 <p><span class="secondary">Geoff Pleiss, Trevor Campbell</span></p>
-<p>Last modified – 13 November 2024</p>
+<p>Last modified – 18 November 2024</p>
 <p><span class="math display">\[
 \DeclareMathOperator*{\argmin}{argmin}
 \DeclareMathOperator*{\argmax}{argmax}
@@ -634,6 +634,39 @@ <h2>Understanding Double Descent (Hand-Wavy)</h2>
 </ul></li>
 </ul>
 </section>
+<section id="understanding-double-descent-less-hand-wavy" class="slide level2">
+<h2>Understanding Double Descent (Less Hand-Wavy)</h2>
+<div class="flex">
+<div class="w-60">
+<p>(From <a href="https://arxiv.org/abs/1903.08560">Hastie et al., 2020</a>)</p>
+<ul>
+<li><p><span class="math inline">\(\gamma = D / N\)</span> (ratio of features / data)</p></li>
+<li><p><span class="math inline">\(\sigma^2 = \mathrm{Var}[Y|X]\)</span> (observational noise)</p></li>
+<li><p>When basis features are uncorrelated, we have (asymptotically)</p></li>
+</ul>
+<p><span class="math display">\[
+\begin{aligned}
+  \mathrm{Bias}^2 &amp;= \begin{cases}
+    0 &amp; \gamma &lt; 1 \text{ (underparam.)} \\
+    1 - \tfrac{1}{\gamma} &amp; \gamma \geq 1 \text{ (overparam.)}
+  \end{cases} \\
+  &amp; \\
+  \mathrm{Var} &amp;= \begin{cases}
+    \sigma^2 \tfrac{\gamma}{1 - \gamma} &amp; \gamma &lt; 1 \text{ (underparam.)} \\
+    \sigma^2 \tfrac{1}{\gamma - 1} &amp; \gamma \geq 1 \text{ (overparam.)}
+  \end{cases} \\
+\end{aligned}
+\]</span></p>
+</div>
+<div class="w-38">
+<div class="quarto-figure quarto-figure-center">
+<figure>
+<p><img data-src="gfx/hastie_double_descent.png" class="quarto-figure quarto-figure-center" style="width:100.0%" data-fig-caption="Theoretical double descent curve."></p>
+</figure>
+</div>
+</div>
+</div>
+</section>
 <section id="do-we-need-to-worry-about-variance" class="slide level2">
 <h2>Do we need to worry about variance?</h2>
 <p><em>Regularizing</em> a neural network (adding a complexity penalty to the loss) is a common practice to prevent overfitting to the noise.</p>
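The asymptotic expressions added in the "Less Hand-Wavy" slide above can be checked numerically. The following is a minimal Python sketch, not part of the commit: the function names `double_descent_terms` and `risk` are hypothetical, the signal strength is assumed to be normalized to 1 (the slide's bias expression carries no scale factor), and the value returned at exactly gamma = 1 is an assumption made here because the slide's variance expression is undefined at that point.

```python
import math


def double_descent_terms(gamma, sigma2=1.0):
    """Return (bias_squared, variance) using the slide's asymptotic formulas.

    gamma  : ratio D / N of features to data points (must be positive)
    sigma2 : observational noise level
    Assumes unit signal strength, as in the slide's bias expression.
    """
    if gamma <= 0:
        raise ValueError("gamma = D / N must be positive")
    if gamma < 1:
        # Underparameterized: no bias, variance grows as gamma approaches 1.
        return 0.0, sigma2 * gamma / (1.0 - gamma)
    if gamma == 1:
        # Interpolation threshold: sigma2 / (gamma - 1) is undefined here,
        # so this sketch reports an infinite variance (an assumption).
        return 0.0, math.inf
    # Overparameterized: bias increases with gamma, variance shrinks.
    return 1.0 - 1.0 / gamma, sigma2 / (gamma - 1.0)


def risk(gamma, sigma2=1.0):
    """Sum of squared bias and variance at a given gamma."""
    bias_squared, variance = double_descent_terms(gamma, sigma2)
    return bias_squared + variance


if __name__ == "__main__":
    # Variance rises toward gamma = 1 and falls again beyond it.
    for g in (0.25, 0.5, 0.9, 1.1, 2.0, 10.0):
        b, v = double_descent_terms(g)
        print(f"gamma={g:5.2f}  bias^2={b:.3f}  var={v:.3f}  risk={b + v:.3f}")
```

Sweeping `gamma` across 1 shows the variance term peaking at the interpolation threshold and decaying on either side, which is the shape the figure in the slide illustrates.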

diff --git a/schedule/slides/gfx/hastie_double_descent.png b/schedule/slides/gfx/hastie_double_descent.png
diff --git a/search.json b/search.json
@@ -1390,7 +1390,7 @@
     "href": "schedule/slides/23-nnets-other.html#section",
     "title": "UBC Stat406 2024W",
     "section": "23 Neural nets - generalization",
-    "text": "23 Neural nets - generalization\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 13 November 2024\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n\\]"
+    "text": "23 Neural nets - generalization\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 18 November 2024\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n\\]"
   },
   {
     "objectID": "schedule/slides/23-nnets-other.html#this-lecture",
@@ -1483,6 +1483,13 @@
     "section": "Understanding Double Descent (Hand-Wavy)",
     "text": "Understanding Double Descent (Hand-Wavy)\nLet \\(\\boldsymbol Z \\in \\R^{n \\times d}\\) be the matrix of basis expansions for our \\(n\\) training points.\nBasis regression is just OLS with the basis expansion \\(\\boldsymbol Z\\): \\[ \\min_{\\boldsymbol \\beta} \\left\\Vert \\boldsymbol Z \\boldsymbol \\beta - \\boldsymbol y \\right\\Vert_2^2. \\]\n\nWhen \\(d &lt; n\\), the regressor is underparameterized.\nI.e. there is no \\(\\boldsymbol \\beta\\) that perfectly explains our training responses given our basis-expanded training inputs.\nWhen \\(d = n\\), there is a value of \\(\\boldsymbol \\beta\\) that fits our training data perfectly.\nI.e. \\(\\Vert \\boldsymbol Z \\boldsymbol \\beta - \\boldsymbol y \\Vert = 0\\).\n\nWe are fitting both the noise and the signal (leading to a high variance predictor).\n\nWhen \\(d &gt; n\\), we can also fit the data (noise + signal) perfectly.👋 However, more features implies that the noise gets “spread out” over all of the parameters. 👋\n\n👋 Since each parameter only captures “some” of the noise, we are less likely to make predictions based on it. 👋\nThis explanation is overly simplified, and there is a lot more at play."
   },
+  {
+    "objectID": "schedule/slides/23-nnets-other.html#understanding-double-descent-less-hand-wavy",
+    "href": "schedule/slides/23-nnets-other.html#understanding-double-descent-less-hand-wavy",
+    "title": "UBC Stat406 2024W",
+    "section": "Understanding Double Descent (Less Hand-Wavy)",
+    "text": "Understanding Double Descent (Less Hand-Wavy)\n\n\n(From Hastie et al., 2020)\n\n\\(\\gamma = D / N\\) (ratio of features / data)\n\\(\\sigma^2 = \\mathrm{Var}[Y|X]\\) (observational noise)\nWhen basis features are uncorrelated, we have (asymptotically)\n\n\\[\n\\begin{aligned}\n  \\mathrm{Bias}^2 &= \\begin{cases}\n    0 & \\gamma &lt; 1 \\text{ (underparam.)} \\\\\n    1 - \\tfrac{1}{\\gamma} & \\gamma \\geq 1 \\text{ (overparam.)}\n  \\end{cases} \\\\\n  & \\\\\n  \\mathrm{Var} &= \\begin{cases}\n    \\sigma^2 \\tfrac{\\gamma}{1 - \\gamma} & \\gamma &lt; 1 \\text{ (underparam.)} \\\\\n    \\sigma^2 \\tfrac{1}{\\gamma - 1} & \\gamma \\geq 1 \\text{ (overparam.)}\n  \\end{cases} \\\\\n\\end{aligned}\n\\]"
+  },
   {
     "objectID": "schedule/slides/23-nnets-other.html#do-we-need-to-worry-about-variance",
     "href": "schedule/slides/23-nnets-other.html#do-we-need-to-worry-about-variance",

diff --git a/sitemap.xml b/sitemap.xml
@@ -2,194 +2,194 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-r-review.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/handouts/keras-nnet.html</loc>
-    <lastmod>2024-11-14T06:01:46.002Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.566Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/11-kernel-smoothers.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/02-lm-example.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/07-greedy-selection.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/10-basis-expansions.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/13-gams-trees.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/24-pca-intro.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/15-LDA-and-QDA.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/20-boosting.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-classification-losses.html</loc>
-    <lastmod>2024-11-14T06:01:46.006Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/16-logistic-regression.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/23-nnets-other.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/19-bagging-and-rf.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-cv-for-many-models.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/01-lm-review.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/12-why-smooth.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/22-nnets-estimation.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-intro-to-class.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/handouts/lab00-git.html</loc>
-    <lastmod>2024-11-14T06:01:46.002Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.566Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/course-setup.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/computing/windows.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/computing/mac_x86.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/computing/index.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/index.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/computing/mac_arm.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/computing/ubuntu.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/syllabus.html</loc>
-    <lastmod>2024-11-14T06:01:46.050Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.618Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/index.html</loc>
-    <lastmod>2024-11-14T06:01:46.006Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-course-review.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-version-control.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/faq.html</loc>
-    <lastmod>2024-11-14T06:01:45.978Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.546Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/21-nnets-intro.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/03-regression-function.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/06-information-criteria.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/04-bias-variance.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/14-classification-intro.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/27-kmeans.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/08-ridge-regression.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-quiz-0-wrap.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/26-pca-v-kpca.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/25-pca-issues.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/05-estimating-test-mse.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/28-hclust.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/09-l1-penalties.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/17-nonlinear-classifiers.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/18-the-bootstrap.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.578Z</lastmod>
   </url>
   <url>
     <loc>https://UBC-STAT.github.io/stat-406/schedule/slides/00-gradient-descent.html</loc>
-    <lastmod>2024-11-14T06:01:46.010Z</lastmod>
+    <lastmod>2024-11-19T03:03:30.574Z</lastmod>
   </url>
 </urlset>