From 2d5e51d91fa1253dbe0b01a6f5885b2c7a687436 Mon Sep 17 00:00:00 2001 From: Geoff Pleiss <824157+gpleiss@users.noreply.github.com> Date: Wed, 16 Oct 2024 22:57:28 -0400 Subject: [PATCH] Logistic regression and gradient descent --- .../execute-results/html.json | 5 +- .../figure-revealjs/generate-data-1.svg | 407 +++++---- .../figure-revealjs/unnamed-chunk-11-1.svg | 349 ++++---- .../figure-revealjs/unnamed-chunk-14-1.svg | 349 ++++---- .../figure-revealjs/unnamed-chunk-17-1.svg | 349 ++++---- .../figure-revealjs/unnamed-chunk-18-1.svg | 787 +++++++++--------- .../figure-revealjs/unnamed-chunk-2-1.svg | 210 +++-- .../figure-revealjs/unnamed-chunk-3-1.svg | 220 +++-- .../figure-revealjs/unnamed-chunk-4-1.svg | 226 +++-- .../figure-revealjs/unnamed-chunk-6-1.svg | 259 +++--- .../figure-revealjs/unnamed-chunk-8-1.svg | 349 ++++---- .../execute-results/html.json | 5 +- .../figure-revealjs/plot-d1-1.svg | 456 +++++----- .../figure-revealjs/unnamed-chunk-1-1.svg | 243 ++++++ .../site_libs/revealjs/dist/theme/quarto.css | 2 +- schedule/slides/00-gradient-descent.qmd | 228 +++-- schedule/slides/16-logistic-regression.qmd | 516 +++++++++--- 17 files changed, 2701 insertions(+), 2259 deletions(-) create mode 100644 _freeze/schedule/slides/16-logistic-regression/figure-revealjs/unnamed-chunk-1-1.svg diff --git a/_freeze/schedule/slides/00-gradient-descent/execute-results/html.json b/_freeze/schedule/slides/00-gradient-descent/execute-results/html.json index 880d932..345ffba 100644 --- a/_freeze/schedule/slides/00-gradient-descent/execute-results/html.json +++ b/_freeze/schedule/slides/00-gradient-descent/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "944b342dabb554b154ff3f17db4c9487", + "hash": "a4b6e50b9b0238476aa2e13a1199201e", "result": { - "markdown": "---\nlecture: \"00 Gradient descent\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 25 October 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n## Simple optimization techniques\n\n\nWe'll see \"gradient descent\" a few times: \n\n1. solves logistic regression (simple version of IRWLS)\n1. gradient boosting\n1. 
Neural networks\n\nThis seems like a good time to explain it.\n\nSo what is it and how does it work?\n\n\n## Very basic example\n\n::: flex\n::: w-65\n\nSuppose I want to minimize $f(x)=(x-6)^2$ numerically.\n\nI start at a point (say $x_1=23$)\n\nI want to \"go\" in the negative direction of the gradient.\n\nThe gradient (at $x_1=23$) is $f'(23)=2(23-6)=34$.\n\nMove current value toward current value - 34.\n\n$x_2 = x_1 - \\gamma 34$, for $\\gamma$ small.\n\nIn general, $x_{n+1} = x_n -\\gamma f'(x_n)$.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nniter <- 10\ngam <- 0.1\nx <- double(niter)\nx[1] <- 23\ngrad <- function(x) 2 * (x - 6)\nfor (i in 2:niter) x[i] <- x[i - 1] - gam * grad(x[i - 1])\n```\n:::\n\n\n:::\n\n::: w-35\n\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n:::\n:::\n\n\n## Why does this work?\n\n\n[Heuristic interpretation:]{.secondary}\n\n* Gradient tells me the slope.\n\n* negative gradient points toward the minimum\n\n* go that way, but not too far (or we'll miss it)\n\n## Why does this work?\n\n[More rigorous interpretation:]{.secondary}\n\n- Taylor expansion\n$$\nf(x) \\approx f(x_0) + \\nabla f(x_0)^{\\top}(x-x_0) + \\frac{1}{2}(x-x_0)^\\top H(x_0) (x-x_0)\n$$\n- replace $H$ with $\\gamma^{-1} I$\n\n- minimize this quadratic approximation in $x$:\n$$\n0\\overset{\\textrm{set}}{=}\\nabla f(x_0) + \\frac{1}{\\gamma}(x-x_0) \\Longrightarrow x = x_0 - \\gamma \\nabla f(x_0)\n$$\n\n\n## Visually\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-3-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Visually\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-4-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## What $\\gamma$? (more details than we have time for)\n\nWhat to use for $\\gamma_k$? 
\n\n\n[Fixed]{.secondary}\n\n- Only works if $\\gamma$ is exactly right \n- Usually does not work\n\n[Decay on a schedule]{.secondary}\n\n$\\gamma_{n+1} = \\frac{\\gamma_n}{1+cn}$ or $\\gamma_{n} = \\gamma_0 b^n$\n\n[Exact line search]{.secondary}\n\n- Tells you exactly how far to go.\n- At each iteration $n$, solve\n$\\gamma_n = \\arg\\min_{s \\geq 0} f( x^{(n)} - s f(x^{(n-1)}))$\n- Usually can't solve this.\n\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nx <- matrix(0, 40, 2); x[1, ] <- c(1, 1)\ngrad <- function(x) c(2, 1) * x\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-6-1.svg){fig-align='center'}\n:::\n:::\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .1\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-8-1.svg){fig-align='center'}\n:::\n:::\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .9 # bigger gamma\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-11-1.svg){fig-align='center'}\n:::\n:::\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .9 # big, but decrease it on schedule\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * .9^k * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-14-1.svg){fig-align='center'}\n:::\n:::\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .5 # theoretically optimal\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-17-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n\n## When do we stop?\n\nFor $\\epsilon>0$, small\n\n\nCheck any / all of\n\n1. $|f'(x)| < \\epsilon$\n1. $|x^{(k)} - x^{(k-1)}| < \\epsilon$\n1. 
$|f(x^{(k)}) - f(x^{(k-1)})| < \\epsilon$\n\n\n## Stochastic gradient descent\n\nSuppose $f(x) = \\frac{1}{n}\\sum_{i=1}^n f_i(x)$\n\nLike if $f(\\beta) = \\frac{1}{n}\\sum_{i=1}^n (y_i - x^\\top_i\\beta)^2$.\n\nThen $f'(\\beta) = \\frac{1}{n}\\sum_{i=1}^n f'_i(\\beta) = \\frac{1}{n} \\sum_{i=1}^n -2x_i^\\top(y_i - x^\\top_i\\beta)$ \n\nIf $n$ is really big, it may take a long time to compute $f'$\n\nSo, just sample some partition our data into mini-batches $\\mathcal{M}_j$\n\nAnd approximate (imagine the Law of Large Numbers, use a sample to approximate the population)\n$$f'(x) = \\frac{1}{n}\\sum_{i=1}^n f'_i(x) \\approx \\frac{1}{m}\\sum_{i\\in\\mathcal{M}_j}f'_{i}(x)$$\n\n## SGD\n\n$$\n\\begin{aligned}\nf'(\\beta) &= \\frac{1}{n}\\sum_{i=1}^n f'_i(\\beta) = \\frac{1}{n} \\sum_{i=1}^n -2x_i^\\top(y_i - x^\\top_i\\beta)\\\\\nf'(x) &= \\frac{1}{n}\\sum_{i=1}^n f'_i(x) \\approx \\frac{1}{m}\\sum_{i\\in\\mathcal{M}_j}f'_{i}(x)\n\\end{aligned}\n$$\n\n\nUsually cycle through \"mini-batches\":\n\n* Use a different mini-batch at each iteration of GD\n* Cycle through until we see all the data\n\n\nThis is the workhorse for neural network optimization\n\n\n\n# Practice with GD and Logistic regression\n\n\n## Gradient descent for Logistic regression\n\nSuppose $Y=1$ with probability $p(x)$ and $Y=0$ with probability $1-p(x)$, $x \\in \\R$. \n\nI want to model $P(Y=1| X=x)$. \n\nI'll assume that $\\log\\left(\\frac{p(x)}{1-p(x)}\\right) = ax$ for some scalar $a$. This means that\n$p(x) = \\frac{\\exp(ax)}{1+\\exp(ax)} = \\frac{1}{1+\\exp(-ax)}$\n\n. . .\n\n\n::: {.cell layout-align=\"center\" output-location='column'}\n\n```{.r .cell-code}\nn <- 100\na <- 2\nx <- runif(n, -5, 5)\nlogit <- function(x) 1 / (1 + exp(-x))\np <- logit(a * x)\ny <- rbinom(n, 1, p)\ndf <- tibble(x, y)\nggplot(df, aes(x, y)) +\n geom_point(colour = \"cornflowerblue\") +\n stat_function(fun = ~ logit(a * .x))\n```\n\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/generate-data-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Reminder: the likelihood\n\n$$\nL(y | a, x) = \\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\\textrm{ and }\np(x) = \\frac{1}{1+\\exp(-ax)}\n$$\n\n\n$$\n\\begin{aligned}\n\\ell(y | a, x) &= \\log \\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i} \n= \\sum_{i=1}^n y_i\\log p(x_i) + (1-y_i)\\log(1-p(x_i))\\\\\n&= \\sum_{i=1}^n\\log(1-p(x_i)) + y_i\\log\\left(\\frac{p(x_i)}{1-p(x_i)}\\right)\\\\\n&=\\sum_{i=1}^n ax_i y_i + \\log\\left(1-p(x_i)\\right)\\\\\n&=\\sum_{i=1}^n ax_i y_i + \\log\\left(\\frac{1}{1+\\exp(ax_i)}\\right)\n\\end{aligned}\n$$\n\n## Reminder: the likelihood\n\n$$\nL(y | a, x) = \\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\\textrm{ and }\np(x) = \\frac{1}{1+\\exp(-ax)}\n$$\n\nNow, we want the negative of this. Why? \n\nWe would maximize the likelihood/log-likelihood, so we minimize the negative likelihood/log-likelihood (and scale by $1/n$)\n\n\n$$-\\ell(y | a, x) = \\frac{1}{n}\\sum_{i=1}^n -ax_i y_i - \\log\\left(\\frac{1}{1+\\exp(ax_i)}\\right)$$\n\n## Reminder: the likelihood\n\n$$\n\\frac{1}{n}L(y | a, x) = \\frac{1}{n}\\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\\textrm{ and }\np(x) = \\frac{1}{1+\\exp(-ax)}\n$$\n\nThis is, in the notation of our slides $f(a)$. \n\nWe want to minimize it in $a$ by gradient descent. \n\nSo we need the derivative with respect to $a$: $f'(a)$. \n\nNow, conveniently, this simplifies a lot. 
\n\n\n$$\n\\begin{aligned}\n\\frac{d}{d a} f(a) &= \\frac{1}{n}\\sum_{i=1}^n -x_i y_i - \\left(-\\frac{x_i \\exp(ax_i)}{1+\\exp(ax_i)}\\right)\\\\\n&=\\frac{1}{n}\\sum_{i=1}^n -x_i y_i + p(x_i)x_i = \\frac{1}{n}\\sum_{i=1}^n -x_i(y_i-p(x_i)).\n\\end{aligned}\n$$\n\n## Reminder: the likelihood\n\n$$\n\\frac{1}{n}L(y | a, x) = \\frac{1}{n}\\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\\textrm{ and }\np(x) = \\frac{1}{1+\\exp(-ax)}\n$$\n\n(Simple) gradient descent to minimize $-\\ell(a)$ or maximize $L(y|a,x)$ is:\n\n1. Input $a_1,\\ \\gamma>0,\\ j_\\max,\\ \\epsilon>0,\\ \\frac{d}{da} -\\ell(a)$.\n2. For $j=1,\\ 2,\\ \\ldots,\\ j_\\max$,\n$$a_j = a_{j-1} - \\gamma \\frac{d}{da} (-\\ell(a_{j-1}))$$\n3. Stop if $\\epsilon > |a_j - a_{j-1}|$ or $|d / da\\ \\ell(a)| < \\epsilon$.\n\n## Reminder: the likelihood\n\n$$\n\\frac{1}{n}L(y | a, x) = \\frac{1}{n}\\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\\textrm{ and }\np(x) = \\frac{1}{1+\\exp(-ax)}\n$$\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-line-numbers=\"1,10-11|2-3|4-9|\"}\namle <- function(x, y, a0, gam = 0.5, jmax = 50, eps = 1e-6) {\n a <- double(jmax) # place to hold stuff (always preallocate space)\n a[1] <- a0 # starting value\n for (j in 2:jmax) { # avoid possibly infinite while loops\n px <- logit(a[j - 1] * x)\n grad <- mean(-x * (y - px))\n a[j] <- a[j - 1] - gam * grad\n if (abs(grad) < eps || abs(a[j] - a[j - 1]) < eps) break\n }\n a[1:j]\n}\n```\n:::\n\n\n\n\n## Try it:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nround(too_big <- amle(x, y, 5, 50), 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 5.000 3.360 2.019 1.815 2.059 1.782 2.113 1.746 2.180 1.711 2.250 1.684\n[13] 2.309 1.669 2.344 1.663 2.359 1.661 2.364 1.660 2.365 1.660 2.366 1.660\n[25] 2.366 1.660 2.366 1.660 2.366 1.660 2.366 1.660 2.366 1.660 2.366 1.660\n[37] 2.366 1.660 2.366 1.660 2.366 1.660 2.366 1.660 2.366 1.660 2.366 1.660\n[49] 2.366 1.660\n```\n:::\n\n```{.r .cell-code}\nround(too_small <- amle(x, y, 5, 1), 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 5.000 4.967 4.934 4.902 4.869 4.837 4.804 4.772 4.739 4.707 4.675 4.643\n[13] 4.611 4.579 4.547 4.515 4.483 4.451 4.420 4.388 4.357 4.326 4.294 4.263\n[25] 4.232 4.201 4.170 4.140 4.109 4.078 4.048 4.018 3.988 3.957 3.927 3.898\n[37] 3.868 3.838 3.809 3.779 3.750 3.721 3.692 3.663 3.635 3.606 3.578 3.550\n[49] 3.522 3.494\n```\n:::\n\n```{.r .cell-code}\nround(just_right <- amle(x, y, 5, 10), 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 5.000 4.672 4.351 4.038 3.735 3.445 3.171 2.917 2.688 2.488 2.322 2.191\n[13] 2.094 2.027 1.983 1.956 1.940 1.930 1.925 1.922 1.920 1.919 1.918 1.918\n[25] 1.918 1.918 1.918 1.917 1.917 1.917 1.917\n```\n:::\n:::\n\n\n\n## Visual\n\n\n\n\n::: {.cell layout-align=\"center\" output-location='column'}\n\n```{.r .cell-code}\nnegll <- function(a) {\n -a * mean(x * y) -\n rowMeans(log(1 / (1 + exp(outer(a, x)))))\n}\nblah <- list_rbind(\n map(\n rlang::dots_list(\n too_big, too_small, just_right, .named = TRUE\n ), \n as_tibble),\n names_to = \"gamma\"\n) |> mutate(negll = negll(value))\nggplot(blah, aes(value, negll)) +\n geom_point(aes(colour = gamma)) +\n facet_wrap(~gamma, ncol = 1) +\n stat_function(fun = negll, xlim = c(-2.5, 5)) +\n scale_y_log10() + \n xlab(\"a\") + \n ylab(\"negative log likelihood\") +\n geom_vline(xintercept = tail(just_right, 1)) +\n scale_colour_brewer(palette = \"Set1\") +\n theme(legend.position = \"none\")\n```\n\n::: 
{.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-18-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Check vs. `glm()`\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nsummary(glm(y ~ x - 1, family = \"binomial\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nglm(formula = y ~ x - 1, family = \"binomial\")\n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \nx 1.9174 0.4785 4.008 6.13e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for binomial family taken to be 1)\n\n Null deviance: 138.629 on 100 degrees of freedom\nResidual deviance: 32.335 on 99 degrees of freedom\nAIC: 34.335\n\nNumber of Fisher Scoring iterations: 7\n```\n:::\n:::\n", + "engine": "knitr", + "markdown": "---\nlecture: \"00 Gradient descent\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 16 October 2024\n\n\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n## Motivation: maximum likelihood estimation as optimization\n\nBy the principle of maximum likelihood, we have that\n\n$$\n\\begin{align*}\n\\hat \\beta\n&= \\argmax_{\\beta} \\prod_{i=1}^n \\P(Y_i \\mid X_i)\n\\\\\n&= \\argmin_{\\beta} \\sum_{i=1}^n -\\log\\P(Y_i \\mid X_i)\n\\end{align*}\n$$\n\nUnder the model we use for logistic regression...\n$$\n\\begin{gathered}\n\\P(Y=1 \\mid X=x) = h(\\beta^\\top x), \\qquad \\P(Y=0 \\mid X=x) = h(-\\beta^\\top x),\n\\\\\nh(z) = \\tfrac{1}{1+e^{-z}}\n\\end{gathered}\n$$\n\n... we can't simply find the argmin with algebra.\n\n## Gradient descent: the workhorse optimization algorithm\n\nWe'll see \"gradient descent\" a few times: \n\n1. solves logistic regression\n1. gradient boosting\n1. 
Neural networks\n\nThis seems like a good time to explain it.\n\nSo what is it and how does it work?\n\n\n## Very basic example\n\n::: flex\n::: w-65\n\nSuppose I want to minimize $f(x)=(x-6)^2$ numerically.\n\nI start at a point (say $x_1=23$)\n\nI want to \"go\" in the negative direction of the gradient.\n\nThe gradient (at $x_1=23$) is $f'(23)=2(23-6)=34$.\n\nMove current value toward current value - 34.\n\n$x_2 = x_1 - \\gamma 34$, for $\\gamma$ small.\n\nIn general, $x_{n+1} = x_n -\\gamma f'(x_n)$.\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nniter <- 10\ngam <- 0.1\nx <- double(niter)\nx[1] <- 23\ngrad <- function(x) 2 * (x - 6)\nfor (i in 2:niter) x[i] <- x[i - 1] - gam * grad(x[i - 1])\n```\n:::\n\n\n\n:::\n\n::: w-35\n\n\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-2-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n:::\n:::\n\n\n## Why does this work?\n\n\n[Heuristic interpretation:]{.secondary}\n\n* Gradient tells me the slope.\n\n* negative gradient points toward the minimum\n\n* go that way, but not too far (or we'll miss it)\n\n## Why does this work?\n\n[More rigorous interpretation:]{.secondary}\n\n- Taylor expansion\n$$\nf(x) \\approx f(x_0) + \\nabla f(x_0)^{\\top}(x-x_0) + \\frac{1}{2}(x-x_0)^\\top H(x_0) (x-x_0)\n$$\n- replace $H$ with $\\gamma^{-1} I$\n\n- minimize this quadratic approximation in $x$:\n$$\n0\\overset{\\textrm{set}}{=}\\nabla f(x_0) + \\frac{1}{\\gamma}(x-x_0) \\Longrightarrow x = x_0 - \\gamma \\nabla f(x_0)\n$$\n\n\n## Visually\n\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-3-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## Visually\n\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-4-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## What $\\gamma$? (more details than we have time for)\n\nWhat to use for $\\gamma_k$? 
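\n\nThree common answers appear below. To make the first two concrete, here is a minimal R sketch (the step sizes are arbitrary, and `grad()` is the derivative of $f(x)=(x-6)^2$ from the earlier example) contrasting a fixed $\\gamma$ with a decaying schedule:\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngrad <- function(x) 2 * (x - 6) # f'(x) for f(x) = (x - 6)^2\nfixed <- decayed <- 23 # same starting value as before\nfor (i in 1:25) {\n fixed <- fixed - 0.9 * grad(fixed) # fixed step size\n decayed <- decayed - 0.9 * 0.9^i * grad(decayed) # decay on a schedule\n}\nc(fixed, decayed) # both end near the minimizer 6; the schedule damps the oscillation\n```\n:::\n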
\n\n\n[Fixed]{.secondary}\n\n- Only works if $\\gamma$ is exactly right \n- Usually does not work\n\n[Decay on a schedule]{.secondary}\n\n$\\gamma_{n+1} = \\frac{\\gamma_n}{1+cn}$ or $\\gamma_{n} = \\gamma_0 b^n$\n\n[Exact line search]{.secondary}\n\n- Tells you exactly how far to go.\n- At each iteration $n$, solve\n$\\gamma_n = \\arg\\min_{s \\geq 0} f( x^{(n)} - s \\nabla f(x^{(n)}))$\n- Usually can't solve this.\n\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nx <- matrix(0, 40, 2); x[1, ] <- c(1, 1)\ngrad <- function(x) c(2, 1) * x\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-6-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .1\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-8-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .9 # bigger gamma\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-11-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .9 # big, but decrease it on schedule\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * .9^k * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-14-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n##\n\n$$ f(x_1,x_2) = x_1^2 + 0.5x_2^2$$\n\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngamma <- .5 # theoretically optimal\nfor (k in 2:40) x[k, ] <- x[k - 1, ] - gamma * grad(x[k - 1, ])\n```\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-17-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n\n\n## When do we stop?\n\nFor $\\epsilon>0$, small\n\n\nCheck any / all of\n\n1. $|f'(x)| < \\epsilon$\n1. $|x^{(k)} - x^{(k-1)}| < \\epsilon$\n1. $|f(x^{(k)}) - f(x^{(k-1)})| < \\epsilon$\n\n\n## Stochastic gradient descent\n\nIf we're optimizing\n$\\argmin_\\beta \\sum_{i=1}^n -\\log P_\\beta(Y_i \\mid X_i)$\nthen our derivative will also have an additive form\n$$\n\\sum_{i=1}^n \\frac{\\partial}{\\partial \\beta} \\left[-\\log P_\\beta(Y_i \\mid X_i) \\right]\n$$\n\nIf $n$ is really big, it may take a long time to compute this gradient\n\nSo, just sample a random subset of our data, $\\mathcal{M} \\subset \\{ (X_i, Y_i)\\}_{i=1}^n$,\nand approximate:\n$$\\sum_{i=1}^n \\frac{\\partial}{\\partial \\beta} \\left[-\\log P_\\beta(Y_i \\mid X_i) \\right] \\approx \\frac{n}{\\vert \\mathcal M \\vert}\\sum_{i\\in\\mathcal{M}} \\frac{\\partial}{\\partial \\beta} \\left[-\\log P_\\beta(Y_i \\mid X_i) \\right]$$\n[(Justification: Law of Large Numbers. Use a sample to approximate the population.)]{.secondary}\n
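\n## Mini-batch SGD: a sketch\n\nA minimal sketch of one pass (\"epoch\") over the data, splitting it into mini-batches as above. It borrows `logit()`, `x`, and `y` from the logistic-regression example later in this deck; the step size and the batch size `m = 10` are arbitrary:\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nsgd_epoch <- function(x, y, beta, gamma = 0.1, m = 10) {\n shuffled <- sample(length(y)) # random order, then split into mini-batches\n batches <- split(shuffled, ceiling(seq_along(shuffled) / m))\n for (batch in batches) {\n p <- logit(beta * x[batch])\n grad <- mean(-x[batch] * (y[batch] - p)) # gradient estimate from this batch only\n beta <- beta - gamma * grad\n }\n beta\n}\n```\n:::\n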
\n\n## SGD\n\nE.g., for squared-error loss $f(\\beta) = \\frac{1}{n}\\sum_{i=1}^n (y_i - x^\\top_i\\beta)^2$:\n\n$$\n\\begin{aligned}\nf'(\\beta) &= \\frac{1}{n}\\sum_{i=1}^n f'_i(\\beta) = \\frac{1}{n} \\sum_{i=1}^n -2x_i(y_i - x^\\top_i\\beta)\\\\\nf'(\\beta) &\\approx \\frac{1}{m}\\sum_{i\\in\\mathcal{M}_j}f'_{i}(\\beta), \\qquad m = \\vert \\mathcal{M}_j \\vert\n\\end{aligned}\n$$\n\n\nUsually cycle through \"mini-batches\":\n\n* Use a different mini-batch at each iteration of GD\n* Cycle through until we see all the data\n\n\nThis is the workhorse for neural network optimization\n\n\n\n# Practice with GD and Logistic regression\n\n\n## Gradient descent for Logistic regression\n\n$$\n\\begin{gathered}\n\\P(Y=1 \\mid X=x) = h(\\beta^\\top x), \\qquad \\P(Y=0 \\mid X=x) = h(-\\beta^\\top x),\n\\\\\n\\\\\nh(z) = \\tfrac{1}{1+e^{-z}}\n\\end{gathered}\n$$\n\n\n. . .\n\n\n\n::: {.cell layout-align=\"center\" output-location='column'}\n\n```{.r .cell-code}\nn <- 100\nbeta <- 2\nx <- runif(n, -5, 5)\nlogit <- function(x) 1 / (1 + exp(-x))\np <- logit(beta * x)\ny <- rbinom(n, 1, p)\ndf <- tibble(x, y)\nggplot(df, aes(x, y)) +\n geom_point(colour = \"cornflowerblue\") +\n stat_function(fun = ~ logit(beta * .x))\n```\n\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/generate-data-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n---\n\n$$\n\\P(Y=1 \\mid X=x) = h(\\beta^\\top x), \\qquad \\P(Y=0 \\mid X=x) = h(-\\beta^\\top x)\n$$\n\nUnder maximum likelihood\n\n$$\n\\hat \\beta = \\argmin_{\\beta} \\underbrace{\n \\textstyle{\\sum_{i=1}^n - \\log( \\P_\\beta(Y_i=y_i \\mid X_i=x_i) )}\n}_{:= -\\ell(\\beta)}\n$$\n\n---\n\n$$\n\\begin{align*}\n\\P_\\beta(Y_i=y_i \\mid X_i=x_i) &= h\\left( [-1]^{1 - y_i} \\beta^\\top x_i \\right)\n\\\\\n\\\\\n-\\ell(\\beta) &= \\sum_{i=1}^n -\\log\\left( \\P_\\beta(Y_i=y_i \\mid X_i=x_i) \\right)\n\\\\\n&= \\sum_{i=1}^n \\log\\left( 1 + \\exp\\left( [-1]^{y_i} \\beta^\\top x_i \\right) \\right)\n\\\\\n\\\\\n-\\frac{\\partial \\ell}{\\partial \\beta}\n&= \\sum_{i=1}^n [-1]^{y_i} x_i \\frac{\\exp\\left( [-1]^{y_i} \\beta^\\top x_i \\right)}{1 + \\exp\\left( [-1]^{y_i} \\beta^\\top x_i \\right)}\n\\\\\n&= -\\sum_{i=1}^n x_i \\left( y_i - \\P_\\beta(Y_i=1 \\mid X_i=x_i) \\right)\n\\end{align*}\n$$\n\n\n---\n\n## Finding $\\hat\\beta = \\argmin_{\\beta} -\\ell(\\beta)$ with gradient descent:\n\n1. Input $\\beta_0,\\ \\gamma>0,\\ j_\\max,\\ \\epsilon>0,\\ \\tfrac{d \\ell}{d\\beta}$.\n2. For $j=1,\\ 2,\\ \\ldots,\\ j_\\max$,\n$$\\beta_j = \\beta_{j-1} - \\gamma \\left(\\tfrac{d}{d\\beta}-\\!\\ell(\\beta_{j-1}) \\right)$$\n3. 
Stop if $\\epsilon > |\\beta_j - \\beta_{j-1}|$ or $|d\\ell / d\\beta\\ | < \\epsilon$.\n\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbeta.mle <- function(x, y, beta0, gamma = 0.5, jmax = 50, eps = 1e-6) {\n beta <- double(jmax) # place to hold stuff (always preallocate space)\n beta[1] <- beta0 # starting value\n for (j in 2:jmax) { # avoid possibly infinite while loops\n px <- logit(beta[j - 1] * x)\n grad <- mean(-x * (y - px))\n beta[j] <- beta[j - 1] - gamma * grad\n if (abs(grad) < eps || abs(beta[j] - beta[j - 1]) < eps) break\n }\n beta[1:j]\n}\n```\n:::\n\n\n\n\n\n## Try it:\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ntoo_big <- beta.mle(x, y, beta0 = 5, gamma = 50)\ntoo_small <- beta.mle(x, y, beta0 = 5, gamma = 1)\njust_right <- beta.mle(x, y, beta0 = 5, gamma = 10)\n```\n:::\n\n::: {.cell layout-align=\"center\" output-location='column'}\n\n```{.r .cell-code}\nnegll <- function(beta) {\n -beta * mean(x * y) -\n rowMeans(log(1 / (1 + exp(outer(beta, x)))))\n}\nblah <- list_rbind(\n map(\n rlang::dots_list(\n too_big, too_small, just_right, .named = TRUE\n ), \n as_tibble),\n names_to = \"gamma\"\n) |> mutate(negll = negll(value))\nggplot(blah, aes(value, negll)) +\n geom_point(aes(colour = gamma)) +\n facet_wrap(~gamma, ncol = 1) +\n stat_function(fun = negll, xlim = c(-2.5, 5)) +\n scale_y_log10() + \n xlab(\"beta\") + \n ylab(\"negative log likelihood\") +\n geom_vline(xintercept = tail(just_right, 1)) +\n scale_colour_brewer(palette = \"Set1\") +\n theme(legend.position = \"none\")\n```\n\n::: {.cell-output-display}\n![](00-gradient-descent_files/figure-revealjs/unnamed-chunk-18-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## Check vs. `glm()`\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nsummary(glm(y ~ x - 1, family = \"binomial\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\nCall:\nglm(formula = y ~ x - 1, family = \"binomial\")\n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \nx 1.9174 0.4785 4.008 6.13e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n(Dispersion parameter for binomial family taken to be 1)\n\n Null deviance: 138.629 on 100 degrees of freedom\nResidual deviance: 32.335 on 99 degrees of freedom\nAIC: 34.335\n\nNumber of Fisher Scoring iterations: 7\n```\n\n\n:::\n:::\n", "supporting": [ "00-gradient-descent_files" ],
[SVG hunks omitted: this commit regenerates the figures generate-data-1.svg, unnamed-chunk-2-1.svg, unnamed-chunk-3-1.svg, unnamed-chunk-4-1.svg, unnamed-chunk-6-1.svg, unnamed-chunk-11-1.svg, unnamed-chunk-14-1.svg, unnamed-chunk-17-1.svg, and unnamed-chunk-18-1.svg under _freeze/schedule/slides/00-gradient-descent/figure-revealjs/; the hunks are machine-generated plot markup with nothing to review.]
diff --git a/_freeze/schedule/slides/00-gradient-descent/figure-revealjs/unnamed-chunk-8-1.svg 
b/_freeze/schedule/slides/00-gradient-descent/figure-revealjs/unnamed-chunk-8-1.svg index 2f08f97..79cb8dc 100644
[SVG hunk omitted: regenerated figure, machine-generated plot markup.]
diff --git a/_freeze/schedule/slides/16-logistic-regression/execute-results/html.json index 3903950..80119d9 100644 --- a/_freeze/schedule/slides/16-logistic-regression/execute-results/html.json +++ b/_freeze/schedule/slides/16-logistic-regression/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "b376b72072374723c49782e4976a8d40", + "hash": "81d2686692c199cb2714ad2bef573cd8", "result": { - "markdown": "---\nlecture: \"16 Logistic regression\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 25 October 2023\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n$$\n\n\n\n\n\n## Last time\n\n\n* We showed that with two classes, the [Bayes' classifier]{.secondary} is\n\n$$g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\nwhere $p_1(X) = Pr(X \\given Y=1)$ and $p_0(X) = Pr(X \\given Y=0)$\n\n* We then looked at what happens if we **assume** $Pr(X \\given Y=y)$ is Normally distributed.\n\nWe then used this distribution and the class prior $\\pi$ to find the __posterior__ $Pr(Y=1 \\given X=x)$.\n\n## Direct model\n\nInstead, let's directly model the posterior\n\n$$\n\\begin{aligned}\nPr(Y = 1 \\given X=x) & = 
\\frac{\\exp\\{\\beta_0 + \\beta^{\\top}x\\}}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}} \\\\\nPr(Y = 0 | X=x) & = \\frac{1}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}}=1-\\frac{\\exp\\{\\beta_0 + \\beta^{\\top}x\\}}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}}\n\\end{aligned}\n$$\n\nThis is logistic regression.\n\n\n## Why this?\n\n$$Pr(Y = 1 \\given X=x) = \\frac{\\exp\\{\\beta_0 + \\beta^{\\top}x\\}}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}}$$\n\n* There are lots of ways to map $\\R \\mapsto [0,1]$.\n\n* The \"logistic\" function $z\\mapsto (1 + \\exp(-z))^{-1} = \\exp(z) / (1+\\exp(z)) =:h(z)$ is nice.\n\n* It's symmetric: $1 - h(z) = h(-z)$\n\n* Has a nice derivative: $h'(z) = \\frac{\\exp(z)}{(1 + \\exp(z))^2} = h(z)(1-h(z))$.\n\n* It's the inverse of the \"log-odds\" (logit): $\\log(p / (1-p))$.\n\n\n\n## Another linear classifier\n\nLike LDA, logistic regression is a linear classifier\n\nThe _logit_ (i.e.: log odds) transformation\ngives a linear decision boundary\n$$\\log\\left( \\frac{\\P(Y = 1 \\given X=x)}{\\P(Y = 0 \\given X=x) } \\right) = \\beta_0 + \\beta^{\\top} x$$\nThe decision boundary is the hyperplane\n$\\{x : \\beta_0 + \\beta^{\\top} x = 0\\}$\n\nIf the log-odds are below 0, classify as 0, above 0 classify as a 1.\n\n## Logistic regression is also easy in R\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic <- glm(y ~ ., dat, family = \"binomial\")\n```\n:::\n\n\nOr we can use lasso or ridge regression or a GAM as before\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlasso_logit <- cv.glmnet(x, y, family = \"binomial\")\nridge_logit <- cv.glmnet(x, y, alpha = 0, family = \"binomial\")\ngam_logit <- gam(y ~ s(x), data = dat, family = \"binomial\")\n```\n:::\n\n\n\n::: aside\nglm means generalized linear model\n:::\n\n\n## Baby example (continued from last time)\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndat1 <- generate_lda_2d(100, Sigma = .5 * diag(2))\nlogit <- glm(y ~ ., dat1 |> mutate(y = y - 1), family = \"binomial\")\nsummary(logit)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nglm(formula = y ~ ., family = \"binomial\", data = mutate(dat1, \n y = y - 1))\n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) -2.6649 0.6281 -4.243 2.21e-05 ***\nx1 2.5305 0.5995 4.221 2.43e-05 ***\nx2 1.6610 0.4365 3.805 0.000142 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n(Dispersion parameter for binomial family taken to be 1)\n\n Null deviance: 138.469 on 99 degrees of freedom\nResidual deviance: 68.681 on 97 degrees of freedom\nAIC: 74.681\n\nNumber of Fisher Scoring iterations: 6\n```\n:::\n:::\n\n\n\n## Visualizing the classification boundary\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\ngr <- expand_grid(x1 = seq(-2.5, 3, length.out = 100), \n x2 = seq(-2.5, 3, length.out = 100))\npts <- predict(logit, gr)\ng0 <- ggplot(dat1, aes(x1, x2)) +\n scale_shape_manual(values = c(\"0\", \"1\"), guide = \"none\") +\n geom_raster(data = tibble(gr, disc = pts), aes(x1, x2, fill = disc)) +\n geom_point(aes(shape = as.factor(y)), size = 4) +\n coord_cartesian(c(-2.5, 3), c(-2.5, 3)) +\n scale_fill_steps2(n.breaks = 6, name = \"log odds\") \ng0\n```\n\n::: {.cell-output-display}\n![](16-logistic-regression_files/figure-revealjs/plot-d1-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n## Calculation\n\nWhile the `R` formula for logistic regression is straightforward, it's not as easy to compute as OLS or LDA or QDA.\n\n\nLogistic regression for two classes simplifies to a likelihood:\n\nWrite $p_i(\\beta) = \\P(Y_i = 1 | X = x_i,\\beta)$\n\n* $P(Y_i = y_i \\given X = x_i, \\beta) = p_i^{y_i}(1-p_i)^{1-y_i}$ (...Bernoulli distribution)\n\n* $P(\\mathbf{Y} \\given \\mathbf{X}, \\beta) = \\prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$. \n\n\n## Calculation\n\n\nWrite $p_i(\\beta) = \\P(Y_i = 1 | X = x_i,\\beta)$\n\n\n$$\n\\begin{aligned}\n\\ell(\\beta) \n& = \\log \\left( \\prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i} \\right)\\\\\n&=\\sum_{i=1}^n \\left( y_i\\log(p_i(\\beta)) + (1-y_i)\\log(1-p_i(\\beta))\\right) \\\\\n& = \n\\sum_{i=1}^n \\left( y_i\\log(e^{\\beta^{\\top}x_i}/(1+e^{\\beta^{\\top}x_i})) - (1-y_i)\\log(1+e^{\\beta^{\\top}x_i})\\right) \\\\\n& = \n\\sum_{i=1}^n \\left( y_i\\beta^{\\top}x_i -\\log(1 + e^{\\beta^{\\top} x_i})\\right)\n\\end{aligned}\n$$\n\nThis gets optimized via Newton-Raphson updates and iteratively reweighed\nleast squares.\n\n\n## IRWLS for logistic regression (skip for now)\n\n(This is preparation for Neural Networks.)\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogit_irwls <- function(y, x, maxit = 100, tol = 1e-6) {\n p <- ncol(x)\n beta <- double(p) # initialize coefficients\n beta0 <- 0\n conv <- FALSE # hasn't converged\n iter <- 1 # first iteration\n while (!conv && (iter < maxit)) { # check loops\n iter <- iter + 1 # update first thing (so as not to forget)\n eta <- beta0 + x %*% beta\n mu <- 1 / (1 + exp(-eta))\n gp <- 1 / (mu * (1 - mu)) # inverse of derivative of logistic\n z <- eta + (y - mu) * gp # effective transformed response\n beta_new <- coef(lm(z ~ x, weights = 1 / gp)) # do Weighted Least Squares\n conv <- mean(abs(c(beta0, beta) - betaNew)) < tol # check if the betas are \"moving\"\n beta0 <- betaNew[1] # update betas\n beta <- betaNew[-1]\n }\n return(c(beta0, beta))\n}\n```\n:::\n\n\n\n## Comparing LDA and Logistic regression\n\nBoth decision boundaries are linear in $x$: \n\n- LDA $\\longrightarrow \\alpha_0 + \\alpha_1^\\top x$ \n- Logit $\\longrightarrow \\beta_0 + \\beta_1^\\top x$.\n\nBut the parameters are estimated differently.\n\n## Comparing LDA and Logistic regression\n\nExamine the joint distribution of $(X,\\ Y)$ [(not the posterior)]{.f3}: \n\n- LDA: $f(X_i,\\ Y_i) = \\underbrace{ f(X_i \\given Y_i)}_{\\textrm{Gaussian}}\\underbrace{ f(Y_i)}_{\\textrm{Bernoulli}}$\n- Logistic Regression: $f(X_i,Y_i) = \\underbrace{ f(Y_i\\given 
X_i)}_{\\textrm{Logistic}}\\underbrace{ f(X_i)}_{\\textrm{Ignored}}$\n \n* LDA estimates the joint, but Logistic estimates only the conditional (posterior) distribution. [But this is really all we need.]{.hand}\n\n* So logistic requires fewer assumptions.\n\n* But if the two classes are perfectly separable, logistic crashes (and the MLE is undefined, too many solutions)\n\n* LDA \"works\" even if the conditional isn't normal, but works very poorly if any X is qualitative\n\n\n## Comparing with QDA (2 classes)\n\n\n* Recall: this gives a \"quadratic\" decision boundary (it's a curve).\n\n* If we have $p$ columns in $X$\n - Logistic estimates $p+1$ parameters\n - LDA estimates $2p + p(p+1)/2 + 1$\n - QDA estimates $2p + p(p+1) + 1$\n \n* If $p=50$,\n - Logistic: 51\n - LDA: 1376\n - QDA: 2651\n \n* QDA doesn't get used much: there are better nonlinear versions with way fewer parameters\n\n## Bad parameter counting\n\nI've motivated LDA as needing $\\Sigma$, $\\pi$ and $\\mu_0$, $\\mu_1$\n\nIn fact, we don't _need_ all of this to get the decision boundary.\n\nSo the \"degrees of freedom\" is much lower if we only want the _classes_ and not\nthe _probabilities_.\n\nThe decision boundary only really depends on\n\n* $\\Sigma^{-1}(\\mu_1-\\mu_0)$ \n* $(\\mu_1+\\mu_0)$, \n* so appropriate algorithms estimate $<2p$ parameters.\n\n## Note again:\n\nwhile logistic regression and LDA produce linear decision boundaries, they are **not** linear smoothers\n\nAIC/BIC/Cp work if you use the likelihood correctly and count degrees-of-freedom correctly\n\nMust people use either test set or CV\n\n\n# Next time:\n\nNonlinear classifiers\n", + "engine": "knitr", + "markdown": "---\nlecture: \"16 Logistic regression\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n\n\n## {{< meta lecture >}} {.large background-image=\"gfx/smooths.svg\" background-opacity=\"0.3\"}\n\n[Stat 406]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 16 October 2024\n\n\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n## Last time\n\n\n* We showed that with two classes, the [Bayes' classifier]{.secondary} is\n\n$$g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}$$\n\nwhere\n\n* $p_1(X) = \\P(X \\given Y=1)$\n* $p_0(X) = \\P(X \\given Y=0)$\n* $\\pi = \\P(Y=1)$\n\n. . 
.\n\n### This lecture\n\nHow do we estimate $p_1(X), p_0(X), \\pi$?\n\n## Warmup: estimating $\\pi = Pr(Y=1)$\n\nA good estimator:\n\n$$\n\\hat \\pi = \\frac{1}{n} \\sum_{i=1}^n 1_{y_i = 1}\n$$\n\n(I.e. the fraction of $1$s in the training set)\n\nThis estimator has low bias and low variance.\n\n. . .\n\n\\\n*As we will soon see, it turns out we won't have to use this estimator.*\n\n\n\n\n\n\n## Estimating $p_0$ and $p_1$ is much harder\n\n$\\P(X=x \\mid Y = 1)$ and $\\P(X=x \\mid Y = 0)$ are $p$-dimensional distributions.\\\n[Remember the *curse of dimensionality*?]{.small}\n\n\\\n[Can we simplify our estimation problem?]{.secondary}\n\n\n## Sidestepping the hard estimation problem\n\nRather than estimating $\\P(X=x \\mid Y = 1)$ and $\\P(X=x \\mid Y = 0)$,\nI claim we can instead estimate the simpler ratio\n\n$$\n\\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)}\n$$\nWhy?\n\n. . .\n\n\\\nRecall that $g_*(X)$ only depends on the *ratio* $\\P(X \\mid Y = 1) / \\P(X \\mid Y = 0)$.\n\n. . .\n\n$$\n\\begin{align*}\n \\frac{\\P(X=x \\mid Y = 1)}{\\P(X=x \\mid Y = 0)}\n &=\n \\frac{\n \\tfrac{\\P(Y = 1 \\mid X=x) \\P(X=x)}{\\P(Y = 1)}\n }{\n \\tfrac{\\P(Y = 0 \\mid X=x) \\P(X=x)}{\\P(Y = 0)}\n }\n =\n \\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)}\n \\underbrace{\\left(\\frac{1-\\pi}{\\pi}\\right)}_{\\text{Easy to estimate with } \\hat \\pi}\n\\end{align*}\n$$\n\n## Direct model\n\n[As with regression, we'll start with a simple model.]{.small}\\\nAssume our data can be modelled by a distribution of the form\n\n$$\n\\log\\left( \\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)} \\right) = \\beta_0 + \\beta^\\top x\n$$\n\n[Why does it make sense to model the *log ratio* rather than the *ratio*?]{.secondary}\n\n. . .\n\n\nFrom this equation, we can recover an estimate of the ratio we need for the [Bayes classifier]{.secondary}:\n\n$$\n\\begin{align*}\n\\log\\left( \\frac{\\P(X=x \\mid Y = 1)}{\\P(X=x \\mid Y = 0)} \\right) \n&=\n\\log\\left( \\frac{\\tfrac{\\P(X=x)}{\\P(Y = 1)}}{\\tfrac{\\P(X=x)}{\\P(Y = 0)}} \\right) +\n\\log\\left( \\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)} \\right)\n\\\\\n&= \\underbrace{\\left( \\log\\tfrac{1 - \\pi}{\\pi} + \\beta_0 \\right)}_{\\beta_0'} + \\beta^\\top x\n\\end{align*}\n$$\n\n## Recovering class probabilities\n\n$$\n\\text{Our model:}\\qquad\n\\log\\left( \\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)} \\right) = \\beta_0 + \\beta^\\top x\n$$\n\n. . .\n\n\\\nWe know that $\\P(Y = 1 \\mid X=x) + \\P(Y = 0 \\mid X=x) = 1$. 
So...\n\n$$\n\\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)}\n=\n\\frac{\\P(Y = 1 \\mid X=x)}{1 - \\P(Y = 1 \\mid X=x)}\n=\n\\exp\\left( \\beta_0 + \\beta^\\top x \\right),\n$$\n\n---\n\n$$\n\\frac{\\P(Y = 1 \\mid X=x)}{\\P(Y = 0 \\mid X=x)}\n=\n\\frac{\\P(Y = 1 \\mid X=x)}{1 - \\P(Y = 1 \\mid X=x)}\n=\n\\exp\\left( \\beta_0 + \\beta^\\top x \\right),\n$$\n\n[After algebra...]{.small}\n$$\n\\begin{aligned}\n\\P(Y = 1 \\given X=x) &= \\frac{\\exp\\{\\beta_0 + \\beta^{\\top}x\\}}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}},\n\\\\\\\n\\P(Y = 0 | X=x) &= \\frac{1}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}}\n\\end{aligned}\n$$\n\nThis is logistic regression.\n\n\n## Logistic Regression\n\n$$\\P(Y = 1 \\given X=x) = \\frac{\\exp\\{\\beta_0 + \\beta^{\\top}x\\}}{1 + \\exp\\{\\beta_0 + \\beta^{\\top}x\\}}\n= h\\left( \\beta_0 + \\beta^\\top x \\right),$$\n\nwhere $h(z) = (1 + \\exp(-z))^{-1} = \\exp(z) / (1+\\exp(z))$\nis the [logistic function]{.secondary}.\n\n\\\n\n::: flex\n::: w-60\n\n### The \"logistic\" function is nice.\n\n* It's symmetric: $1 - h(z) = h(-z)$\n\n* Has a nice derivative: $h'(z) = \\frac{\\exp(z)}{(1 + \\exp(z))^2} = h(z)(1-h(z))$.\n\n* It's the inverse of the \"log-odds\" (logit): $\\log(p / (1-p))$.\n\n:::\n::: w-35\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](16-logistic-regression_files/figure-revealjs/unnamed-chunk-1-1.svg){fig-align='center'}\n:::\n:::\n\n\n:::\n:::\n\n\n## Logistic regression is a Linear Classifier\n\n\nLogistic regression is a [linear classifier]{.secondary}\n\n\n\n$$\\log\\left( \\frac{\\P(Y = 1 \\given X=x)}{\\P(Y = 0 \\given X=x) } \\right) = \\beta_0 + \\beta^{\\top} x$$\n\n* If the log-odds are $>0$, classify as 1\\\n [($Y=1$ is more likely)]{.small}\n\n* If the log-odds are $<0$, classify as a 0\\\n [($Y=0$ is more likely)]{.small}\n\n\\\nThe [decision boundary]{.secondary} is the hyperplane\n$\\{x : \\beta_0 + \\beta^{\\top} x = 0\\}$\n\n\n## Visualizing the classification boundary\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code code-fold=\"true\"}\nlibrary(mvtnorm)\nlibrary(MASS)\ngenerate_lda_2d <- function(\n n, p = c(.5, .5),\n mu = matrix(c(0, 0, 1, 1), 2),\n Sigma = diag(2)) {\n X <- rmvnorm(n, sigma = Sigma)\n tibble(\n y = which(rmultinom(n, 1, p) == 1, TRUE)[, 1],\n x1 = X[, 1] + mu[1, y],\n x2 = X[, 2] + mu[2, y]\n )\n}\n\ndat1 <- generate_lda_2d(100, Sigma = .5 * diag(2))\nlogit <- glm(y ~ ., dat1 |> mutate(y = y - 1), family = \"binomial\")\n\ngr <- expand_grid(x1 = seq(-2.5, 3, length.out = 100), \n x2 = seq(-2.5, 3, length.out = 100))\npts <- predict(logit, gr)\ng0 <- ggplot(dat1, aes(x1, x2)) +\n scale_shape_manual(values = c(\"0\", \"1\"), guide = \"none\") +\n geom_raster(data = tibble(gr, disc = pts), aes(x1, x2, fill = disc)) +\n geom_point(aes(shape = as.factor(y)), size = 4) +\n coord_cartesian(c(-2.5, 3), c(-2.5, 3)) +\n scale_fill_steps2(n.breaks = 6, name = \"log odds\") \ng0\n```\n\n::: {.cell-output-display}\n![](16-logistic-regression_files/figure-revealjs/plot-d1-1.svg){fig-align='center'}\n:::\n:::\n\n\n\n\n## Linear classifier $\\ne$ linear smoother\n\n* While logistic regression produces linear decision boundaries, it is **not** a linear smoother\n\n* AIC/BIC/Cp work if you use the likelihood correctly and count degrees-of-freedom correctly\n\n * $\\mathrm{df} = p + 1$ ([Why?]{.secondary})\n\n* Most people use CV\n\n\n## Logistic regression in R\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic <- glm(y ~ ., dat, family = 
\"binomial\")\n```\n:::\n\n\n\nOr we can use lasso or ridge regression or a GAM as before\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlasso_logit <- cv.glmnet(x, y, family = \"binomial\")\nridge_logit <- cv.glmnet(x, y, alpha = 0, family = \"binomial\")\ngam_logit <- gam(y ~ s(x), data = dat, family = \"binomial\")\n```\n:::\n\n\n\n\n::: aside\nglm means [generalized linear model]{.secondary}\n:::\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Estimating $\\beta_0$, $\\beta$\n\nFor regression...\n\n$$\n\\text{Model:} \\qquad\n\\P(Y=y \\mid X=x) = \\mathcal N( y; \\:\\: \\beta^\\top x, \\:\\:\\sigma^2)\n$$\n\n... recall that we motivated OLS with\nthe [principle of maximum likelihood]{.secondary}\n\n$$\n\\begin{align*}\n\\hat \\beta_\\mathrm{OLS}\n&= \\argmax_{\\beta} \\prod_{i=1}^n \\P(Y_i = y_i \\mid X_i = x_i)\n\\\\\n&= \\argmin_{\\beta} \\sum_{i=1}^n -\\log\\P(Y_i = y_i \\mid X_i = x_i)\n\\\\\n\\\\\n&= \\ldots (\\text{because regression is nice})\n\\\\\n&= \\textstyle \\left( \\sum_{i=1}^n x_i x_i^\\top \\right)^{-1}\\left( \\sum_{i=1}^n y_i x_i \\right)\n\\end{align*}\n$$\n\n## Estimating $\\beta_0$, $\\beta$\n\nFor classification with logistic regression...\n\n$$\n\\begin{align*}\n\\text{Model:} &\\qquad\n\\tfrac{\\P(Y=1 \\mid X=x)}{\\P(Y=0 \\mid X=x)} = \\exp\\left( \\beta_0 +\\beta^\\top x \\right)\n\\\\\n\\text{Or alternatively:} &\\qquad \\P(Y=1 \\mid X=x) = h\\left(\\beta_0 + \\beta^\\top x \\right)\n\\\\\n&\\qquad \\P(Y=0 \\mid X=x) = h\\left(-(\\beta_0 + \\beta^\\top x)\\right)\n\\end{align*}\n$$\n... we can also apply\nthe [principle of maximum likelihood]{.secondary}\n\n$$\n\\begin{align*}\n\\hat \\beta_{0}, \\hat \\beta\n&= \\argmax_{\\beta_0, \\beta} \\prod_{i=1}^n \\P(Y_i \\mid X_i)\n\\\\\n&= \\argmin_{\\beta_0,\\beta} \\sum_{i=1}^n -\\log\\P(Y_i \\mid X_i)\n\\end{align*}\n$$\n\n. . 
.\n\nUnfortunately that's as far as we can get with algebra alone.\n\n\n\n\n\n\n\n\n\n\n\n\n# Next time:\n\nThe workhorse algorithm for obtaining $\hat \beta_0$, $\hat \beta$\n", "supporting": [ "16-logistic-regression_files" ], diff --git a/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/plot-d1-1.svg b/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/plot-d1-1.svg index a9c714f..0f9afdd 100644 --- a/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/plot-d1-1.svg +++ b/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/plot-d1-1.svg @@ -3,627 +3,615 @@ [SVG markup garbled in extraction; hunk body omitted. Regenerated figure: the classification-boundary plot.] diff --git a/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/unnamed-chunk-1-1.svg b/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/unnamed-chunk-1-1.svg new file mode 100644 index 0000000..be1b7e6 --- /dev/null +++ b/_freeze/schedule/slides/16-logistic-regression/figure-revealjs/unnamed-chunk-1-1.svg @@ -0,0 +1,243 @@ [SVG markup garbled in extraction; hunk body omitted. New figure: the logistic function curve.] diff --git a/_freeze/site_libs/revealjs/dist/theme/quarto.css b/_freeze/site_libs/revealjs/dist/theme/quarto.css index 62c5b6b..2be515e 100644 --- a/_freeze/site_libs/revealjs/dist/theme/quarto.css +++ b/_freeze/site_libs/revealjs/dist/theme/quarto.css @@ -5,4 +5,4 @@ * we also add `bright-[color]-` synonyms for the `-[color]-intense` classes since * that seems to be what ansi_up emits * 
-*/.ansi-black-fg{color:#3e424d}.ansi-black-bg{background-color:#3e424d}.ansi-black-intense-black,.ansi-bright-black-fg{color:#282c36}.ansi-black-intense-black,.ansi-bright-black-bg{background-color:#282c36}.ansi-red-fg{color:#e75c58}.ansi-red-bg{background-color:#e75c58}.ansi-red-intense-red,.ansi-bright-red-fg{color:#b22b31}.ansi-red-intense-red,.ansi-bright-red-bg{background-color:#b22b31}.ansi-green-fg{color:#00a250}.ansi-green-bg{background-color:#00a250}.ansi-green-intense-green,.ansi-bright-green-fg{color:#007427}.ansi-green-intense-green,.ansi-bright-green-bg{background-color:#007427}.ansi-yellow-fg{color:#ddb62b}.ansi-yellow-bg{background-color:#ddb62b}.ansi-yellow-intense-yellow,.ansi-bright-yellow-fg{color:#b27d12}.ansi-yellow-intense-yellow,.ansi-bright-yellow-bg{background-color:#b27d12}.ansi-blue-fg{color:#208ffb}.ansi-blue-bg{background-color:#208ffb}.ansi-blue-intense-blue,.ansi-bright-blue-fg{color:#0065ca}.ansi-blue-intense-blue,.ansi-bright-blue-bg{background-color:#0065ca}.ansi-magenta-fg{color:#d160c4}.ansi-magenta-bg{background-color:#d160c4}.ansi-magenta-intense-magenta,.ansi-bright-magenta-fg{color:#a03196}.ansi-magenta-intense-magenta,.ansi-bright-magenta-bg{background-color:#a03196}.ansi-cyan-fg{color:#60c6c8}.ansi-cyan-bg{background-color:#60c6c8}.ansi-cyan-intense-cyan,.ansi-bright-cyan-fg{color:#258f8f}.ansi-cyan-intense-cyan,.ansi-bright-cyan-bg{background-color:#258f8f}.ansi-white-fg{color:#c5c1b4}.ansi-white-bg{background-color:#c5c1b4}.ansi-white-intense-white,.ansi-bright-white-fg{color:#a1a6b2}.ansi-white-intense-white,.ansi-bright-white-bg{background-color:#a1a6b2}.ansi-default-inverse-fg{color:#fff}.ansi-default-inverse-bg{background-color:#000}.ansi-bold{font-weight:bold}.ansi-underline{text-decoration:underline}:root{--quarto-body-bg: #fefefe;--quarto-body-color: #222;--quarto-text-muted: #6f6f6f;--quarto-border-color: #bbbbbb;--quarto-border-width: 1px;--quarto-border-radius: 4px}table.gt_table{color:var(--quarto-body-color);font-size:1em;width:100%;background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_column_spanner_outer{color:var(--quarto-body-color);background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_col_heading{color:var(--quarto-body-color);font-weight:bold;background-color:rgba(0,0,0,0)}table.gt_table thead.gt_col_headings{border-bottom:1px solid currentColor;border-top-width:inherit;border-top-color:var(--quarto-border-color)}table.gt_table thead.gt_col_headings:not(:first-child){border-top-width:1px;border-top-color:var(--quarto-border-color)}table.gt_table td.gt_row{border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-width:0px}table.gt_table tbody.gt_table_body{border-top-width:1px;border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-color:currentColor}div.columns{display:initial;gap:initial}div.column{display:inline-block;overflow-x:initial;vertical-align:top;width:50%}.code-annotation-tip-content{word-wrap:break-word}.code-annotation-container-hidden{display:none !important}dl.code-annotation-container-grid{display:grid;grid-template-columns:min-content auto}dl.code-annotation-container-grid dt{grid-column:1}dl.code-annotation-container-grid dd{grid-column:2}pre.sourceCode.code-annotation-code{padding-right:0}code.sourceCode 
.code-annotation-anchor{z-index:100;position:relative;float:right;background-color:rgba(0,0,0,0)}input[type=checkbox]{margin-right:.5ch}:root{--mermaid-bg-color: #fefefe;--mermaid-edge-color: #e98a15;--mermaid-node-fg-color: #222;--mermaid-fg-color: #222;--mermaid-fg-color--lighter: #3c3c3c;--mermaid-fg-color--lightest: #555555;--mermaid-font-family: Commissioner, Source Sans Pro, Helvetica, sans-serif;--mermaid-label-bg-color: #fefefe;--mermaid-label-fg-color: #2c365e;--mermaid-node-bg-color: rgba(44, 54, 94, 0.1);--mermaid-node-fg-color: #222}@media print{:root{font-size:11pt}#quarto-sidebar,#TOC,.nav-page{display:none}.page-columns .content{grid-column-start:page-start}.fixed-top{position:relative}.panel-caption,.figure-caption,figcaption{color:#666}}.code-copy-button{position:absolute;top:0;right:0;border:0;margin-top:5px;margin-right:5px;background-color:rgba(0,0,0,0);z-index:3}.code-copy-button:focus{outline:none}.code-copy-button-tooltip{font-size:.75em}pre.sourceCode:hover>.code-copy-button>.bi::before{display:inline-block;height:1rem;width:1rem;content:"";vertical-align:-0.125em;background-image:url('data:image/svg+xml,');background-repeat:no-repeat;background-size:1rem 1rem}pre.sourceCode:hover>.code-copy-button-checked>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button:hover>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button-checked:hover>.bi::before{background-image:url('data:image/svg+xml,')}.panel-tabset [role=tablist]{border-bottom:1px solid #bbb;list-style:none;margin:0;padding:0;width:100%}.panel-tabset [role=tablist] *{-webkit-box-sizing:border-box;box-sizing:border-box}@media(min-width: 30em){.panel-tabset [role=tablist] li{display:inline-block}}.panel-tabset [role=tab]{border:1px solid rgba(0,0,0,0);border-top-color:#bbb;display:block;padding:.5em 1em;text-decoration:none}@media(min-width: 30em){.panel-tabset [role=tab]{border-top-color:rgba(0,0,0,0);display:inline-block;margin-bottom:-1px}}.panel-tabset [role=tab][aria-selected=true]{background-color:#bbb}@media(min-width: 30em){.panel-tabset [role=tab][aria-selected=true]{background-color:rgba(0,0,0,0);border:1px solid #bbb;border-bottom-color:#fefefe}}@media(min-width: 30em){.panel-tabset [role=tab]:hover:not([aria-selected=true]){border:1px solid #bbb}}.code-with-filename .code-with-filename-file{margin-bottom:0;padding-bottom:2px;padding-top:2px;padding-left:.7em;border:var(--quarto-border-width) solid var(--quarto-border-color);border-radius:var(--quarto-border-radius);border-bottom:0;border-bottom-left-radius:0%;border-bottom-right-radius:0%}.code-with-filename div.sourceCode,.reveal .code-with-filename div.sourceCode{margin-top:0;border-top-left-radius:0%;border-top-right-radius:0%}.code-with-filename .code-with-filename-file pre{margin-bottom:0}.code-with-filename .code-with-filename-file{background-color:rgba(219,219,219,.8)}.quarto-dark .code-with-filename .code-with-filename-file{background-color:#555}.code-with-filename .code-with-filename-file strong{font-weight:400}a.external:after{content:"";background-image:url('data:image/svg+xml,');background-size:contain;background-repeat:no-repeat;background-position:center center;margin-left:.2em;padding-right:.75em}div.sourceCode code a.external:after{content:none}a.external:after:hover{cursor:pointer}.quarto-ext-icon{display:inline-block;font-size:.75em;padding-left:.3em}.reveal.center .slide aside,.reveal.center .slide 
div.aside{position:initial}section.has-light-background,section.has-light-background h1,section.has-light-background h2,section.has-light-background h3,section.has-light-background h4,section.has-light-background h5,section.has-light-background h6{color:#222}section.has-light-background a,section.has-light-background a:hover{color:#2a76dd}section.has-light-background code{color:#4758ab}section.has-dark-background,section.has-dark-background h1,section.has-dark-background h2,section.has-dark-background h3,section.has-dark-background h4,section.has-dark-background h5,section.has-dark-background h6{color:#fff}section.has-dark-background a,section.has-dark-background a:hover{color:#42affa}section.has-dark-background code{color:#ffa07a}#title-slide,div.reveal div.slides section.quarto-title-block{text-align:center}#title-slide .subtitle,div.reveal div.slides section.quarto-title-block .subtitle{margin-bottom:2.5rem}.reveal .slides{text-align:left}.reveal .title-slide h1{font-size:1.6em}.reveal[data-navigation-mode=linear] .title-slide h1{font-size:2.5em}.reveal div.sourceCode{border:1px solid #bbb;border-radius:4px}.reveal pre{width:100%;box-shadow:none;background-color:#fefefe;border:none;margin:0;font-size:.55em}.reveal code{color:var(--quarto-hl-fu-color);background-color:rgba(0,0,0,0);white-space:pre-wrap}.reveal pre.sourceCode code{background-color:#fefefe;padding:6px 9px;max-height:500px;white-space:pre}.reveal pre code{background-color:#fefefe;color:#222}.reveal .column-output-location{display:flex;align-items:stretch}.reveal .column-output-location .column:first-of-type div.sourceCode{height:100%;background-color:#fefefe}.reveal blockquote{display:block;position:relative;color:#6f6f6f;width:unset;margin:var(--r-block-margin) auto;padding:.625rem 1.75rem;border-left:.25rem solid #6f6f6f;font-style:normal;background:none;box-shadow:none}.reveal blockquote p:first-child,.reveal blockquote p:last-child{display:block}.reveal .slide aside,.reveal .slide div.aside{position:absolute;bottom:20px;font-size:0.7em;color:#6f6f6f}.reveal .slide sup{font-size:0.7em}.reveal .slide.scrollable aside,.reveal .slide.scrollable div.aside{position:relative;margin-top:1em}.reveal .slide aside .aside-footnotes{margin-bottom:0}.reveal .slide aside .aside-footnotes li:first-of-type{margin-top:0}.reveal .layout-sidebar{display:flex;width:100%;margin-top:.8em}.reveal .layout-sidebar .panel-sidebar{width:270px}.reveal .layout-sidebar-left .panel-sidebar{margin-right:calc(0.5em*2)}.reveal .layout-sidebar-right .panel-sidebar{margin-left:calc(0.5em*2)}.reveal .layout-sidebar .panel-fill,.reveal .layout-sidebar .panel-center,.reveal .layout-sidebar .panel-tabset{flex:1}.reveal .panel-input,.reveal .panel-sidebar{font-size:.5em;padding:.5em;border-style:solid;border-color:#bbb;border-width:1px;border-radius:4px;background-color:#f8f9fa}.reveal .panel-sidebar :first-child,.reveal .panel-fill :first-child{margin-top:0}.reveal .panel-sidebar :last-child,.reveal .panel-fill :last-child{margin-bottom:0}.panel-input>div,.panel-input>div>div{vertical-align:middle;padding-right:1em}.reveal p,.reveal .slides section,.reveal .slides section>section{line-height:1.3}.reveal.smaller .slides section,.reveal .slides section.smaller,.reveal .slides section .callout{font-size:0.7em}.reveal.smaller .slides section section{font-size:inherit}.reveal.smaller .slides h1,.reveal .slides section.smaller h1{font-size:calc(2.5em/0.7)}.reveal.smaller .slides h2,.reveal .slides section.smaller h2{font-size:calc(1.6em/0.7)}.reveal.smaller .slides 
h3,.reveal .slides section.smaller h3{font-size:calc(1.3em/0.7)}.reveal .columns>.column>:not(ul,ol){margin-left:.25em;margin-right:.25em}.reveal .columns>.column:first-child>:not(ul,ol){margin-right:.5em;margin-left:0}.reveal .columns>.column:last-child>:not(ul,ol){margin-right:0;margin-left:.5em}.reveal .slide-number{color:#eea143;background-color:#fefefe}.reveal .footer{color:#6f6f6f}.reveal .footer a{color:#e98a15}.reveal .footer.has-dark-background{color:#fff}.reveal .footer.has-dark-background a{color:#7bc6fa}.reveal .footer.has-light-background{color:#505050}.reveal .footer.has-light-background a{color:#6a9bdd}.reveal .slide-number{color:#6f6f6f}.reveal .slide-number.has-dark-background{color:#fff}.reveal .slide-number.has-light-background{color:#505050}.reveal .slide figure>figcaption,.reveal .slide img.stretch+p.caption,.reveal .slide img.r-stretch+p.caption{font-size:0.7em}@media screen and (min-width: 500px){.reveal .controls[data-controls-layout=edges] .navigate-left{left:.2em}.reveal .controls[data-controls-layout=edges] .navigate-right{right:.2em}.reveal .controls[data-controls-layout=edges] .navigate-up{top:.4em}.reveal .controls[data-controls-layout=edges] .navigate-down{bottom:2.3em}}.tippy-box[data-theme~=light-border]{background-color:#fefefe;color:#222;border-radius:4px;border:solid 1px #6f6f6f;font-size:.6em}.tippy-box[data-theme~=light-border] .tippy-arrow{color:#6f6f6f}.tippy-box[data-placement^=bottom]>.tippy-content{padding:7px 10px;z-index:1}.reveal .callout.callout-style-simple .callout-body,.reveal .callout.callout-style-default .callout-body,.reveal .callout.callout-style-simple div.callout-title,.reveal .callout.callout-style-default div.callout-title{font-size:inherit}.reveal .callout.callout-style-default .callout-icon::before,.reveal .callout.callout-style-simple .callout-icon::before{height:2rem;width:2rem;background-size:2rem 2rem}.reveal .callout.callout-titled .callout-title p{margin-top:.5em}.reveal .callout.callout-titled .callout-icon::before{margin-top:1rem}.reveal .callout.callout-titled .callout-body>.callout-content>:last-child{margin-bottom:1rem}.reveal .panel-tabset [role=tab]{padding:.25em .7em}.reveal .slide-menu-button .fa-bars::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-easel2::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-brush::before{background-image:url('data:image/svg+xml,')}/*! 
light */.reveal ol[type=a]{list-style-type:lower-alpha}.reveal ol[type=a s]{list-style-type:lower-alpha}.reveal ol[type=A s]{list-style-type:upper-alpha}.reveal ol[type=i]{list-style-type:lower-roman}.reveal ol[type=i s]{list-style-type:lower-roman}.reveal ol[type=I s]{list-style-type:upper-roman}.reveal ol[type="1"]{list-style-type:decimal}.reveal ul.task-list{list-style:none}.reveal ul.task-list li input[type=checkbox]{width:2em;height:2em;margin:0 1em .5em -1.6em;vertical-align:middle}div.cell-output-display div.pagedtable-wrapper table.table{font-size:.6em}.reveal .code-annotation-container-hidden{display:none}.reveal code.sourceCode button.code-annotation-anchor,.reveal code.sourceCode .code-annotation-anchor{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;color:var(--quarto-hl-co-color);border:solid var(--quarto-hl-co-color) 1px;border-radius:50%;font-size:.7em;line-height:1.2em;margin-top:2px;user-select:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;-o-user-select:none}.reveal code.sourceCode button.code-annotation-anchor{cursor:pointer}.reveal code.sourceCode a.code-annotation-anchor{text-align:center;vertical-align:middle;text-decoration:none;cursor:default;height:1.2em;width:1.2em}.reveal code.sourceCode.fragment a.code-annotation-anchor{left:auto}.reveal #code-annotation-line-highlight-gutter{width:100%;border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2}.reveal #code-annotation-line-highlight{margin-left:-8em;width:calc(100% + 4em);border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2;margin-bottom:-2px}.reveal code.sourceCode .code-annotation-anchor.code-annotation-active{background-color:var(--quarto-hl-normal-color, #aaaaaa);border:solid var(--quarto-hl-normal-color, #aaaaaa) 1px;color:#fefefe;font-weight:bolder}.reveal pre.code-annotation-code{padding-top:0;padding-bottom:0}.reveal pre.code-annotation-code code{z-index:3;padding-left:0px}.reveal dl.code-annotation-container-grid{margin-left:.1em}.reveal dl.code-annotation-container-grid dt{margin-top:.65rem;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;border:solid #222 1px;border-radius:50%;height:1.3em;width:1.3em;line-height:1.3em;font-size:.5em;text-align:center;vertical-align:middle;text-decoration:none}.reveal dl.code-annotation-container-grid dd{margin-left:.25em}.reveal .scrollable ol li:first-child:nth-last-child(n+10),.reveal .scrollable ol li:first-child:nth-last-child(n+10)~li{margin-left:1em}html.print-pdf .reveal .slides .pdf-page:last-child{page-break-after:avoid}.reveal .quarto-title-block .quarto-title-authors{display:flex;justify-content:center}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author{padding-left:.5em;padding-right:.5em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:hover,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:visited,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:active{color:inherit;text-decoration:none}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-name{margin-bottom:.1rem}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-email{margin-top:0px;margin-bottom:.4em;font-size:.6em}.reveal .quarto-title-block 
.quarto-title-authors .quarto-title-author .quarto-title-author-orcid img{margin-bottom:4px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation{font-size:.7em;margin-top:0px;margin-bottom:8px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation:first{margin-top:12px}ol{padding-left:.5em}ul{list-style:none}ul li::marker{color:#2c365e}dt{color:#e98a15}#title-slide{text-align:left}#title-slide .title{color:#2c365e}#title-slide .author{text-align:left;color:#e98a15;font-weight:bold}#title-slide .institute{padding-bottom:200px}.small{font-size:.75em}.smallest{font-size:.5em}.large{font-size:1.5em}.medium{font-size:.9em}.center-align{text-align:center}.hand{font-family:"Gochi Hand",cursive;font-size:125%}.hand-blue{font-family:"Gochi Hand",cursive;color:var(--primary);font-size:125%}/*# sourceMappingURL=f95d2bded9c28492b788fe14c3e9f347.css.map */ +*/.ansi-black-fg{color:#3e424d}.ansi-black-bg{background-color:#3e424d}.ansi-black-intense-black,.ansi-bright-black-fg{color:#282c36}.ansi-black-intense-black,.ansi-bright-black-bg{background-color:#282c36}.ansi-red-fg{color:#e75c58}.ansi-red-bg{background-color:#e75c58}.ansi-red-intense-red,.ansi-bright-red-fg{color:#b22b31}.ansi-red-intense-red,.ansi-bright-red-bg{background-color:#b22b31}.ansi-green-fg{color:#00a250}.ansi-green-bg{background-color:#00a250}.ansi-green-intense-green,.ansi-bright-green-fg{color:#007427}.ansi-green-intense-green,.ansi-bright-green-bg{background-color:#007427}.ansi-yellow-fg{color:#ddb62b}.ansi-yellow-bg{background-color:#ddb62b}.ansi-yellow-intense-yellow,.ansi-bright-yellow-fg{color:#b27d12}.ansi-yellow-intense-yellow,.ansi-bright-yellow-bg{background-color:#b27d12}.ansi-blue-fg{color:#208ffb}.ansi-blue-bg{background-color:#208ffb}.ansi-blue-intense-blue,.ansi-bright-blue-fg{color:#0065ca}.ansi-blue-intense-blue,.ansi-bright-blue-bg{background-color:#0065ca}.ansi-magenta-fg{color:#d160c4}.ansi-magenta-bg{background-color:#d160c4}.ansi-magenta-intense-magenta,.ansi-bright-magenta-fg{color:#a03196}.ansi-magenta-intense-magenta,.ansi-bright-magenta-bg{background-color:#a03196}.ansi-cyan-fg{color:#60c6c8}.ansi-cyan-bg{background-color:#60c6c8}.ansi-cyan-intense-cyan,.ansi-bright-cyan-fg{color:#258f8f}.ansi-cyan-intense-cyan,.ansi-bright-cyan-bg{background-color:#258f8f}.ansi-white-fg{color:#c5c1b4}.ansi-white-bg{background-color:#c5c1b4}.ansi-white-intense-white,.ansi-bright-white-fg{color:#a1a6b2}.ansi-white-intense-white,.ansi-bright-white-bg{background-color:#a1a6b2}.ansi-default-inverse-fg{color:#fff}.ansi-default-inverse-bg{background-color:#000}.ansi-bold{font-weight:bold}.ansi-underline{text-decoration:underline}:root{--quarto-body-bg: #fefefe;--quarto-body-color: #222;--quarto-text-muted: #6f6f6f;--quarto-border-color: #bbbbbb;--quarto-border-width: 1px;--quarto-border-radius: 4px}table.gt_table{color:var(--quarto-body-color);font-size:1em;width:100%;background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_column_spanner_outer{color:var(--quarto-body-color);background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_col_heading{color:var(--quarto-body-color);font-weight:bold;background-color:rgba(0,0,0,0)}table.gt_table thead.gt_col_headings{border-bottom:1px solid 
currentColor;border-top-width:inherit;border-top-color:var(--quarto-border-color)}table.gt_table thead.gt_col_headings:not(:first-child){border-top-width:1px;border-top-color:var(--quarto-border-color)}table.gt_table td.gt_row{border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-width:0px}table.gt_table tbody.gt_table_body{border-top-width:1px;border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-color:currentColor}div.columns{display:initial;gap:initial}div.column{display:inline-block;overflow-x:initial;vertical-align:top;width:50%}.code-annotation-tip-content{word-wrap:break-word}.code-annotation-container-hidden{display:none !important}dl.code-annotation-container-grid{display:grid;grid-template-columns:min-content auto}dl.code-annotation-container-grid dt{grid-column:1}dl.code-annotation-container-grid dd{grid-column:2}pre.sourceCode.code-annotation-code{padding-right:0}code.sourceCode .code-annotation-anchor{z-index:100;position:relative;float:right;background-color:rgba(0,0,0,0)}input[type=checkbox]{margin-right:.5ch}:root{--mermaid-bg-color: #fefefe;--mermaid-edge-color: #e98a15;--mermaid-node-fg-color: #222;--mermaid-fg-color: #222;--mermaid-fg-color--lighter: #3c3c3c;--mermaid-fg-color--lightest: #555555;--mermaid-font-family: Commissioner, Source Sans Pro, Helvetica, sans-serif;--mermaid-label-bg-color: #fefefe;--mermaid-label-fg-color: #2c365e;--mermaid-node-bg-color: rgba(44, 54, 94, 0.1);--mermaid-node-fg-color: #222}@media print{:root{font-size:11pt}#quarto-sidebar,#TOC,.nav-page{display:none}.page-columns .content{grid-column-start:page-start}.fixed-top{position:relative}.panel-caption,.figure-caption,figcaption{color:#666}}.code-copy-button{position:absolute;top:0;right:0;border:0;margin-top:5px;margin-right:5px;background-color:rgba(0,0,0,0);z-index:3}.code-copy-button:focus{outline:none}.code-copy-button-tooltip{font-size:.75em}pre.sourceCode:hover>.code-copy-button>.bi::before{display:inline-block;height:1rem;width:1rem;content:"";vertical-align:-0.125em;background-image:url('data:image/svg+xml,');background-repeat:no-repeat;background-size:1rem 1rem}pre.sourceCode:hover>.code-copy-button-checked>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button:hover>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button-checked:hover>.bi::before{background-image:url('data:image/svg+xml,')}.panel-tabset [role=tablist]{border-bottom:1px solid #bbb;list-style:none;margin:0;padding:0;width:100%}.panel-tabset [role=tablist] *{-webkit-box-sizing:border-box;box-sizing:border-box}@media(min-width: 30em){.panel-tabset [role=tablist] li{display:inline-block}}.panel-tabset [role=tab]{border:1px solid rgba(0,0,0,0);border-top-color:#bbb;display:block;padding:.5em 1em;text-decoration:none}@media(min-width: 30em){.panel-tabset [role=tab]{border-top-color:rgba(0,0,0,0);display:inline-block;margin-bottom:-1px}}.panel-tabset [role=tab][aria-selected=true]{background-color:#bbb}@media(min-width: 30em){.panel-tabset [role=tab][aria-selected=true]{background-color:rgba(0,0,0,0);border:1px solid #bbb;border-bottom-color:#fefefe}}@media(min-width: 30em){.panel-tabset [role=tab]:hover:not([aria-selected=true]){border:1px solid #bbb}}.code-with-filename .code-with-filename-file{margin-bottom:0;padding-bottom:2px;padding-top:2px;padding-left:.7em;border:var(--quarto-border-width) solid 
var(--quarto-border-color);border-radius:var(--quarto-border-radius);border-bottom:0;border-bottom-left-radius:0%;border-bottom-right-radius:0%}.code-with-filename div.sourceCode,.reveal .code-with-filename div.sourceCode{margin-top:0;border-top-left-radius:0%;border-top-right-radius:0%}.code-with-filename .code-with-filename-file pre{margin-bottom:0}.code-with-filename .code-with-filename-file{background-color:rgba(219,219,219,.8)}.quarto-dark .code-with-filename .code-with-filename-file{background-color:#555}.code-with-filename .code-with-filename-file strong{font-weight:400}a.external:after{content:"";background-image:url('data:image/svg+xml,');background-size:contain;background-repeat:no-repeat;background-position:center center;margin-left:.2em;padding-right:.75em}div.sourceCode code a.external:after{content:none}a.external:after:hover{cursor:pointer}.quarto-ext-icon{display:inline-block;font-size:.75em;padding-left:.3em}.reveal.center .slide aside,.reveal.center .slide div.aside{position:initial}section.has-light-background,section.has-light-background h1,section.has-light-background h2,section.has-light-background h3,section.has-light-background h4,section.has-light-background h5,section.has-light-background h6{color:#222}section.has-light-background a,section.has-light-background a:hover{color:#2a76dd}section.has-light-background code{color:#4758ab}section.has-dark-background,section.has-dark-background h1,section.has-dark-background h2,section.has-dark-background h3,section.has-dark-background h4,section.has-dark-background h5,section.has-dark-background h6{color:#fff}section.has-dark-background a,section.has-dark-background a:hover{color:#42affa}section.has-dark-background code{color:#ffa07a}#title-slide,div.reveal div.slides section.quarto-title-block{text-align:center}#title-slide .subtitle,div.reveal div.slides section.quarto-title-block .subtitle{margin-bottom:2.5rem}.reveal .slides{text-align:left}.reveal .title-slide h1{font-size:1.6em}.reveal[data-navigation-mode=linear] .title-slide h1{font-size:2.5em}.reveal div.sourceCode{border:1px solid #bbb;border-radius:4px}.reveal pre{width:100%;box-shadow:none;background-color:#fefefe;border:none;margin:0;font-size:.55em}.reveal .code-with-filename .code-with-filename-file pre{background-color:unset}.reveal code{color:var(--quarto-hl-fu-color);background-color:rgba(0,0,0,0);white-space:pre-wrap}.reveal pre.sourceCode code{background-color:#fefefe;padding:6px 9px;max-height:500px;white-space:pre}.reveal pre code{background-color:#fefefe;color:#222}.reveal .column-output-location{display:flex;align-items:stretch}.reveal .column-output-location .column:first-of-type div.sourceCode{height:100%;background-color:#fefefe}.reveal blockquote{display:block;position:relative;color:#6f6f6f;width:unset;margin:var(--r-block-margin) auto;padding:.625rem 1.75rem;border-left:.25rem solid #6f6f6f;font-style:normal;background:none;box-shadow:none}.reveal blockquote p:first-child,.reveal blockquote p:last-child{display:block}.reveal .slide aside,.reveal .slide div.aside{position:absolute;bottom:20px;font-size:0.7em;color:#6f6f6f}.reveal .slide sup{font-size:0.7em}.reveal .slide.scrollable aside,.reveal .slide.scrollable div.aside{position:relative;margin-top:1em}.reveal .slide aside .aside-footnotes{margin-bottom:0}.reveal .slide aside .aside-footnotes li:first-of-type{margin-top:0}.reveal .layout-sidebar{display:flex;width:100%;margin-top:.8em}.reveal .layout-sidebar .panel-sidebar{width:270px}.reveal .layout-sidebar-left 
.panel-sidebar{margin-right:calc(0.5em*2)}.reveal .layout-sidebar-right .panel-sidebar{margin-left:calc(0.5em*2)}.reveal .layout-sidebar .panel-fill,.reveal .layout-sidebar .panel-center,.reveal .layout-sidebar .panel-tabset{flex:1}.reveal .panel-input,.reveal .panel-sidebar{font-size:.5em;padding:.5em;border-style:solid;border-color:#bbb;border-width:1px;border-radius:4px;background-color:#f8f9fa}.reveal .panel-sidebar :first-child,.reveal .panel-fill :first-child{margin-top:0}.reveal .panel-sidebar :last-child,.reveal .panel-fill :last-child{margin-bottom:0}.panel-input>div,.panel-input>div>div{vertical-align:middle;padding-right:1em}.reveal p,.reveal .slides section,.reveal .slides section>section{line-height:1.3}.reveal.smaller .slides section,.reveal .slides section.smaller,.reveal .slides section .callout{font-size:0.7em}.reveal.smaller .slides section section{font-size:inherit}.reveal.smaller .slides h1,.reveal .slides section.smaller h1{font-size:calc(2.5em/0.7)}.reveal.smaller .slides h2,.reveal .slides section.smaller h2{font-size:calc(1.6em/0.7)}.reveal.smaller .slides h3,.reveal .slides section.smaller h3{font-size:calc(1.3em/0.7)}.reveal .columns>.column>:not(ul,ol){margin-left:.25em;margin-right:.25em}.reveal .columns>.column:first-child>:not(ul,ol){margin-right:.5em;margin-left:0}.reveal .columns>.column:last-child>:not(ul,ol){margin-right:0;margin-left:.5em}.reveal .slide-number{color:#eea143;background-color:#fefefe}.reveal .footer{color:#6f6f6f}.reveal .footer a{color:#e98a15}.reveal .footer.has-dark-background{color:#fff}.reveal .footer.has-dark-background a{color:#7bc6fa}.reveal .footer.has-light-background{color:#505050}.reveal .footer.has-light-background a{color:#6a9bdd}.reveal .slide-number{color:#6f6f6f}.reveal .slide-number.has-dark-background{color:#fff}.reveal .slide-number.has-light-background{color:#505050}.reveal .slide figure>figcaption,.reveal .slide img.stretch+p.caption,.reveal .slide img.r-stretch+p.caption{font-size:0.7em}@media screen and (min-width: 500px){.reveal .controls[data-controls-layout=edges] .navigate-left{left:.2em}.reveal .controls[data-controls-layout=edges] .navigate-right{right:.2em}.reveal .controls[data-controls-layout=edges] .navigate-up{top:.4em}.reveal .controls[data-controls-layout=edges] .navigate-down{bottom:2.3em}}.tippy-box[data-theme~=light-border]{background-color:#fefefe;color:#222;border-radius:4px;border:solid 1px #6f6f6f;font-size:.6em}.tippy-box[data-theme~=light-border] .tippy-arrow{color:#6f6f6f}.tippy-box[data-placement^=bottom]>.tippy-content{padding:7px 10px;z-index:1}.reveal .callout.callout-style-simple .callout-body,.reveal .callout.callout-style-default .callout-body,.reveal .callout.callout-style-simple div.callout-title,.reveal .callout.callout-style-default div.callout-title{font-size:inherit}.reveal .callout.callout-style-default .callout-icon::before,.reveal .callout.callout-style-simple .callout-icon::before{height:2rem;width:2rem;background-size:2rem 2rem}.reveal .callout.callout-titled .callout-title p{margin-top:.5em}.reveal .callout.callout-titled .callout-icon::before{margin-top:1rem}.reveal .callout.callout-titled .callout-body>.callout-content>:last-child{margin-bottom:1rem}.reveal .panel-tabset [role=tab]{padding:.25em .7em}.reveal .slide-menu-button .fa-bars::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-easel2::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons 
.fa-brush::before{background-image:url('data:image/svg+xml,')}/*! light */.reveal ol[type=a]{list-style-type:lower-alpha}.reveal ol[type=a s]{list-style-type:lower-alpha}.reveal ol[type=A s]{list-style-type:upper-alpha}.reveal ol[type=i]{list-style-type:lower-roman}.reveal ol[type=i s]{list-style-type:lower-roman}.reveal ol[type=I s]{list-style-type:upper-roman}.reveal ol[type="1"]{list-style-type:decimal}.reveal ul.task-list{list-style:none}.reveal ul.task-list li input[type=checkbox]{width:2em;height:2em;margin:0 1em .5em -1.6em;vertical-align:middle}div.cell-output-display div.pagedtable-wrapper table.table{font-size:.6em}.reveal .code-annotation-container-hidden{display:none}.reveal code.sourceCode button.code-annotation-anchor,.reveal code.sourceCode .code-annotation-anchor{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;color:var(--quarto-hl-co-color);border:solid var(--quarto-hl-co-color) 1px;border-radius:50%;font-size:.7em;line-height:1.2em;margin-top:2px;user-select:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;-o-user-select:none}.reveal code.sourceCode button.code-annotation-anchor{cursor:pointer}.reveal code.sourceCode a.code-annotation-anchor{text-align:center;vertical-align:middle;text-decoration:none;cursor:default;height:1.2em;width:1.2em}.reveal code.sourceCode.fragment a.code-annotation-anchor{left:auto}.reveal #code-annotation-line-highlight-gutter{width:100%;border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2}.reveal #code-annotation-line-highlight{margin-left:-8em;width:calc(100% + 4em);border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2;margin-bottom:-2px}.reveal code.sourceCode .code-annotation-anchor.code-annotation-active{background-color:var(--quarto-hl-normal-color, #aaaaaa);border:solid var(--quarto-hl-normal-color, #aaaaaa) 1px;color:#fefefe;font-weight:bolder}.reveal pre.code-annotation-code{padding-top:0;padding-bottom:0}.reveal pre.code-annotation-code code{z-index:3;padding-left:0px}.reveal dl.code-annotation-container-grid{margin-left:.1em}.reveal dl.code-annotation-container-grid dt{margin-top:.65rem;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;border:solid #222 1px;border-radius:50%;height:1.3em;width:1.3em;line-height:1.3em;font-size:.5em;text-align:center;vertical-align:middle;text-decoration:none}.reveal dl.code-annotation-container-grid dd{margin-left:.25em}.reveal .scrollable ol li:first-child:nth-last-child(n+10),.reveal .scrollable ol li:first-child:nth-last-child(n+10)~li{margin-left:1em}html.print-pdf .reveal .slides .pdf-page:last-child{page-break-after:avoid}.reveal .quarto-title-block .quarto-title-authors{display:flex;justify-content:center}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author{padding-left:.5em;padding-right:.5em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:hover,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:visited,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:active{color:inherit;text-decoration:none}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-name{margin-bottom:.1rem}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author 
.quarto-title-author-email{margin-top:0px;margin-bottom:.4em;font-size:.6em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-orcid img{margin-bottom:4px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation{font-size:.7em;margin-top:0px;margin-bottom:8px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation:first{margin-top:12px}ol{padding-left:.5em}ul{list-style:none}ul li::marker{color:#2c365e}dt{color:#e98a15}#title-slide{text-align:left}#title-slide .title{color:#2c365e}#title-slide .author{text-align:left;color:#e98a15;font-weight:bold}#title-slide .institute{padding-bottom:200px}.small{font-size:.75em}.smallest{font-size:.5em}.large{font-size:1.5em}.medium{font-size:.9em}.center-align{text-align:center}.hand{font-family:"Gochi Hand",cursive;font-size:125%}.hand-blue{font-family:"Gochi Hand",cursive;color:var(--primary);font-size:125%}/*# sourceMappingURL=f95d2bded9c28492b788fe14c3e9f347.css.map */ diff --git a/schedule/slides/00-gradient-descent.qmd b/schedule/slides/00-gradient-descent.qmd index 4fb989c..ca65337 100644 --- a/schedule/slides/00-gradient-descent.qmd +++ b/schedule/slides/00-gradient-descent.qmd @@ -7,12 +7,35 @@ metadata-files: {{< include _titleslide.qmd >}} -## Simple optimization techniques +## Motivation: maximum likelihood estimation as optimization +By the principle of maximum likelihood, we have that + +$$ +\begin{align*} +\hat \beta +&= \argmax_{\beta} \prod_{i=1}^n \P(Y_i \mid X_i) +\\ +&= \argmin_{\beta} \sum_{i=1}^n -\log\P(Y_i \mid X_i) +\end{align*} +$$ + +Under the model we use for logistic regression... +$$ +\begin{gathered} +\P(Y=1 \mid X=x) = h(\beta^\top x), \qquad \P(Y=0 \mid X=x) = h(-\beta^\top x), +\\ +h(z) = \tfrac{1}{1+e^{-z}} +\end{gathered} +$$ + +... we can't simply find the argmin with algebra. + +## Gradient descent: the workhorse optimization algorithm We'll see "gradient descent" a few times: -1. solves logistic regression (simple version of IRWLS) +1. solves logistic regression 1. gradient boosting 1. Neural networks @@ -278,18 +301,19 @@ Check any / all of ## Stochastic gradient descent -Suppose $f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$ - -Like if $f(\beta) = \frac{1}{n}\sum_{i=1}^n (y_i - x^\top_i\beta)^2$. - -Then $f'(\beta) = \frac{1}{n}\sum_{i=1}^n f'_i(\beta) = \frac{1}{n} \sum_{i=1}^n -2x_i^\top(y_i - x^\top_i\beta)$ - -If $n$ is really big, it may take a long time to compute $f'$ +If we're optimizing +$\argmin_\beta \sum_{i=1}^n -\log P_\beta(Y_i \mid X_i)$ +then our derivative will also have an additive form +$$ +\sum_{i=1}^n \frac{\partial}{\partial \beta} \left[-\log P_\beta(Y_i \mid X_i) \right] +$$ -So, just sample some partition our data into mini-batches $\mathcal{M}_j$ +If $n$ is really big, it may take a long time to compute this gradient -And approximate (imagine the Law of Large Numbers, use a sample to approximate the population) -$$f'(x) = \frac{1}{n}\sum_{i=1}^n f'_i(x) \approx \frac{1}{m}\sum_{i\in\mathcal{M}_j}f'_{i}(x)$$ +So, just sample a random subset of our data $\mathcal{M} \subset \{ (X_i, Y_i)\}_{i=1}^n$ +and approximate: +$$\sum_{i=1}^n \frac{\partial}{\partial \beta} \left[-\log P_\beta(Y_i \mid X_i) \right] \approx \frac{n}{\vert \mathcal M \vert}\sum_{i\in\mathcal{M}} \frac{\partial}{\partial \beta} \left[-\log P_\beta(Y_i \mid X_i) \right]$$ +[(Justification: Law of Large Numbers. 
Use a sample to approximate the population.)]{.secondary} ## SGD @@ -316,12 +340,19 @@ This is the workhorse for neural network optimization ## Gradient descent for Logistic regression -Suppose $Y=1$ with probability $p(x)$ and $Y=0$ with probability $1-p(x)$, $x \in \R$. +$$ +\begin{gathered} +\P(Y=1 \mid X=x) = h(\beta^\top x), \qquad \P(Y=0 \mid X=x) = h(-\beta^\top x), +\\ +\\ +h(z) = \tfrac{1}{1+e^{-z}} +\end{gathered} +$$ -I want to model $P(Y=1| X=x)$. + -I'll assume that $\log\left(\frac{p(x)}{1-p(x)}\right) = ax$ for some scalar $a$. This means that -$p(x) = \frac{\exp(ax)}{1+\exp(ax)} = \frac{1}{1+\exp(-ax)}$ + + . . . ```{r} #| fig-width: 6 #| fig-height: 4 n <- 100 -a <- 2 +beta <- 2 x <- runif(n, -5, 5) logit <- function(x) 1 / (1 + exp(-x)) -p <- logit(a * x) +p <- logit(beta * x) y <- rbinom(n, 1, p) df <- tibble(x, y) ggplot(df, aes(x, y)) + geom_point(colour = "cornflowerblue") + - stat_function(fun = ~ logit(a * .x)) + stat_function(fun = ~ logit(beta * .x)) ``` -## Reminder: the likelihood +--- $$ -L(y | a, x) = \prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\textrm{ and } -p(x) = \frac{1}{1+\exp(-ax)} +\P(Y=1 \mid X=x) = h(\beta^\top x), \qquad \P(Y=0 \mid X=x) = h(-\beta^\top x) $$ +Under maximum likelihood $$ -\begin{aligned} -\ell(y | a, x) &= \log \prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i} -= \sum_{i=1}^n y_i\log p(x_i) + (1-y_i)\log(1-p(x_i))\\ -&= \sum_{i=1}^n\log(1-p(x_i)) + y_i\log\left(\frac{p(x_i)}{1-p(x_i)}\right)\\ -&=\sum_{i=1}^n ax_i y_i + \log\left(1-p(x_i)\right)\\ -&=\sum_{i=1}^n ax_i y_i + \log\left(\frac{1}{1+\exp(ax_i)}\right) -\end{aligned} +\hat \beta = \argmin_{\beta} \underbrace{ + \textstyle{\sum_{i=1}^n - \log( \P_\beta(Y_i=y_i \mid X_i=x_i) )} +}_{:= -\ell(\beta)} $$ -## Reminder: the likelihood +--- $$ -L(y | a, x) = \prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\textrm{ and } -p(x) = \frac{1}{1+\exp(-ax)} +\begin{align*} +\P_\beta(Y_i=y_i \mid X_i=x_i) &= h\left( [-1]^{1 - y_i} \beta^\top x_i \right) +\\ +\\ +-\ell(\beta) &= \sum_{i=1}^n -\log\left( \P_\beta(Y_i=y_i \mid X_i=x_i) \right) +\\ +&= \sum_{i=1}^n \log\left( 1 + \exp\left( [-1]^{y_i} \beta^\top x_i \right) \right) +\\ +\\ +-\frac{\partial \ell}{\partial \beta} +&= \sum_{i=1}^n [-1]^{y_i} x_i \frac{\exp\left( [-1]^{y_i} \beta^\top x_i \right)}{1 + \exp\left( [-1]^{y_i} \beta^\top x_i \right)} +\\ +&= \sum_{i=1}^n x_i \left( \P_\beta(Y_i=1 \mid X_i=x_i) - y_i \right) +\end{align*} $$ -Now, we want the negative of this. Why? + + + + -We would maximize the likelihood/log-likelihood, so we minimize the negative likelihood/log-likelihood (and scale by $1/n$) + + + + + + + + + -$$-\ell(y | a, x) = \frac{1}{n}\sum_{i=1}^n -ax_i y_i - \log\left(\frac{1}{1+\exp(ax_i)}\right)$$ + -## Reminder: the likelihood + + + + -$$ -\frac{1}{n}L(y | a, x) = \frac{1}{n}\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\textrm{ and } -p(x) = \frac{1}{1+\exp(-ax)} -$$ + -This is, in the notation of our slides $f(a)$. + -We want to minimize it in $a$ by gradient descent. -So we need the derivative with respect to $a$: $f'(a)$. + -Now, conveniently, this simplifies a lot. + + + + + -$$ -\begin{aligned} -\frac{d}{d a} f(a) &= \frac{1}{n}\sum_{i=1}^n -x_i y_i - \left(-\frac{x_i \exp(ax_i)}{1+\exp(ax_i)}\right)\\ -&=\frac{1}{n}\sum_{i=1}^n -x_i y_i + p(x_i)x_i = \frac{1}{n}\sum_{i=1}^n -x_i(y_i-p(x_i)). 
-\end{aligned} -$$ + -## Reminder: the likelihood + -$$ -\frac{1}{n}L(y | a, x) = \frac{1}{n}\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\textrm{ and } -p(x) = \frac{1}{1+\exp(-ax)} -$$ + -(Simple) gradient descent to minimize $-\ell(a)$ or maximize $L(y|a,x)$ is: + -1. Input $a_1,\ \gamma>0,\ j_\max,\ \epsilon>0,\ \frac{d}{da} -\ell(a)$. -2. For $j=1,\ 2,\ \ldots,\ j_\max$, -$$a_j = a_{j-1} - \gamma \frac{d}{da} (-\ell(a_{j-1}))$$ -3. Stop if $\epsilon > |a_j - a_{j-1}|$ or $|d / da\ \ell(a)| < \epsilon$. -## Reminder: the likelihood + + + + + + -$$ -\frac{1}{n}L(y | a, x) = \frac{1}{n}\prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}\textrm{ and } -p(x) = \frac{1}{1+\exp(-ax)} -$$ + + + + + + + +--- -```{r amlecorrect} -#| code-line-numbers: "1,10-11|2-3|4-9|" -amle <- function(x, y, a0, gam = 0.5, jmax = 50, eps = 1e-6) { - a <- double(jmax) # place to hold stuff (always preallocate space) - a[1] <- a0 # starting value +## Finding $\hat\beta = \argmin_{\beta} -\ell(\beta)$ with gradient descent: + +1. Input $\beta_0,\ \gamma>0,\ j_\max,\ \epsilon>0,\ \tfrac{d \ell}{d\beta}$. +2. For $j=1,\ 2,\ \ldots,\ j_\max$, +$$\beta_j = \beta_{j-1} - \gamma \, \tfrac{d}{d\beta}\left(-\ell(\beta_{j-1})\right)$$ +3. Stop if $|\beta_j - \beta_{j-1}| < \epsilon$ or $|d\ell / d\beta| < \epsilon$. + + +```{r betamle} +beta.mle <- function(x, y, beta0, gamma = 0.5, jmax = 50, eps = 1e-6) { + beta <- double(jmax) # place to hold stuff (always preallocate space) + beta[1] <- beta0 # starting value for (j in 2:jmax) { # avoid possibly infinite while loops - px <- logit(a[j - 1] * x) + px <- logit(beta[j - 1] * x) grad <- mean(-x * (y - px)) - a[j] <- a[j - 1] - gam * grad - if (abs(grad) < eps || abs(a[j] - a[j - 1]) < eps) break + beta[j] <- beta[j - 1] - gamma * grad + if (abs(grad) < eps || abs(beta[j] - beta[j - 1]) < eps) break } - a[1:j] + beta[1:j] } ``` @@ -439,23 +496,18 @@ amle <- function(x, y, a0, gam = 0.5, jmax = 50, eps = 1e-6) { ## Try it: ```{r ourmle1} -round(too_big <- amle(x, y, 5, 50), 3) -round(too_small <- amle(x, y, 5, 1), 3) -round(just_right <- amle(x, y, 5, 10), 3) +too_big <- beta.mle(x, y, beta0 = 5, gamma = 50) +too_small <- beta.mle(x, y, beta0 = 5, gamma = 1) +just_right <- beta.mle(x, y, beta0 = 5, gamma = 10) ``` - -## Visual - - - ```{r} #| output-location: column -#| fig-width: 6 -#| fig-height: 8 -negll <- function(a) { - -a * mean(x * y) - - rowMeans(log(1 / (1 + exp(outer(a, x))))) +#| fig-width: 7 +#| fig-height: 7 +negll <- function(beta) { + -beta * mean(x * y) - + rowMeans(log(1 / (1 + exp(outer(beta, x))))) } blah <- list_rbind( map( @@ -470,7 +522,7 @@ ggplot(blah, aes(value, negll)) + facet_wrap(~gamma, ncol = 1) + stat_function(fun = negll, xlim = c(-2.5, 5)) + scale_y_log10() + - xlab("a") + + xlab("beta") + ylab("negative log likelihood") + geom_vline(xintercept = tail(just_right, 1)) + scale_colour_brewer(palette = "Set1") + diff --git a/schedule/slides/16-logistic-regression.qmd b/schedule/slides/16-logistic-regression.qmd index 5477b82..bf7f737 100644 --- a/schedule/slides/16-logistic-regression.qmd +++ b/schedule/slides/16-logistic-regression.qmd @@ -18,33 +18,164 @@ $$g_*(X) = \begin{cases} 0 & \textrm{ otherwise} \end{cases}$$ -where $p_1(X) = Pr(X \given Y=1)$ and $p_0(X) = Pr(X \given Y=0)$ +where -* We then looked at what happens if we **assume** $Pr(X \given Y=y)$ is Normally distributed. 
+* $p_1(X) = \P(X \given Y=1)$ +* $p_0(X) = \P(X \given Y=0)$ +* $\pi = \P(Y=1)$ -We then used this distribution and the class prior $\pi$ to find the __posterior__ $Pr(Y=1 \given X=x)$. +. . . + +### This lecture + +How do we estimate $p_1(X), p_0(X), \pi$? + +## Warmup: estimating $\pi = \P(Y=1)$ + +A good estimator: + +$$ +\hat \pi = \frac{1}{n} \sum_{i=1}^n 1_{y_i = 1} +$$ + +(I.e. the fraction of $1$s in the training set) + +This estimator has low bias and low variance. + +. . . + +\ +*As we will soon see, it turns out we won't have to use this estimator.* + + + + + + +## Estimating $p_0$ and $p_1$ is much harder + +$\P(X=x \mid Y = 1)$ and $\P(X=x \mid Y = 0)$ are $p$-dimensional distributions.\ +[Remember the *curse of dimensionality*?]{.small} + +\ +[Can we simplify our estimation problem?]{.secondary} + + +## Sidestepping the hard estimation problem + +Rather than estimating $\P(X=x \mid Y = 1)$ and $\P(X=x \mid Y = 0)$, +I claim we can instead estimate the simpler ratio + +$$ +\frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} +$$ +Why? + +. . . + +\ +Recall that $g_*(X)$ only depends on the *ratio* $\P(X \mid Y = 1) / \P(X \mid Y = 0)$. + +. . . + +$$ +\begin{align*} + \frac{\P(X=x \mid Y = 1)}{\P(X=x \mid Y = 0)} + &= + \frac{ + \tfrac{\P(Y = 1 \mid X=x) \P(X=x)}{\P(Y = 1)} + }{ + \tfrac{\P(Y = 0 \mid X=x) \P(X=x)}{\P(Y = 0)} + } + = + \frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} + \underbrace{\left(\frac{1-\pi}{\pi}\right)}_{\text{Easy to estimate with } \hat \pi} +\end{align*} +$$ ## Direct model -Instead, let's directly model the posterior +[As with regression, we'll start with a simple model.]{.small}\ +Assume our data can be modelled by a distribution of the form + +$$ +\log\left( \frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} \right) = \beta_0 + \beta^\top x +$$ + +[Why does it make sense to model the *log ratio* rather than the *ratio*?]{.secondary} + +. . . + + +From this equation, we can recover an estimate of the ratio we need for the [Bayes classifier]{.secondary}: + +$$ +\begin{align*} +\log\left( \frac{\P(X=x \mid Y = 1)}{\P(X=x \mid Y = 0)} \right) +&= +\log\left( \frac{\tfrac{\P(X=x)}{\P(Y = 1)}}{\tfrac{\P(X=x)}{\P(Y = 0)}} \right) + +\log\left( \frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} \right) +\\ +&= \underbrace{\left( \log\left( \tfrac{1 - \pi}{\pi} \right) + \beta_0 \right)}_{\beta_0'} + \beta^\top x +\end{align*} +$$ + +## Recovering class probabilities + +$$ +\text{Our model:}\qquad +\log\left( \frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} \right) = \beta_0 + \beta^\top x +$$ + +. . . + +\ +We know that $\P(Y = 1 \mid X=x) + \P(Y = 0 \mid X=x) = 1$. So... + +$$ +\frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} += +\frac{\P(Y = 1 \mid X=x)}{1 - \P(Y = 1 \mid X=x)} += +\exp\left( \beta_0 + \beta^\top x \right), +$$ + +--- + +$$ +\frac{\P(Y = 1 \mid X=x)}{\P(Y = 0 \mid X=x)} += +\frac{\P(Y = 1 \mid X=x)}{1 - \P(Y = 1 \mid X=x)} += +\exp\left( \beta_0 + \beta^\top x \right), +$$ +[After algebra...]{.small} $$ \begin{aligned} -Pr(Y = 1 \given X=x) & = \frac{\exp\{\beta_0 + \beta^{\top}x\}}{1 + \exp\{\beta_0 + \beta^{\top}x\}} \\ -Pr(Y = 0 | X=x) & = \frac{1}{1 + \exp\{\beta_0 + \beta^{\top}x\}}=1-\frac{\exp\{\beta_0 + \beta^{\top}x\}}{1 + \exp\{\beta_0 + \beta^{\top}x\}} +\P(Y = 1 \given X=x) &= \frac{\exp\{\beta_0 + \beta^{\top}x\}}{1 + \exp\{\beta_0 + \beta^{\top}x\}}, +\\ +\P(Y = 0 \given X=x) &= \frac{1}{1 + \exp\{\beta_0 + \beta^{\top}x\}} \end{aligned} $$ This is logistic regression. -## Why this? 
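To make the probability recovery above concrete, a small numeric check can help. This is an editor's sketch, not part of the patch; `beta0`, `beta`, and `x0` are made-up illustrative values.

```r
# Sanity check of the slide algebra: starting from assumed log-odds, the
# recovered class probabilities sum to one and invert back to the same
# log-odds. All numbers here are arbitrary illustrative values.
beta0 <- -0.5
beta <- c(1.2, -0.7)
x0 <- c(0.3, 1.1)
log_odds <- beta0 + sum(beta * x0)
p1 <- exp(log_odds) / (1 + exp(log_odds))  # P(Y = 1 | X = x0)
p0 <- 1 / (1 + exp(log_odds))              # P(Y = 0 | X = x0)
stopifnot(abs(p1 + p0 - 1) < 1e-12)        # probabilities sum to one
stopifnot(abs(log(p1 / p0) - log_odds) < 1e-12)  # the logit inverts h
```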
+## Logistic Regression -$$Pr(Y = 1 \given X=x) = \frac{\exp\{\beta_0 + \beta^{\top}x\}}{1 + \exp\{\beta_0 + \beta^{\top}x\}}$$ +$$\P(Y = 1 \given X=x) = \frac{\exp\{\beta_0 + \beta^{\top}x\}}{1 + \exp\{\beta_0 + \beta^{\top}x\}} += h\left( \beta_0 + \beta^\top x \right),$$ -* There are lots of ways to map $\R \mapsto [0,1]$. +where $h(z) = (1 + \exp(-z))^{-1} = \exp(z) / (1+\exp(z))$ +is the [logistic function]{.secondary}. -* The "logistic" function $z\mapsto (1 + \exp(-z))^{-1} = \exp(z) / (1+\exp(z)) =:h(z)$ is nice. +\ + +::: flex +::: w-60 + +### The "logistic" function is nice. * It's symmetric: $1 - h(z) = h(-z)$ * Has a nice derivative: $h'(z) = \frac{\exp(z)}{(1 + \exp(z))^2} = h(z)(1-h(z))$. * It's the inverse of the "log-odds" (logit): $\log(p / (1-p))$. +::: +::: w-35 +```{r echo=FALSE} +#| fig-width: 6 +#| fig-height: 4 +library(tidyverse) +n <- 100 +x <- seq(-5, 5, length.out = n) +y <- 1 / (1 + exp(-x)) +df <- tibble(x, y) +ggplot(df, aes(x, y)) + + geom_line() + + labs(x = "x", y = "h(x)") +``` +::: +::: -## Another linear classifier +## Logistic regression is a linear classifier -Like LDA, logistic regression is a linear classifier + +Logistic regression is a [linear classifier]{.secondary} -The _logit_ (i.e.: log odds) transformation -gives a linear decision boundary + + $$\log\left( \frac{\P(Y = 1 \given X=x)}{\P(Y = 0 \given X=x) } \right) = \beta_0 + \beta^{\top} x$$ -The decision boundary is the hyperplane -$\{x : \beta_0 + \beta^{\top} x = 0\}$ - -If the log-odds are below 0, classify as 0, above 0 classify as a 1. -## Logistic regression is also easy in R - -```{r eval=FALSE} -logistic <- glm(y ~ ., dat, family = "binomial") -``` +* If the log-odds are $>0$, classify as 1\ + [($Y=1$ is more likely)]{.small} -Or we can use lasso or ridge regression or a GAM as before +* If the log-odds are $<0$, classify as 0\ + [($Y=0$ is more likely)]{.small} -```{r eval=FALSE} -lasso_logit <- cv.glmnet(x, y, family = "binomial") -ridge_logit <- cv.glmnet(x, y, alpha = 0, family = "binomial") -gam_logit <- gam(y ~ s(x), data = dat, family = "binomial") -``` - - -::: aside -glm means generalized linear model -::: +\ +The [decision boundary]{.secondary} is the hyperplane +$\{x : \beta_0 + \beta^{\top} x = 0\}$ -## Baby example (continued from last time) +## Visualizing the classification boundary -```{r simple-lda, echo=FALSE} +```{r plot-d1} +#| code-fold: true +#| fig-width: 8 +#| fig-height: 5 library(mvtnorm) library(MASS) generate_lda_2d <- function( n, p = c(.5, .5), mu = matrix(c(0, 0, 1, 1), 2), Sigma = diag(2)) { X <- rmvnorm(n, sigma = Sigma) tibble( y = which(rmultinom(n, 1, p) == 1, TRUE)[, 1], x1 = X[, 1] + mu[1, y], x2 = X[, 2] + mu[2, y] ) } -``` -```{r} dat1 <- generate_lda_2d(100, Sigma = .5 * diag(2)) logit <- glm(y ~ ., dat1 |> mutate(y = y - 1), family = "binomial") -summary(logit) -``` - -## Visualizing the classification boundary - -```{r plot-d1} -#| code-fold: true -#| fig-width: 8 -#| fig-height: 5 gr <- expand_grid(x1 = seq(-2.5, 3, length.out = 100), x2 = seq(-2.5, 3, length.out = 100)) pts <- predict(logit, gr) g0 <- ggplot(dat1, aes(x1, x2)) + scale_shape_manual(values = c("0", "1"), guide = "none") + geom_raster(data = tibble(gr, disc = pts), aes(x1, x2, fill = disc)) + geom_point(aes(shape = as.factor(y)), size = 4) + coord_cartesian(c(-2.5, 3), c(-2.5, 3)) + scale_fill_steps2(n.breaks = 6, name = "log odds") g0 ``` -## Calculation +## Linear classifier $\ne$ linear smoother -While the `R` formula for logistic regression is straightforward, it's not as easy to compute as OLS or LDA or QDA. 
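One way to make the fitted boundary explicit, sketched here as a hedged aside rather than as part of the patch: assuming the `logit` fit and the `g0` plot from the visualization chunk above are in scope, the hyperplane $\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0$ can be overlaid by solving for $x_2$.

```r
# Overlay the fitted decision boundary on g0: the set of points where the
# estimated log-odds are exactly zero, i.e. x2 = -(b0 + b1 * x1) / b2.
cf <- coef(logit)  # (Intercept), x1, x2
g0 + geom_abline(
  intercept = -cf[1] / cf[3],  # boundary height at x1 = 0
  slope = -cf[2] / cf[3],      # boundary slope in the (x1, x2) plane
  linetype = "dashed"
)
```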
+* While logistic regression produces linear decision boundaries, it is **not** a linear smoother +* AIC/BIC/Cp work if you use the likelihood correctly and count degrees-of-freedom correctly -Logistic regression for two classes simplifies to a likelihood: + * $\mathrm{df} = p + 1$ ([Why?]{.secondary}) -Write $p_i(\beta) = \P(Y_i = 1 | X = x_i,\beta)$ +* Most people use CV -* $P(Y_i = y_i \given X = x_i, \beta) = p_i^{y_i}(1-p_i)^{1-y_i}$ (...Bernoulli distribution) -* $P(\mathbf{Y} \given \mathbf{X}, \beta) = \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$. +## Logistic regression in R +```{r eval=FALSE} +logistic <- glm(y ~ ., dat, family = "binomial") +``` -## Calculation +Or we can use lasso or ridge regression or a GAM as before +```{r eval=FALSE} +lasso_logit <- cv.glmnet(x, y, family = "binomial") +ridge_logit <- cv.glmnet(x, y, alpha = 0, family = "binomial") +gam_logit <- gam(y ~ s(x), data = dat, family = "binomial") +``` -Write $p_i(\beta) = \P(Y_i = 1 | X = x_i,\beta)$ +::: aside +glm means [generalized linear model]{.secondary} +::: -$$ -\begin{aligned} -\ell(\beta) -& = \log \left( \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i} \right)\\ -&=\sum_{i=1}^n \left( y_i\log(p_i(\beta)) + (1-y_i)\log(1-p_i(\beta))\right) \\ -& = -\sum_{i=1}^n \left( y_i\log(e^{\beta^{\top}x_i}/(1+e^{\beta^{\top}x_i})) - (1-y_i)\log(1+e^{\beta^{\top}x_i})\right) \\ -& = -\sum_{i=1}^n \left( y_i\beta^{\top}x_i -\log(1 + e^{\beta^{\top} x_i})\right) -\end{aligned} -$$ -This gets optimized via Newton-Raphson updates and iteratively reweighed -least squares. - - -## IRWLS for logistic regression (skip for now) - -(This is preparation for Neural Networks.) - -```{r} -logit_irwls <- function(y, x, maxit = 100, tol = 1e-6) { - p <- ncol(x) - beta <- double(p) # initialize coefficients - beta0 <- 0 - conv <- FALSE # hasn't converged - iter <- 1 # first iteration - while (!conv && (iter < maxit)) { # check loops - iter <- iter + 1 # update first thing (so as not to forget) - eta <- beta0 + x %*% beta - mu <- 1 / (1 + exp(-eta)) - gp <- 1 / (mu * (1 - mu)) # inverse of derivative of logistic - z <- eta + (y - mu) * gp # effective transformed response - beta_new <- coef(lm(z ~ x, weights = 1 / gp)) # do Weighted Least Squares - conv <- mean(abs(c(beta0, beta) - betaNew)) < tol # check if the betas are "moving" - beta0 <- betaNew[1] # update betas - beta <- betaNew[-1] - } - return(c(beta0, beta)) -} -``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + -## Comparing LDA and Logistic regression + -Both decision boundaries are linear in $x$: + -- LDA $\longrightarrow \alpha_0 + \alpha_1^\top x$ -- Logit $\longrightarrow \beta_0 + \beta_1^\top x$. -But the parameters are estimated differently. + -## Comparing LDA and Logistic regression -Examine the joint distribution of $(X,\ Y)$ [(not the posterior)]{.f3}: + -- LDA: $f(X_i,\ Y_i) = \underbrace{ f(X_i \given Y_i)}_{\textrm{Gaussian}}\underbrace{ f(Y_i)}_{\textrm{Bernoulli}}$ -- Logistic Regression: $f(X_i,Y_i) = \underbrace{ f(Y_i\given X_i)}_{\textrm{Logistic}}\underbrace{ f(X_i)}_{\textrm{Ignored}}$ - -* LDA estimates the joint, but Logistic estimates only the conditional (posterior) distribution. [But this is really all we need.]{.hand} -* So logistic requires fewer assumptions. 
+ + + + + + + + + + + -* But if the two classes are perfectly separable, logistic crashes (and the MLE is undefined, too many solutions) + + -* LDA "works" even if the conditional isn't normal, but works very poorly if any X is qualitative + -## Comparing with QDA (2 classes) + + + + + + + + + + + + + + + + + + + + + + -* Recall: this gives a "quadratic" decision boundary (it's a curve). -* If we have $p$ columns in $X$ - - Logistic estimates $p+1$ parameters - - LDA estimates $2p + p(p+1)/2 + 1$ - - QDA estimates $2p + p(p+1) + 1$ - -* If $p=50$, - - Logistic: 51 - - LDA: 1376 - - QDA: 2651 - -* QDA doesn't get used much: there are better nonlinear versions with way fewer parameters + -## Bad parameter counting + -I've motivated LDA as needing $\Sigma$, $\pi$ and $\mu_0$, $\mu_1$ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +## Estimating $\beta_0$, $\beta$ + +For regression... + +$$ +\text{Model:} \qquad +\P(Y=y \mid X=x) = \mathcal N( y; \:\: \beta^\top x, \:\:\sigma^2) +$$ + +... recall that we motivated OLS with +the [principle of maximum likelihood]{.secondary} + +$$ +\begin{align*} +\hat \beta_\mathrm{OLS} +&= \argmax_{\beta} \prod_{i=1}^n \P(Y_i = y_i \mid X_i = x_i) +\\ +&= \argmin_{\beta} \sum_{i=1}^n -\log\P(Y_i = y_i \mid X_i = x_i) +\\ +\\ +&= \ldots (\text{because regression is nice}) +\\ +&= \textstyle \left( \sum_{i=1}^n x_i x_i^\top \right)^{-1}\left( \sum_{i=1}^n y_i x_i \right) +\end{align*} +$$ + +## Estimating $\beta_0$, $\beta$ + +For classification with logistic regression... + +$$ +\begin{align*} +\text{Model:} &\qquad +\tfrac{\P(Y=1 \mid X=x)}{\P(Y=0 \mid X=x)} = \exp\left( \beta_0 +\beta^\top x \right) +\\ +\text{Or alternatively:} &\qquad \P(Y=1 \mid X=x) = h\left(\beta_0 + \beta^\top x \right) +\\ +&\qquad \P(Y=0 \mid X=x) = h\left(-(\beta_0 + \beta^\top x)\right) +\end{align*} +$$ +... we can also apply +the [principle of maximum likelihood]{.secondary} + +$$ +\begin{align*} +\hat \beta_{0}, \hat \beta +&= \argmax_{\beta_0, \beta} \prod_{i=1}^n \P(Y_i \mid X_i) +\\ +&= \argmin_{\beta_0,\beta} \sum_{i=1}^n -\log\P(Y_i \mid X_i) +\end{align*} +$$ -In fact, we don't _need_ all of this to get the decision boundary. +. . . -So the "degrees of freedom" is much lower if we only want the _classes_ and not -the _probabilities_. +Unfortunately that's as far as we can get with algebra alone. -The decision boundary only really depends on + -* $\Sigma^{-1}(\mu_1-\mu_0)$ -* $(\mu_1+\mu_0)$, -* so appropriate algorithms estimate $<2p$ parameters. + -## Note again: -while logistic regression and LDA produce linear decision boundaries, they are **not** linear smoothers + -AIC/BIC/Cp work if you use the likelihood correctly and count degrees-of-freedom correctly + -Must people use either test set or CV + + # Next time: -Nonlinear classifiers +The workhorse algorithm for obtaining $\hat \beta_0$, $\hat \beta$
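As a closing illustration of the teased workhorse algorithm (an editor's sketch, not from the patch: the function name `logistic_gd` and the simulated data are made up), plain gradient descent on the negative log-likelihood, now including an intercept, might look like this in R:

```r
# Gradient descent for logistic regression with an intercept.
# Gradient from the derivation above: d(-loglik)/d(beta), averaged over i,
# is mean of x_i * (P_beta(Y_i = 1 | X_i) - y_i).
logistic_gd <- function(X, y, gamma = 0.1, jmax = 500, eps = 1e-6) {
  X1 <- cbind(1, X)         # first column of ones carries beta_0
  beta <- double(ncol(X1))  # start at zero
  for (j in seq_len(jmax)) {
    p <- 1 / (1 + exp(-drop(X1 %*% beta)))  # current P(Y = 1 | X)
    grad <- colMeans(X1 * (p - y))          # gradient of -loglik / n
    beta <- beta - gamma * grad
    if (sqrt(sum(grad^2)) < eps) break      # stop when the gradient is tiny
  }
  beta
}

# Illustrative use on simulated data:
set.seed(406)
X <- matrix(rnorm(200), 100, 2)
y <- rbinom(100, 1, 1 / (1 + exp(-(0.5 + X %*% c(1, -1)))))
round(logistic_gd(X, y), 2)  # estimates of (beta_0, beta_1, beta_2)
```

With a step size this small the iterates move slowly but reliably, the same tradeoff the `too_big` / `too_small` / `just_right` experiment above demonstrates.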