From beaa4a89b4ab765b43b6a81762fa7d2fac994f5f Mon Sep 17 00:00:00 2001
From: "James R. Maddison"
Date: Tue, 4 Jun 2024 16:50:45 +0100
Subject: [PATCH 1/2] Hessian-based UQ example

---
 docs/source/examples/8_hessian_uq.ipynb | 609 ++++++++++++++++++++++++
 docs/source/index.rst                   |   5 +
 2 files changed, 614 insertions(+)
 create mode 100644 docs/source/examples/8_hessian_uq.ipynb

diff --git a/docs/source/examples/8_hessian_uq.ipynb b/docs/source/examples/8_hessian_uq.ipynb
new file mode 100644
index 00000000..f2e01b8d
--- /dev/null
+++ b/docs/source/examples/8_hessian_uq.ipynb
@@ -0,0 +1,609 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "82be80f5-acad-4947-8f9a-c6a7a87a7ace",
+   "metadata": {},
+   "source": [
+    "# Hessian-based uncertainty quantification\n",
+    "\n",
+    "**Author:** James R. Maddison\n",
+    "\n",
+    "This notebook describes the use of tlm_adjoint for Hessian-based uncertainty quantification. The approach is based on the method described in\n",
+    "\n",
+    "- Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas, 'Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet', Journal of Computational Physics, 296, pp. 348–368, 2015, doi: 10.1016/j.jcp.2015.04.047\n",
+    "\n",
+    "We assume real spaces and a real build of Firedrake throughout.\n",
+    "\n",
+    "## Uncertainty quantification problem\n",
+    "\n",
+    "We consider the advection-diffusion equation in the unit square domain\n",
+    "\n",
+    "$$\\partial_t u + \\nabla^\\perp m \\cdot \\nabla u = \\kappa \\nabla^2 u,$$\n",
+    "\n",
+    "for some real and positive $\\kappa$ and subject to doubly periodic boundary conditions. We consider a continuous Galerkin finite element discretization in space and an implicit midpoint rule discretization in time, seeking $u_{n + 1} \\in V$ such that\n",
+    "\n",
+    "$$\\forall \\zeta \\in V \\qquad \\int_\\Omega \\zeta u_{n + 1} + \\Delta t \\int_\\Omega \\zeta \\nabla^\\perp m \\cdot \\nabla \\left[ \\frac{1}{2} ( u_n + u_{n + 1}) \\right]\n",
+    "    + \\kappa \\Delta t \\int_\\Omega \\nabla \\zeta \\cdot \\nabla \\left[ \\frac{1}{2} ( u_n + u_{n + 1}) \\right] = \\int_\\Omega \\zeta u_n,$$\n",
+    "\n",
+    "with timestep size $\\Delta t$ and given some suitable discrete initial condition $u_0 \\in V$. Here $V$ is a real continuous $P_1$ finite element space whose elements satisfy the doubly periodic boundary conditions, and after discretizing we have $m \\in V$. For a given value of $m$ we let $\\hat{u}_N ( m )$ denote the solution obtained after $N$ timesteps. Given an observation of $\\hat{u}_N ( m )$, $u_{obs} \\in V$, we seek to infer information about the control $m$. Here we are specifically seeking to infer the transport, defined in terms of a stream function $m$, which transports the solution from $u_0$ to $u_{obs}$ in time $T$.\n",
+    "\n",
+    "We first need to model the observational error. For a given value of the control $m$ we treat observations as being realizations of a Gaussian random variable, with mean given by $\\hat{u}_N ( m )$, and with known covariance. Specifically we define a density $p ( u_{obs} | m )$ whose negative logarithm is (up to a normalization term which is neglected here)\n",
+    "\n",
+    "$$-\\ln p ( u_{obs} | m ) = \\frac{1}{2} R_{obs}^{-1} ( u_{obs} - \\hat{u}_N ( m ), u_{obs} - \\hat{u}_N ( m ) ).$$\n",
+    "\n",
+    "$R_{obs}^{-1}$ is a bilinear and symmetric positive definite observational inverse covariance operator. Here we choose $R_{obs}^{-1}$ by defining\n",
+    "\n",
+    "$$R_{obs}^{-1} ( q_i, q_j ) = \\int_\\Omega q_j \\mathcal{L}_{\\sigma_R,d_R}^2 ( q_i ),$$\n",
+    "\n",
+    "where $\\mathcal{L}_{\\sigma,d} : V \\rightarrow V$ is defined by\n",
+    "\n",
+    "$$\\forall \\zeta \\in V \\qquad \\frac{1}{\\sqrt{4 \\pi \\sigma^2}} \\left[ \\frac{1}{d} \\int_\\Omega \\zeta q + d \\int_\\Omega \\nabla \\zeta \\cdot \\nabla q \\right] = \\int_\\Omega \\zeta \\mathcal{L}_{\\sigma,d} \\left( q \\right).$$\n",
+    "\n",
+    "In the continuous and $\\Omega = \\mathbb{R}^2$ case $R_{obs}$ then defines a covariance operator with single point variance $\\sigma_R^2$ and autocorrelation length scale $d_R$ (see Lindgren et al 2011, doi: 10.1111/j.1467-9868.2011.00777.x).\n",
+    "\n",
+    "We next introduce a prior for the control. We consider a Gaussian prior with mean zero and with known covariance. Specifically we define a prior density $p ( m )$ whose negative logarithm is (again up to a normalization term)\n",
+    "\n",
+    "$$-\\ln p ( m ) = \\frac{1}{2} B^{-1} ( m, m ).$$\n",
+    "\n",
+    "$B^{-1}$ is a bilinear and symmetric positive definite prior inverse covariance operator. Here we choose $B^{-1}$ by defining\n",
+    "\n",
+    "$$B^{-1} ( q_i, q_j ) = \\int_\\Omega q_j \\mathcal{L}_{\\sigma_B,d_B}^2 ( q_i ).$$\n",
+    "\n",
+    "Now applying Bayes' theorem we obtain a posterior density whose negative logarithm is (up to a normalization term)\n",
+    "\n",
+    "$$-\\ln p ( m | u_{obs} ) = \\frac{1}{2} R_{obs}^{-1} ( u_{obs} - \\hat{u}_N ( m ), u_{obs} - \\hat{u}_N ( m ) ) + \\frac{1}{2} B^{-1} ( m, m ).$$\n",
+    "\n",
+    "We have made a number of modelling choices, but subject to these choices the posterior density now completely describes the information we have about the control after being supplied with an observation $u_{obs}$. The challenge is that the posterior is defined in a high dimensional space, and is in general not Gaussian. To simplify the problem we *approximate* the posterior with a Gaussian, and specifically make the approximation (up to a normalization term)\n",
+    "\n",
+    "$$-\\ln p ( m | u_{obs} ) \\approx \\frac{1}{2} \\Gamma_{post}^{-1} ( m - m_{map}, m - m_{map} ).$$\n",
+    "\n",
+    "The mean of the Gaussian approximation is set equal to the posterior density maximizer (the Maximum A Posteriori estimate), $m_{map}$. The inverse covariance of the Gaussian approximation, $\\Gamma_{post}^{-1}$, is set equal to the Hessian, $H$, of the negative log posterior density $-\\ln p ( m | u_{obs} )$, defined by differentiating twice with respect to $m$ and evaluated at the posterior density maximizer.\n",
+    "\n",
+    "Unfortunately, in order to quantify uncertainty, we require information about the *covariance*, and not the *inverse covariance*. Specifically if we have a linear observational operator $q \\in V^*$ then our estimate for the variance associated with $q ( m )$ is\n",
+    "\n",
+    "$$\\sigma^2_q \\approx \\Gamma_{post} ( q, q ) = H^{-1} ( q, q ),$$\n",
+    "\n",
+    "which requires access to information about the Hessian *inverse*.\n",
+    "\n",
+    "In the following we use two approaches to gain access to this information.\n",
+    "\n",
+    "1. A low rank update approximation. We approximate the Hessian inverse using a low rank update to the prior covariance, using the methodology of Isaac et al 2015, doi: 10.1016/j.jcp.2015.04.047.\n",
+    "2. By computing the Hessian inverse action using a Krylov method, preconditioned using the low rank update approximation.\n",
+    "\n",
+    "To solve this problem we need several pieces:\n",
+    "\n",
+    "1. A differentiable solver for the forward problem. We will construct this using Firedrake with tlm_adjoint.\n",
+    "2. To find the posterior maximizer. We will use gradient-based optimization with TAO.\n",
+    "3. To find a low rank update approximation for the Hessian inverse. We will find this using a partial eigenspectrum obtained with SLEPc.\n",
+    "4. To compute the Hessian inverse action. We will compute this using a PETSc Krylov method, preconditioned using the partial eigenspectrum.\n",
+    "\n",
+    "## Forward problem\n",
+    "\n",
+    "We first implement the forward model using Firedrake with tlm_adjoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d9f4bf6-9c22-45bc-82c6-7b98d356f3e9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "\n",
+    "from firedrake import *\n",
+    "from tlm_adjoint.firedrake import *\n",
+    "from firedrake.pyplot import tricontourf\n",
+    "\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "\n",
+    "reset_manager()\n",
+    "clear_caches()\n",
+    "\n",
+    "mesh = PeriodicUnitSquareMesh(40, 40, diagonal=\"crossed\")\n",
+    "X = SpatialCoordinate(mesh)\n",
+    "space = FunctionSpace(mesh, \"Lagrange\", 1)\n",
+    "test = TestFunction(space)\n",
+    "trial = TrialFunction(space)\n",
+    "\n",
+    "T = 0.5\n",
+    "N = 10\n",
+    "kappa = Constant(1.0e-3, static=True)\n",
+    "dt = Constant(T / N, static=True)\n",
+    "sigma_R = Constant(0.03)\n",
+    "d_R = Constant(0.1)\n",
+    "sigma_B = Constant(0.025)\n",
+    "d_B = Constant(0.2)\n",
+    "u_0 = Function(space, name=\"u_0\").interpolate(\n",
+    "    exp(-((X[0] - 0.4) ** 2 + (X[1] - 0.5) ** 2) / (2 * 0.08 * 0.08)))\n",
+    "u_obs = Function(space, name=\"u_obs\").interpolate(\n",
+    "    exp(-((X[0] - 0.6) ** 2 + (X[1] - 0.5) ** 2) / (2 * 0.08 * 0.08)))\n",
+    "\n",
+    "\n",
+    "def L(sigma, d, q):\n",
+    "    u = Function(space)\n",
+    "    solve(inner(trial, test) * dx\n",
+    "          == (1.0 / sqrt(4.0 * np.pi * sigma * sigma))\n",
+    "          * ((1.0 / d) * inner(q, test) * dx + d * inner(grad(q), grad(test)) * dx),\n",
+    "          u, solver_parameters={\"ksp_type\": \"preonly\",\n",
+    "                                \"pc_type\": \"cholesky\"})\n",
+    "    return u\n",
+    "\n",
+    "\n",
+    "def L_inv(sigma, d, q):\n",
+    "    u = Function(space)\n",
+    "    solve((1.0 / sqrt(4.0 * np.pi * sigma * sigma))\n",
+    "          * ((1.0 / d) * inner(trial, test) * dx + d * inner(grad(trial), grad(test)) * dx)\n",
+    "          == inner(q, test) * dx,\n",
+    "          u, solver_parameters={\"ksp_type\": \"preonly\",\n",
+    "                                \"pc_type\": \"cholesky\"})\n",
+    "    return u\n",
+    "\n",
+    "\n",
+    "def R_inv_term(q):\n",
+    "    L_q = L(sigma_R, d_R, q)\n",
+    "    LL_q = L(sigma_R, d_R, L_q)\n",
+    "    return Functional(name=\"J_mismatch\").assign(0.5 * inner(LL_q, q) * dx)\n",
+    "\n",
+    "\n",
+    "def R_inv_action(q):\n",
+    "    L_q = L(sigma_R, d_R, q)\n",
+    "    LL_q = L(sigma_R, d_R, L_q)\n",
+    "    return assemble(inner(LL_q, test) * dx)\n",
+    "\n",
+    "\n",
+    "def B_inv_term(q):\n",
+    "    L_q = L(sigma_B, d_B, q)\n",
+    "    LL_q = L(sigma_B, d_B, L_q)\n",
+    "    return Functional(name=\"J_prior\").assign(0.5 * inner(LL_q, q) * dx)\n",
+    "\n",
+    "\n",
+    "def B_inv_action(q):\n",
+    "    L_q = L(sigma_B, d_B, q)\n",
+    "    LL_q = L(sigma_B, d_B, L_q)\n",
+    "    return assemble(inner(LL_q, test) * dx)\n",
+    "\n",
+    "\n",
+    "def B_action(q):\n",
+    "    u = Function(space)\n",
+    "    solve(inner(trial, test) * dx == q,\n",
solver_parameters={\"ksp_type\": \"preonly\",\n", + " \"pc_type\": \"cholesky\"})\n", + " Li_q = L_inv(sigma_B, d_B, u)\n", + " LiLi_q = L_inv(sigma_B, d_B, Li_q)\n", + " return LiLi_q\n", + "\n", + "\n", + "def forward(m):\n", + " m = Function(space, cache=True).assign(m)\n", + " u_np1 = Function(space, name=\"u_np1\")\n", + " u_n = u_0.copy(deepcopy=True)\n", + " eq = EquationSolver(inner(trial, test) * dx\n", + " + 0.5 * dt * inner(dot(perp(grad(m)), grad(trial)), test) * dx\n", + " + 0.5 * kappa * dt * inner(grad(trial), grad(test)) * dx\n", + " == inner(u_n, test) * dx\n", + " - 0.5 * dt * inner(dot(perp(grad(m)), grad(u_n)), test) * dx\n", + " - 0.5 * kappa * dt * inner(grad(u_n), grad(test)) * dx,\n", + " u_np1, solver_parameters={\"ksp_type\": \"preonly\",\n", + " \"pc_type\": \"lu\"})\n", + " for _ in range(N):\n", + " eq.solve()\n", + " u_n.assign(u_np1)\n", + "\n", + " u_mismatch = Function(space, name=\"u_mismatch\").assign(u_n - u_obs)\n", + " J_mismatch = R_inv_term(u_mismatch)\n", + " J_prior = B_inv_term(m)\n", + " return J_mismatch, J_prior, J_mismatch + J_prior, u_np1" + ] + }, + { + "cell_type": "markdown", + "id": "d098a1d3-605d-4fc0-8348-c077ff305e3b", + "metadata": {}, + "source": [ + "Let's first visualize the initial condition $u_0$ and the observation $u_{obs}$ taken a time $T$ later." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70e37007-ea8b-4247-b1d5-8a9250dba4a2", + "metadata": {}, + "outputs": [], + "source": [ + "def plot_output(u, title):\n", + " r = (u.dat.data_ro.min(), u.dat.data_ro.max())\n", + " eps = (r[1] - r[0]) * 1.0e-12\n", + " p = tricontourf(u, np.linspace(r[0] - eps, r[1] + eps, 32))\n", + " plt.gca().set_title(title)\n", + " plt.colorbar(p)\n", + " plt.gca().set_aspect(1.0)\n", + "\n", + "\n", + "plot_output(u_0, r\"$u_0$\")\n", + "plot_output(u_obs, r\"$u_{obs}$\")" + ] + }, + { + "cell_type": "markdown", + "id": "b0784d97-de82-4d40-91a0-f031177b69ab", + "metadata": {}, + "source": [ + "## Quantities of interest\n", + "\n", + "In the following discussion we seek to quantify uncertainties in *quantities of interest*. These are defined using linear functionals, say $q \\in V^*$. Given some value of the control $m$, $q ( m )$ is the value of the quantity of interest. Moreover, given the Maximum A Posteriori estimate $m_{map}$ for the control (the posterior density maximizer), the estimate for the posterior mean of the quantity of interest is given by\n", + "\n", + "$$\\mu_{q,posterior} \\approx q ( m_{map} ).$$\n", + "\n", + "To obtain uncertainty estimates we need a covariance operator. Note that a covariance operator, say $\\Gamma_{post}$, is a bilinear operator, $\\Gamma_{post} : V^* \\times V^* \\rightarrow \\mathbb{R}$. It takes the linear functionals we use to obtain values for two quantities of interest, and gives us a value for a covariance between them. In particular the estimate for the posterior uncertainty in a single quantity of interest, defined by a linear functional $q \\in V^*$, is\n", + "\n", + "$$\\sigma_{q,posterior} \\approx \\sqrt{ \\Gamma_{post} ( q, q ) } = \\sqrt{ H^{-1} ( q, q ) }.$$\n", + "\n", + "## Finding the posterior maximizer\n", + "\n", + "We now find the posterior density maximizer, which corresponds to seeking a minimum of the negative log posterior density. We solve this problem using the Toolkit for Advanced Optimization (TAO), with the Limited-Memory Variable Metric (LMVM) approach. This is a gradient-based approach, so let's first verify the adjoint using Taylor verification." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "13af8c73-fde1-43f9-a118-961b2c4eecac",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "m_0 = Function(space, name=\"m_0\").interpolate(\n",
+    "    0.06 * sin(2 * pi * X[1]))\n",
+    "\n",
+    "reset_manager()\n",
+    "start_manager()\n",
+    "_, _, J, _ = forward(m_0)\n",
+    "stop_manager()\n",
+    "\n",
+    "dJ = compute_gradient(J, m_0)\n",
+    "dm = Function(space, name=\"dm\").interpolate(\n",
+    "    sin(4 * pi * X[0]) * cos(6 * pi * X[1]))\n",
+    "min_order = taylor_test(lambda m: forward(m)[2], m_0, J_val=J.value, dJ=dJ, dM=dm, seed=1.0e-6)\n",
+    "assert min_order > 1.99"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b4c644d0-29ac-4b50-84c7-93c8a3089de3",
+   "metadata": {},
+   "source": [
+    "Next we solve the optimization problem. The considered problem seeks to infer the transport in the advection-diffusion equation, in terms of a stream function. Initially the tracer is concentrated on the left, and the observation taken a time $T$ later has the tracer moved to the right. The variable `m_0`, which will define our initial guess for the optimization, is set so that the velocity at the center has approximately the correct magnitude for this transport.\n",
+    "\n",
+    "As usual we need to define an appropriate inner product associated with derivatives. Here the prior defines a natural inner product – specifically we can use the prior covariance, $B$, to define an inner product for the dual space $V^*$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5a8a5ee0-2bec-4141-b7d3-d1b9834ff9a3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "optimizer = TAOSolver(lambda m: forward(m)[2], space, H_0_action=B_action,\n",
+    "                      solver_parameters={\"tao_type\": \"lmvm\",\n",
+    "                                         \"tao_gatol\": 1.0e-4,\n",
+    "                                         \"tao_grtol\": 0.0,\n",
+    "                                         \"tao_gttol\": 0.0,\n",
+    "                                         \"tao_monitor\": None})\n",
+    "\n",
+    "m_map = Function(space, name=\"m_map\").assign(m_0)\n",
+    "optimizer.solve(m_map)\n",
+    "\n",
+    "reset_manager()\n",
+    "start_manager()\n",
+    "J_mismatch, _, J, u_map = forward(m_map)\n",
+    "stop_manager()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "572fdb0b-61a7-4bd9-8224-a61de3befdef",
+   "metadata": {},
+   "source": [
+    "Let's plot the result."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "882960ad-2584-41d8-afa5-b20f30c5a7bb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "reset_manager()\n",
+    "start_manager()\n",
+    "J_mismatch, _, J, u_map = forward(m_map)\n",
+    "stop_manager()\n",
+    "\n",
+    "plot_output(m_map, r\"$m_{map}$\")\n",
+    "plot_output(u_map, r\"$\\hat{u} ( m_{map} )$\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bad61e68-e156-49f7-975d-40362fec59fd",
+   "metadata": {},
+   "source": [
+    "## Computing uncertainty estimates: Low rank update approximation\n",
+    "\n",
+    "We next seek to use Hessian information to quantify the uncertainty in the result of the inference. We use a low rank update approximation, following the methodology of Isaac et al 2015, doi: 10.1016/j.jcp.2015.04.047.\n",
+    "\n",
+    "In this approach we consider the mismatch Hessian, $H_{mismatch}$, obtained by differentiating\n",
+    "\n",
+    "$$J_{mismatch} = \\frac{1}{2} R_{obs}^{-1} ( u_{obs} - \\hat{u}_N ( m ), u_{obs} - \\hat{u}_N ( m ) ),$$\n",
+    "\n",
+    "twice with respect to the control $m$. Since the prior term contributes exactly $B^{-1}$ to the Hessian of the negative log posterior density, this means\n",
+    "\n",
+    "$$H_{mismatch} = H - B^{-1}.$$\n",
+    "\n",
+    "We seek eigenpairs $\\lambda_k \\in \\mathbb{R}$, $v_k \\in V$ such that\n",
+    "\n",
+    "$$H_{mismatch} \\left( v_k, \\cdot \\right) = \\lambda_k B^{-1} \\left( v_k, \\cdot \\right),$$\n",
+    "\n",
+    "where the eigenvectors are $B^{-1}$-orthonormal\n",
+    "\n",
+    "$$B^{-1} ( v_k, v_l ) = \\delta_{k,l}.$$\n",
+    "\n",
+    "This leads directly to a low rank update approximation for the Hessian inverse, expressed as a low rank update to the prior covariance (equation (20) of Isaac et al 2015, doi: 10.1016/j.jcp.2015.04.047),\n",
+    "\n",
+    "$$H^{-1} ( q_i, q_j ) \\approx B ( q_i, q_j ) - \\sum_k \\frac{\\lambda_k}{1 + \\lambda_k} q_i ( v_k ) q_j ( v_k ),$$\n",
+    "\n",
+    "which we use to approximate the posterior covariance.\n",
+    "\n",
+    "We first perform a higher order Taylor verification using the mismatch Hessian."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1df5fc5-8598-4231-b9f9-d5ca457c560a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "H_mismatch = CachedHessian(J_mismatch)\n",
+    "\n",
+    "dm = Function(space, name=\"dm\").interpolate(\n",
+    "    exp(-((X[0] - 0.3) ** 2 + (X[1] - 0.3) ** 2) / (2 * 0.08 * 0.08)))\n",
+    "min_order = taylor_test(lambda m: forward(m)[0], m_map, J_val=J_mismatch.value, ddJ=H_mismatch, dM=dm, seed=1.0e-4)\n",
+    "assert min_order > 2.99"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64e7d158-9772-4ab1-9ca3-d4442a932cb6",
+   "metadata": {},
+   "source": [
+    "We now use SLEPc, seeking the 20 eigenpairs whose eigenvalues have largest magnitude.\n",
+    "\n",
+    "Note: There is a notational clash here! Conventionally the prior inverse covariance is denoted $B^{-1}$, but here we use this on the right-hand side of a generalized eigenproblem, where the associated matrix is *also* conventionally denoted $B$. Here the `B_action` argument to the `Eigensolver` constructor is set equal to `B_inv_action` in the call, and the `B_inv_action` argument is set equal to `B_action`!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1896bfac-83e4-48eb-a8c0-6163e7a36531",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "esolver = HessianEigensolver(\n",
+    "    H_mismatch, m_map, B_action=B_inv_action, B_inv_action=B_action,\n",
+    "    solver_parameters={\"eps_type\": \"krylovschur\",\n",
+    "                       \"eps_gen_hermitian\": None,\n",
+    "                       \"eps_largest_magnitude\": None,\n",
+    "                       \"eps_nev\": 20,\n",
+    "                       \"eps_conv_rel\": None,\n",
+    "                       \"eps_tol\": 1.0e-14,\n",
+    "                       \"eps_purify\": False,\n",
+    "                       \"eps_monitor\": None})\n",
+    "esolver.solve()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a71568e9-6a12-4b34-9266-76ed64c48975",
+   "metadata": {},
+   "source": [
+    "Let's plot the eigenvalues."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "12f9bd12-40e1-425b-8f61-efb2de94e000",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lam = esolver.eigenvalues()\n",
+    "assert (lam > 0.0).all()\n",
+    "plt.semilogy(range(1, len(esolver) + 1), lam, \"k-\")\n",
+    "plt.xlim(0, len(esolver))\n",
+    "plt.xlabel(\"Eigenvalue index\")\n",
+    "plt.ylabel(\"Eigenvalue\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d352f10-3ec8-41de-af3c-4d1f2164cbc6",
+   "metadata": {},
+   "source": [
+    "Each eigenvalue indicates the amount of information, added by the observation, over the prior. Specifically each eigenvector $v_k \\in V$ has an associated dual space element defined by the prior inverse covariance, $q_{v_k} = B^{-1} ( v_k, \\cdot ) \\in V^*$. If we have an observation for a quantity of interest defined by a linear functional equal to a (non-zero) multiple of this associated dual space element, and if the posterior were Gaussian, then the associated variance would decrease by a factor of one plus the eigenvalue when the observation is added. That is, under the Gaussian approximation, we have\n",
+    "\n",
+    "$$\\frac{\\sigma^2_{q_{v_k},posterior}}{\\sigma^2_{q_{v_k},prior}} = \\frac{1}{1 + \\lambda_k},$$\n",
+    "\n",
+    "where $\\sigma^2_{q_{v_k},posterior}$ and $\\sigma^2_{q_{v_k},prior}$ are respectively the posterior and prior variance associated with $q_{v_k}$, and where $\\lambda_k$ is the associated eigenvalue.\n",
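+    "\n",
+    "The low rank update formula above is easy to check on a small dense problem. The following is an illustrative sketch using NumPy and SciPy, standalone and separate from the Firedrake calculation in this notebook: when the mismatch Hessian has exact rank $k$, and all $k$ non-zero eigenpairs are retained, the update recovers the Hessian inverse exactly.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import scipy.linalg as sla\n",
+    "\n",
+    "rng = np.random.default_rng(54321)\n",
+    "n, k = 10, 3\n",
+    "A = rng.standard_normal((n, n))\n",
+    "B = A @ A.T + n * np.eye(n)  # prior covariance (symmetric positive definite)\n",
+    "G = rng.standard_normal((n, k))\n",
+    "H_mismatch = G @ G.T         # mismatch Hessian, of rank k\n",
+    "\n",
+    "# Solve H_mismatch v = lambda B^{-1} v, with the eigenvectors normalized so\n",
+    "# that V^T B^{-1} V = I, and keep the k largest eigenvalues\n",
+    "lam, V = sla.eigh(H_mismatch, sla.inv(B))\n",
+    "lam, V = lam[::-1][:k], V[:, ::-1][:, :k]\n",
+    "\n",
+    "# Low rank update approximation for the Hessian inverse\n",
+    "H_inv_approx = B - V @ np.diag(lam / (1.0 + lam)) @ V.T\n",
+    "\n",
+    "# Exact here, as the neglected eigenvalues are all zero\n",
+    "H = sla.inv(B) + H_mismatch\n",
+    "assert np.allclose(H_inv_approx, sla.inv(H))\n",
+    "```\n",
+    "\n",
+    "When only some of the dominant eigenpairs are retained the update is no longer exact, and the size of the neglected eigenvalues controls the error. That is precisely the situation we are in here.\n",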
+    "\n",
+    "We have retained only $20$ eigenpairs, and the smallest eigenvalue obtained is still quite a bit bigger than one. This means we might expect to miss quite a bit of the information available in the Hessian if we use a low rank update approximation based on only these $20$ eigenpairs. We'll return to this issue later.\n",
+    "\n",
+    "The linear functional associated with computing the average $x$-component of the velocity for $0.4 < y < 0.6$, $q_u \\in V^*$, satisfies\n",
+    "\n",
+    "$$q_u ( \\zeta ) = -\\frac{1}{\\int_\\Omega \\mathcal{I}} \\int_\\Omega \\mathcal{I} \\partial_y \\zeta,$$\n",
+    "\n",
+    "where $\\mathcal{I}$ is one where $0.4 < y < 0.6$ and zero elsewhere. We can use this to evaluate an estimate for the posterior mean of this quantity,\n",
+    "\n",
+    "$$\\mu_{q_u,posterior} \\approx q_u ( m_{map} ).$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0b19afce-4d8a-4fbf-8f2f-c912d70ee78a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "I = Function(FunctionSpace(mesh, \"Discontinuous Lagrange\", 0)).interpolate(\n",
+    "    conditional(X[1] > 0.4, 1.0, 0.0) * conditional(X[1] < 0.6, 1.0, 0.0))\n",
+    "q_u = assemble(-(1 / Constant(assemble(I * dx))) * I * test.dx(1) * dx)\n",
+    "print(f\"{assemble(q_u(m_map))=}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b378509-e7aa-4d07-bed9-8fb12005b40d",
+   "metadata": {},
+   "source": [
+    "The associated posterior uncertainty estimate is then\n",
+    "\n",
+    "$$\\sigma_{q_u,posterior} \\approx \\sqrt{H_{approx}^{-1} ( q_u, q_u )}.$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f07ac9a2-34e4-4513-b59e-b93f3b347ded",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sigma_u = sqrt(assemble(q_u(esolver.spectral_approximation_solve(q_u))))\n",
+    "print(f\"{sigma_u=}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e43b225b-a0d1-4872-8ad2-31783ce2c710",
+   "metadata": {},
+   "source": [
+    "We can compare this with the prior uncertainty,\n",
+    "\n",
+    "$$\\sigma_{q_u,prior} = \\sqrt{ B ( q_u, q_u ) }.$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f0f132c-f396-49d3-9fe0-728cfa34912f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sigma_prior_q_u = sqrt(assemble(q_u(B_action(q_u))))\n",
+    "print(f\"{sigma_prior_q_u=}\")"
+   ]
+  },
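+  {
+   "cell_type": "markdown",
+   "id": "0f9e8d7c-6b5a-4c3d-8e2f-1a0b9c8d7e6f",
+   "metadata": {},
+   "source": [
+    "The fractional reduction in uncertainty relative to the prior is then $1 - \\sigma_{q_u,posterior} / \\sigma_{q_u,prior}$, which we can evaluate directly from the two values just computed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1a0b9c8d-7e6f-4d5c-9b4a-2c1d0e9f8a7b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(f\"{1.0 - sigma_u / sigma_prior_q_u=}\")"
+   ]
+  },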
+  {
+   "cell_type": "markdown",
+   "id": "bfc702c5-b84f-46db-9e7a-3a23856c0e47",
+   "metadata": {},
+   "source": [
+    "i.e. we estimate that the observation has reduced the uncertainty in our estimate for the mean of $q_u( m )$ by about 25%.\n",
+    "\n",
+    "## Computing uncertainty estimates: Full Hessian inverse\n",
+    "\n",
+    "We have made a number of assumptions in order to define the inference problem, but given the problem we have used only two approximations in the uncertainty estimate itself: the local Gaussian assumption (making use of the Hessian), and the low rank update approximation for the Hessian inverse. We now remove the second of these approximations using a PETSc Krylov solver. Since we already have an *approximation* for the Hessian inverse, we can use this to construct a preconditioner in the calculation of a Hessian inverse action."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "26a67bd8-320b-4173-8bb0-a3805fffedba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "H = CachedHessian(J)\n",
+    "\n",
+    "ksp = HessianLinearSolver(H, m_map, solver_parameters={\"ksp_type\": \"cg\",\n",
+    "                                                       \"ksp_atol\": 0.0,\n",
+    "                                                       \"ksp_rtol\": 1.0e-7,\n",
+    "                                                       \"ksp_monitor\": None},\n",
+    "                          pc_fn=esolver.spectral_pc_fn())\n",
+    "H_inv_q_u = Function(space, name=\"H_inv_q_u\")\n",
+    "ksp.solve(H_inv_q_u, q_u)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "752cd2ee-c3bf-4d98-9570-a1c497c86aa5",
+   "metadata": {},
+   "source": [
+    "Let's double check that we solved the linear system."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e383e6d-be39-45e7-a41e-9b708842fbc4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "residual = Cofunction(space.dual()).assign(\n",
+    "    H.action(m_map, H_inv_q_u)[2] - q_u)\n",
+    "q_u_norm = abs(q_u.dat.data_ro).max()\n",
+    "residual_norm = abs(residual.dat.data_ro).max()\n",
+    "print(f\"{residual_norm / q_u_norm=}\")\n",
+    "assert residual_norm / q_u_norm < 1.0e-4"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cef87b1-eb9c-4558-973a-68fab5b1163c",
+   "metadata": {},
+   "source": [
+    "Our new uncertainty estimate, using the full Hessian, is"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8b527bf4-099a-4230-9617-9b2c24f1311b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sigma_u = sqrt(assemble(q_u(H_inv_q_u)))\n",
+    "print(f\"{sigma_u=}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4c380ad-9d21-4f84-8dd1-04d33573773f",
+   "metadata": {},
+   "source": [
+    "This reveals that our partial eigenspectrum estimate did indeed miss quite a bit of relevant information available in the Hessian. In fact with the full Hessian we estimate a reduction in uncertainty of about 55%."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 74ec2475..900f47aa 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -54,6 +54,11 @@ tlm_adjoint.
 - `JAX integration `__: A finite
   difference example using JAX.
 
+Advanced tutorials
+------------------
+
+- `Hessian-based uncertainty quantification `__
+
 Source and license
 ------------------
 

From ed138ec1344fdeffec00d751fe2f84b7178dacf9 Mon Sep 17 00:00:00 2001
From: "James R. Maddison"
Maddison" Date: Thu, 6 Jun 2024 13:19:24 +0100 Subject: [PATCH 2/2] Minor fixes --- docs/source/examples/8_hessian_uq.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/examples/8_hessian_uq.ipynb b/docs/source/examples/8_hessian_uq.ipynb index f2e01b8d..407733f0 100644 --- a/docs/source/examples/8_hessian_uq.ipynb +++ b/docs/source/examples/8_hessian_uq.ipynb @@ -381,7 +381,7 @@ "source": [ "We now use SLEPc, seeking the 20 eigenpairs whose eigenvalues have largest magnitude.\n", "\n", - "Note: There is a notational clash here! Conventially the prior inverse covariance is denoted $B^{-1}$, but here we use this on the right-hand-side of a generalized eigenproblem, where the associated matrix is *also* conventially notated $B$. Here the `B_action` argument to the `Eigensolver` constructor is set equal to `B_inv_action` in the call, and the `B_inv_action` argument is set equal to `B_action`!" + "Note: There is a notational clash here! Conventially the prior inverse covariance is denoted $B^{-1}$, but here we use this on the right-hand-side of a generalized eigenproblem, where the associated matrix is *also* conventially notated $B$. Here the `B_action` argument to the `HessianEigensolver` constructor is set equal to `B_inv_action` in the call, and the `B_inv_action` argument is set equal to `B_action`!" ] }, { @@ -509,7 +509,7 @@ "id": "bfc702c5-b84f-46db-9e7a-3a23856c0e47", "metadata": {}, "source": [ - "i.e. we estimate that the observation has reduced the uncertainty in our estimate for the mean of $q_u( m )$ by about 25%.\n", + "i.e. we estimate that the observation has reduced the uncertainty in our estimate for $q_u( m )$ by about 25%.\n", "\n", "## Computing uncertainty estimates: Full Hessian inverse\n", "\n",