From beaa4a89b4ab765b43b6a81762fa7d2fac994f5f Mon Sep 17 00:00:00 2001
From: "James R. Maddison"
Date: Tue, 4 Jun 2024 16:50:45 +0100
Subject: [PATCH 1/2] Hessian-based UQ example

---
 docs/source/examples/8_hessian_uq.ipynb | 609 ++++++++++++++++++++++++
 docs/source/index.rst                   |   5 +
 2 files changed, 614 insertions(+)
 create mode 100644 docs/source/examples/8_hessian_uq.ipynb

diff --git a/docs/source/examples/8_hessian_uq.ipynb b/docs/source/examples/8_hessian_uq.ipynb
new file mode 100644
index 00000000..f2e01b8d
--- /dev/null
+++ b/docs/source/examples/8_hessian_uq.ipynb
@@ -0,0 +1,609 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "82be80f5-acad-4947-8f9a-c6a7a87a7ace",
+   "metadata": {},
+   "source": [
+    "# Hessian-based uncertainty quantification\n",
+    "\n",
+    "**Author:** James R. Maddison\n",
+    "\n",
+    "This notebook describes the use of tlm_adjoint for Hessian-based uncertainty quantification. The approach is based on the method described in\n",
+    "\n",
+    "- Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas, 'Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet', Journal of Computational Physics, 296, pp. 348–368, 2015, doi: 10.1016/j.jcp.2015.04.047\n",
+    "\n",
+    "We assume real spaces and a real build of Firedrake throughout.\n",
+    "\n",
+    "## Uncertainty quantification problem\n",
+    "\n",
+    "We consider the advection-diffusion equation in the unit square domain\n",
+    "\n",
+    "$$\\partial_t u + \\nabla^\\perp m \\cdot \\nabla u = \\kappa \\nabla^2 u,$$\n",
+    "\n",
+    "for some real and positive $\\kappa$ and subject to doubly periodic boundary conditions. We consider a continuous Galerkin finite element discretization in space and an implicit midpoint rule discretization in time, seeking $u_{n + 1} \\in V$ such that\n",
+    "\n",
+    "$$\\forall \\zeta \\in V \\qquad \\int_\\Omega \\zeta u_{n + 1} + \\Delta t \\int_\\Omega \\zeta \\nabla^\\perp m \\cdot \\nabla \\left[ \\frac{1}{2} ( u_n + u_{n + 1}) \\right]\n",
+    "    + \\kappa \\Delta t \\int_\\Omega \\nabla \\zeta \\cdot \\nabla \\left[ \\frac{1}{2} ( u_n + u_{n + 1}) \\right] = \\int_\\Omega \\zeta u_n,$$\n",
+    "\n",
+    "with timestep size $\\Delta t$ and given some suitable discrete initial condition $u_0 \\in V$. Here $V$ is a real continuous $P_1$ finite element space whose elements satisfy the doubly periodic boundary conditions, and after discretizing we have $m \\in V$. For a given value of $m$ we let $\\hat{u}_N ( m )$ denote the solution obtained after $N$ timesteps. Given an observation of $\\hat{u}_N ( m )$, $u_{obs} \\in V$, we seek to infer information about the control $m$. Here we are specifically seeking to infer the transport, defined in terms of a stream function $m$, which transports the solution from $u_0$ to $u_{obs}$ in time $T$.\n",
+    "\n",
+    "We first need to model the observational error. For a given value of the control $m$ we treat observations as being realizations of a Gaussian random variable, with mean given by $\\hat{u}_N ( m )$, and with known covariance. Specifically we define a density $p ( u_{obs} | m )$ whose negative logarithm is (up to a normalization term which is neglected here)\n",
+    "\n",
+    "$$-\\ln p ( u_{obs} | m ) = \\frac{1}{2} R_{obs}^{-1} ( u_{obs} - \\hat{u}_N ( m ), u_{obs} - \\hat{u}_N ( m ) ).$$\n",
+    "\n",
+    "$R_{obs}^{-1}$ is a bilinear and symmetric positive definite observational inverse covariance operator. Here we choose $R_{obs}^{-1}$ by defining\n",
+    "\n",
+    "$$R_{obs}^{-1} ( q_i, q_j ) = \\int_\\Omega q_j \\mathcal{L}_{\\sigma_R,d_R}^2 ( q_i ),$$\n",
+    "\n",
+    "where $\\mathcal{L}_{\\sigma,d} : V \\rightarrow V$ is defined by\n",
+    "\n",
+    "$$\\forall \\zeta \\in V \\qquad \\frac{1}{\\sqrt{4 \\pi \\sigma^2}} \\left[ \\frac{1}{d} \\int_\\Omega \\zeta q + d \\int_\\Omega \\nabla \\zeta \\cdot \\nabla q \\right] = \\int_\\Omega \\zeta \\mathcal{L}_{\\sigma,d} \\left( q \\right).$$\n",
+    "\n",
+    "In the continuous and $\\Omega = \\mathbb{R}^2$ case $R_{obs}$ then defines a covariance operator with single point variance $\\sigma_R^2$ and autocorrelation length scale $d_R$ (see Lindgren et al 2011, doi: 10.1111/j.1467-9868.2011.00777.x).\n",
+    "\n",
+    "We next introduce a prior for the control. We consider a Gaussian prior with mean zero and with known covariance. Specifically we define a prior density $p ( m )$ whose negative logarithm is (again up to a normalization term)\n",
+    "\n",
+    "$$-\\ln p ( m ) = \\frac{1}{2} B^{-1} ( m, m ).$$\n",
+    "\n",
+    "$B^{-1}$ is a bilinear and symmetric positive definite prior inverse covariance operator. Here we choose $B^{-1}$ by defining\n",
+    "\n",
+    "$$B^{-1} ( q_i, q_j ) = \\int_\\Omega q_j \\mathcal{L}_{\\sigma_B,d_B}^2 ( q_i ).$$\n",
+    "\n",
+    "Now applying Bayes' theorem we obtain a posterior density whose negative logarithm is (up to a normalization term)\n",
+    "\n",
+    "$$-\\ln p ( m | u_{obs} ) = \\frac{1}{2} R_{obs}^{-1} ( u_{obs} - \\hat{u}_N ( m ), u_{obs} - \\hat{u}_N ( m ) ) + \\frac{1}{2} B^{-1} ( m, m ).$$\n",
+    "\n",
+    "We have made a number of modelling choices, but subject to these choices the posterior density now completely describes the information we have about the control after being supplied with an observation $u_{obs}$. The challenge is that the posterior is defined in a high dimensional space, and is in general not Gaussian. To simplify the problem we *approximate* the posterior with a Gaussian, and specifically make the approximation (up to a normalization term)\n",
+    "\n",
+    "$$-\\ln p ( m | u_{obs} ) \\approx \\frac{1}{2} \\Gamma_{post}^{-1} ( m - m_{map}, m - m_{map} ).$$\n",
+    "\n",
+    "The mean of the Gaussian approximation is set equal to the posterior density maximizer (the Maximum A Posteriori estimate), $m_{map}$. The inverse covariance of the Gaussian approximation, $\\Gamma_{post}^{-1}$, is set equal to the Hessian, $H$, of the negative log posterior density $-\\ln p ( m | u_{obs} )$, defined by differentiating twice with respect to $m$ and evaluated at the posterior density maximizer.\n",
+    "\n",
+    "Unfortunately, in order to quantify uncertainty, we require information about the *covariance*, and not the *inverse covariance*. Specifically if we have a linear observational operator $q \\in V^*$ then our estimate for the variance associated with $q ( m )$ is\n",
+    "\n",
+    "$$\\sigma^2_q \\approx \\Gamma_{post} ( q, q ) = H^{-1} ( q, q ),$$\n",
+    "\n",
+    "which requires access to information about the Hessian *inverse*.\n",
+    "\n",
+    "In the following we use two approaches to gain access to this information.\n",
+    "\n",
+    "1. A low rank update approximation. We approximate the Hessian inverse using a low rank update to the prior covariance, using the methodology of Isaac et al 2015, doi: 10.1016/j.jcp.2015.04.047.\n",
+    "2. By computing the Hessian inverse action using a Krylov method, preconditioned using the low rank update approximation.\n",
+    "\n",
+    "To solve this problem we need several pieces:\n",
+    "\n",
+    "1. A differentiable solver for the forward problem. We will construct this using Firedrake with tlm_adjoint.\n",
+    "2. To find the posterior maximizer. We will use gradient-based optimization with TAO.\n",
+    "3. To find a low rank update approximation for the Hessian inverse. We will find this using a partial eigenspectrum obtained with SLEPc.\n",
+    "4. To compute the Hessian inverse action. We will compute this using a PETSc Krylov method, preconditioned using the partial eigenspectrum.\n",
+    "\n",
+    "## Forward problem\n",
+    "\n",
+    "We first implement the forward model using Firedrake with tlm_adjoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d9f4bf6-9c22-45bc-82c6-7b98d356f3e9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "\n",
+    "from firedrake import *\n",
+    "from tlm_adjoint.firedrake import *\n",
+    "from firedrake.pyplot import tricontourf\n",
+    "\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "\n",
+    "reset_manager()\n",
+    "clear_caches()\n",
+    "\n",
+    "mesh = PeriodicUnitSquareMesh(40, 40, diagonal=\"crossed\")\n",
+    "X = SpatialCoordinate(mesh)\n",
+    "space = FunctionSpace(mesh, \"Lagrange\", 1)\n",
+    "test = TestFunction(space)\n",
+    "trial = TrialFunction(space)\n",
+    "\n",
+    "T = 0.5\n",
+    "N = 10\n",
+    "kappa = Constant(1.0e-3, static=True)\n",
+    "dt = Constant(T / N, static=True)\n",
+    "sigma_R = Constant(0.03)\n",
+    "d_R = Constant(0.1)\n",
+    "sigma_B = Constant(0.025)\n",
+    "d_B = Constant(0.2)\n",
+    "u_0 = Function(space, name=\"u_0\").interpolate(\n",
+    "    exp(-((X[0] - 0.4) ** 2 + (X[1] - 0.5) ** 2) / (2 * 0.08 * 0.08)))\n",
+    "u_obs = Function(space, name=\"u_obs\").interpolate(\n",
+    "    exp(-((X[0] - 0.6) ** 2 + (X[1] - 0.5) ** 2) / (2 * 0.08 * 0.08)))\n",
+    "\n",
+    "\n",
+    "def L(sigma, d, q):\n",
+    "    u = Function(space)\n",
+    "    solve(inner(trial, test) * dx\n",
+    "          == (1.0 / sqrt(4.0 * np.pi * sigma * sigma))\n",
+    "          * ((1.0 / d) * inner(q, test) * dx + d * inner(grad(q), grad(test)) * dx),\n",
+    "          u, solver_parameters={\"ksp_type\": \"preonly\",\n",
+    "                                \"pc_type\": \"cholesky\"})\n",
+    "    return u\n",
+    "\n",
+    "\n",
+    "def L_inv(sigma, d, q):\n",
+    "    u = Function(space)\n",
+    "    solve((1.0 / sqrt(4.0 * np.pi * sigma * sigma))\n",
+    "          * ((1.0 / d) * inner(trial, test) * dx + d * inner(grad(trial), grad(test)) * dx)\n",
+    "          == inner(q, test) * dx,\n",
+    "          u, solver_parameters={\"ksp_type\": \"preonly\",\n",
+    "                                \"pc_type\": \"cholesky\"})\n",
+    "    return u\n",
+    "\n",
+    "\n",
+    "def R_inv_term(q):\n",
+    "    L_q = L(sigma_R, d_R, q)\n",
+    "    LL_q = L(sigma_R, d_R, L_q)\n",
+    "    return Functional(name=\"J_mismatch\").assign(0.5 * inner(LL_q, q) * dx)\n",
+    "\n",
+    "\n",
+    "def R_inv_action(q):\n",
+    "    L_q = L(sigma_R, d_R, q)\n",
+    "    LL_q = L(sigma_R, d_R, L_q)\n",
+    "    return assemble(inner(LL_q, test) * dx)\n",
+    "\n",
+    "\n",
+    "def B_inv_term(q):\n",
+    "    L_q = L(sigma_B, d_B, q)\n",
+    "    LL_q = L(sigma_B, d_B, L_q)\n",
+    "    return Functional(name=\"J_prior\").assign(0.5 * inner(LL_q, q) * dx)\n",
+    "\n",
+    "\n",
+    "def B_inv_action(q):\n",
+    "    L_q = L(sigma_B, d_B, q)\n",
+    "    LL_q = L(sigma_B, d_B, L_q)\n",
+    "    return assemble(inner(LL_q, test) * dx)\n",
+    "\n",
+    "\n",
+    "def B_action(q):\n",
+    "    u = Function(space)\n",
+    "    solve(inner(trial, test) * dx == q,\n",
solver_parameters={\"ksp_type\": \"preonly\",\n", + " \"pc_type\": \"cholesky\"})\n", + " Li_q = L_inv(sigma_B, d_B, u)\n", + " LiLi_q = L_inv(sigma_B, d_B, Li_q)\n", + " return LiLi_q\n", + "\n", + "\n", + "def forward(m):\n", + " m = Function(space, cache=True).assign(m)\n", + " u_np1 = Function(space, name=\"u_np1\")\n", + " u_n = u_0.copy(deepcopy=True)\n", + " eq = EquationSolver(inner(trial, test) * dx\n", + " + 0.5 * dt * inner(dot(perp(grad(m)), grad(trial)), test) * dx\n", + " + 0.5 * kappa * dt * inner(grad(trial), grad(test)) * dx\n", + " == inner(u_n, test) * dx\n", + " - 0.5 * dt * inner(dot(perp(grad(m)), grad(u_n)), test) * dx\n", + " - 0.5 * kappa * dt * inner(grad(u_n), grad(test)) * dx,\n", + " u_np1, solver_parameters={\"ksp_type\": \"preonly\",\n", + " \"pc_type\": \"lu\"})\n", + " for _ in range(N):\n", + " eq.solve()\n", + " u_n.assign(u_np1)\n", + "\n", + " u_mismatch = Function(space, name=\"u_mismatch\").assign(u_n - u_obs)\n", + " J_mismatch = R_inv_term(u_mismatch)\n", + " J_prior = B_inv_term(m)\n", + " return J_mismatch, J_prior, J_mismatch + J_prior, u_np1" + ] + }, + { + "cell_type": "markdown", + "id": "d098a1d3-605d-4fc0-8348-c077ff305e3b", + "metadata": {}, + "source": [ + "Let's first visualize the initial condition $u_0$ and the observation $u_{obs}$ taken a time $T$ later." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70e37007-ea8b-4247-b1d5-8a9250dba4a2", + "metadata": {}, + "outputs": [], + "source": [ + "def plot_output(u, title):\n", + " r = (u.dat.data_ro.min(), u.dat.data_ro.max())\n", + " eps = (r[1] - r[0]) * 1.0e-12\n", + " p = tricontourf(u, np.linspace(r[0] - eps, r[1] + eps, 32))\n", + " plt.gca().set_title(title)\n", + " plt.colorbar(p)\n", + " plt.gca().set_aspect(1.0)\n", + "\n", + "\n", + "plot_output(u_0, r\"$u_0$\")\n", + "plot_output(u_obs, r\"$u_{obs}$\")" + ] + }, + { + "cell_type": "markdown", + "id": "b0784d97-de82-4d40-91a0-f031177b69ab", + "metadata": {}, + "source": [ + "## Quantities of interest\n", + "\n", + "In the following discussion we seek to quantify uncertainties in *quantities of interest*. These are defined using linear functionals, say $q \\in V^*$. Given some value of the control $m$, $q ( m )$ is the value of the quantity of interest. Moreover, given the Maximum A Posteriori estimate $m_{map}$ for the control (the posterior density maximizer), the estimate for the posterior mean of the quantity of interest is given by\n", + "\n", + "$$\\mu_{q,posterior} \\approx q ( m_{map} ).$$\n", + "\n", + "To obtain uncertainty estimates we need a covariance operator. Note that a covariance operator, say $\\Gamma_{post}$, is a bilinear operator, $\\Gamma_{post} : V^* \\times V^* \\rightarrow \\mathbb{R}$. It takes the linear functionals we use to obtain values for two quantities of interest, and gives us a value for a covariance between them. In particular the estimate for the posterior uncertainty in a single quantity of interest, defined by a linear functional $q \\in V^*$, is\n", + "\n", + "$$\\sigma_{q,posterior} \\approx \\sqrt{ \\Gamma_{post} ( q, q ) } = \\sqrt{ H^{-1} ( q, q ) }.$$\n", + "\n", + "## Finding the posterior maximizer\n", + "\n", + "We now find the posterior density maximizer, which corresponds to seeking a minimum of the negative log posterior density. We solve this problem using the Toolkit for Advanced Optimization (TAO), with the Limited-Memory Variable Metric (LMVM) approach. This is a gradient-based approach, so let's first verify the adjoint using Taylor verification." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "13af8c73-fde1-43f9-a118-961b2c4eecac",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "m_0 = Function(space, name=\"m_0\").interpolate(\n",
+    "    0.06 * sin(2 * pi * X[1]))\n",
+    "\n",
+    "reset_manager()\n",
+    "start_manager()\n",
+    "_, _, J, _ = forward(m_0)\n",
+    "stop_manager()\n",
+    "\n",
+    "dJ = compute_gradient(J, m_0)\n",
+    "dm = Function(space, name=\"dm\").interpolate(\n",
+    "    sin(4 * pi * X[0]) * cos(6 * pi * X[1]))\n",
+    "min_order = taylor_test(lambda m: forward(m)[2], m_0, J_val=J.value, dJ=dJ, dM=dm, seed=1.0e-6)\n",
+    "assert min_order > 1.99"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b4c644d0-29ac-4b50-84c7-93c8a3089de3",
+   "metadata": {},
+   "source": [
+    "Next we solve the optimization problem. The considered problem seeks to infer the transport in the advection-diffusion equation, in terms of a stream function. Initially the tracer is concentrated on the left, and the observation taken a time $T$ later has the tracer moved to the right. The variable `m_0`, which will define our initial guess for the optimization, is set so that the velocity at the center has approximately the correct magnitude for this transport.\n",
+    "\n",
+    "As usual we need to define an appropriate inner product associated with derivatives. Here the prior defines a natural inner product – specifically we can use the prior covariance, $B$, to define an inner product for the dual space $V^*$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5a8a5ee0-2bec-4141-b7d3-d1b9834ff9a3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "optimizer = TAOSolver(lambda m: forward(m)[2], space, H_0_action=B_action,\n",
+    "                      solver_parameters={\"tao_type\": \"lmvm\",\n",
+    "                                         \"tao_gatol\": 1.0e-4,\n",
+    "                                         \"tao_grtol\": 0.0,\n",
+    "                                         \"tao_gttol\": 0.0,\n",
+    "                                         \"tao_monitor\": None})\n",
+    "\n",
+    "m_map = Function(space, name=\"m_map\").assign(m_0)\n",
+    "optimizer.solve(m_map)\n",
+    "\n",
+    "reset_manager()\n",
+    "start_manager()\n",
+    "J_mismatch, _, J, u_map = forward(m_map)\n",
+    "stop_manager()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "572fdb0b-61a7-4bd9-8224-a61de3befdef",
+   "metadata": {},
+   "source": [
+    "Let's plot the result."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "882960ad-2584-41d8-afa5-b20f30c5a7bb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "reset_manager()\n",
+    "start_manager()\n",
+    "J_mismatch, _, J, u_map = forward(m_map)\n",
+    "stop_manager()\n",
+    "\n",
+    "plot_output(m_map, r\"$m_{map}$\")\n",
+    "plot_output(u_map, r\"$\\hat{u} ( m_{map} )$\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bad61e68-e156-49f7-975d-40362fec59fd",
+   "metadata": {},
+   "source": [
+    "## Computing uncertainty estimates: Low rank update approximation\n",
+    "\n",
+    "We next seek to use Hessian information to quantify the uncertainty in the result of the inference. We use a low rank update approximation, following the methodology of Isaac et al 2015, doi: 10.1016/j.jcp.2015.04.047.\n",
+    "\n",
+    "In this approach we consider the mismatch Hessian, $H_{mismatch}$, obtained by differentiating\n",
+    "\n",
+    "$$J_{mismatch} = \\frac{1}{2} R_{obs}^{-1} ( u_{obs} - \\hat{u}_N ( m ), u_{obs} - \\hat{u}_N ( m ) ),$$\n",
+    "\n",
+    "twice with respect to the control $m$. Since the prior term contributes exactly $B^{-1}$ to the Hessian of the negative log posterior density, this means\n",
+    "\n",
+    "$$H_{mismatch} = H - B^{-1}.$$\n",
+    "\n",
+    "We seek eigenpairs $\\lambda_k \\in \\mathbb{R}$, $v_k \\in V$ such that\n",
+    "\n",
+    "$$H_{mismatch} \\left( v_k, \\cdot \\right) = \\lambda_k B^{-1} \\left( v_k, \\cdot \\right),$$\n",
+    "\n",
+    "where the eigenvectors are $B^{-1}$-orthonormal\n",
+    "\n",
+    "$$B^{-1} ( v_k, v_l ) = \\delta_{k,l}.$$\n",
+    "\n",
+    "This leads directly to a low rank update approximation for the Hessian inverse, expressed as a low rank update to the prior covariance (equation (20) of Isaac et al 2015, doi: 10.1016/j.jcp.2015.04.047),\n",
+    "\n",
+    "$$H^{-1} ( q_i, q_j ) \\approx B ( q_i, q_j ) - \\sum_k \\frac{\\lambda_k}{1 + \\lambda_k} q_i ( v_k ) q_j ( v_k ),$$\n",
+    "\n",
+    "which we use to approximate the posterior covariance.\n",
+    "\n",
+    "We first perform a higher order Taylor verification using the mismatch Hessian."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1df5fc5-8598-4231-b9f9-d5ca457c560a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "H_mismatch = CachedHessian(J_mismatch)\n",
+    "\n",
+    "dm = Function(space, name=\"dm\").interpolate(\n",
+    "    exp(-((X[0] - 0.3) ** 2 + (X[1] - 0.3) ** 2) / (2 * 0.08 * 0.08)))\n",
+    "min_order = taylor_test(lambda m: forward(m)[0], m_map, J_val=J_mismatch.value, ddJ=H_mismatch, dM=dm, seed=1.0e-4)\n",
+    "assert min_order > 2.99"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64e7d158-9772-4ab1-9ca3-d4442a932cb6",
+   "metadata": {},
+   "source": [
+    "We now use SLEPc, seeking the 20 eigenpairs whose eigenvalues have largest magnitude.\n",
+    "\n",
+    "Note: There is a notational clash here! Conventionally the prior inverse covariance is denoted $B^{-1}$, but here we use this on the right-hand side of a generalized eigenproblem, where the associated matrix is *also* conventionally denoted $B$. Here the `B_action` argument to the `Eigensolver` constructor is set equal to `B_inv_action` in the call, and the `B_inv_action` argument is set equal to `B_action`!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1896bfac-83e4-48eb-a8c0-6163e7a36531",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "esolver = HessianEigensolver(\n",
+    "    H_mismatch, m_map, B_action=B_inv_action, B_inv_action=B_action,\n",
+    "    solver_parameters={\"eps_type\": \"krylovschur\",\n",
+    "                       \"eps_gen_hermitian\": None,\n",
+    "                       \"eps_largest_magnitude\": None,\n",
+    "                       \"eps_nev\": 20,\n",
+    "                       \"eps_conv_rel\": None,\n",
+    "                       \"eps_tol\": 1.0e-14,\n",
+    "                       \"eps_purify\": False,\n",
+    "                       \"eps_monitor\": None})\n",
+    "esolver.solve()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a71568e9-6a12-4b34-9266-76ed64c48975",
+   "metadata": {},
+   "source": [
+    "Let's plot the eigenvalues."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "12f9bd12-40e1-425b-8f61-efb2de94e000",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lam = esolver.eigenvalues()\n",
+    "assert (lam > 0.0).all()\n",
+    "plt.semilogy(range(1, len(esolver) + 1), lam, \"k-\")\n",
+    "plt.xlim(0, len(esolver))\n",
+    "plt.xlabel(\"Eigenvalue index\")\n",
+    "plt.ylabel(\"Eigenvalue\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d352f10-3ec8-41de-af3c-4d1f2164cbc6",
+   "metadata": {},
+   "source": [
+    "Each eigenvalue indicates the amount of information, added by the observation, over the prior. Specifically each eigenvector $v_k \\in V$ has an associated dual space element defined by the prior inverse covariance, $q_{v_k} = B^{-1} ( v_k, \\cdot ) \\in V^*$. If we have an observation for a quantity of interest defined by a linear functional equal to a (non-zero) multiple of this associated dual space element, and if the posterior were Gaussian, then the associated variance would decrease by a factor of one plus the eigenvalue when the observation is added. That is, under the Gaussian approximation, we have\n",
+    "\n",
+    "$$\\frac{\\sigma^2_{q_{v_k},posterior}}{\\sigma^2_{q_{v_k},prior}} = \\frac{1}{1 + \\lambda_k},$$\n",
+    "\n",
+    "where $\\sigma^2_{q_{v_k},posterior}$ and $\\sigma^2_{q_{v_k},prior}$ are respectively the posterior and prior variance associated with $q_{v_k}$, and where $\\lambda_k$ is the associated eigenvalue.\n",
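+    "\n",
+    "The low rank update formula above is easy to check on a small dense problem. The following is an illustrative sketch using NumPy and SciPy, standalone and separate from the Firedrake calculation in this notebook: when the mismatch Hessian has exact rank $k$, and all $k$ non-zero eigenpairs are retained, the update recovers the Hessian inverse exactly.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import scipy.linalg as sla\n",
+    "\n",
+    "rng = np.random.default_rng(54321)\n",
+    "n, k = 10, 3\n",
+    "A = rng.standard_normal((n, n))\n",
+    "B = A @ A.T + n * np.eye(n)  # prior covariance (symmetric positive definite)\n",
+    "G = rng.standard_normal((n, k))\n",
+    "H_mismatch = G @ G.T         # mismatch Hessian, of rank k\n",
+    "\n",
+    "# Solve H_mismatch v = lambda B^{-1} v, with the eigenvectors normalized so\n",
+    "# that V^T B^{-1} V = I, and keep the k largest eigenvalues\n",
+    "lam, V = sla.eigh(H_mismatch, sla.inv(B))\n",
+    "lam, V = lam[::-1][:k], V[:, ::-1][:, :k]\n",
+    "\n",
+    "# Low rank update approximation for the Hessian inverse\n",
+    "H_inv_approx = B - V @ np.diag(lam / (1.0 + lam)) @ V.T\n",
+    "\n",
+    "# Exact here, as the neglected eigenvalues are all zero\n",
+    "H = sla.inv(B) + H_mismatch\n",
+    "assert np.allclose(H_inv_approx, sla.inv(H))\n",
+    "```\n",
+    "\n",
+    "When only some of the dominant eigenpairs are retained the update is no longer exact, and the size of the neglected eigenvalues controls the error. That is precisely the situation we are in here.\n",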
+    "\n",
+    "We have retained only $20$ eigenpairs, and the smallest eigenvalue obtained is still quite a bit bigger than one. This means we might expect to miss quite a bit of the information available in the Hessian if we use a low rank update approximation based on only these $20$ eigenpairs. We'll return to this issue later.\n",
+    "\n",
+    "The linear functional associated with computing the average $x$-component of the velocity for $0.4 < y < 0.6$, $q_u \\in V^*$, satisfies\n",
+    "\n",
+    "$$q_u ( \\zeta ) = -\\frac{1}{\\int_\\Omega \\mathcal{I}} \\int_\\Omega \\mathcal{I} \\partial_y \\zeta,$$\n",
+    "\n",
+    "where $\\mathcal{I}$ is one where $0.4 < y < 0.6$ and zero elsewhere. We can use this to evaluate an estimate for the posterior mean of this quantity,\n",
+    "\n",
+    "$$\\mu_{q_u,posterior} \\approx q_u ( m_{map} ).$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0b19afce-4d8a-4fbf-8f2f-c912d70ee78a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "I = Function(FunctionSpace(mesh, \"Discontinuous Lagrange\", 0)).interpolate(\n",
+    "    conditional(X[1] > 0.4, 1.0, 0.0) * conditional(X[1] < 0.6, 1.0, 0.0))\n",
+    "q_u = assemble(-(1 / Constant(assemble(I * dx))) * I * test.dx(1) * dx)\n",
+    "print(f\"{assemble(q_u(m_map))=}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b378509-e7aa-4d07-bed9-8fb12005b40d",
+   "metadata": {},
+   "source": [
+    "The associated posterior uncertainty estimate is then\n",
+    "\n",
+    "$$\\sigma_{q_u,posterior} \\approx \\sqrt{H_{approx}^{-1} ( q_u, q_u )}.$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f07ac9a2-34e4-4513-b59e-b93f3b347ded",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sigma_u = sqrt(assemble(q_u(esolver.spectral_approximation_solve(q_u))))\n",
+    "print(f\"{sigma_u=}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e43b225b-a0d1-4872-8ad2-31783ce2c710",
+   "metadata": {},
+   "source": [
+    "We can compare this with the prior uncertainty,\n",
+    "\n",
+    "$$\\sigma_{q_u,prior} = \\sqrt{ B ( q_u, q_u ) }.$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f0f132c-f396-49d3-9fe0-728cfa34912f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sigma_prior_q_u = sqrt(assemble(q_u(B_action(q_u))))\n",
+    "print(f\"{sigma_prior_q_u=}\")"
+   ]
+  },
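+  {
+   "cell_type": "markdown",
+   "id": "0f9e8d7c-6b5a-4c3d-8e2f-1a0b9c8d7e6f",
+   "metadata": {},
+   "source": [
+    "The fractional reduction in uncertainty relative to the prior is then $1 - \\sigma_{q_u,posterior} / \\sigma_{q_u,prior}$, which we can evaluate directly from the two values just computed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1a0b9c8d-7e6f-4d5c-9b4a-2c1d0e9f8a7b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(f\"{1.0 - sigma_u / sigma_prior_q_u=}\")"
+   ]
+  },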
+  {
+   "cell_type": "markdown",
+   "id": "bfc702c5-b84f-46db-9e7a-3a23856c0e47",
+   "metadata": {},
+   "source": [
+    "i.e. we estimate that the observation has reduced the uncertainty in our estimate for the mean of $q_u( m )$ by about 25%.\n",
+    "\n",
+    "## Computing uncertainty estimates: Full Hessian inverse\n",
+    "\n",
+    "We have made a number of assumptions in order to define the inference problem, but given the problem we have used only two approximations in the uncertainty estimate itself: the local Gaussian assumption (making use of the Hessian), and the low rank update approximation for the Hessian inverse. We now remove the second of these approximations using a PETSc Krylov solver. Since we already have an *approximation* for the Hessian inverse, we can use this to construct a preconditioner in the calculation of a Hessian inverse action."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "26a67bd8-320b-4173-8bb0-a3805fffedba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "H = CachedHessian(J)\n",
+    "\n",
+    "ksp = HessianLinearSolver(H, m_map, solver_parameters={\"ksp_type\": \"cg\",\n",
+    "                                                       \"ksp_atol\": 0.0,\n",
+    "                                                       \"ksp_rtol\": 1.0e-7,\n",
+    "                                                       \"ksp_monitor\": None},\n",
+    "                          pc_fn=esolver.spectral_pc_fn())\n",
+    "H_inv_q_u = Function(space, name=\"H_inv_q_u\")\n",
+    "ksp.solve(H_inv_q_u, q_u)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "752cd2ee-c3bf-4d98-9570-a1c497c86aa5",
+   "metadata": {},
+   "source": [
+    "Let's double check that we solved the linear system."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e383e6d-be39-45e7-a41e-9b708842fbc4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "residual = Cofunction(space.dual()).assign(\n",
+    "    H.action(m_map, H_inv_q_u)[2] - q_u)\n",
+    "q_u_norm = abs(q_u.dat.data_ro).max()\n",
+    "residual_norm = abs(residual.dat.data_ro).max()\n",
+    "print(f\"{residual_norm / q_u_norm=}\")\n",
+    "assert residual_norm / q_u_norm < 1.0e-4"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cef87b1-eb9c-4558-973a-68fab5b1163c",
+   "metadata": {},
+   "source": [
+    "Our new uncertainty estimate, using the full Hessian, is"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8b527bf4-099a-4230-9617-9b2c24f1311b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sigma_u = sqrt(assemble(q_u(H_inv_q_u)))\n",
+    "print(f\"{sigma_u=}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4c380ad-9d21-4f84-8dd1-04d33573773f",
+   "metadata": {},
+   "source": [
+    "This reveals that our partial eigenspectrum estimate did indeed miss quite a bit of relevant information available in the Hessian. In fact with the full Hessian we estimate a reduction in uncertainty of about 55%."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 74ec2475..900f47aa 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -54,6 +54,11 @@ tlm_adjoint.
 - `JAX integration `__: A finite
   difference example using JAX.
 
+Advanced tutorials
+------------------
+
+- `Hessian-based uncertainty quantification `__
+
 Source and license
 ------------------
 

From ed138ec1344fdeffec00d751fe2f84b7178dacf9 Mon Sep 17 00:00:00 2001
From: "James R. Maddison"
Maddison" Date: Thu, 6 Jun 2024 13:19:24 +0100 Subject: [PATCH 2/2] Minor fixes --- docs/source/examples/8_hessian_uq.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/examples/8_hessian_uq.ipynb b/docs/source/examples/8_hessian_uq.ipynb index f2e01b8d..407733f0 100644 --- a/docs/source/examples/8_hessian_uq.ipynb +++ b/docs/source/examples/8_hessian_uq.ipynb @@ -381,7 +381,7 @@ "source": [ "We now use SLEPc, seeking the 20 eigenpairs whose eigenvalues have largest magnitude.\n", "\n", - "Note: There is a notational clash here! Conventially the prior inverse covariance is denoted $B^{-1}$, but here we use this on the right-hand-side of a generalized eigenproblem, where the associated matrix is *also* conventially notated $B$. Here the `B_action` argument to the `Eigensolver` constructor is set equal to `B_inv_action` in the call, and the `B_inv_action` argument is set equal to `B_action`!" + "Note: There is a notational clash here! Conventially the prior inverse covariance is denoted $B^{-1}$, but here we use this on the right-hand-side of a generalized eigenproblem, where the associated matrix is *also* conventially notated $B$. Here the `B_action` argument to the `HessianEigensolver` constructor is set equal to `B_inv_action` in the call, and the `B_inv_action` argument is set equal to `B_action`!" ] }, { @@ -509,7 +509,7 @@ "id": "bfc702c5-b84f-46db-9e7a-3a23856c0e47", "metadata": {}, "source": [ - "i.e. we estimate that the observation has reduced the uncertainty in our estimate for the mean of $q_u( m )$ by about 25%.\n", + "i.e. we estimate that the observation has reduced the uncertainty in our estimate for $q_u( m )$ by about 25%.\n", "\n", "## Computing uncertainty estimates: Full Hessian inverse\n", "\n",