Merge branch 'feat_dsi' of https://github.com/gmdsi/GMDSI_notebooks into feat_dsi
rhugman committed Aug 19, 2024
2 parents 26a5d2a + e234889 commit 90660fe
Showing 1 changed file with 30 additions and 10 deletions.
40 changes: 30 additions & 10 deletions tutorials/part0_intro_to_dsi/intro_to_dsi.ipynb
@@ -79,15 +79,17 @@
"source": [
"# Data-space inversion\n",
"\n",
"Following the notation in [Lima et al (2020)](https://doi.org/10.1007/s10596-020-09933-w), $\\mathbf{d}$ is the vector of model simulated outputs that correspond to both predictions and measurements. As mentioned above, the main idea behind the method is to use PCA to write the vector of predicted data ($\\mathbf{d}_{\\text{PCA}}$) as:\n",
"Following the notation in [Lima et al (2020)](https://doi.org/10.1007/s10596-020-09933-w), $\\mathbf{d}$ is the vector of model simulated outputs that correspond to both predictions and measurements. As mentioned above, the main idea behind the method is to use principle components analysis (PCA) to write the vector of predicted data ($\\mathbf{d}_{\\text{PCA}}$) as:\n",
"\n",
"$$\n",
"\\mathbf{d}_{\\text{PCA}} = \\bar{\\mathbf{d}} + \\mathbf{C}_d^{1/2} \\mathbf{x}\n",
"$$\n",
"$$|\n",
"\n",
"in which $\\bar{\\mathbf{d}}$ and $\\mathbf{C}_d$ are the mean and the covariance matrix of $\\mathbf{d}$, and $\\mathbf{x}$ is a vector of random numbers. Both of which are obtained from the ensemble of model outputs. \n",
"\n",
"in which $\\bar{\\mathbf{d}}$ and $\\mathbf{C}_d$ are the mean and the covariance matrix of $\\mathbf{d}$, and $\\mathbf{x}$ is a vector of random numbers. Both of which are obtained from the ensemble of model outputs.\n",
">_A note on $\\mathbf{x}$...This vector contains the \"parameters\" of the surrogate model. Importantly, these are not the base parameters of the underlying model; rather, you can think of them as \"super parameters\", obtained by mapping from the base parameters, that drive the surrogate model. They are derived from the PCA, and we go into more detail below._\n",
"\n",
"## Calculate the mean-vector"
"## Calculate the mean-vector $\\bar{\\mathbf{d}}$"
]
},
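To make these ingredients concrete, here is a minimal numpy sketch of the emulator pieces, using a small synthetic ensemble `D` in place of the notebook's real model outputs (the array, its dimensions, and the helper name `forward_run` are illustrative assumptions, not the notebook's code):

```python
import numpy as np

# Illustrative stand-in for the real ensemble of model outputs:
# each row is one realization's output vector d (nreal x nobs).
rng = np.random.default_rng(42)
D = rng.normal(size=(100, 20))

# The mean vector d_bar is just the per-output ensemble mean
d_bar = D.mean(axis=0)

# A DSI "forward run" maps latent-space parameters x to outputs:
#   d_PCA = d_bar + Cd_sqrt @ x
# where Cd_sqrt is built from the ensemble (see the SVD step below)
def forward_run(d_bar, Cd_sqrt, x):
    return d_bar + Cd_sqrt @ x
```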
{
@@ -116,7 +118,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculate $\\mathbf{C}_d^{1/2}$\n",
"## Calculate the covariance matrix $\\mathbf{C}_d^{1/2}$\n",
"\n",
"$\\mathbf{C}_d$ is calcualted as:\n",
"\n",
@@ -164,7 +166,7 @@
"\\Delta\\mathbf{D} = \\mathbf{U} \\mathbf{\\Sigma} \\mathbf{V}^T\n",
"$$\n",
"\n",
"where $\\mathbf{U}$ and $\\mathbf{V}$ are orthogonal matrices and $\\mathbf{\\Sigma}$ is a diagonal matrix with the singular values of $\\Delta\\mathbf{D}$.\n"
"where $\\mathbf{U}$ and $\\mathbf{V}$ are orthogonal matrices forming a basis for $\\Delta\\mathbf{D}$ and $\\mathbf{\\Sigma}$ is a diagonal matrix with the singular values of $\\Delta\\mathbf{D}$.\n"
]
},
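Continuing the synthetic sketch above, here is one way to assemble $\mathbf{C}_d^{1/2}$ from the SVD of the deviation matrix (the $1/\sqrt{N_e-1}$ scaling is the usual ensemble-covariance convention, assumed here):

```python
# Deviations from the mean, scaled so that Cd = delta_D @ delta_D.T
# (each column is one realization minus d_bar; shape is nobs x nreal)
delta_D = (D - d_bar).T / np.sqrt(D.shape[0] - 1)

# SVD of the deviation matrix: delta_D = U @ diag(sigma) @ Vt
U, sigma, Vt = np.linalg.svd(delta_D, full_matrices=False)

# Cd = U @ diag(sigma**2) @ U.T, so one matrix square root is:
Cd_sqrt = U @ np.diag(sigma)

# sanity check: Cd_sqrt @ Cd_sqrt.T reproduces Cd
assert np.allclose(Cd_sqrt @ Cd_sqrt.T, delta_D @ delta_D.T)
```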
{
@@ -223,14 +225,32 @@
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, to execute a \"forward run\", we just calculate:\n",
"$$\n",
"\\mathbf{d}_{\\text{PCA}} = \\bar{\\mathbf{d}} + \\mathbf{C}_d^{1/2} \\mathbf{x}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a model-emulator \"forward run\"\n",
"d_bar.values + np.dot(Cd_sqrt,x)"
"d_bar.values + np.dot(Cd_sqrt,x), d_bar.values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### But wait, we haven't done anything?\n",
"We are starting with these PCA-related \"latent-space parameters\" all as 0, so we get a trivial result when we run the forward model. But...this formulation gives us an opportunity to \"map\" from the mean of the observations to new ones, if we just \"calibrate\" those parameters in a meaningful way. It's like starting with a forward model and a set of parameters with unknown values. If we perform calibration with real observation values, we can learn meaningful values for $\\mathbf{x}$. Let's try that!"
]
},
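As a minimal sketch of that idea (a plain least-squares fit on synthetic observations; `d_obs` and `x_true` are made-up stand-ins, and the notebook's actual calibration machinery may differ):

```python
# Synthetic "observed" values built from a known truth so we can check
# the fit; in a real problem d_obs would be field measurements
x_true = rng.normal(size=Cd_sqrt.shape[1])
d_obs = d_bar + Cd_sqrt @ x_true

# "Calibrate" x by solving Cd_sqrt @ x = d_obs - d_bar in a
# least-squares sense
x_est, *_ = np.linalg.lstsq(Cd_sqrt, d_obs - d_bar, rcond=None)

# The emulator now maps from the mean to outputs that honor d_obs
d_sim = forward_run(d_bar, Cd_sqrt, x_est)
assert np.allclose(d_sim, d_obs)
```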
{
@@ -360,7 +380,7 @@
"source": [
"# Rejection sampling with DSI\n",
"\n",
"DSI is so efficient, we can try to do rejection sampling. So first, we need to generate a lot of (latent-space) parameter sets, then run them through the DSI emulator. Then we can filter out output sets that dont reproduce the historic observations \"well enough\"...here we go"
"DSI is so efficient, we can try to do rejection sampling. So first, we need to generate a lot of (latent-space) parameter sets, then run them through the DSI emulator. Then we can filter out output sets that don't reproduce the historic observations \"well enough\"...here we go"
]
},
{
Expand Down Expand Up @@ -388,7 +408,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok so now we need to decide which results are \"good enough\" and which ones aren't. In the PEST world, this usually done with weighted sum-of-squared residual, so lets do that (assuming weights on 1.0):"
"Ok so now we need to decide which results are \"good enough\" and which ones aren't. In the PEST world, this usually done with weighted sum-of-squared residual, so lets do that (assuming weights of 1.0 for each observation value):"
]
},
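Putting those two steps together, a hedged sketch of the whole rejection-sampling workflow (sample size, variable names, and the 1% acceptance threshold are illustrative choices, not the notebook's):

```python
# Draw many latent-space parameter sets x ~ N(0, I) and push them all
# through the emulator at once (each row of X is one parameter set)
nsamples = 100_000
X = rng.normal(size=(nsamples, Cd_sqrt.shape[1]))
D_sim = d_bar + X @ Cd_sqrt.T            # one emulator run per row

# Weighted sum of squared residuals (phi), with weights of 1.0
weights = np.ones_like(d_obs)
phi = ((weights * (d_obs - D_sim)) ** 2).sum(axis=1)

# Keep only runs that reproduce the observations "well enough";
# here we accept the best 1% of samples
accepted = D_sim[phi <= np.quantile(phi, 0.01)]
```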
{
@@ -454,7 +474,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Boo ya! thats pretty awesome - pure bayesian sampling...and it worked (in that we captured the truth with the posterior)! #winning"
"Boo ya! thats pretty awesome - pure Bayesian sampling...and it worked (in that we captured the truth with the posterior)! #winning"
]
}
],