Documentation
msainsburydale committed Dec 10, 2023
1 parent 386b91e commit bc4cd6d
Showing 6 changed files with 88 additions and 18 deletions.
10 changes: 8 additions & 2 deletions docs/src/API/core.md
@@ -22,7 +22,7 @@ The data should be stored as a `Vector{A}`, where each element of the vector is
## Types of estimators

See also [Architectures and activations functions](@ref) that are often used
when constructing neural estimators, and the convenience constructor [`initialise_estimator`](@ref).

```@docs
NeuralEstimator
@@ -59,9 +59,15 @@ assess
Assessment
risk
bias
rmse
```

-## Bootstrapping
+## Uncertainty quantification

Uncertainty quantification often proceeds through the bootstrap distribution, which is essentially available "for free" when bootstrap data sets can be quickly generated. Alternatively, one may approximate a set of low and high marginal posterior quantiles using a specially constructed neural Bayes estimator, which can then be used to construct credible intervals: see [`IntervalEstimator`](@ref) and its variants.
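As a rough sketch of the bootstrap-based workflow (here `θ̂` denotes a trained neural point estimator, while `simulate`, `Z`, and `m` are hypothetical placeholders rather than part of the package API):

```julia
using Statistics: quantile

θ̃ = θ̂(Z)                                   # point estimate from the observed data Z
B = 400                                     # number of parametric-bootstrap data sets
θ_boot = reduce(hcat, [θ̂(simulate(θ̃, m)) for _ in 1:B])  # p × B matrix of estimates
lower = [quantile(θ_boot[i, :], 0.025) for i in 1:size(θ_boot, 1)]
upper = [quantile(θ_boot[i, :], 0.975) for i in 1:size(θ_boot, 1)]  # 95% intervals
```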

```@docs
bootstrap
2 changes: 1 addition & 1 deletion src/Architectures.jl
@@ -55,7 +55,7 @@ element of the vector is associated with one parameter vector (i.e., one set of
independent replicates), and where `A` depends on the form of the data and the
chosen architecture for `ψ`. As a rule of thumb, when the data are stored as an
array, the replicates are stored in the final dimension of the array. (This is
-usually the 'batch' dimension, but batching with `DeepSets` is done at the set
+usually the 'batch' dimension, but batching with `DeepSet`s is done at the set
level, i.e., sets of replicates are batched together.) For example, with
gridded spatial data and `ψ` a CNN, `A` should be
a 4-dimensional array, with the replicates stored in the 4ᵗʰ dimension.
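For instance, such data might be constructed as follows (a sketch; the grid size and the numbers of parameter vectors and replicates are illustrative):

```julia
K, m = 100, 30   # number of parameter vectors, and number of replicates per vector
# one 4-dimensional array per parameter vector: width × height × channels × replicates
Z = [rand(Float32, 16, 16, 1, m) for _ in 1:K]   # Vector{Array{Float32, 4}}
```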
8 changes: 4 additions & 4 deletions src/Estimators.jl
@@ -108,7 +108,7 @@ where
- ``g(⋅)`` is a logistic function that maps its input to the prior support.
Note that, in addition to ensuring that the interval remains in the prior support,
-this constructions also ensures that the intervals are valid (i.e., it prevents
+this construction also ensures that the intervals are valid (i.e., it prevents
quantile crossing, in the sense that the upper bound is always greater than the
lower bound).
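The mechanism can be illustrated generically (a sketch, with illustrative names `l` and `u` for the unconstrained network outputs and the prior support taken as (0, 1) for simplicity; this is not the package's exact internal definition):

```julia
g(x) = 1 / (1 + exp(-x))           # logistic: monotone map into the support (0, 1)
lower(Z) = g.(l(Z))                # lower bound, mapped into the prior support
upper(Z) = g.(l(Z) .+ exp.(u(Z)))  # exp(⋅) > 0 and g increasing ⇒ upper > lower
```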
@@ -188,7 +188,7 @@ end
PointIntervalEstimator(arch_point, arch_lower, arch_upper)
PointIntervalEstimator(arch_point, arch_bound)
PointIntervalEstimator(arch)
-A neural estimator that jointly produces point estimates, θ̂(Z), where θ̂(Z) is a
+A neural estimator that jointly produces point estimates, ``θ̂(Z)``, where ``θ̂(⋅)`` is a
neural point estimator with architecture `arch_point`, and credible intervals constructed as,
```math
@@ -363,8 +363,8 @@ they see fit using arbitrary `Flux` code; see
[here](https://fluxml.ai/Flux.jl/stable/models/layers/) for `Flux`'s API reference.
# Keyword arguments
-- `architecture::String`: for unstructured data, one may use a densely-connected neural network (`"DNN"`); for data collected over a grid, a convolutional neural network (`"CNN"`); and for graphical or irregular spatial data, a graphical neural network (`"GNN"`).
-- `d::Integer = 1`: dimension of the response variable (e.g., `d = 1` for univariate processes).
+- `architecture::String`: for unstructured multivariate data, one may use a densely-connected neural network (`"DNN"`); for data collected over a grid, a convolutional neural network (`"CNN"`); and for graphical or irregular spatial data, a graphical neural network (`"GNN"`).
+- `d::Integer = 1`: for unstructured multivariate data (i.e., when `architecture = "DNN"`), the dimension of the data (e.g., `d = 3` for trivariate data); otherwise, if `architecture ∈ ["CNN", "GNN"]`, the argument `d` controls the number of input channels (e.g., `d = 1` for univariate spatial processes).
- `estimator_type::String = "point"`: the type of estimator; either `"point"` or `"interval"`.
- `depth = 3`: the number of hidden layers; either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) network of the DeepSets framework.
- `width = 32`: a single integer or an integer vector of length `sum(depth)` specifying the width (or number of convolutional filters/channels) in each hidden layer.
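A call might then look as follows (a sketch: the positional argument `p` and the defaults shown are assumptions based on the keyword list above, not a confirmed signature):

```julia
# Point estimator for p = 3 parameters from trivariate i.i.d. replicates (assumed API)
θ̂ = initialise_estimator(3; architecture = "DNN", d = 3, depth = 3, width = 32)
```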
2 changes: 1 addition & 1 deletion src/NeuralEstimators.jl
@@ -57,7 +57,7 @@ include("densities.jl")
export train, trainx, subsetdata
include("train.jl")

-export assess, Assessment, merge, risk
+export assess, Assessment, merge, risk, bias, rmse
include("assess.jl")

export plotrisk, plotdistribution
73 changes: 63 additions & 10 deletions src/assess.jl
@@ -33,32 +33,31 @@ function merge(assessment::Assessment, assessments::Assessment...)
end

@doc raw"""
-risk(assessment::Assessment; loss = (x, y) -> abs(x - y), average_over_parameters = true)
+risk(assessment::Assessment; ...)
-Computes a Monte Carlo approximation of the Bayes risk,
+Computes a Monte Carlo approximation of an estimator's Bayes risk,
```math
-r_{\Omega}(\hat{\boldsymbol{\theta}}(\cdot))
+r(\hat{\boldsymbol{\theta}}(\cdot))
\approx
-\frac{1}{K} \sum_{\boldsymbol{\theta} \in \vartheta} \frac{1}{J} \sum_{\boldsymbol{Z} \in \mathcal{Z}_{\boldsymbol{\theta}}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z})).
+\frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),
```
-where ``\vartheta`` denotes a set of ``K`` parameter vectors sampled from the
-prior ``\Omega(\cdot)`` and, for each ``\boldsymbol{\theta} \in \vartheta``, we
-have ``J`` sets of ``m`` mutually independent realisations from the model
-collected in ``\mathcal{Z}_{\boldsymbol{\theta}}``.
+where ``\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}`` denotes a set of ``K`` parameter vectors sampled from the
+prior and, for each ``k``, data ``\boldsymbol{Z}^{(k)}`` are simulated from the statistical model conditional on ``\boldsymbol{\theta}^{(k)}``.
# Keyword arguments
- `loss = (x, y) -> abs(x - y)`: a binary operator defining the loss function (default absolute-error loss).
- `average_over_parameters::Bool = true`: if true (default), the loss is averaged over all parameters; otherwise, the loss is averaged over each parameter separately.
- `average_over_sample_sizes::Bool = true`: if true (default), the loss is averaged over all sample sizes ``m``; otherwise, the loss is averaged over each sample size separately.
"""
-function risk(assessment::Assessment;
+risk(assessment::Assessment; args...) = risk(assessment.df; args...)
+
+function risk(df::DataFrame;
	loss = (x, y) -> abs(x - y),
	average_over_parameters::Bool = true,
	average_over_sample_sizes::Bool = true)

-	df = assessment.df
grouping_variables = [:estimator]
if !average_over_parameters push!(grouping_variables, :parameter) end
if !average_over_sample_sizes push!(grouping_variables, :m) end
@@ -69,6 +68,60 @@ function risk(assessment::Assessment;
return df
end
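Given an `Assessment` object produced by `assess` (as exercised in the tests below), the approximation can then be computed under different losses and groupings, e.g.:

```julia
r_abs = risk(assessment)                                   # absolute-error loss (default)
r_sq  = risk(assessment; loss = (x, y) -> (x - y)^2)       # squared-error loss
r_par = risk(assessment; average_over_parameters = false)  # one risk value per parameter
```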

@doc raw"""
bias(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's bias,
```math
{\rm{bias}}(\hat{\boldsymbol{\theta}}(\cdot))
\approx
\frac{1}{K} \sum_{k=1}^K \left\{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\right\},
```
where ``\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}`` denotes a set of ``K`` parameter vectors sampled from the
prior and, for each ``k``, data ``\boldsymbol{Z}^{(k)}`` are simulated from the statistical model conditional on ``\boldsymbol{\theta}^{(k)}``.
This function inherits the keyword arguments of [`risk`](@ref) (excluding the argument `loss`).
"""
bias(assessment::Assessment; args...) = bias(assessment.df; args...)

function bias(df::DataFrame; args...)

df = risk(df; loss = (x, y) -> x - y, args...)

rename!(df, :risk => :bias)

return df
end

@doc raw"""
rmse(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's root-mean-squared error,
```math
{\rm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot))
\approx
\sqrt{\frac{1}{K} \sum_{k=1}^K (\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)})^2},
```
where ``\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}`` denotes a set of ``K`` parameter vectors sampled from the
prior and, for each ``k``, data ``\boldsymbol{Z}^{(k)}`` are simulated from the statistical model conditional on ``\boldsymbol{\theta}^{(k)}``.
This function inherits the keyword arguments of [`risk`](@ref) (excluding the argument `loss`).
"""
rmse(assessment::Assessment; args...) = rmse(assessment.df; args...)

function rmse(df::DataFrame; args...)

df = risk(df; loss = (x, y) -> (x - y)^2, args...)

df[:, :risk] = sqrt.(df[:, :risk])
rename!(df, :risk => :rmse)

return df
end
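Both helpers simply reuse `risk` with a different loss, so they accept the same grouping keywords; given an existing `assessment`, for example:

```julia
b = bias(assessment)   # mean error; negative entries indicate underestimation
r = rmse(assessment)   # root-mean-squared error from the same Monte Carlo draws
```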

# ---- assess() ----

11 changes: 11 additions & 0 deletions test/runtests.jl
@@ -534,10 +534,21 @@ m = 10 # default sample size

@test typeof(merge(assessment, assessment)) == Assessment
risk(assessment)
risk(assessment, loss = (x, y) -> (x - y)^2)
risk(assessment; average_over_parameters = false)
risk(assessment; average_over_sample_sizes = false)
risk(assessment; average_over_parameters = false, average_over_sample_sizes = false)

bias(assessment)
bias(assessment; average_over_parameters = false)
bias(assessment; average_over_sample_sizes = false)
bias(assessment; average_over_parameters = false, average_over_sample_sizes = false)

rmse(assessment)
rmse(assessment; average_over_parameters = false)
rmse(assessment; average_over_sample_sizes = false)
rmse(assessment; average_over_parameters = false, average_over_sample_sizes = false)

# J == 5 > 1
Z_test = simulator(parameters, m, 5)
assessment = assess([θ̂], parameters, Z_test, use_gpu = use_gpu, verbose = verbose)
