diff --git a/docs/src/API/core.md b/docs/src/API/core.md
index 69c7519..d0a4413 100644
--- a/docs/src/API/core.md
+++ b/docs/src/API/core.md
@@ -22,7 +22,7 @@ The data should be stored as a `Vector{A}`, where each element of the vector is
 ## Types of estimators

 See also [Architectures and activations functions](@ref) that are often used
-when constructing neural estimators, and the convenience constructor [`initialise_estimator`](@ref).
+when constructing neural estimators, and the convenience constructor [`initialise_estimator`](@ref). 

 ```@docs
 NeuralEstimator
@@ -59,9 +59,15 @@ assess
 Assessment

 risk
+
+bias
+
+rmse
 ```

-## Bootstrapping
+## Uncertainty quantification
+
+Uncertainty quantification often proceeds through the bootstrap distribution, which is essentially available "for free" when bootstrap data sets can be quickly generated. Alternatively, one may approximate a set of low and high marginal posterior quantiles using a specially constructed neural Bayes estimator, which can then be used to construct credible intervals: see [`IntervalEstimator`](@ref) and its variants.

 ```@docs
 bootstrap
diff --git a/src/Architectures.jl b/src/Architectures.jl
index 1e17302..9076c18 100644
--- a/src/Architectures.jl
+++ b/src/Architectures.jl
@@ -55,7 +55,7 @@ element of the vector is associated with one parameter vector (i.e., one set of
 independent replicates), and where `A` depends on the form of the data and the
 chosen architecture for `ψ`. As a rule of thumb, when the data are stored as an
 array, the replicates are stored in the final dimension of the array. (This is
-usually the 'batch' dimension, but batching with `DeepSets` is done at the set
+usually the 'batch' dimension, but batching with `DeepSet`s is done at the set
 level, i.e., sets of replicates are batched together.) For example, with
 gridded spatial data and `ψ` a CNN, `A` should be a 4-dimensional array, with
 the replicates stored in the 4ᵗʰ dimension.
diff --git a/src/Estimators.jl b/src/Estimators.jl
index c63e11d..b576e68 100644
--- a/src/Estimators.jl
+++ b/src/Estimators.jl
@@ -108,7 +108,7 @@ where
 - ``g(⋅)`` is a logistic function that maps its input to the prior support.

 Note that, in addition to ensuring that the interval remains in the prior support,
-this constructions also ensures that the intervals are valid (i.e., it prevents
+this construction also ensures that the intervals are valid (i.e., it prevents
 quantile crossing, in the sense that the upper bound is always greater than the
 lower bound).

@@ -188,7 +188,7 @@ end
     PointIntervalEstimator(arch_point, arch_lower, arch_upper)
     PointIntervalEstimator(arch_point, arch_bound)
     PointIntervalEstimator(arch)
-A neural estimator that jointly produces point estimates, θ̂(Z), where θ̂(Z) is a
+A neural estimator that jointly produces point estimates, ``θ̂(Z)``, where ``θ̂(⋅)`` is a
 neural point estimator with architecture `arch_point`, and credible intervals
 constructed as,
 ```math
@@ -363,8 +363,8 @@ they see fit using arbitrary `Flux` code; see
 [here](https://fluxml.ai/Flux.jl/stable/models/layers/) for `Flux`'s API reference.

 # Keyword arguments
-- `architecture::String`: for unstructured data, one may use a densely-connected neural network (`"DNN"`); for data collected over a grid, a convolutional neural network (`"CNN"`); and for graphical or irregular spatial data, a graphical neural network (`"GNN"`).
-- `d::Integer = 1`: dimension of the response variable (e.g., `d = 1` for univariate processes).
+- `architecture::String`: for unstructured multivariate data, one may use a densely-connected neural network (`"DNN"`); for data collected over a grid, a convolutional neural network (`"CNN"`); and for graphical or irregular spatial data, a graphical neural network (`"GNN"`).
+- `d::Integer = 1`: for unstructured multivariate data (i.e., when `architecture = "DNN"`), the dimension of the data (e.g., `d = 3` for trivariate data); otherwise, if `architecture ∈ ["CNN", "GNN"]`, the argument `d` controls the number of input channels (e.g., `d = 1` for univariate spatial processes).
 - `estimator_type::String = "point"`: the type of estimator; either `"point"` or `"interval"`.
 - `depth = 3`: the number of hidden layers; either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) network of the DeepSets framework.
 - `width = 32`: a single integer or an integer vector of length `sum(depth)` specifying the width (or number of convolutional filters/channels) in each hidden layer.
diff --git a/src/NeuralEstimators.jl b/src/NeuralEstimators.jl
index f3f0400..c6cf1c6 100644
--- a/src/NeuralEstimators.jl
+++ b/src/NeuralEstimators.jl
@@ -57,7 +57,7 @@ include("densities.jl")
 export train, trainx, subsetdata
 include("train.jl")

-export assess, Assessment, merge, risk
+export assess, Assessment, merge, risk, bias, rmse
 include("assess.jl")

 export plotrisk, plotdistribution
diff --git a/src/assess.jl b/src/assess.jl
index 6728075..50f08a0 100644
--- a/src/assess.jl
+++ b/src/assess.jl
@@ -33,32 +33,31 @@ function merge(assessment::Assessment, assessments::Assessment...)
 end

 @doc raw"""
-    risk(assessment::Assessment; loss = (x, y) -> abs(x - y), average_over_parameters = true)
+    risk(assessment::Assessment; ...)

-Computes a Monte Carlo approximation of the Bayes risk,
+Computes a Monte Carlo approximation of an estimator's Bayes risk,

 ```math
-r_{\Omega}(\hat{\boldsymbol{\theta}}(\cdot))
+r(\hat{\boldsymbol{\theta}}(\cdot))
 \approx
-\frac{1}{K} \sum_{\boldsymbol{\theta} \in \vartheta} \frac{1}{J} \sum_{\boldsymbol{Z} \in \mathcal{Z}_{\boldsymbol{\theta}}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z})).
+\frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),
 ```

-where ``\vartheta`` denotes a set of ``K`` parameter vectors sampled from the
-prior ``\Omega(\cdot)`` and, for each ``\boldsymbol{\theta} \in \vartheta``, we
-have ``J`` sets of ``m`` mutually independent realisations from the model
-collected in ``\mathcal{Z}_{\boldsymbol{\theta}}``.
+where ``\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}`` denotes a set of ``K`` parameter vectors sampled from the
+prior and, for each ``k``, data ``\boldsymbol{Z}^{(k)}`` are simulated from the statistical model conditional on ``\boldsymbol{\theta}^{(k)}``.

 # Keyword arguments
 - `loss = (x, y) -> abs(x - y)`: a binary operator defining the loss function (default absolute-error loss).
 - `average_over_parameters::Bool = true`: if true (default), the loss is averaged over all parameters; otherwise, the loss is averaged over each parameter separately.
 - `average_over_sample_sizes::Bool = true`: if true (default), the loss is averaged over all sample sizes ``m``; otherwise, the loss is averaged over each sample size separately.
 """
-function risk(assessment::Assessment;
+risk(assessment::Assessment; args...) = risk(assessment.df; args...)
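+# The `DataFrame` method below performs the actual computation; `bias` and `rmse`,
+# defined later in this file, reuse it by supplying a different `loss` (with `rmse`
+# additionally taking a square root) and renaming the resulting column.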
+
+function risk(df::DataFrame;
     loss = (x, y) -> abs(x - y),
     average_over_parameters::Bool = true,
     average_over_sample_sizes::Bool = true)
-    df = assessment.df

     grouping_variables = [:estimator]
     if !average_over_parameters push!(grouping_variables, :parameter) end
     if !average_over_sample_sizes push!(grouping_variables, :m) end
@@ -69,6 +68,60 @@ function risk(assessment::Assessment;
     return df
 end

+@doc raw"""
+    bias(assessment::Assessment; ...)
+
+Computes a Monte Carlo approximation of an estimator's bias,
+
+```math
+{\rm{bias}}(\hat{\boldsymbol{\theta}}(\cdot))
+\approx
+\frac{1}{K} \sum_{k=1}^K \left(\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\right),
+```
+
+where ``\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}`` denotes a set of ``K`` parameter vectors sampled from the
+prior and, for each ``k``, data ``\boldsymbol{Z}^{(k)}`` are simulated from the statistical model conditional on ``\boldsymbol{\theta}^{(k)}``.
+
+This function inherits the keyword arguments of [`risk`](@ref) (excluding the argument `loss`).
+"""
+bias(assessment::Assessment; args...) = bias(assessment.df; args...)
+
+function bias(df::DataFrame; args...)
+
+    df = risk(df; loss = (x, y) -> x - y, args...)
+
+    rename!(df, :risk => :bias)
+
+    return df
+end
+
+@doc raw"""
+    rmse(assessment::Assessment; ...)
+
+Computes a Monte Carlo approximation of an estimator's root-mean-squared error,
+
+```math
+{\rm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot))
+\approx
+\sqrt{\frac{1}{K} \sum_{k=1}^K (\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)})^2},
+```
+
+where ``\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}`` denotes a set of ``K`` parameter vectors sampled from the
+prior and, for each ``k``, data ``\boldsymbol{Z}^{(k)}`` are simulated from the statistical model conditional on ``\boldsymbol{\theta}^{(k)}``.
+
+This function inherits the keyword arguments of [`risk`](@ref) (excluding the argument `loss`).
+"""
+rmse(assessment::Assessment; args...) = rmse(assessment.df; args...)
+
+function rmse(df::DataFrame; args...)
+
+    df = risk(df; loss = (x, y) -> (x - y)^2, args...)
+
+    df[:, :risk] = sqrt.(df[:, :risk])
+    rename!(df, :risk => :rmse)
+
+    return df
+end


 # ---- assess() ----
diff --git a/test/runtests.jl b/test/runtests.jl
index 4824c7a..51a6993 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -534,10 +534,21 @@ m = 10 # default sample size
     @test typeof(merge(assessment, assessment)) == Assessment

     risk(assessment)
+    risk(assessment, loss = (x, y) -> (x - y)^2)
     risk(assessment; average_over_parameters = false)
     risk(assessment; average_over_sample_sizes = false)
     risk(assessment; average_over_parameters = false, average_over_sample_sizes = false)

+    bias(assessment)
+    bias(assessment; average_over_parameters = false)
+    bias(assessment; average_over_sample_sizes = false)
+    bias(assessment; average_over_parameters = false, average_over_sample_sizes = false)
+
+    rmse(assessment)
+    rmse(assessment; average_over_parameters = false)
+    rmse(assessment; average_over_sample_sizes = false)
+    rmse(assessment; average_over_parameters = false, average_over_sample_sizes = false)
+
     # J == 5 > 1
     Z_test = simulator(parameters, m, 5)
     assessment = assess([θ̂], parameters, Z_test, use_gpu = use_gpu, verbose = verbose)
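For orientation, a minimal sketch of how the new diagnostics are intended to be called once an `Assessment` object is in hand; it mirrors the calls exercised in `test/runtests.jl` above (`θ̂`, `parameters`, `simulator`, and `m` are placeholders from the test suite, not part of the exported API):

```julia
# Assess a trained point estimator θ̂ on test parameters and simulated data,
# then summarise the resulting Assessment with the Monte Carlo diagnostics.
assessment = assess([θ̂], parameters, simulator(parameters, m))

risk(assessment)                              # Bayes risk under the default absolute-error loss
risk(assessment, loss = (x, y) -> (x - y)^2)  # Bayes risk under squared-error loss
bias(assessment)                              # Monte Carlo approximation of the bias
rmse(assessment)                              # Monte Carlo approximation of the root-mean-squared error

# Keyword arguments are inherited from risk(); for example, report the bias for
# each parameter separately rather than averaging over all parameters.
bias(assessment; average_over_parameters = false)
```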