From 56b4b50efcb3cdf49d326ce826be844030a0ccdc Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Wed, 23 Aug 2023 02:23:01 +0000 Subject: [PATCH] build based on 8015d6e --- dev/API/architectures/index.html | 20 ++--- dev/API/core/index.html | 18 ++--- dev/API/index.html | 2 +- dev/API/loss/index.html | 6 +- dev/API/simulation/index.html | 10 +-- dev/API/utility/index.html | 14 ++-- dev/framework/index.html | 2 +- dev/index.html | 2 +- dev/search/index.html | 2 +- dev/search_index.js | 2 +- dev/workflow/advancedusage/index.html | 2 +- dev/workflow/examples/index.html | 103 +++++++++++++++++++++++++- dev/workflow/overview/index.html | 2 +- 13 files changed, 142 insertions(+), 43 deletions(-) diff --git a/dev/API/architectures/index.html b/dev/API/architectures/index.html index 9db51b5..4c05c33 100644 --- a/dev/API/architectures/index.html +++ b/dev/API/architectures/index.html @@ -28,7 +28,7 @@ x₂ = [rand(qₓ) for _ ∈ eachindex(Z₂)] θ̂((Z₁, x₁)) θ̂((Z₂, x₂)) -θ̂((Z₃, x₂))source
NeuralEstimators.DeepSetExpertType
DeepSetExpert(ψ, ϕ, S, a)
+θ̂((Z₃, x₂))
source
NeuralEstimators.DeepSetExpertType
DeepSetExpert(ψ, ϕ, S, a)
 DeepSetExpert(ψ, ϕ, S; a::String = "mean")
 DeepSetExpert(deepset::DeepSet, ϕ, S)

Identical to DeepSet, but with additional expert summary statistics,

\[θ̂(𝐙) = ϕ((𝐓(𝐙)', 𝐒(𝐙)')'),    𝐓(𝐙) = 𝐚(\{ψ(𝐙ᵢ) : i = 1, …, m\}),\]

where S is a function that returns a vector of expert summary statistics.

The constructor DeepSetExpert(deepset::DeepSet, ϕ, S) inherits ψ and a from deepset.

Similarly to DeepSet, set-level information can be incorporated by passing a Tuple, in which case we have

\[θ̂(𝐙) = ϕ((𝐓(𝐙)', 𝐒(𝐙)', 𝐱')'),    𝐓(𝐙) = 𝐚(\{ψ(𝐙ᵢ) : i = 1, …, m\}).\]

Examples

using NeuralEstimators
 using Flux
@@ -58,7 +58,7 @@
 x₁ = rand(qₓ)
 x₂ = [rand(qₓ) for _ ∈ eachindex(Z₂)]
 θ̂((Z₁, x₁))
-θ̂((Z₂, x₂))
source
NeuralEstimators.GNNType
GNN(propagation, readout, ϕ, a)
+θ̂((Z₂, x₂))
source
NeuralEstimators.GNNType
GNN(propagation, readout, ϕ, a)
 GNN(propagation, readout, ϕ, a::String = "mean")

A graph neural network (GNN) designed for parameter point estimation.

The propagation module transforms graphical input data into a set of hidden-feature graphs; the readout module aggregates these feature graphs into a single hidden feature vector of fixed length; the function a(⋅) is a permutation-invariant aggregation function, and ϕ is a neural network.

The data should be stored as a GNNGraph or AbstractVector{GNNGraph}, where each graph is associated with a single parameter vector. The graphs may contain sub-graphs corresponding to independent replicates from the model.

Examples

using NeuralEstimators
 using Flux
 using Flux: batch
@@ -95,7 +95,7 @@
 g₃ = batch([g₁, g₂])
 θ̂(g₁)
 θ̂(g₃)
-θ̂([g₁, g₂, g₃])
source

Layers

NeuralEstimators.WeightedGraphConvType
WeightedGraphConv(in => out, σ=identity; aggr=+, bias=true, init=glorot_uniform)

Same as regular GraphConv layer, but where the neighbours of a node are weighted by their spatial distance to that node.

Arguments

  • in: The dimension of input features.
  • out: The dimension of output features.
  • σ: Activation function.
  • aggr: Aggregation operator for the incoming messages (e.g. +, *, max, min, and mean).
  • bias: Add learnable bias.
  • init: Weights' initializer.

Examples

using NeuralEstimators
+θ̂([g₁, g₂, g₃])
source

Layers

NeuralEstimators.WeightedGraphConvType
WeightedGraphConv(in => out, σ=identity; aggr=mean, bias=true, init=glorot_uniform)

Same as regular GraphConv layer, but where the neighbours of a node are weighted by their spatial distance to that node.

Arguments

  • in: The dimension of input features.
  • out: The dimension of output features.
  • σ: Activation function.
  • aggr: Aggregation operator for the incoming messages (e.g. +, *, max, min, and mean).
  • bias: Add learnable bias.
  • init: Weights' initializer.

Examples

using NeuralEstimators
 using GraphNeuralNetworks
 
 # Construct a spatially-weighted adjacency matrix based on k-nearest neighbours
@@ -108,7 +108,7 @@
 
 # Construct the layer and apply it to the data to generate convolved features
 layer = WeightedGraphConv(d => 16)
-layer(Z)
source
NeuralEstimators.UniversalPoolType
UniversalPool(ψ, ϕ)

Pooling layer (i.e., readout layer) from the paper 'Universal Readout for Graph Convolutional Neural Networks'. It takes the form,

\[\mathbf{V} = ϕ(|G|⁻¹ \sum_{s\in G} ψ(\mathbf{h}_s)),\]

where $\mathbf{V}$ denotes the summary vector for graph $G$, $\mathbf{h}_s$ denotes the vector of hidden features for node $s \in G$, and ψ and ϕ are dense neural networks.

See also the pooling layers available from GraphNeuralNetworks.jl.

Examples

using NeuralEstimators
+layer(Z)
source
NeuralEstimators.UniversalPoolType
UniversalPool(ψ, ϕ)

Pooling layer (i.e., readout layer) from the paper 'Universal Readout for Graph Convolutional Neural Networks'. It takes the form,

\[\mathbf{V} = ϕ(|G|⁻¹ \sum_{s\in G} ψ(\mathbf{h}_s)),\]

where $\mathbf{V}$ denotes the summary vector for graph $G$, $\mathbf{h}_s$ denotes the vector of hidden features for node $s \in G$, and ψ and ϕ are dense neural networks.

See also the pooling layers available from GraphNeuralNetworks.jl.

Examples

using NeuralEstimators
 using Flux
 using GraphNeuralNetworks
 using Graphs: random_regular_graph
@@ -127,7 +127,7 @@
 pool = UniversalPool(ψ, ϕ)
 
 # Apply the pooling layer
-pool(G)
source

Output activation functions

These layers can be used at the end of an architecture to ensure that the neural estimator provides valid parameters.

NeuralEstimators.CompressType
Compress(a, b, k = 1)

Layer that compresses its input to be within the range a and b, where each element of a is less than the corresponding element of b.

The layer uses a logistic function,

\[l(θ) = a + \frac{b - a}{1 + e^{-kθ}},\]

where the arguments a and b shift and scale the logistic function to the desired range, and the growth rate k controls the steepness of the curve.

The logistic function given here contains an additional parameter, θ₀, which is the input value corresponding to the function's midpoint. In Compress, we fix θ₀ = 0, since the output of a randomly initialised neural network is typically around zero.

Examples

using NeuralEstimators
+pool(G)
source

Output activation functions

These layers can be used at the end of an architecture to ensure that the neural estimator provides valid parameters.

NeuralEstimators.CompressType
Compress(a, b, k = 1)

Layer that compresses its input to be within the range a and b, where each element of a is less than the corresponding element of b.

The layer uses a logistic function,

\[l(θ) = a + \frac{b - a}{1 + e^{-kθ}},\]

where the arguments a and b shift and scale the logistic function to the desired range, and the growth rate k controls the steepness of the curve.

The logistic function given here contains an additional parameter, θ₀, which is the input value corresponding to the function's midpoint. In Compress, we fix θ₀ = 0, since the output of a randomly initialised neural network is typically around zero.

Examples

using NeuralEstimators
 using Flux
 
 a = [25, 0.5, -pi/2]
@@ -141,7 +141,7 @@
 n = 20
 θ̂ = Chain(Dense(n, p), l)
 Z = randn(n, K)
-θ̂(Z)
source
NeuralEstimators.CholeskyCovarianceType
CholeskyCovariance(d)

Layer for constructing the parameters of the lower Cholesky factor associated with an unconstrained d×d covariance matrix.

The layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension, but with d rows constrained to be positive (corresponding to the diagonal elements of the Cholesky factor) and the remaining rows unconstrained.

The ordering of the transformed Matrix aligns with Julia's column-major ordering. For example, when modelling the Cholesky factor,

\[\begin{bmatrix} +θ̂(Z)

source
NeuralEstimators.CholeskyCovarianceType
CholeskyCovariance(d)

Layer for constructing the parameters of the lower Cholesky factor associated with an unconstrained d×d covariance matrix.

The layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension, but with d rows constrained to be positive (corresponding to the diagonal elements of the Cholesky factor) and the remaining rows unconstrained.

The ordering of the transformed Matrix aligns with Julia's column-major ordering. For example, when modelling the Cholesky factor,

\[\begin{bmatrix} L₁₁ & & \\ L₂₁ & L₂₂ & \\ L₃₁ & L₃₂ & L₃₃ \\ @@ -152,7 +152,7 @@ θ = randn(p, 50) l = CholeskyCovariance(d) θ = l(θ) # returns matrix (used for Flux networks) -L = [vectotril(y) for y ∈ eachcol(θ)] # convert matrix to Cholesky factors

source
NeuralEstimators.CovarianceMatrixType
CovarianceMatrix(d)

Layer for constructing the parameters of an unconstrained d×d covariance matrix.

The layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension.

Internally, it uses a CholeskyCovariance layer to construct a valid Cholesky factor 𝐋, and then extracts the lower triangle from the positive-definite covariance matrix 𝚺 = 𝐋𝐋'. The lower triangle is extracted and vectorised in line with Julia's column-major ordering. For example, when modelling the covariance matrix,

\[\begin{bmatrix} +L = [vectotril(y) for y ∈ eachcol(θ)] # convert matrix to Cholesky factors

source
NeuralEstimators.CovarianceMatrixType
CovarianceMatrix(d)

Layer for constructing the parameters of an unconstrained d×d covariance matrix.

The layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension.

Internally, it uses a CholeskyCovariance layer to construct a valid Cholesky factor 𝐋, and then extracts the lower triangle from the positive-definite covariance matrix 𝚺 = 𝐋𝐋'. The lower triangle is extracted and vectorised in line with Julia's column-major ordering. For example, when modelling the covariance matrix,

\[\begin{bmatrix} Σ₁₁ & Σ₁₂ & Σ₁₃ \\ Σ₂₁ & Σ₂₂ & Σ₂₃ \\ Σ₃₁ & Σ₃₂ & Σ₃₃ \\ @@ -165,7 +165,7 @@ l = CovarianceMatrix(d) θ = l(θ) -Σ = [Symmetric(cpu(vectotril(y)), :L) for y ∈ eachcol(θ)]

source
NeuralEstimators.CorrelationMatrixType
CorrelationMatrix(d)

Layer for constructing the parameters of an unconstrained d×d correlation matrix.

The layer transforms a Matrix with d(d-1)÷2 rows into a Matrix with the same dimension.

Internally, the layer uses the algorithm described here and here to construct a valid Cholesky factor 𝐋, and then extracts the strict lower triangle from the positive-definite correlation matrix 𝐑 = 𝐋𝐋'. The strict lower triangle is extracted and vectorised in line with Julia's column-major ordering. For example, when modelling the correlation matrix,

\[\begin{bmatrix} +Σ = [Symmetric(cpu(vectotril(y)), :L) for y ∈ eachcol(θ)]

source
NeuralEstimators.CorrelationMatrixType
CorrelationMatrix(d)

Layer for constructing the parameters of an unconstrained d×d correlation matrix.

The layer transforms a Matrix with d(d-1)÷2 rows into a Matrix with the same dimension.

Internally, the layer uses the algorithm described here and here to construct a valid Cholesky factor 𝐋, and then extracts the strict lower triangle from the positive-definite correlation matrix 𝐑 = 𝐋𝐋'. The strict lower triangle is extracted and vectorised in line with Julia's column-major ordering. For example, when modelling the correlation matrix,

\[\begin{bmatrix} 1 & R₁₂ & R₁₃ \\ R₂₁ & 1 & R₂₃\\ R₃₁ & R₃₂ & 1\\ @@ -185,7 +185,7 @@ R = Symmetric(cpu(vectotril(y, strict = true)), :L) R[diagind(R)] .= 1 R -end

source
NeuralEstimators.SplitApplyType
SplitApply(layers, indices)

Splits an array into multiple sub-arrays by subsetting the rows using the collection of indices, and then applies each layer in layers to the corresponding sub-array.

Specifically, for each i = 1, …, $n$, with $n$ the number of layers, SplitApply(x) performs layers[i](x[indices[i], :]), and then vertically concatenates the resulting transformed arrays.

Examples

using NeuralEstimators
+end
source
NeuralEstimators.SplitApplyType
SplitApply(layers, indices)

Splits an array into multiple sub-arrays by subsetting the rows using the collection of indices, and then applies each layer in layers to the corresponding sub-array.

Specifically, for each i = 1, …, $n$, with $n$ the number of layers, SplitApply(x) performs layers[i](x[indices[i], :]), and then vertically concatenates the resulting transformed arrays.

Examples

using NeuralEstimators
 
 d = 4
 K = 50
@@ -200,4 +200,4 @@
 l = SplitApply([l₁, l₂], [1:p₁, p₁+1:p])
 
 θ = randn(p, K)
-l(θ)
source
+l(θ)source diff --git a/dev/API/core/index.html b/dev/API/core/index.html index c2faec8..c453381 100644 --- a/dev/API/core/index.html +++ b/dev/API/core/index.html @@ -2,7 +2,7 @@ Core · NeuralEstimators.jl

Core

This page documents the functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these functions appear in a standard implementation; that is, from sampling parameters from the prior distribution, to uncertainty quantification of the final estimates via bootstrapping.

Sampling parameters

Parameters sampled from the prior distribution $\Omega(\cdot)$ are stored as a $p \times K$ matrix, where $p$ is the number of parameters in the model and $K$ is the number of parameter vectors sampled from the prior distribution.

It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.

NeuralEstimators.ParameterConfigurationsType
ParameterConfigurations

An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.

The user-defined type must have a field θ that stores the $p$ × $K$ matrix of parameters, where $p$ is the number of parameters in the model and $K$ is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.

See subsetparameters for the generic function for subsetting these objects.

Examples

struct P <: ParameterConfigurations
 	θ
 	# other expensive intermediate objects...
-end
source
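To make the idea of storing expensive intermediate objects concrete, here is a minimal hedged sketch: the type name GPParameters, the prior ranges, and the distance matrix D are hypothetical (not part of the package documentation), and it reuses maternchols, which is documented on the simulation page.

using NeuralEstimators

# Hypothetical parameter type: stores the p × K matrix θ together with the
# Cholesky factors needed to simulate Gaussian-process data.
struct GPParameters <: ParameterConfigurations
	θ
	chols
end

# Sample K parameter vectors from a (hypothetical) uniform prior and
# pre-compute the Cholesky factors once, given a fixed distance matrix D.
function GPParameters(K::Integer, D)
	ρ = rand(K) .* 0.4 .+ 0.1    # range parameter
	ν = rand(K) .* 1.5 .+ 0.5    # smoothness parameter
	θ = permutedims(hcat(ρ, ν))  # 2 × K matrix of parameters
	GPParameters(θ, maternchols(D, ρ, ν))
end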

Simulating data

NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define the model via simulated data. The user may provide simulated data directly, or provide a function that simulates data from the model.

The data should be stored as a Vector{A}, where each element of the vector is associated with one parameter configuration, and where A depends on the architecture of the neural estimator.

Types of estimators

See also Architectures and activation functions that are often used when constructing neural estimators.

NeuralEstimators.PointEstimatorType
PointEstimator(arch)

A simple point estimator, that is, a mapping from the sample space to the parameter space, defined by the given architecture arch.

source

Simulating data

NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define the model via simulated data. The user may provide simulated data directly, or provide a function that simulates data from the model.

The data should be stored as a Vector{A}, where each element of the vector is associated with one parameter configuration, and where A depends on the architecture of the neural estimator.
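As a hedged illustration of this storage format (the values are arbitrary, and the layout assumes a DeepSet-style estimator with a dense inner network, as in the examples below), the data for K parameter configurations with m independent replicates each might look as follows.

n = 10   # dimension of each replicate
m = 25   # number of independent replicates per data set
K = 100  # number of parameter configurations
Z = [rand(n, 1, m) for _ ∈ 1:K]   # Vector of length K; one array per parameter vector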

Types of estimators

See also Architectures and activation functions that are often used when constructing neural estimators.

NeuralEstimators.PointEstimatorType
PointEstimator(arch)

A simple point estimator, that is, a mapping from the sample space to the parameter space, defined by the given architecture arch.

source
NeuralEstimators.IntervalEstimatorType
IntervalEstimator(arch_lower, arch_upper)
 IntervalEstimator(arch)

A neural estimator that produces credible intervals constructed as,

\[[l(Z), l(Z) + \mathrm{exp}(u(Z))],\]

where $l(⋅)$ and $u(⋅)$ are the neural networks arch_lower and arch_upper, both of which should transform data into $p$-dimensional vectors, where $p$ is the number of parameters in the model. If only a single neural network architecture arch is provided, it will be used for both arch_lower and arch_upper.

Internally, the output from arch_lower and arch_upper are concatenated, so that IntervalEstimator objects transform data into $2p$-dimensional vectors.

Examples

using NeuralEstimators
 using Flux
 
@@ -23,7 +23,7 @@
 
 # Apply the interval estimator
 estimator(Z)
-interval(estimator, Z, parameter_names = ["ρ", "σ", "τ"])
source
NeuralEstimators.QuantileEstimatorType
QuantileEstimator()

Coming soon: this structure will allow for the simultaneous estimation of an arbitrary number of marginal quantiles of the posterior distribution.

source
NeuralEstimators.PiecewiseEstimatorType
PiecewiseEstimator(estimators, breaks)

Creates a piecewise estimator from a collection of estimators, based on the collection of changepoints, breaks, which should contain one element fewer than the number of estimators.

Any estimator can be included in estimators, including any of the subtypes of NeuralEstimator exported with the package NeuralEstimators (e.g., PointEstimator, IntervalEstimator, etc.).

Examples

# Suppose that we've trained two neural estimators. The first, θ̂₁, is trained
+interval(estimator, Z, parameter_names = ["ρ", "σ", "τ"])
source
NeuralEstimators.QuantileEstimatorType
QuantileEstimator()

Coming soon: this structure will allow for the simultaneous estimation of an arbitrary number of marginal quantiles of the posterior distribution.

source
NeuralEstimators.PiecewiseEstimatorType
PiecewiseEstimator(estimators, breaks)

Creates a piecewise estimator from a collection of estimators, based on the collection of changepoints, breaks, which should contain one element fewer than the number of estimators.

Any estimator can be included in estimators, including any of the subtypes of NeuralEstimator exported with the package NeuralEstimators (e.g., PointEstimator, IntervalEstimator, etc.).

Examples

# Suppose that we've trained two neural estimators. The first, θ̂₁, is trained
 # for small sample sizes (e.g., m ≤ 30), and the second, `θ̂₂`, is trained for
 # moderate-to-large sample sizes (e.g., m > 30). We construct a piecewise
 # estimator with a sample-size changepoint of 30, which dispatches θ̂₁ if m ≤ 30
@@ -46,7 +46,7 @@
 
 θ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], [30])
 Z = [rand(n, 1, m) for m ∈ (10, 50)]
-θ̂(Z)
source

Training

The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.

Training

The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.

NeuralEstimators.trainFunction
train(θ̂, sampler, simulator; )
 train(θ̂, θ_train::P, θ_val::P, simulator; ) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
 train(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}

Train a neural estimator with architecture θ̂.

The methods cater for different forms of on-the-fly simulation. The method that takes functions sampler and simulator for sampling parameters and simulating data, respectively, allows for both the parameters and the data to be simulated on-the-fly. Note that simulator is called as simulator(θ, m), where θ is a set of parameters and m is the sample size (see keyword arguments below). If provided with specific instances of parameters (θ_train and θ_val) or data (Z_train and Z_val), they will be held fixed during training.

In all methods, the validation set is held fixed to reduce noise when evaluating the validation risk function, which is used to monitor the performance of the estimator during training.

If the number of replicates in Z_train is a multiple of the number of replicates for each element of Z_val, the training data will be recycled throughout training. For example, if each element of Z_train consists of 50 replicates, and each element of Z_val consists of 10 replicates, the first epoch uses the first 10 replicates in Z_train, the second epoch uses the next 10 replicates, and so on, until the sixth epoch again uses the first 10 replicates. Note that this requires the data to be subsettable with the function subsetdata.

Keyword arguments

Arguments common to all methods:

  • loss = mae: the loss function, which should return the average loss when applied to multiple replicates.
  • epochs::Integer = 100
  • batchsize::Integer = 32
  • optimiser = ADAM(1e-4)
  • savepath::String = "": path to save the trained estimator and other information; if savepath is an empty string (default), nothing is saved.
  • stopping_epochs::Integer = 5: cease training if the risk doesn't improve in this number of epochs.
  • use_gpu::Bool = true
  • verbose::Bool = true

Arguments common to train(θ̂, P, simulator) and train(θ̂, θ_train, θ_val, simulator):

  • m: sample sizes (either an Integer or a collection of Integers).
  • epochs_per_Z_refresh::Integer = 1: how often to refresh the training data.
  • simulate_just_in_time::Bool = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.

Arguments unique to train(θ̂, P, simulator):

  • K::Integer = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.
  • ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices); if ξ is provided, the parameter sampler is called as sampler(K, ξ).
  • epochs_per_θ_refresh::Integer = 1: how often to refresh the training parameters. Must be a multiple of epochs_per_Z_refresh.

Examples

using NeuralEstimators
 using Flux
@@ -95,10 +95,10 @@
 # training: fixed parameters and fixed data
 Z_train = simulate(θ_train, m)
 Z_val   = simulate(θ_val, m)
-θ̂ = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)
source
NeuralEstimators.trainxFunction
trainx(θ̂, P, simulator, M; )
 trainx(θ̂, θ_train, θ_val, simulator, M; )
 trainx(θ̂, θ_train, θ_val, Z_train::T, Z_val::T, M; )
-trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ) where {V <: AbstractVector{AbstractVector{Any}}}

A wrapper around train to construct neural estimators for different sample sizes.

The collection M specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. For example, if M = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.

The method for Z_train::T and Z_val::T subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ M. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train and Z_val; hence, it does not need to invoke subsetdata, which can be slow or difficult to define in some cases (e.g., for graphical data).

The keyword arguments inherit from train, and certain keyword arguments can be given as vectors. For example, if we are training two estimators, we can use a different number of epochs by providing epochs = [e₁, e₂]. Other arguments that allow vectors are batchsize, stopping_epochs, and optimiser.

source

Assessing a neural estimator

NeuralEstimators.assessFunction
assess(estimators, θ, Z; <keyword args>)

Using a collection of estimators, compute estimates from data Z simulated based on true parameter vectors stored in θ.

If Z contains more data sets than parameter vectors, the parameter matrix will be recycled by horizontal concatenation.

The output is of type Assessment; see ?Assessment for details.

Keyword arguments

  • estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
  • parameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.
  • ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices).
  • use_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true.
  • use_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.
  • verbose::Bool = true

Examples

using NeuralEstimators
+trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ) where {V <: AbstractVector{AbstractVector{Any}}}

A wrapper around train to construct neural estimators for different sample sizes.

The collection M specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. For example, if M = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.

The method for Z_train::T and Z_val::T subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ M. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train and Z_val; hence, it does not need to invoke subsetdata, which can be slow or difficult to define in some cases (e.g., for graphical data).

The keyword arguments inherit from train, and certain keyword arguments can be given as vectors. For example, if we are training two estimators, we can use a different number of epochs by providing epochs = [e₁, e₂]. Other arguments that allow vectors are batchsize, stopping_epochs, and optimiser.

source
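The trainx docstring above gives no worked example, so here is a minimal hedged sketch: it assumes an architecture θ̂ and functions sampler and simulator defined as in the train example above (all hypothetical names), with the sampler passed as the second positional argument, mirroring train.

M = [10, 30, 50]   # sample sizes of interest
estimators = trainx(θ̂, sampler, simulator, M; epochs = [100, 50, 50])

# One estimator per sample size; the estimator for M[2] is pre-trained with
# the estimator for M[1], and so on.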

Assessing a neural estimator

NeuralEstimators.assessFunction
assess(estimators, θ, Z; <keyword args>)

Using a collection of estimators, compute estimates from data Z simulated based on true parameter vectors stored in θ.

If Z contains more data sets than parameter vectors, the parameter matrix will be recycled by horizontal concatenation.

The output is of type Assessment; see ?Assessment for details.

Keyword arguments

  • estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
  • parameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.
  • ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices).
  • use_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true.
  • use_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.
  • verbose::Bool = true

Examples

using NeuralEstimators
 using Flux
 
 n = 10 # number of observations in each realisation
@@ -132,14 +132,14 @@
 θ̂ = DeepSet(ψ, ϕ)
 x = [rand(qₓ) for _ ∈ eachindex(Z)]
 assessment = assess([θ̂], θ, (Z, x));
-risk(assessment)
source
NeuralEstimators.AssessmentType
Assessment(df::DataFrame, runtime::DataFrame)

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field df is a long-form DataFrame with columns:

  • estimator: the name of the estimator
  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • m: the sample size (number of iid replicates)
  • k: the index of the parameter vector in the test set
  • j: the index of the data set

Multiple Assessment objects can be combined with merge().

source
NeuralEstimators.riskFunction
risk(assessment::Assessment; loss = (x, y) -> abs(x - y), average_over_parameters = true)

Computes a Monte Carlo approximation of the Bayes risk,

\[r_{\Omega}(\hat{\boldsymbol{\theta}}(\cdot)) +risk(assessment)

source
NeuralEstimators.AssessmentType
Assessment(df::DataFrame, runtime::DataFrame)

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field df is a long-form DataFrame with columns:

  • estimator: the name of the estimator
  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • m: the sample size (number of iid replicates)
  • k: the index of the parameter vector in the test set
  • j: the index of the data set

Multiple Assessment objects can be combined with merge().

source
NeuralEstimators.riskFunction
risk(assessment::Assessment; loss = (x, y) -> abs(x - y), average_over_parameters = true)

Computes a Monte Carlo approximation of the Bayes risk,

\[r_{\Omega}(\hat{\boldsymbol{\theta}}(\cdot)) \approx -\frac{1}{K} \sum_{\boldsymbol{\theta} \in \vartheta} \frac{1}{J} \sum_{\boldsymbol{Z} \in \mathcal{Z}_{\boldsymbol{\theta}}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z})).\]

where $\vartheta$ denotes a set of $K$ parameter vectors sampled from the prior $\Omega(\cdot)$ and, for each $\boldsymbol{\theta} \in \vartheta$, we have $J$ sets of $m$ mutually independent realisations from the model collected in $\mathcal{Z}_{\boldsymbol{\theta}}$.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator (default absolute-error loss).
  • average_over_parameters::Bool = true: if true (default), the loss is averaged over all parameters; otherwise, the loss is averaged over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
source

Bootstrapping

NeuralEstimators.bootstrapFunction
bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}
+\frac{1}{K} \sum_{\boldsymbol{\theta} \in \vartheta} \frac{1}{J} \sum_{\boldsymbol{Z} \in \mathcal{Z}_{\boldsymbol{\theta}}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z})).\]

where $\vartheta$ denotes a set of $K$ parameter vectors sampled from the prior $\Omega(\cdot)$ and, for each $\boldsymbol{\theta} \in \vartheta$, we have $J$ sets of $m$ mutually independent realisations from the model collected in $\mathcal{Z}_{\boldsymbol{\theta}}$.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator (default absolute-error loss).
  • average_over_parameters::Bool = true: if true (default), the loss is averaged over all parameters; otherwise, the loss is averaged over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
source

Bootstrapping

NeuralEstimators.bootstrapFunction
bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}
 bootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}
-bootstrap(θ̂, Z; B = 400, blocks = nothing)

Generates B bootstrap estimates from an estimator θ̂.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.

The keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).

The return type is a p × B matrix, where p is the number of parameters in the model.

source
NeuralEstimators.intervalFunction
interval(θ̃, θ̂ = nothing; type::String, probs = [0.05, 0.95], parameter_names)

Compute a confidence interval using the p × B matrix of bootstrap samples, θ̃, where p is the number of parameters in the model.

If type = "quantile", the interval is constructed by simply taking the quantiles of θ̃, and if type = "reverse-quantile", the so-called reverse-quantile method is used. In both cases, the quantile levels are controlled by the argument probs.

The rows can be named with a vector of strings parameter_names.

The return type is a p × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval.

Examples

using NeuralEstimators
+bootstrap(θ̂, Z; B = 400, blocks = nothing)

Generates B bootstrap estimates from an estimator θ̂.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.

The keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).

The return type is a p × B matrix, where p is the number of parameters in the model.

source
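A minimal hedged sketch of the non-parametric method (the estimator θ̂ and the single data set Z are assumed to exist; the block structure is hypothetical): the five replicates below are grouped into two blocks, and the resulting p × B matrix can be passed directly to interval, documented next.

blocks = [1, 1, 2, 2, 2]                       # block membership of each replicate
θ̃ = bootstrap(θ̂, Z; B = 400, blocks = blocks)  # p × 400 matrix of bootstrap estimates
interval(θ̃)                                    # p × 2 matrix of lower and upper bounds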
NeuralEstimators.intervalFunction
interval(θ̃, θ̂ = nothing; type::String, probs = [0.05, 0.95], parameter_names)

Compute a confidence interval using the p × B matrix of bootstrap samples, θ̃, where p is the number of parameters in the model.

If type = "quantile", the interval is constructed by simply taking the quantiles of θ̃, and if type = "reverse-quantile", the so-called reverse-quantile method is used. In both cases, the quantile levels are controlled by the argument probs.

The rows can be named with a vector of strings parameter_names.

The return type is a p × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval.

Examples

using NeuralEstimators
 p = 3
 B = 50
 θ̃ = rand(p, B)
 θ̂ = rand(p)
 interval(θ̃)
-interval(θ̃, θ̂, type = "basic")
source
+interval(θ̃, θ̂, type = "basic")source diff --git a/dev/API/index.html b/dev/API/index.html index 0f774e5..aa8f400 100644 --- a/dev/API/index.html +++ b/dev/API/index.html @@ -1,2 +1,2 @@ -Index · NeuralEstimators.jl
+Index · NeuralEstimators.jl
diff --git a/dev/API/loss/index.html b/dev/API/loss/index.html index 9192034..c31d49c 100644 --- a/dev/API/loss/index.html +++ b/dev/API/loss/index.html @@ -1,5 +1,5 @@ -Loss functions · NeuralEstimators.jl

Loss functions

In addition to the standard loss functions provided by Flux (e.g., mae, mse, etc.), NeuralEstimators provides the following loss functions.

NeuralEstimators.kpowerlossFunction
kpowerloss(θ̂, y, k; agg = mean, safeorigin = true, ϵ = 0.1)

For k ∈ (0, ∞), the k-th power absolute-distance loss,

\[L(θ̂, θ) = |θ̂ - θ|ᵏ,\]

contains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0).

It is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1. It is quasiconvex for all k > 0.

If safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.

source
NeuralEstimators.quantilelossFunction
quantileloss(θ̂, θ, q; agg = mean)
+Loss functions · NeuralEstimators.jl

Loss functions

In addition to the standard loss functions provided by Flux (e.g., mae, mse, etc.), NeuralEstimators provides the following loss functions.

NeuralEstimators.kpowerlossFunction
kpowerloss(θ̂, y, k; agg = mean, safeorigin = true, ϵ = 0.1)

For k ∈ (0, ∞), the k-th power absolute-distance loss,

\[L(θ̂, θ) = |θ̂ - θ|ᵏ,\]

contains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0).

It is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1. It is quasiconvex for all k > 0.

If safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.

source
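A small hedged example (hypothetical values): the loss can be evaluated directly, or passed to train via a two-argument closure, since train expects a binary loss function.

using NeuralEstimators

p, K = 3, 10
θ = rand(p, K); θ̂ = rand(p, K)
kpowerloss(θ̂, θ, 0.5)

# As a training loss (see ?train), fixing k = 0.1:
# train(estimator, sampler, simulator, loss = (θ̂, θ) -> kpowerloss(θ̂, θ, 0.1))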
NeuralEstimators.quantilelossFunction
quantileloss(θ̂, θ, q; agg = mean)
 quantileloss(θ̂, θ, q::V; agg = mean) where {T, V <: AbstractVector{T}}

The asymmetric loss function whose minimiser is the qth posterior quantile; namely,

\[L(θ̂, θ, q) = (θ̂ - θ)(𝕀(θ̂ - θ > 0) - q),\]

where q ∈ (0, 1) and 𝕀(⋅) is the indicator function.

The method that takes q as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be pr, where p is the number of parameters: then, q should be an r-vector.

For further discussion on this loss function, see Equation (7) of Cressie, N. (2022), "Decisions, decisions, decisions in an uncertain environment", arXiv:2209.13157.

Examples

p = 1
 K = 10
 θ = rand(p, K)
@@ -15,5 +15,5 @@
 quantileloss(θ̂, θ, 0.1)
 
 θ̂ = rand(3p, K)
-quantileloss(θ̂, θ, [0.1, 0.5, 0.9])
source
NeuralEstimators.intervalscoreFunction
intervalscore(l, u, θ, α; agg = mean)
-intervalscore(θ̂, θ, α; agg = mean)

Given a 100×(1-α)% confidence interval [l, u] with true value θ, the interval score is defined by

\[S(l, u, θ; α) = (u - l) + 2α⁻¹(l - θ)𝕀(θ < l) + 2α⁻¹(θ - u)𝕀(θ > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.

The method that takes a single value θ̂ assumes that θ̂ is a matrix with 2p rows, where p is the number of parameters in the statistical model. Then, the first and second set of p rows will be used as l and u, respectively.

For further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. (2007), "Strictly proper scoring rules, prediction, and estimation", Journal of the American statistical Association, 102, 359–378.

source
+quantileloss(θ̂, θ, [0.1, 0.5, 0.9])
source
NeuralEstimators.intervalscoreFunction
intervalscore(l, u, θ, α; agg = mean)
+intervalscore(θ̂, θ, α; agg = mean)

Given a 100×(1-α)% confidence interval [l, u] with true value θ, the interval score is defined by

\[S(l, u, θ; α) = (u - l) + 2α⁻¹(l - θ)𝕀(θ < l) + 2α⁻¹(θ - u)𝕀(θ > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.

The method that takes a single value θ̂ assumes that θ̂ is a matrix with 2p rows, where p is the number of parameters in the statistical model. Then, the first and second set of p rows will be used as l and u, respectively.

For further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. (2007), "Strictly proper scoring rules, prediction, and estimation", Journal of the American statistical Association, 102, 359–378.

source
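A minimal hedged example (arbitrary values): the score can be computed from explicit lower and upper bounds, or from a single 2p-row matrix whose first p rows are the lower bounds.

using NeuralEstimators

p, K = 2, 10
θ = rand(p, K)
l = θ .- rand(p, K)    # hypothetical lower bounds
u = θ .+ rand(p, K)    # hypothetical upper bounds
intervalscore(l, u, θ, 0.1)
intervalscore(vcat(l, u), θ, 0.1)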
diff --git a/dev/API/simulation/index.html b/dev/API/simulation/index.html index 15c8cf0..701a979 100644 --- a/dev/API/simulation/index.html +++ b/dev/API/simulation/index.html @@ -20,7 +20,7 @@ # Circulant embedding, which is fast but can on only be used on grids: pts = 1.0:50.0 grf = GaussianRandomField(cov, CirculantEmbedding(), pts, pts, minpadding = 100) -simulategaussianprocess(grf)source
NeuralEstimators.simulateschlatherFunction
simulateschlather(L::Matrix, m = 1)
+simulategaussianprocess(grf)
source
NeuralEstimators.simulateschlatherFunction
simulateschlather(L::Matrix, m = 1)
 simulateschlather(grf::GaussianRandomField, m = 1)

Simulates m independent and identically distributed (i.i.d.) realisations from Schlather's max-stable model using the algorithm for approximate simulation given by Schlather (2002), "Models for stationary max-stable random fields", Extremes, 5:33-44.

Accepts either the lower Cholesky factor L associated with a Gaussian process or a GaussianRandomField object grf.

Keyword arguments

  • C = 3.5: a tuning parameter that controls the accuracy of the algorithm: small C favours computational efficiency, while large C favours accuracy. Schlather (2002) recommends the use of C = 3.
  • Gumbel = true: flag indicating whether the data should be log-transformed from the unit Fréchet scale to the Gumbel scale.

Examples

using NeuralEstimators
 
 n  = 500
@@ -41,7 +41,7 @@
 # Circulant embedding, which is fast but can only be used on grids:
 pts = 1.0:50.0
 grf = GaussianRandomField(cov, CirculantEmbedding(), pts, pts, minpadding = 100)
-simulateschlather(grf)
source

Spatial point processes

NeuralEstimators.maternclusterprocessFunction
maternclusterprocess(; λ=10, μ=10, r=0.1, xmin=0, xmax=1, ymin=0, ymax=1)

Simulates a Matérn cluster process with density of parent Poisson point process λ, mean number of daughter points μ, and radius of cluster disk r, over the simulation window defined by {x/y}min and {x/y}max.

Note that one may also use the R package spatstat using RCall.

Examples

using NeuralEstimators
+simulateschlather(grf)
source

Spatial point processes

NeuralEstimators.maternclusterprocessFunction
maternclusterprocess(; λ=10, μ=10, r=0.1, xmin=0, xmax=1, ymin=0, ymax=1)

Simulates a Matérn cluster process with density of parent Poisson point process λ, mean number of daughter points μ, and radius of cluster disk r, over the simulation window defined by {x/y}min and {x/y}max.

Note that one may also use the R package spatstat using RCall.

Examples

using NeuralEstimators
 
 # Simulate a realisation from a Matérn cluster process
 S = maternclusterprocess()
@@ -57,7 +57,7 @@
 plots = map(eachindex(λ)) do i
 	S = maternclusterprocess(λ = λ[i], μ = μ[i])
 	scatterplot(S[:, 1], S[:, 2])
-end
source

Low-level functions

These low-level functions may be of use for various models.

NeuralEstimators.maternFunction
matern(h, ρ, ν, σ² = 1)

For two points separated by h units, compute the Matérn covariance function, with range parameter ρ, smoothness parameter ν, and marginal variance parameter σ².

We use the parametrisation $C(\|\mathbf{h}\|) = \sigma^2 \frac{2^{1 - \nu}}{\Gamma(\nu)} \left(\frac{\|\mathbf{h}\|}{\rho}\right)^\nu K_\nu \left(\frac{\|\mathbf{h}\|}{\rho}\right)$, where $\Gamma(\cdot)$ is the gamma function, and $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$.

source
NeuralEstimators.materncholsFunction
maternchols(D, ρ, ν, σ² = 1; stack = true)

Given a distance matrix D, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².

Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must be of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.

If stack = true, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).

Examples

using NeuralEstimators
+end
source

Low-level functions

These low-level functions may be of use for various models.

NeuralEstimators.maternFunction
matern(h, ρ, ν, σ² = 1)

For two points separated by h units, compute the Matérn covariance function, with range parameter ρ, smoothness parameter ν, and marginal variance parameter σ².

We use the parametrisation $C(\|\mathbf{h}\|) = \sigma^2 \frac{2^{1 - \nu}}{\Gamma(\nu)} \left(\frac{\|\mathbf{h}\|}{\rho}\right)^\nu K_\nu \left(\frac{\|\mathbf{h}\|}{\rho}\right)$, where $\Gamma(\cdot)$ is the gamma function, and $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$.

source
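A brief hedged illustration of the parametrisation above (arbitrary parameter values): the covariance decays as the separation h grows.

using NeuralEstimators

ρ, ν, σ² = 0.2, 1.5, 1.0
matern(0.1, ρ, ν, σ²)
matern(0.5, ρ, ν, σ²)   # smaller, since the separation is larger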
NeuralEstimators.materncholsFunction
maternchols(D, ρ, ν, σ² = 1; stack = true)

Given a distance matrix D, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².

Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must be of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.

If stack = true, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).

Examples

using NeuralEstimators
 using LinearAlgebra: norm
 n  = 10
 S  = rand(n, 2)
@@ -75,5 +75,5 @@
 
 S̃  = rand(2n, 2)
 D̃  = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
-maternchols([D, D̃], ρ, ν, σ²; stack = false)
source

Density functions

Density functions are not needed in the workflow of NeuralEstimators. However, as part of a series of comparison studies between neural estimators and likelihood-based estimators given in the manuscript, we have developed the following density functions, and we include them in NeuralEstimators to cater for the possibility that they may be of use in future comparison studies.

NeuralEstimators.gaussiandensityFunction
gaussiandensity(y::V, L; logdensity = true) where {V <: AbstractVector{T}} where T
-gaussiandensity(y::A, Σ; logdensity = true) where {A <: AbstractArray{T, N}} where {T, N}

Efficiently computes the density function for y ~ 𝑁(0, Σ), with L the lower Cholesky factor of the covariance matrix Σ.

The method gaussiandensity(y::A, Σ) assumes that the last dimension of y corresponds to the independent-replicates dimension, and it exploits the fact that we need to compute the Cholesky factor L for these independent replicates once only.

source
NeuralEstimators.schlatherbivariatedensityFunction
schlatherbivariatedensity(z₁, z₂, ψ; logdensity = true)

The bivariate density function for Schlather's max-stable model, as given in Huser (2013, pg. 231–232).

Huser, R. (2013). Statistical Modeling and Inference for Spatio-Temporal Extremes. PhD thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland.

source
+maternchols([D, D̃], ρ, ν, σ²; stack = false)source

Density functions

Density functions are not needed in the workflow of NeuralEstimators. However, as part of a series of comparison studies between neural estimators and likelihood-based estimators given in the manuscript, we have developed the following density functions, and we include them in NeuralEstimators to cater for the possibility that they may be of use in future comparison studies.

NeuralEstimators.gaussiandensityFunction
gaussiandensity(y::V, L; logdensity = true) where {V <: AbstractVector{T}} where T
+gaussiandensity(y::A, Σ; logdensity = true) where {A <: AbstractArray{T, N}} where {T, N}

Efficiently computes the density function for y ~ 𝑁(0, Σ), with L the lower Cholesky factor of the covariance matrix Σ.

The method gaussiandensity(y::A, Σ) assumes that the last dimension of y corresponds to the independent-replicates dimension, and it exploits the fact that we need to compute the Cholesky factor L for these independent replicates once only.

source
NeuralEstimators.schlatherbivariatedensityFunction
schlatherbivariatedensity(z₁, z₂, ψ; logdensity = true)

The bivariate density function for Schlather's max-stable model, as given in Huser (2013, pg. 231–232).

Huser, R. (2013). Statistical Modeling and Inference for Spatio-Temporal Extremes. PhD thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland.

source
diff --git a/dev/API/utility/index.html b/dev/API/utility/index.html index be632a1..e431669 100644 --- a/dev/API/utility/index.html +++ b/dev/API/utility/index.html @@ -1,5 +1,5 @@ -Miscellaneous · NeuralEstimators.jl

Miscellaneous

Core

These functions can appear during the core workflow, and may need to be overloaded in some applications.

NeuralEstimators.numberreplicatesFunction
numberofreplicates(Z)

Generic function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.

source
NeuralEstimators.subsetdataFunction

Generic function for subsetting replicates from a data set. Default methods are:

subsetdata(Z::A, m) where {A <: AbstractArray{T, N}} where {T, N}
+Miscellaneous · NeuralEstimators.jl

Miscellaneous

Core

These functions can appear during the core workflow, and may need to be overloaded in some applications.

NeuralEstimators.numberreplicatesFunction
numberofreplicates(Z)

Generic function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.

source
NeuralEstimators.subsetdataFunction

Generic function for subsetting replicates from a data set. Default methods are:

subsetdata(Z::A, m) where {A <: AbstractArray{T, N}} where {T, N}
 subsetdata(Z::G, m) where {G <: AbstractGraph}

Note that subsetdata is slow for graphical data; when working with graphs, consider using a method of train that does not require the data to be subsetted. Use numberreplicates to check that the training and validation data sets are equally replicated, which prevents the invocation of subsetdata. Note also that subsetdata only applies to vectors of batched graphs.

If the user is working with data that is not covered by the default methods, simply overload subsetdata with the appropriate type for Z.

Examples

using NeuralEstimators
 using GraphNeuralNetworks
 using Flux: batch
@@ -16,8 +16,8 @@
 # Graphical data
 e = 8 # number of edges
 Z = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]
-subsetdata(Z, 1:3) # extract first 3 replicates for each parameter vector
source
NeuralEstimators.subsetparametersFunction
subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}
-subsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}

Subset parameters using a collection of indices.

Arrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.

source

Utility functions

NeuralEstimators.subsetparametersFunction
subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}
+subsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}

Subset parameters using a collection of indices.

Arrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.

source
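A hedged example of the default matrix method (arbitrary values):

using NeuralEstimators

θ_all = rand(3, 100)                  # 3 × 100 matrix: K = 100 configurations
θ_sub = subsetparameters(θ_all, 1:10) # keep the first 10 configurations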

Utility functions

NeuralEstimators.adjacencymatrixFunction
adjacencymatrix(M::Matrix, k::Integer)
 adjacencymatrix(M::Matrix, r::Float)

Computes a spatially weighted adjacency matrix from M based on either the k nearest neighbours of each location, or a fixed spatial radius of r units.

If M is a square matrix, it is treated as a distance matrix; otherwise, it should be an n x d matrix, where n is the number of spatial sample locations and d is the spatial dimension (typically d = 2).

Examples

using NeuralEstimators
 using Distances
 
@@ -34,19 +34,19 @@
 # Construct from full distance matrix D
 D = pairwise(Euclidean(), S, S, dims = 1)
 adjacencymatrix(D, k)
-adjacencymatrix(D, r)
source
NeuralEstimators.containertypeFunction
containertype(A::Type)
 containertype(::Type{A}) where A <: SubArray
 containertype(a::A) where A

Returns the container type of its argument.

If given a SubArray, returns the container type of the parent array.

Examples

a = rand(3, 4)
 containertype(a)
 containertype(typeof(a))
-[containertype(x) for x ∈ eachcol(a)]
source
NeuralEstimators.estimateinbatchesFunction
estimateinbatches(θ̂, z; batchsize::Integer = 32, use_gpu::Bool = true)

Apply the estimator θ̂ on minibatches of z of size batchsize, to avoid memory issues that can occur when z is very large.

source
NeuralEstimators.loadbestweightsFunction
loadbestweights(path::String)

Given a path to a training run containing neural networks saved with names "network_epochx.bson" and an object saved as "loss_per_epoch.bson", returns the weights of the best network (measured by validation loss).

source
NeuralEstimators.stackarraysFunction
stackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}

Stack a vector of arrays v along the last dimension of each array, optionally merging the final dimension of the stacked array.

The arrays must be of the same size for the first N-1 dimensions. However, if merge = true, the size of the final dimension can vary.

Examples

# Vector containing arrays of the same size:
+[containertype(x) for x ∈ eachcol(a)]
source
NeuralEstimators.estimateinbatchesFunction
estimateinbatches(θ̂, z; batchsize::Integer = 32, use_gpu::Bool = true)

Apply the estimator θ̂ on minibatches of z of size batchsize, to avoid memory issues that can occur when z is very large.

source
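A hedged one-liner (the estimator θ̂ and the collection of data sets Z are assumed to exist): estimates are computed 64 data sets at a time.

estimates = estimateinbatches(θ̂, Z; batchsize = 64)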
NeuralEstimators.loadbestweightsFunction
loadbestweights(path::String)

Given a path to a training run containing neural networks saved with names "network_epochx.bson" and an object saved as "loss_per_epoch.bson", returns the weights of the best network (measured by validation loss).

source
NeuralEstimators.stackarraysFunction
stackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}

Stack a vector of arrays v along the last dimension of each array, optionally merging the final dimension of the stacked array.

The arrays must be of the same size for the first N-1 dimensions. However, if merge = true, the size of the final dimension can vary.

Examples

# Vector containing arrays of the same size:
 Z = [rand(2, 3, m) for m ∈ (1, 1)];
 stackarrays(Z)
 stackarrays(Z, merge = false)
 
 # Vector containing arrays with differing final dimension size:
 Z = [rand(2, 3, m) for m ∈ (1, 2)];
-stackarrays(Z)
source
NeuralEstimators.vectotrilFunction
vectotril(v; strict = false)
 vectotriu(v; strict = false)

Converts a vector v of length $d(d+1)÷2$ (a triangular number) into a $d × d$ lower or upper triangular matrix.

If strict = true, the matrix will be strictly lower or upper triangular, that is, a $(d+1) × (d+1)$ triangular matrix with zero diagonal.

Note that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v is a GPU array. Note also that the return type is not a Triangular matrix (i.e., the zeros are materialised), since Triangular matrices are not always compatible with other GPU operations.

Examples

using NeuralEstimators
 
 d = 4
@@ -55,4 +55,4 @@
 vectotril(v)
 vectotriu(v)
 vectotril(v; strict = true)
-vectotriu(v; strict = true)
source
+vectotriu(v; strict = true)
source
diff --git a/dev/framework/index.html b/dev/framework/index.html index 5dff805..82f9d9f 100644 --- a/dev/framework/index.html +++ b/dev/framework/index.html @@ -4,4 +4,4 @@ \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} \; r_{\Omega}(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})).\]

Typically, $r_{\Omega}(\cdot)$ cannot be directly evaluated, but it can be approximated using Monte Carlo methods. Specifically, given a set of $K$ parameter vectors sampled from the prior $\Omega(\cdot)$ denoted by $\vartheta$ and, for each $\boldsymbol{\theta} \in \vartheta$, $J$ realisations from $f(\boldsymbol{z} \mid \boldsymbol{\theta})$ collected in $\mathcal{Z}_{\boldsymbol{\theta}}$,

\[ r_{\Omega}(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})) \approx \frac{1}{K} \sum_{\boldsymbol{\theta} \in \vartheta} \frac{1}{J} \sum_{\boldsymbol{z} \in \mathcal{Z}_{\boldsymbol{\theta}}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{z}; \boldsymbol{\gamma})). \]

Note that the above approximation does not involve evaluation, or knowledge, of the likelihood function.

The Monte-Carlo-approximated Bayes risk can be straightforwardly minimised with respect to $\boldsymbol{\gamma}$ using back-propagation and stochastic gradient descent. For sufficiently flexible architectures, the point estimator targets a Bayes estimator with respect to $L(\cdot, \cdot)$ and $\Omega(\cdot)$. We therefore call the fitted neural point estimator a neural Bayes estimator. Like Bayes estimators, neural Bayes estimators target a specific point summary of the posterior distribution. For instance, the absolute-error and squared-error loss functions lead to neural Bayes estimators that approximate the posterior median and mean, respectively.
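For example, to target the posterior mean rather than the posterior median, one could pass a squared-error loss to train. The following is a minimal sketch, assuming an estimator θ̂ and the sample and simulate functions of the Examples section, and using Flux's mean-squared-error loss.

using Flux

# Default absolute-error loss (loss = mae): targets the posterior median
θ̂ = train(θ̂, sample, simulate, m = 15)

# Squared-error loss: targets the posterior mean
θ̂ = train(θ̂, sample, simulate, m = 15, loss = Flux.Losses.mse)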

Construction of neural Bayes estimators

The neural Bayes estimator is conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. The estimator also has marked practical appeal, as the general workflow for its construction is only loosely connected to the statistical or physical model being considered. The workflow is as follows (a minimal end-to-end sketch follows the list):

  1. Define the prior, $\Omega(\cdot)$.
  2. Choose a loss function, $L(\cdot, \cdot)$, typically the absolute-error or squared-error loss.
  3. Design a suitable neural-network architecture for the neural point estimator $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})$.
  4. Sample parameters from $\Omega(\cdot)$ to form training/validation/test parameter sets.
  5. Given the above parameter sets, simulate data from the model, to form training/validation/test data sets.
  6. Train the neural network (i.e., estimate $\boldsymbol{\gamma}$) by minimising the loss function averaged over the training sets. During training, monitor performance and convergence using the validation sets.
  7. Assess the fitted neural Bayes estimator, $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)$, using the test set.
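The following end-to-end sketch ties these steps together for the toy univariate Gaussian model used in the Examples section; the sampler, simulator, and architecture below are illustrative placeholders rather than part of the package.

using NeuralEstimators
using Flux
using Distributions
import NeuralEstimators: simulate

# 1. Prior: μ ~ N(0, 1), σ ~ U(0.1, 1), independent a priori
sample(K) = Float32.(vcat(rand(Normal(0, 1), K)', rand(Uniform(0.1, 1), K)'))

# 2. Loss: the default absolute-error loss is used below

# 3. Architecture: DeepSets with dense inner network ψ and outer network ϕ
p = 2; w = 32
ψ = Chain(Dense(1, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = DeepSet(ψ, ϕ)

# 4–5. Parameters and data are sampled/simulated on the fly during training
simulate(θ, m) = [θ[1, k] .+ θ[2, k] .* randn(Float32, 1, m) for k ∈ 1:size(θ, 2)]

# 6. Training
θ̂ = train(θ̂, sample, simulate, m = 15, epochs = 30)

# 7. Assessment over a held-out test set
θ_test = sample(1000)
Z_test = simulate(θ_test, 15)
assessment = assess([θ̂], θ_test, Z_test)
risk(assessment)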
diff --git a/dev/index.html b/dev/index.html index dabe91c..e9d7cfe 100644 --- a/dev/index.html +++ b/dev/index.html @@ -5,4 +5,4 @@ journal = {The American Statistician}, year = {2023}, volume = {to appear} -} +}

diff --git a/dev/search/index.html b/dev/search/index.html index 89d46b9..d4f8830 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ Search · NeuralEstimators.jl
      diff --git a/dev/search_index.js b/dev/search_index.js index 8d5b65e..2f1dddf 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"API/simulation/#Model-specific-functions","page":"Model-specific functions","title":"Model-specific functions","text":"","category":"section"},{"location":"API/simulation/#Data-simulators","page":"Model-specific functions","title":"Data simulators","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"The philosophy of NeuralEstimators is to cater for arbitrary statistical models by having the user define their statistical model implicitly through simulated data. However, the following functions have been included as they may be helpful to others, and their source code provide an example for how a user could formulate code for their own model. If you've developed similar functions that you think may be helpful to others, please get in touch or make a pull request.","category":"page"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"simulategaussianprocess\n\nsimulateschlather","category":"page"},{"location":"API/simulation/#NeuralEstimators.simulategaussianprocess","page":"Model-specific functions","title":"NeuralEstimators.simulategaussianprocess","text":"simulategaussianprocess(L::Matrix, m = 1)\nsimulategaussianprocess(grf::GaussianRandomField, m = 1)\n\nSimulates m independent and identically distributed (i.i.d.) realisations from a mean-zero Gaussian process.\n\nAccepts either the lower Cholesky factor L associated with a Gaussian process or a GaussianRandomField object grf.\n\nExamples\n\nusing NeuralEstimators\n\nn = 500\nS = rand(n, 2)\nρ = 0.6\nν = 1.0\n\n# Passing GaussianRandomField object:\nusing GaussianRandomFields\ncov = CovarianceFunction(2, Matern(ρ, ν))\ngrf = GaussianRandomField(cov, Cholesky(), S)\nsimulategaussianprocess(grf)\n\n# Passing Cholesky factors directly as matrices:\nL = grf.data\nsimulategaussianprocess(L)\n\n# Circulant embedding, which is fast but can on only be used on grids:\npts = 1.0:50.0\ngrf = GaussianRandomField(cov, CirculantEmbedding(), pts, pts, minpadding = 100)\nsimulategaussianprocess(grf)\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#NeuralEstimators.simulateschlather","page":"Model-specific functions","title":"NeuralEstimators.simulateschlather","text":"simulateschlather(L::Matrix, m = 1)\nsimulateschlather(grf::GaussianRandomField, m = 1)\n\nSimulates m independent and identically distributed (i.i.d.) realisations from Schlather's max-stable model using the algorithm for approximate simulation given by Schlather (2002), \"Models for stationary max-stable random fields\", Extremes, 5:33-44.\n\nAccepts either the lower Cholesky factor L associated with a Gaussian process or a GaussianRandomField object grf.\n\nKeyword arguments\n\nC = 3.5: a tuning parameter that controls the accuracy of the algorithm: small C favours computational efficiency, while large C favours accuracy. 
Schlather (2002) recommends the use of C = 3.\nGumbel = true: flag indicating whether the data should be log-transformed from the unit Fréchet scale to the Gumbel scale.\n\nExamples\n\nusing NeuralEstimators\n\nn = 500\nS = rand(n, 2)\nρ = 0.6\nν = 1.0\n\n# Passing GaussianRandomField object:\nusing GaussianRandomFields\ncov = CovarianceFunction(2, Matern(ρ, ν))\ngrf = GaussianRandomField(cov, Cholesky(), S)\nsimulateschlather(grf)\n\n# Passing Cholesky factors directly as matrices:\nL = grf.data\nsimulateschlather(L)\n\n# Circulant embedding, which is fast but can on only be used on grids:\npts = 1.0:50.0\ngrf = GaussianRandomField(cov, CirculantEmbedding(), pts, pts, minpadding = 100)\nsimulateschlather(grf)\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#Spatial-point-processes","page":"Model-specific functions","title":"Spatial point processes","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"maternclusterprocess","category":"page"},{"location":"API/simulation/#NeuralEstimators.maternclusterprocess","page":"Model-specific functions","title":"NeuralEstimators.maternclusterprocess","text":"maternclusterprocess(; λ=10, μ=10, r=0.1, xmin=0, xmax=1, ymin=0, ymax=1)\n\nSimulates a Matérn cluster process with density of parent Poisson point process λ, mean number of daughter points μ, and radius of cluster disk r, over the simulation window defined by {x/y}min and {x/y}max.\n\nNote that one may also use the R package spatstat using RCall.\n\nExamples\n\nusing NeuralEstimators\n\n# Simulate a realisation from a Matérn cluster process\nS = maternclusterprocess()\n\n# Visualise realisation (requires UnicodePlots)\nusing UnicodePlots\nscatterplot(S[:, 1], S[:, 2])\n\n# Visualise realisations from the cluster process with varying parameters\nn = 250\nλ = [10, 25, 50, 90]\nμ = n ./ λ\nplots = map(eachindex(λ)) do i\n\tS = maternclusterprocess(λ = λ[i], μ = μ[i])\n\tscatterplot(S[:, 1], S[:, 2])\nend\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#Low-level-functions","page":"Model-specific functions","title":"Low-level functions","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"These low-level functions may be of use for various models.","category":"page"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"matern\n\nmaternchols","category":"page"},{"location":"API/simulation/#NeuralEstimators.matern","page":"Model-specific functions","title":"NeuralEstimators.matern","text":"matern(h, ρ, ν, σ² = 1)\n\nFor two points separated by h units, compute the Matérn covariance function, with range parameter ρ, smoothness parameter ν, and marginal variance parameter σ².\n\nWe use the parametrisation C(mathbfh) = sigma^2 frac2^1 - nuGamma(nu) left(fracmathbfhrhoright)^nu K_nu left(fracmathbfhrhoright), where Gamma(cdot) is the gamma function, and K_nu(cdot) is the modified Bessel function of the second kind of order nu.\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#NeuralEstimators.maternchols","page":"Model-specific functions","title":"NeuralEstimators.maternchols","text":"maternchols(D, ρ, ν, σ² = 1; stack = true)\n\nGiven a distance matrix D, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².\n\nProviding 
vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.\n\nIf stack = true, the Cholesky factors will be \"stacked\" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).\n\nExamples\n\nusing NeuralEstimators\nusing LinearAlgebra: norm\nn = 10\nS = rand(n, 2)\nD = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S), sⱼ ∈ eachrow(S)]\nρ = [0.6, 0.5]\nν = [0.7, 1.2]\nσ² = [0.2, 0.4]\nmaternchols(D, ρ, ν)\nmaternchols(D, ρ, ν, σ²; stack = false)\n\nS̃ = rand(n, 2)\nD̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]\nmaternchols([D, D̃], ρ, ν, σ²)\nmaternchols([D, D̃], ρ, ν, σ²; stack = false)\n\nS̃ = rand(2n, 2)\nD̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]\nmaternchols([D, D̃], ρ, ν, σ²; stack = false)\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#Density-functions","page":"Model-specific functions","title":"Density functions","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"Density functions are not needed in the workflow of NeuralEstimators. However, as part of a series of comparison studies between neural estimators and likelihood-based estimators given in the manuscript, we have developed the following density functions, and we include them in NeuralEstimators to cater for the possibility that they may be of use in future comparison studies.","category":"page"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"gaussiandensity\n\nschlatherbivariatedensity","category":"page"},{"location":"API/simulation/#NeuralEstimators.gaussiandensity","page":"Model-specific functions","title":"NeuralEstimators.gaussiandensity","text":"gaussiandensity(y::V, L; logdensity = true) where {V <: AbstractVector{T}} where T\ngaussiandensity(y::A, Σ; logdensity = true) where {A <: AbstractArray{T, N}} where {T, N}\n\nEfficiently computes the density function for y ~ 𝑁(0, Σ), with L the lower Cholesky factor of the covariance matrix Σ.\n\nThe method gaussiandensity(y::A, Σ) assumes that the last dimension of y corresponds to the independent-replicates dimension, and it exploits the fact that we need to compute the Cholesky factor L for these independent replicates once only.\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#NeuralEstimators.schlatherbivariatedensity","page":"Model-specific functions","title":"NeuralEstimators.schlatherbivariatedensity","text":"schlatherbivariatedensity(z₁, z₂, ψ; logdensity = true)\n\nThe bivariate density function for Schlather's max-stable model, as given in Huser (2013, pg. 231–232).\n\nHuser, R. (2013). Statistical Modeling and Inference for Spatio-Temporal Ex- tremes. PhD thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland.\n\n\n\n\n\n","category":"function"},{"location":"workflow/examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"workflow/examples/#Univariate-data","page":"Examples","title":"Univariate data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Here, we develop a neural Bayes estimator for boldsymboltheta equiv (mu sigma) from data Z_1 dots Z_m that are independent and identically distributed according to a N(mu sigma^2) distribution. 
We'll use the priors mu sim N(0 1) and sigma sim U(01 1), and we assume that the parameters are independent a priori.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Before proceeding, we load the required packages:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"using NeuralEstimators\nusing Flux\nusing Distributions\nimport NeuralEstimators: simulate","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"First, we define a function to sample parameters from the prior. The sampled parameters are stored as p times K matrices, with p the number of parameters in the model and K the number of sampled parameter vectors.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"function sample(K)\n\tμ = rand(Normal(0, 1), K)\n\tσ = rand(Uniform(0.1, 1), K)\n\tθ = hcat(μ, σ)'\n\tθ = Float32.(θ)\n\treturn θ\nend","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next, we implicitly define the statistical model with simulated data. The data are stored as a Vector{A}, where each element of the vector is associated with one parameter vector, and where A depends on the representation of the neural estimator. Since our data is replicated, we will use the DeepSets framework and, since each replicate is univariate, we will use a dense neural network (DNN) for the inner network. Since the inner network is a DNN, A should be a sub-type of AbstractArray, with the independent replicates stored in the final dimension.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"function simulate(parameters, m)\n\tZ = [θ[1] .+ θ[2] .* randn(Float32, 1, m) for θ ∈ eachcol(parameters)]\n\treturn Z\nend","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"We now design architectures for the inner and outer neural networks, boldsymbolpsi(cdot) and boldsymbolphi(cdot) respectively, in the DeepSets framework, and initialise the neural estimator as a PointEstimator object.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"p = 2 # number of parameters in the statistical model\nw = 32 # width of each layer\n\nψ = Chain(Dense(1, w, relu), Dense(w, w, relu))\nϕ = Chain(Dense(w, w, relu), Dense(w, p))\narchitecture = DeepSet(ψ, ϕ)\n\nθ̂ = PointEstimator(architecture)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next, we train the neural estimator using train, here using the default absolute-error loss. We'll train the estimator using 15 independent replicates per parameter configuration. Below, we pass our user-defined functions for sampling parameters and simulating data, but one may also pass parameter or data instances, which will be held fixed during training; see train.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"m = 15\nθ̂ = train(θ̂, sample, simulate, m = m, epochs = 30)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"To test the accuracy of the resulting neural Bayes estimator, we use the function assess, which can be used to assess the performance of the estimator (or multiple estimators) over a range of sample sizes. 
Note that, in this example, we trained the neural estimator using a single sample size, m = 15, and hence the estimator will not necessarily be optimal for other sample sizes; see Variable sample sizes for approaches that one could adopt if data sets with varying sample size are envisaged.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"θ = sample(1000)\nZ = [simulate(θ, m) for m ∈ (5, 10, 15, 20, 30)]\nassessment = assess([θ̂], θ, Z)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"The returned object is an object of type Assessment, which contains the true parameters and their corresponding estimates, and the time taken to compute the estimates for each sample size and each estimator. The risk function may be computed using the function risk:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"risk(assessment)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"It is often helpful to visualise the empirical sampling distribution of an estimator for a particular parameter configuration and a particular sample size. This can be done by providing assess with J data sets simulated under a particular parameter configuration (below facilitated with the pre-defined method simulate(parameters, m, J::Integer), which wraps the method of simulate that we defined earlier), and then plotting the estimates contained in the long-form DataFrame in the resulting Assessment object:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"J = 100\nθ = sample(1)\nZ = [simulate(θ, m, J)]\nassessment = assess([θ̂], θ, Z) ","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Once the neural Bayes estimator has been assessed, it may then be applied to observed data, with parametric/non-parametric bootstrap-based uncertainty quantification facilitated by bootstrap and interval. Below, we use simulated data as a substitute for observed data:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Z = simulate(θ, m) # pretend that this is observed data\nθ̂(Z) # point estimates from the observed data\nθ̃ = bootstrap(θ̂, Z) # non-parametric bootstrap estimates\ninterval(θ̃) # confidence interval from the bootstrap estimates","category":"page"},{"location":"workflow/examples/#Multivariate-data","page":"Examples","title":"Multivariate data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Suppose now that our data consists of m replicates of a d-dimensional multivariate distribution. 
Everything remains as given in the univariate example above, except that we now store the data as a vector of d times m matrices (previously they were stored as 1times m matrices), and the inner network of the DeepSets representation takes a d-dimensional input (previously it took a 1-dimensional input).","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"","category":"page"},{"location":"workflow/examples/#Gridded-spatial-data","page":"Examples","title":"Gridded spatial data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"For spatial data measured on a regular grid, the estimator is typically based on a convolutional neural network (CNN), and each data set is stored as a four-dimensional array, where the first three dimensions corresponding to the width, height, and depth/channels dimensions, and the fourth dimension stores the independent replicates. Note that, for univariate spatial processes, the channels dimension is simply equal to 1. For a 16x16 spatial grid, a possible architecture is given below.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"p = 2 # number of parameters in the statistical model\nw = 32 # number of neurons in each layer\nd = 2 # dimension of the response variable\n\nψ = Chain(\n\tConv((10, 10), 1 => 32, relu),\n\tConv((5, 5), 32 => 64, relu),\n\tConv((3, 3), 64 => 128, relu),\n\tFlux.flatten\n\t)\nϕ = Chain(Dense(128, 512, relu), Dense(512, p))\narchitecture = DeepSet(ψ, ϕ)","category":"page"},{"location":"workflow/examples/#Irregular-spatial-data","page":"Examples","title":"Irregular spatial data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"This example is coming soon! The methodology involves the use of graph neural networks, which in Julia can be implemented using the package GraphNeuralNetworks.jl. Some key steps:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Sampling spatial locations during the training phase: see maternclusterprocess.\nComputing (spatially-weighted) adjacency matrices: see adjacencymatrix.\nStoring the data as a graph: see GNNGraph.\nConstructing an appropriate architecture for the neural Bayes estimator: see GNN and WeightedGraphConv.","category":"page"},{"location":"framework/#Theoretical-framework","page":"Theoretical framework","title":"Theoretical framework","text":"","category":"section"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"In this section, we provide an overview of point estimation using neural Bayes estimators. For a more detailed discussion on the framework and its implementation, see Sainsbury-Dale et al. (2022; arxiv:2208.12942).","category":"page"},{"location":"framework/#Neural-Bayes-estimators","page":"Theoretical framework","title":"Neural Bayes estimators","text":"","category":"section"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"A parametric statistical model is a set of probability distributions on a sample space mathcalS, where the probability distributions are parameterised via some p-dimensional parameter vector boldsymboltheta on a parameter space Theta. Suppose that we have data from one such distribution, which we denote as boldsymbolZ. 
Then, the goal of parameter point estimation is to come up with an estimate of the unknown boldsymboltheta from boldsymbolZ using an estimator,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" hatboldsymboltheta mathcalS to Theta","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"which is a mapping from the sample space to the parameter space.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Estimators can be constructed within a decision-theoretic framework. Assume that the sample space is mathcalS = mathbbR^n, and consider a non-negative loss function, L(boldsymboltheta hatboldsymboltheta(boldsymbolZ)), which assesses an estimator hatboldsymboltheta(cdot) for a given boldsymboltheta and data set boldsymbolZ sim f(boldsymbolz mid boldsymboltheta), where f(boldsymbolz mid boldsymboltheta) is the probability density function of the data conditional on boldsymboltheta. boldsymboltheta. An estimator's risk function is its loss averaged over all possible data realisations,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" R(boldsymboltheta hatboldsymboltheta(cdot)) equiv int_mathcalS L(boldsymboltheta hatboldsymboltheta(boldsymbolz))f(boldsymbolz mid boldsymboltheta) rmd boldsymbolz","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"So-called Bayes estimators minimise the Bayes risk,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" r_Omega(hatboldsymboltheta(cdot))\n equiv int_Theta R(boldsymboltheta hatboldsymboltheta(cdot)) rmd Omega(boldsymboltheta) ","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"where Omega(cdot) is a prior measure for boldsymboltheta.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Bayes estimators are theoretically attractive: for example, unique Bayes estimators are admissible and, under suitable regularity conditions and the squared-error loss, are consistent and asymptotically efficient. Further, for a large class of prior distributions, every set of conditions that imply consistency of the maximum likelihood (ML) estimator also imply consistency of Bayes estimators. Importantly, Bayes estimators are not motivated purely by asymptotics: by construction, they are Bayes irrespective of the sample size and model class. Unfortunately, however, Bayes estimators are typically unavailable in closed form for the complex models often encountered in practice. A way forward is to assume a flexible parametric model for hatboldsymboltheta(cdot), and to optimise the parameters within that model in order to approximate the Bayes estimator. 
Neural networks are ideal candidates, since they are universal function approximators, and because they are also fast to evaluate, usually involving only simple matrix-vector operations.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Let hatboldsymboltheta(boldsymbolZ boldsymbolgamma) denote a neural point estimator, that is, a neural network that returns a point estimate from data boldsymbolZ, where boldsymbolgamma contains the neural-network parameters. Bayes estimators may be approximated with hatboldsymboltheta(cdot boldsymbolgamma^*) by solving the optimisation problem, ","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"boldsymbolgamma^*\nequiv\nundersetboldsymbolgammamathrmargmin r_Omega(hatboldsymboltheta(cdot boldsymbolgamma))","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Typically, r_Omega(cdot) cannot be directly evaluated, but it can be approximated using Monte Carlo methods. Specifically, given a set of K parameter vectors sampled from the prior Omega(cdot) denoted by vartheta and, for each boldsymboltheta in vartheta, J realisations from f(boldsymbolz mid boldsymboltheta) collected in mathcalZ_boldsymboltheta,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" r_Omega(hatboldsymboltheta(cdot boldsymbolgamma))\n approx\nfrac1K sum_boldsymboltheta in vartheta frac1J sum_boldsymbolz in mathcalZ_boldsymboltheta L(boldsymboltheta hatboldsymboltheta(boldsymbolz boldsymbolgamma)) ","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Note that the above approximation does not involve evaluation, or knowledge, of the likelihood function.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"The Monte-Carlo-approximated Bayes risk can be straightforwardly minimised with respect to boldsymbolgamma using back-propagation and stochastic gradient descent. For sufficiently flexible architectures, the point estimator targets a Bayes estimator with respect to L(cdot cdot) and Omega(cdot). We therefore call the fitted neural point estimator a neural Bayes estimator. Like Bayes estimators, neural Bayes estimators target a specific point summary of the posterior distribution. For instance, the absolute-error and squared-error loss functions lead to neural Bayes estimators that approximate the posterior median and mean, respectively.","category":"page"},{"location":"framework/#Construction-of-neural-Bayes-estimators","page":"Theoretical framework","title":"Construction of neural Bayes estimators","text":"","category":"section"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"The neural Bayes estimators is conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. The estimator also has marked practical appeal, as the general workflow for its construction is only loosely connected to the statistical or physical model being considered. 
The workflow is as follows:","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Define the prior, Omega(cdot).\nChoose a loss function, L(cdot cdot), typically the absolute-error or squared-error loss.\nDesign a suitable neural-network architecture for the neural point estimator hatboldsymboltheta(cdot boldsymbolgamma).\nSample parameters from Omega(cdot) to form training/validation/test parameter sets.\nGiven the above parameter sets, simulate data from the model, to form training/validation/test data sets.\nTrain the neural network (i.e., estimate boldsymbolgamma) by minimising the loss function averaged over the training sets. During training, monitor performance and convergence using the validation sets.\nAssess the fitted neural Bayes estimator, hatboldsymboltheta(cdot boldsymbolgamma^*), using the test set.","category":"page"},{"location":"API/#Index","page":"Index","title":"Index","text":"","category":"section"},{"location":"API/","page":"Index","title":"Index","text":"","category":"page"},{"location":"workflow/advancedusage/#Advanced-usage","page":"Advanced usage","title":"Advanced usage","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"In this section, we discuss practical considerations on how to construct neural estimators most effectively.","category":"page"},{"location":"workflow/advancedusage/#Storing-expensive-intermediate-objects-for-data-simulation","page":"Advanced usage","title":"Storing expensive intermediate objects for data simulation","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Parameters sampled from the prior distribution Omega(cdot) may be stored in two ways. Most simply, they can be stored as a p times K matrix, where p is the number of parameters in the model and K is the number of parameter vectors sampled from the prior distribution; this is the approach taken in the example using univariate Gaussian data. Alternatively, they can be stored in a user-defined subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the p times K matrix of parameters. With this approach, one may store computationally expensive intermediate objects, such as Cholesky factors, for later use when conducting \"on-the-fly\" simulation, which is discussed below.","category":"page"},{"location":"workflow/advancedusage/#On-the-fly-and-just-in-time-simulation","page":"Advanced usage","title":"On-the-fly and just-in-time simulation","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"When data simulation is (relatively) computationally inexpensive, mathcalZ_texttrain can be simulated continuously during training, a technique coined \"simulation-on-the-fly\". Regularly refreshing mathcalZ_texttrain leads to lower out-of-sample error and to a reduction in overfitting. This strategy therefore facilitates the use of larger, more representationally-powerful networks that are prone to overfitting when mathcalZ_texttrain is fixed. Refreshing mathcalZ_texttrain also has an additional computational benefit; data can be simulated \"just-in-time\", in the sense that they can be simulated from a small batch of vartheta_texttrain, used to train the neural estimator, and then removed from memory. 
This can reduce pressure on memory resources when vartheta_texttrain is very large.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"One may also regularly refresh vartheta_texttrain, and doing so leads to similar benefits. However, fixing vartheta_texttrain allows computationally expensive terms, such as Cholesky factors when working with Gaussian process models, to be reused throughout training, which can substantially reduce the training time for some models. ","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"The above strategies are facilitated with various methods of train.","category":"page"},{"location":"workflow/advancedusage/#Variable-sample-sizes","page":"Advanced usage","title":"Variable sample sizes","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"A neural estimator in the Deep Set representation can be applied to data sets of arbitrary size. However, even when the neural Bayes estimator approximates the true Bayes estimator arbitrarily well, it is conditional on the number of replicates, m, and is not necessarily a Bayes estimator for m^* ne m. Denote a data set comprising m replicates as boldsymbolZ^(m) equiv (boldsymbolZ_1 dots boldsymbolZ_m). There are at least two (non-mutually exclusive) approaches one could adopt if data sets with varying m are envisaged, which we describe below.","category":"page"},{"location":"workflow/advancedusage/#Piecewise-estimators","page":"Advanced usage","title":"Piecewise estimators","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"If data sets with varying m are envisaged, one could train l neural Bayes estimators for different sample sizes, or groups thereof (e.g., a small-sample estimator and a large-sample estimator). Specifically, for sample-size changepoints m_1, m_2, dots, m_l-1, one could construct a piecewise neural Bayes estimator,","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"hatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*)\n=\nbegincases\nhatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*_tildem_1) m leq m_1\nhatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*_tildem_2) m_1 m leq m_2\nquad vdots \nhatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*_tildem_l) m m_l-1\nendcases","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"where, here, boldsymbolgamma^* equiv (boldsymbolgamma^*_tildem_1 dots boldsymbolgamma^*_tildem_l-1), and where boldsymbolgamma^*_tildem are the neural-network parameters optimised for sample size tildem chosen so that hatboldsymboltheta(cdot boldsymbolgamma^*_tildem) is near-optimal over the range of sample sizes in which it is applied. This approach works well in practice, and it is less computationally burdensome than it first appears when used in conjunction with pre-training.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Piecewise neural estimators are implemented with the struct, PiecewiseEstimator, and their construction is facilitated with trainx. 
","category":"page"},{"location":"workflow/advancedusage/#Training-with-variable-sample-sizes","page":"Advanced usage","title":"Training with variable sample sizes","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Alternatively, one could treat the sample size as a random variable, M, with support over a set of positive integers, mathcalM, in which case, for the neural Bayes estimator, the risk function becomes","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"R(boldsymboltheta hatboldsymboltheta(cdot boldsymbolgamma))\nequiv\nsum_m in mathcalM\nP(M=m)left(int_mathcalS^m L(boldsymboltheta hatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma))p(boldsymbolZ^(m) mid boldsymboltheta) textd boldsymbolZ^(m)right)","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"This approach does not materially alter the workflow, except that one must also sample the number of replicates before simulating the data.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Below we define data simulation for a range of sample sizes (i.e., a range of integers) under a discrete uniform prior for M, the random variable corresponding to sample size.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"function simulate(parameters, m::R) where {R <: AbstractRange{I}} where I <: Integer\n\n\t## Number of parameter vectors stored in parameters\n\tK = size(parameters, 2)\n\n\t## Generate K sample sizes from the prior distribution for M\n\tm̃ = rand(m, K)\n\n\t## Pseudocode for data simulation\n\tZ = [ for k ∈ 1:K]\n\n\treturn Z\nend","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Then, setting the argument m in train to be an integer range (e.g., 1:30) will train the neural estimator with the given variable sample sizes.","category":"page"},{"location":"workflow/advancedusage/#Combining-neural-and-expert-summary-statistics","page":"Advanced usage","title":"Combining neural and expert summary statistics","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"See DeepSetExpert.","category":"page"},{"location":"workflow/advancedusage/#Loading-previously-saved-neural-estimators","page":"Advanced usage","title":"Loading previously saved neural estimators","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"As training is by far the most computationally demanding part of the workflow, one typically trains an estimator and then saves it for later use. 
More specifically, one usually saves the parameters of the neural estimator (e.g., the weights and biases of the neural networks); then, to load the neural estimator at a later time, one initialises an estimator with the same architecture used during training, and then loads the saved parameters into this estimator.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"If the argument savepath is specified, train automatically saves the neural estimator's parameters; to load them, one may use the following code, or similar:","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"using Flux: loadparams!\n\nθ̂ = architecture()\nloadparams!(θ̂, loadbestweights(savepath))","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Above, architecture() is a user-defined function that returns a neural estimator with the same architecture as the estimator that we wish to load, but with randomly initialised parameters, and the function loadparams! loads the parameters of the best (as determined by loadbestweights) neural estimator saved in savepath.","category":"page"},{"location":"workflow/overview/#Overview","page":"Overview","title":"Overview","text":"","category":"section"},{"location":"workflow/overview/","page":"Overview","title":"Overview","text":"To develop a neural estimator with NeuralEstimators.jl,","category":"page"},{"location":"workflow/overview/","page":"Overview","title":"Overview","text":"Sample parameters from the prior distribution: the parameters are stored as p times K matrices, with p the number of parameters in the model and K the number of samples (i.e., parameter configurations) in the given parameter set (i.e., training, validation, or test set).\nSimulate data from the assumed model over the parameter sets generated above. These data are stored as a Vector{A}, with each element of the vector associated with one parameter configuration, and where A depends on the representation of the neural estimator (e.g., an Array for CNN-based estimators, or a GNNGraph for GNN-based estimators).\nInitialise a neural network, θ̂, that will be trained into a neural Bayes estimator. \nTrain θ̂ under the chosen loss function using train.\nAssess θ̂ using assess. The resulting object of class Assessment can be used to assess the estimator with respect to the entire parameter space by estimating the risk function with risk, or used to plot the empirical sampling distribution of the estimator.\nApply θ̂ to observed data (once its performance has been checked in the above step). Bootstrap-based uncertainty quantification is facilitated with bootstrap and interval. ","category":"page"},{"location":"workflow/overview/","page":"Overview","title":"Overview","text":"See the Examples and, once familiar with the basic workflow, see Advanced usage for practical considerations on how to most effectively construct neural estimators.","category":"page"},{"location":"API/core/#Core","page":"Core","title":"Core","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"This page documents the functions that are central to the workflow of NeuralEstimators. 
Its organisation reflects the order in which these functions appear in a standard implementation; that is, from sampling parameters from the prior distribution, to uncertainty quantification of the final estimates via bootstrapping.","category":"page"},{"location":"API/core/#Sampling-parameters","page":"Core","title":"Sampling parameters","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"Parameters sampled from the prior distribution Omega(cdot) are stored as a p times K matrix, where p is the number of parameters in the model and K is the number of parameter vectors sampled from the prior distribution.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulated (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion. ","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"ParameterConfigurations","category":"page"},{"location":"API/core/#NeuralEstimators.ParameterConfigurations","page":"Core","title":"NeuralEstimators.ParameterConfigurations","text":"ParameterConfigurations\n\nAn abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.\n\nThe user-defined type must have a field θ that stores the p × K matrix of parameters, where p is the number of parameters in the model and K is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.\n\nSee subsetparameters for the generic function for subsetting these objects.\n\nExamples\n\nstruct P <: ParameterConfigurations\n\tθ\n\t# other expensive intermediate objects...\nend\n\n\n\n\n\n","category":"type"},{"location":"API/core/#Simulating-data","page":"Core","title":"Simulating data","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define the model via simulated data. 
The user may provide simulated data directly, or provide a function that simulates data from the model.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"The data should be stored as a Vector{A}, where each element of the vector is associated with one parameter configuration, and where A depends on the architecture of the neural estimator.","category":"page"},{"location":"API/core/#Types-of-estimators","page":"Core","title":"Types of estimators","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"See also Architectures and activations functions that are often used when constructing neural estimators.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"NeuralEstimator\n\nPointEstimator\n\nIntervalEstimator\n\nQuantileEstimator\n\nPiecewiseEstimator","category":"page"},{"location":"API/core/#NeuralEstimators.NeuralEstimator","page":"Core","title":"NeuralEstimators.NeuralEstimator","text":"NeuralEstimator\n\nAn abstract supertype for neural estimators.\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.PointEstimator","page":"Core","title":"NeuralEstimators.PointEstimator","text":"PointEstimator(arch)\n\nA simple point estimator, that is, a mapping from the sample space to the parameter space, defined by the given architecture arch.\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.IntervalEstimator","page":"Core","title":"NeuralEstimators.IntervalEstimator","text":"IntervalEstimator(arch_lower, arch_upper)\nIntervalEstimator(arch)\n\nA neural estimator that produces credible intervals constructed as,\n\nl(Z) l(Z) + mathrmexp(u(Z))\n\nwhere l() and u() are the neural networks arch_lower and arch_upper, both of which should transform data into p-dimensional vectors, where p is the number of parameters in the model. 
If only a single neural network architecture arch is provided, it will be used for both arch_lower and arch_upper.\n\nInternally, the output from arch_lower and arch_upper are concatenated, so that IntervalEstimator objects transform data into 2p-dimensional vectors.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\n# Generate some toy data\nn = 2 # bivariate data\nm = 10 # number of independent replicates\nZ = rand(n, m)\n\n# Create an architecture\np = 3 # parameters in the model\nw = 8 # width of each layer\nψ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ = Chain(Dense(w, w, relu), Dense(w, p));\narchitecture = DeepSet(ψ, ϕ)\n\n# Initialise the interval estimator\nestimator = IntervalEstimator(architecture)\n\n# Apply the interval estimator\nestimator(Z)\ninterval(estimator, Z, parameter_names = [\"ρ\", \"σ\", \"τ\"])\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.QuantileEstimator","page":"Core","title":"NeuralEstimators.QuantileEstimator","text":"QuantileEstimator()\n\nComing soon: this structure will allow for the simultaneous estimation of an arbitrary number of marginal quantiles of the posterior distribution.\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.PiecewiseEstimator","page":"Core","title":"NeuralEstimators.PiecewiseEstimator","text":"PiecewiseEstimator(estimators, breaks)\n\nCreates a piecewise estimator from a collection of estimators, based on the collection of changepoints, breaks, which should contain one element fewer than the number of estimators.\n\nAny estimator can be included in estimators, including any of the subtypes of NeuralEstimator exported with the package NeuralEstimators (e.g., PointEstimator, IntervalEstimator, etc.).\n\nExamples\n\n# Suppose that we've trained two neural estimators. The first, θ̂₁, is trained\n# for small sample sizes (e.g., m ≤ 30), and the second, `θ̂₂`, is trained for\n# moderate-to-large sample sizes (e.g., m > 30). 
We construct a piecewise\n# estimator with a sample-size changepoint of 30, which dispatches θ̂₁ if m ≤ 30\n# and θ̂₂ if m > 30.\n\nusing NeuralEstimators\nusing Flux\n\nn = 2 # bivariate data\np = 3 # number of parameters in the model\nw = 8 # width of each layer\n\nψ₁ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ₁ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂₁ = DeepSet(ψ₁, ϕ₁)\n\nψ₂ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ₂ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂₂ = DeepSet(ψ₂, ϕ₂)\n\nθ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], [30])\nZ = [rand(n, 1, m) for m ∈ (10, 50)]\nθ̂(Z)\n\n\n\n\n\n","category":"type"},{"location":"API/core/#Training","page":"Core","title":"Training","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making using of the technique known as pre-training.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"train\n\ntrainx","category":"page"},{"location":"API/core/#NeuralEstimators.train","page":"Core","title":"NeuralEstimators.train","text":"train(θ̂, sampler, simulator; )\ntrain(θ̂, θ_train::P, θ_val::P, simulator; ) where {P <: Union{AbstractMatrix, ParameterConfigurations}}\ntrain(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}\n\nTrain a neural estimator with architecture θ̂.\n\nThe methods cater for different forms of on-the-fly simulation. The method that takes functions sampler and simulator for sampling parameters and simulating data, respectively, allows for both the parameters and the data to be simulated on-the-fly. Note that simulator is called as simulator(θ, m), where θ is a set of parameters and m is the sample size (see keyword arguments below). If provided with specific instances of parameters (θ_train and θ_val) or data (Z_train and Z_val), they will be held fixed during training.\n\nIn all methods, the validation set is held fixed to reduce noise when evaluating the validation risk function, which is used to monitor the performance of the estimator during training.\n\nIf the number of replicates in Z_train is a multiple of the number of replicates for each element of Z_val, the training data will be recycled throughout training. For example, if each element of Z_train consists of 50 replicates, and each element of Z_val consists of 10 replicates, the first epoch uses the first 10 replicates in Z_train, the second epoch uses the next 10 replicates, and so on, until the sixth epoch again uses the first 10 replicates. 
Note that this requires the data to be subsettable with the function subsetdata.\n\nKeyword arguments\n\nArguments common to all methods:\n\nloss = mae: the loss function, which should return the average loss when applied to multiple replicates.\nepochs::Integer = 100\nbatchsize::Integer = 32\noptimiser = ADAM(1e-4)\nsavepath::String = \"\": path to save the trained estimator and other information; if savepath is an empty string (default), nothing is saved.\nstopping_epochs::Integer = 5: cease training if the risk doesn't improve in this number of epochs.\nuse_gpu::Bool = true\nverbose::Bool = true\n\nArguments common to train(θ̂, P, simulator) and train(θ̂, θ_train, θ_val, simulator):\n\nm: sample sizes (either an Integer or a collection of Integers).\nepochs_per_Z_refresh::Integer = 1: how often to refresh the training data.\nsimulate_just_in_time::Bool = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.\n\nArguments unique to train(θ̂, P, simulator):\n\nK::Integer = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.\nξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices); if ξ is provided, the parameter sampler is called as sampler(K, ξ).\nepochs_per_θ_refresh::Integer = 1: how often to refresh the training parameters. Must be a multiple of epochs_per_Z_refresh.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\nusing Distributions\nimport NeuralEstimators: simulate\n\n# data simulator\nm = 15\nfunction simulate(θ_set, m)\n\t[Float32.(rand(Normal(θ[1], θ[2]), 1, m)) for θ ∈ eachcol(θ_set)]\nend\n\n# parameter sampler\nK = 10000\nΩ = (μ = Normal(0, 1), σ = Uniform(0.1, 1)) # prior\nstruct sampler\n\tμ\n\tσ\nend\nfunction (s::sampler)(K)\n\tμ = rand(s.μ, K)\n\tσ = rand(s.σ, K)\n\tθ = hcat(μ, σ)'\n\treturn θ\nend\nsmplr = sampler(Ω.μ, Ω.σ)\n\n# architecture\np = length(Ω) # number of parameters in the statistical model\nw = 32 # width of each layer\nψ = Chain(Dense(1, w, relu), Dense(w, w, relu))\nϕ = Chain(Dense(w, w, relu), Dense(w, p))\nθ̂ = DeepSet(ψ, ϕ)\n\n# training: full simulation on-the-fly\nθ̂ = train(θ̂, smplr, simulate, m = m, K = K, epochs = 5)\nθ̂ = train(θ̂, smplr, simulate, m = m, K = K, epochs = 5)\nθ̂ = train(θ̂, smplr, simulate, m = m, K = K, epochs = 10, epochs_per_θ_refresh = 4, epochs_per_Z_refresh = 2)\n\n# training: fixed parameters\nθ_train = smplr(K)\nθ_val = smplr(K ÷ 5)\nθ̂ = train(θ̂, θ_train, θ_val, simulate, m = m, epochs = 5)\nθ̂ = train(θ̂, θ_train, θ_val, simulate, m = m, epochs = 5, epochs_per_Z_refresh = 2)\n\n# training: fixed parameters and fixed data\nZ_train = simulate(θ_train, m)\nZ_val = simulate(θ_val, m)\nθ̂ = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)\n\n\n\n\n\n","category":"function"},{"location":"API/core/#NeuralEstimators.trainx","page":"Core","title":"NeuralEstimators.trainx","text":"trainx(θ̂, P, simulator, M; )\ntrainx(θ̂, θ_train, θ_val, simulator, M; )\ntrainx(θ̂, θ_train, θ_val, Z_train::T, Z_val::T, M; )\ntrainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ) where {V <: AbstractVector{AbstractVector{Any}}}\n\nA wrapper around train to construct neural estimators for different sample sizes.\n\nThe collection M specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. 
For example, if M = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.\n\nThe method for Z_train::T and Z_val::T subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ M. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train and Z_val; hence, it does not need to invoke subsetdata, which can be slow or difficult to define in some cases (e.g., for graphical data).\n\nThe keyword arguments inherit from train, and certain keyword arguments can be given as vectors. For example, if we are training two estimators, we can use a different number of epochs by providing epochs = [e₁, e₂]. Other arguments that allow vectors are batchsize, stopping_epochs, and optimiser.\n\n\n\n\n\n","category":"function"},{"location":"API/core/#Assessing-a-neural-estimator","page":"Core","title":"Assessing a neural estimator","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"assess\n\nAssessment\n\nrisk","category":"page"},{"location":"API/core/#NeuralEstimators.assess","page":"Core","title":"NeuralEstimators.assess","text":"assess(estimators, θ, Z; )\n\nUsing a collection of estimators, compute estimates from data Z simulated based on true parameter vectors stored in θ.\n\nIf Z contains more data sets than parameter vectors, the parameter matrix will be recycled by horizontal concatenation.\n\nThe output is of type Assessment; see ?Assessment for details.\n\nKeyword arguments\n\nestimator_names::Vector{String}: names of the estimators (sensible defaults provided).\nparameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.\nξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices).\nuse_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true.\nuse_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.\nverbose::Bool = true\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\nn = 10 # number of observations in each realisation\np = 4 # number of parameters in the statistical model\n\n# Construct the neural estimator\nw = 32 # width of each layer\nψ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\n\n# Generate testing parameters\nK = 100\nθ = rand(p, K)\n\n# Data for a single sample size\nm = 30\nZ = [rand(n, m) for _ ∈ 1:K];\nassessment = assess([θ̂], θ, Z);\nrisk(assessment)\n\n# Multiple data sets for each parameter vector\nJ = 5\nZ = repeat(Z, J);\nassessment = assess([θ̂], θ, Z);\nrisk(assessment)\n\n# With set-level information\nqₓ = 2\nϕ = Chain(Dense(w + qₓ, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\nx = [rand(qₓ) for _ ∈ eachindex(Z)]\nassessment = assess([θ̂], θ, (Z, x));\nrisk(assessment)\n\n\n\n\n\n","category":"function"},{"location":"API/core/#NeuralEstimators.Assessment","page":"Core","title":"NeuralEstimators.Assessment","text":"Assessment(df::DataFrame, runtime::DataFrame)\n\nA type for storing the output of assess(). The field runtime contains the total time taken for each estimator. 
The field df is a long-form DataFrame with columns:\n\nestimator: the name of the estimator\nparameter: the name of the parameter\ntruth: the true value of the parameter\nestimate: the estimated value of the parameter\nm: the sample size (number of iid replicates)\nk: the index of the parameter vector in the test set\nj: the index of the data set\n\nMultiple Assessment objects can be combined with merge().\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.risk","page":"Core","title":"NeuralEstimators.risk","text":"risk(assessment::Assessment; loss = (x, y) -> abs(x - y), average_over_parameters = true)\n\nComputes a Monte Carlo approximation of the Bayes risk,\n\nr_Omega(hatboldsymboltheta(cdot))\napprox\nfrac1K sum_boldsymboltheta in vartheta frac1J sum_boldsymbolZ in mathcalZ_boldsymboltheta L(boldsymboltheta hatboldsymboltheta(boldsymbolZ))\n\nwhere vartheta denotes a set of K parameter vectors sampled from the prior Omega(cdot) and, for each boldsymboltheta in vartheta, we have J sets of m mutually independent realisations from the model collected in mathcalZ_boldsymboltheta.\n\nKeyword arguments\n\nloss = (x, y) -> abs(x - y): a binary operator (default absolute-error loss).\naverage_over_parameters::Bool = true: if true (default), the loss is averaged over all parameters; otherwise, the loss is averaged over each parameter separately.\naverage_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes m; otherwise, the loss is averaged over each sample size separately.\n\n\n\n\n\n","category":"function"},{"location":"API/core/#Bootstrapping","page":"Core","title":"Bootstrapping","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"bootstrap\n\ninterval","category":"page"},{"location":"API/core/#NeuralEstimators.bootstrap","page":"Core","title":"NeuralEstimators.bootstrap","text":"bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}\nbootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}\nbootstrap(θ̂, Z; B = 400, blocks = nothing)\n\nGenerates B bootstrap estimates from an estimator θ̂.\n\nParametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).\n\nNon-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. 
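As a minimal sketch of this block-bootstrap interface (the estimator below is a toy DeepSet, and the data set Z, its dimensions, and the block labels are illustrative assumptions; the container used for Z should match whatever format the estimator expects):

using NeuralEstimators
using Flux

n, m = 10, 5                              # 10-dimensional response, 5 replicates
w, p = 32, 4                              # layer width and number of parameters
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = DeepSet(ψ, ϕ)

Z = rand(Float32, n, m)                   # a single data set with m = 5 replicates
blocks = [1, 1, 2, 2, 2]                  # first two replicates in block 1, remaining three in block 2
θ̃ = bootstrap(θ̂, Z; B = 100, blocks = blocks)  # p × B matrix of bootstrap estimates
interval(θ̃)                              # p × 2 matrix of lower and upper interval bounds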
The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.\n\nThe keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).\n\nThe return type is a p × B matrix, where p is the number of parameters in the model.\n\n\n\n\n\n","category":"function"},{"location":"API/core/#NeuralEstimators.interval","page":"Core","title":"NeuralEstimators.interval","text":"interval(θ̃, θ̂ = nothing; type::String, probs = [0.05, 0.95], parameter_names)\n\nCompute a confidence interval using the p × B matrix of bootstrap samples, θ̃, where p is the number of parameters in the model.\n\nIf type = \"quantile\", the interval is constructed by simply taking the quantiles of θ̃, and if type = \"reverse-quantile\", the so-called reverse-quantile method is used. In both cases, the quantile levels are controlled by the argument probs.\n\nThe rows can be named with a vector of strings parameter_names.\n\nThe return type is a p × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval.\n\nExamples\n\nusing NeuralEstimators\np = 3\nB = 50\nθ̃ = rand(p, B)\nθ̂ = rand(p)\ninterval(θ̃)\ninterval(θ̃, θ̂, type = \"basic\")\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#Miscellaneous","page":"Miscellaneous","title":"Miscellaneous","text":"","category":"section"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"Order = [:type, :function]\nPages = [\"utility.md\"]","category":"page"},{"location":"API/utility/#Core","page":"Miscellaneous","title":"Core","text":"","category":"section"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"These functions can appear during the core workflow, and may need to be overloaded in some applications.","category":"page"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"numberreplicates\n\nsubsetdata\n\nsubsetparameters","category":"page"},{"location":"API/utility/#NeuralEstimators.numberreplicates","page":"Miscellaneous","title":"NeuralEstimators.numberreplicates","text":"numberofreplicates(Z)\n\nGeneric function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.subsetdata","page":"Miscellaneous","title":"NeuralEstimators.subsetdata","text":"Generic function for subsetting replicates from a data set. Default methods are:\n\nsubsetdata(Z::A, m) where {A <: AbstractArray{T, N}} where {T, N}\nsubsetdata(Z::G, m) where {G <: AbstractGraph}\n\nNote that subsetdata is slow for graphical data, and one should consider using a method of train that does not require the data to be subsetted when working with graphical data: use numberreplicates to check that the training and validation data sets are equally replicated, which prevents the invocation of subsetdata. 
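As a minimal sketch of such a check (assuming Z_train and Z_val are vectors of pre-simulated data sets, e.g., batched GNNGraph objects), one could confirm that every data set carries the same number of replicates before calling train:

# numberreplicates is broadcast over the individual data sets; if all counts agree,
# train(θ̂, θ_train, θ_val, Z_train, Z_val) never needs to invoke subsetdata
m_train = numberreplicates.(Z_train)
m_val   = numberreplicates.(Z_val)
@assert length(unique(vcat(m_train, m_val))) == 1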
Note also that subsetdata only applies to vectors of batched graphs.\n\nIf the user is working with data that is not covered by the default methods, simply overload subsetdata with the appropriate type for Z.\n\nExamples\n\nusing NeuralEstimators\nusing GraphNeuralNetworks\nusing Flux: batch\n\nn = 5 # number of observations in each realisation\nm = 6 # number of replicates for each parameter vector\nd = 1 # dimension of the response variable\nK = 2 # number of parameter vectors\n\n# Array data\nZ = [rand(n, d, m) for k ∈ 1:K]\nsubsetdata(Z, 1:3) # extract first 3 replicates for each parameter vector\n\n# Graphical data\ne = 8 # number of edges\nZ = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]\nsubsetdata(Z, 1:3) # extract first 3 replicates for each parameter vector\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.subsetparameters","page":"Miscellaneous","title":"NeuralEstimators.subsetparameters","text":"subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}\nsubsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}\n\nSubset parameters using a collection of indices.\n\nArrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#Utility-functions","page":"Miscellaneous","title":"Utility functions","text":"","category":"section"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"adjacencymatrix\n\ncontainertype\n\nestimateinbatches\n\nexpandgrid\n\nloadbestweights\n\nstackarrays\n\nvectotril","category":"page"},{"location":"API/utility/#NeuralEstimators.adjacencymatrix","page":"Miscellaneous","title":"NeuralEstimators.adjacencymatrix","text":"adjacencymatrix(M::Matrix, k::Integer)\nadjacencymatrix(M::Matrix, r::Float)\n\nComputes a spatially weighted adjacency matrix from M based on either the k nearest neighbours of each location, or a fixed spatial radius of r units.\n\nIf M is a square matrix, it is treated as a distance matrix; otherwise, it should be an n x d matrix, where n is the number of spatial sample locations and d is the spatial dimension (typically d = 2).\n\nExamples\n\nusing NeuralEstimators\nusing Distances\n\nn = 100\nd = 2\nS = rand(n, d)\nk = 5\nr = 0.3\n\n# Memory efficient constructors (avoids constructing the full distance matrix D)\nadjacencymatrix(S, k)\nadjacencymatrix(S, r)\n\n# Construct from full distance matrix D\nD = pairwise(Euclidean(), S, S, dims = 1)\nadjacencymatrix(D, k)\nadjacencymatrix(D, r)\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.containertype","page":"Miscellaneous","title":"NeuralEstimators.containertype","text":"containertype(A::Type)\ncontainertype(::Type{A}) where A <: SubArray\ncontainertype(a::A) where A\n\nReturns the container type of its argument.\n\nIf given a SubArray, returns the container type of the parent array.\n\nExamples\n\na = rand(3, 4)\ncontainertype(a)\ncontainertype(typeof(a))\n[containertype(x) for x ∈ eachcol(a)]\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.estimateinbatches","page":"Miscellaneous","title":"NeuralEstimators.estimateinbatches","text":"estimateinbatches(θ̂, z; batchsize::Integer = 32, use_gpu::Bool = true)\n\nApply the estimator θ̂ on minibatches of z of 
size batchsize, to avoid memory issues that can occur when z is very large.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.expandgrid","page":"Miscellaneous","title":"NeuralEstimators.expandgrid","text":"expandgrid(xs, ys)\n\nSame as expand.grid() in R, but currently caters for two dimensions only.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.loadbestweights","page":"Miscellaneous","title":"NeuralEstimators.loadbestweights","text":"loadbestweights(path::String)\n\nGiven a path to a training run containing neural networks saved with names \"network_epochx.bson\" and an object saved as \"loss_per_epoch.bson\", returns the weights of the best network (measured by validation loss).\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.stackarrays","page":"Miscellaneous","title":"NeuralEstimators.stackarrays","text":"stackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}\n\nStack a vector of arrays v along the last dimension of each array, optionally merging the final dimension of the stacked array.\n\nThe arrays must be of the same size for the first N-1 dimensions. However, if merge = true, the size of the final dimension can vary.\n\nExamples\n\n# Vector containing arrays of the same size:\nZ = [rand(2, 3, m) for m ∈ (1, 1)];\nstackarrays(Z)\nstackarrays(Z, merge = false)\n\n# Vector containing arrays with differing final dimension size:\nZ = [rand(2, 3, m) for m ∈ (1, 2)];\nstackarrays(Z)\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.vectotril","page":"Miscellaneous","title":"NeuralEstimators.vectotril","text":"vectotril(v; strict = false)\nvectotriu(v; strict = false)\n\nConverts a vector v of length d(d+1)2 (a triangular number) into a d d lower or upper triangular matrix.\n\nIf strict = true, the matrix will be strictly lower or upper triangular, that is, a (d+1) (d+1) triangular matrix with zero diagonal.\n\nNote that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v is a GPU array. Note also that the return type is not of type Triangular matrix (i.e., the zeros are materialised) since Triangular matrices are not always compatible with other GPU operations.\n\nExamples\n\nusing NeuralEstimators\n\nd = 4\nn = d*(d+1)÷2\nv = collect(range(1, n))\nvectotril(v)\nvectotriu(v)\nvectotril(v; strict = true)\nvectotriu(v; strict = true)\n\n\n\n\n\n","category":"function"},{"location":"API/loss/#Loss-functions","page":"Loss functions","title":"Loss functions","text":"","category":"section"},{"location":"API/loss/","page":"Loss functions","title":"Loss functions","text":"In addition to the standard loss functions provided by Flux (e.g., mae, mse, etc.), NeuralEstimators provides the following loss functions.","category":"page"},{"location":"API/loss/","page":"Loss functions","title":"Loss functions","text":"kpowerloss\n\nquantileloss\n\nintervalscore","category":"page"},{"location":"API/loss/#NeuralEstimators.kpowerloss","page":"Loss functions","title":"NeuralEstimators.kpowerloss","text":"kpowerloss(θ̂, y, k; agg = mean, safeorigin = true, ϵ = 0.1)\n\nFor k ∈ (0, ∞), the k-th power absolute-distance loss,\n\nL(θ θ) = θ - θᵏ\n\ncontains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0).\n\nIt is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1. 
It is quasiconvex for all k > 0.\n\nIf safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.\n\n\n\n\n\n","category":"function"},{"location":"API/loss/#NeuralEstimators.quantileloss","page":"Loss functions","title":"NeuralEstimators.quantileloss","text":"quantileloss(θ̂, θ, q; agg = mean)\nquantileloss(θ̂, θ, q::V; agg = mean) where {T, V <: AbstractVector{T}}\n\nThe asymmetric loss function whose minimiser is the qth posterior quantile; namely,\n\nL(θ θ q) = (θ - θ)(𝕀(θ - θ 0) - q)\n\nwhere q ∈ (0, 1) and 𝕀(⋅) is the indicator function.\n\nThe method that takes q as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be pr, where p is the number of parameters: then, q should be an r-vector.\n\nFor further discussion on this loss function, see Equation (7) of Cressie, N. (2022), \"Decisions, decisions, decisions in an uncertain environment\", arXiv:2209.13157.\n\nExamples\n\np = 1\nK = 10\nθ = rand(p, K)\nθ̂ = rand(p, K)\nquantileloss(θ̂, θ, 0.1)\n\nθ̂ = rand(3p, K)\nquantileloss(θ̂, θ, [0.1, 0.5, 0.9])\n\np = 2\nθ = rand(p, K)\nθ̂ = rand(p, K)\nquantileloss(θ̂, θ, 0.1)\n\nθ̂ = rand(3p, K)\nquantileloss(θ̂, θ, [0.1, 0.5, 0.9])\n\n\n\n\n\n","category":"function"},{"location":"API/loss/#NeuralEstimators.intervalscore","page":"Loss functions","title":"NeuralEstimators.intervalscore","text":"intervalscore(l, u, θ, α; agg = mean)\nintervalscore(θ̂, θ, α; agg = mean)\n\nGiven a 100×(1-α)% confidence interval [l, u] with true value θ, the interval score is defined by\n\nS(l u θ α) = (u - l) + 2α¹(l - θ)𝕀(θ l) + 2α¹(θ - u)𝕀(θ u)\n\nwhere α ∈ (0, 1) and 𝕀(⋅) is the indicator function.\n\nThe method that takes a single value θ̂ assumes that θ̂ is a matrix with 2p rows, where p is the number of parameters in the statistical model. Then, the first and second set of p rows will be used as l and u, respectively.\n\nFor further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. 
(2007), \"Strictly proper scoring rules, prediction, and estimation\", Journal of the American statistical Association, 102, 359–378.\n\n\n\n\n\n","category":"function"},{"location":"API/architectures/#Architectures-and-activations-functions","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"","category":"section"},{"location":"API/architectures/#Index","page":"Architectures and activations functions","title":"Index","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"Order = [:type, :function]\nPages = [\"architectures.md\"]","category":"page"},{"location":"API/architectures/#Architectures","page":"Architectures and activations functions","title":"Architectures","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"Although the user is free to construct their neural estimator however they see fit, NeuralEstimators provides several useful architectures described below.","category":"page"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"DeepSet\n\nDeepSetExpert\n\nGNN","category":"page"},{"location":"API/architectures/#NeuralEstimators.DeepSet","page":"Architectures and activations functions","title":"NeuralEstimators.DeepSet","text":"DeepSet(ψ, ϕ, a)\nDeepSet(ψ, ϕ; a::String = \"mean\")\n\nThe Deep Set representation,\n\nθ(𝐙) = ϕ(𝐓(𝐙))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nwhere 𝐙 ≡ (𝐙₁', …, 𝐙ₘ')' are independent replicates from the model, ψ and ϕ are neural networks, and a is a permutation-invariant aggregation function.\n\nTo make the architecture agnostic to the sample size m, the aggregation function a must aggregate over the replicates. It can be specified as a positional argument of type Function, or as a keyword argument with permissible values \"mean\", \"sum\", and \"logsumexp\".\n\nDeepSet objects act on data stored as Vector{A}, where each element of the vector is associated with one parameter vector (i.e., one set of independent replicates), and where A depends on the form of the data and the chosen architecture for ψ. As a rule of thumb, when the data are stored as an array, the replicates are stored in the final dimension of the array. (This is usually the 'batch' dimension, but batching with DeepSets is done at the set level, i.e., sets of replicates are batched together.) 
For example, with gridded spatial data and ψ a CNN, A should be a 4-dimensional array, with the replicates stored in the 4ᵗʰ dimension.\n\nNote that, internally, data stored as Vector{Arrays} are first concatenated along the replicates dimension before being passed into the inner neural network ψ; this means that ψ is applied to a single large array rather than many small arrays, which can substantially improve computational efficiency, particularly on the GPU.\n\nSet-level information, 𝐱, that is not a function of the data can be passed directly into the outer network ϕ in the following manner,\n\nθ(𝐙) = ϕ((𝐓(𝐙) 𝐱))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nThis is done by providing a Tuple{Vector{A}, Vector{B}}, where the first element of the tuple contains the vector of data sets and the second element contains the vector of set-level information.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\nn = 10 # number of observations in each realisation\np = 4 # number of parameters in the statistical model\n\n# Construct the neural estimator\nw = 32 # width of each layer\nψ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\n\n# Apply the estimator\nZ₁ = rand(n, 3); # single set of 3 realisations\nZ₂ = [rand(n, m) for m ∈ (3, 3)]; # two sets each containing 3 realisations\nZ₃ = [rand(n, m) for m ∈ (3, 4)]; # two sets containing 3 and 4 realisations\nθ̂(Z₁)\nθ̂(Z₂)\nθ̂(Z₃)\n\n# Repeat the above but with set-level information:\nqₓ = 2\nϕ = Chain(Dense(w + qₓ, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\nx₁ = rand(qₓ)\nx₂ = [rand(qₓ) for _ ∈ eachindex(Z₂)]\nθ̂((Z₁, x₁))\nθ̂((Z₂, x₂))\nθ̂((Z₃, x₂))\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.DeepSetExpert","page":"Architectures and activations functions","title":"NeuralEstimators.DeepSetExpert","text":"DeepSetExpert(ψ, ϕ, S, a)\nDeepSetExpert(ψ, ϕ, S; a::String = \"mean\")\nDeepSetExpert(deepset::DeepSet, ϕ, S)\n\nIdentical to DeepSet, but with additional expert summary statistics,\n\nθ(𝐙) = ϕ((𝐓(𝐙) 𝐒(𝐙)))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nwhere S is a function that returns a vector of expert summary statistics.\n\nThe constructor DeepSetExpert(deepset::DeepSet, ϕ, S) inherits ψ and a from deepset.\n\nSimilarly to DeepSet, set-level information can be incorporated by passing a Tuple, in which case we have\n\nθ(𝐙) = ϕ((𝐓(𝐙) 𝐒(𝐙) 𝐱))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\nn = 10 # number of observations in each realisation\np = 4 # number of parameters in the statistical model\n\n# Construct the neural estimator\nS = samplesize\nqₛ = 1\nqₜ = 32\nw = 16\nψ = Chain(Dense(n, w, relu), Dense(w, qₜ, relu));\nϕ = Chain(Dense(qₜ + qₛ, w), Dense(w, p));\nθ̂ = DeepSetExpert(ψ, ϕ, S)\n\n# Apply the estimator\nZ₁ = rand(n, 3); # single set\nZ₂ = [rand(n, m) for m ∈ (3, 4)]; # two sets\nθ̂(Z₁)\nθ̂(Z₂)\n\n# Repeat the above but with set-level information:\nqₓ = 2\nϕ = Chain(Dense(qₜ + qₛ + qₓ, w, relu), Dense(w, p));\nθ̂ = DeepSetExpert(ψ, ϕ, S)\nx₁ = rand(qₓ)\nx₂ = [rand(qₓ) for _ ∈ eachindex(Z₂)]\nθ̂((Z₁, x₁))\nθ̂((Z₂, x₂))\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.GNN","page":"Architectures and activations functions","title":"NeuralEstimators.GNN","text":"GNN(propagation, readout, ϕ, a)\nGNN(propagation, readout, ϕ, a::String = \"mean\")\n\nA graph neural network (GNN) designed for parameter point estimation.\n\nThe propagation module transforms graphical input data into a set of 
hidden-feature graphs; the readout module aggregates these feature graphs into a single hidden feature vector of fixed length; the function a(⋅) is a permutation-invariant aggregation function, and ϕ is a neural network.\n\nThe data should be stored as a GNNGraph or AbstractVector{GNNGraph}, where each graph is associated with a single parameter vector. The graphs may contain sub-graphs corresponding to independent replicates from the model.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\nusing Flux: batch\nusing GraphNeuralNetworks\nusing Statistics: mean\n\n# Propagation module\nd = 1 # dimension of response variable\nnh = 32 # dimension of node feature vectors\npropagation = GNNChain(GraphConv(d => nh), GraphConv(nh => nh), GraphConv(nh => nh))\n\n# Readout module (using \"universal pooling\")\nnt = 64 # dimension of the summary vector for each node\nno = 128 # dimension of the final summary vector for each graph\nreadout = UniversalPool(Dense(nh, nt), Dense(nt, nt))\n\n# Alternative readout module (using the elementwise average)\n# readout = GlobalPool(mean); no = nh\n\n# Mapping module\np = 3 # number of parameters in the statistical model\nw = 64 # width of layers used for the mapping network ϕ\nϕ = Chain(Dense(no, w, relu), Dense(w, w, relu), Dense(w, p))\n\n# Construct the estimator\nθ̂ = GNN(propagation, readout, ϕ)\n\n# Apply the estimator to:\n# \t1. a single graph,\n# \t2. a single graph with sub-graphs (corresponding to independent replicates), and\n# \t3. a vector of graphs (corresponding to multiple spatial data sets).\ng₁ = rand_graph(11, 30, ndata=rand(d, 11))\ng₂ = rand_graph(13, 40, ndata=rand(d, 13))\ng₃ = batch([g₁, g₂])\nθ̂(g₁)\nθ̂(g₃)\nθ̂([g₁, g₂, g₃])\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#Layers","page":"Architectures and activations functions","title":"Layers","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"WeightedGraphConv\n\nUniversalPool","category":"page"},{"location":"API/architectures/#NeuralEstimators.WeightedGraphConv","page":"Architectures and activations functions","title":"NeuralEstimators.WeightedGraphConv","text":"WeightedGraphConv(in => out, σ=identity; aggr=+, bias=true, init=glorot_uniform)\n\nSame as regular GraphConv layer, but where the neighbours of a node are weighted by their spatial distance to that node.\n\nArguments\n\nin: The dimension of input features.\nout: The dimension of output features.\nσ: Activation function.\naggr: Aggregation operator for the incoming messages (e.g. +, *, max, min, and mean).\nbias: Add learnable bias.\ninit: Weights' initializer.\n\nExamples\n\nusing NeuralEstimators\nusing GraphNeuralNetworks\n\n# Construct a spatially-weighted adjacency matrix based on k-nearest neighbours\n# with k = 5, and convert to a graph with random (uncorrelated) dummy data:\nn = 100\nS = rand(n, 2)\nd = 1 # dimension of each observation (univariate data here)\nA = adjacencymatrix(S, 5)\nZ = GNNGraph(A, ndata = rand(d, n))\n\n# Construct the layer and apply it to the data to generate convolved features\nlayer = WeightedGraphConv(d => 16)\nlayer(Z)\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.UniversalPool","page":"Architectures and activations functions","title":"NeuralEstimators.UniversalPool","text":"UniversalPool(ψ, ϕ)\n\nPooling layer (i.e., readout layer) from the paper 'Universal Readout for Graph Convolutional Neural Networks'. 
It takes the form,\n\nmathbfV = ϕ(G¹ sum_sin G ψ(mathbfh_s))\n\nwhere mathbfV denotes the summary vector for graph G, mathbfh_s denotes the vector of hidden features for node s in G, and ψ and ϕ are dense neural networks.\n\nSee also the pooling layers available from GraphNeuralNetworks.jl.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\nusing GraphNeuralNetworks\nusing Graphs: random_regular_graph\n\n# Construct an input graph G\nn_h = 16 # dimension of each feature node\nn_nodes = 10\nn_edges = 4\nG = GNNGraph(random_regular_graph(n_nodes, n_edges), ndata = rand(n_h, n_nodes))\n\n# Construct the pooling layer\nn_t = 32 # dimension of the summary vector for each node\nn_v = 64 # dimension of the final summary vector V\nψ = Dense(n_h, n_t)\nϕ = Dense(n_t, n_v)\npool = UniversalPool(ψ, ϕ)\n\n# Apply the pooling layer\npool(G)\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#Output-activation-functions","page":"Architectures and activations functions","title":"Output activation functions","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"These layers can be used at the end of an architecture to ensure that the neural estimator provides valid parameters.","category":"page"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"Compress\n\nCholeskyCovariance\n\nCovarianceMatrix\n\nCorrelationMatrix\n\nSplitApply","category":"page"},{"location":"API/architectures/#NeuralEstimators.Compress","page":"Architectures and activations functions","title":"NeuralEstimators.Compress","text":"Compress(a, b, k = 1)\n\nLayer that compresses its input to be within the range a and b, where each element of a is less than the corresponding element of b.\n\nThe layer uses a logistic function,\n\nl(θ) = a + fracb - a1 + e^-kθ\n\nwhere the arguments a and b together combine to shift and scale the logistic function to the desired range, and the growth rate k controls the steepness of the curve.\n\nThe logistic function given here contains an additional parameter, θ₀, which is the input value corresponding to the functions midpoint. In Compress, we fix θ₀ = 0, since the output of a randomly initialised neural network is typically around zero.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\na = [25, 0.5, -pi/2]\nb = [500, 2.5, 0]\np = length(a)\nK = 100\nθ = randn(p, K)\nl = Compress(a, b)\nl(θ)\n\nn = 20\nθ̂ = Chain(Dense(n, p), l)\nZ = randn(n, K)\nθ̂(Z)\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.CholeskyCovariance","page":"Architectures and activations functions","title":"NeuralEstimators.CholeskyCovariance","text":"CholeskyCovariance(d)\n\nLayer for constructing the parameters of the lower Cholesky factor associated with an unconstrained d×d covariance matrix.\n\nThe layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension, but with d rows constrained to be positive (corresponding to the diagonal elements of the Cholesky factor) and the remaining rows unconstrained.\n\nThe ordering of the transformed Matrix aligns with Julia's column-major ordering. 
For example, when modelling the Cholesky factor,\n\nbeginbmatrix\nL₁₁ \nL₂₁ L₂₂ \nL₃₁ L₃₂ L₃₃ \nendbmatrix\n\nthe rows of the matrix returned by a CholeskyCovariance layer will be ordered as\n\nL₁₁ L₂₁ L₃₁ L₂₂ L₃₂ L₃₃\n\nwhich means that the output can easily be transformed into the implied Cholesky factors using vectotril.\n\nExamples\n\nusing NeuralEstimators\n\nd = 4\np = d*(d+1)÷2\nθ = randn(p, 50)\nl = CholeskyCovariance(d)\nθ = l(θ) # returns matrix (used for Flux networks)\nL = [vectotril(y) for y ∈ eachcol(θ)] # convert matrix to Cholesky factors\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.CovarianceMatrix","page":"Architectures and activations functions","title":"NeuralEstimators.CovarianceMatrix","text":"CovarianceMatrix(d)\n\nLayer for constructing the parameters of an unconstrained d×d covariance matrix.\n\nThe layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension.\n\nInternally, it uses a CholeskyCovariance layer to construct a valid Cholesky factor 𝐋, and then extracts the lower triangle from the positive-definite covariance matrix 𝚺 = 𝐋𝐋'. The lower triangle is extracted and vectorised in line with Julia's column-major ordering. For example, when modelling the covariance matrix,\n\nbeginbmatrix\nΣ₁₁ Σ₁₂ Σ₁₃ \nΣ₂₁ Σ₂₂ Σ₂₃ \nΣ₃₁ Σ₃₂ Σ₃₃ \nendbmatrix\n\nthe rows of the matrix returned by a CovarianceMatrix layer will be ordered as\n\nΣ₁₁ Σ₂₁ Σ₃₁ Σ₂₂ Σ₃₂ Σ₃₃\n\nwhich means that the output can easily be transformed into the implied covariance matrices using vectotril and Symmetric.\n\nExamples\n\nusing NeuralEstimators\nusing LinearAlgebra\n\nd = 4\np = d*(d+1)÷2\nθ = randn(p, 50)\n\nl = CovarianceMatrix(d)\nθ = l(θ)\nΣ = [Symmetric(cpu(vectotril(y)), :L) for y ∈ eachcol(θ)]\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.CorrelationMatrix","page":"Architectures and activations functions","title":"NeuralEstimators.CorrelationMatrix","text":"CorrelationMatrix(d)\n\nLayer for constructing the parameters of an unconstrained d×d correlation matrix.\n\nThe layer transforms a Matrix with d(d-1)÷2 rows into a Matrix with the same dimension.\n\nInternally, the layer uses the algorithm described here and here to construct a valid Cholesky factor 𝐋, and then extracts the strict lower triangle from the positive-definite correlation matrix 𝐑 = 𝐋𝐋'. The strict lower triangle is extracted and vectorised in line with Julia's column-major ordering. 
For example, when modelling the correlation matrix,\n\nbeginbmatrix\n1 R₁₂ R₁₃ \nR₂₁ 1 R₂₃\nR₃₁ R₃₂ 1\nendbmatrix\n\nthe rows of the matrix returned by a CorrelationMatrix layer will be ordered as\n\nR₂₁ R₃₁ R₃₂\n\nwhich means that the output can easily be transformed into the implied correlation matrices using the strict variant of vectotril and Symmetric.\n\nExamples\n\nusing NeuralEstimators\nusing LinearAlgebra\n\nd = 4\np = d*(d-1)÷2\nl = CorrelationMatrix(d)\nθ = randn(p, 50)\n\n# returns a matrix of parameters\nθ = l(θ)\n\n# convert matrix of parameters to implied correlation matrices\nR = map(eachcol(θ)) do y\n\tR = Symmetric(cpu(vectotril(y, strict = true)), :L)\n\tR[diagind(R)] .= 1\n\tR\nend\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.SplitApply","page":"Architectures and activations functions","title":"NeuralEstimators.SplitApply","text":"SplitApply(layers, indices)\n\nSplits an array into multiple sub-arrays by subsetting the rows using the collection of indices, and then applies each layer in layers to the corresponding sub-array.\n\nSpecifically, for each i = 1, …, n, with n the number of layers, SplitApply(x) performs layers[i](x[indices[i], :]), and then vertically concatenates the resulting transformed arrays.\n\nExamples\n\nusing NeuralEstimators\n\nd = 4\nK = 50\np₁ = 2 # number of non-covariance matrix parameters\np₂ = d*(d+1)÷2 # number of covariance matrix parameters\np = p₁ + p₂\n\na = [0.1, 4]\nb = [0.9, 9]\nl₁ = Compress(a, b)\nl₂ = CovarianceMatrix(d)\nl = SplitApply([l₁, l₂], [1:p₁, p₁+1:p])\n\nθ = randn(p, K)\nl(θ)\n\n\n\n\n\n","category":"type"},{"location":"#NeuralEstimators","page":"NeuralEstimators","title":"NeuralEstimators","text":"","category":"section"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"Neural estimators are neural networks that transform data into parameter point estimates, and they are a promising recent approach to inference. They are likelihood free, substantially faster than classical methods, and can be designed to be approximate Bayes estimators. Uncertainty quantification with neural estimators is also straightforward through the bootstrap distribution, which is essentially available \"for free\" with a neural estimator.","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"The package NeuralEstimators facilitates the development of neural estimators in a user-friendly manner. It caters for arbitrary models by having the user implicitly define their model via simulated data. This makes the development of neural estimators particularly straightforward for models with existing implementations (possibly in other programming languages, e.g., R or python). 
A convenient interface for R users is available here.","category":"page"},{"location":"#Getting-started","page":"NeuralEstimators","title":"Getting started","text":"","category":"section"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"Install NeuralEstimators from Julia's package manager using the following command inside Julia:","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"using Pkg; Pkg.add(\"NeuralEstimators\")","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"Once familiar with the details of the Theoretical framework, see the Examples.","category":"page"},{"location":"#Supporting-and-citing","page":"NeuralEstimators","title":"Supporting and citing","text":"","category":"section"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"This software was developed as part of academic research. If you would like to support it, please star the repository. If you use NeuralEstimators in your research or other activities, please use the following citation.","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"@article{SZH_2023_neural_Bayes_estimators,\n\tauthor = {Sainsbury-Dale, Matthew and Zammit-Mangion, Andrew and Huser, Raphaël},\n\ttitle = {Likelihood-Free Parameter Estimation with Neural {B}ayes Estimators},\n\tjournal = {The American Statistician},\n\tyear = {2023},\n\tvolume = {to appear}\n}","category":"page"}] +[{"location":"API/simulation/#Model-specific-functions","page":"Model-specific functions","title":"Model-specific functions","text":"","category":"section"},{"location":"API/simulation/#Data-simulators","page":"Model-specific functions","title":"Data simulators","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"The philosophy of NeuralEstimators is to cater for arbitrary statistical models by having the user define their statistical model implicitly through simulated data. However, the following functions have been included as they may be helpful to others, and their source code provide an example for how a user could formulate code for their own model. If you've developed similar functions that you think may be helpful to others, please get in touch or make a pull request.","category":"page"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"simulategaussianprocess\n\nsimulateschlather","category":"page"},{"location":"API/simulation/#NeuralEstimators.simulategaussianprocess","page":"Model-specific functions","title":"NeuralEstimators.simulategaussianprocess","text":"simulategaussianprocess(L::Matrix, m = 1)\nsimulategaussianprocess(grf::GaussianRandomField, m = 1)\n\nSimulates m independent and identically distributed (i.i.d.) 
realisations from a mean-zero Gaussian process.\n\nAccepts either the lower Cholesky factor L associated with a Gaussian process or a GaussianRandomField object grf.\n\nExamples\n\nusing NeuralEstimators\n\nn = 500\nS = rand(n, 2)\nρ = 0.6\nν = 1.0\n\n# Passing GaussianRandomField object:\nusing GaussianRandomFields\ncov = CovarianceFunction(2, Matern(ρ, ν))\ngrf = GaussianRandomField(cov, Cholesky(), S)\nsimulategaussianprocess(grf)\n\n# Passing Cholesky factors directly as matrices:\nL = grf.data\nsimulategaussianprocess(L)\n\n# Circulant embedding, which is fast but can only be used on grids:\npts = 1.0:50.0\ngrf = GaussianRandomField(cov, CirculantEmbedding(), pts, pts, minpadding = 100)\nsimulategaussianprocess(grf)\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#NeuralEstimators.simulateschlather","page":"Model-specific functions","title":"NeuralEstimators.simulateschlather","text":"simulateschlather(L::Matrix, m = 1)\nsimulateschlather(grf::GaussianRandomField, m = 1)\n\nSimulates m independent and identically distributed (i.i.d.) realisations from Schlather's max-stable model using the algorithm for approximate simulation given by Schlather (2002), \"Models for stationary max-stable random fields\", Extremes, 5:33-44.\n\nAccepts either the lower Cholesky factor L associated with a Gaussian process or a GaussianRandomField object grf.\n\nKeyword arguments\n\nC = 3.5: a tuning parameter that controls the accuracy of the algorithm: small C favours computational efficiency, while large C favours accuracy. Schlather (2002) recommends the use of C = 3.\nGumbel = true: flag indicating whether the data should be log-transformed from the unit Fréchet scale to the Gumbel scale.\n\nExamples\n\nusing NeuralEstimators\n\nn = 500\nS = rand(n, 2)\nρ = 0.6\nν = 1.0\n\n# Passing GaussianRandomField object:\nusing GaussianRandomFields\ncov = CovarianceFunction(2, Matern(ρ, ν))\ngrf = GaussianRandomField(cov, Cholesky(), S)\nsimulateschlather(grf)\n\n# Passing Cholesky factors directly as matrices:\nL = grf.data\nsimulateschlather(L)\n\n# Circulant embedding, which is fast but can only be used on grids:\npts = 1.0:50.0\ngrf = GaussianRandomField(cov, CirculantEmbedding(), pts, pts, minpadding = 100)\nsimulateschlather(grf)\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#Spatial-point-processes","page":"Model-specific functions","title":"Spatial point processes","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"maternclusterprocess","category":"page"},{"location":"API/simulation/#NeuralEstimators.maternclusterprocess","page":"Model-specific functions","title":"NeuralEstimators.maternclusterprocess","text":"maternclusterprocess(; λ=10, μ=10, r=0.1, xmin=0, xmax=1, ymin=0, ymax=1)\n\nSimulates a Matérn cluster process with density of parent Poisson point process λ, mean number of daughter points μ, and radius of cluster disk r, over the simulation window defined by {x/y}min and {x/y}max.\n\nNote that one may also use the R package spatstat using RCall.\n\nExamples\n\nusing NeuralEstimators\n\n# Simulate a realisation from a Matérn cluster process\nS = maternclusterprocess()\n\n# Visualise realisation (requires UnicodePlots)\nusing UnicodePlots\nscatterplot(S[:, 1], S[:, 2])\n\n# Visualise realisations from the cluster process with varying parameters\nn = 250\nλ = [10, 25, 50, 90]\nμ = n ./ λ\nplots = map(eachindex(λ)) do i\n\tS = maternclusterprocess(λ = λ[i], μ = 
μ[i])\n\tscatterplot(S[:, 1], S[:, 2])\nend\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#Low-level-functions","page":"Model-specific functions","title":"Low-level functions","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"These low-level functions may be of use for various models.","category":"page"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"matern\n\nmaternchols","category":"page"},{"location":"API/simulation/#NeuralEstimators.matern","page":"Model-specific functions","title":"NeuralEstimators.matern","text":"matern(h, ρ, ν, σ² = 1)\n\nFor two points separated by h units, compute the Matérn covariance function, with range parameter ρ, smoothness parameter ν, and marginal variance parameter σ².\n\nWe use the parametrisation C(mathbfh) = sigma^2 frac2^1 - nuGamma(nu) left(fracmathbfhrhoright)^nu K_nu left(fracmathbfhrhoright), where Gamma(cdot) is the gamma function, and K_nu(cdot) is the modified Bessel function of the second kind of order nu.\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#NeuralEstimators.maternchols","page":"Model-specific functions","title":"NeuralEstimators.maternchols","text":"maternchols(D, ρ, ν, σ² = 1; stack = true)\n\nGiven a distance matrix D, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².\n\nProviding vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must be of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.\n\nIf stack = true, the Cholesky factors will be \"stacked\" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).\n\nExamples\n\nusing NeuralEstimators\nusing LinearAlgebra: norm\nn = 10\nS = rand(n, 2)\nD = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S), sⱼ ∈ eachrow(S)]\nρ = [0.6, 0.5]\nν = [0.7, 1.2]\nσ² = [0.2, 0.4]\nmaternchols(D, ρ, ν)\nmaternchols(D, ρ, ν, σ²; stack = false)\n\nS̃ = rand(n, 2)\nD̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]\nmaternchols([D, D̃], ρ, ν, σ²)\nmaternchols([D, D̃], ρ, ν, σ²; stack = false)\n\nS̃ = rand(2n, 2)\nD̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]\nmaternchols([D, D̃], ρ, ν, σ²; stack = false)\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#Density-functions","page":"Model-specific functions","title":"Density functions","text":"","category":"section"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"Density functions are not needed in the workflow of NeuralEstimators. 
However, as part of a series of comparison studies between neural estimators and likelihood-based estimators given in the manuscript, we have developed the following density functions, and we include them in NeuralEstimators to cater for the possibility that they may be of use in future comparison studies.","category":"page"},{"location":"API/simulation/","page":"Model-specific functions","title":"Model-specific functions","text":"gaussiandensity\n\nschlatherbivariatedensity","category":"page"},{"location":"API/simulation/#NeuralEstimators.gaussiandensity","page":"Model-specific functions","title":"NeuralEstimators.gaussiandensity","text":"gaussiandensity(y::V, L; logdensity = true) where {V <: AbstractVector{T}} where T\ngaussiandensity(y::A, Σ; logdensity = true) where {A <: AbstractArray{T, N}} where {T, N}\n\nEfficiently computes the density function for y ~ 𝑁(0, Σ), with L the lower Cholesky factor of the covariance matrix Σ.\n\nThe method gaussiandensity(y::A, Σ) assumes that the last dimension of y corresponds to the independent-replicates dimension, and it exploits the fact that we need to compute the Cholesky factor L for these independent replicates once only.\n\n\n\n\n\n","category":"function"},{"location":"API/simulation/#NeuralEstimators.schlatherbivariatedensity","page":"Model-specific functions","title":"NeuralEstimators.schlatherbivariatedensity","text":"schlatherbivariatedensity(z₁, z₂, ψ; logdensity = true)\n\nThe bivariate density function for Schlather's max-stable model, as given in Huser (2013, pg. 231–232).\n\nHuser, R. (2013). Statistical Modeling and Inference for Spatio-Temporal Extremes. PhD thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland.\n\n\n\n\n\n","category":"function"},{"location":"workflow/examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"workflow/examples/#Univariate-data","page":"Examples","title":"Univariate data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Here, we develop a neural Bayes estimator for boldsymboltheta equiv (mu sigma) from data Z_1 dots Z_m that are independent and identically distributed according to a N(mu sigma^2) distribution. We'll use the priors mu sim N(0 1) and sigma sim U(01 1), and we assume that the parameters are independent a priori.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Before proceeding, we load the required packages:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"using NeuralEstimators\nusing Flux\nusing Distributions\nimport NeuralEstimators: simulate","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"First, we define a function to sample parameters from the prior. The sampled parameters are stored as p times K matrices, with p the number of parameters in the model and K the number of sampled parameter vectors.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"function sample(K)\n\tμ = rand(Normal(0, 1), K)\n\tσ = rand(Uniform(0.1, 1), K)\n\tθ = hcat(μ, σ)'\n\tθ = Float32.(θ)\n\treturn θ\nend","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next, we implicitly define the statistical model with simulated data. 
The data are stored as a Vector{A}, where each element of the vector is associated with one parameter vector, and where A depends on the representation of the neural estimator. Since our data is replicated, we will use the DeepSets framework and, since each replicate is univariate, we will use a dense neural network (DNN) for the inner network. Since the inner network is a DNN, A should be a sub-type of AbstractArray, with the independent replicates stored in the final dimension.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"function simulate(parameters, m)\n\tZ = [θ[1] .+ θ[2] .* randn(Float32, 1, m) for θ ∈ eachcol(parameters)]\n\treturn Z\nend","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"We now design architectures for the inner and outer neural networks, boldsymbolpsi(cdot) and boldsymbolphi(cdot) respectively, in the DeepSets framework, and initialise the neural estimator as a PointEstimator object.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"p = 2 # number of parameters in the statistical model\nw = 32 # width of each layer\n\nψ = Chain(Dense(1, w, relu), Dense(w, w, relu))\nϕ = Chain(Dense(w, w, relu), Dense(w, p))\narchitecture = DeepSet(ψ, ϕ)\n\nθ̂ = PointEstimator(architecture)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next, we train the neural estimator using train, here using the default absolute-error loss. We'll train the estimator using 15 independent replicates per parameter configuration. Below, we pass our user-defined functions for sampling parameters and simulating data, but one may also pass parameter or data instances, which will be held fixed during training; see train.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"m = 15\nθ̂ = train(θ̂, sample, simulate, m = m, epochs = 30)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"To test the accuracy of the resulting neural Bayes estimator, we use the function assess, which can be used to assess the performance of the estimator (or multiple estimators) over a range of sample sizes. Note that, in this example, we trained the neural estimator using a single sample size, m = 15, and hence the estimator will not necessarily be optimal for other sample sizes; see Variable sample sizes for approaches that one could adopt if data sets with varying sample size are envisaged.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"θ = sample(1000)\nZ = [simulate(θ, m) for m ∈ (5, 10, 15, 20, 30)]\nassessment = assess([θ̂], θ, Z)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"The returned object is an object of type Assessment, which contains the true parameters and their corresponding estimates, and the time taken to compute the estimates for each sample size and each estimator. The risk function may be computed using the function risk:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"risk(assessment)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"It is often helpful to visualise the empirical sampling distribution of an estimator for a particular parameter configuration and a particular sample size. 
This can be done by providing assess with J data sets simulated under a particular parameter configuration (below facilitated with the pre-defined method simulate(parameters, m, J::Integer), which wraps the method of simulate that we defined earlier), and then plotting the estimates contained in the long-form DataFrame in the resulting Assessment object:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"J = 100\nθ = sample(1)\nZ = [simulate(θ, m, J)]\nassessment = assess([θ̂], θ, Z) ","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Once the neural Bayes estimator has been assessed, it may then be applied to observed data, with parametric/non-parametric bootstrap-based uncertainty quantification facilitated by bootstrap and interval. Below, we use simulated data as a substitute for observed data:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Z = simulate(θ, m) # pretend that this is observed data\nθ̂(Z) # point estimates from the observed data\nθ̃ = bootstrap(θ̂, Z) # non-parametric bootstrap estimates\ninterval(θ̃) \t\t\t\t\t# confidence interval from the bootstrap estimates","category":"page"},{"location":"workflow/examples/#Multivariate-data","page":"Examples","title":"Multivariate data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Suppose now that our data consists of m replicates of a d-dimensional multivariate distribution. Everything remains as given in the univariate example above, except that we now store the data as a vector of d times m matrices (previously they were stored as 1times m matrices), and the inner network of the DeepSets representation takes a d-dimensional input (previously it took a 1-dimensional input).","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Note that, when estimating a full covariance matrix, one may wish to constrain the neural estimator to only produce parameters that imply a valid (i.e., positive definite) covariance matrix. This can be achieved by appending a CovarianceMatrix layer to the end of the outer network of the DeepSets representation. However, this is often unnecessary as the estimator will typically learn to provide valid estimates, even if not constrained to do so.","category":"page"},{"location":"workflow/examples/#Gridded-spatial-data","page":"Examples","title":"Gridded spatial data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"For spatial data measured on a regular grid, the estimator is typically based on a convolutional neural network (CNN), and each data set is stored as a four-dimensional array, where the first three dimensions correspond to the width, height, and channels dimensions, and the fourth dimension stores the independent replicates. Note that, for univariate spatial processes, the channels dimension is simply equal to 1. 
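As a concrete illustration of this storage convention (a minimal sketch; the grid size, number of replicates, and number of parameter vectors are arbitrary values chosen for illustration), the simulated data for a univariate process on a 16x16 grid would be a vector of 16 × 16 × 1 × m arrays, one array per parameter vector:

m = 30    # number of independent replicates in each data set
K = 100   # number of parameter vectors
# width × height × channels × replicates, with channels = 1 for a univariate process
Z = [rand(Float32, 16, 16, 1, m) for _ in 1:K]
size(Z[1])   # (16, 16, 1, 30)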
For a 16x16 spatial grid, a possible architecture is given below.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"p = 2 # number of parameters in the statistical model\nw = 32 # number of neurons in each layer\nd = 2 # dimension of the response variable\n\nψ = Chain(\n\tConv((10, 10), 1 => 32, relu),\n\tConv((5, 5), 32 => 64, relu),\n\tConv((3, 3), 64 => 128, relu),\n\tFlux.flatten\n\t)\nϕ = Chain(Dense(128, 512, relu), Dense(512, p))\narchitecture = DeepSet(ψ, ϕ)","category":"page"},{"location":"workflow/examples/#Irregular-spatial-data","page":"Examples","title":"Irregular spatial data","text":"","category":"section"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"The methodology we illustrate here uses graph neural networks (GNNs), which are implemented in Julia in the package GraphNeuralNetworks.jl. GNN-based estimators parsimoniously model spatial dependence, and they can be applied to data collected over arbitrary spatial locations. Some key steps involve:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Sampling spatial locations to cover a wide range of spatial configurations during the training phase: see maternclusterprocess.\nComputing (spatially-weighted) adjacency matrices: see adjacencymatrix.\nStoring the data as a graph: see GNNGraph.\nConstructing an appropriate architecture: see GNN and WeightedGraphConv.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"For a concrete example, we consider a classical spatial model, the linear Gaussian-Gaussian model,","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Z_j = Y(boldsymbols_j) + epsilon_j j = 1 dots n","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"where boldsymbolZ equiv (Z_1 dots Z_n) are data observed at locations boldsymbols_1 dots boldsymbols_n subset mathcalD, where mathcalD is some spatial domain, Y(cdot) is a spatially-correlated mean-zero Gaussian process, and epsilon_j sim N(0 tau^2), j = 1 dots n is Gaussian white noise with standard deviation tau 0. Here, we use the popular isotropic Matérn covariance function with fixed marginal variance sigma^2 = 1, fixed smoothness parameter nu = 05, and unknown range parameter rho 0. See matern for the specific parametrisation used in this example. Hence, we will construct a neural Bayes estimator for boldsymboltheta equiv (tau rho).","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Before proceeding, we load the required packages:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"using NeuralEstimators\nusing Flux\nusing GraphNeuralNetworks\nusing Distributions: Uniform\nusing Distances: pairwise, Euclidean\nusing LinearAlgebra\nusing Statistics: mean","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"First, we define a function to sample parameters from the prior. As before, the sampled parameters are stored as p times K matrices, with p the number of parameters in the model and K the number of sampled parameter vectors. We use the priors tau sim U(01 1) and rho sim U(005 05), and we assume that the parameters are independent a priori. 
Simulation from this model involves the computation of an expensive intermediate object, namely, the Cholesky factor of the covariance matrix. Storing this Cholesky factor for re-use can enable the fast simulation of new data sets (provided that the parameters are held fixed): hence, in this example, we define a class, Parameters, which is a sub-type of ParameterConfigurations, for storing the matrix of parameters and the corresponding intermediate objects needed for data simulation.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"If one wishes to make inference from a single spatial data set only, and this data is collected before the estimator is constructed, then the data can be simulated using the observed spatial locations. However, if one wishes to construct an estimator that is (approximately) Bayes irrespective of the spatial locations, then synthetic spatial locations must be generated during the training phase. If no prior knowledge on the sampling configuration is available, then a wide variety of spatial configurations must be simulated to produce an estimator that is broadly applicable. Below, we use a Matérn cluster process (see maternclusterprocess) for this task (note that the hyper-parameters of this process govern the expected number of locations in each sampled set of spatial locations, and the degree of clustering).","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"We define two constructors for our Parameters object: one that constructs a Parameters object given a single integer K, and another that constructs a Parameters object given a pre-specified ptimes K matrix of parameters and a set of spatial locations associated with each parameter vector. These constructors will be useful in the workflow below.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"struct Parameters{T} <: ParameterConfigurations\n\tθ::Matrix{T}\n\tlocations\n\tchols\n\tgraphs\nend\n\nfunction Parameters(K::Integer)\n\n\t# Sample parameters from the prior distribution\n\tτ = rand(Uniform(0.1, 1.0), K)\n\tρ = rand(Uniform(0.05, 0.5), K)\n\n\t# Combine parameters into a pxK matrix\n\tθ = permutedims(hcat(τ, ρ))\n\n\t# Simulate spatial locations from a cluster process over the unit square\n\tn = rand(Uniform(75, 200), K)\n\tλ = rand(Uniform(10, 50), K)\n\tlocations = [maternclusterprocess(λ = λ[k], μ = n[k]/λ[k]) for k ∈ 1:K]\n\n\tParameters(θ::Matrix, locations)\nend\n\nfunction Parameters(θ::Matrix, locations)\n\n\t# Compute distance matrices and construct the graphs\n\tD = pairwise.(Ref(Euclidean()), locations, locations, dims = 1)\n\tA = adjacencymatrix.(D, 0.15)\n\tgraphs = GNNGraph.(A)\n\n\t# Compute Cholesky factors using the distance matrices\n\tρ = θ[2, :]\n\tν = 0.5\n\tσ = 1\n\tchols = maternchols(D, ρ, ν, σ.^2; stack = false) \n\n\tParameters(θ, locations, chols, graphs)\nend","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next, we define a function for simulating from the model given an object of type Parameters. 
Although here we are constructing an estimator for a single replicate, the code below enables simulation of an arbitrary number of independent replicates m: one may provide a single integer for m, a range of values (e.g., 1:30), or any object that can be sampled from using rand(m, K).","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"function simulate(parameters::Parameters, m)\n\n\tK = size(parameters, 2)\n\tm̃ = rand(m, K)\n\n\tτ = parameters.θ[1, :]\n\tchols = parameters.chols\n\tg = parameters.graphs\n\n\t# Z = Folds.map(1:K) do i # use this for parallel simulation\n\tZ = map(1:K) do k\n\t\tL = chols[k][:, :]\n\t\tz = simulategaussianprocess(L, m̃[k]) # simulate a smooth field\n\t\tz = z + τ[k] * randn(size(z)...) # add white noise\n\t\tz = batch([GNNGraph(g[k], ndata = z[:, i, :]') for i ∈ 1:m̃[k]])\n\t\tz\n\tend\n\n\treturn Z\nend\nsimulate(parameters::Parameters, m::Integer) = simulate(parameters, range(m, m))","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next we construct an appropriate architecture using GNN and WeightedGraphConv. For example, we might construct a point estimator as:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"# Propagation module\nd = 1 # dimension of response variable\nnh = 32 # dimension of node feature vectors\npropagation = GNNChain(WeightedGraphConv(d => nh), WeightedGraphConv(nh => nh), WeightedGraphConv(nh => nh))\n\n# Readout module (using the elementwise average)\nno = nh # dimension of the final summary vector for each graph\nreadout = GlobalPool(mean)\n\n# Mapping module (use exponential output activation to ensure positive estimates)\np = 2 # number of parameters in the statistical model\nw = 64 # width of layers used for the mapping network ϕ\nϕ = Chain(Dense(no, w, relu), Dense(w, w, relu), Dense(w, p, exp))\n\n# Construct the estimator\nθ̂ = GNN(propagation, readout, ϕ)\nθ̂ = PointEstimator(θ̂)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Next, we train the neural estimator using train, here using the default absolute-error loss. We'll train the estimator using a single realisation per parameter configuration (i.e., with m = 1). 
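Before training, a rough compatibility check between the simulated data and the architecture can be made by applying the untrained estimator to a few simulated data sets (a sketch only; the resulting estimates are meaningless until the estimator has been trained):\n\nparameters = Parameters(3)\nZ = simulate(parameters, 1)\nθ̂(Z) # 2 × 3 matrix, one column of (untrained) estimates per data set\n\n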
Below, we use a very small number of epochs and a small number of training parameter vectors to keep the run time of this example low, and this will of course result in a poor estimator: in practice, one may set K to some large value (say, 10,000), and leave epochs unspecified so that training halts only when the risk function ceases to decrease.","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"θ̂ = train(θ̂, Parameters, simulate, m = 1, epochs = 5, K = 500)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Note that one may also train the estimator using fixed parameter sets and fixed training data (as described in train), by passing parameter or data instances:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"# Fixed parameters, but data simulated on-the-fly:\nθ_train = Parameters(500)\nθ_val = Parameters(100)\nθ̂ = train(θ̂, θ_train, θ_val, simulate, m = 1, epochs = 5)\n\n# Fixed parameters and data:\nz_train = simulate(θ_train, 1)\nz_val = simulate(θ_val, 1)\nθ̂ = train(θ̂, θ_train, θ_val, z_train, z_val, epochs = 5)","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"Once the neural Bayes estimator has been trained, it can be assessed with testing data using assess, as illustrated in the univariate example above. Finally, once the neural Bayes estimator has been assessed, it may be applied to observed data, with bootstrap-based uncertainty quantification facilitated by bootstrap and interval. Below, we use simulated data as a substitute for observed data:","category":"page"},{"location":"workflow/examples/","page":"Examples","title":"Examples","text":"# Generate some toy data\nparameters = Parameters(1) # sample a single parameter vector\nz = simulate(parameters, 1) # simulate some data \nθ = parameters.θ # true parameters used to generate data\nS = parameters.locations # observed locations\n\n# Point estimates\nθ̂(z)\n\n# Parametric bootstrap sample and bootstrap confidence interval\nθ̃ = bootstrap(θ̂, Parameters(θ̂(z), S), simulate, 1) \ninterval(θ̃) \t\t\t\t\t ","category":"page"},{"location":"framework/#Theoretical-framework","page":"Theoretical framework","title":"Theoretical framework","text":"","category":"section"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"In this section, we provide an overview of point estimation using neural Bayes estimators. For a more detailed discussion on the framework and its implementation, see Sainsbury-Dale et al. (2022; arxiv:2208.12942).","category":"page"},{"location":"framework/#Neural-Bayes-estimators","page":"Theoretical framework","title":"Neural Bayes estimators","text":"","category":"section"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"A parametric statistical model is a set of probability distributions on a sample space mathcalS, where the probability distributions are parameterised via some p-dimensional parameter vector boldsymboltheta on a parameter space Theta. Suppose that we have data from one such distribution, which we denote as boldsymbolZ. 
Then, the goal of parameter point estimation is to produce an estimate of the unknown boldsymboltheta from boldsymbolZ using an estimator,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" hatboldsymboltheta mathcalS to Theta","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"which is a mapping from the sample space to the parameter space.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Estimators can be constructed within a decision-theoretic framework. Assume that the sample space is mathcalS = mathbbR^n, and consider a non-negative loss function, L(boldsymboltheta hatboldsymboltheta(boldsymbolZ)), which assesses an estimator hatboldsymboltheta(cdot) for a given boldsymboltheta and data set boldsymbolZ sim f(boldsymbolz mid boldsymboltheta), where f(boldsymbolz mid boldsymboltheta) is the probability density function of the data conditional on boldsymboltheta. An estimator's risk function is its loss averaged over all possible data realisations,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" R(boldsymboltheta hatboldsymboltheta(cdot)) equiv int_mathcalS L(boldsymboltheta hatboldsymboltheta(boldsymbolz))f(boldsymbolz mid boldsymboltheta) rmd boldsymbolz","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"So-called Bayes estimators minimise the Bayes risk,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" r_Omega(hatboldsymboltheta(cdot))\n equiv int_Theta R(boldsymboltheta hatboldsymboltheta(cdot)) rmd Omega(boldsymboltheta) ","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"where Omega(cdot) is a prior measure for boldsymboltheta.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Bayes estimators are theoretically attractive: for example, unique Bayes estimators are admissible and, under suitable regularity conditions and the squared-error loss, are consistent and asymptotically efficient. Further, for a large class of prior distributions, every set of conditions that imply consistency of the maximum likelihood (ML) estimator also imply consistency of Bayes estimators. Importantly, Bayes estimators are not motivated purely by asymptotics: by construction, they are Bayes irrespective of the sample size and model class. Unfortunately, however, Bayes estimators are typically unavailable in closed form for the complex models often encountered in practice. A way forward is to assume a flexible parametric model for hatboldsymboltheta(cdot), and to optimise the parameters within that model in order to approximate the Bayes estimator. 
Neural networks are ideal candidates, since they are universal function approximators, and because they are also fast to evaluate, usually involving only simple matrix-vector operations.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Let hatboldsymboltheta(boldsymbolZ boldsymbolgamma) denote a neural point estimator, that is, a neural network that returns a point estimate from data boldsymbolZ, where boldsymbolgamma contains the neural-network parameters. Bayes estimators may be approximated with hatboldsymboltheta(cdot boldsymbolgamma^*) by solving the optimisation problem, ","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"boldsymbolgamma^*\nequiv\nundersetboldsymbolgammamathrmargmin r_Omega(hatboldsymboltheta(cdot boldsymbolgamma))","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Typically, r_Omega(cdot) cannot be directly evaluated, but it can be approximated using Monte Carlo methods. Specifically, given a set of K parameter vectors sampled from the prior Omega(cdot) denoted by vartheta and, for each boldsymboltheta in vartheta, J realisations from f(boldsymbolz mid boldsymboltheta) collected in mathcalZ_boldsymboltheta,","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":" r_Omega(hatboldsymboltheta(cdot boldsymbolgamma))\n approx\nfrac1K sum_boldsymboltheta in vartheta frac1J sum_boldsymbolz in mathcalZ_boldsymboltheta L(boldsymboltheta hatboldsymboltheta(boldsymbolz boldsymbolgamma)) ","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Note that the above approximation does not involve evaluation, or knowledge, of the likelihood function.","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"The Monte-Carlo-approximated Bayes risk can be straightforwardly minimised with respect to boldsymbolgamma using back-propagation and stochastic gradient descent. For sufficiently flexible architectures, the point estimator targets a Bayes estimator with respect to L(cdot cdot) and Omega(cdot). We therefore call the fitted neural point estimator a neural Bayes estimator. Like Bayes estimators, neural Bayes estimators target a specific point summary of the posterior distribution. For instance, the absolute-error and squared-error loss functions lead to neural Bayes estimators that approximate the posterior median and mean, respectively.","category":"page"},{"location":"framework/#Construction-of-neural-Bayes-estimators","page":"Theoretical framework","title":"Construction of neural Bayes estimators","text":"","category":"section"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"The neural Bayes estimator is conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. The estimator also has marked practical appeal, as the general workflow for its construction is only loosely connected to the statistical or physical model being considered. 
The workflow is as follows:","category":"page"},{"location":"framework/","page":"Theoretical framework","title":"Theoretical framework","text":"Define the prior, Omega(cdot).\nChoose a loss function, L(cdot cdot), typically the absolute-error or squared-error loss.\nDesign a suitable neural-network architecture for the neural point estimator hatboldsymboltheta(cdot boldsymbolgamma).\nSample parameters from Omega(cdot) to form training/validation/test parameter sets.\nGiven the above parameter sets, simulate data from the model, to form training/validation/test data sets.\nTrain the neural network (i.e., estimate boldsymbolgamma) by minimising the loss function averaged over the training sets. During training, monitor performance and convergence using the validation sets.\nAssess the fitted neural Bayes estimator, hatboldsymboltheta(cdot boldsymbolgamma^*), using the test set.","category":"page"},{"location":"API/#Index","page":"Index","title":"Index","text":"","category":"section"},{"location":"API/","page":"Index","title":"Index","text":"","category":"page"},{"location":"workflow/advancedusage/#Advanced-usage","page":"Advanced usage","title":"Advanced usage","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"In this section, we discuss practical considerations on how to construct neural estimators most effectively.","category":"page"},{"location":"workflow/advancedusage/#Storing-expensive-intermediate-objects-for-data-simulation","page":"Advanced usage","title":"Storing expensive intermediate objects for data simulation","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Parameters sampled from the prior distribution Omega(cdot) may be stored in two ways. Most simply, they can be stored as a p times K matrix, where p is the number of parameters in the model and K is the number of parameter vectors sampled from the prior distribution; this is the approach taken in the example using univariate Gaussian data. Alternatively, they can be stored in a user-defined subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the p times K matrix of parameters. With this approach, one may store computationally expensive intermediate objects, such as Cholesky factors, for later use when conducting \"on-the-fly\" simulation, which is discussed below.","category":"page"},{"location":"workflow/advancedusage/#On-the-fly-and-just-in-time-simulation","page":"Advanced usage","title":"On-the-fly and just-in-time simulation","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"When data simulation is (relatively) computationally inexpensive, mathcalZ_texttrain can be simulated continuously during training, a technique coined \"simulation-on-the-fly\". Regularly refreshing mathcalZ_texttrain leads to lower out-of-sample error and to a reduction in overfitting. This strategy therefore facilitates the use of larger, more representationally-powerful networks that are prone to overfitting when mathcalZ_texttrain is fixed. Refreshing mathcalZ_texttrain also has an additional computational benefit; data can be simulated \"just-in-time\", in the sense that they can be simulated from a small batch of vartheta_texttrain, used to train the neural estimator, and then removed from memory. 
This can reduce pressure on memory resources when vartheta_texttrain is very large.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"One may also regularly refresh vartheta_texttrain, and doing so leads to similar benefits. However, fixing vartheta_texttrain allows computationally expensive terms, such as Cholesky factors when working with Gaussian process models, to be reused throughout training, which can substantially reduce the training time for some models. ","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"The above strategies are facilitated with various methods of train.","category":"page"},{"location":"workflow/advancedusage/#Variable-sample-sizes","page":"Advanced usage","title":"Variable sample sizes","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"A neural estimator in the Deep Set representation can be applied to data sets of arbitrary size. However, even when the neural Bayes estimator approximates the true Bayes estimator arbitrarily well, it is conditional on the number of replicates, m, and is not necessarily a Bayes estimator for m^* ne m. Denote a data set comprising m replicates as boldsymbolZ^(m) equiv (boldsymbolZ_1 dots boldsymbolZ_m). There are at least two (non-mutually exclusive) approaches one could adopt if data sets with varying m are envisaged, which we describe below.","category":"page"},{"location":"workflow/advancedusage/#Piecewise-estimators","page":"Advanced usage","title":"Piecewise estimators","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"If data sets with varying m are envisaged, one could train l neural Bayes estimators for different sample sizes, or groups thereof (e.g., a small-sample estimator and a large-sample estimator). Specifically, for sample-size changepoints m_1, m_2, dots, m_l-1, one could construct a piecewise neural Bayes estimator,","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"hatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*)\n=\nbegincases\nhatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*_tildem_1) m leq m_1\nhatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*_tildem_2) m_1 m leq m_2\nquad vdots \nhatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma^*_tildem_l) m m_l-1\nendcases","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"where, here, boldsymbolgamma^* equiv (boldsymbolgamma^*_tildem_1 dots boldsymbolgamma^*_tildem_l-1), and where boldsymbolgamma^*_tildem are the neural-network parameters optimised for sample size tildem chosen so that hatboldsymboltheta(cdot boldsymbolgamma^*_tildem) is near-optimal over the range of sample sizes in which it is applied. This approach works well in practice, and it is less computationally burdensome than it first appears when used in conjunction with pre-training.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Piecewise neural estimators are implemented with the struct, PiecewiseEstimator, and their construction is facilitated with trainx. 
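For example, a minimal sketch of this construction (using, for illustration, the estimator architecture θ̂, the Parameters constructor, and the simulate function from the Examples, and assuming that trainx returns one trained estimator for each sample size in M) is:\n\nM = [10, 30] # sample sizes used during pre-training\nestimators = trainx(θ̂, Parameters, simulate, M)\nθ̂_piecewise = PiecewiseEstimator(estimators, [10]) # first estimator if m ≤ 10, second estimator if m > 10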
","category":"page"},{"location":"workflow/advancedusage/#Training-with-variable-sample-sizes","page":"Advanced usage","title":"Training with variable sample sizes","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Alternatively, one could treat the sample size as a random variable, M, with support over a set of positive integers, mathcalM, in which case, for the neural Bayes estimator, the risk function becomes","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"R(boldsymboltheta hatboldsymboltheta(cdot boldsymbolgamma))\nequiv\nsum_m in mathcalM\nP(M=m)left(int_mathcalS^m L(boldsymboltheta hatboldsymboltheta(boldsymbolZ^(m) boldsymbolgamma))p(boldsymbolZ^(m) mid boldsymboltheta) textd boldsymbolZ^(m)right)","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"This approach does not materially alter the workflow, except that one must also sample the number of replicates before simulating the data.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Below we define data simulation for a range of sample sizes (i.e., a range of integers) under a discrete uniform prior for M, the random variable corresponding to sample size.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"function simulate(parameters, m::R) where {R <: AbstractRange{I}} where I <: Integer\n\n\t## Number of parameter vectors stored in parameters\n\tK = size(parameters, 2)\n\n\t## Generate K sample sizes from the prior distribution for M\n\tm̃ = rand(m, K)\n\n\t## Pseudocode for data simulation\n\tZ = [ for k ∈ 1:K]\n\n\treturn Z\nend","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Then, setting the argument m in train to be an integer range (e.g., 1:30) will train the neural estimator with the given variable sample sizes.","category":"page"},{"location":"workflow/advancedusage/#Combining-neural-and-expert-summary-statistics","page":"Advanced usage","title":"Combining neural and expert summary statistics","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"See DeepSetExpert.","category":"page"},{"location":"workflow/advancedusage/#Loading-previously-saved-neural-estimators","page":"Advanced usage","title":"Loading previously saved neural estimators","text":"","category":"section"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"As training is by far the most computationally demanding part of the workflow, one typically trains an estimator and then saves it for later use. 
More specifically, one usually saves the parameters of the neural estimator (e.g., the weights and biases of the neural networks); then, to load the neural estimator at a later time, one initialises an estimator with the same architecture used during training, and then loads the saved parameters into this estimator.","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"If the argument savepath is specified, train automatically saves the neural estimator's parameters; to load them, one may use the following code, or similar:","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"using Flux: loadparams!\n\nθ̂ = architecture()\nloadparams!(θ̂, loadbestweights(savepath))","category":"page"},{"location":"workflow/advancedusage/","page":"Advanced usage","title":"Advanced usage","text":"Above, architecture() is a user-defined function that returns a neural estimator with the same architecture as the estimator that we wish to load, but with randomly initialised parameters, and the function loadparams! loads the parameters of the best (as determined by loadbestweights) neural estimator saved in savepath.","category":"page"},{"location":"workflow/overview/#Overview","page":"Overview","title":"Overview","text":"","category":"section"},{"location":"workflow/overview/","page":"Overview","title":"Overview","text":"To develop a neural estimator with NeuralEstimators.jl,","category":"page"},{"location":"workflow/overview/","page":"Overview","title":"Overview","text":"Sample parameters from the prior distribution: the parameters are stored as p times K matrices, with p the number of parameters in the model and K the number of samples (i.e., parameter configurations) in the given parameter set (i.e., training, validation, or test set).\nSimulate data from the assumed model over the parameter sets generated above. These data are stored as a Vector{A}, with each element of the vector associated with one parameter configuration, and where A depends on the representation of the neural estimator (e.g., an Array for CNN-based estimators, or a GNNGraph for GNN-based estimators).\nInitialise a neural network, θ̂, that will be trained into a neural Bayes estimator. \nTrain θ̂ under the chosen loss function using train.\nAssess θ̂ using assess. The resulting object of class Assessment can be used to assess the estimator with respect to the entire parameter space by estimating the risk function with risk, or used to plot the empirical sampling distribution of the estimator.\nApply θ̂ to observed data (once its performance has been checked in the above step). Bootstrap-based uncertainty quantification is facilitated with bootstrap and interval. ","category":"page"},{"location":"workflow/overview/","page":"Overview","title":"Overview","text":"See the Examples and, once familiar with the basic workflow, see Advanced usage for practical considerations on how to most effectively construct neural estimators.","category":"page"},{"location":"API/core/#Core","page":"Core","title":"Core","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"This page documents the functions that are central to the workflow of NeuralEstimators. 
Its organisation reflects the order in which these functions appear in a standard implementation; that is, from sampling parameters from the prior distribution, to uncertainty quantification of the final estimates via bootstrapping.","category":"page"},{"location":"API/core/#Sampling-parameters","page":"Core","title":"Sampling parameters","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"Parameters sampled from the prior distribution Omega(cdot) are stored as a p times K matrix, where p is the number of parameters in the model and K is the number of parameter vectors sampled from the prior distribution.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion. ","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"ParameterConfigurations","category":"page"},{"location":"API/core/#NeuralEstimators.ParameterConfigurations","page":"Core","title":"NeuralEstimators.ParameterConfigurations","text":"ParameterConfigurations\n\nAn abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.\n\nThe user-defined type must have a field θ that stores the p × K matrix of parameters, where p is the number of parameters in the model and K is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.\n\nSee subsetparameters for the generic function for subsetting these objects.\n\nExamples\n\nstruct P <: ParameterConfigurations\n\tθ\n\t# other expensive intermediate objects...\nend\n\n\n\n\n\n","category":"type"},{"location":"API/core/#Simulating-data","page":"Core","title":"Simulating data","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define the model via simulated data. 
The user may provide simulated data directly, or provide a function that simulates data from the model.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"The data should be stored as a Vector{A}, where each element of the vector is associated with one parameter configuration, and where A depends on the architecture of the neural estimator.","category":"page"},{"location":"API/core/#Types-of-estimators","page":"Core","title":"Types of estimators","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"See also Architectures and activations functions that are often used when constructing neural estimators.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"NeuralEstimator\n\nPointEstimator\n\nIntervalEstimator\n\nQuantileEstimator\n\nPiecewiseEstimator","category":"page"},{"location":"API/core/#NeuralEstimators.NeuralEstimator","page":"Core","title":"NeuralEstimators.NeuralEstimator","text":"NeuralEstimator\n\nAn abstract supertype for neural estimators.\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.PointEstimator","page":"Core","title":"NeuralEstimators.PointEstimator","text":"PointEstimator(arch)\n\nA simple point estimator, that is, a mapping from the sample space to the parameter space, defined by the given architecture arch.\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.IntervalEstimator","page":"Core","title":"NeuralEstimators.IntervalEstimator","text":"IntervalEstimator(arch_lower, arch_upper)\nIntervalEstimator(arch)\n\nA neural estimator that produces credible intervals constructed as,\n\nl(Z) l(Z) + mathrmexp(u(Z))\n\nwhere l() and u() are the neural networks arch_lower and arch_upper, both of which should transform data into p-dimensional vectors, where p is the number of parameters in the model. 
If only a single neural network architecture arch is provided, it will be used for both arch_lower and arch_upper.\n\nInternally, the output from arch_lower and arch_upper are concatenated, so that IntervalEstimator objects transform data into 2p-dimensional vectors.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\n# Generate some toy data\nn = 2 # bivariate data\nm = 10 # number of independent replicates\nZ = rand(n, m)\n\n# Create an architecture\np = 3 # parameters in the model\nw = 8 # width of each layer\nψ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ = Chain(Dense(w, w, relu), Dense(w, p));\narchitecture = DeepSet(ψ, ϕ)\n\n# Initialise the interval estimator\nestimator = IntervalEstimator(architecture)\n\n# Apply the interval estimator\nestimator(Z)\ninterval(estimator, Z, parameter_names = [\"ρ\", \"σ\", \"τ\"])\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.QuantileEstimator","page":"Core","title":"NeuralEstimators.QuantileEstimator","text":"QuantileEstimator()\n\nComing soon: this structure will allow for the simultaneous estimation of an arbitrary number of marginal quantiles of the posterior distribution.\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.PiecewiseEstimator","page":"Core","title":"NeuralEstimators.PiecewiseEstimator","text":"PiecewiseEstimator(estimators, breaks)\n\nCreates a piecewise estimator from a collection of estimators, based on the collection of changepoints, breaks, which should contain one element fewer than the number of estimators.\n\nAny estimator can be included in estimators, including any of the subtypes of NeuralEstimator exported with the package NeuralEstimators (e.g., PointEstimator, IntervalEstimator, etc.).\n\nExamples\n\n# Suppose that we've trained two neural estimators. The first, θ̂₁, is trained\n# for small sample sizes (e.g., m ≤ 30), and the second, `θ̂₂`, is trained for\n# moderate-to-large sample sizes (e.g., m > 30). 
We construct a piecewise\n# estimator with a sample-size changepoint of 30, which dispatches θ̂₁ if m ≤ 30\n# and θ̂₂ if m > 30.\n\nusing NeuralEstimators\nusing Flux\n\nn = 2 # bivariate data\np = 3 # number of parameters in the model\nw = 8 # width of each layer\n\nψ₁ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ₁ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂₁ = DeepSet(ψ₁, ϕ₁)\n\nψ₂ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ₂ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂₂ = DeepSet(ψ₂, ϕ₂)\n\nθ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], [30])\nZ = [rand(n, 1, m) for m ∈ (10, 50)]\nθ̂(Z)\n\n\n\n\n\n","category":"type"},{"location":"API/core/#Training","page":"Core","title":"Training","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.","category":"page"},{"location":"API/core/","page":"Core","title":"Core","text":"train\n\ntrainx","category":"page"},{"location":"API/core/#NeuralEstimators.train","page":"Core","title":"NeuralEstimators.train","text":"train(θ̂, sampler, simulator; )\ntrain(θ̂, θ_train::P, θ_val::P, simulator; ) where {P <: Union{AbstractMatrix, ParameterConfigurations}}\ntrain(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}\n\nTrain a neural estimator with architecture θ̂.\n\nThe methods cater for different forms of on-the-fly simulation. The method that takes functions sampler and simulator for sampling parameters and simulating data, respectively, allows for both the parameters and the data to be simulated on-the-fly. Note that simulator is called as simulator(θ, m), where θ is a set of parameters and m is the sample size (see keyword arguments below). If provided with specific instances of parameters (θ_train and θ_val) or data (Z_train and Z_val), they will be held fixed during training.\n\nIn all methods, the validation set is held fixed to reduce noise when evaluating the validation risk function, which is used to monitor the performance of the estimator during training.\n\nIf the number of replicates in Z_train is a multiple of the number of replicates for each element of Z_val, the training data will be recycled throughout training. For example, if each element of Z_train consists of 50 replicates, and each element of Z_val consists of 10 replicates, the first epoch uses the first 10 replicates in Z_train, the second epoch uses the next 10 replicates, and so on, until the sixth epoch again uses the first 10 replicates. 
Note that this requires the data to be subsettable with the function subsetdata.\n\nKeyword arguments\n\nArguments common to all methods:\n\nloss = mae: the loss function, which should return the average loss when applied to multiple replicates.\nepochs::Integer = 100\nbatchsize::Integer = 32\noptimiser = ADAM(1e-4)\nsavepath::String = \"\": path to save the trained estimator and other information; if savepath is an empty string (default), nothing is saved.\nstopping_epochs::Integer = 5: cease training if the risk doesn't improve in this number of epochs.\nuse_gpu::Bool = true\nverbose::Bool = true\n\nArguments common to train(θ̂, P, simulator) and train(θ̂, θ_train, θ_val, simulator):\n\nm: sample sizes (either an Integer or a collection of Integers).\nepochs_per_Z_refresh::Integer = 1: how often to refresh the training data.\nsimulate_just_in_time::Bool = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.\n\nArguments unique to train(θ̂, P, simulator):\n\nK::Integer = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.\nξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices); if ξ is provided, the parameter sampler is called as sampler(K, ξ).\nepochs_per_θ_refresh::Integer = 1: how often to refresh the training parameters. Must be a multiple of epochs_per_Z_refresh.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\nusing Distributions\nimport NeuralEstimators: simulate\n\n# data simulator\nm = 15\nfunction simulate(θ_set, m)\n\t[Float32.(rand(Normal(θ[1], θ[2]), 1, m)) for θ ∈ eachcol(θ_set)]\nend\n\n# parameter sampler\nK = 10000\nΩ = (μ = Normal(0, 1), σ = Uniform(0.1, 1)) # prior\nstruct sampler\n\tμ\n\tσ\nend\nfunction (s::sampler)(K)\n\tμ = rand(s.μ, K)\n\tσ = rand(s.σ, K)\n\tθ = hcat(μ, σ)'\n\treturn θ\nend\nsmplr = sampler(Ω.μ, Ω.σ)\n\n# architecture\np = length(Ω) # number of parameters in the statistical model\nw = 32 # width of each layer\nψ = Chain(Dense(1, w, relu), Dense(w, w, relu))\nϕ = Chain(Dense(w, w, relu), Dense(w, p))\nθ̂ = DeepSet(ψ, ϕ)\n\n# training: full simulation on-the-fly\nθ̂ = train(θ̂, smplr, simulate, m = m, K = K, epochs = 5)\nθ̂ = train(θ̂, smplr, simulate, m = m, K = K, epochs = 5)\nθ̂ = train(θ̂, smplr, simulate, m = m, K = K, epochs = 10, epochs_per_θ_refresh = 4, epochs_per_Z_refresh = 2)\n\n# training: fixed parameters\nθ_train = smplr(K)\nθ_val = smplr(K ÷ 5)\nθ̂ = train(θ̂, θ_train, θ_val, simulate, m = m, epochs = 5)\nθ̂ = train(θ̂, θ_train, θ_val, simulate, m = m, epochs = 5, epochs_per_Z_refresh = 2)\n\n# training: fixed parameters and fixed data\nZ_train = simulate(θ_train, m)\nZ_val = simulate(θ_val, m)\nθ̂ = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)\n\n\n\n\n\n","category":"function"},{"location":"API/core/#NeuralEstimators.trainx","page":"Core","title":"NeuralEstimators.trainx","text":"trainx(θ̂, P, simulator, M; )\ntrainx(θ̂, θ_train, θ_val, simulator, M; )\ntrainx(θ̂, θ_train, θ_val, Z_train::T, Z_val::T, M; )\ntrainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ) where {V <: AbstractVector{AbstractVector{Any}}}\n\nA wrapper around train to construct neural estimators for different sample sizes.\n\nThe collection M specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. 
For example, if M = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.\n\nThe method for Z_train::T and Z_val::T subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ M. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train and Z_val; hence, it does not need to invoke subsetdata, which can be slow or difficult to define in some cases (e.g., for graphical data).\n\nThe keyword arguments inherit from train, and certain keyword arguments can be given as vectors. For example, if we are training two estimators, we can use a different number of epochs by providing epochs = [e₁, e₂]. Other arguments that allow vectors are batchsize, stopping_epochs, and optimiser.\n\n\n\n\n\n","category":"function"},{"location":"API/core/#Assessing-a-neural-estimator","page":"Core","title":"Assessing a neural estimator","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"assess\n\nAssessment\n\nrisk","category":"page"},{"location":"API/core/#NeuralEstimators.assess","page":"Core","title":"NeuralEstimators.assess","text":"assess(estimators, θ, Z; )\n\nUsing a collection of estimators, compute estimates from data Z simulated based on true parameter vectors stored in θ.\n\nIf Z contains more data sets than parameter vectors, the parameter matrix will be recycled by horizontal concatenation.\n\nThe output is of type Assessment; see ?Assessment for details.\n\nKeyword arguments\n\nestimator_names::Vector{String}: names of the estimators (sensible defaults provided).\nparameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.\nξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices).\nuse_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true.\nuse_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.\nverbose::Bool = true\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\nn = 10 # number of observations in each realisation\np = 4 # number of parameters in the statistical model\n\n# Construct the neural estimator\nw = 32 # width of each layer\nψ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\n\n# Generate testing parameters\nK = 100\nθ = rand(p, K)\n\n# Data for a single sample size\nm = 30\nZ = [rand(n, m) for _ ∈ 1:K];\nassessment = assess([θ̂], θ, Z);\nrisk(assessment)\n\n# Multiple data sets for each parameter vector\nJ = 5\nZ = repeat(Z, J);\nassessment = assess([θ̂], θ, Z);\nrisk(assessment)\n\n# With set-level information\nqₓ = 2\nϕ = Chain(Dense(w + qₓ, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\nx = [rand(qₓ) for _ ∈ eachindex(Z)]\nassessment = assess([θ̂], θ, (Z, x));\nrisk(assessment)\n\n\n\n\n\n","category":"function"},{"location":"API/core/#NeuralEstimators.Assessment","page":"Core","title":"NeuralEstimators.Assessment","text":"Assessment(df::DataFrame, runtime::DataFrame)\n\nA type for storing the output of assess(). The field runtime contains the total time taken for each estimator. 
The field df is a long-form DataFrame with columns:\n\nestimator: the name of the estimator\nparameter: the name of the parameter\ntruth: the true value of the parameter\nestimate: the estimated value of the parameter\nm: the sample size (number of iid replicates)\nk: the index of the parameter vector in the test set\nj: the index of the data set\n\nMultiple Assessment objects can be combined with merge().\n\n\n\n\n\n","category":"type"},{"location":"API/core/#NeuralEstimators.risk","page":"Core","title":"NeuralEstimators.risk","text":"risk(assessment::Assessment; loss = (x, y) -> abs(x - y), average_over_parameters = true)\n\nComputes a Monte Carlo approximation of the Bayes risk,\n\nr_Omega(hatboldsymboltheta(cdot))\napprox\nfrac1K sum_boldsymboltheta in vartheta frac1J sum_boldsymbolZ in mathcalZ_boldsymboltheta L(boldsymboltheta hatboldsymboltheta(boldsymbolZ))\n\nwhere vartheta denotes a set of K parameter vectors sampled from the prior Omega(cdot) and, for each boldsymboltheta in vartheta, we have J sets of m mutually independent realisations from the model collected in mathcalZ_boldsymboltheta.\n\nKeyword arguments\n\nloss = (x, y) -> abs(x - y): a binary operator (default absolute-error loss).\naverage_over_parameters::Bool = true: if true (default), the loss is averaged over all parameters; otherwise, the loss is averaged over each parameter separately.\naverage_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes m; otherwise, the loss is averaged over each sample size separately.\n\n\n\n\n\n","category":"function"},{"location":"API/core/#Bootstrapping","page":"Core","title":"Bootstrapping","text":"","category":"section"},{"location":"API/core/","page":"Core","title":"Core","text":"bootstrap\n\ninterval","category":"page"},{"location":"API/core/#NeuralEstimators.bootstrap","page":"Core","title":"NeuralEstimators.bootstrap","text":"bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}\nbootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}\nbootstrap(θ̂, Z; B = 400, blocks = nothing)\n\nGenerates B bootstrap estimates from an estimator θ̂.\n\nParametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).\n\nNon-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. 
The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.\n\nThe keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).\n\nThe return type is a p × B matrix, where p is the number of parameters in the model.\n\n\n\n\n\n","category":"function"},{"location":"API/core/#NeuralEstimators.interval","page":"Core","title":"NeuralEstimators.interval","text":"interval(θ̃, θ̂ = nothing; type::String, probs = [0.05, 0.95], parameter_names)\n\nCompute a confidence interval using the p × B matrix of bootstrap samples, θ̃, where p is the number of parameters in the model.\n\nIf type = \"quantile\", the interval is constructed by simply taking the quantiles of θ̃, and if type = \"reverse-quantile\", the so-called reverse-quantile method is used. In both cases, the quantile levels are controlled by the argument probs.\n\nThe rows can be named with a vector of strings parameter_names.\n\nThe return type is a p × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval.\n\nExamples\n\nusing NeuralEstimators\np = 3\nB = 50\nθ̃ = rand(p, B)\nθ̂ = rand(p)\ninterval(θ̃)\ninterval(θ̃, θ̂, type = \"basic\")\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#Miscellaneous","page":"Miscellaneous","title":"Miscellaneous","text":"","category":"section"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"Order = [:type, :function]\nPages = [\"utility.md\"]","category":"page"},{"location":"API/utility/#Core","page":"Miscellaneous","title":"Core","text":"","category":"section"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"These functions can appear during the core workflow, and may need to be overloaded in some applications.","category":"page"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"numberreplicates\n\nsubsetdata\n\nsubsetparameters","category":"page"},{"location":"API/utility/#NeuralEstimators.numberreplicates","page":"Miscellaneous","title":"NeuralEstimators.numberreplicates","text":"numberofreplicates(Z)\n\nGeneric function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.subsetdata","page":"Miscellaneous","title":"NeuralEstimators.subsetdata","text":"Generic function for subsetting replicates from a data set. Default methods are:\n\nsubsetdata(Z::A, m) where {A <: AbstractArray{T, N}} where {T, N}\nsubsetdata(Z::G, m) where {G <: AbstractGraph}\n\nNote that subsetdata is slow for graphical data, and one should consider using a method of train that does not require the data to be subsetted when working with graphical data: use numberreplicates to check that the training and validation data sets are equally replicated, which prevents the invocation of subsetdata. 
Note also that subsetdata only applies to vectors of batched graphs.\n\nIf the user is working with data that is not covered by the default methods, simply overload subsetdata with the appropriate type for Z.\n\nExamples\n\nusing NeuralEstimators\nusing GraphNeuralNetworks\nusing Flux: batch\n\nn = 5 # number of observations in each realisation\nm = 6 # number of replicates for each parameter vector\nd = 1 # dimension of the response variable\nK = 2 # number of parameter vectors\n\n# Array data\nZ = [rand(n, d, m) for k ∈ 1:K]\nsubsetdata(Z, 1:3) # extract first 3 replicates for each parameter vector\n\n# Graphical data\ne = 8 # number of edges\nZ = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]\nsubsetdata(Z, 1:3) # extract first 3 replicates for each parameter vector\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.subsetparameters","page":"Miscellaneous","title":"NeuralEstimators.subsetparameters","text":"subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}\nsubsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}\n\nSubset parameters using a collection of indices.\n\nArrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#Utility-functions","page":"Miscellaneous","title":"Utility functions","text":"","category":"section"},{"location":"API/utility/","page":"Miscellaneous","title":"Miscellaneous","text":"adjacencymatrix\n\ncontainertype\n\nestimateinbatches\n\nexpandgrid\n\nloadbestweights\n\nstackarrays\n\nvectotril","category":"page"},{"location":"API/utility/#NeuralEstimators.adjacencymatrix","page":"Miscellaneous","title":"NeuralEstimators.adjacencymatrix","text":"adjacencymatrix(M::Matrix, k::Integer)\nadjacencymatrix(M::Matrix, r::Float)\n\nComputes a spatially weighted adjacency matrix from M based on either the k nearest neighbours of each location, or a fixed spatial radius of r units.\n\nIf M is a square matrix, it is treated as a distance matrix; otherwise, it should be an n x d matrix, where n is the number of spatial sample locations and d is the spatial dimension (typically d = 2).\n\nExamples\n\nusing NeuralEstimators\nusing Distances\n\nn = 100\nd = 2\nS = rand(n, d)\nk = 5\nr = 0.3\n\n# Memory efficient constructors (avoids constructing the full distance matrix D)\nadjacencymatrix(S, k)\nadjacencymatrix(S, r)\n\n# Construct from full distance matrix D\nD = pairwise(Euclidean(), S, S, dims = 1)\nadjacencymatrix(D, k)\nadjacencymatrix(D, r)\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.containertype","page":"Miscellaneous","title":"NeuralEstimators.containertype","text":"containertype(A::Type)\ncontainertype(::Type{A}) where A <: SubArray\ncontainertype(a::A) where A\n\nReturns the container type of its argument.\n\nIf given a SubArray, returns the container type of the parent array.\n\nExamples\n\na = rand(3, 4)\ncontainertype(a)\ncontainertype(typeof(a))\n[containertype(x) for x ∈ eachcol(a)]\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.estimateinbatches","page":"Miscellaneous","title":"NeuralEstimators.estimateinbatches","text":"estimateinbatches(θ̂, z; batchsize::Integer = 32, use_gpu::Bool = true)\n\nApply the estimator θ̂ on minibatches of z of 
size batchsize, to avoid memory issues that can occur when z is very large.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.expandgrid","page":"Miscellaneous","title":"NeuralEstimators.expandgrid","text":"expandgrid(xs, ys)\n\nSame as expand.grid() in R, but currently caters for two dimensions only.\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.loadbestweights","page":"Miscellaneous","title":"NeuralEstimators.loadbestweights","text":"loadbestweights(path::String)\n\nGiven a path to a training run containing neural networks saved with names \"network_epochx.bson\" and an object saved as \"loss_per_epoch.bson\", returns the weights of the best network (measured by validation loss).\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.stackarrays","page":"Miscellaneous","title":"NeuralEstimators.stackarrays","text":"stackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}\n\nStack a vector of arrays v along the last dimension of each array, optionally merging the final dimension of the stacked array.\n\nThe arrays must be of the same size for the first N-1 dimensions. However, if merge = true, the size of the final dimension can vary.\n\nExamples\n\n# Vector containing arrays of the same size:\nZ = [rand(2, 3, m) for m ∈ (1, 1)];\nstackarrays(Z)\nstackarrays(Z, merge = false)\n\n# Vector containing arrays with differing final dimension size:\nZ = [rand(2, 3, m) for m ∈ (1, 2)];\nstackarrays(Z)\n\n\n\n\n\n","category":"function"},{"location":"API/utility/#NeuralEstimators.vectotril","page":"Miscellaneous","title":"NeuralEstimators.vectotril","text":"vectotril(v; strict = false)\nvectotriu(v; strict = false)\n\nConverts a vector v of length d(d+1)/2 (a triangular number) into a d × d lower or upper triangular matrix.\n\nIf strict = true, the matrix will be strictly lower or upper triangular, that is, a (d+1) × (d+1) triangular matrix with zero diagonal.\n\nNote that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v is a GPU array. Note also that the return type is not of type Triangular matrix (i.e., the zeros are materialised) since Triangular matrices are not always compatible with other GPU operations.\n\nExamples\n\nusing NeuralEstimators\n\nd = 4\nn = d*(d+1)÷2\nv = collect(range(1, n))\nvectotril(v)\nvectotriu(v)\nvectotril(v; strict = true)\nvectotriu(v; strict = true)\n\n\n\n\n\n","category":"function"},{"location":"API/loss/#Loss-functions","page":"Loss functions","title":"Loss functions","text":"","category":"section"},{"location":"API/loss/","page":"Loss functions","title":"Loss functions","text":"In addition to the standard loss functions provided by Flux (e.g., mae, mse, etc.), NeuralEstimators provides the following loss functions.","category":"page"},{"location":"API/loss/","page":"Loss functions","title":"Loss functions","text":"kpowerloss\n\nquantileloss\n\nintervalscore","category":"page"},{"location":"API/loss/#NeuralEstimators.kpowerloss","page":"Loss functions","title":"NeuralEstimators.kpowerloss","text":"kpowerloss(θ̂, y, k; agg = mean, safeorigin = true, ϵ = 0.1)\n\nFor k ∈ (0, ∞), the k-th power absolute-distance loss,\n\nL(θ θ) = θ - θᵏ\n\ncontains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0).\n\nIt is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1. 
It is quasiconvex for all k > 0.\n\nIf safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.\n\n\n\n\n\n","category":"function"},{"location":"API/loss/#NeuralEstimators.quantileloss","page":"Loss functions","title":"NeuralEstimators.quantileloss","text":"quantileloss(θ̂, θ, q; agg = mean)\nquantileloss(θ̂, θ, q::V; agg = mean) where {T, V <: AbstractVector{T}}\n\nThe asymmetric loss function whose minimiser is the qth posterior quantile; namely,\n\nL(θ θ q) = (θ - θ)(𝕀(θ - θ 0) - q)\n\nwhere q ∈ (0, 1) and 𝕀(⋅) is the indicator function.\n\nThe method that takes q as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be pr, where p is the number of parameters: then, q should be an r-vector.\n\nFor further discussion on this loss function, see Equation (7) of Cressie, N. (2022), \"Decisions, decisions, decisions in an uncertain environment\", arXiv:2209.13157.\n\nExamples\n\np = 1\nK = 10\nθ = rand(p, K)\nθ̂ = rand(p, K)\nquantileloss(θ̂, θ, 0.1)\n\nθ̂ = rand(3p, K)\nquantileloss(θ̂, θ, [0.1, 0.5, 0.9])\n\np = 2\nθ = rand(p, K)\nθ̂ = rand(p, K)\nquantileloss(θ̂, θ, 0.1)\n\nθ̂ = rand(3p, K)\nquantileloss(θ̂, θ, [0.1, 0.5, 0.9])\n\n\n\n\n\n","category":"function"},{"location":"API/loss/#NeuralEstimators.intervalscore","page":"Loss functions","title":"NeuralEstimators.intervalscore","text":"intervalscore(l, u, θ, α; agg = mean)\nintervalscore(θ̂, θ, α; agg = mean)\n\nGiven a 100×(1-α)% confidence interval [l, u] with true value θ, the interval score is defined by\n\nS(l u θ α) = (u - l) + 2α¹(l - θ)𝕀(θ l) + 2α¹(θ - u)𝕀(θ u)\n\nwhere α ∈ (0, 1) and 𝕀(⋅) is the indicator function.\n\nThe method that takes a single value θ̂ assumes that θ̂ is a matrix with 2p rows, where p is the number of parameters in the statistical model. Then, the first and second set of p rows will be used as l and u, respectively.\n\nFor further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. 
(2007), \"Strictly proper scoring rules, prediction, and estimation\", Journal of the American statistical Association, 102, 359–378.\n\n\n\n\n\n","category":"function"},{"location":"API/architectures/#Architectures-and-activations-functions","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"","category":"section"},{"location":"API/architectures/#Index","page":"Architectures and activations functions","title":"Index","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"Order = [:type, :function]\nPages = [\"architectures.md\"]","category":"page"},{"location":"API/architectures/#Architectures","page":"Architectures and activations functions","title":"Architectures","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"Although the user is free to construct their neural estimator however they see fit, NeuralEstimators provides several useful architectures described below.","category":"page"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"DeepSet\n\nDeepSetExpert\n\nGNN","category":"page"},{"location":"API/architectures/#NeuralEstimators.DeepSet","page":"Architectures and activations functions","title":"NeuralEstimators.DeepSet","text":"DeepSet(ψ, ϕ, a)\nDeepSet(ψ, ϕ; a::String = \"mean\")\n\nThe Deep Set representation,\n\nθ(𝐙) = ϕ(𝐓(𝐙))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nwhere 𝐙 ≡ (𝐙₁', …, 𝐙ₘ')' are independent replicates from the model, ψ and ϕ are neural networks, and a is a permutation-invariant aggregation function.\n\nTo make the architecture agnostic to the sample size m, the aggregation function a must aggregate over the replicates. It can be specified as a positional argument of type Function, or as a keyword argument with permissible values \"mean\", \"sum\", and \"logsumexp\".\n\nDeepSet objects act on data stored as Vector{A}, where each element of the vector is associated with one parameter vector (i.e., one set of independent replicates), and where A depends on the form of the data and the chosen architecture for ψ. As a rule of thumb, when the data are stored as an array, the replicates are stored in the final dimension of the array. (This is usually the 'batch' dimension, but batching with DeepSets is done at the set level, i.e., sets of replicates are batched together.) 
For example, with gridded spatial data and ψ a CNN, A should be a 4-dimensional array, with the replicates stored in the 4ᵗʰ dimension.\n\nNote that, internally, data stored as Vector{Arrays} are first concatenated along the replicates dimension before being passed into the inner neural network ψ; this means that ψ is applied to a single large array rather than many small arrays, which can substantially improve computational efficiency, particularly on the GPU.\n\nSet-level information, 𝐱, that is not a function of the data can be passed directly into the outer network ϕ in the following manner,\n\nθ(𝐙) = ϕ((𝐓(𝐙) 𝐱))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nThis is done by providing a Tuple{Vector{A}, Vector{B}}, where the first element of the tuple contains the vector of data sets and the second element contains the vector of set-level information.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\nn = 10 # number of observations in each realisation\np = 4 # number of parameters in the statistical model\n\n# Construct the neural estimator\nw = 32 # width of each layer\nψ = Chain(Dense(n, w, relu), Dense(w, w, relu));\nϕ = Chain(Dense(w, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\n\n# Apply the estimator\nZ₁ = rand(n, 3); # single set of 3 realisations\nZ₂ = [rand(n, m) for m ∈ (3, 3)]; # two sets each containing 3 realisations\nZ₃ = [rand(n, m) for m ∈ (3, 4)]; # two sets containing 3 and 4 realisations\nθ̂(Z₁)\nθ̂(Z₂)\nθ̂(Z₃)\n\n# Repeat the above but with set-level information:\nqₓ = 2\nϕ = Chain(Dense(w + qₓ, w, relu), Dense(w, p));\nθ̂ = DeepSet(ψ, ϕ)\nx₁ = rand(qₓ)\nx₂ = [rand(qₓ) for _ ∈ eachindex(Z₂)]\nθ̂((Z₁, x₁))\nθ̂((Z₂, x₂))\nθ̂((Z₃, x₂))\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.DeepSetExpert","page":"Architectures and activations functions","title":"NeuralEstimators.DeepSetExpert","text":"DeepSetExpert(ψ, ϕ, S, a)\nDeepSetExpert(ψ, ϕ, S; a::String = \"mean\")\nDeepSetExpert(deepset::DeepSet, ϕ, S)\n\nIdentical to DeepSet, but with additional expert summary statistics,\n\nθ(𝐙) = ϕ((𝐓(𝐙) 𝐒(𝐙)))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nwhere S is a function that returns a vector of expert summary statistics.\n\nThe constructor DeepSetExpert(deepset::DeepSet, ϕ, S) inherits ψ and a from deepset.\n\nSimilarly to DeepSet, set-level information can be incorporated by passing a Tuple, in which case we have\n\nθ(𝐙) = ϕ((𝐓(𝐙) 𝐒(𝐙) 𝐱))\t \t 𝐓(𝐙) = 𝐚(ψ(𝐙ᵢ) i = 1 m)\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\nn = 10 # number of observations in each realisation\np = 4 # number of parameters in the statistical model\n\n# Construct the neural estimator\nS = samplesize\nqₛ = 1\nqₜ = 32\nw = 16\nψ = Chain(Dense(n, w, relu), Dense(w, qₜ, relu));\nϕ = Chain(Dense(qₜ + qₛ, w), Dense(w, p));\nθ̂ = DeepSetExpert(ψ, ϕ, S)\n\n# Apply the estimator\nZ₁ = rand(n, 3); # single set\nZ₂ = [rand(n, m) for m ∈ (3, 4)]; # two sets\nθ̂(Z₁)\nθ̂(Z₂)\n\n# Repeat the above but with set-level information:\nqₓ = 2\nϕ = Chain(Dense(qₜ + qₛ + qₓ, w, relu), Dense(w, p));\nθ̂ = DeepSetExpert(ψ, ϕ, S)\nx₁ = rand(qₓ)\nx₂ = [rand(qₓ) for _ ∈ eachindex(Z₂)]\nθ̂((Z₁, x₁))\nθ̂((Z₂, x₂))\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.GNN","page":"Architectures and activations functions","title":"NeuralEstimators.GNN","text":"GNN(propagation, readout, ϕ, a)\nGNN(propagation, readout, ϕ, a::String = \"mean\")\n\nA graph neural network (GNN) designed for parameter point estimation.\n\nThe propagation module transforms graphical input data into a set of 
hidden-feature graphs; the readout module aggregates these feature graphs into a single hidden feature vector of fixed length; the function a(⋅) is a permutation-invariant aggregation function, and ϕ is a neural network.\n\nThe data should be stored as a GNNGraph or AbstractVector{GNNGraph}, where each graph is associated with a single parameter vector. The graphs may contain sub-graphs corresponding to independent replicates from the model.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\nusing Flux: batch\nusing GraphNeuralNetworks\nusing Statistics: mean\n\n# Propagation module\nd = 1 # dimension of response variable\nnh = 32 # dimension of node feature vectors\npropagation = GNNChain(GraphConv(d => nh), GraphConv(nh => nh), GraphConv(nh => nh))\n\n# Readout module (using \"universal pooling\")\nnt = 64 # dimension of the summary vector for each node\nno = 128 # dimension of the final summary vector for each graph\nreadout = UniversalPool(Dense(nh, nt), Dense(nt, nt))\n\n# Alternative readout module (using the elementwise average)\n# readout = GlobalPool(mean); no = nh\n\n# Mapping module\np = 3 # number of parameters in the statistical model\nw = 64 # width of layers used for the mapping network ϕ\nϕ = Chain(Dense(no, w, relu), Dense(w, w, relu), Dense(w, p))\n\n# Construct the estimator\nθ̂ = GNN(propagation, readout, ϕ)\n\n# Apply the estimator to:\n# \t1. a single graph,\n# \t2. a single graph with sub-graphs (corresponding to independent replicates), and\n# \t3. a vector of graphs (corresponding to multiple spatial data sets).\ng₁ = rand_graph(11, 30, ndata=rand(d, 11))\ng₂ = rand_graph(13, 40, ndata=rand(d, 13))\ng₃ = batch([g₁, g₂])\nθ̂(g₁)\nθ̂(g₃)\nθ̂([g₁, g₂, g₃])\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#Layers","page":"Architectures and activations functions","title":"Layers","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"WeightedGraphConv\n\nUniversalPool","category":"page"},{"location":"API/architectures/#NeuralEstimators.WeightedGraphConv","page":"Architectures and activations functions","title":"NeuralEstimators.WeightedGraphConv","text":"WeightedGraphConv(in => out, σ=identity; aggr=mean, bias=true, init=glorot_uniform)\n\nSame as regular GraphConv layer, but where the neighbours of a node are weighted by their spatial distance to that node.\n\nArguments\n\nin: The dimension of input features.\nout: The dimension of output features.\nσ: Activation function.\naggr: Aggregation operator for the incoming messages (e.g. +, *, max, min, and mean).\nbias: Add learnable bias.\ninit: Weights' initializer.\n\nExamples\n\nusing NeuralEstimators\nusing GraphNeuralNetworks\n\n# Construct a spatially-weighted adjacency matrix based on k-nearest neighbours\n# with k = 5, and convert to a graph with random (uncorrelated) dummy data:\nn = 100\nS = rand(n, 2)\nd = 1 # dimension of each observation (univariate data here)\nA = adjacencymatrix(S, 5)\nZ = GNNGraph(A, ndata = rand(d, n))\n\n# Construct the layer and apply it to the data to generate convolved features\nlayer = WeightedGraphConv(d => 16)\nlayer(Z)\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.UniversalPool","page":"Architectures and activations functions","title":"NeuralEstimators.UniversalPool","text":"UniversalPool(ψ, ϕ)\n\nPooling layer (i.e., readout layer) from the paper 'Universal Readout for Graph Convolutional Neural Networks'. 
It takes the form,\n\nmathbfV = ϕ(G¹ sum_sin G ψ(mathbfh_s))\n\nwhere mathbfV denotes the summary vector for graph G, mathbfh_s denotes the vector of hidden features for node s in G, and ψ and ϕ are dense neural networks.\n\nSee also the pooling layers available from GraphNeuralNetworks.jl.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\nusing GraphNeuralNetworks\nusing Graphs: random_regular_graph\n\n# Construct an input graph G\nn_h = 16 # dimension of each feature node\nn_nodes = 10\nn_edges = 4\nG = GNNGraph(random_regular_graph(n_nodes, n_edges), ndata = rand(n_h, n_nodes))\n\n# Construct the pooling layer\nn_t = 32 # dimension of the summary vector for each node\nn_v = 64 # dimension of the final summary vector V\nψ = Dense(n_h, n_t)\nϕ = Dense(n_t, n_v)\npool = UniversalPool(ψ, ϕ)\n\n# Apply the pooling layer\npool(G)\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#Output-activation-functions","page":"Architectures and activations functions","title":"Output activation functions","text":"","category":"section"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"These layers can be used at the end of an architecture to ensure that the neural estimator provides valid parameters.","category":"page"},{"location":"API/architectures/","page":"Architectures and activations functions","title":"Architectures and activations functions","text":"Compress\n\nCholeskyCovariance\n\nCovarianceMatrix\n\nCorrelationMatrix\n\nSplitApply","category":"page"},{"location":"API/architectures/#NeuralEstimators.Compress","page":"Architectures and activations functions","title":"NeuralEstimators.Compress","text":"Compress(a, b, k = 1)\n\nLayer that compresses its input to be within the range a and b, where each element of a is less than the corresponding element of b.\n\nThe layer uses a logistic function,\n\nl(θ) = a + fracb - a1 + e^-kθ\n\nwhere the arguments a and b together combine to shift and scale the logistic function to the desired range, and the growth rate k controls the steepness of the curve.\n\nThe logistic function given here contains an additional parameter, θ₀, which is the input value corresponding to the functions midpoint. In Compress, we fix θ₀ = 0, since the output of a randomly initialised neural network is typically around zero.\n\nExamples\n\nusing NeuralEstimators\nusing Flux\n\na = [25, 0.5, -pi/2]\nb = [500, 2.5, 0]\np = length(a)\nK = 100\nθ = randn(p, K)\nl = Compress(a, b)\nl(θ)\n\nn = 20\nθ̂ = Chain(Dense(n, p), l)\nZ = randn(n, K)\nθ̂(Z)\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.CholeskyCovariance","page":"Architectures and activations functions","title":"NeuralEstimators.CholeskyCovariance","text":"CholeskyCovariance(d)\n\nLayer for constructing the parameters of the lower Cholesky factor associated with an unconstrained d×d covariance matrix.\n\nThe layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension, but with d rows constrained to be positive (corresponding to the diagonal elements of the Cholesky factor) and the remaining rows unconstrained.\n\nThe ordering of the transformed Matrix aligns with Julia's column-major ordering. 
For example, when modelling the Cholesky factor,\n\nbeginbmatrix\nL₁₁ \nL₂₁ L₂₂ \nL₃₁ L₃₂ L₃₃ \nendbmatrix\n\nthe rows of the matrix returned by a CholeskyCovariance layer will be ordered as\n\nL₁₁ L₂₁ L₃₁ L₂₂ L₃₂ L₃₃\n\nwhich means that the output can easily be transformed into the implied Cholesky factors using vectotril.\n\nExamples\n\nusing NeuralEstimators\n\nd = 4\np = d*(d+1)÷2\nθ = randn(p, 50)\nl = CholeskyCovariance(d)\nθ = l(θ) # returns matrix (used for Flux networks)\nL = [vectotril(y) for y ∈ eachcol(θ)] # convert matrix to Cholesky factors\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.CovarianceMatrix","page":"Architectures and activations functions","title":"NeuralEstimators.CovarianceMatrix","text":"CovarianceMatrix(d)\n\nLayer for constructing the parameters of an unconstrained d×d covariance matrix.\n\nThe layer transforms a Matrix with d(d+1)÷2 rows into a Matrix of the same dimension.\n\nInternally, it uses a CholeskyCovariance layer to construct a valid Cholesky factor 𝐋, and then extracts the lower triangle from the positive-definite covariance matrix 𝚺 = 𝐋𝐋'. The lower triangle is extracted and vectorised in line with Julia's column-major ordering. For example, when modelling the covariance matrix,\n\nbeginbmatrix\nΣ₁₁ Σ₁₂ Σ₁₃ \nΣ₂₁ Σ₂₂ Σ₂₃ \nΣ₃₁ Σ₃₂ Σ₃₃ \nendbmatrix\n\nthe rows of the matrix returned by a CovarianceMatrix layer will be ordered as\n\nΣ₁₁ Σ₂₁ Σ₃₁ Σ₂₂ Σ₃₂ Σ₃₃\n\nwhich means that the output can easily be transformed into the implied covariance matrices using vectotril and Symmetric.\n\nExamples\n\nusing NeuralEstimators\nusing LinearAlgebra\n\nd = 4\np = d*(d+1)÷2\nθ = randn(p, 50)\n\nl = CovarianceMatrix(d)\nθ = l(θ)\nΣ = [Symmetric(cpu(vectotril(y)), :L) for y ∈ eachcol(θ)]\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.CorrelationMatrix","page":"Architectures and activations functions","title":"NeuralEstimators.CorrelationMatrix","text":"CorrelationMatrix(d)\n\nLayer for constructing the parameters of an unconstrained d×d correlation matrix.\n\nThe layer transforms a Matrix with d(d-1)÷2 rows into a Matrix with the same dimension.\n\nInternally, the layers uses the algorithm described here and here to construct a valid Cholesky factor 𝐋, and then extracts the strict lower triangle from the positive-definite correlation matrix 𝐑 = 𝐋𝐋'. The strict lower triangle is extracted and vectorised in line with Julia's column-major ordering. 
For example, when modelling the correlation matrix,\n\nbeginbmatrix\n1 R₁₂ R₁₃ \nR₂₁ 1 R₂₃\nR₃₁ R₃₂ 1\nendbmatrix\n\nthe rows of the matrix returned by a CorrelationMatrix layer will be ordered as\n\nR₂₁ R₃₁ R₃₂\n\nwhich means that the output can easily be transformed into the implied correlation matrices using the strict variant of vectotril and Symmetric.\n\nExamples\n\nusing NeuralEstimators\nusing LinearAlgebra\n\nd = 4\np = d*(d-1)÷2\nl = CorrelationMatrix(d)\nθ = randn(p, 50)\n\n# returns a matrix of parameters\nθ = l(θ)\n\n# convert matrix of parameters to implied correlation matrices\nR = map(eachcol(θ)) do y\n\tR = Symmetric(cpu(vectotril(y, strict = true)), :L)\n\tR[diagind(R)] .= 1\n\tR\nend\n\n\n\n\n\n","category":"type"},{"location":"API/architectures/#NeuralEstimators.SplitApply","page":"Architectures and activations functions","title":"NeuralEstimators.SplitApply","text":"SplitApply(layers, indices)\n\nSplits an array into multiple sub-arrays by subsetting the rows using the collection of indices, and then applies each layer in layers to the corresponding sub-array.\n\nSpecifically, for each i = 1, …, n, with n the number of layers, SplitApply(x) performs layers[i](x[indices[i], :]), and then vertically concatenates the resulting transformed arrays.\n\nExamples\n\nusing NeuralEstimators\n\nd = 4\nK = 50\np₁ = 2 # number of non-covariance matrix parameters\np₂ = d*(d+1)÷2 # number of covariance matrix parameters\np = p₁ + p₂\n\na = [0.1, 4]\nb = [0.9, 9]\nl₁ = Compress(a, b)\nl₂ = CovarianceMatrix(d)\nl = SplitApply([l₁, l₂], [1:p₁, p₁+1:p])\n\nθ = randn(p, K)\nl(θ)\n\n\n\n\n\n","category":"type"},{"location":"#NeuralEstimators","page":"NeuralEstimators","title":"NeuralEstimators","text":"","category":"section"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"Neural estimators are neural networks that transform data into parameter point estimates, and they are a promising recent approach to inference. They are likelihood free, substantially faster than classical methods, and can be designed to be approximate Bayes estimators. Uncertainty quantification with neural estimators is also straightforward through the bootstrap distribution, which is essentially available \"for free\" with a neural estimator.","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"The package NeuralEstimators facilitates the development of neural estimators in a user-friendly manner. It caters for arbitrary models by having the user implicitly define their model via simulated data. This makes the development of neural estimators particularly straightforward for models with existing implementations (possibly in other programming languages, e.g., R or python). 
A convenient interface for R users is available here.","category":"page"},{"location":"#Getting-started","page":"NeuralEstimators","title":"Getting started","text":"","category":"section"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"Install NeuralEstimators from Julia's package manager using the following command inside Julia:","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"using Pkg; Pkg.add(\"NeuralEstimators\")","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"Once familiar with the details of the Theoretical framework, see the Examples.","category":"page"},{"location":"#Supporting-and-citing","page":"NeuralEstimators","title":"Supporting and citing","text":"","category":"section"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"This software was developed as part of academic research. If you would like to support it, please star the repository. If you use NeuralEstimators in your research or other activities, please use the following citation.","category":"page"},{"location":"","page":"NeuralEstimators","title":"NeuralEstimators","text":"@article{SZH_2023_neural_Bayes_estimators,\n\tauthor = {Sainsbury-Dale, Matthew and Zammit-Mangion, Andrew and Huser, Raphaël},\n\ttitle = {Likelihood-Free Parameter Estimation with Neural {B}ayes Estimators},\n\tjournal = {The American Statistician},\n\tyear = {2023},\n\tvolume = {to appear}\n}","category":"page"}] } diff --git a/dev/workflow/advancedusage/index.html b/dev/workflow/advancedusage/index.html index 8d2507d..423c95e 100644 --- a/dev/workflow/advancedusage/index.html +++ b/dev/workflow/advancedusage/index.html @@ -24,4 +24,4 @@ end

      Then, setting the argument m in train to be an integer range (e.g., 1:30) will train the neural estimator with the given variable sample sizes.
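
      For example, assuming a prior sampler Parameters and a data simulator simulate defined as elsewhere in the documentation, an estimator accommodating sample sizes between 1 and 30 could be trained as follows (a sketch; the value of K is illustrative):

      θ̂ = train(θ̂, Parameters, simulate, m = 1:30, K = 10000)   # the sample size of each training data set is drawn from 1:30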

      Combining neural and expert summary statistics

      See DeepSetExpert.
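
      For instance, adapting the example in the DeepSetExpert documentation, an estimator that augments qₜ learned summary statistics with the sample size as an expert summary statistic might be constructed as follows (the dimensions here are illustrative):

      using NeuralEstimators
      using Flux

      n  = 10          # number of observations in each replicate
      p  = 4           # number of parameters in the statistical model
      w  = 16          # width of each hidden layer
      qₜ = 32          # number of learned (neural) summary statistics
      S  = samplesize  # expert summary statistic: the sample size
      qₛ = 1           # number of expert summary statistics returned by S
      ψ  = Chain(Dense(n, w, relu), Dense(w, qₜ, relu))
      ϕ  = Chain(Dense(qₜ + qₛ, w, relu), Dense(w, p))
      θ̂  = DeepSetExpert(ψ, ϕ, S)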

      Loading previously saved neural estimators

      As training is by far the most computationally demanding part of the workflow, one typically trains an estimator and then saves it for later use. More specifically, one usually saves the parameters of the neural estimator (e.g., the weights and biases of the neural networks); then, to load the neural estimator at a later time, one initialises an estimator with the same architecture used during training, and then loads the saved parameters into this estimator.

      If the argument savepath is specified, train automatically saves the neural estimator's parameters; to load them, one may use the following code, or similar:

      using Flux: loadparams!
       
       θ̂ = architecture()
      -loadparams!(θ̂, loadbestweights(savepath))

      Above, architecture() is a user-defined function that returns a neural estimator with the same architecture as the estimator that we wish to load, but with randomly initialised parameters, and the function loadparams! loads the parameters of the best (as determined by loadbestweights) neural estimator saved in savepath.

      +loadparams!(θ̂, loadbestweights(savepath))

      Above, architecture() is a user-defined function that returns a neural estimator with the same architecture as the estimator that we wish to load, but with randomly initialised parameters, and the function loadparams! loads the parameters of the best (as determined by loadbestweights) neural estimator saved in savepath.
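
      For example, if the estimator were a DeepSet similar to those used in the Examples, architecture() might be defined as follows (a sketch; the dimensions n, p, and w are assumptions and must match those used when the estimator was trained):

      using NeuralEstimators
      using Flux

      function architecture()
      	n = 10   # number of observations in each replicate
      	p = 4    # number of parameters in the statistical model
      	w = 32   # width of each hidden layer
      	ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
      	ϕ = Chain(Dense(w, w, relu), Dense(w, p))
      	DeepSet(ψ, ϕ)
      end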

      diff --git a/dev/workflow/examples/index.html b/dev/workflow/examples/index.html index 05fca4e..bbd3e15 100644 --- a/dev/workflow/examples/index.html +++ b/dev/workflow/examples/index.html @@ -27,7 +27,7 @@ assessment = assess([θ̂], θ, Z)

      Once the neural Bayes estimator has been assessed, it may then be applied to observed data, with parametric/non-parametric bootstrap-based uncertainty quantification facilitated by bootstrap and interval. Below, we use simulated data as a substitute for observed data:

      Z = simulate(θ, m)     # pretend that this is observed data
       θ̂(Z)                   # point estimates from the observed data
       θ̃ = bootstrap(θ̂, Z)    # non-parametric bootstrap estimates
      -interval(θ̃)  # confidence interval from the bootstrap estimates

      Multivariate data

      Suppose now that our data consists of $m$ replicates of a $d$-dimensional multivariate distribution. Everything remains as given in the univariate example above, except that we now store the data as a vector of $d \times m$ matrices (previously they were stored as $1\times m$ matrices), and the inner network of the DeepSets representation takes a $d$-dimensional input (previously it took a 1-dimensional input).

      <!– Note that, when estimating a covariance matrix, one may wish to constrain the neural estimator to only produce parameters that imply a valid (i.e., positive definite) covariance matrix. This can be achieved by appending a CovarianceMatrix layer to the end of the outer network of the DeepSets representation. However, this is often unnecessary as the estimator will typically learn to provide valid estimates, even if not constrained to do so. –>

      Gridded spatial data

      For spatial data measured on a regular grid, the estimator is typically based on a convolutional neural network (CNN), and each data set is stored as a four-dimensional array, where the first three dimensions corresponding to the width, height, and depth/channels dimensions, and the fourth dimension stores the independent replicates. Note that, for univariate spatial processes, the channels dimension is simply equal to 1. For a 16x16 spatial grid, a possible architecture is given below.

      p = 2    # number of parameters in the statistical model
      +interval(θ̃)  					# confidence interval from the bootstrap estimates

      Multivariate data

      Suppose now that our data consists of $m$ replicates of a $d$-dimensional multivariate distribution. Everything remains as given in the univariate example above, except that we now store the data as a vector of $d \times m$ matrices (previously they were stored as $1\times m$ matrices), and the inner network of the DeepSets representation takes a $d$-dimensional input (previously it took a 1-dimensional input).

      Note that, when estimating a full covariance matrix, one may wish to constrain the neural estimator to only produce parameters that imply a valid (i.e., positive definite) covariance matrix. This can be achieved by appending a CovarianceMatrix layer to the end of the outer network of the DeepSets representation. However, this is often unnecessary as the estimator will typically learn to provide valid estimates, even if not constrained to do so.
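
      As a brief sketch (the dimension d, the layer widths, and the focus on covariance parameters are assumptions for illustration only), the inner network now takes a d-dimensional input, and a CovarianceMatrix layer may optionally be appended to the outer network:

      using NeuralEstimators
      using Flux

      d = 3              # dimension of each replicate
      p = d*(d+1)÷2      # number of parameters (here, the covariance parameters)
      w = 32             # width of each hidden layer
      ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))
      ϕ = Chain(Dense(w, w, relu), Dense(w, p), CovarianceMatrix(d))   # CovarianceMatrix layer is optional
      θ̂ = DeepSet(ψ, ϕ)

      Z = [rand(d, m) for m ∈ (30, 40)]   # two data sets with 30 and 40 replicates
      θ̂(Z)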

      Gridded spatial data

      For spatial data measured on a regular grid, the estimator is typically based on a convolutional neural network (CNN), and each data set is stored as a four-dimensional array, where the first three dimensions correspond to width, height, and channels, and the fourth dimension stores the independent replicates. Note that, for univariate spatial processes, the number of channels is simply 1. For a 16×16 spatial grid, a possible architecture is given below.

      p = 2    # number of parameters in the statistical model
       w = 32   # number of neurons in each layer
       d = 2    # dimension of the response variable
       
      @@ -38,4 +38,103 @@
       	Flux.flatten
       	)
       ϕ = Chain(Dense(128, 512, relu), Dense(512, p))
      -architecture = DeepSet(ψ, ϕ)

      Irregular spatial data

      This example is coming soon! The methodology involves the use of graph neural networks, which in Julia can be implemented using the package GraphNeuralNetworks.jl. Some key steps:

      +architecture = DeepSet(ψ, ϕ)
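
      To make the data format concrete, gridded data for this architecture would be stored and passed to the estimator as follows (a sketch; the 16×16 grid size, the number of replicates, and the number of data sets are assumed values):

      K = 10                                 # number of data sets
      m = 5                                  # number of replicates in each data set
      Z = [rand(16, 16, 1, m) for _ ∈ 1:K]   # width × height × channels × replicates
      architecture(Z)                        # one estimate per data set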

      Irregular spatial data

      The methodology we illustrate here uses graph neural networks (GNNs), which are implemented in Julia in the package GraphNeuralNetworks.jl. GNN-based estimators parsimoniously model spatial dependence, and they can be applied to data collected over arbitrary spatial locations. The key steps are illustrated in the worked example below.

      For a concrete example, we consider a classical spatial model, the linear Gaussian-Gaussian model,

      \[Z_{j} = Y(\boldsymbol{s}_{j}) + \epsilon_{j}, \; j = 1, \dots, n,\]

      where $\boldsymbol{Z} \equiv (Z_{1}, \dots, Z_{n})'$ are data observed at locations $\{\boldsymbol{s}_{1}, \dots, \boldsymbol{s}_{n}\} \subset \mathcal{D}$, where $\mathcal{D}$ is some spatial domain, $Y(\cdot)$ is a spatially-correlated mean-zero Gaussian process, and $\epsilon_j \sim N(0, \tau^2)$, $j = 1, \dots, n$ is Gaussian white noise with standard deviation $\tau > 0$. Here, we use the popular isotropic Matérn covariance function with fixed marginal variance $\sigma^2 = 1$, fixed smoothness parameter $\nu = 0.5$, and unknown range parameter $\rho > 0$. See matern for the specific parametrisation used in this example. Hence, we will construct a neural Bayes estimator for $\boldsymbol{\theta} \equiv (\tau, \rho)'$.
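
      For reference, under the standard parametrisation of the Matérn covariance function (which we assume here coincides with that of matern), the choice $\nu = 0.5$ reduces the model to the exponential covariance function,

      \[\textrm{cov}\{Y(\boldsymbol{s}), Y(\boldsymbol{u})\} = \sigma^2\exp(-\|\boldsymbol{s} - \boldsymbol{u}\|/\rho), \quad \boldsymbol{s}, \boldsymbol{u} \in \mathcal{D},\]

      so that the unknown range parameter $\rho$ directly controls the rate at which spatial dependence decays with distance.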

      Before proceeding, we load the required packages:

      using NeuralEstimators
      +using Flux
      +using GraphNeuralNetworks
      +using Distributions: Uniform
      +using Distances: pairwise, Euclidean
      +using LinearAlgebra
      +using Statistics: mean

      First, we define a function to sample parameters from the prior. As before, the sampled parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of sampled parameter vectors. We use the priors $\tau \sim U(0.1, 1)$ and $\rho \sim U(0.05, 0.5)$, and we assume that the parameters are independent a priori. Simulation from this model involves the computation of an expensive intermediate object, namely, the Cholesky factor of the covariance matrix. Storing this Cholesky factor for re-use enables fast simulation of new data sets (provided that the parameters are held fixed): hence, in this example, we define a custom type, Parameters, which is a sub-type of ParameterConfigurations, for storing the matrix of parameters and the corresponding intermediate objects needed for data simulation.

      If one wishes to make inference from a single spatial data set only, and this data is collected before the estimator is constructed, then the data can be simulated using the observed spatial locations. However, if one wishes to construct an estimator that is (approximately) Bayes irrespective of the spatial locations, then synthetic spatial locations must be generated during the training phase. If no prior knowledge on the sampling configuration is available, then a wide variety of spatial configurations must be simulated to produce an estimator that is broadly applicable. Below, we use a Matérn cluster process (see maternclusterprocess) for this task (note that the hyper-parameters of this process govern the expected number of locations in each sampled set of spatial locations, and the degree of clustering).

      We define two constructors for our Parameters object: one that constructs a Parameters object given a single integer K, and another that constructs a Parameters object given a pre-specified $p\times K$ matrix of parameters and a set of spatial locations associated with each parameter vector. These constructors will be useful in the workflow below.

      struct Parameters{T} <: ParameterConfigurations
      +	θ::Matrix{T}
      +	locations
      +	chols
      +	graphs
      +end
      +
      +function Parameters(K::Integer)
      +
      +	# Sample parameters from the prior distribution
      +	τ = rand(Uniform(0.1, 1.0), K)
      +	ρ = rand(Uniform(0.05, 0.5), K)
      +
      +	# Combine parameters into a pxK matrix
      +	θ = permutedims(hcat(τ, ρ))
      +
      +	# Simulate spatial locations from a cluster process over the unit square
      +	n = rand(Uniform(75, 200), K)
      +	λ = rand(Uniform(10, 50), K)
      +	locations = [maternclusterprocess(λ = λ[k], μ = n[k]/λ[k]) for k ∈ 1:K]
      +
      +	Parameters(θ::Matrix, locations)
      +end
      +
      +function Parameters(θ::Matrix, locations)
      +
      +	# Compute distance matrices and construct the graphs
      +	D = pairwise.(Ref(Euclidean()), locations, locations, dims = 1)
      +	A = adjacencymatrix.(D, 0.15)
      +	graphs = GNNGraph.(A)
      +
      +	# Compute Cholesky factors using the distance matrices
      +	ρ = θ[2, :]
      +	ν = 0.5
      +	σ = 1
      +	chols = maternchols(D, ρ, ν, σ.^2; stack = false)     
      +
      +	Parameters(θ, locations, chols, graphs)
      +end

      Next, we define a function for simulating from the model given an object of type Parameters. Although here we are constructing an estimator for a single replicate, the code below enables simulation of an arbitrary number of independent replicates m: one may provide a single integer for m, a range of values (e.g., 1:30), or any object that can be sampled from using rand(m, K).

      function simulate(parameters::Parameters, m)
      +
      +	K = size(parameters, 2)
      +	m̃ = rand(m, K)
      +
      +	τ      = parameters.θ[1, :]
      +	chols  = parameters.chols
      +	g      = parameters.graphs
      +
      +	# Z = Folds.map(1:K) do k # use this for parallel simulation (requires the Folds package)
      +	Z = map(1:K) do k
      +		L = chols[k][:, :]
      +		z = simulategaussianprocess(L, m̃[k])  # simulate a smooth field
      +		z = z + τ[k] * randn(size(z)...)      # add white noise
      +		z = batch([GNNGraph(g[k], ndata = z[:, i, :]') for i ∈ 1:m̃[k]])
      +		z
      +	end
      +
      +	return Z
      +end
      +simulate(parameters::Parameters, m::Integer) = simulate(parameters, range(m, m))

      Next, we construct an appropriate architecture using GNN and WeightedGraphConv; for example, a point estimator may be constructed as follows:

      # Propagation module
      +d = 1      # dimension of response variable
      +nh = 32    # dimension of node feature vectors
      +propagation = GNNChain(WeightedGraphConv(d => nh), WeightedGraphConv(nh => nh), WeightedGraphConv(nh => nh))
      +
      +# Readout module (using the elementwise average)
      +no = nh    # dimension of the final summary vector for each graph
      +readout = GlobalPool(mean)
      +
      +# Mapping module (use exponential output activation to ensure positive estimates)
      +p = 2     # number of parameters in the statistical model
      +w = 64    # width of layers used for the mapping network ϕ
      +ϕ = Chain(Dense(no, w, relu), Dense(w, w, relu), Dense(w, p, exp))
      +
      +# Construct the estimator
      +θ̂ = GNN(propagation, readout, ϕ)
      +θ̂ = PointEstimator(θ̂)

      Next, we train the neural estimator using train, here using the default absolute-error loss. We'll train the estimator using a single realisation per parameter configuration (i.e., with m = 1). Below, we use a very small number of epochs and a small number of training parameter vectors to keep the run time of this example low, and this will of course result in a poor estimator: in practice, one may set K to some large value (say, 10,000), and leave epochs unspecified so that training halts only when the risk function ceases to decrease.

      θ̂ = train(θ̂, Parameters, simulate, m = 1, epochs = 5, K = 500)

      Note that one may also train the estimator using fixed parameter sets and fixed training data (as described in train), by passing parameter or data instances:

      # Fixed parameters, but data simulated on-the-fly:
      +θ_train = Parameters(500)
      +θ_val   = Parameters(100)
      +θ̂ = train(θ̂, θ_train, θ_val, simulate, m = 1, epochs = 5)
      +
      +# Fixed parameters and data:
      +z_train = simulate(θ_train, 1)
      +z_val   = simulate(θ_val, 1)
      +θ̂ = train(θ̂, θ_train, θ_val, z_train, z_val, epochs = 5)

      Once the neural Bayes estimator has been trained, it can be assessed with testing data using assess, as illustrated in the univariate example above. Once assessed, it may be applied to observed data, with bootstrap-based uncertainty quantification facilitated by bootstrap and interval. Below, we use simulated data as a substitute for observed data:

      # Generate some toy data
      +parameters = Parameters(1)   # sample a single parameter vector
      +z = simulate(parameters, 1)  # simulate some data                  
      +θ = parameters.θ             # true parameters used to generate data
      +S = parameters.locations     # observed locations
      +
      +# Point estimates
      +θ̂(z)
      +
      +# Parametric bootstrap sample and bootstrap confidence interval
      +θ̃ = bootstrap(θ̂, Parameters(θ̂(z), S), simulate, 1)   
      +interval(θ̃)  					# confidence interval from the bootstrap estimates
      diff --git a/dev/workflow/overview/index.html b/dev/workflow/overview/index.html index 7b0f1a8..2230172 100644 --- a/dev/workflow/overview/index.html +++ b/dev/workflow/overview/index.html @@ -1,2 +1,2 @@ -Overview · NeuralEstimators.jl

      Overview

      To develop a neural estimator with NeuralEstimators.jl,

      • Sample parameters from the prior distribution: the parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of samples (i.e., parameter configurations) in the given parameter set (i.e., training, validation, or test set).
      • Simulate data from the assumed model over the parameter sets generated above. These data are stored as a Vector{A}, with each element of the vector associated with one parameter configuration, and where A depends on the representation of the neural estimator (e.g., an Array for CNN-based estimators, or a GNNGraph for GNN-based estimators).
      • Initialise a neural network, θ̂, that will be trained into a neural Bayes estimator.
      • Train θ̂ under the chosen loss function using train.
      • Assess θ̂ using assess. The resulting object of class Assessment can be used to assess the estimator with respect to the entire parameter space by estimating the risk function with risk, or used to plot the empirical sampling distribution of the estimator.
      • Apply θ̂ to observed data (once its performance has been checked in the above step). Bootstrap-based uncertainty quantification is facilitated with bootstrap and interval.

      See the Examples and, once familiar with the basic workflow, see Advanced usage for practical considerations on how to most effectively construct neural estimators.

      +Overview · NeuralEstimators.jl

      Overview

      To develop a neural estimator with NeuralEstimators.jl,

      • Sample parameters from the prior distribution: the parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of samples (i.e., parameter configurations) in the given parameter set (i.e., training, validation, or test set).
      • Simulate data from the assumed model over the parameter sets generated above. These data are stored as a Vector{A}, with each element of the vector associated with one parameter configuration, and where A depends on the representation of the neural estimator (e.g., an Array for CNN-based estimators, or a GNNGraph for GNN-based estimators). A brief sketch of these storage conventions is given below the list.
      • Initialise a neural network, θ̂, that will be trained into a neural Bayes estimator.
      • Train θ̂ under the chosen loss function using train.
      • Assess θ̂ using assess. The resulting object of class Assessment can be used to assess the estimator with respect to the entire parameter space by estimating the risk function with risk, or used to plot the empirical sampling distribution of the estimator.
      • Apply θ̂ to observed data (once its performance has been checked in the above step). Bootstrap-based uncertainty quantification is facilitated with bootstrap and interval.
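
      To make the storage conventions in the first two steps concrete, consider the following minimal sketch, in which the dimensions and the simulator are placeholders chosen purely for illustration:

      p, K, n, m = 2, 1000, 10, 30                 # illustrative dimensions

      θ = rand(p, K)                               # step 1: parameters stored as a p × K matrix
      Z = [randn(n, m) .+ θ[1, k] for k ∈ 1:K]     # step 2: data stored as a Vector, one element per parameter vector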

      See the Examples and, once familiar with the basic workflow, see Advanced usage for practical considerations on how to most effectively construct neural estimators.