diff --git a/404.html b/404.html
index a5afcb9c4..9a38a1ec9 100644
--- a/404.html
+++ b/404.html
@@ -302,20 +302,6 @@
-
sbi: simulation-based inference¶
sbi: A Python toolbox for simulation-based inference.
Inference can be run in a single line of code:
+Inference can be run in a single line of code
posterior = infer(simulator, prior, method='SNPE', num_simulations=1000)
and you can choose from a variety of amortized and sequential SBI methods.
+or in a few lines for more flexibility:
+inference = SNPE(prior=prior)
+_ = inference.append_simulations(theta, x).train()
+posterior = inference.build_posterior()
+
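The snippet above assumes that theta and x have already been generated by simulating from the prior; a minimal sketch of that step (with prior and simulator defined as in the installation guide, and a simulator that accepts a batch of parameter sets) could be:
theta = prior.sample((1000,))  # parameter sets drawn from the prior
x = simulator(theta)           # corresponding simulated data, one row per parameter set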
sbi
lets you choose from a variety of amortized and sequential SBI methods:
Amortized methods return a posterior that can be applied to many different observations without retraining, whereas sequential methods focus the inference on one particular observation to be more simulation-efficient. For an overview of implemented methods see below, or check out our GitHub page.
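To make the amortization point concrete: a single trained amortized posterior can be conditioned on several different observations without retraining. A short sketch (the observations x_o_1 and x_o_2 are placeholders):
samples_1 = posterior.sample((1000,), x=x_o_1)  # posterior given observation x_o_1
samples_2 = posterior.sample((1000,), x=x_o_2)  # same trained network, new observation, no retraining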
diff --git a/install/index.html b/install/index.html
index 07a4b9903..d779b7f25 100644
--- a/install/index.html
+++ b/install/index.html
@@ -319,20 +319,6 @@
-num_dim = 3
prior = utils.BoxUniform(low=-2 * torch.ones(num_dim), high=2 * torch.ones(num_dim))
-
def simulator(parameter_set):
return 1.0 + parameter_set + torch.randn(parameter_set.shape) * 0.1
sbi
can then run inference:
posterior = infer(simulator, prior, method="SNPE", num_simulations=1000)
+# Other methods are "SNLE" or "SNRE".
+posterior = infer(simulator, prior, method="SNPE", num_simulations=1000)
Running 1000 simulations.: 0%| | 0/1000 [00:00<?, ?it/s]
@@ -1043,32 +947,7 @@ Running the inference procedure
Next steps¶
-The single-line interface described above provides an easy entry for using sbi
. However, if you are working on a larger project or need additional features, we strongly recommend using the flexible interface.
-Requirements for the simulator, prior, and observation¶
-In the interface described above, you need to provide a prior and a simulator for training. Let’s talk about what requirements they need to satisfy.
-Prior¶
-A prior is a distribution object from which parameter sets can be sampled. Any class is allowed for the prior as long as it supports calling prior.sample()
and prior.log_prob()
.
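For illustration, any torch.distributions object already satisfies these requirements; a minimal sketch:
import torch
from torch.distributions import MultivariateNormal

# Any object exposing .sample() and .log_prob() can serve as the prior.
prior = MultivariateNormal(torch.zeros(3), torch.eye(3))
theta = prior.sample((10,))    # batch of 10 parameter sets, shape (10, 3)
log_p = prior.log_prob(theta)  # log-density of each parameter set, shape (10,)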
-Simulator¶
-The simulator is a Python callable that takes in a parameter set and outputs data with some (even if very small) stochasticity.
-Allowed data types and shapes for input and output:
-
-- the input parameter set and the output have to be either a
np.ndarray
or a torch.Tensor
.
-- the input parameter set should have either shape
(1,N)
or (N)
, and the output must have shape (1,M)
or (M)
.
-
-You can call simulators not written in Python as long as you wrap them in a Python function.
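For example, a non-Python simulator exposed as a command-line program could be wrapped roughly as follows; the executable name and its text-based input/output format are hypothetical:
import subprocess
import torch

def external_simulator(parameter_set):
    # Pass the parameters to a hypothetical command-line simulator and parse its output.
    args = [str(value) for value in parameter_set.flatten().tolist()]
    result = subprocess.run(
        ["./my_simulator_binary", *args], capture_output=True, text=True, check=True
    )
    values = [float(v) for v in result.stdout.split()]
    return torch.tensor(values)  # output of shape (M,)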
-Observation¶
-Once you have a trained posterior, you will want to evaluate or sample the posterior \(p(\theta|x_o)\) at certain observed values \(x_o\):
-
-- The allowable data types are either Numpy
np.ndarray
or a torch torch.Tensor
.
-- The shape must be either
(1,M)
or just (M)
.
-
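For example, evaluating or sampling the trained posterior at an observation of shape (1, M) might look as follows (the numerical values are illustrative, and posterior is the object returned by infer above):
import torch

x_o = torch.tensor([[1.2, 0.8, 1.1]])           # observation of shape (1, 3)
samples = posterior.sample((1000,), x=x_o)      # posterior samples given x_o
log_probs = posterior.log_prob(samples, x=x_o)  # posterior log-density of those samples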
-Running different algorithms¶
-sbi
implements three classes of algorithms that can be used to obtain the posterior distribution: SNPE, SNLE, and SNRE. You can try the different algorithms by simply swapping out the method
:
-posterior = infer(simulator, prior, method="SNPE", num_simulations=1000)
-posterior = infer(simulator, prior, method="SNLE", num_simulations=1000)
-posterior = infer(simulator, prior, method="SNRE", num_simulations=1000)
-
-You can then infer, sample, evaluate, and plot the posterior as described above.
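For instance, one can draw samples for an observation x_o and visualize them with the pairplot utility from sbi.analysis (a minimal sketch; plotting options are omitted):
from sbi.analysis import pairplot

samples = posterior.sample((1000,), x=x_o)  # posterior samples for an observation x_o
fig, axes = pairplot(samples)               # corner-style plot of the posterior marginals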
+The single-line interface described above provides an easy entry for using sbi
. However, on almost any real-world problem that goes beyond a simple demonstration, we strongly recommend using the flexible interface.
@@ -1104,13 +983,13 @@ Running different algorithms
+
diff --git a/tutorial/09_sensitivity_analysis/index.html b/tutorial/09_sensitivity_analysis/index.html
index b0e691433..b7a80d3cf 100644
--- a/tutorial/09_sensitivity_analysis/index.html
+++ b/tutorial/09_sensitivity_analysis/index.html
@@ -311,20 +311,6 @@
- Amortized inference
Flexible interface
@@ -340,8 +326,8 @@
- Sampler interface
+ Amortized inference
@@ -415,8 +401,8 @@
- Using Variational Inference for Building Posteriors
+ Sampling algorithms in sbi
@@ -457,8 +443,8 @@
- Handling invalid simulations
+ SBI with trial-based data
@@ -471,8 +457,8 @@
- Crafting summary statistics
+ Handling invalid simulations
@@ -485,8 +471,8 @@
- SBI with trial-based data
+ Crafting summary statistics
diff --git a/tutorial/10_crafting_summary_statistics/index.html b/tutorial/10_crafting_summary_statistics/index.html
index d842e288a..f69c3a25c 100644
--- a/tutorial/10_crafting_summary_statistics/index.html
+++ b/tutorial/10_crafting_summary_statistics/index.html
@@ -311,20 +311,6 @@
- Amortized inference
Flexible interface
@@ -340,8 +326,8 @@
- Sampler interface
+ Amortized inference
@@ -417,8 +403,8 @@
- Using Variational Inference for Building Posteriors
+ Sampling algorithms in sbi
@@ -458,6 +444,20 @@
+ SBI with trial-based data
Handling invalid simulations
@@ -491,20 +491,6 @@
- SBI with trial-based data
@@ -1167,13 +1153,13 @@ 1.7 Explicit recommendations
@@ -933,7 +844,7 @@
The sampler interface¶
+ Sampling algorithms in sbi
¶
Note: this tutorial requires that the user is already familiar with the flexible interface.
sbi
implements three methods: SNPE, SNLE, and SNRE. When using SNPE, the trained neural network directly approximates the posterior. Thus, sampling from the posterior can be done by sampling from the trained neural network. The neural networks trained in SNLE and SNRE approximate the likelihood(-ratio). Thus, in order to draw samples from the posterior, one has to perform additional sampling steps, e.g. Markov-chain Monte-Carlo (MCMC). In sbi
, the implemented samplers are:
@@ -947,12 +858,44 @@ The sampler interface, mcmc_method="slice_np"
, and mcmc_parameters={}
. However, for full flexibility in customizing the sampler, we recommend using the sampler interface. This interface is described here. Further details can be found here.
-Main syntax for SNLE¶
+
Below, we will demonstrate how these samplers can be used in sbi
. First, we train the neural network as always:
+import torch
+from sbi.inference import SNLE
+
+# dummy Gaussian simulator for demonstration
+num_dim = 2
+prior = torch.distributions.MultivariateNormal(torch.zeros(num_dim), torch.eye(num_dim))
+theta = prior.sample((1000,))
+x = theta + torch.randn((1000, num_dim))
+x_o = torch.randn((1, num_dim))
+
+inference = SNLE(prior=prior, show_progress_bars=False)
+likelihood_estimator = inference.append_simulations(theta, x).train()
+
+And then we pass the options for which sampling method to use to the build_posterior()
method:
+# Sampling with MCMC
+sampling_algorithm = "mcmc"
+mcmc_method = "slice_np" # or nuts, or hmc
+posterior = inference.build_posterior(sample_with=sampling_algorithm, mcmc_method=mcmc_method)
+
+# Sampling with variational inference
+sampling_algorithm = "vi"
+vi_method = "rKL" # or fKL
+posterior = inference.build_posterior(sample_with=sampling_algorithm, vi_method=vi_method)
+# Unlike other methods, vi needs a training step for every observation.
+posterior = posterior.set_default_x(x_o).train()
+
+# Sampling with rejection sampling
+sampling_algorithm = "rejection"
+posterior = inference.build_posterior(sample_with=sampling_algorithm)
+
+More flexibility in adjusting the sampler¶
+With the above syntax, you can easily try out different sampling algorithms. However, in many cases, you might want to customize your sampler. Below, we demonstrate how you can change hyperparameters of the samplers (e.g. number of warm-up steps of MCMC) or how you can write your own sampler from scratch.
+Main syntax (for SNLE and SNRE)¶
+As above, we begin by training the neural network as always:
import torch
from sbi.inference import SNLE
-from sbi.inference import likelihood_estimator_based_potential, MCMCPosterior
# dummy Gaussian simulator for demonstration
num_dim = 2
@@ -963,17 +906,31 @@ Main syntax for SNLE
inference = SNLE(show_progress_bars=False)
likelihood_estimator = inference.append_simulations(theta, x).train()
+
+ Neural network successfully converged after 52 epochs.
+
+
+Then, for full flexibility on using the sampler, we do not use the .build_posterior()
method, but instead we explicitly define the potential function and the sampling algorithm (see below for explanation):
+from sbi.inference import likelihood_estimator_based_potential, MCMCPosterior
potential_fn, parameter_transform = likelihood_estimator_based_potential(
likelihood_estimator, prior, x_o
)
posterior = MCMCPosterior(
- potential_fn, proposal=prior, theta_transform=parameter_transform
+ potential_fn, proposal=prior, theta_transform=parameter_transform, warmup_steps=10
)
- Neural network successfully converged after 52 epochs.
-
+If you want to use variational inference or rejection sampling, you have to replace the last line with VIPosterior
or RejectionPosterior
:
+from sbi.inference import VIPosterior, RejectionPosterior
+
+# For VI, we have to train.
+posterior = VIPosterior(
+ potential_fn, proposal=prior, theta_transform=parameter_transform
+).train()
+posterior = RejectionPosterior(
+ potential_fn, proposal=prior, theta_transform=parameter_transform
+)
+
+At this point, you could also plug the potential_fn
into any sampler of your choice and not rely on any of the in-built sbi
-samplers.
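For instance, a hand-rolled random-walk Metropolis-Hastings loop could consume the potential function directly. The sketch below is illustrative only and assumes that potential_fn returns the unnormalized log-posterior for a batch of parameters of shape (1, num_dim), as set up above:
import torch

def metropolis_hastings(potential_fn, theta0, num_samples=1000, step=0.1):
    # Random-walk Metropolis-Hastings using the potential as the target log-density.
    theta, log_p = theta0.clone(), potential_fn(theta0)
    samples = []
    for _ in range(num_samples):
        proposal = theta + step * torch.randn_like(theta)
        log_p_proposal = potential_fn(proposal)
        if torch.rand(1) < torch.exp(log_p_proposal - log_p):
            theta, log_p = proposal, log_p_proposal
        samples.append(theta.clone())
    return torch.cat(samples)

chain = metropolis_hastings(potential_fn, prior.sample((1,)))
For anything beyond a toy example, the samplers built into sbi (or dedicated external libraries) will be more robust and efficient than such a hand-written loop.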
Further explanation¶
The first lines are the same as for the flexible interface:
inference = SNLE()
@@ -1043,7 +1000,7 @@ Main syntax for SNPE
-
+
Main syntax for SNPE
Previous
- Flexible interface
+ Multi-round inference
@@ -1052,20 +1009,20 @@
-
+
There are scenarios in which we observe multiple data points per experiment and we can assume that they are independent and identically distributed (iid, i.e., they are assumed to have the same underlying model parameters).
-For example, in a decision-making experiments, the experiment is often repeated in trials with the same experimental settings and conditions. The corresponding set of trials is then assumed to be “iid”.
+For example, in decision-making experiments, the experiment is often repeated in trials with the same experimental settings and conditions. The corresponding set of trials is then assumed to be “iid” given a single parameter set.
In such a scenario, we may want to obtain the posterior given a set of observations \(p(\theta | X=\{x_i\}_i^N)\).
For some SBI variants the iid assumption can be exploited: when using a likelihood-based SBI method (SNLE
, SNRE
) one can train the density or ratio estimator on single-trial data, and then perform inference with MCMC
. Crucially, because the data is iid and the estimator is trained on single-trial data, one can repeat the inference with a different x_o
(a different set of trials, or different number of trials) without having to retrain the density estimator. One can interpet this as amortization of the SBI training: we can obtain a neural likelihood, or likelihood-ratio estimate for new x_o
s without retraining, but we still have to run MCMC
or VI
to do inference.
In addition, one can not only change the number of trials of a new x_o
, but also the entire inference setting.
-For example, one can apply hierarchical inference scenarios with changing hierarchical denpendencies between the model parameters–all without having to retrain the density estimator because that is based on estimating single-trail likelihoods.
For some SBI variants the iid assumption can be exploited: when using a likelihood-based SBI method (SNLE
, SNRE
) one can train the density or ratio estimator on single-trial data, and then perform inference with MCMC
or variational inference (VI
). Crucially, because the data is iid and the estimator is trained on single-trial data, one can repeat the inference with a different x_o
(a different set of trials, or a different number of trials) without having to retrain the density estimator. One can interpret this as amortization of the SBI training: we can obtain a neural likelihood or likelihood-ratio estimate for new x_o
s without retraining, but we still have to run MCMC
or VI
to do inference.
In addition, one can not only change the number of trials of a new x_o
, but also the entire inference setting.
+For example, one can apply hierarchical inference with changing hierarchical dependencies between the model parameters, all without having to retrain the density estimator because it estimates single-trial likelihoods.
When performing neural posterior estimation (SNPE
) we cannot exploit the iid assumption directly because we are learning a density estimator in theta
.
+
When performing neural posterior estimation (SNPE
) we cannot exploit the iid assumption directly.
Thus, the underlying neural network takes x
as input and predicts the parameters of the density estimator.
-As a consequence, if x
is a set of iid observations \(X=\{x_i\}_i^N\) then the neural network has to be invariant to permutations of this set, i.e., it has to be permutation invariant.
-Overall, this means that we can use SNPE
for inference with iid data, however, we need to provide a corresponding embedding network that handles the iid-data and is permutation invariant.
-This will likely require some hyperparameter tuning and more training data for the inference to work accurately. But once we have this, the inference is fully amortized, i.e., we can get new posterior samples basically instantly without retraining and without running MCMC
or VI
.
Let us first have a look at how trial-based inference works in SBI
before we discuss models with “mixed data types”.
+As a consequence, if x is a set of iid observations \(X=\{x_i\}_i^N\) then the neural network has to be invariant to permutations of this set, i.e., it has to be permutation invariant. In addition, the neural network has to be able to consume a varying number of iid datapoints in order to be amortized over the number of trials.
+Therefore, in order to use SNPE
for inference on iid data, we need to provide a corresponding embedding network that handles the iid-data.
+This will likely require some hyperparameter tuning and more training data for inference to work accurately. But once we have this, inference is fully amortized, i.e., we can get new posterior samples almost instantly without retraining and without running MCMC
or VI
.
For illustration we use a simple linear Gaussian simulator, as in previous tutorials. The simulator takes a single parameter (vector), the mean of the Gaussian, and its variance is set to one.
-We define a Gaussian prior over the mean and perform inference.
-The observed data is again a from a Gaussian with some fixed “ground-truth” parameter \(\theta_o\).
-Crucially, the observed data x_o
can consist of multiple samples given the same ground-truth parameters and these samples are then iid:
+For illustration, we use a simple linear Gaussian simulator, as in previous tutorials. The simulator takes a single parameter (vector) which is the mean of a Gaussian. The simulator then adds noise with a fixed variance (set to one).
+We define a Gaussian prior over the mean and perform inference.
+The observed data is also sampled from a Gaussian with some fixed “ground-truth” parameter \(\theta_o\).
+Crucially, the observed data x_o
can consist of multiple samples given the same ground-truth parameters and these samples are iid given \(\theta\):
For this toy problem the ground-truth posterior is well defined, it is again a Gaussian, centered on the mean of \(\mathbf{x_o}\) and with variance scaled by the number of trials \(N\), i.e., the more trials we observe, the more information about the underlying \(\theta_o\) we have and the more concentrated the posteriors becomes.
+For this toy problem, the ground-truth posterior is well defined: it is again a Gaussian, centered on the mean of \(\mathbf{x_o}\) and with variance scaled by the number of trials \(N\), i.e., the more trials we observe, the more information about the underlying \(\theta_o\) we have and the more concentrated the posterior becomes.
We will illustrate this below:
import torch
import matplotlib.pyplot as plt
@@ -1156,8 +1141,7 @@
Indeed, with an increasing number of trials the posterior density concentrates around the true underlying parameter.
IID inference with NLE¶
-(S)NLE can easily perform inference given multiple IID x because it is based on learning the likelihood. Once the likelihood is learned on single trials, i.e., a neural network that given a single observation and a parameter predicts the likelihood of that observation given the parameter, one can perform MCMC to obtain posterior samples.
-MCMC relies on evaluating ratios of likelihoods of candidate parameters to either accept or reject them to be posterior samples. When inferring the posterior given multiple IID observation, these likelihoods are just the joint likelihoods of each IID observation given the current parameter candidate. Thus, given a neural likelihood from SNLE, we can calculate these joint likelihoods and perform MCMC given IID data, we just have to multiply together (or add in log-space) the individual trial-likelihoods (sbi
takes care of that).
+(S)NLE and (S)NRE can perform inference given multiple IID observations by using only single-trial training data (i.e., for training, we run the simulator only once per parameter set). Once the likelihood is learned on single trials (i.e., a neural network that predicts the likelihood of a single observation given a parameter set), one can sample the posterior for any number of trials. This works because, given a single-trial neural likelihood from (S)NLE or (S)NRE, we can calculate the joint likelihood of all trials by multiplying the single-trial likelihoods together (or adding them in log-space). The joint likelihood can then be plugged into MCMC
or VI
. sbi
takes care of all of these steps, so you do not have to implement anything yourself:
# Train SNLE.
inferer = SNLE(prior, show_progress_bars=True, density_estimator="mdn")
theta, x = simulate_for_sbi(simulator, prior, 10000, simulation_batch_size=1000)
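To make the “adding in log-space” point explicit, here is a conceptual illustration (sbi performs this step internally; the function below is only for exposition and is not part of the sbi API):
import torch

def joint_log_likelihood(theta, x_o, single_trial_log_likelihood):
    # x_o holds iid trials, shape (num_trials, x_dim); theta is one candidate parameter set.
    # The joint log-likelihood of iid trials is the sum of the single-trial log-likelihoods.
    return torch.stack([single_trial_log_likelihood(theta, x_i) for x_i in x_o]).sum()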
@@ -1265,10 +1249,11 @@ IID inference with NLE
IID inference with NPE using permutation-invariant embedding nets¶
-For NPE we need to define an embedding net that handles the set-like structure of iid-data, i.e., that it permutation invariant and can handle different number of trials.
+For NPE we need to define an embedding net that handles the set-like structure of iid data, i.e., one that is permutation invariant and can handle a varying number of trials.
We implemented several embedding net classes that allow you to construct such a permutation- and number-of-trials-invariant embedding net.
To become permutation invariant, the neural net first learns embeddings for single trials and then performs a permutation invariant operation on those embeddings, e.g., by taking the sum or the mean (Chen et al. 2018, Radev et al. 2021).
-To become invariant w.r.t. the number-of-trials, we train the net with varying number of trials for each parameter setting. As it is difficult to handle tensors of varying lengths in the SBI training loop, we construct a training data set in which “unobserved” trials are mask by NaNs (and ignore the resulting SBI warning about NaNs in the training data).
+To become invariant w.r.t. the number of trials, we train the net with a varying number of trials for each parameter setting. This means that, unlike for (S)NLE and (S)NRE, (S)NPE requires running the simulator multiple times per parameter set to generate the training data.
+In order to implement this in sbi
, “unobserved” trials in the training dataset have to be masked by NaNs (the resulting sbi warning about NaNs in the training data can be ignored).
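As an illustration of the idea, a permutation-invariant embedding can embed each trial separately and then average the embeddings while ignoring the NaN-masked trials. The class below is a plain-PyTorch sketch, not one of the embedding net classes shipped with sbi:
import torch
from torch import nn

class MeanTrialEmbedding(nn.Module):
    def __init__(self, trial_dim, embed_dim=10):
        super().__init__()
        self.single_trial_net = nn.Sequential(
            nn.Linear(trial_dim, 32), nn.ReLU(), nn.Linear(32, embed_dim)
        )

    def forward(self, x):
        # x: (batch, max_num_trials, trial_dim); unobserved trials are NaN.
        observed = ~torch.isnan(x).any(dim=-1, keepdim=True)        # (batch, trials, 1)
        e = self.single_trial_net(torch.nan_to_num(x, nan=0.0))     # per-trial embeddings
        return (e * observed).sum(dim=1) / observed.sum(dim=1).clamp(min=1)  # masked mean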
Construct training data set.¶
# we need to fix the maximum number of trials.
max_num_trials = 20
@@ -1486,7 +1471,7 @@ Amortized inference
-
+
Amortized inference
Previous
- Crafting summary statistics
+ Learning summary statistics
@@ -1495,20 +1480,20 @@
In the previous tutorial, we saw how to build the posterior and how to specialize on one specific observation x_o
. If one uses SNPE, then the posterior can be sampled from directly, yet this comes at the expense of necessary correction terms during training, since the samples are obtained from the “wrong” prior for num_rounds > 1
. For SNLE or SNRE, MCMC sampling is required, which is computationally expensive. With SNVI (sequential neural variational inference), it is possible to directly sample from the posterior without any corrections during training or without expensive MCMC for sampling. This is possible by learning the posterior with variational inference techniques. For this, an additional network (one for the likelihood or likelihood-to-evidence-ratio) must be trained first.
If one uses SNPE, then the posterior can be sampled from directly (without MCMC). In contrast, for SNLE or SNRE, MCMC sampling is required, which is computationally expensive. With SNVI (sequential neural variational inference), it is possible to sample from the posterior directly, without any corrections during training and without expensive MCMC for sampling. This is possible by learning the posterior with variational inference techniques. For this, an additional network (one for the likelihood or the likelihood-to-evidence ratio) must be trained first.
inference = SNLE(prior)
@@ -1047,41 +980,6 @@ Linear Gaussian example
-
-