
Arbitrary convergence of posteriors for SNPE_C, SNLE and SNRE #1290

Open
paarth-dudani opened this issue Oct 4, 2024 · 7 comments
paarth-dudani commented Oct 4, 2024

Describe the bug
SNLE, SNRE and SNPE_C converge arbitrarily for a simple simulator model with just one parameter to infer.

To Reproduce

  1. Python (3.9.13) and SBI version (0.23.1)
  2. SNPE:
import numpy as np
import torch
from torch.distributions import Exponential
from sbi.inference import NPE, NLE, NRE

# ## Function to introduce noise into the exponential:
def noisy_exponential(initial_cond, scale, input):
    x_noise = np.zeros((len(input)),)
    for i in range(len(input)):
        value = 1/(initial_cond)*np.exp(-input[i]/scale) + 0.05*np.random.normal(0,1)
        x_noise[i] = value
    return x_noise


# Define the prior
class WrappedExponential(Exponential):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def log_prob(self, *args, **kwargs):
        log_probs = super().log_prob(*args, **kwargs)
        return log_probs.squeeze()

exp_prior_mod = WrappedExponential(torch.tensor([2.0]))

n = 50
t = np.linspace(0,10,n)
num_dims = 1
num_sims = 100
num_rounds = 2
theta_true = torch.tensor([1.0])
initial_condition = 1.0

x_o = torch.tensor(noisy_exponential(1.0, theta_true,t)).float()

simulator = lambda theta: ((1/initial_condition)*np.exp(-t/theta) + 0.05*torch.rand_like(theta)).float()
inference = NPE(prior=exp_prior_mod, density_estimator='maf')
proposal = exp_prior_mod

# Later rounds use the APT loss.
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x, proposal=proposal).train(use_combined_loss=False, retrain_from_scratch=True)
    posterior = inference.build_posterior().set_default_x(x_o)
    proposal = posterior
  3. SNRE (all else the same):
inference = NRE(exp_prior_mod)
proposal = exp_prior_mod
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    print(np.shape(theta))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior(mcmc_method="slice_np_vectorized",
                                          mcmc_parameters={"num_chains": 10,
                                                           "thin": 5})
    proposal = posterior.set_default_x(x_o)
  4. SNLE (all else the same):
inference = NLE(exp_prior_mod)
proposal = exp_prior_mod
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x).train(validation_fraction=0.1, retrain_from_scratch=False)
    posterior = inference.build_posterior(mcmc_method="slice_np_vectorized",
                                          mcmc_parameters={"num_chains": 10,
                                                           "thin": 5})
    proposal = posterior.set_default_x(x_o)

Examples of pathological convergence:

  1. SNPE
     [screenshots: posterior plots]
  2. SNRE
     [screenshots: posterior plots]
  3. SNLE
     [screenshots: posterior plots]

Expected behavior
Proper, robust convergence to the true theta value, with low variability across iterations and across different true values of the parameter being inferred.

Additional context
I would appreciate suggestions, as this is an urgent issue I have been stuck on for more than a week.

paarth-dudani added the bug label Oct 4, 2024
michaeldeistler (Contributor) commented:

You are currently using 100 simulations (i.e., datapoints) to estimate a 50-d distribution (with NLE). I would suggest increasing the number of simulations to at least 1000.

paarth-dudani (Author) commented Oct 5, 2024

What you suggested does seem to help but it doesn't fully address the issue:

  1. SNPE
     [screenshots: posterior plots]
  2. SNRE
     [screenshots: posterior plots]
  3. SNLE
     [screenshots: posterior plots]

I changed the number of simulations to 1500 or 2000.

manuelgloeckler (Contributor) commented Oct 5, 2024

As general advice: if you perform such studies, use the same simulator for x_o that you use in training. In your case,

simulator = lambda theta: ((1/initial_condition)*np.exp(-t/theta) + 0.05*torch.rand_like(theta)).float()

uses uniform noise on the range 0 to 0.05, whereas the observation uses 0.05*np.random.normal(0,1), i.e., a normal noise distribution.

This adds the complication of model misspecification to your problem, which can be problematic for the neural nets involved (i.e., x_o falls outside the training distribution). It also means that the true parameter need not be covered by the estimated posterior. I suppose this was not intentional. Specifically, in this case the observation might not even be within the support of the data distribution implied by the simulator, so no parameter may exist that reproduces the observation (and Bayesian inference is then ill-defined anyway).
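For illustration, a minimal sketch of one consistent pairing; this assumes you intended Gaussian noise, drawn i.i.d. per time point, in both places (the fixed example further below instead keeps the uniform-noise simulator and generates x_o from it, which is equally consistent). Note also that torch.rand_like(theta) draws a single noise value per sample, not per time point.

import torch

t = torch.linspace(0, 10, 50)
initial_condition = 1.0

def simulator_consistent(theta):
    # theta: (num_sims, 1); broadcasting against t (shape (50,)) gives (num_sims, 50).
    clean = (1 / initial_condition) * torch.exp(-t / theta)
    # Same Gaussian noise model as the observation, i.i.d. per time point.
    return (clean + 0.05 * torch.randn_like(clean)).float()

x_o = simulator_consistent(torch.tensor([[1.0]]))  # observation from the same simulator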

Next, some general advice for (multi-round) training:

  • Multi-round training is very sensitive to the results obtained in the first round. Especially in your case, the simulator is almost deterministic, and its associated posterior is basically a point mass. This makes it even more important to get a good first estimate, as afterward your training thetas will basically just be a single number (i.e., the point mass of the first-round posterior estimate). So it is a good idea to assign a sufficient simulation budget to the first round.
  • I would use an NSF as the density estimator in 1d (a MAF is just a Gaussian transformed to the prior domain).
  • I would not enable retrain_from_scratch, as it amplifies the requirement that your initial estimate be rather good.

Applying this to NPE will lead to consistently good posterior estimates on my machine:

import numpy as np 
import torch 

from torch.distributions import Uniform, Exponential

from sbi.inference import NPE
from sbi.analysis import pairplot

# ## Function to introduce noise into the exponential:
def noisy_exponential(initial_cond, scale, input):
    x_noise = np.zeros((len(input)),)
    for i in range(len(input)):
        value = 1/(initial_cond)*np.exp(-input[i]/scale) + 0.05*np.random.normal(0,1)
        x_noise[i] = value
    return x_noise


# Define the prior
class WrappedExponential(Exponential):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def log_prob(self, *args, **kwargs):
        log_probs = super().log_prob(*args, **kwargs)
        return log_probs.squeeze()

exp_prior_mod = WrappedExponential(torch.tensor([2.0]))

n = 50
t = np.linspace(0,10,n)
num_dims = 1
num_sims = 1000
num_rounds = 2
theta_true = torch.tensor([1.])
initial_condition = 1.0

#x_o = torch.tensor(noisy_exponential(1.0, theta_true,t)).float()

simulator = lambda theta: ((1/initial_condition)*np.exp(-t/theta) + 0.05*torch.rand_like(theta)).float()

x_o = simulator(theta_true)

inference = NPE(prior=exp_prior_mod, density_estimator="nsf")
proposal = exp_prior_mod

# Later rounds use the APT loss.
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x, proposal=proposal).train(use_combined_loss=False, retrain_from_scratch=False)
    posterior = inference.build_posterior().set_default_x(x_o)
    proposal = posterior

manuelgloeckler removed the bug label Oct 7, 2024
paarth-dudani (Author) commented Oct 7, 2024

Thanks for your suggestions!

I have made the appropriate changes. In particular, I rewrote the simulator function, since it was not giving me the noisy data that I was looking for.

simulator = lambda theta: ((1/theta)*np.exp(-t/theta) + initial_condition + torch.rand_like(theta)).float()

gives me the following:

[screenshot: simulated data]

Whereas using the following simulator:

def simulator_custom(theta, num_sim, input):
    output = np.zeros((num_sim, len(input)))
    for i in range(num_sim):
        for j in range(len(input)):
            output[i,j] = ((1/theta[i])*np.exp(-t[j]/theta[i]) + initial_condition + 0.05*np.random.normal(0,1))
            
    return torch.tensor(output).float()

gives me the following noisy data which I prefer:

[screenshot: noisy simulated data]
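As an aside, the double loop can be vectorized; a sketch assuming theta has shape (num_sim, 1), as returned by proposal.sample((num_sim,)):

def simulator_vectorized(theta, input):
    # theta: (num_sim, 1); input: (n,) -> output: (num_sim, n) via broadcasting.
    t_row = torch.as_tensor(input).float()
    clean = (1 / theta) * torch.exp(-t_row / theta)
    noise = 0.05 * torch.randn_like(clean)  # i.i.d. Gaussian noise per time point
    return (clean + initial_condition + noise).float()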

So with this change, as I implemented SNPE_C, SNLE and SNRE, they yield good posteriors but occasionally yield slightly pathological ones as shown below:

SNPE:

[screenshot: posterior plot]

SNRE:

[screenshot: posterior plot]

SNLE:

[screenshot: posterior plot]

Following is my code for SNPE inference; the other two are analogous:

inference = NPE(prior=exp_prior_mod, density_estimator='nsf')
proposal = exp_prior_mod

# Later rounds use the APT loss.
for _ in range(num_rounds):
    theta = proposal.sample((num_sim,))
    x = simulator_custom(theta, num_sim, t)
    x = np.clip(x, 0, 8)
    _ = inference.append_simulations(theta, x, proposal=proposal).train(use_combined_loss=True, retrain_from_scratch=False)
    posterior = inference.build_posterior().set_default_x(x_o)
    proposal = posterior

manuelgloeckler (Contributor) commented:

Hey,
Your results align with what I would expect. I am happy that it helped.

In the end, you are working with few model simulations in a relatively high-dimensional x space, so it is expected that there will be deviations between the posteriors returned by the different methods. In particular, NLE and NRE have to estimate a 50-d density from just a few samples. A general rule of thumb is that NPE > NLE/NRE if dim(x) >> dim(theta).
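If you want to stay with NLE/NRE here, one common workaround is to compress x to a few summary statistics before inference. A sketch with purely illustrative statistics:

import torch

def summarize(x):
    # x: (batch, n) time series -> (batch, 3) summaries (illustrative choices).
    return torch.stack([x.mean(dim=1), x.std(dim=1), x[:, 0]], dim=1)

# Train and condition on summaries instead of the raw series:
# inference.append_simulations(theta, summarize(x)).train()
# posterior.set_default_x(summarize(x_o))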

Feel free to close the issue if this resolves your problem.

Kind regards,
Manuel

paarth-dudani (Author) commented:

In this case, should I reduce the dimensionality of the dataset?

The reason I am asking is that, according to the following paragraph from the sbi package website, the ground truth should be within the range of the posterior:

[screenshot: quoted paragraph from the sbi documentation]

However, as you can see in the following plots and those in the previous post, I sometimes get posterior estimates whose range excludes the true parameter value:

[screenshots: posterior plots]

What else can I do to rectify this? I face this issue with some estimates from the other neural estimation algorithms as well.

Following is my code:

from sbi.utils import BoxUniform

n = 100
t = np.linspace(0,10,n)

num_dim = 1
num_sim = 1000
num_rounds = 4
theta_true = torch.tensor([2.0])
initial_condition = 0.0
Uniform_prior = BoxUniform(low=0.1 * torch.ones(num_dim), high=4 * torch.ones(num_dim))

def simulator_custom(theta, num_sim, input):
    output = np.zeros((num_sim, len(input)))
    for i in range(num_sim):
        for j in range(len(input)):
            output[i,j] = ((1/theta[i])*np.exp(-t[j]/theta[i]) + initial_condition + 0.05*np.random.normal(0,1))
            
    return torch.tensor(output).float()

np.random.seed(0)
x_o = simulator_custom(theta_true, 1, t)
np.random.seed(0)

inference = NPE(prior=Uniform_prior, density_estimator='mdn')  # use the BoxUniform prior defined above
proposal = Uniform_prior

# Later rounds use the APT loss.
for _ in range(num_rounds):
    theta = proposal.sample((num_sim,))
    print(theta[-1])
    x = simulator_custom(theta, num_sim, t)
    x = np.clip(x, 0, 8)
    _ = inference.append_simulations(theta, x, proposal=proposal).train(use_combined_loss=True, retrain_from_scratch=True)
    posterior = inference.build_posterior().set_default_x(x_o)
    proposal = posterior

manuelgloeckler (Contributor) commented Oct 10, 2024

In general, under the true posterior, we would expect the true parameter to be contained in the 95% highest-density region (i.e., the peak in your density plots) 95% of the time.

So if you say that it "sometimes" lies outside the posterior, this does not necessarily indicate an error. In fact, given the true posterior, we would expect it to lie outside in 5% of independent trials (following the example above).

A way to characterize whether there is a problem is simulation-based calibration (SBC) analysis. In essence, it simply checks whether the above holds for your approximation (at all x% levels).

There are a bunch of diagnostic tools implemented in sbi that only require a test set of new thetas:

  • SBC (what I indicated above; it can only indicate whether the posterior is correct on average); see the sketch after this list.
  • L-C2ST (can also test local correctness).
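A minimal SBC sketch (assuming a recent sbi version where run_sbc and check_sbc live in sbi.diagnostics; prior, simulator, and posterior stand for your own objects). Note that SBC evaluates the posterior across many observations drawn from the prior predictive, so for sequential methods it is mainly meaningful for the first-round (amortized) estimate:

from sbi.diagnostics import check_sbc, run_sbc

# Fresh test set: parameters from the prior, data from the simulator.
num_sbc_runs = 200
thetas = prior.sample((num_sbc_runs,))
xs = simulator(thetas)

# Rank each true theta among posterior samples; a calibrated posterior gives uniform ranks.
ranks, dap_samples = run_sbc(thetas, xs, posterior, num_posterior_samples=1000)
check_stats = check_sbc(ranks, thetas, dap_samples, num_posterior_samples=1000)
print(check_stats)  # KS p-values and C2ST scores for the rank distribution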

If you have rather high-dimensional data, then NPE gives you the opportunity to include an [embedding net](https://sbi-dev.github.io/sbi/latest/tutorials/04_embedding_networks/). This can be trained jointly with the density estimator and/or include certain regularity constraints you might want to impose.

This is helpful if you have very high-dimensional data. For example, if you have images, it makes sense to use a convolutional neural network as an embedding net that compresses the image into, e.g., a 100-dim "summary" vector.
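For example, a sketch of attaching a small fully-connected embedding to NPE; any torch.nn.Module works as the embedding net, and posterior_nn is assumed to be importable from sbi.neural_nets (the exact path can differ across sbi versions):

import torch.nn as nn
from sbi.inference import NPE
from sbi.neural_nets import posterior_nn

n = 50  # dim(x); 100 in your latest code
# Compress the n-dim time series to 10 features, trained jointly with the flow.
embedding = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, 10))
build_density_estimator = posterior_nn(model="nsf", embedding_net=embedding)
inference = NPE(prior=Uniform_prior, density_estimator=build_density_estimator)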
