Bayesian inference

With Bayesian inference we ask the question ‘how does our understanding of the inputs change given some observation of the outputs of the model?’, i.e. we update the prior distributions of the inputs to posterior distributions based on observations. Bayesvalidrox provides a dedicated class for this task, bayesvalidrox.bayes_inference.bayes_inference.BayesInference, which infers the posterior via rejection sampling or MCMC. In rejection sampling, the likelihood is estimated with the help of bootstrapping. MCMC-specific parameters are given as a dictionary called mcmc_params, which can include the following keys (see the sketch after this list):

  • init_samples: initial samples

  • n_steps: number of steps

  • n_walkers: number of walkers

  • n_burn: length of the burn-in

  • moves: move proposal to use, e.g. taken from emcee

  • multiprocessing: whether to use multiprocessing

  • verbose: verbosity
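
Below is a minimal sketch of a fully specified mcmc_params dictionary; the value choices and the assumed shape of init_samples are illustrative only and should be checked against the class documentation.

>>> import numpy as np
>>> import emcee
>>> mcmc_params = {
>>>     'init_samples': np.random.uniform(-5, 5, (20, 1)),  # assumed shape: (n_walkers, n_parameters)
>>>     'n_steps': 1000,                      # number of steps per walker
>>>     'n_walkers': 20,                      # number of parallel chains
>>>     'n_burn': 200,                        # steps discarded as burn-in
>>>     'moves': emcee.moves.StretchMove(),   # move proposal taken from emcee
>>>     'multiprocessing': False,             # evaluate walkers in serial
>>>     'verbose': False                      # suppress additional output
>>>     }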

The observation should be set as Model.observations in the Engine, and an estimate of its uncertainty can be provided as a bayesvalidrox.bayes_inference.discrepancy.Discrepancy object.

Example

For this example we need to add the following imports.

>>> import pandas as pd
>>> from bayesvalidrox import Discrepancy, BayesInference

In order to run Bayesian inference we first need to provide an observation. For this example we take an evaluation of the model on some chosen sample and add the resulting values as Model.observations. Since Model.observations expects a 1D array for each output key, we need to adjust the format slightly.

>>> true_sample = [[2]]
>>> observation = Model.run_model_parallel(true_sample)
>>> Model.observations = {}
>>> for key in observation:
>>>     if key == 'x_values':
>>>         Model.observations[key]=observation[key]
>>>     else:
>>>         Model.observations[key]=observation[key][0]
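
To verify that each output key now maps to a 1D array, the shapes can be inspected; a quick check, assuming numpy is available:

>>> import numpy as np
>>> for key in Model.observations:
>>>     print(key, np.shape(Model.observations[key]))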

Next we define the uncertainty on the observation with the class bayesvalidrox.bayes_inference.discrepancy.Discrepancy. For this example we set the uncertainty to be zero-mean Gaussian and dependent on the values in the observation, i.e. larger values have a larger uncertainty associated with them. The parameters contain the variance for each point in the observation.

Warning

For models with only a single uncertain input parameter, numerical issues can appear when the discrepancy depends only on the observed data. To resolve this, a small constant can be added to the variance of the discrepancy, as is done in the example below.

>>> obsData = pd.DataFrame(Model.observations, columns=Model.Output.names)
>>> DiscrepancyOpts = Discrepancy('')
>>> DiscrepancyOpts.type = 'Gaussian'
>>> DiscrepancyOpts.parameters = obsData**2+0.01
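
If a constant (homoscedastic) uncertainty is preferred instead, the same DataFrame layout can be filled with a fixed variance; a minimal sketch, with the value 0.05 chosen arbitrarily:

>>> DiscrepancyOpts.parameters = obsData*0 + 0.05  # same variance for every point in the observation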

Now we can initialize an object of the class bayesvalidrox.bayes_inference.bayes_inference.BayesInference with all the desired properties. This object has to be given our Engine. If the surrogate should be used during inference, set emulator to True; otherwise the model is evaluated directly. We also set the defined Discrepancy and enable plot_post_pred so that the posterior predictions are visualized.

>>> BayesObj = BayesInference(Engine_)
>>> BayesObj.emulator = True
>>> BayesObj.Discrepancy = DiscrepancyOpts
>>> BayesObj.plot_post_pred = True

In order to run rejection sampling, we set the inference_method accordingly and configure the bootstrap.

>>> BayesObj.inference_method = 'rejection'
>>> BayesObj.bootstrap = True
>>> BayesObj.n_bootstrap_itrs = 500
>>> BayesObj.bootstrap_noise = 2

If the sampling should instead be done with MCMC, this is set as the inference_method and additional properties are given in mcmc_params. For this example we use the Python package emcee to define the MCMC moves.

>>> import emcee
>>> BayesObj.inference_method = 'MCMC'
>>> BayesObj.mcmc_params = {
>>>     'n_steps': 1e4,
>>>     'n_walkers': 30,
>>>     'moves': emcee.moves.KDEMove(),
>>>     'multiprocessing': False,
>>>     'verbose': False
>>>     }

Then we run the inference.

>>> BayesObj.create_inference()
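
If a different output location is wanted, BayesObj.out_dir can be set before calling create_inference; a minimal sketch with an arbitrary folder name:

>>> BayesObj.out_dir = 'Outputs_Bayes_custom'  # hypothetical folder name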

If the output directory BayesObj.out_dir is not set otherwise, the outputs are written to the folder Outputs_Bayes_model_Calib. This folder includes the posterior distribution of the input parameters, as well as the predictions resulting from the mean of the posterior. For inference with MCMC, chain diagnostics are also printed to the console.

---------------Posterior diagnostics---------------
Mean auto-correlation time: 2.057
Thin: 1
Burn-in: 4
Flat chain shape: (13380, 1)
Mean acceptance fraction*: 0.752
Gelman-Rubin Test**:  [1.001]

* This value must lay between 0.234 and 0.5.
** These values must be smaller than 1.1.
--------------------------------------------------