bayesvalidrox.bayes_inference.mcmc.MCMC¶

class bayesvalidrox.bayes_inference.mcmc.MCMC(engine, mcmc_params, discrepancy, observation=None, out_names=None, selected_indices=None, use_emulator=False, out_dir='')¶

Bases: PostSampler

A class for bayesian inference via a Markov-Chain Monte-Carlo (MCMC) Sampler to approximate the posterior distribution of the Bayes theorem: $$p(theta|mathcal{y}) = frac{p(mathcal{y}|theta) p(theta)}

{p(mathcal{y})}.$$

This class make inference with emcee package [1] using an Affine Invariant Ensemble sampler (AIES) [2].

[1] Foreman-Mackey, D., Hogg, D.W., Lang, D. and Goodman, J., 2013.emcee:: the MCMC hammer. Publications of the Astronomical Society of the Pacific, 125(925), p.306. https://emcee.readthedocs.io/en/stable/
[2] Goodman, J. and Weare, J., 2010. Ensemble samplers with affine: invariance. Communications in applied mathematics and computational science, 5(1), pp.65-80.

Attributes¶

enginebvr.Engine

Engine object that contains the surrogate, model and exp_design.

mcmc_paramsdict

Dictionary of parameters for the mcmc. Required are - prior_samples: np.array of size [Nsamples, ndim]

With samples from the parameters to infer. If given, the walkers will be initialized with values sampled equally spaced based on the boundaries of the samples given here. No burnin will be done. Default is None - in which case the walkers are initialized randomly, with a burn in period.

n_steps: int
Number of steps/samples to generate for each walker
n_walkers: int
Number of walkers/independent chains in the ensemble
n_burn: int
Number of samples to consider in the burnin period
moves: Obj
sampling strategy which determines how new points are proposed. Must be a valid emcee move object. The following options are available (see the EMCEE website for more details https://emcee.readthedocs.io/en/stable/user/moves/):
emcee.moves.KDEMove()

emcee.moves.DEMove()

emcee.moves.StretchMove()

emcee.moves.DESnookerMove()

emcee.moves.WalkMove()

None - default value. If None is given, then EMCEE uses the StretchMove() by default
multiplrocessing: bool
True to parallelize the different walkers. Default is False
verbose: bool

discrepancyobject, optional

Object of class bvr.Discrepancy. The default is None.

observationdict, optional

Measurement/observation to use as reference. The default is None.

out_nameslist, optional

The list of requested output keys to be used for the analysis. The default is None. If None, all the defined outputs from the engine are used.

selected_indicesdict, optional

A dictionary with the selected indices of each model output. The default is None. If None, all measurement points are used in the analysis.

use_emulatorbool

Set to True if the emulator/metamodel should be used in the analysis. If False, the model is run.

out_dirstring

Directory to write the outputs to.

__init__(engine, mcmc_params, discrepancy, observation=None, out_names=None, selected_indices=None, use_emulator=False, out_dir='')¶

Methods

`__init__`(engine, mcmc_params, discrepancy[, ...])
`calculate_loglik_logbme`(model_evals[, ...])	Calculate log-likelihoods and logbme on the perturbed data.
`eval_model`(theta)	Evaluates the (meta-) model at the given theta.
`is_valid_move`()	Checks to see if user-provided Move is a valid EMCEE move.
`log_likelihood`(theta)	Computes likelihood ( p(mathcal{Y}\|theta)) of the performance of the (meta-)model in reproducing the observation data.
`log_posterior`(theta)	Computes the posterior likelihood (p(theta\| mathcal{Y})) for the given parameterset.
`log_prior`(theta)	Calculates the log prior likelihood ( p(theta)) for the given parameter set(s) ( theta ).
`normpdf`(outputs[, std_outputs, rmse])	Calculates the likelihood of simulation outputs compared with observation data.
`run_sampler`()	Run the MCMC sampler for the given observations and stdevs.

calculate_loglik_logbme(model_evals, surr_error=None, std_outputs=None) → tuple[ndarray, ndarray]¶

Calculate log-likelihoods and logbme on the perturbed data. This function assumes everything as Gaussian.

Parameters¶

model_evalsdict: Model or metamodel outputs as a dictionary.
surr_errordict, optional: A dictionary containing the root mean squared error as array of shape (n_samples, n_measurement) for each model output. The default is None.
std_outputsdict of 2d np arrays, optional: Standard deviation (uncertainty) associated to the output. The default is None.

Returns¶

log_likelihoodnp.ndarray: The calculated loglikelihoods. Size: (n_samples, n_bootstrap_itr).
log_bmenp.ndarray: The log bme. This also accounts for metamodel error, if self.use_emulator is True. Size: (1,n_bootstrap_itr).

eval_model(theta) → tuple[dict, dict]¶

Evaluates the (meta-) model at the given theta.

Parameters¶

thetaarray of shape (n_samples, n_params): Parameter set, i.e. proposals of the MCMC chains.

Returns¶

mean_preddict: Mean model prediction.
std_preddict: Std of model prediction.

is_valid_move() → bool¶

Checks to see if user-provided Move is a valid EMCEE move.

Raises¶

ValueError

Returns¶

bool: True if valid move, raises an error if not a valid move.

log_likelihood(theta) → ndarray¶

Computes likelihood ( p(mathcal{Y}|theta)) of the performance of the (meta-)model in reproducing the observation data.

Parameters¶

thetaarray of shape (n_samples, n_params): Parameter set, i.e. proposals of the MCMC chains.

Returns¶

log_likenp.ndarray: Log likelihood. Shape: (n_samples)

log_posterior(theta) → ndarray¶

Computes the posterior likelihood (p(theta| mathcal{Y})) for the given parameterset.

Parameters¶

thetaarray of shape (n_samples, n_params): Parameter set, i.e. proposals of the MCMC chains.

Returns¶

log_likenp.ndarray: Log posterior likelihood. Shape: (n_samples)

log_prior(theta) → ndarray¶

Calculates the log prior likelihood ( p(theta)) for the given parameter set(s) ( theta ).

Parameters¶

thetaarray of shape (n_samples, n_params): Parameter sets, i.e. proposals of MCMC chains.

Returns¶

logprior: np.ndarray: Log prior likelihood. If theta has only one row, a single value is returned otherwise an array of shape (n_samples) is returned.

normpdf(outputs, std_outputs=None, rmse=None) → ndarray¶

Calculates the likelihood of simulation outputs compared with observation data.

Parameters¶

outputsdict: The metamodel outputs as an array of shape (n_samples, n_measurement) for each model output.
std_outputsdict of 2d np arrays, optional: Standard deviation (uncertainty) associated to the output. The default is None.
rmsedict, optional: A dictionary containing the root mean squared error as array of shape (n_samples, n_measurement) for each model output. The default is None.

Returns¶

logLiknp.ndarray: Log-likelihoods. Shape: (n_samples)

run_sampler() → DataFrame¶

Run the MCMC sampler for the given observations and stdevs.

Returns¶

Posterior_dfpd.DataFrame: Posterior samples of the input parameters.