bayesvalidrox.bayes_inference.mcmc.MCMC

class bayesvalidrox.bayes_inference.mcmc.MCMC(engine, mcmc_params, discrepancy, observation=None, out_names=None, selected_indices=None, use_emulator=False, out_dir='')

Bases: PostSampler

A class for bayesian inference via a Markov-Chain Monte-Carlo (MCMC) Sampler to approximate the posterior distribution of the Bayes theorem: $$p(theta|mathcal{y}) = frac{p(mathcal{y}|theta) p(theta)}

{p(mathcal{y})}.$$

This class make inference with emcee package [1] using an Affine Invariant Ensemble sampler (AIES) [2].

[1] Foreman-Mackey, D., Hogg, D.W., Lang, D. and Goodman, J., 2013.emcee:

the MCMC hammer. Publications of the Astronomical Society of the Pacific, 125(925), p.306. https://emcee.readthedocs.io/en/stable/

[2] Goodman, J. and Weare, J., 2010. Ensemble samplers with affine

invariance. Communications in applied mathematics and computational science, 5(1), pp.65-80.

Attributes

enginebvr.Engine

Engine object that contains the surrogate, model and exp_design.

mcmc_paramsdict

Dictionary of parameters for the mcmc. Required are - prior_samples: np.array of size [Nsamples, ndim]

With samples from the parameters to infer. If given, the walkers will be initialized with values sampled equally spaced based on the boundaries of the samples given here. No burnin will be done. Default is None - in which case the walkers are initialized randomly, with a burn in period.

  • n_steps: int

    Number of steps/samples to generate for each walker

  • n_walkers: int

    Number of walkers/independent chains in the ensemble

  • n_burn: int

    Number of samples to consider in the burnin period

  • moves: Obj

    sampling strategy which determines how new points are proposed. Must be a valid emcee move object. The following options are available (see the EMCEE website for more details https://emcee.readthedocs.io/en/stable/user/moves/):

    • emcee.moves.KDEMove()

    • emcee.moves.DEMove()

    • emcee.moves.StretchMove()

    • emcee.moves.DESnookerMove()

    • emcee.moves.WalkMove()

    • None - default value. If None is given, then EMCEE uses the StretchMove() by default

  • multiplrocessing: bool

    True to parallelize the different walkers. Default is False

  • verbose: bool

discrepancyobject, optional

Object of class bvr.Discrepancy. The default is None.

observationdict, optional

Measurement/observation to use as reference. The default is None.

out_nameslist, optional

The list of requested output keys to be used for the analysis. The default is None. If None, all the defined outputs from the engine are used.

selected_indicesdict, optional

A dictionary with the selected indices of each model output. The default is None. If None, all measurement points are used in the analysis.

use_emulatorbool

Set to True if the emulator/metamodel should be used in the analysis. If False, the model is run.

out_dirstring

Directory to write the outputs to.

__init__(engine, mcmc_params, discrepancy, observation=None, out_names=None, selected_indices=None, use_emulator=False, out_dir='')

Methods

__init__(engine, mcmc_params, discrepancy[, ...])

calculate_loglik_logbme(model_evals[, ...])

Calculate log-likelihoods and logbme on the perturbed data.

eval_model(theta)

Evaluates the (meta-) model at the given theta.

is_valid_move()

Checks to see if user-provided Move is a valid EMCEE move.

log_likelihood(theta)

Computes likelihood ( p(mathcal{Y}|theta)) of the performance of the (meta-)model in reproducing the observation data.

log_posterior(theta)

Computes the posterior likelihood (p(theta| mathcal{Y})) for the given parameterset.

log_prior(theta)

Calculates the log prior likelihood ( p(theta)) for the given parameter set(s) ( theta ).

normpdf(outputs[, std_outputs, rmse])

Calculates the likelihood of simulation outputs compared with observation data.

run_sampler()

Run the MCMC sampler for the given observations and stdevs.

calculate_loglik_logbme(model_evals, surr_error=None, std_outputs=None) tuple[ndarray, ndarray]

Calculate log-likelihoods and logbme on the perturbed data. This function assumes everything as Gaussian.

Parameters

model_evalsdict

Model or metamodel outputs as a dictionary.

surr_errordict, optional

A dictionary containing the root mean squared error as array of shape (n_samples, n_measurement) for each model output. The default is None.

std_outputsdict of 2d np arrays, optional

Standard deviation (uncertainty) associated to the output. The default is None.

Returns

log_likelihoodnp.ndarray

The calculated loglikelihoods. Size: (n_samples, n_bootstrap_itr).

log_bmenp.ndarray

The log bme. This also accounts for metamodel error, if self.use_emulator is True. Size: (1,n_bootstrap_itr).

eval_model(theta) tuple[dict, dict]

Evaluates the (meta-) model at the given theta.

Parameters

thetaarray of shape (n_samples, n_params)

Parameter set, i.e. proposals of the MCMC chains.

Returns

mean_preddict

Mean model prediction.

std_preddict

Std of model prediction.

is_valid_move() bool

Checks to see if user-provided Move is a valid EMCEE move.

Raises

ValueError

Returns

bool

True if valid move, raises an error if not a valid move.

log_likelihood(theta) ndarray

Computes likelihood ( p(mathcal{Y}|theta)) of the performance of the (meta-)model in reproducing the observation data.

Parameters

thetaarray of shape (n_samples, n_params)

Parameter set, i.e. proposals of the MCMC chains.

Returns

log_likenp.ndarray

Log likelihood. Shape: (n_samples)

log_posterior(theta) ndarray

Computes the posterior likelihood (p(theta| mathcal{Y})) for the given parameterset.

Parameters

thetaarray of shape (n_samples, n_params)

Parameter set, i.e. proposals of the MCMC chains.

Returns

log_likenp.ndarray

Log posterior likelihood. Shape: (n_samples)

log_prior(theta) ndarray

Calculates the log prior likelihood ( p(theta)) for the given parameter set(s) ( theta ).

Parameters

thetaarray of shape (n_samples, n_params)

Parameter sets, i.e. proposals of MCMC chains.

Returns

logprior: np.ndarray

Log prior likelihood. If theta has only one row, a single value is returned otherwise an array of shape (n_samples) is returned.

normpdf(outputs, std_outputs=None, rmse=None) ndarray

Calculates the likelihood of simulation outputs compared with observation data.

Parameters

outputsdict

The metamodel outputs as an array of shape (n_samples, n_measurement) for each model output.

std_outputsdict of 2d np arrays, optional

Standard deviation (uncertainty) associated to the output. The default is None.

rmsedict, optional

A dictionary containing the root mean squared error as array of shape (n_samples, n_measurement) for each model output. The default is None.

Returns

logLiknp.ndarray

Log-likelihoods. Shape: (n_samples)

run_sampler() DataFrame

Run the MCMC sampler for the given observations and stdevs.

Returns

Posterior_dfpd.DataFrame

Posterior samples of the input parameters.