bayesvalidrox.bayes_inference.bayes_inference.BayesInference

class bayesvalidrox.bayes_inference.bayes_inference.BayesInference(engine, discrepancy=None, use_emulator=True, name='Calib', out_names=None, selected_indices=None, prior_samples=None, n_prior_samples=100000, measured_data=None, inference_method='rejection', mcmc_params=None, bootstrap_method='normal', n_bootstrap_itrs=1, perturbed_data: list | None = None, bootstrap_noise=0.05, valid_metrics=None, plot=True, max_a_posteriori='mean', out_dir='')

Bases: object

A class to perform Bayesian Analysis.

Attributes

engine : obj

A (trained) bvr.Engine object.

discrepancy : obj

The discrepancy object for the sigma2s, i.e. the diagonal entries of the variance matrix for a multivariate normal likelihood.

name : str, optional

The type of analysis, either calibration (calib) or validation (valid). This is used to decide which model observations are used in the analysis. The default is ‘Calib’.

use_emulator : bool

Set to True if the emulator/metamodel should be used in the analysis. If False, the model is run.

out_names : list, optional

The list of requested output keys to be used for the analysis. The default is None. If None, all the defined outputs from the engine are used.

selected_indices : dict, optional

A dictionary with the selected indices of each model output. The default is None. If None, all measurement points are used in the analysis.

prior_samples : array of shape (n_samples, n_params), optional

The samples to be used in the analysis. The default is None. If None, the samples are drawn from the probabilistic input parameter object of the MetaModel object.

n_prior_samples : int, optional

Number of samples to be used in the analysis. The default is 100000. If prior_samples is not None, this argument is set to the number of samples given.

measured_data : dict, optional

A dictionary containing the observation data. The default is None. If None, the observation defined in the Model object of the MetaModel is used.

inference_method : str, optional

The method for approximating the posterior distribution in the Bayesian inference step. The default is ‘rejection’, which stands for rejection sampling. A Markov Chain Monte Carlo sampler is selected by passing ‘MCMC’.
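To make the ‘rejection’ option concrete, the following is a minimal sketch of textbook rejection sampling on prior samples. It is not taken from the bayesvalidrox source; the helper name rejection_sample and the toy prior and likelihood are assumptions made for illustration only.

```python
import numpy as np


def rejection_sample(prior_samples, log_likelihoods, rng=None):
    """Keep each prior sample with probability L_i / max(L) (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    log_likelihoods = np.asarray(log_likelihoods, dtype=float)
    # Work with log-ratios so that very small likelihoods do not underflow.
    log_ratio = log_likelihoods - np.max(log_likelihoods)
    # One uniform draw per sample; accept when log(u) < log(L_i / L_max).
    log_u = np.log(rng.uniform(size=log_likelihoods.shape[0]))
    accepted = log_u < log_ratio
    return np.asarray(prior_samples)[accepted]


# Toy setup (assumed): uniform prior on [-3, 3], one observation y = 0.5, sigma = 0.3.
rng = np.random.default_rng(0)
prior = rng.uniform(-3.0, 3.0, size=(20000, 1))
loglik = -0.5 * ((0.5 - prior[:, 0]) / 0.3) ** 2
posterior = rejection_sample(prior, loglik, rng=rng)
```

The accepted rows approximate the posterior. The acceptance rate drops quickly when the likelihood is sharply peaked relative to the prior, which is one reason to raise n_prior_samples or switch to ‘MCMC’.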

mcmc_params : dict, optional

A dictionary with args required for the Bayesian inference with MCMC. The default is None.

Pass the mcmc_params like the following:

>>> mcmc_params = {
...     'init_samples': None,  # initial samples
...     'n_walkers': 100,  # number of walkers (chains)
...     'n_steps': 100000,  # maximum number of steps
...     'n_burn': 200,  # number of burn-in steps
...     'moves': None,  # moves for the emcee sampler
...     'multiprocessing': False,  # whether to use multiprocessing
...     'verbose': False,  # verbosity
... }

The items shown above are the default values. If any parameter is not defined, its default value is assigned.
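The fallback to default values can be pictured as a dictionary merge. The sketch below is an illustration only: DEFAULT_MCMC_PARAMS and resolve_mcmc_params are hypothetical names, and the class may resolve missing keys differently internally.

```python
# Defaults as listed in the documentation above.
DEFAULT_MCMC_PARAMS = {
    "init_samples": None,
    "n_walkers": 100,
    "n_steps": 100000,
    "n_burn": 200,
    "moves": None,
    "multiprocessing": False,
    "verbose": False,
}


def resolve_mcmc_params(user_params=None):
    """Return a new dict in which user-supplied values override the defaults."""
    params = dict(DEFAULT_MCMC_PARAMS)  # copy, so the defaults stay untouched
    if user_params:
        params.update(user_params)
    return params


# Only two settings are overridden; everything else keeps its default.
params = resolve_mcmc_params({"n_walkers": 20, "n_steps": 5000})
```

With this pattern a user only needs to pass the keys that differ from the defaults, for example mcmc_params={'n_walkers': 20}.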

bootstrap_method : string, optional

Method of bootstrapping. If ‘normal’, common bootstrapping is used; if ‘none’, no bootstrap is applied. If set to ‘loocv’, the leave-one-out cross-validation (LOOCV) procedure is used to estimate the Bayesian Model Evidence (BME). The default is ‘normal’.

n_bootstrap_itrs : int, optional

Number of bootstrap iterations. The default is 1. If bootstrap_method is ‘loocv’, this is set to the total length of the observation data set.

perturbed_data : array of shape (n_bootstrap_itrs, n_obs), optional

User-defined perturbed data. The default is None.

bootstrap_noise : float, optional

A noise level to perturb the data set. The default is 0.05.

valid_metrics : list, optional

List of the validation metrics. The following metrics are supported:

1. KLD : Kullback-Leibler Divergence

2. inf_entropy : Information entropy

The default is None.

plot : bool, optional

Toggles evaluation plots including posterior predictive plots and plots of the model outputs vs the metamodel predictions for the maximum a posteriori (defined as max_a_posteriori) parameter set. The default is True.

max_a_posteriori : str, optional

The posterior statistic used as the maximum a posteriori parameter set. ‘mean’ and ‘mode’ are available. The default is ‘mean’.

out_dir : str, optional

The output directory that any generated plots are saved in. The default ‘’ leads to the folder “Outputs_Bayes_{self.engine.model.name}_{self.name}”.
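A typical calibration call could look like the sketch below. It uses only the constructor arguments and the run_inference() method documented on this page; the wrapper run_calibration is a hypothetical helper, and how the engine and discrepancy objects are built is outside the scope of this page. Whether setup() has to be called explicitly before run_inference() is not stated here, so the sketch does not call it.

```python
def run_calibration(engine, discrepancy, measured_data=None):
    """Hypothetical wrapper: run a rejection-sampling calibration.

    `engine` is assumed to be a trained bvr.Engine and `discrepancy`
    the discrepancy object described in the attribute list.
    """
    # Imported lazily so the sketch can be defined without the package installed.
    from bayesvalidrox.bayes_inference.bayes_inference import BayesInference

    bayes = BayesInference(
        engine,
        discrepancy=discrepancy,
        use_emulator=True,             # evaluate the metamodel, not the model
        name="Calib",
        measured_data=measured_data,   # None: use the observations of the model
        inference_method="rejection",
        n_prior_samples=100000,
        bootstrap_method="normal",
        n_bootstrap_itrs=1,
        plot=False,                    # skip the evaluation plots
        max_a_posteriori="mean",
    )
    # According to this page, run_inference() returns the posterior samples
    # as a pandas DataFrame.
    return bayes.run_inference()
```

Setting plot=False keeps the run free of file output from plotting; with the default plot=True the plots are written to out_dir.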

__init__(engine, discrepancy=None, use_emulator=True, name='Calib', out_names=None, selected_indices=None, prior_samples=None, n_prior_samples=100000, measured_data=None, inference_method='rejection', mcmc_params=None, bootstrap_method='normal', n_bootstrap_itrs=1, perturbed_data: list | None = None, bootstrap_noise=0.05, valid_metrics=None, plot=True, max_a_posteriori='mean', out_dir='')

Methods

__init__(engine[, discrepancy, ...])

calculate_loglik_logbme(model_evals, surr_error)

Calculate log-likelihoods and logbme on the perturbed data.

calculate_valid_metrics(log_likelihoods, log_bme)

Calculate KLD and information entropy if noted in self.valid_metrics.

get_surr_error()

Get rmse of the surrogate from the engine.

perturb_data(data, output_names)

Returns an array with n_bootstrap_itrs rows of perturbed data.

plot_logbme()

Plots the log_BME if bootstrap is active.

plot_max_a_posteriori()

Plots the response of the model output against that of the metamodel at the maximum a posteriori sample (mean or mode of the posterior).

plot_post_params([corner_title_fmt])

Plots the multivariate posterior parameter distribution.

plot_post_predictive()

Plots the posterior predictives against the observation data.

posterior_predictive([save])

Evaluates the engine on the prior- and posterior predictive samples, and stores the results as hdf5 files. priorPredictive.hdf5 : Prior predictive samples. postPredictive_wo_noise.hdf5 : Posterior predictive samples without the additive noise. postPredictive.hdf5 : Posterior predictive samples with the additive noise.

run_inference()

Performs Bayesian inference on the given setup.

run_validation()

Validate a model on the given samples by calculating the loglikelihood and BME.

setup()

This function sets up the inference by checking the inputs and getting needed data.

write_as_hdf5(name, data, x_values)

Write given values to an hdf5 file.

calculate_loglik_logbme(model_evals, surr_error) → tuple[ndarray, ndarray]

Calculate log-likelihoods and logbme on the perturbed data.

Parameters

model_evals : dict

Model or metamodel outputs as a dictionary.

surr_error : dict

Estimate of the surrogate error, given as the root mean square error.

Returns

log_likelihood : np.ndarray

The calculated loglikelihoods. Size: (n_samples, n_bootstrap_itr).

log_bme : np.ndarray

The log bme. This also accounts for metamodel error, if self.use_emulator is True. Size: (1,n_bootstrap_itr).
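The array shapes above can be illustrated with a small NumPy sketch of a Gaussian log-likelihood with a diagonal variance matrix and the corresponding log-BME, taken here as the log of the likelihood averaged over the prior samples. The function names, the toy model and the numbers are assumptions, and the sketch does not include the metamodel-error correction mentioned above.

```python
import numpy as np


def gaussian_loglik(model_evals, observations, sigma2):
    """Log-likelihood of each data row for each sample.

    model_evals: (n_samples, n_obs), observations: (n_bootstrap_itr, n_obs),
    sigma2: scalar or (n_obs,) diagonal variances.
    Returns an array of shape (n_samples, n_bootstrap_itr).
    """
    model_evals = np.asarray(model_evals, dtype=float)
    observations = np.atleast_2d(np.asarray(observations, dtype=float))
    n_obs = model_evals.shape[1]
    sigma2 = np.broadcast_to(np.asarray(sigma2, dtype=float), (n_obs,))
    # Residuals for every (sample, data row) pair.
    resid = observations[None, :, :] - model_evals[:, None, :]
    quad = np.sum(resid**2 / sigma2, axis=-1)
    log_norm = np.sum(np.log(2.0 * np.pi * sigma2))
    return -0.5 * (quad + log_norm)


def log_bme(log_likelihoods):
    """Log of the mean likelihood over the samples, shape (1, n_bootstrap_itr)."""
    m = np.max(log_likelihoods, axis=0, keepdims=True)  # log-sum-exp shift
    return m + np.log(np.mean(np.exp(log_likelihoods - m), axis=0, keepdims=True))


# Toy setup (assumed): y = theta * x, 500 prior samples, two data rows.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0, size=500)
x = np.array([1.0, 2.0, 3.0])
evals = theta[:, None] * x[None, :]
obs = np.array([[1.0, 2.0, 3.0], [1.1, 1.9, 3.2]])
ll = gaussian_loglik(evals, obs, sigma2=0.25)
lb = log_bme(ll)
```

Subtracting the column-wise maximum before exponentiating keeps the average finite even when individual log-likelihoods are strongly negative.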

calculate_valid_metrics(log_likelihoods, log_bme) → tuple[ndarray, ndarray]

Calculate KLD and information entropy if noted in self.valid_metrics.

Parameters

log_likelihoods : np.array

Calculated loglikelihoods. Size: (n_samples, n_bootstrap_itr).

log_bme : np.array

Calculated log bme. This should account for the metamodel error if self.use_emulator is True. Size: (1,n_bootstrap_itr).

Raises

AttributeError

Returns

kld : np.ndarray

Calculated KLD, size: (1,n_bootstrap_itr).

inf_entropy : np.ndarray

Calculated information entropy, size: (1,n_bootstrap_itr).
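For orientation, both metrics can be written in terms of posterior expectations: KLD = E_post[log L] - log BME, and information entropy = log BME - E_post[log prior] - E_post[log L]. The sketch below estimates these for a single data set by importance-weighting prior samples with their likelihoods. This is a generic estimator under stated assumptions, not necessarily the one used by the method; kld_and_entropy and the toy numbers are hypothetical.

```python
import numpy as np


def kld_and_entropy(log_likelihoods, log_prior):
    """Estimate KLD, information entropy and log-BME from prior samples.

    log_likelihoods, log_prior: arrays of shape (n_samples,), both
    evaluated at the same prior samples.
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    lp = np.asarray(log_prior, dtype=float)
    m = ll.max()
    w = np.exp(ll - m)                 # shifted likelihoods, avoids underflow
    log_bme = m + np.log(w.mean())     # log of the prior-averaged likelihood
    w = w / w.sum()                    # self-normalised posterior weights
    e_ll = np.sum(w * ll)              # E_post[log L]
    e_lp = np.sum(w * lp)              # E_post[log prior]
    kld = e_ll - log_bme
    inf_entropy = log_bme - e_lp - e_ll
    return kld, inf_entropy, log_bme


# Toy setup (assumed): uniform prior on [-3, 3], Gaussian likelihood with
# mean 0.5 and standard deviation 0.3.
rng = np.random.default_rng(2)
theta = rng.uniform(-3.0, 3.0, size=20000)
log_prior = np.full(theta.shape, -np.log(6.0))
loglik = -0.5 * ((0.5 - theta) / 0.3) ** 2 - 0.5 * np.log(2.0 * np.pi * 0.3**2)
kld, inf_entropy, log_bme_val = kld_and_entropy(loglik, log_prior)
```

In this toy case the posterior is close to a normal distribution with standard deviation 0.3, so the estimates can be checked against the closed-form entropy of a Gaussian.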

get_surr_error() → dict

Get rmse of the surrogate from the engine.

Returns

surr_error : dict

RMSE of metamodel if available. Otherwise returns None.

perturb_data(data, output_names) → dict

Returns, for each requested output, an array with n_bootstrap_itrs rows of perturbed data. The first row includes the original observation data.

If self.bootstrap_method is ‘loocv’, a 2d-array will be returned with repeated rows and zero diagonal entries.

Parameters

data : pandas DataFrame

Observation data.

output_names : list

The output names.

Raises

AttributeError

Returns

final_data : dict

Perturbed data set for each key in output_names. Shape of the np.ndarray for each key: (n_bootstrap_itrs, number of x-values in the measurement data).
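The two layouts described above can be sketched for a single output as follows. The helper names are hypothetical, and the noise model (zero-mean Gaussian noise with standard deviation bootstrap_noise times the absolute observation) is an assumption; the exact perturbation used by the method is not specified on this page.

```python
import numpy as np


def perturb_normal(obs, n_bootstrap_itrs, bootstrap_noise=0.05, rng=None):
    """Row 0 is the original data; the remaining rows are noisy copies."""
    rng = np.random.default_rng() if rng is None else rng
    obs = np.asarray(obs, dtype=float)
    data = np.tile(obs, (n_bootstrap_itrs, 1))
    # Assumed noise model: standard deviation proportional to |obs|.
    noise_std = bootstrap_noise * np.abs(obs)
    data[1:] += rng.normal(0.0, 1.0, size=data[1:].shape) * noise_std
    return data


def perturb_loocv(obs):
    """Repeated rows with a zero diagonal: row i leaves out observation i."""
    obs = np.asarray(obs, dtype=float)
    data = np.tile(obs, (obs.size, 1))
    np.fill_diagonal(data, 0.0)
    return data


obs = np.array([1.0, 2.0, 4.0, 8.0])
normal_data = perturb_normal(obs, n_bootstrap_itrs=5, rng=np.random.default_rng(3))
loocv_data = perturb_loocv(obs)
```

In the ‘loocv’ layout the number of rows equals the number of observations, which matches the note that n_bootstrap_itrs is set to the length of the observation data set in that case.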

plot_logbme()

Plots the log_BME if bootstrap is active.

plot_max_a_posteriori()

Plots the response of the model output against that of the metamodel at the maximum a posteriori sample (mean or mode of the posterior).

plot_post_params(corner_title_fmt='.2e')

Plots the multivariate posterior parameter distribution.

Parameters

corner_title_fmt : str, optional

Title format for the posterior distribution plot created with the Python package corner. The default is ‘.2e’.

plot_post_predictive()

Plots the posterior predictives against the observation data.

posterior_predictive(save=False)

Evaluates the engine on the prior- and posterior predictive samples, and stores the results as hdf5 files.

priorPredictive.hdf5 : Prior predictive samples.

postPredictive_wo_noise.hdf5 : Posterior predictive samples without the additive noise.

postPredictive.hdf5 : Posterior predictive samples with the additive noise.

Parameters

save : bool, optional

Toggles storing the posterior predictives as hdf5 files. The default is False.

run_inference() → DataFrame

Performs Bayesian inference on the given setup.

Returns

posterior_df : pd.DataFrame

The generated posterior samples.

run_validation() → ndarray

Validate a model on the given samples by calculating the log-likelihood and BME. The data used in the calculation can be perturbed, e.g. with the ‘loocv’ bootstrap method.

Returns

log_bme : np.ndarray

The log-BME calculated on perturbed reference data.

setup()

This function sets up the inference by checking the inputs and getting needed data.

write_as_hdf5(name, data, x_values)

Write given values to an hdf5 file.

Parameters

name : string

Filename to write to.

data : dict

Data to write out. This is expected to be model or metamodel runs.

x_values : list

The x_values that correspond to the written data.