bayesvalidrox.bayes_inference.bayes_inference.BayesInference
- class bayesvalidrox.bayes_inference.bayes_inference.BayesInference(engine, discrepancy=None, use_emulator=True, name='Calib', out_names=None, selected_indices=None, prior_samples=None, n_prior_samples=100000, measured_data=None, inference_method='rejection', mcmc_params=None, bootstrap_method='normal', n_bootstrap_itrs=1, perturbed_data: list | None = None, bootstrap_noise=0.05, valid_metrics=None, plot=True, max_a_posteriori='mean', out_dir='')
Bases: object
A class to perform Bayesian Analysis.
Attributes
- engine : obj
A (trained) bvr.Engine object.
- discrepancy : obj
The discrepancy object for the sigma2s, i.e. the diagonal entries of the variance matrix for a multivariate normal likelihood.
- name : str, optional
The type of analysis, either calibration (calib) or validation (valid). This is used to decide which model observations are used in the analysis. The default is 'Calib'.
- use_emulator : bool
Set to True if the emulator/metamodel should be used in the analysis. If False, the model is run.
- out_names : list, optional
The list of requested output keys to be used for the analysis. The default is None. If None, all the defined outputs from the engine are used.
- selected_indices : dict, optional
A dictionary with the selected indices of each model output. The default is None. If None, all measurement points are used in the analysis.
- prior_samples : array of shape (n_samples, n_params), optional
The samples to be used in the analysis. The default is None. If None, the samples are drawn from the probabilistic input parameter object of the MetaModel object.
- n_prior_samples : int, optional
Number of samples to be used in the analysis. The default is 100000. If prior_samples is not None, this argument is set to the number of samples given.
- measured_data : dict, optional
A dictionary containing the observation data. The default is None. If None, the observation defined in the Model object of the MetaModel is used.
- inference_method : str, optional
A method for approximating the posterior distribution in the Bayesian inference step. The default is 'rejection', which stands for rejection sampling. A Markov Chain Monte Carlo sampler can be selected by passing 'MCMC'.
- mcmc_params : dict, optional
A dictionary with the arguments required for Bayesian inference with MCMC. The default is None.
Pass mcmc_params as follows:
>>> mcmc_params = {
...     'init_samples': None,      # initial samples
...     'n_walkers': 100,          # number of walkers (chains)
...     'n_steps': 100000,         # maximum number of steps
...     'n_burn': 200,             # number of burn-in steps
...     'moves': None,             # moves for the emcee sampler
...     'multiprocessing': False,  # multiprocessing
...     'verbose': False           # verbosity
... }
The items shown above are the default values. If any parameter is not defined, the default value will be assigned to it.
- bootstrap_method : string, optional
Method of bootstrapping. If 'normal', common bootstrapping is used; if 'none', no bootstrap is applied. If set to 'loocv', the leave-one-out cross-validation (LOOCV) procedure is used to estimate the Bayesian model evidence (BME). The default is 'normal'.
- n_bootstrap_itrs : int, optional
Number of bootstrap iterations. The default is 1. If bootstrap_method is 'loocv', this is set to the total length of the observation data set.
- perturbed_data : array of shape (n_bootstrap_itrs, n_obs), optional
User-defined perturbed data. The default is None.
- bootstrap_noise : float, optional
A noise level to perturb the data set. The default is 0.05.
- valid_metrics : list, optional
List of the validation metrics. The following metrics are supported:
1. KLD: Kullback-Leibler divergence
2. inf_entropy: Information entropy
The default is None.
- plot : bool, optional
Toggles evaluation plots including posterior predictive plots and plots of the model outputs vs the metamodel predictions for the maximum a posteriori (defined as max_a_posteriori) parameter set. The default is True.
- max_a_posteriori : str, optional
Statistic of the posterior used as the maximum a posteriori parameter set. 'mean' and 'mode' are available. The default is 'mean'.
- out_dir : str, optional
The output directory that any generated plots are saved in. The default ‘’ leads to the folder “Outputs_Bayes_{self.engine.model.name}_{self.name}”.
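The fallback behaviour described for mcmc_params (missing keys take the documented defaults) can be pictured as a plain dictionary merge. The following is a minimal sketch under that assumption, not the library's implementation; the helper name fill_mcmc_params is hypothetical.

```python
# Default values as listed in the mcmc_params description above.
DEFAULT_MCMC_PARAMS = {
    "init_samples": None,
    "n_walkers": 100,
    "n_steps": 100000,
    "n_burn": 200,
    "moves": None,
    "multiprocessing": False,
    "verbose": False,
}


def fill_mcmc_params(user_params=None):
    """Hypothetical helper: return a complete dict, preferring user-supplied values."""
    merged = dict(DEFAULT_MCMC_PARAMS)  # copy so the defaults are never mutated
    merged.update(user_params or {})    # user keys override the defaults
    return merged


# Only two keys are given; the remaining ones fall back to the defaults.
params = fill_mcmc_params({"n_walkers": 20, "n_steps": 5000})
```

A dictionary built this way could then be passed as the mcmc_params argument together with inference_method='MCMC'.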
- __init__(engine, discrepancy=None, use_emulator=True, name='Calib', out_names=None, selected_indices=None, prior_samples=None, n_prior_samples=100000, measured_data=None, inference_method='rejection', mcmc_params=None, bootstrap_method='normal', n_bootstrap_itrs=1, perturbed_data: list | None = None, bootstrap_noise=0.05, valid_metrics=None, plot=True, max_a_posteriori='mean', out_dir='')
Methods
- __init__(engine[, discrepancy, ...])
- calculate_loglik_logbme(model_evals, surr_error)
Calculate log-likelihoods and logbme on the perturbed data.
- calculate_valid_metrics(log_likelihoods, log_bme)
Calculate KLD and information entropy if noted in self.valid_metrics.
- get_surr_error()
Get rmse of the surrogate from the engine.
- perturb_data(data, output_names)
Returns an array with n_bootstrap_itrs rows of perturbed data.
- plot_logbme()
Plots the log_BME if bootstrap is active.
- plot_max_a_posteriori()
Plots the response of the model output against that of the metamodel at the maximum a posteriori sample (mean or mode of posterior).
- plot_post_params([corner_title_fmt])
Plots the multivariate posterior parameter distribution.
- plot_post_predictive()
Plots the posterior predictives against the observation data.
- posterior_predictive([save])
Evaluates the engine on the prior and posterior predictive samples, and stores the results as hdf5 files (priorPredictive.hdf5, postPredictive_wo_noise.hdf5, postPredictive.hdf5).
- run_inference()
Performs Bayesian inference on the given setup.
- run_validation()
Validate a model on the given samples by calculating the loglikelihood and BME.
- setup()
This function sets up the inference by checking the inputs and getting needed data.
- write_as_hdf5(name, data, x_values)
Write given values to an hdf5 file.
- calculate_loglik_logbme(model_evals, surr_error) -> tuple[ndarray, ndarray]
Calculate log-likelihoods and logbme on the perturbed data.
Parameters
- model_evals : dict
Model or metamodel outputs as a dictionary.
- surr_error : dict
Estimation of surrogate error via root mean square error.
Returns
- log_likelihood : np.ndarray
The calculated loglikelihoods. Size: (n_samples, n_bootstrap_itr).
- log_bme : np.ndarray
The log BME. This also accounts for metamodel error if self.use_emulator is True. Size: (1, n_bootstrap_itr).
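As an illustration of the quantities returned here, the sketch below evaluates a normal log-likelihood with diagonal variances (the sigma2s of the discrepancy object) for a few prior samples and estimates the log BME as the log of the mean likelihood over those samples. It is a simplified, standard-library sketch for a single data set; the function names and toy numbers are assumptions, and the treatment of surrogate error is left out.

```python
import math


def gaussian_loglik(prediction, observation, sigma2):
    """Log-density of independent normal errors with variances sigma2."""
    total = 0.0
    for y_hat, y, s2 in zip(prediction, observation, sigma2):
        total += -0.5 * (math.log(2.0 * math.pi * s2) + (y - y_hat) ** 2 / s2)
    return total


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


# Hypothetical toy setup: model outputs for 3 prior samples at 2 observation points.
predictions = [[1.0, 2.0], [1.1, 1.9], [3.0, 0.0]]
observation = [1.0, 2.0]
sigma2 = [0.04, 0.04]

# One log-likelihood per prior sample.
log_likelihoods = [gaussian_loglik(p, observation, sigma2) for p in predictions]

# BME is the prior expectation of the likelihood; here a Monte Carlo mean.
log_bme = log_mean_exp(log_likelihoods)
```

Subtracting the largest log-likelihood before exponentiating avoids underflow when the likelihoods are very small, which is common with many observation points.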
- calculate_valid_metrics(log_likelihoods, log_bme) -> tuple[ndarray, ndarray]
Calculate KLD and information entropy if noted in self.valid_metrics.
Parameters
- log_likelihoods : np.array
Calculated loglikelihoods. Size: (n_samples, n_bootstrap_itr).
- log_bme : np.array
Calculated log BME. This should account for metamodel error if self.use_emulator is True. Size: (1, n_bootstrap_itr).
Raises
AttributeError
Returns
- kld : np.ndarray
Calculated KLD, size: (1,n_bootstrap_itr).
- inf_entropy : np.ndarray
Calculated information entropy, size: (1,n_bootstrap_itr).
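Both metrics can be written in terms of posterior expectations: KLD = E_post[log L] - log BME and information entropy = log BME - E_post[log prior] - E_post[log L]. The sketch below evaluates these identities for one data set, using self-normalised importance weights over the prior samples to approximate the posterior expectations. This estimator, the function names, and the log_priors argument are assumptions made for illustration; the library may compute the expectations differently.

```python
import math


def posterior_weights(log_likelihoods):
    """Self-normalised weights of the prior samples, proportional to the likelihood."""
    peak = max(log_likelihoods)
    raw = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(raw)
    return [r / total for r in raw]


def kld_and_entropy(log_likelihoods, log_priors, log_bme):
    """Return (KLD, information entropy) from per-sample log values."""
    weights = posterior_weights(log_likelihoods)
    post_mean_loglik = sum(w * ll for w, ll in zip(weights, log_likelihoods))
    post_mean_logprior = sum(w * lp for w, lp in zip(weights, log_priors))
    kld = post_mean_loglik - log_bme
    inf_entropy = log_bme - post_mean_logprior - post_mean_loglik
    return kld, inf_entropy


# Hypothetical toy values: three prior samples from a uniform prior with density 1.
log_likelihoods = [-1.0, -2.0, -5.0]
log_priors = [0.0, 0.0, 0.0]
log_bme = math.log(sum(math.exp(v) for v in log_likelihoods) / len(log_likelihoods))

kld, inf_entropy = kld_and_entropy(log_likelihoods, log_priors, log_bme)
```

With this estimator the KLD is non-negative by construction and is zero when all samples are equally likely, i.e. when the data do not update the prior.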
- get_surr_error() -> dict
Get the RMSE of the surrogate from the engine.
Returns
- surr_error : dict
RMSE of metamodel if available. Otherwise returns None.
- perturb_data(data, output_names) -> dict
Returns an array with n_bootstrap_itrs rows of perturbed data. The first row includes the original observation data.
If self.bootstrap_method is ‘loocv’, a 2d-array will be returned with repeated rows and zero diagonal entries.
Parameters
- data : pandas DataFrame
Observation data.
- output_names : list
The output names.
Raises
AttributeError
Returns
- final_data : dict
Perturbed data set for each key in output_names. Shape of the np.ndarray for each key: (n_bootstrap_itrs, number of x-values in the measurement data).
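The two layouts described above can be sketched for a single output as follows. This is an illustration only: the noise model (Gaussian noise with a standard deviation of bootstrap_noise times the magnitude of each value) is an assumption, since the documentation only states that a noise level perturbs the data, and both function names are hypothetical.

```python
import random


def perturb_normal(observation, n_bootstrap_itrs, bootstrap_noise, seed=0):
    """Row 0 is the original data; later rows carry relative Gaussian noise (assumed model)."""
    rng = random.Random(seed)
    rows = [list(observation)]
    for _ in range(n_bootstrap_itrs - 1):
        rows.append([y + rng.gauss(0.0, bootstrap_noise * abs(y)) for y in observation])
    return rows


def perturb_loocv(observation):
    """Repeat the data once per observation point and zero the diagonal entry."""
    n_obs = len(observation)
    return [
        [0.0 if i == j else observation[j] for j in range(n_obs)]
        for i in range(n_obs)
    ]


observation = [1.0, 2.0, 4.0]
normal_rows = perturb_normal(observation, n_bootstrap_itrs=5, bootstrap_noise=0.05)
loocv_rows = perturb_loocv(observation)
```

In the 'loocv' layout the number of rows equals the number of observation points, which matches the note that n_bootstrap_itrs is set to the length of the observation data set.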
- plot_logbme()
Plots the log_BME if bootstrap is active.
- plot_max_a_posteriori()
Plots the response of the model output against that of the metamodel at the maximum a posteriori sample (mean or mode of posterior).
- plot_post_params(corner_title_fmt='.2e')
Plots the multivariate posterior parameter distribution.
Parameters
- corner_title_fmt : str, optional
Title format for the posterior distribution plot with python package corner. The default is ‘.2e’.
- plot_post_predictive()
Plots the posterior predictives against the observation data.
- posterior_predictive(save=False)
Evaluates the engine on the prior and posterior predictive samples, and stores the results as hdf5 files.
- priorPredictive.hdf5: Prior predictive samples.
- postPredictive_wo_noise.hdf5: Posterior predictive samples without the additive noise.
- postPredictive.hdf5: Posterior predictive samples with the additive noise.
Parameters
- save : bool, optional
Toggles storing the posterior predictives as hdf5 files. The default is False.
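The difference between the predictive samples with and without additive noise can be illustrated with a short sketch. It assumes the noise is zero-mean Gaussian with the per-point variances (sigma2s) of the discrepancy object; that assumption and the helper name add_observation_noise are not taken from the library.

```python
import random


def add_observation_noise(predictions, sigma2, seed=0):
    """Add zero-mean Gaussian noise with variance sigma2[j] to column j of each row."""
    rng = random.Random(seed)
    return [
        [y + rng.gauss(0.0, s2 ** 0.5) for y, s2 in zip(row, sigma2)]
        for row in predictions
    ]


# Hypothetical model outputs for two posterior samples at two observation points.
predictions = [[1.0, 2.0], [1.5, 2.5]]

# Counterpart of the "without noise" samples: the raw model outputs.
without_noise = predictions
# Counterpart of the "with noise" samples: outputs plus observation noise.
with_noise = add_observation_noise(predictions, sigma2=[0.01, 0.04])
```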
- run_inference() -> DataFrame
Performs Bayesian inference on the given setup.
Returns
- posterior_df : pd.DataFrame
The generated posterior samples.
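For the default inference_method='rejection', the idea of rejection sampling can be sketched with the standard library: each prior sample is kept with probability equal to its likelihood divided by the largest likelihood. This is a generic illustration under a toy one-parameter setup, not the library's code; the function name and all numbers are assumptions.

```python
import math
import random


def rejection_sample(prior_samples, log_likelihoods, seed=0):
    """Keep prior sample i with probability L_i / max(L)."""
    rng = random.Random(seed)
    peak = max(log_likelihoods)
    accepted = []
    for sample, log_lik in zip(prior_samples, log_likelihoods):
        if rng.random() < math.exp(log_lik - peak):
            accepted.append(sample)
    return accepted


# Hypothetical toy problem: uniform prior on [-3, 3], identity model,
# one observation at 0.5 with a noise standard deviation of 0.2.
rng = random.Random(1)
prior_samples = [rng.uniform(-3.0, 3.0) for _ in range(5000)]
log_likelihoods = [-0.5 * ((0.5 - theta) / 0.2) ** 2 for theta in prior_samples]

posterior = rejection_sample(prior_samples, log_likelihoods)
posterior_mean = sum(posterior) / len(posterior)
```

Only a fraction of the prior samples survives, which is why a large n_prior_samples is useful when the posterior is much narrower than the prior.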
- run_validation() -> ndarray
Validate a model on the given samples by calculating the loglikelihood and BME. The data used in the calculation can be perturbed, e.g. with leave-one-out (loo).
Returns
- log_bme : np.ndarray
The log-BME calculated on perturbed reference data.
- setup()
This function sets up the inference by checking the inputs and getting needed data.