TUTORIAL

Here we provide an introductory tutorial for the general workflow with bayesvalidrox. This tutorial follows the example Example: Analytical function to explain the basic functionality of working with bayesvalidrox surrogates, postprocessing and the Bayesian classes. The full code can be found in examples.analytical-function.test_analytical_function.

Import necessary libraries

>>> import numpy as np
>>> import pandas as pd
>>> import sys
>>> import joblib
>>> from IPython.display import IFrame

Define the model with PyLinkForwardModel

We use the bayesvalidrox.pylink.pylink.PyLinkForwardModel object to define the model.

>>> from bayesvalidrox import PyLinkForwardModel
>>> Model = PyLinkForwardModel()

Since the analytical function is implemented as a Python function in a separate file, we can set the link type to Function and only need to pass the file name (without the .py extension) to the object variable py_file. Note that the function name in the Python script must match the name of the script.

The name variable takes any user defined string.

>>> Model.link_type = 'Function'
>>> Model.py_file = 'analytical_function'
>>> Model.name = 'AnalyticFunc'
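
For orientation, below is a minimal, hypothetical sketch of what analytical_function.py could contain; the exact signature, the array shapes and the 'x_values' key are assumptions here, see the PyLinkForwardModel documentation for the interface expected by your version.

>>> # analytical_function.py (hypothetical sketch)
>>> import numpy as np
>>>
>>> def analytical_function(xx):
>>>     # xx: parameter sets with shape (n_samples, n_params) -- assumed
>>>     xx = np.atleast_2d(xx)
>>>     time = np.arange(0, 10, 1.) / 9
>>>     # toy response: one time series per parameter set
>>>     z = xx[:, 0:1]**2 + xx[:, 1:2] + time[np.newaxis, :]
>>>     # outputs returned as a dictionary keyed by the output names -- assumed
>>>     return {'x_values': time, 'Z': z}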

The model output names are defined as a list of strings. These names will also be adopted by the surrogate and used in all plots.

>>> Model.Output.names = ['Z']

For this example, we have a Monte Carlo reference solution for the first two moments (mean and standard deviation) of the analytical function. The numpy (*.npy) files for this can be found in the data/ directory of the example. We will later discuss the estimation of the first two moments from the surrogate model, so let us import the MC reference here. These values can be passed in the form of a dictionary to the object variable mc_reference.

>>> Model.mc_reference = {}
>>> Model.mc_reference['Time [s]'] = np.arange(0, 10, 1.) / 9
>>> Model.mc_reference['mean'] = np.load("data/mean_2.npy")
>>> Model.mc_reference['std'] = np.load("data/std_2.npy")

Some of the training methods and evaluations make use of given data. For this, we choose the model results for the parameter values X=[0,0].

>>> Model.observations = {}
>>> Model.observations['Time [s]'] = np.arange(0, 10, 1.) / 9
>>> Model.observations['Z'] = np.repeat([2.], 10)

Define probabilistic input model

Import and instantiate the input object

>>> from bayesvalidrox import Input
>>> Inputs = Input()

Now, we define the distributions of the model inputs. bayesvalidrox accepts the definition in two ways: by defining the distribution directly or by passing available data. The latter is handy when little information is available on the parameters or when they do not follow any typical distribution. Here we show both options; the associated example Example: Analytical function reads the input parameters from a numpy file in the data/ directory.

Option I: Define the distributions directly with their name, type and parameters

>>> # First parameter
>>> Inputs.add_marginals()
>>> Inputs.Marginals[0].name = '$X_1$'
>>> Inputs.Marginals[0].dist_type = 'unif'
>>> Inputs.Marginals[0].parameters =  [-5, 5]
>>>
>>> # Second parameter
>>> Inputs.add_marginals()
>>> Inputs.Marginals[1].name = '$X_2$'
>>> Inputs.Marginals[1].dist_type = 'unif'
>>> Inputs.Marginals[1].parameters =  [-5, 5]

Option II: Pass available data for input parameters

>>> inputParams = np.load('data/InputParameters_2.npy')
>>>
>>> # First parameter
>>> Inputs.add_marginals()
>>> Inputs.Marginals[0].name = '$X_1$'
>>> Inputs.Marginals[0].input_data = inputParams[:, 0]
>>>
>>> # Second parameter
>>> Inputs.add_marginals()
>>> Inputs.Marginals[1].name = '$X_2$'
>>> Inputs.Marginals[1].input_data = inputParams[:, 1]

Define surrogate (meta) model

In bayesvalidrox a metamodel is an object of class bayesvalidrox.surrogate_models.surrogate_models.MetaModel. This object takes the inputs we defined in the previous section as an argument.

>>> from bayesvalidrox import MetaModel
>>> MetaModelOpts = MetaModel(Inputs)

In this example, we use the arbitrary Polynomial Chaos Expansion ('aPCE') as our meta_model_type.

>>> MetaModelOpts.meta_model_type = 'aPCE'

Note

In bayesvalidrox two implementations of the Polynomial Chaos Expansion are available, 'PCE' and 'aPCE'. The tag 'PCE' uses the generalized Polynomial Chaos Expansion by Xiu & Karniadakis (2002), while 'aPCE' uses its arbitrary extension by Oladyshkin & Nowak (2012).

In bayesvalidrox the polynomial coefficients are by default calculated as described in Blatman & Sudret (2011).

In addition, the regression method used at the core of this calculation has to be set. Here we choose 'FastARD', as it also induces sparsity in the expansion.

>>> MetaModelOpts.pce_reg_method = 'FastARD'

The polynomial degrees to consider can be set either as a range or as a single value indicating the highest allowed degree.

>>> MetaModelOpts.pce_deg = np.arange(9)

The truncation scheme of the expansion is given as the q-norm, with a value between 0 and 1. A value of 1 results in the standard truncation of the expansion, while smaller values make the expansion prefer fewer combined (interaction) terms.

>>> MetaModelOpts.pce_q_norm = 0.75
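
Concretely, the q-norm (hyperbolic) truncation of Blatman & Sudret (2011) retains only those multi-indices \alpha of the expansion whose q-norm does not exceed the maximal degree p:

    \mathcal{A}^{d,p,q} = \left\{ \boldsymbol{\alpha} \in \mathbb{N}^{d} : \Big( \sum_{i=1}^{d} \alpha_i^{q} \Big)^{1/q} \le p \right\}

For q = 1 this reduces to the standard total-degree truncation, while the value 0.75 chosen above penalizes terms that combine several parameters at high order.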

Set the experimental design

The experimental design provides instructions on how to sample the input parameter space for training and evaluating the surrogate. Various sampling methods are available, and the samples can also be given by the user.

>>> from bayesvalidrox import ExpDesigns
>>> ExpDesign = ExpDesigns(Inputs)
>>> ExpDesign.n_init_samples = 100
>>> ExpDesign.sampling_method = 'latin_hypercube'

Train the surrogate with an engine

Training is done by giving the model, experimental design and the surrogate model to an engine, which performs the training for us. The engine is of class bayesvalidrox.surrogate_models.engine.Engine.

>>> from bayesvalidrox import Engine
>>> engine = Engine(MetaModelOpts, Model, ExpDesign)

Now, we can start training the surrogate (meta-) model by starting the engine and using the method train_normal. The engine obtains the training samples from the experimental design using the sampling strategy we set, runs the model on these samples and trains the surrogate on the results.

>>> engine.start_engine()
>>> engine.train_normal()

Once this has run through, we can obtain the trained metamodel from the engine.

>>> MetaModel_trained = engine.MetaModel
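
As a quick sanity check, the trained surrogate can be evaluated on a few parameter sets. The sketch below assumes that eval_metamodel returns dictionaries of mean and standard-deviation predictions keyed by the output names; check the Engine documentation of your version for the exact signature.

>>> # Sketch: evaluate the surrogate at a few parameter sets (assumed API)
>>> test_samples = np.array([[0., 0.], [1., -1.], [2., 3.]])
>>> y_mean, y_std = engine.eval_metamodel(samples=test_samples)
>>> print(y_mean['Z'].shape)  # one predicted time series per parameter set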

As bayesvalidrox uses the engine class for postprocessing and inference, we save it at this point as a .pkl file. This can easily be read back in to avoid retraining the surrogate.

>>> with open(f'PCEengine_{Model.name}.pkl', 'wb') as output:
>>>     joblib.dump(engine, output, 2)
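
Later, for example in a separate postprocessing script, the saved engine can be restored with joblib instead of retraining:

>>> with open(f'PCEengine_{Model.name}.pkl', 'rb') as inp:
>>>     engine = joblib.load(inp)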

Sequential training

The basic surrogate training that we just performed uses only one static set of data. bayesvalidrox also provides the option of sequential training, also known as active learning, where additional training samples are chosen based on the current surrogate. This splits the training into two parts. In the first part, the training is performed as before, though the size of this initial training set can be a bit smaller.

>>> ndim = 2  # number of input parameters
>>> ExpDesign.n_init_samples = 3*ndim
>>> ExpDesign.sampling_method = 'latin_hypercube'

The options for sequential training are listed on the dedicated page of the USER GUIDE. New samples are chosen by a combination of exploration and exploitation. Exploration refers to samples that are randomly drawn from the prior input space, while exploitation can use different metrics to target informative regions. The tradeoff between the two helps to avoid overfitting, while keeping the faster convergence provided by the exploitation methods.

>>> ExpDesign.n_new_samples = 1
>>> ExpDesign.n_max_samples = 150
>>> ExpDesign.mod_LOO_threshold = 1e-16
>>>
>>> ExpDesign.tradeoff_scheme = None
>>> ExpDesign.explore_method = 'random'
>>>
>>> # Used when 'Voronoi', 'random' or 'latin_hypercube' is chosen
>>> ExpDesign.n_canddidate = 1000
>>> ExpDesign.n_cand_groups = 4

Here we set the exploitation method to Bayesian Active Learning, which chooses new samples based on their information gain with respect to some given data, here the observations described earlier.

In addition, we need to set the information metric to use; here 'DKL', the Kullback-Leibler divergence, is chosen.

>>> ExpDesign.exploit_method = 'BayesActDesign'
>>> ExpDesign.util_func = 'DKL'
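
For reference, the Kullback-Leibler divergence measures the information gained when moving from the prior p(\theta) to the posterior p(\theta | y); the exact estimator used internally by bayesvalidrox may differ:

    D_{\mathrm{KL}}\big( p(\theta \mid y) \,\|\, p(\theta) \big) = \int p(\theta \mid y) \, \ln \frac{p(\theta \mid y)}{p(\theta)} \, \mathrm{d}\theta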

This active learning strategy also relies on the data uncertainty, so we set it to follow a Gaussian distribution around the observed values, with standard deviations as large as the values themselves. Note that the Discrepancy object takes the variances, hence the squared values below.

>>> from bayesvalidrox import Discrepancy
>>> obsData = pd.DataFrame(Model.observations, columns=Model.Output.names)
>>> DiscrepancyOpts = Discrepancy('')
>>> DiscrepancyOpts.type = 'Gaussian'
>>> DiscrepancyOpts.parameters = obsData**2
>>> MetaModelOpts.Discrepancy = DiscrepancyOpts

The measures calculated in each training iteration can also be plotted. The following options control these snapshots; here they are switched off.

>>> ExpDesign.post_snapshot = False
>>> ExpDesign.step_snapshot = 1
>>> ExpDesign.max_a_post = [0] * ndim

For calculating and plotting the validation error of the surrogate in each iteration, additional references can be given.

>>> prior = np.load(f"data/Prior_{ndim}.npy")
>>> prior_outputs = np.load(f"data/origModelOutput_{ndim}.npy")
>>> likelihood = np.load(f"data/validLikelihoods_{ndim}.npy")
>>> ExpDesign.valid_samples = prior[:500]
>>> ExpDesign.valid_model_runs = {'Z': prior_outputs[:500]}

The sequential training is again performed by the class bayesvalidrox.surrogate_models.engine.Engine, but using the function train_sequential.

>>> engine.ExpDesign = ExpDesign
>>> engine.train_sequential()

Post-processing

The available post-processing methods for bayesvalidrox are given in the class bayesvalidrox.post_processing.post_processing.PostProcessing. All results created by this class are automatically stored in an output folder Outputs_PostProcessing_<model name>.

>>> from bayesvalidrox import PostProcessing
>>> PostPCE = PostProcessing(engine)

Since the reference moments obtained from a Monte Carlo simulation are available, we only need to call the plot_moments method of the PostProcessing object. This method generates a plot and stores it in the output folder.

>>> PostPCE.plot_moments()

The method valid_metamodel allows for visual comparison between the model and surrogate for samples from the prior parameter distribution. The samples can be drawn randomly, or set by the user.

>>> PostPCE.valid_metamodel(n_samples=3)
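
For instance, to compare at user-defined points instead of random draws, the parameter sets can be passed directly. The samples keyword is an assumption here; check the valid_metamodel signature of your version.

>>> # Sketch: validate at user-defined parameter sets (assumed keyword)
>>> PostPCE.valid_metamodel(samples=np.array([[0., 0.], [2., -2.], [4., 4.]]))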

Another way to check the accuracy of the metamodel is the check_accuracy method, which reports the root mean square error (RMSE) and the validation error.

>>> PostPCE.check_accuracy(n_samples=200)

Global sensitivity analysis can be performed on the surrogate via the total Sobol indices. These are especially cheap to compute for PCE, as they can be derived directly from the calculated coefficients. The sobol_indices method returns a dictionary containing the total Sobol indices and stores their plots.

>>> total_sobol = PostPCE.sobol_indices()

Bayesian Inference

Inverse parameter estimation can be done in bayesvalidrox with the class bayesvalidrox.bayes_inference.bayes_inference.BayesInference.

>>> from bayesvalidrox import BayesInference
>>> BayesOpts = BayesInference(engine)

If we set emulator to True, the Bayesian inference will be performed based on the surrogate. Posterior predictions will be plotted by setting plot_post_pred. More options for Bayesian inference are listed at Bayesian inference.

Note

Setting emulator = False means that the inference is based on actual model runs and not the surrogate. This can also be achieved by initializing the Engine without a surrogate object.

>>> BayesOpts.emulator = True
>>> BayesOpts.plot_post_pred = True

We use MCMC to approximate the posterior distribution; rejection sampling is also available.

>>> import emcee
>>> BayesOpts.inference_method = "MCMC"
>>> BayesOpts.mcmc_params = {
>>>     'n_steps': 1e5,
>>>     'n_walkers': 30,
>>>     'moves': emcee.moves.KDEMove(),
>>>     'multiprocessing': False,
>>>     'verbose': False
>>>     }

Define the data uncertainty

The estimated uncertainty of the data is used in the Bayesian inference. bayesvalidrox provides three options to set this. For this tutorial, we assume the uncertainty of the data to be Gaussian, centered on the known data, with a standard deviation as large as the data values themselves.

Option I: Set error directly for all data

>>> BayesOpts.measurement_error = obsData

Warning

This option will be deprecated.

Option II: Set discrepancy distributions all at once

>>> DiscrepancyOpts = Discrepancy('')
>>> DiscrepancyOpts.type = 'Gaussian'
>>> DiscrepancyOpts.parameters = obsData**2
>>> BayesOpts.Discrepancy = DiscrepancyOpts

Option III: Set discrepancy distributions for each parameter individually, combining a known part with inferred parameters (two variants are shown below)

>>> BayesOpts.bias_inputs = {'Z': np.arange(0, 10, 1.).reshape(-1, 1) / 9}
>>>
>>> # Variant 1: infer a single error variance
>>> DiscOutputOpts = Input()
>>> DiscOutputOpts.add_marginals()
>>> DiscOutputOpts.Marginals[0].name = r'$\sigma^2_{\epsilon}$'
>>> DiscOutputOpts.Marginals[0].dist_type = 'uniform'
>>> DiscOutputOpts.Marginals[0].parameters = [0, 10]
>>> BayesOpts.Discrepancy = {'known': DiscrepancyOpts,
>>>                          'infer': Discrepancy(DiscOutputOpts)}
>>>
>>> # Variant 2: infer two parameters of the error model
>>> DiscOutputOpts = Input()
>>> DiscOutputOpts.add_marginals()
>>> DiscOutputOpts.Marginals[0].name = r'$\lambda$'
>>> DiscOutputOpts.Marginals[0].dist_type = 'uniform'
>>> DiscOutputOpts.Marginals[0].parameters = [0, 1]
>>> DiscOutputOpts.add_marginals()
>>> DiscOutputOpts.Marginals[1].name = r'$\sigma_f$'
>>> DiscOutputOpts.Marginals[1].dist_type = 'uniform'
>>> DiscOutputOpts.Marginals[1].parameters = [0, 1e-4]
>>> BayesOpts.Discrepancy = {'known': DiscrepancyOpts,
>>>                          'infer': Discrepancy(DiscOutputOpts)}

With the method create_inference we start the calibration/inference and save the results in a .pkl file. This also creates and saves multiple plots in the folder Output_Bayes_<model name>. The saved plots include a histogram of the BME of the surrogate and the TOM, the posterior distribution with its most likely values, and posterior predictions if requested.

>>> Bayes_PCE = BayesOpts.create_inference()
>>> with open(f'Bayes_{Model.name}.pkl', 'wb') as output:
>>>     joblib.dump(Bayes_PCE, output, 2)