TUTORIAL

Here we provide an introductory tutorial for the general workflow with bayesvalidrox. This tutorial follows along the example Example: Analytical function to explain the basic functionality of working with bayesvalidrox surrogates, postprocessing and the Bayesian classes. The full code can be found in examples.analytical-function.example_analytical_function_pce.py. Variations with other surrogate models are available in the same folder.

Import necessary libraries

>>> import numpy as np
>>> import pandas as pd
>>> import sys
>>> import joblib

Define the model with PyLinkForwardModel

We use the bayesvalidrox.pylink.pylink.PyLinkForwardModel object to define the model.

>>> from bayesvalidrox import PyLinkForwardModel
>>> model = PyLinkForwardModel()

Since the analytical function is implmented as a python function in a separate file, we can set the type of the link to Function and only need to pass it’s name (without the .py extension) to the object variable py_file. Note that the function name in the python script should match that of the script.

The name variable takes any user defined string.

>>> model.link_type = 'Function'
>>> model.py_file = 'analytical_function'
>>> model.name = 'AnalyticFunc'

The model output names are defined as a list of strings. These names will also be adopted by the surrogate and used in all plots.

>>> model.output.names = ['Z']

For this example, we have a Monte-Carlo reference solution for the first moments (mean and standard deviation) of the analytical function. The numpy (*.npy) files for this can be found in the data\ directory of the example. We will later discuss the estimation of the first two moments from the surrogate model, so let us import the mc-reference here. These values can be passed in a form of a dictionary to the object variable mc_reference.

>>> model.mc_reference = {}
>>> model.mc_reference['Time [s]'] = np.arange(0, 10, 1.) / 9
>>> model.mc_reference['mean'] = np.load(f"data/mean_2.npy")
>>> model.mc_reference['std'] = np.load(f"data/std_2.npy")

Some of the training methods and evaluations make use of given data. For this we choose the model results for the parameter values X=[0,0].

>>> model.observations = {}
>>> model.observations['Time [s]'] = np.arange(0, 10, 1.) / 9
>>> model.observations['Z'] = np.repeat([2.], 10)

Define probabilistic input model

Import and instantiate the input object

>>> from bayesvalidrox import Input
>>> inputs = Input()

Now, we define the distribution of the model inputs. bayesvalidrox accepts the definition in two ways: by defining the distribution directly or by passing available data. The latter is handy when little information is available on the parameters or they do not follow any typical distributions. Here we show both options, the associated example Example: Analytical function read the input parameters from a numpy file in the data/ directory.

Option I: Define distribution directy with their name, type and parameters

>>> # First parameter
>>> inputs.add_marginals()
>>> inputs.marginals[0].name = '$X_1$'
>>> inputs.marginals[0].dist_type = 'unif'
>>> inputs.marginals[0].parameters =  [-5, 5]
>>>
>>> # Second parameter
>>> inputs.add_marginals()
>>> inputs.marginals[1].name = '$X_2$'
>>> inputs.marginals[1].dist_type = 'unif'
>>> inputs.marginals[1].parameters =  [-5, 5]

Option II: Pass available data for input parameters and use shorthand

>>> inputParams = np.load('data/InputParameters_2.npy')
>>>
>>> inputs.add_marginals(name = '$X_1$',input_data = inputParams[:, 0])
>>> inputs.add_marginals(name = '$X_2$',input_data = inputParams[:, 1])

Define surrogate (meta) model

In bayesvalidrox a metamodel is an object of a child class of the class bayesvalidrox.surrogate_models.surrogate_models.MetaModel. This object also accepts the inputs we defined in the previous section as an argument. For this example we use the PCE class bayesvalidrox.surrogate_models.polynomial_chaos_expansion.PCE.

>>> from bayesvalidrox import PCE
>>> meta_model = PCE(Inputs)

In this example, we use the arbitrary Polynomial Chaos Expansion ('aPCE') as our meta_model_type.

>>> meta_model.meta_model_type = 'aPCE'

Note

In bayesvalidrox two implementations of the Polynomial Chaos Expansion are available, 'PCE' and 'aPCE'. The tag 'PCE' uses the generalized Polynomial Chaos Expansion by Xiu & Karniadakis (2002). while using 'aPCE' results in its arbitrary extension by Oladyshkin & Nowak (2012).

In bayesvalidrox the calculation of the polynomial coefficient is by standard done as described in Blatman & Sudret (2011).

In addition the regression method to be used at the core of this calculation has to be set. Here we choose 'FastARD', as it also induces sparsity in the expansion.

>>> meta_model.pce_reg_method = 'FastARD'

The polynomial degrees to consider can be either set as a range, or a single value to indicate the highest allowed degree.

>>> meta_model.pce_deg = np.arange(9)

The truncation shceme of the expansion is given as the q-norm, with a value between 0 and 1. A value of 1 results in standard truncation of the expansion, while smaller values make the expansion prefer less combined terms.

>>> meta_model.pce_q_norm = 0.75

Set the experimental design

The experimental design provides instructions on how to sample the input parameter space for training and evaluating the surrogate. Various sampling methods are available, and the samples can also be given by the user.

>>> from bayesvalidrox import ExpDesigns
>>> exp_design = ExpDesign(Inputs)
>>> exp_design.n_init_samples = 100
>>> exp_design.sampling_method = 'latin_hypercube'

Train the surrogate with an engine

Training is done by giving the model, experimental design and the surrogate model to an engine, which performs the training for us. The engine is of class bayesvalidrox.surrogate_models.engine.Engine.

>>> from bayesvalidrox import engine
>>> engine = Engine(meta_model, model, exp_design)

Now, we can start training the surrogate (meta-) model by starting the engine and using the method train_normal. The engine obtains the training samples from the experimental design using the sampling strategy we set, runs the model on these samples and trains the surrogate on the results.

>>> engine.train_normal()

Once this has run through we can obtain the trained metamodel from the engine.

>>> MetaModel_trained = engine.MetaModel

As bayesvalidrox uses the engine class for postprocessing and inference, we will save it at this point as a pkl object. This can be easily read in to avoid retraining the surrogate.

>>> with open(f'PCEengine_{Model.name}.pkl', 'wb') as output:
>>>     joblib.dump(engine, output, 2)

Sequential training

The basic surrogate training that we just performed is done only on one static set of data. bayesvalidrox also provide the option of sequential training, also known as active learning, where additional samples to be trained on are chosen by the surrogate. This will split the training into two parts. In the first part the training is performed as before, though the size of this initial training set can a bit smaller.

>>> exp_design.n_init_samples = 3*ndim
>>> exp_design.sampling_method = 'latin_hypercube'

The options for sequential training are listed in the dedicated page of the USER GUIDE. New samples are set by exploration and exploitation. Exploration refers to samples that are randomly drawn from the prior input space, while exploitation can use different metrics. The tradeoff between the two helps to avoid overfitting, while keeping the faster convergence given by the exploitation methods.

>>> exp_design.n_new_samples = 1
>>> exp_design.n_max_samples = 150
>>> exp_design.mod_loo_threshold = 1e-16
>>>
>>> exp_design.tradeoff_scheme = None
>>> exp_design.explore_method = 'random'
>>>
>>> # Use when 'Voronoi' or 'random' or 'latin_hypercube' chosen
>>> exp_design.n_canddidate = 1000
>>> exp_design.n_cand_groups = 4

Here we set the exploitaiton method to be Bayesian Active Learning, which chooses the new samples based on the information gain with respect to some given data, here the model results described earlier. In addition we need to set the information metric to use, here 'DKL' is chosen.

>>> exp_design.exploit_method = 'BayesActDesign'
>>> exp_design.util_func = 'DKL'

This active learning strategy also relies on the data uncertainty, so we set this to follow a Gaussian distribution around all values with standard deviations that are as large as the values themselves.

>>> obsData = pd.DataFrame(Model.observations, columns=Model.output.names)
>>> discrepancy = Discrepancy('')
>>> discrepancy.type = 'Gaussian'
>>> discrepancy.parameters = obsData**2
>>> meta_model.discrepancy = discrepancy

For calculating and plotting the validation error of the surrogate in each iteration, additional references can be given.

>>> prior = np.load(f"data/Prior_{ndim}.npy")
>>> prior_outputs = np.load(f"data/origModelOutput_{ndim}.npy")
>>> likelihood = np.load(f"data/validLikelihoods_{ndim}.npy")
>>> exp_design.valid_samples = prior[:500]
>>> exp_design.valid_model_runs = {'Z': prior_outputs[:500]}

The sequential training is again performed by the class bayesvalidrox.surrogate_models.engine.Engine, but using the function train_sequential. If the measures that are calculated in each training iteration should be visualized, set plot=True in the function call.

>>> engine.exp_design = exp_design
>>> engine.train_sequential()

Post-processing

The available post-processing methods for bayesvalidrox are given in the class bayesvalidrox.post_processing.post_processing.PostProcessing. All results created by this class are automatically stored in an output folder Outputs_PostProcessing_*modelname.

>>> from bayesvalidrox import PostProcessing
>>> post = PostProcessing(engine)

Since the reference moments obtained from a Monte-Carlo simulation is available, we only need to call the method plot_moments method from the PostProcessing object. This method generates a plot and stores it in the output folder.

>>> post.plot_moments()

The method valid_metamodel allows for visual comparison between the model and surrogate for samples from the prior parameter distribution. The samples can be drawn randomly, or set by the user.

>>> post.valid_metamodel(n_samples=3)

Another way to check the accuracy of the meta model is to use the method check_accuracy to show the Root Mean Square Error (RMSE) and the validation error.

>>> post.check_accuracy(n_samples=200)

Global sensitivity analysis can be performed on the surrogate via the Total Sobol’ Indices. These are especially cheap to compute for PCE as they can exploit the properties of the calculated coefficients. The method sobol_indices returns both the Sobol’ and the Total Sobol’ indices and stores their plots.

>>> sobol = post.sobol_indices()

Bayesian validation

Both Bayesian validation and Bayesian inference are performed with the class bayesvalidrox.bayes_inference.bayes_inference.BayesInference.

>>> from bayesvalidrox import BayesInference
>>> bayes = BayesInference(engine)

If we set use_emulator to be true, the object will consider the metamodel in the engine as the main model. Otherwise it will use evaluations of the model given in engine.model. Some posterior predictions will be plotted by setting plot to be true. More options for Bayesian inference are listed at Bayesian inference.

Note

Setting emulator = False means that the inference is based on actual model runs and not the surrogate. This can also be achieved by initializing the Engine without a surrogate object.

>>> BayesOpts.use__emulator = True
>>> BayesOpts.plot = True

For validation, the reference can be perturbed based on different bootstrap settings. Setting bootstrap_method to 'normal' applies Gaussian noise, while 'loocv' applies leave-one-out. To disable bootstrapping, set this attribute to 'none'.

>>> bayes.bootstrap_method = "normal"
>>> bayes.n_bootstrap_itrs = 500
>>> bayes.bootstrap_noise = 100

The bayesvalidrox.bayes_inference.bayes_inference.BayesInference object considers uncertainty in the observations as given by a discrepancy similar to for the sequential training.

>>> bayes.discrepancy = discrepancy

The validation looks for a validation related observation in the model. Here we just copy the read-in observation.

>>> bayes.engine.model.observations_valid = bayes.engine.model.observations
>>> bayes.engine.model.n_obs_valid = 1

Now we can perform the validation. The result is a set of log-BME values of the metamodel in comparison to the perturbed observations. A histogram of the BME of the surrogate and TOM is saved in the folder Output_Bayes_*modelname, if plot is set to be true.

>>> log_bme = bayes.run_validation()

Bayesian Inference

Like Bayesian validation, inverse parameter estimation can be done in bayesvalidrox with the class bayesvalidrox.bayes_inference.bayes_inference.BayesInference. In this example we use MCMC to approximate the posterior distribution. Rejection sampling is also available.

>>> import emcee
>>> bayes.inference_method = "MCMC"
>>> bayes.mcmc_params = {
>>>     'n_steps': 1e5,
>>>     'n_walkers': 30,
>>>     'moves': emcee.moves.KDEMove(),
>>>     'multiprocessing': False,
>>>     'verbose': False
>>>     }

With the method create_inference we start the calibration/inference and save the results in a a``.pkl`` file. This also creates and saves multiple plots in the folder Output_Bayes_*modelname. The saved plots include the posterior distribution with its most likely values and plots of posterior predictions if wanted.

>>> posterior = bayes.create_inference()