TUTORIAL ******** Here we provide an introductory tutorial for the general workflow with **bayesvalidrox**. This tutorial follows along the example :any:`analyticalfunction` to explain the basic functionality of working with **bayesvalidrox** surrogates, postprocessing and the Bayesian classes. The full code can be found in :py:mod:`examples.analytical-function.test_analytical_function`. Import necessary libraries -------------------------- >>> import numpy as np >>> import pandas as pd >>> import sys >>> import joblib >>> from IPython.display import IFrame Define the model with PyLinkForwardModel ---------------------------------------- We use the :any:`bayesvalidrox.pylink.pylink.PyLinkForwardModel` object to define the model. >>> from bayesvalidrox import PyLinkForwardModel >>> Model = PyLinkForwardModel() Since the analytical function is implmented as a python function in a separate file, we can set the type of the link to ``Function`` and only need to pass it's name (without the ``.py`` extension) to the object variable ``py_file``. Note that the function name in the python script should match that of the script. The name variable takes any user defined string. >>> Model.link_type = 'Function' >>> Model.py_file = 'analytical_function' >>> Model.name = 'AnalyticFunc' The model output names are defined as a list of strings. These names will also be adopted by the surrogate and used in all plots. >>> Model.Output.names = ['Z'] For this example, we have a Monte-Carlo reference solution for the first moments (mean and standard deviation) of the analytical function. The numpy (``*.npy``) files for this can be found in the ``data\`` directory of the example. We will later discuss the estimation of the first two moments from the surrogate model, so let us import the mc-reference here. These values can be passed in a form of a dictionary to the object variable ``mc_reference``. >>> Model.mc_reference = {} >>> Model.mc_reference['Time [s]'] = np.arange(0, 10, 1.) / 9 >>> Model.mc_reference['mean'] = np.load(f"data/mean_2.npy") >>> Model.mc_reference['std'] = np.load(f"data/std_2.npy") Some of the training methods and evaluations make use of given data. For this we choose the model results for the parameter values X=[0,0]. >>> Model.observations = {} >>> Model.observations['Time [s]'] = np.arange(0, 10, 1.) / 9 >>> Model.observations['Z'] = np.repeat([2.], 10) Define probabilistic input model -------------------------------- Import and instantiate the input object >>> from bayesvalidrox import Input >>> Inputs = Input() Now, we define the distribution of the model inputs. **bayesvalidrox** accepts the definition in two ways: by defining the distribution directly or by passing available data. The latter is handy when little information is available on the parameters or they do not follow any typical distributions. Here we show both options, the associated example :any:`analyticalfunction` read the input parameters from a numpy file in the ``data/`` directory. **Option I:** Define distribution directy with their name, type and parameters >>> # First parameter >>> Inputs.add_marginals() >>> Inputs.Marginals[0].name = '$X_1$' >>> Inputs.Marginals[0].dist_type = 'unif' >>> Inputs.Marginals[0].parameters = [-5, 5] >>> >>> # Second parameter >>> Inputs.add_marginals() >>> Inputs.Marginals[1].name = '$X_2$' >>> Inputs.Marginals[1].dist_type = 'unif' >>> Inputs.Marginals[1].parameters = [-5, 5] **Option II:** Pass available data for input parameters >>> inputParams = np.load('data/InputParameters_2.npy') >>> >>> # First parameter >>> Inputs.add_marginals() >>> Inputs.Marginals[0].name = '$X_1$' >>> Inputs.Marginals[0].input_data = inputParams[:, 0] >>> >>> # Second parameter >>> Inputs.add_marginals() >>> Inputs.Marginals[1].name = '$X_2$' >>> Inputs.Marginals[1].input_data = inputParams[:, 1] Define surrogate (meta) model ----------------------------- In **bayesvalidrox** a metamodel is an object of class :any:`bayesvalidrox.surrogate_models.surrogate_models.MetaModel`. This object also accepts the inputs we defined in the previous section as an argument. >>> from bayesvalidrox import MetaModel >>> MetaModelOpts = MetaModel(Inputs) In this example, we use the arbitrary Polynomial Chaos Expansion (``'aPCE'``) as our ``meta_model_type``. >>> MetaModelOpts.meta_model_type = 'aPCE' .. _2002: https://epubs.siam.org/doi/10.1137/S1064827501387826/ .. note:: In **bayesvalidrox** two implementations of the Polynomial Chaos Expansion are available, ``'PCE'`` and ``'aPCE'``. The tag ``'PCE'`` uses the generalized Polynomial Chaos Expansion by Xiu & Karniadakis (2002_). while using ``'aPCE'`` results in its arbitrary extension by Oladyshkin & Nowak (2012_). .. _2012: https://www.sciencedirect.com/science/article/pii/S0951832012000853?via%3Dihub/ In **bayesvalidrox** the calculation of the polynomial coefficient is by standard done as described in Blatman & Sudret (2011_). .. _2011: https://www.sciencedirect.com/science/article/pii/S0021999110006856/ In addition the regression method to be used at the core of this calculation has to be set. Here we choose ``'FastARD'``, as it also induces sparsity in the expansion. >>> MetaModelOpts.pce_reg_method = 'FastARD' The polynomial degrees to consider can be either set as a range, or a single value to indicate the highest allowed degree. >>> MetaModelOpts.pce_deg = np.arange(9) The truncation shceme of the expansion is given as the q-norm, with a value between 0 and 1. A value of 1 results in standard truncation of the expansion, while smaller values make the expansion prefer less combined terms. >>> MetaModelOpts.pce_q_norm = 0.75 Set the experimental design --------------------------- The experimental design provides instructions on how to sample the input parameter space for training and evaluating the surrogate. Various sampling methods are available, and the samples can also be given by the user. >>> ExpDesign = ExpDesign(Inputs) >>> ExpDesign.n_init_samples = 100 >>> ExpDesign.sampling_method = 'latin_hypercube' Train the surrogate with an engine ---------------------------------- Training is done by giving the model, experimental design and the surrogate model to an engine, which performs the training for us. The engine is of class :any:`bayesvalidrox.surrogate_models.engine.Engine`. >>> engine = Engine(MetaModelOpts, Model, ExpDesign) Now, we can start training the surrogate (meta-) model by starting the engine and using the method ``train_normal``. The engine obtains the training samples from the experimental design using the sampling strategy we set, runs the model on these samples and trains the surrogate on the results. >>> engine.start_engine() >>> engine.train_normal() Once this has run through we can obtain the trained metamodel from the engine. >>> MetaModel_trained = engine.MetaModel As **bayesvalidrox** uses the engine class for postprocessing and inference, we will save it at this point as a ``pkl`` object. This can be easily read in to avoid retraining the surrogate. >>> with open(f'PCEengine_{Model.name}.pkl', 'wb') as output: >>> joblib.dump(engine, output, 2) Sequential training ------------------- The basic surrogate training that we just performed is done only on one static set of data. **bayesvalidrox** also provide the option of sequential training, also known as active learning, where additional samples to be trained on are chosen by the surrogate. This will split the training into two parts. In the first part the training is performed as before, though the size of this initial training set can a bit smaller. >>> ExpDesign.n_init_samples = 3*ndim >>> ExpDesign.sampling_method = 'latin_hypercube' The options for sequential training are listed in the dedicated page of the :any:`packagedescription`. New samples are set by exploration and exploitation. Exploration refers to samples that are randomly drawn from the prior input space, while exploitation can use different metrics. The tradeoff between the two helps to avoid overfitting, while keeping the faster convergence given by the exploitation methods. >>> ExpDesign.n_new_samples = 1 >>> ExpDesign.n_max_samples = 150 >>> ExpDesign.mod_LOO_threshold = 1e-16 >>> >>> ExpDesign.tradeoff_scheme = None >>> ExpDesign.explore_method = 'random' >>> >>> # Use when 'Voronoi' or 'random' or 'latin_hypercube' chosen >>> ExpDesign.n_canddidate = 1000 >>> ExpDesign.n_cand_groups = 4 Here we set the exploitaiton method to be Bayesian Active Learning, which chooses the new samples based on the information gain with respect to some given data, here the model results described earlier. In addition we need to set the information metric to use, here ``'DKL'`` is chosen. >>> ExpDesign.exploit_method = 'BayesActDesign' >>> ExpDesign.util_func = 'DKL' This active learning strategy also relies on the data uncertainty, so we set this to follow a Gaussian distribution around all values with standard deviations that are as large as the values themselves. >>> obsData = pd.DataFrame(Model.observations, columns=Model.Output.names) >>> DiscrepancyOpts = Discrepancy('') >>> DiscrepancyOpts.type = 'Gaussian' >>> DiscrepancyOpts.parameters = obsData**2 >>> MetaModelOpts.Discrepancy = DiscrepancyOpts The measures calculated in each training iteration can also be plotted. >>> ExpDesign.post_snapshot = False >>> ExpDesign.step_snapshot = 1 >>> ExpDesign.max_a_post = [0] * ndim For calculating and plotting the validation error of the surrogate in each iteration, additional references can be given. >>> prior = np.load(f"data/Prior_{ndim}.npy") >>> prior_outputs = np.load(f"data/origModelOutput_{ndim}.npy") >>> likelihood = np.load(f"data/validLikelihoods_{ndim}.npy") >>> ExpDesign.valid_samples = prior[:500] >>> ExpDesign.valid_model_runs = {'Z': prior_outputs[:500]} The sequential training is again performed by the class :any:`bayesvalidrox.surrogate_models.engine.Engine`, but using the function ``train_sequential``. >>> engine.ExpDesign = ExpDesign >>> engine.train_sequential() Post-processing --------------- The available post-processing methods for **bayesvalidrox** are given in the class :any:`bayesvalidrox.post_processing.post_processing.PostProcessing`. All results created by this class are automatically stored in an output folder ``Outputs_PostProcessing_*modelname``. >>> from bayesvalidrox import PostProcessing >>> PostPCE = PostProcessing(engine) Since the reference moments obtained from a Monte-Carlo simulation is available, we only need to call the method ``plot_moments`` method from the PostProcessing object. This method generates a plot and stores it in the output folder. >>> PostPCE.plot_moments() The method ``valid_metamodel`` allows for visual comparison between the model and surrogate for samples from the prior parameter distribution. The samples can be drawn randomly, or set by the user. >>> PostPCE.valid_metamodel(n_samples=3) Another way to check the accuracy of the meta model is to use the method ``accuracyCheckMetaModel`` to show the Root Mean Square Error (RMSE) and the validation error. >>> PostPCE.accuracyCheckMetaModel(nSamples=200) Global sensitivity analysis can be performed on the surrogate via the Total Sobol Indices. These are especially cheap to compute for PCE as they can exploit the properties of the calculated coefficients. The method ``sobolIndicesPCE`` returns a dictionary that contains the total sobol indices and stores their plots. >>> total_sobol = PostPCE.sobolIndicesPCE() Bayesian Inference ------------------ Inverse parameter estimation can be done in **bayesvalidrox** with the class :any:`bayesvalidrox.bayes_inference.bayes_inference.BayesInference`. >>> from bayesvalidrox import BayesInference >>> BayesOpts = BayesInference(engine) If we set ``emulator`` to be true the Bayesian Inference will be performed based on the emulator. Some posterior predictions will be plotted by setting ``plot_post_pred``. More options for Bayesian inference are listed at :any:`bayes_description`. .. note:: Setting ``emulator = False`` means that the inference is based on actual model runs and not the surrogate. This can also be achieved by initializing the Engine without a surrogate object. >>> BayesOpts.emulator = True >>> BayesOpts.plot_post_pred = True We use MCMC to approximate the posterior distribution, rejection sampling is also available. >>> import emcee >>> BayesOpts.inference_method = "MCMC" >>> BayesOpts.mcmc_params = { >>> 'n_steps': 1e5, >>> 'n_walkers': 30, >>> 'moves': emcee.moves.KDEMove(), >>> 'multiprocessing': False, >>> 'verbose': False >>> } Define the data uncertainty --------------------------- The estimated uncertainty of the data is used for the Bayesian Inference. **bayesvalidrox** provides three option to add this. For this tutorial we assume the uncertainty of the data to be distributed according to a Gaussian distribution around the known data with a standard deviation that is as large as the values in the data. **Option I:** Set error directly for all data >>> BayesOpts.measurement_error = obsData .. warning:: This option will become deprecated. **Option II:** Set discrepancy distributions all at once >>> DiscrepancyOpts = Discrepancy('') >>> DiscrepancyOpts.type = 'Gaussian' >>> DiscrepancyOpts.parameters = obsData**2 >>> BayesOpts.Discrepancy = DiscrepancyOpts **Option III:** Set discrepancy distributions for each parameter at a time >>> BayesOpts.bias_inputs = {'Z':np.arange(0, 10, 1.).reshape(-1,1) / 9} >>> DiscOutputOpts = Input() >>> DiscOutputOpts.add_marginals() >>> DiscOutputOpts.Marginals[0].Nnme = '$\sigma^2_{\epsilon}$' >>> DiscOutputOpts.Marginals[0].dist_type = 'uniform' >>> DiscOutputOpts.Marginals[0].parameters = [0, 10] >>> BayesOpts.Discrepancy = {'known': DiscrepancyOpts, >>> 'infer': Discrepancy(DiscOutputOpts)} >>> DiscOutputOpts = Input() >>> DiscOutputOpts.add_marginals() >>> DiscOutputOpts.Marginals[0].name = '$\lambda$' >>> DiscOutputOpts.Marginals[0].dist_type = 'uniform' >>> DiscOutputOpts.Marginals[0].parameters = [0, 1] >>> DiscOutputOpts.add_marginals() >>> DiscOutputOpts.Marginals[1].Name = '$\sigma_f$' >>> DiscOutputOpts.Marginals[1].dist_type = 'uniform' >>> DiscOutputOpts.Marginals[1].parameters = [0, 1e-4] >>> BayesOpts.Discrepancy = Discrepancy(DiscOutputOpts) >>> BayesOpts.Discrepancy = {'known': DiscrepancyOpts, >>> 'infer': Discrepancy(DiscOutputOpts)} With the method ``create_inference`` we start the calibration/inference and save the results in a a``.pkl`` file. This also creates and saves multiple plots in the folder ``Output_Bayes_*modelname``. The saved plots include a histogram of the BME of the surrogate and TOM, the posterior distribution with its most likely values and plots of posterior predictions if wanted. >>> Bayes_PCE = BayesOpts.create_inference() >>> with open(f'Bayes_{Model.name}.pkl', 'wb') as output: >>> joblib.dump(Bayes_PCE, output, 2)