Bayesian Model Averaging Kriging
Jize Zhang and Alexandros Taflanidis


HIPAD LAB: High Performance Systems Laboratory, Department of Civil and Environmental Engineering and Earth Sciences
Bayesian Model Averaging Kriging
Jize Zhang and Alexandros Taflanidis

Why use metamodeling techniques?
High-fidelity numerical simulators are increasingly used in a variety of engineering applications. Since they can be computationally expensive, metamodeling techniques are adopted in many implementations (for example in UQ settings) to approximate the simulator with a cheaper-to-run emulator. Popular choices include: support vector machines, artificial neural networks, polynomial chaos, and Kriging (a.k.a. Gaussian process regression), a data-driven statistical emulator.

Kriging (Gaussian process) intro I
h(x) = f(x)^T β + z(x)    [Output = Trend + GP residual]
Select basis functions f and correlation kernel R. Tune the Kriging parameters based on experiments (input X / output H pairs), i.e., optimize (β*, (σ²)*, θ*).
The regression structure captures the deterministic global trend through the basis functions f = [f_1, ..., f_{n_f}].
The residual GP dictates the local deviation from the trend, with some chosen covariance kernel: cov(z(x), z(x')) = σ² R(x − x'; θ).
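The slides leave the correlation kernel R unspecified; as a minimal sketch, one common choice (the squared-exponential/Gaussian kernel, used here purely for illustration) shows how the hyperparameters θ enter:

```python
import numpy as np

def gauss_corr(x1, x2, theta):
    """Gaussian (squared-exponential) correlation kernel R(x - x'; theta).
    theta holds one inverse-lengthscale per input dimension."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-np.sum(theta * d ** 2))

# Correlation is 1 at zero lag and decays with distance:
theta = np.array([1.0, 1.0])
r0 = gauss_corr([0.0, 0.0], [0.0, 0.0], theta)
r1 = gauss_corr([0.0, 0.0], [1.0, 1.0], theta)
```

Larger θ entries make the residual GP "forget" the training data faster along that input dimension.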

Kriging (Gaussian process) intro I (continued)
h(x) = f(x)^T β + z(x)    [Output = Trend + GP residual]
Select basis functions f and correlation kernel R. Tune the Kriging parameters based on experiments (input X / output H pairs). Formulate the Kriging predictions (predictive mean and variance) for any new input x.
[Figure: illustration of Kriging predictions h(x) over the input space (x_1, x_2)]

Kriging (Gaussian process) intro II
Kriging output prediction: h(x | X, H) ~ N(h̃(x), σ̃²(x))
Select basis functions f and correlation kernel R. Tune the Kriging parameters based on experiments (input X / output H pairs). Formulate the Kriging predictions (predictive mean and variance) for any new input x:
h̃(x) = f(x)^T β* + r*(x)^T R*^{-1} (H − F β*)
σ̃²(x) = (σ²)* [1 + u^T (F^T R*^{-1} F)^{-1} u − r*(x)^T R*^{-1} r*(x)],  where u = F^T R*^{-1} r*(x) − f(x)
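A sketch of these predictive formulas, with all tuned quantities passed in explicitly (the 1-D data, constant trend, and Gaussian kernel below are invented for illustration). A useful sanity check is that Kriging interpolates the experiments: at a training point the mean equals the observed output and the variance vanishes.

```python
import numpy as np

def uk_predict(x_new, X, H, f, r, beta, sigma2):
    """Universal-Kriging predictive mean and variance at x_new, following the
    slide's formulas; beta, sigma2, and the kernel r are assumed already tuned."""
    F = np.array([f(xi) for xi in X])                # basis matrix F
    R = np.array([[r(a, b) for b in X] for a in X])  # correlation matrix R_*
    Rinv = np.linalg.inv(R)
    rx = np.array([r(x_new, xi) for xi in X])        # r_*(x)
    fx = f(x_new)
    mean = fx @ beta + rx @ Rinv @ (H - F @ beta)
    u = F.T @ Rinv @ rx - fx
    var = sigma2 * (1.0 + u @ np.linalg.inv(F.T @ Rinv @ F) @ u
                    - rx @ Rinv @ rx)
    return mean, var

# Interpolation check at a training point (hypothetical 1-D data):
X = np.array([0.0, 1.0, 2.0])
H = np.array([0.5, 1.2, 0.3])
f = lambda x: np.array([1.0])            # constant trend (ordinary Kriging)
r = lambda a, b: np.exp(-(a - b) ** 2)   # assumed Gaussian kernel
m, v = uk_predict(1.0, X, H, f, r, beta=np.array([0.7]), sigma2=1.0)
```

Here m reproduces the observed 1.2 and v is zero up to round-off, regardless of the (arbitrary) beta.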

Is regression structure important? I
[Figure: the real model and the experiments, compared against Kriging fits with different regression structures: f(x) = {1}, f(x) = {x_1 x_2, (x_2)^2}, f(x) = {x^3}, f(x) = {x^4}]

Is regression structure important? II
Contour plots of absolute errors (pinker indicates higher error).
[Figure: error contours for Kriging with f(x) = {1}, f(x) = {x_1 x_2, x_2}, f(x) = {x^3}, f(x) = {x^4}]

Bayesian inference for regression structure selection
Question: how can we select a regression structure, conditioned on all available data? Bayesian inference.
Terminology: model M_k for Kriging with basis function vector f_k = {f_1, ..., f_k}.
Bayes rule: P(M_k | X, H) ∝ P(H | X, M_k) P(M_k)
[posterior probability of each model] ∝ [model evidence] × [prior on each model]

Bayesian model averaging prediction
Repeat for every model M_k: predict under model M_k, i.e., form h(x | X, H, M_k).
BMA prediction: h(x | X, H) = Σ_{k=1}^{n_m} h(x | X, H, M_k) P(M_k | X, H)
Mixture weights = posterior probabilities: P(M_k | X, H) ∝ P(H | X, M_k) P(M_k)
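The averaging step above can be sketched directly; the per-model log-evidences, uniform prior, and predictive means below are invented numbers standing in for the Kriging quantities:

```python
import numpy as np

# Hypothetical per-model quantities at one new input x:
log_evidence = np.array([-3.0, -1.0, -2.0])   # log P(H | X, M_k)
log_prior = np.log(np.full(3, 1.0 / 3.0))     # uniform prior P(M_k)
means = np.array([1.0, 2.0, 1.5])             # h(x | X, H, M_k)

# Posterior model probabilities via Bayes' rule (log-sum-exp for stability):
log_post = log_evidence + log_prior
log_post -= np.max(log_post)
w = np.exp(log_post)
w /= w.sum()

h_bma = w @ means   # mixture (BMA) prediction at x
```

The prediction is pulled toward the model with the largest evidence, but the weaker models still contribute.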

Evidence evaluation
Evidence: P(H | X, M_k) = ∫ P(H | X, β, σ², θ, M_k) π(β, σ², θ | M_k) dβ dσ² dθ
[data likelihood] × [model parameter prior]
For each model, the evidence calculation requires a probabilistic integration over the model's parameter space (the Kriging tunable parameters), which can be numerically cumbersome (it needs sophisticated MCMC samplers).

Evidence evaluation (continued)
Evidence: P(H | X, M_k) = ∫ P(H | X, β, σ², θ, M_k) π(β, σ², θ | M_k) dβ dσ² dθ
The data likelihood has a determined (Gaussian) form, because of Kriging's GP nature. The prior can be selected to yield an analytical expression for the integral, eliminating the numerical integration challenges.

Prior elicitation I [β]
Prior selection: β | σ², θ, M_k ~ N(0, max(n, n_k²) σ² (F^T F)^{-1})
This prior is similar in spirit to the extremely popular benchmark g-prior in the linear regression context. The max(n, n_k²) component adapts to both the data size n and the basis function size n_k. The (F^T F)^{-1} component corresponds to Fisher's information matrix of the basis function values at the training data (F).

Prior elicitation II [σ² and θ]
Prior selection: [(σ²)*, θ*] = argmax_{σ², θ} P(H | X, M_k, σ², θ)
[Figure: illustrative contour plots of the conditional likelihood over (σ², θ)]
Empirical Bayes (EB) approach: the maximizer of the conditional evidence is chosen as the prior; this leads to a closed-form expression for the evidence:
log P(H | X, M_k) = log P(H | X, M_k, (σ²)*, θ*)
= −(1/(2σ²)) H^T R_θ^{-1} H + (1/2) H^T C H − (1/2) log|σ² R_θ| − (1/2) log|c σ² (F^T F)^{-1}| − (1/2) log|A| − (n/2) log 2π,  evaluated at ((σ²)*, θ*)

Prior elicitation II [σ² and θ] (continued)
Prior selection: [(σ²)*, θ*] = argmax_{σ², θ} P(H | X, M_k, σ², θ)
Empirical Bayes (EB) approach: the maximizer of the conditional evidence is chosen as the prior, leading to a closed-form expression for the evidence.
But: the maximization is non-trivial (a non-convex optimization problem).
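The slides do not state how this non-convex maximization is performed. As a generic illustration only, a derivative-free coarse grid search over (σ², θ) on a toy stand-in objective (both the objective and the grids are invented; the real objective is the Kriging conditional evidence):

```python
import numpy as np

def neg_log_ev(s, t):
    """Toy stand-in for the negative conditional log-evidence over (s, t),
    playing the roles of (sigma^2, theta); nonnegative, minimized at (1, 0.5)."""
    return (np.sin(3.0 * t) + 1.5) * (s - 1.0) ** 2 + (t - 0.5) ** 2

# Coarse grid search: simple and robust when the landscape may be multi-modal.
s_grid = np.linspace(0.0, 2.0, 81)
t_grid = np.linspace(-2.0, 2.0, 161)
S, T = np.meshgrid(s_grid, t_grid, indexing="ij")
vals = neg_log_ev(S, T)
i, j = np.unravel_index(np.argmin(vals), vals.shape)
s_star, t_star = s_grid[i], t_grid[j]
```

In practice one would refine the grid optimum with a local optimizer; the point here is only that a single local search from a bad start can miss the EB maximizer.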

Retain Gaussian prediction characteristics
With this prior selection, not only is the evidence analytically tractable, but the predictions also retain their Gaussian nature (analytically tractable, facilitating seamless integration with existing Kriging implementations):
h(x) | X, H, M_k, {β, σ², θ} ~ N(h̃_BMA(x), σ̃²_BMA(x))
h̃_BMA(x | M_k) = f(x)^T β* + r*(x)^T R_θ^{-1} (H − F β*)
σ̃²_BMA(x | M_k) = σ² [1 + u^T (c^{-1} F^T F + F^T R_θ^{-1} F)^{-1} u − r*(x)^T R_θ^{-1} r*(x)]
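Since each retained model contributes a Gaussian predictive N(m_k, v_k) with weight w_k, the BMA predictive is a Gaussian mixture, and its moments follow the standard mixture identities (law of total variance; this combination step is implied by, not written on, the slide). The weights, means, and variances below are invented:

```python
import numpy as np

# Hypothetical per-model Gaussian predictions at one input x:
w = np.array([0.6, 0.4])     # posterior model weights
m = np.array([1.0, 2.0])     # per-model predictive means
v = np.array([0.04, 0.09])   # per-model predictive variances

mean_bma = w @ m                           # mixture mean
var_bma = w @ (v + m ** 2) - mean_bma ** 2 # mixture variance
```

Note the mixture variance (0.3 here) exceeds the weighted average of the per-model variances: disagreement between the model means adds predictive uncertainty.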

Computational implementation I
Repeat for every model M_k: predict under model M_k, h(x | X, H, M_k); then average: h(x | X, H) = Σ_{k=1}^{n_m} h(x | X, H, M_k) P(M_k | X, H).
Inefficient for a large number of models (n_m >> 1).

Efficient averaging of predictions
Utilize Occam's Razor principle, and only consider models whose plausibility is non-negligible [1]:
M̂ = {M_k : P(M_k | X, H) ≥ α max_{M_l} P(M_l | X, H)}
h(x | X, H) = Σ_{M_k ∈ M̂} h(x | X, H, M_k) P(M_k | X, H)
where α controls the degree of restriction: α = 0 includes all models; α = 1 includes only the most plausible one.
[1] Raftery, A. E., Madigan, D., and Hoeting, J. A., 1997, "Bayesian model averaging for linear regression models," Journal of the American Statistical Association, 92(437), pp. 179-191.
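This Occam's-window rule is easy to sketch; the posterior probabilities below are invented, and the renormalization of the surviving weights is the natural reading of the slide:

```python
import numpy as np

def occams_window(post, alpha):
    """Keep models whose posterior probability is within a factor alpha of the
    best model's, then renormalize the surviving weights."""
    post = np.asarray(post, dtype=float)
    keep = post >= alpha * post.max()
    w = np.where(keep, post, 0.0)
    return keep, w / w.sum()

post = [0.50, 0.30, 0.15, 0.05]
keep_all, w_all = occams_window(post, alpha=0.0)  # alpha = 0: all models kept
keep_one, w_one = occams_window(post, alpha=1.0)  # alpha = 1: best model only
keep_mid, w_mid = occams_window(post, alpha=0.5)  # intermediate cut
```

With alpha = 0.5 the two weakest models are dropped and the remaining weights rescale to [0.625, 0.375].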

Computational implementation II
Training data (X, H) → for each model M_k (basis functions M_1, M_2, ..., M_{n_m}): evidence P(H | X, M_k) → posterior probability P(M_k | X, H) → Bayesian model class averaging,
where P(H | X, M_k) = P(H | X, (σ²)*, θ*) and [(σ²)*, θ*] = argmax_{σ², θ} P(H | X, M_k, σ², θ).
Each model's evidence calculation requires an optimization (for the EB parameters): intensive computation!

Model space exploration setting
We assume the pool of basis functions of interest consists of polynomial functions up to some order p (commonly set to p = 2):
f = c Π_{i=1}^{n_x} x_i^{p_i},  with Σ_{i=1}^{n_x} p_i ≤ p
i.e., {1, x_1, ..., x_{n_x}, x_1², x_1 x_2, ..., x_{n_x}²}
A popular setup because of its flexibility and physical interpretability.
The number of potential models is large: for example, for an n_x = 10 dimensional problem, the combinations to explore (for p = 2) number about 4×10^19! An efficient search implementation is necessary.
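The size of this model space follows from a short count: with n_x = 10 and p = 2 there are 10 linear plus 55 quadratic candidate terms, and every subset of those 65 terms is a candidate regression structure, giving 2^65 ≈ 3.7×10^19 models. A sketch of that enumeration (the constant term is treated as always present):

```python
from itertools import combinations_with_replacement

def monomials(n_x, p):
    """All monomials of total degree 1..p in n_x inputs, as exponent tuples;
    e.g. (1, 1) stands for x_1 * x_2 and (0, 2) for x_2 ** 2."""
    terms = set()
    for deg in range(1, p + 1):
        for combo in combinations_with_replacement(range(n_x), deg):
            expo = [0] * n_x
            for i in combo:
                expo[i] += 1
            terms.add(tuple(expo))
    return sorted(terms)

n_terms = len(monomials(10, 2))  # candidate non-constant terms for n_x=10, p=2
n_models = 2 ** n_terms          # every subset is a candidate model
```

For n_x = 2, p = 2 the pool is just {x_1, x_2, x_1², x_1 x_2, x_2²}, which matches the slide's listing.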

Efficient model search principles
Occam's Razor: a model is ignored if its posterior probability is less than that of its simpler variants (those whose basis functions are subsets of its own) [1].
Effect heredity: a basis function is allowed to appear only if all its parents (the corresponding lower-order terms) are already included [2]. For example, the quadratic term (x)² is allowed only if the model already includes the corresponding constant and linear terms [1, x].
Forward search proceeds from simpler to more complex models.
[1] Raftery, A. E., Madigan, D., and Hoeting, J. A., 1997, "Bayesian model averaging for linear regression models," Journal of the American Statistical Association, 92(437), pp. 179-191.
[2] Peixoto, J. L., 1987, "Hierarchical variable selection in polynomial regression models," The American Statistician, 41(4), pp. 311-313.
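The effect-heredity rule can be sketched on monomials represented as exponent tuples (as in the enumeration setup); `parents` and `heredity_ok` are illustrative helper names, not from the source:

```python
def parents(expo):
    """Immediate lower-order parents of a monomial exponent tuple:
    each obtained by reducing one positive exponent by one."""
    out = []
    for i, e in enumerate(expo):
        if e > 0:
            out.append(tuple(e - 1 if j == i else f for j, f in enumerate(expo)))
    return out

def heredity_ok(candidate, model):
    """A term may enter only if all of its parents are already in the model."""
    return all(p in model for p in parents(candidate))

# (x_2)^2 needs the linear term x_2 already present; x_1 x_2 needs both linears.
model = {(0, 0), (0, 1)}            # {1, x_2}
ok = heredity_ok((0, 2), model)     # parent of x_2^2 is x_2: allowed
bad = heredity_ok((1, 1), model)    # x_1 x_2 also needs x_1: blocked
```

This matches the slide's example: the quadratic term is admissible only once its constant and linear ancestors are in the model.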

Forward and stepwise search I
1. Start with ordinary Kriging (OK). Calculate its EB parameters through Eq. (16) and its marginal likelihood P(M_0 | X, H). Set the active set A = {M_0} (1st iteration, k = 1).
2. Select model M_k to correspond to the k-th model in A and obtain all allowable basis function additions (functions whose parents are included in M_k).
3. For each allowable basis function addition f^{+1}: construct the model M_k^{+1} with the one added basis function, and calculate its posterior probability P(M_k^{+1} | X, H). Elimination check: if P(M_k^{+1} | X, H) < P(M_k | X, H), eliminate M_k^{+1}; otherwise retain it and enrich the active set, A ← A ∪ {M_k^{+1}}. This explores whether a more complex model satisfies the Occam and heredity principles; if yes, the search continues around it (considering even more complex models).
4. Set k = k + 1 and repeat from step 2 while a (k+1)-th model exists in A.
5. Termination: construct the final set of models M̂ by screening across A: M̂ = {M : P(H | X, M) P(M) ≥ α max_{M_l} P(H | X, M_l) P(M_l)}. The weight for each model is w_k = P(M_k | X, H) / Σ_{l=1}^{n̂_m} P(M_l | X, H).
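The loop above can be sketched as a simplified skeleton: `score` stands in for the model posterior probability, the heredity predicate and toy scores are invented, and the real method additionally uses the screening weights and (in search II) intermediate elimination:

```python
def forward_search(all_terms, score, heredity_ok):
    """Greedy forward search: start from the empty model (ordinary Kriging)
    and add any allowable term that improves the score, exploring around
    every retained model."""
    active = [frozenset()]                 # active set, seeded with M_0
    best = {frozenset(): score(frozenset())}
    k = 0
    while k < len(active):                 # is there a (k+1)-th model?
        model = active[k]
        for term in sorted(all_terms - model):
            if not heredity_ok(term, model):   # effect-heredity filter
                continue
            cand = model | {term}
            if cand in best:                   # already visited
                continue
            s = score(cand)
            if s > best[model]:                # Occam-style elimination check
                best[cand] = s
                active.append(cand)            # enrich the active set
        k += 1
    return max(best, key=best.get)

# Toy run: 'b' may only enter after 'a' (heredity); the richest model scores best.
scores = {frozenset(): 0, frozenset({'a'}): 1,
          frozenset({'b'}): 2, frozenset({'a', 'b'}): 3}
best_model = forward_search({'a', 'b'}, scores.get,
                            lambda t, m: t != 'b' or 'a' in m)
```

The search reaches {a, b} only through {a}, illustrating how heredity prunes paths while the forward sweep still finds the best admissible model.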

Intermediate elimination
We can obtain, at no computational cost, a guess for a new (more complex) model M_k^{+1}'s posterior probability by using the known EB parameters of its simpler parent M_k:
P(M_k^{+1} | X, H) ≈ P(M_k^{+1} | X, H, (σ²)_k*, θ_k*)   [complex model evaluated at the EB parameters of its simpler parent]
Intermediate elimination: ignore the complex model, without the need to calculate its own EB parameters, if its approximated posterior probability is very unpromising (significantly below the simpler model's): P(M_k^{+1} | X, H, (σ²)_k*, θ_k*) << P(M_k | X, H, (σ²)_k*, θ_k*).
Only if the model passes intermediate elimination do we identify its actual EB parameters, to apply the actual Occam's razor check.

Forward and stepwise search II
As in search I, but with intermediate elimination inserted before the actual check:
1. Start with ordinary Kriging (OK). Calculate its EB parameters through Eq. (16) and its marginal likelihood P(M_0 | X, H). Set the active set A = {M_0} (1st iteration, k = 1).
2. Select model M_k as the k-th model in A and obtain all allowable basis function additions (functions whose parents are included in M_k).
3. For each allowable addition f^{+1}: construct M_k^{+1} with the one added basis function and approximate its posterior probability P(M_k^{+1} | X, H) using the parent's EB parameters. Intermediate elimination check: if the approximation falls far below P(M_k | X, H), eliminate; otherwise retain M_k^{+1}.
4. Collect all retained models. Actual elimination check (exploration): for each retained M_k^{+1}, obtain its EB parameters through Eq. (16) and its actual posterior probability P(M_k^{+1} | X, H); if P(M_k^{+1} | X, H) < P(M_k | X, H), eliminate; otherwise enrich the active set, A ← A ∪ {M_k^{+1}}.
5. Set k = k + 1 and repeat from step 2 while a (k+1)-th model exists in A.
6. Termination: construct the final set of models M̂ by screening across A, with weights w_k = P(M_k | X, H) / Σ_l P(M_l | X, H).

Comparison methods
Blind Kriging (BK): forward-selects the next best basis function to add until the cross-validation error stops decreasing [3].
Dynamic Kriging (DK): formulates the basis function selection as an integer programming problem, and seeks the combination that minimizes the process variance with a genetic algorithm [4].
[3] Joseph, V. R., Hung, Y., and Sudjianto, A., 2008, "Blind kriging: A new method for developing metamodels," Journal of Mechanical Design, 130(3), p. 031102.
[4] Zhao, L., Choi, K., and Lee, I., 2011, "Metamodeling method using dynamic kriging for design optimization," AIAA Journal, 49(9), pp. 2034-2046.

Comparison methods (continued)
Blind Kriging (BK): forward-selects the next best basis function to add until the cross-validation error stops decreasing [3].
Dynamic Kriging (DK): formulates the basis function selection as an integer programming problem, and seeks the combination that minimizes the process variance with a genetic algorithm [4].
BMAK: the proposed approach, with the efficient model averaging and search strategies.
Full-BMAK: the proposed approach without the efficient strategies (only applicable to low-dimensional problems).
BMSK: same as BMAK, but using only the most plausible model to predict (i.e., the model selection variant).

Benchmark problems I
Branin: two-dimensional problem with moderate nonlinearity.
Griewank: two-dimensional problem with high nonlinearity.

Benchmark problems II
Borehole: engineering problem modelling the water flow rate through a borehole (Harper et al. 1983).
Half-Car: realistic engineering problem calculating the road-holding statistics of a half-car nonlinear model riding on a rough road (Zhang et al., 2016).

Problem | Dimension | Number of experiments
Branin | 2 | 14
Griewank | 2 | 30
Borehole | 8 | 24
Half-Car | 19 | 60

Prediction correlation coefficient

Method | Branin | Griewank | Borehole | Half-Car
Full BMAK | 0.95 | 0.946 | N/A | N/A
BMAK | 0.915 | 0.949 | 0.996 | 0.917
BMSK | 0.93 | 0.945 | 0.995 | 0.890
BK | 0.915 | 0.917 | 0.983 | 0.86
DK | 0.897 | 0.93 | 0.914 | 0.69

Reported: the average prediction correlation over 50 independent runs. Higher value = better prediction.

More comprehensive performance assessment
Half-car example.
[Figure: correlation coefficient, RMS error, and prediction interval coverage across methods; arrows mark the direction of better performance for each metric]

Performance for UQ predictions

Method | Borehole | Half-Car
BMAK | 0.0144 | 0.0337
BMSK | 0.0154 | 0.0408
BK | 0.083 | 0.0571
DK | 0.0818 | 0.074

Reported: the Kolmogorov-Smirnov measure between the true and predicted CDFs over 50 independent runs. Smaller discrepancy = better performance.
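The Kolmogorov-Smirnov measure used above is the largest gap between two empirical CDFs; a self-contained sketch (the two-sample form is assumed here, since the slide does not spell out the estimator):

```python
import numpy as np

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute gap
    between the empirical CDFs of the two samples."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

d_same = ks_distance([1, 2, 3, 4], [1, 2, 3, 4])   # identical samples
d_diff = ks_distance([0, 0, 0, 0], [1, 1, 1, 1])   # disjoint supports
```

The statistic ranges from 0 (identical distributions) to 1 (fully separated), which is why smaller values in the table indicate better UQ predictions.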

Computational cost

Comp. cost | Branin | Griewank | Borehole | Half-Car
Tuning (EB optimization) | 3.17 | 3 | 0.5 | 7.37
Predictions (model averaging) | 3.14 | 3 | 16.7 | 7.5
Model space size | 63 | 63 | 3.5x10^13 | 1.6x10^63

Conventional Kriging cost = 1 for both tuning and predictions. BMAK's computational cost is at most one order of magnitude greater than that of conventional Kriging with a deterministic regression structure.

Conclusion
A Bayesian Model Averaging formulation for Kriging (BMAK) is proposed to systematically treat regression structure uncertainties.
A data-driven prior is proposed, combining the benchmark g-prior and the empirical Bayes prior; this selection lends itself to analytical tractability of the problem.
An efficient model averaging and search strategy is proposed for problems with a large model space.
Results demonstrate the advantages of BMAK in both analytical and engineering problems.

Thank You!