HIPAD LAB: HIGH PERFORMANCE SYSTEMS LABORATORY DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING AND EARTH SCIENCES Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis
Why use metamodeling techniques? High-fidelity numerical simulators are increasingly used in a variety of engineering applications. Since they can be computationally expensive, metamodeling techniques are adopted in many implementations (for example in a UQ setting) to approximate the simulator with a cheaper-to-run emulator. Popular choices include: Support vector machines; Artificial neural networks; Polynomial Chaos; Kriging (aka Gaussian Process Regression), a data-driven statistical emulator.
Kriging (Gaussian rocess) intro I h( x) f ( x) βz( x) Outut = Trend + GP residual T Select basis functions f and correlation ernel R Tune Kriging arameters based on exeriments (inut X/outut H air) i.e., otimize ( β,, θ ) * * * The regression structure catures deterministic global trend, through basis functions: f [ f1,, f n f ] The residual GP dictates local deviation from trend with some chosen covariance ernel: x x' cov z( x), z( x') R( ) θ
Kriging (Gaussian rocess) intro I h( x) f ( x) βz( x) Outut = Trend + GP residual T Select basis functions f and correlation ernel R Tune Kriging arameters based on exeriments (inut X/outut H air) h 0.4 0.3 h Formulate Kriging redictions (redictive mean and variance) for any new inut x 0. 0.1 0 x 0 - - 0 x 1 x x 1
Kriging (Gaussian process) intro II. Kriging output prediction: h(x) | x, X, H ~ N(μ_h(x), σ_h²(x)). Select basis functions f and correlation kernel R. Tune the Kriging parameters based on experiments (input X / output H pairs). Formulate the Kriging predictions (predictive mean and variance) for any new input x: μ_h(x) = f(x)^T β* + r*(x)^T R^{-1} (H − F β*); σ_h²(x) = σ*² [1 + u^T (F^T R^{-1} F)^{-1} u − r*(x)^T R^{-1} r*(x)], where u = F^T R^{-1} r*(x) − f(x).
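The predictive equations above can be sketched directly. This is a minimal, illustrative implementation (the function names, the Gaussian kernel, the constant-trend example, and the small diagonal nugget are choices made here for the sketch, not prescribed by the slides):

```python
import numpy as np

def kriging_predict(x_new, X, H, basis, corr, theta):
    """Universal-kriging predictive mean/variance at x_new.
    basis: maps a point to its basis-function vector f(x)
    corr:  correlation kernel R(x, x'; theta)"""
    n = X.shape[0]
    F = np.array([basis(x) for x in X])                 # n x n_f basis matrix
    R = np.array([[corr(a, b, theta) for b in X] for a in X]) \
        + 1e-10 * np.eye(n)                             # tiny nugget for stability
    Ri = np.linalg.inv(R)
    # Generalized least-squares estimate of the trend coefficients beta*
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ H)
    resid = H - F @ beta
    sigma2 = resid @ Ri @ resid / n                     # process-variance estimate
    f = basis(x_new)
    r = np.array([corr(x_new, a, theta) for a in X])
    mean = f @ beta + r @ Ri @ resid                    # predictive mean
    u = F.T @ Ri @ r - f
    var = sigma2 * (1 + u @ np.linalg.solve(F.T @ Ri @ F, u) - r @ Ri @ r)
    return mean, max(var, 0.0)

# Example: 1-D data, constant trend f(x) = {1}, Gaussian correlation kernel
gauss = lambda a, b, th: np.exp(-th * np.sum((a - b) ** 2))
X = np.array([[0.0], [0.5], [1.0]])
H = np.array([1.0, 2.0, 1.5])
m, v = kriging_predict(np.array([0.5]), X, H, lambda x: np.array([1.0]), gauss, 5.0)
# at a training point the mean interpolates H and the variance collapses to ~0
```

Note the interpolation property: at an experiment location the predictor reproduces the observed output and the predictive variance vanishes.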
Is regression structure important? I. [Figure: contours of the real model and the experiments, alongside Kriging predictions under four different regression structures: constant trend f(x) = {1}, and quadratic, cubic, and quartic polynomial bases.]
Is regression structure important? II. Contour plots of absolute errors (pinker indicates higher error). [Figure: error contours for the same four regression structures: constant, quadratic, cubic, and quartic.]
Bayesian inference for regression structure selection. Question: how can we select a regression structure, conditioned on all available data? Bayesian inference. Terminology: model M_k for Kriging with basis function vector f_k = {f_1, …, f_k}. Bayes' rule: P(M_k | X, H) ∝ P(H | X, M_k) P(M_k), i.e., posterior probability of each model ∝ model evidence × prior on each model, given the data (X, H). [Figure: bar charts of prior versus posterior model probabilities.]
Bayesian model averaging prediction. Repeat for every model: predict under model M_k, obtaining the Kriging prediction h(x) | x, X, H, M_k. Then average: h(x) | x, X, H = Σ_{k=1}^{n_m} P(M_k | X, H) · [h(x) | x, X, H, M_k]. Mixture weights = posterior probabilities, P(M_k | X, H) ∝ P(H | X, M_k) P(M_k).
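Since each per-model prediction is Gaussian, the BMA prediction is a Gaussian mixture whose first two moments follow from the law of total variance. A small sketch (function name and the two-model example are illustrative):

```python
import numpy as np

def bma_mixture(means, variances, weights):
    """Combine per-model Gaussian predictions into BMA mixture moments.
    means[k], variances[k]: Kriging predictive moments under model M_k
    weights[k]: posterior model probability P(M_k | X, H)"""
    w = np.asarray(weights, float)
    w = w / w.sum()                       # weights must sum to one
    mu = np.asarray(means, float)
    var = np.asarray(variances, float)
    mix_mean = w @ mu
    # total variance = within-model variance + between-model spread
    mix_var = w @ var + w @ (mu - mix_mean) ** 2
    return mix_mean, mix_var

# Two models that agree on the mean but differ in confidence:
m, v = bma_mixture([1.0, 1.0], [0.1, 0.3], [0.75, 0.25])
# mixture mean 1.0, mixture variance 0.75*0.1 + 0.25*0.3 = 0.15
```

When the per-model means disagree, the between-model spread term inflates the mixture variance, which is exactly how BMA accounts for regression-structure uncertainty.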
Evidence evaluation. Evidence: P(H | X, M) = ∫ P(H | X, β, σ², θ, M) π(β, σ², θ | M) dβ dσ² dθ [data likelihood × model parameter prior]. For each model, the evidence calculation requires a probabilistic integration over the model's parameter space (the Kriging tunable parameters), which can be numerically cumbersome (it needs sophisticated MCMC samplers).
Evidence evaluation. Evidence: P(H | X, M) = ∫ P(H | X, β, σ², θ, M) π(β, σ², θ | M) dβ dσ² dθ. The data likelihood has a determined form, because of Kriging's GP nature. The model parameter prior can be selected to yield an analytical expression for the integral, eliminating the numerical integration challenges.
Prior elicitation I [β]. Prior selection: β | σ², θ ~ N(0, max(n, n_f²) σ² (F^T F)^{-1}). This prior is similar in spirit to the extremely popular benchmark g-prior used in the linear regression context. The scale factor max(n, n_f²) is adaptive to both the data size n and the basis-function size n_f. The matrix component corresponds to Fisher's information matrix of the basis-function values at the training data (F).
Prior elicitation II [σ² and θ]. Prior selection: [σ*², θ*] = argmax_{σ², θ} P(H | X, M, σ², θ). Empirical Bayes (EB) approach: the maximizer of the conditional evidence is chosen as the prior; this leads to a closed-form expression for the evidence: log P(H | X, M) = log P(H | X, M, σ*², θ*) = −(1/(2σ*²)) (H^T R_θ^{-1} H − H^T C H) − (1/2) log|σ*² R_θ| − (1/2) log|c σ*² (F^T F)^{-1}| + (1/2) log|A| − (n/2) log(2π). [Figure: illustrative contour plots of the conditional likelihood over (σ², θ).]
Prior elicitation II [σ² and θ]. Prior selection: [σ*², θ*] = argmax_{σ², θ} P(H | X, M, σ², θ). Empirical Bayes (EB) approach: the maximizer of the conditional evidence is chosen as the prior, leading to a closed-form expression for the evidence. But the maximization is non-trivial (a non-convex optimization problem). [Figure: illustrative contour plots of the conditional likelihood over (σ², θ).]
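A common workaround for such a non-convex surface is multi-start local optimization. The sketch below shows the pattern on a stand-in multimodal objective; in the actual method the objective would be the conditional log-evidence log P(H | X, M, σ², θ), and the helper names and the demo function are assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

def multistart_maximize(log_evidence, bounds, n_starts=10, seed=0):
    """Multi-start local search for EB parameters such as (sigma^2*, theta*).
    log_evidence: callable returning the conditional log-evidence
    bounds: list of (lo, hi) per parameter"""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for _ in range(n_starts):
        # random restart inside the bounds, then local refinement
        x0 = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        res = minimize(lambda x: -log_evidence(x), x0,
                       bounds=bounds, method="L-BFGS-B")
        if -res.fun > best_val:
            best_val, best_x = -res.fun, res.x
    return best_x, best_val

# Stand-in multimodal objective just to exercise the search
f = lambda x: -np.sin(5 * x[0]) - (x[0] - 0.5) ** 2
x_star, v = multistart_maximize(f, [(0.0, 3.0)], n_starts=20)
```

With enough restarts the search reliably escapes the shallow local optima that a single gradient-based run can get trapped in.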
Retain Gaussian prediction characteristics. Not only is the evidence analytically tractable, but the predictions also retain their Gaussian nature (analytically tractable, facilitating seamless integration with existing Kriging implementations): h(x) | X, H, M, {β*, σ*², θ*} ~ N(μ_h^BMA(x), (σ_h^BMA(x))²), with μ_h^BMA(x | M) = f(x)^T β* + r*(x)^T R_θ^{-1} (H − F β*) and (σ_h^BMA(x | M))² = σ*² [1 + u^T (c^{-1} F^T F + F^T R_θ^{-1} F)^{-1} u − r*(x)^T R_θ^{-1} r*(x)].
Computational implementation I. Repeat for every model: predict under model M_k to obtain h(x) | x, X, H, M_k, then average with the weights P(M_k | X, H), k = 1, …, n_m. This is inefficient for a large number of models (n_m >> 1).
Efficient averaging of predictions. Utilize Occam's razor principle, and only consider models whose plausibility is non-negligible [1]: M̂ = {M_k : P(M_k | X, H) ≥ α max_l P(M_l | X, H)}, and predict with h(x) | x, X, H = Σ_{M_k ∈ M̂} [h(x) | x, X, H, M_k] P(M_k | X, H), with the weights renormalized over M̂. Here α controls the degree of restriction: α = 0 includes all models; α = 1 includes only the most plausible one. [1] Raftery, A. E., Madigan, D., and Hoeting, J. A., 1997, "Bayesian model averaging for linear regression models," Journal of the American Statistical Association, 92(437), pp. 179-191.
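This "Occam's window" screening is a one-liner in practice. A minimal sketch (function name and the toy posterior values are illustrative):

```python
def occams_window(posteriors, alpha):
    """Keep only models whose posterior probability is within a factor
    alpha of the best one, then renormalize the mixture weights.
    posteriors: dict model -> P(M | X, H) (possibly unnormalized)
    alpha=0 keeps every model; alpha=1 keeps only the most plausible."""
    best = max(posteriors.values())
    kept = {m: p for m, p in posteriors.items() if p >= alpha * best}
    total = sum(kept.values())
    return {m: p / total for m, p in kept.items()}

# Three models; with alpha = 0.1 the weakest model is screened out
w = occams_window({"M1": 0.60, "M2": 0.35, "M3": 0.005}, alpha=0.1)
# kept: M1 and M2, with weights 0.60/0.95 and 0.35/0.95
```

The renormalization step is what keeps the retained weights a valid probability distribution after screening.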
Computational implementation II. From the training data (X, H), for each model (basis functions) M_k: compute its evidence P(H | X, M_k) ≈ P(H | X, M_k, σ*², θ*), where [σ*², θ*] = argmax_{σ², θ} P(H | X, M_k, σ², θ); then its posterior probability P(M_k | X, H); and finally perform Bayesian model class averaging. Each model's evidence calculation requires an optimization (for the EB parameters): intensive computation!
Model space exploration setting. We assume the pool of basis functions of interest are polynomial functions up to some order p (set to 2 commonly): f = c ∏_{i=1}^{n_x} x_i^{p_i}, with Σ_{i=1}^{n_x} p_i ≤ p, i.e., {1, x_1, …, x_{n_x}, x_1², …, x_{n_x}², x_1 x_2, …, x_{n_x−1} x_{n_x}}. A popular setup because of its flexibility and physical interpretability. The number of potential models is large: for example, for an n_x = 10 dimensional problem, the combinations to explore (for p = 2) number about 4×10^19! An efficient search implementation is necessary.
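The candidate pool is just the set of monomial multi-indices of total degree at most p, which is easy to enumerate (function name is illustrative):

```python
from itertools import product

def monomials(n_x, p):
    """All multi-indices (p_1, ..., p_{n_x}) with total degree <= p,
    i.e. the candidate basis functions f(x) = prod x_i**p_i."""
    return [idx for idx in product(range(p + 1), repeat=n_x)
            if sum(idx) <= p]

# For n_x = 2, p = 2: {1, x1, x2, x1^2, x2^2, x1*x2} -> 6 monomials
basis = monomials(2, 2)
```

There are C(n_x + p, p) such monomials, and every subset of them is a candidate regression structure, so the model space grows as 2 to the power of the pool size, e.g. 45 monomials and roughly 2^45 ≈ 3.5×10^13 models for an 8-dimensional problem with p = 2 (the model-space scale quoted later for the Borehole problem).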
Efficient model search principles. Occam's razor: a model is ignored if its posterior probability is less than those of its simpler variants (models whose basis functions are subsets of its own) [1]. Effect heredity: a basis function is allowed to appear only if all its parents (the corresponding lower-order terms) are already included [2]. For example, the quadratic term x² is allowed only if the model already includes the corresponding constant and linear terms {1, x}. Forward search proceeds from simpler to more complex models. [1] Raftery, A. E., Madigan, D., and Hoeting, J. A., 1997, "Bayesian model averaging for linear regression models," Journal of the American Statistical Association, 92(437), pp. 179-191. [2] Peixoto, J. L., 1987, "Hierarchical variable selection in polynomial regression models," The American Statistician, 41(4), pp. 311-313.
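With monomials encoded as exponent tuples, the effect-heredity check reduces to a subset test on "parent" multi-indices (function names are illustrative):

```python
def parents(term):
    """Immediate lower-order parents of a monomial multi-index,
    e.g. (1, 2) -> {(0, 2), (1, 1)}."""
    out = set()
    for i, p in enumerate(term):
        if p > 0:
            # decrement one exponent to get a lower-order parent
            out.add(term[:i] + (p - 1,) + term[i + 1:])
    return out

def satisfies_heredity(model, term):
    """A term may enter only if all its parents are already in the model."""
    return parents(term) <= set(model)

# x2^2 needs its linear parent x2 (exponent tuple (0, 1)) first
model = [(0, 0), (0, 1)]                  # {1, x2}
ok = satisfies_heredity(model, (0, 2))    # parent (0, 1) present
bad = satisfies_heredity(model, (1, 1))   # parent (1, 0) missing
```

This filter is what keeps the forward search from ever proposing a quadratic or interaction term before its lower-order ancestors are in the model.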
Forward and stepwise search I. 1) Start with ordinary Kriging (OK): calculate its EB parameters through Eq. (16) and its marginal likelihood P(M_0 | X, H); set the active set S = {M_0} (1st iteration, k = 1). 2) Select model M_k as the k-th model in S and obtain all allowable basis-function additions (functions whose parents are included in M_k). 3) For each allowable basis-function addition f_{+1}: construct the model M_{k+1} with the one added basis function and calculate its posterior probability P(M_{k+1} | X, H). Elimination check: if P(M_{k+1} | X, H) < P(M_k | X, H), eliminate M_{k+1}; otherwise retain M_{k+1} and enrich the active set S with it (i.e., explore whether the more complex model satisfies the Occam and heredity principles; if yes, explore around it, considering even more complex models). 4) Set k = k + 1; if there is a (k+1)-th model in S, return to step 2; otherwise terminate. 5) Termination: construct the final set of models by screening across S, M̂ = {M_k : P(H | X, M_k) P(M_k) ≥ α max_l P(H | X, M_l) P(M_l)}, and assign each model the weight w_k = P(M_k | X, H) / Σ_{M_l ∈ M̂} P(M_l | X, H).
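The basic loop of this flowchart (without the intermediate elimination refinement introduced later) can be sketched as follows; `log_posterior` and the toy scoring function are hypothetical stand-ins for the actual evidence computation, and heredity filtering of candidate terms is omitted for brevity:

```python
import math

def forward_search(log_posterior, candidate_terms, alpha=0.1):
    """Forward stepwise exploration of the model space.
    log_posterior(model): log P(M | X, H) up to a constant."""
    start = ()
    active = [start]
    scores = {start: log_posterior(start)}
    k = 0
    while k < len(active):                  # stop when no (k+1)-th model exists
        base = active[k]
        for term in candidate_terms:
            if term in base:
                continue
            cand = tuple(sorted(base + (term,)))
            if cand in scores:
                continue
            s = log_posterior(cand)
            if s >= scores[base]:           # Occam check against the parent
                scores[cand] = s
                active.append(cand)         # explore around it later
        k += 1
    # final Occam's-window screening and weight normalization
    best = max(scores.values())
    kept = {m: math.exp(s - best) for m, s in scores.items()
            if s - best >= math.log(alpha)}
    z = sum(kept.values())
    return {m: w / z for m, w in kept.items()}

# Toy score: term "a" helps a lot, every extra term pays a complexity penalty
lp = lambda m: 2.0 * ("a" in m) - 0.5 * len(m)
weights = forward_search(lp, ["a", "b"], alpha=0.1)
# the model ("a",) dominates; ("b",) is eliminated during the search
```

Working in log-probabilities, as here, avoids underflow when the evidences span many orders of magnitude.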
Intermediate elimination. We can obtain (at no computational cost) a guess for a new (more complex) model M_{k+1}'s posterior probability using the known EB parameters of its simpler parent M_k: P(M_{k+1} | X, H) ≈ P(M_{k+1} | X, H, σ_k*², θ_k*), i.e., the complex model evaluated at the EB parameters of the parent, simpler model. Intermediate elimination: ignore the complex model, without ever calculating its own EB parameters, if this approximated posterior probability is very unpromising (significantly below the simpler model's). Only if a model passes intermediate elimination do we identify its actual EB parameters and examine the actual Occam's razor check.
Forward and stepwise search II. 1) Start with ordinary Kriging (OK): calculate its EB parameters through Eq. (16) and marginal likelihood P(M_0 | X, H); set the active set S = {M_0} (1st iteration, k = 1). 2) Select model M_k as the k-th model in S and obtain all allowable basis-function additions (functions whose parents are included in M_k). 3) For each allowable basis-function addition f_{+1}: construct M_{k+1} with the one added basis function and approximate its posterior probability P(M_{k+1} | X, H) using the parent's EB parameters. Intermediate elimination check: if the approximated P(M_{k+1} | X, H) is significantly below P(M_k | X, H), eliminate M_{k+1}; otherwise retain it. 4) Collect all retained models; for each retained M_{k+1}, obtain its EB parameters through Eq. (16) and its actual posterior probability P(M_{k+1} | X, H) (exploration). Actual elimination check: if P(M_{k+1} | X, H) < P(M_k | X, H), eliminate M_{k+1}; otherwise enrich the active set S with it. 5) Set k = k + 1; if there is a (k+1)-th model in S, return to step 2; otherwise terminate. 6) Termination: construct the final set M̂ by screening across S, M̂ = {M_k : P(H | X, M_k) P(M_k) ≥ α max_l P(H | X, M_l) P(M_l)}, with weights w_k = P(M_k | X, H) / Σ_{M_l ∈ M̂} P(M_l | X, H).
Comparison methods. Blind Kriging (BK): forward-selects the next best basis function to add, as long as the cross-validation error keeps decreasing [3]. Dynamic Kriging (DK): formulates the basis-function selection as an integer programming problem, and seeks the combination that minimizes the process variance with a Genetic Algorithm [4]. [3] Joseph, V. R., Hung, Y., and Sudjianto, A., 2008, "Blind kriging: A new method for developing metamodels," Journal of Mechanical Design, 130(3), p. 031102. [4] Zhao, L., Choi, K., and Lee, I., 2011, "Metamodeling method using dynamic kriging for design optimization," AIAA Journal, 49(9), pp. 2034-2046.
Comparison methods (continued). BMAK: the proposed approach with the efficient model averaging and search strategies. Full-BMAK: the proposed approach without the efficient strategies (only applicable to low-dimensional problems). BMSK: same as BMAK, but using only the most plausible model to predict (i.e., the model selection variant).
Benchmark problems I. Branin: two-dimensional problem with moderate nonlinearity. Griewank: two-dimensional problem with high nonlinearity.
Benchmark problems II. Borehole: engineering problem modelling the water flow rate through a borehole (Harper et al. 1983). Half-Car: realistic engineering problem calculating the road-holding statistics of a half-car nonlinear model riding on a rough road (Zhang et al., 2016).

Problem    Dimension   Number of experiments
Branin     2           14
Griewank   2           30
Borehole   8           24
Half-Car   19          60
Prediction correlation coefficient

Method      Branin   Griewank   Borehole   Half-Car
Full-BMAK   0.95     0.946      N/A        N/A
BMAK        0.915    0.949      0.996      0.917
BMSK        0.93     0.945      0.995      0.890
BK          0.915    0.917      0.983      0.86
DK          0.897    0.93       0.914      0.69

Reported: the average prediction correlation over 50 independent runs. Higher value = better prediction.
More comprehensive performance assessment: Half-Car example. [Figure: panels comparing the methods on correlation coefficient, RMS error, and prediction interval coverage, each annotated with the direction of better performance.]
Performance for UQ predictions

Method   Borehole   Half-Car
BMAK     0.0144     0.0337
BMSK     0.0154     0.0408
BK       0.083      0.0571
DK       0.0818     0.074

Reported: the Kolmogorov-Smirnov distance between the true and predicted CDFs over 50 independent runs. Smaller discrepancy = better performance.
Computational cost

Comp. cost                      Branin   Griewank   Borehole   Half-Car
Tuning (EB optimization)        3.17     3          0.5        7.37
Predictions (model averaging)   3.14     3          16.7       7.5
Model space size                63       63         3.5x10^13  1.6x10^63

(Conventional Kriging cost = 1 for both tuning and predictions.) BMAK's computational cost is at most one order of magnitude greater than that of conventional Kriging with a deterministic regression structure.
Conclusion. A Bayesian Model Averaging formulation for Kriging (BMAK) is proposed to systematically treat regression-structure uncertainties. A data-driven prior is proposed, combining the benchmark g-prior and an Empirical Bayes prior; this selection lends analytical tractability to the problem. Efficient model averaging and search strategies are proposed for problems with a large model space. Results demonstrate the advantage of BMAK in both analytical and engineering problems.
Thank You!