Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics

Jerome Sacks and William J. Welch
National Institute of Statistical Sciences and University of British Columbia
Adapted from materials prepared by Jerry Sacks and Will Welch for various short courses.
Acadia/SFU/UBC Course on Dynamic Computer Experiments, September to December 2014
J. Sacks and W.J. Welch (NISS & UBC), Module 3: Estimation and Uncertainty, Computer Experiments 2014

Outline of Topics
1. Estimating the Parameters of the GP Model
2. Case Study: G-Protein Computer Experiment
3. Measuring Prediction Accuracy
4. GP Diagnostics
5. Summary
6. Appendix

Estimating the Parameters of the GP Model: Parameters of the Gaussian Process (GP) Model
Recall from Module 2 that the Gaussian process prior for $y(x) = y(x_1, \ldots, x_d)$ has hyper-parameters:
- mean, $\mu$
- variance, $\sigma^2$
- correlation parameters, e.g., $\theta_1, \ldots, \theta_d$ and $p_1, \ldots, p_d$ for the power-exponential correlation function,
$$R(x, x') = \prod_{j=1}^{d} \exp\left(-\theta_j |x_j - x'_j|^{p_j}\right)$$
Their values will be chosen to be consistent with the computer-model runs.
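
As a minimal sketch of the power-exponential correlation function above, the following NumPy code builds the correlation matrix for a small set of input points. The function name `power_exp_corr` and the input values are illustrative assumptions, not part of the course materials.

```python
import numpy as np

def power_exp_corr(X1, X2, theta, p):
    """Power-exponential correlation between rows of X1 (n1 x d) and X2 (n2 x d)."""
    X1 = np.atleast_2d(X1).astype(float)
    X2 = np.atleast_2d(X2).astype(float)
    theta = np.asarray(theta, dtype=float)
    p = np.asarray(p, dtype=float)
    # |x_j - x'_j| for every pair of rows and every dimension j: shape (n1, n2, d)
    diff = np.abs(X1[:, None, :] - X2[None, :, :])
    # A product of exponentials equals the exponential of the sum over j
    return np.exp(-np.sum(theta * diff ** p, axis=2))

# Illustrative inputs only (not the G-protein design)
X = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 1.0]])
R = power_exp_corr(X, X, theta=[1.0, 0.5], p=[2.0, 1.5])
print(np.round(R, 3))
```

Larger values of $\theta_j$ make the correlation decay faster as the points move apart in input $j$.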

Estimating the Parameters of the GP Model: Maximum Likelihood
Recall also that $y(x)$ is assumed to be Gaussian. Hence $y = [y(x^{(1)}), \ldots, y(x^{(n)})]^T$, the data from the computer model, are a sample from a multivariate normal distribution.
The likelihood, $L(y \mid \mu, \sigma^2, \theta_1, \ldots, \theta_d, p_1, \ldots, p_d)$, is
$$\frac{1}{(2\pi\sigma^2)^{n/2} \det^{1/2}(R)} \exp\left(-\frac{1}{2\sigma^2} (y - \mu 1)^T R^{-1} (y - \mu 1)\right)$$
Maximum likelihood estimation (MLE) chooses the hyper-parameters to maximize this.
Or use Bayes' rule to get a posterior distribution for the hyper-parameters and for predictions of $y(x)$ (see Appendix A).
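
In practice it is usually the logarithm of the likelihood that is maximized. Taking logs of the multivariate normal likelihood on this slide gives the following, where $1$ denotes the $n$-vector of ones:

```latex
\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2}\ln\det(R)
        - \frac{1}{2\sigma^2}\,(y - \mu 1)^T R^{-1} (y - \mu 1)
```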

Estimating the Parameters of the GP Model: Maximum Likelihood: Computation
For fixed correlation parameters, the maximizing values of $\mu$ and $\sigma^2$ are available in closed form (here $1$ is the $n$-vector of ones):
$$\hat{\mu} = \frac{1^T R^{-1} y}{1^T R^{-1} 1}$$
and
$$\hat{\sigma}^2 = \frac{1}{n} (y - \hat{\mu} 1)^T R^{-1} (y - \hat{\mu} 1)$$
The likelihood function (with $\hat{\mu}$ and $\hat{\sigma}^2$ substituted) has to be numerically maximized with respect to the correlation parameters.
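
The computation can be sketched as follows, assuming NumPy. The function `profile_neg_log_lik` is a hypothetical helper that returns the closed-form estimates and the negative log likelihood for a given correlation matrix. The coarse grid over a single $\theta$ stands in for a proper numerical optimizer, and the toy data, the fixed $p = 2$, and the small diagonal jitter (added only for numerical stability) are assumptions made for illustration.

```python
import numpy as np

def profile_neg_log_lik(R, y):
    """Return (mu_hat, sigma2_hat, -log L) for a fixed correlation matrix R."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ones = np.ones(n)
    L = np.linalg.cholesky(R)  # R = L L^T
    solve = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))  # R^{-1} b
    mu_hat = (ones @ solve(y)) / (ones @ solve(ones))
    resid = y - mu_hat * ones
    sigma2_hat = (resid @ solve(resid)) / n
    log_det_R = 2.0 * np.sum(np.log(np.diag(L)))
    # With mu_hat and sigma2_hat substituted, the quadratic form is n * sigma2_hat,
    # so the exponent contributes n/2 to the negative log likelihood
    neg_log_lik = 0.5 * (n * np.log(2 * np.pi * sigma2_hat) + log_det_R + n)
    return mu_hat, sigma2_hat, neg_log_lik

# Toy one-dimensional data (illustrative only)
x = np.linspace(0.0, 1.0, 6)
y = np.sin(2 * np.pi * x)

best = None
for theta in [1.0, 5.0, 10.0, 50.0]:  # coarse grid in place of an optimizer
    R = np.exp(-theta * np.abs(x[:, None] - x[None, :]) ** 2.0)
    R += 1e-10 * np.eye(x.size)  # jitter: an assumption, not part of the model
    mu_hat, sigma2_hat, nll = profile_neg_log_lik(R, y)
    if best is None or nll < best[0]:
        best = (nll, theta, mu_hat, sigma2_hat)

print("best theta on grid:", best[1])
```

Because $\hat{\mu}$ and $\hat{\sigma}^2$ are cheap to compute for any given $R$, the numerical search only has to cover the correlation parameters.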

Case Study: G-Protein Computer Experiment: G-Protein Computer Model
Biosystems model for so-termed ligand activation of G-protein in yeast.
d = 4 input variables:
- $x$ is the concentration of ligand
- $u_1, \ldots, u_8$ is a vector of 8 kinetic parameters (only $u_1$, $u_6$, and $u_7$ are varied)
Output variable $y$ is the normalized concentration of part of the complex.

Case Study: G-Protein Computer Experiment: G-Protein System Dynamics: Differential Equations
1. $\dot{\eta}_1 = -u_1 \eta_1 x + u_2 \eta_2 - u_3 \eta_1 + u_5$
2. $\dot{\eta}_2 = u_1 \eta_1 x - u_2 \eta_2 - u_4 \eta_2$
3. $\dot{\eta}_3 = -u_6 \eta_2 \eta_3 + u_8 (G_{tot} - \eta_3 - \eta_4)(G_{tot} - \eta_3)$
4. $\dot{\eta}_4 = u_6 \eta_2 \eta_3 - u_7 \eta_4$
5. $y = (G_{tot} - \eta_3)/G_{tot}$
where $\eta_1, \ldots, \eta_4$ are concentrations of 4 chemical species and $\dot{\eta}_1 \equiv \partial \eta_1 / \partial t$, etc.
$G_{tot}$ = (fixed) total concentration of G-protein complex after 30 seconds.

Case Study: G-Protein Computer Experiment: Inputs and Code Runs
Input variables:
- d = 4 variables
- Work with $\log(x)$, $\log(u_1)$, $\log(u_6)$, $\log(u_7)$; i.e., what we called the $x$ vector before is $\log(x)$, $\log(u_1)$, $\log(u_6)$, and $\log(u_7)$ here
- All input variable ranges are normalized to [0, 1] on the log scale
Number of runs:
- n = 33 (this choice and the design for the 33 runs are described in Module 4)
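
The normalization to [0, 1] on the log scale can be sketched as below. The helper name `normalize_log` and the range used in the example are hypothetical, since the slides do not give the actual ranges of $x$, $u_1$, $u_6$, and $u_7$.

```python
import numpy as np

def normalize_log(values, lower, upper):
    """Map values in [lower, upper] to [0, 1] on the log scale."""
    values = np.asarray(values, dtype=float)
    return (np.log(values) - np.log(lower)) / (np.log(upper) - np.log(lower))

# Hypothetical range for illustration; the real input ranges are not stated here
z = normalize_log([1e-3, 1e-2, 1e-1], lower=1e-3, upper=1e-1)
print(z)
```

With this scaling, the lower end of each range maps to 0, the upper end to 1, and the geometric midpoint to 0.5.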

Case Study: G-Protein Computer Experiment: Computer Model Data
[Figure: the computer-model output ymod plotted against each normalized input in four panels: logu1, logu6, logu7, and logx. Input axes run from 0.0 to 1.0; ymod axis ticks run from 0.2 to 0.8.]

Case Study: G-Protein Computer Experiment: Gaussian Process (GP) Model
$y(x)$ is a realization of a Gaussian process with:
- mean $\mu$
- variance $\sigma^2$
- correlations given by
$$\mathrm{Cor}(y(x), y(x')) = R(x, x') = \prod_{j=1}^{4} e^{-\theta_j |x_j - x'_j|^{p_j}}$$
The parameters $\mu$, $\sigma^2$, $\theta_j$, and $p_j$ (shown in red on the original slide) need to be estimated.

Case Study: G-Protein Computer Experiment: Maximum Likelihood Estimates
$\hat{\mu} = 0.36$, $\hat{\sigma}^2 = 0.51$

Variable       $\hat{\theta}$   $\hat{p}$
$\log(x)$      0.929            1.98
$\log(u_1)$    0.179            2
$\log(u_6)$    0.082            2
$\log(u_7)$    0.083            2

It is difficult to interpret the magnitudes of the estimates (we will revisit this example in Module 5 and do a sensitivity analysis).

Measuring Prediction Accuracy: Plug-In Prediction and Standard Error
Replace all hyper-parameters by their MLEs in the conditional mean and variance formulas:
$$\text{prediction of } y(x) = \hat{y}(x) = \hat{m}(x) = \hat{\mu} + r^T(x) R^{-1} (y - \hat{\mu} 1)$$
and
$$\text{estimated variance of prediction} = \hat{v}(x) = \hat{\sigma}^2 \left(1 - r^T(x) R^{-1} r(x)\right)$$
($R$ and $r(x)$ are also estimates.)
The plug-in estimated variance ignores uncertainty in estimating the hyper-parameters. It can be adapted to include uncertainty from estimating $\mu$:
$$\hat{v}(x) = \hat{\sigma}^2 \left(1 - r^T(x) R^{-1} r(x) + \frac{[1 - 1^T R^{-1} r(x)]^2}{1^T R^{-1} 1}\right)$$
This plug-in formula is often used to give a standard error, i.e., $s(x) = \sqrt{\hat{v}(x)}$.
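
A minimal NumPy sketch of the plug-in predictor and standard error follows. It assumes the correlation parameters have already been estimated; the names `corr` and `gp_predict`, the toy data, and the parameter values are illustrative assumptions rather than the course software.

```python
import numpy as np

def corr(A, B, theta, p):
    """Power-exponential correlations between rows of A and rows of B."""
    theta, p = np.asarray(theta, float), np.asarray(p, float)
    diff = np.abs(A[:, None, :] - B[None, :, :])
    return np.exp(-np.sum(theta * diff ** p, axis=2))

def gp_predict(X, y, Xnew, theta, p, include_mu_uncertainty=True):
    """Plug-in prediction y_hat(x) and standard error s(x) at the rows of Xnew."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ones = np.ones(n)
    R = corr(X, X, theta, p)
    r = corr(X, Xnew, theta, p)  # column k is r(x) for new point k
    Rinv_1 = np.linalg.solve(R, ones)
    Rinv_r = np.linalg.solve(R, r)
    mu_hat = (ones @ np.linalg.solve(R, y)) / (ones @ Rinv_1)
    resid = y - mu_hat
    Rinv_resid = np.linalg.solve(R, resid)
    sigma2_hat = (resid @ Rinv_resid) / n
    y_hat = mu_hat + r.T @ Rinv_resid
    v = 1.0 - np.sum(r * Rinv_r, axis=0)  # 1 - r^T R^{-1} r
    if include_mu_uncertainty:
        v = v + (1.0 - ones @ Rinv_r) ** 2 / (ones @ Rinv_1)
    v_hat = sigma2_hat * np.maximum(v, 0.0)  # clip tiny negative round-off
    return y_hat, np.sqrt(v_hat)

# Toy data (illustrative only)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
y = np.sin(3 * X[:, 0]) + X[:, 1]
theta, p = [2.0, 2.0], [1.5, 1.5]

y_hat, s = gp_predict(X, y, np.array([[0.3, 0.7]]), theta, p)
print(y_hat, s)
```

At a training point the predictor reproduces the observed output and the standard error is zero up to round-off, which is a useful sanity check on any implementation.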

Measuring Prediction Accuracy: Measures of Accuracy
We could rely on the standard error, $\sqrt{\hat{v}(x)}$.
If we have $m$ test data observations, the root mean squared error (RMSE) of prediction is
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{\text{test pts}} (\hat{y}(x) - y(x))^2}$$
But test data are rarely available.
Cross validation (CV) is the alternative.

GP Diagnostics: Cross Validation (CV)
Let $x^{(i)}$ denote $x$ for run $i$ in the data ($i = 1, \ldots, n$). For run $i$:
- The cross-validated prediction of $y(x^{(i)})$ is $\hat{y}_{-i}(x^{(i)})$, i.e., $\hat{y}(x) = \hat{m}(x)$ computed from the $n - 1$ runs excluding run $i$
- The cross-validated standard error of $\hat{y}_{-i}(x^{(i)})$ is $s_{-i}(x^{(i)})$, i.e., $s(x) = \sqrt{\hat{v}(x)}$ computed from the $n - 1$ runs excluding run $i$
- The cross-validated residual for run $i$ is $y(x^{(i)}) - \hat{y}_{-i}(x^{(i)})$
- The standardized cross-validated residual for run $i$ is
$$\frac{y(x^{(i)}) - \hat{y}_{-i}(x^{(i)})}{s_{-i}(x^{(i)})}$$
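
The leave-one-out definitions above can be sketched as a simple loop. In this version the correlation parameters are held fixed across folds, while $\hat{\mu}$ and $\hat{\sigma}^2$ are recomputed from the $n - 1$ remaining runs; that choice, the function names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def corr(A, B, theta, p):
    """Power-exponential correlations between rows of A and rows of B."""
    theta, p = np.asarray(theta, float), np.asarray(p, float)
    diff = np.abs(A[:, None, :] - B[None, :, :])
    return np.exp(-np.sum(theta * diff ** p, axis=2))

def fit_predict(X, y, Xnew, theta, p):
    """Plug-in prediction and standard error (with the mu-uncertainty term)."""
    n = y.size
    ones = np.ones(n)
    R = corr(X, X, theta, p)
    r = corr(X, Xnew, theta, p)
    Rinv_1 = np.linalg.solve(R, ones)
    Rinv_r = np.linalg.solve(R, r)
    mu_hat = (ones @ np.linalg.solve(R, y)) / (ones @ Rinv_1)
    resid = y - mu_hat
    Rinv_resid = np.linalg.solve(R, resid)
    sigma2_hat = (resid @ Rinv_resid) / n
    y_hat = mu_hat + r.T @ Rinv_resid
    v = 1.0 - np.sum(r * Rinv_r, axis=0) + (1.0 - ones @ Rinv_r) ** 2 / (ones @ Rinv_1)
    return y_hat, np.sqrt(sigma2_hat * np.maximum(v, 0.0))

def loo_cv(X, y, theta, p):
    """Cross-validated residuals and standardized residuals for every run."""
    n = y.size
    resid = np.empty(n)
    std_resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop run i
        y_hat, s = fit_predict(X[keep], y[keep], X[i:i + 1], theta, p)
        resid[i] = y[i] - y_hat[0]
        std_resid[i] = resid[i] / s[0]
    return resid, std_resid

# Toy data (illustrative only)
rng = np.random.default_rng(0)
X = rng.random((8, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]
theta, p = [5.0, 5.0], [1.5, 1.5]

resid, std_resid = loo_cv(X, y, theta, p)
print(np.round(resid, 3))
print(np.round(std_resid, 2))
```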

GP Diagnostics: Diagnostic Plots
- Plot the cross-validated residuals to assess the overall magnitude of error
- Plot the standardized cross-validated residuals to assess the validity of the standard error for individual predictions

GP Diagnostics: G-Protein Diagnostic Plots
[Figure: two panels. One shows ymod versus predicted ymod; the other shows the standardized residual (axis from -4 to 4) versus predicted ymod.]

GP Diagnostics: Cross Validation: Numerical Summaries
Magnitude of error:
- The cross-validated root mean squared error is
$$\mathrm{CVRMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y(x^{(i)}) - \hat{y}_{-i}(x^{(i)})\right)^2} = 0.20$$
- Maximum cross-validated residual is 0.44
- Fairly accurate relative to a range of about 0.7 in $y$
Standard errors?
- The standardized residuals
$$\frac{y(x^{(i)}) - \hat{y}_{-i}(x^{(i)})}{s_{-i}(x^{(i)})} \quad \text{for } i = 1, \ldots, n$$
are roughly in $(-2, 2)$
- Standard errors look reliable
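
Given arrays of cross-validated residuals and standardized residuals, these summaries take only a few lines. The helper `cv_summaries` and the numbers passed to it are made up for illustration and are not the G-protein results; reporting the maximum in absolute value is also an assumption.

```python
import numpy as np

def cv_summaries(resid, std_resid):
    """CVRMSE, largest absolute residual, and share of |standardized residuals| below 2."""
    resid = np.asarray(resid, dtype=float)
    std_resid = np.asarray(std_resid, dtype=float)
    return {
        "cvrmse": float(np.sqrt(np.mean(resid ** 2))),
        "max_abs_resid": float(np.max(np.abs(resid))),
        "frac_within_2": float(np.mean(np.abs(std_resid) < 2.0)),
    }

# Made-up residuals for illustration only
out = cv_summaries([0.1, -0.2, 0.2], [0.5, -1.0, 2.5])
print(out)
```

Comparing `cvrmse` and `max_abs_resid` with the range of the output gives the accuracy judgment, while `frac_within_2` is a quick check on the standard errors.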

GP Diagnostics: Fast and Slow CV
- When run $i$ is removed, the hyper-parameters should be re-estimated
- For computational reasons the correlation parameters are often not updated (it is cheap to update the estimates of $\mu$ and $\sigma^2$), producing a fast CV
- For slow CV, do, say, 10-fold cross-validation, re-estimating all hyper-parameters
- The agreement between fast CVRMSE and slow CVRMSE is often good
- The agreement between fast CVRMSE and the RMSE from test points has been good in examples

Module Summary
- The GP model has to be tuned to data so that its properties match those of the computer model
- Tuning (fitting) the GP by maximum likelihood is computationally feasible for up to about n = 1000 runs and d = 50 input variables
- The GP model gives an approximation and a measure of accuracy
- The measure of accuracy (standard error) can be checked for validity by cross validation

Appendix A: Bayesian Treatment of the Hyper-parameters
Posterior distribution of the hyper-parameters ("hyper" below), $\mu, \sigma^2, \theta_1, \ldots, \theta_d$, etc., of the GP. From Bayes' rule, given the data $y$,
$$p(\text{hyper} \mid y) \propto \pi(\text{hyper}) \, L(y \mid \text{hyper})$$
where
- $\pi(\text{hyper})$ is the prior for hyper
- $L(y \mid \text{hyper})$ is the multivariate normal likelihood
Predictive distribution for $y(x)$ at a new $x$:
$$p(y(x) \mid y) = \int p(y(x) \mid y, \text{hyper}) \, p(\text{hyper} \mid y) \, d\,\text{hyper}$$
Usually, the integration is not carried out explicitly. Rather, properties such as the posterior predictive mean and variance of $p(y(x) \mid y)$ are obtained by MCMC sampling of the posterior distribution for the hyper-parameters, $p(\text{hyper} \mid y)$.
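
As a sketch of how MCMC output is typically used here, suppose $M$ draws $\text{hyper}^{(1)}, \ldots, \text{hyper}^{(M)}$ are available from $p(\text{hyper} \mid y)$, and write $m_k(x)$ and $v_k(x)$ for the conditional mean and variance of $y(x)$ given $y$ and $\text{hyper}^{(k)}$. The symbols $M$, $m_k$, $v_k$, and $\bar{m}$ are notation introduced for this illustration. The predictive mean is approximated by averaging, and the predictive variance follows from the law of total variance:

```latex
E[y(x) \mid y] \approx \bar{m}(x) = \frac{1}{M} \sum_{k=1}^{M} m_k(x)

\mathrm{Var}[y(x) \mid y] \approx \frac{1}{M} \sum_{k=1}^{M} v_k(x)
    + \frac{1}{M} \sum_{k=1}^{M} \left( m_k(x) - \bar{m}(x) \right)^2
```

The second term in the variance is the extra uncertainty from not knowing the hyper-parameters, which the plug-in standard error leaves out.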