Gaussian processes A hands-on tutorial

Size: px

Start display at page:

Download "Gaussian processes A hands-on tutorial"

Janel Rose
6 years ago
Views:

Gaussian rocesses A hands-on tutorial Slides

Deartment of Mechanical Engineering Web:

1 Gaussian rocesses A hands-on tutorial Slides and code: htts://github.com/araklas/gptutorial Paris Perdikaris Massachusetts Institute of Technology Deartment of Mechanical Engineering Web: htt://web.mit.edu/aris/www/ aris@mit.edu ICERM Providence RI June 5 th 017

2 Tuesday June Time Descrition Seaker Wednesday June Time Descrition Seaker Location Abstracts 9:00-9:45 10:00-10:30 10:30-11:15 Probabilistic Dimensionality Reduction Coffee Break Bayesian otimization for automating model selection Neil Lawrence University of Sheffield and Amazon Research Cambridge Roman Garnett Washington University in St. Louis 9:00-9:45 Bayesian Calibration of Simulators with Structured Discretization Uncertainty 10:00-10:30 Coffee Break 11th Floor 11:15 Numerical Gaussian Processes for Timedeendent and Nonlinear Partial Differential Equations Oksana Chkrebtii The Ohio State University Maziar Raissi Brown University 11th :30 Floor - 3:15 Bayesian Chris Lecture Hall Probabilistic 4/Bayesian_Calibration_of_ Oates Numerical Newcastle Methods. University (Part I) 3:30-4:00 Coffee/Tea Break 4:00-4:45 Bayesian Probabilistic Numerical Methods. (Part II) Jon Cockayne University of Warwick 11:30-1:15 Variational Reformulation of Uncertainty Proagation Problem in Linear Partial Differential Equations Ilias Bilionis Purdue University GPs will be mentioned in ~50% of worksho talks!

3 model... but we always work with finite sets! Data-driven modeling with Gaussian rocesses tart with a multivariate Gaussian: The linear algebra of comutation fs fs+1 fs+ uncertainty fn ) N (µ K). (f1 f surfaces under onse {z } {z } fa fb ypriors fover () functions: + µa 94 µ µb 0 f GP(µ() K( ; )) KAA KAB and K KBA KBB covariance function nalisation (1940) roerty: Marginalization: 0) (f f ) N (µ K). A B Z ing 1996) Covariance func constant eression Samles from a GP rior 0 PD S ND Neural 0 d d d1 d networks 0 olynomial ( +infinite 0) (fa fb )dfb N (µa KAA ) (fa ) Dual r fb squared eonential e( `limits ) functions 1 Mat ern K 1 ( ) ` r ` r.i.e. y f()gaussian r+ ε f~gp(μσ) Kernel eonential e( ` ) Gaussian roceses fully non-arametric! rocesses machines r -eonential e (`) Dee NN [Snoek et al. 015]. Bayesian r (fa fb ) N (µ K). Then: rational quadratic (1 + ` ) inference 1 1 Covariance 1K (fa fb ) N (µa + KAB KBB (fneural µbnetwork ) KAA KAB K ) > 0 B BA BB sin (1+ > )(1+ 0> 0 ) Then: linear PosteriorGPs is also Gaussian! with ) rior over functions > Gaussian Processes ns: Posterior iscalibrate also Gaussian: ations (y) 013] 94 such linear [f y] tothat infereach redictions ariate Gaussian. ed uncertainty 4. Eamles of Covariance Functions Rasmussen C. E. Gaussian rocesses for machine learning (006) covariance function eression S fu N

4 I) n osterior distri If we consider a Gaussian likelihood (y f ) N (y f stimation: while rediction uncertainty is 015]. quantified thro ed usingor choices: osterior mean µ t-student rocesses [Shah et al. 013] Dee NN [Snoek et al. bution (f y X) is tractable and canaroach be with used to erform redictive inference for a new Data-driven modeling Gaussian rocesses fequentist. terior variance outut f Infinite-dimensional given a new inut as robability density94 such by thatmaimizing each linear vector of hyer-arameters is determined marginal 0 y f restriction () + is multivariate f GP(0 k( ; )) finite-dimensional Gaussian. od of observed data ( so called model evidence) i.e. (f y X ) N (f µ ) 1 (5 covariance N log constant(6 1 1 T ( ) k (K + µ N log (y X ) log K + I y (K + I) I)y 1 y ( ) k kn(k + I) 1 kn (7 Training rediction linear for ond kind resectively. In what follows&we formulate inference roblem Bayesian aroach olynomia homoscedastic noise while we refer reader to [] for a detailed outline of T and [k( )... k( )] k k k k( ). Predictions are where k risk-averseness 1 out using MCMC. Assign riors hyer arameters marginalize m N over N and N Introducing N ]T as a vector of hy scedastic case. To this end we introduce [ arameter estimation: squared e is quantified through comuted using osterior mean µ while rediction uncertainty nt forecast of fvariance is needed n erforming redictions using osterior mead eters which characterize GP fequentist model aroach which are tyically comuted from osterior. Mat ern The vector hyer-arameters is determined maimizing marginalcon log maimum estimation..h6) would belikelihood oftraditional choice. Carrying this by into an otimization eonenti likelihood of observed data ( so called model evidence) i.e. I) n osterior di we Gaussian likelihood (y f ) N (y f ght consider be led toa consider following substitute of Eq. 1: Training via maimizing marginal likelihood -eonen 0 Model f () GP(µ() k( to )) 1iserform determined by mean (f y X) isitractable and redictive inference for a 1can be used N 1 0 log function (y X )m() and logcovariance K + I function y T (K log rational(8q + I) y k( ; ). f given a new inut as min µ (). variance k(r) second kind resectively. In what follows we formulate inference roblem for X neural net I Posterior mean µ(; D) and variance (; D) can be Bayesian aroach case of Assign homoscedastic noise while we refer reader to [] for ausing detailed outline of of Prediction via conditioning on available data 4. Eamles Cw riors over hyer arameters and marginalize m out MCMC..3 Introducing risk-averseness comuted elicitly given a dataset D. nd vheteroscedastic are otimal solution and otimal value of this roblem n T case. To this end we introduce [ ] as a vector of hyer )? In this (f Xerforming ) which Nwe (fare )using Bayesian setting believe that osterior eected valu said about f ( y µ Ifarameters a oint forecast f is needed n redictions mean µ which of characterize GP model tyically comuted from data Table 1 4.1: Summa into 1 an otimization stion: equal to6)maimum µwould ( ) be likelihood µ () for all X with right-hand side being equa through (see Eq. traditional choice. Carrying this contet are written eir ( ) k (K + I) y µestimation. N 0.8 I) n columns marked S osterior distriif we consider a Gaussian likelihood (y f ) N (y f Workflow: one might be led to consider following substitute of Eq. 1: ected value of f (). Consequently based on information incororated in 1 and 0.6 nondegenerate ( ) k k (K + I) k N N Assign a Gaussian rior over functions bution (f y is tractable for for a new section 4.3 mor or (f y X) wex)have that and can be used to erform redictive inference training set calibrate outut f Given givenforaamachine newlearning inut of as datamin µ (). model arameters (9 Rasmussen C. E. Gaussian rocesses (006) 0.4

5 Workflow 1.) Model secification: Choosing a rior f GP(0k( 0 ; )).) Training model: Inference algorithm log (y X ) 1 log K + 1 I yt (K + I) 1 y Bayesian aroach 3.) Obtain redictions & quantify uncertainty (f y X )N(f µ ) µ ( )k N (K + I) 1 y ( )k k N (K + I) 1 k N N log 4.) Data acquisition

6 Multi-fidelity modeling

7 Multi-fidelity modeling

8 Multi-fidelity modeling

9 Multi-fidelity modeling

10 Multi-fidelity modeling with GPs Auto-regressive model: AR1 f t () t 1 ()f t 1 ()+ t () t 1...s Predictive osterior (f y X )N(f µ ) µ ( )k N (K + I) 1 y ( )k k N (K + I) 1 k N N 1 K 11 K 1 N K 1 K Block covariance matri M.C Kennedy and A. O'Hagan. Predicting outut from a comle comuter code when fast aroimations are available 000.

11 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. e.g. samle at locations that maimize eected imrovement Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no

12 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

13 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

14 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

15 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

16 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

17 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

18 Some software ackages Gaussian rocesses: htt:// htts://github.com/sheffieldml/gpy htts://github.com/gpflow/gpflow Bayesian otimization: htts://github.com/sheffieldml/gpyot htts://github.com/hips/searmint Automatic differentiation: htts://github.com/hips/autograd htts:// htt://deelearning.net/software/ano/ htt://ytorch.org Probabilistic rogramming: htt://edwardlib.org htt://mc-stan.org htts://ymc-devs.github.io/ymc3/inde.html

Introduction to Probability for Graphical Models

Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,