Gaussian processes A hands-on tutorial

Size: px
Start display at page:

Download "Gaussian processes A hands-on tutorial"

Transcription

1 Gaussian rocesses A hands-on tutorial Slides and code: htts://github.com/araklas/gptutorial Paris Perdikaris Massachusetts Institute of Technology Deartment of Mechanical Engineering Web: htt://web.mit.edu/aris/www/ aris@mit.edu ICERM Providence RI June 5 th 017

2 Tuesday June Time Descrition Seaker Wednesday June Time Descrition Seaker Location Abstracts 9:00-9:45 10:00-10:30 10:30-11:15 Probabilistic Dimensionality Reduction Coffee Break Bayesian otimization for automating model selection Neil Lawrence University of Sheffield and Amazon Research Cambridge Roman Garnett Washington University in St. Louis 9:00-9:45 Bayesian Calibration of Simulators with Structured Discretization Uncertainty 10:00-10:30 Coffee Break 11th Floor 11:15 Numerical Gaussian Processes for Timedeendent and Nonlinear Partial Differential Equations Oksana Chkrebtii The Ohio State University Maziar Raissi Brown University 11th :30 Floor - 3:15 Bayesian Chris Lecture Hall Probabilistic 4/Bayesian_Calibration_of_ Oates Numerical Newcastle Methods. University (Part I) 3:30-4:00 Coffee/Tea Break 4:00-4:45 Bayesian Probabilistic Numerical Methods. (Part II) Jon Cockayne University of Warwick 11:30-1:15 Variational Reformulation of Uncertainty Proagation Problem in Linear Partial Differential Equations Ilias Bilionis Purdue University GPs will be mentioned in ~50% of worksho talks!

3 model... but we always work with finite sets! Data-driven modeling with Gaussian rocesses tart with a multivariate Gaussian: The linear algebra of comutation fs fs+1 fs+ uncertainty fn ) N (µ K). (f1 f surfaces under onse {z } {z } fa fb ypriors fover () functions: + µa 94 µ µb 0 f GP(µ() K( ; )) KAA KAB and K KBA KBB covariance function nalisation (1940) roerty: Marginalization: 0) (f f ) N (µ K). A B Z ing 1996) Covariance func constant eression Samles from a GP rior 0 PD S ND Neural 0 d d d1 d networks 0 olynomial ( +infinite 0) (fa fb )dfb N (µa KAA ) (fa ) Dual r fb squared eonential e( `limits ) functions 1 Mat ern K 1 ( ) ` r ` r.i.e. y f()gaussian r+ ε f~gp(μσ) Kernel eonential e( ` ) Gaussian roceses fully non-arametric! rocesses machines r -eonential e (`) Dee NN [Snoek et al. 015]. Bayesian r (fa fb ) N (µ K). Then: rational quadratic (1 + ` ) inference 1 1 Covariance 1K (fa fb ) N (µa + KAB KBB (fneural µbnetwork ) KAA KAB K ) > 0 B BA BB sin (1+ > )(1+ 0> 0 ) Then: linear PosteriorGPs is also Gaussian! with ) rior over functions > Gaussian Processes ns: Posterior iscalibrate also Gaussian: ations (y) 013] 94 such linear [f y] tothat infereach redictions ariate Gaussian. ed uncertainty 4. Eamles of Covariance Functions Rasmussen C. E. Gaussian rocesses for machine learning (006) covariance function eression S fu N

4 I) n osterior distri If we consider a Gaussian likelihood (y f ) N (y f stimation: while rediction uncertainty is 015]. quantified thro ed usingor choices: osterior mean µ t-student rocesses [Shah et al. 013] Dee NN [Snoek et al. bution (f y X) is tractable and canaroach be with used to erform redictive inference for a new Data-driven modeling Gaussian rocesses fequentist. terior variance outut f Infinite-dimensional given a new inut as robability density94 such by thatmaimizing each linear vector of hyer-arameters is determined marginal 0 y f restriction () + is multivariate f GP(0 k( ; )) finite-dimensional Gaussian. od of observed data ( so called model evidence) i.e. (f y X ) N (f µ ) 1 (5 covariance N log constant(6 1 1 T ( ) k (K + µ N log (y X ) log K + I y (K + I) I)y 1 y ( ) k kn(k + I) 1 kn (7 Training rediction linear for ond kind resectively. In what follows&we formulate inference roblem Bayesian aroach olynomia homoscedastic noise while we refer reader to [] for a detailed outline of T and [k( )... k( )] k k k k( ). Predictions are where k risk-averseness 1 out using MCMC. Assign riors hyer arameters marginalize m N over N and N Introducing N ]T as a vector of hy scedastic case. To this end we introduce [ arameter estimation: squared e is quantified through comuted using osterior mean µ while rediction uncertainty nt forecast of fvariance is needed n erforming redictions using osterior mead eters which characterize GP fequentist model aroach which are tyically comuted from osterior. Mat ern The vector hyer-arameters is determined maimizing marginalcon log maimum estimation..h6) would belikelihood oftraditional choice. Carrying this by into an otimization eonenti likelihood of observed data ( so called model evidence) i.e. I) n osterior di we Gaussian likelihood (y f ) N (y f ght consider be led toa consider following substitute of Eq. 1: Training via maimizing marginal likelihood -eonen 0 Model f () GP(µ() k( to )) 1iserform determined by mean (f y X) isitractable and redictive inference for a 1can be used N 1 0 log function (y X )m() and logcovariance K + I function y T (K log rational(8q + I) y k( ; ). f given a new inut as min µ (). variance k(r) second kind resectively. In what follows we formulate inference roblem for X neural net I Posterior mean µ(; D) and variance (; D) can be Bayesian aroach case of Assign homoscedastic noise while we refer reader to [] for ausing detailed outline of of Prediction via conditioning on available data 4. Eamles Cw riors over hyer arameters and marginalize m out MCMC..3 Introducing risk-averseness comuted elicitly given a dataset D. nd vheteroscedastic are otimal solution and otimal value of this roblem n T case. To this end we introduce [ ] as a vector of hyer )? In this (f Xerforming ) which Nwe (fare )using Bayesian setting believe that osterior eected valu said about f ( y µ Ifarameters a oint forecast f is needed n redictions mean µ which of characterize GP model tyically comuted from data Table 1 4.1: Summa into 1 an otimization stion: equal to6)maimum µwould ( ) be likelihood µ () for all X with right-hand side being equa through (see Eq. traditional choice. Carrying this contet are written eir ( ) k (K + I) y µestimation. N 0.8 I) n columns marked S osterior distriif we consider a Gaussian likelihood (y f ) N (y f Workflow: one might be led to consider following substitute of Eq. 1: ected value of f (). Consequently based on information incororated in 1 and 0.6 nondegenerate ( ) k k (K + I) k N N Assign a Gaussian rior over functions bution (f y is tractable for for a new section 4.3 mor or (f y X) wex)have that and can be used to erform redictive inference training set calibrate outut f Given givenforaamachine newlearning inut of as datamin µ (). model arameters (9 Rasmussen C. E. Gaussian rocesses (006) 0.4

5 Workflow 1.) Model secification: Choosing a rior f GP(0k( 0 ; )).) Training model: Inference algorithm log (y X ) 1 log K + 1 I yt (K + I) 1 y Bayesian aroach 3.) Obtain redictions & quantify uncertainty (f y X )N(f µ ) µ ( )k N (K + I) 1 y ( )k k N (K + I) 1 k N N log 4.) Data acquisition

6 Multi-fidelity modeling

7 Multi-fidelity modeling

8 Multi-fidelity modeling

9 Multi-fidelity modeling

10 Multi-fidelity modeling with GPs Auto-regressive model: AR1 f t () t 1 ()f t 1 ()+ t () t 1...s Predictive osterior (f y X )N(f µ ) µ ( )k N (K + I) 1 y ( )k k N (K + I) 1 k N N 1 K 11 K 1 N K 1 K Block covariance matri M.C Kennedy and A. O'Hagan. Predicting outut from a comle comuter code when fast aroimations are available 000.

11 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. e.g. samle at locations that maimize eected imrovement Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

12 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

13 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

14 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

15 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

16 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

17 Bayesian otimization Goal: Estimate global minimum of a function: arg min R d g() (otentially intractable) Setu: g() is a black-bo and eensive to evaluate objective function noisy observations no gradients. Idea: Aroimate g() using a GP surrogate: y f()+ f GP (f 0k( 0 ; )) Utilize osterior to guide a sequential or arallel samling olicy by otimizing a chosen eected utility function (; D n )E E v [U(v )] The otimization roblem is transformed to: n+1 arg ma (; D n ) ive function to obtain Remark: Acquisition functions aim to balance trade-off between eloration and eloitation. Shahriari Bobak et al. "Taking human out of loo: A review of bayesian otimization." Proceedings of IEEE (016):

18 Some software ackages Gaussian rocesses: htt:// htts://github.com/sheffieldml/gpy htts://github.com/gpflow/gpflow Bayesian otimization: htts://github.com/sheffieldml/gpyot htts://github.com/hips/searmint Automatic differentiation: htts://github.com/hips/autograd htts:// htt://deelearning.net/software/ano/ htt://ytorch.org Probabilistic rogramming: htt://edwardlib.org htt://mc-stan.org htts://ymc-devs.github.io/ymc3/inde.html

Introduction to Probability for Graphical Models

Introduction to Probability for Graphical Models Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,

More information

Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis

Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis HIPAD LAB: HIGH PERFORMANCE SYSTEMS LABORATORY DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING AND EARTH SCIENCES Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis Why use metamodeling

More information

System identification and control with (deep) Gaussian processes. Andreas Damianou

System identification and control with (deep) Gaussian processes. Andreas Damianou System identification and control with (deep) Gaussian processes Andreas Damianou Department of Computer Science, University of Sheffield, UK MIT, 11 Feb. 2016 Outline Part 1: Introduction Part 2: Gaussian

More information

Lecture 7: Linear Classification Methods

Lecture 7: Linear Classification Methods Homeork Homeork Lecture 7: Linear lassification Methods Final rojects? Grous oics Proosal eek 5 Lecture is oster session, Jacobs Hall Lobby, snacks Final reort 5 June. What is linear classification? lassification

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Image Alignment Computer Vision (Kris Kitani) Carnegie Mellon University

Image Alignment Computer Vision (Kris Kitani) Carnegie Mellon University Lucas Kanade Image Alignment 16-385 Comuter Vision (Kris Kitani) Carnegie Mellon University htt://www.humansensing.cs.cmu.edu/intraface/ How can I find in the image? Idea #1: Temlate Matching Slow, combinatory,

More information

Nonlinear programming

Nonlinear programming 08-04- htt://staff.chemeng.lth.se/~berntn/courses/otimps.htm Otimization of Process Systems Nonlinear rogramming PhD course 08 Bernt Nilsson, Det of Chemical Engineering, Lund University Content Unconstraint

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin

More information

Neutron inverse kinetics via Gaussian Processes

Neutron inverse kinetics via Gaussian Processes Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques

More information

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted

More information

Gaussian Processes in Machine Learning

Gaussian Processes in Machine Learning Gaussian Processes in Machine Learning November 17, 2011 CharmGil Hong Agenda Motivation GP : How does it make sense? Prior : Defining a GP More about Mean and Covariance Functions Posterior : Conditioning

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

Probabilistic & Bayesian deep learning. Andreas Damianou

Probabilistic & Bayesian deep learning. Andreas Damianou Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Practical Bayesian Optimization of Machine Learning. Learning Algorithms Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Multivariate Bayesian Linear Regression MLAI Lecture 11

Multivariate Bayesian Linear Regression MLAI Lecture 11 Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate

More information

Bayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge

Bayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge Bayesian Quadrature: Model-based Approimate Integration David Duvenaud University of Cambridge The Quadrature Problem ˆ We want to estimate an integral Z = f ()p()d ˆ Most computational problems in inference

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Empirical Bayesian EM-based Motion Segmentation

Empirical Bayesian EM-based Motion Segmentation Emirical Bayesian EM-based Motion Segmentation Nuno Vasconcelos Andrew Liman MIT Media Laboratory 0 Ames St, E5-0M, Cambridge, MA 09 fnuno,lig@media.mit.edu Abstract A recent trend in motion-based segmentation

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

Learning Sequence Motif Models Using Gibbs Sampling

Learning Sequence Motif Models Using Gibbs Sampling Learning Sequence Motif Models Using Gibbs Samling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Sring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides excluding third-arty material are licensed under

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

A parametric approach to Bayesian optimization with pairwise comparisons

A parametric approach to Bayesian optimization with pairwise comparisons A parametric approach to Bayesian optimization with pairwise comparisons Marco Co Eindhoven University of Technology m.g.h.co@tue.nl Bert de Vries Eindhoven University of Technology and GN Hearing bdevries@ieee.org

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Graphical Models (Lecture 1 - Introduction)

Graphical Models (Lecture 1 - Introduction) Grahical Models Lecture - Introduction Tibério Caetano tiberiocaetano.com Statistical Machine Learning Grou NICTA Canberra LLSS Canberra 009 Tibério Caetano: Grahical Models Lecture - Introduction / 7

More information

CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 11: Bayesian learning continued. Geoffrey Hinton

CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 11: Bayesian learning continued. Geoffrey Hinton CSC31: 011 Introdution to Neural Networks and Mahine Learning Leture 11: Bayesian learning ontinued Geoffrey Hinton Bayes Theorem, Prior robability of weight vetor Posterior robability of weight vetor

More information

Model Selection for Gaussian Processes

Model Selection for Gaussian Processes Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

REVIEW PERIOD REORDER POINT PROBABILISTIC INVENTORY SYSTEM FOR DETERIORATING ITEMS WITH THE MIXTURE OF BACKORDERS AND LOST SALES

REVIEW PERIOD REORDER POINT PROBABILISTIC INVENTORY SYSTEM FOR DETERIORATING ITEMS WITH THE MIXTURE OF BACKORDERS AND LOST SALES T. Vasudha WARRIER, PhD Nita H. SHAH, PhD Deartment of Mathematics, Gujarat University Ahmedabad - 380009, Gujarat, INDIA E-mail : nita_sha_h@rediffmail.com REVIEW PERIOD REORDER POINT PROBABILISTIC INVENTORY

More information

Multi-Attribute Bayesian Optimization under Utility Uncertainty

Multi-Attribute Bayesian Optimization under Utility Uncertainty Multi-Attribute Bayesian Optimization under Utility Uncertainty Raul Astudillo Cornell University Ithaca, NY 14853 ra598@cornell.edu Peter I. Frazier Cornell University Ithaca, NY 14853 pf98@cornell.edu

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

Lecture 21: Quantum Communication

Lecture 21: Quantum Communication CS 880: Quantum Information Processing 0/6/00 Lecture : Quantum Communication Instructor: Dieter van Melkebeek Scribe: Mark Wellons Last lecture, we introduced the EPR airs which we will use in this lecture

More information

Lecture 1c: Gaussian Processes for Regression

Lecture 1c: Gaussian Processes for Regression Lecture c: Gaussian Processes for Regression Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II 1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

Gaussian Process Regression

Gaussian Process Regression Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process

More information

Decoding First-Spike Latency: A Likelihood Approach. Rick L. Jenison

Decoding First-Spike Latency: A Likelihood Approach. Rick L. Jenison eurocomuting 38-40 (00) 39-48 Decoding First-Sike Latency: A Likelihood Aroach Rick L. Jenison Deartment of Psychology University of Wisconsin Madison WI 53706 enison@wavelet.sych.wisc.edu ABSTRACT First-sike

More information

Chapter 10. Supplemental Text Material

Chapter 10. Supplemental Text Material Chater 1. Sulemental Tet Material S1-1. The Covariance Matri of the Regression Coefficients In Section 1-3 of the tetbook, we show that the least squares estimator of β in the linear regression model y=

More information

MTTTS16 Learning from Multiple Sources

MTTTS16 Learning from Multiple Sources MTTTS16 Learning from Multiple Sources 5 ECTS credits Autumn 2018, University of Tampere Lecturer: Jaakko Peltonen Lecture 6: Multitask learning with kernel methods and nonparametric models On this lecture:

More information

A Time-Varying Threshold STAR Model of Unemployment

A Time-Varying Threshold STAR Model of Unemployment A Time-Varying Threshold STAR Model of Unemloyment michael dueker a michael owyang b martin sola c,d a Russell Investments b Federal Reserve Bank of St. Louis c Deartamento de Economia, Universidad Torcuato

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

ECE521 Tutorial 2. Regression, GPs, Assignment 1. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel for the slides.

ECE521 Tutorial 2. Regression, GPs, Assignment 1. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel for the slides. ECE521 Tutorial 2 Regression, GPs, Assignment 1 ECE521 Winter 2016 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel for the slides. ECE521 Tutorial 2 ECE521 Winter 2016 Credits to Alireza / 3 Outline

More information

Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda

Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda Short course A vademecum of statistical attern recognition techniques with alications to image and video analysis Lecture 6 The Kalman filter. Particle filters Massimo Piccardi University of Technology,

More information

Quick Detection of Changes in Trac Statistics: Ivy Hsu and Jean Walrand 2. Department of EECS, University of California, Berkeley, CA 94720

Quick Detection of Changes in Trac Statistics: Ivy Hsu and Jean Walrand 2. Department of EECS, University of California, Berkeley, CA 94720 Quic Detection of Changes in Trac Statistics: Alication to Variable Rate Comression Ivy Hsu and Jean Walrand 2 Deartment of EECS, University of California, Bereley, CA 94720 resented at the 32nd Annual

More information

How to build an automatic statistician

How to build an automatic statistician How to build an automatic statistician James Robert Lloyd 1, David Duvenaud 1, Roger Grosse 2, Joshua Tenenbaum 2, Zoubin Ghahramani 1 1: Department of Engineering, University of Cambridge, UK 2: Massachusetts

More information

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian

More information

Optimal Recognition Algorithm for Cameras of Lasers Evanescent

Optimal Recognition Algorithm for Cameras of Lasers Evanescent Otimal Recognition Algorithm for Cameras of Lasers Evanescent T. Gaudo * Abstract An algorithm based on the Bayesian aroach to detect and recognise off-axis ulse laser beams roagating in the atmoshere

More information

Part III. for energy minimization

Part III. for energy minimization ICCV 2007 tutorial Part III Message-assing algorithms for energy minimization Vladimir Kolmogorov University College London Message assing ( E ( (,, Iteratively ass messages between nodes... Message udate

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic

More information

Covariance Matrix Estimation for Reinforcement Learning

Covariance Matrix Estimation for Reinforcement Learning Covariance Matrix Estimation for Reinforcement Learning Tomer Lancewicki Deartment of Electrical Engineering and Comuter Science University of Tennessee Knoxville, TN 37996 tlancewi@utk.edu Itamar Arel

More information

Convex Analysis and Economic Theory Winter 2018

Convex Analysis and Economic Theory Winter 2018 Division of the Humanities and Social Sciences Ec 181 KC Border Conve Analysis and Economic Theory Winter 2018 Toic 16: Fenchel conjugates 16.1 Conjugate functions Recall from Proosition 14.1.1 that is

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information

Bayesian Calibration of Simulators with Structured Discretization Uncertainty

Bayesian Calibration of Simulators with Structured Discretization Uncertainty Bayesian Calibration of Simulators with Structured Discretization Uncertainty Oksana A. Chkrebtii Department of Statistics, The Ohio State University Joint work with Matthew T. Pratola (Statistics, The

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Optimal array pattern synthesis with desired magnitude response

Optimal array pattern synthesis with desired magnitude response Otimal array attern synthesis with desired magnitude resonse A.M. Pasqual a, J.R. Arruda a and P. erzog b a Universidade Estadual de Caminas, Rua Mendeleiev, 00, Cidade Universitária Zeferino Vaz, 13083-970

More information

Bayesian Inference of Noise Levels in Regression

Bayesian Inference of Noise Levels in Regression Bayesian Inference of Noise Levels in Regression Christopher M. Bishop Microsoft Research, 7 J. J. Thomson Avenue, Cambridge, CB FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop

More information

CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 10: The Bayesian way to fit models. Geoffrey Hinton

CSC321: 2011 Introduction to Neural Networks and Machine Learning. Lecture 10: The Bayesian way to fit models. Geoffrey Hinton CSC31: 011 Introdution to Neural Networks and Mahine Learning Leture 10: The Bayesian way to fit models Geoffrey Hinton The Bayesian framework The Bayesian framework assumes that we always have a rior

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

Bayesian optimization for automatic machine learning

Bayesian optimization for automatic machine learning Bayesian optimization for automatic machine learning Matthew W. Ho man based o work with J. M. Hernández-Lobato, M. Gelbart, B. Shahriari, and others! University of Cambridge July 11, 2015 Black-bo optimization

More information

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail

More information

Bayesian classification CISC 5800 Professor Daniel Leeds

Bayesian classification CISC 5800 Professor Daniel Leeds Bayesian classification CISC 5800 Professor Daniel Leeds Classifying with robabilities Examle goal: Determine is it cloudy out Available data: Light detector: x 0,25 Potential class (atmosheric states):

More information

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

More information

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Machine Learning Support Vector Machines. Prof. Matteo Matteucci Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way

More information

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2 STA 25: Statistics Notes 7. Bayesian Aroach to Statistics Book chaters: 7.2 1 From calibrating a rocedure to quantifying uncertainty We saw that the central idea of classical testing is to rovide a rigorous

More information

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida

More information

L p Norms and the Sinc Function

L p Norms and the Sinc Function L Norms and the Sinc Function D. Borwein, J. M. Borwein and I. E. Leonard March 6, 9 Introduction The sinc function is a real valued function defined on the real line R by the following eression: sin if

More information

Named Entity Recognition using Maximum Entropy Model SEEM5680

Named Entity Recognition using Maximum Entropy Model SEEM5680 Named Entity Recognition using Maximum Entroy Model SEEM5680 Named Entity Recognition System Named Entity Recognition (NER): Identifying certain hrases/word sequences in a free text. Generally it involves

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems

Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems Ali Baheri Department of Mechanical Engineering University of North Carolina at Charlotte

More information

Bayesian Networks for Modeling and Managing Risks of Natural Hazards

Bayesian Networks for Modeling and Managing Risks of Natural Hazards [National Telford Institute and Scottish Informatics and Comuter Science Alliance, Glasgow University, Set 8, 200 ] Bayesian Networks for Modeling and Managing Risks of Natural Hazards Daniel Straub Engineering

More information

transformation, and nonlinear deformations containing local The deformation can be decomposed into a global affine progress).

transformation, and nonlinear deformations containing local The deformation can be decomposed into a global affine progress). Image Waring for Forecast Verification Johan Lindström, Eric Gilleland 2 & Finn Lindgren Centre for Mathematical Sciences, Lund University 2 Research Alications Laboratory, National Center for Atmosheric

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Kriging by Example: Regression of oceanographic data. Paris Perdikaris. Brown University, Division of Applied Mathematics

Kriging by Example: Regression of oceanographic data. Paris Perdikaris. Brown University, Division of Applied Mathematics Kriging by Example: Regression of oceanographic data Paris Perdikaris Brown University, Division of Applied Mathematics! January, 0 Sea Grant College Program Massachusetts Institute of Technology Cambridge,

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information