Nonparametric Regression:

1 Nonparametric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression. Seungjin Choi, Department of Computer Science and Engineering, Pohang University of Science and Technology, 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea. seungjin@postech.ac.kr

2 Outline
Two nonparametric regression methods are introduced:
- Kernel regression: the Nadaraya-Watson estimator.
- Gaussian process regression (Bayesian nonparametric models applied to the regression problem): GP regression, and its link with Bayesian linear regression.

3 Nadaraya-Watson Kernel Regression

4 Kernel Regression
Nadaraya-Watson estimator:
$$f(x) = \frac{\sum_{n=1}^{N} y_n\, k_h(x - x_n)}{\sum_{l=1}^{N} k_h(x - x_l)}.$$
In the case of a Gaussian kernel,
$$k_h(x - x_n) = \frac{1}{(2\pi h^2)^{D/2}} \exp\left\{-\frac{1}{2h^2}\,\|x - x_n\|^2\right\} = \frac{1}{(2\pi h^2)^{D/2}} \prod_{i=1}^{D} \exp\left\{-\frac{1}{2h^2}\,(x_i - x_{n,i})^2\right\}.$$
The estimator computes a locally weighted average of the $y_n$'s near $x$, using the kernel as a weighting function.
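As a rough illustration (not part of the original slides), here is a NumPy sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the function name, array shapes, and toy data are assumptions made for the example.

```python
import numpy as np

def nadaraya_watson(x_query, X, y, h=0.5):
    """Locally weighted average of y_n around x_query, weighted by a Gaussian kernel of bandwidth h."""
    # X: (N, D) training inputs, y: (N,) targets, x_query: (D,) query point.
    sq_dists = np.sum((X - x_query) ** 2, axis=1)
    weights = np.exp(-0.5 * sq_dists / h**2)   # the (2*pi*h^2)^(-D/2) factor cancels in the ratio
    return np.sum(weights * y) / np.sum(weights)

# Toy example: noisy sine data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
print(nadaraya_watson(np.array([0.5]), X, y))   # close to sin(0.5) for a reasonable bandwidth
```

The bandwidth h plays the same smoothing role as the kernel width in the formula above: a smaller h averages over fewer nearby points.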

5 Nadaraya-Watson Estimator (Detailed Derivation)
Recall
$$f(x) = \mathbb{E}[y \mid x] = \int y\, p(y \mid x)\, dy = \int y\, \frac{p(x, y)}{p(x)}\, dy.$$
Use kernel density estimation to determine both $p(x, y)$ and $p(x)$:
$$p(x, y) = \frac{1}{N} \sum_{n=1}^{N} k_{h_x}(x - x_n)\, k_{h_y}(y - y_n), \qquad p(x) = \frac{1}{N} \sum_{n=1}^{N} k_{h_x}(x - x_n).$$

6 Compute
$$\int y\, p(x, y)\, dy = \frac{1}{N} \sum_{n=1}^{N} k_{h_x}(x - x_n) \int y\, k_{h_y}(y - y_n)\, dy = \frac{1}{N} \sum_{n=1}^{N} k_{h_x}(x - x_n) \underbrace{\int y\, \frac{1}{Z_y} \exp\left\{-\lambda_y (y - y_n)^2\right\} dy}_{y_n} = \frac{1}{N} \sum_{n=1}^{N} y_n\, k_{h_x}(x - x_n).$$
Therefore,
$$f(x) = \frac{\int y\, p(x, y)\, dy}{p(x)} = \frac{\sum_{n=1}^{N} y_n\, k_h(x - x_n)}{\sum_{l=1}^{N} k_h(x - x_l)}.$$

7 Gaussian Process Regression

8 Pictorial Illustration of GP Regression
Green curve: the true sinusoidal function from which the data points are obtained by sampling and adding Gaussian noise. Red line: the mean of the GP predictive distribution; the shaded region corresponds to plus and minus two standard deviations.
Treat the latent vector as parameters: $\mathbf{f} = [f(x_1), \ldots, f(x_N)]^\top \in \mathbb{R}^N$. Compute the Gaussian process posterior $f(x) \mid X, y$ by combining the GP prior $f(x) \sim \mathcal{GP}(0, k(x, x'))$ with the Gaussian likelihood $p(y \mid X, \mathbf{f}) = \mathcal{N}(y \mid \mathbf{f}, \sigma^2 I)$. [Figure source: Bishop's PRML]

9 Bayesian Regression: Parametric vs. Nonparametric
Given a set of training examples $\mathcal{D} = \{(x_n, y_n) \mid n = 1, \ldots, N\}$, the goal of Bayesian regression is to make a prediction at a new input $x$, computing $p(y \mid x, \mathcal{D})$.
Parametric approach: model $x_n, y_n \mid \theta \sim p(x, y \mid \theta)$, assuming a parametric representation $f(\cdot) = f_\theta(\cdot)$. Prior over parameters: $p(\theta)$. Posterior over parameters: $p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}$. Prediction: $p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta$.
Nonparametric approach: model $x_n, y_n \mid f$, without a parametric representation of $f(\cdot)$. Prior over the function: $f \sim p(f)$. Posterior over the function: $p(f \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid f)\, p(f)}{p(\mathcal{D})}$. Prediction: $p(y \mid x, \mathcal{D}) = \int p(y \mid x, f)\, p(f \mid \mathcal{D})\, df$.
GP regression infers $p(f \mid \mathcal{D})$ instead of $p(\theta \mid \mathcal{D})$.

10 Gaussian Processes
Definition: A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution.
A Gaussian process is a generalization of the multivariate Gaussian distribution to infinitely many variables. A GP defines a distribution over functions of the form $f : \mathcal{X} \to \mathbb{R}$, which is completely specified by a mean function $\mu(x)$ and a covariance function $k(x, x')$:
$$f(x) \sim \mathcal{GP}\left(\mu(x), k(x, x')\right),$$
where
$$\mu(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\left[(f(x) - \mu(x))\,(f(x') - \mu(x'))\right] = \sigma_f^2 \exp\left\{-\frac{1}{2l^2}\,\|x - x'\|^2\right\},$$
which is referred to as the squared exponential kernel; $l$ is a length-scale parameter that controls the rate of decay of the covariance, and $\sigma_f^2$ controls the prior variance (signal variance).
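A brief sketch (not from the slides) of the squared exponential covariance function and of drawing sample functions from the zero-mean GP prior; the helper name and the jitter term are assumptions.

```python
import numpy as np

def se_kernel(X1, X2, length_scale=1.0, sigma_f=1.0):
    """Squared exponential covariance: sigma_f^2 * exp(-||x - x'||^2 / (2 l^2))."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return sigma_f**2 * np.exp(-0.5 * sq / length_scale**2)

# Draw a few sample functions from the GP prior f(x) ~ GP(0, k) on a grid.
xs = np.linspace(-5, 5, 100)[:, None]
K = se_kernel(xs, xs) + 1e-8 * np.eye(100)   # small jitter for numerical stability
samples = np.random.default_rng(0).multivariate_normal(np.zeros(100), K, size=3)
```

Each row of samples is one function drawn from the prior; a larger length_scale yields smoother draws, and sigma_f scales their amplitude.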

11 Gaussian Process Regression
Model: $y_i = f(x_i) + \epsilon_i$, where $f(\cdot)$ is referred to as the latent function.
Latent vector: $\mathbf{f} = [f(x_1), \ldots, f(x_N)]^\top \in \mathbb{R}^N$. Note that in the GPR model the parameters are the function itself.
Gaussian likelihood: $p(y \mid X, \mathbf{f}) = \mathcal{N}(y \mid \mathbf{f}, \sigma^2 I)$.
Gaussian process prior (zero mean): $f(x) \sim \mathcal{GP}(0, k(x, x'))$, so that $p(\mathbf{f} \mid X) = \mathcal{N}(\mathbf{f} \mid 0, K)$, where $K = [k(x_i, x_j)] \in \mathbb{R}^{N \times N}$.

12 We are interested in $p(\mathbf{f}_* \mid X_*, X, y)$ for given test data $X_*$, leading to the predictive distribution $p(y_* \mid X_*, X, y)$. This is nothing but computing the posterior over $\mathbf{f}$, combining the Gaussian likelihood with the GP prior. It can be computed analytically.

13 Gaussian process posterior:
$$f(x) \mid X, y \sim \mathcal{GP}\left(\bar{\mu}(x), \bar{k}(x, x')\right),$$
where
$$\bar{\mu}(x) = k(x, X)\left[k(X, X) + \sigma^2 I\right]^{-1} y, \qquad \bar{k}(x, x') = k(x, x') - k(x, X)\left[k(X, X) + \sigma^2 I\right]^{-1} k(X, x').$$
Predictive distribution at a test input $x_*$:
$$p(f_* \mid X, y, x_*) = \mathcal{N}\left(k(x_*, X)\left[k(X, X) + \sigma^2 I\right]^{-1} y,\; k(x_*, x_*) - k(x_*, X)\left[k(X, X) + \sigma^2 I\right]^{-1} k(X, x_*)\right),$$
where $k(x_*, X) = [k(x_*, x_1), \ldots, k(x_*, x_N)] \in \mathbb{R}^{1 \times N}$ and $k(X, X) = [k(x_i, x_j)] \in \mathbb{R}^{N \times N}$.

14 [Figure source: Rasmussen and Williams]

15 GP regression is a linear predictor in the sense that the prediction at $x_*$ is obtained via
$$\mathbb{E}[f_* \mid \mathcal{D}] = \sum_{n=1}^{N} \alpha_n\, k(x_*, x_n), \qquad \text{where } \alpha = \left[\sigma^2 I + k(X, X)\right]^{-1} y.$$

16 Algorithm Outline
Algorithm 1 GP Regression
Input: training dataset $\mathcal{D} = \{(x_n, y_n) \mid n = 1, \ldots, N\}$, test input $x_*$, covariance function $k(\cdot, \cdot)$, and noise level $\sigma^2$
1: Compute $K = [k(x_i, x_j)]$ and $k_* = [k(x_1, x_*), \ldots, k(x_N, x_*)]^\top$
2: $L = \mathrm{Cholesky}(K + \sigma^2 I)$
3: $\alpha = L^\top \backslash (L \backslash y)$
4: Compute the predictive mean: $\mathbb{E}[f_*] = k_*^\top \alpha$
5: $v = L \backslash k_*$
6: Compute the predictive variance: $\mathrm{var}(f_*) = k(x_*, x_*) - v^\top v$
7: Compute the marginal log-likelihood: $\log p(y \mid X) = -\frac{1}{2} y^\top \alpha - \sum_n \log L_{nn} - \frac{N}{2} \log 2\pi$
8: return $\mathbb{E}[f_*]$, $\mathrm{var}(f_*)$, $\log p(y \mid X)$
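A minimal NumPy/SciPy sketch of Algorithm 1 (not from the slides), assuming a squared exponential kernel; all names and the toy data are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def se_kernel(A, B, length_scale=1.0, sigma_f=1.0):
    """Squared exponential kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * sq / length_scale**2)

def gp_regression(X, y, x_star, sigma=0.1, length_scale=1.0, sigma_f=1.0):
    """Algorithm 1: predictive mean/variance at x_star and the log marginal likelihood."""
    N = X.shape[0]
    K = se_kernel(X, X, length_scale, sigma_f)                            # line 1
    k_star = se_kernel(X, x_star[None, :], length_scale, sigma_f).ravel()
    L = cholesky(K + sigma**2 * np.eye(N), lower=True)                    # line 2
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True))     # line 3
    mean = k_star @ alpha                                                 # line 4
    v = solve_triangular(L, k_star, lower=True)                           # line 5
    var = sigma_f**2 - v @ v                                              # line 6: k(x*, x*) - v^T v
    log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * N * np.log(2 * np.pi)  # line 7
    return mean, var, log_ml                                              # line 8

# Example: noisy sine data, prediction at x* = 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
print(gp_regression(X, y, np.array([0.5])))
```

The two triangular solves on line 3 realize the $x = L^\top \backslash (L \backslash y)$ idiom of the next slide and avoid forming an explicit matrix inverse.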

17 Cholesky Decomposition
The Cholesky decomposition of a symmetric, positive-definite matrix $A$ factorizes $A$ into the product of a lower triangular matrix $L$ and its transpose: $A = L L^\top$, where $L$ is called the Cholesky factor. To solve $Ax = b$ for $x$, first solve the triangular system $Ly = b$ by forward substitution and then the triangular system $L^\top x = y$ by back substitution. We write the solution as $x = L^\top \backslash (L \backslash b)$.

18 GP Regression: Detailed Derivation
Let $\mathbf{f}_* \in \mathbb{R}^T$ be the latent function values evaluated at the test data points $X_* \in \mathbb{R}^{D \times T}$. We first write the joint distribution of the observed target values and the function values at the test locations under the prior:
$$\begin{bmatrix} y \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k(X, X) + \sigma^2 I_N & k(X, X_*) \\ k(X_*, X) & k(X_*, X_*) \end{bmatrix}\right).$$
It follows from the Gaussian identity that
$$\mathbf{f}_* \mid y, X, X_* \sim \mathcal{N}\left(k(X_*, X)\left[k(X, X) + \sigma^2 I_N\right]^{-1} y,\; k(X_*, X_*) - k(X_*, X)\left[k(X, X) + \sigma^2 I_N\right]^{-1} k(X, X_*)\right),$$
leading to
$$p(y_* \mid y, X, X_*) = \mathcal{N}\left(k(X_*, X)\left[k(X, X) + \sigma^2 I_N\right]^{-1} y,\; k(X_*, X_*) - k(X_*, X)\left[k(X, X) + \sigma^2 I_N\right]^{-1} k(X, X_*) + \sigma^2 I_T\right).$$

19 Gaussian Identity
A $D$-dimensional Gaussian density for $x$ is
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right\}.$$
Define the augmented vector $y = [x^\top, z^\top]^\top$, which is jointly normal, i.e.,
$$y = \begin{bmatrix} x \\ z \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix}\right).$$
The marginal densities are $x \sim \mathcal{N}(a, A)$ and $z \sim \mathcal{N}(b, B)$. The conditional distributions are
$$p(x \mid z) = \mathcal{N}\left(a + C B^{-1}(z - b),\; A - C B^{-1} C^\top\right), \qquad p(z \mid x) = \mathcal{N}\left(b + C^\top A^{-1}(x - a),\; B - C^\top A^{-1} C\right).$$
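A small NumPy check (not from the slides) of the conditioning formula, using the fact that the conditional covariance of x given z equals the inverse of the x-block of the joint precision matrix; all variable names and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint covariance [[A, C], [C^T, B]] for (x, z) with dim(x) = 2, dim(z) = 3.
M = rng.standard_normal((5, 5))
Sigma = M @ M.T + 5 * np.eye(5)
a, b = rng.standard_normal(2), rng.standard_normal(3)
A, C, B = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]

z = rng.standard_normal(3)                       # an arbitrary observed value of z
cond_mean = a + C @ np.linalg.solve(B, z - b)    # a + C B^{-1} (z - b)
cond_cov = A - C @ np.linalg.solve(B, C.T)       # A - C B^{-1} C^T
print(cond_mean)

# Check: cov(x | z) equals the inverse of the x-block of the joint precision matrix.
Lambda = np.linalg.inv(Sigma)
assert np.allclose(cond_cov, np.linalg.inv(Lambda[:2, :2]))
```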

20 Pros and Cons
Pros: GPs provide fully probabilistic predictive distributions, including estimates of the uncertainty of the predictions. The evidence framework applied to GPs makes it possible to learn the hyperparameters of the kernel (marginal likelihood maximization).
Cons: Computational complexity grows as $O(N^3)$ (for a naïve implementation).

21 Marginal Likelihood (Evidence)
The marginal likelihood is the integral of the likelihood times the prior (marginalization over the function values $\mathbf{f}$):
$$p(y \mid X) = \int \underbrace{p(y \mid \mathbf{f}, X)}_{\text{likelihood}}\, \underbrace{p(\mathbf{f} \mid X)}_{\text{prior}}\, d\mathbf{f},$$
where $p(y \mid \mathbf{f}, X) = \mathcal{N}(y \mid \mathbf{f}, \sigma^2 I)$ and $p(\mathbf{f} \mid X) = \mathcal{N}(\mathbf{f} \mid 0, K)$. Performing the Gaussian integration yields
$$p(y \mid X) = \mathcal{N}(y \mid 0, K + \sigma^2 I).$$
Thus, the marginal log-likelihood is
$$\log p(y \mid X) = -\frac{N}{2} \log 2\pi - \underbrace{\frac{1}{2} \log \left|K + \sigma^2 I\right|}_{\text{model complexity}} - \underbrace{\frac{1}{2}\, y^\top \left(K + \sigma^2 I\right)^{-1} y}_{\text{data fit}}.$$

22 Figure: GP fits for hyperparameters $(l, \sigma_f, \sigma)$ = (a) (1, 1, 0.1); (b) (0.3, 1.08, ...); (c) (3.0, 1.16, 0.89). [Figure source: Murphy's Fig. 15.3]

23 The marginal likelihood tells us the probability of the observations given the assumptions of the model. Hyperparameters are determined by maximizing the marginal log-likelihood. Alternatively, cross-validation can be used for hyperparameter estimation (leave-one-out predictive probability, a.k.a. pseudo-likelihood). Sparse approximations for GPs will be covered in other lectures (possibly in the fall semester).
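A sketch (not from the slides) of hyperparameter selection by maximizing the marginal log-likelihood for the squared exponential kernel: scipy.optimize.minimize is applied to the negative log marginal likelihood over log-parameters so that l, sigma_f, and sigma stay positive. It omits jitter and failure handling, so it is only illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """-log p(y | X) for the SE kernel; log_params = (log l, log sigma_f, log sigma)."""
    l, sigma_f, sigma = np.exp(log_params)
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = sigma_f**2 * np.exp(-0.5 * sq / l**2)
    N = len(y)
    L = cholesky(K + sigma**2 * np.eye(N), lower=True)
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * N * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]), args=(X, y))
print("learned (l, sigma_f, sigma):", np.exp(res.x))
```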

24 Link with Bayesian Linear Regression

25 Bayesian Linear Regression
Given a set of $N$ training examples $\mathcal{D} = \{(x_n, y_n) \mid n = 1, \ldots, N\}$ and assuming Gaussian noise $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$, the linear regression model is described as
$$y_n = f(x_n) + \epsilon_n = \theta^\top x_n + \epsilon_n,$$
or, in compact form, $y = X^\top \theta + \epsilon$, where $X \in \mathbb{R}^{D \times N}$ is the design matrix.
Gaussian prior over $\theta$: $\theta \sim \mathcal{N}(0, \Sigma_0)$.
Gaussian likelihood:
$$p(y \mid X, \theta) = \prod_{n=1}^{N} p(y_n \mid x_n, \theta) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left(y_n - \theta^\top x_n\right)^2\right\} = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{N} \exp\left\{-\frac{1}{2\sigma^2}\left\|y - X^\top \theta\right\|_2^2\right\} = \mathcal{N}\left(X^\top \theta, \sigma^2 I\right).$$

26 Calculate the posterior over $\theta$:
$$p(\theta \mid X, y) = \frac{p(y \mid X, \theta)\, p(\theta)}{\int p(y \mid X, \theta)\, p(\theta)\, d\theta} = \frac{p(y \mid X, \theta)\, p(\theta)}{p(y \mid X)}$$
$$\propto \exp\left\{-\frac{1}{2\sigma^2}\left(y - X^\top \theta\right)^\top \left(y - X^\top \theta\right)\right\} \exp\left\{-\frac{1}{2}\, \theta^\top \Sigma_0^{-1} \theta\right\} \propto \exp\left\{-\frac{1}{2}\left(\theta - \theta_N\right)^\top \Sigma_N^{-1} \left(\theta - \theta_N\right)\right\},$$
where
$$\theta_N = \frac{1}{\sigma^2} \Sigma_N X y, \qquad \Sigma_N = \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} X X^\top\right)^{-1}.$$
Hence, the posterior $p(\theta \mid X, y)$ is also Gaussian: $p(\theta \mid X, y) = \mathcal{N}(\theta_N, \Sigma_N)$.

27 Given a new input $x_*$, the predictive distribution of $f_* = f(x_*)$ is calculated as
$$p(f_* \mid x_*, X, y) = \int p(f_* \mid x_*, \theta)\, p(\theta \mid X, y)\, d\theta,$$
whose mean is $\mathbb{E}_{\theta \mid X, y}\left[x_*^\top \theta\right]$. Hence,
$$p(f_* \mid x_*, X, y) = \mathcal{N}\left(x_*^\top \theta_N,\; x_*^\top \Sigma_N x_*\right),$$
where
$$\theta_N = \frac{1}{\sigma^2} \Sigma_N X y, \qquad \Sigma_N = \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} X X^\top\right)^{-1}.$$
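A small NumPy sketch (illustrative, not from the slides) of the posterior and predictive formulas above, keeping the slides' convention that the design matrix X ∈ R^{D×N} holds one training input per column.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, sigma = 2, 50, 0.1
theta_true = np.array([0.5, -1.0])

X = rng.standard_normal((D, N))                     # design matrix, one column per example
y = X.T @ theta_true + sigma * rng.standard_normal(N)

Sigma0 = np.eye(D)                                  # prior: theta ~ N(0, Sigma0)
Sigma_N = np.linalg.inv(np.linalg.inv(Sigma0) + X @ X.T / sigma**2)
theta_N = Sigma_N @ X @ y / sigma**2                # posterior mean

x_star = np.array([1.0, 2.0])                       # new input
pred_mean = x_star @ theta_N                        # x_*^T theta_N
pred_var = x_star @ Sigma_N @ x_star                # x_*^T Sigma_N x_* (noise-free f_*)
print(pred_mean, pred_var)
```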

28 Bayesian linear regression with $f(x) = w_1 + w_2 x$: (a) Gaussian prior over $w_1$ and $w_2$; (b) three training points (superimposed on the data are the predictive mean plus/minus two standard deviations of the noise-free predictive distribution $p(f_* \mid x_*, X, y)$); (c) likelihood; (d) posterior over $w_1$ and $w_2$. [Figure source: Rasmussen and Williams]

29 Increasing Expressiveness
Use a set of basis functions $\phi(x) = [\phi_1(x), \ldots, \phi_M(x)]^\top$ to project a $D$-dimensional input $x \in \mathbb{R}^D$ into an $M$-dimensional feature space, $\phi : \mathbb{R}^D \to \mathbb{R}^M$. The regression function is written as $f(x) = \phi(x)^\top \theta$. The design matrix is $\Phi \in \mathbb{R}^{M \times N}$. The predictive distribution $p(f_* \mid \phi_*, \Phi, y)$ is computed in feature space:
$$p(f_* \mid \phi_*, \Phi, y) = \mathcal{N}(\mu_*, \sigma_*^2),$$
where
$$\mu_* = \frac{1}{\sigma^2}\, \phi_*^\top \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} \Phi \Phi^\top\right)^{-1} \Phi y, \qquad \sigma_*^2 = \phi_*^\top \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} \Phi \Phi^\top\right)^{-1} \phi_*.$$

30 Now we show that the predictive distribution $p(f_* \mid \phi_*, \Phi, y)$ can be expressed in terms of inner products in feature space (with $K = \Phi^\top \Sigma_0 \Phi$):
$$p(f_* \mid \phi_*, \Phi, y) = \mathcal{N}(\mu_*, \sigma_*^2),$$
where
$$\mu_* = \phi_*^\top \Sigma_0 \Phi \left(K + \sigma^2 I\right)^{-1} y, \qquad \sigma_*^2 = \phi_*^\top \Sigma_0 \phi_* - \phi_*^\top \Sigma_0 \Phi \left(K + \sigma^2 I\right)^{-1} \Phi^\top \Sigma_0 \phi_*.$$
Recall our earlier result for GP regression:
$$f_* \mid y, X, x_* \sim \mathcal{N}\left(k(x_*, X)\left[k(X, X) + \sigma^2 I\right]^{-1} y,\; k(x_*, x_*) - k(x_*, X)\left[k(X, X) + \sigma^2 I\right]^{-1} k(X, x_*)\right).$$
GP regression is Bayesian linear regression leveraged with the kernel trick.
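A quick numerical check (not from the slides) that the weight-space and kernel (function-space) forms above agree, using a random feature matrix Φ and the induced kernel K = Φ^T Σ0 Φ; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, sigma = 4, 20, 0.3
Phi = rng.standard_normal((M, N))      # training features, one column per example
phi_star = rng.standard_normal(M)      # test feature vector
y = rng.standard_normal(N)
Sigma0 = np.eye(M)

# Weight-space form.
A = np.linalg.inv(Sigma0) + Phi @ Phi.T / sigma**2
mu_w = phi_star @ np.linalg.solve(A, Phi @ y) / sigma**2
var_w = phi_star @ np.linalg.solve(A, phi_star)

# Kernel form with K = Phi^T Sigma0 Phi.
K = Phi.T @ Sigma0 @ Phi
k_star = Phi.T @ Sigma0 @ phi_star
G = K + sigma**2 * np.eye(N)
mu_f = k_star @ np.linalg.solve(G, y)
var_f = phi_star @ Sigma0 @ phi_star - k_star @ np.linalg.solve(G, k_star)

assert np.allclose(mu_w, mu_f) and np.allclose(var_w, var_f)
```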

31 Detailed Calculation: Weight-Space View
Define
$$K = \Phi^\top \Sigma_0 \Phi, \qquad \Sigma_N = \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} \Phi \Phi^\top\right)^{-1}.$$
Then
$$\Sigma_N^{-1} \Sigma_0 \Phi = \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} \Phi \Phi^\top\right) \Sigma_0 \Phi = \Phi + \frac{1}{\sigma^2} \Phi \Phi^\top \Sigma_0 \Phi = \Phi\left(I + \frac{1}{\sigma^2} K\right) = \frac{1}{\sigma^2} \Phi\left(\sigma^2 I + K\right).$$
Premultiply by $\Sigma_N$ and postmultiply by $(K + \sigma^2 I)^{-1}$ to obtain
$$\Sigma_0 \Phi\left(\sigma^2 I + K\right)^{-1} = \Sigma_N \left(\Sigma_N^{-1} \Sigma_0 \Phi\right)\left(\sigma^2 I + K\right)^{-1} = \Sigma_N \left[\frac{1}{\sigma^2} \Phi\left(\sigma^2 I + K\right)\right]\left(\sigma^2 I + K\right)^{-1} = \frac{1}{\sigma^2} \Sigma_N \Phi.$$

32 Thus, we have
$$\frac{1}{\sigma^2} \Sigma_N \Phi = \Sigma_0 \Phi\left(\sigma^2 I + K\right)^{-1}.$$
With this result, we have
$$\mu_* = \frac{1}{\sigma^2}\, \phi_*^\top \Sigma_N \Phi y = \phi_*^\top \Sigma_0 \Phi\left(\sigma^2 I + K\right)^{-1} y.$$
Apply the matrix inversion lemma,
$$(A + BCD)^{-1} = A^{-1} - A^{-1} B\left(C^{-1} + D A^{-1} B\right)^{-1} D A^{-1},$$
to obtain
$$\sigma_*^2 = \phi_*^\top \Sigma_N \phi_* = \phi_*^\top \left(\Sigma_0^{-1} + \frac{1}{\sigma^2} \Phi \Phi^\top\right)^{-1} \phi_* = \phi_*^\top \left[\Sigma_0 - \Sigma_0 \Phi\left(\sigma^2 I + \Phi^\top \Sigma_0 \Phi\right)^{-1} \Phi^\top \Sigma_0\right] \phi_* = \phi_*^\top \Sigma_0 \phi_* - \phi_*^\top \Sigma_0 \Phi\left(K + \sigma^2 I\right)^{-1} \Phi^\top \Sigma_0 \phi_*. \quad \text{QED}$$

33 How many basis functions? Recall $\phi : \mathbb{R}^D \to \mathbb{R}^M$ ($M$ could be infinite). Kernels are inner products in a feature space. For instance,
$$k(x, y) = e^{-(x - y)^2} = e^{-x^2}\, e^{-y^2} \underbrace{\sum_{k=0}^{\infty} \frac{2^k x^k y^k}{k!}}_{e^{2xy}},$$
an inner product between infinitely many features of $x$ and of $y$.
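A tiny check (not from the slides) that a truncated version of this expansion approaches the kernel value, illustrating that e^{-(x-y)^2} is an inner product over infinitely many features; names are illustrative.

```python
import numpy as np
from math import factorial

def truncated_kernel(x, y, terms=30):
    """e^{-x^2} e^{-y^2} * sum_{k < terms} (2 x y)^k / k!, which tends to e^{-(x-y)^2}."""
    series = sum((2 * x * y) ** k / factorial(k) for k in range(terms))
    return np.exp(-x**2) * np.exp(-y**2) * series

x, y = 0.7, -1.2
print(truncated_kernel(x, y), np.exp(-(x - y) ** 2))   # the two values agree to many digits
```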

34 References
C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
C. E. Rasmussen, Advances in Gaussian Processes, NIPS-2006 Tutorial.
