Lecture 1c: Gaussian Processes for Regression


1 Lecture 1c: Gaussian Processes for Regression. Cédric Archambeau, Centre for Computational Statistics and Machine Learning, Department of Computer Science, University College London. Advanced Topics in Machine Learning (MSc in Intelligent Systems), January 2008.

2 Today's plan
The equivalent kernel
Definition of a Gaussian process
Sampling from a Gaussian process
Gaussian processes for regression
Parameter inference
Automatic relevance determination
Covariance functions
Sparse extensions
Non-Gaussian likelihoods

3 Probabilistic linear regression and the equivalent kernel
The predictive mean is defined by a weighted combination of the targets:
$$ y(x, \mu_w) = \mu_w^\top \phi(x) = \sigma^{-2}\, t^\top \Phi\, \Sigma_w \phi(x) = \sum_n \sigma^{-2} \phi^\top(x_n)\, \Sigma_w \phi(x)\, t_n \equiv \sum_n k(x, x_n)\, t_n. $$
The equivalent kernel $k(x, x')$ is implicitly defined in terms of the basis functions $\phi(\cdot)$ and is data dependent through $\Sigma_w$. It can be reformulated in terms of an inner product:
$$ k(x, x') = \psi^\top(x)\, \psi(x'), \qquad \psi(x) \equiv \sigma^{-1} \Sigma_w^{1/2} \phi(x). $$
It determines the correlation between (often nearby) input pairs:
$$ \langle y(x, w)\, y(x', w) \rangle = \phi^\top(x)\, \langle w w^\top \rangle\, \phi(x') = \sigma^2 k(x, x'). $$
The idea is to define the covariance function or kernel directly instead of choosing basis functions which induce an implicit kernel.
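
As a small aside, here is a hedged numerical sketch of the equivalent kernel for a Bayesian linear regression model (cf. Lecture 1b). It is not taken from the slides: the Gaussian basis functions, their centres, the prior precision `alpha` and the noise variance `sigma2` are illustrative assumptions, with the posterior weight covariance taken as $\Sigma_w = (\alpha I + \sigma^{-2}\Phi^\top\Phi)^{-1}$.

```python
# Minimal sketch (illustrative, not from the slides): the equivalent kernel of
# Bayesian linear regression with Gaussian basis functions. The basis centres,
# the prior precision alpha and the noise variance sigma2 are assumed values.
import numpy as np

def phi(x, centres, width=0.3):
    """Gaussian basis functions evaluated at the 1-d inputs x, shape (len(x), M)."""
    return np.exp(-0.5 * (x[:, None] - centres[None, :])**2 / width**2)

alpha, sigma2 = 1.0, 0.05                      # assumed prior precision and noise variance
centres = np.linspace(-1.0, 1.0, 20)           # assumed basis centres
x_train = np.linspace(-1.0, 1.0, 50)           # training inputs

Phi = phi(x_train, centres)                    # N x M design matrix
Sigma_w = np.linalg.inv(alpha * np.eye(len(centres)) + Phi.T @ Phi / sigma2)

def equivalent_kernel(x, x_prime):
    """k(x, x') = sigma^-2 phi(x)^T Sigma_w phi(x'): data dependent through Sigma_w."""
    return phi(x, centres) @ Sigma_w @ phi(x_prime, centres).T / sigma2

# The predictive mean at x is sum_n k(x, x_n) t_n; the weights k(x, x_n) are
# largest for training inputs x_n close to x.
weights = equivalent_kernel(np.array([0.0]), x_train)   # shape (1, 50)
```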

4 Gaussian process
A multivariate Gaussian distribution: defines a probability density (based on correlations) over $D$ random variables; is defined by a mean vector $\mu$ and a covariance matrix $\Sigma$:
$$ y \equiv (y_1, \ldots, y_D)^\top \sim \mathcal{N}(\mu, \Sigma). $$
A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables. It defines a probability measure over random functions. (Informally, a function can be viewed as an infinitely long vector.) It is defined by a mean function $m(x)$ and a covariance function $k(x, x')$:
$$ y(\cdot) \sim \mathcal{GP}(m(\cdot), k(\cdot, \cdot)). $$
The joint distribution over a finite subset of variables is a consistent finite-dimensional Gaussian! (see Lecture 1a)

5 Example of a covariance function
The squared exponential kernel is defined as
$$ k(x, x') = c \exp\left\{ -\frac{(x - x')^2}{2 l^2} \right\}, $$
where $c > 0$ and $l > 0$ are hyperparameters. It is a valid kernel as it leads to a positive semidefinite Gram matrix $K \in \mathbb{R}^{N \times N}$ for any possible choice of the set $\{x_n\}_{n=1}^N$. It is a stationary kernel, i.e. it depends only on the difference $x - x'$. It corresponds to projecting the input data into an infinite-dimensional feature space (see e.g. Shawe-Taylor and Cristianini, 2004). Alternatively, it corresponds to using an infinite number of basis functions (not just ones centred on the training points).
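
A minimal sketch of this kernel in code, assuming the $c\exp\{-(x - x')^2/(2l^2)\}$ parametrisation above: it builds the Gram matrix for an arbitrary input set and checks numerically that it is positive semidefinite.

```python
# Sketch: squared exponential Gram matrix K for a set of 1-d inputs, plus a
# numerical positive semidefiniteness check (eigenvalues >= 0 up to round-off).
import numpy as np

def squared_exponential(X, X2, c=1.0, l=0.5):
    """k(x, x') = c * exp(-(x - x')^2 / (2 l^2)) for 1-d inputs X, X2."""
    return c * np.exp(-0.5 * (X[:, None] - X2[None, :])**2 / l**2)

X = np.linspace(-1.0, 1.0, 100)                # any choice of input set works
K = squared_exponential(X, X)
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())   # ~0, never significantly negative
```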

6 Consider a basis function which is an infinite sum of squared exponentials weighted by Gaussian random variables:
$$ \psi(x) = \int w(u)\, e^{-(x - u)^2}\, du, \qquad \text{where } w(u) \sim \mathcal{N}(0, 1) \text{ for all } u. $$
The resulting covariance function defines the squared exponential kernel:
$$ k(x, x') = \langle \psi(x)\, \psi(x') \rangle = \int e^{-(x - u)^2} e^{-(x' - u)^2}\, du \;\propto\; e^{-(x - x')^2 / 2} \quad \text{(convolution)}. $$

7 Sampling random functions from a Gaussian process
Sequential sampling:
$$ p(y) = \prod_{n \geq 1} p(y_n \mid y_{\setminus n}) = \prod_{n \geq 1} \mathcal{N}(y_n; \tilde{m}_n, \tilde{\sigma}_n^2), \qquad y_{\setminus n} \equiv (y_{n-1}, \ldots, y_1)^\top. $$
Repeat for $n = 1, 2, \ldots$: generate $x_n$; draw a sample $z_n$ from $\mathcal{N}(0, 1)$; compute the function value associated to $x_n$ using $y_n = \tilde{\sigma}_n z_n + \tilde{m}_n$.
Batch sampling: $y \sim \mathcal{N}(m, K)$. Generate a set of inputs $\{x_n\}_{n=1}^N$; draw a vector $z$ of $N$ samples from $\mathcal{N}(0, 1)$; compute the function values using $y = L^\top z + m$, where $L$ is the upper triangular Cholesky factor of the kernel matrix $K$ (i.e. $K = L^\top L$).
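
The batch recipe translates directly into a few lines of code. This is a hedged sketch (the jitter term and the use of numpy's lower-triangular Cholesky factor are implementation details, not from the slides); with the lower factor $L$ satisfying $K = LL^\top$ one uses $y = Lz + m$, which is equivalent to $y = L^\top z + m$ with the upper factor.

```python
# Sketch of batch sampling: draw y ~ N(0, K) by transforming standard normal
# samples with a Cholesky factor of the kernel matrix K.
import numpy as np

def sample_gp_prior(X, kernel, n_samples=3, jitter=1e-10):
    K = kernel(X, X) + jitter * np.eye(len(X))   # jitter for numerical stability
    L = np.linalg.cholesky(K)                    # lower triangular, K = L @ L.T
    Z = np.random.randn(len(X), n_samples)       # i.i.d. N(0, 1) samples
    return L @ Z                                 # zero mean function assumed

se = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :])**2 / 0.5**2)
X = np.linspace(-1.0, 1.0, 200)
samples = sample_gp_prior(X, se)                 # three random functions, shape (200, 3)
```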

8 The function values $y_n$ and $y_{\setminus n}$ are jointly Gaussian:
$$ p(y_n, y_{\setminus n}) = \mathcal{N}\!\left( \begin{pmatrix} m(x_n) \\ m_{\setminus n} \end{pmatrix}, \begin{pmatrix} k(x_n, x_n) & k_n^\top \\ k_n & K_{\setminus n} \end{pmatrix} \right) = \mathcal{N}(m, K). $$
The conditional $p(y_n \mid y_{\setminus n})$ is then also Gaussian, with the conditional mean and the conditional variance respectively given by
$$ \tilde{m}_n = m(x_n) + k_n^\top K_{\setminus n}^{-1} (y_{\setminus n} - m_{\setminus n}), \qquad \tilde{\sigma}_n^2 = k(x_n, x_n) - k_n^\top K_{\setminus n}^{-1} k_n. $$

9 Example (demo: gpsampl fun)
Figure: Three random functions generated from a GP with $m(x) = 0$ and a squared exponential covariance function ($c = 1$ and $l = 0.5$).

10 Gaussian processes for regression
The choice of the kernel defines a prior process (and a prior measure over functions):
$$ y(\cdot) \sim \mathcal{GP}(0, k(\cdot, \cdot)). $$
We assume a finite number of observations and iid Gaussian noise. The likelihood is given by
$$ t \mid y, \sigma^2 \sim \mathcal{N}(y, \sigma^2 I_N), $$
where $y \equiv (y(x_1), \ldots, y(x_N))^\top$ are the latent function values. The posterior process is again a Gaussian process:
$$ y(\cdot) \mid t, \sigma^2 \sim \mathcal{GP}(\tilde{m}(\cdot), \tilde{k}(\cdot, \cdot)), $$
where
$$ \tilde{m}(x) = k^\top(x)(K + \sigma^2 I_N)^{-1} t, \qquad \tilde{k}(x, x') = k(x, x') - k^\top(x)(K + \sigma^2 I_N)^{-1} k(x'). $$

11 Any latent function value $y(x)$ is jointly Gaussian with the finite subset $y$:
$$ p(y, y(x)) = \mathcal{N}\!\left( 0, \begin{pmatrix} K & k(x) \\ k^\top(x) & k(x, x) \end{pmatrix} \right), \qquad k(x) \equiv (k(x, x_1), \ldots, k(x, x_N))^\top. $$
The mean and the variance of the conditional Gaussian $p(y(x) \mid y)$ are given by
$$ \mu(x) = k^\top(x) K^{-1} y, \qquad \kappa(x, x) = k(x, x) - k^\top(x) K^{-1} k(x). $$
We have $p(y) = \mathcal{N}(0, K)$ and $p(t \mid y) = \mathcal{N}(y, \sigma^2 I_N)$, such that
$$ p(y \mid t) = \mathcal{N}(\sigma^{-2} \Sigma t, \Sigma), \qquad \Sigma = (K^{-1} + \sigma^{-2} I_N)^{-1}. $$
Hence, the marginal posterior $p(y(x) \mid t) = \int p(y(x) \mid y)\, p(y \mid t)\, dy$ is a Gaussian with mean and variance given by
$$ \tilde{m}(x) = k^\top(x)(K + \sigma^2 I_N)^{-1} t, \qquad \tilde{k}(x, x) = k(x, x) - k^\top(x)(K + \sigma^2 I_N)^{-1} k(x), $$
where the Woodbury identity was invoked.
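
A compact sketch of these predictive equations, using Cholesky-based solves instead of explicit matrix inversion (a standard numerical choice, not something the slides prescribe); the toy data and kernel settings are assumptions.

```python
# Sketch of GP regression prediction: mean k(x)^T (K + s2 I)^-1 t and
# variance k(x, x) - k(x)^T (K + s2 I)^-1 k(x) at a set of test inputs.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_predict(X_train, t, X_test, kernel, sigma2):
    K = kernel(X_train, X_train) + sigma2 * np.eye(len(X_train))
    Ks = kernel(X_train, X_test)                    # columns are k(x) for each test input
    chol = cho_factor(K)
    mean = Ks.T @ cho_solve(chol, t)                # k(x)^T (K + s2 I)^-1 t
    V = cho_solve(chol, Ks)                         # (K + s2 I)^-1 k(x)
    var = np.diag(kernel(X_test, X_test)) - np.sum(Ks * V, axis=0)
    return mean, var                                # latent variance; add sigma2 for targets

se = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :])**2 / 0.5**2)
X = np.linspace(-1.0, 1.0, 20)
t = np.sin(3.0 * X) + 0.1 * np.random.randn(20)     # assumed toy targets
mu, var = gp_predict(X, t, np.linspace(-1.0, 1.0, 100), se, sigma2=0.01)
```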

12 Example (demo: gpsampl fun)
(a) Prior. (b) Posterior.
Figure: Three random functions generated from (a) the prior GP and (b) the posterior GP. An observation is indicated by a +, the mean function by a dashed line and the 3 standard deviation error bars by the shaded regions. We used a squared exponential covariance function ($c = 1$ and $l = 0.5$).

13 Learning the parameters by type II ML
Let us denote the kernel parameters by $\theta$. We view the latent functions as nuisance parameters and maximise the log-marginal wrt $\sigma^2$ and $\theta$. The log-marginal likelihood is given by
$$ \ln p(t \mid \sigma^2, \theta) = -\tfrac{N}{2} \ln 2\pi \;\underbrace{-\tfrac{1}{2} \ln |K(\theta) + \sigma^2 I_N|}_{\text{complexity penalty}} \;\underbrace{-\tfrac{1}{2}\, t^\top (K(\theta) + \sigma^2 I_N)^{-1} t}_{\text{data fit}}. $$
The noise variance $\sigma^2$ and the kernel parameters $\theta$ can be learned by means of gradient ascent techniques (see Nocedal and Wright):
$$ \frac{\partial}{\partial \sigma^2} \ln p(t \mid \sigma^2, \theta) = -\tfrac{1}{2} \operatorname{tr}\{(K + \sigma^2 I_N)^{-1}\} + \tfrac{1}{2}\, \nu^\top \nu, $$
$$ \frac{\partial}{\partial \theta_k} \ln p(t \mid \sigma^2, \theta) = -\tfrac{1}{2} \operatorname{tr}\left\{ \left( (K + \sigma^2 I_N)^{-1} - \nu \nu^\top \right) \frac{\partial K}{\partial \theta_k} \right\}, $$
where $\nu \equiv (K(\theta) + \sigma^2 I_N)^{-1} t$. The negative log-marginal surface is non-convex (no guarantee of attaining a global minimum) and the computational complexity of its evaluation is $\mathcal{O}(N^3)$.
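
For concreteness, here is a hedged sketch of evaluating this objective with a single Cholesky factorisation (the $\mathcal{O}(N^3)$ step); in practice one would hand its negative, together with the gradients above, to a gradient-based optimiser such as scipy.optimize.minimize over $\log\sigma^2$ and $\log\theta$.

```python
# Sketch: log-marginal likelihood
#   ln p(t | sigma2, theta) = -N/2 ln 2pi - 1/2 ln|K + sigma2 I| - 1/2 t^T (K + sigma2 I)^-1 t.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_marginal_likelihood(t, K, sigma2):
    N = len(t)
    Ky = K + sigma2 * np.eye(N)
    c, lower = cho_factor(Ky)
    nu = cho_solve((c, lower), t)                    # nu = (K + sigma2 I)^-1 t
    log_det = 2.0 * np.sum(np.log(np.diag(c)))       # ln|K + sigma2 I| from the Cholesky factor
    return -0.5 * N * np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * t @ nu
```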

14
$$ p(t \mid \sigma^2, \theta) = \int p(t \mid y, \sigma^2)\, p(y \mid \theta)\, dy = \int \mathcal{N}(t; y, \sigma^2 I_N)\, \mathcal{N}(y; 0, K(\theta))\, dy = \mathcal{N}(t; 0, K(\theta) + \sigma^2 I_N). $$

15 Predictive distribution
The predictive distribution at a new input $x$ for type II ML estimates of the hyperparameters is given by
$$ p(t \mid \mathbf{t}) \approx p(t \mid \mathbf{t}, \sigma^2_{\mathrm{ML}}, \theta_{\mathrm{ML}}) = \mathcal{N}(\tilde{m}_{\mathrm{ML}}(x), \tilde{k}_{\mathrm{ML}}(x, x) + \sigma^2_{\mathrm{ML}}). $$
The predictive variance has three components: the prior variance $k_{\mathrm{ML}}(x, x)$; the subtracted term $k_{\mathrm{ML}}^\top(x)(K_{\mathrm{ML}} + \sigma^2_{\mathrm{ML}} I_N)^{-1} k_{\mathrm{ML}}(x)$, which reduces the prior uncertainty and tells us how much is explained by the data; and the noise $\sigma^2_{\mathrm{ML}}$ on the observations. The predictive variance is independent of the targets!

16
$$ p(t \mid \mathbf{t}, \sigma^2_{\mathrm{ML}}, \theta_{\mathrm{ML}}) = \int p(t \mid y(x), \sigma^2_{\mathrm{ML}})\, \underbrace{p(y(x) \mid \mathbf{t}, \sigma^2_{\mathrm{ML}}, \theta_{\mathrm{ML}})}_{\text{posterior GP}}\, dy(x) $$
$$ = \int \mathcal{N}(t; y(x), \sigma^2_{\mathrm{ML}})\, \mathcal{N}(y(x); \tilde{m}_{\mathrm{ML}}(x), \tilde{k}_{\mathrm{ML}}(x, x))\, dy(x) = \mathcal{N}(t; \tilde{m}_{\mathrm{ML}}(x), \sigma^2_{\mathrm{ML}} + \tilde{k}_{\mathrm{ML}}(x, x)). $$

17 Sinc example revisited
(a) Variational linear regression. (b) GP regression.
Figure: Comparison of the optimal solutions found by (a) variational linear regression with squared exponential basis functions (λ = .495) and by (b) Gaussian process regression with a squared exponential kernel (λ = .84).

18 Automatic relevance determination (ARD)
Can we select the relevant input dimensions from the data? Consider a more general form of the squared exponential kernel:
$$ k(x, x') = c \exp\left\{ -\frac{1}{2} \sum_{d=1}^D \frac{(x_d - x'_d)^2}{l_d^2} \right\}, $$
where the length scales $\{l_d\}_{d=1}^D$ are allowed to be different. The characteristic length scale $l_d$ measures the distance for being uncorrelated along $x_d$. Hence, $x_d$ is not relevant if $1/l_d$ is small. In general, ARD can be implemented by imposing hierarchical priors on the parameters. For example, ARD is used in relevance vector machines for achieving sparsity: a prior with a different precision $\alpha_m$ is imposed on each weight $w_m$.
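
A hedged sketch of this ARD kernel for multi-dimensional inputs; the example length scales are arbitrary, and a very large $l_d$ makes dimension $d$ effectively irrelevant.

```python
# Sketch: ARD squared exponential kernel, one length scale l_d per input dimension.
import numpy as np

def ard_se_kernel(X, X2, c=1.0, lengthscales=None):
    """X: (N, D), X2: (M, D); k(x, x') = c * exp(-0.5 * sum_d (x_d - x'_d)^2 / l_d^2)."""
    D = X.shape[1]
    l = np.ones(D) if lengthscales is None else np.asarray(lengthscales, dtype=float)
    diff = (X[:, None, :] - X2[None, :, :]) / l      # pairwise scaled differences
    return c * np.exp(-0.5 * np.sum(diff**2, axis=-1))

X = np.random.randn(5, 3)
K = ard_se_kernel(X, X, lengthscales=[0.5, 1.0, 1e3])   # third dimension nearly irrelevant
```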

19 Example (from C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006)
Figure: Functions with two-dimensional inputs drawn at random from Gaussian processes with squared exponential covariance functions and three different distance measures. In (a) both input dimensions are equally relevant; in (b) the function varies less rapidly along $x_2$ than along $x_1$, so only $x_1$ is relevant; in (c) the $\Lambda$ column gives the direction of most rapid variation.

20 Covariance functions
In order to be valid, a kernel should satisfy Mercer's condition (see e.g. Shawe-Taylor and Cristianini, 2004). In practice we require the kernel to induce a symmetric and positive semidefinite kernel matrix. Examples of other kernels: non-stationary kernels (e.g. sigmoidal kernel); kernels for structured inputs (e.g. string kernels). Some rules for kernel design:
$$ k(x, x') = c\, k_1(x, x'), \qquad k(x, x') = k_1(x, x') + k_2(x, x'), $$
$$ k(x, x') = k_1(x, x')\, k_2(x, x'), \qquad k(x, x') = f(x)\, k_1(x, x')\, f(x'), \qquad \ldots $$
where $c > 0$ is a constant and $f(\cdot)$ is a deterministic function. An interesting open question is how to learn (the type of) the kernel.
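
The design rules are easy to check numerically. The sketch below (an illustration under the assumption of a squared exponential base kernel $k_1$ and a linear kernel $k_2$) builds each combination and verifies that the resulting Gram matrices stay positive semidefinite.

```python
# Sketch: kernel design rules -- scaling, sum, product and pre/post-multiplication
# by a deterministic function all yield valid kernels.
import numpy as np

def se(A, B, l=0.5):
    return np.exp(-0.5 * (A[:, None] - B[None, :])**2 / l**2)

def linear(A, B):
    return A[:, None] * B[None, :]

X = np.linspace(-1.0, 1.0, 50)
f = lambda x: 1.0 + x**2                              # deterministic function f(.)

combinations = [
    2.0 * se(X, X),                                   # k = c k1
    se(X, X) + linear(X, X),                          # k = k1 + k2
    se(X, X) * linear(X, X),                          # k = k1 k2
    f(X)[:, None] * se(X, X) * f(X)[None, :],         # k = f(x) k1(x, x') f(x')
]
for K in combinations:                                # all remain (numerically) PSD
    assert np.linalg.eigvalsh(K).min() > -1e-8
```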

21 Periodic covariance functions
A periodic signal can be constructed from $x$ using the following warping function: $u(x) = (\sin x, \cos x)^\top$. Plugging $u$ into the squared exponential kernel leads to a periodic kernel:
$$ k(x, x') = c \exp\left\{ -\frac{2 \sin^2\!\big(\tfrac{x - x'}{2}\big)}{l^2} \right\}, $$
where we used the fact that $\|u(x) - u(x')\|^2 = 4 \sin^2\!\big(\tfrac{x - x'}{2}\big)$.
Figure: Three random functions generated with a periodic kernel ($c = 1$ and $l = 0.5$).
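
A short sketch of the warping construction, assuming the squared exponential convention used above; it also verifies the identity $\|u(x) - u(x')\|^2 = 4\sin^2((x - x')/2)$ numerically.

```python
# Sketch: periodic kernel obtained by warping x through u(x) = (sin x, cos x)
# and applying the squared exponential kernel.
import numpy as np

def periodic_kernel(X, X2, c=1.0, l=0.5):
    """c * exp(-2 sin^2((x - x') / 2) / l^2)."""
    return c * np.exp(-2.0 * np.sin(0.5 * (X[:, None] - X2[None, :]))**2 / l**2)

X = np.linspace(0.0, 4.0 * np.pi, 200)
K = periodic_kernel(X, X)

# Check the warping identity ||u(x) - u(x')||^2 = 4 sin^2((x - x') / 2):
u = lambda x: np.stack([np.sin(x), np.cos(x)], axis=-1)
d2 = np.sum((u(X)[:, None, :] - u(X)[None, :, :])**2, axis=-1)
assert np.allclose(d2, 4.0 * np.sin(0.5 * (X[:, None] - X[None, :]))**2)
```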

22 Rational quadratic covariance functions
The rational quadratic kernel is defined as follows:
$$ k(x, x') = \left( 1 + \frac{(x - x')^2}{\nu l^2} \right)^{-\frac{\nu + D}{2}}, $$
where $\nu > 0$ is the shape parameter, $l > 0$ the scale parameter and $D$ is the dimension of the input space. The rational quadratic kernel (or Student-t kernel) corresponds to an infinite mixture of scaled squared exponentials:
$$ \int p(r \mid u, l)\, p(u \mid \nu)\, du = \int \mathcal{N}(r; 0, l^2/u)\, \mathcal{G}(u; \tfrac{\nu}{2}, \tfrac{\nu}{2})\, du \;\propto\; \left( 1 + \frac{r^2}{\nu l^2} \right)^{-\frac{\nu + D}{2}}, $$
where $r \equiv x - x'$. The shape parameter $\nu$ defines the thickness of the kernel tails. The squared exponential is recovered for $\nu \to \infty$.
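
A hedged one-dimensional sketch of this kernel, with a numerical check that it approaches the squared exponential as $\nu$ becomes large.

```python
# Sketch: rational quadratic kernel for scalar inputs (D = 1); for large nu it
# approaches the squared exponential kernel exp(-(x - x')^2 / (2 l^2)).
import numpy as np

def rational_quadratic(X, X2, nu=2.0, l=0.5, D=1):
    r2 = (X[:, None] - X2[None, :])**2
    return (1.0 + r2 / (nu * l**2))**(-(nu + D) / 2.0)

X = np.linspace(-1.0, 1.0, 100)
K_rq = rational_quadratic(X, X, nu=1e6)
K_se = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / 0.5**2)
print(np.abs(K_rq - K_se).max())                     # small: the SE limit is recovered
```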

23 Example revisited
Figure: Three random functions generated from the prior GP (top row) and the posterior GP (bottom row) with the rational quadratic kernel ($l = 0.5$) for three values of the shape parameter $\nu$, the last corresponding to $\nu \to \infty$. The observations are indicated by +, the means by dashed lines and the 3 standard deviation error bars by the shaded regions.

24 Matérn covariance functions
The Matérn kernel is given by
$$ k(x, x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, |x - x'|}{l} \right)^{\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\, |x - x'|}{l} \right), $$
where $\nu > 0$ and $l > 0$. Function $K_\nu(\cdot)$ is the modified Bessel function of the second kind. The order $\nu$ defines the roughness of the random functions, as they are $\lceil \nu \rceil - 1$ times differentiable:
We have the Laplacian or Ornstein-Uhlenbeck kernel for $\nu = 1/2$.
For $\nu = p + 1/2$ with $p \in \mathbb{N}$, the covariance function takes the simple form of a product of an exponential and a polynomial of order $p$:
$$ k(x, x') = \exp\left( -\frac{\sqrt{2\nu}\, |x - x'|}{l} \right) \frac{p!}{(2p)!} \sum_{i=0}^p \frac{(p + i)!}{i!\,(p - i)!} \left( \frac{\sqrt{8\nu}\, |x - x'|}{l} \right)^{p - i}. $$
We recover the squared exponential kernel for $\nu \to \infty$.
There is in general no closed-form solution for the derivative of $K_\nu(\cdot)$ wrt $\nu$. The Ornstein-Uhlenbeck (OU) process is a mathematical description of the velocity of a particle undergoing Brownian motion.
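
The sketch below evaluates the Matérn kernel via scipy's modified Bessel function $K_\nu$ and checks the $\nu = 1/2$ special case (the Ornstein-Uhlenbeck kernel $\exp(-|x - x'|/l)$); handling of the $r = 0$ limit is an implementation detail.

```python
# Sketch: Matérn kernel k(r) = 2^(1-nu)/Gamma(nu) * (sqrt(2 nu) r / l)^nu * K_nu(sqrt(2 nu) r / l).
import numpy as np
from scipy.special import kv, gamma

def matern_kernel(X, X2, nu=1.5, l=0.5):
    r = np.abs(X[:, None] - X2[None, :])
    scaled = np.sqrt(2.0 * nu) * r / l
    safe = np.where(scaled > 0.0, scaled, 1.0)        # avoid K_nu(0) = inf on the diagonal
    K = (2.0**(1.0 - nu) / gamma(nu)) * safe**nu * kv(nu, safe)
    return np.where(scaled > 0.0, K, 1.0)             # k(x, x) = 1 by continuity

X = np.linspace(-1.0, 1.0, 50)
K_ou = matern_kernel(X, X, nu=0.5)                    # Ornstein-Uhlenbeck case
assert np.allclose(K_ou, np.exp(-np.abs(X[:, None] - X[None, :]) / 0.5))
```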

25 Example revisited
Figure: Three random functions generated from the prior GP (top row) and the posterior GP (bottom row) with the Matérn kernel ($l = 0.5$) for three values of the roughness parameter $\nu$, the last corresponding to $\nu \to \infty$. The observations are indicated by +, the means by dashed lines and the 3 standard deviation error bars by the shaded regions.

26 Matérn kernel vs rational quadratic kernel
(a) Rational quadratic. (b) Matérn.
Figure: Comparison of the rational quadratic and the Matérn kernel with unit length scale ($l = 1$) for three values of respectively the shape and the roughness parameter. Both kernels are less localised than the squared exponential. Forcing the random latent functions to be infinitely differentiable might be unrealistic in practice.

27 Sparse Gaussian processes
The main problem with GPs is that exact inference is $\mathcal{O}(N^3)$, where $N$ is the number of input variables.
1. Subset of training data: The data points in the active set are selected in a greedy fashion according to some heuristic: random selection; vector quantisation or clustering (e.g. K-means); maximum entropy score (Lawrence et al., 2003): $H[p(y_n \mid y_{\setminus n})] - H[p(y_n \mid y)]$; maximum information gain (Seeger et al., 2003): $\mathrm{KL}[p(y_n \mid y) \,\|\, p(y_n \mid y_{\setminus n})]$; ... Predictions are made based on the active set only.
2. Subset of regressors: Consider a set of inducing variables $u \in \mathbb{R}^M$, which are deterministically related to the latent function values: $y(x) = k_u^\top(x) K_u^{-1} u$. The GP prior is replaced by a degenerate GP with the covariance function (see the sketch below)
$$ k_{\mathrm{SoR}}(x, x') = \langle y(x)\, y(x') \rangle = k_u^\top(x) K_u^{-1} k_u(x'). $$
The (inputs of the) inducing variables are selected from the training data according to some simple heuristic.
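
A hedged sketch of the subset-of-regressors covariance, with $M$ inducing inputs picked at random from the training inputs (one of the simple heuristics mentioned); the jitter term and the specific sizes are assumptions.

```python
# Sketch: subset-of-regressors covariance k_SoR(x, x') = k_u(x)^T K_u^-1 k_u(x'),
# a rank-M (degenerate) replacement for the full kernel.
import numpy as np

def se(A, B, l=0.5):
    return np.exp(-0.5 * (A[:, None] - B[None, :])**2 / l**2)

def sor_kernel(X, X2, Z, kernel, jitter=1e-8):
    Kuu = kernel(Z, Z) + jitter * np.eye(len(Z))      # M x M inducing kernel matrix
    return kernel(X, Z) @ np.linalg.solve(Kuu, kernel(X2, Z).T)

X_train = np.random.uniform(-1.0, 1.0, 500)
Z = np.random.choice(X_train, size=20, replace=False) # M = 20 inducing inputs, chosen at random
K_sor = sor_kernel(X_train, X_train, Z, se)           # 500 x 500 but rank at most 20
print(np.linalg.matrix_rank(K_sor, tol=1e-6))
```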

28 Sparse Gaussian processes (continued)
3. Projected process approximation (Csató and Opper, 2002): Consider again a set of inducing variables $u \in \mathbb{R}^M$, which are now related to the observations:
$$ t_n \mid u \sim \mathcal{N}(k_u^\top(x_n) K_u^{-1} u, \sigma^2), \qquad u \sim \mathcal{N}(0, K_u). $$
The information contained in the $N$ observations is absorbed into the $M$ inducing variables. Same predictive mean as for the subset of regressors, but more realistic predictive variance (i.e. it grows when moving away from the observations).
4. Pseudo-inputs approximation (Snelson and Ghahramani, 2006): The approximate likelihood is chosen from a richer class:
$$ t_n \mid u \sim \mathcal{N}\big(k_u^\top(x_n) K_u^{-1} u,\; k(x_n, x_n) - k_u^\top(x_n) K_u^{-1} k_u(x_n) + \sigma^2\big). $$
It can be shown that this choice leads to a (non-degenerate) GP with the covariance function
$$ k_{\mathrm{PI}}(x, x') = k_{\mathrm{SoR}}(x, x') + \delta(x, x')\big(k(x, x') - k_{\mathrm{SoR}}(x, x')\big), $$
where $\delta(x, x')$ is the Kronecker delta.

29 Non-Gaussian noise
Assume the noise is non-Gaussian, but still iid. The likelihood factorises and takes the following form:
$$ p(t \mid y, \theta) \propto e^{-\sum_{n=1}^N V_n}, \qquad V_n \equiv V_\theta(t_n, y_n), $$
where $V_\theta(\cdot, \cdot)$ is a nonlinear function parametrised by $\theta$. Even for a GP prior, the (non-Gaussian) posterior process is intractable. We consider the variational Gaussian distribution $q(y) = \mathcal{N}(\mu, \Sigma)$, which maximises the free energy (Opper and Archambeau, 2008)
$$ \mathcal{F}(q, \theta) = \langle \ln p(t, y \mid \theta) \rangle_{q(y)} + H[q(y)]. $$
The stationary points are given by
$$ \mu = K \nu, \qquad \nu \equiv \big( \ldots, -\partial \langle V_n \rangle_{q_n} / \partial \mu_n, \ldots \big)^\top, $$
$$ \Sigma = \big( K^{-1} + \Lambda \big)^{-1}, \qquad \Lambda \equiv \mathrm{diag}\{ \ldots, 2\, \partial \langle V_n \rangle_{q_n} / \partial \Sigma_{nn}, \ldots \}, $$
where $q_n \equiv q(y_n)$ is the marginal Gaussian. The number of parameters to optimise (e.g. by gradient descent) is $\mathcal{O}(N)$!

30 Non-Gaussian noise (continued)
If $y \sim \mathcal{GP}$, then the conditional mean function and the conditional variance function are given by
$$ \mu(x) = k^\top(x) K^{-1} y, \qquad \kappa(x, x) = k(x, x) - k^\top(x) K^{-1} k(x). $$
The approximate posterior process is a Gaussian process:
$$ y(\cdot) \mid t \;\approx\; \int p(y(\cdot) \mid y)\, q(y)\, dy = \mathcal{GP}(\tilde{m}(\cdot), \tilde{k}(\cdot, \cdot)), $$
with mean function and predictive variance given by
$$ \tilde{m}(x) = k^\top(x)\, \nu, \qquad \tilde{k}(x, x) = k(x, x) - k^\top(x)(K + \Lambda^{-1})^{-1} k(x), $$
where the Woodbury identity was invoked. The log-marginal is intractable, but the noise and the kernel parameters can be estimated by maximising $\mathcal{F}$.

31 Sinc example with Laplace noise
The likelihood is defined as $p(t_n \mid y_n, \eta) = \frac{\eta}{2}\, e^{-\eta |t_n - y_n|}$, with $\eta > 0$.
(a) Standard GP. (b) Variational GP.
Figure: Sinc example with Laplace noise ($\eta = 1$). Both GPs use an optimised squared exponential kernel. Note that the shaded regions indicate the standard deviation error bars.
Useful Gaussian identities (see Opper and Archambeau (2008) for a proof):
$$ \frac{\partial \langle V_n \rangle_{q_n}}{\partial \mu_n} = \left\langle \frac{\partial V_n}{\partial y_n} \right\rangle_{q_n} = \frac{\langle (y_n - \mu_n)\, V_n \rangle_{q_n}}{\Sigma_{nn}}, $$
$$ \frac{\partial \langle V_n \rangle_{q_n}}{\partial \Sigma_{nn}} = \frac{1}{2} \left\langle \frac{\partial^2 V_n}{\partial y_n^2} \right\rangle_{q_n} = \frac{1}{2} \left( \frac{\langle (y_n - \mu_n)^2 V_n \rangle_{q_n}}{\Sigma_{nn}^2} - \frac{\langle V_n \rangle_{q_n}}{\Sigma_{nn}} \right). $$

32 Interpretation of the variational Gaussian approximation
Laplace approximation: a Gaussian density is fitted locally at a mode of the posterior and the covariance is built from the curvature of the log-posterior around this point:
$$ 0 = \nabla_y \ln p(t, y \mid \theta), \qquad \Sigma^{-1} = -\nabla_y \nabla_y \ln p(t, y \mid \theta). $$
Variational Gaussian approximation: the variational mean and the variational covariance can be rewritten in two different ways:
$$ 0 = \nabla_\mu \langle \ln p(t, y \mid \theta) \rangle_{q(y)} = \langle \nabla_y \ln p(t, y \mid \theta) \rangle_{q(y)}, $$
$$ \Sigma^{-1} = -\nabla_\mu \nabla_\mu \langle \ln p(t, y \mid \theta) \rangle_{q(y)} = -\langle \nabla_y \nabla_y \ln p(t, y \mid \theta) \rangle_{q(y)}. $$
A Gaussian density is fitted globally, i.e. the conditions of the Laplace approximation hold on average. The variational Gaussian method is also equivalent to applying Laplace's method to an implicitly defined probability density $q(\mu) \propto e^{\langle \ln p(t, y \mid \theta) \rangle_{q(y)}}$.

33 References
L. Csató and M. Opper. Sparse on-line Gaussian processes. Neural Computation 14:641-668, 2002.
C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
J. Nocedal and S. J. Wright. Numerical Optimization. Springer.
M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural Computation, 2008.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. NIPS, 2005.
Tutorial on Gaussian processes at NIPS 2006 by C. E. Rasmussen.
The Matrix Cookbook by K. B. Petersen and M. S. Pedersen.
