Lecture 5: GPs and Streaming regression

1 Lecture 5: GPs and Streaming regression
Gaussian Processes
Information gain
Confidence intervals

2 Recall: Non-parametric regression
Input space X ⊆ R^n, target space Y = R
Input matrix X of size m × n, target vector y of size m × 1
Feature mapping φ : X → R^d
Assumption: y ≈ Φw
Minimize J_λ(w) = (1/2)(Φw − y)^T (Φw − y) + (λ/2) w^T w, with λ ≥ 0
The solution is w = (Φ^T Φ + λ I_d)^{-1} Φ^T y = Φ^T (ΦΦ^T + λ I_m)^{-1} y
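The following is a minimal NumPy sketch of this slide (the toy data and polynomial feature map below are illustrative assumptions, not from the lecture); it checks that the primal and dual forms of the solution agree.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))                 # m x n inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)
Phi = np.hstack([X**p for p in range(4)])            # polynomial features, m x d
lam = 0.1
m, d = Phi.shape

# Primal form: w = (Phi^T Phi + lam I_d)^{-1} Phi^T y
w_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)
# Dual form: w = Phi^T (Phi Phi^T + lam I_m)^{-1} y
w_dual = Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(m), y)

print(np.allclose(w_primal, w_dual))                 # True: both forms give the same weights
```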

3 Recall: Kernel regression
Let K = ΦΦ^T and k(x) = Φ φ(x) be the kernel matrix and kernel vector, with entries K_{ij} = k(x_i, x_j) and k(x)_i = k(x, x_i)
The predictions for the training inputs are ŷ = ΦΦ^T (ΦΦ^T + λ I_m)^{-1} y = K (K + λ I_m)^{-1} y
The prediction for a new input point x is f̂(x) = φ(x)^T Φ^T (K + λ I_m)^{-1} y = k(x)^T (K + λ I_m)^{-1} y
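A small sketch of these kernel regression predictions, assuming an RBF kernel with an arbitrary lengthscale (both the kernel choice and the toy data are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    """k(x, x') = exp(-||x - x'||^2 / (2 rho^2)); rho is an assumed lengthscale."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)
lam = 0.1

K = rbf_kernel(X, X)                                  # kernel matrix K
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # (K + lam I_m)^{-1} y

X_new = np.linspace(-1, 1, 5).reshape(-1, 1)
f_hat = rbf_kernel(X_new, X) @ alpha                  # k(x)^T (K + lam I_m)^{-1} y
print(f_hat)
```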

4 Recall: Bayesian view of regression
Consider noisy observations y = f(x) + ε = φ(x)^T w + ε
Recall Bayes rule: posterior = likelihood × prior / marginal likelihood, i.e. P_φ(w | y, X) = P_φ(y | X, w) P(w) / P_φ(y | X)
The marginal likelihood is independent of the weights w
With Gaussian noise ε ~ N(0, σ²):
P_φ(y | X, w) = ∏_{i=1}^m P_φ(y_i | x_i, w) = ∏_{i=1}^m (2πσ²)^{-1/2} exp(−(y_i − φ(x_i)^T w)² / (2σ²)) = (2πσ²)^{-m/2} exp(−‖y − Φw‖² / (2σ²)) = N_m(Φw, σ² I_m)

5 Posterior distribution on parameters
With a Gaussian prior on the parameters, w ~ N_d(0, Σ_w):
P_φ(w | y, X) ∝ exp(−‖y − Φw‖² / (2σ²)) exp(−w^T Σ_w^{-1} w / 2)
= exp(−(y^T y − y^T Φw − w^T Φ^T y + w^T Φ^T Φ w + σ² w^T Σ_w^{-1} w) / (2σ²))
= exp(−(y^T y − y^T Φw − w^T Φ^T y + w^T (Φ^T Φ + σ² Σ_w^{-1}) w) / (2σ²))
∝ exp(−(w − b)^T A^{-1} (w − b) / 2)
where A^{-1} = σ^{-2} (Φ^T Φ + σ² Σ_w^{-1}) and b = (Φ^T Φ + σ² Σ_w^{-1})^{-1} Φ^T y
The posterior distribution is Gaussian!
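A short sketch of this posterior computation in feature space (the toy data and polynomial feature map are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(30, 1))
Phi = np.hstack([X**p for p in range(4)])             # m x d feature matrix
sigma2 = 0.01
y = Phi @ np.array([0.5, -1.0, 0.3, 2.0]) + np.sqrt(sigma2) * rng.standard_normal(30)
Sigma_w = np.eye(Phi.shape[1])                        # prior covariance on w

# Posterior w | y, X ~ N(b, A), with A^{-1} = (Phi^T Phi + sigma^2 Sigma_w^{-1}) / sigma^2
M = Phi.T @ Phi + sigma2 * np.linalg.inv(Sigma_w)
b = np.linalg.solve(M, Phi.T @ y)                     # posterior mean
A = sigma2 * np.linalg.inv(M)                         # posterior covariance
print(b, np.diag(A))
```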

6 Predictive distribution
The pointwise posterior predictive distribution is a normal distribution, f(x) | x_1, …, x_m, y_1, …, y_m ~ N(f̂(x), s²(x)), with expectation and variance
f̂(x) = φ(x)^T (Φ^T Φ + σ² Σ_w^{-1})^{-1} Φ^T y = φ(x)^T Σ_w Φ^T (Φ Σ_w Φ^T + σ² I_m)^{-1} y
s²(x) = σ² φ(x)^T (Φ^T Φ + σ² Σ_w^{-1})^{-1} φ(x) = φ(x)^T Σ_w φ(x) − φ(x)^T Σ_w Φ^T (Φ Σ_w Φ^T + σ² I_m)^{-1} Φ Σ_w φ(x)
using the Sherman–Morrison–Woodbury identity

7 Reinterpreting regularization
Recall the kernel regression predictions: f̂(x) = k(x)^T (K + λ I_m)^{-1} y
Using the prior Σ_w = (σ²/λ) I_d, the predictive mean rewrites as:
f̂(x) = φ(x)^T Σ_w Φ^T (Φ Σ_w Φ^T + σ² I_m)^{-1} y = (σ²/λ) φ(x)^T Φ^T ((σ²/λ) ΦΦ^T + σ² I_m)^{-1} y = k(x)^T (K + λ I_m)^{-1} y
λ encodes a prior on the weights w

8 Reinterpreting regularization (cont'd)
Still using Σ_w = (σ²/λ) I_d, the predictive variance rewrites as:
s²(x) = φ(x)^T Σ_w φ(x) − φ(x)^T Σ_w Φ^T (Φ Σ_w Φ^T + σ² I_m)^{-1} Φ Σ_w φ(x)
= (σ²/λ) φ(x)^T φ(x) − (σ²/λ) φ(x)^T Φ^T ((σ²/λ) ΦΦ^T + σ² I_m)^{-1} Φ (σ²/λ) φ(x)
= (σ²/λ) k_λ(x, x)
with k_λ(x, x') = k(x, x') − k(x)^T (K + λ I_m)^{-1} k(x')

9 Summary
Using the prior Σ_w = (σ²/λ) I_d, we have
f̂(x) = k(x)^T (K + λ I_m)^{-1} y
s²(x) = (σ²/λ) k_λ(x, x), with k_λ(x, x') = k(x, x') − k(x)^T (K + λ I_m)^{-1} k(x')
providing the pointwise posterior prediction f(x) | x_1, …, x_m, y_1, …, y_m ~ N(f̂(x), s²(x))
What does it mean to use λ ∈ R_{>0}?
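A sketch of these two formulas with an assumed RBF kernel and toy data; posterior() below is a hypothetical helper returning the pointwise mean and variance:

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)
lam, sigma2 = 0.1, 0.01

K_inv = np.linalg.inv(rbf_kernel(X, X) + lam * np.eye(len(X)))

def posterior(Xq):
    """Pointwise posterior mean f_hat(x) and variance s^2(x) at the query points Xq."""
    kx = rbf_kernel(Xq, X)                            # rows are k(x)^T
    mean = kx @ K_inv @ y                             # k(x)^T (K + lam I_m)^{-1} y
    k_lam = rbf_kernel(Xq, Xq) - kx @ K_inv @ kx.T    # posterior kernel k_lambda
    var = (sigma2 / lam) * np.diag(k_lam)             # s^2(x) = (sigma^2 / lam) k_lambda(x, x)
    return mean, var

print(posterior(np.array([[0.0], [0.5]])))
```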

10 Pointwise posterior distribution
At each point x ∈ X, we have a distribution N(f̂(x), s²(x))
We can sample from these: f(x) ~ N(f̂(x), s²(x))
[Figure: posterior mean f̂ with a band of width s over X]

11 Joint distribution
Suppose you query your model at locations X*
Extend the prior to include the query points:
[f; f*] ~ N_{m+m*}(0, [[K_{X,X}, K_{X,X*}], [K_{X*,X}, K_{X*,X*}]])
y | f ~ N_m(f, σ² I_m)
Using the joint normality of y and f*:
[y; f*] ~ N_{m+m*}(0, [[K_{X,X} + σ² I_m, K_{X,X*}], [K_{X*,X}, K_{X*,X*}]])

12 Gaussian Process (GP)
By considering the covariance between every pair of points in the space, we get a distribution over functions!
Posterior distribution on f: P[f | X, y] = N_X([f̂(x)]_{x∈X}, (σ²/λ) [k_λ(x, x')]_{x,x'∈X})
[Figure: posterior mean f̂ with a confidence band of width s over X]

13 Sampling from a Gaussian Process
A GP is the generalization of the normal probability distribution to function space
From a normal distribution we sample variables; from a GP we sample functions!
[Figure: several sampled functions f over X]

14 Sampling from a Gaussian Process: How to
Observe that N_X([f̂(x)]_{x∈X}, (σ²/λ) [k_λ(x, x')]_{x,x'∈X}) defines an |X|-dimensional multivariate Gaussian distribution
What if |X| = ∞ (e.g. X = [−1, 1])?
We can consider a discrete, finite set X̃ ⊂ X and sample from N_X̃([f̂(x)]_{x∈X̃}, (σ²/λ) [k_λ(x, x')]_{x,x'∈X̃})
This results in a function f evaluated at every x ∈ X̃
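A sketch of the discretize-then-sample recipe (the RBF kernel, toy data, and grid size are assumptions; a small jitter is added for numerical stability):

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(10, 1))                  # observed inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(10)
lam, sigma2 = 0.1, 0.01

Xg = np.linspace(-1, 1, 100).reshape(-1, 1)           # discrete, finite subset of X = [-1, 1]
K_inv = np.linalg.inv(rbf_kernel(X, X) + lam * np.eye(len(X)))
kg = rbf_kernel(Xg, X)
mean = kg @ K_inv @ y                                 # [f_hat(x)] on the grid
cov = (sigma2 / lam) * (rbf_kernel(Xg, Xg) - kg @ K_inv @ kg.T)
cov += 1e-10 * np.eye(len(Xg))                        # jitter for numerical stability

samples = rng.multivariate_normal(mean, cov, size=3)  # each row is one sampled function on the grid
print(samples.shape)                                  # (3, 100)
```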

15 Learning the hyperparameters
If we assume that Σ_w = I_d, then we have λ = σ²
Let θ denote the kernel hyperparameters (e.g. the lengthscale ρ)
In practice you may not know the noise level or the optimal kernel hyperparameters
Recall the multivariate normal density:
P(y | θ) = exp(−(1/2) y^T (K_θ + σ² I_m)^{-1} y) / sqrt((2π)^m |K_θ + σ² I_m|)
Maximize the log marginal likelihood L = log P(y | θ):
L = −(1/2) y^T (K_θ + σ² I_m)^{-1} y − (m/2) log(2π) − (1/2) log|K_θ + σ² I_m|
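A sketch of evaluating the log marginal likelihood for a candidate lengthscale (RBF kernel and toy data are assumptions); here it is simply evaluated on a grid of lengthscales and the best one is kept:

```python
import numpy as np

def rbf_kernel(A, B, rho):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

def log_marginal_likelihood(X, y, rho, sigma2):
    """L = -1/2 y^T (K + sigma^2 I)^{-1} y - m/2 log(2 pi) - 1/2 log|K + sigma^2 I|."""
    m = len(y)
    Ky = rbf_kernel(X, X, rho) + sigma2 * np.eye(m)
    return (-0.5 * y @ np.linalg.solve(Ky, y)
            - 0.5 * m * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(Ky)[1])

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)

rhos = np.logspace(-2, 1, 20)
best_rho = max(rhos, key=lambda r: log_marginal_likelihood(X, y, r, sigma2=0.01))
print(best_rho)
```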

16 Anatomy of the marginal likelihood
L = log P(y | θ) = −(1/2) y^T (K_θ + σ² I_m)^{-1} y − (1/2) log|K_θ + σ² I_m| + const
1st term: quality of the predictions (data fit); 2nd term: model complexity penalty
Trade-off between data fit and complexity, illustrated in Figure 5.3 of Rasmussen & Williams (2006): the log marginal likelihood decomposes into a data-fit term and a complexity penalty as a function of the characteristic lengthscale

17 Gradient-based optimization
Compute the gradients:
∂L/∂θ_i = (1/2) y^T (K_θ + σ² I_m)^{-1} (∂(K_θ + σ² I_m)/∂θ_i) (K_θ + σ² I_m)^{-1} y − (1/2) Tr((K_θ + σ² I_m)^{-1} ∂(K_θ + σ² I_m)/∂θ_i)
Minimize the negative log marginal likelihood
Non-convex optimization task
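A sketch of this gradient for a single hyperparameter, the RBF lengthscale ρ (an assumption; the lecture does not fix the kernel); the analytic gradient is checked against a finite difference:

```python
import numpy as np

def rbf_and_grad(X, rho):
    """RBF kernel K and its elementwise derivative dK/drho."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * rho**2))
    return K, K * d2 / rho**3

def neg_log_marginal_and_grad(X, y, rho, sigma2):
    m = len(y)
    K, dK = rbf_and_grad(X, rho)
    Ky = K + sigma2 * np.eye(m)
    Ky_inv = np.linalg.inv(Ky)
    alpha = Ky_inv @ y
    nll = 0.5 * y @ alpha + 0.5 * np.linalg.slogdet(Ky)[1] + 0.5 * m * np.log(2 * np.pi)
    # dL/drho = 1/2 alpha^T dK alpha - 1/2 Tr(Ky^{-1} dK); negated for the negative log likelihood
    grad = -(0.5 * alpha @ dK @ alpha - 0.5 * np.trace(Ky_inv @ dK))
    return nll, grad

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)

nll, g = neg_log_marginal_and_grad(X, y, rho=1.0, sigma2=0.01)
eps = 1e-5
fd = (neg_log_marginal_and_grad(X, y, 1.0 + eps, 0.01)[0]
      - neg_log_marginal_and_grad(X, y, 1.0 - eps, 0.01)[0]) / (2 * eps)
print(g, fd)                                          # the two gradients should agree closely
```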

18 Summary
A normal prior on the weights distribution gives a Gaussian Process
Regularization corresponds to a prior on the weights covariance
A GP provides a posterior distribution on functions
Its expectation is the kernel regression model
Its covariance provides confidence intervals
We can sample discretized functions from a GP

19 Typical supervised setting
We have a dataset (X, y) of previously acquired data
Learn a model w on (X, y)
Apply the model w to provide predictions at new data points
The samples in (X, y) are often assumed to be i.i.d.

20 Streaming setting
Start with a (possibly empty) dataset (X_0, y_0)
For each time step t = 1, 2, …:
fit a model w_t on X_{t-1} and y_{t-1};
acquire a new sample (x_t, y_t) (possibly using w_t);
define X_t = [x_i]_{i=1…t}, y_t = [y_i]_{i=1…t}
Samples may depend on the model (not i.i.d.): samples influence the model, and the model influences which samples are acquired (see the sketch below)
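A minimal sketch of this loop (the unknown function, the RBF kernel, and random sample acquisition are illustrative assumptions; in a real streaming algorithm the choice of x_t could depend on the fitted model):

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

def f_unknown(x):                                     # stand-in for the unknown function
    return np.sin(3 * x)

rng = np.random.default_rng(7)
lam, sigma2 = 0.1, 0.01
X_t = np.empty((0, 1))                                # (X_0, y_0): a possibly empty dataset
y_t = np.empty(0)

for t in range(1, 21):
    # Fit the model w_t on (X_{t-1}, y_{t-1}); alpha holds the kernel regression coefficients
    if len(y_t) > 0:
        alpha = np.linalg.solve(rbf_kernel(X_t, X_t) + lam * np.eye(len(y_t)), y_t)
    # Acquire a new sample (chosen at random here; it may depend on the current model)
    x_new = rng.uniform(-1, 1, size=(1, 1))
    y_new = f_unknown(x_new[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(1)
    X_t = np.vstack([X_t, x_new])                     # X_t = [x_i]_{i=1..t}
    y_t = np.concatenate([y_t, y_new])                # y_t = [y_i]_{i=1..t}

print(X_t.shape, y_t.shape)                           # (20, 1) (20,)
```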

21 Application: Online function optimization
Unknown function f : X → R
Sequentially select locations (x_t)_{t≥1} at which to observe the function values y_t
Try to maximize/minimize the observations
[Figure: an unknown function over X with a query location marked "?"]

22 Example
What is the optimal treatment dosage for a given disease?
For each time step t = 1, 2, …: a new patient t comes in; we decide on a treatment dosage x_t; we observe the patient's response to the treatment, y_t
Our goal is to cure patients as effectively as possible
What you don't want: to give dosages that are known to be bad
What you want: to give good dosages and to try informative dosages

23 Information
An informative sample improves the model by reducing its uncertainty
[Figure: posterior uncertainty over X before and after observing an informative sample]
How do we quantify the reduction of uncertainty after observing y_1, …, y_t at locations x_1, …, x_t?

24 A little bit of information theory
Mutual information between the underlying function f and the observations y_1, …, y_t at locations x_1, …, x_t:
I(y_1, …, y_t; f) = H(y_1, …, y_t) − H(y_1, …, y_t | f) (marginal entropy minus conditional entropy)
It quantifies the amount of information obtained about one random variable through the other random variable
Entropy is the amount of information held by a random variable
H(Y | X) = 0 if and only if the value of Y is completely determined by the value of X
H(Y | X) = H(Y) if and only if Y and X are independent random variables

25 Amount of information
H(X) = E[−ln P_X] = −∑_{i=1}^n p(x_i) log_b p(x_i)
Example: tossing a coin
A fair coin has maximal entropy: it is the least predictable!
Every new sample that you get changes your model the most

26 Mutual information
Entropy of X ~ N_D(μ, Σ):
H(X) = E[−ln( exp(−(1/2)(x − μ)^T Σ^{-1} (x − μ)) / sqrt((2π)^D |Σ|) )] = D/2 + (D/2) ln(2π) + (1/2) ln|Σ|
using E[a^T M^{-1} a] = E[Tr(a^T M^{-1} a)] = E[Tr(a a^T M^{-1})] = D with a = x − μ and M = Σ
Recall the joint distribution between the m observations y and the m* query points X*:
[y; f*] ~ N_{m+m*}(0, [[K_{X,X} + σ² I_m, K_{X,X*}], [K_{X*,X}, K_{X*,X*}]])
Then I(y_1, …, y_t; f) = (1/2) ln|K_t + σ² I_t| − (t/2) ln σ² = (1/2) ln|I_t + σ^{-2} K_t|
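A sketch of this quantity in the log-determinant form (1/2) ln|I_t + σ^{-2} K_t| (RBF kernel and toy locations assumed); clustered, redundant locations yield a smaller gain than spread-out ones:

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

def information_gain(X, sigma2):
    """I(y_1,...,y_t; f) = 1/2 ln det(I_t + K_t / sigma^2)."""
    K = rbf_kernel(X, X)
    return 0.5 * np.linalg.slogdet(np.eye(len(X)) + K / sigma2)[1]

rng = np.random.default_rng(8)
X_spread = rng.uniform(-1, 1, size=(20, 1))           # spread-out observation locations
X_clumped = 0.01 * rng.uniform(-1, 1, size=(20, 1))   # nearly identical locations
print(information_gain(X_spread, 0.01))               # larger: samples are informative
print(information_gain(X_clumped, 0.01))              # smaller: samples are redundant
```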

27 Another decomposition of mutual information
Recall that y_1, …, y_t | f(x_1), …, f(x_t) ~ N_t((f(x_1), …, f(x_t)), σ² I_t), using that y_i = f(x_i) + ε with ε ~ N(0, σ²)
Plugging in the conditional entropy of a multivariate normal distribution:
I(y_1, …, y_t; f) = H(y_1, …, y_t) − (1/2) ln|2πe σ² I_t| = H(y_1, …, y_t) − (t/2) ln(2πe σ²)
What is the marginal entropy H(y_1, …, y_t)?

28 Recursively decompose the entropy of the observations
H(y_1, …, y_t) = H(y_1, …, y_{t-1}) + H(y_t | y_1, …, y_{t-1})
= H(y_1, …, y_{t-1}) + (1/2) ln[2πe(σ² + (σ²/λ) k_{λ,t-1}(x_t, x_t))]
⋮
= H(y_1) + H(y_2 | y_1) + … + (1/2) ln[2πe(σ² + (σ²/λ) k_{λ,t-1}(x_t, x_t))]
= ∑_{s=1}^t (1/2) ln[2πe σ² (1 + (1/λ) k_{λ,s-1}(x_s, x_s))]
Still using that y_i = f(x_i) + ε with ε ~ N(0, σ²)
Also using the uncertainty about the location of the true f

29 Information gain
I(y_1, …, y_t; f) = ∑_{s=1}^t (1/2) ln[2πe σ² (1 + (1/λ) k_{λ,s-1}(x_s, x_s))] − (t/2) ln(2πe σ²)
= ∑_{s=1}^t (1/2) ln[1 + (1/λ) k_{λ,s-1}(x_s, x_s)]
Information gain γ_t(λ) = I(y_1, …, y_t; f): the reduction of uncertainty on f after observing y_1, …, y_t
The information gain is inversely proportional to λ: limiting changes in the function limits the contribution of samples
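A sketch of γ_t(λ) computed from this per-step decomposition (RBF kernel and random locations are assumptions); the loop reproduces the inverse relationship with λ:

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

def gamma_t(X, lam):
    """gamma_t(lam) = sum_s 1/2 ln(1 + k_{lam,s-1}(x_s, x_s) / lam)."""
    total = 0.0
    for s in range(len(X)):
        x_s = X[s:s + 1]
        k_ss = rbf_kernel(x_s, x_s)[0, 0]
        if s > 0:
            Xp = X[:s]
            kx = rbf_kernel(x_s, Xp)
            K_inv = np.linalg.inv(rbf_kernel(Xp, Xp) + lam * np.eye(s))
            k_ss -= (kx @ K_inv @ kx.T)[0, 0]         # posterior kernel k_{lam,s-1}(x_s, x_s)
        total += 0.5 * np.log(1 + k_ss / lam)
    return total

rng = np.random.default_rng(9)
X = rng.uniform(-1, 1, size=(15, 1))
for lam in (0.1, 1.0, 10.0):
    print(lam, gamma_t(X, lam))                       # larger lam gives a smaller information gain
```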

30 Information gain vs regularization
[Figures: posterior fits and information gain curves γ_t(λ) as a function of t, for λ = 0.1, 1, 10]

31 Summary
Streaming regression: sequentially gather (potentially non-i.i.d.) samples
Information gain measures the total information that results from adding new samples (observations) to the model
Information gain is controlled by the information-sharing capability of the kernel
Information gain is controlled by the changes in the model admitted by the regularization
Information gain will play a part in confidence intervals

32 Confidence intervals
Given that we have gathered t samples in the streaming setting, what kind of guarantees can we have on the resulting model?
More specifically, can we guarantee that |f(x) − f̂_t(x)| ≤ something, simultaneously for all x ∈ X and for all t ≥ 0?
Motivations:
control bad behaviours in critical applications;
help select the next sample location (maximize/minimize the function? maximize model improvement?);
derive sound algorithms

33 Result (Maillard, 2016)
Under the assumption of subgaussian noise,
|f(x) − f̂_t(x)| ≤ sqrt(k_{λ,t}(x, x)/λ) [ sqrt(λ) ‖f‖_K + σ sqrt(2 ln(1/δ) + 2 γ_t(λ)) ]
with probability higher than 1 − δ, simultaneously for all t ≥ 0 and for all x ∈ X
Observe that the error bound scales with the information gain!
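A sketch of the width of this confidence interval under the reconstruction of the bound above, with assumed values for the unknowns: norm_bound stands in for an upper bound on ‖f‖_K and gamma_t for the information gain, both of which would have to be supplied or bounded in practice.

```python
import numpy as np

def rbf_kernel(A, B, rho=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * rho**2))

def confidence_width(Xq, X, lam, sigma, delta, norm_bound, gamma_t):
    """sqrt(k_{lam,t}(x,x)/lam) * (sqrt(lam)*norm_bound + sigma*sqrt(2 ln(1/delta) + 2 gamma_t))."""
    K_inv = np.linalg.inv(rbf_kernel(X, X) + lam * np.eye(len(X)))
    kx = rbf_kernel(Xq, X)
    k_lam = np.diag(rbf_kernel(Xq, Xq) - kx @ K_inv @ kx.T)   # k_{lam,t}(x, x)
    beta = np.sqrt(lam) * norm_bound + sigma * np.sqrt(2 * np.log(1 / delta) + 2 * gamma_t)
    return np.sqrt(k_lam / lam) * beta

rng = np.random.default_rng(10)
X = rng.uniform(-1, 1, size=(15, 1))
Xq = np.linspace(-1, 1, 5).reshape(-1, 1)
print(confidence_width(Xq, X, lam=0.1, sigma=0.1, delta=0.05, norm_bound=1.0, gamma_t=2.0))
```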

34 Subgaussian noise
A real-valued random variable X is σ²-subgaussian if E[e^{γX}] ≤ e^{γ² σ² / 2}
The Laplace transform of X is dominated by the Laplace transform of a random variable sampled from N(0, σ²)
This requires that the tails of the noise distribution are dominated by the tails of a Gaussian distribution
For example, this holds for Gaussian noise and for bounded noise

35 Confidence envelope
[Figures: confidence envelopes around the true f after 5, 10, 50, and 100 observations, for λ = σ² and λ = σ²/‖f‖²_K]

36 Summary
It is possible to have guarantees even with non-i.i.d. data
The prediction error depends on:
how well your model shares information across observations;
how well your model is adapted to the function;
how noisy the observations are;
the regularization (prior on Σ_w)
Confidence intervals/envelopes will be really useful for deriving algorithms (we will see that later)
