Lecture 5: GPs and Streaming regression
|
|
- Laureen Samantha Sutton
- 5 years ago
- Views:
Transcription
1 Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19,
2 Recall: Non-parametric regression Input space X R n, target space Y = R Input matrix X of size m n, target vector y of size m 1 Feature mapping φ : X R d Assumption: y = Φw Minimize J λ (w) = 1 2 (Φw y) (Φw y) + λ 2 w w λ 0 The solution is w = (Φ Φ + λi n ) 1 Φ y = Φ (ΦΦ + λi m ) 1 y COMP-652 and ECSE-608, Lecture 5 - September 19,
3 Recall: Kernel regression Let K = ΦΦ and k(x) = φ(x) Φ be the kernel matrix/vector: K = k(x 1, x 1 ) k(x 1, x 2 )... k(x 1, x m ) k(x 2, x 1 ). k(x 2, x 2 ).... k(x 2, x m ). k(x m, x 1 ) k(x m, x 2 )... k(x m, x m ) The predictions for the input data are given by k(x) = ŷ = ΦΦ (ΦΦ + λi m ) 1 y = K(K + λi m ) 1 y k(x, x 1 ) k(x, x 2 ). k(x, x m ) The prediction for a new input point x is given by ˆf(x) = φ(x) Φ (K + λi m ) 1 y = k(x)(k + λi m ) 1 y COMP-652 and ECSE-608, Lecture 5 - September 19,
4 Recall: Bayesian view of regression Consider noisy observations y = f(x) + ɛ = φ(x) w + ɛ Recall Bayes rule: posterior = likelihood prior marginal likelihood P φ (w y, X) = P φ(y X, w)p (w) P φ (y X) Marginal likelihood is independent of weights w With Gaussian noise ɛ N (0, σ 2 ) P φ (y X, w) = = m P φ (y i x i, w) = i=1 1 ( 2πσ) m exp ( m i=1 1 2πσ exp y Φw 2 2σ 2 ) ( (y i φ(x i ) w) 2 ) 2σ 2 = N m ( Φw, σ 2 I m ) COMP-652 and ECSE-608, Lecture 5 - September 19,
5 Posterior distribution on parameters With Gaussian prior on parameters w N d (0, Σ w ) ) y Φw 2 P φ (w y, X) exp ( 2σ 2 exp ( w Σ 1 ) w w 2 = exp ( y y y Φw w Φy + w Φ Φw + σ 2 w Σ 1 2σ 2 = exp ( y y y Φw w Φy + w (Φ Φ + σ 2 Σ 1 ) w )w 2σ 2 exp ( (w b) A 1 (w b) ) where A 1 = σ 2 (Φ Φ + σ 2 Σ 1 w ) and b = (Φ Φ + σ 2 Σ 1 w ) 1 Φ y The posterior distribution is Gaussian! w w ) COMP-652 and ECSE-608, Lecture 5 - September 19,
6 Predictive distribution The pointwise posterior predictive distribution is a normal distribution ( f(x) x 1,..., x m, y 1,..., y m N ˆf(x), s (x)) 2 of expectation and variance ˆf(x) = φ(x) (Φ Φ + σ 2 Σ 1 w ) 1 Φ y = φ(x) Σ w Φ (ΦΣ w Φ + σ 2 I m ) 1 y s 2 (x) = σ 2 φ(x) (Φ Φ + σ 2 Σ 1 w ) 1 φ(x) = φ(x) Σ w φ(x) φ(x) Σ w Φ (Φ Σ w Φ + σ 2 I m ) 1 ΦΣ w φ(x) using Sherman-Morrison COMP-652 and ECSE-608, Lecture 5 - September 19,
7 Reinterpreting regularization Recall kernel regression predictions: ˆf(x) = k(x) (K + λi m ) 1 y Using prior Σ w = σ2 λ I d, the predictive mean rewrites as: ˆf(x) = φ(x) Σ w Φ (ΦΣ w Φ + σ 2 I m ) 1 y ) 1 = φ(x) (Φ σ2 λ Φ σ2 λ Φ + σ 2 I m y = k(x) (K + λi m ) 1 y λ encodes some prior on weights w COMP-652 and ECSE-608, Lecture 5 - September 19,
8 Reinterpreting regularization (cont d) Still using Σ w = σ2 λ I d, the predictive variance rewrites as: s 2 (x) = φ(x) Σ w φ(x) φ(x) Σ w Φ (Φ Σ w Φ + σ 2 I m ) 1 ΦΣ w φ(x) ) 1 = φ(x) σ2 φ(x) φ(x) σ2 (Φ λ λ Φ σ2 λ Φ + σ2 I m Φ σ2 λ φ(x) = σ2 λ k λ(x, x) with k λ (x, x ) = k(x, x ) k(x) (K + λi m ) 1 k(x ) COMP-652 and ECSE-608, Lecture 5 - September 19,
9 Summary Using prior Σ w = σ2 λ I d, we have ˆf(x) = k(x) (K + λi m ) 1 y s 2 (x) = σ2 λ k λ(x, x) with k λ (x, x ) = k(x, x ) k(x) (K + λi m ) 1 k(x ) Providing pointwise posterior prediction f(x) x 1,..., x m, y 1,..., y m N ( ˆf(x), s 2 (x)) What does it mean to use λ R 0? COMP-652 and ECSE-608, Lecture 5 - September 19,
10 Pointwise posterior distribution ( At each point x X, we have a distribution N ˆf(x), s (x)) 2 We can sample from these f(x) ( N ˆf(x), s (x)) 2 Y ˆf s X COMP-652 and ECSE-608, Lecture 5 - September 19,
11 Joint distribution Suppose you query your model at locations X Extend the prior to include query points: [ f f ] [ ]) KX,X K X, X N m+m (0, X,X K X,X K X,X y f N m (f, σ 2 I m ) Using joint normality of f and y: [ ( [ ]) y KX,X + σ N f ] m+m 0, 2 I m K X,X K X,X K X,X COMP-652 and ECSE-608, Lecture 5 - September 19,
12 Gaussian Process (GP) By considering the covariance between every points in the space, we get a distribution over functions! Posterior distribution on f: P [f X, y] N X ( [ ˆf(x) ] x X ), σ2 λ [k λ(x, x )] x,x X Y ˆf s X COMP-652 and ECSE-608, Lecture 5 - September 19,
13 Sampling from a Gaussian Process Generalization of normal probability distribution to the function space From a normal distribution we sample variables From a GP we sample functions! 2 1 Y 0 1 Sampled f X COMP-652 and ECSE-608, Lecture 5 - September 19,
14 Sampling from a Gaussian Process: How to Observe that N X ( [ ˆf(x) ] x X ), σ2 λ [k λ(x, x )] x,x X defines a X -dimensional multivariate Gaussian distribution What if X = (e.g. X = [ 1, 1])? We can consider a discrete, finite, set X X and sample from N X ( [ ˆf(x) ] x X ), σ2 λ [k λ(x, x )] x,x X This will result in a function f evaluated at every x X COMP-652 and ECSE-608, Lecture 5 - September 19,
15 Learning the hyperparameters If we assume that Σ w = I d, then we have λ = σ 2 Let θ denote the kernel hyperparameters (e.g. ρ) In practice you may not know the noise and the optimal kernel hypers Recall: multivariate normal density P (y θ) = exp ( 1 2 y (K θ + σ 2 I m ) 1 y ) (2π)D K θ + σ 2 I m Maximize the marginal likelihood L = log P (y θ): L = 1 2 y (K θ + σ 2 I m ) 1 y D 2 log(2π) 1 2 log K θ + σ 2 I m COMP-652 and ECSE-608, Lecture 5 - September 19,
16 Marginal likelihood: Anatomy of marginal likelihood L = log P (y θ) 1 2 y (K θ + σ 2 I m ) 1 y 1 2 log K θ + σ 2 I m C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Lea ISBN X. c 2006 Massachusetts Institute of Technology. 1st term: quality of predictions; 2nd term: model complexity 5.4 Model Selection for GP Regression Trade-off (from Rasmussen & Williams, 2006): log probability minus complexity penalty data fit marginal likelihood 10 0 characteristic lengthscale (a) log marginal likelihood % conf int 10 0 Characteristic lengthscale Figure 5.3: Panel (a) shows a decomposition of the log marginal likelihood its constituents: data-fit and complexity penalty, as a function of the characteri length-scale. The training data is drawn from a Gaussian process with SE covaria function and parameters (`, f, n) = (1, 1, 0.1), the same as in Figure 2.5, and we fitting only the length-scale parameter ` (the two other parameters have been se COMP-652 and ECSE-608, Lecture 5 - September 19, (b) 2 5
17 Compute gradients: Gradient-based optimization L = 1 θ i 2 y (K θ + σ 2 I m ) 1 (K θ + σ 2 I m ) (K θ + σ 2 I m ) 1 y θ i 1 ( 2 Tr (K θ + σ 2 I m ) 1 (K θ + σ 2 ) I m ) θ i Minimize the negative Non-convex optimization task COMP-652 and ECSE-608, Lecture 5 - September 19,
18 Summary Normal priors on the weights distribution Gaussian Process Regularization prior on the weights covariance GP provides a posterior distribution on functions Expectation: kernel regression model Covariance confidence intervals Sample discretized functions from a GP COMP-652 and ECSE-608, Lecture 5 - September 19,
19 Typical supervised setting Have dataset (X, y) of previously acquired data Learn model w on (X, y) Apply model w to provide predictions at new data points Samples (X, y) are often assumed to be i.i.d. COMP-652 and ECSE-608, Lecture 5 - September 19,
20 Streaming setting Start with (possibly empty) dataset (X 0, y 0 ) For each time step t = 1, 2,... : Fit model w t on X t 1 and y t 1 Acquire a new sample (x t, y t ) (possibly using w t ) Define X t = [x i ] i=1...t, y t = [y i ] i=1...t Samples may be dependent of model (not i.i.d.) Samples influence model / Samples depend on model COMP-652 and ECSE-608, Lecture 5 - September 19,
21 Application: Online function optimization 1.0 Y ? X Unknown function f : X R Sequentially select locations (x t ) t 1 at which to observe the function y t Try to maximize/minimize the observations COMP-652 and ECSE-608, Lecture 5 - September 19,
22 Example What is the optimal treatment dosage for a given disease? For each time step t = 1, 2,... : New patient t comes in We decide on treatment dosage x t We observe the patient s response to treatement, y t Our goal is to cure patients as effectively as possible What you don t want: give bad dosages that were known to be bad What you want: give good dosages try informative dosages COMP-652 and ECSE-608, Lecture 5 - September 19,
23 Information An informative sample improves the model by reducing its uncertainty 1 1 Y 0 Y X X How do we quantity the reduction of uncertainty after observing y 1,..., y t at locations x 1,..., x t? COMP-652 and ECSE-608, Lecture 5 - September 19,
24 A little bit of information theory Mutual information between underlying function f and observations y 1,..., y t at locations x 1,..., x t : I(y 1,..., y t ; f) = H(y 1,..., y t ) }{{} Marginal entropy H(y 1,..., y t f) }{{} Conditional entropy It quantifies the amount of information obtained about one random variable, through the other random variable Entropy is the amount of information held by a random variable H(Y X) = 0 if and only if the value of Y is completely determined by the value of X H(Y X) = H(Y ) if and only if Y and X are independent random variables COMP-652 and ECSE-608, Lecture 5 - September 19,
25 Amount of information H(X) = E[ ln P X ] = n p(x i ) log b (p(x i )) i=1 Example: Tossing a coin A fair coin has maximal entropy: it is the less predictable! Every new sample that you get changes your model the most COMP-652 and ECSE-608, Lecture 5 - September 19,
26 Mutual information Entropy of X N D (µ, Σ): [ ln exp ( 1 2 (x µ) Σ 1 (x µ) ) ] H(X) = E (2π)D Σ = D 2 +D 2 ln 2π+1 2 ln Σ Using E [ a M 1 a ] = E [ Tr ( a M 1 a )] = E [ Tr ( aa M 1)] = D Recall the joint distribution between m observations y and m query points X : [ ( [ ]) y KX,X + σ N f ] m+m 0, 2 I m K X,X K X,X K X,X Then I(y 1,..., y t ; f) = 1 2 ln K t + σ 2 I t COMP-652 and ECSE-608, Lecture 5 - September 19,
27 Another decomposition of mutual information ( ) Recall that y 1,..., y t f(x 1 ),..., f(x t ) N t (f(x1 ),..., f(x t )), σ 2 I t Using that y i = f(x i ) + ɛ with ɛ N (0, σ 2 ) Pluging-in the conditional entropy of a multivariate normal distribution: I(y 1,..., y t ; f) = H(y 1,..., y t ) 1 2 ln 2πeσ2 I t = H(y 1,..., y t ) t 2 ln 2πe t 2 σ2 What is the marginal entropy H(y 1,..., y t )? COMP-652 and ECSE-608, Lecture 5 - September 19,
28 Recursively decompose Entropy of observations H(y 1,..., y t ) = H(y 1,..., y t 1 ) + H(y t y 1,..., y t 1 ) = H(y 1,..., y t 1 ) + 1 )] [2πe (σ 2 ln 2 + σ2 λ k λ,t 1(x t, x t ). = H(y 1 ) + H(y 2 y 1 ) )] [2πe (σ 2 ln 2 + σ2 λ k λ,t 1(x s, x s ) t ( 1 = [2πeσ 2 ln )] λ k λ,s 1(x s, x s ) s=1 Still using that y i = f(x i ) + ɛ with ɛ N (0, σ 2 ) Also using the uncertainty about the location of the true f COMP-652 and ECSE-608, Lecture 5 - September 19,
29 Information gain I(y 1,..., y t ; f) = = t s=1 t s=1 ( 1 [2πeσ 2 ln )] λ k λ,s 1(x s, x s ) [ 1 2 ln ] λ k λ,s 1(x s, x s ) t 2 ln(2πe) t 2 σ2 Information gain γ t (λ) = I(y 1,..., y t ; f): reduction of uncertainty on f after observing y 1,..., y t Information gain is inversely proportionnal to λ Limiting changes in function limits the contribution of samples COMP-652 and ECSE-608, Lecture 5 - September 19,
30 Information gain vs regularization Y 1 0 λ = 0.1 λ = 1 λ = 10 Y 1 0 λ = 0.1 λ = 1 λ = X X γt(λ) 3 2 λ = 0.1 λ = 1 λ = 10 γt(λ) λ = 0.1 λ = 1 λ = t t COMP-652 and ECSE-608, Lecture 5 - September 19,
31 Summary Streaming regression: sequentially gather (potentially non-i.i.d) samples Information gain measures the total information that could result from adding a new sample (observation) to the model Information gain is controlled by the information sharing capability of the kernel Information gain is controlled by the changes in model admitted by regularization Information gain will play a part in confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19,
32 Confidence intervals Given that we have gathered t samples under the streaming setting, what kind of guarantees can we have on the resulting model? More specifically, could we guarantee that f(x) ˆf t (x) something simultaneously for all x X and for all t 0? Motivations: Control bad behaviours in critical applications Help selecting the next sample location Maximize/minimize function? Maximize model improvement? Derive sound algorithms COMP-652 and ECSE-608, Lecture 5 - September 19,
33 Result (Maillard, 2016) Under the assumption of subgaussian noise... f(x) ˆf t (x) [ kλ,t (x, x) λ f K + σ 2 ln(1/δ) + 2γ t (λ)] λ With probability higher than 1 δ Simultaneously for all t 0, for all x X Observe that the error bound scales with the information gain! COMP-652 and ECSE-608, Lecture 5 - September 19,
34 Subgaussian noise A real-valued random variable X is σ 2 -subgaussian if E [ e γx] e γ2 σ 2 /2 The Laplace transform of X is dominated by the Laplace transform of a random variable sampled from N (0, σ 2 ) Require that the tails of the noise distribution are dominated by the tails of a Gaussian distribution For example, true for Gaussian noise Bounded noise COMP-652 and ECSE-608, Lecture 5 - September 19,
35 Confidence envelope 5 observations 10 observations 1 1 Y 0 Y X 50 observations X 100 observations Y Y f λ = σ 2 λ = σ 2 / f 2 K X X COMP-652 and ECSE-608, Lecture 5 - September 19,
36 Summary It is possible to have guarantees even with non-i.i.d. data The predition error depends on How well your model shares information across observations How well your model is adapted to the function How noisy the observations are Regularization prior on Σ w Confidence intervals/envelopes will be really useful for deriving algorithms (we will see that later) COMP-652 and ECSE-608, Lecture 5 - September 19,
Nonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationStatistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes
Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationBayesian Gaussian / Linear Models. Read Sections and 3.3 in the text by Bishop
Bayesian Gaussian / Linear Models Read Sections 2.3.3 and 3.3 in the text by Bishop Multivariate Gaussian Model with Multivariate Gaussian Prior Suppose we model the observed vector b as having a multivariate
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationGaussian Process Regression
Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationExercises with solutions (Set D)
Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where
More informationReliability Monitoring Using Log Gaussian Process Regression
COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical
More informationCSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes
CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail
More informationCOMP 551 Applied Machine Learning Lecture 19: Bayesian Inference
COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationStatistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes
Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics
More informationRelevance Vector Machines
LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationCS 7140: Advanced Machine Learning
Instructor CS 714: Advanced Machine Learning Lecture 3: Gaussian Processes (17 Jan, 218) Jan-Willem van de Meent (j.vandemeent@northeastern.edu) Scribes Mo Han (han.m@husky.neu.edu) Guillem Reus Muns (reusmuns.g@husky.neu.edu)
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II
1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationPattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods
Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationMachine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari
Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing
More informationMachine learning - HT Maximum Likelihood
Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationLecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu
Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes
More informationInformation Theory. Coding and Information Theory. Information Theory Textbooks. Entropy
Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin
More informationMachine Learning Srihari. Gaussian Processes. Sargur Srihari
Gaussian Processes Sargur Srihari 1 Topics in Gaussian Processes 1. Examples of use of GP 2. Duality: From Basis Functions to Kernel Functions 3. GP Definition and Intuition 4. Linear regression revisited
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London
More informationCOMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation
COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationBayesian Linear Regression. Sargur Srihari
Bayesian Linear Regression Sargur srihari@cedar.buffalo.edu Topics in Bayesian Regression Recall Max Likelihood Linear Regression Parameter Distribution Predictive Distribution Equivalent Kernel 2 Linear
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More informationGaussian Processes. 1 What problems can be solved by Gaussian Processes?
Statistical Techniques in Robotics (16-831, F1) Lecture#19 (Wednesday November 16) Gaussian Processes Lecturer: Drew Bagnell Scribe:Yamuna Krishnamurthy 1 1 What problems can be solved by Gaussian Processes?
More informationProbabilistic Reasoning in Deep Learning
Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationProbability Theory for Machine Learning. Chris Cremer September 2015
Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationMassachusetts Institute of Technology
Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationMachine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart
Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural
More informationGaussian with mean ( µ ) and standard deviation ( σ)
Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationLECTURE NOTE #3 PROF. ALAN YUILLE
LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationAdvanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)
Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning November 17, 2011 CharmGil Hong Agenda Motivation GP : How does it make sense? Prior : Defining a GP More about Mean and Covariance Functions Posterior : Conditioning
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationProbabilistic & Bayesian deep learning. Andreas Damianou
Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationGaussian Process Regression Networks
Gaussian Process Regression Networks Andrew Gordon Wilson agw38@camacuk mlgengcamacuk/andrew University of Cambridge Joint work with David A Knowles and Zoubin Ghahramani June 27, 2012 ICML, Edinburgh
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationGaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer
Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationNeutron inverse kinetics via Gaussian Processes
Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques
More information