Introduction to Machine Learning
1 Introduction to Machine Learning, Lecture 12: Gaussian Processes. Alex Smola, Carnegie Mellon University
2 The Normal Distribution
3 The Normal Distribution
4 Gaussians in Space
5 Gaussians in Space: samples in $\mathbb{R}^2$
6 The Normal Distribution
Density for scalar variables: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Density in $d$ dimensions: $p(x) = (2\pi)^{-\frac{d}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)}$
Principal components: eigenvalue decomposition $\Sigma = U^\top \Lambda U$
Product representation: $p(x) = (2\pi)^{-\frac{d}{2}} |\Lambda|^{-\frac{1}{2}} e^{-\frac{1}{2}(U(x-\mu))^\top \Lambda^{-1} (U(x-\mu))}$
7 The Normal Distribution (principal components)
$\Sigma = U^\top \Lambda U$ and $p(x) = (2\pi)^{-\frac{d}{2}} \prod_{i=1}^{d} \lambda_{ii}^{-\frac{1}{2}}\, e^{-\frac{1}{2}(U(x-\mu))^\top \Lambda^{-1} (U(x-\mu))}$
8 Why do we care?
The central limit theorem shows that in the limit all averages behave like Gaussians.
Parameters are easy to estimate (MLE): $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$ and $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x_i x_i^\top - \mu\mu^\top$
The Gaussian is the distribution with the largest uncertainty (entropy) for a given mean and covariance.
It works well even if the assumptions are wrong.
9 Why do we care?
The central limit theorem shows that in the limit all averages behave like Gaussians.
Parameters are easy to estimate (MLE), with X the data matrix (one sample per column) and m the sample size: $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$ and $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x_i x_i^\top - \mu\mu^\top$
mu = (1/m)*sum(X,2)
sigma = (1/m)*X*X' - mu*mu'
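These estimates are easy to check numerically. Below is a minimal sketch in Python/NumPy (the slides use MATLAB-style snippets; the function name and the toy parameters here are illustrative), assuming X stores one sample per column as in sum(X,2):

```python
import numpy as np

def gaussian_mle(X):
    """Maximum likelihood estimates of mean and covariance.
    X is a (d, m) array with one sample per column."""
    d, m = X.shape
    mu = X.sum(axis=1) / m                    # (d,) sample mean
    Sigma = (X @ X.T) / m - np.outer(mu, mu)  # (d, d) biased covariance estimate
    return mu, Sigma

# toy check against a known Gaussian
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=10000).T
mu_hat, Sigma_hat = gaussian_mle(X)
print(mu_hat)      # close to true_mu
print(Sigma_hat)   # close to true_Sigma
```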
10 Sampling from a Gaussian
Case 1: we have a standard normal generator (randn) and want $x \sim \mathcal{N}(\mu, \Sigma)$.
Recipe: $x = \mu + Lz$ where $z \sim \mathcal{N}(0, \mathbf{1})$ and $\Sigma = LL^\top$.
Proof: $\mathbf{E}\left[(x-\mu)(x-\mu)^\top\right] = \mathbf{E}\left[Lzz^\top L^\top\right] = L\,\mathbf{E}\left[zz^\top\right] L^\top = LL^\top = \Sigma$
Case 2: Box-Müller transform for $U[0,1]$ samples.
$p(x) = \frac{1}{2\pi} e^{-\frac{1}{2}\|x\|^2} \;\Rightarrow\; p(\phi, r) = \frac{1}{2\pi}\, r\, e^{-\frac{1}{2}r^2}, \qquad F(\phi, r) = \frac{\phi}{2\pi}\left[1 - e^{-\frac{1}{2}r^2}\right]$
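A minimal NumPy sketch of Case 1 above (the covariance and sample count are made-up values): draw standard normals and push them through a Cholesky factor of $\Sigma$.

```python
import numpy as np

def sample_gaussian(mu, Sigma, n, rng):
    """Draw n samples from N(mu, Sigma) via x = mu + L z with Sigma = L L^T."""
    L = np.linalg.cholesky(Sigma)           # lower-triangular factor
    z = rng.standard_normal((len(mu), n))   # z ~ N(0, I), one sample per column
    return mu[:, None] + L @ z

rng = np.random.default_rng(1)
mu = np.array([0.0, 3.0])
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
X = sample_gaussian(mu, Sigma, 100000, rng)
print(np.cov(X, bias=True))                 # empirical covariance, close to Sigma
```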
11 Sampling from a Gaussian (figure: polar coordinates $\phi$, $r$)
$p(x) = \frac{1}{2\pi} e^{-\frac{1}{2}\|x\|^2} \;\Rightarrow\; p(\phi, r) = \frac{1}{2\pi}\, r\, e^{-\frac{1}{2}r^2}, \qquad F(\phi, r) = \frac{\phi}{2\pi}\left[1 - e^{-\frac{1}{2}r^2}\right]$
12 Sampling from a Gaussian
Cumulative distribution function $F(\phi, r) = \frac{\phi}{2\pi}\left[1 - e^{-\frac{1}{2}r^2}\right]$
Draw the radial and angle components separately:
tmp1 = rand()
tmp2 = rand()
r = sqrt(-2*log(tmp1))
x1 = r*sin(2*pi*tmp2)
x2 = r*cos(2*pi*tmp2)
13 Sampling from a Gaussian
Cumulative distribution function $F(\phi, r) = \frac{\phi}{2\pi}\left[1 - e^{-\frac{1}{2}r^2}\right]$
Draw the radial and angle components separately:
tmp1 = rand()
tmp2 = rand()
r = sqrt(-2*log(tmp1))
x1 = r*sin(2*pi*tmp2)
x2 = r*cos(2*pi*tmp2)
Why can we use tmp1 instead of 1-tmp1?
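A NumPy sketch of the Box-Müller recipe above. It answers the slide's question implicitly: tmp1 and 1-tmp1 have the same uniform distribution, so either works; the sketch uses 1-u1 only to avoid log(0). The sample count is arbitrary.

```python
import numpy as np

def box_muller(n, rng):
    """Generate n pairs of independent standard normal samples from uniforms."""
    u1 = rng.random(n)                     # uniform on [0, 1)
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log1p(-u1))      # radial component; log(1 - u1) avoids log(0)
    x1 = r * np.cos(2.0 * np.pi * u2)      # angular component
    x2 = r * np.sin(2.0 * np.pi * u2)
    return x1, x2

rng = np.random.default_rng(2)
x1, x2 = box_muller(100000, rng)
print(x1.mean(), x1.std())                 # approximately 0 and 1
```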
14 Example: correlating weight and height
15 Example: correlating weight and height assume Gaussian correlation
16 $p(\text{weight} \mid \text{height}) = \dfrac{p(\text{height}, \text{weight})}{p(\text{height})} \propto p(\text{height}, \text{weight})$
17 $p(x_2 \mid x_1) \propto \exp\left(-\frac{1}{2}\begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^\top \Sigma^{-1} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}\right)$; keep only the terms of the exponent that are linear and quadratic in $x_2$.
18 The gory math
Correlated observations: assume that the random variables $t \in \mathbb{R}^n$, $t' \in \mathbb{R}^{n'}$ are jointly normal with mean $(\mu, \mu')$ and covariance matrix $K$:
$p(t, t') \propto \exp\left(-\frac{1}{2}\begin{bmatrix} t - \mu \\ t' - \mu' \end{bmatrix}^\top \begin{bmatrix} K_{tt} & K_{tt'} \\ K_{tt'}^\top & K_{t't'} \end{bmatrix}^{-1} \begin{bmatrix} t - \mu \\ t' - \mu' \end{bmatrix}\right)$
Inference: given $t$, estimate $t'$ via $p(t' \mid t)$. In machine learning language: we learn $t'$ from $t$.
Practical solution: since $t' \mid t \sim \mathcal{N}(\tilde{\mu}, \tilde{K})$, we only need to collect all terms in $p(t, t')$ depending on $t'$ by matrix inversion (the remaining terms are independent of $t'$), hence
$\tilde{K} = K_{t't'} - K_{tt'}^\top K_{tt}^{-1} K_{tt'}$ and $\tilde{\mu} = \mu' + K_{tt'}^\top K_{tt}^{-1}(t - \mu)$
Handbook of Matrices, Lütkepohl 1997, is a big timesaver.
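The conditioning formulas can be verified numerically. Here is a minimal NumPy sketch (the block names follow the slide; the toy numbers are made up):

```python
import numpy as np

def condition_gaussian(mu_t, mu_tp, K_tt, K_ttp, K_tptp, t_obs):
    """Mean and covariance of t' | t for a jointly Gaussian (t, t')."""
    mu_cond = mu_tp + K_ttp.T @ np.linalg.solve(K_tt, t_obs - mu_t)
    K_cond = K_tptp - K_ttp.T @ np.linalg.solve(K_tt, K_ttp)
    return mu_cond, K_cond

# toy 2-D example: observe the first coordinate, predict the second
mu = np.array([0.0, 1.0])
K = np.array([[1.0, 0.6],
              [0.6, 2.0]])
m, C = condition_gaussian(mu[:1], mu[1:], K[:1, :1], K[:1, 1:], K[1:, 1:],
                          np.array([1.5]))
print(m, C)   # conditional mean 1.9 and variance 1.64
```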
19 Mini Summary
Normal distribution: $p(x) = (2\pi)^{-\frac{d}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)}$
Sampling from $x \sim \mathcal{N}(\mu, \Sigma)$: use $x = \mu + Lz$ where $z \sim \mathcal{N}(0, \mathbf{1})$ and $\Sigma = LL^\top$
Estimating mean and variance: $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$ and $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x_i x_i^\top - \mu\mu^\top$
The conditional distribution is Gaussian, too: $p(x_2 \mid x_1) \propto \exp\left(-\frac{1}{2}\begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^\top \Sigma^{-1} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}\right)$
20 Gaussian Processes
21 Gaussian Process
Key idea: instead of a fixed set of random variables $t, t'$ we assume a stochastic process $t: \mathcal{X} \to \mathbb{R}$, e.g. $\mathcal{X} = \mathbb{R}^n$. Previously we had $\mathcal{X} = \{\text{age, height, weight}, \dots\}$.
Definition of a Gaussian process: a stochastic process $t: \mathcal{X} \to \mathbb{R}$ where all finite collections $(t(x_1), \dots, t(x_m))$ are jointly normally distributed.
Parameters of a GP: mean $\mu(x) := \mathbf{E}[t(x)]$ and covariance function $k(x, x') := \mathrm{Cov}(t(x), t(x'))$
Simplifying assumption: we assume knowledge of $k(x, x')$ and set $\mu = 0$.
22 Gaussian Process
Sampling from a Gaussian process: pick points X where we want to sample and compute the covariance matrix. We can only obtain values at those points; in general the entire function f(x) is NOT available.
23 Gaussian Process
Sampling from a Gaussian process (continued): the sampled function only looks smooth because it is evaluated at many points.
24 Gaussian Process
Sampling from a Gaussian process: the values at the chosen points are drawn from
$p(t \mid X) = (2\pi)^{-\frac{m}{2}} |K|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(t-\mu)^\top K^{-1}(t-\mu)\right)$ where $K_{ij} = k(x_i, x_j)$ and $\mu_i = \mu(x_i)$
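To make this concrete, here is a small NumPy sketch that draws function values from a zero-mean GP prior with a squared-exponential kernel (the kernel, its length scale, and the grid are illustrative choices, not from the slides):

```python
import numpy as np

def rbf_kernel(x, xp, ell=0.3):
    """Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2.0 * ell ** 2))

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)                  # points where we want to sample
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))    # small jitter for numerical stability
L = np.linalg.cholesky(K)
samples = L @ rng.standard_normal((len(x), 5))  # five draws t ~ N(0, K)
print(samples.shape)                            # (200, 5): values only at the chosen x
```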
25 Kernels...
Covariance function: a function of two arguments that leads to a matrix with nonnegative eigenvalues; it describes the correlation between pairs of observations.
Kernel: a function of two arguments that leads to a matrix with nonnegative eigenvalues; it is a similarity measure between pairs of observations.
Lucky guess: we suspect that kernels and covariance functions are the same... yes!
26 Mini Summary
Gaussian process: think of a distribution over function values (not functions), defined by a mean and a covariance function:
$p(t \mid X) = (2\pi)^{-\frac{m}{2}} |K|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(t-\mu)^\top K^{-1}(t-\mu)\right)$
It generates vectors of arbitrary dimensionality (via X); the covariance function is given by a kernel.
27 Gaussian Process Regression
28 Gaussian Processes for Inference
$\mathcal{X} = \{\text{height, weight}\}$ and $p(\text{weight} \mid \text{height}) = \dfrac{p(\text{height}, \text{weight})}{p(\text{height})} \propto p(\text{height}, \text{weight})$
29 Joint Gaussian Model
Random variables $(t, t')$ are drawn from a GP:
$p(t, t') \propto \exp\left(-\frac{1}{2}\begin{bmatrix} t - \mu \\ t' - \mu' \end{bmatrix}^\top \begin{bmatrix} K_{tt} & K_{tt'} \\ K_{tt'}^\top & K_{t't'} \end{bmatrix}^{-1} \begin{bmatrix} t - \mu \\ t' - \mu' \end{bmatrix}\right)$
Observe the subset $t$, predict $t'$ using
$\tilde{K} = K_{t't'} - K_{tt'}^\top K_{tt}^{-1} K_{tt'}$ and $\tilde{\mu} = \mu' + K_{tt'}^\top K_{tt}^{-1}(t - \mu)$
This is a linear expansion (precompute things). The predictive uncertainty is data independent, which is good for experimental design.
30 Linear Gaussian Process Regression
Linear kernel: $k(x, x') = \langle x, x' \rangle$, so the kernel matrix is $X^\top X$.
Mean and covariance:
$\tilde{K} = X'^\top X' - X'^\top X (X^\top X)^{-1} X^\top X' = X'^\top (\mathbf{1} - P_X) X'$ and $\tilde{\mu} = X'^\top X (X^\top X)^{-1} t$
Problem: $\tilde{\mu}$ is a linear function of $X'$, and the covariance matrix $X^\top X$ has at most rank $n$. After $n$ observations ($x \in \mathbb{R}^n$) the variance vanishes. This is not realistic: the distribution is a flat pancake or cigar.
31 Degenerate Covariance
32 Degenerate Covariance (figure: latent values $t$ over inputs $x$)
33 Degenerate Covariance (figure: observations $y$): fatten up the covariance
34 Degenerate Covariance (figure): fatten up the covariance
35 Degenerate Covariance: $t \sim \mathcal{N}(\mu, K)$ and $y_i \sim \mathcal{N}(t_i, \sigma^2)$
36 Additive Noise
Indirect model: instead of observing $t(x)$ we observe $y = t(x) + \xi$, where $\xi$ is a nuisance term. This yields
$p(y \mid X) = \int \prod_{i=1}^{m} p(y_i \mid t_i)\, p(t \mid X)\, dt$
where we can now find a maximum a posteriori solution for $t$ by maximizing the integrand (we will use this later).
Additive normal noise: if $\xi \sim \mathcal{N}(0, \sigma^2)$ then $y$ is the sum of two Gaussian random variables; means and variances add up, so $y \sim \mathcal{N}(\mu, K + \sigma^2 \mathbf{1})$.
37 Data
38 Predictive mean: $\mu(x) = k(x, X)^\top \left(K(X, X) + \sigma^2 \mathbf{1}\right)^{-1} y$
39 Predictive variance: $k(x, x) + \sigma^2 - k(x, X)^\top \left(K(X, X) + \sigma^2 \mathbf{1}\right)^{-1} k(x, X)$
40 Putting it all together
41 Putting it all together
42 Ugly details
Covariance matrices: additive noise replaces $K$ by $K_{\text{kernel}} + \sigma^2 \mathbf{1}$.
Predictive mean and variance (noise free):
$\tilde{K} = K_{t't'} - K_{tt'}^\top K_{tt}^{-1} K_{tt'}$ and $\tilde{\mu} = K_{tt'}^\top K_{tt}^{-1} t$
With noise:
$\tilde{K} = K_{t't'} + \sigma^2 \mathbf{1} - K_{tt'}^\top \left(K_{tt} + \sigma^2 \mathbf{1}\right)^{-1} K_{tt'}$ and $\tilde{\mu} = \mu' + K_{tt'}^\top \left(K_{tt} + \sigma^2 \mathbf{1}\right)^{-1} (y - \mu)$
43 Pseudocode
$\tilde{K} = K_{t't'} + \sigma^2 \mathbf{1} - K_{tt'}^\top \left(K_{tt} + \sigma^2 \mathbf{1}\right)^{-1} K_{tt'}$ and $\tilde{\mu} = \mu' + K_{tt'}^\top \left(K_{tt} + \sigma^2 \mathbf{1}\right)^{-1} (y - \mu)$
ktrtr = k(xtrain,xtrain) + sigma2 * eye(mtr)
ktetr = k(xtest,xtrain)
ktete = k(xtest,xtest)
alpha = ktrtr \ ytr   % better if you use a Cholesky factorization
yte = ktetr * alpha
sigmate = ktete + sigma2 * eye(mte) - ktetr * (ktrtr \ ktetr')
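A self-contained NumPy version of the pseudocode above. This is a sketch only: the squared-exponential kernel, the noise level, and the synthetic data are assumptions, not part of the slides.

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """k(x, x') = exp(-(x - x')^2 / (2 ell^2)) for 1-D inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

def gp_regression(xtr, ytr, xte, sigma2=0.1):
    """Predictive mean and covariance of a zero-mean GP with additive noise."""
    Ktrtr = rbf(xtr, xtr) + sigma2 * np.eye(len(xtr))
    Ktetr = rbf(xte, xtr)
    Ktete = rbf(xte, xte)
    alpha = np.linalg.solve(Ktrtr, ytr)          # better: reuse a Cholesky factor
    mean = Ktetr @ alpha
    cov = Ktete + sigma2 * np.eye(len(xte)) - Ktetr @ np.linalg.solve(Ktrtr, Ktetr.T)
    return mean, cov

rng = np.random.default_rng(4)
xtr = rng.uniform(0.0, 5.0, 20)
ytr = np.sin(xtr) + 0.3 * rng.standard_normal(20)
xte = np.linspace(0.0, 5.0, 50)
mean, cov = gp_regression(xtr, ytr, xte)
print(mean[:3], np.sqrt(np.diag(cov))[:3])       # predictive mean and standard deviation
```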
44 The connection between SVM and GP
Gaussian process on parameters: $t \sim \mathcal{N}(\mu, K)$ where $K_{ij} = k(x_i, x_j)$
Linear model in feature space: $t(x) = \langle \phi(x), w \rangle + \mu(x)$ where $w \sim \mathcal{N}(0, \mathbf{1})$
The covariance between $t(x)$ and $t(x')$ is then given by
$\mathbf{E}_w\left[\langle \phi(x), w \rangle \langle w, \phi(x') \rangle\right] = \langle \phi(x), \phi(x') \rangle = k(x, x')$
A linear model in feature space induces a Gaussian process.
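The identity $\mathbf{E}_w[\langle \phi(x), w \rangle \langle w, \phi(x') \rangle] = \langle \phi(x), \phi(x') \rangle$ can be checked with a quick Monte Carlo sketch (the feature vectors below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
phi_x = np.array([1.0, -0.5, 2.0])      # arbitrary feature vectors
phi_xp = np.array([0.3, 1.2, -1.0])
W = rng.standard_normal((200000, 3))    # w ~ N(0, I), one draw per row
estimate = np.mean((W @ phi_x) * (W @ phi_xp))
print(estimate, phi_x @ phi_xp)         # Monte Carlo average vs. exact inner product
```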
45 Mini Summary
Latent variables $t$ are drawn from a Gaussian process; observations $y$ are $t$ corrupted with noise, so $y$ is also drawn from a Gaussian process with $\mu \to \mu$ and $K \to K + \sigma^2 \mathbf{1}$.
Estimate $y'$ from $y, x, x'$ (matrix inversion):
$\tilde{K} = K_{t't'} + \sigma^2 \mathbf{1} - K_{tt'}^\top \left(K_{tt} + \sigma^2 \mathbf{1}\right)^{-1} K_{tt'}$ and $\tilde{\mu} = \mu' + K_{tt'}^\top \left(K_{tt} + \sigma^2 \mathbf{1}\right)^{-1} (y - \mu)$
The SVM kernel is the GP kernel.
46 Gaussian Process Classification
47 Gaussian Process Classification
Regression: the data $y$ is a scalar; the connection to $t$ is additive noise: $t \sim \mathcal{N}(\mu, K)$ and $y_i \sim \mathcal{N}(t_i, \sigma^2)$, i.e. $p(y_i \mid t_i) \propto e^{-\frac{(y_i - t_i)^2}{2\sigma^2}}$
(Binary) classification: the data $y$ is in $\{-1, 1\}$; the connection to $t$ is the logistic model: $t \sim \mathcal{N}(\mu, K)$ and $p(y_i \mid t_i) = \frac{1}{1 + e^{-y_i t_i}}$
48 Logistic function: $p(y_i \mid t_i) = \frac{1}{1 + e^{-y_i t_i}}$
49 Gaussian Process Classification
Regression: $t \sim \mathcal{N}(\mu, K)$ and $y_i \sim \mathcal{N}(t_i, \sigma^2)$, hence $y \sim \mathcal{N}(\mu, K + \sigma^2 \mathbf{1})$. We can integrate out the latent variable $t$.
Classification: a closed-form solution is not possible. $t \sim \mathcal{N}(\mu, K)$ and $y_i \sim \mathrm{Logistic}(t_i)$; we cannot solve the integral in $t$.
$p(t \mid y, x) \propto p(t \mid x) \prod_{i=1}^{m} p(y_i \mid t_i) \propto \exp\left(-\frac{1}{2} t^\top K^{-1} t\right) \prod_{i=1}^{m} \frac{1}{1 + e^{-y_i t_i}}$
50 Gaussian Process Classification
What we should do: integrate out $t, t'$:
$p(y' \mid y, x, x') = \int d(t, t')\; p(y' \mid t')\, p(y \mid t)\, p(t, t' \mid x, x')$
But this is very, very expensive (e.g. MCMC).
Maximum a posteriori approximation: find $\hat{t} := \operatorname{argmax}_t\, p(y \mid t)\, p(t \mid x)$. Ignore the correlation in the test data (horrible): find $\hat{t}'(x') := \operatorname{argmax}_{t'}\, p(\hat{t}, t' \mid x, x')$. Estimate $y' \mid y, x, x' \sim \mathrm{Logistic}(\hat{t}'(x'))$.
51 Maximum a Posteriori Approximation
Step 1: maximize $p(t \mid y, X)$, i.e. minimize over $t$
$\frac{1}{2} t^\top K^{-1} t + \sum_{i=1}^{m} \log\left(1 + e^{-y_i t_i}\right)$
Step 2: find $t'$ for the MAP estimate of $t$: $t' = K_{tt'}^\top K_{tt}^{-1} t$ (precompute $K_{tt}^{-1} t$)
Step 3: estimate $p(y' \mid t') = \frac{1}{1 + e^{-y' t'}}$
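A minimal NumPy sketch of Steps 1-3. It is a sketch under several assumptions: it uses Newton iterations for Step 1 (a common practical choice, although the slide only states the objective), and the kernel, jitter, and synthetic data are all made up for illustration.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel for 1-D inputs (illustrative choice)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
n = 40
x = rng.uniform(-3.0, 3.0, n)
y = np.sign(x)
flip = rng.random(n) < 0.1
y[flip] = -y[flip]                           # labels in {-1, +1} with 10% noise
K = rbf(x, x) + 1e-6 * np.eye(n)             # jitter for numerical stability

# Step 1: Newton iterations for the MAP estimate of t
t = np.zeros(n)
for _ in range(20):
    pi = sigmoid(t)
    g = y * (1.0 - sigmoid(y * t))           # gradient of sum_i log p(y_i | t_i)
    W = pi * (1.0 - pi)                      # negative Hessian of the log likelihood (diagonal)
    t = np.linalg.solve(np.eye(n) + K * W[None, :], K @ (W * t + g))

# Step 2: t' = K_{tt'}^T K_{tt}^{-1} t at the test points
x_test = np.linspace(-3.0, 3.0, 7)
t_test = rbf(x_test, x) @ np.linalg.solve(K, t)

# Step 3: push t' through the logistic function
print(sigmoid(t_test))                       # p(y' = +1 | t') at the test points
```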
52 Clean Data
53 Noisy Data
54 Connection to SVMs revisited
SVM objective: minimize over $\alpha$
$\frac{1}{2} \alpha^\top K \alpha + \sum_{i=1}^{m} \max\left(0, 1 - y_i [K\alpha]_i\right)$
Logistic regression objective (MAP estimation): minimize over $t$
$\frac{1}{2} t^\top K^{-1} t + \sum_{i=1}^{m} \log\left(1 + e^{-y_i t_i}\right)$
Reparametrize $\alpha = K^{-1} t$: minimize over $\alpha$
$\frac{1}{2} \alpha^\top K \alpha + \sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i [K\alpha]_i\right)\right)$
55 More loss functions
Logistic: $\log\left(1 + e^{-f(x)}\right)$
Huberized: $\begin{cases} 0 & \text{if } f(x) > 1 \\ \frac{1}{2}(1 - f(x))^2 & \text{if } f(x) \in [0, 1] \\ \frac{1}{2} - f(x) & \text{if } f(x) < 0 \end{cases}$
Soft margin: $\max(0, 1 - f(x))$
All three are (asymptotically) linear for large negative $f(x)$ and (asymptotically) 0 for large positive $f(x)$.
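The three losses can be compared directly on a grid of margin values; a short NumPy sketch (the grid is arbitrary):

```python
import numpy as np

f = np.linspace(-3.0, 3.0, 7)                  # margin values f(x)

logistic = np.log1p(np.exp(-f))                # log(1 + e^{-f})
huberized = np.where(f > 1.0, 0.0,
                     np.where(f < 0.0, 0.5 - f, 0.5 * (1.0 - f) ** 2))
soft_margin = np.maximum(0.0, 1.0 - f)         # hinge loss

for name, loss in [("logistic", logistic), ("huberized", huberized),
                   ("soft margin", soft_margin)]:
    print(name, np.round(loss, 3))
```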
56 Mini Summary
Latent variables are drawn from a Gaussian process; observations are drawn from a logistic model.
It is impossible to integrate out the latent variables, so we use maximum a posteriori inference (with many hacks to make it scale).
The optimization problem is similar to the SVM (different loss and parametrization $\alpha = K^{-1} t$).
Advanced topic: adjusting $K$ via a prior on $k$.