Machine Learning CS-6350, Assignment - 5, Due: 14th November 2013

SCHOOL OF COMPUTING, UNIVERSITY OF UTAH
Machine Learning CS-6350, Assignment - 5, Due: 14th November 2013
Chandramouli, Shridharan (sdharan@cs.utah.edu)
Singla, Sumedha (sumedha.singla@utah.edu)
November 14, 2013

INTRODUCTION

In this assignment we implement the mixture of Gaussians model. Section 1 uses the mixture of Gaussians for classification, whereas Sections 2 and 3 use the mixture of Gaussians as a tool for regression.

1 MIXTURE OF GAUSSIANS FOR CLASSIFICATION

Question 1(a): In an unsupervised classification problem, we are given data $[x_i]_{i=1}^n$ and a number of classes $m$, and we try to learn

p(y = j \mid x) = \frac{p(x \mid y = j)\, p(y = j)}{p(x)} = \frac{p(x \mid y = j)\, p(y = j)}{\sum_{k=1}^{m} p(x \mid y = k)\, p(y = k)}    (1.1)

where $y \in \{1, \dots, m\}$ are the class labels. Our hypothesis is that the data is generated as

y \sim \mathrm{Multinomial}(\phi), \qquad (x \mid y = j) \sim \mathcal{N}(\mu_j, \Sigma_j), \quad j \in \{1, \dots, m\}.    (1.2)
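To make the generative hypothesis (1.2) concrete, the following minimal MATLAB sketch draws synthetic data from it; the specific values of n, phi, mu and Sigma below are illustrative assumptions, not the assignment's data set.

    % Sample from the generative model in (1.2): y ~ Multinomial(phi), x|y ~ N(mu_y, Sigma_y).
    % phi, mu, Sigma here are made-up example parameters.
    % mvnrnd requires the Statistics and Machine Learning Toolbox.
    n     = 500;
    phi   = [0.3 0.3 0.4];                      % mixing weights, sum to 1
    mu    = [7 4; 2 4; 5 6];                    % m x d matrix of component means
    Sigma = cat(3, eye(2), eye(2), eye(2));     % d x d x m covariance matrices
    y = zeros(n, 1);
    x = zeros(n, 2);
    for i = 1:n
        y(i)    = find(rand <= cumsum(phi), 1);             % draw the latent class label
        x(i, :) = mvnrnd(mu(y(i), :), Sigma(:, :, y(i)));   % draw x given the class
    end
    scatter(x(:, 1), x(:, 2), 10, y, 'filled');             % colour points by class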

Hence, the parameters of this model are $\phi$, $[\mu_j]_{j=1}^m$ and $[\Sigma_j]_{j=1}^m$. Since the labels $y_i$ of the data points are unknown, we need the EM algorithm to find locally-maximal log-likelihood estimates for $\phi$, $[\mu_j]_{j=1}^m$ and $[\Sigma_j]_{j=1}^m$.

Question: Derive the update equations for the parameters $\phi$, $[\mu_j]_{j=1}^m$ and $[\Sigma_j]_{j=1}^m$ of the model as maximizers of the expected log-likelihood given the distributions $p(y_i = j) = w_{i,j}$.

Answer: The mixture of Gaussians model for one $x_i$ in terms of a latent variable $y$ is

p(x_i) = \sum_{j=1}^{m} p(x_i \mid y = j)\, p(y = j), \qquad p(y = j) = \phi_j, \qquad p(x_i \mid y = j) = \mathcal{N}(x_i; \mu_j, \Sigma_j).    (1.3)

Let us now introduce the indicator variable $q_{i,j}$, equal to 1 if Gaussian $j$ emitted $x_i$ and 0 otherwise. Writing $\theta = (\phi, [\mu_j]_{j=1}^m, [\Sigma_j]_{j=1}^m)$ for the parameters, the joint likelihood of all the $X$ and $q$ is

p(X, q \mid \theta) = \prod_{i=1}^{n} \prod_{j=1}^{m} p(x_i \mid y = j)^{q_{i,j}}\, p(y = j)^{q_{i,j}}.    (1.4)

Taking the log of both sides,

\log p(X, q \mid \theta) = \sum_{i=1}^{n} \sum_{j=1}^{m} \big( q_{i,j} \log p(x_i \mid y = j) + q_{i,j} \log p(y = j) \big).    (1.5)

The corresponding auxiliary function is

A(\theta) = E[\log p(X, q \mid \theta)]
          = \sum_{i=1}^{n} \sum_{j=1}^{m} \big( E[q_{i,j}] \log p(x_i \mid y = j) + E[q_{i,j}] \log p(y = j) \big)
          = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( E[q_{i,j}] \log\!\left( \frac{1}{\sqrt{(2\pi)^a |\Sigma_j|}}\, e^{-\frac{1}{2}(x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j)} \right) + E[q_{i,j}] \log \phi_j \right).    (1.6)

The E-step estimates the posterior:

E[q_{i,j}] = 1 \cdot p(q_{i,j} = 1 \mid X) + 0 \cdot p(q_{i,j} = 0 \mid X) = p(y = j \mid x_i) = w_{i,j}.    (1.7)

The M-step finds the parameters $\theta$ that maximize $A$:

\frac{\partial A}{\partial \theta} = 0.    (1.8)
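Before deriving the individual M-step updates, here is a minimal sketch of the E-step in (1.7), computing the responsibilities $w_{i,j}$ from the current parameter estimates; the variable names and array shapes are assumptions for illustration.

    % E-step sketch: w(i,j) = p(y = j | x_i) for the current phi, mu, Sigma.
    % X is n x d, phi is 1 x m, mu is m x d, Sigma is d x d x m (assumed shapes).
    % mvnpdf requires the Statistics and Machine Learning Toolbox.
    function w = estep(X, phi, mu, Sigma)
        n = size(X, 1);
        m = numel(phi);
        p = zeros(n, m);
        for j = 1:m
            p(:, j) = phi(j) * mvnpdf(X, mu(j, :), Sigma(:, :, j));   % phi_j N(x_i; mu_j, Sigma_j)
        end
        w = p ./ sum(p, 2);   % normalize over components: Bayes' rule, equation (1.7)
    end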

M-Step for Means

Setting $\partial A / \partial \mu_j = 0$, only the $\log p(x_i \mid y = j)$ terms depend on $\mu_j$:

\frac{\partial A}{\partial \mu_j} = \sum_{i=1}^{n} p(y = j \mid x_i)\, \frac{\partial}{\partial \mu_j} \log\!\left( \frac{1}{\sqrt{(2\pi)^a |\Sigma_j|}}\, e^{-\frac{1}{2}(x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j)} \right) = \sum_{i=1}^{n} w_{i,j}\, \Sigma_j^{-1} (x_i - \mu_j) = 0

\Rightarrow \sum_{i=1}^{n} w_{i,j}\, x_i = \sum_{i=1}^{n} w_{i,j}\, \mu_j \;\Rightarrow\; \mu_j = \frac{\sum_{i=1}^{n} w_{i,j}\, x_i}{\sum_{i=1}^{n} w_{i,j}}.    (1.9)

M-Step for Variances

Setting $\partial A / \partial \Sigma_j = 0$ (the derivative is written here in the one-dimensional case; in the multivariate case the square becomes the outer product):

\frac{\partial A}{\partial \Sigma_j} = \sum_{i=1}^{n} p(y = j \mid x_i)\, \frac{\partial}{\partial \Sigma_j} \log\!\left( \frac{1}{\sqrt{(2\pi)^a |\Sigma_j|}}\, e^{-\frac{1}{2}(x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j)} \right) = \sum_{i=1}^{n} w_{i,j} \left( -\frac{1}{2\Sigma_j} + \frac{(x_i - \mu_j)^2}{2\Sigma_j^2} \right) = 0

\Rightarrow \sum_{i=1}^{n} w_{i,j}\, (x_i - \mu_j)^2 = \sum_{i=1}^{n} w_{i,j}\, \Sigma_j \;\Rightarrow\; \Sigma_j = \frac{\sum_{i=1}^{n} w_{i,j}\, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{n} w_{i,j}}.    (1.10)

M-Step for Weights

The mixing weights are constrained by

\sum_{j=1}^{m} \phi_j = 1.    (1.11)

We can rewrite the auxiliary function as

J(\theta) = A(\theta) + \left( 1 - \sum_{j=1}^{m} \phi_j \right) \lambda,    (1.12)

where λ is the Lagrange multiplier. Setting $\partial J / \partial \phi_j = 0$:

\frac{\partial J}{\partial \phi_j} = \frac{\partial A}{\partial \phi_j} - \lambda = \sum_{i=1}^{n} p(y = j \mid x_i)\, \frac{1}{\phi_j} - \lambda = 0 \;\Rightarrow\; \phi_j = \frac{\sum_{i=1}^{n} w_{i,j}}{\lambda}.    (1.13)

Summing (1.13) over $j$ and using $\sum_{j=1}^{m} \phi_j = 1$ together with $\sum_{j=1}^{m} w_{i,j} = \sum_{j=1}^{m} p(y = j \mid x_i) = 1$ gives $\lambda = n$, so

\phi_j = \frac{1}{n} \sum_{i=1}^{n} w_{i,j}.

Question 1(b): Plot the data points. Based on the plot, how many classes m do you think the data set should be classified into?

Answer 1(b): Based on the plot we have chosen m = 3. The plot of the data in the 2-D plane is shown in figure 1.1.

Figure 1.1: Plot of the initial data

Question 1(c): Implement the EM algorithm to learn the parameters.

Answer: For the EM algorithm, we wrote a Matlab script which runs the E-step and the M-step alternately until the log-likelihood converges (a minimal sketch of such a loop is given after Question 1(d) below). After running the EM algorithm, the final values for the means are:

\mu_1 = \begin{bmatrix} 7.02 \\ 4 \end{bmatrix}, \qquad \mu_2 = \begin{bmatrix} 1.98 \\ 3.9 \end{bmatrix}, \qquad \mu_3 = \begin{bmatrix} 5.01 \\ 5.97 \end{bmatrix}

We found that repeating the experiment with different initial means did not change the final values much. The values remain almost the same across runs, with minor changes due to the initial values of the means. This is probably because the data set has well-defined peaks.

Question 1(c): How did you initialize the EM algorithm? How does the initialization affect the result? Describe what you see.

Answer: We initialize the EM algorithm with a random mean for each Gaussian. We take the initial value of each Sigma as the identity matrix and each $\phi_j$ as $1/m$, where m is the number of Gaussians. The final parameter values are relatively similar and have little dependence on the initial values, but the number of iterations needed for convergence depends on the initial guess of the means: if the initial guess is very poor, it takes more iterations to converge. Refer to figure 1.2.

Question 1(c): Show plots of the means and the 1-sigma contours of the Gaussian distributions $x \mid y = j$ for $j = 1, \dots, m$ on top of the data upon convergence.

Answer: The plot is shown in figure 1.3.

Question 1(c): Show plots of the evolution of the expected log-likelihood during the iterations of the EM algorithm.

Answer: The plot is shown in figure 1.4.

Question 1(d): Given the parameters $\phi$, $[\mu_j]_{j=1}^m$ and $[\Sigma_j]_{j=1}^m$ as computed above, give an implicit equation for the decision boundaries between each of the classes. Plot the decision boundaries.

Answer: The decision boundary between the $i$-th Gaussian and the $j$-th Gaussian is given implicitly by the equation

p(x \mid y = i)\, p(y = i) = p(x \mid y = j)\, p(y = j)

for each unique combination of $i, j$ with $i, j \in \{1, \dots, m\}$. The plot is shown in figure 1.5.
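For reference, here is a minimal sketch of the EM loop described above (alternating E-step and M-step until the log-likelihood converges). The function name, the convergence tolerance and the small covariance ridge are our own illustrative choices, not the authors' actual script.

    % EM sketch for fitting a mixture of m Gaussians (illustrative, self-contained).
    % mvnpdf requires the Statistics and Machine Learning Toolbox.
    function [phi, mu, Sigma, loglik] = em_gmm(X, m, maxIter, tol)
        [n, d] = size(X);
        mu     = X(randperm(n, m), :);           % random data points as initial means
        Sigma  = repmat(eye(d), [1 1 m]);        % identity initial covariances
        phi    = ones(1, m) / m;                 % uniform initial mixing weights
        loglik = -inf(1, maxIter);
        for it = 1:maxIter
            % E-step: responsibilities w(i,j) = p(y = j | x_i), equation (1.7)
            p = zeros(n, m);
            for j = 1:m
                p(:, j) = phi(j) * mvnpdf(X, mu(j, :), Sigma(:, :, j));
            end
            w = p ./ sum(p, 2);
            % M-step: closed-form updates, equations (1.9), (1.10), (1.13)
            Nj  = sum(w, 1);
            phi = Nj / n;
            for j = 1:m
                mu(j, :) = (w(:, j)' * X) / Nj(j);
                Xc = X - mu(j, :);
                Sigma(:, :, j) = (Xc' * (Xc .* w(:, j))) / Nj(j) + 1e-6 * eye(d);  % small ridge for stability
            end
            % convergence check on the expected log-likelihood
            loglik(it) = sum(log(sum(p, 2)));
            if it > 1 && abs(loglik(it) - loglik(it - 1)) < tol
                loglik = loglik(1:it);
                break;
            end
        end
    end

A call such as [phi, mu, Sigma, ll] = em_gmm(X, 3, 200, 1e-6) followed by plot(ll) would, under these assumptions, produce log-likelihood curves of the kind shown in figure 1.4.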

Figure 1.2: Convergence plot for a poor initial mean

Figure 1.3: Plots of the means and the 1-sigma contours of the Gaussian distributions

Figure 1.4: The expected log-likelihood vs. the number of iterations

Figure 1.5: Decision boundaries between the classes, plotted on top of the data
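The decision boundaries of Question 1(d) (figure 1.5) can be reproduced by contouring the zero level set of the implicit boundary equation on a grid. In the sketch below, phi, mu and Sigma are assumed to come from a previous EM fit and the grid range is a placeholder.

    % Plot pairwise decision boundaries p(x | y=i) p(y=i) = p(x | y=j) p(y=j).
    % phi, mu, Sigma are assumed outputs of an EM fit; the grid range is a guess.
    [gx, gy] = meshgrid(linspace(0, 10, 300), linspace(0, 10, 300));
    G = [gx(:), gy(:)];
    m = numel(phi);
    scores = zeros(size(G, 1), m);
    for j = 1:m
        scores(:, j) = phi(j) * mvnpdf(G, mu(j, :), Sigma(:, :, j));  % p(x | y=j) p(y=j)
    end
    hold on;
    for i = 1:m
        for j = i+1:m
            d = reshape(scores(:, i) - scores(:, j), size(gx));
            contour(gx, gy, d, [0 0], 'k');   % zero level set = boundary between classes i and j
        end
    end
    hold off;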

2 QUESTION 2: MIXTURE OF GAUSSIANS FOR NON-LINEAR REGRESSION

The mixture of Gaussians model can be used to parameterize multi-modal distributions of data. The model is given by

p(x) = \sum_{j=1}^{m} \phi_j\, \mathcal{N}(x : \mu_j, \Sigma_j),    (2.1)

for a given number m of constituent Gaussians, where $\mathcal{N}(x : \mu, \Sigma)$ is the probability density at x of a Gaussian distribution with mean µ and variance Σ:

\mathcal{N}(x : \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^a |\Sigma|}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),    (2.2)

where a is the dimension of x. Here too, $\phi$, $[\mu_j]_{j=1}^m$ and $[\Sigma_j]_{j=1}^m$ are the parameters of the model, which can be learned with the same EM algorithm as above given a set of data, with $\phi_j \in \mathbb{R}$, $\mu_j \in \mathbb{R}^{\dim(\hat{X})}$ and $\Sigma_j \in \mathbb{R}^{\dim(\hat{X}) \times \dim(\hat{X})}$, where $\hat{X}$ is a feature vector whose elements are functions of the raw data elements of x.

Question 2(a): Assume that x is distributed according to a mixture of Gaussians distribution with parameters $\phi$, $[\mu_j]_{j=1}^m$ and $[\Sigma_j]_{j=1}^m$ and a given number m of constituent Gaussians. Give equations for the mean $\mu = E(x)$ and variance $\Sigma = \mathrm{Var}(x)$ of the distribution.

Answer: The mean $\mu = E(x)$ is given by

\mu = E(x) = \sum_{j=1}^{m} \phi_j\, \mu_j,    (2.3)

and the variance $\Sigma = \mathrm{Var}(x)$ is given by

\Sigma = \mathrm{Var}(x) = \sum_{j=1}^{m} \phi_j\, \Sigma_j + \sum_{j=1}^{m} \phi_j\, (\mu_j - E(x))(\mu_j - E(x))^T.    (2.4)

Question 2(b): Give equations for the parameters of the marginal mixture of Gaussians distribution over x in terms of the parameters of the joint distribution.

Answer: If the joint distribution of x and y is given as

P\!\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \sum_{j=1}^{m} \phi_j\, \mathcal{N}\!\left( \begin{bmatrix} x \\ y \end{bmatrix} ; \begin{bmatrix} \mu_j^X \\ \mu_j^Y \end{bmatrix}, \begin{bmatrix} \Sigma_j^X & \Sigma_j^{XY} \\ \Sigma_j^{YX} & \Sigma_j^Y \end{bmatrix} \right),    (2.5)

then the marginal mixture of Gaussians distribution over x has the following parameters in terms of the parameters of the joint distribution.

For each component $j = 1, \dots, m$, the marginal keeps the mixing weight $\phi_j$ and the X-blocks of the joint mean and covariance:

X \sim \mathrm{MoG}_m\!\left( \{ \phi_j, \mu_j^X, \Sigma_j^X \}_{j=1}^{m} \right),    (2.6)

where the joint block mean and covariance can be estimated from the data as

\begin{bmatrix} \mu^X \\ \mu^Y \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \begin{bmatrix} x_i \\ y_i \end{bmatrix},    (2.7)

\begin{bmatrix} \Sigma^X & \Sigma^{XY} \\ \Sigma^{YX} & \Sigma^Y \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \left( \begin{bmatrix} x_i \\ y_i \end{bmatrix} - \begin{bmatrix} \mu^X \\ \mu^Y \end{bmatrix} \right) \left( \begin{bmatrix} x_i \\ y_i \end{bmatrix} - \begin{bmatrix} \mu^X \\ \mu^Y \end{bmatrix} \right)^T.    (2.8)

Conditioning on an observed X, the component weights, means and covariances of the resulting conditional mixture over y are

\phi_j' = \frac{\mathcal{N}(X; \mu_j^X, \Sigma_j^X)\, \phi_j}{\sum_{k=1}^{m} \mathcal{N}(X; \mu_k^X, \Sigma_k^X)\, \phi_k},    (2.9)

\mu_j' = \mu_j^Y + \Sigma_j^{YX} (\Sigma_j^X)^{-1} (X - \mu_j^X),    (2.10)

\Sigma_j' = \Sigma_j^Y - \Sigma_j^{YX} (\Sigma_j^X)^{-1} \Sigma_j^{XY}.    (2.11)

Question 2(c): The conditional distribution x | y is also a mixture of Gaussians distribution. Give equations for the parameters of the conditional mixture of Gaussians distribution of x | y in terms of the parameters of the joint distribution.

Answer: The parameters of the conditional mixture of Gaussians distribution of x | y in terms of the parameters of the joint distribution are:

\phi_j' = \frac{\mathcal{N}(Y; \mu_j^Y, \Sigma_j^Y)\, \phi_j}{\sum_{k=1}^{m} \mathcal{N}(Y; \mu_k^Y, \Sigma_k^Y)\, \phi_k},    (2.12)

\mu_j' = \mu_j^X + \Sigma_j^{XY} (\Sigma_j^Y)^{-1} (Y - \mu_j^Y),    (2.13)

\Sigma_j' = \Sigma_j^X - \Sigma_j^{XY} (\Sigma_j^Y)^{-1} \Sigma_j^{YX}.    (2.14)

Question 2(d): Plot the data and compute the parameters of the joint mixture of Gaussians distribution for a varying number m of constituent Gaussians using the EM algorithm. Plot the means and 1-sigma contours of the constituent Gaussians on top of the data.

Answer: The plot of the data is shown in figure 2.1.
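The conditional formulas above can be used directly for regression: for a fitted joint two-dimensional mixture (x in the first coordinate, y in the second), the sketch below evaluates E(y | x) and Var(y | x) on a grid of query points, which is the kind of curve shown later in figures 2.5 and 2.6. The variable names, shapes and query range are assumptions for illustration.

    % Sketch: E(y|x) and Var(y|x) from a joint 2-D mixture fit (x first, y second).
    % phi (1 x m), mu (m x 2), Sigma (2 x 2 x m) are assumed outputs of an EM fit.
    xs = linspace(0, 1, 200)';        % query points; the range is a placeholder
    Ey = zeros(size(xs));
    Vy = zeros(size(xs));
    m  = numel(phi);
    for t = 1:numel(xs)
        xq = xs(t);
        w = zeros(1, m);  muC = zeros(1, m);  varC = zeros(1, m);
        for j = 1:m
            muX = mu(j, 1);  muY = mu(j, 2);
            sXX = Sigma(1, 1, j);  sXY = Sigma(1, 2, j);  sYY = Sigma(2, 2, j);
            w(j)    = phi(j) * normpdf(xq, muX, sqrt(sXX));   % conditional weight, cf. (2.9)
            muC(j)  = muY + sXY / sXX * (xq - muX);           % conditional mean,  cf. (2.10)
            varC(j) = sYY - sXY^2 / sXX;                      % conditional variance, cf. (2.11)
        end
        w = w / sum(w);
        Ey(t) = w * muC';                                     % mixture mean, cf. (2.3)
        Vy(t) = w * (varC + (muC - Ey(t)).^2)';               % law of total variance, cf. (2.4)
    end
    plot(xs, Ey, 'b', xs, Ey + sqrt(Vy), 'r--', xs, Ey - sqrt(Vy), 'r--');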

Figure 2.1: Plots of the data

Figure 2.2: The means and 1-sigma contours of the constituent Gaussians on top of the data

Figure 2.3: The means and 1-sigma contours of the constituent Gaussians on top of the data

Question 2(d): Plot the graphs of $E(y \mid x)$ and the 1-sigma contours $E(y \mid x) \pm \sqrt{\mathrm{Var}(y \mid x)}$ as functions of x on top of the data. What happens if you change the number m of constituent Gaussians?

Answer: The plot is shown in figure 2.5. On increasing the number m of constituent Gaussians, the accuracy of the model increases and the data fits more accurately between the $E(y \mid x) \pm \sqrt{\mathrm{Var}(y \mid x)}$ boundaries.

3 QUESTION 3: MIXTURE OF GAUSSIANS FOR NON-LINEAR REGRESSION

Question 3(a): Generate $n = 10{,}000$ random states and control inputs $\{x_i, u_i\}_{i=0}^{n-1}$ with bounds $-10 < x < 10$, $-10 < y < 10$, $-\pi < \theta < \pi$, $-1 \le \phi \le 1$ and $-2 \le v \le 2$, and compute $x^{next}_i = f(x_i, u_i) + m_i$ with $m_i \sim N(0, 0.0001 I)$ for all $i = 0, \dots, n-1$. To learn the dynamics model f from this data, learn the parameters of the joint mixture of Gaussians distribution of the data $(x^{next}_i, x_i, u_i)^T$ for a varying number m (m should be on the order of tens) of constituent Gaussians using the EM algorithm.

Answer:

Figure 2.4: The means and 1-sigma contours of the constituent Gaussians on top of the data

Figure 2.5: E(y | x) and the 1-sigma contours E(y | x) ± sqrt(Var(y | x)) as functions of x

We wrote a Matlab function called generate_data.m which takes as input the number of data points n, returns a randomly generated set of vectors $x_i$, $u_i$, and calculates the value of $x^{next}_i$ from the equations below:

[X, U, Xnext] = generate_data(n, noise_flag)

where n is the number of data points needed (n = 10,000 as required by the assignment) and noise_flag determines whether the values of X, U and Xnext are modified with additive Gaussian noise.

x_{k+1} = \begin{cases} x_k + v_k \cos(\theta_k)\, \Delta t & \text{if } \phi_k = 0, \\ x_k - \dfrac{\lambda \sin(\theta_k)}{\tan(\phi_k)} + \dfrac{\lambda \sin\!\left(\theta_k + \frac{v_k \Delta t \tan(\phi_k)}{\lambda}\right)}{\tan(\phi_k)} & \text{otherwise} \end{cases}    (3.1)

y_{k+1} = \begin{cases} y_k + v_k \sin(\theta_k)\, \Delta t & \text{if } \phi_k = 0, \\ y_k + \dfrac{\lambda \cos(\theta_k)}{\tan(\phi_k)} - \dfrac{\lambda \cos\!\left(\theta_k + \frac{v_k \Delta t \tan(\phi_k)}{\lambda}\right)}{\tan(\phi_k)} & \text{otherwise} \end{cases}    (3.2)

\theta_{k+1} = \theta_k + \frac{v_k \Delta t \tan(\phi_k)}{\lambda}    (3.3)

The function find_next_position.m uses the data generated by generate_data and determines the model parameters for a mixture of Gaussians. The number of Gaussians is given as an input to this function.
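A minimal sketch of the data-generation step just described, implementing equations (3.1)-(3.3); the wheelbase lambda, the time step dt and all variable names are illustrative assumptions rather than the constants used in the authors' generate_data.m.

    % Sketch of generating (X, U, Xnext) with the dynamics (3.1)-(3.3) plus noise.
    % lambda (wheelbase) and dt are placeholder values, not the assignment's constants.
    n      = 10000;
    lambda = 1;  dt = 0.1;
    X = [20*rand(n,1) - 10, 20*rand(n,1) - 10, 2*pi*rand(n,1) - pi];   % states [x, y, theta]
    U = [2*rand(n,1) - 1, 4*rand(n,1) - 2];                            % controls [phi, v]
    Xnext = zeros(n, 3);
    for i = 1:n
        x = X(i,1); y = X(i,2); th = X(i,3); ph = U(i,1); v = U(i,2);
        if abs(tan(ph)) < 1e-9                        % straight-line case, phi_k = 0
            xn  = x + v*cos(th)*dt;
            yn  = y + v*sin(th)*dt;
            thn = th;
        else
            w   = v*dt*tan(ph)/lambda;                                      % heading change, (3.3)
            xn  = x - lambda*sin(th)/tan(ph) + lambda*sin(th + w)/tan(ph);  % (3.1)
            yn  = y + lambda*cos(th)/tan(ph) - lambda*cos(th + w)/tan(ph);  % (3.2)
            thn = th + w;
        end
        Xnext(i,:) = [xn, yn, thn] + sqrt(1e-4)*randn(1,3);   % additive noise m_i ~ N(0, 0.0001 I)
    end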

Figure 2.6: E(y | x) and the 1-sigma contours E(y | x) ± sqrt(Var(y | x)) as functions of x

Question 3(b): Now, by constructing the conditional mixture of Gaussians distribution $x^{next} \mid x, u$ and computing its expected value $E(x^{next} \mid x, u)$, we get a learned estimate of the function f. Generate a new set of $n = 10{,}000$ random states and control inputs $\{x_i, u_i\}_{i=0}^{n-1}$ with bounds $-10 < x < 10$, $-10 < y < 10$, $-\pi < \theta < \pi$, $-1 \le \phi \le 1$ and $-2 \le v \le 2$, and compute the root-mean-squared error. Plot this error for a varying number m of constituent Gaussians. What is the optimal number of constituent Gaussians?

Answer:

[mu, phi, Sigma, rmsError] = find_next_position(mGaussianClusters, nDataPoints)

Here µ, φ, Σ are the model parameters of the mixture of m Gaussians over n data points, estimated using the EM algorithm. Using the joint distribution of $X^{next}$, X and U, the conditional distribution of $X^{next} \mid X, U$ is calculated as follows:

X^{next} \mid X, U \sim \mathrm{MoG}_m\!\left( \{ \phi_j', \mu_j', \Sigma_j' \}_{j=1}^{m} \right),    (3.4)

where

\phi_j' = \frac{\mathcal{N}(Y; \mu_j^Y, \Sigma_j^Y)\, \phi_j}{\sum_{k=1}^{m} \mathcal{N}(Y; \mu_k^Y, \Sigma_k^Y)\, \phi_k},    (3.5)

\mu_j' = \mu_j^X + \Sigma_j^{XY} (\Sigma_j^Y)^{-1} (Y - \mu_j^Y),    (3.6)

\Sigma_j' = \Sigma_j^X - \Sigma_j^{XY} (\Sigma_j^Y)^{-1} \Sigma_j^{YX}.    (3.7)

Here $Y = [X\; U]^T$ is used for easier notation. The expected value of the above conditional distribution gives us the estimate of the new $X^{next}$ value:

E(X^{next} \mid X, U) = \sum_{j=1}^{m} \phi_j'\, \mu_j'.    (3.8)

We ran the above code for m = 5, 10, and larger values of m, a number of times, on newly generated data containing X, U and $X^{next}$, and plotted the average RMS values. We found a significant increase in the accuracy of the predicted $x^{next}$ at around m = 30 or 40, and the accuracy almost saturated beyond that. The plot of the RMS error against the number of constituent Gaussians is shown in Figure 3.1. Since there is very little difference in the RMS values to justify a huge m, the optimal value of m is between 30 and 40.

4 WORK SPLIT-UP

The work was split evenly between the authors. The Matlab programs were coded together, and the equations for the first, second and third questions were solved individually and verified.

Figure 3.1: Plot of RMS error against the number of constituent Gaussians

5 CONCLUSION

In this assignment, we used the mixture of Gaussians model to model the distribution of data and as a means of running regressions on data. The first part of the assignment dealt with using the mixture of Gaussians for classifying a set of data, whereas the second and third parts dealt with using mixture of Gaussians models to run regressions on non-linear data.
