Mixtures of Gaussians. Sargur Srihari
1 Mixtures of Gaussians. Sargur Srihari
2 9. Mixture Models and EM
  0. Mixture Models Overview
  1. K-Means Clustering
  2. Mixtures of Gaussians
  3. An Alternative View of EM
  4. The EM Algorithm in General
3 Topics in Mixtures of Gaussians
  Goal of Gaussian Mixture Modeling
  Latent Variables
  Maximum Likelihood
  EM for Gaussian Mixtures
4 Goal of Gaussian Mixture Modeling
  A linear superposition of Gaussians of the form
    p(x) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)
  Goal of modeling: find the maximum likelihood parameters π_k, µ_k, Σ_k
  Examples of data sets and models: 1-D data with K=2 subclasses; 2-D data with K=3
  Each data point is associated with a subclass k with probability π_k
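The superposition above can be evaluated directly. A minimal 1-D sketch (not from the slides; the helper names are illustrative) for a hypothetical K=2 model with equal mixing weights:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gmm_pdf(x, pis, mus, vars_):
    """Linear superposition p(x) = sum_k pi_k N(x | mu_k, var_k)."""
    return sum(p * gaussian_pdf(x, m, v) for p, m, v in zip(pis, mus, vars_))

# Hypothetical K=2 model: two subclasses with equal weight
pis, mus, vars_ = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
print(gmm_pdf(0.0, pis, mus, vars_))
```

By symmetry of this particular model, the density takes equal values at x and −x.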
5 GMMs and Latent Variables
  A GMM is a linear superposition of Gaussian components
  It provides a richer class of density models than the single Gaussian
  We formulate a GMM in terms of discrete latent variables
  This provides deeper insight into the distribution, and serves to motivate the EM algorithm, which gives a maximum likelihood solution for the mixture parameters (mixing coefficients, means and covariances)
6 Latent Variable Representation
  Linear superposition of K Gaussians:
    p(x) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)
  Introduce a K-dimensional binary variable z using a 1-of-K (one-hot) representation:
    z = (z_1,..,z_K) whose elements satisfy z_k ∈ {0,1} and Σ_k z_k = 1
  There are K possible states of z, corresponding to the K components
7 Joint Distribution
  Define the joint distribution of the latent and observed variables: p(x,z) = p(x|z) p(z)
  x is the observed variable; z is the hidden or missing variable
  We need the marginal distribution p(z) and the conditional distribution p(x|z)
8 Graphical Representation of Mixture Model
  The joint distribution p(x,z) is represented in the form p(z) p(x|z)
  The latent variable z = (z_1,..,z_K) represents the subclass; x is the observed variable
  We now specify the marginal p(z) and the conditional p(x|z), and use them to express p(x) in terms of observed and latent variables
9 Specifying the Marginal p(z)
  Associate a probability with each component: denote p(z_k = 1) = π_k, where the parameters {π_k} satisfy 0 ≤ π_k ≤ 1 and Σ_k π_k = 1
  Because z uses a 1-of-K representation, and z_k ∈ {0,1} with the states mutually exclusive, it follows that
    p(z) = Π_{k=1}^{K} π_k^{z_k}
  With one component: p(z_1) = π_1^{z_1}
  With two components: p(z_1,z_2) = π_1^{z_1} π_2^{z_2}
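The product form p(z) = Π_k π_k^{z_k} can be checked numerically: for a one-hot z, every factor with exponent 0 equals one, so the product picks out exactly one mixing coefficient. A small sketch (illustrative helper name, not from the slides):

```python
import numpy as np

def p_z(z, pis):
    """p(z) = prod_k pi_k^{z_k} for a 1-of-K binary vector z."""
    z, pis = np.asarray(z, dtype=float), np.asarray(pis)
    return float(np.prod(pis ** z))

pis = np.array([0.2, 0.3, 0.5])
# Each of the K one-hot states picks out exactly one mixing coefficient
for k in range(3):
    z = np.eye(3)[k]
    print(p_z(z, pis))   # equals pis[k]
```

Summing p(z) over the K one-hot states recovers Σ_k π_k = 1, confirming this is a valid distribution.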
10 Specifying the Conditional p(x|z)
  For a particular component (value of z): p(x | z_k = 1) = N(x | µ_k, Σ_k)
  Thus p(x|z) can be written in the form
    p(x|z) = Π_{k=1}^{K} N(x | µ_k, Σ_k)^{z_k}
  Due to the exponent z_k, all product terms except one equal one
11 Marginal Distribution p(x)
  The joint distribution p(x,z) is given by p(z) p(x|z)
  Thus the marginal distribution of x is obtained by summing over all possible states of z:
    p(x) = Σ_z p(z) p(x|z) = Σ_z Π_{k=1}^{K} [π_k N(x | µ_k, Σ_k)]^{z_k} = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)
  since z_k ∈ {0,1}
  This is the standard form of a Gaussian mixture
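The equality of the sum-over-z form and the standard mixture form can be verified by enumerating the K one-hot states explicitly. A minimal 1-D sketch with made-up parameters (the helper name is illustrative):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pis   = np.array([0.3, 0.7])
mus   = np.array([-1.0, 2.0])
vars_ = np.array([0.5, 1.5])
x = 0.8

# Sum over the K one-hot states of z: p(x) = sum_z p(z) p(x|z)
p_sum = 0.0
for z in np.eye(2):
    pz  = np.prod(pis ** z)                          # p(z)  = prod_k pi_k^{z_k}
    pxz = np.prod(gaussian_pdf(x, mus, vars_) ** z)  # p(x|z) = prod_k N(x|mu_k,var_k)^{z_k}
    p_sum += pz * pxz

# Direct standard form of the Gaussian mixture
p_direct = float(np.dot(pis, gaussian_pdf(x, mus, vars_)))
print(p_sum, p_direct)
```

The two computations agree, since each one-hot state contributes exactly one term π_k N(x | µ_k, Σ_k).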
12 Value of Introducing the Latent Variable
  Suppose we have observations x_1,..,x_N
  Because the marginal distribution is of the form p(x) = Σ_z p(x,z), for every observed data point x_n there is a corresponding latent vector z_n, i.e., its subclass
  Thus we have found a formulation of the Gaussian mixture involving an explicit latent variable
  We are now able to work with the joint distribution p(x,z) instead of the marginal p(x)
  This leads to significant simplification through the introduction of expectation maximization
13 Another Conditional Probability (Responsibility)
  In EM the posterior p(z|x) plays a central role
  The probability p(z_k = 1 | x) is denoted γ(z_k); from Bayes' theorem (using p(x,z) = p(x|z) p(z)):
    γ(z_k) ≡ p(z_k = 1 | x) = p(z_k = 1) p(x | z_k = 1) / Σ_{j=1}^{K} p(z_j = 1) p(x | z_j = 1)
                            = π_k N(x | µ_k, Σ_k) / Σ_{j=1}^{K} π_j N(x | µ_j, Σ_j)
  View π_k = p(z_k = 1) as the prior probability of component k, and γ(z_k) as the corresponding posterior probability once x is observed
  γ(z_k) is also the responsibility that component k takes for explaining the observation x
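Bayes' theorem above normalizes the weighted component densities, so the responsibilities for a point always sum to one, and a point sitting on one component's mean is claimed almost entirely by that component. A minimal 1-D sketch (illustrative names and parameters):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def responsibilities(x, pis, mus, vars_):
    """gamma(z_k) = pi_k N(x|mu_k,var_k) / sum_j pi_j N(x|mu_j,var_j)."""
    weighted = pis * gaussian_pdf(x, mus, vars_)
    return weighted / weighted.sum()

pis   = np.array([0.5, 0.5])
mus   = np.array([-2.0, 2.0])
vars_ = np.array([1.0, 1.0])
print(responsibilities(-2.0, pis, mus, vars_))  # first component takes almost all responsibility
```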
14 Plan of Discussion
  Next we look at:
  1. How to synthetically generate data from a mixture model
  2. Given a data set {x_1,..,x_N}, how to model the data using a mixture of Gaussians
15 Synthesizing Data from a Mixture
  Use ancestral sampling: start with the lowest-numbered node and draw a sample, then move to each successor node and draw a sample given the parent value
  Here: first generate a sample of z, called ẑ, then generate a value for x from the conditional p(x|ẑ)
  Samples from p(x,z) are plotted according to the value of x and colored with the value of z
  Samples from the marginal p(x) are obtained by ignoring the values of z
  Example: 500 points from three Gaussians; the complete data set is {x,z}, the incomplete data set is {x}
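The two-stage ancestral sampling described above can be sketched as follows (a toy three-component 1-D model with made-up parameters; the slides' example is 2-D):

```python
import numpy as np

rng = np.random.default_rng(0)
pis = np.array([0.3, 0.3, 0.4])
mus = np.array([-4.0, 0.0, 4.0])
sds = np.array([1.0, 1.0, 1.0])

def sample_gmm(n):
    """Ancestral sampling: draw z-hat from p(z), then x from p(x | z-hat)."""
    z = rng.choice(len(pis), size=n, p=pis)   # component label, i.e. the value of z
    x = rng.normal(mus[z], sds[z])            # x drawn from the chosen component
    return x, z

x, z = sample_gmm(500)   # complete data: pairs (x, z); incomplete data: x alone
print(x.shape)
```

Discarding z gives samples from the marginal p(x), exactly as the slide describes.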
16 Illustration of Responsibilities
  Evaluate, for every data point, the posterior probability of each component
  The responsibility γ(z_nk) is associated with data point x_n
  Color each point using proportions of red, blue and green ink:
  If for a data point γ(z_n1) = 1, it is colored red
  If for another point γ(z_n2) = γ(z_n3) = 0.5, it has equal blue and green and will appear cyan
17 Maximum Likelihood for GMM
  We wish to model the data set {x_1,..,x_N} using a mixture of Gaussians (N items, each of dimension D)
  Represent the data by an N × D matrix X whose n-th row is x_n^T
  Represent the latent variables by an N × K matrix Z whose n-th row is z_n^T
  Goal: state the likelihood function so as to estimate the three sets of parameters by maximizing the likelihood
18 Graphical Representation of GMM
  For a set of i.i.d. data points {x_n} with corresponding latent points {z_n}, where n = 1,..,N
  Bayesian network for p(X,Z) using plate notation, with the N × D matrix X and the N × K matrix Z
19 Likelihood Function for GMM
  The mixture density function is
    p(x) = Σ_z p(z) p(x|z) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)
  since z takes the states {z_k} with probabilities {π_k}
  Therefore the likelihood function, with the product over the i.i.d. samples, is
    p(X | π, µ, Σ) = Π_{n=1}^{N} Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  and the log-likelihood function, which we wish to maximize, is
    ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  A more difficult problem than for a single Gaussian
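The log-likelihood above can be computed directly for 1-D data. A minimal sketch (illustrative helper and parameters, not from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def log_likelihood(X, pis, mus, vars_):
    """ln p(X) = sum_n ln sum_k pi_k N(x_n | mu_k, var_k), for i.i.d. 1-D data."""
    dens = pis * gaussian_pdf(X[:, None], mus, vars_)  # shape (N, K)
    return float(np.sum(np.log(dens.sum(axis=1))))

X = np.array([-1.9, -2.1, 1.8, 2.2])
pis, mus, vars_ = np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([1.0, 1.0])
print(log_likelihood(X, pis, mus, vars_))
```

Note how the sum over k sits inside the logarithm: this is exactly what prevents the closed-form maximization discussed on the next slide.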
20 Maximization of the Log-Likelihood
    ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  Goal: estimate the three sets of parameters π_k, µ_k, Σ_k by taking derivatives w.r.t. each in turn while keeping the others constant
  But there are no closed-form solutions: the summation over components appears inside the logarithm, so the logarithm no longer acts directly on the Gaussian
  While gradient-based optimization is possible, we consider the iterative EM algorithm
21 Some Issues with the GMM m.l.e.
  Before proceeding with the m.l.e. we briefly mention two technical issues:
  1. The problem of singularities with Gaussian mixtures
  2. The problem of identifiability of mixtures
22 Problem of Singularities with Gaussian Mixtures
  Consider Gaussian mixture components with covariance matrices Σ_k = σ_k² I
  Suppose one component's mean falls exactly on a data point, µ_j = x_n
  That point contributes to the likelihood function the term
    N(x_n | x_n, σ_j² I) = 1 / ((2π)^{1/2} σ_j)   (in one dimension, since the exponential term equals 1)
  As σ_j → 0 this term goes to infinity, so maximization of the log-likelihood
    ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  is not well posed
  This does not happen with a single Gaussian: as σ → 0, the multiplicative factors from the other data points go to zero, driving the likelihood to zero rather than infinity
  In a mixture, however, one component can collapse onto a single point while the remaining components fit the other points
  The problem does not arise in the Bayesian approach; in maximum likelihood it is avoided using heuristics, e.g., resetting the mean or covariance of a component when it begins to collapse
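The blow-up of the likelihood term is easy to demonstrate numerically: evaluating a 1-D Gaussian at its own mean gives 1/((2π)^{1/2} σ), which grows without bound as σ shrinks. A small sketch (illustrative values):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density N(x | mu, sigma^2), parameterized by std dev sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# A component whose mean sits exactly on a data point: mu_j = x_n
x_n = 1.3
for sigma in [1.0, 0.1, 0.01, 0.001]:
    print(sigma, gaussian_pdf(x_n, x_n, sigma))   # grows without bound as sigma -> 0
```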
23 Problem of Identifiability
  A density p(x|θ) is identifiable if θ ≠ θ' implies there is an x for which p(x|θ) ≠ p(x|θ')
  A K-component mixture has a total of K! equivalent solutions, corresponding to the K! ways of assigning K sets of parameters to K components
  E.g., for K=3, K! = 6: 123, 132, 213, 231, 312, 321
  For any given point in the space of parameter values there are a further K!−1 points all giving exactly the same distribution
  However, any of the equivalent solutions is as good as another (e.g., two ways of labeling three Gaussian subclasses: A B C vs. B A C)
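The permutation equivalence can be checked directly: relabeling the components permutes the terms of the sum but leaves the density unchanged at every x. A minimal 1-D sketch with made-up parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gmm_pdf(x, pis, mus, vars_):
    """p(x) = sum_k pi_k N(x | mu_k, var_k)."""
    return float(np.dot(pis, gaussian_pdf(x, mus, vars_)))

pis   = np.array([0.2, 0.3, 0.5])
mus   = np.array([-3.0, 0.0, 3.0])
vars_ = np.array([1.0, 0.5, 2.0])

perm = [2, 0, 1]   # one of the K! = 6 relabelings of the components
for x in [-2.0, 0.4, 3.7]:
    print(x, gmm_pdf(x, pis, mus, vars_), gmm_pdf(x, pis[perm], mus[perm], vars_[perm]))
```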
24 EM for Gaussian Mixtures
  EM is a method for finding maximum likelihood solutions for models with latent variables
  Begin with the log-likelihood function
    ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  We wish to find π, µ, Σ that maximize this quantity
  The task is not straightforward, since the summation appears inside the logarithm
  Take derivatives in turn w.r.t. the means µ_k, the covariance matrices Σ_k, and the mixing coefficients π_k, and set each to zero
25 EM for GMM: Derivative w.r.t. µ_k
  Begin with the log-likelihood function
    ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  Take the derivative w.r.t. the mean µ_k, making use of the exponential form of the Gaussian
  (using the formulas d/dx ln u = u'/u and d/dx e^u = e^u u'), and set it to zero:
    0 = Σ_{n=1}^{N} [ π_k N(x_n | µ_k, Σ_k) / Σ_j π_j N(x_n | µ_j, Σ_j) ] Σ_k^{−1} (x_n − µ_k)
  The bracketed ratio is γ(z_nk), the posterior probability; Σ_k^{−1} is the inverse of the covariance matrix
26 M.L.E. Solution for the Means
  Multiplying by Σ_k (assuming non-singularity) and rearranging:
    µ_k = (1/N_k) Σ_{n=1}^{N} γ(z_nk) x_n    where we have defined N_k = Σ_{n=1}^{N} γ(z_nk)
  The mean of the k-th Gaussian component is the weighted mean of all the points in the data set, where data point x_n is weighted by the posterior probability that component k was responsible for generating x_n
  N_k is the effective number of points assigned to cluster k
27 M.L.E. Solution for the Covariance
  Set the derivative w.r.t. Σ_k to zero, making use of the m.l.e. solution for the covariance matrix of a single Gaussian:
    Σ_k = (1/N_k) Σ_{n=1}^{N} γ(z_nk) (x_n − µ_k)(x_n − µ_k)^T
  Similar to the result for a single Gaussian fitted to the data set, but with each data point weighted by the corresponding posterior probability
  The denominator N_k is the effective number of points in component k
28 M.L.E. Solution for the Mixing Coefficients
  Maximize ln p(X | π, µ, Σ) w.r.t. π_k, taking into account that the mixing coefficients sum to one
  This is achieved using a Lagrange multiplier and maximizing
    ln p(X | π, µ, Σ) + λ (Σ_{k=1}^{K} π_k − 1)
  Setting the derivative w.r.t. π_k to zero and solving gives
    π_k = N_k / N
29 Summary of m.l.e. Expressions
  GMM maximum likelihood parameter estimates:
  Means:                µ_k = (1/N_k) Σ_{n=1}^{N} γ(z_nk) x_n
  Covariance matrices:  Σ_k = (1/N_k) Σ_{n=1}^{N} γ(z_nk) (x_n − µ_k)(x_n − µ_k)^T
  Mixing coefficients:  π_k = N_k / N,   where N_k = Σ_{n=1}^{N} γ(z_nk)
  All three are in terms of the responsibilities, so we have not completely solved the problem
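Given a matrix of responsibilities, the three estimates above become one-line weighted averages. A minimal 1-D sketch (the `m_step` name is illustrative), using hard 0/1 responsibilities so the expected result is easy to check by hand:

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate 1-D GMM parameters from responsibilities gamma (shape N x K)."""
    Nk    = gamma.sum(axis=0)                          # effective number of points per component
    mus   = (gamma * X[:, None]).sum(axis=0) / Nk      # responsibility-weighted means
    vars_ = (gamma * (X[:, None] - mus) ** 2).sum(axis=0) / Nk
    pis   = Nk / len(X)                                # mixing coefficients N_k / N
    return pis, mus, vars_

X = np.array([-2.0, -1.8, 2.1, 1.9])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # hard assignments
pis, mus, vars_ = m_step(X, gamma)
print(pis, mus, vars_)
```

With hard responsibilities these reduce to the per-cluster sample mean and variance, mirroring the slide's remark that each formula is a weighted version of the single-Gaussian result.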
30 EM Formulation
  The results for µ_k, Σ_k, π_k are not closed-form solutions for the parameters, since the responsibilities γ(z_nk) depend on those parameters in a complex way
  The results do suggest an iterative solution: an instance of the EM algorithm for the particular case of the GMM
31 Informal EM for GMM
  First choose initial values for the means, covariances and mixing coefficients
  Then alternate between two updates, called the E step and the M step:
  In the E step, use the current parameter values to evaluate the posterior probabilities, or responsibilities
  In the M step, use these posterior probabilities to re-estimate the means, covariances and mixing coefficients
32 EM Using Old Faithful Data
  (Figure sequence: data points and initial mixture model; initial E step determining responsibilities; after the first M step, re-evaluated parameters; after 2 cycles; after 5 cycles; after 20 cycles)
33 Comparison with K-Means
  (Figures: K-means result vs. EM result)
34 Animation of EM for Old Faithful Data (File: Em_old_faithful.gif)
  Code in R:

    # initial parameter estimates (chosen to be deliberately bad)
    theta <- list(
      tau    = c(0.5, 0.5),
      mu1    = c(2.8, 75),
      mu2    = c(3.6, 58),
      sigma1 = matrix(c(0.8, 7, 7, 70), ncol = 2),
      sigma2 = matrix(c(0.8, 7, 7, 70), ncol = 2)
    )
35 Practical Issues with EM
  EM takes many more iterations than K-means, and each cycle requires significantly more computation
  It is common to run K-means first to find a suitable initialization; the covariance matrices can be initialized to the covariances of the clusters found by K-means
  EM is not guaranteed to find the global maximum of the log-likelihood function
36 Summary of EM for GMM
  Given a Gaussian mixture model, the goal is to maximize the likelihood function w.r.t. the parameters (means, covariances and mixing coefficients)
  Step 1: Initialize the means µ_k, covariances Σ_k and mixing coefficients π_k, and evaluate the initial value of the log-likelihood
37 EM Continued
  Step 2 (E step): Evaluate the responsibilities using the current parameter values:
    γ(z_nk) = π_k N(x_n | µ_k, Σ_k) / Σ_{j=1}^{K} π_j N(x_n | µ_j, Σ_j)
  Step 3 (M step): Re-estimate the parameters using the current responsibilities:
    µ_k^new = (1/N_k) Σ_{n=1}^{N} γ(z_nk) x_n
    Σ_k^new = (1/N_k) Σ_{n=1}^{N} γ(z_nk) (x_n − µ_k^new)(x_n − µ_k^new)^T
    π_k^new = N_k / N,   where N_k = Σ_{n=1}^{N} γ(z_nk)
38 EM Continued
  Step 4: Evaluate the log-likelihood
    ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
  and check for convergence of either the parameters or the log-likelihood
  If the convergence criterion is not satisfied, return to Step 2
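Steps 1 through 4 can be put together into a short loop. A minimal 1-D sketch (illustrative names and synthetic data, not the slides' Old Faithful example); one property worth checking is that the log-likelihood never decreases across iterations:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm(X, pis, mus, vars_, n_iter=20):
    """EM for a 1-D GMM; returns final parameters and the log-likelihood trace."""
    lls = []
    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk), shape (N, K)
        dens = pis * gaussian_pdf(X[:, None], mus, vars_)
        lls.append(float(np.sum(np.log(dens.sum(axis=1)))))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the current responsibilities
        Nk    = gamma.sum(axis=0)
        mus   = (gamma * X[:, None]).sum(axis=0) / Nk
        vars_ = (gamma * (X[:, None] - mus) ** 2).sum(axis=0) / Nk
        pis   = Nk / len(X)
    return pis, mus, vars_, lls

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 300)])
pis, mus, vars_, lls = em_gmm(X, np.array([0.5, 0.5]),
                              np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
print(np.round(sorted(mus), 1))
```

Each trace entry is computed with the parameters entering that iteration, so successive entries reflect the E/M updates; in line with EM theory the sequence is non-decreasing.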
Another Walkthrough of Variational Bayes Bevan Jones Machine Learning Reading Group Macquarie University 2 Variational Bayes? Bayes Bayes Theorem But the integral is intractable! Sampling Gibbs, Metropolis
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationMCMC and Gibbs Sampling. Sargur Srihari
MCMC and Gibbs Sampling Sargur srihari@cedar.buffalo.edu 1 Topics 1. Markov Chain Monte Carlo 2. Markov Chains 3. Gibbs Sampling 4. Basic Metropolis Algorithm 5. Metropolis-Hastings Algorithm 6. Slice
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLecture 8: Graphical models for Text
Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Clustering Finding a group structure in the data Data in one cluster similar to
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More informationStatistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart
Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms
More informationThe Expectation-Maximization Algorithm
The Expectation-Maximization Algorithm Francisco S. Melo In these notes, we provide a brief overview of the formal aspects concerning -means, EM and their relation. We closely follow the presentation in
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More information