The Expectation-Maximization Algorithm
1. The Expectation-Maximization Algorithm
Outline: EM & Latent Variable Models · Gaussian Mixture Models · EM Theory
Mihaela van der Schaar, Department of Engineering Science, University of Oxford
2. MLE for Latent Variable Models — Latent Variables and Marginal Likelihoods
Many probabilistic models have hidden variables that are not observable in the dataset D; these models are known as latent variable models. Examples: hidden Markov models and mixture models. How would MLE be carried out for such models?
Each data point is drawn from a joint distribution $P_\theta(X, Z)$. For a realization $((X_1, Z_1), \dots, (X_n, Z_n))$, we only observe the variables in the dataset $D = (X_1, \dots, X_n)$.
Complete-data likelihood: $P_\theta((X_1, Z_1), \dots, (X_n, Z_n)) = \prod_{i=1}^n P_\theta(X_i, Z_i)$
Marginal likelihood: $P_\theta(X_1, \dots, X_n) = \prod_{i=1}^n \sum_z P_\theta(X_i, Z_i = z)$
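As a small illustration of the difference between these two likelihoods (a minimal sketch with made-up parameter values, not part of the slides), the snippet below evaluates both for a toy one-dimensional mixture with two components:

```python
import numpy as np

# Toy 1-D mixture with K = 2 components (illustrative parameters only)
pi = np.array([0.3, 0.7])          # mixture proportions
mu = np.array([-2.0, 1.0])         # component means
sigma = np.array([0.5, 1.0])       # component standard deviations

def normal_pdf(x, m, s):
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)

rng = np.random.default_rng(0)
n = 5
Z = rng.choice(2, size=n, p=pi)                    # latent component memberships
X = rng.normal(mu[Z], sigma[Z])                    # observed data points

# Complete-data log-likelihood: uses the (normally unobserved) Z_i
complete = np.sum(np.log(pi[Z] * normal_pdf(X, mu[Z], sigma[Z])))

# Marginal log-likelihood: sums the latent variable out for every data point
marginal = np.sum(np.log(np.sum(pi * normal_pdf(X[:, None], mu, sigma), axis=1)))

print(complete, marginal)
```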
3. MLE for Latent Variable Models — The Hardness of Maximizing Marginal Likelihoods (I)
The MLE is obtained by maximizing the marginal likelihood:
$\hat\theta_n = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \log\left(\sum_z P_\theta(X_i, Z_i = z)\right)$
Solving this optimization problem is often a hard task: it is non-convex, has many local maxima, and admits no analytic solution.
[Figure: complete-data likelihood vs. marginal likelihood, plotting $\log(p_\theta(x))$ against $\theta$.]
4. MLE for Latent Variable Models — The Hardness of Maximizing Marginal Likelihoods (II)
The MLE for θ is obtained by maximizing the marginal log-likelihood function:
$\hat\theta_n = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \log\left(\sum_z P_\theta(X_i, Z_i = z)\right)$
Solving this optimization problem is often a hard task, and the methods used in the previous lecture would not work: we need a simpler, approximate procedure.
The Expectation-Maximization (EM) algorithm is an iterative algorithm that computes an approximate solution to the MLE optimization problem.
5. MLE for Latent Variable Models — Exponential Families (I)
The EM algorithm is well-suited for exponential family distributions.
Exponential Family: a single-parameter exponential family is a set of probability distributions that can be expressed in the form
$P_\theta(X) = h(X) \exp\left(\eta(\theta)\, T(X) - A(\theta)\right)$,
where $h(X)$, $A(\theta)$ and $T(X)$ are known functions. An alternative, equivalent form is often given as
$P_\theta(X) = h(X)\, g(\theta) \exp\left(\eta(\theta)\, T(X)\right)$.
The variable θ is called the parameter of the family.
6. MLE for Latent Variable Models — Exponential Families (II)
Exponential family distributions: $P_\theta(X) = h(X) \exp\left(\eta(\theta)\, T(X) - A(\theta)\right)$
$T(X)$ is a sufficient statistic of the distribution. The sufficient statistic is a function of the data that fully summarizes the data X within the density function $P_\theta(X)$. This means that for any data sets $D_1$ and $D_2$, the density function is the same if $T(D_1) = T(D_2)$, even if $D_1$ and $D_2$ are quite different.
The sufficient statistic of a set of independent, identically distributed observations is simply the sum of the individual sufficient statistics, i.e. $T(D) = \sum_{i=1}^n T(X_i)$.
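A quick numerical illustration of this point (a sketch with made-up numbers, not from the slides): two different datasets with the same Gaussian sufficient statistics $(\sum_i X_i, \sum_i X_i^2)$ yield exactly the same likelihood for every choice of $(\mu, \sigma)$.

```python
import numpy as np

def gaussian_loglik(D, mu, sigma):
    # Depends on the data only through n, sum(x) and sum(x^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (D - mu)**2 / (2 * sigma**2))

D1 = np.array([-1.0, 0.0, 1.0])
# Construct a different dataset with the same sum (0) and sum of squares (2)
c = 0.2
disc = np.sqrt(2 * (2 - c**2) - c**2)
D2 = np.array([(-c + disc) / 2, (-c - disc) / 2, c])

for mu, sigma in [(0.0, 1.0), (0.5, 2.0), (-1.0, 0.7)]:
    print(np.isclose(gaussian_loglik(D1, mu, sigma), gaussian_loglik(D2, mu, sigma)))  # True
```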
7. MLE for Latent Variable Models — Exponential Families (III)
Exponential family distributions: $P_\theta(X) = h(X) \exp\left(\eta(\theta)\, T(X) - A(\theta)\right)$
$\eta(\theta)$ is called the natural parameter. The set of values of $\eta(\theta)$ for which the distribution $P_\theta(X)$ is finite (normalizable) is called the natural parameter space.
$A(\theta)$ is called the log-partition function. The mean, variance and other moments of the sufficient statistic $T(X)$ can be derived by differentiating $A(\theta)$.
8. MLE for Latent Variable Models — Exponential Families (IV)
Exponential Family Example: Normal Distribution
$P_\theta(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{X^2 - 2X\mu + \mu^2}{2\sigma^2} - \log(\sigma)\right) = \frac{1}{\sqrt{2\pi}} \exp\left(\left[\frac{\mu}{\sigma^2},\, -\frac{1}{2\sigma^2}\right] \left[X,\, X^2\right]^T - \left(\frac{\mu^2}{2\sigma^2} + \log(\sigma)\right)\right)$
$\eta(\theta) = \left[\frac{\mu}{\sigma^2},\, -\frac{1}{2\sigma^2}\right]^T$, $h(X) = (2\pi)^{-1/2}$, $T(X) = \left[X,\, X^2\right]^T$, $A(\theta) = \frac{\mu^2}{2\sigma^2} + \log(\sigma)$
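This factorization can be checked numerically (a minimal sketch, with arbitrary parameter values): evaluate $h(x)\exp(\eta\cdot T(x) - A(\theta))$ and compare it with the usual Gaussian density.

```python
import numpy as np

mu, sigma = 0.7, 1.3          # arbitrary parameter values for the check
x = np.linspace(-3, 3, 7)

# Exponential-family pieces for N(mu, sigma^2)
eta = np.array([mu / sigma**2, -1.0 / (2 * sigma**2)])   # natural parameters
T = np.stack([x, x**2], axis=1)                           # sufficient statistics [X, X^2]
A = mu**2 / (2 * sigma**2) + np.log(sigma)                # log-partition function
h = 1.0 / np.sqrt(2 * np.pi)                              # base measure

exp_family_form = h * np.exp(T @ eta - A)
standard_form = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

print(np.allclose(exp_family_form, standard_form))        # True
```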
9. MLE for Latent Variable Models — Exponential Families (V)
Properties of Exponential Families:
- Exponential families have sufficient statistics that can summarize arbitrary amounts of independent, identically distributed data using a fixed number of values.
- Exponential families have conjugate priors (an important property in Bayesian statistics).
- The posterior predictive distribution of an exponential-family random variable with a conjugate prior can always be written in closed form.
10. MLE for Latent Variable Models — Exponential Families (VI)
The Canonical Form of Exponential Families: if $\eta(\theta) = \theta$, then the exponential family is said to be in canonical form.
The canonical form is non-unique, since $\eta(\theta)$ can be multiplied by any nonzero constant, provided that $T(X)$ is multiplied by that constant's reciprocal; or a constant $c$ can be added to $\eta(\theta)$ with $h(X)$ multiplied by $\exp(-c\, T(X))$ to offset it.
11. EM: The Algorithm — Expectation-Maximization (I)
Two unknowns: the latent variables $Z = (Z_1, \dots, Z_n)$ and the parameter θ.
Complications arise because we do not know the latent variables $(Z_1, \dots, Z_n)$; if they were known, maximizing $P_\theta((X_1, Z_1), \dots, (X_n, Z_n))$ would often be a simpler task.
Recall that maximizing the complete-data likelihood is often simpler than maximizing the marginal likelihood!
[Figure: complete-data likelihood vs. marginal likelihood, plotting $\log(p_\theta(x))$ against $\theta$.]
12. EM: The Algorithm — Expectation-Maximization (II)
The EM Algorithm:
1. Start with an initial guess $\hat\theta^{(0)}$ for θ. For every iteration t, do the following:
2. E-step: $Q(\theta, \hat\theta^{(t)}) = \sum_z \log\left(P_\theta(Z = z, D)\right) P(Z = z \mid D, \hat\theta^{(t)})$
3. M-step: $\hat\theta^{(t+1)} = \arg\max_{\theta \in \Theta} Q(\theta, \hat\theta^{(t)})$
4. Go to step 2 if the stopping criterion is not met.
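A minimal sketch of this loop in Python (the `e_step` and `m_step` functions are hypothetical placeholders standing in for model-specific computations, and θ is assumed to be a NumPy array; a concrete Gaussian-mixture version appears later in the slides):

```python
import numpy as np

def run_em(data, theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop: alternate E- and M-steps until the parameters stop changing.

    e_step(data, theta)    -> expected sufficient statistics / responsibilities (model-specific)
    m_step(data, expected) -> updated parameter estimate (model-specific)
    """
    theta = theta0
    for t in range(max_iter):
        expected = e_step(data, theta)        # E-step: condition on the current guess for theta
        new_theta = m_step(data, expected)    # M-step: maximize Q(theta, theta^(t))
        if np.max(np.abs(new_theta - theta)) < tol:   # simple stopping criterion
            theta = new_theta
            break
        theta = new_theta
    return theta
```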
13. EM: The Algorithm — Expectation-Maximization (III)
Two unknowns: the latent variables $Z = (Z_1, \dots, Z_n)$ and the parameter θ.
Expected likelihood: $\sum_z \log\left(P_\theta(Z = z, D)\right) P(Z = z \mid D, \theta)$
Here the logarithm acts directly on the complete-data likelihood, so the corresponding M-step will be tractable.
[Figure: complete-data likelihood vs. marginal likelihood, plotting $\log(p_\theta(x))$ against $\theta$.]
14. EM: The Algorithm — Expectation-Maximization (III)
Two unknowns: the latent variables $Z = (Z_1, \dots, Z_n)$ and the parameter θ.
Expected likelihood: $\sum_z \log\left(P_\theta(Z = z, D)\right) P(Z = z \mid D, \theta)$
Here the logarithm acts directly on the complete-data likelihood, so the corresponding M-step will be tractable. But we still have two terms, $\log\left(P_\theta(Z = z, D)\right)$ and $P(Z \mid D, \theta)$, that depend on the two unknowns Z and θ.
The EM algorithm:
E-step: fix the posterior $Z \mid D, \theta$ by conditioning on the current guess for θ, i.e. use $Z \mid D, \hat\theta^{(t)}$.
M-step: update the guess for θ by solving a tractable optimization problem.
The EM algorithm breaks down the intractable MLE optimization problem into simpler, tractable iterative steps.
15. EM: The Algorithm — EM for Exponential Family (I)
The critical points of the marginal likelihood function satisfy
$\frac{\partial \log(P_\theta(D))}{\partial \theta} = \frac{1}{P_\theta(D)} \sum_z \frac{\partial P_\theta(D, Z = z)}{\partial \theta} = 0$
Writing the complete-data distribution in the canonical form of the exponential family,
$\log\left(P_\theta(D, Z)\right) = \log\left( h(D, Z) \exp\left(\langle \eta(\theta), T(D, Z) \rangle - A(\theta)\right) \right)$
For $\eta(\theta) = \theta$, we have that
$\frac{\partial P_\theta(D, Z)}{\partial \theta} = \left(T(D, Z) - \frac{\partial A(\theta)}{\partial \theta}\right) P_\theta(D, Z)$
16. EM: The Algorithm — EM for Exponential Family (II)
For exponential families, $E_\theta[T(D, Z)] = \frac{\partial A(\theta)}{\partial \theta}$, so
$\frac{\partial P_\theta(D, Z)}{\partial \theta} = \left(T(D, Z) - E_\theta[T(D, Z)]\right) P_\theta(D, Z)$
Since $\frac{1}{P_\theta(D)} \sum_z \frac{\partial P_\theta(D, Z = z)}{\partial \theta} = 0$, we have that
$\frac{1}{P_\theta(D)} \sum_z \left(T(D, Z = z) - E_\theta[T(D, Z)]\right) P_\theta(D, Z = z) = 0$
$\sum_z T(D, Z = z)\, \frac{P_\theta(D, Z = z)}{P_\theta(D)} - E_\theta[T(D, Z)] = 0$
$E_\theta[T(D, Z) \mid D] - E_\theta[T(D, Z)] = 0$
17. EM: The Algorithm — EM for Exponential Family (III)
For the critical values of θ, the following condition is satisfied:
$E_\theta[T(D, Z) \mid D] = E_\theta[T(D, Z)]$
How is this related to the EM objective $Q(\theta, \hat\theta^{(t)})$?
$Q(\theta, \hat\theta^{(t)}) = \sum_z \log\left(P_\theta(Z = z, D)\right) P_{\hat\theta^{(t)}}(Z = z \mid D) = \theta\, E_{\hat\theta^{(t)}}[T(D, Z) \mid D] - A(\theta) + \text{constant}$
$\frac{\partial Q(\theta, \hat\theta^{(t)})}{\partial \theta} = E_{\hat\theta^{(t)}}[T(D, Z) \mid D] - E_\theta[T(D, Z)]$, so $\frac{\partial Q(\theta, \hat\theta^{(t)})}{\partial \theta} = 0 \;\Rightarrow\; E_{\hat\theta^{(t)}}[T(D, Z) \mid D] = E_\theta[T(D, Z)]$
Since it is difficult to solve the above equation analytically, the EM algorithm solves for θ via successive approximations, i.e. it solves the following for $\hat\theta^{(t+1)}$:
$E_{\hat\theta^{(t)}}[T(D, Z) \mid D] = E_{\hat\theta^{(t+1)}}[T(D, Z)]$
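To make this moment-matching condition concrete, here is a sketch of what it yields for the Gaussian mixture model treated in the next slides (glossing over the fact that the mixture is only a curved exponential family). The complete-data sufficient statistics for component $k$ are $\sum_i \mathbf{1}\{Z_i = k\}$, $\sum_i \mathbf{1}\{Z_i = k\}\, X_i$ and $\sum_i \mathbf{1}\{Z_i = k\}\, X_i X_i^T$. Matching conditional and unconditional expectations gives
$\sum_i \gamma(k, X_i \mid \hat\theta^{(t)}) = n\, \hat\pi_k^{(t+1)}$ and $\sum_i \gamma(k, X_i \mid \hat\theta^{(t)})\, X_i = n\, \hat\pi_k^{(t+1)} \hat\mu_k^{(t+1)}$,
where $\gamma(k, X_i \mid \theta) = P_\theta(Z_i = k \mid X_i)$ is the responsibility defined later. Rearranging recovers exactly the $\hat\pi_k$ and $\hat\mu_k$ updates on the M-step slide; the second-moment statistic gives the covariance update in the same way.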
18. Multivariate Gaussian Mixture Models — Example: Multivariate Gaussian Mixtures
Parameters for a mixture of K Gaussians: mixture proportions $\{\pi_k\}_{k=1}^K$, mean vectors and covariance matrices $\{(\mu_k, \Sigma_k)\}_{k=1}^K$.
[Figure: contour plot for the density of a mixture of K = 3 bivariate Gaussian distributions, axes $X_1$, $X_2$.]
19. Multivariate Gaussian Mixture Models — The Generative Process
$Z_i \sim \mathrm{Categorical}(\pi_1, \dots, \pi_K)$, and $X_i \mid Z_i = z \sim \mathcal{N}(\mu_z, \Sigma_z)$
[Figure: a sample from a mixture model; every data point is colored according to its component membership, axes $X_1$, $X_2$.]
20. Multivariate Gaussian Mixture Models — The Dataset
We need to learn the parameters $(\pi_k, \mu_k, \Sigma_k)_{k=1}^K$ from the data points $D = (X_1, \dots, X_n)$, which are not colored by their component memberships, i.e. we do not observe the latent variables $Z = (Z_1, \dots, Z_n)$.
[Figure: (a) $(D, Z)$: the data points and their component memberships; (b) $D$: the dataset with the observed data points (component memberships are latent).]
21. EM for Gaussian Mixture Models — MLE for Gaussian Mixture Models
The complete-data likelihood function is given by
$P_\theta(D, Z) = \prod_{i=1}^n \pi_{z_i}\, \mathcal{N}(X_i \mid \mu_{z_i}, \Sigma_{z_i})$
The marginal likelihood function is
$P_\theta(D) = \prod_{i=1}^n \sum_{k=1}^K \pi_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k)$
The MLE can be obtained by maximizing the marginal log-likelihood function:
$\hat\theta_n = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \log\left(\sum_{k=1}^K \pi_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k)\right)$
Exercise: is the objective function above concave?
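A direct way to evaluate this marginal log-likelihood numerically (a minimal sketch, assuming SciPy is available; the function name is illustrative), which is also handy for monitoring EM iterations later:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """Marginal log-likelihood: sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k)."""
    n, K = X.shape[0], len(pis)
    dens = np.zeros((n, K))
    for k in range(K):
        dens[:, k] = pis[k] * multivariate_normal(mean=mus[k], cov=Sigmas[k]).pdf(X)
    return np.sum(np.log(np.sum(dens, axis=1)))
```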
22. EM for Gaussian Mixture Models — Implementing EM for the Gaussian Mixture Model (I)
The expected complete-data log-likelihood function is
$E_Z[\log P_\theta(D, Z)] = \sum_{i=1}^n \sum_{k=1}^K \gamma(k, X_i \mid \theta) \left(\log(\pi_k) + \log\left(\mathcal{N}(X_i \mid \mu_k, \Sigma_k)\right)\right)$, where $\gamma(k, X_i \mid \theta) = P_\theta(Z_i = k \mid X_i)$.
$\gamma(k, X_i \mid \theta)$ is called the responsibility of component k towards data point $X_i$:
$\gamma(k, X_i \mid \theta) = \frac{\pi_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j\, \mathcal{N}(X_i \mid \mu_j, \Sigma_j)}$
Try to work out the derivation above yourself!
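A minimal sketch of computing these responsibilities (assuming SciPy; function and argument names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """E-step quantities: gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)."""
    n, K = X.shape[0], len(pis)
    weighted = np.zeros((n, K))
    for k in range(K):
        weighted[:, k] = pis[k] * multivariate_normal(mean=mus[k], cov=Sigmas[k]).pdf(X)
    return weighted / weighted.sum(axis=1, keepdims=True)
```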
23. EM for Gaussian Mixture Models — Implementing EM for the Gaussian Mixture Model (II)
(E-step) Approximate the expected complete-data likelihood by fixing the responsibilities $\gamma(k, X_i \mid \theta)$ using the parameter estimates obtained from the previous iteration:
$Q(\theta, \hat\theta^{(t)}) = \sum_{i=1}^n \sum_{k=1}^K \gamma(k, X_i \mid \hat\theta^{(t)}) \left(\log(\pi_k) + \log\left(\mathcal{N}(X_i \mid \mu_k, \Sigma_k)\right)\right)$, with
$\gamma(k, X_i \mid \hat\theta^{(t)}) = \frac{\hat\pi_k^{(t)}\, \mathcal{N}(X_i \mid \hat\mu_k^{(t)}, \hat\Sigma_k^{(t)})}{\sum_{j=1}^K \hat\pi_j^{(t)}\, \mathcal{N}(X_i \mid \hat\mu_j^{(t)}, \hat\Sigma_j^{(t)})}$
(M-step) Solve a tractable optimization problem:
$(\hat\pi^{(t+1)}, \hat\mu^{(t+1)}, \hat\Sigma^{(t+1)}) = \arg\max_{(\pi, \mu, \Sigma)} \sum_{i=1}^n \sum_{k=1}^K \gamma(k, X_i \mid \hat\theta^{(t)}) \left(\log(\pi_k) + \log\left(\mathcal{N}(X_i \mid \mu_k, \Sigma_k)\right)\right)$
24. EM for Gaussian Mixture Models — Implementing EM for the Gaussian Mixture Model (III)
The M-step yields the following parameter updating equations:
$\hat\pi_k^{(t+1)} = \frac{1}{n} \sum_{i=1}^n \gamma(k, X_i \mid \hat\theta^{(t)})$
$\hat\mu_k^{(t+1)} = \frac{\sum_{i=1}^n \gamma(k, X_i \mid \hat\theta^{(t)})\, X_i}{\sum_{j=1}^n \gamma(k, X_j \mid \hat\theta^{(t)})}$
$\hat\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^n \gamma(k, X_i \mid \hat\theta^{(t)})\, (X_i - \hat\mu_k^{(t+1)})(X_i - \hat\mu_k^{(t+1)})^T}{\sum_{j=1}^n \gamma(k, X_j \mid \hat\theta^{(t)})}$
Try to work out the updating equations by yourself!
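A minimal NumPy sketch of these updates, consuming a responsibility matrix `gamma` of shape (n, K) such as the one produced by the E-step sketch above (names are illustrative):

```python
import numpy as np

def m_step(X, gamma):
    """M-step updates for a Gaussian mixture, given responsibilities gamma of shape (n, K)."""
    n, d = X.shape
    Nk = gamma.sum(axis=0)                       # effective number of points per component
    pis = Nk / n                                 # pi_k update
    mus = (gamma.T @ X) / Nk[:, None]            # mu_k update
    Sigmas = []
    for k in range(gamma.shape[1]):
        diff = X - mus[k]                        # (n, d) residuals from the new mean
        Sigmas.append((gamma[:, k, None] * diff).T @ diff / Nk[k])   # Sigma_k update
    return pis, mus, np.array(Sigmas)
```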
25. EM for Gaussian Mixture Models — EM in Practice
Consider a Gaussian mixture model with K = 3 and the following parameters:
$\pi_1 = 0.6$, $\pi_2 = 0.05$, $\pi_3 = 0.35$
$\mu_1 = [1.4, 1.8]^T$, $\mu_2 = [1.4, 2.8]^T$, $\mu_3 = [1.9, 0.55]^T$
and $2 \times 2$ covariance matrices $\Sigma_1$, $\Sigma_2$, $\Sigma_3$ as given on the slide.
Try writing MATLAB code that generates a random dataset of 5,000 data points drawn from the model specified above, and implement the EM algorithm to learn the model parameters from this dataset.
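The slides suggest MATLAB; as an alternative sketch, the Python snippet below draws a dataset from a 3-component mixture and fits it with scikit-learn's `GaussianMixture`, which runs EM internally, rather than a hand-rolled implementation. The covariance matrices here are made-up placeholders, since the slide's values are not reproduced above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Proportions and means as listed on the slide; covariances are hypothetical placeholders
pis = np.array([0.6, 0.05, 0.35])
mus = np.array([[1.4, 1.8], [1.4, 2.8], [1.9, 0.55]])
Sigmas = np.array([0.1 * np.eye(2), 0.2 * np.eye(2), 0.05 * np.eye(2)])  # assumption

# Generative process: Z ~ Categorical(pi), X | Z = k ~ N(mu_k, Sigma_k)
n = 5000
Z = rng.choice(3, size=n, p=pis)
X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in Z])

# Fit a 3-component Gaussian mixture by EM (scikit-learn's implementation)
gmm = GaussianMixture(n_components=3, covariance_type="full", n_init=5, random_state=0).fit(X)
print(gmm.weights_)   # compare with pis (up to a permutation of the component labels)
print(gmm.means_)     # compare with mus
```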
26. EM for Gaussian Mixture Models — EM in Practice
The marginal (observed-data) log-likelihood increases after every EM iteration! This means that every new iteration finds an estimate with a higher likelihood.
[Figure: log-likelihood plotted against EM iteration.]
27. EM for Gaussian Mixture Models — EM in Practice
Compare the true density function with the estimated one.
[Figure: contour plot for the true density function and contour plot for the estimated density function, axes $X_1$, $X_2$.]
28. EM Performance Guarantees — What Does EM Guarantee?
The EM algorithm does not guarantee that $\hat\theta^{(t)}$ will converge to $\hat\theta_n$. EM guarantees the following:
$\hat\theta^{(t)}$ always converges (to a local optimum).
Every iteration improves the marginal likelihood $P_{\hat\theta^{(t)}}(D)$.
Does the initial value matter?
1. The initial value $\theta^{(0)}$ affects the speed of convergence and the value of the limit $\hat\theta^{(\infty)}$! Smart initialization methods are often needed.
2. The K-means algorithm is often used to initialize the parameters in a Gaussian mixture model before applying the EM algorithm (a sketch of one such initialization follows below).
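One possible K-means initialization along these lines (a minimal sketch assuming scikit-learn is available, not the lecture's prescribed recipe; function and argument names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_init(X, K, seed=0):
    """Initialize GMM parameters (pi, mu, Sigma) from a K-means clustering of the data."""
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(X)
    pis = np.array([np.mean(labels == k) for k in range(K)])                      # cluster fractions
    mus = np.array([X[labels == k].mean(axis=0) for k in range(K)])               # cluster centroids
    Sigmas = np.array([np.cov(X[labels == k], rowvar=False) for k in range(K)])   # within-cluster covariances
    return pis, mus, Sigmas
```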
29. EM Performance Guarantees — References
1. Robert W. Keener, Statistical Theory: Notes for a Course in Theoretical Statistics.
2. Robert W. Keener, Theoretical Statistics: Topics for a Core Course.
3. Christopher Bishop, Pattern Recognition and Machine Learning, 2007.