Lecture 6: Gaussian Mixture Models (GMM)

1 Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee

2 Outline: Gaussian Mixture Models (GMM). Models: model families and parameters; parameter learning for simple models (maximum likelihood). Gaussian: the central limit theorem (CLT). Mixtures: multimodal data; parameter learning for GMM; Expectation Maximization. Applications.

3 A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities... Let's start from the very beginning.

4 Goal: describe a set of data with a model, and then use the model as the representative of the data. What we have: a set of $N$ $d$-dimensional data points, $X = \{x_1, \ldots, x_N\} = \{x_t\}_{t=1}^{N}$ with $x_t \in \mathbb{R}^d$. Example: two dimensions, $d = 2$, $N = 16$.

5 Assumptions: There exists a model that describes the data. The designer knows the family of the model, but not its parameters. To do: given the model family, find the best model that fits the data. Meaning: given the distribution family, find the parameters that best fit the distribution to the observed data.

6 Model families and their parameters. [Figure: example distributions from the chi-square, beta, Poisson, Gaussian (1-d), Gaussian (2-d), and binomial families.]

7 Example: Given that we know the data are Gaussian, which Gaussian distribution should we use to represent them? Problem definition: $x_1, \ldots, x_N \sim \mathcal{N}(\mu, \sigma^2)$. What is the best value for $\mu$? One answer: the $\mu$ that maximizes the likelihood function, i.e. the maximum likelihood (ML) estimate.

8 Definition. Likelihood function: $L(X; \theta) = P(X; \theta)$. ML estimate: $\theta_{ML} = \arg\max_{\theta} L(X; \theta)$. Example: $x_1, \ldots, x_N \sim \mathcal{N}(\mu, \sigma^2)$; find $\mu_{ML}$.

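9 Solution for $\mu_{ML} = \arg\max_{\mu} L(X; \mu, \sigma^2)$. (Picture will be added.) Side notes: If $x_1$ and $x_2$ are independent, then $P(x_1, x_2) = P(x_1) P(x_2)$. We can maximize the log of a positive function instead of the function itself, because the log is monotonically increasing: $\arg\max_{\theta} [f(\theta) + \text{const}] = \arg\max_{\theta} f(\theta) = \arg\max_{\theta} \log f(\theta)$.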
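The two side notes are enough to carry out the maximization by hand. The following is a sketch of that derivation, assuming one-dimensional data points that are independent and identically distributed (an assumption the slide implies but does not state).

```latex
\begin{align*}
\log L(X; \mu, \sigma^2)
  &= \sum_{t=1}^{N} \log \mathcal{N}(x_t; \mu, \sigma^2)
   = -\frac{N}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{t=1}^{N} (x_t - \mu)^2 \\
\frac{\partial}{\partial \mu} \log L(X; \mu, \sigma^2)
  &= \frac{1}{\sigma^2} \sum_{t=1}^{N} (x_t - \mu) = 0
  \quad \Longrightarrow \quad
  \mu_{ML} = \frac{1}{N} \sum_{t=1}^{N} x_t
\end{align*}
```

Under these assumptions the ML estimate of the mean is the sample mean, and it does not depend on $\sigma^2$.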

10 By using ML we can estimate the parameters of any model, given that we know the model family and have observed a set of data.
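As a small illustration of ML estimation for a known family, the sketch below fits a one-dimensional Gaussian to synthetic data using the closed-form estimates (sample mean and the biased sample variance). The function names, the seed, and the true parameters used to generate the data are all illustrative assumptions, not part of the lecture.

```python
import math
import random


def gaussian_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. data under N(mu, sigma2)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)


def gaussian_ml(data):
    """Closed-form ML estimates: sample mean and biased sample variance."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n  # divides by n, not n - 1
    return mu, sigma2


# Synthetic data; the true values (mean 2.0, std 1.5) are assumptions for the demo.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(1000)]

mu_ml, sigma2_ml = gaussian_ml(data)
best = gaussian_log_likelihood(data, mu_ml, sigma2_ml)
print(f"mu_ML = {mu_ml:.3f}, sigma2_ML = {sigma2_ml:.3f}, log-likelihood = {best:.1f}")
```

Moving either parameter away from its ML value can only lower the log-likelihood, which is an easy sanity check to run on the result.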

11 Gaussian (normal) distributions are the preferred models for continuous random variables. Central limit theorem: the mean of a large number of independent random variables with finite variance has an approximately Gaussian distribution. Example: many natural phenomena, such as noise, are also sums of different random factors.
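The central limit theorem can be checked empirically. The sketch below averages Uniform(0, 1) draws, which are far from Gaussian individually, and looks at how the resulting means are distributed. The choice of uniform variables, the sample sizes, and the helper name `sample_means` are assumptions made for the demonstration.

```python
import random
import statistics


def sample_means(n_terms, n_repeats, rng):
    """Return n_repeats means, each taken over n_terms Uniform(0, 1) draws."""
    return [
        sum(rng.random() for _ in range(n_terms)) / n_terms
        for _ in range(n_repeats)
    ]


rng = random.Random(0)
means = sample_means(n_terms=50, n_repeats=5000, rng=rng)

center = statistics.fmean(means)
spread = statistics.pstdev(means)

# For a Gaussian, roughly 68% of the mass lies within one standard deviation.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"center = {center:.4f}, spread = {spread:.4f}, within 1 sd = {within_one_sd:.3f}")
```

Theory predicts a center of 0.5 and a spread of $\sqrt{1/(12 \cdot 50)} \approx 0.041$ for these settings, so the printed values can be compared against those.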

12 Sometimes it is hard to describe the data with just one model. The next reasonable step is to use a weighted sum of several models: $x_1, \ldots, x_N \sim w_1 f_1(\theta_1) + w_2 f_2(\theta_2) + w_3 f_3(\theta_3)$, where $w_1 + w_2 + w_3 = 1$. Each model has its own family and parameters. We have to estimate both the parameters and the weights. We don't know which data point comes from which model, so estimating the parameters is very difficult.

13 Different operating regimes. Continuous nonlinearities.

14 A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities: $x_1, \ldots, x_N \sim w_1 \mathcal{N}(\mu_1, \Sigma_1) + \cdots + w_M \mathcal{N}(\mu_M, \Sigma_M)$, where $\sum_{i=1}^{M} w_i = 1$.
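To make the weighted-sum definition concrete, here is a minimal sketch of a one-dimensional GMM: evaluating its density and drawing samples from it. It restricts the model to scalar variances instead of covariance matrices, and the component parameters and function names are made up for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def gmm_pdf(x, weights, means, variances):
    """Weighted sum of Gaussian component densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def gmm_sample(n, weights, means, variances, rng):
    """Draw n points: pick a component by its weight, then sample from it."""
    samples = []
    for _ in range(n):
        i = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[i], math.sqrt(variances[i])))
    return samples


# Illustrative two-component mixture (values are assumptions, not from the lecture).
weights = [0.3, 0.7]
means = [-2.0, 3.0]
variances = [1.0, 0.5]

rng = random.Random(0)
samples = gmm_sample(5000, weights, means, variances, rng)
print(f"density at 0: {gmm_pdf(0.0, weights, means, variances):.5f}")
print(f"sample mean:  {sum(samples) / len(samples):.3f}")
```

The sampling routine mirrors the generative story behind the model: the component index is exactly the hidden quantity that makes fitting a mixture harder than fitting a single Gaussian.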

15 Problem statement: We need to estimate $\lambda = \{w_i, \mu_i, \Sigma_i\}$, $i = 1, \ldots, M$. GMM: each data point comes from one of the mixture components, but which one is unknown to us. For each data point $x_t$ there is a latent variable $z_t$ that indicates which mixture component $x_t$ comes from. $X$: observed data; $Z$: unobserved latent variables; $\{X, Z\}$: complete data; $X$ alone: incomplete data.

16 This time there are two sets of unknowns: the parameters of each model, and the latent variables $Z$, which specify the mixture component that each data point belongs to. It is often intractable to compute the maximum likelihood estimate directly, since the likelihood is a product of sums: $\lambda_{ML} = \arg\max_{\lambda} L(X; \lambda) = \arg\max_{\lambda} P(X; \lambda) = \arg\max_{\lambda} \left[ \sum_{Z} P(X, Z; \lambda) \right]$. One approach to computing this ML estimate is the Expectation Maximization (EM) algorithm.
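For a GMM the "product of sums" structure can be written out explicitly. The sketch below assumes independent data points and uses $\mathcal{N}(x_t; \mu_i, \Sigma_i)$ for the density of component $i$ evaluated at $x_t$.

```latex
\begin{align*}
L(X; \lambda)
  &= \prod_{t=1}^{N} P(x_t; \lambda)
   = \prod_{t=1}^{N} \sum_{i=1}^{M} w_i \, \mathcal{N}(x_t; \mu_i, \Sigma_i) \\
\log L(X; \lambda)
  &= \sum_{t=1}^{N} \log \sum_{i=1}^{M} w_i \, \mathcal{N}(x_t; \mu_i, \Sigma_i)
\end{align*}
```

Taking the log turns the outer product into a sum, but the inner sum stays inside the log. Setting the derivatives to zero therefore couples all the parameters together and gives no closed-form solution, unlike the single-Gaussian case.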

17 Expectation Maximization. The EM algorithm is used to find (locally) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. The EM algorithm proceeds from the observation that the two sets of unknowns (the parameters and the latent variables) can be solved for numerically by alternating between them. One can simply pick arbitrary values for one of the two sets of unknowns, use them to estimate the second set, then use these new values to find a better estimate of the first set, and keep alternating between the two until the resulting values both converge to fixed points.

18 Expectation Maximization. EM algorithm. Goal: find $\lambda_{ML} = \arg\max_{\lambda} \left[ \log \sum_{Z} P(X, Z; \lambda) \right]$. 1. E-step: take the expectation over the latent variables, assuming that the unknown parameters are fixed. 2. M-step: assume that the latent variables are fixed and solve the maximum likelihood problem for the unknown parameters. EM in one equation: $\lambda^{t+1} = \arg\max_{\lambda} E_{Z \mid X, \lambda^{t}} \left[ \log P(X, Z; \lambda) \right]$.
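The two steps can be sketched for a one-dimensional GMM as follows. This is a minimal illustration under several assumptions that are not from the lecture: scalar variances instead of covariance matrices, synthetic two-component data, a simple min/max initialization, a fixed number of iterations, and a small variance floor as a practical safeguard. The update formulas are the standard closed-form GMM updates.

```python
import math
import random


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def log_likelihood(data, w, mu, var):
    """Incomplete-data log-likelihood: sum over points of log of the mixture density."""
    return sum(
        math.log(sum(w[i] * normal_pdf(x, mu[i], var[i]) for i in range(len(w))))
        for x in data
    )


def em_step(data, w, mu, var):
    """One EM iteration for a 1-d GMM; returns updated (w, mu, var)."""
    m, n = len(w), len(data)

    # E-step: responsibilities resp[t][i] = P(z_t = i | x_t, current parameters).
    resp = []
    for x in data:
        joint = [w[i] * normal_pdf(x, mu[i], var[i]) for i in range(m)]
        total = sum(joint)
        resp.append([j / total for j in joint])

    # M-step: responsibility-weighted ML estimates for each component.
    new_w, new_mu, new_var = [], [], []
    for i in range(m):
        n_i = sum(r[i] for r in resp)  # effective number of points in component i
        mu_i = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var_i = sum(r[i] * (x - mu_i) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / n)
        new_mu.append(mu_i)
        new_var.append(max(var_i, 1e-6))  # variance floor: an added safeguard (assumption)
    return new_w, new_mu, new_var


# Synthetic data from two Gaussians; the true parameters are assumptions for the demo.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 0.7) for _ in range(700)]

# Simple initialization: equal weights, means at the data extremes, unit variances.
w, mu, var = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
history = [log_likelihood(data, w, mu, var)]
for _ in range(50):
    w, mu, var = em_step(data, w, mu, var)
    history.append(log_likelihood(data, w, mu, var))

print("weights:", [round(v, 3) for v in w])
print("means:  ", [round(v, 3) for v in mu])
print("vars:   ", [round(v, 3) for v in var])
```

Each iteration should leave the log-likelihood in `history` no lower than before, which is a useful check on any EM implementation. Because EM only finds a local maximum, a different initialization can give a different answer on harder data.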

19 Clustering with GMM

20 GMM in Matlab: see doc gmdistribution.fit. Example: see GMM_test.m.

21 GMM in Matlab

22 Thank You
