Mixture Models and EM

Size: px

Start display at page:

Download "Mixture Models and EM"

Bartholomew Gibson
5 years ago
Views:

1 Mixture Models and EM Yoan Miche CIS, HUT March 17, 27 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 1 / 23

2 Mise en Bouche Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 2 / 23

3 Mise en Bouche Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 2 / 23

4 Outline 1 Starters: K-Means 2 Main Course: EM Algorithm 3 Dessert: General Algorithm Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 3 / 23

5 Setting Starters: K-Means On the idea of K-Means Going Further... Data set of N observations of D-dimensional variable x {x 1,..., x N } with each vector D-dimensional x 1 = (x1 1,..., x1 D ) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 4 / 23

6 K-Means Idea: Clustering Starters: K-Means On the idea of K-Means Going Further... Idea Define the K prototype vectors (code-book vectors) D-dimensional All prototypes of the corresponding cluster: Sum of squares of distances from each data point to its closest prototype µ k is minimum. µ k This is the condition to define the µ k Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 5 / 23

7 K-Means Idea: Clustering Starters: K-Means On the idea of K-Means Going Further... Idea Define the K prototype vectors (code-book vectors) D-dimensional All prototypes of the corresponding cluster: Sum of squares of distances from each data point to its closest prototype µ k is minimum. µ k This is the condition to define the µ k Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 5 / 23

8 K-Means Idea: Clustering Starters: K-Means On the idea of K-Means Going Further... Idea Define the K prototype vectors (code-book vectors) D-dimensional All prototypes of the corresponding cluster: Sum of squares of distances from each data point to its closest prototype µ k is minimum. µ k This is the condition to define the µ k Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 5 / 23

9 Clustering idea (2) Starters: K-Means On the idea of K-Means Going Further... Minimize the distortion measure J N K J = r nk x n µ k 2 (1) k=1 Sum of squares of distances of all data points to their prototype. r nk = δ n,k (Kronecker) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 6 / 23

10 The Old Faithful data set Starters: K-Means On the idea of K-Means Going Further... 2 (a) 2 (b) 2 (c) (a) (b) (c) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 7 / 23

11 The Old Faithful data set (2) Starters: K-Means On the idea of K-Means Going Further... 2 (d) 2 (e) 2 (f) (d) (e) (f) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 8 / 23

12 The Old Faithful data set (3) Starters: K-Means On the idea of K-Means Going Further... 2 (g) 2 (h) 2 (i) 2 2! !2 2 (g) (h) (i) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 9 / 23

13 Starters: K-Means On the idea of K-Means Going Further... How to determine r nk and µ k Iterative procedure in two steps E Step: Fix initial values for the {µ k } and minimize J with respect to the r nk M Step: Fix the r nk and minimize J with respect to the {µ k } In practice... r nk = { 1 if k = min x n µ k 2 j else r nk x n µ k = n (2) (3) r nk n Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 1 / 23

14 Starters: K-Means On the idea of K-Means Going Further... How to determine r nk and µ k Iterative procedure in two steps E Step: Fix initial values for the {µ k } and minimize J with respect to the r nk M Step: Fix the r nk and minimize J with respect to the {µ k } In practice... r nk = { 1 if k = min x n µ k 2 j else r nk x n µ k = n (2) (3) r nk n Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 1 / 23

15 Generalizing to other distances Starters: K-Means On the idea of K-Means Going Further... Generalizing to the K-medoids algorithm. J N K = r nk ν(x n µ k ) (4) k=1 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

16 Image Clustering Starters: K-Means On the idea of K-Means Going Further... Pixels considered as 3-dimensional vectors (j) K = 2 (k) K = 3 (l) K = 5 (m) K = 1 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

17 Reminders and Notations Main Course: EM Algorithm Mixtures of Gaussians Gaussian mixture distribution: Recall that a Gaussian mixture can be written as a sum of Gaussians: And let us introduce p(x) = K π k N (x µ k, Σ k ) (5) k=1 z = (,,...,, 1,,..., ) z 1 z 2 z k z K (6) whose distribution, considering p(z k = 1) = π k can be written in the form p(z) = K k=1 π z k k (7) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

18 Reminders and Notations (2) Main Course: EM Algorithm Mixtures of Gaussians We also have z K p(x z) = N (x µ k, Σ k ) zk (8) k=1 giving finally, by summing the joint distribution p(x) = z K p(z)p(x z) = π k N (x µ k, Σ k ) (9) k=1 x Only interest in these development in the introduction of the latent variable z n Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

19 Reminders and Notations (3) Main Course: EM Algorithm Mixtures of Gaussians Lets us note, further, γ(z k ) = p(z k = 1 x) γ(z k ) = π kn (x µ k, Σ k ) K π j N (x µ j, Σ j ) j=1 the conditional probability of z given x. (1) γ(z k ) is in fact the responsibility that k has in the observation x Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

20 Example for mixture of Gaussians Main Course: EM Algorithm Mixtures of Gaussians 1 (a) 1 (b) 1 (c) (n) (o) (p) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

21 And again... Main Course: EM Algorithm Mixtures of Gaussians This time, instead of codebook vectors, we use Gaussians: Now... Maximizing the log-likelihood instead of the distortion measure µ k = 1 N N γ(z nk )x n with N k = γ(z nk ) N k Σ k = 1 N k N γ(z nk )(x n µ k )(x n µ k ) T Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

22 And again... Main Course: EM Algorithm Mixtures of Gaussians This time, instead of codebook vectors, we use Gaussians: Now... Maximizing the log-likelihood instead of the distortion measure µ k = 1 N N γ(z nk )x n with N k = γ(z nk ) N k Σ k = 1 N k N γ(z nk )(x n µ k )(x n µ k ) T Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

23 And again... Main Course: EM Algorithm Mixtures of Gaussians This time, instead of codebook vectors, we use Gaussians: Now... Maximizing the log-likelihood instead of the distortion measure µ k = 1 N N γ(z nk )x n with N k = γ(z nk ) N k Σ k = 1 N k N γ(z nk )(x n µ k )(x n µ k ) T Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

24 And again... Main Course: EM Algorithm Mixtures of Gaussians This time, instead of codebook vectors, we use Gaussians: Now... Maximizing the log-likelihood instead of the distortion measure µ k = 1 N N γ(z nk )x n with N k = γ(z nk ) N k Σ k = 1 N k N γ(z nk )(x n µ k )(x n µ k ) T Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

25 And again... Main Course: EM Algorithm Mixtures of Gaussians This time, instead of codebook vectors, we use Gaussians: Now... Maximizing the log-likelihood instead of the distortion measure µ k = 1 N N γ(z nk )x n with N k = γ(z nk ) N k Σ k = 1 N k N γ(z nk )(x n µ k )(x n µ k ) T Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

26 In images... Main Course: EM Algorithm Mixtures of Gaussians (a) 2 (q) 2 (b) 2 (r) 2 (c) 2 (s) (d) 2 2 (e) 2 2 (f) 2 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

27 EM for Gaussian Mixtures (1) Main Course: EM Algorithm Mixtures of Gaussians EM Algorithm (1) Initialize µ k, Σ k and π k and evaluate log-likelihood with these E Step: γ(z nk ) = π kn (x n µ k, Σ k ) K π j N (x n µ j, Σ j ) j=1 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

28 EM for Gaussian Mixtures (1) Main Course: EM Algorithm Mixtures of Gaussians EM Algorithm (1) Initialize µ k, Σ k and π k and evaluate log-likelihood with these E Step: γ(z nk ) = π kn (x n µ k, Σ k ) K π j N (x n µ j, Σ j ) j=1 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

29 EM for Gaussian Mixtures (2) Main Course: EM Algorithm Mixtures of Gaussians EM Algorithm (2) M Step: µ new k Σ new k π new k = 1 N k N = 1 N k N γ(z nk )x n, = N k N with N k = γ(z nk )(x n µ new k )(x n µ new k ) T, N γ(z nk ) Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 2 / 23

30 EM for Gaussian Mixtures (3) Main Course: EM Algorithm Mixtures of Gaussians EM Algorithm (3) Evaluate log-likehood N K ln p(x µ, Σ, π) = ln π k N (x n µ k, Σ k ) j=1 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

31 Dessert: General Algorithm Kidding... The proposed version of the general algorithm in the book is way too heavy to be presented on slides... Meanwhile, I invite you to read section 9.4 about this general algorithm since it gives a general framework that is given in Chapter 1 Or not... Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

32 Had fun? Hope you enjoyed... Yoan Miche (CIS, HUT) Mixture Models and EM March 17, / 23

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures