Accelerating the EM Algorithm for Mixture Density Estimation


Slide 1/18: Accelerating the EM Algorithm for Mixture Density Estimation

ICERM Workshop, September 4, 2015

Homer Walker, Mathematical Sciences Department, Worcester Polytechnic Institute

Joint work with Josh Plasse (WPI/Imperial College). Research supported in part by DOE Grant DE-SC and NSF Grant DMS.

Slide 2/18: Mixture Densities

Consider a (finite) mixture density

p(x \mid \Phi) = \sum_{i=1}^{m} \alpha_i \, p_i(x \mid \phi_i).

Problem: Estimate \Phi = (\alpha_1, \ldots, \alpha_m, \phi_1, \ldots, \phi_m) using an unlabeled sample \{x_k\}_{k=1}^{N} on the mixture.

Maximum-Likelihood Estimate (MLE): Determine \Phi^* = \arg\max_{\Phi} L(\Phi), where

L(\Phi) = \sum_{k=1}^{N} \log p(x_k \mid \Phi).
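For concreteness, the objective being maximized can be evaluated directly. The sketch below (Python/NumPy, assuming univariate normal components purely for illustration; the function name log_likelihood is not from the talk) computes L(Φ) for a given parameter set.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, alphas, mus, sigmas):
    """L(Phi) = sum_k log p(x_k | Phi) for a univariate normal mixture.

    x: sample array; alphas: mixing proportions; mus, sigmas: component
    means and standard deviations.
    """
    # p(x_k | Phi) = sum_i alpha_i p_i(x_k | phi_i), evaluated at every x_k
    comps = np.array([a * norm.pdf(x, mu, s)
                      for a, mu, s in zip(alphas, mus, sigmas)])
    return float(np.sum(np.log(comps.sum(axis=0))))
```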

Slide 3/18: The EM (Expectation-Maximization) Algorithm

The general formulation and name were given in...

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. Ser. B (Methodological), 39.

General idea: Determine the next approximate MLE to maximize the expectation of the complete-data log-likelihood function, given the observed incomplete data and the current approximate MLE.

Marvelous property: The log-likelihood function increases at each iteration.

Slide 4/18: The EM Algorithm for Mixture Densities

For a mixture density, an EM iteration is...

\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},
\qquad
\phi_i^+ = \arg\max_{\phi_i} \sum_{k=1}^{N} \log p_i(x_k \mid \phi_i) \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)}.

For a derivation, convergence analysis, history, etc., see...

R. A. Redner and H. Walker (1984), Mixture densities, maximum likelihood, and the EM algorithm, SIAM Review, 26.

Slide 5/18: Particular Example: Normal (Gaussian) Mixtures

Assume (multivariate) normal densities. For each i, \phi_i = (\mu_i, \Sigma_i) and

p_i(x \mid \phi_i) = \frac{1}{(2\pi)^{n/2} (\det \Sigma_i)^{1/2}} \, e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)/2}.

EM iteration: For i = 1, \ldots, m,

\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},

\mu_i^+ = \left\{ \sum_{k=1}^{N} x_k \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\},

\Sigma_i^+ = \left\{ \sum_{k=1}^{N} (x_k - \mu_i^+)(x_k - \mu_i^+)^T \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\}.
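As a concrete reading of these updates, here is a minimal NumPy sketch of one EM iteration for a multivariate normal mixture; the names (em_step, alphas, mus, Sigmas) are illustrative, and numerical safeguards (regularizing near-singular covariances, working in the log domain) are omitted.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, Sigmas):
    """One EM iteration for a multivariate normal mixture.

    X: (N, d) sample; alphas: (m,); mus: (m, d); Sigmas: (m, d, d).
    Returns updated (alphas, mus, Sigmas).
    """
    N, d = X.shape
    m = len(alphas)
    # w[k, i] = alpha_i^c p_i(x_k | phi_i^c) / p(x_k | Phi^c)
    dens = np.column_stack([alphas[i] * multivariate_normal.pdf(X, mus[i], Sigmas[i])
                            for i in range(m)])
    w = dens / dens.sum(axis=1, keepdims=True)

    Nw = w.sum(axis=0)                      # effective counts, one per component
    alphas_new = Nw / N
    mus_new = (w.T @ X) / Nw[:, None]
    Sigmas_new = np.empty_like(Sigmas)
    for i in range(m):
        Xc = X - mus_new[i]
        Sigmas_new[i] = (w[:, i, None] * Xc).T @ Xc / Nw[i]
    return alphas_new, mus_new, Sigmas_new
```

The weights w[k, i] are exactly the ratios α_i^c p_i(x_k | φ_i^c) / p(x_k | Φ^c) that appear in each update above.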

Slide 6/18: EM Iterations Demo

A Univariate Normal Mixture:

p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-(x - \mu_i)^2 / (2\sigma_i^2)} for i = 1, \ldots, 5.

Sample of 100,000 observations.

[\alpha_1, \ldots, \alpha_5] = [.2, .3, .3, .1, .1],
[\mu_1, \ldots, \mu_5] = [0, 1, 2, 3, 4],
[\sigma_1^2, \ldots, \sigma_5^2] = [.2, 2, .5, .1, .1].

EM iterations on the means:

\mu_i^+ = \left\{ \sum_{k=1}^{N} x_k \, \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\}.

[Figures: the population mixture with the evolving mean estimates ("Population Mixture, EM with No Acceleration") and the log residual norm vs. iteration number.]

Slide 7/18: Anderson Acceleration

Derived from a method of D. G. Anderson, Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Machinery, 12 (1965).

Consider a fixed-point iteration x^+ = g(x), g : \mathbb{R}^n \to \mathbb{R}^n.

Anderson Acceleration: Given x_0 and mmax \ge 1, set x_1 = g(x_0).
Iterate: For k = 1, 2, \ldots
    Set m_k = \min\{mmax, k\}.
    Set F_k = (f_{k - m_k}, \ldots, f_k), where f_i = g(x_i) - x_i.
    Solve \min_{\alpha \in \mathbb{R}^{m_k + 1}} \|F_k \alpha\|_2 subject to \sum_{i=0}^{m_k} \alpha_i = 1.
    Set x_{k+1} = \sum_{i=0}^{m_k} \alpha_i \, g(x_{k - m_k + i}).
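The following Python sketch applies Anderson acceleration to a generic fixed-point map g. It solves the constrained least-squares problem via the standard equivalent unconstrained formulation in differences of successive residuals; the function name, stopping test, and default mmax are illustrative choices, not taken from the talk.

```python
import numpy as np

def anderson_accelerate(g, x0, mmax=5, maxiter=100, tol=1e-8):
    """Anderson acceleration of the fixed-point iteration x <- g(x), x0 a vector.

    The constrained problem  min ||F_k alpha||_2  s.t. sum(alpha) = 1  is solved
    through the equivalent unconstrained least-squares problem in differences
    of successive residuals.
    """
    G_hist, F_hist = [], []          # kept whole for clarity; only the last
                                     # mmax + 1 entries are actually needed
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    G_hist.append(gx)
    F_hist.append(gx - x)            # f_0 = g(x_0) - x_0
    x = gx                           # x_1 = g(x_0)
    for k in range(1, maxiter + 1):
        gx = g(x)
        f = gx - x                   # f_k = g(x_k) - x_k
        G_hist.append(gx)
        F_hist.append(f)
        if np.linalg.norm(f) < tol:
            return gx
        mk = min(mmax, k)
        # Columns: differences of the last m_k + 1 residuals and g-values.
        dF = np.column_stack([F_hist[i + 1] - F_hist[i] for i in range(k - mk, k)])
        dG = np.column_stack([G_hist[i + 1] - G_hist[i] for i in range(k - mk, k)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        # x_{k+1} = sum_i alpha_i g(x_{k-m_k+i}), expressed through gamma.
        x = gx - dG @ gamma
    return x
```

In the experiments that follow, g would be the EM update acting on a suitably packed parameter vector (proportions, means, Cholesky factors).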

Slide 8/18: EM Iterations Demo (cont.)

Same univariate normal mixture as before: sample of 100,000 observations,
[\alpha_1, \ldots, \alpha_5] = [.2, .3, .3, .1, .1], [\mu_1, \ldots, \mu_5] = [0, 1, 2, 3, 4], [\sigma_1^2, \ldots, \sigma_5^2] = [.2, 2, .5, .1, .1],
with EM iterations on the means.

[Figures: the population mixture with the evolving mean estimates and the log residual norm vs. iteration number.]

Slide 9/18: EM Convergence and Separation

Redner–Walker (1984): For mixture densities, the convergence is linear and depends on the separation of the component populations:

well-separated (fast convergence) if, whenever i \ne j,

\frac{p_i(x \mid \phi_i^*)}{p(x \mid \Phi^*)} \cdot \frac{p_j(x \mid \phi_j^*)}{p(x \mid \Phi^*)} \approx 0 for all x \in \mathbb{R}^n;

poorly separated (slow convergence) if, for some i \ne j,

\frac{p_i(x \mid \phi_i^*)}{p(x \mid \Phi^*)} \approx \frac{p_j(x \mid \phi_j^*)}{p(x \mid \Phi^*)} for all x \in \mathbb{R}^n.

Slide 10/18: Example: EM Convergence and Separation

A Univariate Normal Mixture:

p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-(x - \mu_i)^2 / (2\sigma_i^2)} for i = 1, \ldots, 3.

EM iterations on the means:

\mu_i^+ = \left\{ \sum_{k=1}^{N} x_k \, \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\}.

Sample of 100,000 observations. [\alpha_1, \alpha_2, \alpha_3] = [.3, .3, .4], [\sigma_1^2, \sigma_2^2, \sigma_3^2] = [1, 1, 1],
[\mu_1, \mu_2, \mu_3] = [0, 2, 4], [0, 1, 2], [0, .5, 1].

[Figure: log residual norm vs. iteration number for the three mean configurations.]

Slide 11/18: Experiments with Multivariate Normal Mixtures

Experiment with Anderson acceleration applied to...

EM iteration: For i = 1, \ldots, m,

\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},

\mu_i^+ = \left\{ \sum_{k=1}^{N} x_k \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\},

\Sigma_i^+ = \left\{ \sum_{k=1}^{N} (x_k - \mu_i^+)(x_k - \mu_i^+)^T \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right\}.

Assume m is known. Ultimate interest: very large N.
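To connect the iteration to the residual-norm plots shown earlier, a simple driver can run em_step (from the earlier sketch) until the parameter update is small; the 300-iteration cap mirrors the failure criterion used later, and the residual measure here is just one plausible choice, not necessarily the one used in the talk.

```python
import numpy as np

def run_em(X, alphas, mus, Sigmas, tol=1e-8, maxiter=300):
    """Iterate em_step until the parameter change is below tol.

    Returns the final estimates and the history of update norms
    (one possible notion of "residual norm").
    """
    history = []
    for _ in range(maxiter):
        new = em_step(X, alphas, mus, Sigmas)
        # Norm of the change in all parameters, stacked together.
        res = np.sqrt(sum(np.linalg.norm(a - b) ** 2
                          for a, b in zip(new, (alphas, mus, Sigmas))))
        history.append(res)
        alphas, mus, Sigmas = new
        if res < tol:
            break
    return alphas, mus, Sigmas, history
```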

Slide 12/18: Experiments with Multivariate Normal Mixtures (cont.)

Two issues:

Good initial guess? Use K-means:
    Fast clustering algorithm; usually gives good results.
    Apply it several times to random subsets of the sample.
    Choose the clustering with minimal sum of within-class distances.
    Use the proportions, means, and covariance matrices of the clusters as the initial guess.

Preserving constraints? Iterate on...
    \alpha_i, i = 1, \ldots, m;
    Cholesky factors of each \Sigma_i.
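One way to realize the constraint-preserving choice above is to let the accelerated fixed-point map act on a packed vector that stores lower-triangular Cholesky factors instead of covariance matrices, so that any iterate unpacks to symmetric positive semidefinite Σ_i. The helper names below (pack, unpack) are illustrative, and handling of the simplex constraint on the α_i is not shown.

```python
import numpy as np

def pack(alphas, mus, Ls):
    """Flatten (proportions, means, Cholesky factors) into one parameter vector."""
    tri = [L[np.tril_indices(L.shape[0])] for L in Ls]   # lower triangles only
    return np.concatenate([np.ravel(alphas), np.ravel(mus), np.concatenate(tri)])

def unpack(v, m, d):
    """Inverse of pack for m components in d dimensions."""
    alphas = v[:m]
    mus = v[m:m + m * d].reshape(m, d)
    ntri = d * (d + 1) // 2
    Sigmas, idx = [], m + m * d
    for _ in range(m):
        L = np.zeros((d, d))
        L[np.tril_indices(d)] = v[idx:idx + ntri]
        Sigmas.append(L @ L.T)        # always symmetric positive semidefinite
        idx += ntri
    return alphas, mus, np.array(Sigmas)
```

A fixed-point map suitable for Anderson acceleration can then be formed as v → pack of the EM update of unpack(v), taking Cholesky factors of the updated covariances.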

Slide 13/18: Experiments with Generated Data

All computing in MATLAB.
Mixtures with m = 5 subpopulations.
Generated data in \mathbb{R}^d for d = 2, 5, 10, 15, 20:
    For each d, randomly generated 100 true parameter sets \{\alpha_i, \mu_i, \Sigma_i\}_{i=1}^{5}.
    For each \{\alpha_i, \mu_i, \Sigma_i\}_{i=1}^{5}, randomly generated a sample of size N = 1,000,000.
Compared (unaccelerated) EM with EM+AA with mmax = 5, 10, 15, 20, 25, 30.
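The talk does not spell out how the true parameters or samples were drawn; the sketch below simply shows one straightforward way to draw an unlabeled sample of size N from a given normal mixture (the function name sample_mixture is illustrative).

```python
import numpy as np

def sample_mixture(alphas, mus, Sigmas, N, seed=None):
    """Draw N observations from a multivariate normal mixture.

    alphas: (m,) mixing proportions; mus: (m, d); Sigmas: (m, d, d).
    """
    rng = np.random.default_rng(seed)
    m, d = np.shape(mus)
    labels = rng.choice(m, size=N, p=alphas)    # latent component labels
    X = np.empty((N, d))
    for i in range(m):
        idx = np.flatnonzero(labels == i)
        X[idx] = rng.multivariate_normal(mus[i], Sigmas[i], size=idx.size)
    return X   # labels are discarded: the estimation problem sees only X
```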

Slide 14/18: Experiments with Generated Data (cont.)

A look at failures. Failure modes:
    failure to converge within 300 iterations;
    \sum_{k=1}^{N} \alpha_i \, p_i(x_k \mid \phi_i) / p(x_k \mid \Phi) = 0 for some i.

[Table of failure totals for EM and for EM+AA at each mmax not reproduced.]

There were trials in which all methods failed,
26 trials in which EM failed and EM+AA succeeded for at least one mmax,
15 trials in which EM failed and EM+AA succeeded for all mmax,
20 trials in which EM succeeded and EM+AA failed for all mmax,
21 trials in which EM succeeded and EM+AA failed for at least one mmax.

Slide 15/18: Experiments with Generated Data (cont.)

Performance profiles (Dolan–Moré, 2002) for (unaccelerated) EM and EM+AA with mmax = 5 over all trials.

[Figures: performance profiles for iteration numbers (left) and run times (right), comparing mmax = 0 and mmax = 5.]
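For reference, a Dolan–Moré performance profile plots, for each solver, the fraction of trials it solves within a factor τ of the best solver on that trial. The sketch below is one straightforward way to compute and plot such profiles; the function name, failure convention (infinite cost), and plotting details are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, labels, tau_max=10.0):
    """Dolan-More performance profiles.

    costs: (n_trials, n_solvers) array of iteration counts or run times,
           with np.inf marking a failed trial for that solver.
    """
    best = costs.min(axis=1, keepdims=True)       # best cost on each trial
    with np.errstate(invalid="ignore"):
        ratios = costs / best                     # nan if every solver failed;
                                                  # nan never counts as solved
    taus = np.linspace(1.0, tau_max, 200)
    for j, label in enumerate(labels):
        frac = [(ratios[:, j] <= t).mean() for t in taus]
        plt.step(taus, frac, where="post", label=label)
    plt.xlabel("performance ratio tau")
    plt.ylabel("fraction of trials solved")
    plt.legend()
    plt.show()
```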

Slide 16/18: An Experiment with Real Data

Remotely sensed data from near Tollhouse, CA. (Thanks to Brett Bader, Digital Globe.)
Observations of 16-dimensional multispectral data.
Modeled with a mixture of m = 3 multivariate normals.
Applied (unaccelerated) EM and EM+AA with mmax = 5, 10, 15, 20, 25, 30.

Slide 17/18: An Experiment with Real Data (cont.)

[Figures] Left: log residual norms vs. iteration numbers. Right: Bayes classification of the data based on the MLE.

Slide 18/18: In Conclusion...

Anderson acceleration is a promising tool for accelerating the EM algorithm that may improve both robustness and efficiency.

Future work:
    Expand the generated-data experiments to include more trials, larger data sets, well-controlled separation experiments, partially labeled samples, and other parametric PDF forms.
    Look for more data from real applications.
