Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions

1 Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
Parthan Kasarapu & Lloyd Allison
Monash University, Australia
September 8, 2015
Parthan Kasarapu, Mixture modelling using MML, September 8, 2015

2 Presentation Outline
  Mixture modelling problem
  Minimum Message Length framework
  MML-based search method
  Evaluation of the proposed method
  von Mises-Fisher mixtures and applications

4 Mixture models
$\Pr(x; \mathcal{M}) = \sum_{j=1}^{K} w_j f_j(x; \Theta_j)$
Ubiquitously used
Modelling multi-modal data
Component probability distributions of various kinds:
  Poisson, Exponential, Weibull, ...
  multivariate Gaussian (Euclidean)
  multivariate von Mises-Fisher (directional)
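The mixture density above can be evaluated directly as a weighted sum. The following is a minimal sketch, assuming univariate Gaussian components for brevity (the slides concern multivariate Gaussian and vMF components); the function names `gaussian_pdf` and `mixture_pdf` are illustrative and not taken from the talk.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, params):
    """Pr(x; M) = sum_j w_j f_j(x; Theta_j), here with Gaussian f_j.

    `params` is a list of (mean, std) pairs, one per component
    (an assumption made for this sketch).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, params))


# Two-component example: equal weights, means at -2 and +2.
density_at_zero = mixture_pdf(0.0, [0.5, 0.5], [(-2.0, 1.0), (2.0, 1.0)])
```

A mixture whose components are all identical collapses to a single component, which is a quick sanity check on the weights.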

5 The Problem
Estimation of the parameters of the components
  Expectation-Maximization (EM) algorithm
Determination of a suitable number of components
  Objective function to compare two mixtures
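As a reminder of what the EM step involves, here is a minimal sketch of the standard maximum-likelihood EM algorithm for a univariate Gaussian mixture with a fixed number of components. It is not the MML-based variant described later in the talk; the initialisation, the variance floor, and all names are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, iters=100):
    """Maximum-likelihood EM for a k-component univariate Gaussian mixture."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Spread the initial means evenly over the data range (an arbitrary choice).
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    overall_mean = sum(data) / n
    variances = [sum((x - overall_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Synthetic data: two well-separated clusters around -5 and +5.
rng = random.Random(0)
data = [rng.gauss(-5.0, 1.0) for _ in range(200)] + \
       [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances = em_gmm_1d(data, k=2)
```

EM on its own needs K to be given, which is why the second item on the slide (an objective function to compare mixtures with different K) matters.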

6 Motivation
[Figure: setosa, versicolor and virginica data plotted against two principal components.]

10 Motivation
[Figure: three panels, each plotting setosa, versicolor and virginica data against two principal components.]
Statistical model selection is important.

11 Model selection and inference
Several candidate models: which one to choose?
A criterion to compare models, based on the model's complexity and the goodness-of-fit.

13 Minimum Message Length (MML) Framework
Conceptualized by Wallace and Boulton (1968)
$I(H \,\&\, D) = \underbrace{I(H)}_{\text{first part}} + \underbrace{I(D \mid H)}_{\text{second part}}$
Two-part message:
  $I(H)$: model complexity
  $I(D \mid H)$: goodness-of-fit

14 MML parameter estimation (Wallace and Freeman, 1987)
Single component (H) with parameter $\Theta_j$
$I(H \,\&\, D) = I(\Theta_j) + I(D \mid H) + \text{constant}$
where $I(\Theta_j) = -\log \frac{h(\Theta_j)}{\sqrt{|F(\Theta_j)|}}$
  Prior density: $h(\Theta_j)$
  Expected Fisher information: $F(\Theta_j)$
  Negative log-likelihood: $I(D \mid H)$
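To make the single-component formula concrete, the sketch below evaluates it for a case much simpler than those in the talk: one Bernoulli parameter p with a uniform prior, an assumption chosen purely for illustration. For n trials the expected Fisher information is n / (p(1 - p)), and the constant for one parameter is (1/2) log(1/12) + 1/2 in nits. Minimising the message length gives the known estimate (s + 1/2) / (n + 1), which differs slightly from the maximum-likelihood estimate s / n.

```python
import math

KAPPA_1 = 1.0 / 12.0  # quantising lattice constant for a single parameter


def message_length(p, successes, n):
    """Wallace-Freeman message length (in nits) for a Bernoulli parameter p.

    Assumes a uniform prior h(p) = 1 on (0, 1); the expected Fisher
    information for n trials is F(p) = n / (p (1 - p)).
    """
    log_prior = 0.0
    log_fisher = math.log(n) - math.log(p) - math.log(1.0 - p)
    first_part = -log_prior + 0.5 * log_fisher  # -log(h / sqrt(F))
    neg_log_lik = -(successes * math.log(p)
                    + (n - successes) * math.log(1.0 - p))
    constant = 0.5 * math.log(KAPPA_1) + 0.5
    return first_part + neg_log_lik + constant


def mml_estimate(successes, n):
    """Closed-form minimiser of message_length over p."""
    return (successes + 0.5) / (n + 1.0)


# Numerical check on a fine grid: 3 successes in 10 trials.
grid = [i / 10000.0 for i in range(1, 10000)]
best_p = min(grid, key=lambda p: message_length(p, 3, 10))
```

The same recipe (prior, Fisher information, negative log-likelihood) applies to the Gaussian and vMF parameters in the talk, with more involved expressions.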

16 MML parameter estimation (Wallace and Freeman, 1987)
Mixture with K components (H)
$I(H \,\&\, D) = \underbrace{I(K) + I(w) + \sum_{j=1}^{K} I(\Theta_j)}_{\text{first part}} + I(D \mid H) + \text{constant}$
An EM algorithm to estimate parameters...
  Component parameters are updated using their MML estimates!
  $I(H \,\&\, D)$ is the scoring function.

18 Determining the number of components K
Several scoring functions...
  AIC & BIC (Akaike, 1974; Schwarz et al., 1978)
  MDL (Rissanen, 1978)
  Approximated MML (Oliver et al., 1996; Roberts et al., 1998)
  ICL (Biernacki et al., 2000)
  MML-like (Figueiredo and Jain, 2002)
We propose a comprehensive MML formulation with no assumptions.
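For comparison with the MML score, the two simplest criteria on this list can be written in a few lines. This is a sketch using the standard definitions AIC = 2k - 2 log L and BIC = k log N - 2 log L; the helper `gmm_free_params`, which counts the free parameters of a Gaussian mixture with full covariance matrices, is an illustrative addition.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion (lower is better)."""
    return 2.0 * num_params - 2.0 * log_likelihood


def bic(log_likelihood, num_params, num_points):
    """Bayesian information criterion (lower is better)."""
    return num_params * math.log(num_points) - 2.0 * log_likelihood


def gmm_free_params(k, d):
    """Free parameters of a k-component, d-variate Gaussian mixture.

    (k - 1) mixing weights, k * d mean coordinates, and
    k * d * (d + 1) / 2 entries of the symmetric covariance matrices.
    """
    return (k - 1) + k * d + k * d * (d + 1) // 2


# Example: a 3-component bivariate mixture with log-likelihood -100 on 100 points.
params = gmm_free_params(3, 2)
scores = (aic(-100.0, params), bic(-100.0, params, 100))
```

Both criteria penalise only the number of parameters, whereas the first part of the MML message also depends on the parameter values through the prior and the Fisher information.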

19 Determining the number of components K
Search method: existing approaches...
Choose the K that has the best EM outcome.
Figueiredo and Jain (2002) propose an improved method.
  Begin with a large number of components.
  Iteratively eliminate the redundant ones.
MML-based Snob (Wallace and Boulton, 1968)...
  Perturbs the current mixture.
  Assumes independence among the attributes.

20 Proposed search method
Basic idea: perturb a K-component mixture through a series of operations so that the mixture escapes a presumably sub-optimal state to an improved state.
Operations include...
  Split
  Delete
  Merge
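The control flow of such a perturbation search can be sketched independently of the mixture details. The code below is a generic greedy loop written under stated assumptions, not the authors' implementation: each operation maps the current state to a list of candidate states, the score stands in for the message length, and the search stops when no candidate improves it. The toy state (just the number of components K) and the toy score are placeholders so that the loop can be run.

```python
def perturbation_search(initial, operations, score, max_rounds=100):
    """Greedy search: try every operation, keep the best-scoring candidate.

    `operations` is a list of functions, each mapping a state to a list of
    candidate states; `score` is the quantity to minimise (for example a
    message length). Stops when no candidate improves on the current state.
    """
    current, current_score = initial, score(initial)
    for _ in range(max_rounds):
        candidates = [c for op in operations for c in op(current)]
        if not candidates:
            break
        best = min(candidates, key=score)
        best_score = score(best)
        if best_score >= current_score:
            break
        current, current_score = best, best_score
    return current, current_score


# Toy stand-ins: the state is only K, and the score is minimal at K = 3.
def split(k):
    return [k + 1]


def delete(k):
    return [k - 1] if k > 1 else []


def merge(k):
    return [k - 1] if k > 1 else []


def toy_score(k):
    return (k - 3) ** 2 + 10


result = perturbation_search(1, [split, delete, merge], toy_score)
```

In a real setting each operation would also re-run EM on the perturbed mixture before scoring it, so that candidates are compared at their local optima.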

22 Illustrative example of the search method
[Figure: two scatter plots, axes X and Y.]
Original mixture with three components.
Begin with a one-component mixture.

23 Illustrative example of the search method
[Figure: three panels, axes X and Y. (a) message length I in bits (b) Splitting (c) message length I in bits]
Split operation: a parent component is split to find locally optimal children, leading to a (K + 1)-component mixture.

24 Illustrative example of the search method
[Figure: three panels, axes X and Y. (d) Initial means (e) I = 2269 bits (f) I = 2246 bits]

25 Illustrative example of the search method
[Figure: three panels, axes X and Y. (g) Deleting (h) message length I in bits (i) message length I in bits]
Delete operation: a component is deleted to find an optimal (K - 1)-component mixture.

26 Illustrative example of the search method
[Figure: three panels, axes X and Y. (j) Merging (k) Initialization (l) message length I in bits]
Merge operation: a pair of close components is merged to find an optimal (K - 1)-component mixture.

27 Evolution of the mixture model
[Plot: message length (in thousands of bits) against number of components, with curves for the first part, second part and total.]
Figure: Variation of the individual parts of the total message length with increasing components.

28 Performance of the proposed method
Comparison with the search method of Figueiredo and Jain (2002)
[Plot (a): correct selections (%) against separation δ, Proposed vs FJ. Plot (b): average number of inferred components against number of data points, Proposed vs FJ.]
Figure: -dimensional Gaussian mixture simulations. (a) Percentage of correct selections with varying δ for a fixed sample size of N = 8. (b) Average number of inferred mixture components with different sample sizes and δ = .2 between component means.

29 Performance of the proposed method
Comparison methodology
$\Delta I_{MML} = I_{MML}(\mathcal{M}^{FJ}) - I_{MML}(\mathcal{M}^{*})$ and $\Delta I_{FJ} = I_{FJ}(\mathcal{M}^{FJ}) - I_{FJ}(\mathcal{M}^{*})$
[Plot (a): difference in message lengths against separation δ, Proposed ($\Delta I_{MML}$) vs MML-like ($\Delta I_{FJ}$). Plot (b): empirical KL-divergence against separation δ, Proposed vs FJ.]
Figure: (a) Difference in message lengths of inferred mixtures. (b) Box-whisker plot of KL-divergence of inferred mixtures.
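One common way to obtain an empirical KL-divergence between a true density p and an inferred density q is to average log p(x) - log q(x) over samples drawn from p. The slides do not state the exact estimator used, so the sketch below is an assumption, shown with univariate Gaussians for brevity; all names are illustrative.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def empirical_kl(samples, log_p, log_q):
    """Monte Carlo estimate of KL(p || q) from samples drawn from p."""
    return sum(log_p(x) - log_q(x) for x in samples) / len(samples)


# Samples from p = N(0, 1); q = N(1, 1), for which KL(p || q) = 0.5 exactly.
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
kl_pq = empirical_kl(samples,
                     lambda x: log_normal_pdf(x, 0.0, 1.0),
                     lambda x: log_normal_pdf(x, 1.0, 1.0))
kl_pp = empirical_kl(samples,
                     lambda x: log_normal_pdf(x, 0.0, 1.0),
                     lambda x: log_normal_pdf(x, 0.0, 1.0))
```

A smaller value means the inferred mixture is closer to the density that generated the data.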

30 Mixtures of von Mises-Fisher (vMF) distributions
vMF is analogous to a symmetric Gaussian wrapped on the hypersphere.
Suitable for modelling directional data.
Mixtures of vMF distributions inferred for...
  Describing protein data.
  High-dimensional text clustering.
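For the ordinary sphere in three dimensions the vMF density has a closed form, f(x; μ, κ) = κ / (4π sinh κ) · exp(κ μᵀx), where μ is the unit mean direction and κ ≥ 0 the concentration. The sketch below evaluates it in an algebraically equivalent form that avoids overflow for large κ, and checks numerically that it integrates to one over the sphere. The function names and the grid-based integration are illustrative choices.

```python
import math


def vmf3_pdf(x, mu, kappa):
    """vMF density on the unit sphere in 3D at unit vector x.

    Equivalent to kappa / (4 pi sinh(kappa)) * exp(kappa * mu.x), rewritten
    as kappa / (2 pi (1 - exp(-2 kappa))) * exp(kappa * (mu.x - 1)).
    """
    if kappa == 0.0:
        return 1.0 / (4.0 * math.pi)  # uniform on the sphere
    dot = sum(a * b for a, b in zip(mu, x))
    norm = kappa / (2.0 * math.pi * (1.0 - math.exp(-2.0 * kappa)))
    return norm * math.exp(kappa * (dot - 1.0))


def integrate_over_sphere(f, steps=200):
    """Midpoint-rule integral of f over the unit sphere."""
    d_theta = math.pi / steps
    d_phi = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for j in range(steps):
            phi = (j + 0.5) * d_phi
            x = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
            total += f(x) * sin_t * d_theta * d_phi
    return total


mu = (0.0, 0.0, 1.0)
total_mass = integrate_over_sphere(lambda x: vmf3_pdf(x, mu, 5.0))
```

Larger κ concentrates the mass more tightly around μ, and κ = 0 gives the uniform distribution on the sphere.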

31 Mixture modelling of protein directional data
[Figure: consecutive points $P_{i-2}$, $P_{i-1}$, $P_i$, $P_{i+1}$ with the angles θ and φ, drawn in axes X1, X2, X3.]
Data corresponds to unit vectors on the sphere.
Set of co-latitude $\theta \in [0, \pi]$ and longitude $\phi \in [0, 2\pi)$ pairs.
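A (θ, φ) pair and a unit vector are two views of the same point on the sphere. The sketch below uses the common convention with the co-latitude measured from the third axis; the slide does not spell out its axis convention, so this choice, like the function names, is an assumption.

```python
import math


def to_unit_vector(theta, phi):
    """Co-latitude theta in [0, pi] and longitude phi in [0, 2 pi) to (x1, x2, x3)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def to_angles(x):
    """Inverse mapping: unit vector to (theta, phi) with phi in [0, 2 pi)."""
    theta = math.acos(max(-1.0, min(1.0, x[2])))  # clamp against rounding error
    phi = math.atan2(x[1], x[0]) % (2.0 * math.pi)
    return theta, phi


vector = to_unit_vector(1.0, 4.0)
angles = to_angles(vector)
```

Working with unit vectors rather than angle pairs is what allows the vMF density, which depends on the dot product with the mean direction, to be applied directly.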

32 Mixture modelling of protein directional data

33 Optimal number of vMF mixture components
[Plot: co-latitude θ against longitude φ.]
Figure: 37-component mixture

34 Improved descriptors of protein data
Null model     Total message length (millions of bits)     Bits per residue
Uniform
vMF mixture

36 Text clustering
Data corresponds to the normalized vector representations of text documents (Banerjee et al., 2005).
[Table: columns are the methods of vMF parameter estimation (Banerjee, Tanabe, Sra, Song, MML); rows are evaluation metrics grouped by number of clusters: True (message length, average F-measure, mutual information) and Inferred (message length, mutual information).]
Table: Clustering performance on the two datasets: (a) Classic3 (d = 4358) (b) CMU Newsgroup (d = 6448).
The MML mixtures consistently have lower message lengths.
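Normalising a document vector to unit length places it on the hypersphere, so that only its direction matters. The sketch below uses raw term counts over a small hypothetical vocabulary; the actual document representation and weighting used for the datasets above are not given in the slides, so this is an illustration only.

```python
import math


def term_counts(text, vocabulary):
    """Count occurrences of each vocabulary term in a whitespace-split text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise an all-zero vector")
    return [v / norm for v in vector]


vocabulary = ["mixture", "model", "protein"]  # hypothetical vocabulary
counts = term_counts("Mixture model of a mixture", vocabulary)
direction = l2_normalize(counts)
```

After normalisation, two documents with the same word proportions but different lengths map to the same point on the sphere.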

37 Summary
MML-based parameter estimation of...
  Multivariate Gaussian and vMF distributions
Design of the mixture modelling apparatus...
  Selection of the optimal number of components.
Applications to modelling protein directional data and text clustering.
P. Kasarapu, L. Allison, Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions, Machine Learning, 100(2-3):333-378, 2015.

38 Thank you.

39 References
H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, Dec 1974.
A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra. Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6:1345-1382, 2005.
C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719-725, 2000.
M. A. T. Figueiredo and A. K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381-396, 2002.
J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised learning using MML. In Machine Learning: Proceedings of the 13th International Conference, pages 364-372, 1996.
J. Rissanen. Modeling by shortest data description. Automatica, 14(5):465-471, 1978.
S. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian approaches to Gaussian mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133-1142, Nov 1998.
G. Schwarz et al. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
C. S. Wallace and D. M. Boulton. An information measure for classification. Computer Journal, 11(2):185-194, 1968.
C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society: Series B (Methodological), 49(3):240-265, 1987.
