Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions

Daniel F. Schmidt and Enes Makalic
Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology
School of Population Health, University of Melbourne

25th Australasian Joint Conference on Artificial Intelligence (AI 2012)
Contents

1. Mixture Modelling
   - Problem Description
   - MML Mixture Models
2. MML Inverse Gaussian Distributions
   - Inverse Gaussian Distributions
   - MML Inference of Inverse Gaussians
3. Example
Problem Description

We have n items, each with q associated attributes, formed into a matrix

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,q} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,q} \end{pmatrix}$$

- Group together, or cluster, similar items
- A form of unsupervised learning, sometimes called intrinsic classification
- Class labels are learned from the data
Mixture Modelling (1)

Models the data as a mixture of probability distributions

$$p(y_{i,j}; \Phi) = \sum_{k=1}^{K} \alpha_k \, p(y_{i,j}; \theta_{k,j})$$

where
- K is the number of classes
- $\alpha = (\alpha_1, \ldots, \alpha_K)$ are the mixing (population) weights
- $\theta_{k,j}$ are the parameters of the distributions
- $\Phi = \{K, \alpha, \theta_{1,1}, \ldots, \theta_{K,q}\}$ denotes the complete mixture model

Has an explicit probabilistic form, which allows for statistical interpretation (a code sketch follows below).
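To make the notation concrete, here is a minimal Python sketch of evaluating the mixture density; the Gaussian class densities and all parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
from scipy.stats import norm

# Illustrative K = 2 mixture over a single attribute; any class
# density family (normal, inverse Gaussian, Poisson, ...) works here.
alphas = np.array([0.3, 0.7])        # mixing weights alpha_k, sum to 1
thetas = [(0.0, 1.0), (4.0, 2.0)]    # assumed (mean, sd) for each class

def mixture_density(y):
    """p(y; Phi) = sum_k alpha_k * p(y; theta_k)."""
    return sum(a * norm.pdf(y, mu, sd) for a, (mu, sd) in zip(alphas, thetas))

print(mixture_density(1.5))  # density of the two-class mixture at y = 1.5
```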
Mixture Modelling (2)

How is this related to clustering?
- Each class is a cluster
- Class-specific probability distributions over each attribute, e.g., normal, inverse Gaussian, Poisson, etc.
- The mixing weight is the prevalence of the class in the population
- Measure of similarity of an item to a class:

$$p_k(y_i) = \prod_{j=1}^{q} p(y_{i,j}; \theta_{k,j})$$

the probability of the item's attributes under the class distributions
Mixture Modelling (3)

Membership of items to classes is soft:

$$r_{i,k} = \frac{\alpha_k \, p_k(y_i)}{\sum_{l=1}^{K} \alpha_l \, p_l(y_i)}$$

- $r_{i,k}$ is the posterior probability of item i belonging to class k
- $\alpha_k$ is the a priori probability that an item belongs to class k
- $p_k(y_i)$ is the probability of data item $y_i$ under class k

Assign each item to the class with the highest posterior probability. The total number of samples in a class is then

$$n_k = \sum_{i=1}^{n} r_{i,k}$$

(sketched in code below).
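A minimal sketch of the soft assignment, assuming the per-class densities $p_k(y_i)$ have already been evaluated into an (n, K) array:

```python
import numpy as np

def responsibilities(alphas, class_densities):
    """Posterior class memberships r[i, k] and effective class sizes n_k.

    alphas:          length-K array of mixing weights
    class_densities: (n, K) array with entry (i, k) = p_k(y_i), i.e. the
                     product of the attribute densities for item i, class k
    """
    weighted = class_densities * alphas              # alpha_k * p_k(y_i)
    r = weighted / weighted.sum(axis=1, keepdims=True)
    n_k = r.sum(axis=0)                              # n_k = sum_i r[i, k]
    assignment = r.argmax(axis=1)                    # hard assignment
    return r, n_k, assignment
```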
MML Mixture Models (1)

- Minimum Message Length (MML) goodness-of-fit criterion
- A popular criterion for mixture modelling
- Based on the idea of compression

The message length of the data is our yardstick; it comprises:
1. The length of the codeword needed to state the model Φ:
   - number of classes: I(K)
   - relative abundances: I(α)
   - parameters for each distribution in each class: $I(\theta_{k,j})$
2. The length of the codeword needed to state the data, given the model: $I(Y \mid \Phi)$
MML Mixture Models (2)

Total message length:

$$I(Y, \Phi) = I(K) + I(\alpha) + \sum_{k=1}^{K} \sum_{j=1}^{q} I(\theta_{k,j}) + I(Y \mid \Phi)$$

which balances model complexity against model fit.

- Estimate Φ by minimising the message length
- $\hat\alpha$ and $\hat\theta_{k,j}$ are found by expectation-maximisation
- Find $\hat{K}$ by splitting/merging classes (a schematic of the search follows below)
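A schematic of the model-selection loop. The helpers `em_fit` (EM to convergence for a fixed K) and `message_length` (evaluating I(Y, Φ)) are hypothetical stand-ins for the components described above, and the plain sweep over K is a simplification of the split/merge search.

```python
def select_mixture(Y, K_max, em_fit, message_length):
    """Return the fitted mixture minimising the total message length."""
    best_I, best_phi = float("inf"), None
    for K in range(1, K_max + 1):
        phi = em_fit(Y, K)            # EM: alternate responsibilities / estimates
        I = message_length(Y, phi)    # I(K) + I(alpha) + sum I(theta) + I(Y | Phi)
        if I < best_I:
            best_I, best_phi = I, phi
    return best_phi, best_I
```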
Contents

1. Mixture Modelling
   - Problem Description
   - MML Mixture Models
2. MML Inverse Gaussian Distributions
   - Inverse Gaussian Distributions
   - MML Inference of Inverse Gaussians
3. Example
Inverse Gaussian Distributions (1)

- A distribution for positive, continuous data
- We say $Y_i \sim \mathrm{IG}(\mu, \lambda)$ if the p.d.f. for $Y_i = y_i$ is

$$p(y_i; \mu, \lambda) = \left( \frac{1}{2\pi \lambda y_i^3} \right)^{1/2} \exp\left( -\frac{(y_i - \mu)^2}{2 \mu^2 \lambda y_i} \right)$$

where $\mu > 0$ is the mean parameter and $\lambda > 0$ is the inverse-shape parameter
- Suitable for positively skewed data
- Goal: derive the message length formula for use in mixture modelling (the density is transcribed into code below)
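The density transcribed directly into code, a small sketch using the slide's mean/inverse-shape parameterisation (written out explicitly because scipy's `invgauss` uses a different parameterisation):

```python
import numpy as np

def invgauss_pdf(y, mu, lam):
    """IG(mu, lam) density: mean mu > 0, inverse-shape lam > 0.

    p(y; mu, lam) = (2 pi lam y^3)^(-1/2) exp(-(y - mu)^2 / (2 mu^2 lam y))
    """
    y = np.asarray(y, dtype=float)
    return (2 * np.pi * lam * y**3) ** -0.5 * np.exp(
        -((y - mu) ** 2) / (2 * mu**2 * lam * y)
    )

# The three curves on the next slide: (mu, lam) in {(1, 1), (1, 3), (3, 1)}
```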
Inverse Gaussian Distributions (2)

[Figure: example inverse Gaussian densities $p(y; \mu, \lambda)$ for $(\mu = 1, \lambda = 1)$, $(\mu = 1, \lambda = 3)$ and $(\mu = 3, \lambda = 1)$, plotted over $0 \le y \le 3$]
MML Inference of Inverse Gaussians (1)

- Use the Wallace-Freeman approximation
- Bayesian; we chose uninformative priors

$$\pi(\mu, \lambda) \propto \frac{1}{\lambda \mu^{3/2}}$$

- Message length component for use in mixture models:

$$I(\theta_{k,j}) = \log n_k - \frac{1}{2} \log \left( \frac{\hat\lambda_{k,j}}{2 a_j^2} \right) + \log b_j$$

where
- $\hat\lambda_{k,j}$ is the MML estimate of λ for class k and variable j
- $n_k$ is the number of samples in class k
- $a_j$, $b_j$ are hyper-parameters

Details may be found in the paper.
MML Inference of Inverse Gaussians (2)

Let $y = (y_1, \ldots, y_n)$ be data from an inverse Gaussian. Define the sufficient statistics

$$S_1 = \sum_{i=1}^{n} y_i, \qquad S_2 = \sum_{i=1}^{n} \frac{1}{y_i}$$

Compare the maximum likelihood estimates

$$\hat\mu_{\mathrm{ML}} = \frac{S_1}{n}, \qquad \hat\lambda_{\mathrm{ML}} = \frac{S_1 S_2 - n^2}{n S_1}$$

to the minimum message length estimates

$$\hat\mu_{87} = \frac{S_1}{n}, \qquad \hat\lambda_{87} = \frac{S_1 S_2 - n^2}{(n-1) S_1}$$

The MML estimates (compared in code below):
1. are unbiased
2. strictly dominate the ML estimates in terms of KL risk
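The two estimators differ only in the denominator of $\hat\lambda$. A small sketch computing both from data; the simulated input is purely illustrative (numpy's Wald sampler draws inverse Gaussian variates):

```python
import numpy as np

def ig_estimates(y):
    """ML and MML ('87) estimates of IG(mu, lambda) from data y."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    S1, S2 = y.sum(), (1.0 / y).sum()    # sufficient statistics
    mu_hat = S1 / n                      # identical for ML and MML
    lam_ml = (S1 * S2 - n**2) / (n * S1)
    lam_87 = (S1 * S2 - n**2) / ((n - 1) * S1)
    return mu_hat, lam_ml, lam_87

# Example on simulated data; note numpy's 'scale' is the *shape* 1/lambda,
# so the true inverse-shape parameter here is lambda = 1/4.
rng = np.random.default_rng(0)
y = rng.wald(mean=2.0, scale=4.0, size=50)
print(ig_estimates(y))
```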
Contents

1. Mixture Modelling
   - Problem Description
   - MML Mixture Models
2. MML Inverse Gaussian Distributions
   - Inverse Gaussian Distributions
   - MML Inference of Inverse Gaussians
3. Example
Example (1)

- Compared inverse Gaussian mixture models against standard Gaussian mixture models
- Used several well-known, real datasets:
  1. Enzyme
  2. Acidity
  3. Galaxy
- Results shown for the enzyme data (n = 245 samples)
- See the paper for the acidity and galaxy results
Example (2)

[Figure: histogram of the enzyme data, $0 \le y \le 3$]
Example (3)

Gaussian mixture model (K = 2, I = 86.19)

[Figure: fitted Gaussian mixture density over $0 \le y \le 3$]
Example (4)

Inverse Gaussian mixture model (K = 3, I = 69.34)

[Figure: fitted inverse Gaussian mixture density over $0 \le y \le 3$]
References

- Wallace, C. S. and Boulton, D. M. An information measure for classification. Computer Journal, Vol. 11, pp. 185-194, 1968.
- Wallace, C. S. and Dowe, D. L. MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions. Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, pp. 529-536, 1997.
- Wallace, C. S. Intrinsic classification of spatially correlated data. The Computer Journal, Vol. 41, pp. 602-611, 1998.
- Wallace, C. S. and Dowe, D. L. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, Vol. 10, pp. 73-83, 2000.
- Wallace, C. S. Statistical and Inductive Inference by Minimum Message Length. Springer, 2005.