Mixture Models. Michael Kuhn

Size: px

Start display at page:

Download "Mixture Models. Michael Kuhn"

Britton Bell
5 years ago
Views:

1 Mixture Models Michael Kuhn

2 Objec<ves Students will be able to explain what is a normal mixture model Students will be able to explain steps of the Expecta<on-Maximiza<on Algorithm Describe methods to select the number of components apply methods from mclust to their own data

3 Mixture Models Probabilis<c model for subpopula<ons with different probability density func<ons A probability density func<on generated by summing together mul<ple probability density func<ons gx y(x) = a i f(x) i=1 p

4 Mixture Models Probabilis<c model for subpopula<ons with different probability density func<ons A probability density func<on generated by summing together mul<ple probability density func<ons number of components y(x) = GX a i f(x) mixing coefficient (a i sum to 1) mixture model g=1 p component density func<on

5 Interpreta<on of Mixture Models Arise when there is a latent variable indica<ng group membership observa<ons (x i,z i ) z i drawn from {1,,G} x i drawn from f z ( θ z ) Semi-parametric method of es<ma<ng density probability density x

6 Normal (Gaussian) Mixture Models Normal distribu<ons have convenient X proper<es univariate X 1 (x µ) 2 f(x ) = (x µ, )= p exp p mul<variate f(x µ, ) = exp 1 2 (x µ)t 1 (x µ) p (2 )k

7 Diversity of Normal Mixture Models Examples from McLachlan & Peel (2004)

8 Diversity of Normal Mixture Models Examples from McLachlan & Peel (2004)

9 X X p Probability p p(x = 3) = f (3)dx probability density f (x) x= x

10 X p Likelihood p p(x 1,x 2,x 3 )=f (x 1 )f (x 2 )f (x 3 )dx 1 dx 2 dx 3 probability density f (x) x

11 p X p Likelihood likelihood( x 1,x 2,x 3 )=f (x 1 )f (x 2 )f (x 3 ) probability density f (x) x

12 X p Likelihood log likelihood( x 1,x 2,x 3 ) = log f (x 1 ) + log f (x 2 ) + log f (x 3 ) probability density f (x) X x

13 X Likelihood p L( x 1,x 2,...,x N )= NX log f (x i ) i=1 probability density f (x) X x

14 X Likelihoods p L( x 1,x 2,...,x N )= NX log f (x i ) i=1 probability density f (x) X x

15 p X p Likelihoods p For a normal distribu<on (µ, 2 ) the parameters are = {µ, 2 } What is the name probability density f (x) for the maximum likelihood es<mator for the center of a normal distribu<ons? x

16 Normal Mixture Models The maximum-likelihood (ML) es<mates for μ and Σ can be easily es<mated. unbiased es<mator of mean unbiased es<mator of covariance matrix x = 1 n nx i=1 X Q = 1 n 1 x i X nx (x i x)(x i x) T i=1

17 Expecta<on Maximiza<on algorithm Maximum-likelihood es<ma<on (MLE) method for a case where some variables are hidden In the case of mixture models, the hidden variables are the group a par<cular point belongs to

18 Expecta<on Maximiza<on algorithm E step: creates a func<on for the Expecta<on of the log-likelihood using the parameters from the M step M step: computes the parameters that maximizes the func<on from the E step

19 Expecta<on Maximiza<on algorithm X E step: creates a func<on for the Expecta<on of the log-likelihood using the parameters from the M step M step: computes the parameters that maximizes the func<on from the E step Q( (t) )=E Z X, (t) [L( ; X, Z)] Formally, (t+1) = arg max X Q( (t) )

20 Expecta<on Maximiza<on algorithm group 1 group 2 x1 x3 x2 x4 x5 For points {x i } the indicator variables variables z ig = 1 if the i th point is in the g th cluster and 0 otherwise In general z ig are not known, so we es<mate the group assignments probabilis<cally

21 Likelihood for EM X Observed Data Likelihood Data are X={x 1, x 2, x N } L obs. ( X) = Complete Data Likelihood X " nx G # X X log a i f(x i g ) i=1 g=1 X X " X Data are X,Z = {(x 1,z 1 ), (x 2,z 2 ), (x N,z N )} L compl. ( X, Z) = nx i=1 g=1 # GX z ig log a i f(x i g )

22 u t X M step 1 = 2 = a 1 = 1 n a 2 = 1 n X 2 component model y(x) =a 1 (x; µ 1, 1)+au t 2 (x; µ X 2, 2) nx Pr(1 x i ) i=1 nx Pr(2 x i ) i=1 µ 1 = 1 na 1 X xpr(1 xi ) µ 2 = 1 X xpr(2 xi ) na 2 v u t 1 nx (x µ 1 ) na 2 Pr(1 x i ) 1 v u t 1 na 2 i=1 nx (x µ 2 ) 2 Pr(2 x i ) i=1 E step v u t X Pr(g x i )= (x i ; µ j, j)/y(x i )

23 An example of the E-M algorithm The intrinsic distribu<on of the simulated data density x x

24 E M algorithm: ini<al guess grp1 grp2 n = length(x) grp1 = which(x < 1.0) grp2 = which(x >= 1.0) z.indicator1 = rep(0,100) z.indicator2 = rep(0,100) z.indicator1[grp1] = 1.0 z.indicator2[grp2] = 1.0 Frequency What methods would you as an ini<al guess for clustering in a mul<variate dataset? x

25 X M step 1 = 2 = a 1 = 1 n a 2 = 1 n nx Pr(1 x i ) i=1 nx Pr(2 x i ) i=1 µ 1 = 1 na 1 X xpr(1 xi ) µ 2 = 1 X xpr(2 xi ) na 2 v u t 1 nx (x µ 1 ) na 2 Pr(1 x i ) 1 v u t 1 na 2 i=1 nx (x µ 2 ) 2 Pr(2 x i ) i=

26 Number of Components Use methods for model selec<on Penalized likelihoods Log-likelihood: (stricter) (less strict) Individual values of the BIC and AIC by themselves are meaningless. They are only useful for comparison between models.

27 BIC BIC E V Number of components

28 mclust EM algorithm for fiung models BIC for model selec<on Op<onal, restric<ve or unrestric<ve priors

29 Open R and the notes from the school webpage.

30 Bayesian Methods MCMC methods useful for parameter es<ma<on and model selec<on Bayesian methods are par<cularly useful for evalua<ng the effect of uncertain<es on model parameters

31 Problems Interpreta<on of results Mul<modality of Density Nonintui<ve group membership Fiung Mul<modality of likelihood Infinite likelihood Irregularity Noniden<fability of components Slow convergence

32 Singulari<es The likelihood of mixture models is unbound at the edge of parameter space A component can shrink on a given point, the mean becomes the point, and the variance goes to zero. Mixture model codes must implement strategies to avoid this effect

33 Noniden<fiability Permuta<on of labels A challenge for implementa<ons of MCMC Point process representa<on σ Components with equal means and covariance matrices Interpreta<on of the likelihood ra<o sta<s<c is not straighxorward μ

34 Rela<onship to Other Cluster Analysis Algorithms Normal models are a soz classifier Condi<onal probabili<es es<mated Classifica<on based on es<mated probabil<es Hard classifica<on methods K-means DBSCAN

35 Mo<va<ons for Use of Mixture Models in Astronomy

36 Inves<ga<ng Multmodality GRB dura<ons from BATSE Mukherjee et al. (1998) log GRB Dura<on (T90) [sec]

37 Inves<ga<ng Mul<modality BATSE GRBs Mukherjee et al. (1998)

38 Removal of Contaminants HST observa<ons of galaxies GCs Background galaxies Field stars Mixture model approach to separate GCs and contaminants Distribu<on of galaxies obtained from an offtarget field Jordán et al. (2009)

39 Red and Blue Galaxies Well-known bimodality to galaxy color distribu<ons Color distribu<ons overlap Simple color cuts lead to bias when examining the galaxy mass func<on for red and blue galaxies Taylor et al. (2015)

40 Red and Blue Galaxies

shape to mass func<ons Mixture models reveal a longer distribu<on of

41 Red and Blue Galaxies >23,000 galaxies from GAMA 40 parameter model for red and blue galaxies astrophysically mo<vated MCMC Expected shape to mass func<ons Mixture models reveal a longer distribu<on of red galaxies that merge with blue popula<on at low masses Taylor et al. (2015)

42 Star Cluster Modeling In star-forming regions young stars are ozen distributed in a number of groups NGC 6357 IRAC NGC 6357 Stellar Members

43 Star Cluster Modeling Isothermal King Model Ellipsoid Multivariate Normal Log surface density [stars pc 2 ] Length Scale [pc] Adap<vely smoothed data are shown in black Models are shown in gray

44 Comparison of Methods Mixture Model Classification Minimum spanning tree method Pruned MST y x Two simulated spherical clusters 1) 300 points 2) 700 points

Pruned MST y 3 2 1 0 1 2 3 2 0 2 4 6 8 x Two

45 Comparison of Methods Mixture Model Classification Minimum spanning tree method Pruned MST y x Two simulated spherical clusters 1) 300 points 2) 700 points

46 Extreme Deconvolu<on Technique described by Bovy et al. (2011) Deals with heteroscedas<c errors Makes use of mixture models to approximate arbitrary density func<ons

47 Examples in R

Latent Dirichlet Alloca/on

Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which