FINITE MIXTURE MODELLING USING THE SKEW NORMAL DISTRIBUTION


Statistica Sinica 17 (2007), 909-927

Tsung I. Lin$^1$, Jack C. Lee$^2$ and Shu Y. Yen$^2$

$^1$National Chung Hsing University and $^2$National Chiao Tung University

Abstract: Normal mixture models provide the most popular framework for modelling heterogeneity in a population with continuous outcomes arising in a variety of subclasses. In the last two decades, the skew normal distribution has been shown to be beneficial in dealing with asymmetric data in various theoretic and applied problems. In this article, we address the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives, respectively. Computational techniques using EM-type algorithms are employed for iteratively computing maximum likelihood estimates. Also, a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses. Numerical results are illustrated through two examples.

Key words and phrases: ECM algorithm, ECME algorithm, Fisher information, Markov chain Monte Carlo, maximum likelihood estimation, skew normal mixtures.

1. Introduction

Finite mixture models have been broadly developed and widely applied to classification, clustering, density estimation and pattern recognition problems, as shown by Titterington, Smith and Makov (1985), McLachlan and Basford (1988), McLachlan and Peel (2000), and the references therein. With the growing advances of computational methods, especially the development of Markov chain Monte Carlo (MCMC) techniques, many works are also devoted to Bayesian mixture modelling issues, including Diebolt and Robert (1994), Escobar and West (1995), Richardson and Green (1997) and Stephens (2000), among others.

In many applied problems, the shapes of fitted mixture normal components may be distorted, and inferences can be misleading when the data involve highly asymmetric observations. In particular, the normal mixture (NORMIX) model tends to overfit when additional components are included to capture the skewness. Sometimes, increasing the number of pseudo-components may lead to difficulties and inefficiencies in computations. Instead, we consider using the skew normal distributions proposed by Azzalini (1985) as component densities to overcome the potential weakness of normal mixtures. The skew normal distribution is a new class of density functions dependent on an additional shape parameter, and includes the normal density as a special case. It provides a more flexible approach to the fitting of asymmetric observations and uses fewer components in the fitting of mixture models. A comprehensive coverage of the fundamental theory and new developments for skew-elliptical distributions is given by Genton (2004).

It is not easy to deal with computational aspects of parameter estimation for the fitting of skew normal mixture (SNMIX) models. For simplicity, we treat the number of components as known and describe how to employ EM-type algorithms for finding the maximum likelihood (ML) estimates. In addition, Bayesian sampling methods for SNMIX are considered as an alternative modelling strategy. Priors and hyperparameters are chosen as weakly informative to avoid nonidentifiability problems in the mixture context.

The rest of the paper unfolds as follows. Section 2 briefly outlines some preliminaries of the skew normal distribution. Azzalini and Capitanio (1999) pointed out that the ML estimates might be improved by a few EM iterations, but detailed expressions of the EM algorithm are not available in the literature. We thus show how to compute the ML estimates for the skew normal distribution using two EM-type algorithms. In Section 3 we show a hierarchical representation for the SNMIX model by incorporating two latent variables. Based on the model, we also derive the corresponding EM-type algorithms for ML estimation. Meanwhile, the information-based standard errors are also presented. In Section 4, we develop the MCMC sampling algorithm used in simulating posterior distributions to carry out Bayesian inferences. In Section 5, two examples are given, and in Section 6 we provide some concluding remarks.

2. The Skew Normal Distribution

2.1. Preliminaries

As developed by Azzalini (1985, 1986), a random variable $Y$ follows a univariate skew normal distribution with location parameter $\xi$, scale parameter $\sigma^2$ and skewness parameter $\lambda \in \mathbb{R}$ if it has the density

$$\psi(y \mid \xi, \sigma^2, \lambda) = \frac{2}{\sigma}\,\phi\!\left(\frac{y-\xi}{\sigma}\right)\Phi\!\left(\lambda\,\frac{y-\xi}{\sigma}\right), \tag{1}$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density function and cumulative distribution function, respectively; then, for brevity, we say that $Y \sim SN(\xi, \sigma^2, \lambda)$. Note that if $\lambda = 0$, the density of $Y$ reduces to the $N(\xi, \sigma^2)$ density.

Lemma 1. If $Y \sim SN(\xi, \sigma^2, \lambda)$ and $X \sim N(\xi, \sigma^2/(1+\lambda^2))$, we have

(i) $E(X^{n+1}) = \xi E(X^n) + \{\sigma^2/(1+\lambda^2)\}\,\{dE(X^n)/d\xi\}$.

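(ii) $E(Y^{n+1}) = \xi E(Y^n) + \sigma^2\,\{dE(Y^n)/d\xi\} + \sqrt{2/\pi}\,\delta(\lambda)\,\sigma\,E(X^n)$.

(iii) $E\{(Y - EY)^{n+1}\} = \sigma^2\,[dE\{(Y - EY)^n\}/d\xi] + n\sigma^2 E\{(Y - EY)^{n-1}\} - (EY - \xi)\,E\{(Y - EY)^n\} + \sqrt{2/\pi}\,\delta(\lambda)\,\sigma\,E\{(X - EY)^n\}$.

Lemma 1 provides a simple way of obtaining higher moments without using the moment generating function. With some basic algebraic manipulations, we can easily obtain

$$E(Y) = \xi + \sqrt{\frac{2}{\pi}}\,\delta(\lambda)\,\sigma, \qquad \mathrm{var}(Y) = \Big\{1 - \frac{2}{\pi}\,\delta^2(\lambda)\Big\}\sigma^2,$$

$$\gamma_Y = \frac{\sqrt{2}\,(4-\pi)\,\lambda^3}{\{\pi + (\pi-2)\lambda^2\}^{3/2}}, \qquad \kappa_Y = 3 + \frac{8(\pi-3)\,\lambda^4}{\{\pi + (\pi-2)\lambda^2\}^{2}}, \tag{2}$$

where $\delta(\lambda) = \lambda/\sqrt{1+\lambda^2}$, and $\gamma_Y$ and $\kappa_Y$ are the measures of skewness and kurtosis, respectively. It is easily shown that $\gamma_Y$ is in $(-0.9953, 0.9953)$ and $\kappa_Y$ is in $(3, 3.8692)$.

Henze (1986) showed that the odd moments of the standard skew normal variable $Z = (Y - \xi)/\sigma$ have the expression

$$E(Z^{2k+1}) = \sqrt{\frac{2}{\pi}}\,\lambda\,(1+\lambda^2)^{-(k+0.5)}\,2^{-k}\,(2k+1)!\,\sum_{j=0}^{k} \frac{j!\,(2\lambda)^{2j}}{(2j+1)!\,(k-j)!},$$

while the even moments coincide with those of the standard normal, as $Z^2 \sim \chi^2_1$ (Roberts and Geisser (1966)).

From (2), Arnold, Beaver, Groeneveld and Meeker (1993) showed the following method of moments estimators:

$$\tilde\xi = m_1 - a_1\Big(\frac{m_3}{b_1}\Big)^{1/3}, \qquad \tilde\sigma^2 = m_2 + a_1^2\Big(\frac{m_3}{b_1}\Big)^{2/3}, \qquad \tilde\delta(\lambda) = \Big\{a_1^2 + m_2\Big(\frac{b_1}{m_3}\Big)^{2/3}\Big\}^{-1/2}, \tag{3}$$

where $a_1 = \sqrt{2/\pi}$, $b_1 = (4/\pi - 1)\,a_1$, $m_1 = n^{-1}\sum_{i=1}^n Y_i$, $m_2 = (n-1)^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2$, and $m_3 = (n-1)^{-1}\sum_{i=1}^n (Y_i - \bar Y)^3$.

2.2. Parameter estimation using EM-type algorithms

In this subsection, we show how to exploit two extensions of the EM algorithm (Dempster, Laird and Rubin (1977)), the ECM algorithm (Meng and Rubin (1993)) and the ECME algorithm (Liu and Rubin (1994)), for ML estimation of the skew normal distribution. A key feature of these two EM-type algorithms is that they preserve the stability of the EM algorithm with their monotone convergence.

In order to represent the skew normal model in an incomplete data framework, we extend the result of Azzalini (1986, p.201) and Henze (1986, Thm. 1) to show that if $Y_j \sim SN(\xi, \sigma^2, \lambda)$, then

$$Y_j = \xi + \delta(\lambda)\,\tau_j + \sqrt{1 - \delta^2(\lambda)}\;U_j, \tag{4}$$

with $\tau_j \sim TN(0, \sigma^2)\,I\{\tau_j > 0\}$ and $U_j \sim N(0, \sigma^2)$, where $\tau_j$ and $U_j$ are independent, $TN(\cdot,\cdot)$ denotes the truncated normal distribution, and $I\{\cdot\}$ represents an indicator function.

Letting $Y = (Y_1, \ldots, Y_n)$ and $\tau = (\tau_1, \ldots, \tau_n)$, the complete-data log-likelihood of $\theta = (\xi, \sigma^2, \lambda)$ given $(Y, \tau)$, after omitting additive constants, is

$$\ell_c(\theta) = -n\log\sigma^2 - \frac{n}{2}\log\{1 - \delta^2(\lambda)\} - \frac{\sum_{j=1}^n \tau_j^2 - 2\delta(\lambda)\sum_{j=1}^n \tau_j (y_j - \xi) + \sum_{j=1}^n (y_j - \xi)^2}{2\sigma^2\{1 - \delta^2(\lambda)\}}. \tag{5}$$

Obviously, the posterior distribution of $\tau_j$ is

$$\tau_j \mid Y_j = y_j \sim TN(\mu_{\tau_j}, \sigma_\tau^2)\,I\{\tau_j > 0\}, \tag{6}$$

where $\mu_{\tau_j} = \delta(\lambda)(y_j - \xi)$ and $\sigma_\tau = \sigma\sqrt{1 - \delta^2(\lambda)}$.

Lemma 2. Let $X \sim TN(\mu, \sigma^2)\,I\{a_1 < x < a_2\}$ be a truncated normal distribution with the density

$$f(x \mid \mu, \sigma^2) = \{\Phi(\alpha_2) - \Phi(\alpha_1)\}^{-1}\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2\sigma^2}(x-\mu)^2\Big\}, \qquad a_1 < x < a_2,$$

where $\alpha_i = (a_i - \mu)/\sigma$, $i = 1, 2$. Then

(i) $E(X) = \mu - \sigma\,\dfrac{\phi(\alpha_2) - \phi(\alpha_1)}{\Phi(\alpha_2) - \Phi(\alpha_1)}$.

(ii) $E(X^2) = \mu^2 + \sigma^2 - \sigma^2\,\dfrac{\alpha_2\phi(\alpha_2) - \alpha_1\phi(\alpha_1)}{\Phi(\alpha_2) - \Phi(\alpha_1)} - 2\mu\sigma\,\dfrac{\phi(\alpha_2) - \phi(\alpha_1)}{\Phi(\alpha_2) - \Phi(\alpha_1)}$.

By Lemma 2, we have

$$E(\tau_j \mid y_j) = \mu_{\tau_j} + \frac{\phi(\mu_{\tau_j}/\sigma_\tau)}{\Phi(\mu_{\tau_j}/\sigma_\tau)}\,\sigma_\tau \quad\text{and}\quad E(\tau_j^2 \mid y_j) = \mu_{\tau_j}^2 + \sigma_\tau^2 + \frac{\phi(\mu_{\tau_j}/\sigma_\tau)}{\Phi(\mu_{\tau_j}/\sigma_\tau)}\,\mu_{\tau_j}\sigma_\tau.$$

The ECM algorithm is as follows.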

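E-step: Calculating the conditional expectation of (5) at the $k$th iteration yields

$$\hat s_{1j}^{(k)} = E_{\hat\theta^{(k)}}(\tau_j \mid y_j) = \hat\mu_{\tau_j}^{(k)} + \frac{\phi\{\hat\lambda^{(k)}(y_j - \hat\xi^{(k)})/\hat\sigma^{(k)}\}}{\Phi\{\hat\lambda^{(k)}(y_j - \hat\xi^{(k)})/\hat\sigma^{(k)}\}}\,\hat\sigma_\tau^{(k)},$$

$$\hat s_{2j}^{(k)} = E_{\hat\theta^{(k)}}(\tau_j^2 \mid y_j) = \hat\mu_{\tau_j}^{(k)2} + \hat\sigma_\tau^{(k)2} + \frac{\phi\{\hat\lambda^{(k)}(y_j - \hat\xi^{(k)})/\hat\sigma^{(k)}\}}{\Phi\{\hat\lambda^{(k)}(y_j - \hat\xi^{(k)})/\hat\sigma^{(k)}\}}\,\hat\mu_{\tau_j}^{(k)}\hat\sigma_\tau^{(k)},$$

where $\hat\mu_{\tau_j}^{(k)}$ and $\hat\sigma_\tau^{(k)}$ are $\mu_{\tau_j}$ and $\sigma_\tau$ in (6) with $\xi$, $\sigma$ and $\lambda$ replaced by $\hat\xi^{(k)}$, $\hat\sigma^{(k)}$ and $\hat\lambda^{(k)}$, respectively.

CM-steps:

CM-step 1: Update $\hat\xi^{(k)}$ by

$$\hat\xi^{(k+1)} = \frac{1}{n}\Big\{\sum_{j=1}^n y_j - \delta(\hat\lambda^{(k)})\sum_{j=1}^n \hat s_{1j}^{(k)}\Big\}.$$

CM-step 2: Update $\hat\sigma^{2(k)}$ by

$$\hat\sigma^{2(k+1)} = \frac{\sum_{j=1}^n \hat s_{2j}^{(k)} - 2\delta(\hat\lambda^{(k)})\sum_{j=1}^n (y_j - \hat\xi^{(k+1)})\,\hat s_{1j}^{(k)} + \sum_{j=1}^n (y_j - \hat\xi^{(k+1)})^2}{2n\{1 - \delta^2(\hat\lambda^{(k)})\}}.$$

CM-step 3: Fix $\xi = \hat\xi^{(k+1)}$ and $\sigma^2 = \hat\sigma^{2(k+1)}$, and obtain $\hat\lambda^{(k+1)}$ as the solution of

$$n\hat\sigma^{2(k+1)}\,\delta(\lambda)\{1 - \delta^2(\lambda)\} + \{1 + \delta^2(\lambda)\}\sum_{j=1}^n (y_j - \hat\xi^{(k+1)})\,\hat s_{1j}^{(k)} - \delta(\lambda)\sum_{j=1}^n \hat s_{2j}^{(k)} - \delta(\lambda)\sum_{j=1}^n (y_j - \hat\xi^{(k+1)})^2 = 0.$$

For the ECME algorithm, the E-step and the first two CM-steps are the same as in ECM, while CM-step 3 of ECM is modified as the following CML-step.

CML-step: Update $\hat\lambda^{(k)}$ by optimizing the constrained log-likelihood function, i.e.,

$$\hat\lambda^{(k+1)} = \arg\max_{\lambda}\sum_{j=1}^n \log\Phi\Big\{\lambda\,\frac{y_j - \hat\xi^{(k+1)}}{\hat\sigma^{(k+1)}}\Big\}.$$

The maximization in the CML-step requires a one-dimensional search, which can be easily solved by the function optim embedded in the statistical package R. As noted by Liu and Rubin (1994), the ECME has a faster convergence rate than the ECM algorithm.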
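The ECM iteration above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`ecm_step`, `fit_sn`, `loglik`) are hypothetical, the starting values are a crude choice rather than the method of moments estimators in (3), and CM-step 3 is solved by bisection on $\delta \in (-1, 1)$, which is one simple way to find a root of the estimating equation.

```python
import math
import random


def log_phi(x):
    """Log of the standard normal density."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF; erfc keeps accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def loglik(y, xi, sigma2, lam):
    """Observed-data log-likelihood under the skew normal density (1)."""
    sigma = math.sqrt(sigma2)
    total = 0.0
    for v in y:
        z = (v - xi) / sigma
        total += math.log(2.0 / sigma) + log_phi(z) + math.log(max(Phi(lam * z), 1e-300))
    return total


def ecm_step(y, xi, sigma2, lam):
    """One ECM iteration: E-step followed by CM-steps 1 to 3."""
    n = len(y)
    sigma = math.sqrt(sigma2)
    delta = lam / math.sqrt(1.0 + lam * lam)
    sig_tau = sigma * math.sqrt(1.0 - delta * delta)
    s1, s2 = [], []
    for v in y:  # E-step: first two conditional moments of tau_j given y_j
        mu = delta * (v - xi)
        a = lam * (v - xi) / sigma  # equals mu / sig_tau
        ratio = math.exp(log_phi(a)) / max(Phi(a), 1e-300)
        s1.append(mu + ratio * sig_tau)
        s2.append(mu * mu + sig_tau * sig_tau + ratio * mu * sig_tau)
    # CM-step 1: location update
    xi_new = (sum(y) - delta * sum(s1)) / n
    # CM-step 2: scale update, using the new location
    r = [v - xi_new for v in y]
    S2 = sum(s2)
    S1y = sum(ri * si for ri, si in zip(r, s1))
    Syy = sum(ri * ri for ri in r)
    sigma2_new = (S2 - 2.0 * delta * S1y + Syy) / (2.0 * n * (1.0 - delta * delta))

    # CM-step 3: left-hand side of the estimating equation, as a function of delta
    def h(d):
        return n * sigma2_new * d * (1.0 - d * d) + (1.0 + d * d) * S1y - d * (S2 + Syy)

    lo, hi = -1.0, 1.0  # h(-1) >= 0 >= h(1), so a root lies in between
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    d_new = min(max(0.5 * (lo + hi), -1.0 + 1e-12), 1.0 - 1e-12)
    return xi_new, sigma2_new, d_new / math.sqrt(1.0 - d_new * d_new)


def fit_sn(y, n_iter=200):
    """Run ECM from crude starting values (sample mean, sample variance, lambda = 1)."""
    m = sum(y) / len(y)
    theta = (m, sum((v - m) ** 2 for v in y) / len(y), 1.0)
    history = [loglik(y, *theta)]
    for _ in range(n_iter):
        theta = ecm_step(y, *theta)
        history.append(loglik(y, *theta))
    return theta, history
```

Calling `fit_sn(y)` on a list of observations returns the final $(\hat\xi, \hat\sigma^2, \hat\lambda)$ together with the log-likelihood trace, which can be inspected to check convergence. An ECME variant would replace the bisection by a one-dimensional maximization of $\sum_j \log\Phi\{\lambda (y_j - \hat\xi)/\hat\sigma\}$.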

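Lemma 3. If $Z \sim SN(0, 1, \lambda)$, then

(i) $E\Big\{\dfrac{\phi(\lambda Z)}{\Phi(\lambda Z)}\Big\} = \sqrt{\dfrac{2}{\pi}}\,\dfrac{1}{\sqrt{1+\lambda^2}}$.

(ii) $E\Big\{Z^{2k+1}\,\dfrac{\phi(\lambda Z)}{\Phi(\lambda Z)}\Big\} = 0$, $k = 0, 1, 2, \ldots$

(iii) $E\Big\{Z^{2}\,\dfrac{\phi(\lambda Z)}{\Phi(\lambda Z)}\Big\} = \sqrt{\dfrac{2}{\pi}}\,\dfrac{1}{(1+\lambda^2)^{3/2}}$.

The method of moments estimators in (3) can provide good initial values. Applying Lemma 3, the Fisher information $I(\xi, \sigma, \lambda)$ can be easily obtained. The results are shown in Azzalini (1985, p.175). The standard errors of the ML estimates can be computed by taking the square root of the corresponding diagonal elements of $I^{-1}(\hat\xi, \hat\sigma, \hat\lambda)$.

3. The Skew Normal Mixtures

3.1. The model

We consider a finite mixture model in which a set of independent data $Y_1, \ldots, Y_n$ are from a $g$-component mixture of skew normal densities

$$f(y_j \mid \Theta) = \sum_{i=1}^{g} \omega_i\,\psi(y_j \mid \xi_i, \sigma_i^2, \lambda_i), \tag{7}$$

where $\omega = (\omega_1, \ldots, \omega_g)$ are the mixing probabilities, constrained to be nonnegative and sum to unity, and $\Theta = (\theta_1, \ldots, \theta_g)$ with $\theta_i = (\omega_i, \xi_i, \sigma_i^2, \lambda_i)$ being the specific parameters for component $i$. We introduce a set of latent component-indicators $Z_j = (Z_{1j}, \ldots, Z_{gj})$, $j = 1, \ldots, n$, whose values are a set of binary variables with

$$Z_{kj} = \begin{cases} 1 & \text{if } Y_j \text{ belongs to group } k, \\ 0 & \text{otherwise}, \end{cases}$$

and $\sum_{i=1}^{g} Z_{ij} = 1$. Given the mixing probabilities $\omega$, the component-indicators $Z_1, \ldots, Z_n$ are independent, with multinomial densities

$$f(z_j) = \omega_1^{z_{1j}}\,\omega_2^{z_{2j}} \cdots (1 - \omega_1 - \cdots - \omega_{g-1})^{z_{gj}}. \tag{8}$$

We write $Z_j \sim M(1; \omega_1, \ldots, \omega_g)$ to denote $Z_j$ with density (8). From (4), a hierarchical model for skew normal mixtures can be written as

$$\begin{aligned} Y_j \mid \tau_j, Z_{ij} = 1 &\sim N\big(\xi_i + \delta(\lambda_i)\,\tau_j,\ \{1 - \delta^2(\lambda_i)\}\,\sigma_i^2\big), \\ \tau_j \mid Z_{ij} = 1 &\sim TN(0, \sigma_i^2)\,I\{\tau_j > 0\}, \\ Z_j &\sim M(1; \omega_1, \ldots, \omega_g), \qquad j = 1, \ldots, n. \end{aligned} \tag{9}$$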
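The hierarchical representation (9) doubles as a recipe for simulating from the mixture (7): draw the component label, then the half-normal latent variable, then the observation. The sketch below illustrates this in standard-library Python. The function names and the parameter values in the usage note are hypothetical, chosen only for illustration.

```python
import math
import random


def sn_pdf(y, xi, sigma2, lam):
    """Skew normal density (1): (2 / sigma) * phi(z) * Phi(lam * z)."""
    sigma = math.sqrt(sigma2)
    z = (y - xi) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-lam * z / math.sqrt(2.0))
    return 2.0 / sigma * phi * Phi


def snmix_pdf(y, omega, xi, sigma2, lam):
    """Mixture density (7) evaluated at a single point y."""
    return sum(w * sn_pdf(y, x, s, l) for w, x, s, l in zip(omega, xi, sigma2, lam))


def snmix_sample(n, omega, xi, sigma2, lam, rng):
    """Draw n observations through the hierarchy (9)."""
    out = []
    for _ in range(n):
        # Z_j ~ M(1; omega_1, ..., omega_g): pick the component label
        i = rng.choices(range(len(omega)), weights=omega)[0]
        sigma = math.sqrt(sigma2[i])
        delta = lam[i] / math.sqrt(1.0 + lam[i] ** 2)
        # tau_j | Z_ij = 1 ~ TN(0, sigma_i^2) I{tau_j > 0}, i.e. a half-normal draw
        tau = abs(rng.gauss(0.0, sigma))
        # Y_j | tau_j, Z_ij = 1 ~ N(xi_i + delta_i * tau_j, (1 - delta_i^2) * sigma_i^2)
        out.append(rng.gauss(xi[i] + delta * tau, math.sqrt(1.0 - delta ** 2) * sigma))
    return out
```

For example, `snmix_sample(1000, [0.4, 0.6], [0.0, 5.0], [1.0, 4.0], [3.0, -2.0], random.Random(1))` draws from a two-component mixture whose first component is skewed to the right and whose second is skewed to the left. A histogram of the draws can be compared against `snmix_pdf` on a grid.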

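3.2. Maximum likelihood estimation

As in (6), we have

$$\tau_j \mid Y_j = y_j, Z_{ij} = 1 \sim TN(\mu_{\tau_{ij}}, \sigma_{\tau_i}^2)\,I\{\tau_j > 0\}, \qquad \mu_{\tau_{ij}} = \delta(\lambda_i)(y_j - \xi_i), \quad \sigma_{\tau_i} = \sigma_i\sqrt{1 - \delta^2(\lambda_i)}. \tag{10}$$

From (9), the complete-data log-likelihood function is

$$\ell_c(\Theta) = \sum_{j=1}^{n}\sum_{i=1}^{g} Z_{ij}\Big[\log\omega_i - \log\sigma_i^2 - \frac{1}{2}\log\{1 - \delta^2(\lambda_i)\} - \frac{\tau_j^2 - 2\delta(\lambda_i)\,\tau_j (y_j - \xi_i) + (y_j - \xi_i)^2}{2\sigma_i^2\{1 - \delta^2(\lambda_i)\}}\Big]. \tag{11}$$

Letting $\hat z_{ij}^{(k)} = E_{\hat\Theta^{(k)}}(Z_{ij} \mid Y)$, $\hat s_{1ij}^{(k)} = E_{\hat\Theta^{(k)}}(Z_{ij}\tau_j \mid Y)$ and $\hat s_{2ij}^{(k)} = E_{\hat\Theta^{(k)}}(Z_{ij}\tau_j^2 \mid Y)$ be the necessary conditional expectations of (11), we obtain

$$\hat z_{ij}^{(k)} = \frac{\hat\omega_i^{(k)}\,\psi(y_j \mid \hat\xi_i^{(k)}, \hat\sigma_i^{2(k)}, \hat\lambda_i^{(k)})}{\sum_{m=1}^{g}\hat\omega_m^{(k)}\,\psi(y_j \mid \hat\xi_m^{(k)}, \hat\sigma_m^{2(k)}, \hat\lambda_m^{(k)})}, \tag{12}$$

$$\hat s_{1ij}^{(k)} = \hat z_{ij}^{(k)}\Big[\hat\mu_{\tau_{ij}}^{(k)} + \frac{\phi\{\hat\lambda_i^{(k)}(y_j - \hat\xi_i^{(k)})/\hat\sigma_i^{(k)}\}}{\Phi\{\hat\lambda_i^{(k)}(y_j - \hat\xi_i^{(k)})/\hat\sigma_i^{(k)}\}}\,\hat\sigma_{\tau_i}^{(k)}\Big], \tag{13}$$

$$\hat s_{2ij}^{(k)} = \hat z_{ij}^{(k)}\Big[\hat\mu_{\tau_{ij}}^{(k)2} + \hat\sigma_{\tau_i}^{(k)2} + \frac{\phi\{\hat\lambda_i^{(k)}(y_j - \hat\xi_i^{(k)})/\hat\sigma_i^{(k)}\}}{\Phi\{\hat\lambda_i^{(k)}(y_j - \hat\xi_i^{(k)})/\hat\sigma_i^{(k)}\}}\,\hat\mu_{\tau_{ij}}^{(k)}\hat\sigma_{\tau_i}^{(k)}\Big], \tag{14}$$

where $\hat\mu_{\tau_{ij}}^{(k)}$ and $\hat\sigma_{\tau_i}^{(k)}$ are $\mu_{\tau_{ij}}$ and $\sigma_{\tau_i}$ in (10) with $\xi_i$, $\sigma_i$ and $\lambda_i$ replaced by $\hat\xi_i^{(k)}$, $\hat\sigma_i^{(k)}$ and $\hat\lambda_i^{(k)}$, respectively.

The ECM algorithm is as follows.

E-step: Given $\Theta = \hat\Theta^{(k)}$, compute $\hat z_{ij}^{(k)}$, $\hat s_{1ij}^{(k)}$ and $\hat s_{2ij}^{(k)}$ for $i = 1, \ldots, g$ and $j = 1, \ldots, n$, using (12), (13) and (14).

CM-step 1: Calculate

$$\hat\omega_i^{(k+1)} = n^{-1}\sum_{j=1}^{n}\hat z_{ij}^{(k)}.$$

CM-step 2: Calculate

$$\hat\xi_i^{(k+1)} = \frac{\sum_{j=1}^{n}\hat z_{ij}^{(k)}\,y_j - \delta(\hat\lambda_i^{(k)})\sum_{j=1}^{n}\hat s_{1ij}^{(k)}}{\sum_{j=1}^{n}\hat z_{ij}^{(k)}}.$$

CM-step 3: Calculate

$$\hat\sigma_i^{2(k+1)} = \frac{\sum_{j=1}^{n}\hat s_{2ij}^{(k)} - 2\delta(\hat\lambda_i^{(k)})\sum_{j=1}^{n}\hat s_{1ij}^{(k)}(y_j - \hat\xi_i^{(k+1)}) + \sum_{j=1}^{n}\hat z_{ij}^{(k)}(y_j - \hat\xi_i^{(k+1)})^2}{2\{1 - \delta^2(\hat\lambda_i^{(k)})\}\sum_{j=1}^{n}\hat z_{ij}^{(k)}}.$$

CM-step 4: Fix $\xi_i = \hat\xi_i^{(k+1)}$ and $\sigma_i^2 = \hat\sigma_i^{2(k+1)}$, and obtain $\hat\lambda_i^{(k+1)}$ ($i = 1, \ldots, g$) as the solution of

$$\hat\sigma_i^{2(k+1)}\,\delta(\lambda_i)\{1 - \delta^2(\lambda_i)\}\sum_{j=1}^{n}\hat z_{ij}^{(k)} + \{1 + \delta^2(\lambda_i)\}\sum_{j=1}^{n}(y_j - \hat\xi_i^{(k+1)})\,\hat s_{1ij}^{(k)} - \delta(\lambda_i)\sum_{j=1}^{n}\hat s_{2ij}^{(k)} - \delta(\lambda_i)\sum_{j=1}^{n}\hat z_{ij}^{(k)}(y_j - \hat\xi_i^{(k+1)})^2 = 0.$$

ECME is identical to ECM except for CM-step 4 of ECM, which can be modified by the following CML-step.

CML-step: Let $\lambda = (\lambda_1, \ldots, \lambda_g)$, and update $\hat\lambda^{(k)}$ to

$$\hat\lambda^{(k+1)} = \arg\max_{\lambda_1, \ldots, \lambda_g}\sum_{j=1}^{n}\log\Big\{\sum_{i=1}^{g}\hat\omega_i^{(k+1)}\,\psi(y_j \mid \hat\xi_i^{(k+1)}, \hat\sigma_i^{2(k+1)}, \lambda_i)\Big\}.$$

We remark here that if the skewness parameters $\lambda_1, \ldots, \lambda_g$ are assumed to be identical, we use ECME since it is more efficient than ECM. Otherwise, the CML-step becomes a non-trivial high-dimensional optimization problem, while using CM-step 4 can avoid the complication.

3.3. Standard errors

We let $I_o(\Theta \mid y) = -\partial^2 \ell(\Theta \mid Y)/\partial\Theta\,\partial\Theta^T$ be the observed information matrix for the mixture model (7). Under some regularity conditions, the covariance matrix of the ML estimates $\hat\Theta$ can be approximated by the inverse of $I_o(\hat\Theta \mid y)$. We follow Basford, Greenway, McLachlan and Peel (1997) to evaluate

$$I_o(\hat\Theta \mid y) = \sum_{j=1}^{n}\hat s_j\,\hat s_j^T, \tag{15}$$

where $\hat s_j = \partial\log\big\{\sum_{i=1}^{g}\omega_i\,\psi(y_j \mid \xi_i, \sigma_i^2, \lambda_i)\big\}/\partial\Theta\,\big|_{\Theta = \hat\Theta}$.

Corresponding to the vector of all $4g - 1$ unknown parameters in $\Theta$, we partition $\hat s_j$ ($j = 1, \ldots, n$) as

$$\hat s_j = (\hat s_{j,\omega_1}, \ldots, \hat s_{j,\omega_{g-1}}, \hat s_{j,\xi_1}, \ldots, \hat s_{j,\xi_g}, \hat s_{j,\sigma_1}, \ldots, \hat s_{j,\sigma_g}, \hat s_{j,\lambda_1}, \ldots, \hat s_{j,\lambda_g})^T.$$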

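The elements of $\hat s_j$ are given by

$$\hat s_{j,\omega_r} = \frac{\psi(y_j \mid \hat\xi_r, \hat\sigma_r^2, \hat\lambda_r) - \psi(y_j \mid \hat\xi_g, \hat\sigma_g^2, \hat\lambda_g)}{\sum_{i=1}^{g}\hat\omega_i\,\psi(y_j \mid \hat\xi_i, \hat\sigma_i^2, \hat\lambda_i)} \qquad (r = 1, \ldots, g-1),$$

$$\hat s_{j,\xi_r} = \frac{2\hat\omega_r\,\phi\big(\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big)\Big\{\frac{y_j - \hat\xi_r}{\hat\sigma_r}\,\Phi\big(\hat\lambda_r\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big) - \hat\lambda_r\,\phi\big(\hat\lambda_r\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big)\Big\}}{\hat\sigma_r^2\sum_{i=1}^{g}\hat\omega_i\,\psi(y_j \mid \hat\xi_i, \hat\sigma_i^2, \hat\lambda_i)} \qquad (r = 1, \ldots, g),$$

$$\hat s_{j,\sigma_r} = \frac{\hat\omega_r\,\psi(y_j \mid \hat\xi_r, \hat\sigma_r^2, \hat\lambda_r)\Big\{-\frac{1}{\hat\sigma_r} + \frac{(y_j - \hat\xi_r)^2}{\hat\sigma_r^3}\Big\} - \frac{2\hat\omega_r\hat\lambda_r (y_j - \hat\xi_r)}{\hat\sigma_r^3}\,\phi\big(\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big)\,\phi\big(\hat\lambda_r\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big)}{\sum_{i=1}^{g}\hat\omega_i\,\psi(y_j \mid \hat\xi_i, \hat\sigma_i^2, \hat\lambda_i)} \qquad (r = 1, \ldots, g),$$

$$\hat s_{j,\lambda_r} = \frac{\hat\omega_r\,\psi(y_j \mid \hat\xi_r, \hat\sigma_r^2, \hat\lambda_r)\,\frac{y_j - \hat\xi_r}{\hat\sigma_r}\,\phi\big(\hat\lambda_r\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big)\Big/\Phi\big(\hat\lambda_r\frac{y_j - \hat\xi_r}{\hat\sigma_r}\big)}{\sum_{i=1}^{g}\hat\omega_i\,\psi(y_j \mid \hat\xi_i, \hat\sigma_i^2, \hat\lambda_i)} \qquad (r = 1, \ldots, g).$$

The information-based approximation (15) is asymptotically applicable. However, it may not be reliable unless the sample size is large. It is common in practice to perform the bootstrap approach (Efron and Tibshirani (1986)) for obtaining an alternative estimate of the covariance matrix of $\hat\Theta$. The bootstrap method may provide more accurate standard error estimates than (15), but it requires enormous computing power.

3.4. Notes on implementation

In the mixture context, the log-likelihood function may have multiple modes. A convenient way to circumvent such limitations is to try several EM iterations with a variety of starting values that are representative of the parameter space. If there exist several modes, one can find the global mode by comparing their relative masses and log-likelihood values. In particular, running the algorithm with different starting values can be used to assess the stability of the resulting estimates.

Although the EM-type algorithm tends to be robust with respect to the choice of the starting values, it may not converge when initial values are far from the optimum. The following outlines a simple procedure to achieve a set of reasonable initial values.

(a) Randomly generate a set of $B$ bootstrap samples $y_1^*, \ldots, y_B^*$ from the original data $y$.

(b) For each bootstrap sample, partition it into $g$ components using the K-means clustering algorithm and compute the initial values $\hat\omega_i^{(0)} = \sum_{j=1}^{n} Z_{ij}^{(0)}/n$.

(c) For each partitioned component, compute the initial values $\hat\xi_i^{(0)}$, $\hat\sigma_i^{2(0)}$ and $\hat\delta(\lambda_i^{(0)})$ using the method of moments as in (3).

4. Bayesian Modelling for Skew Normal Mixtures

4.1. The prior distributions and posterior MCMC sampling

We consider a Bayesian approach to (7) in which $\Theta$ is regarded as random with a prior distribution that reflects our degree of belief in different values of these quantities. Since fully non-informative prior distributions are not permissible in the mixture context, the prior distributions chosen are weakly informative subject to vague prior knowledge, and this avoids nonintegrable posterior distributions. The prior distributions for model (7) are taken to be

$$\begin{aligned} \xi_i &\sim N(\eta, \kappa^{-1}) \quad (i = 1, \ldots, g), \\ \sigma_i^{-2} \mid \beta &\sim \Gamma(\alpha, \beta) \quad (i = 1, \ldots, g), \\ \beta &\sim \Gamma(\nu_1, \nu_2), \\ \delta(\lambda_i) &\sim U(-1, 1) \quad (i = 1, \ldots, g), \\ \omega &\sim D(h, \ldots, h), \end{aligned}$$

where $\beta$ is an unknown hyperparameter, $(\eta, \kappa, \alpha, \nu_1, \nu_2, h)$ are known data-dependent constants, $\Gamma(\alpha, \beta)$ denotes the gamma distribution with mean $\alpha/\beta$ and variance $\alpha/\beta^2$, $U(-1, 1)$ denotes the continuous uniform distribution on the interval $[-1, 1]$, and $D(h, \ldots, h)$ stands for the Dirichlet distribution with the density function

$$\frac{\Gamma(gh)}{\Gamma(h)^g}\,\omega_1^{h-1}\cdots\omega_{g-1}^{h-1}\Big(1 - \sum_{i=1}^{g-1}\omega_i\Big)^{h-1}.$$

For the values of $(\eta, \kappa, \alpha, \nu_1, \nu_2, h)$, we follow Richardson and Green (1997) in letting $\eta$ equal the midpoint of the observed interval and $\kappa^{-1} = R^2$, where $R$ is the range of the interval, and in setting $\alpha = 2$, $\nu_1 = 0.2$, $\nu_2 = 1\alpha/(\alpha R^2)$ and $h = 2$.

Given $\Theta = \Theta^{(k)}$, the MCMC sampling scheme at the $(k+1)$st iteration consists of the following steps.

Step 1. Sample $Z_j^{(k+1)}$ ($j = 1, \ldots, n$) from $M(1; \omega_1^*, \ldots, \omega_g^*)$, where

$$\omega_i^* = \frac{\omega_i^{(k)}\,\psi(y_j \mid \xi_i^{(k)}, \sigma_i^{2(k)}, \lambda_i^{(k)})}{\sum_{m=1}^{g}\omega_m^{(k)}\,\psi(y_j \mid \xi_m^{(k)}, \sigma_m^{2(k)}, \lambda_m^{(k)})} \qquad (i = 1, \ldots, g).$$

Step 2. Given $Z_{ij}^{(k+1)} = 1$, sample $\tau_j^{(k+1)}$ ($j = 1, \ldots, n$) from

$$TN\big(\delta(\lambda_i^{(k)})(y_j - \xi_i^{(k)}),\ \sigma_i^{2(k)}\{1 - \delta^2(\lambda_i^{(k)})\}\big)\,I\{\tau_j > 0\}.$$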

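Step 3. Sample $\beta^{(k+1)}$ from $\Gamma\big(\nu_1 + g\alpha,\ \nu_2 + \sum_{i=1}^{g}\sigma_i^{-2(k)}\big)$.

Step 4. Sample $\omega^{(k+1)}$ from $D(h + n_1^{(k+1)}, \ldots, h + n_g^{(k+1)})$, where $n_i^{(k+1)} = \sum_{j=1}^{n} Z_{ij}^{(k+1)}$.

Step 5. Given $Z_{ij}^{(k+1)} = 1$, sample $\xi_i^{(k+1)}$ from $N(\mu_{\xi_i}^{(k+1)}, \sigma_{\xi_i}^{2(k+1)})$, where

$$\mu_{\xi_i}^{(k+1)} = \frac{\sum_{j=1}^{n} Z_{ij}^{(k+1)}\,y_j - \delta(\lambda_i^{(k)})\sum_{j=1}^{n} Z_{ij}^{(k+1)}\,\tau_j^{(k+1)} + \kappa\eta\,\sigma_i^{2(k)}\{1 - \delta^2(\lambda_i^{(k)})\}}{n_i^{(k+1)} + \kappa\,\sigma_i^{2(k)}\{1 - \delta^2(\lambda_i^{(k)})\}}, \qquad \sigma_{\xi_i}^{2(k+1)} = \Big[\frac{n_i^{(k+1)}}{\sigma_i^{2(k)}\{1 - \delta^2(\lambda_i^{(k)})\}} + \kappa\Big]^{-1}.$$

Step 6. Given $Z_{ij}^{(k+1)} = 1$, sample $\sigma_i^{-2(k+1)}$ from $\Gamma(\alpha + n_i^{(k+1)},\ \beta^{(k+1)} + b_i)$, where

$$b_i = \frac{1}{2\{1 - \delta^2(\lambda_i^{(k)})\}}\Big\{\sum_{j=1}^{n} Z_{ij}^{(k+1)}\,\tau_j^{2(k+1)} - 2\delta(\lambda_i^{(k)})\sum_{j=1}^{n} Z_{ij}^{(k+1)}\,\tau_j^{(k+1)}(y_j - \xi_i^{(k+1)}) + \sum_{j=1}^{n} Z_{ij}^{(k+1)}(y_j - \xi_i^{(k+1)})^2\Big\}.$$

Step 7. Sample $\delta^{(k+1)} = (\delta(\lambda_1^{(k+1)}), \ldots, \delta(\lambda_g^{(k+1)}))$ via the Metropolis-Hastings (M-H) algorithm (Hastings (1970)) from

$$f(\delta) \propto \prod_{i=1}^{g}\prod_{j=1}^{n}\Big[\{1 - \delta^2(\lambda_i)\}^{-1/2}\exp\Big\{-\frac{\tau_j^{2(k+1)} - 2\delta(\lambda_i)\,\tau_j^{(k+1)}(y_j - \xi_i^{(k+1)}) + (y_j - \xi_i^{(k+1)})^2}{2\sigma_i^{2(k+1)}\{1 - \delta^2(\lambda_i)\}}\Big\}\Big]^{Z_{ij}^{(k+1)}}.$$

To elaborate on Step 7 of the above algorithm, we transform $\delta(\lambda_i)$ to $\delta^*(\lambda_i) = \log[\{1 + \delta(\lambda_i)\}/\{1 - \delta(\lambda_i)\}]$ and then apply the M-H algorithm to

$$g(\delta^*) = f\big(\delta(\delta^*)\big)\prod_{i=1}^{g} J_{\delta^*(\lambda_i)},$$

where $\delta^* = (\delta^*(\lambda_1), \ldots, \delta^*(\lambda_g))$, and $J_{\delta^*(\lambda_i)} = 2e^{\delta^*(\lambda_i)}/\{1 + e^{\delta^*(\lambda_i)}\}^2$ is the Jacobian of the transformation from $\delta(\lambda_i)$ to $\delta^*(\lambda_i)$. A $g$-dimensional multivariate normal distribution with mean $\delta^{*(k)}$ and covariance matrix $c^2\Sigma_{\delta^*}^{(k)}$ is chosen as the proposal distribution, where the scale $c \approx 2.4/\sqrt{g}$, as suggested in Gelman, Roberts and Gilks (1996). The value of $\Sigma_{\delta^*}^{(k)}$ can be estimated by the inverted sample information matrix given $y$ and $\Theta = \Theta^{(k)}$. Having obtained $\delta^*$ from the M-H algorithm, we transform it back to $\delta$ by

$$\delta(\lambda_i) = \frac{e^{\delta^*(\lambda_i)} - 1}{e^{\delta^*(\lambda_i)} + 1} \qquad (i = 1, \ldots, g),$$

and then transform $\delta(\lambda_i)$ back to $\lambda_i$ by $\lambda_i = \delta(\lambda_i)/\sqrt{1 - \delta^2(\lambda_i)}$.

To avoid the label-switching problem and slow stabilization of the Markov chain, our initial values $\Theta^{(0)}$ are chosen to be dispersed around the ML estimates with the restriction $\xi_1 < \cdots < \xi_g$.

4.2. Convergence assessment using multiple chains

Before conducting inference using MCMC samples, the output should be analyzed to determine the required run length of the MCMC sequences. Gelman and Rubin (1992) proposed a convergence diagnostic $\hat R$, the potential scale reduction factor (PSRF), obtained by running multiple chains with overdispersed starting values. However, the approach is essentially univariate. Recently, Brooks and Gelman (1998) provided a generalization of Gelman and Rubin's method that considers several parameters simultaneously.

Suppose there are $I$ independent parallel chains and the length of each chain is $2n$. Let $\theta$ denote a $p \times 1$ vector of parameters and $\theta_i = (\theta_{i1}, \ldots, \theta_{in})$ denote the simulation sample of the $i$th chain ($i = 1, \ldots, I$), after discarding the first $n$ iterations. Brooks and Gelman (1998) stated that the posterior variance-covariance matrix of $\theta$ can be estimated by

$$\hat V = \frac{n-1}{n}\,W + \Big(1 + \frac{1}{I}\Big)\frac{B}{n},$$

where $W$ and $B/n$ denote the within- and between-sequence sample covariance matrix estimates of $\theta_1, \ldots, \theta_I$, respectively. They then proposed the multivariate potential scale reduction factor (MPSRF),

$$\hat R^p = \frac{n-1}{n} + \frac{I+1}{I}\,\lambda_1,$$

where $\lambda_1$ is the largest eigenvalue of $W^{-1}B/n$. Note that the multivariate measure $\hat R^p$ bounds above the univariate $\hat R$ values over all $p$ variables. If the $I$ parallel chains are mixing well within the model, $\hat R^p$ will decline to 1 for reasonably large $n$. Meanwhile, if the $I$ parallel chains are essentially overlapping, then the determinants of $\hat V$ and $W$ should stabilize over the iterations and be sufficiently close.

5. Examples

5.1. The enzyme data

We first carry out our methodology for the enzyme data set with $n = 245$ observations.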
The data were first analyzed by Bechtel, Bonaïti-Pellié, Poisson, Magnette and Bechtel (1993), who identified a mixture of skew distributions by the maximum likelihood techniques of Maclean, Morton, Elston and Yee (1976). Richardson and Green (1997) provided the reversible jump MCMC approach for the univariate normal mixture models with an unknown number of components and identified the most possible values of $g$ to be between 3 and 5.

We fit the following two-component SNMIX model to the data:

$$f(y) = \omega\,\psi(y \mid \xi_1, \sigma_1^2, \lambda_1) + (1 - \omega)\,\psi(y \mid \xi_2, \sigma_2^2, \lambda_2). \tag{16}$$

The ECM algorithm was run with 1 starting values and was checked for convergence. All EM iterations under different starting values converge to the same stationary point of the log-likelihood. The resulting ML estimates and the corresponding standard errors are listed in Table 1. In this table, we found that the standard error for $\lambda_2$ is relatively large. This is due to the fact that the log-likelihood function can be fairly flat near the ML estimates of the shape parameter of the skew normal components. We have shown this by plotting the profile log-likelihood function of $(\lambda_1, \lambda_2)$ in Figure 1.

Table 1. Estimated parameter values and the corresponding standard errors (SE) for model (16) with the enzyme data. (Columns: $\omega$, $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, $\lambda_1$, $\lambda_2$; rows: Estimate, SE.)

Figure 1. Plot of the profile log-likelihood for $\lambda_1$ and $\lambda_2$ for the enzyme data.

For comparison purposes, we also fit a NORMIX model ($\lambda_1 = \lambda_2 = 0$) with $g = 2$-$5$ components. The log-likelihood maximum and two information-based criteria, AIC (Akaike (1973)) and BIC (Schwarz (1978)), are displayed in the third to fifth columns of Table 2. Apparently, the fitted two-component SNMIX model is superior to the fitted NORMIX model, since it has the largest log-likelihood and the smallest AIC and BIC. The last two columns of this table present the required number of EM iterations and the associated rate of convergence, $r$, which is assessed in practice as

$$r = \lim_{t\to\infty}\frac{\|\theta^{(t+1)} - \theta^{(t)}\|}{\|\theta^{(t)} - \theta^{(t-1)}\|}.$$

A relative tolerance of $10^{-8}$ for the estimates of all parameters in the model was used as the convergence criterion. We note that the reported rate of convergence depends on the fraction of missing information, and a greater value of $r$ implies slower convergence; see Meng (1994). In this example, we also note that the estimating procedure for fitting the SNMIX model does not converge properly for $g \ge 3$.

Table 2. Comparison of log-likelihood maximum, AIC and BIC for fitted SNMIX and NORMIX models using the enzyme data. The number of parameters and the rate of convergence are denoted by $m$ and $r$, respectively. (Columns: Model, $g$, $m$, log-likelihood, AIC, BIC, Iterations, $r$.)

AIC $= -2(\text{log-likelihood} - m)$; BIC $= -2\{\text{log-likelihood} - 0.5\,m\log(n)\}$.

5.2. The faithful data

As another example, we consider the Old Faithful Geyser data taken from Silverman (1986). It consists of 272 eruption lengths (in minutes) of the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. The data appear to be bimodal with asymmetrical components. We fit a two-component SNMIX model (16) by analogy with the previous example. The ML estimates and the corresponding standard errors are reported in the second and third columns of Table 3, respectively.

We carry out an MCMC simulation by running 1, iterations of ten independent parallel chains with different starting values for each chain, over-dispersed around $\pm 3$ standard deviations of the ML estimates. The convergence of the MCMC samplers is monitored by examining $\hat R^p$ values as discussed in Section 4.2. The monitored values of $\hat R^p$ and the determinants of $\hat V$ and $W$ are plotted in Figures 2(a) and 2(b), respectively. By examining both figures, convergence occurs around 4, iterations. Having obtained the remaining converged MCMC simulation samples, we computed the posterior mean, standard deviation, median and 95% posterior interval (2.5% and 97.5% posterior quantiles), which are listed in the 4th-8th columns of Table 3.

Figure 2. (a) Plot of MPSRF, $\hat R^p$; (b) plot of the determinants ($\times 10^{-13}$) of $\hat V$ (solid) and $W$ (dashed). (Horizontal axes: iteration no.; vertical axes: MPSRF and generalized variance.)

Table 3. ML estimation results and MCMC summary statistics for the parameters of model (16) with the faithful data. (Rows: $\omega$, $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, $\lambda_1$, $\lambda_2$; columns: ML Estimate, ML SE, MCMC Mean, SE, Median, 2.5%, 97.5%.)

Figure 3 displays the histograms of the posterior samples of the model parameters. It is evident that the shape of the posterior distribution of $\lambda_1$ is skewed to the right, while the shape of the posterior distribution of $\lambda_2$ is skewed to the left. It is interesting to note that the posterior distributions of the parameters $(\lambda_1, \lambda_2)$, which regulate the skewness, are skewed as well.
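For a single scalar parameter ($p = 1$), the largest eigenvalue $\lambda_1$ of $W^{-1}B/n$ in Section 4.2 is simply the ratio $(B/n)/W$, so the monitored quantity can be computed without any linear algebra. The sketch below illustrates that special case only. It is not the multivariate diagnostic used for Figure 2, and the function name `psrf` is hypothetical.

```python
import random
import statistics


def psrf(chains):
    """Potential scale reduction factor for one scalar parameter.

    `chains` holds I sequences of equal length n (post burn-in draws).
    Returns (n - 1)/n + ((I + 1)/I) * (B/n) / W, the p = 1 case of the MPSRF.
    """
    num_chains = len(chains)
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = statistics.mean([statistics.variance(c) for c in chains])
    # B/n: sample variance of the chain means
    between_over_n = statistics.variance(chain_means)
    return (n - 1) / n + (num_chains + 1) / num_chains * between_over_n / within
```

Values close to 1 suggest that the chains overlap. Chains stuck in different regions inflate the between-chain term and push the factor well above 1. For the full vector $\theta$, the same formula applies with $W$ and $B/n$ as covariance matrices and $\lambda_1$ obtained from an eigenvalue routine.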

Figure 3. Histograms of the posterior sample of the SNMIX parameters ($\omega$, $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, $\lambda_1$, $\lambda_2$) for the faithful data.

Finally, it is interesting to compare the density estimation of the NORMIX and SNMIX fitting results. The ML-fitted NORMIX and SNMIX densities, together with the Bayesian predictive SNMIX density, are superimposed in Figure 4(a). Subsequently, the fitted cumulative distribution functions (CDFs) and the empirical CDF are shown in Figure 4(b). Based on the graphical visualization, the resulting ML-fitted SNMIX density, as well as the Bayesian predictive SNMIX density, is more suitable than the ML-fitted NORMIX density for this data set. Furthermore, the fitted SNMIX CDFs more closely track the empirical CDF than does the fitted NORMIX CDF.

Figure 4. (a) Histogram of the faithful data overlaid with densities based on two fitted two-component SNMIX models (ML and Bayesian) and an ML-fitted two-component NORMIX; (b) empirical CDF of the faithful data overlaid with CDFs based on two fitted two-component SNMIX models (ML and Bayesian) and an ML-fitted two-component NORMIX.

6. Concluding Remarks

In our examples, it is quite appealing that the skew normal mixtures can provide a more appropriate density estimation than normal mixtures, based on information-based criteria and graphical visualization. There are a number of possible extensions of the current work. Mixture modelling using the multivariate skew normal distribution (e.g., Azzalini and Dalla Valle (1996), Sahu, Dey and Branco (2003) and Gupta, González-Farías and Domínguez-Molina (2004)) is the most natural extension and will be reported in a follow-up paper. In addition, it would be a worthwhile task to model the number of components, $g$, and the component parameters, $\Theta$, jointly. For modelling both skewness and long tails in a mixture context, using the skew t distribution for the component densities (e.g., Jones and Faddy (2003), Azzalini and Capitanio (2003) and Lin, Lee and Hsieh (2007)) is a feasible choice and awaits further investigation.

Acknowledgement

We gratefully acknowledge the Chair Co-Editor, an associate editor, and one referee for their valuable comments, which substantially improved the quality of the paper. This research was supported by the National Science Council of Taiwan.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd Int. Symp. on Information Theory (Edited by B. N. Petrov and F. Csaki). Akademiai Kiado, Budapest.

Arnold, B. C., Beaver, R. J., Groeneveld, R. A. and Meeker, W. Q. (1993). The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika 58.

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Statist. 12.

Azzalini, A. (1986). Further results on a class of distributions which includes the normal ones. Statistica 46.

Azzalini, A. and Capitanio, A. (1999). Statistical applications of the multivariate skew-normal distribution. J. Roy. Statist. Soc. Ser. B 61.

Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. Roy. Statist. Soc. Ser. B 65.

Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika 83.

Basford, K. E., Greenway, D. R., McLachlan, G. J. and Peel, D. (1997). Standard errors of fitted means under normal mixture. Comput. Statist. 12.

Bechtel, Y. C., Bonaïti-Pellié, C., Poisson, N., Magnette, J. and Bechtel, P. R. (1993). A population and family study of N-acetyltransferase using caffeine urinary metabolites. Clin. Pharm. Therp. 54.

Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Statist. 7.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39.

Diebolt, J. and Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. J. Roy. Statist. Soc. Ser. B 56.

Efron, B. and Tibshirani, R. (1986). Bootstrap method for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci. 1.

Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90.

Gelman, A., Roberts, G. and Gilks, W. (1996). Efficient Metropolis jumping rules. In Bayesian Statistics 5 (Edited by J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith). Oxford University Press, New York.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statist. Sci. 7.

Genton, M. G. (2004). Skew-Elliptical Distributions and Their Applications. Chapman & Hall, New York.

Gupta, A. K., González-Farías, G. and Domínguez-Molina, J. A. (2004). A multivariate skew normal distribution. J. Multivariate Anal. 89.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57.

Henze, N. (1986). A probabilistic representation of the skew-normal distribution. Scand. J. Statist. 13.

Jones, M. C. and Faddy, M. J. (2003). A skew extension of the t-distribution, with applications. J. Roy. Statist. Soc. Ser. B 65.

Lin, T. I., Lee, J. C. and Hsieh, W. J. (2007). Robust mixture modeling using the skew t distribution. Statist. Comput. 17.

Liu, C. H. and Rubin, D. B. (1994). The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81.

Maclean, C. J., Morton, N. E., Elston, R. C. and Yee, S. (1976). Skewness in commingled distributions. Biometrics 32.

McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Application to Clustering. Marcel Dekker, New York.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.

Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80.

Meng, X. L. (1994). On the global and componentwise rates of convergence of the EM algorithm. Lin. Alg. Applic. 199.

Richardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components. J. R. Statist. Soc. B 59.

Roberts, C. and Geisser, S. (1966). A necessary and sufficient condition for the square of a random variable to be gamma. Biometrika 53.

Sahu, S. K., Dey, D. K. and Branco, M. D. (2003). A new class of multivariate skew distributions with application to Bayesian regression models. Canad. J. Statist. 31.

Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.

Stephens, M. (2000). Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods. Ann. Statist. 28.

Titterington, D. M., Smith, A. F. M. and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. Wiley, New York.

Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan.
E-mail: tilin@amath.nchu.edu.tw

Institute of Statistics and Graduate Institute of Finance, National Chiao Tung University, Hsinchu 300, Taiwan.
E-mail: jclee@stat.nctu.edu.tw

Institute of Statistics, National Chiao Tung University, Hsinchu 300, Taiwan.
E-mail: kelly.st92g@nctu.edu.tw

(Received November 2004; accepted November 2005)
