Maximum likelihood estimation for multivariate skew normal mixture models


Journal of Multivariate Analysis 100 (2009) 257-265


Tsung I. Lin
Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan
E-mail address: tilin@amath.nchu.edu.tw
Journal homepage: www.elsevier.com/locate/jmva

Article history: Received 6 February 2006. Available online 25 April 2008.
AMS 1991 subject classifications: 62F10; 62H10; 62H12
Keywords: EM algorithm; Multivariate truncated normal distributions; Skew normal mixtures; Stochastic representation

Abstract

This paper provides a flexible mixture modeling framework using the multivariate skew normal distribution. A feasible EM algorithm is developed for finding the maximum likelihood estimates of parameters in this context. A general information-based method for obtaining the asymptotic covariance matrix of the maximum likelihood estimators is also presented. The proposed methodology is illustrated with a real example, and the results are compared with those obtained from fitting normal mixtures.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

A finite mixture of distributions, in particular one with normal components, has received considerable attention and is known to be a very powerful tool for modeling an extremely wide variety of random phenomena. Most importantly, mixture modeling has been a favored model-based technique for handling supervised classification and unsupervised clustering problems. There are a number of fairly comprehensive monographs in this area; see, for example, [ ] and the references contained therein.

It is well known that several problems remain in statistical modeling with normal mixture (NORMIX) models. For instance, normality assumptions for component densities can be violated when a data set contains asymmetric outcomes for each component. Moreover, the classical normal mixture model tends to overfit the data since it needs to include additional components to capture possibly excess skewness. To overcome these weaknesses in the fitting of normal mixtures, Lin et al. [17] recently proposed a skew normal mixture (SNMIX) model using the univariate skew normal (SN) distribution (Azzalini [3]) and showed its great flexibility in modeling data with asymmetric behavior.

To allow for modeling real data as appropriately as possible and to remedy unrealistic assumptions in classical normal-based models, pioneering work on multivariate SN distributions was first carried out by Azzalini and Dalla Valle [5] and subsequently generalized by Gupta et al. [14] and Arellano-Valle and Genton [2], among others. Sahu et al. [22] developed a more general class of distributions by introducing skewness in multivariate elliptically symmetric distributions. They pointed out that the multivariate SN distribution in this family is more flexible in adjusting the correlation structure than that proposed by Azzalini and Dalla Valle [5].

The main objective of this work is to introduce a novel mixture modeling approach using the new class of multivariate SN distributions proposed by Sahu et al. [22]. For computational aspects, I develop an effective iterative procedure for obtaining maximum

likelihood (ML) estimates of model parameters via the EM algorithm [7]. Moreover, I provide a simple way of obtaining standard errors of the estimates by inverting the observed information matrix instead of performing the computationally intensive bootstrap method. As an illustration, I apply the proposed method to a real data set and show the advantage of using SNMIX models. Some concluding remarks are given at the end, and technical derivations are collected in the Appendix.

2. Preliminaries

I start by formulating some distributional properties of the multivariate skew normal distribution introduced by Sahu et al. [22]. In addition, I give a stochastic representation which is useful for the construction of the complete data framework. I next review the multivariate truncated normal distribution, where truncation is at arbitrary points, and provide general formulae for computing the corresponding first two moments. These analytical results are useful for the proposed EM algorithm.

2.1. The multivariate skew normal distribution

A random vector $Y$ is said to follow a $p$-dimensional skew normal distribution with a $p \times 1$ location vector $\xi$, a $p \times p$ positive definite scale covariance matrix $\Sigma$ and a $p \times p$ skewness matrix $\Lambda$ if its density function is

$$f(y \mid \xi, \Sigma, \Lambda) = 2^p\, \phi_p(y \mid \xi, \Omega)\, \Phi_p\big(\Lambda^T \Omega^{-1}(y - \xi) \mid \Delta\big), \qquad (1)$$

with $\Omega = \Sigma + \Lambda\Lambda^T$ and $\Delta = (I_p + \Lambda^T \Sigma^{-1} \Lambda)^{-1} = I_p - \Lambda^T \Omega^{-1} \Lambda$, where $I_p$ is a $p \times p$ identity matrix. Moreover, $\phi_p(\cdot \mid \mu, \Sigma)$ and $\Phi_p(\cdot \mid \Sigma)$ denote the probability density function (pdf) of $N_p(\mu, \Sigma)$ and the cumulative distribution function (cdf) of $N_p(0, \Sigma)$, respectively. I denote this distribution by $Y \sim SN_p(\xi, \Sigma, \Lambda)$ hereafter and note that (1) belongs to the family of skew-elliptical distributions as defined in [22]. Typically, if $\Lambda$ is assumed to be diagonal, then the covariance structure of $Y$ is not affected by the introduction of skewness.

Let $\Phi_p(\cdot)$ denote the cdf of $N_p(0, I_p)$. The moment generating function of $Y$ is

$$M_Y(t) = 2^p \exp\Big\{ t^T \xi + \tfrac{1}{2} t^T \Omega t \Big\}\, \Phi_p(\Lambda^T t), \qquad t = (t_1, \ldots, t_p)^T \in \mathbb{R}^p. \qquad (2)$$

Expressing $\Lambda^T t = \big(\sum_{k=1}^p \lambda_{k1} t_k, \ldots, \sum_{k=1}^p \lambda_{kp} t_k\big)^T$, straightforward calculations give the following results:

$$\frac{\partial \Phi_p(\Lambda^T t)}{\partial t_i}\bigg|_{t=0} = 2^{-p} \sqrt{\frac{2}{\pi}} \sum_{r=1}^p \lambda_{ir} \qquad (3)$$

and

$$\frac{\partial^2 \Phi_p(\Lambda^T t)}{\partial t_i\, \partial t_j}\bigg|_{t=0} = 2^{-p}\, \frac{2}{\pi} \bigg( \sum_{r=1}^p \lambda_{ir} \sum_{r=1}^p \lambda_{jr} - \sum_{r=1}^p \lambda_{ir}\lambda_{jr} \bigg). \qquad (4)$$

Taking the first two derivatives of (2) and applying (3) and (4), the mean vector and the covariance matrix can be written as

$$E(Y) = \xi + \sqrt{\frac{2}{\pi}}\, \Lambda 1_p, \qquad \operatorname{cov}(Y) = \Sigma + \Big(1 - \frac{2}{\pi}\Big) \Lambda\Lambda^T,$$

where $1_p$ is a $p$-dimensional vector of ones.

Assuming $Z \sim N_p(0, I_p)$, it follows that $|Z|$ is distributed as a $p$-dimensional standard half-normal distribution, denoted by $HN_p(0, I_p)$. By Proposition 1 of Arellano-Valle et al. [1], it turns out that (1) has a convenient stochastic representation

$$Y = \xi + \Lambda\tau + U, \qquad (5)$$

where $\tau$ and $U$ are independently distributed as $HN_p(0, I_p)$ and $N_p(0, \Sigma)$, respectively. Note that expression (5) provides a useful tool for random number generation and for theoretical purposes. Moreover, it is easy to see that $Y \mid \tau \sim N_p(\xi + \Lambda\tau, \Sigma)$. Hence, the density of $Y$ in (1) can be obtained by using the convolution of the densities of $Y \mid \tau$ and $\tau$ and Lemma 2.1 of [2].

2.2. The multivariate truncated normal distribution

Let $TN_p(\mu, \Sigma; A)$ denote a $p$-variate truncated normal distribution for $N_p(\mu, \Sigma)$ lying within a truncated hyperplane region

$$A = \{ x = (x_1, \ldots, x_p)^T \mid x_1 > a_1, \ldots, x_p > a_p \},$$

and use the notation $\int_a^\infty$ as an abbreviation of the multiple integral $\int_{a_1}^\infty \cdots \int_{a_p}^\infty$. Tallis [23] has provided the formulae for the first two moments of a multivariate truncated normal distribution $TN_p(0, R; A)$, where $R$ is a correlation matrix. Under this truncation type, I shall generalize Tallis' results to provide explicit formulae for computing the first and second moments of general multivariate truncated normal distributions.
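The stochastic representation (5) in Section 2.1 lends itself to a direct sampler. The following is a minimal sketch, assuming NumPy; the function name `rsn` and the parameter values are illustrative choices, not taken from the paper. It draws from $SN_p(\xi, \Sigma, \Lambda)$ and compares the sample mean and covariance with the closed forms $\xi + \sqrt{2/\pi}\,\Lambda 1_p$ and $\Sigma + (1 - 2/\pi)\Lambda\Lambda^T$.

```python
import numpy as np

def rsn(n, xi, Sigma, Lam, rng):
    """Draw n samples from SN_p(xi, Sigma, Lam) via Y = xi + Lam @ tau + U, cf. (5)."""
    p = len(xi)
    tau = np.abs(rng.standard_normal((n, p)))                 # tau ~ HN_p(0, I_p)
    U = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # U ~ N_p(0, Sigma)
    return xi + tau @ Lam.T + U                               # row j is xi + Lam tau_j + U_j

# Illustrative (hypothetical) parameter values
rng = np.random.default_rng(0)
xi = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
Lam = np.diag([3.0, -1.5])
Y = rsn(200_000, xi, Sigma, Lam, rng)

# Theoretical moments stated after (4)
mean_theory = xi + np.sqrt(2 / np.pi) * Lam @ np.ones(2)
cov_theory = Sigma + (1 - 2 / np.pi) * Lam @ Lam.T
print("mean error:", np.round(Y.mean(axis=0) - mean_theory, 3))
print("cov error:\n", np.round(np.cov(Y.T) - cov_theory, 3))
```

Both printed error arrays should be close to zero, up to Monte Carlo noise.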

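Consider a random vector $X = (X_1, \ldots, X_p)^T$ which has a $p$-variate truncated normal density given by

$$f(x \mid \mu, \Sigma; A) = \alpha^{-1} \phi_p(x \mid \mu, \Sigma)\, I_A(x), \qquad (6)$$

where $\alpha = \int_a^\infty \phi_p(x \mid \mu, \Sigma)\, dx$, the $a_i$'s are arbitrary real numbers, and $I_A(x)$ is the indicator function whose value equals one if $x \in A$ and zero elsewhere. I shall use the notation $X \sim TN_p(\mu, \Sigma; A)$ to denote that $X$ has density (6). The moment generating function of $X$ is

$$M_X(t) = \alpha^{-1} \exp\Big\{ t^T \mu + \tfrac{1}{2} t^T \Sigma t \Big\} \int_a^\infty \phi_p(x \mid \mu + \Sigma t, \Sigma)\, dx. \qquad (7)$$

Differentiating (7) with respect to $t$ and then evaluating the derivative at $t = 0$, one readily obtains the marginal mean

$$E(X_i) = \mu_i + \alpha^{-1} \sum_{r=1}^p \sigma_{ir} f_r(a_r) G_r, \qquad (8)$$

where $\sigma_{ij}$ denotes the $(i,j)$th entry of $\Sigma$, $f_r(a_r) = \phi(a_r \mid \mu_r, \sigma_{rr})$ denotes a normal density with mean $\mu_r$ and variance $\sigma_{rr}$ for the $r$th variable evaluated at $a_r$, and

$$G_r = \int_{a_{(r)}}^\infty \phi_{p-1}\big(x_{(r)} \mid \mu^{(r)}_{2\cdot 1}, \Sigma^{(r)}_{22\cdot 1}\big)\, dx_{(r)},$$

with $\phi_{p-1}(x_{(r)} \mid \mu^{(r)}_{2\cdot 1}, \Sigma^{(r)}_{22\cdot 1})$ being the conditional density of the remaining $p - 1$ variables given $X_r = a_r$. Similarly, we can verify that

$$E(X_i X_j) = \mu_i E(X_j) + \mu_j E(X_i) - \mu_i \mu_j + \sigma_{ij} + \alpha^{-1} \bigg\{ \sum_{r=1}^p \frac{\sigma_{ir}\sigma_{jr}(a_r - \mu_r)}{\sigma_{rr}} f_r(a_r) G_r + \sum_{r=1}^p \sum_{s \neq r} \sigma_{ir} \Big( \sigma_{js} - \frac{\sigma_{rs}\sigma_{jr}}{\sigma_{rr}} \Big) f_{rs}(a_r, a_s) G_{rs} \bigg\}, \qquad (9)$$

where $f_{rs}(a_r, a_s)$ is the bivariate normal density of the $(r,s)$th variables of $N_p(\mu, \Sigma)$ evaluated at $(a_r, a_s)$, and

$$G_{rs} = \int_{a_{(rs)}}^\infty \phi_{p-2}\big(x_{(rs)} \mid \mu^{(rs)}_{2\cdot 1}, \Sigma^{(rs)}_{22\cdot 1}\big)\, dx_{(rs)},$$

with $\phi_{p-2}(x_{(rs)} \mid \mu^{(rs)}_{2\cdot 1}, \Sigma^{(rs)}_{22\cdot 1})$ being the conditional density of the remaining $p - 2$ variables given $X_r = a_r$ and $X_s = a_s$.

Throughout this paper, I will use the following notation: $[A]_{rs}$ denotes the $(r,s)$th entry of a given matrix $A$; $\mathrm{Diag}(\cdot)$ denotes a diagonal matrix created by extracting the main diagonal elements of a square matrix or by the diagonalization of a vector; and $\mathrm{diag}(\cdot)$ denotes a vector containing the diagonal elements of a square matrix. After some algebraic manipulation, expressions (8) and (9) can be written in matrix notation as follows:

$$E(X) = \eta = \mu + \alpha^{-1} \Sigma q, \qquad (10)$$

where $q = (q_1, \ldots, q_p)^T$ is a $p \times 1$ vector whose $r$th element is $f_r(a_r) G_r$, and

$$E(XX^T) = \mu\eta^T + \eta\mu^T - \mu\mu^T + \Sigma + \alpha^{-1} \Sigma (H + D) \Sigma, \qquad (11)$$

where $H$ is a $p \times p$ matrix with all diagonal entries being zero and $f_{rs}(a_r, a_s) G_{rs}$ on the $(r,s)$th off-diagonal entry, and $D$ is a $p \times p$ diagonal matrix whose $r$th diagonal entry is $\sigma_{rr}^{-1} \{ (a_r - \mu_r) f_r(a_r) G_r - [\Sigma H]_{rr} \}$.

Note that the computation of truncated moments depends heavily on the numerical method for $\Phi_p(\cdot)$, which can be swiftly evaluated by the fast algorithms described in Genz [12,13]. The procedures proposed in Genz's papers can be implemented by using the package mvtnorm. Computer code for the computation of the mean and covariance matrix of multivariate truncated normal distributions, written in the R language, is available from the author upon request.

3. ML estimation for multivariate skew normal mixtures

I consider ML estimation for a $g$-component mixture model in which a random sample $Y_1, \ldots, Y_n$ follows a mixture of multivariate skew normal distributions. Its probability density function can be written as

$$Y_j \sim \sum_{i=1}^g w_i f(y_j \mid \xi_i, \Sigma_i, \Lambda_i), \qquad w_i \geq 0, \quad \sum_{i=1}^g w_i = 1, \qquad (12)$$

where $\Theta = (\theta_1, \ldots, \theta_g)$, with $\theta_i = (w_i, \xi_i, \Sigma_i, \Lambda_i)$ being the unknown parameters of component $i$ and the $w_i$'s being the mixing probabilities. The ML estimate $\hat\Theta$ based on a set of independent observations $y = (y_1^T, \ldots, y_n^T)^T$ is

$$\hat\Theta = \arg\max_{\Theta} \ell(\Theta \mid y).$$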
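To make the density (1) and the mixture log-likelihood $\ell(\Theta \mid y) = \sum_j \log \sum_i w_i f(y_j \mid \xi_i, \Sigma_i, \Lambda_i)$ concrete, the following is a minimal sketch assuming SciPy's `multivariate_normal` for $\phi_p$ and $\Phi_p$. It is not the author's R code; the function names `sn_pdf` and `mixture_loglik` and all numerical values are hypothetical. The final lines check the special case of diagonal $\Sigma$ and $\Lambda$, where the density factorizes into univariate skew normal densities.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn, norm

def sn_pdf(y, xi, Sigma, Lam):
    """Density (1) of SN_p(xi, Sigma, Lam) at a single point y."""
    p = len(xi)
    Omega = Sigma + Lam @ Lam.T
    Oinv = np.linalg.inv(Omega)
    Delta = np.eye(p) - Lam.T @ Oinv @ Lam
    Delta = (Delta + Delta.T) / 2.0                 # guard against rounding asymmetry
    arg = Lam.T @ Oinv @ (y - xi)
    return (2.0 ** p * mvn.pdf(y, mean=xi, cov=Omega)
            * mvn.cdf(arg, mean=np.zeros(p), cov=Delta))

def mixture_loglik(Y, w, xis, Sigmas, Lams):
    """Observed log-likelihood of a g-component SN mixture for the rows of Y."""
    dens = np.array([[w_i * sn_pdf(y, xi, S, L)
                      for w_i, xi, S, L in zip(w, xis, Sigmas, Lams)]
                     for y in Y])                   # shape (n, g)
    return np.log(dens.sum(axis=1)).sum()

# Hypothetical values for a quick check with diagonal Sigma and Lambda
y0 = np.array([0.3, -1.2])
xi0 = np.array([0.0, -1.0])
sig, lam = np.array([1.0, 2.0]), np.array([2.0, -1.0])
om = np.sqrt(sig ** 2 + lam ** 2)
joint = sn_pdf(y0, xi0, np.diag(sig ** 2), np.diag(lam))
product = np.prod(2 * norm.pdf(y0, xi0, om) * norm.cdf(lam * (y0 - xi0) / (om * sig)))
print(joint, product)   # the two values should agree up to numerical integration error
```

Setting $\Lambda = 0$ in `sn_pdf` recovers the ordinary normal density, which is the sense in which NORMIX is a reduced model of SNMIX.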

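Here

$$\ell(\Theta \mid y) = \sum_{j=1}^n \log \bigg( \sum_{i=1}^g w_i f(y_j \mid \xi_i, \Sigma_i, \Lambda_i) \bigg) \qquad (13)$$

is the observed log-likelihood function. Generally, there is no explicit analytical solution for $\hat\Theta$, but it can be obtained iteratively by using the EM algorithm under the complete data framework (14) discussed later.

In the context of hierarchical mixture modeling, for each $Y_j$ it is convenient to introduce a set of zero-one indicator variables $Z_j = (Z_{1j}, \ldots, Z_{gj})^T$ for $j = 1, \ldots, n$, which is a multinomial random vector with 1 trial and cell probabilities $w_1, \ldots, w_g$, denoted as $Z_j \sim M(1; w_1, \ldots, w_g)$. Note that the $r$th element $Z_{rj} = 1$ if $Y_j$ arises from component $r$. From (5), with the inclusion of the indicator variables $Z_j$'s, a hierarchical representation of (12) is given by

$$Y_j \mid (\tau_j, Z_{ij} = 1) \sim N_p(\xi_i + \Lambda_i \tau_j, \Sigma_i), \qquad \tau_j \mid (Z_{ij} = 1) \sim HN_p(0, I_p), \qquad Z_j \sim M(1; w_1, \ldots, w_g), \quad j = 1, \ldots, n. \qquad (14)$$

By Bayes' theorem, it can be shown that

$$\tau_j \mid (Y_j = y_j, Z_{ij} = 1) \sim TN_p\big(\Lambda_i^T \Omega_i^{-1}(y_j - \xi_i), \Delta_i; \mathbb{R}^p_+\big), \qquad (15)$$

where $\Omega_i = \Sigma_i + \Lambda_i\Lambda_i^T$, $\Delta_i = (I_p + \Lambda_i^T \Sigma_i^{-1} \Lambda_i)^{-1} = I_p - \Lambda_i^T \Omega_i^{-1} \Lambda_i$, and $\mathbb{R}^p_+ = \{ x = (x_1, \ldots, x_p)^T \in \mathbb{R}^p \mid x_i > 0,\ i = 1, \ldots, p \}$. In what follows, denote

$$E(\tau_j \mid y_j, Z_{ij} = 1) = \eta_{ij} \quad \text{and} \quad E(\tau_j \tau_j^T \mid y_j, Z_{ij} = 1) = \Psi_{ij}, \qquad (16)$$

where $\eta_{ij}$ and $\Psi_{ij}$ are both implicit functions of the parameters $(\xi_i, \Sigma_i, \Lambda_i)$. It is crucial to emphasize that evaluations of (16) rely heavily on the results of (10) and (11).

For notational simplicity, let $Z = (Z_1^T, \ldots, Z_n^T)^T$ and $\tau = (\tau_1^T, \ldots, \tau_n^T)^T$. From (14), the complete-data log-likelihood function of $\Theta$, ignoring additive constants, is

$$\ell_c(\Theta \mid y, Z, \tau) = \sum_{j=1}^n \sum_{i=1}^g Z_{ij} \Big\{ \log w_i - \tfrac{1}{2} \log|\Sigma_i| - \tfrac{1}{2} (y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\tau_j) - \tfrac{1}{2} \tau_j^T \tau_j \Big\}. \qquad (17)$$

I adopt the EM algorithm for finding ML estimates. Formally, the E-step of the EM algorithm requires the calculation of the so-called Q-function $Q(\Theta \mid \hat\Theta^{(k)}) = E(\ell_c(\Theta \mid y, \tau, Z) \mid y, \hat\Theta^{(k)})$, which is the conditional expectation of (17) given the observed data $y$ and the current estimated parameters $\hat\Theta^{(k)}$. To calculate the Q-function, it can be observed from (17) that the conditional expectation of the term $-\tfrac{1}{2} Z_{ij} \tau_j^T \tau_j$ can be omitted because it does not involve any parameters. The necessary conditional expectations involved in the Q-function are therefore $E(Z_{ij} \mid y_j, \hat\Theta^{(k)})$, $E(Z_{ij}\tau_j \mid y_j, \hat\Theta^{(k)})$ and $E(Z_{ij}\tau_j\tau_j^T \mid y_j, \hat\Theta^{(k)})$.

The implementation of the EM algorithm proceeds as follows.

E-step: At the $k$th iteration, compute

$$\hat z_{ij}^{(k)} = E(Z_{ij} \mid y_j, \hat\Theta^{(k)}) = \frac{\hat w_i^{(k)} f(y_j \mid \hat\xi_i^{(k)}, \hat\Sigma_i^{(k)}, \hat\Lambda_i^{(k)})}{\sum_{m=1}^g \hat w_m^{(k)} f(y_j \mid \hat\xi_m^{(k)}, \hat\Sigma_m^{(k)}, \hat\Lambda_m^{(k)})},$$

$$E(Z_{ij}\tau_j \mid y_j, \hat\Theta^{(k)}) = E(Z_{ij} \mid y_j, \hat\Theta^{(k)})\, E(\tau_j \mid Z_{ij} = 1, y_j, \hat\Theta^{(k)}) = \hat z_{ij}^{(k)} \hat\eta_{ij}^{(k)},$$

and

$$E(Z_{ij}\tau_j\tau_j^T \mid y_j, \hat\Theta^{(k)}) = E(Z_{ij} \mid y_j, \hat\Theta^{(k)})\, E(\tau_j\tau_j^T \mid Z_{ij} = 1, y_j, \hat\Theta^{(k)}) = \hat z_{ij}^{(k)} \hat\Psi_{ij}^{(k)},$$

where $\hat\eta_{ij}^{(k)}$ and $\hat\Psi_{ij}^{(k)}$ are $\eta_{ij}$ and $\Psi_{ij}$ in (16) with $\xi_i$, $\Sigma_i$ and $\Lambda_i$ replaced by $\hat\xi_i^{(k)}$, $\hat\Sigma_i^{(k)}$ and $\hat\Lambda_i^{(k)}$, respectively. Therefore, the Q-function can be written as

$$Q(\Theta \mid \hat\Theta^{(k)}) = \sum_{j=1}^n \sum_{i=1}^g \hat z_{ij}^{(k)} \Big\{ \log w_i - \tfrac{1}{2} \log|\Sigma_i| - \tfrac{1}{2} (y_j - \xi_i - \Lambda_i \hat\eta_{ij}^{(k)})^T \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i \hat\eta_{ij}^{(k)}) - \tfrac{1}{2} \operatorname{tr}\big( \Sigma_i^{-1} \Lambda_i (\hat\Psi_{ij}^{(k)} - \hat\eta_{ij}^{(k)} \hat\eta_{ij}^{(k)T}) \Lambda_i^T \big) \Big\}. \qquad (18)$$

M-step:

1. Update $\hat w_i^{(k)}$ by $\hat w_i^{(k+1)} = n^{-1} \sum_{j=1}^n \hat z_{ij}^{(k)}$.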

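2. Update $\hat\xi_i^{(k)}$ by maximizing (18) over $\xi_i$, which leads to

$$\hat\xi_i^{(k+1)} = \sum_{j=1}^n \hat z_{ij}^{(k)} \big( y_j - \hat\Lambda_i^{(k)} \hat\eta_{ij}^{(k)} \big) \bigg/ \sum_{j=1}^n \hat z_{ij}^{(k)}.$$

3. Fix $\xi_i = \hat\xi_i^{(k+1)}$ and update $\hat\Lambda_i^{(k)}$ by maximizing (18) over $\Lambda_i$, which leads to

$$\hat\Lambda_i^{(k+1)} = \bigg( \sum_{j=1}^n \hat z_{ij}^{(k)} (y_j - \hat\xi_i^{(k+1)}) \hat\eta_{ij}^{(k)T} \bigg) \bigg( \sum_{j=1}^n \hat z_{ij}^{(k)} \hat\Psi_{ij}^{(k)} \bigg)^{-1}.$$

3′. In the case where $\Lambda_i$ is assumed to be diagonal, say $\Lambda_i = \mathrm{Diag}(\lambda_i)$, where $\lambda_i$ is a $p$-dimensional vector, update $\hat\lambda_i^{(k)}$ by

$$\hat\lambda_i^{(k+1)} = \bigg( \hat\Sigma_i^{(k)-1} \odot \sum_{j=1}^n \hat z_{ij}^{(k)} \hat\Psi_{ij}^{(k)} \bigg)^{-1} \bigg( \hat\Sigma_i^{(k)-1} \odot \sum_{j=1}^n \hat z_{ij}^{(k)} \hat\eta_{ij}^{(k)} (y_j - \hat\xi_i^{(k+1)})^T \bigg) 1_p,$$

where the operator $\odot$ denotes the Hadamard (elementwise) product [15] of two matrices of the same dimension.

4. Fix $\xi_i = \hat\xi_i^{(k+1)}$ and $\Lambda_i = \hat\Lambda_i^{(k+1)}$, and update $\hat\Sigma_i^{(k)}$ by maximizing (18) over $\Sigma_i$, which leads to

$$\hat\Sigma_i^{(k+1)} = \frac{1}{\sum_{j=1}^n \hat z_{ij}^{(k)}} \sum_{j=1}^n \hat z_{ij}^{(k)} \Big\{ (y_j - \hat\xi_i^{(k+1)} - \hat\Lambda_i^{(k+1)} \hat\eta_{ij}^{(k)}) (y_j - \hat\xi_i^{(k+1)} - \hat\Lambda_i^{(k+1)} \hat\eta_{ij}^{(k)})^T + \hat\Lambda_i^{(k+1)} \big( \hat\Psi_{ij}^{(k)} - \hat\eta_{ij}^{(k)} \hat\eta_{ij}^{(k)T} \big) \hat\Lambda_i^{(k+1)T} \Big\}.$$

4′. In the case where the scale covariance matrices are homoscedastic, say $\Sigma_1 = \cdots = \Sigma_g = \Sigma$, update $\hat\Sigma^{(k)}$ by

$$\hat\Sigma^{(k+1)} = \frac{1}{n} \sum_{j=1}^n \sum_{i=1}^g \hat z_{ij}^{(k)} \Big\{ (y_j - \hat\xi_i^{(k+1)} - \hat\Lambda_i^{(k+1)} \hat\eta_{ij}^{(k)}) (y_j - \hat\xi_i^{(k+1)} - \hat\Lambda_i^{(k+1)} \hat\eta_{ij}^{(k)})^T + \hat\Lambda_i^{(k+1)} \big( \hat\Psi_{ij}^{(k)} - \hat\eta_{ij}^{(k)} \hat\eta_{ij}^{(k)T} \big) \hat\Lambda_i^{(k+1)T} \Big\}.$$

I further offer some remarks on the implementation of the proposed EM algorithm.

Remark 1. To monitor convergence by using the likelihood-increasing property of the EM algorithm [7,25], a simple way is to stop after a preset number of iterations or when the difference between two successive log-likelihood evaluations is small enough.

Remark 2. As with other iterative optimization procedures such as the Newton-Raphson or Fisher scoring algorithms, one needs to search for appropriate initial values to avoid divergence or time-consuming computations. I offer a simple way of automatically generating a selection of initial values. The technique proceeds as follows. Perform a K-means clustering [11] initialized with respect to a random start. Specify the zero-one component membership indicators $\hat Z_j^{(0)} = \{ \hat Z_{ij}^{(0)} \}_{i=1}^g$ according to the K-means clustering results. The initial values of the mixing probabilities, component locations and scale covariance matrices are then explicitly chosen as

$$\hat w_i^{(0)} = \frac{\sum_{j=1}^n \hat Z_{ij}^{(0)}}{n}, \qquad \hat\xi_i^{(0)} = \frac{\sum_{j=1}^n \hat Z_{ij}^{(0)} y_j}{\sum_{j=1}^n \hat Z_{ij}^{(0)}}, \qquad \hat\Sigma_i^{(0)} = \frac{\sum_{j=1}^n \hat Z_{ij}^{(0)} (y_j - \hat\xi_i^{(0)})(y_j - \hat\xi_i^{(0)})^T}{\sum_{j=1}^n \hat Z_{ij}^{(0)}}.$$

Meanwhile, if the $\Sigma_i$'s are assumed to be identical, say $\Sigma_1 = \cdots = \Sigma_g = \Sigma$, then $\hat\Sigma^{(0)}$ is taken as the sample covariance of the whole sample. Furthermore, the initial skewness matrices can be chosen as diagonal, say $\hat\Lambda_i^{(0)} = \mathrm{Diag}\{ \hat\lambda_{i1}^{(0)}, \ldots, \hat\lambda_{ip}^{(0)} \}$, with the value of each entry chosen to deviate from zero; e.g., $\hat\lambda_{ir}^{(0)}$ is taken as 3 or $-3$, with the sign determined by the sign of the sample skewness of the observations in the corresponding K-means cluster.

Remark 3. The main difficulty in dealing with mixture models is to find the global maximizer of $\Theta$; for instance, the likelihood function $L(\Theta \mid y)$ might be unbounded in certain situations. Another oft-voiced criticism is that EM-type procedures tend to get stuck in local modes. One convenient way to circumvent such limitations is to try several EM iterations under a variety of starting values.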
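The E-step and the conditional M-steps 1-4 can be sketched in the univariate case $p = 1$, where the truncated normal moments in (16) have closed forms and no multivariate integration is needed. This is a simplified illustration assuming NumPy and SciPy, not the author's implementation; the function names, the simulated data and the starting values are hypothetical. For $p = 1$, with $\omega^2 = \sigma^2 + \lambda^2$, (15) gives $\tau \mid y \sim TN(m, d; (0, \infty))$ with $m = \lambda(y - \xi)/\omega^2$ and $d = \sigma^2/\omega^2$.

```python
import numpy as np
from scipy.special import log_ndtr, logsumexp
from scipy.stats import norm

def log_sn(y, xi, s2, lam):
    """Log of density (1) for p = 1 with scale variance s2 and skewness lam."""
    om2 = s2 + lam ** 2
    return (np.log(2.0) + norm.logpdf(y, xi, np.sqrt(om2))
            + log_ndtr(lam * (y - xi) / np.sqrt(om2 * s2)))

def em_step(y, w, xi, s2, lam):
    """One E-step followed by M-steps 1-4; returns updates and the current log-likelihood."""
    y_ = y[None, :]                                              # shape (1, n)
    w_, xi_, s2_, lam_ = (a[:, None] for a in (w, xi, s2, lam))  # shape (g, 1)
    logf = np.log(w_) + log_sn(y_, xi_, s2_, lam_)
    loglik = logsumexp(logf, axis=0).sum()
    z = np.exp(logf - logsumexp(logf, axis=0))                   # posterior probabilities
    om2 = s2_ + lam_ ** 2
    m, d = lam_ * (y_ - xi_) / om2, s2_ / om2                    # parameters of (15)
    r = m / np.sqrt(d)
    h = np.exp(norm.logpdf(r) - log_ndtr(r))                     # phi(r) / Phi(r)
    eta = m + np.sqrt(d) * h                                     # E(tau | y)
    psi = m ** 2 + d + m * np.sqrt(d) * h                        # E(tau^2 | y)
    nz = z.sum(axis=1)
    w_new = nz / y.size                                          # step 1
    xi_new = (z * (y_ - lam_ * eta)).sum(axis=1) / nz            # step 2
    lam_new = ((z * (y_ - xi_new[:, None]) * eta).sum(axis=1)
               / (z * psi).sum(axis=1))                          # step 3
    res = y_ - xi_new[:, None] - lam_new[:, None] * eta
    s2_new = (z * (res ** 2 + lam_new[:, None] ** 2 * (psi - eta ** 2))).sum(axis=1) / nz  # step 4
    return w_new, xi_new, s2_new, lam_new, loglik

# Simulated two-component data via representation (5); all values are hypothetical
rng = np.random.default_rng(1)
n1, n2 = 300, 200
y = np.concatenate([
    0.0 + 2.0 * np.abs(rng.standard_normal(n1)) + 0.5 * rng.standard_normal(n1),
    10.0 - 1.5 * np.abs(rng.standard_normal(n2)) + 0.7 * rng.standard_normal(n2)])

# Crude starting values; the skewness starts at +/-3 as suggested in Remark 2
w, xi, s2, lam = (np.array([0.5, 0.5]), np.array([1.0, 8.0]),
                  np.array([1.0, 1.0]), np.array([3.0, -3.0]))
trace = []
for _ in range(200):
    w, xi, s2, lam, ll = em_step(y, w, xi, s2, lam)
    trace.append(ll)
print(w, xi, s2, lam, trace[-1])
```

The recorded log-likelihood trace should be non-decreasing, which is the property Remark 1 relies on for monitoring convergence. Starting the skewness exactly at zero would leave it at zero under step 3, which is one reason to start away from zero.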

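If several modes exist across the EM runs started from different values, one can find the global mode by comparing their log-likelihood values. In particular, running the algorithm with different starting values can also be used to assess the stability of the resulting estimates.

Under certain boundedness conditions stated in Redner and Walker [21], the ML estimate $\hat\Theta$ is consistent and converges in distribution to a zero-mean normal random vector whose covariance matrix is the inverse Fisher information matrix. That is,

$$\sqrt{n}\, (\hat\Theta - \Theta_0) \xrightarrow{d} N_q\big(0, I^{-1}(\Theta_0)\big),$$

where $q = \mathrm{Dim}(\Theta)$, $\Theta_0$ is the true value of $\Theta$, and $I(\Theta) = -E\big( \partial^2 \ell(\Theta \mid y) / \partial\Theta\, \partial\Theta^T \big)$ is the Fisher information matrix. For a more detailed discussion of the asymptotic theory of ML estimators for mixture models, interested readers are referred to [20,21].

4. Provision of standard errors

A simple way of obtaining the standard errors of ML estimates of mixture model parameters is to approximate the asymptotic covariance matrix of $\hat\Theta$ by the inverse of the observed information matrix; see, e.g., Basford et al. [6]. Let $I_o(\Theta \mid y) = -\partial^2 \ell(\Theta \mid y) / \partial\Theta\, \partial\Theta^T$ be the observed information matrix, where $\ell(\Theta \mid y)$ is the observed log-likelihood function as in (13). The estimated observed information matrix can be reduced to

$$I_o(\hat\Theta \mid y) = \sum_{j=1}^n \hat s_j \hat s_j^T, \qquad (19)$$

where $\hat s_j = E_{\hat\Theta}\big( \partial \ell_{cj}(\Theta \mid y_j, Z_j, \tau_j) / \partial\Theta \mid y_j \big)$, with $\ell_{cj}(\Theta \mid y_j, Z_j, \tau_j)$ being the complete-data log-likelihood formed from the single observation $y_j$, for $j = 1, \ldots, n$.

Let $\mathrm{vec}(\cdot)$ be the matrix operator which stacks all columns of a matrix into a vector, and $\mathrm{vech}(\cdot)$ the matrix operator which arranges the supradiagonal elements of a symmetric matrix into a vector. Let $\hat s_j$ be the vector

$$\hat s_j = \big( \hat s_{j,w_1}, \ldots, \hat s_{j,w_{g-1}}, \hat s_{j,\xi_1}^T, \ldots, \hat s_{j,\xi_g}^T, \hat s_{j,\lambda_1}^T, \ldots, \hat s_{j,\lambda_g}^T, \hat s_{j,\sigma_1}^T, \ldots, \hat s_{j,\sigma_g}^T \big)^T,$$

where $\lambda_i = \mathrm{vec}(\Lambda_i)$ and $\sigma_i = \mathrm{vech}(\Sigma_i)$. Expressions for the elements $\hat s_{j,w_i}$, $\hat s_{j,\xi_i}$, $\hat s_{j,\lambda_i}$ and $\hat s_{j,\sigma_i}$ are given by

$$\hat s_{j,w_i} = \frac{\hat z_{ij}}{\hat w_i} - \frac{\hat z_{gj}}{\hat w_g}, \qquad \hat s_{j,\xi_i} = \hat z_{ij} \hat\Sigma_i^{-1} (y_j - \hat\xi_i - \hat\Lambda_i \hat\eta_{ij}),$$

$$\hat s_{j,\lambda_i} = \mathrm{vec}\Big( \hat z_{ij} \hat\Sigma_i^{-1} \big\{ (y_j - \hat\xi_i) \hat\eta_{ij}^T - \hat\Lambda_i \hat\Psi_{ij} \big\} \Big), \qquad \hat s_{j,\sigma_i} = \mathrm{vech}\Big( \tfrac{1}{2} \hat z_{ij} \big\{ 2\hat A_{ij} - \mathrm{Diag}(\hat A_{ij}) \big\} \Big), \qquad (20)$$

where

$$\hat A_{ij} = \hat\Sigma_i^{-1} \Big\{ (y_j - \hat\xi_i - \hat\Lambda_i \hat\eta_{ij})(y_j - \hat\xi_i - \hat\Lambda_i \hat\eta_{ij})^T + \hat\Lambda_i (\hat\Psi_{ij} - \hat\eta_{ij}\hat\eta_{ij}^T) \hat\Lambda_i^T \Big\} \hat\Sigma_i^{-1} - \hat\Sigma_i^{-1},$$

$\hat z_{ij} = \hat w_i f(y_j \mid \hat\xi_i, \hat\Sigma_i, \hat\Lambda_i) \big/ \sum_{m=1}^g \hat w_m f(y_j \mid \hat\xi_m, \hat\Sigma_m, \hat\Lambda_m)$ is the posterior probability that the observation $y_j$ belongs to component $i$, and $\hat\eta_{ij}$ and $\hat\Psi_{ij}$ are obtained by substituting the ML estimates $(\hat\xi_i, \hat\Sigma_i, \hat\Lambda_i)$ into (16).

If the skewness matrices are assumed to be diagonal, i.e., $\lambda_i = \mathrm{diag}(\Lambda_i)$, then

$$\hat s_{j,\lambda_i} = \hat z_{ij} \Big\{ \big( \hat\Sigma_i^{-1} \odot \hat\eta_{ij} (y_j - \hat\xi_i)^T \big) 1_p - \big( \hat\Sigma_i^{-1} \odot \hat\Psi_{ij} \big) \hat\lambda_i \Big\}. \qquad (21)$$

Furthermore, if one assumes that the scale covariance matrices are homoscedastic, i.e., $\sigma_1 = \cdots = \sigma_g = \sigma$, then

$$\hat s_{j,\sigma} = \mathrm{vech}\bigg( \tfrac{1}{2} \sum_{i=1}^g \hat z_{ij} \big\{ 2\hat A_{ij} - \mathrm{Diag}(\hat A_{ij}) \big\} \bigg), \qquad (22)$$

where

$$\hat A_{ij} = \hat\Sigma^{-1} \Big\{ (y_j - \hat\xi_i - \hat\Lambda_i \hat\eta_{ij})(y_j - \hat\xi_i - \hat\Lambda_i \hat\eta_{ij})^T + \hat\Lambda_i (\hat\Psi_{ij} - \hat\eta_{ij}\hat\eta_{ij}^T) \hat\Lambda_i^T \Big\} \hat\Sigma^{-1} - \hat\Sigma^{-1}.$$

The detailed proofs of (20)-(22) are given in the Appendix.

The information-based approximation (19) is asymptotically valid. However, it is less reliable unless the sample size is sufficiently large. Alternatively, it is common practice to use the parametric bootstrap approach (Efron and Tibshirani [8]) to obtain more accurate standard error estimates, although it requires an enormous amount of computing power.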
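The mechanics of the approximation (19) can be sketched in a few lines: stack the individual score vectors $\hat s_j$ as rows, form $\sum_j \hat s_j \hat s_j^T$, invert, and take square roots of the diagonal. The example below assumes NumPy and uses a deliberately simple stand-in model (a single univariate normal with unknown mean and variance), not the SNMIX scores (20)-(22); the function name `se_from_scores` is hypothetical.

```python
import numpy as np

def se_from_scores(S):
    """Standard errors from an (n, q) array whose j-th row is the score s_j at the ML estimate."""
    info = S.T @ S                                   # I_o = sum_j s_j s_j^T, cf. (19)
    return np.sqrt(np.diag(np.linalg.inv(info)))

# Stand-in model (an assumption for illustration): y_j ~ N(mu, s2), both unknown
rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=5000)
mu, s2 = y.mean(), y.var()                           # ML estimates

# Individual scores with respect to (mu, s2), evaluated at the ML estimates
S = np.column_stack([(y - mu) / s2,
                     -0.5 / s2 + 0.5 * (y - mu) ** 2 / s2 ** 2])
se = se_from_scores(S)
print(se)                                            # information-based standard errors
print(np.sqrt(s2 / y.size), s2 * np.sqrt(2.0 / y.size))  # textbook asymptotic values
```

For the SNMIX model, the rows of `S` would instead hold the vectors $\hat s_j$ assembled from (20), or from (21) and (22) under the diagonal and homoscedastic restrictions.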

Table 1
ML estimates and the associated standard errors for the fitted two-component SNMIX model for the bank data.
Parameter:  $w$   $\xi_{11}$   $\xi_{12}$   $\sigma_{1,11}$   $\sigma_{1,12}$   $\sigma_{1,22}$   $\lambda_{1,11}$   $\lambda_{1,22}$
Estimate:
SE:
Parameter:  $1 - w$   $\xi_{21}$   $\xi_{22}$   $\sigma_{2,11}$   $\sigma_{2,12}$   $\sigma_{2,22}$   $\lambda_{2,11}$   $\lambda_{2,22}$
Estimate:
SE:

Table 2
Model selection criteria for the bank data.
Model    $m$   $\ell(\hat\Theta \mid y)$   LRT   P-value   AIC   BIC
NORMIX
SNMIX

5. A practical example

As an illustration, I apply the methods described in the previous sections to the famous bank data set, which was originally reported (Tables 1.1 and 1.2) in Flury and Riedwyl [9] and subsequently analyzed by Ma and Genton [18] with a flexible skew-symmetric distribution. The data consist of six measurements made on 100 genuine and 100 counterfeit old Swiss 1000 franc bills. In this example, the goal is to verify the developed estimating device and assess the relative performance of the fitted SNMIX and NORMIX models. To simplify the analysis, attention is focused on $X_1$, the width of the right edge, and $X_2$, the length of the image diagonal. Marginally, each of the two variables exhibits a bimodal distribution with asymmetric components.

I now carry out the EM procedure for finding the parameter estimates of a two-component model (12). To avoid the correlation structure being affected by the inclusion of skewness parameters, as emphasized by Sahu et al. [22], the skewness matrices $\Lambda_i$ for $i = 1, 2$ are chosen as diagonal. More specifically, the model to be fitted can be written as

$$f(y_j \mid \Theta) = w f(y_j \mid \xi_1, \Sigma_1, \Lambda_1) + (1 - w) f(y_j \mid \xi_2, \Sigma_2, \Lambda_2),$$

where

$$\xi_i = (\xi_{i1}, \xi_{i2})^T, \qquad \Sigma_i = \begin{bmatrix} \sigma_{i,11} & \sigma_{i,12} \\ \sigma_{i,12} & \sigma_{i,22} \end{bmatrix} \qquad \text{and} \qquad \Lambda_i = \begin{bmatrix} \lambda_{i,11} & 0 \\ 0 & \lambda_{i,22} \end{bmatrix}.$$

To get several different sets of starting values, I first randomly generate a set of $B$ bootstrap resamples $y^{(1)}, \ldots, y^{(B)}$ from the original data $y$ and then compute $\hat\Theta^{(0)}$ for each bootstrap sample using the method described in Remark 2. The EM algorithm was run under $B = 30$ different sets of starting values and was terminated when the increase in the log-likelihood was less than $10^{-4}$. For this data set, the EM roots computed under different starting values converge to similar stationary points with the largest log-likelihood.

The resulting ML estimates and the associated standard errors are reported in Table 1. Judging from the reported information-based standard errors, all the parameters are statistically significant except for $\sigma_{2,12}$ and $\lambda_{2,22}$. The estimates of the skewness parameters reveal that the two variables are both significantly skewed to the left in component 1. With regard to component 2, only $X_1$ is significantly skewed to the right.

For comparison purposes, I also fit a NORMIX model, which can be treated as a reduced model of SNMIX with the parameters in the skewness matrices set to zero. For testing the null hypothesis $H_0: \Lambda_1 = \Lambda_2 = 0$ (NORMIX) versus the alternative hypothesis $H_1$: at least one $\Lambda_i \neq 0$ (SNMIX), the likelihood ratio test (LRT) statistic, which compares the maximized likelihoods of two competing models, is used to judge which of the two models is more appropriate for this data set. The LRT statistic for testing the existence of skewness in the component densities gives a value of 2418, which is highly significant compared to a $\chi^2_4$ distribution, indicating that the null hypothesis is not acceptable for the bank data. Furthermore, the fits of the two models are also compared based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are defined as

$$\mathrm{AIC} = -2\big( \ell(\hat\Theta \mid y) - m \big) \qquad \text{and} \qquad \mathrm{BIC} = -2\big\{ \ell(\hat\Theta \mid y) - 0.5\, m \log(n) \big\},$$

respectively, where $\ell(\hat\Theta \mid y)$ is the maximized log-likelihood, $m$ is the number of parameters and $n$ is the sample size. The comparison results are listed in Table 2. It is readily seen from the table that both the AIC and BIC values, as well as the LRT statistic, consistently favor the SNMIX model.

The contours of the ML-fitted SNMIX and NORMIX densities are depicted in Fig. 1. As anticipated, the fitted SNMIX density is better able to capture the asymmetry and tracks the data more closely than does the fitted NORMIX density.
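The model comparison above can be reproduced mechanically once the two maximized log-likelihoods are available. The sketch below assumes SciPy for the $\chi^2$ tail probability and uses the AIC and BIC definitions given in the text. The log-likelihood values are hypothetical placeholders, not the values of Table 2. The parameter counts are my own tally for the bivariate two-component models: one mixing weight, two location vectors of length 2 and two symmetric $2 \times 2$ scale matrices for NORMIX, plus four diagonal skewness parameters for SNMIX, which matches the 4 degrees of freedom of the test.

```python
import numpy as np
from scipy.stats import chi2

def aic(loglik, m):
    """AIC = -2 (loglik - m), as defined in the text."""
    return -2.0 * (loglik - m)

def bic(loglik, m, n):
    """BIC = -2 {loglik - 0.5 m log(n)}, as defined in the text."""
    return -2.0 * (loglik - 0.5 * m * np.log(n))

def lrt(loglik_null, loglik_alt, df):
    """Likelihood ratio statistic and its chi-square tail probability."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2.sf(stat, df)

n = 200                               # 100 genuine + 100 counterfeit bills
m_normix = 1 + 2 * 2 + 2 * 3          # weight, two locations, two 2x2 symmetric scales
m_snmix = m_normix + 4                # plus four diagonal skewness parameters

# Hypothetical maximized log-likelihoods, for illustration only
ll_normix, ll_snmix = -650.0, -638.0

stat, pval = lrt(ll_normix, ll_snmix, df=m_snmix - m_normix)
print("LRT:", stat, "p-value:", pval)
print("AIC:", aic(ll_normix, m_normix), aic(ll_snmix, m_snmix))
print("BIC:", bic(ll_normix, m_normix, n), bic(ll_snmix, m_snmix, n))
```

Under these definitions, smaller AIC and BIC values indicate the preferred model.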


Fig. 1. Scatter plot of $(X_1, X_2)$ overlaid on the contours of the fitted two-component (a) SNMIX and (b) NORMIX models. The genuine old Swiss 1000 franc bills are indicated by the solid circles and the pluses (+) denote the counterfeit ones.

6. Concluding remarks

In this paper, I have presented an ML approach to estimating the parameters, as well as their information-based standard errors, for a multivariate setting of SNMIX models. I have described a stochastic normal-truncated normal-multinomial hierarchical representation of SNMIX and presented an effective EM algorithm for dealing with ML estimation in a flexible complete data framework. The formulae for computing the first two moments of the multivariate truncated normal distribution and their usefulness in computing conditional expectations are also shown. The proposed EM algorithm can be implemented and coded with existing statistical software such as R. Numerical results illustrated in Section 5 indicate that the SNMIX model is evidently more adequate for the bank data than the conventional NORMIX model.

While the SNMIX model considered in this paper has proved its great flexibility in regulating skewness among components, its robustness against outliers could be seriously affected by thick-tailed observations. Lin et al. [16] have recently proposed a remedy to accommodate skewness and heavy-tailedness simultaneously using a mixture of skew t distributions (Azzalini and Capitanio [4]). However, their approach is restricted to data with univariate outcomes. I conjecture that the methodology presented in this paper can be extended to a multivariate setting of skew t mixtures and should yield satisfactory results in certain situations, at the expense of additional complexity of implementation. A deeper investigation of those modifications is beyond the scope of the present paper but provides interesting topics for further research.

Acknowledgments

The author would like to express his deepest thanks to the Chief Editor, the Associate Editor and two anonymous referees for their insightful comments and valuable suggestions, which led to substantial improvements in the presentation of this work. I am also grateful to Ms. Chang-Ling Chen for her initial simulation study and to Prof. Jack C. Lee for his kindness and patience in proofreading the earlier version of this paper. This research was partly supported by the National Science Council of Taiwan (Grant No. NSC M MY2).

Appendix. Proofs of Eqs. (20)-(22)

Let $\ell_{cj} = \ell_{cj}(\Theta \mid y_j, Z_j, \tau_j)$ denote the complete-data log-likelihood formed from the single observation $y_j$. Thus,

$$\ell_{cj} = \sum_{i=1}^g Z_{ij} \Big\{ \log w_i - \tfrac{1}{2} \log|\Sigma_i| - \tfrac{1}{2} (y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\tau_j) - \tfrac{1}{2} \tau_j^T\tau_j \Big\}.$$

Now recall the formulae for matrix derivatives

$$\frac{\partial \log|\Sigma|}{\partial \Sigma} = 2\Sigma^{-1} - \mathrm{Diag}(\Sigma^{-1}), \qquad \frac{\partial\, \mathrm{tr}(\Sigma^{-1} A)}{\partial \Sigma} = -2\Sigma^{-1} A \Sigma^{-1} + \mathrm{Diag}(\Sigma^{-1} A \Sigma^{-1}), \qquad (A.1)$$

which hold if $\Sigma$ and $A$ are symmetric and $\Sigma$ is nonsingular.
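The two identities in (A.1) can be checked numerically by finite differences, treating the off-diagonal pair $\sigma_{ij} = \sigma_{ji}$ as a single free parameter. The sketch below assumes NumPy; the helper name `sym_grad` and the test matrices are hypothetical.

```python
import numpy as np

def sym_grad(f, Sigma, h=1e-6):
    """Central-difference gradient of f over the distinct entries of a symmetric matrix."""
    p = Sigma.shape[0]
    G = np.zeros((p, p))
    for i in range(p):
        for j in range(i, p):
            E = np.zeros((p, p))
            E[i, j] = E[j, i] = 1.0          # perturb sigma_ij and sigma_ji together
            G[i, j] = G[j, i] = (f(Sigma + h * E) - f(Sigma - h * E)) / (2 * h)
    return G

# Hypothetical symmetric test matrices; Sigma is positive definite
Sigma = np.array([[2.0, 0.5, 0.1], [0.5, 1.5, -0.3], [0.1, -0.3, 1.0]])
A = np.array([[1.0, 0.2, 0.0], [0.2, 3.0, 0.4], [0.0, 0.4, 2.0]])
Sinv = np.linalg.inv(Sigma)

# First identity: d log|Sigma| / d Sigma = 2 Sigma^{-1} - Diag(Sigma^{-1})
g1 = sym_grad(lambda S: np.linalg.slogdet(S)[1], Sigma)
g1_formula = 2 * Sinv - np.diag(np.diag(Sinv))

# Second identity: d tr(Sigma^{-1} A) / d Sigma = -2 B + Diag(B), with B = Sigma^{-1} A Sigma^{-1}
g2 = sym_grad(lambda S: np.trace(np.linalg.inv(S) @ A), Sigma)
B = Sinv @ A @ Sinv
g2_formula = -2 * B + np.diag(np.diag(B))

print(np.abs(g1 - g1_formula).max(), np.abs(g2 - g2_formula).max())
```

Both printed discrepancies should be at the level of finite-difference error. The factor 2 on the off-diagonal entries arises because each off-diagonal parameter appears twice in a symmetric matrix.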

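By applying (A.1), the first derivatives of $\ell_{cj}$ with respect to $w_i$, $\xi_i$, $\Lambda_i$ and $\Sigma_i$ are

$$\frac{\partial \ell_{cj}}{\partial w_i} = \frac{Z_{ij}}{w_i} - \frac{Z_{gj}}{w_g}, \qquad \frac{\partial \ell_{cj}}{\partial \xi_i} = Z_{ij} \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\tau_j), \qquad \frac{\partial \ell_{cj}}{\partial \Lambda_i} = Z_{ij} \Sigma_i^{-1} \big\{ (y_j - \xi_i)\tau_j^T - \Lambda_i \tau_j\tau_j^T \big\},$$

$$\frac{\partial \ell_{cj}}{\partial \Sigma_i} = -\tfrac{1}{2} Z_{ij} \Big\{ 2\Sigma_i^{-1} - \mathrm{Diag}(\Sigma_i^{-1}) - 2\Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\tau_j)(y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma_i^{-1} + \mathrm{Diag}\big( \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\tau_j)(y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma_i^{-1} \big) \Big\} = \tfrac{1}{2} Z_{ij} \big\{ 2A_{ij} - \mathrm{Diag}(A_{ij}) \big\}, \qquad (A.2)$$

where $A_{ij} = \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\tau_j)(y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma_i^{-1} - \Sigma_i^{-1}$.

Now, if $\Lambda_i$ is a diagonal matrix, i.e., $\lambda_i = \mathrm{diag}(\Lambda_i)$, then

$$\frac{\partial \ell_{cj}}{\partial \lambda_i} = -\tfrac{1}{2} Z_{ij} \Big\{ -2\, \mathrm{diag}\big( \Sigma_i^{-1} (y_j - \xi_i) \tau_j^T \big) + 2\, \mathrm{diag}\big( \Sigma_i^{-1} \Lambda_i \tau_j\tau_j^T \big) \Big\} = Z_{ij} \Big\{ \big( \Sigma_i^{-1} \odot \tau_j (y_j - \xi_i)^T \big) 1_p - \big( \Sigma_i^{-1} \odot \tau_j\tau_j^T \big) \lambda_i \Big\}. \qquad (A.3)$$

In the case of $\Sigma_1 = \cdots = \Sigma_g = \Sigma$, one obtains

$$\frac{\partial \ell_{cj}}{\partial \Sigma} = -\tfrac{1}{2} \sum_{i=1}^g Z_{ij} \Big\{ 2\Sigma^{-1} - \mathrm{Diag}(\Sigma^{-1}) - 2\Sigma^{-1} (y_j - \xi_i - \Lambda_i\tau_j)(y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma^{-1} + \mathrm{Diag}\big( \Sigma^{-1} (y_j - \xi_i - \Lambda_i\tau_j)(y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma^{-1} \big) \Big\} = \tfrac{1}{2} \sum_{i=1}^g Z_{ij} \big\{ 2A_{ij} - \mathrm{Diag}(A_{ij}) \big\}, \qquad (A.4)$$

where $A_{ij} = \Sigma^{-1} (y_j - \xi_i - \Lambda_i\tau_j)(y_j - \xi_i - \Lambda_i\tau_j)^T \Sigma^{-1} - \Sigma^{-1}$.

On evaluation at $\Theta = \hat\Theta$, taking the conditional expectations of (A.2)-(A.4) yields the score estimates.

References

[1] R.B. Arellano-Valle, H. Bolfarine, V.H. Lachos, Bayesian inference for skew-normal linear mixed models, J. Appl. Stat.
[2] R.B. Arellano-Valle, M.G. Genton, On fundamental skew distributions, J. Multivariate Anal.
[3] A. Azzalini, A class of distributions which includes the normal ones, Scand. J. Statist.
[4] A. Azzalini, A. Capitanio, Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution, J. Roy. Statist. Soc. Ser. B.
[5] A. Azzalini, A. Dalla Valle, The multivariate skew-normal distribution, Biometrika.
[6] K.E. Basford, D.R. Greenway, G.J. McLachlan, D. Peel, Standard errors of fitted means under normal mixture, Comput. Statist.
[7] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. Roy. Statist. Soc. Ser. B.
[8] B. Efron, R. Tibshirani, Bootstrap method for standard errors, confidence intervals, and other measures of statistical accuracy, Statist. Sci.
[9] B. Flury, H. Riedwyl, Multivariate Statistics, a Practical Approach, Cambridge University Press, Cambridge, 1988.
[10] S. Frühwirth-Schnatter, Finite Mixture and Markov Switching Models, Springer, New York, 2006.
[11] J.A. Hartigan, M.A. Wong, Algorithm AS 136: A K-means clustering algorithm, Appl. Stat.
[12] A. Genz, Numerical computation of multivariate normal probabilities, J. Comput. Graph. Statist.
[13] A. Genz, Comparison of methods for the computation of multivariate normal probabilities, Comput. Sci. Statist.
[14] A.K. Gupta, G. González-Farías, J.A. Domínguez-Molina, A multivariate skew normal distribution, J. Multivariate Anal.
[15] G.P.H. Styan, Hadamard products and multivariate statistical analysis, Linear Algebra Appl.
[16] T.I. Lin, J.C. Lee, W.J. Hsieh, Robust mixture modeling using the skew t distribution, Statist. Comput.
[17] T.I. Lin, J.C. Lee, S.Y. Yen, Finite mixture modelling using the skew normal distribution, Statist. Sinica.
[18] Y. Ma, M.G. Genton, Flexible class of skew-symmetric distributions, Scand. J. Statist.
[19] G.J. McLachlan, K.E. Basford, Mixture Models: Inference and Application to Clustering, Marcel Dekker, New York, 1988.
[20] G.J. McLachlan, D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[21] R.A. Redner, H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev.
[22] S.K. Sahu, D.K. Dey, M.D. Branco, A new class of multivariate skew distributions with application to Bayesian regression models, Canad. J. Statist.
[23] G.M. Tallis, The moment generating function of the truncated multi-normal distribution, J. Roy. Statist. Soc. Ser. B.
[24] D.M. Titterington, A.F.M. Smith, U.E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, New York, 1985.
[25] C.F.J. Wu, On the convergence properties of the EM algorithm, Ann. Statist.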
