Maximum Likeliood Estimation fo Finite Mixtues of Canonical Fundamental Skew t-distibutions: te Unification of te Unesticted and Resticted Skew t-mixtue Models axiv:1401.818v1 [stat.me] 31 Jan 014 Saon X. Lee, Geoffey J. McLaclan Depatment of matematics, te Univesity of Queensland, Bisbane, Austalia Abstact In tis pape, we pesent an algoitm fo te fitting of a location-scale vaiant of te canonical fundamental skew t CFUST distibution, a supeclass of te esticted and unesticted skew t-distibutions. In ecent yeas, a few vesions of te multivaiate skew t MST model ave been put fowad, togete wit vaious EM-type algoitms fo paamete estimation. Tese fomulations adopted eite a esticted o unesticted caacteization fo tei MST densities. In tis pape, we examine a natual genealization of tese developments, employing te CFUST distibution as te paametic family fo te component distibutions, and point out tat te esticted and unesticted caacteizations can be unified unde tis geneal fomulation. We sow tat an exact implementation of te EM algoitm can be acieved fo te CFUST distibution and mixtues of tis distibution, and pesent some new analytical esults fo a conditional expectation involved in te E-step. 1 Intoduction To date, vaious paametic families of densities ave been poposed in te liteatue to epesent te multivaiate skew t MST model, all aving eite a esticted o unesticted caacteization accoding to te classification sceme pesented in Lee and McLaclan 013b. In paticula, te esticted MST MST mixtue model o its equivalent vaiants by epaameteization ave been examined in Pyne et al. 009, Cabal et al. 01 and Vbik and McNicolas 01, wile Lin 010 and Lee and McLaclan 011 focused on te unesticted MST umst mixtue model adopting te caacteization poposed by Sau et al. 003. In addition, Muay et al. 013 ecently consideed an altenative constuction tat also adopted te tem skew t-distibution, altoug efeing to a quite diffeent fomulation moe commonly known as te genealized-ypebolic skew t GHST mixtue model. Altoug tis latte distibution is a esticted type of skew distibution unde te agument tat te skewing vaiable is a scala, it diffes fom te MST distibution in a numbe ways, suc as aving diffeent beaviou in its tails and not becoming a skew nomal distibution as a limiting and/o special case. In tis pape, we conside a natual and staigtfowad extension of te umst distibution, wee te skewing function of its density is given by a q-dimensional t-distibution function, 1
and te skewness paamete is a geneal p by q matix. Tis fomulation coincides wit a special case of te unified skew t SUT distibution Aellano-Valle and Azzalini, 006 and te fundamental skew t FUST distibution Aellano-Valle and Genton, 005, and is known as a location-scale vaiant of te canonical fundamental skew t CFUST distibution. Tis caacteization includes te esticted fom of te MST distibution. As discussed in Lee and McLaclan 013a, wile exact implementation of te Expectation- Maximization EM algoitm ave been acievable fo esticted caacteizations of te component skew t-distibutions, paamete estimation fo te unesticted and moe geneal caacteizations is substantially moe inticate. In pevious wok on mixtues of unesticted skew t-distibutions using Sau et al. 003 s caacteization a submodel of te CFUST distibution consideed in tis pape, Lin 010 esoted to Monte Calo metods fo te computation of intactable conditional expectations involved in te E-step, wile Lee and McLaclan 013a adopted te OSL one-step late appoac to evaluating one of tese difficult conditional expectation. In contast fo te MST distibution, a numbe of closed-fom implementations of te EM algoitm ave been put fowad see Lee and McLaclan 013a fo fute discussion. Tis pape pesents an EM fitting algoitm fo te FM-CFUST model. Unde tis geneal fomulation, te afoementioned esticted and unesticted skew t-mixtue models can be obtained as special cases and, as suc, te expessions on te E- and M-steps of te poposed algoitm would apply diectly in tese situations. In contast to te appoac of Lin 010 wic elies on computationally intensive numeical appoximations, ou development was motivated by te desie fo an exact implementation of te EM algoitm using te tuncated moments appoac Lee and McLaclan, 013a, Ho et al., 01, leading to muc faste and moe accuate esults. Also, we deive a new exact expession fo te emaining E-step conditional expectation tat ad peviously elied on appoximation o numeical metods. Te CFUST distibution Te multivaiate canonical fundamental skew t-distibution CFUST was intoduced by Aellano-Valle and Genton 005. A location-scale vaiant of te CFUST distibution can be caacteized as follows. Let U 1 be p-dimensional andom vecto and U 0 a q-dimensional andom vecto. Suppose tat, conditional on a scala gamma vaiable w gamma,, te joint distibution of U 0 and U 1 is given by [ ] [ ] U 0 0 N U q+p, 1 [ ] Iq 0, 1 1 0 w 0 Σ wee is a scala, I q denotes te q q identity matix, Σ is a positive definite scale matix, and 0 is a vecto/matix of zeos wit appopiate dimensions. Ten Y µ + U 0 + U 1 follows te CFUST distibution, wose density is defined by f y; µ, Σ,, q t p y; µ, Ω, T q qy ; 0, Λ,, 3 + dy wee Ω Σ + T, q T Ω 1 y µ, Λ I p T Ω 1, dy y µ T Ω 1 y µ.
Hee, we let t p y; µ, Ω, denotes te p-dimensional t-distibution wit location paamete µ, scale matix Ω, and degee of feedom v, and T q. is te q-dimensional cumulative t- distibution function. In te above, U 0 denotes te vecto wose it element is te magnitude of te it element of te vecto U 0..1 Te unesticted caacteization as a special case As mentioned peviously, te caacteization of Sau et al. 003 coesponds to te CFUST distibution wit q p and diagδ. Hence, we can wite Λ I p Σ 1, q Ω 1 y µ and Ω Σ +, and te umst density is given by f y; µ, Σ, δ, p t p y; µ, Ω, T p qy ; 0, Λ, + dy. 4 Te umst distibution as te same stocastic epesentation as given by and 1 wit q eplaced by p.. Te esticted caacteization as a special case In te esticted caacteization of te MST distibution, te convolution-type stocastic epesentation is given by Y µ + δ U 0 + U 1 5 wee [ U0 U 1 ] [ 0 N 1+p 0 ], 1 [ 1 0 w 0 Σ ]. 6 It follows tat te density of te MST distibution is given by + dy 1 fy t p y; µ, Ω, T 1 δ T Ω 1 y µ, δ T Ω 1 δ,. 7 On diectly compaing 5 and, it can be obseved tat te skewing effect of δ U 0 in te MST distibution can be acieved by te CFUST fomulation in a numbe of ways, including taking diagδ and in te unesticted case, and setting U 0 U 0 1 p in, wee 1 p denotes a p-dimensional vecto wit elements one; constaining te skewness matix to be a p p zeo matix except fo one column wic is given by δ, and setting te coesponding element in U 0 to be U 0 ; o setting q 1. As a coollay, it can be seen tat te special case of q p of te CFUST distibution encompass bot te MST and umst distibution toug stuctual constaints on te skewness paamete. Tis is given by taking δ 1 0... 0 δ 1 0... 0 δ 0... 0...... and 0 δ... 0.... 8.. δ p 0... 0 0 0... δ p fo te MST and umst distibutions espectively, wee δ δ 1, δ,..., δ p T. Note tat fo te MST case, δ can be placed in any one of te columns of, not necessaily te fist column as given in 8. 3
3 Paamete estimation via te EM algoitm We ae inteested in te finite mixtues wit component densities given by 3. A g-component mixtue of CFUST distibutions as density fy; Ψ g π fy; µ, Σ,,, 9 1 wee π 1,..., g ae te mixing popotions and f. denotes a CFUST density given by 3. We sall adopt te aconym FM-CFUST fo 9. Te vecto Ψ π 1,..., π g 1, θ T 1,..., θ T g contains all te unknown paametes of te model, wit θ containing te elements of µ and δ, te distinct elements of Σ, and. Simila to te MST and umst distibutions, te CFUST admits a convenient ieacical epesentation tat geatly facilitates paamete estimation via te EM algoitm: Y U, w N p µ + u, 1 w Σ U w HN q 0, 1 w I m w gamma, 10 wee N p µ, Σ denotes te multivaiate nomal distibution and HN q 0, Σ epesents te q-dimensional alf-nomal distibution wit mean 0 and scale matix Σ. Te EM algoitm fo FM-CFUST ten poceeds in a simila way as fo te FM-uMST model as descibed in Lee and McLaclan 013a. 3.1 E-Step In te E-step, te following conditional expectations ae computed: z k [ ] j E Ψ k zj 1 y j w k [ j E Ψ k wj y j, z j 1 ], e k [ 1j E Ψ k logwj y j, z j 1 ], e k [ j E Ψ k wj u j y j, z j 1 ], e k [ 3j E Ψ k wj u j u T j, y j, z j 1 ]. 4
Tey ae given by j π fy j ; µ k, Σk fy j ; Ψ k z k w k j k + dk y j, δk, k Tq q k T q q k k e k 1j w k j log + d k y j e k,j w k j E Ψ k[u j y j ],, +p+ dk y j +p dk + 0, Λk ;, k y j 0, Λk ; + d k y j +,, k ψ e k 3j w k j E Ψ k[u ju T j y j ], 1 wee U j y j as a q-dimensional tuncated t-distibution given by U j y j tt q q k k, + d y j + 11 Λ k, k + ; R +. 13 Note tat 11 is obtained using a OSL EM algoitm. Te deivations fo tese esults ae analogous to Lee and McLaclan 013a. Altenatively, a seies tuncation appoac can be used in place of 11, wic exploits an exact epesentation of tis conditional expectation, will be pesented late. 3. M-Step On te k + 1t M-step, te model paametes ae updated wit π k+1 1 n µ k+1 k+1 Σ k+1 n j1 z k j, n j1 z jw k j y j k n j1 z k n j wk j [ n j1 [ n j1 z k j z k j y j µ k+1 ] 1 { n y j µ k+1 1 j1 [ z k j n j1 zk j ek j e kt j w k j ] [ n e kt j k+1t j1, y j µ k+1 z k j ek 3j] 1, 14 y j µ k+1 T k+1 e k j y j µ k+1 ]} + k+1 e kt 3j k+1t. 15 An update of te degees of feedom is obtained by solving fo : n [ 0 z k ] n j log ψ + 1 e k 1j wk j. j1 T 5
3.3 Te MST and umst as special cases As pointed out peviously, te MST and umst distibutions can be obtained as special cases of te CFUST distibution by imposing stuctual constaints on te skewness paamete. In te case of te umst distibution, te diagonal elements of te component skewness matices ae estimated as δ k+1 Σ k 1 n j1 τ k j ek 4,j 1 diag [ Σ k 1 n j1 τ k j y j µ k+1 e kt 3,j ], 16 wee denotes te elementwise poduct. Unde te constaint tat is zeo except fo te fist column δ, tat is, [δ 0... 0], 14 educes to estimating δ k+1 n j1 τ k j [ek 3,j ] 1y j µ k+1 j1 τ k j [e, 17 4,j] 11 wee [e k 3,j ] 1 and [e k 4,j ] 11 denote te fist element of e k 3,j and ek 4,j, espectively. On compaing te above wit te coesponding expession fo te MST mixtue model see, fo example, equation 9 fom Wang et al. 009, it can obseved tat te two expessions ae indeed equivalent. 3.4 New esults on e k 1,j : te tuncated seies appoac We now pesent te deivation of a new exact expession fo e k 1,j, expessed in tems of an infinite seies involving multivaiate t-distibution functions. By te law of iteated expectations, e k [ ] 1,j E Ψ k logwj y j [ [ ] ] E Ψ k E Ψ k logwj u j, y j yj [ k + d k E Ψ k ψ log y j + u j q k j T Λ k 1 u j q k j ] y j k + d k ψ log y [ j E Ψ k log 1 + u j q k j T Λ k 1 u j q k + d k y j To calculate te last tem in 18, note tat te conditional density of u j given y j can be witten as t q u j ; q k j, +dk y j Λ k fu j y j +p, k T q q k j ; 0, +dk y. 19 j Λ k, k +p Afte some algebaic manipulations, it follows tat te expectation in 18 can be educed to a simple closed-fom expession see equation 76 fom Lee and McLaclan 013a, except j ]. 18 6
fo a tem involving an integal given by S k 1,j 0 log 1 + u j q k j T Λ k 1 u j q k j + d k y j k t q u j ; q k j, + d k y j Λ k, k du j 1 s 1 1 + u j q k j T Λ k 1 u j q k j s s 1 s0 0 + d k y j k t q u j ; q k j, + d k y j Λ k, k du j 0 1 s 1 s +p k +p+q s 1 s0 k +p k +p+q k 0 t q u j ; q k j, + d k y j Λ k, k du j 1 k +p+q 1 s 1 s +p k +p s 1 s0 k +p+q k T q q k j ; 0, + d k y j Λ k, k In 0 above, we ave applied a Taylo expansion of logx fo 0 < x < and te binomial expansion fomula to obtain a seies expession fo logx, given by logx 1 s0 1 s 1 s x s, fo 0 < x <. Note tat te tem inside te logaitm in 1 is always geate tan one. Hence, its invese lies between 0 and 1, and we can apply te above seies epesentation. In pactice, S k 1,j can be appoximated by a small finite numbe of tems. 7
It follows tat e k 1,j e k 1,j ψ is given by + log +p+q 1 s0 T q 4 Conclusions +p q k T 1 q 1 s 1 k q k s k j ; 0, + d k y j + d k y j k j ; 0, + d k y j s +p k +p+q Λ k, k Λ k, k. 3 We ave intoduced a finite mixtue of CFUST distibutions, wee te component densities ae te genealization of te esticted and unesticted skew t-distibutions. Paamete estimation via te EM algoitm as been outlined, following te tuncated moments appoac in Ho et al. 01 and Lee and McLaclan 013a. Futemoe, new esults ave been deived fo conditional expectation w k j, exploiting an infinite seies epesentation of te logaitm function. Tis leads to a full implementation of te EM algoitm fo te FM-CFUST model wit exact and closed-fom expessions fo all E-step conditional expectations. Simulations and applications of te FM-CFUST model, as well as extension to facto analyzes, will be teated in a fotcoming manuscipt. Refeences Aellano-Valle RB, Azzalini A 006 On te unification of families of skew-nomal distibutions. Scandinavian Jounal of Statistics 33:561 574 Aellano-Valle RB, Genton MG 005 On fundamental skew distibtuions. Jounal of Multivaiate Analysis 96:93 116 Cabal CS, Lacos VH, Pates MO 01 Multivaiate mixtue modeling using skew-nomal independent distibutions. Computational Statistics and Data Analysis 56:16 14 Ho HJ, Lin TI, Cen HY, Wang WL 01 Some esults on te tuncated multivaiate t distibution. Jounal of Statistical Planning and Infeence 14:5 40 Lee S, McLaclan GJ 011 On te fitting of mixtues of multivaiate skew t-distibutions via te EM algoitm. axiv:11094706 [statme] Lee S, McLaclan GJ 013a Finite mixtues of multivaiate skew t-distibutions: some ecent and new esults. Statistics and Computing DOI 10.1007/s11-01-936-4 8
Lee SX, McLaclan GJ 013b On mixtues of skew-nomal and skew t-distibutions. Advances in Data Analysis and Classification 7:41 66 Lin TI 010 Robust mixtue modeling using multivaiate skew t distibution. Statistics and Computing 0:343 356 Muay PM, Bowne BP, McNicolas PD 013 Mixtues of skew-t facto analyzes. Pepint axiv: 13054301 [statme] Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maie LM, Baece-Allan C, McLaclan GJ, Tamayo P, Hafle DA, De Jage PL, Mesiow JP 009 Automated ig-dimensional flow cytometic data analysis. Poceedings of te National Academy of Sciences USA 106:8519 854 Sau SK, Dey DK, Banco MD 003 A new class of multivaiate skew distibutions wit applications to Bayesian egession models. Te Canadian Jounal of Statistics 31:19 150 Vbik I, McNicolas PD 01 Analytic calculations fo te EM algoitm fo multivaiate skew t-mixtue models. Statistics and Pobability Lettes 8:1169 1174 Wang K, Ng SK, McLaclan GJ 009 Multivaiate skew t mixtue models: applications to fluoescence-activated cell soting data. In: Si H, Zang Y, Bottema MJ, Lovell BC, Maede AJ eds DICTA 009 Confeence of Digital Image Computing: Tecniques and Applications, Melboune, IEEE Compute Society, Los Alamitos, Califonia, pp 56 531 9