Robust mixture modeling using multivariate skew t distributions

Tsung-I Lin
Department of Applied Mathematics and Institute of Statistics
National Chung Hsing University (NCHU), Taiwan
August, 1

OUTLINE
1 Introduction
2 Preliminaries
   The multivariate skew t (MST) distribution
3 The multivariate skew t mixture model
   Model formulation and estimation
4 Example: The AIS data
5 Concluding Remarks

1. INTRODUCTION
Finite mixture models have become a useful tool for modeling data that are thought to come from several different groups with varying proportions.
Lin et al. (2007) proposed a novel (univariate) skew t mixture (STMIX) model, which accommodates both skewness and heavy tails for making robust inferences.
Drawback: the STMIX model is limited to data with univariate outcomes.
We propose a multivariate version of the STMIX (MSTMIX) model, composed of a weighted sum of g multivariate skew t (MST) components.

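Preliminaries: The multivariate skew t (MST) distribution

The MST distribution, $Y \sim St_p(\xi, \Sigma, \Lambda, \nu)$, has the stochastic representation

$$ Y = \xi + \frac{Z}{\sqrt{\tau}}, \qquad Z \sim SN_p(0, \Sigma, \Lambda), \qquad \tau \sim \Gamma(\nu/2, \nu/2), \qquad Z \perp \tau, \tag{1} $$

so that $Y \mid \tau \sim SN_p(\xi, \Sigma/\tau, \Lambda/\sqrt{\tau})$. Here $\Gamma(\alpha, \beta)$ denotes the gamma distribution with shape $\alpha$ and rate $\beta$.

Proposition 1. If $\tau \sim \Gamma(\alpha, \beta)$, then for any $a \in \mathbb{R}^p$,

$$ E\big[\Phi_p(a\sqrt{\tau})\big] = T_p\big(a\sqrt{\alpha/\beta};\ 2\alpha\big), $$

where $\Phi_p(\cdot)$ is the cdf of $N_p(0, I_p)$ and $T_p(\cdot;\ \nu)$ is the cdf of the standard p-variate t distribution with $\nu$ degrees of freedom.

Integrating $\tau$ out of the joint density of $(Y, \tau)$ yields

$$ \psi(y \mid \xi, \Sigma, \Lambda, \nu) = 2^p\, t_p(y \mid \xi, \Omega, \nu)\, T_p\!\left(q\sqrt{\frac{\nu + p}{\nu + U}};\ \Delta,\ \nu + p\right), \tag{2} $$

where $q = \Lambda\Omega^{-1}(y - \xi)$, $U = (y - \xi)^\top\Omega^{-1}(y - \xi)$, $\Omega = \Sigma + \Lambda^2$, $\Delta = I_p - \Lambda\Omega^{-1}\Lambda$, and $T_p(\cdot;\ \Delta,\ \nu + p)$ is the cdf of $t_p(0, \Delta, \nu + p)$.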
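As a numerical sanity check, the sketch below specialises Proposition 1 and density (2) to p = 1, where $\Phi_1$ and $T_1$ are ordinary normal and Student-t cdfs in SciPy. It assumes the shape/rate parameterisation of $\Gamma(\alpha, \beta)$; `prop1_sides` and `mst_pdf_1d` are hypothetical helper names, not code from the talk, and the general p-variate case (which needs a multivariate t cdf) is not attempted.

```python
import numpy as np
from scipy import integrate, stats


def prop1_sides(a, alpha, beta):
    """Both sides of Proposition 1 for p = 1 (hypothetical helper).

    Left:  E[Phi(a * sqrt(tau))] with tau ~ Gamma(shape=alpha, rate=beta),
           computed by numerical integration over tau.
    Right: T(a * sqrt(alpha / beta); 2 * alpha), a Student-t cdf.
    """
    lhs, _ = integrate.quad(
        lambda tau: stats.norm.cdf(a * np.sqrt(tau))
        * stats.gamma.pdf(tau, a=alpha, scale=1.0 / beta),
        0.0,
        np.inf,
    )
    rhs = stats.t.cdf(a * np.sqrt(alpha / beta), df=2.0 * alpha)
    return lhs, rhs


def mst_pdf_1d(y, xi, sigma2, lam, nu):
    """Density (2) specialised to p = 1 (hypothetical helper).

    Omega = sigma2 + lam**2, Delta = 1 - lam**2 / Omega,
    q = lam * (y - xi) / Omega, U = (y - xi)**2 / Omega.
    """
    omega = sigma2 + lam**2
    delta = 1.0 - lam**2 / omega
    d = np.asarray(y, dtype=float) - xi
    q = lam * d / omega
    u = d**2 / omega
    # t_1(y | xi, Omega, nu): Student-t density with location xi and scale sqrt(Omega).
    t_dens = stats.t.pdf(d / np.sqrt(omega), df=nu) / np.sqrt(omega)
    # T_1(x; Delta, nu + 1) is the cdf of a t(nu + 1) variable with scale sqrt(Delta).
    arg = q * np.sqrt((nu + 1.0) / (nu + u)) / np.sqrt(delta)
    return 2.0 * t_dens * stats.t.cdf(arg, df=nu + 1.0)


lhs, rhs = prop1_sides(a=0.7, alpha=3.0, beta=2.0)
total, _ = integrate.quad(lambda y: mst_pdf_1d(y, 0.0, 1.0, 2.0, 4.0), -np.inf, np.inf)
print(lhs, rhs, total)  # lhs and rhs agree; total is close to 1
```

Setting `lam = 0` collapses the skewing factor to $2 \times 0.5 = 1$, so the density reduces to the symmetric Student-t, which is a quick check of the specialisation.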

[Figure 1: The scatter plots and contours, together with their histograms, of bivariate MST distributions with $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ and $\lambda = (\lambda_1, \lambda_2)^\top$, shown for twelve settings of $(\rho, \lambda_1, \lambda_2)$.]

The multivariate skew t mixture model: Model formulation and estimation
The MSTMIX model

$$ f(y_j \mid \Theta) = \sum_{i=1}^{g} w_i\, \psi(y_j \mid \xi_i, \Sigma_i, \Lambda_i, \nu_i), \tag{3} $$

where $\psi(y_j \mid \xi_i, \Sigma_i, \Lambda_i, \nu_i)$ is the MST density and the $w_i$'s are mixing probabilities satisfying $\sum_{i=1}^{g} w_i = 1$.

Introduce allocation variables $Z_j = (Z_{1j}, \ldots, Z_{gj})$, $j = 1, \ldots, n$, whose components are binary with

$$ Z_{ij} = \begin{cases} 1 & \text{if } Y_j \text{ belongs to group } i, \\ 0 & \text{otherwise}, \end{cases} $$

and satisfy $\sum_{i=1}^{g} Z_{ij} = 1$. That is, $Z_j \sim M(1; w_1, \ldots, w_g)$, a multinomial distribution with one trial.
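To make (3) concrete, the sketch below evaluates the mixture density on the log scale and the allocation probabilities $P(Z_{ij} = 1 \mid y_j) = w_i\psi_{ij} / \sum_l w_l\psi_{lj}$ (Bayes' rule) from a matrix of component log-densities. The helper name `mixture_logpdf_and_zhat` is hypothetical, and normal components stand in for the MST densities purely to keep the example short.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def mixture_logpdf_and_zhat(log_comp, w):
    """Evaluate (3) on the log scale and the allocation probabilities.

    log_comp[i, j] = log psi(y_j | theta_i) for g components and n points;
    w: (g,) mixing probabilities summing to one. Returns log f(y_j | Theta)
    and z_hat[i, j] = w_i psi_ij / sum_l w_l psi_lj = P(Z_ij = 1 | y_j).
    """
    log_comp = np.asarray(log_comp, dtype=float)
    w = np.asarray(w, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("mixing probabilities must sum to one")
    weighted = np.log(w)[:, None] + log_comp   # log(w_i * psi_ij)
    log_f = logsumexp(weighted, axis=0)        # log sum_i w_i * psi_ij, overflow-safe
    z_hat = np.exp(weighted - log_f)           # each column sums to one
    return log_f, z_hat


# Usage with normal components standing in for the MST densities.
y = np.array([-1.0, 0.5, 3.0])
log_comp = np.vstack([stats.norm.logpdf(y, 0.0, 1.0), stats.norm.logpdf(y, 3.0, 1.0)])
log_f, z_hat = mixture_logpdf_and_zhat(log_comp, [0.4, 0.6])
```

Working with log-densities and `logsumexp` avoids underflow when heavy-tailed components assign very small densities to distant points.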

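A hierarchical representation of (3) is

$$ \begin{aligned} Y_j \mid (\gamma_j, \tau_j, Z_{ij} = 1) &\sim N_p(\xi_i + \Lambda_i\gamma_j,\ \Sigma_i/\tau_j), \\ \gamma_j \mid (\tau_j, Z_{ij} = 1) &\sim HN_p(0,\ I_p/\tau_j), \\ \tau_j \mid (Z_{ij} = 1) &\sim \Gamma(\nu_i/2,\ \nu_i/2), \\ Z_j &\sim M(1; w_1, \ldots, w_g). \end{aligned} \tag{4} $$

The complete-data log-likelihood function of $\Theta$ is, up to an additive constant,

$$ \ell_c(\Theta \mid y, \gamma, \tau, Z) = \sum_{i=1}^{g}\sum_{j=1}^{n} Z_{ij}\Big\{ \log w_i + \frac{\nu_i}{2}\log\Big(\frac{\nu_i}{2}\Big) - \log\Gamma\Big(\frac{\nu_i}{2}\Big) - \frac{1}{2}\log|\Sigma_i| + \Big(\frac{\nu_i + 2p}{2} - 1\Big)\log\tau_j - \frac{\tau_j}{2}\Big[ (y_j - \xi_i - \Lambda_i\gamma_j)^\top\Sigma_i^{-1}(y_j - \xi_i - \Lambda_i\gamma_j) + \nu_i + \gamma_j^\top\gamma_j \Big] \Big\}. $$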
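The hierarchical representation doubles as a simulation recipe: draw the label, then $\tau_j$, then the half-normal $\gamma_j$, then $Y_j$. The sketch below follows that order with NumPy's `Generator` API, assuming each $\Lambda_i$ is diagonal (stored as the vector of its diagonal). `sample_mstmix` and `component_mean` are hypothetical helpers; the latter uses $E(\gamma_j) = \sqrt{2/\pi}\,E(\tau_j^{-1/2})\,\mathbf{1}_p$ with $E(\tau^{-1/2}) = \sqrt{\nu/2}\,\Gamma((\nu-1)/2)/\Gamma(\nu/2)$ for $\tau \sim \Gamma(\nu/2, \nu/2)$.

```python
import math

import numpy as np


def sample_mstmix(n, w, xi, Sigma, lam, nu, rng=None):
    """Simulate n draws from an MSTMIX model through its hierarchical form.

    w: (g,) weights; xi: (g, p) locations; Sigma: (g, p, p) scale matrices;
    lam: (g, p) diagonals of Lambda_i; nu: (g,) degrees of freedom.
    Returns y with shape (n, p) and integer labels with shape (n,).
    """
    rng = np.random.default_rng(rng)
    w, xi, Sigma, lam, nu = (np.asarray(a, dtype=float) for a in (w, xi, Sigma, lam, nu))
    g, p = xi.shape
    labels = rng.choice(g, size=n, p=w)                                # Z_j ~ M(1; w)
    tau = rng.gamma(shape=nu[labels] / 2.0, scale=2.0 / nu[labels])    # Gamma(nu/2, rate nu/2)
    gam = np.abs(rng.standard_normal((n, p))) / np.sqrt(tau)[:, None]  # HN_p(0, I_p / tau)
    chol = np.linalg.cholesky(Sigma)                                   # one factor per component
    eps = np.einsum("nab,nb->na", chol[labels], rng.standard_normal((n, p)))
    # Y_j | (gamma_j, tau_j) ~ N_p(xi_i + Lambda_i gamma_j, Sigma_i / tau_j)
    y = xi[labels] + lam[labels] * gam + eps / np.sqrt(tau)[:, None]
    return y, labels


def component_mean(xi_i, lam_i, nu_i):
    """E(Y_j | Z_ij = 1) = xi_i + sqrt(2/pi) * E(tau^{-1/2}) * lam_i, valid for nu_i > 1."""
    e_inv_sqrt_tau = math.sqrt(nu_i / 2.0) * math.exp(
        math.lgamma((nu_i - 1.0) / 2.0) - math.lgamma(nu_i / 2.0)
    )
    return np.asarray(xi_i, dtype=float) + math.sqrt(2.0 / math.pi) * e_inv_sqrt_tau * np.asarray(lam_i, dtype=float)
```

Comparing within-label sample means with `component_mean` on a large simulated sample is a cheap way to catch parameterisation slips (for example, rate versus scale in the gamma draw).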

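Computational aspects of parameter estimation

The Q-function is

$$ Q(\Theta \mid \hat\Theta^{(k)}) = E\big[\ell_c(\Theta \mid y, \gamma, \tau, Z) \mid y, \hat\Theta^{(k)}\big]. $$

In the MCEM-based algorithm, the Q-function can be approximated by

$$ \hat Q(\Theta \mid \hat\Theta^{(k)}) = \frac{1}{M}\sum_{m=1}^{M} \ell_c\big(\Theta \mid y, \hat\gamma^{(k)}_{[m]}, \hat\tau^{(k)}_{[m]}, Z\big), \tag{5} $$

where $\hat\gamma^{(k)}_{[m]} = \{\hat\gamma^{(k)}_{j,m}\}$ and $\hat\tau^{(k)}_{[m]} = \{\hat\tau^{(k)}_{j,m}\}$ are independently generated from

$$ \hat\gamma^{(k)}_{j,m} \mid (y_j, Z_{ij} = 1) \sim TT_p\Big(\hat q^{(k)}_{ij},\ \frac{\hat\nu_i^{(k)} + \hat U^{(k)}_{ij}}{\hat\nu_i^{(k)} + p}\,\hat\Delta_i^{(k)},\ \hat\nu_i^{(k)} + p;\ \mathbb{R}_+^p\Big), $$

$$ \hat\tau^{(k)}_{j,m} \mid (\hat\gamma^{(k)}_{j,m}, y_j, Z_{ij} = 1) \sim \Gamma\Big(\frac{\hat\nu_i^{(k)} + 2p}{2},\ \frac{(\hat\gamma^{(k)}_{j,m} - \hat q^{(k)}_{ij})^\top(\hat\Delta_i^{(k)})^{-1}(\hat\gamma^{(k)}_{j,m} - \hat q^{(k)}_{ij}) + \hat U^{(k)}_{ij} + \hat\nu_i^{(k)}}{2}\Big). $$

Here $TT_p(\cdot;\ \mathbb{R}_+^p)$ denotes a p-variate t distribution truncated to the positive orthant, and $\hat q^{(k)}_{ij}$, $\hat U^{(k)}_{ij}$ and $\hat\Delta^{(k)}_i$ are $q$, $U$ and $\Delta$ evaluated at $y_j$ and $\hat\Theta^{(k)}$.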
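A minimal sketch of this Monte Carlo E-step for a single observation and component, specialised to p = 1 so that the truncated t can be drawn by inverse-cdf sampling (a p-variate version would need a different sampler, for example Gibbs sampling; that choice is an assumption made here, not the talk's). `mc_estep_1d` is a hypothetical helper; it returns Monte Carlo averages of $\tau$, $\tau\gamma$, $\tau\gamma^2$ and $\log\tau$.

```python
import numpy as np
from scipy import stats


def mc_estep_1d(y, xi, sigma2, lam, nu, M, rng=None):
    """Monte Carlo E-step for one observation and one component when p = 1.

    Draws gamma_m | y ~ TT_1(q, (nu + U) / (nu + 1) * Delta, nu + 1; (0, inf)) and
    tau_m | (gamma_m, y) ~ Gamma((nu + 2) / 2, rate=((gamma_m - q)**2 / Delta + U + nu) / 2),
    then returns Monte Carlo averages of tau, tau*gamma, tau*gamma**2 and log(tau).
    """
    rng = np.random.default_rng(rng)
    omega = sigma2 + lam**2
    delta = 1.0 - lam**2 / omega
    q = lam * (y - xi) / omega
    u = (y - xi) ** 2 / omega
    scale = np.sqrt((nu + u) / (nu + 1.0) * delta)
    # Inverse-cdf sampling of a t(nu + 1) restricted to gamma = q + scale * T > 0.
    lower = stats.t.cdf(-q / scale, df=nu + 1.0)
    v = rng.uniform(lower, 1.0, size=M)
    gam = q + scale * stats.t.ppf(v, df=nu + 1.0)
    rate = ((gam - q) ** 2 / delta + u + nu) / 2.0
    tau = rng.gamma(shape=(nu + 2.0) / 2.0, scale=1.0 / rate)
    return {
        "tau": tau.mean(),
        "eta": (tau * gam).mean(),
        "B": (tau * gam**2).mean(),
        "kappa": np.log(tau).mean(),
    }


out = mc_estep_1d(y=1.2, xi=0.0, sigma2=1.0, lam=2.0, nu=5.0, M=400_000, rng=3)
```

For p = 1 the average of $\tau$ can be cross-checked in closed form: integrating $\gamma$ out gives $\tau \mid y \propto \Gamma(\tau; a, b)\,\Phi(c\sqrt{\tau})$ with $a = (\nu+1)/2$, $b = (\nu+U)/2$, $c = q/\sqrt{\Delta}$, and Proposition 1 then yields $E(\tau \mid y) = (a/b)\,T(c\sqrt{(a+1)/b};\ 2a+2)/T(c\sqrt{a/b};\ 2a)$.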

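The MCECM algorithm
[Diagram: starting from $\hat\theta^{(0)}$, each iteration runs an MCE-step, which replaces $Q(\theta \mid \hat\theta^{(k)})$, the conditional expectation of the complete-data log-likelihood $\ell_c(\theta \mid Y_c)$, by its Monte Carlo approximation $\hat Q(\theta \mid \hat\theta^{(k)})$, followed by CM-steps that maximize $\hat Q$ over $\theta = (\theta_1, \theta_2, \theta_3)$ one block at a time: $\theta_1$ with $(\hat\theta_2^{(k)}, \hat\theta_3^{(k)})$ fixed, $\theta_2$ with $(\hat\theta_1^{(k+1)}, \hat\theta_3^{(k)})$ fixed, and $\theta_3$ with $(\hat\theta_1^{(k+1)}, \hat\theta_2^{(k+1)})$ fixed, giving $\hat\theta^{(k+1)}$. Iterations stop when the stopping rule is met, and the final $\hat\theta$ is taken as the maximizer of the observed-data log-likelihood $\ell(\theta \mid Y_o)$.]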
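The iteration in the diagram can be sketched as a generic loop: an E-step (exact or Monte Carlo) produces expected statistics, then each CM-step maximizes over one block while the others stay at their latest values. The function `ecm` and the stopping rule (change in observed-data log-likelihood below `tol`) are assumptions for illustration. The toy problem is a two-component normal mixture with an exact E-step, chosen only because its ECM monotonicity is easy to verify; it is not the MSTMIX model.

```python
import numpy as np
from scipy import stats


def ecm(theta0, e_step, cm_steps, loglik, tol=1e-8, max_iter=500):
    """Generic (MC)ECM loop: one E-step, then CM-steps over parameter blocks in turn."""
    theta = dict(theta0)
    history = [loglik(theta)]
    for _ in range(max_iter):
        expected = e_step(theta)                 # expected sufficient statistics
        for cm in cm_steps:                      # later blocks see earlier updates
            theta.update(cm(theta, expected))
        history.append(loglik(theta))
        if abs(history[-1] - history[-2]) < tol:  # stopping rule
            break
    return theta, history


# Toy ECM instance (not the MSTMIX model): a two-component normal mixture.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 200)])


def weighted_dens(th):
    return th["w"][:, None] * stats.norm.pdf(y, th["mu"][:, None], th["sd"][:, None])


def loglik(th):
    return float(np.log(weighted_dens(th).sum(axis=0)).sum())


def e_step(th):
    dens = weighted_dens(th)
    return dens / dens.sum(axis=0)               # z_hat[i, j]


def cm_weights_means(th, z):                     # block 1: standard deviations held fixed
    nk = z.sum(axis=1)
    return {"w": nk / y.size, "mu": z @ y / nk}


def cm_sds(th, z):                               # block 2: uses the freshly updated means
    resid2 = (y[None, :] - th["mu"][:, None]) ** 2
    return {"sd": np.sqrt((z * resid2).sum(axis=1) / z.sum(axis=1))}


theta0 = {"w": np.array([0.5, 0.5]), "mu": np.array([-1.0, 1.0]), "sd": np.array([1.0, 1.0])}
theta_hat, hist = ecm(theta0, e_step, [cm_weights_means, cm_sds], loglik)
```

With an exact E-step the recorded log-likelihood never decreases; with a Monte Carlo E-step this holds only approximately, which is why MCEM stopping rules are usually applied with some slack.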

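CM-steps:

$$ \hat w_i^{(k+1)} = \frac{1}{n}\sum_{j=1}^{n}\hat z_{ij}^{(k)}, $$

$$ \hat\xi_i^{(k+1)} = \frac{\sum_{j=1}^{n}\hat z_{ij}^{(k)}\hat\tau_{ij}^{(k)} y_j - \hat\Lambda_i^{(k)}\sum_{j=1}^{n}\hat z_{ij}^{(k)}\hat\eta_{ij}^{(k)}}{\sum_{j=1}^{n}\hat z_{ij}^{(k)}\hat\tau_{ij}^{(k)}}, $$

$$ \hat\Lambda_i^{(k+1)} = \mathrm{Diag}(\hat\lambda_i^{(k+1)}), \qquad \hat\lambda_i^{(k+1)} = \Big[(\hat\Sigma_i^{(k)})^{-1} \odot \sum_{j=1}^{n}\hat z_{ij}^{(k)}\hat B_{ij}^{(k)}\Big]^{-1} \sum_{j=1}^{n}\hat z_{ij}^{(k)}\,\mathrm{Diag}(\hat\eta_{ij}^{(k)})\,(\hat\Sigma_i^{(k)})^{-1}(y_j - \hat\xi_i^{(k+1)}), $$

$$ \hat\Sigma_i^{(k+1)} = \frac{1}{\sum_{j=1}^{n}\hat z_{ij}^{(k)}}\sum_{j=1}^{n}\hat z_{ij}^{(k)}\Big[\hat\tau_{ij}^{(k)}(y_j - \hat\xi_i^{(k+1)})(y_j - \hat\xi_i^{(k+1)})^\top - \hat\Lambda_i^{(k+1)}\hat\eta_{ij}^{(k)}(y_j - \hat\xi_i^{(k+1)})^\top - (y_j - \hat\xi_i^{(k+1)})\hat\eta_{ij}^{(k)\top}\hat\Lambda_i^{(k+1)} + \hat\Lambda_i^{(k+1)}\hat B_{ij}^{(k)}\hat\Lambda_i^{(k+1)}\Big]. $$

Obtain $\hat\nu_i^{(k+1)}$ as the solution of

$$ \log\Big(\frac{\nu_i}{2}\Big) + 1 - \mathrm{DG}\Big(\frac{\nu_i}{2}\Big) + \frac{\sum_{j=1}^{n}\hat z_{ij}^{(k)}(\hat\kappa_{ij}^{(k)} - \hat\tau_{ij}^{(k)})}{\sum_{j=1}^{n}\hat z_{ij}^{(k)}} = 0, $$

where DG(·) is the digamma function, $\odot$ is the elementwise product, and Diag(·) forms a diagonal matrix from a vector. Here $\hat z_{ij}^{(k)} = E(Z_{ij} \mid y_j, \hat\Theta^{(k)})$, and $\hat\tau_{ij}^{(k)}$, $\hat\eta_{ij}^{(k)}$, $\hat B_{ij}^{(k)}$ and $\hat\kappa_{ij}^{(k)}$ are the Monte Carlo averages, over the draws in (5), of $\tau_j$, $\tau_j\gamma_j$, $\tau_j\gamma_j\gamma_j^\top$ and $\log\tau_j$ given $Z_{ij} = 1$.

If the dfs are assumed to be identical, update $\hat\nu^{(k)}$ by

$$ \hat\nu^{(k+1)} = \arg\max_{\nu} \sum_{j=1}^{n}\log\Big(\sum_{i=1}^{g}\hat w_i^{(k+1)}\,\psi(y_j \mid \hat\xi_i^{(k+1)}, \hat\Sigma_i^{(k+1)}, \hat\Lambda_i^{(k+1)}, \nu)\Big). $$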
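A sketch of one round of CM-steps for a single component, written from the complete-data log-likelihood $\ell_c$ under the hierarchical model with diagonal $\Lambda_i$. It assumes the weights $\hat z_{ij}$ and the Monte Carlo averages of $\tau_j$, $\tau_j\gamma_j$, $\tau_j\gamma_j\gamma_j^\top$ and $\log\tau_j$ are already available as arrays. `cm_steps_component` is a hypothetical name, and the root-finding bracket $[10^{-3}, 10^{3}]$ for $\nu$ is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma


def cm_steps_component(y, z, tau, eta, B, kappa, xi, lam, Sigma):
    """One round of CM-steps for component i (hypothetical helper).

    y: (n, p) data; z, tau, kappa: (n,) weights z_hat_ij and averages of tau_j, log tau_j;
    eta: (n, p) averages of tau_j * gamma_j; B: (n, p, p) averages of tau_j gamma_j gamma_j^T;
    xi, lam (diagonal of Lambda), Sigma: current values. Returns updated values.
    """
    n = y.shape[0]
    w_new = z.sum() / n
    # xi: z*tau weighted mean of y, corrected by Lambda^(k) sum_j z_j eta_j.
    xi_new = ((z * tau) @ y - lam * (z @ eta)) / (z * tau).sum()
    # lambda: solve [Sigma^{-1} o sum_j z_j B_j] lambda = sum_j z_j diag(eta_j) Sigma^{-1} (y_j - xi).
    Sinv = np.linalg.inv(Sigma)
    d = y - xi_new
    zB = np.einsum("j,jab->ab", z, B)
    lam_new = np.linalg.solve(Sinv * zB, (z[:, None] * eta * (d @ Sinv)).sum(axis=0))
    # Sigma: z-weighted average of E[tau (y - xi - Lambda gamma)(y - xi - Lambda gamma)^T].
    Le = eta * lam_new                                   # row j holds Lambda eta_j
    S = (np.einsum("j,ja,jb->ab", z * tau, d, d)
         - np.einsum("j,ja,jb->ab", z, Le, d)
         - np.einsum("j,ja,jb->ab", z, d, Le)
         + lam_new[:, None] * zB * lam_new[None, :])     # Lambda (sum z B) Lambda
    Sigma_new = S / z.sum()
    # nu: root of log(nu/2) + 1 - digamma(nu/2) + sum z (kappa - tau) / sum z = 0.
    c = 1.0 + (z @ (kappa - tau)) / z.sum()

    def nu_eq(nu):
        return np.log(nu / 2.0) - digamma(nu / 2.0) + c

    lo, hi = 1e-3, 1e3
    # log(x) - digamma(x) decreases from +inf to 0; if no sign change by hi, cap nu at hi.
    nu_new = brentq(nu_eq, lo, hi) if nu_eq(hi) < 0 else hi
    return w_new, xi_new, lam_new, Sigma_new, nu_new
```

When all $\tau$ averages equal one and the $\eta$ averages are zero, the $\xi$ and $\Sigma$ updates reduce to a weighted mean and weighted covariance, and $\lambda$ goes to zero; that limiting case is a useful unit test.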

Example: The Australian Institute of Sport (AIS) data
Data: The AIS data are taken from Cook and Weisberg (1994). There are 202 athletes, comprising 100 females and 102 males.
Variables: BMI (body mass index; kg/m²) and Bfat (body fat percentage).
[Scatter plot of BMI versus Bfat, with females and males marked separately.]

A two-component MSTMIX model can be written as

$$ f(y_j \mid \Theta) = w\,\psi(y_j \mid \xi_1, \Sigma_1, \Lambda_1, \nu_1) + (1 - w)\,\psi(y_j \mid \xi_2, \Sigma_2, \Lambda_2, \nu_2), $$

where

$$ \xi_i = (\xi_{i1}, \xi_{i2})^\top, \qquad \Sigma_i = \begin{pmatrix} \sigma_{i,11} & \sigma_{i,12} \\ \sigma_{i,12} & \sigma_{i,22} \end{pmatrix}, \qquad \Lambda_i = \begin{pmatrix} \lambda_{i,11} & 0 \\ 0 & \lambda_{i,22} \end{pmatrix}. $$

[Figure 2: Plot of the profile log-likelihood for $\nu_1$ and $\nu_2$ with a two-component MSTMIX model with (a) $\nu_1 = \nu_2 = \nu$, profile log-likelihood against $\nu$; (b) $\nu_1 \ne \nu_2$, profile log-likelihood surface over $\nu_1$ and $\nu_2$.]

Table 1: Summary results from fitting various mixture models to the AIS data. For each model (MVNMIX, MVTMIX, MSNMIX, MSTMIX) the MLE (mle) and its standard error (se) are reported; the rows give the parameters in Θ, the number of parameters m, the maximized log-likelihood ℓ(θ̂), AIC and BIC.
Θ MVNMIX MVTMIX MSNMIX MSTMIX mle se mle se mle se mle se w.39..7.58.51.6.7.65 ξ 11 3.19.3 3.373.8 1.998. 1.676.77 ξ 1 7.959.3 8.3 1.8 5.898.11 5.97.57 ξ 1.87.393.9.69 19.319.38 19.79.35 ξ 16.77.697 17.31.579 13.96 1.76 17.13 1.139 σ 1,11.878.7 3.791.873 3.178.988.73.39 σ 1,1 1.551.59.8.61.51.31.579.1 σ 1,.111.66 3.158.573.11.115.1.975 σ,11 1.971 1.68 5.66 1.98.765 1.55..533 σ,1.96.81 6.589 1.839 7.11.15 7.7 1.1 σ, 3.13.97.36 5.5.6 9.15 3.8.777 λ 1,11 1.163 3.3 1.615.36 λ 1, 3.13.565 3.17.139 λ,11.85.8.19 1.789 λ,.6 1.91.895 6.88 ν 5.8 1.66 11.1 5.7 m 11 1 15 16 l(ˆθ 197.79 193.585 18.67 177.76 AIC 17.581 11.17 191.93 187.51 BIC 53.97 5.87.917.53
$\mathrm{AIC} = -2\ell(\hat\theta) + 2m$; $\mathrm{BIC} = -2\ell(\hat\theta) + m\log(n)$, where $\ell(\hat\theta)$ is the maximized log-likelihood, $m$ is the number of parameters and $n$ is the sample size.
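The criteria in Table 1 follow directly from the footnote formulas once the parameter count m is known. The sketch below assumes the count structure $(g-1)$ weights, $gp$ locations, $gp(p+1)/2$ scale entries, $gp$ skewness parameters for diagonal $\Lambda_i$, and a single common df for the t-type models; under these assumptions g = 2, p = 2 gives 11, 12, 15 and 16 for MVNMIX, MVTMIX, MSNMIX and MSTMIX. The helper names are hypothetical, and the log-likelihood used in the usage note is made up for illustration only.

```python
import math


def n_params(g, p, skew=False, common_df=False):
    """Free parameters of a g-component, p-variate mixture (hypothetical helper)."""
    m = (g - 1) + g * p + g * p * (p + 1) // 2   # weights, locations, scale matrices
    if skew:
        m += g * p                               # diagonal Lambda_i
    if common_df:
        m += 1                                   # one shared degrees-of-freedom parameter
    return m


def aic(loglik, m):
    """AIC = -2 * loglik + 2 * m."""
    return -2.0 * loglik + 2.0 * m


def bic(loglik, m, n):
    """BIC = -2 * loglik + m * log(n)."""
    return -2.0 * loglik + m * math.log(n)


counts = {
    "MVNMIX": n_params(2, 2),
    "MVTMIX": n_params(2, 2, common_df=True),
    "MSNMIX": n_params(2, 2, skew=True),
    "MSTMIX": n_params(2, 2, skew=True, common_df=True),
}
```

For example, a hypothetical maximized log-likelihood of -100 with m = 16 and n = 202 gives `aic(-100.0, 16) == 232.0` and `bic(-100.0, 16, 202)` equal to 200 + 16 log 202; smaller values indicate the preferred model.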

[Figure 3: Scatter plot of BMI and Bfat with superimposed contours of various two-component models: (a) MVNMIX, (b) MVTMIX, (c) MSNMIX, (d) MSTMIX. Females and males are indicated by different symbols.]

Concluding Remarks
Contributions:
1 We propose a new robust MSTMIX model, which offers a great deal of flexibility by accommodating asymmetry and heavy tails simultaneously.
2 The model allows practitioners to analyze heterogeneous multivariate data in a broad variety of settings.
3 MCEM-based algorithms are developed for computing ML estimates.
4 Numerical results show that the MSTMIX model performs reasonably well for the experimental data.