Robust mixture modeling using multivariate skew t distributions

1 Robust mixture modeling using multivariate skew t distributions
Tsung-I Lin
Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taiwan
August, 1
T.I. Lin (NCHU), National Chung Hsing University, August, 1, 1 / 15

2 OUTLINE
1 Introduction
2 Preliminaries: The multivariate skew t (MST) distribution
3 The multivariate skew t mixture model: Model formulation and estimation
4 Example: The AIS data
5 Concluding Remarks

3 1. INTRODUCTION
Finite mixture models have become a useful tool for modeling data that are thought to come from several different groups with varying proportions.
Lin et al. (2007) proposed a novel (univariate) skew t mixture (STMIX) model, which accommodates both skewness and thick tails for making robust inferences.
Drawback: limited to data with univariate outcomes.
We propose a multivariate version of the STMIX model (MSTMIX), composed of a weighted sum of g multivariate skew t (MST) component distributions.

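4 Preliminaries: The multivariate skew t (MST) distribution
The MST distribution, $Y \sim St_p(\xi, \Sigma, \Lambda, \nu)$, can be represented as follows.
The stochastic representation of the skew t distribution:
\[ Y = \xi + \frac{Z}{\sqrt{\tau}}, \quad Z \sim SN_p(0, \Sigma, \Lambda), \quad \tau \sim \Gamma(\nu/2, \nu/2), \quad Z \perp \tau, \tag{1} \]
so that $Y \mid \tau \sim SN_p(\xi, \Sigma/\tau, \Lambda/\sqrt{\tau})$.
Proposition 1. If $\tau \sim \Gamma(\alpha, \beta)$, then for any $a \in \mathbb{R}^p$,
\[ E\{\Phi_p(a\sqrt{\tau} \mid \Delta)\} = T_p\big(a\sqrt{\alpha/\beta} \mid \Delta; 2\alpha\big), \]
where $\Phi_p(\cdot \mid \Delta)$ and $T_p(\cdot \mid \Delta; \nu)$ denote the cdfs of $N_p(0, \Delta)$ and $t_p(0, \Delta, \nu)$.
Integrating $\tau$ out of the joint density of $(Y, \tau)$ yields
\[ \psi(y \mid \xi, \Sigma, \Lambda, \nu) = 2^p\, t_p(y \mid \xi, \Omega, \nu)\, T_p\Big(q\sqrt{\frac{\nu + p}{U + \nu}} \,\Big|\, \Delta; \nu + p\Big), \tag{2} \]
where $q = \Lambda\Omega^{-1}(y - \xi)$ and $U = (y - \xi)^\top \Omega^{-1} (y - \xi)$, with $\Omega = \Sigma + \Lambda^2$ and $\Delta = (I_p + \Lambda\Sigma^{-1}\Lambda)^{-1} = I_p - \Lambda\Omega^{-1}\Lambda$.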
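As a numerical sanity check on the MST density above, the following sketch evaluates it in the univariate case $p = 1$ and can be used to confirm that it integrates to one. It assumes $\Omega = \sigma^2 + \lambda^2$ and $\Delta = 1 - \lambda^2/\Omega$, which is how these symbols are read here. The function names `t_pdf`, `t_cdf` and `skew_t_pdf` are hypothetical, and the t cdf is obtained by Simpson integration only to keep the example free of external libraries.

```python
import math

def t_pdf(x, df):
    """Standard Student t density with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=200):
    """Standard Student t cdf via composite Simpson integration on [0, |x|]."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def skew_t_pdf(y, xi, sigma2, lam, nu):
    """Univariate (p = 1) version of the MST density.

    Assumes Omega = sigma2 + lam^2 and Delta = 1 - lam^2 / Omega.
    """
    omega = sigma2 + lam * lam
    delta = 1 - lam * lam / omega
    q = lam * (y - xi) / omega
    u = (y - xi) ** 2 / omega
    # t_1(y | xi, Omega, nu): location-scale t density
    t_part = t_pdf((y - xi) / math.sqrt(omega), nu) / math.sqrt(omega)
    # T_1(. | Delta; nu + 1): centred t cdf with scale Delta
    arg = q * math.sqrt((nu + 1) / (u + nu)) / math.sqrt(delta)
    return 2 * t_part * t_cdf(arg, nu + 1)
```

With $\lambda = 0$ the second factor equals 1/2 and the function reduces to the ordinary t density, which is a quick way to check the normalising constant $2^p$.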

5 Preliminaries: The multivariate skew t (MST) distribution
[Figure 1: The scatter plots and contours, together with their histograms, for twelve settings of $(\rho, \lambda_1, \lambda_2)$, where $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$ and $\lambda = (\lambda_1, \lambda_2)^\top$.]

6 The multivariate skew t mixture model: Model formulation and estimation
The MSTMIX model
\[ f(y_j \mid \Theta) = \sum_{i=1}^{g} w_i\, \psi(y_j \mid \xi_i, \Sigma_i, \Lambda_i, \nu_i), \tag{3} \]
where $\psi(y_j \mid \xi_i, \Sigma_i, \Lambda_i, \nu_i)$ represents the MST density, and the $w_i$'s are the mixing probabilities satisfying $\sum_{i=1}^{g} w_i = 1$.
Introduce allocation variables $Z_j = (Z_{1j}, \ldots, Z_{gj})^\top$, $j = 1, \ldots, n$, whose components are binary variables with
\[ Z_{ij} = \begin{cases} 1 & \text{if } Y_j \text{ belongs to group } i, \\ 0 & \text{otherwise,} \end{cases} \]
and satisfying $\sum_{i=1}^{g} Z_{ij} = 1$. This is denoted by $Z_j \sim \mathcal{M}(1; w_1, \ldots, w_g)$.
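The mixture density and the role of the allocation variables can be illustrated with a short sketch. It is generic in the component densities: the normal components used in the usage note are stand-ins chosen for brevity, not the MST density, and the names `mixture_pdf` and `allocation_probs` are hypothetical. The second function applies Bayes' rule to obtain $P(Z_{ij} = 1 \mid y_j)$.

```python
import math

def normal_pdf(y, mean, sd):
    """Univariate normal density, used here only as a stand-in component."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_pdf(y, weights, components):
    """f(y) = sum_i w_i * psi_i(y) for component density callables psi_i."""
    return sum(w * psi(y) for w, psi in zip(weights, components))

def allocation_probs(y, weights, components):
    """Posterior probabilities P(Z_ij = 1 | y_j) by Bayes' rule."""
    parts = [w * psi(y) for w, psi in zip(weights, components)]
    total = sum(parts)
    return [part / total for part in parts]
```

For example, with weights `[0.3, 0.7]` and unit-variance normal components centred at 0 and 4, the point `y = 2.0` is equidistant from both centres, so `allocation_probs` returns the prior weights.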

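7 The multivariate skew t mixture model: Model formulation and estimation
A hierarchical representation of (3) is
\[
\begin{aligned}
Y_j \mid (\gamma_j, \tau_j, Z_{ij} = 1) &\sim N_p(\xi_i + \Lambda_i\gamma_j,\ \Sigma_i/\tau_j), \\
\gamma_j \mid (\tau_j, Z_{ij} = 1) &\sim HN_p(0,\ I_p/\tau_j), \\
\tau_j \mid (Z_{ij} = 1) &\sim \Gamma(\nu_i/2,\ \nu_i/2), \\
Z_j &\sim \mathcal{M}(1; w_1, \ldots, w_g).
\end{aligned} \tag{4}
\]
The complete-data log-likelihood function of $\Theta$ is, up to an additive constant,
\[
\ell_c(\Theta \mid y, \gamma, \tau, Z) = \sum_{i=1}^{g}\sum_{j=1}^{n} Z_{ij}\Big\{ \log w_i + \frac{\nu_i}{2}\log\frac{\nu_i}{2} - \log\Gamma\Big(\frac{\nu_i}{2}\Big) - \frac{1}{2}\log|\Sigma_i| + \Big(\frac{\nu_i}{2} + p - 1\Big)\log\tau_j - \frac{\tau_j}{2}\Big[ (y_j - \xi_i - \Lambda_i\gamma_j)^\top \Sigma_i^{-1} (y_j - \xi_i - \Lambda_i\gamma_j) + \nu_i + \gamma_j^\top\gamma_j \Big] \Big\}.
\]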
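The hierarchical representation doubles as a recipe for simulating from the model: draw the group label, then $\tau_j$, then the half-normal $\gamma_j$, then $Y_j$. The sketch below follows those four layers directly. The function names are hypothetical, each $\Lambda_i$ is assumed diagonal and passed as a list of its diagonal entries, and a small hand-written Cholesky factorisation is used only to avoid external libraries.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for r in range(p):
        for c in range(r + 1):
            s = a[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            low[r][c] = math.sqrt(s) if r == c else s / low[c][c]
    return low

def sample_mstmix(n, weights, xis, sigmas, lambdas, nus, rng):
    """Draw n points from a g-component MSTMIX via the hierarchical layers.

    lambdas[i] holds the diagonal of Lambda_i (an assumption of this sketch).
    Returns the simulated points and their group labels.
    """
    p = len(xis[0])
    chols = [cholesky(s) for s in sigmas]
    ys, labels = [], []
    for _ in range(n):
        # Z_j ~ M(1; w_1, ..., w_g)
        i = rng.choices(range(len(weights)), weights=weights)[0]
        # tau_j ~ Gamma(shape nu/2, rate nu/2); gammavariate takes a scale
        tau = rng.gammavariate(nus[i] / 2, 2 / nus[i])
        root = math.sqrt(tau)
        # gamma_j ~ HN_p(0, I_p / tau_j): absolute values of scaled normals
        gamma = [abs(rng.gauss(0, 1)) / root for _ in range(p)]
        # N_p(0, Sigma_i / tau_j) noise through the Cholesky factor
        e = [rng.gauss(0, 1) for _ in range(p)]
        noise = [sum(chols[i][r][k] * e[k] for k in range(r + 1)) / root
                 for r in range(p)]
        ys.append([xis[i][r] + lambdas[i][r] * gamma[r] + noise[r]
                   for r in range(p)])
        labels.append(i)
    return ys, labels
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes the draws reproducible.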

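8 The multivariate skew t mixture model: Model formulation and estimation
Computational aspects of parameter estimation
The Q-function is $Q(\Theta \mid \hat\Theta^{(k)}) = E\{\ell_c(\Theta \mid y, \gamma, \tau, Z) \mid y, \hat\Theta^{(k)}\}$.
In the MCEM-based algorithm, the Q-function can be approximated by
\[ \hat Q(\Theta \mid \hat\Theta^{(k)}) = \frac{1}{M}\sum_{m=1}^{M} \ell_c\big(\Theta \mid y, \hat\gamma^{(k)}_{[m]}, \hat\tau^{(k)}_{[m]}, Z\big), \tag{5} \]
where $\hat\gamma^{(k)}_{[m]} = \{\hat\gamma^{(k)}_{ij,m}\}$ and $\hat\tau^{(k)}_{[m]} = \{\hat\tau^{(k)}_{ij,m}\}$ are independently generated by
\[ \hat\gamma^{(k)}_{ij,m} \mid (y_j, Z_{ij} = 1) \sim Tt_p\Big( \hat q^{(k)}_{ij},\ \frac{\hat U^{(k)}_{ij} + \hat\nu^{(k)}_i}{\hat\nu^{(k)}_i + p}\,\hat\Delta^{(k)}_i,\ \hat\nu^{(k)}_i + p;\ \mathbb{R}^p_+ \Big), \]
\[ \hat\tau^{(k)}_{ij,m} \mid (\hat\gamma^{(k)}_{ij,m}, y_j, Z_{ij} = 1) \sim \Gamma\Big( \frac{\hat\nu^{(k)}_i + 2p}{2},\ \frac{(\hat\gamma^{(k)}_{ij,m} - \hat q^{(k)}_{ij})^\top \hat\Delta^{(k)-1}_i (\hat\gamma^{(k)}_{ij,m} - \hat q^{(k)}_{ij}) + \hat U^{(k)}_{ij} + \hat\nu^{(k)}_i}{2} \Big). \]
Here $Tt_p(\cdot, \cdot, \cdot; \mathbb{R}^p_+)$ denotes the p-variate t distribution truncated to the positive orthant.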
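The two conditional draws used in the Monte Carlo E-step can be sketched for a single observation in the univariate case $p = 1$. This is a minimal illustration under stated assumptions, not the author's implementation: it takes $\Omega = \sigma^2 + \lambda^2$ and $\Delta = 1 - \lambda^2/\Omega$, samples the truncated t by simple rejection (efficient only when the location $q$ is not far below zero), and uses the hypothetical name `draw_gamma_tau`.

```python
import math
import random

def draw_gamma_tau(y, xi, sigma2, lam, nu, m, rng):
    """Return m draws of (gamma, tau) given y, for p = 1 and one component."""
    omega = sigma2 + lam * lam          # Omega = Sigma + Lambda^2 (assumed)
    delta = 1 - lam * lam / omega       # Delta = 1 - Lambda^2 / Omega (assumed)
    q = lam * (y - xi) / omega
    u = (y - xi) ** 2 / omega
    df = nu + 1                         # degrees of freedom nu + p
    scale = math.sqrt((u + nu) / df * delta)
    draws = []
    while len(draws) < m:
        # Student t variate: standard normal over sqrt(chi-square / df)
        t = rng.gauss(0, 1) / math.sqrt(rng.gammavariate(df / 2, 2) / df)
        g = q + scale * t
        if g <= 0:                      # reject draws outside the positive half-line
            continue
        # tau | gamma, y ~ Gamma(shape (nu + 2p)/2, rate below); gammavariate takes a scale
        rate = ((g - q) ** 2 / delta + u + nu) / 2
        tau = rng.gammavariate((nu + 2) / 2, 1 / rate)
        draws.append((g, tau))
    return draws
```

Averaging functions of these draws over $m = 1, \ldots, M$ gives the Monte Carlo estimates of the conditional expectations that enter the approximated Q-function.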

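9 The multivariate skew t mixture model: Model formulation and estimation
The MCECM algorithm
[Flow diagram: starting from $\hat\theta^{(0)}$, the MCE-step uses the complete-data log-likelihood $\ell_c(\theta \mid Y_c)$ to form $\hat Q(\theta \mid \hat\theta^{(k)})$; the CM-steps compute $\arg\max \hat Q$ over the blocks $\theta_1$, $\theta_2$, $\theta_3$ in turn, first fixing $(\hat\theta_2^{(k)}, \hat\theta_3^{(k)})$, then $(\hat\theta_1^{(k+1)}, \hat\theta_3^{(k)})$, then $(\hat\theta_1^{(k+1)}, \hat\theta_2^{(k+1)})$, to give $\hat\theta^{(k+1)}$; a stopping rule based on the observed-data log-likelihood $\ell(\theta \mid Y_o)$ returns $\hat\theta$.]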

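10 The multivariate skew t mixture model: Model formulation and estimation
CM-steps:
\[ \hat w_i^{(k+1)} = n^{-1}\sum_{j=1}^{n} \hat z_{ij}^{(k)}, \]
\[ \hat\xi_i^{(k+1)} = \frac{\sum_{j=1}^{n} \hat\tau_{ij}^{(k)} y_j - \hat\Lambda_i^{(k)} \sum_{j=1}^{n} \hat\eta_{ij}^{(k)}}{\sum_{j=1}^{n} \hat\tau_{ij}^{(k)}}, \]
\[ \hat\Lambda_i^{(k+1)} = \mathrm{diag}\Big\{ \big(\hat\Sigma_i^{(k)-1} \odot \hat B_{2i}^{(k)}\big)^{-1} \big(\hat\Sigma_i^{(k)-1} \odot \hat B_{1i}^{(k)}\big) 1_p \Big\}, \]
\[ \hat\Sigma_i^{(k+1)} = \frac{1}{\sum_{j=1}^{n} \hat z_{ij}^{(k)}} \Big\{ \sum_{j=1}^{n} \hat\tau_{ij}^{(k)} (y_j - \hat\xi_i^{(k+1)})(y_j - \hat\xi_i^{(k+1)})^\top - \hat\Lambda_i^{(k+1)} \hat B_{1i}^{(k)} - \hat B_{1i}^{(k)\top} \hat\Lambda_i^{(k+1)} + \hat\Lambda_i^{(k+1)} \hat B_{2i}^{(k)} \hat\Lambda_i^{(k+1)} \Big\}. \]
Obtain $\hat\nu_i^{(k+1)}$ as the solution of
\[ \log\Big(\frac{\nu_i}{2}\Big) + 1 - \mathrm{DG}\Big(\frac{\nu_i}{2}\Big) + \frac{\sum_{j=1}^{n} \big(\hat\kappa_{ij}^{(k)} - \hat\tau_{ij}^{(k)}\big)}{\sum_{j=1}^{n} \hat z_{ij}^{(k)}} = 0, \]
where DG denotes the digamma function and $\odot$ the Hadamard product.
If the dfs are assumed to be identical, update $\hat\nu^{(k)}$ by
\[ \hat\nu^{(k+1)} = \arg\max_{\nu} \sum_{j=1}^{n} \log\Big( \sum_{i=1}^{g} \hat w_i^{(k+1)}\, \psi\big(y_j \mid \hat\xi_i^{(k+1)}, \hat\Sigma_i^{(k+1)}, \hat\Lambda_i^{(k+1)}, \nu\big) \Big). \]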
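The degrees-of-freedom update has no closed form, since it requires the root of an equation involving the digamma function. A minimal sketch of one way to solve it is bisection, shown below. The constant `c` stands for the data-dependent ratio in that equation, the function names are hypothetical, and the digamma function is implemented by hand (recurrence plus asymptotic series) only because the Python standard library does not provide one. The left-hand side without `c` decreases from $+\infty$ to 1 as $\nu$ grows, so a root exists only when `c < -1`.

```python
import math

def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6:
        result -= 1 / x
        x += 1
    inv2 = 1 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def solve_nu(c, lo=0.01, hi=1000.0, tol=1e-10):
    """Root of log(nu/2) + 1 - DG(nu/2) + c = 0 by bisection on [lo, hi]."""
    def g(nu):
        return math.log(nu / 2) + 1 - digamma(nu / 2) + c
    if g(lo) < 0 or g(hi) > 0:
        raise ValueError("no root bracketed in [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:      # g is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Because $\log\tau - \tau \le -1$ for every $\tau > 0$, an average of such terms cannot exceed $-1$, which is why the bracketing check is expected to pass in practice unless the root lies beyond `hi`.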

11 Example: The AIS data
The Australian Institute of Sport (AIS) data
Data: The AIS data taken from Cook and Weisberg (1994). There are 202 athletes, comprising 100 females and 102 males.
Variables: BMI (body mass index; kg/m²) and Bfat (body fat percentage).
[Figure: scatter plot of BMI and Bfat, with female and male athletes marked separately.]

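12 Example: The AIS data
A two-component MSTMIX model can be written as
\[ f(y_j \mid \Theta) = w\,\psi(y_j \mid \xi_1, \Sigma_1, \Lambda_1, \nu_1) + (1 - w)\,\psi(y_j \mid \xi_2, \Sigma_2, \Lambda_2, \nu_2), \]
where
\[ \xi_i = (\xi_{i1}, \xi_{i2})^\top, \quad \Sigma_i = \begin{bmatrix} \sigma_{i,11} & \sigma_{i,12} \\ \sigma_{i,12} & \sigma_{i,22} \end{bmatrix} \quad \text{and} \quad \Lambda_i = \begin{bmatrix} \lambda_{i,11} & 0 \\ 0 & \lambda_{i,22} \end{bmatrix}. \]
[Figure 2: Plot of the profile log-likelihood for $\nu_1$ and $\nu_2$ with a two-component MSTMIX model with (a) $\nu_1 = \nu_2 = \nu$ and (b) $\nu_1 \neq \nu_2$ ($\hat\nu_1 = .$, $\hat\nu_2 = .1$). Panel (a) shows the profile log-likelihood against $\nu$; panel (b) shows it over $(\nu_1, \nu_2)$.]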

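13 Example: The AIS data
Table 1: Summary results from fitting various mixture models to the AIS data.
[Table: ML estimates (mle) and standard errors (se) under the MVNMIX, MVTMIX, MSNMIX and MSTMIX models, with rows for $w$; $\xi_{11}, \xi_{12}, \xi_{21}, \xi_{22}$; $\sigma_{1,11}, \sigma_{1,12}, \sigma_{1,22}, \sigma_{2,11}, \sigma_{2,12}, \sigma_{2,22}$; $\lambda_{1,11}, \lambda_{1,22}, \lambda_{2,11}, \lambda_{2,22}$; $\nu$; followed by $m$, $\ell(\hat\Theta)$, AIC and BIC.]
AIC $= -2\ell(\hat\Theta) + 2m$; BIC $= -2\ell(\hat\Theta) + m\log(n)$, where $\ell(\hat\Theta)$ is the maximized log-likelihood, $m$ is the number of parameters and $n$ is the sample size.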
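The two model-selection criteria in the table footnote are simple to compute once a model has been fitted. The sketch below uses the standard definitions, under which smaller values indicate a better trade-off between fit and complexity. The function names are hypothetical and the numbers in the usage note are made up for illustration; they are not results from the AIS analysis.

```python
import math

def aic(loglik, m):
    """Akaike information criterion: -2 * loglik + 2 * m."""
    return -2 * loglik + 2 * m

def bic(loglik, m, n):
    """Bayesian information criterion: -2 * loglik + m * log(n)."""
    return -2 * loglik + m * math.log(n)
```

For instance, a hypothetical fit with `loglik = -100.0` and `m = 5` parameters gives `aic(-100.0, 5) == 210.0`. BIC penalises each parameter by `log(n)` rather than 2, so it favours smaller models than AIC whenever `n` exceeds about 7.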

14 Example: The AIS data
[Figure 3: Scatter plot of BMI and Bfat with superimposed contours of various two-component models: (a) MVNMIX, (b) MVTMIX, (c) MSNMIX, (d) MSTMIX. Females and males are indicated by different plotting symbols.]

15 Concluding Remarks
Contributions:
1 Propose a new robust model, the MSTMIX model, which offers a great deal of flexibility by accommodating asymmetry and heavy tails simultaneously.
2 Allow practitioners to analyze heterogeneous multivariate data under a broad variety of considerations.
3 MCEM-based algorithms are developed for computing ML estimates.
Numerical results show that the MSTMIX model performs reasonably well for the experimental data.
