Sharon X. Lee, Geoffrey J. McLachlan
Department of Mathematics, The University of Queensland, Brisbane, Australia

Maximum Likelihood Estimation for Finite Mixtures of Canonical Fundamental Skew t-distributions: the Unification of the Unrestricted and Restricted Skew t-mixture Models

arXiv:1401.8182v1 [stat.ME] 31 Jan 2014

Abstract

In this paper, we present an algorithm for the fitting of a location-scale variant of the canonical fundamental skew t (CFUST) distribution, a superclass of the restricted and unrestricted skew t-distributions. In recent years, a few versions of the multivariate skew t (MST) model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.

1 Introduction

To date, various parametric families of densities have been proposed in the literature to represent the multivariate skew t (MST) model, all having either a restricted or unrestricted characterization according to the classification scheme presented in Lee and McLachlan (2013b). In particular, the restricted MST (rMST) mixture model, or its equivalent variants by reparameterization, has been examined in Pyne et al. (2009), Cabral et al. (2012) and Vrbik and McNicholas (2012), while Lin (2010) and Lee and McLachlan (2011) focused on the unrestricted MST (uMST) mixture model, adopting the characterization proposed by Sahu et al. (2003). In addition, Murray et al. (2013) recently considered an alternative construction that also adopted the term skew t-distribution, although referring to a quite different formulation more commonly known as the generalized-hyperbolic skew t (GHST) mixture model. Although this latter distribution is a restricted type of skew distribution under the argument that the skewing variable is a scalar, it differs from the rMST distribution in a number of ways, such as having different behaviour in its tails and not becoming a skew normal distribution as a limiting and/or special case.

In this paper, we consider a natural and straightforward extension of the uMST distribution, where the skewing function of its density is given by a q-dimensional t-distribution function, and the skewness parameter is a general p by q matrix. This formulation coincides with a special case of the unified skew t (SUT) distribution (Arellano-Valle and Azzalini, 2006) and the fundamental skew t (FUST) distribution (Arellano-Valle and Genton, 2005), and is known as a location-scale variant of the canonical fundamental skew t (CFUST) distribution. This characterization includes the restricted form of the MST distribution.

As discussed in Lee and McLachlan (2013a), while exact implementations of the Expectation-Maximization (EM) algorithm have been achievable for restricted characterizations of the component skew t-distributions, parameter estimation for the unrestricted and more general characterizations is substantially more intricate. In previous work on mixtures of unrestricted skew t-distributions using the characterization of Sahu et al. (2003), a submodel of the CFUST distribution considered in this paper, Lin (2010) resorted to Monte Carlo methods for the computation of intractable conditional expectations involved in the E-step, while Lee and McLachlan (2013a) adopted the one-step late (OSL) approach to evaluating one of these difficult conditional expectations. In contrast, for the rMST distribution, a number of closed-form implementations of the EM algorithm have been put forward; see Lee and McLachlan (2013a) for further discussion.

This paper presents an EM fitting algorithm for the finite mixture of CFUST distributions (FM-CFUST) model. Under this general formulation, the aforementioned restricted and unrestricted skew t-mixture models can be obtained as special cases and, as such, the expressions for the E- and M-steps of the proposed algorithm apply directly in these situations. In contrast to the approach of Lin (2010), which relies on computationally intensive numerical approximations, our development was motivated by the desire for an exact implementation of the EM algorithm using the truncated moments approach (Lee and McLachlan, 2013a; Ho et al., 2012), leading to much faster and more accurate results. Also, we derive a new exact expression for the remaining E-step conditional expectation that had previously relied on approximation or numerical methods.
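2 The CFUST distribution

The multivariate canonical fundamental skew t (CFUST) distribution was introduced by Arellano-Valle and Genton (2005). A location-scale variant of the CFUST distribution can be characterized as follows. Let $\mathbf{U}_1$ be a p-dimensional random vector and $\mathbf{U}_0$ a q-dimensional random vector. Suppose that, conditional on a scalar gamma variable $w \sim \mathrm{gamma}(\nu/2, \nu/2)$, the joint distribution of $\mathbf{U}_0$ and $\mathbf{U}_1$ is given by

\[
\begin{bmatrix} \mathbf{U}_0 \\ \mathbf{U}_1 \end{bmatrix}
\sim N_{q+p}\left(
\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix},
\frac{1}{w}
\begin{bmatrix} \mathbf{I}_q & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma} \end{bmatrix}
\right), \tag{1}
\]

where $\nu$ is a scalar, $\mathbf{I}_q$ denotes the $q \times q$ identity matrix, $\boldsymbol{\Sigma}$ is a positive definite scale matrix, and $\mathbf{0}$ is a vector/matrix of zeros with appropriate dimensions. Then, for a $p \times q$ skewness matrix $\boldsymbol{\Delta}$,

\[
\mathbf{Y} = \boldsymbol{\mu} + \boldsymbol{\Delta} |\mathbf{U}_0| + \mathbf{U}_1 \tag{2}
\]

follows the CFUST distribution, whose density is defined by

\[
f(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\Delta}, \nu)
= 2^q \, t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Omega}, \nu) \,
T_q\!\left( \mathbf{q}(\mathbf{y}) \sqrt{\frac{\nu + p}{\nu + d(\mathbf{y})}};\; \mathbf{0}, \boldsymbol{\Lambda}, \nu + p \right), \tag{3}
\]

where

\[
\boldsymbol{\Omega} = \boldsymbol{\Sigma} + \boldsymbol{\Delta}\boldsymbol{\Delta}^T, \quad
\mathbf{q}(\mathbf{y}) = \boldsymbol{\Delta}^T \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu}), \quad
\boldsymbol{\Lambda} = \mathbf{I}_q - \boldsymbol{\Delta}^T \boldsymbol{\Omega}^{-1} \boldsymbol{\Delta}, \quad
d(\mathbf{y}) = (\mathbf{y} - \boldsymbol{\mu})^T \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu}).
\]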
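The stochastic representation (1)-(2) translates directly into a sampler. The following is a minimal sketch using only the Python standard library. The function names `cholesky` and `sample_cfust` are illustrative and not part of the paper, and the gamma variable is assumed to be parameterized by shape $\nu/2$ and rate $\nu/2$ (so that it has mean one), which is the usual convention for t-type scale mixtures.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def sample_cfust(mu, sigma, delta, nu, rng):
    """Draw one Y = mu + Delta |U0| + U1 following (1)-(2).

    mu: length-p list, sigma: p x p list of rows, delta: p x q list of rows.
    """
    p, q = len(mu), len(delta[0])
    # Assumption: w ~ gamma(shape nu/2, rate nu/2); gammavariate expects a
    # scale parameter, hence scale = 2/nu.
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)
    sd = 1.0 / math.sqrt(w)
    # |U0|: elementwise magnitudes of a N_q(0, I_q / w) vector.
    abs_u0 = [abs(rng.gauss(0.0, 1.0)) * sd for _ in range(q)]
    # U1 ~ N_p(0, Sigma / w) via the Cholesky factor of Sigma.
    chol = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    u1 = [sd * sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
    return [mu[i] + sum(delta[i][k] * abs_u0[k] for k in range(q)) + u1[i]
            for i in range(p)]

rng = random.Random(1)
print(sample_cfust([0.0, 1.0], [[1.0, 0.3], [0.3, 2.0]],
                   [[1.0, 0.5, 0.0], [0.0, -1.0, 2.0]], 10.0, rng))
```

Under these assumptions, and for $\nu > 1$, the mean of $\mathbf{Y}$ is $\boldsymbol{\mu} + \sqrt{\nu/\pi}\,\Gamma((\nu-1)/2)/\Gamma(\nu/2)\,\boldsymbol{\Delta}\mathbf{1}_q$, which gives a simple Monte Carlo sanity check for the sampler.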

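Here, $t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Omega}, \nu)$ denotes the density of the p-dimensional t-distribution with location parameter $\boldsymbol{\mu}$, scale matrix $\boldsymbol{\Omega}$, and degrees of freedom $\nu$, and $T_q(\cdot)$ is the q-dimensional cumulative t-distribution function. In the above, $|\mathbf{U}_0|$ denotes the vector whose ith element is the magnitude of the ith element of the vector $\mathbf{U}_0$.

2.1 The unrestricted characterization as a special case

As mentioned previously, the characterization of Sahu et al. (2003) corresponds to the CFUST distribution with $q = p$ and $\boldsymbol{\Delta} = \mathrm{diag}(\boldsymbol{\delta})$. Hence, we can write $\boldsymbol{\Lambda} = \mathbf{I}_p - \boldsymbol{\Delta}\boldsymbol{\Omega}^{-1}\boldsymbol{\Delta}$, $\mathbf{q}(\mathbf{y}) = \boldsymbol{\Delta}\boldsymbol{\Omega}^{-1}(\mathbf{y} - \boldsymbol{\mu})$ and $\boldsymbol{\Omega} = \boldsymbol{\Sigma} + \boldsymbol{\Delta}^2$, and the uMST density is given by

\[
f(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\delta}, \nu)
= 2^p \, t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Omega}, \nu) \,
T_p\!\left( \mathbf{q}(\mathbf{y}) \sqrt{\frac{\nu + p}{\nu + d(\mathbf{y})}};\; \mathbf{0}, \boldsymbol{\Lambda}, \nu + p \right). \tag{4}
\]

The uMST distribution has the same stochastic representation as given by (2) and (1), with q replaced by p.

2.2 The restricted characterization as a special case

In the restricted characterization of the MST distribution, the convolution-type stochastic representation is given by

\[
\mathbf{Y} = \boldsymbol{\mu} + \boldsymbol{\delta} |U_0| + \mathbf{U}_1, \tag{5}
\]

where

\[
\begin{bmatrix} U_0 \\ \mathbf{U}_1 \end{bmatrix}
\sim N_{1+p}\left(
\begin{bmatrix} 0 \\ \mathbf{0} \end{bmatrix},
\frac{1}{w}
\begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma} \end{bmatrix}
\right). \tag{6}
\]

It follows that the density of the rMST distribution is given by

\[
f(\mathbf{y}) = 2 \, t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Omega}, \nu) \,
T_1\!\left( \boldsymbol{\delta}^T \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu}) \sqrt{\frac{\nu + p}{\nu + d(\mathbf{y})}};\; 0, \, 1 - \boldsymbol{\delta}^T \boldsymbol{\Omega}^{-1} \boldsymbol{\delta}, \, \nu + p \right), \tag{7}
\]

where here $\boldsymbol{\Omega} = \boldsymbol{\Sigma} + \boldsymbol{\delta}\boldsymbol{\delta}^T$. On directly comparing (5) and (2), it can be observed that the skewing effect of $\boldsymbol{\delta}|U_0|$ in the rMST distribution can be achieved by the CFUST formulation in a number of ways, including:

(i) taking $\boldsymbol{\Delta} = \mathrm{diag}(\boldsymbol{\delta})$, as in the unrestricted case, and replacing $|\mathbf{U}_0|$ in (2) by $|U_0| \mathbf{1}_p$, where $U_0$ is the scalar variable of (5) and $\mathbf{1}_p$ denotes a p-dimensional vector of ones;

(ii) constraining the skewness matrix $\boldsymbol{\Delta}$ to be a $p \times p$ zero matrix except for one column, which is given by $\boldsymbol{\delta}$, and setting the corresponding element of $\mathbf{U}_0$ to be $U_0$; or

(iii) setting $q = 1$.

As a corollary, it can be seen that the special case $q = p$ of the CFUST distribution encompasses both the rMST and uMST distributions through structural constraints on the skewness parameter. This is given by taking

\[
\boldsymbol{\Delta} =
\begin{bmatrix}
\delta_1 & 0 & \cdots & 0 \\
\delta_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\delta_p & 0 & \cdots & 0
\end{bmatrix}
\quad \text{and} \quad
\boldsymbol{\Delta} =
\begin{bmatrix}
\delta_1 & 0 & \cdots & 0 \\
0 & \delta_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \delta_p
\end{bmatrix} \tag{8}
\]

for the rMST and uMST distributions, respectively, where $\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_p)^T$. Note that for the rMST case, $\boldsymbol{\delta}$ can be placed in any one of the columns of $\boldsymbol{\Delta}$, not necessarily the first column as given in (8).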
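The two structural constraints in (8) can be checked numerically. The sketch below (plain Python; the helper names `skewness_rmst`, `skewness_umst` and `outer_product` are hypothetical) builds both forms of the skewness matrix and computes $\boldsymbol{\Delta}\boldsymbol{\Delta}^T$, the term added to $\boldsymbol{\Sigma}$ to form $\boldsymbol{\Omega}$. For the single-column form it reduces to $\boldsymbol{\delta}\boldsymbol{\delta}^T$, and for the diagonal form to $\mathrm{diag}(\delta_1^2, \ldots, \delta_p^2)$.

```python
def skewness_rmst(delta):
    """p x p matrix with delta in the first column, zeros elsewhere (rMST form in (8))."""
    p = len(delta)
    return [[delta[i] if j == 0 else 0.0 for j in range(p)] for i in range(p)]

def skewness_umst(delta):
    """diag(delta), the uMST form in (8)."""
    p = len(delta)
    return [[delta[i] if i == j else 0.0 for j in range(p)] for i in range(p)]

def outer_product(m):
    """Return M M^T for a matrix M given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in m] for ri in m]

delta = [1.0, -2.0, 0.5]
print(outer_product(skewness_rmst(delta)))  # delta delta^T
print(outer_product(skewness_umst(delta)))  # diagonal matrix of squared skewness values
```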

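3 Parameter estimation via the EM algorithm

We are interested in finite mixtures with component densities given by (3). A g-component mixture of CFUST distributions has density

\[
f(\mathbf{y}; \boldsymbol{\Psi}) = \sum_{h=1}^{g} \pi_h \, f(\mathbf{y}; \boldsymbol{\mu}_h, \boldsymbol{\Sigma}_h, \boldsymbol{\Delta}_h, \nu_h), \tag{9}
\]

where $\pi_1, \ldots, \pi_g$ are the mixing proportions and $f(\cdot)$ denotes a CFUST density given by (3). We shall adopt the acronym FM-CFUST for (9). The vector $\boldsymbol{\Psi} = (\pi_1, \ldots, \pi_{g-1}, \boldsymbol{\theta}_1^T, \ldots, \boldsymbol{\theta}_g^T)^T$ contains all the unknown parameters of the model, with $\boldsymbol{\theta}_h$ containing the elements of $\boldsymbol{\mu}_h$ and $\boldsymbol{\Delta}_h$, the distinct elements of $\boldsymbol{\Sigma}_h$, and $\nu_h$. Similar to the rMST and uMST distributions, the CFUST distribution admits a convenient hierarchical representation that greatly facilitates parameter estimation via the EM algorithm:

\[
\begin{aligned}
\mathbf{Y} \mid \mathbf{u}, w &\sim N_p\!\left( \boldsymbol{\mu} + \boldsymbol{\Delta}\mathbf{u}, \; \tfrac{1}{w} \boldsymbol{\Sigma} \right), \\
\mathbf{U} \mid w &\sim HN_q\!\left( \mathbf{0}, \; \tfrac{1}{w} \mathbf{I}_q \right), \\
w &\sim \mathrm{gamma}\!\left( \tfrac{\nu}{2}, \tfrac{\nu}{2} \right),
\end{aligned} \tag{10}
\]

where $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes the multivariate normal distribution and $HN_q(\mathbf{0}, \boldsymbol{\Sigma})$ represents the q-dimensional half-normal distribution with location $\mathbf{0}$ and scale matrix $\boldsymbol{\Sigma}$. The EM algorithm for the FM-CFUST model then proceeds in a similar way as for the FM-uMST model described in Lee and McLachlan (2013a).

3.1 E-step

In the E-step, the following conditional expectations are computed:

\[
\begin{aligned}
z_{hj}^{(k)} &= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ z_{hj} = 1 \mid \mathbf{y}_j \right], \\
w_{hj}^{(k)} &= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ w_j \mid \mathbf{y}_j, z_{hj} = 1 \right], \\
e_{1,hj}^{(k)} &= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \log(w_j) \mid \mathbf{y}_j, z_{hj} = 1 \right], \\
\mathbf{e}_{2,hj}^{(k)} &= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ w_j \mathbf{u}_j \mid \mathbf{y}_j, z_{hj} = 1 \right], \\
\mathbf{e}_{3,hj}^{(k)} &= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ w_j \mathbf{u}_j \mathbf{u}_j^T \mid \mathbf{y}_j, z_{hj} = 1 \right].
\end{aligned}
\]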

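They are given by

\[
z_{hj}^{(k)} = \frac{\pi_h^{(k)} f\!\left(\mathbf{y}_j; \boldsymbol{\mu}_h^{(k)}, \boldsymbol{\Sigma}_h^{(k)}, \boldsymbol{\Delta}_h^{(k)}, \nu_h^{(k)}\right)}{f\!\left(\mathbf{y}_j; \boldsymbol{\Psi}^{(k)}\right)},
\]

\[
w_{hj}^{(k)} = \left( \frac{\nu_h^{(k)} + p}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)} \right)
\frac{T_q\!\left( \mathbf{q}_{hj}^{(k)} \sqrt{\dfrac{\nu_h^{(k)} + p + 2}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}};\; \mathbf{0}, \boldsymbol{\Lambda}_h^{(k)}, \nu_h^{(k)} + p + 2 \right)}
{T_q\!\left( \mathbf{q}_{hj}^{(k)} \sqrt{\dfrac{\nu_h^{(k)} + p}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}};\; \mathbf{0}, \boldsymbol{\Lambda}_h^{(k)}, \nu_h^{(k)} + p \right)},
\]

\[
e_{1,hj}^{(k)} = w_{hj}^{(k)} - \log\!\left( \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{2} \right) - \frac{\nu_h^{(k)} + p}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)} + \psi\!\left( \frac{\nu_h^{(k)} + p}{2} \right), \tag{11}
\]

\[
\mathbf{e}_{2,hj}^{(k)} = w_{hj}^{(k)} \, E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \mathbf{u}_j \mid \mathbf{y}_j \right], \qquad
\mathbf{e}_{3,hj}^{(k)} = w_{hj}^{(k)} \, E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \mathbf{u}_j \mathbf{u}_j^T \mid \mathbf{y}_j \right], \tag{12}
\]

where $\psi(\cdot)$ is the digamma function and $\mathbf{U}_j \mid \mathbf{y}_j$ has a q-dimensional truncated t-distribution given by

\[
\mathbf{U}_j \mid \mathbf{y}_j \sim tt_q\!\left( \mathbf{q}_{hj}^{(k)}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p + 2} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p + 2; \; \mathbb{R}^+ \right). \tag{13}
\]

Note that (11) is obtained using an OSL EM algorithm. The derivations of these results are analogous to those in Lee and McLachlan (2013a). Alternatively, a series truncation approach can be used in place of (11); it exploits an exact representation of this conditional expectation and is presented in Section 3.4.

3.2 M-step

On the (k + 1)th M-step, the model parameters are updated with

\[
\pi_h^{(k+1)} = \frac{1}{n} \sum_{j=1}^{n} z_{hj}^{(k)},
\]

\[
\boldsymbol{\mu}_h^{(k+1)} = \frac{\sum_{j=1}^{n} z_{hj}^{(k)} \left( w_{hj}^{(k)} \mathbf{y}_j - \boldsymbol{\Delta}_h^{(k)} \mathbf{e}_{2,hj}^{(k)} \right)}{\sum_{j=1}^{n} z_{hj}^{(k)} w_{hj}^{(k)}},
\]

\[
\boldsymbol{\Delta}_h^{(k+1)} = \left[ \sum_{j=1}^{n} z_{hj}^{(k)} \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right) \mathbf{e}_{2,hj}^{(k)T} \right] \left[ \sum_{j=1}^{n} z_{hj}^{(k)} \mathbf{e}_{3,hj}^{(k)} \right]^{-1}, \tag{14}
\]

\[
\begin{aligned}
\boldsymbol{\Sigma}_h^{(k+1)} = \left[ \sum_{j=1}^{n} z_{hj}^{(k)} \right]^{-1} \sum_{j=1}^{n} z_{hj}^{(k)} \Big\{ & w_{hj}^{(k)} \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right) \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right)^T
- \boldsymbol{\Delta}_h^{(k+1)} \mathbf{e}_{2,hj}^{(k)} \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right)^T \\
& - \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right) \mathbf{e}_{2,hj}^{(k)T} \boldsymbol{\Delta}_h^{(k+1)T}
+ \boldsymbol{\Delta}_h^{(k+1)} \mathbf{e}_{3,hj}^{(k)T} \boldsymbol{\Delta}_h^{(k+1)T} \Big\}.
\end{aligned} \tag{15}
\]

An update of the degrees of freedom $\nu_h$ is obtained by solving for $\nu_h$ the equation

\[
0 = \left( \sum_{j=1}^{n} z_{hj}^{(k)} \right) \left[ \log\!\left( \frac{\nu_h}{2} \right) - \psi\!\left( \frac{\nu_h}{2} \right) + 1 \right] + \sum_{j=1}^{n} z_{hj}^{(k)} \left( e_{1,hj}^{(k)} - w_{hj}^{(k)} \right).
\]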
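The degrees-of-freedom equation has no closed-form solution, so a one-dimensional root finder is needed. The paper does not prescribe one. The sketch below uses bisection, one simple choice that relies on $\log(x) - \psi(x)$ being strictly decreasing in $x$. The function names `digamma` and `update_nu`, the search interval and the tolerance are all assumptions made for illustration. The digamma function is approximated with the standard recurrence and asymptotic expansion because the Python standard library does not provide it.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic expansion."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x   # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))

def update_nu(z, w, e1, lo=1e-3, hi=1e3, tol=1e-10):
    """Solve 0 = log(nu/2) - psi(nu/2) + 1 + sum_j z_j (e1_j - w_j) / sum_j z_j.

    z, w, e1 hold the E-step quantities of one component. The bracket
    [lo, hi] is a hypothetical default; nu is clamped to it if no root lies inside.
    """
    c = sum(zj * (ej - wj) for zj, wj, ej in zip(z, w, e1)) / sum(z)

    def g(nu):
        return math.log(nu / 2.0) - digamma(nu / 2.0) + 1.0 + c

    if g(hi) > 0.0:   # root lies above the cap (near-Gaussian tails)
        return hi
    if g(lo) < 0.0:   # root lies below the floor
        return lo
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:   # g is decreasing, so the root is to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Dividing the estimating equation by $\sum_j z_{hj}^{(k)}$, as done here, leaves the root unchanged and keeps the function values on a scale that does not grow with the sample size.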

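3.3 The rMST and uMST as special cases

As pointed out previously, the rMST and uMST distributions can be obtained as special cases of the CFUST distribution by imposing structural constraints on the skewness parameter. In the case of the uMST distribution, the diagonal elements of the component skewness matrices are estimated as

\[
\boldsymbol{\delta}_h^{(k+1)} = \left( \boldsymbol{\Sigma}_h^{(k)-1} \odot \sum_{j=1}^{n} \tau_{hj}^{(k)} \mathbf{e}_{4,hj}^{(k)} \right)^{-1} \mathrm{diag}\!\left[ \boldsymbol{\Sigma}_h^{(k)-1} \sum_{j=1}^{n} \tau_{hj}^{(k)} \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right) \mathbf{e}_{3,hj}^{(k)T} \right], \tag{16}
\]

where $\odot$ denotes the elementwise product. Under the constraint that $\boldsymbol{\Delta}_h$ is zero except for the first column $\boldsymbol{\delta}_h$, that is, $\boldsymbol{\Delta}_h = [\boldsymbol{\delta}_h \; \mathbf{0} \; \cdots \; \mathbf{0}]$, (14) reduces to

\[
\boldsymbol{\delta}_h^{(k+1)} = \frac{\sum_{j=1}^{n} \tau_{hj}^{(k)} \left[ \mathbf{e}_{3,hj}^{(k)} \right]_1 \left( \mathbf{y}_j - \boldsymbol{\mu}_h^{(k+1)} \right)}{\sum_{j=1}^{n} \tau_{hj}^{(k)} \left[ \mathbf{e}_{4,hj}^{(k)} \right]_{11}}, \tag{17}
\]

where $[\mathbf{e}_{3,hj}^{(k)}]_1$ and $[\mathbf{e}_{4,hj}^{(k)}]_{11}$ denote the first element of $\mathbf{e}_{3,hj}^{(k)}$ and $\mathbf{e}_{4,hj}^{(k)}$, respectively. In (16) and (17), the symbols $\tau_{hj}^{(k)}$, $\mathbf{e}_{3,hj}^{(k)}$ and $\mathbf{e}_{4,hj}^{(k)}$ play the roles of $z_{hj}^{(k)}$, $\mathbf{e}_{2,hj}^{(k)}$ and $\mathbf{e}_{3,hj}^{(k)}$ in Section 3.1, namely the posterior membership probability and the conditional expectations of $w_j \mathbf{u}_j$ and $w_j \mathbf{u}_j \mathbf{u}_j^T$. On comparing the above with the corresponding expression for the rMST mixture model (see, for example, equation 9 from Wang et al. (2009)), it can be observed that the two expressions are indeed equivalent.

3.4 New results on $e_{1,hj}^{(k)}$: the truncated series approach

We now present the derivation of a new exact expression for $e_{1,hj}^{(k)}$, expressed in terms of an infinite series involving multivariate t-distribution functions. By the law of iterated expectations,

\[
\begin{aligned}
e_{1,hj}^{(k)} &= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \log(w_j) \mid \mathbf{y}_j \right]
= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \log(w_j) \mid \mathbf{u}_j, \mathbf{y}_j \right] \mid \mathbf{y}_j \right] \\
&= E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \psi\!\left( \frac{\nu_h^{(k)} + p + q}{2} \right) - \log\!\left( \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j) + (\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})^T \boldsymbol{\Lambda}_h^{(k)-1} (\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})}{2} \right) \,\Big|\, \mathbf{y}_j \right] \\
&= \psi\!\left( \frac{\nu_h^{(k)} + p + q}{2} \right) - \log\!\left( \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{2} \right)
- E_{\boldsymbol{\Psi}^{(k)}}\!\left[ \log\!\left( 1 + \frac{(\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})^T \boldsymbol{\Lambda}_h^{(k)-1} (\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)} \right) \,\Big|\, \mathbf{y}_j \right].
\end{aligned} \tag{18}
\]

To calculate the last term in (18), note that the conditional density of $\mathbf{u}_j$ given $\mathbf{y}_j$ can be written as

\[
f(\mathbf{u}_j \mid \mathbf{y}_j) = \frac{t_q\!\left( \mathbf{u}_j; \; \mathbf{q}_{hj}^{(k)}, \; \dfrac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p \right)}{T_q\!\left( \mathbf{q}_{hj}^{(k)}; \; \mathbf{0}, \; \dfrac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p \right)}, \qquad \mathbf{u}_j \in \mathbb{R}_+^q. \tag{19}
\]

After some algebraic manipulations, it follows that the expectation in (18) can be reduced to a simple closed-form expression (see equation 76 from Lee and McLachlan (2013a)), except for a term involving an integral given by

\[
S_{1,hj}^{(k)} = \int_{0}^{\infty} \log\!\left( 1 + \frac{(\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})^T \boldsymbol{\Lambda}_h^{(k)-1} (\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)} \right)
t_q\!\left( \mathbf{u}_j; \; \mathbf{q}_{hj}^{(k)}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p \right) d\mathbf{u}_j, \tag{20}
\]

which can be expanded as

\[
\begin{aligned}
S_{1,hj}^{(k)}
&= \sum_{r=1}^{\infty} \sum_{s=0}^{r} \frac{(-1)^{s}}{r} \binom{r}{s} \int_{0}^{\infty} \left( 1 + \frac{(\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})^T \boldsymbol{\Lambda}_h^{(k)-1} (\mathbf{u}_j - \mathbf{q}_{hj}^{(k)})}{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)} \right)^{-s}
t_q\!\left( \mathbf{u}_j; \; \mathbf{q}_{hj}^{(k)}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p \right) d\mathbf{u}_j \\
&= \sum_{r=1}^{\infty} \sum_{s=0}^{r} \frac{(-1)^{s}}{r} \binom{r}{s}
\frac{\Gamma\!\left( \frac{\nu_h^{(k)} + p}{2} + s \right) \Gamma\!\left( \frac{\nu_h^{(k)} + p + q}{2} \right)}{\Gamma\!\left( \frac{\nu_h^{(k)} + p}{2} \right) \Gamma\!\left( \frac{\nu_h^{(k)} + p + q}{2} + s \right)}
\int_{0}^{\infty} t_q\!\left( \mathbf{u}_j; \; \mathbf{q}_{hj}^{(k)}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p + 2s} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p + 2s \right) d\mathbf{u}_j \\
&= \sum_{r=1}^{\infty} \sum_{s=0}^{r} \frac{(-1)^{s}}{r} \binom{r}{s}
\frac{\Gamma\!\left( \frac{\nu_h^{(k)} + p}{2} + s \right) \Gamma\!\left( \frac{\nu_h^{(k)} + p + q}{2} \right)}{\Gamma\!\left( \frac{\nu_h^{(k)} + p}{2} \right) \Gamma\!\left( \frac{\nu_h^{(k)} + p + q}{2} + s \right)}
T_q\!\left( \mathbf{q}_{hj}^{(k)}; \; \mathbf{0}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p + 2s} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p + 2s \right).
\end{aligned} \tag{21}
\]

In deriving (21), we have applied a Taylor expansion of $\log x$ for $0 < x < 2$ and the binomial expansion formula to obtain a series expression for $\log x$, given by

\[
\log x = \sum_{r=1}^{\infty} \frac{(-1)^{r+1}}{r} (x - 1)^r = \sum_{r=1}^{\infty} \sum_{s=0}^{r} \frac{(-1)^{s+1}}{r} \binom{r}{s} x^{s}, \qquad \text{for } 0 < x < 2. \tag{22}
\]

Note that the term inside the logarithm in (20) is always greater than or equal to one. Hence, its inverse lies between 0 and 1, and we can apply the above series representation to the inverse, using $\log x = -\log(x^{-1})$; this changes the sign $(-1)^{s+1}$ in (22) to $(-1)^{s}$ and the power $x^{s}$ to $x^{-s}$. The Gamma-function ratio in (21) arises because the product of the factor raised to the power $-s$ and the t-density with $\nu_h^{(k)} + p$ degrees of freedom is proportional to a t-density with $\nu_h^{(k)} + p + 2s$ degrees of freedom. In practice, $S_{1,hj}^{(k)}$ can be approximated by a small finite number of terms.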
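The series device behind this expansion can be illustrated in isolation. The sketch below (the function name `log_via_inverse_series` is hypothetical) evaluates the truncated double sum $\sum_{r=1}^{R} \sum_{s=0}^{r} \frac{(-1)^s}{r} \binom{r}{s} x^{-s}$, which is the expansion of $\log x = -\log(x^{-1})$ for $x \geq 1$. The inner sum equals $(1 - 1/x)^r$, so the truncation error shrinks quickly when $x$ is close to one and more slowly as $x$ grows.

```python
import math

def log_via_inverse_series(x, n_terms):
    """Truncated series for log(x), x >= 1, applied to the inverse 1/x.

    Computes sum_{r=1}^{n_terms} (1/r) * sum_{s=0}^{r} (-1)^s * C(r, s) * x^(-s).
    """
    total = 0.0
    for r in range(1, n_terms + 1):
        # Binomial expansion of (1 - 1/x)^r.
        inner = sum((-1) ** s * math.comb(r, s) * x ** (-s) for s in range(r + 1))
        total += inner / r
    return total

for x in (1.5, 4.0):
    for n_terms in (5, 20, 40):
        approx = log_via_inverse_series(x, n_terms)
        print(x, n_terms, approx, math.log(x) - approx)
```

Because every term $(1 - 1/x)^r / r$ is non-negative, the partial sums increase towards $\log x$ from below. The alternating binomial form is kept here only to mirror the derivation; a numerical implementation could evaluate $(1 - 1/x)^r$ directly to avoid cancellation.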

It follows that $e_{1,hj}^{(k)}$ is given by

\[
\begin{aligned}
e_{1,hj}^{(k)} ={}& \psi\!\left( \frac{\nu_h^{(k)} + p + q}{2} \right) - \log\!\left( \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{2} \right) \\
& - \left[ T_q\!\left( \mathbf{q}_{hj}^{(k)}; \; \mathbf{0}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p \right) \right]^{-1}
\sum_{r=1}^{\infty} \sum_{s=0}^{r} \frac{(-1)^{s}}{r} \binom{r}{s}
\frac{\Gamma\!\left( \frac{\nu_h^{(k)} + p}{2} + s \right) \Gamma\!\left( \frac{\nu_h^{(k)} + p + q}{2} \right)}{\Gamma\!\left( \frac{\nu_h^{(k)} + p}{2} \right) \Gamma\!\left( \frac{\nu_h^{(k)} + p + q}{2} + s \right)} \\
& \qquad \times T_q\!\left( \mathbf{q}_{hj}^{(k)}; \; \mathbf{0}, \; \frac{\nu_h^{(k)} + d_h^{(k)}(\mathbf{y}_j)}{\nu_h^{(k)} + p + 2s} \boldsymbol{\Lambda}_h^{(k)}, \; \nu_h^{(k)} + p + 2s \right).
\end{aligned} \tag{23}
\]

4 Conclusions

We have introduced a finite mixture of CFUST distributions, where the component densities are a generalization of the restricted and unrestricted skew t-distributions. Parameter estimation via the EM algorithm has been outlined, following the truncated moments approach in Ho et al. (2012) and Lee and McLachlan (2013a). Furthermore, new results have been derived for the conditional expectation $e_{1,hj}^{(k)}$, exploiting an infinite series representation of the logarithm function. This leads to a full implementation of the EM algorithm for the FM-CFUST model with exact expressions for all E-step conditional expectations. Simulations and applications of the FM-CFUST model, as well as an extension to factor analyzers, will be treated in a forthcoming manuscript.

References

Arellano-Valle RB, Azzalini A (2006) On the unification of families of skew-normal distributions. Scandinavian Journal of Statistics 33:561-574

Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. Journal of Multivariate Analysis 96:93-116

Cabral CS, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics and Data Analysis 56:126-142

Ho HJ, Lin TI, Chen HY, Wang WL (2012) Some results on the truncated multivariate t distribution. Journal of Statistical Planning and Inference 142:25-40

Lee S, McLachlan GJ (2011) On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm. arXiv:1109.4706 [stat.ME]

Lee S, McLachlan GJ (2013a) Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing. DOI 10.1007/s11222-012-9362-4

Lee SX, McLachlan GJ (2013b) On mixtures of skew-normal and skew t-distributions. Advances in Data Analysis and Classification 7:241-266

Lin TI (2010) Robust mixture modeling using multivariate skew t distribution. Statistics and Computing 20:343-356

Murray PM, Browne RP, McNicholas PD (2013) Mixtures of skew-t factor analyzers. Preprint arXiv:1305.4301 [stat.ME]

Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirov JP (2009) Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences USA 106:8519-8524

Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics 31:129-150

Vrbik I, McNicholas PD (2012) Analytic calculations for the EM algorithm for multivariate skew t-mixture models. Statistics and Probability Letters 82:1169-1174

Wang K, Ng SK, McLachlan GJ (2009) Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi H, Zhang Y, Bottema MJ, Lovell BC, Maeder AJ (eds) DICTA 2009 Conference of Digital Image Computing: Techniques and Applications, Melbourne, IEEE Computer Society, Los Alamitos, California, pp 526-531