Estimation of the Precision Matrix of a Singular Wishart Distribution and its Application in High Dimensional Data


Estimation of the Precision Matrix of a Singular Wishart Distribution and its Application in High Dimensional Data

Tatsuya Kubokawa and Muni S. Srivastava

University of Tokyo and University of Toronto

July 8, 2005

Abstract

In this article, the Stein-Haff identity is established for a singular Wishart distribution with a positive definite mean matrix but with the dimension larger than the degrees of freedom. This identity is then used to obtain estimators of the precision matrix improving on the estimator based on the Moore-Penrose inverse of the Wishart matrix under the Efron-Morris loss function and its variants. Ridge-type empirical Bayes estimators of the precision matrix are also given, and their dominance properties over the usual one are shown using this identity. Finally, these precision estimators are used in a quadratic discriminant rule, and it is shown through simulation that the use of the ridge-type empirical Bayes estimators provides higher correct classification rates.

Key words and phrases: Covariance matrix, discriminant analysis, dominance property, Efron-Morris loss function, empirical Bayes procedure, multivariate classification, precision matrix, singular Wishart, Stein-Haff identity.

1 Introduction

The estimation of the precision matrix, namely the inverse of the covariance matrix Σ, of a multivariate normal distribution has been an important issue in practical situations as well as from theoretical aspects, and when the dimension p is smaller than the number of observations n, Efron and Morris (1976) considered this problem. But when p > n, the Wishart matrix is singular, and thus many estimators can be constructed by using a generalized inverse of the sample covariance matrix. However, Srivastava (2004) proposed the unique Moore-Penrose inverse of the sample covariance matrix, as it uses the sufficient

Faculty of Economics, University of Tokyo, Hongo, Bunkyo-ku, Tokyo, JAPAN, E-Mail: tatsuya@e.u-tokyo.ac.jp

Department of Statistics, University of Toronto, 100 St George Street, Toronto, Ontario, CANADA M5S 3G3, E-Mail: srivasta@utstat.toronto.edu

statistic for Σ. In this paper, we obtain several estimators theoretically improving on the Moore-Penrose inverse estimator of the precision matrix, some of which are shown to be very useful in discriminant analysis.

To specify the problem considered here, let x_1, ..., x_N be independently and identically distributed (i.i.d.) as multivariate normal with mean vector µ and a p × p positive definite covariance matrix Σ, denoted N_p(µ, Σ), Σ > 0. Let x̄ = N^{-1} Σ_{i=1}^N x_i, n = N − 1 and

W = Σ_{i=1}^N (x_i − x̄)(x_i − x̄)^t.

Then W = YY^t, where Y = (y_1, ..., y_n), y_1, ..., y_n are i.i.d. N_p(0, Σ), and W has a Wishart distribution with mean nΣ and degrees of freedom n, denoted W_p(Σ, n). When n < p, it is called a singular Wishart distribution, whose distribution has been recently studied by Srivastava (2003). In many inference procedures, an estimate of the precision matrix Σ^{-1} is required. Srivastava (2004) used nW^+, where W^+ is the Moore-Penrose inverse of W. We shall consider a generalized version of this estimator for estimating the precision matrix. It is given by δ_a = aW^+ for a constant a.

The main aim of this paper is to develop estimators of Σ^{-1} improving on the usual one δ_a in terms of risk in a decision-theoretic framework. To evaluate the risk of δ_a, however, we cannot employ the Stein loss function L_S(δ, Σ) = tr δΣ − log |δΣ| − p for an estimator δ, because of the singularity of W^+. Alternative loss functions are of the form

L_k(δ, Σ) = tr (δ − Σ^{-1})^2 W^k for k = 0, 1, 2,

where the L_1-loss was used by Efron and Morris (1976), and the L_1- and L_0-losses were used by Haff (1979b). However, all the estimators that we obtain in Section 2 under the losses L_0 and L_1 dominating the estimator nW^+ require not only that p > n but also that n = O(p^ε), 0 ≤ ε < 1. In practical applications this is a severe restriction. On the other hand, no such restriction on n and p is required under the loss function L_2. Thus, for obtaining a ridge type empirical Bayes estimator of the precision matrix Σ^{-1}, we consider only the L_2 loss function.
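For concreteness, the setup above can be sketched numerically as follows. This is a minimal illustration with our own choices of p, N and Σ, not taken from the paper: the singular Wishart matrix W = YY^t has exactly n = N − 1 nonzero eigenvalues, and its Moore-Penrose inverse is W^+ = H_1L^{-1}H_1^t, built from those eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 20, 11                      # dimension larger than sample size: p > n = N - 1
n = N - 1

# training sample x_1, ..., x_N ~ N_p(0, Sigma), with an arbitrary Sigma > 0
Sigma = np.diag(np.linspace(1.0, 2.0, p))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=N)

xbar = X.mean(axis=0)
W = (X - xbar).T @ (X - xbar)      # rank-n singular Wishart matrix

# Moore-Penrose inverse via the n nonzero eigenvalues: W+ = H1 L^{-1} H1^t
evals, evecs = np.linalg.eigh(W)
idx = np.argsort(evals)[::-1][:n]  # keep the n largest (nonzero) eigenvalues
L, H1 = evals[idx], evecs[:, idx]
W_plus = H1 @ np.diag(1.0 / L) @ H1.T

delta = n * W_plus                 # Srivastava's estimator nW+
assert np.allclose(W @ W_plus @ W, W)   # sanity check of the Moore-Penrose property
```

The check at the end verifies one of the defining Moore-Penrose conditions; the estimator δ_a = aW^+ is obtained by replacing the multiple n with any constant a.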
To develop analytical dominance properties of estimators, we need to derive the so-called Stein-Haff identity in the singular Wishart distribution. The Stein-Haff identity was derived by Stein (1977) and Haff (1979a) for the full rank Wishart distribution. A similar identity for the elliptically contoured model has been given by Kubokawa and Srivastava (1999). It is well known that the Stein-Haff identity is a very powerful tool for developing significant dominance results. In the Appendix, we derive the Stein-Haff

identity in the singular Wishart distribution, which is equally powerful. With the help of this identity, we obtain in Section 2 several estimators dominating δ_a under the three loss functions L_0, L_1 and L_2. In Section 3, the empirical Bayes approach to the estimation of the precision matrix is given to provide ridge-type stable procedures dominating δ_a under the loss function L_2.

It may be of great interest to investigate how useful the improved estimators of the precision matrix are in practical multivariate inference procedures. While their application in tests and confidence intervals for mean vectors is currently under investigation, we in this article consider an equally important problem of classifying an observation vector into one of two groups with unequal covariance matrices. Through simulations we show that our empirical Bayes procedures using nonsingular ridge type estimators for the precision matrices provide significantly higher correct classification rates for the quadratic classification rules; these rates are very close to the rates obtained when all the parameters are known.

2 Estimation of the precision matrix

For estimating the precision matrix in the case of p > n, in this paper we consider orthogonally equivariant estimators of the general form

δ(Φ) = H_1 Φ(l) H_1^t,   (2.1)

where Φ(l) = diag(φ_1(l), ..., φ_n(l)) and W = H_1LH_1^t with H_1^tH_1 = I_n and L = diag(l_1, ..., l_n), l_1 > ... > l_n > 0 being the nonzero eigenvalues of W. Instead of the function Φ(l), we often use the function Ψ = Ψ(l) = diag(ψ_1(l), ..., ψ_n(l)) for Ψ(l) = LΦ(l), that is, ψ_i(l) = l_iφ_i(l), i = 1, ..., n. To evaluate the estimators, we use the following three loss functions:

L_0(δ, Σ) = tr (δ − Σ^{-1})^2,   (2.2)
L_1(δ, Σ) = tr (δ − Σ^{-1})^2 W,   (2.3)
L_2(δ, Σ) = tr (δ − Σ^{-1})^2 W^2,   (2.4)

which are here called the L_0-loss, the L_1-loss and the L_2-loss functions. The risk function of an estimator δ relative to the L_k-loss is written R_k(Σ, δ) = E[L_k(δ, Σ)] for k = 0, 1 and 2. Dominance results in terms of the risks are given below for the L_1, L_0 and L_2 loss functions, but all the proofs are given in the Appendix.
It is especially noted that the Stein-Haff identity (A.1) in the singular Wishart distribution is quite useful for establishing the dominance properties; the derivation of the identity is also given in the Appendix.
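Before using the identity, a quick Monte Carlo sanity check is instructive. The sketch below is our own construction: it takes Φ(l) = L (that is, φ_i = l_i), for which the right-hand side of the identity as reconstructed here collapses algebraically to n(p − n − 1) + 2n + n(n − 1) = np for every sample, while the left-hand side is tr(WΣ^{-1}), whose expectation is also np.

```python
import numpy as np

# Monte Carlo sanity check of the Stein-Haff identity (A.1), as reconstructed
# here, for the simple choice phi_i = l_i.  The right-hand side
#   sum_i {(p-n-1) phi_i/l_i + 2 d(phi_i)/d(l_i)} + 2 sum_{i<j} (phi_i-phi_j)/(l_i-l_j)
# then equals n(p-n-1) + 2n + n(n-1) = np exactly for every sample, while the
# left-hand side is tr(W Sigma^{-1}), whose expectation is np.
rng = np.random.default_rng(1)
p, n, reps = 30, 5, 4000

A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # an arbitrary positive definite Sigma
Sigma_inv = np.linalg.inv(Sigma)
C = np.linalg.cholesky(Sigma)

lhs = []
for _ in range(reps):
    Y = C @ rng.standard_normal((p, n))  # y_1, ..., y_n i.i.d. N_p(0, Sigma)
    W = Y @ Y.T                          # singular Wishart W_p(Sigma, n)
    lhs.append(np.trace(W @ Sigma_inv))

rhs = n * (p - n - 1) + 2 * n + n * (n - 1)   # = np, exactly, per sample
print(np.mean(lhs), rhs)   # the two sides agree up to Monte Carlo error
```

Since tr(WΣ^{-1}) is a sum of n independent χ²_p variables, the sample mean over 4000 replications lands within a fraction of a unit of np = 150 here.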

2.1 Dominance results relative to the L_1-loss

We first handle the L_1-loss, for it is the most tractable of the three. The risk function of δ(Φ) under the loss L_1(δ, Σ) is expressed as

R_1(Σ, δ(Φ)) = E[tr {δ(Φ)}^2 W − 2 tr δ(Φ)WΣ^{-1}] + n tr Σ^{-1}.   (2.5)

Then the Stein-Haff identity given by Lemma A.1 is applied to rewrite E[tr δ(Φ)WΣ^{-1}] as

E[tr δ(Φ)WΣ^{-1}] = E[tr H_1ΦLH_1^tΣ^{-1}]   (2.6)
= E[Σ_i {(p − n − 1)φ_i + 2 ∂(l_iφ_i)/∂l_i} + 2 Σ_{i<j} (l_iφ_i − l_jφ_j)/(l_i − l_j)].

Combining (2.5) and (2.6), we get the following expression of the risk function.

Proposition 2.1 Assume that p > n + 1. The risk function of the orthogonally equivariant estimator δ(Φ) relative to the L_1-loss (2.3) is expressed by

R_1(Σ, δ(Φ)) − n tr Σ^{-1}
= E[Σ_i {l_iφ_i^2 − 2(p − n − 1)φ_i − 4 ∂(l_iφ_i)/∂l_i} − 4 Σ_{i<j} (l_iφ_i − l_jφ_j)/(l_i − l_j)]
= E[Σ_i {ψ_i^2/l_i − 2(p − n − 1)ψ_i/l_i − 4 ∂ψ_i/∂l_i} − 4 Σ_{i<j} (ψ_i − ψ_j)/(l_i − l_j)].

In this article, we consider estimators of the kind aH_1L^{-1}H_1^t, a > 0. From Proposition 2.1, this estimator has the risk {a^2 − 2(p − n − 1)a}E[tr L^{-1}] + n tr Σ^{-1}, which is minimized at a = p − n − 1. Hence, the estimator with the best multiple is

δ_1 = a_1H_1L^{-1}H_1^t,  a_1 = p − n − 1,

with the risk R_1(Σ, δ_1) = −a_1^2 E[tr L^{-1}] + n tr Σ^{-1}. Although it is not possible to get an unbiased estimator of the risk R_1(Σ, δ(Φ)) in the case of p > n, we can provide an unbiased estimator of the risk difference R_1(Σ, δ(Φ)) − R_1(Σ, δ_1), which gives a sufficient condition for improving on the estimator δ_1.

Proposition 2.2 The estimator δ(Φ) dominates δ_1 relative to the L_1-loss (2.3) if the ψ_i(l)'s satisfy the inequality

Σ_i (ψ_i^2 − 2a_1ψ_i)/l_i − 4 Σ_{i<j} (ψ_i − ψ_j)/(l_i − l_j) − 4 Σ_i ∂ψ_i/∂l_i ≤ −a_1^2 Σ_i 1/l_i,

for p > n + 1, where ψ_i = l_iφ_i. The following proposition is very useful for developing improved estimators.

5 Proposton.3 Assume that Ψ(l) = dag (ψ 1 (l),...,ψ n (l)) satsfes the followng condtons for p>n+1: (a) ψ (l)/ l 0 for =1,...,n. (b) ψ 1 (l) ψ n (l) =p n 1. (c) n + p 1 ψ (l) for each. Then the estmator δ(l 1 Ψ)=H 1 L 1 Ψ(l)H t 1 domnates the estmator δ 1 relatve to the L 1 -loss (.3). Proposton.3 drectly provdes an example of the Sten type estmator gven by δ S =H 1 DL 1 H t 1, (.7) D =dag (d 1,...,d n ), d =n + p 1 for =1,...,n. Ths corresponds to the case of φ = d or ψ = l d, and the domnance property follows from Proposton.3. Proposton.4 The Sten type estmator δ S domnates δ 1 under the L 1 -loss. The Sten type estmator δ S can be further mproved on by the estmator δ IS (g) =δ S + g(l) tr W I p, (.8) where g(l) s an absolutely contnuous functon. Ths domnance property follows from Proposton.5. Proposton.5 Assume that g(l) satsfes the condtons: (a) g(l)/ l 0 for =1,...,n. (b) 0 <g(l) 4(n 1). Then the estmator δ IS (g) domnates the Sten type estmator δ S under the L 1 -loss. Puttng g(l) = (n 1) n (.8) gves the mproved estmator δ IS = H 1 DL 1 H t (n 1) 1 + tr W I p, (.9) whch we shall call the mproved Sten type estmator. It s noted that δ IS has a form smlar to the Efron-Morrs type estmator gven by δ EM =(p n 1)H 1 L 1 H t (n 1)(n +) 1 + I p. (.10) tr W Usng the same arguments as n the proof of Proposton.5 shows that δ EM domnates δ 1 relatve to the L 1 -loss, but t s not known f t domnates δ S or δ IS. Proposton.3 allows us to produce a new type of mproved estmator, gven by δ R = H 1 dag (φ R 1 (l),...,φr n (l))ht 1, (.11) 5

where, for i = 1, ..., n,

φ_i^R(l) = d_i/(l_i + λ̂_i)  and  d_i = p + n − 2i − 1.

Here, λ̂_n = 0 and, for i = 1, ..., n − 1, λ̂_i is a function of l_{i+1}, ..., l_n defined sequentially by

λ̂_i = (d_iλ̂_{i+1} + 2l_i)/d_{i+1}.   (2.12)

It is interesting to note that the estimator δ_R is of a ridge type because of the nonnegativity of the λ̂_i's.

Proposition 2.6 The ridge type estimator δ_R dominates δ_1 under the L_1-loss.

2.2 Dominance results relative to the L_0-loss

The risk function of δ(Φ) under the loss L_0(δ, Σ) is expressed as

R_0(Σ, δ(Φ)) = E[tr {δ(Φ)}^2 − 2 tr δ(Φ)Σ^{-1}] + tr Σ^{-2}.

Using the Stein-Haff identity given by Lemma A.1 for the term E[tr δ(Φ)Σ^{-1}], we get the following expression of the risk function.

Proposition 2.7 Assume that p > n + 3. The risk function of the orthogonally equivariant estimator δ(Φ) relative to the L_0-loss (2.2) is expressed by

R_0(Σ, δ(Φ)) − tr Σ^{-2}
= E[Σ_i {φ_i^2 − 2(p − n − 1)φ_i/l_i − 4 ∂φ_i/∂l_i} − 4 Σ_{i<j} (φ_i − φ_j)/(l_i − l_j)]
= E[Σ_i {ψ_i^2 − 2(p − n − 3)ψ_i}/l_i^2 − 4 Σ_{i<j} (l_jψ_i − l_iψ_j)/(l_il_j(l_i − l_j)) − 4 Σ_i (1/l_i) ∂ψ_i/∂l_i],

where Φ(l) = L^{-1}Ψ(l) = diag(ψ_1(l)/l_1, ..., ψ_n(l)/l_n) for ψ_i = l_iφ_i.

From Proposition 2.7, the estimator of the form aH_1L^{-1}H_1^t has the risk

R_0(Σ, aH_1L^{-1}H_1^t) − tr Σ^{-2} = {a^2 − 2(p − n − 3)a}E[tr L^{-2}] + 4aE[Σ_{i<j} 1/(l_il_j)]   (2.13)
= {a^2 − 2(p − n − 2)a}E[tr L^{-2}] + 2aE[(tr L^{-1})^2],

since 2 Σ_{i<j} 1/(l_il_j) = (tr L^{-1})^2 − tr L^{-2}. This expression shows that the best constant a does not exist, but we suggest a reasonable choice of a given by

δ_0 = a_0H_1L^{-1}H_1^t,  a_0 = p − n − 3.
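The eigenvalue-based estimators of Section 2.1 — δ_1, the Stein type δ_S, the improved Stein type δ_IS, and the ridge type δ_R — can be assembled as follows. This is a sketch under our reconstruction of the constants d_i = p + n − 2i − 1 and of the backward recursion for the λ̂_i, with all function and variable names our own.

```python
import numpy as np

def l1_estimators(W, p, n):
    """Assemble the L1-loss estimators of Section 2.1 (our reconstruction):
    delta_1 = (p-n-1) H1 L^{-1} H1^t, the Stein type delta_S with
    d_i = p + n - 2i - 1, the improved Stein type delta_IS, and the
    ridge type delta_R with lam_n = 0, lam_i = (d_i lam_{i+1} + 2 l_i)/d_{i+1}."""
    evals, evecs = np.linalg.eigh(W)
    idx = np.argsort(evals)[::-1][:n]
    l, H1 = evals[idx], evecs[:, idx]            # l_1 > ... > l_n > 0

    i = np.arange(1, n + 1)
    d = (p + n - 2 * i - 1).astype(float)        # d_i, decreasing, d_n = p - n - 1

    delta_1 = (p - n - 1) * (H1 * (1.0 / l)) @ H1.T
    delta_S = (H1 * (d / l)) @ H1.T
    delta_IS = delta_S + (2 * (n - 1) / np.trace(W)) * np.eye(p)

    lam = np.zeros(n)                            # ridge weights, computed backwards
    for k in range(n - 2, -1, -1):
        lam[k] = (d[k] * lam[k + 1] + 2 * l[k]) / d[k + 1]
    delta_R = (H1 * (d / (l + lam))) @ H1.T
    return delta_1, delta_S, delta_IS, delta_R
```

With these definitions, ψ_i = d_il_i/(l_i + λ̂_i) is nonincreasing in i, bounded above by d_i, and equals p − n − 1 at i = n, matching conditions (a)-(c) of Proposition 2.3 as reconstructed here.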

As seen from (2.13), R_0(Σ, δ_0) ≤ R_0(Σ, aH_1L^{-1}H_1^t) for any a > a_0 and any Σ, which implies that δ_0 dominates δ_1, as pointed out by Haff (1979b). An unbiased estimator of the risk difference R_0(Σ, δ(Φ)) − R_0(Σ, δ_0) can be provided, and a sufficient condition for improving on the estimator δ_0 is given in the following.

Proposition 2.8 The estimator δ(Φ) dominates δ_0 relative to the L_0-loss (2.2) if the ψ_i(l)'s satisfy the inequality

Σ_i {ψ_i^2 − 2a_0ψ_i}/l_i^2 − 4 Σ_{i<j} (l_jψ_i − l_iψ_j)/(l_il_j(l_i − l_j)) − 4 Σ_i (1/l_i) ∂ψ_i/∂l_i ≤ −a_0^2 Σ_i 1/l_i^2 + 4a_0 Σ_{i<j} 1/(l_il_j),

for p > n + 3.

Proposition 2.8 provides the condition for the estimator δ(Φ) to dominate δ_0 in the case of p > n. Some improved estimators proposed by Haff (1979b) and Dey (1987) for n > p can retain their dominance properties in the case p > n by interchanging n and p. Of these, Dey (1987) proposed the use of estimators of the form

δ_DE(g) = δ_0 + (g(l)/tr W^2) W,   (2.14)

where g(l) is an absolutely continuous function. The following proposition provides conditions for δ_DE(g) to dominate δ_0.

Proposition 2.9 Assume that g(l) satisfies the conditions:
(a) ∂g(l)/∂l_i ≥ 0 for i = 1, ..., n.
(b) 0 < g(l) ≤ 2(n − 1)(n + 4).
Then the estimator δ_DE(g) dominates the estimator δ_0 under the L_0-loss.

Putting g(l) = (n − 1)(n + 4) in (2.14) gives the improved estimator

δ_DE = δ_0 + ((n − 1)(n + 4)/tr W^2) W,   (2.15)

which we shall call the Dey type estimator.

Finally, we shall derive a Stein type estimator dominating δ_0, like Propositions 2.3 and 2.4. Let r = (n − 1)/2 if n is odd and r = n/2 if n is even. Define constants d_i by

d_i = max{a_0 + 2(n + 1 − 2i), a_0} = max{p + n − 4i − 1, p − n − 3}
= { p + n − 4i − 1, if i = 1, ..., r,
  { p − n − 3, if i = r + 1, ..., n,

and let D' = diag(d_1, ..., d_n). The resulting Stein type estimator is of the form δ'_S = H_1D'L^{-1}H_1^t. The dominance property of δ'_S over δ_0 follows from the following proposition.

8 Proposton.10 Assume that Ψ(l) = dag (ψ 1 (l),...,ψ n (l)) satsfes the followng condtons for p>n+3: (a) ψ (l)/ l 0 for =1,...,n. (b) ψ 1 (l) ψ n (l) =p n 3 a 0. (c) d = max{p + n 4 1, p n 3} ψ for =1,...,n. Then the estmator δ(l 1 Ψ)=H 1 L 1 Ψ(l)H t 1, gven by (.1), domnates the estmator δ 0 relatve to the L 0 -loss (.3). Proposton.10 provdes not only the Sten type estmator δ S, but also a rdge type estmator for mprovng on δ 0. Defne ˆλ sequentally by The rdge type estmator s gven by where for =1,...,n, { ˆλ (d = ˆλ +1 +4l )/d +1, for =1,...,r, 0, for = r +1,...,n. δ R = H 1 dag (φ R 1 (l),...,φr n (l))ht 1, (.16) φ R (l) = d l + ˆλ. Then the same arguments as n the proof of Proposton.6 can be used to show that δ R domnates δ 0. Proposton.11 The rdge type estmator δ R domnates δ 0 under the L 0 -loss..3 Domnance results relatve to the L -loss The rsk functon of δ(φ) under the loss L (δ, Σ) s expressed as R (Σ, δ(φ)) = E tr {δ(φ)} W tr δ(φ)w Σ 1 +trw Σ ]. Usng the Sten-Haff dentty gven by Lemma A.1, we can derve an expresson of the rsk functon. Proposton.1 Assume that p>n+1. The rsk functon of the orthogonally equvarant estmator δ(φ) relatve to the L -loss (.4) s expressed by R (Σ,δ(Φ)) Etr W Σ ] = E ψ (p + n +1 )ψ 4 j> l j (ψ ψ j ) l l j 4l ψ l ], where Φ(l) =L 1 Ψ(l) = dag (ψ 1 (l)/l 1,...,ψ n (l)/l n ) for ψ = l φ. 8

From Proposition 2.12, the estimator of the form aH_1L^{-1}H_1^t has the risk n{a^2 − 2pa} + E[tr W^2Σ^{-2}], which is minimized at a = p. Hence, the estimator with the best multiple is

δ_2 = a_2H_1L^{-1}H_1^t,  a_2 = p,

with the risk R_2(Σ, δ_2) = −np^2 + E[tr W^2Σ^{-2}]. Although it is not possible to get an unbiased estimator of the risk R_2(Σ, δ(Φ)) in the case of p > n, we can provide an unbiased estimator of the risk difference R_2(Σ, δ(Φ)) − R_2(Σ, δ_2), which gives a sufficient condition for improving on the estimator δ_2.

Proposition 2.13 The estimator δ(Φ) dominates δ_2 relative to the L_2-loss (2.4) if the ψ_i(l)'s satisfy the inequality

Σ_i {ψ_i^2 − 2(p + n + 1 − 2i)ψ_i + p^2} − 4 Σ_{i<j} l_j(ψ_i − ψ_j)/(l_i − l_j) − 4 Σ_i l_i ∂ψ_i/∂l_i ≤ 0,

for p > n + 1.

One candidate for improving on δ_2 may be the Efron-Morris type estimator

δ_EM(g) = δ_2 + (g(l)/tr W) I_p,   (2.17)

where g(l) is an absolutely continuous function. The following proposition provides the conditions for δ_EM(g) to dominate δ_2.

Proposition 2.14 Assume that g(l) satisfies the conditions:
(a) ∂g(l)/∂l_i ≥ 0 for i = 1, ..., n.
(b) 0 < g(l) ≤ 2(n − 1).
Then the estimator δ_EM(g) dominates the estimator δ_2 under the L_2-loss.

Putting g(l) = n − 1 in (2.17) gives the improved estimator

δ_EM = δ_2 + ((n − 1)/tr W) I_p.   (2.18)

From Proposition 2.13, we can also get another condition for the estimator δ(Φ) to dominate δ_2 in the case of p > n + 1.

Proposition 2.15 Assume that Ψ(l) = diag(ψ_1(l), ..., ψ_n(l)) satisfies the following conditions for p > n + 1:
(a) ∂ψ_i(l)/∂l_i ≥ 0 for i = 1, ..., n.
(b) ψ_1(l) ≥ ... ≥ ψ_n(l).
(c) Σ_{i=1}^n {ψ_i^2 − 2(p + n + 1 − 2i)ψ_i + p^2} ≤ 0.
Then the estimator δ(L^{-1}Ψ) = H_1L^{-1}Ψ(l)H_1^t dominates the estimator δ_2 relative to the L_2-loss (2.4).
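Condition (c) can be verified numerically for the constant choice ψ_i = p + n + 1 − 2i (the Stein type weights used next), under our reconstruction of the indices. The snippet below checks, for a few (p, n) pairs of our own choosing, that the sum in (c) is nonpositive and that the weights are decreasing as condition (b) requires:

```python
import numpy as np

# Check (under our reconstruction) that psi_i = p + n + 1 - 2i satisfies
# conditions (b) and (c) of Proposition 2.15:
#   sum_i { psi_i^2 - 2(p + n + 1 - 2i) psi_i + p^2 } <= 0,  psi decreasing.
for p, n in [(20, 5), (100, 25), (50, 49)]:
    i = np.arange(1, n + 1)
    c_i = p + n + 1 - 2 * i
    psi = c_i.astype(float)                      # constant weights psi_i = d_i
    lhs = np.sum(psi**2 - 2 * c_i * psi + p**2)  # = sum_i {p^2 - (p+n+1-2i)^2}
    assert lhs <= 0                              # condition (c)
    assert np.all(np.diff(psi) < 0)              # condition (b): decreasing
    print(p, n, lhs)
```

The sum collapses to −Σ_i (n + 1 − 2i)^2, which is strictly negative for n ≥ 2; this is why the Stein type choice below satisfies condition (c) for every p > n.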

10 Proposton.15 provdes an example of the Sten type estmator gven by δ S =H 1 D L 1 H t 1, D =dag (d 1,...,d n), d =p + n + 1 for =1,...,n. Proposton.16 The Sten type estmator δ S domnates δ under the L -loss. In the above secton, we have obtaned estmators under the three loss functons L 0, L 1 and L. These estmators domnate the estmators of the knd aw +. However, all the estmators obtaned under L 0 and L 1 loss functons nvolve a factor of p n. Ths n turn mples that n = O(p ε ), 0 ε<1. Such a restrcton, however, s not needed under the loss functon L. Thus, from now on, we wll consder estmators only under the loss functon L. But even under the L loss functon, the estmator of the knd pw + s not stable due to smallness of smaller egenvalues. Thus, t would be desrable to obtan rdge-type estmators of the precson matrx under the loss functon L. Ths s obtaned n the next secton usng emprcal Bayes methods. 3 Emprcal Bayes estmator of the precson matrx The estmators of the precson matrx Σ 1 gven n the prevous secton have the shortcomng of ther sngularty n the case of p>n. An approach to dervng nonsngular estmators of Σ 1 s to consder rdge type estmators of the form a(w + λi p ) 1 for postve constants a and λ. The mportant ssue n the use of the rdge type estmators s how to choose the rdge parameter λ. We here employ an emprcal Bayes method for gvng estmators of λ and show that the resultng rdge type emprcal Bayes estmators domnate the usual estmator pw + relatve to the L -loss functon. 3.1 Emprcal Bayes procedures In our Bayesan setup, we assume that Σ 1 has a Wshart dstrbuton wth mean rλ 1 I p, λ>0, r>p, and degrees of freedom r. That s, Σ 1 W p (λ 1 I p,r), λ>0, r p, wth the densty π(σ 1 )=c(p, r) λ 1 I p r/ Σ 1 (r p 1)/ etr( 1 λσ 1 ) where etr (A) stands for the exponental of the trace of the matrx A and c(p, r) = (pr)/ Γ p (r/) ] 1, Γp (r/) = π p(p 1)/4 p Γ( r +1 ). 
Since W is distributed as the singular Wishart distribution W_p(Σ, n) for n < p, there exists a random matrix Y = (y_1, ..., y_n) such that W = YY^t and the y_i's are i.i.d. N_p(0, Σ), Σ > 0. Then, the joint p.d.f. of y_1, ..., y_n given Σ^{-1} is given by

(2π)^{−np/2} |Σ^{-1}|^{n/2} etr{−(1/2)Σ^{-1}YY^t}.

Hence, the joint p.d.f. of Y and Σ^{-1} is given by

c(p, r)(2π)^{−pn/2} |λI_p|^{r/2} |Σ^{-1}|^{(n+r−p−1)/2} etr{−(1/2)Σ^{-1}(λI_p + YY^t)}.

From this joint density, it is seen that the posterior distribution of Σ^{-1} given Y is W_p((λI_p + YY^t)^{-1}, n + r). Then, the Bayes estimator of Σ^{-1} is written as

δ_B(λ) = E[Σ^{-1}|Y] = (n + r)(λI_p + YY^t)^{-1}.   (3.1)

Since λ is unknown, it should be estimated from the marginal density of Y, which is given by

(c(p, r)/c(p, n + r)) (2π)^{−pn/2} |λI_p|^{r/2} / |λI_p + YY^t|^{(n+r)/2}.

Making the transformation V = Y^tY, we obtain its marginal density from Lemma 3.2.3 of Srivastava and Khatri (1979) as

(c(p, r)/c(p, n + r)) (π^{pn/2}/Γ_n(p/2)) λ^{−np/2} |V|^{(p−n−1)/2} |I_n + λ^{-1}V|^{−(n+r)/2}
= (c(p, r)/c(p, n + r)) (π^{pn/2}/Γ_n(p/2)) |λ^{-1}V|^{(p−n−1)/2} |I_n + λ^{-1}V|^{−(n+r)/2} λ^{−n(n+1)/2}

with respect to the nonsingular matrix V. Note that the expectations E[|λ^{-1}V|^{1/n}], E[tr λ^{-1}V] and E[tr λV^{-1}] are constants independent of λ. These suggest the use of the following moment estimators as possible candidates for estimating λ:

λ̂_G = c|V|^{1/n} = c(Π_{i=1}^n l_i)^{1/n},
λ̂_A = c tr V = c Σ_{i=1}^n l_i,
λ̂_H = c/tr V^{-1} = c(Σ_{i=1}^n l_i^{-1})^{-1},   (3.2)

for a positive constant c and the eigenvalues l = (l_1, ..., l_n) of V. It is noted that the estimators λ̂_G, λ̂_A and λ̂_H are based on the geometric, the arithmetic and the harmonic means of l_1, ..., l_n. Another type of estimator is provided by the solution λ̂_M of the equation

Σ_{i=1}^n λ̂_M/(λ̂_M + l_i) = c,   (3.3)

for a constant c satisfying 0 < c < n. This is analogous to the maximum likelihood estimator from the marginal density. An empirical Bayes estimator can be derived by substituting an estimator of the hyperparameter into the Bayes estimator. When λ is estimated by an estimator λ̂ = λ̂(l), the empirical Bayes estimator of Σ^{-1} is given by

δ_a^{EB}(λ̂) = a(W + λ̂I_p)^{-1},   (3.4)

where a is a positive constant suitably chosen. In the Bayes estimator (3.1), a is given by a = n + r. For r = p − n, a is a = p, which is the best multiple under the L_2-loss function. In the next subsection, we examine the dominance property of the estimator in (3.4) under the loss function L_2.

3.2 Dominance property under the L_2-loss

Now we shall investigate whether the empirical Bayes estimator δ_a^{EB}(λ̂) given in (3.4) dominates the estimator of the form δ_a = aW^+ for W^+ = H_1L^{-1}H_1^t relative to the L_2-loss (2.4). Using the Stein-Haff identity given by Lemma A.1, we derive in the following proposition an unbiased estimator of the risk difference of the two estimators δ_a^{EB}(λ̂) and δ_a, the proof of which is given in the Appendix.

Proposition 3.1 An unbiased estimator of the risk difference Δ(a, λ̂) = R_2(Σ, δ_a^{EB}(λ̂)) − R_2(Σ, δ_a) relative to the L_2-loss is given by

Δ̂(a, λ̂) = a Σ_i (λ̂/(l_i + λ̂)) { (a + 2) λ̂/(l_i + λ̂) + 2 Σ_{j=1}^n λ̂/(l_j + λ̂) − 2(a − p + n + 1) + 4 l_i^2 (∂λ̂/∂l_i)/(λ̂(l_i + λ̂)) }.   (3.5)

As demonstrated in Section 2.3, it is noted that the best multiple a of the estimators aW^+ is given by a = p relative to the L_2-loss. From Proposition 3.1, it is seen that the ridge type empirical Bayes estimator δ^{EB}(λ̂) = p(W + λ̂I_p)^{-1} dominates the estimator δ_2 = pW^+ if the ridge function λ̂ satisfies the inequality

Σ_i (λ̂/(l_i + λ̂)) { (p + 2) λ̂/(l_i + λ̂) + 2 Σ_{j=1}^n λ̂/(l_j + λ̂) − 2(n + 1) + 4 l_i^2 (∂λ̂/∂l_i)/(λ̂(l_i + λ̂)) } ≤ 0.   (3.6)

Using the condition (3.6), we first obtain a condition on c for the function λ̂_M to satisfy the inequality (3.6), where λ̂_M is the solution of the equation

Σ_i λ̂_M/(λ̂_M + l_i) = c.   (3.7)

Then from the implicit function theorem, we can see that the partial derivative ∂λ̂_M/∂l_i is given by

∂λ̂_M/∂l_i = (λ̂_M/(l_i + λ̂_M)^2) / (Σ_{j=1}^n l_j/(l_j + λ̂_M)^2),

which is used to get that

l_i^2 (∂λ̂_M/∂l_i)/(λ̂_M(l_i + λ̂_M)) = (l_i^2/(l_i + λ̂_M)^3) / (Σ_j l_j/(l_j + λ̂_M)^2)
≤ (l_i^2/(l_i + λ̂_M)^3) ((l_i + λ̂_M)^2/l_i) = l_i/(l_i + λ̂_M) = 1 − λ̂_M/(l_i + λ̂_M).

From the inequality (3.6) and the equation c = Σ_i λ̂_M/(l_i + λ̂_M), we get a sufficient condition given by

(p − 2) Σ_i λ̂_M^2/(l_i + λ̂_M)^2 − 2(n − 1)c + 2c^2 ≤ 0,

which can be satisfied for 0 < c ≤ 2(n − 1)/p since λ̂_M/(l_i + λ̂_M) ≤ c. We thus get the following dominance result.

Proposition 3.2 Assume that the constant c satisfies the inequality 0 < c ≤ 2(n − 1)/p. Let λ̂_M be the unique solution of the equation (3.7). Then, the ridge type empirical Bayes estimator

δ^{EB}(λ̂_M) = p(W + λ̂_M I_p)^{-1}   (3.8)

dominates δ_2 = pW^+ under the L_2-loss.

We next show that the function λ̂_H = c/tr V^{-1} = c/Σ_i l_i^{-1} satisfies the inequality (3.6). It is noted that

l_i^2 ∂λ̂_H/∂l_i = λ̂_H^2/c,  λ̂_H/l_i = c/(1 + Σ_{j≠i} l_i/l_j) ≤ c.   (3.9)

Then from the condition (3.6), we can get a sufficient condition given by

(p + 2) c/(1 + c) + 2nc/(1 + c) − 2(n + 1) + 4/(1 + c) ≤ 0,

which can be satisfied for 0 < c ≤ 2(n − 1)/p.

Proposition 3.3 Assume that the constant c satisfies the inequality 0 < c ≤ 2(n − 1)/p. Then, the ridge type empirical Bayes estimator δ^{EB}(λ̂_H) dominates δ_2 = pW^+ under the L_2-loss.

From Propositions 3.2 and 3.3, it is seen that λ̂_M and λ̂_H are two superior estimators of λ in the sense that the resulting ridge type empirical Bayes estimators δ^{EB}(λ̂_M) and δ^{EB}(λ̂_H) have smaller risks than δ_2 relative to the L_2-loss. For the other estimators λ̂_G and λ̂_A given by (3.2), however, we could not show similar dominance properties of the resulting empirical Bayes estimators under the L_2-loss. This may be due to the unboundedness of the functions λ̂_G/l_i and λ̂_A/l_i.
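The two ridge functions with proven dominance can be computed directly: λ̂_H is closed form, and λ̂_M solves a monotone scalar equation. The sketch below is our own implementation (function names ours); the fixed point of (3.3) is found by bisection, since the left side increases from 0 to n as λ grows.

```python
import numpy as np

def lambda_H(l, c):
    """Harmonic-mean ridge estimator lam_H = c / sum_i 1/l_i (our reading of (3.2))."""
    return c / np.sum(1.0 / l)

def lambda_M(l, c):
    """Solve sum_i lam/(lam + l_i) = c for lam by bisection (eq. (3.3)).
    The left side increases from 0 to n in lam, so 0 < c < n gives a unique root."""
    lo, hi = 0.0, float(np.max(l))
    while np.sum(hi / (hi + l)) < c:     # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(mid / (mid + l)) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ridge_EB(W, lam):
    """Ridge-type empirical Bayes estimator p (W + lam I_p)^{-1} of (3.8)."""
    p = W.shape[0]
    return p * np.linalg.inv(W + lam * np.eye(p))
```

Per Propositions 3.2 and 3.3 as reconstructed here, dominance over pW^+ is guaranteed when the constant c satisfies 0 < c ≤ 2(n − 1)/p.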

4 Application to multivariate classification

Now we investigate how useful the improved estimators of the precision matrix are in multivariate discriminant analysis. It should be noted that the use of the improved precision estimators does not theoretically guarantee an improvement in reducing the classification errors. Although the idea of using the improved precision estimators in the discriminant rule is quite intuitive, it is worth inspecting through simulation studies. Related problems have been studied by Friedman (1989), Loh (1997), Zhao, Honda and Konishi (1996) and others. Of these, Friedman (1989) handled the case of the dimension p being much larger than the degrees of freedom n, and proposed regularized discriminant rules where the ridge parameters are determined by the cross-validation method. We here try to answer the query of whether the correct classification rates can be improved by using the improved precision estimators derived in the previous sections.

We treat the problem of classifying observations into two classes of distributions: π_i : N_p(µ_i, Σ_i) for unknown µ_i and Σ_i, i = 1, 2. For i = 1, 2, let x_{i1}, ..., x_{in_i} be a training sample from π_i, and suppose that µ_i is estimated by the sample mean x̄_i and that the precision matrix Σ_i^{-1} is estimated by δ_i based on W_i = Σ_{j=1}^{n_i} (x_{ij} − x̄_i)(x_{ij} − x̄_i)^t. A new observation x is classified into π_1 if

(x − x̄_1)^t δ_1 (x − x̄_1) < (x − x̄_2)^t δ_2 (x − x̄_2),   (4.1)

and into π_2 otherwise. The simulation experiment is planned as follows: 100 training samples are generated, and each training sample constructs the above quadratic discriminant rule, which classifies 200 new observations into the two classes of distributions, where 100 observations are generated from π_1 and the other 100 from π_2. Thus, the correct classification rates are computed based on the resulting 20,000 classifications in total.
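A small-scale sketch of one replication of this design is given below. It is our own construction, not the paper's actual configuration: the dimension, sample sizes, means and covariances are illustrative, and the plug-in rule uses δ_i = n_iW_i^+ (the SRn-type estimator discussed next).

```python
import numpy as np

rng = np.random.default_rng(2)
p, n1, n2, mu = 30, 10, 10, 1.0        # illustrative choices, ours

def mp_inverse(W, n):
    """Moore-Penrose inverse of a rank-n symmetric matrix via its eigenvalues."""
    evals, evecs = np.linalg.eigh(W)
    idx = np.argsort(evals)[::-1][:n]
    return (evecs[:, idx] * (1.0 / evals[idx])) @ evecs[:, idx].T

mu1, mu2 = mu * np.ones(p), np.zeros(p)
Sigma1, Sigma2 = 2.0 * np.eye(p), np.eye(p)

# training samples and the plug-in quadratic rule (4.1)
X1 = rng.multivariate_normal(mu1, Sigma1, size=n1 + 1)
X2 = rng.multivariate_normal(mu2, Sigma2, size=n2 + 1)
xb1, xb2 = X1.mean(axis=0), X2.mean(axis=0)
W1 = (X1 - xb1).T @ (X1 - xb1)
W2 = (X2 - xb2).T @ (X2 - xb2)
d1, d2 = n1 * mp_inverse(W1, n1), n2 * mp_inverse(W2, n2)

def classify(x):                        # returns 1 if classified into pi_1, else 2
    q1 = (x - xb1) @ d1 @ (x - xb1)
    q2 = (x - xb2) @ d2 @ (x - xb2)
    return 1 if q1 < q2 else 2

new1 = rng.multivariate_normal(mu1, Sigma1, size=100)
new2 = rng.multivariate_normal(mu2, Sigma2, size=100)
correct = sum(classify(x) == 1 for x in new1) + sum(classify(x) == 2 for x in new2)
rate = correct / 200.0
print(rate)
```

Repeating this over many training samples and swapping in the other precision estimators for δ_1 and δ_2 reproduces the structure of the experiment described above.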
As the estimators of the precision matrix Σ_i^{-1} based on W_i, we use the following estimators: the Srivastava type estimators nW^+ and pW^+, the improved Stein estimator δ_IS given by (2.9), the Efron-Morris estimator δ_EM given by (2.10), the ridge-type empirical Bayes estimators δ^{EB}(λ̂_H) and δ^{EB}(λ̂_M) given by (3.8) for c = (n − 1)/2, and δ^{EB}(tr W/p) and δ^{EB}(tr W/n), which are abbreviated by SRn, SRp, IS, EM, RH, RM, RAp and RAn. All these estimators are used as δ_1 and δ_2 in the classification rule (4.1); for example, the estimator nW^+, SRn, gives the rule

(x − x̄_1)^t n_1W_1^+ (x − x̄_1) < (x − x̄_2)^t n_2W_2^+ (x − x̄_2).

In the simulation experiments, it is supposed that µ_1 = µ(1, ..., 1)^t for µ = 0.8, 1.0 and 1.5, µ_2 = 0, Σ_2 = I_p and Σ_1 = diag(σ_1, ..., σ_p)(ρ_ij)diag(σ_1, ..., σ_p), where σ_i = i/p for i = 1, ..., p, and (ρ_ij) is a p × p matrix with ρ_ij = (k/6)^{|i−j|/2} for k = 1, 2, 3, 4 and 5. Table 1 reports the correct classification rates when the estimators SRn, SRp, IS, EM, RH, RM, RAp and RAn are used for p = 100, (n_1, n_2) = (5, 5), (5, 50) and (50, 5), and µ = 0.8, 1.0 and 1.5, where TR means the correct classification rate based on the true rule

(x − µ_1)^t Σ_1^{-1} (x − µ_1) < (x − µ_2)^t Σ_2^{-1} (x − µ_2),

Table 1: Correct Classification Rates by Simulation for p = 100, µ = 0.8, 1.0 and 1.5, and (n_1, n_2) = (5, 5), (5, 50) and (50, 5).

k TR SRn SRp IS EM RH RM RAp RAn

[The numerical entries of Table 1 are not recoverable from the source.]

Table 2: Correct Classification Rates by Simulation for p = 50, µ = 1.5 and (n_1, n_2) = (30, 30), (10, 10), (5, 5), (10, 30) and (30, 10).

(n_1, n_2) k TR SRn SRp IS EM RH RM RAp RAn

[The numerical entries of Table 2 are not recoverable from the source.]

which uses the true parameters instead of their estimators. Table 2 reports the correct classification rates when the same estimators are used for p = 50, µ = 1.5 and (n_1, n_2) = (30, 30), (10, 10), (5, 5), (10, 30) and (30, 10).

An overview of the simulation results in Tables 1 and 2 reveals that the ridge-type empirical Bayes estimators RH, RM and RAn provide significant improvements in the correct classification rates in all the cases, and their rates are close to those of the true classification rule TR. The other ridge-type estimator RAp, which uses tr W/p as an estimator of λ, behaves well in the cases of equal sample sizes n_1 = n_2, but not when the sample sizes n_1 and n_2 are extremely different. Use of the improved estimators IS and EM gains relatively small improvements in the classification. The difference between SRn and SRp appears in the fourth and the fifth parts of Table 1 and in the fourth part of Table 2. Overall, it appears that SRn is better than SRp in terms of the correct classification rates in all the cases.

We conclude this section with comments on the interesting query of whether the use of an improved precision estimator leads to an improvement in the correct classification rates. It is noted that the estimators IS, EM, RH and RM dominate the estimators of the form aW^+; especially, RH and RM beat SRp as estimators of the precision matrix. Certainly, IS, EM, RH and RM outperform SRn and SRp in most cases, but the improvements of IS and EM are much smaller than those of RH and RM. On the other hand, the use of the ridge-type estimator RAn yields substantial improvements, although the dominance property of RAn over SRp cannot be guaranteed. Taking these observations into account, we can surmise that the ridge form of the precision estimators affects the improvement in the correct classification rates more significantly than the dominance property of the precision estimators.
In the use of the ridge-type estimators, the estimator of the ridge parameter λ sensitively affects the behavior of the discriminant rules, and the estimators λ̂_H, λ̂_M and tr W/n, which correspond to RH, RM and RAn, are good choices in the quadratic discriminant rule. Among these three, RAn is the simplest, with performance comparable to RM.

A Appendix

We here give the proofs of the propositions in Sections 2 and 3. For this purpose, we first develop the so-called Stein-Haff identity for the singular Wishart distribution.

A.1 Stein-Haff identity for the singular Wishart distribution

Let H_1 be a p × n matrix such that H_1^tH_1 = I_n, that is, H_1 ∈ H_{n,p}, the Stiefel manifold, and W = H_1LH_1^t, where L = diag(l_1, l_2, ..., l_n), an n × n diagonal matrix, where l_1 > l_2 > ... > l_n are the n non-zero eigenvalues of the p × p matrix W of rank n. Let l = (l_1, ..., l_n). Then, from Srivastava (2003, p.1549), the joint probability density function (p.d.f.) of l and H_1

is given by

f(l, H_1, Σ) = c_n(Σ) b(l) g_{n,p}(H_1^t) exp{−(1/2) tr H_1LH_1^tΣ^{-1}},

where

b(l) = {Π_{i=1}^n l_i^{(p−n−1)/2}} Π_{i<j}(l_i − l_j),
c_n(Σ) = 2^{−n} c(n, n) (2π)^{−pn/2} |Σ|^{−n/2},

c(n, n) = 2^n π^{n^2/2}/Γ_n(n/2), and, for an integer r, Γ_r(m/2) = π^{r(r−1)/4} Π_{i=1}^r Γ((m + 1 − i)/2). Hence, the p.d.f. of l is given by

f_1(l, Σ) = c_n(Σ) b(l) ∫_{H_{n,p}} exp{−(1/2) Σ_i a_{ii}(H_1) l_i} g_{n,p}(H_1^t) dH_1,

where A = (a_{ij}) = H_1^tΣ^{-1}H_1 is an n × n matrix. Next, we state a lemma giving the Stein-Haff identity for the singular Wishart matrix W; a similar identity for n > p has been obtained by Sheena (1995).

Lemma A.1 Let W have a singular Wishart distribution W_p(Σ, n), Σ > 0, n < p, with W = H_1LH_1^t, and let Φ(l) = diag(φ_1(l), ..., φ_n(l)), where H_1^tH_1 = I_n and L is the diagonal matrix of the ordered non-zero eigenvalues of the matrix W. Then the following identity holds:

E[tr H_1Φ(l)H_1^tΣ^{-1}] = E[Σ_i {(p − n − 1)φ_i/l_i + 2 ∂φ_i/∂l_i} + 2 Σ_{i<j} (φ_i − φ_j)/(l_i − l_j)].   (A.1)

Proof. We follow Sheena (1995) in proving this identity. Let l_0 = ∞, l_{n+1} = 0 and dl_{(i)} = Π_{j≠i} dl_j, where the product does not include the term dl_i. Let L_{(i)} be the set defined by L_{(i)} = {(l_1, ..., l_{i−1}, l_{i+1}, ..., l_n) : l_1 > ... > l_{i−1} > l_{i+1} > ... > l_n > 0}. Let I = E[tr H_1ΦH_1^tΣ^{-1}] = Σ_{i=1}^n E[φ_i a_{ii}] = Σ_i I_i. Then I_i is expressed as

I_i = ∫_{L_{(i)}} [∫_{l_{i+1}}^{l_{i−1}} ∫_{H_{n,p}} φ_i b(l) a_{ii} exp{−(1/2) Σ_{k=1}^n a_{kk}l_k} g_{n,p}(H_1^t) dH_1 dl_i] dl_{(i)}
= −2 ∫_{L_{(i)}} [∫_{l_{i+1}}^{l_{i−1}} ∫_{H_{n,p}} φ_i b(l) (∂/∂l_i) exp{−(1/2) Σ_{k=1}^n a_{kk}l_k} g_{n,p}(H_1^t) dH_1 dl_i] dl_{(i)}.

Using integration by parts, we rewrite I_i as

I_i = 2 ∫_{L_{(i)}} [∫_{l_{i+1}}^{l_{i−1}} ∫_{H_{n,p}} {∂(φ_i b(l))/∂l_i} exp{−(1/2) Σ_{k=1}^n a_{kk}l_k} g_{n,p}(H_1^t) dH_1 dl_i] dl_{(i)},

which is equal to

2 E[(1/b(l)) ∂{φ_i b(l)}/∂l_i] = E[2 ∂φ_i/∂l_i + 2φ_i ∂ log b(l)/∂l_i].

Since log b(l) = Σ_{k=1}^n {(1/2)(p − n − 1) log l_k} + Σ_{k<j} log(l_k − l_j), it is noted that

∂ log b(l)/∂l_i = (p − n − 1)/(2l_i) + Σ_{j≠i} 1/(l_i − l_j),

which implies that

I = E[Σ_i {(p − n − 1)φ_i/l_i + 2 ∂φ_i/∂l_i + 2 Σ_{j≠i} φ_i/(l_i − l_j)}].

This proves Lemma A.1, since Σ_i Σ_{j≠i} φ_i/(l_i − l_j) = Σ_{i<j} (φ_i − φ_j)/(l_i − l_j).

A.2 Proofs of the propositions

Proof of Proposition 2.3. It is noted that

Σ_{j>i} (ψ_i − ψ_j)/(l_i − l_j) = (1/l_i) Σ_{j>i} (ψ_i − ψ_j) + (1/l_i) Σ_{j>i} l_j(ψ_i − ψ_j)/(l_i − l_j)
= (1/l_i){(n − i)ψ_i − Σ_{j>i} ψ_j} + (1/l_i) Σ_{j>i} l_j(ψ_i − ψ_j)/(l_i − l_j).   (A.2)

Then, the l.h.s. of the inequality in Proposition 2.2 is expressed by

Σ_i (1/l_i){ψ_i^2 − 2(p + n − 2i − 1)ψ_i + 4 Σ_{j>i} ψ_j} − 4 Σ_i (1/l_i) Σ_{j>i} l_j(ψ_i − ψ_j)/(l_i − l_j) − 4 Σ_i ∂ψ_i/∂l_i,

which, from the conditions (a) and (b), can be seen to be less than or equal to

Σ_i (1/l_i){ψ_i^2 − 2(p + n − 2i − 1)ψ_i + 4 Σ_{j>i} ψ_j}.

From the conditions (b) and (c), it is noted that p + n − 2i − 1 ≥ ψ_i ≥ ψ_{i+1}, so that ψ_i^2 − 2(p + n − 2i − 1)ψ_i ≤ ψ_{i+1}^2 − 2(p + n − 2i − 1)ψ_{i+1}. Repeating this argument, we see

that

ψ_i^2 − 2(p + n − 2i − 1)ψ_i + 4 Σ_{j>i} ψ_j
≤ ψ_{i+1}^2 − 2(p + n − 2i − 1)ψ_{i+1} + 4ψ_{i+1} + 4 Σ_{j>i+1} ψ_j
= ψ_{i+1}^2 − 2{p + n − 2(i + 1) − 1}ψ_{i+1} + 4 Σ_{j>i+1} ψ_j
≤ ... ≤ ψ_n^2 − 2(p − n − 1)ψ_n = −(p − n − 1)^2,   (A.3)

which implies that

Σ_i (1/l_i){ψ_i^2 − 2(p + n − 2i − 1)ψ_i + 4 Σ_{j>i} ψ_j} ≤ −(p − n − 1)^2 Σ_i 1/l_i.

Hence, Proposition 2.3 is proved.

Proof of Proposition 2.5. The risk difference of the two estimators δ_S and δ_IS(g) is written as

Δ = R_1(Σ, δ_IS(g)) − R_1(Σ, δ_S)
= E[tr {(g(l)/tr W)^2 W + 2(g(l)/tr W)(δ_S − Σ^{-1})W}]
= E[g(l)^2/tr W + 2g(l) tr D/tr W − 2g(l) tr WΣ^{-1}/tr W],   (A.4)

since tr δ_SW = tr D. The Stein-Haff identity given in Lemma A.1 is used to evaluate E[g(l) tr WΣ^{-1}/tr W] as

E[tr H_1 (g(l)/tr L) L H_1^t Σ^{-1}]
= E[Σ_i {(p − n − 1) g(l)/tr L + 2 ∂(g(l)l_i/tr L)/∂l_i} + 2 Σ_{i<j} (g(l)l_i/tr L − g(l)l_j/tr L)/(l_i − l_j)]
= E[n(p − n − 1) g(l)/tr L + 2(n − 1) g(l)/tr L + n(n − 1) g(l)/tr L + 2 Σ_i l_i (∂g(l)/∂l_i)/tr L]
= E[tr D g(l)/tr L + 2(n − 1) g(l)/tr L + 2 Σ_i l_i (∂g(l)/∂l_i)/tr L],   (A.5)

since tr D = Σ_i (p + n − 2i − 1) = n(p − 2) = n(p − n − 1) + n(n − 1). Combining (A.4) and (A.5) gives the expression

Δ = E[{g(l)^2 − 4(n − 1)g(l)}/tr W − 4 Σ_i (l_i/tr L) ∂g(l)/∂l_i],

which leads to the conditions given in Proposition 2.5 for the domination of δ_IS(g) over δ_S.

Proof of Proposition 2.6. It is sufficient to check the conditions (a), (b) and (c) of Proposition 2.3 for

ψ_i^R = l_iφ_i^R = d_il_i/(l_i + λ̂_i) = d_il_i/{l_i + (d_iλ̂_{i+1} + 2l_i)/d_{i+1}}.

Since λ̂_{i+1} does not depend on l_i, it is easy to see that ψ_i^R is increasing in l_i. For the condition (b), we have that ψ_n^R = d_n since λ̂_n = 0. Also, it is seen that the inequality ψ_i^R ≥ ψ_{i+1}^R is equivalent to the inequality

λ̂_i ≤ (l_i/d_{i+1})(d_i − d_{i+1}) + (d_il_i/(d_{i+1}l_{i+1}))λ̂_{i+1} = (1/d_{i+1})(2l_i + (l_i/l_{i+1})d_iλ̂_{i+1}).

Hence, the condition (b) follows from the definition of λ̂_i and the fact that l_i/l_{i+1} > 1. Finally, it is easily verified that ψ_i^R ≤ d_i for i = 1, ..., n.

Proof of Proposition 2.9. Since the estimator δ_DE(g) belongs to the class of the estimators (2.1) with ψ_i = a_0 + l_i^2 g(l)/tr L^2, we can evaluate the condition in Proposition 2.8 as

(1/tr L^2){g(l)^2 − 2n(n − 1)g(l) − 8(n − 1)g(l) − 4 Σ_i l_i ∂g(l)/∂l_i} ≤ 0,

or

(1/tr L^2){g(l)^2 − 2(n − 1)(n + 4)g(l) − 4 Σ_i l_i ∂g(l)/∂l_i} ≤ 0.

Hence, we get the conditions of Proposition 2.9 for the dominance of δ_DE(g) over δ_0.

Proof of Proposition 2.10. It is noted that

Σ_{j>i} (l_jψ_i − l_iψ_j)/(l_il_j(l_i − l_j)) = Σ_{j>i} {l_j(ψ_i − ψ_j) + (l_j − l_i)ψ_j}/(l_il_j(l_i − l_j))
= (1/l_i) Σ_{j>i} (ψ_i − ψ_j)/(l_i − l_j) − (1/l_i) Σ_{j>i} ψ_j/l_j
= (1/l_i^2){(n − i)ψ_i − Σ_{j>i} ψ_j} + (1/l_i^2) Σ_{j>i} l_j(ψ_i − ψ_j)/(l_i − l_j) − (1/l_i) Σ_{j>i} ψ_j/l_j,

where the equation (A.2) is used to get the third equality. Hence, the condition given in Proposition 2.8 is expressed by

Σ_i (1/l_i^2){ψ_i^2 − 2a_0ψ_i − 4(n − i)ψ_i + 4 Σ_{j>i} ψ_j + a_0^2} + 4 Σ_i Σ_{j>i} (ψ_j − a_0)/(l_il_j)
− 4 Σ_i (1/l_i^2) Σ_{j>i} l_j(ψ_i − ψ_j)/(l_i − l_j) − 4 Σ_i (1/l_i) ∂ψ_i/∂l_i ≤ 0.   (A.6)

From the condition (b), ψ_i − a_0 is nonnegative, and we observe that

Σ_{j>i} (ψ_j − a_0) l_i/l_j ≥ Σ_{j>i} (ψ_j − a_0).

Using the conditions (a) and (b), we see that the inequality (A.6) holds if h_i(l) ≤ 0, where

h_i(l) = ψ_i^2 − 2(a_0 + 2n − 2i)ψ_i + 4 Σ_{j>i} ψ_j + 2a_0 + 4(i − 1)(ψ_i − a_0),

which can be rewritten by

h_i(l) = (ψ_i − a_0)^2 − 4(n − i + 1)(ψ_i − a_0) + 4 Σ_{j>i} (ψ_j − a_0).   (A.7)

To prove the inequality h_i(l) ≤ 0, note that

max{2(n − 2i + 1), 0} ≥ ψ_i − a_0 ≥ ψ_{i+1} − a_0 ≥ 0.   (A.8)

Then, (A.8) implies that ψ_i − a_0 = 0 for i = r + 1, . . . , n, so that it is easy to see that h_i(l) ≤ 0 for i = r + 1, . . . , n. For i = 1, . . . , r − 1, from the inequalities in (A.8) and the same arguments as in the proof of Proposition 2.3, it follows that

h_i(l) ≤ (ψ_{i+1} − a_0)^2 − 4(n − i + 1)(ψ_{i+1} − a_0) + 4(ψ_{i+1} − a_0) + 4 Σ_{j>i+1} (ψ_j − a_0)
  = (ψ_{i+1} − a_0)^2 − 4{n − (i+1) + 1}(ψ_{i+1} − a_0) + 4 Σ_{j>i+1} (ψ_j − a_0)
  ≤ ⋯ ≤ (ψ_{r−1} − a_0)^2 − 4{n − (r−1) + 1}(ψ_{r−1} − a_0) + 4(ψ_r − a_0)
  ≤ (ψ_r − a_0)^2 − 4(n − r + 1)(ψ_r − a_0),

which is not positive. Therefore we get Proposition 2.10.

Proof of Proposition 2.14. Since the estimator δ^{EM}(g) belongs to the class of the estimators (2.1) with ψ_i = p + l_i g(l)/tr L, we can evaluate the condition in Proposition 2.13 as

Σ_i { l_i^2 g^2/(tr L)^2 + 2p l_i g/tr L − 2(p + n + 1) l_i g/tr L − 4 Σ_{j>i} l_j g/tr L − 4 l_i g/tr L + 4 l_i^2 g/(tr L)^2 − 4 l_i^2 (∂g/∂l_i)/tr L } ≤ 0,

or

g^2 tr L^2/(tr L)^2 − 2(n + 1)g + 4g tr L^2/(tr L)^2 − 4 Σ_i l_i^2 (∂g/∂l_i)/tr L ≤ 0.

Since tr L^2 ≤ (tr L)^2, we get the conditions given by Proposition 2.14 for δ^{EM}(g) to dominate δ relative to the L_2-loss.

Proof of Proposition 3.1. We first consider the empirical Bayes estimator δ_a^{EB}(λ̂) = a(W + λ̂I_p)^{-1} and the estimator δ_a^+(λ̂) = aH_1(L + λ̂I_n)^{-1}H_1^t. It is noted that both are p × p matricial estimators of Σ^{-1}; δ_a^+(λ̂) is singular, while δ_a^{EB}(λ̂) has the full rank p and is nonsingular. However, we will show that both estimators have the same risk under the L_2-loss function (2.4). Let H = (H_1, H_2), where H_2 is a p × (p − n) matrix belonging to H_{p−n,p}, H_2^t H_2 = I_{p−n}. Thus, H is an orthogonal matrix, and it is noted that

H^t Σ^{-1} H = [ H_1^t Σ^{-1} H_1   H_1^t Σ^{-1} H_2 ; H_2^t Σ^{-1} H_1   H_2^t Σ^{-1} H_2 ]

and (H^t Σ^{-1} H)_{11} = H_1^t Σ^{-1} H_1. Then we observe that

tr [{δ_a^{EB}(λ̂)}^2 W]
  = a^2 tr [ H diag((L + λ̂I_n)^{-1}, λ̂^{-1}I_{p−n})^2 H^t · H diag(L, 0) H^t ]
  = a^2 tr [ (L + λ̂I_n)^{-2} L ] = tr [{δ_a^+(λ̂)}^2 W],

and

tr [W δ_a^{EB}(λ̂) Σ^{-1}]
  = a tr [ H diag(L, 0) H^t · H diag((L + λ̂I_n)^{-1}, λ̂^{-1}I_{p−n}) H^t Σ^{-1} ]
  = a tr [ L(L + λ̂I_n)^{-1} (H^t Σ^{-1} H)_{11} ] = tr [W δ_a^+(λ̂) Σ^{-1}].

Thus, the two estimators δ_a^{EB}(λ̂) and δ_a^+(λ̂) have the same risk under the L_2-loss.

We next apply Proposition 2.1 to get the unbiased estimator of the risk difference of the estimators δ_a^+(λ̂) and δ_a, where ψ_i(l) = a l_i/(l_i + λ̂(l)) in the estimator (2.1). Then, we have

Δ(a, λ̂) = R_2(Σ, δ_a^{EB}(λ̂)) − R_2(Σ, δ_a)
  = E[ Σ_i { a^2 l_i^2/(l_i + λ̂)^2 − 2(p + n + 1) a l_i/(l_i + λ̂) − 4 a l_i/(l_i + λ̂) − 4a Σ_{j>i} l_j λ̂/{(l_i + λ̂)(l_j + λ̂)} − (a^2 − 2pa) + 4a l_i^2 (1 + ∂λ̂/∂l_i)/(l_i + λ̂)^2 } ],

which, since l_i/(l_i + λ̂) = 1 − λ̂/(l_i + λ̂), can be rewritten as

a E[ Σ_i { (a + 4) λ̂^2/(l_i + λ̂)^2 − (a − 2p + 2n + 1) λ̂/(l_i + λ̂) + 4 Σ_{j>i} λ̂^2/{(l_i + λ̂)(l_j + λ̂)} + 4 l_i (∂λ̂/∂l_i)/(l_i + λ̂)^2 } ].

Using the equation

{ Σ_i λ̂/(l_i + λ̂) }^2 = Σ_i λ̂^2/(l_i + λ̂)^2 + 2 Σ_i Σ_{j>i} λ̂^2/{(l_i + λ̂)(l_j + λ̂)},

we can get the expression (3.5) of the risk difference.

Acknowledgments. The research of the first author was supported in part by grants from the Ministry of Education, Japan, and in part by a grant from the 21st Century COE Program at the Faculty of Economics, University of Tokyo.

REFERENCES

Dey, D.K. (1987). Improved estimation of a multinormal precision matrix. Statist. Probab. Letters, 6, 125–128.

Dey, D.K., Ghosh, M. and Srinivasan, C. (1990). A new class of improved estimators of a multinormal precision matrix. Statist. and Decisions, 8.

Efron, B. and Morris, C. (1976). Multivariate empirical Bayes estimation of covariance matrices. Ann. Statist., 4, 22–32.

Friedman, J.H. (1989). Regularized discriminant analysis. J. Amer. Statist. Assoc., 84, 165–175.

Haff, L.R. (1979a). An identity for the Wishart distribution with applications. J. Multivariate Anal., 9, 531–544.

Haff, L.R. (1979b). Estimation of the inverse covariance matrix: Random mixtures of the inverse Wishart matrix and the identity. Ann. Statist., 7, 1264–1276.

Kubokawa, T. (2005). A revisit to estimation of the precision matrix of the Wishart distribution. J. Statist. Res., 39.

Kubokawa, T. and Srivastava, M.S. (1999). Robust improvement in estimation of a covariance matrix in an elliptically contoured distribution. Ann. Statist., 27, 600–609.

Loh, W.L. (1997). Linear discrimination with adaptive ridge classification rules. J. Multivariate Anal., 62.

Sheena, Y. (1995). Unbiased estimator of risk for an orthogonally invariant estimator of a covariance matrix. J. Japan Statist. Soc., 25.

Srivastava, M.S. (2003). Singular Wishart and multivariate beta distributions. Ann. Statist., 31, 1537–1560.

Srivastava, M.S. (2004). Multivariate theory for analyzing high dimensional data. Tech. Report, University of Toronto.

Srivastava, M.S. and Khatri, C.G. (1979). An Introduction to Multivariate Statistics. North-Holland, New York.

Stein, C. (1977). Lectures on multivariate estimation theory. In Investigation on Statistical Estimation Theory I, Zapiski Nauchnykh Seminarov LOMI im. V.A. Steklova AN SSSR, vol. 74, Leningrad. (In Russian.)

Zhao, Y., Honda, M. and Konishi, S. (1996). Effect of a shrinkage estimator on the linear discriminant function. American J. Math. Management Sci., 16.
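The key step in the proof of Proposition 3.1 above, that δ_a^{EB}(λ̂) = a(W + λ̂I_p)^{-1} and δ_a^+(λ̂) = aH_1(L + λ̂I_n)^{-1}H_1^t give the same values of tr δ^2 W and tr WδΣ^{-1} and hence the same L_2 risk, is easy to confirm numerically. A NumPy sketch of my own (the toy choices of n, p, a, λ̂ and the stand-in for Σ^{-1} are arbitrary assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, a, lam = 4, 12, 2.5, 0.7        # toy values; lam stands in for lambda-hat

X = rng.standard_normal((n, p))
W = X.T @ X                           # singular Wishart matrix, rank n < p

# spectral form W = H1 L H1^t
vals, vecs = np.linalg.eigh(W)
order = np.argsort(vals)[::-1][:n]
L, H1 = vals[order], vecs[:, order]

delta_EB = a * np.linalg.inv(W + lam * np.eye(p))         # full rank p
delta_plus = a * (H1 @ np.diag(1.0 / (L + lam)) @ H1.T)   # rank n

A = rng.standard_normal((p, p))
Sigma_inv = A @ A.T + p * np.eye(p)   # any positive definite stand-in for Sigma^{-1}

# the two trace terms entering the L2 risk coincide for the two estimators
assert np.isclose(np.trace(delta_EB @ delta_EB @ W),
                  np.trace(delta_plus @ delta_plus @ W))
assert np.isclose(np.trace(W @ delta_EB @ Sigma_inv),
                  np.trace(W @ delta_plus @ Sigma_inv))
```

The equalities hold for every W and every λ̂ > 0, which is why the two estimators share one risk function even though only δ_a^{EB}(λ̂) has full rank.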


More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Solutions Homework 4 March 5, 2018

Solutions Homework 4 March 5, 2018 1 Solutons Homework 4 March 5, 018 Soluton to Exercse 5.1.8: Let a IR be a translaton and c > 0 be a re-scalng. ˆb1 (cx + a) cx n + a (cx 1 + a) c x n x 1 cˆb 1 (x), whch shows ˆb 1 s locaton nvarant and

More information

Chapter 12 Analysis of Covariance

Chapter 12 Analysis of Covariance Chapter Analyss of Covarance Any scentfc experment s performed to know somethng that s unknown about a group of treatments and to test certan hypothess about the correspondng treatment effect When varablty

More information

Explicit constructions of all separable two-qubits density matrices and related problems for three-qubits systems

Explicit constructions of all separable two-qubits density matrices and related problems for three-qubits systems Explct constructons of all separable two-qubts densty matrces and related problems for three-qubts systems Y. en-ryeh and. Mann Physcs Department, Technon-Israel Insttute of Technology, Hafa 2000, Israel

More information

CSCE 790S Background Results

CSCE 790S Background Results CSCE 790S Background Results Stephen A. Fenner September 8, 011 Abstract These results are background to the course CSCE 790S/CSCE 790B, Quantum Computaton and Informaton (Sprng 007 and Fall 011). Each

More information

Radar Trackers. Study Guide. All chapters, problems, examples and page numbers refer to Applied Optimal Estimation, A. Gelb, Ed.

Radar Trackers. Study Guide. All chapters, problems, examples and page numbers refer to Applied Optimal Estimation, A. Gelb, Ed. Radar rackers Study Gude All chapters, problems, examples and page numbers refer to Appled Optmal Estmaton, A. Gelb, Ed. Chapter Example.0- Problem Statement wo sensors Each has a sngle nose measurement

More information

Probabilistic Classification: Bayes Classifiers. Lecture 6:

Probabilistic Classification: Bayes Classifiers. Lecture 6: Probablstc Classfcaton: Bayes Classfers Lecture : Classfcaton Models Sam Rowes January, Generatve model: p(x, y) = p(y)p(x y). p(y) are called class prors. p(x y) are called class condtonal feature dstrbutons.

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

Testing for seasonal unit roots in heterogeneous panels

Testing for seasonal unit roots in heterogeneous panels Testng for seasonal unt roots n heterogeneous panels Jesus Otero * Facultad de Economía Unversdad del Rosaro, Colomba Jeremy Smth Department of Economcs Unversty of arwck Monca Gulett Aston Busness School

More information

Statistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals

Statistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals Internatonal Journal of Scentfc World, 2 1) 2014) 1-9 c Scence Publshng Corporaton www.scencepubco.com/ndex.php/ijsw do: 10.14419/jsw.v21.1780 Research Paper Statstcal nference for generalzed Pareto dstrbuton

More information

Eigenvalues of Random Graphs

Eigenvalues of Random Graphs Spectral Graph Theory Lecture 2 Egenvalues of Random Graphs Danel A. Spelman November 4, 202 2. Introducton In ths lecture, we consder a random graph on n vertces n whch each edge s chosen to be n the

More information

The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD

The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

MATH 281A: Homework #6

MATH 281A: Homework #6 MATH 28A: Homework #6 Jongha Ryu Due date: November 8, 206 Problem. (Problem 2..2. Soluton. If X,..., X n Bern(p, then T = X s a complete suffcent statstc. Our target s g(p = p, and the nave guess suggested

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

Remarks on the Properties of a Quasi-Fibonacci-like Polynomial Sequence

Remarks on the Properties of a Quasi-Fibonacci-like Polynomial Sequence Remarks on the Propertes of a Quas-Fbonacc-lke Polynomal Sequence Brce Merwne LIU Brooklyn Ilan Wenschelbaum Wesleyan Unversty Abstract Consder the Quas-Fbonacc-lke Polynomal Sequence gven by F 0 = 1,

More information

Random Partitions of Samples

Random Partitions of Samples Random Parttons of Samples Klaus Th. Hess Insttut für Mathematsche Stochastk Technsche Unverstät Dresden Abstract In the present paper we construct a decomposton of a sample nto a fnte number of subsamples

More information