Performance Accuracy of Linear classifiers for Two-level Multivariate Observations in High-dimensional Framework


THE UNIVERSITY OF TEXAS AT SAN ANTONIO, COLLEGE OF BUSINESS
Working Paper Series
Date February, 6. WP # MGST-53-6

Anuradha Roy, Department of Management Science and Statistics, The University of Texas at San Antonio, Anuradha.Roy@utsa.edu
Tatjana Pavlenko, Department of Mathematics, KTH Royal Institute of Technology, pavlenko@math.kth.se

Copyright 5, by the authors. Please do not quote, cite, or reproduce without permission from the authors.
ONE UTSA CIRCLE, SAN ANTONIO, TEXAS. BUSINESS.UTSA.EDU

Anuradha Roy
Department of Management Science and Statistics, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA. Anuradha.Roy@utsa.edu

Tatjana Pavlenko
Department of Mathematics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden. pavlenko@math.kth.se

Correspondence to: Anuradha Roy, Department of Management Science and Statistics, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA.

Abstract

Asymptotic approximation for the expected probability of misclassification is derived for the linear classifier based on two-level multivariate data using the block-compound symmetric (BCS) covariance structure. Advantages of this structure for modeling two-level multivariate data are shown in growing dimensions asymptotics, which allows the number of replicates, $u$, to grow faster than the total number of samples, $n$, and only constrains the number of feature variables, $m < n$. Relevance and benefits of the designed classifier are demonstrated for a number of high-dimensional scenarios using both asymptotic results and simulations. Comparison of our findings with other existing results is considered.

Keywords: Blocked compound symmetry; Two-level multivariate observations; Growing dimensions; Probability of misclassification.

Mathematics Subject Classification: 6H3; 6G.

1 Introduction

Performance accuracy of the classification procedures designed for high-dimensional data is extensively studied in the statistical literature. As the sample based classifiers usually have complex distributional expressions, various types of asymptotic approximations of the expected probability of misclassification (EPMC) are derived. Initially, the consideration was given to large sample theory, i.e., assuming that the number of feature variables, $m$, is fixed and the total sample size, $n$, goes to infinity. Okamoto (1963) in his renowned article first

time derived the asymptotic expansion of the EPMC for the linear classifier up to the second order term. Siotani (1982) also considered the large sample approximations and asymptotic behavior of a number of statistics related to classification accuracy. Later, Fujikoshi and Seo (1998) derived an asymptotic expansion of the same EPMC of the linear classifier in the high-dimensional framework, i.e., assuming that $m$ can grow at the same rate as $n$. Very recently, in the same asymptotic settings, Kubokawa et al. (2013) derived an expansion of EPMC for the ridge-type linear classifier up to the second order term and investigated the asymptotic effect of the ridge parameter on the classification accuracy. For a review of asymptotic expansions of the EPMC both in large-sample and in high-dimensional asymptotics, see Fujikoshi et al. (2010).

The above-mentioned asymptotic expansions of EPMC have been established for traditional multivariate data, or just for one-level multivariate data in a high-dimensional setting. However, modern experimental techniques allow one to collect and store multi-level (Leiva and Roy, 2011) high-dimensional data in almost all fields from agriculture to medical research, where the observations are collected on more than one feature variable ($m$) at different locations ($u$), repeatedly over time ($v$), at different depths ($d$), etc. These multi-level observations may have variances that differ across locations, time and depths, and efficient techniques for taking these variations into account are of great importance. In recent years, there has been a flurry of activity over development of classification rules for multi-level multivariate data in the high-dimensional setting; however, to the best of our knowledge no one has investigated their asymptotic performance properties. In a series of papers by Roy and Leiva (2007) and Leiva and Roy (2011, 2012), a number of linear and quadratic classifiers were developed for two- and three-level multivariate data in combination with structured mean and structured variance-covariance matrix.
The main advantage of exploiting the structure underlying the data is shown in reducing the number of unknown parameters, which in turn leads to a significant gain in the classification accuracy. This advantage is especially important for small sample high-dimensional problems where the number of practically available observations is limited. To investigate various effects of the covariance structure on the classification theoretically, one needs to derive asymptotic results on the probability of misclassification (PMC) in high dimensions.

In this article we obtain the asymptotic expansion of the EPMC up to the second order term for the linear classifier designed for two-level multivariate data in the high-dimensional situation, which is an extension of the method proposed by Kubokawa et al. (2013) for one-level multivariate data. To represent the two-level ($m$-dimensional vector over $u$ locations or time points) multivariate observations we use the blocked compound symmetric (BCS) covariance structure (defined in Section 2.1) in the high-dimensional setting and evaluate the performance of the corresponding classifier using the second order asymptotic approximation of EPMC. The designed classifier does not need the constraint $n > mu$, i.e., the total number of

observations larger than the total number of variables $mu$; it just needs $n > m$. In our high-dimensional asymptotics we assume that $m/n \to c$, where $0 < c < 1$, whereas $u$ can go to infinity at the same rate as $n$.

It is interesting to note that the BCS covariance structure for two-level feature variables, which is a multivariate generalization of the compound symmetry structure for multiple variables, was initially introduced by Rao (1945, 1953) while classifying genetically different groups, and then it did not attract much attention in the literature for nearly half a century. Then Leiva (2007) developed classification rules for the BCS covariance structure along with structured mean vectors. Lately, this covariance structure has started to gain a lot of attention in the literature, especially in the area of high-dimensional estimation and hypothesis testing (see Roy and Leiva, 2011; Coelho and Roy, 2013). An important advantage of using the BCS structure is that the number of unknown parameters is only $m(m+1)$, which does not even depend on the number of repeated measures $u$, whereas the number of unknown parameters in the unstructured covariance matrix is $mu(mu+1)/2$, which can increase very rapidly with the increase of any one of the levels $m$ or $u$. Hence, using the BCS covariance structure, one can allow the number of repeated measurements $u$ to grow unrestrictedly, thereby providing more information, while the number of unknown parameters remains the same.

This paper proceeds as follows. In Section 2, the BCS covariance structure is introduced, and estimation of its matrix parameters is presented. In Section 3 the BCS linear classifier is described, pooled estimators of the orthogonally transformed matrix parameters are derived and their distributional properties are established. In Section 4, performance accuracy of the estimated BCS classifier is evaluated using the asymptotic expansion of the EPMC. Asymptotics for the moments of the BCS classifier are derived in Section 5. Evaluation of posterior PMC (PPMC) and expected PMC (EPMC) is done in Section 6.
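Simulation studies are performed for a number of high-dimensional scenarios and advantages of taking a structured covariance matrix are numerically verified in Section 7, and finally, conclusions and scope for the future are presented in Section 8.

2 Preliminaries

2.1 Blocked compound symmetry covariance structure

A BCS structure can be written as
\[
\Gamma =
\begin{pmatrix}
\Sigma_0 & \Sigma_1 & \cdots & \Sigma_1 \\
\Sigma_1 & \Sigma_0 & \cdots & \Sigma_1 \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma_1 & \Sigma_1 & \cdots & \Sigma_0
\end{pmatrix}
= I_u \otimes (\Sigma_0 - \Sigma_1) + J_u \otimes \Sigma_1, \tag{2.1}
\]
where $I_u$ is the $u \times u$ identity matrix, $1_u$ is a $u \times 1$ vector of ones, $J_u = 1_u 1_u'$ and $\otimes$ represents the Kronecker product. We assume $\Sigma_0$ is a positive definite symmetric $m \times m$ matrix, $\Sigma_1$ is a symmetric $m \times m$ matrix,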

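under the constraints $-\frac{1}{u-1}\Sigma_0 < \Sigma_1$ and $\Sigma_1 < \Sigma_0$ (for a proof, see Lemma 2.1 in Roy and Leiva, 2011), so that the $mu \times mu$ matrix $\Gamma$ is positive definite. Leiva (2007) named $\Sigma_0$ and $\Sigma_1$ the equicorrelated matrix parameters. The $m \times m$ block diagonals $\Sigma_0$ in $\Gamma$ represent the variance-covariance matrix of the $m$ response variables at any given time, whereas the $m \times m$ block off diagonals $\Sigma_1$ in $\Gamma$ represent the covariance matrix of the $m$ response variables between any two time points. We also assume that $\Sigma_0$ is constant for all time points and $\Sigma_1$ is constant between any two time points. Our two-level model allows for varying the strength of dependence over time by choosing the covariance matrix $\Sigma_1$.

2.2 Matrix results

Lemma 4.3 in Ritter and Gallegos (2002) and Leiva (2007) guarantee that a $mu \times mu$ dimensional matrix $\Gamma$ of the form $\Gamma = I_u \otimes (\Sigma_0 - \Sigma_1) + J_u \otimes \Sigma_1$ is nonsingular if both $\Sigma_0 - \Sigma_1$ and $\Sigma_0 + (u-1)\Sigma_1$ are nonsingular matrices, and the inverse of $\Gamma$ is given by
\[
\Gamma^{-1} = I_u \otimes (\Sigma_0 - \Sigma_1)^{-1} + J_u \otimes \frac{1}{u}\left[\left(\Sigma_0 + (u-1)\Sigma_1\right)^{-1} - (\Sigma_0 - \Sigma_1)^{-1}\right]. \tag{2.2}
\]
We notice that $\Gamma^{-1}$ has the same structure as $\Gamma$. This result (2.2) generalizes the one given by Bartlett (1951) for $m = 1$. Using the above results we have the following lemma.

Lemma 1 Let $\Gamma$ be a BCS covariance matrix as in equation (2.1). If
\[
\Delta_1 = \Sigma_0 - \Sigma_1 \quad \text{and} \quad \Delta_2 = \Sigma_0 + (u-1)\Sigma_1 \tag{2.3}
\]
are nonsingular matrices, then $\Gamma$ is a nonsingular matrix, and
\[
\Gamma^{-1} = I_u \otimes \Delta_1^{-1} + J_u \otimes \frac{1}{u}\left(\Delta_2^{-1} - \Delta_1^{-1}\right). \tag{2.4}
\]
The proof of this lemma is given in Appendix A in Roy and Leiva (2007). This result is used in Section 3 to obtain the BCS linear classifier. At this point we assume that we have samples from the populations $\Pi_p$ for $p = 1, \ldots, k$ with the common BCS covariance matrix $\Gamma$. We estimate $\Gamma$ from each $\Pi_p$ in the next section.

2.3 Estimation of the equicorrelated parameters $\Sigma_0$ and $\Sigma_1$ in $\Pi_p$

Let $x^{(p)}_{r,s}$ be an $m$-variate vector of measurements on the $r$th observation at the $s$th time point; $r = 1, \ldots, n_p$, $s = 1, \ldots, u$, from the $p$th population, $p = 1, \ldots, k$. Let $x^{(p)}_r = (x^{(p)\prime}_{r,1}, \ldots, x^{(p)\prime}_{r,u})'$ be the $mu$-variate vector of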

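all measurements corresponding to the $r$th observation. Finally, let $x^{(p)}_1, x^{(p)}_2, \ldots, x^{(p)}_{n_p}$ be a random sample of size $n_p$ drawn from the $p$th population $\Pi_p$ represented by $N_{mu}(\nu^{(p)}, \Gamma)$, where $\nu^{(p)} \in R^{mu}$ and $\Gamma$ is a $mu \times mu$ positive definite BCS structure defined in Section 2.1. Following Roy and Leiva (2013), unbiased estimates of $\Sigma_0$ and $\Sigma_1$ for $\Pi_p$ are
\[
\hat\Sigma^{(p)}_0 = \frac{1}{(n_p - 1)u} \sum_{r=1}^{n_p} \sum_{s=1}^{u} \left(x^{(p)}_{r,s} - \hat\nu^{(p)}_s\right)\left(x^{(p)}_{r,s} - \hat\nu^{(p)}_s\right)'
\]
and
\[
\hat\Sigma^{(p)}_1 = \frac{1}{(n_p - 1)u(u-1)} \sum_{r=1}^{n_p} \sum_{s=1}^{u} \sum_{\substack{s^*=1 \\ s^* \neq s}}^{u} \left(x^{(p)}_{r,s} - \hat\nu^{(p)}_s\right)\left(x^{(p)}_{r,s^*} - \hat\nu^{(p)}_{s^*}\right)',
\]
respectively, where $\hat\nu^{(p)}_s$ is the sample mean of the $p$th population at the $s$th time point. Consequently, an unbiased estimate of $\Gamma$ for the $p$th population is as follows:
\[
\hat\Gamma^{(p)} = I_u \otimes \left(\hat\Sigma^{(p)}_0 - \hat\Sigma^{(p)}_1\right) + J_u \otimes \hat\Sigma^{(p)}_1, \quad p = 1, \ldots, k.
\]

3 Linear classifier under BCS covariance structure

Assuming that the $\Pi_p$'s are represented by $N_{mu}(\nu^{(p)}, \Gamma)$, we assign a new observation $x$ to $\Pi_{p^*}$ whenever $p^* = \arg\max_{p = 1, \ldots, k} C(x; \nu^{(p)}, \Gamma)$, where
\[
C(x; \nu^{(p)}, \Gamma) = x'\Gamma^{-1}\nu^{(p)} - \frac{1}{2}\nu^{(p)\prime}\Gamma^{-1}\nu^{(p)} + \log \pi_p \tag{3.5}
\]
is the linear score for $\Pi_p$, $\Gamma^{-1}$ is given in (2.4) and $\sum_{p=1}^{k} \pi_p = 1$, with $\pi_p$ denoting the prior probability of $\Pi_p$. This classifier is analogous to the well-known Fisher linear discriminant rule that is optimal in the sense of minimum overall misclassification probability (Anderson, 2003; Srivastava and Khatri, 1979). Evaluation of the performance accuracy of the sample based version of (3.5) requires its distributional properties, which in turn depend on the distribution of the sample counterpart $\hat\Gamma$ of $\Gamma$. A desirable property of the classifier is normality; however, it cannot be achieved when the distribution of $\hat\Gamma$ is not Wishart. In this study we circumvent this situation by applying the canonical transformation of the data, which is presented in the next section.

3.1 Canonical transformation of the feature data

Let $\Xi = H'_{u \times u} \otimes I_m$ be an orthogonal matrix, with $H$ an orthogonal Helmert matrix whose first column is proportional to a vector of 1's. Making the canonical transformation of the data as $y^{(p)} = \Xi x^{(p)}$, we have $y^{(p)}$ distributed as $mu$-dimensional normal with
\[
E(y^{(p)}) = \mu^{(p)} = \Xi E(x^{(p)}) = \Xi \nu^{(p)}, \tag{3.6}
\]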

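and
\[
\mathrm{Cov}(y^{(p)}) = \Gamma_\Xi = \Xi \Gamma \Xi' =
\begin{pmatrix}
\Delta_2 & 0 \\
0 & I_{u-1} \otimes \Delta_1
\end{pmatrix}, \tag{3.7}
\]
where $\Delta_1 = \Sigma_0 - \Sigma_1$ and $\Delta_2 = \Sigma_0 + (u-1)\Sigma_1$. See Lemma 3.1 in Roy and Fonseca (2012) for a proof. Positive definiteness of $\Delta_1$ and $\Delta_2$ is guaranteed as $\Gamma$ is positive definite (see Lemma 2.1 in Roy and Leiva, 2011). We should note that $\Gamma_\Xi$ is a block diagonal matrix and $\Xi$ is not a function of either $\Sigma_0$ or $\Sigma_1$.

Let $T = \{y^{(p)}_1, \ldots, y^{(p)}_{n_p}\}_{p \in \{1, \ldots, k\}}$ denote the orthogonally transformed sample data. By noticing the structure of $\Gamma_\Xi$, the vectors $y^{(p)}_r$ and $\mu^{(p)}$ are partitioned in $u$ subvectors as $y^{(p)}_r = (y^{(p)\prime}_{r,1}, \ldots, y^{(p)\prime}_{r,u})'$ and $\mu^{(p)} = (\mu^{(p)\prime}_1, \ldots, \mu^{(p)\prime}_u)'$. Then $y^{(p)}_{r,1}, y^{(p)}_{r,2}, \ldots, y^{(p)}_{r,u}$ are independently normally distributed as
\[
y^{(p)}_{r,1} \sim N_m(\mu^{(p)}_1; \Delta_2) \quad \text{and} \quad y^{(p)}_{r,s} \sim N_m(\mu^{(p)}_s; \Delta_1)
\]
for fixed $s = 2, \ldots, u$ and fixed $r = 1, \ldots, n_p$. Now, for each fixed $r$, we observe that the vector $y^{(p)}_r$ can be partitioned as $y^{(p)}_r = (y^{(p)\prime}_{r,1}, \boldsymbol{y}^{(p)\prime}_{r,2})'$, with components of dimension $m \times 1$ and $(u-1)m \times 1$, where $\boldsymbol{y}^{(p)}_{r,2} = (y^{(p)\prime}_{r,2}, \ldots, y^{(p)\prime}_{r,u})'$ collects the $(u-1)m$ components of the vector $y^{(p)}_r$ except the first component $y^{(p)}_{r,1}$. Similarly, $\hat\mu^{(p)} = (\hat\mu^{(p)\prime}_1, \ldots, \hat\mu^{(p)\prime}_u)'$ with $\hat\mu^{(p)}_s = \frac{1}{n_p}\sum_{r=1}^{n_p} y^{(p)}_{r,s}$ for $s = 1, \ldots, u$. We denote $\hat\mu^{(p)} = (\hat\mu^{(p)\prime}_1, \hat{\boldsymbol{\mu}}^{(p)\prime}_2)'$, with components of dimension $m \times 1$ and $(u-1)m \times 1$, where $\hat{\boldsymbol{\mu}}^{(p)}_2$ collects the $(u-1)m$ components of the vector $\hat\mu^{(p)}$ except the first component $\hat\mu^{(p)}_1$.

3.2 Estimation of the Transformed Matrix Parameters $\Delta_1$ and $\Delta_2$

From this section onwards we will focus on the two population case, i.e., $k = 2$. Now, using the results of Section 2.3, unbiased estimates $\hat\Delta^{(p)}_1$ and $\hat\Delta^{(p)}_2$ of $\Delta_1$ and $\Delta_2$ respectively for the $p$th population are
\[
\hat\Delta^{(p)}_1 = \hat\Sigma^{(p)}_0 - \hat\Sigma^{(p)}_1 \quad \text{and} \quad \hat\Delta^{(p)}_2 = \hat\Sigma^{(p)}_0 + (u-1)\hat\Sigma^{(p)}_1,
\]
for $p = 1, 2$. In the next section we obtain the pooled estimates $\hat\Delta_1$ and $\hat\Delta_2$ of $\Delta_1$ and $\Delta_2$ respectively from both $\hat\Delta^{(p)}_1$ and $\hat\Delta^{(p)}_2$. We also obtain the distributions of $\hat\Delta_1$ and $\hat\Delta_2$.

3.3 Pooled Estimates and Distributions of $\hat\Delta_1$ and $\hat\Delta_2$

Since $\hat\Gamma^{(p)}$ does not follow a Wishart distribution, we focus on the distributions of $\hat\Delta^{(p)}_1$ and $\hat\Delta^{(p)}_2$, $p = 1, 2$, which fortunately are both Wishart. Following the corresponding theorem in Roy et al. (2013) it can be shown that the distributions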

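of $(n_p - 1)(u-1)\hat\Delta^{(p)}_1$ and $(n_p - 1)\hat\Delta^{(p)}_2$ are independent, and
\[
S^{(p)}_1 = (n_p - 1)(u-1)\hat\Delta^{(p)}_1 \sim \mathrm{Wishart}_m\left(\Delta_1, (n_p - 1)(u-1)\right), \tag{3.8a}
\]
and
\[
S^{(p)}_2 = (n_p - 1)\hat\Delta^{(p)}_2 \sim \mathrm{Wishart}_m\left(\Delta_2, n_p - 1\right), \tag{3.8b}
\]
for $p = 1, 2$. Now, from the property of the Wishart distribution, the unbiased estimates of $\Delta_1$ and $\Delta_2$ are $\hat\Delta^{(p)}_1$ and $\hat\Delta^{(p)}_2$ respectively for the $p$th population. The following lemma yields the pooled estimates of $\Delta_1$ and $\Delta_2$.

Lemma 2 The pooled estimates of $\Delta_1$ and $\Delta_2$ are given by
\[
\hat\Delta_1 = \frac{S_1}{(n_1 + n_2 - 2)(u-1)} \quad \text{and} \quad \hat\Delta_2 = \frac{S_2}{n_1 + n_2 - 2},
\]
where
\[
S_1 = \sum_{p=1}^{2} (n_p - 1)(u-1)\hat\Delta^{(p)}_1 \quad \text{and} \quad S_2 = \sum_{p=1}^{2} (n_p - 1)\hat\Delta^{(p)}_2.
\]

Proof: Applying the additive property of two independent Wishart distributions, from (3.8a) we get
\[
S_1 = S^{(1)}_1 + S^{(2)}_1 \sim \mathrm{Wishart}_m\left(\Delta_1, (n_1 + n_2 - 2)(u-1)\right),
\]
and similarly from (3.8b) we get
\[
S_2 = S^{(1)}_2 + S^{(2)}_2 \sim \mathrm{Wishart}_m\left(\Delta_2, n_1 + n_2 - 2\right).
\]
Now, again from (3.8a) we get $S_1 = \sum_{p=1}^{2} (n_p - 1)(u-1)\hat\Delta^{(p)}_1$. Therefore,
\[
\hat\Delta_1 = \frac{\sum_{p=1}^{2} (n_p - 1)(u-1)\hat\Delta^{(p)}_1}{\sum_{p=1}^{2} (n_p - 1)(u-1)} = \frac{S_1}{(n_1 + n_2 - 2)(u-1)}. \tag{3.9}
\]
Similarly, from (3.8b) we get $S_2 = \sum_{p=1}^{2} (n_p - 1)\hat\Delta^{(p)}_2$.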

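Therefore,
\[
\hat\Delta_2 = \frac{\sum_{p=1}^{2} (n_p - 1)\hat\Delta^{(p)}_2}{\sum_{p=1}^{2} (n_p - 1)} = \frac{S_2}{n_1 + n_2 - 2}. \tag{3.10}
\]
Hence, the Lemma follows. This Lemma can easily be extended to more than two populations.

Now, taking the inverse of both sides of the equations (3.9) and (3.10) separately we get
\[
\hat\Delta_1^{-1} = (n_1 + n_2 - 2)(u-1)\, S_1^{-1}, \tag{3.11a}
\]
and
\[
\hat\Delta_2^{-1} = (n_1 + n_2 - 2)\, S_2^{-1}. \tag{3.11b}
\]
Therefore,
\[
E(\hat\Delta_1^{-1}) = (n_1 + n_2 - 2)(u-1)\, E(S_1^{-1}), \tag{3.12a}
\]
and
\[
E(\hat\Delta_2^{-1}) = (n_1 + n_2 - 2)\, E(S_2^{-1}). \tag{3.12b}
\]

4 Performance accuracy of BCS classifier

As the goal of a classification problem is to derive the decision rule that optimizes some measure of performance accuracy, we consider the PMC as its measure. By partitioning the new observed vector $y = (y_1', \boldsymbol{y}_2')'$, with components of dimension $m \times 1$ and $(u-1)m \times 1$, and by substituting $\Gamma_\Xi$ from (3.7) into (3.5), the theoretical classifier (3.5) becomes
\[
\begin{aligned}
C(y; \mu^{(i)}, \Delta_1, \Delta_2) &= C_1(y_1; \mu^{(i)}_1, \Delta_2) + C_2(\boldsymbol{y}_2; \boldsymbol{\mu}^{(i)}_2, \Delta_1) \\
&= \left\{y_1 - \tfrac{1}{2}\left(\mu^{(1)}_1 + \mu^{(2)}_1\right)\right\}' \Delta_2^{-1} \left(\mu^{(1)}_1 - \mu^{(2)}_1\right) \\
&\quad + \left\{\boldsymbol{y}_2 - \tfrac{1}{2}\left(\boldsymbol{\mu}^{(1)}_2 + \boldsymbol{\mu}^{(2)}_2\right)\right\}' \left(I_{u-1} \otimes \Delta_1^{-1}\right) \left(\boldsymbol{\mu}^{(1)}_2 - \boldsymbol{\mu}^{(2)}_2\right).
\end{aligned} \tag{4.13}
\]
Plugging the estimators of $\mu^{(i)}$ and $\Delta_i$ into (4.13) leads to the sample based classifier
\[
\begin{aligned}
C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) &= \left\{y_1 - \tfrac{1}{2}\left(\hat\mu^{(1)}_1 + \hat\mu^{(2)}_1\right)\right\}' \hat\Delta_2^{-1} \left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right) \\
&\quad + \left\{\boldsymbol{y}_2 - \tfrac{1}{2}\left(\hat{\boldsymbol{\mu}}^{(1)}_2 + \hat{\boldsymbol{\mu}}^{(2)}_2\right)\right\}' \left(I_{u-1} \otimes \hat\Delta_1^{-1}\right) \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right).
\end{aligned} \tag{4.14}
\]
Given that the prior probabilities are the same for both populations, a new observation $y$ is to be assigned to $\Pi_1$ whenever $C(y; \mu^{(i)}, \Delta_1, \Delta_2) > 0$ or $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) > 0$. Assume henceforth that $y$ is coming from $\Pi_1$, i.e., $y \in N(\mu^{(1)}, \Gamma_\Xi)$. The symmetry of our classification rule makes the posterior PMC when the mean of $y$ is $\mu^{(1)}$ the same as that under the assumption that the mean of $y$ is $\mu^{(2)}$. Throughout the paper we shall consider the following definitions of PMC:

OPMC: Optimal or Bayes PMC of $C(y; \mu^{(i)}, \Delta_1, \Delta_2)$ is defined as
\[
E_O = \Pr\left(C(y; \mu^{(i)}, \Delta_1, \Delta_2) \le 0 \mid y \in N(\mu^{(1)}, \Gamma_\Xi)\right).
\]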

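PPMC: Posterior PMC of $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$ is defined for a single training sample $T$ of size $n_1 + n_2$ from populations $\Pi_1$ and $\Pi_2$ as
\[
E_C = \Pr\left(C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) \le 0 \mid T,\ y \in N(\mu^{(1)}, \Gamma_\Xi)\right).
\]

EPMC: Expected PMC of $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$ is defined as the average of $E_C$ over arbitrary training samples $T$ of size $n_1 + n_2$,
\[
E(E_C) = E_T\left[\Pr\left(C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) \le 0 \mid T,\ y \in N(\mu^{(1)}, \Gamma_\Xi)\right)\right].
\]

It should be noted that for any sample based classifier $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$, both PPMC and EPMC will always be strictly larger than OPMC.

In what follows we use the notation $\delta = (\delta_1', \ldots, \delta_u')' = (\delta_1', \boldsymbol{\delta}_2')'$, with $\delta_1 = \mu^{(1)}_1 - \mu^{(2)}_1$ of dimension $m \times 1$ and $\boldsymbol{\delta}_2 = \boldsymbol{\mu}^{(1)}_2 - \boldsymbol{\mu}^{(2)}_2$ of dimension $(u-1)m \times 1$, and observe that due to the normality of $y$, OPMC can easily be computed as $E_O = \Phi(-D/2)$, where $\Phi$ is the cumulative distribution function of the standard normal distribution and $D^2$ is the Mahalanobis squared distance between $\Pi_1$ and $\Pi_2$, which by the structure of the $\mu^{(i)}$'s and $\Gamma_\Xi$ can be represented as
\[
D^2 = \left(\mu^{(1)} - \mu^{(2)}\right)' \Gamma_\Xi^{-1} \left(\mu^{(1)} - \mu^{(2)}\right) = \delta_1' \Delta_2^{-1} \delta_1 + \boldsymbol{\delta}_2' \left(I_{u-1} \otimes \Delta_1^{-1}\right) \boldsymbol{\delta}_2 = D_1^2 + D_2^2,
\]
with
\[
D_1^2 = \delta_1' \Delta_2^{-1} \delta_1, \tag{4.15a}
\]
and
\[
D_2^2 = \boldsymbol{\delta}_2' \left(I_{u-1} \otimes \Delta_1^{-1}\right) \boldsymbol{\delta}_2 = \sum_{k=2}^{u} \delta_k' \Delta_1^{-1} \delta_k = \sum_{k=2}^{u} D_{2k}^2. \tag{4.15b}
\]
Observe that $D_2^2$ depends on $u$, which is important for our asymptotic considerations.

The advantage of considering OPMC is that it relates the Mahalanobis distance $D$, i.e., a measure of classification complexity for the true underlying model, and the optimal performance accuracy which can be achieved by $C(y)$. By the strict monotonicity of $\Phi$ one can see that $D = -2\Phi^{-1}(E_O)$, and hence under normality of $y$, $E_O$ and $D$ provide equivalent information about the classification performance.

5 Asymptotics for the moments of $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$

To obtain PPMC and EPMC of $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$, we need to obtain its conditional distribution given the data $T$. We begin by the following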

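Lemma 3 Under the model (4.14) and estimators (3.11a) and (3.11b) the PPMC is given by
\[
E_C = \Phi\left(U / \sqrt{V}\right) \quad \text{given } T,
\]
where $-U$ and $V$ are the conditional mean and conditional variance of the classifier $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$.

Proof: To derive the distribution of $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$, we first define the following statistics
\[
\begin{aligned}
U_1 &= \left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right)' \hat\Delta_2^{-1} \left(\hat\mu^{(1)}_1 - \mu^{(1)}_1\right) - \tfrac{1}{2}\left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right)' \hat\Delta_2^{-1} \left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right), \\
U_2 &= \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right)' \left(I_{u-1} \otimes \hat\Delta_1^{-1}\right) \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \boldsymbol{\mu}^{(1)}_2\right) - \tfrac{1}{2}\left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right)' \left(I_{u-1} \otimes \hat\Delta_1^{-1}\right) \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right), \\
V_1 &= \left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right)' \hat\Delta_2^{-1} \Delta_2 \hat\Delta_2^{-1} \left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right), \\
V_2 &= \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right)' \left(I_{u-1} \otimes \hat\Delta_1^{-1} \Delta_1 \hat\Delta_1^{-1}\right) \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right), \\
Z_1 &= V_1^{-1/2} \left(\hat\mu^{(1)}_1 - \hat\mu^{(2)}_1\right)' \hat\Delta_2^{-1} \left(y_1 - \mu^{(1)}_1\right), \\
Z_2 &= V_2^{-1/2} \left(\hat{\boldsymbol{\mu}}^{(1)}_2 - \hat{\boldsymbol{\mu}}^{(2)}_2\right)' \left(I_{u-1} \otimes \hat\Delta_1^{-1}\right) \left(\boldsymbol{y}_2 - \boldsymbol{\mu}^{(1)}_2\right).
\end{aligned}
\]
Then $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$ can be written as
\[
C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) = C_1(y_1; \hat\mu^{(i)}_1, \hat\Delta_2) + C_2(\boldsymbol{y}_2; \hat{\boldsymbol{\mu}}^{(i)}_2, \hat\Delta_1) = V_1^{1/2} Z_1 - U_1 + V_2^{1/2} Z_2 - U_2.
\]
Now, by the distributional properties of $\hat\Delta_1$ and $\hat\Delta_2$ (see Section 3.3) one can see that $Z_1$ and $Z_2$ are independent and follow the standard normal distribution given $T$ (i.e., given $\hat\mu^{(i)}$, $\hat\Delta_1$ and $\hat\Delta_2$) and assuming that $y \in \Pi_1$. Hence
\[
C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) \mid T,\ y \in N(\mu^{(1)}, \Gamma_\Xi) \ \sim\ N(-U, V), \tag{5.16}
\]
where $U = U_1 + U_2$ and $V = V_1 + V_2$, from which the lemma directly follows.

By (5.16) we find that the EPMC is $E(E_C) = E_T\left[\Phi\left(U/\sqrt{V}\right)\right]$. Both $E_C$ and $E(E_C)$ provide exact results on the classification performance; however, getting closed-form expressions for them is too demanding. We instead focus on the asymptotic approximation of $E(E_C)$ within the following high-dimensional framework:

A1: $n_p \to \infty$ for $p = 1, 2$, with $u/n \to c_0$ and $m/n \to c$, where $0 < c < 1$;

A2: $D^2 = O(1)$.

The condition A1 relates the BCS structure to the sample size, with $q = n - m$. The condition A2 is to guarantee that the classification problem does not degenerate, i.e., to ensure that the Mahalanobis distance between $\Pi_1$ and $\Pi_2$ is bounded as $0 < d_1 \le D^2 \le d_2 < \infty$. Also, to fulfill A2 we require that $D_{2k}^2 = O(1/n)$ for $k = 2, \ldots, u$, and $D_1^2 = O(1)$.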
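The chain from the BCS covariance (2.1) through the canonical transformation (3.7), the pooled estimates of Lemma 2 and the posterior PMC of Lemma 3 can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the authors' code: the function names, the parameter values ($m = 3$, $u = 4$, $n_1 = 15$, $n_2 = 12$, $\Sigma_1 = 0.3 I_m$) and the mean vectors are hypothetical, and the pooled estimates are computed directly from the transformed data, which is assumed to be equivalent to the estimators of Section 3.3. The conditional PMC is evaluated from first principles as $\Pr(C(y) \le 0 \mid T)$ for $y \sim N(\mu^{(1)}, \Gamma_\Xi)$.

```python
import numpy as np
from math import erf, sqrt


def helmert(u):
    """Orthogonal u x u Helmert matrix; first column is proportional to ones."""
    h = np.zeros((u, u))
    h[:, 0] = 1.0 / sqrt(u)
    for j in range(1, u):
        h[:j, j] = 1.0 / sqrt(j * (j + 1))
        h[j, j] = -j / sqrt(j * (j + 1))
    return h


def bcs(sigma0, sigma1, u):
    """Gamma = I_u kron (Sigma0 - Sigma1) + J_u kron Sigma1, as in (2.1)."""
    return np.kron(np.eye(u), sigma0 - sigma1) + np.kron(np.ones((u, u)), sigma1)


def block_diag_cov(delta1, delta2, u):
    """diag(Delta2, I_{u-1} kron Delta1), the covariance form in (3.7)."""
    m = delta1.shape[0]
    out = np.kron(np.eye(u), delta1)
    out[:m, :m] = delta2
    return out


def pooled_deltas(samples, m, u):
    """Pooled estimates of (Delta1, Delta2) from transformed samples (rows = observations)."""
    df = sum(len(y) for y in samples) - len(samples)    # n1 + n2 - 2
    s1, s2 = np.zeros((m, m)), np.zeros((m, m))
    for y in samples:
        r = (y - y.mean(axis=0)).reshape(len(y), u, m)  # centred, split into u blocks
        s2 += r[:, 0, :].T @ r[:, 0, :]                 # first block carries Delta2
        rest = r[:, 1:, :].reshape(-1, m)               # remaining u-1 blocks carry Delta1
        s1 += rest.T @ rest
    return s1 / (df * (u - 1)), s2 / df


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def conditional_pmc(mu1, mu1_hat, mu2_hat, gamma_xi, gamma_xi_hat):
    """P(C(y) <= 0 | training data) when y ~ N(mu1, gamma_xi)."""
    a = np.linalg.solve(gamma_xi_hat, mu1_hat - mu2_hat)
    mean = (mu1 - 0.5 * (mu1_hat + mu2_hat)) @ a        # conditional mean of C(y)
    var = a @ gamma_xi @ a                              # conditional variance of C(y)
    return norm_cdf(-mean / sqrt(var))


rng = np.random.default_rng(0)
m, u, n1, n2 = 3, 4, 15, 12                             # hypothetical sizes
sigma0, sigma1 = np.eye(m), 0.3 * np.eye(m)             # hypothetical parameters
gamma = bcs(sigma0, sigma1, u)
xi = np.kron(helmert(u).T, np.eye(m))                   # Xi = H' kron I_m
delta1, delta2 = sigma0 - sigma1, sigma0 + (u - 1) * sigma1
gamma_xi = xi @ gamma @ xi.T
assert np.allclose(gamma_xi, block_diag_cov(delta1, delta2, u))  # checks (3.7)

nu1, nu2 = np.full(m * u, 0.4), np.zeros(m * u)         # hypothetical means
y1 = rng.multivariate_normal(nu1, gamma, size=n1) @ xi.T
y2 = rng.multivariate_normal(nu2, gamma, size=n2) @ xi.T
d1_hat, d2_hat = pooled_deltas([y1, y2], m, u)
pmc = conditional_pmc(xi @ nu1, y1.mean(axis=0), y2.mean(axis=0),
                      gamma_xi, block_diag_cov(d1_hat, d2_hat, u))
print(f"conditional PMC for this training sample: {pmc:.4f}")
```

Averaging `conditional_pmc` over many independently drawn training samples would give a Monte Carlo estimate of the EPMC for the chosen configuration.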

12 Further, we tur to stochastic expasio of the coditioal momets of Cy; µ i,, uder A ad defie two radom variables ζ ad η as follows ζ = µ µ + µ µ, ad η = µ µ µ µ. Sice ζ ad η are idepedet ad distributed as N mu, Γ Ξ, usig the BCS structure of Γ Ξ we ca express U ad V as follows U = δ δ + δ I u δ + δ ζ η η + η I u η η + δ I u η + δ ζ + δ I u ζ η η + ζ I u ad V = δ Γ Ξ Γ Ξ Γ Ξ δ + η Γ Ξ Γ Ξ Γ Ξ η + where Γ Ξ Γ Ξ Γ Ξ = I u We ow proceed by expadig both U ad V stochastically. 5. Stochastic expasio of the coditioal mea, 5.7 δ Γ Ξ Γ Ξ Γ Ξ η, 5.8 Followig the techique as i Sectio 3. we partitio ζ ad η as ζ = ζ,..., ζ u = ζ, ζ ad η m um as η = η,..., η u = η, η. m um To evaluate the first term i U we eed E ad E that are derived i 3.a ad 3.b respectively. Now usig D ad D from 4.5a ad 4.5b it is see that the first term i 5.7 have the followig represetatio Eδ δ + E δ I u δ = Eδ S δ + u Eδ ks δ k EtrW ξ ξ + = = = q + q D + k= EtrW k q + q D. ξ kξ k + O q + q D + O, 5.9

13 where ξ = / δ ad ξ k = / δ k for k =,..., u, ad Now, usig the expressios of W = u / S /, 5.a ad W = / S /. 5.b ad from 3.a ad 3.b, W ad W from 5.a ad 5.b, ad also usig the higher order momets of iverse Wishart matrix see the appedix A. i Kubokawa et al. 3 we express the expectatio of the secod term i 5.7 as follows E η η + η I u η = E η η + E η I u η = E trη S η + u E η I u S = = = mu E trw + u E trw + k= k= E η ks η k E trw η q + q + O. 5. Usig the above expasios 5.9 ad 5., ad usig the facts that Eζ = ad Eη = we ca ow express U as U = U + U + U + O P where U = q + q D + U = δ η + δ Iu + ζ δ δ + δ Iu ad U = mu, η + ζ I u η, η η + η I u η + δ ζ + δ I u ζ δ q + D q + Du η mu q + q. Remark By settig u =, our two-level model reduces to oe-level model, cosequetly U reduces to U = q + q D + m. This is the expressio of U up to the secod order term. Now, by igorig the secod order term i this expressio it further reduces to U = q D + which is the expressio of U i Kubokawa et al. 3 see Page 498. m,
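The expansions in this section rest on moments of the inverse Wishart matrix. The first of these is the standard identity $E(S^{-1}) = \Sigma^{-1}/(n - m - 1)$ for $S \sim \mathrm{Wishart}_m(\Sigma, n)$ with $n > m + 1$. The following Monte Carlo sketch checks that identity; it is an illustration only, and the scale matrix, the degrees of freedom and the number of replications are arbitrary choices that do not come from the paper.

```python
import numpy as np


def wishart_sample(rng, df, scale):
    """Draw S ~ Wishart_m(scale, df) as a sum of df outer products of N(0, scale) vectors."""
    z = rng.multivariate_normal(np.zeros(scale.shape[0]), scale, size=df)
    return z.T @ z


def mean_inverse_wishart(rng, df, scale, reps):
    """Monte Carlo estimate of E(S^{-1})."""
    m = scale.shape[0]
    acc = np.zeros((m, m))
    for _ in range(reps):
        acc += np.linalg.inv(wishart_sample(rng, df, scale))
    return acc / reps


rng = np.random.default_rng(1)
m, n = 3, 20                                   # arbitrary dimension and degrees of freedom
scale = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])            # arbitrary positive definite matrix
mc = mean_inverse_wishart(rng, n, scale, reps=4000)
exact = np.linalg.inv(scale) / (n - m - 1)     # closed-form first moment
print("largest absolute deviation:", np.max(np.abs(mc - exact)))
```

Since a pooled estimate has the form $\hat\Delta = S/n$, the same identity gives $E(\hat\Delta^{-1}) = \frac{n}{n - m - 1}\Delta^{-1}$, which is where factors involving $n$ and $n - m$ enter the leading terms.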

14 5. Stochastic expasio of the coditioal variace Now we cosider the stochastic expasio of V. Cosiderig the first term i 5.8 ad substitutig the estimators ad from 3.a ad 3.b we get δ Γ Ξ Γ Ξ Γ Ξ δ = δ δ + δ Iu δ = δ S S δ + u δ Iu S S Therefore, usig the expressios W ad W from 5.a ad 5.b, ad also usig the higher order momets of iverse Wishart matrix see the appedix A. i Kubokawa et al. 3 the expectatio of the above expressio becomes E δ S S δ + u δ I u S S = Eδ S S δ + u = EtrW ξ ξ + = = k= k= EtrW ξ kξ k Eδ ks S δ k δ δ. 3 q 3 D 4 q + q 4 D 3 + q 3 Dk 4 q + q 4 Dk + O k= 3 4 q q 3 + q 4 D + O. 5. Now, cosiderig the secod term i 5.8 ad substitutig the values of 3.b we get η Γ Ξ Γ Ξ Γ Ξ η = Now, = η η E η η + η I u = E η η + E ad from 3.a ad η + η I u η η + η I u η. η I u η η = E η S S η + u E η I u S S η = E η S S η + u E η ks S = E trw = k= + E trw k= m m 4 q q 3 + q 4 u + O. 3 η k

15 Therefore, the expectatio of the secod term i 5.8 is as follows m m 4 q q 3 + q 4 u + O. 5.3 Therefore, usig the above expasios 5. ad 5.3 we ca ow express V as V = V +V +V +O P 3/ where V = V = 3 q 3 + δ 4 q D + q 4 mu, η + δ I u ad V = δ δ + δ Iu δ η, 3 q η η + η Iu η 4 q 3 q 3 + q 4 mu. 4 q Remark As i Remark, by settig u =, the model reduces to oe-level model, cosequetly V reduces q 4 D to V = 3 4 q q 3 + q 4 D + m. This is the expressio of V up to the secod order term. Now, by igorig the secod order term i this expressio it further reduces to V = 3 q 3 D + m, which is the expressio of V i Kubokawa et al. 3 see Page Evaluatio of PPMC ad EPMC To evaluate PPMC we first obtai expasio of the distributio fuctio of Cy; µ i,,. By returig ow to Lemma 3 ad 5.6, ad by usig the represetatios U = U + U + U + O P 3/ ad V = V + V + V + O P 3/ we cosider the followig Taylor series expasio U = U + U + U + V / + V V / V / V = U + U + U V / + 3 U + U + U 8 V / U + U + U V / V V V + 3 U + U + U 8 V U + U + U V / V / 4 V V V V + 6 U + U + U 8 V / V V V + O P 3/.

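Further, by ignoring the terms of order $O_P(n^{-3/2})$ and higher we obtain
\[
U / V^{1/2} = \gamma_0 + \gamma_1 + \gamma_2 + O_P(n^{-3/2}),
\]
where
\[
\begin{aligned}
\gamma_0 &= U_0 / V_0^{1/2}, \\
\gamma_1 &= V_0^{-1/2}\left(U_1 - \frac{U_0 V_1}{2V_0}\right), \\
\gamma_2 &= V_0^{-1/2}\left(U_2 - \frac{U_1 V_1}{2V_0} + \frac{3U_0 V_1^2}{8V_0^2} - \frac{U_0 V_2}{2V_0}\right).
\end{aligned}
\]
The above results can be summarized in the following theorem.

Theorem 1 The cumulative distribution function of the normalized classifier $\{C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2) + U\}/V^{1/2}$ at the point $U/V^{1/2}$, under $y \in \Pi_1$ and under the condition A1, is expanded as
\[
\Phi(\gamma_0 + \gamma_1 + \gamma_2) + O_P(n^{-3/2}). \tag{6.24}
\]

Now, by considering the Taylor series expansion of $\Phi(\gamma_0 + \gamma_1 + \gamma_2)$ in (6.24) around $\gamma_0$ and using the fact that $\phi'(x) = -x\phi(x)$, where $\phi$ is the density function of $N(0, 1)$, one can see that the PPMC of $C(y; \hat\mu^{(i)}, \hat\Delta_1, \hat\Delta_2)$ is expanded as follows:
\[
E_C = \Phi(\gamma_0) + \phi(\gamma_0)\left\{\gamma_1 + \gamma_2 - \frac{\gamma_0 \gamma_1^2}{2}\right\} + O_P(n^{-3/2}). \tag{6.25}
\]

Remark 3 For $u = 1$, $\Phi(\gamma_0)$ reduces to $\Phi(\gamma_0) = \Phi(U_0 / V_0^{1/2})$, which represents the corresponding expression in Kubokawa et al. (2013) (see Page 5) up to the second order term. Thus, from Remarks 1, 2 and 3 we see that our method is a clear extension of Kubokawa et al. (2013) for the two-level multivariate observations.

To evaluate EPMC, we denote $L = E\left(\gamma_1 + \gamma_2 - \gamma_0\gamma_1^2/2\right)$ and observe that $L$ can be written as
\[
\begin{aligned}
L &= V_0^{-1/2}\left\{E(U_1) + E(U_2)\right\} + \frac{U_0}{8V_0^{5/2}}\left(3 - \frac{U_0^2}{V_0}\right)E(V_1^2) - \frac{U_0}{2V_0^{3/2}}\left\{E(V_1) + E(V_2)\right\} \\
&\quad - \frac{1}{2V_0^{3/2}}\left(1 - \frac{U_0^2}{V_0}\right)E(U_1 V_1) - \frac{U_0}{2V_0^{3/2}}E(U_1^2).
\end{aligned}
\]
Now, to calculate the approximate EPMC, we need to evaluate the expectation of each term in $L$.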
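The step from (6.24) to (6.25) is a second-order Taylor expansion of $\Phi$ around $\gamma_0$, and it can be checked numerically. In the sketch below the values of $\gamma_0$, $\gamma_1$ and $\gamma_2$ are made up, scaled as $O(1)$, $O(n^{-1/2})$ and $O(n^{-1})$ respectively; they are not computed from the classifier. The point is only that the gap between $\Phi(\gamma_0 + \gamma_1 + \gamma_2)$ and the right-hand side of (6.25) shrinks at the rate $n^{-3/2}$.

```python
from math import erf, exp, pi, sqrt


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def second_order(g0, g1, g2):
    """Right-hand side of (6.25) without the remainder term."""
    return Phi(g0) + phi(g0) * (g1 + g2 - 0.5 * g0 * g1 ** 2)


g0 = -0.8                                    # made-up leading term
errors = {}
for n in (50, 200, 800):
    g1, g2 = 0.5 / sqrt(n), -0.3 / n         # made-up terms of order n^(-1/2) and n^(-1)
    errors[n] = abs(Phi(g0 + g1 + g2) - second_order(g0, g1, g2))
    print(n, errors[n], errors[n] * n ** 1.5)  # last column should stay roughly constant
```

The rescaled error in the last column stabilising is what an $O(n^{-3/2})$ remainder looks like in practice.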

17 By usig the fact that ζ ad η are idepedet ad idetically distributed while evaluatio EU, ad by usig the expressios 5.9 ad 5. ad by keepig the secod order term while evaluatio EU respectively, we obtai EU =, ad EU = D + q + mu + O 3/. Agai, by applyig the similar argumets for V ad V we obtai EV =, 4 q ad EV = q 4 D + mu + O 3/. Now, usig the results o the higher order momets of the iverse of Wishart distributio i Appedix A i the Kubokawa et al. 3 we evaluate EV as follows. EV = E δ η + δ Iu η = 4 E δ η η δ + δ k η k η k δ k 4 = 4 { m + + m } k= q 7 D + O 3/. Further, give that the expectatio of the product terms are zeros we ow obtai EU as follows EU = E δ Γ Ξ η + δ Γ Ξ ζ + ζ Γ Ξ η = Eδ Γ Ξ ηη Γ Ξ δ + Eδ Γ Ξ ζζ Γ Ξ δ + Eζ Γ Ξ ηη Γ Ξ ζ = + Eδ Γ Ξ Γ Ξ Γ Ξ δ + Eζ Γ Ξ Γ Ξ Γ Ξ ζ = + Eδ δ + δ I u δ + ζ + ζ I u ζ = = Eζ q 3 + E trw ξ ξ E trw + + k= 4 q D q 4 k= E trw + E trw ξ kξ k + mu + O 3/. 6

18 The above results are derived usig the idepedece of η ad η, idepedece of η ad ζ, as well as Eζ =. Fially, we express EU V as follows EU V = E = = E E = E = δ δ δ δ η + δ Iu η η δ + E δ Iu δ + E δ Iu η δ η + δ Iu η η η Iu δ Iu Iu δ δ + E δ Iu δ 4 + m q 5 D 4 + m + q 5 = + m 4 q 5 D + O 3/. Now, by summarizig the above results we obtai the followig theorem. k= D k + O 3/ Theorem Uder the coditios A ad A the asymptotic approximatio of EPMC of Cy; µ i, is give by E T E C = Φγ + φγ L D. 6. Relatio to existig results, To compare our approximatio with the existig results we apply the asymptotic approximatios of the type cosidered i Wyma et al. 99 ad Fujikoshi et al. to the BCS classifier. As we are mostly iterested i the asymptotic effect of u we focus o C = Cy ; µ i,. We recall that by the boudig coditio A, we assume that Dk = O/ for all k =,..., u ad observe that with u, the distributio of C as a sum of growig umber of idepedet ad idetically distributed radom variables coverges to the ormal distributio, so that EPMC ca be approximated as E T E C Φ E U E V /, where U = δ k= k y k µ k Dk ad V = k= k= δ k δ k. 6.6 Now, by evaluatig the momets i 6.6 ad by derivig the represetatio of E T E C, we obtai E T E C Φ Du = Φ m D mu D u u +, 6.7 4mu m + D u + m 7

where we set $n_1 = n_2$ and ignore the terms of order $1/n$ and $m/n$ in order to single out the effects of $m$ and $u$. The second term in the denominator of (6.27) reflects the increase in the misclassification rate due to estimation of $\Delta_1$: for fixed $u$, this is the effect of the block size $m$ as $n$ grows. However, the effect of $u$ is much more pronounced, since the corresponding term in (6.27), which involves $4mu$ and $D_u^2$, grows with $u$, thereby increasing $E_T(E_C)$. However, the finite sample size behavior of the approximation is much more subtle because it depends on the terms of order $1/n$ and $m/n$.

7 Simulation Studies

We now examine the performance accuracy of the linear classifiers (3.5) with the help of simulation studies. Populations $\Pi_1$ and $\Pi_2$ are represented by normal distributions with the common BCS covariance matrix $\Gamma$. The matrix parameters $\Sigma_0$ and $\Sigma_1$ in $\Gamma$ are chosen as the $m \times m$ identity matrix and the $m \times m$ zero matrix respectively; and $\Gamma^{-1/2}\nu^{(1)} = (mu)^{-1/2}(D, D, \ldots, D)'$ for $D^2 = 1, 4$ and $\nu^{(2)} = (0, 0, \ldots, 0)'$. The transformed means $\mu^{(1)}$ and $\mu^{(2)}$ and the transformed variance $\Gamma_\Xi$ are obtained by the orthogonal transformation $\Xi$, with the $u \times u$ dimensional Helmert matrix $H$ as described in Section 3.1. By setting $D^2 = 1$ and $D^2 = 4$, we consider two types of classification complexity for two-level data, for which the OPMC is $E_O \approx 0.31$ and $E_O \approx 0.159$ respectively. To see the effect of the repeated measurements over time $u$, the values of $u$ are chosen as 3, 5 and 8. Similarly, to see the effect of the number of variables $m$, it is chosen as, 5, 8 and. The total sample size $n$ is chosen as, 4 and 8, with different combinations of the pairs of sample sizes $(n_1, n_2)$, to encompass both balanced and unbalanced cases. Even though we have chosen $\Sigma_0$ and $\Sigma_1$ as the $m \times m$ identity matrix and the $m \times m$ zero matrix, the calculated values of $\Phi(\gamma_0)$, the limiting behavior of EPMC, do not depend on the choices of $\Sigma_0$ and $\Sigma_1$.

Simulation results are given in Tables 1, 2 and 3 for $n_1 = n_2$, $n_1 > n_2$ and $n_1 < n_2$ respectively. The limiting values of EPMC, $\Phi(\gamma_0)$, are given in these tables in the first row and the approximate EPMC values are given in the second row, in both italics and parentheses, for each combination of $D$, $m$, $u$ and $(n_1, n_2)$.
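The simulation design can be reproduced in a few lines. The sketch below assumes that the two complexity levels are $D^2 = 1$ and $D^2 = 4$ (the reading under which the larger level gives an OPMC of about 0.159) and uses $E_O = \Phi(-D/2)$ from Section 4; the function names and the choice $m = 3$, $u = 5$ are hypothetical. With $\Sigma_0 = I_m$ and $\Sigma_1 = 0$ the covariance $\Gamma$ is the identity, so $\Gamma^{1/2}$ drops out of the definition of $\nu^{(1)}$.

```python
import numpy as np
from math import erf, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def opmc(d2):
    """Optimal PMC, E_O = Phi(-D/2), for squared Mahalanobis distance d2."""
    return norm_cdf(-sqrt(d2) / 2.0)


def design_means(m, u, d2):
    """Means of the design when Gamma = I_{mu}: nu1 = (mu)^(-1/2) (D, ..., D)', nu2 = 0."""
    nu1 = np.full(m * u, sqrt(d2 / (m * u)))
    nu2 = np.zeros(m * u)
    return nu1, nu2


def mahalanobis_sq(nu1, nu2, gamma):
    diff = nu1 - nu2
    return float(diff @ np.linalg.solve(gamma, diff))


for d2 in (1.0, 4.0):                          # assumed complexity levels
    nu1, nu2 = design_means(3, 5, d2)          # hypothetical m = 3, u = 5
    check = mahalanobis_sq(nu1, nu2, np.eye(15))
    print(f"D^2 = {d2}: recovered D^2 = {check:.6f}, OPMC = {opmc(d2):.4f}")
```

Because an orthogonal transformation leaves the Mahalanobis distance unchanged, the same $D^2$ holds for the transformed means $\mu^{(1)}$ and $\mu^{(2)}$ with $\Gamma_\Xi$.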
Table 4 presents the values of $\Phi(\gamma_0)$ for different $D$, $m$ and $(n_1, n_2)$, but for fixed $u = 1$. As mentioned in the various remarks in the previous sections, our method reduces to the method of Kubokawa et al. (2013) for $u = 1$, and indeed we get exactly identical values of $\Phi(\gamma_0)$ up to the first order term as Kubokawa et al. (2013) for different values of $D$, $m$ and $(n_1, n_2)$. Nevertheless, for comparison purposes with our new extended method we also calculate the limiting values of EPMC up to the second order term for $u = 1$. In Tables 1, 2 and 3 we see that both $\Phi(\gamma_0)$ and the approximate EPMC values increase with $m$ and $u$

for each fixed $D$ (this result is supported by (6.27)) and $(n_1, n_2)$. However, both $\Phi(\gamma_0)$ and the approximate EPMC values decrease with $n$ in both balanced and unbalanced cases for each fixed $D$, $m$ and $u$. This is naturally expected, as increasing sample size should improve performance accuracy. It is very important to point out that with our technique both the limiting value $\Phi(\gamma_0)$ and the approximate EPMC values work well even for small $n$; however, the method of Kubokawa et al. (2013), not being designed for the small sample case, fails. Furthermore, for those sample sizes where Kubokawa et al. (2013) works, our proposed method for two-level data outperforms their results. To demonstrate the achieved gain consider, for example, the two following set-ups: D =, m =, u = 5, $(n_1, n_2)$ = 5, 5, and D =, m = 5, u =, $(n_1, n_2)$ = 5, 5. Our method gives $\Phi(\gamma_0)$ =.37 and $\Phi(\gamma_0)$ =.35 respectively, whereas the corresponding value in Kubokawa et al. (2013) is.338. For additional comparison of the performance pattern of the two methods consider the case when D = ; with m =, u = 5, $(n_1, n_2)$ = 3, 5, in Table 3, and with m = 5, $(n_1, n_2)$ = 3, 5 in Table 4, which result in.387 and.49 respectively. Observe that both limiting values are calculated using the second order term, thereby showing the gain of taking into account the covariance structure underlying the data.

Another important observation here is that with fixed total dimensionality, a larger number of replicates $u$ results in improved performance accuracy. For example, given that mu =, we consider the two cases, m =, u = 5, and m = 5 and u = ; which result in.37 and.35 respectively. See Table 3, for $(n_1, n_2)$ = 5, 5 and D =. This is in accordance with the theory of repeated measurements, where increasing the number of replicates improves precision. Moreover, we observe another very important fact: when $n_1 > n_2$, the limiting values of EPMC, $\Phi(\gamma_0)$, are less than their counterparts when $n_1 < n_2$ and $n_1 = n_2$. Thus, in practical applications it is suggested that when classifying a patient with a rare disease as sick or healthy, one needs to choose the diseased population as $\Pi_1$.
It is also interesting to observe the effect of increasing sample size for both choices of the classification complexity, $D^2 = 1$ and $D^2 = 4$; the limiting values $\Phi(\gamma_0)$ and EPMC become closer to each other, see Tables 1, 2 and 3. Observe also that the difference between the first and second order representation of $\Phi$ is negligible, see Table 4.

8 Conclusions and scope for the future

We have analyzed the performance accuracy of the linear classifier designed for two-level multivariate data where the model is represented by the BCS covariance structure. The crucial advantage of this structure is its ability to capture two types of variations in the data, where the covariance matrix $\Sigma_0$ represents variation of

feature variables within observations, and Σ1 represents variation of feature variables between any two time replicates.

Our main results are as follows. The asymptotic normality of the BCS classifier hinges on the distributional properties of the two matrix parameters of the BCS structure. We derive the pooled unbiased estimators of these matrix parameters and show that their distributions are Wishart. Our high-dimensional asymptotic framework allows the number of replicates u to grow faster than the sample size, so that the total number of variables can be larger than the sample size.

The main results of this paper can be extended to more general settings. In particular, two-level data can have many other covariance structures, e.g., a separable covariance structure that is half structured and half unstructured, both structured, or both unstructured. An extension of the BCS classifier to more than two classes is also possible.

We extend the consideration of Kubokawa et al. (2013) to two-level data and derive the asymptotic approximation of the EPMC in a setting that allows the number of replicates to grow under the weaker constraint m < n, unlike mu < n in the case of Kubokawa et al. (2013). Our results are in line with Kubokawa et al. (2013) and reduce to their model exactly when u = 1. To extend the measure of classification accuracy to the multi-class case, one can apply multiple binary comparisons in the p-population classification problem, as suggested in Wu et al. (2003), use the one-against-all approach applied in Dettling et al. (2005), or consider the distance-based classifier; see Srivastava (2006).

Acknowledgement

Anuradha Roy gratefully acknowledges financial support in part from the Royal Swedish Academy of Sciences, Göran Gustafsson Stiftelse, grant -4, and Tatjana Pavlenko gratefully acknowledges financial support in part from a Swedish Research Council grant. This work was completed while Anuradha Roy was visiting the KTH Royal Institute of Technology, Sweden, in 2013.

References

Anderson, T. W. (2003). An introduction to multivariate statistical analysis. Third Ed. John Wiley & Sons, New Jersey.

Bartlett, M. S. (1951). An inverse matrix adjustment arising in discriminant analysis. Ann. Math. Statist., 22: 107-111.

Coelho, A. C. and Roy, A. (2013). Testing of hypothesis of a block compound symmetric covariance matrix. The University of Texas at San Antonio, College of Business Working Paper Series, WP # MSS

Dettling, M., Gabrielson, E. and Parmigiani, G. (2005). Searching for Differentially Expressed Gene Combinations. Genome Biology, 6, R88.

Fujikoshi, Y., Ulyanov, V. V. and Shimizu, R. (2010). Multivariate statistics: high-dimensional and large-sample approximations. Hoboken, New Jersey: Wiley.

Fujikoshi, Y. and Seo, T. (1998). Asymptotic approximations for EPMC's of the linear and the quadratic discriminant functions when the sample sizes and the dimensions are large. Random Oper. Stochastic Equations, 6.

Kubokawa, T., Hyodo, M. and Srivastava, M. S. (2013). Asymptotic expansion and estimation of EPMC for linear classification rules in high dimension. J. Multivariate Anal., 115.

Leiva, R. (2007). Linear discrimination with equicorrelated training vectors. Journal of Multivariate Analysis, 98.

Leiva, R. and Roy, A. (2011). Linear Discrimination for Multi-level Multivariate Data with Separable Means and Jointly Equicorrelated Covariance Structure. Journal of Statistical Planning and Inference, 141(5).

Leiva, R. and Roy, A. (2012). Linear discrimination for three-level multivariate data with separable additive mean vector and doubly exchangeable covariance structure. Computational Statistics and Data Analysis, 56(6).

Okamoto, M. (1963). An asymptotic expansion for the distribution of the linear discriminant function. Ann. Math. Statist., 34.

Rao, C. R. (1945). Familial correlations or the multivariate generalizations of the intraclass correlation. Current Science, 14.

Rao, C. R. (1953). Discriminant functions for genetic differentiation and selection. Sankhyā, 12.

Ritter, G. and Gallegos, M. T. (2002). Bayesian object identification: variants. Journal of Multivariate Analysis, 81.

Roy, A., Leiva, R., Žežula, I. and Klein, D. (2013). Testing the Equality of Mean Vectors for Paired Doubly Multivariate Observations in Blocked Compound Symmetric Covariance Matrix Setup. Submitted.

Roy, A. and Leiva, R. (2013). Testing the Equality of Mean Vectors for Paired Doubly Multivariate Observations. University of Texas at San Antonio Working Paper Series # 7MSS

Roy, A. and Leiva, R. (2011). Estimating and Testing a Structured Covariance Matrix for Three-level Multivariate Data. Communications in Statistics - Theory and Methods, 40(11).

Roy, A. and Leiva, R. (2007). Discrimination with Jointly Equicorrelated Multi-level Multivariate Data. Advances in Data Analysis and Classification, 1(3).

Srivastava, M. S. (2006). Minimum distance classification rules for high dimensional data. J. Multivariate Anal., 97.

Srivastava, M. S. and Khatri, C. G. (1979). An introduction to multivariate statistics. North-Holland, New York.

Wu, T. F., Lin, C. J. and Weng, R. (2003). Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res., 5.

Wyman, F. J., Young, D. M. and Turner, D. W. (1990). A comparison of asymptotic error rate expectations for the sample linear discriminant functions. Pattern Recognition.

Table 1: Values of Φγ for simulated data for different choices of n1 = n2, m, u, and D. A marker in the table denotes failure of Kubokawa et al. (2013). [Table body not recoverable from the extracted text.]

Table 2: Values of Φγ for simulated data for different choices of n1 > n2, m, u, and D. A marker in the table denotes failure of Kubokawa et al. (2013). [Table body not recoverable from the extracted text.]

Table 3: Values of Φγ for simulated data for different choices of n1 < n2, m, u, and D. A marker in the table denotes failure of Kubokawa et al. (2013). [Table body not recoverable from the extracted text.]

Table 4: Comparison of first and second order values of Φγ in Kubokawa et al. (2013). Columns: D; m; Φγ (second order, first order). [Table body not recoverable from the extracted text.]
