arxiv: v2 [stat.me] 3 Nov 2014

Size: px

Start display at page:

Download "arxiv: v2 [stat.me] 3 Nov 2014"

Alicia Oliver
5 years ago
Views:

1 onarametric Stein-tye Shrinkage Covariance Matrix Estimators in High-Dimensional Settings Anestis Touloumis Cancer Research UK Cambridge Institute University of Cambridge Cambridge CB2 0RE, U.K. arxiv: v2 [stat.me] 3 ov 2014 Abstract Estimating a covariance matrix is an imortant task in alications where the number of variables is larger than the number of observations. In the literature, shrinkage aroaches for estimating a high-dimensional covariance matrix are emloyed to circumvent the limitations of the samle covariance matrix. A new family of nonarametric Stein-tye shrinkage covariance estimators is roosed whose members are written as a convex linear combination of the samle covariance matrix and of a redefined invertible target matrix. Under the Frobenius norm criterion, the otimal shrinkage intensity that defines the best convex linear combination deends on the unobserved covariance matrix and it must be estimated from the data. A simle but effective estimation rocess that roduces nonarametric and consistent estimators of the otimal shrinkage intensity for three oular target matrices is introduced. In simulations, the roosed Stein-tye shrinkage covariance matrix estimator based on a scaled identity matrix aeared to be u to 80% more efficient than existing ones in extreme high-dimensional settings. A colon cancer dataset was analyzed to demonstrate the utility of the roosed estimators. A rule of thumb for adhoc selection among the three commonly used target matrices is recommended. Keywords Covariance matrix, High-dimensional settings, onarametric estimation, Shrinkage estimation 1 Introduction The roblem of estimating large covariance matrices arises frequently in modern alications, such as in genomics, cancer research, clinical trials, signal rocessing, financial mathematics, attern recognition and comutational convex geometry. Formally, the goal is to estimate the covariance matrix Σ based on a samle of indeendent and identically distributed i.i.d) -variate random vectors X 1,..., X with mean vector µ in the small, large aradigm, that is when is a lot smaller comared to. It is a well-known fact that the samle covariance matrix S = 1 1 X i X)X i X) T, i=1 where X = i=1 X i/ is the samle mean vector, is not erforming satisfactory in high-dimensional settings. For examle, S is singular even when Σ is a strictly ositive definite matrix. Recent research in estimating high-dimensional covariance matrices includes banding, taering, enalization and shrinkage methods. We focus on the Steinian shrinkage method Stein, 1956) as adoted by Ledoit and Wolf 2004) because it leads to covariance matrix estimators that are: i) non-singular ii) well-conditioned, iii) invariant to ermutations of the order of the variables, iv) consistent to deartures from a multivariate normal model, v) not necessarily sarse, vi) exressed in closed form and vii) comutationally chea regardless of. Ledoit and Wolf 2004) roosed a Stein-tye covariance matrix estimator for Σ based on S = 1 λ)s + λνi, 1) 1

2 where I is the identity matrix, and where λ and ν minimize the risk function E [ S Σ 2 ] F, that is λ = E [ S Σ 2 ] F E [ ] S νi 2 F and ν = trσ). The otimal shrinkage intensity arameter λ in 1) suggests how much we must shrink the eigenvalues of the samle covariance matrix S towards the eigenvalues of the target matrix νi. For examle, λ = 0 imlies no contribution of νi to S, while λ = 1 imlies no contribution of S to S. Intermediate values for λ reveal the simultaneous contribution of S and νi to S. Desite the attractive interretation, S is not a covariance matrix estimator because ν and λ deend on the unobservable covariance matrix Σ. For this reason, Ledoit and Wolf 2004) roosed to lug-in nonarametric -consistent estimators for ν and λ in 1) and use the resulting matrix as a shrinkage covariance matrix estimator for Σ. Although ν seems to be adequately estimated by ˆν = trs)/, we noticed via simulations that the estimator of λ roosed by Ledoit and Wolf 2004) was biased in extreme high-dimensional settings and when Σ = I. This is counter-intuitive because λ = 1 and the lug-in estimator of S is exected to be as close as ossible to the target matrix νi. In addition, this observation underlines the imortance of choosing a target matrix that aroximates well the true underlying deendence structure. To this direction, Fisher and Sun 2011) roosed Stein-tye shrinkage covariance matrix estimators for alternative target matrices. However, they are no longer nonarametric as their construction was based on a multivariate normal model assumtion. Motivated by the above, we imrove estimation of the otimal shrinkage intensity by roviding a consistent estimator of λ in high-dimensional settings. To construct the estimator of λ we follow three simle stes: i) exand the exectations in the numerator and denominator of λ assuming a multivariate normal model, ii) rove that this ratio, say λ, is asymtotically equivalent to λ, and iii) relace each unknown arameter in λ with unbiased and consistent estimators constructed using U-statistics. The last ste is essential in our roosal so as to ensure consistent and nonarametric estimation of λ. Further, we relax the normality assumtion in Fisher and Sun 2011) for target matrices other than νi in 1) and we illustrate how to estimate consistently the corresonding otimal shrinkage intensities in high-dimensional settings. In other words, we roose a new nonarametric family of Stein-tye shrinkage estimators suitable for high-dimensional settings that reserve the attractive roerties mentioned in the first aragrah and can accommodate arbitrary target matrices. The rest of this aer is organized as follows. In Section 2, we resent the working framework that allows us to manage the high-dimensional setting. Section 3 contains the main results where we derive consistent and nonarametric estimators for the otimal shrinkage intensity of three different target matrices. We evaluate the erformance of the roosed covariance matrix estimators via simulations in Section 4. In Section 5, we illustrate the use of the roosed estimators in a colon cancer study and we recommend a rule of thumb for selecting the target matrix. In Section 6, we summarize our findings and discuss future research. The technical details can be found in the aendix. Throughout the aer, we use A 2 F = trat A)/ to denote the scaled Frobenius norm of A, tra) to denote the trace of the matrix A, D A to denote the diagonal matrix with elements the diagonal elements of A, and A B to denote the Hadamard roduct of the matrices A and B, i.e., the matrix whose a, b)-th element is the roduct of the corresonding elements of A and B. In the above, it is imlicit that A and B are matrices. 2 Framework for High-Dimensional Settings Let X 1,..., X be a samle of i.i.d. -variate random vectors from the nonarametric model X i = Σ 1/2 Z i + µ, 2) where µ = E[X i ] is the -variate mean vector, Σ = cov[x i ] = Σ 1/2 Σ 1/2 is the covariance matrix, and Z 1,..., Z is a collection of i.i.d. -variate random vectors. Instead of distributional assumtions, 2

3 moments restrictions are imosed on the random variables in Z i. In articular, let Z ia be the a-th random variable in Z i and suose that E[Z ia ] = 0, E[Zia 2 ] = 1, E[Z4 ia ] = 3 + B with 2 B < and for any nonnegative integers l 1,..., l 4 such that 4 ν=1 l ν 4 E[Z l 1 ia1 Z l 2 ia2 Z l 3 ia3 Z l 4 ia4 ] = E[Z l 1 ia1 ]E[Z l 2 ia2 ]E[Z l 2 ia3 ]E[Z l 4 ia4 ], 3) where the indexes a 1,..., a 4 are distinct. The nonarametric model 2) includes the -variate normal distribution µ, Σ) as a secial case obtained if Z ia are i.i.d. 0, 1) random variables. Since B = 0 under a multivariate normal model, B can be interreted as a measure of dearture of the fourth moment of Z ia to that of a 0, 1) random variable. The assumtion of common fourth moments is made for notational ease and the results of this aer remain valid even if E[Z 4 ia ] = 3 + B a for finite B a a = 1,..., ). Model 2) assumes that X i is a linear combination of Z i, a random vector that contains standardized white noise variables. Unlike the usual definition of white noise, model 2) makes no distributional assumtions for Z i and it allows deendence atterns given that the seudoindeendence condition 3) holds. Therefore, the working framework can cover situations in which the white noise mechanism does not roduce indeendent and/or identically distributed random variables. To handle the high-dimensional setting, we restrict the dimension of Σ by assuming that as, = ), trσ4 ) tr 2 Σ 2 ) t 1 and trσ2 ) tr 2 Σ) t 2, 4) with 0 t 2 t 1 1. This flexible assumtion does not secify the limiting behavior of with resect to, thus including the case where / is bounded Ledoit and Wolf, 2004). At the same time, it does not seriously restrict the class of covariance matrices for Σ that satisfy assumtion 4). For examle, members of this class are covariance matrices whose eigenvalues are bounded away from 0 and Chen et al., 2010), banded first-order auto-regressive covariance matrices such that Σ = {σ a σ b ρ a b I a b k)} 1 a,b with 1 ρ 1, σ a and σ b bounded ositive constants and 1 k Chen et al., 2010), covariance matrices that have a few divergent eigenvalues as long as they diverge slowly Chen and Qin, 2010), and covariance matrices with a comound symmetry correlation attern, that is Σ = {σ a σ b ρ} 1 a,b. Model 2) and assumtion 4) constitute an attractive working framework to handle the small, large aradigm. A similar but stricter working framework was considered by Chen et al. 2010) in the context of hyothesis testing for Σ. By contrast, we avoid making assumtions about moments of fifth or higher order and we allow more otions for Σ. 3 Main Results Under the nonarametric model 2), the exectations in the numerator and denominator of the otimal shrinkage intensity λ in 1) can be exlicitly calculated to obtain that λ = E [ S Σ 2 ] F E [ trσ 2 ) + tr ] 2 Σ) + BtrD 2 Σ S νi 2 = ) F trσ 2 ) + +1 tr 2 Σ) + BtrD 2 Σ ). Since trd 2 A ) = tra A) tra2 ) tr 2 A) for any ositive definite matrix A, the contribution of BtrD 2 Σ ) to λ under assumtion 4) is negligible when comared to that of trσ2 ) or tr 2 Σ). Ignore BtrD 2 Σ ) in λ and define λ trσ 2 ) + tr 2 Σ) = trσ 2 ) + +1 tr 2 Σ). 3

4 This is the otimal shrinkage intensity of the multivariate normal model or, more generally, λ = λ when B = 0 in the nonarametric model 2). Under assumtion 4), it follows that ) 1) trσ 2 ) 1 λ λ tr2 Σ) = ) trσ 2 ) 1 tr2 Σ) + +1 tr2 Σ) trσ 2 ) 1 tr2 Σ) trσ 2 ) tr 2 Σ) 1 B trd2 Σ ) B trd ) 2 Σ ) tr 2 Σ) + +1 tr2 Σ) + BtrD 2 Σ ) ) B trd2 Σ ) tr 2 Σ) where k denotes the absolute value of the real number k. Therefore, the otimal shrinkage intensity obtained under normality is asymtotically equivalent to the otimal shrinkage intensity with resect to model 2) as long as the trace ratios restrictions in 4) hold. This means that it suffices to construct a nonarametric and consistent estimator of λ to estimate λ. To accomlish this, relace the unknown arameters trσ) and trσ 2 ) in λ with the unbiased estimators 0, Y 1 = U 1 U 4 = 1 i=1 X T i X i 1 P 2 X T j X i i j and Y 2 = U 2 2U 5 + U 6 = 1 P2 X T i X j ) P3 i j i j k X T i X j X T i X k + 1 P 4 i j k l X i X T j X k X T l resectively, where Pt s = s!/s t)! and denotes summation over mutually distinct indices. ote that U 1 and U 2 are the unbiased estimators of trσ) and trσ 2 ) resectively, when the data are centered around the mean vector i.e., µ = 0), while the remaining terms U 4,U 5 and U 6 ) are U- statistics of second, third and fourth order that ensure the unbiasedness of Y 1 and Y 2 when µ 0. In B, we argue that Y 1 and Y 2 are ratio-consistent estimators to trσ) and trσ 2 ) resectively. Here, it should be noted a statistic ˆθ is called a ratio-consistent estimator to θ if ˆθ/θ converges in robability to one. Therefore, it follows from the continuous maing theorem that ˆλ = Y 2 + Y 2 1 Y Y 2 1 is a consistent estimator of λ. The roosed Stein-tye shrinkage estimator for Σ, Ŝ = 1 ˆλ)S + ˆλˆνI, is obtained by lugging-in ˆν = Y 1 / and ˆλ in 1). 3.1 Alternative target matrices ext, we consider target matrices other than νi in 1). This extension is motivated by situations where λ 1 and νi is not a good aroximation of Σ. In this case, Ŝ remains a well-defined and non-singular covariance matrix estimator but it fails to reflect the underlying deendence structure. Let T be a well-conditioned and non-singular target matrix, and define the matrix S T = 1 λ T )S + λ T T. 5) 4

5 Simle algebraic maniulation shows that the otimal shrinkage intensity λ T = E [ S Σ 2 ] F + E [tr{s Σ)Σ T)}] / E [ ] S T 2 6) F minimizes the exected risk function E[ S T Σ 2 F ]. A closed form solution for λ T can be derived by calculating the exectations in 6) with resect to model 2). The key idea of our roosal is to simlify the estimation rocess for λ T by identifying terms in the numerator and denominator of λ T that can be safely ignored under assumtion 4). One aroach is to examine whether the otimal shrinkage intensity under the multivariate normal model assumtion is asymtotically equivalent to λ T. Whenever this is the case, we can set B = 0 in λ T and relace the remaining arameters with unbiased and ratio-consistent estimators to obtain ˆλ T. The roosed Stein-tye shrinkage covariance matrix estimator is Ŝ T = 1 ˆλ T )S + ˆλ T T. ote that if we set T = νi then λ T = λ and thus S T = S. We rovide the estimator of λ T for two target matrices: i) the identity matrix T = I, and ii) the diagonal matrix T = D S whose diagonal elements are the samle variances. To guarantee the consistency of λ T, we suose that t 1 = t 2 = 0 in 4). This assumtion does not heavily affects the class of Σ under consideration. In fact, all the deendence atterns mentioned in Section 2 satisfy this stronger version of assumtion 4) excet the comound symmetry correlation matrix. We believe that this is a small rice to ay if we are willing to increase the range of the target matrices in 1). First, when T = I in 5) the otimal shrinkage intensity in 6) becomes λ I = trσ 2 ) + tr 2 Σ) + BtrD 2 Σ ) trσ 2 ) + tr 2 Σ) + 1)tr [Σ I ) 2 ] + BtrD 2 Σ ). As before, we can ignore the terms BtrD 2 Σ ) in λ I and rove that ˆλ I = Y 2 + Y 2 1 Y 2 + Y 2 1 1)2Y 1 ) is a consistent estimator of λ I. When T = D S, we can use the results in A and rove that the otimal shrinkage intensity in 6) is λ D = E [ S Σ 2 ] F + E [tr{s Σ)Σ DS )}] / E [ ] S D S 2 F E[trS 2 )] trσ 2 ) ) + E[trΣD S )] E[trSD S )]) = where Σ 1/2 ab = E[trS 2 )] E[trD 2 S )] [ trσ 2 ) + tr 2 Σ) 2trD 2 Σ ) + B trd 2 Σ ) a=1 b=1 trσ 2 ) + tr 2 Σ) + 1)trD 2 Σ ) + B [ trd 2 Σ ) a=1 b=1 Σ 1/2 ab ) 4 ] Σ 1/2 ab ) 4 ], [ denotes the a, b)-th element of Σ 1/2. It can be shown that B trd 2 Σ ) a=1 b=1 has negligible contribution to λ D and that Σ 1/2 ab ) 4 ] Y 3 = U 3 2U 7 + U 8 = 1 P2 trx i X T i X j X T j ) 2 1 P3 i j i j k trx i X T i X j X T k ) + 1 P 4 i j k l trx i X T j X k X T l ) is an unbiased and ratio-consistent estimator to trd 2 Σ ). The construction of Y 3 is closely related to that of Y 1 and Y 2. To see this, note that Y 3 is a linear combination of three U-statistics, U 3, 5

6 the unbiased estimator of trd 2 Σ ) when µ = 0, and U 7 and U 8, which make the bias of Y 3 zero when µ 0. Hence, a consistent estimator of λ D is 3.2 Remarks ˆλ D = Y 2 + Y1 2 2Y 3 Y 2 + Y )Y. 3 There is no need to account for the mean vector when the random vectors are centered. In this case, the roosed shrinkage covariance matrix estimators can be obtained by relacing 1 with in the formula for ˆλ, ˆλ I or ˆλ D, the samle covariance matrix S with i=1 X ix T i / and the statistics Y 1, Y 2 and Y 3 with U 1, U 2 and U 3 resectively. The last modification is due to the fact that U 1, U 2 and U 3 are unbiased and ratio-consistent estimators to the targeted arameters when µ = 0. Fisher and Sun 2011) derived Stein-tye shrinkage covariance matrix estimators for the three target matrices considered herein under a multivariate normal model. We emhasize that our estimators differ in three imortant asects. First, the consistency of the roosed estimators for the otimal shrinkage intensities λ, λ I or λ D is not sensitive to deartures of the normality assumtion. Second, trσ 2 ) and trd Σ ) are estimated using U-statistics and are not based on the samle covariance matrix S as in Fisher and Sun 2011). Consequently, we avoid terms such as X T i X i) 2 or X 4 ia, which allows us to control their asymtotic variance. Third, the class of covariance matrices under consideration in Fisher and Sun 2011) is different. They require the first four arithmetic means of the eigenvalues of Σ to converge while we lace trace ratios restrictions on Σ. 3.3 Software imlementation The R R Core Team, 2014) language ackage Shrinkcovmat imlements the roosed Stein-tye shrinkage covariance estimators and it is available at htt://cran.r-roject.org/web/ackages/shrinkcovmat. The core functions shinkcovmat.equal, shinkcovmat.identity and shinkcovmat.unequal rovide the roosed shrinkage covariance matrix estimators when T = νi, T = I and T = D S resectively. The statistics Y 1, Y 2 and Y 3 are calculated using the comutationally efficient formulas given in C. To modify the shrinkage estimators when µ = 0, one should set the argument centered=true in the core functions. 4 Simulations We carried out a simulation study to investigate the erformance of the roosed Stein-tye covariance matrix estimators. The -variate random vectors X 1,...,X were generated according to model 2), where we emloyed the following three distributional scenarios regarding Z 1,..., Z : 1. A normality scenario, in which Z ia i.i.d 0, 1). 2. A gamma scenario, in which Z ia = Z ia 8)/4, Z ia i.i.d Gamma4, 0.5) and thus B = A mixture of Scenarios 1 and 2, in which the first /2 elements of Z i are distributed according to Scenario 1 and the remaining elements according to Scenario 2. Scenarios 2 and 3 were used to emirically verify the nonarametric nature of the roosed methodology. To mimic high-dimensional settings, we let range from 10 to 100 with increments of 10 and we let = 100, 500, 1000, 1500 and The roosed family of covariance estimators was evaluated using Ŝ, the roosed shrinkage covariance matrix estimator when T = νi, which was then comared to Ŝ LW and Ŝ F S, the corresonding shrinkage covariance matrix estimator roosed by Ledoit and Wolf 2004) and Fisher and Sun 2011) resectively. As in Ledoit and Wolf 2004), we assumed that µ = 0 and we made the necessary adjustments to Ŝ and Ŝ F S. Since Ŝ F S was constructed under a multivariate normal model assumtion, its erformance was evaluated only in samling schemes that involved Scenario 1. We excluded the shrinkage estimator of Schäfer and Strimmer 2005) in 6

7 Table 1: Estimation of the otimal shrinkage intensity λ = 1 under Scenario 1 when Σ = I. ˆλ ˆλLW ˆλF S Mean S.E. Mean S.E. Mean S.E our simulation studies because they do not guarantee consistent estimation of λ in high-dimensional settings. Given Σ,, and the distributional scenario, we draw 1000 relicates based on which we calculated the simulated ercentage relative imrovement in average loss SPRIAL) of Ŝ and of Ŝ F S for estimating Σ. Formally, the SPRIAL criterion of ˆΣ was defined as SPRIAL ˆΣ) = 1000 b=1 Ŝ LW,b Σ 2 F 1000 b=1 ˆΣ b Σ 2 F 1000 b=1 Ŝ LW,b Σ 2 F 100%, where Ŝ LW,b denotes the estimator of Ledoit and Wolf 2004) at relicate b b = 1,..., 1000) and ˆΣ b denotes the corresonding estimator of the cometing estimation rocess that generates the covariance matrix estimator ˆΣ. By definition, SPRIALΣ) = 100% and SPRIALŜ LW ) = 0%. Therefore, ositive negative) values of SPRIAL ˆΣ) imly that ˆΣ is a more less) efficient covariance matrix estimator than Ŝ LW while values around zero imly that the two estimators were equally efficient. ote that we treated Ŝ LW as the baseline estimator because Ledoit and Wolf 2004) have already established its efficiency over the samle covariance matrix S. To exlore the situation in which the target matrix equals the true covariance matrix, we set Σ = I. In this case, accurate estimation of λ is crucial due to the fact that λ = 1 regardless of and. Table 1 contains the simulation results under Scenario 1 for the mean of the estimated otimal shrinkage intensities based on the roosed method ˆλ), the aroach of Ledoit and Wolf 2004) ˆλ LW ) and the aroach of Fisher and Sun 2011) ˆλ F S ) when = 10, 50, 100 and = 100, 1000, As required, ˆλ, ˆλLW and ˆλ F S were all restricted to lie on the unit interval. Comared to ˆλ LW, ˆλ aeared to be more accurate in estimating λ as it was substantially less biased and with slightly smaller standard error for all and. Although the bias of ˆλ LW decreased as increased, ˆλ LW seemed to be biased downwards even when = 100. These trends also occurred for Scenarios 2 and 3, and for this reason the results are omitted. As exected, no significant difference was noticed between ˆλ and ˆλ F S under Scenario 1. Figure 1 dislays SPRIALŜ ) under Scenario 2 - similar atterns were observed for the other two distributional scenarios. Clearly, Ŝ was more effective than Ŝ LW for % 97.81%) and for = 100 and % 61.16%). We should mention that Ŝ and Ŝ F S were equally efficient, meaning that SPRIALŜ ) and SPRIALŜ F S ) took similar values. To investigate the erformance of Ŝ when Σ deviates slightly from the target matrix, we emloyed a tridiagonal correlation matrix for Σ in which the non-zero off-diagonal elements were all equal to 0.1. Figure 2 suggests that Ŝ outerformed Ŝ LW in extreme high-dimensional settings 50) under Scenario 3. The efficiency gains in using Ŝ instead of Ŝ LW decreased at a much faster rate than before as increased. ote that similar trends were observed under Scenarios 1 and 2 results not shown). ext, we considered deendence structures such that νi is a rather oor aroximation of Σ. First, we defined Σ as a first order autoregressive correlation matrix with a, b)-th element Σ ab = 0.5 a b. Although the results in Table 2 imly that ˆλ is a more accurate estimator of λ than ˆλ LW, 7

8 SPRIAL Figure 1: SPRIALŜ ) under Scenario 2 when Σ = I and = 100 symbols), symbols), 1000 symbols), 1500 x symbols) or 2500 symbols). SPRIAL Figure 2: SPRIALŜ ) under Scenario 3 when Σ is a tridiagonal correlation matrix with a, b)-th element Σ ab = 0.1I a b ) and = 100 symbols), symbols), 1000 symbols), 1500 x symbols) or 2500 symbols). Figure 3 suggests that Ŝ and Ŝ LW were almost equally efficient as soon as = 30 and regardless of the dimensionality. We reached the same conclusions when we let Σ to be either a ositive definite matrix whose eigenvalues were drawn from the uniform distribution U0.5, 10) or a block diagonal covariance matrix such that the dimension of each block matrix is /4 /4 and the eigenvalues of these 4 block matrices were drawn from U0.5, 5), U5, 10), U10, 20) and U0.5, 100) resectively. In articular, ˆλ seemed to outerformed ˆλ LW for all configurations of,, Σ) while SPRIALŜ ) was close to zero unless 30. Hence, if the target matrix νi fails to describe adequately the deendence structure, we exect Ŝ to be more efficient than Ŝ LW only for small samle sizes. Further, we adoted a comound symmetry correlation form for Σ with correlation arameter ρ = 0.5. This is the only configuration of Σ where t 1 0 and t 2 0 in assumtion 4). The estimator ˆλ LW aeared to be less biased but with larger standard error than ˆλ, while the SPRIALŜ ) was always close to zero. The erformance of Ŝ and Ŝ LW was comarable across the related samling schemes. We also evaluated the erformance of the inverse of the cometing Stein-tye shrinkage covariance matrix estimators using a modified version of the SPRIAL criterion, that is we relaced Σ, Σ and Ŝ LW with their corresonding inverses in SPRIAL ˆΣ). The inverse of Ŝ was more efficient that of Ŝ LW when Σ was equal to the identity matrix or to the tridiagonal correlation matrix, and quite surrisingly when Σ satisfied the comound symmetry correlation attern and 50, as shown in Figure 4. In all other samling schemes, the inverses of Ŝ and Ŝ LW were comarable. 8

9 Table 2: Estimation of λ under Scenario 1 when Σ satisfies a first-order auto-regressive correlation form with correlation arameter ρ = 0.5. ˆλ ˆλLW ˆλF S λ Mean S.E. Mean S.E. Mean S.E SPRIAL Figure 3: SPRIALŜ ) under Scenario 2 when Σ satisfies a first-order autoregressive form with correlation arameter ρ = 0.5 and = 100 symbols) or 2500 symbols). Based on our simulations, SPRIALŜ ) was an increasing function of and a decreasing function of while keeing the remaining arameters fixed. This indicates that, comared to Ŝ LW, Ŝ is an imroved estimator of Σ in extreme high-dimensional settings. The nonarametric nature of Ŝ was emirically verified because both the SPRIAL criterion and the bias of ˆλ seemed to remain constant across the three distributional scenarios. In addition, we find reassuring that the simulation results for ˆλ and ˆλ F S and for Ŝ and Ŝ F S were almost identical under a multivariate normal distribution. To this end, we also believe that the above trends together with the fact that we have not encountered a situation in which Ŝ was erforming significantly worse than Ŝ LW are favoring the consistency of the roosed covariance matrix estimators. 5 Emirical Study: Colon Cancer Study Alon et al. 1999) described a colon cancer study where exression levels for 2000 genes were measured on 40 normal and on 22 colon tumor tissues. The dataset is available at htt://genomicsubs.rinceton.edu/oncology/affydata. As in Fisher and Sun 2011), we aly a logarithmic base 10) transformation to the exression levels and sort the genes based on the between-grou to within-grou sum of squares BW) selection criterion Dudoit et al., 2002). In the literature, it has been suggested to estimate the covariance matrix of the genes using a subset of the to genes see, e.g., Fisher and Sun, 2011). With this in mind, we lan to estimate the covariance matrix of the normal and of the colon cancer grou for subsets of the to genes, where = 250, 500, 750, 1000, 1250, 1500, 1750 and The Quantile-Quantile lots in Figure 5 raise concerns regarding the assumtion that the 4 to 9

10 SPRIAL Figure 4: SPRIAL criterion for the inverse of Ŝ under Scenario 3 when Σ satisfies a comound symmetry correlation matrix and = 100 symbols) or 2500 symbols) Figure 5: Quantile-Quantile lots for the exression levels of the 4 to genes according to the BW selection criterion. The to anel corresonds to the normal grou and the bottom anel to the colon cancer grou. genes are marginally normally distributed. Since similar atterns occur for the remaining genes, we conclude that a multivariate normal model is not likely to hold and nonarametric covariance matrix estimation is required. For this urose, we calculate Ŝ, Ŝ I and Ŝ D for both grous and for all values of. The otimal shrinkage intensity λ T in 5) is informative for the suitability of the selected target matrix T because it reveals its contribution to S T. If ˆλ, ˆλI and ˆλ D differ significantly, then it is meaningful to choose the target matrix with the largest estimated otimal shrinkage intensity. Otherwise, the selection of the target matrix can be based on ˆν and r, the range of the samle variances. In articular, we suggest to emloy I when ˆν is close to 1.00, D S when r is large say more than one unit so as to account for the samling variability), and νi when neither of these seems lausible. Table 3 dislays the estimated otimal shrinkage intensities for the roosed family of shrinkage covariance estimators, ˆν and r in both grous. The estimates of λ and λ D are very similar in both grous for all subsets of genes and larger than those of λ I. This suggests that D S and νi are better otions than I as a target matrix. Since r aears to be relatively small and constant across the to genes in both grous, it seems sensible to set T = νi for all. Therefore, we recommend using Ŝ to estimate the covariance matrix of the genes in the normal and in the colon cancer grou and regardless of. 10

11 Table 3: The estimates of λ, λ I, λ D, ν and the range of the samle variances r) in the colon cancer dataset. Grou Statistic ormal ˆλ ˆν ˆλ D r ˆλ I Colon ˆλ ˆν ˆλ D r ˆλ I Discussion We roosed a new family of nonarametric Stein-tye shrinkage estimators for a high-dimensional covariance matrix. This family is based on imroving the nonarametric estimation of the otimal shrinkage intensity for three commonly used target matrices at the exense of imosing mild restrictions for the covariance matrix Σ. In our simulations, the roosed shrinkage covariance matrix estimator Ŝ was more recise than that of Ledoit and Wolf 2004) for estimating Σ, esecially when the number of variables is extremely large comared to the samle size and/or νi is a good aroximation of Σ. Unsurrisingly, the behavior of our estimators and that of Fisher and Sun 2011) were similar as long as an underlying multivariate normal model holds. However, we emhasize that our estimators are more flexible since they are robust to deartures from normality. In addition, we recommended a simle data-driven strategy for selecting the target matrix in 5). The main idea is to comare ˆλ, ˆλ I and ˆλ D and choose the target matrix with the largest estimated otimal shrinkage intensity. If these estimates are similar and ˆν is close to one, then it is sensible to use I as target matrix. Otherwise, νi should be selected when the samle variances are very close and D S when these vary. The R ackage ShrinkCovMat imlements the roosed estimators. Since Ŝ, Ŝ I and Ŝ D are all biased estimators of Σ, their risk functions might be unbounded esecially when none of the corresonding target matrices describes adequately the underlying deendence structure. In these situations, a more suitable target matrix T in 5) must be considered. The corresonding λ T should be estimated by following our guidelines in Section 3.1. If such a target matrix cannot be identified, one should choose among the alternative covariance matrix estimation methods mentioned in the Introduction. In future research, we aim to investigate formal rocedures for selecting the target matrix in 5) and to extend the roosed Steinian shrinkage aroach for estimating correlation matrices. A Useful Results We list six results with resect to model 2) that allows us to derive the formulas for the λ, λ I and λ D : 1. E[trS)] = trσ). 2. E[trS 2 )] = 1 trσ2 ) tr2 Σ) + B 1 trd2 Σ ). 3. E[tr 2 S)] = tr 2 Σ) trσ2 ) + B 1 trd2 Σ ). 4. E[trD S )] = trd Σ ). 11

12 5. E[trD 2 S )] = E[trSD S)] = +1 1 trd2 Σ ) + a, b)-th element of Σ 1/2. B 1 a=1 b=1 Σ 1/2 ab ) 4, where Σ 1/2 ab denotes the 6. E[trΣD S )] = trd 2 Σ ). B Consistency of Y 1, Y 2 and Y 3 ote that E[Y 1 ] = E[trS)] = trσ). Under assumtion 4), derivations in Chen et al. 2010) imly that [ ] Var[Y 1 ] 2 + max{0, B} 2 trσ 2 ) 3 + max{0, B} tr 2 + Σ) 1) tr 2 0. Σ) Thus Y 1 is a ratio-consistent estimator to trσ). Similarly, it can be shown that Y 2 and Y 3 are unbiased estimators to trσ 2 ) and trd 2 Σ ) resectively, and that Var[Y 2]/tr 2 Σ 2 ) 0 and Var[Y 3 ]/tr 2 D 2 Σ ) 0 as. Hence, Y 2 and Y 3 are ratio-consistent estimators of trσ 2 ) and trd 2 Σ ). C Alternative Formulas for Y 1, Y 2 and Y 3 ote that Y 1 = U 1 U 4 = 1 Himeno and Yamada 2014) showed that i=1 X T i X i 1 P 2 X T j X i = trs). i j Y 2 = U 2 2U 5 + U 6 = 1 P2 X T i X j ) P3 + 1 P 4 = i j i j k l X i X T j X k X T l 1 2) 3) i j k X T i X j X T i X k [ 1) 2)trS 2 ) + tr 2 S) Q ] where Q = 1 1 [ Xi X) T X i X) ] 2. i=1 12

13 Also it can be shown that Y 3 = U 3 2U 7 + U 8 = 1 P2 trx i X T i X j X T j ) 2 1 P3 + 1 P 4 = 1 P P P 4 i j i j k l a=1 i j trx i X T j X k X T l ) X 2 iax 2 ja a=1 2 a=1 i=1 1 Xia) 1 2 i=1 j=i+1 i=1 j=i+1 X ia X ja 2 i j k X ia X ja trx i X T i X j X T k ) a=1 i j a=1 i j X 2 iax 2 ja X 3 iax ja Together these results reduce the comutational cost for Y 1, Y 2 and Y 3 from O 4 ) to O 2 ). References U. Alon,. Barkai, D. A. otterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad atterns of gene exression revealed by clustering analysis of tumor and normal colon tissues robed by oligonucleotide arrays. Proceedings of the ational Academy of Sciences of the United States of America, 96: , S.X. Chen and Y.L. Qin. A two-samle test for high-dimensional data with alications to gene-set testing. The Annals of Statistics, 38: , S.X. Chen, L.X. Zhang, and P.S. Zhong. Tests for high-dimensional covariance matrices. Journal of the American Statistical Association, 105: , S. Dudoit, J. Fridlyand, and T. P. Seed. Comarison of discrimination methods for the classification of tumors using gene exression data. Journal of the American Statistical Association, 97:77 87, T.J. Fisher and X. Sun. Imroved Stein-tye shrinkage estimators for the high-dimensional multivariate normal covariance matrix. Comutational Statistics and Data Analysis, 55: , T. Himeno and T. Yamada. Estimations for some functions of covariance matrix in high dimension under non-normality. Journal of Multivariate Analysis, 130:27 44, O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88: , R Core Team. R: A Language and Environment for Statistical Comuting. R Foundation for Statistical Comuting, Vienna, Austria, URL htt:// J. Schäfer and K. Strimmer. A shrinkage aroach to large-scale covariance matrix estimation and imlications for functional genomics. Statistical Alications in Genetics and Molecular Biology, 4: 32, C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symosium on Mathematical Statistics and Probability, 1: ,

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo