Linear Ridge Estimator of High-Dimensional Precision Matrix Using Random Matrix Theory

Size: px

Start display at page:

Download "Linear Ridge Estimator of High-Dimensional Precision Matrix Using Random Matrix Theory"

Mark Wilkinson
5 years ago
Views:

1 CIRJE-F-995 Linear Ridge Estimator of High-Dimensional Precision Matrix Using Random Matrix Theory Tsubasa Ito Graduate School of Economics, The University of Tokyo Tatsuya Kubokawa The University of Tokyo November 205 CIRJE Discussion Paers can be downloaded without charge from: htt:// Discussion Paers are a series of manuscrits in their draft form. They are not intended for circulation or distribution excet as indicated by the author. For that reason Discussion Paers may not be reroduced or distributed without the written consent of the author.

2 Linear Ridge Estimator of High-Dimensional Precision Matrix Using Random Matrix Theory Tsubasa Ito and Tatsuya Kubokawa November 4, 205 Abstract In estimation of the large recision matrix, this aer suggests a new shrinkage estimator, called the linear ridge estimator. This estimator is motivated from a Bayesian asect for a sike and slab rior distribution of the recision matrix, and has a form of convex combination of the ridge estimator and the identity matrix multilied by scalar. The otimal arameters in the linear ridge estimator are derived in terms of minimizing a Frobenius loss function and estimated in closed forms based on the random matrix theory. Finally, the erformance of the linear ridge estimator is numerically investigated and comared with some existing estimators. Key words and hrases: Large-dimensional asymtotics, nonlinear shrinkage, recision matrix, random matrix theory, ridge tye estimator, rotation-equivariant estimators. Introduction The estimation of a large covariance matrix has been actively studied in the literature in recent years. Analysis of high-dimensional data is an imortant and modern toic in genetics in bioscience, ortfolio in financial economics and others. However, the samle covariance matrix is not invertible when the dimension of the variables is larger than the samle size N. When is large and close to N, the inverse of the samle covariance matrix may be ill-conditioned even if N >. To address this roblem, many aroaches have been considered in the literature. Under some model structures such as sarsity or ordering, enalized methods have been roosed and the oulation covariance matrix can be estimated consistently. However, the true model structures are generally unknown and the estimates become inconsistent in the case that the model structure is missecified. In the absence of such a rior information on the structure of the covariance matrix, the shrinkage method is a useful aroach. Since the eigenvalues of the samle covariance matrix diverge more widely than the eigenvalues of the oulation covariance matrix, it is reasonable to shrink the samle eigenvalues in the direction of their center. Linear shrinkage estimators of the covariance matrix have been suggested in Ledoit and Wolf (2004), Srivastava and Kubokawa (2007), Touloumis (205) and others. A nonlinear shrinkage aroach Graduate School of Economics, University of Tokyo, 7-3- Hongo, Bunkyo-ku, Tokyo , JAPAN. tsubasa ito.070@gmail.com Faculty of Economics, University of Tokyo, 7-3- Hongo, Bunkyo-ku, Tokyo , JAPAN. tatsuya@e.u-tokyo.ac.j

3 based on the random matrix theory is one of recent hot issues in statistics, and Ledoit and Wolf (202, 5) develoed some fundamental results and more efficient nonlinear shrinkage estimators. For the related works, see also Wang, Pan, Tong and Zhu (205), Bodnar, Guta and Parolya (205) and the references therein. Although the estimation of the covariance matrix has received more attention than that of the recision matrix, the estimation of the recision is more imortant in multivariate analysis, since for examle, the recision matrix aears in the Fisher linear discriminant analysis, confidence intervals based on the Mahalanobis distance and generalized least squares estimators in multivariate linear regression models. As an aroach, one can use the inverse matrix of estimators of the covariance matrix. However, the inverse of the covariance matrix estimator is not necessarily otimal in the estimation of the recision matrix. Further, two estimation roblems of the covariance and recision matrices are drastically different when > N. Thus, in this aer, we address the roblem of estimating the recision matrix directly. To exlain the roblem more secifically, let Σ be a covariance matrix, and let S be the samle covariance matrix such that E[S ] = Σ. Using the random matrix theory, Ledoit and Wolf (202, 5) derived asymtotic otimal shrinkage coefficients in the general class of rotationequivariant estimators, and roosed the nonlinear shrinkage estimator for the recision matrix. The nonlinear shrinkage estimator erforms well for large N, but the erformance dwindles when N or N/ is small, because the consistent estimators of the eigenvalues of Σ are not stable for small N or N/. Bodnar, et al. (205) suggested the linear shrinkage estimator { Ω linear αs + β I = if N > αs + + β I if N <. This estimator has an aealing form of shrinking S or S + towards βi. However, a drawback is that under their settings the arameters α and β cannot be estimated in the case of > N, without assuming that Σ = σ 2 I for scalar σ 2. Wang, et al. (205) considered the ridge-tye estimator Ω ridge = α(s + γi ), and rovided estimators of α and γ using the random matrix theory. Although this estimator is stable for any and N, it shrinks all the samle eigenvalues toward the same direction, which is not necessarily desirable in the estimation of the recision matrix. In this aer, we suggest a new tye of estimator given by Ω LR = α(s + γi ) + βi, and we call it the linear ridge estimator. This estimator is stable for any and N, and re-shrinks the shrunk samle eigenvalues toward the constant β. By combining the linear and the ridge estimators, the linear ridge estimator can comensate their weakness. Using the random matrix theory, we can derive the limits of the otimal α and β for fixed γ. Based on the limits, we can rovide the estimators of the otimal α and β in closed forms. It is also interesting to oint out that the linear ridge estimator can be motivated from a Bayesian asect for a sike and slab rior distribution of Σ. This motivation is useful when we need to restrict the estimators of α, β and γ. 2

4 The aer is organized as follows: The basic framework of the random matrix theory is described in Section 2. The loss function based on the Frobenius norm and the class of rotationequivariant estimators are given. In Section 3, we rovided the Ledoit-Wolf tye estimator and the suggested linear ridge estimator. The otimal estimators of the arameters in these estimators are derived based on the random matrix theory. The Bayesian motivation of the linear ridge estimator is also given. In Section 4, we investigate the finite-samle erformance of the roosed estimator through simulation and emirical studies. Concluding remarks are given in Section 5 and the technical roofs are given in the Aendix. 2 Basic Framework 2. Preliminary results in the random matrix theory We begin by stating the basic assumtions which are common in estimation of the highdimensional covariance matrix based on the random matrix theory. Throughout the aer, R and C denote the saces of real and comlex numbers, resectively. Also, C + denotes the half-lane of comlex numbers with strictly ositive imaginary art. The real and imaginary arts of z C are denoted by R(z) and I(z), resectively. (A) Let N and (N) denote the samle size and the number of variables resectively. In the large-dimensional asymtotics, goes to infinity, as N does, but it is assumed that the ratio /N converges to a limit y (0, ) (, + ), as N +. The case y = is excluded for technical reason. (A2) The oulation covariance matrix Σ is a non-random -dimensional ositive definite matrix, where the subscrit denotes the dimension of the matrix. Let X = (x,,..., x,n ) T be an N random matrix, where x,,..., x,n are mutually indeendently and identically distributed as E[x,j ] = 0 and Cov (x,j ) = I. It is assumed that there exist fourth moments of all elements of x,j. Let Y = (y,,..., y,n ) T, where y,j = Σ /2 x,j for the squared symmetric root matrix Σ /2 such that Σ = (Σ /2 ) 2. It is assumed that Y is observable. (A3) Let t = (t,,, t, ) T be a system of eigenvalues of Σ, sorted in decreasing order. The emirical sectral distribution (ESD) of the oulation eigenvalues is defined by H (t) [t,i,+ ), t R, i= where A denotes the indicator function of set A. It is assumed that H (t) converges to some limit H(t) at all oints of continuity of H. (A4) Su(H), the suort of H, is the union of a finite number of closed intervals, bounded away from zero and infinity. Furthermore, there exists a comact interval in (0, + ) that contains Su(H ) for all N large enough. Let S = N Y T Y. As noted in Section 5, all the results given in this aer still hold in the case of E[y,j ] = µ. A system of eigenvalues of S, sorted in decreasing order, and the 3

5 corresonding eigenvectors are denoted by l = (l,,, l, ) T and (u,..., u ), resectively. The emirical sectral distribution (ESD) of S is defined by F (t) [l,i,+ ), t R. i= Marčenko and Pastur (967) greatly contributed the research on large-dimensional asymtotic theories of eigenvalues of S. This asymtotic theory is called the random matrix theory, and has been generalized by Silverstein (995), Silverstein and Bai (995), Silverstein and Choi (995) among others. These literatures imly that under assumtions (A)-(A4), there exists a distribution function F such that F (x) F (x), x R\{0}, which is called the limiting sectral distribution (LSD). Silverstein and Choi (995) showed that F is everywhere continuous excet at zero, and that the mass of F at zero is given by by F (0) = max{ y, H(0)}. (2.) For a nondecreasing function G on the real line, the stieltjes transform m G of G is defined m G (z) x z dg(x), z C+. The stieltjes transform has the well-known inversion formula G{[a, b]} = π lim η 0 + b a I ( m G (ξ + iη) ) dξ, (2.2) if G is continuous at a and b. Silverstein (995) develoed the most general version of the equation which relates F to H and y, namely, m m F (z) is the unique solution in the set {m C : ( y)/z + ym C + } to the equation m F (z) = t( y yzm F (z)) z dh(t), z C+. (2.3) Silverstein and Choi (995) showed that m F is extended to R, that is, lim z C + λm F (z) m F (λ) for λ R. It is noted from the inversion formula that F (λ) = π I[ m F (λ)] on R. We next consider the limiting sectral distribution (LSD) of the eigenvalues of N Y Y T = N X Σ X T. Denote the LSD by F. The eigenvalues of N Y T Y and N Y Y T only differs by N zero eigenvalues. It then holds: F (x) =( y) [0,+ ) (x) + yf (x), x R, F (x) = y y [0,+ ) (x) + F, x R, y m F (z) = y z m F (z) = y yz + ym F (z), z C +, + y m F (z), z C +. 4

6 2.2 Setu of the roblem Consider the roblem of estimating Σ by estimator Ω, where the estimator is evaluated by the risk function relative to the loss function L (Σ, Ω ) tr (Ω Σ I )(Ω Σ I ) T. (2.4) This loss function is based on the Frobenius norm A F = tr (AA T ), namely, L (Σ, Ω ) = Ω Σ I 2 F. We thus call it the Frobenius loss function. The loss function satisfies that L (Σ, Ω ) 0 and L (Σ, Σ ) = 0. For remarks of other loss functions, see Section 5. Ledoit and Wolf (202, 5) considered a class of rotation-equivariant estimators in estimation of Σ and Σ. An estimator Σ = Σ (Y ) of Σ is called rotation-equivariant when for any -dimensional orthogonal matrix W, Σ (Y W ) = W T Σ (Y )W. For the samle covariance matrix, it is well known that eigenvalues of S diverge more widely than those of Σ. Rotationequivariant estimators have the same eigenvectors as S and relace eigenvalues of S with other values shrunk towards a center of the eigenvalues. In the estimation of the inverse Σ, rotationequivariant estimators Ω = Ω (Y ) satisfy Ω (Y W ) = W T Ω (Y )W for any orthogonal matrix W. Thus, every rotation-equivariant estimator of Σ is of the form Ω (A ) = U A U T, A = diag(a,, a ), (2.5) where U = (u,..., u ), and a i may deend on the eigenvalues l of S. The class of rotation-equivariant estimators include various estimators. The ridge estimator given by Wang, et al. (204) corresonds to the case that a i = α/(l i + γ), and the linear estimator treated by Bodnar, et al. (204) corresonds to a i = α/l i + β for N > and a i = β for N <. Ledoit and Wolf (205) treated the general rotation-equivariant estimator directly. The roblem is how to estimate the arameters a i s, α, γ or β. A reasonable way is the estimation of the otimal a i s which minimize the risk function R (Σ, Ω ) E[L (Σ, Ω )]. Instead of minimizing risk functions, Ledoit and Wolf (202) obtained the otimal a i s which minimize the loss function and estimated the asymtotic otimal a i s using the random matrix theory. This argument is used in the next section to estimate the arameters in the estimation of the recision matrix Σ under the Frobenius loss (2.4). 3 Non-linear Shrinkage Estimation of the Precision Matrix In this aer, we treat two non-linear shrinkage estimators: the Ledoit-Wolf tye estimator and the linear ridge estimator. The estimators of the otimal arameters are rovided based on the random matrix theory. 3. Ledoit-Wolf tye estimator The Ledoit-Wolf tye estimator can be derived based on the otimal a i s which minimize the Frobenius loss (2.4) of the rotation-equivariant estimator Ω (A ) in (2.5). In fact, the loss function is written as L (Σ, Ω (A )) = i= {a2 i ut i Σ2 u i 2a i u T i Σ u i } +, which is minimized at a oracle i = u T i Σ u i /u T i Σ 2 u i, (3.) 5

7 for i =,...,. Since a oracle i s deend on Σ, we need to estimate them. Using the random matrix theory, Ledoit and Péché (20) showed that under (A)-(A4), u T i Σ u i can be aroximated as δ(l i ), where x y yx m F (x) 2 if x > 0, δ(x) = (3.2) if x = 0, y >, (y ) m F (0) 0 otherwise. Concerning the term u T i Σ2 u i, we can use a similar argument as in Ledoit and Péché (20). In fact, the following theorem guarantees that it can be aroximated by ϕ(l i ), where x 2 { y 2 2y 2 xr[ m F (x)] y 2 x 2 m F (x) 2 } ( y yx m F (x)) tdh(t)yx y yx m F (x) 2 if x > 0, ϕ(x) = y y m F (0) ( tdh(t) ) y m F (0) if x = 0, y >, 0 otherwise. (3.3) Theorem Under (A)-(A4), the rotation-equivariant estimator Ω (A ) with a i = a i (l i ) has the almost sure limit given by {a L (Σ, Ω (A )) 2 (x)ϕ(x) 2a(x)δ(x) } df (x) +. (3.4) The roof is given in the Aendix. Hence, one gets the Ledoit-Wolf tye estimator Ω LW =U A U T, A =diag(a,, a ), a i = δ(l i )/ϕ(l i ). (3.5) Theorem imlies that the estimator Ω LW attains the minimum of the limiting loss function, given by [{δ(x)} 2 /ϕ(x)]df (x) +. Since y, F and H are unknown, however, Ω LW is not feasible. The way for obtaining a bona fide estimator in the oracle estimator Ω LW is rovided by Ledoit and Wolf (205). Here, the estimators of m F (x) and m F (0) are denoted by mf (x) and m F (0), (3.6) resectively, where the detail for the derivation is given in the Aendix. Also tdh(t) in equation (3.3) is unknown, but this is the limit of tr [Σ ] and can be estimated by tr [S ]. Relacing m F (x), m F (0) and tdh(t) in equations (3.2) and (3.3) with their estimators, and relacing the limiting concentration ratio y with /N, we have the consistent estimators for δ(x) and ϕ(x). Then, we get the bona fide estimator δ(x) and ϕ(x), (3.7) Ω LW =U Â U T, Â =diag(â,, â ), â i = δ(l i )/ ϕ(l i ). (3.8) 6

8 3.2 Linear ridge estimator In the estimation of the inverse Σ, we here consider a new tye of the estimator Ω (α, β, γ) = α(s + γi ) + βi, (3.9) where α, β and γ are unknown arameters. This estimator not only belongs to the class of rotation-equivariant estimators (2.5) with a i = α/(l i + γ) + β, but also corresonds to the ridge estimator for β = 0 and the linear estimator for γ = 0. We call Ω (α, β, γ) the linear ridge estimator. This estimator is well-conditioned and available when > N. The use of the form β 0 I for constant β 0 has been suggested as an estimator of Σ in the literature in the ultra-high dimensional case. The ridge-tye estimator α 0 (S + γi ) for constant α 0 is reasonable unless is ultra-high dimensional. Thus, the linear ridge estimator (3.9) is interreted as a convex combination of α 0 (S +γi ) and β 0 I, namely, for 0 < w <, ( w)α 0 (S +γi ) +wβ 0 I = α(s + γi ) + βi for α = ( w)α 0 and β = wβ 0. A related exlanation is given in the next subsection from a Bayesian asect. The Frobenius loss function of Ω (α, β, γ) is written as L (Σ, Ω (α, β, γ)) = tr [ {α(s + γi ) Σ + βσ I }{α(s + γi ) Σ + βσ I } T]. Given γ, the otimal α and β are given by α (γ) = tr [(S + γi ) Σ ]tr [Σ 2 ] tr [(S + γi ) Σ 2 ]tr [Σ ] tr [(S + γi ) 2 Σ 2 ]tr [Σ 2 ] {tr [(S + γi ) Σ 2 ]} 2, β (γ) = tr [(S + γi ) 2 Σ 2 ]tr [Σ ] tr [(S + γi ) Σ ]tr [(S + γi ) Σ 2 ] tr [(S + γi ) 2 Σ 2 ]tr [Σ 2 ] {tr [(S + γi ) Σ 2 ]} 2. Substituting these otimal quantities into the loss function, we can get L (γ) =L (Σ, Ω LR (α (γ), β (γ), γ)) = [tr [(S + γi ) 2 Σ 2 ]tr [Σ 2 ] {tr [(S + γi ) Σ 2 ]} 2] [ {tr [(S + γi ) Σ ]} 2 tr [Σ 2 ] tr [(S + γi ) 2 Σ 2 ](tr [Σ ]) 2 ] + 2 tr [(S + γi ) Σ ]tr [(S + γi ) Σ 2 ]tr [Σ ] +. The asymtotic behaviors of α (γ), β (γ) and L (γ) can easily be checked by alying the results of the literature. The results are given in the following theorem, which will be roved in the Aendix. Theorem 2 Let t = tdh(t), t 2 = t 2 dh(t), B(γ) = γ m F ( γ) y( γ m F ( γ)) and B (γ) = d dγ B(γ). 7

9 Under (A)-(A4), α (γ), β (γ) and L (γ) converge to α(γ), β(γ) and L(γ), resectively, where where α(γ) = { } t yγb 2 (γ) + (t 2 + t γ t 2 c(γ) y)b(γ) t 2, β(γ) = { } yγb 3 (γ) + γb 2 (γ) + 2t yγb(γ)b (γ) + t (γ t )B (γ), c(γ) L(γ) = + { 2t yγb 3 (γ) + ( t 2 + t 2 c(γ) y 2t γ)b 2 (γ) + t 2 B(γ) } 2t 2 yγb(γ)b (γ) + t 2 (t γ)b (γ), c(γ) = y 2 γ 2 B 4 (γ) + {t 2 yγ + 2yγ(t y γ)}b 3 (γ) + (t 2 γ t 2 y 2 + 4t yγ γ 2 )B 2 (γ) 2t (yt γ)b(γ) + 2t t 2 γb(γ)b (γ) + t t 2 (γ t )B (γ) t 2. (3.0) To estimate α, β and γ, we need to estimate α(γ), β(γ) and L(γ). As mentioned in Wang, et al. (205), for γ > 0, m F ( γ) can easily be estimated by tr [(S + γi ) ]/, since m F ( γ) is the limit of m F ( γ) = tr [(S + γi ) ]/. Thus, consistent estimators of B(γ) and B (γ) given γ are γtr [(S + γi ) B(γ) ]/ = (/N) + γtr [(S + γi ) ]/N, B (γ) = tr [(S + γi ) ]{(/N) B(γ) } (/N) + γtr [(S + γi ) ]/N. (3.) Since t = tdh(t) is the limit of a = tr [Σ ]/, it is estimated by â = tr [S ]/. Also, note that t 2 = t 2 dh(t) is the limit of a 2 = tr [Σ 2 ]/. Then we can use the consistent estimator â 2 = N { (N )(N 2)tr [ S 2 } N(N 2)(N 3) ] + {tr [ S ]} 2 NQ, where S = (N ) N i= (y,i y )(y,i y ) T and Q = (N ) N i= {(y,i y ) T (y,i y )} 2 for y = N N i= y,i. For the details of the estimator â 2, see Himeno and Yamada (204). Thus, consistent estimators of t and t 2 are given by â and â 2. Relacing B(γ), B (γ), t, t 2 and y with B(γ), B (γ), â, â 2 and /N in (3.0), we have α (γ), β (γ) and L (γ). Let γ be the solution of minimizing L (γ), namely γ = argmin γ>0 L (γ). Then, α and β are estimated by α = α ( γ) and β = β ( γ). Although α, β and γ are the consistent estimators of α, β and γ as (N, ) with /N y, it is not guaranteed that α, β and γ are inside reasonable ranges for finite and N. For ractical use, we need to adjust the values of α, β and γ to be inside aroriate ranges. An aroach to this end is the Bayesian argument given in the next subsection. We shall suggest the linear ridge estimator Ω LR in (3.6) through (3.3), (3.4) and (3.5). 8

10 3.3 Bayesian motivation and the suggested linear ridge estimator We give a Bayesian motivation of the linear ridge estimator (3.9). This tells us about how the arameters α, β and γ should be restricted. Let V = NS. Consider the case of N > and assume that V has a Wishart distribution W (N, Σ ). The density function of V is denoted by f(v Σ ). Let η be a random variable having a Bernoulli distribution Ber(θ) with P (η = ) = θ. As a rior distribution of Σ, consider a sike and slab rior distribution: Given η = 0, Σ Σ has a one-oint distribution P (Σ and η = are denoted by π(σ to satisfy that δ Λ0 (Σ )dσ marginal density functions of (V, Σ f(v, Σ W (k, Λ ), and given η =, = Λ 0 ) =. The density functions of Σ given η = 0 Λ ) and δ Λ0 (Σ ), resectively, where δ Λ0 (Σ ) is assumed = and Σ δ Λ0 (Σ )dσ = Λ 0. Then, the joint and ) and V are written as, resectively, ) =f(v Σ ){( θ)π(σ f(v ) =( θ) Thus, the Bayes estimator, given by Ω Bayes where Ω Bayes = It can be easily seen that Σ f(v, Σ = ( θ) Σ Λ ) + θδ Λ0 (Σ )}, f(v Σ )π(σ Λ )dσ + θf(v Λ 0 ). = E[Σ )dσ /f(v ) ( θ) f(v Σ Σ f(v Σ =( w 0 ) f(v Σ V ], is exressed as f(v Σ )π(σ Λ )dσ )π(σ )π(σ )π(σ + θλ 0 f(v Λ 0 ) Λ )dσ + θf(v Λ 0 ) Λ )dσ Λ )dσ + w 0 Λ 0, θf(v Λ 0 ) w 0 = ( θ) f(v Σ )π(σ Λ )dσ + θf(v Λ 0 ). Σ f(v Σ )π(σ f(v Σ )π(σ Λ )dσ Λ )dσ = (N + k)(v + Λ ) = v 0 (S + N Λ ), for v 0 = (N + k)/n. Thus, we get Ω Bayes = ( w 0 )v 0 (S + N Λ ) + w 0 Λ 0, where v 0 and w 0 satisfy that v 0 > and 0 < w 0 <. Motivated from the Bayes estimator Ω Bayes, we consider the following linear ridge estimator: Let Λ = NγI and Λ 0 = (/l)i for l = i= l i/ = tr [S ]/. Then the linear ridge estimator we treat is linear ridge Ω = v( w)(s + γi ) + w(/l)i, (3.2) where v, w and γ are constants which satisfy the restrictions v >, 0 < w <, and γ > 0. (3.3) 9

11 This estimator corresonds to (3.9) with α = v( w) and β = w/l. On the other hand, from the results below (3.), we have the consistent estimators α, β and γ for α, β and γ. Thus, one gets the equations α = ṽ( w) and β = w/l, which yields that w = l β and ṽ = α/( w) = α/( l β). Since ṽ, w and γ need to satisfy the restriction (3.3), the arameters v, w and γ are estimated by ŵ = 0 w = 0 (l β), v = ṽ = α/( w), γ = γ 0, (3.4) where a b = max(a, b) and a b = min(a, b). This gives α = v( ŵ) and β = ŵ/l, (3.5) and we suggest the use of the linear ridge estimator Ω LR = v( ŵ)(s + γi ) + ŵ(/l)i = α(s + γi ) + βi. (3.6) Note that when ŵ = 0 or β = 0, the linear ridge estimator becomes the ridge-tye estimator α(s + γi ) = v(s + γi ). On the other hand, when ŵ = or α = 0, the linear ridge estimator becomes βi = (/l)i, which is roosed frequently as an estimator of an ultra-high dimensional recision matrix. Generally, in the case that 0 < ŵ <, the linear ridge estimator is a convex combination of the ridge estimator and (/l)i. 3.4 Ridge and linear shrinkage estimators for comarison In the revious subsections, we have derived the Ledoit-Wolf tye estimator (3.8) and the linear ridge estimator (3.6) relative to the Frobenius loss function (2.4). To comare them with the existing other estimators, we here look at two estimators: the ridge and the linear shrinkage estimators. Both estimators are secial cases of the linear ridge estimator (3.9), and we can investigate how the linear ridge estimator is effective in comarison with them. Thus, we derive the otimal values of the arameters in the ridge and the linear shrinkage estimators with resect to the Frobenius loss. [] Ridge estimator. The ridge estimator is of the form Ω ridge = α(s + γi ), (3.7) which has been used and studies in the literature. For examle, see Srivastava and Kubokawa (2007). Although most results were obtained under Gaussian distributions, Wang, et al. (204) derived the estimators of the otimal α and γ using the random matrix theory without any assumtion of underlying distributions, where they treated another quadratic loss function. Given γ, the otimal α relative to the Frobenius loss is α R (γ) = which leads to the reduced loss function L (Σ tr [(S + γi ) Σ ] tr [(S + γi ) Σ 2 (S + γi ) ],, Ω ridge (α R (γ), γ)) = {tr [(S + γi ) Σ ]} 2 tr [(S + γi ) Σ 2 (S + γi ) ]. 0

12 Then, it follows from the above results that α R (γ) and L (Σ, Ω ridge (α R (γ), γ)) converge to α R B(γ) (γ) = yb 2 (γ) + B(γ) + 2yγB(γ)B (γ) + (γ t )B (γ), L R (γ) = B 2 (γ) yb 2 (γ) + B(γ) + 2yγB(γ)B (γ) + (γ t )B (γ), for B(γ) and B (γ) in Theorem 2. Relacing B(γ), B (γ), t and y with B(γ), B (γ), â and /N, we have α R and L R (γ), where B(γ) and B (γ) are given in (3.). Let γ R be the solution of minimizing L R (γ). Then, we get α R = α R ( γ R ), which yields the ridge estimator. [2] Linear shrinkage estimator. The linear shrinkage estimator suggested by Bodnar, et al. (204) is of the form { Ω linear αs + β I = if N > αs + + β I if N <. (3.8) In the case of N >, we can use the results given in the revious subsections and Theorem 3.2 in Bodnar, et al. (204) to show that the Frobenius loss function of the linear shrinkage estimator has the limit lim L (Σ, Ω linear ) N, { = lim α 2 tr [S 2 N, ( = + ( y) 2 + t } Σ 2 ] + 2αβtr [S Σ 2 ] + β 2 tr [Σ 2 ] 2αtr [S Σ ] 2βtr [Σ ] + + m F (0) ) α t y y αβ + t 2β 2 2 y α 2t β, where t and t 2 are given in Theorem 2. Then, the otimal α and β in terms of minimizing the limit of the loss function are given by α L t 2 t 2 = t 2 { + t ( + m F (0))( y)}, β L = t ( αl ). t 2 y Note that m F (0) can be consistently estimated by tr [S ]/ when N >. Relacing t, t 2 and m F (0) with their estimators and relacing y with /N, one gets the estimators α L and β L, which yields the linear estimator. In the case of N <, Bodnar, et al. (204) cannot rovide estimators for general Σ. This is because in their settings one needs the limit of tr [S + Σ ], but this cannot be obtained without assuming a structure such as Σ = σ 2 I for scalar σ 2. Without assuming such a structure, however, we can obtain estimators of the otimal α and β using equation (3.3). In our setting, we can exand the loss function as L (Σ, Ω linear ) = { } α 2 tr [(S + ) 2 Σ 2 ] + 2αβtr [S + Σ 2 ] + β 2 tr [Σ 2 ] 2αtr [S + Σ ] 2βtr [Σ ] +,

13 so that we need the limit of tr [(S + ) 2 Σ 2 ], tr [S + Σ 2 ] and tr [S + Σ ]. Using equation (3.3) and Theorem 3.3 in Bodnar, et al. (204), one gets lim L (Σ, Ω linear N, ) = + α 2 lim N, = + α 2 ϕ(x) x 2 N i= ϕ(l i ) (l i ) df (x) + 2αβ 2 + 2αβ lim N, N i= ϕ(l i ) l i + t 2 β 2 2 y α 2t β ϕ(x) x df (x) + t 2β 2 2 y α 2t β, where ϕ( ) is given in (3.3) and l i, i =,, N are the eigenvalues of S. Then, the otimal α and β in terms of minimizing the limit of the loss function are given by α L t 2 /(y ) t {ϕ(x)/x}df (x) = t 2 {ϕ(x)/x 2 }df (x) [ {ϕ(x)/x}df (x)] 2, β L = t {ϕ(x)/x 2 }df (x) {ϕ(x)/x}df (x)/(y ) t 2 {ϕ(x)/x 2 }df (x) [ {ϕ(x)/x}df (x)] 2. Note that x k ϕ(x)df (x) for integer k can be estimated by N i= lk i ϕ(l i ) for ϕ(x) given in (3.7). Relacing t, t 2, {ϕ(x)/x}df (x) and {ϕ(x)/x 2 }df (x) with their estimators and relacing y with /N, we have the consitent estimators α L and β L, which roduces the linear estimator for > N. 4 Simulation and Real Data Analysis 4. Simulation study In this section, we investigate erformances of the rocedures suggested in the revious sections through Monte Carlo simulation. In our simulation, eigenvalues of a oulation covariance matrix are fixed as follows: Let F (a,b) (x) be a cumulative distribution function of the beta distribution beta(a, b), namely F (a,b) (x) = Γ(a + b) Γ(a)Γ(b) x 0 t a ( t) b dt, x [0, ]. Since the suort of the beta distribution is [0, ], we transform the interval linearly to [, 0], and the oulation eigenvalues are given by This setu is the same as in Ledoit and Wolf (202). ( i + 9F (a,b) ), i =,,. (4.) 2 We first treat the case of (a, b) = (, ), and investigate the numerical erformances of the risk of Ω oracle, Ω LW, Ω LR, Ω ridge and Ω linear, which are given in (3.), (3.8), (3.6), (3.7) and (3.8), resectively. These estimators are denoted by oracle, LW, LR, ridge and linear in the tables given below. For comutation of Ω LW, we need the QuEST function introduced in Ledoit and Wolf (205). Using this ackage, we carried out the simulation 2 and Ω linear

14 exeriments on the Matlab. Random observations y,..., y N are generated as y i = Σ /2 x i for x i = (x i,..., x i ) T where x i,..., x i are mutually indeendently distributed as (Case I) x ij N (0, ), (Case II) x ij (m 2)/mz i,j for z i,j t m, where t m is a t-distribution with m degrees of freedom. (Case II) is an examle of a heavy-tailed distribution and we treat the case of m = 0. The simulation exeriments are carried out under the above setu for N = 50, = 30, 70, 50, 250, 500 and 700, and emirical risks of these estimators are calculated based on, 000 relications. Table rovides values of emirical risks of the estimators in normal and 4/5t 0 distributions with m = 0, where eigenvalues of Σ are generated by (4.) for (a, b) = (, ). Definitely, the linear ridge estimator Ω LR erforms better than the ridge and linear estimators Ω ridge and Ω linear. Comaring Ω LR and Ω LW, we can see that the linear ridge estimator Ω LR is better than the Ledoit-Wolf tye estimator Ω LW excet for the cases of = 70 or 50. Esecially, in the case of = 50, the estimator Ω LW gives a smaller risk than Ω LR in the normal and t 0 distributions, but the imrovement is not significant. When is small or /N is comaratively large, Ω LR erforms better than Ω LW. It may be because in such cases, δ(l i ) and ϕ(l i ) in equation (3.8) give oor aroximations for u T i Σ u i and u T i Σ2 u i in equation (3.), resectively. The resulting estimate δ(l i )/ ϕ(l i ) causes over-shrinkage or under-shrinkage, which leads to the increased risk of Ω LW. For the linear ridge estimator, on the other hand, we only have to calculate tr [(S + γi ) ], namely, we can use all the samle eigenvalues to estimate arameters. This leads to a stability of Ω LR in the cases that N and are small or /N is comaratively large. Comaring Ω LW and Ω LR with the oracle estimator, we observe that as gets larger, the risks of Ω LW and Ω LR become more inflated in comarison with the oracle estimator, namely, the estimation error of â i in (3.8) for estimating a i in the Ledoit-Wolf tye estimator increases for large. The inflation in the risk of Ω LR seems relatively smaller than that of Ω LW for large. Table : Emirical Risks of Ω oracle, Ω LW, Ω LR, Ω ridge and Ω linear with N = 50, (a, b) = (, ) oracle LW LR ridge linear Normal (a,b)=(,) t (a,b)=(,)

15 We next investigate the erformances of the estimators for some variety of shaes of oulation sectral distribution. As the shae arameters (a, b) in the beta distribution, we deal with the cases of (a, b) = (.5,.5), (0.5, 0.5), (5, 5) and (2, 5). For grahical illustration of the beta distribution under these arameters, see Ledoit and Wolf (202). We carry out similar simulation exeriments as described above for N = 50 and = 30, 70, 50, 250 and 500. Poulation eigenvalues are generated by equation (4.). The results are reorted in Table 2. It is revealed that we have similar risk erformances as mentioned in the case of (a, b) = (, ). Thus, the results given in Tables and 2 show that the suggested linear ridge estimator Ω LR gives a relatively good erformance among the estimators comared in this simulation. Table 2: Emirical Risks of Ω oracle Distribution, Ω LW, Ω LR, Ω ridge and Ω linear with N = 50 under Normal (a, b) oracle LW LR ridge linear (.5,.5) (0.5,0.5) (5,5) (2,5) Illustrative examle We now comare the erformance of several estimators of the recision matrix through the quadratic discriminant analysis (QDA). In QDA, when observation x i,j, i =, 2, j =,..., N i are obtained from the oulation grou Π i, i =, 2, a new observation x is classified into Π or Π 2 via the classification rule (x x ) T Ω, (x x ) (x x 2 ) T Ω 2, (x x 2 ) < (res. >) 0 = x Π (res.π 2 ), 4

16 where x i = Ni Ni j= x i,j, i =, 2 and Ω i,, i =, 2 is an estimator of the recision matrix of the oulation Π i. The classification rules given by substituting the estimators Ω, and Ω 2, corresonding to Ω LW, Ω LR, Ω ridge, Ω linear, Ω MP and Ω diag are denoted by LW, LR, ridge, linear, MP and diag, where Ω MP is the diagonal matrix Ω diag is the Moore-Penrose generalized inverse matrix and Ω diag = (diag(s )). It is known in the literature that Ω diag has a good erformance in discriminant analysis. To estimate these estimators, the samle covariance matrix of the oulation Π i is estimated by S i, = N i N i j= (x i,j x i )(x i,j x i ) T. We use the microarray data described in a colon cancer study by Alon, et al. (999) where exression levels for 2000 genes were measured on 40 normal and on 22 colon tumor tissues. The dataset is available at htt://genomics-ubs.rinceton.edu/oncology/affydata. A base 0 logarithmic transformation is alied. The descrition of the above datasets and rerocessing are due to Dettling and Buhlmann (2003) excet standardization is not followed. This dataset was used by Srivastava and Kubokawa (2007), Fisher and Sun (20) and Touloumis (205). Using this dataset, we estimate the covariance matrices of the normal and colon cancer grous of the to genes, where = 00, 250, 500 and 900. The correct classification rates based on the estimators Ω LW, Ω LR, Ω ridge, Ω linear, Ω MP and Ω diag are estimated by a leave-one-out cross validation. The estimated correct classification rates are reorted in Table 3. A relative gain in correct classification rate for our Ω LR is at least 28.7 over Ω LW and is at most 28.7 over Ω ridge. Moreover, Ω LR shows equal or slightly higher correct classification rates than Ω linear and Ω diag. Lastly we can see that the correct classification rate of Ω linear is higher than that of Ω LW and Ω ridge, which seems to contradict the results in the revious simulation study. This is because this data set of the colon cancer has considerable sarsity in the covariance structure. Thus, we may imrove the correct classification rate by assuming the shericity or diagonality for the recision matrix. Table 3: Correct Classification Rates in the Colon Cancer Dataset LW LR ridge linear MP diag Concluding Remarks In this aer, we have addressed the roblem of estimating the high-dimensional recision matrix Σ relative ot the Frobenius loss function L (Σ, Ω ) = tr (Ω Σ I )(Ω Σ I ) T. We have suggested the linear ridge estimator of the recision matrix motivated from the Bayesian asect, and rovided the estimators of the otimal arameters using the random matrix theory. 5

17 We also have derived the Ledoit-Wolf tye estimators and comared it with the suggested linear ridge estimator and the other existing estimators. Some simulation results show that the linear ridge estimator erforms better than the other existing estimators when N and are small, or when /N is large. It is worth mentioning the estimation under other loss functions. A related loss function is the quadratic loss tr [(Ω Σ I ) 2 ]. Under this loss fucntion, we can obtain the ridge, linear and linear ridge estimators. However, we could not derive a Ledoit-Wolf tye nonlinear shrinkage estimator, because we need the asymtotic behavior of u T i Σ u j for i j. In our numerical investigation, these quantities seem almost zero, but not just zero. This is left for future research. Ledoit and Wolf (202) treated the loss functions tr [(Ω Σ ) 2 ] and {tr [Ω Σ] log Ω Σ and derived the nonlinear shrinkage estimators. We can derive the ridge, linear and linear ridge estimators relative to these loss functions, and we shall comare those estimators numerically under the loss function in a future study. To get estimators of the arameters a i, α, β and γ under those loss functions, we have to solve equation (2.3) by resorting to the strong ackage QuEST run on Matlab. An advantage of the Frobenius loss function treated in this aer is that the estimators of α and β in the linear ridge estimator and the ridge estimator are exlicitly exressed for fixed γ, and the resulting linear ridge and ridge estimators are written in closed forms. Since the other estimators can be derived by using the QuEST, we can comare the erformance of the suggested linear ridge estimator with the existing other shrinkage ones. Concerning the mean µ = E[x,i ], we have treated the case of µ = 0 in this aer. In the case of µ 0, however, it is noted that all the results in this aer still hold by relacing y,i and N with y,i y and n = N for y = N N i= y i as long as we consider the asymtotics on N and. That is, the samle covariance matrix is S = N (y n,i y )(y,i y ) T, which is rewritten as S = N n N i= N (y,i µ)(y,i µ) T N n (y µ)(y µ) T. i= Leaving the second term out of consideration does not affect the LSD of S since rank((y µ)(y µ) T ) = and is negligible in our arguments for large N and. It would be interesting to use the suggested estimator for testing the hyothesis H 0 : µ = 0. Reasonable test statistics are functions of y T Ω y. For examle, test statistics of Bai and Saradanasa (996) and Srivastava (2007) are based on y T y /( tr [S ]) and y T S + y, resectively, which are corresond to Ω = (/tr S)I = (/l)i and Ω = S +. Using the linear ridge estimator for Ω, we can suggest a test statistic based on ) y T Ω LR y =y T (( w)α 0 (S + γi ) + w(/l)i y =( w)α 0 y T (S + γi ) y + w(/l)y T y, which is a convex combination of the Bai-Saranadasa statistic and the Hotelling T -statistic relacing S or S + with the ridge estimator. We will study the erformance of a test statistic based y T Ω LR y as a future roject. 6

18 A Aendix A. Proof of Theorem The Frobenius loss function of the rotation-equivariant estimator is L (Σ, Ω (A )) = {a 2 (l i )u T i Σ 2 u i 2a(l i )u T i Σ u i } +. Theorem 4 in Ledoit and Péché (20) shows that a(l i )u T i Σ u i a(x)δ(x)df (x), i= i= where δ(x) is given in (3.2). Thus, we shall evaluate the asymtotic quantity of i= a(l i)u T i Σ2 u i. The same arguments as used in the roof of Theorem 4 in Ledoit and Péché (20) are heavily exloited for the evaluation. Let (2) (x) be the nondecreasing function defined by (2) (x) = u T j Σ 2 u j [lj,+ )(x), x R. j= When all the samle eigenvalues are distinct, it is noted that u T i Σ2 u i (i =,, ) can be recovered from (2) as u T i Σ 2 u i = lim ϵ 0 + (2) (l i + ϵ) (2) (l i ϵ). F (l i + ϵ) F (l i ϵ) Thus, it suffices to examine the asymtotic behavior of (2) (x). Note that the stieltjes transform of (2) (x) is the same with Euation (3) of Ledoit and Péché (20) for g(τ j ) = τj 2. Let Θ(2) (z) be the stieltjes transform of (2) (x). Then, Θ (2) (z) converges a.s. to Θ (2) (z) for all z C +. Moreover, it follows from Lemmas 2 and 3 in Ledoit and Péché (20) that Θ (2) (z) = z + z 2 m F (z) tdh(t) ( y yzm F (z)) 2 + y yzm F (z), z C+. (A.) Using Lemma 6 in Ledoit and Péché (20), we can see that lim (2) (x) (2) (x) exists and is equal to (2) x (x) = lim I[Θ (2) (λ + iη)]dλ. (A.2) η 0 + π for every continuous oint x R of (2) (x). Plugging (A.) into (A.2) yields (2) (x) = lim η 0 + π = lim η 0 + π x + lim η 0 + π x x [ (λ + iη) + (λ + iη) 2 m F (λ + iη) I ( y y(λ + iη)m F (λ + iη)) 2 + [ (λ + iη) + (λ + iη) 2 m F (λ + iη) ] I ( y y(λ + iη)m F (λ + iη)) 2 dλ [ tdh(t) ] I dλ. y y(λ + iη)m F (λ + iη) tdh(t) ] dλ y y(λ + iη)m F (λ + iη) (A.3) 7

19 We begin by evaluating (2) (x) in the case of y (0, ). The first term in RHS of (A.3) is lim η 0 + π x = lim η 0 + π x [ (λ + iη) + (λ + iη) 2 m F (λ + iη) ] I ( y y(λ + iη)m F (λ + iη)) 2 which is rewritten as π x = π = x x dλ [ { } I (λ + iη) + (λ + iη)m F (λ + iη) { ( y yr[(λ + iη)m F (λ + iη)]) 2 y 2 (I[(λ + iη)m F (λ + iη)]) 2 }] + 2iyI[(λ + iη)m F (λ + iη)]( y yr[(λ + iη)m F (λ + iη)]) ( ( y y(λ + iη)m F (λ + iη)) 2 2 ) dλ, λi[λm F (λ)] { ( + y + yr[λm F (λ)])( y yr[λm F (λ)]) y 2 (I[λm F (λ)]) 2} ( y yλm F (λ)) 2 2 dλ I[m F (λ)]λ 2 { y 2 2y 2 λr[m F (λ)] y 2 λ 2 m F (λ) 2 } ( y yλm F (λ)) 2 2 dλ λ 2 { y 2 2y 2 λr[m F (λ)] y 2 λ 2 m F (λ) 2 } ( y yλm F (λ)) 2 2 df (λ). The second term in RHS of (A.3) is x tdh(t) lim I[ η 0 + π y y(λ + iη)m F (λ + iη) ]dλ = tdh(t) x π lim yi[(λ + iη)m F (λ + iη)] η 0 + y y(λ + iη)m F (λ + iη) 2 dλ, which is equal to tdh(t) x π x = tdh(t) Hence, for x > 0, it is concluded that yi[λm F (λ)] y yλm F (λ) 2 dλ yλf (λ) y yλm F (λ) 2 dλ = (2) (x) = x x ϕ(λ)df (λ), for ϕ( ) given in (3.3). This shows Theorem in the case of y (0, ). tdh(t)yλ 2 df (λ). y yλm F (λ) We next consider the case of y (, ). Rewriting Θ (2) (z) based on the relation between m F (z) and m F (z) given by y + yzm F (z) = + zm F (z), we have Θ (2) (z) = zm F (z) = zm F (z) ( y z + y zm F (z) zm F (z) ( + zmf (z) ym F (z) + ) tdh(t) ) tdh(t) = µ(z) z 8

20 where µ(z) = ( + zm F (z) + m F (z) ym F (z) The inversion formula for the stieltjes transform imlies that: lim (ϵ) (2) ( ϵ)) = lim ϵ 0 +( (2) ϵ 0 + lim η 0 + π ϵ ϵ lim ϵ ϵ 0 + η 0 + π ϵ = lim ) tdh(t). I[Θ (2) (λ + iη)]dλ [ I =µ(0) = ( m F (0) ym F (0) + µ(λ + iη) ] dλ λ + iη ) tdh(t), (A.4) The third equality follows from Lemma 9 in Ledoit and Péché (20), since µ is a comlex holomorhic function and µ(0) R. Also from Lemma 8 in Ledoit and Péché (20)], it is noted that F (λ) = ( y ) [0,+ ) (λ) for λ in a neighborhood of zero. For x in a neighborhood of zero, from (A.4), it can be seen that (2) (x) = x m F (0) ( ym F (0) + ) tdh(t) d [0,+ ) (λ)λ. Comaring the two exressions, we can see that for x in a neighborhood of zero, x (2) y ( ) (x) = y m F (0) ym F (0) + tdh(t) df (λ) Therefore, for x in a neighborhood of zero, which roves Theorem. A.2 Proof of Theorem 2 (2) (x) = x ϕ(λ)df (λ), We begin by roviding the roof for the limit of α (γ). Given γ > 0, the otimal α is α (γ) = tr [(S + γi ) Σ ] tr [Σ2 ] tr [(S + γi ) Σ 2 ] tr [Σ ] tr [(S + γi ) 2 Σ 2 ] tr [Σ2 ] { tr [(S + γi ) Σ 2 ]} 2 It follows from Theorem 2 in Wang, et al. (204) that when N and /N y (0, ), tr [(S + γi ) Σ ] γ m F ( γ) y( γ m F ( γ)). Let B(γ) = { γ m F ( γ)}/{ y( γ m F ( γ))}. Since tr [(S + γi ) Σ 2 ] is Θ (2) ( γ) in the roof of Theorem, it can be seen from the equation (A.) that tr [(S + γi ) Σ 2 ] γ + γ 2 m F ( γ) ( y( yγ m F ( γ))) 2 + =( tdh(t) γb(γ)) ( + yb(γ) ). tdh(t) y( yγ m F ( γ)) 9

21 Since tr [(S + γi ) 2 Σ 2 ] = d dγ tr [(S + γi ) Σ 2 ], the limit of tr [(S + γi ) 2 Σ 2 ] is d ( ( ) tdh(t) γb(γ)) + yb(γ) dγ ( = yb 2 (γ) + B(γ) + 2yγB(γ)B (γ) + γ ) tdh(t) B (γ). For β (γ) and L (γ), we can rove their limits using the same arguments, and the roof of Theorem 2 is comlete. A.3 Consistent estimator of the stieltjes transform m F (x) This section exlains briefly the method of Ledoit and Wolf (205) for estimating m F (x). Ledoit and Wolf (205) rovided the nonrandom multivariate function called the Quantized Eigenvalues Samling Transforms or QuEST for short. For any ositive integers N and, the QuEST function denoted by Q N, is defined as Q N, : [0, ) [0, ) (A.5) t (t,, t ) T Q N, (t) (q N,(t),, q N, (t))t. (A.6) For z C +, m m t N, (z) is the unique solution in the set {m C : (N )/Nz+m/N C+ } to the equation m = i= t i { (/N) (/N)zm} z, (A.7) x R, FN,(x) t { max N, lim η 0 + π x i= (ti =0) } I[m t N,(ξ + iη)]dξ if x = 0, otherwise, (A.8) u [0, ], (F t N,) (u) su{x R : F t N,(x) u}, (A.9) and i/ i =,,, qn,(t) i (FN,) t (u)du. (A.0) (i )/ It can be seen that equation (A.7) quantizes equation (2.3), and that equation (A.8) quantizes equations (2.) and (2.2). FN, t is the limiting distribution of samle eigenvalues corresonding to the oulation sectral distribution i= (t i =0). By equation (A.9), (FN, t ) reresents the inverse sectral distribution function, or the quantile function. By equation (A.0), qn, i (t) 20

22 can be interreted as a smoothed version of the (i 0.5)/ quantile of FN, t. Ledoit and Wolf (205) estimates the eigenvalues of the oulation covariance matrix by ˆt = argmin t [0, ) [qn,(t) i l i ] 2, where l = (l,..., l ) T are eigenvalues of the samle covariance matrix, and showed that i= [ˆt i t i ] 2 0. i= Lastly, the estimator of m F (x) is obtained as the unique solution ˆm R C + to the equation m = i= ˆt i { (/N) (/N)xm} x. Acknowledgments. First of all, we are really grateful to Professor Olivier Ledoit for his kindness of giving us his rograms for comutation. Research of the second author was suorted in art by Grant-in- Aid for Scientific Research (5H0943 and ) from Jaan Society for the Promotion of Science. References [] Alon, U., Barkai, N., Motterman, D., Gish, K., Mack, S., and Levine, J. (999). Broad atterns of gene exression revealed by clustering analysis of tumor and normal colon tissues robed by oligonucleotide arrays. Proceeding of the National Academy of Science of the United States of America, 96, [2] Bai, Z., and Saranadasa, H. (996). Effect of high dimension: By an examle of a two samle roblem. Statist. Sinica, 6, [3] Bai, J., and Shi, S. (20). Estimating high dimensional covariance matrices and its alications. Ann. Economics and finance, 2, [4] Bodnar, T., Guta, A.K., and Parolya, N. (205). Direct shrinkage estimation of large dimensional recision matrix. J. Multivariate Analysis, to aear. [5] Dettling, M., and Buhlmann, P. (2002). Boosting for tumor classification with gene exression data. Bioinfomatics, 9, [6] Fisher, T.J., and Sun, X. (20). Imroved Stein-tye shrinkage estimators for the highdimensional multivariate normal covariance matrix. Com. Statist. Data Analysis, 55,

23 [7] Himeno, T. and Yamada, T. (204). Estimations for some functions of covariance matrix in high dimension under non-normality and its alications. J. Multivariate Analysis, 30, [8] Ledoit, O., and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Analysis, 88, [9] Ledoit, O., and Péché, S. (20). Eigenvectors of some large samle covariance matrix. Prob. Theory Relat. Fields, 52, [0] Ledoit, O., and Wolf, M. (202). Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist., 40, [] Ledoit, O., and Wolf, M. (205). Sectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. J. Multivariate Anal., 39, [2] Marčenko, V.A., and Patur, L.A. (967). Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sb.,, [3] Silverstein, J. (995). Strong convergence of the emirical distribution of eigenvalues of large dimensional random matrices. J. Multivariate Anal., 55, [4] Silverstein, J., and Bai, Z.D. (995). On the emirical distribution of eigenvalues of a class of large dimensional random matrices. J. Multivariate Anal., 54, [5] Silverstein, J., and Choi, S. (995). Analysis of the limiting sectral distribution of large dimensional random matrices. J. Multivariate Anal., 54, [6] Srivastava, M.S. (2007). Multivariate theory for analysing high dimensional data. J. Jaan Statist. Soc., 37, [7] Srivastava, M.S., and Kubokawa, T. (2007). Comarison of discrimination methods for high dimensional data. J. Jaan Statist. Soc., 37, [8] Wang, C., Pan, G., Tong, T., and Zhu, L. (205). Shrinkage estimation of large dimensional recision matrix using random matrix theory. Statistica Sinica, 25, [9] Touloumis, A. (205). Nonarametric Stein-tye shrinkage covariance matrix estimators in high-dimensional setting. Com. Statist. Data Analysis, 83,

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo