High-dimensional asymptotic expansions for the distributions of canonical correlations


Journal of Multivariate Analysis 100 (2009) 231-242


Yasunori Fujikoshi, Tetsuro Sakurai
Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku, 112-8551, Japan

Article history: Received 14 August 2007; available online 26 April 2008.
AMS 1991 subject classifications: primary 62H10; secondary 62E20.
Keywords: Asymptotic distributions; Canonical correlations; Extended Fisher's z-transformation; High-dimensional framework.

Abstract

This paper examines asymptotic distributions of the canonical correlations between $x_1$ ($q \times 1$) and $x_2$ ($p \times 1$) with $q \le p$, based on a sample of size $N = n + 1$. The asymptotic distributions of the canonical correlations have been studied extensively when the dimensions $q$ and $p$ are fixed and the sample size $N$ tends toward infinity. However, these approximations worsen when $q$ or $p$ is large in comparison to $N$. To overcome this weakness, this paper first derives asymptotic distributions of the canonical correlations under a high-dimensional framework such that $q$ is fixed, $m = n - p \to \infty$ and $c = p/n \to c_0 \in [0, 1)$, assuming that $x_1$ and $x_2$ have a joint $(q + p)$-variate normal distribution. An extended Fisher's z-transformation is proposed. Then, the asymptotic distributions are improved further by deriving their asymptotic expansions. Numerical simulations revealed that our approximations are more accurate than the classical approximations over a wide range of $p$, $q$, $n$ and the population canonical correlations.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Let $x_1 = (x_1, \ldots, x_q)'$ and $x_2 = (x_{q+1}, \ldots, x_{q+p})'$ be two random vectors with a joint $(q + p)$-variate normal distribution with mean vector $\mu = (\mu_1', \mu_2')'$ and covariance matrix
\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \]
where $\Sigma_{12}$ is a $q \times p$ matrix. Without loss of generality, we may assume $q \le p$.
Let $S$ be the sample covariance matrix formed from a sample of size $N = n + 1$ of $x = (x_1', x_2')'$. Corresponding to the partition of $x$, we partition $S$ as
\[ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}. \]
Let $\rho_1 \ge \cdots \ge \rho_q \ge 0$ and $r_1 > \cdots > r_q > 0$ be the population and sample canonical correlations between $x_1$ and $x_2$. Note that $\rho_1^2 \ge \cdots \ge \rho_q^2 \ge 0$ and $r_1^2 > \cdots > r_q^2 > 0$ are the characteristic roots of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$, respectively. This paper is concerned with asymptotic distributions of the canonical correlations and their transformations. Under the large-sample framework

A0: $p$ and $q$ are fixed, $n \to \infty$,

many asymptotic results have been obtained (e.g., see [11,1]). Note, however, that these results do not work well when the dimensions $q$ and/or $p$ become large. Such examples can be seen in [13, pp ]. One of the examples is concerned with a canonical correlation analysis between the $p = 14$ explanatory variables and the $q = 8$ criteria variables based on $N = 128$ observations. In this example we are also interested in a canonical correlation analysis between the $p = 14$ explanatory variables and a subset of the $q = 8$ criteria variables. To overcome this weakness, we first derive the limiting distributions of the canonical correlations and their functions under a high-dimensional framework such that
\[ \text{A1: } q \text{ fixed},\quad p \to \infty,\quad n \to \infty,\quad m = n - p \to \infty,\quad c = p/n \to c_0 \in [0, 1). \tag{1.1} \]
Here, the constant $c$ is assumed to satisfy $0 < c < 1$. Based on the limiting results, we propose an extended Fisher's z-transformation in which the asymptotic variance does not depend on the unknown parameters. In addition, the limiting results are improved further by deriving their asymptotic expansions. Note that the classical large-sample limiting results are the same for all $p$ and $q$, while the high-dimensional limiting results depend on $p$ through $c = p/n$. Furthermore, our high-dimensional results reduce to the classical large-sample results when $c$ tends toward 0. This means that our approximations contain more information than the classical approximations. Moreover, numerical simulations revealed that our approximations are more accurate than the classical approximations over a wide range of $p$, $q$, $n$ and the population canonical correlations.

Some papers have described asymptotic distributions under a high-dimensional framework in which both the dimension and the sample size are large. For example, there are some works on the characteristic roots of Wishart matrices, see [2,5]. For some tests of covariance matrices, see [6,9], etc. Raudys and Young [8] gave asymptotic results for linear discriminant functions.

Corresponding author. E-mail address: fujikoshi_y@yahoo.co.jp (Y. Fujikoshi).

2. Limiting distributions

Since we are interested in the distribution of a function of the characteristic roots of $S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$, without loss of generality we may assume
\[ \Sigma = \begin{pmatrix} I_q & P \\ P' & I_p \end{pmatrix}, \qquad P = (P_1 \;\; O), \qquad P_1 = \mathrm{diag}(\rho_1, \ldots, \rho_q). \]
Let $A = nS$, which is distributed as a Wishart distribution $W_{q+p}(n, \Sigma)$, and partition $A$ as
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \]
corresponding to the partition of $x$. In our derivation, we first consider the distribution of $l_\alpha^2$ ($l_1^2 > \cdots > l_q^2$) defined by
\[ l_\alpha^2 = r_\alpha^2/(1 - r_\alpha^2), \qquad \alpha = 1, \ldots, q, \]
which are the characteristic roots of $A_{11\cdot 2}^{-1}A_{12}A_{22}^{-1}A_{21}$, instead of $r_\alpha^2$ or $r_\alpha$ ($r_1 > \cdots > r_q$). Here $A_{11\cdot 2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$. The distribution of $r_\alpha^2$ or $r_\alpha$ is treated as the distribution of a function $f(l_\alpha^2)$. For example, $r_\alpha^2$ and $r_\alpha$ are expressed in terms of $l_\alpha$ as $r_\alpha^2 = l_\alpha^2/(1 + l_\alpha^2)$ and $r_\alpha = l_\alpha/\sqrt{1 + l_\alpha^2}$, respectively. The population quantity corresponding to $l_\alpha^2$ is denoted by
\[ \gamma_\alpha^2 = \rho_\alpha^2/(1 - \rho_\alpha^2), \qquad \alpha = 1, \ldots, q. \]
We derive the asymptotic distributions of $l_\alpha^2$ and its functions under framework A1 and
\[ \text{A2: the } \alpha\text{th root } \rho_\alpha^2 \text{ of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \text{ is simple and is not zero.} \tag{2.1} \]
Then, the $l_\alpha^2$'s are the roots of $Q = A_{11\cdot 2}^{-1/2}A_{12}A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1/2}$, and $Q$ can be expanded (see Appendix) as
\[ Q = A_{11\cdot 2}^{-1/2}A_{12}A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1/2} = \Theta^2 + O_{1/2}, \]
where $\Theta^2 = \mathrm{diag}(\theta_1^2, \ldots, \theta_q^2) = (1 - c)^{-1}(cI_q + \Gamma^2)$ with $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_q)$, i.e., $\theta_\alpha^2 = (c + \gamma_\alpha^2)/(1 - c)$, $\alpha = 1, \ldots, q$. This shows that $l_\alpha^2$ approaches
\[ \theta_\alpha^2 = (1 - c)^{-1}\left\{ \rho_\alpha^2/(1 - \rho_\alpha^2) + c \right\}, \tag{2.2} \]
not $\rho_\alpha^2/(1 - \rho_\alpha^2)$, and Theorem 2.1 holds.
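The shift of the limit from $\gamma_\alpha^2$ to $\theta_\alpha^2$ in (2.2) can be illustrated numerically. The following is a minimal sketch, not the authors' simulation code: the function names `theta_sq`, `rho_tilde_sq` and `sample_canonical_corr_sq` are hypothetical, the covariance matrix is the canonical form assumed in Section 2, and the settings $p = 30$, $n = 100$ are chosen only for illustration.

```python
import numpy as np


def theta_sq(rho, c):
    """Limit (2.2) of l^2 = r^2 / (1 - r^2) under the high-dimensional framework."""
    rho2 = np.asarray(rho, dtype=float) ** 2
    return (rho2 / (1.0 - rho2) + c) / (1.0 - c)


def rho_tilde_sq(rho, c):
    """Corresponding limit of r^2, i.e. theta^2 / (1 + theta^2)."""
    t = theta_sq(rho, c)
    return t / (1.0 + t)


def sample_canonical_corr_sq(rho, p, n, rng):
    """Squared sample canonical correlations from N = n + 1 normal observations."""
    rho = np.asarray(rho, dtype=float)
    q = rho.size
    # Canonical form of Sigma: identity blocks and P = (diag(rho), O).
    sigma = np.eye(q + p)
    sigma[np.arange(q), q + np.arange(q)] = rho
    sigma[q + np.arange(q), np.arange(q)] = rho
    x = rng.multivariate_normal(np.zeros(q + p), sigma, size=n + 1)
    s = np.cov(x, rowvar=False)
    s11, s12, s22 = s[:q, :q], s[:q, q:], s[q:, q:]
    # Characteristic roots of S11^{-1} S12 S22^{-1} S21.
    mat = np.linalg.solve(s11, s12) @ np.linalg.solve(s22, s12.T)
    return np.sort(np.linalg.eigvals(mat).real)[::-1]


if __name__ == "__main__":
    rho = np.array([0.9, 0.5, 0.3])
    p, n = 30, 100
    rng = np.random.default_rng(0)
    sims = np.array([sample_canonical_corr_sq(rho, p, n, rng) for _ in range(200)])
    print("mean of r^2          :", sims.mean(axis=0))
    print("high-dimensional limit:", rho_tilde_sq(rho, p / n))
    print("large-sample limit    :", rho ** 2)
```

With $c = 0.3$ the average of $r_\alpha^2$ should sit much closer to $\rho_\alpha^2 + c(1 - \rho_\alpha^2)$ than to $\rho_\alpha^2$, which is the point of (2.2).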

Theorem 2.1. Let $l_\alpha^2 = r_\alpha^2/(1 - r_\alpha^2)$, where $r_\alpha$ is the $\alpha$th canonical correlation, and let $h(l_\alpha^2)$ be a function of $l_\alpha^2$ such that the first derivative is continuous in the neighborhood of $l_\alpha^2 = \theta_\alpha^2$ and $h'(\theta_\alpha^2) \ne 0$. Then, under A1 and A2, we have
\[ y_\alpha = \sqrt{n}\,(l_\alpha^2 - \theta_\alpha^2) \xrightarrow{d} N(0, \tau_\alpha^2), \qquad z_\alpha = \sqrt{n}\,\{h(l_\alpha^2) - h(\theta_\alpha^2)\} \xrightarrow{d} N\!\left(0, \{h'(\theta_\alpha^2)\}^2 \tau_\alpha^2\right), \]
where $\xrightarrow{d}$ denotes convergence in distribution, and the asymptotic variance $\tau_\alpha^2$ of $y_\alpha$ is expressed in terms of $\theta_\alpha^2$ and $\rho_\alpha^2$ as
\[ \tau_\alpha^2 = \frac{2(2 - c)}{1 - c}\,(\theta_\alpha^2 + 1)\left(\theta_\alpha^2 - \frac{c}{2 - c}\right) = \frac{2(2 - c)}{(1 - c)^3}\,\frac{1}{(1 - \rho_\alpha^2)^2}\left\{\rho_\alpha^2 + \frac{c}{2 - c}(1 - \rho_\alpha^2)\right\}. \tag{2.3} \]

Theorem 2.1 is shown by using a perturbation expansion of $l_\alpha^2$ in terms of $U$ and $V$ (see (A.1) and (A.4)). From this expansion, the limiting distribution of $y_\alpha$ is the same as that of
\[ q^{(1)}_{\alpha\alpha} = \sqrt{c}\,(1 - c)^{-1} u_{\alpha\alpha} - (1 - c)^{-1/2}\,\theta_\alpha^2\, v_{\alpha\alpha}, \]
where the limiting distributions of $u_{\alpha\alpha}$ and $v_{\alpha\alpha}$ are independently normal with mean 0. This yields the limiting distribution of $y_\alpha$.

In rigorous terms of convergence in distribution, the results of Theorem 2.1 should be considered for a high-dimensional framework in which $p = nc$ with a constant $0 < c < 1$, or $c$ should be read as $c_0$. However, for actual use the statement as in Theorem 2.1 is more useful.

Note that $r_\alpha^2 = l_\alpha^2/(1 + l_\alpha^2)$ and $r_\alpha = \{l_\alpha^2/(1 + l_\alpha^2)\}^{1/2}$. Therefore, $r_\alpha^2$ and $r_\alpha$ can be expressed as functions of $l_\alpha^2$ given by $h(x) = x/(1 + x)$ and $h(x) = \{x/(1 + x)\}^{1/2}$, respectively. From Theorem 2.1, we obtain the following results:
\[ \sqrt{n}\,(r_\alpha^2 - \tilde\rho_\alpha^2) \xrightarrow{d} N(0, \sigma_\alpha^2), \tag{2.4} \]
\[ \sqrt{n}\,(r_\alpha - \tilde\rho_\alpha) \xrightarrow{d} N\!\left(0, \tfrac{1}{4}\sigma_\alpha^2\,\tilde\rho_\alpha^{-2}\right), \tag{2.5} \]
where
\[ \tilde\rho_\alpha = \{\rho_\alpha^2 + c(1 - \rho_\alpha^2)\}^{1/2}, \qquad \sigma_\alpha^2 = 2(1 - c)(1 - \rho_\alpha^2)^2\{2\rho_\alpha^2 + c(1 - 2\rho_\alpha^2)\}. \]
In particular, letting $c = 0$ in (2.4) and (2.5) we have the well-known large-sample results:
\[ \sqrt{n}\,(r_\alpha^2 - \rho_\alpha^2) \xrightarrow{d} N(0, 4\rho_\alpha^2(1 - \rho_\alpha^2)^2), \tag{2.6} \]
\[ \sqrt{n}\,(r_\alpha - \rho_\alpha) \xrightarrow{d} N(0, (1 - \rho_\alpha^2)^2). \tag{2.7} \]
Here, note that the high-dimensional asymptotic results (2.4) and (2.5) depend on $p$ through $c = p/n$, but the large-sample results (2.6) and (2.7) do not depend on $p$ and are the same for all $p$. The numerical accuracy of these approximations is examined in Section 5.

3. Extension of Fisher's z-transformation

It is of interest to find a transformation such that its asymptotic variance becomes parameter-free. In particular, the transformation whose asymptotic variance becomes 1 is usually used. This is equivalent to finding a function $h$ such that the high-dimensional asymptotic variance of $z = \sqrt{n}\,(h(l_\alpha^2) - h(\theta_\alpha^2))$ is equal to 1. From Theorem 2.1, this is equivalent to finding a function $h$ satisfying
\[ h'(x)^2\,\frac{2(2 - c)}{1 - c}\,(x + 1)\left(x - \frac{c}{2 - c}\right) = 1 \quad \text{for all } x > c/(2 - c). \tag{3.1} \]

It is easy to find a solution defined by
\[ h(x) = \sqrt{\frac{1 - c}{2(2 - c)}}\,\log\left\{ g(x) + \sqrt{g(x)^2 - 1} \right\}, \tag{3.2} \]
where $g(x) = (2 - c)x + 1 - c$. The transformation is defined for $x > c/(2 - c)$; for $x = l_\alpha^2$ this means $g(l_\alpha^2) \ge 1 \Leftrightarrow r_\alpha^2 \ge c/2$. We shall see that the transformation is an extension of Fisher's z-transformation. Let $u = \{(2 - c)x - c\}/2$, and then
\[ g(x) + \sqrt{g(x)^2 - 1} = 1 + 2u + 2\sqrt{u(u + 1)} = \left(\sqrt{1 + u} + \sqrt{u}\right)^2. \]
The last expression is equal to
\[ \frac{\sqrt{1 + u} + \sqrt{u}}{\sqrt{1 + u} - \sqrt{u}} = \frac{1 + \sqrt{u/(1 + u)}}{1 - \sqrt{u/(1 + u)}}. \]
Furthermore,
\[ \frac{u}{1 + u} = 1 - \frac{1}{x + 1} - \frac{c}{(2 - c)(x + 1)}, \]
which is equal to $r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)$ when $x = l_\alpha^2$. Therefore, we have
\[ h(l_\alpha^2) = \sqrt{\frac{1 - c}{2(2 - c)}}\,\log\left\{ g(l_\alpha^2) + \sqrt{g(l_\alpha^2)^2 - 1} \right\} = \sqrt{\frac{1 - c}{1 - c/2}}\;\frac{1}{2}\log\frac{1 + \sqrt{r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)}}{1 - \sqrt{r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)}}. \tag{3.3} \]
Let
\[ z = \frac{1}{2}\log\frac{1 + \sqrt{r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)}}{1 - \sqrt{r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)}}, \qquad \zeta = \frac{1}{2}\log\frac{1 + \sqrt{\tilde\rho_\alpha^2 - c(1 - \tilde\rho_\alpha^2)/(2 - c)}}{1 - \sqrt{\tilde\rho_\alpha^2 - c(1 - \tilde\rho_\alpha^2)/(2 - c)}}, \tag{3.4} \]
where $\tilde\rho_\alpha^2 = \rho_\alpha^2 + c(1 - \rho_\alpha^2)$ is the limit of $r_\alpha^2$ given in Section 2. Then, from Theorem 2.1 we have
\[ \sqrt{\frac{n(1 - c)}{1 - c/2}}\,(z - \zeta) \xrightarrow{d} N(0, 1). \tag{3.5} \]
In particular, letting $c = 0$ in (3.5), we have Fisher's z-transformation and its asymptotic normality in the large-sample case:
\[ \sqrt{n}\left\{ \frac{1}{2}\log\frac{1 + r_\alpha}{1 - r_\alpha} - \frac{1}{2}\log\frac{1 + \rho_\alpha}{1 - \rho_\alpha} \right\} \xrightarrow{d} N(0, 1). \tag{3.6} \]
Results (3.5) and (3.6) can be used to construct a confidence interval for $\rho_\alpha$. If A1 and A2 are satisfied, we have approximately
\[ P(u_1 \le \zeta \le u_2) \approx 1 - \delta, \]
where
\[ u_1 = z - u\!\left(\tfrac{1}{2}\delta\right)\frac{1}{\sqrt{n(1 - c)/(1 - c/2)}}, \qquad u_2 = z + u\!\left(\tfrac{1}{2}\delta\right)\frac{1}{\sqrt{n(1 - c)/(1 - c/2)}}, \]
and $u(\delta)$ is the upper $\delta$ point of $N(0, 1)$. The relationship $u_1 \le \zeta \le u_2$ is converted to the form of an interval as follows:
\[ 2u_1 \le \log\frac{1 + \sqrt{\tilde\rho_\alpha^2 - c(1 - \tilde\rho_\alpha^2)/(2 - c)}}{1 - \sqrt{\tilde\rho_\alpha^2 - c(1 - \tilde\rho_\alpha^2)/(2 - c)}} \le 2u_2 \;\Leftrightarrow\; \tanh(u_1) \le \sqrt{\tilde\rho_\alpha^2 - c(1 - \tilde\rho_\alpha^2)/(2 - c)} \le \tanh(u_2), \]
which is equivalent to
\[ \frac{2 - c}{2(1 - c)}\left\{ \tanh^2(u_1) - \frac{c}{2 - c} \right\} \le \rho_\alpha^2 \le \frac{2 - c}{2(1 - c)}\left\{ \tanh^2(u_2) - \frac{c}{2 - c} \right\}. \tag{3.7} \]
Letting $c = 0$ in (3.7), we obtain a confidence interval based on a large-sample framework as follows:
\[ \tanh(u_1) \le \rho_\alpha \le \tanh(u_2). \tag{3.8} \]
Transformation (3.3) is defined for $r_\alpha^2 \ge c/2$. As a general transformation it is suggested to replace $\sqrt{r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)}$ by $\sqrt{\,|r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)|\,}$. This modification was used in Table 3.1. The other modification replaces it by zero when $r_\alpha^2 < c/2$. These two modifications are almost the same, but the former is more useful. This is examined in the following numerical example.

Table 3.1 gives the actual confidence coefficients for the large-sample confidence interval (3.8) and the high-dimensional confidence interval (3.7) of $\rho_\alpha$ with the 95% confidence coefficient. The simulation with 100,000 repetitions was performed when $\rho_1 = 0.9$, $\rho_2 = 0.5$, $\rho_3 = 0.3$, and $q = 3$. The sample size $N$ and dimension $p$ were chosen as in Table 3.1. The marked values denote the ones obtained by replacing $r_\alpha^2 - c(1 - r_\alpha^2)/(2 - c)$ by its absolute value. The modifications replaced by zero were almost the same except for $\rho_3 = 0.3$, $N = 50$ and $p = 3, 7$. Table 3.1 shows that the high-dimensional confidence interval is more useful than the large-sample confidence interval over a wide range of $N$, $p$ and the population canonical correlations. The high-dimensional confidence interval is also applicable to a moderate sample size with $p = 2$ or 3.

Table 3.1
Actual confidence coefficients for the confidence intervals of $\rho_\alpha$ with the 95% confidence coefficient.
Columns: N | p | $\rho_1 = 0.9$ (Large, High) | $\rho_2 = 0.5$ (Large, High) | $\rho_3 = 0.3$ (Large, High)
"Large" and "High" mean the large-sample confidence interval (3.8) and the high-dimensional confidence interval (3.7), respectively.

Consider two settings: (i) N = 25, p = q = 2, ρ1 = , ρ2 = 0.054; (ii) N = 37, q = 2, p = 3, ρ1 = , ρ2 = , which are based on real data in [1, p. 505] and [10, p. 208], respectively. Then, we have large-sample and high-dimensional confidence intervals with a 95% confidence coefficient. Using a simulation, the actual confidence coefficients are obtained as follows.

Table 3.2
Actual confidence coefficients for the large-sample and high-dimensional confidence intervals of $\rho_1$ with the 95% confidence coefficient.
Columns: Case | Large | High
Rows: (i), (ii)

Table 3.2 shows that the high-dimensional confidence interval is more useful than the large-sample confidence interval even in a situation where the large-sample asymptotics are suitable.
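The construction of the interval (3.7) from the transformed statistic (3.4) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `extended_fisher_z`, `rho_from_z` and `high_dim_interval` are hypothetical, the absolute-value modification is applied inside the square root, and two choices not specified in the text are made explicit in the comments (a negative lower limit $u_1$ is replaced by 0 before squaring, and back-transformed values are clipped to $[0, 1]$).

```python
import math
from statistics import NormalDist


def extended_fisher_z(r, c):
    """Statistic z of (3.4), using the absolute-value modification."""
    w = r * r - c * (1.0 - r * r) / (2.0 - c)
    s = math.sqrt(abs(w))
    return 0.5 * math.log((1.0 + s) / (1.0 - s))


def rho_from_z(t, c):
    """Map a value t on the z scale back to rho, following (3.7)."""
    val = (2.0 - c) / (2.0 * (1.0 - c)) * (math.tanh(t) ** 2 - c / (2.0 - c))
    # Assumption of this sketch: clip to the admissible range [0, 1].
    return math.sqrt(min(max(val, 0.0), 1.0))


def high_dim_interval(r, n, p, level=0.95):
    """Approximate confidence interval for rho based on (3.5) and (3.7)."""
    c = p / n
    z = extended_fisher_z(r, c)
    u = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # upper delta/2 point
    half = u / math.sqrt(n * (1.0 - c) / (1.0 - c / 2.0))
    # Assumption of this sketch: tanh^2 is not monotone for negative
    # arguments, so a negative lower limit u_1 is replaced by 0.
    u1, u2 = max(z - half, 0.0), z + half
    return rho_from_z(u1, c), rho_from_z(u2, c)


if __name__ == "__main__":
    print(high_dim_interval(0.6, n=100, p=30))  # high-dimensional interval
    print(high_dim_interval(0.6, n=100, p=0))   # c = 0: classical Fisher interval
```

Setting `p=0` gives $c = 0$, in which case the sketch reduces to the classical interval (3.8), which is a convenient sanity check.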

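4. Asymptotic refinement

To make our asymptotic results more elaborate, we derive their asymptotic expansions. Let $y_\alpha$ be the random variable in Theorem 2.1. Then we see (e.g., [3,4]) that the first four cumulants have the forms
\[ \begin{aligned} \kappa_1(y_\alpha) &= E(y_\alpha) = \frac{1}{\sqrt{n}}\,a_{1\alpha} + O_{3/2}, \\ \kappa_2(y_\alpha) &= \mathrm{Var}(y_\alpha) = a_{2\alpha} + O_1, \\ \kappa_3(y_\alpha) &= \frac{1}{\sqrt{n}}\,a_{3\alpha} + O_{3/2}, \\ \kappa_4(y_\alpha) &= O_1, \end{aligned} \tag{4.1} \]
where the notation $O_i$ denotes a term of the $i$th order with respect to $(n^{-1}, p^{-1}, m^{-1})$. In fact, the coefficients can be expressed as follows (see Appendix):
\[ \begin{aligned} a_{1\alpha} &= (1 - c)^{-1}\left[ -(q - 1) - (q - 3)\theta_\alpha^2 + c(q - 1)(1 + \theta_\alpha^2) + (1 + \theta_\alpha^2)\{2\theta_\alpha^2 - c(1 + \theta_\alpha^2)\}\,d_\alpha \right], \\ a_{2\alpha} &= 2(1 - c)^{-1}(1 + \theta_\alpha^2)\{2\theta_\alpha^2 - c(1 + \theta_\alpha^2)\}, \\ a_{3\alpha} &= 8(1 - c)^{-2}(1 + \theta_\alpha^2)\left\{ 3\theta_\alpha^2(1 + 2\theta_\alpha^2) + c\{c - 2 + (2c - 7)\theta_\alpha^2 + (c - 5)\theta_\alpha^4\} \right\}, \end{aligned} \tag{4.2} \]
where $d_\alpha = \sum_{\beta \ne \alpha}(\theta_\alpha^2 - \theta_\beta^2)^{-1}$. Using the cumulant formulas, it is possible to give an asymptotic expansion of the distribution function of $y_\alpha$. For a general theory, see [3,4]. In the following, however, we give a general result for the distributions of $r_\alpha^2$ and its functions. Consider the transformed variable defined by
\[ z_\alpha = \sqrt{n}\,\{h(l_\alpha^2) - h(\theta_\alpha^2)\} \tag{4.3} \]
in Theorem 2.1. Here, it is assumed that $h(x)$ is three times continuously differentiable in the neighborhood of $x = \theta_\alpha^2$. Then, we can expand $z_\alpha$ as
\[ z_\alpha = \sqrt{n}\left\{ h'(\theta_\alpha^2)(l_\alpha^2 - \theta_\alpha^2) + \frac{1}{2}h''(\theta_\alpha^2)(l_\alpha^2 - \theta_\alpha^2)^2 + \cdots \right\} = h'(\theta_\alpha^2)\,y_\alpha + \frac{1}{\sqrt{n}}\,\frac{1}{2}h''(\theta_\alpha^2)\,y_\alpha^2 + \cdots. \]
This implies
\[ \begin{aligned} \kappa_1(z_\alpha) &= E(z_\alpha) = \frac{1}{\sqrt{n}}\,b_{1\alpha} + O_{3/2}, \\ \kappa_2(z_\alpha) &= \mathrm{Var}(z_\alpha) = b_{2\alpha} + O_1, \\ \kappa_3(z_\alpha) &= \frac{1}{\sqrt{n}}\,b_{3\alpha} + O_{3/2}, \end{aligned} \tag{4.4} \]
where
\[ \begin{aligned} b_{1\alpha} &= h'(\theta_\alpha^2)\,a_{1\alpha} + \frac{1}{2}h''(\theta_\alpha^2)\,a_{2\alpha}, \\ b_{2\alpha} &= h'(\theta_\alpha^2)^2\,a_{2\alpha} \;\; (= h'(\theta_\alpha^2)^2\,\tau_\alpha^2), \\ b_{3\alpha} &= h'(\theta_\alpha^2)^3\,a_{3\alpha} + 3h'(\theta_\alpha^2)^2\,h''(\theta_\alpha^2)\,a_{2\alpha}^2. \end{aligned} \tag{4.5} \]
It is easy to obtain the first two expressions in (4.5). For a derivation of the last expression, see the Appendix. We are especially interested in obtaining an asymptotic expansion of the distribution function of
\[ z = \sqrt{n}\left[ r_\alpha^2 - \{\rho_\alpha^2 + c(1 - \rho_\alpha^2)\} \right], \]
which can be expressed as $\sqrt{n}\,\{h(l_\alpha^2) - h(\theta_\alpha^2)\}$ with $h(x) = x/(1 + x)$. Then,
\[ h'(\theta_\alpha^2) = (1 + \theta_\alpha^2)^{-2} = (1 - c)^2(1 - \rho_\alpha^2)^2, \qquad h''(\theta_\alpha^2) = -2(1 + \theta_\alpha^2)^{-3} = -2(1 - c)^3(1 - \rho_\alpha^2)^3. \]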

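Using (4.2), the coefficients in (4.5) can be expressed as
\[ \begin{aligned} b_{1\alpha} &= (1 - \rho_\alpha^2)\Big[ -(q - 1) + 2(q - 2)\rho_\alpha^2 + c\{2(q - 1) - 2(q - 2)\rho_\alpha^2\} + \{2\rho_\alpha^2 + c(1 - 2\rho_\alpha^2)\}(1 - \rho_\alpha^2)\sum_{\beta \ne \alpha}(\rho_\alpha^2 - \rho_\beta^2)^{-1} \Big], \\ b_{2\alpha} &= \sigma_\alpha^2 = 2(1 - c)(1 - \rho_\alpha^2)^2\{2\rho_\alpha^2 + c(1 - 2\rho_\alpha^2)\}, \\ b_{3\alpha} &= 8(1 - c)(1 - \rho_\alpha^2)^3\{3\rho_\alpha^2 + c(1 - 3\rho_\alpha^2)\}\{1 - 3\rho_\alpha^2 - c(2 - 3\rho_\alpha^2)\}. \end{aligned} \tag{4.6} \]

Theorem 4.1. Let $r_\alpha$ be the $\alpha$th canonical correlation. Then, under A1 and A2, we have
\[ P\!\left( \sqrt{n}\,(r_\alpha^2 - \tilde\rho_\alpha^2)/\sigma_\alpha \le x \right) = \Phi(x) - \frac{1}{\sqrt{n}}\,\phi(x)\left\{ \frac{b_{1\alpha}}{\sigma_\alpha} + \frac{b_{3\alpha}}{6\sigma_\alpha^3}(x^2 - 1) \right\} + O_{3/2}, \tag{4.7} \]
where $\Phi$ and $\phi$ are the distribution function and the probability density function of $N(0, 1)$, respectively, $\tilde\rho_\alpha = \{\rho_\alpha^2 + c(1 - \rho_\alpha^2)\}^{1/2}$ and $\sigma_\alpha = \sqrt{b_{2\alpha}}$. The coefficients $b_{1\alpha}$, $b_{2\alpha}$ and $b_{3\alpha}$ are given by (4.6). Furthermore, let $h(r_\alpha^2)$ be a function of $r_\alpha^2$ such that the third derivative is continuous in the neighborhood of $r_\alpha^2 = \tilde\rho_\alpha^2$ and $h'(\tilde\rho_\alpha^2) \ne 0$. Then, the distribution function of $\sqrt{n}\,\{h(r_\alpha^2) - h(\tilde\rho_\alpha^2)\}/\tilde\tau_\alpha$ can be expanded as in (4.7) with the coefficients $\tilde b_{i\alpha}$ instead of $b_{i\alpha}$, where $\tilde\tau_\alpha = \tilde b_{2\alpha}^{1/2}$ and
\[ \begin{aligned} \tilde b_{1\alpha} &= h'(\tilde\rho_\alpha^2)\,b_{1\alpha} + \frac{1}{2}h''(\tilde\rho_\alpha^2)\,b_{2\alpha}, \\ \tilde b_{2\alpha} &= h'(\tilde\rho_\alpha^2)^2\,b_{2\alpha}, \\ \tilde b_{3\alpha} &= h'(\tilde\rho_\alpha^2)^3\,b_{3\alpha} + 3h'(\tilde\rho_\alpha^2)^2\,h''(\tilde\rho_\alpha^2)\,b_{2\alpha}^2. \end{aligned} \tag{4.8} \]

From Theorem 4.1, the upper percent point of $\sqrt{n}\,(r_\alpha^2 - \tilde\rho_\alpha^2)/\sigma_\alpha$ is given by
\[ u + \frac{1}{\sqrt{n}}\left\{ \frac{b_{1\alpha}}{\sigma_\alpha} + \frac{b_{3\alpha}}{6\sigma_\alpha^3}(u^2 - 1) \right\} \tag{4.9} \]
in terms of the upper percent point $u$ of $N(0, 1)$. This expansion is called the Cornish-Fisher expansion. The distribution of $\sqrt{n}\,(r_\alpha - \tilde\rho_\alpha)/\tilde\tau_\alpha$ can be obtained from Theorem 4.1 with $h(x) = \sqrt{x}$. In this case,
\[ h'(\tilde\rho_\alpha^2) = \frac{1}{2}\tilde\rho_\alpha^{-1}, \qquad h''(\tilde\rho_\alpha^2) = -\frac{1}{4}\tilde\rho_\alpha^{-3}. \]
Now, we shall see that the large-sample asymptotic expansion (e.g., see [11]) can be obtained by considering a large-sample expansion of the high-dimensional asymptotic expansion (4.7). The statistic can be written for large $n$ as
\[ \sqrt{n}\,(r_\alpha^2 - \tilde\rho_\alpha^2) = \sqrt{n}\,(r_\alpha^2 - \rho_\alpha^2) - \frac{1}{\sqrt{n}}(1 - \rho_\alpha^2)\,p. \]
The coefficient $b_{i\alpha}$ can be expanded as $g_{i\alpha} + O(n^{-1})$ when $n$ is large, where
\[ \begin{aligned} g_{1\alpha} &= (1 - \rho_\alpha^2)\Big\{ -(q - 1) + 2(q - 2)\rho_\alpha^2 + 2\rho_\alpha^2(1 - \rho_\alpha^2)\sum_{\beta \ne \alpha}(\rho_\alpha^2 - \rho_\beta^2)^{-1} \Big\}, \\ g_{2\alpha} &= 4(1 - \rho_\alpha^2)^2\rho_\alpha^2, \\ g_{3\alpha} &= 24(1 - \rho_\alpha^2)^3\rho_\alpha^2(1 - 3\rho_\alpha^2). \end{aligned} \tag{4.10} \]
Combining these with the shift $(1 - \rho_\alpha^2)p/\sqrt{n}$ of the centering constant, we obtain the well-known asymptotic expansion in the large-sample case:
\[ P\!\left( \sqrt{n}\,(r_\alpha^2 - \rho_\alpha^2)/\sqrt{g_{2\alpha}} \le x \right) = \Phi(x) - \frac{1}{\sqrt{n}}\,\phi(x)\left\{ \frac{g_{1\alpha} + (1 - \rho_\alpha^2)p}{\sqrt{g_{2\alpha}}} + \frac{g_{3\alpha}}{6(\sqrt{g_{2\alpha}})^3}(x^2 - 1) \right\} + O(n^{-3/2}). \tag{4.11} \]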
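The Cornish-Fisher approximation (4.9) to an upper percent point of $r_\alpha^2$ can be evaluated directly from the coefficients (4.6). The following is a hedged sketch, not the authors' implementation: the function names `edgeworth_coefficients` and `upper_point_r2` are hypothetical, the index `alpha` is zero-based, and the third-cumulant term is divided by 6 as in the standard first-order Cornish-Fisher form, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist


def edgeworth_coefficients(rho, alpha, c):
    """Coefficients b1, b2, b3 of (4.6) for the alpha-th (0-based) root."""
    rho2 = [x * x for x in rho]
    ra = rho2[alpha]
    q = len(rho)
    # Sum over beta != alpha of 1 / (rho_alpha^2 - rho_beta^2); roots must be simple.
    d = sum(1.0 / (ra - rb) for j, rb in enumerate(rho2) if j != alpha)
    k = 2.0 * ra + c * (1.0 - 2.0 * ra)
    b1 = (1.0 - ra) * (
        -(q - 1) + 2.0 * (q - 2) * ra
        + c * (2.0 * (q - 1) - 2.0 * (q - 2) * ra)
        + k * (1.0 - ra) * d
    )
    b2 = 2.0 * (1.0 - c) * (1.0 - ra) ** 2 * k
    b3 = (8.0 * (1.0 - c) * (1.0 - ra) ** 3
          * (3.0 * ra + c * (1.0 - 3.0 * ra))
          * (1.0 - 3.0 * ra - c * (2.0 - 3.0 * ra)))
    return b1, b2, b3


def upper_point_r2(rho, alpha, n, p, prob=0.05):
    """Approximate upper `prob` points of r_alpha^2: limiting term and (4.9)."""
    c = p / n
    ra = rho[alpha] ** 2
    b1, b2, b3 = edgeworth_coefficients(rho, alpha, c)
    sigma = math.sqrt(b2)
    u = NormalDist().inv_cdf(1.0 - prob)
    # Standard first-order Cornish-Fisher correction (factor 1/6 assumed).
    u_corrected = u + (b1 / sigma + b3 / (6.0 * sigma ** 3) * (u * u - 1.0)) / math.sqrt(n)
    center = ra + c * (1.0 - ra)  # rho-tilde squared
    scale = sigma / math.sqrt(n)
    return center + scale * u, center + scale * u_corrected


if __name__ == "__main__":
    rho = [0.9, 0.5, 0.3]
    for a in range(3):
        print(a + 1, upper_point_r2(rho, a, n=100, p=30))
```

The two returned values correspond to the limiting-term approximation and the approximation corrected up to order $n^{-1/2}$; with `c = 0` the coefficients `b2` and `b3` reduce to $g_{2\alpha}$ and $g_{3\alpha}$ of (4.10).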

5. Numerical accuracy

In this section, we compare our high-dimensional approximations with the classical approximations based on the asymptotic distribution under a large-sample framework such that $q$ and $p$ are fixed and $n$ tends toward infinity. The numerical accuracy is studied for the upper 5% points of the distribution of $r_\alpha^2$ when $q = 3$. The following values of the population canonical correlations were chosen:

$\rho_1 = 0.9$, $\rho_2 = 0.5$, $\rho_3 = 0.3$.

The upper 5% points of the distribution of $r_\alpha^2$ were approximated using the Cornish-Fisher expansion. The high-dimensional expansion is given by (4.9). The approximations using the limiting term and the expansion up to $O_{1/2}$ are denoted by $a_{H0}$ and $a_{H1}$, respectively. Similarly, the corresponding approximations in the large-sample case are denoted by $a_{L0}$ and $a_{L1}$; their Cornish-Fisher expansion is obtained from (4.11). The accuracy of these approximate upper 5% points is examined through the actual probabilities obtained by simulation with 100,000 repetitions. These values are given in Tables 5.1-5.3. From Tables 5.1-5.3, we can conclude that the following tendencies occur. The large-sample approximations work well only when $p$ is very small and $N$ is large. In the other cases, the large-sample approximations worsen. The approximation $a_{L0}$ tends to approach 1 as $p$ increases, while the approximation $a_{L1}$ tends to approach 0 as $p$ increases. If the large-sample approximations work well, the corresponding high-dimensional approximation works well. The high-dimensional approximations are good even when $p$ is one-half of $n$. The approximation $a_{H1}$ is better than the approximation $a_{H0}$, especially when the population canonical correlation is small. The high-dimensional approximation is good even when $n$ is small.

Table 5.1
Actual probabilities for the approximate upper 5% points of $r_1^2$.
Columns: N | p | $a_{L0}$ | $a_{L1}$ | $a_{H0}$ | $a_{H1}$

Table 5.2
Actual probabilities for the approximate upper 5% points of $r_2^2$.
Columns: N | p | $a_{L0}$ | $a_{L1}$ | $a_{H0}$ | $a_{H1}$

Table 5.3
Actual probabilities for the approximate upper 5% points of $r_3^2$.
Columns: N | p | $a_{L0}$ | $a_{L1}$ | $a_{H0}$ | $a_{H1}$

6. Conclusive remarks and discussion

In this paper we obtained asymptotic expansions as well as the limiting distributions of canonical correlations under the high-dimensional framework (1.1). Simulation experiments (Tables 5.1-5.3) showed that the high-dimensional approximations are better than the large-sample approximations over a wide range of $(p, q, N)$ except for $p$ and $q$ less than 3. The high-dimensional asymptotic expansions are useful for the distributions of the smallest canonical correlations. We proposed a transformation (3.3) whose asymptotic variance is parameter-free under a high-dimensional framework. It is an extension of Fisher's z-transformation. The new confidence interval (3.7) of $\rho_\alpha$ based on the transformed statistic was compared with the classical confidence interval based on Fisher's z-transformation. As is seen in Table 3.2, the new confidence interval is more useful than the classical one even in the cases of (i) $p = q = 2$, $N = 25$ and (ii) $q = 2$, $p = 3$, $N = 37$.

However, it should be pointed out that the high-dimensional approximations worsen when $q$ is large. An approach for overcoming this fault is to derive asymptotic distributions of canonical correlations under the following high-dimensional framework:
\[ q \to \infty,\; p \to \infty,\; n \to \infty,\; m_1 = n - p \to \infty,\; m_2 = n - q \to \infty,\; c_1 = p/n \to c_{01},\; c_2 = q/n \to c_{02} \in [0, 1). \]
This problem and the extension to a class of elliptical distributions, etc., are left as future research topics.

Acknowledgments

The authors would like to thank two referees for their valuable comments.

Appendix. Derivation of the asymptotic expansions

For our derivation, we use the following distributional reduction on the canonical correlations.

Lemma A.1. When we consider the distribution of a function of the canonical correlations $r_1 > \cdots > r_q$, without loss of generality, we may assume that:
(1) $A_{11\cdot 2}$ is distributed as a central Wishart distribution $W_q(m, I_q)$, where $m = n - p$.
(2) Let $B$ be the first $q \times q$ submatrix of $A_{22}$. Then, given $B$, $A_{12}A_{22}^{-1}A_{21}$ is distributed as a noncentral Wishart distribution $W_q(p, I_q; \Gamma B \Gamma)$, where $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_q)$ and $\gamma_\alpha = \rho_\alpha/\sqrt{1 - \rho_\alpha^2}$.
(3) $A_{12}A_{22}^{-1}A_{21}$ and $A_{11\cdot 2}$ are independent.
(4) $B$ is distributed as a central Wishart distribution $W_q(n, I_q)$.

The lemma was essentially obtained by Sugiura and Fujikoshi [12], except that (2) and (3) were given under a conditional setup. Note that (3) follows from the independence of $A_{11\cdot 2}$ and $B$. Let
\[ U = \sqrt{p}\left\{ \frac{1}{p}A_{12}A_{22}^{-1}A_{21} - \left( I_q + \frac{n}{p}\Gamma^2 \right) \right\}, \qquad V = \sqrt{m}\left( \frac{1}{m}A_{11\cdot 2} - I_q \right). \tag{A.1} \]
Then, it is well known that the limiting distribution of $V$ is normal. To show that the limiting distribution of $U$ is normal under A1, we consider the characteristic function of $U$. Let $T$ be a real symmetric matrix whose $(i, j)$ element is given by $(1 + \delta_{ij})t_{ij}/2$. Here, $\delta_{ij}$ is the Kronecker delta, i.e., $\delta_{ii} = 1$, $\delta_{ij} = 0$ $(i \ne j)$. The characteristic function of $U$ can be expressed as (e.g., see [7, p. 444])
\[ C_U(T) = E[\exp(i\,\mathrm{tr}\,TU)] = E_B[E\{\exp(i\,\mathrm{tr}\,TU) \mid B\}]. \]
The conditional characteristic function can be evaluated as
\[ C_U(T \mid B) = E[\exp(i\,\mathrm{tr}\,TU) \mid B] = \mathrm{etr}\left[ -\sqrt{p}\,iT\left( I_q + \frac{n}{p}\Gamma^2 \right) \right] \left| I_q - \frac{2i}{\sqrt{p}}T \right|^{-p/2} \mathrm{etr}\left\{ \frac{i}{\sqrt{p}}\,\Gamma T\left( I_q - \frac{2i}{\sqrt{p}}T \right)^{-1}\Gamma B \right\}. \]
Therefore,
\[ C_U(T) = \mathrm{etr}\left[ -\sqrt{p}\,iT\left( I_q + \frac{n}{p}\Gamma^2 \right) \right] \left| I_q - \frac{2i}{\sqrt{p}}T \right|^{-p/2} \left| I_q - \frac{2i}{\sqrt{p}}\,\Gamma T\left( I_q - \frac{2i}{\sqrt{p}}T \right)^{-1}\Gamma \right|^{-n/2}. \]
Under framework A1 we can expand $\log C_U(T)$ as
\[ \begin{aligned} \log C_U(T) = {} & -\sqrt{p}\,i\,\mathrm{tr}\,T\left( I_q + \frac{n}{p}\Gamma^2 \right) + \frac{p}{2}\left\{ \mathrm{tr}\left( \frac{2i}{\sqrt{p}}T \right) + \frac{1}{2}\mathrm{tr}\left( \frac{2i}{\sqrt{p}}T \right)^2 + \frac{1}{3}\mathrm{tr}\left( \frac{2i}{\sqrt{p}}T \right)^3 + \cdots \right\} \\ & + \frac{n}{2}\left\{ \mathrm{tr}\,Z + \frac{1}{2}\mathrm{tr}\,Z^2 + \frac{1}{3}\mathrm{tr}\,Z^3 + \cdots \right\}, \qquad Z = \frac{2i}{\sqrt{p}}\,\Gamma T\left( I_q - \frac{2i}{\sqrt{p}}T \right)^{-1}\Gamma. \end{aligned} \]
This implies the following lemma.

Lemma A.2. Let $U$ be the random matrix defined by (A.1). Then, under framework A1 we can expand the characteristic function of $U$ as
\[ C_U(T) = \exp\{ i^2\phi_0(T) \}\left[ 1 + \frac{i^3}{\sqrt{n}}\phi_1(T) + O_{3/2} \right], \tag{A.2} \]
where
\[ \phi_0(T) = c^{-1}\left\{ c\,\mathrm{tr}\,T^2 + 2\,\mathrm{tr}\,\Gamma^2T^2 + \mathrm{tr}(\Gamma^2T)^2 \right\}, \qquad \phi_1(T) = \frac{4}{3}c^{-3/2}\left\{ c\,\mathrm{tr}\,T^3 + 3\,\mathrm{tr}\,\Gamma^2T^3 + 3\,\mathrm{tr}\,\Gamma^2T\Gamma^2T^2 + \mathrm{tr}(\Gamma^2T)^3 \right\}. \]

Note that
\[ A_{11\cdot 2}^{-1/2} = \frac{1}{\sqrt{n}}(1 - c)^{-1/2}\left\{ I_q + \frac{1}{\sqrt{n}}(1 - c)^{-1/2}V \right\}^{-1/2} = \frac{1}{\sqrt{n}}(1 - c)^{-1/2}\left\{ I_q - \frac{1}{2\sqrt{n}}(1 - c)^{-1/2}V + \frac{3}{8n}(1 - c)^{-1}V^2 \right\} + O_{3/2}. \]
Similarly, expanding $A_{12}A_{22}^{-1}A_{21}$ in terms of $U$, under A1 we have the following perturbation expansion of $Q$:
\[ Q = A_{11\cdot 2}^{-1/2}A_{12}A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1/2} = \Theta^2 + \frac{1}{\sqrt{n}}Q^{(1)} + \frac{1}{n}Q^{(2)} + O_{3/2}, \tag{A.3} \]
where $\Theta^2 = \mathrm{diag}(\theta_1^2, \ldots, \theta_q^2) = (1 - c)^{-1}(cI_q + \Gamma^2)$, i.e., $\theta_\alpha^2 = (c + \gamma_\alpha^2)/(1 - c)$, $\alpha = 1, \ldots, q$, and
\[ \begin{aligned} Q^{(1)} &= \frac{\sqrt{c}}{1 - c}\,U - \frac{1}{2}(1 - c)^{-1/2}\left( \Theta^2V + V\Theta^2 \right), \\ Q^{(2)} &= -\frac{1}{2}(1 - c)^{-3/2}\sqrt{c}\,(UV + VU) + \frac{3}{8}(1 - c)^{-1}\left( \Theta^2V^2 + V^2\Theta^2 \right) + \frac{1}{4}(1 - c)^{-1}V\Theta^2V. \end{aligned} \]

Using the perturbation expansion (A.3) of $Q$ and a general result (e.g., see [11]) for a perturbation expansion of its characteristic roots, we can obtain
\[ y_\alpha = \sqrt{n}\,(l_\alpha^2 - \theta_\alpha^2) = q^{(1)}_{\alpha\alpha} + \frac{1}{\sqrt{n}}\left\{ q^{(2)}_{\alpha\alpha} + \sum_{\beta \ne \alpha}(\theta_\alpha^2 - \theta_\beta^2)^{-1}\left( q^{(1)}_{\alpha\beta} \right)^2 \right\} + O_1, \tag{A.4} \]
where $Q^{(i)} = (q^{(i)}_{\alpha\beta})$, $i = 1, 2$, and the elements are expressed as
\[ \begin{aligned} q^{(1)}_{\alpha\beta} &= \sqrt{c}\,(1 - c)^{-1}u_{\alpha\beta} - \frac{1}{2}(1 - c)^{-1/2}(\theta_\alpha^2 + \theta_\beta^2)\,v_{\alpha\beta}, \\ q^{(2)}_{\alpha\alpha} &= -\sqrt{c}\,(1 - c)^{-3/2}\sum_{\beta=1}^{q}v_{\alpha\beta}u_{\alpha\beta} + \frac{3}{4}(1 - c)^{-1}\theta_\alpha^2\sum_{\beta=1}^{q}v_{\alpha\beta}^2 + \frac{1}{4}(1 - c)^{-1}\sum_{\beta=1}^{q}v_{\alpha\beta}^2\theta_\beta^2. \end{aligned} \]
The limiting distribution of $y_\alpha$ is the same as that of
\[ q^{(1)}_{\alpha\alpha} = \sqrt{c}\,(1 - c)^{-1}u_{\alpha\alpha} - (1 - c)^{-1/2}\theta_\alpha^2\,v_{\alpha\alpha}. \]
Since the limiting distributions of $u_{\alpha\alpha}$ and $v_{\alpha\alpha}$ are independently normal with mean 0, the limiting distribution of $y_\alpha$ is normal with mean 0. To compute its asymptotic variance, we use
\[ E(v_{\alpha\beta}) = 0, \quad E(u_{\alpha\beta}) = 0, \quad E(v_{\alpha\beta}^2) = 1 + \delta_{\alpha\beta}, \quad E(u_{\alpha\beta}^2) = (1 + \delta_{\alpha\beta})\,c^{-1}\left\{ c + \gamma_\alpha^2 + \gamma_\beta^2 + \gamma_\alpha^2\gamma_\beta^2 \right\} + O_1. \tag{A.5} \]
The first three formulas in (A.5) are easily derived. The last one is obtained by differentiating both sides of (A.2) in Lemma A.2. These imply Theorem 2.1.

To prove Theorem 4.1, it is sufficient to show (4.2) and (4.6). Before deriving them, we derive $\kappa_3(z_\alpha)$ in (4.4). We can write
\[ \kappa_3(z_\alpha) = E(z_\alpha^3) - 3E(z_\alpha^2)E(z_\alpha) + 2E(z_\alpha)^3 = h'(\theta_\alpha^2)^3E(y_\alpha^3) + \frac{3}{2\sqrt{n}}h'(\theta_\alpha^2)^2h''(\theta_\alpha^2)E(y_\alpha^4) - \frac{3}{\sqrt{n}}h'(\theta_\alpha^2)^2a_{2\alpha}\left\{ h'(\theta_\alpha^2)a_{1\alpha} + \frac{1}{2}h''(\theta_\alpha^2)a_{2\alpha} \right\} + O_{3/2}. \]
Furthermore,
\[ \begin{aligned} E(y_\alpha^3) &= \kappa_3(y_\alpha) + 3\kappa_1(y_\alpha)\kappa_2(y_\alpha) + \kappa_1(y_\alpha)^3 = \frac{1}{\sqrt{n}}\{a_{3\alpha} + 3a_{1\alpha}a_{2\alpha}\} + O_{3/2}, \\ E(y_\alpha^4) &= \kappa_4(y_\alpha) + 4\kappa_3(y_\alpha)\kappa_1(y_\alpha) + 3\kappa_2(y_\alpha)^2 + 6\kappa_2(y_\alpha)\kappa_1(y_\alpha)^2 + \kappa_1(y_\alpha)^4 = 3a_{2\alpha}^2 + O_1. \end{aligned} \]
Substituting these expressions gives $b_{3\alpha}$ in (4.5). The formulas for $a_{1\alpha}$ and $a_{3\alpha}$ are derived by using (A.4) with the help of the moment formulas as in (A.5). However, to compute $a_{3\alpha}$, we need the following moment formulas:
\[ \begin{aligned} & E(v_{\alpha\alpha}^3) = 8/\sqrt{m} = 8(1 - c)^{-1/2}/\sqrt{n}, \qquad E(v_{\alpha\alpha}^4) = 12, \\ & E(u_{\alpha\alpha}u_{\alpha\beta}) = O_1 \;(\alpha \ne \beta), \qquad E(v_{\alpha\alpha}^2v_{\alpha\beta}^2) = 2 \;(\alpha \ne \beta), \\ & E(u_{\alpha\alpha}^3) = 8c^{-3/2}\{c + 3\gamma_\alpha^2 + 3\gamma_\alpha^4 + \gamma_\alpha^6\}/\sqrt{n} + O_{3/2}, \\ & E(u_{\alpha\alpha}^2u_{\alpha\beta}^2) = 2c^{-2}\left\{ c + 2\gamma_\alpha^2 + \gamma_\alpha^4 \right\}\left\{ c + \gamma_\alpha^2 + \gamma_\beta^2 + \gamma_\alpha^2\gamma_\beta^2 \right\} + O_1 \;(\alpha \ne \beta). \end{aligned} \tag{A.6} \]
Note that the formulas for $u_{\alpha\beta}$ in (A.6) are obtained by differentiating both sides of (A.2) in Lemma A.2. The formulas (4.6) are obtained by using (4.2) and (4.5) with $h(x) = x/(1 + x)$.

References

[1] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, third ed., John Wiley & Sons, New York.
[2] Z.D. Bai, Methodologies in spectral analysis of large dimensional random matrices: A review, Statist. Sinica.
[3] R.N. Bhattacharya, J.K. Ghosh, On the validity of the formal Edgeworth expansion, Ann. Statist.
[4] Y. Fujikoshi, Asymptotic expansions for the distributions of the sample roots under nonnormality, Biometrika.
[5] I.M. Johnstone, On the distribution of the largest eigenvalue in principal component analysis, Ann. Statist.
[6] O. Ledoit, M. Wolf, Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size, Ann. Statist. 30 (4) (2002).
[7] R.J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, New York, NY.
[8] S. Raudys, D.M. Young, Results in statistical discriminant analysis: A review of the former Soviet Union literature, J. Multivariate Anal.
[9] J.R. Schott, Testing for complete independence in high dimensions, Biometrika.
[10] M. Siotani, An Introduction to Multivariate Analysis, Asakura-shoten, Tokyo, 1990 (in Japanese).
[11] M. Siotani, T. Hayakawa, Y. Fujikoshi, Modern Multivariate Statistical Analysis: A Graduate Course and Handbook, American Sciences Press, Columbus, OH.
[12] N. Sugiura, Y. Fujikoshi, Asymptotic expansions of the non-null distributions of the likelihood ratio criteria for multivariate linear hypothesis and independence, Ann. Math. Statist. 40 (3) (1969).
[13] M.S. Srivastava, Methods of Multivariate Statistics, John Wiley & Sons, New York, 2002.
