Regularized LRT for Large Scale Covariance Matrices: One Sample Problem

Size: px
Start display at page:

Download "Regularized LRT for Large Scale Covariance Matrices: One Sample Problem"

Transcription

1 Regularized LRT for Large Scale ovariance Matrices: One Sample Problem Young-Geun hoi, hi Tim Ng and Johan Lim arxiv: v3 [stat.me] 0 Jun 206 August 6, 208 Abstract The main theme of this paper is a modification of the likelihood ratio test LRT for testing high dimensional covariance matrix. Recently, the correct asymptotic distribution of the LRT for a large-dimensional case the case p/n approaches to a constant γ 0, ] is specified by researchers. The correct procedure is named as corrected LRT. Despite of its correction, the corrected LRT is a function of sample eigenvalues that are suffered from redundant variability from high dimensionality and, subsequently, still does not have full power in differentiating hypotheses on the covariance matrix. In this paper, motivated by the successes of a linearly shrunken covariance matrix estimator simply shrinkage estimator in various applications, we propose a regularized LRT that uses, in defining the LRT, the shrinkage estimator instead of the sample covariance matrix. We compute the asymptotic distribution of the regularized LRT, when the true covariance matrix is the identity matrix and a spiked covariance matrix. The obtained asymptotic results have applications in testing various hypotheses on the covariance matrix. Here, we apply them to testing the identity of the true covariance matrix, which is a long standing problem in the literature, and show that the regularized LRT outperforms the corrected LRT, which is its non-regularized counterpart. In addition, we compare the power of the regularized LRT to those of recent non-likelihood based procedures. Keywords: Asymptotic normality; covariance matrix estimator; identity covariance matrix; high dimensional data; linear shrinkage estimator; linear spectral statistics; random matrix theory; regularized likelihood ratio test; spiked covariance matrix. Introduction High dimensional data are now prevalent everywhere that include genomic data in biology, Young-Geun hoi and Johan Lim are at Department of Statistics, Seoul National University, Seoul, Korea. hi Tim Ng is at Department of Statistics, honnam National University, Gwangju, Korea. All correspondences are to johanlim@snu.ac.kr

2 financial times series data in economics, and natural language processing data in machine learning and marketing. The traditional procedures that assume that sample size n is large and dimension p is fixed are not valid anymore for the analysis of high dimensional data. A significant amount of research are made to resolve the difficulty from the dimensionality of the data. This paper considers the inference problem of large scale covariance matrix whose dimension p is large compared to the sample size n. To be specific, we are interested in testing whether the covariance matrix equals to a given matrix; H 0 : Σ = Σ 0, where Σ 0 can be set I p without loss of generality. The likelihood ratio test LRT statistic for testing H 0 : Σ = I p is defined by LRT = tr p S n log S n p = li log l i, where S n is the unbiased and centered sample covariance matrix and l i is the i th largest eigenvalue of the sample covariance matrix. When p is finite, LRT follows the chi-square distribution with degrees of freedom pp + /2 asymptotically. However, this does not hold when p increases. Its correct asymptotic distribution is computed by Bai et al for the case p/n approaches γ 0, and both n and p increase. They further numerically show that their asymptotic normal distribution defines a valid procedure for testing H 0 : Σ = I p. The results of Bai et al are refined by Jiang et al. 202, which include the asymptotic null distribution for the case γ =. Despite of the correction of the null distribution, the sample covariance is known to have redundant variability when p is large, and it still remains a general question that the LRT is asymptotically optimal for testing problem in the n, p large scheme. In this paper, it is shown that the corrected LRT can be further improved by introducing a linear shrinkage component. In detail, we consider a modification of the LRT, denoted by regularized LRT rlrt, defined by rlrt = tr Σ log Σ p = p ψi log ψ i, where Σ is a regularized covariance matrix and ψ i is the i th largest eigenvalue of Σ. Here, we consider the regularization via linear shrinkage: Σ λs n + λi p. 2 2

3 We also occasionally notate rlrtλ to emphasize the use of the value λ. The linearly shrunken sample covariance matrix simply shrinkage estimator is known to reduce expected estimation loss of the sample covariance matrix Ledoit and Wolf, It is also successfully applied to many high-dimensional procedures to resolve the dimensionality problem. For example, Schäfer and Strimmer 2005 reconstruct a gene regulatory network from microarray gene expression data using the inverse of a regularized covariance matrix. hen et al. 20 propose a modified Hotelling s T 2 -statistic for testing high dimensional mean vectors and apply it to finding differentially expressed gene sets. We are motivated by the success of above examples and inspect whether the power can be improved by the reduced variability via linear shrinkage. To the best of our knowledge, our work is the first time to apply the linear shrinkage to the covariance matrix testing problem itself. We derive the asymptotic distribution of the proposed rlrtλ under two scenarios, i when Σ = I p for the null distribution, and additionally ii when Σ = Σ spike for power study. Here Σ spike means a covariance matrix from the spiked population model Johnstone, 200, roughly it is defined as a covariance matrix whose eigenvalues are all s but some finite nonunit spike. The spiked covariance matrix assumed here includes the well known compound symmetry matrix Σ cs ρ = I p + ρj p, where J p is the p p matrix of ones. The main results show that rlrtλ has normal distribution in asymptotic under both i and ii; their asymptotic means are different but the variances are same. The main results are useful in testing various one sample covariance matrices. To be specific, first, in testing H 0 : Σ = I p, i provides the asymptotic null distribution of rlrtλ. Second, combining i and ii provides the asymptotic power for an arbitrary spiked alternative covariance matrix including Σ cs ρ. Finally, the results with λ = provide various asymptotic distributions of the corrected LRT. Among these many applications, in this paper, we particularly focus on the LRT for testing H 0 : Σ = I p, which has long been studied by many researchers Anderson, 2003; Bai et al., 2009; hen et al., 200; Jiang et al., 202; Ledoit and Wolf, The paper is organized as follows. In Section 2, we briefly review results of the random matrix theory that are essential to the asymptotic theory of the proposed rlrt. The results include the limit of empirical spectral distribution ESD of the sample covariance matrix and the central limit theorem LT for linear spectral statistics LSS. In Section 3, we formally define the rlrt, and prove the asymptotic normality of the rlrt when the true 3

4 covariance matrix Σ is I p or Σ spike. In Section 4, the results developed in Section 3 are applied to testing H 0 : Σ = I p. Numerical study is provided to compare the powers of the LRT and other existing methods including the corrected LRT and other non-lrt tests by Ledoit and Wolf 2002 and hen et al In Section 5, we conclude the paper with discussions of several technical details of the rlrt, for example, close spiked eigenvalues. 2 Random matrix theory In this section, some useful properties of linear spectral statistics of the sample covariance matrix are introduced. The true covariance matrix Σ is identity or that from a spiked population model. The following notation is used throughout the paper. Let M be a real-valued symmetric matrix of size p p and α j M be the j th largest eigenvalue of the matrix M with natural labeling α p M α M. The spectral distribution SD for M is defined by F M t := p p δ αj Mt, t R, j= where and δ α t is a point mass function that can be also written, with notational abuse, as δ α t = Iα t. Here, IA denotes the indicator function of a set A. 2. Limiting spectral distribution of sample covariance matrix Let {z ij i,j be an infinite double array of independent and identically distributed realvalued random variables with Ez = 0, Ez 2 = and Ez 4 = 3. Let Z n = {z ij, i =, 2,..., n, j =, 2,..., p be the top-left n p block of the infinite double array. We assume that both n and p diverge and their ratio γ := p/n converges to a positive constant γ. The data matrix and the uncentered sample covariance matrix are X n = Z n Σ /2 p S 0 n = n X n X n respectively, where {Σ p, p =, 2,... is a sequence of p p nonrandom symmetric matrices. Note that the fourth moment condition Ez 4 = 3 is used later on in Proposition. In the random matrix theory literature Bai et al., 2009; Bai and Silverstein, 2004, the limiting distribution of empirical SD F S0 n is determined by both the limits of p/n and F Σp. Specifically, if H p := F Σp converges in distribution to H, a random 4 and

5 distribution function F S0 n converges in distribution to a nonrandom distribution function, say F γ,h with probability one. The definition of F γ,h is given by its Stieltjes transform m γ,h z that is the unique solution of the following system of equations: on z {z : Imz > 0. m γ,h z = γ mγ,h z + γ γz ; 3 z = m γ,h z + γ t dht 4 + tm γ,h z Generally, m γ,h z is known as the Stieltjes transform of the limiting SD of the so-called companion matrix for S 0 n, which is defined by S 0 n := n X nx n. The density of F γ,h can be calculated from m γ,h by the inversion formula, df γ,h dx x = lim z x γπ Im[mγ,H z], x R, z : Imz > 0. 5 In the special case Σ p = I p, we have H p t = δ t and the corresponding spectral distribution F γ,δ follows the Mar cenko-pastur law. To see this, note that the second equation of 4 can be rewritten as γ z = +, 6 m γ,δ z + m γ,δ z when H = δ. By the inversion formula, 5 yields the probability density function of the Mar cenko-pastur distribution indexed by γ when 0 < γ <, df γ,δ dx x = bγ xx aγ, aγ x bγ, 7 2πγx where aγ := γ 2 and bγ := + γ entral limit theorem for linear spectral statistics Many multivariate statistical procedures are based on F Sn, the empirical SD of the centered sample covariance matrix S n. onsider is a family of functionals of eigenvalues that is also called as linear spectral statistics LSS or linear eigenvalue statistics: p p gl j = j= gxdf Sn x where g is a function fulfilling certain complex-analytic conditions. 5

6 If the sample covariance matrix is uncentered, the central limit theorem LT for the corresponding LSS is developed in Bai and Silverstein 2004 and Bai et al The proposition below is adapted from Theorem 2. in Bai et al and used in the asymptotic property of the proposed regularized LRT under the null. Note that the centering term of the LT possesses a finite-dimensional proxy F γ,h p. Proposition Bai et al Let T n g be the functional { T n g = p gx d F S0 n x F γ,h p x. 8 Suppose that the two functions g and g 2, are complex analytic on an open domain containing an closed interval [aγ, bγ] on the real axis. If γ = p/n γ 0, and H p converges in distribution to δ, then the vector T n g, T n g 2 converges in distribution to a bivariate normal distribution with mean µg i = g iaγ + g i bγ 4 bγ g i x dx 9 2π aγ 4γ x γ 2 for i =, 2, and variance vg, g 2 = 2π 2 g z g 2 z 2 mz mz 2 2 dmz dmz 2, 0 where m = m γ,δ is defined in 6. The two contours in 0 are non-overlapping and containing [aγ, bγ], the support of F γ,δ. The proposition requires the data to have known population mean vector known as zero without loss of generality and considers the non-centered sample covariance matrix S 0 n. However, this is seldomly true in practice and it is common to use the unbiased and centered sample covariance matrix X n = B n Z n Σ /2 p and S n = /ñx n X n, where ñ = n and B n = I n /nj n. The extension of Proposition to the centered sample covariance matrix is studied recently in Zheng et al At the paper, the authors prove that, under the fourth moment assumptions the assumption Ez 4 = 3 in Section 2., Proposition is still valid for S n if T n g is redefined by { T n g = p gx d F Sn x F γ,h p x, 6

7 where ñ = n is the adjusted sample size and γ = p / ñ. This is named as substitution principle. The assumption H = δ roughly implies that the spectrum of Σ p is eventually concentrated around one. One simple example is Σ p = I p. Then the SD is H p = F Ip = δ, which trivially converges to δ. In addition, we note that the spiked population model Johnstone, 200 has H = δ as the limiting SD and is applicable to Proposition. In our settings, the spiked population model refers to the data whose covariance matrix Σ p has the following eigenvalue structure: Then the SD H p corresponding to Σ p is a,..., a {{, a 2,..., a 2,..., a {{ k,..., a k,,...,. 2 {{{{ n n k n 2 H p t = p K δ t + p p p K n i δ ai t, 3 where K = n +...+n k is a fixed finite integer, not depending on n so that p K eigenvalues of unity eventually dominate corresponding H p when p is large. Thus, the limiting SD remains unchanged as H = δ. To study the power of the proposed regularized LRT, it is useful to study how to apply the spiked population model to Proposition. Although the limiting SD is simple δ, the spiked population model has several difficulties with the use of Proposition. In the spiked model, its SD, H p in 3, has masses at K + distinct points and m γ,h p z is the solution to a polynomial equation of degree K + 2. A polynomial equation has an analytical solution only when its degree is less than or equal to 4. Therefore, if K 3, we do not have an analytic form of m γ,h p z. To resolve this difficulty, recently, Wang et al. 204 provides an approximation formula of gdf γ,h p in Proposition for the spiked population model. Such an approximation of gm γ,h p at m γ,δ is constructed based on the idea that m γ,h p z and m γ,δ z would be close enough if p is large. Proposition 2 Wang et al Suppose that γ < and H p is given by 3, with a i > γ for all i =, 2,..., K. If a complex-valued function g is analytic on an open domain containing the interval [aγ, bγ] and k points ϕa i := a i + γa i a i, i =, 2,..., K 7

8 on the real axis, then gxdf γ,h p x in 8 can be approximated by gxdf γ,h p x = g 2πip m + γ + m + f K 2πip m + γ + m + K gxdf γ,δ x + p p K K γ m a i n i + a i m + m n i a 2 i m dm 4 + a i m 2 m γ m dm 5 + m 2 n i gϕa i + O n 6 2 where m = m γ,δ is defined in 6 by substituting γ by γ, and is a counterclockwise contour [ ] enclosing the interval, γ + on the real axis. γ The above proposition is a special case of Theorem 2 in Wang et al. 204 when all the a i s are distant spikes, i.e., a i > γ. Note that Theorem 2 of Wang et al. 204 allows close spikes a i that is defined by a i γ. In this paper, we focus on the alternative hypothesis with distant spike in the power study. We finally remark that the substitution principle is directly applicable to Wang et al. s results, that is, one can approximate gxdf γ,h p x in by Proposition 2, with γ = p/n in the formula replaced by γ = p/n. 3 Main results In this section, the asymptotic results of the rlrt are presented. Here, the rlrt is defined via the linear shrinkage estimator instead of the sample covariance matrix : rlrtλ := tr Σ log Σ p, where Σ := λsn + λi p. The shrinkage intensity λ is fixed and chosen from 0,. Define ψx = λx + λ and gx = ψx log{ψx. We consider rlrtλ = p gxdf Sn x, whose sample covariance matrix term is of the centered version. Then Proposition along with the substitution principle yields the following results. Theorem. Let gx = ψx log{ψx and ψx = λx + λ with fixed λ 0,. Suppose that Σ p = I p. If γ = p / ñ γ 0, with ñ = n, then T n g = rlrtλ p gxdf γ,δ x 8

9 converges in distribution to the normal distribution with mean and variance where µg = log + λγ 2 4λ 2 γ 2 { vg = 2 λ M λ + γ λγ + λγ + 2π log + λγ 2λ γ cos θ dθ 7 4π 0 + N log M N, 8 M + N M, N = Mλ, γ, Nλ, γ := 2λ + λγ ± 2λ + λγ 2 + 4λ λ 2 λ The detailed proof of Theorem is given in Appendix A. Note that when λ =, µg = log γ/2 and vg = 2γ 2 log γ that are consistent with the results in Bai et al To see this, observe that the integral in the mean function 2π 0 log + λγ 2λ γ cos θ dθ approaches zero according to the dominate convergence theorem. In addition, it can be shown that M goes to / γ, and N goes to + as λ using the approximation formula x + x x + 2 x /2 x. Next, consider the finite-dimensional proxy gxdf γ,δ x. From the density function of Mar cenko-pastur law 7 and the fact that = xdf γ,δ x = df γ,δ x, 9 gxdf γ,δ x = = b γ a γ b γ a γ λx λ logλx + λ {b γ x{x a γ 2πx γ dx logλx + λ {b γ x{x a γ 2πx γ dx. By substituting x = + γ 2 γ cos θ, we have an alternative representation of the integral gxdf γ,δ x = 2 π log + λ γ 2λ γ cos θ π + γ 2 γ sin 2 θdθ. 20 cos θ 0 It is remarked that the right-hand-side of 20 can be evaluated via the standard numerical integration techniques. 9

10 Theorem 2 below establishes the asymptotic normality of the rlrt under the alternative hypothesis that the true covariance matrix from the spiked population model in Section 2. It follows directly from Proposition 2. Theorem 2. Let gx = ψx log{ψx and ψx = λx + λ with fixed λ 0,. Suppose that Σ p has SD H p t = p K δ p t + K p n iδ ai t as in 3 with a i > γ for all i =, 2,..., K. If γ = p/ñ γ 0, with ñ = n, then rlrtλ p gxdf γ,h p x N µg, vg, where µg, vg are defined in Theorem and p gxdf γ,h p x = K gxdf γ,δ x + K p p λ, γ + O with λ, γ = λ n i a i λ K K [ γ log M + K [ λ + n i λk M N n i log ψ{ϕa i K { a i M + + a i M a i γ M 2 ai n i log + a i M M + N + n 2 ] n i + a i M a i + a i MM + + γ M 2 M + 2 { ] a i γ + γ 2MN + M + N, a i M + N + where ϕa = a + γa a and both M and N in λ, γ are M = Mλ, γ and N = Nλ, γ. The proof of the above theorem is provided in Appendix A. Theorem 2 can be applied to Σ cs, the covariance matrix with compound symmetry, which is defined by + β/p β/p β/p β/p β β/p + β/p. Σ cs =.... = I p p + β. p J p β/p + β/p This matrix has a spiked eigenvalue structure; + β for one eigenvalue and for the other p eigenvalues. The corresponding SD is H p t = p δ p t + δ p +βt. Theorem 2 with K =, k =, n =, and a = + β gives the following corollary. 0

11 orollary. Let gx = ψx log{ψx and ψx = λx + λ with fixed λ 0,. Suppose that Σ p has SD H p t = p δ p t + δ p +βt with β > γ. If γ := p/ñ γ 0, where ñ = n, then rlrtλ p gxdf γ,h p x Nµg, vg, where µg, vg are defined in Theorem and gxdf γ,h p x = gxdf γ,δ x + p p λ, γ + O with λ, γ = λβ log ψ{ϕ + β + γ { λ + log M βm log { λ + λm N β + + βm n 2 + βm βm γ + βm βmm + + γ M 2 M + 2 where ϕa = a + γa a and both M and N in λ, γ are M = Mλ, γ and N = Nλ, γ. In the corollary, the condition β > γ means that the spiked eigenvalue + β is distant. 4 Testing the identity covariance matrix The results in Section 3 can be used for testing various hypotheses on one sample covariance matrix. In this section, we study the finite-sample properties of the proposed rlrt in testing H 0 : Σ = I p. Additionally, we compare the power of the proposed rlrt and the following existing procedures in the literature. Ledoit and Wolf 2002 assume that p / n γ 0, and propose a statistic T LW = { S p tr 2 I p n { p tr S 2 + p n. The asymptotic distribution of T LW is, if both n and p increase with p/n γ 0,, nt LW p converges in distribution to normal distribution with mean and variance 4.,

12 Bai et al and Zheng et al. 205 propose a corrected LRT for the cases where both n and p increases and p / n converges to γ 0,. The corrected LRT statistic is clrt = tr S n log S n p. They show that T lrt = vg {clrt /2 p gxdf γ,δ xdx µg converges in distribution to the standard normal distribution, where µg = log γ 2, vg = 2 log γ 2γ, and gxdf γ,δ xdx = γ γ log γ. Finally, hen et al. 200 proposes to use the statistic T = p V 2.n 2 p V.n +, where V.n = n V 2.n = P 2 n n + P 4 n i j X i X i P 2 n Xi X j i j X i X j 2 2 P 3 n Xi X j Xk X l i,j,k,l P r n = n!/n r! Xi X j Xj X k i,j,k and is the summation over different indices. The asymptotic theory suggests that, under the null, nt converges in distribution to the normal distribution with mean 0 and variance Power comparison with the clrt The asymptotic power curves of the clrt and rlrt for the alternative hypothesis of the compound symmetry can be obtained using orollary. When Σ p Σ cs β/p and γ γ, the probability of rejecting H 0 : Σ p = I p at level η is [ Φ Φ η + ] gxdf γ,δ x λ, γ, vg β > γ 22 2

13 where λ, γ is defined in orollary and Φ denotes the cumulative distribution function of the standard normal distribution. The powers of the clrt and rlrt with λ = 0.4 and λ = 0.7 are plotted in Figure. Each panel of Figure compares the powers of the clrt and rlrt for different sample size n. In each panel, the results of β < γ close spike are also included to study the performances when the the assumption of distant spike in orollary is violated. The results in Figure suggest that Theorem 2 would be applicable when there is a close spike eigenvalue. More detailed discussions are given in Section 5. We find that in all cases the rlrt has higher empirical power than the clrt for the chosen values of λ = 0.4 and 0.7. We also find that the empirical power curve increases to less rapidly if λ or γ is closer to. In addition, although we do not report the details, the empirical curves converge fast and do have minor changes after n = 80 for the selected values of λ and γ. To understand the power gain due to the use of the rlrt better, we plot the empirical density of the rlrt and clrt under the null and four alternative hypotheses A-A4 used in Section 4.2 in Figure 2. The figure shows that i the variances of the rlrt are smaller than the clrt under both null and alternative hypotheses and ii the distances between the null and the alternative distributions are larger in the rlrt than in the clrt. Accordingly, the rlrt has larger power than the clrt and we will see this is true regardless of the choice of n and p in the next section. 4.2 Power comparison with other existing procedures In this section, we numerically compare the empirical sizes and powers of the rlrt statistic to other existing tests, namely the corrected LRT clrt by Bai et al. 2009, the invariant test by Ledoit and Wolf 2002, and the non-parametric test by hen et al In this study, random samples of size n are generated from the p dimensional multivariate normal distribution MVN p 0, Σ. The covariance matrix is set as Σ = I p to obtain the empirical sizes. The sample size n is chosen as 20, 40, 80, and 60, and, for each n, γ = γ = p/n is chosen as 0.2, 0.5, and 0.8. For example, p = 32, 80, and 28 are considered for n = 60 in the simulation. We take 0.05 as as the level of significance. The shrinkage intensity λ of rlrtλ is selected from 0.2, 0.5, and 0.8 to investigate the effect of the magnitude of 3

14 Analytical n = 20 H_0 rejection prob β < γ H_0 rejection freq β β n = 40 n = 80 H_0 rejection freq H_0 rejection freq RLRT0.4 RLRT0.7 LRT β β Figure : Analytic and empirical power curves for the rlrt and clrt. the linear shrinkage. This means that we compare the empirical sizes of clrt, rlrt0.2, rlrt0.5, and rlrt0.8 under varying n and γ. To compare the powers, we consider the following four alternatives: A independent but heteroscedastic variance case Σ = diag2, 2,, 2,,,, where the number of 2 s is min{, 0.2p, where r is the round-down of r; A2 independent with a single diverging spiked eigenvalue as Σ = diag + 0.2p,,,, ; A3 compound symmetry Σ = Σ cs 0.2; and A4 compound symmetry Σ = Σ cs 0. as defined in 2. Here, the compound symmetry matrix Σ cs ρ has a single spiked eigenvalue + ρ p and p non-spiked eigenvalues 4

15 clrt rlrt Density Density Null A A2 A3 A Figure 2: Empirical density functions of the rlrt and the clrt under the null and four alternative hypotheses when n = 40, p = 32, and λ = 0.5: A independent but heteroscedastic variance case Σ = diag2, 2,, 2,,,, where the number of 2 s is min{, 0.2p, where r is the round-down of r; A2 independent with a single diverging spiked eigenvalue as Σ = diag + 0.2p,,,, ; A3 compound symmetry Σ = Σ cs 0.2; and A4 compound symmetry Σ = Σ cs 0. as defined in 2. of. Thus, A2 and A3 have the identical spectra. The sample size n and γ are chosen to be the same as those of the null. The empirical sizes and powers of the listed methods are reported in Table. First, the empirical sizes of all the tests approach to the aimed level 0.05 as n increases. However, the size of Ledoit and Wolf 2002 shows slower convergence and more upward bias than the size of the other tests do in all cases we considered here. For this reason, the power of Ledoit and Wolf 2002 after correcting the size empirically are also reported in Table, where the cut off value is decided based on 00, 000 simulated test statistics under the null for each simulation setting. Second, it can be seen that the emprical powers of the rlrts are higher than those of the clrt in all cases we considered. In addition, it is interesting to note that the power improvement is especially higher in the case γ = 0.8 when p is relatively largfe. Third, comparing to the tests of hen et al. 200 and the biased-corrected version of Ledoit and Wolf 2002, the proposed rlrt0.5 and rlrt0.2 has higher empirical power in most of the cases. Finally, we remark that the computational cost of hen et al. 200 is at least Opn 4 due to the fourth moment calculation so it is not suitable for data with 5

16 large n. In fact, to test the data with n = 500, hen et al. 200 takes tens of hours to finish all computation using codes and Intel ore-i7 PU, whereas the other tests only require seconds. 5 Discussion We conclude the paper with a few additional issues of the proposed rlrt not fully discussed in the mainbody of the paper. First, we consider the case where n < p; equivalently, γ. In this case, the clrt is not well-defined because the logarithm term, log S n = i logl i, contains some zero l i s. On the other hand, the rlrt is still well-defined as the corresponding logartithm term i log ψl i remains positive even if λ <. Since the LT of the linear spectral statistics holds for γ [0, Bai and Silverstein, 2004, it is possible to extend Theorem and 2 in this paper to the case where γ [,. Second, we discuss the case with closely spiked eigenvalues. In the compound symmetry model of 2, the close spiked eigenvalues are those where the spike + β is smaller than + γ. As shown in Figure, it appears that the power curves over the interval β 0, γ could be obtained simply by extending the formula of orollary to 0, γ. We remark that, however, if we follow Wang et al. 204, the term log ψ + β in orollary should be omitted when β 0, γ, leading to incoherence between the analytical and empirical power curves on 0, γ. Finally, the selection of shrinkage intensity λ is still not well understood for hypothesis testing. As a reviewer points out, when λ approaches 0, the rlrt becomes irrelevant with the alternative covariance matrix and its power is expected to be close to the size. Thus, an appropriate selection of λ is important for good performance of the rlrt. The selection of λ for the purpose of improved estimation is well studied in the literature, for example, Ledoit and Wolf 2004, Schäfer and Strimmer 2005 and Warton However, our additional numerical study shows that such a choice of λ for the estimation purpose cannot achieve a power gain in testing problem. The optimal selection of λ for the hypothesis testing needs further research. 6

17 γ n clrt rlrt rlrt rlrt hen LW Null A: Indep with hetero variance A2: Indep with single diversing spike p A3: ompound symmetry with ρ = A4: ompound symmetry with ρ = Table : Summary of sizes and powers over 00,000 replications except for hen et al. 200 and,000 replications for hen et al. 200 due to heavy computation. The empirically corrected powers of Ledoit and Wolf 2002 are reported in the parentheses. For each row, the maximum powers are highlighted in bold and the maximum powers among the four LRT-based tests are underlined. The powers for n = 60 are all equal to and are removed from the table. 7

18 Acknowledgements This paper is supported by National Research Foundation of Korea NRF grant funded by the government MSIP No The authors are grateful to two reviewers and the AE for a careful reading and providing improvements. A Proof of the main theorems A. Proof of Theorem Theorem is a direct consequence of Proposition. Here, we calculate the integrals 9 and 0 for gz = g z = g 2 z = ψz logψz, where ψz = λz + λ. Mean: Using Proposition and the substitution x = + γ 2 γ cos θ, µg i = g iaγ + g i bγ 4 = λγ log + λγ 2 4λ 2 γ 2 π 2π 0 bγ g i x 2π dx aγ 4γ x γ 2 λγ 2λ γ cos θ log + λγ 2λ γ cos θ dθ = log + λγ 2 4λ 2 γ 2 = log + λγ 2 4λ 2 γ 2 + π 2π + 4π 0 2π 0 log + λγ 2λ γ cos θ dθ log + λγ 2λ γ cos θ dθ. Variance: We write m := mz and m 2 = mz 2 for notational simplicity. We have vg = gz m gz 2π 2 2 m 2 m m 2 dm dm 2 2, 23 To evaluate this integral with auchy s formula, we need to identify the points of singularity in gzm. It can be seen that there is singularity when ψzm = 0. Rewrite ψzm as ψzm = λ + γ + λ = λm Mm N m m + m m + 8

19 where M, N = 2λ + λγ ± 2λ + λγ 2 + 4λ λ. 2 λ Then, M, N are points of singularity. Next, choose contours and 2 enclosing and M, but not 0 and N, such that on the contours, the logarithm in gz is single-valued. In addition, and 2 are chosen so that they do not overlap. Applying integration by parts and auchy s formula, we have gzm m m 2 dm 2 { λ λγ = m m Then, Here, { = 2πi m 2 λγ m m m 2 M vg = 2π 2 = i { gz 2 m 2 π gzm2 m + m M m N gz m gz 2 m 2 m m 2 dm dm 2 2 λγ m m m 2 M m m 2 dm dm = m 2 + dm 2 2 { λ m 2 2 = 2πi + N λγ m m 2 + m 2 + m 2 M m 2 N m 2 + dm 2 Applying integration by parts, auchy s formula, and Lemma, we have gzm m m 2 dm 2 Then, { λ = { = 2πi λγ m m m + m M m N λγ m m m 2 M m 2 vg = 2π 2 = i { gz 2 m 2 π gz m gz 2 m 2 m m 2 dm dm 2 2 λγ m m m 2 M m m 2 dm dm

20 Using integration by parts, gzm2 m 2 + dm 2 2 { λ λγ = m 2 2 m m 2 m 2 + m 2 M m 2 N m 2 + dm 2 = 2πi λ N Applying Lemma and auchy s formula, we obtain gzm2 m 2 + dm 2 = 2πi {log λ + N 28 and { gzm2 m 2 M dm 2 = 2πi λ M λm N λ log M ombining the results 26-29, we have the desired result of variance. A.2 Proof of Theorem 2 29 We compute gxdf γ,h p x = {λx λ logλx + λ df γ,h p x, where H p is the SD of spiked population model, which is written as H p t = p K δ t + p p Following the lines of Section 3 in Wang et al. 204, λx λ df γ,h p x = λ p n i δ ai t. n i a i λk p + O n. 2 The difficult part lies on the evaluation of integration of the logarithm-related term. Using the labeling in Proposition 2, we rewrite it as logλx + λdf γ,h p x = and calculate 4, 5 and 6 separately. In the remainder of the proof, we write m and γ instead of m and γ, respectively, for notational convenience. The terms 4 and 5 20

21 involve contour integrals. Recall that the contour on 4 and 5 encloses the closed interval [ γ, + γ ] on the real axis of the complex plane and has poles of {m =, {m = M, where M = Mλ, γ is defined in 9. It is easy to show that < M < γ and N > 0 provided γ 0, and n is large, where N = Nλ, γ is from 9. + γ Following the lines of Section 3.3 of Wang et al. 204, recall that λ + γ + λ = m +m λm Mm N. We have mm+ 4 = 2πip = = log 2πipγ 2πipγ = K 2πipγ log λ m + A + A 2. λ m + log λm N m γ K K + m + λ γm γ + λ +m m + log m M m+ m log m M m+ dm + m 2πipγ K K n i a 2 i m dm + a i m 2 n i a 2 i m 2 γ dm + a i m 2 n i a 2 i m 2 γ dm + a i m 2 log m M K n i a 2 i mγ m + + a i m dm 2 Here, and A 2 = = A = K log m M d log m 2πipγ m + K = log m d log m M 2πipγ m + K log m = M + 2πipγ m + m M dm 2πip 2πip = K pγ log M, log m M K m + A 3 A 4, log m M ni a i m + n i a 2 i m + a i m 2 dm + a i m + a i m 2 dm 2

22 where and A 3 = = A 4 = 2πip 2πip = 2πip = 2πip M + K = p = log m M n i a i m + + a i m dm n i log m M d log + ai m m + n i log + a i m d log m M m + n i log + a i m m + m M dm n i log a i p 2πip 2πip = M + 2πip = p n i log + a i M, log m M n i a i m + + a i m dm 2 n i + a i m d log m M m + n i + a i mm Mm + dm n i + a i M a i. ombiing the results of A + A 2 = A + A 3 A 4, we have 4 = K pγ log M + p ai n i log + a i M p n i + a i M. a i Next, consider 5. Taking f = log ψ, we have log ψ m + γ λ = + m λx + λ x= m + γ +m = λmm + λm Mm N. 22

23 Then 5 = 2πip = 2πip λ λ = 2πip λ λ log ψ m + n i γ + m K n i a i + a i m n i + m m γm dm + m 2 mm + a i m Mm N + a i m + m m γm dm + m 2 n i B B 2 B 3 + B 4, where and a i m + B = m Mm N + a i m dm a i M + = 2πi, + a i MM N a i γm 2 B 2 = m Mm N + a i mm + dm a i γm 2 = 2πi M N + a i MM + + a i γ, M + N + a i B 3 = m Mm N dm = 2πi M N, B 4 = γm 2 m Mm Nm + dm 2 = 2πi γm 2 γ2mn + M + N M NM + 2 M + 2 N + 2. ollecting the four terms, we have : 5 λ = p λ [ { a i M + n i M N + a i M a i γm 2 + a i MM + + γm 2 M + 2 { ] a i γ γ2mn + M + N +. M + N + a i M + N + To obtain 6, note that the integration term logψxdf γ,δ x is equal to {ψx+ 23

24 logψx df γ,δ x since M-P law satisfies xdf γ,δ x = df γ,δ x. This gives 6 = K gxdf γ,δ x + n i log ψ{ϕa i + O p p n 2 where ϕa i = a i + γa i a i. Finally, combining the four results, we finally obtain the centering term : {ψx logψx df γ,h p x = = λx λ df γ,δ x K p gxdf γ,δ x + p λ, γ + O n 2, where n = λ n i a i λ K K [ γ log M + K [ λ + n i λk M N A.3 Technical lemma n i log ψ{ϕa i K { a i M + + a i M a i γm 2 ai n i log + a i M M + N + { ] n i + a i M a i + a i MM + + γm 2 M + 2 ] a i γ γ2mn + M + N +. a i M + N + Lemma. Let z A and z B be any two different fixed complex numbers. Then, a for any contour enclosing z A and z B such that log z z A z z B is single-valued on the contour, we have z z A log z z A z z B dz = 0, b for any contour enclosing z A and z B, we have dz z z A z z B = 0 24

25 Figure 3: Illustration of the path of the contour integral Proof. a Let be a contour enclosing such that both and are clockwise anticlockwise, and on, we have z z A < z B z A see Figure 3. Then, D, the singly connected region between and as indicated in Figure 3 contains no singularity. Therefore, the integral on and are the same. Next, consider the power series expansion log z z A = log + z A z B z z B z z { A za z B = 2 za z B + 3 za z B.... z z A 2 z z A 3 z z A Such power series converges on. The desired result is a consequence of dz z z A i = 0 for all i = 2, 3,.... b Applying auchy s formula, we have { dz z z A z z B = z A z B z z A z z B dz = z A z B = 0. References Anderson, T. W An Introduction to Multivariate Statistical Analysis. Wiley- Interscience, 3rd edition. 25

26 Bai, Z., Jiang, D., Yao, J.-F., and Zheng, S orrections to LRT on large-dimensional covariance matrix by RMT. The Annals of Statistics, 376B: Bai, Z. and Yao, J On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis, 06: Bai, Z. D. and Silverstein, J. W. J LT for linear spectral statistics of largedimensional sample covariance matrices. The Annals of Probability, 32A: ai, T. T. and Ma, Z Optimal hypothesis testing for high dimensional covariance matrices. Bernoulli, 95B: hen, L. S., Paul, D., Prentice, R. L., and Wang, P. 20. A regularized Hotelling s T-test for pathway analysis in proteomic studies. Journal of the American Statistical Association, 06496: hen, S., Zhang, L., and Zhong, P Tests for high-dimensional covariance matrices. Journal of the American Statistical Association, 05490: Jiang, D., Jiang, T., and Yang, F Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. Journal of Statistical Planning and Inference, 428: Johnstone, I. M On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 292: Ledoit, O. and Wolf, M Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. The Annals of Statistics, 304: Ledoit, O. and Wolf, M A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 882: Pan, G omparison between two types of large sample covariance matrices. Annales de l Institut Henri Poincaré, Probabilités et Statistiques, 502: Pan, G. M. and Zhou, W entral limit theorem for signal-to-interference ratio of reduced rank linear receiver. The Annals of Applied Probability, 83:

27 Schäfer, J. and Strimmer, K A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical applications in genetics and molecular biology, 4. Wang,., Yang, J., Miao, B., and ao, L Identity tests for high dimensional data using RMT. Journal of Multivariate Analysis, 8: Wang, Q., Silverstein, J. W., and Yao, J.-f A note on the LT of the LSS for sample covariance matrix from a spiked population model. Journal of Multivariate Analysis, 30: Warton, D. I Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 0348: Won, J.-H., Lim, J., Kim, S.-J., & Rajaratnam, B ondition number regularized covariance estimation. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 753: Zheng, S., Bai, Z. D, and Yao, J-F Substitution principle for LT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. The Annals of Statistics, 43:

On corrections of classical multivariate tests for high-dimensional data

On corrections of classical multivariate tests for high-dimensional data On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics

More information

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR Introduction a two sample problem Marčenko-Pastur distributions and one-sample problems Random Fisher matrices and two-sample problems Testing cova On corrections of classical multivariate tests for high-dimensional

More information

Statistical Inference On the High-dimensional Gaussian Covarianc

Statistical Inference On the High-dimensional Gaussian Covarianc Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional

More information

TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST

TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST Submitted to the Annals of Statistics arxiv:. TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST By Dandan Jiang Jilin University arxiv:151.398v1 [stat.me] 11 Oct 15 This paper

More information

Non white sample covariance matrices.

Non white sample covariance matrices. Non white sample covariance matrices. S. Péché, Université Grenoble 1, joint work with O. Ledoit, Uni. Zurich 17-21/05/2010, Université Marne la Vallée Workshop Probability and Geometry in High Dimensions

More information

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX Submitted to the Annals of Statistics HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX By Shurong Zheng, Zhao Chen, Hengjian Cui and Runze Li Northeast Normal University, Fudan

More information

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Tiefeng Jiang 1 and Fan Yang 1, University of Minnesota Abstract For random samples of size n obtained

More information

Large sample covariance matrices and the T 2 statistic

Large sample covariance matrices and the T 2 statistic Large sample covariance matrices and the T 2 statistic EURANDOM, the Netherlands Joint work with W. Zhou Outline 1 2 Basic setting Let {X ij }, i, j =, be i.i.d. r.v. Write n s j = (X 1j,, X pj ) T and

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix

Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix Taras Bodnar a, Holger Dette b and Nestor Parolya c a Department of Mathematics, Stockholm University, SE-069

More information

Estimation of the Global Minimum Variance Portfolio in High Dimensions

Estimation of the Global Minimum Variance Portfolio in High Dimensions Estimation of the Global Minimum Variance Portfolio in High Dimensions Taras Bodnar, Nestor Parolya and Wolfgang Schmid 07.FEBRUARY 2014 1 / 25 Outline Introduction Random Matrix Theory: Preliminary Results

More information

Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference

Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Alan Edelman Department of Mathematics, Computer Science and AI Laboratories. E-mail: edelman@math.mit.edu N. Raj Rao Deparment

More information

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p. Title A new test for the proportionality of two large-dimensional covariance matrices Authors) Liu, B; Xu, L; Zheng, S; Tian, G Citation Journal of Multivariate Analysis, 04, v. 3, p. 93-308 Issued Date

More information

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix SHOTA KATAYAMA AND YUTAKA KANO Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama,

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

High-dimensional asymptotic expansions for the distributions of canonical correlations

High-dimensional asymptotic expansions for the distributions of canonical correlations Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic

More information

Weiming Li and Jianfeng Yao

Weiming Li and Jianfeng Yao LOCAL MOMENT ESTIMATION OF POPULATION SPECTRUM 1 A LOCAL MOMENT ESTIMATOR OF THE SPECTRUM OF A LARGE DIMENSIONAL COVARIANCE MATRIX Weiming Li and Jianfeng Yao arxiv:1302.0356v1 [stat.me] 2 Feb 2013 Abstract:

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

Covariance estimation using random matrix theory

Covariance estimation using random matrix theory Covariance estimation using random matrix theory Randolf Altmeyer joint work with Mathias Trabs Mathematical Statistics, Humboldt-Universität zu Berlin Statistics Mathematics and Applications, Fréjus 03

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2 EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut

More information

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Johannes Heiny University of Aarhus Joint work with Thomas Mikosch (Copenhagen), Richard Davis (Columbia),

More information

Appendix A: The time series behavior of employment growth

Appendix A: The time series behavior of employment growth Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time

More information

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications J.P Bouchaud with: M. Potters, G. Biroli, L. Laloux, M. A. Miceli http://www.cfm.fr Portfolio theory: Basics Portfolio

More information

MOMENTS OF HYPERGEOMETRIC HURWITZ ZETA FUNCTIONS

MOMENTS OF HYPERGEOMETRIC HURWITZ ZETA FUNCTIONS MOMENTS OF HYPERGEOMETRIC HURWITZ ZETA FUNCTIONS ABDUL HASSEN AND HIEU D. NGUYEN Abstract. This paper investigates a generalization the classical Hurwitz zeta function. It is shown that many of the properties

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

An introduction to G-estimation with sample covariance matrices

An introduction to G-estimation with sample covariance matrices An introduction to G-estimation with sample covariance matrices Xavier estre xavier.mestre@cttc.cat Centre Tecnològic de Telecomunicacions de Catalunya (CTTC) "atrices aleatoires: applications aux communications

More information

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA) The circular law Lewis Memorial Lecture / DIMACS minicourse March 19, 2008 Terence Tao (UCLA) 1 Eigenvalue distributions Let M = (a ij ) 1 i n;1 j n be a square matrix. Then one has n (generalised) eigenvalues

More information

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Takahiro Nishiyama a,, Masashi Hyodo b, Takashi Seo a, Tatjana Pavlenko c a Department of Mathematical

More information

Nonparametric Eigenvalue-Regularized Precision or Covariance Matrix Estimator

Nonparametric Eigenvalue-Regularized Precision or Covariance Matrix Estimator Nonparametric Eigenvalue-Regularized Precision or Covariance Matrix Estimator Clifford Lam Department of Statistics, London School of Economics and Political Science Abstract We introduce nonparametric

More information

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title itle On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime Authors Wang, Q; Yao, JJ Citation Random Matrices: heory and Applications, 205, v. 4, p. article no.

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Likelihood Ratio Tests for High-dimensional Normal Distributions

Likelihood Ratio Tests for High-dimensional Normal Distributions Likelihood Ratio Tests for High-dimensional Normal Distributions A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Fan Yang IN PARTIAL FULFILLMENT OF THE

More information

Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond

Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond arxiv:1707.04010v2 [math.st] 17 Apr 2018 Xinxin Yang Department of ISOM, Hong Kong University of Science and Technology

More information

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices by Jack W. Silverstein* Department of Mathematics Box 8205 North Carolina State University Raleigh,

More information

Tests for Covariance Matrices, particularly for High-dimensional Data

Tests for Covariance Matrices, particularly for High-dimensional Data M. Rauf Ahmad Tests for Covariance Matrices, particularly for High-dimensional Data Technical Report Number 09, 200 Department of Statistics University of Munich http://www.stat.uni-muenchen.de Tests for

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Multivariate Time Series: Part 4

Multivariate Time Series: Part 4 Multivariate Time Series: Part 4 Cointegration Gerald P. Dwyer Clemson University March 2016 Outline 1 Multivariate Time Series: Part 4 Cointegration Engle-Granger Test for Cointegration Johansen Test

More information

RENORMALIZATION OF DYSON S VECTOR-VALUED HIERARCHICAL MODEL AT LOW TEMPERATURES

RENORMALIZATION OF DYSON S VECTOR-VALUED HIERARCHICAL MODEL AT LOW TEMPERATURES RENORMALIZATION OF DYSON S VECTOR-VALUED HIERARCHICAL MODEL AT LOW TEMPERATURES P. M. Bleher (1) and P. Major (2) (1) Keldysh Institute of Applied Mathematics of the Soviet Academy of Sciences Moscow (2)

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0.

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0. Matrices Operations Linear Algebra Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0 The rectangular array 1 2 1 4 3 4 2 6 1 3 2 1 in which the

More information

Topic 4 Notes Jeremy Orloff

Topic 4 Notes Jeremy Orloff Topic 4 Notes Jeremy Orloff 4 auchy s integral formula 4. Introduction auchy s theorem is a big theorem which we will use almost daily from here on out. Right away it will reveal a number of interesting

More information

1 Discussion on multi-valued functions

1 Discussion on multi-valued functions Week 3 notes, Math 7651 1 Discussion on multi-valued functions Log function : Note that if z is written in its polar representation: z = r e iθ, where r = z and θ = arg z, then log z log r + i θ + 2inπ

More information

Near-exact approximations for the likelihood ratio test statistic for testing equality of several variance-covariance matrices

Near-exact approximations for the likelihood ratio test statistic for testing equality of several variance-covariance matrices Near-exact approximations for the likelihood ratio test statistic for testing euality of several variance-covariance matrices Carlos A. Coelho 1 Filipe J. Marues 1 Mathematics Department, Faculty of Sciences

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

The Calculus of Residues

The Calculus of Residues hapter 7 The alculus of Residues If fz) has a pole of order m at z = z, it can be written as Eq. 6.7), or fz) = φz) = a z z ) + a z z ) +... + a m z z ) m, 7.) where φz) is analytic in the neighborhood

More information

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 COWLES FOUNDATION DISCUSSION PAPER NO. 1815 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS

More information

Hypothesis testing in multilevel models with block circular covariance structures

Hypothesis testing in multilevel models with block circular covariance structures 1/ 25 Hypothesis testing in multilevel models with block circular covariance structures Yuli Liang 1, Dietrich von Rosen 2,3 and Tatjana von Rosen 1 1 Department of Statistics, Stockholm University 2 Department

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

Fractal functional regression for classification of gene expression data by wavelets

Fractal functional regression for classification of gene expression data by wavelets Fractal functional regression for classification of gene expression data by wavelets Margarita María Rincón 1 and María Dolores Ruiz-Medina 2 1 University of Granada Campus Fuente Nueva 18071 Granada,

More information

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX September 2007 MSc Sep Intro QT 1 Who are these course for? The September

More information

1 Intro to RMT (Gene)

1 Intro to RMT (Gene) M705 Spring 2013 Summary for Week 2 1 Intro to RMT (Gene) (Also see the Anderson - Guionnet - Zeitouni book, pp.6-11(?) ) We start with two independent families of R.V.s, {Z i,j } 1 i

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao A note on a Marčenko-Pastur type theorem for time series Jianfeng Yao Workshop on High-dimensional statistics The University of Hong Kong, October 2011 Overview 1 High-dimensional data and the sample covariance

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

On testing the equality of mean vectors in high dimension

On testing the equality of mean vectors in high dimension ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.

More information

Random Matrices for Big Data Signal Processing and Machine Learning

Random Matrices for Big Data Signal Processing and Machine Learning Random Matrices for Big Data Signal Processing and Machine Learning (ICASSP 2017, New Orleans) Romain COUILLET and Hafiz TIOMOKO ALI CentraleSupélec, France March, 2017 1 / 153 Outline Basics of Random

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 11 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,, a n, b are given real

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Chemistry 5.76 Revised February, 1982 NOTES ON MATRIX METHODS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Chemistry 5.76 Revised February, 1982 NOTES ON MATRIX METHODS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Chemistry 5.76 Revised February, 198 NOTES ON MATRIX METHODS 1. Matrix Algebra Margenau and Murphy, The Mathematics of Physics and Chemistry, Chapter 10, give almost

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Notes on the Matrix-Tree theorem and Cayley s tree enumerator

Notes on the Matrix-Tree theorem and Cayley s tree enumerator Notes on the Matrix-Tree theorem and Cayley s tree enumerator 1 Cayley s tree enumerator Recall that the degree of a vertex in a tree (or in any graph) is the number of edges emanating from it We will

More information

A matrix over a field F is a rectangular array of elements from F. The symbol

A matrix over a field F is a rectangular array of elements from F. The symbol Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted

More information

Putzer s Algorithm. Norman Lebovitz. September 8, 2016

Putzer s Algorithm. Norman Lebovitz. September 8, 2016 Putzer s Algorithm Norman Lebovitz September 8, 2016 1 Putzer s algorithm The differential equation dx = Ax, (1) dt where A is an n n matrix of constants, possesses the fundamental matrix solution exp(at),

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

An Introduction to Large Sample covariance matrices and Their Applications

An Introduction to Large Sample covariance matrices and Their Applications An Introduction to Large Sample covariance matrices and Their Applications (June 208) Jianfeng Yao Department of Statistics and Actuarial Science The University of Hong Kong. These notes are designed for

More information

High Dimensional Covariance and Precision Matrix Estimation

High Dimensional Covariance and Precision Matrix Estimation High Dimensional Covariance and Precision Matrix Estimation Wei Wang Washington University in St. Louis Thursday 23 rd February, 2017 Wei Wang (Washington University in St. Louis) High Dimensional Covariance

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

arxiv: v1 [math-ph] 19 Oct 2018

arxiv: v1 [math-ph] 19 Oct 2018 COMMENT ON FINITE SIZE EFFECTS IN THE AVERAGED EIGENVALUE DENSITY OF WIGNER RANDOM-SIGN REAL SYMMETRIC MATRICES BY G.S. DHESI AND M. AUSLOOS PETER J. FORRESTER AND ALLAN K. TRINH arxiv:1810.08703v1 [math-ph]

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

FRACTIONAL HYPERGEOMETRIC ZETA FUNCTIONS

FRACTIONAL HYPERGEOMETRIC ZETA FUNCTIONS FRACTIONAL HYPERGEOMETRIC ZETA FUNCTIONS HUNDUMA LEGESSE GELETA, ABDULKADIR HASSEN Both authors would like to dedicate this in fond memory of Marvin Knopp. Knop was the most humble and exemplary teacher

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

Numerical Solutions to the General Marcenko Pastur Equation

Numerical Solutions to the General Marcenko Pastur Equation Numerical Solutions to the General Marcenko Pastur Equation Miriam Huntley May 2, 23 Motivation: Data Analysis A frequent goal in real world data analysis is estimation of the covariance matrix of some

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Anderson-Darling Type Goodness-of-fit Statistic Based on a Multifold Integrated Empirical Distribution Function

Anderson-Darling Type Goodness-of-fit Statistic Based on a Multifold Integrated Empirical Distribution Function Anderson-Darling Type Goodness-of-fit Statistic Based on a Multifold Integrated Empirical Distribution Function S. Kuriki (Inst. Stat. Math., Tokyo) and H.-K. Hwang (Academia Sinica) Bernoulli Society

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information