Effects of data dimension on empirical likelihood


Biometrika Advance Access published August 18, 2009. Biometrika (2009), pp. 1–12. doi:10.1093/biomet/asp037
© 2009 Biometrika Trust. Printed in Great Britain.

BY SONG XI CHEN
Department of Statistics, Iowa State University, Iowa 50011-1210, U.S.A. songchen@iastate.edu

LIANG PENG
School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia 30332-0160, U.S.A. peng@math.gatech.edu

AND YING-LI QIN
Department of Statistics, Iowa State University, Iowa 50011-1210, U.S.A. qinyl@iastate.edu

SUMMARY

We evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model. Data dimension and dependence among components of the multivariate random vector affect the empirical likelihood directly through the trace and the eigenvalues of the covariance matrix. The growth rates to infinity we obtain for the data dimension improve the rates of Hjort et al. (2009).

Some key words: Asymptotic normality; Data dimension; Empirical likelihood; High-dimensional data.

1. INTRODUCTION

Since Owen (1988, 1990) introduced empirical likelihood, it has been extended to a wide range of settings as a tool for nonparametric and semiparametric inference. Its most attractive property is that it permits likelihood-like inference in nonparametric or semiparametric settings, largely because it shares two key properties with the conventional likelihood: Wilks' theorem and Bartlett correction (Hall & La Scala, 1990; DiCiccio et al., 1991; Chen & Cui, 2006).

High-dimensional data are increasingly common, for instance in DNA and genetic microarray analysis, marketing research and financial applications. There is a rapidly expanding literature on multivariate analysis in which the data dimension $p$ depends on the sample size $n$ and grows to infinity as $n \to \infty$; see, for example, Portnoy (1984, 1985) in the context of M-estimation, Bai & Saranadasa (1996) for a two-sample test for means, Ledoit & Wolf (2002) and Schott (2005) for testing a specific covariance structure, and Schott (2007) for tests with more than two samples. Given the interest in both high-dimensional data and empirical likelihood, there is a need to evaluate the behaviour of the latter when the data dimension and the sample size increase simultaneously. In this paper, we evaluate the effects of the data dimension and dependence on the asymptotic normality of the empirical likelihood ratio statistic for the mean.

Let $X_1, \ldots, X_n$ be independent and identically distributed $p$-dimensional random vectors in $R^p$ with mean vector $\mu = (\mu_1, \ldots, \mu_p)^T$ and nonsingular variance matrix $\Sigma$. Let

$$L_n(\mu) = \sup\Big\{ \prod_{i=1}^n \pi_i : \pi_i \geq 0, \ \sum_{i=1}^n \pi_i = 1, \ \sum_{i=1}^n \pi_i X_i = \mu \Big\} \qquad (1)$$

be the empirical likelihood for $\mu$ and let $w_n(\mu) = -2\log\{n^n L_n(\mu)\}$ be the empirical likelihood ratio statistic. When $p$ is fixed, Owen (1988, 1990) showed that

$$w_n(\mu) \to \chi^2_p \qquad (2)$$

in distribution as $n \to \infty$, which mimics Wilks' theorem for parametric likelihood ratios. An extension of the above result to parameters defined by general estimating equations is given in Qin & Lawless (1994).

As $p \to \infty$ for high-dimensional data, the natural substitute for (2) is

$$(2p)^{-1/2}\{w_n(\mu) - p\} \to N(0, 1) \qquad (3)$$

in distribution as $n \to \infty$, since $\chi^2_p$ is asymptotically normal with mean $p$ and variance $2p$. A key question is how large the dimension $p$ can be while (3) remains valid. In a recent study, Hjort et al. (2009) established that it is $p = o(n^{1/3})$ under the following assumptions:

Assumption 1. The eigenvalues of $\Sigma$ are uniformly bounded away from zero and infinity.

Assumption 2. All components of $X_i$ are uniformly bounded random variables.

When Assumption 2 is relaxed, we have:

Assumption 3. $E(\|p^{-1/2}X_i\|^q)$ and $p^{-1}\sum_{j=1}^p E(|X_{ij} - \mu_j|^q)$ are bounded for some $q \geq 4$, where $\|\cdot\|$ is the Euclidean norm.

Hjort et al. (2009) showed that (3) is valid if $p^{3+6/(q-2)}/n \to 0$. When $q = 4$ in Assumption 3, this means $p = o(n^{1/6})$. Hence, there is a significant slowing-down in the rate of $p$ when Assumption 2 is weakened.

Tsao (2004) found that, when $p$ is moderately large but fixed, the distribution of $w_n(\mu)$ has an atom at infinity for fixed $n$: the probability of $w_n(\mu) = \infty$ is nonzero. Tsao showed that, if $p$ and $n$ increase at the same rate such that $p/n \geq 0.5$, the probability of $w_n(\mu) = \infty$ converges to 1, since the probability of $\mu$ being contained in the convex hull of the sample converges to 0. These reveal the effects of $p$ on the empirical likelihood from another perspective.

In this paper, we analyze the empirical likelihood for high-dimensional data under a general multivariate model, which facilitates a more detailed analysis than Hjort et al. (2009) and allows less restrictive conditions. The analysis requires neither the largest eigenvalue of $\Sigma$ nor $E(\|p^{-1/2}X_i\|^q)$ to be bounded, and hence accommodates a wider range of dependence among the components of $X_i$. Our main finding is that the effects of the dimensionality and of the dependence among the components of $X_i$ on the empirical likelihood are leveraged through $\mathrm{tr}(\Sigma)$, the trace of the covariance matrix, and its largest eigenvalue $\gamma_p$. We provide a general rate for the dimension $p$, which is shown to depend on $\mathrm{tr}(\Sigma)$ and $\gamma_p$. In particular, under Assumptions 1 and 2, $p = o(n^{1/2})$, which improves the $p = o(n^{1/3})$ of Hjort et al. (2009). This is likely to be the best rate for $p$ in the context of the empirical likelihood, as $p = o(n^{1/2})$ is the sufficient and necessary condition for the convergence of the sample covariance matrix to $\Sigma$ under the trace norm when all the eigenvalues of $\Sigma$ are bounded.
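Tsao's convex-hull phenomenon is easy to probe numerically: $w_n(\mu)$ is finite precisely when $\mu$ lies in the convex hull of $X_1, \ldots, X_n$, and hull membership is a linear-programming feasibility problem. The sketch below is our illustration, not anything from the paper; the helper `mu_in_hull` is a name we introduce, and the experiment estimates $P\{w_n(\mu) = \infty\}$ for standard normal data at several $p/n$ ratios.

```python
# Sketch: estimate P{w_n(mu) = infinity} = P{mu outside conv(X_1,...,X_n)}
# for standard normal data, illustrating Tsao (2004). Illustrative only.
import numpy as np
from scipy.optimize import linprog

def mu_in_hull(X, mu):
    """Check feasibility of: pi >= 0, sum pi = 1, sum pi X_i = mu."""
    n, p = X.shape
    A_eq = np.vstack([(X - mu).T, np.ones(n)])   # p+1 equality constraints
    b_eq = np.concatenate([np.zeros(p), [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0                        # feasible => w_n(mu) finite

rng = np.random.default_rng(1)
n = 40
for ratio in (0.2, 0.5, 0.8):                     # p/n ratios
    p = int(ratio * n)
    misses = sum(not mu_in_hull(rng.standard_normal((n, p)), np.zeros(p))
                 for _ in range(200))
    print(f"p/n = {ratio:.1f}: estimated P(w_n = inf) ~ {misses / 200:.2f}")
```

The estimated probability is negligible for small $p/n$ and climbs towards one as $p/n$ passes $0.5$, in line with Tsao's result.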

2. PRELIMINARIES

Suppose that each of the independent and identically distributed observations $X_i \in R^p$ is specified by $X_i = \Gamma Z_i + \mu$, where $\Gamma$ is a $p \times m$ matrix, $m \geq p$, and $Z_i = (Z_{i1}, \ldots, Z_{im})^T$ is a random vector such that

$$E(Z_i) = 0, \quad \mathrm{var}(Z_i) = I_m, \quad E(Z_{il}^{4k}) = m_{4k} \in (0, \infty),$$
$$E\big(Z_{il_1}^{\alpha_1} \cdots Z_{il_q}^{\alpha_q}\big) = E\big(Z_{il_1}^{\alpha_1}\big) \cdots E\big(Z_{il_q}^{\alpha_q}\big) \ \text{ whenever } \sum_{l=1}^q \alpha_l \leq 4k \text{ and } l_1 \neq l_2 \neq \cdots \neq l_q. \qquad (4)$$

Here $k$ is some positive integer and $I_m$ is the $m$-dimensional identity matrix. The above multivariate model, employed in Bai & Saranadasa (1996), means that each $X_i$ is a linear transformation of some $m$-variate random vector $Z_i$. An important feature is that $m$, the dimension of $Z_i$, is arbitrary provided $m \geq p$ and $\Gamma\Gamma^T = \Sigma$, which can generate a rich collection of $X_i$ from $Z_i$ with the given covariance. The model also requires that power transformations of different components of $Z_i$ are uncorrelated, which is weaker than assuming that they are independent.

The model (4) encompasses many multivariate models. It includes the elliptically contoured distributions with $Z_i = R U^{(m)}$, where $R$ is a nonnegative random variable and $U^{(m)}$ is the uniform random vector on the unit sphere (Fang & Zhang, 1990). The multivariate normal and $t$-distributions are elliptically contoured, and so is a mixture of normal distributions whose density is defined by $\int_0^\infty n(x \mid \mu, v\Sigma)\,dw(v)$, where $n(x \mid \mu, \Sigma)$ is the density of $N(\mu, \Sigma)$ and $w(v)$ is the distribution function of a nonnegative univariate random variable (Anderson, 2003).

Both the moment conditions and the correlation structure are imposed on $Z_i$ rather than on $X_i$. This model structure allows the moments of $\|X_i - \mu\|$ to be derived, and allows us to conduct a more detailed analysis than is possible in Hjort et al. (2009). The integer $k$ determines the number of finite moments for $Z_{il}$. As $k \geq 1$, each $Z_{il}$ has at least finite fourth moments. This is the minimal moment condition to ensure the convergence of the largest eigenvalue of the sample covariance matrix to the largest eigenvalue of $\Sigma$ (Yin et al., 1988; Bai et al., 1988), and hence the convergence of the sample covariance matrix to $\Sigma$ under the matrix norm based on the largest eigenvalue. By inspecting the proofs given in the Appendix, we see that a divergent sample covariance matrix would dramatically alter the asymptotic mean and variance of the empirical likelihood ratio. Hence, it is unclear if (3) would remain true.

From the standard empirical likelihood solutions (Owen, 1988, 1990), which are valid for any $p$, fixed or growing, the optimal weights $\pi_i$ for the optimization problem (1) are

$$\pi_i = \frac{1}{n}\,\frac{1}{1 + \lambda^T(X_i - \mu)},$$

where $\lambda \in R^p$ is a Lagrange multiplier satisfying

$$g(\lambda) = \frac{1}{n}\sum_{i=1}^n \frac{X_i - \mu}{1 + \lambda^T(X_i - \mu)} = 0. \qquad (5)$$

Hence, the empirical likelihood $L_n(\mu)$ equals $\prod_{i=1}^n [n\{1 + \lambda^T(X_i - \mu)\}]^{-1}$. As the maximum empirical likelihood is attained at $\pi_i = n^{-1}$ $(i = 1, \ldots, n)$, the empirical likelihood ratio statistic is

$$w_n(\mu) = -2\log\{n^n L_n(\mu)\} = 2\sum_{i=1}^n \log\{1 + \lambda^T(X_i - \mu)\}.$$
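Equation (5) and the product expression for $L_n(\mu)$ translate directly into a small numerical routine: solve the $p$-dimensional root-finding problem (5) for $\lambda$, then accumulate $2\sum_i \log\{1 + \lambda^T(X_i - \mu)\}$. The following sketch is our illustration, not code from the paper; the function name `el_ratio` is ours, and it uses a damped Newton iteration, one common way of solving (5) while keeping every weight $1 + \lambda^T(X_i - \mu)$ positive.

```python
# Sketch: compute w_n(mu) = 2 * sum log{1 + lambda'(X_i - mu)} by solving (5)
# with a damped Newton iteration; illustrative, not from the paper.
import numpy as np

def el_ratio(X, mu, iters=50, tol=1e-10):
    n, p = X.shape
    Y = X - mu                                   # centred data
    lam = np.zeros(p)                            # Lagrange multiplier
    for _ in range(iters):
        w = 1.0 + Y @ lam
        g = Y.T @ (1.0 / w) / n                  # g(lambda), equation (5)
        if np.linalg.norm(g) < tol:
            break
        J = -(Y / w[:, None] ** 2).T @ Y / n     # Jacobian of g
        step = np.linalg.solve(J, -g)
        # damp the step until all 1 + lambda'(X_i - mu) stay positive
        t = 1.0
        while np.any(1.0 + Y @ (lam + t * step) <= 0) and t > 1e-8:
            t *= 0.5
        lam = lam + t * step
    return 2.0 * np.sum(np.log1p(Y @ lam))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
print(el_ratio(X, np.zeros(5)))                  # approx a chi^2_5 draw, cf. (2)
```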

Throughout the paper we let $\gamma_1(A) \leq \cdots \leq \gamma_p(A)$ denote the eigenvalues of a matrix $A$ and let $\mathrm{tr}(A)$ denote its trace. When $A = \Sigma$, we write $\gamma_j(\Sigma)$ as $\gamma_j$ $(j = 1, \ldots, p)$. It is assumed throughout the paper that $\gamma_1 \geq C$ for some positive constant $C$.

3. EFFECTS OF HIGH DIMENSION

The Lagrange multiplier $\lambda$ defined in (5) is a key element in any empirical likelihood formulation, and reflects the implicit nature of the methodology. When $p$ is fixed, Owen (1990) showed that

$$\|\lambda\| = O_p(n^{-1/2}). \qquad (6)$$

This has been the prevailing order for $\lambda$ except in nonparametric curve estimation, where $n$ is replaced by the effective sample size (Chen, 1996). When $p$ grows with $n$, (6) is no longer valid.

THEOREM 1. If $\{\mathrm{tr}(\Sigma)\}^{4k-1}\gamma_p = O(n^{2k-1})$ and $\gamma_p^2 p^2 = o(n)$, then $\|\lambda\| = O_p[\{\mathrm{tr}(\Sigma)/n\}^{1/2}]$.

Theorem 1 implies that the effect of the dimension and of the dependence among the components of $X_i$ on the Lagrange multiplier is directly determined through $\mathrm{tr}(\Sigma)$ and $\gamma_p$. The rate for $\|\lambda\|$ can be regarded as a generalization of (6) for fixed $p$, since $O_p[\{\mathrm{tr}(\Sigma)/n\}^{1/2}]$ degenerates to $O_p(n^{-1/2})$ in that case.

We first study the effects of dimension on the asymptotic normality of $w_n(\mu)$ assuming existence of the minimal fourth moment for each $Z_{il}$; later, we will increase the number of moments. We assume for the time being that $k = 1$ in (4) and that $\mathrm{tr}^5(\Sigma)\gamma_p^5 = o(np)$. Since $p\gamma_1 \leq \mathrm{tr}(\Sigma) \leq p\gamma_p$, this implies the conditions of Theorem 1.

We wish to establish an expansion for $w_n(\mu)$. Put $W_i = \lambda^T(X_i - \mu)$. From (A7) of the Appendix, $\max_{i=1,\ldots,n}|W_i| = o_p(1)$, which allows

$$\log\{1 + \lambda^T(X_i - \mu)\} = W_i - W_i^2/2 + W_i^3/3 - W_i^4/\{4(1 + \xi_i)^4\}, \qquad (7)$$

where $|\xi_i| \leq |\lambda^T(X_i - \mu)|$. Expand (5) so that $0 = g(\lambda) = \bar X - \mu - S_n\lambda + \beta_n$, where $\beta_n = n^{-1}\sum_{i=1}^n (X_i - \mu)W_i^2/(1 + \tilde\xi_i)$ for some $|\tilde\xi_i| \leq |\lambda^T(X_i - \mu)|$ and $S_n = n^{-1}\sum_{i=1}^n (X_i - \mu)(X_i - \mu)^T$. Hence,

$$\lambda = S_n^{-1}(\bar X - \mu) + S_n^{-1}\beta_n. \qquad (8)$$

From (7) and (8), we obtain an expansion for $w_n(\mu)$:

$$w_n(\mu) = n(\bar X - \mu)^T S_n^{-1}(\bar X - \mu) - n\beta_n^T S_n^{-1}\beta_n + \frac{2}{3}\sum_{i=1}^n W_i^3 - \frac{1}{2}\sum_{i=1}^n \frac{W_i^4}{(1 + \xi_i)^4}$$
$$= n(\bar X - \mu)^T \Sigma^{-1}(\bar X - \mu) + n(\bar X - \mu)^T(S_n^{-1} - \Sigma^{-1})(\bar X - \mu) - n\beta_n^T S_n^{-1}\beta_n + R_n\{1 + o_p(1)\}, \qquad (9)$$

where $R_n = \frac{2}{3}\sum_{i=1}^n \{\lambda^T(X_i - \mu)\}^3$. This expansion looks similar to that given in Owen (1990) for a fixed $p$, but the stochastic order of each term requires careful evaluation as $p$ grows with $n$.

From Lemma 5 in the Appendix, we have

$$(2p)^{-1/2}\{n(\bar X - \mu)^T \Sigma^{-1}(\bar X - \mu) - p\} \to N(0, 1) \qquad (10)$$

in distribution as $n \to \infty$, which is true under much weaker conditions, for instance $p/n \to c$ for some constant $c \geq 0$, by applying the martingale central limit theorem. Derivations given in the Appendix show that the other two terms on the right-hand side of (9) are both $o_p(p^{1/2})$. These lead us to establish (3), as summarized in the following theorem.

THEOREM 2. If $k = 1$ in (4) and $\mathrm{tr}^5(\Sigma)\gamma_p^5 = o(np)$, then (3) is valid.

Theorem 2 indicates that, when $\gamma_p$ is bounded, (3) is true if $p = o(n^{1/4})$, which improves the order $p = o(n^{1/6})$ obtained by Hjort et al. (2009) under a finite fourth moment condition on $X_i$ that we do not need in our study. The conditions assumed in Theorem 2 are liberal compared with Assumptions 1 and 2, and there is no explicit restriction on $\gamma_p$, which may diverge to infinity as $n \to \infty$.

Next we show that the dimension $p$ can increase more rapidly if $Z_{il}$ possesses more than four moments. Assuming higher-order moments allows us to evaluate the terms in (9) more accurately. Specifically, we will assume that $Z_{il}$ has at least a finite 8th moment, that is $k \geq 2$ in model (4); the case $k = 1$ is covered by Theorem 2. The following theorem, whose proof is given in an Iowa State University technical report available from the authors, shows that $p = o(n^{1/2})$ is approachable.

THEOREM 3. If $k \geq 2$ in (4), $\{\mathrm{tr}(\Sigma)\}^{4k-1}\gamma_p = O(n^{2k-1})$ and $p^2\gamma_p^5 = o\{n^{(4k-2)/(4k-1)}\}$, then (3) is valid.

When $\gamma_p$ is bounded, Theorem 3 implies that $w_n(\mu)$ is asymptotically normally distributed if $p = o(n^{1/2 - 1/(8k-2)})$, which is close to $o(n^{1/2})$ as $k \to \infty$ and improves the earlier rate $o(n^{1/3})$ attained in Hjort et al. (2009). By reviewing the proof of Theorem 3, we can see that if the $Z_{ij}$ are all bounded random variables then the dimensionality $p$ can reach $o(n^{1/2})$. We believe that $p = o(n^{1/2})$ is the best rate for the asymptotic normality of the empirical likelihood ratio with the normalizing constants $p$ and $(2p)^{1/2}$, based on the following considerations. Lemma 4 in the Appendix implies that, when the largest eigenvalue of $\Sigma$ is bounded, $\|S_n - \Sigma\|_{\mathrm{tr}} \to 0$ in probability if and only if $p = o(n^{1/2})$; here $\|A\|_{\mathrm{tr}} = \{\mathrm{tr}(A^T A)\}^{1/2}$ is the trace norm. Bai & Yin (1993) established the convergence of $S_n$ to $\Sigma$ with probability one if $p = o(n)$ under the matrix norm based on the largest eigenvalue, assuming that the $Z_{il}$ are independent and identically distributed. However, it can be seen from our proofs in the technical report that the convergence of $S_n$ to $\Sigma$ under the trace norm is the one used in establishing the various results for the empirical likelihood.

As shown by Theorems 2 and 3, when (3) is valid, the asymptotic mean and variance of the empirical likelihood ratio are respectively $p$ and $2p$, which are known. This means that the empirical likelihood carries out internal studentizing even when $p$ increases along with $n$. However, it is apparent that the internal studentization prevents $p$ from growing faster, as it brings in those higher-order terms.

The least-squares empirical likelihood is a simplified version of the empirical likelihood. The least-squares empirical likelihood ratio for $\mu$ is $q_n(\mu) = \min \sum_{i=1}^n (n\pi_i - 1)^2$ subject to $\sum_{i=1}^n \pi_i = 1$ and $\sum_{i=1}^n \pi_i(X_i - \mu) = 0$; it uses $\sum_{i=1}^n (n\pi_i - 1)^2$ to approximate $-2\sum_{i=1}^n \log(n\pi_i)$. As shown in Brown & Chen (1998), the optimal weights $\pi_i$ admit closed-form solutions, so that

$$q_n(\mu) = n(\bar X - \mu)^T H_n^{-1}(\bar X - \mu), \qquad (11)$$

where $H_n = S_n - (\bar X - \mu)(\bar X - \mu)^T$. Thus, $q_n(\mu)$ can be readily computed without solving the nonlinear equation (5), as is required for the full empirical likelihood.
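Since $H_n$ is simply the $\mu$-centred sample covariance matrix adjusted by a rank-one term, (11) really is a two-line computation. The sketch below is our illustration of this, with the hypothesized mean passed in as `mu`; `lsel_ratio` is a name we introduce.

```python
# Sketch: least-squares empirical likelihood ratio (11); illustrative only.
import numpy as np

def lsel_ratio(X, mu):
    n, p = X.shape
    Y = X - mu
    ybar = Y.mean(axis=0)
    S_n = Y.T @ Y / n                        # S_n, centred at mu
    H_n = S_n - np.outer(ybar, ybar)         # H_n = S_n - (Xbar-mu)(Xbar-mu)'
    return n * ybar @ np.linalg.solve(H_n, ybar)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
print(lsel_ratio(X, np.zeros(5)))            # approx a chi^2_5 draw, cf. w_n
```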

The least-squares empirical likelihood ratio is a first-order approximation to the full empirical likelihood ratio, and $q_n(\mu) \to \chi^2_p$ in distribution when $p$ is fixed. The least-squares empirical likelihood is less affected by higher dimension. In particular, if $k \geq 1$ in (4), then

$$(2p)^{-1/2}\{q_n(\mu) - p\} \to N(0, 1) \qquad (12)$$

in distribution as $n \to \infty$ when $p = o(n^{2/3})$, which improves the rate given by Theorem 3 for the full empirical likelihood ratio $w_n(\mu)$. To appreciate (12), we note from (11) that

$$q_n(\mu) = n(\bar X - \mu)^T \Sigma^{-1}(\bar X - \mu) + n(\bar X - \mu)^T(H_n^{-1} - \Sigma^{-1})(\bar X - \mu). \qquad (13)$$

Then, following a line similar to the proof of Lemma 6, $n(\bar X - \mu)^T(H_n^{-1} - \Sigma^{-1})(\bar X - \mu) = O_p(p^2/n) = o_p(p^{1/2})$. As the first term on the right-hand side of (13) is asymptotically normal with mean $p$ and variance $2p$, as conveyed in (10), (12) is valid. If we confine ourselves to specific distributions, faster rates for $p$ can be established. For example, if the data are normally distributed, the least-squares empirical likelihood ratio is the Hotelling $T^2$ statistic, which is shown in Bai & Saranadasa (1996) to be asymptotically normal if $p/n \to c \in [0, 1)$.

4. NUMERICAL RESULTS

We report results from a simulation study designed to evaluate the asymptotic normality of the empirical likelihood ratio. The $p$-dimensional independent and identically distributed data vectors $\{X_i\}_{i=1}^n$ were generated from a moving average model,

$$X_{ij} = Z_{ij} + \rho Z_{i,j+1} \quad (i = 1, \ldots, n; \ j = 1, \ldots, p),$$

where, for each $i$, the innovations $\{Z_{ij}\}_{j=1}^{p+1}$ were independent random variables with zero mean and unit variance. We considered two distributions for the innovation $Z_{ij}$. One is the standard normal distribution; the other is a standardized version of a Pareto distribution with distribution function $(1 - x^{-4.5})I(x \geq 1)$. We standardized the Pareto random variables so that they had zero mean and unit variance. As the Pareto distribution has only four finite moments, we had $k = 1$ in (4), whereas $k$ can be taken arbitrarily large for the normally distributed innovations. In both distributions, $X_i$ is a multivariate random vector with zero mean and covariance $\Sigma = (\sigma_{ij})_{p \times p}$, where $\sigma_{ii} = 1 + \rho^2$, $\sigma_{i,i\pm 1} = \rho$ and $\sigma_{ij} = 0$ for $|i - j| > 1$. We set $\rho$ to be $0.5$ throughout the simulation.

To make $p$ and $n$ increase simultaneously, we considered two growth rates for $p$ with respect to $n$: (i) $p = c_1 n^{0.4}$ and (ii) $p = c_2 n^{0.24}$. We chose the sample sizes $n = 200$, $400$ and $800$. By assigning $c_1 = 3$, $4$ and $5$ in the faster growth rate setting (i), we obtained three dimensions for each sample size, which were $p = 25$, $33$ and $42$ for $n = 200$; $p = 33$, $44$ and $55$ for $n = 400$; and $p = 44$, $58$ and $72$ for $n = 800$, respectively. For the slower growth rate setting (ii), to maintain a certain amount of increase between successive dimensions when $n$ was increased, we assigned larger $c_2 = 4$, $6$ and $8$, which led to $p = 14$, $21$ and $29$ for $n = 200$; $p = 17$, $25$ and $34$ for $n = 400$; and $p = 20$, $30$ and $40$ for $n = 800$, respectively.

We carried out 500 simulations for each of the $(p, n)$ combinations and for each of the two innovation distributions.
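The design above is straightforward to reproduce. The sketch below is our illustration rather than the authors' code; `simulate_X` and `standardized_stat` are names we introduce. It generates the moving average data, which is itself an instance of model (4) with $\Gamma$ a $p \times (p+1)$ banded matrix, and computes the standardized statistic $(2p)^{-1/2}(q_n - p)$, using the closed-form least-squares ratio from the earlier sketch as a convenient stand-in for the full $w_n(\mu)$.

```python
# Sketch: reproduce the simulation design of Section 4 (our illustration).
# MA(1) data X_ij = Z_ij + rho * Z_{i,j+1} is an instance of model (4):
# X_i = Gamma Z_i with Gamma the p x (p+1) banded matrix [I | 0] + rho*[0 | I].
import numpy as np

def simulate_X(n, p, rho=0.5, innovation="normal", rng=None):
    rng = rng or np.random.default_rng()
    if innovation == "normal":
        Z = rng.standard_normal((n, p + 1))
    else:                                      # standardized Pareto(4.5)
        Z = rng.pareto(4.5, (n, p + 1)) + 1.0  # F(x) = 1 - x^{-4.5}, x >= 1
        a = 4.5
        mean = a / (a - 1.0)                   # Pareto mean
        var = a / ((a - 2.0) * (a - 1.0) ** 2) # Pareto variance
        Z = (Z - mean) / np.sqrt(var)
    return Z[:, :p] + rho * Z[:, 1:]           # X_ij = Z_ij + rho Z_{i,j+1}

def standardized_stat(X):
    n, p = X.shape
    xbar = X.mean(axis=0)
    H = X.T @ X / n - np.outer(xbar, xbar)     # H_n of (11), with mu = 0
    q = n * xbar @ np.linalg.solve(H, xbar)
    return (q - p) / np.sqrt(2.0 * p)

rng = np.random.default_rng(7)
stats = [standardized_stat(simulate_X(400, 44, innovation="pareto", rng=rng))
         for _ in range(500)]
print(np.mean(stats), np.std(stats))           # roughly 0 and 1 under (12)
```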

Figure 1 displays normal Q-Q plots of the standardized empirical likelihood ratio statistics for the faster growth rate setting (i); those for the slower growth rate setting (ii) are presented in Fig. 2.

Fig. 1. Normal Q-Q plots (empirical likelihood quantiles against $N(0,1)$ quantiles) with the faster growth rate $p = c_1 n^{0.4}$, for the normal (upper panels: $n = 200$, $400$, $800$) and the Pareto (lower panels: $n = 200$, $400$, $800$) innovations: $c_1 = 3$ (solid lines), $4$ (dotted lines) and $5$ (dashed lines).

As $n$ and $p$ were increased simultaneously, there was a general convergence of the standardized empirical likelihood ratio to $N(0, 1)$. We also observed that the convergence in Fig. 2 for the slower growth rate setting (ii) was faster than that in Fig. 1 for the faster growth rate setting; this is expected, as setting (i) produces much higher dimensionality. The convergence for the normal innovations was faster than for the Pareto case when $p = c_1 n^{0.4}$ in Fig. 1. This may be explained by the fact that the Pareto distribution has only finite fourth moments, which corresponds to $k = 1$, whereas the normal innovation has all moments finite; according to Theorems 2 and 3, the growth rate for $p$ depends on the value of $k$: the larger the $k$, the higher the rate. For the slower growth rate in setting (ii), Fig. 2 shows that there was substantial improvement in the convergence in the Q-Q plots as $p$ was increased at the slower rate, for both distributions of innovations.

We observed that most of the lack-of-fit in the $N(0, 1)$ Q-Q plots in Figs 1 and 2 appeared at the lower and upper quantiles. This could be attributed to the lack-of-fit between $\chi^2_p$ and $N(0, 1)$, as $\chi^2_p$ may be viewed as the intermediate convergence of the empirical likelihood ratio. To verify this, we carried out further simulations by inverting settings (i) and (ii), so that for a given dimension $p$, three sample sizes were generated according to (iii) $n = (p/c_1)^{1/0.4}$ and (iv) $n = (p/c_2)^{1/0.24}$, with $c_1 = 3$, $4$ and $5$ and $c_2 = 4$, $5$ and $6$, respectively. We chose $p = 35$, $45$ and $55$ for setting (iii) and $p = 17$, $21$ and $25$ for setting (iv). Two figures of $\chi^2_p$-based Q-Q plots for (iii) and (iv), given in the Iowa State University technical report, show that there was a substantial improvement in the overall fit of the Q-Q plots, and that the lack-of-fit seen in the $N(0, 1)$-based Q-Q plots largely disappeared.
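The explanation offered above, that the residual lack-of-fit reflects the gap between $\chi^2_p$ and $N(0,1)$ rather than a failure of the empirical likelihood itself, can be seen directly from the standardized $\chi^2_p$ quantiles. A minimal check (ours, with $p = 44$ borrowed from setting (i)):

```python
# Sketch: how far the standardized chi^2_p quantiles are from N(0,1) ones,
# the 'intermediate convergence' explanation offered above; illustrative.
import numpy as np
from scipy.stats import chi2, norm

p = 44                                   # one of the dimensions used above
for q in (0.01, 0.05, 0.5, 0.95, 0.99):
    z_chi = (chi2.ppf(q, df=p) - p) / np.sqrt(2 * p)
    print(f"q={q:.2f}: chi2-based {z_chi:+.3f} vs normal {norm.ppf(q):+.3f}")
```

The discrepancy is largest in the tails, matching where the Q-Q plots deviate.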

Fig. 2. Normal Q-Q plots (empirical likelihood quantiles against $N(0,1)$ quantiles) with the slower growth rate $p = c_2 n^{0.24}$, for the normal (upper panels: $n = 200$, $400$, $800$) and the Pareto (lower panels: $n = 200$, $400$, $800$) innovations: $c_2 = 4$ (solid lines), $6$ (dotted lines) and $8$ (dashed lines).

ACKNOWLEDGEMENT

The authors thank the editor, Professor D. M. Titterington, and two referees for valuable suggestions that have improved the presentation of the paper. The authors acknowledge grants from the U.S. National Science Foundation.

APPENDIX

Technical details

We first establish some lemmas.

LEMMA 1. If $m_{4k} < \infty$ for some $k \geq 1$, then $E(\|X_i - \mu\|^{2k}) = O\{\mathrm{tr}^k(\Sigma)\}$ and $\mathrm{var}(\|X_i - \mu\|^{2k}) = O\{\mathrm{tr}^{2k-1}(\Sigma)\gamma_p\}$.

Proof. We only show the case of $k = 1$, since the other cases are similar. It is easy to check that

$$E(\|X_i - \mu\|^2) = \mathrm{tr}\{E(X_i - \mu)(X_i - \mu)^T\} = \mathrm{tr}(\Sigma),$$
$$E(\|X_i - \mu\|^4) = E(\|\Gamma Z_i\|^4) = E(Z_i^T\Gamma^T\Gamma Z_i\, Z_i^T\Gamma^T\Gamma Z_i) = \mathrm{tr}\{\Gamma^T\Gamma\, E(Z_i Z_i^T \Gamma^T\Gamma Z_i Z_i^T)\}.$$

Write $\Gamma^T\Gamma = (\nu_{sl})_{s,l=1,\ldots,m}$. Then $Z_i Z_i^T \Gamma^T\Gamma Z_i Z_i^T = \big(\sum_{j=1}^m \sum_{l=1}^m Z_{ik_1} Z_{il}\nu_{lj} Z_{ij} Z_{ik_2}\big)_{k_1,k_2=1,\ldots,m}$. When $k_1 = k_2 = s$,

$$E\Big(\sum_{j=1}^m \sum_{l=1}^m Z_{is} Z_{il}\nu_{lj} Z_{ij} Z_{is}\Big) = \nu_{ss}\,E(Z_{is}^4) + \sum_{l \neq s}\nu_{ll}. \qquad (A1)$$

When $k_1 \neq k_2$, $E\big(\sum_{j=1}^m \sum_{l=1}^m Z_{ik_1} Z_{il}\nu_{lj} Z_{ij} Z_{ik_2}\big) = 2\nu_{k_1 k_2}$. Hence,

$$E(\|X_i - \mu\|^4) = (m_4 - 3)\sum_{s=1}^m \nu_{ss}^2 + \mathrm{tr}^2(\Sigma) + 2\,\mathrm{tr}(\Sigma^2). \qquad (A2)$$

Note that $\sum_{s=1}^m \nu_{ss}^2 \leq \sum_{j=1}^m\sum_{s=1}^m \nu_{js}^2 = \mathrm{tr}\{(\Gamma^T\Gamma)^2\} = \mathrm{tr}(\Sigma^2)$. This together with (A1) and (A2) implies that $\mathrm{var}(\|X_i - \mu\|^2) = (m_4 - 3)\sum_{s=1}^m \nu_{ss}^2 + 2\,\mathrm{tr}(\Sigma^2) = O\{\mathrm{tr}(\Sigma)\gamma_p\}$.

LEMMA 2. If $m_{4k} < \infty$ for some $k \geq 1$, then, with probability one,

$$\max_{i=1,\ldots,n}\|X_i - \mu\| = o\big[\{\mathrm{tr}(\Sigma)\}^{(2k-1)/(4k)}\gamma_p^{1/(4k)} n^{1/(4k)}\big] + O\{\mathrm{tr}^{1/2}(\Sigma)\}.$$

Proof. We note that

$$\max_{i=1,\ldots,n}\|X_i - \mu\| \leq \Big[\max_{i=1,\ldots,n}\big|\|X_i - \mu\|^{2k} - E(\|X_i - \mu\|^{2k})\big| + E(\|X_i - \mu\|^{2k})\Big]^{1/(2k)}$$

and

$$\max_{i=1,\ldots,n}\big[\|X_i - \mu\|^{2k} - E(\|X_i - \mu\|^{2k})\big]\{\mathrm{var}(\|X_i - \mu\|^{2k})\}^{-1/2} = o(n^{1/2})$$

with probability one as $n \to \infty$. The lemma is proved by applying Lemma 3 of Owen (1990) and Lemma 1.

From now on, we let $Y_i = \Sigma^{-1/2}(X_i - \mu)$, $V_n = n^{-1}\sum_{i=1}^n Y_i Y_i^T$, $\bar Y = n^{-1}\sum_{i=1}^n Y_i$ and $D_n = V_n - I_p = (d_{sl})_{s,l=1,\ldots,p}$.

LEMMA 3. Under the conditions of Theorem 1, $\mathrm{tr}(D_n^2) = O_p(p^2/n)$.

Proof. We only need to show that $E\{\mathrm{tr}(D_n^2)\} = O(p^2/n)$. Note that $V_n = \Sigma^{-1/2}\Gamma S_z \Gamma^T\Sigma^{-1/2}$, where $S_z = n^{-1}\sum_{i=1}^n Z_i Z_i^T$, and write $\tilde\Sigma = \Gamma^T\Sigma^{-1}\Gamma = (\tilde\sigma_{jl})_{j,l=1,\ldots,m}$, say. Then

$$\mathrm{tr}(D_n^2) = \mathrm{tr}(S_z\tilde\Sigma S_z\tilde\Sigma) - 2\,\mathrm{tr}(S_z\tilde\Sigma) + p \qquad (A3)$$

and

$$E\{\mathrm{tr}(S_z\tilde\Sigma)\} = E\Big(\sum_{j,l=1}^m n^{-1}\sum_{i=1}^n Z_{ij}Z_{il}\tilde\sigma_{lj}\Big) = \sum_{j,l=1}^m \delta_{jl}\tilde\sigma_{lj} = \sum_{j=1}^m \tilde\sigma_{jj} = p, \qquad (A4)$$

since $\mathrm{tr}(\tilde\Sigma) = \mathrm{tr}(\Sigma^{-1}\Gamma\Gamma^T) = \mathrm{tr}(I_p) = p$. By utilizing the information on $Z_i$ in (4),

$$E[\mathrm{tr}\{(S_z\tilde\Sigma)^2\}] = \sum_{j,l=1}^m \tilde\sigma_{jl}^2 + (m_4 - 1)n^{-1}\sum_{j=1}^m \tilde\sigma_{jj}^2 + n^{-1}\sum_{j \neq l}\big(\tilde\sigma_{jl}^2 + \tilde\sigma_{jj}\tilde\sigma_{ll}\big).$$

It is easy to check that $\sum_{j,l=1}^m \tilde\sigma_{jl}^2 = \mathrm{tr}(\tilde\Sigma^2) = p$, $\sum_{j=1}^m \tilde\sigma_{jj}^2 \leq \sum_{j,l=1}^m \tilde\sigma_{jl}^2 = p$, $\sum_{j \neq l}\tilde\sigma_{jl}^2 \leq p$ and $\sum_{j \neq l}\tilde\sigma_{jj}\tilde\sigma_{ll} \leq (\sum_{j=1}^m \tilde\sigma_{jj})^2 = p^2$. These together with (A3) and (A4) imply $E\{\mathrm{tr}(D_n^2)\} = O(p^2/n)$.
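Lemma 3 is also easy to probe by simulation: for standardized data, $\mathrm{tr}(D_n^2)$ is the squared Frobenius distance between $V_n$ and $I_p$, and its expectation grows like $p^2/n$. A minimal Monte Carlo check (our illustration, with standard normal $Z_i$, so that $m_4 = 3$, $\Sigma = I_p$ and $V_n = S_n$; `tr_Dn2` is a name we introduce):

```python
# Sketch: Monte Carlo check of Lemma 3, E{tr(D_n^2)} = O(p^2/n);
# illustrative only, with standard normal Z (so Sigma = I_p and V_n = S_n).
import numpy as np

def tr_Dn2(n, p, rng):
    Z = rng.standard_normal((n, p))
    D = Z.T @ Z / n - np.eye(p)              # D_n = V_n - I_p
    return np.sum(D * D)                      # tr(D_n^2) = ||D_n||_F^2

rng = np.random.default_rng(3)
for n, p in [(200, 20), (400, 40), (800, 80)]:
    est = np.mean([tr_Dn2(n, p, rng) for _ in range(200)])
    print(f"n={n}, p={p}: E tr(D_n^2) ~ {est:.2f}, p^2/n = {p*p/n:.2f}")
```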

LEMMA 4. Under condition (4), $\max_{i=1,\ldots,p}|\gamma_i(S_n) - \gamma_i(\Sigma)| = O_p(\gamma_p p n^{-1/2})$.

Proof. Note that

$$\sum_{i=1}^p \{\gamma_i(S_n) - \gamma_i(\Sigma)\}^2 = \mathrm{tr}(S_n^2) + \mathrm{tr}(\Sigma^2) - 2\sum_{i=1}^p \gamma_i(S_n)\gamma_i(\Sigma).$$

By von Neumann's inequality, $\sum_{i=1}^p \gamma_i(S_n)\gamma_i(\Sigma) \geq \mathrm{tr}(S_n\Sigma)$. Hence

$$\max_{i=1,\ldots,p}|\gamma_i(S_n) - \gamma_i(\Sigma)| \leq [\mathrm{tr}\{(S_n - \Sigma)^2\}]^{1/2}.$$

Now, by applying Lemma 3,

$$\mathrm{tr}\{(S_n - \Sigma)^2\} = \mathrm{tr}(D_n\Sigma D_n\Sigma) \leq \gamma_p^2(\Sigma)\,\mathrm{tr}(D_n^2) = O_p\{\gamma_p^2(\Sigma)p^2/n\}.$$

This lemma implies that all the eigenvalues of $S_n$ converge to those of $\Sigma$ uniformly at the rate $O_p(\gamma_p p n^{-1/2})$.

Proof of Theorem 1. By (5), $\lambda \in R^p$ satisfies

$$0 = \frac{1}{n}\sum_{i=1}^n \frac{X_i - \mu}{1 + \lambda^T(X_i - \mu)} = g(\lambda). \qquad (A5)$$

Write $\lambda = \rho\theta$, where $\rho \geq 0$ and $\|\theta\| = 1$. Hence,

$$0 = \|g(\rho\theta)\| \geq \theta^T g(\rho\theta) = \frac{1}{n}\sum_{i=1}^n \theta^T(X_i - \mu) - \frac{\rho}{n}\sum_{i=1}^n \frac{\{\theta^T(X_i - \mu)\}^2}{1 + \rho\theta^T(X_i - \mu)}$$
$$\leq \frac{1}{n}\sum_{i=1}^n \theta^T(X_i - \mu) - \frac{\rho\,\theta^T S_n\theta}{1 + \rho\max_{i=1,\ldots,n}\|X_i - \mu\|}.$$

Hence,

$$\rho\Big\{\theta^T S_n\theta - \max_{i=1,\ldots,n}\|X_i - \mu\| \cdot \frac{1}{n}\Big|\sum_{i=1}^n \theta^T(X_i - \mu)\Big|\Big\} \leq \frac{1}{n}\Big|\sum_{i=1}^n \theta^T(X_i - \mu)\Big|. \qquad (A6)$$

Since $n^{-1}|\sum_{i=1}^n \theta^T(X_i - \mu)| \leq \|\bar X - \mu\| = O_p[\{\mathrm{tr}(\Sigma)/n\}^{1/2}]$, it follows from Lemma 2 that

$$\max_{i=1,\ldots,n}\|X_i - \mu\| \cdot \frac{1}{n}\Big|\sum_{i=1}^n \theta^T(X_i - \mu)\Big| = o_p\big[\{\mathrm{tr}(\Sigma)\}^{(4k-1)/(4k)}\gamma_p^{1/(4k)} n^{-1/2 + 1/(4k)}\big] + O_p\{\mathrm{tr}(\Sigma)n^{-1/2}\} = o_p(1).$$

By Lemma 4, for a positive constant $C_1$, $P(\theta^T S_n\theta \geq C_1) \to 1$ as $n \to \infty$. Hence $\|\lambda\| = \rho = O_p[\{\mathrm{tr}(\Sigma)/n\}^{1/2}]$.

By repeating (A6) in the proof of the above theorem and using Lemma 2, we have

$$\max_{i=1,\ldots,n}|\lambda^T(X_i - \mu)| \leq \|\lambda\|\max_{i=1,\ldots,n}\|X_i - \mu\| = o_p(1). \qquad (A7)$$

We need the following lemmas to prove Theorem 2.

LEMMA 5. If $p/n \to c$ for some constant $c \geq 0$, then $(2p)^{-1/2}\{n(\bar X - \mu)^T\Sigma^{-1}(\bar X - \mu) - p\} \to N(0, 1)$ in distribution as $n \to \infty$.

Proof. The proof entails applying the martingale central limit theorem (Hall & Heyde, 1980). Bai & Saranadasa (1996) used this approach to establish the asymptotic normality of a two-sample test statistic for high-dimensional data.

LEMMA 6. Under the conditions of Theorem 2, $n(\bar X - \mu)^T(S_n^{-1} - \Sigma^{-1})(\bar X - \mu) = o_p(p^{1/2})$.

Proof. Recall that $D_n = V_n - I_p = (d_{sl})_{s,l=1,\ldots,p}$. It follows from Lemma 3 that

$$P\Big(\max_{k_1,k_2=1,\ldots,p}|d_{k_1 k_2}| > \epsilon\Big) \leq \sum_{k_1=1}^p \sum_{k_2=1}^p \epsilon^{-2} E(d_{k_1 k_2}^2) = \epsilon^{-2} E\{\mathrm{tr}(D_n^2)\} = O(p^2/n).$$

Hence, $d_{jl} = O_p(pn^{-1/2}) = o_p(1)$ uniformly in $1 \leq j, l \leq p$. It is easy to check that

$$V_n^{-1} - I_p = -D_n + D_n^2 - D_n^3 V_n^{-1}$$

and $n(\bar X - \mu)^T(S_n^{-1} - \Sigma^{-1})(\bar X - \mu) = n\bar Y^T(V_n^{-1} - I_p)\bar Y$. From Lemma 1, $E(\|\bar Y\|^2) = n^{-1}E(\|Y_1\|^2) = p/n$. Since $|\bar Y^T A\bar Y| \leq \|\bar Y\|^2\{\mathrm{tr}(A^2)\}^{1/2}$ for any symmetric matrix $A$, it follows from Lemmas 3 and 4 and the condition $p = o(n^{1/2})$ that

$$n|\bar Y^T D_n\bar Y| \leq n\|\bar Y\|^2\{\mathrm{tr}(D_n^2)\}^{1/2} = O_p(p^2 n^{-1/2}) = o_p(p^{1/2}).$$

Similarly, $n|\bar Y^T D_n^2\bar Y| \leq n\|\bar Y\|^2\,\mathrm{tr}(D_n^2) = O_p(p^3/n) = o_p(p^{1/2})$. Furthermore, $|\bar Y^T D_n^3\bar Y| \leq \max_{i=1,\ldots,p}|\gamma_i(D_n)|\,\bar Y^T D_n^2\bar Y = o_p(\bar Y^T D_n^2\bar Y)$, since $\max_{i=1,\ldots,p}|\gamma_i(D_n)| \leq \{\mathrm{tr}(D_n^2)\}^{1/2} = o_p(1)$ when $p = o(n^{1/2})$; in general, for any positive integer $l$, $|\bar Y^T D_n^{2+l}\bar Y| = o_p(\bar Y^T D_n^2\bar Y)$. The lemma follows from summarizing the above results.

Proof of Theorem 2. Put $W_i = \lambda^T(X_i - \mu)$. Then (A7) implies that $\max_{i=1,\ldots,n}|W_i| = o_p(1)$. Expand equation (A5) so that

$$0 = g(\lambda) = \bar X - \mu - S_n\lambda + \bar\beta_n, \qquad (A8)$$

where $\bar\beta_n = n^{-1}\sum_{i=1}^n (X_i - \mu)W_i^2/(1 + \xi_i)$ and $|\xi_i| \leq |\lambda^T(X_i - \mu)|$. As $\max_{i}|W_i| = o_p(1)$, $\max_{i}|\xi_i| = o_p(1)$ as well. Hence $\bar\beta_n = \beta_n\{1 + o_p(1)\}$, where $\beta_n = n^{-1}\sum_{i=1}^n (X_i - \mu)W_i^2$. Applying Theorem 1 and Lemma 2 with $k = 1$, we have, if $\mathrm{tr}(\Sigma) = O(\gamma_p^{-5/3}n^{1/3})$,

$$\|\beta_n\| \leq \max_{i=1,\ldots,n}\|X_i - \mu\|\,\|\lambda\|^2\, O_p\{\gamma_p(\Sigma)\} = o_p(\|\lambda\|). \qquad (A9)$$

It follows from (A8) that

$$\lambda = S_n^{-1}(\bar X - \mu) + S_n^{-1}\bar\beta_n \qquad (A10)$$

and $\log(1 + W_i) = W_i - W_i^2/2 + W_i^3/3 - W_i^4/\{4(1 + \xi_i')^4\}$ for some $\xi_i'$ such that $|\xi_i'| \leq |W_i|$. Therefore,

$$w_n(\mu) = n(\bar X - \mu)^T S_n^{-1}(\bar X - \mu) - n\bar\beta_n^T S_n^{-1}\bar\beta_n + \frac{2}{3}\sum_{i=1}^n W_i^3 - \frac{1}{2}\sum_{i=1}^n \frac{W_i^4}{(1 + \xi_i')^4}$$
$$= n(\bar X - \mu)^T\Sigma^{-1}(\bar X - \mu) + n(\bar X - \mu)^T(S_n^{-1} - \Sigma^{-1})(\bar X - \mu) - n\bar\beta_n^T S_n^{-1}\bar\beta_n + R_n\{1 + o_p(1)\},$$

where $R_n = \frac{2}{3}\sum_{i=1}^n \{\lambda^T(X_i - \mu)\}^3$. By (A9), (A10) and Lemma 4,

$$n\bar\beta_n^T S_n^{-1}\bar\beta_n \leq n\|\bar\beta_n\|^2/\gamma_1(S_n) = O_p\{\gamma_p^2\,\mathrm{tr}^2(\Sigma)n^{-1}\} + o_p\{\gamma_p^{5/2}\,\mathrm{tr}^{5/2}(\Sigma)n^{-3/2}\} = o_p(p^{1/2}).$$

We also note that

$$|R_n| \leq (n\lambda^T S_n\lambda)^{1/2}\Big(\sum_{i=1}^n \|\lambda\|^4\|X_i - \mu\|^4\Big)^{1/2} = o_p(p^{1/2}).$$

Hence the theorem follows from Lemmas 5 and 6.

REFERENCES

ANDERSON, T. W. (2003). An Introduction to Multivariate Statistical Analysis. New York: Wiley.
BAI, Z. & SARANADASA, H. (1996). Effect of high dimension: by an example of a two sample problem. Statist. Sinica 6, 311–29.
BAI, Z. D., SILVERSTEIN, J. W. & YIN, Y. Q. (1988). A note on the largest eigenvalue of a large-dimensional sample covariance matrix. J. Mult. Anal. 26, 166–8.
BAI, Z. & YIN, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Prob. 21, 1275–94.
BROWN, B. M. & CHEN, S. X. (1998). Combined and least squares empirical likelihood. Ann. Inst. Statist. Math. 50, 697–714.
CHEN, S. X. (1996). Empirical likelihood confidence intervals for nonparametric density estimation. Biometrika 83, 329–41.
CHEN, S. X. & CUI, H. J. (2006). On Bartlett correction of empirical likelihood in the presence of nuisance parameters. Biometrika 93, 215–20.
DICICCIO, T. J., HALL, P. & ROMANO, J. P. (1991). Empirical likelihood is Bartlett-correctable. Ann. Statist. 19, 1053–61.
FANG, K. T. & ZHANG, Y. T. (1990). Generalized Multivariate Analysis. Beijing: Science Press.
HALL, P. & HEYDE, C. C. (1980). Martingale Limit Theory and Its Application. New York: Academic Press.
HALL, P. & LA SCALA, B. (1990). Methodology and algorithms of empirical likelihood. Int. Statist. Rev. 58, 109–27.
HJORT, N. L., MCKEAGUE, I. W. & VAN KEILEGOM, I. (2009). Extending the scope of empirical likelihood. Ann. Statist. 37, 1079–111.
LEDOIT, O. & WOLF, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 30, 1081–102.
OWEN, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–49.
OWEN, A. B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18, 90–120.
PORTNOY, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. Ann. Statist. 12, 1298–309.
PORTNOY, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. Ann. Statist. 13, 1403–17.
QIN, J. & LAWLESS, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300–25.
SCHOTT, J. R. (2005). Testing for complete independence in high dimensions. Biometrika 92, 951–6.
SCHOTT, J. R. (2007). Some high-dimensional tests for a one-way MANOVA. J. Mult. Anal. 98, 1825–39.
TSAO, M. (2004). Bounds on coverage probabilities of the empirical likelihood ratio confidence regions. Ann. Statist. 32, 1215–21.
YIN, Y. Q., BAI, Z. D. & KRISHNAIAH, P. R. (1988). On the limit of the largest eigenvalue of the large-dimensional sample covariance matrix. Prob. Theory Rel. Fields 78, 509–21.

[Received July 2007. Revised November 2008]