Asymptotic normality of quadratic forms with random vectors of increasing dimension

Similar documents
An Empirical Likelihood Approach To Goodness of Fit Testing

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

7.1 Convergence of sequences of random variables

Efficient GMM LECTURE 12 GMM II

Sequences and Series of Functions

Estimation for Complete Data

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

Rank tests and regression rank scores tests in measurement error models

7.1 Convergence of sequences of random variables

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Empirical likelihood approach to goodness of fit testing

1 Introduction to reducing variance in Monte Carlo simulations

32 estimating the cumulative distribution function

Advanced Stochastic Processes.

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Mathematical Statistics - MS

Solution to Chapter 2 Analytical Exercises

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Chapter 6 Infinite Series

Random Variables, Sampling and Estimation

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

An Introduction to Randomized Algorithms

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

4. Partial Sums and the Central Limit Theorem

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A statistical method to determine sample size to estimate characteristic value of soil parameters

Notes 19 : Martingale CLT

Distribution of Random Samples & Limit theorems

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

Probability and Statistics

SDS 321: Introduction to Probability and Statistics

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

POWER COMPARISON OF EMPIRICAL LIKELIHOOD RATIO TESTS: SMALL SAMPLE PROPERTIES THROUGH MONTE CARLO STUDIES*

Lecture 2. The Lovász Local Lemma

Optimally Sparse SVMs

LECTURE 8: ASYMPTOTICS I

Notes 5 : More on the a.s. convergence of sums

Lecture 19: Convergence

Precise Rates in Complete Moment Convergence for Negatively Associated Sequences

Measure and Measurable Functions

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Law of the sum of Bernoulli random variables

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

1 = δ2 (0, ), Y Y n nδ. , T n = Y Y n n. ( U n,k + X ) ( f U n,k + Y ) n 2n f U n,k + θ Y ) 2 E X1 2 X1

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

Asymptotic Results for the Linear Regression Model

1 Convergence in Probability and the Weak Law of Large Numbers

Empirical Processes: Glivenko Cantelli Theorems

5. Likelihood Ratio Tests

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

2.2. Central limit theorem.

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

IMPROVING EFFICIENT MARGINAL ESTIMATORS IN BIVARIATE MODELS WITH PARAMETRIC MARGINALS

Simulation. Two Rule For Inverting A Distribution Function

The Central Limit Theorem

Topic 9: Sampling Distributions of Estimators

CONDITIONAL PROBABILITY INTEGRAL TRANSFORMATIONS FOR MULTIVARIATE NORMAL DISTRIBUTIONS

Stat 319 Theory of Statistics (2) Exercises

arxiv: v1 [math.pr] 13 Oct 2011

This section is optional.

SOME THEORY AND PRACTICE OF STATISTICS by Howard G. Tucker

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators


Chapter 6 Sampling Distributions

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Introduction to Probability. Ariel Yadin

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Probability for mathematicians INDEPENDENCE TAU

Expectation and Variance of a random variable

Asymptotic distribution of products of sums of independent random variables

REGRESSION WITH QUADRATIC LOSS

Lecture 33: Bootstrap

Self-normalized deviation inequalities with application to t-statistic

Math Solutions to homework 6

Riesz-Fischer Sequences and Lower Frame Bounds

Chapter 6 Principles of Data Reduction

Lecture 2: Monte Carlo Simulation

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS

1 Inferential Methods for Correlation and Regression Analysis

ESTIMATING THE ERROR DISTRIBUTION FUNCTION IN NONPARAMETRIC REGRESSION WITH MULTIVARIATE COVARIATES

Lecture 20: Multivariate convergence and the Central Limit Theorem

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

TENSOR PRODUCTS AND PARTIAL TRACES

Stochastic Matrices in a Finite Field

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector

Lecture 7: Properties of Random Samples

EE 4TM4: Digital Communications II Probability Theory

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

Introductory statistics

Transcription:

Asymptotic ormality of quadratic forms with radom vectors of icreasig dimesio Haxiag Peg ad Ato Schick Abstract. This paper provides sufficiet coditios for the asymptotic ormality of quadratic forms of averages of radom vectors of icreasig dimesio ad improves o coditios foud i the literature. Such results are eeded i applicatios of Owe s empirical likelihood whe the umber of costraits is allowed to grow with the sample size. I this coectio we fix a gap i the proof of Theorem 4.1 of Hjort, McKeague ad Va Keilegom (2009). We also demostrate how our results ca be used to obtai the asymptotic distributio of the empirical likelihood with a icreasig umber of costraits uder cotiguous alteratives. I additio, we discuss potetial applicatios of our result. Oe example treats testig for the equality of the margial distributios of a bivariate radom vector. Aother example treats a test for diagoality of a covariace matrix of a ormal radom vector with icreasig dimesio. Key words: Martigale cetral limit theorem; Lideberg coditio; empirical likelihood; cotiguous alteratives; chi-square of fit test with icreasig umber of cells; testig for equal margials; testig for diagoality of a covariace matrix. 1. Itroductio Let be positive itegers that ted to ifiity with. Let ξ,1,...,ξ, be idepedet ad idetically distributed -dimesioal radom vectors with mea E[ξ,1 ] = 0 ad dispersio matrix V = E[ξ,1 ξ,1]. We assume throughout that the largest eigevalue of V is bouded, (C1) ρ = sup u V u = O(1), u =1 ad that the euclidea orm of V teds to ifiity, (C2) trace(v 2 ). Let x deote the euclidea orm of a vector x. We are iterested i the asymptotic behavior of ξ +µ 2 with µ a -dimesioal vector satisfyig (C3) µ V µ trace(v 2 ) 0 The research of Haxiag Peg was partially supported by NSF Grat DMS 0940365. The research of Ato Schick was partially supported by NSF Grat DMS 0906551. 1

2 HANXIANG PENG AND ANTON SCHICK ad ξ the -dimesioal radom vector defied by ξ = 1/2 ξ,j. More precisely, we are lookig for coditios that imply the asymptotic ormality (1.1) ξ +µ 2 µ 2 trace(v ) 2trace(V 2 ) = N(0,1). Of special iterest is the case, whe µ is the zero vector ad V is idempotet with rak q tedig to ifiity. The (1.1) simplifies to (1.2) ξ 2 q 2q = N(0,1). I particular, if µ is the zero vector ad V equals I r, the idetity matrix, the (1.2) becomes (1.3) ξ 2 2r = N(0,1). Such results are eeded to obtai the asymptotic behavior of the likelihood ratio statistic i expoetial families of icreasig dimesios ad to study the behavior of Owe s empirical likelihood whe the data dimesio is allowed to icrease with the sample size. The former was doe by Portoy (1988) who proved (1.3) uder the assumptio that the sixth momets of the coordiates of ξ,1 are uiformly bouded. The latter has bee recetly studied by Hjort, McKeague ad Va Keilegom (2009) who rely o Portoy s result ad by Che, Peg ad Chi (2009) who rely o results ad structural assumptios of Bai ad Saraadasa (1996). We are iterested i verifyig (1.1) uder weaker momet assumptios tha used by these authors. This allows us to fix a gap i the proof of Theorem 4.1 i Hjort, McKeague ad Va Keilegom (2009), see Remark 4 below. Our results are used i Peg ad Schick (2013) where the authors geeralize the results i Hjort, McKeague ad Va Keilegom (2009) to allow for ifiitely may costraits that deped o uisace parameters. We achieve our goal by provig two cetral limit theorems. The first oe uses the followig growth coditios. (1.4) Var[ ξ,1 2 ] = o(trace(v 2 )), (1.5) Var[ V 1/2 ξ,1 2 ] = o(trace 2 (V 2 )), (1.6) E[(ξ,1ξ,2 ) 4 ] = o( 2 trace 2 (V 2 )). by Theorem 1. Suppose (C1) (C3) hold. The (1.4)-(1.6) imply (1.1). I the presece of (C1) ad (C2), the growth coditios (1.4) (1.6) are implied (1.7) E[ ξ,1 4 ] = o(trace(v 2 )). Thus we have the followig corollary. Corollary 1. Suppose (C1) (C3) hold. The (1.7) implies (1.1).

CLT 3 (C4) The ext theorem uses ad the Lideberg coditio, trace(v ) trace(v 2 ) = O(1) (L) L (ǫ) = E[ ξ,1 2 1[ ξ,1 > ǫ ]] 0, ǫ > 0, to obtai the desired result. Theorem 2. Suppose (C1) (C4) hold. The (L) implies (1.1). The proofs of the theorems are give i Sectio 9. Sectio 8 gives techical details eeded i the proofs. A first example with simulatios is give i Sectio 2. There a chi-square goodess of fit test is discussed ad show to valid eve if the umber of cells icreases almost as fast as the sample size. I Sectio 3 we discuss the results i more detail ad compare our results with those i the literature. Sectio 4 illustrates how our results ca be used to give the asymptotic behavior uder cotiguous alteratives. I Sectio 5 we discuss potetial applicatios of our results. We illustrate such a applicatio i Sectio 6 by presetig a test for the equality of the margial distributios of a bivariate radom vector. As aother applicatio a test for the diagoality of a covariace matrix of a ormal radom vector with icreasig dimesio is preseted i Sectio 7. 2. A first example with simulatios Let X 1,...,X be idepedet radom variables with commo distributio fuctio F. To test the ull hypothesis that F equals a specified cotiuous distributio fuctio F 0, we ca use the test statistic r (N i /r) 2 T,r = /r with N i = [ i 1 1 < F 0 (X j ) i ], i = 1,...,r, r r ad reject the ull hypothesis if T,r exceeds χ 1 α (r 1), the (1 α)-quatile of the chi-square distributio with r 1 degrees of freedom. This test has asymptotic size α. This follows from the fact that uder the ull hypothesis the test statistic is asymptotically chi-square with r degrees of freedom. Keep i mid that uder the ull hypothesis the radom variables F 0 (X 1 ),...,F 0 (X ) are uiformly distributed o (0,1) ad that P((i 1)/r < U i/r) = 1/r holds for a uiform radom variable U ad i = 1,...,r. Ca we let r grow with ad still maitai the asymptotic size of this test? The aswer is yes. More precisely, we have the followig result. The test δ,r = 1[T,r > χ 1 α ( 1)] has asymptotic size α as log as teds to ifiity at a rate slower tha, i.e., = o(). The proof of this claim is based o the observatio that a chisquare radom variable with m degrees of freedom is approximately ormal with mea m ad variace 2m for large m. This result is a cosequece of the cetral limit theorem ad the fact that a chi-square radom variable with m degrees of freedom has the same distributio as a sum of m idepedet chi-square radom

4 HANXIANG PENG AND ANTON SCHICK 20 30 Tr 20 15 0 5 10 10 Tr 25 40 30 50 35 Figure 1. Q-Q plots of the simulated values of T,/5 agaist the chi-square distributio with (/5 1) degrees of freedom for = 50, 100, 200, 400. 0 10 20 30 10 20 40 50 Chisquare, = 100 100 Tr 40 20 60 40 80 Tr 60 120 80 140 Chisquare, = 50 30 20 40 60 80 40 60 Chisquare, = 200 80 100 120 140 Chisquare, = 400 variables with oe degree of freedom. Thus our claim ca be verified by showig the asymptotic ormality result (2.1) T,r (r 1) p = N (0, 1). 2(r 1) We ote that T,r equals ξ 2 if we take ξj to be the r -dimesioal radom vector whose i-th coordiate is r 1[i 1 < r F0 (Xj ) i] 1/ r. These radom vectors are idepedet ad idetically distributed with mea vector 0

CLT 5 ad dispersio matrix V = I r 1/ J r, with J r the r r matrix with all its etries equal to oe, ad satisfy ξ j 2 = 1 almost surely. The matrix V is idempotet with trace 1 ad E[ ξ 1 4 ] equals ( 1) 2. Cosequetly, the assumptios of Corollary 1 are met with ρ = 1 ad µ = 0 if ad = o() hold, ad this corollary gives the desired (2.1). Table 1. Simulated sizes of the test δ,/5 for selected values of ad α α\ 40 60 80 100 120 140 160 180 200 400.10.0910.0941.0985.0926.0936.1040.0998.0949.0956.1028.05.0449.0464.0460.0476.0503.0498.0501.0495.0466.0525.01.0090.0109.0098.0102.0096.0100.0108.0096.0098.0114 We have ru some simulatios to assess this result. I the simulatios F 0 was take to be the uiform distributio ad = /5. We geerated 25,000 idepedet copies of T,/5 for several choices of. Figure 1 gives quatilequatile plots. These show that the chi-square approximatio is quite good. Table 2 reports the simulated size of the test for three choices of α. 3. Discussio of the results We begi by addressig sufficiet coditios for the Lideberg coditio. Remark 1. I view of the iequality L (ǫ) E[ ξ,1 2 log(1+ ξ,1 )]/log(1+ǫ ) the Lideberg coditio (L) is implied by (L1) E[ ξ,1 2 log(1+ ξ,1 )] = o(log). I view of the iequality L (ǫ) E[ ξ,1 2+δ ](ǫ ) δ, δ > 0, the Lideberg coditio (L) holds wheever (L2) E[ ξ,1 2α ] = o( α 1 ), for some α > 1. I particular, if E[ ξ,1 2α ] = O(r α ) holds for some α > 1, the (L2) is implied by = o( 1 1/α ). Let us ow specialize our results to the case whe µ is the zero vector ad V is a idempotet matrix with rak q tedig to ifiity. I this case (C1) (C4) hold ad (1.1) simplifies to (1.2). Corollary 2. Suppose V is idempotet with rak q tedig to ifiity. The the followig are true. (a) The growth coditios Var[ ξ,1 2 ] = o(q ), Var[ V 1/2 ξ,1 2 ] = o(q) 2 ad E[(ξ,1ξ,2 ) 4 ] = o( 2 q) 2 imply (1.2). (b) The momet coditio E[ ξ,1 4 ] = o(q ) implies (1.2). (c) The Lideberg coditio (L) implies (1.2). (d) If E[ ξ,1 2α ] = O(r) α ad = o( 1 1/α ) hold for some α > 1, the (1.2) holds.

6 HANXIANG PENG AND ANTON SCHICK For its importace we formulate the special case V = I r. Corollary 3. Suppose V equals I r. The the followig are true. (a) Var[ ξ,1 2 ] = o( ) ad E[(ξ,1ξ,2 ) 4 ] = o( 2 r 2 ) imply (1.3). (b) The momet coditio E[ ξ,1 4 ] = o( ) implies (1.3). I particular, E[ ξ,1 4 ] = O(r 2 ) ad = o() imply (1.3). (c) The Lideberg coditio (L) implies (1.3). (d) If E[ ξ,1 2α ] = O(r α ) ad = o( 1 1/α ) hold for some α > 1, the (1.3) holds. Note that i the case E[ ξ,1 4 ] = r 2 part (b) allows for larger tha part (d). More precisely, part (b) requires = o(), while part (d) requires = o( 1/2 ). Remark 2. Portoy (1988, Theorem 4.1) obtais the coclusio (1.3) i the case V = I r uder the growth coditio / 0 ad the assumptio that the coordiates ξ,1,i of ξ,1 have a uiformly bouded sixth momet, His last coditio implies max 1 i E[ξ 6,1,i] = O(1). (3.1) max 1 i E[ξ 4,1,i] = O(1), ad the latter implies E[ ξ,1 4 ] = E[( ξ,1,i) 2 2 ] E[ξ,1,i] 4 = O(r). 2 Thus his result is a special case of part (b) of Corollary 3. Remark 3. Assume that ξ,1 = V Z for some symmetric idempotet matrix V with rak q tedig to ifiity ad some radom vector Z satisfyig E[Z ] = 0, E[Z Z ] = I r, (3.2) ζ = max 1 i E[ Z 4,i] = o(), ad (3.3) E[Z α1,i Zα2,j Zα3,k Zα4,l ] = E[Zα1,i ]E[Zα2,j ]E[Zα3,k ]E[Zα4,l ] for distict idices i,j,k,l ad o-egative itegers α 1,...,α 4 that sum to 4. The above coditios geeralize those i Che, Peg ad Qi (2009) with our V equal to their Γ (Γ Γ ) 1 Γ. These authors require istead of (3.2) the stroger E[Z 4,1] = = E[Z 4, ] = β for some β. Relyig o results of Bai ad Saraadasa (1996), they obtai (1.2) uder the coditio that q = O(). We shall show (3.4) Var( V 1/2 ξ,1 2 ) = Var( ξ,1 2 ) (2+ζ )q = o(q ) ad (3.5) E[(ξ,1ξ,2 ) 4 ] 3(q 2 +2q +ζ q )+ζ (3+ζ )q = o( 2 q 2 ). Thus we obtai (1.2) from part (a) of Corollary 2 without their restrictios.

CLT 7 Note that the right-had side i (3.3) equals zero if at least oe of α 1,...,α 4 equals oe ad that (3.3) yields E[Z,i 2 Z2,j ] = E[Z2,i ]E[Z2,j ] = 1 for i j. Thus we calculate E[ ξ,1 4 ] = E[(Z V Z ) 2 ] = k=1 l=1 E[Z,i V,i,j Z,j Z,k V,k,l Z,l ] = i k V,i,i V,k,k + i j 2V,i,j V,i,j + i E[Z 4,i]V 2,i,i ν=1 = (trace(v )) 2 +2trace(V )+ E[(Z ξ,2 ) 4 ] = E[ξ 4,1,ν] = k=1 l=1 3E[ ξ,2 4 ]+ζ Here we used the idetity ad the iequalities ν=1 k=1 l=1 [ ( 3 V,ν,i 2 ν=1 V 2,i,j = (E[Z 4,i] 3)V 2,i,i, E[Z,i ξ,2,i Z,j ξ,2,j Z,k ξ,2,k Z,l ξ,2,l ] E[ξ 4,2,i], E[V,ν,i Z,i V,ν,j Z,j V,ν,k Z,k V,ν,l Z,l ] ) 2 +ζ V 4,ν,i V,i,j V,j,i = V,i,i ] 3trace(V )+ζ trace(v ). 0 V,i,i 1 ad V 2,i,j 1, i,j = 1,...,. Usig the idetities ξ,1 2 = V 1/2 ξ,1 2 ad ξ,1ξ,2 = Z ξ,2 we obtai (3.4) ad (3.5). Note also that E[ ξ,1 2 ] equals trace(v ). Remark 4. Our results are motivated by recet results o extedig Owe s (1988, 1990, 2001) empirical likelihood approach to allow for a icreasig umber of costraits, see Hjort, McKeague ad Va Keilegom (2009) ad Che, Peg ad Qi (2009). The empirical likelihood for this case is give by R = sup{ π j : 0 π j, π j = 1, π j X,j = 0} where X,1,...,X, are idepedet ad idetically distributed -dimesioal radom variables with mea E[X,1 ] = 0 ad ivertible dispersio matrix W. It is equivalet to R = sup{ π j : 0 π j, π j = 1, π j ξ,j = 0}

8 HANXIANG PENG AND ANTON SCHICK with ξ,j = W 1/2 X,j. The goal is to show that 2log R is approximately a chi-square radom variable with degrees of freedom. This is doe by showig the asymptotic ormality result (3.6) 2log R 2r = N(0,1). This result is typically achieved i two steps. The first step establishes the approximatio (3.7) 2log R ξ 2 = o p (r 1/2 ), ad the secod step obtais the asymptotic ormality result (1.3). I their Theorem 4.1, Hjort, McKeague ad Va Keilegom (2009) claim (3.6) uder the assumptios that the q-th momets of the coordiates of X,1 are uiformly bouded for some q > 2, that the eigevalues of W are bouded ad bouded away from zero, ad that the dimesio satisfies (3.8) r 3+6/(q 2) = r 3q/(q 2) = o(). Their proof, however, is valid for the case q 6 oly, as they rely o Portoy s (1988) asymptotic ormality result metioed i Remark 2 above. With C a boud o the largest eigevalue of W 1/2 ad B a boud o the q-th momets of the coordiates of X,1, their assumptios imply [( 1 ) q/2 ] E[ ξ,1 q ] C q E[ X,1 q ] = C q r q/2 E X,1,j 2 C q r q/2 1 E[ X,1,j q ] = C q Br q/2. Thus the required asymptotic ormality follows from part (d) of Corollary 3 with α = q/2. Note that their requiremet (3.8) o implies = o( (q 2)/(3q) ) = o( 2(α 1)/6α ) = o( 1 1/α ) as eeded. This closes the gap i Theorem 4.1 of Hjort, McKeague ad Va Keilegom (2009). Remark 5. Suppose E[ ξ,1 2α ] = O(r α ) holds for some α 2. The the momet iequality yields E[ ξ,1 4 ] E[ ξ,1 2α ] 2/α = O(r 2 ). I this case, part (b) of Corollary 3 allows for larger tha part (d). 4. Asymptotic behavior uder local alteratives Let (X, S,Q) be a probability space ad w be a fuctio from X ito R r satisfyig w dq = 0, w 2 dq < ad (4.1) Λ (ǫ) = w 2 1[ w > ǫ ]dq 0, ǫ > 0. Assume also that the matrix W = w w dq satisfies λ = sup u W u = sup (u w ) 2 dq = O(1), u =1 u =1 trace(w 2 ) ad trace(w ) = O(trace(W 2 )).

CLT 9 It the follows from Theorem 2 that 1 w (X j ) 2 trace(w ) 2trace(W 2 ) = N(0,1) if X 1,...,X are idepedet X -valued radom variables with distributio Q. The ext theorem aswers the questio of what happes if we slightly perturb the distributio Q. Let h deote a measurable fuctio satisfyig hdq = 0 ad h 2 dq < ad set h = h1[ h < c 1/2 /s ] h1[ h < c 1/2 /s ]dq with 0 < c < 1/2, 1 s, c 0, s 2 = o(trace(w)) 2 ad c 1/2 /s. Let Q,h deote the probability measure with desity 1 + 1/2 s h with respect to Q. By costructio, we have (4.2) 1/2 /s ( 1+ 1/2 s h 1) h/2 2 dq 0. If s = 1, this implies that the product measures Q,h ad Q are mutually cotiguous. Set µ (h) = w hdq ad = 1/2 s w 2 h dq. Theorem 3. Let X,1,...,X, be idepedet X -valued radom variables with distributio Q,h. The we have the asymptotic ormality result (4.3) 1 w (X,j ) 2 s 2 µ (h) 2 trace(w ) 2trace(W 2 ) = N(0,1). I the case s = 1, this simplifies to (4.4) 1 w (X,j ) 2 µ (h) 2 trace(w ) 2trace(W 2 ) = N(0,1). Proof. Takig ν = µ (h ) ad ξ,j = w (X,j ) 1/2 s ν, we ca write 1 w (X,j ) 2 = ξ +s ν 2. The dispersio matrix of ξ,1 is give by V = W 1 s 2 ν ν where W = w w dq,h = W + 1/2 s w wh dq. By costructio, 1/2 s h is bouded by 2c. Thus, for k = 1,2, we have the iequality ad obtai (1 2c ) k trace(w k ) trace( W k ) (1+2c ) k trace(w k ) trace( W ) trace(w ) 1 ad trace( W 2 ) trace(w 2 ) 1. Sice trace(w 2 ) λ trace(w ), we also have trace(w ).

10 HANXIANG PENG AND ANTON SCHICK The requiremets o the sequeces c ad s imply 1/2 s = o(c ) = o(1). Usig this ad the above, we fid (4.5) sup u V u sup u W u (1+2c ) sup u W u = O(λ ) = O(1) u =1 u =1 u =1 (4.6) ν 2 = sup(u ν ) 2 u =1 h 2 dq sup (u w ) 2 dq λ u =1 (4.7) ν µ (h) 2 λ (h h) 2 dq 0, h 2 dq = O(1), (4.8) trace(v ) = trace( W ) 1 s 2 ν 2 = trace(w )+o(trace(w )). (4.9) trace(v 2 ) = trace( W 2 ) 2 1 s 2 ν W ν + 2 s 4 ν 4 = trace(w 2 )+o(trace(w 2 )). Thus the coditios (C1) (C4) hold with µ = s ν. For (C3) ote that ν V ν is bouded by (4.5) ad (4.6). Fially, usig (4.1), 1/2 s ν = o(1) ad the boud 1/2 s h 1, we derive the Lideberg coditio (L). Thus Theorem 2 yields (4.10) 1 w (X,j ) 2 s 2 ν 2 trace(v ) 2trace(V 2 ) = N(0,1). The desired result (4.3) follows from this, (4.7), (4.9) ad the fact that trace(v ) = trace(w )+ +o(1). I the case s = 1, we have the boud w 2 1/2 h dq 2c Λ (ǫ)+ǫ w h 1[ w ǫ ]dq 2c Λ (ǫ)+ǫ( 1/2, h 2 dq w dq) 2 ǫ > 0. This boud ad trace(w ) = O(trace(W)) 2 yield = o((trace(w)) 2 1/2 ) ad hece (4.4). Remark 6. Let X,1,...,X, be idepedet X -valued radom variables with distributio Q,h for s = 1. Cosider the empirical likelihood R = sup{ π j : 0 π j, π j = 1, π j v (X,j,X,1,...,X, ) = 0} with v a measurable fuctio from X +1 ito R r. Suppose that (4.11) 2log R 1 w (X,j ) 2 = o p ( trace(w 2 )) whe h = 0. By cotiguity, this the also holds if h 0 ad we obtai 2log R µ (h) 2 trace(w ) 2trace(W 2 ) = N(0,1). If W is idempotet with rak q tedig to ifiity, this simplifies to 2log R µ (h) 2 q 2q = N(0,1)

CLT 11 ad may be iterpreted as 2log R beig approximately a o-cetral chi-square radom variable with q degrees of freedom ad o-cetrality parameter µ (h). Remark 7. I the previous remark Q,h was chose to have desity 1 + 1/2 h. By (4.2) this implies that (4.12) 1/2 ( dq,h dq) h/2 dq 2 0. The results of the previous remark remai true uder the more geeral coditio (4.12). 5. Applicatios I applicatios, the quadratic form ξ 2 will ofte serve as a approximatio to a more complicated statistic S. More precisely, suppose that we have the expasio (5.1) S = ξ 2 +o p (r 1/2 ), the the asymptotic ormality result (1.3) implies the same asymptotic ormality result for S, S (5.2) = N(0,1). 2r We have already ecoutered this cocept i Remark 4. Of special iterest is the case ξ j = W 1/2 w (Z j ), where Z 1,...,Z are k- dimesioal radom vectors with commo distributio Q ad w is a measurable fuctio from R k ito R r such that w (Z 1 ) has mea w dq = 0 ad dispersio matrix W = w w dq which satisfies (5.3) 0 < if if u =1 u W u sup sup u W u <. u =1 Suppose also that E[ w (Z 1 ) 4 ] = O(r 2 ) ad = o(). It the follows from part (b) of Corollary 3 that (1.3) holds. Now let ŵ deote a estimator of w ad cosider the statistic with S = ˆT Ŵ 1 ˆT ˆT = 1 ŵ (Z j ) ad Ŵ = 1 ŵ (Z j )ŵ(z j ). I this settig, (5.1) follows from the statemets (5.4) 1 (ŵ (Z j ) w (Z j )) 2 = o p (1) ad (5.5) Ŵ W o = sup u (Ŵ W )u = o p (r 1/2 ). u =1 These statemets typically require additioal restrictios o the rate of growth of. Let W = 1 w (Z j )w (Z j ).

12 HANXIANG PENG AND ANTON SCHICK The we have W W o = O p ( / 1/2 ) as E[ W W 2 0] E[ W W 2 ] E[ w (Z 1 ) 4 ]/ = O(r 2 /) ad with Ŵ W o D +2 W 1/2 o D 1/2 D = 1 ŵ (Z j ) w (Z j ) 2. Hece (5.5) is implied by r 3 = o() ad D = o p (r 1 ). Let us summarize our fidigs. Propositio 1. Suppose w is give as above ad r 3 = o(), D = o p (r 1 ) ad (5.4) hold. The we have the asymptotic ormality result (5.2) with S = ˆT Ŵ 1 ˆT. 6. Testig for equal margials Let us illustrate the result of the previous sectio by meas of a example, amely testig for the equality of the margial distributios of a bivariate radom vector. Let the observatios (X 1,Y 1 ),...,(X,Y ) be idepedet copies of a bivariate radom vector (X, Y). We wat to test whether the margial distributios are the same. This is of importace whe X deotes pre-treatmet ad Y posttreatmet measuremet. Equality of the margial distributios idicates that there is o treatmet effect. Assume that the margial distributio fuctios F (of X) ad G (of Y) are cotiuous. Let us set H = (F + G)/2. We ca estimate H by the pooled empirical distributio fuctio, H(x) = 1 (1[X j x]+1[y j x])/2, x R. Assume from ow o that F equals G so that the ull hypothesis holds. The we have H = F = G ad E[a(X) a(y)] = 0 for every a L 2,0 (H) where L 2,0 (H) = {a L 2 (H) : adh = 0}. We also impose the coditio (6.1) if a A E[(a(X) a(y))2 ] > 0 with A = {a L 2,0 (H) : a 2 dh = 1} the uit sphere i L 2,0 (H). Let ψ 1,ψ 2,... deote a orthoormal basis of L 2,0 (U), where U is the uiform distributio o [0,1]. Sice H is cotiuous, the fuctios ψ 1 H,ψ 2 H,... form a orthoormal basis of L 2,0 (H). We shall work with the trigoometric basis defied by ψ k (x) = 2cos(πkx), 0 x 1,k = 1,2,..., because these fuctios are bouded ad have bouded derivatives. Let v = (ψ 1,...,ψ r ) ad set ad w (x,y) = v (H(x)) v (H(y)) ŵ (x,y) = v (H(x)) v (H(y)), x,y R.

CLT 13 It follows from (6.1) that the dispersio matrix W = E[w (X 1,Y 1 )w (X 1,Y 1 )] of w (X 1,Y 1 ) satisfies 0 < if a A E[(a(X) a(y))2 ] u W u 4, u = 1. To see this use the fact that u v belogs to A for each uit vector u. Thus (5.3) holds. Sice w 2 v 2 2, we obtai E[ w (X,Y) 4 ] = O(r). 2 IfH adw werekow,wecouldusetheteststatistic W 1/2 T 2 = T W 1 T, where T = 1 w (X j,y j )). Sice H ad W are ukow, we work istead with ˆT Ŵ 1ˆT where ˆT = 1 ŵ (X j,y j ) ad Ŵ = 1 ŵ (X j,y j )ŵ (X j,y j ) Usig v 2 2π 2 r, 3 we have 1 v (H(X j )) v (H(X j )) 2 2π 2 r 3 sup H(t) H(t) 2 = O p (r/), 3 t R 1 v (H(Y j )) v (H(Y j )) 2 2π 2 rsup 3 H(t) H(t) 2 = O p (r/). 3 t R This implies D = o p (r 1 ) if r 4 = o(). Fially (5.4) holds as show i Peg ad Schick (2005, pages 403 404) if r 3 = o(). Ideed it follows from there that [( 1 ) 2 ] E ŵ (X j,y j ) w (X j,y j ) k=1 Thus we have proved the followig result. 48π2 r 3 [ 8π 2 ( 1) 2 +3 r 3 ( 1) + 32π2 r( 1) 3 ] ( 1) 2. Corollary 4. Suppose F equals G ad (6.1) holds. The we have the asymptotic ormality result ˆT Ŵ 1ˆT r = N(0,1) 2r provided teds to ifiity ad r 4 / teds to zero. This result shows that the test which rejects the ull hypothesis if ˆT Ŵ 1 ˆT exceedsthe(1 α)-quatileofthechi-squaredistributiowith degreesoffreedom has asymptotic size α. We coducted a small simulatio study to ivestigate the power of this test. We first looked at data from a bivariate ormal distributio with parameters (0,θ,1,σ 2,ρ) where the first two coordiates refer to the meas, the third ad fourth to the variaces, ad the fifth to the correlatio coefficiet. By (a) (page 12) of Peg ad Schick (2004) the bivariate ormal model satisfies the coditio (6.1). We simulated the power for some choices of θ, σ 2 ad ρ, amely θ =.0,.2,.4, σ 2 =.7,1,1.3 ad ρ =.5,.8, for the sample sizes = 50,100,150 ad for the values = 1,2,3,4. I each case the power was estimated based o 10,000 repetitios

14 HANXIANG PENG AND ANTON SCHICK Table 2. Simulated power for bivariate ormal data with α =.05 for selected values of θ, σ 2, ρ, ad = 1,...,4. ρ =.5 ρ =.8 θ σ 2 1 2 3 4 1 2 3 4 0.0 0.7 0.051 0.127 0.105 0.103 0.060 0.186 0.145 0.161 0.0 1.0 0.049 0.044 0.047 0.042 0.050 0.045 0.042 0.037 0.0 1.4 0.051 0.117 0.097 0.095 0.052 0.164 0.130 0.144 0.2 0.7 0.276 0.290 0.240 0.224 0.531 0.539 0.484 0.453 50 0.2 1.0 0.244 0.180 0.161 0.134 0.468 0.341 0.319 0.261 0.2 1.4 0.210 0.225 0.187 0.171 0.409 0.433 0.380 0.352 0.4 0.7 0.778 0.724 0.684 0.629 0.982 0.971 0.972 0.956 0.4 1.0 0.714 0.590 0.554 0.484 0.969 0.925 0.928 0.888 0.4 1.4 0.613 0.562 0.517 0.471 0.927 0.895 0.886 0.859 0.0 0.7 0.052 0.227 0.188 0.206 0.052 0.353 0.291 0.359 0.0 1.0 0.048 0.049 0.047 0.044 0.053 0.049 0.050 0.048 0.0 1.4 0.054 0.199 0.160 0.180 0.057 0.313 0.259 0.313 0.2 0.7 0.505 0.564 0.514 0.499 0.836 0.876 0.861 0.862 100 0.2 1.0 0.440 0.341 0.314 0.274 0.778 0.676 0.671 0.610 0.2 1.4 0.370 0.450 0.403 0.400 0.691 0.756 0.727 0.730 0.4 0.7 0.972 0.967 0.965 0.954 1.000 1.000 1.000 1.000 0.4 1.0 0.950 0.899 0.891 0.856 1.000 1.000 1.000 0.999 0.4 1.4 0.908 0.897 0.879 0.859 0.998 0.998 0.998 0.997 0.0 0.7 0.050 0.330 0.280 0.328 0.055 0.506 0.431 0.542 0.0 1.0 0.048 0.052 0.049 0.047 0.052 0.052 0.051 0.051 0.0 1.4 0.052 0.296 0.249 0.283 0.052 0.456 0.386 0.490 0.2 0.7 0.677 0.758 0.728 0.727 0.945 0.973 0.974 0.974 150 0.2 1.0 0.598 0.486 0.465 0.418 0.918 0.859 0.863 0.823 0.2 1.4 0.516 0.629 0.578 0.589 0.856 0.919 0.908 0.915 0.4 0.7 0.997 0.997 0.997 0.996 1.000 1.000 1.000 1.000 0.4 1.0 0.993 0.983 0.982 0.972 1.000 1.000 1.000 1.000 0.4 1.4 0.982 0.983 0.981 0.974 1.000 1.000 1.000 1.000 usig a sigificace level of α =.05. The results are reported i Table 1 for the above metioed values of θ, σ 2, ρ, ad. The rows correspodig to the value (θ,σ 2 ) = (0,1) refer to the ull hypothesis. We see from the table that the power is larger for the larger value of ρ. We also geerated data from the Farlie-Gumbel-Morgester copula model with margials F ad G possessig desities f ad g, respectively. The desity for this model is give by p γ,f,g (x,y) = ( 1+γ(1 2F(x))(1 2G(y)) ) f(x)g(y), x,y R, where γ is a umber i the iterval ( 1,1). As show i Peg ad Schick (2004), thedesityp γ,f,g satisfiesthecoditio(6.1). Foroursimulatiowetookγ =.5,.8, F to be the logistic distributio fuctio, F(x) = 1/(1 + exp( x)), ad G of the form G(x) = F((x θ)/σ) for some selected values of θ ad σ, amely θ = 0,.2,.4 ad σ =.8,1,1.2. We agai estimated the powers usig 10,000 repetitios. Table 2 reports the simulated powers of the test for the above combiatios of values of θ,

CLT 15 Table 3. Simulated power i the FGM copula with logistic margials with α =.05 for selected values of θ, σ, γ,, ad = 1,...,4. γ =.5 γ =.8 θ σ 1 2 3 4 1 2 3 4 0.0 0.8 0.048 0.138 0.112 0.109 0.052 0.142 0.116 0.112 0.0 1.0 0.050 0.050 0.049 0.047 0.052 0.049 0.046 0.042 0.0 1.2 0.056 0.110 0.092 0.084 0.056 0.111 0.093 0.086 0.2 0.8 0.110 0.179 0.143 0.134 0.116 0.189 0.155 0.140 50 0.2 1.0 0.094 0.078 0.070 0.062 0.101 0.075 0.070 0.063 0.2 1.2 0.080 0.127 0.106 0.100 0.086 0.134 0.112 0.102 0.4 0.8 0.274 0.309 0.253 0.225 0.305 0.330 0.268 0.244 0.4 1.0 0.231 0.169 0.137 0.118 0.257 0.185 0.153 0.131 0.4 1.2 0.206 0.219 0.177 0.161 0.215 0.227 0.182 0.164 0.0 0.8 0.051 0.252 0.212 0.218 0.055 0.257 0.214 0.217 0.0 1.0 0.050 0.051 0.048 0.046 0.050 0.049 0.047 0.046 0.0 1.2 0.050 0.178 0.149 0.154 0.053 0.183 0.153 0.151 0.2 0.8 0.166 0.350 0.296 0.290 0.186 0.361 0.307 0.298 100 0.2 1.0 0.145 0.110 0.093 0.083 0.148 0.115 0.097 0.087 0.2 1.2 0.121 0.239 0.202 0.194 0.136 0.249 0.204 0.199 0.4 0.8 0.493 0.586 0.517 0.491 0.543 0.612 0.548 0.523 0.4 1.0 0.417 0.321 0.270 0.237 0.468 0.354 0.309 0.267 0.4 1.2 0.356 0.416 0.353 0.326 0.388 0.450 0.388 0.364 0.0 0.8 0.055 0.377 0.318 0.333 0.054 0.366 0.308 0.322 0.0 1.0 0.050 0.049 0.052 0.050 0.051 0.054 0.049 0.048 0.0 1.2 0.052 0.267 0.221 0.228 0.049 0.255 0.222 0.227 0.2 0.8 0.226 0.498 0.433 0.434 0.255 0.511 0.446 0.452 150 0.2 1.0 0.187 0.151 0.121 0.112 0.211 0.157 0.139 0.123 0.2 1.2 0.163 0.351 0.294 0.289 0.181 0.364 0.310 0.310 0.4 0.8 0.661 0.778 0.718 0.702 0.722 0.814 0.758 0.740 0.4 1.0 0.577 0.463 0.407 0.354 0.635 0.524 0.460 0.412 0.4 1.2 0.497 0.598 0.527 0.508 0.545 0.631 0.569 0.543 σ ad γ =.5,.8, for sample sizes = 50,100,150 ad omial level of sigificace.05. The rows with (θ,σ) = (0,1) i Table 2 correspod to the ull hypothesis. From the table it appears that the power is slightly larger for the larger value of γ. 7. Testig diagoality of a covariace matrix of a ormal radom vector with icreasig dimesio Let {X 1 (a) : a A },...,{X, (a) : a A } be idepedet ad idetically distributed cetered secod order processes idexed by a fiite set A with elemets. We deote the covariace fuctio by ad set K (a,b) = E[X 1 (a)x 1 (b)], a,b A, X (a) = 1 X j (a), a A.

16 HANXIANG PENG AND ANTON SCHICK The results i this papeow apply with ξ j = (X j (a 1 ),...,X j (a r )) for ay eumeratio (a 1,...,a r ) of the elemets of A. Footatioal cosideratios it might be more coveiet to avoid this eumeratio ad work with the give parametrizatio. I the origial parametrizatio, the aalog of (1.1) with µ = 0 is a A (7.1) ( X (a)) 2 a A K (a,a) 2 = N(0,1). a,b A K (a,b)k (b,a) Simple sufficiet coditios for this are (7.2) max K (a,b) = O(1), a A b A (7.3) k = (7.4) a,b A K (a,b)k (b,a), a A E[ X 1 (a) 4 ] = o(k ). Ideed, the first coditio implies (C1) as the operatoorm of a symmetric matrix is bouded by the maximal l 1 -orm of its rows, the secod coditio is equivalet to (C2), ad the third coditio implies (1.7). Thus (7.1) follows from Corollary 1. Let us ow illustrate this result. Suppose Z 1,...,Z are idepedet ad idetically distributed cetered p -dimesioal radom vectors. We are iterested i testig whether their dispersio matrix Σ is diagoal. Sice the radom vectors are cetered, we estimate Σ by ˆΣ = 1 Z jzj. A estimator of Σ uder the ull hypothesis is diag(ˆσ ), the diagoal matrix formed by the diagoal etries of ˆΣ. As test statistic we ca the take We ca express T as with T = (/2) ˆΣ diag(ˆσ ) 2 2. T = 1 i<j p ( X (i,j)) 2 X (i,j) = 1 Z,k,i Z,k,j k=1 where Z,k,i deotes the i-th coordiate of Z k. Here A equals {(i,j) : 1 i < j p } ad has = p (p 1)/2 elemets, while X k (a) = Z,k,i Z k,j for a = (i,j) A. Let us ow derive the asymptotic behavior of T uder the ull hypothesis. For this we assume that Σ is a diagoal matrix whose diagoal etries fall ito a compact subiterval [λ,λ] of (0, ) for each. Uder this assumptio we calculate { Σ,i,i Σ,jj, (k,l) = (i,j), K ((i,j),(k,l)) = 0, otherwise, ad fid max a A b A K (a,b) = max 1 i<j p Σ,i,i Σ,j,j Λ 2,

ad k = a,b A K (a,b)k (b,a) = CLT 17 1 i<j p E[Z 4,1,iZ 4,1,j] = 1 i<j p Σ 2,i,iΣ 2,j,j λ 4 p (p 1)/2 1 i<j p 9Σ 2,i,iΣ 2,j,j = 9k. Thus, the sufficiet coditios (7.2) (7.4) are met if p 2 = o(). I this case we have T 1 i<j p Σ,i,i Σ,j,j 2 = N(0,1). 1 i<j p Σ 2,i,i, Σ2,j,j This result, however, is of limited practical use as the quatities Σ,i,i are ukow. We claim that uder the above assumptios T 1 i<j p ˆΣ,i,iˆΣ,j,j 2 1 i<j p ˆΣ2,i,iˆΣ2,j,j = N(0,1). This follows if we show S,1 = ) 1 i<j p (ˆΣ,i,iˆΣ,j,j Σ,i,i Σ,j,j ) = o p (k 1/2 ad S,2 = (ˆΣ 2,i,iˆΣ 2,j,j Σ 2,i,iΣ 2,j,j) = o p (k ). 1 i<j p It is easy to verify that the summads i S,1 are of the form with Σ,i,i Σ,j,j (Y i +Y j +Y i Y j ) Y i = 1 k=1 ( Z 2 ),k,i 1. Σ,i,i The radom variables Y 1,...,Y p are idepedet with zero mea ad variace 2/. Usig this ad the idetity a 2 b 2 = (a b) 2 +2b(a b), we verify E[ S,1 ] 1 i<j p Λ 2 E[ Y i +Y j +Y i Y j = O(p 2 / ) = o(p ) ad E[ S,2 ] 1 i<j p Λ 4 (E[ Y i +Y j +Y i Y j 2 ]+2E[ Y i +Y j +Y i Y j ]) = O(p 2 / ) = o(p ). These yield the desired results i view of k λ 2 p (p 1)/2. 8. A auxiliary lemma Our proofs of the theorems will rely o the followig simple lemma. Lemma 1. Let X 1,...,X m be idepedet ad idetically distributed radom vectors with zero mea ad dispersio matrix V ad set S k = X 1 + + X k, k = 1,...,m. The oe has E[ S k 2 ] = ke[ X 1 2 ] = ktrace[v], k = 1,...,m,

18 HANXIANG PENG AND ANTON SCHICK ad ǫ 2 P( max 1 k m S k > ǫ) E[ S m 2 ] = mtrace(v), ǫ > 0. If also E[ X 1 4 ] is fiite, the oe has ad Var( S k 2 ) = 2k(k 1)trace(V 2 )+kvar( X 1 2 ), k = 1,...,m, m Var( S k 2 ) 2m 4 trace(v 2 )+2m 3 Var( X 1 2 ). k=1 Proof. The first iequality is the Kolmogorov iequality for radom vectors. Let X = X 1 ad Y = X 2. The ad E[ X 2 ] = E[trace[XX )] = trace(e[xx ]) = trace(v) E[(X Y) 2 ] = E[trace(X YY X)] = E[trace(YY XX )] = trace(e[yy XX ]) = trace(e[yy ]E[XX ]) = trace(v 2 ). Usig idepedece we calculate E[ S k 2 ] = E[ S k 4 ] = = 4 k k k l=1 p=1 1 i<j k k E[Xi X j ] = k k E[ X i 2 ] = ke[ X 2 ] = ktrace(v), k E[Xi X j Xl X p ] E[(X i X j ) 2 ]+ k E[ X i 4 ]+2 1 i<p k = 2k(k 1)trace(V 2 )+ke[ X 4 ]+k(k 1)(trace(V)) 2, E[ X i 2 ]E[ X p 2 ] ad hece obtai the desired form of Var( S k 2 ). It is easy to see that the covariace of S i 2 ad S j 2 equals the variace of S mi(i,j) 2. Thus we obtai m m Var( S k 2 ) = Var( S k 2 )(1+2(m k)) k=1 = k=1 m (1+2(m k)k(2(k 1)trace(V 2 )+Var( X 2 )) k=1 ad hece the desired boud o the variace of m k=1 S k 2. 9. Proof of the theorems To simplify otatio we abbreviate ξ,j by ξ j ad (trace(v)) 2 1/2 by σ ad itroduce -dimesioal radom vectors D 0 = 0 ad 2 j D j = ξ i, j = 1,...,. σ

CLT 19 I view of the idetity ξ +µ 2 = 1 ξ j 2 + 2 1 i<j ξ i ξ j +2µ ξ + µ 2 we ca write the left-had side of (1.1) as Q +R +T where Q = Dj 1ξ j, R = 1 2 [ ξ j 2 E[ ξ j 2 ]] ad T = µ σ σ ξ j. We have E[T 2 ] = 2µ V µ /σ 2 0. Thus the desired result follow if we show that R coverges to zero i probability ad that Q is asymptotically stadard ormal. The latter follows from the Martigale Cetral Limit Theorem (see e.g. part (a) of Theorem 2.5 of Hellad (1982), or Corollary 3.1 i Hall ad Heyde (1980) ad the esuig remarks) if we verify that (9.1) E j 1 (D j 1ξ j ) = 0, j = 1,...,, (9.2) ad, for ǫ > 0, (9.3) E j 1 ((Dj 1ξ j ) 2 ) = 1+o p (1) E j 1 ( Dj 1ξ j 2 1[ Dj 1ξ j > ǫ]) = o p (1), with E j 1 the coditioal expectatio give ξ 1,...,ξ j 1. Of course, (9.1) is a simple cosequece of the idepedece of the radom vectors ξ 1,...,ξ. Proof of Theorem 1. Assume ow (1.4) (1.6) hold. We have R = o p (1) i view of (1.4) ad the idetity E[R] 2 = Var( ξ 1 2 ) σ 2. The left-had side of (9.2) equals S = Dj 1V D j 1 = 2 1 2 σ 2 j V 1/2 ξ i 2. Note that the radom vector V 1/2 ξ 1 is cetered ad has dispersio matrix V. 2 We have trace(v) 4 ρ 2 trace(v) 2 = ρ 2 σ. 2 Thus, with the aid of Lemma 1 ad (1.5), we fid ad E[S ] = 1 2j 2 = 1 1 Var(S ) 8ρ2 + 8Var( V 1/2 ξ 1 2 ) 0. σ 2 This shows that S = 1+o p (1). Fially, the expected value of the left-had side of (9.3) is bouded by U /ǫ 2 with U = E[ D j 1ξ j 4 ] = 4 4 σ 4 σ 4 j=2 ξi ξ j ) 4 ]. j 1 E[(

20 HANXIANG PENG AND ANTON SCHICK Coditioig i the expectatio with idex j o ξ j, we obtai with the aid of Lemma 1, U 4 4 σ 4 [3(j 1)(j 2)E[(ξ1 V ξ 1 ) 2 ]+(j 1)E[(ξ1 ξ 2 ) 4 ] j=2 4 [ 3 3 4 σ 4 (Var( V 1/2 ξ 1 2 )+trace 2 (V))+ 2 2 E[ ξ1 ξ 2 4 ] ]. It follows from (1.5) ad (1.6) that U coverges to zero. This proves (9.3) ad completes the proof of Theorem 1. Proof of Theorem 2. For a arbitrary positive ǫ, we ca write R = R,1 +R,2, where R,1 = 1 ( ξj 2 1[ ξ j ǫ ] E[ ξ j 2 1[ ξ,j ǫ ]] ), σ R,2 = 1 σ ( ξj 2 1[ ξ j > ǫ ] E[ ξ j 2 [ ξ j > ǫ ]] ), ad calculate E[ R,2 ] 2L (ǫ)/σ ad E[R 2,1] E[ ξ 1 4 1[ ξ 1 ǫ ]] σ 2 This shows that R = o p (1). Next, we show (9.4) D = max Dj 1 = O p (1) ad 1 j Ideed, with the help of Lemma 1 we obtai ad P(D > 2K) = P( max 1 j j 1 ǫ2 E[ ξ 1 2 ] σ 2 = ǫ 2trace(V ) trace(v 2 ). D j 1 2 = O p (1). trace(v ) ξ i > Kσ ) σk 2 2 = 1 K2, K > 0, E[ D j 1 2 ] = 2 2 σ 2 (j 1)trace(V ) trace(v ). The statemets (9.4) imply (9.3), sice the left-had side of (9.3) is bouded by D j 1 2 y 2 1[ D j 1 y > ǫ]df (y) y 2 1[Dy > ǫ ]df (y) D j 1 2, where F is the distributio of ξ 1. Fially, we obtai (9.2) by verifyig (9.5) S = 2 2 σ 2 j=2 For this we write ξ j = X j +Y j with j 1 V 1/2 2 ξ i = 1+op (1). X j = ξ j 1[ ξ j ] E[ξ j 1[ ξ j ]], Y j = ξ j 1[ ξ j > ] E[ξ j 1[ ξ j > ]]. σ 2

CLT 21 I view of the Cauchy Schwarz iequality, the desired (9.5) follows from the statemets (9.6) S,1 = 2 j 1 2 σ 2 V 1/2 2 X j = 1+op (1) ad (9.7) S,2 = 2 2 σ 2 j 1 V 1/2 2 Y j = op (1). The latter follows from the boud E[S,2 ] = 2 2 σ 2 (j 1)E[ V 1/2 Y 1 2 ] ρ L (1) σ 2. The former follows if we show E[S,1 ] 1 ad Var(S,1 ) 0. We calculate E[S,1 ] = ( 1) σ 2 E[V 1/2 X 1 2 ] = ( 1) trace(w ) σ 2 with W = V 1/2 E[X 1 X1 ]V 1/2 the dispersio matrix of V 1/2 X 1. We have the idetity W = V 1/2 (V E[Z Z ] E[Z ]E[Z ] )V 1/2 with Z = ξ 1 [1 ξ 1 > ] ad obtai the iequality trace(v 2 ) 2ρ L (1) trace(w ) trace(v 2 ). This lets us coclude E[S,1 ] 1. Lemma 1 ad the iequalities V 1/2 X 1 2 4ρ ad trace(w 2 ) trace(v 4 ) ρ 2 trace(v 2 ) yield Var(S,1 ) 8 4 σ 4 ( 4 trace(w)+ 2 3 E( V 1/2 X 1 4 ]) 8ρ2 +32ρ σ 2 0. This completes the proof of Theorem 2. Refereces [1] Bai, Z. ad Saraadasa, H. (1996). Effect of high dimesio: by a example of a two sample problem. Statist. Siica, 6, 311-329. [2] Che, S.X., Peg, L. ad Qi, Y.-L. (2009). Effects of data dimesio o empirical likelihood. Biometrika 96, 711 722. [3] Hall, P. ad Heyde, C.C. (1980). Martigale limit theory ad its applicatio. Academic Press. [4] Hellad, I. (1982). Cetral limit theorems for martigales with discrete or cotiuous time. Scad. J. Statist. 9, 79 94. [5] Hjort, N.L., McKeague, I.W. ad Va Keilegom, I. (2009). Extedig the scope of empirical likelihood. A. Statist. 37, 1079 1111. [6] Owe, A. (1988). Empirical likelihood ratio cofidece itervals for a sigle fuctioal. Biometrika 75, 237 249. [7] Owe, A. (1990). Empirical likelihood ratio cofidece regios. A. Statist. 18, 90 120. [8] Owe, A. (2001). Empirical Likelihood. Chapma & Hall/CRC, Lodo. [9] Peg, H. ad Schick, A. (2005). Efficiet estimatio of liear fuctioals of a bivariate distributio with equal, but ukow, margials: The least squares approach. J. Multivariate Aal. 95, 385-409. [10] Peg, H. ad Schick, A. (2013). A empirical likelihood approach to goodess of fit testig. Beroulli, 19, 2250 2276. [11] Portoy, S. (1988). Behavior of likelihood methods for expoetial families whe the umber of parameters teds to ifiity. A. Statist. 16, 356 366.

22 HANXIANG PENG AND ANTON SCHICK Haxiag Peg, Idiaa Uiversity Purdue Uiversity, at Idiaapolis, Departmet of Mathematical Scieces, Idiaapolis, IN 46202-3267, USA Ato Schick, Departmet of Mathematical Scieces, Bighamto Uiversity, Bighamto, NY 13902-6000, USA