Econometrica, Vol. 69, No. 6 (November, 2001), ASYMPTOTIC OPTIMALITY OF EMPIRICAL LIKELIHOOD FOR TESTING MOMENT RESTRICTIONS


Econometrica, Vol. 69, No. 6 (November, 2001), 1661–1672

By Yuichi Kitamura¹

1. INTRODUCTION

Economic theory often provides testable implications in the form of moment restrictions. Hansen (1982) proposed a computationally simple test of moment restrictions based on the Generalized Method of Moments (GMM). This test, often referred to as the overidentifying restrictions test, is routinely used in practice. Several alternatives to Hansen's test have been proposed, including the empirical likelihood ratio test (Owen (1988), Qin and Lawless (1994), Kitamura (1996, 1997)) and its variants (Kitamura and Stutzer (1997), Imbens, Spady, and Johnson (1998)).

This paper investigates the asymptotic size and power, i.e. the asymptotic efficiency, of moment restrictions tests, in terms of large deviations. The conventional Pitman approach is not very useful for this purpose since, under appropriate local alternatives, all of the test statistics mentioned above have the same limiting noncentral chi-square distributions. This paper uses Hoeffding's (1965) global approach to overcome this problem. Hoeffding investigated multinomial models using an asymptotic relative efficiency (ARE) concept in which both type I and type II error probabilities go to zero asymptotically. Specifically, let $\alpha_n$ and $\beta_n$ denote the type I and type II error probabilities of a test. Consider (competing) tests that satisfy

$$\limsup_{n\to\infty}\frac{1}{n}\log\alpha_n\le-\eta \tag{1}$$

for a given $\eta>0$. Among such tests, a test is optimal if it minimizes

$$\limsup_{n\to\infty}\frac{1}{n}\log\beta_n \tag{2}$$

uniformly over all multinomial distributions with the same support. This optimality criterion is sometimes called the Generalized Neyman-Pearson criterion. Hoeffding showed that the LR test is optimal with respect to this criterion.² His result has been elaborated (Oosterhoff and van Zwet (1972), Sutrick (1986)) and extended in various directions (Gutman (1989), Steinberg and Zeitouni (1992), Zeitouni and Gutman (1991a)). We demonstrate that a Generalized Neyman-Pearson optimality result holds for the empirical likelihood ratio test for moment restrictions with a possibly continuous distribution.³

¹ I thank Hyungtaik Ahn, Badi Baltagi, Tim Bollerslev, Bill Brown, Ray Deneckere, Art Goldberger, Joel Horowitz, John Kennan, Chuck Manski, Rosa Matzkin, Ingmar Prucha, Robin Sickles, Richard Spady, Ken West, a co-editor, four anonymous referees, and seminar participants at Hitotsubashi University, Northwestern University, Rice University, Texas A&M University, the University of Maryland, the University of Virginia, the University of Wisconsin, the Yokohama National University, the NBER Summer Institute, and the 1997 North American Winter Meetings of the Econometric Society for helpful comments. This is a revised version of a paper circulated under the title "GNP-Optimal Tests for Moment Restrictions." I gratefully acknowledge financial support from the National Science Foundation (Grant Nos. SBR-96320 and SES-9905247) and the Alfred P. Sloan Foundation.

² There are some other global asymptotic efficiency measures used in the literature. For example, the LR test for multinomial models is known to be optimal in Bahadur's sense (Serfling (1980, Section 10.6.4)). Given this result, one may conjecture that empirical likelihood is optimal also in Bahadur's sense. The Bahadur ARE typically relies on a law of large numbers under alternatives, and Hoeffding's measure may have some advantage in this regard. The Hodges-Lehmann criterion uses the large deviation principle (LDP) under a particular type of alternative, and this is hard to evaluate in practice for many models, including ours. The Chernoff ARE is more balanced than Bahadur or Hodges-Lehmann in that it uses the LDP for both size and power (so does the Hoeffding ARE), but it is not straightforward to apply it to our case. For example, Theorem 2.2 of Kallenberg (1982), which is a basic apparatus for obtaining the Chernoff index, does not directly apply to our problem.

³ Berk and Jones (1979) note similar optimality results for likelihood-based tests in testing the goodness-of-fit for a uniform distribution.

2. MOMENT RESTRICTION TESTS

Consider a known function $g:\mathbb{R}^d\times\Theta\to\mathbb{R}^m$, where $\Theta\subset\mathbb{R}^r$ is the parameter space. Suppose our null hypothesis is given by the following moment restriction:

$$\int g(x,\theta_0)\,d\mu_0=0,\qquad\theta_0\in\Theta, \tag{H$_0$}$$

where $\mu_0$ is the (true and unknown) law of the random variable $x$. Let $\{x_i\}_{i=1}^n$ be IID observations distributed according to $\mu_0$. When $m>r$, (H$_0$) is called an overidentifying restriction. Our task is to decide whether the restriction (H$_0$) is compatible with the observations.

Among the various methods that could be potentially used to test (H$_0$), let us first consider the method of empirical likelihood. The notion of empirical likelihood was introduced by Owen (1988, 1990a, 1990b, 1991), and extended to the above setting by Qin and Lawless (1994). The (constrained) maximum empirical likelihood is

$$L^{\text{constrained}}_{EL}=\sup_{\theta\in\Theta}\ \sup_{p_1,\dots,p_n}\Big\{\prod_{i=1}^n p_i:\ p_i>0,\ \sum_{i=1}^n p_i=1,\ \sum_{i=1}^n p_i\,g(x_i,\theta)=0\Big\}. \tag{3}$$

Without the moment constraint, the maximum empirical likelihood is $L^{\text{unconstrained}}_{EL}=n^{-n}$. The empirical likelihood ratio statistic for testing the hypothesis (H$_0$) is given by

$$l_{EL}=-2\log\big(L^{\text{constrained}}_{EL}/L^{\text{unconstrained}}_{EL}\big)=\inf_{\theta\in\Theta}\max_{\gamma\in\mathbb{R}^m}2\sum_{i=1}^n\log\big(1+\gamma'g(x_i,\theta)\big) \tag{4}$$

(if the conditioning set for (3) is empty, simply let $l_{EL}=\infty$). Under regularity conditions, the limiting distribution of this statistic is chi-squared with $m-r$ degrees of freedom (Qin and Lawless (1994)).

The empirical likelihood procedure described above can be re-interpreted using information theory in the same way as in the standard parametric likelihood procedure.
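The inner maximization in (4) can be illustrated in the simplest possible case. The following Python sketch is not from the paper: it assumes a scalar moment function evaluated at one fixed $\theta$ (so $m=1$ and the outer minimization over $\theta$ is omitted), and the function name `el_ratio_scalar` and the bisection solver are illustrative choices. It returns $2\max_\gamma\sum_i\log(1+\gamma g_i)$ together with the implied weights $p_i=1/(n(1+\gamma g_i))$ from (3).

```python
import math

def el_ratio_scalar(g, tol=1e-13):
    """Return (statistic, weights) for a scalar moment evaluated at a fixed theta.

    statistic = 2 * max_gamma sum_i log(1 + gamma * g_i)   (inner problem of (4))
    weights   = p_i = 1 / (n * (1 + gamma * g_i))          (maximizers in (3))
    """
    n = len(g)
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        if g_min == 0.0 and g_max == 0.0:
            # Every g_i is zero: the restriction holds with uniform weights.
            return 0.0, [1.0 / n] * n
        # Zero is not inside the range of g: the constraint set in (3) is empty.
        return math.inf, None

    # gamma must keep every 1 + gamma * g_i strictly positive.
    lower = -1.0 / g_max
    upper = -1.0 / g_min
    a = lower * (1.0 - 1e-10)   # step slightly inside the open interval
    b = upper * (1.0 - 1e-10)

    def score(gamma):
        # Derivative of sum_i log(1 + gamma * g_i); strictly decreasing in gamma.
        return sum(gi / (1.0 + gamma * gi) for gi in g)

    # Bisection on the first-order condition score(gamma) = 0.
    for _ in range(200):
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
        if b - a < tol:
            break

    gamma = 0.5 * (a + b)
    stat = 2.0 * sum(math.log1p(gamma * gi) for gi in g)
    weights = [1.0 / (n * (1.0 + gamma * gi)) for gi in g]
    return stat, weights
```

For a vector-valued moment function the same concave problem would be solved by a multivariate method such as Newton's, and the result would then be minimized over $\theta$ as in (4).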

Let $M$ denote the space of probability measures on the Borel $\sigma$-field $(\mathbb{R}^d,\mathcal{B}^d)$. Define $\mathcal{P}(\theta)=\{\mu\in M:\int g(x,\theta)\,d\mu=0\}$ and $\mathcal{P}=\bigcup_{\theta\in\Theta}\mathcal{P}(\theta)$. With this notation, our hypothesis can be rewritten as (H$_0$) $\mu_0\in\mathcal{P}$. For probability measures $P$ and $Q$, define the relative entropy:

$$I(P\mid Q)=\begin{cases}\int\log(dP/dQ)\,dP&\text{if }P\text{ is absolutely continuous with respect to }Q,\\ \infty&\text{otherwise.}\end{cases}$$

It is easy to see that (3) is equivalent to

$$\inf_{P\in\mathcal{P}}I(\mu_n\mid P), \tag{5}$$

where $\mu_n$ denotes the empirical measure of $\{x_i\}_{i=1}^n$ (note that the set $M$ includes discrete measures). The empirical likelihood ratio test of (H$_0$) is to

$$\text{reject H}_0\text{ if }\inf_{P\in\mathcal{P}}I(\mu_n\mid P)>C \tag{E}$$

for a constant $C$. Note that the test (E) depends on data solely through the empirical measure. The standard GMM and its alternatives mentioned in Section 1 also fall into this category.⁴ Therefore in what follows a test is defined as a partition $(\Omega_1,\Omega_2)$ of $M$ such that (H$_0$) is accepted (rejected) if the observed empirical measure is in $\Omega_1$ ($\Omega_2$). A test (partition) $(\Omega_1,\Omega_2)$ may be denoted simply by $\Omega$ as a short-hand notation. Our goal is to construct a test that is asymptotically optimal in Hoeffding's sense.

3. MAIN RESULTS

First we introduce a metric for the space of probability measures $M$. Let $F_1$ and $F_2$ be the distribution functions of $\mu_1$ and $\mu_2$, respectively, in $M$. The multivariate Lévy metric for $\mu_1$ and $\mu_2$ is

$$\rho(\mu_1,\mu_2)=\inf\{\epsilon>0:\ F_1(x-\epsilon e)-\epsilon\le F_2(x)\le F_1(x+\epsilon e)+\epsilon\ \text{for all }x\in\mathbb{R}^d\},$$

where $e=(1,\dots,1)'$ is the unit vector in $\mathbb{R}^d$. In what follows $M$ is equipped with the Lévy metric. Recall that the Lévy metric is compatible with the weak topology on $M$ (see, for example, Lemma 3.2.2 of Deuschel and Stroock (1989)).

To apply Hoeffding's approach to our problem, it is necessary to evaluate large deviation probabilities, as discussed in Section 1. As the rejection region of the empirical likelihood test is defined in terms of $\mu_n$, a large deviation principle for empirical measures is needed. The following result, known as Sanov's Theorem, is useful for this purpose. Throughout the rest of the paper, the notation $\mu^n$ is used to denote the $n$-fold product measure $\otimes_{i=1}^n\mu$ of a measure $\mu$.

⁴ These specification tests may fail to be consistent (see Newey (1985)). If $g(x_i,\theta)$ is replaced by $g(x_i,z_i,\theta)$ and its conditional mean given $z_i$ is restricted, then many consistent tests for the restriction are available. Whether the results in this paper carry over to conditional mean restrictions is an interesting open question. To answer this, a careful analysis of the topology of rejection sets is necessary, among other technical difficulties. This is beyond the scope of the current paper.
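The link between the empirical likelihood in (3) and the relative entropy in (5) can be spelled out for candidate measures supported on the sample. The short derivation below is an added sketch, not part of the original text. It assumes that the observations are distinct and that $P$ puts all of its mass on $x_1,\dots,x_n$, with $p_i=P(\{x_i\})>0$ (a notation introduced here for illustration); $\mu_n$ is the empirical measure, which assigns mass $1/n$ to each observation.

```latex
\begin{align*}
I(\mu_n \mid P)
  &= \int \log\frac{d\mu_n}{dP}\, d\mu_n
   = \sum_{i=1}^{n} \frac{1}{n}\,\log\frac{1/n}{p_i} \\
  &= -\frac{1}{n}\,\log\frac{\prod_{i=1}^{n} p_i}{n^{-n}} .
\end{align*}
```

Under these assumptions, minimizing $I(\mu_n\mid P)$ subject to $\sum_i p_i g(x_i,\theta)=0$ and over $\theta$ is the same problem as maximizing $\prod_i p_i$ in (3), so the minimized relative entropy equals $l_{EL}/(2n)$, and rejecting when it exceeds $C$ corresponds to rejecting when $l_{EL}>2nC$.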

Theorem 1 (Sanov): Let $M$ denote the space of probability measures on a Polish space equipped with the Lévy metric, and suppose $\mu\in M$. Then

$$\limsup_{n\to\infty}\frac{1}{n}\log\mu^n\{\mu_n\in G\}\le-\inf_{\nu\in G}I(\nu\mid\mu)$$

for all closed sets $G\subset M$, and

$$\liminf_{n\to\infty}\frac{1}{n}\log\mu^n\{\mu_n\in H\}\ge-\inf_{\nu\in H}I(\nu\mid\mu)$$

for all open sets $H\subset M$.

See Deuschel and Stroock (1989, Theorem 3.2.17) for a statement, a proof, and details of Sanov's Theorem.

Next, let us introduce some more notation. In what follows we sometimes need to consider a sequence of tests indexed by the sample size. In such a case, we write $\Omega_n$ rather than $\Omega$ to signify the dependence of the test on $n$ explicitly. Let $B(\mu,\delta)$ be an open ball of radius $\delta$ around $\mu$. A $\delta$-blowup ($\delta$-smoothing) of a set $X$ is defined by $X^{\delta}=\bigcup_{\mu\in X}B(\mu,\delta)$. $X^c$ denotes the complement of a set $X$.

As noted in Section 2, an empirical likelihood ratio test is mathematically equivalent to a test in terms of $\inf_{P\in\mathcal{P}}I(\mu\mid P)$, which is a functional of $\mu$. This fact provides a useful insight for the Generalized Neyman-Pearson optimality of empirical likelihood. The basic idea is to apply the large deviation principle to the functional $\inf_{P\in\mathcal{P}}I(\mu\mid P)$. If the functional were $\tau$-continuous (i.e. continuous in the $\tau$-topology⁵), then it would be fairly straightforward to establish the optimality of empirical likelihood. This is not the case, unfortunately, and a conditioning argument is used here to deal with the problem. Put loosely, we consider a set of conditional measures for which $\inf_{P\in\mathcal{P}}I(\mu\mid P)$ is $\tau$-continuous. To invoke this conditioning argument, the following assumption is introduced. Let $\|u\|$ denote the Euclidean norm $(u'u)^{1/2}$ of a column vector $u$.

(T) $P\{\sup_{\theta\in\Theta}\|g(x,\theta)\|=\infty\}=0$ for all $P\in\mathcal{P}$.

(T) is a tightness condition and simply says that $\sup_{\theta\in\Theta}\|g(x,\theta)\|$ is a random variable under all $P\in\mathcal{P}$ (see, e.g., Chow and Teicher (1997)). We also assume the following.

(C) At each $\theta\in\Theta$, $g(x,\theta)$ is continuous for all $x\in\mathbb{R}^d$.

Here is our main theorem.

Theorem 2 ($\delta$-Optimality of Empirical Likelihood): Assume that (T) and (C) hold, and for any $\eta>0$, let $\Lambda=(\Lambda_1,\Lambda_2)$ denote the following empirical likelihood ratio test:

$$\Lambda_1=\Big\{\mu\in M:\inf_{P\in\mathcal{P}}I(\mu\mid P)<\eta\Big\},\qquad\Lambda_2=\Lambda_1^c.$$

(a) $\displaystyle\sup_{P\in\mathcal{P}}\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Lambda_2\}\le-\eta$.

(b) If a test $\Omega_n=(\Omega_{1,n},\Omega_{2,n})$ satisfies

$$\sup_{P\in\mathcal{P}}\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Omega_{2,n}^{\delta}\}\le-\eta$$

for some $\delta>0$, then

$$\limsup_{n\to\infty}\frac{1}{n}\log P_2^n\{\mu_n\in\Omega_{1,n}\}\ge\limsup_{n\to\infty}\frac{1}{n}\log P_2^n\{\mu_n\in\Lambda_1\}$$

for all $P_2\in M$.

Proof: See the Appendix.

⁵ The $\tau$-topology on $M$ is the topology generated by the open sets $\{\mu\in M:|\int\phi\,d\mu-w|<\delta\}$, $w\in\mathbb{R}$, $\delta>0$, with $\phi$ ranging over the space of all bounded measurable functions on $\mathbb{R}^d$ into $\mathbb{R}$.

The optimality property of the empirical likelihood ratio test described above is termed $\delta$-optimality.⁶ Note that it does not matter what the alternative $P_2$ is, and in that sense our result establishes a uniform optimal property of the empirical likelihood ratio test. In the literature of information theory this kind of optimal test is said to be universal. Our notion of $\delta$-optimality differs from the original definition (see Dembo and Zeitouni (1998), Zeitouni and Gutman (1991a)); our optimality result holds without modifying $\Lambda$ by $\delta$-smoothing in any way. On the other hand, the arbitrary alternative test $\Omega_n$ requires $\delta$-smoothing. Theorem 2 covers Hansen's overidentifying restrictions test, at least in IID cases, as the test is based on the empirical distribution.

Note that the existence of any small positive number $\delta$ that fulfills the qualifications in Theorem 2 will ensure the $\delta$-optimality of empirical likelihood. The margin $\delta$ is still necessary due to the rough nature of the large deviation results used here. (See Dembo and Zeitouni (1998, p. 33)). More precisely, the proof of the theorem is based on the argument that the empirical likelihood-based rejection region $\Lambda_2$ includes the rejection region of an alternative test (thus the former is at least as powerful as the latter), under the qualification that the two tests are comparable in terms of the large deviation probability of type I errors. So, to prove the theorem, it suffices to show that if the inclusion relation does not hold, there is a region (in the space of measures) for which the (large deviation) type I error probability of the empirical likelihood ratio exceeds that of the alternative test, violating the qualification. This can be shown using Sanov's Theorem. The $\delta$-smoothing is necessary for the application of Sanov's Theorem.
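The large deviation rates that drive this argument can be checked numerically in the simplest setting. The sketch below is an added illustration rather than part of the paper: it assumes Bernoulli($p$) observations, so that the empirical measure is summarized by the sample frequency, and takes the closed set of measures whose frequency is at least $a>p$. The function names are hypothetical. Sanov's upper bound then says that $(1/n)\log$ of the probability of that set is asymptotically at most minus the relative entropy of Bernoulli($a$) with respect to Bernoulli($p$).

```python
import math

def kl_bernoulli(a, p):
    """Relative entropy I(Bernoulli(a) | Bernoulli(p)), for 0 < a, p < 1."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

def log_upper_tail(n, k_min, p):
    """log P{S_n >= k_min} for S_n ~ Binomial(n, p), computed by log-sum-exp."""
    terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def decay_rate(n, a, p):
    """(1/n) log P{sample frequency >= a} under IID Bernoulli(p) draws."""
    return log_upper_tail(n, math.ceil(n * a), p) / n

if __name__ == "__main__":
    p, a = 0.5, 0.7
    for n in (50, 500, 5000):
        # The first number approaches the second as n grows.
        print(n, decay_rate(n, a, p), -kl_bernoulli(a, p))
```

The finite-sample rate lies below the limit because of polynomial-in-$n$ corrections that vanish after dividing by $n$; this is the sense in which large deviation results are "rough."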
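With an additional assumption, the need to smooth alternative tests can be eliminated. A test $\Omega_n=(\Omega_{1,n},\Omega_{2,n})$ is called regular if

$$\lim_{\delta\to0}\sup_{P\in\mathcal{P}}\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Omega_{2,n}^{\delta}\}=\sup_{P\in\mathcal{P}}\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Omega_{2,n}\}.$$

This condition has been used by Zeitouni and Gutman (1991a), who also have provided a sufficient condition for it. Using this definition, we get the following corollary.

Corollary 1: Assume that (T) and (C) hold. For any regular test $\Omega_n$ that satisfies

$$\sup_{P\in\mathcal{P}}\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Omega_{2,n}\}\le-\eta,$$

we have

$$\limsup_{n\to\infty}\frac{1}{n}\log P_2^n\{\mu_n\in\Omega_{1,n}\}\ge\limsup_{n\to\infty}\frac{1}{n}\log P_2^n\{\mu_n\in\Lambda_1\}$$

for all $P_2\in M$, where $\Lambda$ is the empirical likelihood ratio test defined in Theorem 2.

⁶ Note that the size of the tests in Theorem 2 is controlled through sup of lim sup, not lim sup of sup. While the analysis of the latter formulation is important, here we choose to use the former due to technical reasons.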

Proof: See the Appendix.

As the above corollary shows, in terms of Hoeffding's type of asymptotic efficiency criterion, the empirical likelihood ratio test (E) is not less powerful than any regular test. This is a Neyman-Pearson type optimality result, though the probability law of $x$ under an alternative hypothesis is not known a priori. Our result also differs from the standard Neyman-Pearson lemma in that our null hypothesis is nonparametric.

4. NUMERICAL EXAMPLES

This section provides some simulation evidence. Our experimental design follows Hall and Horowitz's (1996) simulation study. In each Monte Carlo replication, an IID sequence of bivariate normal random vectors $(x_i,z_i)'\sim N(c,(0.4+s)^2I_2)$, $i=1,\dots,n$, is generated, where $c$ is a $2\times1$ vector and $s$ is a scalar. The distribution under an alternative is characterized by $c$ and $s$. Hall and Horowitz consider the estimation of an unknown parameter $\theta$ in a nonlinear model $h(x,z,\theta)=\exp(-0.72-\theta(x+z)+3z)-1$, instrumented by a constant and $z$. The null hypothesis is

$$\int\big(\exp(-0.72-\theta_0(x+z)+3z)-1\big)\begin{pmatrix}1\\ z\end{pmatrix}d\mu_0=0,\qquad\theta_0\in\Theta,$$

where $\mu_0$ is the law of $(x_i,z_i)'$. The null holds at $(c',s)=(0,0,0)$ and $\theta=3$ in our simulation design.

Four testing procedures are compared. The first is the empirical likelihood ratio test. The second is Hansen's J-test with the usual two-step GMM. The third is the J-test with 10-step GMM, where the optimal weighting function is re-estimated ten times, each time evaluated at the estimate from the previous step. The fourth is the "Euclidean likelihood ratio test"; this is simply the J-test with the continuous updating GMM.⁷

Our algorithm for empirical likelihood first computes the profile empirical likelihood at a fixed value of $\theta$. This inner loop returns the log value of the profile empirical likelihood at each $\theta$, and the outer loop maximizes the log profile likelihood over $\theta$. The computational cost of the inner loop is trivial due to the convexity of the problem. The outer loop maximization with respect to $\theta$ is more difficult, depending on the property of the moment function, but no more than the maximization of GMM objective functions. An interesting empirical finding is that, while both the continuous-updating GMM (i.e. the maximum Euclidean likelihood estimation) and the maximum empirical likelihood estimation are one-step procedures (as opposed to feasible type procedures), it appears that the maximum empirical likelihood estimator is much more stable in the algorithmic sense.

⁷ The continuous updating GMM procedure has a tendency to yield very large estimates. This is also noted by Hansen, Heaton, and Yaron (1996) and Imbens, Spady, and Johnson (1998). It appears that one of the main causes of this is that the weighting matrix sometimes explodes at parameter values far from the true value, thereby driving down the value of the GMM objective function (thus the null hypothesis is likely to be accepted). This tendency becomes conspicuous as the DGP deviates from the null. In the simulations reported here, the continuous updating GMM algorithm is modified so that it switches to the 10-step GMM when the maximum element of the weighting matrix exceeds $10^{10}$. If anything, this switching algorithm would be biased in favor of the continuous-updating GMM. The (size-corrected) power of the continuous-updating GMM-based test can be overestimated by this procedure, as the explosion of the weighting matrix that makes the objective function value small tends to occur more frequently under alternatives than under the null.
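The statement that the null holds at $(c',s)=(0,0,0)$ and $\theta=3$ can be verified in closed form. The Python sketch below is an added illustration and rests on one reading of the design: $x$ and $z$ are independent normals with means $c_1$, $c_2$ and common standard deviation $0.4+s$, and the moment function is $h$ times the instruments $(1,z)$ with $h=\exp(-0.72-\theta(x+z)+3z)-1$. The function name `population_moments` is hypothetical. It uses the normal moment generating function to evaluate the two population moments.

```python
import math

SIGMA0 = 0.4  # baseline standard deviation in the design (s = 0)

def population_moments(theta, c1=0.0, c2=0.0, s=0.0):
    """Closed-form (E[h], E[h*z]) for h = exp(-0.72 - theta*(x + z) + 3*z) - 1,
    with independent x ~ N(c1, v), z ~ N(c2, v) and v = (0.4 + s)**2."""
    v = (SIGMA0 + s) ** 2
    a, b = -theta, 3.0 - theta  # the exponent is -0.72 + a*x + b*z
    # E[exp(-0.72 + a*x + b*z)] from the normal moment generating function.
    mgf = math.exp(-0.72 + a * c1 + b * c2 + 0.5 * v * (a * a + b * b))
    m1 = mgf - 1.0                   # E[h]
    # E[z * exp(a*x + b*z)] = (c2 + b*v) * E[exp(a*x + b*z)] for normal z.
    m2 = (c2 + b * v) * mgf - c2     # E[h * z]
    return m1, m2

if __name__ == "__main__":
    print(population_moments(3.0))            # null design: both moments ~ 0
    print(population_moments(3.0, c2=0.5))    # shift in the mean of z only
    print(population_moments(3.0, c1=0.1))    # shift in the mean of x
    print(population_moments(3.0, s=0.1))     # larger variance
```

Under these assumptions both moments vanish at $\theta=3$ when only the mean of $z$ is shifted, whereas shifting the mean of $x$ or the variance makes the first moment nonzero at $\theta=3$. Establishing that no other $\theta$ restores both moments would require a search over $\theta$, which is not shown.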

TABLE I
Size of Moment Restrictions Tests

Nominal Size               0.010   0.050   0.100
Empirical Likelihood       0.023   0.087   0.144
2-Step GMM                 0.024   0.087   0.143
10-Step GMM                0.039   0.094   0.140
Continuous Updating GMM    0.024   0.077   0.124

Note: The number of Monte Carlo replications is 5,000.

Throughout this experimental design, $n$ is set to be 200. Table I reports the size properties of the four tests, based on 5,000 Monte Carlo replications. Though all the tests have moderate size distortions, the ten-step GMM is slightly worse, and the continuous updating is slightly better than the others. The empirical distributions of the test statistics in this simulation are used for size-correction in the next four tables.

The power properties are reported in Table II. The results are based on 500 Monte Carlo replications for each parameterization (the size corrected critical values are obtained from 5,000 replications, as noted above). Four directions of the departure from the null are considered, setting $(c',s)$ equal to $k$ times a direction vector, where $k$ is a scale factor that takes values from $-0.2$ to $0.2$, and the direction vector is either (1, 1, 0), (2, 0, 0), ( , , 0), or (0, 0, 1). We did not experiment with the direction (0, constant, 0), since the null continues to hold in this direction. When the direction is (1, 1, 0), the ten-step GMM and the continuous updating GMM tend to have poor power for negative $k$'s. The two-step GMM test has lower power than the other tests against positive $k$'s, and its power is considerably lower than the empirical likelihood ratio test, in particular for positive $k$'s. While the ranking among the three GMM-based tests varies with parameterization, empirical likelihood performs better than them consistently for the other specifications of the direction as well. Though the two-step GMM test has higher power than the ten-step GMM and the continuous updating GMM tests in many cases, its power loss relative to the empirical likelihood ratio test is often sizable.
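The size-correction step described above can be sketched as follows. This is an added illustration, not the author's code: it assumes that the size-corrected critical value is an empirical upper quantile of the statistics simulated under the null and that a test rejects when its statistic strictly exceeds that value. Both function names, and the small tolerance used to guard against rounding in `alpha * N`, are choices made here.

```python
import math

def size_corrected_critical_value(null_stats, alpha):
    """Order statistic of the null draws that leaves a share alpha above it.

    Intended for 0 < alpha < 1 and enough draws that alpha * N is at least 1.
    """
    ordered = sorted(null_stats)
    # Number of null draws allowed to exceed the cutoff.
    n_reject = math.floor(alpha * len(ordered) + 1e-9)
    return ordered[len(ordered) - n_reject - 1]

def rejection_rate(stats, critical_value):
    """Share of simulated statistics that exceed the critical value."""
    return sum(1 for t in stats if t > critical_value) / len(stats)

if __name__ == "__main__":
    null_draws = list(range(1, 101))       # stand-in for simulated null statistics
    cv = size_corrected_critical_value(null_draws, 0.05)
    print(cv, rejection_rate(null_draws, cv))          # size is 0.05 by construction
    print(rejection_rate([90, 96, 97, 100], cv))       # "power" on stand-in draws
```

In the design of this section, `null_stats` would hold the 5,000 statistics simulated under the null and `stats` the 500 statistics simulated under a given alternative.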
TABLE II
Power of Moment Restrictions Tests (Nominal Size = 0.05)

                 Direction (1, 1, 0)                       Direction (2, 0, 0)
                     GMM      GMM     Continuous               GMM      GMM     Continuous
   k       l_EL    2-Step   10-Step   Updating       l_EL    2-Step   10-Step   Updating
-0.200     0.880   0.768    0.554     0.584          1.000   1.000    0.994     0.902
-0.150     0.634   0.502    0.286     0.356          1.000   1.000    0.994     0.846
-0.100     0.356   0.220    0.090     0 8            0.982   0.952    0.860     0.848
-0.050     0 4     0.078    0 02      0.022          0.398   0 30     0 2       0 48
 0.050     0 80    0 66     0.226     0 72           0.392   0.402    0.444     0.346
 0.100     0.426   0.334    0.500     0.364          0.706   0.606    0.692     0.402
 0.150     0.680   0.298    0.688     0.528          0.860   0.668    0.802     0.442
 0.200     0.804   0.464    0.772     0.540          0.936   0.744    0.836     0.524

                 Direction ( , , 0)                        Direction (0, 0, 1)
                     GMM      GMM     Continuous               GMM      GMM     Continuous
   k       l_EL    2-Step   10-Step   Updating       l_EL    2-Step   10-Step   Updating
-0.200     1.000   0.990    0.972     0.960          0.978   0.974    0.966     0.742
-0.150     0.926   0.870    0 78      0.758          0.866   0.866    0.888     0.686
-0.100     0.556   0.438    0 80      0.252          0.598   0 60     0.644     0.506
-0.050     0 2     0.066    0 04      0.020          0 90    0 96     0.222     0 96
 0.050     0 54    0 62     0 82      0 42           0 30    0.076    0 08      0.024
 0.100     0.298   0.326    0 36      0.230          0.378   0.240    0.070     0 04
 0.150     0.460   0.498    0.480     0.340          0.720   0.530    0.256     0.280
 0.200     0.576   0 62     0.582     0.358          0.904   0.754    0.474     0.492

Note: The sample size and the number of Monte Carlo replications are 200 and 500, respectively. Size-correction was performed using 5,000 Monte Carlo replications of the statistics under the null hypothesis. See Section 4 for details.

5. CONCLUSION

We obtained a new result on the ARE of moment restriction tests that had been previously known to be asymptotically equivalent using the conventional local power analysis. A powerful large deviation result, known as Sanov's theorem, was used to discover an optimal property of Owen's empirical likelihood ratio tests. Loosely put, our theorem tells us that if two tests, an empirical likelihood ratio test and an arbitrary test based on some other "distance," satisfy the same size constraint, the empirical likelihood ratio test is more powerful than the alternative test, unless the relative entropy and the alternative distance happen to coincide (in the latter special case, the two tests have essentially the same power). Note well that in our theorem size and power are evaluated through the large deviation principle, so their asymptotic decay rates are used as ARE criteria. An attractive feature of our optimality result is that it holds in a Generalized Neyman-Pearson sense, that is, it does not matter what the alternative is; the latter information is typically not available in practice.

Since we have demonstrated that empirical likelihood ratio tests have a good power property, they seem to deserve serious attention in future econometric research of moment restriction models, which are conventionally tested via GMM. While we recognize the importance of the issue of small sample size distortion, it should be emphasized that statistical testing procedures with weak power make few scientific contributions. In view of the results obtained in this paper, attempts to better understand the small sample size properties of empirical likelihood ratio tests should be valuable.

Dept. of Economics, University of Wisconsin, Social Science Bldg., 1180 Observatory Dr., Madison, WI 53706-1393, U.S.A.

Manuscript received October, 1997; final revision received October, 2000.

APPENDIX

We first show some auxiliary results before proving the main theorem.
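Let us introduce the set $B_m=\{x:\sup_{\theta\in\Theta}\|g(x,\theta)\|\le m,\ \|x\|\le m\}$, $m\in\mathbb{N}$, which will be used for conditioning. Note that $\lim_{m\to\infty}P\{B_m\}=1$ for all $P\in\mathcal{P}$ under Condition (T). Let $P_m$ denote a conditional probability measure defined by $P_m(C)=P(C\mid B_m)$, $C\in\mathcal{B}^d$, $P\in M$. As in the text, the notation $P_m^n$ signifies the $n$-fold product measure of $P_m$. $1_B$ denotes the indicator function of $B\in\mathcal{B}^d$.

Lemma 1: $\inf_{P\in\mathcal{P}_m}I(\mu\mid P)$ is continuous in $\mu$.

Proof: Let $\Lambda_2^m=\{\mu:\inf_{P\in\mathcal{P}_m}I(\mu\mid P)\ge\eta\}$, where $\mathcal{P}_m=\bigcup_{\theta\in\Theta}\mathcal{P}_m(\theta)$ and $\mathcal{P}_m(\theta)=\{\mu\in M:\int g(x,\theta)1_{B_m}\,d\mu=0\}$. Without loss of generality, we focus on the case where $\mathcal{P}_m(\theta)$ is nonempty. By the duality of partially-finite programming (see Theorem 3.4 in J. Borwein and Lewis (1993) and Theorem 3.1 in P. Borwein and Lewis (1994)),

$$\inf_{P\in\mathcal{P}_m(\theta)}I(\mu\mid P)=\max_{\gamma\in\mathbb{R}^m}\int\log\big(1+\gamma'g(x,\theta)1_{B_m}\big)\,d\mu.$$

Let $R(\theta,\gamma,\mu)=\int\log(1+\gamma'g(x,\theta)1_{B_m})\,d\mu$. Due to its concavity in $\gamma$, Theorem 10.8 of Rockafellar (1970) and following the proof of Theorem 3, Haberman (1984), the maximizer of $R$ is continuous in $\mu$, and so is $\bar{R}(\theta,\mu)=\inf_{P\in\mathcal{P}_m(\theta)}I(\mu\mid P)=\max_{\gamma\in\mathbb{R}^m}R(\theta,\gamma,\mu)$. Moreover, under (C), $\bar{R}$ is continuous in $(\theta,\mu)\in\Theta\times M$. By the maximum theorem, $\inf_{P\in\mathcal{P}_m}I(\mu\mid P)=\inf_{\theta\in\Theta}\inf_{P\in\mathcal{P}_m(\theta)}I(\mu\mid P)$ is continuous in $\mu$. Q.E.D.

Lemma 2: $\liminf_{m\to\infty}\inf_{\mu\in\Lambda_2}I(\mu\mid P_m)\ge\inf_{\mu\in\Lambda_2}I(\mu\mid P)$ for $P\in M$.

Proof: It can be shown by following the proof of Lemma 4.1 of Groeneboom, Oosterhoff, and Ruymgaart (1979) (note that elements of $\Lambda_2$ that do not belong to $M(B_m)$ for any $m$ are irrelevant, since $I(\mu\mid P_m)=\infty$ for all $m$ for such elements). Q.E.D.

Lemma 3: $\inf_{\mu\in\Lambda_2^m}I(\mu\mid P_m)=\inf_{\mu\in\Lambda_2}I(\mu\mid P_m)$.

Proof: Let $M(B_m)$ denote the space of probability measures on $B_m$. Then

$$\begin{aligned}\Lambda_2^m\cap M(B_m)&=\Big\{\mu\in M(B_m):\inf_{P\in\mathcal{P}_m}I(\mu\mid P)\ge\eta\Big\}\\&=\Big\{\mu\in M(B_m):\inf_{\theta\in\Theta}\max_{\gamma\in\mathbb{R}^m}\int\log\big(1+\gamma'g(x,\theta)1_{B_m}\big)\,d\mu\ge\eta\Big\}\\&=\Big\{\mu\in M(B_m):\inf_{\theta\in\Theta}\max_{\gamma\in\mathbb{R}^m}\int\log\big(1+\gamma'g(x,\theta)\big)\,d\mu\ge\eta\Big\}\\&=\Big\{\mu\in M(B_m):\inf_{P\in\mathcal{P}}I(\mu\mid P)\ge\eta\Big\}\\&=\Lambda_2\cap M(B_m).\end{aligned} \tag{A.1}$$

Since $I(\mu\mid P_m)=\infty$ for $\mu\notin M(B_m)$, $\inf_{\mu\in\Lambda_2^m}I(\mu\mid P_m)=\inf_{\mu\in\Lambda_2^m\cap M(B_m)}I(\mu\mid P_m)$ and $\inf_{\mu\in\Lambda_2}I(\mu\mid P_m)=\inf_{\mu\in\Lambda_2\cap M(B_m)}I(\mu\mid P_m)$. These equalities and (A.1) imply the lemma. Q.E.D.

Proof of Theorem 2: First Claim: Note

$$\lim_{m\to\infty}P_m^k\{\mu_k\in\Lambda_2\}=P^k\{\mu_k\in\Lambda_2\}\quad\text{for all }k. \tag{A.2}$$

Also note that for any $\epsilon>0$ there exists an integer $k=k(\epsilon)$ such that

$$\frac{1}{k}\log P^k\{\mu_k\in\Lambda_2\}\ge\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Lambda_2\}-\epsilon. \tag{A.3}$$

By (A.2) and (A.3), there exists an integer $m_0$ such that

$$\frac{1}{k}\log P_m^k\{\mu_k\in\Lambda_2\}\ge\limsup_{n\to\infty}\frac{1}{n}\log P^n\{\mu_n\in\Lambda_2\}-2\epsilon\quad\text{for all }m>m_0. \tag{A.4}$$

We now find a bound for the left-hand-side of (A.4). Define $E_m=\{x\in\mathbb{R}^d:\|x\|\le m\}$ and let $M(E_m)$ denote the set of probability measures on $E_m$. Since $E_m$ is compact, so is $M(E_m)$ in the weak topology. Let $C_m=\Lambda_2^m\cap M(E_m)$. Consider a (positive) null sequence $\delta_1>\delta_2>\cdots$. For $z\in M$, let $\bar{B}(z,\delta)$ denote a closed ball of radius $\delta$ around $z$ with respect to the Lévy metric.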
We can find a finite covering of C m that consists of K = K l m such balls with centers z z 2 z K in C m with radius l, that is, K B z j= j l C m. Define (A.5) z l = arg max P m k k B z j l z j Cm

670 yuichi kitamura Then (A.6) k log P m k k 2 = k log P m k k C m k log P m k k K B z j l j= k log K + k log P m k k B z l l Choose large enough k = k K that satisfies the following two conditions: (i) (A.3) holds with k = k, (ii) k log K. Setting k = k in (A.6), (A.7) k log P m k k 2 + k log P m k k B z l l = + lim kj log [ P m k k B z l l ] j j + inf B z l l n log P m n n B z l l I ( ) + where the second to last inequality is due to the convexity of B z l l and the last inequality is implied by Sanov s theorem. Define m O = inf 2 I P > and m P m = inf 2 I P. Since m P m 2 is closed by Lemma, there exists a subsequence l h so that h= z lh converges to z = z m k in m. For large enough 2 h, the inclusion relationship B z m O 2 = holds, since m O 2 l h is open by Lemma. Then for large enough m, (A.8) inf B z We can choose m so that I ( ) inf m O 2 inf m 2 I ( ) I ( ) = inf I ( ) 2 by Lemma 3. (A.9) inf I ( 2 ) lim inf m inf I ( ) 2 But (A.0) lim inf m inf I ( ) inf I P by Lemma 2 2 2 for all P. By (A.4), (A.7) (A.0), n log P n n 2 + 4 for all >0. This establishes the first claim. Second Claim: Our proof closely follows the arguments by Dembo and Zeitouni (998, Section 7.) and Zeitouni and Gutman (99a, b). We first show that the proposition (A.) n

holds for all n > n 0 with some n 0. Suppose otherwise. Then there exists an infinite sequence of probability measures m such that m and m 2 n m. Since the set M inf P I P is compact in the weak topology (Deuschel and Stroock (1989, Ch. 3.2)), there exists a subsequence m k such that mk converges to a probability measure within this set. For such a B /2 n 2 m holds for some subsequence n m. Moreover, for a large enough m 0 m0 B /2 holds. Since m0, inf P I m0 P <. By Sanov's Theorem and various inclusion relations,

sup P n log P n n 2 n sup P sup P sup P sup P > lim inf n log P n m m m nm 2 n m lim inf n log P n n B /2 [ inf B /2 [ I ( m0 P )] ] I P n log P n 2 n n log P n 2 n n

This contradicts the definition of n, implying (A.11), and the proof is complete. Q.E.D.

Proof of Corollary 1: Choose >0. By the regularity of n and the assumption, there exists a > 0 such that

sup P n log P n n n 2 +

Define = M inf P I P <. Apply Theorem 1(b) to obtain

n log P n n n n log P n n = n log P inf n I n P < P

Since this holds for all >0, the conclusion follows. Q.E.D.

REFERENCES

Berk, R. H., and D. H. Jones (1979): "Goodness-of-fit Test Statistics that Dominate the Kolmogorov Statistics," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47, 47–59.
Borwein, J. M., and A. S. Lewis (1993): "Partially-finite Programming in L1 and the Existence of Maximum Entropy Estimates," SIAM Journal on Optimization, 3, 248–267.
Borwein, P., and A. S. Lewis (1994): "Moment-matching and Best Entropy Estimation," Journal of Mathematical Analysis and Applications, 185, 596–604.
Chow, Y. S., and H. Teicher (1997): Probability Theory, Third Edition. New York: Springer.
Dembo, A., and O. Zeitouni (1998): Large Deviations Techniques, Second Edition. New York: Springer.
Deuschel, J. D., and D. W. Stroock (1989): Large Deviations. New York: Academic Press.
Groeneboom, P., J. Oosterhoff, and F. H. Ruymgaart (1979): "Large Deviation Theorems for Empirical Probability Measures," Annals of Probability, 4, 553–586.

Gutman, M. (1989): "Asymptotically Optimal Classification for Multiple Tests with Empirically Observed Statistics," IEEE Transactions on Information Theory, 35, 401–408.
Haberman, S. J. (1984): "Adjustment by Minimum Discriminant Information," Annals of Statistics, 12, 971–988.
Hall, P., and J. L. Horowitz (1996): "Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators," Econometrica, 64, 891–916.
Hansen, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029–1054.
Hansen, L. P., J. Heaton, and A. Yaron (1996): "Finite Sample Properties of Some Alternative GMM Estimators," Journal of Business and Economic Statistics, 14, 262–280.
Hoeffding, W. (1965): "Asymptotically Optimal Tests for Multinomial Distributions (with Discussion)," Annals of Mathematical Statistics, 36, 369–408.
Imbens, G., R. Spady, and P. Johnson (1998): "Information Theoretic Approaches to Inference in Moment Condition Models," Econometrica, 66, 333–357.
Kallenberg, W. C. (1982): "Chernoff Efficiency and Deficiency," Annals of Statistics, 10, 583–594.
Kitamura, Y. (1996): "Empirical Likelihood and the Bootstrap for Time Series Regressions," Mimeo.
Kitamura, Y. (1997): "Empirical Likelihood Methods with Weakly Dependent Processes," Annals of Statistics, 25, 2084–2102.
Kitamura, Y., and M. Stutzer (1997): "An Information-theoretic Alternative to Generalized Method of Moments Estimation," Econometrica, 65, 861–874.
Newey, W. K. (1985): "Generalized Method of Moments Specification Testing," Journal of Econometrics, 29, 229–256.
Oosterhoff, J., and W. R. van Zwet (1972): "The Likelihood Ratio Test for the Multinomial Distribution," Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability 2, ed. by L. M. LeCam, J. Neyman, and E. L. Scott. Berkeley, CA: University of California Press, 31–49.
Owen, A. (1988): "Empirical Likelihood Ratio Confidence Intervals for a Single Functional," Biometrika, 75, 237–249.
Owen, A. (1990a): "Empirical Likelihood Ratio Confidence Regions," Annals of Statistics, 18, 90–120.
Owen, A. (1990b): "Empirical Likelihood and Small Samples," in Computing Science and Statistics: Proceedings of the Symposium on the Interface. Berlin: Springer-Verlag, pp. 79–88.
Owen, A. (1991): "Empirical Likelihood for Linear Models," Annals of Statistics, 19, 1725–1747.
Qin, J., and J. Lawless (1994): "Empirical Likelihood and General Estimating Equations," Annals of Statistics, 22, 300–325.
Rockafellar, R. T. (1970): Convex Analysis. Princeton: Princeton University Press.
Serfling, R. J. (1980): Approximation Theorems of Mathematical Statistics. New York: John Wiley.
Steinberg, Y., and O. Zeitouni (1992): "On Tests for Normality," IEEE Transactions on Information Theory, 38, 1779–1787.
Sutrick, K. H. (1986): "Asymptotic Power Comparison of the Chi-square and Likelihood Ratio Tests," Annals of the Institute of Statistical Mathematics, 38, 503–511.
Zeitouni, O., and M. Gutman (1991a): "On Universal Hypothesis Testing via Large Deviations," IEEE Transactions on Information Theory, 37, 285–290.
Zeitouni, O., and M. Gutman (1991b): "Correction to: On Universal Hypotheses Testing via Large Deviations," IEEE Transactions on Information Theory, 37, 698.