Generalized Neyman–Pearson optimality of empirical likelihood for testing parameter hypotheses

Ann Inst Stat Math (2009) 61:773–787
DOI 10.1007/s10463-008-0172-6

Taisuke Otsu

Received: 1 June 2007 / Revised: 3 October 2007 / Published online: 24 April 2008
© The Institute of Statistical Mathematics, Tokyo 2008

Abstract  This paper studies the Generalized Neyman–Pearson (GNP) optimality of empirical likelihood-based tests for parameter hypotheses. The GNP optimality focuses on the large deviation errors of tests, i.e., the convergence rates of the type I and type II error probabilities under fixed alternatives. We derive (i) the GNP optimality of the empirical likelihood criterion (ELC) test against all alternatives, and (ii) a necessary and a sufficient condition for the GNP optimality of the empirical likelihood ratio (ELR) test against each alternative.

Keywords  Empirical likelihood · Generalized Neyman–Pearson optimality

T. Otsu would like to thank Yuichi Kitamura, Gustavo Soares, and two anonymous referees for helpful comments. Financial support from the National Science Foundation (SES-0720961) is gratefully acknowledged.

T. Otsu
Cowles Foundation and Department of Economics, Yale University, P.O. Box 208281, New Haven, CT 06520-8281, USA
e-mail: taisuke.otsu@yale.edu

1 Introduction

This paper studies the Generalized Neyman–Pearson (GNP) optimality of empirical likelihood-based tests for parameter hypotheses. Compared to the Neyman–Pearson optimality, the GNP optimality focuses on the large deviation errors of tests, i.e., the convergence rates of the type I and type II error probabilities under fixed alternatives. Under a restriction on the convergence rate of the type I error probability, we maximize the convergence rates of the type II error probabilities under fixed alternatives. We derive (i) the GNP optimality of the empirical likelihood criterion (ELC) test against all alternatives, and (ii) a necessary and a sufficient condition for the GNP optimality of the empirical likelihood ratio (ELR) test against each alternative.

Empirical likelihood, proposed by Owen (1988), is a nonparametric likelihood constructed from moment restrictions. Although the empirical likelihood-based estimator and tests are asymptotically first-order equivalent to the generalized method of moments (GMM) estimator and tests, recent research has found important differences between these methods. DiCiccio et al. (1991) and Newey and Smith (2004) showed desirable higher-order properties of the empirical likelihood-based parameter hypothesis test and estimator, respectively. Kitamura (2001) considered power properties of the empirical likelihood-based overidentifying restriction test and showed that it is GNP optimal against all fixed alternatives.

The motivation of this paper is as follows. Suppose that a statistician proposes a moment restriction model to explain some data. The validity of the model can be checked by the empirical likelihood-based overidentifying restriction test, whose GNP optimality is shown by Kitamura (2001). If the proposed moment restriction model is accepted, the next task for the statistician is to conduct hypothesis tests on the unknown parameters of the model. The main interest of the present paper is to investigate the GNP optimality properties of the empirical likelihood-based tests for this parameter hypothesis testing problem, complementing Kitamura's (2001) GNP optimality result for the overidentifying restriction testing problem.

To analyze the convergence rates of the type I and type II error probabilities under fixed alternatives, large deviation theory plays a key role; see Dembo and Zeitouni (1998) for a review. By utilizing these convergence rates, several efficiency or optimality criteria for tests have been proposed, such as Bahadur and Chernoff efficiency. Serfling (1980, Chap. 10) provides a concise review of this topic. The notion of GNP optimality was originally proposed by Hoeffding (1965) to compare the chi-square and likelihood ratio tests for multinomial distributions. Hoeffding's approach has been extended to various setups by, e.g., Zeitouni and Gutman (1991) and Steinberg and Zeitouni (1992). Zeitouni et al. (1992) considered finite-state Markov processes and derived a necessary and a sufficient condition for the GNP optimality of the likelihood ratio test. This paper extends the result of Zeitouni et al. (1992) to moment restriction models, where the data can be continuously distributed and the distributional form of the data is unspecified (i.e., semiparametric).

This paper is organized as follows. Section 2 introduces our basic setup. Section 3 presents the optimality results. Section 4 concludes. All proofs are in the Appendix.

2 Setup

Let $\{x_i\}_{i=1}^n$ be an iid sequence of $d \times 1$ random vectors, $\theta \in \Theta \subseteq \mathbb{R}^p$ be a $p \times 1$ vector of unknown parameters, $\theta_0$ be the true value, and $g : \mathbb{R}^d \times \Theta \to \mathbb{R}^q$ be a $q \times 1$ vector of known functions, where $q > p$ (overidentified). Consider the unconditional moment restriction model:

$$E[g(x, \theta_0)] = \int g(x, \theta_0)\, d\mu_0 = 0, \qquad (1)$$

where $\mu_0$ is the unknown true measure of $x$.
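For concreteness, here is a minimal numerical sketch of an overidentified model of the form (1); the linear instrumental-variables design and all variable names are illustrative choices of ours, not taken from the paper. Here $p = 1$ and $q = 2$.

```python
import numpy as np

# Illustrative moment function for a linear IV model y = w*theta0 + u,
# with two instruments (z1, z2) for one endogenous regressor w, so q = 2 > p = 1.
# Each observation is x = (y, w, z1, z2); E[g(x, theta0)] = 0 holds when the
# instruments are valid (uncorrelated with the error u).
def g(x, theta):
    y, w, z1, z2 = x
    u = y - w * theta
    return np.array([z1 * u, z2 * u])
```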

Let $\Theta_0$ be a subset of $\Theta$. Consider the composite parameter hypothesis testing problem:

$$H_0 : \theta_0 \in \Theta_0, \qquad H_1 : \theta_0 \in \Theta \setminus \Theta_0. \qquad (2)$$

This setup includes equality and inequality hypotheses for possibly nonlinear functions of the parameters. This paper studies optimal properties of empirical likelihood for testing $H_0$. From Owen (1988) and Qin and Lawless (1994), empirical likelihood at each $\theta$ is defined as

$$L(\theta) = \sup\Big\{ \prod_{i=1}^n p_i \;:\; p_i > 0,\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i\, g(x_i, \theta) = 0 \Big\}. \qquad (3)$$

Without the restriction $\sum_{i=1}^n p_i\, g(x_i,\theta) = 0$, the unconstrained empirical likelihood is $L_u = \sup\{\prod_{i=1}^n p_i : p_i > 0,\ \sum_{i=1}^n p_i = 1\} = n^{-n}$. For testing $H_0$, we consider the following two empirical likelihood-based test statistics.

(i) ELC test statistic:

$$\ell_C = -2\Big[\sup_{\theta\in\Theta_0} \log L(\theta) - \log L_u\Big] = \inf_{\theta\in\Theta_0}\, \max_{\gamma\in\mathbb{R}^q}\, \sum_{i=1}^n 2 \log\big(1 + \gamma' g(x_i,\theta)\big), \qquad (4)$$

(ii) ELR test statistic:

$$\ell_R = -2\Big[\sup_{\theta\in\Theta_0} \log L(\theta) - \sup_{\theta\in\Theta} \log L(\theta)\Big] = \inf_{\theta\in\Theta_0}\, \max_{\gamma\in\mathbb{R}^q}\, \sum_{i=1}^n 2\log\big(1+\gamma' g(x_i,\theta)\big) - \inf_{\theta\in\Theta}\, \max_{\gamma\in\mathbb{R}^q}\, \sum_{i=1}^n 2\log\big(1+\gamma' g(x_i,\theta)\big). \qquad (5)$$

Under $H_0$, with certain regularity conditions, $\ell_C$ and $\ell_R$ converge in distribution to $\chi^2(q)$ and $\chi^2(p)$ random variables, respectively (Qin and Lawless 1994). Note that the second term in (5) (i.e., $\ell_O = \ell_C - \ell_R$) is the empirical likelihood overidentifying restriction test statistic, which follows the $\chi^2(q-p)$ limiting distribution. Kitamura (2001) derived the GNP optimality of $\ell_O$ for testing the overidentifying restrictions.
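As a numerical illustration of the dual representations in (4) and (5), the inner maximization over $\gamma$ can be carried out directly. The sketch below is ours (the function name and the derivative-free optimizer are illustrative choices); it computes the inner value $\max_{\gamma} \sum_i 2\log(1+\gamma' g(x_i,\theta))$ at a fixed $\theta$, and the outer $\inf_{\theta}$ in (4) or (5) would be a standard nested optimization on top of it.

```python
import numpy as np
from scipy.optimize import minimize

def el_inner(G):
    """Inner part of (4): max over gamma of sum_i 2*log(1 + gamma'g(x_i, theta)),
    where G is the (n, q) array with rows g(x_i, theta)."""
    n, q = G.shape
    def neg(gamma):
        t = 1.0 + G @ gamma
        # Outside the domain of the logarithm: return +inf so the optimizer retreats.
        return np.inf if np.any(t <= 1e-10) else -2.0 * np.sum(np.log(t))
    return -minimize(neg, np.zeros(q), method="Nelder-Mead").fun

# Example: scalar mean restriction g(x, theta) = x - theta with the simple
# null Theta_0 = {0}; then ell_C = el_inner(...) and ell_C -> chi^2(1) under H_0.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
ell_C = el_inner((x - 0.0).reshape(-1, 1))
```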

We now introduce an information-theoretic interpretation of empirical likelihood. Let $\mathcal{M}$ be the space of probability measures on $\mathbb{R}^d$, $\mathcal{P}(\theta) = \{\mu\in\mathcal{M} : \int g(x,\theta)\,d\mu = 0\}$ be the set of measures which satisfy the moment restriction (1) at $\theta$, $\mathcal{P} = \bigcup_{\theta\in\Theta} \mathcal{P}(\theta)$, and $\mathcal{P}_0 = \bigcup_{\theta\in\Theta_0} \mathcal{P}(\theta)$. In this notation, the hypotheses $H_0$ and $H_1$ are written as

$$H_0 : \mu_0 \in \mathcal{P}_0, \qquad H_1 : \mu_0 \in \mathcal{P} \setminus \mathcal{P}_0. \qquad (6)$$

The relative entropy (or Kullback–Leibler divergence) of measures $P$ and $Q$ is defined as

$$I(P\|Q) = \int \log\Big(\frac{dP}{dQ}\Big)\, dP \ \text{ if $P$ is absolutely continuous with respect to $Q$}, \qquad I(P\|Q) = \infty \ \text{ otherwise}.$$

Let $\mu_n$ be the empirical measure of $\{x_i\}_{i=1}^n$. It is known that

$$\frac{\ell_C}{2n} = \inf_{P\in\mathcal{P}_0} I(\mu_n\|P), \qquad \frac{\ell_R}{2n} = \inf_{P\in\mathcal{P}_0} I(\mu_n\|P) - \inf_{P\in\mathcal{P}} I(\mu_n\|P), \qquad (7)$$

i.e., the test statistics $\ell_C$ and $\ell_R$ are written as functions of $\mu_n$. Thus, the empirical likelihood-based tests can be defined by partitions of the space of measures $\mathcal{M}$.

(i) ELC test: accept $H_0$ if $\mu_n \in \Omega_{C1} = \{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) < \eta\}$; reject if $\mu_n \in \Omega_{C2} = \mathcal{M}\setminus\Omega_{C1}$. (8)

(ii) ELR test: accept $H_0$ if $\mu_n \in \Omega_{R1} = \{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) - \inf_{P\in\mathcal{P}} I(\mu\|P) < \bar\eta\}$; reject if $\mu_n \in \Omega_{R2} = \mathcal{M}\setminus\Omega_{R1}$. (9)

We consider the class of tests which are represented by partitions of $\mathcal{M}$ for $\mu_n$ and derive optimality properties of the partitions $\Omega_C = (\Omega_{C1}, \Omega_{C2})$ and $\Omega_R = (\Omega_{R1}, \Omega_{R2})$. Let $B(\mu,\delta)$ be an open ball of radius $\delta\in(0,\infty)$ around $\mu\in\mathcal{M}$, $A^\delta = \bigcup_{\mu\in A} B(\mu,\delta)$ be the $\delta$-blowup of a set $A$, and $P_0^n$ and $P^n$ be the $n$-fold product measures of $P_0$ and $P$, respectively. As a metric on $\mathcal{M}$, we use the Lévy metric, defined as

$$\rho(P_1, P_2) \equiv \inf\big\{\epsilon > 0 : F_1(x - \epsilon e) - \epsilon \le F_2(x) \le F_1(x + \epsilon e) + \epsilon \ \text{for all } x\in\mathbb{R}^d\big\}$$

for each $P_1\in\mathcal{M}$ and $P_2\in\mathcal{M}$, where $F_1$ and $F_2$ are the distribution functions of $P_1$ and $P_2$, respectively, and $e \equiv (1,\ldots,1)'$. The Lévy metric is compatible with the weak topology on $\mathcal{M}$ (see Dembo and Zeitouni 1998, Chap. D.2).
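The identities in (7) can be checked numerically in small examples. The sketch below is ours (solver choice, tolerances, and names are illustrative): it solves the primal problem $\inf_{P\in\mathcal{P}(0)} I(\mu_n\|P)$ for the mean restriction $g(x,\theta) = x - \theta$ with $\Theta_0 = \{0\}$, restricting $P$ to measures supported on the sample, and its optimal value should agree with $\ell_C/(2n)$ from the dual form (4) up to solver tolerance.

```python
import numpy as np
from scipy.optimize import minimize

# Primal form of (7) for g(x, theta) = x - theta and Theta_0 = {0}:
# minimize I(mu_n || P) = sum_i (1/n) log((1/n)/q_i) over weights q_i > 0
# with sum_i q_i = 1 and sum_i q_i * x_i = 0 (P supported on the sample points).
rng = np.random.default_rng(1)
x = rng.normal(0.2, 1.0, size=50)
n = len(x)

kl = lambda q: np.sum(np.log((1.0 / n) / q)) / n
cons = ({"type": "eq", "fun": lambda q: q.sum() - 1.0},
        {"type": "eq", "fun": lambda q: q @ x})
res = minimize(kl, np.full(n, 1.0 / n), method="SLSQP",
               bounds=[(1e-9, 1.0)] * n, constraints=cons)
print(res.fun)   # should match ell_C / (2 n) computed from the dual form (4)
```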

Our optimality criterion is the Generalized Neyman–Pearson δ-optimality (or simply GNP optimality) of Zeitouni and Gutman (1991), which is a natural extension of the Neyman–Pearson optimality for analyzing the large deviation error properties of tests.

Definition 1 (Generalized Neyman–Pearson δ-optimality) A test defined by a partition $\Omega = (\Omega_1, \Omega_2)$ of $\mathcal{M}$ is Generalized Neyman–Pearson δ-optimal for a set $A \subseteq \mathcal{M}$ if it satisfies:

(i) $\limsup_{n\to\infty} \sup_{P_0\in\mathcal{P}_0} n^{-1}\log P_0^n\{\mu_n\in\Omega_2\} \le -\eta$;

(ii) for any test defined by some partition $\Omega_n = (\Omega_{1,n}, \Omega_{2,n})$ which satisfies

$$\limsup_{n\to\infty} \sup_{P_0\in\mathcal{P}_0} n^{-1}\log P_0^n\{\mu_n\in\Omega_{2,n}^\delta\} \le -\eta \qquad (10)$$

for some $\delta\in(0,\infty)$, we have

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_1\} \le \liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{1,n}\} \qquad (11)$$

for each $P\in A$.

This definition of GNP optimality is slightly different from the one in Kitamura (2001), where the set $A$ is taken to be $\mathcal{M}$; we adopt this definition in order to analyze the optimality of the ELR test. If the test $\Omega = (\Omega_1,\Omega_2)$ is consistent, the type I (i.e., $P_0^n\{\mu_n\in\Omega_2\}$ for $P_0\in\mathcal{P}_0$) and type II (i.e., $P^n\{\mu_n\in\Omega_1\}$ for $P\in\mathcal{M}\setminus\mathcal{P}_0$) error probabilities converge to zero. As we will see below, these convergence rates (or large deviation errors) are exponentially small. Therefore, under the restriction (10) on the exponential convergence rate of the type I error, we minimize the exponential convergence rate of the type II error in (11). However, because of the rough nature of the large deviation theorem for $\mu_n$ (see Theorem 1 below), we need a modification by the δ-blowup, i.e., the rival test $\Omega_n$ must satisfy the type I error restriction (10) for the δ-blowup of $\Omega_{2,n}$. We can replace $\Omega_{2,n}^\delta$ in (10) with $\Omega_{2,n}$ if we consider the class of tests that satisfy

$$\lim_{\delta\to 0}\limsup_{n\to\infty} \sup_{P_0\in\mathcal{P}_0} n^{-1}\log P_0^n\{\mu_n\in\Omega_{2,n}^\delta\} = \limsup_{n\to\infty}\sup_{P_0\in\mathcal{P}_0} n^{-1}\log P_0^n\{\mu_n\in\Omega_{2,n}\}$$

(see Kitamura 2001, pp. 1665–1666).

In order to analyze the convergence rates in Definition 1, we need to know the large deviation properties of the empirical measure $\mu_n$. For our purpose, Sanov's theorem (see, e.g., Deuschel and Stroock 1989, Theorem 3.2.17) plays a central role.

Theorem 1 (Sanov) Let $\mathcal{S}$ be a Polish space (i.e., a complete separable metric space), $\mathcal{M}(\mathcal{S})$ be the space of probability measures on $\mathcal{S}$ endowed with the Lévy metric, and $P\in\mathcal{M}(\mathcal{S})$. Then

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in G\} \le -\inf_{\nu\in G} I(\nu\|P)$$

for each closed set $G\subseteq\mathcal{M}(\mathcal{S})$, and

$$\liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in H\} \ge -\inf_{\nu\in H} I(\nu\|P)$$

for each open set $H\subseteq\mathcal{M}(\mathcal{S})$.
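For intuition about Theorem 1, the rate can be seen in the simplest possible case. The following Monte Carlo sketch is ours and purely illustrative: it takes iid Bernoulli(1/2) data and the closed set $G = \{\nu : \int x\, d\nu \ge 0.7\}$, for which $\inf_{\nu\in G} I(\nu\|P)$ has the closed form $0.7\log(0.7/0.5) + 0.3\log(0.3/0.5) \approx 0.0823$. The simulated rates approach this value slowly from above, reflecting the polynomial prefactors that the exponential rate ignores.

```python
import numpy as np

# Sanov's theorem in the Bernoulli(1/2) case: the event {mu_n in G} is just
# {sample mean >= 0.7}, so P^n{mu_n in G} can be simulated directly.
p, a = 0.5, 0.7
rate = a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))   # inf over G of I(nu||P)
rng = np.random.default_rng(2)
for n in (10, 20, 40, 80):
    hits = np.mean(rng.binomial(n, p, size=500_000) / n >= a)    # estimate of P^n{mu_n in G}
    print(n, round(-np.log(hits) / n, 4), "vs rate", round(rate, 4))
```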

Sanov's theorem says that the large deviation behavior of the empirical measure $\mu_n$ is characterized by the relative entropy $I$. Intuitively, if the set $G$ (or $H$) is far from the true measure $P$ in terms of relative entropy, the exponential convergence rate of $P^n\{\mu_n\in G\}$ (or $P^n\{\mu_n\in H\}$) is fast. Note that the upper (resp. lower) bound on the convergence rate is available only for closed (resp. open) sets. This property is called the rough nature of Sanov's theorem.

3 Main results

3.1 Optimality of the empirical likelihood criterion test

We first present the GNP optimality of the ELC test in (8). We impose the following assumptions.

Assumption 1 Assume that
(i) $\{x_i\}_{i=1}^n$ is iid;
(ii) $P\{\sup_{\theta\in\Theta_0} \|g(x,\theta)\| = \infty\} = 0$ for each $P\in\mathcal{P}_0$;
(iii) $g(x,\theta)$ is continuous for all $x\in\mathbb{R}^d$ at each $\theta\in\Theta_0$.

Assumption 1 (i) is required to apply Sanov's theorem. Assumption 1 (ii) is a tightness condition, which guarantees that $\sup_{\theta\in\Theta_0}\|g(x,\theta)\|$ is a (finite) random variable. Assumption 1 (iii) is required to obtain the first statement of Lemma 1 (i) (Appendix A.3), which ensures continuity of a functional of the relative entropy. The GNP optimality of the ELC test is obtained as follows.

Theorem 2 (Optimality of the empirical likelihood criterion test) Under Assumption 1, the ELC test (8) is Generalized Neyman–Pearson δ-optimal for $\mathcal{M}$.

The proof is similar to that of Kitamura (2001, Theorem 2). We basically show that as long as the rival test $\Omega_n$ satisfies the type I restriction (10), the acceptance region $\Omega_{1,n}$ must satisfy $\Omega_{C1}\subseteq\Omega_{1,n}$ for sufficiently large $n$. Note that the GNP optimality of the ELC test holds for any alternative $P\in\mathcal{M}$. Thus, even if the moment restriction (1) does not hold (i.e., $P\in\mathcal{M}\setminus\mathcal{P}$), the ELC test maintains the optimal power property in (11).

3.2 Optimality of the empirical likelihood ratio test

We next investigate the GNP optimality of the ELR test. To test the parameter hypothesis $H_0$, we often assume that the moment restriction (1) holds true and compare the difference of the maximized values of empirical likelihood under $H_0$ and $H_1$.

The critical value $\bar\eta$ for the ELR test in (9) is typically smaller than the critical value $\eta$ for the ELC test in (8). Given the GNP optimality of the ELC test (Theorem 2), our next question is: when do the large deviation properties of the ELR test with critical value $\bar\eta$ become equivalent to those of the ELC test with the same critical value $\eta$? We therefore set $\bar\eta = \eta$ in the following discussion, and add the following assumption.

Assumption 2 $g(x,\theta)$ is continuous for all $x\in\mathbb{R}^d$ at each $\theta\in\Theta$.

This assumption is an extension of Assumption 1 (iii) and is required to obtain the second statement of Lemma 1 (i) (Appendix A.3). From $\bar\eta = \eta$ and $\inf_{P\in\mathcal{P}} I(\mu\|P) \ge 0$ for each $\mu\in\mathcal{M}$, we have $\Omega_{C1}\subseteq\Omega_{R1}$ and $\Omega_{R2}\subseteq\Omega_{C2}$. From Theorem 2 and $\Omega_{R2}\subseteq\Omega_{C2}$, Definition 1 (i) is obviously satisfied for the ELR test $\Omega_R$. On the other hand, from $\Omega_{C1}\subseteq\Omega_{R1}$, the type II error probability satisfies

$$P^n\{\mu_n\in\Omega_{R1}\} = P^n\{\mu_n\in\Omega_{C1}\} + P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \qquad (12)$$

for each $P\in\mathcal{M}$. Therefore,

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\}$$

for each $P\in\mathcal{M}$, i.e., if the critical values are the same, the ELC test has a weakly faster convergence rate of the type II error probability than the ELR test. However, if the convergence rate of the component $P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\}$ in (12) is faster than that of $P^n\{\mu_n\in\Omega_{C1}\}$ for some $P\in\mathcal{M}$, i.e.,

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\}, \qquad (13)$$

then we still obtain $\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} = \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\}$, and the ELR test becomes GNP optimal at $P$. (To see the equality, note that for nonnegative sequences $a_n$ and $b_n$ we have $\max\{a_n, b_n\} \le a_n + b_n \le 2\max\{a_n, b_n\}$, so

$$\limsup_{n\to\infty} n^{-1}\log(a_n + b_n) = \max\Big\{\limsup_{n\to\infty} n^{-1}\log a_n,\ \limsup_{n\to\infty} n^{-1}\log b_n\Big\};$$

under (13), the maximum for the decomposition (12) is attained by the $\Omega_{C1}$ term.) Thus, to obtain the GNP optimality of the ELR test, we need additional conditions on the alternatives. Let

$$B_m = \Big\{x\in\mathbb{R}^d : \sup_{\theta\in\Theta}\|g(x,\theta)\| \le m,\ \|x\| \le m\Big\} \ \text{ for } m\in\mathbb{N}, \qquad \mathcal{M}_\infty = \{\mu\in\mathcal{M} : \mu\{B_m\} = 1 \text{ for some } m\in\mathbb{N}\},$$
$$\Omega_{R1,0} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) - \inf_{P\in\mathcal{P}} I(\mu\|P) \le \eta\Big\},$$
$$\Omega_{C1,0} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) \le \eta\Big\}, \qquad \Omega_{C2,0} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) > \eta\Big\}.$$

Note that $\lim_{m\to\infty} P\{B_m\} = 1$ for each $P\in\mathcal{P}_0$ by Assumption 1 (ii). The GNP optimality of the ELR test is presented as follows.

Theorem 3 (Optimality of the empirical likelihood ratio test)

(i) (Sufficient condition) Suppose that Assumptions 1 (i)–(ii) and 2 hold. If

$$P \in \bar{\mathcal{M}} = \Big\{P\in\mathcal{M} : P\Big\{\sup_{\theta\in\Theta}\|g(x,\theta)\| = \infty\Big\} = 0,\ \inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P) \ge \inf_{\mu\in\Omega_{C1}\cap\mathcal{M}_\infty} I(\mu\|P)\Big\}, \qquad (14)$$

then the ELR test (9) is Generalized Neyman–Pearson δ-optimal for $\{P\}$.

(ii) (Necessary condition) Suppose that Assumptions 1 (i)–(ii) and 2 hold. If the ELR test (9) is Generalized Neyman–Pearson δ-optimal for $\{P\}$, then

$$P \in \bar{\bar{\mathcal{M}}} = \Big\{P\in\mathcal{M} : P\Big\{\sup_{\theta\in\Theta}\|g(x,\theta)\| = \infty\Big\} = 0,\ \inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}\cap\mathcal{M}_\infty} I(\mu\|P) \ge \inf_{\mu\in\Omega_{C1,0}} I(\mu\|P)\Big\}. \qquad (15)$$

Obviously, $\bar{\mathcal{M}} \subseteq \bar{\bar{\mathcal{M}}}$. The restriction $P\{\sup_{\theta\in\Theta}\|g(x,\theta)\| = \infty\} = 0$ in $\bar{\mathcal{M}}$ and $\bar{\bar{\mathcal{M}}}$ implies that $\lim_{m\to\infty} P\{B_m\} = 1$ for each $P$ in $\bar{\mathcal{M}}$ or $\bar{\bar{\mathcal{M}}}$. Theorem 3 (i) gives a sufficient condition for the GNP optimality of the ELR test at $P$: by Sanov's theorem (Theorem 1), the restriction $\inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P) \ge \inf_{\mu\in\Omega_{C1}\cap\mathcal{M}_\infty} I(\mu\|P)$ in $\bar{\mathcal{M}}$ guarantees (13). Because of the rough nature of Sanov's theorem, we need to use $\Omega_{R1,0}\cap\Omega_{C2}$ and $\Omega_{C1}\cap\mathcal{M}_\infty$ in this restriction instead of $\Omega_{R1}\cap\Omega_{C2}$ and $\Omega_{C1}$, respectively. Theorem 3 (ii) gives a necessary condition for the GNP optimality of the ELR test at $P$. Again because of the rough nature of Sanov's theorem, the restriction $\inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}\cap\mathcal{M}_\infty} I(\mu\|P) \ge \inf_{\mu\in\Omega_{C1,0}} I(\mu\|P)$ in $\bar{\bar{\mathcal{M}}}$ is slightly different from the one in $\bar{\mathcal{M}}$. The component $\mathcal{M}_\infty$ in $\bar{\mathcal{M}}$ and $\bar{\bar{\mathcal{M}}}$ is required to apply a trimming argument of Groeneboom et al. (1979); if the support of $x$ is finite or compact, we can drop $\mathcal{M}_\infty$ from the definitions of $\bar{\mathcal{M}}$ and $\bar{\bar{\mathcal{M}}}$. For finite-state Markov models, Zeitouni et al. (1992) derived similar results for the (standard) likelihood ratio test; our result can be considered an extension to the ELR test, where the model is semiparametric.

4 Conclusion

This paper studies the Generalized Neyman–Pearson (GNP) optimality of the empirical likelihood criterion and ratio tests for testing parameter hypotheses. The GNP optimality is defined by extending the Neyman–Pearson optimality to its large deviation analog. We show that (i) the empirical likelihood criterion test is GNP optimal against all alternatives, and (ii) under additional conditions on the alternatives, the empirical likelihood ratio test is also GNP optimal.

Appendix: Mathematical appendix

Notation

Our notation closely follows that of Kitamura (2001). Let $P^{(m)}$ be the conditional probability measure $P^{(m)}\{C\} = P\{C \mid B_m\} = P\{C\cap B_m\}/P\{B_m\}$ for $C\in\mathcal{B}^d$ and $P\in\mathcal{M}$, $\mathcal{M}(B_m)$ be the space of probability measures on $B_m$, $P^{(m)n}$ be the $n$-fold product measure of $P^{(m)}$, and $1\{B\}$, $(B)^c$, and $\mathrm{cl}(B)$ be the indicator function, complement, and closure of a set $B$, respectively. Define

$$\mathcal{P}_m(\theta) = \Big\{\mu\in\mathcal{M} : \int g(x,\theta)\,1\{B_m\}\,d\mu = 0\Big\}, \qquad \mathcal{P}_{0,m} = \bigcup_{\theta\in\Theta_0}\mathcal{P}_m(\theta), \qquad \mathcal{P}_m = \bigcup_{\theta\in\Theta}\mathcal{P}_m(\theta),$$
$$\Omega^{(m)}_{C1} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_{0,m}} I(\mu\|P) < \eta\Big\}, \qquad \Omega^{(m)}_{C2} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_{0,m}} I(\mu\|P) \ge \eta\Big\},$$
$$\Omega_{C2,\epsilon} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) \ge \eta - \epsilon\Big\}, \qquad \Omega^{(m)}_{C2,\epsilon} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_{0,m}} I(\mu\|P) \ge \eta - \epsilon\Big\},$$
$$\Omega_{R1,\epsilon} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_0} I(\mu\|P) - \inf_{P\in\mathcal{P}} I(\mu\|P) \le \eta + \epsilon\Big\}, \qquad \Omega^{(m)}_{R1,\epsilon} = \Big\{\mu\in\mathcal{M} : \inf_{P\in\mathcal{P}_{0,m}} I(\mu\|P) - \inf_{P\in\mathcal{P}_m} I(\mu\|P) \le \eta + \epsilon\Big\}.$$
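To illustrate the conditioning step behind $P^{(m)}$, here is a small sketch of ours for a discrete measure, with $B_m$ simplified to the ball $\{x : |x| \le m\}$ (the paper's $B_m$ also bounds $\sup_{\theta\in\Theta}\|g(x,\theta)\|$); all names are illustrative.

```python
import numpy as np

def trim(points, probs, m):
    """Condition a discrete measure P on B_m: P^(m){C} = P{C & B_m} / P{B_m}."""
    keep = np.abs(points) <= m
    mass = probs[keep].sum()            # P{B_m}; must be positive for P^(m) to exist
    if mass == 0.0:
        raise ValueError("P{B_m} = 0")
    return points[keep], probs[keep] / mass

pts = np.array([-3.0, -0.5, 1.0, 10.0])
pr = np.array([0.1, 0.4, 0.4, 0.1])
print(trim(pts, pr, m=5.0))             # the point at 10.0 is dropped, mass renormalized
```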

A.1 Proof of Theorem 2

Since the proof is similar to that of Kitamura (2001, Theorem 2), it is omitted. A detailed proof is available from the author upon request.

A.2 Proof of Theorem 3

A.2.1 Proof of (i)

We set $\bar\eta = \eta$. First, we show that $\Omega_{R2}$ satisfies Definition 1 (i). Since $\inf_{P\in\mathcal{P}} I(\mu\|P) \ge 0$ for each $\mu\in\mathcal{M}$, we have $\Omega_{R2}\subseteq\Omega_{C2}$. Therefore, $P_0^n\{\mu_n\in\Omega_{R2}\} \le P_0^n\{\mu_n\in\Omega_{C2}\}$ for each $P_0\in\mathcal{P}_0$, and the result follows from Theorem 2.

Next, we show that $\Omega_{R1}$ satisfies Definition 1 (ii). From Theorem 2, it is sufficient to show that $\Omega_{R1}$ satisfies

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\} \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} \qquad (16)$$

for each $P\in\bar{\mathcal{M}}$. Observe that $\Omega_{C1}\subseteq\Omega_{R1}$, i.e.,

$$P^n\{\mu_n\in\Omega_{R1}\} = P^n\{\mu_n\in\Omega_{C1}\} + P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \qquad (17)$$

for each $P\in\mathcal{M}$. From (16) and (17), it is sufficient to show that

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} \qquad (18)$$

for each $P\in\bar{\mathcal{M}}$. Pick any $P\in\bar{\mathcal{M}}$. Lemma 2 implies that for each $\epsilon\in(0,\infty)$, there exists $m_\epsilon\in\mathbb{N}$ such that

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le -\inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P^{(m)}) + \epsilon \qquad (19)$$

for each $m\ge m_\epsilon$. Lemma 1 (iv) and (v) and the definition of $\bar{\mathcal{M}}$ in (14) imply that for each $\epsilon\in(0,\infty)$, there exists $m_\epsilon\in\mathbb{N}$ such that

$$-\inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P^{(m)}) \le -\inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P) + \epsilon \le -\inf_{\mu\in\Omega_{C1}\cap\mathcal{M}_\infty} I(\mu\|P) + \epsilon \le -\inf_{\mu\in\Omega_{C1}\cap\mathcal{M}_\infty} I(\mu\|P^{(m)}) + 2\epsilon \le -\inf_{\mu\in\Omega_{C1}} I(\mu\|P^{(m)}) + 2\epsilon \qquad (20)$$

for each $m\ge m_\epsilon$. Lemma 1 (ii) and (iii) imply that

$$\inf_{\mu\in\Omega_{C1}} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega_{C1}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega^{(m)}_{C1}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) \ge \inf_{\mu\in\Omega^{(m)}_{C1}} I(\mu\|P^{(m)}) \qquad (21)$$

for each $m\in\mathbb{N}$. Since $\Omega^{(m)}_{C1}$ is an open set (from Lemma 1 (i)), Theorem 1 yields that

$$-\inf_{\mu\in\Omega^{(m)}_{C1}} I(\mu\|P^{(m)}) \le \liminf_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{C1}\} \qquad (22)$$

for each $m\in\mathbb{N}$. For each $P\in\bar{\mathcal{M}}$ and $\epsilon\in(0,\infty)$, there exists $m_\epsilon\in\mathbb{N}$ such that $-\log P\{B_m\} \le \epsilon$ for all $m\ge m_\epsilon$ (from $P\{\sup_{\theta\in\Theta}\|g(x,\theta)\| = \infty\} = 0$).

Thus, for all $m\ge m_\epsilon$, we have

$$\liminf_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{C1}\} = \liminf_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{C1}\cap\mathcal{M}(B_m)\} = \liminf_{n\to\infty} \big[n^{-1}\log P^n\{\mu_n\in\Omega^{(m)}_{C1}\cap\mathcal{M}(B_m)\} - \log P\{B_m\}\big] \le \liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} + \epsilon, \qquad (23)$$

where the first equality follows from $P^{(m)n}\{\mu_n\notin\mathcal{M}(B_m)\} = 0$ for each $m\in\mathbb{N}$ and $n\in\mathbb{N}$ (by the definition of $P^{(m)}$), the second equality follows from the definition of $P^{(m)}$ and Assumption 1 (i), and the inequality follows from Lemma 1 (ii) and the set inclusion relationship. From (19)–(23), we have

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} + 4\epsilon$$

for each $\epsilon\in(0,\infty)$ and $P\in\bar{\mathcal{M}}$. Therefore, (18) is obtained.

A.2.2 Proof of (ii)

Suppose that the ELR test (9) is Generalized Neyman–Pearson δ-optimal at some $P$. Definition 1 (ii) implies that

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\} \le \liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\}.$$

Since $\Omega_{C1}\subseteq\Omega_{R1}$, we must have

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le \liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\}. \qquad (24)$$

Thus, it is sufficient to show that if $P$ satisfies (24), then we have $P\in\bar{\bar{\mathcal{M}}}$.

First, consider the left-hand side of (24). For each $\epsilon\in(0,\infty)$, there exists $m_\epsilon\in\mathbb{N}$ such that

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \ge \liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2,0}\} \ge \liminf_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{R1}\cap\Omega^{(m)}_{C2,0}\} - \epsilon \qquad (25)$$

for each $m\ge m_\epsilon$, where the second inequality follows from the same argument as (23). Since $\Omega^{(m)}_{R1}\cap\Omega^{(m)}_{C2,0}$ is open (from Lemma 1 (i)), Theorem 1 yields that

$$\liminf_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{R1}\cap\Omega^{(m)}_{C2,0}\} \ge -\inf_{\mu\in\Omega^{(m)}_{R1}\cap\Omega^{(m)}_{C2,0}} I(\mu\|P^{(m)}) \ge -\inf_{\mu\in\Omega^{(m)}_{R1}\cap\Omega^{(m)}_{C2,0}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = -\inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = -\inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}} I(\mu\|P^{(m)}) \qquad (26)$$

for each $m\in\mathbb{N}$, where the first and second equalities follow from Lemma 1 (ii) and (iii), respectively. Lemma 1 (iv) implies that for each $\epsilon\in(0,\infty)$ there exists $m_\epsilon\in\mathbb{N}$ such that

$$\inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}} I(\mu\|P^{(m)}) \le \inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}\cap\mathcal{M}_\infty} I(\mu\|P^{(m)}) \le \inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}\cap\mathcal{M}_\infty} I(\mu\|P) + \epsilon \qquad (27)$$

for each $m\ge m_\epsilon$.

Next, consider the right-hand side of (24). Observe that

$$\liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1,0}\} \le \limsup_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega_{C1,0}\} = \limsup_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega_{C1,0}\cap\mathcal{M}(B_m)\} \le \limsup_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{C1,0}\} \qquad (28)$$

for each $m\in\mathbb{N}$, where the first inequality follows from the set inclusion relationship, the second inequality follows from the definition of $P^{(m)}$ and $P\{B_m\} \le 1$ for all $m\in\mathbb{N}$, the equality follows from $P^{(m)n}\{\mu_n\notin\mathcal{M}(B_m)\} = 0$ for each $m\in\mathbb{N}$ and $n\in\mathbb{N}$, and the last inequality follows from Lemma 1 (ii) and the set inclusion relationship. Since $\Omega^{(m)}_{C1,0}$ is closed (from Lemma 1 (i)), Theorem 1 yields that

$$\limsup_{n\to\infty} n^{-1}\log P^{(m)n}\{\mu_n\in\Omega^{(m)}_{C1,0}\} \le -\inf_{\mu\in\Omega^{(m)}_{C1,0}} I(\mu\|P^{(m)}) = -\inf_{\mu\in\Omega^{(m)}_{C1,0}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = -\inf_{\mu\in\Omega_{C1,0}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = -\inf_{\mu\in\Omega_{C1,0}} I(\mu\|P^{(m)}) \qquad (29)$$

for each $m\in\mathbb{N}$, where the equalities follow from Lemma 1 (ii) and (iii). Lemma 1 (v) implies that for each $\epsilon\in(0,\infty)$ there exists $m_\epsilon\in\mathbb{N}$ such that

$$-\inf_{\mu\in\Omega_{C1,0}} I(\mu\|P^{(m)}) \le -\inf_{\mu\in\Omega_{C1,0}} I(\mu\|P) + \epsilon \qquad (30)$$

for each $m\ge m_\epsilon$. Finally, combining these results, $P$ satisfies

$$-\inf_{\mu\in\Omega_{R1}\cap\Omega_{C2,0}\cap\mathcal{M}_\infty} I(\mu\|P) - 2\epsilon \le \limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le \liminf_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{C1}\} \le -\inf_{\mu\in\Omega_{C1,0}} I(\mu\|P) + 2\epsilon$$

for each $\epsilon\in(0,\infty)$, i.e., $P\in\bar{\bar{\mathcal{M}}}$.

A.3 Auxiliary lemmas

The following lemmas are derived under Assumptions 1 and 2.

Lemma 1
(i) (Kitamura 2001, Lemma 1): $\inf_{P\in\mathcal{P}_{0,m}} I(\mu\|P)$ and $\inf_{P\in\mathcal{P}_m} I(\mu\|P)$ are continuous in $\mu\in\mathcal{M}$.
(ii) $A^{(m)}\cap\mathcal{M}(B_m) = A\cap\mathcal{M}(B_m)$ for each $m\in\mathbb{N}$, where $(A^{(m)}, A) = (\Omega^{(m)}_{C1}, \Omega_{C1})$, $(\Omega^{(m)}_{C1,0}, \Omega_{C1,0})$, $(\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}, \Omega_{R1,0}\cap\Omega_{C2})$, $(\Omega^{(m)}_{R1}\cap\Omega^{(m)}_{C2,0}, \Omega_{R1}\cap\Omega_{C2,0})$, and $(\Omega^{(m)}_{R1,\epsilon}\cap\Omega^{(m)}_{C2,\epsilon}, \Omega_{R1,\epsilon}\cap\Omega_{C2,\epsilon})$ for each $\epsilon\in(0,\infty)$.
(iii) $\inf_{\mu\in A\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = \inf_{\mu\in A} I(\mu\|P^{(m)})$ for each $m\in\mathbb{N}$ and $P\in\mathcal{M}$, where $A = \Omega_{C1}$, $\Omega^{(m)}_{C1,0}$, $\Omega_{R1,0}\cap\Omega_{C2}$, $\Omega_{R1}\cap\Omega_{C2,0}$, and $\Omega_{R1,\epsilon}\cap\Omega_{C2,\epsilon}$ for each $\epsilon\in(0,\infty)$.
(iv) (Groeneboom et al. 1979, Lemma 4.1): If $\Lambda$ is a subset of $\mathcal{M}_\infty = \{\mu\in\mathcal{M} : \mu\{B_m\} = 1 \text{ for some } m\in\mathbb{N}\}$, then $\lim_{m\to\infty} \inf_{\mu\in\Lambda} I(\mu\|P^{(m)}) = \inf_{\mu\in\Lambda} I(\mu\|P)$ for each $P\in\mathcal{M}$.
(v) If $\Lambda$ is a subset of $\mathcal{M}$, then $\lim_{m\to\infty} \inf_{\mu\in\Lambda} I(\mu\|P^{(m)}) \ge \inf_{\mu\in\Lambda} I(\mu\|P)$ for each $P\in\mathcal{M}$.

Proof (ii) and (iii) are obtained from the proof of Kitamura (2001, Lemma 3). (v) is obtained from the proof of Groeneboom et al. (1979, Lemma 4.1).

Lemma 2 For each $\epsilon\in(0,\infty)$ and $P\in\bar{\mathcal{M}}$, there exists $m_\epsilon\in\mathbb{N}$ such that

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le -\inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P^{(m)}) + \epsilon$$

for each $m > m_\epsilon$.

Proof A similar argument as in the proof of Kitamura (2001, Theorem 2) yields that for each $\epsilon\in(0,\infty)$ and $P\in\bar{\mathcal{M}}$, there exists $m_\epsilon\in\mathbb{N}$ such that

$$\limsup_{n\to\infty} n^{-1}\log P^n\{\mu_n\in\Omega_{R1}\cap\Omega_{C2}\} \le -\inf_{\mu\in\Omega_{R1,\epsilon}\cap\Omega_{C2,\epsilon}} I(\mu\|P^{(m)}) + \epsilon/2$$

for all $m > m_\epsilon$. From Lemma 1 (ii) and (iii),

$$\inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega_{R1,0}\cap\Omega_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}),$$
$$\inf_{\mu\in\Omega_{R1,\epsilon}\cap\Omega_{C2,\epsilon}} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega_{R1,\epsilon}\cap\Omega_{C2,\epsilon}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega^{(m)}_{R1,\epsilon}\cap\Omega^{(m)}_{C2,\epsilon}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)})$$

for each $m\in\mathbb{N}$, $P\in\bar{\mathcal{M}}$, and $\epsilon\in(0,\infty)$. Therefore, it is sufficient to show that

$$\lim_{\epsilon\to 0}\ \inf_{\mu\in\Omega^{(m)}_{R1,\epsilon}\cap\Omega^{(m)}_{C2,\epsilon}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) = \inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)})$$

for each $m\in\mathbb{N}$ and $P\in\bar{\mathcal{M}}$.

Now pick any $P\in\bar{\mathcal{M}}$ and $m\in\mathbb{N}$. Since the family of sets $\{A^{(m)}_\epsilon = \Omega^{(m)}_{R1,\epsilon}\cap\Omega^{(m)}_{C2,\epsilon}\cap\mathcal{M}(B_m)\}_{\epsilon\in(0,\infty)}$ is non-increasing as $\epsilon\to 0$, the family $\{\inf_{\mu\in A^{(m)}_\epsilon} I(\mu\|P^{(m)})\}_{\epsilon\in(0,\infty)}$ is non-decreasing as $\epsilon\to 0$. Also, this family $\{\inf_{\mu\in A^{(m)}_\epsilon} I(\mu\|P^{(m)})\}_{\epsilon\in(0,\infty)}$ is bounded above by $\inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)})$. Therefore, $\{\inf_{\mu\in A^{(m)}_\epsilon} I(\mu\|P^{(m)})\}_{\epsilon\in(0,\infty)}$ converges as $\epsilon\to 0$ to some limit $\bar{I} \le \inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)})$, and we can take a decreasing sequence of positive numbers $\{\epsilon_l\}_{l\in\mathbb{N}}$ such that $\{\inf_{\mu\in A^{(m)}_{\epsilon_l}} I(\mu\|P^{(m)})\}_{l\in\mathbb{N}}$ increases to $\bar{I}$ as $l\to\infty$. Lemma 1 (i) implies that $\Omega^{(m)}_{R1,\epsilon_l}$ and $\Omega^{(m)}_{C2,\epsilon_l}$ are closed sets for each $l\in\mathbb{N}$. Since $\mathcal{M}(B_m)$ is compact (see, e.g., Dembo and Zeitouni 1998, Appendix D.2), $A^{(m)}_{\epsilon_l}$ is also compact for each $l\in\mathbb{N}$. Thus, the lower semicontinuity of $I(\mu\|P^{(m)})$ in $\mu$ implies that the infimum $\inf_{\mu\in A^{(m)}_{\epsilon_l}} I(\mu\|P^{(m)})$ is attained on the compact set $A^{(m)}_{\epsilon_l}$, i.e., there exists $\mu_l\in A^{(m)}_{\epsilon_l}$ such that $I(\mu_l\|P^{(m)}) = \inf_{\mu\in A^{(m)}_{\epsilon_l}} I(\mu\|P^{(m)})$ for each $l\in\mathbb{N}$.

Now consider the sequence of sets $\{\mathrm{cl}(\{\mu_{l'} : l' \ge l\})\}_{l\in\mathbb{N}}$. Since $\mathrm{cl}(\{\mu_{l'} : l' \ge l\}) \subseteq A^{(m)}_{\epsilon_l}$ for each $l\in\mathbb{N}$, $\mathrm{cl}(\{\mu_{l'} : l' \ge l\})$ is compact for each $l\in\mathbb{N}$. Thus, since $\{\mathrm{cl}(\{\mu_{l'} : l' \ge l\})\}_{l\in\mathbb{N}}$ is a non-increasing sequence of non-empty compact sets, the Heine–Borel theorem implies that $\bigcap_{l=1}^\infty \mathrm{cl}(\{\mu_{l'} : l' \ge l\})$ is non-empty. Pick any $\bar\mu\in\bigcap_{l=1}^\infty \mathrm{cl}(\{\mu_{l'} : l' \ge l\})$. Since $I(\mu_l\|P^{(m)}) \le \bar{I}$ for each $l\in\mathbb{N}$, we have $\{\mu_{l'} : l' \ge l\} \subseteq \{\mu\in\mathcal{M} : I(\mu\|P^{(m)}) \le \bar{I}\}$. Since $\{\mu\in\mathcal{M} : I(\mu\|P^{(m)}) \le \bar{I}\}$ is closed (from the lower semicontinuity of $I(\mu\|P^{(m)})$ in $\mu$), we have $\mathrm{cl}(\{\mu_{l'} : l' \ge l\}) \subseteq \{\mu\in\mathcal{M} : I(\mu\|P^{(m)}) \le \bar{I}\}$ for each $l\in\mathbb{N}$, which implies $I(\bar\mu\|P^{(m)}) \le \bar{I}$. On the other hand, since $\bar\mu\in A^{(m)}_{\epsilon_l}$ for each $l\in\mathbb{N}$, we have $\bar\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)$, which implies $\bar{I} \le \inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) \le I(\bar\mu\|P^{(m)})$.

Combining these results,

$$I(\bar\mu\|P^{(m)}) \le \bar{I} \le \inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)}) \le I(\bar\mu\|P^{(m)}),$$

i.e., $\{\inf_{\mu\in A^{(m)}_{\epsilon_l}} I(\mu\|P^{(m)})\}_{l\in\mathbb{N}}$ increases to the limit $\bar{I} = \inf_{\mu\in\Omega^{(m)}_{R1,0}\cap\Omega^{(m)}_{C2}\cap\mathcal{M}(B_m)} I(\mu\|P^{(m)})$ as $l\to\infty$. Therefore, the conclusion is obtained.

References

Dembo, A., Zeitouni, O. (1998). Large deviations techniques and applications (2nd ed.). Heidelberg: Springer.
Deuschel, J. D., Stroock, D. W. (1989). Large deviations. New York: Academic Press.
DiCiccio, T. J., Hall, P., Romano, J. P. (1991). Empirical likelihood is Bartlett-correctable. Annals of Statistics, 19, 1053–1061.
Groeneboom, P., Oosterhoff, J., Ruymgaart, F. H. (1979). Large deviation theorems for empirical probability measures. Annals of Probability, 7, 553–586.
Hoeffding, W. (1965). Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics, 36, 369–401.
Kitamura, Y. (2001). Asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 69, 1661–1672.
Newey, W. K., Smith, R. J. (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica, 72, 219–255.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.
Qin, J., Lawless, J. (1994). Empirical likelihood and general estimating equations. Annals of Statistics, 22, 300–325.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. New York: Wiley.
Steinberg, Y., Zeitouni, O. (1992). On tests for normality. IEEE Transactions on Information Theory, 38, 1779–1787.
Zeitouni, O., Gutman, M. (1991). On universal hypothesis testing via large deviations. IEEE Transactions on Information Theory, 37, 285–290.
Zeitouni, O., Ziv, J., Merhav, N. (1992). When is the generalized likelihood ratio test optimal? IEEE Transactions on Information Theory, 38, 1597–1602.