MAXIMUM LIKELIHOOD ESTIMATION AND UNIFORM INFERENCE WITH SPORADIC IDENTIFICATION FAILURE. Donald W. K. Andrews and Xu Cheng.


MAXIMUM LIKELIHOOD ESTIMATION AND UNIFORM INFERENCE WITH SPORADIC IDENTIFICATION FAILURE. By Donald W. K. Andrews and Xu Cheng. October. COWLES FOUNDATION DISCUSSION PAPER NO. 8. COWLES FOUNDATION FOR RESEARCH IN ECONOMICS, YALE UNIVERSITY, Box 88, New Haven, Connecticut 65-88. http://cowles.econ.yale.edu/

Maximum Likelihood Estimation and Uniform Inference with Sporadic Identification Failure. Donald W. K. Andrews, Cowles Foundation, Yale University. Xu Cheng, Department of Economics, University of Pennsylvania. First Draft: August, 7. Revised: October 7. The first author gratefully acknowledges the research support of the National Science Foundation via grant numbers SES-7557 and SES-58376. The authors thank Xiaohong Chen, Sukjin Han, Yuichi Kitamura, Peter Phillips, Eric Renault, Frank Schorfheide, and Ed Vytlacil for helpful comments.

Abstract. This paper analyzes the properties of a class of estimators, tests, and confidence sets (CSs) when the parameters are not identified in parts of the parameter space. Specifically, we consider estimator criterion functions that are sample averages and are smooth functions of a parameter θ. This includes log likelihood, quasi-log likelihood, and least squares criterion functions. We determine the asymptotic distributions of estimators under lack of identification and under weak, semi-strong, and strong identification. We determine the asymptotic size (in a uniform sense) of standard t and quasi-likelihood ratio (QLR) tests and CSs. We provide methods of constructing QLR tests and CSs that are robust to the strength of identification. The results are applied to two examples: a nonlinear binary choice model and the smooth transition threshold autoregressive (STAR) model.

Keywords: Asymptotic size, binary choice, confidence set, estimator, identification, likelihood, nonlinear models, test, smooth transition threshold autoregression, weak identification.

JEL Classification Numbers: C12, C15.

1. Introduction

This paper provides a set of maximum likelihood (ML) regularity conditions under which the asymptotic properties of ML estimators and corresponding t and QLR tests and confidence sets (CSs) are obtained. The novel feature of the conditions is that they allow the information matrix to be singular in parts of the parameter space. In consequence, the parameter vector is unidentified and weakly identified in some parts of the parameter space, while semi-strongly and strongly identified in other parts. The conditions maintain the usual assumption that the log-likelihood satisfies a stochastic quadratic expansion. The results also apply to quasi-log likelihood and nonlinear least squares procedures.

Compared to standard asymptotic results in the literature for ML estimators, tests, and CSs, the results given here cover both fixed and drifting sequences of true parameters. The latter are necessary to treat cases of weak identification and semi-strong identification. In particular, they are necessary to determine the asymptotic sizes of tests and CSs (in a uniform sense).

This paper is a sequel to Andrews and Cheng (7a) (AC). The method of establishing the results outlined above and in the Abstract is to provide a set of sufficient conditions for the high-level conditions of AC for estimators, tests, and CSs that are based on smooth sample-average criterion functions. The high-level conditions in AC involve the behavior of the estimator criterion function under certain drifting sequences of distributions. In contrast, the assumptions given here are much more primitive. They only involve mixing, smoothness, and moment conditions, plus conditions on the parameter space.

The paper considers models in which the parameter of interest is of the form θ = (β, ζ, π), where π is identified if and only if β ≠ 0, ζ is not related to the identification of π, and ψ = (β, ζ) is always identified. For example, the nonlinear binary choice model is of the form: Y_i = 1(Y*_i > 0) and Y*_i = β h(X_i, π) + Z_i'ζ − U_i, where (Y_i, X_i, Z_i) is observed and h(·, ·) is a known function. The STAR model is of the form: Y_t = ζ_1 + ζ_2 Y_{t−1} + β m(Y_{t−1}, π) + U_t, where Y_t is observed and m(·, ·) is a known function. In general, the parameters β, ζ, and π may be scalars or vectors.

We determine the asymptotic properties of ML estimators, tests, and CSs under drifting sequences of parameters/distributions. Suppose the true value of the parameter is θ_n = (β_n, ζ_n, π_n) for n ≥ 1, where n indexes the sample size. The behavior of ML estimators and test

statistics depends on the magnitude of ||β_n||. The asymptotic behavior of these statistics varies across three categories of sequences {θ_n : n ≥ 1}: Category I(a): β_n = 0 ∀n ≥ 1; π is unidentified; Category I(b): β_n ≠ 0 and n^{1/2} β_n → b ∈ R^{d_β}; π is weakly identified; Category II: β_n → 0 and n^{1/2}||β_n|| → ∞; π is semi-strongly identified; and Category III: β_n → β_0 ≠ 0; π is strongly identified.

For Category I sequences, we obtain the following results: the estimator of π is inconsistent; the estimator of ψ = (β, ζ) and the t and QLR test statistics have non-standard asymptotic distributions; and the standard tests and CSs (that employ standard normal or chi-squared critical values) have asymptotic null rejection probabilities and coverage probabilities that may or may not be correct depending on the model. (In many cases, they are not correct.) For Category II sequences, estimators and standard tests and CSs are found to have standard asymptotic properties, but the rate of convergence of the estimator of π is less than n^{1/2}. Specifically, the estimators are asymptotically normal and the test statistics have asymptotic chi-squared distributions. For Category III sequences, the estimators and standard tests and CSs have standard asymptotic properties and the estimators converge at rate n^{1/2}.

We also consider t and QLR tests and CSs that are robust to the strength of identification. These procedures use different critical values from the standard ones. First, we consider critical values based on asymptotically least-favorable sequences of distributions. Next, we consider data-dependent critical values that employ an identification-category selection procedure that determines whether β is near the value 0 that yields lack of identification of π, and if it is, the critical value is adjusted (in a smooth way) to take account of the lack of identification or weak identification. We show that the robust procedures have correct asymptotic size (in a uniform sense). The data-dependent robust critical values yield more powerful tests than the least-favorable critical values.

The numerical results for the STAR and nonlinear binary choice models are summarized as follows. The asymptotic distributions of the estimators of π and β are far from the normal distribution under weak identification and lack of identification. The asymptotic distributions range from being strongly bimodal, to being close to uniform, to being extremely peaked. The asymptotics provide remarkably accurate approximations to the finite-sample distributions. In the STAR model, the standard t and QLR confidence intervals (CIs) for π have

1 Here, by correct we mean α or less for tests and 1 − α or greater for CSs, where α and 1 − α are the nominal sizes of the tests or CSs.
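The two running examples can be made concrete with a minimal simulation sketch. The functional forms below are illustrative assumptions, not specifications from the paper: h(x, π) = exp(πx) for the binary choice model, logistic errors (a logit), and a logistic transition function m(·, π) for the STAR model. Setting β = 0 makes π drop out of both data-generating processes, which is exactly the identification failure studied here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Nonlinear binary choice: Y_i = 1(Y*_i > 0), Y*_i = beta*h(X_i, pi) + Z_i'zeta - U_i.
# h(x, pi) = exp(pi*x) and logistic U_i are assumed here for illustration only.
beta, zeta, pi_true = 0.5, np.array([1.0, -0.5]), 0.8
X = rng.normal(size=n)
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
U = rng.logistic(size=n)
Y = (beta * np.exp(pi_true * X) + Z @ zeta - U > 0).astype(int)

# STAR: Y_t = zeta1 + zeta2*Y_{t-1} + beta*m(Y_{t-1}, pi) + U_t, with m an assumed
# logistic transition in the lagged level.
zeta1, zeta2, beta_star, pi_star = 0.1, 0.3, 0.4, 2.0
m = lambda y, p: 1.0 / (1.0 + np.exp(-p * y))
T = 500
ys = np.zeros(T)
eps = rng.normal(scale=0.5, size=T)
for t in range(1, T):
    ys[t] = zeta1 + zeta2 * ys[t - 1] + beta_star * m(ys[t - 1], pi_star) + eps[t]
```

With beta = 0, the term beta*np.exp(pi_true*X) is identically zero for every value of pi_true, so the simulated data carry no information about π.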

substantial asymptotic size distortions, with asymptotic sizes equaling .56 and .7, respectively, for nominal .95 CIs. This is also true for the t and QLR CIs for β, where the asymptotic sizes are . and .8, respectively. Note that the size distortions are noticeably larger for the standard t CI than for the QLR CI. In the binary choice model, the standard t and QLR CIs for π have incorrect asymptotic sizes: .68 versus .9, respectively, for nominal .95 CIs. However, the standard t and QLR CIs for β have small and no size distortion, respectively. In both models, the asymptotic sizes provide very good approximations to the finite-sample sizes for the cases considered. In both models, the robust CIs have correct asymptotic sizes and finite-sample sizes that are quite close to the asymptotic size for the QLR CIs and fairly close for the t CIs.

In sum, the numerical results indicate that the asymptotic results of the paper are quite useful in determining the finite-sample behavior of estimators and standard tests and CIs under weak identification and lack of identification. They are also quite useful in designing robust tests and CIs whose finite-sample size is close to their nominal size.

The results of this paper apply when the criterion function satisfies a stochastic quadratic expansion in the parameter θ. This rules out a number of interesting models that exhibit lack of identification in parts of the parameter space, including regime switching models, mixture models, abrupt transition structural change models, and abrupt transition threshold autoregressive models.

Now, we briefly discuss the literature related to this paper. See AC for a more detailed discussion. The following are companion papers to this one: AC, Andrews and Cheng (7c) (AC-SM), and Andrews and Cheng (8) (AC3). These papers provide related, complementary results to the present paper. AC provides results under high-level conditions and analyzes the ARMA(1,1) model in detail. AC-SM provides proofs for AC and related results. AC3 provides results for estimators and tests based on generalized method of moments (GMM) criterion functions. It provides applications to an endogenous nonlinear regression model and an endogenous binary choice model. Cheng (8) provides results for a nonlinear regression model with multiple sources of weak identification, whereas the present paper only considers a single source. However, the present paper applies to a much broader range of models.

Tests of H_0: β = 0 versus H_1: β ≠ 0 are tests in which a nuisance parameter π appears only under the alternative. Such tests have been considered in the literature

2 See AC for references concerning results for these models.

starting from Davies (1977). The results of this paper cover tests of this sort, as well as tests for a whole range of linear and nonlinear hypotheses that involve (β, ζ, π), and corresponding CSs.

The weak instrument (IV) literature is closely related to this paper. However, papers in that literature focus on criterion functions that are indexed by parameters that do not determine the strength of identification. In contrast, in this paper, the parameter β, which determines the strength of identification of π, appears as one of the parameters in the criterion function. Selected papers from the weak IV literature include Nelson and Startz (1990), Dufour (1997), Staiger and Stock (1997), Stock and Wright (2000), Kleibergen (2002, 2005), Moreira (2003), and Kleibergen and Mavroeidis (2009).

Andrews and Mikusheva () and Qu () consider Lagrange multiplier (LM) tests in a maximum likelihood context where identification may fail, with emphasis on dynamic stochastic general equilibrium models. The results of the present paper apply to t and QLR statistics, but not to LM statistics. The consideration of LM statistics is in progress. Antoine and Renault (9, ) and Caner () consider GMM estimation with IVs that lie in the semi-strong category, using our terminology. Nelson and Startz (2007) and Ma and Nelson (8) analyze models like those considered in this paper. However, they do not provide asymptotic results or robust tests and CSs of the type given in this paper. Sargan (1983), Phillips (1989), and Choi and Phillips (1992) provide finite-sample and asymptotic results for linear simultaneous equations models when some parameters are not identified. Phillips and Shi () provide results for a nonlinear regression model with non-stationary regressors in which identification may fail.

The remainder of the paper is organized as follows. Section 2 introduces the smooth sample average extremum estimators, criterion functions, tests, CSs, and drifting sequences of distributions considered in the paper. Section 3 states the assumptions employed. Section 4 provides the asymptotic results for the extremum estimators. Section 5 establishes the asymptotic distributions of QLR statistics, determines the asymptotic size of standard QLR CSs, and introduces robust QLR tests and CSs, whose asymptotic size is equal to their nominal size. Section 6 considers t-based CSs. The nonlinear binary choice model is used as a running example in the previous sections. Section 7 provides results for the smooth transition threshold autoregressive (STAR) model. Section 8 provides numerical results for the STAR and binary choice models. Appendix A provides proofs of the results given in the paper. Appendix B provides some miscellaneous results. Three Supplemental Appendices to this paper are given in Andrews and Cheng (7b). Supplemental Appendix C provides additional numerical results for the nonlinear binary choice and STAR models. Supplemental Appendices D and E verify the assumptions for the nonlinear binary choice model and the STAR model, respectively.

All limits below are taken as n → ∞. Let λ_min(A) and λ_max(A) denote the smallest and largest eigenvalues, respectively, of a matrix A. All vectors are column vectors. For notational simplicity, we often write (a, b) instead of (a', b')' for vectors a and b. Also, for a function f(c) with c = (a, b) (= (a', b')'), we often write f(a, b) instead of f(c). Let 0_d denote a d-vector of zeros. Because it arises frequently, we let 0 denote a d_β-vector of zeros, where d_β is the dimension of β. Let R_[±∞] = R ∪ {±∞}. Let R^p_[±∞] = R_[±∞] × ... × R_[±∞] with p copies of R_[±∞]. Let ⇒ denote weak convergence of a sequence of stochastic processes indexed by π ∈ Π for some space Π.

3 In the definition of weak convergence, we employ the uniform metric d on the space E_v of R^v-valued functions on Π. See the Outline of the Supplemental Appendix of AC for more details.

2. Estimator and Criterion Function

2.1. Smooth Sample Average Estimators

We consider an extremum estimator θ̂_n that is defined by minimizing a sample criterion function of the form

Q_n(θ) = n^{−1} Σ_{i=1}^n ρ(W_i, θ), (2.1)

where {W_i : i ≤ n} are the observations and ρ(w, θ) is a known function that is twice continuously differentiable in θ. This includes ML and LS estimators. The observations {W_i : i ≤ n} may be i.i.d. or strictly stationary. Formal assumptions are provided in Section 3 below.

The paper considers the case where θ is not identified (by the criterion function Q_n(θ)) at some points in the parameter space. Lack of identification occurs when Q_n(θ) is flat wrt some sub-vector of θ. To model this identification problem, θ is partitioned into three sub-vectors:

θ = (β, ζ, π) = (ψ, π), where ψ = (β, ζ). (2.2)

The parameter π ∈ R^{d_π} is unidentified when β = 0 (∈ R^{d_β}). The parameter ψ = (β, ζ) ∈ R^{d_ψ} is always identified. The parameter ζ ∈ R^{d_ζ} does not affect the identification of π. These conditions allow for a wide range of cases, including cases in which reparametrization is used to convert a model into the framework considered here.

Example 1. This example is the nonlinear binary choice model

Y_i = 1(Y*_i > 0) and Y*_i = β h(X_i, π) + Z_i'ζ − U_i, (2.3)

where h(X_i, π) ∈ R is known up to the finite-dimensional parameter π ∈ R^{d_π}. Suppose h(x, π) is twice continuously differentiable wrt π for any x in the support of X_i, and the first- and second-order partial derivatives are denoted by h_π(x, π) and h_ππ(x, π), respectively. The observed variables are {W_i = (Y_i, X_i, Z_i) : i = 1, ..., n}. The random variables {(X_i, Z_i, U_i) : i = 1, ..., n} are i.i.d. The distribution of (X_i, Z_i) is φ, which is an infinite-dimensional nuisance parameter. The parameter of interest is θ = (β, ζ, π).

Conditional on (X_i, Z_i), the distribution function (df) of U_i is L(u). The df L(u) is known and does not depend on (X_i, Z_i). For example, L(u) is the standard normal df in a probit model and the logistic df in a logit model. We assume that L(u) is twice continuously differentiable and its first- and second-order derivatives are denoted by L_u(u) and L_uu(u), respectively. Suppose L_u(u) > 0 and 0 < L(u) < 1 ∀u ∈ R. In this model,

P(Y_i = 1 | X_i, Z_i) = P(U_i < β h(X_i, π) + Z_i'ζ | X_i, Z_i) = L(g_i(θ)), where g_i(θ) = β h(X_i, π) + Z_i'ζ. (2.4)

We estimate θ = (β, ζ, π) by the ML estimator. The sample criterion function is

Q_n(θ) = −n^{−1} Σ_{i=1}^n [Y_i log L(g_i(θ)) + (1 − Y_i) log(1 − L(g_i(θ)))] (2.5)

and the ML estimator minimizes Q_n(θ) over Θ. (Here we use the negative of the

standard log-likelihood function so that the estimator minimizes the sample criterion function, as in the general set-up of the paper.) When β = 0, g_i(θ) and Q_n(θ) do not depend on π, and π is not identified.

The true distribution of the observations {W_i : i ≤ n} is F_γ for some parameter γ ∈ Γ. We let P_γ and E_γ denote probability and expectation under F_γ. The parameter space Γ, referred to as the true parameter space, is compact and is of the form

Γ = {γ = (θ, φ) : θ ∈ Θ*, φ ∈ Φ*(θ)}, (2.6)

where Θ* is a compact subset of R^{d_θ} and Φ*(θ) ⊂ Φ ∀θ ∈ Θ* for some compact metric space Φ with a metric that induces weak convergence of the bivariate distributions of (W_i, W_{i+m}) for all i, m ≥ 1. In unconditional likelihood scenarios, no parameter φ appears. In conditional likelihood scenarios, with conditioning variables {X_i : i ≥ 1}, φ indexes the distribution of {X_i : i ≥ 1}. In nonlinear regression models estimated by least squares, θ indexes the regression functions and possibly a finite-dimensional feature of the distribution of the errors, such as its variance, and φ indexes the remaining characteristics of the distribution of the errors, which may be infinite dimensional.

By definition, the estimator θ̂_n (approximately) minimizes Q_n(θ) over an optimization parameter space Θ:

θ̂_n ∈ Θ and Q_n(θ̂_n) = inf_{θ∈Θ} Q_n(θ) + o(n^{−1}). (2.7)

We assume that the interior of Θ includes the true parameter space Θ* (see Assumption B below). This ensures that the asymptotic distribution of θ̂_n is not affected by boundary constraints for any sequence of true parameters in Θ*. The focus of this paper is not on boundary effects. Without loss of generality (wlog), the optimization parameter space Θ can be written

4 Thus, the metric satisfies: if φ_j → φ, then (W_i, W_{i+m}) under φ_j converges in distribution to (W_i, W_{i+m}) under φ. Note that Γ is a metric space with metric d_Γ(γ_1, γ_2) = ||θ_1 − θ_2|| + d_Φ(φ_1, φ_2), where γ_j = (θ_j, φ_j) for j = 1, 2, and d_Φ is the metric on Φ.

5 The o(n^{−1}) term in (2.7), and in the corresponding definitions below, is a fixed sequence of constants that does not depend on the true parameter γ and does not depend on the null value v in the test-inversion definitions. The o(n^{−1}) term allows for some numerical inaccuracy in practice and circumvents the issue of the existence of parameter values that achieve the infima.
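The extremum-estimator definition and the flatness of the criterion in π at β = 0 can be checked numerically. The sketch below evaluates the binary-choice sample criterion Q_n(θ) (the negative average log-likelihood) and minimizes it by a crude grid search; h(x, π) = exp(πx), a single intercept regressor Z_i = 1, and logit errors are illustrative assumptions, not choices made by the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulate the binary choice model with assumed h(x, pi) = exp(pi*x),
# an intercept-only Z_i = 1, and logistic (logit) errors.
beta0, zeta0, pi0 = 0.8, -0.2, 0.6
X = rng.normal(size=n)
U = rng.logistic(size=n)
Y = (beta0 * np.exp(pi0 * X) + zeta0 - U > 0).astype(int)

L = lambda u: 1.0 / (1.0 + np.exp(-u))  # logistic df

def Q_n(theta):
    """Sample criterion: negative average log-likelihood at theta = (b, z, p)."""
    b, z, p = theta
    pr = np.clip(L(b * np.exp(p * X) + z), 1e-10, 1 - 1e-10)
    return -np.mean(Y * np.log(pr) + (1 - Y) * np.log(1 - pr))

# When beta = 0, Q_n does not depend on pi: the criterion is exactly flat in pi.
flat = [Q_n((0.0, 0.3, p)) for p in (-1.0, 0.0, 2.0)]
assert max(flat) - min(flat) == 0.0

# Crude grid-search (approximate) minimizer of Q_n over a box Theta.
grid = [(b, z, p)
        for b in np.arange(0.2, 1.45, 0.1)
        for z in np.arange(-0.8, 0.45, 0.1)
        for p in np.arange(0.0, 1.25, 0.1)]
theta_hat = min(grid, key=Q_n)
```

The grid search stands in for the o(n^{−1})-approximate minimization in the estimator definition: any θ̂_n whose criterion value is within a vanishing tolerance of the infimum qualifies.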

as

Θ = {θ = (ψ, π) : ψ ∈ Ψ(π), π ∈ Π}, where Π = {π : (ψ, π) ∈ Θ for some ψ} and Ψ(π) = {ψ : (ψ, π) ∈ Θ} for π ∈ Π. (2.8)

We allow Ψ(π) to depend on π and, hence, Θ need not be a product space between Ψ and Π. For example, this is needed in the STAR model and in the ARMA(1,1) example in AC.

Example 1 (cont.). The true parameter space for θ is

Θ* = B* × Z* × Π*, where B* = [−b*_1, b*_2] ⊂ R, (2.9)

b*_1, b*_2 ≥ 0, b*_1 and b*_2 are not both equal to 0, Z* (⊂ R^{d_ζ}) is compact, and Π* (⊂ R^{d_π}) is compact. The ML estimator of θ minimizes Q_n(θ) over Θ. The optimization parameter space is

Θ = B × Z × Π, where B = [−b_1, b_2] ⊂ R, (2.10)

b_1 > b*_1, b_2 > b*_2, Z (⊂ R^{d_ζ}) is compact, Π (⊂ R^{d_π}) is compact, Z* ⊂ int(Z), and B* ⊂ int(B).

2.2. Confidence Sets and Tests

We are interested in the effect of lack of identification or weak identification on the extremum estimator θ̂_n, on CSs for various functions r(θ) of θ, and on tests of null hypotheses of the form H_0: r(θ) = v. CSs are obtained by inverting tests. A nominal 1 − α CS for r(θ) is

CS_n = {v : T_n(v) ≤ c_{n,1−α}(v)}, (2.11)

where T_n(v) is a test statistic, such as the QLR statistic, and c_{n,1−α}(v) is a critical value for testing H_0: r(θ) = v. Critical values considered in this paper may depend on the null value v of r(θ) as well as on the data. The coverage probability of a CS for r(θ) is

P_γ(r(θ) ∈ CS_n) = P_γ(T_n(r(θ)) ≤ c_{n,1−α}(r(θ))), (2.12)

where P_γ(·) denotes probability when γ is the true value. We are interested in the finite-sample size of a CS, which is the smallest finite-sample coverage probability of the CS over the parameter space. It is approximated by the asymptotic size, which is defined as follows:

AsySz = liminf_{n→∞} inf_{γ∈Γ} P_γ(r(θ) ∈ CS_n) = liminf_{n→∞} inf_{γ∈Γ} P_γ(T_n(r(θ)) ≤ c_{n,1−α}(r(θ))). (2.13)

For a test, we are interested in the maximum null rejection probability, which is the finite-sample size of the test. A test's asymptotic size is an approximation to the latter. The asymptotic size of a test of H_0: r(θ) = v is

AsySz = limsup_{n→∞} sup_{γ∈Γ: r(θ)=v} P_γ(T_n(v) > c_{n,1−α}(v)). (2.14)

2.3. Drifting Sequences of Distributions

The uniformity over γ ∈ Γ for any given sample size n in (2.13) and (2.14) is crucial for the asymptotic size to be a good approximation to the finite-sample size. The value of γ at which the finite-sample size of a CS or test is attained may vary with the sample size. Thus, to determine the asymptotic size we need to derive the asymptotic distribution of the test statistic T_n(v_n) under sequences of true parameters γ_n = (θ_n, φ_n) and v_n = r(θ_n) that may depend on n. As shown in Andrews and Guggenberger () and Andrews, Cheng, and Guggenberger (9), the asymptotic sizes of CSs and tests are determined by certain drifting sequences of distributions. The following sequences {γ_n} are key:

Γ(γ_0) = {{γ_n ∈ Γ : n ≥ 1} : γ_n → γ_0 ∈ Γ},
Γ(γ_0, 0, b) = {{γ_n} ∈ Γ(γ_0) : β_0 = 0 and n^{1/2} β_n → b ∈ R^{d_β}_[±∞]}, and
Γ(γ_0, ∞, ω_0) = {{γ_n} ∈ Γ(γ_0) : n^{1/2}||β_n|| → ∞ and β_n/||β_n|| → ω_0 ∈ R^{d_β}}, (2.15)

where γ_0 = (β_0, ζ_0, π_0, φ_0) and γ_n = (β_n, ζ_n, π_n, φ_n). The sequences in Γ(γ_0, 0, b) are in Categories I and II and are sequences for which {β_n} is close to 0: β_n → 0. When ||b|| < ∞, {β_n} is within O(n^{−1/2}) of 0 and the sequence is in Category I. The sequences in Γ(γ_0, ∞, ω_0) are in Categories II and III and are more distant from β = 0: n^{1/2}||β_n|| → ∞. Throughout the paper we use the terminology: under {γ_n} ∈ Γ(γ_0) to mean when

the true parameters are {γ_n} ∈ Γ(γ_0) for any γ_0 ∈ Γ; under {γ_n} ∈ Γ(γ_0, 0, b) to mean when the true parameters are {γ_n} ∈ Γ(γ_0, 0, b) for any γ_0 ∈ Γ with β_0 = 0 and any b ∈ R^{d_β}_[±∞]; and under {γ_n} ∈ Γ(γ_0, ∞, ω_0) to mean when the true parameters are {γ_n} ∈ Γ(γ_0, ∞, ω_0) for any γ_0 ∈ Γ and any ω_0 ∈ R^{d_β} with ||ω_0|| = 1.

3. Assumptions

3.1. Smooth Sample Average Assumptions

This section provides primitive sufficient conditions for many of the high-level assumptions given in AC for the class of sample average criterion functions that are smooth in θ. Note that the high-level assumptions in AC concern limit behavior under drifting sequences of true distributions. In contrast, the assumptions given here concern behavior under fixed true distributions and do not involve the sample size n.

In the S assumptions below, the true distribution of {W_i : i ≥ 1} is F_γ. The conditions in the S assumptions are assumed to hold for all γ = (β, ζ, π, φ) ∈ Γ. Let C < ∞ be a generic finite positive constant that does not necessarily take the same value when it appears in two different places. None of the constants that appear in the S assumptions depend on γ.

3.1.1. Assumption S1

The first assumption is the following.

Assumption S1. Under any γ ∈ Γ, {W_i : i ≥ 1} is a strictly stationary and strong mixing sequence with mixing coefficients α_m ≤ Cm^{−A} for some A > dq/(q − d) and some q > d, or {W_i : i ≥ 1} is an i.i.d. sequence and the constant q (that appears in Assumption S3 below) equals 2 + δ for some δ > 0.

In Assumption S1, the decay rate of the strong mixing coefficients is used to obtain the stochastic equicontinuity of certain empirical processes using results in Hansen (1996). The WLLN and CLT for strong mixing arrays also hold under this decay rate; see Andrews (1988) and de Jong (1997). In the i.i.d. case, the constant q is smaller than in the strong mixing case, which yields weaker moment restrictions in Assumption S3 below.

6 The sufficient conditions given here imply Assumptions A, B3, C1-C8, and D1-D3 of AC.
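The test-inversion construction of a confidence set described earlier, CS_n = {v : T_n(v) ≤ c_{n,1−α}(v)}, can be sketched in the simplest possible setting: a squared t-statistic for a scalar mean, inverted over a grid with the standard fixed chi-square(1) critical value. This is a deliberately stripped-down stand-in; the paper's QLR statistics and its identification-robust, v-dependent critical values are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=1.5, scale=2.0, size=400)
n = data.size

def T_n(v):
    """Squared t-statistic for H0: mean = v."""
    se = data.std(ddof=1) / np.sqrt(n)
    return ((data.mean() - v) / se) ** 2

# Standard (strong-identification) critical value: 0.95 quantile of chi-square(1).
crit = 3.8415

# Invert the test over a grid of null values v: keep every v that is not rejected.
grid = np.linspace(0.0, 3.0, 601)
keep = np.array([T_n(v) <= crit for v in grid])
cs_lo, cs_hi = grid[keep].min(), grid[keep].max()
print(f"nominal 95% CS for the mean: [{cs_lo:.3f}, {cs_hi:.3f}]")
```

Under weak identification, a fixed critical value like the one above is exactly what fails to deliver correct asymptotic size; the robust procedures of the paper replace it with least-favorable or data-dependent critical values c_{n,1−α}(v).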

Example 1 (cont.). In this example, Assumption S1 holds with q = 2 + δ for some δ > 0 because {W_i : i ≥ 1} are i.i.d. for each γ ∈ Γ.

3.1.2. Assumption S2

The second assumption is as follows.

Assumption S2. (i) For some function ρ(w, θ) ∈ R, Q_n(θ) = n^{−1} Σ_{i=1}^n ρ(W_i, θ), where ρ(w, θ) is twice continuously differentiable in θ on an open set containing Θ, ∀w ∈ W. (ii) ρ(w, θ) does not depend on π when β = 0, ∀w ∈ W. (iii) ∀γ_0 ∈ Γ with β_0 = 0, E_{γ_0} ρ(W_i, ψ, π) is uniquely minimized by ψ_0, ∀π ∈ Π. (iv) ∀γ_0 ∈ Γ with β_0 ≠ 0, E_{γ_0} ρ(W_i, θ) is uniquely minimized by θ_0. (v) Ψ(π) is compact ∀π ∈ Π, and Π and Θ are compact. (vi) ∀ε > 0, ∃δ > 0 such that d_H(Ψ(π_1), Ψ(π_2)) < ε ∀π_1, π_2 ∈ Π with ||π_1 − π_2|| < δ, where d_H(·) denotes the Hausdorff metric.

For i.i.d. observations with density f(w, θ), the ML estimator is obtained by taking ρ(W_i, θ) = −log f(W_i, θ). For a stationary p-th order Markov process {W*_i : −p + 1 ≤ i ≤ n}, if the conditional density of W*_i given (W*_{i−1}, ..., W*_{i−p}) is f(w|W*_{i−1}, ..., W*_{i−p}; θ) and we let W_i = (W*_i, ..., W*_{i−p}), then the ML estimator is obtained by taking ρ(W_i, θ) = −log f(W*_i | W*_{i−1}, ..., W*_{i−p}; θ).

Example 1 (cont.). Assumption S2(i) holds in this example with

ρ(W_i, θ) = −[Y_i log L(g_i(θ)) + (1 − Y_i) log(1 − L(g_i(θ)))] (3.6)

by (2.5) and the smoothness conditions on h(X_i, π) and L(u). Assumption S2(ii) holds because, when β = 0, the term β h(X_i, π) vanishes and g_i(θ) does not depend on π. For brevity, Assumptions S2(iii) and S2(iv) are verified in Supplemental Appendix D. The argument for Assumption S2(iv) is a standard argument for ML estimators in well-identified scenarios. Assumption S2(v) holds because Ψ(π) = B × Z, which does not depend on π, Θ = B × Z × Π, and B, Z, and Π are all compact. Assumption S2(vi) holds because Ψ(π) does not depend on π.

A class of examples of ρ(w, θ) functions that satisfy Assumption S2(ii) are functions of the form

ρ(w, θ) = ρ*(w, a(x, β)h(x, π), ζ), where a(x, 0) = 0, ∀w ∈ W, (3.7)

x is a sub-vector of w, and a(x, β) and h(x, π) are known functions. In (3.7), ρ(w, θ) does not depend on π when β = 0 because a(x, 0) = 0. Examples of a(x, β) include (i) a(x, β) = β, (ii) a(x, β) = exp(β) − 1, and (iii) a(x, β) = x'β. Example (i) covers the nonlinear regression example, where β is the coefficient of the nonlinear regressor. Example (ii) demonstrates that a(x, β) can be nonlinear in β provided a(x, β) = 0 at β = 0. Example (iii) covers the weak IV example and the case in which β enters the model through a single index. The form in (3.7) does not require a regression model and it allows for complicated structural models by allowing different functional forms for a(x, β), h(x, π), and ρ*(w, ·, ζ).

Returning now to the general ρ(w, θ) case, Assumption S2(vi) holds immediately in cases where Ψ(π) does not depend on π. When Ψ(π) depends on π, the boundary of Ψ(π) is often a continuous linear function of π, as in the STAR model and the ARMA(1,1) model considered in AC. In such cases, it is simple to verify Assumption S2(vi).

3.1.3. Assumption S3

Let ρ_θ(w, θ) and ρ_θθ(w, θ) denote the first-order and second-order partial derivatives of ρ(w, θ) wrt θ, respectively. Let ρ_ψ(w, θ) and ρ_ψψ(w, θ) denote the first-order and second-order partial derivatives of ρ(w, θ) wrt ψ. We define a matrix B(β) that is used to normalize the second-derivative matrix ρ_θθ(w, θ) so that its sample average has a nonsingular probability limit. Let

B(β) = [ I_{d_ψ}        0_{d_ψ×d_π}
         0_{d_π×d_ψ}    ι(β)I_{d_π} ] ∈ R^{d_θ×d_θ}, where ι(β) = { β    if β is a scalar,
                                                                    ||β|| if β is a vector. }   (3.8)

We use a different definition of ι(β) in the scalar and vector β cases because in the scalar case the use of β, rather than ||β||, produces noticeably simpler (but equivalent) formulae, but in the vector case ||β|| is required. For β ≠ 0, let

B^{−1}(β) ρ_θ(w, θ) = ρ†_θ(w, θ) and B^{−1}(β) ρ_θθ(w, θ) B^{−1}(β) = ρ†_θθ(w, θ) + ι^{−1}(β) ε(w, θ),   (3.9)

where ρ†_θθ(w, θ) is symmetric and ρ†_θ(w, θ), ρ†_θθ(w, θ), and ε(w, θ) satisfy Assumption S3 below. The re-scaling matrix B^{−1}(β) in (3.9) is used to deal with the singularity issue
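The normalization matrix B(β) of (3.8) can be written down directly. A minimal sketch, with illustrative dimensions of our own choosing:

```python
import numpy as np

def iota(beta):
    # iota(beta) = beta for scalar beta, ||beta|| for vector beta (eq. 3.8)
    b = np.atleast_1d(np.asarray(beta, dtype=float))
    return float(b[0]) if b.size == 1 else float(np.linalg.norm(b))

def B(beta, d_psi, d_pi):
    # block-diagonal normalization: identity on the psi block,
    # iota(beta) times the identity on the pi block
    M = np.eye(d_psi + d_pi)
    M[d_psi:, d_psi:] *= iota(beta)
    return M
```

For β near 0, B^{−1}(β) inflates the π block (the inverse has ι^{−1}(β) there), which is exactly why the re-scaled quantities ρ†_θ and ρ†_θθ in (3.9) stay well behaved while the raw derivatives become singular.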

that arises when β = 0. In particular, the covariance matrix of ρ_θ(W_i, θ) is singular when β = 0 and close to singular when β is close to 0. In contrast, the re-scaled quantity ρ†_θ(W_i, θ) has a covariance matrix that is not close to being singular even when β is close to 0. Similarly, E_γ ρ_θθ(W_i, θ) is singular when β = 0 and close to singular when β is close to 0. Re-scaling of ρ_θθ(W_i, θ) yields a quantity ρ†_θθ(W_i, θ) whose expectation is not close to singular even when β is close to 0, plus another term ε(W_i, θ) that is asymptotically negligible. Below we illustrate the form of ρ†_θ(w, θ), ρ†_θθ(w, θ), and ε(w, θ) in Example 1 and for ρ(w, θ) functions as in (3.7); see Section 3.1.4.

Next, define

V†(θ₁, θ₂; γ) = Σ_{m=−∞}^{∞} Cov_γ(ρ†_θ(W_i, θ₁), ρ†_θ(W_{i+m}, θ₂)),   (3.10)

which does not depend on i because the observations are stationary under Assumption S1. Under Assumptions S1 and S3(iii) below, V†(θ₁, θ₂; γ) exists by a standard strong mixing inequality.

The form of Assumption S3 differs depending on whether β is a scalar or a vector. We state Assumption S3 for the scalar β case first because it is simpler.

Assumption S3 (scalar β). (i) E_γ ε(W_i, θ₀) = 0 and |β₀|^{−1} ||E_γ ε(W_i, ψ₀, π)|| ≤ C||π − π₀|| ∀β₀ with 0 < |β₀| < δ for some δ > 0.
(ii) For all δ > 0 and some functions M₁(w) : W → R₊ and M₂(w) : W → R₊, ||ρ(w, θ₁) − ρ(w, θ₂)|| + ||ρ†_θ(w, θ₁) − ρ†_θ(w, θ₂)|| ≤ M₁(w)||θ₁ − θ₂|| and ||ρ†_θθ(w, θ₁) − ρ†_θθ(w, θ₂)|| + ||ε(w, θ₁) − ε(w, θ₂)|| ≤ M₂(w)||θ₁ − θ₂||, ∀θ₁, θ₂ ∈ Θ with ||θ₁ − θ₂|| ≤ δ, ∀w ∈ W.
(iii) E_γ sup_{θ∈Θ} {|ρ(W_i, θ)|^{1+δ} + ||ρ_ψ(W_i, θ)||^{1+δ} + ||ρ†_θ(W_i, θ)||^{1+δ} + M₁(W_i)^{1+δ} + ||ρ†_θθ(W_i, θ)||^q + ||ε(W_i, θ)||^q + M₂(W_i)^q} ≤ C for some δ > 0, ∀γ ∈ Γ, where q is as in Assumption S1.
(iv) λ_min(E_γ ρ_ψψ(W_i, ψ₀, π)) > 0 ∀π ∈ Π when β = 0, and E_γ ρ†_θθ(W_i, θ₀) is positive definite ∀γ ∈ Γ.
(v) V†(θ₀, θ₀; γ) is positive definite ∀γ ∈ Γ.

In Assumption S3(iii), the last three terms have bounded qth moments in order to establish the stochastic equicontinuity of empirical processes based on ρ†_θθ(W_i, θ) and ε(W_i, θ) using Lemma 9.1 in Appendix A.

In Assumptions S1–S3, Assumptions S2(ii), S2(iii), S3(i), S3(iii), S3(iv), and S3(v) are related to the weak identification problem. Assumption S2(ii) implies that the sample

criterion function is flat in π when β = 0, as in Assumption A of AC. Assumption S2(iii) differs from a standard condition in the sense that the population criterion function is not uniquely minimized by the true value θ₀ when β = 0. The Lipschitz condition in Assumption S3(i) typically holds because the partial derivative of E_γ ε(W_i, ψ₀, π) wrt β₀ is approximately proportional to ||π − π₀|| when ||π − π₀|| is close to 0. Because parts of B^{−1}(β) diverge as β converges to 0, the moment conditions for ρ†_θ(W_i, θ) and ρ†_θθ(W_i, θ) in Assumption S3(iii) are stronger than standard moment conditions on the first-order and second-order derivatives. These conditions hold in typical examples, see below, because the partial derivative of ρ(w, θ) wrt π is small when β is close to 0 under Assumption S2(ii). Hence, the rhs moments are uniformly bounded even after the scaling by B^{−1}(β). In Assumptions S3(iv) and S3(v), E_γ ρ†_θθ(W_i, θ₀) and V†(θ₀, θ₀; γ) typically are positive definite due to the re-scaling in (3.9).

Under Assumptions S1–S3, the criterion function Q_n(θ) has probability limit Q(θ; γ₀) = E_{γ₀} ρ(W_i, θ) under any sequence of parameters γ_n → γ₀ ∈ Γ.

Example (cont.). In this example, ρ†_θ(W_i, θ), ρ†_θθ(W_i, θ), and ε(w, θ) are defined as follows. For notational simplicity, let L_i(θ), L'_i(θ), and L''_i(θ) abbreviate L(g_i(θ)), L'(g_i(θ)), and L''(g_i(θ)), respectively. Let d_{ψ,i}(π) = (h(X_i, π), Z_i')', d_i(θ) = (h(X_i, π), Z_i', h_π(X_i, π)')', and

D_i(θ) = [ 0                0'_{d_ζ}        h_π(X_i, π)'
           0_{d_ζ}          0_{d_ζ×d_ζ}     0_{d_ζ×d_π}
           h_π(X_i, π)      0_{d_π×d_ζ}     β h_ππ(X_i, π) ].   (3.11)

The first- and second-order partial derivatives of ρ(W_i, θ) wrt ψ and θ are

ρ_ψ(W_i, θ) = −w₁,i(θ)(Y_i − L_i(θ)) d_{ψ,i}(π),
ρ_θ(W_i, θ) = −w₁,i(θ)(Y_i − L_i(θ)) B(β) d_i(θ),
ρ_ψψ(W_i, θ) = [−w₂,i(θ)(Y_i − L_i(θ)) + w₁,i²(θ)(Y_i − L_i(θ))²] d_{ψ,i}(π) d_{ψ,i}(π)',
ρ_θθ(W_i, θ) = [−w₂,i(θ)(Y_i − L_i(θ)) + w₁,i²(θ)(Y_i − L_i(θ))²] B(β) d_i(θ) d_i(θ)' B(β)
               − w₁,i(θ)(Y_i − L_i(θ)) D_i(θ),
where w₁,i(θ) = L'_i(θ)/(L_i(θ)(1 − L_i(θ))) and w₂,i(θ) = L''_i(θ)/(L_i(θ)(1 − L_i(θ))).   (3.12)

See Section 3. for the calculation of these derivatives.

The re-scaled partial derivatives in (3.9) take the form

ρ†_θ(W_i, θ) = −w₁,i(θ)(Y_i − L_i(θ)) d_i(θ),
ρ†_θθ(W_i, θ) = [−w₂,i(θ)(Y_i − L_i(θ)) + w₁,i²(θ)(Y_i − L_i(θ))²] d_i(θ) d_i(θ)', and

ε(w, θ) = −w₁,i(θ)(Y_i − L_i(θ)) [ 0                0'_{d_ζ}        h_π(X_i, π)'
                                   0_{d_ζ}          0_{d_ζ×d_ζ}     0_{d_ζ×d_π}
                                   h_π(X_i, π)      0_{d_π×d_ζ}     h_ππ(X_i, π) ].   (3.13)

To verify Assumption S3(i), note that E_γ(Y_i|X_i, Z_i) = P_γ(Y_i = 1|X_i, Z_i) = L_i(γ) by the specification of the model. Hence, E_γ(Y_i − L_i(γ)|X_i, Z_i) = 0 implies E_γ ε(W_i, θ₀) = 0 by the law of iterated expectations (LIE).

Let L̄'_i = sup_{θ∈Θ} |L'_i(θ)| and L̄''_i = sup_{θ∈Θ} |L''_i(θ)|. A mean-value expansion of L_i(ψ₀, π) wrt π around π₀ yields

L_i(ψ₀, π) − L_i(γ₀) = L'_i(ψ₀, π̃)(∂g_i(ψ₀, π̃)/∂π')(π − π₀) = β₀ L'_i(ψ₀, π̃) h_π(X_i, π̃)'(π − π₀),   (3.14)

where π̃ lies between π and π₀. To verify the second part of Assumption S3(i), we have

||E_γ(w₁,i(ψ₀, π)[Y_i − L_i(ψ₀, π)] h_π(X_i, π))|| = ||E_γ(w₁,i(ψ₀, π)[L_i(γ₀) − L_i(ψ₀, π)] h_π(X_i, π))||
≤ |β₀| ||π − π₀|| E_γ(w̄₁,i L̄'_i h̄²_{π,i}) ≤ C|β₀| ||π − π₀||,   (3.15)

for some C < ∞, where the equality holds by the LIE, the first inequality holds using (3.14), and the second inequality holds by the Cauchy–Schwarz inequality and the moment conditions in (3.32). Similarly, we have

||E_γ(w₁,i(ψ₀, π)[Y_i − L_i(ψ₀, π)] h_ππ(X_i, π))|| ≤ C|β₀| ||π − π₀||   (3.16)

for some C < ∞. By (3.13), (3.15), and (3.16), the second part of Assumption S3(i) holds. The rest of Assumption S3 is verified in Supplemental Appendix D for this example.

When β is a vector, i.e., d_β > 1, we reparameterize β as (||β||, ω), where ω = β/||β|| if β ≠ 0 and, by definition, ω = 1_{d_β}/||1_{d_β}|| with 1_{d_β} = (1, ..., 1)' ∈ R^{d_β} if β = 0. Correspondingly, θ is reparameterized as θ⁺ = (||β||, ω, ζ, π). Let Θ⁺ = {θ⁺ : θ⁺ = (||β||, β/||β||, ζ, π), θ ∈ Θ}. This new parameterization is needed when β is a vector because ρ†_θ(w, θ), ρ†_θθ(w, θ), and ε(w, θ) typically involve β/||β|| due to the re-scaling in (3.9), and β/||β|| is not continuous in β at β = 0. In consequence, the Lipschitz conditions in Assumptions S3(ii) and S3(iii) (scalar β) cannot be verified when β is a vector. The new parameterization treats ||β|| and ω = β/||β|| as separate variables. In Assumption S3 (vector β) below, some Lipschitz conditions are specified in terms of θ⁺ = (||β||, ω, ζ, π). In Assumption S3 (vector β), both the original parameterization with θ and the alternative parameterization with θ⁺ are employed for convenience. Note that only conditions related to ρ†_θ(w, θ), ρ†_θθ(w, θ), and ε(w, θ) require the alternative parameterization with θ⁺.

Assumption S3 (vector β). (i) E_γ ε(W_i, θ₀) = 0 and ||β₀||^{−1} ||E_γ ε(W_i, θ⁺)|| ≤ C(||π − π₀|| + ||ω − ω₀||) ∀θ⁺ = (||β₀||, ω, ζ₀, π) and ∀β₀ with 0 < ||β₀|| < δ for some δ > 0.
(ii) For all δ > 0 and some functions M₁(w) : W → R₊ and M₂(w) : W → R₊, ||ρ(w, θ₁) − ρ(w, θ₂)|| + ||ρ†_θ(w, θ₁⁺) − ρ†_θ(w, θ₂⁺)|| ≤ M₁(w)(||θ₁ − θ₂|| + ||θ₁⁺ − θ₂⁺||) and ||ρ_ψ(w, θ₁) − ρ_ψ(w, θ₂)|| + ||ρ†_θθ(w, θ₁⁺) − ρ†_θθ(w, θ₂⁺)|| + ||ε(w, θ₁⁺) − ε(w, θ₂⁺)|| ≤ M₂(w)(||θ₁ − θ₂|| + ||θ₁⁺ − θ₂⁺||), ∀θ₁, θ₂ ∈ Θ with ||θ₁ − θ₂|| ≤ δ, ∀θ₁⁺, θ₂⁺ ∈ Θ⁺ with ||θ₁⁺ − θ₂⁺|| ≤ δ, ∀w ∈ W.
(iii) Assumptions S3(iii)–(iv) (scalar β) hold with the definitions of M₁(w) and M₂(w) replaced by those given above.

Assumption S3(i) (vector β) typically holds because the partial derivatives of E_γ ε(W_i, θ⁺) wrt π and ω are approximately proportional to ||β₀||.

3.1.4. Forms of ρ†_θ(w, θ), ρ†_θθ(w, θ), and ε(w, θ)

Now, we illustrate the forms of ρ†_θ(w, θ), ρ†_θθ(w, θ), and ε(w, θ) when ρ(w, θ) belongs to the class specified in (3.7) and show that Assumption S3(i) holds in this case. For simplicity, we assume a(x, β) and h(x, π) are both scalars and no parameter ζ appears.
Let ρ*'(·) and ρ*''(·) abbreviate the first- and second-order derivatives of ρ*(w, a(x, β)h(x, π)) wrt a(x, β)h(x, π). Let a_β(x, β), a_ββ(x, β), h_π(x, π), and h_ππ(x, π) denote the first- and second-order partial derivatives of a(x, β) and h(x, π) wrt β and π.

The first- and second-order partial derivatives of ρ(w, θ) wrt β and π are

ρ_β(w, θ) = ρ*'(·) a_β(x, β) h(x, π), ρ_π(w, θ) = ρ*'(·) a(x, β) h_π(x, π),
ρ_ββ(w, θ) = ρ*''(·) a_β(x, β) a_β(x, β)' h²(x, π) + ρ*'(·) a_ββ(x, β) h(x, π),
ρ_βπ(w, θ) = ρ*''(·) a(x, β) h(x, π) a_β(x, β) h_π(x, π)' + ρ*'(·) a_β(x, β) h_π(x, π)', and
ρ_ππ(w, θ) = ρ*''(·) a²(x, β) h_π(x, π) h_π(x, π)' + ρ*'(·) a(x, β) h_ππ(x, π).   (3.17)

In this case, we have

ρ†_θ(w, θ) = ρ*'(·) a†(x, θ), ρ†_θθ(w, θ) = ρ*''(·) a†(x, θ) a†(x, θ)',
where a†(x, θ) = (a_β(x, β)' h(x, π), (a(x, β)/ι(β)) h_π(x, π)')' and

ε(w, θ) = ρ*'(·) [ ι(β) a_ββ(x, β) h(x, π)      a_β(x, β) h_π(x, π)'
                   h_π(x, π) a_β(x, β)'         (a(x, β)/ι(β)) h_ππ(x, π) ].   (3.18)

Note that a(x, β)/ι(β) is continuous at β = 0 in the scalar β case. In particular, lim_{β→0} a(x, β)/β = a_β(x, 0) by a mean-value expansion because a(x, 0) = 0 and a(x, β) is continuously differentiable in β. In the vector β case, lim_{β→0, β/||β||→ω} a(x, β)/||β|| = a_β(x, 0)'ω.

When ε(w, θ) takes the form in (3.18), Assumption S3* below implies Assumption S3(i). In Assumption S3*(i), X_i is a sub-vector of W_i that takes the place of x in w.

Assumption S3*. (i) X_i is a vector of weakly exogenous variables such that E_γ(ρ*'(W_i, a(X_i, β)h(X_i, π))|X_i) = 0 a.s. ∀θ ∈ Θ.
(ii) E_γ sup_{||β||<δ, π∈Π} {|ρ*''(W_i, a(X_i, β)h(X_i, π))|^{1+δ} (|h(X_i, π)| + ||h_π(X_i, π)||)(|h(X_i, π)| + ||h_π(X_i, π)|| + ||h_ππ(X_i, π)||) sup_{||β||<δ} ||a_β(X_i, β)||(||a_β(X_i, β)|| + ||a_ββ(X_i, β)||)} ≤ C for some C < ∞ and δ > 0, ∀γ ∈ Γ.

Several of the derivatives in Assumption S3*(ii) are constants in many examples, which makes the moment condition in Assumption S3*(ii) less restrictive than it may appear. For example, when a(X_i, β) = β, a_β(X_i, β) = 1 and a_ββ(X_i, β) = 0.

Lemma 3.1. Suppose ρ(w, θ) belongs to the class in (3.7), where a(x, β) ∈ R and h(x, π) ∈ R are twice differentiable wrt β and π, respectively, and no parameter ζ appears. Then, ε(w, θ) takes the form in (3.18) and Assumption S3(i) is implied by Assumption S3* in both the scalar and vector β cases.

Comment. When ρ(w, θ) belongs to the class in (3.7) and a parameter ζ appears, the form of ε(w, θ) is the same as in (3.18) but with zeros in the rows and columns that correspond to ζ. In this case, Assumption S3(i) is still implied by Assumption S3* provided ρ*'(·) and ρ*''(·) in Assumption S3* are adjusted to include ζ, evaluated at ζ₀. See Appendix B for details.

3.1.5. Assumption S4

Next, we state an assumption that controls how the mean E_{γ*} ρ_ψ(W_i, θ) changes as the true parameter γ* changes, where γ* = (β*, ζ*, π*, φ*). Define the d_ψ × d_β matrix of partial derivatives of the average population moment function wrt the true β value, β*, to be

K(θ; γ*) = (∂/∂β*') E_{γ*} ρ_ψ(W_i, θ).   (3.19)

The domain of the function K(θ; γ*) is Θ_δ × Γ*, where Θ_δ = {θ ∈ Θ : ||β|| < δ} and Γ* = {γ*_a = (aβ, ζ, π, φ) : γ = (β, ζ, π, φ) ∈ Γ with ||β|| < δ and a ∈ [0, 1]} for some δ > 0. 7

Assumption S4. (i) K(θ; γ*) exists ∀(θ, γ*) ∈ Θ_δ × Γ*. (ii) K(θ; γ*) is continuous in (θ, γ*) at (θ, γ*) = ((ψ₀, π), γ₀) uniformly over π ∈ Π, ∀γ₀ ∈ Γ with β₀ = 0, where ψ₀ is a sub-vector of γ₀.

Assumption S4 is not restrictive in most applications. 8 For simplicity, K((ψ₀, π); γ₀) is abbreviated as K(π; γ₀).

Example (cont.). It is shown in Supplemental Appendix D that Assumption S4 holds with

K(π; γ₀) = K((ψ₀, π); γ₀) = −E_{γ₀}[ (L'_i(γ₀))² / (L_i(γ₀)(1 − L_i(γ₀))) · h(X_i, π₀) d_{ψ,i}(π) ]   (3.30)

for ψ₀ = (0, ζ₀).

7 The constant δ > 0 is as in Assumption B2(iii) stated below. The set Γ* is not empty by Assumption B2(ii).
8 Assumptions S1 and S4 imply Assumption C5 of AC. A set of primitive sufficient conditions for Assumption C5 of AC is given in Appendix A of AC-SM. These conditions also are sufficient for Assumption S4.

3.2. Parameter Space Assumptions

Next, we specify conditions on the parameter spaces Θ and Γ. Define Θ*_δ = {θ ∈ Θ* : ||β|| < δ}, where Θ* is the true parameter space for θ; see (2.6). The optimization parameter space Θ satisfies:

Assumption B1. (i) int(Θ) ⊇ Θ*. (ii) For some δ > 0, {β ∈ R^{d_β} : ||β|| < δ} × Z × Π ⊆ Θ for some non-empty open set Z ⊆ R^{d_ζ} and Π as in (2.8). (iii) Θ is compact.

Because the optimization parameter space Θ is user selected, Assumptions B1(ii)–(iii) can be made to hold by the choice of Θ. The true parameter space Γ satisfies:

Assumption B2. (i) Γ is compact and (2.6) holds. (ii) ∀δ > 0, ∃γ = (β, ζ, π, φ) ∈ Γ with 0 < ||β|| < δ. (iii) ∀γ = (β, ζ, π, φ) ∈ Γ with 0 < ||β|| < δ for some δ > 0, γ_a = (aβ, ζ, π, φ) ∈ Γ ∀a ∈ [0, 1].

Assumption B2(ii) guarantees that Γ* is not empty and that there are elements γ of Γ whose β values are non-zero but are arbitrarily close to 0, which is the region of the true parameter space where near lack of identification occurs. Assumption B2(iii) ensures that Γ is compatible with the existence of partial derivatives of certain expectations wrt the true parameter β* around β* = 0, which arise in (3.19) and Assumption S4.

Example (cont.). Let γ = (θ, φ), where φ is the distribution of (X_i, Z_i), and φ ∈ Φ, where Φ is a compact metric space with some metric that induces weak convergence. The parameter space for the true value of γ is

Γ = {γ = (θ, φ) : θ ∈ Θ*, φ ∈ Φ(θ)},   (3.31)

where Φ(θ) ⊆ Φ ∀θ ∈ Θ*. The parameter space Φ(θ), which must be specified precisely to obtain the uniform asymptotic results, is defined as follows. For notational simplicity, let h̄_i = sup_{π∈Π} |h(X_i, π)|, h̄_{π,i} = sup_{π∈Π} ||h_π(X_i, π)||, h̄_{ππ,i} = sup_{π∈Π} ||h_ππ(X_i, π)||, w̄₁,i = sup_{θ∈Θ} |w₁,i(θ)|, and w̄₂,i = sup_{θ∈Θ} |w₂,i(θ)|. Let q = 2 + δ for some δ > 0.

For any θ ∈ Θ*, the true parameter space for φ is

Φ(θ) = {φ : E_φ(h̄_i^q + h̄_{π,i}^q + h̄_{ππ,i}^q + ||Z_i||^q + w̄₁,i^q + w̄₂,i^{1+δ}) ≤ C,
  ||w₁,i(θ₁) − w₁,i(θ₂)|| ≤ M₁(W_i)||θ₁ − θ₂||, ||w₂,i(θ₁) − w₂,i(θ₂)|| ≤ M₂(W_i)||θ₁ − θ₂||,
  ||h_π(X_i, π₁) − h_π(X_i, π₂)|| ≤ M_h(W_i)||π₁ − π₂||, ∀θ₁, θ₂ ∈ Θ,
  for some functions M₁(W_i), M₂(W_i), M_h(W_i) with E_φ(M₁(W_i)^{q/3} + M₂(W_i)^{q/3} + M_h(W_i)^{q/3}) ≤ C,
  E_φ sup_{θ∈Θ}(|log L_i(θ)|^{1+δ} + |log(1 − L_i(θ))|^{1+δ}) ≤ C,
  P_φ(a'(h(X_i, π₁), h(X_i, π₂), Z_i')' = 0) < 1, ∀π₁, π₂ ∈ Π with π₁ ≠ π₂, ∀a ∈ R^{d_ζ+2} with a ≠ 0,
  E_φ d_i(θ) d_i(θ)' is positive definite ∀θ ∈ Θ}   (3.32)

for some C < ∞, where d_i(θ) = (h(X_i, π), Z_i', h_π(X_i, π)')'. 9

3.3. Key Quantities

Now, we define some of the key quantities that arise in the asymptotic distribution of the estimator θ̂_n and the test statistics considered. Let S_ψ = [I_{d_ψ} : 0_{d_ψ×d_π}] denote the d_ψ × d_θ selector matrix that selects ψ out of θ. Define

Ω(π₁, π₂; γ₀) = S_ψ V†((ψ₀, π₁), (ψ₀, π₂); γ₀) S_ψ', H(π; γ₀) = E_{γ₀} ρ_ψψ(W_i, ψ₀, π),
J(γ₀) = E_{γ₀} ρ†_θθ(W_i, θ₀), and V(γ₀) = V†(θ₀, θ₀; γ₀).   (3.33)

Example (cont.). The key quantities that determine the asymptotic behavior of the ML estimator in the binary choice model are as follows. The probability limit of the criterion function Q_n(θ) when the true value is γ₀ is

Q(θ; γ₀) = E_{γ₀} ρ(W_i, θ) = E_{γ₀} E_{γ₀}(ρ(W_i, θ)|X_i, Z_i)
= −E_{γ₀}[L_i(γ₀) log L_i(θ) + (1 − L_i(γ₀)) log(1 − L_i(θ))].   (3.34)

9 In (3.32), the expectation E_φ(·) only depends on φ. Because γ shows up in some other expectations, we use E_γ(·) throughout the example for notational consistency.

By calculations given in Section 3. of Supplemental Appendix D, we have

Ω(π₁, π₂; γ₀) = E_{γ₀}[ (L'_i(γ₀))² / (L_i(γ₀)(1 − L_i(γ₀))) · d_{ψ,i}(π₁) d_{ψ,i}(π₂)' ],
H(π; γ₀) = E_{γ₀}[ (L'_i(γ₀))² / (L_i(γ₀)(1 − L_i(γ₀))) · d_{ψ,i}(π) d_{ψ,i}(π)' ], and
J(γ₀) = V(γ₀) = E_{γ₀}[ (L'_i(γ₀))² / (L_i(γ₀)(1 − L_i(γ₀))) · d_i(θ₀) d_i(θ₀)' ].   (3.35)

3.4. Quadratic Approximations

Here we specify certain quadratic approximations to Q_n(θ) and related results that hold under Assumptions S1–S4, B1, and B2. These results help to explain the form of the asymptotic distributions that arise in the results stated below. 10

(i) Under {γ_n} ∈ Γ(γ₀, 0, b) (defined in (2.5) above), the sample criterion function Q_n(θ) (= Q_n(ψ, π)) has a quadratic expansion in ψ around the point ψ_{0,n} = (0, ζ_n) for given π ∈ Π of the form:

Q_n(ψ, π) = Q_n(ψ_{0,n}, π) + D_ψ Q_n(ψ_{0,n}, π)'(ψ − ψ_{0,n})
+ (1/2)(ψ − ψ_{0,n})' D_ψψ Q_n(ψ_{0,n}, π)(ψ − ψ_{0,n}) + R_n(ψ, π),   (3.36)

where D_ψ Q_n(ψ_{0,n}, π) and D_ψψ Q_n(ψ_{0,n}, π) denote the vector and matrix of first and second partial derivatives of Q_n(ψ, π) with respect to ψ, respectively, evaluated at ψ = ψ_{0,n}, and R_n(ψ, π) is a remainder term that is small uniformly in π for ψ close to ψ_{0,n}.

(ii) Under {γ_n} ∈ Γ(γ₀, ∞, ω₀), the sample criterion function Q_n(θ) has a quadratic expansion in θ around the true value θ_n of the form:

Q_n(θ) = Q_n(θ_n) + DQ_n(θ_n)'(θ − θ_n) + (1/2)(θ − θ_n)' D²Q_n(θ_n)(θ − θ_n) + R*_n(θ),   (3.37)

where DQ_n(θ_n) and D²Q_n(θ_n) denote the vector and matrix of first and second partial derivatives of Q_n(θ) with respect to θ, respectively, evaluated at θ = θ_n, and R*_n(θ) is a

10 The precise conditions that the remainder R_n(ψ, π) satisfies are specified in Assumption C1 of AC. The quadratic approximation result (i) and results (ii)–(iv) that follow are established in the proof of Theorem 4.1 given in Appendix A.

remainder term that is small for θ close to θ_n. 11

(iii) Under {γ_n} ∈ Γ(γ₀, 0, b), the recentered and rescaled first derivative of Q_n(ψ, π) wrt ψ satisfies an empirical process CLT: G_n(·) ⇒ G(·; γ₀), where

G_n(π) = n^{−1/2} Σ_{i=1}^{n} [ρ_ψ(W_i, ψ_{0,n}, π) − E_{γ_n} ρ_ψ(W_i, ψ_{0,n}, π)]   (3.38)

and G(·; γ₀) is a mean zero Gaussian process indexed by π ∈ Π with bounded continuous sample paths and covariance kernel Ω(π₁, π₂; γ₀) for π₁, π₂ ∈ Π.

(iv) Under {γ_n} ∈ Γ(γ₀, ∞, ω₀), the rescaled first and second derivatives of Q_n(θ) satisfy

n^{1/2} B^{−1}(β_n) DQ_n(θ_n) →_d G*(γ₀) ~ N(0_{d_θ}, V(γ₀))   (3.39)

and

J_n = B^{−1}(β_n) D²Q_n(θ_n) B^{−1}(β_n) →_p J(γ₀) ∈ R^{d_θ×d_θ} ∀γ₀ ∈ Γ.   (3.40)

3.5. Assumptions C6 and C7

In this section, we state assumptions that concern the minimum of the limit of the normalized criterion function after ψ has been concentrated out. 12 Define a weighted non-central chi-square process {ξ(π; γ₀, b) : π ∈ Π} and a non-stochastic function {η(π; γ₀, ω₀) : π ∈ Π} by

ξ(π; γ₀, b) = −(1/2)(G(π; γ₀) + K(π; γ₀)b)' H^{−1}(π; γ₀)(G(π; γ₀) + K(π; γ₀)b) and
η(π; γ₀, ω₀) = −(1/2) ω₀' K(π; γ₀)' H^{−1}(π; γ₀) K(π; γ₀) ω₀.   (3.41)

The process ξ(·; γ₀, b) is the limit under {γ_n} ∈ Γ(γ₀, 0, b) for ||b|| < ∞, defined in (2.5), and the function η(·; γ₀, ω₀) is the limit under {γ_n} ∈ Γ(γ₀, 0, b) for ||b|| = ∞. Under Assumptions S1–S4, {ξ(π; γ₀, b) : π ∈ Π} has bounded continuous sample paths a.s.

To obtain the asymptotic distribution of π̂_n when β_n = O(n^{−1/2}) via the continuous mapping theorem, we use the following assumption.

11 The precise conditions that the remainder R*_n(θ) satisfies are specified in Assumption D1 of AC.
12 Assumptions C6 and C7 are the same as in AC, which is why the numbering starts at C6, rather than C1.
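The expansions above are ordinary second-order Taylor expansions of the sample criterion function, with remainders controlled uniformly. A hedged numerical sketch (the finite-difference derivatives and the test criterion below are our own illustrative choices, not the paper's construction) of the θ-expansion:

```python
import numpy as np

def quad_approx(Q, theta0, theta, eps=1e-4):
    # second-order Taylor expansion of Q around theta0, in the spirit of (3.37):
    # Q(theta0) + DQ'(theta - theta0) + (1/2)(theta - theta0)' D2Q (theta - theta0),
    # with DQ and D2Q computed by central finite differences
    theta0 = np.asarray(theta0, dtype=float)
    theta = np.asarray(theta, dtype=float)
    d = theta0.size
    grad = np.zeros(d)
    hess = np.zeros((d, d))
    for j in range(d):
        e_j = np.eye(d)[j] * eps
        grad[j] = (Q(theta0 + e_j) - Q(theta0 - e_j)) / (2.0 * eps)
        for k in range(d):
            e_k = np.eye(d)[k] * eps
            hess[j, k] = (Q(theta0 + e_j + e_k) - Q(theta0 + e_j - e_k)
                          - Q(theta0 - e_j + e_k) + Q(theta0 - e_j - e_k)) / (4.0 * eps**2)
    dt = theta - theta0
    return Q(theta0) + grad @ dt + 0.5 * dt @ hess @ dt
```

For a smooth criterion, the gap between `Q(theta)` and `quad_approx(...)` is the analogue of the remainder R*_n(θ); for an exactly quadratic criterion it is zero up to finite-difference error.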

Assumption C6. Each sample path of the stochastic process {ξ(π; γ₀, b) : π ∈ Π} in some set A(γ₀, b) with P(A(γ₀, b)) = 1 is minimized over Π at a unique point (which may depend on the sample path), denoted π*(γ₀, b), ∀γ₀ ∈ Γ with β₀ = 0, ∀b with ||b|| < ∞.

In Assumption C6, π*(γ₀, b) is random. Next, we give a primitive sufficient condition for Assumption C6 for the case where π is a scalar parameter. Let ρ_ψ(w, θ) = (ρ_β(w, θ)', ρ_ζ(w, θ)')'. When β = 0, ρ_ζ(w, θ) does not depend on π by Assumption S2(ii) and is denoted by ρ_ζ(w, ψ). When d_π = 1 and β = 0, define

ρ_G(W_i, π₁, π₂; ψ) = (ρ_β(W_i, ψ, π₁), ρ_β(W_i, ψ, π₂), ρ_ζ(W_i, ψ)')' and
Ω_G(π₁, π₂; γ₀) = Σ_{m=−∞}^{∞} Cov_{γ₀}(ρ_G(W_i, π₁, π₂; ψ₀), ρ_G(W_{i+m}, π₁, π₂; ψ₀)).   (3.42)

Assumption C6†. (i) d_π = 1 (i.e., π is a scalar). (ii) Ω_G(π₁, π₂; γ₀) is positive definite ∀π₁, π₂ ∈ Π with π₁ ≠ π₂, ∀γ₀ ∈ Γ with β₀ = 0.

Lemma 3.2. Assumptions S1–S3 and C6† imply Assumption C6.

Example (cont.). For this example, Assumption C6† is verified in Supplemental Appendix D with the covariance matrix in Assumption C6†(ii) equal to

Ω_G(π₁, π₂; γ₀) = E_{γ₀}[ (L'(Z_i'ζ₀))² / (L(Z_i'ζ₀)(1 − L(Z_i'ζ₀))) · h_{Z,i}(π₁, π₂) h_{Z,i}(π₁, π₂)' ],
where h_{Z,i}(π₁, π₂) = (h(X_i, π₁), h(X_i, π₂), Z_i')'.   (3.43)

The following assumption is used in the proof of consistency of π̂_n for the case where the true parameter β_n satisfies β_n → 0 and n^{1/2}||β_n|| → ∞.

Assumption C7. The non-stochastic function η(π; γ₀, ω₀) is uniquely minimized over π ∈ Π at π₀, ∀γ₀ ∈ Γ with β₀ = 0.

In Assumption C7, π₀ is non-random. Assumption C7 can be verified using the Cauchy–Schwarz inequality or a matrix version of it, see Tripathi (1999), when K(π; γ₀) and H(π; γ₀) take proper forms, as in our examples.
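Assumption C6 concerns the minimizer of the weighted non-central chi-square process defined above. A small simulation sketch of computing such a minimizer on a grid of π values; every ingredient below (the forms of K and H, the covariance kernel of G, and the value of b) is a hypothetical choice of ours for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# grid of pi values and hypothetical model ingredients (scalar psi case)
pis = np.linspace(-1.0, 1.0, 41)
K = np.cos(pis)                 # stand-in for K(pi; gamma_0)
H = 1.0 + 0.5 * pis**2          # stand-in for H(pi; gamma_0) > 0
b = 2.0                         # localization parameter with ||b|| < infinity

# one draw of a mean-zero Gaussian process G(pi) with a smooth covariance kernel
cov = np.exp(-0.5 * (pis[:, None] - pis[None, :])**2 / 0.3**2)
G = rng.multivariate_normal(np.zeros(pis.size), cov)

# xi(pi; gamma_0, b) = -(1/2)(G + K b)' H^{-1} (G + K b), specialized to scalars
xi = -0.5 * (G + K * b)**2 / H

pi_star = pis[np.argmin(xi)]    # sample-path minimizer, the analogue of pi*(gamma_0, b)
```

With smooth sample paths, the grid minimizer approximates the (almost surely unique, under Assumption C6) minimizer π*(γ₀, b) as the grid is refined.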

Example (cont.). Assumption C7 is verified in this example as follows. By (3.35), when β₀ = 0,

H(π; γ₀) = E_{γ₀}[ (L'(Z_i'ζ₀))² / (L(Z_i'ζ₀)(1 − L(Z_i'ζ₀))) · d_{ψ,i}(π) d_{ψ,i}(π)' ].   (3.44)

By (3.30), when β₀ = 0,

K(π; γ₀) = −E_{γ₀}[ (L'(Z_i'ζ₀))² / (L(Z_i'ζ₀)(1 − L(Z_i'ζ₀))) · h(X_i, π₀) d_{ψ,i}(π) ].   (3.45)

Hence, when β₀ = 0,

K(π; γ₀)' H^{−1}(π; γ₀) K(π; γ₀) ≤ E_{γ₀}[ (L'(Z_i'ζ₀))² / (L(Z_i'ζ₀)(1 − L(Z_i'ζ₀))) · h²(X_i, π₀) ]   (3.46)

by the matrix Cauchy–Schwarz inequality in Tripathi (1999). The inequality ≤ holds as an equality if and only if h(X_i, π₀)a + d_{ψ,i}(π)'b = 0 with probability 1 for some a ∈ R and b ∈ R^{d_ζ+1} with (a, b')' ≠ 0. The inequality holds as an equality uniquely at π = π₀ because for any π ≠ π₀, P_{γ₀}(c'(h(X_i, π), h(X_i, π₀), Z_i')' = 0) < 1 for any c ≠ 0 by (3.32).

4. Estimation Results

This section provides the asymptotic results of the paper for the extremum estimator θ̂_n. The results are given under the drifting sequences of distributions defined in Section 2.3. Define a concentrated extremum estimator ψ̂_n(π) (∈ Ψ(π)) of ψ for given π ∈ Π by

Q_n(ψ̂_n(π), π) = inf_{ψ∈Ψ(π)} Q_n(ψ, π) + o(n^{−1}).   (4.1)

Let Q^c_n(π) denote the concentrated sample criterion function Q_n(ψ̂_n(π), π). Define an extremum estimator π̂_n (∈ Π) by

Q^c_n(π̂_n) = inf_{π∈Π} Q^c_n(π) + o(n^{−1}).   (4.2)

We assume that the extremum estimator θ̂_n in (2.7) can be written as θ̂_n = (ψ̂_n(π̂_n)', π̂_n')'. Note that if (4.1) and (4.2) hold and θ̂_n = (ψ̂_n(π̂_n)', π̂_n')', then (2.7) automatically holds.
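The two-step definition of the extremum estimator above (concentrate ψ out for each π, then minimize the concentrated criterion over π) can be sketched with a grid search standing in for the o(n^{−1})-accurate optimizers; the toy criterion in the test below is hypothetical:

```python
import numpy as np

def concentrated_estimator(Q, psi_grid, pi_grid):
    # psi_hat(pi) approximately solves inf over psi of Q(psi, pi) for each pi,
    # as in (4.1); pi_hat minimizes the concentrated criterion Q^c(pi), as in (4.2);
    # grid search stands in for a numerical optimizer
    Qc, psi_hat = [], []
    for pi in pi_grid:
        vals = [Q(psi, pi) for psi in psi_grid]
        j = int(np.argmin(vals))
        psi_hat.append(psi_grid[j])
        Qc.append(vals[j])
    k = int(np.argmin(Qc))
    return psi_hat[k], pi_grid[k]   # theta_hat = (psi_hat(pi_hat), pi_hat)
```

This concentration (profiling) structure is what makes the limit process ξ(π; γ₀, b), indexed by π alone, the relevant object for the asymptotics of π̂_n.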

For γ_n = (β_n, ζ_n, π_n, φ_n) ∈ Γ, let Q_{0,n} = Q_n(ψ_{0,n}, π), where ψ_{0,n} = (0, ζ_n). Note that Q_{0,n} does not depend on π by Assumption S2(ii). Define the Gaussian process {τ(π; γ₀, b) : π ∈ Π} by

τ(π; γ₀, b) = −H^{−1}(π; γ₀)(G(π; γ₀) + K(π; γ₀)b) − (b', 0'_{d_ζ})',   (4.3)

where (b', 0'_{d_ζ})' ∈ R^{d_ψ}. Note that, by (3.41) and (4.3),

ξ(π; γ₀, b) = −(1/2)(τ(π; γ₀, b) + (b', 0'_{d_ζ})')' H(π; γ₀)(τ(π; γ₀, b) + (b', 0'_{d_ζ})').

Let

π*(γ₀, b) = argmin_{π∈Π} ξ(π; γ₀, b).   (4.4)

Theorem 4.1. Suppose Assumptions S1–S4, B1, B2, and C6 hold. Under {γ_n} ∈ Γ(γ₀, 0, b) with ||b|| < ∞,
(a) ( n^{1/2}(ψ̂_n − ψ_n), π̂_n ) →_d ( τ(π*(γ₀, b); γ₀, b), π*(γ₀, b) ), and
(b) n(Q_n(θ̂_n) − Q_{0,n}) →_d inf_{π∈Π} ξ(π; γ₀, b).

Comments. 1. The results of Theorem 4.1 and Theorem 4.2 below are like those of Theorems 5.1 and 5.2 of AC. However, Theorems 4.1 and 4.2 are obtained under assumptions that are much more primitive and easier to verify, though less general, than the results in AC. In particular, Assumptions S1–S4 impose conditions for fixed parameters, not conditions on the behavior of random variables under sequences of parameters. In addition, explicit formulae for the components of the asymptotic results are provided here based on the sample average form of Q_n(θ) that is considered.

2. Define the Gaussian process {τ_β(π; γ₀, b) : π ∈ Π} by

τ_β(π; γ₀, b) = S_β τ(π; γ₀, b) + b,   (4.5)

where S_β = [I_{d_β} : 0_{d_β×d_ζ}] is the d_β × d_ψ selector matrix that selects β out of ψ. The asymptotic distribution of n^{1/2}β̂_n (without centering at β_n) under Γ(γ₀, 0, b) with ||b|| < ∞ is given by τ_β(π*(γ₀, b); γ₀, b).

3. Assumption C6 is not needed for Theorem 4.1(b).

Let

G*(γ₀) ~ N(0_{d_θ}, V(γ₀)).   (4.6)

Theorem.. Suppose Assumptions S-S3, B, B, and C7 hold. Under f n g ( ; ;! ); (a) n = B( n )( b n n )! d J ( )G ( ) N( d ; J ( )V ( )J ( )); and (b) n(q n ( b n ) Q n ( n ))! d G ( ) J ( )G ( ): 5. QLR Con dence Sets In this section, we consider CS s based on the quasi-likelihood ratio (QLR) statistic. We establish (i) the asymptotic distribution of the QLR statistic under the drifting sequences of distributions de ned in Section.3, (ii) the asymptotic size of standard QLR CS s, which often are size-distorted, and (iii) the correct asymptotic size of robust QLR CS s, which are designed to be robust to the strength of identi cation. The proofs of the results given here rely on results given in Appendix A and AC. 5.. De nition of the QLR Test Statistic We consider CS s for a function r() ( R dr ) of obtained by inverting QLR tests. The function r() is assumed to be smooth and to be of the form r() = " r ( ) r () # ; (5.7) where r ( ) R dr ; dr is the number of restrictions on ; r () R dr ; dr is the number of restrictions on ; and d r = d r + d r : For v r(); we de ne a restricted estimator e n (v) of subject to the restriction that r() = v: By de nition, e n (v) ; r( e n (v)) = v; and Q n ( e n (v)) = The QLR test statistic for testing H : r() = v is inf Q n() + o(n ): (5.8) :r()=v QLR n (v) = n(q n ( e n (v)) Q n ( b n ))=bs n ; (5.9) where bs n is a random real-valued scaling factor that is employed in some cases to yield a QLR statistic that has an asymptotic d r null distribution under strong identi cation. 6

See Assumptions RQ2 and RQ3 below.

Let c_{n,1−α}(v) denote a nominal level 1 − α critical value to be used with the QLR test statistic. It may be stochastic or non-stochastic. The usual choice, based on the asymptotic distribution of the QLR statistic under standard regularity conditions, is the 1 − α quantile of the χ²_{d_r} distribution, which we denote by χ²_{d_r,1−α}. Given a critical value c_{n,1−α}(v), the nominal level 1 − α QLR CS for r(θ) is

CS^{QLR}_{r,n} = {v ∈ r(Θ) : QLR_n(v) ≤ c_{n,1−α}(v)}.   (5.10)

5.2. QLR Assumptions

If r(θ) includes restrictions on π, i.e., d_{r2} > 0, then not all π values are consistent with the restriction r₂(π) = v₂. For v₂ ∈ r₂(Π), the set of π values that are consistent with r₂(π) = v₂ is denoted by

Π_r(v₂) = {π : r₂(π) = v₂ for some θ = (ψ, π) ∈ Θ}.   (5.11)

If d_{r2} = 0, then by definition Π_r(v₂) = Π ∀v₂ ∈ r₂(Π). We assume r(θ) satisfies:

Assumption RQ1. (i) r(θ) is continuously differentiable on Θ. (ii) r_θ(θ) (= (∂/∂θ')r(θ)) is full row rank d_r ∀θ ∈ Θ. (iii) r(θ) satisfies (5.7). (iv) d_H(Π_r(v₂), Π_r(v₂,₀)) → 0 as v₂ → v₂,₀, ∀v₂, v₂,₀ ∈ r₂(Π). (v) Q(ψ, π; γ₀) is continuous in ψ at ψ₀ uniformly over π ∈ Π (i.e., sup_{π∈Π} |Q(ψ, π; γ₀) − Q(ψ₀, π; γ₀)| → 0 as ψ → ψ₀) ∀γ₀ ∈ Γ with β₀ = 0. (vi) Q(θ; γ₀) is continuous in θ at θ₀ ∀γ₀ ∈ Γ with β₀ ≠ 0.

In Assumption RQ1(iv), d_H denotes the Hausdorff distance. In Assumptions RQ1(v) and (vi), Q(θ; γ₀) = E_{γ₀} ρ(W_i, θ). Assumptions RQ1(i) and RQ1(ii) are standard and are not restrictive. Assumption RQ1(iii) rules out the case where any single restriction depends on both ψ and π. This is restrictive. But, in some cases, a reparametrization can be used to obtain results for such restrictions; see AC for details. Assumption RQ1(iv) is not very restrictive and is easy to verify in most cases. Assumptions RQ1(v) and RQ1(vi) are not restrictive.