Lecture 3: Consistency of Extremum Estimators¹


This lecture shows how one can obtain consistency of extremum estimators. It also shows how one can find the probability limit of extremum estimators in cases where they are not consistent.

The parameter space of interest is $\Theta \subset \mathbb{R}^d$. Extremum estimators (EE) $\{\hat\theta_n : n \geq 1\}$ are defined to be random elements of $\Theta$ that approximately minimize a stochastic criterion function $\hat Q_n(\theta)$. That is, $\hat\theta_n$ is defined to satisfy

Definition EE: $\hat\theta_n \in \Theta$ and $\hat Q_n(\hat\theta_n) \leq \inf_{\theta \in \Theta} \hat Q_n(\theta) + o_p(1)$.

Here are some examples of extremum estimators:

(1) Maximum Likelihood (ML) Estimator: Suppose the data $\{W_i : i \leq n\}$ are iid with density $f(w, \theta)$ with respect to some measure $\mu$ (that does not depend on $\theta$). The likelihood function is $\prod_{i=1}^n f(W_i, \theta)$. The log-likelihood function is $\sum_{i=1}^n \log f(W_i, \theta)$. The ML estimator $\hat\theta_n$ maximizes the likelihood function (or the log-likelihood function) over $\Theta$. Equivalently, the ML estimator $\hat\theta_n$ minimizes, at least up to $o_p(1)$, over $\theta \in \Theta$,

$$\hat Q_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \log f(W_i, \theta). \qquad (1)$$

In most cases in econometrics, the vector $W_i$ can be decomposed into two subvectors: $W_i = (Y_i, X_i)$, where $Y_i$ is a vector of outcome (or response) variables and $X_i$ is a vector of explanatory variables. Suppose the conditional density of $Y_i$ given $X_i = x$ is $f(y \mid x, \theta)$ and the marginal density of $X_i$ is some density $g(x)$ that does not depend on $\theta$. Then $f(w, \theta) = f(y \mid x, \theta) g(x)$ and

$$\hat Q_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \log f(W_i, \theta) = -\frac{1}{n} \sum_{i=1}^n \log f(Y_i \mid X_i, \theta) - \frac{1}{n} \sum_{i=1}^n \log g(X_i). \qquad (2)$$

Since the second term on the right-hand side does not depend on $\theta$, it can be deleted without affecting the definition of the ML estimator. That is, one can define the ML estimator to minimize minus the conditional log-likelihood

$$\hat Q_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \log f(Y_i \mid X_i, \theta), \qquad (3)$$

¹ The note for this lecture is largely adapted from the note of Donald Andrews on the same topic. I am grateful for Professor Andrews' generosity and elegant exposition. All errors are mine.

Xiaoxia Shi Page: 1

which does not require that one specifies the marginal distribution of the explanatory variables $X_i$.

In the case of dependent data, we write the likelihood function as the product of conditional densities. Suppose $W_i = (Y_i, X_i)$, where $Y_i$ and $X_i$ are as above. Let $f(y_i \mid x_i, x_{i-1}, \ldots, x_1, y_{i-1}, y_{i-2}, \ldots, y_1, \theta)$ denote the conditional density of $Y_i$ given $X_i = x_i, X_{i-1} = x_{i-1}, \ldots, X_1 = x_1$ and $Y_{i-1} = y_{i-1}, Y_{i-2} = y_{i-2}, \ldots, Y_1 = y_1$. Let $g(x_i \mid x_{i-1}, \ldots, x_1, y_{i-1}, \ldots, y_1)$ denote the conditional density of $X_i$ given $X_{i-1} = x_{i-1}, \ldots, X_1 = x_1$ and $Y_{i-1} = y_{i-1}, \ldots, Y_1 = y_1$. If $g(\cdot \mid \cdot)$ does not depend on $\theta$, then the explanatory variables $\{X_1, \ldots, X_n\}$ are said to be weakly exogenous. If $g(\cdot \mid \cdot)$ does not depend on $\theta$ or on $y_{i-1}, \ldots, y_1$, then the explanatory variables are said to be strongly exogenous.

Suppose the explanatory variables are weakly or strongly exogenous. Then, the likelihood function can be written as

$$g(X_1) f(Y_1 \mid X_1, \theta) \cdot g(X_2 \mid X_1, Y_1) f(Y_2 \mid X_2, X_1, Y_1, \theta) \cdots g(X_n \mid X_{n-1}, \ldots, X_1, Y_{n-1}, \ldots, Y_1) f(Y_n \mid X_n, \ldots, X_1, Y_{n-1}, \ldots, Y_1, \theta)$$
$$= \prod_{i=1}^n f(Y_i \mid X_i, \ldots, X_1, Y_{i-1}, \ldots, Y_1, \theta) \, g(X_i \mid X_{i-1}, \ldots, X_1, Y_{i-1}, \ldots, Y_1). \qquad (4)$$

The log-likelihood function is

$$\sum_{i=1}^n \log f(Y_i \mid X_i, \ldots, X_1, Y_{i-1}, \ldots, Y_1, \theta) + \sum_{i=1}^n \log g(X_i \mid X_{i-1}, \ldots, X_1, Y_{i-1}, \ldots, Y_1). \qquad (5)$$

In this case, the ML estimator minimizes

$$\hat Q_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \log f(Y_i \mid X_i, \ldots, X_1, Y_{i-1}, \ldots, Y_1, \theta). \qquad (6)$$

If $\{Y_i : i \geq 1\}$ is Markov of order $\kappa$, then $f(y_i \mid x_i, \ldots, x_1, y_{i-1}, \ldots, y_1, \theta) = f(y_i \mid x_i, \ldots, x_{i-\kappa}, y_{i-1}, \ldots, y_{i-\kappa}, \theta)$ and we take

$$\hat Q_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \log f(Y_i \mid X_i, \ldots, X_{i-\kappa}, Y_{i-1}, \ldots, Y_{i-\kappa}, \theta). \qquad (7)$$

(2) Least Squares (LS) Estimator for Nonlinear Regression: Suppose the data $W_i = (Y_i, X_i)$ for $i \leq n$ are iid and satisfy the nonlinear regression model

$$Y_i = g(X_i, \theta_0) + \varepsilon_i, \qquad (8)$$

where $E(\varepsilon_i \mid X_i) = 0$ a.s. The LS estimator $\hat\theta_n$ minimizes

$$\hat Q_n(\theta) = \frac{1}{n} \sum_{i=1}^n (Y_i - g(X_i, \theta))^2 / 2 \qquad (9)$$

over $\theta \in \Theta$. (The scale factor $1/2$ is used because it is convenient for the asymptotic normality results given below. It has no effect on the consistency results.)

(3) Generalized Method of Moments (GMM) Estimator: Let $\theta_0$ denote the true value. Suppose the data $\{W_i : i \leq n\}$ are iid and the following moment conditions hold:

$$E g(W_i, \theta) \begin{cases} = 0 & \text{if } \theta = \theta_0 \\ \neq 0 & \text{if } \theta \neq \theta_0, \end{cases} \qquad (10)$$

where $g(w, \theta) \in \mathbb{R}^k$ for some $k \geq d$. Let $A_n$ be a $k \times k$ random (weight) matrix. Then, the GMM estimator $\hat\theta_n$ minimizes

$$\hat Q_n(\theta) = \Bigl\| A_n \frac{1}{n} \sum_{i=1}^n g(W_i, \theta) \Bigr\|^2 / 2 \qquad (11)$$

over $\theta \in \Theta$, where $\|\cdot\|$ is the Euclidean norm.

(4) Minimum Distance (MD) Estimator: Let $\hat\pi_n$ be a consistent unrestricted estimator of a $k$-vector parameter $\pi_0$. Suppose $\pi_0$ is known to be a function of a $d$-vector parameter $\theta_0$, where $d \leq k$:

$$\pi_0 = g(\theta_0). \qquad (12)$$

Let $A_n$ be a $k \times k$ random weight matrix. Then, the MD estimator $\hat\theta_n$ minimizes

$$\hat Q_n(\theta) = \| A_n (\hat\pi_n - g(\theta)) \|^2 / 2 \qquad (13)$$

over $\theta \in \Theta$. For example, in the correlated random effects panel data model, $\hat\pi_n$ is the OLS (or FGLS) estimator of $\pi_0$.²

(5) Two-step (TS) Estimator: Suppose the data $\{W_i : i \leq n\}$ are iid, $\hat\tau_n$ is a preliminary consistent estimator of a parameter $\tau_0$, and $G_n(\theta, \tau)$ is a random $k$-vector that should be close to $0$ when

² (Wooldridge (2002), p. 445) A panel data correlated random effects model is of the form:

$$y_{it} = \psi + x_{it} \beta + c_i + v_{it}, \qquad (14)$$

where $x_{it}$ is a $K$-dimensional row vector, $x_i = (x_{i1}, \ldots, x_{iT})$, $c_i = \sum_{t=1}^T x_{it} \lambda_t$, $E(v_{it}) = 0$ and $E(x_i' v_{it}) = 0$, $t = 1, \ldots, T$. The structural parameter of the model is $\theta = (\psi, \lambda_1, \ldots, \lambda_T, \beta)'$. To apply the MD approach, one first estimates (by OLS)

$$y_{it} = \pi_{t0} + x_i \pi_t + v_{it}, \quad t = 1, \ldots, T. \qquad (15)$$

Let $\pi = (\pi_{10}, \pi_1', \ldots, \pi_{T0}, \pi_T')'$. Then one forms the MD problem using the restriction

$$\pi = H\theta, \qquad (16)$$

$\theta = \theta_0$, $\tau = \tau_0$, and $n$ is large (e.g., in the GMM case, $G_n(\theta, \tau) = \frac{1}{n} \sum_{i=1}^n g(W_i, \theta, \tau)$, and in the MD case, $G_n(\theta, \tau) = \hat\pi(\tau) - g(\theta, \tau)$), and $A_n$ is a $k \times k$ random weight matrix. Then, the TS estimator $\hat\theta_n$ minimizes

$$\hat Q_n(\theta) = \| A_n G_n(\theta, \hat\tau_n) \|^2 / 2 \qquad (17)$$

over $\theta \in \Theta$.

The criterion function $\hat Q_n(\theta)$ is assumed to satisfy:

Assumption U-WCON: $\sup_{\theta \in \Theta} |\hat Q_n(\theta) - Q(\theta)| \to_p 0$ for some nonstochastic function $Q(\theta)$ on $\Theta$.

For a given estimator of interest, one has to verify this condition. What is the function $Q(\theta)$ in our examples? It is given by:

(1) ML Estimator: $Q(\theta) = -E \log f(W_i, \theta)$, where $E$ denotes expectation under the true distribution generating the data.

(2) LS Estimator: $Q(\theta) = E(Y_i - g(X_i, \theta))^2 / 2$.

(3) GMM Estimator: $Q(\theta) = \| A \, E g(W_i, \theta) \|^2 / 2$, where $A_n \to_p A$.

(4) MD Estimator: $Q(\theta) = \| A (\pi_0 - g(\theta)) \|^2 / 2$, where $A_n \to_p A$ and $\hat\pi_n \to_p \pi_0$.

(5) TS Estimator: $Q(\theta) = \| A \, G(\theta, \tau_0) \|^2 / 2$, where $A_n \to_p A$, $\hat\tau_n \to_p \tau_0$, and $G_n(\theta, \tau) \to_p G(\theta, \tau)$.

The next assumption is that of identifiable uniqueness of some non-stochastic element $\theta_0$ of $\Theta$. It is to this parameter value that the estimators $\{\hat\theta_n : n \geq 1\}$ converge as $n \to \infty$. Let $B(\theta, \varepsilon)$ denote an open ball in $\Theta$ of radius $\varepsilon$ centered at $\theta$.

Assumption ID: There exists $\theta_0 \in \Theta$ such that $\forall \varepsilon > 0$, $\inf_{\theta \notin B(\theta_0, \varepsilon)} Q(\theta) > Q(\theta_0)$.

A necessary condition for Assumption ID is that $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$. What values uniquely minimize $Q(\theta)$ in the examples? The answer is as follows:

(Footnote 2, continued) where

$$H = \begin{pmatrix} 1 & 0 & 0 \\ 0 & I_{KT} & E_1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 0 & I_{KT} & E_T \end{pmatrix}, \qquad E_t = e_t \otimes I_K,$$

and $e_t$ is a $T$-vector whose $t$th element is $1$ and whose other elements are $0$. The MD objective function is as defined in (13) with $g(\theta) = H\theta$ and some user-chosen weight matrix $A_n$.

(1) ML Estimator (with iid observations): If the true density $f(w, \theta_0)$ of the data is in the parametric family $\{f(w, \theta) : \theta \in \Theta\}$, then $\theta_0$ minimizes $Q(\theta)$:

$$Q(\theta_0) - Q(\theta) = E_{\theta_0} \log \bigl( f(W_i, \theta) / f(W_i, \theta_0) \bigr) \leq \log E_{\theta_0} \bigl( f(W_i, \theta) / f(W_i, \theta_0) \bigr) = \log \int \frac{f(w, \theta)}{f(w, \theta_0)} f(w, \theta_0) \, d\mu(w) = 0,$$

where the inequality holds by Jensen's inequality because $\log(\cdot)$ is a concave function. The inequality is strict, and so $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$, if and only if $P(f(W_i, \theta) \neq f(W_i, \theta_0)) > 0$ $\forall \theta \in \Theta$ with $\theta \neq \theta_0$.

Suppose the true density $f(\cdot)$ of $W_i$ is not in the parametric family $\{f(\cdot, \theta) : \theta \in \Theta\}$. In this case, we proceed as follows. The Kullback-Leibler Information Criterion (KLIC) between $f(\cdot)$ and $f(\cdot, \theta)$ is defined by

$$\mathrm{KLIC}(f, f(\cdot, \theta)) = E_f \log f(W_i) - E_f \log f(W_i, \theta),$$

where $E_f$ denotes expectation when $W_i$ has density $f$. Note that $\mathrm{KLIC}(f, f(\cdot, \theta)) = E_f \log f(W_i) + Q(\theta)$. In consequence, the ML estimator under misspecification (often called the quasi-ML estimator) converges in probability to the parameter value $\theta_0$ that uniquely minimizes the KLIC between the true density $f$ and the densities in the parametric family $\{f(\cdot, \theta) : \theta \in \Theta\}$ (provided such a unique value exists).

(2) LS Estimator: The true value $\theta_0$ minimizes $Q(\theta)$ if the regression model is correctly specified (i.e., if there is a $\theta_0 \in \Theta$ such that $E(Y_i \mid X_i) = g(X_i, \theta_0)$ a.s.): writing $U_i = \varepsilon_i$,

$$Q(\theta) - Q(\theta_0) = E(U_i + g(X_i, \theta_0) - g(X_i, \theta))^2 / 2 - E U_i^2 / 2 = E(g(X_i, \theta_0) - g(X_i, \theta))^2 / 2 + E U_i (g(X_i, \theta_0) - g(X_i, \theta)) = E(g(X_i, \theta_0) - g(X_i, \theta))^2 / 2 \geq 0,$$

where the cross term vanishes because $E(U_i \mid X_i) = 0$. The inequality is strict, and so $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$, if and only if $P(g(X_i, \theta) \neq g(X_i, \theta_0)) > 0$ $\forall \theta \in \Theta$ with $\theta \neq \theta_0$.
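To make the quasi-ML discussion concrete, here is a minimal numerical sketch. The setup, fitting a $N(\mu, 1)$ family to Exponential(1) data, is my own illustrative choice, not part of the notes: the KLIC-minimizing pseudo-true value is $\mu^* = E_f W_i = 1$, and the minimizer of the sample criterion coincides with the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Quasi-ML sketch (illustrative setup): the working family is N(mu, 1),
# but the data are Exponential(1), so the model is misspecified.
# The KLIC-minimizing pseudo-true value is mu* = E[W_i] = 1.
rng = np.random.default_rng(5)
w = rng.exponential(1.0, size=20_000)

def q_hat(mu):
    # Q_n(mu) = -(1/n) sum_i log phi(W_i - mu), dropping the additive
    # constant log(2*pi)/2, which does not affect the minimizer
    return 0.5 * np.mean((w - mu) ** 2)

res = minimize_scalar(q_hat, bounds=(-5.0, 5.0), method="bounded")
print(res.x, w.mean())  # the quasi-MLE here equals the sample mean
```

With this family, the criterion is quadratic in $\mu$, so the quasi-MLE is exactly the sample mean, which converges to the pseudo-true value $1$ rather than to any "true" normal parameter.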

Suppose the nonlinear regression model is not correctly specified. Let $g(x) = E(Y_i \mid X_i = x)$. Then,

$$Q(\theta) = E(Y_i - g(X_i) + g(X_i) - g(X_i, \theta))^2 / 2 = E(Y_i - g(X_i))^2 / 2 + E(g(X_i) - g(X_i, \theta))^2 / 2$$

(the cross term vanishes because $E(Y_i - g(X_i) \mid X_i) = 0$), and a value $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$ if it uniquely minimizes $E(g(X_i) - g(X_i, \theta))^2$ over $\theta \in \Theta$. That is, the LS estimator converges to the point $\theta_0$ that gives the best mean squared approximation in the family $\{g(\cdot, \theta) : \theta \in \Theta\}$ to the conditional mean of $Y_i$ given $X_i$ (provided the best approximation is unique).

(3) GMM Estimator: If $A$ is nonsingular and there exists a unique value $\theta_0 \in \Theta$ such that $E g(W_i, \theta_0) = 0$, then $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$. Suppose the moment conditions are misspecified and no value $\theta \in \Theta$ is such that $E g(W_i, \theta) = 0$. Then, $\theta_0$ is the value that uniquely minimizes $Q(\theta) = E g(W_i, \theta)' A' A \, E g(W_i, \theta) / 2$, if such a value exists.

(4) MD Estimator: If $A$ is nonsingular and there exists a unique value $\theta_0 \in \Theta$ such that $\pi_0 = g(\theta_0)$, then $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$. Suppose the restrictions are misspecified and no value $\theta \in \Theta$ is such that $\pi_0 = g(\theta)$. Then, $\theta_0$ is the value that uniquely minimizes $Q(\theta) = (\pi_0 - g(\theta))' A' A (\pi_0 - g(\theta)) / 2$, if such a value exists.

(5) TS Estimator: If $A$ is nonsingular and there exists a unique value $\theta_0 \in \Theta$ such that $G(\theta_0, \tau_0) = 0$, then $\theta_0$ uniquely minimizes $Q(\theta)$ over $\Theta$.

The following theorem gives sufficient conditions for $\hat\theta_n \to_p \theta_0$.

Theorem 3.1 (Consistency): EE, ID, & U-WCON $\Rightarrow \hat\theta_n \to_p \theta_0$.

Proof of Theorem 3.1: Given $\varepsilon > 0$, $\exists \delta > 0$ such that $\theta \notin B(\theta_0, \varepsilon) \Rightarrow Q(\theta) - Q(\theta_0) \geq \delta > 0$. Thus,

$$P(\hat\theta_n \notin B(\theta_0, \varepsilon)) \leq P(Q(\hat\theta_n) - Q(\theta_0) \geq \delta) = P(Q(\hat\theta_n) - \hat Q_n(\hat\theta_n) + \hat Q_n(\hat\theta_n) - Q(\theta_0) \geq \delta)$$
$$\leq P(Q(\hat\theta_n) - \hat Q_n(\hat\theta_n) + \hat Q_n(\theta_0) + o_p(1) - Q(\theta_0) \geq \delta) \leq P\bigl( 2 \sup_{\theta \in \Theta} |\hat Q_n(\theta) - Q(\theta)| + o_p(1) \geq \delta \bigr) \to 0. \qquad (18)$$
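As a sanity check on Theorem 3.1, the following small simulation sketch (the Exponential($\theta$) model and all specific numbers are my own illustrative choices) minimizes the ML criterion $\hat Q_n(\theta) = -n^{-1} \sum_{i=1}^n \log f(W_i, \theta)$ over a compact $\Theta$ and shows the minimizer settling near $\theta_0$ as $n$ grows.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulation sketch of Theorem 3.1: for the Exponential(theta) density
# f(w, theta) = theta * exp(-theta * w), minimize the sample criterion
# Q_n(theta) = -(1/n) sum_i log f(W_i, theta) over compact Theta = [0.1, 10].
rng = np.random.default_rng(0)
theta_0 = 2.0

def estimate(n):
    w = rng.exponential(scale=1 / theta_0, size=n)
    q_hat = lambda t: -np.mean(np.log(t) - t * w)  # sample criterion
    res = minimize_scalar(q_hat, bounds=(0.1, 10.0), method="bounded")
    return res.x

errors = {n: abs(estimate(n) - theta_0) for n in (50, 50_000)}
print(errors)  # the n = 50_000 error is close to zero
```

Here U-WCON and ID can be verified analytically (the exponential log-density is continuous in $\theta$ and dominated on the compact $\Theta$), so the theorem applies and the estimation error vanishes in probability as $n \to \infty$.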

Primitive Sufficient Conditions for the Identification Condition

The following assumption is sufficient for Assumption ID.

Assumption ID1: (i) $\Theta$ is compact. (ii) $Q(\theta)$ is continuous on $\Theta$. (iii) $\theta_0$ uniquely minimizes $Q(\theta)$ over $\theta \in \Theta$.

Lemma 3.1: ID1 $\Rightarrow$ ID.

Proof of Lemma 3.1: (Problem Set Question.) Hint: Use a proof by contradiction.

Note that none of the three conditions of Assumption ID1 is redundant: if $\Theta$ is non-compact, $Q(\theta)$ is not continuous, or $Q(\theta)$ is not uniquely minimized at $\theta_0$, then Assumption ID can fail. On the other hand, only Assumption ID1(iii) is a necessary condition for Assumption ID.

When does Assumption ID1 hold in the examples? Obviously ID1(i) holds in each example if $\Theta$ is compact. We have already given conditions under which ID1(iii) holds in each example. To verify ID1(ii) in the examples, the following lemma is useful.

Lemma 3.2: Suppose $m(W_i, \theta)$ is continuous in $\theta$ at each $\theta \in \Theta$ with probability one and $E \sup_{\theta \in \Theta} |m(W_i, \theta)| < \infty$. Then, $E m(W_i, \theta)$ is continuous in $\theta$ on $\Theta$.

For example, when $\hat Q_n(\theta)$ is of the form $\hat Q_n(\theta) = n^{-1} \sum_{i=1}^n m(W_i, \theta)$, then $Q(\theta) = E m(W_i, \theta)$. This occurs in the ML and nonlinear regression examples. In the GMM example, $\hat Q_n(\theta)$ is of the form $\hat Q_n(\theta) = \| A_n n^{-1} \sum_{i=1}^n m(W_i, \theta) \|^2 / 2$ and $Q(\theta) = \| A \, E m(W_i, \theta) \|^2 / 2$. In this case too, continuity of $E m(W_i, \theta)$ yields continuity of $Q(\theta)$.

Proof of Lemma 3.2: The result holds by the dominated convergence theorem with the dominating function given by $\sup_{\theta \in \Theta} |m(W_i, \theta)|$, since

$$\lim_{\theta^* \to \theta} m(W_i, \theta^*) = m(W_i, \theta) \text{ a.s.} \qquad (19)$$

and

$$E \sup_{\theta \in \Theta} |m(W_i, \theta)| < \infty. \qquad (20)$$

Using Lemma 3.2, we see that ID1(ii) holds for the

(1) ML estimator if $f(w, \theta)$ is continuous in $\theta$ at each $\theta \in \Theta$ with probability one and $E \sup_{\theta \in \Theta} |\log f(W_i, \theta)| < \infty$,

(2) LS estimator if $g(x, \theta)$ is continuous in $\theta$ at each $\theta \in \Theta$ with probability one and $E \sup_{\theta \in \Theta} (Y_i - g(X_i, \theta))^2 < \infty$ (or, equivalently, $E \varepsilon_i^2 < \infty$ and $E \sup_{\theta \in \Theta} (g(X_i, \theta_0) - g(X_i, \theta))^2 < \infty$), and

(3) GMM estimator if $g(w, \theta)$ is continuous in $\theta$ at each $\theta \in \Theta$ with probability one and $E \sup_{\theta \in \Theta} \| g(W_i, \theta) \| < \infty$.

Assumption ID1(ii) holds for the MD and TS estimators if $g(\theta)$ and $G(\theta, \tau_0)$, respectively, are continuous in $\theta$ on $\Theta$.

Consistency without Point Identification

In recent years, there has been considerable interest in partially identified models, in which the parameters cannot be pinned down to a unique point even if one had an infinite amount of data. In the context of extremum estimation, partial identification amounts to the failure of Assumption ID. Typically, Assumption ID is replaced by a set identification condition:

Assumption SetID: There exists $\Theta_0 \subset \Theta$ such that $\sup_{\theta \in \Theta_0} Q(\theta) = \inf_{\theta \in \Theta} Q(\theta)$ and $\forall \varepsilon > 0$, $\inf_{\theta \notin B(\Theta_0, \varepsilon)} Q(\theta) > \inf_{\theta \in \Theta} Q(\theta)$, where $B(\Theta_0, \varepsilon)$ is the $\varepsilon$-neighborhood of $\Theta_0$.

The set $\Theta_0$ is called the identified set. Assumption EE is replaced with a set estimator $\hat\Theta_n$. Consistency of the set estimator typically means consistency in the Hausdorff distance, where the Hausdorff distance is a distance measure between two sets:

$$d_H(A, B) = \sup_{a \in A} \inf_{b \in B} \| a - b \| + \sup_{b \in B} \inf_{a \in A} \| a - b \|. \qquad (21)$$

A natural set estimator is the argmin set of $\hat Q_n(\theta)$. However, under Assumptions U-WCON and SetID, this set estimator is not necessarily consistent in Hausdorff distance. It is not difficult to show (problem set question) that it is half-Hausdorff consistent under these two conditions, i.e.,

$$\sup_{\theta \in \hat\Theta_n} \inf_{\theta' \in \Theta_0} \| \theta - \theta' \| \to_p 0. \qquad (22)$$

That is, the argmin set approaches the identified set, but it might not include enough points to approach every point in the identified set. There have been different proposals for solving this problem:

1. Hausdorff consistency of the argmin set can be proved in the special case where $\hat Q_n(\theta)$ is degenerate in the interior of $\Theta_0$ and the closure of the interior equals $\Theta_0$, i.e., $\sup_{\theta \in \Theta_0^\varepsilon} \hat Q_n(\theta) = 0 = \inf_{\theta \in \Theta} \hat Q_n(\theta)$ with probability approaching $1$ for any $\varepsilon > 0$, where $\Theta_0^\varepsilon$ is the set of points that are at least $\varepsilon$ away from the complement of $\Theta_0$ and $\lim_{\varepsilon \to 0} d_H(\Theta_0, \Theta_0^\varepsilon) = 0$. (This is a special case of Chernozhukov, Hong and Tamer (2007), Condition C.3.)

2. Use a bigger set: let $\hat\Theta_n$ be the set of all points such that $\hat Q_n(\theta) \leq \inf_{\theta' \in \Theta} \hat Q_n(\theta') + \tau_n$, where $\tau_n$ is a positive sequence that converges to zero at an appropriate rate. A stronger (convergence rate) assumption is needed to replace U-WCON for such estimators to be consistent. (Chernozhukov, Hong and Tamer (2007, Econometrica), Theorem 3.1)

3. Abandon consistent estimation altogether and focus on confidence sets. The argument for this proposal is that the identified set is not the true parameter and thus is not the parameter of

interest anyway, while a confidence set for the true parameter can be constructed and interpreted in the conventional sense. (Andrews and Soares (2010, Econometrica), Andrews and Guggenberger (2010, Econometrica), etc.)

4. Make assumptions so that certain features of the identified set (like the upper left corner) are point identified, and focus on such features. (Pakes, Porter, Ho and Ishii (2012, Econometrica))

5. Bayesian analysis, where there is no notion of partial identification. (Liao and Jiang (2010, Annals of Statistics))

Below I describe Chernozhukov, Hong and Tamer (2007)'s results regarding solutions 1 and 2. But before doing so, here is a motivating example of a partially identified model.

Example (Interval Outcome in a Regression Model): Suppose that one has a regression model:

$$Y = X' \beta_0 + \varepsilon, \quad E(\varepsilon X) = 0, \qquad (23)$$

where $Y$ is the outcome variable and $X$ is the vector of regressors. Assume that the variables in $X$ are all nonnegative. Suppose that $Y$ is unobserved; only an interval that includes $Y$ is observed: $[Y_l, Y_u]$. No additional information on how $[Y_l, Y_u]$ is related to $Y$ is available. Then the model implies (and is implied by) the following moment inequalities:

$$E[X Y_u - X X' \beta_0] \geq 0, \quad \text{and} \quad E[X X' \beta_0 - X Y_l] \geq 0. \qquad (24)$$

One criterion function used for estimating the identified set of $\beta_0$ is:

$$\hat Q_n(\beta) = \| A_n [\bar m_n(\beta)]_- \|^2, \qquad (25)$$

where $A_n$ is a weighting matrix, $[x]_- = \min(x, 0)$ elementwise, and

$$\bar m_n(\beta) = \Bigl( \Bigl( n^{-1} \sum_{i=1}^n X_i (Y_{u,i} - X_i' \beta) \Bigr)', \; \Bigl( n^{-1} \sum_{i=1}^n X_i (X_i' \beta - Y_{l,i}) \Bigr)' \Bigr)'. \qquad (26)$$

The function $\hat Q_n(\beta)$ converges (under appropriate conditions) uniformly to the function $Q(\beta) = \| A [m(\beta)]_- \|^2$, where

$$m(\beta) = \bigl( (E[X(Y_u - X'\beta)])', \; (E[X(X'\beta - Y_l)])' \bigr)'. \qquad (27)$$

The function $Q(\beta)$ does not have a unique minimum.

Proposition 3.1: Under U-WCON, SetID and the degeneracy condition described in solution 1 above, the argmin set $\hat\Theta_n = \{\theta \in \Theta : \hat Q_n(\theta) = \inf_{\theta' \in \Theta} \hat Q_n(\theta')\}$ is consistent in Hausdorff distance.
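Returning to the interval-outcome example: its criterion (25)-(26) can be sketched numerically. Everything below ($\beta_0 = 1.5$, interval half-width $0.5$, a scalar nonnegative regressor, $A_n = I$) is an illustrative choice of my own.

```python
import numpy as np

# Numerical sketch of the interval-outcome criterion (25)-(26) with A_n = I.
rng = np.random.default_rng(3)
n = 2_000
x = rng.uniform(0.1, 2.0, size=n)            # nonnegative scalar regressor
y = 1.5 * x + rng.normal(0.0, 0.1, size=n)   # latent outcome, beta_0 = 1.5
y_l, y_u = y - 0.5, y + 0.5                  # only [Y_l, Y_u] is observed

def q_hat(beta):
    """Q_n(beta) = ||[m_n(beta)]_-||^2, with [t]_- = min(t, 0) elementwise."""
    m_bar = np.array([np.mean(x * (y_u - x * beta)),   # E[X(Y_u - X'beta)] >= 0
                      np.mean(x * (x * beta - y_l))])  # E[X(X'beta - Y_l)] >= 0
    return np.sum(np.minimum(m_bar, 0.0) ** 2)

# Q_n is exactly zero on a whole interval of beta values around beta_0 and
# strictly positive far away, so the minimizer is a set, not a point.
print(q_hat(1.5), q_hat(0.0), q_hat(3.0))
```

Evaluating the criterion on a grid of $\beta$ values traces out an interval of (near-)zeros, which is the sample analog of the identified set.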

Proof: The Hausdorff consistency is implied by Equation (22) and

$$\sup_{\theta \in \Theta_0} \inf_{\theta' \in \hat\Theta_n} \| \theta - \theta' \| \to_p 0. \qquad (28)$$

The proof of Equation (22) is left as a problem set question. Here we prove (28). Consider the following derivation: for any $\varepsilon > 0$,

$$\Pr\Bigl( \sup_{\theta \in \Theta_0} \inf_{\theta' \in \hat\Theta_n} \| \theta - \theta' \| > \varepsilon \Bigr) \leq \Pr\Bigl( \sup_{\theta \in \Theta_0} \inf_{\theta' \in \Theta_0^\delta} \| \theta - \theta' \| > \varepsilon \Bigr) + \Pr(\Theta_0^\delta \not\subseteq \hat\Theta_n). \qquad (29)$$

For any fixed $\delta > 0$, the second term on the RHS of the above display converges to zero as $n \to \infty$. The first term can be made arbitrarily small by choosing $\delta$ small enough. Therefore the limit of the LHS is zero as $n \to \infty$.

The problem with solution 1 is that the degeneracy condition often is not satisfied, and it is certainly tricky to verify. The second solution needs more assumptions on the convergence of the sample criterion function. The additional assumption is given below as Assumption RT, where RT stands for "rate" (of convergence). Part (a) of this assumption is without loss of generality, because one can always recenter the criterion function by its infimum to make this assumption hold. In part (b), the rate $\sqrt{n}$ can be replaced with any other rate, as long as the choice of $\tau_n$ in the proposition below adjusts with this rate accordingly.

Assumption RT: (a) $\inf_{\theta \in \Theta} \hat Q_n(\theta) \geq \inf_{\theta \in \Theta} Q(\theta)$, and (b) $\sqrt{n} \sup_{\theta \in \Theta_0} |\hat Q_n(\theta) - Q(\theta)| = O_p(1)$.

Proposition 3.2: Under Assumptions U-WCON, RT and SetID, if $\sqrt{n} \tau_n \to \infty$, then the augmented argmin set defined in solution 2 above is consistent in Hausdorff distance.

Proof: As in the previous proposition, we only prove (28). It suffices to show that $\Theta_0 \subseteq \hat\Theta_n$ with probability approaching $1$. Consider the following derivation:

$$\Pr(\Theta_0 \not\subseteq \hat\Theta_n) \leq \Pr\Bigl( \sup_{\theta \in \Theta_0} \hat Q_n(\theta) > \inf_{\theta \in \Theta} \hat Q_n(\theta) + \tau_n \Bigr) \leq \Pr\Bigl( \sup_{\theta \in \Theta_0} \hat Q_n(\theta) - \inf_{\theta \in \Theta} Q(\theta) > \tau_n \Bigr)$$
$$\leq \Pr\Bigl( \sup_{\theta \in \Theta_0} |\hat Q_n(\theta) - Q(\theta)| > \tau_n \Bigr) = \Pr\Bigl( \sqrt{n} \sup_{\theta \in \Theta_0} |\hat Q_n(\theta) - Q(\theta)| > \sqrt{n} \tau_n \Bigr). \qquad (30)$$

The last line of the above display goes to zero as $n \to \infty$ due to Assumption RT(b) and $\sqrt{n} \tau_n \to \infty$.

The assumptions required for solution 2 are innocuous enough, but the solution leaves the choice of $\tau_n$ open. Other than a loose rate requirement, there is no practical way to choose $\tau_n$. As a result, one researcher's set estimator can be ten times bigger than another's, simply due to different choices of $\tau_n$.
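To illustrate solution 2 and the Hausdorff distance (21) together, here is a grid-based sketch. The criterion is a toy of my own construction whose population minimum is attained on the whole set $\Theta_0 = [1, 2]$; none of the specific numbers come from the notes.

```python
import numpy as np

def d_hausdorff(A, B):
    """Hausdorff distance of display (21) between finite sets of points on
    the line: sup_a inf_b |a - b| + sup_b inf_a |a - b|."""
    D = np.abs(A[:, None] - B[None, :])       # pairwise distance matrix
    return D.min(axis=1).max() + D.min(axis=0).max()

# Toy criterion: the population version Q(theta) = max(1-theta,0)^2 +
# max(theta-2,0)^2 is minimized exactly on Theta_0 = [1, 2]; the random
# factor mean(w) makes the sample version Q_n random while keeping
# sup_theta |Q_n - Q| = O_p(n^{-1/2}) on the grid.
rng = np.random.default_rng(4)
n = 10_000
w = rng.exponential(1.0, size=n)              # E[w] = 1
grid = np.linspace(0.0, 3.0, 301)             # discretized Theta = [0, 3]
q_n = np.mean(w) * (np.maximum(1 - grid, 0) ** 2 + np.maximum(grid - 2, 0) ** 2)

tau_n = np.log(n) / np.sqrt(n)                # sqrt(n) * tau_n = log(n) -> infinity
theta_hat_set = grid[q_n <= q_n.min() + tau_n]  # tau_n-enlarged argmin set
theta_0_set = grid[(grid >= 1.0) & (grid <= 2.0)]
print(theta_hat_set.min(), theta_hat_set.max())
print(d_hausdorff(theta_hat_set, theta_0_set))  # shrinks as n grows, since tau_n -> 0
```

The enlarged set brackets $\Theta_0$ with a margin of order $\sqrt{\tau_n}$ for this quadratic criterion, which makes the $\tau_n$-sensitivity discussed above visible: doubling $\tau_n$ visibly widens the estimated set.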