Exclusion Restrictions in Dynamic Binary Choice Panel Data Models

Similar documents
Informational Content in Static and Dynamic Discrete Response Panel Data Models

Estimating Semi-parametric Panel Multinomial Choice Models

Flexible Estimation of Treatment Effect Parameters

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

UNIVERSITY OF CALIFORNIA Spring Economics 241A Econometrics

Comments on: Panel Data Analysis Advantages and Challenges. Manuel Arellano CEMFI, Madrid November 2006

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity

Econometric Analysis of Cross Section and Panel Data

Generated Covariates in Nonparametric Estimation: A Short Review.

Simple Estimators for Monotone Index Models

Rank Estimation of Partially Linear Index Models

Non-linear panel data modeling

IDENTIFICATION OF THE BINARY CHOICE MODEL WITH MISCLASSIFICATION

ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY

Working Paper No Maximum score type estimators

A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators

Simple Estimators for Semiparametric Multinomial Choice Models

A Local Generalized Method of Moments Estimator

Missing dependent variables in panel data models

Identification and Estimation of Marginal Effects in Nonlinear Panel Models 1

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008

Semiparametric Identification in Panel Data Discrete Response Models

What s New in Econometrics? Lecture 14 Quantile Methods

1 Motivation for Instrumental Variable (IV) Regression

1 Estimation of Persistent Dynamic Panel Data. Motivation

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Econ 273B Advanced Econometrics Spring

New Developments in Econometrics Lecture 16: Quantile Estimation

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

An Overview of the Special Regressor Method

Final Exam. Economics 835: Econometrics. Fall 2010

Nonseparable multinomial choice models in cross-section and panel data

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Semiparametric Efficiency in Irregularly Identified Models

IDENTIFICATION OF MARGINAL EFFECTS IN NONSEPARABLE MODELS WITHOUT MONOTONICITY

Identification in Nonparametric Limited Dependent Variable Models with Simultaneity and Unobserved Heterogeneity

ECO Class 6 Nonparametric Econometrics

Statistical Properties of Numerical Derivatives

Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Partial Identification and Inference in Binary Choice and Duration Panel Data Models

On Uniform Inference in Nonlinear Models with Endogeneity

A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator

Identification in Discrete Choice Models with Fixed Effects

Estimation of Treatment Effects under Essential Heterogeneity

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Identification of Regression Models with Misclassified and Endogenous Binary Regressor

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS

A Bootstrap Test for Conditional Symmetry

Dummy Endogenous Variables in Weakly Separable Models

Bias Corrections for Two-Step Fixed Effects Panel Data Estimators

Specification Test for Instrumental Variables Regression with Many Instruments

The properties of L p -GMM estimators

On IV estimation of the dynamic binary panel data model with fixed effects

Optimal bandwidth selection for the fuzzy regression discontinuity estimator

Increasing the Power of Specification Tests. November 18, 2018

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Comparison of inferential methods in partially identified models in terms of error in coverage probability

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Chapter 6 Stochastic Regressors

Principles Underlying Evaluation Estimators

Partial Rank Estimation of Transformation Models with General forms of Censoring

Conditions for the Existence of Control Functions in Nonseparable Simultaneous Equations Models 1

Estimation of Dynamic Regression Models

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Short T Panels - Review

Econometrics of Panel Data

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity

Applied Microeconometrics (L5): Panel Data-Basics

Control Functions in Nonseparable Simultaneous Equations Models 1

A Simple Estimator for Binary Choice Models With Endogenous Regressors

Microeconometrics. C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press.

Additive Isotonic Regression

Unconditional Quantile Regression with Endogenous Regressors

Economics 620, Lecture 19: Introduction to Nonparametric and Semiparametric Estimation

Estimation of Time-invariant Effects in Static Panel Data Models

Estimation of partial effects in non-linear panel data models

Binary Models with Endogenous Explanatory Variables

Nonparametric Identication of a Binary Random Factor in Cross Section Data and

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Cross-fitting and fast remainder rates for semiparametric estimation

Identification and Estimation of Nonlinear Dynamic Panel Data. Models with Unobserved Covariates

Parametric identification of multiplicative exponential heteroskedasticity ALYSSA CARLSON

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Censored Regression Quantiles with Endogenous Regressors

A Simple Estimator for Binary Choice Models With Endogenous Regressors

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

NBER WORKING PAPER SERIES

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Econometrica Supplementary Material

Consistency and Asymptotic Normality for Equilibrium Models with Partially Observed Outcome Variables

Econometric Analysis of Games 1

Transcription:

Exclusion Restrictions in Dynamic Binary Choice Panel Data Models Songnian Chen HKUST Shakeeb Khan Boston College Xun Tang Rice University February 2, 208 Abstract In this note we revisit the use of exclusion restrictions in the semiparametric binary choice panel data model introduced in Honore and Lewbel (2002). We show that in a dynamic panel data setting (where one of the pre-determined explanatory variables is the lagged dependent variable), the exclusion restriction in Honore and Lewbel (2002) implicitly requires serial independence condition on an observed regressor, that if violated in the data will result in their procedure being inconsistent. We propose a new identification strategy and estimation procedure for the semiparametric binary panel data model under exclusion restrictions that accommodate the serial correlation of observed regressors in a dynamic setting. The new estimator converges at the parametric rate to a limiting normal distribution. This rate is faster than the nonparametric rates of existing alternative estimators for the binary choice panel data model, including the static case in Manski (987) and the dynamic case in Honore and Kyriazidou (2000). JEL Codes: C4,C23,C25. Keywords: Panel Data, Dynamic Binary Choice, Exclusion Restriction We are grateful to Bo Honore, Arthur Lewbel, Jim Powell, Elie Tamer for helpful feedback. We also thank conference and seminar participants at Cambridge, CEMFI, China ES Meetings, Cornell, CREST, Northwestern, Panel Data Workshop (U Amsterdam, 205), Pompeu Fabra, SETA (Taiwan), Texas Econometrics Camp (Houston, 206), Tilburg, UCL, UT Austin, World Congress ES Meetings (Montreal, 205), and Yale for comments. We thank Yichong Zhang and Michelle Tyler for capable research assistance. The usual disclaimer applies.

Introduction We revisit the use of exclusion restrictions in the semiparametric binary choice panel data model with predetermined regressors introduced in Honoré and Lewbel (2002). The identification strategy in Honoré and Lewbel (2002) requires an exclusion restriction (Assumption A.2) that one of the explanatory variables (which we refer to as an excluded regressor henceforth), is independent of the individual fixed effect and time-varying idiosyncratic errors, conditional on the other regressors. Their model is presented in a general framework where the explanatory variables are predetermined, e.g., including lagged dependent variables. As explained in Honore and Kyriazidou (2000), such models allow for both true state dependence in addition to unobserved heterogeneity. Without such an exclusion restriction, identification and inference in dynamic binary choice panel data models (where one of the predetermined explanatory variables is the lagged dependent variable) is complicated and non-standard, as shown in Honore and Kyriazidou (2000) and Hahn (200). Thus the introduction of exclusion restriction into the model by Honoré and Lewbel (2002) is well motivated. However, here we show that in a dynamic binary choice panel data model, the exclusion restriction in Honoré and Lewbel (2002) implicitly requires (conditional) serial independence of the excluded regressor mentioned above. If such serial independence is violated, then the main identifying condition (Assumption A.2) does not hold in general, and the inverse-density-weighted estimator in Honoré and Lewbel (2002) is generally inconsistent. We propose a new identification strategy and estimation method for this semiparametric binary choice panel data model under exclusion restrictions which can accommodate serial dependence of excluded regressors in a dynamic setting. The new estimator converges at the parametric rate to a limiting normal distribution. This rate is faster than the nonparametric rates of existing alternative estimators for the binary choice panel data model, including the static case in Manski (987) and the dynamic case in Honore and Kyriazidou (2000), both of which do not impose the exclusion restrictions mentioned above. To develop the intuition for how the exclusion restriction in Assumption A.2 of Honoré and Lewbel (2002) could fail in a dynamic setting, consider the binary choice panel data model: y it = I[v it + x itβ 0 + α i + ɛ it 0] (.) where i =, 2,..., n, and t =, 2,..., T. Here I[ ] is the indicator function that equals one if is true and zero otherwise, v it R is the excluded regressor whose coefficient that is normalized to one, x it is a vector of other regressors (possibly predetermined), β 0 is a vector of coefficients, α i is an individual-specific fixed effect, and the distribution of ɛ it is unknown. Honoré and Lewbel (2002) estimate this model under an exclusion restriction that e it α i +ɛ it is independent of v it, conditional on x it. In cross-sectional models with no individualspecific fixed effects α i, such an exclusion restriction has proven useful. Examples include See also Heckman (99) for a more detailed discussion on this matter. 2

Lewbel (2000), Lewbel and Tang (205) and Chen, Khan, and Tang (206). However, in a dynamic panel data model, one of the components in x it is the lagged dependent variable y it, which itself is a function of α i, ɛ it and v it. As a result, serial correlation in v it leads to a complex dependence structure between y it, v it and α i + ɛ it, which is generally not compatible with the exclusion restriction (Assumption A.2) in Honoré and Lewbel (2002). This is true even if v it is independent of α i and ɛ it conditional on the other components in x it in each period. The rest of the note is organized as follows. Section 2 uses a simplified version of (.) to formalize the intuition about why the exclusion restriction in Honoré and Lewbel (2002) does not hold in general when v it is serially correlated. Sections 3 and 4 introduce our new approach for estimating dynamic binary choice panel data models, based on a pairwise comparison approach that allows for serial correlation in the excluded regressor. Section 5 presents simulation evidence. Section 6 concludes. Technical proofs are collected in the appendix. 2 A Simplified Model To illustrate our main idea, consider a simplified version of the model in (.) with two periods following the initial condition y i0 (T = 2) and only two explanatory variables, which consist of an excluded regressor v it R and the lagged dependent variable y it : y it = I[v it + y it γ 0 + α i + ɛ it 0] for t =, 2, (2.) where the parameter of interest is γ 0. This specification is subsumed by the original model in (.) with x it y it. Honoré and Lewbel (2002) maintain the following exclusion restriction (on p.2055) on the model in (.) for identification and estimation: ASSUMPTION A.2: For t =, 2, α i + ɛ it is independent of v it conditional on x it and z i. Note that Honoré and Lewbel (2002) states this assumption by conditioning on an instrument z i, which may overlap with the exogenous variables. Likewise, our argument and results throughout the current paper are also valid conditional on any instruments available. In what follows, we suppress such instruments in the conditioning events to lighten the notation. For simplicity, suppose that the initial value y i0 is degenerate at 0 in the data-generating process. Assumption A.2 in Honoré and Lewbel (2002) requires (α i + ɛ it ) v it conditional on y it for t =, 2. (2.2) We show that, if v i, v i2 are serially correlated, then (2.2) does not hold in general even when (α i + ɛ i, α i + ɛ i2 ) is independent of (v i, v i2 ). To simplify notation, we drop the subscript i for all random variables, and let e t (α + ε t ) for t =, 2. Assume that (e, e 2 ) is independent of (v, v 2 ); and that the joint 3

distribution of (e, e 2 ) and the joint distribution of (v, v 2 ) are both exchangeable in the index t {, 2}. 2 Let F (and f) denote the marginal distribution (and density) of v t ; let G (and g) denote the marginal distribution (and density) of e t. Define F (s s) Pr(v s v 2 = s) and G(r r) Pr(e r e 2 = r). That F, G do not vary with t =, 2 is a consequence of the exchangeability condition. We will show that e 2 is not independent of v 2 conditional on y = (or equivalently, v e 0) if v t is serially correlated between t =, 2. By definition, = 2 Pr(e 2 r, v 2 s y = ) r s r=r, s=s 2 ( ) Pr(v e 0, e 2 r, v 2 s) r s Pr(v e 0) r=r, s=s = g(r)f(s) Pr(e s v = s, e 2 = r, v 2 = s)df ( s s) G( s)df ( s) = Pr(v e 0 e 2 = r, v 2 = s)g(r)f(s) Pr(v e 0) = g(r)f(s) G( s r)df ( s s) G( s)df ( s). where the third and fourth equalities follow from an application of the Law of Total Probability in the numerator and the denominator, and from the independence between (e, e 2 ) and (v, v 2 ). On the other hand, Pr(e 2 r y = ) r = [ ] Pr(v e 0, e 2 r) r=r r Pr(v e 0) r=r = g(r) Pr(e s v = s, e 2 = r)df ( s) G( s)df ( s) = Pr(v e 0 e 2 = r)g(r) Pr(v e 0) = g(r) G( s r)df ( s) G( s)df ( s) and Pr(v 2 s y = ) s = [ ] Pr(v e 0, v 2 s) s=s s Pr(v e 0) s=s = f(s) Pr(e s v = s, v 2 = s)df ( s s) G( s)df ( s) = Pr(v e 0 v 2 = s)f(s) Pr(v e 0) = f(s) G( s)df ( s s) G( s)df ( s), where the last two equalities hold because of similar reasons. Thus for all (e, v), ( ) 2 Pr(e 2 r,v 2 s y =) r s r=r, s=s ( ) ( Pr(e2 r y =) Pr(v2 s y =) r r=r s ) = s=s ( ) ( ) G( s r)df ( s s) G( s)df ( s). (2.3) G( s r)df ( s) G( s)df ( s s) The right-hand side (r.h.s.) of (2.3) is whenever v and v 2 are serially independent (F (s s) = F (s ) for all (s, s) on the joint support of (v, v 2 )). However, if v and v 2 are serially dependent, 2 Exchangeability is not assumed in Honoré and Lewbel (2002), but we introduce it here to demonstrate how Assumption A2 could be violated. 4

then the right-hand side of (2.3) is not equal to in general. To see this, consider an extreme case where v t has perfect correlation (v t is time-invariant with Pr(v = v 2 ) = ). Then the right-hand side of (2.3) becomes: G(s r) G( s)df ( s). G( s r)df ( s) G(s) Because G(. r) varies with r due to the dependence between e and e 2, this expression is in general not equal to for all (s, r) on the support of (e, e 2 ). To sum up, the identifying condition A.2 in Honoré and Lewbel (2002) implicitly requires that the excluded regressors (v t ) t=,2 be serially independent. Otherwise, A.2 does not hold in general, and the estimator in Honoré and Lewbel (2002) is generally inconsistent. 3 Identification with Serially Correlated Regressors Serial independence of observed covariates is hard to justify in a dynamic panel data setting. To address this limitation of the method in Honoré and Lewbel (2002), we introduce an alternative approach that is valid in the presence of serial dependence in the regressors. Consider the simplified version of the dynamic binary choice panel data model with two periods t =, 2 in (2.), where the initial value y i0 is stochastic and reported in the data. Let y i (y i0, y i, y i2 ), v i (v i, v i2 ) and ɛ i (ɛ i, ɛ i2 ). Our method requires the following conditions: EM (Random Sampling) For each cross-sectional unit i, the vector (y i, v i, α i, ɛ i ) is independently drawn from the same data-generating process. The vector (y i, v i ) is observed while (α i, ɛ i ) is not. EM2 (Exclusion Restriction) v i is independent of (ɛ i, α i, y i0 ), and is continuously distributed over a support V R 2. EM3 (Exchangeability) Conditional on y i0, e i (e i, e i2 ) ( α i ɛ i, α i ɛ i2 ) is continuously distributed with positive density over R 2, and is exchangeable in t =, 2. EM4 (Overlapping Support) There exists v, v V such that either v = v 2 and v 2 + γ 0 = v or v 2 = v and v 2 = v + γ 0. Unlike Assumption A.2 in Honoré and Lewbel (2002), the exclusion restriction in EM 2 does not condition on the endogenous lagged dependent variable y i. The exchangeability in EM3 holds, for example, if ɛ i is exchangeable in t =, 2 conditional on (α i, y i0 ). We impose no other 5

restriction on the distribution of (α i, ɛ i ) given y i0 than EM3. Condition EM4 ensures that the intersection of the marginal support of v i and v i2 is non-empty. A few remarks about how these conditions are related to the existing literature are in order. First, EM-EM3 allow for the serial correlation between regressors in v i, which as we show above are implicitly ruled out in Assumption A.2 in Honoré and Lewbel (2002). Second, our conditions are non-nested with those in Honore and Kyriazidou (2000), which allows the initial condition y i0 to depend on v i. However, our identification result only requires the data to report two periods T = 2 (not including the initial condition) whereas their approach requires T 3. We state our identification theorem for the model in (2.) with a stochastic initial value y i0. Theorem 3. Consider the model in (2.) with t =, 2. Under Assumptions EM, 2, 3, 4, the coefficient γ 0 is identified. Proof of Theorem 3.. Consider two observations i, j such that y i0 = y j0 = 0 and v j = v i2 = v for some v and Pr(y i = 0, y i2 = v i, y i0 = 0) = Pr(y j =, y j2 = 0 v j, y j0 = 0). (3.) Such a pair i, j and v exist under EM4. Under EM2, the left-hand side of (3.) is Pr(e i > v i, e i2 v i2 y i0 = 0) = Pr(e i > v i, e i2 v y i0 = 0), and the right-hand side of (3.) is Pr(e j v j, e j2 > v j2 + γ 0 y j0 = 0) = Pr(e j > v j2 + γ 0, e j2 v y j0 = 0), where the equality follows from the exchangeability of (e j, e j2 ) given y j0 = 0 in EM3. It follows from EM and EM3 that (3.) holds if and only if v i = v j2 + γ 0. This implies γ 0 is -identified as γ 0 = v i v j2 using any pair i, j such that y i0 = y j0 = 0, v j = v i2 and (3.) holds. Likewise, we can look for another pair of cross-sectional units k, l with y k0 = y l0 = and v l2 = v k = ṽ for some ṽ and Pr(y k = 0, y k2 = v k, y k0 = ) = Pr(y l =, y l2 = 0 v l, y l0 = ) (3.2) By a similar argument, the left-hand side of (3.2) is Pr(e k > v k + γ 0, e k2 v k2 y k0 = ) = Pr(e k > ṽ + γ 0, e k2 v k2 y k0 = ) and the right-hand side of (3.2) is Pr(e l v l + γ 0, e l2 > v l2 + γ 0 y l0 = ) = Pr(e l > ṽ + γ 0, e l2 v l + γ 0 y l0 = ). It then follows from EM and EM3 that (3.2) holds if and only if v l + γ 0 = v k2. Hence γ 0 is over-identified as v k2 v l using any pair k, l such that v l2 = v k and (3.2) holds. 6

3. Estimation We propose two estimators for γ 0 in (2.), based on the constructive argument for identification in Theorem 3.. As we show in Appendix A, both estimators converge at the parametric rate to a limiting normal distribution. The first estimator has a closed form as follows: j i ˆγ CF [ω ij,0(v i v j2 ) + ω ij, (v i2 v j )] j i (ω, (3.3) ij,0 + ω ij, ) where j i is shorthand notation for the summation over ordered pairs N i= and ˆp i j {,2,...,N}\{i} ω ij,0 K h (ˆp i0 ˆq j0, v j v i2 )( y i0 )( y j0 ), ω ij, K h (ˆp i ˆq j, v j2 v i )y i0 y j0, s ˆp i0 y s2( y s )L σ (v s v i )( y s0 ) s s L, ˆq j0 y s( y s2 )L σ (v s v j )( y s0 ) σ(v s v i )( y s0 ) s L, σ(v s v j )( y s0 ) s y s2( y s )L σ (v s v i )y s0 s s L, ˆq j y s( y s2 )L σ (v s v j )y s0 σ(v s v i )y s0 s L, σ(v s v j )y s0 with K h (.) h K( ḣ) and L σ(.) σ L(. σ ) being shorthand notation for kernel smoothing. The intuition ( for the consistency of ˆγ CF is as follows. First off, under appropriate conditions, [ ] the ratio j i ij,0) ω j i ω ij,0(v i v j2 ) converges in probability to the expectation of v i v j2 conditional on y i0 = y j0 = 0, v j = v i2 and on the equality in (3.). By the proof of Theorem ( [ ] 3., such a conditional expectation is equal to γ 0. Likewise, j i ij,) ω j i ω ij,(v i2 v j ) also converges in probability to γ 0. Thus the estimator ˆγ CF in (3.3) is a weighted average of these two components, each of which is consistent for γ 0. This estimator avoids minimization, but requires multiple kernel smoothing procedures. In Appendix A we show that this estimator is root-n consistent and asymptotically normal (CAN). where with The second estimator we propose is a kernel-weighted maximum rank correlation estimator: ˆγ MR max γ n(n ) [ ω ij,0g ij,0 (γ) + ω ij, G ij, (γ)], (3.4) j i G ij,0 (γ) {d i,0 > d j,0 }{v j2 + γ > v i } + {d i,0 < d j,0 }{v j2 + γ < v i }; G ij, (γ) {d i,0 > d j,0 }{v i2 > v j + γ} + {d i,0 < d j,0 }{v i2 < v j + γ} d i,0 ( y i )y i2, d j,0 y j ( y j2 ), (3.5) ω ij,0 K h (v j v i2 )( y i0 )( y j0 ), ω ij, K h (v j2 v i )y i0 y j0 7

and K h (.) h K( ḣ ). This estimator is motivated by the maximum rank correlation estimator introduced by Han (987) for cross-sectional models. To understand the intuition for the consistency of ˆγ MR, note that under appropriate regularity conditions, the objective function in (3.4) converges in probability to a weighted integral of E[G ij,0 (γ) v j = v i2, y i0 = 0, y j0 = 0] and E[G ij, (γ) v j2 = v i, y i0 =, y j0 = ]. (3.6) Both conditional expectations in (3.6) are uniquely maximized at γ = γ 0 under the support condition in EM4. 3 The formal argument is analogous to the proof of identification in Han (987) and omitted for brevity here. Thus the extremum estimator in (3.4) satisfies the identification condition for consistency. We note that this maximum rank correlation estimator has the advantage of requiring fewer smoothing parameters than the first closed-form estimator. This comes at the expense of higher computational costs as the maximum rank correlation estimator requires optimization of a nonconcave objective function. Nonetheless, desirable asymptotic properties such as root-n consistency and asymptotic normality still hold, as we show in Appendix A using an argument similar to Abrevaya, Hausman, and Khan (200). 4 Model with Multiple Explanatory Regressors We now discuss the identification and estimation of the model in (.) when x it includes other regressors w it R L as well as the lagged dependent variable y it. That is, in the notation of Honoré and Lewbel (2002), x it (y it, w it ), β 0 (γ 0, δ 0 ) and y it = I[v it + γ 0 y it + w itδ 0 + α i + ɛ it 0] for t =, 2. (4.) Let w i (w i, w i2 ), ɛ i (ɛ i, ɛ i2 ) and e i ( α i ɛ i, α i ɛ i2 ) as before. Let W R L denote the support of w i. We maintain the following conditions: EEM (Random Sampling) For each i, (y i, w i, v i, α i, ɛ i ) are independently drawn from the same data-generating process. The vector (y i, w i, v i ) is reported in data while (α i, ɛ i ) is not. EEM2 (Exclusion Restriction) Conditional on w i, v i is independent of (e i, y i0 ) and is continuously distributed over a connected support V R 2. EEM3 (Exchangeability) Conditional on (w i, y i0 ), e i is continuously distributed with positive density over R 2 and is exchangeable in t =, 2. 3 Assumption EM4 implies that for all γ γ 0, there is positive probability that v i v j2 is between γ and γ 0 conditional on v j = v i2, y i0 = 0, y j0 = 0. 8

EEM4 (Support Condition) There exists C R L such that C C W, and the support condition in EM4 holds conditional on some w i = ( w, w) C C. EEM5 (Rank Condition) There exists some W 0 W such that (i) the support of {w i2 w i : w i W 0 } is not contained in any proper linear subspace, and (ii) conditional on any w i W 0, there exists v, ṽ V with v = ṽ, ṽ 2 = v 2 + γ 0 and v = ṽ 2 + (w i2 w i ) δ 0. Theorem 4. Consider the model in (4.) with t =, 2. Under Assumptions EEM, 2, 3, 4, 5, the coefficients γ 0 and δ 0 are identified. Proof of Theorem 4.. For each i, define η it α i ɛ it w i δ 0 for t =, 2 (note that the definition subtracts the first period index w i δ 0 for both t =, 2). Under EEM3, the distribution of η i (η i, η i2 ) is exchangeable in t =, 2 conditional on w i. Thus (4.) can be written as: y i = I[η i v i + γ 0 y i0 ] and y i2 = I[η i2 v i2 + γ 0 y i + i ], where i (w i2 w i ) δ 0. Consider a vector w i with w i = w i2 so that i = 0. Such a vector exists under the support condition in EEM4. Conditional on such a w i, the argument for identifying γ 0 in Theorem 3. applies (with the constant index w i δ 0 absorbed in the fixed effect α i ). With γ 0 known, we can look for pairs of cross-sectional units i and j such that w i = w j, v i = v j and v j2 = v i2 + γ 0. (4.2) Such pairs exist due to the condition (ii) in EEM5. By construction, Pr(y i =, y i2 = 0 v i, w i, y i0 = 0) + Pr(y j = 0, y j2 = 0 v j, w j, y j0 = 0) = Pr(η i v i, η i2 > v i2 + γ 0 + i w i, y i0 = 0) + Pr(η j > v j, η j2 > v j2 + j w j, y j0 = 0) = Pr(η j2 > v j2 + j w j, y j0 = 0), where the first equality follows from EEM2 and the second follows from the equalities in (4.2) and EEM. Also by EEM2, Pr(y i = 0 v i, w i, y i0 = 0) = Pr(η i > v i w i, y i0 = 0). It then follows from EEM3 that for any pair i and j that satisfy the equalities in (4.2), Pr(y i =, y i2 = 0 v i, w i, y i0 = 0) + Pr(y j = 0, y j2 = 0 v j, w j, y j0 = 0) = Pr(y i = 0 v i, w i, y i0 = 0) (4.3) 9

if and only if v i = v j2 + j. This identifies j = (w j2 w j ) δ 0 for w j conditional on the equalities in (4.2). Replicating the same argument conditional on other values of w j allows us to identify δ 0 under the rank condition (i) in EEM5. 4 We now discuss how to translate the identification result in Theorem 4. to an estimation procedure, conditioning on the set of time-invariant regressors mentioned in EEM 4. For ease of illustration, suppose for now that the vector w it consists of discrete components only. Then we can construct a closed-form estimator and a kernel weighted maximum rank correlation estimator for γ 0 similar to those in Section 3. conditioning on w i = w i2, an equality that occurs with positive probability due to the discreteness of w it. Specifically, we need to replace the weights ω ij,s, ω ij,s in (3.3) and (3.4) with ω ij,s {w i = w i2 } and ω ij,s {w i = w i2 } respectively for s = 0,. If w it contains a continuous component, the event w i = w i2 occurs with zero probability and we need to replace {w i = w i2 } with kernel weights wi wi2 hk( h ) in order to implement the estimation procedure. Note that in such an estimation procedure we are trimming out all but a shrinking fraction of the cross-sectional population, as opposed to trimming all but a shrinking fraction of pairwise comparisons as before. Consequently, the resulting estimator converges at a nonparametric rate. In comparison, the estimator proposed in Honore and Kyriazidou (2000) does not impose any exclusion restriction, but requires more time periods yet still attains a nonparametric rate. The non-standard, slower rate of convergence of our estimator described in the preceding paragraph motivates the need to strengthen model assumptions in order to construct root-n CAN estimators. The following two subsections present two cases where root-n CAN estimators are available under strengthened model assumptions. 4. Exchangeability in Regressors Consider the following condition of exchangeability: EEM3 (Exchangeability) The distribution of e i (e i, e i2 ) conditional on y i0 and w i (w i, w i2 ) is exchangeable in the time index t =, 2. EEM4 (Support Condition) There exists (w, v) and ( w, ṽ) such that (w, w 2 ) = ( w 2, w ) and either ṽ = v 2, v = ṽ 2 + γ 0 or ṽ 2 = v, v 2 = ṽ + γ 0. 4 A symmetric argument identifies δ 0 from analogous conditional probabilities under similar support conditions, using other pairs with y i0 = and y j0 =. 0

The condition EEM3 strengthens the exchangeability in (e i, e i2 ) in EEM3 into a stronger notion of exchangeability in (e i, e i2 ) as well as the explanatory variables (w i, w i2 ) that are conditioned on. This notion of exchangeability in EEM3 has been previously used to attain identification results in the econometrics literature. See, for example, Honoré (992) for censored panel models, Fox (2007) for multinomial choice models, and Altonji and Matzkin (2005) for nonseparable models in both cross-sectional and (random-effect) panel data models. In a binary choice random effect model, Altonji and Matzkin (2005) uses the following condition (Assumption 2.3) on the unobserved errors and observed regressors (assuming two time periods for ease of exposition) f(e it w i, w i2 ) = f(e it w i2, w i ) for t =, 2, where f( ) notes the density of e it conditional on w i. Hence their Assumption 2.3 means that the value of the conditional density function does not change even if the order of the conditioning variables does. Their Assumption 2.3, along with a conditional i.i.d. (or exchangeability) assumption on ɛ it would imply our Assumption EEM3. Theorem 4.2 Consider the model in (4.) with t =, 2. Under Assumptions EEM, 2, 3, 4, 5, the coefficients γ 0 and δ 0 are identified. Proof of Theorem 4.2. Consider a pair (w i, v i ) and (w j, v j ) such that (w i, w i2 ) = (w j2, w j ) and v j = v i2 (4.4) and Pr(y i = 0, y i2 = w i, v i, y i0 = 0) = Pr(y j =, y j2 = 0 w j, v j, y j0 = 0). (4.5) As we show below, such a pair exists under the support condition in EEM4. By construction, the left-hand side of (4.5) is Pr(e i > w iβ 0 + v i, e i2 w i2β 0 + v i2 w i, y i0 = 0), where the equality is due to EEM2. Besides, the right-hand side of (4.5) is Pr(e j w jβ 0 + v j, e j2 > w j2β 0 + v j2 + γ 0 W j = (w j, w j2 ),y j0 = 0) = Pr(e j2 w jβ 0 + v j, e j > w j2β 0 + v j2 + γ 0 W j = (w j2, w j ),y j0 = 0) = Pr(e i2 w i2β 0 + v i2, e i > w iβ 0 + v j2 + γ 0 W i = (w i, w i2 ),y i0 = 0), where the first equality follows from the exchangeability in EEM3 and the second follows from i.i.d. sampling assumption (EEM ) and the equalities in (4.4). Thus the equality in (4.5) holds

if and only if v i = v j2 + γ 0. Thus γ 0 = v i v j2 is over-identified using any pair (w i, v i ) and (w j, v j ) that satisfy (4.4) and (4.5). By a symmetric argument, we can look for pairs of (w i, v i ) and (w j, v j ) such that and (w i, w i2 ) = (w j2, w j ), v j2 = v i Pr(y i = 0, y i2 = w i, v i, y i0 = ) = Pr(y j =, y j2 = 0 w j, v j, y j0 = ), and show that γ 0 is (over-)identified as γ 0 = v i2 v j. With γ 0 identified, we can replicate the argument in Theorem 4. to identify δ 0 through pairwise comparison under the rank condition in EEM 5. Based on the identification strategy in Theorem 4.2, we propose a two-step estimator for γ 0, δ 0. In the first step, use a kernel-weighted maximum rank correlation estimator to estimate γ 0 : ˆγ EX max γ n(n ) K ij [ ω ij,0 G ij,0 (γ) + ω ij, G ij, (γ)] j i where d i,0, d j,0, ω ij,0 and ω ij, are defined as in (3.5) and K ij K σ (w i w j2, w i2 w j ) is a kernel weight for matching i and j with (w i, w i2 ) = (w j2, w j ). In the second step, use ˆγ EX to construct a closed-form estimator for δ 0 by matching pairs i and j that satisfy the equalities in (4.2) and (4.3) simultaneously. Remark 4. For the model with lagged dependent variables as well as strictly exogenous variables we are able to identify the regression coefficients for three times periods. Furthermore the closedform estimator we propose converges at the parametric rate with a limiting normal distribution. In comparison, Honore and Kyriazidou (2000) require four periods for identification and achieve a nonparametric rate in estimation. They impose an i.i.d. assumption on ɛ it which is stronger than what we assume here; but they do not impose the exclusion restriction in EEM2 nor the exchangeability assumption in EEM3. They require that the support of w it to be overlapping over time so that the difference in regressors across adjacent time periods has a positive density in a neighborhood of 0. Remark 4.2 Our new identification and estimation results extend to static binary choice models with fixed effects, where the vector of explanatory variables in a period does not include the lagged dependent variable from the previous period. (See Appendix B for details.) In fact, it can be shown that the same pairwise comparison procedure provides root-n CAN estimators of the regression coefficients under weaker conditions. Specifically, in a static binary choice panel data model with 2

the exclusion restriction, we only need observations of the dependent and explanatory variables in two time periods, do not require the exchangeability in w it (EEM3 ), and only impose a conditional stationarity restriction on e it. The assumed behavior on e it would then be identical to that imposed in Manski (987). The model in Manski (987) is more general as exclusion was not assumed for his identification result, but the parameters could not be estimated at the parametric rate. Ai and Gan (200) also study static models (e.g. without lagged dependent variables) and, like ours, their estimator converges at the parametric rate. However, they require w it to be independent of e it, which we do not impose in the static model considered in Appendix B. 4.2 Exogenous Initial Condition Another way to attain root-n CAN estimation is to further require that e i, v i and the initial condition y i0 be mutually independent conditional on w i. Then a similar two-step estimator based on pairwise comparison across cross-sectional units with different initial conditions can be constructed. To see this, suppose that e i, v i and y i0 are mutually independent given w i, and that e i is continuously distributed over R given w i. First, look for pairs of observations i and j such that w i = w j and Pr(y i = y i0 = 0, w i, v i ) = Pr(y j = y j0 =, w j, v j ). (4.6) Under the mutual independence condition above, the second equality in (4.6) is equivalent to Pr(e i w iδ 0 + v i w i ) = Pr(e j w jδ 0 + v j + γ 0 w j ). Such pairs exist under mild support conditions. It then follows that γ 0 is over-identified as γ 0 = v i v j using any pair of i and j that satisfies both equalities in (4.6). Second, with γ 0 identified, we can identify δ 0 under EEM, 2, 3, 5 by the same argument as in the proof of Theorem 4., which uses variables observed in both periods t =, 2. This line of constructive identification argument lends itself to a two-step estimator that is based on pairwise comparisons, and root-n CAN under appropriate regularity conditions. 5 Simulation Study In the this section we compare the finite-sample performance of the new estimators we propose with that of existing estimators in dynamic panel data models. The first class of designs we consider have no other strictly exogenous regressors than v it. We randomly generate data from the following equation: y it = I[α i + v it + γ 0 y i,t + ɛ it > 0] t =, 2 3

where (v i, v i2 ) is bivariate normal with mean (0, 0) and unit standard deviation (, ). We report the results for several designs, with the coefficient of correlation between v i and v i2 ranging from 0 to 0.75. The error terms (ɛ i, ɛ i2 ) are independent of (v i, v i2, α i ) with a bivariate normal distribution with mean (0, 0), standard deviation (, ) and a correlation coefficient of 0.5. For the fixed effect α i, we considered designs where α i is binary and independent of (v i, v i2 ) with Pr(α i = ) = Pr(α i = 0) = 0.5. Table reports simulation results for two estimators of γ 0 = 0.5: the inverse weighting procedure in Honoré and Lewbel (2002) (HL) and our two estimators based on pairwise comparison, i.e., the closed-form estimator (CKT) and the kernel-weighted maximum rank correlation estimator (CKT2). In practice, each of these three estimators requires some nonparametric estimation procedure and hence smoothing parameters. To focus on these estimators sensitivity to serial dependence in v it (as opposed to their sensitivity to tuning parameters), we compare the infeasible version of each estimator in our simulation exercises. That is, we use knowledge of the true conditional density for the estimator in Honoré and Lewbel (2002), and the true conditional choice probability for both of our estimators introduced here. For matching the probabilities in the CKT estimator we followed the procedure outlined in Chen, Khan, and Tang (206), where there is under-smoothing in the choice of bandwidths for the kernel estimation of propensity scores in the preliminary step (relative to the bandwidths used for matching explanatory variables). We report the mean bias (BIAS) and the root mean square errors (RMSE) of all three estimators for γ 0 (HL, CKT and CKT2) in the design above, with the correlation coefficients for (v i, v i2 ) ranging between ρ v {0, 0.25, 0.5, 0.75}. For each sample size n {200, 400, 800, 600}, we calculate the mean bias and RMSEs using 60 replications of simulated samples. The findings from this simulation exercise under dynamic designs are in accordance with our theoretical results. When there is serial correlation in v it (ρ v 0), the mean bias of the HL estimator does not decline monotonically with the sample size, and the RMSE diminishes at a rate much slower than root-n. In contrast, both of our estimators proposed in this paper demonstrate a faster rate of decline in RMSE as the sample size increases, regardless of the level of serial correlation in v it. 4

TABLE. Performance of Estimators for γ 0 in a Simple Model (Exogenous variable: v i ) HL CKT CKT2 ρ v 0 /4 /2 3/4 0 /4 /2 3/4 0 /4 /2 3/4 n=200 BIAS 0.067 0.00-0.045-0.23-0.3-0.7-0.30-0.79-0.08-0.006-0.06 0.008 RMSE.572.597.5.50 0.52 0.53 0.6 0.200 0.37 0.38 0.37 0.38 n=400 BIAS -0.038-0.52-0.49-0.36-0.096-0.099-0.5-0.62-0.00-0.03-0.002-0.004 RMSE.96.22.36.26 0.8 0.8 0.29 0.73 0.283 0.288 0.279 0.280 n=800 BIAS 0.055-0.097-0.225-0.288-0.086-0.088-0.06-0.47-0.00-0.006 0.004 0.009 RMSE.027.025.023.26 0.097 0.097 0.4 0.53 0.229 0.232 0.227 0.234 n=600 BIAS 0.006-0.04-0.24-0.382-0.074-0.079-0.096-0.40-0.003-0.005 0.003 0.009 RMSE.024 0.992 0.857 0.890 0.079 0.084 0.00 0.43 0.76 0.76 0.80 0.73 Next, we study a more general design which includes other explanatory variables w it in addition to v it : y it = I[α i + v it + w it δ 0 + γ 0 y i,t + ɛ it > 0]. In our simulation we let δ 0 = and w i be independent of (α i, v i, ɛ i ). We let (w it, w i2 ) be serially independent and drawn from a binary distribution Pr(w it = ) = Pr(w it = 0) = 0.5 for t =, 2. The other elements of the model are specified as in the simple design above. Table 2 and Table 3 report the performance of HL and CKT estimators for δ 0 and for γ 0 respectively in small and moderate-sized samples. Table 3 shows in general neither the mean bias or the RMSE of the HL estimator for γ 0 decreases with the sample size in the presence of serial correlation in v it. Table 2 demonstrates similar results for the HL estimator for δ 0. In comparison, CKT exhibits a noticeable bias for both δ 0 and γ 0 (especially δ 0 ) but the RMSE diminishes as the sample size increases. 5

TABLE 2. Performance of Estimators for δ 0 in a Full Model (Exogenous variables: w i and v i ) HL CKT ρ v 0 /4 /2 3/4 0 /4 /2 3/4 200 obs. Mean Bias -0.0546-0.286-0.522-0.0655-0.242-0.254-0.265-0.3244 RMSE.0879 0.827.537.2749 0.2475 0.2600 0.273 0.3323 400 obs. Mean Bias -0.0486-0.0448-0.03-0.093-0.288-0.2304-0.242-0.2993 RMSE.4033.5027 0.708.366 0.2220 0.2339 0.2447 0.3033 800 obs. Mean Bias 0.0324-0.0057-0.0869-0.43-0.990 0.200-0.223-0.287 RMSE.7034.052 0.8770 0.4258 0.2007 0.27 0.223 0.2835 600 obs. Mean Bias -0.0873-0.082-0.043-0.0838-0.8-0.902-0.202-0.2577 RMSE 0.4085 0.4692 0.8775 0.7066 0.822 0.9 0.20 0.2588 TABLE 3. Performance of Estimators for γ 0 in a Full Model (Exogenous variables: w i and v i ) HL CKT ρ v 0 /4 /2 3/4 0 /4 /2 3/4 200 obs. Mean Bias -0.04-0.044-0.960-0.3782-0.286-0.598-0.796-0.946 RMSE 7.76673 9.9655 6.89 4.922 0.435 0.79 0.897 0.2037 400 obs. Mean Bias -0.0734-0.767-0.092-0.226-0.95-0.437-0.644-0.828 RMSE 6.8646 3.236 3.5076 4.6439 0.269 0.493 0.694 0.872 800 obs. Mean Bias 0.2720-0.397-0.5554-0.328-0.08-0.334-0.50-0.66 RMSE 6.944 2.7588 5.4450 2.0464 0.46 0.369 0.539 0.686 600 obs. Mean Bias -0.448-0.249-0.2386-0.3757-0.04-0.206-0.386-0.56 RMSE 2.2697 2.272 3.4023 2.4554 0.035 0.222 0.40 0.530 6

6 Conclusions We explore the use of exclusion restrictions in dynamic binary choice panel data models introduced in Honoré and Lewbel (2002). Their model was partly motivated by the difficulty in identifying models that allow for both state dependence and unobserved heterogeneity. However, here we show that the exclusion restriction in Honoré and Lewbel (2002) requires (conditional) serial independence of the excluded regressor. Thus their inverse-density-weighted estimator in Honoré and Lewbel (2002) is generally inconsistent when the excluded regressors are serially correlated in a dynamic panel data model. We propose a new approach of identification and estimators for semiparametric binary choice panel data model under exclusion restrictions. Our approach accommodates the serial dependence in the excluded regressors, and the new estimators converge at the parametric rate to a limiting normal distribution. This rate is faster than the nonparametric rates of existing alternative estimators for the binary choice panel data model, including the static case in Manski (987) and the dynamic case in Honore and Kyriazidou (2000). References Abrevaya, J., J. Hausman, and S. Khan (200): Testing for Causal Effects in a Generalized Regression Model with Endogenous Regressors, Econometrica, 48, 2043 206. Ai, C., and L. Gan (200): An Alternative Root-n Consistent Estimator for Panel Data Binary Choice Model, Journal of Econometrics, 57, 93 00. Altonji, J., and R. Matzkin (2005): Cross Section and Panel Data Estimators for Nonseparable Models with Endogenous Regressors, Econometrica, 73, 053 02. Chen, S., S. Khan, and X. Tang (206): On the Informational Content of Special Regressors in Heteroskedastic Binary Response Models, Journal of Econometrics, 93, 62 82. Fox, J. (2007): Semiparametric Estimation of Multinomials Discrete Choice Models using a Subset of Choices, RAND Journal of Economics, 38, 002 029. Hahn, J. (200): The Information Bound of a Dynamic Panel Logit Model with Fixed Effects, Econometric Theory, 7, 93 932. Han, A. (987): The Maximum Rank Correlation Estimator, Journal of Econometrics, 303-36. 7

Heckman, J. (99): Identifying the Hand of the Past: Distinguishing State Dependence from Heterogeneity, American Economic Review, pp. 75 79. Honoré, B. (992): Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects, Econometrica, 60(3), 533 65. Honore, B., and E. Kyriazidou (2000): Panel Data Discrete Choice Models with Lagged Dependent Variables, Econometrica, 68, 839 874. Honoré, B., and A. Lewbel (2002): Semiparametric Binary Choice Panel Data Models without Strictly Exogeneous Regressors, Econometrica, 70(5), 2053 2063. Lewbel, A. (2000): Semiparametric Qualitative Response Model Estimation with Unknown Heteroscedasticity or Instrumental Variables, Journal of Econometrics, 97, 45 77. Lewbel, A., and X. Tang (205): Identification and Estimation of Games with Incomplete Information using Excluded Regressors, Journal of Econometrics, 89, 229 244. Manski, C. F. (987): Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data, Econometrica, 55(2), 357 362. Sherman, R. (993): The Limiting Distribution of the Maximum Rank Correlation Estimator, Econometrica, 6, 23 37. (994): U-Processes in the Analysis of a Generalized Semiparametric Regression Estimator, Econometric Theory, 0, 372 395. A Regularity Conditions and Asymptotic Theory We outline the regularity conditions and arguments for deriving the limiting distribution of the closed-form estimator ˆγ CF in Section A. and the rank-based estimator ˆγ MR in Section A.2. In both cases we focus on the simplified model where the two regressors are the excluded variable v it and the lagged dependent variable y it. A. Closed-Form Estimator Recall the closed-form estimator was expressed as j i ˆγ CF [ω ij,0(v i v j2 ) + ω ij, (v i2 v j )] j i (ω ij,0 + ω ij, ) (A.) 8

where j i denote the summation over N(N ) ordered pairs and ω ij,0 K h (ˆp i0 ˆq j0, v j v i2 )( y i0 )( y j0 ), ω ij, K h (ˆp i ˆq j, v j2 v i )y i0 y j0 ; s ˆp i0 y s2( y s )L σ (v s v i )( y s0 ) s s L, ˆq j0 y s( y s2 )L σ (v s v j )( y s0 ) σ(v s v i )( y s0 ) s L ; σ(v s v j )( y s0 ) s y s2( y s )L σ (v s v i )y s0 s s L, ˆq j y s( y s2 )L σ (v s v j )y s0 σ(v s v i )y s0 s L, σ(v s v j )y s0 ˆp i with K h (.) h K( ḣ) and L σ(.) σ L(. σ ). Our arguments for the limiting distribution theory for the closed form estimator are based on the following conditions: Assumption A. (Non-singularity) The matrix Σ defined in (A.6) is positive and finite. Assumption A.2 (Kernels for matching) (i) Let K(, ) = K ( )K 2 ( ) where K, K 2 have compact supports, are symmetric around 0, integrate to, are twice continuously differentiable and are eighth-order kernels. (ii) sup t R h 2 K 2 (t/h 2 ), sup t R h K (t/h ) and sup t R h K (t/h ) are all O() as h, h 2 0. Assumption A.3 (Bandwidths for matching) h n δ and h 2 n δ2 where δ ( 2, 9 ) and 2δ 2 < 2 3 δ. Assumption A.4 (Smoothness) The functions p, q and the conditional density f 0, g 0, f, g defined below are all M = 6 times continuously differentiable with bounded derivatives. Assumption A.5 (Kernel for estimating propensity scores) (i) L has compact support, is symmetric around zero, integrates to one, and is twice continuously differentiable. (ii) L has an m-th order with m > 2 Assumption A.6 (Smoothness of population moments) The propensity score p 0 (.), q 0 (.), p (.) and q (.) defined below and the density of (v i, v i2 ) are continuously differentiable of order m with bounded derivatives, where m > 2. Assumption A.7 (Bandwidth for estimating propensity scores) σ n n γ/3, where ( 3 m 3 + δ ) < γ < 3 2δ. Assumption A.8 (Finite second moments) The function χ i defined in (A.3) has finite second moment. Proposition Under Assumptions A.-A.8, n (ˆγCF γ 0 ) d N ( 0, Σ 2 E[χ 2 i ] ) 9

Proof. By construction, we can write j i ˆγ CF γ 0 = [ω ij,0(v i v j2 γ 0 ) + ω ij, (v i2 v j γ 0 )] j i (ω. (A.2) ij,0 + ω ij, ) The proof of the asymptotic distribution of ˆγ CF requires us to derive a linear representation for the right-hand side of (A.2). First, we look for the probability limit of n(n ) j i ω ij,0. Under Assumption A.2, we apply a Taylor expansion of ω ij,0 around the actual conditional expectation in the data-generating process p i0 p 0 (v i ) E [y i2 ( y i ) v i, y i0 = 0] and q j0 q 0 (v j ) E [y j ( y j2 ) v j, y j0 = 0]. This allows us to write where n(n ) ω ij,0 = j i n(n ) ω ij,0 + o p (n /4 ), j i (A.3) ω ij,0 K h (p i0 q j0, v j v i2 )( y i0 )( y j0 ). (A.4) That the remainder term in (A.3) is o p (n /4 ) follows from Assumptions A.2, A.3, A.5, A.6, A.7 and an argument used in Lemma D.3 in Chen, Khan, and Tang (206). Under Assumption A.2, A.3 and A.6, E[ ω ij,0 2 ] = o(n). By an application of the Law of Large Numbers for U-statistics (e.g., Lemma 3. in Powell, Stock and Stocker (989)), n(n ) j i ω ij,0 converges in probability to the limit of the expectation of ω ij,0 as n. To evaluate this limit, first note that the conditional expectation of ω ij,0 given y i0, y j0 is ( y i0 )( y j0 ) K h (p q)k 2h (v j v i2 )f 0 (p, v i2 y i0 )g 0 (v j, q y j0 )dv j dpdqdv i2, where K h h K (. h ), K 2h h 2 K 2 (. h 2 ), f 0 (.,. y i0 ) denotes the density of (p i0, v i2 ) given y i0, and g 0 (.,. y j0 ) denotes the density of (v j, q j0 ) given y j0. By changing variables between v j and u (v j v i2 )/h 2 while fixing (p, q, v i2 ), we write this expression as: ( ) ( y i0 )( y j0 ) K h (p q)f 0 (p, v i2 y i0 ) K 2 (u)g 0 (v i2 + uh 2, q y j0 )du dpdqdv i2. Next, change variables between p and ũ = (p q)/h while fixing (u, q, v i2 ), we can write this as ( y i0 )( y j0 ) K (ũ)k 2 (u)f 0 (q + ũh, v i2 y i0 )g 0 (v i2 + uh 2, q y j0 )dudũdqdv i2. By the Dominated Convergence Theorem, this converges to the following expression as h, h 2 0: H 0 (y i0, y j0 ) ( y i0 )( y j0 ) f 0 (q, v i2 y i0 )g 0 (v i2, q y j0 )dqdv i2. (A.5) where we have used the fact K (ũ)dũ = K 2 (u)du =. Thus the probability limit of n(n ) j i ω ij,0 is E[H 0 (y i0, y j0 )]. It then follows from (A.3) that n(n ) j i ω ij,0 converges in 20

probability to E[H 0 (y i0, y j0 )]. Using an analogous argument, we conclude that n(n ) j i ω ij, converges in probability to E[H (y i0, y j0 )] where H (y i0, y j0 ) y i0 y j0 f (v i, q y i0 )g (q, v i y j0 )dqdv i, and f (.,. y i0 ) is the density of (v i, p i ) given y i0, and g (.,. y j0 ) the density of (q j, v j2 ) given y j0, with p (v i ) E [y i2 ( y i ) v i, y i0 = ] p i and q (v j ) E [y j ( y j2 ) v j, y j0 = ] q j. Combining these results, we have shown that where n(n ) (ω ij,0 + ω ij, ) p Σ j i Σ E[H 0 (y i0, y j0 ) + H (y i0, y j0 )]. (A.6) and is strictly positive and finite under Assumption A.. We now turn to the linear representation of the numerator in the right-hand side of (A.2). The first term in the numerator is: n(n ) ω ij,0(v i v j2 γ 0 ). i j (A.7) By a second-order Taylor expansion around p i0 and q j0, we can write this expression as [ ] ω ij,0 (v i v j2 γ 0 ) + ω () ij,0 (v i v j2 γ 0 )(ˆp i0 ˆq j0 p i0 + q j0 ) + R n (A.8) n(n ) i j where ω ij,0 is defined in (A.4) and ω () ij,0 ( ) h 2 K pi0 q j0 K 2 h 2 h ( vj v i2 h 2 ) ( y i0 )( y j0 ), and R n is the second-order term in the Taylor expansion. Under Assumptions A.2, A.3, A.5, A.6 and A.7, R n is o p (n /2 ) by an argument that follows Lemma D.3 in Chen, Khan, and Tang (206). The lead term n(n ) i j [ ω ij,0(v i v j2 γ 0 )] is o p (n /2 ) by our identification result and under the maintained assumptions. It remains to derive a linear representation of the firstorder term (i.e., the second term) in the approximation in (A.8). component in that term: n(n ) i j ω() ij,0 (v i v j2 γ 0 )(ˆp i0 p i0 ). Consider the first additive (A.9) Let ˆm i0, ˆf i0 denote the numerator and denominator in the definition of ˆp i0 ; let m i0 E[y i2 ( y i )( y i0 ) v i ]f(v i ) and f i0 E( y i0 v i )f(v i ) so that p i0 = m i0 /f i0 by construction. Applying a first-order Taylor expansion of (A.9) around (m i0, f i0 ), we get n(n ) i j ω () ij,0 (v i v j2 γ 0 ) [ ˆm i0 m i0 ( f ˆf ] i0 f i0 )p i0 + R n (A.0) i0 2

where R n is o p (n /2 ) under Assumption A.2, A.3, A.5, A.6 and A.7. Thus we can write the righthand side of (A.0) as the sum of a third-order U-statistic and some asymptotically negligible term: where n(n )(n 2) ϕ n(ξ i, ξ j, ξ s ) + o p (n /2 ) i j s ϕ n (ξ i, ξ j, ξ s ) ω() ij,0 (v i v j2 γ 0 ) f i0 L σ (v s v i ) ( y s0 ) [y s2 ( y s ) p i0 ] (A.) where ξ i (y i, v i, p i ) with y i (y i0, y i, y i2 ), v i (v i, v i2 ) and p i (p i0, p i ). By Lemma 3. in Powell, Stock and Stocker (989), we can write (A.) as θ n + n n i= 3 l= [r(l) n (ξ i ) θ n ] + o p (n /2 ) where r (l) n (ξ) E[ϕ n (ξ, ξ 2, ξ 3 ) ξ l = ξ] and θ n E[ϕ n (ξ i, ξ j, ξ l )]. Using change of variables and Taylor expansion, as well as the smooth conditions on the kernel L(.), we can show that θ n = E[r n () (ξ i )] = o(n /2 ). Furthermore by the Dominated Convergence Theorem, the unconditional variance of r n () (ξ i ) is o() under maintained assumptions. It then follows from the Chebyshev s Inequality that n [ ] r (l) n i= n (ξ i ) θ n = o p (n /2 ) for l =, 2. By an argument similar to Chen, Khan, and Tang (206), under Assumptions A2-A8, the thirdorder U-statistic in (A.) has the following representation: n n Γ 0 {y i2 ( y i )) E[y i2 ( y i ) v i ]} + o p (n /2 ) i= (A.2) where Γ 0 is the limit of the following expectation as n and h, h 2 0: [ ( ) ( ) ] E K pi0 q j0 vj v i2 K 2h ( y i0 )( y j0 )(v i v j2 γ 0 ). h 2 h 2 h That is, Γ 0 = E[( y i0 )( y j0 )H(y i0, y j0 )] with H(y i0, y j0 ) G 0 (q, q, v i2, v i2, y i0 )g 0 (v i2, q y j0 )dqdv i2, where G 0 (p, q, v i2, v j ) {f 0(p, v i2 y i0 ) [ p 0 (p, v i2 ) q 0 (q, v j ) γ 0 ]} ; p h 2 with p 0 (t, v i2 ) inf{v i : p 0 (v i, v i2 ) t}; q 0 (t, v j ) inf{v j2 : q 0 (v j, v j2 ) t}; and f 0 and g 0 denote the joint density of (p i0, v i2 ) and (v j, q j0 ) conditional on y i0 and y j0 respectively. 22

By an analogous argument, the linear representation of ω () ij,0 n(n ) (v i v j2 γ 0 )(ˆq j0 q i0 ). i j is similar to (A.2), only with y i2 ( y i ) replaced by y i ( y i2 ). Combining these results, we get the following linear representation of (A.7) as n n Γ 0 [y i2 y i E (y i2 y i v i )] + o p (n /2 ). i= We can use identical arguments to the second part of the numerator: n(n ) ω ij,(v i2 v j γ 0 ) i j and derive the following asymptotic linear representation: n n Γ [y i2 y i E (y i2 y i v i )] + o p (n /2 ) i= where Γ = E[y i0 y j0 H(yi0, y j0 )] with and H(y i0, y j0 ) G (q, q, v i, v i, y i0 )g (q, v i y j0 )dqdv i G (p, q, v i, v j2 ) {f (v i, p y i0 ) [ p (v i, p) q (v j2, q) γ 0 ]} p and p (v i, t) inf{v i2 : p (v i, v i2 ) t} and q (v j2, t) inf{v j : q (v j, v j2 ) t}; and f (.,. y i0 ) and g (.,. y j0 ) denote the density of (v i, p i ) and (q j, v j2 ) conditional on y i0 and y j0 respectively. Gathering these results (i.e., the probability limit of the denominator and the linear representation of the numerator), we get the following asymptotic linear representation of our closed-form estimator: where ( ˆγ CF γ 0 = Σ n ) n χ i + o p (n /2 ) i= χ i (Γ 0 + Γ ) [ y i E ( y i v i )], (A.3) with y i y i2 y i. 23

A.2 Weighted Maximum Rank Correlation Estimator Recall the weighted maximum rank correlation estimator is ˆγ MR arg max γ n(n ) [ ω ij,0g ij,0 (γ) + ω ij, G ij, (γ)], j i (A.4) where G n,0 (γ) {d i,0 > d j,0 }{v j2 + γ > v i } + {d i,0 < d j,0 }{v j2 + γ < v i } G n, (γ) {d i,0 > d j,0 }{v i2 > v j + γ} + {d i,0 < d j,0 }{v i2 < v j + γ} with d i,0 ( y i )y i2, d j,0 y j ( y j2 ), (A.5) ω ij,0 K h (v j v i2 )( y i0 )( y j0 ), ω ij, K h (v j2 v i )y i0 y j0 and K h (.) h K( ḣ ) being shorthand for kernel smoothing. Before stating the regularity conditions, we define the following functions: G r (v i, v j, y i0, y j0 ) = E[d i,0 d j,0 ] v i, v j, y 0i, y 0j ] ˆΥ n (γ) = n(n ) ij,0g ij,0 (γ) + ω ij, G ij, (γ)] j i Υ n (γ, y 0i, y 0j, v i, v j ) = E[ ˆΥ n (γ) y 0i, v i, y 0j, v j ] Ῡ n (γ, y 0i, v i ) = E[Υ n (γ, y 0i, v i, y 0j, v j ) y 0i, v i ] Ῡ 2n (γ, y 0j, v j ) = E[Υ n (γ, y 0i, v i, y 0j, v j ) y 0j, v j ] Υ n (γ) = E[Υ n (γ, y 0i, v i, y 0j, v j )] Υ 0 (γ) = lim n Υ n(γ) Throughout this part of the appendix, we maintain that the true parameter γ 0 lies in the interior of Ξ 0, a compact parameter space (interval) on the real line. Our arguments for the limiting distribution theory for the rank estimator are based on the following conditions: Assumption MR. The constant Σ 0, (defined formally in (A.9)) is positive and finite. Assumption MR.2 (Kernel for matching) (i) Let K( ) where K has compact support, is symmetric around 0, integrate to, is twice continuously differentiable and of order M. (ii) sup t R h K (t/h), is O() as h 0. Assumption MR.3 (Bandwidth for matching) nh M n 0 and nh n. 24