Irregular Identification, Support Conditions, and Inverse Weight Estimation


Shakeeb Khan and Elie Tamer

This version: Jan. 30, 2009

ABSTRACT. This paper points out a subtle, sometimes delicate property of estimators based on weighted moment condition models: there is an intimate link between identification and estimability. If it is necessary for (point) identification that the weights take arbitrarily large values, then the parameter of interest, though point identified, cannot be estimated at the regular (parametric) rate. This rate depends on relative tail conditions and can be as slow as $n^{1/4}$. Practically speaking, this nonstandard rate of convergence can lead to numerical instability and/or large standard errors. We examine two examples where a parameter is defined by a weighted moment condition with the above property: 1) the binary response model under a mean restriction introduced by Lewbel (1997) and further generalized to cover endogeneity and selection, where the estimator in this class of models is weighted by the density of a special regressor; and 2) the treatment effect model under exogenous selection, where the resulting estimator of the average treatment effect (ATE) is weighted by a variant of the propensity score. Without support conditions (which can be strong), these models, like well-known identified at infinity models, lead to estimators that converge at slower than the parametric rate, since essentially, to ensure point identification, one requires some variables to take values on sets with arbitrarily small probabilities, or thin sets. For the two models above, we derive rates of convergence under different tail conditions and also propose rate adaptive inference procedures, analogous to Andrews and Schafgans (1998).

JEL Classification: C14, C25, C13.

Key Words: Irregular and robust identification, inverse weighting, rates of convergence.

We thank a Co-editor, 3 referees, J. Powell and seminar participants at Brown, UC-Berkeley, U Chicago, Duke, Harvard/MIT, Indiana, Simon Fraser, Stanford, UBC, University of Toronto, UW-Madison, UWO, and Yale, as well as conference participants at the 2007 NASM at Duke University and 2008 NAWM in New Orleans, for helpful comments. Department of Economics, Duke University, 213 Social Sciences Building, Durham, NC. Phone: (919). Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL. Phone: (847). Support from the National Science Foundation is gratefully acknowledged.

1 Introduction

The identification problem in econometrics is a first step that one must explore and carefully analyze before any meaningful inferences can be reached. Identification clarifies what can be learned about a parameter of interest in a given model. There is a class of models in econometrics, mainly arising in limited dependent variable models, that attain identification by requiring that covariates take support in regions with arbitrarily small probability mass. These identification strategies sometimes lead to estimators that are weighted by a density or a conditional probability, and these weights take arbitrarily large values on these regions of small mass. For example, an estimator that is weighted by the inverse of the density of a continuous random variable effectively only uses observations for which that density is small. Taking values on these thin sets is essential (often necessary) for point identification in these models. We explore the relationship between the inverse weighting method in these models and the rates of convergence of the resulting estimators, and show that under general conditions it is often not possible to attain the regular parametric rate (square root of the sample size) with these models. Furthermore, these rates of convergence depend on unknown relative tail behavior of observed and unobserved random variables, making inference more complicated than in standard problems.

We label these identification strategies as irregular in the sense that estimators based on them do not converge at the parametric (square root of the sample size) rate. Consequently we argue that these strategies belong to the class of identified at infinity approaches (see Chamberlain (1986) and Heckman (1990a)). Note that it is part of econometrics folklore to equate the parameter being identified at infinity with slow rates of convergence, as was also established in Chamberlain (1986) for some particular models. See also Andrews and Schafgans (1998). We show that the actual rates of convergence for estimators based on these identification strategies vary substantially with tail conditions on the underlying variables. The rate of convergence is an explicit function of the relative tail behavior. This suggests the need for a rate adaptive inference procedure for this class of models, and this paper aims to address this by proposing and establishing the asymptotic properties of studentized estimators.

The results in this paper are connected to two models that we examine in detail. In these models, the parameter of interest can be written as an expectation of functions of observed random variables. First, we consider the binary choice model under an exclusion restriction and mean restrictions. This model was introduced to the literature by Lewbel (1997), Lewbel (1998) and Lewbel (2000). There, Lewbel demonstrates that this binary model is point identified with only a mean restriction on the unobservables, by requiring the presence of a special regressor that is conditionally independent of the error.

Under these conditions, Lewbel provides a density weighted estimator for the finite dimensional parameter (including the constant term). This estimator contains a random denominator that can take arbitrarily small values in regions that are necessary for point identification. We show that the parameters in a simple version of that model are irregularly identified unless special relative tail behavior conditions are imposed. Two such conditions are: a support condition such that the propensity score hits its limiting values with positive measure, in which case root-n estimation can be attained; or infinite moment restrictions on the regressors. As we show below, the optimal rate of convergence in these cases depends on the tail behavior of the special regressor relative to the error distribution.

Second, we consider the important treatment effects model under binary treatment and exogenous selection. This is an important model that is widely used in economics and statistics to estimate the average treatment effect (ATE), or the treatment on the treated (ATOT) parameter, in program evaluations. For an exhaustive review of this literature, see Imbens (2004). Hahn (1998), in important work, derived the efficiency bound for this model and provided an estimator that reaches that bound (see also Hirano, Imbens, and Ridder (2003)). In the case where the covariates have relatively large support, the propensity score gets arbitrarily close to zero or one (as opposed to being bounded away from zero and one).⁴ We show that the average treatment effect in this case can be irregularly identified (the semiparametric efficiency bound can be infinite), resulting in non-regular rates of convergence for the ATE in general: the optimal rates will depend on the relative tail thickness of the error in the treatment equation and the covariates. For empirical users, this results in instability of the estimator in cases where there is limited overlap, and care should be taken in making inference on these parameters. Busso, DiNardo, and McCrary (2008), in an important recent paper, highlight this instability in extensive Monte Carlo simulations where the ATE and ATOT estimators appear to exhibit bias at moderate sample sizes. See also Frolich (2004). Also, in Appendix A.1, we present impossibility theorems, analogous to those proposed in Chamberlain (1986) for sample selection and binary choice models, which show that generally no regular estimator (see Newey (1990)) exists for these models.

The next section formally defines the concepts of irregular and robust identification and relates them to the two models we study in detail in sections 3 and 4. Specifically, section 3 considers the binary choice model under mean restrictions, and section 4 considers the average treatment effect.

Footnote 4: The identification strategy for these models is based on writing the estimand as a weighted sum where the weights can get arbitrarily small. This intuition applies equally to a matching strategy, since a matching estimator can be written as a weighted estimator where weights can get small.

In each of these sections we also derive optimal rates of convergence for estimators of the parameters of interest that are sample analogs of the moment conditions used to identify these parameters, and propose rate adaptive inference procedures. Section 5 concludes by summarizing and suggesting areas for future research.

2 Irregular Identification and Inverse Weighting

The notion of regularity of statistical models is linked to efficient estimation of parameters; see, e.g., Stein (1956). The concept of irregular identification has been related to consistent estimation of the parameters of interest at the parametric rate. An important early reference to irregular identification in the literature appears in Chamberlain (1986), dealing with efficiency bounds for sample selection models.⁵ There, Chamberlain showed that even though a slope parameter in the selection equation is identified, it is not possible to estimate it at the parametric root-n rate (under the assumptions he maintains in the paper). Chamberlain added that point identification in this case relies on the behavior of the regression function at infinity. In essence, since the disturbance terms are allowed to take support on the real line, one requires that the regression function take large values so that in the limit the propensity score is equal to one (or is arbitrarily close to one). The identified set in the model shrinks to a point when the propensity score hits one (or no selection). Heckman (1990b) highlighted the importance of identified at infinity parameters in various selection models and linked this type of identification essentially to the propensity score. Andrews and Schafgans (1998) used an estimator proposed by Heckman (1990a) and confirmed slow rates of convergence of the estimator for the constant, and also showed that this rate depends on the thickness of the tails of the regression function distribution relative to that of the error term. This rate can be as slow as cube root and can be arbitrarily close to root n.

The class of models we consider in this paper are ones where the parameter of interest can be written as a closed form solution to a weighted moment condition. The weight in these models can take arbitrarily small values, which causes point identification to be fragile. This will mean that the rate of convergence is generally slower than the regular rate. The exact rate is determined by relative tail conditions on observed and unobserved variables. In some models, not only is point identification delicate, but the model will not have any information about the true parameter, i.e., the identified set is the whole parameter space. This leads to what we call non-robust irregular identification, in the sense that one either learns the parameter precisely, or the model provides no information about the parameter of interest.

Footnote 5: See also Newey (1990) for other impossibility theorems and notions of regular and irregular estimators.

We view this lack of robustness as an added drawback of inference in this class of models.

Heuristically, we consider models where the parameter of interest can be written as an expectation of some function of observed variables,

$$\theta_0 = E[g(y_i, x_i)] \qquad (2.1)$$

where $\theta_0$ is a finite dimensional parameter of interest and $(y_i, x_i)$ is a finite dimensional vector of observed random variables. In this paper we are particularly interested in the class of models above where

$$E[\|g(y_i, x_i)\|^2] = \infty \qquad (2.2)$$

where $\|\cdot\|$ denotes the Euclidean norm. As this paper will show, many recently proposed identification strategies based on inverse weighting fall into this class, as the inverse weight function can become arbitrarily large when the conditioning variables take the limiting values of their support. As we will show in this paper, identification of $\theta_0$ in these cases is irregular as defined below:

Definition 2.1 (Irregular Identification of a parameter vector) We say that a parameter vector is irregularly point identified if it cannot be estimated regularly⁶ at the parametric rate.

Furthermore, in this paper we will distinguish between two classes of irregularly identified models, robust and non-robust, defined as follows:

Definition 2.2 (Robust Irregular Identification of a parameter vector) We say that a parameter vector is robustly irregularly point identified if it is irregularly identified as defined above, yet the function $g(\cdot)$ in (2.1) is bounded on the support of its arguments.

We note that this definition is a special case of the classical definition of robustness (see, e.g., Hampel (1986)), where we focus attention on irregularly identified models. The rest of this paper will explore the statistical properties of estimators of models based on (2.1). As we will prove, these estimators generally converge at a rate slower than the standard parametric rate, and their optimal rate depends on relative tail conditions on the random variable $x_i$ and the unobserved components also governing the behavior of $y_i$. This general setting will encompass many models in econometrics, but for illustrative purposes, our discussion will focus on two important models in the recent literature that are based on weighted moment conditions: the binary response model under mean restrictions, and the average treatment effect estimator under conditional independence.

Footnote 6: See, e.g., Newey (1990) for the conditions for an estimator to be regular.

2.1 Preview of Approach

The key problem with an empirical analog estimator of $\theta_0$,

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} g(y_i, x_i) \qquad (2.3)$$

is that although it is consistent (assume $\theta_0 = 0$), it has the unattractive property that $E[\|g(y_i, x_i)\|^2] = \infty$, which does not allow us to use a standard CLT to conclude that $\hat{\theta}$ is root-n consistent and asymptotically normal. One approach is to introduce a trimming sequence $\tau_{ni} = \tau(y_i, x_i, n) > 0$ a.s., with $\tau_{ni} \to 1$, such that

$$E[\tau_{ni}^2 \|g(y_i, x_i)\|^2] \equiv \gamma_n < \infty \qquad (2.4)$$

But note that since $\tau_{ni}$ is converging to 1, $\gamma_n$ will generally diverge to infinity as the sample size increases. Thus we can explore the asymptotic properties of the trimmed estimator:

$$\hat{\theta}_n = \frac{1}{n}\sum_{i=1}^{n} \tau_{ni}\, g(y_i, x_i) \qquad (2.5)$$

This trimming, which disregards observations in the sample where the value of the denominator is small, will introduce a bias which has to be taken into account. So, ultimately, for the binary response model and the ATE model we consider, we address the following questions.

1. What is the optimal rate of convergence for $\hat{\theta}_n$?

2. What is the limiting distribution of $\hat{\theta}_n$?

3. How does one conduct inference for $\theta_0$ given that the rate of convergence of its estimator will generally be unknown?

Addressing the first question is completely analogous to deriving optimal rates for nonparametric estimation. That is, the variance and bias are inversely related in the sense that here $\tau_{ni}$ controls the variance, but at the expense of inducing bias. Therefore, as in nonparametric estimation, we can solve for an optimal sequence $\tau_{ni}^*$ that balances the bias and the variance.

To address the second question, we can apply the Lindeberg theorem to the centered trimmed estimator, which should hold under a version of a Lindeberg condition. Finally, to address the third question, our proposed approach to conducting inference based on $\hat{\theta}_n$ will be to studentize the estimator. As we will show, under our conditions the t-statistic will be rate adaptive in the sense that it will converge in distribution to a normal regardless of the rate of convergence of $\hat{\theta}$. We next apply this framework to specific models, beginning with the binary choice model. Next, we examine the average treatment effect model under conditional independence.

3 Binary Choice with Mean Restrictions

In the standard binary response model

$$y = 1[x'\beta + \epsilon \geq 0]$$

Manski (1988) showed that a conditional mean restriction does not identify the parameter $\beta$. In fact, he showed that the mean independence assumption is not strong enough in binary response models to even bound $\beta$. Hence, we modify this model by adding more assumptions to ensure point identification. The ensuing model is a simplified version of the one introduced by Lewbel (1997):

$$y_i = 1[\alpha + v_i - \epsilon_i \geq 0] \qquad (3.1)$$

where $v_i$ is a scalar random variable that is independent of $\epsilon_i$, both $\epsilon_i$ and $v_i$ have support on the real line, and $E[\epsilon_i] = 0$. We observe both $y$ and $v$, and the object of interest is the parameter $\alpha$. The location restriction on $\epsilon$ point identifies $\alpha$. We start with the following:

$$P(y_i = 1 \mid v_i) = F_\epsilon(\alpha + v_i) \qquad (3.2)$$

where $F_\epsilon(\cdot)$ is the cdf of $\epsilon$, which we assume is a strictly increasing function. Lewbel (1997) derived the following relation:

$$\alpha = E\left[(y_i - I[v_i > 0])\,\frac{1}{f(v_i)}\right] \equiv E\big[(y_i - I[v_i > 0])\,w(v_i)\big] \qquad (3.3)$$

with the weight function $w(v_i) = \frac{1}{f(v_i)}$, where $f(\cdot)$ here denotes the density of $v_i$. This type of identification is sensitive to conditions imposed on the support of $v$ and $\epsilon$. For example, identification of $\alpha$ is lost when one excludes sets of $v$ of arbitrarily small probability when $v$ is in the tails.

The model in this case will not contain any information about $\alpha$ (trivial bounds). This is a case of non-robust or thin-set identification. To see this, note that if we restrict the support of $v$ to lie in the set $[-K, K]$ for some $K > 0$, $\alpha$ will not be point identified:

$$\alpha = -\Bigg[\underbrace{\int_{-\infty}^{-K} v\, f_\epsilon(\alpha + v)\,dv}_{(1)} + \underbrace{\int_{-K}^{K} v\, \frac{\partial}{\partial v} P(y = 1 \mid v)\,dv}_{(2)} + \underbrace{\int_{K}^{\infty} v\, f_\epsilon(\alpha + v)\,dv}_{(3)}\Bigg]$$

We see that only (2) is identified, while (1) and (3) are not since, from (3.2), we can only learn $F_\epsilon$ on $[\alpha - K, \alpha + K]$. So, no matter how large $K$ is, the model is only set identified. In addition, we see that one can easily choose the unidentified portion of $f_\epsilon(\cdot)$ (parts (1) and (3) above) in such a way that the model with these (strange) support conditions provides no information about $\alpha$. This was shown for the Lewbel model by Magnac and Maurin (2005) (we view this property as similar to the non-robustness of the sample mean).

To guarantee point identification when $\epsilon$ has support on the real line $(-\infty, +\infty)$, $v$ is also required to take arbitrarily large values. This means that the density of $v_i$ vanishes at these values, implying that there is a set of arbitrarily small measure, in this case $|v| > M$ for an arbitrarily large constant $M$, where the weight function in (3.3) becomes unbounded. This suggests that identification of $\alpha$ based on (3.3) is irregular in the sense that an analog estimator will not generally converge at the parametric rate.⁷ ⁸ We confirm this later⁹ by showing how the optimal rates of the analog estimator can vary widely with tail behavior conditions on observed and unobserved variables. This nonuniform behavior of the analog estimator makes conducting inference difficult, and motivates our proposed rate adaptive inference procedure, which we explain in the following section.

Footnote 7: This is a property shared by the estimator in Heckman (1990b) and Andrews and Schafgans (1998) for an estimator of the intercept term in a sample selection model. As is the case in that setting, the slower rate attained does not necessarily imply inefficiency of the analog estimator. In fact the Appendix derives a result analogous to an impossibility theorem in Chamberlain (1986).

Footnote 8: In cases where $\epsilon$ has bounded support, it is possible that $\alpha$ is identified regularly, if the support of $v$ is large relative to that of $\epsilon$. One can check this easily since for those large values of $v$ the choice probability hits 0 or 1.

Footnote 9: We also derive the efficiency bounds for this model in the Appendix.
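To fix ideas, the following small simulation (a sketch that is not part of the paper's analysis; it assumes standard logistic $\epsilon_i$ and $v_i$, sets $\alpha = 0.5$, and uses the true density of $v$, i.e., the infeasible weight) illustrates how the moment in (3.3) recovers $\alpha$ while its summands carry arbitrarily large inverse-density weights in the tails, and how the trimming indicator $I[|v_i| \leq \gamma_{2n}]$ introduced in the next section keeps those weights under control.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 100_000, 0.5

# Simplified Lewbel design (3.1): y = 1[alpha + v - eps >= 0], eps and v standard logistic.
v = rng.logistic(size=n)
eps = rng.logistic(size=n)
y = (alpha + v - eps >= 0).astype(float)

# Infeasible weight 1/f(v) using the true logistic density of v (in practice f is estimated).
f_v = np.exp(-v) / (1.0 + np.exp(-v)) ** 2
g = (y - (v > 0)) / f_v                    # summand of the weighted moment (3.3)

print("untrimmed estimate of alpha:", g.mean())
print("sample second moment of summand:", np.mean(g ** 2))  # grows with n: E[g^2] is infinite here

# Trimmed version, anticipating (3.4): drop observations with |v| > gamma_2n.
for gamma in (2.0, 4.0, 8.0):
    keep = np.abs(v) <= gamma
    print(f"gamma_2n = {gamma}: trimmed estimate = {np.mean(g * keep):.3f}, "
          f"largest retained weight = {np.max(1.0 / f_v[keep]):.1f}")
```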

3.1 Rate Adaptive Inference in the Binary Choice Model

We propose the use of rate-adaptive inference procedures to estimate the binary response model above. This is similar to what Andrews and Schafgans (1998) did for the selection model. Let $\hat{\alpha}_n$ denote the trimmed variant of the estimator proposed in Lewbel (1997) for the intercept term:

$$\hat{\alpha}_n = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i - I[v_i > 0]}{\hat{f}(v_i)}\, I[|v_i| \leq \gamma_{2n}] \qquad (3.4)$$

where $\hat{f}$ is a kernel estimator of the density of $v$ (details of the kernel estimation procedure can be found in the Appendix). The estimator includes the additional trimming term $I[|v_i| \leq \gamma_{2n}]$, where $\gamma_{2n}$ is a deterministic sequence of numbers that satisfies $\lim_{n} \gamma_{2n} = \infty$ and $\lim_{n} \gamma_{2n}/n = 0$. Effectively, this extra term helps govern tail behavior. Trimming this way suffices (under the assumptions in Lewbel (1997)) to deal with the denominator problem associated with the density function getting small, which, under the stated conditions in Lewbel (1997), only happens in the tails of $v_i$. Let $\hat{S}_n$ denote a trimmed estimator of what would be the asymptotic variance if conditions were such that the asymptotic variance were finite:

$$\hat{S}_n = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{p}(v_i)(1 - \hat{p}(v_i))}{\hat{f}(v_i)^2}\, I[|v_i| \leq \gamma_{2n}] \qquad (3.5)$$

where $\hat{p}(\cdot)$ denotes a kernel estimator of the choice probability function (or the propensity score). Our main result in this section is that the studentized estimator converges to a standard normal distribution. We consider this result to be rate-adaptive in the sense that the limiting distribution holds regardless of the rate of convergence of the un-studentized estimator. We state this formally as a theorem, which first requires the following definitions. Define

$$v(\gamma_{2n}) = E\left[\frac{p(v_i)(1 - p(v_i))}{f(v_i)^2}\, I[|v_i| \leq \gamma_{2n}]\right] \qquad (3.6)$$

and

$$b(\gamma_{2n}) = \alpha_0 - E\left[\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}]\right] \qquad (3.7)$$

$$X_{ni} = \frac{y_i - p(v_i)}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] \qquad (3.8)$$

The first piece, $v(\gamma_{2n})$, is the variance of the infeasible estimator where in (3.4) the true $f(\cdot)$ is used. This term is of the same order as the variance of the feasible estimator (see the Appendix).

Similarly, the term in (3.7) is the bias term for the infeasible estimator. The form of $X_{ni}$ is the linear representation of $\hat{\alpha}_n$, which involves first linearizing the estimator around $\hat{f}$ and then taking the usual U-statistic projections. See Section A.4 of the Appendix for details on this.

Theorem 3.1 Suppose (i) $\gamma_{2n} \to \infty$, (ii) for all $\epsilon > 0$,

$$\lim_{n \to \infty} \frac{1}{v(\gamma_{2n})}\, E\Big[X_{ni}^2\, I\big[|X_{ni}| > \epsilon \sqrt{n\, v(\gamma_{2n})}\big]\Big] = 0 \qquad (3.9)$$

and (iii) $\sqrt{n / v(\gamma_{2n})}\; b(\gamma_{2n}) \to 0$. Then

$$\sqrt{n}\, \hat{S}_n^{-1/2}(\hat{\alpha}_n - \alpha_0) \rightarrow_d N(0, 1) \qquad (3.10)$$

The proof is left to the Appendix and here we only comment on the conditions. Condition (ii) is a Lindeberg condition that allows us to apply a central limit theorem. Condition (iii), on the other hand, is needed to ensure that the bias vanishes with the sample size. Other conditions on the kernel and the other smoothing parameters are given in Section A.3 of the Appendix. This result is analogous to the results in Andrews and Schafgans (1998), who considered asymptotic properties of the identification at infinity estimator proposed in Heckman (1990a). Note that the rate of convergence of this estimator depends on the behavior of $\hat{S}_n$. For example, in cases where $\hat{S}_n$ converges in probability to a non-random matrix, the estimator above will converge at the regular root-n rate. However, in some interesting cases that we go over below, the variance of the estimator goes to infinity, which results in slower than regular rates. Finally, for a model with regressors $x$, the estimator for the vector of parameters (which includes both slope and intercept) can be obtained in closed form (see Lewbel (1997)):

$$\hat{\beta} = \left\{\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right\}^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i\, \frac{y_i - 1[v_i \geq 0]}{\hat{f}(v_i \mid x_i)}\, I[|v_i| \leq \gamma_{2n}]$$

and so the studentized estimator is similar to the one above.

3.2 Relative Tail Behavior and Rates of Convergence in Special Cases

In this section, we investigate further the rate of convergence of the estimator in the previous section. This rate of convergence depends on the relative tail behavior of the density of $v$ versus the tails of the propensity score, or the rate at which the propensity score approaches 0 and 1.

In generic cases, the rate of convergence for the estimator of the intercept term in (3.1) is slower than the parametric rate. The result holds for any model where the tail of the special regressor $v$ is as thin as or thinner than the tail of the error term. We also point out that there are cases (when the moment conditions in the above theorem are violated) where the rate of convergence reaches the regular parametric rate. As we show below, when $v_i$ is Cauchy, the estimator of $\alpha$ converges at the root-n rate.¹⁰ Here, we focus on the rate of convergence for the infeasible estimator. This rate of convergence is of the same order as that of the feasible estimator (see the Appendix for more on this). Consider the infeasible version of the estimator of $\alpha$ proposed by Lewbel (1997):

$$\hat{\alpha} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] \qquad (3.11)$$

Next we define $\bar{\alpha}_n = E[\hat{\alpha}]$. In what follows we will establish a rate of convergence and limiting distribution for $\hat{\alpha} - \bar{\alpha}_n$. To do so we first define the sequence of constants $v(\gamma_{2n}) = \mathrm{Var}\left(\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}]\right)$, and in what follows we let $h_n = v(\gamma_{2n})^{-1}$. As alluded to in our previous discussion, we will generally have $\lim_{n} v(\gamma_{2n}) = \infty$, which will result in a slower rate of convergence. To derive the distribution of the above, we will apply the Lindeberg condition to the following triangular array:

$$\sqrt{n h_n}\,(\hat{\alpha} - \bar{\alpha}_n) = \sqrt{\frac{h_n}{n}} \sum_{i=1}^{n} \left(\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] - \bar{\alpha}_n\right) \qquad (3.12)$$

Before claiming asymptotic normality of the above term we verify that the Lindeberg condition for the array is indeed satisfied. Specifically, we need to establish the limit of

$$h_n\, E\left[\left(\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] - \bar{\alpha}_n\right)^2 I\left[\left|\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] - \bar{\alpha}_n\right| > \epsilon\sqrt{\frac{n}{h_n}}\right]\right] \qquad (3.13)$$

for $\epsilon > 0$. The above limit is indeed zero since $h_n E\left[\left(\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] - \bar{\alpha}_n\right)^2\right] = 1$ and $n/h_n \to \infty$. Therefore, we can conclude that

$$\sqrt{n h_n}\,(\hat{\alpha} - \bar{\alpha}_n) \rightarrow_d N(0, 1) \qquad (3.14)$$

Footnote 10: The notion of attaining a faster rate of convergence when moments are infinite is not new. For example, it is well known that when regressors have a Cauchy distribution in the basic linear model, OLS is superconsistent.
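The pieces above can be combined into a feasible version of the trimmed estimator together with its studentized standard error. The sketch below is illustrative only: the leave-one-out Gaussian kernels and the rule-of-thumb bandwidth are assumptions of the sketch, not the kernel conditions stated in the paper's Appendix, and the pairwise implementation is $O(n^2)$, intended for modest sample sizes.

```python
import numpy as np

def lewbel_trimmed(y, v, gamma_2n, bandwidth=None):
    """Trimmed density-weighted intercept estimator, eq. (3.4), with the variance
    estimate of eq. (3.5). Gaussian leave-one-out kernels and the rule-of-thumb
    bandwidth are assumptions of this sketch."""
    n = len(v)
    h = bandwidth if bandwidth is not None else 1.06 * v.std() * n ** (-0.2)

    u = (v[:, None] - v[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(K, 0.0)
    f_hat = K.sum(axis=1) / ((n - 1) * h)                    # kernel density of v
    p_hat = (K * y[None, :]).sum(axis=1) / K.sum(axis=1)     # kernel propensity p(v)

    trim = (np.abs(v) <= gamma_2n).astype(float)
    alpha_hat = np.mean((y - (v > 0)) / f_hat * trim)            # eq. (3.4)
    S_hat = np.mean(p_hat * (1.0 - p_hat) / f_hat ** 2 * trim)   # eq. (3.5)
    se = np.sqrt(S_hat / n)   # Theorem 3.1: (alpha_hat - alpha_0)/se is approx. N(0,1)
    return alpha_hat, se
```

A confidence interval of the form $\hat{\alpha}_n \pm 1.96\,\mathrm{se}$ is then asymptotically valid whether or not $\hat{S}_n$ diverges, which is the rate-adaptive feature of Theorem 3.1.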

So we have established that the rate of convergence of the centered estimator is governed by the limiting behavior of $h_n$. As we will show below, in most cases $h_n \to 0$, resulting in a slower rate of convergence, as is to be expected given our efficiency bound calculations in A.1. Of course, the above calculation only derives rates for the centered estimator, and not the estimator itself. To complete the distribution theory we need to derive the rate of convergence of the bias term $\sqrt{n h_n}\,(\bar{\alpha}_n - \alpha_0)$, where $\alpha_0 = E\left[\frac{y_i - I[v_i > 0]}{f(v_i)}\right]$, so we have:

$$b_n \equiv \bar{\alpha}_n - \alpha_0 = -\int_{|v| > \gamma_{2n}} \big(p(v) - 1[v > 0]\big)\,dv \qquad (3.15)$$

where $p(v) = P(y_i = 1 \mid v_i = v)$ is the propensity score. Clearly we have $\lim_n b_n = 0$ if we maintain that $\lim_n \gamma_{2n} = \infty$. There we can see that the limiting distribution of the estimator can be characterized by two components: the variance, which depends on the limiting ratio of the propensity score divided by the regressor density, and the bias, which depends on the rate at which the propensity score converges to 1.

Our results are somewhat abstract as stated since we have not yet stated conditions on $\gamma_{2n}$ except that it increases to infinity. Nor have we stated what the sequences $h_n$ and $b_n$ look like as functions of $\gamma_{2n}$. We show now how they relate to tail behavior assumptions on the regressor distribution and the latent error term. First we calculate the form of the variance term $v(\gamma_{2n})$ as a function of $\gamma_{2n}$. Note we can express $v(\gamma_{2n})$ as the following integral:

$$v(\gamma_{2n}) = \int_{-\gamma_{2n}}^{\gamma_{2n}} \frac{p(v)(1 - p(v))}{f(v)}\,dv + \mathrm{Var}\left(E\left[\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] \,\Big|\, v_i\right] - \bar{\alpha}_n\right) \qquad (3.16)$$

We will focus on the first term because we will now show the second term is negligible when compared to the first. The second term in the variance is of the form:

$$\mathrm{Var}\left(E\left[\frac{y_i - I[v_i > 0]}{f(v_i)}\, I[|v_i| \leq \gamma_{2n}] \,\Big|\, v_i\right] - \bar{\alpha}_n\right) \qquad (3.17)$$

The above variance is of the form:

$$E\big[(\bar{\alpha}_n(v_i) - \bar{\alpha}_n)^2\big] \qquad (3.18)$$

where $\bar{\alpha}_n(v_i)$ denotes the conditional expectation in (3.17).

Note that since $\bar{\alpha}_n$ converges to $\alpha_0$ and $E[\bar{\alpha}_n(v_i)] = \bar{\alpha}_n$, the only term in the above expression that may diverge is

$$E[\bar{\alpha}_n(v_i)^2] \qquad (3.19)$$

This term can be expressed as the integral:

$$\int_{-\gamma_{2n}}^{\gamma_{2n}} \frac{(1 - p(v))^2}{f(v)}\,dv \qquad (3.20)$$

which will generally be dominated by the first piece of the variance term in (3.16), which recall was of the form:

$$\int_{-\gamma_{2n}}^{\gamma_{2n}} \frac{p(v)(1 - p(v))}{f(v)}\,dv \qquad (3.21)$$

So generally speaking, as far as deriving the optimal rate of convergence for the estimator is concerned, we can ignore this component of the variance. We can show that the bias term is

$$b_n = E[\hat{\alpha}] - \alpha = \int_{-\gamma_{2n}}^{\gamma_{2n}} \{p(v + \alpha) - 1[v \geq 0]\}\,dv - \alpha \qquad (3.22)$$

This bias term behaves asymptotically like:

$$b_n \asymp \gamma_{2n}\,(1 - p(\gamma_{2n})) \qquad (3.23)$$

Note that the bias term does not directly depend on the density of $v_i$. With these general results, we now calculate the rate of convergence for some special cases corresponding to various tail behavior conditions on the regressors and the error terms.

- [Logit Errors/Logit Regressors] Here we assume the latent error term and the regressors both have a standard logistic distribution. We consider this to be the main example, as the results that arise here (a slower than root-n rate) will generally also arise anytime we have distributions whose tails decline exponentially, such as the normal, logistic and Laplace distributions. From our calculations in the previous section we can see that

$$v(\gamma_{2n}) = \gamma_{2n} \quad \text{and} \quad b_n = \ln\frac{1 + e^{\gamma_{2n} + \alpha}}{1 + e^{-\gamma_{2n} + \alpha}} - \ln\big(e^{\gamma_{2n} + \alpha}\big) \qquad (3.24)$$

These behave asymptotically like

$$v(\gamma_{2n}) = \gamma_{2n} \quad \text{and} \quad b_n = \gamma_{2n}\,\frac{\exp(-\gamma_{2n})}{1 + \exp(-\gamma_{2n})} \qquad (3.25)$$

So to minimize mean squared error we set $\frac{\gamma_{2n}}{n} = \frac{\gamma_{2n}^2 \exp(-2\gamma_{2n})}{(1 + \exp(-\gamma_{2n}))^2}$ to get $\gamma_{2n} = O(\log n^{1/2})$, resulting in the rate of convergence of the estimator:

$$\sqrt{\frac{n}{\log n^{1/2}}}\,(\hat{\alpha} - \alpha_0) = O_p(1) \qquad (3.26)$$

Furthermore, from the results in the previous section, the estimator will be asymptotically normally distributed, and have a limiting bias.

- [Normal Errors/Normal Regressors] Here we assume that the latent error term and the regressors both have a standard normal distribution. To calculate $v(\gamma_{2n})$ in this case, we can use the asymptotic properties of the Mills ratio (see, e.g., Gordon (1941, AMS, 12)), $r(\cdot)$:

$$r(t) \sim \frac{1}{t} \quad \text{as } t \to \infty \qquad (3.27)$$

to approximate $v(\gamma_{2n}) \asymp \log(\gamma_{2n})$. For this case we have $b_n \approx \gamma_{2n} \exp(-\gamma_{2n}^2/2)$. So to minimize MSE, we set $\gamma_{2n} = \sqrt{\log n}$ and the estimator is $O_p\!\left(\sqrt{\frac{\log(\log n)}{n}}\right)$.

- [Logit Errors/Normal Regressors] Here we assume the latent error term has a logistic distribution but the regressors have a normal distribution. As we show here, the rate of convergence for the estimator will be very slow. The variance is of the form:

$$v(\gamma_{2n}) = \int_{-\gamma_{2n}}^{\gamma_{2n}} \exp(v^2/2)\,dv \qquad (3.28)$$

which can be approximated as $v(\gamma_{2n}) \approx \exp(\gamma_{2n}^2/2)\,\gamma_{2n}^{-1}$; see, e.g., Weisstein (1998).¹¹ The bias is of the form:

$$b_n \approx \gamma_{2n} \exp(-\gamma_{2n}) \qquad (3.29)$$

So the MSE minimizing sequence is of the form $\gamma_{2n} = \sqrt{\log n}$, resulting in the rate of convergence $O_p\big(n^{-1/4}\sqrt[4]{\log n}\big)$.

Footnote 11: Erfi. From MathWorld—A Wolfram Web Resource.
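As a numerical cross-check on the logit/logit calculation (again a sketch, not part of the paper: it evaluates the exact variance and bias integrals of this section by quadrature for an assumed $\alpha = 0.5$ and balances them), the balancing value of $\gamma_{2n}$ can be compared with $\tfrac{1}{2}\log n$ and the implied standard deviation inspected:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

alpha = 0.5
Lam = lambda t: 1.0 / (1.0 + np.exp(-t))            # logistic cdf, so p(v) = Lam(alpha + v)
f = lambda v: np.exp(-v) / (1.0 + np.exp(-v)) ** 2  # logistic density of v

def v_gamma(g):   # variance term (3.21): integral of p(1-p)/f over [-g, g]
    integrand = lambda v: Lam(alpha + v) * (1 - Lam(alpha + v)) / f(v)
    return quad(integrand, -g, g)[0]

def bias(g):      # bias term (3.15): tail integral of (p(v) - 1[v>0]) outside [-g, g]
    upper = quad(lambda v: 1 - Lam(alpha + v), g, np.inf)[0]
    lower = quad(lambda v: Lam(alpha + v), -np.inf, -g)[0]
    return upper - lower

for n in (10**3, 10**5, 10**7):
    g_star = brentq(lambda g: v_gamma(g) / n - bias(g) ** 2, 0.1, 60.0)
    print(f"n={n:>8}: balancing gamma = {g_star:6.2f}, 0.5*log(n) = {0.5*np.log(n):6.2f}, "
          f"implied sd = {np.sqrt(v_gamma(g_star)/n):.5f}")
```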

- [Logit Errors/Cauchy Regressors] Here we assume the latent error term has a standard logistic distribution but the regressor has a standard Cauchy distribution. From our calculations in the previous section we can see that

$$v(\gamma_{2n}) = \gamma_{2n}\,\frac{\exp(-\gamma_{2n})}{(1 + \exp(-\gamma_{2n}))^2}\,(1 + \gamma_{2n}^2) \qquad (3.30)$$

and

$$b_n = \gamma_{2n}\,\frac{\exp(-\gamma_{2n})}{1 + \exp(-\gamma_{2n})} \qquad (3.31)$$

We note that in this situation $v(\gamma_{2n})$ remains bounded as $\gamma_{2n} \to \infty$, so the variance of the estimator is $O(\frac{1}{n})$. Therefore we can let $\gamma_{2n}$ increase to infinity as quickly as desired to ensure $b_n = o(n^{-1/2})$. Therefore in this case we can conclude that the estimator is root-n consistent and asymptotically normal with no asymptotic bias.

- [Probit Errors/Cauchy Regressors] Here we assume the latent error term has a standard normal distribution but the regressor has a standard Cauchy distribution. From our calculations in the previous section we can see that

$$v(\gamma_{2n}) \approx \gamma_{2n}\,\exp(-\gamma_{2n}^2/2)\,\gamma_{2n}^{-1}\,(1 + \gamma_{2n}^2) \qquad (3.32)$$

where the above approximation is based on the asymptotic series expansion of the erf. Furthermore,

$$b_n = \gamma_{2n}\,\exp(-\gamma_{2n}^2/2)\,\gamma_{2n}^{-1} \qquad (3.33)$$

We can see immediately that no matter how quickly $\gamma_{2n} \to \infty$, $v(\gamma_{2n}) = O(1)$, so we can set $\gamma_{2n} = O(n^2)$ so that

$$\sqrt{n}\,(\hat{\alpha} - \alpha_0) = O_p(1) \qquad (3.34)$$

and furthermore the estimator will be asymptotically normal and asymptotically unbiased.

- [Differing Support Conditions: Regular Rate] Let the support of the latent error term be strictly smaller than the support of the regression function. For $\gamma_{2n}$ sufficiently large, $p(\gamma_{2n})$ takes the value 1, so in this case we need not use the above approximation and $\lim_{\gamma_{2n} \to \infty} v(\gamma_{2n})$ is finite. This implies we can let $\gamma_{2n}$ increase as quickly as possible to remove the bias, so the estimator is root-n consistent and asymptotically normal with no limiting bias. This is a case where, for example, $\alpha + v$ in (3.1) has strictly larger support than $\epsilon$.
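The contrast between the exponential-tail and Cauchy-tail cases can also be seen directly from the variance integral (3.21). The sketch below (illustrative only, with logit errors and an assumed $\alpha = 0.5$) evaluates $v(\gamma)$ for a logistic versus a Cauchy special regressor: the former grows without bound in $\gamma$, while the latter settles down, which is exactly the boundary between the slower-than-root-n and root-n cases above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import cauchy, logistic

alpha = 0.5
p = lambda v: logistic.cdf(alpha + v)                  # logit errors in both designs
var_term = lambda v, f: p(v) * (1 - p(v)) / f(v)       # integrand of v(gamma), eq. (3.21)

for gamma in (5, 10, 20, 40):
    v_logistic = quad(lambda v: var_term(v, logistic.pdf), -gamma, gamma)[0]
    v_cauchy   = quad(lambda v: var_term(v, cauchy.pdf),   -gamma, gamma)[0]
    print(f"gamma = {gamma:>2}: v(gamma) with logistic v = {v_logistic:8.2f}, "
          f"with Cauchy v = {v_cauchy:8.2f}")
# The logistic-regressor column keeps growing (slower-than-root-n case),
# while the Cauchy-regressor column stabilizes, consistent with root-n estimation.
```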

As the results in this section illustrate, the rates of convergence for the analog inverse weight estimators can vary widely with tail behavior conditions on both observed and unobserved random variables, and the optimal rate rarely coincides with the parametric rate. Hence, semiparametric efficiency bounds might not be as useful for this class of models. Overall, this confirms the results in Lewbel (1997) about the relationship between the thickness of the tails and the rates of convergence. So, to summarize our results, the optimal rate is directly tied to the relative tail behavior of the special regressor and the error term. Rates will generally be slower than the parametric rate, which can only be attained with infinite moments on the regressor, or strong support conditions. The latter ensure that the propensity score attains its extreme values with positive probability.

4 Treatment Effects Model under Exogenous Selection

This section studies another example of a parameter that is written as a weighted moment condition and where regular rates of convergence require that support conditions be met, essentially guaranteeing that the denominator is bounded away from zero. We show that the average treatment effect under exogenous selection cannot generally be estimated at the regular rate unless the propensity score is bounded away from 0 and 1, and so estimation of this parameter can lead to unstable estimates if these support (or overlap) conditions are not met.

A central problem in evaluation studies is that the potential outcomes that program participants would have received in the absence of the program are not observed. Letting $d_i$ denote a binary variable taking the value 1 if treatment was given to agent $i$, and 0 otherwise, and letting $y_{0i}, y_{1i}$ denote potential outcome variables, we refer to $y_{1i} - y_{0i}$ as the treatment effect for the $i$th individual. A parameter of interest for identification and estimation is the average treatment effect, defined as:

$$\beta = E[y_{1i} - y_{0i}] \qquad (4.1)$$

One identification strategy for $\beta$ was proposed in Rosenbaum and Rubin (1983), who imposed the following:

Assumption 1 (ATE under Conditional Independence) Let the following hold:

(i) There exists an observed variable $x_i$ such that $d_i \perp (y_{0i}, y_{1i}) \mid x_i$;

(ii) $0 < P(d_i = 1 \mid x_i) < 1$ for all $x_i$.

See also Hirano, Imbens, and Ridder (2003). The above assumption can be used to identify $\beta$ as

$$\beta = E_X\big[E[y_i \mid d_i = 1, x_i] - E[y_i \mid d_i = 0, x_i]\big] \qquad (4.2)$$

or

$$\beta = E_X\big[E[y_i \mid d_i = 1, p(x_i)] - E[y_i \mid d_i = 0, p(x_i)]\big] \qquad (4.3)$$

where $p(x_i) = P(d_i = 1 \mid x_i)$. The above parameter can be written as:

$$\beta = E\left[\frac{y_i\,(d_i - p(x_i))}{p(x_i)(1 - p(x_i))}\right] \qquad (4.4)$$

The above parameter $\beta$ is a weighted moment condition where the denominator gets small if the propensity score approaches 0 or 1. Also, identification is lost when we remove any region in the support of $x_i$ (so fixed trimming will not identify $\beta$ above). Without any further restrictions, there can be regions in the support of $x$ where $\frac{1}{p(x)(1 - p(x))}$ becomes arbitrarily large, suggesting an analog estimator may not converge at the parametric rate. Whether or not it can is a delicate question, and depends on the support conditions and on the propensity score. Hahn (1998), in very important work, derived efficiency bounds for the model based on Assumption 1. These bounds can be infinite, and in fact there are many situations where this can happen. For example, in the homoskedastic setting with one continuous regressor, anytime the distribution of the regressor is the same as that of the error term in the treatment equation, so that (ii) is satisfied, the variance bound is infinite. Thus we see that, as was the case in the binary choice model, rates of convergence will depend on the relative tail behaviors of regressors and error terms. Consequently, inference can be more complicated, and one might want to supplement standard efficiency bound calculations with more refined rate calculations. For inference, we again propose a rate adaptive procedure based on a studentized estimator.

4.1 Rate Adaptive Inference in the Treatment Effects Model

We derive the asymptotic distribution of the feasible ATE estimator when the propensity score $p(x)$ is replaced by a nonparametric estimate $\hat{p}(x)$, which is kernel based. The feasible estimator is

$$\hat{\beta}_n = \frac{1}{n}\sum_{i=1}^{n} \left(\frac{d_i y_i}{\hat{p}(x_i)} - \frac{(1 - d_i) y_i}{1 - \hat{p}(x_i)}\right) I[|x_i| \leq \gamma_{2n}] \qquad (4.5)$$

Moreover, let

$$X_{in} = \left[\left(\frac{d_i y_i}{p(x_i)} - \frac{(1 - d_i) y_i}{1 - p(x_i)}\right) - \alpha_0 - \left(\frac{\mu_1(x_i)}{p(x_i)} + \frac{\mu_0(x_i)}{1 - p(x_i)}\right)(d_i - p(x_i))\right] 1[|x_i| \leq \gamma_{2n}]$$

This represents the first term in the linear representation (see the Appendix for more on this). As in the previous section, let $\hat{S}_n$ denote the trimmed estimator of its asymptotic variance, which is the variance of the above linear term and can be written as (assuming a homoskedastic model with unit variances)

$$\hat{S}_n = \frac{1}{n}\sum_{i=1}^{n} \left[\Big(\hat{E}[y_i \mid d_i = 1, x_i] - \hat{E}[y_i \mid d_i = 0, x_i] - \hat{\alpha}_n\Big)^2 + \frac{1}{\hat{p}(x_i)(1 - \hat{p}(x_i))}\right] 1[|x_i| \leq \gamma_{2n}]$$

Again, we provide below the main result, which is the asymptotic distribution of the studentized estimator. First, we make the following definitions:

$$\beta_0 = E\left[\frac{d_i y_i}{p(x_i)} - \frac{(1 - d_i) y_i}{1 - p(x_i)}\right]; \quad \beta_n = E[\hat{\beta}_n]; \quad v(\gamma_{2n}) = E\left[\frac{1}{p(x_i)(1 - p(x_i))}\, I[|x_i| \leq \gamma_{2n}]\right]; \quad b_n = \beta_n - \beta_0 \qquad (4.6)$$

We next state the main theorem of this section.

Theorem 4.1 Suppose (i) $\gamma_{2n} \to \infty$, (ii) for all $\epsilon > 0$,

$$\lim_{n \to \infty} \frac{1}{v(\gamma_{2n})}\, E\Big[X_{in}^2\, I\big[|X_{in}| > \epsilon \sqrt{n\, v(\gamma_{2n})}\big]\Big] = 0 \qquad (4.7)$$

and (iii) $\sqrt{n / v(\gamma_{2n})}\; b(\gamma_{2n}) \to 0$. Then

$$\sqrt{n}\, \hat{S}_n^{-1/2}(\hat{\beta}_n - \beta_0) \rightarrow_d N(0, 1) \qquad (4.8)$$
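A feasible version of (4.5) together with the studentized standard error built from $\hat{S}_n$ can be sketched as follows. The Nadaraya-Watson propensity score and conditional means, the rule-of-thumb bandwidth, and the clipping of $\hat{p}$ away from 0 and 1 are assumptions of this illustration (the paper's kernel conditions are in its Appendix), and the variance formula uses the homoskedastic unit-variance simplification stated above.

```python
import numpy as np

def nw(x_eval, x, z, h):
    """Nadaraya-Watson estimate of E[z | x = x_eval] with a Gaussian kernel (illustrative)."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * z[None, :]).sum(axis=1) / w.sum(axis=1)

def trimmed_ipw_ate(y, d, x, gamma_2n, h=None):
    """Trimmed inverse-propensity-weighted ATE, eq. (4.5), with the variance estimate
    S_hat of Section 4.1 (homoskedastic, unit-variance case)."""
    n = len(y)
    h = h if h is not None else 1.06 * x.std() * n ** (-0.2)

    p_hat = nw(x, x, d.astype(float), h).clip(1e-3, 1 - 1e-3)  # guard against exact 0/1 (not in the paper)
    trim = (np.abs(x) <= gamma_2n).astype(float)

    beta_hat = np.mean((d * y / p_hat - (1 - d) * y / (1 - p_hat)) * trim)   # eq. (4.5)

    mu1 = nw(x, x[d == 1], y[d == 1], h)   # E[y | d = 1, x]
    mu0 = nw(x, x[d == 0], y[d == 0], h)   # E[y | d = 0, x]
    S_hat = np.mean(((mu1 - mu0 - beta_hat) ** 2 + 1.0 / (p_hat * (1 - p_hat))) * trim)

    se = np.sqrt(S_hat / n)   # Theorem 4.1: (beta_hat - beta_0)/se is approx. N(0,1)
    return beta_hat, se
```

By Theorem 4.1, $(\hat{\beta}_n - \beta_0)/\mathrm{se}$ is approximately standard normal, so confidence intervals can be formed without knowing the rate at which $\hat{S}_n$ may diverge.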

4.2 Average Treatment Effect Estimation Rates

Here we conduct the same rate of convergence exercises for estimating the average treatment effect as the ones we computed for the binary response model in the previous section. As a reminder, we do not compute what the variance or the bias are exactly; rather, we are interested in the rate at which these need to behave as the sample size increases. We will explore the asymptotic properties of the following estimator:

$$\hat{\alpha} = \frac{1}{n}\sum_{i=1}^{n} \left(\frac{d_i y_i}{p(x_i)} - \frac{(1 - d_i) y_i}{1 - p(x_i)}\right) I[|x_i| \leq \gamma_{2n}] \qquad (4.9)$$

We note that this estimator is infeasible, this time because we are assuming the propensity score is known. As in the binary choice model, this will not affect our main conclusions (of course its asymptotic variance will be different from that of the feasible estimator, but it will have the same order). Also, as in the binary choice setting, we assume that it suffices to trim on regressor values, and that the denominator only vanishes in the tails of the regressor. Furthermore, to clarify our arguments we will assume here that the counterfactual outcomes are homoskedastic with a variance of 1. Define $h_n = v(\gamma_{2n})^{-1}$. Carrying out the same arguments as before, we explore the asymptotic properties of $v(\gamma_{2n})$ and $b_n$ as $\gamma_{2n} \to \infty$. Generally speaking, anytime $v(\gamma_{2n}) \to \infty$, the fastest rate for the estimator will be slower than root-n. We can express $v(\gamma_{2n})$ as the following integral:

$$v(\gamma_{2n}) \approx \int_{-\gamma_{2n}}^{\gamma_{2n}} \frac{f(x)}{p(x)(1 - p(x))}\,dx \qquad (4.10)$$

which, using the same expansion as before, will behave asymptotically like

$$v(\gamma_{2n}) \approx \frac{\gamma_{2n}\, f(\gamma_{2n})}{p(\gamma_{2n})(1 - p(\gamma_{2n}))} \qquad (4.11)$$

For the problem at hand we can write $b_n$ approximately as the following integral:

$$b_n = -\left(\int_{-\infty}^{-\gamma_{2n}} m(x) f(x)\,dx + \int_{\gamma_{2n}}^{\infty} m(x) f(x)\,dx\right) \qquad (4.12)$$

where $m(x)$ is the conditional ATE. We now consider some particular examples corresponding to tail behavior of the error term in the treatment equation and the regressor.

Logit Errors/Regressors/Bounded CATE. Here we assume the latent error term and the regressors both have a standard logistic distribution, and to simplify calculations we will assume the conditional average treatment effect is bounded on the regressor support.

We consider this to be a main example, as the results that arise here (a slower than root-n rate) will generally also arise anytime we have distributions whose tails decline exponentially, such as the normal, logistic and Laplace distributions. From our calculations in the previous section we can see that

$$v(\gamma_{2n}) = \gamma_{2n} \qquad (4.13)$$

and

$$b_n \approx \gamma_{2n}\,\frac{\exp(-\gamma_{2n})}{(1 + \exp(-\gamma_{2n}))^2} \qquad (4.14)$$

Clearly $v(\gamma_{2n}) \to \infty$, resulting in a slower rate of convergence. To get the optimal rate we solve for the $\gamma_{2n}$ that sets $v(\gamma_{2n})/n = b_n^2$. We see this matches up when $\gamma_{2n} = \frac{1}{2}\log n$, resulting in the rate of convergence of the estimator:

$$\sqrt{\frac{n}{\log n^{1/2}}}\,(\hat{\alpha} - \alpha_0) = O_p(1) \qquad (4.15)$$

Furthermore, from the results in the previous section, the estimator will be asymptotically normally distributed, and have a limiting bias. We notice this is the exact same result attained for the binary choice model considered previously. As mentioned, similar slower than parametric rates will be attained when the error distribution and regressor have similar tail behavior.

Normal Errors/Logistic Regressors/Bounded CATE. Here we assume the latent error term has a standard normal distribution and the regressors have a standard logistic distribution, and to simplify calculations we will assume the conditional average treatment effect is bounded on the regressor support. In this situation we have:

$$v(\gamma_{2n}) \approx \int_{-\gamma_{2n}}^{\gamma_{2n}} \frac{\exp(-x)}{1 - \Phi(x)}\,dx \qquad (4.16)$$

which, by multiplying and dividing the fraction inside the integral by $\phi(x)$, the standard normal p.d.f., and using the asymptotic approximation to the inverse Mills ratio, we get

$$v(\gamma_{2n}) \approx \int_{-\gamma_{2n}}^{\gamma_{2n}} x \exp(x^2/2 - x)\,dx \approx \exp(\gamma_{2n}) \qquad (4.17)$$

In this setting the bias is approximately of the form

$$b_n = \int_{-\infty}^{-\gamma_{2n}} f(x)\,dx + \int_{\gamma_{2n}}^{\infty} f(x)\,dx \approx \exp(-\gamma_{2n}) \qquad (4.18)$$

so the MSE minimizing value of $\gamma_{2n}$ here is

$$\gamma_{2n} = \log n^{1/3} \qquad (4.19)$$

so the estimator is $O_p(n^{-1/3})$.

Differing Support Conditions/Bounded CATE. Here we assume the support of the latent error term in the treatment equation is strictly larger than the support of the regressor and the support of the regression function in the treatment equation. For $\gamma_{2n}$ sufficiently large, $p(\gamma_{2n})$ remains bounded away from 0 and 1, so $\lim_{\gamma_{2n} \to \infty} v(\gamma_{2n})$ is finite. We can let $\gamma_{2n}$ increase as quickly as possible to remove the bias, resulting in a root-n consistent estimator that is asymptotically normal, completely analogous to the result attained before under differing supports in the binary choice model.

Cauchy Errors/Logistic Regressors/Bounded CATE. Here we impose heavy tails on the treatment effect error term, assuming it has a Cauchy distribution, but we impose exponential tails on the regressor distribution, assuming here that it has a logistic distribution, though similar results would hold if we assumed normality. We have

$$v(\gamma_{2n}) \approx \frac{\gamma_{2n}\,\exp(-\gamma_{2n})}{\frac{1}{4} - \left(\frac{\arctan(\gamma_{2n})}{\pi}\right)^2} \qquad (4.20)$$

where, after using L'Hôpital's rule, we can conclude that

$$v(\gamma_{2n}) \approx \gamma_{2n}\,\exp(-\gamma_{2n})\,(1 + \gamma_{2n}^2) \qquad (4.21)$$

which remains bounded as $\gamma_{2n} \to \infty$. Therefore we can let $\gamma_{2n}$ increase arbitrarily quickly in $n$, say $\gamma_{2n} = O(n^2)$, in which case the estimator will be root-n consistent and asymptotically normal with no asymptotic bias.

5 Conclusion

This paper points out a subtle property of estimators based on weighted moment conditions. In some of these models, there is an intimate link between identification and estimability, in that if it is necessary for (point) identification that the weights take arbitrarily large values, then the parameter of interest, though point identified, cannot be estimated at the regular rate.

This rate depends on relative tail conditions and can be as slow as $n^{1/4}$. Practically speaking, this nonstandard rate of convergence can lead to numerical instability and/or large standard errors. See, for example, the recent work of Busso, DiNardo, and McCrary (2008) for numerical evidence on this based on the ATE parameter. We provide in this paper a studentized approach to inference in which there is no need to know explicitly the rate of convergence of the estimator.

There are a few remaining issues that need to be addressed in future work. One is the choice of the trimming parameter, $\gamma_{2n}$, in small samples. We provide bounds on the rate of divergence to infinity of this parameter in the paper. One approach to picking it is similar to what is typically done for kernel density estimators: pick the parameter that minimizes the first term of an expansion of the mean squared error. For example, in the ATE case, $\gamma_{2n}$ can be chosen to minimize the sum of an empirical version of (4.10) and the square of an empirical version of (4.12); a sketch of one such rule is given after this section. In addition, in this paper we saw examples of non-robust point identification in which the model has no information on the parameter if a set with arbitrarily small probability is taken out of the support. This notion of identification deserves further study as well.
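The trimming-parameter rule suggested above can be made concrete in several ways; the sketch below is one illustrative reading for the ATE case, choosing $\gamma_{2n}$ on a grid to minimize an empirical analog of $(4.10)/n$ plus the square of an empirical analog of (4.12). The inputs `p_hat` and `m_hat` are pre-estimated propensity scores and conditional treatment effects (for example from the kernel sketch in Section 4.1), and all implementation choices here are assumptions of the sketch rather than a prescription of the paper.

```python
import numpy as np

def select_gamma_ate(x, p_hat, m_hat, n_grid=50):
    """Choose the trimming parameter gamma_2n for the ATE estimator by minimizing
    an empirical analog of v(gamma)/n + b(gamma)^2, with v(gamma) read from (4.10)
    and b(gamma) from (4.12). p_hat are estimated propensity scores and m_hat are
    estimated conditional treatment effects; the grid is an illustrative choice."""
    n = len(x)
    grid = np.quantile(np.abs(x), np.linspace(0.5, 1.0, n_grid))
    objective = []
    for g in grid:
        inside = np.abs(x) <= g
        v_hat = np.mean(np.where(inside, 1.0 / (p_hat * (1.0 - p_hat)), 0.0))  # empirical (4.10)
        b_hat = -np.mean(np.where(inside, 0.0, m_hat))                         # empirical (4.12)
        objective.append(v_hat / n + b_hat ** 2)
    return grid[int(np.argmin(objective))]
```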

References

Andrews, D. W. K., and M. Schafgans (1998): "Semiparametric Estimation of the Intercept of a Sample Selection Model," Review of Economic Studies, 65.

Busso, M., J. DiNardo, and J. McCrary (2008): "Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects," Working Paper, UC Berkeley.

Chamberlain, G. (1986): "Asymptotic Efficiency in Semi-parametric Models with Censoring," Journal of Econometrics, 32(2).

Collomb, G., and W. Hardle (1986): "Strong Uniform Convergence Rates in Robust Nonparametric Time Series Analysis and Prediction: Kernel Regression Estimation from Dependent Observations," Stochastic Processes and their Applications, 23.

Frolich, M. (2004): "Finite-Sample Properties of Propensity-Score Matching and Weighting Estimators," Review of Economics and Statistics, 86(1).

Hahn, J. (1998): "On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects," Econometrica, 66(2).

Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel (1986): Robust Statistics: The Approach Based on Influence Functions. Wiley-Interscience.

Heckman, J. (1990a): "Varieties of Selection Bias," American Economic Review, 80(2).

Heckman, J. (1990b): "Varieties of Selection Bias," American Economic Review, 80.

Hirano, K., G. W. Imbens, and G. Ridder (2003): "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, 71(4).

Imbens, G. W. (2004): "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," Review of Economics and Statistics, 86(1).

Lewbel, A. (1997): "Semiparametric Estimation of Location and Other Discrete Choice Moments," Econometric Theory, 13(1).

Lewbel, A. (1998): "Semiparametric Latent Variable Model Estimation with Endogenous or Mismeasured Regressors," Econometrica, 66(1).

Lewbel, A. (2000): "Semiparametric Qualitative Response Model Estimation with Unknown Heteroscedasticity or Instrumental Variables," Journal of Econometrics, 97.

Magnac, T., and E. Maurin (2005): "Identification and Information in Monotone Binary Models," Journal of Econometrics, forthcoming.

Manski, C. F. (1988): "Identification of Binary Response Models," Journal of the American Statistical Association, 83.

Newey, W. (1990): "Semiparametric Efficiency Bounds," Journal of Applied Econometrics, 5(2).

Newey, W., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. 4, ed. by R. Engle and D. McFadden. North Holland.

Powell, J., J. Stock, and T. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica.

Rosenbaum, P., and D. Rubin (1983): "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70.

Stein, C. (1956): "Efficient Nonparametric Testing and Estimation," Proc. Third Berkeley Symp. on Math. Stat. and Prob., 1.

Stoker, T. (1991): "Equivalence of Direct, Indirect, and Slope Estimators for Average Derivatives," Nonparametric and Semiparametric Methods.

Weisstein, E. (1998): The CRC Concise Encyclopedia of Mathematics. CRC Press.

A Appendix

In Section A.1, we provide impossibility results for the binary choice model above. In Sections A.2-A.6, we provide proofs for the theorems in Section 3, for the binary choice model. Section A.7 outlines the proof of the linear representation for the ATE model.

A.1 Infinite Bounds

As alluded to earlier in the paper, the slower than parametric rates of the inverse weight estimators do not necessarily imply a form of inefficiency of the procedure. It may be the case that the parametric rate is unattainable by any procedure. In this section we confirm this by establishing an impossibility theorem, analogous to those established in Chamberlain (1986). We show that efficiency bounds are infinite for a variant of model (3.1) above. Now, introducing regressors $x_i$, we alter the assumption so that $\epsilon \mid x, v \stackrel{d}{=} \epsilon \mid x$ and that $E[\epsilon \mid x] = 0$ (here, $\stackrel{d}{=}$ means "has the same distribution as"). We also impose other conditions such that $\epsilon \mid x$ and $v \mid x$ have support on the real line for all $x$. The model we consider now is

$$y = 1[\alpha + x\beta + v - \epsilon \geq 0] \qquad (A.1)$$

The following theorem, proven below, states that there does not exist any regular root-n estimator for $\alpha$ and $\beta$ in this model.

Theorem A.1 In the binary choice model (A.1) with exclusion restriction and unbounded support for the error terms, and where $\epsilon \mid x, v \stackrel{d}{=} \epsilon \mid x$ and $E[\epsilon \mid x] = 0$, if the second moments of $x, v$ are finite, then the semiparametric efficiency bound for $\beta$ is not finite.

Remark A.1 The proof of the above result is based on the assumption that the second moments of the regressors are finite. This type of assumption was also made in Chamberlain (1986) in establishing his impossibility result for the maximum score model.

A.2 Proof of Theorem A.1

To prove this theorem, we follow the approach taken in Chamberlain (1986). Specifically, we look for a subpath around which the variance of the parametric submodel is unbounded. We first define the function:

$$g_0(t, x) = P(\epsilon_i \leq t \mid x_i = x)$$


More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Zhengyu Zhang School of Economics Shanghai University of Finance and Economics zy.zhang@mail.shufe.edu.cn

More information

Estimating Semi-parametric Panel Multinomial Choice Models

Estimating Semi-parametric Panel Multinomial Choice Models Estimating Semi-parametric Panel Multinomial Choice Models Xiaoxia Shi, Matthew Shum, Wei Song UW-Madison, Caltech, UW-Madison September 15, 2016 1 / 31 Introduction We consider the panel multinomial choice

More information

New Developments in Econometrics Lecture 16: Quantile Estimation

New Developments in Econometrics Lecture 16: Quantile Estimation New Developments in Econometrics Lecture 16: Quantile Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Review of Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile

More information

The relationship between treatment parameters within a latent variable framework

The relationship between treatment parameters within a latent variable framework Economics Letters 66 (2000) 33 39 www.elsevier.com/ locate/ econbase The relationship between treatment parameters within a latent variable framework James J. Heckman *,1, Edward J. Vytlacil 2 Department

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E.

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E. NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES James J. Heckman Petra E. Todd Working Paper 15179 http://www.nber.org/papers/w15179

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Instrumental Variables

Instrumental Variables Instrumental Variables Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Noncompliance in Experiments Stat186/Gov2002 Fall 2018 1 / 18 Instrumental Variables

More information

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS TOMMASO NANNICINI universidad carlos iii de madrid UK Stata Users Group Meeting London, September 10, 2007 CONTENT Presentation of a Stata

More information

Tobit and Interval Censored Regression Model

Tobit and Interval Censored Regression Model Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 98-994 Research India Publications http://www.ripublication.com Tobit and Interval Censored Regression Model Raidani

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College December 2016 Abstract Lewbel (2012) provides an estimator

More information

ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY

ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY Rosa L. Matzkin Department of Economics University of California, Los Angeles First version: May 200 This version: August 204 Abstract We introduce

More information

Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity

Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity Jörg Schwiebert Abstract In this paper, we derive a semiparametric estimation procedure for the sample selection model

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008 Panel Data Seminar Discrete Response Models Romain Aeberhardt Laurent Davezies Crest-Insee 11 April 2008 Aeberhardt and Davezies (Crest-Insee) Panel Data Seminar 11 April 2008 1 / 29 Contents Overview

More information

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness finite-sample optimal estimation and inference on average treatment effects under unconfoundedness Timothy Armstrong (Yale University) Michal Kolesár (Princeton University) September 2017 Introduction

More information

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples DISCUSSION PAPER SERIES IZA DP No. 4304 A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples James J. Heckman Petra E. Todd July 2009 Forschungsinstitut zur Zukunft

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Information Structure and Statistical Information in Discrete Response Models

Information Structure and Statistical Information in Discrete Response Models Information Structure and Statistical Information in Discrete Response Models Shakeeb Khan a and Denis Nekipelov b This version: January 2012 Abstract Strategic interaction parameters characterize the

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable

More information

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Maximilian Kasy Department of Economics, Harvard University 1 / 40 Agenda instrumental variables part I Origins of instrumental

More information

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

Lecture 8. Roy Model, IV with essential heterogeneity, MTE Lecture 8. Roy Model, IV with essential heterogeneity, MTE Economics 2123 George Washington University Instructor: Prof. Ben Williams Heterogeneity When we talk about heterogeneity, usually we mean heterogeneity

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

A simple alternative to the linear probability model for binary choice models with endogenous regressors

A simple alternative to the linear probability model for binary choice models with endogenous regressors A simple alternative to the linear probability model for binary choice models with endogenous regressors Christopher F Baum, Yingying Dong, Arthur Lewbel, Tao Yang Boston College/DIW Berlin, U.Cal Irvine,

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Identification in Discrete Choice Models with Fixed Effects

Identification in Discrete Choice Models with Fixed Effects Identification in Discrete Choice Models with Fixed Effects Edward G. Johnson August 2004 Abstract This paper deals with issues of identification in parametric discrete choice panel data models with fixed

More information

Increasing the Power of Specification Tests. November 18, 2018

Increasing the Power of Specification Tests. November 18, 2018 Increasing the Power of Specification Tests T W J A. H U A MIT November 18, 2018 A. This paper shows how to increase the power of Hausman s (1978) specification test as well as the difference test in a

More information

Revisiting the Synthetic Control Estimator

Revisiting the Synthetic Control Estimator MPRA Munich Personal RePEc Archive Revisiting the Synthetic Control Estimator Bruno Ferman and Cristine Pinto Sao Paulo School of Economics - FGV 23 November 2016 Online at https://mpra.ub.uni-muenchen.de/77930/

More information

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Songnian Chen a, Xun Lu a, Xianbo Zhou b and Yahong Zhou c a Department of Economics, Hong Kong University

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL BRENDAN KLINE AND ELIE TAMER Abstract. Randomized trials (RTs) are used to learn about treatment effects. This paper

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Keisuke Hirano Department of Economics University of Arizona hirano@u.arizona.edu Jack R. Porter Department of

More information

APEC 8212: Econometric Analysis II

APEC 8212: Econometric Analysis II APEC 8212: Econometric Analysis II Instructor: Paul Glewwe Spring, 2014 Office: 337a Ruttan Hall (formerly Classroom Office Building) Phone: 612-625-0225 E-Mail: pglewwe@umn.edu Class Website: http://faculty.apec.umn.edu/pglewwe/apec8212.html

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 995-002 Research India Publications http://www.ripublication.com Truncated Regression Model and Nonparametric Estimation

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Optimal bandwidth selection for the fuzzy regression discontinuity estimator

Optimal bandwidth selection for the fuzzy regression discontinuity estimator Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)

More information

Ultra High Dimensional Variable Selection with Endogenous Variables

Ultra High Dimensional Variable Selection with Endogenous Variables 1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA

Implementing Matching Estimators for. Average Treatment Effects in STATA Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University West Coast Stata Users Group meeting, Los Angeles October 26th, 2007 General Motivation Estimation

More information

Econ 2148, fall 2017 Instrumental variables II, continuous treatment

Econ 2148, fall 2017 Instrumental variables II, continuous treatment Econ 2148, fall 2017 Instrumental variables II, continuous treatment Maximilian Kasy Department of Economics, Harvard University 1 / 35 Recall instrumental variables part I Origins of instrumental variables:

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

A Local Generalized Method of Moments Estimator

A Local Generalized Method of Moments Estimator A Local Generalized Method of Moments Estimator Arthur Lewbel Boston College June 2006 Abstract A local Generalized Method of Moments Estimator is proposed for nonparametrically estimating unknown functions

More information

Controlling for overlap in matching

Controlling for overlap in matching Working Papers No. 10/2013 (95) PAWEŁ STRAWIŃSKI Controlling for overlap in matching Warsaw 2013 Controlling for overlap in matching PAWEŁ STRAWIŃSKI Faculty of Economic Sciences, University of Warsaw

More information

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS Olivier Scaillet a * This draft: July 2016. Abstract This note shows that adding monotonicity or convexity

More information

The Identification Zoo - Meanings of Identification in Econometrics: PART 3

The Identification Zoo - Meanings of Identification in Econometrics: PART 3 The Identification Zoo - Meanings of Identification in Econometrics: PART 3 Arthur Lewbel Boston College original 2015, heavily revised 2018 Lewbel (Boston College) Identification Zoo 2018 1 / 85 The Identification

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Locally Robust Semiparametric Estimation

Locally Robust Semiparametric Estimation Locally Robust Semiparametric Estimation Victor Chernozhukov Juan Carlos Escanciano Hidehiko Ichimura Whitney K. Newey The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper

More information

Regression Discontinuity Designs

Regression Discontinuity Designs Regression Discontinuity Designs Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Regression Discontinuity Design Stat186/Gov2002 Fall 2018 1 / 1 Observational

More information

A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator

A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator Francesco Bartolucci and Valentina Nigro Abstract A model for binary panel data is introduced

More information

A Simple Estimator for Binary Choice Models With Endogenous Regressors

A Simple Estimator for Binary Choice Models With Endogenous Regressors A Simple Estimator for Binary Choice Models With Endogenous Regressors Yingying Dong and Arthur Lewbel University of California Irvine and Boston College Revised June 2012 Abstract This paper provides

More information

Gov 2002: 4. Observational Studies and Confounding

Gov 2002: 4. Observational Studies and Confounding Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What

More information

The Identification Zoo - Meanings of Identification in Econometrics: PART 2

The Identification Zoo - Meanings of Identification in Econometrics: PART 2 The Identification Zoo - Meanings of Identification in Econometrics: PART 2 Arthur Lewbel Boston College heavily revised 2017 Lewbel (Boston College) Identification Zoo 2017 1 / 80 The Identification Zoo

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 What s New in Econometrics NBER, Summer 2007 Lecture 1, Monday, July 30th, 9.00-10.30am Estimation of Average Treatment Effects Under Unconfoundedness 1.

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Why experimenters should not randomize, and what they should do instead

Why experimenters should not randomize, and what they should do instead Why experimenters should not randomize, and what they should do instead Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) Experimental design 1 / 42 project STAR Introduction

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information