Large Sample Properties of Matching Estimators for Average Treatment Effects


Revision in progress. December 7, 2004

Large Sample Properties of Matching Estimators for Average Treatment Effects

Alberto Abadie, Harvard University and NBER
Guido Imbens, UC Berkeley and NBER

First version: July 2001. This version: December 2004.

We wish to thank Don Andrews, Joshua Angrist, Gary Chamberlain, Geert Dhaene, Jinyong Hahn, James Heckman, Keisuke Hirano, Hidehiko Ichimura, Whitney Newey, Jack Porter, James Powell, Geert Ridder, Paul Rosenbaum, Ed Vytlacil, a co-editor and two anonymous referees, and seminar participants at various universities for comments, and Don Rubin for many discussions on these topics. Financial support for this research was generously provided through NSF grants SES (Abadie), and SBR and SES (Imbens).

Large Sample Properties of Matching Estimators for Average Treatment Effects

Alberto Abadie and Guido W. Imbens

Abstract

Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. The absence of formal results in this area may be partly due to the fact that standard asymptotic expansions do not apply to simple matching estimators, which are highly nonsmooth functionals of the data. In this article, we develop new methods to analyze the properties of matching estimators and establish a number of new results. First, we show that matching estimators are not N^{1/2}-consistent in general, and we describe conditions under which matching estimators do attain N^{1/2}-consistency. Second, we show that even in cases where matching estimators are N^{1/2}-consistent, simple matching estimators with a fixed number of matches do not attain the semiparametric efficiency bound. Third, we provide a new estimator for the variance that does not require consistent nonparametric estimation of unknown functions.

Keywords: matching estimators, average treatment effects, unconfoundedness, selection on observables, potential outcomes

Alberto Abadie, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138, and NBER. alberto_abadie@harvard.edu

Guido Imbens, Department of Economics, and Department of Agricultural and Resource Economics, University of California at Berkeley, 661 Evans Hall #3880, Berkeley, CA, and NBER. imbens@econ.berkeley.edu

1. Introduction

Estimation of average treatment effects is an important goal of much evaluation research, both in academic studies and in government-sponsored evaluations of social programs. Often, analyses are based on the assumptions that (i) assignment to treatment is unconfounded or exogenous, that is, based on observable pretreatment variables only, and (ii) there is sufficient overlap in the distributions of the pretreatment variables. Methods for estimating average treatment effects in parametric settings under these assumptions have a long history.¹ Recently, a number of nonparametric implementations of this idea have been proposed. Hahn (1998) calculates the efficiency bound and proposes an asymptotically efficient estimator based on nonparametric series estimation. Heckman, Ichimura, and Todd (1998) and Heckman, Ichimura, Smith, and Todd (1998) focus on the average effect on the treated and consider estimators based on local linear kernel regression methods. Hirano, Imbens, and Ridder (2003) propose an estimator that weights the units by the inverse of their assignment probabilities, and show that nonparametric series estimation of this conditional probability, labeled the propensity score by Rosenbaum and Rubin (1983), leads to an efficient estimator.

Empirical researchers, however, often use simple matching procedures to estimate average treatment effects when assignment to treatment is believed to be unconfounded. Much like nearest neighbor estimators, these procedures match each treated unit to a fixed number of untreated units with similar values for the pretreatment variables. The average effect of the treatment is then estimated by averaging within-match differences in the outcome variable between the treated and the untreated units (see, e.g., Rosenbaum, 1995; Dehejia and Wahba, 1999). Matching estimators have great intuitive appeal and are widely used in practice. However, their formal large sample properties have not been established. Part of the reason may be that simple matching estimators are highly non-smooth functionals of the distribution of the data, not amenable to standard asymptotic methods for smooth functionals.

In this article, we study the large sample properties of matching estimators of average treatment effects and establish a number of new results. Our results show that some of the formal large sample properties of simple matching estimators are not very attractive. First, we show that when matching is done in multiple dimensions, matching estimators include a conditional bias term whose stochastic order increases with the number of continuous matching variables. We show that the order of this conditional bias term may be greater than N^{-1/2}.

1 See for example Cochran and Rubin (1973), Rubin (1977), Barnow, Cain, and Goldberger (1980), Rosenbaum and Rubin (1983), Heckman and Robb (1984), and Rosenbaum (1995).

As a result, matching estimators are in general not N^{1/2}-consistent. Second, even if the dimension of the covariates is low enough for the conditional bias term to vanish asymptotically, we show that the simple matching estimator with a fixed number of matches does not achieve the semiparametric efficiency bound as calculated by Hahn (1998). However, for the case when only a single continuous covariate is used to match, we show that the efficiency loss can be made arbitrarily close to zero by allowing a sufficiently large number of matches.

Despite these poor formal properties, matching estimators do have some attractive features that may account for their popularity. In particular, matching estimators are extremely easy to implement, and they do not require consistent nonparametric estimation of unknown functions. In this paper we also propose a consistent estimator for the variances of matching estimators that does not require consistent nonparametric estimation of unknown functions.

In the next section we introduce the notation and define the estimators. In Section 3 we discuss the large sample properties of simple matching estimators. In Section 4 we propose an estimator for the large sample variance of matching estimators. Section 5 concludes. The appendix contains proofs.

2. Notation and Basic Ideas

2.1. Notation

We are interested in estimating the average effect of a binary treatment on some outcome. For unit i, with i = 1, ..., N, let Y_i(0) and Y_i(1) denote the two potential outcomes given the control treatment and given the active treatment, respectively. The variable W_i, with W_i ∈ {0, 1}, indicates the treatment received. For unit i, we observe W_i and the outcome for this treatment,

Y_i = Y_i(0) if W_i = 0, and Y_i = Y_i(1) if W_i = 1,

as well as a vector of pretreatment variables or covariates, X_i. Following the literature, our main focus is on the population average treatment effect, τ = E[Y_i(1) − Y_i(0)], and the average effect for the treated, τ_t = E[Y_i(1) − Y_i(0) | W_i = 1]. See Rubin (1977), Heckman and Robb (1984), and Imbens (2003) for some discussion of these estimands.

We assume that assignment to treatment is unconfounded (Rosenbaum and Rubin, 1983), and that the probability of assignment is bounded away from zero and one.

Assumption 1: Let X be a random vector of dimension k of continuous covariates distributed on R^k, with compact and convex support X, and with a version of the density bounded, and bounded away from zero, on its support.

Assumption 2: For almost every x ∈ X,
(i) (unconfoundedness) W is independent of (Y(0), Y(1)) conditional on X = x;
(ii) (overlap) η < Pr(W = 1 | X = x) < 1 − η, for some η > 0.

The dimension of X, denoted by k, will be seen to play an important role in the properties of matching estimators. We assume that all covariates have continuous distributions.² The combination of the two conditions in Assumption 2 is referred to as strong ignorability (Rosenbaum and Rubin, 1983). These conditions are strong, and in many cases may not be satisfied. Compactness and convexity of the support of the covariates are convenient regularity conditions. Heckman, Ichimura and Todd (1998) point out that for identification of the average treatment effect, τ, Assumption 2(i) can be weakened to mean independence (E[Y(w) | W, X] = E[Y(w) | X] for w = 0, 1). For simplicity, we assume full independence, although for most of the results mean independence is sufficient. When the parameter of interest is the average effect for the treated, τ_t, the first part of Assumption 2 can be relaxed to require only that Y(0) is independent of W conditional on X. Also, when the parameter of interest is τ_t, the second part of Assumption 2 can be relaxed so that the support of X for the treated, X_1, need only be a subset of the support of X for the untreated, X_0.

Assumption 2′: For almost every x ∈ X_1,
(i) W is independent of Y(0) conditional on X = x;
(ii) Pr(W = 1 | X = x) < 1 − η, for some η > 0.

Under Assumption 2(i), the average treatment effect for the subpopulation with X = x is equal to

τ(x) = E[Y(1) − Y(0) | X = x] = E[Y | W = 1, X = x] − E[Y | W = 0, X = x].   (1)

2 Discrete covariates with a finite number of support points can be easily dealt with by analyzing estimation of average treatment effects within subsamples defined by their values. The number of such covariates does not affect the asymptotic properties of the estimators. In small samples, however, matches along discrete covariates may not be exact, so discrete covariates may create the same type of biases as continuous covariates.

Under Assumption 2(ii), the difference on the right hand side of equation (1) is identified for almost all x in X. Therefore, the average effect of the treatment can be recovered by averaging E[Y | W = 1, X = x] − E[Y | W = 0, X = x] over the distribution of X:

τ = E[τ(X)] = E[ E[Y | W = 1, X] − E[Y | W = 0, X] ].

Under Assumption 2′(i), the average treatment effect for the subpopulation with X = x and W = 1 is equal to

τ_t(x) = E[Y(1) − Y(0) | W = 1, X = x] = E[Y | W = 1, X = x] − E[Y | W = 0, X = x].   (2)

Under Assumption 2′(ii), the difference on the right hand side of equation (2) is identified for all x in X_1. Therefore, the average effect of the treatment on the treated can be recovered by averaging E[Y | W = 1, X = x] − E[Y | W = 0, X = x] over the distribution of X conditional on W = 1:

τ_t = E[τ_t(X) | W = 1] = E[ E[Y | W = 1, X] − E[Y | W = 0, X] | W = 1 ].

Next, we introduce some additional notation. For x ∈ X and w ∈ {0, 1}, let µ(x, w) = E[Y | X = x, W = w], µ_w(x) = E[Y(w) | X = x], σ²(x, w) = V(Y | X = x, W = w), σ²_w(x) = V(Y(w) | X = x), and ε_i = Y_i − µ_{W_i}(X_i). Under Assumption 2, µ(x, w) = µ_w(x) and σ²(x, w) = σ²_w(x). Let f_w(x) be the conditional density of X given W = w, and let e(x) = Pr(W = 1 | X = x) be the propensity score (Rosenbaum and Rubin, 1983).

In part of our analysis, we adopt the following assumption.

Assumption 3: {Y_i, W_i, X_i}, for i = 1, ..., N, are independent draws from the distribution of (Y, W, X).

In some cases, however, treated and untreated units are sampled separately, and their proportions in the sample may not reflect their proportions in the population. Therefore, we relax Assumption 3 so that sampling is random only conditional on W_i. As we will show later, relaxing Assumption 3 is particularly useful when the parameter of interest is the average treatment effect on the treated. The numbers of control and treated units are N_0 and N_1 respectively, with N = N_0 + N_1. For reasons that will become clear later, to estimate average effects on the treated we will require that the size of the control group is at least of the same order of magnitude as the size of the treatment group.

Assumption 3′: Conditional on W_i = w, the sample consists of independent draws from the distribution of (Y, X) given W = w, for w = 0, 1. For some r ≥ 1, N_1^r / N_0 → θ, with 0 < θ < ∞.

In this paper we focus on matching with replacement, allowing each unit to be used as a match more than once. For x ∈ X, let ||x|| = (x'x)^{1/2} be the standard Euclidean vector norm.³ Let j_m(i) be the index j that solves W_j = 1 − W_i and

Σ_{l: W_l = 1 − W_i} 1{ ||X_l − X_i|| ≤ ||X_j − X_i|| } = m,

where 1{·} is the indicator function, equal to one if the expression in brackets is true and zero otherwise. In other words, j_m(i) is the index of the unit that is the m-th closest to unit i in terms of the covariate values, among the units with the treatment opposite to that of unit i. In particular, j_1(i), which will sometimes be denoted by j(i), is the nearest match for unit i. For notational simplicity, and because we only consider continuous covariates, we ignore the possibility of ties, which happen with probability zero. Let J_M(i) denote the set of indices for the first M matches for unit i: J_M(i) = {j_1(i), ..., j_M(i)}.⁴ Finally, let K_M(i) denote the number of times unit i is used as a match, given that M matches per unit are done:

K_M(i) = Σ_{l=1}^N 1{ i ∈ J_M(l) }.

The distribution of K_M(i) will play an important role in the variance of the estimators. In many analyses of matching methods (e.g., Rosenbaum, 1995), matching is carried out without replacement, so that every unit is used as a match at most once and K_M(i) ≤ 1. In this article, however, we focus on matching with replacement, allowing each unit to be used as a match more than once. Matching with replacement produces matches of higher quality than matching without replacement, by increasing the set of possible matches.⁵ In addition, matching with replacement has the advantage that it allows us to consider estimators that match all units, treated as well as controls, so that the estimand is identical to the population average treatment effect.

3 Alternative norms of the form ||x||_V = (x'Vx)^{1/2}, for some positive definite symmetric matrix V, are also covered by the results below, because ||x||_V = ((Px)'(Px))^{1/2} for P such that P'P = V.
4 For this definition to make sense, we assume that N_0 ≥ M and N_1 ≥ M. We keep this assumption implicit throughout.
5 As we show below, inexact matches generate bias in matching estimators. Therefore, expanding the set of possible matches will tend to produce smaller biases.

2.2. Estimators

The unit-level treatment effect is τ_i = Y_i(1) − Y_i(0). For the units in the sample, only one of the potential outcomes, Y_i(0) and Y_i(1), is observed; the other is unobserved or missing. All estimators for the average treatment effects that we consider impute the expected potential outcomes in some way. The first estimator, the simple matching estimator, uses the following estimates of the expected potential outcomes:

Ŷ_i(0) = Y_i if W_i = 0, and (1/M) Σ_{j ∈ J_M(i)} Y_j if W_i = 1;
Ŷ_i(1) = (1/M) Σ_{j ∈ J_M(i)} Y_j if W_i = 0, and Y_i if W_i = 1;

leading to the following estimator for the average treatment effect:

τ̂^{sm} = (1/N) Σ_{i=1}^N ( Ŷ_i(1) − Ŷ_i(0) ) = (1/N) Σ_{i=1}^N (2W_i − 1) (1 + K_M(i)/M) Y_i.   (3)

The simple matching estimator can easily be modified to estimate the average treatment effect on the treated:

τ̂^{sm,t} = (1/N_1) Σ_{W_i = 1} ( Y_i − Ŷ_i(0) ) = (1/N_1) Σ_{i=1}^N ( W_i − (1 − W_i) K_M(i)/M ) Y_i.   (4)

It is useful to compare matching estimators to covariance-adjustment, or regression imputation, estimators. Let µ̂_w(X_i) be a consistent estimator of µ_w(X_i). Let

Ȳ_i(0) = Y_i if W_i = 0, and µ̂_0(X_i) if W_i = 1;
Ȳ_i(1) = µ̂_1(X_i) if W_i = 0, and Y_i if W_i = 1.   (5)

The regression imputation estimators of τ and τ_t are

τ̂^{reg} = (1/N) Σ_{i=1}^N ( Ȳ_i(1) − Ȳ_i(0) )   and   τ̂^{reg,t} = (1/N_1) Σ_{W_i = 1} ( Y_i − Ȳ_i(0) ).   (6)

In our discussion we classify as regression imputation estimators those for which µ̂_w(x) is a consistent estimator of µ_w(x). The estimators proposed by Hahn (1998) and some of those proposed by Heckman, Ichimura, and Todd (1997) and Heckman, Ichimura, Smith, and Todd (1998) fall into this category. If µ_w(X_i) is estimated using a nearest neighbor estimator with a fixed number of neighbors, then the regression imputation estimator is identical to the matching estimator with the same number of matches. The two estimators differ in the way they change with the sample size. We classify as matching estimators those estimators which use a finite and fixed number of matches. Interpreting matching estimators in this way may provide some intuition for some of the subsequent results.
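To make the definitions in equations (3) and (4) concrete, the following sketch computes the simple matching estimators directly from the imputed potential outcomes. It is an addition to this transcription, not part of the original paper: the function name, the use of NumPy, and the brute-force nearest-neighbor search are choices made here for illustration only.

```python
# Minimal sketch of the simple matching estimator with replacement (eqs. (3)-(4)).
# Illustrative only; not the authors' implementation.
import numpy as np

def simple_matching(Y, W, X, M=1):
    """Match each unit to its M nearest neighbors (Euclidean distance on X)
    in the opposite treatment group; return (tau_sm, tau_sm_t, K)."""
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=int)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    N = len(Y)
    Y0_hat, Y1_hat = Y.copy(), Y.copy()   # imputed potential outcomes
    K = np.zeros(N)                       # K_M(i): times unit i is used as a match
    for i in range(N):
        opp = np.where(W != W[i])[0]                 # opposite treatment group
        d = np.linalg.norm(X[opp] - X[i], axis=1)
        matches = opp[np.argsort(d)[:M]]             # M closest opposite-group units
        if W[i] == 1:
            Y0_hat[i] = Y[matches].mean()            # impute Y_i(0)
        else:
            Y1_hat[i] = Y[matches].mean()            # impute Y_i(1)
        K[matches] += 1                              # accumulate K_M for the matched units
    tau_sm = np.mean(Y1_hat - Y0_hat)                # ATE estimator, eq. (3)
    tau_sm_t = np.mean((Y - Y0_hat)[W == 1])         # ATT estimator, eq. (4)
    return tau_sm, tau_sm_t, K
```

The loop also makes explicit how K_M(i) arises from the two-sided matching used for the average treatment effect: a unit's weight in the second expression of (3) grows with the number of times it is used as a match.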

In nonparametric regression methods one typically chooses smoothing parameters to balance the bias and variance of the estimated regression function. For example, in kernel regression a smaller bandwidth leads to lower bias but higher variance. A nearest neighbor estimator with a single neighbor is at the extreme end of this trade-off: the bias is minimized within the class of nearest neighbor estimators, but the variance no longer vanishes with the sample size. Nevertheless, as we shall show, matching estimators of average treatment effects are consistent under weak regularity conditions. The variance of matching estimators, however, is still relatively high and, as a result, matching with a fixed number of matches does not lead to an efficient estimator.

The first goal of our paper is to derive the properties of the simple matching estimator in large samples, that is, as N increases, for fixed M. The motivation for our fixed-M asymptotics is to provide an approximation to the sampling distribution of matching estimators with a small number of matches, because matching estimators with a small number of matches have been widely used in practice. The properties of interest include bias and variance. Of particular interest is the dependence of these results on the dimension of the covariates. A second goal is to provide methods for conducting inference through estimation of the large sample variance of the matching estimator.

3. Simple Matching Estimators

In this section we investigate the properties of the simple matching estimator, τ̂^{sm}, defined in (3). We can write the difference between the matching estimator, τ̂^{sm}, and the population average treatment effect τ as

τ̂^{sm} − τ = ( τ̄(X) − τ ) + E^{sm} + B^{sm},   (7)

where τ̄(X) is the average conditional treatment effect,

τ̄(X) = (1/N) Σ_{i=1}^N ( µ_1(X_i) − µ_0(X_i) ),   (8)

E^{sm} is a weighted average of the residuals,

E^{sm} = (1/N) Σ_{i=1}^N E^{sm}_i = (1/N) Σ_{i=1}^N (2W_i − 1) (1 + K_M(i)/M) ε_i,   (9)

and B^{sm} is the conditional bias relative to τ̄(X):

B^{sm} = (1/N) Σ_{i=1}^N B^{sm}_i = (1/N) Σ_{i=1}^N (2W_i − 1) (1/M) Σ_{m=1}^M ( µ_{1−W_i}(X_i) − µ_{1−W_i}(X_{j_m(i)}) ).   (10)

The first two terms on the right hand side of equation (7), (τ̄(X) − τ) and E^{sm}, have zero mean. They will be shown to be N^{1/2}-consistent and asymptotically normal. The first term depends only on the covariates, and its variance is V^{τ(X)}/N, where V^{τ(X)} = E[(τ(X) − τ)²] is the variance of the conditional average treatment effect τ(X). Conditional on X and W (the matrix and the vector with i-th row equal to X_i' and W_i, respectively), the variance of τ̂^{sm} is equal to the conditional variance of the second term, E^{sm}. We will analyze the variances of these two terms in Section 3.2. We will refer to the third term on the right hand side of equation (7), B^{sm}, as the conditional bias, and to Bias^{sm} = E[B^{sm}] as the unconditional bias. If matching is exact, X_i = X_{j_m(i)} for all i, and the conditional bias is equal to zero. In general it is not, and its properties will be analyzed in Section 3.1.

Similarly, we can write the estimator for the average effect for the treated, (4), as

τ̂^{sm,t} − τ_t = ( τ̄(X)_t − τ_t ) + E^{sm,t} + B^{sm,t},   (11)

where

τ̄(X)_t = (1/N_1) Σ_{i=1}^N W_i ( µ_1(X_i) − µ_0(X_i) )

is the average conditional treatment effect for the treated,

E^{sm,t} = (1/N_1) Σ_{i=1}^N E^{sm,t}_i = (1/N_1) Σ_{i=1}^N ( W_i − (1 − W_i) K_M(i)/M ) ε_i

is the contribution of the residuals, and

B^{sm,t} = (1/N_1) Σ_{i=1}^N B^{sm,t}_i = (1/N_1) Σ_{i=1}^N W_i (1/M) Σ_{m=1}^M ( µ_0(X_i) − µ_0(X_{j_m(i)}) )

is the conditional bias.

3.1. Bias

Here we investigate the stochastic order of the conditional bias and of its counterpart for the average treatment effect for the treated. The conditional bias consists of terms of the form µ_1(X_{j_m(i)}) − µ_1(X_i) or µ_0(X_i) − µ_0(X_{j_m(i)}). To investigate the nature of these terms, expand the difference µ_0(X_{j_m(i)}) − µ_0(X_i) around X_i:

µ_0(X_{j_m(i)}) − µ_0(X_i) = (X_{j_m(i)} − X_i)' (∂µ_0/∂x)(X_i) + (1/2) (X_{j_m(i)} − X_i)' (∂²µ_0/∂x∂x')(X_i) (X_{j_m(i)} − X_i) + O( ||X_{j_m(i)} − X_i||³ ).

In order to study the components of the bias, it is therefore useful to analyze the distribution of the matching discrepancy X_{j_m(i)} − X_i.

First, let us analyze the matching discrepancy at a general level. Fix the covariate value at X = z, and suppose we have a random sample X_1, ..., X_N with density f(x) and distribution function F(x) over the support X, which is bounded. Now, consider the closest match to z in the sample. Let j_1 = argmin_{j=1,...,N} ||X_j − z||, and let U_1 = X_{j_1} − z be the matching discrepancy. We are interested in the distribution of the difference U_1, which is a vector. More generally, we are interested in the distribution of the m-th closest matching discrepancy, U_m = X_{j_m} − z, where X_{j_m} is the m-th closest match to z from the random sample of size N.

The following lemma describes some key asymptotic properties of the matching discrepancy at interior points of the support of X.

Lemma 1: (Matching Discrepancy: Asymptotic Properties) Suppose that f(z) > 0 and that f is differentiable in a neighborhood of z. Let V_m = N^{1/k} U_m and let f_{V_m} be the density of V_m. Then, as N → ∞,

lim_{N→∞} f_{V_m}(v) = ( f(z) / (m−1)! ) ( ||v||^k f(z) π^{k/2} / Γ(1 + k/2) )^{m−1} exp( − ||v||^k f(z) π^{k/2} / Γ(1 + k/2) ),

where Γ(y) = ∫_0^∞ e^{−t} t^{y−1} dt, for y > 0, is Euler's Gamma function, so that U_m = O_p(N^{−1/k}). Moreover, the first three moments of U_m are

E[U_m] = ( Γ(m + 2/k) / (m−1)! ) ( f(z) π^{k/2} / Γ(1 + k/2) )^{−2/k} (1/(k f(z))) (∂f/∂x)(z) (1/N)^{2/k} + o( N^{−2/k} ),

E[U_m U_m'] = ( Γ(m + 2/k) / (m−1)! ) ( f(z) π^{k/2} / Γ(1 + k/2) )^{−2/k} (1/k) (1/N)^{2/k} I_k + o( N^{−2/k} ),

and E[ ||U_m||³ ] = O( N^{−3/k} ), where I_k is the identity matrix of size k. (All proofs are given in the appendix.)

This lemma shows that the order of the matching discrepancy increases with the number of continuous covariates. The lemma also shows that the first term in the stochastic expansion of N^{1/k} U_m has a rotation-invariant distribution with respect to the origin.
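To make the N^{-1/k} rate in Lemma 1 concrete, the small simulation below checks how the mean nearest-match discrepancy shrinks when the pool of potential matches is quadrupled. It is an addition to this transcription, not part of the paper; the uniform design, the interior evaluation point z, and the sample sizes are choices made here.

```python
# Simulation consistent with Lemma 1: E||U_1|| shrinks roughly like N^(-1/k)
# at an interior point z of the support, here the center of [0,1]^k.
import numpy as np

rng = np.random.default_rng(0)

def mean_nearest_discrepancy(N, k, reps=1000):
    z = np.full(k, 0.5)                      # interior evaluation point
    dists = np.empty(reps)
    for r in range(reps):
        X = rng.uniform(size=(N, k))         # N potential matches, uniform on [0,1]^k
        dists[r] = np.min(np.linalg.norm(X - z, axis=1))
    return dists.mean()

for k in (1, 2, 4):
    m1 = mean_nearest_discrepancy(1000, k)
    m4 = mean_nearest_discrepancy(4000, k)
    # If E||U_1|| ~ C * N^(-1/k), quadrupling N scales the mean by 4^(-1/k).
    print(f"k={k}: observed ratio {m4 / m1:.3f}, predicted {4 ** (-1 / k):.3f}")
```

With k = 1 the discrepancy falls by roughly a factor of four when N is quadrupled, while with k = 4 it falls by only about 30 percent; this slow shrinkage in higher dimensions is the mechanism behind the bias results that follow.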

The following lemma shows that, for all points in the support, including boundary points, the normalized moments of the matching discrepancies U_m are bounded.

Lemma 2: (Matching Discrepancy: Uniformly Bounded Moments) If Assumption 1 holds, then all the moments of N^{1/k} ||U_m|| are uniformly bounded in N and in z ∈ X.

These results allow us to bound the stochastic order of the conditional bias.

Theorem 1: (Order of the Conditional Bias for the Average Treatment Effect) Under Assumptions 1, 2, and 3: (i) if µ_0(x) and µ_1(x) are Lipschitz on X, then B^{sm} = O_p(N^{−1/k}); and (ii) the order of Bias^{sm} = E[B^{sm}] is not in general lower than N^{−2/k}.

Consider the implications of this theorem for the asymptotic properties of the simple matching estimator. First notice that, under regularity conditions, τ̄(X) − τ = O_p(N^{−1/2}), with a normal limiting distribution, by a standard central limit theorem. Also, it will be shown later that, under regularity conditions, E^{sm} = O_p(N^{−1/2}), again with a normal limiting distribution. However, the result of the theorem implies that B^{sm} is not O_p(N^{−1/2}) in general. In particular, if k is large enough, the asymptotic distribution of N^{1/2}(τ̂^{sm} − τ) is dominated by the bias term and the simple matching estimator is not N^{1/2}-consistent. However, if only one of the covariates is continuously distributed, then k = 1 and B^{sm} = O_p(N^{−1}), so N^{1/2}(τ̂^{sm} − τ) will be asymptotically normal. A similar result holds for the average treatment effect on the treated.

Theorem 2: (Order of the Conditional Bias for the Average Treatment Effect on the Treated) Under Assumptions 1, 2′, and 3′: (i) if µ_0(x) is Lipschitz on X_0, then B^{sm,t} = O_p(N_1^{−r/k}); and (ii) if X_1 is a compact subset of the interior of X_0, µ_0(x) has bounded third derivatives in the interior of X_0, and f_0(x) is differentiable in the interior of X_0 with bounded derivatives, then

Bias^{sm,t} = E[B^{sm,t}] = − (θ^{2/k} / N_1^{2r/k}) ( π^{k/2} / Γ(1 + k/2) )^{−2/k} (1/(kM)) Σ_{m=1}^M ( Γ(m + 2/k) / (m−1)! )
× ∫ f_0(x)^{−2/k} { (1/f_0(x)) (∂f_0/∂x(x))' (∂µ_0/∂x(x)) + (1/2) tr( ∂²µ_0/∂x∂x'(x) ) } f_1(x) dx + o( N_1^{−2r/k} ).

This case is particularly relevant, because matching estimators have often been used to estimate the average effect for the treated in settings in which a large number of controls are sampled separately. Generally, in those cases, the conditional bias term has been ignored in the asymptotic approximation to standard errors and confidence intervals. That is justified if N_0 is of sufficiently high order relative to N_1 or, to be precise, if r > k/2. In that case it follows that B^{sm,t} = o_p(N_1^{−1/2}), and the bias term will be dominated in the large sample distribution by the two other terms, (τ̄(X)_t − τ_t) and E^{sm,t}, both of which are O_p(N_1^{−1/2}).

In part (ii) of Theorem 2, we show that a general expression for the bias Bias^{sm,t} can be calculated if X_1 is compact and X_1 ⊂ int X_0, so that the bias is not affected by the geometric characteristics of the boundary of X_0. Under these conditions, the bias of the matching estimator is at most of order N_0^{−2/k}. This bias is further reduced when µ_0 is constant, or when µ_0 is linear and f_0 is constant, among other cases. Note, however, that randomizing the treatment (f_0 = f_1) does not reduce the order of Bias^{sm,t}.

3.2. Variance

In this section we investigate the variance of the simple matching estimator, τ̂^{sm}. We focus on the first two terms of the simple matching estimator, (8) and (9), ignoring for the moment the conditional bias term. Conditional on X and W (the matrix and the vector with i-th row equal to X_i' and W_i, respectively), the variance of τ̂^{sm} is

V( τ̂^{sm} | X, W ) = (1/N²) Σ_{i=1}^N ( 1 + K_M(i)/M )² σ²(X_i, W_i).   (12)

For τ̂^{sm,t} we obtain

V( τ̂^{sm,t} | X, W ) = (1/N_1²) Σ_{i=1}^N ( W_i − (1 − W_i) K_M(i)/M )² σ²(X_i, W_i).   (13)

Let V^E = N V(τ̂^{sm} | X, W) and V^{E,t} = N_1 V(τ̂^{sm,t} | X, W) be the corresponding normalized variances. Ignoring the conditional bias term, B^{sm}, the conditional expectation of τ̂^{sm} is τ̄(X). The variance of this conditional mean is therefore V^{τ(X)}/N, where V^{τ(X)} = E[(τ(X) − τ)²]. Hence the marginal variance of τ̂^{sm}, ignoring the conditional bias term, is V(τ̂^{sm}) = ( E[V^E] + V^{τ(X)} ) / N. For the estimator of the average effect on the treated the marginal variance is, again ignoring the conditional bias term, V(τ̂^{sm,t}) = ( E[V^{E,t}] + V^{τ(X),t} ) / N_1, where V^{τ(X),t} = E[ (τ_t(X) − τ_t)² | W = 1 ].

The following lemma shows that the expectation of the normalized variance is finite. The key is that K_M(i), the number of times that unit i is used as a match, is O_p(1) with finite moments.⁶

Lemma 3: (i) Suppose Assumptions 1-3 hold. Then K_M(i) = O_p(1) and its moments are bounded uniformly in N. (ii) If, in addition, σ²(x, w) is Lipschitz in X for w = 0, 1, then E[V^E + V^{τ(X)}] = O(1). (iii) Suppose Assumptions 1, 2′, and 3′ hold. Then (N_0/N_1) E[ K_M(i)^q | W_i = 0 ] is uniformly bounded in N for all q. (iv) If, in addition, σ²(x, w) is Lipschitz in X for w = 0, 1, then E[V^{E,t} + V^{τ(X),t}] = O(1).

6 Notice that, for i = 1, ..., N, the K_M(i) are exchangeable random variables, and therefore have identical marginal distributions.

3.3. Consistency and Asymptotic Normality

In this section we show that the simple matching estimator is consistent for the average treatment effect and that, without the conditional bias term, it is N^{1/2}-consistent and asymptotically normal. The next assumption contains a set of weak smoothness restrictions on the conditional distribution of Y given X. Notice that it does not require the existence of higher order derivatives.

Assumption 4: (i) µ(x, w) and σ²(x, w) are Lipschitz in X for w = 0, 1; (ii) the fourth moments of the conditional distribution of Y given W = w and X = x exist and are uniformly bounded; and (iii) σ²(x, w) is bounded away from zero.

Theorem 3: (Consistency of the Simple Matching Estimator) (i) Suppose Assumptions 1-3 and 4(i) hold. Then τ̂^{sm} − τ →p 0. (ii) Suppose Assumptions 1, 2′, 3′, and 4(i) hold. Then τ̂^{sm,t} − τ_t →p 0.

Notice that the consistency result holds regardless of the dimension of the covariates. Next, we state the formal result for asymptotic normality. The first result gives asymptotic normality for the estimators τ̂^{sm} and τ̂^{sm,t} after subtracting the bias term.

Theorem 4: (Asymptotic Normality for the Simple Matching Estimator) (i) Suppose Assumptions 1-3 and 4 hold. Then

( V^E + V^{τ(X)} )^{−1/2} N^{1/2} ( τ̂^{sm} − B^{sm} − τ ) →d N(0, 1).

(ii) Suppose Assumptions 1, 2′, 3′, and 4 hold. Then

( V^{E,t} + V^{τ(X),t} )^{−1/2} N_1^{1/2} ( τ̂^{sm,t} − B^{sm,t} − τ_t ) →d N(0, 1).

Although one generally does not know the conditional bias term, this result is useful for two reasons. First, in some cases the bias term can be ignored because it is of sufficiently low order. Second, as we show in Abadie and Imbens (2003), under some conditions an estimate of the bias term can be used in the statement of Theorem 4 without changing the result. In the scalar covariate case, or when only the treated are matched and the size of the control group is of sufficient order of magnitude, there is no need to remove the bias.

Corollary 1: (Asymptotic Normality for the Simple Matching Estimator with Vanishing Bias) (i) Suppose Assumptions 1-3 and 4 hold, and k = 1. Then

( V^E + V^{τ(X)} )^{−1/2} N^{1/2} ( τ̂^{sm} − τ ) →d N(0, 1).

(ii) Suppose Assumptions 1, 2′, 3′, and 4 hold, and r > k/2. Then

( V^{E,t} + V^{τ(X),t} )^{−1/2} N_1^{1/2} ( τ̂^{sm,t} − τ_t ) →d N(0, 1).

3.4. Efficiency

The asymptotic efficiency of the estimators considered here depends on the limit of E[V^E], which in turn depends on the limiting distribution of K_M(i). It is difficult to work out the limiting distribution of this variable for the general case.⁷ Here we investigate the form of the variance for the special case with a scalar covariate (k = 1) and a general M.

Theorem 5: Suppose k = 1. If Assumptions 1 to 4 hold, and f_0 and f_1 are continuous on int X, then

V( τ̂^{sm} ) = (1/N) E[ σ²_1(X)/e(X) + σ²_0(X)/(1 − e(X)) ] + (1/N) V^{τ(X)}
+ (1/(2MN)) E[ (1/e(X) − e(X)) σ²_1(X) + (1/(1 − e(X)) − (1 − e(X))) σ²_0(X) ] + o(N^{−1}).

Note that with k = 1 we can ignore the conditional bias term, B^{sm}. The semiparametric efficiency bound for this problem is, as established by Hahn (1998),

V^{eff} = E[ σ²_1(X)/e(X) + σ²_0(X)/(1 − e(X)) ] + V^{τ(X)}.

The limiting variance of the matching estimator is in general larger. Relative to the efficiency bound, the asymptotic efficiency loss can be written as

lim_{N→∞} ( N V(τ̂^{sm}) − V^{eff} ) / V^{eff} = (1/(2M)) E[ (1/e(X) − e(X)) σ²_1(X) + (1/(1 − e(X)) − (1 − e(X))) σ²_0(X) ] / V^{eff} ≤ 1/(2M).

The asymptotic efficiency loss disappears quickly if the number of matches is large enough, and the efficiency loss from using only a few matches is very small. For example, the asymptotic variance with a single match is less than 50% higher than the asymptotic variance of the efficient estimator, and with five matches the asymptotic variance is less than 10% higher.

7 The key is the second moment of the volume of the catchment area A_M(i), defined as the subset of X such that each observation j with W_j = 1 − W_i and X_j ∈ A_M(i) is matched to i. In the single match case with M = 1, these objects are studied in stochastic geometry, where they are known as Poisson-Voronoi tessellations (Okabe, Boots, Sugihara, and Chiu, 2000). The variance of the volume of such objects under uniform f_0(x) and f_1(x), normalized by the mean, has been worked out numerically for the one, two, and three dimensional cases.
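The 50% and 10% figures quoted above can be checked directly from the two displays in this subsection. The short derivation below is added here for the reader's convenience and is not reproduced from the paper; it only uses the excess-variance term of Theorem 5 and the bound V^{eff}.

```latex
% Excess of N V(\hat{\tau}^{sm}) over the efficiency bound V^{eff}, from Theorem 5 (k = 1):
\[
\frac{1}{2M}\,
E\!\left[\Bigl(\tfrac{1}{e(X)}-e(X)\Bigr)\sigma_1^2(X)
      +\Bigl(\tfrac{1}{1-e(X)}-(1-e(X))\Bigr)\sigma_0^2(X)\right]
\;\le\;
\frac{1}{2M}\,
E\!\left[\frac{\sigma_1^2(X)}{e(X)}+\frac{\sigma_0^2(X)}{1-e(X)}\right]
\;\le\;
\frac{V^{eff}}{2M},
\]
% so the relative efficiency loss is at most 1/(2M): at most 50% with a single
% match (M = 1) and at most 10% with M = 5 matches.
```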

4. Estimating the Variance

Corollary 1 uses the square roots of V^E + V^{τ(X)} and V^{E,t} + V^{τ(X),t} as normalizing factors to attain a limiting normal distribution for matching estimators. In this section, we show how to estimate these terms.

4.1. Estimating the Conditional Variance

Estimating the conditional variance, V^E = (1/N) Σ_{i=1}^N (1 + K_M(i)/M)² σ²_{W_i}(X_i), is complicated by the fact that it involves the conditional outcome variances, σ²_w(x). We propose an estimator of the conditional variance of the simple matching estimator that does not require consistent nonparametric estimation of σ²_w(x). Our method uses a matching estimator for σ²_w(x) where, instead of the original matching of treated to control units, we now match treated units to treated units and control units to control units. Let l_j(i) be the j-th closest unit to unit i among the units with the same value of the treatment. Then, we estimate the conditional variance as

σ̂²_{W_i}(X_i) = ( J/(J + 1) ) ( Y_i − (1/J) Σ_{j=1}^J Y_{l_j(i)} )².   (14)

Notice that if all matches are perfect, so that X_{l_j(i)} = X_i for all j = 1, ..., J, then E[ σ̂²_{W_i}(X_i) | X_i = x, W_i = w ] = σ²_w(x). In practice, if the covariates are continuous it will not be possible to find perfect matches, so σ̂²_{W_i}(X_i) will be only asymptotically unbiased. Because σ̂²_{W_i}(X_i) is an average of a fixed number of observations, this estimator is not consistent for σ²_{W_i}(X_i). However, the next theorem shows that averages of the σ̂²_{W_i}(X_i) are consistent for V^E and V^{E,t}.

Theorem 6: Let σ̂²_{W_i}(X_i) be as in equation (14). Let

V̂^E = (1/N) Σ_{i=1}^N ( 1 + K_M(i)/M )² σ̂²_{W_i}(X_i)   and   V̂^{E,t} = (1/N_1) Σ_{i=1}^N ( W_i − (1 − W_i) K_M(i)/M )² σ̂²_{W_i}(X_i).

If Assumptions 1 to 4 hold, then V̂^E − V^E = o_p(1). If Assumptions 1, 2′, 3′, and 4 hold, then V̂^{E,t} − V^{E,t} = o_p(1).
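The estimator in equation (14) and the average in Theorem 6 are easy to compute once K_M(i) is available from the matching step. The sketch below is an addition to this transcription (names, the use of NumPy, and the brute-force search are choices made here); it mirrors those two displays for the average treatment effect.

```python
# Sketch of the within-group variance estimator (eq. (14)) and of V^E from Theorem 6.
# `K` is the vector of K_M(i) from the two-sided matching step (e.g., as returned by
# simple_matching above); J is the number of same-treatment-group matches.
import numpy as np

def conditional_variance_terms(Y, W, X, K, M=1, J=1):
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=int)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    N = len(Y)
    sigma2 = np.empty(N)
    for i in range(N):
        same = np.where(W == W[i])[0]
        same = same[same != i]                     # same treatment group, excluding i
        d = np.linalg.norm(X[same] - X[i], axis=1)
        nbrs = same[np.argsort(d)[:J]]             # J closest same-group units
        # eq. (14): sigma2_hat_{W_i}(X_i) = J/(J+1) * (Y_i - mean of the J neighbors)^2
        sigma2[i] = (J / (J + 1)) * (Y[i] - Y[nbrs].mean()) ** 2
    # Theorem 6: V^E_hat = (1/N) * sum_i (1 + K_M(i)/M)^2 * sigma2_hat_i
    V_E_hat = np.mean((1 + K / M) ** 2 * sigma2)
    return sigma2, V_E_hat
```

Combined with the correction terms of Theorem 7 in the next subsection, these quantities deliver standard errors of the form (V̂/N)^{1/2} without any consistent nonparametric estimation of σ²_w(x).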

4.2. Estimating the Marginal Variance

Here we develop consistent estimators for V = V^E + V^{τ(X)} and V^t = V^{E,t} + V^{τ(X),t}. These estimators are based on the same matching approach to estimating the conditional error variance σ²_w(x) as in the previous subsection. In addition, these estimators exploit the fact that

E[ ( Ŷ_i(1) − Ŷ_i(0) − τ )² ] ≈ V^{τ(X)} + E[ ( ε_i + (1/M) Σ_{m=1}^M ε_{j_m(i)} )² ].

The average of the left hand side can be estimated by Σ_{i=1}^N ( Ŷ_i(1) − Ŷ_i(0) − τ̂^{sm} )² / N. The average of the second term on the right hand side can be estimated using the fact that

(1/N) Σ_{i=1}^N E[ ( ε_i + (1/M) Σ_{m=1}^M ε_{j_m(i)} )² | X, W ] = (1/N) Σ_{i=1}^N ( 1 + K_M(i)/M² ) σ²_{W_i}(X_i),

which can then be combined to estimate V^{τ(X)}. This in turn can be combined with the previously defined estimator of V^E to obtain an estimator of V.

Theorem 7: Let σ̂²_{W_i}(X_i) be as in equation (14). Let

V̂ = (1/N) Σ_{i=1}^N ( Ŷ_i(1) − Ŷ_i(0) − τ̂^{sm} )² + (1/N) Σ_{i=1}^N [ ( K_M(i)/M )² + ( (2M − 1)/M ) ( K_M(i)/M ) ] σ̂²_{W_i}(X_i),

and

V̂^t = (1/N_1) Σ_{W_i = 1} ( Y_i − Ŷ_i(0) − τ̂^{sm,t} )² + (1/N_1) Σ_{i=1}^N ( K_M(i) ( K_M(i) − 1 ) / M² ) (1 − W_i) σ̂²_{W_i}(X_i).

If Assumptions 1 to 4 hold, then V̂ − V = o_p(1). If Assumptions 1, 2′, 3′, and 4 hold, then V̂^t − V^t = o_p(1).

5. Conclusion

In this paper we derive large sample properties of simple matching estimators that are widely used in applied evaluation research. The formal large sample properties turn out to be surprisingly poor. We show that simple matching estimators include a conditional bias term which does not disappear in large samples under the standard N^{1/2} normalization. Therefore, standard matching estimators are not N^{1/2}-consistent in general. We derive the asymptotic distribution of matching estimators for the cases where the conditional bias can be ignored, and show that matching estimators with a fixed number of matches are not efficient.

18 Appendix Before proving Lemma, we collect some results on integration using polar coordinates that will be useful. See for example Strooc 999. Let = {ω R : ω = } be the unit -sphere, and λ S be its surface measure. Then, the area and volume of the unit -sphere are λ S dω = S π/ and r λ S dω dr = π/ Γ/ Γ/ = π / Γ + /, respectively. In addition, ω λ S dω =, and ωω λ S dω = λ S dω π / I = Γ + / I, where I is the -dimensional identity matrix. For any non-negative measurable function g on R, gx dx = r grω λ S dω dr. R We will also use the following result on Laplace approximation of integrals. Lemma A.: Let ar and br be two real functions, ar is continuous in a neighborhood of zero and br has continuous first derivative in a neighborhood of zero. Let b =, br > for r >, and that for every r > the infimum of br over r r is positive. Suppose that there exist positive real numbers a, b, α, β such that lim r arr α = a, lim brr β db = b, and lim r r dr rr β = b β. Suppose also that ar exp br dr < for all sufficiently large. Then, for α a ar exp brdr = Γ β + o. α/β α/β βb α/β Proof: It follows from Theorem 7. in Olver 997, page 8. Proof of Lemma : First consider the conditional probability of unit i being the m-th closest match to z, given X i = x: Prj m = i X i = x = Pr X z > x z m Pr X z x z m. m Because the marginal probability of unit i being the m-th closest match to z is Prj m = i = /, and because the density of X i is fx, then the distribution of X i conditional on it being the m-th closest match is: f Xi j m=ix = fx Prj m = i X i = x = fx Pr X z x z m Pr X z x z m, m and this is also the distribution of X jm. ow transform to the matching discrepancy U m = X jm z to get f Um u = fz + u Pr X z u m Pr X z u m. A. m 6

19 Transform to V m = / U m with Jacobian to obtain: f Vm v = f z + v Pr m / f z + = m m ote that Pr X z v / is v / / r fz + rωλ S dω dr, X z v / v Pr X z v / / m Pr X z v + o Pr / m X z v / where = {ω R : ω = } is the unit -sphere, and λ S is its surface measure. The derivative w.r.t. is v f z + v ω λ / S dω. Therefore, Pr X z v / lim / In addition, it is easy to chec that m = m m! + o. = v fz λ S dω. Therefore, lim f V m v = fz v fz m λ S dω exp v fz λ S dω. m! The previous equation shows that the density of V m converges pointwise to a non-negative function which is rotation invariant with respect to the origin. As a result, the matching discrepancy U m is O p / and the limiting distribution of / U m is rotation invariant with respect to the origin. This finishes the proof of the first result. ext, given f Um u in A., EU m = m A m, where A m = u fz + u Pr X z u m Pr X z u m du. R Boundedness of X implies that A m converges uniformly. It is easy to relax the bounded support condition here. We maintain it because it is used elsewhere in the article. Changing variables to polar coordinates gives: A m = r rω fz + rωλ S dω Pr X z r m Pr X z r m dr Then rewriting the probability Pr X z r as fx{ x z r}dx = fz + v{ v r}dv = R R 7 r s fz + sωλ S dω ds m.

20 and substituting this into the expression for A m gives: where A m = br = log r rω fz + rωλ S dω r r m s fz + sωλ S dω ds r m s fz + sωλ S dω ds dr = s fz + sωλ S dω ds, e br ar dr, and ar = r ω fz + rωλ S dω r m s fz + sωλ S dω ds r s fz + sωλ S dω m. ds That is, ar = qrpr, qr = r cr, and pr = gr m, where ω fz + rωλ S dω S cr = r, gr = s fz + sωλ S dω ds r s fz + sωλ S dω ds. s fz + sωλ S dω ds First notice that br is continuous in a neighborhood of zero and b =. By Theorem 6. in Rudin 976, s fz + sωλ S dω is continuous, and db dr r = r fz + rωλ S dω, s fz + sωλ S dω ds r which is also continuous. Using L Hospital s rule: db lim r brr = lim r r dr r = fz λ S dω. Similarly, cr is continuous in a neighborhood of zero, c =, and dc f lim r crr = lim r = r dr x z ωω λ S dω = f x z λ S dω. Therefore, dc lim r qrr + = lim r dr r = f x z λ S dω. Similar calculations yield dg lim r grr = lim r r dr r = fz λ S dω. 8 r

21 Therefore lim r prr m = ow, it is clear that lim r arr m+ = = fz m fz λ S dω. lim r qrr + prr m lim r m f λ S dω x z λ S dω = Therefore, the conditions of Lemma A. hold for α = m +, β = m a = fz f λ S dω fz x z, and b = fz λ S dω. Applying Lemma A., we get m + a A m = Γ b m+/ + o m+/ m+/ / m + π / = Γ fz Γ df + fz dx z + o m+/ Therefore, / m + π / EU m ] = Γ fz m! Γ df + fz dx z m fz f λ S dω fz x z. m+/ + o / which finishes the proof for the second result of the lemma. The results for EU m U m] and E U m 3 ] follow from similar arguments. Proof of Lemma : The proof consists of showing that the density of V m = / U m, denoted by f Vm v, is bounded by f Vm v which does not depend on or z, followed by a proof that v L f Vm vdv < for any L >. It is enough to show the result for > m the bounded support condition implies uniformly bounded moments of V m over z X for any given, and in particular for = m. Recall from the proof of Lemma that f Vm v = f z + v Pr X z v m Pr X z v m. m / / / Define f = inf x X fx and f = sup x X fx. By assumption, f > and f is finite. Let ū be the diameter of X ū = sup x,y X x y which is finite because X is bounded by assumption. Consider all the balls Bx, ū with centers x X and radius ū. Let c be the infimum over x X of the fraction that the intersection with X represents in volume of the balls. otice that, because X has dimension, then < c < ; and that, because X is convex, this proportion can only increase for a smaller radius. Let z X and v / ū. /., Pr X z v / = = v / v / f v / cf v / r fz + rωλ S dωdr r fz + rω{fz + rω > }λ S dωdr r {fz + rω > }λ S dωdr r 9 λ S dωdr = c v f π / Γ + /.

22 Similarly, Pr X z v v / f π / Γ + /. Hence, using the fact that for positive a, loga a and thus for all < b < and > m we have b/ m exp b m/ exp b/m +. In addition, m m m!. It follows that f Vm v f Vm v = f m! exp v c v m + f π/ π f / m =C v m exp C v, Γ/ Γ/ with C and C positive. This inequality holds trivially for v > / ū. This establishes an exponential bound that does not depend on or z. Hence for all and z, v L f Vm vdv is finite and thus all moments of / U m are uniformly bounded in and z. Proof of Theorem i: Let the unit-level matching discrepancy U m,i = X i X jmi. Define the unit-level conditional bias from the m-th match as B m,i = W i µ X i µ X jmi W i µ X i µ X jmi = W i µ X i µ X i + U m,i W i µ X i µ X i + U m,i. By the Lipschitz assumption on µ and µ, we obtain B m,i C U m,i, for some positive constant C. The bias term is B sm = m= B m,i. Using the Cauchy-Schwarz Inequality and Lemma : ] E / B sm ] C / E U,i = C / E + / W i= C E / E W ] E / U,i W,..., W, X i / U,i W,..., W, X i ] ] / + / for some positive constant, C. Using Chernoff s Inequality, it can be seen that any moment of / or / is uniformly bounded in with w for w =,. The result of the theorem follows now from arov s Inequality. This proves part i of the theorem. We defer the proof of Theorem ii until after the proof of Theorem ii, because the former will follow directly from the latter. Proof of Theorem : The proof of first part of theorem is very similar to the proof of Theorem, and therefore is omitted. To prove the second part of Theorem, the following auxiliary lemma will be useful. ],

23 Lemma A.: Let X be distributed with density f on some compact set of dimension : X R. Let Z be a compact set of dimension which is a subset of int X. Suppose that f is bounded and bounded away from zero on X, < f fx f < for all x X. Suppose also that f is differentiable in the interior of X with bounded derivatives X, sup x int X fx/ X <. Then / EU m ] is bounded by a constant uniformly over z Z and > m. Proof of Lemma A.: Fix z Z. From the proof of Lemma, we now that: E ] U m = ufz + u Pr X z u m Pr X z u m du m = r rωf z + rω λ m S dω Pr S X z r m Pr X z r m dr. Let rz = sup{r > z + rω X, for all ω }. Given the conditions of the lemma, there exists r such that rz r > for all z Z. Let where A m = ϕr = Then, rz m E U m ] = A m + A m. ϕr dr, and A m = rz ϕr dr, r rωf z + rω λ S dω Pr X z r m Pr X z r m. Consider the change of variable t = r /, then: A m = / rz / Pr m X z t / t ωf m Pr z + t ω λ / S dω X z t m dt. / Let f/ X = sup x int X f/ Xx. By the ean Value Theorem, for t t rz / f z + t ω fz / = t / f ω X z + t / ω t / f X. As a result, for t rz /, we obtain: ωf z + t ω λ / S dω S = ω f z + t ω fz λ / S dω t f / X λ S dω. ote also that for t rz / we have: Pr X z t / = t / f t / s fz + sωλ S dω ds s ds λ S dω = t f λ S dω.

24 Similarly: Pr X z t / Therefore A m It is easy to see that: / m m rz / t f λ S dω. m t + f m X λ S dω t m t m f λ S dω f λ S dω dt. m!. It is also easily seen that for < b < and > m, b/ m exp b/m +. Therefore, for z Z and > m, we obtain: A m < / m! t+ f X λ S dω t t m exp m + f λ S dω f λ S dω dt C, / for some positive constant, C. For A m, notice that, for m: A m Pr X z rz m m r rωfz + rωλ S dω Pr X z r m dr < Pr X z rz m m m! m r rωfz + rωλ S dω Pr X z r m dr rz m Pr X z rz m ū/m!, where ū is the diameter of X, that is, ū = sup x,y X x y <. The last inequality holds because the last term on the left hand side is equal to the expectation of U m when = m which is bounded by ū. Consequently, we obtain: m / A m < ū/m! /+m f rz λ S dω, z Z. To simplify notation, let b = f λ S dω/. Then: / A m < u/m! /+m b rz m u/m! /+m b r m, where < b r <. The right hand side of last equation is maximal at: / + m = ln b r.

25 Therefore, / A m < u/m! / + m /+m ln b r /+m b r /+m ln b r m = / + m /+m u/m! e /+m = C ln b r /+m <. b r m ote that this bound does not depend on or z. As a result, we obtain / E Um ] < C + C <, for all > m and z Z. The proof of the second part of Theorem is as follows. Bias sm,t = EB sm,t ] = E W i µ X i µ X ] jmi = m= E µ X i µ X jmi Wi = ]. m= Applying a second order Taylor expansion, we obtain: µ X jmi µ X i = µ x X i U m,i + tr µ x x X i U m,i U m,i + O U m,i 3. Therefore, because the trace is a linear operator: E µ X jmi µ X i X i = z, W i = ] = µ x z EU m,i X i = z, W i = ] + tr µ x x z EU m,iu m,i X i = z, W i = ] + O E U m,i 3 Xi = z, W i = ]. Lemma implies that the norm of / EU m,i U m,i X i = z, W i = ] and / E U m,i 3 X i = z, W i = ] are uniformly bounded over z X and. Lemma A. implies the same result for / EU m,i X i = z, W i = ]. As a result, / Eµ X jmi µ X i X i = z, W i = ] is uniformly bounded over z X and. Applying Lebesgue s Dominated Convergence Theorem along with Lemma, we obtain: / E µ X jmi µ X i Wi = ] m + = Γ π / f x Γ + / m! / { f f x x x µ x x + tr ow, the result follows easily from the conditions of the theorem. } µ x x x f x dx + o. Proof of Theorem ii: Consider the special case where µ x is flat over X and µ x is flat in a neighborhood of the boundary, B. Then, matching the control units does not create bias. atching the treated units creates a bias that is similar to the formula in Theorem ii, but with r =, θ = p/ p, and the integral taen over X B c. Proof of Lemma 3: Define f = inf x,w f w x and f = sup x,w f w x, with f > and f finite. Let ū = sup x,y X x y. Consider all the balls Bx, u with centers x X and radius u. Let cu < cu < 3

26 be the infimum over x X of the proportion that the intersection with X represents in volume of the balls. ote that, because X is convex, this proportion nonincreasing in u, so let c = cū, and cu c for u ū. The proof consists of three parts. First we derive an exponential bound for the probability that the distance to a match, X jmi X i exceeds some value. Second, we use this to obtain an exponential bound on the volume of the catchment area, A i, defined as the subset of X such that i is matched to each observation, j, with W j = W i and X j A i. That is, if W j = W i and X j A i, then i J j. { A i = x } l W l =W i { X l x X i x }. Third, we use the exponential bound on the volume of the catchment area to derive an exponential bound on the probability of a large K i, which will be used to bound the moments of K i. For the first part we bound the probability of the distance to a match. Let x X and u < / W i ū. Then, Pr Similarly X j X i > u / W W i,..., W, W j = W i, X i = x / u W i = r f Wi x + rωλ S dωdr cf Pr u / W i r λ S dωdr = cfu W i π / /Γ + /. X j X i u / W W i,..., W, W j = W i, X i = x fu W i π / /Γ + /. otice also that Pr X j X i > u / W W i,..., W, X i = x, j J i Pr X j X i > u / W W i,..., W, X i = x, j = j i Wi = m m= Pr X j X i > u / Wi m W W i,..., W, W j = W i, X i = x Pr X j X i u / W i W,..., W, W j = W i, X i = x m. In addition, Wi m Pr X j X i u / m W W i,..., W, W j = W i, X i = x m! u π f / m. Γ + / Therefore, Pr X j X i > u / W W i,..., W, X i = x, j J i m= m! u f π / Γ + / m u cf π / Γ + / Wi Wi m. 4

27 Then, for some constant C >, Pr X j X i > u / W W i,..., W, X i = x, j J i C max{, u } u π / cf Γ + / m= Wi Wi m C max{, u } exp u + cf π /. Γ + / otice that this bound also holds for u / W i ū, because in that case the probability that X jmi X i > u / W i is zero. ext, we consider for unit i, the volume B i of the catchment area A i, defined as: B i = dx. A i Conditional on W,..., W, i J j, X i = x, and A i, the distribution of X j is proportional to f Wi x {x A i}. otice that a ball with radius b/ / /π / /Γ + / / has volume b/. Therefore for X i in A i and B i b, we obtain b/ / Pr X j X i > π / /Γ + / / W,..., W, X i = x, A i, i J j f f. The last inequality does not depend on A m i given B i b. Therefore, b/ / Pr X j X i > π / /Γ + / / W,..., W, X i = x, i J j, B i b f f. As a result, if b/ / Pr X j X i > π / /Γ + / / W,..., W, X i = x, i J j δ f f, then it must be the case that PrB i b W,..., W, X i = x, i J j δ. In fact, the inequality in equation A. has been established above for b = u π /, and δ = f Wi Γ + / f C max{, u } exp u + cf π /. Γ + / Let t = u π / /Γ + /, then Pr Wi B i t W,..., W, X i = x, i J j C max{, C 3 t } exp C 4 t, for some positive constants, C, C 3, and C 4. This establishes an uniform exponential bound, so all the moments of Wi B i exist conditional on W,..., W, X i = x, i J j uniformly in. For the third part of the proof, consider the distribution of K i, the number of times unit i is used as a match. Let P i be the probability that an observation with the opposite treatment is matched to observation i: P i = f Wi xdx f B i. A i ote that for n, E Wi P i n X i = x, W,..., W ] E Wi P i n X i = x, W,..., W, i J j] A. f n E Wi B i n X i = x, W,..., W, i J j]. 5

28 As a result, E Wi P i n X i = x, W,..., W ] is uniformly bounded. Conditional on P i, and on X i = x, W,..., W, the distribution of K i is binomial with parameters Wi and P i. Therefore, conditional on P i, and X i = x, W,..., W, the q-th moment of K i is EK q i P i, X i = x, W,..., W ] = q n= Sq, n Wi! P i n Wi n! q Sq, n Wi P i n, n= where Sq, n are Stirling numbers of the second ind and q see, e.g., Johnson, Kotz and Kemp, 99. Then, because Sq, = for q, EK q i X i = x, W,..., W ] C q Sq, n n= Wi for some positive constant, C. Using Chernoff s bound for binomial tails, it can be easily seen that E Wi / Wi n X i = x, W i ] = E Wi / Wi n W i ] is uniformly bounded in, for all n, so the result of the first part of the lemma follows. ext, consider part ii of Lemma 3. Because the variance σ x, w is Lipschitz on a bounded set, it is therefore bounded by some constant, σ = sup w,x σ x, w. As a result, E + K / σ x, w] is bounded by σ E + K / ], which is uniformly bounded in by the result in the first part of the lemma. Hence EV E ] = O. ext, consider part iii of Lemma 3. Using the same argument as for EK q i], we obtain Therefore, EK q i W i = ] q Sq, n n= EK q i W i = ] q Sq, n n= Wi n, n E P i n W i = ]. n E P i n W i = ], which is uniformly bounded because r. For part iv notice that ] EV E,t ] = E W i σ K i X i, W i + E W i σ X i, W i ] K ] = Eσ i X i, W i W i = ] + E σ X i, W i W i =. Therefore, EV E,t ] is uniformly bounded. Proof of Theorem 3: We only prove the first part of the theorem. The second part follows the same argument. We can write ˆτ sm τ = τx τ + Esm + Bsm. We consider each of the three terms separately. First, by assumptions and 4i, µ w x is bounded over x X and w =,. Hence µ X µ X τ has mean zero and p finite variance. Therefore, by a standard law of large numbers τx τ. Second, by Theorem, B sm = O p / = o p. Finally, because Eε i X, W] =, Eε i X, W] σ and Eε i ε j X, W] = i j, we obtain E ] E sm = E + K ] i ε i = E + K i σ X i, W i ] = O, 6

29 where the last equality comes from Lemma 3. By arov s inequality E sm = O p / = o p. Proof of Theorem 4: We only prove the first assertion in the theorem as the second follows the same argument. We can write ˆτ sm B sm τ = τx τ + E sm. First, consider the contribution of τx τ. By a standard central limit theorem τx τ d, V τx. Second, consider the contribution of E sm/ V E = Esm,i / V E. Conditional on W and X the unit-level terms E,i sm are independent with zero means and non-identical distributions. The conditional variance of E,i sm is + K i/ σ X i, W i. We will use a Lindeberg-Feller central limit theorem for E sm / V E. For a given X, W, the Lindeberg-Feller condition requires that V E ] E E,i sm { E,i sm η LF V E } X, W, for all η LF >. To prove that A.4 condition holds, notice that by Hölder s and arov s inequalities we have ] E E,i sm { E,i sm η LF V E } X, W E E,i sm 4 X, W ] / ] / E { E,i sm η LF V E } X, W E E,i sm 4 X, W ] / Pr E,i sm η LF V E X, W E E,i sm 4 X, W ] / E E sm,i X, W ] ηlf V E. Let s = sup w,x σ wx <, s = inf w,x σ x, w >, and K = sup w,x E ε 4 i X i = x, W i = w ] <. otice that V E s. Therefore, V E V E ] E E,i sm { E,i sm η LF V E } X, W s K / η LF s + K i/ 4 E ε 4 i X, W ] / + K i/ σ X i, W i ηlf V E + K i/ 4. Because E + K i/ 4 ] is uniformly bounded, by arov s Inequality, the last term in parentheses is bounded in probability. Hence, the Lindeberg-Feller condition is satisfied for almost all X and W. As a result, / Esm,i + K i/ σ X i, W i / = / E sm V E d,. Finally, E sm/ V E and τx τ are asymptotically independent the central limit theorem for E sm / V E holds conditional on X and W. Thus, the fact that both converge to standard normal distributions, boundedness of V E and V τx, and boundedness away from zero of V E imply that V E + V τx / / τ sm Bsm τ converges to a standard normal distribution. 7 A.3 A.4


ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. The Sharp RD Design 3.

More information

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING ESTIMATION OF TREATMENT EFFECTS VIA MATCHING AAEC 56 INSTRUCTOR: KLAUS MOELTNER Textbooks: R scripts: Wooldridge (00), Ch.; Greene (0), Ch.9; Angrist and Pischke (00), Ch. 3 mod5s3 General Approach The

More information

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E.

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E. NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES James J. Heckman Petra E. Todd Working Paper 15179 http://www.nber.org/papers/w15179

More information

Efficient Semiparametric Estimation of Quantile Treatment Effects

Efficient Semiparametric Estimation of Quantile Treatment Effects fficient Semiparametric stimation of Quantile Treatment ffects Sergio Firpo U Berkeley - Department of conomics This Draft: January, 2003 Abstract This paper presents calculations of semiparametric efficiency

More information

Classical regularity conditions

Classical regularity conditions Chapter 3 Classical regularity conditions Preliminary draft. Please do not distribute. The results from classical asymptotic theory typically require assumptions of pointwise differentiability of a criterion

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Imbens/Wooldridge, IRP Lecture Notes 2, August 08 1

Imbens/Wooldridge, IRP Lecture Notes 2, August 08 1 Imbens/Wooldridge, IRP Lecture Notes 2, August 08 IRP Lectures Madison, WI, August 2008 Lecture 2, Monday, Aug 4th, 0.00-.00am Estimation of Average Treatment Effects Under Unconfoundedness, Part II. Introduction

More information

Fall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links.

Fall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links. 14.385 Fall, 2007 Nonlinear Econometrics Lecture 2. Theory: Consistency for Extremum Estimators Modeling: Probit, Logit, and Other Links. 1 Example: Binary Choice Models. The latent outcome is defined

More information

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance Identi cation of Positive Treatment E ects in Randomized Experiments with Non-Compliance Aleksey Tetenov y February 18, 2012 Abstract I derive sharp nonparametric lower bounds on some parameters of the

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects

Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects Brigham R. Frandsen Lars J. Lefgren December 16, 2015 Abstract This article develops

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

2. Function spaces and approximation

2. Function spaces and approximation 2.1 2. Function spaces and approximation 2.1. The space of test functions. Notation and prerequisites are collected in Appendix A. Let Ω be an open subset of R n. The space C0 (Ω), consisting of the C

More information

Principles Underlying Evaluation Estimators

Principles Underlying Evaluation Estimators The Principles Underlying Evaluation Estimators James J. University of Chicago Econ 350, Winter 2019 The Basic Principles Underlying the Identification of the Main Econometric Evaluation Estimators Two

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness finite-sample optimal estimation and inference on average treatment effects under unconfoundedness Timothy Armstrong (Yale University) Michal Kolesár (Princeton University) September 2017 Introduction

More information

Optimal bandwidth selection for the fuzzy regression discontinuity estimator

Optimal bandwidth selection for the fuzzy regression discontinuity estimator Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal

More information

Section 10: Inverse propensity score weighting (IPSW)

Section 10: Inverse propensity score weighting (IPSW) Section 10: Inverse propensity score weighting (IPSW) Fall 2014 1/23 Inverse Propensity Score Weighting (IPSW) Until now we discussed matching on the P-score, a different approach is to re-weight the observations

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand

Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand Richard K. Crump V. Joseph Hotz Guido W. Imbens Oscar Mitnik First Draft: July 2004

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

NBER WORKING PAPER SERIES FINITE POPULATION CAUSAL STANDARD ERRORS. Alberto Abadie Susan Athey Guido W. Imbens Jeffrey M.

NBER WORKING PAPER SERIES FINITE POPULATION CAUSAL STANDARD ERRORS. Alberto Abadie Susan Athey Guido W. Imbens Jeffrey M. NBER WORKING PAPER SERIES FINITE POPULATION CAUSAL STANDARD ERRORS Alberto Abadie Susan Athey Guido W. Imbens Jeffrey. Wooldridge Working Paper 20325 http://www.nber.org/papers/w20325 NATIONAL BUREAU OF

More information

Hartogs Theorem: separate analyticity implies joint Paul Garrett garrett/

Hartogs Theorem: separate analyticity implies joint Paul Garrett  garrett/ (February 9, 25) Hartogs Theorem: separate analyticity implies joint Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ (The present proof of this old result roughly follows the proof

More information

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples DISCUSSION PAPER SERIES IZA DP No. 4304 A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples James J. Heckman Petra E. Todd July 2009 Forschungsinstitut zur Zukunft

More information

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Laurent Saloff-Coste Abstract Most smoothing procedures are via averaging. Pseudo-Poincaré inequalities give a basic L p -norm control

More information

arxiv: v2 [math.st] 14 Oct 2017

arxiv: v2 [math.st] 14 Oct 2017 Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning Sören Künzel 1, Jasjeet Sekhon 1,2, Peter Bickel 1, and Bin Yu 1,3 arxiv:1706.03461v2 [math.st] 14 Oct 2017 1 Department

More information

Mean-squared-error Calculations for Average Treatment Effects

Mean-squared-error Calculations for Average Treatment Effects Mean-squared-error Calculations for Average Treatment Effects Guido W. Imbens UC Berkeley and BER Whitney ewey MIT Geert Ridder USC First Version: December 2003, Current Version: April 9, 2005 Abstract

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Inference for Average Treatment Effects

Inference for Average Treatment Effects Inference for Average Treatment Effects Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Average Treatment Effects Stat186/Gov2002 Fall 2018 1 / 15 Social

More information

Random Feature Maps for Dot Product Kernels Supplementary Material

Random Feature Maps for Dot Product Kernels Supplementary Material Random Feature Maps for Dot Product Kernels Supplementary Material Purushottam Kar and Harish Karnick Indian Institute of Technology Kanpur, INDIA {purushot,hk}@cse.iitk.ac.in Abstract This document contains

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1

Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1 Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1 Lectures on Evaluation Methods Guido Imbens Impact Evaluation Network October 2010, Miami Methods for Estimating Treatment

More information

Problem List MATH 5143 Fall, 2013

Problem List MATH 5143 Fall, 2013 Problem List MATH 5143 Fall, 2013 On any problem you may use the result of any previous problem (even if you were not able to do it) and any information given in class up to the moment the problem was

More information

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Hidehiko Ichimura Graduate School of Public Policy and Graduate School of Economics University of Tokyo A Conference in honor of

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Clustering as a Design Problem

Clustering as a Design Problem Clustering as a Design Problem Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge Harvard-MIT Econometrics Seminar Cambridge, February 4, 2016 Adjusting standard errors for clustering is common

More information

WHEN TO CONTROL FOR COVARIATES? PANEL ASYMPTOTICS FOR ESTIMATES OF TREATMENT EFFECTS

WHEN TO CONTROL FOR COVARIATES? PANEL ASYMPTOTICS FOR ESTIMATES OF TREATMENT EFFECTS WHEN TO CONTROL FOR COVARIATES? PANEL ASYPTOTICS FOR ESTIATES OF TREATENT EFFECTS Joshua Angrist Jinyong Hahn* Abstract The problem of when to control for continuous or highdimensional discrete covariate

More information

The Asymptotic Variance of Semi-parametric Estimators with Generated Regressors

The Asymptotic Variance of Semi-parametric Estimators with Generated Regressors The Asymptotic Variance of Semi-parametric stimators with Generated Regressors Jinyong Hahn Department of conomics, UCLA Geert Ridder Department of conomics, USC October 7, 00 Abstract We study the asymptotic

More information

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3)

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3) M ath 5 2 7 Fall 2 0 0 9 L ecture 4 ( S ep. 6, 2 0 0 9 ) Properties and Estimates of Laplace s and Poisson s Equations In our last lecture we derived the formulas for the solutions of Poisson s equation

More information

Gov 2002: 4. Observational Studies and Confounding

Gov 2002: 4. Observational Studies and Confounding Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

A Local Generalized Method of Moments Estimator

A Local Generalized Method of Moments Estimator A Local Generalized Method of Moments Estimator Arthur Lewbel Boston College June 2006 Abstract A local Generalized Method of Moments Estimator is proposed for nonparametrically estimating unknown functions

More information

δ -method and M-estimation

δ -method and M-estimation Econ 2110, fall 2016, Part IVb Asymptotic Theory: δ -method and M-estimation Maximilian Kasy Department of Economics, Harvard University 1 / 40 Example Suppose we estimate the average effect of class size

More information

Continuity of convex functions in normed spaces

Continuity of convex functions in normed spaces Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Lecture 11 Roy model, MTE, PRTE

Lecture 11 Roy model, MTE, PRTE Lecture 11 Roy model, MTE, PRTE Economics 2123 George Washington University Instructor: Prof. Ben Williams Roy Model Motivation The standard textbook example of simultaneity is a supply and demand system

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Notes on causal effects

Notes on causal effects Notes on causal effects Johan A. Elkink March 4, 2013 1 Decomposing bias terms Deriving Eq. 2.12 in Morgan and Winship (2007: 46): { Y1i if T Potential outcome = i = 1 Y 0i if T i = 0 Using shortcut E

More information

Math The Laplacian. 1 Green s Identities, Fundamental Solution

Math The Laplacian. 1 Green s Identities, Fundamental Solution Math. 209 The Laplacian Green s Identities, Fundamental Solution Let be a bounded open set in R n, n 2, with smooth boundary. The fact that the boundary is smooth means that at each point x the external

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Consistent Tests for Conditional Treatment Effects

Consistent Tests for Conditional Treatment Effects Consistent Tests for Conditional Treatment Effects Yu-Chin Hsu Department of Economics University of Missouri at Columbia Preliminary: please do not cite or quote without permission.) This version: May

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

S chauder Theory. x 2. = log( x 1 + x 2 ) + 1 ( x 1 + x 2 ) 2. ( 5) x 1 + x 2 x 1 + x 2. 2 = 2 x 1. x 1 x 2. 1 x 1.

S chauder Theory. x 2. = log( x 1 + x 2 ) + 1 ( x 1 + x 2 ) 2. ( 5) x 1 + x 2 x 1 + x 2. 2 = 2 x 1. x 1 x 2. 1 x 1. Sep. 1 9 Intuitively, the solution u to the Poisson equation S chauder Theory u = f 1 should have better regularity than the right hand side f. In particular one expects u to be twice more differentiable

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp page 1 Lecture 7 The Regression Discontinuity Design fuzzy and sharp page 2 Regression Discontinuity Design () Introduction (1) The design is a quasi-experimental design with the defining characteristic

More information

Balancing Covariates via Propensity Score Weighting

Balancing Covariates via Propensity Score Weighting Balancing Covariates via Propensity Score Weighting Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu Stochastic Modeling and Computational Statistics Seminar October 17, 2014

More information

MATH COURSE NOTES - CLASS MEETING # Introduction to PDEs, Fall 2011 Professor: Jared Speck

MATH COURSE NOTES - CLASS MEETING # Introduction to PDEs, Fall 2011 Professor: Jared Speck MATH 8.52 COURSE NOTES - CLASS MEETING # 6 8.52 Introduction to PDEs, Fall 20 Professor: Jared Speck Class Meeting # 6: Laplace s and Poisson s Equations We will now study the Laplace and Poisson equations

More information

MATH COURSE NOTES - CLASS MEETING # Introduction to PDEs, Spring 2018 Professor: Jared Speck

MATH COURSE NOTES - CLASS MEETING # Introduction to PDEs, Spring 2018 Professor: Jared Speck MATH 8.52 COURSE NOTES - CLASS MEETING # 6 8.52 Introduction to PDEs, Spring 208 Professor: Jared Speck Class Meeting # 6: Laplace s and Poisson s Equations We will now study the Laplace and Poisson equations

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Gov 2002: 5. Matching

Gov 2002: 5. Matching Gov 2002: 5. Matching Matthew Blackwell October 1, 2015 Where are we? Where are we going? Discussed randomized experiments, started talking about observational data. Last week: no unmeasured confounders

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects

Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects Matias Busso IDB, IZA John DiNardo University of Michigan and NBER Justin McCrary University of Californa, Berkeley and

More information

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 995-002 Research India Publications http://www.ripublication.com Truncated Regression Model and Nonparametric Estimation

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

Econ 2148, fall 2017 Instrumental variables II, continuous treatment

Econ 2148, fall 2017 Instrumental variables II, continuous treatment Econ 2148, fall 2017 Instrumental variables II, continuous treatment Maximilian Kasy Department of Economics, Harvard University 1 / 35 Recall instrumental variables part I Origins of instrumental variables:

More information