NBER WORKING PAPER SERIES MATCHING ON THE ESTIMATED PROPENSITY SCORE. Alberto Abadie Guido W. Imbens

Size: px

Start display at page:

Download "NBER WORKING PAPER SERIES MATCHING ON THE ESTIMATED PROPENSITY SCORE. Alberto Abadie Guido W. Imbens"

Bruce Barrett
6 years ago
Views:

1 BER WORKIG PAPER SERIES MATCHIG O THE ESTIMATED PROPESITY SCORE Alberto Abadie Guido W. Imbens Working Paper ATIOAL BUREAU OF ECOOMIC RESEARCH 050 Massachusetts Avenue Cambridge, MA 0238 August 2009 We are grateful to Ben Hansen, James Robins, Paul Rosenbaum, Donald Rubin, and participants in seminars at the Banff Center, Brown, Georgetown, Harvard/MIT, Montreal, and UPenn for comments and discussions. Software implementing these methods is available on our websites. The views expressed herein are those of the authors and do not necessarily reflect the views of the ational Bureau of Economic Research. BER working papers are circulated for discussion and comment purposes. They have not been peerreviewed or been subject to the review by the BER Board of Directors that accompanies official BER publications by Alberto Abadie and Guido W. Imbens. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including notice, is given to the source.

2 Matching on the Estimated Propensity Score Alberto Abadie and Guido W. Imbens BER Working Paper o. 530 August 2009, Revised December 2009 JEL o. C3,C4 ABSTRACT Propensity score matching estimators Rosenbaum and Rubin, 983 are widely used in evaluation research to estimate average treatment effects. In this article, we derive the large sample distribution of propensity score matching estimators. Our derivations take into account that the propensity score is itself estimated in a first step, prior to matching. We prove that first step estimation of the propensity score affects the large sample distribution of propensity score matching estimators. Moreover, we derive an adjustment to the large sample variance of propensity score matching estimators that corrects for first step estimation of the propensity score. In spite of the great popularity of propensity score matching estimators, these results were previously unavailable in the literature. Alberto Abadie John F. Kennedy School of Government Harvard University 79 JFK Street Cambridge, MA 0238 and BER alberto_abadie@harvard.edu Guido W. Imbens Department of Economics Littauer Center Harvard University 805 Cambridge Street Cambridge, MA 0238 and BER imbens@fas.harvard.edu

3 I. Introduction Propensity score matching estimators Rosenbaum and Rubin, 983 are widely used to estimate treatment effects when all treatment confounders are measured. Rosenbaum and Rubin 983 define the propensity score as the conditional probability of assignment to a treatment given a vector of covariates including the values of all treatment confounders. Their key insight is that adjusting for the propensity score is enough to remove the bias created by all treatment confounders. Relative to matching directly on the covariates, propensity score matching has the advantage of reducing the dimensionality of matching to a single dimension. This greatly facilitates the matching process, because units with dissimilar covariate values may nevertheless have similar values in their propensity scores. Propensity score values are rarely observed in practice. Usually the propensity score has to be estimated prior to matching. In spite of the great popularity that propensity score matching methods have gained since they were proposed by Rosenbaum and Rubin in 983, their large sample distribution has not yet been derived for the case when the propensity score is estimated in a first step. A possible reason for this void in the literature is that matching estimators are highly non-smooth functionals of the distribution of the matching variables, which makes it difficult to establish an asymptotic approximation to the distribution of matching estimators when a matching variable is estimated in a first step. This has motivated the use of bootstrap standard errors for propensity score matching estimators. However, recently it has been shown that the bootstrap is not in general valid for matching estimators Abadie and Imbens, In this article, we derive the large sample distribution of propensity score matching estimators. Our derivations take into account that the propensity score is itself estimated in a first step. We prove that first step estimation of the propensity score affects the large sample distribution of propensity score matching estimators. Moreover, we derive In contexts different than matching, Hirano, Imbens and Ridder 2003, Abadie 2005, Wooldridge 2007, and Angrist and Kuersteiner 2009 derive large sample properties of statistics based on a first step estimator of the propensity score. In all these cases, the second step statistics are smooth functionals of the propensity scores and, therefore, standard expansions for two-step estimators apply see, e.g., ewey and McFadden, 994.

4 an adjustment to the large sample variance of propensity score matching estimators that corrects for first step estimation of the propensity score. Finally, we use a small simulation exercise to illustrate the implications of our theoretical results. To preview our results, let F x θ be a parametric model for the propensity score, with unknown parameters θ, and let ˆθ be the maximum likelihood estimator for θ. We show that, under regularity conditions, the estimator ˆτ, for the average treatment effect τ = EY Y 0, based on matching on the estimated propensity score F X i ˆθ, satisfies ˆτ τ d 0, σ 2 c I θ c. In this expression, σ 2 is the variance of the matching estimator based on matching on the true propensity score F X iθ which follows from results in Abadie and Imbens, 2006, I θ is the Fisher information matrix for the parametric model for the propensity score, and c is a vector that depends on the covariance between the covariates and the outcome, conditional on the propensity score and the treatment. Thus, matching on the estimated propensity score has a smaller asymptotic variance than matching on the true propensity score. This is in line with results in Rubin and Thomas 992ab who argue that, in settings with normally distributed covariates, matching on the estimated rather than the true propensity score improves the properties of matching estimators. Hirano, Imbens and Ridder 2003 obtain a similar result for weighting estimators. The rest of the article is organized as follows. Section II provides an introduction to propensity score matching. Section III is the main section of the article. In this section we derive the large sample properties of an estimator that match on estimated propensity scores. Section IV proposes an estimator for the adjusted standard errors derived in section III. In section V we report the results of a small simulation exercise. Section VI concludes. II. Matching on the Estimated Propensity Score In evaluation research the focus of the analysis is typically the effect of a binary treatment, represented in this paper by the indicator variable W, on some outcome variable, Y. More specifically, W = indicates exposure to treatment, while W = 0 indicates lack of exposure 2

5 to treatment. Following Rubin 974, we define treatment effects in terms of potential outcomes. We define Y as the potential outcome under exposure to treatment, and Y 0 as the potential outcome under no exposure to treatment. Our goal is to estimate the average treatment effect, τ = E Y Y 0, where the expectation is taken over the population of interest, based on a random sample from this population. Estimation of treatment effects is complicated by the fact that for each unit in the population, the observed outcome reflects only one of the potential outcomes: Y = { Y 0 if W = 0, Y if W =. Let X be a vector of covariates that includes treatment confounders, that is, variables that affect the probability of treatment exposure and the potential outcomes. The propensity score is px = PrW = X. The following assumption is often referred to as strong ignorability Rosenbaum and Rubin, 983. Assumption : i Y, Y 0 W X almost surely; ii 0 < px < almost surely. Assumption i will hold if all treatment confounders are included in X; so, after controlling for X, treatment exposure is independent of the potential outcomes. Assumption ii states that for almost all values of X the population includes treated and untreated units. Let µw, x = EY W = w, X = x and µw, p = EY W = w, px = p be the regression of the outcome on the treatment indicator and the covariates, and on the treatment indicator and the propensity score respectively. Rosenbaum and Rubin 983 prove that, under Assumption, τ = E µ, px µ0, px. In other words, adjusting for the propensity score is enough to eliminate the bias created by all treatment confounders. This result by Rosenbaum and Rubin 983 motivates the use of propensity score matching estimators. Following Rosenbaum and Rubin 983 and the vast majority of 3

6 the empirical literature, consider a generalized linear specification for the propensity score px = F x θ. 2 In empirical research the link function F is usually specified as logit or probit. Assume for the moment that the parameters of the propensity score, θ, are known. For each observation, i, let J M i, θ be a set of M observations in the treatment group opposite to i and with values of F X θ similar to F X iθ. A propensity score matching estimator can be defined as: τ θ = 2W i Y i M j J M i,θ In this article we will consider matching with replacement, so each unit in the sample can be used as a match multiple times. In the absence of matching ties, the sets J M i, θ can be defined as: { } J M i, θ = j : W j = W i, {Wk = W i } { F X i θ F X k θ F X i θ F X j θ } M. k= Let K,i θ be the number of times that observation i is used as a match when matching on F X θ: K,i θ = The estimator τ θ can be represented as: τ θ = k= {i JM k,θ}. Y j 2W i + K,iθ Y i. M In practice, propensity scores are not directly observed and estimators that match on the true propensity score are therefore unfeasible. For some random sample {Y i, W i, X i }, let θ be an estimator of θ. A matching estimator of τ that matches on estimated propensity. scores is given by: τ θ = 2W i Y i M j J M i, θ Y j. 2 It is easy to extend our results to more general parametric models for the propensity score. We restrict our attention to generalized linear specifications because in practice they are widely used to estimate propensity scores. 4

7 We assume, in concordance with the literature, that θ is the Maximum Likelihood estimator of θ. 3 In the next section, we derive the large sample distribution of τ θ. III. Large Sample Distribution We begin by introducing a decomposition of τ θ that will be used later in this section. Define T θ = τ θ τ = 2W i + K,iθ Y i τ. M otice that T θ = D θ + R θ, where and R θ = D θ = = µ, F X iθ µ0, F X iθ τ 2W i + K,iθ Y i µw i, F X M iθ, 2W i µ W i, F X iθ M j J M i µ W i, F X iθ. Let P θ be the distribution of Z = {Y, W, X}, induced by the propensity score, F x θ, the marginal distribution of X, and the conditional distribution of Y given X and W. To simplify the exposition, we will implicitly assume that Y and X are bounded, so all moments exist for these two variables. Consider Z,i = {Y,i, W,i, X,i } with distribution given by the local shift P θ with θ = θ + h/, where h is a conformable vector of constants. Assumption 2: i For some ε > 0, all x in the support of X, and all θ R k such that θ θ ε, the distribution of F X θ is continuous with support equal to an interval bounded away from zero and one. ii For all θ R k such that θ θ ε, all F in the 3 This is done to conform with empirical practice. Our results can be readily extended to estimators of θ other than Maximum Likelihood. 5

8 support of F X θ, and all w = 0,, the regression function EY,i W,i = w, F X,i θ = F is Lipschitz-continuous in F. Proposition : If Assumption 2 holds, R θ p 0 under P θ. All proof are provided in the appendix. and Proposition implies that T θ = D θ + o p under P θ. Let In addition, let θ = Λ θ θ = be the Fisher Information Matrix for θ. Assumption 3: Under P θ : log dp θ Z dp θ,i, X,i W,i F X,i θ F X,i θ F X,i θfx,iθ. fx θ 2 I θ = E F X θ F X θ XX, Λ θ θ = h θ 2 h I θ h + o p, θ d 0, I θ, 2 and θ θ = I θ θ + o p. 3 For regular parametric models, equation can be established using Proposition 2..2 in Bickel et al Also for regular parametric models, equation 2 is derived in the proof of Proposition 2..2 in Bickel et al Equation 3 can be established using the same set of results plus classical conditions for asymptotic linearity of maximum likelihood estimators see, e.g., van der Vaart 998 Theorem 5.39; Lehmann and Romano 2005 Theorem The following assumption collects some technical regularity conditions that will be used later in this section. 6

9 Assumption 4: i The function F has a continuous derivative. ii There is some ε > 0, such that for all θ such that θ θ ε the density of F X θ is bounded and bounded away from zero. iii For all bounded functions hy, W, X, E θ hy, W, X F X θ, W converges to EhY, W, X F X θ, W where E θ P θ. denotes an expectation with respect to Assumption 4i is satisfied in the most usual binary choice models employed for the estimation of the propensity score Probit, Logit. We adopt Assumption 4ii for technical reasons, because it simplifies matters considerably in the proof of our main theorem. This assumption typically implies some trimming on the population of interest to discard lowdensity values of the propensity score. To avoid cluttering, we leave such trimming implicit in our notation. Primitive conditions for assumption 4iii can be established using the results in Ganssler and Pfanzagl 97. Our derivation of the limit distribution of τ τ is based on the techniques developed in Andreou and Werker 2005 to analyze to limit distribution of residualbased statistics. We proceed in four steps. First, we derive the joint limit distribution of T θ, θ under P θ. The following result is useful in that respect. Proposition 2: Suppose that Assumption 3 holds. Then, under P θ : D θ d 0 σ 2 c,, θ 0 c where σ 2 is the asymptotic variance of T θ and c = E covx, µw, X F X θ, W fx θ otice that propositions and 2 imply: T θ θ under P θ. d 0 0 I θ W F X θ 2 + σ 2 c, c I θ W. F X θ 2, 4 Second, we use equation 4, along with Assumption 3, to obtain the joint limit distribution of T θ, θ θ, Λ θ θ under P θ : T θ 0 σ 2 c I θ θ d θ c h 0, I Λ θ θ h θ c I θ h. I θ h/2 h c h h I θ h 7

10 Third, applying Le Cam s third lemma, we obtain T θ d c h, θ θ h σ 2 c I θ I θ c I θ, or equivalently: T θ + h/ θ θ d c h 0, σ 2 c I θ I θ c I θ, under P θ, for any h R k. Finally, we calculate the limit distribution of T θ = τ τ as the limit distribution of T θ +h/ conditional on θ = θ +h/ i.e. θ θ = h, integrated over the distribution of θ θ. Theorem : Under P θ τ τ d 0, σ 2 c I θ c. The asymptotic variance of τ is adjusted by c I θ c to account for first-step estimation of the propensity score. In this case, the adjustment reduces the asymptotic variance. This need not be the case for matching estimators of other treatment parameters, such as the average treatment effect on the treated. Formally, the proof of Theorem requires a discretization of the first step estimator θ see Andreou and Werker, 2005, for details. This discretization can be arbitrarily fine and the result of Theorem arises in the limit, as we make the discretization increasingly finer. IV. Estimation of the Asymptotic Variance Let H J i, θ be the set of the J units with W = W i and closest values of F X θ to F X iθ, and let Ȳ J,θ i be the average of Y for for the units in {i H J i, θ}. Consider the following estimator of vary i F X iθ, W i : σ 2,iθ = J j {i H J i,θ} Y j Ȳ J,θ i Y j Ȳ J,θ i. Abadie and Imbens 2006 propose the following estimator for σ 2 θ: σ θ 2 = 2W i Y i Y j τ θ M 8 j J M i,θ 2

11 = 2 K,i θ + 2M K,i θ σ 2 M M M,iθ. Let J,θ X i be the averages of X for for the units in {i H J i, θ}. otice that Y = µw, X + ε, where Eε X, W = 0. As a result: covx, µw, X F X θ, W = covx, Y F X θ, W. Consider the following estimator of covx, Y F X θ, W : ĉovx i, Y i F X iθ, W i = J j {i H J i, θ } X j X J, θ i Y j Ȳ J, θ i. Our estimator of c is: ĉ = ĉovx i, Y i F X iθ, W i fx i θ W i F X i θ + W i. 2 F X i θ 2 Finally, let Î θ, = fx i θ 2 F X i θ F X i θ X ix i. Because I θ is non-singular, the inverse of Îθ, exists with probability approaching one. Our estimator of the large sample variance of the propensity score matching estimator, adjusted for first step estimation of the propensity score, is: σ 2 adj, θ = σ 2 θ ĉ Î θ,ĉ. Consistency of this estimator can be shown using the results in Abadie and Imbens 2006 and the contiguity arguments employed in section III. V. A Small Simulation Exercise In this section, we run a small Monte Carlo exercise to investigate the sampling distribution of propensity score matching estimators and of the approximation to that distribution that we propose in the article. We use a simple Monte Carlo design. The outcome variable is generated by Y = 5W + 4X + X 2 + U, where X and X 2 are independent and uniform on 0, and U 9

12 is a standard ormal variable independent of W, X, X 2. The treatment variable, W, is related to X, X 2 through the propensity score, which is logistic PrW = X = x, X 2 = x 2 = exp + x x 2 + exp + x x 2. Table I reports the results of our Monte Carlo simulation for M = and = As in our theoretical results, the variance of τ θ, the estimator that matches on the true propensity score, is larger than the variance of τ θ, the estimator that matches on the estimated propensity score. The estimator of the variance of τ θ proposed in Abadie and Imbens 2006, σ 2 θ, is centered at the variance of τ θ. σ 2 θ is the estimator of the variance that treats the first step estimate of the propensity score θ as if it was the true propensity score, and σ 2 adj, θ is the adjusted estimator of the variance that takes into account that the propensity score is itself estimated in a first step. Finally, the table reports also confidence interval constructed with adjusted and unadjusted standard errors. In concordance with out theoretical results, the simulation shows that σ 2 θ is biased and too large on average. In addition, confidence intervals constructed with σ 2 θ have larger than nominal coverage rates. In contrast, σ 2 adj, θ is unbiased and produce confidence intervals that have coverage rates close to nominal rates. VI. Conclusions In this article, we propose a method to correct to the asymptotic variance of propensity score matching estimators when the propensity scores are estimated in a first step. Our results allow valid large sample inference for propensity score matching estimators. 0

13 Appendix For the proof of Proposition we will need some preliminary lemmas. Lemma A.: Consider two independent samples of sizes n 0 and n from continuous distributions F 0 and F with common support: V 0,,..., V 0,n0 i.i.d. F 0 and V,,..., V,n i.i.d. F. Let = n 0 + n. Assume that the support of F 0 and F is an interval inside 0,. Let f 0 and f be the densities of F 0 and F, respectively. Suppose that for any v in the supports of F 0 and F, f v/f 0 v r. For i n and m M n 0, let U n0,n,i m be the m-th order statistic of { V,i V 0,,..., V,i V 0,n0 }. Then, for n 0 3: E n M M n U n0,n,i m r /2 n 3/4 0 + M n m= 0 exp n /4 0. /2 nm /4 Proof: Consider balls assigned at random among n bins of equal probability. It is known that the mean of the number of bins with exactly m balls is equal to m n m m n n see Johnson and Kotz, 977, p. 4. Because f v/f 0 v r, for any measurable set A: f v PrV,i A = f v dv = f 0 v dv r PrV 0,i A. f 0 v A A Divide the support of F 0 and F in n 3/4 0 cells of equal probability / n 3/4 0 under F 0. Let Z M,n0 be the number of such cells are not occupied by at least M observations from the second sample: V 0,,..., V 0,n0. For i =,...,. Let µ M,n0 = EZ M,n0. otice that n 0 3 implies n 3/ Then, M m µ M,n0 = n 3/4 n0 0 m m=0 n 3/4 n0 m 0 n 3/4 0 M m n 3/4 0 nm 0 m! m=0 n 3/4 n0 m 0 n 3/4 0 Mn M /4 0 n0 n 3/4. 0 Using Markov s inequality, PrZ M,n0 > 0 = PrZ M,n0 µ M,n0 Mn M /4 0 n0 n 3/4. 0 otice that for any positive a, we have that a loga. Therefore, for any b <, we have that log b/ b/ and b/ exp b. As a result, we obtain: n0 n0 n 3/4 = n/4 0 exp n /4 0. n 0 0

14 Putting together the last two displayed equations, we obtain the following exponential bound for PrZ M,n0 > 0: PrZ M,n0 > 0 Mn M /4 0 exp n /4 0. otice that U n0,n,i m. For 0 n n 3/4 0, let c n0,n = F n/ n 3/4 0, then M E U n0,n,i m ZM,n0 = 0 m= ow, n E M M U n0,n,i m m= = n 3/4 0 n= M r n 3/4 0 M r n 3/4 0. M c n0,n c n0,n Pr cn0,n V,i c n0,n ZM,n0 = 0 n 3/4 0 n= n = E M + E + = + E n M n M n cn0,n c n0,n M ZM,n0 U n0,n,i m = 0 PrZ M,n0 = 0 m= M ZM,n0 U n0,n,i m > 0 PrZ M,n0 > 0 m= M ZM,n0 U n0,n,i m = 0 m= /2 PrZ M,n 0 > 0 n M E U n0,n M,i m Z M,n0 = 0 n m= /2 PrZ M,n 0 > 0 n r /2 n 3/4 0 + M n 0 exp n /4 0. /2 nm /4 Lemma A.2: Inverse Moments of the Doubly Truncated Binomial Distribution Let 0 be a Binomial variable with parameters, p that is left-truncated for values smaller than M and right-truncated for values greater than M, where M < /2. Then, for any r > 0, there exist a constant C r, such that r E C r, 0 for all > 2M. Proof: Let = 0. For all q > 0, r r { } r { } E = E > q + E q

15 = M M r Pr r Pr > > q + q r 0 q + q r otice that: Pr > q = x M x> / q,x M x M x M x x p x p x p x p x For > 2M the denominator can be bounded away from zero. Therefore, for some positive constant C, and q > / p, Pr > q C x M x> / q,x M C x> / q x x p x p x p x p x C exp { 2 / q p 2 }, by Hoeffding s Inequality e.g. van der Vaart and Wellner 996, p Therefore E/ 0 r is uniformly bounded for > 2M. Lemma A.3: Suppose that the propensity score, PrW = X, is continuously distributed and that there exist c L > 0 and c U < such that c L PrW = X = x c U for all x X. Let f be the distribution of the propensity score conditional on W =, and let f 0 be the distribution of the propensity score conditional on W = 0. Then, the ratio f p/f 0 p is bounded and bounded away from zero. Proof: Use Bayes Theorem to show that f p/f 0 p = p/ pprw = / PrW = 0. Proof of Proposition : Let f θ be the distribution of the propensity score conditional on W =, and let f θ 0 be the distribution of the propensity score conditional on W = 0. By lemma A.3 the ratio f θ p/f θ 0 p is uniformly bounded by some constant r. Consider 0 and as in Lemma A.2. Let ψ M, 0, = r /2 3/4 0 + M 0 exp /4 0. /2 M /4 Then, ψ M, 0, p 0. Rearrange the observations in the sample so that the first observations have W = and the remaining 0 = observations have W = 0. For i and m M, let U 0,,i m be the m-th order statistic of { F X,i θ F X, + θ,..., F X,i θ F X, θ }. For + i and m M, 3

16 let U 0,,i m be the m-th order statistic of { F X,i θ F X, θ,..., F X,i θ F X, θ }. Lemma A. implies that for large enough : E M M m= U 0,,i m E ψ M, 0,. A. Therefore, to prove that the left-hand-side of equation A. converges to zero, it is enough to show that ψ M, 0, is asymptotically uniformly integrable: lim lim sup E ψ k M, 0, {ψ M, 0, > k} = 0 see, e.g., van der Vaart 998, p. 7. otice that the ratio 3/4 / 3/4 is bounded. This, in combination with Lemma A.2, implies that for all k > 0 and some positive constant C, E ψ M, 0, {ψ M, 0, > k} E ψ M, 0, E r /2 3/4 0 + M /2 3/4 M+/2 0 exp /4 0 0 Similarly, E = i= + C E C /4 E M M m= /2 3/4 0 3/4 3/4 0 U 0,,i m Using Markov s inequality, we obtain that for any ε > 0: Pr M M U 0,,i m > ε m= E 0. p 0. M M U 0,,i m m= ε 0. The result now follows from Lipschitz-continuity of the regression functions, EY,i W,i = w, F X,i θ = F for some ε > 0 and θ θ ε. Proof of Proposition 2 Sketch: In this proof, we first establish a extend the martingale representation of matching estimators Abadie and Imbens, 2009 to the propensity score matching estimator studied in this article. Consider the linear combination C = z D θ + z 2 θ. C = z µ, F X,iθ µ0, F X,iθ τ 4

17 + z + z 2 2W,i + K,iθ Y,i µw,i, F X M,iθ X,i W,i F X,i θ F X,i θ F X,i θ fx,iθ. C can be analyzed using martingale methods. otice that: C = 3 k= ξ,k, where ξ,k = z µ, F X,k θ µ0, F X,k θ τ for k, + z 2 E θ X,k F X,k θ W,k F X,k θ F X,k θ F X,k θ fx,k θ, ξ,k = z 2 X,k E θ X,k F X,k θ W,k F X,k θ fx,k θ F X,k θ F X,k θ + z 2W,k + K,k θ µw,k, X,k µw,k, F X,k M θ. for + k 2, ξ,k = z 2W,k 2 + K,k 2θ Y,k 2 µw,k 2, X,k 2, M for 2 + k 3. Consider the σ-fields F,k = σ{w,,..., W,k, X, θ,..., X,k θ } for k, F,k = σ{w,,..., W,, X, θ,..., X, θ, X,,..., X,k } for + k 2, and F,k = σ{w,,..., W,, X,,..., X,, Y,,..., Y,k } for 2 + k 3. Then, i ξ,j, F,i, i 3 j= is a martingale for each. Therefore, the limiting distribution of C can be studied using Martingale Central Limit Theorem e.g., Theorem 35.2 in Billingsley 995, p. 476; importantly, notice that this theorem allows that the probability space varies with. Because Y,i, X,i, and W,i are bounded random variables uniformly in, and because K,i has uniformly bounded moments see Abadie and Imbens, 2009, it follows that: 3 k= Eξ 2+δ,k 0 for some δ > 0. 5

18 Lindeberg s condition in Billingsley s theorem follows easily from last equation Lyapounov s condition. As a result, we obtain that under P θ where and C d 0, σ 2 + σ σ 2 3, σ 2 = plim σ 2 2 = plim σ 2 3 = plim E θ ξ,k 2 F,k, k= 2 k=+ 3 k=2+ E θ ξ 2,k F,k, E θ ξ 2,k F,k. After some algebra, we obtain: σ 2 = z 2 E µ, F X θ µ0, F X θ τ 2 + z 2E f 2 X θ F X θ F X θ EX F X θex F X θ z 2. Following the calculations in Abadie and Imbens 2006, additional proofs for the expectation of + K,i /M 2 : σ2 2 = z 2E f 2 X θ F X θ F X θ varx F X θ z 2 varµ, X F X + ze 2 θ F X + varµ0, X F X θ θ F X θ + z 2 2M E F X θ F X θ varµ, X F X θ + z 2 2M E F X θ F X θ varµ0, X F X θ + 2 z 2E covx, µw, X F X fx θ θ, W F X θ F X z. θ Here we use the fact that, conditional on the propensity score, X is independent of W. To derive the constant vector of the cross-product notice that: E cov X, µx, W F X θ, W W F X θ2w F X θ F X fx θ + K θ θ M = E cov X, µx, F X θ fx θ F X + K W θ = p θ M + E cov X, µx, 0 F X θ fx θ F X + K W θ = 0 p θ M 6

19 E cov X, µx, F X θ fx θ W = F X θ 2 p + E cov X, µx, 0 F X θ fx θ W = 0 F X θ 2 p W = E covx, µw, X F X θ, W fx θ F X θ 2 + W F X θ 2. Finally, vary X, W = σ3 2 = z 2 vary X, W = 0 E F X + θ F X θ + z 2 2M E F X θ F X θ vary X, W = + z 2 2M E F X θ F X θ vary X, W = 0. otice that for any integrable function gf X θ: E gf X θ varµw, X F X θ + vary X, W = w = E = E gf X θ As a result, under P θ : where z = z, z 2, and where gf X θ varµw, X F X θ + E vary X, W = w F X θ varµw, X F X θ, W = w + E vary X, W = w F X θ, W = w = E gf X θ vary F X θ, W = w. C d 0, z V z, σ 2 c V = c W c = E covx, µw, X F X θ, W fx θ F X θ 2 + W F X θ 2, and σ 2 is the asymptotic variance calculated in Abadie and Imbens for the case of a known propensity score. Applying the Cramer-Wold device, under P θ : D θ d 0, V. θ Proof of Theorem : Given our preliminary results, Theorem follows from Andreou and Werker I θ, 7

20 References Abadie, A Semiparametric Difference-in-Differences Estimators, Review of Economic Studies, vol. 72, no., -9. Abadie, A. and Imbens, G.W Large Sample Properties of Matching Estimators for Average Treatment Effects, Econometrica, vol. 74, no., Abadie, A. and Imbens, G.W On the Failure of the Bootstrap for Matching Estimators, Econometrica, vol. 76, no. 6, Abadie, A. and Imbens, G.W A Martingale Representation for Matching Estimators, BER working paper, no Andreou, E. and Werker, B.J.M An Alternative Asymptotic Analysis of Residual- Based Statistics, mimeo. Angrist, J.D. and Kuersteiner, G.M Causal Effects of Monetary Shocks: Semiparametric Conditional Independence Tests with a Multinomial Propensity Score, mimeo. Bickel, P.J., Klaassen, C.A., Ritov, Y. and Wellner, J.A. 998 Efficient and Adaptive Estimation for Semiparametric Models, Springer, ew York. Billingsley, P. 995, Probability and Measure, third edition. Wiley, ew York. Ganssler, P. and Pfanzagl, J. 97 Convergence of Conditional Expectations, The Annals of Mathematical Statistics, vol. 42, no., Hirano, K., G. Imbens, and G. Ridder 2003 Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, Econometrica, vol. 7, no. 4, Johnson,. and Kotz, S. 977 Urn Models and Their Applications, John Wiley & Sons, ew York. ewey, W.K. and McFadden, D. 994 Large sample estimation and hypothesis testing. In: Engle, R.F., McFadden, D. Eds., Handbook of Econometrics, vol. 4. Elsevier Science, Amsterdam. Lehmann, E.L. and Romano, J.P Testing Statistical Hypothesis. Springer, ew York. Rosenbaum, P. and Rubin, D.B. 983 The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika, vol. 70, 455. Rubin, D.B. 974 Estimating Causal Effects of Treatments in Randomized and on-randomized Studies, Journal of Educational Psychology, vol. 66, Rubin, D., and. Thomas 992a Characterizing the effect of matching using linear propensity score methods with normal distributions, Biometrika, vol. 79, Rubin, D., and. Thomas 992b Affinely Invariant Matching Methods with Ellipsoidal Distributions, Annals of Statistics, vol. 20, no. 2,

21 van der Vaart, A. 998, Asymptotic Statistics, Cambridge University Press, ew York. van der Vaart, A.W. and Wellner, J.A. 996, Weak Convergence and Empirical Processes, Springer-Verlag, ew York. Wooldridge, J.M Inverse Probability Weighted Estimation for General Missing Data Problems, Journal of Econometrics, vol. 4,

22 Table I Simulation Results = 5000, umber of simulations = 0000 Variances over simulations Coverage of 95% C.I. asymp. s.e. = τ θ τ θ, σ 2 θ τ θ τ θ, σ 2 θ τ θ, σ adj, 2 θ Averages over simulations σ 2 θ σ 2 θ σ adj, 2 θ

Matching on the Estimated Propensity Score

Matching on the Estimated Propensity Score atching on the Estimated Propensity Score Alberto Abadie Harvard University and BER Guido W. Imbens Harvard University and BER October 29 Abstract Propensity score matching estimators Rosenbaum and Rubin,