TECHNICAL WORKING PAPER SERIES

ON THE FAILURE OF THE BOOTSTRAP FOR MATCHING ESTIMATORS

Alberto Abadie
Guido W. Imbens


Technical Working Paper 325
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2006

We are grateful for comments by Peter Bickel, Stéphane Bonhomme, Whitney Newey, and seminar participants at Princeton, CEMFI, and Harvard/MIT. Financial support for this research was generously provided through NSF grants SES- (Abadie) and SES- and SES- (Imbens). Abadie: John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138, and NBER. Electronic correspondence: alberto_abadie@harvard.edu. Imbens: Department of Economics, and Department of Agricultural and Resource Economics, University of California at Berkeley, 330 Giannini Hall, Berkeley, CA 94720, and NBER. Electronic correspondence: imbens@econ.berkeley.edu. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

© 2006 by Alberto Abadie and Guido W. Imbens. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

On the Failure of the Bootstrap for Matching Estimators
Alberto Abadie and Guido W. Imbens
NBER Technical Working Paper No. 325
June 2006
JEL No. C14, C21, C52

ABSTRACT

Matching estimators are widely used for the evaluation of programs or treatments. Often researchers use bootstrapping methods for inference. However, no formal justification for the use of the bootstrap has been provided. Here we show that the bootstrap is in general not valid, even in the simple case with a single continuous covariate when the estimator is root-n consistent and asymptotically normally distributed with zero asymptotic bias. Due to the extreme non-smoothness of nearest neighbor matching, the standard conditions for the bootstrap are not satisfied, leading the bootstrap variance to diverge from the actual variance. Simulations confirm the difference between actual and nominal coverage rates for bootstrap confidence intervals predicted by the theoretical calculations. To our knowledge, this is the first example of a root-n consistent and asymptotically normal estimator for which the bootstrap fails to work.

Alberto Abadie
John F. Kennedy School of Government
Harvard University
79 John F. Kennedy Street
Cambridge, MA 02138
and NBER
alberto_abadie@harvard.edu

Guido W. Imbens
Department of Economics
Department of Agricultural and Resource Economics
University of California at Berkeley
330 Giannini Hall
Berkeley, CA 94720
and NBER
imbens@econ.berkeley.edu

1 Introduction

Matching methods have become very popular for the estimation of treatment effects.1 Often researchers use bootstrap methods to calculate the standard errors of matching estimators.2 However, bootstrap inference for matching estimators has not been formally justified. Because of the non-smooth nature of some matching methods (e.g., nearest neighbor matching with a fixed number of neighbors) and the lack of evidence that the resulting estimators are asymptotically linear, there is reason for concern about the validity of the bootstrap in this context. At the same time, we are not aware of any example where an estimator is root-n consistent, as well as asymptotically normally distributed with zero asymptotic bias, and yet where the standard bootstrap fails to deliver valid confidence intervals.3

This article addresses the question of the validity of the bootstrap for nearest-neighbor matching estimators with a fixed number of neighbors. We show in a simple case, with a single continuous covariate, that the standard bootstrap does indeed fail to provide asymptotically valid confidence intervals, in spite of the fact that the estimator is root-n consistent and asymptotically normal with no asymptotic bias. We provide some intuition for this failure. We present theoretical calculations for the asymptotic behavior of the difference between the variance of the matching estimator and the average of the bootstrap variance. These theoretical calculations are supported by Monte Carlo evidence. We show that the bootstrap confidence intervals can have over-coverage as well as under-coverage.

1 E.g., Dehejia and Wahba (1999). See Rosenbaum (2002) and Imbens (2004) for surveys.
2 A partial list of recent papers using matching with bootstrapped standard errors includes Agodini and Dynarski (2004), Dehejia and Wahba (1999, 2002), Galasso and Ravallion (2003), Guarcello, Mealli, and Rosati (2003), Heckman, Ichimura and Todd (1997), Ichino and Becker (2002), Imai (2005), Jalan and Ravallion (1999), Lechner (2002), Myers, Olsen, Seftor, Young, and Tuttle (2002), Pradhan and Rawlings (2003), Puhani (2002), Sianesi (2004), Smith and Todd (2005), and Yamani, Lauer, Starling, Pothier, Tuzcu, Ratliff, Cook, Abdo, McNeil, Crowe, Hobbs, Rincon, Bott-Silverman, McCarthy and Young.

3 Familiar examples of failure of the bootstrap for estimators with non-normal limiting distributions arise in the contexts of estimating the maximum of the support of a random variable (Bickel and Freedman, 1981), estimating the average of a variable with infinite variance (Athreya, 1987), and super-efficient estimation (Beran, 1984). Resampling inference in these contexts can be conducted using alternative methods such as subsampling (Politis and Romano, 1994; Politis, Romano, and Wolf, 1999) and versions of the bootstrap where the size of the bootstrap sample is smaller than the sample size (e.g., Bickel, Götze and van Zwet, 1997). See Hall (1992) and Horowitz (2003) for general discussions.

The results do not address whether nearest neighbor estimators with the number of neighbors increasing with the sample size satisfy asymptotic linearity, or whether the bootstrap is valid for such estimators, as in practice many researchers have used estimators with very few (e.g., one) nearest neighbors. In Abadie and Imbens (2006) we have proposed analytical estimators of the asymptotic variance of matching estimators. Because the standard bootstrap is shown to be invalid, these estimators, together with subsampling (Politis, Romano, and Wolf, 1999), are now the only available methods of inference that are formally justified.4

The rest of the article is organized as follows. Section 2 reviews the basic notation and setting of matching estimators. Section 3 presents theoretical results on the lack of validity of the bootstrap for matching estimators, along with simulations that confirm the formal results. Section 4 concludes. The appendix contains proofs.

2 Setup

2.1 Basic Model

In this article we adopt the standard model of treatment effects under unconfoundedness (Rubin, 1978; Rosenbaum and Rubin, 1983; Heckman, Ichimura and Todd, 1997; Rosenbaum, 2002; Imbens, 2004). The goal is to evaluate the effect of a treatment on the basis of data on outcomes and covariates for treated and control units. We have a random sample of N0 units from the control population and a random sample of N1 units from the treated population, with N = N0 + N1. Each unit is characterized by a pair of potential outcomes, Yi(0) and Yi(1), denoting the outcomes under the control and the active treatment, respectively. We observe Yi(0) for units in the control sample, and Yi(1) for units in the treated sample. For all units we observe a covariate vector, Xi.5 Let Wi indicate whether a unit is from the control sample (Wi = 0) or the treatment sample (Wi = 1).
For each unit we observe the triple (Xi, Wi, Yi), where Yi = Wi Yi(1) + (1 − Wi) Yi(0) is the observed

4 Politis, Romano, and Wolf (1999) show that subsampling produces valid inference for statistics with stable asymptotic distributions.

5 To simplify our proof of lack of validity of the bootstrap we will consider in our calculations the case with a scalar covariate. With higher dimensional covariates there is the additional complication of biases that may dominate the asymptotic distribution of matching estimators (Abadie and Imbens, 2006).

outcome. Let X denote the N-column matrix with i-th column equal to Xi, and similarly for Y and W. Also, let X0 denote the N-column matrix with i-th column equal to (1 − Wi)Xi, and X1 the N-column matrix with i-th column equal to WiXi. The following two assumptions are the identification conditions behind matching estimators.

Assumption 2.1 (unconfoundedness): For almost all x, (Yi(1), Yi(0)) is independent of Wi conditional on Xi = x, or

(Yi(0), Yi(1)) ⊥ Wi | Xi = x, a.s.

Assumption 2.2 (overlap): For some c > 0, and almost all x,

c ≤ Pr(Wi = 1 | Xi = x) ≤ 1 − c.

In this article we focus on matching estimation of the average treatment effect for the treated:6

τ = E[Yi(1) − Yi(0) | Wi = 1].    (2.1)

A nearest neighbor matching estimator of τ matches each treated unit i to the control unit j with the closest value for the covariate, and then averages the within-pair outcome differences, Yi − Yj, over the N1 matched pairs. Here we focus on the case of matching with replacement, so each control unit can be used as a match for more than one treated unit. Formally, for all treated units i (that is, units with Wi = 1), let Di be the distance between the covariate value for observation i and the covariate value for the closest control match:

Di = min_{j=1,...,N: Wj=0} |Xi − Xj|.

Then let

J(i) = {j in {1, 2, ..., N} : Wj = 0, |Xi − Xj| = Di}

6 In many cases, the interest is in the average effect for the entire population. We focus here on the average effect for the treated because it simplifies the calculations below. Since the overall average effect is the weighted sum of the average effect for the treated and the average effect for the controls, it suffices to show that the bootstrap is not valid for one of the components.

be the set of closest matches for treated unit i. If unit i is a control unit, then J(i) is defined to be the empty set. When Xi is continuously distributed, the set J(i) will consist of a single index with probability one, but for bootstrap samples there will often be more than one index in this set, because an observation from the original sample may appear multiple times in the bootstrap sample. For each treated unit i, let

Ŷi(0) = (1/#J(i)) Σ_{j in J(i)} Yj

be the average outcome in the set of the closest matches for observation i, where #J(i) is the number of elements of the set J(i). The matching estimator of τ is then

τ̂ = (1/N1) Σ_{i: Wi=1} (Yi − Ŷi(0)).    (2.2)

For the subsequent discussion it is useful to write the estimator in a different way. Let Ki denote the weighted number of times unit i is used as a match, with Ki = 0 if unit i is a treated unit:

Ki = 0 if Wi = 1,
Ki = Σ_{j: Wj=1} 1{i in J(j)}/#J(j) if Wi = 0.

Then we can write

τ̂ = (1/N1) Σ_{i=1}^{N} (Wi − (1 − Wi) Ki) Yi.    (2.3)

Also let

K′i = 0 if Wi = 1,
K′i = Σ_{j: Wj=1} (1{i in J(j)}/#J(j))² if Wi = 0.

Abadie and Imbens (2006) prove that under certain conditions (for example, when X is a scalar variable) the nearest-neighbor matching estimator in (2.2) is root-n consistent and asymptotically normal with zero asymptotic bias.7 Abadie and Imbens propose two variance estimators.

7 More generally, Abadie and Imbens (2002) propose a bias correction that makes matching estimators root-n consistent and asymptotically normal regardless of the dimension of X.
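As a numerical sanity check, the pair-difference form (2.2) and the weighted form (2.3) can be verified to coincide on simulated data. The sketch below is illustrative only (the function names are ours, not the authors'); it assumes a scalar covariate and handles ties through the set J(i):

```python
import numpy as np

def matching_att(x, w, y):
    """ATT matching estimator of equation (2.2), matching with replacement.
    Ties in the nearest-control distance are handled through the set J(i).
    Returns the estimate and the weights K_i of equation (2.3)."""
    x, w, y = map(np.asarray, (x, w, y))
    treated = np.where(w == 1)[0]
    controls = np.where(w == 0)[0]
    k = np.zeros(len(y))
    diffs = []
    for i in treated:
        d = np.abs(x[controls] - x[i])
        j_set = controls[d == d.min()]        # J(i): set of closest control matches
        diffs.append(y[i] - y[j_set].mean())  # Y_i - Yhat_i(0)
        k[j_set] += 1.0 / len(j_set)          # weighted match counts K_i
    return float(np.mean(diffs)), k

def matching_att_weighted(w, y, k):
    """Equivalent representation (2.3): (1/N1) sum_i (W_i - (1 - W_i) K_i) Y_i."""
    return float(np.sum((w - (1 - w) * k) * y) / w.sum())

rng = np.random.default_rng(0)
n0 = n1 = 50
x = rng.uniform(size=n0 + n1)
w = np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]
y = np.where(w == 1, 1.0, rng.normal(size=n0 + n1))  # Y(1) degenerate at tau = 1
tau_pairs, k = matching_att(x, w, y)
tau_weights = matching_att_weighted(w, y, k)
```

The two representations agree exactly, and the weights Ki sum to N1, since each treated unit distributes a total weight of one over its matches.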

The two variance estimators are:

V^{AI,I} = (1/N1²) Σ_{i=1}^{N} (Wi − (1 − Wi) Ki)² σ̂²(Xi, Wi),

and

V^{AI,II} = (1/N1²) Σ_{i: Wi=1} (Yi − Ŷi(0) − τ̂)² + (1/N1²) Σ_{i=1}^{N} (Ki² − K′i) σ̂²(Xi, Wi),

where σ̂²(Xi, Wi) is an estimator of the conditional variance of Yi given Wi and Xi, based on matching. Let lj(i) be the j-th closest match to unit i, in terms of the covariates, among the units with the same value for the treatment (that is, units in the treatment group are matched to units in the treatment group, and units in the control group are matched to units in the control group).8 Define

σ̂²(Xi, Wi) = (J/(J+1)) (Yi − (1/J) Σ_{j=1}^{J} Y_{lj(i)})².    (2.4)

Let V(τ̂) be the variance of τ̂, and let V(τ̂ | X, W) be the variance of τ̂ conditional on X and W. Abadie and Imbens (2006) show that under weak regularity conditions the normalized version of the first variance estimator, N1 V^{AI,I}, is consistent for the normalized conditional variance, N1 V(τ̂ | X, W):

N1 (V(τ̂ | X, W) − V^{AI,I}) →p 0, for fixed J as N → ∞.

The normalized version of the second variance estimator, N1 V^{AI,II}, is consistent for the normalized marginal variance, N1 V(τ̂):

N1 (V(τ̂) − V^{AI,II}) →p 0, for fixed J as N → ∞.

8 To simplify the notation, here we consider only the case without matching ties. The extension to accommodate ties is immediate (see Abadie, Drukker, Herr, and Imbens, 2004), but it is not required for the purpose of the analysis in this article.
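The within-arm variance estimator (2.4) with J = 1 and the estimator V^{AI,I} can be sketched as follows (illustrative helper names; single nearest-neighbor matching without ties is assumed, so J/(J+1) = 1/2):

```python
import numpy as np

def sigma2_hat(x, w, y):
    """Matching estimator (2.4) of the conditional variance with J = 1:
    each unit is matched to its closest neighbor within the same
    treatment arm, and the factor J/(J+1) equals 1/2."""
    n = len(y)
    idx = np.arange(n)
    s2 = np.empty(n)
    for i in range(n):
        same = idx[(w == w[i]) & (idx != i)]
        l1 = same[np.argmin(np.abs(x[same] - x[i]))]  # l_1(i)
        s2[i] = 0.5 * (y[i] - y[l1]) ** 2
    return s2

def v_ai_i(w, k, s2):
    """V^{AI,I} = (1/N1^2) sum_i (W_i - (1 - W_i) K_i)^2 sigma2_hat_i."""
    n1 = w.sum()
    return float(np.sum((w - (1 - w) * k) ** 2 * s2) / n1 ** 2)

# small worked example: K_i computed by single nearest-neighbor matching
rng = np.random.default_rng(1)
n0 = n1 = 30
x = rng.uniform(size=n0 + n1)
w = np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]
y = np.where(w == 1, 1.0, rng.normal(size=n0 + n1))
controls = np.where(w == 0)[0]
k = np.zeros(n0 + n1)
for i in np.where(w == 1)[0]:
    k[controls[np.argmin(np.abs(x[controls] - x[i]))]] += 1.0
v = v_ai_i(w, k, sigma2_hat(x, w, y))
```

With a degenerate Y(1), the treated-arm variance estimates are exactly zero, so V^{AI,I} is driven entirely by the control units that are used as matches.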

2.2 The Bootstrap

We consider two versions of the bootstrap in this discussion. The first version centers the bootstrap variance at the matching estimate in the original sample. The second version centers the bootstrap variance at the mean of the bootstrap distribution of the matching estimator.

Consider a random sample Z = (X, W, Y) with N0 controls and N1 treated units. The matching estimator, τ̂, is a functional t(·) of the original sample: τ̂ = t(Z). We construct a bootstrap sample, Zb, with N0 controls and N1 treated by sampling with replacement from the two subsamples. We then calculate the bootstrap estimator, τ̂b, applying the functional t(·) to the bootstrap sample: τ̂b = t(Zb). The first version of the bootstrap variance is the second moment of (τ̂b − τ̂) conditional on the sample, Z:

V^{B,I} = vI(Z) = E[(τ̂b − τ̂)² | Z].    (2.5)

The second version of the bootstrap variance centers the bootstrap variance at the bootstrap mean, E[τ̂b | Z], rather than at the original estimate, τ̂:

V^{B,II} = vII(Z) = E[(τ̂b − E[τ̂b | Z])² | Z].    (2.6)

Although these bootstrap variances are defined in terms of the original sample Z, in practice an easier way to calculate them is by drawing B bootstrap samples. Given B bootstrap samples with bootstrap estimates τ̂b, for b = 1, ..., B, we can obtain unbiased estimators for these two variances as

V̂^{B,I} = (1/B) Σ_{b=1}^{B} (τ̂b − τ̂)²,

and

V̂^{B,II} = (1/(B−1)) Σ_{b=1}^{B} (τ̂b − (1/B) Σ_{b'=1}^{B} τ̂b')².

We will focus on the first bootstrap variance, V^{B,I}, and its unconditional expectation, E[V^{B,I}]. We shall show that in general N1 E[V^{B,I}] does not converge to N1 V(τ̂). We will show that in some cases the limit of N1 (E[V^{B,I}] − V(τ̂)) is positive and that in other cases it is negative.
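In practice the two bootstrap variances are computed exactly as described: resample the two subsamples with replacement, re-estimate, and average over the B draws. A minimal sketch (the names are ours; ties among duplicated controls are innocuous here because duplicates carry the same outcome):

```python
import numpy as np

def att_nn(x, w, y):
    """Nearest-neighbor ATT estimator; ties among duplicated control values
    are harmless because the duplicates share the same outcome."""
    t, c = np.where(w == 1)[0], np.where(w == 0)[0]
    m = c[np.argmin(np.abs(x[c][None, :] - x[t][:, None]), axis=1)]
    return float(np.mean(y[t] - y[m]))

def bootstrap_variances(x, w, y, b=100, seed=0):
    """Draw b bootstrap samples, resampling controls and treated separately,
    and return (V_hat^{B,I}, V_hat^{B,II}) as defined in the text."""
    rng = np.random.default_rng(seed)
    t, c = np.where(w == 1)[0], np.where(w == 0)[0]
    tau_hat = att_nn(x, w, y)
    taus = np.empty(b)
    for r in range(b):
        idx = np.r_[rng.choice(c, size=len(c)), rng.choice(t, size=len(t))]
        taus[r] = att_nn(x[idx], w[idx], y[idx])
    v1 = float(np.mean((taus - tau_hat) ** 2))        # centered at tau_hat
    v2 = float(np.sum((taus - taus.mean()) ** 2) / (b - 1))  # centered at the bootstrap mean
    return v1, v2

rng = np.random.default_rng(1)
n0 = n1 = 40
x = rng.uniform(size=n0 + n1)
w = np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]
y = np.where(w == 1, 1.0, rng.normal(size=n0 + n1))
v1, v2 = bootstrap_variances(x, w, y, b=100)
```

By the usual decomposition of a second moment around a non-central point, (1/B) Σ (τ̂b − τ̂)² is at least ((B−1)/B) times the mean-centered estimator, which the test below checks.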

As a result, we will show that N1 V^{B,I} is not a consistent estimator of the limit of N1 V(τ̂). This will indirectly imply that N1 V^{B,II} is not consistent either. Because

E[(τ̂b − τ̂)² | Z] ≥ E[(τ̂b − E[τ̂b | Z])² | Z],

it follows that E[V^{B,I}] ≥ E[V^{B,II}]. Thus, in the cases where the limit of N1 (E[V^{B,I}] − V(τ̂)) is smaller than zero, it follows that the limit of N1 (E[V^{B,II}] − V(τ̂)) is also smaller than zero.

In most standard settings, both centering the bootstrap variance at the estimate in the original sample and centering it at the average of the bootstrap distribution of the estimator lead to valid confidence intervals. In fact, in many settings the average of the bootstrap distribution of an estimator is identical to the estimate in the original sample. For example, if we are interested in constructing a confidence interval for the population mean µ = E[X] given a random sample X1, ..., XN, the expected value of the bootstrap statistic, E[µ̂b | X1, ..., XN], is equal to the sample average for the original sample, µ̂ = Σi Xi/N. For matching estimators, however, it is easy to construct examples where the average of the bootstrap distribution of the estimator differs from the estimate in the original sample. As a result, the two bootstrap variance estimators will lead to different confidence intervals, with potentially different coverage rates.

3 An Example where the Bootstrap Fails

In this section we discuss in detail a specific example where we can calculate the limits of N1 V(τ̂) and N1 E[V^{B,I}] and show that they differ.

3.1 Data Generating Process

We consider the following data generating process:

Assumption 3.1: The marginal distribution of the covariate X is uniform on the interval [0, 1].

Assumption 3.2: The ratio of treated and control units is N1/N0 = α for some α > 0.

Assumption 3.3: The propensity score e(x) = Pr(Wi = 1 | Xi = x) is constant as a function of x.

Assumption 3.4: The distribution of Yi(1) is degenerate with Pr(Yi(1) = τ) = 1, and the conditional distribution of Yi(0) given Xi = x is normal with mean zero and variance one.

The implication of Assumptions 3.2 and 3.3 is that the propensity score is e(x) = α/(1 + α).

3.2 Exact Variance and Large Sample Distribution

The data generating process implies that conditional on X = x the treatment effect is equal to E[Y(1) − Y(0) | X = x] = τ for all x. Therefore, the average treatment effect for the treated is equal to τ. Under this data generating process Σi Wi Yi/N1 = Σi Wi Yi(1)/N1 = τ, which along with equation (2.3) implies:

τ̂ − τ = −(1/N1) Σ_{i=1}^{N} Ki Yi.

Conditional on X and W the only stochastic component of τ̂ is Y. By Assumption 3.4 the Yi(0)'s are mean zero, unit variance, and independent of X. Thus E[τ̂ − τ | X, W] = 0. Because (i) E[Yi Yj | Wi = Wj = 0, X, W] = 0 for i ≠ j, (ii) E[Yi² | Wi = 0, X, W] = 1, and (iii) Ki is a deterministic function of X and W, it also follows that the conditional variance of τ̂ given X and W is

V(τ̂ | X, W) = (1/N1²) Σ_{i=1}^{N} Ki².

Because V(E[τ̂ | X, W]) = V(τ) = 0, the exact unconditional variance of the matching estimator is therefore equal to the expected value of the conditional variance:

V(τ̂) = (N0/N1²) E[Ki² | Wi = 0].    (3.7)

Lemma 3.1 (Exact Variance of Matching Estimator): Suppose that Assumptions 2.1, 2.2, and 3.1-3.4 hold. Then

(i) the exact variance of the matching estimator is

V(τ̂) = 1/N1 + (3/2) ((N1 − 1)/N1) (N0 + 8/3)/((N0 + 1)(N0 + 2)),    (3.8)

(ii) as N0 → ∞,

N1 V(τ̂) → 1 + (3/2) α,    (3.9)

and (iii),

√N1 (τ̂ − τ) →d N(0, 1 + (3/2) α).

All proofs are given in the Appendix.

3.3 The Bootstrap Variance

Now we analyze the properties of the bootstrap variance, V^{B,I} in (2.5). As before, let Z = (X, W, Y) denote the original sample. We will look at the distribution of statistics both conditional on the original sample, as well as over replications of the original sample drawn from the same distribution. Notice that

E[V^{B,I}] = E[E[(τ̂b − τ̂)² | Z]] = E[(τ̂b − τ̂)²]    (3.10)

is the expected bootstrap variance. The following lemma establishes the limit of N1 E[V^{B,I}] under our data generating process.

Lemma 3.2 (Bootstrap Variance I): Suppose that Assumptions 3.1-3.4 hold. Then, as N → ∞:

N1 E[V^{B,I}] → 1 + (3/2) α (5 exp(−1) − 2 exp(−2))/(3 (1 − exp(−1))) + 2 exp(−1).    (3.11)

Recall that the limit of the normalized variance of τ̂ is 1 + (3/2) α. For small values of α the limit of the expected bootstrap variance exceeds the limit variance by the third term in (3.11), 2 exp(−1) ≈ 0.74, or 74%. For large values of α the second term in (3.11) dominates, and the ratio of the limit expected bootstrap variance and the limit variance is equal to the factor in the second term of (3.11) multiplying (3/2) α.
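Lemma 3.1 is easy to evaluate numerically; for Design I of Section 3.4 (N0 = N1 = 100) the exact formula gives N1 V(τ̂) ≈ 2.480, the theoretical value quoted there. A short sketch, assuming the formula (3.8) as stated above (function names are ours):

```python
def exact_var_att(n0, n1):
    """Equation (3.8): exact variance of the matching estimator under the
    data generating process of Section 3.1."""
    return 1.0 / n1 + 1.5 * ((n1 - 1) / n1) * (n0 + 8.0 / 3.0) / ((n0 + 1) * (n0 + 2))

def limit_var(alpha):
    """Equation (3.9): limit of N1 * V(tau_hat) as N0 grows with N1 = alpha * N0."""
    return 1.0 + 1.5 * alpha

# Design I (N0 = N1 = 100): normalized exact variance, approximately 2.480
design_i = 100 * exact_var_att(100, 100)

# convergence of N1 * V(tau_hat) to 1 + (3/2) alpha (here alpha = 1)
approx = 1_000_000 * exact_var_att(1_000_000, 1_000_000)
```

The finite-sample value 2.480 sits just below the limit 2.5, consistent with (ii).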

Since (5 exp(−1) − 2 exp(−2))/(3 (1 − exp(−1))) ≈ 0.83, it follows that as α increases, the ratio of the limit expected bootstrap variance to the limit variance asymptotes to 0.83. Together, these observations suggest that in large samples the bootstrap variance can under-estimate as well as over-estimate the true variance.

So far, we have established the relation between the limiting variance of the estimator and the limit of the average bootstrap variance. We end this section with a discussion of the implications of the previous two lemmas for the validity of the bootstrap. The first version of the bootstrap provides a valid estimator of the asymptotic variance of the simple matching estimator if:

N1 E[(τ̂b − τ̂)² | Z] − N1 V(τ̂) →p 0.

Lemma 3.1 shows that:

N1 V(τ̂) → 1 + (3/2) α.

Lemma 3.2 shows that

N1 E[(τ̂b − τ̂)²] → 1 + (3/2) α (5 exp(−1) − 2 exp(−2))/(3 (1 − exp(−1))) + 2 exp(−1).

Assume that the first version of the bootstrap provides a valid estimator of the asymptotic variance of the simple matching estimator. Then,

N1 E[(τ̂b − τ̂)² | Z] →p 1 + (3/2) α.

Because N1 E[(τ̂b − τ̂)² | Z] ≥ 0, it follows by the Portmanteau Lemma (see, e.g., van der Vaart, 1998, page 6) that, as N → ∞,

1 + (3/2) α ≤ lim E[N1 E[(τ̂b − τ̂)² | Z]] = lim N1 E[(τ̂b − τ̂)²] = 1 + (3/2) α (5 exp(−1) − 2 exp(−2))/(3 (1 − exp(−1))) + 2 exp(−1).

However, the algebraic inequality

1 + (3/2) α ≤ 1 + (3/2) α (5 exp(−1) − 2 exp(−2))/(3 (1 − exp(−1))) + 2 exp(−1)

does not hold for large enough α. As a result, the first version of the bootstrap does not provide a valid estimator of the asymptotic variance of the simple matching estimator.
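The comparison between the two limits can be tabulated as a function of α. The sketch below, assuming the limits stated in Lemmas 3.1 and 3.2 above, reproduces the ratios described for Figure 1 (about 1.74 as α → 0, 1.19 at α = 1, and 0.83 as α → ∞):

```python
import math

E1, E2 = math.exp(-1.0), math.exp(-2.0)

def limit_actual(alpha):
    """Limit of N1 * V(tau_hat), Lemma 3.1(ii)."""
    return 1.0 + 1.5 * alpha

def limit_bootstrap(alpha):
    """Limit of N1 * E[V^{B,I}], Lemma 3.2, equation (3.11)."""
    return 1.0 + 1.5 * alpha * (5 * E1 - 2 * E2) / (3 * (1 - E1)) + 2 * E1

def ratio(alpha):
    """Ratio of the limit expected bootstrap variance to the limit variance."""
    return limit_bootstrap(alpha) / limit_actual(alpha)
```

For example, ratio(0.1), ratio(1.0), and ratio(10.0) give the values relevant for Designs III, I, and II, the first two above one (over-coverage) and the last below one (under-coverage).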

The second version of the bootstrap provides a valid estimator of the asymptotic variance of the simple matching estimator if:

N1 E[(τ̂b − E[τ̂b | Z])² | Z] − N1 V(τ̂) →p 0.

Assume that the second version of the bootstrap provides a valid estimator of the asymptotic variance of the simple matching estimator. Then,

N1 E[(τ̂b − E[τ̂b | Z])² | Z] →p 1 + (3/2) α.

Notice that E[(τ̂b − E[τ̂b | Z])² | Z] ≤ E[(τ̂b − τ̂)² | Z]. By the Portmanteau Lemma, as N → ∞,

1 + (3/2) α ≤ lim inf E[N1 E[(τ̂b − E[τ̂b | Z])² | Z]] ≤ lim E[N1 E[(τ̂b − τ̂)² | Z]] = lim N1 E[(τ̂b − τ̂)²] = 1 + (3/2) α (5 exp(−1) − 2 exp(−2))/(3 (1 − exp(−1))) + 2 exp(−1).

Again, this inequality does not hold for large enough α. As a result, the second version of the bootstrap does not provide a valid estimator of the asymptotic variance of the simple matching estimator.

3.4 Simulations

We consider three designs: N0 = N1 = 100 (Design I); N0 = 100, N1 = 1000 (Design II); and N0 = 1000, N1 = 100 (Design III). We use 10,000 replications, and 100 bootstrap samples in each replication. These designs are partially motivated by Figure 1, which gives the ratio of the limit of the expectation of the bootstrap variance, given in equation (3.11), to the limit of the actual variance, given in equation (3.9), for different values of α. On the horizontal axis is the log of α. As α converges to zero the variance ratio converges to 1.74; at α = 1 the variance ratio is 1.19; and as α goes to infinity the variance ratio converges to 0.83. The vertical dashed lines indicate the three designs that we adopt in our simulations: α = 0.1, α = 1, and α = 10.

The simulation results are reported in Table 1. The first row of the table gives normalized exact variances, N1 V(τ̂), calculated from equation (3.8). The second and third rows present averages over the 10,000 simulation replications of the normalized

variance estimators from Abadie and Imbens (2006). The second row reports averages of N1 V^{AI,I} and the third row reports averages of N1 V^{AI,II}. In large samples, N1 V^{AI,I} and N1 V^{AI,II} are consistent for N1 V(τ̂ | X, W) and N1 V(τ̂), respectively. Because, for our data generating process, the conditional average treatment effect does not vary with the covariates, N1 V^{AI,I} and N1 V^{AI,II} converge to the same parameter. Standard errors for the averages over 10,000 replications are reported in parentheses.

The first three rows of Table 1 allow us to assess the difference between the averages of the Abadie-Imbens (AI) variance estimators and the theoretical variances. For example, for Design I, the normalized AI variance estimator N1 V^{AI,I} is on average 2.449. The theoretical variance is 2.480, so the difference is approximately 1%, although it is statistically significant at about 5 standard errors. Given the theoretical justification of the variance estimator, this difference is a finite sample phenomenon.

The fourth row reports the limit of the normalized expected bootstrap variance, N1 E[V^{B,I}], calculated as in (3.11). The fifth and sixth rows give normalized averages of the estimated bootstrap variances, N1 V̂^{B,I} and N1 V̂^{B,II}, over the 10,000 replications. These variances are estimated for each replication using 100 bootstrap samples, and then averaged over all replications. Again it is interesting to compare the average of the estimated bootstrap variance in the fifth row to the limit of the expected bootstrap variance in the fourth row. The differences between the fourth and fifth rows are small, although significantly different from zero as a result of the small sample size. The limited number of bootstrap replications makes these averages noisier than they would otherwise be, but it does not affect the average difference.
The results in the fifth row illustrate our theoretical calculations in Lemma 3.2: the average bootstrap variance can over-estimate or under-estimate the variance of the matching estimator.

The next two panels of the table report coverage rates, first for nominal 90% confidence intervals and then for nominal 95% confidence intervals. The standard errors for the coverage rates reflect the uncertainty coming from the finite number of replications (10,000). They are equal to √(p(1 − p)/R), where for the second panel p = 0.9 and for

the third panel p = 0.95, and R = 10,000 is the number of replications.

The first rows of the last two panels of Table 1 report coverage rates of 90% and 95% confidence intervals constructed in each replication as the point estimate, τ̂, plus/minus 1.645 and 1.96 times the square root of the variance in (3.8). The results show coverage rates which are statistically indistinguishable from the nominal levels, for all three designs and both levels (90% and 95%).

The second row of the second and third panels of Table 1 reports coverage rates for confidence intervals calculated as in the preceding row but using the estimator V^{AI,I}. The third row reports coverage rates for confidence intervals constructed with the estimator V^{AI,II}. Both V^{AI,I} and V^{AI,II} produce confidence intervals with coverage rates that are statistically indistinguishable from the nominal levels.

The last two rows of the second panel of Table 1 report coverage rates for bootstrap confidence intervals obtained by adding and subtracting 1.645 times the square root of the estimated bootstrap variance in each replication, again over the 10,000 replications. The third panel gives the corresponding numbers for 95% confidence intervals. Our simulations reflect the lack of validity of the bootstrap found in the theoretical calculations. Coverage rates of confidence intervals constructed with the bootstrap estimators of the variance differ from nominal levels by substantively important and statistically highly significant magnitudes. In Designs I and III the bootstrap has coverage larger than the nominal coverage. In Design II the bootstrap has coverage smaller than nominal. In neither case is the difference huge, but it is important to stress that this difference will not disappear with a larger sample size, and that it may be more substantial for different data generating processes. The bootstrap calculations in this table are based on 100 bootstrap replications.
Increasing the number of bootstrap replications significantly for all designs was infeasible, as matching is already computationally expensive.9 We therefore investigated the implications of this choice for Design I, which is the fastest to run. For the same 10,000

9 Each calculation of the matching estimator requires N1 searches for the minimum of an array of length N0, so that with B bootstrap replications and R simulations one quickly requires large amounts of computer time.

replications we calculated the coverage rates for the 90% and 95% confidence intervals both with 100 bootstrap replications and with 1,000 bootstrap replications. For the confidence intervals based on V̂^{B,I}, the coverage rate for a 90% nominal level was only slightly higher with 1,000 bootstrap replications than with 100 bootstrap replications. The coverage rate for the 95% confidence interval was 0.001 higher with 1,000 bootstrap replications than with 100 bootstrap replications. Because the differences between the bootstrap coverage rates and the nominal coverage rates for this design are on the order of 0.03 for the 90% and 95% confidence intervals, the number of bootstrap replications can only explain approximately 6-15% of the difference between the bootstrap and nominal coverage rates. We therefore conclude that using more bootstrap replications would not substantially change the results in Table 1.

4 Conclusion

In this article we prove that the bootstrap is not valid for the standard nearest-neighbor matching estimator with replacement. This is a somewhat surprising discovery, because in the case with a scalar covariate the matching estimator is root-n consistent and asymptotically normally distributed with zero asymptotic bias. However, the extreme non-smooth nature of matching estimators and the lack of evidence that the estimator is asymptotically linear explain the lack of validity of the bootstrap. We investigate a special case where it is possible to work out the exact variance of the estimator as well as the limit of the average bootstrap variance. We show that in this case the limit of the average bootstrap variance can be greater or smaller than the limit variance of the matching estimator. This implies that the standard bootstrap fails to provide valid inference for the matching estimator studied in this article. A small Monte Carlo study supports the theoretical calculations.
The implication for empirical practice of these results is that for nearest-neighbor matching estimators with replacement one should use the variance estimators developed by Abadie and Imbens (2006) or subsampling (Politis, Romano and Wolf, 1999). It may well be that if the number of neighbors increases with the sample size, the matching estimator does become asymptotically linear and sufficiently

regular for the bootstrap to be valid. However, the increased risk of a substantial bias has led many researchers to focus on estimators where the number of matches is very small, often just one, and the asymptotics based on an increasing number of matches may not provide a good approximation in such cases. Finally, our results cast doubts on the validity of the standard bootstrap for other estimators that are asymptotically normal but not asymptotically linear (see, e.g., Newey and Windmeijer).

Appendix

Before proving Lemma 3.1 we introduce some notation and preliminary results. Let X1, ..., XN be a random sample from a continuous distribution. Let Mj be the index of the closest match for unit j. That is, if Wj = 1, then Mj is the unique index (ties happen with probability zero) with W_{Mj} = 0, such that

|Xj − X_{Mj}| ≤ |Xj − Xi|, for all i such that Wi = 0.

If Wj = 0, then Mj = 0. Let Ki be the number of times unit i is the closest match for a treated observation:

Ki = (1 − Wi) Σ_{j=1}^{N} Wj 1{Mj = i}.

Following this definition, Ki is zero for treated units. Using this notation, we can write the estimator for the average treatment effect on the treated as:

τ̂ = (1/N1) Σ_{i=1}^{N} (Wi − (1 − Wi) Ki) Yi.    (A.1)

Also, let Pi be the probability that the closest match for a randomly chosen treated unit j is unit i, conditional on both the vector of treatment indicators W and on the vector of covariates for the control units X0:

Pi = Pr(Mj = i | Wj = 1, W, X0).

For treated units we define Pi = 0.

The following lemma provides some properties of the order statistics of a sample from the standard uniform distribution.

Lemma A.1: Let X(1) ≤ X(2) ≤ ... ≤ X(N) be the order statistics of a random sample of size N from a standard uniform distribution, U(0, 1). Then, for 1 ≤ i ≤ j ≤ N,

E[X(i)^r X(j)^s] = (i^[r] (j + r)^[s]) / (N + 1)^[r+s],

where for a positive integer a and a non-negative integer b, a^[b] = (a + b − 1)!/(a − 1)!. Moreover, for 1 ≤ i ≤ N, X(i) has a Beta distribution with parameters (i, N − i + 1); for 1 ≤ i ≤ j ≤ N, X(j) − X(i) has a Beta distribution with parameters (j − i, N − (j − i) + 1).

Proof: All the results of this lemma, with the exception of the distribution of differences of order statistics, appear in Johnson, Kotz, and Balakrishnan (1994). The distribution of differences of order statistics can be easily derived from the joint distribution of order statistics provided in Johnson, Kotz, and Balakrishnan (1994).

Notice that the lemma implies the following results:

(i) E[X(i)] = i/(N + 1), for 1 ≤ i ≤ N,

(ii) E[X(i)²] = i(i + 1)/((N + 1)(N + 2)), for 1 ≤ i ≤ N,

(iii) E[X(i) X(j)] = i(j + 1)/((N + 1)(N + 2)), for 1 ≤ i ≤ j ≤ N.
First we investigate the first two moments of Ki, starting by studying the conditional distribution of Ki given X0 and W.

Lemma A.2 (Conditional Distribution and Moments of Ki): Suppose that Assumptions 3.1-3.4 hold. Then, the distribution of Ki conditional on Wi = 0, W, and X0 is binomial with parameters (N1, Pi):

Ki | Wi = 0, W, X0 ~ B(N1, Pi).

Proof: By definition, Ki = (1 − Wi) Σ_{j=1}^{N} Wj 1{Mj = i}. The indicator 1{Mj = i} is equal to one if the closest control unit for Xj is i. This event has probability Pi. In addition, the events {Mj1 = i} and {Mj2 = i}, for Wj1 = Wj2 = 1 and j1 ≠ j2, are independent conditional on W and X0. Because there are N1 treated units, the sum of these indicators follows a binomial distribution with parameters N1 and Pi.

This implies the following conditional moments for Ki:

E[Ki | W, X0] = (1 − Wi) N1 Pi,

E[Ki² | W, X0] = (1 − Wi) (N1 Pi + N1 (N1 − 1) Pi²).

To derive the marginal moments of Ki we first need to analyze the properties of the random variable Pi. Exchangeability of the control units implies that the marginal expectation of Pi given N0, N1, and Wi = 0 is equal to 1/N0. To derive the second moment of Pi it is helpful to express Pi in terms of the order statistics of the covariates for the control group. For control unit i, let ι(i) be the order of the covariate for the i-th unit among control units:

ι(i) = Σ_{j=1}^{N} (1 − Wj) 1{Xj ≤ Xi}.

Furthermore, let X0(k) be the k-th order statistic of the covariates among the control units, so that X0(1) ≤ X0(2) ≤ ... ≤ X0(N0), and for control units X0(ι(i)) = Xi. Ignoring ties, a treated unit with covariate value x will be matched to control unit i if

(X0(ι(i)−1) + X0(ι(i)))/2 ≤ x ≤ (X0(ι(i)) + X0(ι(i)+1))/2,

if 1 < ι(i) < N0. If ι(i) = 1, then x will be matched to unit i if

x ≤ (X0(1) + X0(2))/2,

and if ι(i) = N0, x will be matched to unit i if

(X0(N0−1) + X0(N0))/2 < x.

To get the value of Pi we need to integrate the density of X conditional on W = 1, f1(x), over these sets. With a uniform distribution for the covariates in the treatment group (f1(x) = 1, for x in [0, 1]), we get the following representation for Pi:

Pi = (X0(1) + X0(2))/2 if ι(i) = 1,
Pi = (X0(ι(i)+1) − X0(ι(i)−1))/2 if 1 < ι(i) < N0,
Pi = 1 − (X0(N0−1) + X0(N0))/2 if ι(i) = N0.
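The representation of Pi as the length of a "catchment" cell bounded by midpoints between adjacent control order statistics can be checked directly against brute-force nearest-neighbor matching over a fine grid of treated covariate values. A small sketch (illustrative names; uniform covariates on [0, 1] as in Assumption 3.1):

```python
import numpy as np

def match_probs(x0):
    """P_i for the controls (in sorted order): the length of the interval of
    treated covariate values matched to control i, with cell boundaries at
    the midpoints between adjacent control order statistics."""
    xs = np.sort(np.asarray(x0))
    mid = (xs[:-1] + xs[1:]) / 2.0     # interior cell boundaries
    edges = np.r_[0.0, mid, 1.0]
    return np.diff(edges)              # cell lengths; they sum to one

rng = np.random.default_rng(2)
n0 = 20
x0 = rng.uniform(size=n0)
p = match_probs(x0)

# brute force: the fraction of a fine grid of treated covariate values whose
# nearest control is i should approximate the cell length P_i
grid = (np.arange(100_000) + 0.5) / 100_000
nearest = np.argmin(np.abs(np.sort(x0)[None, :] - grid[:, None]), axis=1)
freq = np.bincount(nearest, minlength=n0) / len(grid)
```

The first and last cells are the boundary cases ι(i) = 1 and ι(i) = N0, which is why their moments differ from the interior cells in Lemma A.3.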
Lemma A.3 (Moments of $P_i$): Suppose that Assumptions 3.1–3.3 hold. Then, (i) the second moment of $P_i$ conditional on $W_i=0$ is
$$E[P_i^2\mid W_i=0] = \frac{3}{2}\cdot\frac{N_0+8/3}{N_0(N_0+1)(N_0+2)},$$

and (ii), the $M$-th moment of $P_i$ is bounded by
$$E[P_i^M\mid W_i=0]\ \le\ \left(\frac{M+1}{N_0+1}\right)^M.$$

Proof: First, consider (i). Conditional on $W_i=0$, $X_i$ has a uniform distribution on the interval $[0,1]$. Using Lemma A.1 and equation (A.2), for interior $i$ (that is, $i$ such that $1<\iota(i)<N_0$), we have
$$E[P_i\mid 1<\iota(i)<N_0,\,W_i=0] = \frac{1}{N_0+1},\qquad E[P_i^2\mid 1<\iota(i)<N_0,\,W_i=0] = \frac{3}{2}\cdot\frac{1}{(N_0+1)(N_0+2)}.$$
For the smallest and largest observations:
$$E[P_i\mid \iota(i)\in\{1,N_0\},\,W_i=0] = \frac{3}{2}\cdot\frac{1}{N_0+1},\qquad E[P_i^2\mid \iota(i)\in\{1,N_0\},\,W_i=0] = \frac{7}{2}\cdot\frac{1}{(N_0+1)(N_0+2)}.$$
Averaging over all units (two units at the boundary and $N_0-2$ interior units), we obtain
$$E[P_i\mid W_i=0] = \frac{1}{N_0}\left((N_0-2)\,\frac{1}{N_0+1} + 2\cdot\frac{3}{2}\cdot\frac{1}{N_0+1}\right) = \frac{1}{N_0},$$
and
$$E[P_i^2\mid W_i=0] = \frac{1}{N_0}\cdot\frac{(N_0-2)(3/2) + 2(7/2)}{(N_0+1)(N_0+2)} = \frac{3}{2}\cdot\frac{N_0+8/3}{N_0(N_0+1)(N_0+2)}.$$

For (ii), notice that
$$P_i \le \begin{cases} X_{0(2)} & \text{if } \iota(i)=1,\\ X_{0(\iota(i)+1)}-X_{0(\iota(i)-1)} & \text{if } 1<\iota(i)<N_0,\\ 1-X_{0(N_0-1)} & \text{if } \iota(i)=N_0. \end{cases} \tag{A.3}$$
Because the right-hand sides of the inequalities in equation (A.3) all have a Beta distribution with parameters $(2,N_0-1)$, the moments of $P_i$ are bounded by those of a Beta distribution with parameters $2$ and $N_0-1$. The $M$-th moment of a Beta distribution with parameters $\alpha$ and $\beta$ is $\prod_{j=0}^{M-1}(\alpha+j)/(\alpha+\beta+j)$. This is bounded by $(\alpha+M-1)^M/(\alpha+\beta)^M$, which completes the proof of the second part of the lemma. $\square$

Proof of Lemma 3.1: First we prove (i). The first step is to calculate $E[K_i^2\mid W_i=0]$. Using Lemmas A.2 and A.3,
$$E[K_i^2\mid W_i=0] = N_1\,E[P_i\mid W_i=0] + N_1(N_1-1)\,E[P_i^2\mid W_i=0] = \frac{N_1}{N_0} + \frac{3}{2}\cdot\frac{N_1(N_1-1)(N_0+8/3)}{N_0(N_0+1)(N_0+2)}.$$
Substituting this into (3.7) we get
$$V(\hat\tau) = \frac{N_0}{N_1^2}\,E[K_i^2\mid W_i=0] = \frac{1}{N_1} + \frac{3}{2}\cdot\frac{(N_1-1)(N_0+8/3)}{N_1(N_0+1)(N_0+2)},$$

proving part (i). Next, consider part (ii). Multiply the exact variance of $\hat\tau$ by $N_1$ and substitute $N_1=\alpha N_0$ to get
$$N_1V(\hat\tau) = 1 + \frac{3}{2}\cdot\frac{(\alpha N_0-1)(N_0+8/3)}{(N_0+1)(N_0+2)}\cdot\frac{1}{\alpha N_0}\cdot \alpha N_0\cdot\frac{1}{\alpha N_0} \;=\; 1 + \frac{3}{2}\cdot\frac{(\alpha N_0-1)(N_0+8/3)}{\alpha N_0\,(N_0+1)(N_0+2)}\,\alpha.$$
Then take the limit as $N_0\to\infty$ to get
$$\lim_{N_0\to\infty} N_1V(\hat\tau) = 1 + \frac{3}{2}\,\alpha.$$
Finally, consider part (iii). Let $S(r,j)$ be a Stirling number of the second kind. The $M$-th moment of $K_i$ given $\mathbf{X}_0$ and $\mathbf{W}$ is (Johnson, Kotz, and Kemp, 1993):
$$E[K_i^M\mid \mathbf{X}_0, W_i=0] = \sum_{j=0}^M S(M,j)\,\frac{N_1!}{(N_1-j)!}\,P_i^j.$$
Therefore, applying Lemma A.3(ii), we obtain that the moments of $K_i$ are uniformly bounded:
$$E[K_i^M\mid W_i=0] = \sum_{j=0}^M S(M,j)\,\frac{N_1!}{(N_1-j)!}\,E[P_i^j\mid W_i=0] \;\le\; \sum_{j=0}^M S(M,j)\,N_1^j\left(\frac{j+1}{N_0+1}\right)^j \;\le\; (M+1)^M\sum_{j=0}^M S(M,j)\left(\frac{N_1}{N_0+1}\right)^j,$$
which is bounded uniformly along sequences with $N_1/N_0\to\alpha$. Notice that
$$E\left[\frac{1}{N_0}\sum_{i=1}^N K_i^2\right] = E[K_i^2\mid W_i=0] \to \alpha+\frac{3}{2}\alpha^2,\qquad V\left(\frac{1}{N_0}\sum_{i=1}^N K_i^2\right) \le \frac{1}{N_0}\,V(K_i^2\mid W_i=0) \to 0,$$
because $\mathrm{cov}(K_i^2,K_j^2\mid W_i=W_j=0,\,i\neq j)\le 0$ (see Joag-Dev and Proschan, 1983). Therefore,
$$\frac{1}{N_0}\sum_{i=1}^N K_i^2\ \stackrel{p}{\to}\ \alpha+\frac{3}{2}\alpha^2. \tag{A.4}$$
Finally, we write
$$\hat\tau-\tau = -\frac{1}{N_1}\sum_{i=1}^N \xi_i,\qquad\text{where } \xi_i = K_iY_i.$$
Conditional on $\mathbf{X}$ and $\mathbf{W}$ the $\xi_i$ are independent, and the distribution of $\xi_i$ is degenerate at zero for $W_i=1$ and $N(0,K_i^2)$ for $W_i=0$. Hence, for any $c\in\mathbb{R}$:
$$\Pr\left(\frac{N_1}{\big(\sum_{i=1}^N K_i^2\big)^{1/2}}\,(\hat\tau-\tau)\le c\ \Big|\ \mathbf{X},\mathbf{W}\right) = \Phi(c),$$
where $\Phi$ is the cumulative distribution function of a standard normal variable. Integrating over the distribution of $\mathbf{X}$ and $\mathbf{W}$ yields:
$$\Pr\left(\frac{N_1}{\big(\sum_{i=1}^N K_i^2\big)^{1/2}}\,(\hat\tau-\tau)\le c\right) = \Phi(c).$$

Now, Slutsky's Theorem implies (iii). $\square$

Next we introduce some additional notation. Let $R_{b,i}$ be the number of times unit $i$ appears in bootstrap sample $b$. In addition, let $D_{b,i}=1\{R_{b,i}>0\}$ be an indicator for inclusion of unit $i$ in the bootstrap sample, and let $N_{b,0} = \sum_{i=1}^N (1-W_i)D_{b,i}$ be the number of distinct control units in the bootstrap sample. Finally, define the binary indicator $B_i(x)$, for $i=1,\dots,N$, of the event that in the bootstrap sample a treated unit with covariate value $x$ would be matched to unit $i$. For this indicator to be equal to one, three conditions need to be satisfied: (i) unit $i$ is a control unit, (ii) unit $i$ is in the bootstrap sample, and (iii) the distance between $X_i$ and $x$ is less than or equal to the distance between $x$ and any other control unit in the bootstrap sample. Formally:
$$B_i(x) = \begin{cases} 1 & \text{if } |x-X_i| = \min_{k:\,W_k=0,\,D_{b,k}=1}|x-X_k|,\ D_{b,i}=1,\ \text{and } W_i=0,\\ 0 & \text{otherwise.}\end{cases}$$
For the $N$ units in the original sample, let $K_{b,i}$ be the number of times unit $i$ is used as a match in the bootstrap sample:
$$K_{b,i} = \sum_{j=1}^N W_j\,B_i(X_j)\,R_{b,j}. \tag{A.5}$$
We can write the estimated treatment effect in the bootstrap sample as
$$\hat\tau_b = \frac{1}{N_1}\sum_{i=1}^N W_iR_{b,i}Y_i - \frac{1}{N_1}\sum_{i=1}^N K_{b,i}Y_i.$$
Because $Y_i=\tau$ for treated units by Assumption 3.4, and $\sum_{i=1}^N W_iR_{b,i}=N_1$, it follows that $\hat\tau_b-\tau = -\frac{1}{N_1}\sum_{i=1}^N K_{b,i}Y_i$. The difference between the original estimate $\hat\tau$ and the bootstrap estimate $\hat\tau_b$ is therefore
$$\hat\tau_b-\hat\tau = \frac{1}{N_1}\sum_{i=1}^N (K_i-K_{b,i})\,Y_i.$$
We will calculate the expectation
$$N_1E[V_{B,I}] = N_1E[(\hat\tau_b-\hat\tau)^2] = \frac{1}{N_1}\sum_{i=1}^N\sum_{j=1}^N E\big[(K_i-K_{b,i})(K_j-K_{b,j})Y_iY_j\big].$$
Using the facts that $E[Y_i^2\mid\mathbf{X},\mathbf{W},W_i=0]=1$, and $E[Y_iY_j\mid\mathbf{X},\mathbf{W},W_i=W_j=0]=0$ if $i\neq j$, this is equal to
$$N_1E[V_{B,I}] = \frac{N_0}{N_1}\,E\big[(K_{b,i}-K_i)^2\mid W_i=0\big] = \frac{1}{\alpha}\,E\big[(K_{b,i}-K_i)^2\mid W_i=0\big]$$
along sequences with $N_1=\alpha N_0$. The first step in deriving this expectation is to establish some properties of $D_{b,i}$, $R_{b,i}$, $N_{b,0}$, and $B_i(x)$.

Lemma A.4 (Properties of $D_{b,i}$, $R_{b,i}$, $N_{b,0}$, and $B_i(x)$): Suppose that Assumptions 3.1–3.3 hold. Then, for $w\in\{0,1\}$ and $n\in\{1,\dots,N_0\}$:
(i) $R_{b,i}\mid W_i=w,\ \mathbf{Z}\ \sim\ B(N_w,\,1/N_w)$,

(ii) $D_{b,i}\mid W_i=w,\ \mathbf{Z}\ \sim\ B\big(1,\ 1-(1-1/N_w)^{N_w}\big)$,
(iii) $\Pr(N_{b,0}=n) = \dbinom{N_0}{n}\,n!\,S(N_0,n)\big/N_0^{N_0}$, where $S(N_0,n)$ is a Stirling number of the second kind,
(iv) $\Pr\big(B_i(X_j)=1\mid W_j=1,\,W_i=0,\,D_{b,i}=1,\,N_{b,0}\big) = 1/N_{b,0}$,
(v) for $l\neq j$,
$$\Pr\big(B_i(X_l)=B_i(X_j)=1\mid W_j=W_l=1,\,W_i=0,\,D_{b,i}=1,\,N_{b,0}\big) = \frac{3}{2}\cdot\frac{N_{b,0}+8/3}{N_{b,0}(N_{b,0}+1)(N_{b,0}+2)},$$
(vi) $E[N_{b,0}/N_0] = 1-(1-1/N_0)^{N_0}\ \to\ 1-\exp(-1)$,
(vii) $V(N_{b,0}) = N_0(N_0-1)(1-2/N_0)^{N_0} + N_0(1-1/N_0)^{N_0} - N_0^2(1-1/N_0)^{2N_0}$, so that $V(N_{b,0}/N_0)\to 0$ and $N_{b,0}/N_0\stackrel{p}{\to}1-\exp(-1)$.

Proof: Parts (i), (ii), and (iv) are trivial. Part (iii) follows easily from equation (3.6) in Johnson and Kotz (1977). Next, consider part (v). First condition on $\mathbf{X}_{0b}$ and $\mathbf{W}_b$, the counterparts of $\mathbf{X}_0$ and $\mathbf{W}$ in the $b$-th bootstrap sample, and suppose that $D_{b,i}=1$. Conditional on $\mathbf{X}_{0b}$ and $\mathbf{W}_b$, the probability that a randomly chosen treated unit is matched to control unit $i$ depends on a difference of order statistics of the distinct control units in the bootstrap sample; the equivalent object in the original sample is $P_i$. The only difference is that the bootstrap control sample contains $N_{b,0}$ distinct units. The conditional probability that two randomly chosen treated units are both matched to control unit $i$ is the square of this difference of order statistics, and its marginal expectation is the bootstrap analogue of $E[P_i^2\mid W_i=0]$ in Lemma A.3(i), with the sample size scaled back to $N_{b,0}$. Parts (vi) and (vii) can be derived by making use of equation (3.3) in Johnson and Kotz (1977). $\square$

Next, we prove a general result for the bootstrap. Consider a sample of size $N$, indexed by $i=1,\dots,N$. Let $D_{b,i}$ be an indicator for whether observation $i$ is in bootstrap sample $b$, and let $N_b = \sum_{i=1}^N D_{b,i}$ be the number of distinct observations in bootstrap sample $b$.

Lemma A.5 (Bootstrap): For all $m\ge 0$:
$$E\left[\left(\frac{N-N_b}{N}\right)^m\right]\ \to\ \exp(-m),\qquad\text{and}\qquad E\left[\left(\frac{N}{N_b}\right)^m\right]\ \to\ \left(\frac{1}{1-\exp(-1)}\right)^m.$$

Proof: From parts (vi) and (vii) of Lemma A.4 we obtain that $(N-N_b)/N\stackrel{p}{\to}\exp(-1)$. By the Continuous Mapping Theorem, $N/N_b\stackrel{p}{\to}1/(1-\exp(-1))$. To obtain convergence of moments it suffices that, for any $m\ge 0$, $E[((N-N_b)/N)^m]$ and $E[(N/N_b)^m]$ are uniformly bounded in $N$ (see, e.g., van der Vaart, 1998). For $E[((N-N_b)/N)^m]$, uniform boundedness follows from the fact that, for any $m\ge 0$,

$((N-N_b)/N)^m\le 1$. For $E[(N/N_b)^m]$ the proof is a little more complicated. As before, it is enough to show that $E[(N/N_b)^m]$ is uniformly bounded for large $N$. Let $\lambda(N) = (1-1/N)^N$, and notice that $\lambda(N)<\exp(-1)$ for all $N$. Let $0<\theta<\exp(1)-1$, so that $1-(1+\theta)\lambda(N)>0$. Therefore:
$$E\left[\left(\frac{N}{N_b}\right)^m\right] = E\left[\left(\frac{N}{N_b}\right)^m 1\left\{\frac{N_b}{N}>1-(1+\theta)\lambda(N)\right\}\right] + E\left[\left(\frac{N}{N_b}\right)^m 1\left\{\frac{N_b}{N}\le 1-(1+\theta)\lambda(N)\right\}\right]$$
$$\le \left(\frac{1}{1-(1+\theta)\lambda(N)}\right)^m + N^m\,\Pr\left(\frac{N_b}{N}\le 1-(1+\theta)\lambda(N)\right).$$
Because $1-(1+\theta)\lambda(N) > 1-(1+\theta)\exp(-1) > 0$, the first term is uniformly bounded. Therefore, for the expectation $E[(N/N_b)^m]$ to be uniformly bounded, it is sufficient that the probability $\Pr\big(N_b\le N(1-(1+\theta)\lambda(N))\big)$ converges to zero at an exponential rate as $N\to\infty$. Notice that $E[N-N_b]=N\lambda(N)$ and
$$\Pr\big(N_b\le N(1-(1+\theta)\lambda(N))\big) = \Pr\big(N-N_b-N\lambda(N)\ge\theta N\lambda(N)\big) \le \Pr\big(|N-N_b-N\lambda(N)|\ge\theta N\lambda(N)\big).$$
Theorem 2 in Kamath, Motwani, Palem, and Spirakis (1995) implies:
$$\Pr\big(|N-N_b-N\lambda(N)|\ge\theta N\lambda(N)\big)\ \le\ 2\exp\left(-\,\theta^2\,\frac{\lambda(N)^2\,(N-1/2)}{1-\lambda(N)^2}\right).$$
Because $\lambda(N)^2/(1-\lambda(N)^2)$ is increasing in $N$, converging to $\exp(-2)/(1-\exp(-2))>0$ as $N\to\infty$, the last equation establishes an exponential bound on the tail probability of $N/N_b$. $\square$

Lemma A.6 (Approximate Bootstrap $K$ Moments): Suppose that Assumptions 3.1 to 3.3 hold. Then,
(i) $\displaystyle E[K_{b,i}^2\mid W_i=0]\ \to\ 2\alpha + \frac{3}{2}\cdot\frac{\alpha^2}{1-\exp(-1)}$, and
(ii) $\displaystyle E[K_{b,i}K_i\mid W_i=0]\ \to\ \big(1-\exp(-1)\big)\left(\alpha+\frac{3}{2}\alpha^2\right) + \alpha^2\exp(-1)$.

Proof: First we prove part (i). Notice that, for $i$, $j$, $l$ such that $W_i=0$ and $W_j=W_l=1$, the treated resampling counts $(R_{b,j},R_{b,l})$ are independent of $(D_{b,i},B_i(X_j),B_i(X_l))$. Notice also that $\{R_{b,j}:W_j=1\}$ are exchangeable, with $\sum_{W_j=1}R_{b,j}=N_1$. Therefore, applying Lemma A.4(i), for $W_j=W_l=1$ and $j\neq l$:
$$\mathrm{cov}(R_{b,j},R_{b,l}) = -\frac{V(R_{b,j})}{N_1-1} = -\frac{1-1/N_1}{N_1-1} = -\frac{1}{N_1}\ \to\ 0.$$

As a result, because $E[R_{b,j}\mid D_{b,i}=1,B_i(X_j)=B_i(X_l)=1,W_i=0,W_j=W_l=1,j\neq l]=1$ by Lemma A.4(i),
$$E\big[R_{b,j}R_{b,l}\mid D_{b,i}=1,B_i(X_j)=B_i(X_l)=1,W_i=0,W_j=W_l=1,j\neq l\big]\ \to\ 1.$$
In addition,
$$E\big[R_{b,j}^2\mid D_{b,i}=1,B_i(X_j)=1,W_j=1,W_i=0\big] = 1+\frac{N_1-1}{N_1}\ \to\ 2.$$
Notice that
$$\Pr\big(D_{b,i}=1\mid W_i=0,W_j=W_l=1,j\neq l,N_{b,0}\big) = \Pr\big(D_{b,i}=1\mid W_i=0,N_{b,0}\big) = \frac{N_{b,0}}{N_0}.$$
Using Bayes' rule:
$$\Pr\big(N_{b,0}=n\mid D_{b,i}=1,W_i=0\big) = \frac{\Pr(D_{b,i}=1\mid W_i=0,N_{b,0}=n)\,\Pr(N_{b,0}=n)}{\Pr(D_{b,i}=1\mid W_i=0)} = \frac{(n/N_0)\,\Pr(N_{b,0}=n)}{\Pr(D_{b,i}=1\mid W_i=0)}.$$
Therefore,
$$\Pr\big(B_i(X_j)=1\mid D_{b,i}=1,W_i=0,W_j=1\big) = \sum_{n=1}^{N_0}\frac{1}{n}\,\Pr\big(N_{b,0}=n\mid D_{b,i}=1,W_i=0\big) = \frac{1}{N_0\,\Pr(D_{b,i}=1\mid W_i=0)},$$
so that
$$N_0\,\Pr\big(B_i(X_j)=1\mid D_{b,i}=1,W_i=0,W_j=1\big)\ \to\ \frac{1}{1-\exp(-1)}.$$
In addition, by Lemma A.4(v),
$$N_0^2\,\Pr\big(B_i(X_j)=B_i(X_l)=1\mid D_{b,i}=1,W_i=0,W_j=W_l=1,j\neq l,N_{b,0}\big) = \frac{3}{2}\cdot\frac{N_0^2\,(N_{b,0}+8/3)}{N_{b,0}(N_{b,0}+1)(N_{b,0}+2)}\ \stackrel{p}{\to}\ \frac{3}{2}\cdot\frac{1}{(1-\exp(-1))^2},$$
and, averaging over the distribution of $N_{b,0}$ given $D_{b,i}=1$,
$$N_0^2\sum_{n=1}^{N_0}\frac{3}{2}\cdot\frac{n+8/3}{n(n+1)(n+2)}\,\Pr\big(N_{b,0}=n\mid D_{b,i}=1,W_i=0\big)\ \le\ \frac{9}{4}\sum_{n=1}^{N_0}\left(\frac{N_0}{n}\right)^2\Pr\big(N_{b,0}=n\mid D_{b,i}=1,W_i=0\big).$$

Notice that
$$\sum_{n=1}^{N_0}\left(\frac{N_0}{n}\right)^2\Pr\big(N_{b,0}=n\mid D_{b,i}=1,W_i=0\big) = \frac{1}{\Pr(D_{b,i}=1\mid W_i=0)}\sum_{n=1}^{N_0}\frac{N_0}{n}\,\Pr(N_{b,0}=n),$$
which is bounded away from infinity, as shown in the proof of Lemma A.5. Convergence in probability of a random variable along with boundedness of its second moment implies convergence of the first moment (see, e.g., van der Vaart, 1998). As a result,
$$N_0^2\,\Pr\big(B_i(X_j)=B_i(X_l)=1\mid D_{b,i}=1,W_i=0,W_j=W_l=1,j\neq l\big)\ \to\ \frac{3}{2}\cdot\frac{1}{(1-\exp(-1))^2}.$$
Then, using these preliminary results, we obtain:
$$E[K_{b,i}^2\mid W_i=0] = E\Big[\sum_{j=1}^N\sum_{l=1}^N W_jW_l\,B_i(X_j)B_i(X_l)\,R_{b,j}R_{b,l}\ \Big|\ W_i=0\Big]$$
$$= E\Big[\sum_{j=1}^N W_j\,B_i(X_j)\,R_{b,j}^2\ \Big|\ W_i=0\Big] + E\Big[\sum_{j=1}^N\sum_{l\neq j} W_jW_l\,B_i(X_j)B_i(X_l)\,R_{b,j}R_{b,l}\ \Big|\ W_i=0\Big]$$
$$= N_1\,E\big[R_{b,j}^2\mid D_{b,i}=1,B_i(X_j)=1,W_j=1,W_i=0\big]\ \Pr\big(B_i(X_j)=1\mid D_{b,i}=1,W_j=1,W_i=0\big)\ \Pr\big(D_{b,i}=1\mid W_j=1,W_i=0\big)$$
$$\quad+\ N_1(N_1-1)\,E\big[R_{b,j}R_{b,l}\mid D_{b,i}=1,B_i(X_j)=B_i(X_l)=1,W_j=W_l=1,j\neq l,W_i=0\big]$$
$$\qquad\times\ \Pr\big(B_i(X_j)=B_i(X_l)=1\mid D_{b,i}=1,W_j=W_l=1,j\neq l,W_i=0\big)\ \Pr\big(D_{b,i}=1\mid W_j=W_l=1,j\neq l,W_i=0\big)$$
$$\to\ 2\alpha + \frac{3}{2}\cdot\frac{\alpha^2}{1-\exp(-1)}.$$

This finishes the proof of part (i). Next, we prove part (ii). Because the treated resampling counts are independent of the remaining variables, with $E[R_{b,l}\mid\cdot\,]=1$,
$$E[K_iK_{b,i}\mid \mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0] = E\Big[\Big(\sum_{j=1}^N W_j\,1\{M_j=i\}\Big)\Big(\sum_{l=1}^N W_l\,B_i(X_l)\,R_{b,l}\Big)\ \Big|\ \mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0\Big]$$
$$= E\Big[\sum_{j=1}^N\sum_{l=1}^N W_jW_l\,1\{M_j=i\}\,B_i(X_l)\ \Big|\ \mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0\Big].$$
Splitting the inner sum according to whether $M_l=i$ or $M_l\neq i$, and noting that $B_i(X_l)=1$ whenever $M_l=i$ and $D_{b,i}=1$ (if unit $i$ is in the bootstrap control sample, a treated unit originally matched to $i$ is still matched to $i$, because the distinct bootstrap controls form a subset of the original controls), we obtain
$$E[K_iK_{b,i}\mid\mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0] = E\big[K_i^2\mid\mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0\big]$$
$$\qquad+\ E\Big[\sum_{j=1}^N\sum_{l\neq j}W_jW_l\,1\{M_j=i\}\,1\{M_l\neq i\}\,B_i(X_l)\ \Big|\ \mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0\Big].$$
Conditional on $\mathbf{X}_0$, $W_i=0$, and $D_{b,i}=1$, the probability that a treated observation $l$ that was not matched to $i$ in the original sample is matched to $i$ in a bootstrap sample does not depend on the covariate values of the other treated observations or on $\mathbf{W}$. Therefore:
$$B_i^0 = E[B_i(X_l)\mid\mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0,W_j=W_l=1,M_j=i,M_l\neq i] = E[B_i(X_l)\mid\mathbf{X}_0,D_{b,i}=1,W_i=0,W_l=1,M_l\neq i].$$
As a result:
$$E\Big[\sum_{j=1}^N\sum_{l\neq j}W_jW_l\,1\{M_j=i\}\,1\{M_l\neq i\}\,B_i(X_l)\ \Big|\ \mathbf{X}_0,\mathbf{W},D_{b,i}=1,W_i=0\Big] = B_i^0\ E\big[K_i(N_1-K_i)\mid\mathbf{X}_0,W_i=0\big].$$
Conditional on $\mathbf{X}_0$ and $W_i=0$, $K_i$ has a binomial distribution with parameters $(N_1,P_i)$. Therefore:
$$E[K_i(N_1-K_i)\mid\mathbf{X}_0,W_i=0] = N_1^2P_i - N_1P_i - N_1(N_1-1)P_i^2 = N_1(N_1-1)\,P_i(1-P_i).$$

Therefore:
$$E\Big[\sum_{j=1}^N\sum_{l\neq j}W_jW_l\,1\{M_j=i\}\,1\{M_l\neq i\}\,B_i(X_l)\ \Big|\ \iota(i),P_i,D_{b,i}=1,W_i=0\Big] = E\big[B_i^0\mid\iota(i),P_i,D_{b,i}=1,W_i=0\big]\,N_1(N_1-1)\,P_i(1-P_i).$$
In addition, the probability that $r$ specified control units do not appear in a bootstrap sample, conditional on another specified control unit appearing in the sample, is (apply Bayes' theorem):
$$q_{N_0}(r) = \frac{\left(\dfrac{N_0-r}{N_0}\right)^{N_0} - \left(\dfrac{N_0-r-1}{N_0}\right)^{N_0} 1\{r<N_0\}}{1-\left(\dfrac{N_0-1}{N_0}\right)^{N_0}}.$$
Notice that for a fixed $r$ this probability converges to $\exp(-r)$, as $N_0\to\infty$. Notice also that this probability is bounded by $\exp(-r)/(1-\exp(-1))$, which is summable:
$$\sum_{r=1}^\infty \frac{\exp(-r)}{1-\exp(-1)} = \frac{\exp(-1)}{(1-\exp(-1))^2}.$$
As a result, by the dominated convergence theorem for infinite sums:
$$\lim_{N_0\to\infty}\sum_{r=1}^{N_0} q_{N_0}(r) = \sum_{r=1}^\infty\exp(-r) = \frac{\exp(-1)}{1-\exp(-1)}.$$
For $k,d\in\{1,\dots,N_0\}$ and $k+d\le N_0$, let $\Delta_d(k) = X_{0(k+d)}-X_{0(k)}$. In addition, let $\Delta_d(0)=X_{0(d)}$, and for $k+d=N_0+1$ let $\Delta_d(k)=1-X_{0(k)}$. Collecting, for each side of unit $i$, the extra probability mass gained by unit $i$'s bootstrap catchment area when its $r$ nearest original control neighbors on that side are missing from the bootstrap sample, we obtain:
$$(1-P_i)\,B_i^0 = \sum_{r=1}^{N_0-\iota(i)}\left[\frac{\Delta_1(\iota(i)+r)}{2} + 1\{r=N_0-\iota(i)\}\,\frac{\Delta_{N_0+1-\iota(i)}(\iota(i))}{2}\right]q_{N_0}(r)$$
$$\qquad\qquad+\ \sum_{r=1}^{\iota(i)-1}\left[\frac{\Delta_1(\iota(i)-r-1)}{2} + 1\{r=\iota(i)-1\}\,\frac{\Delta_{\iota(i)}(0)}{2}\right]q_{N_0}(r).$$
In addition, using the results in Lemma A.1 we obtain that, for $1<\iota(i)<N_0$ and $r\ge 1$:
$$E\big[\Delta_1(\iota(i)+r)\,P_i\mid\iota(i),D_{b,i}=1,W_i=0\big] = \frac{1}{(N_0+1)(N_0+2)},$$

29 E [ N0 ιi+ιip i ιi, D b,i =, W i = 0 ] = ιi + 3/ For < ιi <, and r ιi, we have: E [ ιi r P i ιi, D b,i =, W i = 0 ] = E [ ιi0 P i ιi, D b,i =, W i = 0 ] = Therefore, for < ιi < : + + 2, ιi + 3/ N E W j W l {M j = i}{m l i}b i X l ιi, D b,i =, W i = 0 = j= l j { N0 ιi r= + {r = ιi} ιi + 3/2 + ιi r= + {r = ιi }ιi + /2 For ιi = and r, we obtain: E [ +r P i ιi =, D b,i =, W i = 0 ] = 3 2 E [ N0P i ιi =, D b,i =, W i = 0 ] = 3 2 N N N0 r r N0 N0 N0 r r N0 N , + / , N E W j W l {M j = i}{m l i}b i X l ιi =, D b,i =, W i = 0 = j= l j N 0 r= + {r = } + /3 N0 N0 } 3N N N0 r N0 r N0 N0. with analogous results for the case ιi =. Let N T, N, n = E W j W l {M j = i}{m l i}b i X l ιi = n, D b,i =, W i = 0, Then, T, N, n = j= l j N N R N0 n + U N0 n,

where, for $1<n<N_0$,
$$R_{N_0}(n) = \frac{1}{2}\left\{\sum_{r=1}^{N_0-n}q_{N_0}(r) + \sum_{r=1}^{n-1}q_{N_0}(r)\right\},\qquad U_{N_0}(n) = \frac{1}{2}\Big\{(N_0-n+3/2)\,q_{N_0}(N_0-n) + (n+1/2)\,q_{N_0}(n-1)\Big\}.$$
For $n=1$:
$$R_{N_0}(1) = \frac{3}{4}\sum_{r=1}^{N_0-1}q_{N_0}(r),\qquad U_{N_0}(1) = \frac{3}{4}\,(N_0+1/3)\,q_{N_0}(N_0-1),$$
with analogous expressions for $n=N_0$. Let $\bar T = \alpha^2\exp(-1)/(1-\exp(-1))$. Then
$$\bar T - T(N_0,N_1,n) = \alpha^2\left[\frac{\exp(-1)}{1-\exp(-1)} - R_{N_0}(n) - U_{N_0}(n)\right] + \left[\alpha^2 - \frac{N_1(N_1-1)}{(N_0+1)(N_0+2)}\right]\big(R_{N_0}(n)+U_{N_0}(n)\big).$$
Notice that, for $1<n<N_0$, $R_{N_0}(n)$ is bounded:
$$R_{N_0}(n)\ \le\ \sum_{r=1}^\infty\frac{\exp(-r)}{1-\exp(-1)} = \frac{\exp(-1)}{(1-\exp(-1))^2}.$$

Notice also that, because $\log(1-\lambda)\le-\lambda$ for $0\le\lambda<1$, we have, for all $n$ with $0<n<N_0$,
$$\log\left(n\left(1-\frac{n}{N_0}\right)^{N_0}\right) = \log n + N_0\log\left(1-\frac{n}{N_0}\right)\ \le\ \log n - n,$$
and therefore $n(1-n/N_0)^{N_0}\le n\exp(-n)\le\exp(-1)$. This implies that the quantities $(n+3/2)(1-n/N_0)^{N_0}$ and $(n+1/2)(1-(n-1)/N_0)^{N_0}$ are bounded by $\exp(-1)+3/2$. Therefore, for $0<n<N_0$:
$$U_{N_0}(n)\ \le\ \frac{\exp(-1)+3/2}{1-\exp(-1)}.$$
Similarly, for $n\in\{1,N_0\}$, we obtain
$$R_{N_0}(n)\ \le\ \frac{3}{4}\cdot\frac{\exp(-1)}{(1-\exp(-1))^2},\qquad\text{and}\qquad U_{N_0}(n)\ \le\ \frac{3}{4}\cdot\frac{\exp(-1)+4/3}{1-\exp(-1)}.$$
Consequently, $R_{N_0}(n)$ and $U_{N_0}(n)$ are uniformly bounded by some constants $\bar R$ and $\bar U$ for all $N_0$ and all $n\in\mathbb{N}$ (the set of positive integers) with $n\le N_0$. Therefore, for all such $N_0$ and $n$,
$$\big|\bar T - T(N_0,N_1,n)\big|\ \le\ \alpha^2\left(\frac{\exp(-1)}{1-\exp(-1)}+\bar R+\bar U\right) + \left|\alpha^2 - \frac{N_1(N_1-1)}{(N_0+1)(N_0+2)}\right|(\bar R+\bar U).$$
Because every convergent sequence in $\mathbb{R}$ is bounded, $N_1(N_1-1)/((N_0+1)(N_0+2))$ is bounded by some constant $\bar\alpha^2$. As a result, there exists some constant $\bar D$ such that $|\bar T - T(N_0,N_1,n)|\le\bar D$. Notice that for all $n$ such that $0<n<N_0$,
$$R_{N_0}(n) = \tilde R(n) + V_{N_0}(n),$$
where
$$\tilde R(n) = \frac{1}{2}\left\{\sum_{r=1}^{N_0-n}\exp(-r) + \sum_{r=1}^{n-1}\exp(-r)\right\},\qquad V_{N_0}(n) = \frac{1}{2}\left\{\sum_{r=1}^{N_0-n}\big[q_{N_0}(r)-\exp(-r)\big] + \sum_{r=1}^{n-1}\big[q_{N_0}(r)-\exp(-r)\big]\right\}.$$
Notice that for all $n$ such that $0<n<N_0$, $|V_{N_0}(n)|\le\bar V_{N_0}$, where
$$\bar V_{N_0} = \sum_{r=1}^{N_0}\big|q_{N_0}(r)-\exp(-r)\big|,$$
which does not depend on $n$. Applying the dominated convergence theorem, it is easy to show that $\bar V_{N_0}\to 0$ as $N_0\to\infty$.

For some $\delta$ such that $0<\delta<1$, let $I_{\delta,N_0} = \{n\in\mathbb{N}:\ 1+N_0^\delta<n<N_0-N_0^\delta\}$ and $\bar I_{\delta,N_0} = \{n\in\mathbb{N}:\ n\le 1+N_0^\delta\}\cup\{n\in\mathbb{N}:\ N_0-N_0^\delta\le n\le N_0\}$. For $n\in I_{\delta,N_0}$, both $n-1$ and $N_0-n$ exceed $N_0^\delta$, so
$$\tilde R(n)\ \ge\ \sum_{r=1}^{N_0}\exp(-r)\,1\{r\le N_0^\delta\},$$
and therefore
$$\frac{\exp(-1)}{1-\exp(-1)} - \tilde R(n)\ \le\ D_{N_0},\qquad\text{where}\qquad D_{N_0} = \frac{\exp(-1)}{1-\exp(-1)} - \sum_{r=1}^{N_0}\exp(-r)\,1\{r\le N_0^\delta\}.$$
Notice that $D_{N_0}$ does not depend on $n$; also, applying the dominated convergence theorem, it is easy to show that $D_{N_0}\to 0$ as $N_0\to\infty$. In addition, for $n\in I_{\delta,N_0}$ the boundary weights $q_{N_0}(n-1)$ and $q_{N_0}(N_0-n)$ are of order $\exp(-N_0^\delta)$, so
$$U_{N_0}(n)\ \le\ \bar U_{N_0},\qquad\text{where}\qquad \bar U_{N_0} = \frac{(N_0^\delta+3/2)\exp(-N_0^\delta)}{1-\exp(-1)}\ \to\ 0\ \text{ as } N_0\to\infty.$$
The last set of results implies that, for $n\in I_{\delta,N_0}$,
$$\big|\bar T - T(N_0,N_1,n)\big|\ \le\ \alpha^2\big(D_{N_0}+\bar V_{N_0}+\bar U_{N_0}\big) + \left|\frac{N_1(N_1-1)}{(N_0+1)(N_0+2)}-\alpha^2\right|(\bar R+\bar U).$$
Let $\#I_{\delta,N_0}$ and $\#\bar I_{\delta,N_0}$ be the cardinalities of the sets $I_{\delta,N_0}$ and $\bar I_{\delta,N_0}$, respectively. Notice that $\#I_{\delta,N_0}<N_0$, $\#I_{\delta,N_0}/N_0\to 1$, and $\#\bar I_{\delta,N_0}/N_0\to 0$.

Because $\iota(i)$ is uniformly distributed on $\{1,\dots,N_0\}$ conditional on $W_i=0$,
$$E\Big[\sum_{j=1}^N\sum_{l\neq j}W_jW_l\,1\{M_j=i\}\,1\{M_l\neq i\}\,B_i(X_l)\ \Big|\ D_{b,i}=1,W_i=0\Big] - \bar T = \frac{1}{N_0}\sum_{n=1}^{N_0}\big(T(N_0,N_1,n)-\bar T\big)$$
$$= \frac{1}{N_0}\sum_{n\in I_{\delta,N_0}}\big(T(N_0,N_1,n)-\bar T\big) + \frac{1}{N_0}\sum_{n\in\bar I_{\delta,N_0}}\big(T(N_0,N_1,n)-\bar T\big).$$
Using the bounds established above and the fact that $\#I_{\delta,N_0}<N_0$, we obtain:
$$\left|\frac{1}{N_0}\sum_{n\in I_{\delta,N_0}}\big(T(N_0,N_1,n)-\bar T\big)\right|\ \le\ \alpha^2\big(D_{N_0}+\bar V_{N_0}+\bar U_{N_0}\big) + \left|\frac{N_1(N_1-1)}{(N_0+1)(N_0+2)}-\alpha^2\right|(\bar R+\bar U)\ \to\ 0.$$
Notice also that:
$$\left|\frac{1}{N_0}\sum_{n\in\bar I_{\delta,N_0}}\big(T(N_0,N_1,n)-\bar T\big)\right|\ \le\ \bar D\,\frac{\#\bar I_{\delta,N_0}}{N_0}\ \to\ 0.$$
As a result, we obtain:
$$E\Big[\sum_{j=1}^N\sum_{l\neq j}W_jW_l\,1\{M_j=i\}\,1\{M_l\neq i\}\,B_i(X_l)\ \Big|\ D_{b,i}=1,W_i=0\Big]\ \to\ \alpha^2\,\frac{\exp(-1)}{1-\exp(-1)}.$$
Now, because
$$E[K_i^2\mid D_{b,i}=1,W_i=0] = E[K_i^2\mid W_i=0]\ \to\ \alpha+\frac{3}{2}\alpha^2,$$
we obtain
$$E[K_iK_{b,i}\mid D_{b,i}=1,W_i=0]\ \to\ \alpha+\frac{3}{2}\alpha^2+\alpha^2\,\frac{\exp(-1)}{1-\exp(-1)}.$$
Therefore, because $E[K_iK_{b,i}\mid D_{b,i}=0,W_i=0]=0$ and $\Pr(D_{b,i}=1\mid W_i=0)\to 1-\exp(-1)$, we obtain
$$E[K_iK_{b,i}\mid W_i=0]\ \to\ \big(1-\exp(-1)\big)\left(\alpha+\frac{3}{2}\alpha^2\right) + \alpha^2\exp(-1).\qquad\square$$
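The limits in Lemma A.6 can be illustrated by simulation under the design of Section 3. The following sketch (illustrative code, not the authors'; all variable names are invented) draws repeated samples with $N_1=N_0$ (so $\alpha=1$), computes $K_i$ and, for one bootstrap replication per sample, $K_{b,i}$; it also verifies the exact identity $\hat\tau_b-\hat\tau = \frac{1}{N_1}\sum_i (K_i-K_{b,i})Y_i$ used in the calculation of $E[V_{B,I}]$.

```python
import math
import random
from bisect import bisect_left

# Illustrative Monte Carlo sketch (not the authors' code) of Lemma A.6 for
# alpha = 1: E[K^2] -> 1 + 3/2, E[K_b^2] -> 2 + (3/2)/(1 - e^-1), and
# E[K K_b] -> (1 - e^-1)(1 + 3/2) + e^-1.
random.seed(2)
N0 = N1 = 500
reps = 200
tau = 0.0

def nearest(xs, x):
    """Index in the sorted list xs of the element closest to x."""
    j = bisect_left(xs, x)
    cands = [k for k in (j - 1, j) if 0 <= k < len(xs)]
    return min(cands, key=lambda k: abs(xs[k] - x))

m_k2 = m_kb2 = m_kkb = 0.0
for _ in range(reps):
    x0 = sorted(random.random() for _ in range(N0))   # control covariates
    y0 = [random.gauss(0, 1) for _ in range(N0)]      # control outcomes
    x1 = [random.random() for _ in range(N1)]         # treated; Y_i = tau
    K = [0] * N0
    for x in x1:
        K[nearest(x0, x)] += 1
    # one bootstrap draw: resample treated counts and control units separately
    R = [0] * N1
    for _ in range(N1):
        R[random.randrange(N1)] += 1
    ctrl = sorted(set(random.randrange(N0) for _ in range(N0)))  # distinct controls
    bx = [x0[k] for k in ctrl]
    Kb = [0] * N0
    for j, x in enumerate(x1):
        Kb[ctrl[nearest(bx, x)]] += R[j]
    # identity: tau_b - tau_hat = (1/N1) * sum_i (K_i - K_bi) * Y_i
    tau_hat = sum(tau - y0[nearest(x0, x)] for x in x1) / N1
    tau_b = sum(R[j] * (tau - y0[ctrl[nearest(bx, x1[j])]])
                for j in range(N1)) / N1
    rhs = sum((K[i] - Kb[i]) * y0[i] for i in range(N0)) / N1
    assert abs((tau_b - tau_hat) - rhs) < 1e-9
    m_k2 += sum(k * k for k in K) / (N0 * reps)
    m_kb2 += sum(k * k for k in Kb) / (N0 * reps)
    m_kkb += sum(K[i] * Kb[i] for i in range(N0)) / (N0 * reps)

e1 = math.exp(-1)
assert abs(m_k2 - 2.5) < 0.2                      # 1 + (3/2) * alpha
assert abs(m_kb2 - (2 + 1.5 / (1 - e1))) < 0.35   # Lemma A.6(i)
assert abs(m_kkb - ((1 - e1) * 2.5 + e1)) < 0.25  # Lemma A.6(ii)
```

With these sample sizes the simulated moments are close to the asymptotic values $2.5$, $2+\tfrac{3}{2}/(1-e^{-1})\approx 4.37$, and $(1-e^{-1})\cdot 2.5+e^{-1}\approx 1.95$, consistent with the bootstrap variance diverging from the actual variance.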


A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. By Tiemen Woutersen. Draft, September A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters By Tiemen Woutersen Draft, September 006 Abstract. This note proposes a new way to calculate confidence intervals of partially

More information

Solution Set for Homework #1

Solution Set for Homework #1 CS 683 Spring 07 Learning, Games, and Electronic Markets Solution Set for Homework #1 1. Suppose x and y are real numbers and x > y. Prove that e x > ex e y x y > e y. Solution: Let f(s = e s. By the mean

More information

Principles Underlying Evaluation Estimators

Principles Underlying Evaluation Estimators The Principles Underlying Evaluation Estimators James J. University of Chicago Econ 350, Winter 2019 The Basic Principles Underlying the Identification of the Main Econometric Evaluation Estimators Two

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

Matching Techniques. Technical Session VI. Manila, December Jed Friedman. Spanish Impact Evaluation. Fund. Region

Matching Techniques. Technical Session VI. Manila, December Jed Friedman. Spanish Impact Evaluation. Fund. Region Impact Evaluation Technical Session VI Matching Techniques Jed Friedman Manila, December 2008 Human Development Network East Asia and the Pacific Region Spanish Impact Evaluation Fund The case of random

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples DISCUSSION PAPER SERIES IZA DP No. 4304 A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples James J. Heckman Petra E. Todd July 2009 Forschungsinstitut zur Zukunft

More information

Independent and conditionally independent counterfactual distributions

Independent and conditionally independent counterfactual distributions Independent and conditionally independent counterfactual distributions Marcin Wolski European Investment Bank M.Wolski@eib.org Society for Nonlinear Dynamics and Econometrics Tokyo March 19, 2018 Views

More information

Why experimenters should not randomize, and what they should do instead

Why experimenters should not randomize, and what they should do instead Why experimenters should not randomize, and what they should do instead Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) Experimental design 1 / 42 project STAR Introduction

More information

Comparison of inferential methods in partially identified models in terms of error in coverage probability

Comparison of inferential methods in partially identified models in terms of error in coverage probability Comparison of inferential methods in partially identified models in terms of error in coverage probability Federico A. Bugni Department of Economics Duke University federico.bugni@duke.edu. September 22,

More information

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1117-1128 doi:http://dx.doi.org/10.5705/ss.202015.0240 A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Xu He and Peter Z. G. Qian Chinese Academy of Sciences

More information

Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication)

Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication) Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication) Nikolaus Robalino and Arthur Robson Appendix B: Proof of Theorem 2 This appendix contains the proof of Theorem

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

The Evaluation of Social Programs: Some Practical Advice

The Evaluation of Social Programs: Some Practical Advice The Evaluation of Social Programs: Some Practical Advice Guido W. Imbens - Harvard University 2nd IZA/IFAU Conference on Labor Market Policy Evaluation Bonn, October 11th, 2008 Background Reading Guido

More information

Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1

Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1 Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1 Lectures on Evaluation Methods Guido Imbens Impact Evaluation Network October 2010, Miami Methods for Estimating Treatment

More information

Lecture 2: Discrete Probability Distributions

Lecture 2: Discrete Probability Distributions Lecture 2: Discrete Probability Distributions IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge February 1st, 2011 Rasmussen (CUED) Lecture

More information

Controlling for overlap in matching

Controlling for overlap in matching Working Papers No. 10/2013 (95) PAWEŁ STRAWIŃSKI Controlling for overlap in matching Warsaw 2013 Controlling for overlap in matching PAWEŁ STRAWIŃSKI Faculty of Economic Sciences, University of Warsaw

More information

Optimal bandwidth selection for the fuzzy regression discontinuity estimator

Optimal bandwidth selection for the fuzzy regression discontinuity estimator Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

The formal relationship between analytic and bootstrap approaches to parametric inference

The formal relationship between analytic and bootstrap approaches to parametric inference The formal relationship between analytic and bootstrap approaches to parametric inference T.J. DiCiccio Cornell University, Ithaca, NY 14853, U.S.A. T.A. Kuffner Washington University in St. Louis, St.

More information

2.3 Estimating PDFs and PDF Parameters

2.3 Estimating PDFs and PDF Parameters .3 Estimating PDFs and PDF Parameters estimating means - discrete and continuous estimating variance using a known mean estimating variance with an estimated mean estimating a discrete pdf estimating a

More information

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Keisuke Hirano University of Miami 2 Guido W. Imbens University of California at Berkeley 3 and BER Geert Ridder

More information

High Dimensional Propensity Score Estimation via Covariate Balancing

High Dimensional Propensity Score Estimation via Covariate Balancing High Dimensional Propensity Score Estimation via Covariate Balancing Kosuke Imai Princeton University Talk at Columbia University May 13, 2017 Joint work with Yang Ning and Sida Peng Kosuke Imai (Princeton)

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Why high-order polynomials should not be used in regression discontinuity designs

Why high-order polynomials should not be used in regression discontinuity designs Why high-order polynomials should not be used in regression discontinuity designs Andrew Gelman Guido Imbens 6 Jul 217 Abstract It is common in regression discontinuity analysis to control for third, fourth,

More information

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ Characterizing Forecast Uncertainty Prediction Intervals The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ t + s, t. Under our assumptions the point forecasts are asymtotically unbiased

More information

WHEN TO CONTROL FOR COVARIATES? PANEL ASYMPTOTICS FOR ESTIMATES OF TREATMENT EFFECTS

WHEN TO CONTROL FOR COVARIATES? PANEL ASYMPTOTICS FOR ESTIMATES OF TREATMENT EFFECTS WHEN TO CONTROL FOR COVARIATES? PANEL ASYPTOTICS FOR ESTIATES OF TREATENT EFFECTS Joshua Angrist Jinyong Hahn* Abstract The problem of when to control for continuous or highdimensional discrete covariate

More information

Estimation of Treatment Effects under Essential Heterogeneity

Estimation of Treatment Effects under Essential Heterogeneity Estimation of Treatment Effects under Essential Heterogeneity James Heckman University of Chicago and American Bar Foundation Sergio Urzua University of Chicago Edward Vytlacil Columbia University March

More information

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance Identi cation of Positive Treatment E ects in Randomized Experiments with Non-Compliance Aleksey Tetenov y February 18, 2012 Abstract I derive sharp nonparametric lower bounds on some parameters of the

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department

More information

Econ 673: Microeconometrics Chapter 12: Estimating Treatment Effects. The Problem

Econ 673: Microeconometrics Chapter 12: Estimating Treatment Effects. The Problem Econ 673: Microeconometrics Chapter 12: Estimating Treatment Effects The Problem Analysts are frequently interested in measuring the impact of a treatment on individual behavior; e.g., the impact of job

More information

BOOTSTRAP INFERENCE FOR PROPENSITY SCORE MATCHING

BOOTSTRAP INFERENCE FOR PROPENSITY SCORE MATCHING BOOTSTRAP IFERECE FOR PROPESITY SCORE ATCHIG KARU ADUSUILLI Abstract. Propensity score matching, where the propensity scores are estimated in a first step, is widely used for estimating treatment effects.

More information

Does k-th Moment Exist?

Does k-th Moment Exist? Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,

More information

Random Variable. Pr(X = a) = Pr(s)

Random Variable. Pr(X = a) = Pr(s) Random Variable Definition A random variable X on a sample space Ω is a real-valued function on Ω; that is, X : Ω R. A discrete random variable is a random variable that takes on only a finite or countably

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

A better way to bootstrap pairs

A better way to bootstrap pairs A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression

More information