Mean-squared-error Calculations for Average Treatment Effects

Size: px
Start display at page:

Download "Mean-squared-error Calculations for Average Treatment Effects"

Transcription

1 Mean-squared-error Calculations for Average Treatment Effects Guido W. Imbens UC Berkeley and BER Whitney ewey MIT Geert Ridder USC First Version: December 2003, Current Version: April 9, 2005 Abstract This paper develops JEL Classification: C4, C20. Keywords: onparametric Estimation

2 Introduction Recently a number of estimators have been proposed for average treatment effects under the assumption of unconfoundedness or selection on observables. Many of these estimators require nonparametric estimation of an unknown function, either the regression function or the propensity score. Typically results are presented concerning the rates at which the smoothing parameters go to their limiting values, without specific recommendations regarding their values Hahn, 998; Hirano, Imbens and Ridder, 2000; Heckman, Ichimura and Todd, 998; Rotnitzky and Robins, 995; Robins, Rotnitzky and Zhao, 995)). In this paper we make two contributions. First, we propose a new estimator. Our estimator is a modification of an estimator introduced in an influential paper by Hahn 998). Like Hahn, our estimator relies on consistent estimation of the two regression functions followed by averaging their difference over the empirical distribution of the covariates. Our estimator differs from Hahn s in that it directly estimates these two regression functions whereas Hahn first estimates the propensity score and the two conditional epectations of the product of the outcome and the indicators for being in the control and treatment group and then combines these to get estimates of the two regression functions. Thus our estimator completely avoids the need to estimate the propensity score. Our second and most important contribution is that we are eplicit about the choice of smoothing parameters. In our series estimation setting the smoothing parameter is the number of terms in the series. We provide a criterion for choosing this number of terms and show its asymptotic optimality in terms of epectedmean-squared-error. This criterion is related to, but differ from from the standard one for nonparametric estimation as in Li 987) and Andrews 99) in that it focuses eplicitly on optimal estimation of the average treatment effect rather than on optimal estimation of the entire unknown function. In the net section we discuss the basic set up and introduce the new estimator. In Section 3 we analyze the asymptotic properties of this estimator. In Section 4 we propose a method for choosing the number of terms in the series. 2 The Basic Framework The basic framework is standard in this literature e.g Rosenbaum and Rubin, 983; Hahn, 998; Heckman, Ichimura and Todd, 998; Hirano, Imbens and Ridder, 2003) We have a random sample of size from a large population. For each unit i in the sample, let W i indicate whether the treatment of interest was received, with W i = if unit i receives the treatment of interest, and W i = 0 if unit i receives the control treatment. Using the potential outcome notation popularized by Rubin 974), let Y i 0) denote the outcome for unit i under control and Y i ) Independently Chen, Hong, and Tarozzi 2004) have established the efficiency of their CEP-GMM estimator that is similar to our new estimator. []

3 the outcome under treatment. We observe W i and Y i, where Y i Y i W i ) = W i Y i ) W i ) Y i 0). In addition, we observe a vector of pre-treatment variables, or covariates, denoted by X i. We shall focus on the population average treatment effect: τ E[Y ) Y 0)]. Similar results can be obtained for the average effect for the treated: τ t E[Y ) Y 0) W = ]. The central problem of evaluation research e.g., Holland, 986) is that for unit i we observe Y i 0) or Y i ), but never both. Without further restrictions, the treatment effects are not consistently estimable. To solve the identification problem, we maintain throughout the paper the unconfoundedness assumption Rubin, 978; Rosenbaum and Rubin, 983), which asserts that conditional on the pre-treatment variables, the treatment indicator is independent of the potential outcomes. This assumption is closely related to selection on observables assumptions e.g., Barnow, Cain and Goldberger, 980; Heckman and Robb, 984). Formally: Assumption 2. Unconfoundedness) W Y 0), Y )) X. 2.) Let the propensity score be the probability of selection into the treatment group: e) PrW = X = ) = E[W X = ], 2.2) Assumption 2.2 Overlap) The propensity score is bounded away from zero and one. Define the average effect conditional on pre-treatment variables: τ) E[Y ) Y 0) X = ] ote that τ) is estimable under the unconfoundedness assumption, because E[Y ) Y 0) X = ] = E[Y ) W =, X = ] E[Y 0) W = 0, X = ] = E[Y W =, X = ] E[Y W = 0, X = ]. The population average treatment effect can be obtained by averaging the τ) over the distribution of X: τ = E[τX)], and therefore the average treatment effect is identified. [2]

4 3 Efficient Estimation In this section we review two efficient estimators previously proposed in the literature. We then discuss two new estimators, which will facilitate the mean-squared error calculations. 3. The Hahn Estimator Hahn 998) studies the same model as in the current paper. He calculates the efficiency bound, and proposes an efficient estimator. His estimator also imputes the potential outcomes given covariates, followed by averaging the difference in the estimated regression functions. The difference with our estimator is that Hahn first estimates nonparametrically the three conditional epectations E[Y W X], E[Y W ) X], and ex) = E[W X]. and then uses these conditional epectations to estimate the two regression function µ 0 ) = E[Y 0) X = ] and µ ) = E[Y ) X = ] as ˆµ 0 ) = Ê[Y W ) X = ], and ˆµ ) = êx) Ê[Y W X = ]. êx) The average treatment effect is then estimated as ˆτ h = ˆµ X i ) ˆµ 0 X i )). Hahn shows that under regularity conditions this estimator is consistent, asymptotically normally distributed, and that it reaches the semiparametric efficiency bound. 3.2 The Hirano-Imbens-Ridder Estimator Hirano, Imbens and Ridder 2003) also study the same set up. They propose using a weighting estimator with the weights based on the estimated propensity score: ˆτ hir, = Wi Y i êx i ) W ) i. êx i ) They show that under regularity conditions, and with ê) a nonparametric estimator for the propensity score, this estimator is consistent, asymptotically normally distributed and efficienty. It will be useful to consider a slight modification of this estimator. Consider the weights for the treated observations, /êx i ). Summing up over all treated observations and dividing by we get i W i/êx i ))/. This is not necessarily equal to one. We may therefore wish to modify the weights to ensure they add up to one for the treated and control units. This leads to the following estimator: ˆτ hir,2 = Wi Y i êx i ) W i êx i ) ) / Wi êx i ) W ) i. êx i ) [3]

5 3.3 A ew Estimator The new estimator relies on estimating the unknown regression functions µ ) and µ 0 ) through nonparametric regression of Y on X for the two subpopulations indeed by the treatment status. It is a modification of Hahn s estimator. Like Hahn s estimator it estimates the average treatment effect as ˆτ inr = ˆµ X i ) ˆµ 0 X i )). The difference with Hahn s estimator is that the two regression functions are estimated directly without first estimating the propensity score and the two conditional epectations E[Y W X] and E[Y W ) X]. This implies that we only need to estimate two regression functions nonparametrically rather than three and this will make the optimal choice of smoothing parameters easier. ote that this estimator still requires more unknown regression functions to be estimated than the HIR estimator which requires only estimation of the propensity score, but since the new estimator relies on estimation of different functions, it is not clear which is to be preferred in practice. To provide additional insight into the structure of the problem, we introduce one final estimator, which combines features of the Hirano-Imbens-Ridder estimators and the new estimator: ˆτ mod = Wi ˆµ X i ) êx i ) W i) ˆµ 0 X i ) êx i ) ) / Wi êx i ) W ) i. êx i ) In order to implement these estimators we need estimators for the two regression functions. Following ewey 995), we use series estimators for the two regression functions µ w ), with K terms. As the basis we use power series. Let λ = λ,..., λ k ) be a multi-inde of dimension k, that is, a k-dimensional vector of non-negative integers, with λ = k λ i, and let λ = λ... λ k k. Consider a series {λr)} r= containing all distinct such vectors such that λr) is nondecreasing. Let p r ) = λr), where p K ) = p ),..., p K )). The nonparametric series estimator of the regression function µ w ), given K terms in the series, is given by: ˆµ w ) = p K ) p K X i )p K X i ) p K X i )Y i, W i =w W i =w where A denotes a generalized inverse of A. Given the two estimated regression functions we estimate the average treatment effect as ˆτ = ) ˆµ X i ) ˆµ 0 X i ). For the propensity score we use the series logit estimator Hirano, Imbens and Rubin, 2003) Let Lz) = epz)/ epz)) be the logistic cdf. The series logit estimator of the population [4]

6 propensity score e) is ê M ) = Lp M ) ˆπ M ), for simplicity we use the same series p M ), although this is not essential) where for ˆπ M = arg ma π L,Mπ), 3.3) L,M π) = Wi ln Lp M X i ) π) W i ) ln Lp M X i ) π)) ). 3.4) 3.4 First Order Equivalence of Estimators We make the following assumptions. Assumption 3. Distribution of Covariates) X X R d, where X is the Cartesian product of intervals [ jl, ju ], j =,..., d, with jl < ju. The density of X is bounded away from zero on X. Assumption 3.2 Propensity Score) i) The propensity score is bounded away from zero and one. ii) The propensity score is s times continuously differentiable. Assumption 3.3 Conditional Outcome Distributions) i) The two regression functions µ w ) are t times continuously differentiable. ii) the conditional variance of Y i w) given X i = is bounded by σw. 2 Assumption 3.4 Rates for Series Estimators) i) K = ν, with < ν <, ii) L = ξ, with < ξ <. The properties of the estimators will follow from the following lemma: Theorem 3. Asymptotic Equivalence of ˆβ inr, ˆβ mod and ˆβ hir ) Suppose Assumptions hold. Then i), ˆτinr ˆτ mod ) = o p ), ii), ˆτmod ˆτ hir, ) = o p ). iii), ˆτhir, ˆτ hir,2 ) = o p ). [5]

7 and iv), ˆτh ˆτ hir, ) = o p ). Proof: See Appendi. 4 A feasible MSE criterion All three estimators contain a function or functions that are estimated nonparametrically. For the Hahn estimator we need nonparametric estimates of the propensity score and the conditional epectation of the product of the outcome and the treatment indicator and of the outcome and the control indicator given the covariates. For the HIR weighting estimator an estimator of the propensity score is needed, and for the imputation estimator introduced in section 3.3 we need estimators for the conditional means in the treatment and control populations. Both Hahn and HIR use series estimators for either the propensity score or the conditional epectations. That leaves the question how to select the order of the series. For a meaningful comparison of the performance of these asymptotically efficient estimators such a selection rule is essential. Despite its practical importance there has been little work on the selection of the nonparametric estimators. The only paper that we are aware of is Ichimura and Linton 2003) who consider bandwidth selection if the propensity score in the weighting estimator is estimated by a kernel nonparametric regression estimator. The current practice in propensity score matching, which is a nonparametric estimator that is different from the estimators considered in section 3, is that the propensity score is usually selected using the balancing score property W X ex) In practice this is implemented by stratifying the sample on the propensity score and testing whether the means of the covariates are the same for treatment and controls see e.g. Dehejia and Wahba 2000)). This method of selecting the model for the propensity score focuses eclusively on the bias in the estimation of the treatment effect. This could lead to overspecification of the propensity score and inflation of the variance of the estimator of the treatment effect. We consider both the bias and the variance associated with a choice of the nonparametric function in the treatment effect estimator. Initially we consider the missing data problem in which we observe X for all observations, but Y only if an observation indicator D =. If D = 0 we observe X but not Y. We refer to this type of data as one-sample data. Alternatively, we could have two samples. The first sample is from the marginal population distribution of X. The second sample is a random sample from the subpopulation Y, X D =. This type of data is referred as the two-sample data. We develop the theory for two-sample data, but it turns out that the results are identical for the one-sample case. [6]

8 Throughout we assume that the Y is Missing At Random MAR), i.e. Y D X The notation in this section will differ from that used in section 3 to reflect that we consider the missing data instead of the treatment effect problem. In particular, we use p) for the propensity score, and r kk ) for the orthonormal polynomials that are used in the estimation of µ). 4. The MSE and its estimator As in Li 987) we consider a population in which the joint distribution of Y, X is such that Y = µx) U with EU X) = 0 and VarU X) = σ 2. Andrews 99) has generalized Li s results to the heteroskedastic case and we could do the same. To concentrate on essentials first we maintain the assumption that U is homoskedastic. Initially we assume that we have two independent samples. Sample is a sample of size from the marginal distribution of X. We denote this sample by X i, i =,...,. Sample 2 is a random sample of size from the distribution of Y, X D =. We denote this sample by Y i, X i, i =,...,. The imputation estimator for the parameter of interest µ Y = EY ) is with ˆτ IR = ˆµ Y K = ˆµ K X i ) ˆµ K ) = r K ) R KR K ) R Ky The subscript K is the number of base functions used in the estimation of µ), and we use the notation y = Y. Y µ = µx ). µx ) u = U. U and r K ) = r K ). r KK ) R K = r K X ). r K X ) R K = r K X ). r K X ) The parameter of interest is µ Y = EY ). The MSE is obtained from ˆµY K µ Y ) = ˆµ K X [ i ) E Y X ˆµ K X ]) i ) [ E Y X ˆµ K X ] i ) µ X ) i ) [7]

9 µ X ) i ) µ Y 4.5) We can treat X i and X i as constants. Therefore, it is reasonable to redefine the parameter of interest as µ Y = µ X i ) If we do so, the final term in the comparison can be omitted, and we need only consider the first two terms. This parameter is the missing data analogue of the sample treatment effect of Imbens 2003). We now consider the first two terms separately. The first corresponds to the variance and the second to the bias term in the MSE. The first term can be written as V = ˆµ K X [ i ) E Y X ˆµ K X ]) i ) = ι RK R KR K ) R Ku If we treat the covariates as constants, it is easily seen that the epected value of the crossproduct of the bias and variance term is 0. Hence we can deal with the bias and variance terms separately. The variance term can be epressed as EV 2 ) = σ 2 2 Because f) = ι RK R KR K ) R K ι = σ 2 ι RK ) R ) K R K ι RK p f D = ) 4.6) p) with p = PrD = ), the variance term can be computed from the observations in the subpopulation X D = if we use the weights p px), so that with EV 2 ) = p 2 σ2 a M K a a = px ). px ) M K = R K R KR K ) R K The bias term is [ E Y X ˆµ K X ] i ) µ X ) i ) = r K X i ) R KR K ) R Kµ µ X ) i ) [8] )

10 Again we average over the distribution of X D = using the weights term is written as B = p rk X i ) R px i ) KR K ) R Kµ µx i ) ) = pa A K µ p px). Hence, the bias with A K = I M K. ote that a A K µ is the covariance of the residuals of the regressions of µ on R K and a on R K, respectively. By Lorentz 986), Theorem 8, p. 90, we have that a A K µ = O K sµ/d sp/d) so that the squared bias term is of order O K 2sµ/d 2sp/d) with s µ the number of continuous derivatives of µ), s p the number of continuous derivatives of p), and d the dimension of X. If we add the variance and squared bias terms we obtain the population MSE E K) = p2 σ 2 a M K a a A K µ) 2) 4.7) To use the MSE we need to estimate the bias term. This can be done in several ways 2. Let e K be the vector of residuals of the regression of y on R K, i.e. e K = A K y. Using so that Hence, a e K = a A K µ a A K u a e K ) 2 = a A K µ) 2 a A K uu A K a 2a A K µa A K u E [ a e K ) 2] = a A K µ) 2 σ 2 a A K a so that the bias term is estimated by ote that and a e K ) 2 σ 2 a A K a a A K µ) 2 = O 2 K 2sµ/d 2sp/d ) σ 2 a A K a = OK 2sp/d ) Upon substitution of the estimate we obtain the estimated MSE C K) = p2 2σ 2 a M K a σ 2 a a a e K ) 2) 4.8) 2 ote that a A K ˆµ K = 0, so that this obvious estimator cannot be used. [9]

11 Both the population MSE and its estimator are proportional to p 2. Hence the value of K that minimizes the estimated) MSE is independent of the fraction of observations for which we observe Y. In the sequel we omit p 2 and redefine E K) = σ 2 a M K a a A K µ) 2) C K) = 2σ 2 a M K a σ 2 a a a e K ) 2) 4.2 The population MSE If, the variance term converges to [ σ 2 E r K X) ] Σ K [r E kk X) ] with Σ K the probability limit of R K R K. The variance term simplifies if we choose the base functions r kk as orthonormal polynomials with respect to the density of X D =, so that E [r jk X)r jk X) D = ] = 0, j k E [ r kk X) 2 D = ] = For this choice the variance term converges to [ σ 2 E r K X) ] [ E r K X) ] = σ 2 K k= [ E r kk X) ]) 2 Because the constant function is one of the orthonormal polynomials, we have that for all non-constant base functions E [r kk X)] = 0 Using 4.6) [ E r kk X) ] [ ] p = E px) r kkx) which is proportional to the covariance of px) and r kkx). If the inverse of the probability that Y is observed is smooth in the sense that it can be well approimated by a linear combination of a relatively small number of base functions, then the variance will not increase much if K eceeds the number of base functions needed. As an eample take p) = d c γ K 0 r K0 ) [0]

12 With bounded support of X we can always find a c so that p) is nonnegative. For this choice of p) we have [ ] p E px) r kkx) = pγ k E [ r kk X) 2] = pγ k if the base functions are orthonormal. Hence for large, the variance term is proportional to p 2 K k= γ2 k. Hence the variance term increases with K for K K 0, and is constant for K > K 0. In this eample the bias decreases with K and is 0 for K K 0 for all. Because the bias term increases with we find that the value of K that minimizes the MSE is bounded by K 0 if is large. In this eample a is orthogonal to the residuals e K0, and a A K0 a = 0, so that the estimate of the bias is 0 as is the bias) for K K 0. As we will see in section 4.4 these same results apply approimately) if p) is not eactly a linear combination of base functions. Unless p) is very unsmooth, only a few base functions are needed to minimize the MSE. It is instructive to compare the population MSE with Li s 986) average squared error criterion. We obtain intuitive results if we assume that K0 p) = K 0 γ k r k ) µ) = δ k r k ) k= k= with r k ) orthogonal polynomials with respect to the density of X D =. Li s average squared error criterion is L K) = ˆµ K X i ) µx i )) 2 i.e. it is average squared deviation in the sample in which we observe both Y and X. The epected value of the variance term is σ 2 r K X i ) R KR K ) r K X i ) = σ 2 K The bias term is equal to µ µ µ R K R KR K ) R Kµ which converges to K 0 k=k δ2 k. Hence Li s criterion is E [L K)] = σ 2 K K 0 k=k Under these assumptions the bias in our criterion is p a µ ) a R K R KR K ) R Kµ δ 2 k 4.9) []

13 We have a µ = K0 ) K0 ) γ k r k X i ) δ l r l X i ) = k= l= K 0 γ k δ k r k X i ) 2 k= K 0 γ k δ l r k X i )r l X i ) k l Because the r k are orthonormal, the second term on the right hand side converges to 0, and the first converges to K 0 k= γ kδ k. Finally, because a R K converges to γ K I K and R K µ to I K δ K, the second term in the bias converges to K k= γ kδ k. Combining the results we find p 2 σ 2 K k= γ 2 k K0 k=k ) 2 γ k δ k 4.0) These two criteria are clearly different. For instance, if we take γ k = γ, then for fied, E [L K )] E [L K)] if and only if t 2 K with t k = δ k σ the asymptotic t-ratio for δ K. For our criterion we find that it decreases if and only if t K TK2 2 T K2 or t K TK2 2 T K2 with T K2 = K 0 k=k2 t k. 4.3 Optimality of minimizer of estimated MSE Define ˆK as ˆK = argmin K K C K) with K an inde set that grows with at rate κ. We will show that ˆK is optimal in the sense that Li 987)) E ˆK) inf K K E K) p 4.) This does not imply that the difference between ˆK and the minimizer of E K) converges to 0. We make the following assumptions Assumption 4. The smallest eigenvalue of E [r K X)r K X) D = ] is bounded from 0 for all K. [2]

14 Assumption 4.2 E [U m ] < for some integer m. Assumption 4.3 κmsp/d ) inf E K) m 2 K K with d the dimension of X. Theorem 4. If assumptions hold, then E ˆK) inf K K E K) Proof See Appendi. p 4.4 Simulation results The finite sample performance of our estimator is investigated in a number of sampling eperiments. The population model is with Y = µx) U µ) = 5 8 k= k X a scalar variable and U standard normal. We choose a simple logit model for p) p) = e.53 e.53 As base functions we choose the Legendre polynomials. These polynomials are defined on [, ] and are orthogonal with weight function equal to on this interval. For that reason we choose the marginal distribution of X such that X D = is a uniform distribution on [, ], i.e. the distribution of X has density f) = p 2 p) with p = 6 6 e.5 e 4.5 ) Of course, in practice it may not be feasible to choose polynomials that are orthogonal with a weight function that is equal to the distribution X D =. Table gives some statistics for the data generating process. The fraction with a missing Y is.429. The mean of both X and Y is larger in the subpopulation with observed Y. [3]

15 Table : Statistics sampling eperiment Mean Std dev Y X Y D = D.57 Table 2: Optimality of minimizer estimated GMM Fraction ˆK = K E ˆK) E K) = = = We consider three sample sizes = 00, 000, The number of repetitions is 000. The results are in Table 2 and the figures. The second column is in line with the asymptotic result in section 4.3. That result does not imply that the minimizer of the population and estimated population MSE are closer if the sample size increases. However, this is what we find in the eperiments. [4]

16 Figure : Population and estimated MSE, = 00 In the figures -3 we report the population MSE and its estimator. ote that with the simple logit model the MSE is flat after the inclusion of a few basis functions. To avoid scaling problems we omit some population and estimated MSE values for small K. [5]

17 Figure 2: Population and estimated MSE, = 000 [6]

18 Figure 3: Population and estimated MSE, = 0000 [7]

19 5 Appendi For matrices A we use the norm A = tra A)) /2. Lemma A. Properties of orm) For conformable matrices A and B, i), AB A B, ii), A BA B A A. If B is positive semi-definite and symmetric, then, for λ ma B) equal to the maimum eigenvalue of B, iii), tra BA) A 2 λ ma B), iv), AB A λ ma B), v), BA A λ ma B). Proof: Let A be of dimension K L, and B of dimension L M. First we prove i): AB is of dimension K M, with its k, m) element equal to L l= a klb lm. Hence the m, m) element of the M M matri B A AB is equal to K L ) 2, k= l= a klb lm and thus AB 2 = trb A AB) = M m= k= K L ) 2 a kl b lm l= M K L m= k= l= a 2 kl l= L b 2 lm = A 2 B 2, L ) 2 where the last inequality follows from the Cauchy-Schwartz inequality which implies that l= a klb lm L L l= a2 kl l= b2 lm. et, consider ii): A BA 2 = tra B AA BA) = trb AA BAA ) = trd E), where D = AA B and E = BAA. Let δ = trd E)/trE E), and F = D δe, so that trf E) = 0. Then trd E) = trδe F ) E) = δtre E), and trd D) = δ 2 tre E) trf F ), so that Thus trd E) 2 = δ 2 tre E) 2 δ 2 tre E) trf F )) tre E) = trd D) tre E). A BA 2 = trb AA BAA ) trb AA AA B)) /2 traa B BAA )) /2 = AA B BAA. By part i) of the lemma, AA B AA B, and BAA B AA, so that the conclusion follows. et, consider iii). Because B is symmetric L L, and positive definite, we have B = SΛS, where S S = I L, and Λ is diagonal with the eigenvalues of B on its diagonal, so that λ ma B) = λ ma Λ). Let C = A S, with c ij the i, j) element of A S. With A of dimension L K, C is of dimension K L. Then tra BA) = tra SΛS A) = trλs AA S) = trλc C) = L K λ j j= c 2 ij λ ma B) L j= K c 2 ij = λ ma B) trc C) = λ ma B) trs AA S) = λ ma B) traa SS ) = λ ma B) traa ) = λ ma B) tra A) = λ ma B) A 2. [8]

20 et, consider iv). Again write B = SΛS, with Λ diagonal and S S = I L. Then: AB 2 = trb A AB) = tra ABB ) = tra ASΛS SΛS ) = tra ASΛ 2 S ) λ 2 mab) trs A AS) = λ 2 mab) tra ASS ) = λ 2 mab) tra A) = λ 2 mab) A 2. Finally, v) can be proven the same way. The data consist of two random samples X i, Y i ), i =,...,, and Z i ), i =,..., M. Let µ) = E[Y X = ] be the conditional epectation of the outcome given the covariates. We use a series estimator for the regression function µ). Let K denote the number of terms in the series. As the basis we use power series. Let λ = λ,..., λ d ) be a multi-inde of dimension d, that is, a d-dimensional vector of non-negative integers, with λ = d k= λ k, and let λ = λ... λ d d. Consider a series {λr)} r= containing all distinct vectors such that λr) is nondecreasing. Let p r ) = λr), where P r ) = p ),..., p r )). Assumption A. X X, where X is the Cartesian product of intervals [ jl, ju ], j =,..., d, with jl < ju. The density of X is bounded away from zero on X. Assumption A.2 Z X, and the density of Z is bounded away from zero on X. Given Assumption A. the epectation Ω K = E[P K X)P K X) ] is nonsingular for all K. Hence we can construct a sequence R K ) = Ω /2 K P K) with E[R K X)R K X) ] = I K. Let R kk ) be the kth element of the vector R K ). It will be convenient to work with this sequence of basis function R K ). Define ζk) = sup R K ). X Lemma A.2 ewey, 994) ζk) = OK). Proof: see other notes) Let R K be the K matri with ith row equal to R K X i), and let ˆΩ K, = R K R K/. Lemma A.3 ewey, 997) ˆΩ K, I K = O p ζk)k /2 /2) Proof: We will show that E [ R KR K / I K 2] ζk) 2 K/, A.) so that the result follows by Markov s inequality. E [ R KR K / I K 2] = E [ tr R KR K R KR K / 2 2R KR )] K / I K = tr E [ R KR K R KR K / 2] 2E [R KR ) K /] I K = tr E [ R KR K R KR K / 2]) K [9]

21 K K = E [ R kk X i )R lk X i )R lk X j )R kk X j )/ 2] K. k= l= j= The terms with i j and k l have epectation zero. The terms with k = l and i j have epectation E[R kk X i ) 2 R kk X j ) 2 ] =. There are K ) of those terms, each divided by 2, leading to a total of K )/ = K K/. Hence K K E [ R kk X i )R lk X i )R lk X j )R kk X j )/ 2] K k= l= j= K k= l= = 2 K E [ R kk X i )R lk X i )R lk X i )R kk X i )/ 2] [ K E R kk X i ) 2 k= K l= R lk X i ) 2 ]. A.2) By definition, ζk) = sup R K ) = sup k R2 kk ))/2. Hence it follows that k R2 kk ) ζ2 K), and thus A.2) can be bounded by 2 [ K ] ζ 2 K) E R lk X i ) 2 = Kζ 2 K)/. l= This shows that A.) holds, and thus finishes the proof. Let U i = Y i µx i ). Let U, Y, and X be the vectors and d matri with ith row equal to U i, Y i, and X i. Furthermore, let be an indicator for the event λ min ˆΩ K, ) > /2. By Lemma A.3, p it follows that. For a symmetric matri A, we can write A as F ΛF, where F is symmetric and Λ diagonal. Let Λ /2 be the diagonal matri with each diagonal element equal to the reciprocal of the square root of the corresponding diagonal element of Λ, and let A /2 = F Λ /2, so that A /2 is symmetric and satisfies A /2 A /2 = A. Assumption A.3 sup VarY X) σ 2. X Lemma A.4 i), and ii), ˆΩ ˆΩ /2 K, R KU/ K, R KU/ Proof: First we prove i). [ E ˆΩ /2 K, R KU/ = O p K /2 /2 ), = O p K /2 /2 ), ] 2 X [ ] = E U R ˆΩ K K, R KU/ 2 X [20]

22 = E [ U R K R KR K ) R KU/ X ] = E [ tr U R K R KR K ) R KU ) X ] / [ = E tr R K R KR K ) R KUU ) ] X / ) = tr R K R KR K ) R KE [UU X] / σ 2 tr R K R KR K ) R K) / = σ 2 K/ σ 2 K/. Then by the Markov inequality ˆΩ /2 K, R K U/ = O pk /2 / /2 ). et, consider part ii). Using Lemma A.v), ˆΩ λ ma ˆΩ /2 K, ) ˆΩ /2 K, R KU/. K, R KU/ Since λ ma ˆΩ /2 K, ) = O p), the conclusion follows. This conditional epectation µ) is estimated by the series estimator ˆµ K ) = R K )ˆγ K with ˆγ K the least squares estimator. Formally R K R K may be singular, although Lemma A.3 shows that this happens with probability going to zero. To deal with this case we define ˆγ K as 0 K if λ min R K ˆγ K = R K/) /2, ) R KX i )R K X i) R A.3) KX i )Y i otherwise, where 0 K is a K-dimensional vector of zeros, and λ min A) is the minimum eigenvalue of the matri A. Define γk to be the pseudo true value defined as γk = arg min E [µx) R K X) γ) 2 ] W =, A.4) γ with the corresponding pseudo true value of the regression function denoted by µ K ) = R K) γ K. First we state some of the properties of the estimator for the regression function. In order to do so it is useful to first give some approimation properties for µ). Assumption A.4 µ) is s times continuously differentiable on X. Lemma A.5 Lorentz) Suppose Assumptions A.-A.2 hold. Then there is a sequence γk 0 sup µ) R K ) γk 0 = O K s/d). such that Proof: see notes) For the sequence γk 0 in Lemma A.5, define the corresponding sequence of regression functions, µ 0 K ) = R K) γk 0. [2]

23 Lemma A.6 Convergence Rate for Regression Function Estimators) Suppose Assumptions A.-A.2 hold. Then i): ii): iii): γ K γk 0 = O ζk)k s/d), A.5) sup µ K) µ 0 K) = Oζ 2 K)K s/d ). A.6) ˆγ K γ K = O p K /2 /2 K s/d ), A.7) iv): ˆγK γk 0 = Op K /2 /2 K s/d), A.8) v): sup ˆµ K ) µ K) = O p ζk)k /2 /2 ζk)k s/d ), A.9) and vi): sup ˆµ K ) µ) = O p ζk)k /2 /2 ζ 2 K)K s/d ), A.0) Proof: First, consider i): γ K = E[R K X)R KX)]) E[R K X)Y ] = E[R K X)R KX)]) E[R K X)µX)] = E[R K X)µX)], where we use both [R K X)U] = 0 and E[R K X)R K X) ] = I K. Also, γk 0 = E[R KX)R K X)γ0 K ], so that γ K γ 0 K = E[R K X)µX)] E[R K X)R K X) γ 0 K] = E[R K X)µX) R K X) γ 0 K)] sup et, consider ii): sup R K ) sup µx) R K X) γk 0 CζK)K s/d. µ K) µ 0 K) = sup R K ) γk γk) 0 sup R K ) γk γk 0 Cζ 2 K)K s/d, using the result in Lemma A.6i). et, consider iii): ˆγ K γ K = ˆγ K γ K ) ˆγ K γ K, [22]

24 since Pr = ). The second term is ) ˆγ K γ K = ) γ K = o p ). The first term is ˆγ K γk = ˆΩ K, R KY/) ˆΩ K, R KR K γk/) = ˆΩ K, R KU/) ˆΩ K, R KµX) R K γk)/) ˆΩ K, R KU/) ˆΩ K, R KµX) R K γ K)/). By Lemma A.4ii) the first term is O p K /2 /2 ). For the second term, we have, using Lemma A.v), and using the fact that if is equal to, then λ ma ˆΩ /2 K, ) 2, ˆΩ K, R KµX) R K γk)/) λ ma ˆΩ /2 /2 K, ) ˆΩ K, R KµX) R K γk)/) 2 2 µx) R Kγ K) R K ) /2 /2 /2 ˆΩ K, ˆΩ K, R KµX) R K γk) = 2 µx) R KγK) R K R KR K ) R KµX) R K γk) 2 ) /2 µx) R KγK) µx) R K γk) A.) where we use the fact that because R K R K R K) R K is a projection matri, it follows that I R K R K R K) R K is positive semi-definite. Since by the definition of γ K [ ] [ ] E µx) R KγK) µx) R K γk) E µx) R KγK) 0 µx) R K γk) 0 sup µ) R K ) γk 0 2 CK 2s/d, it follows by the Markov inequality that A.) is O p K s/d ). Hence ˆγ K γ K = O pk s/d ) o p ) O p K /2 /2 ) = O p K s/d K /2 /2 ). et, consider iv). We use the same argument as for iii): ˆγ K γk 0 = ˆΩ K, R KY/) ˆΩ K, R KR K γk/) 0 ˆΩ K, R KU/) ˆΩ K, R KµX) R Kγ 0 K)/). The first term is again O p K /2 /2 ). For the second term we now have: ˆΩ K, R KµX) R K γk)/) 0 λ ma ˆΩ /2 /2 K, ) ˆΩ K, R KµX) R K γk)/) 0 2 sup µ) R K ) γk 0 = OK s/d ). et, consider v): sup ˆµ K ) µ K) = sup R K ) ˆγ K γk) sup R K ) ˆγ K γk = O p ζk)k /2 /2 ζk)k s/d ). [23] ) /2

25 Finally, consider vi). sup ˆµ K ) µ) sup ˆµ K ) µ K) sup µ K) µ 0 K) sup µ 0 K) µ) = O p ζk)k /2 /2 ζk)k s/d ζ 2 K)K s/d K s/d ) = O p ζk)k /2 /2 ζ 2 K)K s/d ). The new imputation estimator developed in this paper is ˆβ inr = ˆµ K X i ) = R K X i ) ˆγ K A.2) It is useful to consider in this discussion three additional estimators that are based on the propensity score. In all cases we estimate the propensity score using a logistic series estimator, as suggested in Hirano, Imbens and Ridder 2003). Here we briefly summarize the relevant results. Let Lz) = epz)/ epz)) be the logistic cdf and L z) = Lz) Lz)). The series logit estimator of the population propensity score e ) is ê K ) = LR K ) ˆπ K ), for simplicity we use the same series R K ), although this is not essential) where for ˆπ K = arg ma π L π), A.3) L π) = W i ln LR K X i ) π) W i ) ln LR K X i ) π))). A.4) For and K fied we have ˆπ K p π K, with π K the pseudo true value: π K = arg ma π E [e X) ln LR K X) π) e X)) ln LR K X) π))]. A.5) We also define the pseudo true propensity score: e K ) = LR K) π K ). Assumption A.5 e) is s times continuously differentiable on X. Lemma A.7 Suppose Assumptions A., A.3 hold. Then there is a sequence πl 0 sup e) LR L ) πk) 0 = O L s/d). such that Assumption A.6 inf e) > 0 and sup e) <. Lemma A.8 Convergence Rate for Propensity Score Estimators) Suppose Assumptions??, A., and A.4 hold. Then i): ˆπ L π L = O p L /2 /2 ), A.6) ii): π L πl 0 = O L s/2d)), A.7) iii): sup ê L ) e L) = O p ζl)l /2 /2). A.8) [24]

26 iv): v): vi), sup e L ) e 0 L) = O ζl)l s/2d)). A.9) sup e 0 L) e) = O p L s/d). A.20) ˆπ L π 0 L = O p L /2 /2 L s/2d) ), A.2) vii), and viii), sup ê L ) e 0 L) = O p ζl)l /2 /2 L s2d) )), A.22) sup ê L ) e) = O p ζl)l /2 /2 L s2d) )). A.23) Proof: See Hirano, Imbens and Ridder 2003). The second estimator is a modified imputation estimator that only averages over the observations with W i = : ˆβ mod = W i ˆµ K X i ). A.24) ê L X i ) The third estimator is the weighting estimator proposed by Hirano, Imbens and Ridder 2003): ˆβ hir, = W i Y i ê L X i ). A.25) Hirano, Imbens and Ridder 2003) show that ˆβ hir, is consistent, asymptotically normal and efficient. The fourth estimator is a modified version of the Hirano-Imbens-Ridder estimator where the weights are normalized to add up to unity: ˆβ hir,2 = / W i Y i W j ê L X i ) ê j= L X j ). The properties of ˆβ inr will follow from the following lemma: Proof of Theorem 3.: First we prove i). We have ˆβinr ˆβ mod = ˆµ K X i ) W i ˆµ K X i ) ê L X i ) = = ˆµK X i ) ê L X i ) ê L X i ) ˆµK X i ) ê L X i ) ê L X i ) W ) i ˆµ K X i ) ê L X i ) W i ˆµ K X i ) ê L X i ) A.26) A.27) [25]

27 ˆµ KX i ) ê L X i ) ˆµ KX i ) ê L X i ) ˆµ KX i ) W i e L X i ) e L X i ) e 0 L X ˆµ KX i ) W i i) µ0 K X i) ê L X i ) µ0 K X i) µ0 K X i) µ0 K X i) µ0 K X i) W i µ0 K X i) ê L X i ) µ0 K X i) µ0 K X i) µ0 K X i) µ0 K X i) W i µx i) ê L X i ) ˆµK X = i ) ê L X i ) ê L X i ) ˆµ KX i ) ê L X i ) ˆµ KX i ) ˆµ KX i ) µx i) ê L X i ) ˆµ KX i ) ˆµ KX i ) ˆµ0 K X i) ˆµ KX i ) W i µ0 K X i) W i µx i) W i ˆµ KX i ) ê L X i ) µ0 K X i) ê L X i ) µ0 K X i) e L X i ) µ KX i ) ˆµ KX i ) ˆµ KX i ) ˆµ0 K X i) ˆµ KX i ) W i µ0 K X i) W i µx i) W i W i ˆµ K X i ) ê 0 L X i) ˆµ KX i ) ˆµ KX i ) ˆµ KX i ) ) ˆµ KX i ) W i µ0 K X i) µ0 K X i) µ0 K X i) ˆµ KX i ) W i ˆµ KX i ) µ KX i ) W i µ0 K X i) ˆµ KX i ) W i ˆµ KX i ) W i µ0 K X i) W i µ0 K X i) W i µ0 K X i) ê 0 L X i) e 0 L X µx i) ê L X i ) µ0 K X i) W i i) e 0 L X µx i) W i i) µx i) ê L X i ) µx ) i) W i ˆµK X i ) ê L X i ) ˆµ0 K X ) i) e 0 L X ê L X i ) W i ) i) ˆµ K X i ) µ 0 K X i) ê L X i ) e 0 e L X i ) LX i ) ˆµ K X i ) µ 0 K X i) e 0 L X e 0 LX i ) ) i) ˆµK X i ) µ 0 K X i) ˆµ KX i ) µ 0 K X ) i) W i ) e L X i ) A.28) A.29) A.30) A.3) [26]

28 ˆµ K X i ) µ 0 K X i) W i ) µk X i ) µx ) i) ê L X i ) W i ) µx i ) ê LX i ) W i ). A.32) A.33) A.34) We will deal with equations A.28)-A.34) separately. First consider A.28). This itself will be broken up into seven parts. RK X i ) ˆγ K LR L X i ) ˆπ L ) R KX i ) ˆγ ) K LR L X i ) πl 0 ) ê L X i ) W i ) = = ˆµK X i ) ê L X i ) ˆµ ) KX i ) ) µ K X i ) ê L X i ) ) ˆµ K X i ) µ K X i )) ê L X i ) e L X i ) ê L X i ) W i ) ê L X i ) W i ) ê L X i ) W i ) ê L X i ) W i ) ê L X i ) e L X i )) ê L X i ) W i ) )) êl X i ) e L X i ) µ K X i ) e 2 L X i) ê L X i ) e L X i ) µk X i ) e 2 L X i) µx ) i) e 2 X i ) µx i ) ) e 2 X i ) ê L X i ) e L X i ) )R LX i )ˆπ L πl) 0 µx i ) ) e 2 X i ) µx i ) ) e 2 X i ) µx i ) ) e 2 X i ) R LX i )ˆπ L πl) 0 ê L X i ) e L X i )) R LX i )ˆπ L πl) 0 e L X i ) ) R LX i )ˆπ L πl) 0 W i ) A.35) A.36) A.37) ê L X i ) W i ) A.38) A.39) A.40) A.4) First consider A.35). Since inf e) > c for some c > 0, it follows that for large enough, inf e 0 L ) > c/2, and therefore for large enough we have inf e L ) > c/4. Thus for large enough, with arbitrarily high probability, inf ê L ) > c/8. Thus, since sup ê L ) e 0 L ) = O pζl)l s/2d) ζl)l /2 /2 ), it follows that ê L X i ) e L X i ) = O pζl)l s/2d) ζl)l /2 /2 ). [27]

29 Thus ) ˆµ K X i ) µ K X i )) ê L X i ) e L X i ) /2 sup ˆµ K ) µ K ) sup ê L X i ) W i ) ê L X i ) e L X i ) 2 = O p /2 ζk)k s/d ζk)k /2 /2 ) ζl)l s/2d) ζl)l /2 /2 ). et, consider A.36). ) µ K X i ) ê L X i ) êlx ) i ) e L X i ) e L X i ) e 2 L X ê L X i ) W i ) i) = = /2 sup µ K X i ) ) el X i ) ê L X i ) êlx ) i ) e L X i ) e L X i )ê L X i ) e 2 L X i) µ K X i ) e L X i ) e LX i ) ê L X i )) ê L X i ) ê L ) e 0 L) sup = O p /2 ζk)k s/d ζk)k /2 / /2 ) 2 ). ê L X i ) W i ) ) ê L X i ) W i ) e L X i ) ê L X i ) e L X i ) 2 et, consider A.37). µk X i ) e 2 L X i) µx ) i) e 2 ê L X i ) e L X i )) ê L X i ) W i ) X i ) /2 sup µk X i ) e 2 L X i) µx ) i) e 2 L X ê L X i ) e L X i )) ê L X i ) W i ) i) µxi ) e 2 L X i) µx ) i) e 2 ê L X i ) e L X i )) ê L X i ) W i ) X i ) µ K X i ) e 2 L X i) µx i) e 2 L X i) sup ê L ) e L ) 2 µx i ) e 2 L X i) µx i) e 2 X i ) sup ê L ) e L ) 2 /2 sup = O p /2 ζ 2 K)K s/d ζl)l /2 /2 ζl)l s/2d) )) O p /2 ζl)l s/2d) ζl)l /2 /2 ζl)l s/2d) )). et, consider A.38). Using a mean value theorem, we can write ê L X i ) e 0 LX i ) = LR L X i )ˆπ L ) LR L X i )π 0 L) = LR L X i ) π L ) LR L X i ) π L ))R L X i ) ˆπ L π 0 L), [28]

30 for some intermediate value π L. Since π L is between ˆπ L and πl 0, it follows that and Hence = π L π 0 L = LO p L /2 /2 ), sup LR L ) π L ) LR L )πl) 0 = O p ζl)l /2 /2 ). /2 sup Since µx i ) e 2 X i ) ê L X i ) e L X i ) ))R L X i )ˆπ L πl) 0 ) ê L X i ) W i ) µx i ) e 2 X i ) LR K X i ) π L ) LR K X i ) π L ))) ))R L X i )ˆπ L πl) 0 ) ê L X i ) W i ) µx i ) e 2 X i ) sup LR K X i ) π L ) LR K X i ) π L ))) )) sup R L ) ˆπ L πl 2 0 = O p /2 ζl)l /2 /2 ζl)ζl)l /2 /2 ). et, consider A.39). µx i ) ) e 2 R L X i )ˆπ L π 0 X i ) L) ê L X i ) e L X i )) /2 sup µ)e) e)) e 2 ) sup R K ) ˆπ L πl 0 sup ê L ) e 0 L) = O p /2 ζl)l /2 /2 L s/2d) )ζl)l /2 /2 ζl)l s/2d) )). et, consider A.40). µx i ) ) e 2 R L X i )ˆπ L π 0 X i ) L) e L X i ) ) /2 sup µ)e) e)) e 2 ) sup = O p /2 ζl)l /2 /2 L s/2d) )L s/d ). R K ) ˆπ L πl 0 sup e 0 L) e) et, consider A.4). µx i ) ) e 2 R L X i )ˆπ L π 0 X i ) L) W i ) E ˆπ L πl 0 µx i ) ) e 2 X i ) µx i ) ) e 2 X i ) 2 R L X i ) W i ) [29] R L X i ) W i ).

31 sup µ)e) e)) e 2 E[R L X i ) R L X i )] = OL), ) it follows that A.4) is O p LL /2 /2 L s/2d) ). Epression A.29) is bounded by ˆγ K γ K sup R K ) /2 sup e L ) sup ê L ) e L ) = O p /2 ζk)) O p ζk)) /2 O p /2 ξ 2 L)) = O p /2 ξ 2 K)ξ 2 L)). ote that inf e L ) > sup e)/2 for L large enough, so that sup /e L )) = O).) Epression A.30) is bounded by ˆγ K γ K sup R K ) /2 sup e L ) sup e L ) e) = O p /2 ζk)) O p ζk)) /2 O p L t/d ) = O p L t/d ξ 2 K)). Epression A.3) is bounded by ˆγ K γ K sup R K ) /2 sup e L ) e) = O p /2 ζk)) O p ζk)) /2 O p L t/d ) = O p L t/d ξ 2 K)). Epression A.32) is bounded by R ˆγ K γ K K X i ) ex i ) W i ) The first factor is O p /2 ζk)). Because E R K X i ) 2 W i ) C ξ 2 K), it follows that the second factor is O p ζk)). Thus A.32) is O p /2 ξ 2 K)). et, consider A.33): µk X i ) e L X i ) µx ) i) êx i ) W i ) µ = K X i ) µx i ) e L X i ) êx i ) W i ) e L X i ) µk X i ) µx i ) e L X i ) e 2 µ ) KX i ) µx i ) e L X i ) êx i ) W i ) X i ) e L X i ) µ K X i ) µx i ) e L X i ) e 2 êx i ) W i ) X i ) sup e L ) e)) µ K X i ) µx i ) e L X i ) [30] êx i ) W i )

32 µ K X i ) µx i ) e 2 X i ) µx i ) µx i ) e L X i ) e 2 X i ) = O p L t/d ) O p K s/d ) O p L t/d ). êx i ) W i ) êx i ) W i ) so that epression A.33) is O p L t/d K s/d ). et, consider A.34). Because e) is bounded away from 0 on X, X is a compact subset of R d, and µ) and e) are s and t times continuously differentiable, it follows that µx) ex) is mins, t) times continuously differentiable, has a finite second moment, and the projection of this random variable on R L X) eists. Define [ µx) ) ] 2 δ L = arg min E δ ex) R LX) δ A.42) By Lorentz ), the uniform approimation error is sup µ) e) R ) L) δ L = O L mins,t) d. A.43) X Hence, µx i ) ê LX i ) W i ) = ) µxi ) R LX i ) δ L ê L X i ) W i ) ) µxi ) R LX i ) δ L ê L X i ) W i ), A.44) R L X i ) δ L ê L X i ) W i ) A.45) because the second term vanishes as a result of the first order conditions for the series logit estimator which imply that i R LX i )êx i ) W i ) = 0. The last epression, A.45) can be bounded by sup µ) e) R L) δ L êx i ) W i 2 sup µ) e) R L) δ L = O X X L mins,t) d because of A.43). This proves that A.34) is O L mins,t)/d). Thus combined with ), this finishes the proof of the first assertion in Theorem 3.. et, consider part ii) of Theorem 3.. By the triangle inequality we have ), ˆβmod ˆβ hir ) = W i Y i µx i )) W i Y i ˆµ K X i )) ê L X i ) W i µx i ) ˆµ K X i )) ê L X i ) ) A.46) A.47) [3]

33 W i Y i µ K X i )) ) ) W i Y i µ K X i )) ê L X i ). A.48) A.49) First consider A.46). Because e) is s times continuously differentiable and bounded away from zero, it follows that there is a δ K such that for some finite C we can approimate /e) by R K )δ, with sup e) R K)δ K CK s/d. By the first order conditions for ˆγ K it follows that W i Y i ˆµ K X i ))R K X i ) = 0. Hence W i Y i ˆµ K X i )) ) W i Y i ˆµ K X i )) ex i ) R KX i )δ K W i Y i ˆµ K X i )) R K X i )δ K ) W i Y i ˆµ K X i )) ex i ) R KX i )δ K ) W i Y i µx i )) ex i ) R KX i )δ K ) W i µx i ) ˆµ K X i )) R KX i )δ K /2 /2 sup W i Y i µx i )) sup µ) ˆµ K ) sup e) R K)δ K e) R K)δ K = O p /2 K s/d ) O p /2 ζk)k /2 / /2 ζ 2 K)K s/d )K s/d ) = O p /2 K s/d ). et, consider A.47). W i µx i ) ˆµ K X i )) ê L X i ) ) sup µ) ˆµ K ) sup ê L ) e) [32]

34 = O p /2 ζk)k /2 / /2 ζ 2 K)K s/d )ζl)l /2 /2 L s2d) ))). et, consider A.48). W i Y i µx i )) ) ) W i Y i µx i )) = O p /2 L s/d ). sup Finally, consider A.49). ) W i Y i µx i )) ê L X i ) = W i Y i µx i )) êlx i ) ê L X i ) êl X i ) e 0 L W i Y i µx i )) X i) ê L X i )e 0 L X êlx i ) e 0 L X ) i) i) êl X i ) e 0 L W i Y i µx i )) X i) W i Y i µx i )) ex i) ex i) R L X i )ˆπ L πl) 0. First, consider A.50). êl X i ) e 0 L W i Y i µx i )) X i) ê L X i )e 0 L X êlx i ) e 0 L X ) i) i) A.50) R L X i )ˆπ L π 0 L)) A.5) A.52) ) /2 W i Y i µx i ) sup êl ) e 0 L) sup e ) L ) ê L) e 0 L ) e0 L ) = O p /2 ζl)l /2 /2 L s/2d) ))ζl)l /2 /2 L s/2d) )). et, consider A.5). ote that ê L ) e 0 L) = LR L )ˆπ L ) LR L )π 0 L) = LR L ) π L ) LR L ) π L ))R L )ˆπ L π 0 L), with π in between ˆπ and π 0 L. Hence π π0 L = O pl /2 /2 L s/2d) ), and sup LR L ) π L ) e) = O p ζl)l /2 /2 L s/2d) )). Thus sup ê L ) e 0 L) e) e))r L )ˆπ L π 0 L) = O p ζl)l /2 /2 L s/2d) )) sup R L ) ˆπ L πl) 0 = O p ζ 2 L)L /2 /2 L s/2d) )L /2 /2 L s/2d) )). [33]

35 Then we have êl X i ) e 0 L W i Y i µx i )) X i) êl X i ) e 0 L W i Y i µx i )) X i) ex i) R L X i )ˆπ L π 0 L)) ex i) R L X i )ˆπ L π 0 L)) êl X i ) e 0 L W i Y i µx i )) X i) e 0 L X êlx i ) e 0 L X ) i) i). The first term is O p ζ 2 L)L /2 /2 L s/2d) )L /2 /2 L s/2d) )). The second term is bounded by W i Y i µx i )) sup êl X i ) e 0 LX i ) sup = O p /2 ζl)l /2 / /2 L s/2d) )ζl)l /2 / /2 L s/2d) )) Finally, consider A.52). W i Y i µx i )) ex i) R L X i )ˆπ L π 0 L). ˆπ L πl 0 W i Y i µx i )) ex i) R L X i ). The first factor is O p ζl)l /2 /2 L s/2d) ). The epectation of the square of the second factor is [ E = E [ W i Y i µx i )) ex i) T 2 Y µx)) 2 ex) ex) ] R L X i ). ) 2 R L X) R L X)] ) 2 e) sup σy 2 ) sup E[R L X) R L X)] = OL). e) Hence A.52) is O p LζL)L /2 /2 L s/2d) ). This finishes the proof of part ii) of the Theorem. Finally, consider part iii) of the Theorem. First we prove /2 / ) W i = o p ). A.53) ê L X i ) / /2 W i ) ê L X i ) = W i ê L X i ) W i ê L X i ) ê L X i ) A.54) [34]

36 ê L X i )) êx i ) ) W i )) ) ) W i )) ê L X i ). A.55) A.56) A.57) We will deal with A.54)-A.57) one by one. First consider A.54). There is a sequence of δ L such that for some finite C we have sup e) R L) δ L CL s/d. Then W i ê L X i ) W i ê L X i )) R L X i ) δ W i ê L X i )) R L X i ) δ L ). Because of the first order conditions for ˆπ L the first term vanishes. Then The second term is bounded by C /2 sup R L) δ L e) = O /2 L s/d ) = o p ). et, consider A.55). ê L X i )) êx i ) ) /2 sup e) ê L ) sup e) ê L ) = O p /2 ζl)l /2 /2 L s/d )ζl)l /2 /2 L s/d )) = o p ). et, consider A.56). W i )) ) 2 sup e 0 L ) e) = O p /2 L s/d ) = o p ). Finally, consider A.57). ) W i )) ê L X i ) = W i )) e0 L X i) ê L X i ) ê L X i ) [35]

37 W i ) e 0 LX i ) ê L X i ) ) e 0 W i ) L X i ) ê L X i ) e 2 X i ) W i ) ex i) ê L X i ) e 2 X i ) ex i) R L X i ) ˆπ L πl) 0 ) R L X i ) ˆπ L π 0 L)) The first of these terms is of order O p /2 ζ 2 L)L /2 /2 L s/2d) ) 2 ) = o p ). The second term is of order5 O P ) = o p ). For the third term, note that E W i ) ex 2 i) R L X i ) so that [ ] ex) 3 = E R L X) R L X) CL ex) W i ) ex i) R L X i ) ˆπ L πl) 0 CL /2 ˆπ L π 0 L = O p L /2 L /2 /2 L s/2d) )) = o p ). This finishes the proof of A.53). et, define C = W i Y i ê L X i ), so that C = o p ), and ˆβ hir, ˆβ hir,2 = Hence ˆβhir, ˆβ hir,2 W i Y i ê L X i ) C ). W i Y i ê L X i ) C C = O p) o p ) = o p ). Lemma A.9 If then E K) C K) p sup 0 K K E K) E ˆK) p inf K K E K) [36]

38 Proof: Define K = argmin K K E K) For large enough we have that for any δ, η > 0 ) Pr C K) E K) < δ > η sup K K Hence if is sufficiently large then with probability of at least η δ δ C K) δ)e K) C ˆK) δ)e K) E ˆK) E K) and the conclusion follows because δ, η >) are arbitrary. Proof of Theorem 4.: We have C K) = E K) 2 ε A K aa A K µ a A K εε A K a σ 2 a A K a) We need to show sup K K u A K aa A K µ E K) p 0 A.58) and sup K K a A K uu A K a σ 2 a A K a E K) p 0 A.59) I consider the first statement. We have for δ > 0 a A K µ u A K µ Pr sup K K E K) a A K µ me u A K a m ) δ m m E K) m K K By Whittle s theorem We have Also E u A K a m ) Ca A K a) m 2 a A K a CK 2sp/d a A K µ) 2 E K) Combining this we find ) > δ a A K µ me u A K a m ) δ m m E K) m K K K K Pr C K msp/d E K K K) m 2 [37] a A K µ u ) A K µ > δ E K)

39 so that we require that compare with Li) 0 K msp/d E K K K) m 2 We have K msp/d E K K K) m 2 inf K K E K) m 2 K K K msp/d If we take the dimension of K as κ, this holds if compare with Li) κmsp/d ) Pr sup K K inf E K) m 2 A.60) K K so that optimal MSE cannot decrease too fast with. We now consider A.59). We have u A K aa A K u σ 2 a A K a ) K K Pr K K K K so that we require or E K) > δ u A K aa A K u σ 2 a A K a ) > δ E K) E u A K aa A K u σ 2 a A K a m ) δ m m E K) m Ctr AK aa A K ) 2) m 2 δ m m E K) m K 2msp/d E K K K) 0 m κ2msp/d ) which is implied by A.60). inf E K) m K K K K Ca A K a)m δ m m E K) m REFERECES Andrews, D. 99): Asymptotic Optimality of Generalized C L, Cross-validation, and Generalized Cross-validation in Regression with Heteroskedastic Errors, Journal of Econometrics, 47, Barnow, B.S., G.G. Cain and A.S. Goldberger 980), Issues in the Analysis of Selectivity Bias, in Evaluation Studies, vol. 5, ed. by E. Stromsdorfer and G. Farkas. San Francisco: Sage. Chen, X., Hong, H., and Tarozzi, A., 2004), Semiparametric Efficiency in GMM Models of onclassical Measurement Error, Missing Data and Treatment Effects. Working Paper. [38]

40 Hahn, J., 998), On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects, Econometrica 66 2), Heckman, J., and R. Robb, 984), Alternative Methods for Evaluating the Impact of Interventions, in Heckman and Singer eds.), Longitudinal Analysis of Labor Market Data, Cambridge, Cambridge University Press. Heckman, J., H. Ichimura, and P. Todd, 998), Matching as an Econometric Evaluation Estimator, Review of Economic Studies 65, Heckman, J., H. Ichimura, J. Smith, and P. Todd, 998), Characterizing Selection Bias Using Eperimental Data, Econometrica 66, Hirano, K., G. Imbens, and G. Ridder 2000), Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. BER Working Paper. Holland, P. 986), Statistics and Causal Inference, Journal of the American Statistical Association, 8, Ichimura, H., and O. Linton, 200), Asymptotic Epansions for some Semiparametric Program Evaluation Estimators. Institute for Fiscal Studies, cemmap working paper cwp04/0. Li, K. 987), Asymptotic Optimality for C p, C L, Cross-validation and Generalized Cross-validation: Discrete Inde Set, Annals of Statistics, 53), ewey, W., 994), The Asymptotic Variance of Semiparametric Estimators, Econometrica 62, ewey, W. 995), Convergence Rates for Series Estimators, in Advances in Econometrics and Quantitative Economics: Essays in Honor of Professor C. R. Rao, Maddala, Phillips, and Srinivasan eds.), Cambridge, Basil Blackwell. ewey, W., and D. McFadden 994), Large Sample Estimation, in Handbook of Econometrics, Vol. 4, Engle and McFadden eds.), orth Holland. Robins, J., A. Rotnitzky, and L. Zhao 995), Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data, Journal of the American Statistical Association, ), Rosenbaum, P., and D. Rubin, 983a), The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika, 70, Rotnitzky, A., and J. Robins 995), Semiparametric Regression Estimation in the Presence of Dependent Censoring, Biometrika 82 4), Rubin, D. 974), Estimating Causal Effects of Treatments in Randomized and on-randomized Studies, Journal of Educational Psychology, 66, Rubin, D. B., 978), Bayesian inference for causal effects: The Role of Randomization, Annals of Statistics, 6: [39]

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Keisuke Hirano University of Miami 2 Guido W. Imbens University of California at Berkeley 3 and BER Geert Ridder

More information

What s New in Econometrics. Lecture 1

What s New in Econometrics. Lecture 1 What s New in Econometrics Lecture 1 Estimation of Average Treatment Effects Under Unconfoundedness Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Potential Outcomes 3. Estimands and

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA

Implementing Matching Estimators for. Average Treatment Effects in STATA Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University West Coast Stata Users Group meeting, Los Angeles October 26th, 2007 General Motivation Estimation

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA. Guido W. Imbens - Harvard University Stata User Group Meeting, Boston

Implementing Matching Estimators for. Average Treatment Effects in STATA. Guido W. Imbens - Harvard University Stata User Group Meeting, Boston Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University Stata User Group Meeting, Boston July 26th, 2006 General Motivation Estimation of average effect

More information

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E.

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E. NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES James J. Heckman Petra E. Todd Working Paper 15179 http://www.nber.org/papers/w15179

More information

Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand

Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand Richard K. Crump V. Joseph Hotz Guido W. Imbens Oscar Mitnik First Draft: July 2004

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 What s New in Econometrics NBER, Summer 2007 Lecture 1, Monday, July 30th, 9.00-10.30am Estimation of Average Treatment Effects Under Unconfoundedness 1.

More information

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples DISCUSSION PAPER SERIES IZA DP No. 4304 A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples James J. Heckman Petra E. Todd July 2009 Forschungsinstitut zur Zukunft

More information

Imbens/Wooldridge, IRP Lecture Notes 2, August 08 1

Imbens/Wooldridge, IRP Lecture Notes 2, August 08 1 Imbens/Wooldridge, IRP Lecture Notes 2, August 08 IRP Lectures Madison, WI, August 2008 Lecture 2, Monday, Aug 4th, 0.00-.00am Estimation of Average Treatment Effects Under Unconfoundedness, Part II. Introduction

More information

Matching Techniques. Technical Session VI. Manila, December Jed Friedman. Spanish Impact Evaluation. Fund. Region

Matching Techniques. Technical Session VI. Manila, December Jed Friedman. Spanish Impact Evaluation. Fund. Region Impact Evaluation Technical Session VI Matching Techniques Jed Friedman Manila, December 2008 Human Development Network East Asia and the Pacific Region Spanish Impact Evaluation Fund The case of random

More information

Optimal bandwidth selection for the fuzzy regression discontinuity estimator

Optimal bandwidth selection for the fuzzy regression discontinuity estimator Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal

More information

Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand

Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand Richard K. Crump V. Joseph Hotz Guido W. Imbens Oscar Mitnik First Draft: July 2004

More information

Nonparametric Tests for Treatment Effect. Heterogeneity

Nonparametric Tests for Treatment Effect. Heterogeneity forthcoming in The Review of Economics and Statistics Nonparametric Tests for Treatment Effect Heterogeneity Richard K. Crump V. Joseph Hotz Guido W. Imbens Oscar A. Mitnik First Draft: November 2005 Final

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

NONPARAMETRIC ESTIMATION OF AVERAGE TREATMENT EFFECTS UNDER EXOGENEITY: A REVIEW*

NONPARAMETRIC ESTIMATION OF AVERAGE TREATMENT EFFECTS UNDER EXOGENEITY: A REVIEW* OPARAMETRIC ESTIMATIO OF AVERAGE TREATMET EFFECTS UDER EXOGEEITY: A REVIEW* Guido W. Imbens Abstract Recently there has been a surge in econometric work focusing on estimating average treatment effects

More information

Large Sample Properties of Matching Estimators for Average Treatment Effects

Large Sample Properties of Matching Estimators for Average Treatment Effects Large Sample Properties of atching Estimators for Average Treatment Effects Alberto Abadie Harvard University and BER Guido Imbens UC Bereley and BER First version: July This version: January 4 A previous

More information

Estimation of Treatment Effects under Essential Heterogeneity

Estimation of Treatment Effects under Essential Heterogeneity Estimation of Treatment Effects under Essential Heterogeneity James Heckman University of Chicago and American Bar Foundation Sergio Urzua University of Chicago Edward Vytlacil Columbia University March

More information

Nonparametric Tests for Treatment Effect Heterogeneity

Nonparametric Tests for Treatment Effect Heterogeneity DISCUSSION PAPER SERIES IZA DP No. 2091 Nonparametric Tests for Treatment Effect Heterogeneity Richard K. Crump V. Joseph Hotz Guido W. Imbens Oscar A. Mitnik April 2006 Forschungsinstitut zur Zukunft

More information

Covariate selection and propensity score specification in causal inference

Covariate selection and propensity score specification in causal inference Covariate selection and propensity score specification in causal inference Ingeborg Waernbaum Doctoral Dissertation Department of Statistics Umeå University SE-901 87 Umeå, Sweden Copyright c 2008 by Ingeborg

More information

Nonparametric Tests for Treatment Effect Heterogeneity

Nonparametric Tests for Treatment Effect Heterogeneity Nonparametric Tests for Treatment Effect Heterogeneity The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Published Version

More information

Dealing with Limited Overlap in Estimation of Average Treatment Effects

Dealing with Limited Overlap in Estimation of Average Treatment Effects Dealing with Limited Overlap in Estimation of Average Treatment Effects The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation

More information

The Value of Knowing the Propensity Score for Estimating Average Treatment Effects

The Value of Knowing the Propensity Score for Estimating Average Treatment Effects DISCUSSION PAPER SERIES IZA DP No. 9989 The Value of Knowing the Propensity Score for Estimating Average Treatment Effects Christoph Rothe June 2016 Forschungsinstitut zur Zukunft der Arbeit Institute

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Matching. James J. Heckman Econ 312. This draft, May 15, Intro Match Further MTE Impl Comp Gen. Roy Req Info Info Add Proxies Disc Modal Summ

Matching. James J. Heckman Econ 312. This draft, May 15, Intro Match Further MTE Impl Comp Gen. Roy Req Info Info Add Proxies Disc Modal Summ Matching James J. Heckman Econ 312 This draft, May 15, 2007 1 / 169 Introduction The assumption most commonly made to circumvent problems with randomization is that even though D is not random with respect

More information

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness finite-sample optimal estimation and inference on average treatment effects under unconfoundedness Timothy Armstrong (Yale University) Michal Kolesár (Princeton University) September 2017 Introduction

More information

Department of Economics. Working Papers

Department of Economics. Working Papers 10ISSN 1183-1057 SIMON FRASER UNIVERSITY Department of Economics Working Papers 08-07 "A Cauchy-Schwarz inequality for expectation of matrices" Pascal Lavergne November 2008 Economics A Cauchy-Schwarz

More information

Lecture 11/12. Roy Model, MTE, Structural Estimation

Lecture 11/12. Roy Model, MTE, Structural Estimation Lecture 11/12. Roy Model, MTE, Structural Estimation Economics 2123 George Washington University Instructor: Prof. Ben Williams Roy model The Roy model is a model of comparative advantage: Potential earnings

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1

Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1 Imbens, Lecture Notes 1, Unconfounded Treatment Assignment, IEN, Miami, Oct 10 1 Lectures on Evaluation Methods Guido Imbens Impact Evaluation Network October 2010, Miami Methods for Estimating Treatment

More information

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,

More information

A Course in Applied Econometrics. Lecture 2 Outline. Estimation of Average Treatment Effects. Under Unconfoundedness, Part II

A Course in Applied Econometrics. Lecture 2 Outline. Estimation of Average Treatment Effects. Under Unconfoundedness, Part II A Course in Applied Econometrics Lecture Outline Estimation of Average Treatment Effects Under Unconfoundedness, Part II. Assessing Unconfoundedness (not testable). Overlap. Illustration based on Lalonde

More information

The Evaluation of Social Programs: Some Practical Advice

The Evaluation of Social Programs: Some Practical Advice The Evaluation of Social Programs: Some Practical Advice Guido W. Imbens - Harvard University 2nd IZA/IFAU Conference on Labor Market Policy Evaluation Bonn, October 11th, 2008 Background Reading Guido

More information

Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects

Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects Matias Busso IDB, IZA John DiNardo University of Michigan and NBER Justin McCrary University of Californa, Berkeley and

More information

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department

More information

Simple Estimators for Semiparametric Multinomial Choice Models

Simple Estimators for Semiparametric Multinomial Choice Models Simple Estimators for Semiparametric Multinomial Choice Models James L. Powell and Paul A. Ruud University of California, Berkeley March 2008 Preliminary and Incomplete Comments Welcome Abstract This paper

More information

Simple Estimators for Monotone Index Models

Simple Estimators for Monotone Index Models Simple Estimators for Monotone Index Models Hyungtaik Ahn Dongguk University, Hidehiko Ichimura University College London, James L. Powell University of California, Berkeley (powell@econ.berkeley.edu)

More information

A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. By Tiemen Woutersen. Draft, September

A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. By Tiemen Woutersen. Draft, September A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters By Tiemen Woutersen Draft, September 006 Abstract. This note proposes a new way to calculate confidence intervals of partially

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity John C. Chao, Department of Economics, University of Maryland, chao@econ.umd.edu. Jerry A. Hausman, Department of Economics,

More information

Large Sample Properties of Matching Estimators for Average Treatment Effects

Large Sample Properties of Matching Estimators for Average Treatment Effects Revision in progress. December 7, 4 Large Sample Properties of atching Estimators for Average Treatment Effects Alberto Abadie Harvard University and BER Guido Imbens UC Bereley and BER First version:

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Efficiency of repeated-cross-section estimators in fixed-effects models

Efficiency of repeated-cross-section estimators in fixed-effects models Efficiency of repeated-cross-section estimators in fixed-effects models Montezuma Dumangane and Nicoletta Rosati CEMAPRE and ISEG-UTL January 2009 Abstract PRELIMINARY AND INCOMPLETE Exploiting across

More information

Bias-adjusted Nearest Neighbor Estimation for the Partial Linear Model

Bias-adjusted Nearest Neighbor Estimation for the Partial Linear Model Bias-adjusted Nearest Neighbor Estimation for the Partial Linear Model Guido Imbens UC Berkeley and NBER Jack Porter University of Wisconsin - Madison First draft: December 2002 This draft March 2005 Abstract

More information

Semiparametric Efficiency in Irregularly Identified Models

Semiparametric Efficiency in Irregularly Identified Models Semiparametric Efficiency in Irregularly Identified Models Shakeeb Khan and Denis Nekipelov July 008 ABSTRACT. This paper considers efficient estimation of structural parameters in a class of semiparametric

More information

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING ESTIMATION OF TREATMENT EFFECTS VIA MATCHING AAEC 56 INSTRUCTOR: KLAUS MOELTNER Textbooks: R scripts: Wooldridge (00), Ch.; Greene (0), Ch.9; Angrist and Pischke (00), Ch. 3 mod5s3 General Approach The

More information

Grouped Network Vector Autoregression

Grouped Network Vector Autoregression Statistica Sinica: Supplement Grouped Networ Vector Autoregression Xuening Zhu 1 and Rui Pan 2 1 Fudan University, 2 Central University of Finance and Economics Supplementary Material We present here the

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Efficient Semiparametric Estimation of Quantile Treatment Effects

Efficient Semiparametric Estimation of Quantile Treatment Effects fficient Semiparametric stimation of Quantile Treatment ffects Sergio Firpo U Berkeley - Department of conomics This Draft: January, 2003 Abstract This paper presents calculations of semiparametric efficiency

More information

Locally Robust Semiparametric Estimation

Locally Robust Semiparametric Estimation Locally Robust Semiparametric Estimation Victor Chernozhukov Juan Carlos Escanciano Hidehiko Ichimura Whitney K. Newey The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper

More information

Blinder-Oaxaca as a Reweighting Estimator

Blinder-Oaxaca as a Reweighting Estimator Blinder-Oaxaca as a Reweighting Estimator By Patrick Kline A large literature focuses on the use of propensity score methods as a semi-parametric alternative to regression for estimation of average treatments

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

More information

A nonparametric test for path dependence in discrete panel data

A nonparametric test for path dependence in discrete panel data A nonparametric test for path dependence in discrete panel data Maximilian Kasy Department of Economics, University of California - Los Angeles, 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095,

More information

Imbens, Lecture Notes 2, Local Average Treatment Effects, IEN, Miami, Oct 10 1

Imbens, Lecture Notes 2, Local Average Treatment Effects, IEN, Miami, Oct 10 1 Imbens, Lecture Notes 2, Local Average Treatment Effects, IEN, Miami, Oct 10 1 Lectures on Evaluation Methods Guido Imbens Impact Evaluation Network October 2010, Miami Methods for Estimating Treatment

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

LIKELIHOOD RATIO INFERENCE FOR MISSING DATA MODELS

LIKELIHOOD RATIO INFERENCE FOR MISSING DATA MODELS LIKELIHOOD RATIO IFERECE FOR MISSIG DATA MODELS KARU ADUSUMILLI AD TAISUKE OTSU Abstract. Missing or incomplete outcome data is a ubiquitous problem in biomedical and social sciences. Under the missing

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Generated Covariates in Nonparametric Estimation: A Short Review.

Generated Covariates in Nonparametric Estimation: A Short Review. Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. The Sharp RD Design 3.

More information

The Asymptotic Variance of Semi-parametric Estimators with Generated Regressors

The Asymptotic Variance of Semi-parametric Estimators with Generated Regressors The Asymptotic Variance of Semi-parametric stimators with Generated Regressors Jinyong Hahn Department of conomics, UCLA Geert Ridder Department of conomics, USC October 7, 00 Abstract We study the asymptotic

More information

Jinyong Hahn. Department of Economics Tel: (310) Bunche Hall Fax: (310) Professional Positions

Jinyong Hahn. Department of Economics Tel: (310) Bunche Hall Fax: (310) Professional Positions Jinyong Hahn Department of Economics Tel: (310) 825-2523 8283 Bunche Hall Fax: (310) 825-9528 Mail Stop: 147703 E-mail: hahn@econ.ucla.edu Los Angeles, CA 90095 Education Harvard University, Ph.D. Economics,

More information

A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators

A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators Robert M. de Jong Department of Economics Michigan State University 215 Marshall Hall East

More information

UNIVERSITY OF CALIFORNIA Spring Economics 241A Econometrics

UNIVERSITY OF CALIFORNIA Spring Economics 241A Econometrics DEPARTMENT OF ECONOMICS R. Smith, J. Powell UNIVERSITY OF CALIFORNIA Spring 2006 Economics 241A Econometrics This course will cover nonlinear statistical models for the analysis of cross-sectional and

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

Lecture 8. Roy Model, IV with essential heterogeneity, MTE Lecture 8. Roy Model, IV with essential heterogeneity, MTE Economics 2123 George Washington University Instructor: Prof. Ben Williams Heterogeneity When we talk about heterogeneity, usually we mean heterogeneity

More information

Course Description. Course Requirements

Course Description. Course Requirements University of Pennsylvania Spring 2007 Econ 721: Advanced Microeconometrics Petra Todd Course Description Lecture: 9:00-10:20 Tuesdays and Thursdays Office Hours: 10am-12 Fridays or by appointment. To

More information

Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments

Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments CHAPTER 6 Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments James H. Stock and Motohiro Yogo ABSTRACT This paper extends Staiger and Stock s (1997) weak instrument asymptotic

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

INVERSE PROBABILITY WEIGHTED ESTIMATION FOR GENERAL MISSING DATA PROBLEMS

INVERSE PROBABILITY WEIGHTED ESTIMATION FOR GENERAL MISSING DATA PROBLEMS IVERSE PROBABILITY WEIGHTED ESTIMATIO FOR GEERAL MISSIG DATA PROBLEMS Jeffrey M. Wooldridge Department of Economics Michigan State University East Lansing, MI 48824-1038 (517) 353-5972 wooldri1@msu.edu

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

NBER WORKING PAPER SERIES MATCHING ON THE ESTIMATED PROPENSITY SCORE. Alberto Abadie Guido W. Imbens

NBER WORKING PAPER SERIES MATCHING ON THE ESTIMATED PROPENSITY SCORE. Alberto Abadie Guido W. Imbens BER WORKIG PAPER SERIES MATCHIG O THE ESTIMATED PROPESITY SCORE Alberto Abadie Guido W. Imbens Working Paper 530 http://www.nber.org/papers/w530 ATIOAL BUREAU OF ECOOMIC RESEARCH 050 Massachusetts Avenue

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

13 Endogeneity and Nonparametric IV

13 Endogeneity and Nonparametric IV 13 Endogeneity and Nonparametric IV 13.1 Nonparametric Endogeneity A nonparametric IV equation is Y i = g (X i ) + e i (1) E (e i j i ) = 0 In this model, some elements of X i are potentially endogenous,

More information

Diagonalization by a unitary similarity transformation

Diagonalization by a unitary similarity transformation Physics 116A Winter 2011 Diagonalization by a unitary similarity transformation In these notes, we will always assume that the vector space V is a complex n-dimensional space 1 Introduction A semi-simple

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Microeconometrics. C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press.

Microeconometrics. C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press. Cheng Hsiao Microeconometrics Required Text: C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press. A.C. Cameron and P.K. Trivedi (2005), Microeconometrics, Cambridge University

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Matching on the Estimated Propensity Score

Matching on the Estimated Propensity Score atching on the Estimated Propensity Score Alberto Abadie Harvard University and BER Guido W. Imbens Harvard University and BER October 29 Abstract Propensity score matching estimators Rosenbaum and Rubin,

More information

A Local Generalized Method of Moments Estimator

A Local Generalized Method of Moments Estimator A Local Generalized Method of Moments Estimator Arthur Lewbel Boston College June 2006 Abstract A local Generalized Method of Moments Estimator is proposed for nonparametrically estimating unknown functions

More information

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Hidehiko Ichimura Graduate School of Public Policy and Graduate School of Economics University of Tokyo A Conference in honor of

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Comparative Advantage and Schooling

Comparative Advantage and Schooling Comparative Advantage and Schooling Pedro Carneiro University College London, Institute for Fiscal Studies and IZA Sokbae Lee University College London and Institute for Fiscal Studies June 7, 2004 Abstract

More information

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS Olivier Scaillet a * This draft: July 2016. Abstract This note shows that adding monotonicity or convexity

More information

The Influence Function of Semiparametric Estimators

The Influence Function of Semiparametric Estimators The Influence Function of Semiparametric Estimators Hidehiko Ichimura University of Tokyo Whitney K. Newey MIT July 2015 Revised January 2017 Abstract There are many economic parameters that depend on

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators

A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators ANNALS OF ECONOMICS AND FINANCE 4, 125 136 (2003) A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators Insik Min Department of Economics, Texas A&M University E-mail: i0m5376@neo.tamu.edu

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Department of Econometrics and Business Statistics

Department of Econometrics and Business Statistics ISSN 440-77X Australia Department of Econometrics and Business Statistics http://wwwbusecomonasheduau/depts/ebs/pubs/wpapers/ The Asymptotic Distribution of the LIML Estimator in a artially Identified

More information