Efficient Estimation of Average Treatment Effects With Mixed Categorical and Continuous Data
Qi Li, Department of Economics, Texas A&M University, Bush Academic Building West, College Station, TX

Jeff Wooldridge, Department of Economics, Michigan State University, East Lansing, MI

Jeff Racine, Department of Economics & Center for Policy Research, Syracuse University, Syracuse, NY

June 18, 2004

Abstract

In this paper we consider the nonparametric estimation of average treatment effects when there exist mixed categorical and continuous covariates. One distinguishing feature of the approach presented herein is the use of kernel smoothing for both the continuous and the discrete covariates. This approach, together with the cross-validation method used to select the smoothing parameters, has the ability to automatically remove irrelevant covariates. We establish the asymptotic distribution of the proposed average treatment effect estimator with data-driven smoothing parameters. Simulation results show that the proposed method can perform much better than existing kernel approaches whereby one splits the sample into subsamples corresponding to discrete cells. An empirical application to a controversial study that examines the efficacy of right heart catheterization on medical outcomes reveals that our proposed nonparametric estimator overturns the findings of Connors et al. (1996), suggesting that their findings may be an artifact of an incorrectly specified parametric model.

Key words and phrases: Average treatment effect, discrete covariates, kernel smoothing, bootstrap, asymptotic normality.

Li's research is partially supported by the Private Enterprise Research Center, Texas A&M University. Racine's research is supported by NSF grant BCS
1 Introduction

The measurement of average treatment effects, initially confined to the assessment of dose-response relationships in medical settings, is today widely used across a range of disciplines. Assessing human-capital losses arising from war (Ichino and Winter-Ebmer (1998)) and the effectiveness of job training programs (Lechner (1999)) are but two examples of the wide range of potential applications. Perhaps the most widespread approach to the measurement of treatment effects involves estimation of a propensity score. Estimation of the propensity score (the conditional probability of receiving treatment) was originally undertaken with parametric index models such as the Logit or Probit, though there is an expanding literature on semiparametric and nonparametric estimation thereof (see Hahn (1998) and Hirano et al. (2002)). The advantage of nonparametric approaches in this setting is rather obvious, as misspecification of the propensity score may impact significantly upon the magnitude and even the sign of the estimated treatment effect. In many settings mismeasurement induced by misspecification can be extremely costly: envision for a moment the societal cost of incorrectly concluding that a novel and beneficial cancer treatment in fact causes harm. Though the appeal of robust nonparametric methods is obvious in this setting, existing nonparametric approaches split the sample into cells in the presence of categorical covariates, resulting in a loss of efficiency. Given that datasets used to assess treatment effects frequently contain a preponderance of categorical data,¹ it is not uncommon for the number of discrete cells to be larger than the sample size. In such cases the sample-splitting frequency method becomes infeasible.
On the other hand, the kernel-based smoothing cross-validation method can (asymptotically) automatically detect and remove irrelevant covariates, leading to feasible and accurate nonparametric estimation of average treatment effects. Another obvious symptom of the sample-splitting frequency method is a loss in power of tests of whether a treatment effect differs from no effect. In this paper we propose a kernel-based nonparametric method for measuring and testing for the presence of treatment effects that is ideally suited to datasets containing a mix of categorical (nominal and ordinal) and continuous datatypes. One distinguishing feature of the proposed approach is the use of kernel smoothing for both the continuous and the discrete covariates. We elect to use the least-squares conditional cross-validation method of Hall et al. (2004a) to select smoothing parameters for both the categorical and continuous variables; they demonstrate that cross-validation produces asymptotically optimal smoothing for relevant components, while it eliminates irrelevant components by oversmoothing. Indeed, in the problem of nonparametric estimation of a conditional density with mixed categorical and continuous data, cross-validation comes into its own as a method with no obvious peers.²

The rest of the paper proceeds as follows. In Section 2 we outline the nonparametric model and derive the asymptotic distribution of the resulting average treatment effect estimator. In Section 3 we undertake some simulation experiments, which demonstrate that the proposed method is capable of outperforming existing kernel approaches that require splitting the sample into subsamples ("discrete cells"). An empirical application presented in Section 4, involving a study that examines the efficacy of right heart catheterization on medical outcomes, reveals that our approach negates the controversial findings of Connors et al. (1996), suggesting that their result may be an artifact of an incorrectly specified parametric model. Main proofs appear in the appendices.

¹ In the typical medical study, it is common to encounter exclusively categorical data types.

² It is not clear to us how to extend this property of kernel smoothing, which automatically removes irrelevant covariates, to nonparametric series methods.
2 The Model

For what follows, we use a dummy variable t_i ∈ {0, 1} to indicate whether or not an individual has received treatment: t_i = 1 for the treated, 0 for the untreated. Letting y_i(t_i) denote the outcome, then, for i = 1, ..., n, we write y_i = t_i y_i(1) + (1 − t_i) y_i(0). Interest lies in the average treatment effect defined as follows:

τ = E[y_i(1) − y_i(0)].

Let x_i denote a vector of pre-treatment variables. One issue that instantly surfaces in this setting is that, for each individual i, we observe either y_i(0) or y_i(1), but not both. Therefore, in the absence of additional assumptions, the treatment effect is not consistently estimable. One popular assumption is the unconfoundedness condition (Rosenbaum and Rubin (1983)):

Assumption (A1) (Unconfoundedness): Conditional on x_i, the treatment indicator t_i is independent of the potential outcomes.

Define the conditional treatment effect by τ(x) = E[y_i(1) − y_i(0) | x_i = x]. Under Assumption (A1) one can easily show that

τ(x) = E[y_i | t_i = 1, x_i = x] − E[y_i | t_i = 0, x_i = x]. (2.3)

The two terms on the right-hand side of (2.3) can be estimated consistently by any nonparametric estimation technique. Therefore, under (A1), the average treatment effect can be obtained via simple averaging over τ(x):

τ = E[τ(x_i)]. (2.4)

Letting E(y_i | x_i, t_i) be denoted by g(x_i, t_i), we then have

y_i = g(x_i, t_i) + u_i, (2.5)

with E(u_i | x_i, t_i) = 0. Defining g_0(x_i) = g(x_i, t_i = 0) and g_1(x_i) = g(x_i, t_i = 1), we can re-write (2.5) as

y_i = g_0(x_i) + [g_1(x_i) − g_0(x_i)] t_i + u_i = g_0(x_i) + τ(x_i) t_i + u_i, (2.6)

where τ(x_i) = g_1(x_i) − g_0(x_i). From (2.6) it is easy to show that τ(x_i) = cov(y_i, t_i | x_i)/var(t_i | x_i). Letting µ(x_i) = Pr(t_i = 1 | x_i) = E(t_i | x_i) (because t_i ∈ {0, 1}), we may write

τ = E[τ(x_i)] = E[(t_i − µ(x_i)) y_i / var(t_i | x_i)]. (2.7)

We now turn to the discussion of the nonparametric estimation of τ based on (2.7).
2.1 Nonparametric Estimation of the Propensity Score

We use x_i^c and x_i^d to denote the continuous and discrete components of x_i, with x_i^c ∈ R^q and x_i^d being of dimension r. Let w(·) denote a univariate kernel function for the continuous variables, and define the product kernel function by W_h(x_i^c, x_j^c) = Π_{s=1}^q h_s^{−1} w((x_{is}^c − x_{js}^c)/h_s), where x_{is}^c is the sth component of x_i^c. We assume that some of the discrete variables have a natural ordering, examples of which would include preference orderings (like, indifference, dislike), health conditions (excellent, good, poor), and so forth. Let x̄_i^d denote an r_1-vector (say, the first r_1 components of x_i^d, 0 ≤ r_1 ≤ r) of discrete covariates that have a natural ordering, and let x̃_i^d denote the remaining r_2 = r − r_1 discrete covariates that do not have a natural ordering. We use x_{it}^d to denote the tth component of x_i^d (t = 1, ..., r). For an ordered variable, we use the Habbema kernel

l(x̄_{it}^d, x̄_{jt}^d, λ_t) = 1 if x̄_{it}^d = x̄_{jt}^d, and λ_t^{|x̄_{it}^d − x̄_{jt}^d|} if x̄_{it}^d ≠ x̄_{jt}^d, (2.8)

where λ_t ∈ [0, 1]. When λ_t = 0, l(x̄_{it}^d, x̄_{jt}^d, 0) becomes an indicator function, and when λ_t = 1, l(x̄_{it}^d, x̄_{jt}^d, 1) ≡ 1 becomes a uniform weight function. For an unordered variable, we use a variation on Aitchison and Aitken's (1976) kernel function defined by

l(x̃_{it}^d, x̃_{jt}^d, λ_t) = 1 if x̃_{it}^d = x̃_{jt}^d, and λ_t otherwise. (2.9)

Again λ_t = 0 leads to an indicator function, λ_t = 1 to a uniform weight function. Let 1(A) denote an indicator function that assumes the value 1 if A occurs and 0 otherwise. Combining (2.8) and (2.9), we obtain the product kernel function

L(x_i^d, x_j^d, λ) = [Π_{t=1}^{r_1} λ_t^{|x_{it}^d − x_{jt}^d|}] [Π_{t=r_1+1}^{r} λ_t^{1(x_{it}^d ≠ x_{jt}^d)}]. (2.10)

We note that there does not exist a plug-in or even an ad-hoc formula for selecting λ_t in this setting. Hence we recommend using least squares cross-validation for selecting λ_t (t = 1, ..., r).
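The mixed-data product kernel K_{n,ij} = W_h(x_i^c, x_j^c) L(x_i^d, x_j^d, λ) can be sketched as follows. This is a minimal illustration, not the paper's code: we take w(·) to be the standard normal density (the paper only requires a νth-order kernel), and `product_kernel` is a hypothetical helper name.

```python
import numpy as np

def product_kernel(xc_i, xc_j, h, xd_i, xd_j, lam, ordered):
    """Mixed-data product kernel: Gaussian w(.) for continuous components,
    Habbema kernel (2.8) for ordered discrete components, and the
    Aitchison-Aitken variant (2.9) for unordered ones."""
    # Continuous part: product of univariate Gaussian kernels (h_s^{-1} omitted
    # here; it cancels in ratio estimators such as (2.11))
    u = (np.asarray(xc_i) - np.asarray(xc_j)) / np.asarray(h)
    w = np.prod(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi))
    # Discrete part: lambda^|difference| if ordered, lambda^{1(unequal)} if not
    l = 1.0
    for a, b, lt, is_ordered in zip(xd_i, xd_j, lam, ordered):
        if is_ordered:
            l *= lt ** abs(a - b)          # Habbema kernel
        else:
            l *= 1.0 if a == b else lt     # Aitchison-Aitken variant
    return w * l
```

Note that λ_t = 0 recovers the frequency (indicator) estimator while λ_t = 1 smooths the tth discrete covariate out entirely, exactly as described above.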
Our recommendation is based not only on the mean square error optimality of least squares cross-validation, but also on its ability to (asymptotically) remove irrelevant discrete covariates (see Hall et al. (2004a,b)). This property bears highlighting, as we have observed that irrelevant variables tend to occur surprisingly often in practice. Thus, cross-validation provides an efficient way of guarding against overspecification of nonparametric models, and thereby mitigates the curse of dimensionality often associated with kernel methods.

Since µ(x_i) = Pr(t_i = 1 | x_i) = E(t_i | x_i), we can use either a conditional probability estimator or a conditional mean estimator to estimate µ(x_i). We will use the latter in this paper. We let ˆt(x_i) be the nonparametric estimator of µ_i ≡ µ(x_i) defined by

ˆt(x_i) = Σ_{j=1}^n t_j K_{n,ij} / Σ_{j=1}^n K_{n,ij}, (2.11)

where K_{n,ij} = W_h(x_i^c, x_j^c) L(x_i^d, x_j^d, λ). By noting that var(t_i | x_i) = µ_i(1 − µ_i), one can estimate the average treatment effect by

ˆτ = (1/n) Σ_{i=1}^n [(t_i − ˆt(x_i)) y_i / (ˆt(x_i)(1 − ˆt(x_i)))] M_{ni} ≡ (1/n) Σ_{i=1}^n [t_i y_i/ˆt(x_i) − (1 − t_i) y_i/(1 − ˆt(x_i))] M_{ni}, (2.12)

where M_{ni} = M_n(x_i) is a trimming function that trims out observations near the boundary. With the exception of the presence of the trimming function M_{ni}, (2.12) is exactly the same as the propensity score based estimator considered by Hirano et al. (2002), who used series methods for estimating µ(x_i). As we mentioned earlier, it is not clear how to use nonparametric series methods to automatically remove irrelevant discrete covariates. Therefore, in this paper we consider only the kernel-based estimation method, which has the advantage of automatically removing irrelevant covariates (be they continuous or discrete).

To derive the asymptotic distribution of ˆτ we make the following regularity assumptions. Following Robinson (1988), we use G_ν^α (ν a positive integer) to denote the smooth class of functions such that if g ∈ G_ν^α, then g is ν-times differentiable, and g and its partial derivatives (up to order ν) are all bounded by functions with finite αth moments.

Assumption (A2): (i) (y_i, x_i, t_i) are independently and identically distributed. (ii) x_i^d takes finitely many different values; for each x^d, the support of f(x^c, x^d) is a compact convex set in x^c; µ(x^c, x^d) ∈ G_ν^4 and f(x^c, x^d) ∈ G_{ν−1}^4, where ν is a positive integer with 2ν > q. (iii) inf_{x∈S} f(x) ≥ η for some η > 0, where S is the support of x_i. (iv) σ²(x, t) = var(u_i | x_i = x, t_i = t) is bounded below by a positive constant on the support of (x_i, t_i). (v) The trimming function M_n(x) converges (as n → ∞) to the indicator function 1(x ∈ S), where 1(·) is the usual indicator function and S is the support of f(x).

Assumption (A3): (i) w(·) is a νth order kernel; it is bounded, symmetric and differentiable up to order ν. (ii) As n → ∞, n Σ_{s=1}^q h_s^{2ν+4} → 0 and n h_1² ... h_q² → ∞.

Assumptions (A2) (i)-(iv) are standard smoothness and moment conditions.
(A2) (v) implies that, asymptotically, we trim only a negligible amount of data (near the boundary), so that ˆτ is asymptotically efficient (see Theorem 2.1). A trimming set is used in (2.12) for theoretical reasons. Given that the support of x is a compact and convex set (in x^c), one can assume without loss of generality that x^c ∈ [−1, 1]^q. Then one can define a set A_{δ_n} = Π_{s=1}^q [−δ_s, δ_s], where δ_s = δ_{sn} < 1 with 1 − δ_{sn} → 0 as n → ∞. To avoid boundary bias, one can choose 1 − δ_s = O(h_s^α) for some 0 < α < 1, and define M_n(x_i) = 1(x_i ∈ A_{δ_n}). In this way the boundary effects disappear asymptotically. In practice, boundary trimming does not appear to be necessary: in both the simulations and the empirical application reported in Sections 3 and 4, we do not resort to trimming. In the presence of outliers, however, one might wish to consider trimming.

In order to appreciate the restrictions imposed by (A3), assume that h_s = h for all s. In this case, (A3) (ii) requires that ν + 2 > q. Using a second order kernel (ν = 2) implies that q < 4, or q ≤ 3 since q is a positive integer. Thus, a second order kernel can satisfy (A3) if q ≤ 3. When q ≥ 4, (A3) requires the use of a higher order kernel function.

Remark 2.1 If 1 ≤ q ≤ 3 and one uses a second order kernel (ν = 2), then (A3) allows optimal smoothing. To see this, note that when ν = 2, optimal smoothing is h_s = O(n^{−1/(4+q)}). (A3) (ii) becomes (assuming h_s = h) nh^8 → 0 and nh^{2q} → ∞; optimal smoothing h ∼ n^{−1/(4+q)} satisfies these conditions for q = 1, 2, 3. For q > 3, (A3) (ii) rules out optimal smoothing.

We will choose the smoothing parameters based on, but not the same as (if q ≥ 4), the least squares cross-validation method. The leave-one-out kernel estimator of E(t_i | x_i) = µ(x_i) is given by

ˆt_{−i}(x_i) = Σ_{j≠i} t_j K_{n,ij} / Σ_{j≠i} K_{n,ij}, (2.13)

where K_{n,ij} = W_h(x_i^c, x_j^c) L(x_i^d, x_j^d, λ). We choose (h, λ) = (h_1, ..., h_q, λ_1, ..., λ_r) by minimizing the following least squares cross-validation function:

CV(h, λ) = (1/n) Σ_{i=1}^n [t_i − ˆt_{−i}(x_i)]² S_n(x_i), (2.14)

where S_n(·) is a weight function that trims out observations near the boundary of the support of x_i (avoiding excessive boundary bias). Hall et al. (2004a,b) have shown that when x_s^d (x_s^c) is an irrelevant covariate, the cross-validation selected smoothing parameter λ_s (h_s) will converge to 1 (∞) in probability; hence, irrelevant covariates (discrete or continuous) will be automatically smoothed out. Given that the cross-validation method can automatically remove irrelevant covariates, we will derive the asymptotic distribution of ˆτ under the condition that all q continuous variables and all r discrete variables are relevant, i.e., assuming that the irrelevant covariates have already been removed when computing ˆτ. When all the covariates are relevant, we require that the h_s's and λ_s's are chosen from a shrinking set, h_s ∈ (0, η_n] (s = 1, ..., q) and λ_s ∈ (0, η_n] (s = 1, ..., r), where η_n → 0 as n → ∞ at a rate slower than any inverse polynomial of n. This assumption can be relaxed as in Hall et al. (2004a,b). We let (h̃_1, ..., h̃_q, λ̃_1, ..., λ̃_r) denote the cross-validation choices of (h_1, ..., h_q, λ_1, ..., λ_r) that minimize (2.14). It is well known that the cross-validation method leads to optimal smoothing, but our assumption (A3) (ii) rules out optimal smoothing when q > 3. Therefore, we suggest using ĥ_s = h̃_s n^{1/(2ν+q)} n^{−1/(q+2ν+2)} and ˆλ_s = λ̃_s n^{ν/(2ν+q)} n^{−ν/(q+2ν+2)}. Thus we have ĥ_s ∼ n^{−1/(q+2ν+2)} and ˆλ_s ∼ n^{−ν/(q+2ν+2)}, satisfying (A3) (ii). When 1 ≤ q ≤ 3, we can choose ν = 2 (a second order kernel), which leads to ĥ_s ∼ h̃_s and ˆλ_s ∼ λ̃_s.
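The cross-validation objective (2.14) and the estimator (2.12) can be sketched on a toy design with one continuous and one unordered binary covariate. Everything below is an illustrative simplification, not the paper's implementation: the data generating process is invented, a crude grid search stands in for a numerical optimiser, S_n(·) ≡ 1, and clipping ˆt away from 0 and 1 stands in for the trimming function M_ni.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
xc = rng.uniform(-1, 1, n)                      # one continuous covariate
xd = rng.integers(0, 2, n)                      # one unordered binary covariate
p = 1.0 / (1.0 + np.exp(-(xc + 0.5 * xd)))      # true propensity (illustrative)
t = (rng.uniform(size=n) < p).astype(float)
y = 1 + xc + t * (1 + 0.5 * xd) + rng.normal(size=n)

def weights(h, lam):
    """n x n kernel weight matrix W_h(xc_i, xc_j) * L(xd_i, xd_j, lam)."""
    u = (xc[:, None] - xc[None, :]) / h
    W = np.exp(-0.5 * u**2)
    L = np.where(xd[:, None] == xd[None, :], 1.0, lam)
    return W * L

def cv(h, lam):
    """Leave-one-out least squares cross-validation for E(t|x), as in (2.14)."""
    K = weights(h, lam)
    np.fill_diagonal(K, 0.0)                    # leave one out
    that = K @ t / K.sum(axis=1)
    return np.mean((t - that) ** 2)

# crude grid search over (h, lambda); lam = 0 is the frequency estimator,
# lam = 1 smooths the discrete covariate out entirely
grid_h = [0.1, 0.2, 0.4, 0.8]
grid_l = [0.0, 0.25, 0.5, 1.0]
h, lam = min(((a, b) for a in grid_h for b in grid_l), key=lambda hl: cv(*hl))

# propensity estimate (2.11) and the weighting estimator (2.12)
K = weights(h, lam)
that = K @ t / K.sum(axis=1)
that = np.clip(that, 1e-3, 1 - 1e-3)            # crude stand-in for trimming
tau_hat = np.mean((t - that) * y / (that * (1 - that)))
```

The true average effect here is E[1 + 0.5 x_d] and ˆτ recovers it up to sampling noise; the point of the sketch is only the mechanics of (2.11), (2.12) and (2.14).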
Following the proofs of Hall et al. (2004a), one can show the following:

Lemma 2.2 Under assumptions (A1) to (A3), and condition (A.6) given in Appendix A, we have

ĥ_s = a_s^0 n^{−1/(2ν+q+2)} + o_p(n^{−1/(2ν+q+2)}), for s = 1, ..., q;
ˆλ_s = b_s^0 n^{−ν/(2ν+q+2)} + o_p(n^{−ν/(2ν+q+2)}), for s = 1, ..., r,

where the a_s^0's are finite positive constants and the b_s^0's are non-negative finite constants.

The proof of Lemma 2.2, as well as consistent estimators of V_1, V_2 and B_{h,λ}, is given in Appendix A. The empirical applications and simulation results presented in Hall et al. (2004a,b) reveal that nonparametric estimation based on cross-validated bandwidth selection performs much better than a conventional frequency estimator (which corresponds to λ_s = 0 for all s = 1, ..., r), because the former does not split the sample in finite-sample applications, which creates efficiency losses. Having obtained the ĥ_s's and ˆλ_s's based on the cross-validation method, we estimate τ using expression (2.12) with ˆt(x_i) computed using the ĥ_s's and ˆλ_s's. To avoid introducing too much notation, we will still use ˆτ to denote the resulting estimator of τ.
2.2 The Asymptotic Distribution of ˆτ

The next theorem provides the asymptotic distribution of ˆτ.

Theorem 2.1 Under assumptions (A1)-(A3) we have √n(ˆτ − τ − B_{h,λ}) → N(0, V_1 + V_2) in distribution, where B_{h,λ} = Σ_{s=1}^q B_{1s}(x) ĥ_s^ν + Σ_{s=1}^r B_{2s}(x) ˆλ_s, B_{1s}(x) and B_{2s}(x) are defined in Lemma B.2 of Appendix B, V_1 = var(τ(x_i)), V_2 = E{σ²(x_i, t_i)(t_i − µ_i)²/[µ_i²(1 − µ_i)²]}, and σ²(x_i, t_i) = E(u_i² | x_i, t_i).

The proof of Theorem 2.1 is given in Appendix A. Theorem 2.1 shows that our kernel-based estimator ˆτ is semiparametrically efficient. Let f(x, t) and f_x(x) denote the joint and marginal densities of (x_i, t_i) and x_i, respectively, and let p(t_i | x_i) be the conditional probability of t_i given x_i. Letting ∫ dx = Σ_{x^d} ∫ dx^c, µ_x = µ(x), and using f(x_i, t_i) = p(t_i | x_i) f_x(x_i), and noting that p(t_i = 1 | x_i) = µ_i and p(t_i = 0 | x_i) = 1 − µ_i, we have

V_2 = E{σ²(x_i, t_i)(t_i − µ_i)² / [µ_i²(1 − µ_i)²]}
  = Σ_{t=1,0} ∫ f_x(x) p(t | x) σ²(x, t)(t − µ_x)² / [µ_x²(1 − µ_x)²] dx
  = ∫ f_x(x) µ_x σ²(x, 1)(1 − µ_x)² / [µ_x²(1 − µ_x)²] dx + ∫ f_x(x)(1 − µ_x) µ_x² σ²(x, 0) / [µ_x²(1 − µ_x)²] dx
  = E{σ²(x_i, 1)/µ_i + σ²(x_i, 0)/(1 − µ_i)}. (2.16)

Equation (2.16) matches the expression given in Hahn (1998). Thus, V_1 + V_2 coincides with the semiparametric efficiency bound for this model. Hirano et al. (2002) consider the problem of estimating average treatment effects using series estimation methods, and they observe that if one uses the true var(t_i | x_i) = µ_i(1 − µ_i) to replace the estimated variance ˆt_i(1 − ˆt_i) in (the denominator of) ˆτ, the result is a less efficient estimator of τ. The same result holds true for our kernel-based estimator, as the next lemma shows.
Lemma.3 If one replaces the denominator ˆt i (1 ˆt i ) in ˆτ by µ i (1 µ i ), and lets τ denote the resulting estimator of τ, i.e., τ = 1 (t i ˆt i )y i M ni, (.17) n µ i (1 µ i ) then n( τ τ Bh,λ ) N(0, V 1 + V + V 3 ) in distribution, where V 1 and V are the same as that given in Theorem.1, while V 3 is given by { [ (ti µ i ) ] } V 3 = E µ i (1 µ i ) 1 τi. The proof of Lemma.3 is given in Appendix A. 6
We observe that using the true var(t_i | x_i) yields a less efficient estimator than ˆτ, which uses the estimated var(t_i | x_i). The reason for this result is that one can express √n(ˆτ − τ − B_{h,λ}) as

√n(ˆτ − τ − B_{h,λ}) = √n(ˆτ − τ̃) + √n(τ̃ − τ − B_{h,λ}).

In Appendix A we show that √n(τ̃ − τ − B_{h,λ}) = Z_{n1} + Z_{n2} + Z_{n3} + o_p(1) → N(0, V_1 + V_2 + V_3) in distribution, where the Z_{nl}'s (l = 1, 2, 3) are three asymptotically uncorrelated terms having asymptotic N(0, V_l) distributions, respectively (definitions appear in Appendix A). This yields Lemma 2.3. In Appendix A we also show that √n(ˆτ − τ̃) = −Z_{n3} + o_p(1). Hence,

√n(ˆτ − τ − B_{h,λ}) = √n(ˆτ − τ̃) + √n(τ̃ − τ − B_{h,λ}) = {−Z_{n3} + o_p(1)} + {Z_{n1} + Z_{n2} + Z_{n3} + o_p(1)} = Z_{n1} + Z_{n2} + o_p(1) → N(0, V_1 + V_2) in distribution, (2.21)

resulting in Theorem 2.1. That is, since the leading term in √n(ˆτ − τ̃) cancels one of the leading terms in √n(τ̃ − τ − B_{h,λ}), using an estimated variance ˆvar(t_i | x_i) is more efficient than using the true variance var(t_i | x_i) when estimating τ. If one uses the true propensity score µ_i in both the numerator and denominator of ˆτ, then one gets τ̄, which is more efficient than ˆτ, since √n(τ̄ − τ) is asymptotically normal with zero mean and asymptotic variance V_1. Of course, τ̄ is not a feasible estimator. Thus, among the class of feasible ("regular") estimators, ˆτ is asymptotically efficient.

An alternative estimator for τ

In order to construct a consistent estimator of the asymptotic variance V_1 + V_2, we need to obtain, among other things, a consistent estimator of the error u_i. The proposed ˆτ is based on the estimated propensity score; it does not estimate the regression mean function directly. In this subsection we consider an alternative estimator of τ based on direct estimation of E(y_i | x_i, t_i), which of course also leads to a direct estimator of u_i.

Note that (2.6) can also be viewed as a functional coefficient model (smooth coefficient model) as considered by Chen and Tsay (1993), Cai, Fan and Yao (2000), Cai, Fan and Li (2000), and Li et al. (2002), among others. Thus an alternative estimator of τ(x_i) can be obtained by a local regression of y_i on (1, t_i) using kernel weights. In this way we obtain a nonparametric estimator of (g_0(x_i), τ(x_i)) given by

(ĝ_0(x_i), ˆτ_n(x_i))' = [Σ_{j≠i} (1, t_j)'(1, t_j) W_{h,ij} L_{λ̂,ij}]^{−1} Σ_{j≠i} (1, t_j)' y_j W_{h,ij} L_{λ̂,ij}, (2.22)

where W_{h,ij} = W_h(x_i^c, x_j^c) and L_{λ̂,ij} = L(x_i^d, x_j^d, λ̂). Equation (2.22) gives consistent estimators of g_0(x_i) and τ(x_i). For example, the resulting estimator of τ(x_i) is given by

ˆτ_n(x_i) = [Ê(y_i t_i | x_i) − Ê(y_i | x_i) Ê(t_i | x_i)] / [ˆt(x_i)(1 − ˆt(x_i))], (2.23)

where Ê(y_i t_i | x_i) = n^{−1} Σ_{j=1}^n t_j y_j K_{n,ij}/ˆf(x_i), Ê(y_i | x_i) = n^{−1} Σ_{j=1}^n y_j K_{n,ij}/ˆf(x_i), ˆt(x_i) = n^{−1} Σ_{j=1}^n t_j K_{n,ij}/ˆf(x_i), and ˆf(x_i) = n^{−1} Σ_{j=1}^n K_{n,ij}.
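The local least squares problem in (2.22), a regression of y on (1, t) with kernel weights at each evaluation point, can be sketched as follows. The function name and the loop-based solver are illustrative choices, not the paper's code; the kernel weight matrix K is assumed to be precomputed (e.g., from the product kernel defined earlier) with a zeroed diagonal for the leave-one-out form.

```python
import numpy as np

def smooth_coef(y, t, K):
    """Smooth coefficient estimator: at each point i, solve the weighted
    least squares problem of regressing y_j on (1, t_j) with weights K[i, j],
    returning the local intercept g0_hat(x_i) and local slope tau_hat(x_i)."""
    n = len(y)
    g0 = np.empty(n)
    tau = np.empty(n)
    X = np.column_stack([np.ones(n), t])
    for i in range(n):
        w = K[i]
        A = X.T @ (w[:, None] * X)      # sum_j w_ij (1, t_j)'(1, t_j)
        b = X.T @ (w * y)               # sum_j w_ij (1, t_j)' y_j
        g0[i], tau[i] = np.linalg.solve(A, b)
    return g0, tau
```

Averaging the returned `tau` over the sample gives the alternative estimator ˆτ_n = n^{−1} Σ_i ˆτ_n(x_i) described below, and y_i − ĝ_0(x_i) − t_i ˆτ_n(x_i) gives the direct residual estimate û_i.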
From (2.23) we readily obtain a consistent estimator of E(y_i | x_i, t_i) by ĝ_0(x_i) + ˆτ_n(x_i) t_i. One can also estimate τ by ˆτ_n = n^{−1} Σ_{i=1}^n ˆτ_n(x_i).

2.3 Testing No Effect Based on Bootstrapping

The result of Theorem 2.1 can be used to test the null hypothesis of no treatment effect (τ = 0). We know that nonparametric estimation results can be sensitive to the choice of smoothing parameters; therefore, test results based on asymptotic distributions may also be sensitive to the selection of smoothing parameters. In order to obtain a more robust test, we suggest using resampling methods to approximate the null distribution of the test statistic. Below we present two bootstrap procedures. The first procedure does not impose any null hypothesis and is useful for the construction of error bounds, while the second imposes the null hypothesis of no treatment effect.

Let z_i = {y_i, t_i, x_i^c, x_i^d}, i = 1, ..., n, denote the vector of realizations on the outcome, treatment, and conditioning information. We wish to construct the sampling distribution of ˆτ, and do so with the following resampling procedure.

1. Randomly select from {z_j}_{j=1}^n with replacement, and call {z_i*}_{i=1}^n the bootstrap sample.
2. Use the bootstrap sample to compute the bootstrap statistic ˆτ*, using the same cross-validated smoothing parameters as were used for ˆτ.
3. Repeat steps 1 and 2 a large number of times, say B times. The empirical distribution function of {ˆτ_j*}_{j=1}^B is used to approximate the finite-sample distribution of ˆτ.

We now wish to construct the sampling distribution of ˆτ under the null of no treatment effect, and do so with the following bootstrap procedure.

1. Randomly select {y_j*}_{j=1}^n from {y_j}_{j=1}^n with replacement, and call {z_i*}_{i=1}^n = {y_j*, t_j, x_j^c, x_j^d}_{j=1}^n the bootstrap sample. Note that we have broken any systematic relationship between the outcome and the covariates, thereby imposing the null of no treatment effect on the sample {z_i*}_{i=1}^n.
2. Use the bootstrap sample to compute the bootstrap statistic ˆτ*, using the same cross-validated smoothing parameters as were used for ˆτ (ˆτ_n).
3. Repeat steps 1 and 2 a large number of times, say B times. The empirical distribution function of {ˆτ_j*}_{j=1}^B is used to approximate the finite-sample distribution of ˆτ under the null.
4. A test of the null of no treatment effect follows directly. Let {ˆτ_{(j)}*}_{j=1}^B be the order statistics (in ascending order) of the B bootstrap statistics, and let ˆτ_α* denote the upper αth percentile of {ˆτ_{(j)}*}_{j=1}^B. We reject H_0 at level α if ˆτ > ˆτ_α*.

Let ˆV_1, ˆV_2 and ˆB_{h,λ} denote consistent estimators of V_1, V_2 and B_{h,λ}, respectively (say, as given in Appendix A), and let ˆV_1*, ˆV_2* and ˆB_{h,λ}* denote their bootstrap counterparts, obtained by replacing (y_i, x_i, t_i) with (y_i*, x_i*, t_i*) in ˆV_1, ˆV_2 and ˆB_{h,λ}, respectively. Note that the bootstrap counterpart quantities use the same smoothing parameters (they do not require re-cross-validation). The following theorem shows that the bootstrap method works.
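The null-imposing procedure above can be sketched as follows. This is a simplified illustration, not the paper's implementation: for brevity we hold the fitted propensity values fixed across resamples (the paper recomputes ˆτ* on each bootstrap sample, holding only the cross-validated smoothing parameters fixed), and the function name is our own.

```python
import numpy as np

def bootstrap_null_test(y, t, that, B=199, alpha=0.05, rng=None):
    """One-sided bootstrap test of no treatment effect (tau = 0).

    y, t : outcome and treatment arrays; that : fitted propensity E(t|x).
    Resampling y with replacement, independently of (t, x), imposes the null;
    the statistic is compared with the upper-alpha bootstrap percentile."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    w = (t - that) / (that * (1.0 - that))      # weighting-estimator weights
    tau_hat = np.mean(w * y)                    # estimator (2.12), no trimming
    taus = np.empty(B)
    for b in range(B):
        ys = rng.choice(y, size=n, replace=True)   # break y's link to (t, x)
        taus[b] = np.mean(w * ys)
    crit = np.quantile(taus, 1.0 - alpha)       # upper alpha-th percentile
    return tau_hat, crit, tau_hat > crit
```

With a strong treatment effect the statistic lands far above the bootstrap critical value; under the null it exceeds it only about 100α% of the time.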
10 Theorem. Under the same conditions as in Theorem.1, define Tn = n(ˆτ ˆB h,λ )/ ˆV 1 + ˆV Then Tn {x i, t i, y i } n N(0, 1) in distribution in probability. 3 The proof of Theorem. is similar to the proof of Theorem.1 and is thus omitted here. Let ˆT n = n(ˆτ ˆB h,λ )/ ˆV1 + ˆV. Then under H 0 both ˆT n and ˆT n have asymptotic standard normal distributions. Compare the differences between the numerators of the two test statistics: n(ˆτ ˆτ ) + n(bh,λ B h,λ) = n(ˆτ ˆτ ) + ( no q ) p hν+ s = n (ˆτ ˆτ ) + o p (1) because both ˆB h,λ and ˆB h,λ are of order O ( q hν s) and their differences are of smaller order ˆB h,λ ( ˆB h,λ = O q ) p hν+ s. Therefore, when one uses bootstrap procedures to conduct the test, one does not need to compute B h,λ (and Bh,λ ). This is yet another advantage of using bootstrap procedures in this setting. 3 Simulations In this section we report simulations designed to examine the finite sample performance of the proposed methods. We highlight performance in mixed data settings, a feature that existing methods do not handle well, and consider testing for the null of no treatment effect using two kernel methods, a nonparametric propensity score model, and a nonparametric frequency propensity score model (traditional cell-based estimator). We consider the following data generating process (DGP):. y i = g(x c i, x d i, t i ) + ɛ i = g 0 (x c i, x d i ) + [g 1 (x c i, x d i ) g 0 (x c i, x d i )]t i + ɛ i = g 0 (x c i, x d i ) + τ(x c i, x d i )t i + ɛ i = β 0 + β 1 x c i1 + β (x c i1) + β 3 x d i1 + β 4 x d i + τ(x c i, x d i )t i + ɛ i, (3.5) where x c 1 is U[ 1, 1] and xd 1 {0, 1} with P [xd 1 = 1] = 0. and xd {0, 1} with P [xd 1 = 1] = 0.6, and let (β 0, β 1, β, β 3, β 4, τ) = (1,,,, 0, τ), while σ ɛ = 1 (x d is irrelevant). These parameter choices yield an adjusted R of around 66%. Our model for the propensity score (Probit) is t i = γ 0 + γ 1 x c i1 + γ (x c i1) + γ 3 x d i1 + γ 4 x d i + η i, { 1 if Φ(t t i = i ) > otherwise. 
(3.6) where σ η = 1 and Φ is the standard normal CDF. We set (γ 0, γ 1, γ, γ 4, γ 4 ) = ( 1/, 1/4, 1/4, 1/4, 0) (x d is irrelevant). We fix the sample size at for n = 50. These parameter choices yield a correct classification ratio of roughly 60%. For a given value of τ, we generate each replication in the following manner: 1. Draw a sample of size n for {x c 1, xd 1, xd, η, ɛ}, which then determines the values of {t, y}. 3 For the definition of convergence in distribution in probability, see Li et al (003). 9
11 . Using {t i, y i, x c i1, xd i1, xd i }n, compute ˆτ using each of the four kernel methods. 3. Compute tests for the null of no effect for each of the four kernel methods based on 199 bootstrap replications under H 0 at nominal levels α = 0.10, 0.05, Repeat steps 1 through 3 B = 1000 times for values of τ in {0.0, 0.5, 0.50, 0.75}. All bandwidths were selected via cross-validation based upon two restarts of a multidimensional numerical search routine allowing for different bandwidths for all variables. The results are summarized in tables 1 and. Table 1 presents results for the proposed method which employs nonparametric kernel approach appropriate for mixed data ( smooth ), while Table presents the traditional frequency approach, which involves splitting the data into subsamples ( nonsmooth ). Table 1: Empirical Rejection Frequencies for Smooth Propensity Score Model. τ α = 0.10 α = 0.05 α = Table : Empirical Rejection Frequencies for Non-Smooth Propensity Score Model. τ α = 0.10 α = 0.05 α = We observe first that the smooth approach is correctly sized and has power in the direction of the alternative DGP. The traditional nonsmooth propensity score approach suffers from minor size distortions, suggesting that it is more susceptible to efficiency losses arising from sample splitting than the nonsmooth functional coefficient approach. The important comparison lies with the performance of the proposed smooth and the traditional nonsmooth approaches. It can be seen that, as expected, sample splitting leads to efficiency loss, which manifests itself as a loss in power. Note that we have only considered one binary irrelevant covariate case, when there exists more irrelevant covariates, or one irrelevant covariate that takes more than two different values, the power loss becomes more substantial. 
Summarizing, the proposed smooth test is more powerful than a traditional frequency-based (nonsmooth) test when confronted with mixed data settings, which are often encountered in applied work.
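The simulation DGP in (3.5)-(3.6) can be generated as sketched below, using the parameter values as reconstructed above, and noting that Φ(t_i*) > 1/2 is equivalent to t_i* > 0. This is an illustrative sketch of the data generation step only, not the authors' simulation code.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 250

# covariates: one continuous, two binary discrete (xd2 is irrelevant)
xc1 = rng.uniform(-1, 1, n)
xd1 = (rng.uniform(size=n) < 0.2).astype(int)   # P[xd1 = 1] = 0.2
xd2 = (rng.uniform(size=n) < 0.6).astype(int)   # P[xd2 = 1] = 0.6
tau = 0.5                                       # one value from the tau grid

# Probit-style treatment assignment (3.6) with
# (gamma0..gamma4) = (-1/2, 1/4, 1/4, 1/4, 0); Phi(t*) > 1/2 iff t* > 0
tstar = -0.5 + 0.25 * xc1 + 0.25 * xc1**2 + 0.25 * xd1 + 0.0 * xd2 \
        + rng.normal(size=n)
t = (tstar > 0).astype(float)

# outcome equation (3.5) with (beta0..beta4) = (1, 2, 2, 2, 0)
y = 1 + 2 * xc1 + 2 * xc1**2 + 2 * xd1 + 0 * xd2 + tau * t \
    + rng.normal(size=n)
```

Each Monte Carlo replication redraws {x_1^c, x_1^d, x_2^d, η, ε}, which then determines {t, y}, before ˆτ and the bootstrap test are computed.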
4 An Empirical Application

We will be interested in models that make use of the following variables,⁴ which were taken from the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT). The data were obtained from the Department of Health Evaluation Sciences at the University of Virginia:⁵

Y: Outcome - 1 if death occurred within 180 days, zero otherwise
T: Treatment - 1 if a Swan-Ganz catheter was received by the patient when hospitalized, zero otherwise
X1: Sex - 0 for female, 1 for male
X2: Race - 0 if black, 1 if white, 2 if other
X3: Income - 0 if under $11K, 1 if $11-25K, 2 if $25-50K, 3 if over $50K
X4: Primary disease category - 1 if Acute Respiratory Failure, 2 if Congestive Heart Failure, 3 if Chronic Obstructive Pulmonary Disease, 4 if Cirrhosis, 5 if Colon Cancer, 6 if Coma, 7 if Lung Cancer, 8 if Multiple Organ System Failure with Malignancy, 9 if Multiple Organ System Failure with Sepsis
X5: Secondary disease category - 1 if Cirrhosis, 2 if Colon Cancer, 3 if Coma, 4 if Lung Cancer, 5 if Multiple Organ System Failure with Malignancy, 6 if Multiple Organ System Failure with Sepsis, 7 if NA
X6: Medical insurance - 1 if Medicaid, 2 if Medicare, 3 if Medicare & Medicaid, 4 if No insurance, 5 if Private, 6 if Private & Medicare
X7: Age - age in years (converted from Y/M/D data stored with decimal accuracy)

Table 3 presents some summary statistics for the variables described above. The number of cells in this dataset is 18,144, which exceeds the number of records, 5,735. Note that, as was found by Connors et al. (1996), those receiving right heart catheterization are more likely to die within 180 days than those who did not.
Interestingly, Lin et al. (1998) also find that when further adjustments are made, the risk of death is lower than that reported by Connors et al. (1996), and they conclude that "results of our sensitivity analysis provide additional insights into this important study and imply perhaps greater uncertainty about the role of RHC than those stated in the original report."

⁴ The Connors et al. (1996) study considered 30-day, 60-day, and 180-day survival, and also considered categories of admission diagnosis and categories of comorbidities illness as covariates. We restrict attention to 180-day survival by way of example, while we ignore admission diagnosis and comorbidities illness due to the prevalence of missing observations among these covariates. As it is our intention to demonstrate the utility of the proposed methods on actual data, and not to become immersed in ad hoc adjustments that must be made to handle the prevalence of missing data for these additional covariates, we beg the reader's forgiveness in this matter. Nevertheless, even though we omit admission diagnosis and comorbidities illness as covariates, we indeed detect results that are qualitatively and quantitatively similar to those reported in Connors et al. (1996) and Lin et al. (1998).

⁵ We are most grateful to Dr. B. Knaus and Dr. F. Harrell Jr. for making this data available to us, and also to Luz Saavedra for helping with data conversion.
[Table 3: Summary Statistics - the mean, standard deviation, minimum, and maximum of Outcome, Treatment, Sex, Race, Income, Primary disease category, Secondary disease category, Medical insurance, and Age.]

Lin et al. (1998) note that cardiologists' and intensive care physicians' belief in the efficacy of RHC for guiding therapy for certain patients is so strong that it has prevented the conduct of a randomized clinical trial (RCT), while Connors et al. (1996) note that the most recent attempt at an RCT was stopped because most physicians refused to allow their patients to be randomized.

The correct classification rates (CR) from the confusion matrix for the parametric propensity score model are: Sample Size 5,735; CR(0-1) = 66.7%; CR(0) = 80.0%; CR(1) = 45.2%.

The corresponding rates for the nonparametric propensity score model are: Sample Size 5,735; CR(0-1) = 69.9%; CR(0) = 82.1%; CR(1) = 50.0%.

An examination of these confusion matrices demonstrates that, for this dataset, the nonparametric approach is better able than the Logit model to predict who receives treatment and who does not. The parametric approach correctly predicts 3,828 of the 5,735 patients, while the nonparametric approach correctly predicts 3,976 patients, thereby predicting an additional 148 patients correctly. The differences between the parametric and nonparametric versions of the weighting estimator reflect this additional number of correctly classified patients along with differences in the estimated probabilities of treatment themselves. The increased risk suggested by the parametric model drops from a 7% increase for those receiving RHC to roughly 0%.
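The correct classification rates reported above can be computed directly from actual and predicted treatment indicators; a minimal sketch (the toy data are hypothetical, not the SUPPORT sample):

```python
import numpy as np

def correct_classification_rates(actual, predicted):
    """Correct classification rates in the sense used above:
    CR(0-1): overall share of correct predictions,
    CR(0):   share of actual non-treated (0) predicted correctly,
    CR(1):   share of actual treated (1) predicted correctly."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    overall = float(np.mean(actual == predicted))
    cr0 = float(np.mean(predicted[actual == 0] == 0))
    cr1 = float(np.mean(predicted[actual == 1] == 1))
    return overall, cr0, cr1

# Toy example with 8 hypothetical patients:
actual    = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 0, 0]
print(correct_classification_rates(actual, predicted))  # (0.625, 0.75, 0.5)
```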
Based upon the parametric propensity score estimate, the treatment effect is 0.072, while the nonparametric propensity score estimate yields a negative treatment effect. We then bootstrapped the sampling distribution of these estimates, and obtained 95% coverage error bounds of [0.044, 0.099] for the parametric approach and [-0.038, -0.011] for the nonparametric approach. Thus, we overturn the parametric testing result and conclude that patients receiving RHC treatment do not suffer an increased risk.

We also conducted some sensitivity analysis. Using likelihood cross-validation rather than least-squares cross-validation yielded 95% coverage error bounds of [-0.034, -0.013]. Out of concern that the nonparametric results might reflect overfitting, we computed the leave-one-out kernel predictions, and again bootstrapped their error bounds. For the least-squares cross-validated estimates we obtained 95% coverage error bounds of [-0.015, 0.037], while for the likelihood cross-validated estimates we obtained 95% coverage error bounds of [-0.007, 0.039]. These error bounds indicate that the parametric model suggests a statistically significant increased risk of death for those receiving RHC, while the nonparametric model yields no significant increase. This does not appear to be a result of a loss of efficiency from using the nonparametric propensity score rather than the parametric one, as can be seen from the out-of-sample prediction results in the confusion matrices: if the parametric model were correctly specified, one would expect it to predict better than a nonparametric model.

A Appendix A: Definition of V̂_1, V̂_2 and B̂_{h,λ}

V̂_1 = n^{-1} Σ_{i=1}^n [τ̂_{ni} - m̂_τ]^2, where τ̂_{ni} is defined in (2.3) and m̂_τ = n^{-1} Σ_{j=1}^n τ̂_{nj} is the sample mean of the τ̂_n(x_j).

V̂_2 = n^{-1} Σ_{i=1}^n û_i^2 (t_i - t̂_i)^2 / [t̂_i(1 - t̂_i)]^2, where t̂_i is the kernel estimator of E(t_i | x_i) and û_i = y_i - ĝ_0(x_i) - t_i τ̂_n(x_i), with ĝ_0(x_i) defined in (2.2).
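The weighting estimator whose variance components are defined above, together with the percentile bootstrap used for the coverage bounds, can be sketched as follows; this is our own illustration of the form τ̂ = n^{-1} Σ_i (t_i - p̂_i) y_i / [p̂_i(1 - p̂_i)], with a pre-computed propensity score p̂ passed in (the paper re-estimates the propensity score nonparametrically, which this sketch does not do):

```python
import numpy as np

def ate_weighting(y, t, phat):
    """Propensity-score weighting estimator of the average treatment effect:
    tau_hat = n^{-1} sum_i (t_i - phat_i) * y_i / [phat_i * (1 - phat_i)]."""
    y, t, phat = map(np.asarray, (y, t, phat))
    return np.mean((t - phat) * y / (phat * (1.0 - phat)))

def bootstrap_ci(y, t, phat, n_boot=999, alpha=0.05, seed=42):
    """Percentile bootstrap bounds; resamples (y, t, phat) triples jointly.
    A sketch only: re-estimating the propensity score on each resample
    would be more faithful to the procedure described in the text."""
    y, t, phat = map(np.asarray, (y, t, phat))
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = [ate_weighting(y[idx], t[idx], phat[idx])
             for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```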
To estimate B_{h,λ} we need to estimate B_{1s} (s = 1, ..., q) and B_{2s} (s = 1, ..., r). These quantities can be estimated by estimating f(x_i) and m(x_i) = g_0(x_i) + μ(x_i)τ(x_i) by f̂(x_i) and m̂(x_i) = ĝ_0(x_i) + μ̂(x_i)τ̂_n(x_i), respectively; estimators of their derivatives can be obtained by differentiating (since the kernel function is differentiable up to order ν). Finally, replacing the population mean E(·) by a sample mean leads to consistent estimators of B_{1s} and B_{2s}, and hence of B_{h,λ}.

In Appendices A and B, because M_n(x) ≡ 1 on the support of f(x), we will omit the trimming function M_{ni}. Also, we will use the notation (s.o.), defined as follows: when B_n is the leading term of A_n, we write A_n = B_n + (s.o.), where (s.o.) denotes terms having probability order smaller than B_n. Also, when we write A(x_i) = B(x_i) + (s.o.), it means that n^{-1} Σ_i A(x_i) = n^{-1} Σ_i B(x_i) + (s.o.).

Proof of Lemma 2.2. We use the notation g_s^{(l)}(x) to denote ∂^l g(x)/∂(x_s^c)^l, the lth order partial derivative of g(·) with respect to x_s^c. Also, when x_s^d is an unordered categorical variable, define an indicator function I_s(·,·) by

I_s(x^d, z^d) = 1(x_s^d ≠ z_s^d) ∏_{t≠s}^r 1(x_t^d = z_t^d).  (A.1)

When x_s^d is an ordered categorical variable, for notational simplicity, we assume that x_s^d takes (finitely
many) consecutive integer values, and I_s(·,·) is defined by

I_s(x^d, z^d) = 1(|x_s^d - z_s^d| = 1) ∏_{t≠s}^r 1(x_t^d = z_t^d).  (A.2)

When using a second order kernel (ν = 2), Hall et al. (2004a,b) have shown that

CV(h,λ)|_{ν=2} = Σ_{x^d} ∫ { Σ_{s=1}^q (κ_2/2)[(fg)_s^{(2)}(x) - g(x)f_s^{(2)}(x)] h_s^2 + Σ_{s=1}^r Σ_{v^d} I_s(v^d, x^d)[g(x^c, v^d) - g(x)] f(x^c, v^d) λ_s }^2 S(x) f(x)^{-1} dx^c + [κ^q/(n h_1 ⋯ h_q)] Σ_{x^d} ∫ σ^2(x) S(x) dx^c + (s.o.),  (A.3)

where κ_2 = ∫ w(v)v^2 dv, κ = ∫ w(v)^2 dv, and I_s(·) is defined in (A.1) and (A.2). By following exactly the same derivation as in Hall et al. (2004a,b), one can show that, with a νth order kernel,

CV(h,λ) = Σ_{x^d} ∫ { Σ_{s=1}^q B_{1s}(x) h_s^ν + Σ_{s=1}^r B_{2s}(x) λ_s }^2 S(x) f(x)^{-1} dx^c + [κ^q/(n h_1 ⋯ h_q)] Σ_{x^d} ∫ σ^2(x) S(x) dx^c + (s.o.),  (A.4)

where B_{1s}(x) = (κ_ν/ν!)[(fg)_s^{(ν)}(x) - g(x)f_s^{(ν)}(x)] (s = 1, ..., q), κ_ν = ∫ w(v)v^ν dv, B_{2s}(x) = Σ_{v^d} I_s(v^d, x^d)[g(x^c, v^d) - g(x)] f(x^c, v^d), and (s.o.) denotes terms having smaller probability order, uniformly in (h,λ) ∈ (0, η_n]^{p+r}. The only difference between (A.3) and (A.4) is that h_s^2 is replaced by h_s^ν and that the definition of B_{1s} is slightly different. Of course, (A.4) reduces to (A.3) if ν = 2.

Define a_s via h_s = a_s n^{-1/(2ν+q)} (s = 1, ..., q), and b_s via λ_s = b_s n^{-ν/(2ν+q)} (s = 1, ..., r); then (A.4) can be written as CV(h,λ) = n^{-2ν/(2ν+q)} χ(a,b) + (s.o.) uniformly in (h,λ) ∈ (0, η_n]^{p+r}, where

χ(a,b) = Σ_{x^d} ∫ { Σ_{s=1}^q B_{1s}(x) a_s^ν + Σ_{s=1}^r B_{2s}(x) b_s }^2 S(x) f(x)^{-1} dx^c + [κ^q/(a_1 ⋯ a_q)] Σ_{x^d} ∫ σ^2(x) S(x) dx^c.  (A.5)

We assume that there exist unique finite positive constants a_s^0 (s = 1, ..., q) and finite non-negative constants b_s^0 (s = 1, ..., r) that minimize χ(a,b).  (A.6)

Define a (q+r) × (q+r) positive semidefinite matrix A via its tth row, sth column element A_{t,s} = Σ_{x^d} ∫ B̃_t(x) B̃_s(x) S(x) f(x)^{-1} dx^c, where B̃_t(x) = B_{1t}(x) for t = 1, ..., q, and B̃_{q+t}(x) = B_{2t}(x) for
t = 1, ..., r. Then it can easily be shown that a sufficient condition for (A.6) to hold is that A is positive definite. From (A.4), (A.5) and (A.6), we obtain that ĥ_s = a_s^0 n^{-1/(2ν+q)} + o_p(n^{-1/(2ν+q)}) and λ̂_s = b_s^0 n^{-ν/(2ν+q)} + o_p(n^{-ν/(2ν+q)}); Lemma 2.1 then follows.

Proof of Theorem 2.1. In order to make the proof of Theorem 2.1 more manageable we make a number of simplifying assumptions. (i) We replace t̂(x_i) in the definition of τ̂ by the leave-one-out estimator t̂_{-i}(x_i); equivalently, one can redefine τ̂ by replacing t̂(x_i) with t̂_{-i}(x_i). (ii) We replace ĥ_s by the non-stochastic quantity h_s^0 = a_s^0 n^{-1/(2ν+q+2)} (s = 1, ..., q), and λ̂_s by λ_s^0 = b_s^0 n^{-2/(2ν+q+2)} (s = 1, ..., r). (iii) When we evaluate the probability order of a term, we sometimes assume that h_s = h for all s = 1, ..., q, to simplify the notation; for example, we will write O(h^ν) for O(Σ_{s=1}^q h_s^ν) to save space. Note that the proof carries through without making these simplifying assumptions; for example, ignoring the leave-one-out estimator only introduces some extra smaller-order terms. Lemma 2.2 shows that ĥ_s/h_s - 1 = o_p(1) and λ̂_s/λ_s - 1 = o_p(1), and by the stochastic equicontinuity result of Ichimura (2000), we know that the asymptotic distribution of τ̂ remains the same whether one uses the ĥ_s's and λ̂_s's, or their non-stochastic leading terms (the h_s^0's and λ_s^0's).

We will repeatedly use the U-statistic H-decomposition in the proofs below. We sometimes write n^{-1} for (n-1)^{-1} since this approximation does not affect the order of any quantity considered. We will use the short-hand notation t̂_i = t̂_{-i}(x_i) and f̂_i = f̂_{-i}(x_i), i.e.,

t̂_i = [ n^{-1} Σ_{j≠i} t_j K_n(x_j, x_i) ] / f̂_i,  (A.7)

with f̂_i = n^{-1} Σ_{j≠i} K_n(x_j, x_i).
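The product kernel K_n above combines continuous and discrete components. A minimal sketch of such a mixed-data product kernel follows; the function names are ours, and we assume Li-Racine-type discrete kernels (weight 1 when categories match, λ otherwise for unordered covariates, λ^{|x-z|} for ordered ones), which is consistent with the indicator functions I_s in (A.1)-(A.2):

```python
import numpy as np

def unordered_kernel(x, z, lam):
    # l(x, z; lambda) = 1 if x == z, lambda otherwise (0 <= lambda <= 1).
    # lambda = 1 gives uniform weight, i.e., the covariate is smoothed out.
    return np.where(x == z, 1.0, lam)

def ordered_kernel(x, z, lam):
    # l(x, z; lambda) = lambda^{|x - z|} for integer-valued ordered covariates.
    return lam ** np.abs(x - z)

def product_kernel(xc, zc, h, xd, zd, lam):
    # Continuous part: product of normal kernels with bandwidth vector h.
    cont = np.prod(np.exp(-0.5 * ((xc - zc) / h) ** 2) / (np.sqrt(2 * np.pi) * h))
    # Discrete part: product of unordered-covariate kernels.
    disc = np.prod(unordered_kernel(xd, zd, lam))
    return cont * disc
```

Setting λ near 1 for a discrete covariate removes it from the fit, which is the mechanism behind the automatic removal of irrelevant covariates discussed earlier.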
Define v_i = t_i - E(t_i | x_i) ≡ t_i - μ_i, so that t_i = μ_i + v_i. Replacing t_j by μ_j + v_j in the right-hand side of (A.7), we have

t̂_i = μ̂_i + v̂_i,  (A.8)

where

μ̂_i = [ n^{-1} Σ_{j≠i} μ_j K_h((x_j - x_i)/h) ] / f̂_i,  (A.9)

and

v̂_i = [ n^{-1} Σ_{j≠i} v_j K_h((x_j - x_i)/h) ] / f̂_i.  (A.10)

We use the short-hand notation w_i and ŵ_i defined by

w_i = μ_i(1 - μ_i) and ŵ_i = t̂_i(1 - t̂_i).  (A.11)

We use the following identity to handle the random denominator of τ̂:

1/ŵ_i = 1/w_i + (w_i - ŵ_i)/w_i^2 + (w_i - ŵ_i)^2/(w_i^2 ŵ_i).  (A.12)
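The leave-one-out kernel estimator t̂_i = t̂_{-i}(x_i) used in (A.7)-(A.10) can be made concrete with a short sketch; we assume one continuous covariate and a normal kernel purely for illustration:

```python
import numpy as np

def leave_one_out_nw(x, t, h):
    """Leave-one-out Nadaraya-Watson estimate, as in (A.7):
    that_i = sum_{j != i} t_j K((x_j - x_i)/h) / sum_{j != i} K((x_j - x_i)/h).
    One continuous covariate; normal kernel."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    diffs = (x[None, :] - x[:, None]) / h   # entry (i, j) holds (x_j - x_i)/h
    K = np.exp(-0.5 * diffs ** 2)           # unnormalized normal kernel weights
    np.fill_diagonal(K, 0.0)                # leave observation i out
    return K @ t / K.sum(axis=1)
```

Since observation i gets zero weight in its own fit, these predictions are the "leave-one-out kernel predictions" used in the overfitting check of the empirical section.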
We have defined τ̄ in (2.17). We now define another intermediate quantity τ̃ (v_i = t_i - μ_i, and we omit M_{ni} for notational simplicity):

τ̃ = n^{-1} Σ_{i=1}^n (t_i - μ_i) y_i / w_i ≡ n^{-1} Σ_{i=1}^n v_i y_i / w_i.  (A.13)

By adding and subtracting terms in √n(τ̂ - τ), we get

√n(τ̂ - τ) = √n[(τ̂ - τ̄) + (τ̄ - τ̃) + (τ̃ - τ)] ≡ J_{1n} + J_{2n} + J_{3n},  (A.14)

where J_{1n} = √n(τ̂ - τ̄), J_{2n} = √n(τ̄ - τ̃), and J_{3n} = √n(τ̃ - τ). Recall that w_i = μ_i(1 - μ_i) and ŵ_i = t̂_i(1 - t̂_i). Using (A.12), we get from (2.1)

τ̂ = n^{-1} Σ_{i=1}^n (t_i - t̂_i) y_i [ 1/w_i + (w_i - ŵ_i)/w_i^2 + (w_i - ŵ_i)^2/(w_i^2 ŵ_i) ] ≡ L_{1n} + L_{2n} + L_{3n},  (A.15)

where

L_{1n} = n^{-1} Σ_i (t_i - t̂_i) y_i / w_i ≡ τ̄,
L_{2n} = n^{-1} Σ_i (t_i - t̂_i)(w_i - ŵ_i) y_i / w_i^2,
L_{3n} = n^{-1} Σ_i (t_i - t̂_i)(w_i - ŵ_i)^2 y_i / (w_i^2 ŵ_i).  (A.16)

Note that L_{1n} = τ̄; therefore, by (A.15) we have

J_{1n} = √n(τ̂ - τ̄) = √n L_{2n} + √n L_{3n}.  (A.17)

Lemmas A.3 and A.4 (see below) give the leading terms of J_{1n} and J_{2n}. Using (2.6) and adding and subtracting terms, we write J_{3n} = √n(τ̃ - τ) as

J_{3n} = n^{-1/2} Σ_i (v_i y_i/w_i - τ) = n^{-1/2} Σ_i (v_i y_i/w_i - τ_i) + n^{-1/2} Σ_i (τ_i - τ)
= n^{-1/2} Σ_i [v_i(g_{0i} + τ_i t_i + u_i)/w_i - τ_i] + n^{-1/2} Σ_i (τ_i - τ).  (A.18)
By Eq. (A.18), Lemma A.3 and Lemma A.4, we obtain from Eq. (A.14) that

√n(τ̂ - τ - B_{h,λ}) = J_{1n} + J_{2n} - n^{1/2} B_{h,λ} + J_{3n}
= n^{-1/2} Σ_i v_i[2μ_i - 1]τ_i/w_i - n^{-1/2} Σ_i v_i(g_{0i} + τ_i μ_i)/w_i + n^{-1/2} Σ_i {v_i(g_{0i} + τ_i t_i + u_i)/w_i - τ_i} + n^{-1/2} Σ_i (τ_i - τ) + o_p(1)
= n^{-1/2} Σ_i v_i u_i/w_i + n^{-1/2} Σ_i (τ_i - τ) + o_p(1)
≡ Z_{n2} + Z_{n3} + o_p(1),  (A.19)

which defines Z_{n2} = n^{-1/2} Σ_i v_i u_i/w_i and Z_{n3} = n^{-1/2} Σ_i (τ_i - τ); also, in the above we have used the following cancellation result (w_i = μ_i(1 - μ_i)):

n^{-1/2} Σ_i [v_i(2μ_i - 1)/w_i]τ_i + n^{-1/2} Σ_i [v_i(t_i - μ_i)/w_i]τ_i - n^{-1/2} Σ_i τ_i
= n^{-1/2} Σ_i [2μ_i v_i - v_i + v_i^2 - μ_i(1 - μ_i)] τ_i / w_i
= -n^{-1/2} Σ_i [(μ_i + v_i) - (μ_i + v_i)^2] τ_i / w_i = 0,  (A.20)

since (μ_i + v_i) - (μ_i + v_i)^2 = t_i - t_i^2 = 0 (because t_i^2 = t_i). Theorem 2.1 follows from (A.19) and the Lindeberg central limit theorem.

Proof of Lemma 2.3. From √n(τ̄ - τ - B_{h,λ}) = √n(τ̄ - τ̃) + √n(τ̃ - τ) - n^{1/2} B_{h,λ} = J_{2n} - n^{1/2} B_{h,λ} + J_{3n}, and using (A.18) and Lemma A.4, we have (v_i = t_i - μ_i)

√n(τ̄ - τ - B_{h,λ}) = J_{2n} - n^{1/2} B_{h,λ} + J_{3n}
= -n^{-1/2} Σ_i v_i(g_{0i} + τ_i μ_i)/w_i + n^{-1/2} Σ_i [v_i(g_{0i} + τ_i t_i + u_i)/w_i - τ_i] + n^{-1/2} Σ_i (τ_i - τ) + o_p(1)
= n^{-1/2} Σ_i [v_i^2/w_i - 1]τ_i + n^{-1/2} Σ_i v_i u_i/w_i + n^{-1/2} Σ_i (τ_i - τ) + o_p(1)
≡ Z_{n1} + Z_{n2} + Z_{n3} + o_p(1) →_d N(0, V_1 + V_2 + V_3) by the Lindeberg central limit theorem,  (A.21)
where Z_{n2} and Z_{n3} are defined in (A.19), and Z_{n1} = n^{-1/2} Σ_i [v_i^2/w_i - 1]τ_i. Note that by Lemma A.3 and (A.20) we know that J_{1n} = -Z_{n1} + o_p(1).

Below we present some lemmas that are used in proving Theorem 2.1. We will use the following identity to handle the random denominator in the kernel estimator. For any positive integer p we have

1/f̂_i = 1/f_i + Σ_{l=1}^p (f_i - f̂_i)^l / f_i^{l+1} + (f_i - f̂_i)^{p+1} / (f_i^{p+1} f̂_i).  (A.22)

For example, in Lemma A.1 below we need to evaluate a term like n^{-1} Σ_i v_i y_i v̂_i/w_i^2, where v̂_i = n^{-1} Σ_{j≠i} v_j K_{n,ij} / f̂_i has a random denominator f̂_i. By computing the second moment of the term associated with (f_i - f̂_i)^l / f_i^{l+1}, one can easily show that this term has a smaller order than the main term associated with 1/f_i. Also, using the uniform convergence rate sup_{x∈S} |f̂(x) - f(x)| = O_p(Σ_{s=1}^q h_s^ν + [ln n/(n h_1 ⋯ h_q)]^{1/2}), together with inf_{x∈S} f(x) ≥ δ > 0, one can easily show that the last remainder term, associated with (f_i - f̂_i)^{p+1}/(f_i^{p+1} f̂_i), is of smaller order than the first leading term (by choosing p sufficiently large). Therefore, using (A.22) we have n^{-1} Σ_i v_i y_i v̂_i/w_i^2 = n^{-1} Σ_i v_i y_i v̂_i f̂_i/(f_i w_i^2) + (s.o.). The leading term n^{-1} Σ_i v_i y_i v̂_i f̂_i/(f_i w_i^2) does not contain the random denominator f̂_i, and its probability order can be easily evaluated by using the H-decomposition of U-statistics.

Lemma A.1. L_{2n} = n^{-1} Σ_i v_i(2μ_i - 1)τ_i/w_i + o_p(n^{-1/2}).

Proof: Recall that w_i = μ_i(1 - μ_i), ŵ_i = t̂_i(1 - t̂_i), t_i = μ_i + v_i, and t̂_i = μ̂_i + v̂_i. We have

L_{2n} = n^{-1} Σ_i y_i(t_i - t̂_i)[(μ_i - t̂_i) - (μ_i^2 - t̂_i^2)]/w_i^2
= n^{-1} Σ_i y_i(t_i - t̂_i)(μ_i - t̂_i)[1 - (μ_i + t̂_i)]/w_i^2
= n^{-1} Σ_i y_i(μ_i - μ̂_i + v_i - v̂_i)(μ_i - μ̂_i - v̂_i)[1 - (μ_i + μ̂_i + v̂_i)]/w_i^2
= n^{-1} Σ_i y_i v_i v̂_i(2μ_i - 1)/w_i^2 + o_p(n^{-1/2})   (using μ̂_i = μ_i + (μ̂_i - μ_i) and Lemma B.3)
= [n(n-1)]^{-1} Σ_i Σ_{j≠i} y_i v_i v_j(2μ_i - 1) K_{n,ij}/(f_i w_i^2) + o_p(n^{-1/2})   (by Lemma B.2 and E(v_i | x_i) = 0)
= [2/(n(n-1))] Σ_i Σ_{j>i} H_{n,a}(z_i, z_j) + (s.o.),  (A.24)
where H_{n,a}(z_i, z_j) = (1/2) v_i v_j {y_i(2μ_i - 1)/[f_i w_i^2] + y_j(2μ_j - 1)/[f_j w_j^2]} K_{n,ij} and z_i = (x_i, t_i, u_i). Define H_{1n,a}(z_i) = E[H_{n,a}(z_i, z_j) | z_i] = (1/2) v_i τ_i(2μ_i - 1)/w_i + (s.o.) by Lemma B.4 (i). Hence, by the U-statistic H-decomposition we have

L_{2n} = [2/(n(n-1))] Σ_i Σ_{j>i} H_{n,a}(z_i, z_j) + (s.o.)
= 0 + (2/n) Σ_i H_{1n,a}(z_i) + [2/(n(n-1))] Σ_i Σ_{j>i} {H_{n,a}(z_i, z_j) - H_{1n,a}(z_i) - H_{1n,a}(z_j) + 0} + (s.o.)
= n^{-1} Σ_i v_i τ_i(2μ_i - 1)/w_i + O_p((n h^{q/2})^{-1}) + (s.o.)
= n^{-1} Σ_i v_i τ_i(2μ_i - 1)/w_i + o_p(n^{-1/2})  (A.25)

by Lemma B.4 and assumption (A3), where we also used the fact that the degenerate U-statistic U_{n,a} ≡ [2/(n(n-1))] Σ_i Σ_{j>i} {H_{n,a}(z_i, z_j) - H_{1n,a}(z_i) - H_{1n,a}(z_j)} has a second moment of E[U_{n,a}^2] = O((n^2 h^q)^{-1}), so U_{n,a} = O_p((n h^{q/2})^{-1}).

Lemma A.2. L_{3n} = O_p(h^{2ν} + h^2(n h^q)^{-1}) = o_p(n^{-1/2}).

Proof: Using the identity

1/ŵ_i = 1/w_i + Σ_{l=1}^p (w_i - ŵ_i)^l / w_i^{l+1} + (w_i - ŵ_i)^{p+1} / (w_i^{p+1} ŵ_i),  (A.26)

one can show that the leading term of L_{3n} is L_{3n,1} = n^{-1} Σ_i y_i(t_i - t̂_i)(w_i - ŵ_i)^2/w_i^3. This is because (i) it is easy to show (by computing second moments) that the term associated with (w_i - ŵ_i)^l / w_i^{l+1} has a smaller order than the main term associated with 1/w_i. Also, using the uniform convergence rate sup_{x∈S} |μ̂(x) - μ(x)| = O_p(Σ_{s=1}^q h_s^ν + [ln n/(n h_1 ⋯ h_q)]^{1/2}), together with inf_{x∈S} μ(x) ≥ c_2 > 0 and sup_{x∈S} μ(x) ≤ c_1 < 1 (0 < c_2 < c_1 < 1), one can easily show that the last remainder term, associated with (w_i - ŵ_i)^{p+1}/(w_i^{p+1} ŵ_i), is of smaller order than the first leading term (by choosing p sufficiently large if needed). By noting that t_i = μ_i + v_i and w_i - ŵ_i = (μ_i - t̂_i)[1 - (μ_i + t̂_i)], we have

L_{3n,1} = n^{-1} Σ_i y_i(t_i - t̂_i)(w_i - ŵ_i)^2/w_i^3
= n^{-1} Σ_i [g_{0i} + τ_i t_i + u_i][(μ_i - t̂_i) + v_i](μ_i - t̂_i)^2[1 - (μ_i + t̂_i)]^2/w_i^3
≈ n^{-1} Σ_i [v_i(μ_i - t̂_i)^2 + (μ_i - t̂_i)^3] = O_p(h^{2ν} + h^2(n h^q)^{-1})  (A.27)

by Lemma B.3, where in the above A ≈ B means that A = B + (s.o.).
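The H-decomposition used repeatedly above splits a U-statistic into a constant, a sum of i.i.d. projection terms, and a degenerate remainder. The split is an exact algebraic identity (for any choice of the centering constant and projections), which a small numeric check makes concrete; this is our own illustration with the toy kernel H(z_i, z_j) = z_i z_j:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
z = rng.normal(size=30)
n = len(z)
pairs = list(combinations(range(n), 2))

H = lambda zi, zj: zi * zj                     # example kernel H(z_i, z_j)
U = np.mean([H(z[i], z[j]) for i, j in pairs]) # the U-statistic

theta = np.mean(z) ** 2                        # plug-in estimate of E[H]
H1 = np.mean(z) * (z - np.mean(z))             # projection terms H1(z_i)
# Degenerate remainder: pair-average of H - H1(z_i) - H1(z_j) - theta.
degenerate = np.mean([H(z[i], z[j]) - H1[i] - H1[j] - theta for i, j in pairs])

# Exact identity: U = theta + (2/n) * sum_i H1(z_i) + degenerate remainder.
lhs = U
rhs = theta + (2.0 / n) * H1.sum() + degenerate
assert abs(lhs - rhs) < 1e-12
```

The asymptotic arguments above rest on the remainder being a degenerate U-statistic, hence of smaller stochastic order than the projection sum.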
Lemma A.3. J_{1n} = n^{-1/2} Σ_i v_i(2μ_i - 1)τ_i/w_i + o_p(1).

Proof: It follows from Lemmas A.1 and A.2.

Lemma A.4. J_{2n} = n^{1/2} B_{h,λ} - n^{-1/2} Σ_{i=1}^n v_i(g_{0i} + τ_i μ_i)/w_i + o_p(1).
Proof: Using t̂_i = μ̂_i + v̂_i, we have J_{2n} = n^{-1/2} Σ_i (μ_i - t̂_i) y_i/w_i = n^{-1/2} Σ_i (μ_i - μ̂_i) y_i/w_i - n^{-1/2} Σ_i v̂_i y_i/w_i ≡ J_{2n,1} - J_{2n,2}. We consider J_{2n,1} first:

J_{2n,1} = n^{-1/2} Σ_i (μ_i - μ̂_i) y_i/w_i = n^{1/2} B_{h,λ} + O_p(n^{1/2} h^{ν+2} + h(n h^q)^{-1/2})

by Lemma B.2, where B_{h,λ} is defined in Lemma B.2. Next,

J_{2n,2} = n^{-1/2} Σ_i v̂_i f̂_i y_i/(f_i w_i) + o_p(1)   (by using Eq. (A.22))
= n^{-1/2}(n-1)^{-1} Σ_i Σ_{j≠i} v_j y_i K_{n,ij}/(f_i w_i)
= n^{-1/2}(n-1)^{-1} Σ_i Σ_{j>i} {v_j y_i/(f_i w_i) + v_i y_j/(f_j w_j)} K_{n,ij}
= [2/(n^{1/2}(n-1))] Σ_i Σ_{j>i} H_{n,b}(z_i, z_j),  (A.29)

where H_{n,b}(z_i, z_j) = (1/2){v_j y_i/(f_i w_i) + v_i y_j/(f_j w_j)} K_{n,ij} and z_i = (x_i, t_i, u_i). By noting that E(v_i | x_i) = 0, we have (using y_j = g_{0j} + τ_j(μ_j + v_j) + u_j)

H_{1n,b}(z_i) ≡ E[H_{n,b}(z_i, z_j) | z_i] = (1/2) v_i(g_{0i} + τ_i μ_i)/w_i + (s.o.)

by Lemma B.4 (iii). Hence, by the U-statistic H-decomposition we have

J_{2n,2} = n^{1/2} { 0 + (2/n) Σ_i H_{1n,b}(z_i) + [2/(n(n-1))] Σ_i Σ_{j>i} [H_{n,b}(z_i, z_j) - H_{1n,b}(z_i) - H_{1n,b}(z_j) + 0] }
= n^{-1/2} Σ_i v_i(g_{0i} + τ_i μ_i)/w_i + O_p((n h^q)^{-1/2}),  (A.31)

because the last term in the H-decomposition is a degenerate U-statistic which has an order n^{1/2} O_p((n h^{q/2})^{-1}) = O_p((n h^q)^{-1/2}).

B Appendix B

Lemma B.1. Let D denote the support of x^d. For all x^d ∈ D, let g(x^d, x^c) ∈ G^ν and f(x^d, x^c) ∈ G^{ν-1}, where ν is an integer. Define η = Σ_{s=1}^q h_s^ν + Σ_{s=1}^r λ_s. Suppose the kernel function W satisfies (A2). Then, uniformly in x,

(i) E{[g(X) - g(x)] K_n(X, x)} = Σ_{s=1}^q C_{1s}(x) h_s^ν + Σ_{s=1}^r C_{2s}(x) λ_s + O(η(Σ_{s=1}^q h_s^2 + Σ_{s=1}^r λ_s));
The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationComputation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy
Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Hidehiko Ichimura Graduate School of Public Policy and Graduate School of Economics University of Tokyo A Conference in honor of
More informationSpatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood
Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October
More informationTest for Discontinuities in Nonparametric Regression
Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel
More informationPreface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation
Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric
More informationBayesian regression tree models for causal inference: regularization, confounding and heterogeneity
Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate
More informationNew Developments in Econometrics Lecture 11: Difference-in-Differences Estimation
New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?
More informationCross-fitting and fast remainder rates for semiparametric estimation
Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationImbens/Wooldridge, Lecture Notes 1, Summer 07 1
Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 What s New in Econometrics NBER, Summer 2007 Lecture 1, Monday, July 30th, 9.00-10.30am Estimation of Average Treatment Effects Under Unconfoundedness 1.
More informationTests of independence for censored bivariate failure time data
Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ
More informationNonparametric confidence intervals. for receiver operating characteristic curves
Nonparametric confidence intervals for receiver operating characteristic curves Peter G. Hall 1, Rob J. Hyndman 2, and Yanan Fan 3 5 December 2003 Abstract: We study methods for constructing confidence
More informationA multiple testing procedure for input variable selection in neural networks
A multiple testing procedure for input variable selection in neural networks MicheleLaRoccaandCiraPerna Department of Economics and Statistics - University of Salerno Via Ponte Don Melillo, 84084, Fisciano
More informationThe Nonparametric Bootstrap
The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use
More informationMann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study
MPRA Munich Personal RePEc Archive Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study Songxi Chen and Jing Qin and Chengyong Tang January 013 Online
More informationImbens/Wooldridge, IRP Lecture Notes 2, August 08 1
Imbens/Wooldridge, IRP Lecture Notes 2, August 08 IRP Lectures Madison, WI, August 2008 Lecture 2, Monday, Aug 4th, 0.00-.00am Estimation of Average Treatment Effects Under Unconfoundedness, Part II. Introduction
More informationGenerated Covariates in Nonparametric Estimation: A Short Review.
Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated
More informationNonparametric Econometrics
Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-
More informationTime Series and Forecasting Lecture 4 NonLinear Time Series
Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations
More informationCHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials
CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or
More informationFinite Sample Performance of Semiparametric Binary Choice Estimators
University of Colorado, Boulder CU Scholar Undergraduate Honors Theses Honors Program Spring 2012 Finite Sample Performance of Semiparametric Binary Choice Estimators Sean Grover University of Colorado
More informationIndividualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models
Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018
More informationA nonparametric method of multi-step ahead forecasting in diffusion processes
A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School
More informationBalancing Covariates via Propensity Score Weighting: The Overlap Weights
Balancing Covariates via Propensity Score Weighting: The Overlap Weights Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu PSU Methodology Center Brown Bag April 6th, 2017 Joint
More informationRegression Discontinuity Designs in Stata
Regression Discontinuity Designs in Stata Matias D. Cattaneo University of Michigan July 30, 2015 Overview Main goal: learn about treatment effect of policy or intervention. If treatment randomization
More informationThe Value of Knowing the Propensity Score for Estimating Average Treatment Effects
DISCUSSION PAPER SERIES IZA DP No. 9989 The Value of Knowing the Propensity Score for Estimating Average Treatment Effects Christoph Rothe June 2016 Forschungsinstitut zur Zukunft der Arbeit Institute
More informationProfessors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th
DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann
More informationSupplement to Quantile-Based Nonparametric Inference for First-Price Auctions
Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More informationNonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs
Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Xiao-Hua Zhou, Huazhen Lin, Eric Johnson Journal of Royal Statistical Society Series
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationBayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects
Bayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects P. Richard Hahn, Jared Murray, and Carlos Carvalho July 29, 2018 Regularization induced
More informationA Measure of Robustness to Misspecification
A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate
More informationHeteroskedasticity-Robust Inference in Finite Samples
Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard
More informationEfficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1
Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Keisuke Hirano University of Miami 2 Guido W. Imbens University of California at Berkeley 3 and BER Geert Ridder
More informationDeductive Derivation and Computerization of Semiparametric Efficient Estimation
Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationIndependent and conditionally independent counterfactual distributions
Independent and conditionally independent counterfactual distributions Marcin Wolski European Investment Bank M.Wolski@eib.org Society for Nonlinear Dynamics and Econometrics Tokyo March 19, 2018 Views
More information