Efficient Estimation of Average Treatment Effects With Mixed Categorical and Continuous Data


Qi Li
Department of Economics
Texas A&M University
Bush Academic Building West
College Station, TX

Jeff Wooldridge
Department of Economics
Michigan State University
East Lansing, MI

Jeff Racine
Department of Economics & Center for Policy Research
Syracuse University
Syracuse, NY

June 18, 2004

Abstract

In this paper we consider the nonparametric estimation of average treatment effects when there exist mixed categorical and continuous covariates. One distinguishing feature of the approach presented herein is the use of kernel smoothing for both the continuous and the discrete covariates. This approach, together with the cross-validation method to select the smoothing parameters, has the remarkable ability of automatically removing irrelevant covariates. We establish the asymptotic distribution of the proposed average treatment effect estimator with data-driven smoothing parameters. Simulation results show that the proposed method is capable of performing much better than existing kernel approaches whereby one splits the sample into subsamples corresponding to discrete cells. An empirical application to a controversial study that examines the efficacy of right heart catheterization on medical outcomes reveals that our proposed nonparametric estimator overturns the controversial findings of Connors et al. (1996), suggesting that their findings may be an artifact of an incorrectly specified parametric model.

Key words and phrases: Average treatment effect, discrete covariates, kernel smoothing, bootstrap, asymptotic normality.

Li's research is partially supported by the Private Enterprise Research Center, Texas A&M University. Racine's research is supported by NSF grant BCS

1 Introduction

The measurement of average treatment effects, initially confined to the assessment of dose-response relationships in medical settings, is today widely used across a range of disciplines. Assessing human-capital losses arising from war (Ichino and Winter-Ebmer (1998)) and the effectiveness of job training programs (Lechner (1999)) are but two examples of the wide range of potential applications. Perhaps the most widespread approach towards the measurement of treatment effects involves estimation of a propensity score. Estimation of the propensity score (the conditional probability of receiving treatment) was originally undertaken with parametric index models such as the Logit or Probit, though there is an expanding literature on the semiparametric and nonparametric estimation thereof (see Hahn (1998) and Hirano et al. (2002)). The advantage of nonparametric approaches in this setting is rather obvious, as misspecification of the propensity score may impact significantly upon the magnitude and even the sign of the estimated treatment effect. In many settings mismeasurement induced by misspecification can be extremely costly; envision for a moment the societal cost of incorrectly concluding that a novel and beneficial cancer treatment in fact causes harm. Though the appeal of robust nonparametric methods is obvious in this setting, existing nonparametric approaches split the sample into cells in the presence of categorical covariates, resulting in a loss of efficiency. Given that datasets used to assess treatment effects frequently contain a preponderance of categorical data,[1] it is not uncommon that the number of discrete cells is larger than the sample size. In such cases the sample-splitting frequency method becomes infeasible.
On the other hand, the kernel-based smoothing cross-validation method can (asymptotically) automatically detect and remove irrelevant covariates, leading to feasible and accurate nonparametric estimation of the average treatment effects. Another obvious symptom of the sample-splitting frequency method would be a loss in power of tests of whether a treatment effect differs from no effect. In this paper we propose a kernel-based nonparametric method for measuring and testing for the presence of treatment effects that is ideally suited to datasets containing a mix of categorical (nominal and ordinal) and continuous datatypes. One distinguishing feature of the proposed approach is the use of kernel smoothing for both the continuous and the discrete covariates. We elect to use the least-squares conditional cross-validation method proposed by Hall et al. (2004a) to select smoothing parameters for both the categorical and continuous variables; they demonstrate that cross-validation produces asymptotically optimal smoothing for relevant components, while it eliminates irrelevant components by oversmoothing. Indeed, in the problem of nonparametric estimation of a conditional density with mixed categorical and continuous data, cross-validation comes into its own as a method with no obvious peers.[2] The rest of the paper proceeds as follows. In Section 2 we outline the nonparametric model and derive the distribution of the resultant average treatment effect estimator. In Section 3 we undertake some simulation experiments, which demonstrate that the proposed method is capable of outperforming existing kernel approaches that require splitting the sample into subsamples ("discrete cells"). An empirical application presented in Section 4 involving a study that examines the efficacy of right heart catheterization on medical outcomes reveals that our approach negates the controversial findings of Connors et al.
(1996), suggesting that their result may be an artifact of an incorrectly specified parametric model. Main proofs appear in the appendices.

[1] In the typical medical study, it is common to encounter exclusively categorical data types.

[2] It is not clear to us how to extend this property of kernel smoothing, which automatically removes irrelevant covariates, to nonparametric series methods.

2 The Model

For what follows, we use a dummy variable, t_i ∈ {0, 1}, to indicate whether or not an individual has received treatment. We let t_i = 1 for the treated, 0 for the untreated. Letting y_i(t_i) denote the outcome, then, for i = 1, ..., n, we write y_i = t_i y_i(1) + (1 − t_i) y_i(0). Interest lies in the average treatment effect defined as follows:

τ = E[y_i(1) − y_i(0)].

Let x_i denote a vector of pre-treatment variables. One issue that instantly surfaces in this setting is that, for each individual i, we observe either y_i(0) or y_i(1), but not both. Therefore, in the absence of additional assumptions, the treatment effect is not consistently estimable. One popular assumption is the unconfoundedness condition (Rosenbaum and Rubin (1983)):

Assumption (A1) (Unconfoundedness): Conditional on x_i, the treatment indicator t_i is independent of the potential outcomes.

Define the conditional treatment effect by τ(x) = E[y_i(1) − y_i(0) | x_i = x]. Under Assumption (A1) one can easily show that

τ(x) = E[y_i | t_i = 1, x_i = x] − E[y_i | t_i = 0, x_i = x]. (2.3)

The two terms on the right-hand side of (2.3) can be estimated consistently by any nonparametric estimation technique. Therefore, under (A1), the average treatment effect can be obtained via simple averaging over τ(x):

τ = E[τ(x_i)]. (2.4)

Letting E(y_i | x_i, t_i) be denoted by g(x_i, t_i), we then have

y_i = g(x_i, t_i) + u_i, (2.5)

with E(u_i | x_i, t_i) = 0. Defining g_0(x_i) = g(x_i, t_i = 0) and g_1(x_i) = g(x_i, t_i = 1), we can re-write (2.5) as

y_i = g_0(x_i) + [g_1(x_i) − g_0(x_i)] t_i + u_i = g_0(x_i) + τ(x_i) t_i + u_i, (2.6)

where τ(x_i) = g_1(x_i) − g_0(x_i). From (2.6) it is easy to show that τ(x_i) = cov(y_i, t_i | x_i)/var(t_i | x_i). Letting µ(x_i) = Pr(t_i = 1 | x_i) ≡ E(t_i | x_i) (because t_i ∈ {0, 1}), we may write

τ = E[τ(x_i)] = E[(t_i − µ(x_i)) y_i / var(t_i | x_i)]. (2.7)

We now turn to the discussion of the nonparametric estimation of τ based on (2.7).
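The weighting identity τ = E[(t_i − µ(x_i)) y_i / var(t_i | x_i)] above can be checked numerically. The sketch below draws from a toy design (a logistic propensity and a heterogeneous treatment effect, both illustrative choices, not from the paper) where the true propensity is known, and confirms that the weighted average recovers E[y_i(1) − y_i(0)].

```python
import numpy as np

# Numerical check of the weighting identity: with known propensity mu(x) = Pr(t=1|x),
# tau = E[(t - mu(x)) y / (mu(x)(1 - mu(x)))] recovers E[y(1) - y(0)].
# The design below (logistic propensity, tau(x) = 2 + x) is purely illustrative.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1, 1, n)
mu = 1.0 / (1.0 + np.exp(-x))          # true propensity score, bounded away from 0 and 1
t = (rng.uniform(size=n) < mu).astype(float)
y0 = x + rng.normal(size=n)            # potential outcome without treatment
y1 = y0 + 2.0 + x                      # treatment effect tau(x) = 2 + x, so E[tau(x)] = 2
y = t * y1 + (1 - t) * y0              # observed outcome

tau_hat = np.mean((t - mu) * y / (mu * (1 - mu)))   # close to 2 in large samples
```

The key step in the identity is that E[(t_i − µ_i) t_i | x_i] = µ_i(1 − µ_i) and E[(t_i − µ_i)(1 − t_i) | x_i] = −µ_i(1 − µ_i), so conditioning on x_i isolates E[y_i(1) − y_i(0) | x_i].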

2.1 Nonparametric Estimation of the Propensity Score

We use x_i^c and x_i^d to denote the continuous and discrete components of x_i, with x_i^c ∈ R^q and x_i^d being of dimension r. Let w(·) denote a univariate kernel function for the continuous variables, and define the product kernel function by W_h(x_i^c, x_j^c) = ∏_{s=1}^q h_s^{−1} w((x_is^c − x_js^c)/h_s), where x_is^c is the sth component of x_i^c. We assume that some of the discrete variables have a natural ordering, examples of which would include preference orderings (like, indifference, dislike), health conditions (excellent, good, poor), and so forth. Let the first r_1 components of x_i^d be those with a natural ordering (0 ≤ r_1 ≤ r), and let the remaining r_2 = r − r_1 components be those without a natural ordering. We use x_it^d to denote the tth component of x_i^d (t = 1, ..., r). For an ordered variable, we use the Habbema kernel:

l(x_it^d, x_jt^d, λ_t) = 1 if x_it^d = x_jt^d, and λ_t^{|x_it^d − x_jt^d|} if x_it^d ≠ x_jt^d, (2.8)

where λ_t ∈ [0, 1]. When λ_t = 0, l(x_it^d, x_jt^d, λ_t = 0) becomes an indicator function, and when λ_t = 1, l(x_it^d, x_jt^d, λ_t = 1) ≡ 1 becomes a uniform weight function. For an unordered variable, we use a variation on Aitchison and Aitken's (1976) kernel function defined by

l(x_it^d, x_jt^d, λ_t) = 1 if x_it^d = x_jt^d, and λ_t otherwise. (2.9)

Again λ_t = 0 leads to an indicator function, λ_t = 1 to a uniform weight function. Let 1(A) denote an indicator function that assumes the value 1 if A occurs and 0 otherwise. Combining (2.8) and (2.9), we obtain the product kernel function given by

L(x_i^d, x_j^d, λ) = [∏_{t=1}^{r_1} λ_t^{|x_it^d − x_jt^d|}] [∏_{t=r_1+1}^{r} λ_t^{1(x_it^d ≠ x_jt^d)}]. (2.10)

We note that there does not exist a plug-in or even an ad hoc formula for selecting λ_t in this setting. Hence we recommend using least squares cross-validation for selecting λ_t (t = 1, ..., r).
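The mixed product kernel defined above can be sketched in a few lines. The snippet below (function names and the choice of a Gaussian kernel for w(·) are illustrative assumptions, not from the paper) combines a product Gaussian kernel for the continuous covariates with the ordered and unordered discrete kernels just defined.

```python
import numpy as np

# A minimal sketch of the product kernel K_{n,ij} = W_h(x_i^c, x_j^c) L(x_i^d, x_j^d, lambda),
# combining a Gaussian kernel (an illustrative choice of w) for continuous covariates with
# the ordered Habbema kernel and the unordered Aitchison-Aitken-type kernel defined above.
def mixed_kernel(xc_i, xc_j, xo_i, xo_j, xu_i, xu_j, h, lam_o, lam_u):
    """Kernel weight between observations i and j with mixed covariates.

    xc_*: continuous covariates, xo_*: ordered discrete, xu_*: unordered discrete;
    h, lam_o, lam_u: smoothing parameters (scalars here for brevity).
    """
    # Continuous part: product of univariate Gaussian kernels, each scaled by 1/h_s.
    u = (np.asarray(xc_i) - np.asarray(xc_j)) / h
    w_cont = np.prod(np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h))
    # Ordered part: lambda ** |difference| (equals 1 when the categories match).
    w_ord = np.prod(lam_o ** np.abs(np.asarray(xo_i) - np.asarray(xo_j)))
    # Unordered part: lambda ** 1(categories differ).
    w_unord = np.prod(np.where(np.asarray(xu_i) == np.asarray(xu_j), 1.0, lam_u))
    return w_cont * w_ord * w_unord
```

As in the text, λ = 0 reduces each discrete kernel to an indicator (the frequency estimator), while λ = 1 gives uniform weights, smoothing the covariate out entirely.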
Our recommendation is based not only on the mean square error optimality of least squares cross-validation, but also on its automatic ability to (asymptotically) remove irrelevant discrete covariates (see Hall et al. (2004a,b)). This property bears highlighting as we have observed that irrelevant variables tend to occur surprisingly often in practice. Thus, cross-validation provides an efficient way of guarding against overspecification of nonparametric models, and thereby mitigates the curse of dimensionality often associated with kernel methods. Since µ(x_i) = Pr(t_i = 1 | x_i) = E(t_i | x_i), we can use either a conditional probability estimator or a conditional mean estimator to estimate µ(x_i). We will use the latter in this paper. We let ˆt(x_i) be the nonparametric estimator of µ_i ≡ µ(x_i) defined by

ˆt(x_i) = Σ_{j=1}^n t_j K_{n,ij} / Σ_{j=1}^n K_{n,ij}, (2.11)

where K_{n,ij} = W_h(x_i^c, x_j^c) L(x_i^d, x_j^d, λ). By noting that var(t_i | x_i) = µ_i(1 − µ_i), one can estimate the

average treatment effect by

ˆτ = (1/n) Σ_{i=1}^n [(t_i − ˆt(x_i)) y_i / (ˆt(x_i)(1 − ˆt(x_i)))] M_ni ≡ (1/n) Σ_{i=1}^n [t_i y_i/ˆt(x_i) − (1 − t_i) y_i/(1 − ˆt(x_i))] M_ni, (2.12)

where M_ni = M_n(x_i) is a trimming set that trims out observations near the boundary. With the exception of the presence of the trimming function M_ni, (2.12) is exactly the same as the propensity score based estimator considered by Hirano et al. (2002), who used series methods for estimating µ(x_i). As we mentioned earlier, it is not clear how to use nonparametric series methods to automatically remove irrelevant discrete covariates. Therefore, in this paper we will consider only the kernel-based estimation method, which has the advantage of automatically removing irrelevant covariates (be they continuous or discrete). To derive the asymptotic distribution of ˆτ we make the following regularity assumptions. Following Robinson (1988) we use G_ν^α (ν a positive integer) to denote the smooth class of functions such that if g ∈ G_ν^α, then g is ν-times differentiable, and g and its partial derivatives (up to order ν) are all bounded by functions with finite αth moments.

Assumption (A2): (i) (y_i, x_i, t_i), i = 1, ..., n, are independently and identically distributed. (ii) x_i^d takes finitely many different values; for each x^d, the support of f(x^c, x^d) is a compact convex set in x^c; µ(x^c, x^d) ∈ G_ν^4 and f(x^c, x^d) ∈ G_{ν−1}^4, where ν ≥ 2 and 2ν > q, ν a positive integer. (iii) inf_{x∈S} f(x) ≥ η for some η > 0, where S is the support of x_i. (iv) σ²(x, t) = var(u_i | x_i = x, t_i = t) is bounded below by a positive constant on the support of (x_i, t_i). (v) The trimming function M_n(x) converges (as n → ∞) to the indicator function 1(x ∈ S), where 1(·) is the usual indicator function and S is the support of f(x).

Assumption (A3): (i) w(·) is a νth order kernel; it is bounded, symmetric and differentiable up to order ν. (ii) As n → ∞, n Σ_{s=1}^q h_s^{2ν+4} → 0, and n h_1² ⋯ h_q² → ∞.

Assumptions (A2) (i)-(iv) are standard smoothness and moment conditions.
(A2) (v) implies that, asymptotically, we only trim a negligible amount of data (near the boundary) so that ˆτ is asymptotically efficient (see Theorem 2.1). A trimming set is used in (2.12) for theoretical reasons. Given that the support of x is a compact and convex set (in x^c), without loss of generality one can assume that x^c ∈ [−1, 1]^q. Then one can define a set A_δn = ∏_{s=1}^q [−δ_s, δ_s], where δ_s = δ_sn < 1 converges to 1 as n → ∞. To avoid boundary bias, one can choose 1 − δ_s = O(h_s^α) for some 0 < α < 1, and define M_n(x_i) = 1(x_i ∈ A_δn). In this way the boundary effects disappear asymptotically. In practice, boundary trimming does not appear to be necessary. In both the simulations and the empirical application reported in Sections 3 and 4, we do not resort to trimming. In the presence of outliers, however, one might wish to consider trimming. In order to appreciate the restrictions imposed by (A3), let us assume that h_s = h for all s. In this case, (A3) (ii) requires that ν + 2 > q. Using a second order kernel (ν = 2) implies that q < 4, or q ≤ 3 since q is a positive integer. Thus, a second order kernel can satisfy (A3) if q ≤ 3. When q ≥ 4, (A3) requires the use of a higher order kernel function.

Remark 2.1 If 1 ≤ q ≤ 3 and one uses a second order kernel (ν = 2), then (A3) allows optimal smoothing. To see this, note that when ν = 2, the optimal smoothing is h_s = O(n^{−1/(4+q)}). (A3) (ii) becomes (assuming h_s = h) nh^8 → 0 and nh^{2q} → ∞; optimal smoothing h ∝ n^{−1/(4+q)} satisfies these conditions for q = 1, 2, 3. For q > 3, (A3) (ii) rules out optimal smoothing. We will choose the smoothing parameters based on, but not the same as (if q ≥ 4), the least squares

cross-validation method. The leave-one-out kernel estimator of E(t_i | x_i) = µ(x_i) is given by

ˆt_{−i}(x_i) = Σ_{j≠i} t_j K_{n,ij} / Σ_{j≠i} K_{n,ij}, (2.13)

where K_{n,ij} = W_h(x_i^c, x_j^c) L(x_i^d, x_j^d, λ). We choose (h, λ) = (h_1, ..., h_q, λ_1, ..., λ_r) by minimizing the following least squares cross-validation function:

CV(h, λ) = (1/n) Σ_{i=1}^n [t_i − ˆt_{−i}(x_i)]² S_n(x_i), (2.14)

where S_n(·) is a weight function that trims out observations near the boundary of the support of x_i (avoiding excessive boundary bias). Hall et al. (2004a,b) have shown that when x_s^d (x_s^c) is an irrelevant covariate, the cross-validation selected smoothing parameter λ_s (h_s) will converge to 1 (∞) in probability; hence, irrelevant covariates (discrete or continuous) will be automatically smoothed out. Given that the cross-validation method can automatically remove the irrelevant covariates, we will derive the asymptotic distribution of ˆτ under the condition that all q continuous variables and all r discrete variables are relevant, i.e., assuming that the irrelevant covariates have already been removed when computing ˆτ. When all the covariates are relevant, we require that the h_s's and λ_s's are chosen from some shrinking set, h_s ∈ (0, η_n] (s = 1, ..., q) and λ_s ∈ (0, η_n] (s = 1, ..., r), where η_n → 0 as n → ∞ at a rate slower than any inverse polynomial of n. This assumption can be relaxed as in Hall et al. (2004a,b). We let (h̃_1, ..., h̃_q, λ̃_1, ..., λ̃_r) denote the cross-validation choices of (h_1, ..., h_q, λ_1, ..., λ_r) that minimize (2.14). It is well known that the cross-validation method leads to optimal smoothing, but our assumption (A3) (ii) rules out optimal smoothing when q > 3. Therefore, we suggest using ĥ_s = h̃_s n^{1/(2ν+q)} n^{−1/(q+ν+2)} and ˆλ_s = λ̃_s n^{ν/(2ν+q)} n^{−ν/(q+ν+2)}. Thus we have ĥ_s ∼ n^{−1/(q+ν+2)} and ˆλ_s ∼ n^{−ν/(q+ν+2)}, satisfying (A3) (ii). When 1 ≤ q ≤ 3, we know that we can choose ν = 2 (a second order kernel), which leads to ĥ_s ∼ h̃_s and ˆλ_s ∼ λ̃_s.
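The least squares cross-validation objective can be sketched directly from the leave-one-out estimator above. The snippet below (an illustrative sketch, assuming one continuous covariate with a Gaussian kernel, one unordered discrete covariate, and no boundary weighting, i.e., S_n ≡ 1) builds the kernel weight matrix, zeroes its diagonal to leave each observation out, and returns the average squared prediction error; in practice one would minimize this over (h, λ) ∈ (0, ∞) × [0, 1].

```python
import numpy as np

# Sketch of the leave-one-out least squares cross-validation objective for the
# propensity regression of t on x, with one continuous and one unordered discrete
# covariate. Function and variable names are illustrative, not from the paper.
def cv_objective(h, lam, xc, xd, t):
    du = (xc[:, None] - xc[None, :]) / h
    K = np.exp(-0.5 * du**2) / (np.sqrt(2 * np.pi) * h)   # Gaussian kernel, continuous part
    L = np.where(xd[:, None] == xd[None, :], 1.0, lam)    # unordered discrete kernel
    W = K * L
    np.fill_diagonal(W, 0.0)                              # leave-one-out: drop j = i
    t_hat = W @ t / np.maximum(W.sum(axis=1), 1e-12)      # Nadaraya-Watson fit at each x_i
    return np.mean((t - t_hat) ** 2)
```

When the discrete covariate is irrelevant, the text's result says the minimizing λ drifts toward 1, which in this parameterization makes the discrete kernel uniform and removes the covariate from the fit.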
Following the proofs of Hall et al. (2004a), one can show the following:

Lemma 2.2 Under assumptions (A1) to (A3), and condition (A.6) given in Appendix A, we have ĥ_s = a_s⁰ n^{−1/(ν+q+2)} + o_p(n^{−1/(ν+q+2)}) for s = 1, ..., q, and ˆλ_s = b_s⁰ n^{−ν/(ν+q+2)} + o_p(n^{−ν/(ν+q+2)}) for s = 1, ..., r, where the a_s⁰'s are finite positive constants and the b_s⁰'s are non-negative finite constants.

The proof of Lemma 2.2, as well as consistent estimators of V_1, V_2 and B_h,λ, is given in Appendix A. The empirical applications and simulation results presented in Hall et al. (2004a,b) reveal that nonparametric estimation based on cross-validated bandwidth selection performs much better than a conventional frequency estimator (which corresponds to λ_s = 0 for all s = 1, ..., r) because the former does not split the sample in finite-sample applications, which creates efficiency losses. Having obtained the ĥ_s's and ˆλ_s's based on the cross-validation method, we estimate τ using expression (2.12) with ˆt(x_i) computed using the ĥ_s's and ˆλ_s's. To avoid introducing too many notations, we will still use ˆτ to denote the resulting estimator of τ.
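Putting the pieces of this subsection together, the sketch below estimates the propensity score by the kernel regression estimator and plugs it into the weighting estimator of ˆτ. It is an illustrative end-to-end toy (one continuous covariate with a Gaussian kernel, one irrelevant unordered discrete covariate, fixed smoothing parameters standing in for the cross-validated ones, and a crude clipping guard standing in for the trimming set M_ni); all names are assumptions, not the paper's code.

```python
import numpy as np

# End-to-end sketch: kernel propensity estimate t_hat(x_i), then the weighting
# estimator of tau. Bandwidths are fixed for brevity; the paper cross-validates them.
def propensity_nw(xc, xd, t, h, lam):
    du = (xc[:, None] - xc[None, :]) / h
    K = np.exp(-0.5 * du**2)                              # Gaussian kernel, continuous part
    L = np.where(xd[:, None] == xd[None, :], 1.0, lam)    # unordered discrete kernel
    W = K * L
    return W @ t / W.sum(axis=1)

def ate_weighting(y, t, that):
    that = np.clip(that, 1e-3, 1 - 1e-3)   # crude guard in place of the trimming set M_ni
    return np.mean((t - that) * y / (that * (1 - that)))

rng = np.random.default_rng(1)
n = 2000
xc = rng.uniform(-1, 1, n)
xd = (rng.uniform(size=n) < 0.5).astype(float)            # irrelevant for the propensity
mu = 1.0 / (1.0 + np.exp(-xc))                            # true propensity depends on xc only
t = (rng.uniform(size=n) < mu).astype(float)
y = xc + xd + 1.0 * t + rng.normal(scale=0.5, size=n)     # constant treatment effect tau = 1

that = propensity_nw(xc, xd, t, h=0.1, lam=1.0)           # lam = 1 smooths xd out entirely
tau_hat = ate_weighting(y, t, that)
```

Setting lam = 1.0 mimics what cross-validation does asymptotically when xd is irrelevant for the propensity: the discrete kernel becomes uniform and the sample is never split into cells.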

2.2 The Asymptotic Distribution of ˆτ

The next theorem provides the asymptotic distribution of ˆτ.

Theorem 2.1 Under assumptions (A1)-(A3) we have √n(ˆτ − τ − B_h,λ) → N(0, V_1 + V_2) in distribution, where B_h,λ = Σ_{s=1}^q B_1s ĥ_s^ν + Σ_{s=1}^r B_2s ˆλ_s, with B_1s and B_2s defined in Lemma B.2 of Appendix B, V_1 = var(τ(x_i)), V_2 = E{σ²(x_i, t_i)(t_i − µ_i)²/[µ_i²(1 − µ_i)²]}, and σ²(x_i, t_i) = E(u_i² | x_i, t_i).

The proof of Theorem 2.1 is given in Appendix A. Theorem 2.1 shows that our kernel-based estimator ˆτ is semiparametrically efficient. Let f(x, t) and f_x(x) denote the joint and marginal densities of (x_i, t_i) and x_i, respectively, and let p(t_i | x_i) be the conditional probability of t_i given x_i. Letting ∫ dx = Σ_{x^d} ∫ dx^c, µ_x = µ(x), and using f(x_i, t_i) = p(t_i | x_i) f_x(x_i), and noting that p(t_i = 1 | x_i) = µ_i and p(t_i = 0 | x_i) = 1 − µ_i, we have

V_2 = E{σ²(x_i, t_i)(t_i − µ_i)²/[µ_i²(1 − µ_i)²]}
   = Σ_{t=1,0} ∫ f_x(x) p(t | x) σ²(x, t)(t − µ_x)²/[µ_x²(1 − µ_x)²] dx
   = ∫ f_x(x) µ_x σ²(x, 1)(1 − µ_x)²/[µ_x²(1 − µ_x)²] dx + ∫ f_x(x)(1 − µ_x) σ²(x, 0) µ_x²/[µ_x²(1 − µ_x)²] dx
   = E{σ²(x_i, 1)/µ_i + σ²(x_i, 0)/(1 − µ_i)}. (2.16)

Equation (2.16) matches the expression given in Hahn (1998). Thus, V_1 + V_2 coincides with the semiparametric efficiency bound for this model. Hirano et al. (2002) consider the problem of estimating average treatment effects using series estimation methods, and they observe that if one uses the true var(t_i | x_i) = µ_i(1 − µ_i) to replace the estimated variance ˆt(x_i)(1 − ˆt(x_i)) in (the denominator of) ˆτ, then it results in a less efficient estimator of τ. The same result holds true for our kernel-based estimator, as the next lemma shows.
Lemma.3 If one replaces the denominator ˆt i (1 ˆt i ) in ˆτ by µ i (1 µ i ), and lets τ denote the resulting estimator of τ, i.e., τ = 1 (t i ˆt i )y i M ni, (.17) n µ i (1 µ i ) then n( τ τ Bh,λ ) N(0, V 1 + V + V 3 ) in distribution, where V 1 and V are the same as that given in Theorem.1, while V 3 is given by { [ (ti µ i ) ] } V 3 = E µ i (1 µ i ) 1 τi. The proof of Lemma.3 is given in Appendix A. 6

We observe that using the true var(t_i | x_i) yields a less efficient estimator than ˆτ, which uses the estimated var(t_i | x_i). The reason for this result is that one can express √n(ˆτ − τ − B_h,λ) as

√n(ˆτ − τ − B_h,λ) = √n(ˆτ − τ̄) + √n(τ̄ − τ − B_h,λ).

In Appendix A we show that √n(τ̄ − τ − B_h,λ) = Z_n1 + Z_n2 + Z_n3 + o_p(1) → N(0, V_1 + V_2 + V_3) in distribution, where the Z_nl's (l = 1, 2, 3) are three asymptotically uncorrelated terms having asymptotic N(0, V_l) distributions, respectively (definitions appear in Appendix A). This yields Lemma 2.3. In Appendix A we also show that √n(ˆτ − τ̄) = −Z_n3 + o_p(1). Hence,

√n(ˆτ − τ − B_h,λ) = √n(ˆτ − τ̄) + √n(τ̄ − τ − B_h,λ) = {−Z_n3 + o_p(1)} + {Z_n1 + Z_n2 + Z_n3 + o_p(1)} = Z_n1 + Z_n2 + o_p(1) → N(0, V_1 + V_2) in distribution, (2.21)

resulting in Theorem 2.1. That is, since the leading term in √n(ˆτ − τ̄) cancels one of the leading terms in √n(τ̄ − τ − B_h,λ), using an estimated variance v̂ar(t_i | x_i) is more efficient than using the true variance var(t_i | x_i) when estimating τ. If one uses the true propensity score µ_i in both the numerator and denominator of ˆτ, then one gets τ̃, which is more efficient than ˆτ since √n(τ̃ − τ) is asymptotically normal with zero mean and asymptotic variance V_1. Of course, τ̃ is not a feasible estimator. Thus, among the class of feasible ("regular") estimators, ˆτ is asymptotically efficient.

An alternative estimator for τ

In order to construct a consistent estimator for the asymptotic variance V_1 + V_2, we need to obtain, among other things, a consistent estimator of the error u_i. The proposed ˆτ is based on the estimated propensity score; it does not estimate the regression mean function directly. In this subsection we consider an alternative estimator for τ which is based on direct estimation of E(y_i | x_i, t_i), which of course also leads to a direct estimator of u_i.
Note that (2.6) can also be viewed as a functional coefficient model (smooth coefficient model) as considered by Chen and Tsay (1993), Cai, Fan and Yao (2000), Cai, Fan and Li (2000), and Li et al. (2002), among others. Thus an alternative estimator of τ(x_i) can be obtained by a local regression of y_i on (1, t_i) using kernel weights. In this way we obtain a nonparametric estimator of (g_0(x_i), τ(x_i)) given by

(ĝ_0(x_i), ˆτ_n(x_i))′ = [(n−1)^{−1} Σ_{j≠i} (1, t_j)′(1, t_j) W_h,ij L_λ̂,ij]^{−1} (n−1)^{−1} Σ_{j≠i} (1, t_j)′ y_j W_h,ij L_λ̂,ij, (2.22)

where W_h,ij = W_h(x_i^c, x_j^c) and L_λ̂,ij = L(x_i^d, x_j^d, λ̂). Equation (2.22) gives consistent estimators of g_0(x_i) and τ(x_i). For example, the resulting estimator of τ(x_i) is given by

ˆτ_n(x_i) = [Ê(y_i t_i | x_i) − Ê(y_i | x_i) Ê(t_i | x_i)] / [ˆt(x_i)(1 − ˆt(x_i))], (2.23)

where Ê(y_i t_i | x_i) = n^{−1} Σ_{j=1}^n t_j y_j K_{n,ij} / ˆf(x_i), Ê(y_i | x_i) = n^{−1} Σ_{j=1}^n y_j K_{n,ij} / ˆf(x_i), ˆt(x_i) = n^{−1} Σ_{j=1}^n t_j K_{n,ij} / ˆf(x_i), and ˆf(x_i) = n^{−1} Σ_{j=1}^n K_{n,ij}.
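The smooth coefficient estimator just defined is a weighted least squares regression of y_j on (1, t_j) at each evaluation point. The sketch below implements that local regression for one continuous covariate with a Gaussian kernel (both illustrative assumptions; names are not the paper's), returning the fitted τ(x_i) at every observation.

```python
import numpy as np

# Sketch of the smooth-coefficient estimator: at each x_i, regress y_j on (1, t_j)
# with kernel weights, yielding (g0_hat(x_i), tau_hat(x_i)). One continuous covariate
# and a Gaussian kernel are illustrative simplifications.
def smooth_coef_tau(y, t, xc, h):
    n = len(y)
    X = np.column_stack([np.ones(n), t])     # local regressors (1, t_j)
    tau = np.empty(n)
    for i in range(n):
        w = np.exp(-0.5 * ((xc - xc[i]) / h) ** 2)
        w[i] = 0.0                           # leave observation i out, as in the display
        XtW = X.T * w                        # weight each column j by its kernel weight
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        tau[i] = beta[1]                     # slope on t_j estimates tau(x_i)
    return tau                               # average to get tau_hat_n = mean(tau)
```

Averaging the returned values gives the alternative estimator ˆτ_n mentioned in the text, and the local intercept plus slope times t_i gives a direct fit of E(y_i | x_i, t_i), hence residuals û_i.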

From (2.23) we readily obtain a consistent estimator of E(y_i | x_i, t_i) by ĝ_0(x_i) + ˆτ_n(x_i) t_i. One can also estimate τ by ˆτ_n = n^{−1} Σ_{i=1}^n ˆτ_n(x_i).

2.3 Testing No Effect Based on Bootstrapping

The result of Theorem 2.1 can be used to test the null hypothesis of no effect (τ = 0) of a treatment. We know that nonparametric estimation results can be sensitive to the choice of smoothing parameters. Therefore, test results based on asymptotic distributions may also be sensitive to the selection of smoothing parameters. In order to obtain a more robust test, we suggest using resampling methods to approximate the null distribution of the test statistic. Below we present two bootstrap procedures. The first procedure does not involve any null hypothesis and is useful for the construction of error bounds, while the second procedure imposes the null hypothesis of no treatment effect. Let z_i = {y_i, t_i, x_i^c, x_i^d}, i = 1, ..., n, denote the vector of realizations on the outcome, treatment, and conditioning information. We wish to construct the sampling distribution of ˆτ, and do so with the following resampling procedure.

1. Randomly select from {z_j}_{j=1}^n with replacement, and call {z_i*}_{i=1}^n the bootstrap sample.

2. Use the bootstrap sample to compute the bootstrap statistic ˆτ* using the same cross-validated smoothing parameters as were used for ˆτ.

3. Repeat steps 1 and 2 a large number of times, say B times. The empirical distribution function of {ˆτ_j*}_{j=1}^B will be used to approximate the finite-sample distribution of ˆτ.

We now wish to construct the sampling distribution of ˆτ under the null of no treatment effect, and do so with the following bootstrap procedure.

1. Randomly select {y_j*}_{j=1}^n from {z_j}_{j=1}^n with replacement. Next, call {z_i*}_{i=1}^n = {y_j*, t_j, x_j^c, x_j^d}_{j=1}^n the bootstrap sample. Note that we have broken any systematic relationship between the outcome and the covariates, thereby imposing the null of no treatment effect on the sample {z_i*}_{i=1}^n.
2. Use the bootstrap sample to compute the bootstrap statistic ˆτ* using the same cross-validated smoothing parameters as were used for ˆτ (ˆτ_n).

3. Repeat steps 1 and 2 a large number of times, say B times. The empirical distribution function of {ˆτ_j*}_{j=1}^B will be used to approximate the finite-sample distribution of ˆτ under the null.

4. A test of the null of no treatment effect follows directly. Let {ˆτ_(j)*}_{j=1}^B be the order statistics (in ascending order) of the B bootstrap statistics, and let ˆτ_α* denote the αth percentile of {ˆτ_(j)*}_{j=1}^B. We reject H_0 if ˆτ > ˆτ_α* at the level α.

Let ˆV_1, ˆV_2 and ˆB_h,λ denote consistent estimators of V_1, V_2 and B_h,λ, respectively (say, as given in Appendix A), and let ˆV_1*, ˆV_2* and ˆB_h,λ* denote their bootstrap counterparts, obtained by replacing (y_i, x_i, t_i) with (y_i*, x_i*, t_i*) in ˆV_1, ˆV_2 and ˆB_h,λ, respectively. Note that the bootstrap counterpart quantities use the same smoothing parameters (they do not require re-cross-validation). The following theorem shows that the bootstrap method works.
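The second (null-imposing) bootstrap can be sketched generically: resample the outcomes alone, recompute the statistic with the original treatment and covariates held fixed, and compare the original statistic to the resulting null distribution. The function below is an illustrative skeleton (names and the one-sided p-value convention are assumptions); the caller supplies `recompute_tau`, which should re-estimate τ with the same smoothing parameters used for the original ˆτ.

```python
import numpy as np

# Sketch of the null-imposing bootstrap: resampling y alone breaks any systematic
# link between outcome and covariates, imposing tau = 0 on the bootstrap samples.
def bootstrap_no_effect_pvalue(tau_stat, y, recompute_tau, B=199, seed=0):
    """recompute_tau(y_star) re-estimates tau from resampled outcomes, with the
    original (t, x) and the original smoothing parameters held fixed."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.array([recompute_tau(y[rng.integers(0, n, n)]) for _ in range(B)])
    return float(np.mean(boot >= tau_stat))   # one-sided p-value under H0
```

Because the same smoothing parameters are reused inside `recompute_tau`, no re-cross-validation is performed within the bootstrap loop, exactly as the text prescribes.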

10 Theorem. Under the same conditions as in Theorem.1, define Tn = n(ˆτ ˆB h,λ )/ ˆV 1 + ˆV Then Tn {x i, t i, y i } n N(0, 1) in distribution in probability. 3 The proof of Theorem. is similar to the proof of Theorem.1 and is thus omitted here. Let ˆT n = n(ˆτ ˆB h,λ )/ ˆV1 + ˆV. Then under H 0 both ˆT n and ˆT n have asymptotic standard normal distributions. Compare the differences between the numerators of the two test statistics: n(ˆτ ˆτ ) + n(bh,λ B h,λ) = n(ˆτ ˆτ ) + ( no q ) p hν+ s = n (ˆτ ˆτ ) + o p (1) because both ˆB h,λ and ˆB h,λ are of order O ( q hν s) and their differences are of smaller order ˆB h,λ ( ˆB h,λ = O q ) p hν+ s. Therefore, when one uses bootstrap procedures to conduct the test, one does not need to compute B h,λ (and Bh,λ ). This is yet another advantage of using bootstrap procedures in this setting. 3 Simulations In this section we report simulations designed to examine the finite sample performance of the proposed methods. We highlight performance in mixed data settings, a feature that existing methods do not handle well, and consider testing for the null of no treatment effect using two kernel methods, a nonparametric propensity score model, and a nonparametric frequency propensity score model (traditional cell-based estimator). We consider the following data generating process (DGP):. y i = g(x c i, x d i, t i ) + ɛ i = g 0 (x c i, x d i ) + [g 1 (x c i, x d i ) g 0 (x c i, x d i )]t i + ɛ i = g 0 (x c i, x d i ) + τ(x c i, x d i )t i + ɛ i = β 0 + β 1 x c i1 + β (x c i1) + β 3 x d i1 + β 4 x d i + τ(x c i, x d i )t i + ɛ i, (3.5) where x c 1 is U[ 1, 1] and xd 1 {0, 1} with P [xd 1 = 1] = 0. and xd {0, 1} with P [xd 1 = 1] = 0.6, and let (β 0, β 1, β, β 3, β 4, τ) = (1,,,, 0, τ), while σ ɛ = 1 (x d is irrelevant). These parameter choices yield an adjusted R of around 66%. Our model for the propensity score (Probit) is t i = γ 0 + γ 1 x c i1 + γ (x c i1) + γ 3 x d i1 + γ 4 x d i + η i, { 1 if Φ(t t i = i ) > otherwise. 
(3.6)

where σ_η² = 1 and Φ is the standard normal CDF. We set (γ_0, γ_1, γ_2, γ_3, γ_4) = (−1/2, 1/4, 1/4, 1/4, 0) (x_2^d is irrelevant). We fix the sample size at n = 250. These parameter choices yield a correct classification ratio of roughly 60%. For a given value of τ, we generate each replication in the following manner:

1. Draw a sample of size n for {x_1^c, x_1^d, x_2^d, η, ε}, which then determines the values of {t, y}.

[3] For the definition of convergence in distribution in probability, see Li et al. (2003).

2. Using {t_i, y_i, x_i1^c, x_i1^d, x_i2^d}_{i=1}^n, compute ˆτ using each of the four kernel methods.

3. Compute tests of the null of no effect for each of the four kernel methods based on 199 bootstrap replications under H_0 at nominal levels α = 0.10, 0.05, 0.01.

4. Repeat steps 1 through 3 B = 1000 times for values of τ in {0.0, 0.25, 0.50, 0.75}.

All bandwidths were selected via cross-validation based upon two restarts of a multidimensional numerical search routine allowing for different bandwidths for all variables. The results are summarized in Tables 1 and 2. Table 1 presents results for the proposed method, which employs a nonparametric kernel approach appropriate for mixed data ("smooth"), while Table 2 presents the traditional frequency approach, which involves splitting the data into subsamples ("nonsmooth").

Table 1: Empirical Rejection Frequencies for Smooth Propensity Score Model. [Rejection frequencies by τ ∈ {0.0, 0.25, 0.50, 0.75} and α = 0.10, 0.05, 0.01; the numerical entries were not recovered from the source.]

Table 2: Empirical Rejection Frequencies for Non-Smooth Propensity Score Model. [Rejection frequencies by τ ∈ {0.0, 0.25, 0.50, 0.75} and α = 0.10, 0.05, 0.01; the numerical entries were not recovered from the source.]

We observe first that the smooth approach is correctly sized and has power in the direction of the alternative DGP. The traditional nonsmooth propensity score approach suffers from minor size distortions, suggesting that it is more susceptible to efficiency losses arising from sample splitting than the nonsmooth functional coefficient approach. The important comparison lies with the performance of the proposed smooth and the traditional nonsmooth approaches. It can be seen that, as expected, sample splitting leads to an efficiency loss, which manifests itself as a loss in power. Note that we have considered only one binary irrelevant covariate; when there exist more irrelevant covariates, or one irrelevant covariate that takes more than two different values, the power loss becomes more substantial.
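The DGP in (3.5)-(3.6) can be sketched directly; the threshold rule Φ(t*) > 1/2 is equivalent to t* > 0 since Φ is monotone with Φ(0) = 1/2. The snippet below uses the parameter values stated above (function and variable names are illustrative).

```python
import numpy as np

# Sketch of the simulation DGP (3.5)-(3.6): a quadratic outcome equation with one
# relevant and one irrelevant binary covariate, and a probit treatment equation.
def draw_dgp(n, tau, rng):
    xc = rng.uniform(-1, 1, n)
    xd1 = (rng.uniform(size=n) < 0.2).astype(float)   # P[xd1 = 1] = 0.2
    xd2 = (rng.uniform(size=n) < 0.6).astype(float)   # irrelevant in both equations
    # Treatment equation: gamma = (-1/2, 1/4, 1/4, 1/4, 0), eta ~ N(0, 1).
    tstar = -0.5 + 0.25 * xc + 0.25 * xc**2 + 0.25 * xd1 + rng.normal(size=n)
    t = (tstar > 0).astype(float)                     # Phi(t*) > 1/2  <=>  t* > 0
    # Outcome equation: beta = (1, 2, 2, 2, 0), eps ~ N(0, 1).
    y = 1 + 2 * xc + 2 * xc**2 + 2 * xd1 + 0 * xd2 + tau * t + rng.normal(size=n)
    return y, t, xc, xd1, xd2
```

Feeding such draws through the smooth and frequency (cell-splitting) estimators replicates the comparison underlying Tables 1 and 2.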
Summarizing, the proposed smooth test is more powerful than a traditional frequency-based (nonsmooth) test when confronted with mixed data settings, which are often encountered in applied settings.

4 An Empirical Application

We will be interested in models that make use of the following variables,[4] which were taken from the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT). The data were obtained from the Department of Health Evaluation Sciences at the University of Virginia.[5]

Y: Outcome. 1 if death occurred within 180 days, 0 otherwise.
T: Treatment. 1 if a Swan-Ganz catheter was received by the patient when hospitalized, 0 otherwise.
X_1: Sex. 0 for female, 1 for male.
X_2: Race. 0 if black, 1 if white, 2 if other.
X_3: Income. 0 if under $11K, 1 if $11-25K, 2 if $25-50K, 3 if over $50K.
X_4: Primary disease category. 1 if Acute Respiratory Failure, 2 if Congestive Heart Failure, 3 if Chronic Obstructive Pulmonary Disease, 4 if Cirrhosis, 5 if Colon Cancer, 6 if Coma, 7 if Lung Cancer, 8 if Multiple Organ System Failure with Malignancy, 9 if Multiple Organ System Failure with Sepsis.
X_5: Secondary disease category. 1 if Cirrhosis, 2 if Colon Cancer, 3 if Coma, 4 if Lung Cancer, 5 if Multiple Organ System Failure with Malignancy, 6 if Multiple Organ System Failure with Sepsis, 7 if NA.
X_6: Medical insurance. 1 if Medicaid, 2 if Medicare, 3 if Medicare & Medicaid, 4 if No insurance, 5 if Private, 6 if Private & Medicare.
X_7: Age. Age in years (converted from Y/M/D data stored with decimal accuracy).

Table 3 presents some summary statistics on the variables described above. The number of cells in this dataset is 18,144, which exceeds the number of records, 5,735. Note that, as was found by Connors et al. (1996), those receiving right-heart catheterization are more likely to die within 180 days than those who did not.
Interestingly, Lin et al. (1998) also find that when further adjustments were made, the risk of death is lower than that reported by Connors et al. (1996), and they conclude that "results of our sensitivity analysis provide additional insights into this important study and imply perhaps greater uncertainty about the role of RHC than those stated in the original report."

[4] The Connors et al. (1996) study considered 30-day, 60-day, and 180-day survival, and they also considered categories of admission diagnosis and categories of comorbidities illness as covariates. We restrict attention to 180-day survival by way of example, while we ignore admission diagnosis and comorbidities illness due to the prevalence of missing observations among these covariates. As it is our intention to demonstrate the utility of the proposed methods on actual data and not to become immersed in ad hoc adjustments that must be made to handle the prevalence of missing data for these additional covariates, we beg the reader's forgiveness in this matter. Nevertheless, even though we omit admission diagnosis and comorbidities illness as covariates, we indeed detect results that are qualitatively and quantitatively similar to those reported in Connors et al. (1996) and Lin et al. (1998).

[5] We are most grateful to Dr. B. Knaus and Dr. F. Harrell Jr. for making this data available to us, and also to Luz Saavedra for helping with data conversion.

Table 3: Summary Statistics (reports the mean, standard deviation, minimum, and maximum of each of the variables listed above)

Lin et al. (1998) note that cardiologists' and intensive care physicians' belief in the efficacy of RHC for guiding therapy for certain patients is so strong that it has prevented the conduct of a randomized clinical trial (RCT), while Connors et al. (1996) note that the most recent attempt at an RCT was stopped because most physicians refused to allow their patients to be randomized.

The confusion matrix (actual versus predicted, A/P) for the parametric propensity score model, with sample size 5,735, yields correct classification rates CR(0-1) = 66.7%, CR(0) = 80.0%, and CR(1) = 45.2%. The confusion matrix for the nonparametric propensity score model, with the same sample size, yields CR(0-1) = 69.9%, CR(0) = 82.1%, and CR(1) = 50.0%.

An examination of these confusion matrices demonstrates how, for this dataset, the nonparametric approach is better able than the Logit model to predict who receives treatment and who does not. The parametric approach correctly predicts 3,828 of the 5,735 patients, while the nonparametric approach correctly predicts 3,976 patients, thereby predicting an additional 148 patients correctly. The differences between the parametric and nonparametric versions of the weighting estimator reflect this additional number of correctly classified patients along with differences in the estimated probabilities of treatment themselves. The increased risk suggested by the parametric model drops from a 7% increase for those receiving RHC to roughly 0%.
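The correct classification rates reported above — CR(0-1) overall, and CR(0), CR(1) within each actual class — can be computed from actual and predicted treatment indicators as follows. This is an illustrative sketch with hypothetical toy data, not the authors' code:

```python
def classification_rates(actual, predicted):
    """Overall (CR(0-1)) and within-class (CR(0), CR(1)) correct classification rates."""
    n = len(actual)
    cr_overall = sum(a == p for a, p in zip(actual, predicted)) / n
    cr = {}
    for c in (0, 1):
        # Predictions for observations whose actual class is c
        members = [p for a, p in zip(actual, predicted) if a == c]
        cr[c] = sum(p == c for p in members) / len(members)
    return cr_overall, cr

# Hypothetical toy data: 4 patients, predicted treatment status from some model
overall, by_class = classification_rates([0, 0, 1, 1], [0, 1, 1, 1])
print(overall, by_class)  # 0.75 {0: 0.5, 1: 1.0}
```

Applied to the fitted propensity scores (thresholded at 0.5), this yields the CR figures quoted for the parametric and nonparametric models.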

Based upon the parametric propensity score estimate, the treatment effect is 0.07, while the nonparametric propensity score estimate yields a treatment effect of roughly zero. We then bootstrapped the sampling distribution of these estimates, and obtained 95% coverage error bounds of [0.044, 0.099] for the parametric approach and [−0.038, 0.011] for the nonparametric approach. Thus, we overturn the parametric testing result and conclude that patients receiving RHC treatment do not suffer an increased risk. We also conducted some sensitivity analysis. Using likelihood cross-validation rather than least-squares cross-validation yielded 95% coverage error bounds of [−0.034, 0.013]. Out of concern that the nonparametric results might reflect overfitting, we computed the leave-one-out kernel predictions, and again bootstrapped their error bounds. For the least-squares cross-validated estimates we obtained 95% coverage error bounds of [−0.015, 0.037], while for the likelihood cross-validated estimates we obtained 95% coverage error bounds of [−0.007, 0.039]. These error bounds indicate that the parametric model suggests a statistically significant increased risk of death for those receiving RHC, while the nonparametric model yields no significant difference. This does not appear to be a result of a loss of efficiency from using the nonparametric propensity score rather than the parametric one, as can be seen from a comparison of the out-of-sample prediction results in the confusion matrices: if the parametric model were correctly specified, one would expect the parametric model to predict better than a nonparametric model.

A Appendix A: Definition of $\hat V_1$, $\hat V_2$ and $\hat B_{h,\lambda}$

$\hat V_1 = n^{-1}\sum_{i=1}^n[\hat\tau_{ni} - \hat m_\tau]^2$, where $\hat\tau_{ni}$ is defined in (2.3) and $\hat m_\tau = n^{-1}\sum_{j=1}^n\hat\tau_{nj}$ is the sample mean of $\hat\tau_n(x_j)$. $\hat V_2 = n^{-1}\sum_{i=1}^n \hat u_i^2(t_i - \hat t_i)^2/[\hat t_i(1-\hat t_i)]^2$, where $\hat t_i$ is the kernel estimator of $E(t_i|x_i)$, $\hat u_i = y_i - \hat g_0(x_i) - t_i\hat\tau_n(x_i)$, and $\hat g_0(x_i)$ is defined in (2.2).
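For concreteness, the weighting estimator $\hat\tau = n^{-1}\sum_i (t_i - \hat t_i)y_i/[\hat t_i(1-\hat t_i)]$ and the percentile-bootstrap error bounds used above can be sketched as follows. This is a minimal illustration with a supplied propensity score and toy data; the paper estimates $\hat t_i$ by kernel smoothing, and the function names, resample count, and resampling scheme here are our own assumptions (in practice the propensity score would be re-estimated on each resample):

```python
import random

def weighting_estimator(y, t, p):
    """ATE weighting estimator n^{-1} sum (t_i - p_i) y_i / [p_i (1 - p_i)],
    where p_i estimates the propensity score E(t_i | x_i)."""
    n = len(y)
    return sum((ti - pi) * yi / (pi * (1.0 - pi))
               for yi, ti, pi in zip(y, t, p)) / n

def percentile_bootstrap(y, t, p, n_boot=399, alpha=0.05, seed=42):
    """95% percentile bootstrap bounds, resampling (y, t, p) triples jointly."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(weighting_estimator([y[i] for i in idx],
                                         [t[i] for i in idx],
                                         [p[i] for i in idx]))
    stats.sort()
    return stats[int((alpha / 2) * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Toy data with a known propensity score of 0.5 for every observation
y = [1, 0, 1, 0]; t = [1, 0, 1, 1]; p = [0.5] * 4
print(weighting_estimator(y, t, p))  # 1.0
print(percentile_bootstrap(y, t, p))
```

The reported error bounds are simply the empirical 2.5% and 97.5% quantiles of the bootstrapped estimates.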
To estimate $B_{h,\lambda}$ we need to estimate $B_{1s}$ ($s = 1,\ldots,q$) and $B_{2s}$ ($s = 1,\ldots,r$). These quantities can be estimated by estimating $f(x_i)$ and $m(x_i) = g_0(x_i) + \mu(x_i)\tau(x_i)$ by $\hat f(x_i)$ and $\hat m(x_i) = \hat g_0(x_i) + \hat\mu(x_i)\hat\tau_n(x_i)$, respectively; estimators of their derivatives can be obtained by taking derivatives (since the kernel function is differentiable up to order $\nu$). Finally, replacing the population mean $E(\cdot)$ by the sample mean leads to consistent estimators of $B_{1s}$ and $B_{2s}$, and hence of $B_{h,\lambda}$.

In Appendices A and B, because $M_n(x) = 1$ on the support of $f(x)$, we will omit the trimming function $M_{ni}$. Also, we will use the notation (s.o.), which is defined as follows: when $B_n$ is the leading term of $A_n$, we write $A_n = B_n + (s.o.)$, where (s.o.) denotes terms having probability order smaller than $B_n$. Also, when we write $A(x_i) = B(x_i) + (s.o.)$, it means that $n^{-1}\sum_{i=1}^n A(x_i) = n^{-1}\sum_{i=1}^n B(x_i) + (s.o.)$.

Proof of Lemma 2.2. We use the notation $g_s^{(l)}(x)$ to denote $\partial^l g(x)/\partial (x_s^c)^l$, the $l$th order partial derivative of $g(\cdot)$ with respect to $x_s^c$. Also, when $x_s^d$ is an unordered categorical variable, define an indicator function $I_s(\cdot,\cdot)$ by

$$I_s(x^d, z^d) = 1\big(x_s^d \neq z_s^d\big)\prod_{t\neq s}^{r} 1\big(x_t^d = z_t^d\big). \quad (A.1)$$

When $x_s^d$ is an ordered categorical variable, for notational simplicity, we assume that $x_s^d$ takes (finitely

many) consecutive integer values, and $I_s(\cdot,\cdot)$ is defined by

$$I_s(x^d, z^d) = 1\big(|x_s^d - z_s^d| = 1\big)\prod_{t\neq s}^{r} 1\big(x_t^d = z_t^d\big). \quad (A.2)$$

When using a second order kernel ($\nu = 2$), Hall et al. (2004a,b) have shown that

$$CV(h,\lambda)_{\nu=2} = \sum_{x^d}\int\Big\{\sum_{s=1}^{q}\frac{\kappa_2}{2}\big[(fg)_s^{(2)}(x) - g(x)f_s^{(2)}(x)\big]h_s^2 + \sum_{v^d}\sum_{s=1}^{r} I_s(v^d,x^d)\big[g(x^c,v^d) - g(x)\big]f(x^c,v^d)\lambda_s\Big\}^2 S(x)f(x)^{-1}\,dx^c + \frac{\kappa^q}{nh_1\cdots h_q}\sum_{x^d}\int\sigma^2(x)S(x)\,dx^c + (s.o.), \quad (A.3)$$

where $\kappa_2 = \int w(v)v^2\,dv$, $\kappa = \int w(v)^2\,dv$, and $I_s(\cdot)$ is defined in (A.1) and (A.2). By following exactly the same derivation as in Hall et al. (2004a,b), one can show that, with a $\nu$th order kernel,

$$CV(h,\lambda) = \sum_{x^d}\int\Big\{\sum_{s=1}^{q} B_{1s}(x)h_s^{\nu} + \sum_{s=1}^{r} B_{2s}(x)\lambda_s\Big\}^2 S(x)f(x)^{-1}\,dx^c + \frac{\kappa^q}{nh_1\cdots h_q}\sum_{x^d}\int\sigma^2(x)S(x)\,dx^c + (s.o.), \quad (A.4)$$

where $B_{1s}(x) = (\kappa_\nu/\nu!)\big[(fg)_s^{(\nu)}(x) - g(x)f_s^{(\nu)}(x)\big]$ ($s = 1,\ldots,q$), $\kappa_\nu = \int w(v)v^{\nu}\,dv$, $B_{2s}(x) = \sum_{v^d} I_s(v^d,x^d)\big[g(x^c,v^d) - g(x)\big]f(x^c,v^d)$, and (s.o.) denotes terms having smaller probability order, uniformly in $(h,\lambda)\in(0,\eta_n]^{q+r}$. The only differences between (A.3) and (A.4) are that $h_s^2$ is replaced by $h_s^{\nu}$ and that the definition of $B_{1s}$ is slightly different. Of course, (A.4) reduces to (A.3) if $\nu = 2$.

Define $a_s$ via $h_s = a_s n^{-1/(2\nu+q)}$ ($s = 1,\ldots,q$), and $b_s$ via $\lambda_s = b_s n^{-\nu/(2\nu+q)}$ ($s = 1,\ldots,r$); then (A.4) can be written as $CV(h,\lambda) = n^{-2\nu/(2\nu+q)}\chi(a,b) + (s.o.)$ uniformly in $(h,\lambda)\in(0,\eta_n]^{q+r}$, where

$$\chi(a,b) = \sum_{x^d}\int\Big\{\sum_{s=1}^{q} B_{1s}(x)a_s^{\nu} + \sum_{s=1}^{r} B_{2s}(x)b_s\Big\}^2 S(x)f(x)^{-1}\,dx^c + \frac{\kappa^q}{a_1\cdots a_q}\sum_{x^d}\int\sigma^2(x)S(x)\,dx^c. \quad (A.5)$$

We assume that

there exist unique finite positive constants $a_s^0$ ($s = 1,\ldots,q$) and finite non-negative constants $b_s^0$ ($s = 1,\ldots,r$) that minimize $\chi(a,b)$. \quad (A.6)

Define a $(q+r)\times(q+r)$ positive semidefinite matrix $A$ via its $t$th row and $s$th column as $A_{t,s} = \sum_{x^d}\int \tilde B_t(x)\tilde B_s(x)S(x)f(x)^{-1}\,dx^c$, where $\tilde B_t(x) = B_{1t}(x)$ for $t = 1,\ldots,q$, and $\tilde B_{q+t}(x) = B_{2t}(x)$ for

$t = 1,\ldots,r$; then it can easily be shown that a sufficient condition for (A.6) to hold is that $A$ is positive definite. From (A.4), (A.5) and (A.6), we obtain that $\hat h_s = a_s^0 n^{-1/(2\nu+q)} + o_p\big(n^{-1/(2\nu+q)}\big)$ and $\hat\lambda_s = b_s^0 n^{-\nu/(2\nu+q)} + o_p\big(n^{-\nu/(2\nu+q)}\big)$, from which Lemma 2.1 follows.

Proof of Theorem 2.1. In order to make the proof of Theorem 2.1 more manageable we make a number of simplifying assumptions. (i) We replace $\hat t(x_i)$ in the definition of $\hat\tau$ by the leave-one-out estimator $\hat t_{-i}(x_i)$; equivalently, one can redefine $\hat\tau$ by replacing $\hat t(x_i)$ by $\hat t_{-i}(x_i)$ in $\hat\tau$. (ii) We replace $\hat h_s$ by the non-stochastic quantity $h_s^0 = a_s^0 n^{-1/(2\nu+q)}$ ($s = 1,\ldots,q$), and $\hat\lambda_s$ by $\lambda_s^0 = b_s^0 n^{-\nu/(2\nu+q)}$ ($s = 1,\ldots,r$). (iii) When we evaluate the probability order of a term, we sometimes assume that $h_s = h$ for all $s = 1,\ldots,q$ to simplify the notation; for example, we will write $O(h^{\nu})$ for $O(\sum_{s=1}^{q} h_s^{\nu})$ to save space. Note that the proof carries through without making these simplifying assumptions. For example, ignoring the leave-one-out estimator only introduces some extra smaller order terms. Lemma 2.2 shows that $\hat h_s/h_s^0 - 1 = o_p(1)$ and $\hat\lambda_s/\lambda_s^0 - 1 = o_p(1)$, and by the stochastic equicontinuity result of Ichimura (2000), we know that the asymptotic distribution of $\hat\tau$ remains the same whether one uses the $\hat h_s$'s and $\hat\lambda_s$'s, or their non-stochastic leading terms (the $h_s^0$'s and $\lambda_s^0$'s).

We will repeatedly use the U-statistic H-decomposition in the proof below. We sometimes write $n^{-1}$ for $(n-1)^{-1}$ since this approximation does not affect the order of any quantity considered. We will use the short-hand notation $\hat t_i = \hat t_{-i}(x_i)$ and $\hat f_i = \hat f_{-i}(x_i)$, i.e.,

$$\hat t_i = \frac{n^{-1}\sum_{j\neq i} t_j K_n(x_j, x_i)}{\hat f_i}, \quad (A.7)$$

with $\hat f_i = n^{-1}\sum_{j\neq i} K_n(x_j, x_i)$.
Define $v_i = t_i - E(t_i|x_i) \equiv t_i - \mu_i$, so that $t_i = \mu_i + v_i$. Replacing $t_j$ by $\mu_j + v_j$ on the right-hand side of (A.7), we have

$$\hat t_i = \hat\mu_i + \hat v_i, \quad (A.8)$$

where

$$\hat\mu_i = \frac{n^{-1}\sum_{j\neq i}\mu_j K_n(x_j, x_i)}{\hat f_i} \quad (A.9)$$

and

$$\hat v_i = \frac{n^{-1}\sum_{j\neq i} v_j K_n(x_j, x_i)}{\hat f_i}. \quad (A.10)$$

We use the short-hand notation $w_i$ and $\hat w_i$ defined by

$$w_i = \mu_i(1-\mu_i) \quad\text{and}\quad \hat w_i = \hat t_i(1-\hat t_i). \quad (A.11)$$

We use the following identity to handle the random denominator of $\hat\tau$:

$$\frac{1}{\hat w_i} = \frac{1}{w_i} + \frac{w_i - \hat w_i}{w_i^2} + \frac{(w_i - \hat w_i)^2}{w_i^2\hat w_i}. \quad (A.12)$$

We have defined $\bar\tau$ in (2.17). We now define another intermediate quantity $\tilde\tau$ ($v_i = t_i - \mu_i$, and we omit $M_{ni}$ for notational simplicity):

$$\tilde\tau = \frac{1}{n}\sum_{i=1}^{n}\frac{(t_i - \mu_i)y_i}{w_i} \equiv \frac{1}{n}\sum_{i=1}^{n}\frac{v_i y_i}{w_i}. \quad (A.13)$$

By adding and subtracting terms in $\sqrt n(\hat\tau - \tau)$, we get

$$\sqrt n(\hat\tau - \tau) = \sqrt n\big[(\hat\tau - \bar\tau) + (\bar\tau - \tilde\tau) + (\tilde\tau - \tau)\big] \equiv J_{1n} + J_{2n} + J_{3n}, \quad (A.14)$$

where $J_{1n} = \sqrt n(\hat\tau - \bar\tau)$, $J_{2n} = \sqrt n(\bar\tau - \tilde\tau)$, and $J_{3n} = \sqrt n(\tilde\tau - \tau)$.

Recall that $w_i = \mu_i(1-\mu_i)$ and $\hat w_i = \hat t_i(1-\hat t_i)$. Using (A.12), we get from (2.1)

$$\hat\tau = \frac{1}{n}\sum_{i=1}^{n}\big(t_i - \hat t_i\big)y_i\Big[\frac{1}{w_i} + \frac{w_i - \hat w_i}{w_i^2} + \frac{(w_i - \hat w_i)^2}{w_i^2\hat w_i}\Big] \equiv L_{1n} + L_{2n} + L_{3n}, \quad (A.15)$$

where

$$L_{1n} = n^{-1}\sum_i (t_i - \hat t_i)y_i/w_i,\qquad L_{2n} = n^{-1}\sum_i (t_i - \hat t_i)(w_i - \hat w_i)y_i/w_i^2,\qquad L_{3n} = n^{-1}\sum_i (t_i - \hat t_i)(w_i - \hat w_i)^2 y_i/[w_i^2\hat w_i]. \quad (A.16)$$

Note that $L_{1n} = \bar\tau$; therefore, by (A.15) we have

$$J_{1n} = \sqrt n(\hat\tau - \bar\tau) = \sqrt n L_{2n} + \sqrt n L_{3n}. \quad (A.17)$$

Lemmas A.3 and A.4 (see below) give the leading terms of $J_{1n}$ and $J_{2n}$. Using (2.6) and adding and subtracting terms, we write $J_{3n} = \sqrt n(\tilde\tau - \tau)$ as

$$J_{3n} = n^{-1/2}\sum_i\big(v_i y_i/w_i - \tau\big) = n^{-1/2}\sum_i\big(v_i y_i/w_i - \tau_i\big) + n^{-1/2}\sum_i(\tau_i - \tau) = n^{-1/2}\sum_i\big[v_i(g_{0i} + \tau_i t_i + u_i)/w_i - \tau_i\big] + n^{-1/2}\sum_i(\tau_i - \tau). \quad (A.18)$$

By Eq. (A.18), Lemma A.3 and Lemma A.4, we obtain from Eq. (A.14) that

$$\begin{aligned}\sqrt n(\hat\tau - \tau - B_{h,\lambda}) &= J_{1n} + J_{2n} - n^{1/2}B_{h,\lambda} + J_{3n}\\ &= n^{-1/2}\sum_i v_i(2\mu_i - 1)\tau_i/w_i - n^{-1/2}\sum_i v_i(g_{0i} + \tau_i\mu_i)/w_i\\ &\quad + n^{-1/2}\sum_i\big\{v_i(g_{0i} + \tau_i t_i + u_i)/w_i - \tau_i\big\} + n^{-1/2}\sum_i(\tau_i - \tau) + o_p(1)\\ &= n^{-1/2}\sum_i\frac{v_i u_i}{w_i} + n^{-1/2}\sum_i(\tau_i - \tau) + o_p(1) \equiv Z_{n2} + Z_{n3} + o_p(1), \quad (A.19)\end{aligned}$$

where $Z_{n2} = n^{-1/2}\sum_i v_i u_i/w_i$ and $Z_{n3} = n^{-1/2}\sum_i(\tau_i - \tau)$; also, in the above we have used the following cancellation result ($w_i = \mu_i(1-\mu_i)$):

$$n^{-1/2}\sum_i\Big[\frac{v_i(2\mu_i - 1)}{w_i} - \frac{v_i\mu_i}{w_i} + \frac{v_i(\mu_i + v_i)}{w_i} - 1\Big]\tau_i = -n^{-1/2}\sum_i\frac{(\mu_i + v_i) - (\mu_i + v_i)^2}{w_i}\tau_i = 0, \quad (A.20)$$

since $(\mu_i + v_i) - (\mu_i + v_i)^2 \equiv t_i - t_i^2 = 0$ (because $t_i^2 = t_i$). Theorem 2.1 follows from (A.19) and the Lindeberg central limit theorem.

Proof of Lemma 2.3. From $\sqrt n(\bar\tau - \tau - B_{h,\lambda}) = \sqrt n(\bar\tau - \tilde\tau - B_{h,\lambda}) + \sqrt n(\tilde\tau - \tau) = J_{2n} - n^{1/2}B_{h,\lambda} + J_{3n}$, and using (A.18) and Lemma A.4, we have ($v_i = t_i - \mu_i$)

$$\begin{aligned}\sqrt n(\bar\tau - \tau - B_{h,\lambda}) &= J_{2n} - n^{1/2}B_{h,\lambda} + J_{3n}\\ &= -n^{-1/2}\sum_i v_i(g_{0i} + \tau_i\mu_i)/w_i + n^{-1/2}\sum_i\big[v_i(g_{0i} + \tau_i t_i + u_i)/w_i - \tau_i\big] + n^{-1/2}\sum_i(\tau_i - \tau) + o_p(1)\\ &= n^{-1/2}\sum_i\Big[\frac{v_i^2}{w_i} - 1\Big]\tau_i + n^{-1/2}\sum_i\frac{v_i u_i}{w_i} + n^{-1/2}\sum_i(\tau_i - \tau) + o_p(1)\\ &\equiv Z_{n1} + Z_{n2} + Z_{n3} + o_p(1) \xrightarrow{d} N(0, V_1 + V_2 + V_3) \quad (A.21)\end{aligned}$$

by the Lindeberg central limit theorem,

where $Z_{n2}$ and $Z_{n3}$ are defined in (A.19), and $Z_{n1} = n^{-1/2}\sum_i[v_i^2/w_i - 1]\tau_i$. Note that by Lemma A.3 and (A.20) we know that $J_{1n} = -Z_{n1} + o_p(1)$.

Below we present some lemmas that are used in proving Theorem 2.1. We will use the following identity to handle the random denominator in the kernel estimator: for any positive integer $p$ we have

$$\frac{1}{\hat f_i} = \frac{1}{f_i} + \sum_{l=1}^{p}\frac{(f_i - \hat f_i)^l}{f_i^{l+1}} + \frac{(f_i - \hat f_i)^{p+1}}{f_i^{p+1}\hat f_i}. \quad (A.22)$$

For example, in Lemma A.1 below we need to evaluate a term like $n^{-1}\sum_i v_i y_i\hat v_i/w_i^2$; here $\hat v_i = n^{-1}\sum_{j\neq i} v_j K_{n,ij}/\hat f_i$ has a random denominator $\hat f_i$. By computing its second moment, one can easily show that the term associated with $(f_i - \hat f_i)^l/f_i^{l+1}$ has a smaller order than the main term associated with $1/f_i$. Also, using the uniform convergence rate $\sup_{x\in S}|\hat f(x) - f(x)| = O_p\big(\sum_{s=1}^{q} h_s^{\nu} + (\ln n/(nh_1\cdots h_q))^{1/2}\big)$, together with $\inf_{x\in S} f(x) \geq \delta > 0$, one can easily show that the last remainder term, associated with $(f_i - \hat f_i)^{p+1}/(f_i^{p+1}\hat f_i)$, is of smaller order than the first leading term (by choosing $p$ to be sufficiently large). Therefore, using (A.22) we have $n^{-1}\sum_i v_i y_i\hat v_i/w_i^2 = n^{-1}\sum_i v_i y_i\hat v_i\hat f_i/(f_i w_i^2) + (s.o.)$. Now the leading term $n^{-1}\sum_i v_i y_i\hat v_i\hat f_i/(f_i w_i^2)$ does not contain the random denominator $\hat f_i$, and its probability order can be easily evaluated by using the H-decomposition of U-statistics.

Lemma A.1. $L_{2n} = n^{-1}\sum_i v_i(2\mu_i - 1)\tau_i/w_i + o_p(n^{-1/2})$.

Proof: Recall that $w_i = \mu_i(1-\mu_i)$, $\hat w_i = \hat t_i(1-\hat t_i)$, $t_i = \mu_i + v_i$, and $\hat t_i = \hat\mu_i + \hat v_i$. We have

$$\begin{aligned}L_{2n} &= n^{-1}\sum_i y_i(t_i - \hat t_i)\big[(\mu_i - \hat t_i) - (\mu_i^2 - \hat t_i^2)\big]/w_i^2 = n^{-1}\sum_i y_i(t_i - \hat t_i)(\mu_i - \hat t_i)\big[1 - (\mu_i + \hat t_i)\big]/w_i^2\\ &= n^{-1}\sum_i y_i(\mu_i - \hat\mu_i + v_i - \hat v_i)(\mu_i - \hat\mu_i - \hat v_i)\big[1 - (\mu_i + \hat\mu_i + \hat v_i)\big]/w_i^2\\ &= n^{-1}\sum_i y_i v_i\hat v_i\big[2\mu_i - 1\big]/w_i^2 + o_p\big(n^{-1/2}\big) \quad\text{(using $\hat\mu_i = \mu_i + (\hat\mu_i - \mu_i)$ and Lemma B.3)}\\ &= \frac{1}{n(n-1)}\sum_i\sum_{j\neq i} y_i v_i v_j(2\mu_i - 1)K_{n,ij}/[f_i w_i^2] + o_p\big(n^{-1/2}\big) \quad\text{(by Lemma B.2 and $E(v_i|x_i) = 0$)}\\ &= \frac{2}{n(n-1)}\sum_i\sum_{j>i} H_{n,a}(z_i, z_j) + (s.o.), \quad (A.24)\end{aligned}$$

where $H_{n,a}(z_i, z_j) = (1/2)v_i v_j\big\{y_i(2\mu_i - 1)/[f_i w_i^2] + y_j(2\mu_j - 1)/[f_j w_j^2]\big\}K_{n,ij}$ and $z_i = (x_i, t_i, u_i)$. Define $H_{1n,a}(z_i) = E[H_{n,a}(z_i, z_j)|z_i] = (1/2)v_i\tau_i(2\mu_i - 1)/w_i + (s.o.)$ by Lemma B.4(i). Hence, by the U-statistic H-decomposition we have

$$\begin{aligned}L_{2n} &= \frac{2}{n(n-1)}\sum_i\sum_{j>i} H_{n,a}(z_i, z_j) + (s.o.)\\ &= 0 + \frac{2}{n}\sum_i H_{1n,a}(z_i) + \frac{2}{n(n-1)}\sum_i\sum_{j>i}\big\{H_{n,a}(z_i, z_j) - H_{1n,a}(z_i) - H_{1n,a}(z_j) + 0\big\} + (s.o.)\\ &= n^{-1}\sum_i v_i\tau_i(2\mu_i - 1)/w_i + O_p\big((nh^{q/2})^{-1}\big) + (s.o.) = n^{-1}\sum_i v_i\tau_i(2\mu_i - 1)/w_i + o_p\big(n^{-1/2}\big) \quad (A.25)\end{aligned}$$

by Lemma B.4 and assumption (A3), where we also used the fact that the degenerate U-statistic $U_{n,a} \stackrel{def}{=} [2/(n(n-1))]\sum_i\sum_{j>i}\{H_{n,a}(z_i, z_j) - H_{1n,a}(z_i) - H_{1n,a}(z_j)\}$ has a second moment of $E[U_{n,a}^2] = O\big((n^2 h^q)^{-1}\big)$, so $U_{n,a} = O_p\big((nh^{q/2})^{-1}\big)$.

Lemma A.2. $L_{3n} = O_p\big(h^{2\nu} + h^2(nh^q)^{-1}\big) = o_p\big(n^{-1/2}\big)$.

Proof: Using the identity

$$\frac{1}{\hat w_i} = \frac{1}{w_i} + \sum_{l=1}^{p}\frac{(w_i - \hat w_i)^l}{w_i^{l+1}} + \frac{(w_i - \hat w_i)^{p+1}}{w_i^{p+1}\hat w_i}, \quad (A.26)$$

one can show that the leading term of $L_{3n}$ is $L_{3n,1} = n^{-1}\sum_i y_i(t_i - \hat t_i)(w_i - \hat w_i)^2/w_i^3$. This is because (i) it is easy to show (by computing their second moments) that the terms associated with $(w_i - \hat w_i)^l/w_i^{l+1}$ have a smaller order than the main term associated with $1/w_i$; also, using the uniform convergence rate $\sup_{x\in S}|\hat\mu(x) - \mu(x)| = O_p\big(\sum_{s=1}^{q} h_s^{\nu} + (\ln n/(nh_1\cdots h_q))^{1/2}\big)$, together with $\inf_{x\in S}\mu(x) \geq c_0 > 0$ and $\sup_{x\in S}\mu(x) \leq c_1 < 1$ ($0 < c_0 < c_1 < 1$), one can easily show that the last remainder term, associated with $(w_i - \hat w_i)^{p+1}/(w_i^{p+1}\hat w_i)$, is of smaller order than the first leading term (by choosing $p$ to be sufficiently large if needed). By noting that $t_i = \mu_i + v_i$ and $w_i - \hat w_i = (\mu_i - \hat t_i)[1 - (\mu_i + \hat t_i)]$, we have

$$\begin{aligned}L_{3n,1} &= n^{-1}\sum_i y_i(t_i - \hat t_i)(w_i - \hat w_i)^2/w_i^3\\ &= n^{-1}\sum_i\big[g_{0i} + \tau_i t_i + u_i\big]\big[(\mu_i - \hat t_i) + v_i\big](\mu_i - \hat t_i)^2\big[1 - (\mu_i + \hat t_i)\big]^2/w_i^3\\ &\simeq n^{-1}\sum_i\big[v_i(\mu_i - \hat t_i)^2 + (\mu_i - \hat t_i)^3\big] = O_p\big(h^{2\nu} + h^2(nh^q)^{-1}\big) \quad (A.27)\end{aligned}$$

by Lemma B.3, where in the above $A \simeq B$ means that $A = B + (s.o.)$.
Lemma A.3. $J_{1n} = n^{-1/2}\sum_i v_i(2\mu_i - 1)\tau_i/w_i + o_p(1)$.

Proof: It follows from Lemmas A.1 and A.2.

Lemma A.4. $J_{2n} = n^{1/2}B_{h,\lambda} - n^{-1/2}\sum_{i=1}^{n} v_i(g_{0i} + \tau_i\mu_i)/w_i + o_p(1)$.

Proof: Using $\hat t_i = \hat\mu_i + \hat v_i$, we have $J_{2n} = n^{-1/2}\sum_i(\mu_i - \hat t_i)y_i/w_i = n^{-1/2}\sum_i(\mu_i - \hat\mu_i)y_i/w_i - n^{-1/2}\sum_i\hat v_i y_i/w_i \equiv J_{2n,1} - J_{2n,2}$. We consider $J_{2n,1}$ first:

$$J_{2n,1} = n^{-1/2}\sum_i(\mu_i - \hat\mu_i)y_i/w_i = n^{1/2}B_{h,\lambda} + O_p\big(n^{1/2}h^{2\nu+2} + h(nh^q)^{-1/2}\big)$$

by Lemma B.2, where $B_{h,\lambda}$ is defined in Lemma B.2. Next,

$$\begin{aligned}J_{2n,2} &= n^{-1/2}\sum_i\hat v_i\hat f_i y_i/(f_i w_i) + o_p(1) \quad\text{(by using Eq. (A.22))}\\ &= n^{-1/2}(n-1)^{-1}\sum_i\sum_{j\neq i} v_j y_i K_{n,ij}/(f_i w_i)\\ &= n^{-1/2}(n-1)^{-1}\sum_i\sum_{j>i}\big\{v_j y_i/(f_i w_i) + v_i y_j/(f_j w_j)\big\}K_{n,ij}\\ &= n^{1/2}\frac{2}{n(n-1)}\sum_i\sum_{j>i} H_{n,b}(z_i, z_j), \quad (A.29)\end{aligned}$$

where $H_{n,b}(z_i, z_j) = (1/2)\{v_j y_i/(f_i w_i) + v_i y_j/(f_j w_j)\}K_{n,ij}$ and $z_i = (x_i, t_i, u_i)$. By noting that $E(v_i|x_i) = 0$, we have (using $y_j = g_{0j} + \tau_j(\mu_j + v_j) + u_j$)

$$H_{1n,b}(z_i) \stackrel{def}{=} E[H_{n,b}(z_i, z_j)|z_i] = (1/2)v_i(g_{0i} + \tau_i\mu_i)/w_i + (s.o.)$$

by Lemma B.4(iii). Hence, by the U-statistic H-decomposition we have

$$\begin{aligned}J_{2n,2} &= n^{1/2}\Big\{0 + \frac{2}{n}\sum_i H_{1n,b}(z_i) + \frac{2}{n(n-1)}\sum_i\sum_{j>i}\big[H_{n,b}(z_i, z_j) - H_{1n,b}(z_i) - H_{1n,b}(z_j) + 0\big]\Big\}\\ &= n^{-1/2}\sum_i v_i(g_{0i} + \tau_i\mu_i)/w_i + O_p\big((nh^q)^{-1/2}\big) \quad (A.31)\end{aligned}$$

because the last term in the H-decomposition is a degenerate U-statistic which has an order $n^{1/2}O_p\big((nh^{q/2})^{-1}\big) = O_p\big((nh^q)^{-1/2}\big)$.

B Appendix B

Lemma B.1. Let $D$ denote the support of $x^d$. For all $x^d \in D$, let $g(x^d, x^c) \in G^{\nu}$ and $f(x^d, x^c) \in G^{\nu-1}$, where $\nu \geq 2$ is an integer. Define $\eta = \sum_{s=1}^{q} h_s^{\nu} + \sum_{s=1}^{r}\lambda_s$. Suppose the kernel function $W$ satisfies (A2). Then, uniformly in $x$,

(i) $E\{[g(X) - g(x)]K_n(X, x)\} = \sum_{s=1}^{q} C_{1s}(x)h_s^{\nu} + \sum_{s=1}^{r} C_{2s}(x)\lambda_s + O\big(\eta(\sum_{s=1}^{q} h_s^2 + \sum_{s=1}^{r}\lambda_s)\big)$;


More information

A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators

A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators ANNALS OF ECONOMICS AND FINANCE 4, 125 136 (2003) A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators Insik Min Department of Economics, Texas A&M University E-mail: i0m5376@neo.tamu.edu

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes Introduction Method Theoretical Results Simulation Studies Application Conclusions Introduction Introduction For survival data,

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity

Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity Jörg Schwiebert Abstract In this paper, we derive a semiparametric estimation procedure for the sample selection model

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

Confidence intervals for kernel density estimation

Confidence intervals for kernel density estimation Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model

The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Hidehiko Ichimura Graduate School of Public Policy and Graduate School of Economics University of Tokyo A Conference in honor of

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric

More information

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1

Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 Imbens/Wooldridge, Lecture Notes 1, Summer 07 1 What s New in Econometrics NBER, Summer 2007 Lecture 1, Monday, July 30th, 9.00-10.30am Estimation of Average Treatment Effects Under Unconfoundedness 1.

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

Nonparametric confidence intervals. for receiver operating characteristic curves

Nonparametric confidence intervals. for receiver operating characteristic curves Nonparametric confidence intervals for receiver operating characteristic curves Peter G. Hall 1, Rob J. Hyndman 2, and Yanan Fan 3 5 December 2003 Abstract: We study methods for constructing confidence

More information

A multiple testing procedure for input variable selection in neural networks

A multiple testing procedure for input variable selection in neural networks A multiple testing procedure for input variable selection in neural networks MicheleLaRoccaandCiraPerna Department of Economics and Statistics - University of Salerno Via Ponte Don Melillo, 84084, Fisciano

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information

Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study

Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study MPRA Munich Personal RePEc Archive Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study Songxi Chen and Jing Qin and Chengyong Tang January 013 Online

More information

Imbens/Wooldridge, IRP Lecture Notes 2, August 08 1

Imbens/Wooldridge, IRP Lecture Notes 2, August 08 1 Imbens/Wooldridge, IRP Lecture Notes 2, August 08 IRP Lectures Madison, WI, August 2008 Lecture 2, Monday, Aug 4th, 0.00-.00am Estimation of Average Treatment Effects Under Unconfoundedness, Part II. Introduction

More information

Generated Covariates in Nonparametric Estimation: A Short Review.

Generated Covariates in Nonparametric Estimation: A Short Review. Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

Finite Sample Performance of Semiparametric Binary Choice Estimators

Finite Sample Performance of Semiparametric Binary Choice Estimators University of Colorado, Boulder CU Scholar Undergraduate Honors Theses Honors Program Spring 2012 Finite Sample Performance of Semiparametric Binary Choice Estimators Sean Grover University of Colorado

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

A nonparametric method of multi-step ahead forecasting in diffusion processes

A nonparametric method of multi-step ahead forecasting in diffusion processes A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School

More information

Balancing Covariates via Propensity Score Weighting: The Overlap Weights

Balancing Covariates via Propensity Score Weighting: The Overlap Weights Balancing Covariates via Propensity Score Weighting: The Overlap Weights Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu PSU Methodology Center Brown Bag April 6th, 2017 Joint

More information

Regression Discontinuity Designs in Stata

Regression Discontinuity Designs in Stata Regression Discontinuity Designs in Stata Matias D. Cattaneo University of Michigan July 30, 2015 Overview Main goal: learn about treatment effect of policy or intervention. If treatment randomization

More information

The Value of Knowing the Propensity Score for Estimating Average Treatment Effects

The Value of Knowing the Propensity Score for Estimating Average Treatment Effects DISCUSSION PAPER SERIES IZA DP No. 9989 The Value of Knowing the Propensity Score for Estimating Average Treatment Effects Christoph Rothe June 2016 Forschungsinstitut zur Zukunft der Arbeit Institute

More information

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs

Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Xiao-Hua Zhou, Huazhen Lin, Eric Johnson Journal of Royal Statistical Society Series

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Bayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects

Bayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects Bayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects P. Richard Hahn, Jared Murray, and Carlos Carvalho July 29, 2018 Regularization induced

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1 Keisuke Hirano University of Miami 2 Guido W. Imbens University of California at Berkeley 3 and BER Geert Ridder

More information

Deductive Derivation and Computerization of Semiparametric Efficient Estimation

Deductive Derivation and Computerization of Semiparametric Efficient Estimation Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Independent and conditionally independent counterfactual distributions

Independent and conditionally independent counterfactual distributions Independent and conditionally independent counterfactual distributions Marcin Wolski European Investment Bank M.Wolski@eib.org Society for Nonlinear Dynamics and Econometrics Tokyo March 19, 2018 Views

More information