A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects

Size: px
Start display at page:

Download "A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects"

Transcription

1 A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb October 26, 2015 <<< PRELIMINARY AND INCOMPLETE >>> Abstract In this paper, we utilize a new graphical procedure to show how multiple treatment effects can be compared while controlling the familywise error rate (the probability of finding one or more spurious differences between the parameters of interest). Monte Carlo simulations suggest that this procedure adequately controls the familywise error rate in finite samples, and has average power nearly identical to a simple max-t procedure. We illustrate our proposed approach using data from two field experiments, one comparing different types of performance pay for teachers in India, and another comparing different student achievement programs in Canada. Keywords: multiple comparisons; familywise error rate; average treatment effect; bootstrap Department of Economics, Ryerson University. brennan@ryerson.ca Department of Economics, Carleton University. matt.webb@carleton.ca 1

2 1 Introduction In the case of comparing multiple treatments, the problem at hand is two-fold: (A) we want to know whether or not the effect of each treatment is different from zero, and (B) we want to know whether or not the effect of each treatment is different from that of any of the other treatments. 1 In order to make the discussion of our problem more concrete, consider the following regression model: Y t = β 0 C t + k β i T i,t + Z tη + U t, t = 1,..., n, (1.1) i=1 where C t equals one if individual t belongs to the control group and zero otherwise, T i,t equals one if individual t receives treatment i {1,..., k} and zero otherwise, Z t is a vector of other characteristics for individual t, and U t is an idiosyncratic error term for individual t. The (average) treatment effect of the ith treatment is defined as δ i β i β 0, for i {1,..., k}. Thus, the first part of our problem involves testing δ i = 0, for each i {1,..., k}, (1.2) while the second part of our problem involves testing δ i = δ j, for each unique (i, j) {1,..., k} {1,..., k}. (1.3) Note that, since δ i = 0 is equivalent to β i = β 0, and δ i = δ j is equivalent to 1 Note that this problem is quite distinct from the problem of comparing heterogenous treatment effects, where different types of individuals may respond differently to the same treatment. Lee and Shaikh (2014) and Gu and Shen (2015) consider this problem using a multiple comparisons approach. 2

3 β i = β j, our problem boils down to testing β i = β j, for each unique (i, j) K K, (1.4) where K = {0,..., k}. Thus, our problem can be seen to involve making a total of ( ) ( k+1 2 comparisons: the k comparisons implicit in (1.2), plus the k ) 2 comparisons implicit in (1.3). For example, with k = 2 treatments, we must make 3 comparisons: the comparison of the 2 treatment effects to zero, and the comparison between the 2 treatment effects. With k = 3 treatments, we must make 6 comparisons, and so on. It is well-known that, when conducting more than one hypothesis test at a given nominal level simultaneously, the probability of rejecting at least one true hypothesis (i.e., the familywise error rate) is often well in excess of that given nominal level. To illustrate the severity of this issue, we generate, for k {2,..., 5}, one million samples from the model in (1.1) as follows. We set β 0 = = β k = 0, and assign 100 observations to the control group (i.e., n t=1 C t = 100) and 100 observations to each of the k treatment groups (i.e., n t=1 T i,t = 100 for each i {1,..., k}), so that n = 100(k + 1). For each t, Z t = 0 and U t is an independent standard normal draw. Within each sample, we independently test (A) each of the k restrictions in (1.2), and (B) each of the ( k 2) restrictions in (1.3), using conventional t-tests at the 5% nominal level. The rejection frequencies for these tests are shown in Figure 1. Specifically, the dash-dotted line shows the frequency of rejecting at least one of the k restrictions in (1.2), while the dashed line shows the frequency of rejecting at least one of the ( ) k 2 restrictions in (1.3). The solid line shows the empirical familywise error rate (i.e., the frequency of rejecting at least one the ( ) k+1 2 restrictions in (1.4)). Note that, even with k = 2, the empirical familywise error rate is approximately 0.122; with k = 5, the empirical familywise error rate is approximately

4 Figure 1: Empirical Familywise Error Rates for Independent t-tests In recognition of such problems, a wide variety of multiple testing procedures, such as max-t procedures (see, e.g., Romano and Wolf, 2005a, and Section 4 below) have been developed to control the familywise error rate (or other generalized error rates, such as the false discovery rate; see Benjamini and Hochberg, 1995). For the specific problem of making multiple pairwise comparisons that we are interested in here, Bennett and Thompson (forthcoming), hereafter BT, have recently proposed a graphical procedure that identifies differences between a pair of parameters through the non-overlap of the so-called uncertainty intervals (see Section 2) for those parameters. 2 This graphical procedure, which can be seen as a resampling-based generalization of Tukey s (1953) procedure, is appealing because it offers users more than a 0-1 ( Yes-No ) decision regarding differences between parameters. Indeed, this procedure allows users to determine both statistical and practical significance of pairwise differences, while also providing them with a measure of uncertainty concerning 2 Note that BT denote the total number of parameters by k, while the total number of parameters we consider is k

5 the locations of the individual parameters. The remainder of this paper is organized as follows. In Section 2, we describe how the procedure of BT can be applied in the current context. In Section 3, we consider an extension of this procedure designed to identify the best treatment under consideration. Section 4 describes the results of a set of Monte Carlo simulations designed to examine the finite-sample performance of these procedures. In Section 5, we present the results of two empirical examples, one in which the different treatments are k = 2 types of performance pay for teachers used in a field experiment in India, and another in which the different treatments are k = 3 student achievement programs used in a field experiment in Canada. 2 The Overlap Procedure The basic approach of this procedure is to present each of the parameter estimates ˆβ n,i, i K, together with a corresponding uncertainty interval, [ ˆβn,i ± γ se ( ˆβn,i )], ( ) whose length is determined by the parameter γ (discussed below) and se ˆβn,i, the standard error of ˆβ n,i (high-level assumptions on the large-sample behaviour of these estimators are given at the end of this section). We denote the lower and upper endpoints of the uncertainty interval for β i by L n,i and U n,i, respectively. These uncertainty intervals are used to make inferences about the ordering of the parameters of interest as follows: We infer that β i > β j if L n,i > U n,j. If the uncertainty intervals for β i and β j overlap one another, we can make no such inference. For this reason, we refer to this procedure as the overlap procedure. 5

6 Note that, if all k parameters are equal, the ideal choice of γ at the nominal familywise error rate α is the smallest value satisfying } Prob P {max {L n,i(γ)} > min {U n,i(γ)} α, (2.1) i K i K where P denotes the unknown probability mechanism from which our data is generated. Of course, because P is unknown, this choice is infeasible. Instead, we choose γ to satisfy the bootstrap analogue of (2.1): Prob ˆPn {max i K { L n,i (γ) } { > min U n,i (γ) }} α, i K where ˆP n is an estimate of P, and L n,i and U n,i are, respectively, the lower and upper endpoints of [ ( ˆβ n,i ˆβ ( )] n,i ) ± γ se β n,i, with ˆβ ( ) n,i and se β n,i are obtained under sampling from ˆP n. We assume only that n( ˆβn β) and n( ˆβ n,i ˆβ n,i ) both have the same (continuous and strictly increasing) limiting distribution, and that ( ) n se ˆβn,i probability to the same (positive) constant. and ( ) n se β n,i both converge in Given these high-level assumptions, BT show that the overlap procedure described above (1) controls the familywise error rate asymptotically, and (2) is consistent, in the sense that any differences between parameter pairs are inferred (in the correct direction) with probability one asymptotically. Simulation evidence presented in BT and in Section 4 below suggest that the overlap procedure provides satisfactory control of the familywise error rate and has good (average) power properties in finite samples. 6

7 3 A Modified Overlap Procedure for Multiple Comparisons with the Best The procedure described in the previous section is designed to control the familywise error rate across all pairwise comparisons. This approach allows for a (potentially complete) ranking of all the treatments under consideration. For example, assuming (without loss of generality) that a larger value of the outcome variable is better, one could infer that treatment i is the best (i.e., β i > β j for all j K \ {i}) if L n,i > U n,j for all j K \ {i}. Similarly, one may be able to identify a second best treatment, a third best treatment, and so on. While such a complete ranking may occasionally be of value, interest often centers on identifying only the (first) best treatment. That is, we may only want to know whether or not the treatment effect which is estimated to be the largest is actually statistically distinguishable from the other treatments effects (and zero). Such a problem is the focus of so-called multiple comparisons with the best procedures (see, e.g., Hsu, 1981, 1984 or Horrace and Schmidt 2000). Below, we follow BT in developing a modification of the graphical procedure described in the previous section to focus on this problem. 3 The basic idea behind this modification is that, by eliminating irrelevant pairwise comparisons (i.e., those in which neither of the parameters is estimated to be largest), it may be possible to increase the power of the procedure. We begin by introducing some further notation. Let [1], [2],..., [k + 1], be the 3 In fact, BT introduce a generalization of the multiple comparisons with the best approach which allows for comparisons within the r best (r being some integer smaller than the total number of parameters under consideration). Such an approach may be of use when the number of parameters under consideration is very large (perhaps in the hundreds or thousands), and one is willing to abandon pursuit of a complete ranking in return for the ability to resolve more comparisons within the top r. 7

8 random indices such that ˆβ n,[1] > ˆβ n,[2] > > ˆβ n,[k+1]. This means that β [1] is the true value of the parameter which is estimated to be largest, and not necessarily the largest parameter value. Similarly, L n,[1] (γ) = ˆβ ( ) n,[1] γ se ˆβn,[1] is the lower endpoint of the uncertainty interval associated with the largest point estimate, which is not necessarily the largest lower endpoint. Indeed, if the standard error of ˆβ n,[1] is relatively large, it could be the case that the lower endpoint associated with this point estimate extends below the lower endpoint associated with a smaller point estimate. Similar to what is done in the basic overlap procedure, we infer that β [1] is the largest parameter value in the collection if L n,[1] > U n,[j] for all j > 1. Thus, since our ideal choice of γ is the smallest value satisfying 4 { Prob P {L n,[1] (γ) > max Un,[j] (γ) }} α (3.1) j {2,...,k+1} when all k parameters are equal, a feasible choice of γ is the smallest value satisfying { Prob ˆPn {L n,[1](γ) > max U n,[j] (γ) }} α. (3.2) j {2,...,k+1} Simulation evidence presented in the next section suggests that the choice of γ resulting from this modification may be substantially smaller than the choice resulting from 4 If a smaller value of the outcome variable is better, { this procedure would be modified { as follows. The probability in (3.1) would be replaced by Prob P Un,[k+1] (γ) < max j {1,...,k} Ln,[j] (γ) }} }}, and the probability in (3.2) would be replaced by Prob ˆPn {Un,[k+1] (γ) < max j {1,...,k} {U n,[j] (γ). The resulting uncertainty intervals could then be used to infer that β [k+1] is the smallest parameter value in the collection if U n,[k+1] < U n,[j] for all j < (k + 1). 8

9 the basic overlap procedure, thereby greatly increasing the power of the procedure. 4 Simulation Evidence We now examine the finite-sample performance of the overlap procedures described above by way of several Monte Carlo experiments. As in BT, we consider a max-t procedure as a benchmark for the basic overlap procedure. Specifically, this procedure rejects the restriction that β i = β j whenever T n,(i,j) = ˆβ n,i ˆβ n,j [ ( )] 2 [ ( )]. 2 se ˆβn,i + se ˆβn,j is greater than 1 α quantile of max i j T n,(i,j), where T n,(i,j) = ( ˆβ n,i ˆβ ) ( n,i ˆβ n,j ˆβ ) n,j [ ( )] 2 [ ( )]. 2 se ˆβ n,i + se ˆβ n,j It is important to note that this test statistic (and its bootstrap counterpart) is not a function of the estimated covariance between ˆβ n,i and ˆβ n,j, since this estimate will always be zero in the designs we consider. This is due to the fact that we do not include any covariates in our designs (i.e., Z t = 0 for each t). 5 In what follows, we estimate β i, i K, using ordinary least squares and obtain heteroskedasticity-consistent standard errors using the method of White (1980). The bootstrap counterparts of these objects are obtained using the wild bootstrap (see, 5 If covariates are included (as in the empirical examples considered in the next section), the estimated covariances between the parameters of interest may be non-zero. In any case, although the overlap procedure does not make explicit use of the estimated covariances, it is valid under general forms of dependence between the estimated parameters; see BT. 9

10 Homoskedasticity Heteroskedasticity n max-t Overlap Mod. Overlap max-t Overlap Mod. Overlap Table 1: Empirical familywise error rates for max-t and basic and modified overlap procedures e.g., Mammen, 1993) with 199 replications. We set the nominal familywise error rate α equal to The design of our simulations is the same as the one described in Section 1, but with several variations. First, we fix k = 5 and generate 10,000 samples of size n, with n {300, 600, 1200} (i.e., when n = 300, there are 50 observations in the control group and 50 observations in each of the 5 treatment groups, and so on). Second, we set β i = θ(i + 1), with θ {0, 0.01, 0.02,..., 1}, which allows us to examine both control of the familywise error rate (when θ = 0) and power (when θ > 0). Finally, we consider two different specifications for the error term distribution: A homoskedastic case in which all of the errors are drawn from the standard normal distribution, and a heteroskedastic case in which the errors for observations assigned to the control group are standard normal, while the observations assigned to treatment group i {1,..., k} are normal with mean zero and variance i + 1. Table 1 shows that control of the familywise error rate for the max-t procedure and both the basic and modified overlap procedures is adequate at all of the sample sizes considered in both the homoskedastic and heteroskedastic cases. Interestingly, although the differences are quite small, the empirical familywise error rates for the overlap procedures are uniformly smaller than those for the max-t procedure. In order to compare the power of the max-t procedure and the basic overlap procedure, we follow BT and Romano and Wolf (2005b) in examining average power, 10

11 (a) Homoskedastic errors (b) Heteroskedastic errors Figure 2: Empirical average power for overlap and max-t procedures which is the proportion of false restrictions (of the form β i = β j here) that are rejected. Figures 2a and 2b display the empirical average power for these two procedures as a function of θ in the homoskedastic and heteroskedastic cases, respectively. Within these figures, black lines correspond to the basic overlap procedure and red lines correspond to the max-t procedure, while lines that are solid, dashed, and dotted correspond to n = 300, n = 600, and n = 1200, respectively. Evidently, both procedures have nearly identical average power over θ and at all of the sample sizes considered in both the homoskedastic and heteroskedastic cases. Finally, we turn to the gain in power that results from using the modified overlap procedure. Specifically, we examine the probability that the largest (i.e., the best ) parameter is correctly identified. 6 Figures 3a and 3b display the empirical probability that the best is identified by the two overlap procedures as a function of θ in the homoskedastic and heteroskedastic cases, respectively. Within these figures, the black 6 Note that, here, we always have θ > 0, so the largest parameter is β k+1. Thus, the largest parameter is correctly identified whenever L n,k+1 > U n,j for all j (k + 1). 11

12 (a) Homoskedastic errors (b) Heteroskedastic errors Figure 3: Empirical probability of identifying the best for the basic and modified overlap procedures line corresponds to the basic overlap procedure while the blue line corresponds to the modified overlap procedure. To reduce clutter, we only plot the results for n = 600, but the results for the other sample sizes are qualitatively similar. Namely, the modified overlap procedure does a much better job in identifying the best, particularly in the heteroskedastic case. As noted in the previous section, this is due to the fact that γ is chosen to be much smaller with the modified overlap procedure, resulting in narrower uncertainty intervals. In all cases considered here, we find that the value of γ that is chosen by the modified overlap procedure is slightly less than half as large (on average, over the 10,000 samples) as the value of γ that is chosen by the basic overlap procedure. More generally, this ratio will decrease as k is increased. 12

13 5 Empirical Examples We now present the results of two empirical examples, both from the education literature. In Section 5.1, we re-visit a field experiment examining different types of performance pay for teachers. In Section 5.2, we re-visit a field experiment examining different student achievement programs. 5.1 Performance Pay for Teachers Muralidharan and Sundararaman (2011), hereafter MS, conducted a set of experiments in India in which they offered incentive pay to teachers conditioned on student s scores. In the analysis, they have outcomes from three separate groups of schools: a control group, a group in which teachers were paid based on the scores of their own students, and a group in which teachers were paid based on the performance of all students at their school. That is, they consider k = 2 distinct treatments. Although k is quite small here, this example provides a very easy-to-follow illustration of the overlap procedures described above. Moreover, as the dataset used in this example is publicly-available, readers can easily replicate the results presented below. 7 Among other things, MS test the impact of the two interventions on combined math and language (Telegu) scores. The experiment ran for two years. In the majority of the analysis, year two scores are analyzed controlling for the year zero (i.e., pre-treatment) scores. While most of the paper is about the average impact of the interventions, in their Table 8, the authors compare the impact of the group incentive to the impact of the individual incentive. To make these comparisons, they first 7 The dataset is available at: file/ data.zip. 13

14 estimate the following model: 49 Score2 t = δ 0 + δ 1 Group t + δ 2 Individual t + η l County l,t + η 50 Score0 t + errors, where Score2 and Score0 are, respectively, the combined math and language score in year 2 and year 0, Group is an indicator variable for being in the group incentive, Individual is an indicator for being in the individual incentive, and {County l } 49 l=1 is a set of indicator variables for all but one of the 50 counties (Mandals) in which the experiment was conducted. There are n = observations, and the model is estimated using ordinary least squares. Heteroskedasticity-consistent standard errors are obtained using clustering (see, e.g., Cameron et al., 2008); this clustering is done by school. Next, MS independently test (i.e., without controlling the familywise error rate) the following three restrictions (absolute values of the T -statistics obtained for the l=1 tests of these three restrictions are given in brackets): 8 MS1: δ 1 = 0 ( T n,ms1 = 2.71) MS2: δ 2 = 0 ( T n,ms2 = 4.84) MS3: δ 1 = δ 2 ( T n,ms3 = 1.91) Thus, MS conclude that both treatment effects are statistically different from zero, and from one another (although this latter conclusion can only be made at a nominal level of slightly less than 6%). We now consider an application of the overlap procedures discussed above. To do so, we first need to re-write the model above so that the mean of the control group is 8 Note that these T -statistics do incorporate the estimated covariances between the parameters, which are not equal to zero because of the inclusion of the covariates. 14

15 estimated as a parameter, rather than being absorbed into the intercept. Specifically, we estimate the following model: 49 Score2 t = β 0 Control t +β 1 Group t +β 2 Individual t + η l County l,t +η 50 Score0 t +errors, where Control is an indicator variable for membership in the control group. Notice that δ 0 = β 0 and and δ i = β i β 0, for i {1, 2}. For a nominal familywise error rate of α = 0.05, we obtain a value of 0.50 for γ with the basic overlap procedure. This choice is based on 999 wild cluster bootstrap (see, e.g., Cameron et al., 2008) replications. The resulting uncertainty intervals are shown in Figure 4a, which can be interpreted as follows: Since the uncertainty intervals for β 1 and β 0 overlap, we can not infer that β 1 > β 0 (or, equivalently, we can not infer that δ 1 is positive). l=1 Since L n,2 > U n,0, we infer that β 2 > β 0 (or, equivalently, that δ 2 > 0). Since the uncertainty intervals for β 2 and β 1 overlap, we can not infer that β 2 > β 1 (or, equivalently, we can not infer that δ 2 δ 1 is positive). Thus, while our conclusion on the second restriction (MS2) is consistent with MS, our conclusions on the first and third restrictions (MS1 and MS3) are not. 9 We also consider an alternative and perhaps more useful method of visualizing these results in Figure 4b. Here, the quantities are re-centered around the treatment effects, δ 1 β 1 β 0 and δ 2 β 2 β 0. That is, we subtract ˆβ n,0 = 0.13 from ˆβ n,1 and ˆβ n,2, and from the endpoints of the uncertainty intervals for β 1 and β 2. Moreover, we include a dotted horizontal line at the the value of U n,0 ˆβ n,0 = 0.08, the upper 9 We also applied the basic overlap procedure at a nominal familywise error rate of α = 0.06 (obtaining a value of 0.47 for γ), and found that the uncertainty intervals for β 2 and β 1 were still overlapping (recall that the absolute value of the T -statistic for the test of MS3 was 1.91). 15

16 (a) Uncertainty Intervals for β i, i K (b) Uncertainty Intervals for δ i, i {1,..., k} Figure 4: Performance Pay Example: Basic Overlap Procedure endpoint of the re-centered version of the uncertainty interval for β Given that the uncertainty interval for δ 2 (the treatment effect of for the individual incentive) lies entirely above this dotted horizontal line, for example, one can quickly infer that δ 2 > 0 (or, equivalently, β 2 > β 0, as was seen from Figure 4a). Finally, we apply the modified overlap procedure in an attempt to identify the best treatment. That is, we seek to determine whether or not the treatment effect for the individual incentive (the treatment effect which is estimated to be the largest) is statistically distinguishable from the other treatments effects (and zero). Using the modified overlap procedure, we obtain a value of 0.34 for γ, leading to substantially narrower uncertainty intervals; see Figure 5. Note that we only plot the lower endpoint of the uncertainty interval for β [1] = β 2 and the upper endpoints of the uncertainty intervals for β [2] = β 1 and β [3] = β 0 in Figure 5a (and similarly in Figure 5b). This is 10 One could similarly include a dotted horizontal line at the the value of L n,0 ˆβ n,0 if the vertical axis were to extend below zero; Figure 6b provides an example of this. 16

17 (a) Uncertainty Intervals for β 0, β 1, and β 2 (b) Uncertainty Intervals for δ 1 and δ 2 Figure 5: Performance Pay Example: Modified Overlap Procedure done in order to emphasize the fact that the modified overlap procedure can not be used to make inferences about the ordering of any parameter pair (β [i], β [j] ) whenever min{i, j} > 1 (i.e., whenever neither of the parameters is estimated to be largest). Here, our conclusions are more in line with MS. Specifically, we infer that β 2 > β 0 (or, equivalently, δ 2 > 0) and that β 2 > β 1 (or, equivalently, δ 2 > δ 1 ). That is, our conclusions on the second and third restrictions (MS2 and MS3) are consistent with MS. However, since the modified overlap procedure is not designed to allow us to infer anything about the ordering of β 1 and β 0 (or, equivalently, anything about the sign of δ 2 ), we can not make any conclusions about MS1. The increased power of the modified overlap procedure comes entirely at the cost of remaining silent on such comparisons. 5.2 Student Achievement Programs Angrist et al. (2009), hereafter ALO, conducted a field experiment at a large, public 17

18 university in Canada in order to examine programs aimed at improving students first year academic performance and second year re-enrollment. 11 The experiment involved sorting students into a control group and three treatment groups (i.e., k = 3). Students in the first treatment group were offered support services (supplemental instruction and peer advising), while students in the second treatment group were offered financial incentives (cash awards depending on their performance). Students in the third treatment group were offered both support services and financial incentives. Although ALO present results for several different outcome variables, we focus here on just first semester grade point averages for simplicity. Moreover, ALO present results for males and females (both separately and together), while we only consider females. We do so because ALO do not find any statistically significant treatment effects for males or for males and females combined (see the first two columns of their Table 5). In order to examine the effects of the different treatments in this case, ALO estimate the following model: Grades t = δ 0 + δ 1 Support t + δ 2 Financial t + δ 3 Combined t + Z tη + errors, where Grades is average grade at the end of first semester, Support, Financial, Combined are indicator variables indicating membership in treatment group 1 (support services only), group 2 (financial incentives only), and group 3 (both support services and financial incentives), respectively, and Z contains sets of dummy variables for mother tongue, high school group, number of courses in fall term, self reports on how often the student procrastinates, mother s education, and father s education. There are n = 729 observations, and the model is estimated using ordinary least squares. Heteroskedasticity-consistent standard errors are obtained using the method 11 This dataset is publicly-available from: data.zip 18

19 of White (1980). While ALO do not explicitly test for differences between the different treatment effects, they do directly compare the parameter estimates in the text. For example, on p. 14 the authors state that the estimates for women suggest the combination of services and fellowships... had a larger impact than fellowships alone. If they had formally tested for the differences between the coefficients, the ( ) = 6 restrictions would be (absolute values of the T -statistics obtained for the tests of these restrictions are given in brackets): 12 ALO1: δ 1 = 0 ( T n,alo1 = 0.58) ALO2: δ 2 = 0 ( T n,alo2 = 2.21) ALO3: δ 3 = 0 ( T n,alo3 = 3.17) ALO4: δ 1 = δ 2 ( T n,alo4 = 1.21) ALO5: δ 1 = δ 3 ( T n,alo5 = 2.10) ALO6: δ 2 = δ 3 ( T n,alo6 = 1.01) Thus, at the 5% nominal level, one would only reject the restrictions ALO2, ALO3, and ALO5 if the tests were conducted independently (i.e., without controlling the familywise error rate). In other words, one could conclude that only the treatment effects for the financial group and the combined group are statistically different from zero (ALO do test the first three restrictions and come to the same conclusions about them), and that the treatment effect for the combined group is statistically different from the treatment effect for the support group. Again, while ALO do not explicitly test ALO4 ALO6, their statement quoted above (about the treatment effect for the 12 As in the previous example, these T -statistics incorporate the estimated covariances between the parameters, which are not equal to zero because of the inclusion of the covariates. 19

20 combined group being larger than the treatment effect for the financial group) is not statistically supported, even if one ignores the multiple comparisons problem. We now turn our attention to the overlap procedures. As in the previous example, we first need to re-write the model above so that the mean of the control group is estimated as a parameter, rather than being absorbed into the intercept. That is, we estimate the following model: GPA t = β 0 Control t + β 1 Support t + β 2 Financial t + β 3 Combined t + Z tη + errors, where Control is an indicator variable for membership in the control group. Given a nominal familywise error rate of α = 0.05, we obtain a value of 0.64 for γ with the basic overlap procedure. This value is obtained using 999 wild bootstrap replications (the standard errors obtained by ALO are not clustered). Our resulting uncertainty intervals are shown in Figure 6, which can be interpreted as follows: Since the uncertainty intervals for β 1 and β 0 overlap, we can not infer that β 1 > β 0 (or, equivalently, we can not infer that δ 1 is positive). Similarly, since the uncertainty intervals for β 2 and β 0 overlap, we can not infer that β 2 > β 1 (or, equivalently, we can not infer that δ 2 is positive). Since L n,3 > U n,0, we infer that β 3 > β 0 (or, equivalently, that δ 3 > 0). Since the uncertainty intervals for β 2 and β 1 overlap, we can not infer that β 2 > β 1 (or, equivalently, we can not infer that δ 2 δ 1 is positive). Similarly, since the uncertainty intervals for β 3 and β 1 overlap, we can not infer that β 3 > β 1 (or, equivalently, we can not infer that δ 3 δ 1 is positive). Finally, since the uncertainty intervals for β 3 and β 2 overlap, we can not infer that β 3 > β 2 (or, equivalently, we can not infer that δ 3 δ 2 is positive). 20

21 (a) Uncertainty Intervals for β i, i K (b) Uncertainty Intervals for δ i, i {1,..., k} Figure 6: Student Achievement Example: Basic Overlap Procedure In summary, our conclusions on four of the restrictions (ALO1, AL03, AL04, and ALO6) are consistent with the results of independent tests conducted at the 5% nominal level. However, our conclusions on the other two restrictions (ALO2 and ALO5) are not. We next apply the modified overlap procedure. Here, we obtain a value of 0.35 for γ, leading to much narrower uncertainty intervals. From Figure 7, we infer that β 3 > β 0 (or, equivalently, δ 3 > 0), and that β 3 > β 1 (or, equivalently, δ 3 > δ 1 ). However, we can not that β 3 > β 2 (or, equivalently, we can not infer that δ 3 δ 2 is positive). That is, we can conclude that the treatment effect for the combined group (the largest estimate) is larger than the treatment effect for the control group, but not that it is larger than the treatment effect for the financial group. This reinforces our finding above that the claim made by ALO that the treatment effect for the combined group is larger than the treatment effect for the financial group is not statistically supported. 21

22 (a) Uncertainty Intervals for β i, i K (b) Uncertainty Intervals for δ i, i {1,..., k} Figure 7: Student Achievement Example: Modified Overlap Procedure Overall, the conclusions from Figure 7 differ from those based on Figure 6 only in that we can conclude that the treatment effect for the combined group is larger than the treatment effect for the support group. Similar to what was noted in the previous example, the increased power of the modified overlap procedure comes at the cost of not being able to make comparisons between the treatment effects for the financial group and the support group, or between either of these two groups and zero. That is, we move from being able to make 6 pairwise comparisons with the basic overlap procedure, to only being able to make 3 pairwise comparisons with the modified overlap procedure. References Angrist, J., Lang, D., and Oreopoulos, P. (2009). Incentives and services for college achievement: Evidence from a randomized trial. American Economic Journal: 22

23 Applied Economics, 1(1): Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1): Bennett, C. J. and Thompson, B. S. (forthcoming). Graphical procedures for multiple comparisons under general dependence. Journal of the American Statistical Association. Cameron, A. C., Gelbach, J. B., and Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90(3): Gu, J. and Shen, S. (2015). Multiple testing for positive treatment effects. Unpublished manuscript. Horrace, W. C. and Schmidt, P. (2000). Multiple comparisons with the best, with economic applications. Journal of Applied Econometrics, 15(1):1 26. Hsu, J. C. (1981). Simultaneous confidence intervals for all distances from the best. The Annals of Statistics, 9(5): Hsu, J. C. (1984). Constrained simultaneous confidence intervals for multiple comparisons with the best. The Annals of Statistics, 12(3): Lee, S. and Shaikh, A. M. (2014). Multiple testing and heterogenous treatment effects: Re-evaluating the effect of Progresa on school enrollment. Journal of Applied Econometrics, 29(4): Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1):

24 Muralidharan, K. and Sundararaman, V. (2011). Teacher performance pay: Experimental evidence from India. Journal of Political Economy, 119(1): Romano, J. P. and Wolf, M. (2005a). Exact and approximate stepdown methods for multiple hypothesis testing. Journal of the American Statistical Association, 100(469): Romano, J. P. and Wolf, M. (2005b). Stepwise multiple testing as formalized data snooping. Econometrica, 73(4): Tukey, J. W. (1953). The problem of multiple comparisons. In The Collected Works of John W. Tukey VIII. Multiple Comparisons: , pages Chapman and Hall, New York. White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb February 6, 2016 > Abstract In this paper, we utilize a new

More information

Graphical Procedures for Multiple Comparisons Under General Dependence

Graphical Procedures for Multiple Comparisons Under General Dependence Graphical Procedures for Multiple Comparisons Under General Dependence Christopher J. Bennett a and Brennan S. Thompson b March 7, 2014 Abstract It has been more than half a century since Tukey first introduced

More information

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Matthew D. Webb October 9, 2013 Introduction This paper is joint with: Contributions: James G. MacKinnon Department of Economics Queen's University

More information

Christopher J. Bennett

Christopher J. Bennett P- VALUE ADJUSTMENTS FOR ASYMPTOTIC CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE by Christopher J. Bennett Working Paper No. 09-W05 April 2009 DEPARTMENT OF ECONOMICS VANDERBILT UNIVERSITY NASHVILLE,

More information

Casuality and Programme Evaluation

Casuality and Programme Evaluation Casuality and Programme Evaluation Lecture V: Difference-in-Differences II Dr Martin Karlsson University of Duisburg-Essen Summer Semester 2017 M Karlsson (University of Duisburg-Essen) Casuality and Programme

More information

Resampling-Based Control of the FDR

Resampling-Based Control of the FDR Resampling-Based Control of the FDR Joseph P. Romano 1 Azeem S. Shaikh 2 and Michael Wolf 3 1 Departments of Economics and Statistics Stanford University 2 Department of Economics University of Chicago

More information

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University QED Queen s Economics Department Working Paper No. 1319 Hypothesis Testing for Arbitrary Bounds Jeffrey Penney Queen s University Department of Economics Queen s University 94 University Avenue Kingston,

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap University of Zurich Department of Economics Working Paper Series ISSN 1664-7041 (print) ISSN 1664-705X (online) Working Paper No. 254 Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin October 2018 Bruce Hansen (University of Wisconsin) Exact

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Bootstrap Testing in Econometrics

Bootstrap Testing in Econometrics Presented May 29, 1999 at the CEA Annual Meeting Bootstrap Testing in Econometrics James G MacKinnon Queen s University at Kingston Introduction: Economists routinely compute test statistics of which the

More information

Answer Key: Problem Set 6

Answer Key: Problem Set 6 : Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS*

WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS* L Actualité économique, Revue d analyse économique, vol 91, n os 1-2, mars-juin 2015 WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS* James G MacKinnon Department of Economics Queen s University Abstract Confidence

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin June 2017 Bruce Hansen (University of Wisconsin) Exact

More information

Inference on Optimal Treatment Assignments

Inference on Optimal Treatment Assignments Inference on Optimal Treatment Assignments Timothy B. Armstrong Yale University Shu Shen University of California, Davis April 23, 2014 Abstract We consider inference on optimal treatment assignments.

More information

Bootstrap-Based Improvements for Inference with Clustered Errors

Bootstrap-Based Improvements for Inference with Clustered Errors Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression

More information

Control of Generalized Error Rates in Multiple Testing

Control of Generalized Error Rates in Multiple Testing Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 245 Control of Generalized Error Rates in Multiple Testing Joseph P. Romano and

More information

Small-sample cluster-robust variance estimators for two-stage least squares models

Small-sample cluster-robust variance estimators for two-stage least squares models Small-sample cluster-robust variance estimators for two-stage least squares models ames E. Pustejovsky The University of Texas at Austin Context In randomized field trials of educational interventions,

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010 Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010 Instructor Name Time Limit: 120 minutes Any calculator is okay. Necessary tables and formulas are attached to the back of the exam.

More information

Inference via Kernel Smoothing of Bootstrap P Values

Inference via Kernel Smoothing of Bootstrap P Values Queen s Economics Department Working Paper No. 1054 Inference via Kernel Smoothing of Bootstrap P Values Jeff Racine McMaster University James G. MacKinnon Queen s University Department of Economics Queen

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

A better way to bootstrap pairs

A better way to bootstrap pairs A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama Course Packet The purpose of this packet is to show you one particular dataset and how it is used in

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Wild Cluster Bootstrap Confidence Intervals

Wild Cluster Bootstrap Confidence Intervals Wild Cluster Bootstrap Confidence Intervals James G MacKinnon Department of Economics Queen s University Kingston, Ontario, Canada K7L 3N6 jgm@econqueensuca http://wwweconqueensuca/faculty/macinnon/ Abstract

More information

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments

Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments James E. Pustejovsky UT Austin Educational Psychology Department Quantitative Methods Program pusto@austin.utexas.edu Elizabeth

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Tobit and Interval Censored Regression Model

Tobit and Interval Censored Regression Model Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 98-994 Research India Publications http://www.ripublication.com Tobit and Interval Censored Regression Model Raidani

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Supplement to Randomization Tests under an Approximate Symmetry Assumption

Supplement to Randomization Tests under an Approximate Symmetry Assumption Supplement to Randomization Tests under an Approximate Symmetry Assumption Ivan A. Canay Department of Economics Northwestern University iacanay@northwestern.edu Joseph P. Romano Departments of Economics

More information

Exploratory quantile regression with many covariates: An application to adverse birth outcomes

Exploratory quantile regression with many covariates: An application to adverse birth outcomes Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram

More information

Obtaining Critical Values for Test of Markov Regime Switching

Obtaining Critical Values for Test of Markov Regime Switching University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

On block bootstrapping areal data Introduction

On block bootstrapping areal data Introduction On block bootstrapping areal data Nicholas Nagle Department of Geography University of Colorado UCB 260 Boulder, CO 80309-0260 Telephone: 303-492-4794 Email: nicholas.nagle@colorado.edu Introduction Inference

More information

A Course on Advanced Econometrics

A Course on Advanced Econometrics A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Inference Based on the Wild Bootstrap

Inference Based on the Wild Bootstrap Inference Based on the Wild Bootstrap James G. MacKinnon Department of Economics Queen s University Kingston, Ontario, Canada K7L 3N6 email: jgm@econ.queensu.ca Ottawa September 14, 2012 The Wild Bootstrap

More information

LECTURE 5. Introduction to Econometrics. Hypothesis testing

LECTURE 5. Introduction to Econometrics. Hypothesis testing LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will

More information

Multiple comparisons of slopes of regression lines. Jolanta Wojnar, Wojciech Zieliński

Multiple comparisons of slopes of regression lines. Jolanta Wojnar, Wojciech Zieliński Multiple comparisons of slopes of regression lines Jolanta Wojnar, Wojciech Zieliński Institute of Statistics and Econometrics University of Rzeszów ul Ćwiklińskiej 2, 35-61 Rzeszów e-mail: jwojnar@univrzeszowpl

More information

Finite Population Correction Methods

Finite Population Correction Methods Finite Population Correction Methods Moses Obiri May 5, 2017 Contents 1 Introduction 1 2 Normal-based Confidence Interval 2 3 Bootstrap Confidence Interval 3 4 Finite Population Bootstrap Sampling 5 4.1

More information

NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE. Guido W. Imbens Michal Kolesar

NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE. Guido W. Imbens Michal Kolesar NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE Guido W. Imbens Michal Kolesar Working Paper 18478 http://www.nber.org/papers/w18478 NATIONAL BUREAU OF ECONOMIC

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

BOOTSTRAPPING DIFFERENCES-IN-DIFFERENCES ESTIMATES

BOOTSTRAPPING DIFFERENCES-IN-DIFFERENCES ESTIMATES BOOTSTRAPPING DIFFERENCES-IN-DIFFERENCES ESTIMATES Bertrand Hounkannounon Université de Montréal, CIREQ December 2011 Abstract This paper re-examines the analysis of differences-in-differences estimators

More information

6 Single Sample Methods for a Location Parameter

6 Single Sample Methods for a Location Parameter 6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually

More information

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner Econometrics II Nonstandard Standard Error Issues: A Guide for the Practitioner Måns Söderbom 10 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Robust Standard Errors in Small Samples: Some Practical Advice

Robust Standard Errors in Small Samples: Some Practical Advice Robust Standard Errors in Small Samples: Some Practical Advice Guido W. Imbens Michal Kolesár First Draft: October 2012 This Draft: December 2014 Abstract In this paper we discuss the properties of confidence

More information

EC4051 Project and Introductory Econometrics

EC4051 Project and Introductory Econometrics EC4051 Project and Introductory Econometrics Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Intro to Econometrics 1 / 23 Project Guidelines Each student is required to undertake

More information

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01 An Analysis of College Algebra Exam s December, 000 James D Jones Math - Section 0 An Analysis of College Algebra Exam s Introduction Students often complain about a test being too difficult. Are there

More information

Performance Evaluation and Comparison

Performance Evaluation and Comparison Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Selective Inference for Effect Modification

Selective Inference for Effect Modification Inference for Modification (Joint work with Dylan Small and Ashkan Ertefaie) Department of Statistics, University of Pennsylvania May 24, ACIC 2017 Manuscript and slides are available at http://www-stat.wharton.upenn.edu/~qyzhao/.

More information

Does k-th Moment Exist?

Does k-th Moment Exist? Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests

The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests Working Paper 2013:8 Department of Statistics The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests Jianxin Wei Working Paper 2013:8 June 2013 Department of Statistics Uppsala

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3

More information

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E Salt Lake Community College MATH 1040 Final Exam Fall Semester 011 Form E Name Instructor Time Limit: 10 minutes Any hand-held calculator may be used. Computers, cell phones, or other communication devices

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 516 Robust Performance Hypothesis Testing with the Variance Olivier Ledoit and Michael

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

A semi-bayesian study of Duncan's Bayesian multiple

A semi-bayesian study of Duncan's Bayesian multiple A semi-bayesian study of Duncan's Bayesian multiple comparison procedure Juliet Popper Shaer, University of California, Department of Statistics, 367 Evans Hall # 3860, Berkeley, CA 94704-3860, USA February

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity

Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity The Lahore Journal of Economics 23 : 1 (Summer 2018): pp. 1 19 Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity Sohail Chand * and Nuzhat Aftab ** Abstract Given that

More information

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Sanat K. Sarkar a, Tianhui Zhou b a Temple University, Philadelphia, PA 19122, USA b Wyeth Pharmaceuticals, Collegeville, PA

More information

Section 2: Estimation, Confidence Intervals and Testing Hypothesis

Section 2: Estimation, Confidence Intervals and Testing Hypothesis Section 2: Estimation, Confidence Intervals and Testing Hypothesis Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/

More information

Inference in difference-in-differences approaches

Inference in difference-in-differences approaches Inference in difference-in-differences approaches Mike Brewer (University of Essex & IFS) and Robert Joyce (IFS) PEPA is based at the IFS and CEMMAP Introduction Often want to estimate effects of policy/programme

More information

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Lecture 3: Multiple Regression Prof. Sharyn O Halloran Sustainable Development Econometrics II Outline Basics of Multiple Regression Dummy Variables Interactive terms Curvilinear models Review Strategies

More information

A multiple testing procedure for input variable selection in neural networks

A multiple testing procedure for input variable selection in neural networks A multiple testing procedure for input variable selection in neural networks MicheleLaRoccaandCiraPerna Department of Economics and Statistics - University of Salerno Via Ponte Don Melillo, 84084, Fisciano

More information

Physics 2019 v1.2. IA1 sample assessment instrument. Data test (10%) August Assessment objectives

Physics 2019 v1.2. IA1 sample assessment instrument. Data test (10%) August Assessment objectives Data test (10%) This sample has been compiled by the QCAA to assist and support teachers in planning and developing assessment instruments for individual school settings. Assessment objectives This assessment

More information

University of California San Diego and Stanford University and

University of California San Diego and Stanford University and First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford

More information

Improving linear quantile regression for

Improving linear quantile regression for Improving linear quantile regression for replicated data arxiv:1901.0369v1 [stat.ap] 16 Jan 2019 Kaushik Jana 1 and Debasis Sengupta 2 1 Imperial College London, UK 2 Indian Statistical Institute, Kolkata,

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Friday, June 5, 009 Examination time: 3 hours

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 7, Issue 1 2011 Article 12 Consonance and the Closure Method in Multiple Testing Joseph P. Romano, Stanford University Azeem Shaikh, University of Chicago

More information

QED. Queen s Economics Department Working Paper No. 1355

QED. Queen s Economics Department Working Paper No. 1355 QED Queen s Economics Department Working Paper No 1355 Randomization Inference for Difference-in-Differences with Few Treated Clusters James G MacKinnon Queen s University Matthew D Webb Carleton University

More information

Quantitative Empirical Methods Exam

Quantitative Empirical Methods Exam Quantitative Empirical Methods Exam Yale Department of Political Science, August 2016 You have seven hours to complete the exam. This exam consists of three parts. Back up your assertions with mathematics

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information