Partially Identified Treatment Effects for Generalizability


Wendy Chan
Human Development and Quantitative Methods Division, Graduate School of Education,
University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
(Dated: June 24, 2016)

Abstract

Recent methods to improve generalizations from nonrandom samples typically derive point estimates by invoking assumptions, such as strong ignorability of sample selection, that are often controversial in practice. Rather than focus on inferences based on point estimates, this article considers inferences on partially identified estimates derived from fewer and weaker assumptions. We extend partial identification methods to causal generalization with nonrandom samples using a cluster randomized trial in education. Bounds on the population average treatment effect are derived under four cases: two that impose no assumptions on the data, and two that assume bounded sample variation and monotonicity of response. The approach is amenable to incorporating population data frames to tighten the bounds on the population average treatment effect. Under the assumptions of bounded sample variation and monotonicity, the interval estimates of the average treatment effect are sufficiently informative to rule out large treatment effects and are consistent with the point estimates from the experimental study. This illustrates that partial identification methods can provide an alternative perspective on causal generalization in the absence of strong ignorability of sample selection.

Policymakers have become increasingly interested in the extent to which inferences from experimental studies apply to target populations of inference. In social science research, however, a common challenge to the generalizability of experimental results is that the samples used in experimental studies are typically not randomly selected (Greenberg & Shroder, 2004; Olsen et al., 2013). Without equal probability sampling, generalization of treatment effects is difficult because the bias induced by self-selection no longer allows for model-free estimation of treatment effects (Keiding and Louis, 2016). Recently, statisticians have developed methods to improve generalizations from non-probability samples, primarily by using propensity scores (Stuart et al., 2011; Tipton, 2012; O'Muircheartaigh and Hedges, 2014). Propensity score methods match experimental samples to an inference population based on observable characteristics so that the two groups are compositionally similar (Rosenbaum and Rubin, 1984); they extend the methods used to adjust for non-response in survey sampling. Such post hoc adjustments facilitate bias-reduced estimates of the average treatment effect. While propensity score methods have contributed significantly to causal generalization, the assumptions they require are often strong and even controversial. In particular, strong ignorability of sample selection imposes restrictions on the distribution of the outcome variable that may not hold in practice. When these restrictions fail, the resulting inferences have limited applicability because they lack connection to the empirical problems at hand. More generally, the credibility of inferences from experimental studies decreases as the strength of the assumptions increases (Manski, 1990). Furthermore, different sets of assumptions may lead to conflicting conclusions, presenting a challenge for replication.
In causal generalization, strong ignorability of sample selection is needed for point identification of average treatment effects. This paper considers the inferences that can be made

with fewer and weaker assumptions when the treatment effects are partially identified. These interval estimates, rather than point estimates, serve as an intermediary between the empirical evidence (the data itself) and point estimation. The goal of partial identification methods is to reach more credible and applicable conclusions by invoking assumptions that are more plausible in practice (Manski, 2009). In this paper, we apply partial identification methods to the problem of causal generalization in the absence of equal probability sampling. We situate our problem around a completed cluster randomized trial (CRT) in education that exhibits characteristics typical of randomized experiments in social science settings, such as the small size of the experimental sample relative to the population. We choose this CRT in particular because the preliminary analysis of its results was completed in Konstantopoulos et al. (2013) and the generalizability of point estimates of the treatment effect was considered in Tipton et al. (2016). Three important questions are of interest. First, are interval estimates of average treatment effects informative when few to no assumptions are made on the experimental data? This question is addressed by examining bounds on the estimated treatment effects under two cases using known features of the experimental outcomes and design: the first case makes no assumptions on the empirical evidence, and the second examines interval estimation when treatment is randomized. Second, can supplemental sources of data improve the interval estimates of treatment effects in causal generalization? This question is addressed by considering how the interval estimates of treatment effects change when auxiliary information is used to contribute identifying power. Third, what inferences can be made when assumptions that are weaker than strong ignorability, such as bounded sample variation and monotonicity, are imposed?
To investigate this issue, we consider bounds when the two assumptions are made on the

experimental outcomes and compare the interval estimates to those under the no-assumptions framework. Additionally, a fusion approach combining the bounding methods with propensity score subclassification is used to analyze differences in bounds when the population is stratified into matched subclasses. Lastly, we apply the interval estimation framework to the empirical example and examine how the estimates differ under varying levels of assumptions. This analysis examines the role that assumptions play in narrowing interval estimates and compares the interval estimates with the point estimates derived assuming strong ignorability of sample selection. Through this comparison, we bring attention to partial identification as an alternative perspective on causal generalization and argue that inferences from a range of values can still be informative to researchers and policymakers. The goal of this paper is to highlight the issues that arise when generalization inferences rely on strong ignorability and to consider what can be said in the absence of this assumption.

CRT Example

In 2006, the Indiana Department of Education and the Indiana State Board of Education managed the implementation of a new assessment system to measure annual student growth and to provide feedback to teachers (Konstantopoulos et al., 2013). During the academic year, 56 K-8 (elementary to middle) schools from the state of Indiana volunteered to implement the new system, of which 34 were randomly assigned to the state's assessment system while 22 served as control schools. In the treatment schools, students were given four diagnostic assessments aligned with the Indiana state test, and their teachers received online reports on student performance to dynamically guide instruction in the periods leading up to the state exam. The effectiveness of the assessment system was measured using the Indiana

Statewide Testing for Educational Progress-Plus (ISTEP+) scores in English Language Arts (ELA) and mathematics. For each study school, the ISTEP+ scores were discretized using the minimum cutoff scores from the Indiana Department of Education and aggregated as either Pass or Not Pass. A natural question from this study is: if every school in Indiana were to implement this system, what is the expected impact on student achievement? In other words, to what extent do the results from the Indiana CRT generalize to the entire state? Had the principal investigators of this study planned with generalization in mind, treatment randomization and random sampling would have been implemented, allowing estimates of the population average treatment effect (PATE) to be computed directly from the data. In this situation, however, whether the convenience sample is representative of the population is questionable (Kruskal and Mosteller, 1980). Nonrandom samples introduce a complication when there are potential systematic differences between the self-selected schools and the non-sampled schools in the population, which requires assumption-based methods to produce bias-reduced estimates of the PATE.

Notation and Assumptions

In this section, we formally conceptualize the PATE in causal generalization and introduce the notation used throughout this article. The estimation of causal treatment effects is framed using Rubin's Causal Model (Rubin 1974, 1977, 1980, 1986). Let P denote the population of inference consisting of N schools, of which n schools are selected into the sample. Let W be an indicator of treatment assignment, where Wi = 1 if school i was assigned to implement the assessment system (treatment) and Wi = 0 if school i was not assigned to implement the system (control). For each school in P, let Yi(W) denote the binary potential

outcome of school i, indicating whether school i received a Pass score under the respective treatment condition (W = 0, 1). Finally, let Z be an indicator of sample selection, where Zi = 1 if school i was selected into the experimental study and Zi = 0 otherwise. The treatment effect for school i is defined as τi = Yi(1) - Yi(0), and because P consists of schools that were selected and not selected into the experimental study, we can define two average treatment effects:

τsate = E(τ | Z=1)
τpate = E(τ | Z=1) * P(Z=1) + E(τ | Z=0) * (1 - P(Z=1))

where τsate is the expected sample average treatment effect (SATE) and τpate is the expected PATE. Using the law of iterated expectations, the expected values of the potential outcomes for the SATE are decomposed as follows:

E(Y(1)) = E(Y(1) | W=1) * P(W=1) + E(Y(1) | W=0) * P(W=0) (1)
E(Y(0)) = E(Y(0) | W=1) * P(W=1) + E(Y(0) | W=0) * P(W=0) (2)

The PATE is a function of both W and Z, so the decomposition for this estimator is given by

E(Y(1)) = E(Y(1) | W=1, Z=1) * P(W=1, Z=1) + E(Y(1) | W=0, Z=1) * P(W=0, Z=1) + E(Y(1) | W=1, Z=0) * P(W=1, Z=0) + E(Y(1) | W=0, Z=0) * P(W=0, Z=0) (3)
E(Y(0)) = E(Y(0) | W=1, Z=1) * P(W=1, Z=1) + E(Y(0) | W=0, Z=1) * P(W=0, Z=1) + E(Y(0) | W=1, Z=0) * P(W=1, Z=0) + E(Y(0) | W=0, Z=0) * P(W=0, Z=0) (4)

Since each study school is assigned to at most one treatment, the quantities in Equations (1)-(4) cannot all be estimated, a premise of the Fundamental Problem of Causal Inference (Holland, 1986). For the SATE, the potential outcomes E(Y(1) | W=0) and E(Y(0) | W=1) are unobservable counterfactuals since they refer to the expected outcome under treatment (control) when assigned control (treatment), which is unknown (Greenland et al., 1999; Dawid, 2000). Similarly, the PATE cannot be estimated, as it too is a function of unobservable counterfactuals.
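As a numerical check, the decompositions in (1)-(4) are identities under the law of iterated expectations. The sketch below verifies Equation (3) on a simulated finite population; all quantities (population size, outcome probabilities, selection and assignment rates) are hypothetical stand-ins, not values from the Indiana CRT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population of N schools with binary potential outcomes.
N = 2000
y1 = rng.binomial(1, 0.6, N)   # Y_i(1): outcome if treated
y0 = rng.binomial(1, 0.4, N)   # Y_i(0): outcome if control
z = rng.binomial(1, 0.3, N)    # sample selection indicator Z
w = rng.binomial(1, 0.5, N)    # treatment assignment indicator W

# PATE as defined in the text: E(tau) over the whole population P.
pate = np.mean(y1 - y0)

# Equation (3): E(Y(1)) decomposes over the four (W, Z) cells,
# each conditional mean weighted by its cell probability.
ey1 = sum(
    y1[(w == a) & (z == b)].mean() * np.mean((w == a) & (z == b))
    for a in (0, 1) for b in (0, 1)
)
assert np.isclose(ey1, y1.mean())  # the decomposition is an identity
```

The identity holds mechanically because the cells partition P; the inferential problem is that three of the four conditional means in each decomposition are counterfactual and unobserved.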

It is important to note that the decomposition of the potential outcomes for the PATE differs from that of the SATE in two important respects. First, the potential outcomes in (3) and (4) necessarily include additional counterfactual terms because the PATE requires information about the sample selection status of the units (denoted by the indicator Z). Second, the PATE incorporates two additional counterfactuals, E(Y(w) | W=w, Z=0) and E(Y(w) | W=w', Z=0) for w, w' ∈ {0,1} and w ≠ w', which we will refer to as the sample counterfactuals. These are the potential outcomes under a treatment condition for units not selected into the experimental study. E(Y(1) | W=0, Z=1) and E(Y(0) | W=1, Z=1), which we will refer to as the treatment counterfactuals, coincide with the unobservable counterfactuals for the SATE in Equations (1) and (2). Note that these two types of counterfactuals are unobservable for different reasons. The goal of any causal inference study is to identify both treatment and sample counterfactuals, whether through the design stage of the study or through assumptions on the distribution of these potential outcomes (Rubin, 2011). Under treatment randomization and equal probability sampling, the potential outcomes Y(W, Z) are statistically independent of the treatment and sample selection variables W and Z, respectively. Under treatment randomization, Y ⊥ W, so that E(Y(w, z)) = E(Y(w, z) | W=w) for w, z ∈ {0,1} and the distribution of the unobservable treatment counterfactuals is equivalent to that of the realized potential outcomes (Rubin, 1974; Imai et al., 2008). Similarly, under equal probability sampling, Y ⊥ Z and E(Y(w, z)) = E(Y(w, z) | Z=z) for w, z ∈ {0,1}, so that the unobservable sample counterfactuals are equal in distribution to the observable potential outcomes among units selected into the sample (that is, Z=1).
In this case, the treatment effects can be estimated model-free, and any unbiased estimator of the SATE is also unbiased for the PATE.
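A small simulation illustrates this last point. Under equal probability sampling and randomized treatment, the simple difference in sample means is centered on the PATE; the population and outcome probabilities below are hypothetical, not the Indiana data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of binary potential outcomes.
N = 5000
y1 = rng.binomial(1, 0.65, N)
y0 = rng.binomial(1, 0.40, N)
pate = np.mean(y1 - y0)

# Repeatedly draw an equal-probability sample (Z) and randomize
# treatment (W) within it; record the difference in means.
estimates = []
for _ in range(2000):
    sample = rng.choice(N, size=200, replace=False)  # random selection, Z=1
    w = rng.binomial(1, 0.5, 200)                    # randomized treatment
    y_obs = np.where(w == 1, y1[sample], y0[sample])
    estimates.append(y_obs[w == 1].mean() - y_obs[w == 0].mean())

# The Monte Carlo average of the SATE estimator sits on the PATE.
assert abs(np.mean(estimates) - pate) < 0.02
```

The remainder of the paper concerns what happens when the selection step above is not random, so this unbiasedness no longer holds.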

Role of Assumptions

The challenge in generalization studies, as illustrated by the Indiana CRT, is that while treatment is assigned randomly, the experimental sample is not randomly selected, so that point identification of the PATE is infeasible solely due to the unobservable sample counterfactuals. Researchers then typically invoke assumptions such as strong ignorability of sample selection to account for the unknown potential outcomes. Because the Indiana CRT was a randomized experiment, we focus specifically on strong ignorability of sample selection henceforth. Under strong ignorability, given a vector of observable covariates X for all schools in P, Y ⊥ Z | X, and since the conditional distribution of the potential outcomes is independent of the sample selection indicator Z, we have (Tipton, 2012):

E(Y(w, z) | W=w, Z, X) = E(Y(w, z) | W=w, X) for w, z ∈ {0,1} (5)

A main stipulation of strong ignorability is that all the covariates that affect sample selection and moderate treatment effects are captured by X, so that, conditional on X, the selection process can be considered random (that is, a probability sample). As a result, the sample counterfactuals are conditionally equal in distribution to the realized potential outcomes, the PATE is fully identified, and bias-reduced estimates of the PATE can be derived. Information on the covariates X is typically sourced from population data frames such as the Common Core of Data (CCD) or state longitudinal data systems (Stuart et al., 2011; Tipton, 2012). Recent methods to improve generalizability use propensity scores to meet the strong ignorability assumption (Rosenbaum and Rubin, 1983; Stuart et al., 2011; Tipton, 2012). Propensity scores, which we denote by s(x), model the probability of sample membership as a function of X and have the advantage of being balancing scores: matching on the propensity score is equivalent to matching on the covariates in the propensity score model

(Rosenbaum and Rubin, 1983). Under strong ignorability using propensity scores, E(Y(w, z) | W=w, Z, s(x)) = E(Y(w, z) | W=w, s(x)) for w, z ∈ {0,1} and s(x) ∈ (0,1] (Tipton, 2012). If strong ignorability using propensity scores holds, the resulting inferences are made using the conditional distribution of the potential outcomes. Whether the strong ignorability assumption is credible and plausible in practice is a controversial topic. At heart, strong ignorability is an invariance assumption: it asserts that the effect of the assessment system is the same (invariant) for students, on average, regardless of whether the school volunteered for the Indiana CRT, once the propensity scores are taken into account. In other words, if strong ignorability holds, self-selection does not matter because any differences between the volunteer schools and the rest of the Indiana population schools are explained by the propensity scores. Is this assumption credible? Conceivably, strong ignorability may fail for several reasons. For example, schools that respond differently to the assessment system may have a strong support base from parents, which may not be observable in the population data frames. Some schools that chose not to volunteer for the Indiana CRT may have had an assessment system already in place, and the impact of the CRT system may be different for these schools compared to schools that lacked such a system or such resources to begin with. If these characteristics of schools are not observed in X, strong ignorability does not hold. Note that the concern is not solely the inability to observe these characteristics, but that these potential covariates may explain treatment variability while being excluded from the propensity score model. In addition, strong ignorability would also fail if there are schools in P whose estimated propensity score s(x) = 0.
This refers to any school whose probability of selection into the sample is structurally zero.

This may occur, for example, if the sample consisted of all single-gender schools and generalization to a population of co-educational schools was of primary interest.

Partial Identification of the PATE

Examples in which strong ignorability does not apply, as described in the previous section, are not uncommon. Manski (2009) recommended that researchers begin analyses by considering what can be learned from the data alone, without any assumptions, so that a domain of consensus is the established starting point. In the absence of strong ignorability, weaker but plausible assumptions that do not fully identify the PATE can yield informative bounds, so that inferences are not based solely on untestable claims about the data (Manski, 2009). In the following sections, we exclude the strong ignorability assumption and consider the inferences that can be made on the partially identified PATE. Although strong ignorability is omitted, the interval estimates are derived under some preliminary assumptions. First, the stable unit treatment value assumption (SUTVA) and its extension to causal generalization are assumed (Tipton, 2012). A key stipulation of SUTVA is that the potential outcome Y(W, Z) of each school i depends only on the treatment and sample indicators of i and not on those of another school j, where i ≠ j. Second, the bounds on the PATE are estimated assuming perfect compliance among the schools in the experimental study. While neither of these assumptions is modest, neither alone is sufficient to point identify the PATE, and excluding them would require additional assumptions on the potential outcomes Y(W, Z) for each unit in P. Assuming SUTVA and perfect compliance, the bounds for the PATE are derived under four cases. The first two make no assumptions on the data generation process, while the third and fourth introduce assumptions that are weaker than strong ignorability to achieve tighter bounds.
We provide a roadmap of the cases and their respective sections in this article in Table 1.

Three important questions are of interest: first, how different are the inferences on the PATE based on strong ignorability compared to the inferences based on the data itself, and are the latter inferences informative; second, what inferences can be made about the PATE with assumptions that are weaker than strong ignorability; and lastly, how can additional sources of information be used to tighten the bounds of the PATE?

INSERT TABLE 1 ABOUT HERE

Two Frameworks for Estimating Bounds on the PATE

Because problems of generalizability focus on inferences from a sample to a population, a population frame is required in order to estimate the propensity score of selection into the experimental sample for every school i in P (Shadish, 2010). The population data frame used in the Indiana CRT, sourced from the CCD, contains demographic information on students and schools as well as test scores over several years. Since these data frames enumerate all schools in P, they provide information on the sample counterfactuals in the decompositions (3) and (4). While propensity score methods use the population data to model the selection probability, we propose using the data frame to define two frameworks for estimating bounds on the PATE. The two frameworks, the full interval framework and the reduced interval framework, differ in the extent to which the population data frame provides information to tighten the bounds. The full interval framework uses the experimental sample data with no assumptions on the sample counterfactuals. Under this framework, the only observable quantities are the realized potential outcomes E(Y(w) | W=w, Z=1) for w ∈ {0,1}, while the unobservable treatment and sample counterfactuals are replaced by known bounds of the outcome. The reduced interval framework uses the empirical evidence from the experimental sample and the population data frame (that is, both the study data and the population data from

the CCD) to identify the sample counterfactual E(Y(0) | W=0, Z=0), the expected outcome under the control condition for schools that were assigned control (W=0) and were not selected into the experimental sample (Z=0). A rationale for the reduced interval framework lies in the idea that the control condition in educational experiments may be a business-as-usual (BaU) condition, in which control schools continue implementing existing curricula or programs. The reduced interval framework considers the distribution of potential outcomes among schools not selected into the experiment (that is, Z=0) to be identified by the population data frame if the control condition was BaU. Because this was the case for the Indiana CRT, we argue that the non-experimental schools in the population were similarly exposed to the control condition, so that their potential outcomes under control are identified by the population data frame. We compare the widths and magnitudes of the estimated bounds on the PATE under these two frameworks to assess the identifying power of the population data frame on the interval estimates. Note that the reduced interval framework requires additional information, specifically the probabilities P(W=1 | Z=0), P(W=0 | Z=0), and P(Z=0). These three probabilities are all functions of P(Z=0), which is estimated by the proportion of the population not selected into the experimental sample. In practice, this probability can be estimated empirically if knowledge of the study's budget constraints is available. For example, principal investigators decide the scale of implementation within the funding parameters of the experimental study, which informs empirical estimates of P(Z=0). The quantities P(W=1 | Z=0) and P(W=0 | Z=0) represent the probabilities of treatment assignment among non-selected schools, which are typically close to 0.5 in randomized experiments.

Bounds Without Assumptions on the Data Generation Process

Using the two frameworks for bounds, we begin with a domain of consensus and consider the inferences that can be made absent any additional assumptions on the data. The bounds in the following sections are derived for a binary Y, so that the potential outcomes Y(W, Z) share the same lower and upper bounds, {0,1}, for all units in P and the expectations E(Y(W, Z) | W, Z) become the probabilities P(Y(W, Z)=1 | W, Z). These bounds extend easily to bounded continuous Y, such as test scores, where the lower and upper bounds of Y are used in place of {0,1}. Cases in which Y is bounded on one side but unbounded on the other are discussed in Manski (2009), though the focus there is on experimental studies rather than causal generalization.

Full Interval Framework

Using (3) and (4), the lower and upper bounds of the potential outcomes are derived by replacing the unobservable treatment and sample counterfactuals with 0 and 1, respectively. For w ∈ {0,1}, the bounds under the full interval framework are

E(Y(w)) ∈ [E_L(Y(w)), E_U(Y(w))] (5)
E_L(Y(w)) = E(Y(w) | W=w, Z=1) * P(W=w, Z=1)
E_U(Y(w)) = E_L(Y(w)) + (1 - P(W=w, Z=1))

The lower and upper bounds of the PATE are given by the differences

PATE_L = E_L(Y(1)) - E_U(Y(0)) (6)
PATE_U = E_U(Y(1)) - E_L(Y(0))

The width of this bound is 1 + P(Z=0), and because its value is always at least 1, the sign of the PATE cannot be identified. Note that when P(Z=0) = 0 and all units are selected into the experimental sample, the width 1 + P(Z=0) shrinks to 1, coinciding with the tightest possible width of the interval for the SATE under the same no-assumptions framework.

Reduced Interval Framework

When the population data frame identifies the sample counterfactual E(Y(0) | W=0, Z=0), no replacement is made for this potential outcome in the estimation of bounds. Because this

sample counterfactual pertains to the expected outcome under control, the lower and upper bounds for E(Y(1)) remain the same, but the bounds for E(Y(0)) become

E(Y(0)) ∈ [E_L(Y(0)), E_U(Y(0))] (7)
E_L(Y(0)) = E(Y(0) | W=0, Z=1) * P(W=0, Z=1) + E(Y(0) | W=0, Z=0) * P(W=0, Z=0)
E_U(Y(0)) = E_L(Y(0)) + (1 - P(W=0, Z=1) - P(W=0, Z=0))

The bounds for the PATE are again given by (6), but now with the lower and upper bounds of E(Y(0)) given in (7). Note that the width of the bound for the PATE is now 1 + P(W=1, Z=0), which is at most the width 1 + P(Z=0) of the full interval framework. Here, P(W=1, Z=0) is the probability that a school not selected into the experiment would be assigned to the treatment condition. In the best case, when P(W=1, Z=0) = 0, the population data frame fully identifies the sample counterfactuals and the bounds for the PATE again shrink to width 1. In the worst case, the reduced interval bounds are equivalent to those under the full interval framework.

Bounds Under Random Treatment Assignment

The bounds in (5) and (7) can be improved under randomized treatment. In randomized experiments like the Indiana CRT, the outcomes Y(W, Z) are statistically independent of the assigned treatment, so that E(Y(w) | Z=1) = E(Y(w) | W=w, Z=1). As a result, the treatment counterfactuals E(Y(w) | W≠w, Z=1) for w ∈ {0,1} are equal in distribution to the realized potential outcomes, and substitutions are required only for the sample counterfactuals. The bounds are now

E(Y(w)) ∈ [E_L(Y(w)), E_U(Y(w))] (8)
E_L(Y(w)) = E(Y(w) | W=w, Z=1) * P(Z=1)
E_U(Y(w)) = E_L(Y(w)) + (1 - P(Z=1))

for the full interval framework. The width of the PATE bound under (8) is now 2 * P(Z=0), which is strictly tighter than the bound based on (5) except when P(Z=0) = 1. Under the reduced interval framework, the bounds for E(Y(0)) are given by

E(Y(0)) ∈ [E_L(Y(0)), E_U(Y(0))] (9)
E_L(Y(0)) = E(Y(0) | W=0, Z=1) * P(Z=1) + E(Y(0) | W=0, Z=0) * P(W=0, Z=0)
E_U(Y(0)) = E_L(Y(0)) + (1 - P(Z=1) - P(W=0, Z=0))

Here, the width of the PATE bound is P(Z=0) + P(W=1, Z=0), which is smaller than all of the widths in the previous section and at most equal to the width for the full interval case with treatment randomization. Note that the bounds under treatment randomization are narrower than their respective counterparts under the first case, and the reduced interval bound is again potentially tighter when E(Y(0) | W=0, Z=0) is identified by the population data frame.

Bounded Sample Variation and Treatment Randomization

The bounds under no assumptions and treatment randomization are often referred to as worst-case bounds; they are conservative and often uninformative for identifying the sign of the PATE (Manski, 2009). The worst-case bounds represent the simplest bounds based on the data, but they can be tightened by adding assumptions. If strong ignorability is unlikely to hold in practice, are there weaker assumptions that may be more plausible? Weaker assumptions are those that are insufficient to point identify the PATE. To conceptualize them, it is helpful to return to the condition stipulated under strong ignorability: E(Y(w) | W=w, Z, s(x)) = E(Y(w) | W=w, s(x)), which implies that the distributions of potential outcomes for the volunteer and non-volunteer schools are conditionally equal. In other words, the difference between the unobserved sample counterfactuals and the realized potential outcomes from the sample is essentially zero, on average, when conditioning on the estimated propensity scores s(x). Rather than set this difference to be exactly zero, a simple weakening of the invariance assumption is to restrict the difference to be at most a specified constant (Manski, 2015).
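The worst-case intervals derived so far can be sketched numerically. The function below implements Equations (5), (6), and (8) for a binary outcome; the study proportions and observed pass rates are hypothetical, and the reduced interval variants (7) and (9) would follow the same pattern.

```python
def pate_bounds(e_y1_obs, e_y0_obs, p_w1z1, p_w0z1, randomized=False):
    """Worst-case bounds on the PATE for a binary outcome.

    e_y1_obs = E(Y(1) | W=1, Z=1); e_y0_obs = E(Y(0) | W=0, Z=1);
    p_w1z1 = P(W=1, Z=1); p_w0z1 = P(W=0, Z=1). All values hypothetical.
    """
    p_z1 = p_w1z1 + p_w0z1
    if randomized:
        # Equation (8): treatment counterfactuals are identified, so the
        # observed means are weighted by P(Z=1) alone.
        l1, u1 = e_y1_obs * p_z1, e_y1_obs * p_z1 + (1 - p_z1)
        l0, u0 = e_y0_obs * p_z1, e_y0_obs * p_z1 + (1 - p_z1)
    else:
        # Equation (5): replace every counterfactual by the bounds {0, 1}.
        l1, u1 = e_y1_obs * p_w1z1, e_y1_obs * p_w1z1 + (1 - p_w1z1)
        l0, u0 = e_y0_obs * p_w0z1, e_y0_obs * p_w0z1 + (1 - p_w0z1)
    # Equation (6): bound the difference of the two expectations.
    return l1 - u0, u1 - l0

# Hypothetical study: 10% of P sampled (6% treated, 4% control),
# observed pass rates 0.7 (treated) and 0.5 (control).
lo, hi = pate_bounds(0.7, 0.5, 0.06, 0.04)
assert abs((hi - lo) - (1 + 0.9)) < 1e-9    # width = 1 + P(Z=0)

lo_r, hi_r = pate_bounds(0.7, 0.5, 0.06, 0.04, randomized=True)
assert abs((hi_r - lo_r) - 2 * 0.9) < 1e-9  # width = 2 * P(Z=0)
```

As the assertions confirm, randomization shrinks the width from 1 + P(Z=0) to 2 * P(Z=0), yet both intervals still contain zero, motivating the bounded sample variation assumption introduced next.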
In particular, a researcher may not be confident that the conditional distributions of ISTEP+ scores are exactly equal, but he/she

may be willing to believe that the distributions are sufficiently similar among the volunteer and non-volunteer schools. We quantify this similarity by extending the randomized treatment framework under a bounded sample variation assumption, given as follows:

|E(Y(w) | W=w, Z=0) - E(Y(w) | W=w, Z=1)| ≤ λ
|E(Y(w) | W≠w, Z=0) - E(Y(w) | W≠w, Z=1)| ≤ λ for w ∈ {0,1} (10)

where λ ∈ [0,1] is a constant representing the largest magnitude of the difference (Manski, 2015). Note that the condition in (10) is applied to both sets of sample counterfactuals. By design, bounded sample variation yields interval estimates of the PATE whose width depends on λ. Bounded variation assumptions were first introduced in Manski (2015) and have been discussed in Manski & Pepper (2015) with applications to the impact of right-to-carry laws on crime rates. For the Indiana CRT, if bounded sample variation holds, the proportion of Pass schools differs by at most a constant λ between the volunteer and non-volunteer schools under each respective treatment condition. Bounded sample variation yields new bounds on the PATE, given by

E_L(Y(w)) = E(Y(w) | W=w, Z=1) * P(Z=1) + (E(Y(w) | W=w, Z=1) - λ) * P(Z=0)
E_U(Y(w)) = E(Y(w) | W=w, Z=1) * P(Z=1) + (E(Y(w) | W=w, Z=1) + λ) * P(Z=0) (11)

for the full interval framework and

E_L(Y(0)) = E(Y(0) | W=0, Z=1) * P(Z=1) + E(Y(0) | W=0, Z=0) * p + (E(Y(0) | W=0, Z=1) - λ) * (1 - P(Z=1) - p)
E_U(Y(0)) = E(Y(0) | W=0, Z=1) * P(Z=1) + E(Y(0) | W=0, Z=0) * p + (E(Y(0) | W=0, Z=1) + λ) * (1 - P(Z=1) - p) (12)

for the reduced interval framework, where p = P(W=0, Z=0). As in the previous frameworks, the lower and upper bounds of the PATE are given by (6). Because Y is binary, the constant λ is constrained to the interval [0,1], as it represents a difference in estimated proportions. However, λ can be any positive constant when Y is

continuous. When λ = 0, the bounds reduce to a single value of the PATE, so that point estimation is recovered. Larger values of λ imply larger differences between the expected potential outcomes of schools self-selecting into the experimental study and those that did not, thereby widening the interval estimates and weakening the assumption further. Importantly, bounded sample variation and strong ignorability are not nested. Bounded sample variation allows the potential outcomes among sample and non-sample schools to vary under the assumption that the outcomes are similar in magnitude, but not necessarily equal. To assess the plausibility of this assumption, consider it in the context of matching. When experimental samples are matched to a population P in causal generalization, the goal is to achieve balance among observable covariates so that the resulting differences in distributions are minimized (see, e.g., Hansen (2004) for examples of matching). Here, balance is quantified as attaining the smallest standardized mean difference among covariates between the two groups. Conceivably, the difference in expected potential outcomes is smaller with matched samples if the expected outcomes Y are a function of the covariates that are balanced between the groups. In contrast to strong ignorability, the plausibility of bounded sample variation rests on the assumption that there is sufficient overlap between the distributions of potential outcomes among self-selected and population schools to facilitate the derivation of informative bounds. For the Indiana CRT, Tipton et al. (2016) found that the experimental sample was actually similar to the population P despite the volunteer nature of the sampling. Using a simulation study, a comparison of the distribution of standardized mean differences (SMDs) with those of simple random samples showed that the balance statistics of the CRT sample were not significantly different from those found for probability samples. Since the experimental sample is not very different from the

population, bounded sample variation is not implausible, and we explore the inferences under this assumption in a direct application with the experimental study.

Choice of λ

The plausibility and applicability of bounded sample variation rely on the choice of λ. Previous work on bounded variation assumptions estimates λ based on prior outcome data (Manski & Pepper, 2015). However, empirical estimation of λ is difficult in our example as it represents the difference between the volunteer and non-volunteer schools' outcomes in the Indiana CRT, which is specific to this study. Because bounded sample variation is related to the goals of matching methods, a natural choice for the parameter λ is the standardized mean difference (SMD). In particular, let λ = |X̄_P − X̄_S| / σ_P, where X is a pre-treatment covariate, X̄_P is the mean covariate value for schools in the population, X̄_S is the mean covariate value for the sample schools, and σ_P is the standard deviation of X across all schools in the population. This choice of parameter reflects the assumption that λ effectively measures the degree to which the sample and population are balanced on the specific covariate. As a result, larger values of the SMD suggest larger differences between the sample and population distributions, which is then reflected in wider intervals. To illustrate, when Y is a test score outcome, one possible choice for X is a pretest score covariate that is strongly correlated with the outcome. Alternatively, because the SMD is typically estimated using multiple covariates, another choice of λ may be based on the average SMD of several covariates, or on the maximum SMD as a conservative estimate. Although bounded sample variation is not necessarily validated using these estimates of λ, we provide these choices as a starting point for applications of this assumption.
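To make the SMD-based choice concrete, the sketch below computes λ from a hypothetical pretest covariate and plugs it into the full interval bounds in (11). All covariate values, Pass proportions, and the selection probability are illustrative assumptions, not the Indiana CRT data.

```python
import statistics

# Hypothetical pretest scores for population and volunteer (sample) schools.
x_pop = [55.0, 60.0, 62.0, 48.0, 70.0, 65.0, 58.0]
x_sample = [61.0, 64.0, 59.0, 66.0]

# SMD-based choice: lam = |mean_P - mean_S| / sd_P, capped at 1 for binary Y.
smd = abs(statistics.mean(x_pop) - statistics.mean(x_sample)) / statistics.pstdev(x_pop)
lam = min(smd, 1.0)

def bv_bounds(mean_w, p_z1, lam):
    """Full interval bounds (11) on E(Y(w)) under bounded sample variation."""
    p_z0 = 1.0 - p_z1
    lower = mean_w * p_z1 + (mean_w - lam) * p_z0
    upper = mean_w * p_z1 + (mean_w + lam) * p_z0
    # for binary Y the bounds may additionally be clipped to [0, 1]
    return lower, upper

# Hypothetical sample Pass proportions under treatment and control.
m1, m0, p_z1 = 0.60, 0.55, 0.05
lo1, hi1 = bv_bounds(m1, p_z1, lam)
lo0, hi0 = bv_bounds(m0, p_z1, lam)
pate_lower, pate_upper = lo1 - hi0, hi1 - lo0  # PATE bounds as in (6)
```

When λ = 0 the two bounds coincide and point identification is recovered; the interval width for each E(Y(w)) is 2λ * P(Z=0), so small selection probabilities widen the bounds.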
Another choice for λ is the actual variation of the realized potential outcomes in the experimental study, specifically by letting λ = 2*√Var(Y(w, z) | W=w, Z=1). This value of λ is
analogous to the margin of error in the construction of confidence intervals for normally distributed data. When Y(w, z) is binary, the variance is a function of the proportion of successes in the sample (that is, the proportion of Pass schools). This choice of λ offers a conservative limit on the difference in E(Y) by setting the difference to be no more than two standard deviations of the empirical distribution of outcomes. It also assumes that there is sufficient overlap between the two distributions of potential outcomes.

Monotone Treatment Response

Bounded sample variation and strong ignorability focus on the average distance between distributions of potential outcomes, but neither assumption considers the actual shape of the response function Y(W,Z). If the principal investigators in the Indiana CRT were confident that the benchmark assessment system improves student outcomes, this leads to a new assumption with a different type of identifying power. In particular, researchers may be willing to assume that the response function Y(W,Z) varies monotonically with the magnitude of the treatment. The concept of monotonicity is related to the idea that the benchmark assessment system at least does no harm and at worst produces outcomes that are not statistically different from the BaU condition. Because monotonicity is concerned with the response function, it is not nested within the strong ignorability assumption. Unlike strong ignorability and bounded sample variation, monotonicity stipulates that the response Y(W,Z) is weakly increasing in the treatment, regardless of self-selection. Although monotonicity cannot be validated empirically, evidence from pilot studies may suggest its plausibility. Pilot studies are often used to determine the feasibility of an experimental study, but also to assess the expected magnitude and sign of the treatment effect in the first iteration of the experiment.
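Returning to the variance-based choice of λ above: for a binary outcome it reduces to two standard deviations of a Bernoulli distribution, which can be computed directly from the sample Pass proportion. The proportion used below is a hypothetical value, not an Indiana CRT estimate.

```python
import math

def lam_from_variance(p_hat):
    """Two-standard-deviation choice of lam for binary Y:
    Var(Y(w,z) | W=w, Z=1) = p_hat * (1 - p_hat), with p_hat the
    sample proportion of Pass schools."""
    return 2.0 * math.sqrt(p_hat * (1.0 - p_hat))

lam = lam_from_variance(0.6)  # hypothetical Pass proportion among treated
```

For binary Y this choice is conservative: it reaches its maximum of 1 (the largest admissible λ) when the Pass proportion equals one half, and shrinks only as the proportion approaches 0 or 1.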
During the pilot year of the CRT, Konstantopoulos et al. (2013)
found positive, though insignificant, treatment effects for a majority of the grades among the experimental schools. These insignificant findings were largely based on the unconditional model without covariate adjustments, and the treatment effects were estimated for both ELA and mathematics. We discuss the unconditional model as the interval estimates in this article are based on nonparametric bounding methods. While the pilot study results do not guarantee monotonicity's validity, they do lend support to its plausibility for the specific time period of the pilot program. If monotonicity holds, this assumption, combined with the assumptions of SUTVA and perfect compliance, yields a new set of bounds on the PATE with analogous extensions to the bounds of the prior frameworks. Monotonicity of the response function Y(W) stipulates the following condition:

W ≥ W′ implies Yi(W) ≥ Yi(W′), for i = 1, …, N

where W, W′ represent two treatment conditions (Manski, 1997). The monotonicity framework differs from the previous cases in two important respects. First, the treatment indicator W now denotes an ordered set of treatments so that the distribution of values of Y(W) is at least as large as that of Y(W′) if W ≥ W′. For the Indiana CRT, if W=1 and W′=0, this implies that the average ISTEP+ scores among the treatment schools (W=1) are at least as large as those among the control schools (W′=0) if monotonicity holds. Second, the bounds under monotone treatment response (MTR) are derived using realized values of Y(W, Z) for each combination of (W, Z) across all units in the study. The original monotonicity framework proposed in Manski (1997) assumed that outcomes at different levels of the treatment were observable, which contributes identifying power even if some levels of the treatment are not realized. The sharpness of the bounds is a consequence of the monotonic nature of the response function. Although the focus here is on
weakly increasing response functions Y(W, Z), the results can easily be generalized to weakly decreasing functions (Manski, 2009). If Y(W, Z) is weakly increasing in W, the lower bound of the PATE is zero by design. For binary Y and W, the upper bound is a function of the proportions of successes and non-successes among the treatment and control schools. Formally, the upper bound PATE_U is

PATE_U = P(Y=0 | W=0, Z=1) * P(W=0, Z=1) + P(Y=1 | W=1, Z=1) * P(W=1, Z=1) + P(Y=0 | W=0, Z=0) * P(W=0, Z=0) + P(Y=1 | W=1, Z=0) * P(W=1, Z=0) (13)

Note that the upper bound is the sum of two components. The first, given by P(Y=0 | W=0, Z=1) * P(W=0, Z=1) + P(Y=1 | W=1, Z=1) * P(W=1, Z=1), is the sum of the proportion of non-Pass schools (Y=0) among the schools assigned to control (W=0) and the proportion of Pass schools (Y=1) among the schools assigned to treatment (W=1), for schools that were selected into the experimental study (Z=1). If Y is truly monotone in the treatment, the largest feasible upper bound is the sum of the upper bound for Y(1) and the lower bound for Y(0). The second component, P(Y=0 | W=0, Z=0) * P(W=0, Z=0) + P(Y=1 | W=1, Z=0) * P(W=1, Z=0), is the sum of the analogous proportions among the population schools (Z=0). Because MTR is based on realized values Y(W, Z), additional consideration must be given to the upper bound as it is a function of schools not selected into the experimental sample (Z=0). If the population data frame contributes identifying power, as proposed under the previous reduced interval framework, the term P(Y=0 | W=0, Z=0) is identified using the empirical evidence. However, the proportion P(Y=1 | W=1, Z=0) is still an unobservable sample counterfactual, and the only information that contributes identifying power is the known bounds {0,1} for the proportion. Because the upper bound depends on this sample counterfactual, an interval of values for PATE_U can be derived by substituting 0 (1) for the minimum (maximum)
upper PATE bound. For the purpose of comparing the MTR assumption with the previous frameworks, we focus on the minimum of PATE_U as it is derived using the realized values Y(W, Z) from both the sample and the population data. This lower bound on PATE_U is a best-case bound because it represents the smallest value PATE_U takes under the monotonicity framework. Note that the smallest PATE_U is derived by substituting 0 for the unobserved sample counterfactuals, and the largest PATE_U is derived by substituting the value 1. Like the full and reduced interval cases, two MTR bounds, denoted the sample MTR bound and the population MTR bound, are presented and given by

Sample MTR Bound:
PATE_U = P(Y=0 | W=0, Z=1) * P(W=0, Z=1) + P(Y=1 | W=1, Z=1) * P(W=1, Z=1)

Population MTR Bound:
PATE_U = P(Y=0 | W=0, Z=1) * P(W=0, Z=1) + P(Y=1 | W=1, Z=1) * P(W=1, Z=1) + P(Y=0 | W=0, Z=0) * P(W=0, Z=0) (14)

Note that the bounds in (14) differ by the probability P(Y=0 | W=0, Z=0), which is identified by the population data frame in the population MTR bound. Thus, if monotonicity holds and the treatment is believed to at least do no harm, the interval estimate of the PATE shrinks to lie on one side of zero.

Bounds by Propensity Score Stratum

The bounds developed thus far are estimated based on the entire experimental sample and population data frame (the latter for the reduced interval and population MTR cases). In generalizations with nonrandom samples, a primary goal is to match the sample and population. Subclassification is a common matching method in which the population is partitioned into smaller subclasses or strata using quantiles of the propensity score distribution (Tipton, 2012). Figure 1 shows the distribution of propensity score logits, where logit(s(x)) =
log(s(x)/(1 − s(x))), with s(x) denoting the propensity score for the Indiana CRT. For this study, the sample only permitted three equally sized strata. These strata represent coarse matches between the volunteer and non-volunteer schools based on the covariates used to estimate the propensity scores. As a statistical tool, subclassification shares the advantages of stratification methods by improving the precision of estimates when schools in the same stratum are more similar in the matched covariates than schools between strata.

INSERT FIGURE 1 ABOUT HERE

Because the bounds are estimated nonparametrically from the data, they can also be computed for subgroups of the data to derive stratum-specific interval estimates. In our final application, we propose a combined approach where the previous no-assumption, bounded sample variation, and monotonicity frameworks are applied with subclassification to derive stratum-specific interval estimates of the PATE. Note that strong ignorability is not invoked, as the extended application is based on subclassification as a matching method. Given k propensity score strata, the bounds [PATE_L,j, PATE_U,j] are now estimated using the empirical distribution P(Yj | Wj, Zj), for j = 1, …, k. Because the outcome Y is bounded by the same parameters in each stratum, the interval estimates have the same form as the bounds based on the original sample. Analogous arguments for the plausibility of the bounded sample variation and monotonicity frameworks extend to the propensity strata. Conceivably, bounded sample variation assumptions may be more plausible as the interval estimates are now based on matched subgroups with improved overlap in the covariate distributions among the strata. Alternatively, when subclassification is used (though not necessarily for generalization), this combined approach is useful in deriving preliminary bounds for sparse strata.
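The stratum-specific computation just described amounts to recomputing the same nonparametric bounds within each stratum. The sketch below does this for the randomized treatment bounds with binary Y; the per-school records and within-stratum selection probabilities are hypothetical illustrations.

```python
# Randomized treatment bounds on the PATE for binary Y, recomputed by
# propensity score stratum. Each record is a school: y is the Pass
# indicator, w the assigned condition, z sample membership, s the stratum.

def rt_bounds(records, p_z1):
    """Bounds on E(Y(1)) - E(Y(0)): unobserved population outcomes set to 0 or 1."""
    def prop(w):
        ys = [r["y"] for r in records if r["w"] == w and r["z"] == 1]
        return sum(ys) / len(ys)
    m1, m0, p_z0 = prop(1), prop(0), 1.0 - p_z1
    lower = m1 * p_z1 - (m0 * p_z1 + 1.0 * p_z0)
    upper = (m1 * p_z1 + 1.0 * p_z0) - m0 * p_z1
    return lower, upper

def stratum_bounds(records, p_z1_by_stratum):
    """Apply rt_bounds separately within each stratum j = 1, ..., k."""
    return {j: rt_bounds([r for r in records if r["s"] == j], p)
            for j, p in sorted(p_z1_by_stratum.items())}

records = [
    {"y": 1, "w": 1, "z": 1, "s": 1}, {"y": 0, "w": 1, "z": 1, "s": 1},
    {"y": 1, "w": 0, "z": 1, "s": 1}, {"y": 0, "w": 0, "z": 1, "s": 1},
    {"y": 1, "w": 1, "z": 1, "s": 2}, {"y": 1, "w": 1, "z": 1, "s": 2},
    {"y": 0, "w": 0, "z": 1, "s": 2}, {"y": 1, "w": 0, "z": 1, "s": 2},
]
bounds = stratum_bounds(records, {1: 0.10, 2: 0.08})
```

Each stratum's interval has width 2 * P(Z=0) within that stratum, so sparse strata, where the selection probability is smallest, produce the widest no-assumption intervals by construction.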
Sparse strata, like Stratum 3 in Figure 1, represent regions of the distribution with poor overlap, so that fewer
sample schools are matched to population schools. The small stratum-specific sample sizes present challenges for design-based estimation, as the limited information leads to imprecise estimates. In these cases, inferences based on the interval estimates serve as a starting point in the analysis, and they may be more credible since the limited data from sparse strata require additional assumptions for point estimation. Furthermore, the nonparametric method of deriving the bounds offers a flexibility that can easily be extended to any number of subclasses. Note that propensity score subclassification depends on the propensity score model and the covariates used to estimate the propensity scores, so the choice of covariates that are correlated with sample selection and treatment heterogeneity is important.

Bounds for the PATE in the Indiana CRT

Tables 2 and 3 provide the bounds for the four cases based on the experimental and stratified Indiana CRT samples, respectively. The PATE is defined as P(Y(1)=1) − P(Y(0)=1), the difference in expected proportions of Pass schools among the 34 schools that implemented the assessment system (assigned treatment) and the 22 schools that did not (assigned control). To illustrate this application, 4th grade scores were used so that the population P was redefined to include the N = 1,029 fourth-grade-serving schools from the original 1,514 K-8 schools. Using the experimental sample, the conditional probabilities of treatment assignment were estimated as P(W=1 | Z=1) = 34/56 = 0.61 and P(W=0 | Z=1) = 22/56 = 0.39. With the redefined P, the probability of selection into the sample is given by P(Z=1) = 56/1,029 = 0.05 and P(Z=0) = 1 − P(Z=1). For the reduced interval framework, we derive the bounds using P(W=0, Z=0) = 0.5 * P(Z=0) to illustrate the potential of the population data to tighten bounds. This choice of P(W=0, Z=0) implies randomized treatment among schools in the population.
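The design probabilities for this application follow directly from the reported counts (34 treated and 22 control schools out of 56, with an N = 1,029 population frame), assuming the 56 sample schools are counted within that frame:

```python
n_treat, n_control, n_pop = 34, 22, 1029
n_sample = n_treat + n_control  # 56

p_w1_given_z1 = n_treat / n_sample    # P(W=1 | Z=1) = 34/56
p_w0_given_z1 = n_control / n_sample  # P(W=0 | Z=1) = 22/56
p_z1 = n_sample / n_pop               # P(Z=1) = 56/1,029
p_z0 = 1.0 - p_z1
p_w0_and_z0 = 0.5 * p_z0              # reduced interval choice: randomized
                                      # treatment among population schools
```

Because P(Z=1) is only about 0.05, the unidentified population term carries roughly 95% of the weight in the bounds, which is why the no-assumption intervals below are so wide.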
Because the Indiana CRT was a randomized experiment, a natural starting place is the bounds under Randomized Treatment, derived without additional assumptions beyond SUTVA and perfect compliance. As shown in Table 2, the bounds are distinctly uninformative, with the interval nearly spanning the [-1, 1] range for both subjects so that the sign of the PATE cannot be identified. Although the bounds under randomized treatment improve over the No Assumptions bounds, where the interval shrinks from [-0.97, 0.98] to [-0.93, 0.96] for ELA, the improvement is slight because the estimated probability of sample selection is small, with P(Z=1) = 0.05. When the population data frame identifies P(Y(0) | W=0, Z=0) in the reduced interval framework, the lower bound for P(Y(0)) under the No Assumptions and Randomized Treatment cases is now larger because it includes the proportion of Pass schools among those assigned to control in the population. Using P(W=0, Z=0) = 0.5 * P(Z=0), the bounds for Math under the reduced interval framework shrink from [-0.93, 0.96] to [-0.87, 0.55] under randomized treatment, a 25% shrinkage in width. The tightening of the upper bound from 0.96 to 0.55 is more prominent as the difference is now taken with a larger lower bound for P(Y(0)). Analogous results are seen for ELA. However, the tighter bounds under the reduced interval framework do not rule out an insignificant PATE, as each of the four bounds includes zero for both ELA and Math. The bounds under no assumptions illustrate that when P(Z=1) is small, the interval estimates of the PATE are rarely informative, so some assumptions are needed. The third rows for each subject in Table 2 provide the bounds under bounded sample variation. For ELA, the value λ = 0.3 was chosen based on the variance of Y and λ = 0.5 was selected based on the SMD of the pretest scores. The values λ = 0.1 and 0.6 were chosen similarly for Math. For both
subjects, the smaller values of λ represent a smaller difference in the expected potential outcomes between the volunteer and population schools. When λ = 0.3, the upper bound on the expected difference in proportion of Pass schools is 0.83 for ELA, a much tighter interval compared to the randomized treatment case. For Math, the smaller λ value of 0.1 facilitates identification of the sign of the PATE, with a lower bound of 0.02. Note that these intervals are significantly widened when the value of λ increases to 0.5 and 0.6 for ELA and Math, respectively. Since λ affects both the lower and upper bounds of P(Y(1)) and P(Y(0)), the impact on the bounds on the PATE is more pronounced after taking the difference. For the reduced interval framework, the upper bound PATE_U is again much smaller than that of the full interval framework since the difference P(Y(1))_U − P(Y(0))_L is now taken with a larger P(Y(0))_L. With the exception of Math with λ = 0.1, the interval estimates suggest an insignificant PATE. The bounds for the PATE are starkly different under monotonicity of treatment response. Note that the upper bound PATE_U in Table 2 is the smallest upper bound estimated using the realized potential outcomes. If the benchmark system is assumed to at least do no harm, the expected difference in proportion of Pass schools is at most 0.02 in both subjects based on the experimental data. This upper bound increases to 0.07 and 0.09 for ELA and Math, respectively, when the upper bound includes P(Y(0)=0 | W=0, Z=0). However, since the difference between 0.02 and 0.07 for ELA is small, P(Y(0)=0 | W=0, Z=0) is small, which implies that the proportion of non-Pass schools in the population is small. Importantly, because the PATE is a function of P(Z=1), if this proportion is small, the bound PATE_U will have small values.
Although the MTR bounds include zero by design, the magnitude of the smallest PATE_U for both subjects suggests that, using the realized outcomes, large values of the PATE can be ruled out even if small insignificant treatment effects cannot be excluded.
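The sample and population MTR upper bounds in (14) are simple plug-in sums of empirical proportions. The sketch below uses hypothetical proportions, not the Indiana CRT estimates, to show how the population term enlarges the bound.

```python
def mtr_upper(p_y0_ctrl_s, p_w0z1, p_y1_trt_s, p_w1z1,
              p_y0_ctrl_pop=None, p_w0z0=None):
    """Upper bound on the PATE under monotone treatment response, as in (14).

    The sample bound uses only the experimental schools; passing the two
    population arguments adds the term identified by the population frame.
    The MTR lower bound on the PATE is 0 by design.
    """
    ub = p_y0_ctrl_s * p_w0z1 + p_y1_trt_s * p_w1z1
    if p_y0_ctrl_pop is not None and p_w0z0 is not None:
        ub += p_y0_ctrl_pop * p_w0z0  # P(Y=0 | W=0, Z=0) * P(W=0, Z=0)
    return ub

# hypothetical inputs: P(Y=0|W=0,Z=1), P(W=0,Z=1), P(Y=1|W=1,Z=1), P(W=1,Z=1)
sample_ub = mtr_upper(0.40, 0.02, 0.60, 0.03)
population_ub = mtr_upper(0.40, 0.02, 0.60, 0.03, 0.50, 0.10)
```

Substituting 0 for the unobservable P(Y=1 | W=1, Z=0) yields the smallest PATE_U reported in the tables; substituting 1 yields the largest.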
INSERT TABLE 2 ABOUT HERE

Table 3 provides the bounds for the combined bounding and subclassification approach. Note that the stratum-specific sample sizes are smaller under this approach, with Stratum 3 containing only a single experimental school. Under randomized treatment, the stratum-specific bounds for ELA and Math are largely similar to those based on the entire experimental sample. The sign of the PATE is unidentified for both subjects, and the range of values under the full interval case primarily spans the [-1, 1] range. The reduced interval bounds under randomized treatment have similar widths as under the original sample, but slight differences can be seen among the strata. For example, the difference in proportion of Pass schools for ELA and Math in Stratum 1 reaches approximately 0.53 at the upper bound under the reduced interval framework, but the lower bound decreases further in Stratum 2. The differences among bounds are seen more distinctly with bounded sample variation and MTR. Using the same bounding parameters as with the original sample, the interval estimates under bounded sample variation suggest larger differences among the strata. In Math, for example, the sign of the PATE is identified in Stratum 1 when λ = 0.1, with an interval estimate of [0.12, 0.49]. However, the bounds in Stratum 2 imply an insignificant PATE, which illustrates potential differences in inferences among the strata. For MTR, the smallest PATE_U in each stratum has a similar magnitude as that based on the original sample, with the exception of Stratum 3 where the stratum sample is one school. Like the bounds under bounded sample variation, the small differences in magnitude of the interval values facilitate an analysis of the differences between the sample and population in each subclass.

INSERT TABLE 3 ABOUT HERE
As a comparison, Table 4 gives the point estimates of the PATE under no weighting, inverse propensity weighting (IPW), and subclassification for ELA and Math. The no weighting case refers to the direct estimate of the PATE that does not use propensity scores. As shown, the point estimates for ELA and Math are all insignificant, a result that is largely consistent with the bounds provided. While the interval estimates for Math show a positive PATE under λ = 0.1 with bounded sample variation, the interval estimate is still consistent. In particular, the lower bound of 0.02 suggests that small insignificant treatment effects are possible. With the exception of IPW, the point estimates under no weighting and subclassification are similar in magnitude to the smallest PATE_U under MTR, further supporting insignificant results. The magnitudes of the point estimates in Table 4 illustrate the potential for different inferences with different methods. While strong ignorability was not invoked, the interval estimates provided were sufficiently informative on the insignificant PATE. Their consistency with the point estimates supports the potential usefulness of the partially identified estimators.

INSERT TABLE 4 ABOUT HERE

In the original analysis, Konstantopoulos et al. (2013) found significant treatment effects for fourth grade ELA using a two-level hierarchical linear model with covariates. Using standardized continuous ISTEP+ scores, a significant PATE (standard error 0.057) was found based on the experimental sample with a model that included school and student level covariates. Importantly, the PATE from the model with treatment alone was not significant. Although this point estimate is not directly comparable to the bounds based on binary outcomes provided here, it is important to note that the significance of the estimate and the resulting inferences depended on the choice of model.

Discussion & Conclusion
This paper addresses the differences in inferences when strong ignorability of sample selection is not invoked in generalization problems. Because point estimates are typically desirable in experimental studies, strong ignorability assumptions are often imposed with little discussion of their plausibility. By illustrating the application of bounding methods to causal generalization, we demonstrate the importance of thoughtful consideration of the types of assumptions that may be more plausible in practice. As shown by the differences in widths and magnitudes of the bounds, different assumptions, regardless of whether they have point identifying power, can significantly affect the inferences from a study. A primary goal of partial identification methods is to approach an analysis from a common starting point (Manski, 2009). For that reason, the nonparametric nature of the bounds derived in this paper offers a simple empirically based approach, as all the bounds are computed using observable and estimable quantities. Because models are not used to estimate the intervals, an advantage of bounding methods is that the results can be replicated by different researchers. Generalization problems incorporate both population and sample units, so the bounds for the PATE necessarily include the probability of sample selection (denoted by Z). The bounds under randomized treatment with no additional assumptions are uninformative if the population to which the sample is generalized is three to five times larger. Because this is often the case for generalizations in educational research, assumptions are needed to derive more informative bounds on the PATE. While other assumptions are plausible in different problems, we focus on bounded sample variation and monotonicity, as the former is a natural weakening of the strong ignorability assumption and the latter is considered when there is theoretical evidence on the impact of the intervention.
While bounded sample variation and monotonicity are not
validated by our application, we present the interval estimates to illustrate the consistency of their inferences with those of the point estimates under strong ignorability. Partial identification applications in generalizability have the advantage of additional sources of data. The primary motivation of our proposed reduced interval framework is to use additional empirical evidence to tighten the interval estimates in place of solely layering on assumptions. As shown in the example, population data frames have the potential to tighten bounds when information on the probabilities of treatment assignment is known among the non-selected schools. Although information on this probability may be sourced elsewhere, we argue for the usefulness of population data frames to minimize the number of unobservable potential outcomes. The inclusion of population data was also revealing when comparing the bounds under the sample and population MTR frameworks. As it turns out, the small differences in magnitude of the upper bound PATE_U implied small differences in the proportion of Pass schools among the treated and untreated schools. While interval estimates do not substitute for the point estimates used to inform policy, we present their application as an alternative perspective when assumptions weaker than strong ignorability are made in generalizations. Generalizations with nonprobability samples inevitably involve a discussion of the extent to which a self-selected sample differs from the population. Although the assumptions used in this article make conjectures about this difference, by using partial identification approaches, we show that the bounds may still be informative and a better case can be made for their credibility in practice.
References

Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450).
Greenberg, D., & Shroder, M. (2004). The Digest of Social Experiments, 3rd ed. Washington, DC: The Urban Institute Press.
Greenland, S., Pearl, J., & Robins, J. M. (1999). Causal diagrams for epidemiologic research. Epidemiology.
Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association, 99(467).
Heckman, J. J., & Vytlacil, E. J. (2001). Instrumental variables, selection models, and tight bounds on the average treatment effect. In Econometric Evaluation of Labour Market Policies (pp. 1-15). Physica-Verlag HD.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396).
Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 171(2).
Keiding, N., & Louis, T. A. (2016). Perils and potentials of self-selected entry to epidemiological studies and surveys. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(2).
Konstantopoulos, S., Miller, S. R., & van der Ploeg, A. (2013). The impact of Indiana's system of interim assessments on mathematics and reading achievement. Educational Evaluation and Policy Analysis, 35(4).
Kruskal, W., & Mosteller, F. (1980). Representative sampling, IV: The history of the concept in statistics. International Statistical Review/Revue Internationale de Statistique.
Manski, C. F. (1990). Nonparametric bounds on treatment effects. The American Economic Review, 80(2).
Manski, C. F. (1997). Monotone treatment response. Econometrica: Journal of the Econometric Society.
Manski, C. F. (2009). Identification for Prediction and Decision. Harvard University Press.
Manski, C. F. (2016). Credible interval estimates for official statistics with survey nonresponse. Journal of Econometrics, 191(2).

Manski, C. F., & Pepper, J. V. (2015). How Do Right-To-Carry Laws Affect Crime Rates? Coping With Ambiguity Using Bounded-Variation Assumptions (No. w21701). National Bureau of Economic Research.
Olsen, R. B., Orr, L. L., Bell, S. H., & Stuart, E. A. (2013). External validity in policy evaluations that choose sites purposively. Journal of Policy Analysis and Management, 32(1).
O'Muircheartaigh, C., & Hedges, L. V. (2014). Generalizing from unrepresentative experiments: A stratified propensity score approach. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(2).
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1).
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387).
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688.
Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. Journal of Educational and Behavioral Statistics, 2(1).
Rubin, D. B. (1980). Comment. Journal of the American Statistical Association, 75.
Rubin, D. B. (1986). Statistics and causal inference: Comment: Which ifs have causal answers. Journal of the American Statistical Association, 81.
Rubin, D. B. (2011). Causal inference using potential outcomes. Journal of the American Statistical Association.
Shadish, W. R. (2010). Campbell and Rubin: A primer and comparison of their approaches to causal inference in field settings. Psychological Methods, 15(1), 3.
Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(2).
Tipton, E. (2012). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics.
Tipton, E., Hallberg, K., Hedges, L. V., & Chan, W. (2016). Implications of small samples for generalizations: Adjustments and rules of thumb. Forthcoming in Evaluation Review.

Figure 1. Distribution of Propensity Score Logits for the Indiana CRT. Note: S1: Stratum 1; S2: Stratum 2; S3: Stratum 3.


More information

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh)

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh) WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, 2016 Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh) Our team! 2 Avi Feller (Berkeley) Jane Furey (Abt Associates)

More information

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017 Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability Alessandra Mattei, Federico Ricciardi and Fabrizia Mealli Department of Statistics, Computer Science, Applications, University

More information

SREE WORKSHOP ON PRINCIPAL STRATIFICATION MARCH Avi Feller & Lindsay C. Page

SREE WORKSHOP ON PRINCIPAL STRATIFICATION MARCH Avi Feller & Lindsay C. Page SREE WORKSHOP ON PRINCIPAL STRATIFICATION MARCH 2017 Avi Feller & Lindsay C. Page Agenda 2 Conceptual framework (45 minutes) Small group exercise (30 minutes) Break (15 minutes) Estimation & bounds (1.5

More information

arxiv: v1 [stat.me] 8 Jun 2016

arxiv: v1 [stat.me] 8 Jun 2016 Principal Score Methods: Assumptions and Extensions Avi Feller UC Berkeley Fabrizia Mealli Università di Firenze Luke Miratrix Harvard GSE arxiv:1606.02682v1 [stat.me] 8 Jun 2016 June 9, 2016 Abstract

More information

Sharp Bounds on Causal Effects under Sample Selection*

Sharp Bounds on Causal Effects under Sample Selection* OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 77, 1 (2015) 0305 9049 doi: 10.1111/obes.12056 Sharp Bounds on Causal Effects under Sample Selection* Martin Huber and Giovanni Mellace Department of Economics,

More information

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University Growth Mixture Modeling and Causal Inference Booil Jo Stanford University booil@stanford.edu Conference on Advances in Longitudinal Methods inthe Socialand and Behavioral Sciences June 17 18, 2010 Center

More information

Empirical approaches in public economics

Empirical approaches in public economics Empirical approaches in public economics ECON4624 Empirical Public Economics Fall 2016 Gaute Torsvik Outline for today The canonical problem Basic concepts of causal inference Randomized experiments Non-experimental

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

An Introduction to Causal Analysis on Observational Data using Propensity Scores

An Introduction to Causal Analysis on Observational Data using Propensity Scores An Introduction to Causal Analysis on Observational Data using Propensity Scores Margie Rosenberg*, PhD, FSA Brian Hartman**, PhD, ASA Shannon Lane* *University of Wisconsin Madison **University of Connecticut

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Introduction to Statistical Inference Kosuke Imai Princeton University January 31, 2010 Kosuke Imai (Princeton) Introduction to Statistical Inference January 31, 2010 1 / 21 What is Statistics? Statistics

More information

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing 1. Purpose of statistical inference Statistical inference provides a means of generalizing

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014 Assess Assumptions and Sensitivity Analysis Fan Li March 26, 2014 Two Key Assumptions 1. Overlap: 0

More information

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data Jeff Dominitz RAND and Charles F. Manski Department of Economics and Institute for Policy Research, Northwestern

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Discussion of Identifiability and Estimation of Causal Effects in Randomized. Trials with Noncompliance and Completely Non-ignorable Missing Data

Discussion of Identifiability and Estimation of Causal Effects in Randomized. Trials with Noncompliance and Completely Non-ignorable Missing Data Biometrics 000, 000 000 DOI: 000 000 0000 Discussion of Identifiability and Estimation of Causal Effects in Randomized Trials with Noncompliance and Completely Non-ignorable Missing Data Dylan S. Small

More information

CEPA Working Paper No

CEPA Working Paper No CEPA Working Paper No. 15-06 Identification based on Difference-in-Differences Approaches with Multiple Treatments AUTHORS Hans Fricke Stanford University ABSTRACT This paper discusses identification based

More information

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH Awanis Ku Ishak, PhD SBM Sampling The process of selecting a number of individuals for a study in such a way that the individuals represent the larger

More information

Principles Underlying Evaluation Estimators

Principles Underlying Evaluation Estimators The Principles Underlying Evaluation Estimators James J. University of Chicago Econ 350, Winter 2019 The Basic Principles Underlying the Identification of the Main Econometric Evaluation Estimators Two

More information

Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects

Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects Brigham R. Frandsen Lars J. Lefgren December 16, 2015 Abstract This article develops

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

Bounds on Average and Quantile Treatment Effects of Job Corps Training on Wages*

Bounds on Average and Quantile Treatment Effects of Job Corps Training on Wages* Bounds on Average and Quantile Treatment Effects of Job Corps Training on Wages* German Blanco Department of Economics, State University of New York at Binghamton gblanco1@binghamton.edu Carlos A. Flores

More information

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E OBJECTIVE COURSE Understand the concept of population and sampling in the research. Identify the type

More information

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS TOMMASO NANNICINI universidad carlos iii de madrid UK Stata Users Group Meeting London, September 10, 2007 CONTENT Presentation of a Stata

More information

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015 Introduction to causal identification Nidhiya Menon IGC Summer School, New Delhi, July 2015 Outline 1. Micro-empirical methods 2. Rubin causal model 3. More on Instrumental Variables (IV) Estimating causal

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science EXAMINATION: QUANTITATIVE EMPIRICAL METHODS Yale University Department of Political Science January 2014 You have seven hours (and fifteen minutes) to complete the exam. You can use the points assigned

More information

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism

More information

Observational Studies and Propensity Scores

Observational Studies and Propensity Scores Observational Studies and s STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University Makeup Class Rather than making you come

More information

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small Bounds on Causal Effects in Three-Arm Trials with Non-compliance Jing Cheng Dylan Small Department of Biostatistics and Department of Statistics University of Pennsylvania June 20, 2005 A Three-Arm Randomized

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Assessing Studies Based on Multiple Regression

Assessing Studies Based on Multiple Regression Assessing Studies Based on Multiple Regression Outline 1. Internal and External Validity 2. Threats to Internal Validity a. Omitted variable bias b. Functional form misspecification c. Errors-in-variables

More information

The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates

The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates Eun Sook Kim, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta E. Lanehart, Aarti Bellara,

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Lecture 5: Sampling Methods

Lecture 5: Sampling Methods Lecture 5: Sampling Methods What is sampling? Is the process of selecting part of a larger group of participants with the intent of generalizing the results from the smaller group, called the sample, to

More information

Propensity Score Analysis with Hierarchical Data

Propensity Score Analysis with Hierarchical Data Propensity Score Analysis with Hierarchical Data Fan Li Alan Zaslavsky Mary Beth Landrum Department of Health Care Policy Harvard Medical School May 19, 2008 Introduction Population-based observational

More information

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers: Use of Matching Methods for Causal Inference in Experimental and Observational Studies Kosuke Imai Department of Politics Princeton University April 13, 2009 Kosuke Imai (Princeton University) Matching

More information

Modeling Mediation: Causes, Markers, and Mechanisms

Modeling Mediation: Causes, Markers, and Mechanisms Modeling Mediation: Causes, Markers, and Mechanisms Stephen W. Raudenbush University of Chicago Address at the Society for Resesarch on Educational Effectiveness,Washington, DC, March 3, 2011. Many thanks

More information

The Econometrics of Randomized Experiments

The Econometrics of Randomized Experiments The Econometrics of Randomized Experiments Susan Athey Guido W. Imbens Current version June 2016 Keywords: Regression Analyses, Random Assignment, Randomized Experiments, Potential Outcomes, Causality

More information

Bounds on Average and Quantile Treatment E ects of Job Corps Training on Participants Wages

Bounds on Average and Quantile Treatment E ects of Job Corps Training on Participants Wages Bounds on Average and Quantile Treatment E ects of Job Corps Training on Participants Wages German Blanco Food and Resource Economics Department, University of Florida gblancol@u.edu Carlos A. Flores Department

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Potential Outcomes Model (POM)

Potential Outcomes Model (POM) Potential Outcomes Model (POM) Relationship Between Counterfactual States Causality Empirical Strategies in Labor Economics, Angrist Krueger (1999): The most challenging empirical questions in economics

More information

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM)

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SEM is a family of statistical techniques which builds upon multiple regression,

More information

The Econometric Evaluation of Policy Design: Part I: Heterogeneity in Program Impacts, Modeling Self-Selection, and Parameters of Interest

The Econometric Evaluation of Policy Design: Part I: Heterogeneity in Program Impacts, Modeling Self-Selection, and Parameters of Interest The Econometric Evaluation of Policy Design: Part I: Heterogeneity in Program Impacts, Modeling Self-Selection, and Parameters of Interest Edward Vytlacil, Yale University Renmin University, Department

More information

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Guanglei Hong University of Chicago, 5736 S. Woodlawn Ave., Chicago, IL 60637 Abstract Decomposing a total causal

More information

Causal inference in multilevel data structures:

Causal inference in multilevel data structures: Causal inference in multilevel data structures: Discussion of papers by Li and Imai Jennifer Hill May 19 th, 2008 Li paper Strengths Area that needs attention! With regard to propensity score strategies

More information

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance Identi cation of Positive Treatment E ects in Randomized Experiments with Non-Compliance Aleksey Tetenov y February 18, 2012 Abstract I derive sharp nonparametric lower bounds on some parameters of the

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Identification with Latent Choice Sets: The Case of the Head Start Impact Study

Identification with Latent Choice Sets: The Case of the Head Start Impact Study Identification with Latent Choice Sets: The Case of the Head Start Impact Study Vishal Kamat University of Chicago 22 September 2018 Quantity of interest Common goal is to analyze ATE of program participation:

More information

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk Causal Inference in Observational Studies with Non-Binary reatments Statistics Section, Imperial College London Joint work with Shandong Zhao and Kosuke Imai Cass Business School, October 2013 Outline

More information

The problem of causality in microeconometrics.

The problem of causality in microeconometrics. The problem of causality in microeconometrics. Andrea Ichino University of Bologna and Cepr June 11, 2007 Contents 1 The Problem of Causality 1 1.1 A formal framework to think about causality....................................

More information

DETERRENCE AND THE DEATH PENALTY: PARTIAL IDENTIFICATION ANALYSIS USING REPEATED CROSS SECTIONS

DETERRENCE AND THE DEATH PENALTY: PARTIAL IDENTIFICATION ANALYSIS USING REPEATED CROSS SECTIONS DETERRENCE AND THE DEATH PENALTY: PARTIAL IDENTIFICATION ANALYSIS USING REPEATED CROSS SECTIONS Charles F. Manski Department of Economics and Institute for Policy Research Northwestern University and John

More information

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Research Report Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Gautam Puhan February 2 ETS RR--6 Listening. Learning. Leading. Chained Versus Post-Stratification

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death Kosuke Imai First Draft: January 19, 2007 This Draft: August 24, 2007 Abstract Zhang and Rubin 2003) derives

More information

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu

More information

Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information:

Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information: Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information: aanum@ug.edu.gh College of Education School of Continuing and Distance Education 2014/2015 2016/2017 Session Overview In this Session

More information

Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D.

Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D. Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D. David Kaplan Department of Educational Psychology The General Theme

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

Covariate Selection for Generalizing Experimental Results

Covariate Selection for Generalizing Experimental Results Covariate Selection for Generalizing Experimental Results Naoki Egami Erin Hartman July 19, 2018 Abstract Social and biomedical scientists are often interested in generalizing the average treatment effect

More information

Causal Analysis in Social Research

Causal Analysis in Social Research Causal Analysis in Social Research Walter R Davis National Institute of Applied Statistics Research Australia University of Wollongong Frontiers in Social Statistics Metholodogy 8 February 2017 Walter

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Modeling Log Data from an Intelligent Tutor Experiment

Modeling Log Data from an Intelligent Tutor Experiment Modeling Log Data from an Intelligent Tutor Experiment Adam Sales 1 joint work with John Pane & Asa Wilks College of Education University of Texas, Austin RAND Corporation Pittsburgh, PA & Santa Monica,

More information

Quantitative Economics for the Evaluation of the European Policy

Quantitative Economics for the Evaluation of the European Policy Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti Davide Fiaschi Angela Parenti 1 25th of September, 2017 1 ireneb@ec.unipi.it, davide.fiaschi@unipi.it,

More information

Gov 2002: 4. Observational Studies and Confounding

Gov 2002: 4. Observational Studies and Confounding Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What

More information

Causal Inference. Prediction and causation are very different. Typical questions are:

Causal Inference. Prediction and causation are very different. Typical questions are: Causal Inference Prediction and causation are very different. Typical questions are: Prediction: Predict Y after observing X = x Causation: Predict Y after setting X = x. Causation involves predicting

More information

CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea)

CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea) CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea) OUTLINE Inference: Statistical vs. Causal distinctions and mental barriers Formal semantics

More information

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate

More information

Marginal, crude and conditional odds ratios

Marginal, crude and conditional odds ratios Marginal, crude and conditional odds ratios Denitions and estimation Travis Loux Gradute student, UC Davis Department of Statistics March 31, 2010 Parameter Denitions When measuring the eect of a binary

More information

Summary and discussion of The central role of the propensity score in observational studies for causal effects

Summary and discussion of The central role of the propensity score in observational studies for causal effects Summary and discussion of The central role of the propensity score in observational studies for causal effects Statistics Journal Club, 36-825 Jessica Chemali and Michael Vespe 1 Summary 1.1 Background

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Psychometric Issues in Formative Assessment: Measuring Student Learning Throughout the Academic Year Using Interim Assessments
