Regression III Regression Discontinuity Designs

Dave Armstrong
University of Western Ontario
Department of Political Science
Department of Statistics and Actuarial Science (by courtesy)
e: dave.armstrong@uwo.ca

Motivation

Oftentimes we want to use regression analysis to make causal statements. We can only do this if all of our modeling assumptions hold, including independence between $X$ and $\varepsilon$. With observational data, these assumptions are unlikely to hold. Some research designs can leverage near-random assignment to mimic an experimental situation.

Example: State-building in Vietnam

What were the effects of different military strategies on security, development, governance, civil society, etc. in Vietnam? Why can't we just estimate

$$\text{Modernization} = b_0 + b_1\,\text{Bombing} + \mathbf{Z}\gamma + e,$$

where each observation is a hamlet in Vietnam? Bombing was not randomly assigned, so independence between Bombing and $e$ is hard to defend.

US Government Metrics

The US DoD used several metrics to guide military strategy. A battery of 169 questions about security, politics and economics was combined using Bayes' rule to produce a security score, $S \in [0, 5]$. The mainframe would not print the continuous score, so the score was rounded and only the rounded numbers were printed. Identification of causal effects can be obtained by comparing hamlets that are close on the continuous score but were rounded into different categories.
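A small simulated sketch of why rounding creates an RD design: hamlets whose continuous score falls just below or just above a rounding threshold end up in different printed categories even though their underlying scores are nearly identical. Everything here is an assumption for illustration (uniform scores, the threshold 1.5, the band of ±0.1); it does not use the DoD data.

```r
# Hypothetical simulation: nothing here comes from the DoD data.
set.seed(1)
score    <- runif(1000, min = 0, max = 5)   # stand-in for the continuous score S
reported <- round(score)                    # only the rounded score was printed

threshold <- 1.5                            # just below prints as 1, just above as 2
near  <- abs(score - threshold) < 0.1       # hamlets close to the threshold
x     <- score[near] - threshold            # running variable centred at the threshold
above <- reported[near] >= 2                # landed in the higher printed category

# Within this band the printed category is a deterministic function of the
# sign of x, which is the structure of a sharp RD design.
table(printed_2 = above, x_nonneg = x >= 0)
```

Comparing outcomes across the two cells of this table, rather than across all hamlets, is what the design exploits.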

Discontinuity

We know that a discontinuity exists in the assigned score at the rounding threshold.
How can we estimate the effect of bombing, which was assigned largely on the basis of this discontinuity?
How do we know that the effect is real and not a modeling artifact?
What assumptions are needed to motivate this type of analysis?

[Figure]

Reference

This lecture is based primarily on the working manuscript: Matias D. Cattaneo, Nicolás Idrobo & Rocío Titiunik (2017), A Practical Introduction to Regression Discontinuity Designs.

Sharp vs. Fuzzy RDD

[Figure: sharp vs. fuzzy RDD]

Potential Outcomes Framework

Each observation has two potential outcomes: $Y_i(0)$, the outcome observed under control, and $Y_i(1)$, the outcome observed under treatment. Comparing these two outcomes would give us the causal effect of the treatment for that observation. However, we only observe one of them for each observation.

Extrapolation and RDD

With cutoff $\bar{x}$, the observed regression function is

$$E[Y_i \mid X_i] = \begin{cases} E[Y_i(0) \mid X_i] & \text{if } X_i < \bar{x} \\ E[Y_i(1) \mid X_i] & \text{if } X_i \ge \bar{x} \end{cases}$$

In the sharp design there is no overlap: no value of $X_i$ has both treated and control observations. Extrapolation is therefore required to identify the causal effect.

The Main Idea

The discontinuity can provide a measure of the causal impact if both $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ are continuous in $x$ at the cutoff $x = \bar{x}$. In that case

$$\tau_{SRD} = E[Y_i(1) - Y_i(0) \mid X_i = \bar{x}] = \lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x].$$

Fuzzy Design

In the fuzzy design, $\Pr(\text{treated})$ changes at $\bar{x}$, but not from 0 to 1 as in the sharp design. This could happen if everyone above $\bar{x}$ was eligible for the treatment, but only some took part.

$$\tau_{FRD} = \frac{E[(D_i(1) - D_i(0))(Y_i(1) - Y_i(0)) \mid X_i = \bar{x}]}{E[D_i(1) - D_i(0) \mid X_i = \bar{x}]} = \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x]}$$

where $D_i(0)$ is the treatment take-up indicator for units assigned to the control group and $D_i(1)$ is the take-up indicator for units assigned to the treatment group.
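A simulated sketch of the fuzzy estimand above: each one-sided limit is approximated by the intercept of a linear fit on one side of the cutoff, and the ratio of the jump in the outcome to the jump in take-up recovers the effect of taking up the treatment. The data-generating process (take-up rising from 0.2 to 0.8, a true effect of 2), the hypothetical helper side_limit(), and the unweighted fit within h = 0.5 are all assumptions; this is not the rdrobust estimator.

```r
# Simulated data only; the take-up rates and the effect size are assumptions.
set.seed(42)
n <- 5000
x <- runif(n, -1, 1)                          # running variable, cutoff at 0
above <- as.numeric(x >= 0)
d <- rbinom(n, 1, 0.2 + 0.6 * above)          # take-up jumps from 0.2 to 0.8 at the cutoff
y <- 1 + 0.5 * x + 2 * d + rnorm(n, sd = 0.5) # effect of taking up the treatment is 2

# One-sided limit at the cutoff, approximated by the intercept of a linear
# fit using only observations on one side within distance h (hypothetical helper).
side_limit <- function(v, x, right, h = 0.5) {
  keep <- if (right) x >= 0 & x <= h else x < 0 & x >= -h
  unname(coef(lm(v[keep] ~ x[keep]))[1])
}

jump_y  <- side_limit(y, x, TRUE) - side_limit(y, x, FALSE)  # numerator (jump in outcome)
jump_d  <- side_limit(d, x, TRUE) - side_limit(d, x, FALSE)  # denominator (jump in take-up)
tau_frd <- jump_y / jump_d

round(c(jump_y = jump_y, jump_d = jump_d, tau_frd = tau_frd), 2)
```

The numerator on its own is the sharp-design jump at the cutoff; in a sharp design the denominator would be exactly 1.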

Let's get Kinky

[Figure]

Kink Designs

The kink RD design estimates the change in the first derivative of the regression function at the cutoff, rather than the change in the function itself.

$$\tau_{SKRD} = \frac{d}{dx} E[Y_i(1) - Y_i(0) \mid X_i = x]\Big|_{x = \bar{x}} = \lim_{x \downarrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x]$$

$$\tau_{FKRD} = \frac{\frac{d}{dx} E[(D_i(1) - D_i(0))(Y_i(1) - Y_i(0)) \mid X_i = x]\Big|_{x = \bar{x}}}{\frac{d}{dx} E[D_i(1) - D_i(0) \mid X_i = x]\Big|_{x = \bar{x}}} = \frac{\lim_{x \downarrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} \frac{d}{dx} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} \frac{d}{dx} E[D_i \mid X_i = x]}$$

Other Designs

Multi-cutoff designs
Multiple-score/geographic designs

RD Effects are Local

The difference between $E[Y_i(1) \mid X]$ and $E[Y_i(0) \mid X]$ is calculated at a single point ($\bar{x}$) along the support of $X$. The effect will not necessarily generalize as we move away from the threshold without strong (usually unjustified) assumptions about the regression function.
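A simulated sharp-kink sketch: the regression function has a slope change but no jump at 0, and a fully interacted linear fit within a window estimates the difference between the right and left slopes, which is the sharp kink estimand above. The data, the window h = 0.5, and the use of plain least squares are illustrative assumptions.

```r
# Simulated sharp kink: the slope of E[Y | X = x] changes from 0.5 to 2 at 0,
# with no jump in the level. All numbers are assumptions for illustration.
set.seed(5)
n <- 5000
x <- runif(n, -1, 1)
y <- 1 + 0.5 * x + 1.5 * pmax(x, 0) + rnorm(n, sd = 0.5)

h <- 0.5
keep  <- abs(x) <= h
xk    <- x[keep]
above <- as.numeric(xk >= 0)
# Separate intercepts and slopes on each side of the cutoff; the interaction
# coefficient is the right-hand slope minus the left-hand slope at 0.
fit <- lm(y[keep] ~ above * xk)
kink_hat <- unname(coef(fit)["above:xk"])
kink_hat
```

The fully interacted regression gives the same slopes as fitting a separate line on each side, so the interaction term is directly the estimated change in slope.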

Could Be (but isn't) Useful

library(readstata13)
data <- read.dta13("polecon.dta")
y <- data$y
X <- data$x
Z <- data$z
Z_X <- Z * X
plot(y ~ X, xlab = "Islamic Victory", ylab = "Female High School Share")
abline(v = 0)

[Figure: scatterplot of Female High School Share (y axis) against Islamic Victory (x axis), with a vertical line at the cutoff]

Binning Estimator

We can partition the observations into bins and take the average of $y$ within each bin to get a sense of what the discontinuity looks like. For the $j$-th bin below the cutoff, $\mathcal{B}_{-,j}$, and the $j$-th bin above it, $\mathcal{B}_{+,j}$:

$$\bar{Y}_{-,j} = \frac{1}{\#\{X_i \in \mathcal{B}_{-,j}\}} \sum_{i: X_i \in \mathcal{B}_{-,j}} Y_i \quad \text{and} \quad \bar{Y}_{+,j} = \frac{1}{\#\{X_i \in \mathcal{B}_{+,j}\}} \sum_{i: X_i \in \mathcal{B}_{+,j}} Y_i$$

RD Plot

library(rdrobust)
out <- rdplot(y, X, nbins = c(20, 20), binselect = "esmv")

[Figure: RD Plot]

Notes on the Previous Slide

1. The binning and the global parametric fit make it much easier to see what is happening around the discontinuity.
2. Global polynomials are not necessarily great, because they are known to be unstable in the tails, and the tails (the edges of each side's data, next to the cutoff) are, by definition, where we are looking.
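A base-R sketch of the binning estimator with evenly spaced bins on each side of the cutoff, on simulated data (not polecon.dta). The helper bin_means() is hypothetical and only mimics the idea behind rdplot's binned means; it does not reproduce rdplot's bin-selection rules or its polynomial fit.

```r
# Simulated stand-in for (y, X); the cutoff is 0 and the true jump is 3.
set.seed(7)
X <- runif(2000, -50, 50)
y <- 15 + 0.1 * X + 3 * (X >= 0) + rnorm(2000, sd = 5)

# Evenly spaced bins on each side of the cutoff, then the mean of y in each bin
# (the Ybar_{-,j} and Ybar_{+,j} above). The default bin counts are an assumption.
bin_means <- function(y, X, cutoff = 0, nbins = c(20, 20)) {
  left  <- X < cutoff
  brk_l <- seq(min(X[left]), cutoff, length.out = nbins[1] + 1)
  brk_r <- seq(cutoff, max(X[!left]), length.out = nbins[2] + 1)
  bin_l <- cut(X[left],  brk_l, include.lowest = TRUE)
  bin_r <- cut(X[!left], brk_r, include.lowest = TRUE, right = FALSE)
  data.frame(
    side = rep(c("-", "+"), times = nbins),
    xbar = c(tapply(X[left], bin_l, mean), tapply(X[!left], bin_r, mean)),
    ybar = c(tapply(y[left], bin_l, mean), tapply(y[!left], bin_r, mean)),
    row.names = NULL
  )
}

bins <- bin_means(y, X)
plot(ybar ~ xbar, data = bins, xlab = "X", ylab = "Binned mean of y")
abline(v = 0)
```

The bins never straddle the cutoff, so the gap between the last bin mean on the left and the first on the right gives a rough visual read of the discontinuity.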

Binning Estimators

Bins can be:
1. Evenly spaced (ES): equal-width bins, with different numbers of observations in each bin.
2. Quantile spaced (QS): bins with (roughly) equal numbers of observations, with different distances between bin boundaries.

There are a number of methods to choose the number of bins optimally.

Bins Example

out <- rdplot(y, X, binselect = "es")
out <- rdplot(y, X, binselect = "qs")

[Figure: two RD plots, with evenly spaced and quantile-spaced bins; axes: X axis, Y axis]

Optimally Choosing Bins: IMSE

Some methods minimize the integrated mean squared error (IMSE) to make the optimal tradeoff between bias and variance. This is not always best, because it can produce an overly smooth plot. Omitting the nbins argument and specifying binselect = "es" or binselect = "qs" will generate these IMSE-optimal bins for evenly spaced and quantile-spaced bins, respectively.

Optimally Choosing Bins: Mimicking Variability

Bins can be chosen such that the variability in the binned means mimics the variability in the raw data. This is not overly smooth like the IMSE binned estimator, and it generally results in more bins than the IMSE method. Specifying binselect = "esmv" or binselect = "qsmv" will generate the mimicking-variance estimators.
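To see the difference between the two kinds of bins, the sketch below builds evenly spaced and quantile-spaced breaks for the same simulated scores on one side of the cutoff. The skewed exponential scores and the choice of J = 5 bins are assumptions made only to make the contrast visible; the optimal-J rules used by rdplot are not implemented here.

```r
# Hypothetical right-of-cutoff scores that thin out away from the cutoff at 0.
set.seed(3)
x_right <- rexp(500, rate = 0.1)
J <- 5                                          # number of bins: an arbitrary choice

# Evenly spaced: equal widths, so counts differ across bins
es_breaks <- seq(0, max(x_right), length.out = J + 1)
# Quantile spaced: equal counts, so widths differ across bins
qs_breaks <- quantile(x_right, probs = seq(0, 1, length.out = J + 1), names = FALSE)
qs_breaks[1] <- 0                               # start the first bin at the cutoff

es_counts <- table(cut(x_right, es_breaks, include.lowest = TRUE))
qs_counts <- table(cut(x_right, qs_breaks, include.lowest = TRUE))
es_counts
qs_counts
round(diff(qs_breaks), 2)                       # unequal bin widths
```

With sparse data far from the cutoff, ES bins leave the outer bins nearly empty, while QS bins widen there to keep the counts equal.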

Bins Example

out <- rdplot(y, X, binselect = "esmv")
out <- rdplot(y, X, binselect = "qsmv")

[Figure: two RD plots, with mimicking-variance ES and QS bins; axes: X axis, Y axis]

RD Plots

RD plots are good for illustration and investigation, but not for estimating the treatment effect: global polynomials are too variable at the boundary points. Use MV bins (both QS and ES, side by side) to illustrate the design, with a global 4th- or 5th-order polynomial.

Continuity-based Approach

This approach is better for point estimation and inference on the treatment effect. Use polynomial methods local to the cutoff to model $E[Y_i \mid X_i = x]$ on either side, and treat $\tau_{SRD}$ as a parameter to be estimated. The treatment effect can be modeled with either global polynomials (when all observations are used) or local polynomials (when only observations near the cutoff are used).

Fundamentals

The running variable ($X$) is assumed to be continuous, so there are few, if any, observations at $X = \bar{x}$. To estimate $E[Y_i(1) \mid X_i = \bar{x}]$ and $E[Y_i(0) \mid X_i = \bar{x}]$, points near (but not at) the cutoff need to be used. The main point of attention here is how the regression function is specified, which has huge effects on the robustness and credibility of the design and inference. The primary tool for estimating the effect is low-order local polynomial regression.

LPR in RDD

1. Choose the order $p$ of the polynomial.
2. Choose a bandwidth $h$, such that only observations in $[\bar{x} - h, \bar{x} + h]$ are used to fit the LPR.
3. Fit the LPR to the observations above the cutoff, using $X_i - \bar{x}$ as the regressor and weights $w_i = K\left(\frac{X_i - \bar{x}}{h}\right)$. The intercept from this LPR is an estimate $\hat{\mu}_+$ of $\mu_+ = E[Y_i(1) \mid X_i = \bar{x}]$.
4. Do the same with the observations below the cutoff to obtain an estimate $\hat{\mu}_-$ of $\mu_- = E[Y_i(0) \mid X_i = \bar{x}]$.
5. $\hat{\tau}_{SRD} = \hat{\mu}_+ - \hat{\mu}_-$.

Example: First-order LPR

[Figure]

Choices to make in LPR

Kernel: a triangular kernel (with MSE-optimal bandwidth selection) leads to a point estimate with optimal MSE properties. Here, the weight declines linearly moving away from $\bar{x}$. Other common options are the uniform and Epanechnikov kernels, but results tend to be robust to this choice.

Polynomial order: to make the appropriate bias-variance tradeoff, a polynomial order of $p = 1$ or $p = 2$ is usually recommended, together with optimal bandwidth selection to maximize the accuracy of the estimate. Most research relies on local linear regression.

Bandwidth: automatically selected, given the two choices above, to make the appropriate bias-variance tradeoff (more below).

Bias and Bandwidth

[Figure]
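The five LPR steps can be sketched in base R with weighted least squares on simulated data. The triangular kernel K(u) = (1 − |u|) on [−1, 1] matches the kernel discussed above, but the hypothetical helpers lpr_intercept() and tri_kernel(), the data, and the fixed bandwidth h = 0.3 are assumptions; rdrobust instead selects h from the data and adds bias-corrected inference.

```r
# Minimal sketch of steps 1-5 (p = 1, triangular kernel) on simulated data.
# The bandwidth h = 0.3 is an arbitrary assumption, not an MSE-optimal choice.
set.seed(11)
n <- 3000
X <- runif(n, -1, 1)
y <- 2 + X - 0.5 * X^2 + 1.5 * (X >= 0) + rnorm(n, sd = 0.3)   # true jump 1.5

tri_kernel <- function(u) pmax(1 - abs(u), 0)    # K(u) = 1 - |u| on [-1, 1], 0 outside

lpr_intercept <- function(y, X, cutoff, h, above) {
  side <- if (above) X >= cutoff else X < cutoff
  keep <- side & abs(X - cutoff) <= h             # step 2: only |X - cutoff| <= h
  xc <- X[keep] - cutoff                          # centre so the intercept is the value at the cutoff
  w  <- tri_kernel(xc / h)                        # step 3: w_i = K((X_i - cutoff) / h)
  unname(coef(lm(y[keep] ~ xc, weights = w))[1])  # p = 1: local linear fit
}

h <- 0.3
mu_plus  <- lpr_intercept(y, X, cutoff = 0, h = h, above = TRUE)   # step 3
mu_minus <- lpr_intercept(y, X, cutoff = 0, h = h, above = FALSE)  # step 4
tau_srd  <- mu_plus - mu_minus                                      # step 5
tau_srd
```

Swapping tri_kernel() for a uniform kernel (all weights 1 inside the window) is a quick way to see the claim above that results are usually not very sensitive to the kernel.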

Optimal Bandwidth Choice

The bandwidth is generally chosen to minimize the MSE, $\text{Bias}^2 + \text{Variance}$. The bias is found by relating the local polynomial estimator to the curvature of the unknown regression function, and it depends primarily on the $(p + 1)$th derivative of that function. The variance depends on the density of the running variable around the cutoff (higher density means lower variance) and on the conditional variance of the outcome near the cutoff. Different bandwidths can be chosen on either side of the cutoff, since the treatment effect is the difference between two one-sided estimates. A regularization term is often included to prevent strange behavior when the estimated bias is nearly zero (i.e., when a global linear model fits well): the optimal-bandwidth formula divides by the squared bias, so without regularization the selected bandwidth would blow up.

Optimal BW Selection in R

summary(rdbwselect(y, X, kernel = "triangular", p = 1, bwselect = "msetwo"))

Call: rdbwselect

Number of Obs
BW type                  msetwo
Kernel               Triangular
VCE method                   NN

                      Left of c   Right of c
Number of Obs
Order est. (p)                1            1
Order bias (p)                2            2

=======================================================
               BW est. (h)              BW bias (b)
           Left of c  Right of c    Left of c  Right of c
=======================================================
msetwo
=======================================================

Use the argument bwselect = "mserd" for a single bandwidth on both sides of the cutoff.

Using rdrobust to Calculate Treatment Effect

summary(rdrobust(y, X, kernel = "triangular", p = 1, bwselect = "mserd"))

Call: rdrobust

Number of Obs
BW type                   mserd
Kernel               Triangular
VCE method                   NN

                      Left of c   Right of c
Number of Obs
Eff. Number of Obs
Order est. (p)                1            1
Order bias (p)                2            2
BW est. (h)
BW bias (b)
rho (h/b)

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
=============================================================================
  Conventional                                           [0.223 , 5.817]
        Robust                                           [-0.309 , 6.276]
=============================================================================

Using rdrobust to Calculate Treatment Effect (2)

summary(rdrobust(y, X, kernel = "triangular", p = 1, bwselect = "msetwo"))

Call: rdrobust

Number of Obs
BW type                  msetwo
Kernel               Triangular
VCE method                   NN

                      Left of c   Right of c
Number of Obs
Eff. Number of Obs
Order est. (p)                1            1
Order bias (p)                2            2
BW est. (h)
BW bias (b)
rho (h/b)

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
=============================================================================
  Conventional                                           [0.243 , 5.695]
        Robust                                           [-0.245 , 6.152]
=============================================================================
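The bias-variance logic behind the MSE-optimal bandwidth described under Optimal Bandwidth Choice can be written as a stylized calculation for one side of the cutoff with a local polynomial of order $p$. The symbols are introduced for this sketch: $B$ is the leading bias constant (driven by the $(p+1)$th derivative of the regression function at the cutoff), $V$ is the variance constant (increasing in the conditional variance of the outcome, decreasing in the density of $X$ at the cutoff), $n$ is the sample size, and $R > 0$ is a regularization term. The selectors in rdrobust estimate such constants from the data and differ in details; its scaleregul argument scales the regularization term.

```latex
\begin{gather*}
\mathrm{MSE}(h) \approx h^{2(p+1)} B^2 + \frac{V}{n h}, \\
\frac{d\,\mathrm{MSE}(h)}{dh} = 2(p+1)\,h^{2p+1} B^2 - \frac{V}{n h^2} = 0
\;\Longrightarrow\;
h_{\mathrm{MSE}} = \left( \frac{V}{2(p+1)\,B^2} \right)^{1/(2p+3)} n^{-1/(2p+3)}, \\
h_{\mathrm{REG}} = \left( \frac{V}{2(p+1)\,(B^2 + R)} \right)^{1/(2p+3)} n^{-1/(2p+3)}.
\end{gather*}
```

For $p = 1$ the bandwidth shrinks at the rate $n^{-1/5}$, and as $B \to 0$ the unregularized $h_{\mathrm{MSE}}$ grows without bound, which is the behavior the regularization term guards against.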

RD Plot, Optimal Bandwidth

bandwidth <- rdrobust(y, X, kernel = "triangular", p = 1, bwselect = "mserd")$h_l
out <- rdplot(y[abs(X) <= bandwidth], X[abs(X) <= bandwidth], p = 1, kernel = "triangular")

[Figure: RD Plot within the MSE-optimal bandwidth; axes: X axis, Y axis]

Inference

Inference is less straightforward here, for reasons similar to those we've seen before. The bandwidth has been selected to make the optimal bias-variance tradeoff. One implication is that the local model is almost necessarily misspecified: the algorithm minimized a combination of bias and variance rather than bias alone, so the conventional point estimate carries a bias that is not negligible relative to its standard error, and conventional confidence intervals under-cover. Cattaneo et al. propose a robust, bias-corrected confidence interval for hypothesis testing. It is centered on a bias-corrected point estimate, and its variance accounts for the variability introduced by the bias-correction step as well as ordinary sampling variability.

Inference in Practice

out <- rdrobust(y, X, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
cbind(out$coef, out$ci)

               Coeff CI Lower CI Upper
Conventional
Bias-Corrected
Robust

Including Covariates

Covariates can be included in the RD design with the covs argument in rdrobust. The estimate can only really be considered a treatment effect if the covariates are determined and fixed before treatment is assigned. In the best case, covariates reduce sampling variability without increasing bias.

Z <- data[, c("vshr_islam1994", "partycount", "lpop1994", "merkezi", "merkezp", "subbuyuk", "buyuk")]
outcov <- rdrobust(y, X, covs = Z, kernel = "triangular", scaleregul = 1, p = 1, bwselect = "mserd")
cbind(outcov$coef, outcov$ci)

               Coeff CI Lower CI Upper
Conventional
Bias-Corrected
Robust

Randomization Inference Approach

The previous approach relied on the continuity and smoothness of $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ at the cutoff to make inferences. Randomization inference instead views the RD design as a randomized experiment around the cutoff $\bar{x}$: the sharp difference in treatment status at the cutoff resembles a randomized controlled trial. Units whose scores (values of the running variable) fall in a small window around the cutoff can be analyzed as if they came from a randomly assigned experiment. Local randomization inference is particularly useful when the running variable is discrete or takes relatively few values. It can be used as a robustness check for continuity-based designs, but local randomization requires stronger assumptions.

Local Randomization Overview

We assume that, for points in a small window around the cutoff, $W_0 = [\bar{x} - w_0, \bar{x} + w_0]$, assignment to treatment or control can be considered random (i.e., as-if random assignment). Beyond random assignment, the running variable within the window must be unrelated to the outcome.

Similarity of RD and Experiments

[Figure]

Formalization

In the strongest version, we assume that for $X_i \in W_0$, $Y_i(X_i, T_i) = Y_i(T_i)$: the running variable influences $Y$ only through the treatment indicator. In a weaker version, we relax this to $\phi(Y_i(X_i, T_i), X_i, T_i) = \tilde{Y}_i(T_i)$: there exists a transformation $\phi$ of the outcomes for which the first condition holds.

Estimation and Inference

Estimation could rely on large-sample estimators if there are many observations with $X_i \in W_0$, but this is often not the case. Randomization inference has exact, finite-sample properties, which makes it attractive here. In Fisherian inference, potential outcomes are non-stochastic (i.e., fixed; no random sampling is assumed), and we test the sharp null

$$H_0^F: Y_i(0) = Y_i(1) \quad \forall i.$$

Under this null, all potential outcomes are observed, because for each observation the two outcomes are the same.
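A minimal sketch of a Fisherian randomization test on simulated units inside a window: under the sharp null the observed outcomes are fixed, so permuting the treatment labels (which keeps the number of treated units fixed, as in fixed-margins randomization) traces out the null distribution of the test statistic. The data, the difference-in-means statistic, and the hypothetical helper randomization_pvalue() are assumptions; the p-value is a Monte Carlo approximation rather than full enumeration.

```r
# Simulated units inside a window W0 = [-1, 1] around a cutoff at 0;
# none of these numbers come from the lecture's data.
set.seed(2024)
x     <- runif(60, -1, 1)
treat <- as.numeric(x >= 0)
y     <- 10 + 2 * treat + rnorm(60)          # assumed true effect of 2

diff_means <- function(y, treat) mean(y[treat == 1]) - mean(y[treat == 0])

# Under H0: Y_i(0) = Y_i(1) for all i, the observed y would be unchanged by any
# re-assignment, so permuting treat (which keeps the number treated fixed)
# draws from the randomization distribution of the statistic.
randomization_pvalue <- function(y, treat, stat = diff_means, reps = 2000) {
  s_obs  <- stat(y, treat)
  s_null <- replicate(reps, stat(y, sample(treat)))
  mean(abs(s_null) >= abs(s_obs))            # two-sided Monte Carlo p-value
}

randomization_pvalue(y, treat)
```

Because stat is an argument, any other statistic with the same (y, treat) signature can be plugged in without changing the testing logic.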

Hypothetical Example of Fisherian Inference

Imagine we have 5 units in $W_0$ and we randomly assign $n_{W_0,+} = 3$ units to treatment and $n_{W_0,-} = n_{W_0} - n_{W_0,+} = 2$ units to control. Under complete randomization, we treat $n_{W_0,+}$ (and by extension $n_{W_0,-}$) as fixed and find all possible treatment vectors $t$ that preserve the marginal distribution of $T$. In our example, there are $\binom{5}{3} = 10$ possible assignments to treatment and control. Assume that $Y = (5, 2, 2, 5, 5)$ and $T = (1, 0, 0, 1, 1)$; then the observed difference in means is $S^{obs} = \bar{Y}_+ - \bar{Y}_- = 5 - 2 = 3$. If complete enumeration of all possible assignments is not feasible, simulation can be used.

Distribution of Test Statistic under Null

[Figure]

Test Statistics

Fisherian inference is general and works with any test statistic. Some other common choices for RD designs are:
1. The Kolmogorov-Smirnov (KS) statistic, $S^{KS} = \sup_y |\hat{F}_1(y) - \hat{F}_0(y)|$, the largest absolute difference between the two empirical CDFs. It is better than the difference in means when departures from the null show up in other moments or quantiles.
2. The Wilcoxon rank sum statistic, $S^{WR} = \sum_{i: T_i = 1} R^y_i$, where $R^y_i$ is the rank of unit $i$'s outcome. $S^{WR}$ is not affected by the cardinal values of the outcome, only by their ordering.

Randomization Inference for a Regression Coefficient

library(MASS)
set.seed(493)
X <- mvrnorm(100, c(0, 0, 0), matrix(c(1, .25, .25, .25, 1, .25, .25, .25, 1), ncol = 3))
b <- c(.3, -1, 2)
y <- X %*% b + rnorm(100, 0, 1.5)
printCoefmat(summary(mod <- lm(y ~ X))$coef)

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)
X1                                                *
X2                                        e-10  ***
X3                                   < 2.2e-16  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Permute the first predictor (breaking its link to y), refit, store the coefficients
ranb <- NULL
Xp <- X
for (i in 1:2500) {
  Xp[, 1] <- sample(X[, 1], nrow(X), replace = FALSE)
  ranb <- rbind(ranb, coef(lm(y ~ Xp)))
}
# Two-sided p-value for the X1 coefficient (doubles the upper-tail proportion,
# so it assumes the observed coefficient lies in the upper tail)
2 * (1 - mean(coef(mod)[2] > ranb[, 2]))
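The five-unit example can be enumerated exactly in base R. The sketch below (helper names are hypothetical) computes the observed value and the exact p-value of the difference in means, the KS statistic, and the Wilcoxon rank sum over all choose(5, 3) = 10 assignments; the p-values are one-sided, counting assignments whose statistic is at least as large as the observed one.

```r
# The five-unit example: Y = (5, 2, 2, 5, 5), T = (1, 0, 0, 1, 1),
# with the number of treated units fixed at 3.
Y     <- c(5, 2, 2, 5, 5)
T_obs <- c(1, 0, 0, 1, 1)

diff_means <- function(y, tr) mean(y[tr == 1]) - mean(y[tr == 0])
ks_stat    <- function(y, tr) {                    # largest gap between the two ECDFs
  grid <- sort(unique(y))                          # the sup is attained at a data point
  max(abs(ecdf(y[tr == 1])(grid) - ecdf(y[tr == 0])(grid)))
}
rank_sum   <- function(y, tr) sum(rank(y)[tr == 1])  # ties get average ranks

# All choose(5, 3) = 10 assignments with 3 treated units. Under the sharp null
# the outcomes do not change, so each assignment gives one value of the statistic.
exact_null <- function(stat) {
  apply(combn(5, 3), 2, function(idx) stat(Y, replace(numeric(5), idx, 1)))
}

for (s in list(diff_means, ks_stat, rank_sum)) {
  s_obs <- s(Y, T_obs)
  cat("observed:", s_obs, " exact p-value:", mean(exact_null(s) >= s_obs), "\n")
}
```

Each of the three statistics gives an exact one-sided p-value of 0.1 here: only the observed assignment (1 of 10) is as extreme as the data we saw.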

Example

library(rdlocrand)
rdrandinf(y, X, wl = -2.5, wr = 2.5, seed = 50, reps = 2500)

Selected window = [-2.5;2.5]
Running randomization-based test...
Randomization-based test complete.

Number of obs     =          2629
Order of poly     =             0
Kernel type       =       uniform
Reps              =          2500
Window            =   set by user
H0:          tau  =             0
Randomization     = fixed margins

Cutoff c = 0           Left of c   Right of c
Number of obs
Eff. number of obs
Mean of outcome
S.d. of outcome
Window

                        Finite sample    Large sample
Statistic          T        P>|T|            P>|T|     Power vs d = 4.27
Diff. in means

$sumstats
     [,1] [,2]
[1,]
[2,]
[3,]
[4,]

Choosing the Window

Some options:
1. Ad hoc or theoretically defined; both are different flavors of arbitrary.
2. Use pre-treatment covariates to select the window. This assumes that there exists a variable $Z$ that is related to the running variable outside the window, but not inside it (without this assumption, the procedure breaks down). Since $Z$ is predetermined, the effect of the treatment on $Z$ is 0 by construction.

Data-driven Choice of Window

1. State $H_0^F$: $Z$ is unrelated to $T$ (i.e., balanced on $T$).
2. Start with the smallest possible window and test $H_0^F$.
3. Keep widening the window until $H_0^F$ is rejected at a pre-specified significance level.
4. The chosen window is the largest one in which we still fail to reject $H_0^F$.

We need to choose the following: the relevant covariates, the test statistic, the randomization mechanism, the minimum number of observations in the smallest window, and the significance level.

Formalization of the Procedure

1. Start with a symmetric window of length $2w_j$, $W_j = \bar{x} \pm w_j$.
2. Compute the test statistic for each covariate individually, or compute an omnibus test p-value.
3. Find the smallest p-value, $p_{min}$, and evaluate whether $p_{min} > \alpha$. If yes, fail to reject $H_0$ and increase the size of the window by a pre-specified step. If no, use the previous window, $W_{j-1}$.

The step can be defined by a fixed length (wstep in R) or so that a certain number of observations is added at each step (wobs in R).
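A base-R sketch of the window-selection procedure above, on simulated data in which a pre-treatment covariate is balanced only for |x| < 2. Windows grow by a fixed step (the wstep idea), each covariate gets a permutation-based difference-in-means test, and the search stops when the smallest p-value falls to α or below. The helpers perm_pvalue() and select_window(), the data, and α = 0.15 are assumptions; rdwinselect implements the full procedure with more options.

```r
# Hypothetical data: the covariate z is balanced for |x| < 2 and imbalanced beyond.
set.seed(99)
n <- 2000
x <- runif(n, -10, 10)                       # running variable, cutoff at 0
treat <- as.numeric(x >= 0)
z <- ifelse(abs(x) < 2, 0, 0.5 * x) + rnorm(n)

# Randomization p-value for a difference in means, margins held fixed
perm_pvalue <- function(v, tr, reps = 500) {
  dm     <- function(tr) mean(v[tr == 1]) - mean(v[tr == 0])
  s_obs  <- abs(dm(tr))
  s_null <- replicate(reps, abs(dm(sample(tr))))
  mean(s_null >= s_obs)
}

# Grow a symmetric window [-w, w] step by step; stop at the first window in
# which the smallest covariate p-value is at or below alpha, and return the
# last window that passed (NA if even the smallest window fails).
select_window <- function(x, tr, Z, w_seq, alpha = 0.15) {
  chosen <- NA
  for (w in w_seq) {
    inw   <- abs(x) <= w
    p_min <- min(apply(Z[inw, , drop = FALSE], 2, perm_pvalue, tr = tr[inw]))
    if (p_min <= alpha) break
    chosen <- w
  }
  chosen
}

w_hat <- select_window(x, treat, cbind(z), w_seq = seq(0.5, 6, by = 0.5))
w_hat
```

Because every balanced window is tested at level α, a small window can occasionally be rejected by chance, which is one reason the choice of α and of the minimum window size matters.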

Window Selection in R

Z <- data[, c("i89", "vshr_islam1994", "partycount", "lpop1994", "merkezi", "merkezp", "subbuyuk", "buyuk")]
rdwinselect(X, Z, seed = 50, reps = 1000, wobs = 2)

Window selection for RD under local randomization

Number of obs     =      2629
Order of poly     =         0
Kernel type       =   uniform
Reps              =      1000
Testing method    = rdrandinf
Balance test      = diffmeans

Window length / 2   p-value   Var. name   Bin.test   Obs<c   Obs>=c

Recommended window is [-0.944;0.944] with 38 observations (17 below, 21 above).

Example

library(rdlocrand)
rdrandinf(y, X, wl = -.944, wr = .944, seed = 50, reps = 2500)

Selected window = [-0.944;0.944]
Running randomization-based test...
Randomization-based test complete.

Number of obs     =          2629
Order of poly     =             0
Kernel type       =       uniform
Reps              =          2500
Window            =   set by user
H0:          tau  =             0
Randomization     = fixed margins

Cutoff c = 0           Left of c   Right of c
Number of obs
Eff. number of obs
Mean of outcome
S.d. of outcome
Window

                        Finite sample    Large sample
Statistic          T        P>|T|            P>|T|     Power vs d =
Diff. in means

$sumstats
     [,1] [,2]
[1,]
[2,]
[3,]
[4,]

Local Randomization or Continuity Approach?

Local randomization requires stronger assumptions than the continuity-based approach, so one might use it to probe the conditions under which inference makes sense. The continuity-based approach requires reasonable data density around the cutoff; if this is not the case, the local randomization approach might be better. When the running variable is discrete (even with many values, e.g., age in years), the local randomization approach could be better, because there will be mass points with multiple observations at the same score.

Validation

There are threats to validity in RD designs. If units know the cutoff ahead of time, they may try to actively manipulate their score when they are just below it. There are empirical tests for evaluating the validity of the design:
1. continuity of the score density around the cutoff;
2. null treatment effects on pre-treatment covariates and placebo outcomes;
3. continuity of the regression function at arbitrary alternative cutoffs.
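A simple version of check 1 compares the number of observations just below and just above the cutoff: if units cannot sort around the cutoff and the density is roughly flat within a narrow window, each observation in a symmetric window is about equally likely to fall on either side. The sketch applies base R's binom.test to the counts in the recommended window reported by rdwinselect above (17 below, 21 above). The helper bin_density_test() is hypothetical and is a cruder stand-in for the local-polynomial density test in rddensity.

```r
# Counts inside the window [-0.944, 0.944], as reported by rdwinselect above.
n_below <- 17
n_above <- 21

# H0: with no sorting around the cutoff, each unit in a narrow symmetric window
# is equally likely to land on either side (assumes a locally flat density).
bt <- binom.test(n_above, n_below + n_above, p = 0.5)
bt$p.value

# The same check computed from a raw running variable (hypothetical helper)
bin_density_test <- function(x, cutoff = 0, w) {
  inw <- abs(x - cutoff) <= w
  binom.test(sum(x[inw] >= cutoff), sum(inw), p = 0.5)$p.value
}
```

With the lecture's data, bin_density_test(X, 0, 0.944) would repeat the same calculation directly from the running variable.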

Density of the Running Variable

If units cannot manipulate their score, the density of the data should be similar on both sides of the cutoff.

library(rddensity)
summary(rddensity(X))

Null Effects on Pre-treatment Covariates and Placebos

If the effect is causal, the treatment should have no effect on pre-treatment covariates or placebo outcomes. Anything determined before the treatment counts as a pre-treatment covariate. Placebo outcomes are context-specific.

Covariates and Placebos

robs <- lapply(seq_len(ncol(Z)), function(k) rdrobust(Z[, k], X))
names(robs) <- colnames(Z)
t(round(sapply(robs, function(r) cbind(r$coef, r$ci)[3, ]), 3))

                Coeff CI Lower CI Upper
i89
vshr_islam1994
partycount
lpop1994
merkezi
merkezp
subbuyuk
buyuk

With Randomization Inference

robs <- lapply(seq_len(ncol(Z)), function(k) rdrandinf(Z[, k], X, wl = -.944, wr = .944))
names(robs) <- colnames(Z)
t(round(sapply(robs, function(r) c(stat = r$obs.stat, pval = r$p.value)), 4))

                stat pval
i89
vshr_islam1994
partycount
lpop1994
merkezi
merkezp
subbuyuk
buyuk

Regression Function Continuity

One of the assumptions we made earlier was that the regression functions are continuous at the cutoff for both the treatment and control groups. A placebo check estimates "effects" at artificial cutoffs where treatment does not change, using only control observations for cutoffs below 0 and only treated observations for cutoffs above 0; we should not find discontinuities there.

treat <- which(X >= 0)
contr <- which(X < 0)
cutoffs <- seq(-5, 5, by = 1)
cutoffs <- cutoffs[cutoffs != 0]
res <- list()
for (i in seq_along(cutoffs)) {
  if (cutoffs[i] < 0) {
    res[[i]] <- rdrobust(y[contr], X[contr], c = cutoffs[i])
  } else {
    res[[i]] <- rdrobust(y[treat], X[treat], c = cutoffs[i])
  }
}
cbind(cutoff = cutoffs, t(round(sapply(res, function(r) cbind(r$coef, r$ci)[3, ]), 3)))

      cutoff Coeff CI Lower CI Upper
 [1,]     -5
 [2,]     -4
 [3,]     -3
 [4,]     -2
 [5,]     -1
 [6,]      1
 [7,]      2
 [8,]      3
 [9,]      4
[10,]      5

Sensitivity to Observations Close to Cutoff

If there is manipulation, the observations closest to the cutoff are the most likely to have manipulated their scores. Remove them and re-estimate the effect.

rdrobust(y[abs(X) >= 0.25], X[abs(X) >= 0.25])[c("coef", "ci")]

$coef
               Coeff
Conventional
Bias-Corrected
Robust

$ci
               CI Lower CI Upper
Conventional
Bias-Corrected
Robust

Donut-hole Estimation

radii <- seq(0, 1.25, by = .25)
out <- t(sapply(radii, function(i)
  with(rdrobust(y[abs(X) >= i], X[abs(X) >= i]), c(coef = coef[3], ci[3, ]))))
out <- cbind(radius = radii, out)
out

     radius coef CI Lower CI Upper
[1,]   0.00
[2,]   0.25
[3,]   0.50
[4,]   0.75
[5,]   1.00
[6,]   1.25

Conclusion

The RDD approach can be valuable with the right data and question. We have to be careful that the causal effect is not a modeling artifact. Use data-driven tools to choose an appropriate bandwidth, window width, etc. Do sensitivity testing to make sure your results are not sensitive to modeling choices.


More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Regression Discontinuity Design

Regression Discontinuity Design Regression Discontinuity Design Marcelo Coca Perraillon University of Chicago May 13 & 18, 2015 1 / 51 Introduction Plan Overview of RDD Meaning and validity of RDD Several examples from the literature

More information

ted: a Stata Command for Testing Stability of Regression Discontinuity Models

ted: a Stata Command for Testing Stability of Regression Discontinuity Models ted: a Stata Command for Testing Stability of Regression Discontinuity Models Giovanni Cerulli IRCrES, Research Institute on Sustainable Economic Growth National Research Council of Italy 2016 Stata Conference

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters Objectives 10.1 Simple linear regression Statistical model for linear regression Estimating the regression parameters Confidence interval for regression parameters Significance test for the slope Confidence

More information

A Practical Introduction to Regression Discontinuity Designs: Volume I

A Practical Introduction to Regression Discontinuity Designs: Volume I A Practical Introduction to Regression Discontinuity Designs: Volume I Matias D. Cattaneo Nicolás Idrobo Rocío Titiunik April 11, 2018 Monograph prepared for Cambridge Elements: Quantitative and Computational

More information

Regression Discontinuity Design Econometric Issues

Regression Discontinuity Design Econometric Issues Regression Discontinuity Design Econometric Issues Brian P. McCall University of Michigan Texas Schools Project, University of Texas, Dallas November 20, 2009 1 Regression Discontinuity Design Introduction

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Regression Discontinuity Designs.

Regression Discontinuity Designs. Regression Discontinuity Designs. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 31/10/2017 I. Brunetti Labour Economics in an European Perspective 31/10/2017 1 / 36 Introduction

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Section 3 : Permutation Inference

Section 3 : Permutation Inference Section 3 : Permutation Inference Fall 2014 1/39 Introduction Throughout this slides we will focus only on randomized experiments, i.e the treatment is assigned at random We will follow the notation of

More information

Gov 2002: 3. Randomization Inference

Gov 2002: 3. Randomization Inference Gov 2002: 3. Randomization Inference Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last week: This week: What can we identify using randomization? Estimators were justified via

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

AGEC 621 Lecture 16 David Bessler

AGEC 621 Lecture 16 David Bessler AGEC 621 Lecture 16 David Bessler This is a RATS output for the dummy variable problem given in GHJ page 422; the beer expenditure lecture (last time). I do not expect you to know RATS but this will give

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

L6: Regression II. JJ Chen. July 2, 2015

L6: Regression II. JJ Chen. July 2, 2015 L6: Regression II JJ Chen July 2, 2015 Today s Plan Review basic inference based on Sample average Difference in sample average Extrapolate the knowledge to sample regression coefficients Standard error,

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Passing-Bablok Regression for Method Comparison

Passing-Bablok Regression for Method Comparison Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

An Alternative Assumption to Identify LATE in Regression Discontinuity Design

An Alternative Assumption to Identify LATE in Regression Discontinuity Design An Alternative Assumption to Identify LATE in Regression Discontinuity Design Yingying Dong University of California Irvine May 2014 Abstract One key assumption Imbens and Angrist (1994) use to identify

More information

Applied Microeconometrics Chapter 8 Regression Discontinuity (RD)

Applied Microeconometrics Chapter 8 Regression Discontinuity (RD) 1 / 26 Applied Microeconometrics Chapter 8 Regression Discontinuity (RD) Romuald Méango and Michele Battisti LMU, SoSe 2016 Overview What is it about? What are its assumptions? What are the main applications?

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

EMERGING MARKETS - Lecture 2: Methodology refresher

EMERGING MARKETS - Lecture 2: Methodology refresher EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different

More information

Correlation and regression

Correlation and regression NST 1B Experimental Psychology Statistics practical 1 Correlation and regression Rudolf Cardinal & Mike Aitken 11 / 12 November 2003 Department of Experimental Psychology University of Cambridge Handouts:

More information

Why high-order polynomials should not be used in regression discontinuity designs

Why high-order polynomials should not be used in regression discontinuity designs Why high-order polynomials should not be used in regression discontinuity designs Andrew Gelman Guido Imbens 6 Jul 217 Abstract It is common in regression discontinuity analysis to control for third, fourth,

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

The Economics of European Regions: Theory, Empirics, and Policy

The Economics of European Regions: Theory, Empirics, and Policy The Economics of European Regions: Theory, Empirics, and Policy Dipartimento di Economia e Management Davide Fiaschi Angela Parenti 1 1 davide.fiaschi@unipi.it, and aparenti@ec.unipi.it. Fiaschi-Parenti

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

What s New in Econometrics. Lecture 1

What s New in Econometrics. Lecture 1 What s New in Econometrics Lecture 1 Estimation of Average Treatment Effects Under Unconfoundedness Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Potential Outcomes 3. Estimands and

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Rank-Based Methods. Lukas Meier

Rank-Based Methods. Lukas Meier Rank-Based Methods Lukas Meier 20.01.2014 Introduction Up to now we basically always used a parametric family, like the normal distribution N (µ, σ 2 ) for modeling random data. Based on observed data

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Two Sample Problems. Two sample problems

Two Sample Problems. Two sample problems Two Sample Problems Two sample problems The goal of inference is to compare the responses in two groups. Each group is a sample from a different population. The responses in each group are independent

More information

My data doesn t look like that..

My data doesn t look like that.. Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing

More information

1 Independent Practice: Hypothesis tests for one parameter:

1 Independent Practice: Hypothesis tests for one parameter: 1 Independent Practice: Hypothesis tests for one parameter: Data from the Indian DHS survey from 2006 includes a measure of autonomy of the women surveyed (a scale from 0-10, 10 being the most autonomous)

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Ch. 16: Correlation and Regression

Ch. 16: Correlation and Regression Ch. 1: Correlation and Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely to

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers: Use of Matching Methods for Causal Inference in Experimental and Observational Studies Kosuke Imai Department of Politics Princeton University April 27, 2007 Kosuke Imai (Princeton University) Matching

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS 776 1 14 G-Estimation ( G-Estimation of Structural Nested Models 14) Outline 14.1 The causal question revisited 14.2 Exchangeability revisited

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression Outline 1. Hypothesis tests and confidence intervals for a single coefficie. Joint hypothesis tests on multiple coefficients 3.

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01 An Analysis of College Algebra Exam s December, 000 James D Jones Math - Section 0 An Analysis of College Algebra Exam s Introduction Students often complain about a test being too difficult. Are there

More information

(1) Sort all time observations from least to greatest, so that the j th and (j + 1) st observations are ordered by t j t j+1 for all j = 1,..., J.

(1) Sort all time observations from least to greatest, so that the j th and (j + 1) st observations are ordered by t j t j+1 for all j = 1,..., J. AFFIRMATIVE ACTION AND HUMAN CAPITAL INVESTMENT 8. ONLINE APPENDIX TO ACCOMPANY Affirmative Action and Human Capital Investment: Theory and Evidence from a Randomized Field Experiment, by CHRISTOPHER COTTON,

More information

Section 3: Permutation Inference

Section 3: Permutation Inference Section 3: Permutation Inference Yotam Shem-Tov Fall 2015 Yotam Shem-Tov STAT 239/ PS 236A September 26, 2015 1 / 47 Introduction Throughout this slides we will focus only on randomized experiments, i.e

More information

1 Impact Evaluation: Randomized Controlled Trial (RCT)

1 Impact Evaluation: Randomized Controlled Trial (RCT) Introductory Applied Econometrics EEP/IAS 118 Fall 2013 Daley Kutzman Section #12 11-20-13 Warm-Up Consider the two panel data regressions below, where i indexes individuals and t indexes time in months:

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Regression Discontinuity Designs Using Covariates

Regression Discontinuity Designs Using Covariates Regression Discontinuity Designs Using Covariates Sebastian Calonico Matias D. Cattaneo Max H. Farrell Rocío Titiunik May 25, 2018 We thank the co-editor, Bryan Graham, and three reviewers for comments.

More information

Holiday Assignment PS 531

Holiday Assignment PS 531 Holiday Assignment PS 531 Prof: Jake Bowers TA: Paul Testa January 27, 2014 Overview Below is a brief assignment for you to complete over the break. It should serve as refresher, covering some of the basic

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Harvard University mblackwell@gov.harvard.edu Where are we? Where are we going? Last week: we learned about how to calculate a simple

More information

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University QED Queen s Economics Department Working Paper No. 1319 Hypothesis Testing for Arbitrary Bounds Jeffrey Penney Queen s University Department of Economics Queen s University 94 University Avenue Kingston,

More information

Introduction to hypothesis testing

Introduction to hypothesis testing Introduction to hypothesis testing Review: Logic of Hypothesis Tests Usually, we test (attempt to falsify) a null hypothesis (H 0 ): includes all possibilities except prediction in hypothesis (H A ) If

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models Causal Inference Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 Outline Causal Inference [G&H Ch 9] The Fundamental Problem Confounders, and how Controlled

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Introduction to Econometrics. Multiple Regression (2016/2017)

Introduction to Econometrics. Multiple Regression (2016/2017) Introduction to Econometrics STAT-S-301 Multiple Regression (016/017) Lecturer: Yves Dominicy Teaching Assistant: Elise Petit 1 OLS estimate of the TS/STR relation: OLS estimate of the Test Score/STR relation:

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information