Regression III Regression Discontinuity Designs

Dave Armstrong
University of Western Ontario
Department of Political Science
Department of Statistics and Actuarial Science (by courtesy)
e: dave.armstrong@uwo.ca

Motivation

Often, we want to use regression analysis to make causal statements. We can only do this if all of our modeling assumptions hold, including independence between $X$ and $\varepsilon$.

With observational data, these assumptions are unlikely to hold. Some research designs can leverage near-random assignment to mimic an experimental situation.

Example: State-building in Vietnam

What were the effects of different military strategies on security, development, governance, civil society, etc., in Vietnam? Why can't we just estimate

$$\text{Modernization} = b_0 + b_1\,\text{Bombing} + Z\gamma$$

where each observation is a hamlet in Vietnam?

Citation: Dell, Melissa and Pablo Querubin (2018) "Nation Building Through Foreign Intervention: Evidence from Discontinuities in Military Strategies." Quarterly Journal of Economics, 133(2).

US Government Metrics

The US DoD used several metrics to guide military strategy.
- A battery of 169 questions about security, politics and economics was combined using Bayes' rule to produce a security score, $S \in [0, 5]$.
- The mainframe wouldn't print out the continuous score, so the score was rounded and the rounded numbers were printed.
- Identification of causal effects can be obtained by comparing hamlets that are close on the continuous score but get rounded into different categories (that is, hamlets just below and just above a rounding threshold).

Discontinuity

We know that, in the assignment of a score, a discontinuity exists at the rounding threshold.
- How can we estimate the effect of bombings, which are assigned largely based on the discontinuity?
- How do we know that the effect is real and not some modeling artifact?
- What assumptions are needed to motivate this type of analysis?

Reference

This lecture is based primarily on the working manuscript: Matias D. Cattaneo, Nicolás Idrobo & Rocío Titiunik (2018) A Practical Introduction to Regression Discontinuity Designs. Cambridge University Press.

Running Example (Meyersson, 2014):
- Units of observation: municipalities in Turkey's 1994 mayoral election
- Outcome: educational attainment of women
- Score (running) variable: margin of victory of the (largest) Islamic party
- Treatment: Islamic party electoral victory (win if margin of victory > 0)

Preliminary Notation

Formalizing the design:
- There are $n$ units indexed by $i = 1, 2, \ldots, n$.
- Each unit has a value on the score or running variable, $X_i$.
- $c$ (also written $\bar{x}$) is a known cutoff such that

$$T_i = \begin{cases} 1 & \text{if } X_i > c \\ 0 & \text{otherwise.} \end{cases}$$

Sharp vs. Fuzzy RDD

The probability of treatment assignment changes discontinuously at the cutoff. In the sharp design it jumps from 0 to 1; in the fuzzy design it jumps by less than 1.

Potential Outcomes Framework

Each observation has two potential outcomes:
- $Y_i(0)$: the outcome observed under control, and
- $Y_i(1)$: the outcome observed under treatment.

Comparing these two outcomes should give us a sense of the causal effect of a variable. However, we only observe one or the other for each observation.

Extrapolation and RDD

$$E[Y_i \mid X_i] = \begin{cases} E[Y_i(0) \mid X_i] & \text{if } X_i < \bar{x} \\ E[Y_i(1) \mid X_i] & \text{if } X_i \ge \bar{x} \end{cases}$$

- In the sharp design, there is no joint support over $Y_i(0)$ and $Y_i(1)$: at no value of the score do we observe both.
- Extrapolation is required to identify the causal effect.

The Main Idea

The fundamental idea is that the discontinuity can provide a measure of the causal impact if $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ are both continuous in $x$ at the cutoff $X_i = \bar{x}$. In that case,

$$\tau_{SRD} = E[Y_i(1) - Y_i(0) \mid X_i = \bar{x}] = \lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x].$$

Fuzzy Design

In the fuzzy design, Pr(Treated) changes at $\bar{x}$, but not from 0 to 1 as in the sharp design. This could happen if everyone above $\bar{x}$ was eligible for the treatment, but only some took part.

$$\tau_{FRD} = \frac{E[(D_i(1) - D_i(0))(Y_i(1) - Y_i(0)) \mid X_i = \bar{x}]}{E[D_i(1) - D_i(0) \mid X_i = \bar{x}]} = \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x]}$$

where $D_i(0)$ is the treatment take-up indicator for those assigned to the control group, and $D_i(1)$ is the treatment take-up indicator for those assigned to the treatment group.
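To see why continuity is enough in the sharp design, it can help to write out the step from observed to potential outcomes. The following is a short sketch of that argument, writing $\tau_{SRD}$ for the sharp RD effect and $\bar{x}$ for the cutoff, and assuming every unit at or above the cutoff is treated and every unit below it is not.

```latex
\begin{align*}
\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x]
  &= \lim_{x \downarrow \bar{x}} E[Y_i(1) \mid X_i = x]
   = E[Y_i(1) \mid X_i = \bar{x}] \\
\lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]
  &= \lim_{x \uparrow \bar{x}} E[Y_i(0) \mid X_i = x]
   = E[Y_i(0) \mid X_i = \bar{x}] \\
\tau_{SRD}
  &= E[Y_i(1) - Y_i(0) \mid X_i = \bar{x}]
   = \lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x]
   - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]
\end{align*}
```

In each of the first two lines, the first equality uses the sharp assignment rule (only treated units are observed above the cutoff, only control units below) and the second uses continuity of the potential-outcome regression function at $\bar{x}$.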

Let's get Kinky: Kink Designs

The Kink RD tries to estimate first derivatives of the regression function rather than the function itself.

$$\tau_{SKRD} = \frac{d}{dx} E[Y_i(1) - Y_i(0) \mid X_i = x]\Big|_{x = \bar{x}} = \lim_{x \downarrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x]$$

$$\tau_{FKRD} = \frac{\frac{d}{dx} E[(D_i(1) - D_i(0))(Y_i(1) - Y_i(0)) \mid X_i = x]\big|_{x = \bar{x}}}{\frac{d}{dx} E[D_i(1) - D_i(0) \mid X_i = x]\big|_{x = \bar{x}}} = \frac{\lim_{x \downarrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} \frac{d}{dx} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} \frac{d}{dx} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} \frac{d}{dx} E[D_i \mid X_i = x]}$$

Other Designs

- Multi-cutoff designs
- Multiple score/geographic designs

RD Effects are Local

The difference between $E[Y_i(1) \mid X]$ and $E[Y_i(0) \mid X]$ is calculated at a single point ($\bar{x}$) along the support of $X$. The effect will not necessarily generalize as we move away from the threshold without strong (usually unjustified) assumptions about the regression function.

Could Be (but isn't) Useful

    library(haven)
    data <- read_dta("...")   # file path not preserved in the source
    Y <- data$Y               # outcome
    X <- data$X               # score (running variable)
    Z <- data$Z               # treatment indicator
    Z_X <- Z*X
    plot(Y ~ X, xlab = "Islamic Victory", ylab = "Female High School Share")
    abline(v = 0)

[Figure: scatterplot of Female High School Share against Islamic Victory, with a vertical line at 0.]

Binning Estimator

We can partition the observations into bins and then take the average of $Y$ within bins to get a sense of how the discontinuity looks.

$$\bar{Y}_{-,j} = \frac{1}{\#\{X_i \in B_{-,j}\}} \sum_{i:\, X_i \in B_{-,j}} Y_i \quad \text{and} \quad \bar{Y}_{+,j} = \frac{1}{\#\{X_i \in B_{+,j}\}} \sum_{i:\, X_i \in B_{+,j}} Y_i$$

RD Plot

    library(rdrobust)
    out <- rdplot(Y, X, nbins = c(20, 20), binselect = "esmv")

[Figure: RD plot.]

Notes on the Previous Slide

1. The binning and global parametric model certainly make it easier to see what is happening with respect to the discontinuity.
2. Global polynomials are not necessarily great because they are known to be unstable in the tails, and the tail is, by definition, the place we're looking.
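As a rough illustration of the binning estimator, the following base R sketch computes evenly spaced binned means by hand. Everything in it is an assumption made for the example: the simulated data, the jump of 2 at the cutoff, the number of bins J and the variable names. rdplot selects bins in a data-driven way, which this sketch does not attempt to reproduce.

```r
# Simulated sharp RD data (hypothetical): cutoff at 0, true jump of 2
set.seed(1)
n <- 1000
X <- runif(n, -1, 1)
Y <- 1 + 0.5 * X + 2 * (X >= 0) + rnorm(n, sd = 0.5)

J <- 10                      # number of evenly spaced bins on each side (assumed)
below <- X < 0

# Assign each observation to a bin B_{-,j} (below) or B_{+,j} (above)
bins_below <- cut(X[below],  breaks = seq(-1, 0, length.out = J + 1),
                  right = FALSE, include.lowest = TRUE)
bins_above <- cut(X[!below], breaks = seq(0, 1, length.out = J + 1),
                  right = FALSE, include.lowest = TRUE)

# Within-bin averages of Y: the binned means on each side of the cutoff
ybar_below <- tapply(Y[below],  bins_below, mean)
ybar_above <- tapply(Y[!below], bins_above, mean)

print(round(ybar_below, 2))
print(round(ybar_above, 2))
```

Plotting the two vectors of binned means against the bin midpoints gives the dots of an RD plot; the gap between the last bin below and the first bin above the cutoff is a first visual impression of the discontinuity, not an estimate of the effect.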

Binning Estimators

Bins can be:
- Evenly spaced (with different numbers of observations in each bin)
- Quantile spaced (with different distances between bin boundaries)

There are a number of methods to optimally pick the number of bins.

Bins Example

    out = rdplot(Y, X, binselect = 'es')
    out = rdplot(Y, X, binselect = 'qs')

[Figure: RD plots for binselect = 'es' and binselect = 'qs'.]

Optimally Choosing Bins: IMSE

- Some methods optimize Integrated Mean Squared Error (IMSE), so as to make the optimal tradeoff between bias and variance.
- This is not always best because it can produce an overly smooth plot.
- Omitting the nbins argument and specifying binselect = 'es' or binselect = 'qs' will generate these optimal bins for evenly spaced and quantile-spaced bins, respectively.

Optimally Choosing Bins: Mimicking Variability

- Bins can be chosen such that the variability in the binned means mimics the variability in the raw data.
- The result is not overly smooth like the IMSE binned estimator.
- This generally results in more bins than the IMSE method.
- ES bins can sometimes result in high variance in sparse regions of the data because the bins have to be small to accommodate the higher density around the cutoff.
- binselect = 'esmv' and binselect = 'qsmv' will generate the mimicking-variance bins.

Bins Example

    out = rdplot(Y, X, binselect = 'esmv')
    out = rdplot(Y, X, binselect = 'qsmv')

[Figure: RD plots for binselect = 'esmv' and binselect = 'qsmv'.]

RD Plots

- Good for illustration and investigation, but not for estimating the treatment effect: global polynomials are too variable at the boundary points.
- Use MV bins (both QS and ES, side by side) to illustrate the design, with a global 4th- or 5th-order polynomial.

Continuity-based Approach

- Better for point estimates and inference on the treatment effect.
- Use polynomial methods local to the cutoff to model $E[Y_i \mid X_i = x]$ from either side and treat $\tau_{SRD}$ as a parameter to be estimated.
- Either global polynomials (using all observations) or local polynomials (using only observations near the cutoff) can be used to model the treatment effect.

Fundamentals

- The running variable ($X$) is assumed to be continuous, so there are few, if any, observations at $X = \bar{x}$.
- To estimate $E[Y_i(1) \mid X_i = \bar{x}]$ and $E[Y_i(0) \mid X_i = \bar{x}]$, points near (but not at) the cutoff need to be used.
- The main point of interest here is how the regression function is specified. This has huge effects on the robustness and credibility of the design and inference.
- The primary tool for estimating the effect is a low-order local polynomial regression.

LPR in RDD

1. Choose the order of the polynomial, $p$.
2. Choose the bandwidth $h$, such that only observations in $[\bar{x} - h, \bar{x} + h]$ are used to fit the LPR.
3. Estimate the following two LPR models, the first using observations above the cutoff and the second using observations below it:

$$\hat{Y}_i = \hat{\mu}_{+} + \sum_{k=1}^{p} \hat{\mu}_{+,k} (X_i - \bar{x})^k \quad (1)$$

$$\hat{Y}_i = \hat{\mu}_{-} + \sum_{k=1}^{p} \hat{\mu}_{-,k} (X_i - \bar{x})^k \quad (2)$$

   using weights $w_i = K\!\left(\frac{X_i - \bar{x}}{h}\right)$.
4. $\hat{\tau}_{SRD} = \hat{\mu}_{+} - \hat{\mu}_{-}$, which estimates $E[Y_i(1) \mid X_i = \bar{x}] - E[Y_i(0) \mid X_i = \bar{x}]$.

Example: First-order LPR

[Figure.]

Choices to make in LPR

- Kernel: the triangular kernel (with MSE-optimal bandwidth selection) leads to a point estimate with optimal MSE properties. Here, the weight declines linearly moving away from $\bar{x}$. Other common options are the uniform and Epanechnikov kernels, but results tend to be robust with respect to this choice.
- Polynomial order: to make the appropriate bias-variance tradeoff, a polynomial order of $p = 1$ or $p = 2$ is usually recommended, with optimal bandwidth selection to maximize the accuracy of the estimate. Most research relies on local linear regression.
- Bandwidth: automatically selected given the two choices above (more below) to make the appropriate bias-variance tradeoff.

Bias and Bandwidth

[Figure.]
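The four LPR steps can be sketched in base R with weighted least squares. This is a minimal illustration under stated assumptions, not what rdrobust does internally: the data are simulated with a true jump of 2, the bandwidth h = 0.3 is picked by hand rather than MSE-optimally, and the helper names tri_kernel and fit_side are made up for the example.

```r
# Simulated sharp RD data (hypothetical): cutoff at 0, true jump of 2
set.seed(42)
n <- 2000
X <- runif(n, -1, 1)
Y <- 1 + 0.8 * X + 2 * (X >= 0) + rnorm(n, sd = 0.3)

cutoff <- 0
p <- 1       # step 1: polynomial order (local linear)
h <- 0.3     # step 2: bandwidth, chosen by hand here (assumption, not MSE-optimal)

# Triangular kernel: weight declines linearly to 0 at distance h from the cutoff
tri_kernel <- function(u) pmax(0, 1 - abs(u))

# Step 3: weighted polynomial fit on one side of the cutoff
fit_side <- function(side) {
  keep <- side & abs(X - cutoff) < h
  d <- data.frame(y = Y[keep], xc = X[keep] - cutoff)
  d$w <- tri_kernel(d$xc / h)
  lm(y ~ poly(xc, degree = p, raw = TRUE), data = d, weights = w)
}

# The intercepts are the fitted values at the cutoff from each side
mu_plus  <- coef(fit_side(X >= cutoff))[1]
mu_minus <- coef(fit_side(X <  cutoff))[1]

# Step 4: the sharp RD point estimate is the difference in intercepts
tau_hat <- unname(mu_plus - mu_minus)
print(tau_hat)
```

Because the score is centered at the cutoff, each intercept is the estimated regression function at the boundary, so their difference is the estimated jump. The standard errors reported by lm here ignore the bandwidth-selection and bias issues discussed for inference, so they should not be used for confidence intervals.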

Optimal Bandwidth Choice

The bandwidth is generally chosen to minimize MSE = Bias$^2$ + Variance.
- The bias is found by relating the local linear estimator to the curvature of the unknown regression function and depends primarily on the $(p+1)$th derivative of the function.
- The variance term is a function of the density of the running variable around the cutoff (higher density means lower variance) and the conditional variability of the outcome.
- Different bandwidths can be chosen on either side of the cutoff since the treatment effect is the difference between two one-sided estimates.
- A regularization term is often included to prevent strange behavior when the bias is nearly zero (i.e., when a global linear model fits well).

Optimal BW Selection in R

    summary(rdbwselect(Y, X, kernel = 'triangular', p = 1, bwselect = 'msetwo'))

    Call: rdbwselect
    BW type           msetwo
    Kernel            Triangular
    VCE method        NN
    Order est. (p)    1    1
    Order bias (q)    2    2
    [Numbers of observations and the estimated bandwidths (BW est. (h) and BW bias (b), left and right of c) were not preserved in the transcription.]

Use the argument bwselect = 'mserd' for a single bandwidth across both regions.

Using rdrobust to Calculate Treatment Effect

    summary(rdrobust(Y, X, kernel = "triangular", p = 1, bwselect = "mserd"))

    Call: rdrobust
    BW type           mserd
    Kernel            Triangular
    VCE method        NN
    Order est. (p)    1    1
    Order bias        2    2

    Method          [ 95% C.I. ]
    Conventional    [0.223, 5.817]
    Robust          [-0.309, 6.276]
    [Coefficients, standard errors, z statistics, p-values, numbers of observations and bandwidths were not preserved in the transcription.]

Using rdrobust to Calculate Treatment Effect (2)

    summary(rdrobust(Y, X, kernel = "triangular", p = 1, bwselect = "msetwo"))

    Call: rdrobust
    BW type           msetwo
    Kernel            Triangular
    VCE method        NN
    Order est. (p)    1    1
    Order bias        2    2

    Method          [ 95% C.I. ]
    Conventional    [0.243, 5.695]
    Robust          [-0.245, 6.152]
    [Coefficients, standard errors, z statistics, p-values, numbers of observations and bandwidths were not preserved in the transcription.]
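The bias-variance tradeoff behind the MSE-optimal bandwidth can be made concrete with a stylized approximation. The sketch below assumes a local polynomial of order $p$, bandwidth $h$ and sample size $n$, and writes $B$ and $V$ for generic bias and variance constants; these symbols are placeholders for the example and are not the exact expressions used by rdbwselect.

```latex
\begin{align*}
\mathrm{MSE}(h) &\approx \underbrace{h^{2(p+1)} B^2}_{\text{Bias}^2}
                 + \underbrace{\frac{V}{n h}}_{\text{Variance}} \\
\frac{d\,\mathrm{MSE}(h)}{dh} = 0
  &\;\Longrightarrow\; 2(p+1)\, h^{2p+1} B^2 = \frac{V}{n h^2} \\
h_{\mathrm{MSE}} &= \left( \frac{V}{2(p+1)\, B^2\, n} \right)^{1/(2p+3)}
\end{align*}
```

Under this approximation, more curvature (larger $B$) calls for a smaller bandwidth, more noise (larger $V$) for a larger one, and the bandwidth shrinks at rate $n^{-1/(2p+3)}$, which is $n^{-1/5}$ for local linear regression. It also shows why a regularization term is useful: as $B$ approaches zero, the formula sends $h$ to infinity.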

RD Plot, Optimal Bandwidth

    bandwidth <- rdrobust(Y, X, kernel = 'triangular', p = 1, bwselect = 'mserd')$bws[1,1]
    out <- rdplot(Y[abs(X) <= bandwidth], X[abs(X) <= bandwidth], p = 1, kernel = 'triangular')

[Figure: RD plot restricted to observations within the optimal bandwidth.]

Inference

Inference is less straightforward here, for reasons similar to those we've seen before.
- The bandwidth has been selected to make the optimal bias-variance tradeoff.
- An implication of this is that the model is almost necessarily mis-specified, because the algorithm didn't minimize bias but a combination of bias and variance.
- Cattaneo et al. propose a robust, bias-corrected confidence interval for hypothesis testing:
  - It is centered around a bias-corrected parameter estimate.
  - Its variance takes into account the variability in the bias-correction phase as well as sampling variability.

Inference in Practice

    out <- rdrobust(Y, X, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
    cbind(out$coef, out$ci)

    [Output: columns Coeff, CI Lower, CI Upper; rows Conventional, Bias-Corrected, Robust. Values were not preserved in the transcription.]

Another Alternative

Another alternative for inference is to use a different bandwidth for inference than for point estimation. An interesting choice here is $h_{CER}$, a bandwidth defined to minimize the coverage error of confidence intervals.

    out <- rdrobust(Y, X, kernel = "triangular", p = 1, bwselect = "cerrd", all = TRUE)
    cbind(out$coef, out$ci)

    [Output: columns Coeff, CI Lower, CI Upper; rows Conventional, Bias-Corrected, Robust. Values were not preserved in the transcription.]

Covariates in the Meyersson Data

    library(DAMisc)   # searchVarLabels() is from the DAMisc package
    searchVarLabels(data, "")

                    ind  label
    X                 1  Islamic Vote Margin in 1994
    Y                 2  Share Women aged 15-20 with High School Education
    Z                 3  Islamic mayor in 1994
    ageshr19          4  Population share below 19 in 2000
    ageshr60          5  Population share above 60 in 2000
    buyuk             6  Metro center
    hischshr1520m     7  Share Men aged 15-20 with High School Education
    i89               8  Islamic Mayor in 1989
    lpop1994          9  Log Population in 1994
    merkezi          10  District center
    merkezp          11  Province center
    partycount       12  Number of parties receiving votes 1994
    sexr             13  Gender ratio in 2000
    shhs             14  Household size in 2000
    subbuyuk         15  Sub-metro center
    vshr_islam1994   16  Islamic vote share 1994

Including Covariates

- Covariates can be included in the RD design with the covs argument in rdrobust.
- The estimate is only really considered a treatment effect if the covariates are determined and fixed before the assignment of the treatment.
- In the best-case scenario, covariates reduce sampling variability without increasing bias.

    Z = data[, c("vshr_islam1994", "partycount", "lpop1994", "merkezi", "merkezp", "subbuyuk", "buyuk")]
    outcov <- rdrobust(Y, X, covs = Z, kernel = 'triangular', scaleregul = 1, p = 1, bwselect = 'mserd')
    cbind(outcov$coef, outcov$ci)

    [Output: columns Coeff, CI Lower, CI Upper; rows Conventional, Bias-Corrected, Robust. Values were not preserved in the transcription.]

Randomization Inference Approach

- The previous approach leveraged the assumption of continuity and smoothness of $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ at the cutoff to make inferences.
- Randomization inference views the RD design as a randomized experiment around the cutoff $\bar{x}$.
  - The sharp differences in treatment status at the cutoff resemble a randomized controlled trial at the cutoff.
  - Units whose score values (values on the running variable) are in a small window around the cutoff can be analyzed as if they came from a randomly assigned experiment.
- Local randomization inference is particularly useful when the running variable is discrete or has relatively few points.
- It can be used as a robustness check for continuity-based designs, but local randomization requires stronger assumptions.

Local Randomization Overview

We assume that:
- For points in a small window around the cutoff, $W_0 = [\bar{x} - w_0, \bar{x} + w_0]$, assignment to treatment or control can be considered to be random (also known as as-if random assignment).
- Not only is the assignment random, but the running variable in the window must be unrelated to the outcome.

Similarity of RD and Experiments:

[Figure.]

Formalization

In the strongest version, we assume:
- For $X_i \in W_0$, $Y_i(X_i, T_i) = Y_i(T_i)$: the running variable only influences $Y$ through the treatment indicator.

In a weaker version, we could relax the above to:
- $\phi(Y_i(X_i, T_i), X_i, T_i) = \tilde{Y}_i(T_i)$: there exists a transformation for which the first condition mentioned above is true.

Estimation and Inference

- Estimation could take the form of large-sample statistical estimators if there are lots of $X_i \in W_0$, but this is often not the case.
- Randomization inference has exact, finite-sample properties, which makes it quite attractive for this case.

Fisherian inference:
- Potential outcomes are non-stochastic (i.e., fixed, no random sampling assumed).
- $H_0^F: Y_i(0) = Y_i(1) \;\; \forall i$
- Under the null, all potential outcomes are observed, because for each observation the two outcomes are the same.

Hypothetical Example of Fisherian Inference

- Imagine we have 5 units in $W_0$ and we randomly assign $n_{W_0,+} = 3$ units to the treatment and $n_{W_0,-} = n_{W_0} - n_{W_0,+} = 2$ units to the control.
- Under full randomization, we could assume that $n_{W_0,+}$, and by extension $n_{W_0,-}$, are fixed and find all possible treatment assignment vectors $t$ that preserve the marginal distribution of $T$.
- In our example, there are $\binom{5}{3} = 10$ possible assignments to treatment and control.
- Assume that $Y = (5, 2, 2, 5, 5)$ and that $T = (1, 0, 0, 1, 1)$; then the observed difference in means is $S_{obs} = \bar{Y}_{+} - \bar{Y}_{-} = 5 - 2 = 3$.

Distribution of Test Statistic under Null

If complete enumeration of all possible assignments is not feasible, simulation can be used.
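The hypothetical five-unit example is small enough to enumerate completely. The base R sketch below uses the outcomes and assignment from the example, Y = (5, 2, 2, 5, 5) and T = (1, 0, 0, 1, 1); the object names and the choice of a two-sided p-value based on the absolute difference in means are assumptions made for illustration.

```r
# Outcomes and observed assignment from the hypothetical example
Y    <- c(5, 2, 2, 5, 5)
T_ob <- c(1, 0, 0, 1, 1)

# Test statistic: difference in means between treated and control
diff_means <- function(y, tr) mean(y[tr == 1]) - mean(y[tr == 0])
S_obs <- diff_means(Y, T_ob)

# All ways to treat 3 of the 5 units (fixed margins): a 3 x 10 matrix of indices
assignments <- combn(length(Y), sum(T_ob))

# Under the sharp null Y_i(0) = Y_i(1), outcomes do not change with assignment,
# so the statistic can be recomputed for every possible assignment
S_all <- apply(assignments, 2, function(idx) {
  tr <- integer(length(Y))
  tr[idx] <- 1L
  diff_means(Y, tr)
})

# Exact null distribution and two-sided p-value
print(table(S_all))
p_value <- mean(abs(S_all) >= abs(S_obs) - 1e-12)
print(p_value)   # 0.1
```

Only one of the ten assignments produces a difference as large as the observed value of 3, so the exact p-value is 1/10. With more units, the same logic applies but the assignments are sampled at random rather than enumerated.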

Test Statistics

Fisherian inference is general and should work for any test statistic. Some other common choices for RD designs are:
- Kolmogorov-Smirnov (KS) statistic: $S_{KS} = \sup_y |\hat{F}_1(y) - \hat{F}_0(y)|$, the biggest absolute difference between the two empirical CDFs. Better than the difference in means when departures from the null are in other moments or quantiles.
- Wilcoxon rank sum statistic: $S_{WR} = \sum_{i:\, T_i = 1} R_i^y$, where $R_i^y$ is the rank of the outcome $Y_i$. $S_{WR}$ is not affected by the cardinal values of the outcome, only by their ordering.

    library(rdlocrand)
    rdrandinf(Y, X, wl = -2.5, wr = 2.5, seed = 50, reps = 2500)

    Selected window = [-2.5;2.5]
    Running randomization-based test...
    Randomization-based test complete.

    Number of obs  = 2629
    Order of poly  = 0
    Kernel type    = uniform
    Reps           = 2500
    Window         = set by user
    H0: tau        = 0
    Randomization  = fixed margins
    Cutoff c = 0
    [The summary table (number of obs, effective number of obs, mean and s.d. of outcome, and window, left and right of c) and the test results (Diff. in means: statistic T, finite-sample and large-sample p-values; power vs d = 4.27), along with $sumstats and $obs.stat, lost their numeric values in the transcription.]

Choosing the Window

Some options:
1. Ad hoc or theoretically defined: both are different flavors of arbitrary.
2. Use pre-treatment covariates to select the window.
   - Assumes that there exists a variable $Z$ that is related to the running variable outside the window, but not inside the window (without this assumption, the procedure breaks down).
   - The effect of the treatment on $Z$, since it is pre-determined, is 0 by construction.

Data-driven Choice of Window

1. Identify $H_0^F$: $Z$ is unrelated to $T$ (balanced on $T$).
2. Start with the smallest possible window and test $H_0^F$.
3. Continue to widen the window until $H_0^F$ is rejected at a pre-specified significance level.
4. The chosen window is the largest one that continues to fail to reject $H_0^F$.

We need to choose the following things:
- Relevant covariates
- Test statistic
- Randomization mechanism
- Minimum n in the smallest window
- Significance level
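The data-driven window choice can be sketched as a loop over nested windows. The base R example below is a simplified stand-in, not the rdwinselect algorithm: it uses a Welch t-test in place of a randomization-based balance test, a single simulated covariate that is unrelated to the score only for |X| <= 1, a fixed step of 0.25, and a significance level of 0.15. All of these choices and names are assumptions for illustration.

```r
# Simulated score and one pre-treatment covariate (hypothetical)
set.seed(7)
n  <- 2000
X  <- runif(n, -3, 3)
Tr <- as.integer(X >= 0)
# Z is pure noise for |X| <= 1 and related to the score outside that range
Z  <- rnorm(n) + 2 * sign(X) * pmax(abs(X) - 1, 0)

alpha   <- 0.15                     # pre-specified significance level (assumed)
windows <- seq(0.25, 3, by = 0.25)  # nested symmetric windows, fixed step (assumed)

# Balance test in each window: is Z related to treatment status?
# (Welch t-test used as a large-sample stand-in for a randomization test)
p_values <- sapply(windows, function(w) {
  inside <- abs(X) <= w
  t.test(Z[inside & Tr == 1], Z[inside & Tr == 0])$p.value
})

# Widen until the first rejection; keep the largest window before it
first_reject <- which(p_values < alpha)[1]
chosen_w <- if (is.na(first_reject)) {
  max(windows)                      # never rejected: largest window considered
} else if (first_reject == 1) {
  NA_real_                          # rejected even in the smallest window
} else {
  windows[first_reject - 1]
}

print(round(cbind(window = windows, p_value = p_values), 3))
print(chosen_w)
```

Because the covariate is balanced by construction inside |X| <= 1, rejections in small windows are false positives that occur at roughly the chosen significance level; this is one reason to test several covariates and to treat the selected window as a guide to be checked for sensitivity.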

Formalization of the Procedure

1. Start with a symmetric window of length $2w_j$, $W_j = \bar{x} \pm w_j$.
2. Compute the test statistic either for each covariate individually or compute the omnibus test p-value.
3. Find the smallest p-value, $p_{min}$, and evaluate whether $p_{min} > \alpha$.
   - If yes, then fail to reject $H_0$ and increase the size of the window by a pre-specified step.
   - If no, then use the window $W_{j-1}$.

The step can be defined by a fixed length (wstep in R) or such that a certain number of observations is included (wobs in R).

Window Selection in R

    Z <- data[, c("i89", "vshr_islam1994", "partycount", "lpop1994", "merkezi", "merkezp", "subbuyuk", "buyuk")]
    rdwinselect(X, Z, seed = 50, reps = 1000, wobs = 2)

    Window selection for RD under local randomization

    Number of obs  = 2629
    Order of poly  = 0
    Kernel type    = uniform
    Reps           = 1000
    Testing method = rdrandinf
    Balance test   = diffmeans
    Cutoff c = 0
    [The summary of observations and percentiles left and right of c, and the table of window lengths with p-value, variable name, Bin.test, Obs<c and Obs>=c, lost their values in the transcription.]

    Recommended window is [-0.944;0.944] with 38 observations (17 below, 21 above).

    rdrandinf(Y, X, wl = -.944, wr = .944, seed = 50, reps = 2500)

    Selected window = [-0.944;0.944]
    Running randomization-based test...
    Randomization-based test complete.

    Number of obs  = 2629
    Order of poly  = 0
    Kernel type    = uniform
    Reps           = 2500
    Window         = set by user
    H0: tau        = 0
    Randomization  = fixed margins
    Cutoff c = 0
    [The summary table and test results (Diff. in means: statistic T, finite-sample and large-sample p-values, power), along with $sumstats, $obs.stat and $p.value, lost their numeric values in the transcription.]

Local Randomization or Continuity Approach?

- Local randomization requires stronger assumptions than the continuity-based approach, so one might use it to probe the conditions under which inference makes sense.
- The continuity-based approach requires reasonable data density around the cutoff. If this isn't the case, then the local randomization approach might be better.
- When the running variable is discrete (even potentially with lots of values, e.g., age in years), the local randomization approach could be better because there will be mass points with multiple observations.

Validation

There are threats to validity with RD designs.
- If the cutoff is known to the units ahead of time, this can threaten the validity of the RD design.
- Units may try to actively manipulate their score if they are just below the cutoff.

There are empirical tests aimed at evaluating the validity of the design:
1. Continuity of the score density around the cutoff.
2. Null treatment effects on pre-treatment covariates and placebos.
3. Regression function continuity at arbitrary alternative cutoffs.

RD Plots of Covariates and Placebos

    rdplot(data$i89, X, x.label = "Score", y.label = "", title = "",
           x.lim = c(-100, 100), cex.axis = 1.5, cex.lab = 1.5)
    rdplot(data$merkezi, X, x.label = "Score", y.label = "", title = "",
           x.lim = c(-100, 100), cex.axis = 1.5, cex.lab = 1.5)

[Figure: RD plots of i89 and merkezi against the score.]

Density of the Running Variable

If units don't have the ability to manipulate their score, then there should be similar data density on both sides of the cutoff.

    library(rddensity)
    summary(rddensity(X))

    RD Manipulation Test using local polynomial density estimation.

    Number of obs = 2629
    Model         = unrestricted
    Kernel        = triangular
    BW method     = comb
    VCE method    = jackknife
    Cutoff c = 0
    Order est. (p)    2    2
    Order bias (q)    3    3
    [Numbers of observations, bandwidths (h) and the robust test statistic T with its p-value were not preserved in the transcription.]

Density Histogram

    library(ggplot2)
    dens <- rddensity(X)
    bw_left  <- as.numeric(dens$h[1])
    bw_right <- as.numeric(dens$h[2])
    tempdata <- as.data.frame(X)
    colnames(tempdata) <- c("v1")
    # The outline colour argument (col = ...) was cut off in the source and is omitted.
    plot2 <- ggplot(data = tempdata, aes(x = v1)) +
      theme_bw(base_size = 17) +
      geom_histogram(aes(y = after_stat(count)), breaks = seq(-bw_left, 0, 1), fill = "blue") +
      geom_histogram(aes(y = after_stat(count)), breaks = seq(0, bw_right, 1), fill = "red") +
      labs(x = "Score", y = "Number of Observations") +
      geom_vline(xintercept = 0, color = "black")
    plot2

[Figure: histogram of the score within the estimated bandwidths; x axis "Score", y axis "Number of Observations".]
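A cruder check of the same idea is a binomial test on the number of observations just above versus just below the cutoff. The base R sketch below is not the local polynomial density test implemented in rddensity; the simulated scores, the window half-width of 0.1, the artificial manipulation rule and the function name sorting_test are all assumptions made for the example.

```r
# Simulated scores with no manipulation (hypothetical), cutoff at 0
set.seed(123)
n <- 2000
X <- runif(n, -1, 1)

# Binomial test: among units within w of the cutoff, is the share above 1/2?
sorting_test <- function(score, cutoff = 0, w = 0.1) {
  near    <- abs(score - cutoff) <= w
  n_above <- sum(score[near] >= cutoff)
  binom.test(n_above, sum(near), p = 0.5)$p.value
}

p_clean <- sorting_test(X)

# Artificial manipulation: units barely below the cutoff move just above it
X_manip <- ifelse(X > -0.05 & X < 0, -X, X)
p_manip <- sorting_test(X_manip)

print(c(clean = p_clean, manipulated = p_manip))
```

In the manipulated scores, roughly three times as many units sit just above the cutoff as just below it, so the binomial p-value collapses toward zero, while the unmanipulated scores give no such signal. The result depends on the window chosen, which is the limitation the density-based test is designed to address.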

Null Effects on Pre-treatment Covariates and Placebos

- If the design is valid, the treatment should have no estimated effect on pre-treatment covariates or placebo outcomes.
- Anything determined before the treatment counts as a pre-treatment covariate.
- Placebo outcomes are context-specific.

Covariates and Placebos

    Z <- data[, c("i89", "vshr_islam1994", "partycount", "lpop1994", "merkezi", "merkezp", "subbuyuk", "buyuk")]
    # Run the RD estimator with each covariate as the outcome
    robs <- lapply(1:ncol(Z), function(x) rdrobust(Z[[x]], X))
    names(robs) <- colnames(Z)
    # Keep the robust (third) row: coefficient and confidence interval
    t(round(sapply(robs, function(x) cbind(x$coef, x$ci)[3,]), 3))

    [Output: columns Coeff, CI Lower, CI Upper; rows i89, vshr_islam1994, partycount, lpop1994, merkezi, merkezp, subbuyuk, buyuk. Values were not preserved in the transcription.]

With Randomization Inference

    robs <- lapply(1:ncol(Z), function(x) rdrandinf(Z[[x]], X, wl = -.944, wr = .944))
    names(robs) <- colnames(Z)
    t(round(sapply(robs, function(x) c(stat = x$obs.stat, pval = x$p.value)), 4))

    [Output: columns stat, pval; rows i89, vshr_islam1994, partycount, lpop1994, merkezi, merkezp, subbuyuk, buyuk. Values were not preserved in the transcription.]

Regression Function Continuity

One of the assumptions we made before was that the regression functions are continuous at the cutoff for both treatment and control groups. We can probe this by estimating the "effect" at placebo cutoffs, using only treated observations for positive cutoffs and only control observations for negative ones.

    treat <- which(X >= 0)
    contr <- which(X < 0)
    cutoffs <- seq(-5, 5, by = 1)
    cutoffs <- cutoffs[-which(cutoffs == 0)]   # drop the true cutoff
    res <- list()
    for (i in 1:length(cutoffs)) {
      if (cutoffs[i] < 0) {
        res[[i]] <- rdrobust(Y[contr], X[contr], c = cutoffs[i])
      } else {
        res[[i]] <- rdrobust(Y[treat], X[treat], c = cutoffs[i])
      }
    }
    cbind(cutoff = cutoffs,
          t(round(sapply(res, function(x) cbind(x$coef, x$ci)[3,]), 3)))

    [Output: columns cutoff, Coeff, CI Lower, CI Upper; ten rows, one per placebo cutoff. Values were not preserved in the transcription.]

Sensitivity to Observations Close to Cutoff

If there is potential for manipulation, the observations closest to the cutoff are the ones most susceptible. Take them out and re-evaluate the effect.

    rdrobust(Y[abs(X) >= 0.25], X[abs(X) >= 0.25])[c("coef", "ci")]

    [Output: $coef (Coeff) and $ci (CI Lower, CI Upper) for rows Conventional, Bias-Corrected, Robust. Values were not preserved in the transcription.]

Donut-hole Estimation

    # Re-estimate the robust effect after dropping observations within radius i of the cutoff
    out <- t(sapply(seq(0, 1.25, by = .25), function(i)
      with(rdrobust(Y[abs(X) >= i], X[abs(X) >= i]),
           c(coef = coef[3], ci[3,]))))
    out <- cbind(radius = seq(0, 1.25, by = .25), out)
    out

    [Output: columns radius, coef, CI Lower, CI Upper; six rows, one per radius. Values were not preserved in the transcription.]

Conclusion

The RDD approach can be valuable with the right data and question.
- Be careful that the causal effect is not a modeling artifact.
- Use data-driven tools to estimate the appropriate bandwidth, window width, etc.
- Do sensitivity testing to make sure that your results are not sensitive to modeling choices.


More information

An Alternative Assumption to Identify LATE in Regression Discontinuity Design

An Alternative Assumption to Identify LATE in Regression Discontinuity Design An Alternative Assumption to Identify LATE in Regression Discontinuity Design Yingying Dong University of California Irvine May 2014 Abstract One key assumption Imbens and Angrist (1994) use to identify

More information

Regression Discontinuity Design

Regression Discontinuity Design Regression Discontinuity Design Marcelo Coca Perraillon University of Chicago May 13 & 18, 2015 1 / 51 Introduction Plan Overview of RDD Meaning and validity of RDD Several examples from the literature

More information

Why high-order polynomials should not be used in regression discontinuity designs

Why high-order polynomials should not be used in regression discontinuity designs Why high-order polynomials should not be used in regression discontinuity designs Andrew Gelman Guido Imbens 6 Jul 217 Abstract It is common in regression discontinuity analysis to control for third, fourth,

More information

Finding Instrumental Variables: Identification Strategies. Amine Ouazad Ass. Professor of Economics

Finding Instrumental Variables: Identification Strategies. Amine Ouazad Ass. Professor of Economics Finding Instrumental Variables: Identification Strategies Amine Ouazad Ass. Professor of Economics Outline 1. Before/After 2. Difference-in-difference estimation 3. Regression Discontinuity Design BEFORE/AFTER

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Regression Discontinuity

Regression Discontinuity Regression Discontinuity Christopher Taber Department of Economics University of Wisconsin-Madison October 24, 2017 I will describe the basic ideas of RD, but ignore many of the details Good references

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Addressing Analysis Issues REGRESSION-DISCONTINUITY (RD) DESIGN

Addressing Analysis Issues REGRESSION-DISCONTINUITY (RD) DESIGN Addressing Analysis Issues REGRESSION-DISCONTINUITY (RD) DESIGN Overview Assumptions of RD Causal estimand of interest Discuss common analysis issues In the afternoon, you will have the opportunity to

More information

Regression Discontinuity

Regression Discontinuity Regression Discontinuity Christopher Taber Department of Economics University of Wisconsin-Madison October 16, 2018 I will describe the basic ideas of RD, but ignore many of the details Good references

More information

Gov 2002: 3. Randomization Inference

Gov 2002: 3. Randomization Inference Gov 2002: 3. Randomization Inference Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last week: This week: What can we identify using randomization? Estimators were justified via

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information
