Differences-in-Differences. November 10 Clair


The Big Picture
- What is this class really about, anyway? Causality
- What is our biggest problem? Omitted variable bias

Omitted Variable Bias
- The actual cause is unobserved (e.g. higher wages for the educated are actually caused by motivation, not schooling)
- Happens when people get to choose their own level of the treatment (broadly construed)
- Selection bias
- Non-random program placement
- Because of someone else's choice, the control group isn't a good counterfactual for the treated

Math Review (blackboard)

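Math Review
For those of you looking at these slides later, here's what we just wrote down:

(1) $Y_i = a + bT_i + cX_i + e_i$

(2) $E(Y_i \mid T_i=1) - E(Y_i \mid T_i=0)$
$= [a + b + cE(X_i \mid T_i=1) + E(e_i \mid T_i=1)] - [a + 0 + cE(X_i \mid T_i=0) + E(e_i \mid T_i=0)]$
$= b + c\,[E(X_i \mid T_i=1) - E(X_i \mid T_i=0)]$

The last step assumes the error has the same mean in both groups, $E(e_i \mid T_i=1) = E(e_i \mid T_i=0)$, so the error terms cancel.
- $b$ is the true effect
- $c\,[E(X_i \mid T_i=1) - E(X_i \mid T_i=0)]$ is the omitted variable/selection bias term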
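To see the bias term from the decomposition at work, here is a minimal simulation sketch. The parameter values, the selection rule (people with high X are more likely to end up treated), and all variable names are assumptions made up for illustration; none of them come from the lecture.

```python
import random
from statistics import mean

random.seed(0)
a, b, c = 1.0, 2.0, 3.0   # hypothetical parameters; b is the true effect

rows = []  # each row is (y, t, x)
for _ in range(20000):
    x = random.gauss(0, 1)                        # unobserved factor, e.g. motivation
    t = 1 if x + random.gauss(0, 1) > 0 else 0    # assumed selection rule: high x -> treated
    e = random.gauss(0, 1)                        # error, unrelated to treatment
    rows.append((a + b * t + c * x + e, t, x))

def group_mean(col, t_val):
    """Mean of column `col` among rows whose treatment status equals t_val."""
    return mean(r[col] for r in rows if r[1] == t_val)

naive = group_mean(0, 1) - group_mean(0, 0)          # E(Y|T=1) - E(Y|T=0)
bias = c * (group_mean(2, 1) - group_mean(2, 0))     # c [E(X|T=1) - E(X|T=0)]

print("naive comparison:", round(naive, 2))
print("true effect b   :", b)
print("bias term       :", round(bias, 2))
```

The naive comparison should land close to b plus the bias term rather than close to b, which is the point of equation (2).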

What if we had data from before the program?
What if we estimated this equation using data from before the program?

(1) $Y_i = a + bT_i + cX_i + e_i$

Specifically, what would our estimate of b be? Before the program nobody has actually been treated, so the b term is 0 in both groups:

(2) $E(Y_i \mid T_i=1) - E(Y_i \mid T_i=0)$
$= [a + 0 + cE(X_i \mid T_i=1) + E(e_i \mid T_i=1)] - [a + 0 + cE(X_i \mid T_i=0) + E(e_i \mid T_i=0)]$
$= c\,[E(X_i \mid T_i=1) - E(X_i \mid T_i=0)]$

All that's left is the problematic term (the omitted variable/selection bias term). How could this be helpful to us?

Differences-in-Differences (just what it sounds like)
Use two periods of data; add a second subscript to denote time (0 = pre, 1 = post). The difference-in-differences estimate is:

$\{E(Y_{i1} \mid T_{i1}=1) - E(Y_{i1} \mid T_{i1}=0)\}$   (difference between T and C, post)
$- \{E(Y_{i0} \mid T_{i1}=1) - E(Y_{i0} \mid T_{i1}=0)\}$   (difference between T and C, pre)
$= b + c\,[E(X_{i1} \mid T_{i1}=1) - E(X_{i1} \mid T_{i1}=0)] - c\,[E(X_{i0} \mid T_{i1}=1) - E(X_{i0} \mid T_{i1}=0)]$
$= b$   YAY!

The last step holds when the gap in X between T and C is the same in both periods, so the two bias terms cancel.
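The cancellation can be checked with a small simulation sketch. It assumes X is fixed over time for each person (so the T and C gap in X is identical pre and post); the parameters and selection rule are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean

random.seed(1)
a, b, c = 1.0, 2.0, 3.0   # hypothetical parameters; b is the true effect

pre, post = [], []        # lists of (outcome, in_treatment_group)
for _ in range(20000):
    x = random.gauss(0, 1)                      # unobserved, constant over time (assumption)
    treated = x + random.gauss(0, 1) > 0        # assumed selection on x
    y0 = a + c * x + random.gauss(0, 1)                 # pre: program not yet running
    y1 = a + b * treated + c * x + random.gauss(0, 1)   # post: treated group gets b
    pre.append((y0, treated))
    post.append((y1, treated))

def gap(rows):
    """Treatment-group mean minus control-group mean."""
    return mean(y for y, t in rows if t) - mean(y for y, t in rows if not t)

pre_gap = gap(pre)      # approximately the bias term alone
post_gap = gap(post)    # approximately b plus the bias term
did = post_gap - pre_gap

print("pre gap :", round(pre_gap, 2))
print("post gap:", round(post_gap, 2))
print("DiD     :", round(did, 2), "(true b =", b, ")")
```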

Differences-in-Differences, Graphically
[Figure: outcome for the Treatment and Control groups, Pre vs. Post]
- Effect of program using only pre- and post- data from the T group (ignoring the general time trend)
- Effect of program using only the T and C comparison from post-intervention (ignoring pre-existing differences between the T and C groups)
- Effect of program, difference-in-difference (taking into account pre-existing differences between T and C and the general time trend)

Identifying Assumption
Whatever happened to the control group over time is what would have happened to the treatment group in the absence of the program.
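The three comparisons from the graph can be written as arithmetic on four group means. The numbers below are made up for illustration; the counterfactual line is simply the identifying assumption applied to them.

```python
# Hypothetical group means (made-up numbers, not from any study).
means = {
    ("T", "pre"): 10.0, ("T", "post"): 18.0,
    ("C", "pre"): 8.0,  ("C", "post"): 11.0,
}

# Pre/post within T only: ignores the general time trend.
before_after = means["T", "post"] - means["T", "pre"]

# T vs. C post-intervention only: ignores the pre-existing gap.
post_only = means["T", "post"] - means["C", "post"]

# What happened to the control group over time.
control_change = means["C", "post"] - means["C", "pre"]

# Identifying assumption: without the program, T would have changed
# by the same amount as C.
counterfactual_T_post = means["T", "pre"] + control_change

did = means["T", "post"] - counterfactual_T_post

print(before_after, post_only, did)   # 8.0 7.0 5.0
```

In this made-up case both single comparisons overstate the effect: one absorbs the time trend (3) and the other absorbs the pre-existing gap (2).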

Graphing Exercise
- Which of these programs had no effect?
- Which of these programs look like they were randomly assigned?
- Which of these programs look like they were placed in the areas that needed them most?
- Which of these programs make you wonder if there was some mean reversion going on?

Uses of Diff-in-Diff
- Simple two-period, two-group comparison
- Very useful in combination with other methods:
  - Randomization
  - Regression Discontinuity
  - Matching (propensity score)
- Can also do much more complicated cohort analysis, comparing many groups over many time periods

The (Simple) Regression
$Y_{i,t} = a + b\,\mathrm{Treat}_{i,t} + c\,\mathrm{Post}_{i,t} + d\,(\mathrm{Treat}_{i,t} \times \mathrm{Post}_{i,t}) + e_{i,t}$
- $\mathrm{Treat}_{i,t}$ is a binary indicator ("turns on" from 0 to 1) for being in the treatment group
- $\mathrm{Post}_{i,t}$ is a binary indicator for the period after treatment
- $\mathrm{Treat}_{i,t} \times \mathrm{Post}_{i,t}$ is the interaction (product)
- Interpretation of a, b, c, d is holding all else constant

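Putting Graph & Regression Together
$Y_{i,t} = a + b\,\mathrm{Treat}_{i,t} + c\,\mathrm{Post}_{i,t} + d\,(\mathrm{Treat}_{i,t} \times \mathrm{Post}_{i,t}) + e_{i,t}$
d is the causal effect of treatment.
[Figure: the four group means labeled on the Pre/Post graph]
- Control, Pre: a
- Control, Post: a + c
- Treatment, Pre: a + b
- Treatment, Post: a + b + c + d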
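A small sketch can confirm that the regression coefficients line up with the four group means. The eight observations are invented, and `ols` is a hypothetical helper written here (normal equations solved by Gauss-Jordan elimination) so the example needs no external library; in practice one would use a statistics package.

```python
def ols(X, y):
    """Least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up data: (treat, post, outcome), two observations per cell.
data = [
    (0, 0, 4.0), (0, 0, 6.0),     # Control, Pre    (mean 5)
    (0, 1, 7.0), (0, 1, 9.0),     # Control, Post   (mean 8)
    (1, 0, 9.0), (1, 0, 11.0),    # Treatment, Pre  (mean 10)
    (1, 1, 17.0), (1, 1, 19.0),   # Treatment, Post (mean 18)
]

X = [[1.0, tr, po, tr * po] for tr, po, _ in data]   # constant, Treat, Post, Treat x Post
y = [out for _, _, out in data]
a, b, c, d = ols(X, y)

did_from_means = (18.0 - 10.0) - (8.0 - 5.0)
print([round(v, 6) for v in (a, b, c, d)], did_from_means)
```

With these numbers the interaction coefficient d matches the difference-in-differences of the cell means, and a, a + b, a + c, a + b + c + d reproduce the four means.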

Cohort Analysis
When you've got richer data, it's not as easy to draw the picture or write the equations. Types of data:
- cross-section (lots of individuals at one point in time)
- time-series (one individual over lots of time)
- repeated cross-section (lots of individuals over several times)
- panel (lots of individuals, multiple times for each)
Basically, control for each time period and each group (fixed effects); the coefficient on the treatment dummy is the effect you're trying to estimate.
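A minimal sketch of the fixed-effects idea, under strong simplifying assumptions: a balanced panel, a single constant treatment effect, and no noise. The group effects, period effects, and rollout dates are all made up. In a balanced panel, subtracting group means and period means (and adding back the grand mean) is equivalent to including a dummy for each group and each period, so the sketch uses that shortcut instead of building the dummies.

```python
from statistics import mean

# Hypothetical fixed effects and rollout dates (all numbers are made up).
unit_fe = {"A": 5.0, "B": 1.0, "C": 3.0}       # group fixed effects
time_fe = {0: 0.0, 1: 0.5, 2: 1.5, 3: 1.0}     # period fixed effects
start = {"A": 1, "B": 3, "C": None}            # first treated period; None = never treated
true_effect = 2.0

D, Y = {}, {}   # treatment dummy and outcome, keyed by (group, period)
for u, alpha in unit_fe.items():
    for t, gamma in time_fe.items():
        treated = start[u] is not None and t >= start[u]
        D[u, t] = 1.0 if treated else 0.0
        Y[u, t] = alpha + gamma + true_effect * D[u, t]

def double_demean(v):
    """Remove group means and period means, add back the grand mean."""
    grand = mean(v.values())
    by_unit = {u: mean(v[u, t] for t in time_fe) for u in unit_fe}
    by_time = {t: mean(v[u, t] for u in unit_fe) for t in time_fe}
    return {(u, t): v[u, t] - by_unit[u] - by_time[t] + grand for (u, t) in v}

Dt, Yt = double_demean(D), double_demean(Y)

# Coefficient on the treatment dummy after controlling for group and period.
estimate = sum(Dt[k] * Yt[k] for k in Dt) / sum(Dt[k] ** 2 for k in Dt)
print(round(estimate, 6))
```

Because the made-up data contain no noise, the estimate should match `true_effect` up to floating-point error.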

DiD Data Requirements
- Either repeated cross-section or panel
- Treatment can't happen for everyone at the same time
- If you believe the identifying assumption, then you can analyze policies ex post
- Lets us tackle really big questions that we're unlikely to be able to randomize

Malaria Eradication in the Americas (Bleakley 2007)
Question: What is the effect of malaria on economic development?
5 types of correlations (remember?):
- Causation
- Reverse causation
- Simultaneity
- Omitted variables
- Spurious correlation

Assumption OK?
- Eradication campaigns not determined by affected regions
- Campaigns made major progress over a short time span (10 years)
- Cross-regional variation in how bad malaria was (ecological differences)

Malaria Eradication in the Americas (Bleakley 2007)
- Treated vs Control: those who were (were not) children in malaria-endemic regions
- Pre vs Post: DDT spraying
In both absolute terms and relative to the comparison group of non-malarious areas, cohorts born after eradication had higher income and literacy as adults than the preceding generations.

Robustness Checks
- If possible, use data on multiple pre-program periods to show that the difference between treated and control is stable
  - It is not necessary for the trends to be parallel, just to know the function for each
- If possible, use data on multiple post-program periods to show that the unusual difference between treated and control occurs only concurrent with the program
- Alternatively, use data on multiple indicators to show that the response to the program is only manifest for those we expect it to be (e.g. the diff-in-diff estimate of the impact of ITN (insecticide-treated net) distribution on diarrhea should be zero)
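The first two checks can be sketched as a "placebo" diff-in-diff on periods where nothing should have happened. The group means below are invented for illustration, with period 0 taken to be the first post-program period.

```python
# Hypothetical group means by period (made-up numbers).
# Negative periods are pre-program; period 0 is the first post-program period.
treat_means = {-3: 10.0, -2: 11.0, -1: 12.0, 0: 18.0, 1: 19.0}
ctrl_means = {-3: 7.0, -2: 8.0, -1: 9.0, 0: 10.0, 1: 11.0}

# T minus C gap in each period.
gaps = {t: treat_means[t] - ctrl_means[t] for t in treat_means}

# Placebo diff-in-diff between two pre-program periods: should be about zero
# if the gap between treated and control is stable before the program.
placebo_did = gaps[-1] - gaps[-2]

# Actual diff-in-diff: change in the gap when the program starts.
actual_did = gaps[0] - gaps[-1]

# Later post-program periods: the gap should not keep jumping on its own.
post_change = gaps[1] - gaps[0]

print(gaps)
print(placebo_did, actual_did, post_change)   # 0.0 5.0 0.0
```

A nonzero placebo estimate on real data would be a warning sign about the identifying assumption, not proof that it fails.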