Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison

Similar documents
The Generalized Roy Model and Treatment Effects

1 Impact Evaluation: Randomized Controlled Trial (RCT)

Inference in Regression Model

Econometric Causality

Introduction to Econometrics

Quantitative Economics for the Evaluation of the European Policy

Principles Underlying Evaluation Estimators

The Econometric Evaluation of Policy Design: Part I: Heterogeneity in Program Impacts, Modeling Self-Selection, and Parameters of Interest

ECON Introductory Econometrics. Lecture 17: Experiments

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

Predicting the Treatment Status

Job Training Partnership Act (JTPA)

Technical Track Session I:

Estimating the Dynamic Effects of a Job Training Program with M. Program with Multiple Alternatives

An example to start off with

Experiments and Quasi-Experiments

CompSci Understanding Data: Theory and Applications

Lecture 11 Roy model, MTE, PRTE

Empirical approaches in public economics

Descriptive Statistics (And a little bit on rounding and significant digits)

Please bring the task to your first physics lesson and hand it to the teacher.

Selection on Observables: Propensity Score Matching.

STAT/SOC/CSSS 221 Statistical Concepts and Methods for the Social Sciences. Random Variables

Causal Inference. Prediction and causation are very different. Typical questions are:

Front-Door Adjustment

Policy-Relevant Treatment Effects

Lecture 1 Introduction to Multi-level Models

Gov 2002: 4. Observational Studies and Confounding

Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh)

Technical Track Session I: Causal Inference

Regression Discontinuity

One Economist s Perspective on Some Important Estimation Issues

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Development. ECON 8830 Anant Nyshadham

Potential Outcomes Model (POM)

Education Production Functions. April 7, 2009

causal inference at hulu

Identification of Models of the Labor Market

Big-oh stuff. You should know this definition by heart and be able to give it,

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

Regression Discontinuity

Assessing Studies Based on Multiple Regression

Lecture: Difference-in-Difference (DID)

Lecture I. What is Quantitative Macroeconomics?

Instrumental Variables

The Distribution of Treatment Effects in Experimental Settings: Applications of Nonlinear Difference-in-Difference Approaches. Susan Athey - Stanford

A PRIMER ON LINEAR REGRESSION

Instrumental Variables in Models with Multiple Outcomes: The General Unordered Case

CS 124 Math Review Section January 29, 2018

14.74 Lecture 10: The returns to human capital: education

Solving with Absolute Value

Econometric analysis of models with social interactions

Lecture 9. Matthew Osborne

Causality and Experiments

Introduction to Econometrics. Assessing Studies Based on Multiple Regression

Market and Nonmarket Benefits

Applied Microeconometrics I

Advanced Statistical Methods for Observational Studies L E C T U R E 0 1

Instrumental Variables

29. GREATEST COMMON FACTOR

BIOSTATISTICAL METHODS

Causal Inference with Big Data Sets

Uncertainty. Michael Peters December 27, 2013

Introduction to Econometrics. Review of Probability & Statistics

Comments on Best Quasi- Experimental Practice

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL

Ch 7: Dummy (binary, indicator) variables

Solving Equations by Adding and Subtracting

Simple Regression Model. January 24, 2011

The relationship between treatment parameters within a latent variable framework

What is to be done? Two attempts using Gaussian process priors

Partial Identification of Average Treatment Effects in Program Evaluation: Theory and Applications

EMERGING MARKETS - Lecture 2: Methodology refresher

p(t)dt a p(τ)dτ , c R.

Quadratic Equations Part I

Joint, Conditional, & Marginal Probabilities

33. SOLVING LINEAR INEQUALITIES IN ONE VARIABLE

Causality II: How does causal inference fit into public health and what it is the role of statistics?

Identification with Latent Choice Sets: The Case of the Head Start Impact Study

Introduction This is a puzzle station lesson with three puzzles: Skydivers Problem, Cheryl s Birthday Problem and Fun Problems & Paradoxes

Observational Studies and Propensity Scores

The words linear and algebra don t always appear together. The word

Why should you care?? Intellectual curiosity. Gambling. Mathematically the same as the ESP decision problem we discussed in Week 4.

Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2:

Probability and Discrete Distributions

Controlling for Time Invariant Heterogeneity

Algebra & Trig Review

Other Models of Labor Dynamics

Advanced Statistical Methods for Observational Studies L E C T U R E 0 1

LECTURE 2: SIMPLE REGRESSION I

N/4 + N/2 + N = 2N 2.

Differences-in- Differences. November 10 Clair

Recitation Notes 6. Konrad Menzel. October 22, 2006

Men. Women. Men. Men. Women. Women

Econometrics I. Professor William Greene Stern School of Business Department of Economics 1-1/40. Part 1: Introduction

Data Analysis and Statistical Methods Statistics 651

Chapter 11. Regression with a Binary Dependent Variable

Many natural processes can be fit to a Poisson distribution

Transcription:

Treatment Effects Christopher Taber Department of Economics University of Wisconsin-Madison September 6, 2017

Notation First a word on notation I like to use i subscripts on random variables to be clear that this is individual specific (or firm specific, or time specific, or state specific etc). As much as possible I will use capital letters to denote random variables I will also try to be clear about conditioning so E (Y i X i = x) means the expected value of Y i conditional on X i having the value x. This thing can be thought of as a function of x and I ll try to write it that way

I will use the notation E (Y i X i ) to essentially mean E (Y i X i = x) for all values of x (in the support of X i ) For example if G i is gender and takes on the value m and f then E (Y i G i ) =0 means that E (Y i G i = m) =0 E (Y i G i = f ) =0

I will also use the notation to mean asymptotically equivalent so 1 N N X i E (X i ) i=1 means that as N gets large, 1 N N i=1 X i converges (in probability or almost surely) to E (X i ). I will not distinguish between different types of convergence (feel free to ignore this comment if you don t know it means)

Treatment Effects The key theme in this course will be causal estimation-we want to estimate the effect of some intervention or policy on an individual. The standard way to think about these issues is in the Treatment Effects framework We will start with the simplest case where the default is that an individual (or firm or state or whatever) either receives some treatment or they do not

The classic and easiest way to think about this is a drug We have a patient who is sick and either we give them the drug or we do not We let Y 0i be the outcome that individual i would receive if they did not get the drug and Y 1i the outcome they would receive if they did receive the drug.

A key concept here is that of a counterfactual Let T i be an indicator (0 or 1) for whether individual i receives the drug treatment then what we generally get to see in the data is Y i =T i Y 1i + (1 T i ) Y 0i

That is for persons who receive the drug we observe Y 1i for persons who don t receive the drug we observe Y 0i So for people who receive the drug, Y 0i is counterfactual-it is the outcome that person i would have received if they had not received the drug Similarly Y 1i is counterfactual for people who did not receive the drug. It is the outcome they would have obtained had they received the drug

We define the treatment effect for individual i as the change in their outcomes resulting from the treatment: i Y 1i Y 0i

There are many many different treatments we can think of: a drug for Ebola a big fence between here and Mexico a year of school a month of training an increase in the federal minimum wage to $15 a welfare program free tuition for public colleges inclusion of a public option in health exchanges exposure to polluted air access to clean water race or gender a cigarette a day while pregnant a glass of wine every night for dinner increased financial regulations on banks helmets on motorcyclists the death penalty

Our first issue: even if we knew the value of i for every person in the population that we study, What do we present in the paper? We must focus on a feature of the distribution

The most common: Average Treatment Effect (ATE) E( i ) Treatment on the Treated (TT) E( i T i = 1) Treatment on the Untreated (TUT) E( i T i = 0) (Heckman and Vytlacil discuss Policy Relevant Treatment effects, but I need more notation than I currently have to define those)

These each answer very different questions I will ignore TUT for the rest of these lecture notes because it is symmetric with TT In terms of identification they are symmetric.

What does Identification mean? Answer: different things to different people I will stick to a more statistical definition (though informal) A parameter θ is identified if under the maintained assumptions of the model it is uniquely determined by the population version of the data This can be explained within this treatment effect framework: In the data we observe (Y i, T i ), so the population version of the data is the full joint distribution of Y i and T i I used the phrase population version of the data to mean everything we could ever learn from the data. For example when T i = 1 the data doesn t inform us about Y 0i so this is not part of the population version of the data.

What is identified from this type of data? We can directly identify Pr(T i = 1) We can also observe the distribution of Y 0i conditional on T i = 0 the distribution of Y 1i conditional on T i = 1

Identification of TT By definition the TT parameter is TT = E( i T i = 1) =E(Y 1i Y 0i T i = 1) Is this identified? =E(Y 1i T i = 1) E(Y 0i T i = 1) Well one part we get immediately: E(Y 1i T i = 1) The other part is counterfactual: E(Y 0i T i = 1) It is the expected value the treatments would have received if the treatment option were not available to them. This is not measured in the data

To a large extent what this course and more generally causal estimation is about is trying to say something about E(Y 0i T i = 1) All the different empirical approaches we examine do this in one way or another

Average Treatment Effect This is slightly more complicated ATE = E( i ) =E( i T i = 1)Pr(T i = 1) + E( i T i = 0)Pr(T i = 0) = [E(Y 1i T i = 1) E(Y 0i T i = 1)] Pr(T i = 1) + [E(Y 1i T i = 0) E(Y 0i T i = 0)] [1 Pr(T i = 1)] All we can directly identify from the data is : E(Y 1i T i = 1), E(Y 0i T i = 0), Pr(T i = 1) In this case there are two missing pieces: E(Y 1i T i = 0), E(Y 0i T i = 1) If we could figure these out we could identify the ATE

Extensions: We have kept things simple by focusing on binary treatments, but more generally treatments could have multiple values: drug 1/drug 2 classroom training/ job search assistance 4 yr college/2 yr college Could also have continuous treatment hours of training dose of drug

This does not drastically change things-just makes more counterfactuals For example with 2 different drugs we have three things: Y 0i : outcome taking neither drug Y 1i : outcome taking drug 1 Y 2i : outcome taking drug 2 This gives a bunch of different things we can look at: E(Y 1i Y 0i ) E(Y 2i Y 0i T i = 2) E(Y 2i Y 1i T i = 1) etc.

Randomized Control Trial So how do I deal with this general identification problem? Simplest thing is to randomize, if I randomly force some people to receive the treatment and others not to receive the treatments then T i is random and independent of underlying variables. In particular it will be independent of Y 1i and Y 0i so Then I have solved the problem E (Y 1i T i = 0) =E (Y 1i T i = 1) E (Y 0i T i = 1) =E (Y 0i T i = 0) Since the things on the right hand side are identified, the things on the left hand side are as well

They work great when you can do them, but often experiments are not feasible: prohibitively expensive typically social experiments are very expensive to run, and we sometimes need very large samples Unethical effect of growing up with your own parents vs. foster home effect of exposure to carbon monoxide on mortality Compliance very difficult compare people who drink a glass of wine every day (for 30 years) to people who don t drink at all Not feasible monetary policy (GE effects) effects of preschool program on adult outcomes, but we need to know the answer for a current policy discussion

Since we can not typically use experimental methods we typically are restricted other econometric methods. We will spend the rest of this part of the course trying other methods.