
Causal inference Part I.b: randomized experiments, matching and regression (this lecture starts with other slides on randomized experiments) Frank Venmans

Example of a randomized experiment: Job Training Partnership Act (JTPA)
Largest randomized training evaluation in the US; started in 1983 at 649 sites.
Sample: previously unemployed or low-earnings individuals.
D: assignment to one of 3 general service strategies:
- classroom training in occupational skills
- on-the-job training and/or job search assistance
- other services (e.g. probationary employment)
Y: earnings 30 months following assignment.
X: characteristics measured before assignment: age, gender, previous earnings, race, etc.

Policy outcome: after the results of the JTPA study, funding for youth services was drastically cut.

Selection on observables

Observational studies
It is not always possible to randomize (e.g. the effect of smoking).
Main problem: selection bias.
Goal: design an observational study that approximates an experiment.

Smoking and Mortality (Cochran 1968)

Subclassification
Need to control for differences in age. Subclassification:
- For each country, divide each group into age subgroups.
- Calculate death rates within age subgroups.
- Average the within-subgroup death rates using fixed weights (e.g. the number of cigarette smokers per subgroup).

Subclassification: example

Age group   Death rate, pipe smokers   # Pipe smokers   # Non-smokers
20-50       15                         11,000           29,000
50-70       35                         13,000            9,000
70+         50                         16,000            2,000
Total                                  40,000           40,000

What is the average death rate for pipe smokers?
15*(11/40) + 35*(13/40) + 50*(16/40) = 35.5
What would the average death rate for pipe smokers be if they had the same age distribution as the non-smokers?
15*(29/40) + 35*(9/40) + 50*(2/40) = 21.2
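As an illustration (not part of the original slides), a minimal Python/NumPy sketch reproducing the two weighted averages above:

import numpy as np

death_rate = np.array([15.0, 35.0, 50.0])      # pipe smokers' death rate per age group
n_pipe = np.array([11_000, 13_000, 16_000])    # number of pipe smokers per age group
n_non = np.array([29_000, 9_000, 2_000])       # number of non-smokers per age group

crude = np.average(death_rate, weights=n_pipe)     # 35.5: pipe smokers' own age distribution
adjusted = np.average(death_rate, weights=n_non)   # 21.2: re-weighted to the non-smokers' age distribution
print(crude, adjusted)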

The effect of cigarettes was underestimated because cigarette smokers were younger than average; the effect of cigars was overestimated because cigar smokers were older than average.

Covariates, outcomes and post-treatment bias
Predetermined covariates: a variable X (e.g. age) is predetermined with respect to treatment D (smoking) if for each individual i, $X_{0i} = X_{1i}$. This does not imply that X and D are independent. Predetermined covariates are often time-invariant, but not necessarily.
Outcomes: variables Y (e.g. death rate, lung cancer, colour of teeth) that are (possibly) not predetermined, i.e. $Y_{0i} \neq Y_{1i}$ for some individual i.
In general, it is wrong to condition on outcomes, because this may induce post-treatment bias.

Identification assumptions
ATE:
- $(Y_1, Y_0) \perp D \mid X$ (selection on observables): for a given value of X, potential outcomes are the same (in distribution) for treated and control units. This means that all variables that affect both the outcome and the probability of being treated must be included in the model (X is a vector of covariates)!
- $0 < \Pr(D = 1 \mid X) < 1$ for almost all X (common support): for every value of X there is a non-zero probability of finding both treated and control units.
ATET:
- $Y_0 \perp D \mid X$ (selection on observables): for a given age, the death rate smokers would have had if they had been non-smokers is the same as that of non-smokers.
- $\Pr(D = 1 \mid X) < 1$ (with $\Pr(D = 1) > 0$) (common support): for every value of X there is a non-zero probability of finding control units. If for some values of X there are no treated units, this is not a problem.

Subclassification estimator
$\hat\alpha_{ATE} = \sum_{k=1}^{K} \left(\bar Y_1^k - \bar Y_0^k\right) \frac{N^k}{N}$;  $\hat\alpha_{ATET} = \sum_{k=1}^{K} \left(\bar Y_1^k - \bar Y_0^k\right) \frac{N_1^k}{N_1}$
where $N^k$ is the number of observations and $N_1^k$ the number of treated observations in cell k.

X^k     Death rate smokers   Death rate non-smokers   Diff.   # Smokers   # Obs.
Old     28                   24                       4       3           10
Young   22                   16                       6       7           10
Total   23.8                 21.6                     2.2     10          20

ATE = 4*(10/20) + 6*(10/20) = 5
ATET = 4*(3/10) + 6*(7/10) = 5.4
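A short illustrative sketch (Python assumed) computing the subclassification ATE and ATET from the table above:

import numpy as np

diff = np.array([4.0, 6.0])        # per-cell difference in death rates (Old, Young)
n_cell = np.array([10, 10])        # N^k: observations per cell
n_treated = np.array([3, 7])       # N_1^k: treated (smokers) per cell

ate = np.sum(diff * n_cell / n_cell.sum())          # 4*(10/20) + 6*(10/20) = 5.0
atet = np.sum(diff * n_treated / n_treated.sum())   # 4*(3/10) + 6*(7/10) = 5.4
print(ate, atet)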

Matching
Calculate $\hat\alpha_{ATET}$ by «imputing» the missing potential outcome of each treated unit using the observed outcome of the «closest» control unit:
$\hat\alpha_{ATET} = \frac{1}{N_1} \sum_{D_i = 1} \left(Y_i - Y_{j(i)}\right)$
where $Y_{j(i)}$ is the outcome of the untreated observation whose $X_{j(i)}$ is the closest value to $X_i$ among the untreated observations.
Alternative: use the M closest matches,
$\hat\alpha_{ATET} = \frac{1}{N_1} \sum_{D_i = 1} \left(Y_i - \frac{1}{M}\sum_{m=1}^{M} Y_{j_m(i)}\right)$
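A minimal sketch of this nearest-neighbour matching estimator (Python assumed; the function name and the Euclidean distance choice are illustrative):

import numpy as np

def atet_nn_match(y, d, x, M=1):
    # y: outcomes; d: 1 = treated, 0 = control; x: covariate(s); M: number of matches
    y = np.asarray(y, float)
    d = np.asarray(d).astype(bool)
    x = np.asarray(x, float).reshape(len(y), -1)
    yc, xc = y[~d], x[~d]
    effects = []
    for yi, xi in zip(y[d], x[d]):
        dist = np.linalg.norm(xc - xi, axis=1)   # distance to every control unit
        idx = np.argsort(dist)[:M]               # M closest controls (matching with replacement)
        effects.append(yi - yc[idx].mean())      # impute the missing Y0 with the matched controls
    return float(np.mean(effects))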

Example: ATET = 1/3*(6-9) + 1/3*(1-0) + 1/3*(0-9) = -3.7

Trade-off between bias and efficiency
Single matching vs. multiple matching:
- Single matching: only the best match is used => lower bias.
- Multiple matching: a greater set of information is used => more efficient (lower standard errors of the estimate).
Matching with replacement vs. without replacement:
- Matching with replacement: the best match can be used several times => lower bias.
- Matching without replacement: the best match may not be picked because it has already served as a match, so the second-best match is used instead. This increases the set of information that is used => more efficient.

Distance metric
When there are multiple confounders, a distance metric needs to be specified.
- Euclidean distance: every variable (standardized to have the same variance) gets the same weight. E.g. with 3 variables, you would match points that are closest in a standardized 3-D space.
- Mahalanobis distance: takes into account correlations between variables. If two variables are highly correlated, they receive less weight. This is in many cases theoretically more appealing.
- You can impose an exact match on certain variables (for example country or sector), combined with another distance metric for the other variables.
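A sketch of the Mahalanobis distance between two observations (Python assumed; the covariance matrix is estimated from the full covariate matrix):

import numpy as np

def mahalanobis(x_i, x_j, X):
    # X: (n, k) covariate matrix used to estimate the covariance;
    # correlated covariates are down-weighted through the inverse covariance matrix
    cov_inv = np.linalg.inv(np.cov(np.asarray(X, float), rowvar=False))
    diff = np.asarray(x_i, float) - np.asarray(x_j, float)
    return float(np.sqrt(diff @ cov_inv @ diff))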

Bias correction
If there are multiple continuous variables, matching estimators may behave badly (the matches are never exact). Bias-corrected matching:
$\hat\alpha_{ATET} = \frac{1}{N_1} \sum_{D_i = 1} \left[Y_i - Y_{j(i)} - \left(\hat\mu_0(X_i) - \hat\mu_0(X_{j(i)})\right)\right]$
where $\mu_0(X_i) = E[Y \mid X = X_i, D = 0]$ and $\hat\mu_0(X) = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$ is estimated by OLS on the control units.
For example, if treated companies are much smaller than control companies, then even if the matching algorithm searches among the smallest control companies, the mean size of the matched controls may still be greater than the mean size of the treated companies. The bias correction estimates a size effect and subtracts it from the estimated treatment effect (Abadie & Imbens, 2006).
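A sketch of the bias-corrected estimator (Python assumed; match_idx is a hypothetical vector giving, for each treated unit, the index of its matched control from a previously run matching step):

import numpy as np

def atet_bias_corrected(y, d, x, match_idx):
    y = np.asarray(y, float)
    d = np.asarray(d).astype(bool)
    x = np.asarray(x, float).reshape(len(y), -1)
    # mu_0(X): OLS of Y on X using the control units only
    Xc = np.column_stack([np.ones((~d).sum()), x[~d]])
    beta, *_ = np.linalg.lstsq(Xc, y[~d], rcond=None)
    mu0 = lambda xx: np.column_stack([np.ones(len(xx)), xx]) @ beta
    y_t, x_t = y[d], x[d]                     # treated units
    y_m, x_m = y[match_idx], x[match_idx]     # their matched controls
    return float(np.mean(y_t - y_m - (mu0(x_t) - mu0(x_m))))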

Variance estimation (optional)
Matching is best done with replacement, to reduce bias. But replacement increases the variance relative to a standard estimator of the form
$\widehat{Var}(\hat\alpha_{ATET}) = \frac{1}{N_1^2} \sum_{D_i = 1} \left(Y_i - Y_{j(i)} - \hat\alpha_{ATET}\right)^2$
(the analytical solution is not given here; note also that the bootstrap does not work for matching estimators).

Propensity score matching
The propensity score is the probability of being treated conditional on the confounding variables: $\pi(X) = P(D = 1 \mid X)$.
It can be shown that if $(Y_1, Y_0) \perp D \mid X$, then $(Y_1, Y_0) \perp D \mid \pi(X)$.
If two individuals or companies are equally likely to be treated given the combination of their confounders X, then they are a good (unbiased) match. E.g. if both older and male individuals smoke more, a good match for a man could be a woman who is a little older.
The identification assumptions are the same: selection on observables and common support.

Propensity score: estimation
1st step: estimate the propensity score $\pi(X) = P(D = 1 \mid X)$ using a logit/probit regression.
2nd step: do matching (or subclassification) on the propensity score,
OR: weight every observation with a weight based on the propensity score (no proof):
$\hat\alpha_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left(D_i - \pi(X_i)\right) Y_i}{\pi(X_i)\left(1 - \pi(X_i)\right)}$
$\hat\alpha_{ATET} = \frac{1}{N_1} \sum_{i=1}^{N} \frac{\left(D_i - \pi(X_i)\right) Y_i}{1 - \pi(X_i)}$
Standard error estimation: one needs to adjust for the first-step estimation of the propensity score. Analytical solution for a parametric first step: Newey & McFadden (1994) or Newey (1994). Alternative: bootstrap.
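A sketch of the two-step weighting procedure above (Python with statsmodels assumed; the standard errors from such a sketch would still need the first-step adjustment just mentioned):

import numpy as np
import statsmodels.api as sm

def ipw_effects(y, d, X):
    y = np.asarray(y, float)
    d = np.asarray(d, float)
    Xc = sm.add_constant(np.asarray(X, float))
    pscore = sm.Logit(d, Xc).fit(disp=0).predict(Xc)           # 1st step: pi(X) = P(D=1|X) by logit
    ate = np.mean(y * (d - pscore) / (pscore * (1 - pscore)))  # weighting estimator for the ATE
    atet = np.sum(y * (d - pscore) / (1 - pscore)) / d.sum()   # weighting estimator for the ATET
    return ate, atet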

Matching in Stata
Need to download the packages with:
ssc install nnmatch, replace
ssc install psmatch2, replace
nnmatch does nearest-neighbour (covariate) matching but no propensity score matching. psmatch2 does propensity score matching and also Mahalanobis matching, but no exact matching.
Matching in R has some more options compared to Stata.

Ceteris paribus interpretation of regression
5 Gauss-Markov assumptions:
1. The true model is $Y = \alpha_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon$ with $E[\varepsilon] = 0$ (linearity).
2. No perfect collinearity (you cannot write $X_1$ as a linear combination of the other $X_j$'s).
3. Homoscedastic errors: $E[\varepsilon_i^2] = \sigma^2$ (in matrix notation $E[\varepsilon \varepsilon'] = \sigma^2 I$).
4. Uncorrelated errors: $E[\varepsilon_i \varepsilon_j] = 0$.
5. $E[\varepsilon \mid X_1, X_2] = 0$ (exogenous explanatory variables, no endogeneity).
Conditions 1 and 5 imply $E[Y \mid X_1, X_2] = \alpha_0 + \beta_1 X_1 + \beta_2 X_2$. This allows a ceteris paribus interpretation of the betas: all other relevant factors being equal, an increase of $X_1$ by one unit increases Y by $\beta_1$.
For a causal interpretation of regression, some extra caution is needed:
- The relationship must be specified in the correct direction, i.e. X causes (precedes) Y and not the inverse. Remark: if causation runs in both directions (think of a feedback loop), condition 5 will be violated (simultaneity).
- There may not be a causal relationship between treatment and covariates. Ex: the effect of parents' education on the school results of their children, keeping income constant, underestimates the causal effect of parents' education on their children's results, because the indirect effect (via income) is not included.
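A small simulation (illustrative Python, not from the slides) in which assumptions 1 and 5 hold by construction, so each estimated coefficient recovers the ceteris paribus effect:

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)    # X1 and X2 are correlated
eps = rng.normal(size=n)              # E[eps | X1, X2] = 0 by construction
y = 1.0 + 2.0 * x1 + 3.0 * x2 + eps

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # close to (1, 2, 3): each coefficient is the effect holding the other regressor fixed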

The effect of smoking «all else being equal»
[Diagram: smoking -> death rate, with common causes of both: encouragement by peers in the past, alcohol, age, gender, other health conditions, car accidents, genetics. Error = all other factors.]
$E[\varepsilon \mid X] \neq 0$, $cov(\varepsilon, X) \neq 0$: $\varepsilon$ and X are driven by common factors.

The effect of smoking «all else being equal»
[Second version of the diagram: smoking, death rate, encouragement by peers in the past, age, gender, alcohol, other health conditions, car accidents, genetics, error.]

Sources of endogeneity
Assume that the real model is $Y = \alpha_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$.
Assume that we omit $X_2$ and estimate $Y = \alpha_0 + \beta_1 X_1 + u$ with $u = \beta_2 X_2 + \varepsilon$. Then
$E[X_1 u] = E[X_1 (\beta_2 X_2 + \varepsilon)] = \beta_2 E[X_1 X_2] + E[X_1 \varepsilon] \neq 0$
if $X_1$ and $X_2$ are correlated and Y and $X_2$ are correlated ($\beta_2 \neq 0$), hence $E[u \mid X_1] \neq 0$.
Leaving out a confounder creates an endogeneity bias (in all betas).
Other causes of endogeneity (which can be framed as an omitted variable problem):
- Measurement errors correlated with X and Y.
- Simultaneity: X causes Y but also Y causes X (e.g. prices as a function of concentration in the aviation sector; supply and demand functions).
- $Y_{t-1}$ as a regressor when errors are serially correlated ($\varepsilon_{t-1}$ affects $\varepsilon_t$ and $Y_{t-1}$ as well).
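A small simulation of the omitted-variable bias above (illustrative Python; the bias of the short regression equals $\beta_2 \, Cov(X_1, X_2)/Var(X_1)$):

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)                 # confounder correlated with X1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X_short = np.column_stack([np.ones(n), x1])        # short regression omitting X2: u = 3*X2 + eps
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
print(b_short[1])   # about 3.5 = 2 + 3*0.5: beta_1 plus beta_2 * Cov(X1,X2)/Var(X1)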

Smoking example again
Assume we want to know the relationship between death rates and the number of cigarettes per day, but we omit age => the error is correlated with X.
[Figure: Y = death rate against X = # cigarettes per day, contrasting the observed relationship with the unbiased relationship for Y conditional on confounders; a young person has a negative error coming from age.]

Matching vs regression
Consider a regression of the form $Y = \alpha_0 + \alpha_1 D + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$.
- The standard condition of exogenous regressors, $E[\varepsilon \mid D, X_1, X_2] = 0$, boils down to the condition $(Y_0, Y_1) \perp D \mid X$: all variables affecting the outcome and the treatment probability at the same time are included in the model. The same holds if D is not a dummy but a continuous variable representing a continuum of treatments and a continuum of counterfactual scenarios.
- The estimated $\alpha_1$ is a variance-weighted average treatment effect (roughly comparable to a variance-weighted multiple matching without replacement).
- Matching allows for a balance check (check that treated and controls have the same mean $X_1$, $X_2$), which is an advantage over OLS.
- Matching only needs linearity of the model for its bias correction; with exact matching, linearity of the model is not necessary at all.
- In general, OLS will be more efficient but at a higher risk of bias.
Both are based on the following assumptions:
- Selection on observables.
- Control variables may not have a causal relation with the treatment variable (otherwise, use structural equation modelling).
- Common support.
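One common way to run the balance check mentioned above is the standardized mean difference of each covariate between treated and (matched) control units; a minimal sketch in Python (illustrative, not from the slides), where values near zero indicate good balance:

import numpy as np

def standardized_diff(x, d):
    # x: one covariate; d: 1 = treated, 0 = control (after matching, pass the matched sample)
    x = np.asarray(x, float)
    d = np.asarray(d).astype(bool)
    m1, m0 = x[d].mean(), x[~d].mean()
    s = np.sqrt((x[d].var(ddof=1) + x[~d].var(ddof=1)) / 2)
    return (m1 - m0) / s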