Four Parameters of Interest in the Evaluation of Social Programs. James J. Heckman, Justin L. Tobias, Edward Vytlacil

Nuffield College, Oxford, August 2005

1 Introduction

This paper uses a latent variable framework to unite the recent treatment effect literature with the classical selection bias literature. We obtain simple closed-form expressions for four treatment parameters of interest for the textbook Gaussian selection model: the Average Treatment Effect (ATE), the effect of Treatment on the Treated (TT), the Local Average Treatment Effect (LATE) (Imbens and Angrist 1994), and the Marginal Treatment Effect (MTE) (Björklund and Moffitt 1987; Heckman 1997; Heckman and Vytlacil 1999, 2000a-b). We also discuss how one might approach estimation of the distributions associated with these parameters of interest.

2 Treatment Parameters in a Canonical Model

Consider a model of potential outcomes:

$$Y_1 = X\beta_1 + U_1, \qquad Y_0 = X\beta_0 + U_0, \qquad D^* = Z\theta + U_D. \tag{2.1}$$

Each agent is observed in only one state, so that either $Y_1$ or $Y_0$ is observed. The pair $(Y_1, Y_0)$ is never observed for any given person. The gain is denoted by $\Delta = Y_1 - Y_0$.

$D(Z)$ denotes the observed treatment decision: $D(Z) = 1$ denotes receipt of treatment and $D(Z) = 0$ denotes nonreceipt. $D^*$ is a latent variable which generates $D(Z)$:

$$D(Z) = 1[D^*(Z) \ge 0] = 1[Z\theta + U_D \ge 0]. \tag{2.2}$$

$1[A]$ is the indicator function which takes the value 1 if the event $A$ is true, 0 otherwise. In an extension of the Roy (1951) model, $D^* = Y_1 - Y_0 - C$, where $C$ represents the cost of participating in the treated state, so that $D = 1[Y_1 - Y_0 \ge C]$. $D(z)$ indicates whether or not the individual would have received treatment had her value of $Z$ been externally set to $z$, holding her unobserved $U_D$ constant.

Varying $Z$, we can manipulate an individual's probability of receiving treatment without affecting the potential outcomes. Assume $(U_1, U_0, U_D)$ is independent of $X$ and $Z$. $Y$ denotes observed earnings:

$$Y = D\,Y_1 + (1 - D)\,Y_0. \tag{2.3}$$

This is the switching regression model of Quandt (1972), Rubin's model (Rubin 1978), or the Roy model of income distribution (Roy 1951; Heckman and Honoré 1990). (Note 1)

Note 1: Amemiya (1985) has classified models of this type as generalized tobit models, and refers to the model in (2.1) as the Type 5 tobit model.
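To make the data-generating process concrete, here is a minimal simulation sketch of the model in (2.1) to (2.3). It is an illustration only: the scalar $X$ and $Z$, the function name `simulate`, and every default parameter value are hypothetical choices, and $U_1$ and $U_0$ are correlated here only through $U_D$.

```python
import math
import random

def simulate(n, beta1=1.0, beta0=0.5, theta=1.0,
             rho1=0.6, rho0=0.1, sigma1=1.0, sigma0=1.0, seed=0):
    """Draw n agents from the switching-regression model (2.1)-(2.3).

    Scalar X and Z and all default values are hypothetical choices.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        u_d = rng.gauss(0, 1)                 # Var(U_D) normalized to 1
        # U_j has standard deviation sigma_j and Corr(U_j, U_D) = rho_j.
        u1 = sigma1 * (rho1 * u_d + math.sqrt(1 - rho1**2) * rng.gauss(0, 1))
        u0 = sigma0 * (rho0 * u_d + math.sqrt(1 - rho0**2) * rng.gauss(0, 1))
        y1 = x * beta1 + u1                   # potential outcome if treated
        y0 = x * beta0 + u0                   # potential outcome if untreated
        d = 1 if z * theta + u_d >= 0 else 0  # selection rule (2.2)
        y = d * y1 + (1 - d) * y0             # observed outcome (2.3)
        rows.append({"x": x, "z": z, "d": d, "y": y, "y1": y1, "y0": y0})
    return rows

sample = simulate(5)
for r in sample:
    print(r["d"], round(r["y"], 3))
```

In real data only `x`, `z`, `d` and `y` would be available; the simulation keeps `y1` and `y0` so that the treatment parameters can be checked against their definitions.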

Example: estimating the return to a college education. $Y$ represents log earnings, $Y_1$ denotes the log earnings of college graduates and $Y_0$ denotes the log earnings of those not selecting into higher education. The latent index $D^*$ maps people into either the college (or treated) state or the no-college (or untreated) state. The expected college log wage premium for given characteristics $X$ is $X(\beta_1 - \beta_0)$. (Note 2)

Note 2: Other applications which fit directly into this model include Lee (1978) and Willis and Rosen (1979).

We examine four treatment parameters: the Average Treatment Effect (ATE), the effect of Treatment on the Treated (TT), the Local Average Treatment Effect (LATE), and the Marginal Treatment Effect (MTE).

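Average Treatment Effect (ATE): the expected gain from participating in the program for a randomly chosen individual. With $\Delta = Y_1 - Y_0$ the gain from program participation and $n$ the sample size,

$$\text{ATE}(x) = E(\Delta \mid X = x) = x(\beta_1 - \beta_0),$$

$$\text{ATE} = E(\Delta) = \int \text{ATE}(x)\, dF(x) \approx \frac{1}{n}\sum_{i=1}^{n} \text{ATE}(x_i) = \bar{x}(\beta_1 - \beta_0).$$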

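Treatment on the Treated (TT):

$$\begin{aligned}
\text{TT}(x, z, D(z) = 1) &= E(\Delta \mid X = x, Z = z, D(z) = 1) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X = x, Z = z, U_D \ge -z\theta) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D \ge -z\theta).
\end{aligned} \tag{2.4}$$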

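We can obtain an unconditional estimate by integrating over the distribution of the characteristics among those with $D = 1$. TT can be approximated as follows:

$$\text{TT} = E(\Delta \mid D(Z) = 1) = \int \text{TT}(x, z, D(z) = 1)\, dF(x, z \mid D(z) = 1) \approx \frac{1}{n_1}\sum_{i=1}^{n} D_i\, \text{TT}(x_i, z_i, D(z_i) = 1), \tag{2.5}$$

where $n_1 = \sum_{i=1}^{n} D_i$ is the number of treated observations.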
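The definitions of ATE and TT can be checked by simulation, where both potential outcomes are visible. The sketch below is a hypothetical illustration (scalar $X$ and $Z$, normal errors, arbitrary parameter values, $\sigma_1 = \sigma_0 = 1$): it averages the individual gain over everyone for ATE and over the treated only for TT.

```python
import math
import random

rng = random.Random(1)
beta1, beta0, theta = 1.0, 0.5, 1.0   # hypothetical parameter values
rho1, rho0 = 0.8, 0.1                 # Corr(U_1, U_D) and Corr(U_0, U_D)

gains, treated_gains = [], []
for _ in range(200_000):
    x, z = rng.gauss(1, 1), rng.gauss(0, 1)
    u_d = rng.gauss(0, 1)
    u1 = rho1 * u_d + math.sqrt(1 - rho1**2) * rng.gauss(0, 1)
    u0 = rho0 * u_d + math.sqrt(1 - rho0**2) * rng.gauss(0, 1)
    delta = x * (beta1 - beta0) + (u1 - u0)   # individual gain Y_1 - Y_0
    gains.append(delta)
    if z * theta + u_d >= 0:                  # D(z) = 1
        treated_gains.append(delta)

ate = sum(gains) / len(gains)                 # sample analog of E(Delta)
tt = sum(treated_gains) / len(treated_gains)  # sample analog of E(Delta | D = 1)
print(f"ATE ~ {ate:.3f}, TT ~ {tt:.3f}")
```

Because the gain $U_1 - U_0$ is positively correlated with $U_D$ under these values, those who select into treatment gain more on average, so the simulated TT exceeds the simulated ATE.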

Local Average Treatment Effect (LATE) of Imbens and Angrist (1994): LATE is defined as the expected outcome gain for those induced to receive treatment through a change in the instrument from $Z = z$ to $Z = z'$. We define the LATE parameter as a change in the index from $Z\theta = z\theta$ to $Z\theta = z'\theta$, where $z'\theta > z\theta$ and $z$ and $z'$ are identical except for their $k$th coordinate. We could equivalently define the treatment parameters in terms of the propensity score, $P(z) = \Pr(D = 1 \mid Z = z) = 1 - F_{U_D}(-z\theta)$, where $F_{U_D}$ denotes the cdf of the random variable $U_D$.

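The LATE parameter:

$$\begin{aligned}
\text{LATE}(D(z) = 0, D(z') = 1, X = x) &= E(\Delta \mid D(z) = 0, D(z') = 1, X = x) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta, X = x) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta).
\end{aligned}$$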

There are two ways to define the unconditional version of LATE. First, consider

$$\text{LATE}(D(z) = 0, D(z') = 1) = \int \text{LATE}(D(z) = 0, D(z') = 1, X = x)\, dF(x) \approx \frac{1}{n}\sum_{i=1}^{n} \text{LATE}(D(z) = 0, D(z') = 1, X = x_i). \tag{2.6}$$

The parameter $\text{LATE}(D(z) = 0, D(z') = 1)$ is the treatment effect for individuals who would not select into treatment if their $Z$ vector was set to $z$ but would select into treatment if $Z$ was set to $z'$.

An alternative definition of the unconditional version of LATE is to let $z_0(Z)$ equal $Z$ and let $z_1(Z)$ equal $Z$ but with the $k$th element replaced by $z_k'$.

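Second definition of the unconditional version of LATE:

$$\text{LATE}(D(z_0(Z)) = 0, D(z_1(Z)) = 1) = \int \text{LATE}(D(z_0(z)) = 0, D(z_1(z)) = 1, X = x)\, dF(x, z) \approx \frac{1}{n}\sum_{i=1}^{n} \text{LATE}(D(z_0(z_i)) = 0, D(z_1(z_i)) = 1, X = x_i). \tag{2.7}$$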

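Marginal Treatment Effect (MTE) (Björklund and Moffitt 1987; Heckman 1997; Heckman and Smith 1998; Heckman and Vytlacil 1999, 2000a-b):

$$\begin{aligned}
\text{MTE}(x, u) &= E(\Delta \mid X = x, U_D = u) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X = x, U_D = u) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = u),
\end{aligned} \tag{2.8}$$

where the third equality follows from $(U_1, U_0, U_D)$ being independent of $X$.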

Evaluating the MTE parameter at low values of $u$ averages the outcome gain for those with unobservables making them least likely to participate, while evaluating it at high values of $u$ gives the gain for those individuals with unobservables which make them most likely to participate. Since $X$ is independent of $U_D$, the MTE parameter unconditional on observed covariates can be written as

$$\text{MTE}(u) = \int \text{MTE}(x, u)\, dF(x) \approx \frac{1}{n}\sum_{i=1}^{n} \text{MTE}(x_i, u) = \bar{x}(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = u).$$

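The MTE parameter can also be expressed as the limit form of the LATE parameter:

$$\begin{aligned}
\lim_{z\theta \to z'\theta} \text{LATE}(D(z) = 0, D(z') = 1, X = x) &= x(\beta_1 - \beta_0) + \lim_{z\theta \to z'\theta} E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = -z'\theta) \\
&= \text{MTE}(x, -z'\theta).
\end{aligned}$$

The MTE parameter measures the average gain in outcomes for those individuals who are just indifferent to treatment when the index $Z\theta$ is fixed at the value $-u$.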

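3 Simple Expressions for the Different Treatment Parameters in the General Case

Textbook normal model:

$$\begin{pmatrix} U_D \\ U_1 \\ U_0 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{1D} & \sigma_{0D} \\ \sigma_{1D} & \sigma_1^2 & \sigma_{10} \\ \sigma_{0D} & \sigma_{10} & \sigma_0^2 \end{pmatrix} \right),$$

where, because $\text{Var}(U_D)$ is normalized to 1, $\sigma_{jD} = \rho_j\sigma_j$ with $\rho_j = \text{Corr}(U_j, U_D)$ for $j = 0, 1$.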

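Treatment on the Treated (TT) is:

$$\text{TT}(x, z, D(z) = 1) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\frac{\phi(z\theta)}{\Phi(z\theta)},$$

where $\rho_j$ denotes the correlation between $U_j$ and $U_D$ and $\sigma_j$ the standard deviation of $U_j$. Thus, if $\text{Cov}(U_1 - U_0, U_D) = 0$, or $\rho_1\sigma_1 = \rho_0\sigma_0$, then TT = ATE. If $\text{Cov}(U_1 - U_0, U_D) > 0$, then TT > ATE.

For LATE, recall the mean of a doubly truncated normal (e.g. Cramer 1946 or Johnson, Kotz, and Balakrishnan 1992): if $V \sim N(\mu, \sigma^2)$ and $a < b$, then

$$E(V \mid a \le V \le b) = \mu + \sigma\,\frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)}, \qquad \alpha = \frac{a - \mu}{\sigma}, \quad \beta = \frac{b - \mu}{\sigma}.$$

Thus,

$$\begin{aligned}
\text{LATE}(D(z) = 0, D(z') = 1, X = x) &= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta) \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(z'\theta) - \phi(z\theta)}{\Phi(z'\theta) - \Phi(z\theta)}.
\end{aligned} \tag{3.1}$$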
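The closed-form TT and LATE expressions for the normal model are easy to evaluate directly. The following is a small sketch rather than a reference implementation: the function names, the scalar $x$, and the argument names `r1s1` and `r0s0` (standing for $\rho_1\sigma_1$ and $\rho_0\sigma_0$) are all assumptions made for illustration.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal: pdf is phi, cdf is Phi

def ate(x, b1, b0):
    """ATE(x) = x (beta_1 - beta_0) for scalar x."""
    return x * (b1 - b0)

def tt(x, z_theta, b1, b0, r1s1, r0s0):
    """TT(x, z, D(z)=1) in the normal model."""
    mills = N01.pdf(z_theta) / N01.cdf(z_theta)
    return ate(x, b1, b0) + (r1s1 - r0s0) * mills

def late(x, z_theta, zp_theta, b1, b0, r1s1, r0s0):
    """LATE for an index shift from z_theta to zp_theta > z_theta, as in (3.1)."""
    num = N01.pdf(zp_theta) - N01.pdf(z_theta)
    den = N01.cdf(zp_theta) - N01.cdf(z_theta)
    return ate(x, b1, b0) + (r1s1 - r0s0) * num / den

# Hypothetical values: x = 1, beta_1 = 1.0, beta_0 = 0.5.
print(ate(1.0, 1.0, 0.5))
print(tt(1.0, 0.3, 1.0, 0.5, r1s1=0.6, r0s0=0.1))
print(late(1.0, -1.0, 1.0, 1.0, 0.5, r1s1=0.6, r0s0=0.1))
```

Two quick sanity checks follow from the formulas: TT collapses to ATE when $\rho_1\sigma_1 = \rho_0\sigma_0$, and LATE equals ATE when the index moves symmetrically around zero, since then $\phi(z'\theta) = \phi(z\theta)$.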

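The Marginal Treatment Effect:

$$\begin{aligned}
\text{MTE}(x, u) &= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = u) \\
&= x(\beta_1 - \beta_0) + E(U_1 \mid U_D = u) - E(U_0 \mid U_D = u) \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,u.
\end{aligned}$$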
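The last step of the MTE expression uses the linearity of conditional means under joint normality. A short worked version, assuming as in the textbook normal model that $\text{Var}(U_D) = 1$ and writing $\rho_j$ for $\text{Corr}(U_j, U_D)$:

```latex
\begin{aligned}
E(U_j \mid U_D = u)
  &= \frac{\operatorname{Cov}(U_j, U_D)}{\operatorname{Var}(U_D)}\,u
   = \rho_j\sigma_j\,u, \qquad j = 0, 1, \\
E(U_1 - U_0 \mid U_D = u)
  &= E(U_1 \mid U_D = u) - E(U_0 \mid U_D = u)
   = (\rho_1\sigma_1 - \rho_0\sigma_0)\,u .
\end{aligned}
```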

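Limit form of LATE (Note 3):

$$\begin{aligned}
\lim_{z\theta \to z'\theta} \text{LATE}(D(z) = 0, D(z') = 1, X = x) &= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0) \lim_{z\theta \to z'\theta} \frac{\phi(z'\theta) - \phi(z\theta)}{\Phi(z'\theta) - \Phi(z\theta)} \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0) \lim_{z\theta \to z'\theta} \frac{\partial[\phi(z'\theta) - \phi(z\theta)]/\partial(z\theta)}{\partial[\Phi(z'\theta) - \Phi(z\theta)]/\partial(z\theta)} \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)(-z'\theta) = \text{MTE}(x, -z'\theta).
\end{aligned}$$

Note 3: The last line in this derivation follows from L'Hôpital's rule.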
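The limit can also be seen numerically. This sketch evaluates the ratio $[\phi(z'\theta) - \phi(z\theta)] / [\Phi(z'\theta) - \Phi(z\theta)]$ as $z\theta$ approaches $z'\theta$; the value 0.7 for $z'\theta$ and the helper name `truncated_mean_ratio` are arbitrary choices for illustration.

```python
from statistics import NormalDist

N01 = NormalDist()

def truncated_mean_ratio(t, t_prime):
    """[phi(t') - phi(t)] / [Phi(t') - Phi(t)], i.e. E(U_D | -t' <= U_D < -t)."""
    return (N01.pdf(t_prime) - N01.pdf(t)) / (N01.cdf(t_prime) - N01.cdf(t))

t_prime = 0.7  # hypothetical value of z'theta
for gap in (1.0, 0.1, 0.01, 0.001):
    ratio = truncated_mean_ratio(t_prime - gap, t_prime)
    print(f"gap={gap:<6} ratio={ratio:.5f}  limit={-t_prime:.5f}")
```

As the gap shrinks, the ratio approaches $-z'\theta$, which is the value of $u$ at which the MTE is evaluated.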

Evaluating the MTE when $u$ is large corresponds to the case where the average outcome gain is evaluated for those individuals with unobservables making them most likely to participate (and conversely when $u$ is small). When $u = 0$, MTE = ATE as a consequence of the symmetry of the normal distribution.

Non-Normal Extensions

Following Lee (1982, 1983), the trivariate normal model can be generalized by exploiting the natural flexibility of the selection equation. In the latent variable framework, the selection rule assigns people to the treated state ($D = 1$) provided $U_D \ge -Z\theta$. This is equivalent to setting $D = 1$ when $J(U_D) \ge J(-Z\theta)$ for some strictly increasing function $J$.

Suppose $U_D \sim F$, where $F$ is an absolutely continuous distribution function. For simplicity, assume symmetry of $F$ about zero so that $F(-a) = 1 - F(a)$. Define $J(a) \equiv \Phi^{-1}(F(a))$; then $\tilde{U}_D \equiv J(U_D)$ is a standard normal random variable.
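As an illustration of the transformation $J = \Phi^{-1} \circ F$, the sketch below takes $F$ to be the logistic cdf, which is one symmetric choice and is assumed here purely as an example. It checks that $\Phi(J(a)) = F(a)$ and that $J$ is odd under symmetry.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def logistic_cdf(a):
    """Logistic cdf: absolutely continuous and symmetric about zero."""
    return 1.0 / (1.0 + math.exp(-a))

def J(a, F=logistic_cdf):
    """J(a) = Phi^{-1}(F(a)): maps a draw from F into a standard normal draw."""
    return N01.inv_cdf(F(a))

for a in (-2.0, -0.5, 0.0, 0.5, 2.0):
    gap = N01.cdf(J(a)) - logistic_cdf(a)   # should be numerically zero
    print(a, round(J(a), 4), round(gap, 12))
```

Because $F$ is symmetric, $J(-a) = -J(a)$, which is what allows the selection rule $J(U_D) \ge J(-Z\theta)$ to be rewritten as $J(Z\theta) + \tilde{U}_D \ge 0$.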

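The original model in (2.1) is equivalent to the transformed model:

$$Y_1 = X\beta_1 + U_1, \qquad Y_0 = X\beta_0 + U_0, \qquad D^* = J(Z\theta) + \tilde{U}_D.$$

We now assume $[\tilde{U}_D\ U_1\ U_0]'$ is trivariate normal and obtain the following selection-corrected conditional mean functions:

$$E(Y_1 \mid D(z) = 1, X = x, Z = z) = x\beta_1 + \rho_1\sigma_1\,\frac{\phi(J(z\theta))}{F(z\theta)}, \tag{3.2}$$

$$E(Y_0 \mid D(z) = 0, X = x, Z = z) = x\beta_0 - \rho_0\sigma_0\,\frac{\phi(J(z\theta))}{1 - F(z\theta)}. \tag{3.3}$$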

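The treatment parameters become:

$$\text{TT}(x, z, D(z) = 1) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(J(z\theta))}{F(z\theta)},$$

$$\text{LATE}(D(z) = 0, D(z') = 1, X = x) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(J(z'\theta)) - \phi(J(z\theta))}{F(z'\theta) - F(z\theta)},$$

$$\text{MTE}(x, u) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,J(u).$$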
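A sketch of these three expressions under a logit link follows. The logistic choice of $F$, the function names, and the shorthand arguments `x_gain` (for $x(\beta_1 - \beta_0)$) and `k` (for $\rho_1\sigma_1 - \rho_0\sigma_0$) are assumptions made for illustration, not notation from the slides.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def F(a):
    """Logistic link, symmetric about zero (an assumed example of F)."""
    return 1.0 / (1.0 + math.exp(-a))

def J(a):
    """J = Phi^{-1} composed with F."""
    return N01.inv_cdf(F(a))

def tt(x_gain, z_theta, k):
    """TT: x_gain = x(beta_1 - beta_0), k = rho_1*sigma_1 - rho_0*sigma_0."""
    return x_gain + k * N01.pdf(J(z_theta)) / F(z_theta)

def late(x_gain, z_theta, zp_theta, k):
    """LATE for an index shift from z_theta to zp_theta > z_theta."""
    num = N01.pdf(J(zp_theta)) - N01.pdf(J(z_theta))
    return x_gain + k * num / (F(zp_theta) - F(z_theta))

def mte(x_gain, u, k):
    """MTE evaluated at U_D = u."""
    return x_gain + k * J(u)

print(tt(0.5, 0.3, 0.5), late(0.5, 0.0, 1.0, 0.5), mte(0.5, 1.0, 0.5))
```

With a normal link, $F = \Phi$ and $J$ is the identity, so these functions reduce to the textbook normal expressions.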

A less straightforward generalization can be achieved by following Lee (1982, 1983) and allowing the disturbances in (14) to be jointly distributed according to the Student-$t$ distribution. $t_v(\mu, \Sigma)$ denotes the multivariate Student-$t$ density function with mean $\mu$, scale matrix $\Sigma$ (variance equal to $[v/(v - 2)]\Sigma$) and $v$ degrees of freedom. (Note 4) Let $t_v$ denote the standardized univariate Student-$t$ density with mean 0 and scale parameter equal to 1. Let $T_v$ denote the associated cdf.

Note 4: The mean exists when $v > 1$ and the variance exists when $v > 2$.

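Letting $U_D \sim F$, we define $J_v(a) \equiv T_v^{-1}(F(a))$ as before, again noting that $J_v(-a) = -J_v(a)$. Assume $[\tilde{U}_D\ U_1\ U_0]'$ has a trivariate $t_v(0, \Sigma)$ density. Then

$$E(Y_1 \mid D(z) = 1, X = x, Z = z) = x\beta_1 + \rho_1\sigma_1 \left( \frac{v + [J_v(z\theta)]^2}{v - 1} \right) \frac{t_v(J_v(z\theta))}{F(z\theta)},$$

$$E(Y_0 \mid D(z) = 0, X = x, Z = z) = x\beta_0 - \rho_0\sigma_0 \left( \frac{v + [J_v(z\theta)]^2}{v - 1} \right) \frac{t_v(J_v(z\theta))}{1 - F(z\theta)}.$$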

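Define

$$\psi_v(a) \equiv \left( \frac{v + [J_v(a)]^2}{v - 1} \right) t_v(J_v(a)).$$

Then

$$\text{TT}(x, z, D(z) = 1) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\psi_v(z\theta)}{F(z\theta)},$$

$$\text{LATE}(D(z) = 0, D(z') = 1, X = x) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\psi_v(z'\theta) - \psi_v(z\theta)}{F(z'\theta) - F(z\theta)},$$

$$\text{MTE}(x, u) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,J_v(u).$$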
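The Student-$t$ correction term rests on the truncated mean $E(T \mid T \ge -c) = \frac{v + c^2}{v - 1}\, t_v(c) / T_v(c)$ for a standardized $t_v$ variable. The sketch below checks this identity by brute-force numerical integration using only the standard library; the integration cutoff, step count, and function names are arbitrary choices for illustration.

```python
import math

def t_pdf(u, v):
    """Standardized Student-t density with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + u * u / v) ** (-(v + 1) / 2)

def integrate(f, lo, hi, steps=40_000):
    """Plain midpoint rule; adequate for these smooth integrands."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

UPPER = 200.0  # hypothetical cutoff standing in for +infinity

def truncated_mean_numeric(c, v):
    """E(T | T >= -c) computed directly from its definition."""
    num = integrate(lambda u: u * t_pdf(u, v), -c, UPPER)
    den = integrate(lambda u: t_pdf(u, v), -c, UPPER)
    return num / den

def truncated_mean_closed(c, v):
    """((v + c^2) / (v - 1)) * t_v(c) / T_v(c), with T_v(c) = P(T >= -c)."""
    T_c = integrate(lambda u: t_pdf(u, v), -c, UPPER)
    return (v + c * c) / (v - 1) * t_pdf(c, v) / T_c

for c in (-0.5, 0.5, 1.5):
    print(c, round(truncated_mean_numeric(c, 5), 5), round(truncated_mean_closed(c, 5), 5))
```

The two columns should agree closely for $v = 5$; for small $v$ the heavy tails make the finite cutoff matter more, so the cutoff would need to grow.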

3.1 Estimation

1. Obtain $\hat{\theta}$ from a probit model on the decision to take the treatment.
2. Compute the appropriate selection correction terms evaluated at $\hat{\theta}$ (i.e. $\phi(Z\hat{\theta})/\Phi(Z\hat{\theta})$ when $D = 1$, and $\phi(Z\hat{\theta})/(1 - \Phi(Z\hat{\theta}))$ when $D = 0$).
3. Run treatment-outcome-specific regressions (for the groups $\{i : D_i = 1\}$ and $\{i : D_i = 0\}$) with the inclusion of the appropriate selection-correction terms obtained from the previous step.
4. Given $\hat{\beta}_0$, $\hat{\beta}_1$, $\widehat{\rho_1\sigma_1}$ and $\widehat{\rho_0\sigma_0}$ obtained from step 3, and $\hat{\theta}$ from step 1, use these parameter estimates to obtain point estimates of the treatment parameters for given $x$, $z$, and $z'$. Alternatively, one could integrate over the distribution of the characteristics to obtain unconditional estimates, as suggested in section 2.
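The four steps above can be sketched end to end on simulated data using only the standard library. This is an illustration under stated assumptions, not the authors' code: the data-generating values, the helper names (`solve`, `ols`, `probit`), and the use of Fisher scoring for the probit are all choices made here. The untreated correction term is entered with a minus sign so that its coefficient estimates $\rho_0\sigma_0$ directly.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(rows, y):
    """Least squares via the normal equations (fine for a few regressors)."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(zs, ds, iters=15):
    """Step 1: Fisher scoring for Pr(D = 1 | z) = Phi(a + b z)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g, H = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
        for z, d in zip(zs, ds):
            t = a + b * z
            P = min(max(N01.cdf(t), 1e-10), 1 - 1e-10)
            phi = N01.pdf(t)
            w = phi / (P * (1 - P))
            v = (1.0, z)
            for i in range(2):
                g[i] += w * (d - P) * v[i]          # score
                for j in range(2):
                    H[i][j] += w * phi * v[i] * v[j]  # expected information
        da, db = solve(H, g)
        a, b = a + da, b + db
    return a, b

# Simulated data; every numeric value below is a hypothetical choice.
rng = random.Random(0)
data = []
for _ in range(10_000):
    x, z, ud = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    d = 1 if 0.3 + z + ud >= 0 else 0
    if d:
        y = 2.0 + 1.0 * x + 0.6 * ud + 0.5 * rng.gauss(0, 1)  # rho1*sigma1 = 0.6
    else:
        y = 1.0 + 0.5 * x + 0.1 * ud + 0.5 * rng.gauss(0, 1)  # rho0*sigma0 = 0.1
    data.append((x, z, d, y))

a_hat, t_hat = probit([r[1] for r in data], [r[2] for r in data])   # step 1
fit = {}
for grp in (1, 0):                                                   # steps 2 and 3
    sub = [r for r in data if r[2] == grp]
    idx = [a_hat + t_hat * r[1] for r in sub]
    lam = [N01.pdf(t) / N01.cdf(t) if grp else -N01.pdf(t) / (1 - N01.cdf(t))
           for t in idx]
    fit[grp] = ols([(1.0, r[0], l) for r, l in zip(sub, lam)], [r[3] for r in sub])

k_hat = fit[1][2] - fit[0][2]      # estimate of rho1*sigma1 - rho0*sigma0
x0, zt = 1.0, 0.5                  # step 4 at a hypothetical x and index value
ate = (fit[1][0] - fit[0][0]) + x0 * (fit[1][1] - fit[0][1])
tt = ate + k_hat * N01.pdf(zt) / N01.cdf(zt)
print(round(ate, 3), round(tt, 3), round(k_hat, 3))
```

In applied work a statistics package would normally supply the probit and regression steps along with standard errors that account for the estimated $\hat{\theta}$; the hand-rolled versions here only keep the sketch self-contained.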

Table 1: Point Estimates and Standard Errors (in parentheses) of Alternate Treatment Parameters

| Outcome Errors / Link Function | SSR | ATE | TT | LATE |
|---|---|---|---|---|
| Normal / Normal | 345.25 | .092 (.03) | .039 (.04) | .079 (.03) |
| t(v=2) / Logit | 346.09 | .061 (.02) | .036 (.03) | .053 (.02) |
| t(v=3) / Logit | 345.79 | .073 (.02) | .035 (.03) | .062 (.02) |
| t(v=4) / Logit | 345.61 | .079 (.02) | .035 (.04) | .067 (.03) |
| t(v=5) / Logit | 345.51 | .082 (.03) | .034 (.04) | .069 (.03) |
| t(v=6) / Logit | 345.44 | .084 (.03) | .034 (.04) | .071 (.03) |
| t(v=8) / Logit | 345.36 | .085 (.03) | .034 (.04) | .073 (.03) |
| t(v=12) / Logit | 345.29 | .087 (.03) | .034 (.04) | .073 (.04) |
| t(v=24) / Logit | 345.23 | .088 (.04) | .033 (.04) | .075 (.03) |
| t(v=2) / t(v=2) | 345.68 | .067 (.03) | .028 (.04) | .058 (.03) |
| t(v=3) / t(v=3) | 345.56 | .075 (.03) | .030 (.04) | .063 (.03) |
| t(v=4) / t(v=4) | 345.48 | .079 (.03) | .031 (.04) | .066 (.03) |
| t(v=5) / t(v=5) | 345.43 | .082 (.03) | .032 (.04) | .069 (.03) |
| t(v=6) / t(v=6) | 345.40 | .084 (.03) | .033 (.04) | .070 (.03) |
| t(v=8) / t(v=8) | 345.36 | .086 (.03) | .034 (.04) | .072 (.03) |
| t(v=12) / t(v=12) | 345.32 | .088 (.03) | .036 (.04) | .075 (.03) |
| t(v=24) / t(v=24) | 345.29 | .090 (.03) | .037 (.04) | .077 (.03) |

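Table 2: Coefficients and Standard Errors for Application of Section 5

| Variable | Coefficient | Standard Error |
|---|---|---|
| College State | | |
| Constant | 1.85 | .225 |
| g (Ability) | .092 | .053 |
| Northeast | .124 | .055 |
| South | .059 | .057 |
| Experience | .098 | .044 |
| Experience squared | -.004 | .003 |
| Urban | .326 | .072 |
| Unemp. Rate | -.002 | .002 |
| Selection correction term at $Z\hat{\theta}$ | -.165 | .081 |
| No-College State | | |
| Constant | 1.89 | .424 |
| g (Ability) | .191 | .036 |
| Northeast | .126 | .057 |
| South | -.046 | .053 |
| Experience | .043 | .067 |
| Experience squared | -.001 | .003 |
| Urban | .136 | .051 |
| Unemp. Rate | .001 | .002 |
| Selection correction term at $Z\hat{\theta}$ | .097 | .094 |
| Selection Equation | | |
| Constant | -.478 | .149 |
| MomCollege | .541 | .112 |
| DadCollege | .603 | .097 |
| Numsibs | -.069 | .024 |
| g (Ability) | .754 | .048 |
| Urban18 | .096 | .131 |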

Figure 1: $E(\tilde{U}_D \mid \tilde{U}_D > J(u))$ for Various Specifications of the Outcome Disturbances / Link Function. [Plot omitted. Series: t(v=2) / Normal, t(v=20) / Normal, t(v=2) / t(v=2), Normal / Normal.]

Distributions of Treatment on the Treated and Marginal Treatment Effects Using Normal and $t_2$ Models. Generated NORMAL Data. 1,000 Replications with N = 1,500.

Figure 2: Treatment on the Treated with Z = 2. True Value ≈ 2.28. [Plot omitted. Series: Normal, t(2).]

Figure 3: Marginal Treatment Effect with $u_D = 1$. True Value ≈ 1.54. [Plot omitted. Series: Normal, t(4). Horizontal axis: Marginal Treatment Effect.]

Distributions of Treatment on the Treated and Marginal Treatment Effects Using Normal and $t_2$ Models. Generated $t_4$ Data. 1,000 Replications with N = 2,500.

Figure 4: Treatment on the Treated with Z = 2. True Value ≈ 2.64. [Plot omitted. Series: Normal, t(4). Horizontal axis: Treatment on the Treated.]

Figure 5: Marginal Treatment Effect with $u_D = 2$. True Value ≈ 2.08. [Plot omitted. Series: Normal, t(4). Horizontal axis: Marginal Treatment Effect.]

Figure 6: Probability of Correctly Choosing Normal Model Over $t_2$ Model Using MSE Criterion. 1,000 Iterations. [Plot omitted. Vertical axis: Probability of Choosing Correctly. Horizontal axis: Number of Observations. Series: $\rho_{1D} = .95$, $\rho_{0D} = .1$; $\rho_{1D} = .5$, $\rho_{0D} = .1$; $\rho_{1D} = .2$, $\rho_{0D} = .1$.]

Figure 7: Plots of Marginal Treatment Effects Across Alternate Models (Unscaled). [Plot omitted. Vertical axis: Marginal Treatment Effect (MTE). Horizontal axis: $U_D$. Series: Normal / Normal, t(2) / Logit, t(2) / t(2), t(24) / t(24).]