Four Parameters of Interest in the Evaluation of Social Programs
James J. Heckman, Justin L. Tobias, Edward Vytlacil
Nuffield College, Oxford, August 2005

1 Introduction

This paper uses a latent variable framework to unite the recent treatment effect literature with the classical selection bias literature. We obtain simple closed-form expressions for four treatment parameters of interest for the textbook Gaussian selection model: the Average Treatment Effect (ATE), the effect of Treatment on the Treated (TT), the Local Average Treatment Effect (LATE) (Imbens and Angrist 1994), and the Marginal Treatment Effect (MTE) (Björklund and Moffitt 1987; Heckman 1997; Heckman and Vytlacil 1999, 2000a-b). We also discuss how one might approach estimation of the distributions associated with these parameters of interest.

2 Treatment Parameters in a Canonical Model

Consider a model of potential outcomes:

$$Y_1 = X\beta_1 + U_1, \qquad Y_0 = X\beta_0 + U_0, \qquad D^* = Z\theta + U_D. \tag{2.1}$$

Each agent is observed in only one state, so that either $Y_1$ or $Y_0$ is observed. The pair $(Y_1, Y_0)$ is never observed for any given person. The gain is denoted by $\Delta = Y_1 - Y_0$.

$D(Z)$ denotes the observed treatment decision: $D(Z) = 1$ denotes receipt of treatment and $D(Z) = 0$ denotes nonreceipt. $D^*$ is a latent variable which generates $D(Z)$:

$$D(Z) = 1[D^*(Z) \ge 0] = 1[Z\theta + U_D \ge 0]. \tag{2.2}$$

$1[\cdot]$ is the indicator function, which takes the value 1 if the event is true and 0 otherwise. The model is an extension of the Roy (1951) model, in which $D^* = Y_1 - Y_0 - C$, where $C$ represents the cost of participating in the treated state, so that $D(Z) = 1[\Delta \ge C]$. $D(z)$ indicates whether or not the individual would have received treatment had her value of $Z$ been externally set to $z$, holding her unobserved $U_D$ constant.

Varying $Z$, we can manipulate an individual's probability of receiving treatment without affecting the potential outcomes. Assume $(U_1, U_0, U_D)$ is independent of $X$ and $Z$. $Y$ denotes observed earnings:

$$Y = D Y_1 + (1 - D) Y_0. \tag{2.3}$$

This is the switching regression model of Quandt (1972), Rubin's model (Rubin 1978), or the Roy model of income distribution (Roy 1951; Heckman and Honoré 1990).[1]

[1] Amemiya (1985) has classified models of this type as generalized tobit models, and refers to the model in (1) as the Type 5 tobit model.
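As an illustration of (2.1)-(2.3), the following Python sketch draws a single agent and applies the selection rule and the observation rule. The scalar covariates, the Gaussian errors, the loadings `c1` and `c0` on $U_D$, and the function name `simulate_agent` are assumptions made for this example only; they are not taken from the slides.

```python
import random


def simulate_agent(rng, x, z, beta1, beta0, theta, c1, c0):
    """Draw one agent from a hypothetical Gaussian version of (2.1)-(2.3).

    Assumed error structure (illustrative only): U_j = c_j * U_D + noise,
    so Cov(U_j, U_D) = c_j.
    """
    u_d = rng.gauss(0.0, 1.0)
    u1 = c1 * u_d + rng.gauss(0.0, 1.0)
    u0 = c0 * u_d + rng.gauss(0.0, 1.0)
    y1 = x * beta1 + u1                      # potential outcome, treated state
    y0 = x * beta0 + u0                      # potential outcome, untreated state
    d = 1 if z * theta + u_d >= 0 else 0     # selection rule (2.2)
    y = d * y1 + (1 - d) * y0                # observed outcome (2.3)
    return {"y1": y1, "y0": y0, "d": d, "y": y}
```

Only `d` and `y` would be available to the analyst; `y1` and `y0` are returned here so that the gain $\Delta$ can be inspected in a simulation.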

Example: estimating the return to a college education. $Y$ represents log earnings, $Y_1$ denotes the log earnings of college graduates, and $Y_0$ denotes the log earnings of those not selecting into higher education. The latent index $D^*$ maps people into either the college (treated) state or the no-college (untreated) state. The expected college log wage premium for given characteristics $x$ is $E(Y_1 - Y_0 \mid X = x)$.[2]

[2] Other applications which fit directly into this model include Lee (1978) and Willis and Rosen (1979).

We examine four treatment parameters: the Average Treatment Effect (ATE), the effect of Treatment on the Treated (TT), the Local Average Treatment Effect (LATE), and the Marginal Treatment Effect (MTE).

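Average Treatment Effect (ATE): the expected gain from participating in the program for a randomly chosen individual. $\Delta = Y_1 - Y_0$ is the gain from program participation, and $n$ is the sample size.

$$\mathrm{ATE}(x) = E(\Delta \mid X = x) = x(\beta_1 - \beta_0)$$

$$\mathrm{ATE} = E(\Delta) = \int \mathrm{ATE}(x)\, dF(x) \approx \frac{1}{n}\sum_{i=1}^{n} \mathrm{ATE}(x_i) = \bar{x}(\beta_1 - \beta_0)$$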

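Treatment on the Treated (TT):

$$\begin{aligned}
\mathrm{TT}(x, z, D(z) = 1) &= E(\Delta \mid X = x, Z = z, D(z) = 1) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X = x, Z = z, U_D \ge -z\theta) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D \ge -z\theta)
\end{aligned} \tag{2.4}$$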

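We can obtain an unconditional estimate by integrating over the characteristics of the treated. With $n_1$ denoting the number of individuals with $D = 1$, TT can be approximated as follows:

$$\begin{aligned}
\mathrm{TT} &= E(\Delta \mid D(Z) = 1) \\
&= \int \mathrm{TT}(x, z, D(z) = 1)\, dF(x, z \mid D(Z) = 1) \\
&\approx \frac{1}{n_1}\sum_{i=1}^{n} D_i\, \mathrm{TT}(x_i, z_i, D(z_i) = 1)
\end{aligned} \tag{2.5}$$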

Local Average Treatment Effect (LATE) of Imbens and Angrist (1994): LATE is defined as the expected outcome gain for those induced to receive treatment through a change in the instrument from $Z = z$ to $Z = z'$. Here we define the LATE parameter through a change in the index from $Z\theta = z\theta$ to $Z\theta = z'\theta$, where $z'\theta > z\theta$, and $z$ and $z'$ are identical except for their $k$th coordinate. We could equivalently define the treatment parameters in terms of the propensity score, $P(z) = \Pr(D = 1 \mid Z = z) = 1 - F_{U_D}(-z\theta)$, where $F_{U_D}$ denotes the cdf of the random variable $U_D$.

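The LATE parameter:

$$\begin{aligned}
\mathrm{LATE}(D(z) = 0, D(z') = 1, X = x) &= E(\Delta \mid D(z) = 0, D(z') = 1, X = x) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta, X = x) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta)
\end{aligned}$$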

There are two ways to define the unconditional version of LATE. First, consider

$$\mathrm{LATE}(D(z) = 0, D(z') = 1) = \int \mathrm{LATE}(D(z) = 0, D(z') = 1, X = x)\, dF(x) \approx \frac{1}{n}\sum_{i=1}^{n} \mathrm{LATE}(D(z) = 0, D(z') = 1, X = x_i). \tag{2.6}$$

The parameter $\mathrm{LATE}(D(z) = 0, D(z') = 1)$ is the treatment effect for individuals who would not select into treatment if their $Z$ vector was set to $z$ but would select into treatment if $Z$ was set to $z'$.

An alternative definition of the unconditional version of LATE is to let $z_0(z)$ equal $z$ and to let $z_1(z)$ equal $z$ but with the $k$th element replaced by $z_k'$.

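The second definition of the unconditional version of LATE:

$$\begin{aligned}
\mathrm{LATE}(D(z_0(Z)) = 0, D(z_1(Z)) = 1) &= \int \mathrm{LATE}(D(z_0(z)) = 0, D(z_1(z)) = 1, X = x)\, dF(x, z) \\
&\approx \frac{1}{n}\sum_{i=1}^{n} \mathrm{LATE}(D(z_0(z_i)) = 0, D(z_1(z_i)) = 1, X = x_i)
\end{aligned} \tag{2.7}$$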

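Marginal Treatment Effect (MTE) (Björklund and Moffitt 1987; Heckman 1997; Heckman and Smith 1998; Heckman and Vytlacil 1999, 2000a-b):

$$\begin{aligned}
\mathrm{MTE}(x, u) &= E(\Delta \mid X = x, U_D = u) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X = x, U_D = u) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = u)
\end{aligned} \tag{2.8}$$

where the third equality follows from $(U_1, U_0, U_D)$ being independent of $X$.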

Evaluated at low values of $u$, the MTE averages the outcome gain for those with unobservables making them least likely to participate, while evaluation of the MTE parameter at high values of $u$ gives the gain for those individuals with unobservables which make them most likely to participate. Since $X$ is independent of $U_D$, the MTE parameter unconditional on observed covariates can be written as

$$\mathrm{MTE}(u) = \int \mathrm{MTE}(x, u)\, dF(x) \approx \frac{1}{n}\sum_{i=1}^{n} \mathrm{MTE}(x_i, u) = \bar{x}(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = u).$$

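The MTE parameter can also be expressed as the limit form of the LATE parameter:

$$\begin{aligned}
\lim_{z'\theta \to z\theta} \mathrm{LATE}(D(z) = 0, D(z') = 1, X = x) &= x(\beta_1 - \beta_0) + \lim_{z'\theta \to z\theta} E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta) \\
&= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = -z\theta) \\
&= \mathrm{MTE}(x, -z\theta)
\end{aligned}$$

The MTE parameter measures the average gain in outcomes for those individuals who are just indifferent to treatment when the index is fixed at the value $z\theta$.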

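3 Simple Expressions for the Different Treatment Parameters in the General Case

Textbook normal model:

$$\begin{pmatrix} U_D \\ U_1 \\ U_0 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{1D} & \sigma_{0D} \\ \sigma_{1D} & \sigma_1^2 & \sigma_{10} \\ \sigma_{0D} & \sigma_{10} & \sigma_0^2 \end{pmatrix} \right)$$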

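Treatment on the Treated (TT) is:

$$\mathrm{TT}(x, z, D(z) = 1) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(z\theta)}{\Phi(z\theta)},$$

where $\rho_j$ denotes the correlation between $U_j$ and $U_D$ (so that $\rho_j\sigma_j = \sigma_{jD}$), and $\phi$ and $\Phi$ are the standard normal pdf and cdf. Thus, if $\mathrm{Cov}(U_1 - U_0, U_D) = 0$, or $\rho_1\sigma_1 = \rho_0\sigma_0$, then TT = ATE. If $\mathrm{Cov}(U_1 - U_0, U_D) > 0$, then TT > ATE; if it is negative, TT < ATE. (For the underlying truncated normal results see, e.g., Cramer 1946 or Johnson, Kotz, and Balakrishnan 1992.)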
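The closed form for TT in the normal model can be evaluated directly. The sketch below is only an illustration: the function name `tt_normal` and its scalar arguments (`x_gain` for $x(\beta_1 - \beta_0)$, `index` for $z\theta$, `cov_gain` for $\rho_1\sigma_1 - \rho_0\sigma_0$) are assumptions introduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides pdf (phi) and cdf (Phi)


def tt_normal(x_gain, index, cov_gain):
    """TT(x, z, D(z)=1) in the textbook normal model.

    x_gain   : x(beta1 - beta0), i.e. ATE(x)
    index    : z * theta
    cov_gain : rho1*sigma1 - rho0*sigma0 = Cov(U1 - U0, U_D)
    """
    return x_gain + cov_gain * STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)
```

For example, `tt_normal(0.1, 0.0, 0.5)` exceeds the ATE of 0.1 because the covariance term is positive, while `tt_normal(0.1, 0.0, 0.0)` returns the ATE itself.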

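If $y \sim N(\mu, \sigma^2)$ and $a < b$, then

$$E(y \mid a \le y \le b) = \mu + \sigma\,\frac{\phi(\tilde{a}) - \phi(\tilde{b})}{\Phi(\tilde{b}) - \Phi(\tilde{a})}, \qquad \tilde{a} = \frac{a - \mu}{\sigma}, \quad \tilde{b} = \frac{b - \mu}{\sigma}.$$

Thus,

$$\begin{aligned}
\mathrm{LATE}(D(z) = 0, D(z') = 1, X = x) &= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid -z'\theta \le U_D < -z\theta) \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(z'\theta) - \phi(z\theta)}{\Phi(z'\theta) - \Phi(z\theta)}
\end{aligned} \tag{3.1}$$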
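A small numerical sketch of (3.1) follows. The function name `late_normal` and the argument names (`index_lo` for $z\theta$, `index_hi` for $z'\theta$) are assumptions made for this illustration.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def late_normal(x_gain, index_lo, index_hi, cov_gain):
    """LATE from (3.1), with index_lo = z*theta < index_hi = z'*theta.

    The individuals shifted into treatment are those with
    -index_hi <= U_D < -index_lo.
    """
    if not index_lo < index_hi:
        raise ValueError("requires z*theta < z'*theta")
    num = N01.pdf(index_hi) - N01.pdf(index_lo)
    den = N01.cdf(index_hi) - N01.cdf(index_lo)
    return x_gain + cov_gain * num / den
```

When the two index values are symmetric about zero (for instance -1 and 1), the numerator vanishes and the function returns `x_gain`, so LATE coincides with ATE(x) in that special case.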

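The Marginal Treatment Effect:

$$\begin{aligned}
\mathrm{MTE}(x, u) &= x(\beta_1 - \beta_0) + E(U_1 - U_0 \mid U_D = u) \\
&= x(\beta_1 - \beta_0) + E(U_1 \mid U_D = u) - E(U_0 \mid U_D = u) \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\, u
\end{aligned}$$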

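Limit form of LATE:[3]

$$\begin{aligned}
\mathrm{MTE}(x, -z\theta) &= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0) \lim_{z'\theta \to z\theta} \frac{\phi(z'\theta) - \phi(z\theta)}{\Phi(z'\theta) - \Phi(z\theta)} \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0) \lim_{z'\theta \to z\theta} \frac{[\phi(z'\theta) - \phi(z\theta)]/(z'\theta - z\theta)}{[\Phi(z'\theta) - \Phi(z\theta)]/(z'\theta - z\theta)} \\
&= x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)(-z\theta)
\end{aligned}$$

[3] The last line in this derivation follows from L'Hôpital's rule.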

Evaluating the MTE when $u$ is large corresponds to the case where the average outcome gain is evaluated for those individuals with unobservables making them most likely to participate (and conversely when $u$ is small). When $u = 0$, MTE = ATE as a consequence of the symmetry of the normal distribution.
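The limit relationship between LATE and MTE in the normal model can be checked numerically. The following sketch shrinks the index interval $[z\theta, z\theta + h]$ and compares LATE with the MTE at $u = -z\theta$; the helper names and the step size `h` are assumptions chosen for illustration.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def mte_normal(x_gain, u, cov_gain):
    """MTE(x, u) = x(beta1 - beta0) + (rho1*sigma1 - rho0*sigma0) * u."""
    return x_gain + cov_gain * u


def late_normal(x_gain, index_lo, index_hi, cov_gain):
    """LATE for an index change from index_lo = z*theta to index_hi = z'*theta."""
    num = N01.pdf(index_hi) - N01.pdf(index_lo)
    den = N01.cdf(index_hi) - N01.cdf(index_lo)
    return x_gain + cov_gain * num / den


def late_minus_mte(index, h, x_gain=0.0, cov_gain=1.0):
    """Gap between LATE over [index, index + h] and MTE at u = -index."""
    late = late_normal(x_gain, index, index + h, cov_gain)
    return late - mte_normal(x_gain, -index, cov_gain)
```

Calling `late_minus_mte(0.8, h)` with successively smaller `h` should give a gap that shrinks toward zero, and `mte_normal(x_gain, 0.0, cov_gain)` returns `x_gain`, the ATE.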

Non-Normal Extensions

Following Lee (1982, 1983), the trivariate normal model can be generalized by exploiting the natural flexibility of the selection equation. In the latent variable framework, the selection rule assigns people to the treated state ($D = 1$) provided $Z\theta + U_D \ge 0$. This is equivalent to setting $D = 1$ when $J(U_D) \ge J(-Z\theta)$ for some strictly increasing function $J$.

Suppose $U_D \sim F$, where $F$ is an absolutely continuous distribution function. For simplicity, assume symmetry of $F$ about zero, so that $F(-a) = 1 - F(a)$. Then $J_\Phi(U_D) \equiv \Phi^{-1}(F(U_D))$ is a standard normal random variable.

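The original model in (1) is equivalent to the transformed model

$$Y_1 = X\beta_1 + U_1, \qquad Y_0 = X\beta_0 + U_0, \qquad D^* = J_\Phi(Z\theta) + U_D^*,$$

where $U_D^* = J_\Phi(U_D)$. Now assume $[U_D^* \; U_1 \; U_0]'$ is trivariate normal. We obtain the following selection-corrected conditional mean functions:

$$E(Y_1 \mid D(z) = 1, X = x, Z = z) = x\beta_1 + \rho_1\sigma_1\,\frac{\phi(J_\Phi(z\theta))}{F(z\theta)} \tag{3.2}$$

$$E(Y_0 \mid D(z) = 0, X = x, Z = z) = x\beta_0 - \rho_0\sigma_0\,\frac{\phi(J_\Phi(z\theta))}{1 - F(z\theta)} \tag{3.3}$$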

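$$\mathrm{TT}(x, z, D(z) = 1) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(J_\Phi(z\theta))}{F(z\theta)}$$

$$\mathrm{LATE}(D(z) = 0, D(z') = 1, X = x) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\phi(J_\Phi(z'\theta)) - \phi(J_\Phi(z\theta))}{F(z'\theta) - F(z\theta)}$$

$$\mathrm{MTE}(x, u) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\, J_\Phi(u)$$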
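To make the transformation concrete, the sketch below evaluates $J_\Phi$, TT, and MTE for a logistic link (the "Logit" link function reported in Table 1). The function names and the choice to pass the cdf $F$ as an argument are assumptions made for this example.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def logistic_cdf(a):
    """F(a) for a standard logistic disturbance (symmetric about zero)."""
    return 1.0 / (1.0 + math.exp(-a))


def j_phi(a, cdf=logistic_cdf):
    """J_Phi(a) = Phi^{-1}(F(a)). inv_cdf requires 0 < F(a) < 1."""
    return N01.inv_cdf(cdf(a))


def tt_link(x_gain, index, cov_gain, cdf=logistic_cdf):
    """TT with a general symmetric link F: uses phi(J_Phi(z*theta)) / F(z*theta)."""
    return x_gain + cov_gain * N01.pdf(j_phi(index, cdf)) / cdf(index)


def mte_link(x_gain, u, cov_gain, cdf=logistic_cdf):
    """MTE with a general link: linear in J_Phi(u) rather than in u."""
    return x_gain + cov_gain * j_phi(u, cdf)
```

Passing `cdf=N01.cdf` makes $J_\Phi$ the identity, so `tt_link` and `mte_link` reduce to the normal-model expressions.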

A less straightforward generalization can be achieved by following Lee (1982, 1983) and taking the errors in (14) to be jointly distributed according to the Student-t distribution. $t_v(\mu, \Sigma)$ denotes the multivariate Student-t density function with mean $\mu$, scale matrix $\Sigma$ (variance equal to $[v/(v-2)]\Sigma$) and $v$ degrees of freedom.[4] Let $t_v$ denote the standardized univariate Student-t density with mean 0 and scale parameter equal to 1, and let $T_v$ denote the associated cdf.

[4] The mean exists when $v > 1$ and the variance exists when $v > 2$.

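Letting $U_D \sim F$, we define $J_{T_v}(U_D) \equiv T_v^{-1}(F(U_D))$ as before, again noting that $J_{T_v}(-a) = -J_{T_v}(a)$. Assume $[U_D^* \; U_1 \; U_0]'$ has a trivariate $t_v(0, \Sigma)$ density. Then

$$E(Y_1 \mid D(z) = 1, X = x, Z = z) = x\beta_1 + \rho_1\sigma_1 \left( \frac{v + [J_{T_v}(z\theta)]^2}{v - 1} \right) \left( \frac{t_v(J_{T_v}(z\theta))}{F(z\theta)} \right)$$

$$E(Y_0 \mid D(z) = 0, X = x, Z = z) = x\beta_0 - \rho_0\sigma_0 \left( \frac{v + [J_{T_v}(z\theta)]^2}{v - 1} \right) \left( \frac{t_v(J_{T_v}(z\theta))}{1 - F(z\theta)} \right)$$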

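Define

$$\lambda_v(a) \equiv \left( \frac{v + [J_{T_v}(a)]^2}{v - 1} \right) t_v(J_{T_v}(a)).$$

Then

$$\mathrm{TT}(x, z, D(z) = 1) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\lambda_v(z\theta)}{F(z\theta)}$$

$$\mathrm{LATE}(D(z) = 0, D(z') = 1, X = x) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\,\frac{\lambda_v(z'\theta) - \lambda_v(z\theta)}{F(z'\theta) - F(z\theta)}$$

$$\mathrm{MTE}(x, u) = x(\beta_1 - \beta_0) + (\rho_1\sigma_1 - \rho_0\sigma_0)\, J_{T_v}(u)$$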
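The factor $(v + c^2)/(v - 1) \cdot t_v(c)$ appearing in the Student-t expressions is the integral of $s\, t_v(s)$ over $s \ge -c$; dividing it by $T_v(c)$ gives the truncated mean $E(U_D^* \mid U_D^* \ge -c)$. The sketch below checks that identity by brute-force numerical integration. The function names, the midpoint rule, and the truncation point `upper` are assumptions chosen for illustration.

```python
import math


def t_pdf(s, v):
    """Standardized univariate Student-t density with v degrees of freedom."""
    const = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return const * (1 + s * s / v) ** (-(v + 1) / 2)


def truncated_moment_closed_form(c, v):
    """Closed form for the integral of s * t_v(s) over s >= -c (needs v > 1)."""
    return (v + c * c) / (v - 1) * t_pdf(c, v)


def truncated_moment_numeric(c, v, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the same integral on [-c, upper]."""
    h = (upper + c) / steps
    total = 0.0
    for i in range(steps):
        s = -c + (i + 0.5) * h
        total += s * t_pdf(s, v)
    return total * h
```

The numerical version ignores the tail beyond `upper`, which is small for moderate `v` but grows as `v` approaches 1, where the mean ceases to exist.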

3.1 Estimation

1. Obtain $\hat\theta$ from a probit model on the decision to take the treatment.

2. Compute the appropriate selection correction terms evaluated at $\hat\theta$ (i.e. $\phi(z\hat\theta)/\Phi(z\hat\theta)$ when $D = 1$, and $\phi(z\hat\theta)/(1 - \Phi(z\hat\theta))$ when $D = 0$).

3. Run treatment-outcome-specific regressions (for the groups $\{i : D_i = 1\}$ and $\{i : D_i = 0\}$) with the inclusion of the appropriate selection-correction terms obtained from the previous step. With the terms written as in step 2, the coefficient on the $D = 0$ correction term estimates $-\rho_0\sigma_0$, by (3.3).

4. Given $\hat\beta_0$, $\hat\beta_1$, $\widehat{\rho_1\sigma_1}$ and $\widehat{\rho_0\sigma_0}$ obtained from step 3, and $\hat\theta$ from step 1, use these parameter estimates to obtain point estimates of the treatment parameters for given $x$, $z$, and $z'$. Alternatively, one could integrate over the distribution of the characteristics to obtain unconditional estimates, as suggested in section 2.
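The four steps above can be sketched in pure Python for the normal model with one scalar regressor in each equation. Everything here is an illustrative assumption rather than the authors' implementation: the Fisher-scoring probit, the normal-equations OLS, the simulated data-generating process in `simulate`, and all function names.

```python
import random
from statistics import NormalDist

N01 = NormalDist()


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least squares via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, iters=20):
    """Step 1: probit maximum likelihood by Fisher scoring."""
    k = len(rows[0])
    theta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            idx = sum(a * b for a, b in zip(r, theta))
            p = min(max(N01.cdf(idx), 1e-10), 1 - 1e-10)
            f = N01.pdf(idx)
            for i in range(k):
                grad[i] += r[i] * f * (di - p) / (p * (1 - p))
                for j in range(k):
                    info[i][j] += r[i] * r[j] * f * f / (p * (1 - p))
        theta = [t + s for t, s in zip(theta, solve(info, grad))]
    return theta


def two_step(x, z, d, y):
    """Steps 1-3; each equation has an intercept plus one scalar regressor."""
    theta = probit([[1.0, zi] for zi in z], d)
    fits = {}
    for state in (1, 0):
        rows, ys = [], []
        for xi, zi, di, yi in zip(x, z, d, y):
            if di != state:
                continue
            idx = theta[0] + theta[1] * zi
            # Step 2: phi/Phi when D = 1, phi/(1 - Phi) when D = 0.
            denom = N01.cdf(idx) if state == 1 else 1.0 - N01.cdf(idx)
            rows.append([1.0, xi, N01.pdf(idx) / denom])
            ys.append(yi)
        fits[state] = ols(rows, ys)  # Step 3: [intercept, slope, selection coef]
    return theta, fits[1], fits[0]


def tt_estimate(theta, b1, b0, x, z):
    """Step 4 for TT: the D = 0 selection coefficient estimates -rho0*sigma0."""
    idx = theta[0] + theta[1] * z
    x_gain = (b1[0] - b0[0]) + (b1[1] - b0[1]) * x
    return x_gain + (b1[2] + b0[2]) * N01.pdf(idx) / N01.cdf(idx)


def simulate(n, seed=0):
    """Hypothetical data-generating process used only to exercise the sketch."""
    rng = random.Random(seed)
    x, z, d, y = [], [], [], []
    for _ in range(n):
        xi, zi, ud = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = 0.5 * ud + rng.gauss(0, 1)     # rho1*sigma1 = 0.5
        u0 = -0.3 * ud + rng.gauss(0, 1)    # rho0*sigma0 = -0.3
        di = 1 if 0.2 + 1.0 * zi + ud >= 0 else 0
        yi = (1.0 + 0.5 * xi + u1) if di else (0.5 + 0.3 * xi + u0)
        x.append(xi); z.append(zi); d.append(di); y.append(yi)
    return x, z, d, y
```

A typical use would be `theta, b1, b0 = two_step(*simulate(8000, seed=1))` followed by `tt_estimate(theta, b1, b0, x=0.0, z=0.0)`. Standard errors are not computed here; second-step standard errors would need to account for the estimation of $\hat\theta$ in step 1.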

Table 1: Point Estimates and Standard Errors of Alternate Treatment Parameters
(standard errors in parentheses)

Outcome Errors / Link Function   SSR      ATE          TT           LATE
Normal / Normal                  345.25   .092 (.03)   .039 (.04)   .079 (.03)
t(v=2) / Logit                   346.09   .061 (.02)   .036 (.03)   .053 (.02)
t(v=3) / Logit                   345.79   .073 (.02)   .035 (.03)   .062 (.02)
t(v=4) / Logit                   345.61   .079 (.02)   .035 (.04)   .067 (.03)
t(v=5) / Logit                   345.51   .082 (.03)   .034 (.04)   .069 (.03)
t(v=6) / Logit                   345.44   .084 (.03)   .034 (.04)   .071 (.03)
t(v=8) / Logit                   345.36   .085 (.03)   .034 (.04)   .073 (.03)
t(v=12) / Logit                  345.29   .087 (.03)   .034 (.04)   .073 (.04)
t(v=24) / Logit                  345.23   .088 (.04)   .033 (.04)   .075 (.03)
t(v=2) / t(v=2)                  345.68   .067 (.03)   .028 (.04)   .058 (.03)
t(v=3) / t(v=3)                  345.56   .075 (.03)   .030 (.04)   .063 (.03)
t(v=4) / t(v=4)                  345.48   .079 (.03)   .031 (.04)   .066 (.03)
t(v=5) / t(v=5)                  345.43   .082 (.03)   .032 (.04)   .069 (.03)
t(v=6) / t(v=6)                  345.40   .084 (.03)   .033 (.04)   .070 (.03)
t(v=8) / t(v=8)                  345.36   .086 (.03)   .034 (.04)   .072 (.03)
t(v=12) / t(v=12)                345.32   .088 (.03)   .036 (.04)   .075 (.03)
t(v=24) / t(v=24)                345.29   .090 (.03)   .037 (.04)   .077 (.03)

Table 2: Coefficients and Standard Errors for Application of Section 5

Variable                     Coefficient   Standard Error

College State
  Constant                    1.85          .225
  g(Ability)                   .092         .053
  Northeast                    .124         .055
  South                        .059         .057
  Experience                   .098         .044
  Experience^2                -.004         .003
  Urban                        .326         .072
  Unemp. Rate                 -.002         .002
  Selection term (Zθ̂)        -.165         .081

No-College State
  Constant                    1.89          .424
  g(Ability)                   .191         .036
  Northeast                    .126         .057
  South                       -.046         .053
  Experience                   .043         .067
  Experience^2                -.001         .003
  Urban                        .136         .051
  Unemp. Rate                  .001         .002
  Selection term (Zθ̂)         .097         .094

Selection Equation
  Constant                    -.478         .149
  MomCollege                   .541         .112
  DadCollege                   .603         .097
  Numsibs                     -.069         .024
  g(Ability)                   .754         .048
  Urban18                      .096         .131

Figure 1: $E(\tilde{U}_D \mid \tilde{U}_D > J(u))$ for Various Specifications of the Outcome Disturbances / Link Function. [Plot omitted: curves for t(v=2) / Normal, t(v=20) / Normal, t(v=2) / t(v=2), and Normal / Normal; vertical axis $E(U_D \mid U_D > J(x))$, horizontal axis $x$.]

Distributions of Treatment on the Treated and Marginal Treatment Effects Using Normal and $t_2$ Models. Generated Normal Data. 1,000 Replications with N = 1,500.

Figure 2: Treatment on the Treated with Z = 2: True Value ≈ 2.28. [Plot omitted: sampling distributions of the estimate under the Normal and t(2) models.]

Figure 3: Marginal Treatment Effect with $u_D = 1$: True Value ≈ 1.54. [Plot omitted: sampling distributions of the estimate under the Normal and t(4) models; axis label as printed: "Marginal Treatment Effect Z=2. True MTE 2.08."]

Distributions of Treatment on the Treated and Marginal Treatment Effects Using Normal and $t_2$ Models. Generated $t_4$ Data. 1,000 Replications with N = 2,500.

Figure 4: Treatment on the Treated with Z = 2: True Value ≈ 2.64. [Plot omitted: sampling distributions of the estimate under the Normal and t(4) models; horizontal axis: Treatment on the Treated, Z=2.]

Figure 5: Marginal Treatment Effect with $u_D = 2$: True Value ≈ 2.08. [Plot omitted: sampling distributions of the estimate under the Normal and t(4) models; horizontal axis: Marginal Treatment Effect.]

Figure 6: Probability of Correctly Choosing Normal Model Over $t_2$ Model Using MSE Criterion. 1,000 Iterations. [Plot omitted: one curve each for $\rho_{1D} = .95$, $\rho_{1D} = .5$, and $\rho_{1D} = .2$, all with $\rho_{0D} = .1$; vertical axis: Probability of Choosing Correctly; horizontal axis: Number of Observations, 0 to 1000.]

Figure 7: Plots of Marginal Treatment Effects Across Alternate Models (Unscaled). [Plot omitted: curves for Normal / Normal, t(2) / Logit, t(2) / t(2), and t(24) / t(24); vertical axis: Marginal Treatment Effect (MTE); horizontal axis: $U_D$.]