Estimating the Dynamic Effects of a Job Training Program with Multiple Alternatives

Kai Liu (University of Cambridge), Antonio Dalla-Zuanna (Norwegian School of Economics)
June 19, 2018

Introduction
Public job training programs raise two questions:
- whether they are effective in promoting skill accumulation for disadvantaged individuals
- whether a greater return on public spending could be had elsewhere
Existing evidence points to low or very modest returns from public job training (Barnow and Smith, 2015).

Introduction
Most of the literature focuses on addressing endogenous (initial) selection into training.
Credible estimates are now available thanks to experimental design.
The National Job Corps Study randomly assigned applicants to:
- a treatment group (given an offer to participate)
- a control group (excluded from participation)
Randomization is used as an instrument for participation to obtain the causal effect of participation (Schochet et al., 2008), that is, the average treatment effect among compliers (LATE).

Introduction
Even with experimental design, additional selection issues complicate the evaluation problem:
- Dynamic selection on when to quit
- Selection into alternative training/educational programs by:
  1. Non-participants
  2. Participants, after having completed some training
Yet these factors may be important to understand:
- heterogeneity in the returns to training programs
- how different training programs interact (e.g. complementarity in program returns)
- cost-benefit analysis
- (ultimately) optimal program design/targeting

Our paper: What We Do
We begin by building a non-parametric potential outcome framework in a dynamic and sequential choice setting, allowing for:
- flexible dynamic selection in program participation, AND
- flexible (unordered) choice of multiple alternatives in each period
With experimental variation in initial program participation, we show that:
- LATE is a mixture of a number of different sublates (each specific to one type of complier)
- sublates and their population shares are identified only under strong assumptions
- knowledge of the shares of sublates is essential for cost-benefit analysis

Our paper: What We Do
We then estimate a (semi-parametric) dynamic selection model using data and experimental variation from the National JC Study.
Using the estimated dynamic selection model, we quantify:
1. the different sublates and their proportions in the population
2. ATE and selection patterns in each potential period
3. how JC and alternative training programs interact:
   - (dynamic) complementarity in program returns
   - intertemporal substitution in program choices

Introduction: Relation to existing research
Our paper builds on the following literatures:
1. Program evaluation
   - Choice substitution: Heckman, Hohmann and Smith (2000); Kline and Walters (2016)
   - Dropouts and endogenous duration: Ham and LaLonde (1996); Heckman, Smith and Taber (1998)
   - Dynamic treatment effects: e.g. Taber (2000); Heckman and Navarro (2007); Heckman, Humphries and Veramendi (2016)
2. Returns to job training: Heckman, LaLonde and Smith (Handbook, 1999); Barnow and Smith (2015)

Plan of the talk
1. Data and the National Job Corps Study
2. Potential outcomes and choices in a dynamic setting
3. Dynamic selection model of program participation
4. Estimation Results (Preliminary)

The Job Corps Program
Job Corps (JC) is the largest vocationally focused education and training program for disadvantaged (means-tested) youths (16-24 years old).
Features of JC:
- Center-based; most centers (87%) are residential
- Personalized vocational training, academic education (providing a GED certificate) and other services
- After training is completed, JC provides placement services to help participants find a job or pursue additional education

Data and the National Job Corps Study
The National JC Study (NJCS) was conducted in the mid-1990s to experimentally evaluate the program.
- 73% of the treatment group enrolled in JC; mean duration is 8 months
- The control group was excluded from JC for 3 years (only 1.4% did not obey this rule)
- 4 surveys were conducted during NJCS: baseline (at assignment), 6, 12, 48 months
Our sample: all individuals who answered the last survey (around 80%) and reported duration in JC (dropped 3%).
Main outcome variable: average weekly earnings during the 16th quarter after random assignment (zero or missing earnings: 25%).

Data and the National Job Corps Study
Defining decision periods (S):
- 0 (randomization)
- 1 (1-3 months in JC)
- 2 (4-6 months in JC)
- 3 (7 months+ in JC)
Defining multiple choices (T):
- No training (n): no additional training after JC
- Alternative training (a): enrolled in any program other than JC
  - GED program (40%)
  - high school (29%)
  - vocational/technical/trade school (40%)
  - college (21%)

Proportions in Each Treatment Status (%)

Group              JC duration (months)   No training   Alternative   Total   Ratio (a/n)
Control (Z=0)      0                      29.3          66.3          95.6    2.26
Treatment (Z=1)    0                       8.5          19.6          28.1    2.30
                   1-3                     8.3          11.8          20.2    1.42
                   4-6                     5.6           7.9          13.5    1.41
                   7+                     16.7          21.5          38.2    1.29
                   Sum                    39.2          60.8         100.0    1.55
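As a quick arithmetic check on the last column, the a/n ratio is simply the alternative-training share divided by the no-training share. The sketch below recomputes it from the rounded shares in the table; the dictionary layout and function name are illustrative choices, not part of the original slides. Small differences from the printed column (for example in the treatment-group row with zero JC duration) would reflect rounding of the published shares.

```python
# Shares (%) copied from the table: (no training, alternative) by group and JC duration.
shares = {
    ("control", "0"): (29.3, 66.3),
    ("treatment", "0"): (8.5, 19.6),
    ("treatment", "1-3"): (8.3, 11.8),
    ("treatment", "4-6"): (5.6, 7.9),
    ("treatment", "7+"): (16.7, 21.5),
}


def alt_to_none_ratio(no_training, alternative):
    """Ratio a/n: alternative-training share over no-training share."""
    return alternative / no_training


ratios = {key: alt_to_none_ratio(n, a) for key, (n, a) in shares.items()}

for (group, duration), ratio in ratios.items():
    print(f"{group:9s} {duration:>3s}  a/n = {ratio:.2f}")
```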

ITT Decomposition

                        (1)        (2)              (3)        (4)
                        Earnings   Prob. Employed   Earnings   Prob. Employed
z                       21.1***    2.7***
                        (4.2)      (0.9)
z*n in Period 0                                     7.2        1.1
                                                    (9.4)      (2.2)
z*a in Period 0                                     21.1***    1.4
                                                    (7.5)      (1.5)
z*n after Period 1                                  -2.4       -2.0
                                                    (9.4)      (2.3)
z*a after Period 1                                  -1.6       -1.3
                                                    (8.5)      (1.9)
z*n after Period 2                                  20.7*      2.1
                                                    (12.0)     (2.6)
z*a after Period 2                                  21.7**     2.0
                                                    (10.7)     (2.2)
z*n after Period 3                                  35.9***    5.8***
                                                    (7.3)      (1.6)
z*a after Period 3                                  36.5***    6.6***
                                                    (7.0)      (1.4)
N                       10,792     10,537           10,792     10,537


Choice Structure: Dynamic Case
Potential duration of JC: t=0 (0 months), t=1 (1-3 months), t=2 (4-6 months) and t=3 (7 months+).
Sequential choice: JC can only start in period 0, and only for the Z=1 group, but a or n can be chosen in any period.
Once an individual opts out of JC: no recall.

Choices available in each period:

            Z = 0    Z = 1
Period 0    a, n     n, a, jc
Period 1             n, a, jc
Period 2             n, a, jc
Period 3             n, a

Potential Outcome
Potential outcome in the multiperiod setting:

$$Y_i = Y_i^{0,n} + \sum_{S} D_i(S,a)\,\big(Y_i^{S,a} - Y_i^{0,n}\big) + \sum_{S} D_i(S,n)\,\big(Y_i^{S,n} - Y_i^{0,n}\big)$$

- $Y_i^{S,T}$ is earnings for individual $i$ enrolled in JC for $S$ periods and who chooses treatment $T$ after $S$.
- $D_i = (S,T)$ identifies the treatment choice for individual $i$; $D_i(S,T)$ defines an indicator function for each possible choice: $D_i(S,T) = 1[D_i = (S,T)]$.
- Linking the realized and the potential treatment: $D_i(S,T) = D_i^0(S,T) + \big(D_i^1(S,T) - D_i^0(S,T)\big) Z_i$
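The switching logic above can be illustrated with a minimal sketch. The potential earnings values and the example individual are hypothetical, chosen only to show that the indicator-weighted sum picks out exactly one potential outcome; the function names are likewise illustrative.

```python
def realized_choice(d0, d1, z):
    """Return D_i given potential choices d0 (if Z=0), d1 (if Z=1) and assignment z."""
    return d1 if z == 1 else d0


def realized_outcome(potential_y, d0, d1, z):
    """Pick the potential outcome Y^{S,T} that matches the realized choice."""
    return potential_y[realized_choice(d0, d1, z)]


def realized_outcome_sum_form(potential_y, d0, d1, z):
    """Same object written as on the slide: baseline Y^{0,n} plus indicator-weighted gains."""
    d = realized_choice(d0, d1, z)
    base = potential_y[(0, "n")]
    return base + sum((1 if d == key else 0) * (y - base) for key, y in potential_y.items())


# Hypothetical weekly earnings for one individual, one value per (S, T) cell.
potential_y = {
    (0, "n"): 200, (0, "a"): 220,
    (1, "n"): 205, (1, "a"): 230,
    (2, "n"): 215, (2, "a"): 240,
    (3, "n"): 250, (3, "a"): 280,
}
d0, d1 = (0, "a"), (3, "n")  # hypothetical potential choices without / with the offer

y_if_control = realized_outcome(potential_y, d0, d1, z=0)
y_if_treated = realized_outcome(potential_y, d0, d1, z=1)
print(y_if_control, y_if_treated)
```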

Identifying Assumptions
In what follows we assume that:
1. $Z_i$ has no direct effect on $Y_i^{S,T}$, $\forall S, T$ (exclusion)
2. $Z_i$ is independent of $Y^{S,T}$ and $D_i^{Z}$, $\forall S, T$ (random assignment)
3. $\forall i$: $D_i^0 \neq D_i^1 \Rightarrow D_i^1 = (S,T)$, $S > 0$
   - Compliers must choose some JC when $Z = 1$
   - stable rank of next-best alternatives in period 0
In addition, we assume there are no Always Takers (data driven).

Compliance types: Dynamic Case
Compliers: those choosing JC for at least one period.
In this 3-period case there are 12 types of compliers:
- 6 types of compliers are those who select $D_i^0 = (0,t)$ and $D_i^1 = (s,t)$, $\forall s, t$ [e.g. $\pi_{Ca,1a} = P(D_i^0 = (0,a), D_i^1 = (1,a))$]
- the other 6 types select $D_i^0 = (0,t)$ and $D_i^1 = (s,v)$, $\forall s, t$, $v \neq t$ [e.g. $\pi_{Ca,3n} = P(D_i^0 = (0,a), D_i^1 = (3,n))$]
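The count of 12 complier types can be reproduced by enumerating the choice pairs. This is a small illustrative sketch; the variable names are not from the slides.

```python
from itertools import product

alternatives = ("a", "n")   # alternative training, no training
jc_durations = (1, 2, 3)    # compliers spend at least one period in JC when Z = 1

d0_choices = [(0, t) for t in alternatives]                          # choice when Z = 0
d1_choices = [(s, t) for s, t in product(jc_durations, alternatives)]  # choice when Z = 1

complier_types = list(product(d0_choices, d1_choices))
same_final = [(d0, d1) for d0, d1 in complier_types if d0[1] == d1[1]]
switch_final = [(d0, d1) for d0, d1 in complier_types if d0[1] != d1[1]]

print(len(complier_types), len(same_final), len(switch_final))
```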

Compliance types: Dynamic Case
Never Takers are of two types:
- Those who choose a: $\pi_{Aa} = P(D_i^0 = (0,a), D_i^1 = (0,a))$
- Those who choose n: $\pi_{An} = P(D_i^0 = (0,n), D_i^1 = (0,n))$

Interpreting LATE: 3 Periods Case
Suppose we ignore the duration of JC and do not distinguish alternatives (as in Schochet et al. (2008)).
Single choice: D can be either 0 (no JC) or 1 (some JC).
LATE is estimated using the Wald estimator:

$$LATE = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{P(D_i^1 = 1) - P(D_i^0 = 1)}$$

We now interpret this LATE parameter under a dynamic sequential choice structure with multiple alternatives, and contrast it with a static model with multiple alternatives.
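A minimal sketch of the Wald estimator on a toy dataset follows. The eight observations are made up for illustration and have nothing to do with the NJCS data; participation shares are estimated by the sample means of D within each assignment arm.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_late(y, d, z):
    """Wald estimator: difference in mean outcomes over difference in participation rates."""
    y1 = mean(yi for yi, zi in zip(y, z) if zi == 1)
    y0 = mean(yi for yi, zi in zip(y, z) if zi == 0)
    p1 = mean(di for di, zi in zip(d, z) if zi == 1)
    p0 = mean(di for di, zi in zip(d, z) if zi == 0)
    return (y1 - y0) / (p1 - p0)


# Toy data (hypothetical): z = offer, d = any JC participation, y = outcome.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 0]
y = [10, 12, 14, 4, 6, 4, 6, 4]

late = wald_late(y, d, z)
print(round(late, 3))
```

Here the intention-to-treat difference is 10 - 5 = 5 and the participation difference is 0.75 - 0, so the estimate is 5 / 0.75.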

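Interpreting LATE: 3 Periods Case
Define $\pi_C$ as

$$\pi_C \equiv \sum_{s=1}^{3} \pi_{Ca,sa} + \sum_{s=1}^{3} \pi_{Cn,sn} + \sum_{s=1}^{3} \pi_{Ca,sn} + \sum_{s=1}^{3} \pi_{Cn,sa}$$

The Wald estimator can be decomposed into

$$\sum_{s=1}^{3} \frac{\pi_{Ca,sa}}{\pi_C}\, E\big[Y_i^{s,a} - Y_i^{0,a} \mid D_i^1(s,a) = 1, D_i^0(0,a) = 1\big]$$
$$+ \sum_{s=1}^{3} \frac{\pi_{Cn,sn}}{\pi_C}\, E\big[Y_i^{s,n} - Y_i^{0,n} \mid D_i^1(s,n) = 1, D_i^0(0,n) = 1\big]$$
$$+ \sum_{s=1}^{3} \frac{\pi_{Ca,sn}}{\pi_C}\, E\big[Y_i^{s,n} - Y_i^{0,a} \mid D_i^1(s,n) = 1, D_i^0(0,a) = 1\big]$$
$$+ \sum_{s=1}^{3} \frac{\pi_{Cn,sa}}{\pi_C}\, E\big[Y_i^{s,a} - Y_i^{0,n} \mid D_i^1(s,a) = 1, D_i^0(0,n) = 1\big]$$

where, as in the definition of the complier types, the first index of $\pi$ gives the choice under $Z = 0$ and the second the choice under $Z = 1$. This implies that there are 12 sublates.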

Comparing with the Static Case
Consider the static case where individuals make mutually exclusive choices in one period.

            Z = 0    Z = 1
Choices     a, n     n, a, jc

Compliers are of two types:
- $\pi_{Ca} = P(D_i^0 = a, D_i^1 = jc)$
- $\pi_{Cn} = P(D_i^0 = n, D_i^1 = jc)$
LATE can be decomposed into two sublates (Kline and Walters 2016; Kirkeboen, Mogstad, Leuven 2016):

$$\frac{\pi_{Cn}}{\pi_{Cn} + \pi_{Ca}}\, E\big[Y_i^{jc} - Y_i^{n} \mid D_i^1 = jc, D_i^0 = n\big] + \frac{\pi_{Ca}}{\pi_{Cn} + \pi_{Ca}}\, E\big[Y_i^{jc} - Y_i^{a} \mid D_i^1 = jc, D_i^0 = a\big]$$

Static vs. Dynamic Case
The dynamic case provides additional parameters of interest:
- (dynamic) complementarity in program returns, e.g. $Y^{3,n} - Y^{0,n} < Y^{3,a} - Y^{0,a}$ or $Y^{0,a} - Y^{0,n} < Y^{3,a} - Y^{3,n}$
- effect of program duration (e.g. $Y^{S,n} - Y^{0,n}$, $\forall S$)
- intertemporal choice substitution: $D^0 = a$, $D^1 = (S,n)$, $S > 0$ (static model tends to over-predict program substitution)
- intertemporal complementarity between JC and alternative: $D^0 = n$, $D^1 = (S,a)$, $S > 0$ (ruled out in static case)
In the dynamic potential outcome framework with a single instrument, we cannot non-parametrically identify the sublates.
- static model: (with assumptions) using Z interacted with covariates as additional IV

Relevance of Separate Effect Estimation for Different Compliers
Knowing the shares of the different types of compliers is essential for conducting a cost-benefit analysis:
- if the cost of JC changes depending on the number of periods spent in JC
- because we also need to take into account the costs of the alternative treatments


Dynamic Selection Model
For each potential duration $s$ ($s \in \{0, 1, 2, 3\}$), utilities from each potential choice:

$$U^n_{is} = 0 \qquad (1)$$
$$U^a_{is} = \beta^a_s X_i + \theta_{ia} + u^a_{is} \qquad (2)$$
$$U^j_{is} = \beta^j_s X^j_i + \theta_{ij} + u^j_{is} \qquad (3)$$

$\beta^a_s$ and $\beta^j_s$ vary flexibly with potential JC duration:
- state dependence in JC participation (may vary by X)
- experience from JC makes the alternative program more attractive
Permanent unobserved factors follow a bivariate normal distribution:

$$(\theta_{ia}, \theta_{ij}) \sim BN(0, 0, \sigma_a, \sigma_j, \rho) \qquad (4)$$

$u^a_{is}$, $u^j_{is}$ are i.i.d. (variances normalized to 1).
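To make the sequential structure concrete, here is a rough simulation sketch of the choice model. Several simplifications are assumptions made for illustration only: the covariate is a single scalar `x` shared by both utilities, the transitory shocks are standard normal, $\sigma_a$ and $\sigma_j$ are treated as standard deviations, and all coefficient values are made up. It is not the authors' estimation code.

```python
import math
import random


def draw_theta(rng, sigma_a, sigma_j, rho):
    """Draw (theta_a, theta_j) from a mean-zero bivariate normal with correlation rho."""
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    theta_a = sigma_a * e1
    theta_j = sigma_j * (rho * e1 + math.sqrt(1 - rho ** 2) * e2)
    return theta_a, theta_j


def simulate_path(rng, z, x, beta_a, beta_j, theta_a, theta_j):
    """Return (S, T): number of periods spent in JC and the choice made on leaving."""
    for s in range(4):
        utilities = {
            "n": 0.0,
            "a": beta_a[s] * x + theta_a + rng.gauss(0, 1),
        }
        # JC is available only to the treatment group and only while still enrolled
        # (no recall); period 3 is the last one, so it offers only a or n.
        if z == 1 and s < 3:
            utilities["j"] = beta_j[s] * x + theta_j + rng.gauss(0, 1)
        choice = max(utilities, key=utilities.get)
        if choice != "j":
            return s, choice
    raise AssertionError("unreachable: period 3 never offers JC")


rng = random.Random(0)
beta_a = [0.2, 0.3, 0.3, 0.4]   # hypothetical beta^a_s for s = 0..3
beta_j = [0.5, 0.4, 0.3]        # hypothetical beta^j_s for s = 0..2

paths = []
for i in range(1000):
    z = i % 2  # alternate assignment, for illustration
    theta_a, theta_j = draw_theta(rng, 1.0, 1.0, 0.3)
    paths.append((z, simulate_path(rng, z, 1.0, beta_a, beta_j, theta_a, theta_j)))
```

Because JC is never in the control group's choice set, every simulated control observation ends with S = 0, mirroring the no-Always-Takers assumption.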

Dynamic Selection Model
Parameterize potential outcomes:

$$Y_i^{S,T} = \alpha^{S,T} + \beta_w^{S,T} X_i + \gamma_a^{S,T} \theta_{ia} + \gamma_j^{S,T} \theta_{ij} + \varepsilon_i^{S,T} \qquad (5)$$

$\gamma_a^{S,T}$ and $\gamma_j^{S,T}$ capture how potential outcomes vary with unobserved factors:
- $H_0: \gamma_a^{S,T} = 0, \gamma_j^{S,T} = 0, \forall S, T$ (no selection)
- $H_0: \gamma_a^{S,a} = \gamma_a^{S,n}, \forall S$ (no selection into a on gains)
- $H_0: \gamma_j^{S,T} = \gamma_j^{S',T}$ (no dynamic selection into JC on gains)
$\alpha^{S,T}$ is the ATE for the group with $X_i = 0$.
$\varepsilon_i^{S,T}$ are i.i.d. (measurement errors).

Dynamic Selection Model: Identification
Exclusion restrictions in the first period: $Z$ and $Z \times X$

$$E(Y_i \mid X_i = x, Z_i = 1, D_i = n, s = 0) - E(Y_i \mid X_i = x, Z_i = 0, D_i = n, s = 0)$$
$$= \gamma_a^{0,n} \lambda_a(x, 1, n, 0) + \gamma_j^{0,n} \lambda_j(x, 1, n, 0) - \gamma_a^{0,n} \lambda_a(x, 0, n, 0)$$
$$\lambda_a(x, z, n, 0) = G_a^0\big(P(D_i = n \mid X_i = x, Z_i = z, s = 0)\big)$$

Subsequent period:

$$E(Y_i \mid X_i = x, Z_i = 1, D_i = n, s = 1) = \alpha^{1,n} + \beta^{1,n} x + \gamma_a^{1,n} \lambda_a(x, 1, n, 1) + \gamma_j^{1,n} \lambda_j(x, 1, n, 1)$$
$$\lambda_a(x, z, n, 1) = G_a^1\big(P(D_i = j \mid X_i = x, Z_i = z, s = 0),\; P(D_i = n \mid X_i = x, Z_i = z, s = 1)\big)$$

$Z$ and $Z \times X$ shift $\lambda_a(x, z, n, 1)$ via $P(D_i = j \mid X_i = x, Z_i = z, s = 0)$ but have no direct effect on $Y$.

Dynamic Selection Model: Estimation
The conditional likelihood function of individuals with $Z = 0$ is

$$L^{(1)}(S = 0, T = k, Y \mid \theta_{ia}, \theta_{ij}) = P\big(U^k_{i0} > U^{-k}_{i0} \mid \theta_{ia}, \theta_{ij}\big)\, h(Y \mid T = k, \theta_{ia}, \theta_{ij}) \qquad (6)$$

where $k \in \{a, n\}$ and $-k$ denotes all the remaining choices other than $k$.
The conditional likelihood function of individuals with $Z = 1$ is

$$L^{(2)}(S = s, T = k, Y \mid \theta_{ia}, \theta_{ij}) = P\big(U^j_{i0} > \max(U^{-j}_{i0}), \ldots, U^j_{is} > \max(U^{-j}_{is}),\; U^k_{i,s+1} > \max(U^{-k}_{i,s+1}) \mid \theta_{ia}, \theta_{ij}\big)\, h(Y \mid S = s, T = k, \theta_{ia}, \theta_{ij}) \qquad (7)$$

Dynamic Selection Model: Estimation
To form the likelihood contribution for the individual, we need to average over all possible individual types:

$$L^{(m)}(S = s, T = k, Y) = \iint L^{(m)}(S = s, T = k, Y \mid \theta_{ia}, \theta_{ij})\, f(\theta_{ia}, \theta_{ij})\, d\theta_{ia}\, d\theta_{ij}, \qquad m \in \{1, 2\}$$

- choice probabilities are simulated using the GHK simulator
The complete likelihood function consists of products over workers:

$$L = \prod_{i \in B_1} L^{(1)}(T, Y) \prod_{i \in B_2} L^{(2)}(S, T, Y) \qquad (8)$$
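The integration over unobserved types can be illustrated in the simplest possible case. The sketch below is not the GHK simulator and leaves out the outcome density $h$; it only averages the period-0 choice probability of a control-group individual over draws of $\theta_{ia}$, assuming the transitory shock is standard normal and $\sigma_a$ is a standard deviation. Under those assumptions the integral has a closed form, which gives a check on the simulation. The numeric values of the index and $\sigma_a$ are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_prob_a(index_a, theta_a):
    """P(U^a > U^n | theta_a) for a Z=0 individual, assuming u^a ~ N(0, 1)."""
    return normal_cdf(index_a + theta_a)


def integrated_prob_a(index_a, sigma_a, draws=20000, seed=0):
    """Average the conditional probability over simulated draws of theta_a."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += conditional_prob_a(index_a, rng.gauss(0.0, sigma_a))
    return total / draws


index_a, sigma_a = 0.3, 0.8  # hypothetical values of beta^a_0 X_i and sigma_a
simulated = integrated_prob_a(index_a, sigma_a)
# Closed form: theta_a + u^a is N(0, 1 + sigma_a^2), so the probability is a rescaled probit.
closed_form = normal_cdf(index_a / math.sqrt(1.0 + sigma_a ** 2))
print(round(simulated, 3), round(closed_form, 3))
```

With more than two alternatives or several periods no such closed form is available, which is where a simulator such as GHK becomes useful.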


Estimation Results: ATE for Age 16-19
Note 1: Test for program complementarity, $H_0: Y^{3,n} - Y^{0,n} = Y^{3,a} - Y^{0,a}$: rejected with p-value = 0.05.

Estimation Results: Selection on Unobservables

                    S=0               S=1               S=2               S=3
                    a        n        a        n        a        n        a        n
$\gamma_a^{S,T}$    0.56     -0.79    0.25     -0.70    -0.54    -0.65    -0.03    -0.58
                    (0.11)   (0.12)   (0.99)   (0.34)   (0.51)   (0.27)   (0.71)   (0.78)
$\gamma_j^{S,T}$    -0.13    0.12     -0.49    -0.05    0.47     0.00     -0.38    0.63
                    (0.08)   (0.09)   (0.35)   (0.74)   (0.39)   (0.81)   (0.43)   (0.41)

We can reject the following null hypotheses:
- no selection ($H_0: \gamma_a^{S,T} = 0, \gamma_j^{S,T} = 0, \forall S, T$)
- no selection on gains (a) ($H_0: \gamma_a^{S,a} = \gamma_a^{S,n}, \forall S$)
We cannot reject:
- no dynamic selection on gains (jc) ($H_0: \gamma_j^{S,T} = \gamma_j^{S',T}, \forall S, T$)

Goodness of Fit: Proportions in Each Treatment Status

Group              JC duration   No training   Alternative   No training   Alternative
Control group      0             30.6          69.4          29.4          70.6
Treatment group    0              8.2          19.2           8.2          19.0
                   1-3            7.8          11.2           7.7          12.1
                   4-6            5.5           7.7           5.4           7.8
                   7+            17.4          21.5          16.6          23.1
                   Sum           39.2          60.8          37.9          62.1

Goodness of Fit: Wages (w/o measurement errors)
[Figure: density of log wage, actual vs. predicted, in three panels: All individuals, Control group, Treatment group.]

Estimation Results: sublates

                      D^0    D^1      Estimates   Shares
SubLATEs (dynamic)    a      (1, a)   -0.21       10.1
                      n      (1, a)    0.87        2.1
                      n      (1, n)    0.01        5.7
                      a      (1, n)    0.14        2.1
                      a      (2, a)   -0.1         6.7
                      n      (2, a)    0.93        1.2
                      n      (2, n)    0.11        3.5
                      a      (2, n)    0.21        1.9
                      a      (3, a)   -0.15       21.2
                      n      (3, a)    1.32        2.3
                      n      (3, n)    0.2         6.9
                      a      (3, n)    0.55        9.9
SubLATEs (static)     n      j         0.36       21.7
                      a      j         0          51.9
Overall LATE                           0.11       73.6
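Since LATE is a share-weighted mixture of the sublates, the overall figure in the last row can be checked directly from the rows above it. The sketch below does this with the rounded values from the table; the tuple layout and function name are illustrative, and the result should match the reported 0.11 only up to rounding.

```python
# Rows copied from the table: (D^0, D^1, estimate, share in %).
dynamic = [
    ("a", (1, "a"), -0.21, 10.1), ("n", (1, "a"), 0.87, 2.1),
    ("n", (1, "n"), 0.01, 5.7),   ("a", (1, "n"), 0.14, 2.1),
    ("a", (2, "a"), -0.10, 6.7),  ("n", (2, "a"), 0.93, 1.2),
    ("n", (2, "n"), 0.11, 3.5),   ("a", (2, "n"), 0.21, 1.9),
    ("a", (3, "a"), -0.15, 21.2), ("n", (3, "a"), 1.32, 2.3),
    ("n", (3, "n"), 0.20, 6.9),   ("a", (3, "n"), 0.55, 9.9),
]
static = [("n", "j", 0.36, 21.7), ("a", "j", 0.0, 51.9)]


def weighted_late(rows):
    """Share-weighted average of sublates, with shares renormalized to sum to one."""
    total_share = sum(share for _d0, _d1, _est, share in rows)
    return sum(est * share for _d0, _d1, est, share in rows) / total_share


dynamic_share = sum(share for _d0, _d1, _est, share in dynamic)
late_dynamic = weighted_late(dynamic)
late_static = weighted_late(static)
print(round(dynamic_share, 1), round(late_dynamic, 2), round(late_static, 2))
```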

Estimation Results: Understanding Intertemporal Substitution
In the static framework, $\pi_{aj} = P(D_i^0 = a, D_i^1 = jc) = 0.52$:
- this implies a great degree of choice substitution between a and JC
- it disallows anyone from switching from n to a (identifying assumption)
By comparison, the dynamic framework implies:
- much less choice substitution: $\pi_{a,1n} + \pi_{a,2n} + \pi_{a,3n} = 0.14 < 0.52$
- intertemporal complementarity between JC and a: $\pi_{n,1a} + \pi_{n,2a} + \pi_{n,3a} = 0.06$
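The two sums above follow from the complier shares reported in the sublates table. A brief sketch, using only the six relevant shares (in percent) and illustrative variable names:

```python
# Shares (%) of complier types that change their final choice, keyed by (D^0, D^1).
switcher_shares = {
    ("a", (1, "n")): 2.1, ("a", (2, "n")): 1.9, ("a", (3, "n")): 9.9,
    ("n", (1, "a")): 2.1, ("n", (2, "a")): 1.2, ("n", (3, "a")): 2.3,
}

# a without the offer, n after JC: JC substitutes for alternative training.
substitution = sum(p for (d0, _d1), p in switcher_shares.items() if d0 == "a")
# n without the offer, a after JC: JC is followed by alternative training.
complementarity = sum(p for (d0, _d1), p in switcher_shares.items() if d0 == "n")

print(round(substitution, 1), round(complementarity, 1))
```

Expressed as fractions these are about 0.14 and 0.06, against 0.52 in the static framework.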

Estimation Results: Understanding Differences Between Age Groups
The effect of JC is significantly larger for the 20-24 age group (Schochet et al. (2008)): $LATE_{young} = 0.09$, $LATE_{old} = 0.17$.
This age difference can be due to two factors:
- Different treatment effects (via $\beta_w$, the age parameter in potential wages)
- Different patterns of sorting into different compliance groups
Using the estimated model, we simulate a counterfactual LATE by shutting down the age difference in potential wages ($\beta_w = 0$):
- the counterfactual $LATE_{old} = 0.07$
- all of the age difference in LATE is due to the age difference in treatment effects

Conclusion
In this paper, we:
- decomposed LATE into various sublates by extending the potential outcome framework to a dynamic setting
- showed non-identification of the sublates
- imposed parametric assumptions to infer the sublates and ATE
We learnt that:
- there is large heterogeneity in program returns
- incorporating dynamics in program evaluation seems fruitful: dynamic complementarity + intertemporal substitution
Our framework is potentially useful to:
- a large literature in development using encouragement designs
- thinking about optimal program design/targeting

Interacted IV Results - Static Case

                   (1)          (2)
                   Job Corps    Job Corps and Alternative Training
jc                 26.2***      218.7***
                   (6.0)        (51.5)
a                               276.1***
                                (72.4)
F-statistic: jc    1,608.3      35.0
F-statistic: a                  8.7
Overid. p-value    0.001        0.753
N                  10,586       10,586

- interact JC offer with observable covariates (mother's education, age, race, first language, enrollment in welfare programs)
- assumption: sublates do NOT differ across covariate groups (Kline and Walters, 2016)
- the instruments need to be relevant in separately identifying

Cost-Benefit Analysis
The benefit from JC can be summarized by the increase in net lifetime earnings for participants:

$$B = (1 - \tau)\, E[Y_i]$$

where $\tau$ is the tax rate.
The cost is the sum of the cost of JC and the cost of the alternative training (which may be a complement or a substitute to JC), net of the increase in tax revenue.
In a 3-period setting it is reasonable to assume that the longer a person is enrolled in JC, the higher the cost of JC:

$$C = \sum_{t = n, a} \sum_{s=1}^{3} \phi^s_{jc}\, P(D_i^1(s,t)) + \phi_a\, P(D_i(S,a)) - \tau\, E[Y_i]$$

where $\phi^s_{jc}$ is the cost of JC for $s$ periods, $\phi_a$ is the cost of the alternative training and $P(D_i(S,a))$ is the probability of enrolling in a in every period, both for treatment and controls, with $S \in \{0, 1, 2, 3\}$.

Cost-Benefit Analysis: Program Expansion
Consider an expansion of the program which increases the proportion of individuals who are randomly offered JC treatment. Call this proportion $\delta$.
The benefit of expanding the program depends on the overall LATE and on the proportion of individuals who enroll in JC for at least one period (similar result in Kline and Walters, 2016):

$$B_\delta = (1 - \tau)\, LATE \times P(D_i^1(S > 0, T))$$

Cost-Benefit Analysis: Program Expansion
The change in costs due to an expansion of JC depends on the proportions of the different types of compliers:

$$C_\delta = \sum_{t = n, a} \sum_{s=1}^{3} \phi^s_{jc}\, P(D_i^1(s,t)) - \sum_{s=1}^{3} \phi_a\, P(D_i^1(s,n), D_i^0(0,a)) + \sum_{s=1}^{3} \phi_a\, P(D_i^1(s,a), D_i^0(0,n)) - \tau\, LATE \times P(D_i^1(S > 0, T))$$

- the first term is the increase in marginal costs due to more people enrolling in JC
- the second term is the savings due to individuals who enroll in a if Z = 0 and in JC if Z = 1, and who don't enroll in a after JC
- the third term is the increase in marginal costs due to individuals who enroll in n if Z = 0 and who enroll in a after JC
- the last term is the increase in tax revenues
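A minimal sketch of how the four terms combine is given below. All inputs are hypothetical placeholders: the slides report no cost figures, so the costs, tax rate, LATE value and complier shares here are made-up numbers chosen so that the arithmetic is easy to follow, and the function name and data layout are illustrative.

```python
def cost_change(shares, phi_jc, phi_a, tau, late):
    """Change in cost from a marginal expansion, following the four terms on the slide.

    shares maps (d0_choice, s, t) to the population share of compliers who would pick
    d0_choice in {"a", "n"} without the offer and (s, t) with it, where s >= 1.
    """
    jc_cost = sum(phi_jc[s] * p for (_d0, s, _t), p in shares.items())
    saved_alt = sum(phi_a * p for (d0, _s, t), p in shares.items() if d0 == "a" and t == "n")
    extra_alt = sum(phi_a * p for (d0, _s, t), p in shares.items() if d0 == "n" and t == "a")
    tax_revenue = tau * late * sum(shares.values())
    return jc_cost - saved_alt + extra_alt - tax_revenue


# Hypothetical inputs: one JC duration only, to keep the example small.
example_shares = {
    ("a", 1, "a"): 0.20,
    ("a", 1, "n"): 0.10,
    ("n", 1, "a"): 0.05,
    ("n", 1, "n"): 0.15,
}
example_phi_jc = {1: 100.0}   # cost of one period of JC
example_phi_a = 40.0          # cost of alternative training

delta_cost = cost_change(example_shares, example_phi_jc, example_phi_a, tau=0.2, late=50.0)
print(round(delta_cost, 2))
```

With these inputs the JC term is 50, the savings on alternative training are 4, the extra alternative-training cost is 2 and the tax revenue term is 5, so the net change is 50 - 4 + 2 - 5 = 43.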