Nonlinear DID estimation of the treatment effect when the outcome variable is the employment/unemployment duration

By Marcel Voia

February 2005

Abstract

This paper uses an econometric framework introduced by Athey and Imbens (2003) to estimate the effect, on the entire distribution of employment duration, of a program that provides employment-related services to disadvantaged individuals when they leave welfare for jobs. To explore the effect of the program, the entire counterfactual distribution of employment duration that would have occurred in the absence of treatment is estimated using a nonlinear DID methodology that accounts for unobserved heterogeneity. To test the estimator, I use simulated data, and I then apply the estimator to a subsample from a social experiment, the National Supported Work Demonstration (NSW). The results obtained on the simulated data show that the estimator is able to account for differences in unobserved heterogeneity between the control and treatment groups. When applied to the NSW data, the estimated measure of the treatment effect is significantly different from the observed measure of the treatment effect. This result can be explained by differences in response rates between the control group and the treatment group, differences that cannot be captured by the observed average measure of the treatment effect.

I am very grateful to Tiemen Woutersen for the stimulating suggestions and for encouraging me to work on this project, and to Lance Lochner and Chris Robinson for very helpful comments. Correspondence address: Carleton University, Department of Economics, Loeb Building, 1125 Colonel By Drive, Ottawa, Ontario, K1S 5B6, Canada. Email: mvoia@connect.carleton.ca

1 Introduction

This paper uses a nonlinear difference-in-differences (DID) methodology introduced by Athey and Imbens (2003) to estimate the effect of policy interventions designed to improve the labour outcomes of disadvantaged individuals. The estimation methodology is an extension of the standard DID approach. It assumes that, in the absence of the treatment, the distribution of the outcome in the treatment group would have undergone the same change over time as in the control group, while allowing for different heterogeneity distributions between the two groups.

The standard DID is used to analyze experiments with multiple subpopulations, some of which are subject to treatments or policy interventions while others are not. It measures the impact of a program as the difference between treated and non-treated in the before-after difference in outcomes, and it allows for time-specific intercepts that are common across groups. The standard DID therefore does not allow for treatment effect heterogeneity. It was used by LaLonde (1986), who compared it with other standard evaluation estimators; other applications can be found in the surveys by Meyer (1995), Angrist and Krueger (2000), and Blundell and MaCurdy (2000).

To explore the effect of a treatment program, I first use simulated data to test the applicability of the estimator to an employment duration outcome variable. Then, I estimate the entire counterfactual distribution of employment duration that would have occurred in the absence of treatment on a subsample from the National Supported Work Demonstration (NSW), generated initially by LaLonde (1986) and refined by Dehejia and Wahba (1998) to accommodate non-experimental control groups.

The current literature debates the reliability of evaluating social programs without a randomized experiment. In a randomized experiment, the control group has the same observed and unobserved distribution as the treatment group and, thus, the treatment effect can be consistently estimated when a reliable technique is used. The reliability of experiments is reduced by some of the drawbacks of a randomized experiment, such as its high cost and noncompliance. On the other hand, evaluation methods that use nonexperimental data are less costly but also less precise in estimating the effect of a social program, because the wide variety of available estimators yields program estimates that are sensitive to the chosen estimator (Smith and Todd, 2000).

For LaLonde's data, Smith and Todd found that the DID matching estimators exhibit better performance than the cross-sectional estimators. This is because the sources of bias are likely to be relatively stable over time and should be differenced out.

The purpose of this paper is to use an estimator that allows the distribution of the effects of the treatment on the control group to differ from the one in the treatment group. The methodology can be used to construct the counterfactual distribution of the effects of the treatment on the treatment group as well as the distribution of the effects of the treatment on the control group. Another advantage of the method is that, for identification, one needs only two time periods and two groups. I therefore evaluate the effect of the treatment on the treatment group by comparing the counterfactual distribution of the treatment group, as if it had not been treated, with the actual distribution of the treatment group in the second period. The counterfactual is constructed by taking the inverse of an empirical distribution function estimated on observations from one group and applying it to the data from the other group. To get the average treatment effect, the mean of the counterfactual distribution is used.

The results obtained on the simulated data show that the estimator performs well for different DGPs. When applied to the NSW data, the estimated measure of the treatment effect is significantly different from the actual measure of the treatment effect. This finding suggests that the estimator corrects for data selection problems due to the differences in response rates between the control group and the treatment group.

The paper is organized as follows. Section 2 describes the estimation methodology. Section 3 applies the estimation methodology to different simulation designs. Section 4 applies the methodology to a subsample of the NSW data and gives estimation results.
Section 5 concludes.

2 Methodology

As mentioned in the introduction, the estimation methodology needs only two time periods and two groups and is described by Athey and Imbens (2003). Consider a continuous model with the outcome variable $Y_i$, and denote by $Y_i^N$ the outcome of an individual who does not receive the treatment and by $Y_i^I$ the outcome of an individual who receives the treatment (intervention). Let $I_i$ be an indicator for the treatment ($I_i = 0$ for a nontreated individual, $I_i = 1$ for a treated individual), $G_i$ an indicator for the group ($G_i = 0$ for the control group, $G_i = 1$ for the treated group), and $t_i \in \{0, 1\}$ the observed time period.

To identify the Athey-Imbens estimator with continuous outcomes, the following conditions have to be satisfied in the absence of intervention:

1. The outcome of an individual, in the absence of intervention, satisfies the relationship $Y_i = f(u_i, t_i)$. This assumption requires that outcomes do not depend directly on the group, and that all relevant unobservables can be captured in a single index, $u$. In this framework, the realized outcome is defined as

$$Y_i = (1 - I_i)\, Y_i^N + I_i\, Y_i^I,$$

where $Y_i^N = f^N(u_i, t_i)$ and $Y_i^I = f^I(u_i, t_i)$ are increasing in $u_i$, with $u_i$ the unobserved characteristics of a given individual ($u_i = \alpha + \eta G_i + \varepsilon_i$). The functions $f^N(u_i, t_i)$ and $f^I(u_i, t_i)$ are allowed to be unknown and differ from the one used in standard DID models, where $f(u_i, t_i)$ has a well-defined form, $f(u_i, t_i) = \alpha + \beta t_i + \eta G_i + \varepsilon_i$. The nonlinear DID model instead allows $f(u_i, t_i) = \varphi(\alpha + \beta t_i + \eta G_i + \varepsilon_i)$, where $\varphi$ is a strictly increasing function that is allowed to be unknown.

2. Strict monotonicity: $f(u, t)$, where $f : \mathbb{U} \times \{0, 1\} \to \mathbb{R}$, is strictly increasing in $u$ for $t \in \{0, 1\}$, where $u_i$ can be the same in the two groups ($u^I = u^N = u_i$) or different in the two groups ($u_i = u^I$ for the treated group and $u_i = u^N$ for the non-treated group). This assumption requires that higher unobservables correspond to strictly higher outcomes.

3. Invariance within groups: $u_i \perp t_i \mid G_i$. This assumption requires that the population of agents within a given group does not change over time. Therefore, estimating the trend on one group can help in eliminating the trend in the other group. This assumption also allows for general dependence of the unobserved component on the group indicator.

4. Support restriction: $\mathbb{U}_1 \subseteq \mathbb{U}_0$.

Define the outcome random variables

$Y^N_{gt}$ = outcome for a non-treated individual from group $G_i = g$ in period $t_i = t$,
$Y^I_{gt}$ = outcome for a treated individual from group $G_i = g$ in period $t_i = t$,

and the realized outcome for an individual from group $G_i = g$ in period $t_i = t$ as

$$Y_{gt} = (1 - I_i)\, Y^N_{gt} + I_i\, Y^I_{gt}.$$

Writing $\mathbb{Y}_{gt}$ for the support of $Y_{gt}$, the support assumption implies that a) $\mathbb{Y}_{10} \subseteq \mathbb{Y}_{00}$ and b) $\mathbb{Y}^N_{11} \subseteq \mathbb{Y}_{01}$. If there are no support restrictions, the distribution of $Y^N_{11}$ can be identified inside $\mathbb{Y}_{01}$ and not outside $\mathbb{Y}_{01}$.

5. When covariates ($X$) are part of the model, all the previous assumptions have to hold conditional on $X$. Then, the identification of $Y^N_{11} \mid X$ is required.

In this model, unobserved heterogeneity can be different between the two groups. This is the case when there is self-selection or noncompliance in one of the groups. Thus, in the absence of treatment, the differences between the two groups are determined by the differences in the conditional distribution of unobserved heterogeneity given the group $G$. The unobserved heterogeneity $u_i$ is allowed to vary across groups, but not over time within groups ($u_i \perp t_i \mid G_i$). The model requires a full independence assumption, with $\varepsilon_i \perp (G_i, t_i)$. Also, the method rules out some models with mean and variance shifts over time and across groups.

The simulated DGPs are generated to satisfy conditions 1 to 4. For condition 2, I consider cases with the same unobserved heterogeneity in the two groups (DGP1, DGP2, DGP4, DGP5) and cases with different unobserved heterogeneity in the two groups (DGP3 and DGP6).

I assume that the selected NSW data satisfy condition 1; condition 2, with different unobserved heterogeneity distributions for the control and treatment outcomes due to selection into the treatment (at the 27th-month interview, 72% of the treatment group and 68% of the control group completed the interviews); condition 3 (Smith and Todd suggested that the sources of bias for Dehejia and Wahba's NSW sample are stable over time and should be differenced out); and condition 4, which is satisfied given the difference between the number of observations in the treatment and control groups in both periods.

Therefore, a nonlinear DID methodology can be employed and the counterfactual employment duration in the absence of treatment can be identified. To draw inference about the effect of the policy on the treatment group, we compare the actual outcomes with the counterfactual outcome in the absence of the treatment. The model also allows one to analyze the counterfactual effect of the treatment on the control group. When inference about the effect of the policy on the control group is drawn, the counterfactual outcome in the presence of treatment should be compared with the actual outcome in the absence of treatment. Although the two problems appear symmetric, more assumptions are required to analyze the effect of the policy on the control group. Therefore, I restrict my analysis to the effect of the policy on the treatment group.

Using Theorems 3.1 and 3.2 of Athey and Imbens (2003), the distribution of $Y^N_{11}$ (the counterfactual outcome of the treated group in the absence of the policy) is identified on the restricted support $\operatorname{supp} Y^N_{11}$ and is given by

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

If the distribution of $Y^I_{01}$ (the counterfactual outcome of the untreated group under the policy) is desired, then $Y^I_{01}$ is identified on the restricted support $\operatorname{supp} Y^I_{01}$ and is given by

$$F_{Y^I_{01}}(y) = F_{Y_{00}}\left(F^{-1}_{Y_{10}}\left(F_{Y^I_{11}}(y)\right)\right).$$

The counterfactual distribution can be compared with the realized distribution by employing tests for equality of distributions, First Order Stochastic Dominance (FOSD), and Second Order Stochastic Dominance (SOSD) based on a Kolmogorov-Smirnov statistic. The tests were presented by Abadie (2002) and use a two-sample Kolmogorov-Smirnov statistic because it has good power properties. Considering the group that received the intervention ($I$) and the counterfactual group ($C$), the hypotheses of interest for the two distribution functions $F_I$ and $F_C$ are formulated as follows, where $t$ is employment duration, so that $t \ge 0$:

1. Equality of distributions: $F_I(t) = F_C(t)$ for any $t \in \mathbb{R}$.

2. First order stochastic dominance ($F_I$ dominates $F_C$): $F_I(t) \le F_C(t)$ for any $t \in \mathbb{R}$.

3. Second order stochastic dominance ($F_I$ dominates $F_C$): $\int_0^t F_I(x)\, dx \le \int_0^t F_C(x)\, dx$ for any $t \in \mathbb{R}$.
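The three hypotheses above each correspond to a sup-type statistic computed from the two empirical distribution functions. The following is a minimal Python sketch of those statistics, not the code used in the paper. The function names `ecdf` and `ks_statistics` are hypothetical, and the scaling factor $(N_1 N_2 / N)^{1/2}$ with $N = N_1 + N_2$ follows the normalisation in Abadie (2002). Durations are assumed to be non-negative, so the integral for the SOSD statistic starts at 0.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample`, evaluated at every point of `grid`."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, grid, side="right") / ordered.size


def ks_statistics(t_i, t_c):
    """Return (KS_eq, KS_fsd, KS_ssd) for durations t_i (treated) and t_c (comparison)."""
    t_i = np.asarray(t_i, dtype=float)
    t_c = np.asarray(t_c, dtype=float)
    n1, n2 = t_i.size, t_c.size
    scale = np.sqrt(n1 * n2 / (n1 + n2))

    # Both ECDFs are step functions that jump only at observed durations,
    # so the suprema can be searched on the pooled, sorted sample.
    grid = np.sort(np.concatenate([t_i, t_c]))
    diff = ecdf(t_i, grid) - ecdf(t_c, grid)

    ks_eq = scale * np.max(np.abs(diff))   # two-sided: equality of distributions
    ks_fsd = scale * np.max(diff)          # one-sided: first order dominance

    # Integral of (F_I - F_C) from 0 up to each grid point. The difference is
    # zero below the smallest observation and constant between grid points,
    # so the integral is an exact sum of rectangles.
    widths = np.diff(grid)
    integral = np.concatenate([[0.0], np.cumsum(diff[:-1] * widths)])
    ks_ssd = scale * np.max(integral)      # one-sided: second order dominance
    return ks_eq, ks_fsd, ks_ssd


# Toy check: the treated durations are uniformly longer, so F_I <= F_C.
eq, fsd, ssd = ks_statistics([5.0, 6.0], [1.0, 2.0])
print(eq, fsd, ssd)
```

In the toy call the treated sample dominates the comparison sample, so the two one-sided statistics are zero while the two-sided statistic is positive. Swapping the arguments makes the one-sided statistics positive as well.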

Considering the empirical distribution functions

$$F_{I,N_1}(t) = \frac{1}{N_1} \sum_{i=1}^{N_1} 1\{t_{I,i} \le t\} \quad \text{and} \quad F_{C,N_2}(t) = \frac{1}{N_2} \sum_{i=1}^{N_2} 1\{t_{C,i} \le t\},$$

with $N = N_1 + N_2$, a two-sample Kolmogorov-Smirnov statistic for equality of distributions is defined as

$$KS^{eq}_N = \left(\frac{N_1 N_2}{N}\right)^{1/2} \sup_{t \in \mathbb{R}} \left| F_{I,N_1}(t) - F_{C,N_2}(t) \right|,$$

a two-sample Kolmogorov-Smirnov statistic for first order stochastic dominance is defined as

$$KS^{fsd}_N = \left(\frac{N_1 N_2}{N}\right)^{1/2} \sup_{t \in \mathbb{R}} \left( F_{I,N_1}(t) - F_{C,N_2}(t) \right),$$

and a two-sample Kolmogorov-Smirnov statistic for second order stochastic dominance is defined as

$$KS^{ssd}_N = \left(\frac{N_1 N_2}{N}\right)^{1/2} \sup_{t \in \mathbb{R}} \int_0^t \left( F_{I,N_1}(x) - F_{C,N_2}(x) \right) dx.$$

Given that the Kolmogorov-Smirnov statistic ($KS_N$) has an unknown asymptotic distribution under the null hypothesis, I use the same methodology as Abadie (2002) to overcome this problem. A bootstrap methodology is employed as follows:

(1) Compute $KS_N$ for the original data.
(2) Resample the data with replacement. Denote the new data as $\hat{t}_I$ and $\hat{t}_C$.
(3) Repeat (2) $B = 2000$ times, obtaining $\widehat{KS}_{N,b}$ for $b = 1, \dots, B$.
(4) Compute the p-value of the test as

$$p\text{-value} = \frac{1}{B} \sum_{b=1}^{B} 1\left\{ \widehat{KS}_{N,b} > KS_N \right\}.$$

(5) Reject the null hypothesis if $p\text{-value} < \alpha$, where $0 < \alpha < 0.5$.

Having the distributions for the treatment and the counterfactual treatment, we can compute the average treatment effect on the treated group (comparing the actual average outcome of the treated with the counterfactual average outcome in the absence of the treatment), which is defined as

$$ATT = E\left[Y^I_{11}\right] - E\left[Y^N_{11}\right] = E\left[Y^I_{11}\right] - E\left[F^{-1}_{Y_{01}}\left(F_{Y_{00}}(Y_{10})\right)\right].$$

An estimator for the average treatment effect on the treated can be obtained by using estimators for the empirical distribution function and its inverse, as follows:

$$\hat{F}_{Y_{gt}}(y) = \frac{1}{N_{gt}} \sum_{i=1}^{N_{gt}} 1\{Y_{gt,i} \le y\}, \qquad \hat{F}^{-1}_{Y_{gt}}(q) = \min\left\{ y_{gt} : \hat{F}_{Y_{gt}}(y_{gt}) \ge q \right\},$$

with $0 < q \le 1$ and $\hat{F}^{-1}_{Y_{gt}}(0) = \underline{y}_{gt}$, where $\underline{y}_{gt}$ is the lower bound of the support of $Y_{gt}$, $g$ stands for group, and $t$ stands for time. The estimator of the average treatment effect on the treated is therefore constructed from the empirical distribution functions of the treated group and its counterfactual, and it is defined as

$$\widehat{ATT} = \frac{1}{N_{11}} \sum_{i=1}^{N_{11}} Y_{11,i} - \frac{1}{N_{10}} \sum_{i=1}^{N_{10}} \hat{F}^{-1}_{Y_{01}}\left( \frac{1}{N_{00}} \sum_{j=1}^{N_{00}} 1\{Y_{00,j} \le Y_{10,i}\} \right)$$

$$= \frac{1}{N_{11}} \sum_{i=1}^{N_{11}} Y_{11,i} - \frac{1}{N_{10}} \sum_{i=1}^{N_{10}} \min\left\{ y_{01} : \frac{1}{N_{01}} \sum_{k=1}^{N_{01}} 1\{Y_{01,k} \le y_{01}\} \ge \frac{1}{N_{00}} \sum_{j=1}^{N_{00}} 1\{Y_{00,j} \le Y_{10,i}\} \right\}.$$

If the data allow the identification of the treatment effect for the non-treated group (control group), the average treatment effect on the non-treated group (comparing the counterfactual average outcome of the control group in the presence of treatment with the average outcome of the actual control group) can be defined as

$$ATN = E\left[Y^I_{01}\right] - E\left[Y^N_{01}\right] = E\left[F^{-1}_{Y_{11}}\left(F_{Y_{10}}(Y_{00})\right)\right] - E\left[Y^N_{01}\right].$$

The estimator of the average treatment effect on the non-treated can therefore be constructed from the empirical distribution functions of the non-treated and its counterfactual:

$$\widehat{ATN} = \frac{1}{N_{00}} \sum_{i=1}^{N_{00}} \hat{F}^{-1}_{Y_{11}}\left( \frac{1}{N_{10}} \sum_{j=1}^{N_{10}} 1\{Y_{10,j} \le Y_{00,i}\} \right) - \frac{1}{N_{01}} \sum_{i=1}^{N_{01}} Y_{01,i}$$

$$= \frac{1}{N_{00}} \sum_{i=1}^{N_{00}} \min\left\{ y_{11} : \frac{1}{N_{11}} \sum_{k=1}^{N_{11}} 1\{Y_{11,k} \le y_{11}\} \ge \frac{1}{N_{10}} \sum_{j=1}^{N_{10}} 1\{Y_{10,j} \le Y_{00,i}\} \right\} - \frac{1}{N_{01}} \sum_{i=1}^{N_{01}} Y_{01,i}.$$

Using this methodology, it is possible to compare the average treatment effect for the treatment group ($ATT$) with the actual realization of the average treatment effect from the benchmark experiment ($ATE$), where

$$ATE = \frac{1}{N_{11}} \sum_{i=1}^{N_{11}} Y_{11,i} - \frac{1}{N_{01}} \sum_{i=1}^{N_{01}} Y_{01,i}.$$
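The estimators above reduce to two primitives, an empirical CDF and its generalized inverse. The sketch below is one possible Python implementation under the paper's notation (first index is the group, second is the period). The function names are hypothetical and this is not the author's code. The small rounding step inside `ecdf_inverse` is an implementation choice of this sketch, added to guard against floating-point noise when `q * n` should be an integer.

```python
import numpy as np


def ecdf(sample, y):
    """F_hat(y) = share of observations in `sample` that are <= y."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, y, side="right") / ordered.size


def ecdf_inverse(sample, q):
    """min{y in sample : F_hat(y) >= q}; q = 0 maps to the sample minimum."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    q = np.asarray(q, dtype=float)
    # The k-th order statistic (0-based) has F_hat >= (k + 1) / n, so the
    # smallest admissible index is ceil(q * n) - 1, clipped to the sample.
    idx = np.ceil(np.round(q * ordered.size, 8)).astype(int) - 1
    return ordered[np.clip(idx, 0, ordered.size - 1)]


def counterfactual_untreated(y00, y01, y10):
    """Period-1 outcomes of the treated group had it not been treated:
    F01^{-1}(F00(y10_i)) for every period-0 treated observation."""
    return ecdf_inverse(y01, ecdf(y00, y10))


def att_hat(y00, y01, y10, y11):
    """Nonlinear DID estimate of the average treatment effect on the treated."""
    return np.mean(y11) - np.mean(counterfactual_untreated(y00, y01, y10))


def ate_benchmark(y01, y11):
    """Raw period-1 difference in means between treated and control."""
    return np.mean(y11) - np.mean(y01)


# Toy data: the control group shifts by +1 between periods.
y00 = [1.0, 2.0, 3.0, 4.0]
y01 = [2.0, 3.0, 4.0, 5.0]
y10 = [2.0, 3.0]
y11 = [6.0, 7.0]
print(counterfactual_untreated(y00, y01, y10), att_hat(y00, y01, y10, y11))
```

In the toy data each treated period-0 outcome is mapped to the control period-1 value with the same rank, which is the quantile-to-quantile change that the estimator attributes to time alone.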

This comparison is useful for checking whether there is a selection-into-treatment problem (that is, whether the distribution of unobserved heterogeneity differs between the control and treatment groups in the second period). The methodology is therefore useful for finding the ranges of adoption costs and distributions over unobservables that make the treatment most effective.

3 Simulation designs used to test the nonlinear DID methodology when the outcome variable takes the form of an employment/unemployment duration

3.1 A. Linear DGP with no selection problem in the second period

I use a simulation design as follows:

1. Generate data with N = 400 observations in the following way:

$$Y_0 = 5 + \varepsilon, \qquad Y_1 = 5 + 5G + t + \varepsilon,$$

with $\varepsilon \sim N(0, 6)$, $t = 1$, and $G = 0$ for half the sample and $G = 1$ for the other half.

2. Estimate the distribution of the treated ($G = 1$) as if they were non-treated, using

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

3. Perform a comparison test for the difference between the true distribution of the non-treated ($G = 0$) and the distribution of the treated as if they were non-treated, using a two-sample Kolmogorov-Smirnov statistic.

4. Represent the distributions of the estimated control and the true control graphically.

The results of the tests are presented in Table 1. The distributions of the control and treatment groups are represented graphically and are compared with the distributions of the estimated control and the true control (see Fig. 1, Fig. 2). The results obtained using linear DGP data without a selection problem in the second period show that the difference between the true control and the estimated control is not significant when their empirical distribution functions are compared.
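The four steps of design A can be sketched in Python as follows. This is an illustrative reconstruction rather than the simulation code behind Table 1, and it rests on several assumptions that the text does not settle: the second argument of $N(0, 6)$ is read as a variance, each of the four group-period cells receives 200 independent draws, and the bootstrap resamples from the pooled sample as in Abadie (2002). All function names are hypothetical, and only 500 bootstrap replications are used to keep the example fast.

```python
import numpy as np


def ecdf(sample, y):
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, y, side="right") / ordered.size


def ecdf_inverse(sample, q):
    ordered = np.sort(np.asarray(sample, dtype=float))
    idx = np.ceil(np.round(np.asarray(q) * ordered.size, 8)).astype(int) - 1
    return ordered[np.clip(idx, 0, ordered.size - 1)]


def ks_eq(a, b):
    """Two-sample KS statistic for equality of distributions."""
    grid = np.concatenate([a, b])
    scale = np.sqrt(a.size * b.size / (a.size + b.size))
    return scale * np.max(np.abs(ecdf(a, grid) - ecdf(b, grid)))


def bootstrap_pvalue(a, b, n_boot, rng):
    """Share of bootstrap statistics exceeding the observed one.
    Assumption: resampling is from the pooled sample, then split in two."""
    observed = ks_eq(a, b)
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_boot):
        draw = rng.choice(pooled, size=pooled.size, replace=True)
        exceed += ks_eq(draw[:a.size], draw[a.size:]) > observed
    return observed, exceed / n_boot


def simulate_design_a(rng, n_cell=200, variance=6.0):
    """Step 1: Y_0 = 5 + e and Y_1 = 5 + 5G + t + e with t = 1."""
    sd = np.sqrt(variance)
    y00 = 5.0 + rng.normal(0.0, sd, n_cell)              # G = 0, period 0
    y10 = 5.0 + rng.normal(0.0, sd, n_cell)              # G = 1, period 0
    y01 = 5.0 + 1.0 + rng.normal(0.0, sd, n_cell)        # G = 0, period 1
    y11 = 5.0 + 5.0 + 1.0 + rng.normal(0.0, sd, n_cell)  # G = 1, period 1
    return y00, y01, y10, y11


rng = np.random.default_rng(0)
y00, y01, y10, y11 = simulate_design_a(rng)

# Step 2: treated group as if non-treated in period 1.
counterfactual = ecdf_inverse(y01, ecdf(y00, y10))

# Step 3: compare the true control with the estimated counterfactual.
ks, p_value = bootstrap_pvalue(y01, counterfactual, n_boot=500, rng=rng)

att = y11.mean() - counterfactual.mean()
print(f"KS = {ks:.3f}, bootstrap p-value = {p_value:.3f}, ATT estimate = {att:.3f}")
```

Because the counterfactual values are drawn from the observed control outcomes, the two samples being compared are not independent, so the p-value from this sketch should be read as indicative only.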

Figure 1: Empirical CDFs of Control and Treatment Groups for Linear DGP.

Figure 2: Empirical CDFs of Control and Counterfactual (Treated as if they were non-treated).

3.2 B. Linear DGP with fixed effects and no selection problem in the second period

I use a simulation design as follows:

1. Generate data with N = 400 observations as shown below:

$$Y_0 = f_i + 5 + \varepsilon, \qquad Y_1 = f_i + 5 + 5G + t + \varepsilon,$$

with $f \sim N(0, 1)$, $t = 1$, $\varepsilon \sim N(0, 6)$, and $G = 0$ for half the sample and $G = 1$ for the other half.

2. Estimate the distribution of the treated ($G = 1$) as if they were non-treated, using

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

3. Perform a comparison test for the difference between the true distribution of the non-treated ($G = 0$) and the distribution of the treated as if they were non-treated, using a two-sample Kolmogorov-Smirnov statistic.

4. Represent the distributions of the estimated control and the true control graphically (see Fig. 3, Fig. 4).

The results of the test are presented in Table 2. The results obtained using a linear DGP with fixed effects and no selection problem in the second period show that the difference between the true control and estimated control employment durations is not significant when the two empirical distribution functions are compared.

Figure 3: Empirical CDFs of Control and Treatment Groups for Linear DGP with fixed effects.

Figure 4: Empirical CDFs of Control and Counterfactual (Treated as if they were non-treated).

3.3 C. Linear DGP with different heterogeneity distribution between the control group and treatment group in the second period (the case with a selection problem)

I use a simulation design as follows:

1. Generate data with N = 400 observations as follows:

$$Y_0 = 5 + \varepsilon, \quad \text{with } \varepsilon \sim N(0, 6),$$
$$Y_{1,G=0} = 5 + t + \varepsilon_1, \quad \text{with } \varepsilon_1 \sim N(0, 36),$$
$$Y_{1,G=1} = 5 + 2 + t + \varepsilon_2, \quad \text{with } \varepsilon_2 \sim N(0, 1),$$

where $t = 1$, and $G = 0$ for half the sample and $G = 1$ for the other half.

2. Estimate the distribution of the treated ($G = 1$) as if they were non-treated, using

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

3. Perform a comparison test for the difference between the true distribution of the non-treated ($G = 0$) and the distribution of the treated as if they were non-treated, using a two-sample Kolmogorov-Smirnov statistic.

4. Represent the distributions of the estimated control and the true control graphically (see Fig. 5, Fig. 6).

The results of the test are displayed in Table 3.

Figure 5: Empirical CDFs of treatment and control groups for a linear model with selection problems.

Figure 6: Empirical CDFs of control and counterfactual (treated as if they were non-treated).

The results obtained using a linear DGP with fixed effects and a selection problem in the second period show that the difference between the true control and estimated control employment durations becomes more significant when the two empirical distribution functions are compared.

3.4 D. Non-Linear DGP without selection problem in the second period

I use a simulation design as follows:

1. Generate data with N = 400 observations in the following way:

$$Y_0 = \exp(2.2 + \varepsilon), \qquad Y_1 = \exp(2.2 + 0.25G + t + \varepsilon),$$

with $\varepsilon \sim N(0, 1)$, $t = 1$, and $G = 0$ for half the sample and $G = 1$ for the other half.

2. Estimate the distribution of the treated ($G = 1$) as if they were non-treated, using

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

3. Perform a comparison test for the difference between the true distribution of the non-treated ($G = 0$) and the distribution of the treated as if they were non-treated, using a two-sample Kolmogorov-Smirnov statistic.

4. Represent the distributions of the estimated control and the true control graphically (see Fig. 7, Fig. 8).

The results of the test are presented in Table 4. The results obtained using a non-linear (exponential) DGP and no selection problem show that the difference between the true control and estimated control employment durations is not significant when the two empirical distribution functions are compared.

Figure 7: Empirical CDFs of treatment and control groups for a nonlinear DGP model.

Figure 8: Empirical CDFs for control and counterfactual (treated as if they were not treated).

3.5 E. Non-Linear DGP with fixed effects

I use a simulation design as follows:

1. Generate data with N = 400 observations in the following way:

$$Y_0 = \exp(f_i + 2.2 + \varepsilon), \qquad Y_1 = \exp(f_i + 2.2 + 0.25G + t + \varepsilon),$$

with $f \sim N(0, 1)$, $t = 1$, $\varepsilon \sim N(0, 1)$, and $G = 0$ for half the sample and $G = 1$ for the other half.

2. Estimate the distribution of the treated ($G = 1$) as if they were non-treated, using

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

3. Perform a comparison test for the difference between the true distribution of the non-treated ($G = 0$) and the distribution of the treated as if they were non-treated, using a two-sample Kolmogorov-Smirnov statistic.

4. Represent the distributions of the estimated control and the true control graphically (see Fig. 9, Fig. 10).

The results of the test are presented in Table 5. The results obtained using a non-linear (exponential) DGP with fixed effects and no selection problem show that the difference between the true control and estimated control employment durations is not significant when the two empirical distribution functions are compared.

Figure 9: Empirical CDFs of treatment and control groups for a nonlinear DGP with fixed effects.

Figure 10: Empirical CDFs for the control and counterfactual (treated as if they were not treated).

3.6 F. Non-Linear DGP with different heterogeneity distribution between the control group and treatment group in the second period (the case with a selection problem)

I use a simulation design as follows:

1. Generate data with N = 400 observations in the following way:

$$Y_0 = \exp(2.2 + \varepsilon), \quad \text{with } \varepsilon \sim N(0, 1),$$
$$Y_{1,G=0} = \exp(2.2 + t + \varepsilon_1), \quad \text{with } \varepsilon_1 \sim N(0, 0.1),$$
$$Y_{1,G=1} = \exp(2.2 + 0.25 + t + \varepsilon_2), \quad \text{with } \varepsilon_2 \sim N(0, 4),$$

where $t = 1$, and $G = 0$ for half the sample and $G = 1$ for the other half.

2. Estimate the distribution of the treated ($G = 1$) as if they were non-treated, using

$$F_{Y^N_{11}}(y) = F_{Y_{10}}\left(F^{-1}_{Y_{00}}\left(F_{Y_{01}}(y)\right)\right).$$

3. Perform a comparison test for the difference between the true distribution of the non-treated ($G = 0$) and the distribution of the treated as if they were non-treated, using a two-sample Kolmogorov-Smirnov statistic.

4. Represent the distributions of the estimated control and the true control graphically (see Fig. 11, Fig. 12).

The results of the test are presented in Table 6. The results obtained using a non-linear DGP with fixed effects and a selection problem show that the difference between the true control and estimated control becomes more significant when the two empirical distribution functions are compared.

Overall, the results obtained using simulated data show that the difference between the true control and the estimated control employment durations is small when there is no selection problem in the second period and increases when there is a selection problem in the second period. Thus, the method estimates the counterfactual distribution very well, and it can be used to estimate the average treatment effect and the distributions of the treatment and control employment durations in a consistent way.
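One property that may help in reading the nonlinear designs D to F alongside the linear designs A to C: the counterfactual transformation uses only the ranks of the observations, so applying the same strictly increasing function (here the exponential) to all samples transforms the estimated counterfactual in the same way. The Python sketch below illustrates this on data shaped like design D. It is an illustration added here, not a result reported in the paper, and the function names and the cell size of 200 are assumptions.

```python
import numpy as np


def ecdf(sample, y):
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, y, side="right") / ordered.size


def ecdf_inverse(sample, q):
    ordered = np.sort(np.asarray(sample, dtype=float))
    idx = np.ceil(np.round(np.asarray(q) * ordered.size, 8)).astype(int) - 1
    return ordered[np.clip(idx, 0, ordered.size - 1)]


def counterfactual_untreated(y00, y01, y10):
    """F01^{-1}(F00(y10_i)): treated group as if non-treated in period 1."""
    return ecdf_inverse(y01, ecdf(y00, y10))


rng = np.random.default_rng(1)
n_cell = 200  # assumed cell size, as N = 400 is split into two groups

# Latent linear index of design D: 2.2 + 0.25 G + t + e, with e ~ N(0, 1).
# Only the three cells that enter the counterfactual are needed here.
z00 = 2.2 + rng.normal(0.0, 1.0, n_cell)        # G = 0, period 0
z10 = 2.2 + rng.normal(0.0, 1.0, n_cell)        # G = 1, period 0
z01 = 2.2 + 1.0 + rng.normal(0.0, 1.0, n_cell)  # G = 0, period 1

# Counterfactual computed on the latent (linear) scale ...
linear_cf = counterfactual_untreated(z00, z01, z10)
# ... and on the observed (exponential) duration scale.
duration_cf = counterfactual_untreated(np.exp(z00), np.exp(z01), np.exp(z10))

# The two agree after transforming: exp preserves ranks in every sample.
print(np.allclose(duration_cf, np.exp(linear_cf)))
```

The average treatment effect itself is not invariant to the transformation, since a mean of exponentials differs from the exponential of a mean. Only the observation-by-observation counterfactual mapping carries over.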

Figure 11: Empirical CDFs of treatment and control groups for a nonlinear DGP model with selection problem.

Figure 12: Empirical CDFs for control and counterfactual (treated as if they were not treated).

4 Application of a non-linear DID methodology on a Subsample of The National Supported Work Demonstration

4.1 Data

The data used in this study are a subsample of the National Supported Work (NSW) Demonstration. The subsample is a sample refined by Dehejia and Wahba (1998) that was initially selected by LaLonde in 1986 and was used to test different evaluation estimators. The non-experimental comparison groups used by both LaLonde and Dehejia and Wahba were selected from the Panel Study of Income Dynamics (PSID) and the Current Population Survey (CPS).

The NSW program was a transitional, subsidized work experience program that operated for four years at fifteen locations throughout the United States. Four groups were targeted by the program: female long-term AFDC recipients, ex-drug-addicts, ex-offenders, and young school dropouts. The program provided work in a sheltered training environment and assisted in job placement. Program participation was based on a set of eligibility criteria that were intended to identify persons with significant barriers to employment. To participate in the program, the person must have been currently unemployed (defined as having worked no more than 40 hours in the four weeks prior to the time of selection for the program) and must have spent no more than three months on one regular job of at least 20 hours per week during the preceding six months.

The program operated in 10 cities as a randomized experiment from April 1975 to August 1977 and provided work experience for a period of 6-8 months to the targeted treatment group. As a result, individuals who joined early in the program had different characteristics from those who entered later. The applicants were randomly assigned to a treatment and a control group. The control group was not allowed to participate in the program.

The experimental sample includes 6,616 treatment and control observations for which data were gathered through a retrospective baseline interview and four follow-up interviews. These interviews covered the two years prior to random assignment and up to 36 months thereafter. The proportion of participants who failed to complete scheduled interviews varied across time, experimental groups, and target groups. The respondents' participation rates were statistically significantly higher for the treatment group than for the control group, even though they were only a few percentage points higher.

For the 27th-month interview, 72% of the treatment group and 68% of the control group completed the interviews. The response rates indicate that the experimental results may be biased. The analyzed data contain only the experimental sample from the randomized evaluation of the NSW program (the treatment group and control group) and do not use the nonexperimental samples (for the control group) from the PSID and CPS. The data provide information on demographic characteristics, employment history, job search, mobility, household income, housing, and drug use.

4.2 Empirical Results and Heterogeneity Tests

The experimental sample used in this paper is the one used by Dehejia and Wahba (1998, 1999), which is a subsample of LaLonde's original sample. LaLonde limits his sample to male participants assigned between January 1976 and July 1977 in order to achieve homogeneity within the treatment and control groups, reducing the sample to 297 treated observations and 425 control observations. In order to include two years of pre-program earnings to accommodate their model, Dehejia and Wahba discard 40% of LaLonde's data (they selected 185 treated and 260 control observations). In their view, this selection does not affect the properties of the experimentally randomized data set, and the treatment and control groups still have the same distribution of pre-intervention variables.

I used this particular subsample because Smith and Todd (2000) suggested that the sources of bias for Dehejia and Wahba's sample are stable over time and should be differenced out, and that DID estimators exhibit better performance than cross-sectional estimators in this particular case. Thus, a nonlinear DID estimator should also perform well on these data.

In Figure 13, I plot the empirical CDFs for the treatment and control groups. The graphs show that the treatment CDF stochastically dominates the control CDF for almost all outcome values.
Also, the graphs indicate that the difference between the two CDFs is small. Further tests of stochastic dominance are performed in the next subsection.

Table 7 shows the results obtained for the Average Treatment Effect (ATE) of the treated versus the control group for the experimental data. The results confirm the graphical evidence that the observed effect of the treatment is not statistically significant. The observed ATE is 0.6303 months, and the standard error obtained using 2000 bootstrap samples (0.5278) shows little significance of the ATE. Thus, we cannot detect a significant effect of the treatment on the treatment group when we compare it directly with the control group.

[Figure 3: empirical CDFs of employment duration, treatment and control groups]

Applying the non-linear DID approach to estimate the treatment effect, I obtain a significant difference between the ATE obtained in the benchmark experiment and the estimated average treatment effect on the treated, $\widehat{ATT}$. Also, the graphical representation of the empirical distribution for the treatment and its counterfactual for non-treatment indicates at what points in the distribution the treatment is more effective. Table 8 presents the $\widehat{ATT}$ results obtained using the non-linear DID approach. The estimated treatment effect on the treated is 2.2864 months. The standard errors obtained using 2000 bootstrap samples show that the causal effect is significant; thus, the counterfactual estimate for the non-treated captures the effect of the unobserved heterogeneity in the two experimental groups (treatment versus control). Plotting the empirical CDFs for the treated group and its counterfactual distribution (Fig. 4), we observe that the treatment dominates the counterfactual more strongly than it dominates the control group. These results are confirmed by the stochastic dominance tests in the next subsection. As a final remark, the results obtained using the non-linear DID method suggest that the benchmark experiment does not give an unbiased estimate of the treatment effect.

[Figure 4: empirical CDFs of employment duration, treated group and its counterfactual]

4.3 Heterogeneity Tests and Distributional Tests

Informally, we can think of this as measuring the plausibility of having different unobserved heterogeneity in the control and treatment groups by performing the following test:

\[
H_0 : u_c = u_t \qquad \text{against} \qquad H_1 : u_c \neq u_t .
\]

We can test for effect heterogeneity by checking whether there is a significant difference between $\widehat{ATT}$ and ATE:

\[
\widehat{ATT} - ATE = E[Y^I_{11}] - E[Y^N_{11}] - E[Y^I_{11}] + E[Y^N_{01}] = E[Y^N_{01}] - E\left[ F^{-1}_{Y,01}\left( F_{Y,00}(Y_{10}) \right) \right],
\]

which is estimated by its sample analogue

\[
\frac{1}{N_{01}} \sum_{i=1}^{N_{01}} Y_{01,i} - \frac{1}{N_{10}} \sum_{i=1}^{N_{10}} \hat{F}^{-1}_{Y,01}\left( \frac{1}{N_{00}} \sum_{j=1}^{N_{00}} 1\{ Y_{00,j} \leq y_{10,i} \} \right).
\]

The comparison of the ATE and $\widehat{ATT}$ confirms that the estimated average treatment effect is significantly different from the average treatment effect found in the data. It can therefore be inferred that unobserved heterogeneity differed between the treatment and control groups when the experimental data were collected (at the 27th-month interview).

To test for equality of distributions, First Order Stochastic Dominance (FOSD), and Second Order Stochastic Dominance (SOSD), a two-sample Kolmogorov-Smirnov statistic is used, as described in the Methodology Section. The tests show that the null hypotheses of equality ($F_E(t) = F_C(t)$) and of FOSD cannot be rejected, but the hypothesis of SOSD is rejected (see Table 10). Overall, the treatment and control employment duration distributions are not significantly different, and the tests confirm the observation from the graphical representation (see Fig. 3). Applying the two-sample Kolmogorov-Smirnov statistic to test equality of distributions, FOSD, and SOSD between the treatment and counterfactual employment durations, we reject all three null hypotheses ($F_E(t) = F_C(t)$, FOSD, and SOSD; see Table 11). Therefore, the treatment and counterfactual employment duration distributions are significantly different, which confirms the initial result that unobserved heterogeneity differs between the control and treatment groups at the 27th-month interview date because of differences in respondents' participation rates.

5 Conclusion

In this paper, I explore the use of a non-linear DID model in estimating the treatment effect of a subsidized work experience program. The model was tested on different types of data generating processes. The tests suggest that the model can be used to estimate consistently the average treatment effect and the distribution of the treatment effect. Analyzing the effect of the program on the treated individuals from a subsample of the National Supported Work Demonstration, I find that the estimated difference between the ATE and the $\widehat{ATT}$ is significant (1.6561 months with a standard error of 0.7742).
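As a rough check on this significance claim, and assuming the bootstrap distribution of the difference is approximately normal (an assumption made here for illustration, not one stated in the paper), the reported point estimates and standard error imply:

```latex
\[
\widehat{ATT} - ATE = 2.2864 - 0.6303 = 1.6561,
\qquad
\frac{\widehat{ATT} - ATE}{SE} = \frac{1.6561}{0.7742} \approx 2.14 > 1.96 .
\]
```

The ratio exceeds the two-sided 5% normal critical value, which is consistent with the significance reported in the text.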
This result can be explained by the fact that, at the 27th-month interview, 72% of the treatment group and 68% of the control group completed the interviews. The response rates indicate that the experimental results may be biased and that the average measure cannot account for the nonexperimental aspects of the data (take-up rates are increasing in subsamples of the data that are not random). Therefore, we need to use a model that corrects for this potential selection bias. This evidence explains the observed difference between the average measure and the expected measure of the employment duration between the two experimental groups (treatment and control).
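The counterfactual mean behind the estimated average treatment effect on the treated can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, outcomes are treated as uncensored durations, and the subscripts follow the group/period convention (00 and 01 for the control group before and after, 10 and 11 for the treatment group before and after).

```python
import numpy as np


def cic_counterfactual(y00, y01, y10):
    """Map each pre-period treated outcome y to F01^{-1}(F00(y)).

    y00, y01: control-group outcomes in periods 0 and 1.
    y10: treatment-group outcomes in period 0.
    (Hypothetical helper; empirical CDFs replace the population CDFs.)
    """
    s00 = np.sort(np.asarray(y00, dtype=float))
    s01 = np.sort(np.asarray(y01, dtype=float))
    # rank_i = number of control period-0 outcomes <= y10_i, so F00(y10_i) = rank_i / N00
    rank = np.searchsorted(s00, np.asarray(y10, dtype=float), side="right")
    # Smallest k with k / N01 >= rank / N00, using exact integer arithmetic
    k = -((-rank * s01.size) // s00.size)
    # The k-th order statistic of y01 is the empirical quantile; clip covers rank = 0
    return s01[np.clip(k - 1, 0, s01.size - 1)]


def att_minus_ate(y00, y01, y10, y11):
    """Return (estimated ATT, observed ATE, their difference)."""
    y11 = np.asarray(y11, dtype=float)
    att_hat = y11.mean() - cic_counterfactual(y00, y01, y10).mean()
    ate = y11.mean() - np.mean(y01)
    return att_hat, ate, att_hat - ate
```

The difference returned by `att_minus_ate` equals the mean of the control group's second-period outcomes minus the mean of the counterfactual outcomes, so it is zero when the two groups share the same first-period distribution.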

7 Tables

Table 1: Tests on Distributional Effects of Control and Estimated Control Groups

                                        Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic p-value    0.945                        0.9070    0.0005

Table 2: Tests on Distributional Effects of Control and Estimated Control Groups

                                        Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic p-value    0.9025                       0.8965    0.0005

Table 3: Tests on Distributional Effects of Control and Estimated Control Groups

                                        Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic p-value    0.5700                       0.5700    0.0005

Table 4: Tests on Distributional Effects of Control and Estimated Control Groups

                                        Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic p-value    0.9405                       0.9270    0.0005

Table 5: Tests on Distributional Effects of Control and Estimated Control Groups

                                        Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic p-value    0.889                        0.5285    0.0005

Table 6: Tests on Distributional Effects of Control and Estimated Control Groups

                                        Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic p-value    0.6275                       0.6275    0.0005

Table 7: Results for the causal treatment effect (treated versus control group)

                     Control Group    Treated    Causal effect (ATE = E(Y_t) - E(Y_nt))
Employment weeks     7.8615           8.4918     0.6303
Standard errors      0.3905           0.3534     0.5278
N (individuals)      260              185

Table 8: Results for the estimated causal treatment effect (treated versus counterfactual)

                     Treated Group    Counterfactual    Causal effect ($\widehat{ATT}$ = E(Y_t) - E(Y_c))
Employment weeks     8.4918           6.2054            2.2864
Standard errors      0.386            0.3845            0.5403
N (individuals)      185              185
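The standard errors in Tables 7 and 8 come from 2000 bootstrap samples. A minimal sketch of such a bootstrap for a difference in means is given below, assuming independent resampling with replacement within each group; the paper's exact resampling scheme is not restated in this section, and the function name is hypothetical.

```python
import numpy as np


def bootstrap_mean_difference(treated, control, n_boot=2000, seed=0):
    """Difference in means and its bootstrap standard error.

    Each group is resampled with replacement at its own size (an assumption
    of this sketch), and the standard deviation of the resampled differences
    is reported as the standard error.
    """
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    estimate = treated.mean() - control.mean()
    draws = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(treated, size=treated.size, replace=True)
        c = rng.choice(control, size=control.size, replace=True)
        draws[b] = t.mean() - c.mean()
    return estimate, draws.std(ddof=1)
```

Calling `bootstrap_mean_difference(y_treated, y_control)` on the two outcome vectors returns the point estimate and its standard error; replacing `y_control` with the counterfactual outcomes gives the analogue for the estimated treatment effect on the treated, although a full bootstrap would also re-estimate the counterfactual in every replication.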

Table 9: Test on effect heterogeneity

                         $\widehat{ATT}$ - ATE
Estimate                 1.6561
SE (standard error)      0.7742

Table 10: Tests on Distributional Effects of Treatment and Control

                                Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic    0.6538                       0.6538    0.2843
p-value                         0.2340                       0.2340    0.0005

Table 11: Tests on Distributional Effects of Treatment and Counterfactual

                                Equality in Distributions    FOSD      SOSD
Kolmogorov-Smirnov statistic    2.287                        1.875     1.0026
p-value                         0.0005                       0.005     0.0000
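The equality and first-order dominance tests reported above rest on two-sample Kolmogorov-Smirnov statistics. The sketch below shows one way to compute them with resampling-based p-values. It assumes a pooled-sample bootstrap and the usual square-root scaling, neither of which is confirmed by this section (the paper's procedure is in its Methodology Section); the function names are hypothetical and the SOSD statistic is omitted.

```python
import numpy as np


def ks_statistics(a, b):
    """Scaled two-sample KS statistics.

    Returns (sup |Fa - Fb|, sup (Fa - Fb)), each multiplied by
    sqrt(na * nb / (na + nb)). The second statistic is large when
    `a` fails to first-order dominate `b`.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the supremum is attained at a data point
    fa = np.searchsorted(a, grid, side="right") / a.size
    fb = np.searchsorted(b, grid, side="right") / b.size
    scale = np.sqrt(a.size * b.size / (a.size + b.size))
    diff = fa - fb
    return scale * np.abs(diff).max(), scale * diff.max()


def bootstrap_pvalues(a, b, n_boot=2000, seed=0):
    """P-values from resampling the pooled sample (assumed scheme)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = np.array(ks_statistics(a, b))
    pooled = np.concatenate([a, b])
    exceed = np.zeros(2)
    for _ in range(n_boot):
        draw = rng.choice(pooled, size=pooled.size, replace=True)
        stats = np.array(ks_statistics(draw[:a.size], draw[a.size:]))
        exceed += stats >= observed  # count replications at least as extreme
    return observed, exceed / n_boot
```

With 2000 replications the smallest nonzero p-value this procedure can return is 1/2000 = 0.0005.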