Evaluating the Effects of Lengths of Participation in the Workforce Investment Act Adult Program via Decomposition Analysis

Wallice Ao (Institute for Defense Analyses; ao@wisc.edu), Sebastian Calonico (University of Miami), Ying-Ying Lee (University of Oxford)
(Preliminary Version) February, 2016

Abstract

This paper proposes a procedure to analyze how different levels of participation in active labor market programs affect participants' labor market outcomes. To do so, we propose an efficient propensity score weighting estimator that decomposes the differences in wage distributions for participants with different lengths of program participation into (1) a wage-structure effect, arising from the different wage structures associated with different lengths of participation, and (2) a composition effect, arising from the different distributions of characteristics among participants with different lengths of participation. Under the unconfoundedness assumption, these counterfactual effects reveal causal relationships, such as treatment effects for the treated, where the multi-valued treatment variable is the length of participation. Moreover, we calculate the semiparametric efficiency bound for multi-valued treatment effects, which generalizes well-studied results in the binary treatment effect literature. We apply the procedure to study the Workforce Investment Act (WIA) program. Our estimation results show that heterogeneity in lengths of participation is an important dimension for evaluating the WIA Adult program and other social programs in which participation varies. The results of this paper, both theoretical and empirical, provide a rigorous assessment of intervention programs and relevant suggestions to improve their performance and cost-effectiveness.

JEL classification: I38, H53, C31, C14

Keywords: Program evaluation, decomposition analysis, treatment effects, propensity score, counterfactual distribution, multi-valued treatments, semiparametric efficiency
The authors are thankful to Steven Durlauf, Christopher Taber, Jesse Gregory, Jack Porter, James Walker, Robert Haveman, and Matias Cattaneo for their valuable advice and suggestions. The author also thanks participants at the 2013 Midwest Economic Association Annual Meeting, the 2013 Public Workshop at UW-Madison, and the COMPIE 2014 Conference for their useful comments.
Sebastian Calonico: scalonico@bus.miami.edu.
Ying-Ying Lee: Department of Economics, University of Oxford, Manor Road Building, Manor Road, OX1 3UQ, United Kingdom; ying-ying.lee@economics.ox.ac.uk. Website:

1 Introduction

There is an extensive literature on the evaluation of active labor market programs.[1] In much of this work, program performance is measured by comparing the employment outcomes of those who participate in the programs with those of people who do not. However, a prominent feature of these programs is that participation varies in length and/or intensity, and different levels of program participation may yield very different outcomes. It is well documented that active labor market programs can improve participants' labor market outcomes by providing the training and assistance they need to obtain higher wages and find more suitable jobs. At the same time, participants may face lock-in effects: longer participation may yield unsatisfying labor market outcomes because of lost labor market experience, which employers may view as a negative signal. From a policy point of view, while these programs can improve participants' welfare and reduce unemployment, they can also be very costly over time.

In this paper, we propose a procedure to estimate how different levels of program participation affect participants' labor market outcomes, mainly wages. In addition, we seek to identify the explanatory factors that drive the observed differences in wage outcomes. First, we decompose the differences in wage distributions for participants with different lengths of program participation into those arising from (1) the wage-structure effect and (2) the composition effect. A wage structure, or wage schedule, is a stochastic mapping from workers' characteristics to a wage distribution. The wage-structure effect captures how participants are distinguished in the labor market by their levels of program participation. Analyzing the wage-structure effect helps us identify where the program is most effective.
On the other hand, the composition effect captures how much the differences in characteristics distributions among participants with different levels of participation contribute to the observed differences in wage distributions. Such a decomposition can easily be carried out for distributional features of the counterfactual wage distributions, such as the mean, quantiles, and inequality measures.

We propose an efficient propensity score weighting estimator to implement the above decomposition analysis. When the level of participation is treated as a multi-valued treatment variable that participants choose randomly conditional on their characteristics, the counterfactual effects in the decomposition analysis reveal causal relationships, such as treatment effects for the treated. Our estimator is a nontrivial extension of Cattaneo (2010), who studies multi-valued treatment effects. Furthermore, we decompose the composition effect to isolate the effects of different explanatory factors within participants' characteristics. We call this the detailed composition effect; it reveals how specific characteristics contribute to the observed wage outcomes.

We implement our procedure to study how different levels of participation in the Workforce Investment Act (WIA) program affect participants' post-program wage outcomes. Our data contain detailed demographic information and employment histories for a representative sample of WIA program participants entering the program from July 2003 to June 2005. In this sample, program participation ranges from less than one month to more than four years. We discretize the length of participation into four levels: short (1-5 months), medium (6-8 months), long (9-14 months), and very long (15-24 months). We define the wage outcome as the difference between the wage 8 quarters after leaving the program and the wage 4 quarters before entering the program.

Our estimation results suggest that the wage structures faced by participants with different lengths of participation contribute much to the differences in wage outcomes among them. On the other hand, the characteristics distributions of participants with different lengths of participation contribute little to these differences. Wage outcomes are positive and significantly different from zero for all four levels of participation. This is consistent with the findings of previous studies of WIA (Heinrich, Mueser, and Troske (2009) and Hollenbeck, Schroeder, King, and Huang (2005)), which compare the post-program wages of participants and non-participants using a binary treatment effect approach. The magnitude of the average wage gain, however, differs across gender-racial groups. The average wage gains are higher for white male and female participants than for black male and female participants, and black male participants experience the lowest average wage gain. This result is consistent with prior studies of labor market programs such as the JTPA (see, for example, Nightingale and Elaine (2011)). Most notably, we find that wage outcomes do not have a monotonic relationship with the length of participation in the program, and this relationship varies across gender-racial groups. Our results contrast with those found in Flores, Flores-Lagunes, Gonzalez, and Neumann (2012), where the Job Corps program has a slightly positive impact on the employment outcome for shorter durations of participation. Finally, we estimate the detailed composition effect of unemployment insurance status...

[1] See the handbook chapter by Heckman, LaLonde, and Smith (1999). Heckman and Vytlacil (2007) and Imbens and Wooldridge (2009) provide comprehensive reviews and discussions of the program evaluation literature.
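The discretization and the outcome definition above can be written as two small helper functions. This is a minimal sketch: the function names, the whole-month input, and returning None for lengths outside 1-24 months (which the four levels do not cover) are our assumptions, not the authors' code.

```python
from typing import Optional

# Hypothetical helper table: (lowest month, highest month, label) for each level
LEVELS = [
    (1, 5, "short"),
    (6, 8, "medium"),
    (9, 14, "long"),
    (15, 24, "very long"),
]


def participation_level(months: int) -> Optional[str]:
    """Map a length of participation in whole months to one of the four levels."""
    for low, high, label in LEVELS:
        if low <= months <= high:
            return label
    return None  # assumption: lengths outside 1-24 months are left unclassified


def wage_outcome(wage_8q_after_exit: float, wage_4q_before_entry: float) -> float:
    """Wage 8 quarters after leaving minus wage 4 quarters before entering."""
    return wage_8q_after_exit - wage_4q_before_entry
```

For example, `[participation_level(m) for m in (3, 7, 12, 20, 30)]` gives `['short', 'medium', 'long', 'very long', None]`.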
This paper contributes to the following strands of literature. First, we contribute to a growing program evaluation literature on assessing the impact of different levels of participation in social programs. In contrast to the vast binary treatment effect literature, which captures the effect of participating in a program, econometric methods for treatment intensity effects are less developed. To the best of our knowledge, this paper is the first program evaluation of the effects of lengths of participation using decomposition analysis and multi-valued treatment effect estimation. Related works include Behrman, Cheng, and Todd (2004), which treats duration in a program as a continuous treatment and uses a matching-type estimator to evaluate a Bolivian preschool program. Hirano and Imbens (2004) propose a regression estimator using the generalized propensity score for continuous treatment effects, which has been used to evaluate welfare programs, such as Progresa/Oportunidades in Mexico (Ibarraran and Villa, 2010) and the South African Child Support Grant (Agüero et al., 2010), and job training programs, such as Job Corps in Flores, Flores-Lagunes, Gonzalez, and Neumann (2012) and a German adult job training program in Kluve, Schneider, Uhlendorff, and Zhao (2012).

Our second contribution is to provide estimators for distributional multi-valued treatment effects for the treated and for decomposition analysis. Our propensity score weighting estimators build on and extend Cattaneo (2010), who studies multi-valued treatment effects for the population. The estimation and inference procedures apply to multi-valued quantile treatment effects for the treated and to various inequality measures. The nonparametric estimators are robust to misspecification and easy to implement.[2] Our estimators are efficient: their asymptotic variances attain the semiparametric efficiency bound derived in [?]. The GMM estimation procedure allows for joint estimation and joint inference on the treatment effects across treatment levels and across distributional features; for example, we could test whether the median equals the mean. The estimators have many potential applications. One example is evaluating the effects of multiple treatments offered by a program, such as the JTPA in Plesca and Smith (2007) and the National Evaluation of Welfare-to-Work Strategies study in Flores and Mitnik (2014).[3]

This paper also contributes to the decomposition analysis literature, recently reviewed by Fortin, Lemieux, and Firpo (2011). Chernozhukov, Fernandez-Val, and Melly (2013) develop a uniform inference procedure for semiparametric regression-based estimators, while we contribute a uniform inference result for efficient nonparametric propensity score weighting estimators. Firpo and Pinto (2011) address the idea of counterfactual distributions and propose a propensity score weighting estimator for the distributional effect of a binary treatment. Most notably, we add to the decomposition literature on detailed composition effects, introduced by DiNardo, Fortin, and Lemieux (1996) and Fortin, Lemieux, and Firpo (2011), which isolate the impacts of specific explanatory factors. That is, the counterfactual is based on the conditional distribution of one factor given the other explanatory covariates, while the other explanatory covariates keep the same distribution. The detailed decomposition effect provides policymakers with information on how specific characteristics affect the outcomes of different treated groups, and it is novel to the decomposition literature.
We calculate the semiparametric efficiency bound and propose an efficient estimator to analyze the detailed decomposition effects.

This paper is organized as follows. Section 2 describes the econometric methods for decomposition analysis and discusses the relationship between our decomposition analysis and the treatment effects literature. Section 3 describes the institutional background of the Workforce Investment Act programs and the data, and highlights the importance of addressing heterogeneity in program participation as a dimension of program evaluation. Section 4 describes the implementation of the decomposition analysis using our proposed propensity score weighting estimator, and presents and discusses our estimation results. Section 5 collects the econometric theory for our proposed propensity score weighting estimator. In Section 2.3, we propose an efficient propensity score weighting estimator for the detailed composition effects and derive its asymptotic properties. The Appendix presents the proofs of our theorems and supplementary tables and figures.

[2] Building on the Stata modules developed by Cattaneo, Drukker, and Holland (2012), which implement estimation of average treatment effects, we develop R modules that implement estimation of average treatment effects for the treated.

[3] Plesca and Smith (2007) use a matching estimator to evaluate the JTPA, which offers multiple treatments, or different services, to participants. They illustrate that disaggregating multi-treatment programs can provide useful insights into program operation. Flores and Mitnik (2014) consider the problem of using data from multiple programs implemented at different locations, where a local government is considering implementing one of several possible job training programs. In contrast to the population effects in Flores and Mitnik (2014), our treatment effects for the treated can evaluate the effect of implementing one program at a specified location.

2 Decomposition Analysis and Treatment Effects

In this section, we formally describe the decomposition analysis framework. We observe that wage outcomes differ across groups with different levels of program participation; more specifically, the distributions of wage outcomes vary across groups. We seek to identify the explanatory features that drive this variation. Section 2.1 introduces the counterfactual effects of decomposition analysis in the spirit of Oaxaca (1973), Blinder (1973), DiNardo, Fortin, and Lemieux (1996), Fortin, Lemieux, and Firpo (2011), and Chernozhukov, Fernandez-Val, and Melly (2013). In particular, we analyze the effect of two changes on the wage distributions. First, the conditional wage distribution given covariates changes; this is the wage structure effect. Second, the distribution of the covariates changes; this is the composition effect. In Section 2.2, we interpret the counterfactual effects as treatment effects for the treated and overall treatment effects under the unconfoundedness assumption.

2.1 Counterfactual effects

We follow the setup in Fortin, Lemieux, and Firpo (2011), DiNardo, Fortin, and Lemieux (1996), and Chernozhukov, Fernandez-Val, and Melly (2013). The population of agents is categorized into mutually exclusive sub-populations indexed by $t \in \mathcal{T}$, where $\mathcal{T} = \{0, 1, 2, \dots, J\}$ is a finite discrete set for some fixed positive integer $J$. The index $t$ could generally indicate a policy intervention or an economic environment, such as unionization or time periods. We label the sub-population belonging to, choosing, or assigned to $t$ as group-$t$. In each group-$t$, an independent and identically distributed sample $\{(Y_{ti}, X_{ti})\}_{i=1,\dots,n_t}$ is drawn from the joint distribution of $(Y_t, X_t) \in \mathcal{Y} \times \mathcal{X} \subseteq \mathbb{R}^{1+d_x}$. Because these data are observed, we can identify the outcome distribution $F_{Y_t}$, the covariate distribution $F_{X_t}$, and the conditional distribution $F_{Y_t|X_t}$.
By the law of iterated expectations, the actual outcome distribution of group-$t$ can be written as

$$F_{Y_t}(y) = \int_{\mathcal{X}} F_{Y_t|X_t}(y|x)\, dF_{X_t}(x).$$

The conditional outcome distribution given covariates, $\{x \mapsto F_{Y_t|X_t}(y|x) : y \in \mathcal{Y}\}$, describes the wage structure, that is, the stochastic assignment of outcomes to program participants with characteristics $x$, and can be viewed as the wage schedule of group-$t$. The distribution $F_{X_t}(x)$ is the actual characteristics distribution of group-$t$. We formally define the counterfactual distribution

$$F_{Y\langle t|t'\rangle}(y) \equiv \int_{\mathcal{X}} F_{Y_t|X_t}(y|x)\, dF_{X_{t'}}(x), \qquad (2.1)$$

which is a well-defined statistical object under the following assumption.

Assumption 1 (Common Support) The support of $X_t$, $\mathcal{X} \subseteq \mathbb{R}^{d_x}$, is the same for all $t \in \mathcal{T}$.
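With a discrete covariate, the integral in (2.1) becomes a weighted sum of group-$t$ conditional CDFs, weighted by group-$t'$'s covariate frequencies. The sketch below is a plug-in (cell-frequency) illustration under that discreteness assumption; it is not the paper's propensity score weighting estimator, and the function name is hypothetical.

```python
import numpy as np


def counterfactual_cdf(y, y_t, x_t, x_tp):
    """Plug-in estimate of F_{Y<t|t'>}(y) in (2.1) for a discrete covariate X.

    y_t, x_t : outcomes and covariates of group t (source of the wage schedule)
    x_tp     : covariates of group t' (source of the covariate distribution)
    """
    y_t, x_t, x_tp = np.asarray(y_t), np.asarray(x_t), np.asarray(x_tp)
    total = 0.0
    for x in np.unique(x_tp):
        in_cell = x_t == x
        if not in_cell.any():
            # Assumption 1 (common support) fails at this covariate value
            raise ValueError(f"covariate value {x} not observed in group t")
        cond_cdf = np.mean(y_t[in_cell] <= y)  # F_{Y_t|X_t}(y | x)
        weight = np.mean(x_tp == x)            # mass of x under F_{X_t'}
        total += cond_cdf * weight
    return total
```

Passing group-$t$'s own covariates as `x_tp` reproduces group-$t$'s empirical CDF, which is the law-of-iterated-expectations identity above; the error branch flags a violation of Assumption 1.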

The common support assumption ensures that agents with the same characteristics are observed at every length of participation. We view the counterfactual outcome $Y\langle t|t'\rangle$ as a random variable generated by the distribution function $F_{Y\langle t|t'\rangle}$. The counterfactual distribution $F_{Y\langle t|t'\rangle}$ has two interpretations. First, it is the counterfactual wage distribution that would have prevailed for group-$t'$ if they had faced group-$t$'s wage schedule (conditional outcome distribution) $\{x \mapsto F_{Y_t|X_t}(y|x) : y \in \mathcal{Y}\}$. Second, it is the counterfactual wage distribution of group-$t$ if they had had group-$t'$'s characteristics distribution $F_{X_{t'}}(x)$. These interpretations of $F_{Y\langle t|t'\rangle}(y)$ are valid under the invariance of the conditional distribution stated in Assumption 2.[4]

Assumption 2 (Invariance of Conditional Distribution) The conditional wage distribution $F_{Y_t|X_t}(y|x)$ applies, or can be extrapolated, for all $x \in \mathcal{X}$; that is, it remains valid when the marginal distribution $F_{X_{t'}}$ replaces $F_{X_t}$.

Following the definition of the counterfactual distribution in (2.1), the difference in outcome distributions between group-$t$ and group-$t'$ can be decomposed as

$$\begin{aligned}
F_{Y_t}(y) - F_{Y_{t'}}(y) &= F_{Y\langle t|t\rangle}(y) - F_{Y\langle t'|t'\rangle}(y) \\
&= \int_{\mathcal{X}} \big( F_{Y_t|X_t}(y|x) - F_{Y_{t'}|X_{t'}}(y|x) \big)\, dF_{X_t}(x) \qquad (2.2) \\
&\quad + \int_{\mathcal{X}} F_{Y_{t'}|X_{t'}}(y|x)\, d\big( F_{X_t}(x) - F_{X_{t'}}(x) \big). \qquad (2.3)
\end{aligned}$$

The first part, $F_{Y\langle t|t\rangle}(y) - F_{Y\langle t'|t\rangle}(y)$ in (2.2), represents the wage structure effect, arising from different wage schedules across groups. The second part, $F_{Y\langle t'|t\rangle}(y) - F_{Y\langle t'|t'\rangle}(y)$ in (2.3), is the composition effect, arising from the different characteristics distributions of participants across groups. We can similarly decompose functionals of the wage distributions, such as the mean or the quantile functions, into wage structure and composition effects.
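To see that (2.2) and (2.3) add up to the raw difference, the sketch below evaluates the three counterfactual CDFs $F_{Y\langle t|t\rangle}$, $F_{Y\langle t'|t\rangle}$, and $F_{Y\langle t'|t'\rangle}$ at a point $y$ by plug-in, assuming a discrete covariate that satisfies common support. The function names are hypothetical and the code is an illustration, not the authors' estimator.

```python
import numpy as np


def _cf_cdf(y, y_sched, x_sched, x_target):
    # Wage schedule of one group (y_sched, x_sched) integrated against the
    # covariate frequencies of another group (x_target); assumes common support.
    return sum(
        np.mean(y_sched[x_sched == x] <= y) * np.mean(x_target == x)
        for x in np.unique(x_target)
    )


def decompose_cdf(y, y_t, x_t, y_tp, x_tp):
    """Split F_{Y_t}(y) - F_{Y_t'}(y) into (2.2) wage structure and (2.3) composition."""
    y_t, x_t, y_tp, x_tp = map(np.asarray, (y_t, x_t, y_tp, x_tp))
    f_t_t = _cf_cdf(y, y_t, x_t, x_t)       # F_{Y<t|t>}: actual CDF of group t
    f_tp_t = _cf_cdf(y, y_tp, x_tp, x_t)    # F_{Y<t'|t>}: t' schedule, t covariates
    f_tp_tp = _cf_cdf(y, y_tp, x_tp, x_tp)  # F_{Y<t'|t'>}: actual CDF of group t'
    wage_structure = f_t_t - f_tp_t
    composition = f_tp_t - f_tp_tp
    return wage_structure, composition
```

For instance, `decompose_cdf(1.5, [3, 4, 5, 6], [0, 1, 1, 1], [1, 2, 3, 4], [0, 0, 0, 1])` returns a wage structure effect of -1/12 and a composition effect of -1/6, which sum to the raw difference of -0.25. Evaluating on a grid of $y$ values traces out both effects over the whole distribution.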
For example, the average outcomes of group-$t$ and group-$t'$ can be decomposed as follows:

$$\begin{aligned}
E[Y_t] - E[Y_{t'}] &= E[Y\langle t|t\rangle] - E[Y\langle t'|t'\rangle] \\
&= \int_{\mathcal{X}} \int_{\mathcal{Y}} y\, d\big( F_{Y_t|X_t}(y|x) - F_{Y_{t'}|X_{t'}}(y|x) \big)\, dF_{X_t}(x) \qquad (2.4) \\
&\quad + \int_{\mathcal{X}} E[Y_{t'} \mid X_{t'} = x]\, d\big( F_{X_t}(x) - F_{X_{t'}}(x) \big). \qquad (2.5)
\end{aligned}$$

Let $t'$ denote the group of participants who stay in the WIA program for the short term and let $t$ denote those who stay for the long term. Then $E[Y\langle t|t\rangle] - E[Y\langle t'|t\rangle]$ in (2.4) represents the average wage structure effect: it compares the average wage of long-term participants with the average wage they would have earned under the wage schedule of short-term participants, $F_{Y_{t'}|X_{t'}}(y|x)$. On the other hand, the expression in (2.5), which equals $E[Y\langle t'|t\rangle] - E[Y\langle t'|t'\rangle]$, represents the average composition effect: it compares the average wage short-term participants would earn if they had the covariate distribution of long-term participants with their actual average wage.

By the same reasoning, we define the overall counterfactual distribution

$$F_{Y\langle t\rangle}(y) \equiv \sum_{t' \in \mathcal{T}} F_{Y\langle t|t'\rangle}(y)\, p_{t'} = \int_{\mathcal{X}} F_{Y_t|X_t}(y|x)\, dF_X(x),$$

where $p_{t'}$ is the population share of group-$t'$ and $F_X$ is the covariate distribution of the whole population. The overall counterfactual wage outcome $Y\langle t\rangle$ is a random variable generated by the marginal distribution $F_{Y\langle t\rangle}$. It can be interpreted as the counterfactual distribution for all program participants if they all had the same wage schedule as group-$t$. Therefore, the overall average wage structure effect $E[Y\langle t\rangle] - E[Y\langle t'\rangle]$ is the difference between the average wage if everyone were paid according to the wage schedule of the long-term group ($t$) and the average wage if everyone were paid according to the wage schedule of the short-term group ($t'$).

Now we summarize and define the parameters of interest. We define the decomposition parameter $\gamma_{t|t'} = E[Y\langle t|t'\rangle]$ to be the mean of the counterfactual distribution for $t, t' \in \mathcal{T}$, and the overall parameter $\beta_t = E[Y\langle t\rangle]$ to be the mean of the overall counterfactual distribution for $t \in \mathcal{T}$. In our application, we evaluate the WIA by estimating the following counterfactual effects: the average wage structure effect $E[Y\langle t|t\rangle] - E[Y\langle t'|t\rangle]$, the average composition effect $E[Y\langle t'|t\rangle] - E[Y\langle t'|t'\rangle]$, and the overall average wage structure effect $E[Y\langle t\rangle] - E[Y\langle t'\rangle]$, for lengths of participation $t, t' \in$ {short, medium, long, very long}. In the next section, we will see that, under the unconfoundedness assumption, the average wage structure effect $E[Y\langle t|t\rangle] - E[Y\langle t'|t\rangle]$ is the average treatment effect for the treated and the overall average wage structure effect $E[Y\langle t\rangle] - E[Y\langle t'\rangle]$ is the average treatment effect. The overall parameter $\beta_t$, which then equals $E[Y(t)]$, is known as the dose response function in the statistical literature and as the average structural function in the econometrics literature.

[4] This assumption is borrowed from Assumption 6 in Fortin, Lemieux, and Firpo (2011).
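By iterated expectations, both parameters can be written as propensity score weighted means: $\gamma_{t|t'} = E[D_t Y\, P(T=t'|X) / (P(T=t|X)\, P(T=t'))]$ and $\beta_t = E[D_t Y / P(T=t|X)]$, where $D_t = 1\{T = t\}$ and $T$ is the group indicator formalized in Section 2.2. The sketch below estimates them with cell-frequency propensity scores for a discrete covariate. It only illustrates the weighting idea and is not the authors' efficient estimator; all names and the toy data are hypothetical.

```python
import numpy as np


def cell_propensity(t_obs, x_obs, t):
    """P(T = t | X = x) estimated by the share of group t within each covariate cell."""
    p = np.empty(len(x_obs))
    for x in np.unique(x_obs):
        cell = x_obs == x
        p[cell] = np.mean(t_obs[cell] == t)
    return p


def gamma_ipw(y, t_obs, x_obs, t, t_prime):
    """Sample analog of gamma_{t|t'} = E[D_t Y P(T=t'|X) / (P(T=t|X) P(T=t'))]."""
    in_t = t_obs == t  # P(T = t | X) > 0 on these rows by construction
    ratio = cell_propensity(t_obs, x_obs, t_prime)[in_t] / cell_propensity(t_obs, x_obs, t)[in_t]
    return np.sum(y[in_t] * ratio) / (len(y) * np.mean(t_obs == t_prime))


def beta_ipw(y, t_obs, x_obs, t):
    """Sample analog of beta_t = E[D_t Y / P(T=t|X)]."""
    in_t = t_obs == t
    return np.sum(y[in_t] / cell_propensity(t_obs, x_obs, t)[in_t]) / len(y)


# Hypothetical toy sample: T = 1 is long-term (t), T = 0 is short-term (t')
y = np.array([3, 4, 5, 6, 1, 2, 3, 4], dtype=float)
t_obs = np.array([1, 1, 1, 1, 0, 0, 0, 0])
x_obs = np.array([0, 1, 1, 1, 0, 0, 0, 1])

wage_structure = gamma_ipw(y, t_obs, x_obs, 1, 1) - gamma_ipw(y, t_obs, x_obs, 0, 1)
composition = gamma_ipw(y, t_obs, x_obs, 0, 1) - gamma_ipw(y, t_obs, x_obs, 0, 0)
```

On this toy sample, $\hat\gamma_{1|1} = 4.5$ and $\hat\gamma_{0|0} = 2.5$ are the group means, $\hat\gamma_{0|1} = 3.5$, and both effects equal 1.0 (up to floating-point rounding), adding up to the raw difference in means of 2.0; $\hat\beta_1 = 4.0$ and $\hat\beta_0 = 3.0$.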
2.2 Treatment Effects

This section relates the decomposition analysis to the treatment effect model, following the discussion in Fortin, Lemieux, and Firpo (2011) and Chernozhukov, Fernandez-Val, and Melly (2013). In the treatment effect model, the treatment status $t$ is the realized value of a multi-valued random treatment variable $T \in \mathcal{T}$. We view $T$ as the length of participation chosen by the participants. The outcome of group-$t$, $Y_t$, is taken to be the potential outcome for the entire population of interest, commonly denoted $Y(t)$; note that $Y_t$ in Section 2.1 has no causal interpretation. We assume there exists a collection of potential outcomes $\{Y(t)\}_{t \in \mathcal{T}}$ for the population. For each individual $i$, we observe the outcome $Y_i = Y_i(t)$ if he or she receives treatment $T_i = t$; the other potential outcomes $Y_i(j)$ for $j \neq t$ are latent. For ease of exposition, we express the outcome variable for the population as $Y = \sum_{t=0}^{J} Y_t D_t$, where $D_t \equiv 1\{T = t\}$ is the indicator of the multi-valued group status and $\sum_{t=0}^{J} D_t = 1$. Similarly, the covariates for the population are $X = \sum_{t=0}^{J} X_t D_t$. We can then write the actual conditional distribution function as $F_{Y_t|X_t}(y|x) = F_{Y|T,X}(y|t,x) = F_{Y(t)|T,X}(y|t,x)$ and $F_{X_t}(x) = F_{X|T}(x|t)$ for each group $t \in \mathcal{T}$.

The counterfactual effects are well-defined statistical parameters under the common support Assumption 1. When unconfoundedness is assumed, the descriptive decomposition analysis carries a causal interpretation:

$$F_{Y\langle t|t'\rangle}(y) = \int_{\mathcal{X}} F_{Y|X,T}(y|x,t)\, dF_{X|T}(x|t') = \int_{\mathcal{X}} F_{Y(t)|X,T}(y|x,t)\, dF_{X|T}(x|t') = \int_{\mathcal{X}} F_{Y(t)|X,T}(y|x,t')\, dF_{X|T}(x|t') = F_{Y(t)|T}(y|t'),$$

where the third equality comes from the unconfoundedness assumption: $Y(t)$ is independent of $T$ conditional on $X$. That is, conditional on a rich set of covariates $X$, each individual chooses his or her treatment level $T$ randomly over the whole choice set $\mathcal{T}$. The invariance Assumption 2 is thus replaced by the stronger unconfoundedness assumption, also known as selection on observables, ignorability, or missing at random, which has often been used in the treatment effect literature following a series of papers by Rubin and coauthors. The common support Assumption 1 corresponds to the overlap assumption, i.e., the propensity score $P(T = t|X)$ is bounded away from zero almost surely.

Let $t$ denote long-term participation and $t'$ denote short-term participation, where switching a participant from long-term to short-term participation is considered the treatment. The overall counterfactual distribution $F_{Y\langle t\rangle}(y) = \int_{\mathcal{X}} F_{Y|X,T}(y|x,t)\, dF_X(x) = F_{Y(t)}(y)$ is the marginal distribution of the potential outcome $Y(t)$. For example, the difference between the means of the overall counterfactual distributions, $E[Y\langle t\rangle] - E[Y\langle t'\rangle]$, reveals the average treatment effect $E[Y(t)] - E[Y(t')]$: the change in average wages if all participants switched from short-term to long-term participation. More importantly, the wage structure effect in (2.2), $F_{Y\langle t|t\rangle}(y) - F_{Y\langle t'|t\rangle}(y)$, which in our decomposition analysis represents the counterfactual effect of a change in the wage schedule, equals $F_{Y(t)|T}(y|t) - F_{Y(t')|T}(y|t)$, a treatment effect for the treated: it compares the actual wage distribution of long-term participants ($t$) with the one they would have had if they had stayed in the program only for the short term.
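A small simulation can illustrate this causal reading: when treatment depends on $X$ only, a propensity score weighted estimate of $\gamma_{t|t'}$ recovers $E[Y(t)|T = t']$, and the wage structure effect recovers the average treatment effect for the treated. The data-generating process, the labels ($T = 1$ long-term, $T = 0$ short-term), and the cell-frequency propensity scores are illustrative assumptions; the true values quoted in the comments follow from this design, not from the WIA data.

```python
import numpy as np


def cell_propensity(t_obs, x_obs, t):
    # P(T = t | X = x) estimated by the share of group t within each covariate cell
    p = np.empty(len(x_obs))
    for x in np.unique(x_obs):
        cell = x_obs == x
        p[cell] = np.mean(t_obs[cell] == t)
    return p


def gamma_ipw(y, t_obs, x_obs, t, t_prime):
    # Weighting estimate of gamma_{t|t'} = E[Y<t|t'>]
    in_t = t_obs == t
    w = cell_propensity(t_obs, x_obs, t_prime)[in_t] / cell_propensity(t_obs, x_obs, t)[in_t]
    return np.sum(y[in_t] * w) / (len(y) * np.mean(t_obs == t_prime))


rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 3, size=n)                                  # covariate with values 0, 1, 2
t = (rng.random(n) < np.array([0.2, 0.5, 0.8])[x]).astype(int)  # selection depends on X only
y0 = 1.0 + 0.5 * x + rng.normal(size=n)                         # potential outcome Y(0)
y1 = 2.0 + 1.5 * x + rng.normal(size=n)                         # potential outcome Y(1)
y = np.where(t == 1, y1, y0)                                    # observed outcome Y = Y(T)

gamma_hat = gamma_ipw(y, t, x, 0, 1)            # targets E[Y(0) | T = 1] = 1.7
att_hat = gamma_ipw(y, t, x, 1, 1) - gamma_hat  # targets E[Y(1) - Y(0) | T = 1] = 2.4
naive = y[t == 1].mean() - y[t == 0].mean()     # raw gap, targets 2.8
print(f"gamma_0|1 = {gamma_hat:.3f}, ATT = {att_hat:.3f}, raw gap = {naive:.3f}")
```

In this design the raw gap exceeds the treatment effect for the treated by about 0.4, which is the composition effect $E[Y(0)|T=1] - E[Y(0)|T=0]$.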
On the other hand, the composition effect in (2.3), $F_{Y\langle t'|t\rangle}(y) - F_{Y\langle t'|t'\rangle}(y)$, which represents the counterfactual effect of changing the covariate distribution from that of the short-term participants to that of the long-term participants, equals $F_{Y(t')|T}(y|t) - F_{Y(t')|T}(y|t')$: the difference in wage distributions between long-term participants ($t$) and short-term participants ($t'$) if both had been treated for the short term.

2.3 Detailed composition effects

We further decompose the composition effect by explanatory factor, which provides an alternative way to perform the counterfactual analysis in DiNardo, Fortin, and Lemieux (1996) and Fortin, Lemieux, and Firpo (2011). This detailed decomposition provides policymakers with information on how specific characteristics affect the outcomes of different treated groups. Recall that in Section 2.1, $F_{Y\langle t|t'\rangle}$ is interpreted as the counterfactual outcome distribution of group-$t$ if they had group-$t'$'s characteristics distribution $F_{X_{t'}}(x)$. In this section, we follow DiNardo, Fortin, and Lemieux (1996) and Fortin, Lemieux, and Firpo (2011) to further decompose the aggregate composition effect and isolate the contributions of different factors in the characteristics $X_t = (X_{t1}, X_{t2})$. More specifically, we counterfactually assign the conditional distribution of $X_1$ given $X_2$ of group-$t'$, $F_{X_{t'1}|X_{t'2}}$, to group-$t$, while $X_2$ keeps its group-$t$ distribution $F_{X_{t2}}$. That is, we perform a counterfactual experiment by changing the conditional, as opposed to the marginal, distribution of $X_1$. Rothe (2014) decomposes the composition effect using the marginal distributions of $X_1$; in contrast, we focus on a decomposition based on sequential conditioning arguments. We ask what would have happened to the outcome distribution if the distribution of $X_1$, but none of the other covariates, had changed from that of group-$t$ to that of group-$t'$. We formally define the corresponding factor counterfactual distribution by

$$F_{Y\langle t|X_1,t'\rangle}(y) \equiv \int_{\mathcal{X}_2} \int_{\mathcal{X}_1} F_{Y_t|X_t}(y|x_1,x_2)\, dF_{X_{t'1}|X_{t'2}}(x_1|x_2)\, dF_{X_{t2}}(x_2).$$

Again, the factor counterfactual outcome $Y\langle t|X_1,t'\rangle$ is a random variable generated by the cdf $F_{Y\langle t|X_1,t'\rangle}(y)$. We can then decompose the composition effect as

$$\begin{aligned}
F_{Y\langle t'|t\rangle}(y) - F_{Y\langle t'|t'\rangle}(y) &= F_{Y\langle t'|t\rangle}(y) - F_{Y\langle t'|X_1,t\rangle}(y) \qquad (2.6) \\
&\quad + F_{Y\langle t'|X_1,t\rangle}(y) - F_{Y\langle t'|t'\rangle}(y). \qquad (2.7)
\end{aligned}$$

The factor effect in (2.7) captures the change of the conditional distribution of $X_1$ to that of group-$t$ while $X_2$ keeps its group-$t'$ distribution. The remaining factor effect in (2.6) accounts for the role of the remaining attributes $X_2$.

Take the example of DiNardo, Fortin, and Lemieux (1996), who analyze the effects of institutional and labor market factors on the U.S. distribution of wages over the period 1979 to 1988. The outcome of interest $Y$ is the wage. The multi-valued treatment variable $T$ indicates the year: $t$ is for 1988 and $t'$ is for 1979. The factor $X_1$ is a dummy variable for union status and $X_2$ is a vector of other attributes. So $F_{Y\langle t|X_1,t'\rangle}$ in (2.8) is the distribution of wages that would have prevailed in 1988 if unionization, but none of the other attributes, had remained at its 1979 level.
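For discrete $X_1$ and $X_2$, the factor counterfactual is a double sum over covariate cells, so (2.6) and (2.7) can be computed by plug-in. In the sketch below, `x1` might be a binary indicator such as unemployment insurance receipt; the toy groups, the tuple layout `(y, x1, x2)`, and the function name are hypothetical, and common support over all $(x_1, x_2)$ cells is assumed.

```python
import numpy as np


def factor_cf_mean(sched, x2_src, x1_src):
    """Mean of a factor counterfactual: wage schedule from `sched`, marginal of X2
    from `x2_src`, and conditional distribution of X1 given X2 from `x1_src`.
    Each group is a tuple of numpy arrays (y, x1, x2) with discrete X1 and X2.
    """
    y_s, x1_s, x2_s = sched
    _, x1_c, x2_c = x1_src
    _, _, x2_m = x2_src
    total = 0.0
    for b in np.unique(x2_m):
        p_x2 = np.mean(x2_m == b)                # P(X2 = b) in x2_src
        x1_given_b = x1_c[x2_c == b]             # X1 values with X2 = b in x1_src
        if x1_given_b.size == 0:
            raise ValueError(f"X2 = {b} not observed in x1_src")
        for a in np.unique(x1_given_b):
            p_x1 = np.mean(x1_given_b == a)      # P(X1 = a | X2 = b) in x1_src
            cell = (x1_s == a) & (x2_s == b)
            if not cell.any():
                raise ValueError(f"cell ({a}, {b}) not observed in sched")
            total += y_s[cell].mean() * p_x1 * p_x2  # E[Y | X1 = a, X2 = b] in sched
    return total


# Hypothetical toy groups: t = long-term, t' = short-term
g_t = (np.array([5, 6, 7, 8, 9, 10, 11.0]), np.array([0, 1, 1, 0, 1, 1, 1]), np.array([0, 0, 0, 1, 1, 1, 1]))
g_tp = (np.array([1, 2, 4, 3, 5, 7.0]), np.array([0, 0, 1, 0, 1, 1]), np.array([0, 0, 0, 1, 1, 1]))

mean_tp_t = factor_cf_mean(g_tp, g_t, g_t)      # E[Y<t'|t>]: both covariate pieces from group t
mean_tp_x1_t = factor_cf_mean(g_tp, g_tp, g_t)  # E[Y<t'|X1,t>]: only X1 given X2 from group t
mean_tp_tp = factor_cf_mean(g_tp, g_tp, g_tp)   # E[Y<t'|t'>]: actual mean of group t'
effect_26 = mean_tp_t - mean_tp_x1_t            # (2.6): remaining attributes X2
effect_27 = mean_tp_x1_t - mean_tp_tp           # (2.7): factor X1
```

Here `effect_27` is 13/24 and `effect_26` is 25/168; their sum is the aggregate composition effect in means.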
The factor composition effect in (2.7) extracts the impact of unionization, a labor market institution, on the wage distributions between 1979 and 1988. The semiparametric procedure in DiNardo, Fortin, and Lemieux (1996) provides a visually clear representation of where in the wage density these various factors exert the greatest impact. Our nonparametric estimation procedure allows us to easily recover various distributional features. We rewrite the factor counterfactual distribution as

$$F_{Y\langle t|X_1,t'\rangle}(y) = E\big[ F_{Y|X_1,X_2,T}(y|X_1,X_2,t)\, W_{X_1 t'}(X) \,\big|\, T = t \big], \qquad (2.8)$$

where

$$W_{X_1 t'}\big((x_1,x_2)\big) \equiv \frac{P\big(T = t' \mid X = (x_1,x_2)\big) \,/\, P\big(T = t' \mid X_2 = x_2\big)}{P\big(T = t \mid X = (x_1,x_2)\big) \,/\, P\big(T = t \mid X_2 = x_2\big)}.$$

Similar to the decomposition parameter $\gamma_{t|t'}$, we define the factor parameter $\lambda_{t|t'}$ to be the distributional feature defined by the factor counterfactual distribution $F_{Y\langle t|X_1,t'\rangle}$ in (2.8).
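The reweighting form (2.8) suggests a weighting estimator: reweight the group-$t$ observations by $W_{X_1 t'}(X)$ and solve the sample analog of the moment condition (2.9). The sketch uses cell-frequency estimates of the four propensity scores for discrete $(X_1, X_2)$ and the mean moment $m(y;\lambda) = y - \lambda$. It is an illustration under those assumptions, not the authors' efficient estimator, and the function names and toy sample are hypothetical.

```python
import numpy as np


def cell_share(t_obs, cells, t):
    """P(T = t | cell) estimated by the share of group t within each cell label."""
    share = np.empty(len(cells))
    for c in set(cells):
        rows = np.array([ci == c for ci in cells])
        share[rows] = np.mean(t_obs[rows] == t)
    return share


def dfl_weights(t_obs, x1, x2, t, t_prime):
    """W_{X1 t'}(X) from (2.8), evaluated on the group-t rows."""
    joint = list(zip(x1.tolist(), x2.tolist()))  # cells of X = (X1, X2)
    marg = x2.tolist()                           # cells of X2 alone
    in_t = t_obs == t
    p_tp_x = cell_share(t_obs, joint, t_prime)[in_t]
    p_tp_x2 = cell_share(t_obs, marg, t_prime)[in_t]  # > 0 under common support
    p_t_x = cell_share(t_obs, joint, t)[in_t]         # > 0 on group-t rows
    p_t_x2 = cell_share(t_obs, marg, t)[in_t]
    return (p_tp_x / p_tp_x2) / (p_t_x / p_t_x2)


def factor_mean_ipw(y, t_obs, x1, x2, t, t_prime):
    """lambda_{t|t'} for m(y; lambda) = y - lambda: root of the sample analog of (2.9)."""
    w = dfl_weights(t_obs, x1, x2, t, t_prime)
    return np.sum(y[t_obs == t] * w) / np.sum(w)


# Hypothetical pooled sample: T = 1 is group t, T = 0 is group t'
y = np.array([5, 6, 7, 8, 9, 10, 11, 1, 2, 4, 3, 5, 7], dtype=float)
t_obs = np.array([1] * 7 + [0] * 6)
x1 = np.array([0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1])
x2 = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1])

w_sum = dfl_weights(t_obs, x1, x2, 1, 0).sum()
lam = factor_mean_ipw(y, t_obs, x1, x2, 1, 0)
```

On this toy sample the weights on the group-$t$ rows sum to $n_t = 7$, and the estimate equals the plug-in double-sum value 323/42 (about 7.690); with cell-frequency propensity scores the weighting and plug-in calculations coincide.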

Definition 1 (Factor parameter) Let $m : \mathcal{Y} \times \Theta \to \mathbb{R}^{d_m}$ be a measurable function, where the parameter space $\Theta \subseteq \mathbb{R}^{d_\theta}$ and $d_m \ge d_\theta$. Define the factor parameter $\lambda_{t|t'} \in \Theta$ to be the distributional feature of $Y\langle t|X_1,t'\rangle$ satisfying

$$E\big[m(Y\langle t|X_1,t'\rangle; \lambda_{t|t'})\big] = \int_{\mathcal{Y}} m(y; \lambda_{t|t'})\, dF_{Y\langle t|X_1,t'\rangle}(y) = 0$$

for any $t, t' \in \mathcal{T}$.

The estimation of $\lambda_{t|t'}$ is based on

$$\int_{\mathcal{Y}} m(y;\lambda)\, dF_{Y\langle t|X_1,t'\rangle}(y) = E\left[ m(Y;\lambda)\, \frac{D_t}{P(T = t|X)}\, \frac{W_{X_1 t'}(X)\, P(T = t|X)}{P(T = t)} \right] = 0. \qquad (2.9)$$

For the mean, with $m(y;\lambda) = y - \lambda$, the factor parameter

$$\lambda_{t|t'} = E[Y\langle t|X_1,t'\rangle] = \int_{\mathcal{X}_2} \int_{\mathcal{X}_1} E[Y \mid X_1 = x_1, X_2 = x_2, T = t]\, dF_{X_{t'1}|X_{t'2}}(x_1|x_2)\, dF_{X_{t2}}(x_2)$$

is the counterfactual mean wage that would have prevailed in 1988 if unionization, but none of the other attributes, had remained at its 1979 level.

3 The Workforce Investment Act (WIA) Programs

3.1 Institutional background

Replacing the Job Training Partnership Act (JTPA), the Workforce Investment Act (WIA) of 1998 has two main goals. First, as stated in the Act, it is to "...consolidate, coordinate, and improve employment, training, literacy, and vocational rehabilitation programs in the United States..." by reforming the former public workforce programs, which had become fragmented and uncoordinated. Specifically, the Workforce Investment Act established the largest network of publicly financed career service programs and unified them to be available at over 3,000 One-Stop Career Centers around the country. Second, the Act established three flagship programs focusing on assessment, counseling, job readiness skills, and occupational skills and training: the WIA Adult program, the WIA Dislocated Workers program,[5] and the WIA Youth program. The WIA programs are an integral part of the Employment and Training Administration (ETA) under the U.S. Department of Labor. In 2010, the programs served more than 7 million adult workers and 1.5 million dislocated workers nationwide. In this paper, we focus on the WIA Adult program, which serves disadvantaged individuals.[6]

WIA career services are offered at three levels.
All individuals entering the program receive core services, which include staff-assisted job search and placement, labor market information, and basic counseling. After that, staff may recommend that participants receive intensive services, which involve more comprehensive assessment and counseling, career planning, and possibly some short courses. Participants may then be recommended for training services, which may be on-the-job training with local employers, apprenticeships in different fields, or educational training programs at vocational schools and community colleges using vouchers authorized by WIA program staff. Participants can also be linked to job opportunities in their communities. In our data, about 2/3 of training recipients receive some kind of credential. Although participation in WIA is voluntary, access is restricted: program staff must admit participants and authorize any services that are provided.

[5] Dislocated workers are officially defined as meeting one of the following criteria: (1) has been laid off or terminated, or has received a notice of termination or layoff, and is unlikely to return to the previous industry or occupation; (2) has been terminated or laid off, or has received a notice of termination or layoff, as a result of the permanent closure of, or a substantial layoff at, a plant or facility; (3) was self-employed and is now unemployed because of a natural disaster; (4) was self-employed (including as a farmer, rancher, or fisherman) but is unemployed as a result of general economic conditions in the community in which he or she resides or because of a natural disaster; or (5) is a displaced homemaker.

[6] An individual is eligible if he or she is age 18 or older and is unemployed at the time of application, is underemployed (in a job earning $10.10 or less), or has a family that meets adult low-income guidelines.

3.2 Data and descriptive statistics

Our dataset comes from the following sources: the annual Workforce Investment Act Standardized Record Data (WIASRD), the Unemployment Insurance data, and the Unemployment Insurance Wage Record data. The WIASRD dataset was primarily collected in December 2007 by the state workforce agencies, as requested by the US Department of Labor, for evaluating federally funded WIA activities. Agreements were reached and data were provided by twelve states: Connecticut, Indiana, Kentucky, Maryland, Minnesota, Missouri, Mississippi, Montana, New Mexico, Tennessee, Utah, and Wisconsin. This dataset includes individual-level information on the time of program entry and exit (month and year), qualification status (adult or dislocated worker), and detailed demographic characteristics, such as age, race, level of education, gender, disability, and veteran status, of all participants entering the program from July 2003 to June 2005 in nine of the above twelve states (the database does not specify which nine).
Each individual in the WIASRD dataset is assigned a random identification number that can be matched with other administrative data: SSNs and other IDs were first replaced with random identification numbers so that each individual has one unique ID, and observations with invalid SSNs in the wage data were dropped. We cross-reference the WIASRD dataset with the Unemployment Insurance data and the Unemployment Insurance Wage Record data. The Unemployment Insurance (UI) data cover all individuals who filed an unemployment insurance claim and contain demographic information on claimants and the sum of insurance payments received. The Unemployment Insurance Wage Record (UIWR) data provide quarterly earnings for all employees in UI-covered firms. All earnings are adjusted for inflation and expressed in 2006 Q1 dollars. Using the unified identification numbers, we match the UI and UIWR data with the WIASRD to compile detailed information on the labor market experience of WIA participants, including earnings and employment status before and after program participation. Combining the three sources, we obtain detailed demographic information, unemployment insurance status, and pre- and post-participation labor market experience for our sample of WIA Adult program participants.

We then drop observations with missing entry and exit dates. We also discard observations with a length of participation of less than one month (those who entered and exited the program within the same month). We restrict our sample to participants between the ages of [...]. In addition, studying the effect of the length of program participation on labor market outcomes requires information on post-program labor market experience. We therefore also discard observations with an exit date later than June 2007, because the latest wage data we have are for the second quarter of [...]. Program participation ranges from less than one month to more than four years. Because about 95% of our sample participate in the program for less than 24 months and the distribution has a long right tail, we restrict the sample to participation spells of at most 24 months. Our estimation sample comprises 66,693 WIA Adult program participants.

Table 1 presents the basic summary statistics for our sample of WIA participants. The length of participation has a median of 7 months and a mean of 7.22 months. Participants were mostly in their 30s and had obtained a high school degree or equivalent. Table 2 details the variation in program participation for our estimation sample by demographic group. Female participants stay in the program longer than male participants. Hispanic participants stay substantially longer than participants of other races, while black participants spend the least time in the program. Only a small portion of our sample have obtained college, postgraduate, or professional degrees. Compared to the full-sample average, both veterans and disabled workers tend to spend less time in the program.

Table 1: Overview of basic summary statistics
                                 Median    Mean    Std. Deviation
                                 (1)       (2)     (3)
Length of participation          7         7.22
Age
Education
Notes: The sample size includes 66,693 WIA Adult Program participants.

Table 2: Lengths of participation by groups
                                 Lengths of participation (months)
                                 Median    Mean    Std. Dev.    Sample size
                                 (1)       (2)     (3)          (4)
Full sample                      7         7.22                 66,693
By gender:
  Male                                                          28,744
  Female                                                        37,949
By race:
  White                                                         32,153
  Black                                                         31,175
  Hispanic                                                      2,237
  Others/Not specified                                          1,128
By education:
  Less than high school                                         …,077
  Some high school                                              …,457
  High school graduates                                         …,001
  Some college                                                  …,429
  College graduates and above                                   …,279
Veteran                                                         …,206
Disabled                                                        …,782
Unemployment insurance:
  Recipient                                                     23,836
  Non-recipient                                                 42,857
Notes:

Table 3: Lengths of participation by race/gender groups
                                 Lengths of participation (months)
                                 Median    Mean    Std. Dev.    Sample size
                                 (1)       (2)     (3)          (4)
White female                                                    18,056
White male                                                      14,097
Black female                                                    17,979
Black male                                                      13,196
Hispanic female                                                 1,319
Hispanic male                                                   918
Notes:
As in previous studies of WIA programs (see, for example, Heinrich, Mueser, and Troske (2009)), we analyze males and females separately because their labor market activities can differ substantially for reasons such as fertility, marriage, and household production. We also analyze racial groups separately, as they display substantial differences in lengths of participation. Table 3 presents summary statistics of the lengths of participation for the gender-racial groups.

3.3 Program evaluations of WIA
Previous literature on WIA focuses on comparing the wage outcomes of program participants and nonparticipants. Heinrich, Mueser, and Troske (2009) compare the wages of WIA participants with those of UI claimants and Employment Service (ES) program participants for 16 quarters after program entry. They find that, on average, female WIA participants earn $482-$638 more per quarter than the comparison group, while male WIA participants earn $320-$692 more per quarter. Hollenbeck, Schroeder, King, and Huang (2005) compare the wages of WIA participants with those of ES program participants for 8 quarters after leaving the programs. They find that, on average, female WIA participants earn $887 more per quarter than the comparison group and male WIA participants earn $773 more per quarter. Overall, previous studies have shown that WIA programs have positive effects on participants' wage outcomes. However, as we argued previously, although active labor market programs can improve participants' labor market outcomes by providing the training and assistance they need to obtain higher wages and find more suitable jobs, participants may also face lock-in effects: longer participation may yield worse labor market outcomes because participants lose labor market experience and employers may view long participation as a negative signal. From a policy point of view, while active labor market programs can improve the welfare of participants, they can be very costly. In the raw WIASRD data, program participation varies from less than 1 month to more than 4 years, while the expenditure on each participant exiting the program ranges from about $1,000 to $[...]. Therefore, to better evaluate the program and to suggest ways to improve its performance and cost-effectiveness, we focus on how different lengths of participation in the WIA Adult program affect participants' post-program wage outcomes.
Recall that we restrict our estimation sample to those who participate in the program for 1 to 24 months, which includes more than 95% of the participants in the raw WIASRD data. Figures 1 and 2 display histograms of the length of participation for the full estimation sample and for the gender-racial groups.

Figure 1: Full Sample (N = 66,693). Histogram of months of participation (y-axis: density).

From the histograms, we observe that the patterns of program participation are similar across all gender-racial groups. Given the data, we partition the lengths of participation into four discrete levels: short (1-5 months), medium (6-8 months), long (9-14 months), and very long (15-24 months). Although the histograms for Hispanic males and females do not exactly coincide with those of the other gender-racial subgroups, owing to the limited number of observations, such a partition is still valid for these two subgroups. To show that the pattern is not driven by participants' receipt of Unemployment Insurance benefits, which generally end after 6 months, Figure 3 displays histograms of the length of participation for UI recipients and non-UI recipients. We observe the same pattern as in the full sample and the gender-racial subgroups.
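The partition above amounts to a simple binning rule. A minimal Python sketch, assuming the length of participation is recorded in whole months; the function name `participation_level` is hypothetical, and the returned labels follow $T \in \{1, 2, 3, 4\}$:

```python
from typing import Optional


def participation_level(months: int) -> Optional[int]:
    """Map months of participation to the treatment level T in {1, 2, 3, 4}.

    Hypothetical helper: spells shorter than 1 month or longer than 24 months
    fall outside the estimation sample and are mapped to None.
    """
    if months < 1 or months > 24:
        return None  # excluded from the estimation sample
    if months <= 5:
        return 1  # short: 1-5 months
    if months <= 8:
        return 2  # medium: 6-8 months
    if months <= 14:
        return 3  # long: 9-14 months
    return 4  # very long: 15-24 months


if __name__ == "__main__":
    for m in (0, 1, 5, 6, 8, 9, 14, 15, 24, 25):
        print(m, participation_level(m))
```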

Figure 2: Gender-racial subgroups. Histograms of months of participation (y-axis: density) for White females (N = 18,056), White males (N = 14,097), Black females (N = 17,979), Black males (N = 13,196), Hispanic females (N = 1,319), and Hispanic males (N = 918).

Figure 3: UI and non-UI recipients. Histograms of months of participation (y-axis: density) for UI recipients (N = 23,836) and non-UI recipients (N = 42,857).

We define the wage outcome as the difference in wages between 8 quarters after leaving the program and 4 quarters prior to entering the program. Given the variation in participation observed in the data, we aim to assess the heterogeneity in participants' post-program wage outcomes arising from different lengths of participation: short (1-5 months), medium (6-8 months), long (9-14 months), and very long (15-24 months). In particular, we explore the role of the wage structures and characteristics distributions associated with different lengths of participation in accounting for the differences in wage outcome distributions.

3.4 Limitation and remarks

4 Estimation and Results
The primary goal of this paper is to assess the heterogeneity in WIA participants' post-program wage outcomes arising from different lengths of participation. The decomposition analysis described in Section 2 allows us to explore the role of the wage structures and characteristics distributions associated with different lengths of participation in accounting for the differences in wage outcome distributions. Section 4.1 describes the implementation of the decomposition analysis and the estimation procedure. Section 4.2 presents and discusses the estimation results. Recall that the wage outcome, $Y$, is the difference in wages between 8 quarters after leaving the program and 4 quarters prior to entering the program. We partition the lengths of participation into four discrete levels: short (1-5 months), medium (6-8 months), long (9-14 months), and very long (15-24 months). In our estimation, the set of covariates, $X$, includes age, years of education, veteran status, disability status, and wages 2 TYPO?, 5-8 quarters prior to entering the program.
Since these estimators are familiar from the treatment effect model, we use treatment effect notation to present our objects of interest: the overall parameter $\beta_t \equiv E[Y_{\langle t\rangle}] = E[Y(t)]$ for the average treatment effect (ATE) and the decomposition parameter $\gamma_{t|t'} \equiv E[Y_{\langle t|t'\rangle}] = E[Y(t) \mid T = t']$ for the average treatment effect for the treated (ATT). We discuss the unconfoundedness assumption at the end of Section 4.2.

4.1 Estimation Procedure
Our estimation procedure follows and modifies Cattaneo (2010) and Cattaneo, Drukker, and Holland (2012), who estimate the average treatment effect for the population. We extend Cattaneo's (2010) results to multi-valued treatment effects for the treated, in particular the ATT, estimated by $\hat E[Y(t) \mid T = t']$, and the QTT, estimated by $\hat Q_\tau(Y(t) \mid T = t')$. (ADD EMPIRICAL RESULT FOR QTT) The estimation and limit theory for general distributional features are described in detail in Section 5. We first nonparametrically estimate the probability of each treatment (short, medium, long, and very long) for every individual $i$ given their characteristics, i.e., the propensity scores $P_t(X_i) \equiv P(T = t \mid X = X_i)$ for $t \in \{1, 2, 3, 4\}$ and $i = 1, \dots, n$. We use a multinomial logistic series estimator, where the order of the polynomial is selected by the Akaike information criterion (AIC). The coefficients for the base group are set to zero for identification. (CHECK)

Given the estimated propensity scores $\hat P_t(X_i)$, we select the common support region for estimation following Flores, Flores-Lagunes, Gonzalez, and Neumann (2012). For each group $t$, we find the minimum and maximum estimated propensity scores among its members:
$$p_t^{\min} \equiv \min_{\{i : T_i = t\}} \hat P_t(X_i), \qquad p_t^{\max} \equiv \max_{\{i : T_i = t\}} \hat P_t(X_i).$$
Define the support region for $t$ to be the subpopulation whose $\hat P_t(X_i)$ lies between $p_t^{\min}$ and $p_t^{\max}$:
$$S_t \equiv \{ i : \hat P_t(X_i) \in [p_t^{\min}, p_t^{\max}] \}.$$
The common support region is the intersection of the support regions for all $t \in \mathcal{T}$: $CS \equiv \bigcap_{t \in \mathcal{T}} S_t$. Observations that fall outside the common support region are dropped.
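The common-support rule described above can be sketched in a few lines of NumPy. This is an illustration only, assuming the estimated propensity scores are already available as an $n \times K$ array (column $k$ holding $\hat P_k(X_i)$) and that treatment labels are coded $0, \dots, K-1$ rather than $1, \dots, 4$; the function name `common_support` is hypothetical.

```python
import numpy as np


def common_support(p_hat: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Boolean mask of observations inside CS, the intersection of the S_k.

    p_hat : (n, K) array, p_hat[i, k] = estimated P_k(X_i)
    t     : (n,) integer array of treatment labels 0..K-1
    """
    n, k_levels = p_hat.shape
    keep = np.ones(n, dtype=bool)
    for k in range(k_levels):
        # p_k^min and p_k^max are computed over members of group k only
        own_scores = p_hat[t == k, k]
        p_min, p_max = own_scores.min(), own_scores.max()
        # S_k: every observation whose P_k(X_i) lies in [p_k^min, p_k^max]
        keep &= (p_hat[:, k] >= p_min) & (p_hat[:, k] <= p_max)
    return keep


if __name__ == "__main__":
    p_hat = np.array([[0.5, 0.5], [0.2, 0.8], [0.9, 0.1], [0.6, 0.4]])
    t = np.array([0, 1, 0, 1])
    print(common_support(p_hat, t))  # keeps observations 0 and 3
```

Note that an observation is dropped if it falls outside the support of any single group, which is what taking the intersection of the $S_t$ means.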
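The means $E[Y_{\langle t\rangle}]$ and $E[Y_{\langle t|t'\rangle}]$ are estimated by the ATE and ATT estimators:
$$\hat\beta_t = \hat E[Y(t)] = \frac{1}{n}\sum_{i=1}^n \left( \frac{D_{ti}}{\hat P_t(X_i)} Y_i - \left( \frac{D_{ti}}{\hat P_t(X_i)} - 1 \right) \hat e_t(X_i) \right),$$
$$\hat\gamma_{t|t'} = \hat E[Y(t) \mid T = t'] = \frac{1}{n}\sum_{i=1}^n \left( \frac{D_{ti}}{\hat P_t(X_i)} Y_i - \left( \frac{D_{ti}}{\hat P_t(X_i)} - \frac{D_{t'i}}{\hat P_{t'}(X_i)} \right) \hat e_t(X_i) \right) \frac{\hat P_{t'}(X_i)}{\hat p_{t'}},$$
where $e_t(X_i) \equiv E[Y \mid T = t, X = X_i]$ is estimated by a polynomial-regression series estimator 7 and $p_t \equiv P(T = t)$ is estimated by the sample analogue $\hat p_t = n^{-1}\sum_{i=1}^n D_{ti}$.
7 We use the Akaike Information Criterion to select the order of the polynomial.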
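A minimal NumPy sketch of the two estimators above, assuming the propensity scores $\hat P_t(X_i)$, $\hat P_{t'}(X_i)$ and the fitted outcome regression $\hat e_t(X_i)$ have already been computed (the multinomial-logit and series first steps are not reproduced). The function names and the two-level simulated design in the usage block are hypothetical; the simulation plugs in the true nuisance functions only to check that the formulas recover known targets, $E[Y(1)] = 2.5$ and $E[Y(1) \mid T = 0] = 2 + 13/30$.

```python
import numpy as np


def aipw_mean(y, d_t, p_t, e_t):
    """beta_hat_t: augmented IPW estimate of E[Y(t)]."""
    w = d_t / p_t
    return np.mean(w * y - (w - 1.0) * e_t)


def aipw_mean_treated(y, d_t, d_tp, p_t, p_tp, e_t):
    """gamma_hat_{t|t'}: augmented IPW estimate of E[Y(t) | T = t']."""
    p_tp_hat = np.mean(d_tp)  # sample analogue of p_{t'} = P(T = t')
    w_t = d_t / p_t
    w_tp = d_tp / p_tp
    return np.mean((w_t * y - (w_t - w_tp) * e_t) * p_tp / p_tp_hat)


if __name__ == "__main__":
    # Hypothetical design: X ~ U(0, 1), P(T = 1 | X) = 0.3 + 0.4 X
    rng = np.random.default_rng(0)
    n = 200_000
    x = rng.uniform(size=n)
    p1 = 0.3 + 0.4 * x
    d1 = (rng.uniform(size=n) < p1).astype(float)
    d0 = 1.0 - d1
    y1 = 2.0 + x + rng.normal(size=n)  # potential outcome Y(1)
    y0 = x + rng.normal(size=n)        # potential outcome Y(0)
    y = d1 * y1 + d0 * y0              # observed outcome
    e1 = 2.0 + x                       # true E[Y | T = 1, X]
    print(aipw_mean(y, d1, p1, e1))                        # close to 2.5
    print(aipw_mean_treated(y, d1, d0, p1, 1.0 - p1, e1))  # close to 2.4333
```

When $t = t'$ the correction term vanishes and $\hat\gamma_{t|t}$ reduces to the sample mean of $Y$ in group $t$, which is a convenient sanity check.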

The $\tau$th quantiles $Q_\tau(Y_{\langle t\rangle})$ and $Q_\tau(Y_{\langle t|t'\rangle})$ are estimated by the QTE and QTT estimators:
$$\hat\beta_t = \hat Q_\tau(Y(t)) = \arg\min_{q \in \Theta} \left| \frac{1}{n}\sum_{i=1}^n \left( \frac{D_{ti}}{\hat P_t(X_i)} \left(1\{Y_i \le q\} - \tau\right) - \left( \frac{D_{ti}}{\hat P_t(X_i)} - 1 \right) \hat e_t(X_i; q) \right) \right|,$$
$$\hat\gamma_{t|t'} = \hat Q_\tau(Y(t) \mid T = t') = \arg\min_{q \in \Theta} \left| \frac{1}{n}\sum_{i=1}^n \left( \frac{D_{ti}}{\hat P_t(X_i)} \left(1\{Y_i \le q\} - \tau\right) - \left( \frac{D_{ti}}{\hat P_t(X_i)} - \frac{D_{t'i}}{\hat P_{t'}(X_i)} \right) \hat e_t(X_i; q) \right) \frac{\hat P_{t'}(X_i)}{\hat p_{t'}} \right|,$$
where $\hat e_t(X_i; q) = \hat E[1\{Y \le q\} \mid T = t, X = X_i] - \tau$.

4.2 Estimation results and discussions
Recall that the wage outcome $Y$ is the difference in wages between 8 quarters after exiting the program and 4 quarters prior to entering the program. The lengths of participation are partitioned into four levels (short, medium, long, and very long), denoted by $T \in \{1, 2, 3, 4\}$. Tables 1-6 in Appendix A present the estimation results for the overall parameter $\beta_t = E[Y_{\langle t\rangle}] = E[Y(t)]$ and the decomposition parameter $\gamma_{t|t'} = E[Y_{\langle t|t'\rangle}] = E[Y(t) \mid T = t']$ for each gender-racial group. Note that the second equality in each expression holds under the unconfoundedness assumption, which allows for a causal interpretation. 8 Therefore, our estimates can be interpreted in the decomposition analysis framework described in Section 2 as well as in the treatment effect framework.
8 Further discussion of this assumption is relegated to the end of this section.

Figure 4: The overall parameter for the gender-racial groups

4.2.1 The overall effects
Figure 4 depicts the results for the overall parameters $E[Y_{\langle t\rangle}] = E[Y(t)]$. We first interpret our results in the familiar treatment effect model, assuming that unconfoundedness holds. $E[Y(t)] - E[Y(t')]$ is the average treatment effect, i.e., the change in average wage outcomes if all participants switch from $t'$-term to $t$-term participation. The average wage outcomes are positive for all gender-racial groups. In other words, on average, every gender-racial group experienced some improvement in wages between 4 quarters prior to program entry and 8 quarters after program exit (wages are in 2006 Q1 dollars). The magnitude of the average wage gain, however, differs across gender-racial groups: the average wage gains are higher for white male and female participants than for black male and female participants. 9 More importantly, the relationship between the length of participation and wage outcomes is not monotonic. For all gender-racial groups, medium-term participants experience the smallest wage gain. We find that females gain the most from long participation. For white females, however, very long participants' wage outcomes are much lower than those of long participants, while for black females the difference is almost insignificant. For males, there is no significant difference in wage outcomes among participants of short, long, and very long terms. For black males, the wage gains of different lengths of participation are not significantly different. As mentioned above, our estimates can be interpreted either in the treatment effect framework, which carries a causal interpretation, or in the decomposition analysis framework. For example, $\beta_3 - \beta_2 = E[Y_{\langle 3\rangle}] - E[Y_{\langle 2\rangle}] = E[Y(3)] - E[Y(2)]$, where the second equality requires the unconfoundedness assumption.
In the treatment effect framework, $E[Y(3)] - E[Y(2)]$ is the change in average wages if all participants switch from medium-term to long-term participation. This is the average treatment effect, where the treatment is switching from medium-term to long-term participation. Taking black female participants as an example, we find that the average wage increases by about $440 when participation is extended to about one year (long term). Our estimates imply that black female participants benefit from longer program participation: the mean wage gain of extending participation from the medium term to the long term is $E[Y(3)] - E[Y(2)]$ = [...] dollars (a 68.5% increase). Similar results hold for white females and white males, whose mean wage gains from extending participation are [...] dollars (a 162% increase) and [...] dollars, respectively. The mean wage gain for white females, about 162%, is remarkably large. In the decomposition analysis framework, $E[Y_{\langle 3\rangle}] - E[Y_{\langle 2\rangle}]$ is the difference between the average wage if all participants were paid according to the wage schedule of the long-term group and the average wage if all were paid according to the wage schedule of the medium-term group. Therefore, we interpret $E[Y_{\langle 3\rangle}] - E[Y_{\langle 2\rangle}]$ = [...] dollars as the counterfactual average wage gain between the case in which all black female participants are paid according to the wage schedule of the long-term group and the case in which all of them are paid according to the wage schedule of the medium-term group.

9 Due to the limited number of observations in the data, our estimates for Hispanic males and females have wide confidence intervals. We therefore focus on the other gender-racial groups in our discussion. The estimation results are presented in the Appendix.

Our results suggest that there is a lock-in effect for extended participation in the WIA Adult program. That is, longer participation in the program does not always lead to higher wage outcomes. This may be due to the labor market experience lost while in the program, which potential employers may view as a negative signal. For black female participants, we observe that the average wage outcome drops by $E[Y(4)] - E[Y(3)]$ = [...] dollars (a 35.5% drop) if all participants extend participation from long to very long term. This drop is statistically significant. On the other hand, the average wage outcome increases significantly, by $E[Y(3)] - E[Y(1)]$ = [...] dollars, if all participants are switched from short to long participation. For white female participants, we observe that the average wage outcome drops by $E[Y(4)] - E[Y(3)]$ = [...] dollars (a 35.5% drop) if all participants are switched from long to very long term. This observation may suggest that for female participants the optimal training length is about a year, that is, after participants have received a substantial amount of training in the program.

4.2.2 The wage-structure effect, composition effect, and treatment effect for the treated
Figures 5-8 depict the results for the decomposition parameters $\gamma_{t|t'} = E[Y_{\langle t|t'\rangle}] = E[Y(t) \mid T = t']$. The patterns for the decomposition parameters are similar to those for the overall parameters. Again, we can interpret our estimation results in both the treatment effect framework and the decomposition analysis framework. For example, $E[Y(3) \mid T = 1] - E[Y(1) \mid T = 1]$ is the average treatment effect for the treated, i.e., the average change in wages if short-term participants extend their participation to long term. Our estimates imply that for white females, the average wage of short-term participants would change by $E[Y(3) \mid T = 1] - E[Y(1) \mid T = 1]$ = [...] - [...] = [...] dollars by extending their participation to long term. Similarly, for black female short-term participants, the average wage outcome would change by $E[Y(3) \mid T = 1] - E[Y(1) \mid T = 1]$ = [...] dollars by switching to long-term participation.
Nevertheless, for short-term male participants, the 95% confidence intervals of our point estimates suggest that extending from short-term to long-term participation does not significantly increase their wage outcome. Differences in the patterns for female and male participants may partly reflect the type of training they receive. According to a study of exits for program year 2005, 37% of males exiting the WIA Adult program received on-the-job training, compared with 15% of females. 10 Classroom training would be expected to reduce initial earnings and employment more than on-the-job training does, possibly with better earnings after a delay. While on-the-job training may induce participants to exit the program earlier, classroom training requires a longer period for participants to acquire some kind of credential that improves their employment outcomes.

Figure 5: The decomposition parameter for white female participants

Perhaps the most interesting interpretation of our estimates of $\gamma_{t|t'} = E[Y_{\langle t|t'\rangle}]$ is in the decomposition analysis framework. In Section 2, we show that the difference in average wage outcomes between two groups $t$ and $t'$ can be decomposed into the average wage-structure effect in (2.4), which equals $E[Y_{\langle t|t'\rangle}] - E[Y_{\langle t'|t'\rangle}]$, and the average composition effect in (2.2), which equals $E[Y_{\langle t|t\rangle}] - E[Y_{\langle t|t'\rangle}]$. Note that the average wage-structure effect can be interpreted as the average treatment effect for the treated under the unconfoundedness assumption.

10 Social Policy Research (2007): PY 2005 WIASRD Data Book: Final. US Department of Labor.

Figure 6: The decomposition parameter for white male participants

The average wage-structure effect $E[Y_{\langle 3|1\rangle}] - E[Y_{\langle 1|1\rangle}]$ reveals the change in average wage outcomes for short-term participants if they faced the wage schedule of long-term participants, which is [...] dollars for white female participants. Similarly, our estimates imply that the change in average wage outcomes of short-term black female participants is $E[Y_{\langle 3|1\rangle}] - E[Y_{\langle 1|1\rangle}]$ = [...] - [...] = [...] dollars if they faced the wage schedule of long-term participants. On the other hand, the average composition effect $E[Y_{\langle 3|3\rangle}] - E[Y_{\langle 3|1\rangle}]$ reveals the change in average wages under the long-term wage schedule when the covariate distribution shifts from that of short-term participants to that of long-term participants. Our estimates suggest that the composition effects are negligible for all gender-racial groups. Overall, our decomposition analysis suggests that the wage structures faced by participants with different lengths of participation contribute much to the differences in wage outcomes among them, whereas the characteristics distributions of these participants contribute little.
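Because $F_{Y_{\langle t|t\rangle}}(y) = \int F_{Y|XT}(y \mid x, t)\, dF_{X|T}(x \mid t) = F_{Y|T}(y \mid t)$, the mean $E[Y_{\langle t|t\rangle}]$ is simply the observed mean of group $t$. Adding and subtracting the counterfactual mean $E[Y_{\langle t|t'\rangle}]$ then shows that the two effects discussed above add up exactly to the observed gap. Written out for the long-term ($t = 3$) versus short-term ($t' = 1$) comparison used in the examples:

```latex
\underbrace{E[Y_{\langle 3|3\rangle}] - E[Y_{\langle 1|1\rangle}]}_{\text{observed gap}}
  = \underbrace{E[Y_{\langle 3|3\rangle}] - E[Y_{\langle 3|1\rangle}]}_{\text{composition effect}}
  + \underbrace{E[Y_{\langle 3|1\rangle}] - E[Y_{\langle 1|1\rangle}]}_{\text{wage-structure effect}}
```

With negligible composition effects, the observed gap in mean wage outcomes is therefore approximately equal to the wage-structure effect.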

Figure 7: The decomposition parameter for black female participants

Unconfoundedness. ADD TESTING UNCONFOUNDEDNESS IF WE HAVE. The key identifying assumption for our estimates to carry a causal interpretation in the treatment effect framework is unconfoundedness (selection on observables, or conditional independence); that is, conditional on a set of observable covariates, selection into different levels of treatment is random. This assumption can be justified as follows. First, we have a rich set of covariates, which includes demographic information, unemployment insurance status, and pre-, during-, and post-participation labor market experience (wages and industry worked) for up to 16 quarters, to account for unobservables. It is very likely that unobservable characteristics such as motivation, self-esteem, or family condition are captured by individual labor market histories. Second, although participation in WIA is voluntary, access is restricted: program staff must admit participants and authorize any services that are provided. It is likely that the services recommended to participants, which are closely linked to the duration of participation in the program, are contingent on their prior labor market experience, such as wages, which is accounted for in our set of covariates. Finally, the outcome variable we consider in this paper is the wage differential between after leaving the program and before entering the program. This difference-in-differences specification, along with the above justification for unconfoundedness, is also used in Flores, Flores-Lagunes, Gonzalez, and Neumann (2012) and Kluve, Schneider, Uhlendorff, and Zhao (2012), and allows us to account for time-invariant factors that may have influenced selection.

Figure 8: The decomposition parameter for black male participants

5 Asymptotic Properties
In this section, we present the econometric theory of our estimation and inference procedure. The general setup follows the multi-valued treatment effect model in Cattaneo (2010) and extends it to treatment effects for the treated. Section 5.1 introduces the parameters of interest, which are the distributional features of the counterfactual distributions defined in Section 2. Section 5.2 introduces the efficient estimators and presents their large-sample properties. Section 5.3 presents inequality measures based on the counterfactual distributions. We illustrate our results by estimating quantile treatment effects for the treated, deriving a quantile process that converges weakly to a Gaussian process indexed by the quantile.

5.1 Parameters of interest
We formally define the distributional features of the counterfactual distribution $F_{Y_{\langle t|t'\rangle}}$ and the overall counterfactual distribution $F_{Y_{\langle t\rangle}}$ via a generic moment function $m$.

Definition 2 Suppose a measurable function $m : \mathcal{Y} \times \Theta \to \mathbb{R}^{d_m}$, where the parameter space $\Theta \subset \mathbb{R}^{d_\theta}$ and $d_m \ge d_\theta$. Consider any $t, t' \in \mathcal{T}$.
1. The decomposition parameter $\gamma_{t|t'} \in \Theta$ satisfies
$$E[m(Y_{\langle t|t'\rangle}; \gamma_{t|t'})] = \int_{\mathcal{Y}} m(y; \gamma_{t|t'})\, dF_{Y_{\langle t|t'\rangle}}(y) = 0, \quad \text{where } F_{Y_{\langle t|t'\rangle}}(y) = \int_{\mathcal{X}} F_{Y|XT}(y \mid x, t)\, dF_{X|T}(x \mid t').$$
2. The overall parameter $\beta_t \in \Theta$ satisfies
$$E[m(Y_{\langle t\rangle}; \beta_t)] = \int_{\mathcal{Y}} m(y; \beta_t)\, dF_{Y_{\langle t\rangle}}(y) = 0, \quad \text{where } F_{Y_{\langle t\rangle}}(y) = \int_{\mathcal{X}} F_{Y|XT}(y \mid x, t)\, dF_X(x).$$

Definition 2, based on a generic moment function $m$, covers various distributional features of interest. For the mean, when $m(Y; \theta) = Y - \theta$, the decomposition parameter $\gamma_{t|t'} = E[Y_{\langle t|t'\rangle}] = \int_{\mathcal{X}} E[Y \mid X = x, T = t]\, dF_{X|T}(x \mid t')$ is the mean of the counterfactual distribution $F_{Y_{\langle t|t'\rangle}}$. When $m(Y; \theta) = 1\{Y \le \theta\} - \tau$, $\gamma_{t|t'}$ is the $\tau$th quantile of $F_{Y_{\langle t|t'\rangle}}$. When $m(Y; \theta) = 1\{Y \le y\} - \theta$, $\gamma_{t|t'}$ is the counterfactual distribution function evaluated at $y$, $F_{Y_{\langle t|t'\rangle}}(y)$. Recall that in the treatment effect literature, under the unconfoundedness assumption, $F_{Y_{\langle t|t'\rangle}}$ is the distribution of the potential outcome $Y(t)$ for those who have been treated at $t'$. So $\gamma_{t|t'} - \gamma_{t'|t'}$ can be interpreted as a treatment effect for the treated. Similarly, for the overall parameter, $F_{Y_{\langle t\rangle}}$ is the distribution of the potential outcome $Y(t)$ for the population under the unconfoundedness assumption. So $\beta_t - \beta_{t'}$ is the overall treatment effect of switching from $t'$ to $t$, which is the parameter of interest in Cattaneo (2010).

5.2 Efficient estimators
We introduce two estimators: the inverse probability weighting (IPW) estimator and the efficient influence function (EIF) estimator. We follow and modify the estimation procedure proposed by Cattaneo (2010) for the overall treatment effect to estimate the decomposition parameter. The estimators are overidentified GMM estimators ($d_m \ge d_\theta$), which makes it convenient to conduct inference and implement hypothesis tests with restrictions. Denote the object of interest by the $(J+1) \times 1$ vector $\gamma_{t'} \equiv (\gamma_{0|t'}, \dots, \gamma_{J|t'})'$ for the treated group $t'$. We focus on one treated group $t'$ for simplicity. In general, we could consider all groups $t' \in \mathcal{T}$ at the cost of notational complexity, i.e., $\gamma \equiv (\gamma_0', \dots, \gamma_J')'$, a $(J+1)^2 \times 1$ vector.

In a preliminary step, we nonparametrically estimate the infinite-dimensional nuisance parameters: the propensity score $P_t(X) \equiv P(T = t \mid X)$ and the conditional expectation of the moment $e_t(X) = E[m(Y; \gamma_{t|t'}) \mid T = t, X]$. The proposed estimators are also semiparametrically doubly robust in the sense that misspecification of either $P_t(X)$ or $e_t(X)$ does not affect consistency (Graham, 2011; Rothe and Firpo, 2013). The inverse probability weighting estimator uses the moment condition
$$E[m(Y_{\langle t|t'\rangle}; \gamma_{t|t'})] = E\left[ m(Y; \gamma_{t|t'}) \frac{D_t}{P(T = t \mid X)} \frac{P(T = t' \mid X)}{P(T = t')} \right] = 0,$$
obtained by rewriting the definition of the decomposition parameter. 11 To define our GMM estimators, let $\|\cdot\|$ denote the Euclidean norm $\|A\| = \sqrt{\operatorname{trace}(A'A)}$ for any matrix $A$. Choose a $(J+1)d_\theta \times (J+1)d_m$ matrix $A_n = A + o_p(1)$ such that the weighting matrix $W = A'A$ makes the resulting estimator efficient. The inverse probability weighting (IPW) estimator is defined by
$$\hat\gamma^{IPW} \equiv \arg\min_{\theta \in \Theta^{J+1}} \left\| A_n M^{IPW}_{t'n}(\theta, \hat P, \hat p) \right\| + o_p(n^{-1/2}),$$
where, for $\theta = (\theta_0', \dots, \theta_J')'$,
$$M^{IPW}_{t'n}(\theta, P, p) \equiv \frac{1}{n}\sum_{i=1}^n \left( m(Y_i; \theta_0)\frac{D_{0i}}{P_0(X_i)}, \dots, m(Y_i; \theta_J)\frac{D_{Ji}}{P_J(X_i)} \right)' \frac{P_{t'}(X_i)}{p_{t'}},$$
$P_t(X) \equiv P(T = t \mid X)$, and $p_t \equiv P(T = t)$. The propensity scores $P \equiv (P_0(X), \dots, P_J(X))'$ are estimated nonparametrically by a multinomial logistic series estimator that satisfies Assumption A.NP in the Appendix. The probability of being treated at $t$, $p_t \equiv P(T = t)$, is estimated by the sample analog $\hat p_t = n^{-1}\sum_{i=1}^n D_{ti}$. Let $p \equiv (p_0, \dots, p_J)'$. The second estimator uses the efficient influence function derived in ?.
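For the mean, $m(Y; \theta) = Y - \theta$, the IPW moment condition above is just identified and can be solved in closed form: setting $\frac{1}{n}\sum_i (Y_i - \theta)\frac{D_{ti}}{\hat P_t(X_i)}\frac{\hat P_{t'}(X_i)}{\hat p_{t'}} = 0$ gives a weighted mean of group-$t$ outcomes with weights $\hat P_{t'}(X_i)/\hat P_t(X_i)$, since the common factor $1/\hat p_{t'}$ cancels. The sketch below assumes the propensity scores are given; the function name and the simulated design are hypothetical.

```python
import numpy as np


def ipw_mean_treated(y, d_t, p_t, p_tp):
    """Root of the IPW moment for E[Y(t) | T = t'] with m(Y; theta) = Y - theta.

    Group-t outcomes are reweighted by P_{t'}(X) / P_t(X); the factor
    1 / p_{t'} multiplies every term of the moment and cancels.
    """
    w = d_t * p_tp / p_t
    return np.sum(w * y) / np.sum(w)


if __name__ == "__main__":
    # Hypothetical design: X ~ U(0, 1), P(T = 1 | X) = 0.3 + 0.4 X
    rng = np.random.default_rng(2)
    n = 200_000
    x = rng.uniform(size=n)
    p1 = 0.3 + 0.4 * x
    d1 = (rng.uniform(size=n) < p1).astype(float)
    d0 = 1.0 - d1
    y = d1 * (2.0 + x + rng.normal(size=n)) + d0 * (x + rng.normal(size=n))
    theta = ipw_mean_treated(y, d1, p1, 1.0 - p1)
    # Sample IPW moment at the solution; zero up to rounding error
    moment = np.mean((y - theta) * d1 / p1 * (1.0 - p1) / d0.mean())
    print(theta, moment)  # theta is close to 2 + 13/30 = 2.4333
```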
We define the main component of the efficient influence function to be $\psi_{t'}(Z; \gamma_{t'}, P, p, e(\gamma_{t'}))$, a $(J+1) \times 1$ vector whose $t$-th component is
$$\left( \frac{D_t}{P_t(X)} m(Y; \gamma_{t|t'}) + e_t(X; \gamma_{t|t'}) \left( \frac{D_{t'}}{P_{t'}(X)} - \frac{D_t}{P_t(X)} \right) \right) \frac{P_{t'}(X)}{p_{t'}}, \qquad (5.1)$$
where $e_t(X; \gamma_{t|t'}) \equiv E[m(Y; \gamma_{t|t'}) \mid T = t, X]$. The conditional expectations of the moments $e(\gamma_{t'}) \equiv (e_0(\cdot; \gamma_{0|t'}), \dots, e_J(\cdot; \gamma_{J|t'}))'$ are estimated nonparametrically by series estimators that satisfy Assumption A.NP in the Appendix. The efficient influence function (EIF) estimator is defined by
$$\hat\gamma^{EIF} \equiv \arg\min_{\theta \in \Theta^{J+1}} \left\| A_n M^{EIF}_{t'n}(\theta, \hat P, \hat p, \hat e(\theta)) \right\| + o_p(n^{-1/2}), \quad \text{where } M^{EIF}_{t'n}(\theta, P, p, e(\theta)) \equiv \frac{1}{n}\sum_{i=1}^n \psi_{t'}(Z_i; \theta, P, p, e(\theta)).$$

11 An alternative expression, $E\left[ E[m(Y; \gamma_{t|t'}) \mid T = t, X] \mid T = t' \right] = 0$, motivates the regression estimator; see, for example, Hahn (1998) and Chernozhukov, Fernandez-Val, and Melly (2013).

The asymptotic behavior of the estimators $\hat\gamma^{IPW}$ and $\hat\gamma^{EIF}$ follows from the next assumption.

Assumption 3 For all $t \in \mathcal{T}$: (a) $E[\|m(Y_{\langle t|t'\rangle}; \theta)\|^2] < \infty$ and $E[m(Y_{\langle t|t'\rangle}; \theta)]$ is differentiable in $\theta \in \Theta$ at $\gamma_{t|t'}$; and (b) define the gradient matrix
$$\Gamma_{t'} \equiv \begin{pmatrix} \Gamma_{0|t'} & 0 & \cdots & 0 \\ 0 & \Gamma_{1|t'} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Gamma_{J|t'} \end{pmatrix}, \qquad \Gamma_{t|t'} \equiv \left. \frac{\partial}{\partial \theta} E[m(Y_{\langle t|t'\rangle}; \theta)] \right|_{\theta = \gamma_{t|t'}},$$
where $0$ is a $d_m \times d_\theta$ matrix of zeros. The rank of $\Gamma_{t'}$ is $(J+1)d_\theta$.

Theorem 1 (Asymptotic Linear Representation) Suppose Assumption 3 and all assumptions in the Appendix hold. Then
$$\hat\gamma^{IPW} - \gamma_{t'} = \hat\gamma^{EIF} - \gamma_{t'} + o_p(n^{-1/2}) = -\left( \Gamma_{t'}' W \Gamma_{t'} \right)^{-1} \Gamma_{t'}' W M^{EIF}_{t'n}(\gamma_{t'}, P, p, e(\gamma_{t'})) + o_p(n^{-1/2}).$$
The treatment effects and treatment effects for the treated are continuous transformations of the distributional features $\gamma_{t'}$, so a delta-method argument recovers any such collection of treatment effects.

Remark (Efficient estimators) $\hat\gamma^{IPW}$ and $\hat\gamma^{EIF}$ are efficient for $\gamma_{t'}$ in the following two cases:
1. $d_m = d_\theta$ (the just-identified case), where $\hat\gamma^{IPW}$ solves $M^{IPW}_{t'n}(\hat\gamma^{IPW}, \hat P, \hat p) = 0$ and $\hat\gamma^{EIF}$ solves $M^{EIF}_{t'n}(\hat\gamma^{EIF}, \hat P, \hat p, \hat e(\hat\gamma^{EIF})) = 0$.
2. The optimal weighting matrix $W = V_{t'}^{-1}$ is chosen. The natural plug-in estimator of $V_{t'}$ is
$$\hat V_{t'} = \frac{1}{n}\sum_{i=1}^n \psi_{t'}(Z_i; \hat\gamma, \hat P, \hat p, \hat e(\hat\gamma))\, \psi_{t'}(Z_i; \hat\gamma, \hat P, \hat p, \hat e(\hat\gamma))' \qquad (5.2)$$
for some consistent estimator $\hat\gamma$ of $\gamma_{t'}$.
The asymptotic covariance matrix and the optimal weighting matrix can be estimated consistently as in Section 5.3 of Cattaneo (2010), so we do not repeat the proofs.
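In the scalar, just-identified mean case, $m(Y; \theta) = Y - \theta$ and $e_t(X; \theta) = E[Y \mid T = t, X] - \theta$, so the derivative of (5.1) with respect to $\theta$ equals $-D_{t'}/p_{t'}$, whose expectation is $-1$. The asymptotic variance of $\hat\gamma_{t|t'}$ then reduces to $V_{t'}$, and the plug-in (5.2) gives a standard error directly. A minimal sketch, assuming the nuisance estimates are given; the function name and the simulated design are hypothetical.

```python
import numpy as np


def eif_mean_treated_with_se(y, d_t, d_tp, p_t, p_tp, mu_t):
    """EIF estimate of E[Y(t) | T = t'] with a plug-in standard error.

    mu_t is the fitted E[Y | T = t, X_i]; for m(Y; theta) = Y - theta the
    moment regression is e_t(X; theta) = mu_t(X) - theta.
    """
    n = y.shape[0]
    p_tp_hat = np.mean(d_tp)  # sample analogue of p_{t'}
    w_t = d_t / p_t
    w_tp = d_tp / p_tp
    scale = p_tp / p_tp_hat
    # Closed-form root of (1/n) sum_i psi_i(theta) = 0 in the mean case
    gamma = np.mean((w_t * y - (w_t - w_tp) * mu_t) * scale)
    # psi_i from (5.1), evaluated at gamma_hat
    psi = (w_t * (y - gamma) + (mu_t - gamma) * (w_tp - w_t)) * scale
    # The gradient is -1 here, so the variance is the plug-in (5.2)
    v_hat = np.mean(psi ** 2)
    return gamma, np.sqrt(v_hat / n)


if __name__ == "__main__":
    # Hypothetical design: X ~ U(0, 1), P(T = 1 | X) = 0.3 + 0.4 X
    rng = np.random.default_rng(3)
    n = 200_000
    x = rng.uniform(size=n)
    p1 = 0.3 + 0.4 * x
    d1 = (rng.uniform(size=n) < p1).astype(float)
    d0 = 1.0 - d1
    y = d1 * (2.0 + x + rng.normal(size=n)) + d0 * (x + rng.normal(size=n))
    print(eif_mean_treated_with_se(y, d1, d0, p1, 1.0 - p1, 2.0 + x))
```

With vector-valued moments or overidentification, the scalar shortcut no longer applies and the full sandwich $(\Gamma' W \Gamma)^{-1}\Gamma' W V W \Gamma (\Gamma' W \Gamma)^{-1}$ would be needed instead.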

5.3 Inequality measures
We now consider inequality measures based on the counterfactual distribution $F_{Y_{\langle t|t'\rangle}}$. Some inequality measures, such as the Gini coefficient, cannot be expressed as a parameter defined by the moment function $m$ in the previous setup. In this section, we first establish weak convergence of the counterfactual distribution estimator $\hat F_{Y_{\langle t|t'\rangle}}(y)$, which is an empirical process indexed by $y \in \mathcal{Y}$. Once weak convergence of the distribution process is established, we can extend the results to Hadamard-differentiable functionals of the distribution process. This provides the limit distribution and uniform inference for common inequality measures and other distributional features, for example, quantile functions, Lorenz curves, and Gini coefficients. Hadamard-differentiable inequality measures have been studied recently in Firpo and Pinto (2011), Chernozhukov, Fernandez-Val, and Melly (2013), and Donald and Hsu (2014), among others. Consider the moment function to be a process indexed by the distributional threshold value $y \in \mathcal{Y}$, $m(\cdot; \theta, y) \in \{1\{Y \le y\} - \theta : y \in \mathcal{Y}\}$. This is a just-identified case in the previous setup. That is, for $t, t' \in \mathcal{T}$,
$$\hat F^{IPW}_{Y_{\langle t|t'\rangle}}(y) = \frac{1}{n}\sum_{i=1}^n \frac{D_{ti}}{\hat P_t(X_i)} 1\{Y_i \le y\} \frac{\hat P_{t'}(X_i)}{\hat p_{t'}},$$
$$\hat F^{EIF}_{Y_{\langle t|t'\rangle}}(y) = \hat F^{IPW}_{Y_{\langle t|t'\rangle}}(y) + \frac{1}{n}\sum_{i=1}^n \hat F_{Y|TX}(y \mid t, X_i) \left( \frac{D_{t'i}}{\hat P_{t'}(X_i)} - \frac{D_{ti}}{\hat P_t(X_i)} \right) \frac{\hat P_{t'}(X_i)}{\hat p_{t'}}.$$
To shorten the notation, let the efficient influence function for estimating $F_{Y_{\langle t|t'\rangle}}$ from (5.1) be
$$\psi_{t|t'}(Z; y) \equiv \left( \frac{D_t}{P_t(X)} \left( 1\{Y \le y\} - F_{Y_{\langle t|t'\rangle}}(y) \right) + \left( F_{Y|TX}(y \mid t, X) - F_{Y_{\langle t|t'\rangle}}(y) \right) \left( \frac{D_{t'}}{P_{t'}(X)} - \frac{D_t}{P_t(X)} \right) \right) \frac{P_{t'}(X)}{p_{t'}}.$$

Theorem 2 (Weak Convergence) Suppose the conditions in Theorem 1 and Assumption 5 in the Appendix hold. For any $t, t' \in \mathcal{T}$, uniformly in $y \in \mathcal{Y}$,
$$\sqrt{n}\left( \hat F^{IPW}_{Y_{\langle t|t'\rangle}}(y) - F_{Y_{\langle t|t'\rangle}}(y) \right) = \sqrt{n}\left( \hat F^{EIF}_{Y_{\langle t|t'\rangle}}(y) - F_{Y_{\langle t|t'\rangle}}(y) \right) + o_p(1) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_{t|t'}(Z_i; y) + o_p(1) \rightsquigarrow \mathbb{G}_{t|t'}(y).$$
The limit process $\mathbb G_{t|t'}(\cdot)$ in Theorem 2 is a Gaussian process with mean zero and covariance kernel $\mathrm{Cov}\big(\mathbb G_{t|t'}(y_1),\mathbb G_{t|t'}(y_2)\big)=E\big[\psi_{t|t'}(Z;y_1)\,\psi_{t|t'}(Z;y_2)\big]$ for $y_1,y_2\in\mathcal Y$. We can then apply the functional delta method to Hadamard-differentiable functionals of this distribution process. Let $\mathbb D_\theta\subseteq\ell^\infty(\mathcal Y)$ denote a space of bounded functions on $\mathcal Y$.

Corollary 1 (Functional Delta Method). Assume the conditions in Theorem 2 hold. Consider the parameter $\theta$ as an element of a parameter space $\mathbb D_\theta\subseteq\ell^\infty(\mathcal Y)$, with $\mathbb D_\theta$ containing the true value $\theta_0(y)\equiv F_{Y_{t|t'}}(y)$. Suppose a functional $\Gamma(\theta)$ mapping $\mathbb D_\theta$ to $\ell^\infty(\mathcal W)$ is Hadamard differentiable¹² in $\theta$ at $\theta_0$ with derivative $\Gamma'_{\theta_0}$. Then

$$\sqrt n\big(\Gamma(\hat\theta)(w)-\Gamma(\theta_0)(w)\big)-\frac{1}{\sqrt n}\sum_{i=1}^{n}\Gamma'_{\theta_0}\big(\psi_{t|t'}(Z_i;\cdot)\big)(w)=o_p(1),$$

$$\sqrt n\big(\Gamma(\hat\theta)(w)-\Gamma(\theta_0)(w)\big)\rightsquigarrow\Gamma'_{\theta_0}(\mathbb G_{t|t'})(w)\equiv\mathbb G(w),$$

where $\mathbb G$ is a Gaussian process indexed by $w\in\mathcal W$ in $\ell^\infty(\mathcal W)$, with mean zero and covariance kernel given by the second moment of $\Gamma'_{\theta_0}\big(\psi_{t|t'}(Z_i;\cdot)\big)$.

We illustrate Corollary 1 by letting $\Gamma$ be the $\tau$-quantile operator applied to $\theta_0(y)\equiv F_{Y_{t|t'}}(y)$. That is, $\Gamma$ is the generalized inverse $\theta_0^{-1}:(0,1)\to\mathcal Y$ given by $\theta_0^{-1}(\tau)=\inf\{y:\theta_0(y)\ge\tau\}$. For the quantile treatment effects for the treated, $\theta_0^{-1}(\tau)$ is the $\tau$th quantile of $Y(t)$ for the treated group $t'$, denoted by $Q_\tau\equiv Q_\tau(Y(t)\mid T=t')=F^{-1}_{Y(t)|T}(\tau\mid t')$. Hadamard differentiability requires $F_{Y_{t|t'}}(y)$ to be continuously differentiable at the $\tau$th quantile, with a strictly positive derivative that is bounded over a compact neighborhood. Other policy functionals may require additional assumptions. For instance, Bhattacharya (2007) gives regularity conditions for Hadamard differentiability of the Lorenz and Gini functionals.

Corollary 2 (Quantile treatment effect for the treated). Assume the conditions in Corollary 1. Then, uniformly in $\tau\in[a,b]\subset(0,1)$,

$$\sqrt n\big(\hat Q_\tau-Q_\tau\big)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\psi^Q_{t|t'}(Z_i;\tau)+o_p(1).$$

The right-hand side is an empirical process indexed by $\tau$. It converges weakly to a Gaussian process with mean zero and covariance kernel $E\big[\psi^Q_{t|t'}(Z;\tau_1)\,\psi^Q_{t|t'}(Z;\tau_2)\big]$ for any $\tau_1,\tau_2\in[a,b]$. The influence function is

$$\psi^Q_{t|t'}(Z_i;\tau)\equiv-\frac{D_{ti}}{p_{t'}\,f_{Y(t)|T}(Q_\tau\mid t')}\,\frac{P_{t'}(X_i)}{P_t(X_i)}\big(\mathbb 1\{Y_i\le Q_\tau\}-F_{Y|T,X}(Q_\tau\mid t,X_i)\big)-\frac{D_{t'i}}{p_{t'}\,f_{Y(t)|T}(Q_\tau\mid t')}\big(F_{Y|T,X}(Q_\tau\mid t,X_i)-\tau\big).$$
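Corollary 2 suggests a simple recipe with three steps. First, invert an estimated counterfactual CDF on a grid to get $\hat Q_\tau$. Second, plug estimates into $\psi^Q_{t|t'}$. Third, average its square for pointwise inference. The NumPy sketch below carries out these steps. Several pieces are assumptions of this illustration rather than the paper's procedure: the running-maximum step that makes the CDF monotone, the finite-difference density estimate with bandwidth `h`, and all function names.

```python
import numpy as np


def generalized_inverse(grid, cdf, taus):
    """Q(tau) = inf{y in grid : F(y) >= tau} for a CDF evaluated on a sorted grid.

    A running maximum first makes the input nondecreasing (an assumption of this
    sketch: an EIF-type CDF estimate need not be monotone in finite samples).
    Values of tau above the largest CDF value map to the top grid point.
    """
    mono = np.maximum.accumulate(np.asarray(cdf, dtype=float))
    idx = np.searchsorted(mono, taus, side="left")   # first index with mono >= tau
    return np.asarray(grid)[np.clip(idx, 0, len(grid) - 1)]


def density_at(grid, cdf, q, h):
    """Central finite-difference slope of the CDF at q; bandwidth h is hypothetical."""
    return (np.interp(q + h, grid, cdf) - np.interp(q - h, grid, cdf)) / (2.0 * h)


def qtt_influence(y, d, pscore, cond_cdf_at_q, q, tau, density, t, t_prime):
    """psi^Q_{t|t'}(Z_i; tau) from Corollary 2 with estimates plugged in.

    cond_cdf_at_q : (n,) values F_{Y|T,X}(q | t, X_i)
    density       : estimate of f_{Y(t)|T}(q | t')
    """
    d_t = (d == t).astype(float)
    d_tp = (d == t_prime).astype(float)
    p_tp = d_tp.mean()
    ratio = pscore[:, t_prime] / pscore[:, t]
    below = (y <= q).astype(float)
    psi_f = d_t * ratio * (below - cond_cdf_at_q) + d_tp * (cond_cdf_at_q - tau)
    return -psi_f / (p_tp * density)


def pointwise_se(psi):
    """Standard error sqrt(n^{-1} sum psi_i^2 / n) for one value of tau."""
    return np.sqrt(np.mean(psi ** 2) / psi.shape[0])
```

A QTT estimate is then the difference of two such quantiles, for $Y(t)$ and $Y(t')$ among the treated $t'$. Its influence function is the difference of the corresponding $\psi^Q$ terms. Setting `t == t_prime` in `qtt_influence` gives the familiar influence function of an ordinary sample quantile of group $t'$.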
To carry out pointwise inference, the asymptotic variance can be estimated by $n^{-1}\sum_{i=1}^{n}\hat\psi^Q_{t|t'}(Z_i;\tau)^2$, as in (5.2) in the previous section.

¹² See, for example, van der Vaart (2000) for the definition: let $\Gamma$ be a Hadamard-differentiable functional mapping $\mathbb F$ to some normed space $\mathbb E$, with derivative $\Gamma'_f$, a continuous linear map $\mathbb F\to\mathbb E$. Then, for every $h_n\to h$ and $f\in\mathbb F$, $\lim_{u\to0}u^{-1}\big(\Gamma(f+uh_n)-\Gamma(f)\big)=\Gamma'_f(h)$.

6 Detailed composition effect

6.1 Semiparametric efficiency bound

We consider a $(J+1)\times1$ vector of interest for the detailed composition effect, $\lambda_{t'}\equiv(\lambda_{0|t'},\ldots,\lambda_{J|t'})'$. Define $\psi_{X_1t'}\big(Z;\lambda_{t'},p,e(\lambda_{t'}),P^1\big)$ to be a $(J+1)\times1$ vector whose $t$-th element is

$$\Big[\Big(\frac{D_t}{P_t(X)}m(Y;\lambda_{t|t'})+e_t(X;\lambda_{t|t'})\Big(\frac{D_{t'}}{P_{t'}(X)}-\frac{D_t}{P_t(X)}\Big)\Big)\frac{P_{t'}(X)}{P(T=t'\mid X_2)}+E\big[m(Y;\lambda_{t|t'})W_{X_1t'}((X_1,X_2))\mid T=t,X_2\big]\Big(\frac{D_{t'}}{P(T=t'\mid X_2)}-\frac{D_t}{P(T=t\mid X_2)}\Big)\Big]\frac{P(T=t\mid X_2)}{p_{t'}} \qquad (6.1)$$

for $t\in\mathcal T$. Denote the propensity score given $X_2$ by $P^1(x_2)\equiv\big(P(T=0\mid X_2=x_2),\ldots,P(T=J\mid X_2=x_2)\big)$. The following assumption guarantees the existence of the efficiency bound for $\lambda_{t'}$.

Assumption 4. For all $t\in\mathcal T$: (a) $E\big[\|m(Y^{X_1}_{t|t'};\theta)\|^2\big]<\infty$ and $E[m(Y^{X_1}_{t|t'};\theta)]$ is differentiable in $\theta\in\Theta$ at $\lambda_{t|t'}$; and (b) define the gradient matrix

$$\Gamma_{X_1t'}\equiv\begin{pmatrix}\Gamma_{X_1,0|t'} & 0 & \cdots & 0\\ 0 & \Gamma_{X_1,1|t'} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \Gamma_{X_1,J|t'}\end{pmatrix},\qquad \Gamma_{X_1,t|t'}\equiv\frac{\partial}{\partial\theta'}E\big[m(Y^{X_1}_{t|t'};\theta)\big]\Big|_{\theta=\lambda_{t|t'}},$$

where each $0$ is a $d_m\times d_\theta$ matrix of zeros. The rank of $\Gamma_{X_1t'}$ is $(J+1)d_\theta$.

Theorem 3 (Factor parameter). Suppose Assumptions 1 and 4 hold. Then the efficient influence function of $\lambda_{t'}$ is given by

$$\Psi_{X_1t'}=-\big(\Gamma_{X_1t'}'V_{X_1t'}^{-1}\Gamma_{X_1t'}\big)^{-1}\Gamma_{X_1t'}'V_{X_1t'}^{-1}\psi_{X_1t'},$$

where $V_{X_1t'}=\mathrm{var}[\psi_{X_1t'}]$. The semiparametric efficiency bound for any regular estimator of $\lambda_{t'}$ is given by $\mathcal V_{X_1t'}=\big(\Gamma_{X_1t'}'V_{X_1t'}^{-1}\Gamma_{X_1t'}\big)^{-1}$.
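Once $\Gamma$ and $V$ are available, the efficient influence function and bound in Theorem 3 are plain linear algebra. The same is true of the representation in Theorem 1 with $W=V_{t'}^{-1}$. The NumPy sketch below computes both from a gradient matrix, a covariance matrix, and a matrix of influence-function values. The function name `efficient_influence` is hypothetical. In the just-identified case $d_m=d_\theta$, the formulas reduce to $-\Gamma^{-1}\psi$ and $\Gamma^{-1}V\Gamma^{-1\prime}$.

```python
import numpy as np


def efficient_influence(gamma, v, psi):
    """Sketch of Psi = -(G' V^{-1} G)^{-1} G' V^{-1} psi and bound (G' V^{-1} G)^{-1}.

    gamma : (k, q) gradient matrix with full column rank q <= k
    v     : (k, k) positive definite covariance matrix of psi
    psi   : (n, k) influence-function values, one row per observation
    Returns (eif, bound) with eif of shape (n, q) and bound of shape (q, q).
    """
    v_inv_gamma = np.linalg.solve(v, gamma)        # V^{-1} G, shape (k, q)
    bound = np.linalg.inv(gamma.T @ v_inv_gamma)   # (G' V^{-1} G)^{-1}
    # Row i equals (-(G' V^{-1} G)^{-1} G' V^{-1} psi_i)'; this uses V symmetric.
    eif = -(psi @ v_inv_gamma) @ bound.T
    return eif, bound


# Just-identified illustration with hypothetical numbers.
G = np.array([[2.0, 0.0], [1.0, 1.0]])
V = np.array([[1.0, 0.2], [0.2, 2.0]])
psi = np.array([[1.0, -1.0], [0.5, 2.0], [-1.5, 0.0]])
eif, bound = efficient_influence(G, V, psi)
print(bound)
```

In the paper's setting, $\Gamma_{X_1t'}$ is block diagonal as in Assumption 4, and $V$ could be estimated by a plug-in analogue of (5.2). Calling `np.linalg.solve` instead of forming $V^{-1}$ explicitly is a routine numerical choice.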

6.2 Efficient estimators for the detailed composition effect

Similarly, the factor parameter $\lambda_{t'}$ can be estimated by

$$\hat\lambda^{IPW}=\arg\min_{\theta\in\Theta^{J+1}}\big\|A_nM^{IPW}_{X_1t'n}(\theta,\hat P,\hat p)\big\|+o_p(n^{-1/2}),\qquad M^{IPW}_{X_1t'n}(\theta,P,p)=\frac{1}{n}\sum_{i=1}^{n}\Big(\frac{D_{0i}}{P_0(X_i)},\ldots,\frac{D_{Ji}}{P_J(X_i)}\Big)'m(Y_i;\theta)\,\frac{P_{t'}(X_i)}{p_{t'}}\,W_{X_1t'}(X_i);$$

$$\hat\lambda^{EIF}=\arg\min_{\theta\in\Theta^{J+1}}\big\|A_nM^{EIF}_{X_1t'n}\big(\theta,\hat P,\hat p,\hat e(\theta),\hat P^1\big)\big\|+o_p(n^{-1/2}),\qquad M^{EIF}_{X_1t'n}\big(\theta,P,p,e(\theta),P^1\big)=\frac{1}{n}\sum_{i=1}^{n}\psi_{X_1t'}\big(Z_i;\theta,P,p,e(\theta),P^1\big).$$

The IPW estimator is based on (2.9). The additional weight $P(T=t\mid X_2)/P(T=t'\mid X_2)$ can be estimated in the same way as the propensity score $P_j(X)$, $j\in\mathcal T$. The additional regression $E[m(Y;\lambda)W_{X_1t'}\mid T=t,X_2]$ in (6.1) can be estimated in the same way as $e_t(X;\lambda)$. The asymptotic linear representation of these estimators can be derived as in Theorem 1, so we do not repeat the proofs.

7 Conclusion

This paper is one of the first in the program evaluation literature to study the effects of different levels of participation using decomposition analysis and multi-valued treatment effect estimation. We study how different lengths of participation in the Workforce Investment Act (WIA) Adult program affect participants' post-program wage outcomes. In particular, we explore how much the wage structures and the characteristics distributions associated with different lengths of participation account for the differences in observed wage distributions. To do so, we decompose the differences in wage distributions for participants with different lengths of program participation into (1) a wage-structure effect, arising from the different wage structures associated with different lengths of participation, and (2) a composition effect, arising from the different characteristics distributions of participants with different lengths of participation.
Treat the length of participation in the WIA Adult program as a multi-valued treatment that is as good as randomly assigned conditional on participants' characteristics (the unconfoundedness assumption). Under this assumption, the decomposition analysis reveals causal effects, such as treatment effects for the treated. We therefore propose an efficient propensity score weighting estimator by extending Cattaneo's (2010) estimators for multi-valued treatment effects to treatment effects for the treated. We further propose an efficient estimator for the detailed composition effects that isolates the impact of specific explanatory factors, similar to DiNardo, Fortin, and Lemieux (1996) and Fortin, Lemieux, and Firpo (2011). That is, the counterfactual is based on the conditional distribution of one factor given the other explanatory covariates, while the distribution of the other covariates is held fixed.

We find that wage outcomes do not have a monotonic relationship with the length of participation in the program, and this relationship varies across gender-racial groups. Our estimation results suggest that the wage structures faced by participants with different lengths of participation account for much of the difference in their wage outcomes. In contrast, the characteristics distributions of participants with different lengths of participation contribute little to the differences in wage outcomes. These results suggest that heterogeneity in the level of participation is an important dimension for program evaluation. In future work, we will estimate the quantile counterfactual effects and quantile treatment effects for the treated to provide a more comprehensive assessment of the program.¹³ The results of this paper, both theoretical and empirical, provide a rigorous assessment of intervention programs and relevant suggestions for improving their performance and cost-effectiveness.

¹³ Stata modules are being developed to implement the ATT, QTT, and decomposition analysis.

Appendix A Estimation Results

Table 4: Estimation results for white female
Columns: (1) Estimates, (2) Std. err., (3) 95% Conf. Interval.
Treatment effects: $E[Y(1)]$, $E[Y(2)]$, $E[Y(3)]$, $E[Y(4)]$.
Treatment effects on the treated: $E[Y(t)\mid T=t']$ for $t=1,\ldots,4$ and $t'=1,\ldots,4$.

Table 5: Estimation results for white male
Columns: (1) Estimates, (2) Std. err., (3) 95% Conf. Interval.
Treatment effects: $E[Y(1)]$, $E[Y(2)]$, $E[Y(3)]$, $E[Y(4)]$.
Treatment effects on the treated: $E[Y(t)\mid T=t']$ for $t=1,\ldots,4$ and $t'=1,\ldots,4$.

Table 6: Estimation results for black female
Columns: (1) Estimates, (2) Std. err., (3) 95% Conf. Interval.
Treatment effects: $E[Y(1)]$, $E[Y(2)]$, $E[Y(3)]$, $E[Y(4)]$.
Treatment effects on the treated: $E[Y(t)\mid T=t']$ for $t=1,\ldots,4$ and $t'=1,\ldots,4$.

Table 7: Estimation results for black male
Columns: (1) Estimates, (2) Std. err., (3) 95% Conf. Interval.
Treatment effects: $E[Y(1)]$, $E[Y(2)]$, $E[Y(3)]$, $E[Y(4)]$.
Treatment effects on the treated: $E[Y(t)\mid T=t']$ for $t=1,\ldots,4$ and $t'=1,\ldots,4$.

Table 8: Estimation results for Hispanic female
Columns: (1) Estimates, (2) Std. err., (3) 95% Conf. Interval.
Treatment effects: $E[Y(1)]$, $E[Y(2)]$, $E[Y(3)]$, $E[Y(4)]$.
Treatment effects on the treated: $E[Y(t)\mid T=t']$ for $t=1,\ldots,4$ and $t'=1,\ldots,4$.


Table 9: Estimation results for Hispanic male
Columns: (1) Estimates, (2) Std. err., (3) 95% Conf. Interval.
Treatment effects: $E[Y(1)]$, $E[Y(2)]$, $E[Y(3)]$, $E[Y(4)]$.
Treatment effects on the treated: $E[Y(t)\mid T=t']$ for $t=1,\ldots,4$ and $t'=1,\ldots,4$.

