Testing for Rank Invariance or Similarity in Program Evaluation: The Effect of Training on Earnings Revisited

Yingying Dong and Shu Shen
UC Irvine and UC Davis
Sept 2015 @ Chicago

Motivation: QTE/LQTE Literature
In program evaluations, applied researchers care about treatment effect heterogeneity and often look at distributional/quantile effects of treatments.
In quantile treatment effect (QTE) models, rank invariance or rank similarity is required
  either for identification: e.g., the IVQR model of Chernozhukov and Hansen (2005, 2006, 2008), Chernozhukov, Imbens, and Newey (2007), Horowitz and Lee (2007);
  or for interpretation: e.g., the LQTE framework (Abadie, Angrist and Imbens, 2002). Also Frolich and Melly (2013), Firpo (2007), and Imbens and Newey (2009).
This paper
  studies the assumption of (unconditional) rank invariance and rank similarity;
  provides identification of the distribution of individuals' (unconditional) potential ranks conditional on covariates;
  proposes nonparametric tests that are applicable to both exogenous and endogenous treatments.

Motivation: Program Evaluation Applications
The Star Project: the effect of attending a small class ($T$) in grade K on student outcome ($Y$, grade K test score).
[Figure: The Star Project Score Distributions. Probability against Total Score for Regular Class With Aid, Small Class, and Regular Class, with the QTE.]

Motivation: Program Evaluation Applications
JTPA (Job Training Partnership Act): the effect of job training ($T$) on individual earnings ($Y$). Treatment is randomly assigned ($Z$), with a compliance rate of about 60%.
[Figure, two panels: "Potential Earnings Distributions, Female" and "Potential Earnings Distributions Among Compliers, Male". Each panel plots Probability against Earnings for Control and Treatment, with the LQTE.]

Definition of Rank Invariance
$Y_0$ and $Y_1$ are the potential outcomes under no treatment and under treatment, respectively.
$U_t = F_t(Y_t) \sim U(0, 1)$ is the rank of the potential outcome $Y_t$. $U_0$ and $U_1$ are unconditional and are never observed at the same time.
Rank invariance is the condition that $U_0 = U_1$.
Example: $Y_t = g_t(X, V)$, where $Y_t$ is a test score, $X$ is observed characteristics such as gender and race, and $V$ is ability. If $(X, V): \Omega \to \mathcal{W}$, so that $U_t = F_t(g_t(X(\omega), V(\omega)))$, then rank invariance says that $U_0(\omega) = U_1(\omega)$ for all $\omega \in \Omega$.
Let $q_t(\tau) = F_{Y_t}^{-1}(\tau)$ and $QTE(\tau) = q_1(\tau) - q_0(\tau)$. Rank invariance implies that $QTE(\tau)$ is the individual treatment effect for anyone who is at quantile $\tau$.
Rank invariance is restrictive: it does not allow for random slippages in potential ranks (e.g., caused by luck).
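The definition can be illustrated with a small simulation. This is only a sketch under assumed functional forms: the outcome equations, the sample size, and the helper name `empirical_rank` are hypothetical and do not come from the slides. It shows ranks coinciding when $Y_1$ is a strictly increasing function of the same ability term as $Y_0$, and failing to coincide once a "luck" shock is added.

```python
import numpy as np

def empirical_rank(y):
    """Empirical rank F_n(y_i) in (0, 1] for each element of y (no ties assumed)."""
    order = np.argsort(y, kind="stable")
    ranks = np.empty(len(y), dtype=float)
    ranks[order] = np.arange(1, len(y) + 1)
    return ranks / len(y)

rng = np.random.default_rng(0)
n = 1000
v = rng.normal(size=n)          # hypothetical "ability" V
y0 = v                          # assumed g_0: untreated potential outcome
y1 = 1.0 + 2.0 * v              # assumed g_1: strictly increasing in V

u0, u1 = empirical_rank(y0), empirical_rank(y1)
rank_invariant = bool(np.allclose(u0, u1))      # U_0 = U_1 for every individual

# Under rank invariance, QTE(tau) is the individual effect at rank tau.
tau = 0.5
qte = np.quantile(y1, tau) - np.quantile(y0, tau)

# Adding a luck shock after assignment breaks exact rank invariance.
y1_luck = y1 + rng.normal(size=n)
slipped = not np.allclose(u0, empirical_rank(y1_luck))
```

Both potential outcomes are visible here only because the data are simulated; in real data $U_0$ and $U_1$ are never observed together, which is why the tests below work with conditional rank distributions instead.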

Rank Similarity
Suppose $Y_t = g_t(X, V, S_t)$, where $X$ (gender, race) and $V$ (ability) determine the common rank level, and $S_t$ (luck) is a random shock responsible for the random slippages. $S_t$ is realized after a treatment is assigned.
Rank similarity is the condition that $U_0 \mid (X = x, V = v) \sim U_1 \mid (X = x, V = v)$ for all $(x, v) \in \mathcal{W}$.
If $(X, V): \Omega \to \mathcal{W}$, then rank similarity says that $U_0(\omega) \sim U_1(\omega)$ for all $\omega \in \Omega$.

Implications of Rank Similarity
Lemma 1. Rank similarity implies that
1. The distributions of observables and unobservables at the same rank are the same across treatment states. That is, given rank similarity,
  $F_{X,V|U_0}(x, v \mid \tau) = F_{X,V|U_1}(x, v \mid \tau)$ for all $\tau \in (0, 1)$, $(x, v) \in \mathcal{W}$.
2. For any individual, her average treatment effect is a weighted average of the unconditional QTEs, where the weights are the individual's probabilities of being at different quantiles. That is, given rank similarity,
  $E[Y_1 - Y_0 \mid X = x, V = v] = \int_0^1 QTE(\tau)\, dF_{U|X,V}(\tau \mid x, v)$ for all $(x, v) \in \mathcal{W}$.
3. (Main Testable Implication) Treatment should not affect the distribution of ranks among observationally equivalent individuals. That is, given rank similarity,
  $F_{U_0|X}(\tau \mid x) = F_{U_1|X}(\tau \mid x)$ for all $\tau \in (0, 1)$, $x \in \mathcal{X}$.

Identification: the Exogenous Treatment Case
If $T$ is exogenous, identification of $F_{U_1|X}(\tau \mid x) - F_{U_0|X}(\tau \mid x)$ for $\tau \in (0, 1)$ and $x \in \mathcal{X}$ is trivial:
  $F_{U_1|X}(\tau \mid x) - F_{U_0|X}(\tau \mid x)$
  $= E[1(U_1 \le \tau) \mid X = x] - E[1(U_0 \le \tau) \mid X = x]$
  $= E[1(Y_1 \le q_1(\tau)) \mid X = x] - E[1(Y_0 \le q_0(\tau)) \mid X = x]$
  $= E[1(Y \le q_1(\tau)) \mid X = x, T = 1] - E[1(Y \le q_0(\tau)) \mid X = x, T = 0]$,
where the marginal quantiles $q_1(\tau)$ and $q_0(\tau)$ are directly identified from the subsamples with $T = 1$ and $T = 0$, respectively.
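A sample-analogue version of the last line of this derivation might look as follows. It is a minimal sketch assuming a discrete covariate and a randomly assigned treatment; the function name `rank_cdf_gap` and its interface are hypothetical, not the authors' code.

```python
import numpy as np

def rank_cdf_gap(y, t, x, tau):
    """Estimate F_{U1|X}(tau|x) - F_{U0|X}(tau|x) for each distinct value of x.

    Assumes T is exogenous and X is discrete (both are assumptions of this sketch).
    """
    y, t, x = map(np.asarray, (y, t, x))
    q1 = np.quantile(y[t == 1], tau)   # marginal tau-quantile of Y_1 from the T = 1 subsample
    q0 = np.quantile(y[t == 0], tau)   # marginal tau-quantile of Y_0 from the T = 0 subsample
    gaps = {}
    for val in np.unique(x):
        treated = (x == val) & (t == 1)
        control = (x == val) & (t == 0)
        # E[1(Y <= q1) | X = x, T = 1] - E[1(Y <= q0) | X = x, T = 0]
        gaps[float(val)] = np.mean(y[treated] <= q1) - np.mean(y[control] <= q0)
    return gaps
```

Under rank similarity every entry of the returned dictionary should be close to zero, up to sampling noise.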

Identification: the Endogenous Treatment Case
If $T$ is endogenous, let $Z \in \{0, 1\}$ be an IV, and let $T_z$ for $z = 0, 1$ be the potential treatment status.
We are interested in testing for rank similarity among compliers ($T_1 > T_0$):
  $F_{U_1|C,X}(\tau \mid x) = F_{U_0|C,X}(\tau \mid x)$ for all $\tau \in (0, 1)$ and $x \in \mathcal{X}_C$,
where $\mathcal{X}_C = \{x \in \mathcal{X} : \Pr[T_1 > T_0 \mid X = x] > 0\}$.
Assumption 1. Let $(Y_t, T_t, X, Z)$, $t = 0, 1$, be random variables mapped from the common probability space $(\Omega, \mathcal{F}, P)$. The following conditions hold jointly with probability one.
1. Independence: $(Y_0, Y_1, T_0, T_1) \perp Z \mid X$.
2. First stage: $E(T_1) \ne E(T_0)$.
3. Monotonicity: $\Pr(T_1 \ge T_0) = 1$.
4. Nontrivial assignment: $0 < \Pr(Z = 1 \mid X = x) < 1$ for all $x \in \mathcal{X}$.

Identification: the Endogenous Treatment Case
Theorem 1. Let $I(\tau) \equiv 1\left(Y \le T q_{1|C}(\tau) + (1 - T) q_{0|C}(\tau)\right)$, where $q_{t|C}(\tau)$ is the $\tau$-quantile of $Y_t$ among compliers. Given Assumption 1, for all $\tau \in (0, 1)$, $x \in \mathcal{X}_C$, and $t = 0, 1$, $F_{U_t|C,X}(\tau \mid x)$ is identified and is given by
  $F_{U_t|C,X}(\tau \mid x) = \dfrac{E[I(\tau) 1(T = t) \mid Z = 1, X = x] - E[I(\tau) 1(T = t) \mid Z = 0, X = x]}{E[1(T = t) \mid Z = 1, X = x] - E[1(T = t) \mid Z = 0, X = x]}$.   (1)
$F_{U_1|C,X}(\cdot \mid x) = F_{U_0|C,X}(\cdot \mid x)$ for $x \in \mathcal{X}_C$ if and only if, for all $\tau \in (0, 1)$ and $x \in \mathcal{X}$,
  $E[I(\tau) \mid Z = 1, X = x] = E[I(\tau) \mid Z = 0, X = x]$.   (2)
Note: $I(\tau)$ is a rank indicator.
Notice the change from $x \in \mathcal{X}_C$ to $x \in \mathcal{X}$ in the theorem. This is because Equation (2) holds trivially for $x \in \mathcal{X} \setminus \mathcal{X}_C$.
Use the identification result of Equation (2) to test $H_0: F_{U_1|C,X}(\cdot \mid x) = F_{U_0|C,X}(\cdot \mid x)$.
Use the identification result of Equation (1) to estimate $F_{U_1|C,X}(\cdot \mid x) - F_{U_0|C,X}(\cdot \mid x)$.
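Equation (1) is a ratio of differences in conditional means, so within a single covariate cell it can be estimated by plugging in sample means. The sketch below assumes the complier quantiles are already available and are passed in as numbers; the function name `complier_rank_cdf` and its arguments are hypothetical.

```python
import numpy as np

def complier_rank_cdf(y, t, z, q0c, q1c, arm):
    """Sample analogue of Equation (1) within one covariate cell.

    q0c, q1c: complier quantiles q_{0|C}(tau) and q_{1|C}(tau), taken as given here.
    arm:      0 or 1, the potential outcome whose rank CDF is evaluated at tau.
    """
    y, t, z = map(np.asarray, (y, t, z))
    ind = (y <= np.where(t == 1, q1c, q0c)).astype(float)   # rank indicator I(tau)
    in_arm = (t == arm).astype(float)                        # 1(T = arm)
    num = np.mean((ind * in_arm)[z == 1]) - np.mean((ind * in_arm)[z == 0])
    den = np.mean(in_arm[z == 1]) - np.mean(in_arm[z == 0])
    return num / den
```

Calling it with `arm=1` and `arm=0` and differencing gives a cell-level estimate of $F_{U_1|C,X}(\tau \mid x) - F_{U_0|C,X}(\tau \mid x)$.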

Mean Test
Theorem 1 also gives identification of specific features of the potential rank distribution, such as the mean.
Rank similarity implies $F_{U_1|C,X}(\tau \mid x) = F_{U_0|C,X}(\tau \mid x)$, which further implies $E[U_1 \mid C, X = x] = E[U_0 \mid C, X = x]$.
$E[U_1 \mid C, X = x] = E[U_0 \mid C, X = x]$ holds if and only if $E[U \mid Z = 1, X = x] = E[U \mid Z = 0, X = x]$, where
  $U \equiv T U_1 + (1 - T) U_0 = \int_0^1 1\left(T q_{1|C}(\tau) + (1 - T) q_{0|C}(\tau) < Y\right) d\tau = 1 - \int_0^1 I(\tau)\, d\tau$.
$U$ is identified because $I(\tau)$ is identified.
$E[U_1 \mid C, X = x] - E[U_0 \mid C, X = x]$ represents the average rank change for each subpopulation.

Star Project
[Figure, Small Classes vs. Regular Classes, two panels: "Rank Distributions by Race" (Nonwhite and White, Small Class and Regular Class) and "Rank Distributions by Gender" (Boy and Girl, Small Class and Regular Class). Each panel plots Probability against Rank of Total Score.]
[Figure, Regular Class with Aid vs. Regular Classes, two panels: "Rank Distributions by Race" and "Rank Distributions by Gender". Each panel plots Probability against Rank of Total Score.]

Empirical Example: JTPA
[Figure, Female, two panels: "Potential Rank Distributions, by Education" (<HS and HS, Treatment and Control) and "Potential Rank Distributions, by Employment Last Year" (<13 Weeks and >=13 Weeks, Treatment and Control). Each panel plots Probability against Rank (0 to 100).]
[Figure, Male, two panels: "Potential Rank Distributions by Education" and "Potential Rank Distributions by Employment Last Year". Each panel plots Probability against Rank (0 to 100).]

Null Hypothesis and Test Statistic
Let $\mathcal{X} = \{x_1, x_2, \ldots, x_J\}$ and $\Omega = \{\tau_1, \tau_2, \ldots, \tau_K\}$.
$H_0$: $m_j^0(\tau_k) = m_j^1(\tau_k)$ for $j = 1, \ldots, J - 1$ and $k = 1, \ldots, K$, where, for $z = 0, 1$,
  $m_j^z(\tau_k) \equiv E\left[1\left(Y \le T q_{1|C}(\tau_k) + (1 - T) q_{0|C}(\tau_k)\right) \mid Z = z, X = x_j\right]$.
The sample analogue is
  $\hat{m}_j^z(\tau_k) = \dfrac{1}{n_j^z} \sum_{Z_i = z,\, X_i = x_j} 1\left(Y_i \le T_i \hat{q}_{1|C}(\tau_k) + (1 - T_i) \hat{q}_{0|C}(\tau_k)\right)$,
with
  $\left(\hat{q}_{0|C}(\tau_k), \hat{q}_{1|C}(\tau_k)\right) = \arg\min_{q_0, q_1} \dfrac{1}{n} \sum_{i=1}^n \rho_{\tau_k}\left(Y_i - q_0 (1 - T_i) - q_1 T_i\right) \hat{\omega}_i$,
where
  $\hat{\omega}_i = \left(\dfrac{Z_i}{\hat{\pi}(X_i)} - \dfrac{1 - Z_i}{1 - \hat{\pi}(X_i)}\right)(2 T_i - 1)$,
$\hat{\pi}(x)$ is a consistent estimator of $\pi(x) = \Pr(Z = 1 \mid X = x)$, and $n_j^z = \sum_{i=1}^n 1(Z_i = z, X_i = x_j)$.
Wald-type test: $W \equiv n \left(\hat{m}^1 - \hat{m}^0\right)' \hat{V}^{-1} \left(\hat{m}^1 - \hat{m}^0\right)$.
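The first step of the test statistic, the weighted quantile problem for $\hat{q}_{0|C}$ and $\hat{q}_{1|C}$, can be sketched as below. The objective separates across the two treatment arms, so each quantile can be computed on its own subsample. Because the weights $\hat{\omega}_i$ can be negative, the objective need not be convex, so this sketch simply searches over the observed outcomes; that brute-force search is a choice made here for illustration and is not claimed to be the authors' implementation. All function names are hypothetical.

```python
import numpy as np

def complier_weights(t, z, pi_hat):
    """omega_i = (Z_i / pi(X_i) - (1 - Z_i) / (1 - pi(X_i))) * (2 T_i - 1)."""
    t, z, pi_hat = map(np.asarray, (t, z, pi_hat))
    return (z / pi_hat - (1 - z) / (1 - pi_hat)) * (2 * t - 1)

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1(u < 0))."""
    return u * (tau - (u < 0))

def complier_quantile(y, w, tau):
    """Minimise sum_i w_i * rho_tau(y_i - q) over candidate values q taken from y.

    Apply it separately to the T = 1 and T = 0 subsamples. The search is O(n^2),
    which is acceptable for a sketch but slow for large samples.
    """
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    cands = np.unique(y)
    loss = [(w * check_loss(y - q, tau)).sum() for q in cands]
    return float(cands[int(np.argmin(loss))])
```

For example, with outcomes `y`, treatment `t`, instrument `z`, and an estimated propensity `pi_hat`, one would compute `w = complier_weights(t, z, pi_hat)` and then `complier_quantile(y[t == 1], w[t == 1], tau)` for $\hat{q}_{1|C}(\tau)$, and likewise on `t == 0` for $\hat{q}_{0|C}(\tau)$.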

Assumption for Asymptotic Properties
Assumption 3.
1. i.i.d. data: the data $(Y_i, T_i, Z_i, X_i)$ for $i = 1, \ldots, n$ is a random sample of size $n$ from $(Y, T, Z, X)$.
2. For all $\tau \in \Omega = \{\tau_1, \tau_2, \ldots, \tau_K\}$, the random variables $Y_1$ and $Y_0$ are continuously distributed with positive density in a neighborhood of $q_{0|C}(\tau)$ and $q_{1|C}(\tau)$ in the subpopulation of compliers.
3. For all $j = 1, \ldots, J$, $\hat{\pi}(x_j)$ is consistent, or $\hat{\pi}(x_j) \to_p \pi(x_j)$.
4. Let $f_{Y|T,Z,X}$ be the conditional density of $Y$ given $T$, $Z$ and $X$. For all $t, z = 0, 1$, $j = 1, \ldots, J$ and $\tau \in \Omega$, $f_{Y|T,Z,X}(y \mid t, z, x_j)$ has a bounded first derivative with respect to $y$ in a neighborhood of $q_{t|C}(\tau)$. Let $f_{Y|X}(y \mid x)$ be the conditional density of $Y$ given $X$. For all $\tau \in \Omega$ and $j = 1, \ldots, J$, $f_{Y|X}(\cdot \mid x_j)$ is positive and bounded in a neighborhood of $q_{t|C}(\tau)$.

Asymptotics
Let $\hat{m}^z$, $z = 0, 1$, be the $K(J - 1)$-dimensional vector whose $(K(j - 1) + k)$-th element is $\hat{m}_j^z(\tau_k)$.
Theorem 2. Given Assumptions 1 and 3,
  $\sqrt{n}\left(\hat{m}^1 - \hat{m}^0 - \left(m^1 - m^0\right)\right) \Rightarrow N(0, V)$,
where $V$ is the $K(J - 1) \times K(J - 1)$ asymptotic variance-covariance matrix. The $\left(K(j - 1) + k,\; K(j' - 1) + k'\right)$-th element of $V$ is equal to
  $E\left[\left(\phi_j^1(\tau_k) - \phi_j^0(\tau_k)\right)\left(\phi_{j'}^1(\tau_{k'}) - \phi_{j'}^0(\tau_{k'})\right)\right]$,
with
  $\phi_j^z(\tau_k) \equiv \phi_j^z(\tau_k; Y, T, Z, X) = \dfrac{I(\tau_k) - m_j^z(\tau_k)}{p_{Z,X}(z, x_j)} 1(Z = z, X = x_j)$
    $- \dfrac{f_{Y|T,Z,X}(q_{0|C}(\tau_k) \mid 0, z, x_j)\left(1 - p_{T|Z,X}(z, x_j)\right)}{P_c f_{0|C}(q_{0|C}(\tau_k))} \psi_0(Y, T, Z, X)$
    $- \dfrac{f_{Y|T,Z,X}(q_{1|C}(\tau_k) \mid 1, z, x_j)\, p_{T|Z,X}(z, x_j)}{P_c f_{1|C}(q_{1|C}(\tau_k))} \psi_1(Y, T, Z, X)$,
where $\psi_0(Y, T, Z, X)$ and $\psi_1(Y, T, Z, X)$ are defined in the proof of Theorem 7 in Frolich and Melly (2007), and restated in the proof of this theorem in the Appendix.

Asymptotic Properties of the Test
Recall the Wald-type test statistic:
  $W \equiv n\left(\hat{m}^1 - \hat{m}^0\right)' \hat{V}^{-1} \left(\hat{m}^1 - \hat{m}^0\right) \Rightarrow \chi^2(K(J - 1))$ under the null.
$\hat{V}$ is obtained by the bootstrap.
If $q_{0|C}(\tau_k)$ and $q_{1|C}(\tau_k)$ were known, $\phi_j^z(\tau_k; Y, T, Z, X)$ would reduce to $\dfrac{I(\tau_k) - m_j^z(\tau_k)}{p_{Z,X}(z, x_j)} 1(Z = z, X = x_j)$, and the $\left(K(j - 1) + k,\; K(j' - 1) + k'\right)$-th element of $V$ would equal
  $\sum_{z = 0, 1} \dfrac{m_j^z(\tau_k \wedge \tau_{k'}) - m_j^z(\tau_k)\, m_j^z(\tau_{k'})}{p_{Z,X}(z, x_j)}$ if $j = j'$, and $0$ if $j \ne j'$.
If $J$ is very large, then the first-stage estimation error may be ignored and one can construct $\hat{V}$ by the analytic formula. This is discussed in the extensions where $J \to \infty$ or $X$ includes continuous variables.
The critical value $c_\alpha$ is the $(1 - \alpha) \times 100$-th percentile of the $\chi^2(K(J - 1))$ distribution.
The test is consistent for the null hypothesis $H_0$.
Once again, the test does NOT test the unobservable part (e.g., $V$ or ability) of the rank invariance assumption.
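Given the stacked difference $\hat{m}^1 - \hat{m}^0$ and a set of bootstrap replications of that difference, the Wald statistic is a quadratic form. The sketch below assumes the bootstrap replications have already been computed by re-running the full two-step procedure on resampled data; the function name `wald_statistic` and the scaling of the bootstrap covariance by $n$ are choices of this sketch rather than details given on the slides.

```python
import numpy as np

def wald_statistic(diff, boot_diffs, n):
    """W = n * diff' V_hat^{-1} diff.

    diff:       length-K(J-1) vector, m_hat^1 - m_hat^0.
    boot_diffs: (B, K(J-1)) array of bootstrap replications of diff.
    V_hat is taken to be n times the sample covariance of the bootstrap draws,
    so that it estimates the asymptotic variance of sqrt(n) * diff.
    """
    diff = np.asarray(diff, dtype=float)
    v_hat = n * np.atleast_2d(np.cov(np.asarray(boot_diffs, dtype=float), rowvar=False))
    return float(n * diff @ np.linalg.solve(v_hat, diff))
```

The resulting value would be compared with the $(1 - \alpha)$ quantile of the $\chi^2(K(J - 1))$ distribution, which can be obtained from any statistics library, for instance `scipy.stats.chi2.ppf(1 - alpha, df)`.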

The Mean Rank Similarity Test
Let $m_j^z = E[U \mid Z = z, X = x_j]$ for $z = 0, 1$.
$H_{0,mean}$: $m_j^0 = m_j^1$ for all $j = 1, \ldots, J - 1$.
Let $\{\tau_s\}_{s=1}^S$ be $S$ random draws from $U(0, 1)$. $\hat{U}_i \equiv T_i \hat{U}_{1i} + (1 - T_i) \hat{U}_{0i}$, for $i = 1, \ldots, n$, can be estimated by
  $\hat{U}_i = \dfrac{1}{S} \sum_{s=1}^S 1\left(T_i \hat{q}_{1|C}(\tau_s) + (1 - T_i) \hat{q}_{0|C}(\tau_s) \le Y_i\right)$.
$m_j^z$ can then be estimated by
  $\tilde{m}_j^z = \dfrac{1}{n_j^z} \sum_{Z_i = z,\, X_i = x_j} \hat{U}_i$.
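The rank estimate $\hat{U}_i$ is a Monte Carlo average over random quantile levels. A minimal sketch follows, assuming the estimated complier quantile functions are supplied as Python callables; the names `estimated_ranks`, `q0c_fn` and `q1c_fn` are hypothetical.

```python
import numpy as np

def estimated_ranks(y, t, q0c_fn, q1c_fn, n_draws=500, seed=0):
    """U_hat_i = (1/S) * sum_s 1(T_i q1c(tau_s) + (1 - T_i) q0c(tau_s) <= Y_i).

    q0c_fn, q1c_fn: callables returning the estimated complier quantiles at a level tau.
    n_draws:        S, the number of uniform draws tau_s.
    """
    y, t = np.asarray(y, dtype=float), np.asarray(t)
    taus = np.random.default_rng(seed).uniform(size=n_draws)
    q0 = np.array([q0c_fn(s) for s in taus])
    q1 = np.array([q1c_fn(s) for s in taus])
    # n x S matrix of thresholds: each row uses the quantiles of that unit's own arm
    thresh = np.where(t[:, None] == 1, q1[None, :], q0[None, :])
    return (thresh <= y[:, None]).mean(axis=1)
```

Averaging the returned values within each $(Z = z, X = x_j)$ cell then gives $\tilde{m}_j^z$.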

The Mean Rank Similarity Test
Corollary 3. Suppose Assumptions 1 and 3 hold for $\Omega = (0, 1)$. Under the null hypothesis where $m^1 = m^0$, when $S, n \to \infty$,
  $\sqrt{n}\left(\tilde{m}^1 - \tilde{m}^0\right) \Rightarrow N(0, V_{mean})$,
where $V_{mean}$ is the $(J - 1) \times (J - 1)$ asymptotic variance-covariance matrix. The $(j, j')$-th element of $V_{mean}$ is
  $E\left[\left(\int_0^1 \phi_j^1(\tau)\, d\tau - \int_0^1 \phi_j^0(\tau)\, d\tau\right)\left(\int_0^1 \phi_{j'}^1(\tau)\, d\tau - \int_0^1 \phi_{j'}^0(\tau)\, d\tau\right)\right]$,
where
  $\int_0^1 \phi_j^z(\tau)\, d\tau = \dfrac{U - m_j^z}{p_{Z,X}(z, x_j)} 1(Z = z, X = x_j)$
    $- \int_0^1 \dfrac{f_{Y|T,Z,X}(q_{0|C}(\tau) \mid 0, z, x_j)\left(1 - p_{T|Z,X}(z, x_j)\right)}{P_c f_{0|C}(q_{0|C}(\tau))} \psi_0(Y, T, Z, X)\, d\tau$
    $- \int_0^1 \dfrac{f_{Y|T,Z,X}(q_{1|C}(\tau) \mid 1, z, x_j)\, p_{T|Z,X}(z, x_j)}{P_c f_{1|C}(q_{1|C}(\tau))} \psi_1(Y, T, Z, X)\, d\tau$.
A Wald-type test statistic is then
  $W_{mean} \equiv n\left(\tilde{m}^1 - \tilde{m}^0\right)' \tilde{V}^{-1} \left(\tilde{m}^1 - \tilde{m}^0\right) \Rightarrow \chi^2(J - 1)$
as $N \to \infty$, $J \to \infty$, $N/J \to \infty$.

Simulation: DGPs
DGP:
  $Y_0 = X + V + S_0$,
  $Y_1 = X + V + (1 - bXV) + S_1$,
  $Y = Y_1 T + Y_0 (1 - T)$,
  $\Pr(X = 0.4j) = 1/5$ for $j = 1, \ldots, 5$, $V, S_0, S_1 \sim N(0, 1)$ and b = 0, 2.
Exogenous treatment: $\Pr(T = t) = 1/2$, $t = 0, 1$.
Endogenous treatment: $\Pr(Z = z) = 1/2$, $z = 0, 1$, and $T = 1\left(0.15(Y_1 - Y_0) + Z - 0.5 > 0\right)$.
Rank similarity holds when $b = 0$ but not when $b \ne 0$.
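The data-generating process can be simulated directly. The sketch below reads the outcome equation as $Y_1 = X + V + (1 - bXV) + S_1$ and the selection rule as $T = 1(0.15(Y_1 - Y_0) + Z - 0.5 > 0)$; the minus signs are an assumption, since they are not legible in the extracted slide text. The function name `simulate` and its return format are hypothetical.

```python
import numpy as np

def simulate(n, b, endogenous, seed=0):
    """Draw one sample of size n from the simulation design (signs assumed, see lead-in)."""
    rng = np.random.default_rng(seed)
    x = 0.4 * rng.integers(1, 6, size=n)        # Pr(X = 0.4 j) = 1/5 for j = 1, ..., 5
    v, s0, s1 = rng.normal(size=(3, n))         # V, S_0, S_1 ~ N(0, 1), independent
    y0 = x + v + s0
    y1 = x + v + (1 - b * x * v) + s1           # assumed reading of the Y_1 equation
    z = rng.integers(0, 2, size=n)              # Pr(Z = z) = 1/2
    if endogenous:
        t = (0.15 * (y1 - y0) + z - 0.5 > 0).astype(int)
    else:
        t = rng.integers(0, 2, size=n)          # Pr(T = t) = 1/2, independent of outcomes
    y = y1 * t + y0 * (1 - t)
    return {"y": y, "t": t, "z": z, "x": x, "y0": y0, "y1": y1}
```

With $b = 0$ the treated outcome shifts every unit by one plus independent noise, so the conditional rank distributions coincide across arms; with $b \ne 0$ the weight on $V$ varies with $X$, which moves ranks differently across covariate cells.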

Illustration of DGPs: Exogenous Treatment
[Figure: Conditional distributions of potential ranks. Four panels (b = 0, b = 1, b = 2, b = 3), each plotting the conditional CDF against the potential rank for Y(1) and Y(0) at X = 0.4, 0.8, 1.2, 1.6, 2.0.]

Simulation Results: Exogenous Treatment
                                    b = 0                                b = 1
N                                     500   1000   1500   2000   2500      500   1000   1500   2000   2500
Test 1: Ω = {0.5}                   0.034  0.039  0.051  0.040  0.053    0.047  0.059  0.101  0.120  0.146
Test 2: Ω = {0.2, 0.3, 0.4}         0.013  0.013  0.025  0.021  0.023    0.018  0.038  0.065  0.099  0.152
Test 3: Ω = {0.5, 0.6, 0.7, 0.8}    0.014  0.014  0.023  0.023  0.018    0.022  0.050  0.134  0.148  0.260
Test 4: Ω = {0.2, 0.3,..., 0.8}     0.006  0.010  0.013  0.013  0.013    0.009  0.044  0.102  0.144  0.283
Test 5: Mean Test                   0.051  0.044  0.048  0.041  0.067    0.063  0.092  0.144  0.176  0.251
                                    b = 2                                b = 3
N                                     500   1000   1500   2000   2500      500   1000   1500   2000   2500
Test 1: Ω = {0.5}                   0.074  0.150  0.232  0.303  0.388    0.143  0.335  0.512  0.640  0.800
Test 2: Ω = {0.2, 0.3, 0.4}         0.269  0.776  0.968  0.994  1.000    0.817  0.999  1.000  1.000  1.000
Test 3: Ω = {0.5, 0.6, 0.7, 0.8}    0.151  0.581  0.857  0.962  0.991    0.306  0.880  0.992  1.000  1.000
Test 4: Ω = {0.2, 0.3,..., 0.8}     0.287  0.910  0.996  1.000  1.000    0.836  0.999  1.000  1.000  1.000
Test 5: Mean Test                   0.103  0.213  0.278  0.424  0.500    0.340  0.659  0.853  0.941  0.971
[Figure, two panels: rejection rate against b (Sample Size = 1000) and rejection rate against sample size (b = 2), for Distributional Tests 1 to 4 and the Mean Test.]

Simulation Results: Endogenous Treatment
                                    b = 0                                b = 1
N                                     500   1000   1500   2000   2500      500   1000   1500   2000   2500
Ω = {0.5}                           0.025  0.036  0.041  0.038  0.057    0.036  0.040  0.064  0.073  0.085
Ω = {0.2, 0.3, 0.4}                 0.012  0.012  0.018  0.017  0.025    0.013  0.022  0.040  0.065  0.107
Ω = {0.5, 0.6, 0.7, 0.8}            0.006  0.013  0.016  0.022  0.015    0.006  0.016  0.034  0.047  0.074
Ω = {0.2, 0.3,..., 0.8}             0.002  0.010  0.006  0.010  0.008    0.003  0.015  0.029  0.057  0.094
Mean Test                           0.054  0.050  0.051  0.045  0.057    0.037  0.050  0.054  0.063  0.068
                                    b = 2                                b = 3
N                                     500   1000   1500   2000   2500      500   1000   1500   2000   2500
Ω = {0.5}                           0.084  0.242  0.379  0.522  0.615    0.113  0.293  0.441  0.617  0.700
Ω = {0.2, 0.3, 0.4}                 0.170  0.589  0.870  0.965  0.993    0.284  0.783  0.975  1.000  1.000
Ω = {0.5, 0.6, 0.7, 0.8}            0.021  0.150  0.340  0.600  0.764    0.020  0.198  0.450  0.704  0.865
Ω = {0.2, 0.3,..., 0.8}             0.053  0.431  0.823  0.960  0.993    0.093  0.634  0.949  1.000  1.000
Mean Test                           0.152  0.322  0.481  0.622  0.709    0.191  0.441  0.602  0.772  0.843
[Figure, two panels: rejection rate against b (Sample Size = 1000) and rejection rate against sample size (b = 2), for Distributional Tests 1 to 4 and the Mean Test.]

Star Testing: the Exogenous Treatment Case
Table: Star Project: Test Results
Treatment type                     Statistic    Total Test Score    Birthday (1st-31st)
Small Class vs. Regular Class      Test Stat    232                 21.23
                                   P-value      0.033               0.439
Aid Class vs. Regular Class        Test Stat    266                 13.25
                                   P-value      0.001               0.905
Conclusions:
Both treatments (small class and regular class with aid) improve the rank distribution of the disadvantaged (boy, nonwhite).
Assigning a teaching aid to the regular class systematically changes students' ranks. Researchers may want to reconsider the practice of using regular classes both with and without an aid as the control group in analysis.

Empirical Example: JTPA
$Y$ = 30-month earnings following assignment.
$T$ = receiving training services.
$Z$ = random assignment indicator.
$X$ = black, Hispanic, HS or GED, married, worked at least 13 weeks the year before, AFDC receipt (for women only) and 5 age category dummies (Abadie, Angrist and Imbens, 2002).

JTPA: First-Stage Unconditional LQTE Estimation
Table: First-stage estimates of unconditional QTEs of training on trainee earnings
            Female                              Male
Quantile    Y_0       QTE (SE)                  Y_0       QTE (SE)
0.15           195      291 (341.88)             1,462      249 (713.36)
0.20           723      714 (358.31)*            2,733      390 (723.01)
0.25         1,458    1,200 (372.08)***          4,434      489 (746.85)
0.30         2,463    1,380 (399.21)***          6,993      340 (891.74)
0.35         3,784    1,705 (497.01)***          8,836      594 (1,042.40)
0.40         5,271    1,974 (669.75)***         11,010      723 (1,104.63)
0.45         6,726    2,451 (766.25)***         13,104    1,069 (1,144.28)
0.50         8,685    2,436 (829.29)***         15,374    1,291 (1,234.59)
0.55        11,007    2,089 (877.56)**          17,357    2,239 (1,295.79)*
0.60        12,618    2,729 (886.96)***         20,409    2,118 (1,418.40)
0.65        14,682    2,943 (920.45)***         23,342    2,319 (1,557.00)
0.70        16,971    2,772 (1,027.14)***       27,169    1,780 (1,606.66)
0.75        20,252    2,106 (1,152.35)*         30,439    2,408 (1,641.47)
0.80        23,064    2,331 (1,149.71)**        34,620    2,800 (1,701.90)*
0.85        26,735    1,762 (1,179.91)          39,233    3,955 (1,886.98)**
Note: Standard errors are in parentheses. All estimates control for covariates including dummies for black, Hispanic, high-school graduates (including GED holders), marital status, whether the applicant worked at least 12 weeks in the 12 months preceding random assignment, and AFDC receipt (for women only), as well as 5 age group dummies. * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.

JTPA: Joint Test
Table: Rank similarity test jointly at all quantiles
           Female                                      Male
           I                    II                     I                    II
           (1)        (2)       (1)        (2)         (1)        (2)       (1)        (2)
Panel A: Dependent Var. Earnings
χ²         7,652.1    7,763.8   1,197.2    1,177.8     2,780.7    2,719.0   886.1      876.8
           (0.000)    (0.000)   (0.000)    (0.000)     (0.000)    (0.000)   (0.000)    (0.000)
d.f.       1,544      1,544     723        723         1,218      1,218     570        570
Panel B: Falsification test (Dependent Var. Age)
χ²         478.8      471.9     252.0      259.9       209.3      203.5     124.7      123.0
           (0.926)    (0.953)   (0.366)    (0.245)     (1.000)    (1.000)   (0.977)    (0.982)
d.f.       525        525       245        245         338        338       158        158
Note: Results are based on the Chi-squared test in Theorem 2. Variance-covariance matrices are bootstrapped with 2,000 replications. P-values are in parentheses. Columns I report a joint test at 15 equally spaced quantiles from 0.15 to 0.85; Columns II report a joint test at 7 equally spaced quantiles from 0.20 to 0.80. (1) controls for covariates in the first-stage unconditional QTE estimation, while (2) does not. X values with fewer than 5 observations when either Z = 0 or Z = 1 are not used in the test, to ensure the common support assumption.

JTPA: Test Individual Quantiles
Table: Rank similarity test at individual quantiles
            Panel A: Dependent Var. Earnings       Panel B: Falsification test (Dependent Var. Age)
            Female             Male                Female             Male
Quantile    χ² (p-value)       χ² (p-value)        χ² (p-value)       χ² (p-value)
0.15        134.4 (0.012)      103.8 (0.045)       43.9 (0.144)       19.4 (0.561)
0.20        143.0 (0.004)      113.3 (0.010)       37.9 (0.340)       22.1 (0.391)
0.25        126.2 (0.060)      107.8 (0.025)       26.0 (0.863)       13.9 (0.907)
0.30        131.9 (0.034)      104.7 (0.039)       26.9 (0.834)       15.0 (0.861)
0.35        147.2 (0.003)       95.8 (0.142)       22.1 (0.956)       17.9 (0.712)
0.40        118.3 (0.160)       88.6 (0.291)       31.1 (0.659)       23.2 (0.447)
0.45        107.5 (0.387)      110.7 (0.019)       32.1 (0.611)       22.4 (0.497)
0.50        110.9 (0.304)      113.6 (0.012)       32.3 (0.599)       19.2 (0.692)
0.55        112.6 (0.266)      110.9 (0.019)       30.8 (0.673)       19.6 (0.664)
0.60        112.1 (0.276)      112.3 (0.015)       32.7 (0.581)       22.3 (0.503)
0.65        121.7 (0.113)      105.0 (0.044)       29.4 (0.734)       18.4 (0.735)
0.70        108.0 (0.375)      106.1 (0.038)       36.7 (0.388)       24.0 (0.402)
0.75        130.4 (0.035)      109.7 (0.018)       45.4 (0.112)       16.5 (0.831)
0.80        118.4 (0.128)      116.5 (0.005)       47.7 (0.074)       17.1 (0.802)
0.85         92.3 (0.697)      118.7 (0.002)       44.7 (0.125)       18.7 (0.716)
Note: Results are based on the Chi-squared test in Theorem 2. Variance-covariance matrices are bootstrapped with 2,000 replications. P-values are in parentheses. Covariates are controlled for in the first-stage unconditional QTE estimation. X values with fewer than 5 observations when either Z = 1 or Z = 0 are not used in the test, to ensure the common support assumption.

JTPA: Test the Mean Rank
Table: Rank similarity test for the mean rank only
           Female                           Male
           (1)              (2)             (1)              (2)
Panel A: Dependent Var. Earnings
χ²         123.1 (0.098)    123.1 (0.098)   115.2 (0.009)    115.2 (0.009)
d.f.       104              104             82               82
Panel B: Falsification test (Dependent Var. Age)
χ²         30.6 (0.683)     30.6 (0.683)    18.4 (0.736)     18.4 (0.736)
d.f.       35               35              23               23
Note: Results are based on the Chi-squared test for the mean ranks only. Variance-covariance matrices are bootstrapped with 2,000 replications. P-values are in parentheses. (1) controls for covariates in the first-stage unconditional QTE estimation, while (2) does not. X values with fewer than 5 observations when either Z = 1 or Z = 0 are not used in the test, to ensure the common support assumption.
Conclusion:
Training causes some individuals to systematically change their ranks in the earnings distribution.
One should be cautious in equating the distributional impacts of training with the true effects on individual trainees.
Results largely agree with Heckman, Smith and Clements (1997): perfect positive dependence across potential outcome distributions... not credible.

Extension I: Covariates with Large Support
Assume that $J \to \infty$ as $n \to \infty$.
Assumption 4.
1. i.i.d. data: the data $\{Y_i, T_i, Z_i, X_i\}$ for $i = 1, \ldots, n$ is a random sample of size $n$ of $(Y, T, Z, X)$.
2. For all $\tau \in \Omega = \{\tau_1, \tau_2, \ldots, \tau_K\}$, the random variables $Y_1$ and $Y_0$ are continuously distributed with positive density in a neighborhood of $q_{0|C}(\tau)$ and $q_{1|C}(\tau)$ in the subpopulation of compliers.
3. Let $n_j = \sum_{i=1}^n 1(X_i = x_j)$. $n_j \asymp n/J$ uniformly over $j$, i.e., there exist $0 < c \le C < \infty$ such that $c\, n/J \le n_j \le C\, n/J$ for all $j = 1, \ldots, J$.
4. $\hat{\pi}(x_j)$ is uniformly consistent, or $\sup_{j = 1, \ldots, J} |\hat{\pi}(x_j) - \pi(x_j)| \to_p 0$ as $n \to \infty$, $J \to \infty$ and $n/J \to \infty$.
5. For all $t, z = 0, 1$, $j = 1, \ldots, J$ and $\tau \in \Omega$, $f_{Y|T,Z,X}(\cdot \mid t, z, x_j)$ is bounded in a neighborhood of $q_{t|C}(\tau)$. For all $\tau \in \Omega$ and $j = 1, \ldots, J$, $f_{Y|X}(\cdot \mid x_j)$ is positive and bounded in a neighborhood of $q_{t|C}(\tau)$.

Extension I: Covariates with Large Support
Let $\hat{m}_j^z = \left(\hat{m}_j^z(\tau_1), \ldots, \hat{m}_j^z(\tau_K)\right)'$ and $m_j^z = \left(m_j^z(\tau_1), \ldots, m_j^z(\tau_K)\right)'$ be $K \times 1$ vectors.
Corollary 4. Given Assumptions 1 and 4, we have
  $\sqrt{\dfrac{n_j^1 n_j^0}{n_j^1 + n_j^0}} \left(\hat{m}_j^1 - \hat{m}_j^0 - \left(m_j^1 - m_j^0\right)\right) \Rightarrow Z_j \sim N(0, V_j)$,
where $Z_j$ for $j = 1, \ldots, J$ follow independent multivariate normal distributions; the $(k, k')$-th element of the $K \times K$ variance-covariance matrix $V_j$ is
  $V_{j;k,k'} = \pi(x_j)\, m_j^1(\tau_k \wedge \tau_{k'}) \left(1 - m_j^1(\tau_k \vee \tau_{k'})\right) + (1 - \pi(x_j))\, m_j^0(\tau_k \wedge \tau_{k'}) \left(1 - m_j^0(\tau_k \vee \tau_{k'})\right)$.

Extension I: Covariates with Large Support
For each $j = 1, \ldots, J$, define the Wald-type statistic
  $w_j = \dfrac{n_j^1 n_j^0}{n_j^1 + n_j^0} \left(\hat{m}_j^1 - \hat{m}_j^0\right)' \hat{V}_j^{-1} \left(\hat{m}_j^1 - \hat{m}_j^0\right)$,
where $\hat{V}_j$ is a consistent estimator of $V_j$. The $(k, k')$-th element of $\hat{V}_j$ is
  $\hat{V}_{j;k,k'} = \dfrac{n_j^0}{n_j^0 + n_j^1}\, \hat{m}_j^1(\tau_k \wedge \tau_{k'}) \left(1 - \hat{m}_j^1(\tau_k \vee \tau_{k'})\right) + \dfrac{n_j^1}{n_j^0 + n_j^1}\, \hat{m}_j^0(\tau_k \wedge \tau_{k'}) \left(1 - \hat{m}_j^0(\tau_k \vee \tau_{k'})\right)$.
The test statistic is then
  $W_{largeJ} = \dfrac{\sum_{j=1}^{J-1} w_j - K(J - 1)}{\sqrt{2K(J - 1)}} \Rightarrow N(0, 1)$.
The one-sided decision rule of the test is to reject the null hypothesis $H_0$ if $W_{largeJ} > c_\alpha$.
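Once the cell-level statistics $w_j$ are available, the large-$J$ statistic is just their centred and scaled sum. A short sketch, with the hypothetical function name `large_j_statistic`, taking the $J - 1$ values of $w_j$ and the number of quantiles $K$ as inputs:

```python
import numpy as np

def large_j_statistic(w, k):
    """(sum_j w_j - K(J-1)) / sqrt(2 K (J-1)), where w holds the J-1 cell statistics."""
    w = np.asarray(w, dtype=float)
    dof = k * len(w)                      # K(J - 1): sum of the per-cell chi-square d.f.
    return float((w.sum() - dof) / np.sqrt(2 * dof))
```

The centring and scaling use the mean $K$ and variance $2K$ of a $\chi^2(K)$ variable for each cell, which is why the sum is approximately standard normal when the number of cells is large.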

Extension II: Continuous Covariates
Let $m_k^z(x) = E[I(\tau_k) \mid Z = z, X = x]$ for $z = 0, 1$.
We are interested in testing $H_0$: $m_k^1(x) = m_k^0(x)$ for all $x \in \mathcal{X}$ and $k = 1, \ldots, K$.
Apply Chernozhukov, Lee and Rosen (2013) and form a Kolmogorov-Smirnov type test statistic:
  $KS = \sup_{k,x} \dfrac{\left|\hat{m}_k^1(x) - \hat{m}_k^0(x)\right|}{s_k(x)}$,
where $\hat{m}_k^z(x)$ is a local linear estimator and $s_k(x)$ is the standard error of $\hat{m}_k^1(x) - \hat{m}_k^0(x)$.
Construct the critical value $c_\alpha$ by the multiplier bootstrap. Let $\hat{m}_k^*(x)$ be a multiplier process such that
  $\hat{m}_k^*(x) = \dfrac{\sum_{Z_i = 1} \eta_i \hat{\epsilon}_{k,i} K_{h_1}(X_i - x)}{\sum_{Z_i = 1} K_{h_1}(X_i - x)} - \dfrac{\sum_{Z_i = 0} \eta_i \hat{\epsilon}_{k,i} K_{h_0}(X_i - x)}{\sum_{Z_i = 0} K_{h_0}(X_i - x)}$,
where $\{\eta_i\}_{i=1}^N$ is simulated from i.i.d. $N(0, 1)$ and independent of the data, and
  $\hat{\epsilon}_{k,i} = 1\left(Y_i \le \hat{q}_{1|C}(\tau_k) T_i + \hat{q}_{0|C}(\tau_k)(1 - T_i)\right) - \hat{m}_k^1(X_i) Z_i - \hat{m}_k^0(X_i)(1 - Z_i)$.
$c_\alpha$ is the $(1 - \alpha) \times 100\%$ percentile of the simulated process $\sup_{k,x} \dfrac{\left|\hat{m}_k^*(x)\right|}{s_k(x)}$.
Reject the null if $KS > c_\alpha$.

Extension III: Testing for Conditional Rank Invariance or Similarity
Two main modifications are required:
1. First, estimate conditional quantiles conditional on some covariates $X_1$ of interest.
2. Second, use additional covariates $X_2$, other than the conditioning covariates in the first step, to perform the test.
Feasible only when the conditioning set for the conditional quantiles is small.
E.g., we estimate quantiles of potential earnings, and perform tests, separately for male and female trainees, so the tests are essentially rank similarity tests for conditional ranks conditional on gender.

Conclusion
The paper proposes nonparametric tests for rank invariance or similarity, assumptions that are popular in program evaluation and in various QTE models.
The tests explore whether the distribution (or features of it) of potential ranks remains the same among observationally equivalent individuals.
Simulations show good size and power of the proposed tests in small samples.
Empirical application to the JTPA training program:
  Training causes some individuals to systematically change their ranks in the distribution of earnings.
  Program effects are more complicated than suggested by standard QTEs.
  One should be cautious in equating program impacts on the distribution of earnings with those on individual trainees.

(Incomplete) Reference List
QTE models: Abadie, Angrist and Imbens (2002), Chesher (2003, 2005), Chernozhukov and Hansen (2005, 2006, 2008), Firpo (2007), Firpo, Fortin and Lemieux (2007), Chernozhukov, Imbens and Newey (2007), Horowitz and Lee (2007), Imbens and Newey (2009), Rothe (2010), Frolich and Melly (2013), Powell (2013), Yu (2014), etc.
Other works in rank invariance/similarity testing:
  Frandsen and Lefgren (2015): a parametric test for rank similarity, testing the equality of mean ranks with or without treatment.
  Yu (2015): a test for rank invariance, assuming unconfoundedness.
JTPA: Abadie, Angrist and Imbens (2002), Chernozhukov and Hansen (2008), Orr et al. (1996), Heckman, Smith, and Clements (1997), etc.
Star: Krueger (1999), Krueger and Whitmore (2001), Chetty et al. (2010), Jackson and Page (2013), etc.

Thank you!