Distributional Tests for Regression Discontinuity: Theory and Empirical Examples



Shu Shen, University of California, Davis
Xiaohan Zhang, University of California, Davis

January 21, 2014

Abstract

This paper proposes consistent testing methods that can be used to examine the effect of a policy treatment on the whole distribution of response outcomes within the setting of a regression discontinuity design. Such methods are useful when economists or policy makers expect heterogeneous treatment effects resulting from unobservables. The test statistics are based on local linear estimators of distributional treatment effects and local distributional treatment effects of compliers. They are constructed using Kolmogorov-Smirnov-type test statistics and are asymptotically distribution-free when the data are i.i.d. The proposed tests are applied to an Italian dataset to study households' consumption behavior as household members reach retirement age. It is shown that the distributional tests are useful complements to the classic mean regression discontinuity methods, as they potentially offer more information than mean analyses.

1 Introduction

Regression discontinuity has gained increasing popularity in applied economics over the past 15 years because it provides credible and straightforward identification of the causal effects of policies. The identification strategy exploits the fact that, in many policy interventions, the probability that an individual receives a policy treatment changes discontinuously with one or more underlying variables. [Footnote 1: For example, van der Klaauw (1997) studies the effect of financial aid on students' college choices using the administrative rule that partially links the amount of financial aid students receive to their GPA and SAT scores, and Angrist and Lavy (1999) study the effect of class size on students' test scores using the policy that adds an additional classroom if the average class size exceeds a threshold.] Researchers compare the response outcomes above and below the threshold value of those underlying variables to identify the effects of the policy interventions. Theoretically, such an identification strategy identifies not only the mean treatment effects but also the distributional effects of policies (cf. Imbens and Lemieux, 2008). Nonetheless, with the exception of a recent innovative study by Frandsen, Frolich, and Melly (2012), almost all previous regression discontinuity studies in economics have focused merely on the identification and estimation of mean effects. In this paper, we propose uniform tests for the analysis of policy effects on the whole outcome distribution. The proposed tests, as far as the authors know, are the first set of consistent tests for distributional regression discontinuity analysis. They are easy to implement and are useful complements to classical mean regression discontinuity methods.

Since the introduction and development of quantile regression techniques pioneered by Koenker (cf. Koenker, 2005), economists have been aware that mean regression techniques provide limited information about the effect of covariates on the response outcome. Moreover, since economic theories often predict systematic heterogeneity in the impact of policy treatments, distributional analysis may be necessary to capture the range of expected effects. For example, Bitler, Gelbach, and Hoynes (2008) find substantial heterogeneity in the effect of Connecticut's Jobs First waiver program, a result that is in line with predictions

from labor supply theory. In regression discontinuity designs, the policy effects identified through mean analysis could also miss a lot of information.

Frandsen, Frolich, and Melly (2012) is the first paper to look at distributional policy effects within the regression discontinuity set-up. They propose a nonparametric estimator for local quantile treatment effects and apply the estimator to a universal pre-k program in Oklahoma. They find that participation in a pre-k program significantly raises the lower end and the middle of the test score distribution. The contribution of this paper is to propose consistent tests for distributional partial effects, which are not discussed in Frandsen, Frolich, and Melly (2012). These tests allow researchers to better understand whether an identified positive or negative average treatment effect is unambiguous along the whole outcome distribution given a pre-determined confidence level, or whether an insignificant mean effect can be generalized to the whole distribution.

Both sharp and fuzzy regression discontinuity designs are considered. The test statistics are constructed as Kolmogorov-Smirnov type functionals of the estimated change in conditional outcome distributions at the threshold value of the running variable that influences policy implementation. The tests are easy to implement. When the data are i.i.d., the tests are distribution-free, meaning that the critical values and p-values of the tests can be easily tabulated. When the data are stratified, as in our empirical example, the limiting distributions of the test statistics are more complicated. For such cases we propose a simple simulation method for calculating critical values and p-values. The proposed tests also produce, as byproducts, confidence regions for the part of the conditional outcome distribution on which the policy has a positive or negative effect.

Besides the uniform tests, we also propose estimators for the treatment effect on the outcome distribution. These distributional estimators are dual estimators of the quantile treatment effect estimators proposed in Frandsen, Frolich, and Melly (2012).

Our empirical exercise shows that the proposed distributional regression discontinuity tests are useful complements to the classic mean regression discontinuity method. In particular,

they may reveal interesting new information about the local treatment effect that is neglected by the mean analysis. Using an Italian dataset, we examine the expenditure of households whose head is close to the retirement age. For the full sample, we see a one-time drop in total household expenditure in both the mean and the distributional regression discontinuity analysis. But for some subsamples the expenditure effect of a household head's retirement is much more heterogeneous. For example, the (conditional) expenditure distribution of households in which the wife is ineligible for a pension when her husband retires (conditional on the husband reaching pension eligibility age) shifts to the right for the bottom three quintiles and to the left for the top two. This suggests that the upper-middle class might cut their spending when the household head retires, while the lower-middle class and less well-off households increase their spending. Further analysis shows that the heterogeneous pattern in consumption changes at retirement is driven by non-food non-durable expenditures and is only present for households living in small municipalities.

The dataset used in the empirical exercise, the Italy Survey on Household Income and Wealth (SHIW), was previously analyzed in Battistin, Brugiavini, Rettore, and Weber (2009). Since the SHIW follows stratified sampling, we propose extensions of the distributional regression discontinuity tests that are consistent with such a data structure. The results in the empirical section show the importance of proper weighting in both mean and distributional regression discontinuity analysis.

The remainder of the paper is organized as follows. Section 2 motivates distributional analysis under regression discontinuity and provides identification of the distributional treatment effect and the local distributional treatment effect. Section 3 proposes consistent tests for the distributional treatment effects based on local linear estimators of the conditional outcome distributions. Section 3 also proposes a rule-of-thumb bandwidth choice method and discusses tests under stratified sampling. Section 4 applies the proposed method to a study of the retirement consumption puzzle. Section 5 concludes.

2 Identification of Distributional Treatment Effects

Let $Y_i$ denote the outcome of interest and $T_i$ the dummy variable indicating treatment, with $T_i = 1$ if individual $i$ is treated. Let $Y_i(0)$ and $Y_i(1)$ denote the potential outcomes when $T_i = 0$ and $T_i = 1$. Whether individual $i$ receives treatment depends at least partially on the running variable $X_i$. [Footnote 2: In this paper, we assume that $X_i$ is single dimensional, as in most regression discontinuity set-ups. See Imbens and Zajonc (2011) for a pioneering study with a multidimensional $X_i$ variable.] A policy intervention encourages individual $i$ to receive treatment if the running variable $X_i$ is at least as large as the threshold $c$. Let $Z_i = 1(X_i \ge c)$ be the dummy variable that indicates whether individual $i$ is encouraged, and let $T_i(1)$ and $T_i(0)$ be the treatment individual $i$ would receive if encouraged or not.

The following assumptions are required for the identification of the treatment effect of policy interventions. They imply the continuity of compliance (see Imbens and Zajonc (2011)), the absence of defiers in the sample and the non-trivial presence of compliers.

Assumption 2.1.
1. $E[T_i(1) \mid X_i = x]$ and $E[T_i(0) \mid X_i = x]$ are continuous at $x = c$;
2. $T_i(1) \ge T_i(0)$;
3. $E[T_i(1) \mid X_i = c] > E[T_i(0) \mid X_i = c]$.

Since we assume that there are no defiers in the sample, individuals are categorized into three groups: always takers, compliers and never takers. If the treatment decision $T_i$ is a deterministic function of $X_i$, the data sample includes compliers only and is said to follow a sharp regression discontinuity design. If $T_i$ is a probabilistic function of $X_i$, the data sample follows a fuzzy regression discontinuity design.

In the classic mean regression discontinuity design, researchers further assume that the averages of the potential outcomes are continuous at the threshold value $X_i = c$.

Assumption 2.2. $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ are continuous at $x = c$.
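The grouping into always takers, compliers and never takers can be made concrete with a small Python sketch. It is purely illustrative: potential treatments $T_i(0)$ and $T_i(1)$ are never jointly observed in real data, so the inputs below are assumed to come from a simulation, and the function names `compliance_types` and `is_sharp_design` are hypothetical helpers, not part of the paper.

```python
import numpy as np


def compliance_types(t0, t1):
    """Label units by their potential treatments (T(0), T(1)).

    t0, t1: arrays of 0/1 potential treatments without / with encouragement.
    Raises if the no-defier condition T(1) >= T(0) fails for any unit.
    """
    t0 = np.asarray(t0, dtype=int)
    t1 = np.asarray(t1, dtype=int)
    if np.any(t1 < t0):
        raise ValueError("defiers present: T(1) >= T(0) is violated")
    labels = np.empty(t0.shape, dtype=object)
    labels[(t0 == 1) & (t1 == 1)] = "always-taker"
    labels[(t0 == 0) & (t1 == 0)] = "never-taker"
    labels[(t0 == 0) & (t1 == 1)] = "complier"
    return labels


def is_sharp_design(t0, t1):
    """A sharp design has compliers only: T(0) = 0 and T(1) = 1 for everyone."""
    return bool(np.all(compliance_types(t0, t1) == "complier"))


if __name__ == "__main__":
    # Simulated potential treatments for three units (illustrative values only).
    print(compliance_types([0, 0, 1], [0, 1, 1]))
    print(is_sharp_design([0, 0], [1, 1]))
```

In this toy representation, the third condition of Assumption 2.1 corresponds to the complier share being strictly positive near the threshold.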

Given Assumptions 2.1 and 2.2, the average treatment effect (ATE) of the policy intervention at $X_i = c$ is identified as
\[ E[Y_i(1) - Y_i(0) \mid X_i = c] = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x] \]
in the sharp regression discontinuity design. It is equal to the local average treatment effect for compliers (LATE) since there are only compliers in this case. If receiving treatment depends only partially on the running variable $X_i$, that is, if the regression discontinuity has a fuzzy design, only the LATE evaluated at the threshold level $x = c$ is identified. It is equal to
\[ E[Y_i(1) - Y_i(0) \mid T_i(1) - T_i(0) = 1, X_i = c] = \frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[T_i \mid X_i = x] - \lim_{x \uparrow c} E[T_i \mid X_i = x]}. \]

However, the ATE/LATE only identifies one aspect of the policy's impact on the outcome distribution. Suppose the true effect of policy $T_i$ is characterized by the following oversimplified model with a heterogeneous error term:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 T_i + \sigma(T_i) u_i, \qquad T_i = 1(X_i \ge c). \]
Based on the model, the effect of the policy depends on the unobserved $u_i$ term and is therefore heterogeneous across the population. In such cases the ATE/LATE generally fails to provide a full picture of the treatment effect. Figure 1 shows a hypothetical example where a policy intervention has opposite effects on the lower and upper parts of the outcome distribution but zero mean effect.

To investigate the distributional effects of the policy intervention on response outcomes, it is natural to extend the mean regression discontinuity concept discussed above to look at the outcome distributions below and above the critical value of the running variable. The following assumption strengthens the continuity requirement in Assumption 2.2 and

assumes that the conditional distributions of the potential outcomes, defined as $F_{Y_i(0) \mid X_i = x}$ and $F_{Y_i(1) \mid X_i = x}$, are continuous at the threshold level of the running variable.

Assumption 2.3. $F_{Y_i(0) \mid X_i = x}$ and $F_{Y_i(1) \mid X_i = x}$ are continuous at $x = c$.

With Assumptions 2.1 and 2.3, the effect of the policy intervention $T_i$ on the outcome distribution, or the distributional treatment effect (DTE), at $X_i = c$ is identified when the regression discontinuity design is sharp. It is also equal to the local distributional treatment effect (LDTE) for the compliers. Let $\delta_c$ denote the LDTE. Then
\[ \delta_c = F_{Y_i(1) \mid X_i = c} - F_{Y_i(0) \mid X_i = c} = \lim_{x \downarrow c} F_{Y_i \mid X_i = x} - \lim_{x \uparrow c} F_{Y_i \mid X_i = x} \]
under the sharp regression discontinuity design. When the regression discontinuity design is fuzzy, only the LDTE of the policy intervention for compliers evaluated at $X_i = c$ is identified:
\[ \delta_c = F_{Y_i(1) \mid T_i(1) - T_i(0) = 1, X_i = c} - F_{Y_i(0) \mid T_i(1) - T_i(0) = 1, X_i = c} = \frac{\lim_{x \downarrow c} F_{Y_i \mid X_i = x} - \lim_{x \uparrow c} F_{Y_i \mid X_i = x}}{\lim_{x \downarrow c} E[T_i \mid X_i = x] - \lim_{x \uparrow c} E[T_i \mid X_i = x]}. \]
The identification follows from the reasoning in Imbens and Angrist (1994) and Imbens and Zajonc (2011). The proof is given in the Appendix.

3 Tests for Distributional Regression Discontinuity

Null Hypothesis and Test Statistics

We are interested in testing whether a policy intervention unambiguously improves the whole conditional outcome distribution evaluated at the threshold value of the running variable, as well as whether it has any effect on the conditional outcome distribution at all. Recall that the LDTE for compliers is identified in both the sharp and the fuzzy regression discontinuity

designs and is denoted by $\delta_c(\cdot)$. The null hypotheses of interest are
\[ H_0^1: \delta_c(\cdot) \le 0; \qquad H_0^2: \delta_c(\cdot) = 0. \]
The LDTE $\delta_c(\cdot)$ has different formulas under sharp and fuzzy regression discontinuity designs, as is shown in Section 2. In both cases, its sign depends only on the difference between the conditional outcome distributions approaching from below and above the threshold value of the running variable. Therefore, $H_0^1$ and $H_0^2$ are equivalent to
\[ H_0^1: \lim_{x \downarrow c} F_{Y_i \mid X_i = x}(\cdot) \le \lim_{x \uparrow c} F_{Y_i \mid X_i = x}(\cdot) \quad \text{and} \quad H_0^2: \lim_{x \downarrow c} F_{Y_i \mid X_i = x}(\cdot) = \lim_{x \uparrow c} F_{Y_i \mid X_i = x}(\cdot), \]
respectively, regardless of the type of the regression discontinuity design.

For simplicity we abuse the notation a bit and let $F_l \equiv \lim_{x \uparrow c} F_{Y_i \mid X_i = x}$, $F_r \equiv \lim_{x \downarrow c} F_{Y_i \mid X_i = x}$, and let $\hat F_l$, $\hat F_r$ be their respective local linear estimators. We construct the test statistics using a Kolmogorov-Smirnov type of functional of the difference between $\hat F_l$ and $\hat F_r$. Let $\hat S_1$ and $\hat S_2$ be the test statistics for $H_0^1$ and $H_0^2$ respectively. Define
\[ \hat S_1 = A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \sup_y \left( \hat F_r(y) - \hat F_l(y) \right); \qquad \hat S_2 = A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \sup_y \left| \hat F_r(y) - \hat F_l(y) \right|, \]
where $A = \left( \int K^2(u)\,du \right)^{-1/2}$ is a constant that depends on the kernel function, $n$ is the sample size, and $\hat f$ is the local linear density estimator of the running variable $X_i$ evaluated at the threshold value $c$, both for the full sample. The convergence rate of the test statistic involves $n$ and $\hat f$ for the full sample instead of subsamples divided by the cut-off value $c$ because
\[ n f(c) = \frac{n_l}{P(X_i < c)}\, f_l(c)\, P(X_i < c) = n_l f_l(c), \]
and similarly $n f(c) = n_r f_r(c)$,

where $f_l(\cdot) = f(\cdot \mid X_i < c)$ and $f_r(\cdot) = f(\cdot \mid X_i \ge c)$ are the densities for the corresponding subsamples. Before we discuss the asymptotic behavior of the proposed test statistics, we define the local linear estimators $\hat F_l$ and $\hat F_r$ and investigate their asymptotics.

Conditional Empirical Distribution and Its Inference

Suppose the data $\{X_i, T_i, Y_i\}_{i=1}^n$ are i.i.d. For kernel function $K$ and bandwidth $h$, the local linear estimator $\hat F_l(y)$ for the control group evaluated at any fixed value $y$ is the intercept term $a$ for the coefficients $a$ and $b$ that minimize
\[ \sum_{1 \le i \le n,\; X_i < c} \left[ 1(Y_i \le y) - a - b(X_i - c) \right]^2 K\!\left( \frac{X_i - c}{h} \right). \]
The local linear estimator $\hat F_r$ is defined correspondingly for the treatment group with $X_i \ge c$, with the same kernel function $K$ and bandwidth $h$. [Footnote 3: Theoretically, different bandwidths could be used for the treatment and the control group. But as we will discuss in the next subsection, we propose to use a rule-of-thumb bandwidth based on Imbens and Kalyanaraman (2011), which requires the bandwidth of the two local linear estimations to be the same.]

To obtain the asymptotic properties of $\hat F_l$ and $\hat F_r$, we need the following smoothness assumptions and assumptions on the kernel and bandwidths.

Assumption 3.1.
1. The observations $\{(X_i, T_i, Y_i)\}_{i=1}^n$ are independent and identically distributed.
2. Let $f(y, x)$ be the joint probability density function of $(Y_i, X_i)$ and $g(y, x) = \int 1(u \le y) f(u, x)\,du$. $g(y, x)$ is uniformly bounded up until the third derivatives everywhere in an open neighborhood of $c$ except at $c$. Let $g_r^{(k)}(y, c)$ and $g_l^{(k)}(y, c)$ be the right and left limits of the $k$-th derivative of $g(y, x)$ with respect to $x$ evaluated at the threshold value $x = c$.
3. Let $f(x)$ be the probability density function of $X_i$, with $f(c) > 0$. $f(x)$ is continuously differentiable at $c$.
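The local linear estimators $\hat F_l(y)$ and $\hat F_r(y)$ defined above amount to a kernel-weighted least squares regression of the indicator $1(Y_i \le y)$ on $X_i - c$, using observations on one side of the threshold. The following Python sketch shows one way this could be coded with NumPy and a triangular kernel. It is a minimal illustration under those assumptions, not the authors' implementation, and the names `triangular_kernel` and `local_linear_cdf` are hypothetical.

```python
import numpy as np


def triangular_kernel(u):
    """Triangular kernel K(u) = max(1 - |u|, 0), supported on [-1, 1]."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def local_linear_cdf(x, y, c, h, y0, side):
    """Local linear estimate of F(y0 | X = c) from one side of the cut-off.

    side="left" uses observations with X < c, side="right" those with X >= c.
    The estimate is the intercept of a weighted least squares fit of
    1(Y <= y0) on (X - c) with weights K((X - c) / h).
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = x < c if side == "left" else x >= c
    xc = x[mask] - c
    w = triangular_kernel(xc / h)
    keep = w > 0  # only observations inside the bandwidth contribute
    xc, w = xc[keep], w[keep]
    indicator = (y[mask][keep] <= y0).astype(float)
    design = np.column_stack([np.ones_like(xc), xc])
    sw = np.sqrt(w)  # weighted LS via rescaling rows by sqrt(weight)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], indicator * sw, rcond=None)
    return coef[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=20000)
    y = rng.uniform(0.0, 1.0, size=20000)  # Y independent of X, so F(0.5 | c) = 0.5
    print(local_linear_cdf(x, y, c=0.0, h=0.5, y0=0.5, side="left"))
    print(local_linear_cdf(x, y, c=0.0, h=0.5, y0=0.5, side="right"))
```

Because the fit includes an intercept, the estimate equals one whenever every outcome inside the window is below `y0` and zero whenever none is. For intermediate values, a local linear fit is not constrained to lie in $[0, 1]$ or to be monotone in `y0`.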

Assumption 3.2.
1. The kernel function $K$ is nonnegative, symmetric, bounded and has compact support.
2. The bandwidth $h$ satisfies $nh \to \infty$ and $nh^5 \to 0$ as $n \to \infty$.

In the empirical section of the paper we use a triangular kernel, which is shown by Cheng, Fan, and Marron (1996) to have optimal properties for boundary estimation problems. The bandwidth is chosen based on Imbens and Kalyanaraman (2011). Details are discussed later.

The following lemma describes the asymptotic properties of the local linear based conditional empirical functions.

Lemma 1. Under Assumptions 3.1 and 3.2, we have that
\[ A \left( n h \hat f \right)^{1/2} \left[ \hat F_l(\cdot) - F_l(\cdot) \right] \Rightarrow B\left( F_l(\cdot) \right), \]
where $B$ is a copy of the standard Brownian bridge. A similar convergence result also holds for the estimator $\hat F_r(\cdot)$. $A = (3/2)^{1/2}$ for the triangular kernel function.

Let $n_l$ and $n_r$ be the sizes of the subsamples with $X_i < c$ and $X_i \ge c$. Horvath and Yandell (1988) study the asymptotic properties of a Nadaraya-Watson based conditional empirical distribution and show a weak convergence result. Because of the boundary concerns in regression discontinuity, we estimate the conditional distributions using local linear estimators. We show in the appendix that the local linear based conditional empirical distribution has the same limiting distribution as the Nadaraya-Watson based estimator considered by Horvath and Yandell (1988).

Asymptotic Properties of the Tests

Next we investigate the asymptotic distributions of the test statistics. The scaled difference of the conditional empirical distributions approached from above and below the threshold

value of the running variable follows
\[ \begin{aligned} A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \left[ (\hat F_r - \hat F_l) - (F_r - F_l) \right] &= A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \left[ \hat F_r - F_r \right] - A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \left[ \hat F_l - F_l \right] \\ &\Rightarrow \left( \tfrac{1}{2} \right)^{1/2} B\left( F_r(\cdot) \right) - \left( \tfrac{1}{2} \right)^{1/2} B'\left( F_l(\cdot) \right) \\ &\overset{D}{=} B\left( F_r(\cdot) \right) \quad \text{if } F_r = F_l, \end{aligned} \]
where $B$ and $B'$ denote independent copies of the standard Brownian bridge. The weak convergence result comes from Lemma 1, while the last equality in distribution follows from the fact that the Brownian bridge is a Gaussian process and that the data are i.i.d. Under $H_0^2$ and the least favorable condition of $H_0^1$, $F_r - F_l = 0$. Applying the continuous mapping theorem to the above weak convergence result then gives the asymptotic distributions of the test statistics $\hat S_1$ and $\hat S_2$ under the null and the least favorable condition of the null.

For both tests, one should reject the null hypothesis if the test statistic is larger than the critical value. Use $d_1$ and $d_2$ to denote the critical values of the tests for $H_0^1$ and $H_0^2$ respectively. The following proposition characterizes the properties of the two uniform tests.

Proposition 3.1. Given Assumptions 3.1 and 3.2 and that $d_1$, $d_2$ are positive finite constants, we have:
1. If $H_0^1$ is true, $\lim_{n \to \infty} P(\text{reject } H_0^1) \le P\left( \sup_{t \in [0,1]} B(t) > d_1 \right)$, with equality if the equality in the null holds; if $H_0^1$ is false, $\lim_{n \to \infty} P(\text{reject } H_0^1) = 1$.
2. If $H_0^2$ is true, $\lim_{n \to \infty} P(\text{reject } H_0^2) = P\left( \sup_{t \in [0,1]} |B(t)| > d_2 \right)$; if $H_0^2$ is false, $\lim_{n \to \infty} P(\text{reject } H_0^2) = 1$.

The changed-time Brownian bridge in Lemma 1 is replaced with a standard Brownian bridge in the proposition since the conditional $F_l$ (and $F_r$) ranges from 0 to 1 and is assumed

to be continuous. The detailed proof is the same as the proof for Propositions 3.1 and 3.2 in Shen (2013) and is omitted in this paper.

The proposition indicates that both tests are consistent. The probability of rejecting $H_0^1$ when the null is true is never larger than the probability of the supremum of a standard Brownian bridge exceeding the critical value. The probability of rejecting $H_0^2$ when the null is true is equal to the probability of the supremum of the absolute value of a standard Brownian bridge exceeding the critical value. Moreover, the proposition shows that both tests have distribution-free critical values that can be easily tabulated. For example, the critical value for testing $H_0^1$ is approximately 1.073 for the 10% significance level, 1.224 for the 5% and 1.517 for the 1%; the critical value for testing $H_0^2$ is approximately 1.224 for the 10% significance level, 1.358 for the 5% and 1.628 for the 1%.

Interpreting Testing Results

How could the testing results be used to complement the mean RD results? The results for the overall significance test $H_0^2$ are straightforward to interpret. For the uniform sign tests, first note that a third null hypothesis $H_0^3: \delta_c(\cdot) \ge 0$ can be easily tested by multiplying the function $\delta_c(\cdot)$ by $-1$ and transforming the null to $H_0^1$. Then any results from testing $H_0^1$ and $H_0^3$ can be categorized in one of the following four ways:
1) Fail to reject $\delta_c(\cdot) \le 0$; fail to reject $\delta_c(\cdot) \ge 0$;
2) Reject $\delta_c(\cdot) \le 0$; reject $\delta_c(\cdot) \ge 0$;
3) Fail to reject $\delta_c(\cdot) \le 0$; reject $\delta_c(\cdot) \ge 0$;
4) Reject $\delta_c(\cdot) \le 0$; fail to reject $\delta_c(\cdot) \ge 0$.

Let $\alpha$ be the significance level of the uniform sign tests. The first category is equivalent to failing to reject $H_0^2$ with significance level $2\alpha$. Results in this category imply that the policy has no (local) effect throughout the whole outcome distribution (conditional on $X_i = c$). Results in the second category imply that the policy not only has an effect on the response outcome but also has opposite effects on different parts of the outcome distribution

(conditional on $X_i = c$). Findings in the third category imply that the policy has an unambiguously non-negative local treatment effect on the whole outcome distribution (conditional on $X_i = c$) and a significantly positive local treatment effect on some part of the outcome distribution. In short, the third category implies that the distributional analysis supports a positive policy effect. The fourth category, following a similar argument, supports a negative effect.

Besides the testing results, researchers are sometimes also interested in knowing which part of the outcome distribution leads to a rejection of, say, a negative distributional partial effect, or in other words, the part of the outcome distribution on which the policy has a positive local treatment effect. Let $\mathcal{Y}$ characterize such a set in the population; then $\mathcal{Y} \equiv \{ y : \delta_c(y) < 0 \}$. The true set $\mathcal{Y}$ is unknown. However, the calculation of the test statistic $\hat S_1$ also reveals information that can be used to construct a confidence region for the set. Let $\hat{\mathcal{Y}}$ and $\tilde{\mathcal{Y}}$ be confidence regions that satisfy, respectively,
\[ \liminf_{n \to \infty} P\left( \hat{\mathcal{Y}} \subseteq \mathcal{Y} \right) \ge 1 - \alpha, \qquad \liminf_{n \to \infty} P\left( \tilde{\mathcal{Y}} \supseteq \mathcal{Y} \right) \ge 1 - \alpha. \]
The estimated set $\hat{\mathcal{Y}}$ includes, with probability at least $(1 - \alpha)$, only the parts of the outcome distribution on which the policy has a positive impact. The set $\tilde{\mathcal{Y}}$ includes, with probability at least $(1 - \alpha)$, all parts of the outcome distribution on which the policy has a positive impact.
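As a rough illustration of the decision rule discussed in this section, the following Python sketch computes the Kolmogorov-Smirnov type statistics from estimated distribution functions on a grid of $y$ values, evaluates Brownian bridge critical values, and maps the two one-sided test results to the four categories. It relies on two standard facts about the Brownian bridge: $P(\sup_t B(t) > d) = \exp(-2d^2)$, and the Kolmogorov series for $P(\sup_t |B(t)| > d)$. All function names are hypothetical, the statistic called `s3` (for the null $\delta_c(\cdot) \ge 0$) is this sketch's own label, and the default $A = (3/2)^{1/2}$ assumes a triangular kernel.

```python
import math

import numpy as np


def sup_bridge_critical_value(alpha):
    """Critical value d solving P(sup_t B(t) > d) = exp(-2 d^2) = alpha."""
    return math.sqrt(-0.5 * math.log(alpha))


def sup_abs_bridge_tail(d, terms=100):
    """P(sup_t |B(t)| > d) for d > 0, via the Kolmogorov series."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * d * d)
                     for k in range(1, terms + 1))


def ks_statistics(F_r, F_l, n, h, f_hat, A=math.sqrt(1.5)):
    """Return (s1, s3, s2) from estimated CDFs on a common grid of y values.

    s1 tests delta_c <= 0, s3 tests delta_c >= 0 (the sign-flipped version
    of s1), and s2 tests delta_c = 0.
    """
    diff = np.asarray(F_r, dtype=float) - np.asarray(F_l, dtype=float)
    scale = A * math.sqrt(0.5 * n * h * f_hat)
    return scale * diff.max(), scale * (-diff).max(), scale * np.abs(diff).max()


def classify(s1, s3, d1):
    """Map the two one-sided tests to categories 1 to 4 of the text."""
    reject_nonpos = s1 > d1  # reject delta_c <= 0
    reject_nonneg = s3 > d1  # reject delta_c >= 0
    if not reject_nonpos and not reject_nonneg:
        return 1  # no detectable local effect
    if reject_nonpos and reject_nonneg:
        return 2  # opposite effects on different parts of the distribution
    if not reject_nonpos and reject_nonneg:
        return 3  # supports a positive policy effect
    return 4      # supports a negative policy effect


if __name__ == "__main__":
    d1 = sup_bridge_critical_value(0.05)
    s1, s3, s2 = ks_statistics([0.2, 0.5], [0.1, 0.6], n=400, h=0.5, f_hat=1.0)
    print(round(d1, 4), classify(s1, s3, d1))
```

The grid maximum only approximates the supremum over $y$; in practice the grid would typically be the set of observed outcome values.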

Given Proposition 3.1, it is obvious that $\hat{\mathcal{Y}}$ and $\tilde{\mathcal{Y}}$ can be defined as
\[ \hat{\mathcal{Y}} \equiv \left\{ y : A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \left( \hat F_r(y) - \hat F_l(y) \right) < -d_1 \right\}, \qquad \tilde{\mathcal{Y}} \equiv \left\{ y : A \left( \tfrac{1}{2} n h \hat f \right)^{1/2} \left( \hat F_r(y) - \hat F_l(y) \right) < d_1 \right\}, \]
where $d_1$ is the (positive) critical value for testing $H_0^1: \delta_c(\cdot) \le 0$ using significance level $\alpha$. $-d_1$ is then the critical value for testing $H_0^3: \delta_c(\cdot) \ge 0$. The confidence regions defined above, $\hat{\mathcal{Y}}$ and $\tilde{\mathcal{Y}}$, could potentially be improved if the step-down method for multiple testing (cf. Romano and Shaikh, 2010) is applied. This detail is beyond the scope of this paper. The methodology should be analogous to Armstrong and Shen (2012), which applies the step-down method to construct a confidence region for an optimal treatment assignment set.

Bandwidth Choice

A key decision in implementing the kernel based distributional test (and in fact any kernel based estimator or test) is the choice of a proper bandwidth. Imbens and Kalyanaraman (2011) argue that the classic plug-in or cross validation method that minimizes the mean square error of the local linear estimator over the whole support of the independent variable is not relevant for regression discontinuity studies, since researchers doing regression discontinuity analyses are only interested in estimating the treatment effect evaluated at one particular value of the running variable. Imbens and Kalyanaraman (2011) propose an optimal bandwidth choice method that minimizes the asymptotic mean square error (AMSE) of the local linear estimator of the ATE or LATE.

For our distributional tests, we choose to use rule-of-thumb bandwidths following
\[ h_{rot} = h_{ik} \times N^{1/5 - 1/4.25}, \]
where $h_{ik}$ is the optimal bandwidth in Imbens and Kalyanaraman (2011) for the estimation

of $F_r(\tilde y) - F_l(\tilde y)$, where $\tilde y = \mathrm{median}(Y)$, and $N^{1/5 - 1/4.25}$ is the scalar used for under-smoothing. The optimal bandwidth $h_{ik}$ may be estimated as
\[ h_{ik} = C_K \left( \frac{\hat\sigma_r^2 + \hat\sigma_l^2}{\hat f \left[ \left( \hat m_r^{(2)} - \hat m_l^{(2)} \right)^2 + \hat r_r + \hat r_l \right]} \right)^{1/5} N^{-1/5}, \]
where $C_K$ is the scaling constant for triangular kernels, $\hat\sigma_r^2$ and $\hat\sigma_l^2$ are estimators of $\lim_{x \downarrow c} Var(1(Y_i \le \tilde y) \mid X_i = x) = F_r(\tilde y)(1 - F_r(\tilde y))$ and $\lim_{x \uparrow c} Var(1(Y_i \le \tilde y) \mid X_i = x) = F_l(\tilde y)(1 - F_l(\tilde y))$, $\hat f$ is an estimator of $f(c)$, and $\hat m_r^{(2)}$ and $\hat m_l^{(2)}$ are estimators of the second derivatives of $F_r(\tilde y)$ and $F_l(\tilde y)$. $\hat\sigma_r^2$, $\hat\sigma_l^2$, $\hat f$, $\hat m_r^{(2)}$ and $\hat m_l^{(2)}$ are all estimated using nonparametric kernel estimation with some initial choices of bandwidths. $\hat r_r$ and $\hat r_l$ are regularity terms used to avoid the zero denominator problem. See Imbens and Kalyanaraman (2011) for details.

In the empirical sections, as robustness checks we also examine test results using $0.75 h_{rot}$, $1.5 h_{rot}$ and $\bar h_{rot} = \bar h_{ik} \times N^{1/5 - 1/4.25}$, where $\bar h_{ik}$ is the optimal bandwidth for the estimation of $F_r(\bar y) - F_l(\bar y)$, with $\bar y$ being the smallest value of the response outcome associated with the supremum or infimum of the estimated LDTE function. We find (reported in the Appendix) that generally $\bar h_{rot}$ is very close to $h_{rot}$ and that changing the bandwidth in a reasonable region around the rule-of-thumb bandwidth does not largely affect the testing results.

General Sampling Design

Stratified sampling is a popular survey structure used to reduce survey cost. The dataset collected for the empirical example in Section 4 is one example of such a design. When a data sample is stratified, the tests discussed above are potentially inconsistent. In this section, we extend our distributional tests to account for this type of sampling structure.

Suppose a population has $S$ strata and the population size is $H$. Let $\{X_{si}, T_{si}, Y_{si}\}_{i=1}^{n_s}$ in stratum $s$ be a random sample of $\{X, T, Y\}$. The sampling probability of an observation in stratum $s$ is equal to $n_s / H_s$, where $n_s$ is the sample size and $H_s$ the population size of stratum $s$. Following Bhattacharya (2005), we assume that $S$ is fixed while $n_s$ goes to infinity for

all $s = 1, \ldots, S$. Let $\tilde F_l$ be the new weight-adjusted local linear estimator for the conditional outcome distribution of the control group. For a fixed value of $y$, $\tilde F_l(y)$ is the intercept term $a$ for the coefficients $a$ and $b$ that minimize
\[ \sum_{s=1}^{S} \frac{H_s}{n_s} \sum_{1 \le i \le n_s,\; X_{si} < c} \left[ 1(Y_{si} \le y) - a - b(X_{si} - c) \right]^2 K\!\left( \frac{X_{si} - c}{h} \right). \]
Let $f_{ls}(x)$ and $g_{ls}(y, x)$ be defined as the counterparts of the population concepts $f_l(x)$ and $g_l(y, x)$ for stratum $s$. Then $f_l(x) = \sum_{s=1}^{S} w_s f_{ls}(x)$ and $g_l(y, x) = \sum_{s=1}^{S} w_s g_{ls}(y, x)$, where $w_s = H_s / H$. Let $a_s = n / n_s$. If the smoothness assumptions in Assumption 3.1 are satisfied for all strata in the sample and the kernel and bandwidth assumptions in Assumption 3.2 are satisfied, we have the following result.

Lemma 2.
\[ A \left[ n h \tilde f \right]^{1/2} \left( \tilde F_l(\cdot) - F_l(\cdot) \right) \Rightarrow W_l\!\left( \sum_{s=1}^{S} w_s^2 a_s \frac{g_{ls}(\cdot, c)}{f_l(c)} \right) - F_l(\cdot)\, W_l\!\left( \sum_{s=1}^{S} w_s^2 a_s \frac{f_{ls}(c)}{f_l(c)} \right) \equiv G_l, \]
where $W_l$ is a standard Wiener process and $\tilde f = \sum_{s=1}^{S} w_s \hat f_s$, with $\hat f_s$ being a consistent density estimator of $X_{si}$ evaluated at $X_{si} = c$. A similar convergence result also holds for the estimator $\tilde F_r(\cdot)$.

When the data are i.i.d. ($S = 1$), the Gaussian process $G_l$ in the lemma simplifies to the changed-time Brownian bridge $B(F_l(\cdot))$ in Lemma 1. The lemma is proved in Shen (2013) for weight-adjusted Nadaraya-Watson estimators of conditional outcome distributions. In the appendix, we show that $\tilde F_l(\cdot)$ has the same limiting distribution as the Nadaraya-Watson based estimator.

When $F_r - F_l = 0$, i.e. under $H_0^2$ or the least favorable condition of $H_0^1$, the scaled difference of the conditional empirical distributions approaching from above and below the

threshold value of the running variable follows
\[ A \left( \tfrac{1}{2} n h \tilde f \right)^{1/2} \left[ \tilde F_r - \tilde F_l \right] \Rightarrow \left( \tfrac{1}{2} \right)^{1/2} G_r - \left( \tfrac{1}{2} \right)^{1/2} G_l \equiv G. \]
The asymptotic process $G$ is data-dependent since both $G_l$ and $G_r$ are. To obtain critical values of the distributional tests of $H_0^1$ and $H_0^2$, we want to simulate processes that weakly converge to processes that are identical to but (asymptotically) independent of $G_l$ and $G_r$ with probability one.

Let $\{U_i\}_{i=1}^{n}$ denote a sequence of i.i.d. $N(0, 1)$ random variables that are independent of the samples. It is well recognized (cf. Billingsley, 1999) that the process $W^*(\cdot)$, with
\[ W^*(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} 1(i \le nt)\, U_i, \qquad t \in [0, \bar t], \]
weakly converges to a standard Wiener process $W$ on $D([0, \bar t])$; $P(W \in C) = 1$. Here we construct a process $G_l^*$ such that
\[ G_l^* = W^*\!\left( \sum_{s=1}^{S} w_s^2 a_s \hat g_{ls} \Big/ \sum_{s=1}^{S} w_s \hat f_{ls} \right) - \tilde F_l\, W^*\!\left( \sum_{s=1}^{S} w_s^2 a_s \hat f_{ls} \Big/ \sum_{s=1}^{S} w_s \hat f_{ls} \right), \]
where $\hat g_{ls}(y)$ is a consistent estimator of $g_{ls}(y, c)$ for any $y$ value and $\hat f_{ls} = \hat g_{ls}(\infty)$ is a consistent estimator of $f_{ls}(c)$. Similarly, a process $G_r^*$ can be constructed. The process $G^* = (1/2)^{1/2} (G_r^* - G_l^*)$ weakly converges to a process that is identical to but (asymptotically) independent of the asymptotic distribution of $G$. See Shen (2013) for a detailed proof of the weak convergence of the simulated process.

The simulated p-value in practice is equal to the fraction of simulations that have the supremum functional $\sup_y G^*$ or $\sup_y |G^*|$ larger than the test statistic. The simulated critical value is equal to the $(1 - \alpha) \times 100\%$ quantile of the simulated $\sup_y G^*$ or $\sup_y |G^*|$

18 values in repeated simulations, where α is the significance level. 4 Empirical Examples 4.1 Empirical Exercise 1 Modigliani s life-cycle model predicts that households smooth their consumption over their entire life-span. The empirical literature (see Banks, Blundell, and Tanner, 1998; Bernheim, Skinner, and Weinberg, 2001; Miniaci, Monfardini, and Weber, 2010 and Battistin, Brugiavini, Rettore, and Weber, 2009, among many others), however, documents a one-time drop in average household consumption at the retirement age of household members. In this section, we investigate how the distribution of total household expenditures in Italy changes at the retirement age of household members. It turns out that the distributional results are much richer than the mean results documented in the literature. We find that as households consumption patterns are quite heterogeneous both before and after retirement. The dataset used in this empirical section is extracted from the Italy Survey on Household Income and Wealth (SHIW) by Battistin, Brugiavini, Rettore, and Weber (2009). The SHIW surveys around 8000 households annually in more than 3000 municipalities. The dataset consists of households for whom job-pension eligibility of the household head, either a single man or the husband in a couple, can be defined. The households included in the dataset have also completed the household consumption survey and do not have any missing data in key expenditure measures. The SHIW asks detailed questions about household spending in the previous year. Battistin, Brugiavini, Rettore, and Weber (2009) use this information and the discontinuity in the eligibility of retirement pension of household heads to study households expenditure patterns around retirement. They find a one-time drop in average household consumption upon retirement of household heads. 
Since the SHIW uses stratified sampling, sampling weights are used in the nonparametric mean and distributional regression discontinuity analyses.⁴ In fact, we find that proper weighting is important to obtain consistent estimates for both mean and distributional RD analysis. The result is intuitive: the SHIW over-samples households living in large metropolitan areas, and spending patterns may differ for households living in small and large metropolitan areas. Throughout the analysis, we use the triangular kernel function and the rule-of-thumb bandwidths suggested in the theoretical section. The computation is carried out using the locfit package in R.

⁴ The primary sampling units of the SHIW are municipalities. They are stratified by region (total of 20) and size (with a cut-off at population 40,000). To calculate the weight, the 2001 Italian Census population statistics are matched to each stratum of the SHIW to back out the population size of each stratum.

First we carry out a nonparametric mean regression analysis of the full sample. Our results can be summarized by the left panel of Figure 2. The lower bimodal bar chart reports the number of households that fall into bins defined by the husbands' years to/from pension eligibility. The circles and lines in the upper portion of the graph report the average log total household expenditures within each bin and the results of the local linear regression of log total household expenditure on years to pension eligibility. In all analyses in this section, we follow Battistin, Brugiavini, Rettore, and Weber (2009) and eliminate households in which the husband has zero years to/from pension eligibility, since the documented consumption in the dataset for that year may cover both pre- and post-retirement periods. We find that the average total household expenditures drop by 5.96% once husbands become eligible for their pension. Since only 32.51% of household heads (as reported in Table 1) retire as soon as they become eligible for their pension, the effect on average expenditures is approximately 18.33% (5.96/32.51) for those who do retire.

The distributional regression discontinuity analysis supports the finding of a consumption drop for the full sample. The right panel of Figure 2 shows how the conditional distribution of the log total household expenditures changes when husbands become eligible for their pension. The top two step functions in the figure are the estimated conditional cumulative distribution functions approached from before and after husbands become eligible for their pension. The solid and dashed jagged lines in the lower area of the graph report the difference between the two conditional cdfs and the 95% pointwise confidence band of the difference, respectively.
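The two conditional cdf limits described above can be approximated by a weighted local linear regression of the indicator 1(Y ≤ y) on the running variable, fitted separately on each side of the cutoff with a triangular kernel. The Python sketch below is illustrative only: the paper's computations use the locfit package in R, and the function names, the argument names, and the use of a single bandwidth `h` are assumptions introduced here rather than the authors' implementation.

```python
import numpy as np

def triangular_kernel(u):
    """Triangular kernel K(u) = 1 - |u| on [-1, 1], zero elsewhere."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def local_linear_cdf(x, y, y_grid, cutoff, h, side, weights=None):
    """Local linear estimate of P(Y <= y | X -> cutoff) from one side.

    side is "left" or "right"; observations exactly at the cutoff are dropped.
    weights are optional sampling weights (hypothetical input).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    keep = x > cutoff if side == "right" else x < cutoff
    d = x[keep] - cutoff                          # distance to the cutoff
    k = w[keep] * triangular_kernel(d / h)        # kernel times sampling weight
    Z = np.column_stack([np.ones_like(d), d])     # intercept and slope regressors
    ZtK = Z.T * k
    # One weighted least-squares fit per grid value y, solved jointly.
    indicators = (y[keep][:, None] <= np.asarray(y_grid, dtype=float)[None, :])
    beta = np.linalg.solve(ZtK @ Z, ZtK @ indicators.astype(float))
    return beta[0]                                # intercept = value at the cutoff

def cdf_jump(x, y, y_grid, cutoff, h, weights=None):
    """Difference between the right and left conditional cdf limits."""
    right = local_linear_cdf(x, y, y_grid, cutoff, h, "right", weights)
    left = local_linear_cdf(x, y, y_grid, cutoff, h, "left", weights)
    return right - left
```

A positive value of `cdf_jump` at some `y` means that a larger share of households spends no more than `y` just after the cutoff than just before it. Local linear fits of indicators are not constrained to lie in [0, 1], so estimates near the tails may need care.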

It appears from the graph that the second quartile of the households' expenditure distribution is fairly stable when husbands attain pension eligibility but all the rest shift clearly to the left. That is, for any given total expenditure value in these parts of the distribution, the probability that a household spends less than that value becomes larger. To formally test whether the distributional shift is statistically significant, we perform the distributional regression discontinuity test discussed in the theoretical section and report the results in the first column of Table 1. We find that we are able to reject the null hypothesis that the household consumption distribution uniformly shifts to the right after husbands become eligible for their pension at the 1% significance level and fail to reject the null hypothesis that the household consumption distribution uniformly shifts to the left at the 10% significance level.

In Table A2 of Appendix B we provide robustness checks for both the mean and distributional regression discontinuity tests using the same nonparametric analysis technique but different bandwidths. We find that values of the test statistics depend somewhat on the bandwidth choice, which is a common problem of all local linear estimation and testing methods. However, this dependence generally does not affect the test results.

In addition to the full-sample analysis, we recognize the possibility that the pension eligibility of the spouse may also play an important role in household consumption decisions. For example, Lundberg, Startz, and Stillman (2003) find that household members' influence over consumption decisions in the household is affected by their market income. In the next set of analyses, we divide the households by family type and the wife's pension eligibility and repeat the mean and distributional regression discontinuity analyses performed on the full sample.
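Each of these distributional tests relies on simulated p-values and critical values of the kind described in the theoretical section. The Python sketch below shows one way such a simulation might be organized; it is not the authors' code. It assumes that the arguments of the Wiener process (here `tau_l`, `tau_r`) and the estimated conditional cdfs (`F_l`, `F_r`) have already been computed on a common increasing grid of outcome values, that the time arguments are rescaled to lie in [0, 1], and that the last grid point stands in for y → ∞. All function and argument names are hypothetical.

```python
import numpy as np

def partial_sum_process(u, t):
    """W_n(t) = n^{-1/2} * sum_{i <= n t} U_i, evaluated at times t in [0, 1]."""
    u = np.asarray(u, dtype=float)
    n = u.size
    cumulative = np.concatenate(([0.0], np.cumsum(u))) / np.sqrt(n)
    idx = np.clip(np.floor(n * np.asarray(t, dtype=float)).astype(int), 0, n)
    return cumulative[idx]

def simulate_sup(tau_l, F_l, tau_r, F_r, n_sim=1000, n=2000, seed=0):
    """Simulate sup_y G* with G* = (1/2)^{1/2} (G_r* - G_l*).

    Each side uses its own independent N(0, 1) draws, and
    G_side* = W_n(tau_side) - F_side * W_n(tau_side at the last grid point).
    """
    rng = np.random.default_rng(seed)
    tau_l, F_l, tau_r, F_r = (np.asarray(a, dtype=float)
                              for a in (tau_l, F_l, tau_r, F_r))
    sups = np.empty(n_sim)
    for b in range(n_sim):
        W_l = partial_sum_process(rng.standard_normal(n), tau_l)
        W_r = partial_sum_process(rng.standard_normal(n), tau_r)
        G_l = W_l - F_l * W_l[-1]
        G_r = W_r - F_r * W_r[-1]
        sups[b] = np.max((G_r - G_l) / np.sqrt(2.0))
    return sups

def p_value_and_critical_value(stat, sups, alpha=0.05):
    """Share of simulated suprema above stat, and their (1 - alpha) quantile."""
    return float(np.mean(sups > stat)), float(np.quantile(sups, 1.0 - alpha))
```

For the test in the opposite direction, the same draws can be reused with `np.max(-(G_r - G_l) / np.sqrt(2.0))` in place of the supremum shown. The number of simulations and the length `n` of the Gaussian sequence are tuning choices made here for illustration.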
Figure 3 as well as columns 2-4 of Table 1 show the results separately for households with the wife attaining pension eligibility at the same time as or before the husband, with the wife attaining pension eligibility after the husband, and with only a single man.⁵ The results show that households where the wife is already eligible for a pension on average cut their expenditure more than any other type of household when the head retires. This is consistent with our expectations since the wives of these households have probably already retired or may choose to retire at the same time as their husband. Households composed of a single man also cut their expenditure significantly. The expenditure cuts for both types of households are significant at the 1% significance level. Meanwhile, the mean regression discontinuity analysis finds an increase in expenditures for households where the wife is ineligible for a pension. The increase is significant at the 5% significance level for one-sided testing.

⁵ We do not include couples in which the wife's pension eligibility has not been defined in Battistin, Brugiavini, Rettore, and Weber (2009).

The distributional tests reveal interesting new information. For households where the wife is not yet eligible for a pension, the effect of the head's retirement is heterogeneous and has an opposite sign in the top and the bottom parts of the distribution. The results suggest that low-income households where the wife attains pension eligibility after the husband may maintain or even increase their consumption when the husband retires. In contrast, upper-middle-class households with the same household structure tend to cut their spending upon the husband's retirement. For households where the wife is already eligible for a pension and households composed of a single man, the distributional tests support an unambiguous cut along the expenditure distribution.

Table 2 shows the mean and distributional regression discontinuity results assuming that households in the SHIW were i.i.d. draws from the Italian population. Comparing the results in Table 2 with those in Table 1, we see that properly applying sampling weights is important for the regression discontinuity analyses. The average consumption drop is substantially more significant in column (2) if the data are wrongly treated as i.i.d. Also, the distributional analysis no longer shows evidence of heterogeneity in column (3).
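To see why weighting can change the results under stratified sampling, consider a toy case in which one stratum is over-sampled. The Python sketch below assumes a simple weight equal to the stratum population size divided by the stratum sample size; this is a common convention and not necessarily the exact weight used in the paper. The data, stratum labels, and function name are invented for illustration.

```python
import numpy as np

def stratum_weights(stratum_ids, population_sizes):
    """Weight each observation by (stratum population) / (stratum sample size)."""
    ids = np.asarray(stratum_ids)
    labels, counts = np.unique(ids, return_counts=True)
    sample_sizes = dict(zip(labels.tolist(), counts.tolist()))
    return np.array([population_sizes[s] / sample_sizes[s] for s in ids.tolist()])

# Invented example: the "large" stratum is over-sampled 3 to 1 although both
# strata have the same population size.
strata = ["large", "large", "large", "small"]
log_spending = np.array([10.0, 10.0, 10.0, 9.0])
w = stratum_weights(strata, {"large": 300, "small": 300})

unweighted = log_spending.mean()                 # 9.75, tilted toward "large"
weighted = np.average(log_spending, weights=w)   # 9.5, strata weighted equally
```

The same weights would enter a kernel-weighted regression multiplicatively, so that an over-sampled stratum does not dominate the estimate near the cutoff.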
Since the SHIW data used in this section over-sample households living in large metropolitan areas with a population exceeding 40,000, this discrepancy between weighted and unweighted results suggests that households in small and large metropolitan areas of Italy may have very different consumption patterns at retirement age. Next we split the sample further by the population of the municipality in which the households reside and perform the same mean and distributional regression discontinuity analyses. Figures 4-5 and Table 3 show the mean and distributional regression discontinuity results separately for households living in municipalities with populations smaller and larger than 40,000.

It is clear that the households living in these two types of locations behave differently as they reach retirement. The difference is particularly substantial for households where the wife is eligible for a pension before the husband. In large metropolitan areas, more than half of household heads retire upon attaining pension eligibility if their wife attains pension eligibility at the same time as or before them. Upon the husbands' retirement, these households cut their spending by more than two-thirds. Moreover, the cut is unambiguously present along the whole expenditure distribution. In contrast, those living in small municipalities exhibit a more heterogeneous change in expenditures when the household head retires. For households where the wife is ineligible for a pension when the head reaches retirement age, we observe that households in small municipalities have a more heterogeneous spending response to the husbands' retirement, while households in large municipalities almost unambiguously cut expenses along the household expenditure distribution. Single men appear to cut expenses along the expenditure distribution regardless of their location of residence. In the interest of space we omit graphs for single men living in different locations.

What is driving the substantial expenditure cut for the first type of households living in large municipalities (column (2) of Table 3)? And what is driving the heterogeneous change in expenditure for the second type of households living in small municipalities (column (3) of Table 3)? To understand the causes behind the change in expenditures, we conduct mean and distributional regression discontinuity analyses for various categories of household expenditure.
Figure 6 shows that the substantial and unambiguous expenditure cut for the first type of households living in large municipalities is consistent for both food and non-food nondurable expenditures, while Figure 7 shows that the heterogeneous expenditure change for the second type of households living in small municipalities is driven by the non-food nondurable expenditures.

5 Conclusion

In this paper, we introduced new testing methods within the regression discontinuity setup to identify local policy impacts on the whole distribution of response outcomes. The proposed tests are applicable to both sharp and fuzzy regression discontinuity designs, robust to stratified sampling structure, and easy to implement. We apply the tests to study the retirement-consumption hypothesis using an Italian dataset. We find that the household head's retirement results in an unambiguous expenditure cut for some subgroups but heterogeneous expenditure changes for other subgroups along the household expenditure distribution. For example, for households where the wife is not yet eligible for a pension when the husband retires, the effect of the husband's retirement even has opposite signs in the top and the bottom parts of the household expenditure distribution, suggesting that low-income households may behave very differently from upper-middle-class households at the retirement age. Further analysis shows that the heterogeneous pattern in consumption changes at the household head's retirement is driven by non-food nondurable expenditures and is only present for households living in small municipalities.

References

Angrist, J. D., and V. Lavy (1999): Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement, The Quarterly Journal of Economics, 114(2).

Armstrong, T., and S. Shen (2012): Inference on Optimal Treatment Assignment.

Banks, J., R. Blundell, and S. Tanner (1998): Is There a Retirement-Savings Puzzle?, The American Economic Review, 88(4).

Battistin, E., A. Brugiavini, E. Rettore, and G. Weber (2009): The Retirement Consumption Puzzle: Evidence from a Regression Discontinuity Approach, The American Economic Review, 99(5).

Bernheim, B. D., J. Skinner, and S. Weinberg (2001): What Accounts for the Variation in Retirement Wealth among U.S. Households?, American Economic Review, 91(4).

Bhattacharya, D. (2005): Asymptotic Inference from Multi-stage Samples, Journal of Econometrics, 126.

Billingsley, P. (1999): Convergence of Probability Measures. Wiley-Interscience.

Bitler, M. P., J. B. Gelbach, and H. W. Hoynes (2008): Distributional impacts of the Self-Sufficiency Project, Journal of Public Economics, 92(3-4).

Cheng, M., J. Fan, and J. S. Marron (1996): On Automatic Boundary Corrections, Annals of Statistics, 25.

Frandsen, B. R., M. Frolich, and B. Melly (2012): Quantile treatment effects in the regression discontinuity design, Journal of Econometrics, 168(2).

Horvath, L., and B. S. Yandell (1988): Asymptotics of conditional empirical processes, Journal of Multivariate Analysis, 26(2).

Imbens, G., and K. Kalyanaraman (2011): Optimal Bandwidth Choice for the Regression Discontinuity Estimator, The Review of Economic Studies.

Imbens, G., and T. Zajonc (2011): Regression Discontinuity Design with Multiple Forcing Variables, Working Paper.

Imbens, G. W., and J. D. Angrist (1994): Identification and Estimation of Local Average Treatment Effects, Econometrica, 62(2).

Imbens, G. W., and T. Lemieux (2008): Regression discontinuity designs: A guide to practice, Journal of Econometrics, 142(2), The regression discontinuity design: Theory and applications.

Koenker, R. (2005): Quantile Regression, Cambridge Books. Cambridge University Press.

Lundberg, S., R. Startz, and S. Stillman (2003): The Retirement-Consumption Puzzle: a Marital Bargaining Approach, Journal of Public Economics.

Miniaci, R., C. Monfardini, and G. Weber (2010): How does consumption change upon retirement?, Empirical Economics, 38(2).

Romano, J. P., and A. M. Shaikh (2010): Inference for the Identified Set in Partially Identified Econometric Models, Econometrica, 78(1).

Shen, S. (2013): Estimation, Inference and Testing of Distributional Partial Effects Under Stratified Sampling.

van der Klaauw, W. (1997): A Regression-Discontinuity Evaluation of the Effect of Financial Aid Offers on College Enrollment, C.V. Starr Center Research Report, (97-10).

Table 1: Results of Mean and Distributional Regression Discontinuity Analysis

Columns: (1) Full Sample; (2) Wife Eligible for Pension First; (3) Husband Eligible for Pension First; (4) Single Men

Point Estimates:
% of Compliers: 32.51%; 27.57%; 24.98%; 24.42%
% Change of Average Expenditure: -5.96%; %; 4.38%; -7.40%

Test statistics:
$H_0$: $E[T_i(1) \mid X_i = c]$ vs. $E[T_i(0) \mid X_i = c]$: 28.63***; 7.58***; 8.24***; 13.53***
$H_0$: $E[Y_i(1) \mid X_i = c]$ vs. $E[Y_i(0) \mid X_i = c]$: -3.08***; -2.33***; 1.78**; -2.74***
$H_0$: $F_{Y_i(1) \mid X_i = c}$ vs. $F_{Y_i(0) \mid X_i = c}$: ***
$H_0$: $F_{Y_i(1) \mid X_i = c}$ vs. $F_{Y_i(0) \mid X_i = c}$: 3.14***; 3.28***; 2.33***; 2.12***

Sample Size:

Notes: * Reject the null hypothesis at 10%; ** Reject the null hypothesis at 5%; *** Reject the null hypothesis at 1%. The decisions are based on simulated critical values reported in Table A2 of Appendix B.

Table 2: Mean and Distributional Regression Discontinuity Analysis: Unweighted

Columns: (1) Full Sample; (2) Wife Eligible for Pension First; (3) Husband Eligible for Pension First; (4) Single Men

Point Estimates:
% of Compliers: 31.59%; 36.99%; 18.30%; 33.87%
% Change of Average Expenditure: -5.99%; %; -1.47%; -7.7%

Test statistics:
$H_0$: $E[T_i(1) \mid X_i = c]$ vs. $E[T_i(0) \mid X_i = c]$: 32.77***; 13.96***; 8.73***; 21.44***
$H_0$: $E[Y_i(1) \mid X_i = c]$ vs. $E[Y_i(0) \mid X_i = c]$: -3.65***; -4.88***; ***
$H_0$: $F_{Y_i(1) \mid X_i = c}$ vs. $F_{Y_i(0) \mid X_i = c}$: **
$H_0$: $F_{Y_i(1) \mid X_i = c}$ vs. $F_{Y_i(0) \mid X_i = c}$: 2.03***; 2.75***; *

Sample Size:

Notes: * Reject the null hypothesis at 10%; ** Reject the null hypothesis at 5%; *** Reject the null hypothesis at 1%. The critical values are tabulated. See Section 3.

Table 3: Results of Mean and Distributional Regression Discontinuity Analysis: by Population Size

Columns: (1) Wife Eligible for Pension First, Small Muni.; (2) Wife Eligible for Pension First, Large Muni.; (3) Husband Eligible for Pension First, Small Muni.; (4) Husband Eligible for Pension First, Large Muni.; (5) Single Man, Small Muni.; (6) Single Man, Large Muni.

Point Estimates:
% of Compliers: 12.25%; 52.28%; 20.39%; 36.68%; 14.70%; 41.44%
% Change of Average Expenditure: 13.06%; %; 8.64%; -5.15%; -3.98%; -7.04%

Test statistics:
$H_0$: $E[T_i(1) \mid X_i = c]$ vs. $E[T_i(0) \mid X_i = c]$: 2.62***; 13.49***; 5.62***; 9.31***; 6.64***; 19.47***
$H_0$: $E[Y_i(1) \mid X_i = c]$ vs. $E[Y_i(0) \mid X_i = c]$: 1.71**; -7.74***; 1.43*; **
$H_0$: $F_{Y_i(1) \mid X_i = c}$ vs. $F_{Y_i(0) \mid X_i = c}$: ***; ***; -1.37*
$H_0$: $F_{Y_i(1) \mid X_i = c}$ vs. $F_{Y_i(0) \mid X_i = c}$: 1.40*; 5.24***; 2.46**; 1.65*; 2.04***; 3.34***

Sample Size:

Notes: * Reject the null hypothesis at 10%; ** Reject the null hypothesis at 5%; *** Reject the null hypothesis at 1%. The decisions are based on simulated critical values reported in Table A3 of Appendix B.

Figure 1: Mean and Distributional Change in Response Outcome. Left panel: F(y | X = 0) against y for control and treatment. Right panel: outcome Y (average, 75% quantile, 25% quantile) against the running variable X.

Note: The left panel draws the left and right limits of the conditional cumulative distribution functions of $Y_i$ as the running variable $X_i$ approaches the threshold value $c$. The right panel draws the average outcome and the first and third quartiles of the outcome $Y_i$ as functions of the running variable $X_i$.

Figure 2: Full Sample Analysis. Left panel (Mean RD): ln(total expenditure) and number of observations against years to pension eligibility, bandwidth = 4.94. Right panel (Distributional RD): CCDF after, CCDF before, difference in CCDFs, and 95% pointwise CI against ln(total expenditure), bandwidth = 4.9.

Figure 3: Subsample Analysis: Household Structure. Panels: Wife Eligible for Pension First; Husband Eligible for Pension First; Single Man. Upper row: ln(total expenditure) against years to pension eligibility, bandwidths 4.42, 4.35, and 5.1. Lower row: CCDFs after and before against ln(total expenditure), bandwidths 4.39, 4.36, and 5.4.
