arxiv: v1 [stat.me] 9 May 2018


Estimation Methods for Cluster Randomized Trials with Noncompliance: A Study of A Biometric Smartcard Payment System in India

Hyunseung Kang and Luke Keele

May 2018

Abstract

Many policy evaluations occur in settings with randomized assignment at the cluster level and treatment noncompliance at the unit level. For example, villages or towns might be assigned to treatment and control, but residents may choose not to comply with their assigned treatment status. For instance, in the state of Andhra Pradesh, the state government sought to evaluate the use of biometric smartcards to deliver payments from antipoverty programs. Smartcard payments were randomized at the village level, but residents could choose to comply or not. In some villages, more than 90% of residents complied with the treatment, while in other locations fewer than 5% of the residents complied. When noncompliance is present, investigators may choose to focus attention on either intention-to-treat effects or the causal effect among the units that comply. When analysts focus on effects among compliers, the instrumental variables framework can be used to identify causal effects. We first review extant methods for instrumental variable estimators in clustered designs, which depend on assumptions that are often unrealistic in applied settings. In response, we develop a method that allows for possible treatment effect heterogeneity that is correlated with cluster size and uses a finite sample variance estimator. We evaluate these methods using a series of simulations and apply them to data from an evaluation of welfare transfers via smartcard payments in India.

We thank participants at the Penn Causal Inference Seminar for helpful comments.
Hyunseung Kang: Assistant Professor, University of Wisconsin, Madison, Email: hyunseung@stat.wisc.edu
Luke Keele: Associate Professor, Georgetown University, Email: lk68@georgetown.edu

1 Introduction

In many policy settings, randomized trials are used to evaluate policy. Randomized trials allow analysts to rule out that causal effect estimates are confounded with pretreatment differences or selection biases among subjects. In education and public health applications, investigators often use clustered randomized trials (CRTs), where treatments are applied to groups of individuals rather than individuals. While clustered treatment assignment reduces power, it allows for arbitrary patterns of treatment spillover for units within the same cluster (Imbens and Wooldridge). This design feature is critical in settings where interactions between units within clusters are difficult to prevent.

The state of Andhra Pradesh conducted an intervention that is typical of many CRTs. The goal was to evaluate the effectiveness of using a biometric payment system to deliver social welfare payments to recipients in India (Muralidharan et al. 2016). The smartcard payment system used a network of locally hired, bank-employed staff to biometrically authenticate beneficiaries and make cash payments in villages. Such systems are designed to reduce the time it takes for payments to reach payees and reduce the chance that payments are siphoned off to someone other than the payee. The intervention was randomly assigned at the village level. Out of 157 villages, 112 were assigned to the intervention. Assignment at the village level prevented treatment spillovers that would have been difficult to prevent if assigned at the individual level.

For villages assigned to the control condition, smartcard payments were not available. For those in treated villages, recipients first had to enroll in the smartcard program by submitting biometric data (typically all ten fingerprints) and digital photographs. Beneficiaries were then issued a physical smartcard that included their photograph and an embedded electronic chip storing biographic and biometric data; the card was also linked to newly created bank accounts.
Government officials conducted enrollment campaigns to collect the biometric data and issue the cards. The government contracted with banks to manage payments, and these banks in turn contracted with customer service providers (CSPs) to manage the accounts and travel to villages to deliver payments. The system was used to deliver payments from two large welfare programs: the National Rural Employment Guarantee Scheme (NREGS) and Social Security Pensions (SSP). The first is a work-fare program that guarantees every rural household 100 days of paid employment each year (Dutta et al.). SSP complements the first program by providing income support to those who are unable to work. Beneficiaries used the smartcards to collect payments from CSPs by inserting them into a point-of-service device. The device reads the card and retrieves account details. Payees were prompted for one of ten fingers, chosen at random, to be scanned. The device compares this scan with the records on the card, and authorizes a transaction if they match. Once the transaction is authorized, the amount of cash requested is disbursed, and the device prints out a receipt.

Muralidharan et al. (2016) conducted the original evaluation but did not account for noncompliance in the analysis. In the smartcard evaluation, as is true in many CRTs with human subjects, compliance with the treatment assignment varied. Beneficiaries were designated as compliant if they used the smartcard one or more times to collect payments. In some villages, 90% or more of the beneficiaries complied with the treatment, while in many villages less than 10% of the recipients complied with their treatment assignment. While noncompliance is common in these designs, the statistical literature contains relatively few results on the analysis of CRTs with noncompliance. For example, two widely used texts on CRTs do not mention noncompliance at all (Hayes and Moulton 2009; Donner and Klar). Noncompliance in CRTs can be analyzed within the instrumental variables (IV) framework. IV allows investigators to estimate the average causal effect among the subpopulation that complied with their assigned treatment (Angrist et al. 1996). Jo et al. (2008b) and Schochet (2013) both extended the IV estimation framework to CRTs.
In this paper, we study and develop methods of estimation and inference for CRTs with noncompliance. We review the two most commonly used methods of estimation and inference: one based on group-level averages, and a second using unit-level data. We then propose alternatives. The first is designed to provide accurate finite sample inferences, which may be useful given the relatively small sample sizes in many CRT applications. Using a simulation study, we compare these different methods of estimation and inference. We conclude with an analysis of the smartcard trial which motivates the study.

2 Review of Clustered Randomized Trials and Noncompliance

2.1 Notation and Setup

Suppose there are $J$ total clusters, $j = 1, \ldots, J$. For each cluster $j$, there are $n_j$ units, indexed by $i = 1, \ldots, n_j$, and we have $n = \sum_{j=1}^{J} n_j$ total units. A fixed number of clusters $m$ are assigned treatment and the other $J - m$ clusters are assigned control. The treatment assignment is uniform within each cluster; if cluster $j$ is assigned treatment, all $n_j$ individuals in cluster $j$ are assigned treatment. Let $Z_j = 1$ indicate that cluster $j$ received treatment and $Z_j = 0$ indicate that cluster $j$ received control. Let $Z = (Z_1, \ldots, Z_J)$ be the collection of the $Z_j$'s and let $z = (z_1, \ldots, z_J)$ be one possible value of treatment assignment from the set of treatment assignments $\mathcal{Z} = \{ z \in \{0,1\}^J : \sum_{j=1}^{J} z_j = m \}$. Let $D_{ij}$ indicate the observed compliance to treatment assignment for individual $i$ in cluster $j$ and let $Y_{ij}$ denote the observed outcome for individual $i$ in cluster $j$. We can also let $Z_{ij}$ be the treatment assignment of individual $i$ in cluster $j$, but because treatment assignment is uniform within each cluster and to minimize notational burden, this notation is used sparingly.

We use the potential outcomes approach laid out in Neyman (1923) and Rubin (1974) to define causal effects. Let $D_{ij}^{(1)}$ and $D_{ij}^{(0)}$ indicate the potential compliances to treatment assignment of individual $i$ in cluster $j$ if cluster $j$ received treatment ($Z_j = 1$) or control ($Z_j = 0$), respectively. Let $Y_{ij}^{(1, D_{ij}^{(1)})}$ and $Y_{ij}^{(0, D_{ij}^{(0)})}$ indicate the potential outcomes of individual $i$ in cluster $j$ if cluster $j$ received treatment ($Z_j = 1$) or control ($Z_j = 0$), respectively. Because only one of the two potential outcomes can be observed for treatment assignment $Z_j$, the relationship between the potential outcomes and observed values can be written as

$$Y_{ij} = Z_j Y_{ij}^{(1, D_{ij}^{(1)})} + (1 - Z_j) Y_{ij}^{(0, D_{ij}^{(0)})}, \qquad D_{ij} = Z_j D_{ij}^{(1)} + (1 - Z_j) D_{ij}^{(0)}.$$

Let $\mathcal{F} = \{ (Y_{ij}^{(z, D_{ij}^{(z)})}, D_{ij}^{(z)}) : z \in \{0,1\},\ j = 1, \ldots, J,\ i = 1, \ldots, n_j \}$ be the set containing all values of the potential outcomes. Note that $\mathcal{F}$ forms a parameter space, since knowing $\mathcal{F}$ allows us to characterize the distribution of the observables $Y_{ij}$ and $D_{ij}$.

We note that the potential outcomes notation implicitly assumes that there is no inter-cluster interference and that the stable unit treatment value assumption (SUTVA) holds (Rubin 1986). That is, the treatment assignment or compliance of any individual in cluster $j$ cannot affect the potential outcomes and the potential compliances of individuals in a different cluster $k$. This assumption is commonly made in clustered randomized trials; see Frangakis et al. (2002), Small et al. (2008b), Jo et al. (2008a), Imai et al. (2009), Schochet and Chiang (2011), and Middleton and Aronow (2015).

We also define summary measures for each cluster $j$ in the form of sums and averages. Specifically, for each cluster $j$, let $Y_j = \sum_{i=1}^{n_j} Y_{ij}$ and $D_j = \sum_{i=1}^{n_j} D_{ij}$ indicate the sums of individual outcomes and compliances, respectively. Similarly, for each cluster $j$, let $\bar{Y}_j = Y_j / n_j$ and $\bar{D}_j = D_j / n_j$ indicate the averages of individual outcomes and compliances, respectively. For each cluster $j$ and treatment indicator $z \in \{0, 1\}$, let

$$Y_j^{(z, D_j^{(z)})} = \sum_{i=1}^{n_j} Y_{ij}^{(z, D_{ij}^{(z)})} \quad \text{and} \quad D_j^{(z)} = \sum_{i=1}^{n_j} D_{ij}^{(z)}$$

indicate the sums of potential outcomes and compliance values, respectively, when cluster $j$ is assigned treatment value $z$.

2.2 Assumptions

We review the assumptions underlying CRTs with noncompliance, beginning with assumptions specific to CRTs; see Hayes and Moulton (2009) and Donner and Klar. We assume that each cluster is randomly assigned to treatment ($Z_j = 1$) or control ($Z_j = 0$) and that the probability of receiving either is non-zero.

Assumption 1 (Cluster Randomization). Given $J$ clusters, $m$ clusters ($0 < m < J$) are randomly assigned treatment and the other $J - m$ clusters are assigned control:

$$P(Z = z \mid \mathcal{F}, \mathcal{Z}) = P(Z = z \mid \mathcal{Z}) \quad \text{and} \quad P(Z = z \mid \mathcal{Z}) = \binom{J}{m}^{-1}.$$

Assumption 1 holds by the design of CRTs since clusters are randomly assigned to treatment or control. Also, with Assumption 1 and SUTVA, we can identify and unbiasedly estimate the average causal effect of treatment assignment on the outcome, denoted as $\mu_Y$ and commonly referred to as the intent-to-treat (ITT) causal effect, and the average causal effect of the treatment assignment on compliance, denoted as $\mu_D$ and commonly referred to as the average compliance rate:

$$\mu_Y = \frac{1}{n} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left( Y_{ij}^{(1, D_{ij}^{(1)})} - Y_{ij}^{(0, D_{ij}^{(0)})} \right), \qquad \mu_D = \frac{1}{n} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left( D_{ij}^{(1)} - D_{ij}^{(0)} \right).$$

The unbiased estimators for the parameters $\mu_Y$ and $\mu_D$, denoted as $\hat{\mu}_Y$ and $\hat{\mu}_D$, respectively, are difference-in-means estimators of sums between treated and control clusters,

$$\hat{\mu}_Y = \frac{1}{n} \left( \frac{J}{m} \sum_{j=1}^{J} Z_j Y_j - \frac{J}{J - m} \sum_{j=1}^{J} (1 - Z_j) Y_j \right), \qquad \hat{\mu}_D = \frac{1}{n} \left( \frac{J}{m} \sum_{j=1}^{J} Z_j D_j - \frac{J}{J - m} \sum_{j=1}^{J} (1 - Z_j) D_j \right),$$

where $E[\hat{\mu}_Y \mid \mathcal{F}, \mathcal{Z}] = \mu_Y$ and $E[\hat{\mu}_D \mid \mathcal{F}, \mathcal{Z}] = \mu_D$.

Next, we review terms and assumptions that are specific to noncompliance; see Imbens and Angrist (1994), Angrist et al. (1996), Hernán and Robins (2006), Baiocchi et al. (2014), and references therein for more information. We assume that the average compliance rate is non-zero and that no individual systematically defies his/her treatment assignment, also referred to as monotonicity (Imbens and Angrist 1994).

Assumption 2 (Non-Zero Causal Effect). The treatment assignment $Z_j$ induces, on average, changes in compliance $D_{ij}$, i.e. $\mu_D \neq 0$.

Assumption 3 (Monotonicity). There is no individual who systematically defies the treatment assignment, i.e. $D_{ij}^{(0)} \leq D_{ij}^{(1)}$ for all $ij$.
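The unbiasedness of the difference-of-sums ITT estimators under Assumption 1 can be checked numerically on a toy example. The sketch below is only an illustration: the function names, cluster sizes, and potential cluster totals are hypothetical and not taken from the paper. It enumerates every assignment with exactly $m$ treated clusters and averages the estimator over them.

```python
from itertools import combinations


def itt_estimate(z, totals, n):
    """Difference of scaled cluster totals, divided by the number of units n.

    z      : list of 0/1 cluster assignments (1 = treated)
    totals : list of observed cluster sums (Y_j or D_j)
    n      : total number of units across all clusters
    """
    J, m = len(z), sum(z)
    if not 0 < m < J:
        raise ValueError("need at least one treated and one control cluster")
    treated = sum(t for zj, t in zip(z, totals) if zj == 1)
    control = sum(t for zj, t in zip(z, totals) if zj == 0)
    return (J / m * treated - J / (J - m) * control) / n


def average_over_assignments(totals_if_treated, totals_if_control, m, n):
    """Average the estimator over all assignments with exactly m treated clusters."""
    J = len(totals_if_treated)
    estimates = []
    for treated_set in combinations(range(J), m):
        z = [1 if j in treated_set else 0 for j in range(J)]
        observed = [t1 if zj else t0
                    for zj, t1, t0 in zip(z, totals_if_treated, totals_if_control)]
        estimates.append(itt_estimate(z, observed, n))
    return sum(estimates) / len(estimates)


# Hypothetical potential cluster totals for J = 4 clusters (illustrative values only).
sizes = [10, 20, 10, 20]
y_treated = [30.0, 50.0, 25.0, 44.0]   # cluster sums if assigned treatment
y_control = [12.0, 22.0, 10.0, 20.0]   # cluster sums if assigned control
n = sum(sizes)

true_itt = (sum(y_treated) - sum(y_control)) / n
avg_estimate = average_over_assignments(y_treated, y_control, m=2, n=n)
print(true_itt, avg_estimate)
```

Applying the same function to the cluster sums of compliance $D_j$ gives $\hat{\mu}_D$, and the ratio of the two estimates is the usual IV-style ratio of the ITT effect to the compliance rate.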

We briefly make the following comments about Assumptions 2 and 3. If Assumption 1 holds, Assumption 2 can be verified from data by using the estimator $\hat{\mu}_D$. However, Assumption 3 cannot be verified from data because it requires observing both potential compliance values $D_{ij}^{(0)}$ and $D_{ij}^{(1)}$ for every individual. Each individual $i$ in cluster $j$ can be categorized into one of four types, always-takers, never-takers, compliers, and defiers, based on his/her potential compliances $(D_{ij}^{(1)}, D_{ij}^{(0)})$ (Angrist et al. 1996). Always-takers are individuals where $D_{ij}^{(1)} = D_{ij}^{(0)} = 1$; these individuals would use the smartcard payment system irrespective of the treatment assignment for their village. Never-takers are individuals where $D_{ij}^{(1)} = D_{ij}^{(0)} = 0$; these individuals would never use smartcard payments irrespective of the village treatment assignment. Compliers are individuals where $D_{ij}^{(1)} = 1$ and $D_{ij}^{(0)} = 0$; these individuals comply with their treatment assignments. That is, they use smartcard payments only when their village is assigned to treatment. Under this categorization of individuals, Assumption 3 states that there are no defiers in the study population, i.e. no one with $D_{ij}^{(1)} = 0$ and $D_{ij}^{(0)} = 1$, who would use the smartcard system when their village is not assigned to treatment and would not use it when their village is assigned. Assumption 3 can hold by experimental design if units receiving the control cannot obtain the treatment, i.e. $D_{ij}^{(0)} = 0$; this is known as one-sided noncompliance in the literature. In the smartcard intervention, noncompliance was one-sided, since the key technology for payment verification was not made available to control villages.

Next, Assumptions 2 and 3 do not imply each other. For instance, if Assumption 3 holds, Assumption 2 may fail to hold if all individuals have $D_{ij}^{(1)} = D_{ij}^{(0)}$. Conversely, if Assumption 2 holds, Assumption 3 may fail to hold if the proportion of individuals with $D_{ij}^{(1)} = 1$ and $D_{ij}^{(0)} = 0$ is higher than the proportion of individuals with $D_{ij}^{(1)} = 0$ and $D_{ij}^{(0)} = 1$.

Finally, we assume that conditional on compliance, the initial treatment assignment has no impact on the outcome.
This is commonly known as the no direct effect or the exclusion restriction assumption.

Assumption 4 (Exclusion Restriction). Conditional on the compliance value $d$, the treatment assignment has no effect on the outcome: for all $d \in \{0, 1\}$, $Y_{ij}^{(1,d)} = Y_{ij}^{(0,d)} = Y_{ij}^{(d)}$.

Similar to Assumption 3, Assumption 4 cannot be verified by data because it requires observing both potential outcomes $Y_{ij}^{(1,d)}$ and $Y_{ij}^{(0,d)}$. Also, designing an experiment where Assumption 4 holds can be difficult. For example, the exclusion restriction is often more credible in double-blinded trials in medicine (Hernán and Robins 2006). In general, the validity of Assumption 4 is judged based on substantive knowledge about the treatment and the outcome. For example, in this study, for Assumption 4 to hold, it must be the case that being assigned to smartcard payments has no direct effect on the outcome except through exposure to the intervention. That is, being assigned to the smartcard payment system can only reduce payment delays through actual use of smartcard payments. This seems likely to hold, since the reduced payment times are directly facilitated by the technology that underlies the intervention.

2.3 Complier Average Treatment Effect

In CRTs with noncompliance, the causal estimand of interest is the population complier average causal effect (CACE), which is the average effect of actually taking the treatment (i.e. when $D_{ij}$ is set to 1) versus not taking the treatment (i.e. when $D_{ij}$ is set to 0) among compliers. We denote the population CACE as $\tau$:

$$\tau = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} \left( Y_{ij}^{(1)} - Y_{ij}^{(0)} \right) I(D_{ij}^{(1)} = 1, D_{ij}^{(0)} = 0)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} I(D_{ij}^{(1)} = 1, D_{ij}^{(0)} = 0)}. \tag{1}$$

The parameter $\tau$ is also referred to as a local causal effect because it only describes the causal effect among a subgroup of individuals in the population, specifically the compliers. Thus, we can describe the effect of using smartcard payments among those recipients who were monotonically induced to use smartcards when exposed to the intervention. Prior literature has shown that under Assumptions 1-4, $\tau$ is identified by taking the ratio of the ITT effect $\mu_Y$ to the compliance rate $\mu_D$, i.e. $\mu_Y / \mu_D$ (Imbens and Angrist 1994; Angrist et al. 1996).

In the context of a CRT, we can decompose $\tau$ to understand how the CACE varies by cluster. Let $n_{CO,j} > 0$ be the number of compliers in cluster $j$, $n_{CO}$ be the total number of compliers across all clusters, and $\tau_j$ be the CACE for cluster $j$, i.e.

$$n_{CO,j} = \sum_{i=1}^{n_j} I(D_{ij}^{(1)} = 1, D_{ij}^{(0)} = 0), \qquad n_{CO} = \sum_{j=1}^{J} n_{CO,j}, \qquad \tau_j = \frac{\sum_{i=1}^{n_j} \left( Y_{ij}^{(1)} - Y_{ij}^{(0)} \right) I(D_{ij}^{(1)} = 1, D_{ij}^{(0)} = 0)}{n_{CO,j}}.$$

Then, we can rewrite $\tau$ as a weighted average of cluster-specific CACEs, where the weights, denoted as $w_1, \ldots, w_J$, are functions of the number of compliers in each cluster:

$$\tau = \sum_{j=1}^{J} w_j \tau_j, \qquad w_j = \frac{n_{CO,j}}{\sum_{l=1}^{J} n_{CO,l}}. \tag{2}$$

If the CACEs are constant across clusters so that $\tau_j = \tau_k$ for $j \neq k$, the population CACE $\tau$ is equal to the cluster CACE $\tau_j$. It is common to assume constant effects in econometrics, where a structural model between the outcome $Y_{ij}$ and the compliance $D_{ij}$ is assumed, for instance $Y_{ij} = \beta_0 + \beta_1 D_{ij} + \epsilon_{ij}$, where $\beta_0, \beta_1$ are parameters and $\epsilon_{ij}$ is a mean-zero error term that is independent of the treatment assignment variable $Z_j$; algebra shows that $\beta_1 = \tau$ (Holland 1988; Hernán and Robins 2006; Wooldridge 2010; Kang et al.). Finally, we make a technical note that the decomposition in equation 2 will still hold if some clusters have no compliers, i.e. clusters $j$ where $n_{CO,j} = 0$; instead of summing over all clusters in equation 2, we would simply sum over clusters $j$ with $n_{CO,j} > 0$ and our results below will hold. For ease of exposition, we assume $n_{CO,j} > 0$ for every cluster, which is reasonable in practice and holds in our data.

We can formalize testing for $\tau$ as follows. For any $\tau_0 \in \mathbb{R}$, let $H_0$ denote the null hypothesis for $\tau$:

$$H_0 : \tau = \tau_0, \qquad \tau_0 \in \mathbb{R}. \tag{3}$$

The null hypothesis $H_0$ in equation 3 is a composite null hypothesis because there are several values of $\mathcal{F}$ for which the null holds. In other words, $H_0$ is not a sharp null hypothesis (Fisher 1935); under a sharp null, we would be able to specify exactly the unobserved potential outcomes. In fact, the sharp null of no ITT effect, $Y_{ij}^{(1, D_{ij}^{(1)})} = Y_{ij}^{(0, D_{ij}^{(0)})}$ for all $ij$, implies $H_0 : \tau = 0$, but the converse is not necessarily true; there are other values of $\mathcal{F}$ that satisfy the null hypothesis $H_0 : \tau = 0$. Next, we describe extant procedures for learning about the CACE in CRTs with noncompliance.

3 Extant Methods for Analysis of CRTs with Noncompliance

We review two popular methods of inferring the CACE in CRTs with noncompliance. The first method, outlined in Schochet (2013) and Hansen and Bowers (2009), aggregates individual observations at the cluster level and conducts a statistical analysis with the aggregate cluster-level quantities. However, the methods in Hansen and Bowers (2009) are only designed for binary outcomes. The second method applies two-stage least squares (TSLS) to the unit-level data and uses a robust standard error to account for within-cluster correlations. A third approach, which we do not explore, is to implement IV methods using random effects models (Jo et al. 2008b; Small et al.). These methods are used less frequently.

3.1 The Cluster-Level Method

Cluster-level methods analyze CRTs with noncompliance at the cluster level. Specifically, the investigator takes averages or sums of individual outcomes and compliances within the clusters, and then treats the study as though it only took place at the cluster level, so that the number of clusters $J$ essentially serves as the effective population size (Schochet 2013; Hansen and Bowers 2009). Formally, given cluster-level average outcomes $\bar{Y}_j$ and compliances $\bar{D}_j$, we define the average outcomes and compliances by treatment status

$$\bar{Y}_T = \frac{1}{m} \sum_{j=1}^{J} Z_j \bar{Y}_j, \quad \bar{Y}_C = \frac{1}{J - m} \sum_{j=1}^{J} (1 - Z_j) \bar{Y}_j, \quad \bar{D}_T = \frac{1}{m} \sum_{j=1}^{J} Z_j \bar{D}_j, \quad \bar{D}_C = \frac{1}{J - m} \sum_{j=1}^{J} (1 - Z_j) \bar{D}_j.$$

Here, $\bar{Y}_T$ and $\bar{Y}_C$ represent the average $\bar{Y}_j$ among treated and control clusters, respectively. Similarly, $\bar{D}_T$ and $\bar{D}_C$ represent the average $\bar{D}_j$ among treated and control clusters. The four quantities $\bar{Y}_T$, $\bar{Y}_C$, $\bar{D}_T$ and $\bar{D}_C$ are used to define a Wald-like estimator for $\tau$, which we denote as $\hat{\tau}_{CL}$:

$$\hat{\tau}_{CL} = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{D}_T - \bar{D}_C}. \tag{4}$$

The numerator of $\hat{\tau}_{CL}$ in equation 4 is intended to capture the ITT effect of the treatment assignment on the outcome. The denominator of $\hat{\tau}_{CL}$ in equation 4 is intended to capture the effect of treatment assignment on compliance.

For testing the null hypothesis about $\tau$ in equation 3, Schochet (2013) uses a Taylor approximation of the estimator $\hat{\tau}_{CL}$. Specifically, the estimated variances for $\bar{Y}_T - \bar{Y}_C$ and $\bar{D}_T - \bar{D}_C$ are

$$\widehat{\mathrm{Var}}(\bar{Y}_T - \bar{Y}_C) = \left( \frac{1}{m} + \frac{1}{J - m} \right) S_Y^2, \qquad \widehat{\mathrm{Var}}(\bar{D}_T - \bar{D}_C) = \left( \frac{1}{m} + \frac{1}{J - m} \right) S_D^2,$$

where

$$S_Y^2 = \frac{\sum_{j=1}^{J} Z_j (\bar{Y}_j - \bar{Y}_T)^2 + \sum_{j=1}^{J} (1 - Z_j)(\bar{Y}_j - \bar{Y}_C)^2}{J - 2}, \qquad S_D^2 = \frac{\sum_{j=1}^{J} Z_j (\bar{D}_j - \bar{D}_T)^2 + \sum_{j=1}^{J} (1 - Z_j)(\bar{D}_j - \bar{D}_C)^2}{J - 2}.$$
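To make the cluster-level quantities concrete, the following sketch computes the Wald-like ratio in equation 4 and the pooled sums of squares $S_Y^2$ and $S_D^2$ (with the $J - 2$ divisor) from cluster averages. The function names and the six-cluster data set are hypothetical and chosen only for illustration; any further scaling needed to turn the pooled quantities into variances of the differences is not included.

```python
def arm_means(z, values):
    """Return (mean among treated clusters, mean among control clusters)."""
    J, m = len(z), sum(z)
    if not 0 < m < J:
        raise ValueError("need at least one treated and one control cluster")
    mean_t = sum(v for zj, v in zip(z, values) if zj == 1) / m
    mean_c = sum(v for zj, v in zip(z, values) if zj == 0) / (J - m)
    return mean_t, mean_c


def cluster_level_wald(z, ybar, dbar):
    """Ratio of the difference in mean outcomes to the difference in mean compliance."""
    y_t, y_c = arm_means(z, ybar)
    d_t, d_c = arm_means(z, dbar)
    if d_t == d_c:
        raise ZeroDivisionError("no difference in compliance between arms")
    return (y_t - y_c) / (d_t - d_c)


def pooled_variance(z, values):
    """Pooled within-arm sum of squared deviations divided by J - 2."""
    J = len(z)
    if J <= 2:
        raise ValueError("need more than two clusters")
    mean_t, mean_c = arm_means(z, values)
    ss = sum((v - (mean_t if zj == 1 else mean_c)) ** 2 for zj, v in zip(z, values))
    return ss / (J - 2)


# Hypothetical cluster averages for J = 6 clusters with one-sided noncompliance.
z = [1, 1, 1, 0, 0, 0]
ybar = [3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
dbar = [0.6, 0.8, 0.7, 0.0, 0.0, 0.0]

tau_cl = cluster_level_wald(z, ybar, dbar)
s2_y = pooled_variance(z, ybar)
s2_d = pooled_variance(z, dbar)
print(tau_cl, s2_y, s2_d)
```

Note that every cluster contributes one average regardless of its size, which is why the cluster-level method treats a cluster of 10 units and a cluster of 80 units identically.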

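Additionally, the estimated covariance between $\bar{Y}_T - \bar{Y}_C$ and $\bar{D}_T - \bar{D}_C$ is

$$\widehat{\mathrm{Cov}}(\bar{Y}_T - \bar{Y}_C, \bar{D}_T - \bar{D}_C) = \left( \frac{1}{m} + \frac{1}{J - m} \right) \frac{\sum_{j=1}^{J} Z_j (\bar{Y}_j - \bar{Y}_T)(\bar{D}_j - \bar{D}_T) + \sum_{j=1}^{J} (1 - Z_j)(\bar{Y}_j - \bar{Y}_C)(\bar{D}_j - \bar{D}_C)}{J - 2}.$$

Using the estimated variances and covariance, Schochet (2013) proposes the following estimator for the variance of $\hat{\tau}_{CL}$, based on the Delta Method for estimating the variance of ratios like $\hat{\tau}_{CL}$:

$$\widehat{\mathrm{Var}}(\hat{\tau}_{CL}) = \frac{\widehat{\mathrm{Var}}(\bar{Y}_T - \bar{Y}_C)}{(\bar{D}_T - \bar{D}_C)^2} + \hat{\tau}_{CL}^2 \frac{\widehat{\mathrm{Var}}(\bar{D}_T - \bar{D}_C)}{(\bar{D}_T - \bar{D}_C)^2} - 2 \hat{\tau}_{CL} \frac{\widehat{\mathrm{Cov}}(\bar{Y}_T - \bar{Y}_C, \bar{D}_T - \bar{D}_C)}{(\bar{D}_T - \bar{D}_C)^2}.$$

Also, for any $\alpha \in (0, 1)$, the corresponding $1 - \alpha$ confidence interval for $\tau$ is

$$\hat{\tau}_{CL} \pm z_{1 - \alpha/2} \sqrt{\widehat{\mathrm{Var}}(\hat{\tau}_{CL})}, \tag{5}$$

where $z_{1 - \alpha/2}$ is the $1 - \alpha/2$ quantile of the standard Normal distribution. If $\tau_0$ in $H_0$ is included in the interval, then $H_0$ is retained at the $\alpha$ level. Otherwise, $H_0$ is rejected in favor of the two-sided alternative $H_1 : \tau \neq \tau_0$.

While this approach to testing $H_0$ is straightforward, its validity, say whether the $1 - \alpha$ confidence interval in equation 5 actually covers the true value $\tau$ with probability $1 - \alpha$, depends on an asymptotic argument where (i) the number of clusters $J$ grows to infinity and (ii) the denominator of $\hat{\tau}_{CL}$ stays far away from zero as $J \to \infty$. It is well known that the Delta Method can perform poorly when compliance rates are low (Bound et al. 1995; Kang et al.). See Hansen and Bowers (2009) for a sample-theoretic approach that is closer to the methods we outline below.

3.2 The Unit-Level Method

Unit-level methods analyze CRTs with noncompliance at the unit level using individual measurements, rather than at the cluster level using the summary statistics in Section 3.1. For inference, unit-level methods use robust variance estimation methods that take into account intracluster correlations. This approach is a generalization of clustered standard errors developed by Liang and Zeger (1986). Here, point estimation of the CACE relies on two-stage least squares (TSLS) applied to the unit-level outcomes, compliance indicators, and group-level treatment assignments. Formally, we define the vectorized outcome variable $Y = (Y_{11}, Y_{21}, \ldots, Y_{n_1 1}, Y_{12}, \ldots, Y_{n_J J})$, the vectorized compliance variable $D = (D_{11}, D_{21}, \ldots, D_{n_1 1}, D_{12}, \ldots, D_{n_J J})$, and the vectorized treatment assignment $Z = (Z_{11}, Z_{21}, \ldots, Z_{n_1 1}, Z_{12}, \ldots, Z_{n_J J})$. We let $\hat{D}$ be the predicted compliance vector based on regressing $D$ on $Z$ with an intercept term. Then, the TSLS estimator of $\tau$, denoted as $\hat{\tau}_{TSLS}$, is the estimated coefficient of $\hat{D}$ from running an ordinary least squares (OLS) regression of $Y$ on $\hat{D}$ with an intercept. Explicitly, this two-stage process can be expressed as

$$\hat{\tau}_{TSLS} = \frac{\dfrac{\sum_{j} Z_j Y_j}{\sum_{j} Z_j n_j} - \dfrac{\sum_{j} (1 - Z_j) Y_j}{\sum_{j} (1 - Z_j) n_j}}{\dfrac{\sum_{j} Z_j D_j}{\sum_{j} Z_j n_j} - \dfrac{\sum_{j} (1 - Z_j) D_j}{\sum_{j} (1 - Z_j) n_j}} \tag{6}$$

$$= \frac{\sum_{j} (1 - Z_j) n_j \sum_{j} Z_j Y_j - \sum_{j} Z_j n_j \sum_{j} (1 - Z_j) Y_j}{\sum_{j} (1 - Z_j) n_j \sum_{j} Z_j D_j - \sum_{j} Z_j n_j \sum_{j} (1 - Z_j) D_j}, \tag{7}$$

where all sums run over $j = 1, \ldots, J$.

For testing the null hypothesis about $\tau$ in equation 3, let $u_{ij} = Y_{ij} - D_{ij} \hat{\tau}_{TSLS}$ denote the residual and $u_j = \sum_{i=1}^{n_j} u_{ij}$. Also, let $\hat{D}_j = \sum_{i=1}^{n_j} \hat{D}_{ij}$, $\widehat{D^2}_j = \sum_{i=1}^{n_j} \hat{D}_{ij}^2$, and $\widehat{Du}_j = \sum_{i=1}^{n_j} \hat{D}_{ij} u_{ij}$. Then, following Liang and Zeger (1986), Wooldridge (2010), and Cameron and Miller (2015), the estimated cluster-robust variance is

$$\widehat{\mathrm{Var}}(\hat{\tau}_{TSLS}) = \frac{\left( \sum_{j} \hat{D}_j \right)^2 \sum_{j} u_j^2 + n^2 \sum_{j} \widehat{Du}_j^{\,2} - 2n \left( \sum_{j} \hat{D}_j \right) \sum_{j} \widehat{Du}_j \, u_j}{\left( n \sum_{j} \widehat{D^2}_j - \left( \sum_{j} \hat{D}_j \right)^2 \right)^2}.$$

Also, standard econometric arguments (see Chapter 5.2 of Wooldridge 2010) can be used to construct a $1 - \alpha$ confidence interval for $\tau$:

$$\hat{\tau}_{TSLS} \pm z_{1 - \alpha/2} \sqrt{\widehat{\mathrm{Var}}(\hat{\tau}_{TSLS})}. \tag{8}$$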
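As a quick check on the point estimator, the sketch below evaluates the two algebraic forms in equations 6 and 7 from cluster totals and compares them with the ratio of unit-level sample covariances, $\mathrm{Cov}(Z, Y) / \mathrm{Cov}(Z, D)$, which is the standard just-identified IV estimator with a single binary instrument. The function names and the four-cluster data are hypothetical; this is an illustration, not the authors' implementation.

```python
def tsls_ratio_of_means(z, sizes, y_sums, d_sums):
    """Equation 6: difference in unit-level means over difference in compliance rates."""
    n_t = sum(s for zj, s in zip(z, sizes) if zj == 1)
    n_c = sum(s for zj, s in zip(z, sizes) if zj == 0)
    y_t = sum(y for zj, y in zip(z, y_sums) if zj == 1)
    y_c = sum(y for zj, y in zip(z, y_sums) if zj == 0)
    d_t = sum(d for zj, d in zip(z, d_sums) if zj == 1)
    d_c = sum(d for zj, d in zip(z, d_sums) if zj == 0)
    return (y_t / n_t - y_c / n_c) / (d_t / n_t - d_c / n_c)


def tsls_cross_product(z, sizes, y_sums, d_sums):
    """Equation 7: the same ratio after clearing the denominators."""
    n_t = sum(s for zj, s in zip(z, sizes) if zj == 1)
    n_c = sum(s for zj, s in zip(z, sizes) if zj == 0)
    y_t = sum(y for zj, y in zip(z, y_sums) if zj == 1)
    y_c = sum(y for zj, y in zip(z, y_sums) if zj == 0)
    d_t = sum(d for zj, d in zip(z, d_sums) if zj == 1)
    d_c = sum(d for zj, d in zip(z, d_sums) if zj == 0)
    return (n_c * y_t - n_t * y_c) / (n_c * d_t - n_t * d_c)


def iv_covariance_ratio(z, sizes, y_sums, d_sums):
    """Unit-level Cov(Z, Y) / Cov(Z, D), written in terms of cluster totals."""
    n = sum(sizes)
    n_t = sum(s for zj, s in zip(z, sizes) if zj == 1)
    zy = sum(y for zj, y in zip(z, y_sums) if zj == 1)   # sum over units of Z * Y
    zd = sum(d for zj, d in zip(z, d_sums) if zj == 1)   # sum over units of Z * D
    cov_zy = n * zy - n_t * sum(y_sums)
    cov_zd = n * zd - n_t * sum(d_sums)
    return cov_zy / cov_zd


# Hypothetical cluster totals: J = 4 clusters of unequal size, one-sided noncompliance.
z = [1, 0, 1, 0]
sizes = [10, 20, 30, 40]
y_sums = [30.0, 20.0, 60.0, 40.0]
d_sums = [8, 0, 15, 0]

est_6 = tsls_ratio_of_means(z, sizes, y_sums, d_sums)
est_7 = tsls_cross_product(z, sizes, y_sums, d_sums)
est_iv = iv_covariance_ratio(z, sizes, y_sums, d_sums)
print(est_6, est_7, est_iv)
```

Because the estimator pools units across clusters before taking means, larger clusters contribute more to both the numerator and the denominator than they do under the cluster-level method.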

An advantage of this approach is that standard statistical software can be used. However, the validity of the confidence interval in equation 8 relies on the same two sets of assumptions as in Section 3.1: (i) the number of clusters $J$ goes to infinity and (ii) the denominator of $\hat{\tau}_{TSLS}$ stays far away from zero as $J \to \infty$. Prior simulation studies suggest that at least 40 clusters are necessary for the two sets of assumptions to be reasonable (Angrist and Pischke).

4 Identification of CACE with Existing Methods

4.1 Fixed Population

While the two methods in Sections 3.1 and 3.2, with their respective point estimators $\hat{\tau}_{CL}$ and $\hat{\tau}_{TSLS}$, claim to identify the CACE, a closer examination reveals that this is only true in very narrow settings where (i) the number of units in all clusters is the same or (ii) the cluster-level CACE is identical across all clusters. In this section, we explore the identifiability of $\hat{\tau}_{CL}$ and $\hat{\tau}_{TSLS}$ when the population $\mathcal{F}$ is assumed to be fixed. The two propositions below study the identification of $\hat{\tau}_{CL}$, the estimator of $\tau$ based on looking at the CRT data at the cluster level, and $\hat{\tau}_{TSLS}$, the estimator of $\tau$ based on using TSLS.

Proposition 1. For each cluster $j$, define the following weights

$$w_{CL,j} = \frac{n_{CO,j} / n_j}{\sum_{l=1}^{J} n_{CO,l} / n_l},$$

where $n_{CO,j}$ is the number of compliers in cluster $j$. If Assumptions 1-4 hold, $\hat{\tau}_{CL}$ identifies a weighted complier average treatment effect

$$\tau_{CL} = \frac{E[\bar{Y}_T - \bar{Y}_C \mid \mathcal{F}, \mathcal{Z}]}{E[\bar{D}_T - \bar{D}_C \mid \mathcal{F}, \mathcal{Z}]} = \sum_{j=1}^{J} w_{CL,j} \tau_j,$$

where $\tau_j$ is the cluster-level complier average treatment effect.
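Proposition 1 can be illustrated by comparing the cluster-level weights $w_{CL,j}$ with the weights $w_j$ from equation 2 that define the true CACE. The sketch below uses a hypothetical two-cluster population (sizes, complier counts, and cluster-level effects are made up for illustration) and shows that the two weighted averages differ when both cluster size and $\tau_j$ vary, but agree when $\tau_j$ is constant.

```python
def normalize(raw):
    """Scale non-negative raw weights so that they sum to one."""
    total = sum(raw)
    return [r / total for r in raw]


def cace_weights(n_co):
    """Equation 2: weights proportional to the number of compliers per cluster."""
    return normalize(n_co)


def cluster_level_weights(sizes, n_co):
    """Proposition 1: weights proportional to the complier proportion per cluster."""
    return normalize([c / s for c, s in zip(n_co, sizes)])


def weighted_average(weights, tau):
    return sum(w * t for w, t in zip(weights, tau))


# Hypothetical population: one large low-compliance cluster, one small high-compliance cluster.
sizes = [50, 10]
n_co = [10, 8]
tau_j = [1.0, 3.0]   # assumed cluster-level CACEs

tau_true = weighted_average(cace_weights(n_co), tau_j)
tau_cl = weighted_average(cluster_level_weights(sizes, n_co), tau_j)

# With homogeneous cluster-level effects, both weightings return the common value.
tau_true_const = weighted_average(cace_weights(n_co), [2.0, 2.0])
tau_cl_const = weighted_average(cluster_level_weights(sizes, n_co), [2.0, 2.0])
print(tau_true, tau_cl, tau_true_const, tau_cl_const)
```

In this made-up example the small cluster holds fewer than half of the compliers but receives 80% of the cluster-level weight, so the identified value is pulled toward its larger effect.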

Proposition 2. For each cluster $j$, define the following weights

$$w_{TSLS,j} = \frac{n_{CO,j} (n - n_j)}{\sum_{l=1}^{J} n_{CO,l} (n - n_l)},$$

where $n_{CO,j}$ is the number of compliers in cluster $j$. If Assumptions 1-4 hold, $\hat{\tau}_{TSLS}$ identifies a weighted complier average treatment effect

$$\tau_{TSLS} = \frac{E\left[ \sum_{j} (1 - Z_j) n_j \sum_{j} Z_j Y_j - \sum_{j} Z_j n_j \sum_{j} (1 - Z_j) Y_j \mid \mathcal{F}, \mathcal{Z} \right]}{E\left[ \sum_{j} (1 - Z_j) n_j \sum_{j} Z_j D_j - \sum_{j} Z_j n_j \sum_{j} (1 - Z_j) D_j \mid \mathcal{F}, \mathcal{Z} \right]} = \sum_{j=1}^{J} w_{TSLS,j} \tau_j,$$

where $\tau_j$ is the cluster-level complier average treatment effect.

Propositions 1 and 2 show that the two primary methods of analyzing CRT data identify $\tau_{CL}$ and $\tau_{TSLS}$, which may not always equal the CACE because of how each method weights cluster-level complier average treatment effects. The cluster-level $\tau_{CL}$ averages the cluster-level CACE by the proportion of compliers, $n_{CO,j} / n_j$; it does not take into consideration the size of each cluster. The individual-level $\tau_{TSLS}$ averages the cluster-level CACE by the product of the number of compliers per cluster and the number of individuals in other clusters, viz. $n_{CO,j} (n - n_j)$.

Tables 1 and 2 provide a numerical demonstration of the consequences of the weights on identifying $\tau$ in a CRT with $J = 3$ clusters. In Table 1, the compliance rates across the three clusters are identical and set to 50%, whereas in Table 2, the compliance rates differ across the three clusters, ranging from 10% to 80%. In both tables, the cluster sizes $n_j$ vary and the complier average treatment effects $\tau_j$ vary.

In Table 1, when the compliance rates are identical, the cluster-based method $\tau_{CL}$ gives disproportionately large weights to the small clusters of size 10, $j = 2, 3$, and consequently, all clusters, despite their differences in size, are weighted equally with weights $w_{CL,1} = w_{CL,2} = w_{CL,3} = 1/3$. The TSLS method $\tau_{TSLS}$ also gives disproportionately large weights to small clusters. But, unlike the cluster-based $\tau_{CL}$, it still takes the size of each cluster into consideration when weighing the cluster-level CACE and thus each cluster gets slightly different weights, $w_{TSLS,1} = 0.47$ and $w_{TSLS,2} = w_{TSLS,3} = 0.26$. Finally, the true CACE $\tau$ weights the cluster-level CACE proportional to the number of compliers in each cluster, with weights $w_1 = 0.8$ and $w_2 = w_3 = 0.1$. Because both methods shift weight toward the small clusters, which have the larger cluster-level CACEs in this example, both the cluster-level method and the TSLS method overestimate the true CACE. However, under the scenario where the compliance rates are identical across clusters, the two-stage least squares estimator preserves some elements of the weights associated with the true CACE $\tau$, in that clusters of identical size receive identical weights and the absolute number of units in each cluster is taken into account. In contrast, the cluster-level method ignores the size of the cluster and weights each cluster equally.

Table 1: A numerical example of a CRT with three clusters ($J = 3$) and identical compliance rates.

Cluster    Cluster size n_j    Number of compliers n_CO,j    Compliance rate    CACE per cluster τ_j
j = 1      n_1 = 80            n_CO,1 = 40                   50%                τ_1 = 1
j = 2      n_2 = 10            n_CO,2 = 5                    50%                τ_2 = 2
j = 3      n_3 = 10            n_CO,3 = 5                    50%                τ_3 = 1.5
J = 3      n = 100             n_CO = 50                                        τ, τ_CL, τ_TSLS

Note: $\tau$ is the true CACE, $\tau_{CL}$ is the identified value based on the cluster-level estimator $\hat{\tau}_{CL}$, and $\tau_{TSLS}$ is the identified value based on TSLS $\hat{\tau}_{TSLS}$.

Table 2 examines the same scenario as Table 1 except that the compliance rates vary across clusters. In particular, small clusters have higher compliance rates than large clusters (compliance rates of 80% versus 10%), but the number of compliers remains identical across clusters. Similar to Table 1, we see that both the cluster-based method $\tau_{CL}$ and the TSLS $\tau_{TSLS}$ tend to up-weigh the smaller clusters and down-weigh the large clusters in comparison to $\tau$. We also see that, similar to Table 1, the TSLS method tends to be closer to the true CACE $\tau$ than the cluster-based method; however, both methods tend to overestimate the true CACE.

Table 2: A numerical example of a CRT with three clusters ($J = 3$) and different compliance rates.

Cluster    Cluster size n_j    Number of compliers n_CO,j    Compliance rate    CACE per cluster τ_j
j = 1      n_1 = 80            n_CO,1 = 8                    10%                τ_1 = 1
j = 2      n_2 = 10            n_CO,2 = 8                    80%                τ_2 = 2
j = 3      n_3 = 10            n_CO,3 = 8                    80%                τ_3 = 1.5
J = 3      n = 100             n_CO = 24                                        τ, τ_CL, τ_TSLS

Note: $\tau$ is the true CACE, $\tau_{CL}$ is the identified value based on the cluster-level estimator $\hat{\tau}_{CL}$, and $\tau_{TSLS}$ is the identified value based on TSLS $\hat{\tau}_{TSLS}$.

The following Corollary shows that we can identify the desired complier average treatment effect $\tau$ using the cluster-based method $\tau_{CL}$ and the TSLS method $\tau_{TSLS}$ if either the clusters are of equal size or the cluster-level CACE is identical across clusters.

Corollary 1. Suppose either condition below holds:

1. The number of units in each cluster $n_j$ is identical across all clusters.
2. The complier average treatment effect per cluster $\tau_j$ is homogeneous/identical across all clusters.

Then, $\tau_{CL}$ and $\tau_{TSLS}$ identify the complier average treatment effect $\tau$.

The first condition in Corollary 1 allows for heterogeneity in complier average treatment effects across clusters, but stipulates that the cluster size must be the same for the cluster-based and TSLS methods to identify $\tau$. The second condition allows the cluster size to vary, but stipulates that every cluster must have the same cluster-level CACE for the cluster-based and TSLS methods to identify $\tau$. As we noted above, the latter condition is common in econometrics in the form of a linear structural modeling assumption between $Y$ and $D$, where the coefficient relating $D$ to $Y$ is assumed to be constant (Wooldridge 2010). We also note that the conditions stated in Corollary 1 are sufficient conditions for $\tau_{CL}$ and $\tau_{TSLS}$ to identify $\tau$.

In summary, under a fixed population $\mathcal{F}$, the two popular methods for analyzing CRT data with noncompliance may not always identify the target causal parameter of interest, the CACE $\tau$. This can be seen by looking at the weights that make up $\tau_{CL}$ and $\tau_{TSLS}$, namely $w_{CL,j}$ and $w_{TSLS,j}$, which differ from the weights of $\tau$ in equation 2. Indeed, only under the restrictive conditions stated in Corollary 1 can the two popular methods actually identify the CACE.

4.2 Growing Population

The previous section discussed identification of $\tau$ under the case where the population $\mathcal{F}$ remained fixed. However, for testing hypotheses about $\tau$ in equation 3, one often assumes that the sample size $n \to \infty$ and, consequently, that $\mathcal{F}$ is growing. Here, we consider whether the cluster-based and unit-based methods identify the true $\tau$ in this asymptotic regime, even though these methods generally do not in the finite sample case.

In the CRT setting, there are primarily two ways in which we might imagine $n \to \infty$. The first way is where the number of clusters $J$ is getting larger, but the number of units per cluster $n_j$ remains bounded. This regime is the basis for justifying the testing properties of many estimators in the CRT literature, including both $\hat{\tau}_{CL}$ and $\hat{\tau}_{TSLS}$. This regime also serves as the basis for justifying the inferential properties of many causal estimators outside of the CRT literature; see Li and Ding (2017) for a recent review. In practice, this regime is relevant in CRT designs that use the household as the cluster and where the number of households is large compared to the number of individuals that make up the household. For example, this type of design is common in the political science literature on the effectiveness of voter mobilization strategies.
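The weights discussed for Tables 1 and 2 can be reproduced directly from the cluster sizes and complier counts. The sketch below is an illustration with hypothetical function names; it computes the true CACE weights of equation 2 and the TSLS weights of Proposition 2 for the two table configurations, and checks the equal-cluster-size condition of the Corollary on a made-up third configuration.

```python
def normalize(raw):
    """Scale non-negative raw weights so that they sum to one."""
    total = sum(raw)
    return [r / total for r in raw]


def true_weights(n_co):
    """Equation 2: proportional to the number of compliers in each cluster."""
    return normalize(n_co)


def tsls_weights(sizes, n_co):
    """Proposition 2: proportional to n_CO,j * (n - n_j)."""
    n = sum(sizes)
    return normalize([c * (n - s) for c, s in zip(n_co, sizes)])


sizes = [80, 10, 10]           # cluster sizes used in both tables
n_co_table1 = [40, 5, 5]       # 50% compliance in every cluster
n_co_table2 = [8, 8, 8]        # 10%, 80%, 80% compliance

w_true_1 = true_weights(n_co_table1)
w_tsls_1 = tsls_weights(sizes, n_co_table1)
w_true_2 = true_weights(n_co_table2)
w_tsls_2 = tsls_weights(sizes, n_co_table2)

# Made-up configuration with equal cluster sizes: TSLS weights match the true weights.
equal_sizes = [20, 20, 20]
n_co_equal = [10, 4, 16]
w_true_eq = true_weights(n_co_equal)
w_tsls_eq = tsls_weights(equal_sizes, n_co_equal)

for label, w in [("true, Table 1", w_true_1), ("TSLS, Table 1", w_tsls_1),
                 ("true, Table 2", w_true_2), ("TSLS, Table 2", w_tsls_2)]:
    print(label, [round(x, 2) for x in w])
```

In both table configurations the TSLS weights move mass from the large cluster to the two small clusters relative to the true weights, which is the source of the gap between $\tau_{TSLS}$ and $\tau$ whenever $\tau_j$ differs across clusters.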
See Gerber et al for an exaple and Green et al. 203 for 8

19 an overview. The second was is a setting where the nuber of clusters reains bounded often fixed, but the units per clusters n j is growing. This regie is not as coon as the first type and inferential properties e.g. confidence intervals for any estiators in CRT, including both τ CL and τ TSLS, typically break down under this regie. In practice, this regie ight occur under an experiental design that was ipleented at the state or county level. If each state in the U.S. serves as a cluster and we re studying individuals within each state, typically the nuber of individuals that reside in a state is larger than the nuber of states in the U.S. Soe CRTs are close to this regie. For exaple, Hayes et al. 204 outlines a CRT design with 2 clusters with approxiately 55,000 units in each cluster. However, for analysis 2,500 units are randoly sapled fro each cluster. Soloon et al. 205 describe a design with 2 clusters and 000 units per cluster. A design of this type is ore coon in clustered observational studies, see Aceoglu and Angrist 2000 for one exaple. Let p CO,j n CO,j /n j be the proportion of copliers in cluster j. Proposition 3 exaines the identification properties of both the cluster-level and unit-level ethods, under the asyptotic regie of the first type where the nuber of clusters and the nuber of units per cluster reain n j B for all j. Under this asyptotic regie, only the individuallevel ethod always identify the CACE at the liit whereas the cluster-level ethod ay fail to do so. Proposition 3. Suppose Assuptions -4 hold. Consider the asyptotic regie for F where i, ii n j s are bounded, iii τ j s are bounded. If p CO li p CO,j/, the liiting average proportion of copliers across all clusters and n CO li n CO,j/, the liiting average nuber of copliers across all clusters, exist, then τ CL has the following 9

asymptotic property.

$$\lim_{J\to\infty} (\tau_{CL} - \tau) = \lim_{J\to\infty} \frac{1}{\bar{p}_{CO}\,\bar{n}_{CO}\,J^2} \sum_{j<l} p_{CO,j}\,p_{CO,l}\,(n_l - n_j)(\tau_j - \tau_l), \qquad \lim_{J\to\infty} \sup_{\mathcal{F}} |\tau_{CL} - \tau| > K$$

for some constant $K > 0$. Also, $\tau_{TSLS}$ has the following rate-sharp asymptotic property.

$$\sup_{\mathcal{F}} |\tau_{TSLS} - \tau| = O(J^{-1}) \to 0$$

Unlike the results in Propositions 1 and 2, where in finite samples the two methods may not identify the true population CACE, Proposition 3 states that asymptotically, the identifying value $\tau_{TSLS}$ always converges to the limiting $\tau$ if it exists. However, $\tau_{CL}$ may not converge to the limiting $\tau$ and, in the worst case, it will never converge to it.

One example of non-convergence that may arise in practice is in household surveys where each household defines a cluster and the total number of people per household, $n_j$, is bounded by $B$; this type of design is studied under the growing asymptotic regime in Proposition 3. Specifically, suppose $B = 4$, the cluster size $n_j$ is uniformly distributed over the values $\{2, 4\}$ with 50% compliers in every cluster, and there are two distinct cluster-specific CACEs where smaller clusters have a higher CACE than larger clusters, say $\tau_j = 2$ if $n_j = 4$ and $\tau_j = 4$ if $n_j = 2$. Then $\bar{p}_{CO} = 1/2$, $\bar{n}_{CO} = 3/2$, every pair of clusters with different sizes contributes $p_{CO,j}\,p_{CO,l}\,(n_l - n_j)(\tau_j - \tau_l) = 1$ to the sum, and

$$\lim_{J\to\infty} (\tau_{CL} - \tau) = \frac{1}{\frac{1}{2}\cdot\frac{3}{2}} \lim_{J\to\infty} \frac{1}{J^2} \sum_{j<l,\; n_l \neq n_j} 1 = \frac{4}{3}\cdot\frac{1}{4} = \frac{1}{3}.$$

We can make the difference between $\tau_{CL}$ and $\tau$ arbitrarily large by selecting different values of $\tau_j$. However, $\tau_{CL}$ will converge to $\tau$ if the conditions in Corollary 1 are satisfied. In short, cluster-based methods are sensitive to the design as well as to the underlying unobserved heterogeneity of the CACE across clusters.
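As a numerical check of the household example above, the following Python sketch evaluates both identifying values directly. It assumes, as the pairwise formula in Proposition 3 implies, that the cluster-level identifying value weights each $\tau_j$ by the compliance proportion $p_{CO,j}$, while the CACE weights it by the complier count $n_{CO,j} = n_j p_{CO,j}$. The function names and the alternating assignment of cluster sizes are illustrative choices, not part of the paper.

```python
from fractions import Fraction

def cace(n, p, tau):
    """Population CACE: tau_j weighted by the number of compliers n_j * p_j."""
    n_co = [nj * pj for nj, pj in zip(n, p)]
    return sum(w * t for w, t in zip(n_co, tau)) / sum(n_co)

def cluster_level_value(p, tau):
    """Assumed cluster-level identifying value: tau_j weighted by p_j."""
    return sum(pj * t for pj, t in zip(p, tau)) / sum(p)

def pairwise_gap(n, p, tau):
    """The gap tau_CL - tau written as a sum over cluster pairs j < l."""
    J = len(n)
    total = sum(
        p[j] * p[l] * (n[l] - n[j]) * (tau[j] - tau[l])
        for j in range(J)
        for l in range(j + 1, J)
    )
    n_co_sum = sum(nj * pj for nj, pj in zip(n, p))
    return total / (sum(p) * n_co_sum)

# Household example: half the clusters have 2 members, half have 4,
# 50% compliers everywhere, and smaller clusters have the larger effect.
J = 200
n = [2 if j % 2 == 0 else 4 for j in range(J)]
p = [Fraction(1, 2)] * J
tau = [4 if nj == 2 else 2 for nj in n]

gap = cluster_level_value(p, tau) - cace(n, p, tau)
print(gap, pairwise_gap(n, p, tau))  # both equal 1/3
```

Exact rational arithmetic is used so that the direct difference and the pairwise sum can be compared without rounding error. Because the two cluster types are equally frequent, the gap is the same for any even number of clusters.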

Additionally, the result in Proposition 3 has broader implications. First, it underscores the role of asymptotic assumptions for identifying the target causal parameter $\tau$ of interest, especially for $\tau_{TSLS}$. Second, because there may be clustered study designs for which $\tau_{CL}$ will never identify the CACE, asymptotic inference using cluster-based methods, say computing p-values or confidence intervals as in Section 3.1, may be invalid (i.e. inflated Type I errors or incorrect coverage), since the target parameter $\tau$ is never identified in the first place.

Next, Proposition 4 examines the identification properties of the cluster-based method and the unit-level method in the asymptotic regime of the second type, where the number of clusters remains bounded and the number of units per cluster grows, $n_j \to \infty$.

Proposition 4. Suppose we are in the asymptotic regime where (i) the number of clusters $J$ remains bounded, (ii) for every $j$, $n_j, n_{CO,j} \to \infty$ where $n_{CO,j}/n_j \to p_{CO,j} \in (0, 1)$, and $n_j/n_k \to \rho_{jk} \in [0, \infty)$ for every $j \neq k$, and (iii) $\tau_j \to \tau_{j,\infty}$. Then, the identifying values from the two methods, $\tau_{CL}$ and $\tau_{TSLS}$, converge to the following values

$$\lim_{n\to\infty} (\tau_{CL} - \tau) = \sum_{j=1}^{J} \tau_{j,\infty}\, \frac{\sum_{l \neq j} p_{CO,j}\,p_{CO,l}\,(\rho_{lj} - 1)}{\left(p_{CO,j} + \sum_{l \neq j} \rho_{lj}\,p_{CO,l}\right) \sum_{l} p_{CO,l}}$$

and

$$\lim_{n\to\infty} (\tau_{TSLS} - \tau) = \sum_{q=1}^{J} \sum_{k \neq q} \tau_{q,\infty}\, \frac{p_{CO,q}\,p_{CO,k}\,(\rho_{kq} - 1)}{\left(\sum_{l \neq j} p_{CO,l}\,\rho_{lk}\,\rho_{jq}\right)\left(p_{CO,q} + \sum_{l \neq q} \rho_{lq}\,p_{CO,l}\right)},$$

where the sum over $l \neq j$ in the denominator runs over all ordered pairs of distinct clusters. Furthermore, we have that $\lim_{n\to\infty} \sup_{\mathcal{F}} |\tau_{CL} - \tau| > K$ and $\lim_{n\to\infty} \sup_{\mathcal{F}} |\tau_{TSLS} - \tau| > K$ for some constant $K > 0$.

Proposition 4 states that under the growing $n_j$ asymptotic regime, both $\tau_{CL}$ and $\tau_{TSLS}$ may not converge to $\tau$. For example, if we have clusters where $n_1 < n_2 < \ldots < n_J$ and $\tau_{1,\infty} > \tau_{2,\infty} > \ldots > \tau_{J,\infty}$, the limiting values will both be positive and away from zero. However, following Corollary 1, if the cluster sizes are asymptotically identical so that $\rho_{lj} = 1$ for all $l \neq j$, the two estimators will converge to the limiting value of $\tau$. Also, in

the worst case, i.e. under the supremum limit, there is a data generating process whereby neither method converges to the limiting value of $\tau$.

In summary, Propositions 3 and 4 demonstrate that in the asymptotic setting where the number of clusters is growing and the number of units within each cluster is bounded, the finite-sample identification problem of the unit-level method laid out in Proposition 2 goes away uniformly across all data generating processes of this type, whereas the cluster-based method converges to the true $\tau$ only under additional conditions such as those in Corollary 1. In contrast, in the asymptotic setting where the number of units within each cluster grows and the number of clusters remains fixed, the finite sample problems may remain, and the two methods may not converge to the true CACE.

We conclude the section by making a few technical points. First, condition (iii) in Proposition 3 can be relaxed so that the cluster-level complier average treatment effect $\tau_j$ grows on the order of $J^p$, where $0 \le p < 1$ for the cluster-level method and $0 \le p < 2$ for the TSLS method. We do not believe this scenario is realistic in practice and hence, we state a simpler condition for $\tau_j$ in Proposition 3. Second, for clarity of exposition, we avoid a more technically accurate exposition in which $\mathcal{F}$ carries a subscript $J$ and the parameters $\tau_{CL}$, $\tau_{TSLS}$, and $\tau$ are written as functions of both $\mathcal{F}$ and $J$. Third, for the interested reader, in the appendix we derive more precise upper and lower bounds, including constants, for the convergence of $\tau_{CL} - \tau$ and $\tau_{TSLS} - \tau$.

4.3 Inference

We summarize the inferential properties of these two methods, which have been well established in the literature (Schochet 2013; Wooldridge 2010). Suppose that the estimators $\tau_{cl}$ and $\tau_{tsls}$ are identifying the CACE, whether by the conditions stated in Corollary 1 or in an asymptotic sense as in Proposition 3.
Both rely on asymptotic approximations for inference, specifically a variation of the Delta method where the denominators of $\tau_{cl}$ and $\tau_{tsls}$ are assumed to be bounded away from zero as the number of clusters grows, i.e. an asymptotic regime similar to the one described in Proposition 3.

In particular, the confidence intervals derived from both point estimators rely on the treatment assignment having a strong effect on compliance or, equivalently, on experiments with uniformly high compliance rates. If the experiment has low compliance rates, then the asymptotic arguments used to construct these confidence intervals no longer remain valid, and the intervals will often be too short and/or centered away from the true target value. In general, we would argue that the conditions of the smartcard intervention do not closely hew to any of the asymptotic templates. The number of clusters is not particularly large, the number of units varies from cluster to cluster, and the compliance rates vary from cluster to cluster.

5 An Almost Exact Approach

The methods outlined above suffer from two particular weaknesses. First, these methods do not always identify the CACE, especially in finite samples, and asymptotic assumptions have to be used if uniform identification is desired. This is a particular concern, since CRTs often use much smaller sample sizes than non-clustered experimental designs. Second, both methods rely on inferential techniques that require the compliance rate to be high. Next, we propose solutions that always identify the CACE and can produce confidence intervals that remain valid even if the compliance rate is low.

To allow for finite sample inference, we adapt almost exact methods from non-clustered experimental designs. The almost exact approach approximates exact inference but uses closed-form expressions for interval estimation. See Kang et al. (2017) for a review of both approaches with unclustered data. More specifically, we rely on finite sample asymptotics to approximate the exact null distribution (Hájek 1960; Lehmann). First, we outline a point estimator by defining a series of adjusted responses. Adjusted responses are observed responses that have been adjusted to be consistent with the null hypothesis.
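Let $A_j(\tau_0) = Y_j - D_j\tau_0$ be the adjusted response, where $A_j^{(1)}(\tau_0) = Y_j^{(1, D_j^{(1)})} - D_j^{(1)}\tau_0$ is the adjusted response if cluster $j$ was treated, and $A_j^{(0)}(\tau_0) = Y_j^{(0, D_j^{(0)})} - D_j^{(0)}\tau_0$ is the adjusted response if cluster $j$ was not treated. Next, with $m$ denoting the number of treated clusters out of $J$, $\bar{A}_T(\tau_0) = \frac{1}{m}\sum_{j=1}^{J} Z_j (Y_j - D_j\tau_0)$ is the mean of the adjusted response among the treated, and $\bar{A}_C(\tau_0) = \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j)(Y_j - D_j\tau_0)$ is the mean of the adjusted response among the control. Then, consider the test statistic $T(\tau_0)$ for the hypothesis in equation (3):

$$T(\tau_0) = \frac{1}{m}\sum_{j=1}^{J} Z_j (Y_j - D_j\tau_0) - \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j)(Y_j - D_j\tau_0) = \frac{1}{m}\sum_{j=1}^{J} Z_j A_j(\tau_0) - \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j) A_j(\tau_0) = \bar{A}_T(\tau_0) - \bar{A}_C(\tau_0)$$

Then, we have the following statement about the property of $T(\tau_0)$.

Proposition 5. Suppose Assumptions A1-A4 hold and the null hypothesis $H_0 : \tau = \tau_0$ holds. Then, as $J, m \to \infty$, where $m/J \to p \in (0, 1)$, suppose (i) $\mu_Y$ and $\mu_D$ are constants so that $\mu_Y/\mu_D = \tau_0$ and (ii) the following growth condition holds

A ax j j A j + A0 j + A0 j A j A j + A0 j 0 + A0 j

Then, we have

$$\frac{T(\tau_0)}{\sqrt{\mathrm{Var}[T(\tau_0) \mid \mathcal{F}, \mathcal{Z}]}} \to N(0, 1)$$

A consequence of Proposition 5 is that we can construct a Hodges-Lehmann type of point estimator (Hodges and Lehmann 1963) based on the testing properties of $T(\tau_0)$. Specifically, our estimate is the value of $\tau$ that satisfies the equation $T(\tau) = 0$. Algebraic manipulation reveals that the $\tau$ that satisfies $T(\tau) = 0$ is

$$\tau_{AE} = \frac{\frac{1}{m}\sum_{j=1}^{J} Z_j Y_j - \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j) Y_j}{\frac{1}{m}\sum_{j=1}^{J} Z_j D_j - \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j) D_j} \tag{9}$$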

The following corollary shows that the identifying value of the proposed point estimator $\tau_{AE}$ is the CACE in finite samples, irrespective of the number of units within the cluster or cluster-level CACE homogeneity; this is in contrast to the two extant methods where either (i) the cluster sizes had to be identical, (ii) the cluster-level CACE had to be identical across clusters, or (iii) for TSLS, an asymptotic argument involving a growing number of clusters had to be invoked. Note that one can also derive the same estimator using the methods in Middleton and Aronow (2015) as plug-in estimates for the terms in equation (9).

Corollary 2. If Assumptions 1-4 hold, $\tau_{AE}$ in equation (9) identifies the complier average treatment effect, i.e.

$$\frac{E\left[\frac{1}{m}\sum_{j=1}^{J} Z_j Y_j - \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j) Y_j \,\middle|\, \mathcal{F}, \mathcal{Z}\right]}{E\left[\frac{1}{m}\sum_{j=1}^{J} Z_j D_j - \frac{1}{J-m}\sum_{j=1}^{J} (1 - Z_j) D_j \,\middle|\, \mathcal{F}, \mathcal{Z}\right]} = \tau$$

Another consequence of Proposition 5 is that for any $\alpha$, $0 < \alpha < 1$, we can construct a two-sided $1 - \alpha$ confidence interval for $\tau$ by inverting the test and using the asymptotic Normal distribution under the null hypothesis. This requires specifying an estimator for $\mathrm{Var}[T(\tau_0) \mid \mathcal{F}, \mathcal{Z}]$, and one such estimator is the classic sum of variances between the treated and control units (Neyman 1923; Imbens and Rubin 2015):

$$S^2(\tau_0) = \frac{1}{m(m-1)}\sum_{j=1}^{J} Z_j \left(A_j(\tau_0) - \bar{A}_T(\tau_0)\right)^2 + \frac{1}{(J-m)(J-m-1)}\sum_{j=1}^{J} (1 - Z_j)\left(A_j(\tau_0) - \bar{A}_C(\tau_0)\right)^2.$$

The variance estimator $S^2(\tau_0)$ is well known to be conservative and, generally speaking, there exists neither a consistent nor an unbiased estimator for $\mathrm{Var}[T(\tau_0) \mid \mathcal{F}, \mathcal{Z}]$ (Neyman 1923). Recent proposals by Robins (1988) and Aronow et al. (2014) provide sharper estimates of $\mathrm{Var}[T(\tau_0) \mid \mathcal{F}, \mathcal{Z}]$, which can also be used in our context. For example, we can replace our variance estimate $S^2(\tau_0)$ by the upper bound $V^H_N$ in equation 9 of Aronow et al. (2014). Regardless, once we have selected an estimator for the variance, say $S^2(\tau_0)$, a two-sided $1 - \alpha$

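confidence interval of $\tau$ is the set of $\tau_0$ that is accepted under the null hypothesis,

$$\left\{ \tau_0 : \left| \frac{T(\tau_0)}{S(\tau_0)} \right| \le z_{1-\alpha/2} \right\} \tag{10}$$

where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard Normal distribution. The interval will necessarily be conservative because $S^2(\tau_0)$ is a conservative estimator of the variance $\mathrm{Var}[T(\tau_0) \mid \mathcal{F}, \mathcal{Z}]$; but since no unbiased or consistent estimator for the variance generally exists, most two-sided $1 - \alpha$ intervals will be conservative. Still, the interval produces valid statistical inference in that the interval in equation (10) will cover $\tau$ with at least $1 - \alpha$ probability.

After some algebra, the $1 - \alpha$ confidence interval in equation (10) can be greatly simplified to the solution of a quadratic inequality. Consider the following quantities.

$$\hat{s}^2_{Y_T} = \frac{1}{m-1}\sum_{j=1}^{J} Z_j \Big(Y_j - \frac{1}{m}\sum_{l=1}^{J} Z_l Y_l\Big)^2, \qquad \hat{s}^2_{Y_C} = \frac{1}{J-m-1}\sum_{j=1}^{J} (1 - Z_j)\Big(Y_j - \frac{1}{J-m}\sum_{l=1}^{J} (1 - Z_l) Y_l\Big)^2$$

$$\hat{s}^2_{D_T} = \frac{1}{m-1}\sum_{j=1}^{J} Z_j \Big(D_j - \frac{1}{m}\sum_{l=1}^{J} Z_l D_l\Big)^2, \qquad \hat{s}^2_{D_C} = \frac{1}{J-m-1}\sum_{j=1}^{J} (1 - Z_j)\Big(D_j - \frac{1}{J-m}\sum_{l=1}^{J} (1 - Z_l) D_l\Big)^2$$

$$\hat{s}^2_{YD_T} = \frac{1}{m-1}\sum_{j=1}^{J} Z_j \Big(Y_j - \frac{1}{m}\sum_{l=1}^{J} Z_l Y_l\Big)\Big(D_j - \frac{1}{m}\sum_{l=1}^{J} Z_l D_l\Big)$$

$$\hat{s}^2_{YD_C} = \frac{1}{J-m-1}\sum_{j=1}^{J} (1 - Z_j)\Big(Y_j - \frac{1}{J-m}\sum_{l=1}^{J} (1 - Z_l) Y_l\Big)\Big(D_j - \frac{1}{J-m}\sum_{l=1}^{J} (1 - Z_l) D_l\Big)$$

The terms $\hat{s}^2_{Y_T}$ and $\hat{s}^2_{D_T}$ are the estimated variances of $Y$ and $D$ among treated clusters. The terms $\hat{s}^2_{Y_C}$ and $\hat{s}^2_{D_C}$ are the estimated variances of $Y$ and $D$ among control clusters. The terms $\hat{s}^2_{YD_T}$ and $\hat{s}^2_{YD_C}$ are the estimated covariances between $Y$ and $D$ among the treated and control clusters, respectively. Using these estimated variances, the interval in equation (10) is a solution to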

the quadratic inequality

$$\{\tau_0 : a\tau_0^2 + 2b\tau_0 + c \le 0\} \tag{11}$$

where the coefficients $a$, $b$, and $c$ in equation (11) are

$$a = \hat{\mu}_D^2 - z^2_{1-\alpha/2}\left(\frac{\hat{s}^2_{D_T}}{m} + \frac{\hat{s}^2_{D_C}}{J-m}\right)$$

$$b = -\left[\hat{\mu}_Y\hat{\mu}_D - z^2_{1-\alpha/2}\left(\frac{\hat{s}^2_{YD_T}}{m} + \frac{\hat{s}^2_{YD_C}}{J-m}\right)\right]$$

$$c = \hat{\mu}_Y^2 - z^2_{1-\alpha/2}\left(\frac{\hat{s}^2_{Y_T}}{m} + \frac{\hat{s}^2_{Y_C}}{J-m}\right)$$

and $\hat{\mu}_Y$ and $\hat{\mu}_D$ are the treated-minus-control differences in the means of $Y_j$ and $D_j$, i.e. the numerator and denominator of $\tau_{AE}$ in equation (9). The $1 - \alpha$ confidence interval of $\tau$ based on equation (11) is simple, requiring only a quadratic solver based on the coefficients $a$, $b$, and $c$. A notable feature of this confidence interval is that it will produce an infinite confidence interval when the instrument is weak. An infinite confidence interval is a warning that the data contain little information (Rosenbaum 2002, p. 185) and is a theoretically necessary requirement for confidence intervals in IV settings (Dufour 1997). Also, the quadratic confidence interval in equation (11) can never be empty; see the supplementary materials for details. However, this may not be generally true for other estimators of the variance $\mathrm{Var}[T(\tau_0) \mid \mathcal{F}, \mathcal{Z}]$ used to construct confidence intervals of the type in equation (10).

We also propose an exact method for confidence interval construction under a narrower null hypothesis than the one in equation (3). Specifically, consider the null $H_{00}$ where the cluster-specific CACEs $\tau_j$ are equal to $\tau_0$ for all $j$, i.e. there is cluster-level CACE homogeneity.

$$H_{00} : \tau_j = \tau_0, \quad j = 1, \ldots, J \tag{12}$$

The null $H_{00}$ is a narrower hypothesis than $H_0$ in that the parameters $\mathcal{F}$ that satisfy $H_{00}$ also satisfy $H_0$, but the converse is not true. However, and most importantly, $H_{00}$ is not a strictly sharp null in the sense that we still cannot specify all the potential outcomes, say $Y^{(z, D^{(z)})}$ for any $z \in \{0, 1\}$, even under $H_{00}$. Note that under $H_{00}$, the population CACE is

equal to $\tau_0$. A benefit of deriving inferential properties under the more restrictive $H_{00}$ is that we can obtain exact, non-parametric, finite-population confidence intervals instead of the asymptotic ones in Proposition 5. In particular, the $1 - \alpha$ confidence interval derived under $H_{00}$ covers $\tau$ with probability at least $1 - \alpha$ in any finite sample.

Proposition 6. Suppose Assumptions A1-A4 hold and the null hypothesis $H_{00}$ in equation (12) holds. Then, for any $t \in \mathbb{R}$ and for any $\tau_0$, the null distribution of $T(\tau_0)$ under $H_{00}$ is the permutation distribution.

$$P(T(\tau_0) \le t \mid \mathcal{Z}, \mathcal{F}) = \frac{1}{|\mathcal{Z}|}\left|\left\{ z \in \mathcal{Z} : \frac{1}{m}\sum_{j=1}^{J} z_j A_j(\tau_0) - \frac{1}{J-m}\sum_{j=1}^{J} (1 - z_j) A_j(\tau_0) \le t \right\}\right| \tag{13}$$

Using the duality between testing and confidence intervals, one can construct a $1 - \alpha$ confidence interval of $\tau$ by finding the values of $\tau_0$ for which $H_{00}$ is accepted at the $\alpha$ level. For example, let $q_{1-\alpha/2}$ be the $1 - \alpha/2$ quantile of the null distribution in equation (13). Then, a two-sided $1 - \alpha$ confidence interval of $\tau$ based on Proposition 6 is the following set of $\tau_0$

$$\left\{ \tau_0 : |T(\tau_0)| \le q_{1-\alpha/2} \right\} \tag{14}$$

where we assumed, for simplicity, that the null distribution of $T(\tau_0)$ is symmetric around 0; if the latter does not hold, we can take the union of two one-sided confidence intervals to form the desired two-sided confidence interval. Unlike the $1 - \alpha$ confidence interval in equation (10), the $1 - \alpha$ confidence interval in equation (14) is exact in finite samples and does not rely on any asymptotic arguments. Also, like the confidence interval in equation (10), if the confidence interval in equation (14) is infinite, it is a warning that the data contain little information about $\tau_0$. Also, if the confidence interval in equation (14) is empty, it suggests either that the instrumental variables Assumptions 2-4 are not met or, most likely, that the CACE homogeneity assumption $H_{00}$ is not met since, technically speaking, the permutation null distribution in equation (13)

remains valid even if Assumptions 2-4 fail to hold; see the supplementary materials for technical details. A caveat of the interval in equation (14) is that one has to compute $q_{1-\alpha/2}$, which, depending on the number of clusters and the number of treated clusters, may be computationally burdensome. In practice, we recommend using the confidence interval in (14) if the number of clusters is small, roughly under 40 depending on one's computational power. Next, we evaluate our proposed methods against current practice using a simulation study.

6 Simulation Study

We now use simulations to compare the performance of current methods of estimation and inference to the almost exact approach. As we outlined above, extant methods may not recover the target causal estimand when complier effects vary with cluster size. In the following simulations, we focus on how the performance of the various methods changes as complier effects vary with cluster size.

First, we describe general aspects of the simulation. In general, we designed the simulations to closely replicate the conditions of the smartcard intervention. We assume non-compliance is one-sided. Next, $\pi$ is the probability that a unit within a cluster is a complier. Let $P_i$ denote the compliance class, which only includes compliers and never-takers in the one-sided compliance setting, since always-takers and defiers are excluded by design. We designate a unit as a complier, $P_i = co$, by randomly sampling units from each cluster with probabilities $\pi$ and $1 - \pi$. We generated outcomes using a linear mixed model (LMM) of the form

$$Y_{ij} = \alpha + \tau\, I(P_i = co)\, Z_j + \beta n_i + \gamma Z_j n_i + c_i + \epsilon_{ij},$$

where $I(P_i = co)$ indicates that a unit is a complier. Under this model, $Z_j = 1$ if the cluster was assigned to treatment and $Z_j = 0$ if the cluster was assigned to control, so $\tau$ is a measure of the individual-level treatment effect if the unit is a complier. In the model, $c_i$ is a cluster-level

random effect, $\beta$ is a cluster-level effect that varies with the size of the cluster, and $\gamma$ allows the treatment effect to vary by cluster size. In the simulations that follow, $\gamma$ is the key parameter that we vary. We used a t-distribution with five degrees of freedom for the error terms $c_i$ and $\epsilon_{ij}$. For this data generating process, we define the intraclass correlation, $\lambda$, as $\mathrm{Var}(c_i)/\{\mathrm{Var}(c_i) + \mathrm{Var}(\epsilon_{ij})\}$ when $c_i$ and $\epsilon_{ij}$ have finite variance. We can adjust the scale of the cluster-level errors $c_i$ so that we can control the value of $\lambda$ in the simulations. In all the simulations, we set $\lambda$ to 0.28, which is equal to the estimated intraclass correlation in the smartcard data. This is a fairly large ICC value. Hedges and Hedberg (2007), using ICC estimates from clustered randomized experiments in education, find that $\lambda$'s range from 0.07 to 0.3, with an average value of 0.17. Small et al. (2008a) note that $\lambda$ values in the range of 0.002 to 0.03 are more typical in public health interventions that target clusters such as hospitals, clinics or villages.

We repeated the simulation for differing numbers of clusters. We used 20, 30, 50, 80, 100, and 200 clusters. In most CRTs with noncompliance, the number of units per cluster and the compliance rate tend to vary from cluster to cluster. In our simulations, we used the units per cluster and the compliance rates from the smartcard data. We did this by sampling cluster sizes and compliance rates from the data. That is, when the number of clusters is 50, we took a random sample of 50 cluster sizes and compliance rates from the data. This allowed us to vary cluster sizes and compliance rates in a fashion that mimics the typical data structure in a CRT with noncompliance.

As we noted above, the key parameter in the simulation is $\gamma$. As such, we conducted three different sets of simulations. In the first, we set $\gamma = 0$, which implies that complier effects do not vary with cluster size.
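The data generating process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the defaults for `alpha`, `tau`, and `beta`, the number of treated clusters `m`, and the function names are hypothetical, and cluster sizes and compliance rates are passed in as plain lists rather than resampled from the smartcard data. The t-distributed errors are drawn as a standard Normal divided by the square root of a scaled chi-square, and the cluster-level error is rescaled so that the intraclass correlation equals `icc`.

```python
import math
import random

def t5(rng):
    """Draw from a t-distribution with 5 degrees of freedom."""
    chi2 = rng.gammavariate(2.5, 2.0)  # chi-square with 5 df = Gamma(5/2, scale 2)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 5.0)

def simulate_crt(sizes, rates, m, gamma, icc=0.28,
                 alpha=0.0, tau=1.0, beta=0.0, seed=0):
    """Return unit records (cluster, Z, D, Y) for one simulated CRT.

    sizes[j] and rates[j] are the size and compliance rate of cluster j;
    m clusters are assigned to treatment completely at random.
    """
    rng = random.Random(seed)
    J = len(sizes)
    treated = set(rng.sample(range(J), m))
    # Both errors are t5, so scaling c by s gives ICC = s^2 / (s^2 + 1).
    scale = math.sqrt(icc / (1.0 - icc))
    rows = []
    for j in range(J):
        z = 1 if j in treated else 0
        c = scale * t5(rng)                       # cluster-level random effect
        for _ in range(sizes[j]):
            complier = rng.random() < rates[j]
            d = z if complier else 0              # one-sided noncompliance
            y = (alpha + tau * d + beta * sizes[j]
                 + gamma * z * sizes[j] + c + t5(rng))
            rows.append((j, z, d, y))
    return rows

rows = simulate_crt(sizes=[12, 30, 45, 8], rates=[0.9, 0.4, 0.05, 0.7],
                    m=2, gamma=0.0)
print(len(rows))  # 95 unit records, one per unit
```

Under one-sided noncompliance the term $\tau\, I(P_i = co)\, Z_j$ equals $\tau$ times the treatment actually received, which is why the sketch multiplies `tau` by `d`.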
In the second and third sets of simulations, we set $\gamma$ to a negative and a positive value, respectively, so that the complier effects are negatively and positively correlated with cluster size. Table 3 contains the results from the first simulation study. First, we observe that when
