Abstract Title Page

Title: A method for improving power in cluster randomized experiments by using prior information about the covariance structure.

Author(s): Chris Rhoads, University of Connecticut.

Abstract Body

Background/context:
Description of prior research, its intellectual context and its policy context.

The educational research community has become quite aware in recent years that experiments which randomize entire clusters (e.g., schools) to treatments produce estimates of the average treatment effect that are less precise than estimates from individually randomized experiments with the same total number of subjects (Raudenbush, Martinez and Spybrook, 2007; Bloom, Richburg-Hayes and Black, 2007). Formally, let $\sigma_b^2$ represent the within-treatment-group variance between school mean test scores and $\sigma_w^2$ represent the variance of individual test scores around the school means. We define $\sigma_t^2 = \sigma_b^2 + \sigma_w^2$ and $\rho = \sigma_b^2 / \sigma_t^2$. The quantity $\rho$ is referred to as the intra-class correlation coefficient (ICC). The variance of the treatment effect estimate in a cluster randomized design is then inflated, relative to an individually randomized design, by a factor of $1 + (n - 1)\rho$, where $n$ is the sample size within each cluster. For simplicity of exposition we assume that $n$ is the same in every cluster.

Less well known is the fact that a second penalty is paid when clusters are randomized. Not only is the variance inflated, but there are also many fewer degrees of freedom available to estimate that variance. The two penalties paid when one chooses cluster randomization were the subject of a short but widely cited paper by Cornfield (1978). Specifically, Cornfield noted that if there are $2m$ clusters in the experiment, divided equally between treatment and control groups, then the usual t test has only $2m - 2$ degrees of freedom when clusters are randomized. In contrast, when individuals are randomized the usual t test has $2mn - 2$ degrees of freedom. Where cluster randomization cannot be avoided, a treatment effect estimator with increased variance cannot be avoided either.
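Cornfield's two penalties can be put side by side with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the design ($m = 6$ clusters per arm, each of size $n = 25$) is paired with an assumed ICC of $\rho = 0.1$ chosen purely for the example.

```python
def design_effect(n, rho):
    """Variance inflation factor 1 + (n - 1) * rho for clusters of n subjects."""
    return 1 + (n - 1) * rho


def cornfield_penalties(m, n, rho):
    """Compare cluster randomization of 2m clusters (m per arm, n subjects each)
    with individual randomization of the same 2mn subjects.

    Returns (variance ratio, t-test df under cluster randomization,
    t-test df under individual randomization).
    """
    return design_effect(n, rho), 2 * m - 2, 2 * m * n - 2


deff, df_cluster, df_indiv = cornfield_penalties(m=6, n=25, rho=0.1)
print(f"variance x{deff:.1f}; df {df_cluster} vs {df_indiv}")  # variance x3.4; df 10 vs 298
```

Even this modest ICC more than triples the variance, and the t test loses 288 of its 298 degrees of freedom.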
However, Blair and Higgins (1986) pointed out that if the ICC is known prior to conducting the experiment, generalized least squares (GLS) methods can be used and Cornfield's second penalty can be avoided. Thus, when the ICC is known ahead of time, a test of the treatment effect can be conducted with $2mn - 2$ rather than $2m - 2$ degrees of freedom. Konstantopoulos (forthcoming) has made the same point with regard to experimental designs with two levels of nesting. While the methods proposed by Blair and Higgins (1986) and Konstantopoulos (forthcoming) are intriguing ways to boost power, they can only be used if the exact value of the ICC (or ICCs, in the case of a three-level design) is known prior to the experiment. In many situations some estimate of the ICC will be available from data external to the main experiment. For example, a pilot study may provide information about the ICC. However, the pilot study may not be very large, so the available estimate of the ICC will be subject to considerable imprecision due to sampling variance. In a situation like this it is not reasonable to assume that the ICC is known exactly, yet it would still be useful to use the prior information about the ICC to improve the power of the experiment. Utilizing an estimate of the ICC to improve power is the subject of two papers by Blitstein and colleagues: Blitstein, Hannan, Murray and Shadish (2005a) and Blitstein, Murray, Hannan and Shadish (2005b). These papers describe a method for utilizing external information about $\rho$ when the external estimator of $\rho$ is subject to sampling variance (so that $\rho$ is not known exactly).
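For the balanced single-level design (m clusters per arm, n subjects per cluster, normal random effects), the known-ICC GLS test can be written out explicitly. The Python sketch below is our illustration of that idea, not code from Blair and Higgins (1986): with $\rho$ known, the GLS residual sum of squares rescales the within-cluster and between-cluster sums of squares by $1 - \rho$ and $1 + (n - 1)\rho$, respectively, giving an estimate of $\sigma_t^2$ with $2mn - 2$ degrees of freedom. The function name `gls_t_known_icc` is hypothetical, and the code assumes $0 \le \rho < 1$.

```python
import math
from statistics import fmean


def gls_t_known_icc(treat, control, rho):
    """GLS t statistic for a balanced two-arm cluster randomized design with known ICC.

    treat, control: lists of m clusters, each a list of n outcomes.
    Returns (t, df) with df = 2mn - 2.
    """
    m, n = len(treat), len(treat[0])
    lam = 1 + (n - 1) * rho                      # design effect 1 + (n - 1) rho
    ss_w = ss_b = 0.0
    arm_means = []
    for arm in (treat, control):
        cluster_means = [fmean(c) for c in arm]
        grand = fmean(cluster_means)
        arm_means.append(grand)
        ss_w += sum((y - cm) ** 2 for c, cm in zip(arm, cluster_means) for y in c)
        ss_b += n * sum((cm - grand) ** 2 for cm in cluster_means)
    df = 2 * m * n - 2
    # Each rescaled sum of squares has expectation sigma_t^2 times its df.
    sigma_t2_hat = (ss_w / (1 - rho) + ss_b / lam) / df
    se = math.sqrt(2 * sigma_t2_hat * lam / (m * n))
    return (arm_means[0] - arm_means[1]) / se, df


treat = [[1.0, 2.0], [3.0, 4.0]]
control = [[0.0, 1.0], [2.0, 3.0]]
t, df = gls_t_known_icc(treat, control, rho=0.5)
```

With $\rho = 0$ the statistic reduces to the ordinary pooled two-sample t test on individual outcomes. For any known $\rho$ the degrees of freedom stay at $2mn - 2$, which is exactly the gain Blair and Higgins describe.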

Purpose / objective / research question / focus of study:
Description of what the research focused on and why.

At SREE 2010 I presented bounds on the amount of power improvement that can be achieved when we have information about the ICC ahead of time, as compared to when we have no prior information. I also discussed some of the flaws of the df* method described in Blitstein, Hannan, Murray and Shadish (2005a) and Blitstein, Murray, Hannan and Shadish (2005b). In the current paper I illustrate a new method for incorporating prior information about the ICC into the analysis of a cluster randomized experiment with the objective of increasing power. The method can be used with multilevel designs with any number of levels (although its application is greatly simplified if there is only a single level of nesting). Unlike previous methods, the current procedure can be applied in a way that guarantees that type I error rates are controlled at their nominal level. The design situations in which it would be useful to utilize prior information about the ICC are explored in some detail.

Significance / novelty of study:
Description of what is missing in previous work and the contribution the study makes.

Given the large number of schools (and hence, expense) necessary to conduct well-powered experiments that randomize schools to treatments (Raudenbush, Martinez and Spybrook, 2007), it is natural for the educational research community to seek methods that achieve sufficient power with fewer schools. Utilizing prior information about the ICC holds out the promise of being one such method. However, the methods described in Blair and Higgins (1986) and Konstantopoulos (forthcoming) apply only when the ICC is known exactly prior to the experiment. When only an estimate of the ICC is available prior to the study, the df* method cannot guarantee that type I error rates will be controlled at the nominal level.
Furthermore, existing methods for incorporating prior information about the correlation structure into an analysis have only dealt with certain designs (specifically, designs with no more than two random effects, all of them nested). The current method can both (a) guarantee the correct type I error rate when prior information about the covariance structure is subject to sampling error, and (b) be applied to designs with any covariance structure whatsoever (provided, of course, that prior information about this covariance structure is available).

Statistical, Measurement, or Econometric Model:
Description of the proposed new methods or novel applications of existing methods.

I consider first the case of a cluster randomized design with a single level of nesting. Denote the test statistic that is used when $\rho$ is known exactly prior to the study as $t_{GLS}(\rho)$. I consider the use of this test statistic with a prior estimate of $\rho$ (denoted $\hat\rho_{ex}$) substituted for $\rho$. Space constraints prevent a full exposition, but the basic steps in the method are as follows. (1) Conditional on $\hat\rho_{ex}$, $t_{GLS}(\hat\rho_{ex})$ is approximately distributed as a constant $k$ times a t random variable with $h$ degrees of freedom, where $k$ and $h$ are functions of both $\rho$ and $\hat\rho_{ex}$. Hence, given any critical value, we can determine the conditional size of the test (which depends on $\rho$ and $\hat\rho_{ex}$). (2) Average the conditional size over the sampling distribution of $\hat\rho_{ex}$ to obtain an unconditional size, which depends only on $\rho$; denote this unconditional size as $\alpha_U(\rho)$. If we make the usual normality assumptions about the distribution of the random effects in the model, then the sampling distribution of $\hat\rho_{ex}$ is a simple transformation of the F distribution. (3) Use a root-finding algorithm to determine the critical value that satisfies $\max_{\rho \in [0,1]} \alpha_U(\rho) = \alpha$. Using this critical value results in a level-$\alpha$ test of the desired null hypothesis.

The method can easily be modified for use with any assumed correlation structure for the data. In particular, assume that the data $y$ from the experiment can reasonably be modeled by a multivariate normal distribution with known correlation matrix $V$, and that we have specified a linear model for the mean of $y$ with parameter vector $b$. Then any linear hypothesis of the form $Hb = \theta$ can be tested by forming the appropriate test statistic, which utilizes the GLS estimate of the vector $b$. The test statistic will depend on the value of $V$ and will follow the F distribution under the null hypothesis. Denote this test statistic $F_{GLS}(V)$, and denote the test statistic with the prior estimate of the correlation matrix substituted for $V$ as $F_{GLS}(\hat V_{ex})$. Assume that the correlation matrix $V$ for the data can be parameterized in terms of a vector $\rho = (\rho_1, \ldots, \rho_q) \in [0,1]^q$ and that a prior estimate of $\rho$, $\hat\rho_{ex}$, is available. Then the same three steps outlined above may be followed to derive a level-$\alpha$ test of the desired null hypothesis. That is: (1) given a critical value $c$, we can compute the size of the test conditional on $\rho$ and $\hat\rho_{ex}$; (2) we can average over the distribution of $\hat\rho_{ex}$ to obtain the unconditional size $\alpha_U(\rho)$; and (3) we use a multivariate root-finding algorithm to determine the critical value that satisfies the condition

$$\max_{\rho \in [0,1]^q} \alpha_U(\rho) = \alpha. \qquad (1)$$

The procedure above may be impractical to implement, as it involves (a) deriving the distribution of the $q$-dimensional vector $\hat\rho_{ex}$ and (b) finding the root of equation (1).
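The single-level version of steps (1) to (3) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes a balanced design with $m$ clusters per arm of size $n$; it takes the plug-in statistic to be the known-$\rho$ GLS t statistic for that design evaluated at $r = \hat\rho_{ex}$, and obtains $k$ and $h$ by a Satterthwaite approximation to the resulting weighted sum of within- and between-cluster chi-square terms (our own derivation); it models $\hat\rho_{ex}$ as the ANOVA estimator from $m_{ex}$ external clusters of size $n_{ex}$, truncated at zero, as a stand-in for REML; it averages over $\hat\rho_{ex}$ by Monte Carlo rather than exact integration; and it replaces the maximum over $[0,1]$ with a maximum over a small grid of $\rho$ values, with bisection playing the role of the root finder. All function names are hypothetical.

```python
import math
import random


def t_cdf(x, df):
    """Student t CDF for any real df > 0, via Simpson's rule on the density over [0, |x|]."""
    logc = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    dens = lambda t: math.exp(logc - (df + 1) / 2 * math.log1p(t * t / df))
    a = abs(x)
    steps = 2 * max(50, int(10 * a))  # even panel count, panel width <= 0.05
    h = a / steps
    s = dens(0.0) + dens(a) + sum((4 if i % 2 else 2) * dens(i * h) for i in range(1, steps))
    return 0.5 + math.copysign(s * h / 3, x)


def k_and_h(rho, r, m, n):
    """Assumed Satterthwaite approximation: the plug-in GLS statistic computed with
    r = rho_ex behaves like k * T_h when the true ICC is rho."""
    lam = lambda x: 1 + (n - 1) * x
    nu_w, nu_b = 2 * m * (n - 1), 2 * m - 2            # within / between-cluster df
    a_w, a_b = (1 - rho) / (1 - r), lam(rho) / lam(r)  # scale of each chi-square term
    mean = a_w * nu_w + a_b * nu_b
    h = mean ** 2 / (a_w ** 2 * nu_w + a_b ** 2 * nu_b)
    k = math.sqrt(lam(rho) * (2 * m * n - 2) / (lam(r) * mean))
    return k, h


def draw_rho_ex(rho, m_ex, n_ex, rng):
    """One external estimate of rho from m_ex clusters of size n_ex: the ANOVA
    estimator truncated at 0, using MSB/MSW ~ [(1 + (n_ex - 1) rho)/(1 - rho)] * F."""
    d1, d2 = m_ex - 1, m_ex * (n_ex - 1)
    f = (rng.gammavariate(d1 / 2, 2) / d1) / (rng.gammavariate(d2 / 2, 2) / d2)
    f *= (1 + (n_ex - 1) * rho) / (1 - rho)
    return max(0.0, (f - 1) / (f + n_ex - 1))


def cegls_critical_value(m, n, m_ex, n_ex, alpha=0.025,
                         rho_grid=(0.0, 0.05, 0.1, 0.2, 0.4, 0.7), draws=200, seed=1):
    """Critical value c with max over rho_grid of alpha_U(rho; c) <= alpha (one-sided)."""
    rng = random.Random(seed)
    # Fixed draws of rho_ex per grid point make the estimated alpha_U(rho; c) monotone in c.
    kh = {rho: [k_and_h(rho, draw_rho_ex(rho, m_ex, n_ex, rng), m, n) for _ in range(draws)]
          for rho in rho_grid}

    def worst_size(c):
        # Step (1): conditional size P(k * T_h > c); step (2): Monte Carlo average
        # over rho_ex; then the worst case over the rho grid.
        return max(sum(1 - t_cdf(c / k, h) for k, h in pairs) / draws for pairs in kh.values())

    lo, hi = 0.0, 20.0  # assumes worst_size(20.0) <= alpha
    for _ in range(25):  # step (3): bisection stands in for the root finder
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if worst_size(mid) > alpha else (lo, mid)
    return hi, worst_size(hi)


c, size = cegls_critical_value(m=3, n=25, m_ex=30, n_ex=25, draws=100)
```

Increasing `draws` and refining `rho_grid` reduce Monte Carlo and grid error. Because the same draws are reused for every candidate critical value, the estimated worst-case size is monotone in c, so bisection cannot return a value whose estimated size exceeds alpha.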
The paper points out that the difficulty can frequently be avoided by identifying a sum-of-squares-like quantity (call it $SS_{un}$) such that: (a) $SS_{un}$ has an expected mean square equal to the expected mean square of the numerator of the F statistic; (b) the degrees of freedom of a test statistic utilizing $SS_{un}$ in the denominator are large enough that further increases in degrees of freedom would produce negligible changes in the critical value used for the test; and (c) the distribution of $SS_{un}$ depends on only one or two elements of the $q$-dimensional vector $\rho$.

Usefulness / Applicability of Method:
Demonstration of the usefulness of the proposed methods using hypothetical or real data.

I focus on results for the simple case of a cluster randomized experiment with only one level of nesting. It is desirable to know under which conditions external information about $\rho$ can improve power; in particular, it is useful to know how precise the prior estimate of $\rho$ needs to be in order to improve power. Some results in this area are presented below. The test based on the method outlined above is referred to as the CEGLS test; $n_{ex}$ refers to the within-cluster sample size in the data providing the prior estimate of $\rho$, and $m_{ex}$ refers to the number of clusters that provide prior information about $\rho$. The GLS test is the test that would be conducted if $\rho$ were known exactly prior to the experiment, and the cluster means (CM) test is the test that would be conducted if prior information about the ICC were not used at all. Results assume that the restricted maximum likelihood (REML) estimator of $\rho$ is used to obtain the prior estimate of $\rho$ (the resulting test is labeled CEGLS$_R$ in the figures and table) and that hypothesis tests are one-sided at level $\alpha = .025$. This last assumption was made since in most practical situations the power of the one-sided test at level .025 will correspond to the power of the two-sided test at level .05. However, by assuming one-sided tests we avoid certain odd results that might occur when there is substantial power in the lower tail of the distribution.

Figure 1 displays the power improvement of the CEGLS test over the usual (CM) test as a function of the true value of the ICC when $m = 6$, $n = 25$, $\delta = 0.5$, $m_{ex} = 10$ and $n_{ex} = 5, 25, 100$. The graph makes clear that in this case the CEGLS$_R$ test performs no better than the CM test, and when $n_{ex} = 5$ it performs substantially worse. The next figure, Figure 2, keeps the value of $n$ at 25 and the value of $\delta$ at 0.5, and again varies $n_{ex}$ from 5 to 100. However, now $m$ has been changed to 3 and $m_{ex} = 30$. Unlike Figure 1, in this case we see substantial power improvements, especially for values of $\rho$ less than 0.2. The exception is when $n_{ex} = 5$, in which case the power improvement is very slight and is even negative for very small values of $\rho$. Figure 3 sets $m$ to 4 and $n = n_{ex} = 25$. The effect size $\delta$ remains at 0.5 and $\alpha$ is still .025. The value of $m_{ex}$ is varied from 5 to 50.
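The CM and GLS benchmarks in the figures can be approximated by simulation. The Python sketch below is our illustration, not the code behind the figures. It assumes a balanced design with normal random effects, expresses the effect size $\delta$ in units of $\sigma_t$ (an assumption, since the scale of $\delta$ is not stated here), and uses the Figure 1 design ($m = 6$, $n = 25$, $\delta = 0.5$, one-sided $\alpha = .025$) with an illustrative $\rho = 0.1$. Under these assumptions both tests are based on the difference in arm means and share the same noncentrality parameter, so the GLS test's advantage comes entirely from its $2mn - 2$ rather than $2m - 2$ degrees of freedom. The `t_cdf` helper integrates the t density numerically to avoid a SciPy dependency; `power_mc` and `t_quantile` are hypothetical names.

```python
import math
import random


def t_cdf(x, df):
    """Student t CDF via Simpson's rule on the density over [0, |x|]."""
    logc = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    dens = lambda t: math.exp(logc - (df + 1) / 2 * math.log1p(t * t / df))
    a = abs(x)
    steps = 2 * max(50, int(10 * a))  # even panel count, panel width <= 0.05
    h = a / steps
    s = dens(0.0) + dens(a) + sum((4 if i % 2 else 2) * dens(i * h) for i in range(1, steps))
    return 0.5 + math.copysign(s * h / 3, x)


def t_quantile(p, df):
    """Quantile of the t distribution by bisection on t_cdf; assumes 0.5 <= p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def power_mc(delta, m, n, rho, df, alpha=0.025, reps=20000, seed=1):
    """Monte Carlo power of a one-sided level-alpha t test of the treatment effect.

    delta is in units of sigma_t; m clusters per arm of size n. df = 2m - 2 gives
    the CM test, df = 2mn - 2 the GLS test with rho known.
    """
    ncp = delta / math.sqrt(2 * (1 + (n - 1) * rho) / (m * n))
    crit = t_quantile(1 - alpha, df)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        z = rng.gauss(ncp, 1.0)                            # standardized mean difference
        s = math.sqrt(rng.gammavariate(df / 2, 2) / df)    # sqrt(chi2_df / df)
        if z / s > crit:
            hits += 1
    return hits / reps


m, n, rho = 6, 25, 0.1  # Figure 1 design; rho = 0.1 is illustrative
power_cm = power_mc(0.5, m, n, rho, df=2 * m - 2)
power_gls = power_mc(0.5, m, n, rho, df=2 * m * n - 2)
```

With these settings the GLS test has noticeably higher power than the CM test; that gap is the benchmark against which the CEGLS results are compared.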
We see that when our prior estimate of $\rho$ is based on only 5 clusters, it is better not to use this information at all. When our prior estimate is based on 10 clusters it is useful to have prior information, but the power improvement is substantially less than when $\rho$ is known exactly. When our prior estimate is based on 20 or more clusters, the power improvement is almost as large as it would have been had $\rho$ been known exactly. The pattern seen in Figures 1 and 2, where power improvement is much smaller when $n_{ex} < n$, appears to hold in general. This is confirmed in Table 1, where the power of the CEGLS test that uses the REML estimate of $\rho$ is reported for a variety of values of $n$, $m$, $n_{ex}$ and $m_{ex}$. As long as the prior estimate of $\rho$ is based on at least 20 clusters, it is preferable to have external information unless $n_{ex} < n$; in that case, power is frequently worse when the external information about the ICC is utilized.

Conclusions:
Description of conclusions and recommendations based on findings and overall study.

The current paper presents a method for improving power in experiments with clustering when prior information about the covariance structure is available. The method controls the type I error rate at the nominal level and can be used with any assumed covariance structure. It is shown that when prior estimates of the ICC are very noisy, the experimenter will often be better off ignoring this information altogether rather than trying to use it in her analysis. Additionally, if the clusters providing the prior estimate of the ICC are substantially smaller than the clusters in the main experiment, it is difficult to obtain power improvements via the use of prior ICC information. However, if prior estimates are based on a sample of 20 or more clusters of size 25 or more, and the cluster sizes in the main experiment and the prior data are not too discrepant, it will usually be advantageous to utilize the prior ICC estimate.

Appendices
Not included in page count.

Appendix A. References

[1] Blair, R.C. and Higgins, J.J. (1986). Comment on "Statistical power with group mean as the unit of analysis." Journal of Educational Statistics, 11.
[2] Blitstein, J.L., Hannan, P.J., Murray, D.M. and Shadish, W.R. (2005a). Increasing the degrees of freedom in existing group randomized trials through the use of external estimates of the intraclass correlation: The df* approach. Evaluation Review, 29.
[3] Blitstein, J.L., Murray, D.M., Hannan, P.J. and Shadish, W.R. (2005b). Increasing the degrees of freedom in future group randomized trials through the use of external estimates of the intraclass correlation: The df* approach. Evaluation Review, 29.
[4] Bloom, H.S., Richburg-Hayes, L. and Black, A.R. (2007). Using covariates to improve precision: Empirical guidelines for studies that randomize schools to measure the impacts of educational interventions. Educational Evaluation and Policy Analysis, 29.
[5] Cornfield, J. (1978). Randomization by group: A formal analysis. American Journal of Epidemiology, 108.
[6] Konstantopoulos, S. (forthcoming). Constructing a more powerful test in three-level cluster randomized designs. Journal of Research on Educational Effectiveness.
[7] Raudenbush, S., Martinez, A. and Spybrook, J. (2007). Strategies for improving precision in group-randomized experiments. Educational Evaluation and Policy Analysis, 29(1).

Appendix B. Tables and Figures

Figure 1. Power, CEGLS$_R$ (GLS) minus CM test: $m = 6$, $n = 25$, $m_{ex} = 10$, $\alpha = .025$, $\delta = 0.5$, $n_{ex}$ varied.

Figure 2. Power, CEGLS$_R$ (GLS) minus CM test: $m = 3$, $n = 25$, $m_{ex} = 30$, $\alpha = .025$, $\delta = 0.5$, $n_{ex}$ varied.

Figure 3. Power, CEGLS$_R$ (GLS) minus CM test: $m = 4$, $n = n_{ex} = 25$, $\alpha = .025$, $\delta = 0.5$, $m_{ex}$ varied.

Table 1. CEGLS$_R$ power and improvement over CM test, $\delta = 0.5$, $\rho = .02$, $\alpha = .025$. (Rows indexed by $m$, $n$ and $n_{ex}$; for each of $m_{ex}$ = 10, 20, 50, 100, columns give CEGLS$_R$ power, improvement over CM, and critical value.)
