Rerandomization to Balance Covariates Kari Lock Morgan Department of Statistics Penn State University Joint work with Don Rubin University of Minnesota Biostatistics 4/27/16
The Gold Standard Randomized experiments are the gold standard for estimating causal effects WHY? 1. They yield unbiased estimates 2. They eliminate confounding variables
[Diagram: units are randomized, half to treatment and half to control]
Covariate Balance - Gender
Suppose you get a bad randomization: one group with 512 females and 158 males, the other with 158 females and 512 males.
What would you do??? Can you rerandomize???
The Origin of the Idea
Rubin: What if, in a randomized experiment, the chosen randomized allocation exhibited substantial imbalance on a prognostically important baseline covariate?
Cochran: Why didn't you block on that variable?
Rubin: Well, there were many baseline covariates, and the correct blocking wasn't obvious; and I was lazy at that time.
Cochran: This is a question that I once asked Fisher, and his reply was unequivocal:
Fisher (recreated via Cochran): Of course, if the experiment had not been started, I would rerandomize.
Mind-Set Matters
In 2007, Dr. Ellen Langer tested her hypothesis that mind-set matters with a randomized experiment.
She recruited 84 maids working at 7 different hotels, and randomly assigned half to a treatment group and half to control.
The treatment was simply informing the maids that their work satisfies the Surgeon General's recommendations for an active lifestyle.
Crum, A.J. and Langer, E.J. (2007). Mind-Set Matters: Exercise and the Placebo Effect, Psychological Science, 18:165-171.
Mind-Set Matters
[Figure: outcome comparisons between treatment and control; p-value = .01, p-value = .001]
Covariate Imbalance
The more covariates, the more likely at least one covariate will be imbalanced across treatment groups.
With just 10 independent covariates, the probability of a significant difference (α = .05) for at least one covariate is 1 − (1 − .05)^10 ≈ 40%!
Covariate imbalance is not limited to rare unlucky randomizations.
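The arithmetic can be checked directly; a minimal sketch:

```python
# Under pure randomization, the chance that at least one of k independent
# covariates shows a significant difference at level alpha.
def prob_at_least_one_imbalance(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

print(round(prob_at_least_one_imbalance(10), 2))  # 0.4
```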
The Gold Standard Randomized experiments are the gold standard for estimating causal effects WHY? 1. They yield unbiased estimates 2. They eliminate confounding variables on average! For any particular experiment, covariate imbalance is possible (and likely!), and conditional bias exists
RERANDOMIZATION
1) Collect covariate data. Specify criteria, based on covariate balance, determining when a randomization is unacceptable.
2) (Re)randomize subjects to treated and control. Check covariate balance: if unacceptable, rerandomize; if acceptable, proceed.
3) Conduct experiment.
4) Analyze results with a randomization test.
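The loop in step 2 can be sketched as follows; `acceptable` is a hypothetical name for whatever pre-specified balance criterion is used:

```python
import numpy as np

def rerandomize(n, acceptable, seed=None, max_tries=100_000):
    """Draw equal-sized 0/1 assignments until the pre-specified
    balance criterion acceptable(w) is satisfied."""
    rng = np.random.default_rng(seed)
    base = np.array([1] * (n // 2) + [0] * (n - n // 2))
    for _ in range(max_tries):
        w = rng.permutation(base)  # one candidate randomization
        if acceptable(w):          # criterion is blind to which group is treated
            return w
    raise RuntimeError("no acceptable randomization found")

# Toy example: accept only if the group means of a covariate x
# differ by less than 0.1.
x = np.random.default_rng(0).normal(size=84)
w = rerandomize(84, lambda w: abs(x[w == 1].mean() - x[w == 0].mean()) < 0.1)
```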
Unbiased?
To maintain an unbiased estimate of the treatment effect, the decision to rerandomize or not should be
- automatic and specified in advance
- blind to which group is treated
Theorem: If the treated and control groups are the same size, and if for every unacceptable randomization the exact opposite randomization is also unacceptable, then rerandomization yields an unbiased estimate of the treatment effect.
Unbiased?
Equal-sized treatment groups are not actually required, but that's the easiest case to consider.
Theorem: If the rerandomization criterion is affinely invariant and if the covariates, x, are ellipsoidally symmetric, then rerandomization leads to unbiased estimates of any linear function of x.
Criteria for Acceptable Balance
We recommend using the Mahalanobis distance, M, to represent multivariate distance between group means:
M = (X̄_T − X̄_C)' [cov(X̄_T − X̄_C)]^(−1) (X̄_T − X̄_C)
Under adequate sample sizes and pure randomization: M ~ χ²_k
Choose a and rerandomize when M > a
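A sketch of computing M from an n × k covariate matrix, using cov(X̄_T − X̄_C) = (1/n_T + 1/n_C) cov(X):

```python
import numpy as np

def mahalanobis_M(X, w):
    """M = (Xbar_T - Xbar_C)' cov(Xbar_T - Xbar_C)^{-1} (Xbar_T - Xbar_C).
    X: n x k covariate matrix; w: 0/1 assignment vector."""
    nT, nC = int((w == 1).sum()), int((w == 0).sum())
    d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
    S = np.cov(X, rowvar=False) * (1 / nT + 1 / nC)  # cov of the mean difference
    return float(d @ np.linalg.solve(S, d))
```

Under pure randomization with adequate n, repeated draws of M should average close to k, consistent with M ~ χ²_k.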
Distribution of M
[Figure: χ²_k density of M; randomizations with M > a are rerandomized, those with M ≤ a are acceptable]
p_a = Probability of accepting a randomization
Choosing p_a
Choosing the acceptance probability is a tradeoff between a desire for better balance and computational time.
The number of randomizations needed to obtain one acceptable randomization is Geometric(p_a), so the expected number needed is 1/p_a.
Computational time must be considered in advance, since many simulated acceptable randomizations are needed to conduct the randomization test.
Another Measure of Balance
The obvious choice may be to set limits on the acceptable balance for each covariate individually.
This destroys the joint distribution of the covariates.
[Figure: scatterplot of mean differences D1 and D2; per-covariate limits ignore the correlation between covariates]
Rerandomization Based on M
Since M follows a known distribution, it is easy to specify the proportion of accepted randomizations.
M is affinely invariant (unaffected by affine transformations of the covariates).
Correlations between covariates are maintained.
The balance improvement for each covariate is the same (and known), and is the same for any linear combination of the covariates.
Covariates After Rerandomization
Theorem: If n_T = n_C, the covariate means are normally distributed, and rerandomization occurs when M > a, then
E(X̄_T − X̄_C | M ≤ a) = 0
and
cov(X̄_T − X̄_C | M ≤ a) = v_a cov(X̄_T − X̄_C),
where
v_a = (2/k) γ(k/2 + 1, a/2) / γ(k/2, a/2),
and γ is the incomplete gamma function: γ(b, c) = ∫₀^c y^(b−1) e^(−y) dy
Mind-Set Matters Difference in Means = -8.3
1000 Randomizations
[Figure: difference in mean age across 1000 randomizations, pure randomization vs. rerandomization with p_a = .001]
Percent Reduction in Variance
Define the percent reduction in variance for each covariate to be the percentage by which rerandomization reduces the randomization variance of the difference in means:
100 × [var(X̄_{j,T} − X̄_{j,C}) − var_rerand(X̄_{j,T} − X̄_{j,C})] / var(X̄_{j,T} − X̄_{j,C})
For rerandomization when M ≤ a, the percent reduction in variance for each covariate is 100(1 − v_a).
Percent Reduction in Variance
k = 10, p_a = .001 → a = 1.48
v_a = (2/k) γ(k/2 + 1, a/2) / γ(k/2, a/2) = (2/10) γ(6, 0.74) / γ(5, 0.74) = .12
Percent reduction in variance = 88%
If we increase p_a, how does the percent reduction in variance change?
k = 10, p_a = .1 → a = 4.87
v_a = (2/10) γ(6, 2.435) / γ(5, 2.435) = .38
Percent reduction in variance = 62%
[Figure: difference in mean age under pure randomization vs. rerandomization for each case]
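These numbers can be reproduced numerically. The ratio of incomplete gamma functions is equivalent to a ratio of χ² CDFs, v_a = P(χ²_{k+2} ≤ a) / P(χ²_k ≤ a); a sketch using SciPy:

```python
from scipy.stats import chi2

def threshold(k, p_a):
    """a such that P(chi^2_k <= a) = p_a."""
    return chi2.ppf(p_a, df=k)

def v_a(k, a):
    """(2/k) gamma(k/2 + 1, a/2) / gamma(k/2, a/2), as a chi^2 CDF ratio."""
    return chi2.cdf(a, df=k + 2) / chi2.cdf(a, df=k)

for p in (0.001, 0.1):
    a = threshold(10, p)
    print(round(a, 2), round(v_a(10, a), 2))
# 1.48 0.12
# 4.87 0.38
```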
Percent Reduction in Variance
[Figure: percent reduction in variance vs. number of covariates k (1 to 50) and acceptance probability p_a (log10 scale), shown for R² = 1 and R² = .8]
Estimated Treatment Effect
Theorem: If n_T = n_C and rerandomization occurs when M > a, then
E(Ȳ_T − Ȳ_C | M ≤ a) = τ
and if the covariate means are normally distributed and the treatment effect, τ, is additive, the percent reduction in variance for the outcome difference in means is
(1 − v_a) R²,
where R² is the coefficient of determination between Y and x.
Outcome Variance Reduction
[Figure: percent reduction in outcome variance vs. number of covariates k and acceptance probability p_a (log10 scale), shown for R² = 1, .8, .5, and .2]
Outcome Variance Reduction
k = 10, p_a = .001 → v_a = .12
Outcome: Weight Change, R² = .1
Percent reduction in variance = (1 − v_a) R² = (1 − .12)(.1) = 8.8%
Outcome: Change in Diastolic Blood Pressure, R² = .64
Percent reduction in variance = (1 − v_a) R² = (1 − .12)(.64) = 56%
Equivalent to increasing the sample size by 1/(1 − .56) = 2.27
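The arithmetic above, as a short sketch:

```python
def outcome_variance_reduction(v_a, r2):
    """Fractional reduction in var(Ybar_T - Ybar_C) from rerandomization."""
    return (1 - v_a) * r2

def equivalent_sample_multiplier(reduction):
    """Sample-size increase that would give the same precision gain."""
    return 1 / (1 - reduction)

print(round(outcome_variance_reduction(0.12, 0.10), 3))  # 0.088
print(round(outcome_variance_reduction(0.12, 0.64), 3))  # 0.563
print(round(equivalent_sample_multiplier(0.56), 2))      # 2.27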
Analysis
Rerandomization can drastically improve power!
BUT to take advantage of this, the analysis has to account for the rerandomization.
Typical distribution-based methods (such as a t-test) will overestimate the standard error and so be too conservative (p-values will be too large).
To get an accurate p-value and Type I error rate, use a randomization test, where each simulated randomization follows the same rerandomization procedure used in the actual experiment.
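A minimal sketch of such a test, assuming the Mahalanobis criterion with threshold a and a two-sided test of a zero treatment effect (the function and helper names are illustrative):

```python
import numpy as np

def rerandomization_test(X, y, w_obs, a, n_sim=1000, seed=None):
    """Two-sided randomization-test p-value in which every simulated
    assignment satisfies the same design rule: M <= a."""
    rng = np.random.default_rng(seed)
    nT, nC = int((w_obs == 1).sum()), int((w_obs == 0).sum())
    Sinv = np.linalg.inv(np.cov(X, rowvar=False) * (1 / nT + 1 / nC))

    def M(w):
        d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
        return float(d @ Sinv @ d)

    def effect(w):
        return y[w == 1].mean() - y[w == 0].mean()

    t_obs = abs(effect(w_obs))
    count = done = 0
    while done < n_sim:
        w = rng.permutation(w_obs)   # keeps group sizes fixed
        if M(w) <= a:                # same acceptance rule as the experiment
            count += abs(effect(w)) >= t_obs
            done += 1
    return (1 + count) / (1 + n_sim)
```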
Extensions
- Tiers of covariate importance
- Improving computational efficiency
- Sequential assignment
- Block experiments
- Unequal treatment group sizes
- Multiple treatment groups
- Choosing the best from a fixed number of randomizations
Conclusion Rerandomization improves covariate balance between the treatment groups, giving the researcher more faith that an observed effect is really due to the treatment If the covariates are correlated with the outcome, rerandomization also increases precision in estimating the treatment effect, giving the researcher more power to detect a significant result
Lock Morgan, K. and Rubin, D.B. (2012). Rerandomization to Improve Covariate Balance in Experiments, Annals of Statistics, 40(2): 1262-1282. Lock Morgan, K. and Rubin, D.B. (2015). Rerandomization to Balance Tiers of Covariates, JASA, 110(512): 1412-1421. klm47@psu.edu
Tiers of Covariates Place covariates into tiers of varying importance 1) Assess balance for the top tier as usual (calculate M 1, rerandomize if M 1 > a 1 ) 2) For each lower tier, regress each covariate on all covariates in more important tiers. Balance the residuals (calculate M t for the residuals of tier t, rerandomize if M t > a t ) Balance criteria can be more stringent for more important tiers
Computational Efficiency First, orthogonalize covariates: regress each covariate on all previous covariates, and use residuals as new covariate Mahalanobis distance simplifies to a sum of squared differences in covariate means If squared difference in means for any covariate exceeds threshold, rerandomize Result is the same as if we used original covariates, because Mahalanobis distance is affinely invariant
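The orthogonalization step can be sketched as successive least-squares residuals (equivalently, Gram-Schmidt with an intercept):

```python
import numpy as np

def orthogonalize(X):
    """Replace each covariate by its residual from a regression on an
    intercept plus all previous covariates; resulting columns are
    mutually uncorrelated."""
    n, k = X.shape
    Z = np.empty((n, k))
    for j in range(k):
        A = np.column_stack([np.ones(n), Z[:, :j]])  # intercept + earlier residuals
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        Z[:, j] = X[:, j] - A @ beta
    return Z
```

With uncorrelated columns, cov(X̄_T − X̄_C) is diagonal, so M becomes a sum of squared standardized mean differences, and a candidate randomization can be rejected early as soon as one term is too large.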
Sequential Assignment
Rerandomization was used in the design of a clinical trial proposed to the FDA for a drug to improve female sexual satisfaction.
Patients enroll sequentially; rerandomize in blocks of B women at a time (B = 6 for this example):
1. Randomize the first B women to enroll.
2. Treat these women with placebo or active treatment.
3. Wait for B more women to enroll.
4. Use rerandomization for these next B women, measuring overall covariate balance for all 2B women enrolled.
5. Treat these new B women with placebo or active treatment.
6. Wait for B more women to enroll.
For each rerandomization, generate a fixed number of randomizations and choose the best.
Blocking
This is not meant to replace blocking, and blocking and rerandomization can be used together.
Blocking is great for balancing a few very important covariates, but is not easily used for many covariates.
Block what you can, rerandomize what you can't.
Unequal Group Sizes
If necessary for incentive to participate, one option is to rerandomize into multiple equal-sized groups, then combine groups after the fact.
Example: for a treatment group twice as big as the control, rerandomize into 3 balanced groups, then combine two to form the treatment group.
If there are only enough resources to support a certain number in one group, it can be worth reducing the sample size to get equal group sizes (if you care about unbiasedness).
Multiple Treatment Groups
Rerandomization is easily extended to multiple treatment groups; we just need a balance measure.
To measure balance, one of the MANOVA test statistics (such as Wilks' Λ) can be used, measuring the ratio of within-group variability to between-group variability for multiple covariates.
This is equivalent to rerandomizing based on Mahalanobis distance when applied to only two groups.
Small Samples or Non-Normality
The theoretical results given depend on M ~ χ²_k.
This is not true if n is not large enough for the CLT to make the covariate means normal.
That is fine! Rerandomization does NOT depend on asymptotics.
The threshold for a desired p_a and the percent reduction in variance for covariates can be determined empirically through simulation.
The variance of the estimate is used nowhere in the analysis; the randomization test will work, no matter how non-normal the covariate means are.
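A sketch of determining the threshold empirically, by simulating the randomization distribution of M and taking its p_a quantile (assumes an even n and equal group sizes):

```python
import numpy as np

def empirical_threshold(X, p_a, n_draws=10_000, seed=None):
    """Estimate a as the p_a quantile of M over simulated equal-sized
    randomizations -- no normality or chi-square approximation needed."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w0 = np.array([1] * (n // 2) + [0] * (n - n // 2))
    # cov(mean difference) = (1/nT + 1/nC) cov(X), with nT = nC = n/2
    Sinv = np.linalg.inv(np.cov(X, rowvar=False) * (2 / (n // 2)))
    Ms = np.empty(n_draws)
    for i in range(n_draws):
        w = rng.permutation(w0)
        d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
        Ms[i] = d @ Sinv @ d
    return float(np.quantile(Ms, p_a))
```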