Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations

Similar documents
Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements:

Tutorial 5: Power and Sample Size for One-way Analysis of Variance (ANOVA) with Equal Variances Across Groups. Acknowledgements:

Power Analysis. Ben Kite KU CRMDA 2015 Summer Methodology Institute

Sampling and Sample Size. Shawn Cole Harvard Business School

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Sample Size Determination

Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Binary Logistic Regression

Tutorial 2: Power and Sample Size for the Paired Sample t-test

a Sample By:Dr.Hoseyn Falahzadeh 1

Power and sample size calculations

Study Design: Sample Size Calculation & Power Analysis

MATH 10 INTRODUCTORY STATISTICS

Sample Size and Power Considerations for Longitudinal Studies

Bayesian Updating: Discrete Priors: Spring

Liang Li, PhD. MD Anderson

HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY?

BIOS 312: Precision of Statistical Inference

Dealing with the assumption of independence between samples - introducing the paired design.

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Conditioning Nonparametric null hypotheses Permutation testing. Permutation tests. Patrick Breheny. October 5. STA 621: Nonparametric Statistics

The Design of a Survival Study

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.

multilevel modeling: concepts, applications and interpretations

The minimum required number of clusters with unequal cluster sizes in cluster randomized trials

Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Intraclass Correlations in One-Factor Studies

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ

Chapter 1 Statistical Inference

Interval Computations as an Important Part of Granular Computing: An Introduction

May 2, Why do nonlinear models provide poor. macroeconomic forecasts? Graham Elliott (UCSD) Gray Calhoun (Iowa State) Motivating Problem

Classification 1: Linear regression of indicators, linear discriminant analysis

Many natural processes can be fit to a Poisson distribution

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

Power and nonparametric methods Basic statistics for experimental researchersrs 2017

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Exam Applied Statistical Regression. Good Luck!

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Tutorial 1: Power and Sample Size for the One-sample t-test. Acknowledgements:

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Physics 509: Bootstrap and Robust Parameter Estimation

A Significance Test for the Lasso

Adaptive Designs: Why, How and When?

Sample Size Calculations

Lecture #11: Classification & Logistic Regression

Drug Combination Analysis

ANALYSIS OF CORRELATED DATA SAMPLING FROM CLUSTERS CLUSTER-RANDOMIZED TRIALS

Sample Size. Vorasith Sornsrivichai, MD., FETP Epidemiology Unit, Faculty of Medicine Prince of Songkla University

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

Lecture 4: Generalized Linear Mixed Models

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

Duke University. Duke Biostatistics and Bioinformatics (B&B) Working Paper Series. Randomized Phase II Clinical Trials using Fisher s Exact Test

The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen

18.05 Practice Final Exam

Correlation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Sign test. Josemari Sarasola - Gizapedia. Statistics for Business. Josemari Sarasola - Gizapedia Sign test 1 / 13

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

f rot (Hz) L x (max)(erg s 1 )

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Latent Trait Reliability

HYPOTHESIS TESTING: FREQUENTIST APPROACH.

Concept of Statistical Model Checking. Presented By: Diana EL RABIH

Bayesian Updating: Discrete Priors: Spring

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison

Part III: A Simplex pivot

AST 418/518 Instrumentation and Statistics

Nonresponse weighting adjustment using estimated response probability

Modeling the Mean: Response Profiles v. Parametric Curves

Practical Statistics

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

REVIEW 8/2/2017 陈芳华东师大英语系

Two-stage k-sample designs for the ordered alternative problem

Power and the computation of sample size

STA 431s17 Assignment Eight 1

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Observations and objectivity in statistics

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Designing Information Devices and Systems I Spring 2016 Elad Alon, Babak Ayazifar Homework 11

Lecture Notes in Quantitative Biology

Regression and the 2-Sample t

Conflicts of Interest

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Advanced Experimental Design

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Transcription:

Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations Ben Brewer Duke University March 10, 2017

Introduction Group randomized trials (GRTs) are widely used. Cancer Prevention Health Promotion Research involving an intervention and more! In GRTs, groups of subjects serve as the randomization units. NOT the subjects themselves. 3.10.17 2/46

Introduction Responses within a group can be similar. Quantified by the Intraclass Correlation Coefficient (ICC). 0 ICC 1 Measures how strongly units within a group resemble each other. Equal to fraction of total variation in data attributable to that unit of assignment. 3.10.17 3/46

Starting Out Lots of things to consider at the design stage. One of the most important considerations What sample size do I need?! N = 3.10.17 4/46

Current Practice Minimal required sample size for two-arm GRT with continuous outcome: N = subjects per arm n = subjects per group ρ = ICC = effect size N = 2[z 1 α + z 2 1 β ] 2 [1 + n 1 ρ] 2 Groups per arm: m = N n 3.10.17 5/46

Issues Problem: the previous formula assumes equal group size! Not always true. Formulas for other outcome types do exist These also assume equal group sizes. 3.10.17 6/46

Quick Fix? Replace the fixed group size n by an estimate of the average group size: തn. Works in situations with low to moderate variance in group sizes. Sadly, not all real life situations feature low to moderate variance. When variance is great, the equal group size assumption fails. 3.10.17 7/46

Consequences Estimates involving sample size and statistical power are unreliable. Could waste money. Could waste time. Could lead to imprudent conclusions. None of these things are good. 3.10.17 8/46

More Comprehensive Solutions Formulas do exist that adjust for variation. Assume that mean are variance of group sizes are known in advance. Performance can vary. Most are for continuous outcomes. 3.10.17 9/46

Alternative Approach Monte-Carlo Simulation! Sidesteps around study-specific assumptions. 1. Repeat data generating process. 2. Obtain empirical distribution of test statistic. 3. Use that empirical distribution to estimate power of statistical test. For our purposes, we do require that data generating process be fully detailed. 3.10.17 10/46

Roadmap 1. General framework I. Monte-Carlo applied to GRTs 2. Continuous Outcomes I. Simulated Example 3. Binary Outcomes 4. Real-Life Example! 3.10.17 11/46

Refocus Goal: Estimate sample size for balanced two-arm GRT study to evaluate an intervention. What do we need? An effect size, A prespecified type I error, α A prespecified type II error, β f nj ( ) = the probability distribution of n j f ρ ( ) = the probability distribution of the ICC 3.10.17 12/46

Remarks We assume group sizes are random variables from a known probability distribution f nj ( ). We need some prior information about n j May be approximately known by design stage. May figure it out from literature review or pilot study. May be able to use sensitivity analysis based on different prior assumptions of group size distributions. This will give you a range: conservative is better! 3.10.17 13/46

Remarks Assuming the ICC is from a known probability distribution f ρ ( ) allows for uncertainty in ρ! Also requires prior information. May be determined by previous slide methods. If we have NO prior information, could use some fixed value. 3.10.17 14/46

STEP 1 Since m is the number of groups per arm, 2m is the total number of groups in the study. Sample 2m n j s from f nj, and sample a single ρ from f ρ ( ). 3.10.17 15/46

Brief Interjection The initial m can be chosen as the minimum feasible number of groups. Could also be guided by the sample calculation from way back in slide 4. Could be bounded above by something else 3.10.17 16/46

STEP 2 Based on the statistical model G used in the analysis, the effect size, and the ICC ρ, simulate data, y. 3.10.17 17/46

STEP 3 Fit the simulated data, y, using G and calculate the test statistic T. Can use any fully specified parametric model. We will use generalized linear mixed effects models, where groups are handled as random effects. This accounts for the ICC! Perform the hypothesis test relevant to your study. Reject or fail to reject based on α. 3.10.17 18/46

STEP 4 Repeat the steps 1-3 M times. Let K be the number of rejected hypotheses. Then you can estimate the power using K ΤM. If K ΤM (1 β), then m is the required number of groups per arm. If K ΤM < (1 β), then increase m and re-iterate through the steps again. 3.10.17 19/46

Comments on Selecting M Want M large enough to get meaningful results. Don t want M so large that the results of your simulation go on your epitaph. If the standard deviation of the estimated power is p(1 p) M, you can get an estimate for M. Just prespecify a standard deviation (like 1%) and solve! 3.10.17 20/46

Word of Warning The above method has some dodgy results when m is very small. Greater variation in estimated power. Although this is uncommon in practice, still good to know! 3.10.17 21/46

Continuous Outcome (Setup) Prespecify: μ 1 = intervention arm mean μ 2 = control arm mean σ y 2 = overall variance Then the effect size is just μ 1 μ 2 σ y 2. 3.10.17 22/46

Continuous Outcome (Model) Our linear mixed effect model is: Y ij = β 0 + β 1 X j + b j + ε ij Y ij = outcome for subject i nested within group j X j = binary variable indicating treatment arm b j = random effect of group j, b j ~N(0, σ 2 b ). ε ij = random effect of subject i, group j, ε ij ~N(0, σ 2 ε ). We can now calculate ρ = σ b 2 2 = σ b. σ 2 b +σ 2 ε σ 2 y 3.10.17 23/46

STEP 1 Sample 2m n j s from f nj, and sample a single ρ from f ρ ( ). 3.10.17 24/46

STEP 2 Calculate σ b 2 and σ ε 2. 3.10.17 25/46

STEP 3 Sample b j and ε ij from their respective distributions. Simulate data Y ij according the prespecified model. Specifically, β 1 = μ 1 μ 2. Also, β 0 is the average outcome of the control arm. Fit the model and perform the hypothesis test for your study. 3.10.17 26/46

STEP 4 Repeat steps 1-3 M times, and let K be the number of times the hypothesis was rejected. If K ΤM (1 β), then m is the required minimum number of groups per arm. Total number of subjects required in the study is estimated by the average of σ 2m i=1 n j. If K ΤM < (1 β), then increase m and re-iterate through the steps again. 3.10.17 27/46

Simulated Example Goal: calculate sample size for balanced two-arm GRT study with a continuous outcome. Required parameters are pre-specified: μ 1 μ 2 = 5 σ y 2 = 20 So = μ 1 μ 2 σ y 2 = 5 20 = 0.25 Desired power = 80% Put ρ = 0.02 3.10.17 28/46

Simulated Example (con.) As a reference, assume all group sizes are equal (n = 50). Calculate the total sample size using: N = 2[z 1 α + z 2 1 β ] 2 [1 + n 1 ρ] 2 This is the current practice method. 3.10.17 29/46

Simulated Example (con.) Now, allow the group sizes n j to vary across all groups. Consider the following four distributions: TruncPoisson(50, range = [5, 500]). TruncNormal(47.54, 25, range = [5, 500]). TruncNormal(17.99, 50, range = [5, 500]). TruncPareto(5, 20, range = [5, 500]). NOTE: mean of TruncNormal is μ = μ + φ a φ(b) ϕ a ϕ b 3.10.17 30/46

Some Graphs Formula (Assume Equal Group Size) 3.10.17 31/46

Additional Notes The minimum number of groups for each arm is in relatively good agreement under some circumstances. Notice how variance can make a big difference in required sample size! Alternative formula for estimating M: if the mean of f nj ( ) is known, then use 2m തn f. 3.10.17 32/46

Binary Outcome (Setup) Prespecify: p 1 = proportion for intervention arm p 2 = proportion for control arm 3.10.17 33/46

Binary Outcome (Model) The logistic mixed effect model is as follows: Y ij = Bern(p ij ) p ij ln = β 1 p 0 + β 1 X j + b j ij Y ij = outcome (0 or 1) p ij = probability of 1 for subject i nested within group j X j and b j are defined as in the continuous outcome model. 3.10.17 34/46

Finding the ICC For this logistic mixed-effects model, the within group variance is π2 3. Thus, the ICC is ρ = σ b 2 σ b 2 + π2 3 3.10.17 35/46

STEP 1 Sample 2m n j s from f nj, and sample a single ρ from f ρ ( ). You should be feeling some déjà vu by this point! 3.10.17 36/46

STEP 2 Calculate σ b 2. 3.10.17 37/46

STEP 3 Sample b j and simulate data Y ij according to the logistic model. β 1 = ln( p1 1 p 1 p2 1 p 2 ) = log-odds ratio between two arms. β 0 = ln( p 1 1 p 1 ) = log-odds of the proportion of the control arm. Fit the model and perform the hypothesis test for your study. 3.10.17 38/46

STEP 4 As in the previous model, repeat your simulation M times. Same conclusions as before! Posted below in case you forgot: If K ΤM (1 β), then m is the required minimum number of groups per arm. Total number of subjects required in the study is estimated by the average of σ 2m i=1 n j. If K ΤM < (1 β), then increase m and re-iterate through the steps again. 3.10.17 39/46

Again Why do we Care? Consider a study performed in 2012. Goal: Examine the efficacy of the Patient Navigation Research Program. Primary outcome: Receiving a definitive diagnosis within a six-month follow-up period post-intervention. The study employed a GRT design; there were 12 clinics involved in the study. These were the units of randomization. Some clinics received treatment; others did not. 3.10.17 40/46

Okay, so Researchers determined that they needed 1576 total patients for this study. Assumption: group sizes are equal. However, group sizes were NOT equal. Smallest group size: 24 Largest group size: 379 3.10.17 41/46

What happened? The study did not find a significant difference between the two groups. The study also had a whopping power of 30%. These results have to be taken with a grain of salt; that sample size calculation was unreliable. What if the intervention helped expedite the health care process? What about the time and resources used to conduct the study? 3.10.17 42/46

Analysis The authors took this study to be prior information and performed a sample size calculation of their own. They considered three distributions for f nj ( ): Truncated Poisson Truncated Normal Truncated Pareto Each of these led to a total sample size between 5,000 and 6,000. 3.10.17 43/46

Punch Line Not assuming that group sizes were equal would have led to a more realistic sample size calculation, thereby potentially saving money and time. 3.10.17 44/46

Conclusion Monte Carlo Simulation can provide an alternative framework for determining sample size when group sizes are not assumed to be equal. More versatile; not constrained to continuous data! Something to ponder: what changes would need to be made when the logistic model has proportions close to 0 or 1? 3.10.17 45/46

Questions??? 3.10.17 46/46