HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY?

TUTORIAL ON SAMPLE SIZE AND POWER CALCULATIONS FOR INEQUALITY TESTS
John Zavrakidis, j.zavrakidis@nki.nl
May 28, 2018

OUTLINE
- Introduction
- Sample Size Calculation
  - General Information
  - Sample Size Calculation for Continuous Outcome
  - Sample Size Calculation for Binary Outcome
  - Sample Size Calculation for Survival Outcome
- Summary

HOW MUCH DATA?
- It is not feasible to collect data on the entire population of interest, so a random sample is collected.
- How many subjects should the random sample contain?
  - If there are no data collection constraints: the more data the better.
  - If there are data collection constraints: enough data to ensure that the results are accurate, efficient and credible.
- Sample size calculation:
  - Is intended to determine the minimal amount of data (the sample size) required for detecting a relevant result.
  - Should be used at the planning stage of the investigation.

HOW MUCH DATA? Sample size & statistical power
- Sample size too small → underpowered experiment → unreliable results
- Sample size too large → overpowered experiment → no meaningful results (even trivial effects become statistically significant)
- More power increases the chances of finding significant results
- More power increases the chances of replicating prior findings
- More power increases confidence about the results, whether significant or not

PROCEDURE FOR A PRIORI SAMPLE SIZE CALCULATION
1. Decide on the outcome variables and their measurement level.
2. Choose a statistical model.
   - Specify the null and the alternative hypothesis.
   - Choose the test statistic.
3. Prespecify the effect size.
   - Decide on the expected difference: the smallest effect size that can be considered clinically important.
4. Select the desired power and α level.
   - α (significance level of the test): probability of rejecting the null hypothesis when it is true.
   - Power: probability of rejecting the null hypothesis when it is false, i.e. the probability of correctly rejecting H0.

TYPES OF OUTCOME AND STATISTICAL MODEL
Examples:
- Means: t-test or ANOVA
- Proportions: Z test for 2 proportions
- Bivariate relationship: test for 2 correlations
- Multiple regression: test for 2 slopes

HYPOTHESIS TESTING
- Non-inferiority/superiority testing: H0: parameter ≤ -δ vs H1: parameter > -δ
- Equivalence testing: H0: parameter ≤ -δ or parameter ≥ δ vs H1: -δ < parameter < δ
- Equality/inequality testing: H0: parameter = δ vs H1: parameter ≠ δ (or parameter > δ, or parameter < δ)
δ: non-inferiority/superiority/equivalence margin

INEQUALITY HYPOTHESIS TESTING
H0: parameter = parameter value
H1: parameter ≠ parameter value (or parameter > parameter value, or parameter < parameter value)

State of nature   Decision: reject H0        Decision: do not reject H0
H0 true           Type I error (α)           Correct decision (1 - α)
H0 false          Correct decision (1 - β)   Type II error (β)

INEQUALITY HYPOTHESIS TESTING
- Significance level of the test (α): probability of incorrectly concluding that there is a significant effect when it does not really exist in the population.
  - α is often set to 0.05 (5%), sometimes to 0.01 (1%) or 0.10 (10%).
- Power: probability of correctly concluding that there is a significant effect when it really exists in the population.
  - It is generally accepted that the power should be 0.8 or greater.
  - Power = 1 - β
[Figure: sampling distribution associated with H0, with a rejection region of area α/2 in each tail]

INEQUALITY HYPOTHESIS TESTING
Test statistic:
$$\text{Test statistic} = \frac{\text{Effect size}}{\text{Standard error}}$$
The test statistic has a sampling distribution.
[Figure: sampling distributions associated with H0 and with H1]

EFFECT SIZE
Effect size is a measure of the magnitude of a difference or relationship.
- Unstandardized effect size, e.g. (mean group 1 - mean group 2)
- Standardized effect size, e.g. (mean group 1 - mean group 2) / pooled standard deviation. Used when the metrics of the variables being studied do not have intrinsic meaning, when results from multiple studies are being combined, or when some or all of the studies use different scales.
Determining the effect size. Based on:
1. substantive knowledge (a clinically relevant effect)
2. findings from prior research
3. a pilot study
4. conventions, e.g. those defined by Cohen (means comparison: d = 0.20 is a small effect, d = 0.50 is a medium effect, d = 0.80 is a large effect)
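
The standardized effect size for a comparison of two means can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `cohens_d` and `cohen_label`, the label "negligible" for values below 0.20, and the two small data sets are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference: (mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


def cohen_label(d):
    """Map |d| to Cohen's conventional categories (0.20 / 0.50 / 0.80)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # label chosen for this sketch, not a Cohen category


# Hypothetical pilot data, for illustration only.
pilot_group_1 = [2, 4, 6, 8, 10]
pilot_group_2 = [1, 3, 5, 7, 9]
d = cohens_d(pilot_group_1, pilot_group_2)
print(f"d = {d:.2f} ({cohen_label(d)})")
```

The unstandardized effect here is simply the difference in means; dividing by the pooled standard deviation is what makes d comparable across scales.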

DIFFERENCE IN TWO INDEPENDENT MEANS
H0: $\mu_C = \mu_E$ vs H1: $\mu_C > \mu_E$
$$\text{Test statistic} = \frac{\bar{Y}_C - \bar{Y}_E}{\sqrt{S_C^2/n_C + S_E^2/n_E}}$$
Under H0: test statistic ~ normal distribution with mean $m_1 = 0$
Under H1: test statistic ~ normal distribution with mean
$$m_2 = \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}}$$
$\mu_C, \mu_E$: population means; $\bar{Y}_C, \bar{Y}_E$: sample means; $\sigma_C, \sigma_E$: population standard deviations; $S_C, S_E$: sample standard deviations; $n_C, n_E$: sample sizes
The statistical power is the probability that the test statistic is above the critical value $z_{1-\alpha}$ when the alternative hypothesis is true.
[Figure: sampling distributions associated with H0 (mean m1) and H1 (mean m2); the power is the area under the H1 curve above the critical value z_{1-α}]

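DIFFERENCE IN TWO INDEPENDENT MEANS
Derivation of the sample size formula ($F$ denotes the standard normal distribution function):
$$1 - \beta = P\left(Z + \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}} > z_{1-\alpha}\right) = P\left(Z > z_{1-\alpha} - \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}}\right)$$
$$= P\left(Z < -z_{1-\alpha} + \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}}\right) = F\left(-z_{1-\alpha} + \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}}\right)$$
$$z_{1-\beta} = -z_{1-\alpha} + \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}}$$
(some algebra)
$$n_C = \left(s_C^2 + \frac{s_E^2}{k}\right)\left(\frac{z_{1-\alpha} + z_{1-\beta}}{\bar{y}_C - \bar{y}_E}\right)^2, \quad k = \frac{n_E}{n_C}$$
If H1: $\mu_C \neq \mu_E$, then $z_{1-\alpha}$ is replaced by $z_{1-\alpha/2}$.
[Figure: sampling distributions associated with H0 (mean m1) and H1 (mean m2), with the critical value z_{1-α} and the power region]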
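
The "some algebra" step can be spelled out as follows. This is a sketch that substitutes $n_E = k\,n_C$ and solves for $n_C$; replacing the population quantities by their sample counterparts ($s_C$, $s_E$, $\bar{y}_C$, $\bar{y}_E$) in the final formula is assumed to be a plug-in step.

```latex
\begin{align*}
z_{1-\alpha} + z_{1-\beta}
  &= \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/(k\,n_C)}}
   = \frac{(\mu_C - \mu_E)\sqrt{n_C}}{\sqrt{\sigma_C^2 + \sigma_E^2/k}} \\
\sqrt{n_C}
  &= \frac{(z_{1-\alpha} + z_{1-\beta})\sqrt{\sigma_C^2 + \sigma_E^2/k}}{\mu_C - \mu_E} \\
n_C
  &= \left(\sigma_C^2 + \frac{\sigma_E^2}{k}\right)
     \left(\frac{z_{1-\alpha} + z_{1-\beta}}{\mu_C - \mu_E}\right)^2
\end{align*}
```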

DIFFERENCE IN TWO INDEPENDENT MEANS
$z_{1-\alpha}$, $z_{1-\beta}$: critical values from the standard normal distribution

α = 5%        z_{1-α} = 1.65    z_{1-α/2} = 1.96
α = 1%        z_{1-α} = 2.33    z_{1-α/2} = 2.58

Power = 80%   z_{1-β} = 0.84
Power = 85%   z_{1-β} = 1.04
Power = 90%   z_{1-β} = 1.28
Power = 95%   z_{1-β} = 1.65
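
These critical values can be reproduced without tables. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_quantile` is a hypothetical choice for this illustration.

```python
from statistics import NormalDist


def z_quantile(p: float) -> float:
    """Return z_p, the p-th quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


# Critical values for the significance level (one-sided and two-sided).
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.0%}: z_(1-alpha) = {z_quantile(1 - alpha):.3f}, "
          f"z_(1-alpha/2) = {z_quantile(1 - alpha / 2):.3f}")

# Critical values for the power (1 - beta).
for power in (0.80, 0.85, 0.90, 0.95):
    print(f"power = {power:.0%}: z_(1-beta) = {z_quantile(power):.3f}")
```

Note that the one-sided 5% value is 1.645 to three decimals, which the table rounds to 1.65.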

DIFFERENCE IN TWO INDEPENDENT MEANS
H0: $\mu_C = \mu_E$ vs H1: $\mu_C > \mu_E$
$$\text{Test statistic} = \frac{\bar{Y}_C - \bar{Y}_E}{\sqrt{S_C^2/n_C + S_E^2/n_E}}$$
Under H0: test statistic ~ Student-t distribution with non-centrality parameter 0 and df = $n_C + n_E - 2$
Under H1: test statistic ~ Student-t distribution with df = $n_C + n_E - 2$ and non-centrality parameter
$$\lambda = \frac{\mu_C - \mu_E}{\sqrt{\sigma_C^2/n_C + \sigma_E^2/n_E}}$$
[Figure: sampling distributions associated with H0 and H1, with the power region]

DIFFERENCE IN TWO INDEPENDENT MEANS
Example with breast cancer patients:
- Quality of life (QoL) after mastectomy compared to QoL after breast conserving surgery; breast conserving surgery may improve QoL more than mastectomy.
- $\bar{Y}_M = 15$, $\bar{Y}_{BCS} = 25$, $S_C = S_E = 13$
- How many women are needed to detect the difference with a power of 90%?
http://www.gpower.hhu.de/en.html
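
As a rough cross-check of this example, the normal-approximation formula for $n_C$ can be evaluated directly. The sketch below makes assumptions the slide does not state: a significance level of α = 0.05 and equal allocation (k = 1). The function name `n_per_group_two_means` is hypothetical. A t-based calculation, such as the one performed by G*Power, will typically give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(mean_1, mean_2, sd_1, sd_2,
                          alpha=0.05, power=0.90, k=1.0, two_sided=False):
    """Size of the first group from the normal approximation:
    n = (sd_1^2 + sd_2^2 / k) * ((z_alpha + z_beta) / (mean_1 - mean_2))^2,
    where k is the ratio of the second group size to the first.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n = (sd_1 ** 2 + sd_2 ** 2 / k) * ((z_alpha + z_beta) / (mean_1 - mean_2)) ** 2
    return ceil(n)  # round up to whole subjects


# Values from the example: means 25 and 15, common SD 13, power 90%.
# alpha = 0.05 and k = 1 are assumptions made for this sketch.
n_one_sided = n_per_group_two_means(25, 15, 13, 13)
n_two_sided = n_per_group_two_means(25, 15, 13, 13, two_sided=True)
print(f"one-sided test: {n_one_sided} women per group")  # 29
print(f"two-sided test: {n_two_sided} women per group")  # 36
```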

DIFFERENCE IN TWO INDEPENDENT PROPORTIONS
H0: $\pi_C = \pi_E$ vs H1: $\pi_C > \pi_E$
$$\text{Test statistic} = \frac{p_C - p_E}{\sqrt{p_C(1 - p_C)/n_C + p_E(1 - p_E)/n_E}}$$
Under H0: test statistic ~ normal distribution with mean = 0
Under H1: test statistic ~ normal distribution with mean
$$\frac{\pi_C - \pi_E}{\sqrt{\pi_C(1 - \pi_C)/n_C + \pi_E(1 - \pi_E)/n_E}}$$
$\pi_C, \pi_E$: population proportions; $p_C, p_E$: sample proportions; $n_C, n_E$: sample sizes
The statistical power is the probability that the test statistic is above the critical value $z_{1-\alpha}$ when the alternative hypothesis is true:
$$1 - \beta = P\left(Z + \frac{\pi_C - \pi_E}{\sqrt{\pi_C(1 - \pi_C)/n_C + \pi_E(1 - \pi_E)/n_E}} > z_{1-\alpha}\right)$$
$$-z_{1-\alpha} + \frac{\pi_C - \pi_E}{\sqrt{\pi_C(1 - \pi_C)/n_C + \pi_E(1 - \pi_E)/n_E}} = z_{1-\beta}$$
(some algebra)
$$n_C = \left(p_C(1 - p_C) + \frac{p_E(1 - p_E)}{k}\right)\left(\frac{z_{1-\alpha} + z_{1-\beta}}{p_C - p_E}\right)^2, \quad k = \frac{n_E}{n_C}$$

DIFFERENCE IN TWO INDEPENDENT PROPORTIONS, EXAMPLE
Example with sarcoma patients:
- Standard treatment (25x2Gy radiotherapy without chemotherapy) compared to new treatment (25x2Gy radiotherapy with chemotherapy); the new treatment may increase the proportion of patients with necrosis induction.
- $\hat{p}_C = 30\%$, $\hat{p}_E = 45\%$, $n_C = 50$, $n_E = 50$
- What is the power level that we can reach?
http://www.gpower.hhu.de/en.html
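
The power in this example can be approximated with the normal-approximation expression for two proportions, 1 - β = F(-z_{1-α} + |p_C - p_E| / SE). The sketch below assumes α = 0.05, which the slide does not state, and uses the absolute difference because here the new treatment is expected to have the higher proportion. The function name `power_two_proportions` is hypothetical, and the two-sided variant ignores the negligible probability of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_c, p_e, n_c, n_e, alpha=0.05, two_sided=False):
    """Approximate power of the z test for two independent proportions."""
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p_c * (1 - p_c) / n_c + p_e * (1 - p_e) / n_e)
    # Mean of the test statistic under the alternative hypothesis.
    shift = abs(p_c - p_e) / se
    return std_normal.cdf(shift - z_alpha)


# Example values: 30% vs 45%, 50 patients per arm; alpha = 0.05 is assumed.
power_one_sided = power_two_proportions(0.30, 0.45, 50, 50)
power_two_sided = power_two_proportions(0.30, 0.45, 50, 50, two_sided=True)
print(f"one-sided power: {power_one_sided:.2f}")
print(f"two-sided power: {power_two_sided:.2f}")
```

Under these assumptions the power stays below 50% in both cases, well short of the conventional 0.8, so 50 patients per arm would be too few for this difference.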

DIFFERENCE IN TWO INDEPENDENT PROPORTIONS, EXAMPLE
Example with sarcoma patients:
- Standard treatment (25x2Gy radiotherapy without chemotherapy) compared to new treatment (25x2Gy radiotherapy with chemotherapy); the new treatment may increase the proportion of patients with necrosis induction.
- $\hat{p}_C = 30\%$, $n_C = 50$, $n_E = 50$
- Which effect size can be detected with a power of 80%?
http://www.gpower.hhu.de/en.html
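
One way to answer this question is to search for the smallest experimental proportion at which the normal-approximation power reaches the target. The sketch below uses bisection and assumes a one-sided test at α = 0.05 (not stated on the slide); the names `power_one_sided` and `detectable_p_e` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(p_c, p_e, n_c, n_e, alpha=0.05):
    """Approximate one-sided power of the z test for two proportions."""
    std_normal = NormalDist()
    se = sqrt(p_c * (1 - p_c) / n_c + p_e * (1 - p_e) / n_e)
    return std_normal.cdf((p_e - p_c) / se - std_normal.inv_cdf(1 - alpha))


def detectable_p_e(p_c, n_c, n_e, alpha=0.05, target_power=0.80):
    """Smallest p_e > p_c detectable with the target power, found by bisection."""
    lo, hi = p_c, 1.0  # power equals alpha at p_e = p_c and is near 1 at p_e = 1
    for _ in range(60):
        mid = (lo + hi) / 2
        if power_one_sided(p_c, mid, n_c, n_e, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


# Example values: 30% under standard treatment, 50 patients per arm.
p_e = detectable_p_e(0.30, 50, 50)
print(f"detectable p_E = {p_e:.3f}, i.e. a difference of {p_e - 0.30:.3f}")
```

Under these assumptions the experimental proportion has to be roughly 0.54 before a power of 80% is reached, a much larger difference than the 15 percentage points of the previous example.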

RATIO BETWEEN TWO INDEPENDENT ODDS
H0: $\frac{\pi_C}{1 - \pi_C} = \frac{\pi_E}{1 - \pi_E}$ vs H1: $\frac{\pi_C}{1 - \pi_C} > \frac{\pi_E}{1 - \pi_E}$
$$OR = \frac{\pi_C/(1 - \pi_C)}{\pi_E/(1 - \pi_E)}; \quad \widehat{OR} = \frac{p_C/(1 - p_C)}{p_E/(1 - p_E)}; \quad \hat{p} = (1 - b)\,p_C + b\,p_E$$
$$\text{Test statistic} = \frac{(p_C - p_E)\sqrt{N b (1 - b)}}{\sqrt{\hat{p}(1 - \hat{p})}}$$
Under H0: test statistic ~ normal distribution with mean = 0
Under H1: test statistic ~ normal distribution with mean
$$\frac{(\pi_C - \pi_E)\sqrt{N b (1 - b)}}{\sqrt{\pi(1 - \pi)}}$$
$OR$: population odds ratio; $\widehat{OR}$: sample odds ratio; $\pi_C, \pi_E$: population event rates; $p_C, p_E$: sample event rates; $\pi$: overall population event rate; $\hat{p}$: overall sample event rate; $N$: total sample size; $b$: proportion of the sample in the E group
The statistical power is the probability that the test statistic is above the critical value $z_{1-\alpha}$ when the alternative hypothesis is true:
$$1 - \beta = P\left(Z + \sqrt{\frac{(\pi_C - \pi_E)^2\, N b (1 - b)}{\pi(1 - \pi)}} > z_{1-\alpha}\right)$$
$$-z_{1-\alpha} + \sqrt{\frac{(\pi_C - \pi_E)^2\, N b (1 - b)}{\pi(1 - \pi)}} = z_{1-\beta}$$
$$N = \frac{\hat{p}(1 - \hat{p})\,(z_{1-\alpha} + z_{1-\beta})^2}{b(1 - b)\,(p_C - p_E)^2}$$

RATIO BETWEEN TWO INDEPENDENT ODDS
Example with bladder cancer patients:
- Is bladder cancer associated with cigarette smoking?
- $b = 80\%$, $\hat{p}_C = 20\%$, $\widehat{OR} = 2$
- What is the power with N = 500?
http://dceg.cancer.gov/tools/design/power
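
A rough version of this power calculation can be sketched with the normal approximation, in which the test statistic under H1 has mean |p_C - p_E| · sqrt(N b (1 - b) / (p (1 - p))) with p = (1 - b) p_C + b p_E. Several assumptions are made here that the slide does not state: a one-sided test at α = 0.05, and an odds ratio defined as the odds in group C divided by the odds in group E (if the ratio is meant the other way round, the result changes). The function name `power_odds_ratio` is hypothetical, and dedicated software may use a different method and report a different value.

```python
from math import sqrt
from statistics import NormalDist


def power_odds_ratio(p_c, odds_ratio, b, n_total, alpha=0.05):
    """Approximate one-sided power for detecting a given odds ratio.

    p_c        event rate in group C
    odds_ratio odds in group C divided by odds in group E (assumed direction)
    b          proportion of the total sample in group E
    n_total    total sample size N
    """
    std_normal = NormalDist()
    # Recover the group E event rate from the odds ratio.
    odds_e = (p_c / (1 - p_c)) / odds_ratio
    p_e = odds_e / (1 + odds_e)
    # Overall event rate, weighted by the group proportions.
    p_bar = (1 - b) * p_c + b * p_e
    shift = abs(p_c - p_e) * sqrt(n_total * b * (1 - b) / (p_bar * (1 - p_bar)))
    return std_normal.cdf(shift - std_normal.inv_cdf(1 - alpha))


# Example values: b = 80%, p_C = 20%, OR = 2, N = 500; alpha = 0.05 is assumed.
power = power_odds_ratio(p_c=0.20, odds_ratio=2.0, b=0.80, n_total=500)
print(f"approximate power: {power:.2f}")
```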

RATIO BETWEEN TWO INDEPENDENT HAZARDS
H0: $HR = 1$ vs H1: $HR > 1$, where $HR = \frac{h_1(t)}{h_2(t)}$ for all $t$
$$\text{Test statistic} = \log(\widehat{HR})\,\sqrt{q_C\, q_E\, d\, N}$$
Under H0: test statistic ~ normal distribution with mean = 0
Under H1: test statistic ~ normal distribution with mean = $\log(HR)\,\sqrt{q_C\, q_E\, d\, N}$
$HR$: population hazard ratio; $\widehat{HR}$: sample hazard ratio; $q_C, q_E$: proportions of $N$ in each group; $N$: total sample size; $d$: overall baseline probability of an event
The statistical power is the probability that the test statistic is above the critical value $z_{1-\alpha}$ when the alternative hypothesis is true:
$$1 - \beta = P\left(Z + \log(HR)\,\sqrt{q_C\, q_E\, d\, N} > z_{1-\alpha}\right)$$
$$-z_{1-\alpha} + \log(HR)\,\sqrt{q_C\, q_E\, d\, N} = z_{1-\beta}$$
$$N = \frac{(z_{1-\alpha} + z_{1-\beta})^2}{\log(HR)^2\; q_C\, q_E\, d}$$

RATIO BETWEEN TWO INDEPENDENT HAZARDS, EXAMPLE
Example with breast cancer patients:
- Is dementia associated with chemotherapy?
- $HR = 1.5$, $q_C = q_E = 50\%$, $d = 10\%$
- How many patients are needed to reach a power of 80%?
http://dceg.cancer.gov/tools/design/power
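
The closed-form expression N = (z_{1-α} + z_{1-β})² / (log(HR)² q_C q_E d) can be evaluated for this example as a back-of-the-envelope check. The sketch assumes α = 0.05, which the slide does not state, and the function name `n_total_hazard_ratio` is hypothetical. Dedicated software may rely on more refined methods, so its output need not coincide with this approximation.

```python
from math import ceil, log
from statistics import NormalDist


def n_total_hazard_ratio(hr, q_c, q_e, d, alpha=0.05, power=0.80, two_sided=False):
    """Total sample size N for detecting a hazard ratio (normal approximation).

    q_c, q_e  proportions of N allocated to each group
    d         overall probability of observing an event
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n = (z_alpha + z_beta) ** 2 / (log(hr) ** 2 * q_c * q_e * d)
    return ceil(n)  # round up to whole patients


# Example values: HR = 1.5, equal allocation, 10% event probability, power 80%.
n_one_sided = n_total_hazard_ratio(1.5, 0.5, 0.5, 0.10)
n_two_sided = n_total_hazard_ratio(1.5, 0.5, 0.5, 0.10, two_sided=True)
print(f"one-sided test: N = {n_one_sided}")
print(f"two-sided test: N = {n_two_sided}")
```

Because only about 10% of patients are expected to have an event, the required total runs into the thousands (on the order of 1500 one-sided and 1900 two-sided under these assumptions).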

PRIOR KNOWLEDGE
For any sample size calculation we need to know:
- The type of test (e.g., independent t-test, paired t-test, ANOVA, regression, etc.)
- The significance level
- The expected effect size
- The power
Fixing three of these quantities determines the fourth:
- Fixing the significance level, the expected effect size and the sample size, we can calculate the power.
- Fixing the significance level, the expected effect size and the power, we can calculate the sample size.
- Fixing the significance level, the power and the sample size, we can calculate the effect size.

PRIOR KNOWLEDGE
General rules:
- smaller effect size → larger N
- smaller α or greater power → larger N
- larger measurement variability → larger N
- 2-tailed test → larger N than for a 1-tailed test
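
The general rules can be checked numerically with the normal-approximation formula for two means with equal group sizes and a common standard deviation, n = 2 sd² (z_α + z_β)² / Δ². All input values below (difference 10, standard deviation 13, α = 0.05, power 0.80) are hypothetical baseline choices for this sketch, as is the function name `n_per_group`.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, two_sided=False):
    """Per-group sample size for comparing two means (equal n, common sd)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return ceil(2 * sd ** 2 * (z_alpha + z(power)) ** 2 / delta ** 2)


baseline = n_per_group(delta=10, sd=13)
scenarios = {
    "baseline": baseline,
    "smaller effect size (delta = 5)": n_per_group(delta=5, sd=13),
    "smaller alpha (0.01)": n_per_group(delta=10, sd=13, alpha=0.01),
    "greater power (0.90)": n_per_group(delta=10, sd=13, power=0.90),
    "larger variability (sd = 20)": n_per_group(delta=10, sd=20),
    "2-tailed test": n_per_group(delta=10, sd=13, two_sided=True),
}
for label, n in scenarios.items():
    print(f"{label:35s} n per group = {n}")
```

Every change listed in the rules pushes the required n above the baseline; halving the effect size roughly quadruples it, because Δ enters the formula squared.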

GENERAL RULES
Measurement variability influence
[Figure: two panels, each showing the sampling distributions associated with H0 (mean m1) and H1 (mean m2), the critical value z_{1-α} and the power region]

GENERAL RULES
Effect size influence
[Figure: two panels, each showing the sampling distributions associated with H0 (mean m1) and H1 (mean m2), the critical value z_{1-α} and the power region]

GENERAL RULES
α and test type influence
[Figure: two panels, each showing the sampling distributions associated with H0 (mean m1) and H1 (mean m2) and the power region; one panel uses the critical value z_{1-α/2}, the other z_{1-α}]

POWER & SAMPLE SIZE

LIMITATIONS OF SAMPLE SIZE ANALYSIS
- Because they are based on assumptions and educated guesses, the analyses give a "best case scenario" estimate of the necessary sample size.
- A good strategy is to compute the required sample size for different levels of effect size, α and power, and to present N as a range instead of a single number.

q_C    q_E    d      HR     N      power
0.5    0.5    10%    1.5    1809   0.8
0.4    0.6    10%    1.5    1908   0.8
0.3    0.7    10%    1.5    2208   0.8
0.4    0.6    10%    1.6    1388   0.8
0.4    0.6    8%     1.6    1680   0.8
0.4    0.6    8%     1.6    2225   0.9

ADJUSTMENTS OF SAMPLE SIZE ANALYSIS
- Loss to follow-up: participants withdraw from the study, are lost to follow-up, or information on key variables is missing.
- Multiple regression: multiple hypotheses are tested since many effect sizes are estimated; N > 10 × (number of prognostic factors).
- Interaction: subgroup analyses require a sample size at least four times larger than analyses of the overall association.
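
These adjustments amount to simple arithmetic on a previously computed sample size. The sketch below is illustrative only: dividing by (1 - dropout rate) is a commonly used inflation for loss to follow-up that the slide itself does not spell out, and the starting value of 29 per group, the 15% dropout rate and the six prognostic factors are hypothetical numbers.

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Enrol enough subjects so that n_required remain after the expected dropout."""
    return ceil(n_required / (1 - dropout_rate))


def meets_regression_rule(n_total, n_prognostic_factors):
    """Rule of thumb for multiple regression: N > 10 x (number of prognostic factors)."""
    return n_total > 10 * n_prognostic_factors


def n_for_interaction(n_overall):
    """Subgroup (interaction) analyses: at least four times the overall sample size."""
    return 4 * n_overall


n_required = 29                                     # hypothetical per-group size
n_enrolled = inflate_for_dropout(n_required, 0.15)  # assumed 15% loss to follow-up
print(f"enrol {n_enrolled} per group to retain {n_required}")
print("regression rule met:", meets_regression_rule(2 * n_enrolled, 6))
print("per-group size for interaction analysis:", n_for_interaction(n_required))
```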

SMALL SAMPLE SIZE ANALYSIS
- Small sample: studies that typically have between 5 and 30 observations.
- Only large effects can be detected.
- There are fewer options with respect to appropriate statistical procedures, e.g. correlations, logistic regression and multilevel modeling are not appropriate.
- The generalizability of the results may also be questionable.
- Point estimates are less precise: the smaller the sample size, the wider the confidence interval (CI).

SOFTWARE
- G*Power 3: Statistical Power Analyses for Windows and Mac, http://www.gpower.hhu.de/en.html
- Power, http://dceg.cancer.gov/tools/design/power
- PASS Sample Size, http://www.ncss.com/software/pass/
- StatsDirect, https://www.statsdirect.com/

REFERENCES
- Ryan TP. Sample Size Determination and Power. John Wiley & Sons, Inc. 2013.
- Chow SC, Shao J, Wang H. Sample Size Calculations in Clinical Research, 2nd Edition. Chapman & Hall/CRC. 2008.
- Hsieh FY, Bloch DA, Larsen MD. A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 1998, 17: 1623-1634.
- Garcia-Closas M, Lubin JH. Power and sample size calculations in case-control studies of gene-environmental interactions: Comments on different approaches. American Journal of Epidemiology, 1999, 149: 689-693.
- Faul F, Erdfelder E, Lang A-G, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 2007, 39 (2): 175-191.