Power and sample size calculations

Susanne Rosthøj
Biostatistisk Afdeling
Institut for Folkesundhedsvidenskab
Københavns Universitet
sr@biostat.ku.dk

April 8, 2014
Planning an investigation

How many individuals do we need?

It depends on
- the design
- the size of the effect we are looking for
- how certain we want to be of finding the effect

and on the purpose of the investigation:
- to obtain a specific precision of an estimate
- to obtain a specific power of a test (most common).

2 / 12
Precision

We want to estimate the risk of CHD for females with a precision of $p \pm a$.

95% confidence interval (95% CI): $p \pm 1.96 \sqrt{p(1-p)/n}$

Thus $a = 1.96 \sqrt{p(1-p)/n}$, i.e.

$$n = \frac{3.84\, p(1-p)}{a^2}.$$

We need a guess at $p$.

Example: For $p = 0.10$ and $a = 0.04$ we find $n = 216$.

Similarly, for a quantitative outcome,

$$n = 3.84 \left(\frac{SD}{a}\right)^2$$

if we need a precision of $\pm a$ for the mean $\mu$.

3 / 12
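The two precision formulas on this slide can be sketched in R as follows. The function names `n_for_proportion` and `n_for_mean` are hypothetical helpers introduced for illustration only (they are not part of base R), and `qnorm` is used to obtain the exact normal quantile instead of the rounded 1.96.

```r
# Hypothetical helpers (not part of base R) for precision-based sample sizes.

# Sample size so that the confidence interval for a proportion is p +/- a.
n_for_proportion <- function(p, a, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # about 1.96 for a 95% CI
  z^2 * p * (1 - p) / a^2
}

# Sample size so that the confidence interval for a mean is mean +/- a.
n_for_mean <- function(sd, a, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  z^2 * (sd / a)^2
}

# The slide's example: guess p = 0.10, wanted precision a = 0.04.
n_prop <- n_for_proportion(p = 0.10, a = 0.04)
print(n_prop)
```

The result is close to the 216 found on the slide; the small difference comes from using the unrounded quantile rather than 3.84.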
Test of hypotheses

A test of a hypothesis $H_0$ can give two types of error:

Type I: Reject the hypothesis even though it is true.
Type II: Accept the hypothesis even though it is wrong.

Probability of type I error: $\alpha$ = level of significance.
Probability of type II error: $\beta$. $1 - \beta$ = power.

                                   Truth
Conclusion   Hypothesis true                     Hypothesis wrong
Accept       Correct conclusion ($1 - \alpha$)   Type II error ($\beta$)
Reject       Type I error ($\alpha$)             Correct conclusion ($1 - \beta$)

4 / 12
Comparison of two groups

We determine the number of individuals in each group.

Binary response: Determine $p_1$, $p_2$, $\alpha$ and $\beta$. Let $\bar{p} = \frac{1}{2}(p_1 + p_2)$.

$$n = \frac{\left( z_{1-\alpha/2} \sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta} \sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_1 - p_2)^2}$$

with $z_p$ being the $p$-quantile of the standard normal distribution.

Quantitative response: Determine $\mu_1$, $\mu_2$, SD, $\alpha$ and $\beta$. We need only $\mu_1 - \mu_2$ ($= \Delta$).

$$n = 2\, \frac{SD^2}{(\mu_1 - \mu_2)^2} \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2.$$

5 / 12
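As a minimal sketch, the two sample size formulas for comparing groups can be written directly in R. The names `n_two_proportions` and `n_two_means` are hypothetical helpers, not existing R functions; the defaults of 5% significance and 90% power are assumptions chosen to match the examples in these slides.

```r
# Hypothetical helpers implementing the normal-approximation formulas.

# Number per group for comparing two proportions p1 and p2.
n_two_proportions <- function(p1, p2, alpha = 0.05, power = 0.90) {
  pbar <- (p1 + p2) / 2
  z_a  <- qnorm(1 - alpha / 2)   # z_{1 - alpha/2}
  z_b  <- qnorm(power)           # z_{1 - beta}
  (z_a * sqrt(2 * pbar * (1 - pbar)) +
     z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p1 - p2)^2
}

# Number per group for comparing two means that differ by delta.
n_two_means <- function(delta, sd, alpha = 0.05, power = 0.90) {
  2 * (sd / delta)^2 * (qnorm(1 - alpha / 2) + qnorm(power))^2
}

n_prop  <- n_two_proportions(p1 = 0.6, p2 = 0.8)
n_means <- n_two_means(delta = 10, sd = 25)
print(c(proportions = n_prop, means = n_means))
```

Both functions return a non-integer value; in practice the result is rounded up to the next whole individual.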
The size of the sample

The needed sample size depends on:
- the level of significance ($\alpha$)
- the power ($1 - \beta$)
- the difference between the groups: the larger the difference, the smaller the needed sample size
- the variation (SD): the larger the variation, the larger the needed sample size

6 / 12
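These dependencies can be illustrated numerically with `power.t.test` from base R. The wrapper `n_needed` and the grids of values for the difference and the SD are assumptions made for this sketch only.

```r
# Hypothetical wrapper: number per group from power.t.test.
n_needed <- function(delta, sd, power = 0.90, sig.level = 0.05) {
  power.t.test(delta = delta, sd = sd, power = power,
               sig.level = sig.level)$n
}

# Larger difference between the groups -> smaller sample size.
n_by_delta <- sapply(c(5, 10, 20), function(d) n_needed(delta = d, sd = 25))

# Larger variation -> larger sample size.
n_by_sd <- sapply(c(15, 25, 35), function(s) n_needed(delta = 10, sd = s))

# Higher power -> larger sample size.
n_by_power <- sapply(c(0.80, 0.90, 0.95),
                     function(p) n_needed(delta = 10, sd = 25, power = p))

print(round(rbind(n_by_delta, n_by_sd, n_by_power), 1))
```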
Example

Assume that we want to detect a difference in SBP of $\Delta = 10$ mmHg for women randomized to placebo / treatment.

We want to be 90% sure to detect the difference ($1 - \beta = 0.90$) when testing at the 5% significance level ($\alpha = 0.05$).

In the Framingham data we find SD = 25. I.e.

$$n = 2 \left(\frac{SD}{\Delta}\right)^2 (z_{1-0.05/2} + z_{0.90})^2 = 2 \left(\frac{SD}{\Delta}\right)^2 (1.96 + 1.28)^2 = 2 \left(\frac{25}{10}\right)^2 \cdot 10.5 = 131.25.$$

We need 132 women in each group.

7 / 12
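The arithmetic of this example can be checked in R. The sketch below computes the formula once with the rounded quantiles from the slide and once with exact quantiles from `qnorm`; the variable names are chosen here for illustration.

```r
sd_sbp <- 25   # SD of SBP, from the Framingham data
delta  <- 10   # difference to detect, in mmHg

# With the rounded quantiles 1.96 and 1.28 used on the slide.
n_rounded <- 2 * (sd_sbp / delta)^2 * (1.96 + 1.28)^2

# With exact standard normal quantiles.
n_exact <- 2 * (sd_sbp / delta)^2 * (qnorm(1 - 0.05 / 2) + qnorm(0.90))^2

# Round up to whole individuals per group.
n_per_group <- ceiling(n_exact)
print(c(rounded = n_rounded, exact = n_exact, per_group = n_per_group))
```

Either way the calculation rounds up to the same number of women per group as on the slide.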
Sample size calculations in R

Comparison of proportions in two groups

> power.prop.test( p1=0.6, p2=0.8, power=0.9, sig.level=0.05 )

     Two-sample comparison of proportions power calculation

              n = 108.2355
             p1 = 0.6
             p2 = 0.8
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group

Comparison of means in two groups

> power.t.test( delta=10, sd=25, power=0.9, sig.level=0.05 )

     Two-sample t test power calculation

              n = 132.3106
          delta = 10
             sd = 25
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group

## NB: The difference from the example on slide 7 (131.25) arises mainly
## because power.t.test is based on the t distribution rather than on
## normal quantiles, and partly from rounding the quantiles in the formula.

More functions for calculating sample size are available in the package pwr.

8 / 12
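Both functions return an object whose element `n` holds the calculated number per group, so the value can be extracted and rounded up programmatically. A short sketch (the variable names are chosen here for illustration):

```r
# Number per group for the comparison of proportions, rounded up.
res_prop <- power.prop.test(p1 = 0.6, p2 = 0.8, power = 0.9, sig.level = 0.05)
n_prop   <- ceiling(res_prop$n)

# Number per group for the comparison of means, rounded up.
res_mean <- power.t.test(delta = 10, sd = 25, power = 0.9, sig.level = 0.05)
n_mean   <- ceiling(res_mean$n)

print(c(proportions = n_prop, means = n_mean))
```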
Power calculations

If we have two samples of size n = 100 we may ask what the power is to detect a specific difference between the groups.

Comparison of proportions in two groups

> power.prop.test( p1=0.6, p2=0.8, n=100, sig.level=0.05 )

     Two-sample comparison of proportions power calculation

              n = 100
             p1 = 0.6
             p2 = 0.8
      sig.level = 0.05
          power = 0.8757319
    alternative = two.sided

NOTE: n is number in *each* group

Comparison of means in two groups

> power.t.test( delta=10, sd=25, n=100, sig.level=0.05 )

     Two-sample t test power calculation

              n = 100
          delta = 10
             sd = 25
      sig.level = 0.05
          power = 0.8036466
    alternative = two.sided

NOTE: n is number in *each* group

9 / 12
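To see how the power changes with the group size, the same call can be repeated over several values of n. This is a sketch; the grid of group sizes is an arbitrary choice made for illustration.

```r
# Group sizes to examine (arbitrary illustrative grid).
group_sizes <- c(50, 100, 150, 200)

# Power to detect a difference of 10 with SD = 25 at the 5% level.
powers <- sapply(group_sizes, function(n) {
  power.t.test(n = n, delta = 10, sd = 25, sig.level = 0.05)$power
})

print(data.frame(n = group_sizes, power = round(powers, 3)))
```

The row for n = 100 reproduces the power reported by `power.t.test` on this slide.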
Groups of unequal size

If the group sizes differ we can find the total number of individuals needed as follows:
1) calculate $N = 2n$ as if the groups were of equal size
2) calculate $k = n_1/n_2$, describing the difference in group sizes
3) determine the total number of individuals as $N_{total} = N \frac{(1+k)^2}{4k}$.

Suppose, in the example above, that we want group 1 to be twice the size of group 2:
1) $N = 2 \cdot 132 = 264$
2) $k = 2$
3) $N_{total} = N \frac{(1+k)^2}{4k} = 264 \cdot \frac{9}{8} = 297$, i.e. $n_1 = 198$ and $n_2 = 99$.

10 / 12
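The three steps can be sketched as a small R function. The name `n_unequal` and its return format are hypothetical choices made for illustration.

```r
# Hypothetical helper: total and per-group sizes when n1 = k * n2.
# n_equal is the number per group calculated for equal group sizes.
n_unequal <- function(n_equal, k) {
  N       <- 2 * n_equal                 # step 1: total for equal groups
  N_total <- N * (1 + k)^2 / (4 * k)     # step 3: inflated total
  n2      <- N_total / (1 + k)           # split the total in the ratio k : 1
  n1      <- k * n2
  c(N_total = N_total, n1 = n1, n2 = n2)
}

# The slide's example: 132 per group, group 1 twice the size of group 2.
sizes <- n_unequal(n_equal = 132, k = 2)
print(sizes)
```

When the result is not a whole number, each group size would be rounded up.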
Exercise

With a power of 90% and a significance level of 5%, comparing two groups of equal size:

1) How many individuals do we need to detect the difference between proportions of 0.02 and 0.04?
   How many individuals do we need to detect the difference between proportions of 0.52 and 0.54?

2) How many individuals do we need to detect the difference between means of 2 and 4 (SD = 25)?
   How many individuals do we need to detect the difference between means of 52 and 54 (SD = 25)?

11 / 12
Additional exercise

We will simulate data to illustrate how often Type I and II errors occur. Run each of the two programs below 10 (or 100!) times.

Consider the setup with two groups with equal means $\mu = \mu_1 = \mu_2 = 100$ and SD = 25.

Type I error

y1 <- rnorm( n=100, mean=100, sd=25 )   # Generate sample of size 100 with mean 100, SD=25
y2 <- rnorm( n=100, mean=100, sd=25 )   # Generate sample of size 100 with mean 100, SD=25
t.test( y1, y2, var.equal=TRUE )

How many times did you reject the (true) null hypothesis? (Approx. 5%.)

Type II error

y1 <- rnorm( n=100, mean=100, sd=25 )   # Generate sample of size 100 with mean 100, SD=25
y2 <- rnorm( n=100, mean=110, sd=25 )   # Generate sample of size 100 with mean 110, SD=25
t.test( y1, y2, var.equal=TRUE )

How many times did you accept the (false) null hypothesis? (Approx. 20%, i.e. 1 - power found on slide 9.)

12 / 12
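Instead of rerunning the programs by hand, the repetition can be automated with `replicate`. This is a sketch under stated assumptions: the helper `reject_rate`, the seed, and the 2000 repetitions are choices made here for illustration, not part of the exercise.

```r
set.seed(2014)   # arbitrary seed, only to make the run reproducible

# Hypothetical helper: proportion of simulated trials in which the
# two-sample t test rejects at the 5% level. Group 1 has mean 100,
# group 2 has mean mean2; both have SD = 25 and n individuals.
reject_rate <- function(mean2, n = 100, reps = 2000) {
  p_values <- replicate(reps, {
    y1 <- rnorm(n = n, mean = 100,   sd = 25)
    y2 <- rnorm(n = n, mean = mean2, sd = 25)
    t.test(y1, y2, var.equal = TRUE)$p.value
  })
  mean(p_values < 0.05)
}

# Type I error: the null hypothesis is true (both means are 100).
type1_rate <- reject_rate(mean2 = 100)

# Type II error: the null hypothesis is false (means 100 and 110),
# and we count how often it is NOT rejected.
type2_rate <- 1 - reject_rate(mean2 = 110)

print(c(type_I = type1_rate, type_II = type2_rate))
```

The two estimated rates should land near the 5% and 20% mentioned in the exercise, with some simulation noise.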