Sample Size/Power Calculation by Software/Online Calculators


1 Sample Size/Power Calculation by Software/Online Calculators
May 24, 2018
Li Zhang, Ph.D.
Associate Professor, Department of Epidemiology and Biostatistics
Division of Hematology and Oncology, Department of Medicine
University of California, San Francisco

2 Topics
- R packages
- SAS procedures (PROC POWER, PROC GLMPOWER)
- Online calculator: UCSF CTSI sample size calculator
- Online calculator for clinical trials: SWOG
- Software: G*Power

3 Power Analysis with R

4 Power/sample size calculation for one or two proportions
The functions below come from the pwr package. Exactly one of h, n (or n1/n2), sig.level, and power is passed as NULL, and the function solves for it.
- One-sample proportion test, $H_0: p = p_1$ vs. $H_a: p \neq p_1$:
  pwr.p.test(h, n, sig.level, power, alternative = c("two.sided", "less", "greater"))
- Two proportions with the same sample size, $H_0: p_1 = p_2$ vs. $H_a: p_1 \neq p_2$:
  pwr.2p.test(h, n, sig.level, power, alternative = c("two.sided", "less", "greater"))
- Two proportions with different sample sizes, $H_0: p_1 = p_2$ vs. $H_a: p_1 \neq p_2$:
  pwr.2p2n.test(h, n1, n2, sig.level, power, alternative = c("two.sided", "less", "greater"))
- Effect size calculation: ES.h(p1, p2) returns Cohen's h, $h = 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_2}$.
R Demo
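As a rough cross-check outside R, here is a minimal Python sketch (standard library only) of Cohen's h and of the normal-approximation power and per-group sample size for the two-sided, equal-n comparison of two proportions. It assumes the usual arcsine normal approximation; the function names `cohen_h`, `power_two_prop`, and `n_two_prop` are hypothetical and this is not a copy of pwr's source.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def cohen_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def power_two_prop(h: float, n: float, alpha: float = 0.05) -> float:
    """Approximate two-sided power with n subjects in EACH group."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = abs(h) * math.sqrt(n / 2)  # standardized shift of the test statistic
    # probability of falling in either rejection region
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def n_two_prop(h: float, power: float = 0.80, alpha: float = 0.05) -> float:
    """Approximate per-group n (the negligible far-tail term is ignored)."""
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return 2 * (z_sum / h) ** 2
```

With p1 = 0.1 and p2 = 0.5, h is about -0.93 and 30 subjects per group give an approximate power of 0.95. For h = 0.5 at 80% power, `n_two_prop` returns about 62.8, i.e., 63 per group after rounding up.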

5 Power calculations for chi-squared tests
pwr.chisq.test(w = NULL, N = NULL, df = NULL, sig.level = 0.05, power = NULL)
- ES.w1(P0, P1): effect size w for the chi-squared goodness-of-fit test, computed from two sets of k cell probabilities, P0 (null hypothesis) and P1 (alternative hypothesis): $w = \sqrt{\sum_{i=1}^{k} (P_{1i} - P_{0i})^2 / P_{0i}}$. It summarizes the squared differences between alternative and null cell probabilities, each scaled by the null probability.
- ES.w2(P): effect size w for the chi-squared test of association, computed from a two-way probability table P corresponding to the alternative hypothesis in a two-way contingency table.
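The two w effect sizes can be written out directly. A minimal Python sketch, assuming cell probabilities that each sum to 1; `effect_w_gof` and `effect_w_assoc` are hypothetical names meant to mirror what ES.w1 and ES.w2 compute. For the association case, the null cell probabilities are taken as the products of the row and column margins of P.

```python
import math
from typing import Sequence


def effect_w_gof(p0: Sequence[float], p1: Sequence[float]) -> float:
    """Cohen's w for goodness of fit: sqrt(sum((p1 - p0)^2 / p0))."""
    return math.sqrt(sum((b - a) ** 2 / a for a, b in zip(p0, p1)))


def effect_w_assoc(table: Sequence[Sequence[float]]) -> float:
    """Cohen's w for association in a two-way table of cell probabilities
    (alternative hypothesis); the null is independence of the margins."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = 0.0
    for i, r in enumerate(table):
        for j, p in enumerate(r):
            expected = row[i] * col[j]  # cell probability under independence
            total += (p - expected) ** 2 / expected
    return math.sqrt(total)
```

For example, P0 = (0.25, 0.25, 0.25, 0.25) and P1 = (0.4, 0.2, 0.2, 0.2) give w = √0.12 ≈ 0.35. pwr.chisq.test then combines w with N and df (k - 1 for goodness of fit, (r - 1)(c - 1) for an r × c association table).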

6 Power/sample size calculation for one or two means
Power calculations for t-tests of means (one sample, two samples, and paired samples):
- One sample: $H_0: \mu = \mu_1$ vs. $H_a: \mu \neq \mu_1$
- Two samples or paired samples: $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$
  pwr.t.test(n, d, sig.level, power, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "less", "greater"))
Power calculations for two-sample t-tests of means with different sample sizes, $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$:
  pwr.t2n.test(n1, n2, d, sig.level = 0.05, power, alternative = c("two.sided", "less", "greater"))
Here d is Cohen's effect size: the mean difference divided by the standard deviation (for paired samples, the standard deviation of the within-pair differences).
R Demo
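A quick way to sanity-check these calls is a normal approximation to t-test power. The Python sketch below assumes that approximation (pwr.t.test itself uses the noncentral t distribution), and its function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional

_Z = NormalDist()  # standard normal distribution


def cohen_d(mean_diff: float, sd: float) -> float:
    """Cohen's d: mean difference in standard-deviation units."""
    return mean_diff / sd


def power_t_approx(d: float, n1: int, n2: Optional[int] = None,
                   alpha: float = 0.05) -> float:
    """Approximate two-sided power of a t-test via the normal distribution.

    n2 is None  -> one-sample test (or paired test, with d based on the SD of
                   the within-pair differences and n1 = number of pairs)
    n2 is given -> two independent samples of sizes n1 and n2
    """
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    if n2 is None:
        se_units = math.sqrt(1 / n1)
    else:
        se_units = math.sqrt(1 / n1 + 1 / n2)
    shift = abs(d) / se_units  # expected z statistic under the alternative
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)
```

With the values used in the SAS examples later in the deck (mean difference 2, SD 2.8, so d ≈ 0.71), this gives about 0.79 for 30 per group, 0.74 for groups of 20 and 40, and 0.97 for a one-sample test with n = 30. Exact noncentral-t results (pwr, or PROC POWER with Method = Exact) should come out slightly lower.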

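7 Power calculations for balanced one-way analysis of variance tests
pwr.anova.test(k = NULL, n = NULL, f = NULL, sig.level = 0.05, power = NULL)
- k: number of groups
- n: number of observations per group
- f: effect size (Cohen's f, the standard deviation of the k group means divided by the common within-group standard deviation)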
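Cohen's f can be computed from hypothesized group means and a common within-group SD. A minimal Python sketch, assuming equal group sizes (the balanced case that pwr.anova.test covers); `cohen_f` is a hypothetical name, and the means in the usage note are made up for illustration.

```python
import math
from typing import Sequence


def cohen_f(group_means: Sequence[float], sigma: float) -> float:
    """Cohen's f for a balanced one-way ANOVA: SD of the group means
    (divisor k, not k - 1) over the common within-group SD sigma."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    sd_means = math.sqrt(sum((m - grand_mean) ** 2 for m in group_means) / k)
    return sd_means / sigma
```

For hypothetical means 10, 12, and 14 with σ = 4, f = √(8/3)/4 ≈ 0.41, which could then be passed to pwr.anova.test as f.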

8 Power calculations for the general linear model
pwr.f2.test(u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL)
u and v are the numerator and denominator degrees of freedom, and f2 is the effect size measure:
- $f^2 = R^2 / (1 - R^2)$ when evaluating the impact of a set of predictors on an outcome;
- $f^2 = (R^2_{AB} - R^2_{A}) / (1 - R^2_{AB})$ when evaluating the impact of one set of predictors (B) above and beyond a second set of predictors or covariates (A).
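These effect sizes are easy to evaluate before calling pwr.f2.test. A minimal Python sketch with hypothetical names; it assumes v = N - p - 1, where p is the total number of predictors in the (full) least-squares model.

```python
def f2_from_r2(r2: float) -> float:
    """f^2 for a set of predictors: R^2 / (1 - R^2)."""
    return r2 / (1 - r2)


def f2_incremental(r2_full: float, r2_reduced: float) -> float:
    """f^2 for predictors B above and beyond A: (R2_AB - R2_A) / (1 - R2_AB)."""
    return (r2_full - r2_reduced) / (1 - r2_full)


def denominator_df(n_total: int, n_predictors: int) -> int:
    """v = N - p - 1, with p = number of predictors in the full model."""
    return n_total - n_predictors - 1
```

With hypothetical values, R² = 0.13 gives f2 ≈ 0.149; going from R²_A = 0.20 to R²_AB = 0.30 gives f2 ≈ 0.143; and N = 100 with 3 predictors gives v = 96.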

9 Other R packages for sample size calculation
- powerSurvEpi: power and sample size calculation for survival analysis of epidemiological studies
- epiR: sample size for cohort, case-control, and cross-sectional studies, under one- or two-stage cluster sampling
- kappaSize: sample size estimation functions for studies of interobserver agreement
- powerMediation: power/sample size calculation for mediation analysis, simple linear regression, logistic regression, or longitudinal studies
- power.roc.test {pROC}: computes sample size, power, significance level, or minimum AUC for ROC curves
- RNASeqPower: sample size for RNA-seq and similar studies

10 Case-control study with library(epiR)
A matched case-control study is to be carried out to quantify the association between exposure A and outcome B. Assume the prevalence of exposure among controls is 0.60 and the correlation between case and control exposures within matched pairs (rho) is 0.20 (moderate). With an equal number of cases and controls, how many subjects must be enrolled to detect an odds ratio of 3.0 with 0.80 power using a two-sided 0.05-level test?
  epi.ccsize(or = 3.0, p0 = 0.60, n = NA, power = 0.80, r = 1, rho = 0.2, design = 1, sided.test = 2, conf.level = 0.95, method = "matched", fleiss = FALSE)
A total of 162 subjects need to be enrolled: 81 cases and 81 controls.

11 Case-control study with library(epiR) (cont.)
How many cases and controls are required if we select three controls per case?
  epi.ccsize(or = 3.0, p0 = 0.60, n = NA, power = 0.80, r = 3, rho = 0.2, design = 1, sided.test = 2, conf.level = 0.95, method = "matched", fleiss = FALSE)
A total of 204 subjects need to be enrolled: 51 cases and 153 controls.
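Two ingredients of any case-control calculation can be made explicit: the exposure prevalence among cases implied by p0 and the odds ratio, $p_1 = OR \cdot p_0 / (1 + p_0(OR - 1))$, and a sample-size formula. The Python sketch below swaps in the standard unmatched normal-approximation formula (no continuity correction) for illustration only; it is not the matched-pairs method behind epi.ccsize(method = "matched"), which also accounts for rho, so its numbers differ from the ones above. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def exposure_in_cases(p0: float, odds_ratio: float) -> float:
    """Exposure prevalence among cases implied by control prevalence p0 and the OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def n_cases_unmatched(p0: float, odds_ratio: float, power: float = 0.80,
                      alpha: float = 0.05, r: int = 1) -> int:
    """Number of cases for an UNMATCHED design with r controls per case."""
    p1 = exposure_in_cases(p0, odds_ratio)
    pbar = (p1 + r * p0) / (1 + r)  # pooled exposure prevalence under H0
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    num = (z_a * math.sqrt((1 + 1 / r) * pbar * (1 - pbar))
           + z_b * math.sqrt(p0 * (1 - p0) / r + p1 * (1 - p1)))
    return math.ceil(num ** 2 / (p1 - p0) ** 2)
```

With p0 = 0.60 and OR = 3, p1 ≈ 0.818; the unmatched formula asks for 67 cases (and 67 controls) with r = 1, and 46 cases (138 controls) with r = 3.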

12 kappaSize: sample size estimation functions for studies of interobserver agreement
library(kappaSize)
Handles outcomes from binary up to 5 categories, with three approaches:
- Confidence interval approach, e.g., CI3Cats
- Calculation of the lowest expected value, e.g., FixedN4Cats
- Power-based approach, e.g., PowerBinary

13 Computes sample size/power/minimum AUC for ROC curves: power.roc.test(...)
- One or two ROC curves, tested with roc objects:
  power.roc.test(roc1, roc2, sig.level = 0.05, power = NULL, alternative = c("two.sided", "one.sided"), reuse.auc = TRUE, method = c("delong", "bootstrap", "obuchowski"), ...)
- One ROC curve with a given AUC:
  power.roc.test(auc = NULL, ncontrols = NULL, ncases = NULL, sig.level = 0.05, power = NULL, kappa = 1, alternative = c("two.sided", "one.sided"), ...)
- Two ROC curves with given parameters:
  power.roc.test(parslist, ncontrols = NULL, ncases = NULL, sig.level = 0.05, power = NULL, kappa = 1, alternative = c("two.sided", "one.sided"), ...)
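For intuition about the one-curve case, the Python sketch below swaps in the Hanley and McNeil (1982) variance approximation for an AUC and a normal approximation to the power of testing AUC = 0.5. This is an illustrative substitute: power.roc.test relies on its own variance methods (the method argument above lists DeLong, bootstrap, and Obuchowski), so its results will differ. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def auc_variance(auc: float, n_cases: int, n_controls: int) -> float:
    """Hanley-McNeil approximation to the variance of an empirical AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return (auc * (1 - auc)
            + (n_cases - 1) * (q1 - auc ** 2)
            + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)


def power_auc_vs_half(auc: float, n_cases: int, n_controls: int,
                      alpha: float = 0.05) -> float:
    """Approximate two-sided power for H0: AUC = 0.5, using the null variance
    for the critical value and the alternative variance for the spread."""
    z = _Z.inv_cdf(1 - alpha / 2)
    sd0 = math.sqrt(auc_variance(0.5, n_cases, n_controls))
    sd1 = math.sqrt(auc_variance(auc, n_cases, n_controls))
    delta = abs(auc - 0.5)
    return _Z.cdf((delta - z * sd0) / sd1) + _Z.cdf((-delta - z * sd0) / sd1)
```

At AUC = 0.5 the variance reduces to (n_cases + n_controls + 1) / (12 n_cases n_controls), the Mann-Whitney null variance. With AUC = 0.8 and 20 cases and 20 controls, the approximate power is about 0.95.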

14 RNASeqPower: sample size for RNA-seq and similar studies
rnapower(depth, n, n2 = n, cv, cv2 = cv, effect, alpha, power)
- depth: average depth of coverage for the transcript or gene of interest; common values are 5-20, and any numeric value > 0 is valid
- n: sample size in group 1 (or both groups)
- n2: sample size in group 2
- cv: biological coefficient of variation in group 1 (or both groups)
- cv2: biological coefficient of variation in group 2
- effect: target effect size
- alpha: significance level of the test
- power: target power
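To see how depth, biological variation, and effect size trade off, the Python sketch below evaluates the closed-form approximation $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 (1/\text{depth} + \text{cv}^2) / (\ln \text{effect})^2$ per group. The formula, and the reading of effect as a fold change, are assumptions on our part about the approach behind RNASeqPower, not a verified copy of rnapower's code; the function name is hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def rnaseq_n_per_group(depth: float, cv: float, effect: float,
                       alpha: float = 0.05, power: float = 0.80) -> float:
    """ASSUMED per-group n for detecting fold change `effect` in a transcript
    with average coverage `depth` and biological coefficient of variation `cv`."""
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    # 1/depth is the Poisson (counting) noise; cv^2 is the biological noise
    return 2 * z_sum ** 2 * (1 / depth + cv ** 2) / math.log(effect) ** 2
```

With depth 20, cv 0.4, and a twofold change, this gives about 6.9, i.e., 7 per group after rounding up. Looping over a few depths and cvs produces the kind of planning table mentioned on the next slide.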

15 Comments about R packages
Pros:
- Free
- Many resources covering different tests and study designs
- Easy to generate a figure or table across different parameter choices, for example, sample size calculations for sequencing data
Cons:
- Requires writing code
- Sometimes hard to implement
- Reliability of some packages is uncertain

16 Power Analysis with SAS
SAS PROC POWER covers:
- t-tests, equivalence tests, and confidence intervals for means
- tests, equivalence tests, and confidence intervals for binomial proportions
- multiple regression
- tests of correlation and partial correlation
- one-way analysis of variance
- rank tests for comparing two survival curves
- logistic regression with a binary response
- the Wilcoxon-Mann-Whitney (rank-sum) test
PROC GLMPOWER: computes power and sample size for general linear models, including repeated measures designs.

17 SAS Example: Calculate power for Pearson chi-squared tests
Same sample size, two-sided test of proportions:
  proc power;
    twosamplefreq test=pchi
      groupproportions=(0.1 0.5)
      npergroup=30
      power=.;
  run;
Output (The POWER Procedure, Pearson Chi-square Test for Proportion Difference). Fixed Scenario Elements: Distribution = Asymptotic normal; Method = Normal approximation; Group 1 Proportion = 0.1; Group 2 Proportion = 0.5; Sample Size per Group = 30; Number of Sides = 2; Null Proportion Difference = 0; Alpha = 0.05. The Computed Power table then reports the power.

18 SAS Example: Calculate power for Pearson chi-squared tests
Different sample sizes, two-sided test of proportions:
  proc power;
    twosamplefreq test=pchi
      groupproportions=(0.1 0.5)
      groupns=(25 50)
      power=.;
  run;
Output (The POWER Procedure, Pearson Chi-square Test for Proportion Difference). Fixed Scenario Elements: Distribution = Asymptotic normal; Method = Normal approximation; Group 1 Proportion = 0.1; Group 2 Proportion = 0.5; Group 1 Sample Size = 25; Group 2 Sample Size = 50; Number of Sides = 2; Null Proportion Difference = 0; Alpha = 0.05. The Computed Power table then reports the power.
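A textbook normal approximation for these two comparisons is sketched below in Python: pooled standard error under $H_0$, unpooled under $H_a$. It is an illustration under that assumption; SAS documents its own formula for TEST=PCHI, so the numbers need not agree exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_proportions(p1: float, p2: float, n1: int, n2: int,
                          alpha: float = 0.05) -> float:
    """Approximate two-sided power for comparing two independent proportions."""
    z = _Z.inv_cdf(1 - alpha / 2)
    pbar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion under H0
    se0 = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))  # SE under H0
    se1 = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE under Ha
    delta = abs(p1 - p2)
    return _Z.cdf((delta - z * se0) / se1) + _Z.cdf((-delta - z * se0) / se1)
```

For 0.1 vs. 0.5 this gives roughly 0.94 with 30 per group and roughly 0.97 with groups of 25 and 50.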

19 SAS Example: Calculate power for t-tests
Two independent samples, same size:
  proc power;
    twosamplemeans test=diff
      meandiff=2
      stddev=2.8
      npergroup=30
      power=.;
  run;
Output (The POWER Procedure, Two-Sample t Test for Mean Difference). Fixed Scenario Elements: Distribution = Normal; Method = Exact; Mean Difference = 2; Standard Deviation = 2.8; Sample Size per Group = 30; Number of Sides = 2; Null Difference = 0; Alpha = 0.05. The Computed Power table then reports the power.

20 SAS Example: Calculate power for t-tests
One sample:
  proc power;
    onesamplemeans test=t
      mean=2
      stddev=2.8
      ntotal=30
      power=.;
  run;
Output (The POWER Procedure, One-Sample t Test for Mean). Fixed Scenario Elements: Distribution = Normal; Method = Exact; Mean = 2; Standard Deviation = 2.8; Total Sample Size = 30; Number of Sides = 2; Null Mean = 0; Alpha = 0.05. The Computed Power table then reports the power.

21 SAS Example: Calculate power for t-tests
Paired samples:
  proc power;
    pairedmeans test=diff
      meandiff=2
      corr=0.5
      stddev=2.8
      npairs=30
      power=.;
  run;
Output (The POWER Procedure, Paired t Test for Mean Difference). Fixed Scenario Elements: Distribution = Normal; Method = Exact; Mean Difference = 2; Standard Deviation = 2.8; Correlation = 0.5; Number of Pairs = 30; Number of Sides = 2; Null Difference = 0; Alpha = 0.05. The Computed Power table then reports the power.
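PROC POWER takes the SD of each member of a pair and their correlation, while the paired test itself depends on the SD of the within-pair differences, $\sigma_d = \sigma\sqrt{2(1-\rho)}$ when both members share SD σ. A minimal Python sketch with hypothetical names that makes this conversion and then applies a normal approximation to power, which will typically be slightly higher than PROC POWER's exact result.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def sd_of_differences(sd: float, corr: float) -> float:
    """SD of within-pair differences when both members have SD `sd`
    and correlation `corr`."""
    return math.sqrt(2 * sd ** 2 * (1 - corr))


def paired_power_approx(mean_diff: float, sd: float, corr: float,
                        n_pairs: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of the two-sided paired t-test."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    d_z = abs(mean_diff) / sd_of_differences(sd, corr)  # effect in difference-SD units
    shift = d_z * math.sqrt(n_pairs)
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)
```

For this example (SD 2.8, correlation 0.5), the SD of the differences is also 2.8, so the effect is 2/2.8 ≈ 0.71 in difference units and the approximate power with 30 pairs is about 0.97. With zero correlation the SD of the differences would grow to 2.8√2.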

22 SAS Example: Calculate power for t-tests
Two independent samples, different sizes:
  proc power;
    twosamplemeans test=diff
      meandiff=2
      stddev=2.8
      groupns=(20 40)
      power=.;
  run;
Output (The POWER Procedure, Two-Sample t Test for Mean Difference). Fixed Scenario Elements: Distribution = Normal; Method = Exact; Mean Difference = 2; Standard Deviation = 2.8; Group 1 Sample Size = 20; Group 2 Sample Size = 40; Number of Sides = 2; Null Difference = 0; Alpha = 0.05. The Computed Power table then reports the power.

23 UCSF CTSI Sample Size Calculators
Covers most of the popular tests, for example, comparing the mean of a continuous measurement in two samples, with an option that allows for clustered sampling. (A cluster randomized controlled trial is a randomized controlled trial in which groups of subjects, rather than individual subjects, are randomized.)
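A common way to account for clustered sampling is to inflate the individually randomized sample size by the design effect $DE = 1 + (m - 1)\,ICC$, where m is the cluster size and ICC is the intracluster correlation. This Python sketch assumes equal cluster sizes; the function names and the numbers in the usage note are hypothetical, and the CTSI calculator's own method is not necessarily this one.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from cluster sampling: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def inflate_for_clustering(n_individual: int, cluster_size: float, icc: float) -> int:
    """Subjects needed per arm once clustering is taken into account."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


def clusters_needed(n_individual: int, cluster_size: float, icc: float) -> int:
    """Clusters per arm, assuming every cluster has `cluster_size` subjects."""
    return math.ceil(inflate_for_clustering(n_individual, cluster_size, icc) / cluster_size)
```

For example, if an individually randomized design needed 64 subjects per arm, clusters of 10 with ICC = 0.05 give DE = 1.45, i.e., 93 subjects or 10 clusters per arm.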

24 Online Calculators: SWOG
- If the primary objective is estimation rather than a hypothesis test, report the precision of the estimate. Example: the expected adherence rate is 80% with n = 50; the 95% CI is (66.3%, 90.0%).
- One-arm binomial: $H_0: P = 0.1$ vs. $H_a: P \neq 0.1$
- One-arm survival: specify the length of the accrual period and the length of the follow-up period (the time from the end of accrual to the analysis).
  $H_0$: median OS = 6 months vs. $H_a$: median OS > 6 months
  $H_0: S(t) = 0.5$ at 6 months vs. $H_a: S(t) = 0.5$ at 9 months
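The precision example is consistent with an exact (Clopper-Pearson) interval for 40 responses out of 50. A minimal Python sketch that inverts the binomial tails by bisection; the helper names are hypothetical, and that SWOG's calculator uses this particular interval is our assumption.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    a = (1 - conf) / 2
    # lower bound solves P(X >= x | p) = a; this tail increases with p
    lower = 0.0 if x == 0 else _bisect_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - a)
    # upper bound solves P(X <= x | p) = a; this tail decreases with p
    upper = 1.0 if x == n else _bisect_increasing(lambda p: a - binom_cdf(x, n, p))
    return lower, upper
```

`clopper_pearson(40, 50)` should return roughly (0.663, 0.900), in line with the (66.3%, 90.0%) quoted above.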

25 Online Calculators: SWOG (cont.)
- Two-arm binomial: $H_0: P_1 = P_2$ vs. $H_a: P_1 \neq P_2$, e.g., $P_1 = 0.1$ vs. $P_2 = 0.25$
- Two-arm survival: specify the length of the accrual period and the length of the follow-up period (the time from the end of accrual to the analysis).
  $H_0$: HR = 2 vs. $H_a$: HR $\neq$ 2 (median OS = 6 months under the null; 12-month accrual and 12-month follow-up)
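For two-arm survival comparisons, the number of events (deaths), rather than patients, drives power. A minimal Python sketch of Schoenfeld's approximation for a two-sided log-rank test, assuming proportional hazards; under exponential survival the hazard is ln 2 / median, so medians of 6 and 12 months correspond to a hazard ratio of 2. SWOG's calculator also converts events into patients using the accrual and follow-up periods, which this sketch does not attempt. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def exponential_hazard(median: float) -> float:
    """Constant hazard implied by a median survival time under an exponential model."""
    return math.log(2) / median


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05,
                      power: float = 0.80, allocation: float = 0.5) -> int:
    """Events needed for a two-sided log-rank test (Schoenfeld's approximation).

    allocation is the fraction of patients randomized to one of the two arms.
    """
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    d = z_sum ** 2 / (allocation * (1 - allocation) * math.log(hazard_ratio) ** 2)
    return math.ceil(d)
```

With 1:1 allocation, α = 0.05, and 80% power, a hazard ratio of 2 (or, equivalently, 0.5) requires about 66 events.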

26 Online Calculators: SWOG (cont.)
Two-stage design, with boundaries $a_1$, $r_1$ (first stage) and $a_2$, $r_2$ (end of trial):
- If the number of successes after completing the first stage is $< a_1$, we reject the alternative hypothesis that $p > p_A$.
- If the number of successes after completing the first stage is $> r_1$, we reject the null hypothesis that $p < p_0$.
- If the number of successes after completing the trial is $< a_2$, we reject the alternative hypothesis.
- If the number of successes after completing the trial is $> r_2$, we reject the null hypothesis.

27 Online Calculators: SWOG (cont.)
Other options:
- Survival noninferiority
- Competing risk: the hazard of the competing-risk random variable
- Hazard ratio between the experimental and standard arms defining equivalence (the hazard ratio must be less than the hazard ratio defining equivalence)
- Expected deaths:
  - make a table of expected death information
  - provide expected deaths for a given time
  - provide expected deaths for the time at which a given expected proportion of deaths has occurred

28 Online Calculators: Simon's two-stage design
nstwostagedesign.aspx
- One-arm phase II clinical trial
- Endpoint: response rate or other binary outcome
- Incorporates an interim analysis for futility
- One-sided test
Example: $H_0: P = 0.1$ vs. $H_a: P > 0.1$
Simon's two-stage design (Simon, 1989) will be used. The null hypothesis that the true response rate is 0.1 will be tested against a one-sided alternative. In the first stage, 22 patients will be accrued. If there are 2 or fewer responses in these 22 patients, the study will be stopped. Otherwise, 18 additional patients will be accrued for a total of 40. The null hypothesis will be rejected if 8 or more responses are observed in 40 patients. This design yields a type I error rate of 0.04 and power of 80% when the true response rate is
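The design's stated error rate can be checked with exact binomial probabilities. A minimal Python sketch, assuming Simon's usual notation (stop after stage 1 if responses ≤ r1; reject $H_0$ if total responses > r); `two_stage_oc` is a hypothetical name.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_stage_oc(p: float, n1: int, r1: int, n: int, r: int):
    """Operating characteristics of a two-stage single-arm design.

    Returns (probability of early termination, probability of rejecting H0)
    when the true response rate is p.
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):  # stage-1 outcomes that continue the trial
        need = max(0, r + 1 - x1)      # stage-2 responses still required
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        reject += binom_pmf(x1, n1, p) * tail
    return pet, reject
```

With n1 = 22, r1 = 2, n = 40, r = 7 and p = 0.1, this gives a probability of early termination of about 0.62 and a type I error of about 0.04, consistent with the description above. Power can be evaluated the same way at whichever alternative response rate the design targets.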

29 G*Power
G*Power is a tool for computing statistical power analyses for many different t tests, F tests, χ² tests, z tests, and some exact tests. G*Power can also compute effect sizes and display the results of power analyses graphically. It is free and available for both Windows and Mac.

30 Exact: Proportion - inequality, two dependent groups (McNemar)

                 Standard
Treatment        Yes        No         Total
Yes              p11        p12        p_t
No               p21        p22        1 - p_t
Total            p_s        1 - p_s    1

$H_0: p_{12}/p_{21} = 1$ vs. $H_1: p_{12}/p_{21} \neq 1$. The proportion of discordant pairs is $p_{12} + p_{21}$.
Select:
- Type of power analysis: Post hoc
Options:
- Computation: Exact
Input:
- Tail(s): Two
- Odds ratio: 0.25
- α err prob: 0.05
- Total sample size: 50
- Prop discordant pairs: 0.4
Output:
- Power (1-β err prob): 0.80
- Actual α: 0.04
- Proportion p12: 0.08
- Proportion p21: 0.32
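The odds ratio and the proportion of discordant pairs fix the two discordant cells: $p_{12} = p_D \cdot OR/(1 + OR)$ and $p_{21} = p_D/(1 + OR)$, which gives 0.08 and 0.32 for the inputs above. The Python sketch below also computes one version of exact unconditional power: the number of discordant pairs is binomial in N, and, given that number, an exact two-sided binomial test of $p_{12}/(p_{12}+p_{21}) = 0.5$ is applied. Names are hypothetical, and G*Power's exact routine may define the rejection region differently, so this sketch need not reproduce its output digit for digit.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def discordant_cells(odds_ratio: float, prop_discordant: float):
    """Split the discordant proportion into (p12, p21) with p12 / p21 = odds_ratio."""
    p12 = prop_discordant * odds_ratio / (1 + odds_ratio)
    return p12, prop_discordant - p12


def rejects(x: int, d: int, alpha: float) -> bool:
    """Exact two-sided binomial test of H0: P(cell 12 | discordant) = 0.5."""
    if d == 0:
        return False
    lower = sum(math.comb(d, i) for i in range(x + 1)) / 2 ** d
    upper = sum(math.comb(d, i) for i in range(x, d + 1)) / 2 ** d
    return min(1.0, 2 * min(lower, upper)) <= alpha


def mcnemar_exact_power(n_total: int, odds_ratio: float,
                        prop_discordant: float, alpha: float = 0.05) -> float:
    p12, _ = discordant_cells(odds_ratio, prop_discordant)
    q = p12 / prop_discordant  # share of discordant pairs falling in cell 12
    power = 0.0
    for d in range(n_total + 1):  # d = number of discordant pairs observed
        p_d = binom_pmf(d, n_total, prop_discordant)
        p_reject = sum(binom_pmf(x, d, q) for x in range(d + 1) if rejects(x, d, alpha))
        power += p_d * p_reject
    return power
```

Calling `mcnemar_exact_power(50, 1.0, 0.4)` gives the actual size of the test, which cannot exceed α because each conditional exact test is conservative.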


31 F test: Fixed effects one-way ANOVA
Compares the means of more than two groups. The null hypothesis is that all k means are identical, $H_0: \mu_1 = \mu_2 = \dots = \mu_k$. The alternative hypothesis states that at least two of the k means differ, $H_1: \mu_i \neq \mu_j$ for at least one pair $i, j$ with $1 \le i, j \le k$.
Example: We compare 10 groups, and we have reason to expect a "medium" effect size (f = 0.25). How many subjects do we need in a test with α = 0.05 to achieve a power of 0.95?
Select:
- Type of power analysis: A priori
Input:
- Effect size f: 0.25
- α err prob: 0.05
- Power (1-β err prob): 0.95
- Number of groups: 10
Output (G*Power also reports the noncentrality parameter λ, the critical F, and the actual power):
- Numerator df: 9
- Denominator df: 380
- Total sample size: 390
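The degrees of freedom and noncentrality in this output follow from simple formulas: df = (k - 1, N - k) and λ = f²N. A minimal Python sketch with hypothetical function names; turning λ into a power value additionally requires the noncentral F distribution (for example, scipy.stats.ncf), which is outside the standard library and not used here.

```python
def anova_noncentrality(f: float, n_total: int) -> float:
    """Noncentrality parameter lambda = f^2 * N for the fixed-effects one-way ANOVA."""
    return f ** 2 * n_total


def anova_dfs(k: int, n_total: int) -> tuple:
    """Numerator and denominator degrees of freedom: (k - 1, N - k)."""
    return k - 1, n_total - k
```

For the example, k = 10 and N = 390 give df = (9, 380), matching the output, and λ = 0.25² × 390 = 24.375.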

32 Questions?
