Sample Size / Power Calculations

A Simple Example
Goal: To study the effect of cold on blood pressure (mmHg) in rats.
Use a Completely Randomized Design (CRD): 12 rats are randomly assigned to one of two temperature environments.
  Trt 1: Normal environment (20 C), n = 6
  Trt 2: Cold environment (5 C), n = 6
Expect higher mean BP under Trt 2.
Is n = 6 enough to detect an important difference?

Statistical Approach
Two competing hypotheses (consider a one-tailed test for now):
  $H_0: \mu_1 = \mu_2$
  $H_1: \mu_1 < \mu_2$
Quantify the evidence against the null hypothesis in order to reach a conclusion.
Compare the P-value of a statistical test with a predetermined significance level ($\alpha$).
There are two possible incorrect conclusions based on the analysis of the data.
Conclusions and truths are commonly presented in a 2x2 table.

Type I and Type II errors

    What the data indicate            True state: H_0 is true    True state: H_1 is true
    Fail to reject H_0 (P > alpha)    No error                   Type II error (Prob = beta)
    Reject H_0 (P <= alpha)           Type I error (Prob = alpha)   No error

Is n = 6 rats large enough?
Rephrase: Do we have enough statistical power?
The calculations require values for several unknowns, so power analysis involves educated guessing.
How large is the true mean difference ($\delta = \mu_2 - \mu_1$)?
  1) What do you anticipate or want to make sure you can detect?
  2) What would be economically/practically important?
  Suppose researchers believe that $\delta = 20$ mmHg is important.
How much variability ($\sigma$) exists between rats within a group?
  Some prior information may be available from previously published studies or a small pilot study.
  May also have to guess.
  Suppose researchers believe that $\sigma = 15$ mmHg.

One way to elicit/guess values for σ
Use an empirical rule: consider the range of responses to be roughly four standard deviations, $R \approx 4\sigma$.
Question to client: What would be the likely range (max - min) of responses for rats within the same trt?
Suppose the answer was 60 mmHg: $R = 60 \Rightarrow \sigma = 60/4 = 15$ mmHg.
Can often find similar published studies with estimates of σ.
Always round up to be a little conservative.

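Quantifying evidence against $H_0$
Consider using a two-sample z test (σ known).
Under $H_0$: $\bar{y}_2 - \bar{y}_1 \sim N\left(0, \dfrac{2\sigma^2}{n}\right)$
Under $H_1$: $\bar{y}_2 - \bar{y}_1 \sim N\left(\delta, \dfrac{2\sigma^2}{n}\right)$
The difference is in the means of the distributions. Equal variances are assumed here, but that is not necessary.
Conduct a one-tailed z-test for a certain α. Reject $H_0$ if
$z = \dfrac{\bar{y}_2 - \bar{y}_1}{\sqrt{2\sigma^2/n}} > z_\alpha$, that is, if $\bar{y}_2 - \bar{y}_1 > z_\alpha \sqrt{2\sigma^2/n}$.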

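Distributions of $\bar{y}_2 - \bar{y}_1$
Power $= 1 - \beta = 1 - \Phi\left(z_\alpha - \dfrac{\delta}{\sqrt{2\sigma^2/n}}\right)$
[Figure: sampling distributions of $\bar{y}_2 - \bar{y}_1$ under $H_0$ (centered at 0) and under $H_1$ (centered at $\delta$). The critical value $z_\alpha\sqrt{2\sigma^2/n}$ marks the rejection region, with area $\alpha$ under $H_0$ and area $1-\beta$ under $H_1$.]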
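As a rough check of the z-test power formula, the following sketch plugs in the values from the rat example (δ = 20, σ = 15, n = 6 per group, one-sided α = 0.05). The data set name zpower and the variable names are illustrative choices, not part of the original slides.

```sas
/* Sketch: power of the one-sided two-sample z test (sigma assumed known). */
data zpower;
  alpha = 0.05;
  delta = 20;                          /* anticipated mean difference     */
  sigma = 15;                          /* assumed within-group SD         */
  n     = 6;                           /* rats per group                  */
  se    = sqrt(2*sigma**2/n);          /* SD of ybar2 - ybar1             */
  zcrit = probit(1 - alpha);           /* upper-alpha normal quantile     */
  power = 1 - probnorm(zcrit - delta/se);
run;

proc print data=zpower;
run;
```

A hand calculation gives a power of roughly 0.75. This is optimistic compared with a t-test because it treats σ as known.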

More reasonable statistical test: t-test
Rarely can you assume the variance(s) to be known.
One-sided: Reject $H_0$ if $t = \dfrac{\bar{y}_2 - \bar{y}_1}{\sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{n}}} > t_{\alpha, df}$
Two-sided ($H_1: \mu_1 \ne \mu_2$): Reject $H_0$ if $|t| > t_{\alpha/2, df}$
Must use the noncentral t distribution in power calculations.
Choice of df depends on assumptions regarding the variances.
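The noncentral t calculation can be sketched directly in a data step. This assumes equal variances (pooled df = 2(n - 1)) and the rat-example values δ = 20, σ = 15, n = 6; the data set and variable names are illustrative.

```sas
/* Sketch: power of the one-sided pooled two-sample t test
   using the noncentral t distribution.                      */
data tpower;
  alpha = 0.05;
  delta = 20;
  sigma = 15;
  n     = 6;
  df    = 2*(n - 1);                    /* pooled-variance df             */
  ncp   = delta / sqrt(2*sigma**2/n);   /* noncentrality parameter        */
  tcrit = tinv(1 - alpha, df);          /* central t critical value       */
  power = 1 - probt(tcrit, df, ncp);    /* P(noncentral t > tcrit)        */
run;

proc print data=tpower;
run;
```

The result should be close to the power of 0.693 that PROC POWER reports for the same scenario.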

More reasonable statistical test? Nonparametric test
Often proposed simply because the data are not Normal or the sample sizes are small.
In practice, the t test is approximately correct provided that $\bar{y}_2 - \bar{y}_1$ is approximately Normal (CLT).
When sample sizes are small, the t test is more powerful if the data are truly Normal.
Similar power calculations are available, usually based on large-sample assumptions.

Using SAS for power analysis

One-sided t-test:

    proc power;
      twosamplemeans
        alpha     = .05
        nulldiff  = 0
        sides     = 1
        meandiff  = 20
        npergroup = 6
        stddev    = 15
        power     = .;
    run;

or, similar to a two-sided t-test:

    proc power;
      onewayanova
        alpha      = .05
        test       = overall
        groupmeans = (0 20)
        npergroup  = 6
        stddev     = 15
        power      = .;
    run;

SAS Output

    Two-sample t Test for Mean Difference

    Fixed Scenario Elements
    Distribution             Normal
    Method                   Exact
    Number of Sides          1
    Null Difference          0
    Alpha                    0.05
    Mean Difference          20
    Standard Deviation       15
    Sample Size Per Group    6

    Computed Power
    Power    0.693

SAS Output

    Overall F Test for One-Way ANOVA

    Fixed Scenario Elements
    Method                   Exact
    Alpha                    0.05
    Group Means              0 20
    Standard Deviation       15
    Sample Size Per Group    6

    Computed Power
    Power    0.550

Typically want power to be larger than 80%.
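With only two groups, the overall ANOVA F statistic is the square of the pooled two-sample t statistic, so the F test corresponds to a two-sided t test. A minimal sketch to confirm this is to rerun the two-sample analysis with sides=2; every option here is taken from the earlier PROC POWER call except sides.

```sas
/* Sketch: two-sided version of the two-sample t test.
   Its power should reproduce the one-way ANOVA value above. */
proc power;
  twosamplemeans
    alpha     = .05
    nulldiff  = 0
    sides     = 2
    meandiff  = 20
    npergroup = 6
    stddev    = 15
    power     = .;
run;
```

The drop from 0.693 to 0.550 is therefore the cost of moving from a one-sided to a two-sided alternative, not a difference between the t and F procedures.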

Using SAS for sample size estimation

    proc power;
      twosamplemeans
        alpha     = .05
        nulldiff  = 0
        sides     = 1
        meandiff  = 20
        npergroup = .
        stddev    = 15
        power     = .80;
    run;

or

    proc power;
      onewayanova
        alpha      = .05
        test       = overall
        groupmeans = (0 20)
        npergroup  = .
        stddev     = 15
        power      = .80;
    run;

SAS Output

    Two-sample t Test for Mean Difference

    Fixed Scenario Elements
    Distribution          Normal
    Method                Exact
    Number of Sides       1
    Null Difference       0
    Alpha                 0.05
    Mean Difference       20
    Standard Deviation    15
    Nominal Power         0.8

    Computed N Per Group
    Actual Power    N Per Group
    0.813           8

SAS Output

    Overall F Test for One-Way ANOVA

    Fixed Scenario Elements
    Method                Exact
    Alpha                 0.05
    Group Means           0 20
    Standard Deviation    15
    Nominal Power         0.8

    Computed N Per Group
    Actual Power    N Per Group
    0.805           10

Generating a Power Curve

    proc power;
      twosamplemeans
        alpha     = .05
        nulldiff  = 0
        sides     = 1
        meandiff  = 20
        stddev    = 15
        power     = .
        npergroup = 3 to 20 by 1;
      plot interpol=join yopts=(ref=0.80);
    run;

Power Curve for one-sided t test
[Figure: power (y-axis, 0.3 to 1.0, reference line at 0.80) versus sample size per group (x-axis, 0 to 20).]

What if more than two treatments?
In a study of vitamin D supplementation, subjects are assigned to each of 5 treatment groups, and bone density changes over a twelve-week period are to be recorded.
Anticipate mean responses of 3.9, 4.1, 4.2, 4.3, and 4.5 mg/cm^2 for the five treatments.
Based on a previous study, the researchers anticipate a within-treatment variance of about 0.30.
They want to know whether n = 4 subjects per treatment would provide sufficient power for the ANOVA F-test.

Linear model written two ways
1) Cell means model: $Y_{ij} = \mu_i + e_{ij}$
2) Factor level effects model: $Y_{ij} = \mu + \alpha_i + e_{ij}$
$i = 1, \ldots, t = 5$; $j = 1, 2, \ldots, n = 4$
$e_{ij} \sim NIID(0, \sigma_e^2)$
$\mu = \dfrac{1}{t}\sum_{i=1}^{t} \mu_i$, $\quad \alpha_i = \mu_i - \mu$, $\quad \sum_{i=1}^{t} \alpha_i = 0$ (i.e. sum-to-zero constraint)

One-way ANOVA table

    Source       Df        SS     MS     EMS
    Treatment    t-1       SSA    MSA    $\sigma^2 + \text{function}\left(\sum_{i=1}^{t} \alpha_i^2\right)$
    Error        t(n-1)    SSE    MSE    $\sigma^2$

ANOVA F-test (equivalent specifications):
  1) $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ versus $H_1$: at least one $\mu_i \ne \mu_{i'}$
  2) $H_0$: all $\alpha_i = 0$ versus $H_1$: at least one $\alpha_i \ne 0$
Note: if $H_0$ is true, then both EMS equal $\sigma^2$, so that $F = MSA/MSE \sim F_{t-1,\, t(n-1)}$ (central F-distribution).

Power determination for F-test
Under $H_1$:
$(\mu_1, \mu_2, \mu_3, \mu_4, \mu_5) = (3.9, 4.1, 4.2, 4.3, 4.5)$, or equivalently
$(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5) = (-0.3, -0.1, 0.0, +0.1, +0.3)$ with $\mu = 4.2$,
such that $F = MSA/MSE \sim F_{t-1,\, t(n-1),\, \phi}$ (noncentral F-distribution if $\phi \ne 0$), where
$\phi = \dfrac{n \sum_{i=1}^{t} \alpha_i^2}{\sigma^2}$ is the noncentrality parameter.
Corrected sum of squared means (CSSM), for example:
$(-0.3)^2 + (-0.1)^2 + (0.0)^2 + (0.1)^2 + (0.3)^2 = 0.20$
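As a worked check of the noncentrality parameter for this example, taking n = 4 subjects per treatment and the anticipated within-treatment variance of 0.30 from the study description, the arithmetic can be sketched as:

```latex
\phi = \frac{n \sum_{i=1}^{t} \alpha_i^2}{\sigma^2}
     = \frac{4 \times 0.20}{0.30} \approx 2.67,
\qquad
(\text{numerator df},\ \text{denominator df}) = (t-1,\ t(n-1)) = (4,\ 15).
```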

SAS Code

    proc power;
      onewayanova
        alpha      = .05
        test       = overall
        groupmeans = (3.9 4.1 4.2 4.3 4.5)
        npergroup  = 4
        stddev     = 0.5477    /* the square root of 0.30 */
        power      = .;
    run;

SAS Output

    Overall F Test for One-Way ANOVA

    Fixed Scenario Elements
    Method                   Exact
    Alpha                    0.05
    Group Means              3.9 4.1 4.2 4.3 4.5
    Standard Deviation       0.5477
    Sample Size Per Group    4

    Computed Power
    Power    0.171

Severely underpowered: as designed, this study would be a waste of time and money to run!

SAS Code
Let's look at a power curve to get an idea of the necessary sample size.

    proc power;
      onewayanova
        alpha      = .05
        test       = overall
        groupmeans = (3.9 4.1 4.2 4.3 4.5)
        npergroup  = 4 to 30
        stddev     = 0.5477
        power      = .;
      plot interpol=join yopts=(ref=.80);
    run;

Looks like we need about 19 animals per group (almost 5 times the number before).
[Figure: power (y-axis, 0.1 to 1.0, reference line at 0.80) versus sample size per group (x-axis, 0 to 30).]

What if treatment means unknown?
Use the worst-case scenario: a conservative assessment of power.
You only need the difference between the largest and smallest means, or the difference you think is meaningful (say, 0.6).
Assume all the other means clump in the middle.
This minimizes $\sum_{i} \alpha_i^2$.
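To see why this configuration is conservative, write Δ for the difference between the largest and smallest means (the symbol Δ is introduced here for illustration). With every other mean at the midpoint, the grand mean is the midpoint as well, and only the two extreme treatments contribute:

```latex
\sum_{i=1}^{t} \alpha_i^2
  = \left(-\tfrac{\Delta}{2}\right)^2 + \left(+\tfrac{\Delta}{2}\right)^2
  = \frac{\Delta^2}{2},
\qquad
\Delta = 0.6 \;\Rightarrow\; \sum_{i=1}^{t} \alpha_i^2 = 0.18 .
```

This is slightly below the 0.20 obtained from the anticipated means 3.9, 4.1, 4.2, 4.3, and 4.5, so the noncentrality parameter, and hence the power, is somewhat smaller.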

SAS Code

    proc power;
      onewayanova
        alpha      = .05
        test       = overall
        groupmeans = (-0.3 0 0 0 0.3)
        npergroup  = 4 to 30
        stddev     = 0.5477
        power      = .;
      plot interpol=join yopts=(ref=.80);
    run;

Looks like we need about 21 animals per group in the worst case.
[Figure: power curves, power (y-axis, 0.1 to 1.0, reference line at 0.80) versus sample size per group (x-axis, 0 to 30).]

There is actually a trick to computing φ using ANOVA software like PROC GLM/MIXED (O'Brien and Lohr, 1984):
  1) Substitute the true means for the data in the ANOVA.
  2) Use the ANOVA table to compute the noncentrality parameter.
  3) Then use that computed value in the power calculations!

Using true means for data
Suppose you are interested in 3 treatments.
Anticipate true mean responses of 4.0, 4.3, and 4.6.
Anticipate a residual variance of 0.30.
Wish to compute power based on a sample size of n = 4 for each treatment.

    data oneway;
      input treatment mean;
      datalines;
    1 4.0
    1 4.0
    1 4.0
    1 4.0
    2 4.3
    2 4.3
    2 4.3
    2 4.3
    3 4.6
    3 4.6
    3 4.6
    3 4.6
    ;

    proc mixed data=oneway noprofile;
      class treatment;
      model mean = treatment;
      parms (0.30) / noiter;
      ods output tests3 = tests3;
    run;

The ODS OUTPUT statement writes the ANOVA table to a data set called tests3.

Trick to compute φ
Compute the ANOVA treatment F ratio, $F_{Treatment}$:

    Obs    Effect       NumDF    DenDF    FValue    ProbF
    1      treatment    2        9        1.20      0.3452

Multiply $F_{Treatment}$ by its numerator degrees of freedom (NumDF) to get φ:
$\phi = F_{Treatment} \times df_{Treatment} = 1.2 \times 2 = 2.4$
$F_{Treatment}$ is a function of the corrected sum of squared means.
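The reason the trick works can be sketched as follows, assuming a balanced one-way layout with n observations per treatment and the residual variance held fixed at σ² through PARMS with NOITER. When the true means stand in for the data, the treatment sum of squares is exactly n Σ α_i², so:

```latex
F_{Treatment}
  = \frac{MSA}{\sigma^2}
  = \frac{n \sum_{i=1}^{t} \alpha_i^2 / (t-1)}{\sigma^2}
\quad\Longrightarrow\quad
F_{Treatment} \times (t-1)
  = \frac{n \sum_{i=1}^{t} \alpha_i^2}{\sigma^2}
  = \phi .
```

For the three means 4.0, 4.3, and 4.6, the effects are (-0.3, 0, 0.3), giving Σ α_i² = 0.18 and φ = 4 × 0.18 / 0.30 = 2.4, which agrees with F = 1.20 on 2 numerator degrees of freedom.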

Use φ to compute power

    data power;
      set tests3;
      noncent = Fvalue*numdf;
      alpha = 0.05;
      criticalvalue = Finv(1-alpha,numdf,dendf,0);
      Power = 1-Probf(criticalvalue,numdf,dendf,noncent);
    run;

    proc print data=power;
    run;

criticalvalue: the critical value separating the acceptance region from the rejection region.
Power: the probability of falling in the rejection region if H1 is true.

    Effect       NumDF    DenDF    FValue    ProbF     noncent    alpha    criticalvalue    Power
    treatment    2        9        1.20      0.3452    2.4        0.05     4.25649          0.20010

PROC GLMPOWER does this

    data example1;                 /* much simpler data step */
      input FactorA $ mean;
      datalines;
    1 4.0
    2 4.3
    3 4.6
    ;
    run;

    proc glmpower data=example1;
      class FactorA;
      model mean = FactorA;
      power
        stddev = .548
        ntotal = 12                /* total number of experimental units */
        power  = .
        alpha  = 0.05;
    run;

    The GLMPOWER Procedure

    Fixed Scenario Elements
    Dependent Variable           mean
    Source                       FactorA
    Alpha                        0.05
    Error Standard Deviation     0.548
    Total Sample Size            12
    Test Degrees of Freedom      2
    Error Degrees of Freedom     9

    Computed Power
    Power    0.200

Mixed Model Power Calculations
GLMPOWER is technically for fixed-effects models only.
It can be used for mixed models when the comparisons involve only the error variance.
The PROC MIXED trick can be extended to handle all mixed models.

RCBD mixed model
$Y_{ij} = \mu + \rho_i + \alpha_j + e_{ij}$, $\quad i = 1, \ldots, b$, $\quad j = 1, \ldots, a$,
where $\rho_i \sim NIID(0, \sigma_\rho^2)$ across all i and $e_{ij} \sim NIID(0, \sigma_e^2)$ across all i and j.
  $\mu$ is the overall mean
  $\alpha_j$ is the fixed effect of the jth diet
  $\rho_i$ is the random effect of the ith litter
  $e_{ij}$ is the residual error term specific to the ijth experimental unit

Example using MIXED trick
Suppose we'd like to address power for a new RCBD study with 5 treatments and anticipate
$\mu_{\cdot j} = \mu + \alpha_j$: $\quad \mu_{\cdot 1} = 70$, $\mu_{\cdot 2} = 72$, $\mu_{\cdot 3} = 74$, $\mu_{\cdot 4} = 76$, $\mu_{\cdot 5} = 78$
Block (e.g. litter) variance: $\sigma_\rho^2 = 20$
Residual variance: $\sigma^2 = 50$
How much power would 10 blocks provide for the ANOVA F-test on treatments?

Entering the data

    data rcbd;
      input diet mean;
      datalines;
    1 70
    2 72
    3 74
    4 76
    5 78
    ;

    data rcbd;
      set rcbd;
      do litter = 1 to 10;
        output;
      end;
    run;

SAS code

    proc mixed data=rcbd noprofile;
      class litter diet;
      model mean = diet;
      random litter;
      parms (20) (50) / noiter;    /* litter variance = 20, residual variance = 50 */
      ods output tests3 = ANOVAtest;
    run;

Here $\sigma_\rho^2 = 20$ and $\sigma^2 = 50$.

Remaining code

    data power;
      set ANOVAtest;
      alpha = 0.05;
      noncent = numdf*fvalue;
      crit = Finv(1-alpha,numdf,dendf,0);
      Power = 1-Probf(crit,numdf,dendf,noncent);
    run;

    proc print data=power;
      var alpha noncent crit numdf dendf power;
    run;

    Obs    alpha    noncent    crit       NumDF    DenDF    Power
    1      0.05     8          2.63353    4        36       0.54188

[Figure: densities of the central F (under $H_0$) and noncentral F (under $H_1$) distributions plotted against the F-ratio (0 to 5). The Type I error rate is $\alpha = \text{Prob}(F > F_\alpha \mid H_0) = 0.05$.]
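The noncentrality value in this output can be checked by hand. The sketch below assumes the diet means 70, 72, 74, 76, and 78, b = 10 litters, and the residual variance of 50 passed to PROC MIXED; the litter variance does not enter because every diet appears in every litter.

```latex
\alpha_j = (-4,\ -2,\ 0,\ 2,\ 4), \qquad \sum_{j=1}^{a} \alpha_j^2 = 40,
\qquad
\phi = \frac{b \sum_{j=1}^{a} \alpha_j^2}{\sigma^2} = \frac{10 \times 40}{50} = 8,
\qquad
(\text{NumDF},\ \text{DenDF}) = (a-1,\ (a-1)(b-1)) = (4,\ 36).
```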

What if you consider GLMPOWER?
Litter is no longer considered random.

    proc glmpower data=rcbd;
      class Diet Litter;
      model mean = Diet Litter;
      power
        stddev = 7.0711
        ntotal = 50
        power  = .
        alpha  = 0.05;
    run;

PROC GLMPOWER OUTPUT

    Fixed Scenario Elements
    Dependent Variable           mean
    Alpha                        0.05
    Error Standard Deviation     7.0711
    Total Sample Size            50
    Error Degrees of Freedom     36

    Computed Power
    Index    Source    Test DF    Power
    1        diet      4          0.542
    2        litter    9          0.050

Summary
In the case of an RCBD, results can be obtained via PROC GLMPOWER because the litter variance does not enter the standard error of treatment comparisons.
GLMPOWER cannot be used for incomplete block designs or repeated measures designs.

Latin Square Design
Consider a 3-period crossover design.
Measurements are taken at 4 time points ($Y_1$, $Y_2$, $Y_3$, $Y_4$) across Period 1, Period 2, and Period 3.
Plan to compare $Y_2 - Y_1$, $Y_3 - Y_2$, and $Y_4 - Y_3$.
Differences remove subject variability, but the differences are correlated:
  $Var(Y_2 - Y_1) = 2\sigma^2$
  $Cov(Y_k - Y_{k-1},\ Y_{k+1} - Y_k) = -\sigma^2$
See Kononoff, P.J. and K.J. Hanford, 2006 for other Latin square examples.
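The variance and covariance of the differences follow from a simple model, assumed here for illustration: each measurement is a subject effect s plus an independent error e_k with variance σ². The subject effect cancels in every difference, and adjacent differences share the error e_k with opposite signs:

```latex
Y_k = s + e_k, \qquad Var(e_k) = \sigma^2, \quad e_k \text{ independent}

Var(Y_k - Y_{k-1}) = Var(e_k) + Var(e_{k-1}) = 2\sigma^2

Cov(Y_k - Y_{k-1},\ Y_{k+1} - Y_k)
  = Cov(e_k - e_{k-1},\ e_{k+1} - e_k)
  = -Var(e_k) = -\sigma^2
```

Differences two or more steps apart share no error term, so they are uncorrelated. This is the banded pattern that a TYPE=TOEP(2) structure describes, and the values 50.562078 and -25.281039 supplied to PARMS in the PROC MIXED code match it with σ² = 25.281039.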

SAS Code

    data means;
      input order period trt $ @@;
      cards;
    1 1 A  1 2 B  1 3 C
    2 1 B  2 2 C  2 3 A
    3 1 C  3 2 A  3 3 B
    4 1 B  4 2 A  4 3 C
    5 1 A  5 2 C  5 3 B
    6 1 C  6 2 B  6 3 A
    ;

    data meanset;
      set means;
      do subject = 1 to 6;
        if trt = 'A' then mnresp = 5.1;
        if trt = 'B' then mnresp = 5.2;
        if trt = 'C' then mnresp = 0.0;
        output;
      end;
    run;

Can alter the number of replications (subjects per order) and the important treatment difference.

SAS Code

    proc mixed noprofile data=meanset;
      class subject order trt period;
      model mnresp = trt period;
      repeated / subject=subject(order) type=toep(2);
      /* Values as given on the slide: 50.562078 and -25.281039.
         Check that their order matches the order of the rows in the
         Covariance Parameter Estimates table. */
      parms (50.562078) (-25.281039) / noiter;
      estimate 'T1 VS T3' trt 1 0 -1;
      contrast 'T1 VS T3' trt 1 0 -1;
      contrast 'T2 VS T3' trt 0 1 -1;
      ods output tests3=anova;
      ods output contrasts=contr;
    run;

SAS Output

    data power;
      set anova contr;
      if Effect ne ' ' then Source = Effect;
      if Label ne ' ' then Source = Label;
      ALPHA = 0.05;                               * The desired Type I error rate;
      FCRIT = FINV(1-ALPHA, NumDF, DenDF, 0);     * The 0 is the noncentrality parameter;
      NONCENT = NumDF*Fvalue;                     * The noncentrality parameter;
      POWER = 1 - PROBF(FCRIT, NumDF, DenDF, NONCENT);
      if NONCENT > 3000 then POWER = 1.0;         * When NONCENT is large PROBF does not return a value;
    run;

    proc print;
      var Source numdf dendf NonCent Power;
    run;

Output

    Obs    Source      NumDF    DenDF    NONCENT    POWER
    1      trt         2        68       10.4940    0.81697
    2      period      2        68        0.0000    0.05000
    3      T1 VS T3    1        68        7.7163    0.78184
    4      T2 VS T3    1        68        8.0218    0.79733

Layout of Split Plot Design
Two-factor treatment structure.
Consider a completely randomized design in the whole plots.
Whole plot EUs serve as blocks for the subplot factor.
[Figure: four whole-plot experimental units (WP EU 1 to WP EU 4) assigned to Treatment A1 or Treatment A2, each divided into four subplots that receive B1 to B4 in random order.]

Linear Model
$Y_{ijk} = \mu + A_i + Rep_{j(i)} + B_k + (AB)_{ik} + e_{ijk}$
Two-factor treatment structure with several comparisons of interest:
  1. Compare main effects of A
  2. Compare main effects of B
  3. Compare levels of A for a fixed level of B
  4. Compare levels of B for a fixed level of A

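Approach #1
For each type of comparison, compute the SE.
1. Comparing main effects of A:
   $Var(\bar{y}_{i\cdot\cdot} - \bar{y}_{i'\cdot\cdot}) = Var(\overline{Rep}_{\cdot(i)} - \overline{Rep}_{\cdot(i')} + \bar{e}_{i\cdot\cdot} - \bar{e}_{i'\cdot\cdot}) = 2\left(\sigma_R^2/n + \sigma^2/(bn)\right)$
2. Comparing levels of A at a fixed level of B:
   $Var(\bar{y}_{i\cdot k} - \bar{y}_{i'\cdot k}) = Var(\overline{Rep}_{\cdot(i)} - \overline{Rep}_{\cdot(i')} + \bar{e}_{i\cdot k} - \bar{e}_{i'\cdot k}) = 2\left(\sigma_R^2/n + \sigma^2/n\right)$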

Approach #2: Using Proc Mixed
Specify the means (A has 4 levels, B has 3 levels).
Specify the Rep and Error variances.

    data ideal;
      input A B mu @@;
      cards;
    1 1 5    1 2 6    1 3 7
    2 1 7    2 2 6    2 3 5
    3 1 6.5  3 2 6.5  3 3 6.5
    4 1 5.5  4 2 5.5  4 3 5.5
    ;

    data idealrep;
      set ideal;
      do rep = 1 to 6;    /* can alter the number of EUs per level of A */
        output;
      end;
    run;

Interaction Plot
[Figure: interaction plot of the specified A x B cell means.]

SAS code

    proc mixed data=idealrep noprofile;
      class A Rep B;
      model mu = A B A*B;
      random rep(a);
      parms (2) (1) / noiter;    /* rep(a) variance = 2, residual variance = 1 */
      contrast 'A1 vs A2' A 1 -1 0 0
               A*B .33333 .33333 .33333 -.33333 -.33333 -.33333 0 0 0 0 0 0 / df=20;
      contrast 'A11 vs A21' A 1 -1 0 0
               A*B 1 0 0 -1 0 0 0 0 0 0 0 0;
      estimate 'A1 vs A2' A 1 -1 0 0
               A*B .33333 .33333 .33333 -.33333 -.33333 -.33333 0 0 0 0 0 0 / df=20;
      estimate 'A11 vs A21' A 1 -1 0 0
               A*B 1 0 0 -1 0 0 0 0 0 0 0 0;
      ods output tests3 = anova contrasts = contr;
    run;

Here $\sigma_R^2 = 2$ and $\sigma^2 = 1$.
The documentation states that the df for a contrast are those associated with the last term specified. Since 'A1 vs A2' is a comparison of the WP factor, I use the DF associated with rep(a) (df=20).

Selected Output

    Covariance Parameter Estimates
    Cov Parm     Subject    Estimate
    Intercept    rep(a)     2.0000
    Residual                1.0000

    Estimates
    Label         Estimate    Standard Error    DF    t Value    Pr > |t|
    A1 vs A2      0.000020    0.8819            20     0.00      1.0000
    A11 vs A21    -2.0000     1.0000            40    -2.00      0.0523

SEs agree with previous formulas.

    Obs    Source       NumDF    DenDF    NONCENT    POWER
    1      A            3        20        1.2857    0.12041
    2      B            2        40        0.0000    0.05000
    3      A*B          6        40       24.0000    0.94666
    4      A1 vs A2     1        20        0.0000    0.05000
    5      A11 vs A2    1        40        4.0000    0.49682
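The agreement between these standard errors and the Approach #1 formulas can be verified by hand. The check below uses the values supplied to PROC MIXED: rep variance 2, residual variance 1, n = 6 reps per level of A, and b = 3 levels of B.

```latex
SE(\text{A1 vs A2})
  = \sqrt{2\left(\frac{\sigma_R^2}{n} + \frac{\sigma^2}{bn}\right)}
  = \sqrt{2\left(\frac{2}{6} + \frac{1}{18}\right)}
  = \sqrt{0.7778} \approx 0.8819

SE(\text{A11 vs A21})
  = \sqrt{2\left(\frac{\sigma_R^2}{n} + \frac{\sigma^2}{n}\right)}
  = \sqrt{2\left(\frac{2}{6} + \frac{1}{6}\right)}
  = \sqrt{1} = 1.0000
```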

Comments
Note that ddfm=kr has been recommended for use but is not included here, because ddfm= cannot be used in this sample size estimation approach.
ddfm= will have an effect on the df and SE for comparison #3 (compare levels of A for a fixed level of B). It should not affect the others.
My suggestion: run this analysis to get a sample size, and then, if you plan to power the study for this comparison, use simulation (say, 1000 data sets analyzed with the kr option) to confirm the sample size.

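SAS code for simulation

    data simulate;
      do sim = 1 to 10;                          /* increase (e.g. to 1000) for a real power estimate */
        do A = 1 to 4;
          do rep = 1 to 6;
            rep_eff = normal(612)*sqrt(2);       /* whole-plot (rep) effect, variance 2 */
            do B = 1 to 3;
              res_eff = normal(612)*sqrt(1);     /* residual, variance 1 */
              resp = 5.5 + rep_eff + res_eff;    /* default mean (A=4) */
              if A=1 & B=1 then resp = 5 + rep_eff + res_eff;
              if A=1 & B=2 then resp = 6 + rep_eff + res_eff;
              if A=1 & B=3 then resp = 7 + rep_eff + res_eff;
              if A=2 & B=1 then resp = 7 + rep_eff + res_eff;
              if A=2 & B=2 then resp = 6 + rep_eff + res_eff;
              if A=2 & B=3 then resp = 5 + rep_eff + res_eff;
              if A=3       then resp = 6.5 + rep_eff + res_eff;
              if A=4 & B=1 then resp = 5.5 + rep_eff + res_eff;
              if A=4 & B=2 then resp = 5.5 + rep_eff + res_eff;
              output;
            end;
          end;
        end;
      end;
    run;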
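One way to turn the simulated data sets into an empirical power estimate is sketched below. It assumes the simulate data set created above, fits the split-plot model to each simulated data set with ddfm=kr, and records how often the 'A11 vs A21' contrast is significant at the 0.05 level. The data set name simcontr and the variable reject are illustrative.

```sas
/* Sketch: empirical power for the 'A11 vs A21' contrast.
   Assumes the data set SIMULATE already exists.           */
proc mixed data=simulate;
  by sim;                                   /* one fit per simulated data set */
  class A rep B;
  model resp = A B A*B / ddfm=kr;
  random rep(A);
  contrast 'A11 vs A21' A 1 -1 0 0
           A*B 1 0 0 -1 0 0 0 0 0 0 0 0;
  ods output contrasts=simcontr;            /* one row per value of sim       */
run;

data simcontr;
  set simcontr;
  reject = (ProbF < 0.05);                  /* 1 if H0 rejected, else 0       */
run;

proc means data=simcontr mean;
  var reject;                               /* mean = estimated power         */
run;
```

With only 10 simulated data sets the estimate is very noisy, so the number of simulations would need to be raised (the comments above suggest about 1000) before comparing it with the power of 0.49682 obtained from the noncentral F calculation.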