Power analysis examples using R

The pwr package can be used to analytically compute power for various designs. The pwr examples below are adapted from the pwr package vignette, which is available at https://cran.r-project.org/web/packages/pwr/vignettes/pwr-vignette.html.

In the first example, we will compute the power for an independent samples t-test. We specify the number of participants in each group, the effect size, the alpha level, the type of t-test, and the alternative hypothesis.

    library(pwr)  # analytic power calculations
    pwr.t.test(n = 20, d = 0.3, sig.level = 0.05, type = "two.sample", alternative = "two.sided")

         Two-sample t test power calculation

                  n = 20
                  d = 0.3
          sig.level = 0.05
              power = 0.1522683
        alternative = two.sided

    NOTE: n is number in *each* group

Our power is only .15! Instead of computing the power given a sample size of 20 participants per group, we can specify a desired level of power and find out how many participants per group we should collect to obtain that level of power.

    pwr.t.test(power = 0.95, d = 0.3, sig.level = 0.05, type = "two.sample", alternative = "two.sided")

         Two-sample t test power calculation

                  n = 289.7353
                  d = 0.3
          sig.level = 0.05
              power = 0.95
        alternative = two.sided

    NOTE: n is number in *each* group

Rounding up, we would need to collect 290 participants per group to have 95% power for an effect size of 0.3.

The paramtest package can be used to estimate power through simulation. The paramtest example below is slightly adapted from the package vignette, which is available at https://cran.r-project.org/web/packages/paramtest/vignettes/simulating-power.html.

1 of 9 13/11/2017, 12:31
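Before the paramtest code, the logic of simulation-based power estimation can be sketched in base R: generate data under the assumed effect, run the test, and record how often it is significant. This is only an illustrative sketch. The design values (20 per group, d = 0.3, alpha = .05) come from the pwr example above, while the seed, the number of iterations, and all object names are assumptions made for the illustration. The analytic value is recomputed from the noncentral t distribution for comparison.

```r
# Base-R sketch: simulated versus analytic power for a two-sample t-test.
# n, d and alpha follow the pwr example; seed and n_iter are assumptions.
set.seed(2017)
n      <- 20     # participants per group
d      <- 0.3    # standardized mean difference
alpha  <- 0.05
n_iter <- 2000   # number of simulated studies

# Simulate one study and return the p-value of its t-test
one_p <- function() {
  x1 <- rnorm(n, mean = 0, sd = 1)
  x2 <- rnorm(n, mean = d, sd = 1)
  t.test(x1, x2, var.equal = TRUE)$p.value
}

p_values  <- replicate(n_iter, one_p())
power_sim <- mean(p_values < alpha)                      # proportion significant
mc_se     <- sqrt(power_sim * (1 - power_sim) / n_iter)  # Monte Carlo standard error

# Analytic two-sided power from the noncentral t distribution
df          <- 2 * (n - 1)
ncp         <- d * sqrt(n / 2)
t_crit      <- qt(1 - alpha / 2, df)
power_exact <- pt(t_crit, df, ncp = ncp, lower.tail = FALSE) +
               pt(-t_crit, df, ncp = ncp)

print(c(simulated = power_sim, mc_se = mc_se, analytic = power_exact))
```

The Monte Carlo standard error indicates how far the simulated estimate can be expected to stray from the analytic value; it shrinks with the square root of the number of iterations.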

    library(paramtest)  # run_test()
    library(psych)      # describe()

    # create user-defined function to generate and analyse data
    t_func <- function(simnum, N, d) {
        x1 <- rnorm(N, 0, 1)
        x2 <- rnorm(N, d, 1)

        t <- t.test(x1, x2, var.equal = TRUE)  # run t-test on generated data
        stat <- t$statistic
        p <- t$p.value

        # return a named vector with the results we want to keep
        return(c(t = stat, p = p, sig = (p < .05)))
    }

We can now use the function we have created in conjunction with the run_test function to estimate power for our previously used design based on simulation.

    power_ttest <- run_test(t_func, n.iter = 1000, output = 'data.frame', N = 20, d = .3)

    Running 1,000 tests...

We now have the t-values and p-values from our simulation, along with a binary indicator of whether a significant p-value was produced.

    describe(power_ttest$results)

         vars    n   mean     sd median trimmed    mad   min     max  range  skew kurtosis   se
    iter    1 1000 500.50 288.82 500.50  500.50 370.65  1.00 1000.00 999.00  0.00    -1.20 9.13
    t.t     2 1000  -0.92   1.03  -0.87   -0.90   1.04 -5.39    2.24   7.63 -0.17     0.04 0.03
    p       3 1000   0.40   0.31   0.35    0.38   0.40  0.00    1.00   1.00  0.34    -1.23 0.01
    sig     4 1000   0.15   0.35   0.00    0.06   0.00  0.00    1.00   1.00  2.00     2.01 0.01

The mean of sig is the simulated power estimate. Here, even with only 1000 iterations, the estimate from simulation (0.15) is close to the one obtained analytically (0.152). The paramtest package can also be used to estimate power for coefficients in more complex models.

Some other packages focus on estimating power for a specific type of model or family of models. For example, the simr package conducts simulations to estimate power for multilevel models. The simr example below is adapted from Green and MacLeod (2016; https://doi.org/10.1111/2041-210X.12504). The code works with the simdata data in simr to estimate power. We will first fit a simple random-intercept model using the glmer function from the lme4 package.

    library(simr)  # also loads lme4, which provides glmer()
    describe(simdata)

       vars  n mean   sd median trimmed  mad  min   max range skew kurtosis   se
    y     1 30 9.35 3.02   8.38    9.19 3.27 5.78 14.51  8.73 0.40    -1.47 0.55
    x     2 30 5.50 2.92   5.50    5.50 3.71 1.00 10.00  9.00 0.00    -1.34 0.53
    g*    3 30 2.00 0.83   2.00    2.00 1.48 1.00  3.00  2.00 0.00    -1.60 0.15
    z     4 30 2.73 1.87   2.00    2.46 1.48 0.00  9.00  9.00 1.53     2.77 0.34

    summary(model1 <- glmer(z ~ x + (1 | g), family = "poisson", data = simdata))

    Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
     Family: poisson  ( log )
    Formula: z ~ x + (1 | g)
       Data: simdata

         AIC      BIC   logLik deviance df.resid
       109.0    113.2    -51.5    103.0       27

    Scaled residuals:
         Min       1Q   Median       3Q      Max
    -1.28918 -0.41836 -0.03916  0.57284  1.29631

    Random effects:
     Groups Name        Variance Std.Dev.
     g      (Intercept) 0.08345  0.2889
    Number of obs: 30, groups:  g, 3

    Fixed effects:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.54079    0.27173   5.670 1.43e-08 ***
    x           -0.11481    0.03955  -2.903   0.0037 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Correlation of Fixed Effects:
      (Intr)
    x -0.666

The obtained coefficient for the fixed effect of x was -0.11. We are interested in finding out the required sample size for an effect size for x of -0.05, so we set the coefficient to that value before simulating.

    fixef(model1)["x"] <- -0.05  # set the effect size of interest for x
    fixef(model1)

    (Intercept)           x
       1.540793   -0.050000

The powerSim function simulates the model and records how often the effect size(s) of interest are statistically significant. It then returns power estimates based on these simulations. The default is 1000 simulations but here, in the interests of time, we will do only 50 simulations.

    model1ps <- powerSim(model1, nsim = 50)

    model1ps

    Power for predictor 'x', (95% confidence interval):
          40.00% (26.41, 54.82)

    Test: z-test
          Effect size for x is -0.050

    Based on 50 simulations, (0 warnings, 0 errors)
    alpha = 0.05, nrow = 30

    Time elapsed: 0 h 0 m 6 s

A study with the sample size of 30 has only around 40% power to detect the fixed effect of x in this run, and with just 50 simulations the estimate is imprecise (95% confidence interval 26.41% to 54.82%). Now, we will estimate power if x had 20 levels instead of 10.

    model2 <- extend(model1, along = "x", n = 20)  # extend the design to 20 levels of x
    summary(model2)

    Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
     Family: poisson  ( log )
    Formula: z ~ x + (1 | g)
       Data: simdata

         AIC      BIC   logLik deviance df.resid
       109.0    113.2    -51.5    103.0       27

    Scaled residuals:
         Min       1Q   Median       3Q      Max
    -1.28918 -0.41836 -0.03916  0.57284  1.29631

    Random effects:
     Groups Name        Variance Std.Dev.
     g      (Intercept) 0.08345  0.2889
    Number of obs: 30, groups:  g, 3

    Fixed effects:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.54079    0.27173   5.670 1.43e-08 ***
    x           -0.05000    0.03955  -1.264    0.206
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Correlation of Fixed Effects:
      (Intr)
    x -0.666

    model2ps <- powerSim(model2, nsim = 50)
    model2ps

    Power for predictor 'x', (95% confidence interval):
          94.00% (83.45, 98.75)

    Test: z-test
          Effect size for x is -0.050

    Based on 50 simulations, (1 warning, 0 errors)
    alpha = 0.05, nrow = 60

    Time elapsed: 0 h 0 m 6 s

We can also estimate power as a function of the largest value of x; calling plot(pc2) draws the resulting power curve.

    pc2 <- powerCurve(model2, nsim = 50)
    pc2

    Power for predictor 'x', (95% confidence interval),
    by largest value of x:
          3:  6.00% ( 1.25, 16.55) - 9 rows
          5:  6.00% ( 1.25, 16.55) - 15 rows
          7: 16.00% ( 7.17, 29.11) - 21 rows
          9: 26.00% (14.63, 40.34) - 27 rows
         11: 38.00% (24.65, 52.83) - 33 rows
         12: 42.00% (28.19, 56.79) - 36 rows
         14: 66.00% (51.23, 78.79) - 42 rows
         16: 84.00% (70.89, 92.83) - 48 rows
         18: 92.00% (80.77, 97.78) - 54 rows
         20: 94.00% (83.45, 98.75) - 60 rows

    Time elapsed: 0 h 0 m 47 s
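The confidence intervals printed with these power estimates reflect the fact that a simulated power estimate is a binomial proportion: the number of significant simulations out of nsim. As a rough illustration, the base-R sketch below recomputes the interval for 47 significant results out of 50 (the 94% estimate above), assuming an exact (Clopper-Pearson) binomial interval, which appears consistent with the printed values. The object names and the 1000-simulation comparison are assumptions added for illustration.

```r
# Sketch: confidence interval for a simulation-based power estimate,
# assuming an exact (Clopper-Pearson) binomial interval.
n_sim <- 50   # number of simulations, as in the output above
n_sig <- 47   # significant simulations: 94% of 50

ci_50 <- binom.test(n_sig, n_sim, conf.level = 0.95)$conf.int
print(round(100 * c(estimate = n_sig / n_sim, lower = ci_50[1], upper = ci_50[2]), 2))

# Hypothetical comparison: the same proportion observed in 1000 simulations
ci_1000 <- binom.test(940, 1000, conf.level = 0.95)$conf.int
print(round(100 * c(lower = ci_1000[1], upper = ci_1000[2]), 2))

# Interval widths, in percentage points
width_50   <- 100 * diff(ci_50)
width_1000 <- 100 * diff(ci_1000)
print(c(width_50 = width_50, width_1000 = width_1000))
```

The much narrower interval at 1000 simulations is the practical reason to return to the default number of simulations once the design has been narrowed down.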

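The same approach can be taken for the grouping factor g, here by extending the design to 15 groups.

    model3 <- extend(model1, along = "g", n = 15)  # extend the design to 15 levels of g
    summary(model3)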

    Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
     Family: poisson  ( log )
    Formula: z ~ x + (1 | g)
       Data: simdata

         AIC      BIC   logLik deviance df.resid
       109.0    113.2    -51.5    103.0       27

    Scaled residuals:
         Min       1Q   Median       3Q      Max
    -1.28918 -0.41836 -0.03916  0.57284  1.29631

    Random effects:
     Groups Name        Variance Std.Dev.
     g      (Intercept) 0.08345  0.2889
    Number of obs: 30, groups:  g, 3

    Fixed effects:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.54079    0.27173   5.670 1.43e-08 ***
    x           -0.05000    0.03955  -1.264    0.206
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Correlation of Fixed Effects:
      (Intr)
    x -0.666

    pc3 <- powerCurve(model3, along = "g", nsim = 50)
    pc3

    Power for predictor 'x', (95% confidence interval),
    by number of levels in g:
          3: 36.00% (22.92, 50.81) - 30 rows
          4: 54.00% (39.32, 68.19) - 40 rows
          6: 64.00% (49.19, 77.08) - 60 rows
          7: 64.00% (49.19, 77.08) - 70 rows
          8: 70.00% (55.39, 82.14) - 80 rows
         10: 80.00% (66.28, 89.97) - 100 rows
         11: 86.00% (73.26, 94.18) - 110 rows
         12: 86.00% (73.26, 94.18) - 120 rows
         14: 90.00% (78.19, 96.67) - 140 rows
         15: 90.00% (78.19, 96.67) - 150 rows

    Time elapsed: 0 h 0 m 54 s
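A power curve like the one above is typically read off to find the smallest design that reaches a target power. The sketch below does this in base R using the point estimates and lower confidence limits transcribed from the printed output; the 80% target and the object names are assumptions chosen for illustration, not part of the original analysis.

```r
# Sketch: reading a required number of groups off the printed power curve.
# Values are transcribed from the pc3 output; the 80% target is an assumption.
pc3_summary <- data.frame(
  levels_g = c(3, 4, 6, 7, 8, 10, 11, 12, 14, 15),
  power    = c(0.36, 0.54, 0.64, 0.64, 0.70, 0.80, 0.86, 0.86, 0.90, 0.90),
  lower    = c(0.2292, 0.3932, 0.4919, 0.4919, 0.5539,
               0.6628, 0.7326, 0.7326, 0.7819, 0.7819)
)
target <- 0.80

# Smallest number of groups whose point estimate reaches the target
meets_point <- subset(pc3_summary, power >= target)
min_levels  <- min(meets_point$levels_g)
print(min_levels)

# Stricter criterion: lower confidence limit must also reach the target
meets_lower <- subset(pc3_summary, lower >= target)
print(nrow(meets_lower))
```

With only 50 simulations per point, no design in this curve has a lower confidence limit at or above 80%, so a decision based on the point estimates alone should be confirmed with more simulations.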

For studies using a Bayes factor approach, simulation can be used to estimate the probability that strong evidence will be obtained for the null or the alternative hypothesis. The code below is adapted from Lakens (2016; http://daniellakens.blogspot.co.uk/2016/01/power-analysis-for-default-bayesian-t.html). The code assumes that bf holds one Bayes factor per simulated study, that threshold is the evidence threshold, and that nsim is the number of simulations.

    cat("The probability of observing support for the null hypothesis is ", supporth0 <- sum(bf < (1/threshold))/nsim)

    The probability of observing support for the null hypothesis is 0.017

    cat("The probability of observing support for the alternative hypothesis is ", supporth1 <- sum(bf > threshold)/nsim)

    The probability of observing support for the alternative hypothesis is 0.816
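To make the logic of such a simulation concrete, the following base-R sketch simulates two-group studies and classifies each Bayes factor against a threshold. It is not the method used in the cited post: instead of a default Bayesian t-test, it uses the BIC approximation to the Bayes factor, BF10 ≈ exp((BIC0 - BIC1) / 2), so that it runs without extra packages. The sample size, effect size, threshold, seed, and object names are all assumptions chosen for illustration, and the resulting proportions are not expected to match the values printed above.

```r
# Sketch: simulation-based design analysis with Bayes factors.
# Uses the BIC approximation to the Bayes factor (a swapped-in technique),
# not a default Bayesian t-test. All design values below are assumptions.
set.seed(1)
nsim      <- 500   # number of simulated studies
n         <- 50    # participants per group
d         <- 0.5   # true standardized effect
threshold <- 3     # evidence threshold for BF10 (and 1/3 for the null)

bf <- replicate(nsim, {
  y     <- c(rnorm(n, 0, 1), rnorm(n, d, 1))
  group <- factor(rep(c("a", "b"), each = n))
  bic0  <- BIC(lm(y ~ 1))       # null model: no group difference
  bic1  <- BIC(lm(y ~ group))   # alternative model: group difference
  exp((bic0 - bic1) / 2)        # approximate BF10
})

support_h0   <- mean(bf < 1 / threshold)      # evidence for the null
support_h1   <- mean(bf > threshold)          # evidence for the alternative
inconclusive <- 1 - support_h0 - support_h1   # neither threshold reached

print(c(H0 = support_h0, H1 = support_h1, inconclusive = inconclusive))
```

Reporting the inconclusive proportion alongside the two support probabilities shows how often the planned sample size would leave the question unresolved.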
