CMU MSP 36726: Power


H. Seltman, 2/21/2018

I. Consider three experiments:
  1) Recruitment, randomization, priming with "your group does well/poorly in math", math testing.
  2) Recruitment, each subject gets drug and placebo (counterbalanced), tests of concentration.
  3) Recruitment, randomization, priming with control vs. "eating healthy extends lives", choice of snack is cake vs. carrot sticks.

        Parameter(s)    Test statistic    Null sampling dist.    Alt. sampling dist.
  #1
  #2
  #3

II.

III. A Bayesian posterior tells us Pr(θ>0 | y) and a 95% HPD interval for θ, but not Pr(θ=0).

When an experiment compares the effects of two treatments on one quantitative outcome, and when the model assumptions are (reasonably well) met for a non-Bayesian test, there are four possibilities before the test is run (only two, i.e., one row, if you are omniscient) and two (one column) after:

  Truth               p ≤ α           p > α
  δ = µ1-µ2 = 0       FP: α           TN: 1-α
  δ = µ1-µ2 > 0       TP: 1-β_δ       FN: β_δ

Power is 1-β_δ, with a different power value for each value of δ for a given experiment, usually visualized as a power curve with 1-β on the y-axis and δ on the x-axis. For any given statistical test for a given experiment in a given population, the power depends on sample size, n, and possibly other parameters such as variances and covariances, in addition to δ. α is also called the type-1 error rate, and β is called the type-2 error rate.

IV. Here is one of the simplest examples: A one-sample Z-test is performed on a sample of size n to test H_0: µ=µ_0 vs. H_A: µ≠µ_0 for some specified µ_0, where Y~N(µ,σ²) with known variance. E.g., Y is the width of a smart phone screen coming out of a factory, where we choose to sample n=25 random screens, the manufacturing specification is µ_0 = 3 inches, and long experience (lots of data) tells us that Y~N and σ² = […] inches.

Choose a useful statistic, and make a diagram showing the null sampling distribution of the statistic, including the cutoff value(s) for rejecting H_0 at significance level α = […].

Now add the sampling distribution of your statistic for H_A: µ = […] inches. Shade the region of the alternative sampling distribution above the cutoff value. What is β for this alternative? What is the power for this alternative? What causes the power to increase or decrease? What happens if Y is not normally distributed?

V. For the first priming experiment, how can we increase the power?
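The shaded-area construction in section IV can also be sketched numerically. The following is a minimal sketch assuming a two-sided one-sample Z-test; µ_0 = 3 and n = 25 come from the example, while the σ and alternative mean used in the call are hypothetical placeholders to be replaced by the handout's own values. The function name `z_power` is illustrative.

```r
# Power of a two-sided one-sample Z-test of H0: mu = mu0 with known sigma.
z_power <- function(mu_alt, mu0, sigma, n, alpha = 0.05) {
  se    <- sigma / sqrt(n)            # SD of the sample mean
  zcrit <- qnorm(1 - alpha / 2)       # cutoff on the Z scale
  shift <- (mu_alt - mu0) / se        # mean of Z under the alternative
  # Reject when Z > zcrit or Z < -zcrit; under H_A, Z ~ N(shift, 1)
  pnorm(zcrit - shift, lower.tail = FALSE) + pnorm(-zcrit - shift)
}

# mu0 = 3 and n = 25 are from the example;
# sigma = 0.1 and mu_alt = 3.05 are HYPOTHETICAL placeholders
pow  <- z_power(mu_alt = 3.05, mu0 = 3, sigma = 0.1, n = 25)
beta <- 1 - pow

# At the null value, the rejection probability is just alpha
z_power(mu_alt = 3, mu0 = 3, sigma = 0.1, n = 25)
```

Evaluating `z_power` over a grid of `mu_alt` values gives the power curve; larger n, smaller σ, larger α, or an alternative farther from µ_0 each increase the power.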

VI. The alternative sampling distribution for the F statistic in a one-way ANOVA is non-central F with numerator and denominator degrees of freedom ndf = J-1 (factor df) and ddf = N-J (error df, where N = nJ is the total sample size for J groups of n subjects each), and with non-centrality parameter (n.c.p.) equal to

  $\text{n.c.p.} = \dfrac{n \sum_{j=1}^{J} (\mu_j - \bar{\mu})^2}{\sigma_e^2}$

where $\sigma_e^2$ is the usual residual error variance.

If we define $\sigma_T^2 = \sum_{j=1}^{J} (\mu_j - \bar{\mu})^2 / (J-1)$, then $\text{n.c.p.} = n (J-1) \sigma_T^2 / \sigma_e^2$.

Note: The Lenth power applet (see below) defines SD[treatment] = $\sqrt{\sum_{j=1}^{J} (\mu_j - \bar{\mu})^2 / (J-1)}$, so that it must calculate the n.c.p. as n * SD[treatment]^2 * (J-1) / $\sigma_e^2$.

What are the null and alternative sampling distributions for a one-way ANOVA with 10 subjects in each of 3 groups, with residual variance 20 and alternative population means 6, 12, and 12?

In R, the numerator and denominator df are called df1 and df2 respectively, and we can use:

    # Find the F value above which 5% of the null values fall:
    q = qf(0.05, df1, df2, lower.tail=FALSE)
    # Find the probability under H_A that F > q:
    pf(q, df1, df2, ncp, lower.tail=FALSE)

Using R, what is the power for this alternative in this experimental setup? How would you make a power curve? What labeling is needed for the power curve, i.e., what, if changed, would result in a different power curve?
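One way to carry out the calculation in section VI is sketched below, using only base R. The function name `anova_power` and the range of sample sizes in the curve are illustrative choices, not part of the handout; the n.c.p. follows the formula given above for n subjects per group.

```r
# One-way ANOVA power via the non-central F distribution (base R only).
anova_power <- function(mu, n, sigma2_e, alpha = 0.05) {
  J   <- length(mu)                    # number of groups
  df1 <- J - 1                         # factor df
  df2 <- J * n - J                     # error df (n subjects per group)
  ncp <- n * sum((mu - mean(mu))^2) / sigma2_e
  q   <- qf(alpha, df1, df2, lower.tail = FALSE)   # null cutoff
  pf(q, df1, df2, ncp = ncp, lower.tail = FALSE)   # P(F > q | H_A)
}

# Example from the text: 3 groups of 10, residual variance 20, means 6, 12, 12
pow <- anova_power(mu = c(6, 12, 12), n = 10, sigma2_e = 20)

# A power curve over per-group sample size for the same alternative
ns  <- 2:30
pws <- sapply(ns, function(n) anova_power(c(6, 12, 12), n, 20))
plot(ns, pws, type = "l", ylim = c(0, 1),
     xlab = "n per group", ylab = "Power",
     main = "One-way ANOVA: means 6/12/12, resid. var. 20, alpha 0.05")
abline(h = 0.8, col = "gray")
```

The plot title carries the quantities that define the curve: changing the pattern of means, the residual variance, α, or the number of groups would give a different curve.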

VII. Information needed to calculate power

a. Any real experiment has a specific power, i.e., the chance that a particular p-value will be less than or equal to α in a single experimental run. This depends on the number of subjects, effect size (e.g., spread of the population means), error variance (for a Gaussian DV), and possibly other quantities such as var(x) in simple regression, VIF in multiple regression, intra-class correlation in a two-level mixed model, etc. The true power is always unknown: even if we estimate all other quantities, we do not know the effect size, because that is the main thing we are trying to determine when we run the experiment.

b. A power analysis computes the approximate power for specific achievable sample sizes, for one or several meaningful effect sizes, and for good guesses for the other needed quantities. Power analysis is only meaningful before running the experiment. After running the experiment, if p ≤ α, your combination of power and luck was adequate. Otherwise, confidence intervals tell you what you need to know: if meaningful effect sizes are within the CIs, then the experiment did not have enough power and it is too late to do anything about it.

c. To obtain an estimate of error variance (or error sd) use one of these methods:
  1. Steal MSE or residual SE from a previous similar experiment.
  2. Run a pilot study, perhaps with no treatment, and compute sd(y) for a group.
  3. Get an expert to guess a 95% interval of Y for a given combination of x's. Error sd is ¼ the length of that interval.

d. To obtain one or more meaningful effect sizes, ask a subject matter expert one of these questions:
  1. What is the smallest meaningful difference in mean y for control vs. treatment?
  2. What is a likely difference in mean y for control vs. treatment?
  3. What is the smallest difference in mean y for control vs. treatment that would cause you to change your subsequent actions/decisions/beliefs?
Or use some (stupid, but popular) idea like Cohen's effect sizes: d = |µ1-µ2|/σ, with small: d=0.2, medium: d=0.5, large: d=0.8.
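The two inputs described in items c and d can be turned into numbers as follows. This is a sketch only: the expert's interval and the meaningful difference are hypothetical, the helper names are illustrative, and the sample sizes assume a two-sided, two-sample t-test at α = 0.05, computed with base R's `power.t.test()`.

```r
# Error SD from an expert's 95% interval: one quarter of its length
sd_from_interval <- function(lower, upper) (upper - lower) / 4

# Cohen's d for a difference in means, given the error SD
cohen_d <- function(mu1, mu2, sigma) abs(mu1 - mu2) / sigma

# HYPOTHETICAL inputs: the expert says 95% of y values fall in [60, 100],
# and the smallest meaningful control-vs-treatment difference is 5 units
sigma <- sd_from_interval(60, 100)   # 10
d     <- cohen_d(75, 80, sigma)      # 0.5

# Per-group n for 80% power at Cohen's benchmark effect sizes
n_needed <- sapply(c(small = 0.2, medium = 0.5, large = 0.8), function(d)
  ceiling(power.t.test(delta = d, sd = 1, power = 0.80, sig.level = 0.05)$n))
n_needed
```

The benchmarks say nothing about whether a given d matters in the subject area, which is why an expert's answer to one of the questions in item d is the better input.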

VIII. Power calculations using the Lenth power applet (a Java app)

a. Find the power for the above ANOVA example using the Lenth Power Applet.
  1. Choose "Balanced ANOVA (any model)" and click "Run selection".
  2. Fill in the "Select an ANOVA model" dialog box. Change levels to 3, check the "Observations per factor combination" and click "F tests".
  3. In the "One-way ANOVA" dialog box enter values by slider or by clicking the small gray box that opens an area to type in the value. For SD[treatment] enter the standard deviation of the population means (with the usual J-1 denominator, even though J makes more sense here). Read off the power value.
  4. Now find the power for residual variance = 30.

b. Find the power for a simple regression with residual standard error 8, and x={1,3,5,7} in quadruplicate, with alternative β_1 = 1.2, using the Lenth Power Applet (see Help). What sample size is needed for 80% power for this alternative? For multiple regression, the extra information required is the VIF, 1/(1-R_j²). If you have 3 predictors (IVs) and R² for the predictor of interest regressed on the other 2 IVs is 0.6, what is the power for the new sample size?

c. Find the power for an experiment with three treatments and a binary (categorical) outcome with 60 subjects, expecting a nominal success rate of 30 percent and hoping to find that one treatment has 40 percent success. Click Help/ThisDialog in the Chi-Square dialog box for details. After calculating the power, guess first, and then determine the sample size needed for 80% power.
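For readers without the applet, exercises b and c can be approximated in base R. The sketch below uses the standard non-central F test for a regression slope and the non-central chi-square approximation with Cohen's w; it assumes equal group sizes in exercise c, and its results need not match the applet's output exactly. All variable names are illustrative.

```r
# (b) Simple regression: residual SE 8, x = 1, 3, 5, 7 in quadruplicate, beta1 = 1.2
x     <- rep(c(1, 3, 5, 7), each = 4)
sigma <- 8
beta1 <- 1.2
Sxx   <- sum((x - mean(x))^2)            # spread of x
ncp_b <- beta1^2 * Sxx / sigma^2         # n.c.p. of the F test for the slope
df2   <- length(x) - 2                   # residual df
pow_b <- pf(qf(0.05, 1, df2, lower.tail = FALSE), 1, df2,
            ncp = ncp_b, lower.tail = FALSE)

# (c) Three treatments, binary outcome, N = 60, success rates 0.3, 0.3, 0.4
#     (equal group sizes are an ASSUMPTION)
rates  <- c(0.3, 0.3, 0.4)
p_alt  <- rbind(success = rates, failure = 1 - rates) / 3   # cell probs under H_A
p_null <- outer(rowSums(p_alt), colSums(p_alt))             # independence (H_0)
w      <- sqrt(sum((p_alt - p_null)^2 / p_null))            # Cohen's w
N      <- 60
df_c   <- (2 - 1) * (3 - 1)
pow_c  <- pchisq(qchisq(0.05, df_c, lower.tail = FALSE), df_c,
                 ncp = N * w^2, lower.tail = FALSE)

c(regression = pow_b, chisq = pow_c)
```

Wrapping either calculation in a function of the sample size and searching for the smallest value that reaches 0.80 answers the sample-size questions.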

IX. R power calculation: pwr is one of many power packages

a. Install, load, and then execute library(help=pwr).

b. Calculate the power for the ANOVA example above using pwr.anova.test(). This function uses Cohen's effect size parameter, defined as $f = \sqrt{\sigma_T^2 / \sigma_e^2}$. (pwr.anova.test() computes the n.c.p. as k*n*f², which agrees with the n.c.p. of section VI only if the variance of the means in f is taken with J rather than J-1 in the denominator.)

c. Calculate the power for regression with u=1 parameter, v=14 residual df, and R²=0.28, using pwr.f2.test() and Cohen's definition f² = R²/(1-R²). This is not the best way to represent effect size for regression.

d. Calculate the power for the above chi-square example. Use pwr.chisq.test(). Parameter w is equal to sqrt(χ²/N) from a chi-square test with fake data resembling the alternative hypothesis.

X. SAS power calculation:

a. ANOVA

    TITLE "ANOVA power means=6,12,12 MSE=20";
    PROC POWER;
      ONEWAYANOVA TEST=OVERALL GROUPMEANS=(6 12 12) STDDEV=4.472
        NPERGROUP=10 POWER=.;
      PLOT;
    RUN;

    The POWER Procedure
    Overall F Test for One-Way ANOVA
    Fixed Scenario Elements
      Method                   Exact
      Group Means
      Standard Deviation
      Sample Size Per Group    10
      Alpha                    0.05
    Computed Power
      Power

    TITLE "ANOVA power finding n for 80% power for two variances";
    PROC POWER;
      ONEWAYANOVA TEST=OVERALL GROUPMEANS=( ) STDDEV=4.472
        NPERGROUP=. POWER=0.8;
      ONEWAYANOVA TEST=OVERALL GROUPMEANS=( ) STDDEV=5.477
        NPERGROUP=. POWER=0.8;

    TITLE "ANOVA power unequal n";
    PROC POWER;
      ONEWAYANOVA TEST=OVERALL GROUPMEANS=( ) STDDEV=4.472
        GROUPNS=( ) POWER=.;
      ONEWAYANOVA TEST=OVERALL GROUPMEANS=( ) STDDEV=5.477
        GROUPNS=( ) POWER=.;

    TITLE "ANOVA contrast power contrast for variances";
    PROC POWER;
      ONEWAYANOVA TEST=CONTRAST CONTRAST=(2 -1 -1) GROUPMEANS=( )
        STDDEV=4.472 NPERGROUP=10 POWER=.;
      ONEWAYANOVA TEST=CONTRAST CONTRAST=(2 -1 -1) GROUPMEANS=( )
        STDDEV=5.477 NPERGROUP=10 POWER=.;

    The POWER Procedure
    Single DF Contrast in One-Way ANOVA
    Fixed Scenario Elements
      Method                   Exact
      Contrast Coefficients
      Group Means
      Standard Deviation
      Sample Size Per Group    10
      Number of Sides          2
      Null Contrast Value      0
      Alpha                    0.05
    Computed Power
      Power

b. Regression

This is based on the idea that you have some predictors (IVs) in a model which achieves a certain R², and you want to know the power for adding 1 or more additional predictors which raise the R² by some value, R² diff.

    PROC POWER;
      MULTREG MODEL=FIXED NREDUCEDPREDICTORS=3 RSQUAREREDUCED=0.5
        NTESTPREDICTORS=1 RSQUAREDIFF=0.1 NTOTAL=. POWER=0.8;
      MULTREG MODEL=FIXED NREDUCEDPREDICTORS=3 RSQUAREREDUCED=0.5
        NTESTPREDICTORS=1 RSQUAREDIFF=0.1 NTOTAL=10 to 35 by 5 POWER=.;

    Type III F Test in Multiple Regression
    Fixed Scenario Elements
      Method                                   Exact
      Model                                    Fixed X
      Number of Test Predictors                1
      Number of Predictors in Reduced Model    3
      R-square of Reduced Model                0.5
      Difference in R-square                   0.1
      Alpha                                    0.05
    Computed Power
      Index    N Total    Power

c. Chi-square (2 groups only)

    PROC POWER;
      TWOSAMPLEFREQ TEST=PCHI REFPROPORTION=0.3 NPERGROUP=50
        PROPORTIONDIFF=0.05 to 0.30 by 0.05 POWER=.;
    RUN;

    Pearson Chi-square Test for Two Proportions
    Fixed Scenario Elements
      Distribution                      Asymptotic normal
      Method                            Normal approximation
      Reference (Group 1) Proportion    0.3
      Sample Size Per Group             50
      Number of Sides                   2
      Null Proportion Difference        0
      Alpha                             0.05
    Computed Power
      Index    Proportion Diff    Power

d. PROC GLMPOWER: Uses a fake dataset of anticipated results. For ANCOVA, specify the correlation of the covariate(s) with Y.

    DATA Expected;
      INPUT Font $ Size $ Score;
      DATALINES;
    Serif Small 71
    Sans Small 76
    Serif Medium 80
    Sans Medium 85
    Serif Large 69
    Sans Large 74
    ;
    RUN;

    TITLE "Score on Font+Size with N=50, RSE=3.5 and One Covariate";
    PROC GLMPOWER DATA=Expected;
      CLASS Font Size;
      MODEL Score = Font Size;
      CONTRAST "Serif vs. Sans" Font -1 1;
      CONTRAST "Small+Large vs. Medium" Size ;
      CONTRAST "Small vs. Large" Size 1 0 -1;
      POWER STDDEV=3.5 NCOVARIATES=1 CORRXY = NTOTAL=30 POWER=.;
      PLOT X=n MIN=20 MAX=60;
    RUN;

    The GLMPOWER Procedure
    Fixed Scenario Elements
      Dependent Variable                     Score
      Number of Covariates                   1
      Std Dev Without Covariate Adjustment   3.5
      Total Sample Size                      30
      Alpha                                  0.05
      Error Degrees of Freedom               25
    Computed Power
      Index  Type      Source                   Corr XY  Adj Std Dev  Test DF  Power
      1      Effect    Font
      2      Effect    Font
      3      Effect    Size                                                    >
      4      Effect    Size                                                    >
      5      Contrast  Serif vs. Sans
      6      Contrast  Serif vs. Sans
      7      Contrast  Small+Large vs. Medium                                  >
      8      Contrast  Small+Large vs. Medium                                  >
      9      Contrast  Small vs. Large
      10     Contrast  Small vs. Large
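The "finding n for 80% power for two variances" scenario of section X.a can be cross-checked in base R. This is a sketch under the assumption that the group means are 6, 12, 12 as in the section VI example; `anova_power_sd` and `n_for_power` are illustrative names, and the results need not agree with SAS in every digit.

```r
# Power of the one-way ANOVA F test, parameterized by the error SD
anova_power_sd <- function(mu, n, sd_e, alpha = 0.05) {
  J   <- length(mu)
  df1 <- J - 1
  df2 <- J * (n - 1)
  ncp <- n * sum((mu - mean(mu))^2) / sd_e^2
  pf(qf(alpha, df1, df2, lower.tail = FALSE), df1, df2,
     ncp = ncp, lower.tail = FALSE)
}

# Smallest per-group n that reaches the target power
n_for_power <- function(mu, sd_e, target = 0.80, n_max = 1000) {
  for (n in 2:n_max) {
    if (anova_power_sd(mu, n, sd_e) >= target) return(n)
  }
  NA_integer_   # target not reached within n_max
}

mu <- c(6, 12, 12)   # ASSUMED group means (from the section VI example)
sapply(c(sd_4.472 = 4.472, sd_5.477 = 5.477),
       function(s) n_for_power(mu, s))
```

A linear search is used because power is increasing in n here, so the first n that reaches the target is the smallest one.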

XI. Power analysis by simulation (in R)

a. Create an R simulation function that simulates the data under the desired model. The function should have arguments N (total sample size) and some measure of effect size, plus any optional arguments that represent other aspects of the simulated data that you might want to change now or in the future. The function should return a data object in a form that your analysis function (see next) understands. Test the function.

b. Create an R analysis function that has arguments data (an object containing data, usually created by the simulation function) and possibly other parameters, e.g., to specify analysis options and/or which parameter to test. This function can return a p-value (simple form) or a full analysis (e.g., lm() result). Test the function.

c. For each point on your power curve, use the apply functions to estimate the power at that N and effect size (and possibly other settings). Plot these points.

d. Number of simulations needed: Var(power estimate) = Var(n_sig/n_sim) = π(1-π)/n_sim, where n_sig is the observed number of simulations with p ≤ α, and π is the true power. This is maximal at π=0.5. So with n_sim=100, var_max = 0.5*0.5/100 and 2SE = 2*sqrt(var_max) = 0.10. So a power estimate of 0.50 (50% power) from 100 simulations has a 95% CI of [40%, 60%]. With n_sim=500/2000, 2SE for π=0.5 is 0.045/0.022. At π=0.05/0.80 with n_sim=2000, 2SE = 0.010/0.018.

e. Example:

  1. Simulation function for simple regression with β_0=0, σ_x=1.

    simreg = function(n=100, beta1=0, sderr=1) {
      x = rnorm(n, 0, 1)
      y = rnorm(n, x*beta1, sderr)
      data = data.frame(x, y)
      return(data)
    }

  2. Analysis function for ordinary regression (returns p-value)

    preg = function(data) {
      rslt = summary(lm(y~x, data))
      return(coef(rslt)["x", "Pr(>|t|)"])
    }

  3. Compute power estimate for one N / effect size using the above:

    regpower = function(n, beta1=0, sderr=1, nsim=100) {
      ps = sapply(rep(n, nsim), function(n, beta1, sderr) {
        preg(simreg(n, beta1, sderr))
      }, beta1=beta1, sderr=sderr)
      return(mean(ps<=0.05))
    }

  4. Accumulate power at several data points

    ns = seq(5, 50, 5)
    p0_1.5 = sapply(ns, regpower, beta1=0, sderr=1.5, nsim=2000)
    p0.5_1.5 = sapply(ns, regpower, beta1=0.5, sderr=1.5, nsim=2000)
    p1_1.5 = sapply(ns, regpower, beta1=1, sderr=1.5, nsim=2000)

  5. Examine and plot results

    cbind(ns, p0_1.5, p0.5_1.5, p1_1.5)
    #       ns p0_1.5 p0.5_1.5 p1_1.5
    plot(ns, 100*p1_1.5, type="l", col=1, ylim=c(0,100),
         xlab="n", ylab="% Power",
         main="Regression power for B1 for sigma=1.5")
    abline(h=c(5, 80), col="gray")
    lines(ns, 100*p0.5_1.5, col=2)
    lines(ns, 100*p0_1.5, col=3)
    legend("topleft", paste0("beta1=",c(1,0.5,0)), lty=1, col=1:3)

  How can you make this plot less jagged?

  6. More flexible code:

    betas = c(0,0.5,1)
    pwr = sapply(betas, function(b) sapply(ns, regpower, b, 1.5))
    matplot(100*pwr, type="l")
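The jaggedness of a simulated power curve is Monte Carlo error, so the formula in item d indicates how many simulations are needed to smooth it. A minimal sketch follows; the helper names `mc_2se` and `nsim_needed` are illustrative.

```r
# Twice the Monte Carlo standard error of a simulated power estimate
mc_2se <- function(power, nsim) 2 * sqrt(power * (1 - power) / nsim)

# Worst case (true power = 0.5) for a few simulation sizes
nsims <- c(100, 500, 2000, 10000)
data.frame(nsim = nsims, two_se = round(mc_2se(0.5, nsims), 3))

# Simulations needed so that 2SE is at most `half_width` at a given power
nsim_needed <- function(half_width, power = 0.5) {
  ceiling(4 * power * (1 - power) / half_width^2)
}
nsim_needed(0.02)
```

Because 2SE shrinks only with the square root of the number of simulations, halving the wobble in the curve takes four times as many simulations per point.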

XIII. What every statistician should know about power

a. A clearly defined experiment has a single unknown power.
b. There are many ways to improve power other than increasing sample size.
c. A client who wants a power analysis will usually need help defining the inputs, and will want an output that shows power for one effect size and several n's and/or for one n and several effect sizes.
d. Practically, extra n is needed if there may be participant dropout.
e. If an experiment is planned to have power P for effect size E with sample size N, then the actual power is larger (smaller) if the true effect size is larger (smaller) than specified.
f. It is critical to understand that while some inputs are imprecise, the general conclusion is highly valuable: most simply, if the calculated power for an effect size of interest is well below, say, 80%, then the experiment is not worth doing because the correct result of p<0.05 will only happen if the experimenter is extremely lucky.
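Items d and e can be illustrated with base R's `power.t.test()`. The sketch assumes a two-sample t-test with sd = 1; the effect sizes and the 25% dropout rate are hypothetical, and dividing by (1 - dropout) is a common rule of thumb rather than something stated in these notes.

```r
# Planned design: n per group for 80% power at effect size E = 0.5 (sd = 1)
planned <- power.t.test(delta = 0.5, sd = 1, power = 0.80)
n_plan  <- ceiling(planned$n)

# (e) Power at the planned n if the true effect differs from the one specified
true_effects <- c(0.3, 0.5, 0.7)
pw <- sapply(true_effects,
             function(d) power.t.test(n = n_plan, delta = d, sd = 1)$power)
round(pw, 2)

# (d) Inflate recruitment for an anticipated dropout fraction (HYPOTHETICAL 25%)
n_recruit <- function(n, dropout) ceiling(n / (1 - dropout))
n_recruit(n_plan, dropout = 0.25)
```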
