Power of a hypothesis test
Scenario #1: H₀ is true
  test rejects H₀: type I error
  test does not reject H₀: OK
Scenario #2: H₀ is not true
  test rejects H₀: OK
  test does not reject H₀: type II error
Power = P(test rejects H₀ | H₀ is not true)
Power of a hypothesis test
When H₀ is not true:
  test rejects H₀: OK
  test does not reject H₀: type II error
Power: the probability that the test rejects H₀ when H₀ is not true.
Calculating power is important when designing studies. It helps ensure a reasonable chance of gaining good information.
Power of t test (distribution of test statistic)
[Figure labels: distribution if H₀ is true; distribution if H₀ is NOT true; probability of rejecting H₀ if H₀ is true (significance level).]
$$\text{Power} = P\left(Z < -Z_{1-\alpha/2} + \frac{|\mu_1 - \mu_0|}{\sigma}\sqrt{n}\right), \quad \text{where } Z \sim N(0,1)$$
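The power formula above can be sketched in a few lines of Python. This is a minimal illustration of the normal approximation only, not an exact t-test power calculation; the function name `approx_power` and the example inputs are hypothetical and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mu0, mu1, sigma, n, alpha=0.05):
    """Normal-approximation power: P(Z < -z_{1-alpha/2} + |mu1 - mu0| / sigma * sqrt(n))."""
    z = NormalDist()                      # standard normal, N(0, 1)
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    shift = abs(mu1 - mu0) / sigma * sqrt(n)
    return z.cdf(-z_crit + shift)


# Hypothetical inputs chosen only to exercise the function.
print(f"{approx_power(mu0=0.0, mu1=0.5, sigma=1.0, n=32):.3f}")
```

Note that this expression keeps only the tail of the rejection region on the side of the true mean, so when mu1 equals mu0 it returns alpha/2 rather than alpha. The neglected tail is negligible for effects of practical size.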
Example: post-hoc power calculation
Group A gum data: µ = mean change in DMFS.
Hypothesis H₀: µ = 0.
n = 25, X̄ = -0.72, and s = 5.37.
The t statistic is T = -0.67, with p-value = 0.51.
WE KNOW: our test does not furnish us good evidence of a change in DMFS.
Example: post-hoc power calculation
WE DON'T KNOW: whether or not the DMFS truly changes.
The lack of evidence could be the result of either:
  the mean change in DMFS is truly zero, or
  the test wasn't powerful enough to provide evidence of change.
We can get some information about the second possibility by computing a power estimate.
Example: post-hoc power calculation
We know: n = 25, α = 0.05.
We will assume:
  the true average change in DMFS is 1 DMFS;
  the true population standard deviation is σ = 5.37*.
Under these assumptions our power to reject H₀ would be
$$P\left(Z < -1.96 + \frac{|1 - 0|}{5.37}\sqrt{25}\right) = P(Z < -1.03) = 1 - P(Z < 1.03) = 0.15$$
Example: post-hoc power calculation
If the true mean change is 1 DMFS, then the probability that our test would have rejected H₀ was only 15%.
It is very possible that our test would have failed to indicate a change in DMFS if the true change were 1 DMFS or less.
We cannot use this test result to conclude that there is no change in DMFS.
Example: post-hoc power calculation
Now let's assume: the true average change in DMFS is 4 DMFS.
If this were the case then our power to reject H₀ would be
$$P\left(Z < -1.96 + \frac{|4 - 0|}{5.37}\sqrt{25}\right) = P(Z < 1.76) = 0.96$$
We can conclude with reasonable certainty that the true change is not 4 DMFS or greater.
Note that this still does not imply that H₀ is true.
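The two post-hoc calculations can be reproduced with a short script. This is a sketch that reuses the normal approximation and the example's values (n = 25, α = 0.05, σ = 5.37). The helper name `posthoc_power` is hypothetical, and the intermediate true changes of 2 and 3 DMFS are added here purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the Group A gum example.
N, ALPHA, SIGMA = 25, 0.05, 5.37
Z = NormalDist()  # standard normal


def posthoc_power(true_change, mu0=0.0):
    """Approximate power to reject H0: mu = mu0 if the true mean change is true_change."""
    z_crit = Z.inv_cdf(1 - ALPHA / 2)
    return Z.cdf(-z_crit + abs(true_change - mu0) / SIGMA * sqrt(N))


# 1 and 4 DMFS are the lecture's scenarios; 2 and 3 are extra illustrative values.
powers = {change: posthoc_power(change) for change in (1, 2, 3, 4)}
for change, p in powers.items():
    print(f"true change {change} DMFS: power = {p:.2f}")
```

The values for 1 and 4 DMFS should agree with the 0.15 and 0.96 computed by hand, up to rounding of the normal quantile.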
Non-significant results (fail to reject H₀)
Failing to reject H₀ should not be considered evidence that H₀ is true.
It could be the case that the failure to reject was the result of an under-powered test.
With an under-powered test, failing to reject tells you nothing.
If the test has high power to reject, then failure to reject is more interpretable.
Important point
A test with high power will:
  have a good chance of rejecting the null hypothesis when it is appropriate to do so;
  make it easier to interpret what the result tells you, should it fail to reject H₀.
Ensuring good power is an important step in the design of any study.
Factors that affect the power of a t test
Power is determined by
$$\frac{|\mu_1 - \mu_0|}{\sigma}\sqrt{n}$$
Factors that affect the power of a t test
Power is greater for larger values of
$$\frac{|\mu_1 - \mu_0|}{\sigma}\sqrt{n}$$
Power is greater when:
  |μ₁ - μ₀| is greater (the effect is larger);
  σ is smaller (the data are less variable);
  n is greater (more subjects, more information).
When designing a study the investigators have the most control over the number of subjects (n).
Sample size calculation
To have power 1-β for a test with significance level α to reject H₀: μ = μ₀, the sample size should be at least
$$n = \frac{\sigma^2\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{(\mu_1 - \mu_0)^2}$$
* In this formula β = probability of type II error, so power is denoted by 1-β.
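One way to see where the sample size formula comes from is to set the approximate power expression for the t test equal to the target 1-β and solve for n. The steps below are a sketch under that same normal approximation. They use the fact that P(Z < z) = 1-β exactly when z = Z_{1-β}.

```latex
\begin{align*}
P\left(Z < -Z_{1-\alpha/2} + \frac{|\mu_1 - \mu_0|}{\sigma}\sqrt{n}\right) &= 1 - \beta \\
-Z_{1-\alpha/2} + \frac{|\mu_1 - \mu_0|}{\sigma}\sqrt{n} &= Z_{1-\beta} \\
\sqrt{n} &= \frac{\sigma\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)}{|\mu_1 - \mu_0|} \\
n &= \frac{\sigma^2\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{(\mu_1 - \mu_0)^2}
\end{align*}
```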
Example: chewing gum data
Using the previous assumptions about the Group A chewing gum data, if the true mean change is 1 DMFS, then to have 80% power to yield good evidence of a change in DMFS, the sample size should be at least
$$n = \frac{5.37^2\,(1.960 + 0.842)^2}{(1 - 0)^2} = 226.4$$
So we should enroll at least 227 children.
Example: chewing gum data
If the true mean change is 2 DMFS, then to have 80% power to yield good evidence of a change in DMFS, the sample size should be at least
$$n = \frac{5.37^2\,(1.960 + 0.842)^2}{(2 - 0)^2} = 56.6$$
So round up to 57 children.
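The sample size formula lends itself to a small helper. The sketch below assumes the same normal approximation; the function name `required_n` is hypothetical. It returns both the unrounded value and the value rounded up to a whole subject.

```python
from math import ceil
from statistics import NormalDist


def required_n(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Approximate sample size: sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / (mu1 - mu0)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.960 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.842 for 80% power
    n_exact = sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu1 - mu0) ** 2
    return n_exact, ceil(n_exact)        # always round up to a whole subject


for true_change in (1.0, 2.0):
    n_exact, n_needed = required_n(mu0=0.0, mu1=true_change, sigma=5.37)
    print(f"true change {true_change}: n = {n_exact:.1f}, enroll {n_needed}")
```

Because the script uses unrounded normal quantiles, the unrounded n can differ from the hand calculation in the first decimal place; the rounded-up sample sizes are the same.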
Components of a sample size calculation
The desired power: 1-β
The industry standard is a minimum of 80%.
Because of the potential for incorrect estimates of the various parameters in the calculations, investigators often try for 90% power to be conservative.
Some investigations have the goal of demonstrating evidence of equality (instead of difference). One method to do this is to specify tests with greater power (95%).
Components of a sample size calculation
Significance level: α
Usual choices are α = 0.05 or α = 0.01.
Sometimes adjustments for multiple testing will lead to specifying other levels for α.
Components of a sample size calculation
Population standard deviation: σ
The population standard deviation will not be known, and must be estimated from previous studies.
These estimates should be conservative (err on the high side).
Example: estimation of σ
In the gum data example we estimated σ using s from a sample of size n = 25.
The 95% confidence interval for σ in this case would be (4.19, 7.47)*.
Thus, σ = 5.37 might well be an underestimate of the true population σ.
*see Rosner, section 6.7 for details of calculation
Example: estimation of σ
Say we assumed σ = 5.37, which by our previous calculation leads to a sample of 227.
However, suppose that σ really was 6.00. Then our true power would be only
$$P\left(Z < -1.96 + \frac{|1 - 0|}{6.00}\sqrt{227}\right) = 0.71$$
To avoid low power it is a good idea to assume a higher standard deviation than was observed in previous studies.
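The sensitivity of power to a misjudged σ can be checked numerically. The sketch below fixes n = 227 and a true change of 1 DMFS, as in the example, and varies the true standard deviation. The helper name `power_at` is hypothetical, and the σ values other than 5.37 and 6.00 are the confidence limits quoted elsewhere in this lecture.

```python
from math import sqrt
from statistics import NormalDist


def power_at(sigma_true, n=227, true_change=1.0, alpha=0.05):
    """Approximate power when the real population SD is sigma_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(-z_crit + abs(true_change) / sigma_true * sqrt(n))


# 5.37 is the planning value; the larger values show how quickly power erodes.
for sigma in (5.37, 6.00, 6.19, 7.47):
    print(f"sigma = {sigma:.2f}: power = {power_at(sigma):.2f}")
```

With σ = 5.37 the power sits at roughly the planned 80%, and with σ = 6.00 it should drop to about 0.71, in line with the hand calculation.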
Example: estimation of σ
A reasonably conservative method is to use the upper 80% confidence limit for σ as an estimate for σ, which is given by
$$\sqrt{\frac{(n-1)\,s^2}{\chi^2_{n-1,\,0.2}}}$$
In this case the value would be 6.19*.
*see Rosner, section 6.7 for details of calculation
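The confidence limits for σ can be approximated without a statistics package. The sketch below swaps in the Wilson-Hilferty approximation for the chi-square quantile, which is not the exact table lookup the lecture refers to, so the results are approximate. All function names are hypothetical. A library routine for exact chi-square quantiles would be the better choice in real work.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (approximate, not exact)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def sigma_upper_limit(s, n, lower_tail_prob=0.2):
    """Upper confidence limit for sigma: sqrt((n - 1) * s^2 / chi2_{n-1, lower_tail_prob})."""
    df = n - 1
    return sqrt(df * s ** 2 / chi2_quantile_wh(lower_tail_prob, df))


def sigma_ci(s, n, level=0.95):
    """Two-sided confidence interval for sigma using the same approximation."""
    df = n - 1
    tail = (1 - level) / 2
    lower = sqrt(df * s ** 2 / chi2_quantile_wh(1 - tail, df))
    upper = sqrt(df * s ** 2 / chi2_quantile_wh(tail, df))
    return lower, upper


print(f"upper 80% limit: {sigma_upper_limit(5.37, 25):.2f}")
print("95% CI: ({:.2f}, {:.2f})".format(*sigma_ci(5.37, 25)))
```

For s = 5.37 and n = 25 the approximation should land close to the 6.19 upper limit and the (4.19, 7.47) interval quoted in this example.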
Components of a sample size calculation
Difference in the means: μ₀ - μ₁
By your choice of μ₁ you design your study to be able to indicate a difference of size μ₁.
Your study will not be able to dependably indicate a difference smaller than μ₁.
Think of μ₁ as a cutoff for the size of effect you would like to find rather than an estimate.
Ideally, one should specify μ₁ to be the minimum clinically significant difference.
Example: choice of μ₁
[Figure: graph displaying power by true μ when n was chosen based on μ₁ = 2.]
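The shape of such a power curve can be tabulated directly. The sketch below assumes the gum-example values (σ = 5.37, α = 0.05, H₀: μ = 0) and n = 57, the sample size obtained earlier for μ₁ = 2. Those inputs and the grid of true means are assumptions made for illustration, since the original graph's settings are not stated. The helper name `power_for` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed settings: n planned for mu1 = 2 in the gum example.
N, SIGMA, ALPHA = 57, 5.37, 0.05


def power_for(true_mu, mu0=0.0):
    """Approximate power to reject H0: mu = mu0 when the true mean is true_mu."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - ALPHA / 2)
    return z.cdf(-z_crit + abs(true_mu - mu0) / SIGMA * sqrt(N))


# Illustrative grid of true means around the design value of 2.
curve = [(mu, power_for(mu)) for mu in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
for mu, p in curve:
    print(f"true mu = {mu:.1f}: power = {p:.2f}")
```

Power reaches roughly 80% at the design value of 2 and falls off quickly for smaller true means, which is why μ₁ is better treated as a cutoff than as an estimate.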
Notes on power and sample-size calcs
This lecture has focused on the one-sample t-test, but the general ideas apply to the various hypothesis tests that we will be covering.
The formulas presented in this lecture are approximations that work well for large studies (large sample sizes).
Programs are available for computing more exact estimates (next slide).
Online power calculators
Web-based calculator (Java): http://www.stat.uiowa.edu/~rlenth/power
Downloadable calculators:
  PS (http://biostat.mc.vanderbilt.edu/powersamplesize)
  G-power (http://www.gpower.hhu.de/)
Others: do a web search on "Power and Sample Size".