Power of a hypothesis test


Power of a hypothesis test

| | Scenario #1: $H_0$ is true | Scenario #2: $H_0$ is not true |
|---|---|---|
| Test rejects $H_0$ | Type I error | OK |
| Test does not reject $H_0$ | OK | Type II error |

Power = $P(\text{test rejects } H_0 \mid H_0 \text{ is not true})$

Power of a hypothesis test
When $H_0$ is not true: if the test rejects $H_0$, the decision is correct (OK); if the test does not reject $H_0$, it makes a type II error.
Power: the probability that the test rejects $H_0$ when $H_0$ is not true.
Calculating power is important when designing studies: it helps ensure a reasonable chance of gaining good information.

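Power of t test (distribution of test statistic)
[Figure: distribution of the test statistic if $H_0$ is true and if $H_0$ is not true, with the probability of rejecting $H_0$ if $H_0$ is true (the significance level) marked.]

$$\text{Power} = P\left(Z < -Z_{1-\alpha/2} + \frac{|\mu_1 - \mu_0|}{\sigma/\sqrt{n}}\right), \quad \text{where } Z \sim N(0,1)$$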
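The power formula on this slide can be motivated by a short derivation. The sketch below assumes a two-sided test, a known $\sigma$, a normally distributed sample mean, and $\mu_1 > \mu_0$; the symbols $\delta$ and $\Phi$ (the standard normal CDF) are introduced here for illustration and do not appear on the slides.

```latex
% Under the alternative, \bar{X} \sim N(\mu_1, \sigma^2/n).
% Define the standardized shift
\delta = \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} .
% The two-sided test rejects when |\bar{X} - \mu_0| / (\sigma/\sqrt{n}) > Z_{1-\alpha/2}, so
\begin{aligned}
\text{Power}
  &= P\!\left(Z > Z_{1-\alpha/2} - \delta\right)
   + P\!\left(Z < -Z_{1-\alpha/2} - \delta\right) \\
  &= \Phi\!\left(-Z_{1-\alpha/2} + \delta\right)
   + \Phi\!\left(-Z_{1-\alpha/2} - \delta\right) \\
  &\approx \Phi\!\left(-Z_{1-\alpha/2} + \delta\right).
\end{aligned}
% The second term is the chance of rejecting in the "wrong" tail;
% it is below \alpha/2 and is usually dropped.
```

Dropping the second term gives the one-term expression used in the rest of the lecture.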

Example: post-hoc power calculation
Group A gum data: $\mu$ = mean change in DMFS.
Hypothesis $H_0: \mu = 0$.
$n = 25$, $\bar{X} = -0.72$, and $s = 5.37$.
The t statistic is $T = -0.67$, with p-value = 0.51.
WE KNOW: our test does not furnish good evidence of a change in DMFS.

Example: post-hoc power calculation
WE DON'T KNOW: whether or not the DMFS truly changes.
The lack of evidence could be the result of either of two things: the mean change in DMFS is truly zero, or the test wasn't powerful enough to provide evidence of change.
We can get some information about the second possibility by computing a power estimate.

Example: post-hoc power calculation
We know: $n = 25$, $\alpha = 0.05$.
We will assume: the true average change in DMFS is 1 DMFS, and the true population standard deviation is $\sigma = 5.37$ (the sample value $s$).
Under these assumptions our power to reject $H_0$ would be

$$P\left(Z < -1.96 + \frac{1 - 0}{5.37/\sqrt{25}}\right) = P(Z < -1.03) = 1 - P(Z < 1.03) = 0.15$$

Example: post-hoc power calculation
If the true mean change is 1 DMFS, then the probability that our test would have rejected $H_0$ was only 15%.
It is very possible that our test would have failed to indicate a change in DMFS if the true change were 1 DMFS or less.
We cannot use this test result to conclude that there is no change in DMFS.

Example: post-hoc power calculation
Now let's assume: the true average change in DMFS is 4 DMFS.
If this were the case, then our power to reject $H_0$ would be

$$P\left(Z < -1.96 + \frac{4 - 0}{5.37/\sqrt{25}}\right) = P(Z < 1.76) = 0.96$$

We can conclude with reasonable certainty that the true change is not 4 DMFS or greater.
Note that this still does not imply that $H_0$ is true.
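The two post-hoc power figures above (true change of 1 DMFS and of 4 DMFS) can be reproduced with a few lines of Python. This is a minimal sketch of the normal-approximation formula from the lecture, using only the standard library; the function name `approx_power` and its parameter names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mu0, mu1, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample test.

    Implements P(Z < -z_{1-alpha/2} + |mu1 - mu0| / (sigma / sqrt(n))).
    The name and signature are illustrative, not from the lecture.
    """
    std_normal = NormalDist()  # Z ~ N(0, 1)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = abs(mu1 - mu0) / (sigma / sqrt(n))  # standardized effect
    return std_normal.cdf(-z_crit + shift)


# Gum data assumptions from the slides: n = 25, sigma = 5.37, H0: mu = 0.
power_1 = approx_power(mu0=0, mu1=1, sigma=5.37, n=25)
power_4 = approx_power(mu0=0, mu1=4, sigma=5.37, n=25)
print(f"power if true change is 1 DMFS: {power_1:.2f}")
print(f"power if true change is 4 DMFS: {power_4:.2f}")
```

The results should agree with the slide values of 0.15 and 0.96 up to rounding of the intermediate z-values.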

Non-significant results (fail to reject $H_0$)
Failing to reject $H_0$ should not be considered evidence that $H_0$ is true.
It could be the case that the failure to reject was the result of an under-powered test.
With an under-powered test, failing to reject tells you nothing.
If the test has high power to reject, then failure to reject is more interpretable.

Important point
A test with high power will:
- have a good chance of rejecting the null hypothesis when it is appropriate to do so;
- make it easier to interpret what the result tells you, should it fail to reject $H_0$.
Ensuring good power is an important step in the design of any study.

Factors that affect the power of a t test
Power is determined by

$$\frac{|\mu_1 - \mu_0|}{\sigma/\sqrt{n}} = \frac{|\mu_1 - \mu_0|\sqrt{n}}{\sigma}$$

Factors that affect the power of a t test
Power is greater for larger values of $\frac{|\mu_1 - \mu_0|}{\sigma/\sqrt{n}}$.
Power is greater when:
- $|\mu_1 - \mu_0|$ is greater (the effect is larger);
- $\sigma$ is smaller (the data are less variable);
- $n$ is greater (more subjects, more information).
When designing a study, the investigators have the most control over the number of subjects ($n$).

Sample size calculation
To have power $1-\beta$ for a test with significance level $\alpha$ to reject $H_0: \mu = \mu_0$, the sample size should be at least

$$n = \frac{\sigma^2 \left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{(\mu_1 - \mu_0)^2}$$

* In this formula $\beta$ is the probability of a type II error, so power is denoted by $1-\beta$.

Example: chewing gum data
Using the previous assumptions about the Group A chewing gum data, if the true mean change is 1 DMFS, then to have 80% power to yield good evidence of a change in DMFS, the sample size should be at least

$$n = \frac{5.37^2\,(1.960 + 0.842)^2}{(1 - 0)^2} = 226.4$$

So the study should enroll at least 227 children.

Example: chewing gum data
If the true mean change is 2 DMFS, then to have 80% power to yield good evidence of a change in DMFS, the sample size should be at least

$$n = \frac{5.37^2\,(1.960 + 0.842)^2}{(2 - 0)^2} = 56.6$$

So round up to 57 children.
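The two sample-size examples can be checked with a short script. This is a sketch of the lecture's large-sample formula, assuming the same inputs as the slides ($\sigma = 5.37$, $\alpha = 0.05$, 80% power); the function name `required_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n(mu0, mu1, sigma, power=0.80, alpha=0.05):
    """Approximate sample size for a two-sided one-sample test.

    n = sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / (mu1 - mu0)^2,
    where power = 1 - beta. Returns the unrounded value.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960
    z_beta = std_normal.inv_cdf(power)           # about 0.842 for 80% power
    return sigma**2 * (z_alpha + z_beta) ** 2 / (mu1 - mu0) ** 2


n_for_1 = required_n(mu0=0, mu1=1, sigma=5.37)
n_for_2 = required_n(mu0=0, mu1=2, sigma=5.37)

# Always round up: a fractional subject cannot be enrolled.
print(f"true change 1 DMFS: n = {n_for_1:.1f}, enroll {ceil(n_for_1)}")
print(f"true change 2 DMFS: n = {n_for_2:.1f}, enroll {ceil(n_for_2)}")
```

Because the script uses unrounded z-values, the first decimal can differ slightly from the hand calculation on the slides, but the rounded-up enrollment targets (227 and 57) are the same.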

Components of a sample size calculation
The desired power: $1-\beta$
The industry standard is a minimum of 80%.
Because of the potential for incorrect estimates of the various parameters in the calculations, investigators often try for 90% power to be conservative.
Some investigations have the goal of demonstrating evidence of equality (instead of difference). One method to do this is to specify tests with greater power (95%).

Components of a sample size calculation
Significance level: $\alpha$
Usual choices are $\alpha = 0.05$ or $\alpha = 0.01$.
Sometimes adjustments for multiple testing will lead to specifying other levels for $\alpha$.

Components of a sample size calculation
Population standard deviation: $\sigma$
The population standard deviation will not be known, and must be estimated from previous studies.
These estimates should be conservative (err on the high side).

Example: estimation of σ
In the gum data example we estimated $\sigma$ using $s$ from a sample of size $n = 25$.
The 95% confidence interval for $\sigma$ in this case would be (4.19, 7.47)*.
Thus, $\sigma = 5.37$ might well be an underestimate of the true population $\sigma$.
*See Rosner, section 6.7, for details of the calculation.

Example: estimation of σ
Say we assumed $\sigma = 5.37$, which by our previous calculation leads to a sample size of 227.
However, suppose that $\sigma$ really was 6.00. Then our true power would be only

$$P\left(Z < -1.96 + \frac{1 - 0}{6.00/\sqrt{227}}\right) = 0.71$$

To avoid low power, it is a good idea to assume a higher standard deviation than was observed in previous studies.
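The effect of underestimating $\sigma$ can be seen by evaluating the power formula at the planned sample size under both values. The sketch below assumes the slide's setup ($n = 227$, true change of 1 DMFS, $\alpha = 0.05$); the helper name `power_at` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_at(sigma, n, effect, alpha=0.05):
    """Normal-approximation power for a given sigma, n, and |mu1 - mu0|."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(-z_crit + effect / (sigma / sqrt(n)))


n_planned = 227  # sample size chosen assuming sigma = 5.37
power_planned = power_at(sigma=5.37, n=n_planned, effect=1.0)
power_actual = power_at(sigma=6.00, n=n_planned, effect=1.0)

print(f"power if sigma = 5.37 (as assumed): {power_planned:.2f}")
print(f"power if sigma = 6.00 (in truth):   {power_actual:.2f}")
```

A modest increase in $\sigma$ (from 5.37 to 6.00) moves the study from roughly the intended 80% power down to about 71%, which is the motivation for the conservative choice of $\sigma$.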

Example: estimation of σ
A reasonably conservative method is to use the upper 80% confidence limit for $\sigma$ as an estimate for $\sigma$, which is given by

$$\sqrt{\frac{(n-1)\,s^2}{\chi^2_{n-1,\,0.2}}}$$

In this case the value would be 6.19*.
*See Rosner, section 6.7, for details of the calculation.
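The confidence limits for $\sigma$ quoted in this example can be approximated without a statistics package. The sketch below is not the method in Rosner: it swaps in the Wilson-Hilferty approximation for the chi-square quantile so that only the Python standard library is needed. The function names are hypothetical, and an exact quantile routine (if one is available) would be preferable in practice.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Approximate the p-th quantile of a chi-square distribution.

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    an assumption of this sketch and is not exact.
    """
    z = NormalDist().inv_cdf(p)
    c = 2 / (9 * df)
    return df * (1 - c + z * sqrt(c)) ** 3


def sigma_limit(s, n, p):
    """sqrt((n - 1) * s^2 / chi2_{n-1, p}); a small p gives an upper limit."""
    df = n - 1
    return sqrt(df * s**2 / chi2_quantile_wh(p, df))


# Gum data: s = 5.37 from n = 25 children.
upper_80 = sigma_limit(5.37, 25, p=0.20)    # upper 80% confidence limit
ci95_low = sigma_limit(5.37, 25, p=0.975)   # lower end of the 95% interval
ci95_high = sigma_limit(5.37, 25, p=0.025)  # upper end of the 95% interval

print(f"upper 80% limit for sigma: {upper_80:.2f}")
print(f"95% CI for sigma: ({ci95_low:.2f}, {ci95_high:.2f})")
```

With these inputs the approximation lands within about 0.01 of the slide values (6.19 for the upper 80% limit, and 4.19 to 7.47 for the 95% interval).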

Components of a sample size calculation
Difference in the means: $\mu_0 - \mu_1$
By your choice of $\mu_1$ you design your study to be able to indicate a difference of size $\mu_1$.
Your study will not be able to dependably indicate a difference smaller than $\mu_1$.
Think of $\mu_1$ as a cutoff for the size of effect you would like to find, rather than as an estimate.
Ideally, one should specify $\mu_1$ to be the minimum clinically significant difference.

Example: choice of $\mu_1$
[Figure: graph displaying power by true $\mu$ when $n$ was chosen based on $\mu_1 = 2$.]
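The graph itself is not reproduced in this transcript, but a table of the same kind can be computed from the lecture's formula. The sketch below assumes the chewing gum setup ($\sigma = 5.37$, $\alpha = 0.05$, $H_0: \mu = 0$) and $n = 57$, the sample size obtained earlier for $\mu_1 = 2$; the grid of true means and the name `power_curve` are choices made here for illustration and may not match the original figure.

```python
from math import sqrt
from statistics import NormalDist


def power_curve(true_means, sigma, n, mu0=0.0, alpha=0.05):
    """Return {true mean: approximate power} using the normal approximation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)  # standard error of the sample mean
    return {mu: std_normal.cdf(-z_crit + abs(mu - mu0) / se) for mu in true_means}


# n = 57 was chosen for 80% power at mu1 = 2 (sigma = 5.37).
grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
curve = power_curve(grid, sigma=5.37, n=57)

for mu, power in curve.items():
    print(f"true mean change {mu:>3}: power = {power:.2f}")
```

Power is about 80% at the design value of 2 DMFS and drops off quickly for smaller true changes, which illustrates why the study cannot dependably indicate differences smaller than the chosen $\mu_1$.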

Notes on power and sample-size calcs
This lecture has focused on the one-sample t-test, but the general ideas apply to the various hypothesis tests that we will be covering.
The formulas presented in this lecture are approximations that work well for large studies (large sample sizes).
Programs are available for computing more exact estimates (next slide).

Online power calculators
Web-based calculator (Java): http://www.stat.uiowa.edu/~rlenth/power
Downloadable calculators:
- PS: http://biostat.mc.vanderbilt.edu/powersamplesize
- G-power: http://www.gpower.hhu.de/
Others: do a web search on "Power and Sample Size".