Power and sample size calculations

Similar documents
One-sample categorical data: approximate inference

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Multiple samples: Modeling and ANOVA

Two-sample inference: Continuous data

Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d

Two-sample inference: Continuous data

Hypothesis Testing and Confidence Intervals (Part 2): Cohen s d, Logic of Testing, and Confidence Intervals

Hypothesis testing. Data to decisions

The normal distribution

The Central Limit Theorem

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras. Lecture 11 t- Tests

Power Analysis. Introduction to Power

Lecture 21: October 19

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

Confidence Interval Estimation

Chapter 12: Estimation

Harvard University. Rigorous Research in Engineering Education

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:

Count data and the Poisson distribution

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Two-sample Categorical data: Testing

Dealing with the assumption of independence between samples - introducing the paired design.

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Descriptive statistics

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts

LECTURE 5. Introduction to Econometrics. Hypothesis testing

Two or more categorical predictors. 2.1 Two fixed effects

Chapter 5: Preferences

Liang Li, PhD. MD Anderson

Warm-Up Problem. Is the following true or false? 1/35

E509A: Principle of Biostatistics. GY Zou

Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

STAT 515 fa 2016 Lec Statistical inference - hypothesis testing

BIOS 312: Precision of Statistical Inference

Statistics for IT Managers

MATH 240. Chapter 8 Outlines of Hypothesis Tests

James H. Steiger. Department of Psychology and Human Development Vanderbilt University. Introduction Factors Influencing Power

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

Semiparametric Regression

Null Hypothesis Significance Testing p-values, significance level, power, t-tests Spring 2017

The normal distribution

POLI 443 Applied Political Research

DISTRIBUTIONS USED IN STATISTICAL WORK

Section 9.5. Testing the Difference Between Two Variances. Bluman, Chapter 9 1

Lectures 5 & 6: Hypothesis Testing

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Two-sample Categorical data: Testing

Section 9.1 (Part 2) (pp ) Type I and Type II Errors

CHAPTER 9: HYPOTHESIS TESTING

Statistical Inference

EC2001 Econometrics 1 Dr. Jose Olmo Room D309

Bootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location

Statistical inference

Accelerated Failure Time Models

Announcements. Unit 3: Foundations for inference Lecture 3: Decision errors, significance levels, sample size, and power.

Hypotheses Test Procedures. Is the claim wrong?

COMPARING GROUPS PART 1CONTINUOUS DATA

Review of basic probability and statistics

POLI 443 Applied Political Research

Two-sample inference: Continuous Data

Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations

Sample Size Estimation for Studies of High-Dimensional Data

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Power of a test. Hypothesis testing

Probability. We will now begin to explore issues of uncertainty and randomness and how they affect our view of nature.

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

Inference for Distributions Inference for the Mean of a Population

Mathematical statistics

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Two Sample Problems. Two sample problems

Wolf River. Lecture 15 - ANOVA. Exploratory analysis. Wolf River - Data. Sta102 / BME102. October 22, 2014

Uni- and Bivariate Power

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

Quantitative Analysis and Empirical Methods

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

Multiple Testing. Gary W. Oehlert. January 28, School of Statistics University of Minnesota

Relax and good luck! STP 231 Example EXAM #2. Instructor: Ela Jackiewicz

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

ST 732, Midterm Solutions Spring 2019

Other likelihoods. Patrick Breheny. April 25. Multinomial regression Robust regression Cox regression

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary

Wolf River. Lecture 15 - ANOVA. Exploratory analysis. Wolf River - Data. Sta102 / BME102. October 26, 2015

Partitioning the Parameter Space. Topic 18 Composite Hypotheses

The Chi-Square and F Distributions

Multivariate Statistical Analysis

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals

a Sample By:Dr.Hoseyn Falahzadeh 1

COGS 14B: INTRODUCTION TO STATISTICAL ANALYSIS

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Power Analysis. Ben Kite KU CRMDA 2015 Summer Methodology Institute

Power and the computation of sample size

Transcription:

Patrick Breheny October 20 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 26

Planning a study Introduction What is power? Why is it important? Setup One of the most important questions as far as planning and budgeting a study is concerned is: how many subjects do I need? The number of subjects tends to play a very large role in determining the cost of a study, so funding agencies generally want to know the number of subjects that a study will require before they make a decision about whether or not to pay for it But of course, the fewer subjects you have, the harder it is to distinguish a real phenomenon from chance Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 2 / 26

Power Introduction What is power? Why is it important? Setup The probability that you will successfully distinguish the real phenomenon from chance (i.e., reject the null hypothesis) is captured in the notion of power Power is the probability of rejecting the null hypothesis given that it is in reality false Note that this is the complement of the type II error rate (β), which was defined as the probability of failing to reject the null hypothesis given that it was false: Power = 1 β Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 3 / 26

Two important questions What is power? Why is it important? Setup There are two important, highly related questions here: If I plan a study with a certain number of subjects, what is my power going to be? If I want to achieve a certain power, how large does my sample size need to be? Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 4 / 26

Why is power important? What is power? Why is it important? Setup Over the past 40 years, power calculations have played a very important role in advancing scientific rigor, particularly in medical studies Decades ago, it was common to carry out studies with small sample sizes and obtain non-significant results This is problematic for two important reasons: People are very bad at interpreting non-significant results There is often a publication bias against non-significant results, skewing the results that appear in print The goal of power calculations is to plan a study with a sufficiently large sample size to avoid these problems Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 5 / 26

The null and true distributions What is power? Why is it important? Setup In order to calculate power, we will need to keep track of two sampling distributions for our statistic of interest: Its distribution under the null hypothesis Its actual distribution (note that this is different from the null distribution; otherwise we d be calculating a Type I error rate) The power of any test depends on these distributions, which in turn depend on a number of factors, such as the size of the effect (the signal) and on the variability/standard error (the noise) In general, these are unknown parameters and we typically know very little about them especially before we have conducted the study Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 6 / 26

Specifying unknown parameters What is power? Why is it important? Setup So, in order to actually calculate power, we must make educated guesses about realistic/biologically plausible values for these quantities, and our calculated power will depend on the values that we choose Of course, if we specify values that are far away from reality, our power calculations are not going to be accurate Sometimes, reasonable values for certain quantities can be chosen on the basis of past studies or observations Other times, a small initial study called a pilot study is conducted in order to provide some data with which to estimate these quantities and help plan for a larger study that would take place in the future. Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 7 / 26

Outline Introduction What is power? Why is it important? Setup Depending on the complexity of the problem, there are three basic approaches one can take in calculating power: In simple settings (such as for t-tests), exact solutions are possible In moderately complex settings, it is often possible to (conservatively) approximate the power by considering a related simpler problem for which an exact solution is available In more complex settings, simulations are required Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 8 / 26

Hypothetical example Suppose we develop some intervention that may reduce LDL cholesterol levels Suppose we plan to conduct a study in which n individuals are randomized to receive this intervention and n individuals are randomized to receive a placebo, and we are going to look at the difference in mean LDL cholesterol levels at the end of the study between the two groups We propose to analyze this data using a two-sample t-test; to begin, however, let s assume that we know the true standard deviation σ Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 9 / 26

Example (cont d) Introduction The power of our study depends on four factors, most of which are outside our control: The sample size, n The type I error rate we are willing to live with, α How variable LDL cholesterol levels are in the population we are studying: σ, the variability The amount by which our intervention actually reduces cholesterol: µ 1 µ 2, the effect size For our calculations, we will assume a variability of σ = 36 mg/dl, an effect size of µ 1 µ 2 = 5 mg/dl, a type I error rate of 5%, and start out with n = 100 in each group Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 10 / 26

Picture of sampling distributions Null True 3 2 1 0 1 2 3 4 (x 1 x 2 ) σ 1 n 1 + 1 n 2 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 11 / 26

Critical values Introduction The key to calculating the power of a test is to determine its critical value the value that the test statistic will have to lie outside in order to reject the test With this in mind, a power calculation is essentially a two-step procedure: First, calculate the critical value based on the null distribution Second, calculate the probability of lying outside the critical value based on the hypothesized true distribution Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 12 / 26

Calculating power for our cholesterol study In our hypothetical cholesterol study, the critical value (on the standardized scale) is 1.96 On that scale, our true distribution has mean Thus, our power is µ 1 µ 2 σ 1/n + 1/n = 5 36 2/100 = 0.982 1 Φ(1.96 0.982) = 0.164; with n = 100, our study is likely to be non-significant and lead to the problems mentioned at the outset of this lecture Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 13 / 26

Sample size calculation How large must n be in order to increase our power to, say, 80%? Let s let z 1 α/2 and z 1 β denote the 1 α/2 and 1 β quantiles of the standard normal distribution; we can easily solve the equation on the previous slide for n to obtain n = 2 Thus, in our example, we need people in each group ( ) z1 α/2 + z 2 1 β (µ 1 µ 2 )/σ ( ) 1.96 + 0.84 2 n = 2 = 813.8 5/36 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 14 / 26

Accounting for uncertainty concerning σ The preceding calculations were somewhat unrealistic in that we assumed we knew σ We can carry out more accurate calculations by basing everything on a t distribution instead of a normal distribution conceptually this is a simple change, but the calculations get quite a bit messier For n = 100 in each group, the critical value is t = 1.972, but we must base power now off of the noncentral t-distribution: 1 F (1.972 198, 0.982) = 0.163, where F (x ν, µ) is the CDF of the noncentral t distribution with noncentrality parameter µ and ν degrees of freedom As we would expect, this is slightly lower than the z-calculations, although the difference is negligible since n is reasonably large Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 15 / 26

Sample size calculation: t Also, the complicated dependence of t quantiles on n prevents us from using a simple formula as we had in the z case Instead, we need to find the value n that satisfies t 2n 2,1 α/2 = t 2n 2,β,µ, where t ν,p,µ is the pth quantile of the noncentral t distribution with noncentrality parameter µ and ν degrees of freedom This, in turn, is a root-finding problem which requires a computer to solve; using one, we find that we need n = 814.7 in each group Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 16 / 26

Power curve: n = 100, σ = 36 It is often instructive to look at power curves, which plot power as a function of various parameters 1.0 0.8 Power 0.6 0.4 0.2 0.0 20 10 0 10 20 Effect size Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 17 / 26

Power curve: n = 100, µ 1 µ 2 = 5 1.0 0.8 Power 0.6 0.4 0.2 0.0 10 20 30 40 50 60 SD Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 18 / 26

Power curve: µ 1 µ 2 = 5, σ = 36 1.0 0.8 Power 0.6 0.4 0.2 0.0 0 200 400 600 800 1000 Sample size Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 19 / 26

A more complicated version of the cholesterol study In practice, our study might be more complicated than what we have described for the cholesterol example For example, rather than simply measuring cholesterol at the end of the study, the researchers might measure cholesterol repeatedly over the course of the study and analyze the trajectories of patients over their time on the study This would be a considerably more complicated analysis and difficult to carry out an exact power calculation Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 20 / 26

Using simpler studies to obtain a bound However, one can approximate (or perhaps more accurately, bound) the power of this more complicated study using the simpler study A more sophisticated analysis that takes into account the trajectories and repeated measurements on subjects should be more powerful than the simple endpoint analysis; therefore, if n = 815 yielded 80% power for the simple analysis, it should yield > 80% for a more powerful analysis Obviously, if the complex analysis is dramatically more powerful than the simple analysis, this approach will be rather conservative Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 21 / 26

Simulating power: Growth curve example If one wants to avoid this conservatism, or is in a situation so complex that no simple version seems sufficient, one can use simulations to calculate the power For example, a few years ago I performed a power calculation for a researcher at the UI cancer center who was investigating a treatment expected to slow (on average) the growth rate of a tumor by a factor of 2 The proposal was to induce tumor growth in a sample of rats, then split them into a treatment and control group We would then analyze the effect of the treatment by measuring the growth of the tumor over time, fitting growth curves for each rat, and testing whether the average growth rate in the two groups was equal Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 22 / 26

Assumed distributions Control Treatment 0 20 40 60 80 Time to 1000 mm 3 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 23 / 26

Simulated power Introduction 1.0 0.8 Power 0.6 0.4 0.2 0.0 3 4 5 6 7 8 9 10 Sample size (per group) We then proposed n = 8 rats per group in the grant Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 24 / 26

Sample size determination in practice Statistical calculations are essential to ensure a meaningful study adequately powered to answer the question of interest In reality, of course, lots of other things like money, time, resources, availability of subjects, etc., also influence the actual sample size of a study Finally, we may be interested in calculating the required sample size under a few different designs to see which way is the easiest/cheapest to conduct the study Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 25 / 26

Introduction The power of a study depends on: Sample size Variability Effect size Power calculations are a two-step process involving both the null and assumed true distributions Know how to calculate power and sample size for one- and two-sample z tests (and know how to use a computer to calculate the power of a t-test) Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 26 / 26