Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Similar documents
PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

ANOVA: Comparing More Than Two Means

The Design of a Survival Study

Power and Sample Sizes in RCTs Cape Town 2007

A3. Statistical Inference Hypothesis Testing for General Population Parameters

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Difference in two or more average scores in different groups

First we look at some terms to be used in this section.

Population Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8

Statistical Testing I. De gustibus non est disputandum

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING: SINGLE MEAN, NORMAL DISTRIBUTION (Z-TEST)

Bios 6648: Design & conduct of clinical research

Sample Size Determination

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Review of Statistics 101

Sociology 6Z03 Review II

Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

a Sample By:Dr.Hoseyn Falahzadeh 1

MTMS Mathematical Statistics

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

Section 6.2 Hypothesis Testing

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

Study Ch. 9.3, #47 53 (45 51), 55 61, (55 59)

Overrunning in Clinical Trials: a Methodological Review

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

Valida&on of Predic&ve Classifiers

Sample Size Estimation for Studies of High-Dimensional Data

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Power and Sample Size Bios 662

Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series

Multiple Testing in Group Sequential Clinical Trials

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

6 Sample Size Calculations

Classroom Activity 7 Math 113 Name : 10 pts Intro to Applied Stats

Power Analysis. Introduction to Power

Measures of Association and Variance Estimation

6 Single Sample Methods for a Location Parameter

What p values really mean (and why I should care) Francis C. Dane, PhD

Pubh 8482: Sequential Analysis

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Finansiell Statistik, GN, 15 hp, VT2008 Lecture 10-11: Statistical Inference: Hypothesis Testing

Class 19. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Superiority by a Margin Tests for One Proportion

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Econometrics. 4) Statistical inference

Study Design: Sample Size Calculation & Power Analysis

Sample Size Calculations

Group sequential designs for Clinical Trials with multiple treatment arms

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

LECTURE 5. Introduction to Econometrics. Hypothesis testing

The problem of base rates

Introduction to the Analysis of Variance (ANOVA)

Correlation and Simple Linear Regression

Non-Inferiority Tests for the Ratio of Two Proportions in a Cluster- Randomized Design

Testing for IID Noise/White Noise: I

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

Introduction to Statistical Hypothesis Testing

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

On Assumptions. On Assumptions

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

+ Specify 1 tail / 2 tail

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Visual interpretation with normal approximation

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

Two-sample Categorical data: Testing

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs)

Statistical Inference. Hypothesis Testing

Hypothesis Testing. Testing Hypotheses MIT Dr. Kempthorne. Spring MIT Testing Hypotheses

Probability and Probability Distributions. Dr. Mohammed Alahmed

Package BayesNI. February 19, 2015

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts

Chapter 12: Estimation

Power and sample size calculations

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

Tests about a population mean

Introductory Econometrics. Review of statistics (Part II: Inference)

hypothesis testing 1

Group Sequential Designs: Theory, Computation and Optimisation

Instrumental variables estimation in the Cox Proportional Hazard regression model

Tests for the Odds Ratio of Two Proportions in a 2x2 Cross-Over Design

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

Power and sample size calculations

BIOS 312: Precision of Statistical Inference

S D / n t n 1 The paediatrician observes 3 =

Sleep data, two drugs Ch13.xls

PASS Sample Size Software. Poisson Regression

Review. December 4 th, Review

Chapter Seven: Multi-Sample Methods 1/52

280 CHAPTER 9 TESTS OF HYPOTHESES FOR A SINGLE SAMPLE Tests of Statistical Hypotheses

Transcription:

Sample Size and Power I: Binary Outcomes James Ware, PhD Harvard School of Public Health Boston, MA

Sample Size and Power Principles: Sample size calculations are an essential part of study design Consider sample size requirements early A well-designed trial is large enough to detect clinically important differences between groups with high probability To perform sample size calculations, we need well defined study endpoints, hypotheses, and statistical tests.

Specify the Null Hypothesis Study hypotheses should be based on a clearly defined endpoint and period of study. In most RCTs, known as superiority trials, the study hypothesis is stated as a null hypothesis of no difference in the distribution of the primary endpoint between study groups. In the CORONARY Trial, the short-term null hypothesis was H 0 : Patients receiving on-pump and off-pump coronary artery bypass surgery will have identical event rates at 30 days post-randomization

Specify the Alternative Hypothesis We have an alternative hypothesis in mind, for example, H A : The frequency of events at 30 days will differ in the two treatment groups. In superiority trials, we test the null hypothesis against a two-sided alternative. We have a directional alternative hypothesis in mind, for example, that fewer events will occur within 30 days in the off-pump group.

Outcomes of Hypothesis Testing When we test the null hypothesis, there are two possible states of nature and two decisions: Truth About Risk Difference Test Result H 0 True H a True Reject H 0 Do Not Rej H 0 Type I Error No Error No Error Type II Error

Power We will perform a test that has a small probability of a Type 1 error, usually 0.05. The power of the study is the probability that we will reject the null hypothesis when the alternative hypothesis is actually true. We would like this probability to be large, typically at least 0.8.

Express the Hypotheses in Terms of Probabilities The 30-day outcome is a binary event, occurrence or nonoccurrence of death or complications within 30 days of surgery. p T = Probability that an off-pump patient will have an event p C = Probability that an on-pump patient will have an event The study hypotheses are: H 0 : p T = p C (T and C are equally effective) H A : p T p C (T and C are not equally effective) We specify the direction of the alternative for the sample size calculation.

The Test Statistic To test the null hypothesis, we calculate a test statistic, T, and a critical value, C, and reject the null hypothesis if T > C, that is, if T > C or T < -C. To calculate power or sample size, we will focus on significance in one direction, T < -C, implying that p T < p C. For the CORONARY Trial, define T as the difference between the observed proportions divided by the standard error of the difference.

The Test Statistic The observed difference in proportions is D = p T p C Assuming equal sample sizes in the two groups, Var(D) = p T *(1-p T )/n + p C *(1-p C )/n Under the null hypothesis, p T = p C. Define the test statistic as D divided by its standard deviation T = p T p C 2p (1 p )/n = D SD(D) where p is the average event rate

Choosing C Choose C so that P(T < -C H 0 ) = /2. Usually, = 0.05 (two-sided) so /2 = 0.025. T is approximately N(0,1) if H 0 is true. Hence, if /2 = 0.025, C = 1.96. Power is P(T < -C H A ) = 1 P(Type 2 error) = 1 - β. The investigator can control the power by choosing the sample size

Null and Alternative Hypothesis In the CORONARY Trial, one possible scenario for the 30-day endpoint was p C = 0.08, and, under the alternative hypothesis p T = (0.85)*0.08 = 0.068 a 15% reduction in the event rate in the off-pump group. Under H A, the expected value of D would be Δ = 0.068 0.08 = -0.012 If H 0 is true, var(d) = 2*0.08*0.92/n

Logic of Sample Size Calculations Again consider the risk difference, D = p T p C D is approximately normally distributed with mean = 0 if H 0 is true and mean = -.012 if H A is true Var(D) = p T *(1-p T )/n T + p C *(1-p C )/n C The mean is independent of n but the variance decreases as n increases

PDF* The Logic of Hypothesis Testing H A H 0 100 80 60 40 20 -C The distribution of D under the null and alternative hypotheses 0 *pdf: probability density function D 0.0

PDF Type 1 Error H A H 0 100 80 60 40 /2 20 0 D

PDF Power H A H 0 100 80 60 1-β 40 20 0-0.084-0.042 0.0 0.042 0.084 D

Var(D) Depends on Sample Size n = 500 Var = 0.00029 SD = 0.017 n = 1,000 Var = 0.00015 SD = 0.012 n = 2,000 Var = 0.000074 SD = 0.0086

PDF n= 500 100 80 60 40 H A H 0 20 0 -.034 0.034 D

PDF n=1,000 100 80 60 40 H A H 0 20 0 -.024 0.0.024 D

PDF n=2,000 100 H A H 0 80 60 40 20 0 -.017 0.017 D

True Difference (Δ) For a fixed sample size, the power of the study will increase with the size of the true difference

Sample Size Formula To achieve the desired Type 1 and Type 2 error, we need to satisfy two conditions -Z α/2 *SD(D) = -C and Δ + Z β *SD(D) = -C Recall that we estimate the variance of D by 2p (1 p )/n To determine n, set -Z α/2 *SD(D) = Δ + Z β *SD(D) and solve for n

PDF n=1000 100 H A H 0 80 60 40 -Z /2 SD(D) = -C 20 0 Δ 0 D

PDF n=1000 100 H A H 0 80 60 Δ + Z β SD(D) = -C 40 20 0 Δ 0 D

The Sample Size Formula n = 2p (1 p ) Z α/2 + Z β Δ 2 Z α/2 and Z β are the critical values of the normal distribution, p is the average of the event rates under the alternative hypothesis, and Δ is the true difference under H A. For the CORONARY Trial, with α = 0.05, β = 0.20, n = 1,903 or 2n = 3,806. (Note: If p c = 0.08, and p t = 0.068 if H A is true, the correct value for the sample size is n = 7,462.) The CORONARY investigators considered a range of scenarios and settled on a total sample size of 4,700 patients. 2