Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30-1:30 pm (NDT)

Get started now: Please check that your speakers are working and mute your audio. Please use the chat box to ask questions.

How to participate in this webinar. Any issues? Technical Support, locally: 709-864-8700, or go to https://www.citl.mun.ca/support/

Introduction. Speaker: Hensley H Mariathas, Ph.D., Biostatistics Lead, NL SUPPORT

Webinar #4-2018 Biostatistics: sample size & power

Sample Size Calculation and Power Analysis. Lead Biostatistician, NL SUPPORT, Faculty of Medicine, Memorial University of Newfoundland, Canada. April 26, 2018

Why and What
Why sample size calculation?
It is part of study design: it determines the number of participants needed to detect a clinically relevant treatment effect.
The number of patients in a study is restricted by ethical, cost, and time considerations.
What do we need?
Prior estimates of the study results, which may use data from earlier studies.
Investigator judgment and choice.

Key Things to Decide
Things you have to decide first:
Type of question: estimation, comparison, etc.
Study design: trial, cluster, case-control, etc.
Type of outcome measure: data level.
Likely analytic method.
Hypothesis test versus confidence interval approach.
Things you may have to decide next:
Desired maximal type I and type II error rates, or desired percentile and width of the CI.
Likely outcome in controls.
Effect size you wish to detect (e.g. minimal difference in group outcome).
Measure of variation expected (e.g. group SD for the outcome).

Research Questions
Example 1: Estimating a population proportion. We want to estimate the immunization coverage in a community of school children to within 4% of the true value.
Example 2: Trial to reduce blood pressure. Suppose a researcher wants to compare two treatments designed to reduce blood pressure. The researcher decides that a clinically important difference would be 10 mm Hg.

Two approaches to sample size calculations
Precision-based: With what precision do you want to estimate the value of one or more characteristics of the population, called parameters (for example, mean hemoglobin level or prevalence of asthma)? The aim is to estimate the parameter under study with a prefixed precision and level of confidence.
Power-based: How small a difference is it important to detect, and with what degree of certainty? The aim is to achieve a desired power for detecting a clinically or scientifically meaningful difference at a prefixed level of significance.

Precision-based sample size calculation
Suppose you want to be able to estimate your unknown parameter with a certain degree of precision, that is, with a 100(1 − α)% confidence interval of a certain width. The narrower the interval, the more precise the inference.
Consider the maximum half-width of the 100(1 − α)% confidence interval of the unknown parameter, usually referred to as the maximum error or margin of error.
In general, a 100(1 − α)% confidence interval of the unknown parameter is given by
$$\text{Estimate} \pm z_{\alpha/2}\,\mathrm{SE}, \qquad \mathrm{SE} = \frac{\sigma}{\sqrt{n}},$$
where $z_{\alpha/2}$ is the upper (α/2)th quantile of the standard normal distribution and σ is the population SD, which is unknown.
The required sample size can be chosen as
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}, \qquad E = \text{maximum error}.$$

Precision-based sample size calculation: one sample
Sample size for estimating a population mean:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \tag{1}$$
Sample size for estimating a population proportion, with $q = 1 - p$:
$$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2 = pq\left(\frac{z_{\alpha/2}}{E}\right)^2 \tag{2}$$
The values of σ and p can be taken from similar published studies or from a pilot study.
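As a rough illustration, equations (1) and (2) can be evaluated as follows. This is a minimal sketch: the function names are made up for this example, results are rounded up to whole participants, and p = 0.5 is an assumed worst case because the immunization example does not state an expected coverage.

```python
from math import ceil
from statistics import NormalDist


def z_upper(tail: float) -> float:
    """Upper-tail standard normal quantile, e.g. z_upper(0.025) is about 1.96."""
    return NormalDist().inv_cdf(1 - tail)


def n_for_mean(sigma: float, max_error: float, alpha: float = 0.05) -> int:
    """Equation (1): n = (z_{alpha/2} * sigma / E)^2, rounded up."""
    return ceil((z_upper(alpha / 2) * sigma / max_error) ** 2)


def n_for_proportion(p: float, max_error: float, alpha: float = 0.05) -> int:
    """Equation (2): n = p(1 - p) * (z_{alpha/2} / E)^2, rounded up."""
    return ceil(p * (1 - p) * (z_upper(alpha / 2) / max_error) ** 2)


# Immunization coverage to within 4 percentage points (assumed p = 0.5).
print(n_for_proportion(p=0.5, max_error=0.04))
# Hypothetical mean: assumed SD of 20 units, margin of error of 5 units.
print(n_for_mean(sigma=20, max_error=5))
```

Using p = 0.5 maximizes p(1 − p), so it gives the most conservative sample size when no prior estimate of the proportion is available.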

Precision-based sample size calculation: two samples
Sample size for estimating a mean difference (2 unrelated groups):
$$n_1 = n_2 = n\ (\text{per group}) = \frac{z_{\alpha/2}^2}{E^2}\left(\sigma_1^2 + \sigma_2^2\right) \tag{3}$$
Sample size for estimating a difference in proportions:
$$n_1 = n_2 = n\ (\text{per group}) = \frac{z_{\alpha/2}^2}{E^2}\bigl(p_1(1-p_1) + p_2(1-p_2)\bigr) \tag{4}$$
Here, for i = 1, 2, $\sigma_i$ is the standard deviation expected in group i, $p_i$ is the expected proportion with events in group i, and E is half the desired width of the CI.
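Equations (3) and (4) translate directly into code. The sketch below uses hypothetical function names and illustrative inputs that are not taken from the slides; both functions round up to whole participants per group.

```python
from math import ceil
from statistics import NormalDist


def n_diff_means(sigma1: float, sigma2: float, max_error: float,
                 alpha: float = 0.05) -> int:
    """Equation (3): per-group n for estimating a difference in means."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / max_error ** 2)


def n_diff_proportions(p1: float, p2: float, max_error: float,
                       alpha: float = 0.05) -> int:
    """Equation (4): per-group n for estimating a difference in proportions."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / max_error ** 2)


# Illustrative inputs only: SD of 20 in both groups, half-width 5.
print(n_diff_means(sigma1=20, sigma2=20, max_error=5))
# Illustrative inputs only: proportions 0.6 and 0.8, half-width 0.10.
print(n_diff_proportions(p1=0.6, p2=0.8, max_error=0.10))
```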

Practical Problem: Example 1
Suppose you wish to carry out a trial of a new treatment for hypertension (high blood pressure) among men aged between 50 and 60. You would like your 95% confidence interval to have width 10 mm Hg (i.e. you want to be 95% sure that the true difference in means is within ±5 mm Hg of your estimated difference in means). How many subjects will you need to include in your study?

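Solution for the practical problem
We know that the 95% confidence interval for a difference in means is given (using 2 as a round approximation to 1.96) by
$$(\bar{x}_1 - \bar{x}_2) \pm 2\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{5}$$
Hence we want $2\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ to be equal to 5. Since we are aiming for groups of the same size, $n_1 = n_2 = n$, and assuming equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$,
$$2\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 2\sigma\sqrt{\frac{2}{n}} = 5 \quad\Longrightarrow\quad \sigma\sqrt{\frac{2}{n}} = 2.5.$$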

Solution for the practical problem, continued
We need to know what σ is likely to be. This can come from (a) previous experience (i.e. knowledge of the distribution of systolic blood pressure among men with hypertension in this age group), (b) other published papers on blood pressure studies in a similar group of people, or (c) a pilot study.
From option (b), assume σ = 20 mm Hg. This gives
$$20\sqrt{\frac{2}{n}} = 2.5 \quad\Longrightarrow\quad n = 2\left(\frac{20}{2.5}\right)^2 = 128 \text{ in each group}.$$
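The arithmetic above can be checked with a few lines of code. This sketch solves multiplier × σ × √(2/n) = half-width for n. The function name is hypothetical, and the second call swaps in 1.96 for the rounded multiplier of 2 to show how sensitive the answer is to that rounding.

```python
from math import ceil


def n_per_group(sigma: float, half_width: float, multiplier: float) -> float:
    """Solve multiplier * sigma * sqrt(2 / n) = half_width for n (unrounded)."""
    return 2 * (multiplier * sigma / half_width) ** 2


# Slide values: sigma = 20 mm Hg, half-width 5 mm Hg, multiplier 2.
print(n_per_group(sigma=20, half_width=5, multiplier=2.0))  # 128.0
# Same inputs with the multiplier 1.96, rounded up to whole subjects.
print(ceil(n_per_group(sigma=20, half_width=5, multiplier=1.96)))
```

With the multiplier 1.96 the requirement drops by a few subjects per group, so the rounded multiplier of 2 is slightly conservative.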

Power-based sample size calculation
Relates to hypothesis testing. There are four possible outcomes of a test of hypothesis:

                       True state of nature
Clinical decision      H0 true              Ha true
Do not reject H0       Correct decision     Type II error (β)
Reject H0              Type I error (α)     Correct decision

Type I error (false positive): rejecting H0 when H0 is true.
α: type I error rate, the maximum p-value considered statistically significant.
Type II error (false negative): failing to reject H0 when H0 is false.
β: type II error rate.
Power: the statistical power of a test is defined to be 1 − β.

Power-based sample size calculation: power analysis
A type I error is usually considered the more important or serious error, the one we most want to avoid.
So we control α at an acceptable level and try to minimize β by choosing an appropriate sample size.
Key steps of power analysis:
Select the significance level (type I error rate) you are willing to tolerate, typically α = 0.05.
Choose the power, typically 80% or 90%.
Specify a clinically meaningful difference (Δ).
Knowledge of the standard deviation σ of the primary endpoint is also required for sample size determination.

Comparing Means: One-Sample Design
The sample size needed to achieve power 1 − β can be obtained from
$$n = \frac{(z_{\alpha/2} + z_\beta)^2\,\sigma^2}{\Delta^2} \tag{6}$$
where
$z_p$ is the upper pth quantile of the standard normal distribution;
$\Delta = \mu - \mu_0$ is the difference between the true mean (μ) and a reference value (μ0);
$\sigma^2$ is the population variance, which needs to be replaced by a value from prior knowledge.

Comparing Means: Two-Sample Design, two independent groups
The formula for estimating the sample size in each study group, $n_1 = n_2 = n$, is
$$n = \frac{2(z_{\alpha/2} + z_\beta)^2\,\sigma^2}{\Delta^2} \quad\text{or}\quad n = \frac{2(z_{\alpha/2} + z_\beta)^2}{\delta^2} \tag{7}$$
where
n is the minimum sample size required in each group (equal group sizes, so the total size is 2n);
$\Delta = \mu_1 - \mu_2$ is the chosen difference in means to be detected;
$\sigma^2$ is the anticipated endpoint group variance (you may assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$);
$\delta = \Delta/\sigma$ is the effect size, the minimum difference we wish to detect relative to the endpoint group standard deviation.
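Equations (6) and (7) can be sketched in a few lines. The function names are invented for this illustration. The inputs combine the 10 mm Hg clinically important difference from the blood pressure research question with the σ = 20 mm Hg assumed in the practical problem, and 80% power is an assumed choice.

```python
from math import ceil
from statistics import NormalDist


def z_sum_squared(alpha: float, power: float) -> float:
    """(z_{alpha/2} + z_beta)^2 for a two-sided test, where beta = 1 - power."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def n_one_sample(sigma: float, delta: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Equation (6): sample size for a one-sample comparison of a mean."""
    return ceil(z_sum_squared(alpha, power) * sigma ** 2 / delta ** 2)


def n_two_sample(sigma: float, delta: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Equation (7): per-group sample size for two independent groups."""
    return ceil(2 * z_sum_squared(alpha, power) * sigma ** 2 / delta ** 2)


print(n_one_sample(sigma=20, delta=10))  # one group against a reference value
print(n_two_sample(sigma=20, delta=10))  # per group, two independent groups
```

Because only the ratio Δ/σ enters the formula, halving the detectable difference quadruples the required sample size.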

Comparing Means: Two-Sample Design, two dependent groups (paired)
Paired observations cannot be assumed to be independent of each other; the statistical test and the sample size estimation have to take this dependency into account.
Working with the within-pair differences, we can use the one-sample formula (6):
$$n = \frac{(z_{\alpha/2} + z_\beta)^2\,\sigma_d^2}{\Delta_d^2} \tag{8}$$
where $\sigma_d$ is the unknown population SD of the differences between pairs of observations and $\Delta_d$ is the mean paired difference to be detected.
Moreover, $\sigma_d = \sigma\sqrt{2(1 - r)}$, where σ is the SD of the observations at one time point and r is the correlation between the repeated measurements within subjects.
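A small sketch of equation (8), with σ_d derived from σ and the within-subject correlation r. The function name and the example values (σ = 20, Δ_d = 10, r = 0.5 or 0.8) are illustrative assumptions rather than figures from the slides.

```python
from math import ceil
from statistics import NormalDist


def n_paired(sigma: float, r: float, delta_d: float,
             alpha: float = 0.05, power: float = 0.80) -> int:
    """Equation (8) with sigma_d^2 = 2 * sigma^2 * (1 - r); returns number of pairs."""
    nd = NormalDist()
    z_sum_sq = (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2
    sigma_d_sq = 2 * sigma ** 2 * (1 - r)
    return ceil(z_sum_sq * sigma_d_sq / delta_d ** 2)


# Stronger within-subject correlation means fewer pairs are needed.
print(n_paired(sigma=20, r=0.5, delta_d=10))
print(n_paired(sigma=20, r=0.8, delta_d=10))
```

When r = 0.5, σ_d equals σ, so the number of pairs matches the one-sample result from equation (6).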

If the two groups are of unequal size
This adjustment applies even if you are not using a t-test.
Let n be the number per group needed if the groups were equal; for means, from (7),
$$n = \frac{2(z_{\alpha/2} + z_\beta)^2\,\sigma^2}{\Delta^2}.$$
Let $n_1$ be the size of the first group (say, the standard treatment group) and $k n_1$ the size of the second group (say, the new treatment group). Then
$$n_1 = \frac{(k + 1)\,n}{2k}.$$
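The allocation adjustment can be sketched as below. The function name is hypothetical, rounding each group up to a whole participant is a choice made for this example, and n = 63 is only an illustrative equal-allocation size.

```python
from math import ceil


def unequal_group_sizes(n_equal: float, k: float) -> tuple[int, int]:
    """Return (n1, n2) with n1 = (k + 1) * n / (2k) and n2 = k * n1, rounded up."""
    n1 = ceil((k + 1) * n_equal / (2 * k))
    n2 = ceil(k * n1)
    return n1, n2


# Hypothetical case: 63 per group under equal allocation, 2:1 allocation wanted.
print(unequal_group_sizes(n_equal=63, k=2))
# k = 1 recovers the equal-allocation design.
print(unequal_group_sizes(n_equal=63, k=1))
```

The total under unequal allocation is larger than 2n, which is the price paid for moving away from equal group sizes.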

Comparison of Proportions between Two Groups: two independent groups, binary outcome
The number of participants per group required to detect a difference $p_1 - p_2$ in the proportions with significance level α and power 1 − β is given by
$$n = \frac{(z_{\alpha/2} + z_\beta)^2\,\bigl(p_1(1-p_1) + p_2(1-p_2)\bigr)}{(p_1 - p_2)^2} \tag{9}$$
where $p_1$ is the expected proportion in Group 1 and $p_2$ is the expected proportion in Group 2.

Comparison of Proportions between Two Groups: two independent groups, odds ratio
If the difference is specified as an odds ratio
$$\mathrm{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{p_1(1-p_2)}{p_2(1-p_1)},$$
then an approximate formula is given by
$$n = \frac{2(z_{\alpha/2} + z_\beta)^2}{[\ln(\mathrm{OR})]^2\,\bar{p}(1-\bar{p})} \tag{10}$$
where $\bar{p} = (p_1 + p_2)/2$ and OR is the odds ratio to be detected.

Comparison of Proportions between Two Groups: case-control study, odds ratio
In a case-control study, data are usually summarized as an odds ratio, rather than as a difference between two proportions, when the outcome variables of interest are categorical.
If $p_1$ and $p_2$ are the proportions of cases and controls, respectively, exposed to a risk factor, then
$$\mathrm{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{p_1(1-p_2)}{p_2(1-p_1)}.$$
If we know the prevalence of exposure in the general population, p, the total sample size N for estimating an OR is
$$N = \frac{(1 + k)^2}{k}\,\frac{(z_{\alpha/2} + z_\beta)^2}{[\ln(\mathrm{OR})]^2\,p(1-p)}, \qquad k = \frac{n_1}{n_2} \tag{11}$$
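The two odds-ratio formulas, (10) and (11), can be sketched as follows. Function names are hypothetical and the results are left unrounded so the caller can decide how to round. The second call uses the inputs that appear later in Problem 2 (OR = 3, exposure prevalence 0.6, k = 1).

```python
from math import log
from statistics import NormalDist


def _z_sum_squared(alpha: float, power: float) -> float:
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def n_per_group_odds_ratio(p1: float, p2: float,
                           alpha: float = 0.05, power: float = 0.80) -> float:
    """Equation (10), unrounded; the OR is derived from p1 and p2."""
    odds_ratio = (p1 * (1 - p2)) / (p2 * (1 - p1))
    p_bar = (p1 + p2) / 2
    return 2 * _z_sum_squared(alpha, power) / (
        log(odds_ratio) ** 2 * p_bar * (1 - p_bar))


def total_n_case_control(odds_ratio: float, p_exposed: float, k: float = 1.0,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Equation (11), unrounded total N with allocation ratio k = n1 / n2."""
    return ((1 + k) ** 2 / k) * _z_sum_squared(alpha, power) / (
        log(odds_ratio) ** 2 * p_exposed * (1 - p_exposed))


print(n_per_group_odds_ratio(p1=0.6, p2=0.8))
print(total_n_case_control(odds_ratio=3, p_exposed=0.6, k=1))
```

For k = 1 the factor (1 + k)²/k equals 4, so equation (11) gives exactly twice the per-group n of equation (10) when the same OR and p are used in both.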

Comparison of Proportions between Two Groups: ordered categorical data
$$n = \frac{6(z_{\alpha/2} + z_\beta)^2}{[\ln(\mathrm{OR})]^2\left(1 - \sum_{i=1}^{k} \bar{p}_i^{\,3}\right)} \tag{12}$$
where OR is the odds ratio of a patient being in category i or less for one treatment compared to the other, k is the number of categories, and $\bar{p}_i$ is the mean proportion expected in category i, that is, $\bar{p}_i = (p_{1i} + p_{2i})/2$, where $p_{1i}$ and $p_{2i}$ are the proportions expected in category i for groups 1 and 2 respectively.
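Equation (12) can be sketched as below. The function name, the four-category proportions, and the odds ratio of 2 are all made up for illustration, and the result is left unrounded.

```python
from math import log
from statistics import NormalDist


def n_per_group_ordinal(odds_ratio: float, props_group1: list[float],
                        props_group2: list[float],
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Equation (12): per-group n for an ordered categorical outcome (unrounded)."""
    if len(props_group1) != len(props_group2):
        raise ValueError("both groups need the same number of categories")
    # Mean proportion expected in each category across the two groups.
    p_bar = [(a + b) / 2 for a, b in zip(props_group1, props_group2)]
    nd = NormalDist()
    z_sum_sq = (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2
    return 6 * z_sum_sq / (log(odds_ratio) ** 2 * (1 - sum(p ** 3 for p in p_bar)))


# Hypothetical four-category outcome and hypothetical OR of 2.
print(n_per_group_ordinal(2.0, [0.2, 0.3, 0.3, 0.2], [0.1, 0.2, 0.4, 0.3]))
```

With only two categories, 1 − p̄³ − (1 − p̄)³ simplifies to 3p̄(1 − p̄), so equation (12) reduces to equation (10), which is a useful sanity check on the reconstruction.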

Comparison of Proportions between Two Groups: Example 3
Randomized trial of two treatments for HIV patients.
Primary outcome: proportion of patients with viral load (VL) below the limit of detection at 48 weeks.
We expect that 60% of patients in the standard-of-care arm will have suppressed VL, and we are interested in a difference of 20%.
With $p_1 = 0.6$, $p_2 = 0.8$, α = 0.05 and β = 0.2, the sample size per group is 81.
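For comparison, equation (9) can be evaluated directly with the Example 3 inputs. This is only a sketch with a hypothetical function name. A plain evaluation of (9) gives roughly 78.5, that is 79 per group after rounding up, which is a little below the 81 quoted in the example. The quoted figure may come from a slightly different method (for instance a pooled-variance test in software), but that is a guess and not something the slides state.

```python
from math import ceil
from statistics import NormalDist


def n_two_proportions(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.80) -> float:
    """Equation (9): per-group n for comparing two proportions (unrounded)."""
    nd = NormalDist()
    z_sum_sq = (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return z_sum_sq * variance_sum / (p1 - p2) ** 2


n_raw = n_two_proportions(p1=0.6, p2=0.8)
print(n_raw, ceil(n_raw))
```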

Problems: Problem 1
An investigator wishes to estimate the sample size necessary to detect a 10 mg/dl difference in cholesterol level in a diet intervention group compared to the control group. The SD from other data is estimated to be 50 mg/dl.
For a two-sided 5% significance level, $z_{\alpha/2} = 1.96$, and for 90% power, $z_\beta = 1.282$. Thus, from (7), the required sample size in each group is
$$n = \frac{2\,(1.96 + 1.282)^2 \times 50^2}{10^2} \approx 526.$$

Problems: Problem 2
The Canadian Contraceptive Study 2002 is used as the best evidence for rates of oral contraceptive pill (OCP) use in the Canadian population. It reported that 80% of women report ever using OCP and 20% report use for more than 10 years. The average duration of OCP use was 6.4 years. A meta-analysis reviewing the extent of relative risk reduction of endometrial cancer with OCP use reports relative risks of 0.44 at 4 years of use, 0.33 at 8 years of use, and 0.28 at 12 years of use (decreased rates of 56%, 67%, and 72% respectively). Take the probability of exposure to OCP for more than 5 years to be 60%, and assume that women without the exposure have a risk of endometrial cancer that is 3 times higher. What sample size is needed at α = 5% with power of 80%?

Problems: Problem 2, solution
For a two-sided 5% significance level, $z_{\alpha/2} = 1.96$, and for 80% power, $z_\beta = 0.84$. OR = 3.
For k = 1, from (11), the total sample size N is
$$N = \frac{(1 + 1)^2}{1} \times \frac{(1.96 + 0.84)^2}{[\ln(3)]^2 \times 0.4 \times 0.6} \approx 108.$$
Since k = 1, $n_1 = n_2 = 54$.

Problems SAS output using proc power for problem 2

Problems: From Example 3
What happens to the power if, instead of 81, we had only 60 participants in each group? What if we had 110 participants in each group?
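The slide's own figures for these two scenarios are not in the transcript. As a rough guide, equation (9) can be inverted to give the approximate power for a fixed per-group n. The sketch below uses a hypothetical function name and the normal approximation behind (9), so the values may differ somewhat from what dedicated software reports.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power from inverting equation (9) for a given per-group n."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in proportions with n per group.
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return nd.cdf(abs(p1 - p2) / se - z_alpha)


for n in (60, 81, 110):
    print(n, round(power_two_proportions(0.6, 0.8, n), 3))
```

Under this approximation the power is a little under 70% with 60 per group and a little over 90% with 110 per group, which shows how quickly power falls when a study is under-recruited.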

Questions?

Thank You!

Funding opportunity

Save the date: Upcoming sessions
Webinar: Knowledge Translation beyond Publications. Thursday May 24, 2018, 12:30 PM - 1:30 PM (NDT)
Webinar: Let's Talk Policy. Thursday June 21, 2018, 12:30 PM - 1:30 PM (NDT)
Workshop: Writing in Plain Language. Thursday May 17, 2018, 12:00 PM - 2:00 PM (NDT)
Go to http://nlsupport.eventbrite.ca to register. Check out our past events, slides and recordings on our website.

Keep in touch Eva Vat, Training and Capacity lead / Patient Engagement lead eva.vat@med.mun.ca 709 864 6654 www.facebook.com/nlsporsupport http://www.nlsupport.ca Sign up for our Newsletter