PubH 5450 Biostatistics I Prof. Carlin. Lecture 13


Outline
- Sample Size
- Counts, Rates and Proportions

Part I Sample Size

Type I Error and Power
- Type I error rate: the probability of rejecting the null when the null is true (a mistake!).
- Power: the probability of rejecting the null when the alternative is true (NOT a mistake!).

Sample Size Calculation: Requirements
1. Distribution of the test statistic under the alternative (normal for two-sample t-tests).
2. Type I error rate: usually $\alpha = 0.05$.
3. The (minimal) power: say, $1 - \beta = 0.8$.
4. The (minimal) magnitude of the effect $\mu_1 - \mu_2$ to be detected.
5. Variability: $\sigma^2$ (if we can assume equal variances).

Sample Size for Two-Sample Tests
- $n$ is a function of the standardized difference between the two populations: $\Delta = (\mu_1 - \mu_2)/\sigma$.
- For a two-sided test, the required sample size per group is $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 / \Delta^2$.
- (Rule of thumb) For $\alpha = 0.05$ and $1 - \beta = 0.8$, $n \approx 16/\Delta^2$.
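As a rough illustration of the two-sample formula, here is a minimal Python sketch. The function name `two_sample_n_per_group` and the choice to round up to the next whole subject are assumptions made for this example, not part of the lecture.

```python
import math
from statistics import NormalDist


def two_sample_n_per_group(delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample test.

    delta is the standardized difference (mu1 - mu2) / sigma.
    Rounding up with ceil is an assumption of this sketch.
    """
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    z_beta = z.inv_cdf(power)             # z_{1 - beta}
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / delta ** 2)


# Compare the formula with the 16 / delta^2 rule of thumb.
delta = 0.5
print(two_sample_n_per_group(delta), 16 / delta ** 2)
```

With `delta = 0.5` the formula and the rule of thumb land within one or two subjects of each other, which is what the rule of thumb is meant to deliver.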

Notes on Sample Size Formula
- It assumes $n_1 = n_2$, which gives the best power when $n_1 + n_2$ is fixed.
- To detect half the effect, the sample size needs to be quadrupled.
- Rule of thumb: When $\sigma^2$ is estimated (from previous studies), add 1 to each group.

Sample Size for One-Sample Tests
- For one-sample tests, the standardized difference is $\Delta = (\mu - \mu_0)/\sigma$.
- For a two-sided test, the required sample size in the group is $n = (z_{1-\alpha/2} + z_{1-\beta})^2 / \Delta^2$.
- Rule of thumb: For $\alpha = 0.05$ and $1 - \beta = 0.8$, $n \approx 8/\Delta^2$.
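The one-sample formula can be sketched the same way. The function name `one_sample_n` is hypothetical, and rounding up is again an assumption of this example.

```python
import math
from statistics import NormalDist


def one_sample_n(delta, alpha=0.05, power=0.80):
    """n for a two-sided one-sample test; delta = (mu - mu0) / sigma."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    z_beta = z.inv_cdf(power)             # z_{1 - beta}
    # Same numerator as the two-sample formula, without the factor of 2.
    return math.ceil((z_alpha + z_beta) ** 2 / delta ** 2)


for delta in (0.25, 0.5, 1.0):
    # Second column is the 8 / delta^2 rule of thumb.
    print(delta, one_sample_n(delta), 8 / delta ** 2)
```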

Notes on Sample Size for One-Sample Tests
- For a matched case-control study (paired, dependent samples), you still need $2n$ subjects.
- That is still only half the sample size needed for an unmatched design (the greater variability in two independent groups calls for more samples).
- Rule of thumb: When $\sigma^2$ is estimated (from previous studies), add 2 to $n$.

One-Sided Tests
- When doing a sample size calculation for a one-sided test, replace $z_{1-\alpha/2}$ by $z_{1-\alpha}$ in the formulae above.
- For $\alpha = 0.05$, these are of course 1.96 and 1.645, respectively.
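To see what the swap of quantiles does in practice, the short sketch below computes both critical values and the resulting ratio of required sample sizes at 80% power. The variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05

z_two_sided = z.inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}, about 1.96
z_one_sided = z.inv_cdf(1 - alpha)       # z_{1 - alpha}, about 1.645
z_power = z.inv_cdf(0.80)                # z_{1 - beta} for 80% power

# n is proportional to (z_alpha + z_beta)^2, so this ratio is the fraction
# of the two-sided sample size that a one-sided test requires.
ratio = ((z_one_sided + z_power) / (z_two_sided + z_power)) ** 2

print(round(z_two_sided, 3), round(z_one_sided, 3), round(ratio, 2))
```

Because the ratio depends only on the quantiles, it applies to both the one-sample and the two-sample formulas.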

Unequal Sample Sizes
In general, for a two-sample problem, when the total sample size $n_1 + n_2$ is fixed it is most efficient to have $n_1 = n_2$.
Situations where unequal sample sizes should be considered:
- One group of people is difficult to recruit.
- The costs of the two treatments are different.
- The variances of the two populations are different.

Part II Counts, Rates and Proportions

Binomial Distribution Refresher
- A binomial variable $X$ with distribution $B(n, p)$ can be interpreted as the total number of successes in $n$ independent and identical Bernoulli trials with success probability $p$.
- The mean of $X$ is $np$ and its variance is $np(1 - p)$.
- $\hat{p} = X/n$ is an estimator of $p$; its variance $p(1 - p)/n$ is estimated by $\hat{p}(1 - \hat{p})/n$.
- The sampling probability (the mean of $\hat{p}$) and the population proportion ($p$) are equal only under simple random sampling.

What are these rates?
Definitions:
- The incidence of a disease is the number of new cases diagnosed during the time interval.
- The prevalence of a disease is the number of individuals with the disease at a fixed time point.

Cautions in Comparing Proportions
- What are the numerators?
- What are the denominators?

Confidence Intervals for Proportions
- Wilson's 95% CI: $\tilde{p} \pm 1.96 \sqrt{\tilde{p}(1 - \tilde{p})/(n + 4)}$, where $\tilde{p} = (X + 2)/(n + 4)$.
- This technique has a Bayesian interpretation: note it is as if we are adding two successes and two failures to the actual observed dataset.
- It is still more common to use the ordinary $\hat{p} = X/n$ (instead of $\tilde{p}$) when all we want is a point estimate of $p$.
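A minimal sketch of this interval in Python follows. The function name `plus_four_ci` and the example counts (10 successes in 50 trials) are made up for illustration. The sketch does not clip the limits to [0, 1], which may be desirable in practice.

```python
import math


def plus_four_ci(x, n, z_star=1.96):
    """Interval from the slide: add 2 successes and 2 failures, then use
    the usual normal-approximation interval around the adjusted estimate."""
    p_tilde = (x + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - z_star * se, p_tilde + z_star * se


# Hypothetical data: 10 successes out of 50 trials.
lower, upper = plus_four_ci(10, 50)
print(round(lower, 3), round(upper, 3))
```

Note that the interval is centered at (X + 2)/(n + 4) rather than at X/n, which pulls it toward 1/2.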

Rare Events
- Wilson's CI does not work very well when $p$ is very close to 0 or 1: this is the result of our Bayesian prior belief that $p$ is close to 1/2 (our fake preliminary data are balanced: 2 successes, 2 failures).
- The "rule of threes": If in $n$ trials no success is observed, the estimated success probability is 0, with an approximate 95% upper bound of $3/n$.
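One standard way to see where the 3 comes from (a justification not spelled out on the slide) is to find the largest $p$ for which observing zero successes in $n$ trials still has probability at least 0.05. Solving $(1 - p)^n = 0.05$ gives $p = 1 - 0.05^{1/n} \approx -\ln(0.05)/n \approx 3/n$. The sketch below compares the exact bound with the approximation. The function names are hypothetical.

```python
import math


def exact_upper_bound(n, level=0.95):
    """Largest p with P(0 successes in n trials) >= 1 - level."""
    return 1 - (1 - level) ** (1 / n)


def rule_of_threes(n):
    """Approximate 95% upper bound when no successes are observed."""
    return 3 / n


for n in (20, 100, 1000):
    print(n, round(exact_upper_bound(n), 5), rule_of_threes(n))

# The constant behind the rule: -ln(0.05) is just under 3.
print(round(-math.log(0.05), 3))
```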

Large-sample testing for a population proportion
- To test $H_0: p = p_0$, use the z-statistic $z = (\hat{p} - p_0) / \sqrt{p_0(1 - p_0)/n}$, where $\hat{p} = X/n$.
- Note that $p_0$ is used in the standard error, and $Z$ has a standard normal distribution when $n$ is large (e.g., $np_0 > 10$ and $n(1 - p_0) > 10$, or $np_0(1 - p_0) > 5$).
- The p-value again depends on $H_1$:
  $H_1: p > p_0$: use $\Pr(Z \ge z)$
  $H_1: p < p_0$: use $\Pr(Z \le z)$
  $H_1: p \ne p_0$: use $\Pr(|Z| \ge |z|) = 2\Pr(Z \ge |z|)$
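The test can be sketched in a few lines of Python. The function name `one_proportion_z_test`, the `alternative` labels, and the example data (60 successes in 100 trials, $p_0 = 0.5$) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """Large-sample z-test of H0: p = p0. Returns (z, p_value)."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error uses p0, not p_hat
    z = (p_hat - p0) / se0
    std = NormalDist()
    if alternative == "greater":          # H1: p > p0
        p_value = 1 - std.cdf(z)
    elif alternative == "less":           # H1: p < p0
        p_value = std.cdf(z)
    elif alternative == "two-sided":      # H1: p != p0
        p_value = 2 * (1 - std.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(round(z, 2), round(p_value, 4))
```

Here $np_0 = n(1 - p_0) = 50$, so the large-sample conditions listed above are comfortably met.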

Choosing a sample size for a desired margin of error
- Recall the margin of error for our large-sample Wilson CI is $m = z^* \, SE_{\tilde{p}} = z^* \sqrt{\tilde{p}(1 - \tilde{p})/(n + 4)}$, with $\tilde{p} = (X + 2)/(n + 4)$, where typically $z^* = 1.96$, the upper .025 point of $Z$.
- When doing a sample size calculation, we must guess the value of $p$; call it $p^*$. We can either:
  - use an estimate of $p$ from an earlier pilot study, or
  - use $p^* = 0.5$, since this will maximize the margin of error. This is conservative (safe regardless of what $p$ turns out to be).
- Using the conservative $p^*$, the required sample size is $n = \left( \frac{z^*}{2m} \right)^2 - 4$, provided this number is still positive!
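Solving the margin-of-error expression for $n$ with a general guess $p^*$ gives $n = (z^*)^2 p^*(1 - p^*)/m^2 - 4$, which reduces to the conservative formula when $p^* = 0.5$. The sketch below implements that. The function name `n_for_margin`, the rounding up, and raising an error when the result is not positive are all choices made for this example.

```python
import math


def n_for_margin(m, p_guess=0.5, z_star=1.96):
    """Sample size so the plus-four interval has margin of error about m.

    p_guess = 0.5 is the conservative choice from the slide.
    """
    n = z_star ** 2 * p_guess * (1 - p_guess) / m ** 2 - 4
    if n <= 0:
        # The formula only makes sense when the result is positive.
        raise ValueError("margin of error too wide for this formula")
    return math.ceil(n)


# Margin of +/- 3 percentage points: conservative guess vs. a pilot estimate.
print(n_for_margin(0.03))
print(n_for_margin(0.03, p_guess=0.2))
```

A pilot estimate away from 0.5 always gives a smaller required $n$ than the conservative choice, which is why $p^* = 0.5$ is safe.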