PubH 5450 Biostatistics I Prof. Carlin. Lecture 13



Outline

Part I: Sample Size
Part II: Counts, Rates and Proportions

Part I: Sample Size

Type I Error and Power

Type I error rate: the probability of rejecting the null when the null is true (a mistake!).

Power: the probability of rejecting the null when the alternative is true (NOT a mistake!).

Sample Size Calculation: Requirements

1. Distribution of the test statistic under the alternative (normal for two-sample t-tests).
2. Type I error rate: usually $\alpha = 0.05$.
3. The (minimal) power: say, $1 - \beta = 0.8$.
4. The (minimal) magnitude of the effect $\mu_1 - \mu_2$ to be detected.
5. Variability: $\sigma^2$ (if we can assume equal variances).

Sample Size for Two-Sample Tests

$n$ is a function of the standardized difference between the two populations:

$$\Delta = \frac{\mu_1 - \mu_2}{\sigma}.$$

For a two-sided test, the required sample size per group is

$$n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$

(Rule of thumb) For $\alpha = 0.05$ and $1 - \beta = 0.8$, $n \approx 16/\Delta^2$.
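As a rough illustration of the two-sample formula, the following Python sketch computes the per-group sample size from the standardized difference. The function name `two_sample_n` and its default arguments are illustrative choices, not part of the lecture, and the sketch assumes a known common $\sigma$ and the normal approximation used in the slides.

```python
from math import ceil
from statistics import NormalDist


def two_sample_n(delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample test (normal approximation).

    delta is the standardized difference (mu1 - mu2) / sigma.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1 - beta}
    return 2 * (z_alpha + z_beta) ** 2 / delta ** 2


if __name__ == "__main__":
    delta = 0.5  # hypothetical standardized difference
    print("formula:", ceil(two_sample_n(delta)))  # round up to whole subjects
    print("rule of thumb:", 16 / delta ** 2)
```

With $\alpha = 0.05$ and power 0.8 the numerator $2(z_{1-\alpha/2} + z_{1-\beta})^2$ is about 15.7, which is where the rounded constant 16 in the rule of thumb comes from.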

Notes on Sample Size Formula

It assumes $n_1 = n_2$, which gives the best power when $n_1 + n_2$ is fixed.

To detect half the effect, the sample size needs to be quadrupled.

Rule of thumb: When $\sigma^2$ is estimated (from previous studies), add 1 to each group.

Sample Size for One-Sample Tests

For one-sample tests, the standardized difference is

$$\Delta = \frac{\mu - \mu_0}{\sigma}.$$

For a two-sided test, the required sample size in the group is

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$

Rule of thumb: For $\alpha = 0.05$ and $1 - \beta = 0.8$, $n \approx 8/\Delta^2$.
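The one-sample formula can be sketched the same way. The helper name `one_sample_n` is hypothetical, and the code again assumes a known $\sigma$ and the normal approximation.

```python
from math import ceil
from statistics import NormalDist


def one_sample_n(delta, alpha=0.05, power=0.80):
    """n for a two-sided one-sample test (normal approximation).

    delta is the standardized difference (mu - mu0) / sigma.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1 - beta}
    return (z_alpha + z_beta) ** 2 / delta ** 2


if __name__ == "__main__":
    delta = 0.5  # hypothetical standardized difference
    print("formula:", ceil(one_sample_n(delta)))
    print("rule of thumb:", 8 / delta ** 2)
```

For this illustrative value of the standardized difference, the formula and the rule of thumb agree after rounding up.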

Notes on Sample Size for One-Sample Tests

For a matched case-control study (paired, dependent samples), you still need $2n$ subjects.

That is still only half the sample size needed for an unmatched design (two independent groups have more variability, so they need more samples).

Rule of thumb: When $\sigma^2$ is estimated (from previous studies), add 2 to $n$.

One-Sided Tests

When doing a sample size calculation for a one-sided test, replace $z_{1-\alpha/2}$ by $z_{1-\alpha}$ in the formulae above.

For $\alpha = 0.05$, these are of course 1.96 and 1.645, respectively.
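A quick way to check the two critical values is to ask the standard normal distribution for the corresponding quantiles. The function `z_critical` below is an illustrative helper, not something defined in the lecture.

```python
from statistics import NormalDist


def z_critical(alpha=0.05, two_sided=True):
    """Standard normal critical value: z_{1-alpha/2} or z_{1-alpha}."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


if __name__ == "__main__":
    print("two-sided:", round(z_critical(0.05, two_sided=True), 3))
    print("one-sided:", round(z_critical(0.05, two_sided=False), 3))
```

Because the one-sided critical value is smaller, the same effect size, $\alpha$, and power lead to a smaller required $n$ than in the two-sided case.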

Unequal Sample Sizes

In general, for a two-sample problem, when the total sample size $n_1 + n_2$ is fixed it is most efficient to have $n_1 = n_2$.

Situations where unequal sample sizes should be considered:

One group of people is difficult to recruit.

The costs of the two treatments are different.

The variances of the two populations are different.
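To see why the balanced design is most efficient when the variances are equal, recall that the standard error of the difference in sample means is $\sigma\sqrt{1/n_1 + 1/n_2}$. The sketch below evaluates it for several splits of a fixed total; the total of 100, $\sigma = 1$, and the helper name `se_of_difference` are assumptions made purely for illustration.

```python
from math import sqrt


def se_of_difference(n1, n2, sigma=1.0):
    """Standard error of (mean1 - mean2), assuming a common sigma."""
    return sigma * sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    total = 100  # hypothetical fixed total sample size
    for n1 in (50, 60, 70, 80, 90):
        n2 = total - n1
        print(n1, n2, round(se_of_difference(n1, n2), 4))
```

The standard error is smallest at the even split and grows as the allocation becomes more lopsided, so an unbalanced design pays a price in power that has to be justified by recruitment, cost, or unequal variances.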

Part II: Counts, Rates and Proportions

Binomial Distribution Refresher

A binomial variable $X$ with distribution $B(n, p)$ can be interpreted as the total number of successes in $n$ independent and identical Bernoulli trials with success probability $p$.

The mean of $X$ is $np$ and its variance is $np(1 - p)$.

$\hat{p} = X/n$ is an estimator of $p$ with estimated variance $\hat{p}(1 - \hat{p})/n$.

The sampling probability (the mean of $\hat{p}$) and the population proportion ($p$) are equal only under simple random sampling.
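The mean and variance formulas can be checked numerically by summing over the binomial probability mass function. This is only a sanity check under illustrative values ($n = 10$, $p = 0.3$); the helper names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_mean_var(n, p):
    """Mean and variance of B(n, p), computed directly from the pmf."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, var


if __name__ == "__main__":
    n, p = 10, 0.3  # illustrative values
    mean, var = binomial_mean_var(n, p)
    print("from pmf:     ", round(mean, 6), round(var, 6))
    print("from formulas:", n * p, n * p * (1 - p))
```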

What are these rates? Definitions

The incidence of a disease is the number of new cases diagnosed during the time interval.

The prevalence of a disease is the number of individuals with the disease at a fixed time point.

Cautions in Comparing Proportions

What are the numerators?

What are the denominators?

Confidence Intervals for Proportions

Wilson's 95% CI:

$$\tilde{p} \pm 1.96 \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}, \quad \text{where} \quad \tilde{p} = \frac{X + 2}{n + 4}.$$

This technique has a Bayesian interpretation: note it is as if we are adding two successes and two failures to the actual observed dataset.

It is still more common to use the ordinary $\hat{p} = X/n$ (instead of $\tilde{p}$) when all we want is a point estimate of $p$.
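A minimal sketch of this interval in Python follows. The function name `plus_four_ci` and the example counts are assumptions for illustration; the code simply evaluates the interval formula with two successes and two failures added.

```python
from math import sqrt


def plus_four_ci(x, n, z=1.96):
    """95% CI for p using the adjusted estimate (x + 2) / (n + 4)."""
    p_tilde = (x + 2) / (n + 4)
    margin = z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin, p_tilde + margin


if __name__ == "__main__":
    x, n = 10, 50  # hypothetical data: 10 successes in 50 trials
    low, high = plus_four_ci(x, n)
    print("point estimate (ordinary):", x / n)
    print("interval:", round(low, 4), round(high, 4))
```

Note that the interval is centered at the adjusted estimate rather than at $X/n$, which pulls it slightly toward 1/2.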

Rare Events

Wilson's CI does not work very well when $p$ is very close to 0 or 1: the result of our Bayesian prior belief that $p$ is close to 1/2 (our fake preliminary data are balanced: 2 successes, 2 failures).

The "rule of threes": If in $n$ trials, no success is observed, the estimated success probability is 0, with an approximate 95% upper bound $3/n$.
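One way to see where the 3 comes from: if the true success probability is $p$, the chance of observing no successes in $n$ independent trials is $(1 - p)^n$. Setting this equal to 0.05 and solving gives $p = 1 - 0.05^{1/n} \approx -\ln(0.05)/n \approx 3/n$. The sketch below compares the two bounds; the function names are illustrative.

```python
def exact_upper_bound(n, conf=0.95):
    """Largest p for which seeing zero successes in n trials has
    probability at least 1 - conf, i.e. solves (1 - p)^n = 1 - conf."""
    return 1 - (1 - conf) ** (1 / n)


def rule_of_threes(n):
    """Approximate 95% upper bound when no successes are observed."""
    return 3 / n


if __name__ == "__main__":
    for n in (30, 100, 1000):  # illustrative numbers of trials
        print(n, round(exact_upper_bound(n), 5), round(rule_of_threes(n), 5))
```

The approximation is slightly conservative (a little larger than the exact bound) and improves as $n$ grows.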

Large-sample testing for a population proportion

To test $H_0: p = p_0$, use the z-statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}},$$

where $\hat{p} = X/n$.

Note that $p_0$ is used and $Z$ has a standard normal distribution (when $n$ is large, e.g., $np_0 > 10$ and $n(1 - p_0) > 10$, or $np_0(1 - p_0) > 5$).

The p-value again depends on $H_1$:

$H_1: p > p_0$: use $\Pr(Z \ge z)$

$H_1: p < p_0$: use $\Pr(Z \le z)$

$H_1: p \ne p_0$: use $\Pr(|Z| \ge |z|) = 2\Pr(Z \ge |z|)$
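The test can be sketched in a few lines of Python. The function name `proportion_z_test`, the string labels for the alternative, and the example data are assumptions for illustration; the code does not check the large-sample conditions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alternative="two-sided"):
    """Large-sample z-test of H0: p = p0. Returns (z, p_value)."""
    p_hat = x / n
    # The null value p0, not p_hat, is used in the standard error.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":      # H1: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # H1: p < p0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # H1: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
    z, p_value = proportion_z_test(60, 100, 0.5)
    print(round(z, 3), round(p_value, 4))
```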

Choosing a sample size for a desired margin of error

Recall the margin of error for our large-sample Wilson CI is

$$m = z^* \, SE_{\tilde{p}} = z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}},$$

where typically $z^* = 1.96$, the upper 0.025 point of $Z$.

When doing a sample size calculation, we must guess the value of $p$; call it $p^*$. We can either

Use an estimate of $p$ from an earlier, pilot study, or

Use $p^* = 0.5$, since this will maximize the margin of error (conservative: safe regardless of what $p$ turns out to be).

Using the conservative $p^*$, the required sample size is

$$n = \left(\frac{z^*}{2m}\right)^2 - 4,$$

provided this number is still positive!
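Solving the margin-of-error expression for $n$ with a general guess $p^*$ gives $n = (z^*)^2 p^*(1 - p^*)/m^2 - 4$, which reduces to the conservative formula when $p^* = 0.5$. The sketch below evaluates it; the function name `n_for_margin` and the example margin of 0.03 are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_for_margin(m, p_star=0.5, conf=0.95):
    """Sample size giving margin of error m for the plus-four interval.

    p_star is the guessed value of p; 0.5 is the conservative choice.
    The result can be negative for very large m, in which case the
    formula is not meaningful.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z_star ** 2 * p_star * (1 - p_star) / m ** 2 - 4


if __name__ == "__main__":
    m = 0.03  # hypothetical desired margin of error
    print("conservative:", ceil(n_for_margin(m)))
    print("pilot guess p* = 0.2:", ceil(n_for_margin(m, p_star=0.2)))
```

A pilot estimate away from 0.5 always gives a smaller required $n$ than the conservative choice, at the risk of undershooting if the guess is wrong.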
