Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

1 Epidemiology 9509: Wonders of Biostatistics
Chapter 11 (continued) - probability in a single population
John Koval
Department of Epidemiology and Biostatistics, University of Western Ontario

2 What is being covered
1. sample size
2. inference about single samples - goodness of fit

3 sample size and power calculations
1. sample size for margin of error (E)
2. power
3. sample size for effect size (ES)

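4 required margin of error - precision
E is the margin of error (half the width of the confidence interval), a measure of precision. From the previous formula for a single sample,
$$n = \pi(1-\pi)\left(\frac{z_{\alpha/2}}{E}\right)^2$$
Since π is usually unknown, use π = 0.5, which maximizes π(1 − π) (its maximum value is 0.25). The formula becomes
$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2 \qquad (1)$$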

5 sample size - margin of error - example
With α = 0.05 and π = 0.5, we want to estimate the proportion (prevalence) of smokers to within 10%, so E = 0.10:
$$n = \frac{1}{4}\left(\frac{1.96}{0.10}\right)^2 = 0.25(384.16) = 96.04$$
that is, 97 (essentially 100). For E = 0.05 (an interval width of 0.10), n is almost 400. Since $z_{\alpha/2} \approx 2$, the formula is essentially
$$n = \frac{1}{E^2} \qquad (2)$$
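Formulas (1) and (2) can be checked with a short Python sketch (Python rather than SAS; the function names are hypothetical). It assumes the conservative π = 0.5 and rounds n up to the next whole subject, which is how 96.04 becomes 97.

```python
import math

def n_margin_of_error(E, z=1.96, pi=0.5):
    """Formula (1) with a general pi: n = pi(1 - pi)(z / E)^2, rounded up."""
    return math.ceil(pi * (1 - pi) * (z / E) ** 2)

def n_rule_of_thumb(E):
    """Formula (2): with z close to 2 and pi = 0.5, n is about 1 / E^2."""
    return round(1 / E ** 2)

for E in (0.10, 0.05):
    print(E, n_margin_of_error(E), n_rule_of_thumb(E))
# 0.1 97 100
# 0.05 385 400
```

The rule of thumb slightly overstates n because it replaces 1.96 by 2.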

6 Power
We are interested in the difference in probability $\delta_\pi = \pi_A - \pi_o$, where $\pi_o$ is the probability (proportion) under the null hypothesis and $\pi_A$ is the probability (proportion) under the alternative. We have to specify both $\pi_o$ and $\pi_A$ because, for proportions, the variance depends on the mean. So we cannot use the formula for a single population that we used previously, but have to derive a new formula.

7 power (continued)
The rule for rejecting the null hypothesis: if $p > \pi_o + z_{\alpha/2}\,\sigma_{po}$, then reject $H_o$, where
$$\sigma_{po} = \sqrt{\frac{\pi_o(1-\pi_o)}{n}}$$

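8 formula for power
$$\text{power} = \Pr\left(Z_N > \frac{z_{\alpha/2}\,\sigma_{po} - |\pi_o - \pi_A|}{\sigma_{pa}}\right) \qquad (3)$$
where
$$\sigma_{pa} = \sqrt{\frac{\pi_A(1-\pi_A)}{n}}$$
This may also be written as
$$\text{power} = \Pr\left(Z_N > \frac{z_{\alpha/2}\,\sigma_o - \sqrt{n}\,|\pi_o - \pi_A|}{\sigma_A}\right) \qquad (4)$$
where $\sigma_o = \sqrt{\pi_o(1-\pi_o)}$ and $\sigma_A = \sqrt{\pi_A(1-\pi_A)}$. $|\pi_o - \pi_A|$ is the Effect Size (for proportions).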

9 Example of calculation of power for a proportion
Usually α = 0.05, so that $z_{\alpha/2} = z(0.025) = 1.96$. The current success rate in treatment of a disease is 50%. In a clinical trial of a new drug on 9 subjects, what power do we have of finding a new rate of 60%? Mathematically, this may be stated as $\pi_o = 0.5$ and $\pi_A = 0.60$, so that $\delta_\pi = 0.10$ = ES.

10 $\sigma_o = \sqrt{\pi_o(1-\pi_o)} = \sqrt{0.5(0.5)} = 0.5$ and $\sigma_A = \sqrt{\pi_A(1-\pi_A)} = \sqrt{0.6(0.4)} = \sqrt{0.24} = 0.490$

11 Plug these figures into equation (4):
$$\Pr\left(Z_N > \frac{1.96(0.5) - \sqrt{9}(0.10)}{0.490}\right) = \Pr\left(Z_N > \frac{0.68}{0.490}\right) = \Pr(Z_N > 1.388) \approx 0.083$$
Hence the chance of finding a difference of 0.10 with a sample of size 9 is about 8%.
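Equation (4) translates directly into a few lines of Python. This is a sketch only: the function names are hypothetical, and the normal upper tail is computed from the standard-library `math.erfc` instead of a printed table.

```python
import math

def upper_tail(x):
    """Pr(Z_N > x) for a standard normal Z_N."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def power_single_proportion(pi_o, pi_a, n, z_half_alpha=1.96):
    """Equation (4): returns the cutoff and the power Pr(Z_N > cutoff)."""
    sigma_o = math.sqrt(pi_o * (1 - pi_o))
    sigma_a = math.sqrt(pi_a * (1 - pi_a))
    cutoff = (z_half_alpha * sigma_o - math.sqrt(n) * abs(pi_o - pi_a)) / sigma_a
    return cutoff, upper_tail(cutoff)

cutoff, power = power_single_proportion(0.5, 0.6, 9)
print(f"{cutoff:.3f} {power:.3f}")  # 1.388 0.083
```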

12 simplifying formula
We may use $\sigma_o = \sigma_A = 0.5$ in (4), which then becomes
$$\Pr\left(Z_N > z_{\alpha/2} - 2\,|\pi_o - \pi_A|\sqrt{n}\right) \qquad (5)$$
In our example this is
$$\Pr\left(Z_N > 1.96 - 2(0.1)\sqrt{9}\right) = \Pr(Z_N > 1.36) \approx 0.087$$
This is larger than the previous calculation, that is, optimistic; only use it when $\pi_o$ and $\pi_A$ are close to 0.5.
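A minimal Python sketch of the simplified formula (5), with a hypothetical function name and the normal tail again taken from `math.erfc`; its result can be compared with the 0.083 from equation (4).

```python
import math

def upper_tail(x):
    """Pr(Z_N > x) for a standard normal Z_N."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def power_simplified(pi_o, pi_a, n, z_half_alpha=1.96):
    """Equation (5): sigma_o = sigma_A = 0.5, so the cutoff is z_{alpha/2} - 2|pi_o - pi_A| sqrt(n)."""
    cutoff = z_half_alpha - 2 * abs(pi_o - pi_a) * math.sqrt(n)
    return cutoff, upper_tail(cutoff)

cutoff, power = power_simplified(0.5, 0.6, 9)
print(f"{cutoff:.2f} {power:.3f}")  # 1.36 0.087
```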

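13 Sample size for a single proportion
Start with equation (4), set it equal to the required power 1 − β, and solve for n:
$$1 - \beta = \Pr\left(Z_N > \frac{z_{\alpha/2}\,\sigma_o - \sqrt{n}\,|\pi_o - \pi_A|}{\sigma_A}\right)$$
and eventually get
$$n = \left(\frac{z_{\alpha/2}\,\sigma_o + z_\beta\,\sigma_A}{\pi_o - \pi_A}\right)^2$$
where $\sigma_o = \sqrt{\pi_o(1-\pi_o)}$ and $\sigma_A = \sqrt{\pi_A(1-\pi_A)}$. $|\pi_o - \pi_A|$ is sometimes referred to as ES (effect size).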
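The algebra behind "eventually get" is short. Writing $z_\beta$ for the upper β point of the standard normal, so that $\Pr(Z_N > -z_\beta) = 1 - \beta$, the power condition holds when the cutoff in (4) equals $-z_\beta$:

```latex
\frac{z_{\alpha/2}\,\sigma_o - \sqrt{n}\,|\pi_o - \pi_A|}{\sigma_A} = -z_\beta
\;\Longrightarrow\;
\sqrt{n}\,|\pi_o - \pi_A| = z_{\alpha/2}\,\sigma_o + z_\beta\,\sigma_A
\;\Longrightarrow\;
n = \left(\frac{z_{\alpha/2}\,\sigma_o + z_\beta\,\sigma_A}{\pi_o - \pi_A}\right)^2
```

Squaring removes the absolute value, so the sign of $\pi_o - \pi_A$ does not matter.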

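14 Since $\sigma_o$ and $\sigma_A$ are at most 0.5, n is largest when $\sigma_o = \sigma_A = 0.5$:
$$n = \left(\frac{z_{\alpha/2} + z_\beta}{2(\pi_o - \pi_A)}\right)^2 \qquad (6)$$
Do not use this when $\pi_o$ or $\pi_A$ is far from 0.5.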

15 Example of sample size calculation
Usually the required power is 80%, so that $z_\beta = z(0.20) = 0.8416$. For the same clinical trial, where we wish to show a proportion of 0.6 ($\pi_A$) when the usual result is 0.5 ($\pi_o$), we have $\sigma_A = 0.490$ and $\sigma_o = 0.5$. Moreover ES, the effect size, is $\delta_\pi = 0.6 - 0.5 = 0.1$.

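16 Substituting into the preceding equation, we get
$$n = \left(\frac{z_{\alpha/2}\,\sigma_o + z_\beta\,\sigma_A}{\pi_o - \pi_A}\right)^2 = \left(\frac{1.96(0.5) + 0.8416(0.490)}{0.1}\right)^2 = (13.924)^2 \approx 193.9$$
Hence 194 subjects are required (close enough to 200). The short formula (6) gives
$$n = \left(\frac{1.96 + 0.8416}{2(0.1)}\right)^2 = (14.008)^2 \approx 196.2$$
or 197 subjects.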
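Both sample-size calculations can be reproduced with a small Python sketch. The function names are hypothetical; $z_{\alpha/2} = 1.96$ and $z_\beta = 0.8416$ are the values used on the slides, and n is rounded up to whole subjects.

```python
import math

def n_for_power(pi_o, pi_a, z_half_alpha=1.96, z_beta=0.8416):
    """Sample-size formula from slide 13, rounded up to whole subjects."""
    sigma_o = math.sqrt(pi_o * (1 - pi_o))
    sigma_a = math.sqrt(pi_a * (1 - pi_a))
    n = ((z_half_alpha * sigma_o + z_beta * sigma_a) / (pi_o - pi_a)) ** 2
    return math.ceil(n)

def n_short_formula(pi_o, pi_a, z_half_alpha=1.96, z_beta=0.8416):
    """Formula (6): sigma_o = sigma_A = 0.5, giving the largest (conservative) n."""
    n = ((z_half_alpha + z_beta) / (2 * (pi_o - pi_a))) ** 2
    return math.ceil(n)

print(n_for_power(0.5, 0.6), n_short_formula(0.5, 0.6))  # 194 197
```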

17 Goodness of fit chi-square
multinomial (categorical) data: observe $x_i$ in each of $k$ categories

18 examples
1. toss a coin (k = 2): in 20 tosses, 8 heads ($x_1$) and 12 tails ($x_2$). Is the coin fair?
2. a random sample of 20 graduate students: 8 males ($x_1$) and 12 females ($x_2$). Supposedly 70% of Western graduate students are female. Does this seem to be a random sample of the population?
3. of 36 graduate students selected at random, 9 are smokers and 27 are not. Statistics Canada says that 20% of young people smoke. Do the results for this sample agree with that?

19 examples (continued)
Of the 20 graduate students, 12 are from Ontario, 4 from Canada outside Ontario, and 4 from outside Canada (k = 3). Supposedly the population figures for Western are 70% from Ontario, 20% from Canada outside Ontario, and 10% from outside Canada. Is this sample representative of the population?

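20 statistical test
The exact distribution is binomial: 8 heads from 20 trials, with $H_o: \pi = 0.5$ against $H_A: \pi \ne 0.5$.
$$\text{p-value} = \Pr(X_B \le 8 \mid \pi = 0.5) + \Pr(X_B \ge 12 \mid \pi = 0.5) = 2\Pr(X_B \le 8 \mid \pi = 0.5)$$
since the distribution is symmetric when π = 0.5. Use a SAS program to calculate the p-value = 0.5034.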
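The exact binomial p-value can also be computed directly from binomial probabilities. This Python sketch uses the standard-library `math.comb` (Python 3.8+); the function names are hypothetical, and doubling one tail is valid only because π = 0.5 makes the distribution symmetric, as noted above.

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr(X_B <= k) for X_B ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def exact_two_sided_half(x, n):
    """Two-sided exact p-value for H_o: pi = 0.5, doubling the smaller tail by symmetry."""
    k = min(x, n - x)
    return min(1.0, 2 * binom_cdf(k, n, 0.5))

print(f"{exact_two_sided_half(8, 20):.4f}")  # 0.5034
```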

21 approximate test - goodness of fit
How well do the data fit the theoretical distribution when $\pi = \pi_o$? This can be used to test against the two-sided alternative $H_A: \pi \ne \pi_o$.
$O_i$ observed: $x_i$; $E_i$ expected under $H_o$.
$O_1 = 8$, $E_1 = 10$; $O_2 = 12$, $E_2 = 10$

22 approximate test (continued)
$$S = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i} = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} = 0.4 + 0.4 = 0.8$$
Under $H_o$, S is approximately distributed as $\chi^2_{k-1}$ (Karl Pearson).

23 example (continued)
In this case $S \sim \chi^2_1$, so the p-value = $\Pr(\chi^2_1 > 0.8) > 0.10$; at α = 0.05, fail to reject $H_o$.

24 better p-values
Since $\chi^2_1 = (Z_N)^2$,
$$\Pr(\chi^2_1 > 0.8) = 2\Pr(Z_N > \sqrt{0.8}) = 2\Pr(Z_N > 0.8944) = 2(0.1855) = 0.3711$$
using linear interpolation in the normal table.
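The identity $\chi^2_1 = (Z_N)^2$ gives $\Pr(\chi^2_1 > s) = 2\Pr(Z_N > \sqrt{s}) = \operatorname{erfc}(\sqrt{s/2})$, so no table interpolation is needed. A minimal Python sketch with hypothetical function names:

```python
import math

def pearson_statistic(observed, expected):
    """S = sum (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_1df_upper(s):
    """Pr(chi2_1 > s) = 2 Pr(Z_N > sqrt(s)) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(s / 2))

S = pearson_statistic([8, 12], [10, 10])
print(f"{S:.4f} {chi2_1df_upper(S):.4f}")  # 0.8000 0.3711
```

Note how far 0.3711 is from the exact binomial p-value of 0.5034; the continuity correction that follows closes most of that gap.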

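25 better approximate test
The $\chi^2$ approximation to the binomial is improved by a continuity correction (Frank Yates):
$$S_Y = \sum_{i=1}^{2}\frac{(|O_i - E_i| - 0.5)^2}{E_i} = \frac{(|8-10| - 0.5)^2}{10} + \frac{(|12-10| - 0.5)^2}{10} = 0.225 + 0.225 = 0.45$$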

26 better approximate test (continued)
$$\Pr(\chi^2_1 > 0.45) = 2\Pr(Z_N > \sqrt{0.45}) = 2\Pr(Z_N > 0.6708) = 2(0.25115) = 0.5023$$
by linear interpolation. This is a good approximation to the exact p-value (0.5034) because
1. $n\pi_o = 20(0.5) = 10 > 5$
2. $O_i > 5$, $i = 1, \ldots, k$

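27 example II
Same observations, $O_1 = 8$, $O_2 = 12$, but different $E_i$: $E_1 = 20(0.3) = 6$, $E_2 = 20(0.7) = 14$.
$$S_Y = \sum_{i=1}^{2}\frac{(|O_i - E_i| - 0.5)^2}{E_i} = \frac{(|8-6| - 0.5)^2}{6} + \frac{(|12-14| - 0.5)^2}{14} = 0.375 + 0.1607 = 0.5357$$
so that the p-value = $2\Pr(Z_N > \sqrt{0.5357}) = 2\Pr(Z_N > 0.7319) = 2(0.2321) = 0.4642$.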

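28 example III
$O_1 = 9$, $O_2 = 27$; $E_1 = 36(0.2) = 7.2$, $E_2 = 36(0.8) = 28.8$.
$$S_Y = \sum_{i=1}^{2}\frac{(|O_i - E_i| - 0.5)^2}{E_i} = \frac{(|9-7.2| - 0.5)^2}{7.2} + \frac{(|27-28.8| - 0.5)^2}{28.8} = 0.2347 + 0.0587 = 0.2934$$
so that the p-value = $\Pr(\chi^2_1 > 0.2934) = 2\Pr(Z_N > 0.5417) = 2(0.2940) = 0.5880$.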
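The Yates-corrected statistic for examples I to III can be checked with the following Python sketch (hypothetical function names; p-values from `math.erfc` rather than table interpolation, so the last digit may occasionally differ from a hand calculation).

```python
import math

def yates_statistic(observed, expected):
    """S_Y = sum (|O_i - E_i| - 0.5)^2 / E_i (Yates continuity correction)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

def chi2_1df_upper(s):
    """Pr(chi2_1 > s) = 2 Pr(Z_N > sqrt(s)) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(s / 2))

examples = {
    "I": ([8, 12], [20 * 0.5, 20 * 0.5]),
    "II": ([8, 12], [20 * 0.3, 20 * 0.7]),
    "III": ([9, 27], [36 * 0.2, 36 * 0.8]),
}
for name, (observed, expected) in examples.items():
    s = yates_statistic(observed, expected)
    print(name, f"{s:.4f}", f"{chi2_1df_upper(s):.4f}")
# I 0.4500 0.5023
# II 0.5357 0.4642
# III 0.2934 0.5880
```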

29 Relationship - test of proportion and Goodness of Fit
The test of $H_o: \pi = \pi_o$ against the two-sided alternative $H_A: \pi \ne \pi_o$ can be handled either by the p-value calculation
$$\text{p-value} = 2\Pr\left(Z_N > \frac{|p - \pi_o| - 1/(2n)}{\sqrt{\pi_o(1-\pi_o)/n}}\right)$$
or by the Goodness of Fit test
$$S = \sum_{i=1}^{2}\frac{(|O_i - E_i| - 0.5)^2}{E_i}$$
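The two approaches give the same answer: with two cells, $|O_1 - E_1| = |O_2 - E_2|$ and $1/(n\pi_o) + 1/(n(1-\pi_o)) = 1/(n\pi_o(1-\pi_o))$, so the square of the continuity-corrected z equals $S$. A Python sketch (hypothetical function names) checking this on examples I to III:

```python
import math

def z_with_cc(x, n, pi_o):
    """Continuity-corrected z: (|p - pi_o| - 1/(2n)) / sqrt(pi_o(1 - pi_o)/n)."""
    p = x / n
    return (abs(p - pi_o) - 1 / (2 * n)) / math.sqrt(pi_o * (1 - pi_o) / n)

def yates_two_cells(x, n, pi_o):
    """Goodness-of-fit S with Yates correction for the two cells x and n - x."""
    observed = [x, n - x]
    expected = [n * pi_o, n * (1 - pi_o)]
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

for x, n, pi_o in [(8, 20, 0.5), (8, 20, 0.3), (9, 36, 0.2)]:
    z = z_with_cc(x, n, pi_o)
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 Pr(Z_N > z) for z > 0
    print(f"z^2 = {z * z:.4f}  S = {yates_two_cells(x, n, pi_o):.4f}  p = {p_value:.4f}")
```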

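30 example IV
$O_1 = 12$, $E_1 = 20(0.7) = 14$; $O_2 = 4$, $E_2 = 20(0.2) = 4$; $O_3 = 4$, $E_3 = 20(0.1) = 2$.
With no continuity correction,
$$S = \frac{(12-14)^2}{14} + \frac{(4-4)^2}{4} + \frac{(4-2)^2}{2} = 0.286 + 0 + 2 = 2.286$$
Here there are k = 3 categories, so under $H_o$, $S \sim \chi^2_2$ and the p-value = $\Pr(\chi^2_2 > 2.286) > 0.10$.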
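For 2 degrees of freedom the chi-square upper tail has the closed form $\Pr(\chi^2_2 > s) = e^{-s/2}$, so example IV can be finished exactly. A Python sketch; note that $E_3 = 2$ is below the rule of thumb quoted on slide 26, so the χ² approximation itself is rough here.

```python
import math

observed = [12, 4, 4]          # Ontario, rest of Canada, outside Canada
proportions = [0.7, 0.2, 0.1]  # population figures under H_o
n = sum(observed)
expected = [n * p for p in proportions]

# Pearson S without continuity correction
S = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For k - 1 = 2 degrees of freedom, Pr(chi2_2 > s) = exp(-s / 2) exactly
p_value = math.exp(-S / 2)
print(f"{S:.4f} {p_value:.4f}")  # 2.2857 0.3189
```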

31 Using SAS for inference with a single population

    title 'inference for single sample probabilities';
    options ls=64;
    proc format;
       value grp 0='non-smoker' 1='smoker';
    run;
    data marj;
       input grp smok;   /* group code and its count */
       format grp grp.;
       datalines;
    0 27
    1 9
    ;
    run;

32 SAS program (continued)

    proc freq data=marj;
       weight smok;
       tables grp / binomial(level='smoker' p=0.2 wilson ac);
       exact binomial;
    run;

1. We have to indicate group membership (the variable grp).
2. We indicate the counts by using the WEIGHT statement in PROC FREQ.
3. The options WILSON and AC request the Wilson and adjusted Wald (Agresti-Coull) confidence intervals.
4. SAS does not do the continuity correction, so we have to ask for an exact test for the binomial (EXACT BINOMIAL).

33 output of SAS program
inference for single sample probabilities
The FREQ Procedure
Frequency table for grp (non-smoker, smoker), with columns Frequency, Percent, Cumulative Frequency, Cumulative Percent.
Binomial Proportion for grp = smoker: Proportion, ASE, and 95% Confidence Limits of type Wilson and Agresti-Coull.
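The two interval types requested above can be computed by hand from their standard textbook formulas. This Python sketch assumes the 9 smokers out of 36 from example III and z = 1.96; the function names are hypothetical, and PROC FREQ's printed limits should agree only up to rounding.

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a single proportion."""
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def agresti_coull_ci(x, n, z=1.96):
    """Adjusted Wald (Agresti-Coull): add z^2/2 successes and z^2/2 failures."""
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde
    half = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

print([round(v, 4) for v in wilson_ci(9, 36)])         # [0.1375, 0.4107]
print([round(v, 4) for v in agresti_coull_ci(9, 36)])  # [0.1356, 0.4126]
```

Both intervals share the same centre, about 0.274; the Agresti-Coull interval is slightly wider.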

34 output of SAS program (continued)
Test of H0: Proportion = 0.2: ASE under H0, Z, One-sided Pr > Z, Two-sided Pr > |Z|
Exact Test: One-sided Pr >= P, Two-sided = 2 * One-sided
Sample Size = 36
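The quantities in this part of the output can be sketched in Python for 9 smokers out of 36 and $H_o: \pi = 0.2$. The function names are hypothetical; the asymptotic z uses the ASE under $H_o$ with no continuity correction, and the exact test assumes the observed count lies above $n\pi_0$ (9 > 7.2), taking two-sided as 2 times one-sided as the output describes.

```python
import math
from math import comb

def asymptotic_test(x, n, pi_0):
    """z test using the ASE under H0; returns ASE, Z, one-sided and two-sided p."""
    ase0 = math.sqrt(pi_0 * (1 - pi_0) / n)
    z = (x / n - pi_0) / ase0
    one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return ase0, z, one_sided, 2 * one_sided

def exact_test(x, n, pi_0):
    """One-sided exact p-value Pr(X_B >= x); two-sided = 2 * one-sided, capped at 1."""
    one_sided = sum(comb(n, j) * pi_0**j * (1 - pi_0) ** (n - j) for j in range(x, n + 1))
    return one_sided, min(1.0, 2 * one_sided)

ase0, z, p1, p2 = asymptotic_test(9, 36, 0.2)
e1, e2 = exact_test(9, 36, 0.2)
print(f"ASE={ase0:.4f} Z={z:.4f} one-sided={p1:.4f} two-sided={p2:.4f}")
print(f"exact one-sided={e1:.4f} two-sided={e2:.4f}")
```

Because no continuity correction is applied, the asymptotic two-sided p-value (about 0.45) is noticeably smaller than the Yates result for example III (0.5880).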

35 SAS with raw data
Raw data means data recorded as one subject per line, not in table form.
Solution: use PROC FREQ as above, but
1. we don't need the WEIGHT statement, because SAS calculates the weights, that is, the counts (number of people in each group);
2. use the variable which defines the group, e.g. sex or cryo.

36 SAS program

    proc freq data=fred.cancer;
       /* LEVEL= must name a level of cryo; 'smoker' is carried over from the smoking example */
       tables cryo / binomial(level='smoker' p=0.2 wilson ac);
       exact binomial;
    run;

37 Creating new SAS datasets
Remember that you require
1. a LIBNAME for your dataset
2. a LIBNAME for your formats library
3. a DATA statement with the name of the new permanent dataset
4. a SET statement with the name of the current permanent dataset

38 example of part of a SAS program for dataset creation

    LIBNAME fred 'U:/Epid9509';
    LIBNAME library 'U:/Epid9509';
    DATA fred.cancer2;
       SET fred.cancer;

39 creating new variables
1. must be done in a DATA step
2. often involves an IF statement
3. usually involves first creating the new variable, then modifying its values:

    diag3 = diagnosis;
    if (diagnosis ge 2) then diag3 = 2;

4. the missing value indicator is . (a period):

    diag2 = diagnosis;
    if (diagnosis ge 2) then diag2 = .;
