Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

1 Epidemiology 9509 Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval, Department of Epidemiology and Biostatistics, University of Western Ontario

2 What is being covered
1. differences in means of two populations
2. depends on variances

3 simplest case
know variances $\sigma_1^2$, $\sigma_2^2$
sample means: males 94.0, females 92.0
Are they from the same population?

4 confidence interval for difference in means
$$(\bar X_1 - \bar X_2) \pm g_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
What is $g$?
1. $X_1, X_2$ Normally distributed, $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$: $g_{\alpha/2}$ is $z_{\alpha/2}$
2. $X_1, X_2$ NOT Normally distributed but $n_1$, $n_2$ suitably large: use Normal as an approximation
3. $X_1, X_2$ NOT Normally distributed and $n_1$, $n_2$ NOT suitably large: see later section

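5 Example: variances known
$$(\bar X_1 - \bar X_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
sample 1 (males): $\bar x_1 = 94.0$, $n_1 = 25$, $\sigma_1^2 = 16$
sample 2 (females): $\bar x_2 = 92.0$, $n_2 = 16$, $\sigma_2^2 = 9$
$$(94.0 - 92.0) \pm 1.960\sqrt{\frac{16}{25} + \frac{9}{16}} = 2.0 \pm 1.960(1.0966) = 2.0 \pm 2.149 = (-0.149, 4.149)$$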
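
The known-variance interval on this slide can be reproduced with a short sketch. The function name `z_interval_diff` and its parameters are illustrative choices, not part of the lecture; the Normal quantile comes from Python's standard library rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff(mean1, mean2, var1, var2, n1, n2, conf=0.95):
    """Confidence interval for mu1 - mu2 when both population variances are known.

    Hypothetical helper for illustration; assumes Normal data or large samples.
    """
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, about 1.960 for 95%
    se = sqrt(var1 / n1 + var2 / n2)             # standard error of the difference
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Slide values: males 94.0 (n = 25, variance 16), females 92.0 (n = 16, variance 9)
lower, upper = z_interval_diff(94.0, 92.0, 16.0, 9.0, 25, 16)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing the variances rather than the standard deviations keeps the call aligned with the notation on the slide.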

6 Don't know variance - more typical
1. variances are equal
2. variances are NOT equal

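7 variances equal
$$(\bar X_1 - \bar X_2) \pm g_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
but don't know the variance; estimate it with
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
where $s_1^2$ and $s_2^2$ are the respective sample variances
$s_p^2$ is called the pooled variance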

8 Example: variances unknown but equal
sample 1 (males): $\bar x_1 = 94.0$, $n_1 = 25$, $s_1^2 = 16$
sample 2 (females): $\bar x_2 = 92.0$, $n_2 = 16$, $s_2^2 = 9$
$$s_p^2 = \frac{24(16) + 15(9)}{39} = \frac{519}{39} = 13.31$$

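9 example (continued)
95% confidence interval given by
$$(\bar X_1 - \bar X_2) \pm g_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$(94.0 - 92.0) \pm t_{39,\alpha/2}\sqrt{13.31\left(\frac{1}{25} + \frac{1}{16}\right)} = 2.0 \pm 2.023(1.168) = 2.0 \pm 2.362 = (-0.362, 4.362)$$
contains 0: null difference plausible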
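
A minimal sketch of the pooled-variance calculation follows. The helper names are hypothetical, and the critical value 2.023 for 39 degrees of freedom is taken from the slide rather than computed, since the standard library has no t quantile function.

```python
from math import sqrt


def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate of a common variance from two sample variances."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def pooled_interval(mean1, mean2, s1_sq, s2_sq, n1, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal variances.

    t_crit is t_{n1+n2-2, alpha/2}, supplied by the caller from a table.
    """
    sp_sq = pooled_variance(s1_sq, s2_sq, n1, n2)
    se = sqrt(sp_sq * (1.0 / n1 + 1.0 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


sp_sq = pooled_variance(16.0, 9.0, 25, 16)
lower, upper = pooled_interval(94.0, 92.0, 16.0, 9.0, 25, 16, t_crit=2.023)
print(f"pooled variance = {sp_sq:.2f}, interval = ({lower:.3f}, {upper:.3f})")
```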

10 variances not equal - more likely
use original estimates of $\sigma_1^2$ and $\sigma_2^2$, namely $s_1^2$ and $s_2^2$
$$(\bar X_1 - \bar X_2) \pm g_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
easy

11 variances not equal - more likely (continued)
easy, except that $g$ is $t_{\nu,\alpha/2}$, where
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
$\nu$ is pronounced "new"; say "vee"

12 variances not equal - more likely (continued)
Welch (1947), Satterthwaite (1946)

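13 example revisited
calculate degrees of freedom $\nu$ using Satterthwaite's formula
$$\nu = \frac{(16/25 + 9/16)^2}{\dfrac{(16/25)^2}{24} + \dfrac{(9/16)^2}{15}} = \frac{(1.2025)^2}{\dfrac{(0.64)^2}{24} + \dfrac{(0.5625)^2}{15}} = \frac{1.4460}{0.03816} = 37.89$$
ie 37 (rounding down); not much different from 39 df for previous version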

14 example (continued)
$t_{37,0.025} = 2.026$ so that the 95% confidence interval is
$$(94.0 - 92.0) \pm 2.026\sqrt{\frac{16}{25} + \frac{9}{16}} = 2.0 \pm 2.026(1.0966) = 2.0 \pm 2.222 = (-0.222, 4.222)$$
slightly narrower than previous
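
The unequal-variance calculation can be sketched the same way. The function names are illustrative; rounding the degrees of freedom down to 37 and using the interpolated critical value 2.026 both follow the slides.

```python
from math import floor, sqrt


def satterthwaite_df(s1_sq, s2_sq, n1, n2):
    """Approximate degrees of freedom for the unequal-variance interval."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def welch_interval(mean1, mean2, s1_sq, s2_sq, n1, n2, t_crit):
    """Interval for mu1 - mu2 using separate sample variances.

    t_crit is t_{nu, alpha/2}, supplied by the caller from a table.
    """
    se = sqrt(s1_sq / n1 + s2_sq / n2)
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


nu = satterthwaite_df(16.0, 9.0, 25, 16)
df = floor(nu)  # round down, as on the slide
lower, upper = welch_interval(94.0, 92.0, 16.0, 9.0, 25, 16, t_crit=2.026)
print(f"nu = {nu:.2f}, df = {df}, interval = ({lower:.3f}, {upper:.3f})")
```

Rounding the degrees of freedom down gives a slightly larger critical value, which is the conservative direction.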

15 which approach
Moser and Stevens (1992) argue that one should ALWAYS assume variances are not equal and do the appropriate interval and/or test
1. you will be using the correct procedure when this is true
2. you won't do too badly when this is not true
3. the alternative (to be discussed) is terrible

16 An hypothesis test
$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ (the original null hypothesis)
against
$H_A: \mu_1 \ne \mu_2$ or $H_A: \mu_1 - \mu_2 \ne 0$

17 hypothesis using confidence interval
For our example: since 0 is in the 95% confidence interval calculated on page 14, at $\alpha = 0.05$ we fail to reject the null hypothesis $H_0: \mu_1 - \mu_2 = 0$

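18 other hypothesis tests: p-value
$$p = 2\Pr\left(T_\nu > \frac{\left|(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)\right|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\right)$$
have calculated that $\nu = 37$
$$p = 2\Pr\left(T_{37} > \frac{(94.0 - 92.0) - 0}{\sqrt{\dfrac{16}{25} + \dfrac{9}{16}}}\right) = 2\Pr\left(T_{37} > \frac{2.0}{1.0966}\right) = 2\Pr(T_{37} > 1.824)$$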

19 Example: p-value (continued)
Since, from Table A.2, and by linear interpolation,
$$1.688 < 1.824 < 2.026, \quad\text{that is}\quad t_{37,0.05} < 1.824 < t_{37,0.025}$$
we have $0.05 > \Pr(T_{37} > 1.824) > 0.025$
and, since $p = 2\Pr(T_{37} > 1.824)$, then $0.10 > p > 0.05$
At $\alpha = 0.05$ we fail to reject the null.

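20 critical value of distribution/test statistic
1. critical value of $T_{37}$ is $\pm 2.026$
2. test statistic is
$$\frac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{94.0 - 92.0}{1.0966} = \frac{2.0}{1.0966} = 1.824$$
3. since the test statistic is not greater than 2.026 or less than $-2.026$, at $\alpha = 0.05$ we fail to reject $H_0$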

21 critical value of statistic
1. statistic is $d = (\bar x_1 - \bar x_2)$ which has value 2
2. critical value given by
$$\delta_0 \pm t_{\nu,\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0 \pm t_{37,0.025}(1.0966) = \pm 2.026(1.0966) = \pm 2.222$$
(2.026 is determined by linear interpolation)
3. since the statistic is not greater than 2.222 or less than $-2.222$, at $\alpha = 0.05$ we fail to reject $H_0$
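
Both critical-value comparisons amount to the same check, which can be sketched as follows. The function names are hypothetical and the critical value 2.026 is the tabled, interpolated value from the slides.

```python
from math import sqrt


def welch_t_statistic(mean1, mean2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Test statistic for H0: mu1 - mu2 = delta0 with separate variances."""
    se = sqrt(s1_sq / n1 + s2_sq / n2)
    return ((mean1 - mean2) - delta0) / se


def reject_null(t_stat, t_crit):
    """Two-sided decision: reject when the statistic lies beyond +/- t_crit."""
    return abs(t_stat) > t_crit


t_stat = welch_t_statistic(94.0, 92.0, 16.0, 9.0, 25, 16)
decision = reject_null(t_stat, t_crit=2.026)
print(f"t = {t_stat:.3f}, reject H0: {decision}")
```

Comparing the raw difference of 2 with the scaled critical value is the same test, because both sides of the inequality are multiplied by the standard error.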

22 Alternative (WRONG) approach
1. test whether variances are equal or not
2. if fail to reject equality: assume equality of variances and use the appropriate test (with pooled variance)
3. if reject equality: use the test with separate sample variances

23 PROBLEM
test for equality of variances is weak (has low power)
hence fail to reject $H_0$ when it is false
use wrong test (with pooled variances) too often
fail to find differences in means when one should

24 test for equality of variances
ratio of sample variances
$$\frac{s_{\max}^2}{s_{\min}^2} \sim F_{\nu_{\max},\nu_{\min}}$$
where $s_{\max}^2$ is the larger of the two sample variances
where $s_{\min}^2$ is the smaller of the two sample variances
$\nu_{\max}$ is the degrees of freedom of $s_{\max}^2$
$\nu_{\min}$ is the degrees of freedom of $s_{\min}^2$
$F$ is the $F$ distribution (Snedecor, American)
the p-value is given by
$$p = 2\Pr\left(F_{\nu_{\max},\nu_{\min}} > \frac{s_{\max}^2}{s_{\min}^2}\right)$$

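25 Example
For our example
$s_1^2 = 16$, $\nu_1 = 24$
$s_2^2 = 9$, $\nu_2 = 15$
so
$$\frac{s_{\max}^2}{s_{\min}^2} = \frac{16}{9} = 1.778$$
and $p = 2\Pr(F_{24,15} > 1.778)$
From Table A.4, page A.17, we have $1.778 < F_{24,15,0.10} \approx 1.90$, so that $\Pr(F_{24,15} > 1.778) > 0.10$ and $p > 0.20$
so at $\alpha = 0.05$ we fail to reject $H_0$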
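
As a rough numerical check on the table lookup, the tail probability can be approximated by integrating the $F$ density directly. This is only a sketch: the function names are hypothetical, Simpson's rule is a stand-in for a statistical library, and the code assumes the numerator degrees of freedom exceed 2 so that the density is zero at the origin.

```python
from math import exp, lgamma, log


def f_density(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0.0:
        return 0.0  # correct at x = 0 only when d1 > 2 (assumed here)
    log_beta = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
    log_f = (0.5 * d1 * log(d1 / d2)
             + (0.5 * d1 - 1.0) * log(x)
             - 0.5 * (d1 + d2) * log(1.0 + d1 * x / d2)
             - log_beta)
    return exp(log_f)


def f_upper_tail(f_obs, d1, d2, steps=2000):
    """Pr(F > f_obs), as 1 minus a Simpson's-rule integral over [0, f_obs]."""
    h = f_obs / steps  # steps must be even for Simpson's rule
    total = f_density(0.0, d1, d2) + f_density(f_obs, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, d1, d2)
    return 1.0 - total * h / 3.0


def variance_ratio_test(s1_sq, df1, s2_sq, df2):
    """Larger-over-smaller variance ratio and its two-sided p-value."""
    if s1_sq >= s2_sq:
        ratio, d_max, d_min = s1_sq / s2_sq, df1, df2
    else:
        ratio, d_max, d_min = s2_sq / s1_sq, df2, df1
    return ratio, min(1.0, 2.0 * f_upper_tail(ratio, d_max, d_min))


ratio, p_value = variance_ratio_test(16.0, 24, 9.0, 15)
print(f"F = {ratio:.3f}, p = {p_value:.3f}")
```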

26 Paired samples
1. natural pairing: eyes, brothers, twins
2. experimental pairing
2.1 before-after
2.2 both placebo and new drug to same patient
2.3 matching - on neighbourhood and gender

27 example before-after measures of blood pressure after drug treatment Subject before after difference

28 inference
interested in difference
$H_0: \delta = \mu_1 - \mu_2 = 0$ against $H_A: \delta \ne 0$
Method:
1. compute differences
2. treat them as observations from $N(\delta, \sigma_\delta^2)$

29 computations with differences
confidence interval
$$\bar d \pm t_{n-1,\alpha/2}\sqrt{\frac{s_d^2}{n}} = 2.0 \pm t_{4,0.025}(0.548) = 2.0 \pm 2.776(0.548) = (0.479, 3.521)$$
the interval excludes 0, so at $\alpha = 0.05$ can reject $H_0$

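30 differences - hypothesis test
$$p = \Pr\left(\bar D \ge |\bar d| \text{ or } \bar D \le -|\bar d| \;\middle|\; H_0 \text{ true}\right) = 2\Pr\left(T_{n-1} > \frac{|\bar d - \delta_0|}{\sqrt{s_d^2/n}}\right) = 2\Pr(T_4 > 3.65)$$
$t_{4,0.010} = 3.747 > 3.65 > 2.776 = t_{4,0.025}$
so that $0.025 > \Pr(T_4 > 3.65) > 0.010$ and $0.05 > p > 0.02$
at $\alpha = 0.05$ can reject $H_0$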
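
The paired analysis can be sketched as below. The blood pressure readings are hypothetical, chosen only so that the differences reproduce the summary values used on the slides (mean difference 2.0, standard error 0.548, n = 5); they are not the lecture's data. The critical value 2.776 is the tabled $t_{4,0.025}$.

```python
from math import sqrt
from statistics import mean, variance


def paired_summary(before, after):
    """Mean difference, its standard error, and the number of pairs."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs), sqrt(variance(diffs) / n), n


# Hypothetical readings (not the lecture's table); differences are 1, 1, 2, 2, 4
before = [141.0, 136.0, 152.0, 147.0, 139.0]
after = [140.0, 135.0, 150.0, 145.0, 135.0]

d_bar, se, n = paired_summary(before, after)
t_crit = 2.776                      # t_{4, 0.025} from a table
lower, upper = d_bar - t_crit * se, d_bar + t_crit * se
t_stat = d_bar / se                 # test of H0: delta = 0
print(f"d = {d_bar:.1f}, se = {se:.3f}, CI = ({lower:.3f}, {upper:.3f}), t = {t_stat:.2f}")
```

Working with the differences removes the between-subject variation, which is why the standard error here is so much smaller than in an independent-samples analysis of the same readings.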

31 Wrong approach
1. assume there are two independent samples
2. calculate statistics given earlier
3. loses power
example: using $\bar x_1 = 94.0$, $s_1^2 = 45.5$ and $\bar x_2 = 92.0$ with its sample variance $s_2^2$
95% confidence interval for $\delta$: $(-8.5, 12.5)$
$p = 2\Pr(T_8 > 0.45) > 0.05$
We fail to reject the null, which the correct (more powerful) test does reject

32 linear interpolation
If we require a critical value $g_x$ for degrees of freedom $df_x$, when we have a table with critical value $g_1$ for $df_1$ and value $g_2$ for $df_2$, where $g_1 > g_2$ and $df_1 < df_x < df_2$

33 linear interpolation 2
then $g_x$ can be calculated by linear interpolation
1. $g_x = g_1 - \dfrac{n_1}{n}(g_1 - g_2)$ where $n_1 = df_x - df_1$ and $n = df_2 - df_1$
OR
2. $g_x = g_2 + \dfrac{n_2}{n}(g_1 - g_2)$ where $n_2 = df_2 - df_x$ and $n = df_2 - df_1$

34 linear interpolation 3
or by linearly weighted average
1. $g_x = \dfrac{n_2 g_1 + n_1 g_2}{n}$ where $n_1 = df_x - df_1$, $n_2 = df_2 - df_x$ and $n = n_1 + n_2$
2. in terms of original values, $g_x = \dfrac{(df_2 - df_x)g_1 + (df_x - df_1)g_2}{df_2 - df_1}$

35 Example
From earlier in this lecture we wish $g_{37,0.05}$, ie $df_x = 37$
from table A.3, we know that
for $df_1 = 35$, $g_1 = g_{35,0.05} = 1.690$
and for $df_2 = 40$, $g_2 = g_{40,0.05} = 1.684$

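36 solutions
1. $g_x = 1.690 - \frac{2}{5}(1.690 - 1.684) = 1.690 - 0.4(0.006) = 1.690 - 0.0024 = 1.6876$
2. $g_x = 1.684 + \frac{3}{5}(1.690 - 1.684) = 1.684 + 0.6(0.006) = 1.684 + 0.0036 = 1.6876$
3. $g_x = \dfrac{3(1.690) + 2(1.684)}{5} = \dfrac{8.438}{5} = 0.2(8.438) = 1.6876$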
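
The weighted-average form of the interpolation translates directly into code. The function name is an illustrative choice; the table entries are the ones quoted in the example.

```python
def interpolate_critical_value(df_x, df1, g1, df2, g2):
    """Linear interpolation of a tabled critical value between df1 and df2."""
    if not df1 <= df_x <= df2:
        raise ValueError("df_x must lie between df1 and df2")
    # Weights are the distances to the opposite table entries
    return ((df2 - df_x) * g1 + (df_x - df1) * g2) / (df2 - df1)


g_x = interpolate_critical_value(37, 35, 1.690, 40, 1.684)
print(f"g_x = {g_x:.4f}")
```

All three forms on the slides give the same number; the weighted average is convenient in code because it needs no sign conventions.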
