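STAT 503 Two Sample Inferences
C. Gu Fall 08

Slide 1: Comparing Two Variances

Assume independent normal populations.

For $\Sigma_1 \sim \chi^2_{\nu_1}$ and $\Sigma_2 \sim \chi^2_{\nu_2}$ independent, the ratio

$$\frac{\Sigma_1/\nu_1}{\Sigma_2/\nu_2}$$

follows an F-distribution with degrees of freedom $(\nu_1, \nu_2)$.

[Figure: F-distributions with df = (3,5) and df = (5,3).]

By definition of the F-distribution, one has $F_{.025,3,5} = 1/F_{.975,5,3}$.

Since $(n_1-1)s_1^2/\sigma_1^2 \sim \chi^2_{n_1-1}$ and $(n_2-1)s_2^2/\sigma_2^2 \sim \chi^2_{n_2-1}$, one has

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}.$$

To test the hypotheses $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$, calculate $F = s_1^2/s_2^2$ and reject $H_0$ when $F > F_{1-\alpha/2,\,n_1-1,\,n_2-1}$ or $F < F_{\alpha/2,\,n_1-1,\,n_2-1}$.

Slide 2: CI For Variance Ratio

Amine serotonin levels were measured from heart disease patients (D) and a control group (C).

                 D       C
    n            8      12
    $\bar X$  3840    5310
    s          850     640

For $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$, $F = 850^2/640^2 = 1.764$, which is between $F_{.975,7,11} = 3.76$ and $F_{.025,7,11} = 1/F_{.975,11,7} = .212$, so one accepts $H_0$.

For normal populations, with $\nu_1 = n_1-1$ and $\nu_2 = n_2-1$,

$$P\left(F_{.025,\nu_1,\nu_2} \le \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \le F_{.975,\nu_1,\nu_2}\right) = 0.95.$$

Solving for $\sigma_1^2/\sigma_2^2$, a 95% CI for the ratio is given by

$$\frac{s_1^2}{s_2^2}\,\bigl(F_{.025,\nu_2,\nu_1},\; F_{.975,\nu_2,\nu_1}\bigr).$$

A 95% CI for $\sigma_1^2/\sigma_2^2$ is given by $(1.764/3.76,\; 1.764/.212)$ or $(.469,\; 8.32)$, where $F_{.025,11,7} = 1/3.76$ and $F_{.975,11,7} = 1/.212$.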
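The F-test and the interval for the variance ratio in the serotonin example can be checked numerically. The following is a minimal sketch in R that assumes only the summary statistics quoted above ($n_1 = 8$, $s_1 = 850$, $n_2 = 12$, $s_2 = 640$); the variable names are illustrative and do not come from the slides.

```r
# Summary statistics from the serotonin example (disease vs. control)
n1 <- 8;  s1 <- 850
n2 <- 12; s2 <- 640

# Observed variance ratio
F_stat <- s1^2 / s2^2

# Lower and upper 2.5% points of F with (n1 - 1, n2 - 1) df
lo_q <- qf(0.025, n1 - 1, n2 - 1)
hi_q <- qf(0.975, n1 - 1, n2 - 1)

# Two-sided test at the 5% level: reject if F falls outside (lo_q, hi_q)
reject <- (F_stat > hi_q) || (F_stat < lo_q)

# 95% CI for sigma1^2 / sigma2^2, obtained by inverting the probability statement
ci <- c(F_stat / hi_q, F_stat / lo_q)

# Reciprocal identity: the lower quantile equals 1 over the upper quantile
# with the degrees of freedom swapped
recip_ok <- isTRUE(all.equal(lo_q, 1 / qf(0.975, n2 - 1, n1 - 1)))

print(list(F = F_stat, quantiles = c(lo_q, hi_q), reject = reject, ci = ci))
```

Because `qf` keeps more digits than the rounded table values, the upper end of the interval can differ slightly from the hand calculation.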
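Slide 3: Comparing Two Means

By assumption or by CLT, $\bar X_1 \sim N(\mu_1, \sigma_1^2/n_1)$ and $\bar X_2 \sim N(\mu_2, \sigma_2^2/n_2)$, independent, so

$$(\bar X_1 - \bar X_2) \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right),$$

or equivalently,

$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1).$$

A 95% CI for $\mu_1 - \mu_2$ would be

$$(\bar X_1 - \bar X_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$

assuming $\sigma_1^2$ and $\sigma_2^2$ known.

Practical inferences concerning $\mu_1 - \mu_2$ with $\sigma_1^2$ and $\sigma_2^2$ unknown are based on variations of this distributional result.

Slide 4: Inferences With Pooled Variance

With normality and $\sigma_1^2 = \sigma_2^2 = \sigma^2$,

$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1),$$

and $\sigma^2$ can be estimated by

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)},$$

the pooled variance. Since

$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}} \sim t_\nu,$$

where $\nu = n_1 + n_2 - 2$, a CI for $(\mu_1 - \mu_2)$ is given by

$$(\bar X_1 - \bar X_2) \pm t_{1-\alpha/2,\,\nu}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

For the amine serotonin data,

$$\bar x_1 - \bar x_2 = 3840 - 5310 = -1470, \qquad s_p = \sqrt{\frac{7(850)^2 + 11(640)^2}{7 + 11}} = 729.$$

A 95% CI for $(\mu_1 - \mu_2)$ is given by $-1470 \pm 2.101(729)\sqrt{1/8 + 1/12}$ or $(-2169, -771)$, where $t_{.975,18} = 2.101$.

For $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$,

$$t = \frac{3840 - 5310}{729\sqrt{1/8 + 1/12}} = -4.42.$$

As $|t| = 4.42 > 2.101$, one rejects $H_0$ at the 5%-level. The p-value is given by $P(|t_{18}| > 4.42) = .00033$.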
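The pooled-variance calculation for the serotonin data can be reproduced from the summary statistics alone. This is a sketch in R under the equal-variance assumption stated above; the variable names are illustrative, and results may differ from the hand calculation in the last digit because no intermediate rounding is done.

```r
# Summary statistics from the serotonin example
n1 <- 8;  xbar1 <- 3840; s1 <- 850
n2 <- 12; xbar2 <- 5310; s2 <- 640

df   <- n1 + n2 - 2
# Pooled standard deviation
sp   <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)
# Standard error of the difference in sample means
se   <- sp * sqrt(1 / n1 + 1 / n2)
diff <- xbar1 - xbar2

# 95% CI for mu1 - mu2
tcrit <- qt(0.975, df)
ci    <- diff + c(-1, 1) * tcrit * se

# Two-sided t-test of H0: mu1 = mu2
tstat <- diff / se
pval  <- 2 * pt(-abs(tstat), df)

print(list(sp = sp, ci = ci, t = tstat, p.value = pval))
```

With raw observations available, `t.test(x, y, var.equal=TRUE)` would carry out the same steps directly.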
Slide 5: Inferences With Unequal Variances

For small sample normal data with unequal variances, one has

$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \;\overset{\text{approx.}}{\sim}\; t_\nu,$$

where the df $\nu$ is given by

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}.$$

Inferences concerning $(\mu_1 - \mu_2)$ can be conducted accordingly.

In general, $\nu < n_1 + n_2 - 2$. When $n_1 = n_2 = n$ and $s_1 = s_2$, $\nu = 2(n-1)$.

For the amine serotonin data,

$$\bar x_1 - \bar x_2 = 3840 - 5310 = -1470, \qquad \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{850^2}{8} + \frac{640^2}{12}} = 352.8,$$

with the df given by

$$\nu = \frac{352.8^4}{\left(\frac{850^2}{8}\right)^2\left(\frac{1}{7}\right) + \left(\frac{640^2}{12}\right)^2\left(\frac{1}{11}\right)} = 12.18.$$

Thus a 95% CI for $\mu_1 - \mu_2$ is given by $-1470 \pm 2.175(352.8) = (-2237, -703)$, where $t_{.975,12.18} = 2.175$. qt(...) does take fractional df.

Slide 6: Wilcoxon Rank-Sum (Mann-Whitney) Test

When normality is of concern for small samples, one may use a Wilcoxon Rank-Sum test, also known as the Mann-Whitney test.

Consider samples $X_i$ of size $m$ and $Y_j$ of size $n$.

1. Merge the $m + n$ data points and rank the merged data.
2. Add the ranks of $X_i$ to get $W$.

The test is designed to test $H_0: P(X < Y) = P(X > Y)$.

Large $W$ signals large $P(X > Y)$.

Equivalently, one may list the $mn$ pairs of $x$-$y$ and tally the number of times $x_i > y_j$ to obtain $U$.

Radial lengths were measured of red and green morphs of a species of sea star (ranks in the merged data in parentheses).

    red     108 (8)    64 (2)    80 (3)    92 (4)     40 (1)
    green   102 (6)   116 (9)    98 (5)   132 (11)   104 (7)   124 (10)

So $W = 1 + 2 + 3 + 4 + 8 = 18$. By Table C.8, under $H_0$,

$$P(W \le 20) = P(W \ge 40) = .041, \qquad P(W \le 18) = P(W \ge 42) = .015.$$

So the p-value is in $(.015, .041)$.
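Two of the numerical claims on this page can be checked without tables: the Welch degrees of freedom for the serotonin data, and the null tail probabilities of the rank sum $W$ for group sizes 5 and 6. The R sketch below relies only on the summary statistics and group sizes. The variable names are illustrative, and enumerating all rank subsets is one possible way to get the null distribution, assuming no ties.

```r
## Welch (unequal variance) interval from summary statistics
n1 <- 8;  s1 <- 850
n2 <- 12; s2 <- 640
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)
# Approximate df; in general not an integer
nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
tcrit <- qt(0.975, nu)            # qt accepts fractional df
ci <- (3840 - 5310) + c(-1, 1) * tcrit * se

## Exact null distribution of the rank sum W for m = 5, n = 6
m <- 5; n <- 6
# Under H0 every m-subset of the ranks 1..(m + n) is equally likely
W_all   <- combn(m + n, m, FUN = sum)
p_le_18 <- mean(W_all <= 18)
p_le_20 <- mean(W_all <= 20)

# Cross-check against the Mann-Whitney count U = W - m(m + 1)/2
p_check <- pwilcox(18 - m * (m + 1) / 2, m, n)

print(list(se = se, nu = nu, ci = ci,
           p_le_18 = p_le_18, p_le_20 = p_le_20, p_check = p_check))
```

With $m + n = 11$ there are only `choose(11, 5)` = 462 subsets, so enumeration is cheap here; for larger samples `pwilcox` or a normal approximation is the practical route.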
Slide 7: Paired Data

Blocks of land were divided into two plots, and the plots were planted with two varieties of wheat. The yields follow.

                 Variety
    Block      1       2        d
    1        32.1    34.5    -2.4
    2        30.6    32.6    -2.0
    3        33.7    34.6    -0.9
    4        29.7    31.0    -1.3
    mean    31.53   33.18   -1.65
    SD       1.76    1.72    .676

Compare the mean yields of the two varieties.

Typical pairing: blocking designs, before-after studies, left-right organs, repeated measurements, etc.

Pairing effectively reduces background noise. If pairing is ignored, features may be swamped by background noise.

When done on irrelevant factors, pairing may yield loss of power.

Slide 8: Inference for Paired Data

Working on $d$, a 95% CI for the wheat yield difference is given by $-1.65 \pm 3.182(.676)/\sqrt{4}$ or $(-2.73, -0.57)$, where $t_{.975,3} = 3.182$. Variety 2 appears to yield significantly more.

Ignoring pairing,

$$s_p = \sqrt{\frac{3(1.76)^2 + 3(1.72)^2}{3 + 3}} = 1.74,$$

thus $s_p\sqrt{1/4 + 1/4} = 1.23$, so a 95% CI is $-1.65 \pm 2.447(1.23)$ or $(-4.66, 1.36)$, where $t_{.975,6} = 2.447$. The result is inconclusive.

Note that $\hat\sigma_{\bar d} = .338$ and $\hat\sigma_{\bar X_1 - \bar X_2} = 1.23$, a 4-fold difference. $\sigma_{\bar X_1 - \bar X_2}$ includes block to block variability.

Pairing results in a loss of df: compare 3.182 with 2.447. When pairing is ineffective, say $\sigma_{\bar d} \approx \sigma_{\bar X_1 - \bar X_2}$, this would yield loss of power. This is negligible for larger sample sizes, though.

Sign test or signed-rank test may be used on differences.
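The contrast between the paired and the unpaired analysis of the wheat yields can be reproduced with `t.test`. This R sketch uses the four block yields from the table; the vector names `v1`, `v2` and `d` are illustrative. Small differences from the hand calculation are expected because the slides round the SDs before pooling.

```r
# Wheat yields by block for the two varieties
v1 <- c(32.1, 30.6, 33.7, 29.7)
v2 <- c(34.5, 32.6, 34.6, 31.0)
d  <- v1 - v2

# Paired analysis: one-sample t procedure on the differences (3 df)
paired <- t.test(v1, v2, paired = TRUE)

# Analysis ignoring pairing: pooled-variance two-sample t (6 df)
pooled <- t.test(v1, v2, var.equal = TRUE)

# Estimated standard errors of the mean difference under each analysis
se_paired <- sd(d) / sqrt(length(d))
se_pooled <- unname(pooled$stderr)

print(paired$conf.int)
print(pooled$conf.int)
print(c(se_paired = se_paired, se_pooled = se_pooled))
```

The paired interval excludes zero while the pooled interval does not, which is the point of the comparison: the block-to-block variability inflates the unpaired standard error.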
Slide 9: CIs and Tests in R

In R, one may use utilities such as t.test(...) to calculate various tests and the associated CIs.

    x <- rnorm(30); y <- rnorm(20, mean=1)   ## simulated data; sizes and means are illustrative
    t.test(x); t.test(y, mu=1, alternative="less"); t.test(x, alternative="greater")
    t.test(x, y)                             ## unequal var with approximate df
    t.test(x, y, var.equal=TRUE)             ## with pooled var est
    t.test(x, y, conf.level=.9)              ## 90% CI
    t.test(x[1:20], y, paired=TRUE); t.test(x[1:20]-y)   ## paired t-test, two equivalent forms
    var.test(x, y, ratio=1)                  ## F-test for var ratio
    wilcox.test(y, mu=1, alternative="less") ## signed-rank
    wilcox.test(x, y, conf.int=TRUE)         ## rank-sum (Mann-Whitney)
    wilcox.test(x[1:20], y, paired=TRUE)     ## paired signed-rank