Midterm 1 and 2 results

            Midterm 1   Midterm 2
---------------------------------
Min.           40.00       20.00
1st Qu.        60.00       60.00
Median         75.00       70.00
Mean           71.97       69.77
3rd Qu.        85.00       85.00
Max.          100.00       95.00
SD             14.75       17.35

n = 66

© Mikyoung Jun (Texas A&M) stat211 lecture 17 April 7, 2011

Pooled t-test on midterm means

Suppose we are wondering whether $\mu_1 = \mu_2$ (i.e. $\mu_1 - \mu_2 = 0$).
The difference in the sample means is 2.2.
The sample standard deviations are not very different.
Assumptions:
  - Midterm 1 and Midterm 2 results are independent
  - Sample size is not large
  - Unknown but equal variances

Pooled t-test on midterm means

Hypotheses: $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
Pooled variance (with $s_1^2 = 217.60$ and $s_2^2 = 301.10$):
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{65(217.60) + 65(301.10)}{130} = 259.35$$
Test statistic:
$$t = \frac{71.97 - 69.77 - 0}{\sqrt{259.35\left(\frac{1}{66} + \frac{1}{66}\right)}} = 0.78$$
Critical value: if $\alpha = 0.05$, $t_{0.025,\,66+66-2} = 1.978$
Since $|t| < 1.978$, we DO NOT reject $H_0$.

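The pooled calculation can be reproduced from the summary table. The following is a minimal sketch, assuming the rounded means and standard deviations from the table and taking the critical value as quoted on the slide; the variable names are illustrative only.

```python
import math

# Summary statistics from the midterm table (rounded values from the slides).
n1, mean1, sd1 = 66, 71.97, 14.75
n2, mean2, sd2 = 66, 69.77, 17.35

# Pooled variance: average of the two sample variances, weighted by df.
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)

# Test statistic for H0: mu1 - mu2 = 0.
t_pooled = (mean1 - mean2 - 0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided critical value t_{0.025,130}, copied from the slide (not computed).
t_crit = 1.978
reject_h0 = abs(t_pooled) > t_crit

print(f"sp^2 = {sp2:.2f}, t = {t_pooled:.2f}, reject H0: {reject_h0}")
```

Because the two samples have the same size, the pooled variance here is just the plain average of the two sample variances.
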
Unpooled t-test on midterm means

What if we assume that the variances are not equal?
Hypotheses: $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{71.97 - 69.77 - 0}{\sqrt{\frac{217.60}{66} + \frac{301.10}{66}}} = 0.78$$
Degrees of freedom:
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = 126.72 \approx 127$$
Critical value: if $\alpha = 0.05$, $t_{0.025,\,127} = 1.979$
We DO NOT reject $H_0$.

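The unpooled statistic and its approximate degrees of freedom can be checked the same way. This is a sketch under the assumption that the rounded standard deviations from the summary table are used, so the last digits may differ slightly from the slide; the variable names are illustrative.

```python
import math

# Summary statistics from the midterm table (rounded values from the slides).
n1, mean1, sd1 = 66, 71.97, 14.75
n2, mean2, sd2 = 66, 69.77, 17.35

# Per-sample variance of the mean, s_i^2 / n_i.
v1 = sd1**2 / n1
v2 = sd2**2 / n2

# Unpooled test statistic for H0: mu1 - mu2 = 0.
t_unpooled = (mean1 - mean2 - 0) / math.sqrt(v1 + v2)

# Approximate degrees of freedom (the formula for nu on the slide).
nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Two-sided critical value t_{0.025,127}, copied from the slide (not computed).
t_crit = 1.979
reject_h0 = abs(t_unpooled) > t_crit

print(f"t = {t_unpooled:.2f}, nu = {nu:.2f}, reject H0: {reject_h0}")
```

With equal sample sizes the unpooled and pooled standard errors coincide, so only the degrees of freedom differ between the two tests.
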
Midterm 1 and 2 results

[Figure: scatter plot of Midterm 2 score (vertical axis) against Midterm 1 score (horizontal axis), STAT211 507, Spring 2011]

Do you think the two scores are correlated?

Midterm 1 and 2 results

In the previous two tests (pooled and unpooled), we assumed that the two populations are independent.
Now that we see a positive correlation between the two, the independence assumption may not be realistic.
For some dependent data, there may be a third variable that connects the two data sets. We call such data paired data.
The previous data set is a good example of a paired data set.
How do we do a t-test when we have paired data?

Paired t-test for Midterm scores

Suppose we test $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 \neq 0$.
The differences between the Midterm 1 and Midterm 2 scores for each student:

 20  10  25   0  -5  10 -15  15  10   0 -10
 15  10   0  15  -5   0   0 -20  10  20  30
  0   0 -15   5   0   0  -5  25 -10  10  15
 25   5 -15  -5   0  -5  20 -20   5   5  -5
-10 -10 -25  -5   0  -5 -30   0  15 -10  25
 20 -10 -20   0  40  -5  -5  10 -15  10   5

The sample mean of the differences is 2.2 and the sample standard deviation is 14.04.
Test statistic:
$$t = \frac{2.2 - 0}{14.04/\sqrt{66}} = 1.27$$
Critical value: $t_{\alpha/2,\,66-1}$; if $\alpha = 0.05$, $t_{0.025,\,65} = 1.997$
We DO NOT reject $H_0$.

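The paired statistic can be recomputed directly from the 66 listed differences. The sketch below assumes the differences are read off the slide in order and uses an approximate critical value for $t_{0.025,\,65}$ entered by hand rather than computed.

```python
import math
import statistics

# Midterm 1 minus Midterm 2 for each of the 66 students, as listed on the slide.
diffs = [
    20, 10, 25, 0, -5, 10, -15, 15, 10, 0, -10,
    15, 10, 0, 15, -5, 0, 0, -20, 10, 20, 30,
    0, 0, -15, 5, 0, 0, -5, 25, -10, 10, 15,
    25, 5, -15, -5, 0, -5, 20, -20, 5, 5, -5,
    -10, -10, -25, -5, 0, -5, -30, 0, 15, -10, 25,
    20, -10, -20, 0, 40, -5, -5, 10, -15, 10, 5,
]

n = len(diffs)
d_bar = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)

# Paired t statistic for H0: mu1 - mu2 = 0, with n - 1 degrees of freedom.
t_paired = (d_bar - 0) / (s_d / math.sqrt(n))

# Approximate two-sided critical value t_{0.025,65}, entered by hand.
t_crit = 1.997
reject_h0 = abs(t_paired) > t_crit

print(f"n = {n}, mean = {d_bar:.2f}, sd = {s_d:.2f}, t = {t_paired:.2f}")
```

The paired test works on a single sample of differences, which is why the degrees of freedom are 65 rather than 130.
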
Exercise: The effect of Prozac

Suppose you wish to test the effect of Prozac on the well-being of depressed individuals, using a standardized well-being scale that sums Likert-type items to obtain a score that could range from 0 to 20. Higher scores indicate greater well-being.

Exercise: The effect of Prozac

We want to see whether Prozac has a positive effect on well-being.
The sample average of moodpre is 3.3333 and that of moodpost is 7.0000.
The sample average of the differences (moodpre minus moodpost) is $-3.6667$ and the sample standard deviation is 3.5000.
A confidence interval for $\mu_1 - \mu_2$ with confidence level $100(1 - \alpha)\%$ is
$$-3.6667 \pm t_{\alpha/2,\,8} \cdot \frac{3.5}{\sqrt{9}}$$
What test should we use?
  - Pooled t-test
  - Unpooled t-test
  - Paired t-test

Exercise: The effect of Prozac

Test statistic (differences taken as moodpre minus moodpost):
$$t = \frac{-3.6667 - 0}{3.5/\sqrt{9}} = -3.143$$
Hypotheses: $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 < 0$
Suppose we test at significance level 5%: $t_{0.05,\,8} = 1.86$, $t_{0.025,\,8} = 2.3$
Do we reject $H_0$?
For $T \sim t(\mathrm{df} = 8)$, $P(T < -3.143) = 0.0069$. What is the P-value?

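As a check on the decision, here is a minimal sketch of the one-sided paired test from the summary statistics. It assumes the differences are taken as moodpre minus moodpost (so a positive drug effect gives a negative mean difference) and that the sample size is 9, as implied by $\sqrt{9}$ and the 8 degrees of freedom; the critical value is the slide's $t_{0.05,\,8}$.

```python
import math

n = 9             # number of paired observations (implied by sqrt(9), df = 8)
d_bar = -3.6667   # mean of (moodpre - moodpost); sign convention is assumed
s_d = 3.5         # sample standard deviation of the differences

# Paired t statistic for H0: mu1 - mu2 = 0.
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

# Lower-tailed test at the 5% level: reject when t < -t_{0.05,8}.
t_crit = 1.86     # t_{0.05,8}, copied from the slide (not computed)
reject_h0 = t_stat < -t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because the alternative is one-sided, the relevant critical value is $t_{0.05,\,8}$ rather than $t_{0.025,\,8}$.
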
Two-sample z test (normal population with known variance)

Suppose $X_1, \ldots, X_{n_1}$ is a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$.
Suppose $Y_1, \ldots, Y_{n_2}$ is a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$.
Suppose the two samples are independent.
A confidence interval for $\mu_1 - \mu_2$ with level $100(1 - \alpha)\%$ is
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When $H_0: \mu_1 - \mu_2 = 0$, the test statistic is
$$z = \frac{\bar{x} - \bar{y} - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
The rest is the same as in Topic 7.

Two-sample z test (large sample)

Suppose $X_1, \ldots, X_{n_1}$ is a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$ (unknown).
Suppose $Y_1, \ldots, Y_{n_2}$ is a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$ (unknown).
Suppose $n_1$ and $n_2$ are large and the two samples are independent.
A confidence interval for $\mu_1 - \mu_2$ with level $100(1 - \alpha)\%$ is
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
When $H_0: \mu_1 - \mu_2 = 0$, the test statistic is
$$z = \frac{\bar{x} - \bar{y} - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
The rest is the same as in Topic 7.

Two-sample test on population proportions (large sample)

Suppose we have two independent samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$.
Suppose $m$ and $n$ are large.
$p_1$ is the population proportion for $X_1, \ldots, X_m$ and $p_2$ is the population proportion for $Y_1, \ldots, Y_n$.
A confidence interval for $p_1 - p_2$ with level $100(1 - \alpha)\%$ is
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{m} + \frac{\hat{p}_2\hat{q}_2}{n}}$$
When $H_0: p_1 - p_2 = 0$, the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}, \qquad \hat{p} = \frac{m}{m+n}\hat{p}_1 + \frac{n}{m+n}\hat{p}_2, \quad \hat{q} = 1 - \hat{p}$$

Exercise 1

The level of lead in the blood was determined for a sample of 152 male hazardous-waste workers age 20-30 and also for a sample of 86 female workers.
Male: the sample average is 5.5 and the sample standard deviation is 0.3.
Female: the sample average is 3.8 and the sample standard deviation is 0.2.
What is a 95% CI for the difference between the average blood lead levels for males and females?
$z_{0.025} = 1.96$, $z_{0.05} = 1.64$
Answer: (1.64, 1.76)

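The interval in this exercise follows the large-sample formula for a difference of means. A minimal sketch, using only the summary statistics stated in the exercise and Python's standard-library normal distribution (variable names are illustrative):

```python
import math
from statistics import NormalDist

# Summary statistics from the exercise.
n_m, mean_m, sd_m = 152, 5.5, 0.3   # male workers
n_f, mean_f, sd_f = 86, 3.8, 0.2    # female workers

# z_{0.025}: upper 2.5% point of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Standard error of the difference in sample means (large-sample form).
se = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)

diff = mean_m - mean_f
ci = (diff - z * se, diff + z * se)

print(f"95% CI for mu1 - mu2: ({ci[0]:.3f}, {ci[1]:.3f})")
```
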
Exercise 1

Someone claims that the difference is greater than 1.6. Test this claim at significance level 5%.
Hypotheses: $H_0: \mu_1 - \mu_2 = 1.6$, $H_a: \mu_1 - \mu_2 > 1.6$
Test statistic:
$$z = \frac{(5.5 - 3.8) - 1.6}{\sqrt{\frac{0.3^2}{152} + \frac{0.2^2}{86}}} = 3.08$$
What is the P-value? ($P(Z > 3.08) = 0.001$)
We reject $H_0$ at level 5%.
Can we conclude the same from a 95% CI?

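The test of the claim can be sketched as an upper-tailed z test with null value 1.6. The code below assumes the summary statistics from the exercise (152 males with mean 5.5 and SD 0.3, 86 females with mean 3.8 and SD 0.2); the name `delta0` for the hypothesized difference is illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics from the exercise.
n_m, mean_m, sd_m = 152, 5.5, 0.3   # male workers
n_f, mean_f, sd_f = 86, 3.8, 0.2    # female workers

delta0 = 1.6    # hypothesized difference under H0
alpha = 0.05

# Large-sample z statistic for H0: mu1 - mu2 = delta0.
se = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)
z_stat = ((mean_m - mean_f) - delta0) / se

# Upper-tailed P-value, since Ha is mu1 - mu2 > delta0.
p_value = 1 - NormalDist().cdf(z_stat)
reject_h0 = p_value < alpha

print(f"z = {z_stat:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The test here is one-sided at level 5%, so a two-sided 95% interval is the more conservative comparison: 1.6 falling below its lower limit already implies rejection in the one-sided test.
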
Exercise 2

A sample of 300 urban adult residents of a particular state revealed 63 who favored increasing the highway speed limit from 55 to 65 mph, whereas a sample of 180 rural residents yielded 75 who favored the increase.
Urban people: the sample proportion is 0.21.
Rural people: the sample proportion is 0.42.
What is a 95% CI for the difference of proportions? $(-0.30, -0.12)$
Now test $H_0: p_1 - p_2 = 0$ against $H_a: p_1 - p_2 < 0$.
Test statistic: $z = -4.9$
What is the P-value? ($P(Z < -4.9) = 0.00000048$)
Do you think a 99% CI should include zero?

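The two-proportion calculations can be sketched from the raw counts. This example assumes unrounded sample proportions (75/180 rather than 0.42), so the results can differ in the last digit from figures obtained with the rounded proportions; the helper `diff_ci` is a hypothetical name introduced for illustration.

```python
import math
from statistics import NormalDist

# Counts from the exercise: urban (sample 1) and rural (sample 2).
x1, m = 63, 300
x2, n = 75, 180
p1, p2 = x1 / m, x2 / n


def diff_ci(level):
    """Large-sample CI for p1 - p2 at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    return (p1 - p2 - z * se, p1 - p2 + z * se)


ci95 = diff_ci(0.95)
ci99 = diff_ci(0.99)

# Pooled proportion under H0: p1 - p2 = 0, then the z statistic.
p_pool = (x1 + x2) / (m + n)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
z_stat = (p1 - p2) / se0

# Lower-tailed P-value, since Ha is p1 - p2 < 0.
p_value = NormalDist().cdf(z_stat)

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f})")
print(f"z = {z_stat:.2f}, P-value = {p_value:.2e}")
```

Note that the interval uses the unpooled standard error, while the test uses the pooled proportion because $H_0$ states that the two proportions are equal.
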