Difference between means: t-test
Discussion Question: p. 492. Exercises 9-4, p. 492: 1-3, 6-8, 12. Assume the variances are not equal in every problem, and ignore the test for variance.
Students will perform hypothesis tests for two sample means using the t statistic. The test compares the means of two small, independent samples. There are two cases: the variances are equal, or the variances are not equal. We will assume the variances are not equal.
t-test: when variances are assumed to be equal
Test statistic = (observed value - expected value) / standard error
One sample: \( t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \), d.f. = n - 1
With two samples, things need adjusting:
\( t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \), d.f. = n1 + n2 - 2
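t-test: when variances are assumed to be equal
With two samples from populations with equal variances, the sample variances can be weighted according to sample size: the larger sample (more d.f.) is weighted more heavily than the smaller one. This creates a pooled variance, \( s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \), which takes the place of \( s^2 \) in the one-sample statistic \( t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \):
\( t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \)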
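A minimal Python sketch of the pooled statistic above, assuming only summary statistics (mean, standard deviation, size) are available. The function name `pooled_t` is hypothetical, and the income figures from the example later in this section are used purely to illustrate the arithmetic; the slides that follow recommend not pooling.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2, mu_diff=0.0):
    """Equal-variance (pooled) two-sample t statistic and its d.f.

    Hypothetical helper working from summary statistics only.
    """
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its own d.f.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((x1 - x2) - mu_diff) / se
    return t, df

# Illustration only: income data (M vs. L) from the example in this section.
t, df = pooled_t(45692, 1236, 8, 40416, 5324, 10)
print(f"t = {t:.4f}, d.f. = {df}")
```

Weighting by n - 1 rather than n is what makes the larger sample count for more, as the slide describes.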
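t-test: when variances are assumed to be not equal
Test statistic = (observed value - expected value) / standard error; for one sample, \( t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \), d.f. = n - 1
This time the formula is more familiar. It is the two-sample z statistic with each population variance \( \sigma^2 \) replaced by the sample variance \( s^2 \):
\( z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad\longrightarrow\quad t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \)*, d.f. = (smaller n) - 1
*Not really: (smaller n) - 1 is the book's conservative shortcut. The calculator uses the Welch-Satterthwaite approximation instead, which usually gives a non-integer d.f.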
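The footnote can be made concrete. Below is a minimal sketch that computes the unpooled t statistic together with both d.f. choices, assuming the calculator's d.f. comes from the Welch-Satterthwaite formula (it does reproduce the df = 10.19403292 shown on the TI-84 screen in the income example). The name `welch_t` is hypothetical.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2, mu_diff=0.0):
    """Unpooled two-sample t statistic with two d.f. choices."""
    v1, v2 = s1**2 / n1, s2**2 / n2        # squared standard errors
    t = ((x1 - x2) - mu_diff) / math.sqrt(v1 + v2)
    df_book = min(n1, n2) - 1              # conservative: (smaller n) - 1
    # Welch-Satterthwaite approximation (usually not an integer)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df_book, df_welch

# Income example: Sample M vs. Sample L
t, df_book, df_welch = welch_t(45692, 1236, 8, 40416, 5324, 10)
print(f"t = {t:.4f}, book d.f. = {df_book}, Welch d.f. = {df_welch:.4f}")
```

The Welch d.f. always lies between (smaller n) - 1 and n1 + n2 - 2, which is why the book's choice is the conservative one.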
t-test: when variances are assumed to be not equal
Now that I have told you all that, here is the point: do not bother to pool the variance. First, we have not done the test for homogeneity of variance; second, the gain from pooling is not worth the work.
Confidence Interval
As with the earlier hypothesis tests, the t statistic can also be used to construct confidence intervals. We find the interval within which we expect to find the difference between the population means. With the t statistic, we will always assume the variances are not equal, and we will NOT pool variances.
Confidence Interval
Since we assume the population variances are not equal, the interval is
\( (\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \)
Your book uses d.f. = (smaller n) - 1, but I prefer you not. We will let the calculator find the degrees of freedom for us.
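The interval formula translates directly into a few lines of Python. This is a minimal sketch in which the critical value \( t_{\alpha/2} \) is passed in (as you would read it from invT), so no t-distribution code is needed; `welch_ci` is a hypothetical name.

```python
import math

def welch_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """Unpooled interval for mu1 - mu2, given t_crit = t_{alpha/2}."""
    diff = x1 - x2
    margin = t_crit * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

# Income example with the book's d.f. = 7, t_crit = invT(.975, 7) = 2.3646
low, high = welch_ci(45692, 1236, 8, 40416, 5324, 10, 2.3646)
print(f"({low:.2f}, {high:.2f})")
```

Swapping in the calculator's critical value (from d.f. = 10.19) narrows the interval, since a larger d.f. gives a smaller \( t_{\alpha/2} \).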
Steps for Hypothesis Testing
1. Formulate all hypotheses: H0, Ha.
2. Determine the test statistic and the critical value of the test statistic (based on α) that will assess the evidence against the null hypothesis. (Here, t.)
2a. Draw, label, and appropriately shade the curve representing H0.
3. Find the value of the test statistic from the data, and the p-value for that statistic.
4. Decide whether to reject or fail to reject H0.
5. Tell someone about your conclusion.
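Steps 2 and 3 need t critical values (invT) and tail areas (tcdf). The calculator supplies both; as a rough cross-check, here is a minimal stdlib-only sketch that integrates the t density with Simpson's rule and inverts it by bisection. This is a swapped-in numerical method, not how the TI-84 computes these values, and `t_pdf`, `t_upper_tail`, and `inv_t` are hypothetical names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), approximately tcdf(t, 99, df), via Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3            # P(T > |t|), by symmetry
    return tail if t >= 0 else 1.0 - tail

def inv_t(area, df):
    """Like invT(area, df) for area > .5: the t with P(T < t) = area."""
    lo, hi = 0.0, 50.0
    for _ in range(60):                   # bisection on the CDF
        mid = (lo + hi) / 2
        if 1.0 - t_upper_tail(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"invT(.975, 7)     = {inv_t(0.975, 7):.4f}")
print(f"invT(.975, 10.19) = {inv_t(0.975, 10.19):.4f}")
print(f"tcdf(3.033, 99, 7) = {t_upper_tail(3.033, 7):.4f}")
```

Bisection is slow but simple and always converges, since the t CDF is increasing.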
Example
A study was conducted to compare college majors requiring 12 or more hours of mathematics with majors requiring less than 12 hours of mathematics. The mean income of 8 majors requiring 12 or more hours of math was $45,692 per year, with a standard deviation of $1,236. The mean income of 10 majors requiring less than 12 hours of math was $40,416, with a standard deviation of $5,324. Test for a difference in the incomes at the .05 level.
Sample M: x̄ = 45692, s = 1236, n = 8
Sample L: x̄ = 40416, s = 5324, n = 10
TI-84
We will conduct a two-tailed 2-sample t-test: STAT > TESTS > 4:2-SampTTest
Input: Inpt: Stats, x̄1: 45692, Sx1: 1236, n1: 8, x̄2: 40416, Sx2: 5324, n2: 10, µ1: ≠µ2, Pooled: No
Result: µ1 ≠ µ2, t = 3.033256297, p = .0123463074, df = 10.19403292, x̄1 = 45692, x̄2 = 40416, n1 = 8, n2 = 10
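To see where the calculator's numbers come from, here is a minimal sketch of an unpooled two-sample t-test from summary statistics, assuming the TI-84 uses the Welch-Satterthwaite d.f. and the exact t distribution. The tail area is found by Simpson's-rule integration of the t density (a swapped-in numerical method); `two_samp_t_test` and its `alternative` parameter are hypothetical names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1.0 - tail

def two_samp_t_test(x1, s1, n1, x2, s2, n2, alternative="two-sided"):
    """Unpooled (Pooled: No) test of mu1 - mu2 = 0."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    if alternative == "two-sided":        # mu1 != mu2
        p = 2 * t_upper_tail(abs(t), df)
    elif alternative == "greater":        # mu1 > mu2
        p = t_upper_tail(t, df)
    else:                                 # "less": mu1 < mu2
        p = 1.0 - t_upper_tail(t, df)
    return t, p, df

t, p, df = two_samp_t_test(45692, 1236, 8, 40416, 5324, 10)
print(f"t = {t:.6f}, p = {p:.6f}, df = {df:.6f}")
```

If the assumptions hold, the printed values should agree with the calculator screen above (t ≈ 3.0333, p ≈ .0123, df ≈ 10.194).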
1. Hypotheses (means)
H0: µM = µL and H1: µM ≠ µL, or equivalently H0: µM - µL = 0 and H1: µM - µL ≠ 0
(M = 12 or more hours of math, L = less than 12 hours of math)
2. Test Statistic and Critical Value (if you choose the book's d.f.)
Since our samples are small and we do not have the population σ, we will use the t statistic.
α = .05, α/2 = .025, d.f. = 8 - 1 = 7
t_c = invT(.975, 7) = 2.3646
[Figure: t curve with critical values -2.3646 and 2.3646; 95% in the middle, .025 in each tail]
2. Test Statistic and Critical Value (if you choose the calculator's d.f.)
Since our samples are small and we do not have the population σ, we will use the t statistic with the d.f. provided by the calculator.
α = .05, α/2 = .025, d.f. = 10.19
t_c = invT(.975, 10.19) = 2.2225
[Figure: t curve with critical values -2.2225 and 2.2225; 95% in the middle, .025 in each tail]
3. Calculate t and the p-value
\( t = \dfrac{(45692 - 40416) - 0}{\sqrt{\dfrac{1236^2}{8} + \dfrac{5324^2}{10}}} = 3.033 \)
p-value: P(t > 3.033) = P(x̄1 - x̄2 > 5276), doubled because the test is two-tailed.
If you choose the book's small d.f., use tcdf to find the p-value: d.f. = n - 1 = 8 - 1 = 7, tcdf(3.033, 99, 7) = .0095, and .0095 × 2 = .0190.
With the calculator's d.f. = 10.1940: 2-SampTTest (Stats: 45692, 1236, 8, 40416, 5324, 10, µ1 ≠ µ2, Pooled: No) gives p = .0123.
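The hand calculation with the book's d.f. can be reproduced step by step. This minimal sketch computes the difference in means, the standard error, t, and the doubled tail area for d.f. = 7; the tail area comes from Simpson's-rule integration of the t density, a stand-in assumption for the calculator's tcdf, and `t_pdf`/`t_upper_tail` are hypothetical names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), standing in for tcdf(t, 99, df)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1.0 - tail

x_m, s_m, n_m = 45692, 1236, 8       # Sample M
x_l, s_l, n_l = 40416, 5324, 10      # Sample L

diff = x_m - x_l                                   # observed difference
se = math.sqrt(s_m**2 / n_m + s_l**2 / n_l)        # unpooled standard error
t = (diff - 0) / se                                # expected difference is 0 under H0
df_book = min(n_m, n_l) - 1                        # (smaller n) - 1 = 7
one_tail = t_upper_tail(t, df_book)
p_two = 2 * one_tail                               # two-tailed test
print(f"diff = {diff}, t = {t:.4f}, one tail = {one_tail:.4f}, p = {p_two:.4f}")
```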
4. Decision
3.033 > 2.3646 (2.2225) and .019 < .05 (.0123 < .05), so we reject the null hypothesis.
5. Conclusion
There is sufficient evidence to suggest there is a difference in the mean incomes for majors requiring 12 or more hours of math and those requiring less than 12 hours of math.
[Figure: t curve with critical values ±2.3646 (±2.2225) and test statistic ±3.033; 95% in the middle, .025 beyond each critical value, .0095 beyond each ±3.033]
Confidence Interval: TI-84
STAT > TESTS > 0:2-SampTInt
Input: Inpt: Stats, x̄1: 45692, Sx1: 1236, n1: 8, x̄2: 40416, Sx2: 5324, n2: 10, C-Level: .95, Pooled: No, Calculate
Result (2-SampTInt): (1410.4, 9141.6), x̄1 = 45692, x̄2 = 40416, Sx1 = 1236, Sx2 = 5324, n1 = 8, n2 = 10
Based on data from our samples of 8 and 10, we are 95% confident that the true difference in mean incomes is between $1410.40 and $9141.60.
Confidence Interval
Since we concluded there is evidence of a difference in the mean incomes, how large is that difference?
X̄M - X̄L = 45692 - 40416 = 5276
d.f. = 10.19: \( 5276 \pm 2.2224\sqrt{\dfrac{1236^2}{8} + \dfrac{5324^2}{10}} \), giving (1410.4, 9141.6)
d.f. = 7: \( 5276 \pm 2.3646\sqrt{\dfrac{1236^2}{8} + \dfrac{5324^2}{10}} \), giving (1163.05, 9388.95)
Based on data from our samples of 8 and 10, we are 95% confident that the true difference in mean incomes is between $1410 and $9142 (between $1163 and $9389 with d.f. = 7).
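The two intervals can be reproduced end to end, including the critical values. This minimal sketch assumes the calculator's 2-SampTInt (Pooled: No) uses the Welch-Satterthwaite d.f.; the t quantile is found by bisection on a Simpson's-rule t CDF, a swapped-in numerical method, and all function names are hypothetical.

```python
import math

def t_pdf(x, df):
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def inv_t(area, df):
    """Like invT(area, df) for area > .5, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1.0 - t_upper_tail(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def samp_t_int(x1, s1, n1, x2, s2, n2, level, df=None):
    """Unpooled CI for mu1 - mu2; df=None means Welch-Satterthwaite."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    if df is None:
        df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t_crit = inv_t(1 - (1 - level) / 2, df)   # e.g. invT(.975, df)
    margin = t_crit * math.sqrt(v1 + v2)
    return (x1 - x2) - margin, (x1 - x2) + margin

lo_w, hi_w = samp_t_int(45692, 1236, 8, 40416, 5324, 10, 0.95)
lo_7, hi_7 = samp_t_int(45692, 1236, 8, 40416, 5324, 10, 0.95, df=7)
print(f"Welch d.f.: ({lo_w:.1f}, {hi_w:.1f})   d.f. = 7: ({lo_7:.2f}, {hi_7:.2f})")
```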
Another Example
Two groups were randomly assigned to two speed-reading classes that use different teaching techniques. Test the claim that the Orozco Oral Method gives better results than the VanVoorhis Vision Method. Use α = .01.
Orozco Oral: n = 16, x̄ = 44.0, s = 13.2
VV Vision: n = 12, x̄ = 36.5, s = 10.2
1. Hypotheses
H0: µO ≤ µVV; H1: µO > µVV, or equivalently H0: µO - µVV ≤ 0; H1: µO - µVV > 0
(O = Orozco, VV = VanVoorhis)
2. Test Statistic and Critical Value
The samples are small and σ is unknown, so we will use the t statistic. We will run a one-tailed (right-tailed) t-test.
Calculator d.f.: t_c = invT(.99, 25.9567) = 2.4789
Book d.f. (12 - 1 = 11): t_c = invT(.99, 11) = 2.7181
[Figure: t curve with critical values 2.4789 (2.7181); 99% to the left, .01 in the right tail]
TI-84
STAT > TESTS > 4:2-SampTTest
Input: Inpt: Stats, x̄1: 44, Sx1: 13.2, n1: 16, x̄2: 36.5, Sx2: 10.2, n2: 12, µ1: >µ2, Pooled: No, Calculate
Results: µ1 > µ2, t = 1.6958, p = .0509, df = 25.9567, x̄1 = 44, x̄2 = 36.5, Sx1 = 13.2, Sx2 = 10.2, n1 = 16, n2 = 12
[Figure: t = 1.6958 lies to the left of the critical values 2.4789 (2.7181); 99% to the left, .01 in the right tail]
3. Calculate t and the p-value
\( t = \dfrac{(44 - 36.5) - 0}{\sqrt{\dfrac{13.2^2}{16} + \dfrac{10.2^2}{12}}} = 1.6958 \)
P(t > 1.6958) = P(x̄1 - x̄2 > 7.5)
Calculator (2-SampTTest, µ1 > µ2, df = 25.9567): p = .0509
Book d.f. = 11: tcdf(1.6958, 99, 11) = .0590
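Both right-tailed p-values can be checked with a short sketch, assuming the calculator uses the Welch-Satterthwaite d.f. and the book uses (smaller n) - 1 = 11. The tail area comes from Simpson's-rule integration of the t density (a stand-in for tcdf), and `right_tailed_t` is a hypothetical name.

```python
import math

def t_pdf(x, df):
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), standing in for tcdf(t, 99, df)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1.0 - tail

def right_tailed_t(x1, s1, n1, x2, s2, n2, df=None):
    """Test H1: mu1 > mu2 without pooling; df=None means Welch d.f."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    if df is None:
        df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, t_upper_tail(t, df)

t, df_w, p_w = right_tailed_t(44.0, 13.2, 16, 36.5, 10.2, 12)
_, _, p_11 = right_tailed_t(44.0, 13.2, 16, 36.5, 10.2, 12, df=11)
print(f"t = {t:.4f}, Welch d.f. = {df_w:.4f}, p = {p_w:.4f}, p (d.f. 11) = {p_11:.4f}")
```

Either way the p-value exceeds α = .01; the smaller d.f. only makes the p-value larger.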
4. Decision
1.6958 < 2.4789 (2.7181) and .0509 > .01 (.0590 > .01), so we fail to reject the null hypothesis.
5. Conclusion
There is not sufficient evidence to suggest that the Orozco Oral method of reading produces significantly better results than the VanVoorhis Vision method.
[Figure: t = 1.6958 falls below the critical value 2.4789; 99% to the left, .01 in the right tail]
Confidence Interval
For this example there is no real reason to do so, but we can find the 98% confidence interval within which we expect to find the true difference between the reading methods, and check whether it contains 0. (A 98% interval leaves .01 in each tail, matching the one-tailed test at α = .01.)
Stat > Tests > 0:2-SampTInt
X̄O - X̄VV = 44 - 36.5 = 7.5
d.f. = 25.9567: \( 7.5 \pm 2.4789\sqrt{\dfrac{13.2^2}{16} + \dfrac{10.2^2}{12}} \), giving (-3.4634, 18.4634)
d.f. = 11: \( 7.5 \pm 2.7181\sqrt{\dfrac{13.2^2}{16} + \dfrac{10.2^2}{12}} \), giving (-4.5213, 19.5213)
Calculator input: Inpt: Stats, x̄1: 44, Sx1: 13.2, n1: 16, x̄2: 36.5, Sx2: 10.2, n2: 12, C-Level: .98, Pooled: No
Result: (-3.463, 18.463), df = 25.9567
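The 98% interval arithmetic is quick to verify. This minimal sketch plugs in the two critical values quoted on the slide (2.4789 for the calculator's d.f., 2.7181 for d.f. = 11) rather than recomputing them; the variable names are illustrative only.

```python
import math

diff = 44.0 - 36.5                                   # X̄O - X̄VV
se = math.sqrt(13.2**2 / 16 + 10.2**2 / 12)          # unpooled standard error

for label, t_crit in [("d.f. = 25.9567", 2.4789), ("d.f. = 11", 2.7181)]:
    margin = t_crit * se
    print(f"{label}: ({diff - margin:.4f}, {diff + margin:.4f})")

# Keep the calculator-d.f. interval for the check below
ci_low, ci_high = diff - 2.4789 * se, diff + 2.4789 * se
contains_zero = ci_low < 0 < ci_high
```

Because both intervals straddle 0, neither rules out "no difference" at this confidence level.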
Conclusion
(-3.4634, 18.4634)
We are 98% confident that the true difference in reading performance between the Orozco Oral Method and the VanVoorhis Vision Method is between -3.5 and 18.5. Since 0 falls within the interval, there is not sufficient evidence to believe there is a significant difference.