Chapter 7 Comparison of two independent samples

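7.1 Introduction

Population 1: mean $\mu_1$, standard deviation $\sigma_1$, size $N_1$
Sample 1: mean $\bar{y}_1$, standard deviation $s_1$, size $n_1$

Population 2: mean $\mu_2$, standard deviation $\sigma_2$, size $N_2$
Sample 2: mean $\bar{y}_2$, standard deviation $s_2$, size $n_2$

Notation:
$\mu_1, \mu_2$: population means
$\sigma_1, \sigma_2$: population standard deviations
$N_1, N_2$: number of elements in each population
$n_1, n_2$: sample sizes
$\bar{y}_1, \bar{y}_2$: sample means
$s_1, s_2$: sample standard deviations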

Types of populations:
1. Those that occur naturally - observational study
2. Those created by intervention - experimental study

Q: What do we want to do in Chapter 7?
A: Compare samples from different populations.

Q: How do we compare the samples?
A: Using two complementary approaches, i.e.
Method 1: Confidence interval approach
Method 2: Hypothesis testing approach

Q: For two quantitative variables, what can we compare?
A: Their 1. means, 2. standard deviations (variances) and 3. shapes.

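7.2 Standard error of $\bar{y}_1 - \bar{y}_2$

Basic idea

Chapter 6: $\bar{y}$ estimates $\mu$. To measure how precisely $\bar{y}$ estimates $\mu$ we calculated
$$SE_{\bar{y}} = \frac{s}{\sqrt{n}}$$

Chapter 7: $\bar{y}_1$ estimates $\mu_1$ (population 1) and $\bar{y}_2$ estimates $\mu_2$ (population 2). To measure how precisely $\bar{y}_1$ and $\bar{y}_2$ estimate $\mu_1$ and $\mu_2$ respectively, we can calculate
$$SE_{\bar{y}_1} = \frac{s_1}{\sqrt{n_1}} \quad \text{and} \quad SE_{\bar{y}_2} = \frac{s_2}{\sqrt{n_2}}$$
...but...

Q: How do we compare the sample means $\bar{y}_1$ and $\bar{y}_2$?
A: We look at their difference, i.e. we look at $\bar{y}_1 - \bar{y}_2$ (or $\bar{y}_2 - \bar{y}_1$).

...THUS... $\bar{y}_1 - \bar{y}_2$ estimates $\mu_1 - \mu_2$ (or $\bar{y}_2 - \bar{y}_1$ estimates $\mu_2 - \mu_1$).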

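And now we can ask:
Q: How precisely does $\bar{y}_1 - \bar{y}_2$ estimate $\mu_1 - \mu_2$?
A: We need to look at $SE_{(\bar{y}_1 - \bar{y}_2)}$.

Formula for the unpooled standard error of $\bar{y}_1 - \bar{y}_2$
$$SE_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{SE_{\bar{y}_1}^2 + SE_{\bar{y}_2}^2}$$
so that
$$SE_{(\bar{y}_1 - \bar{y}_2)}^2 = SE_{\bar{y}_1}^2 + SE_{\bar{y}_2}^2$$

[Graph: right-angled triangle with legs $SE_{\bar{y}_1}$ and $SE_{\bar{y}_2}$ and hypotenuse $SE_{(\bar{y}_1 - \bar{y}_2)}$]

Note: The 90° angle represents independent samples.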
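The unpooled formula can be checked numerically. The sketch below is only an illustration: the function names `se_mean` and `se_diff_unpooled` and the summary statistics are hypothetical, chosen so that the two individual standard errors come out as 1 and 2.

```python
import math

def se_mean(s, n):
    """Standard error of a single sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_diff_unpooled(s1, n1, s2, n2):
    """Unpooled standard error of (ybar1 - ybar2): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical summary statistics, for illustration only.
s1, n1 = 3.0, 9    # SE of ybar1 = 3 / sqrt(9)  = 1.0
s2, n2 = 8.0, 16   # SE of ybar2 = 8 / sqrt(16) = 2.0

se1 = se_mean(s1, n1)
se2 = se_mean(s2, n2)
se_diff = se_diff_unpooled(s1, n1, s2, n2)

# Pythagorean relationship: SE_diff^2 = SE1^2 + SE2^2 (here 1 + 4 = 5)
print(se1, se2, se_diff)
```

Note that the standard error of the difference is larger than either individual standard error, just as the hypotenuse is longer than either leg of the triangle.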

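Formula for the pooled standard error of $\bar{y}_1 - \bar{y}_2$

Suppose we have two populations, i.e.
Population 1: $N_1$, $\mu_1$, $\sigma_1$
Population 2: $N_2$, $\mu_2$, $\sigma_2$

The unpooled formula allows $\sigma_1 \neq \sigma_2$, versus the pooled formula, which assumes $\sigma_1 = \sigma_2$.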

Assume we take a random sample from each population, i.e.
Sample 1: $n_1$, $\bar{y}_1$, $s_1$
Sample 2: $n_2$, $\bar{y}_2$, $s_2$

Q: Since we assume that $\sigma_1 = \sigma_2$, how can we combine $s_1$ from sample 1 and $s_2$ from sample 2 to obtain a pooled estimate of the variance in the populations, and at the same time take into account that the sample sizes differ, i.e. $n_1 \neq n_2$?
A:
$$s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

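Therefore,
$$SE_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
becomes
$$SE_{pooled} = \sqrt{\frac{s_{pooled}^2}{n_1} + \frac{s_{pooled}^2}{n_2}} = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$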
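A minimal sketch of the pooled calculation follows. The function names and the numbers are hypothetical and serve only to illustrate the two formulas: the pooled variance as a weighted average of the sample variances, and the pooled standard error built from it.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the sample variances, with weights (n1 - 1) and (n2 - 1)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def se_diff_pooled(s1, n1, s2, n2):
    """Pooled standard error of (ybar1 - ybar2): s_pooled * sqrt(1/n1 + 1/n2)."""
    s_pooled = math.sqrt(pooled_variance(s1, n1, s2, n2))
    return s_pooled * math.sqrt(1 / n1 + 1 / n2)

def se_diff_unpooled(s1, n1, s2, n2):
    """Unpooled standard error, shown for comparison."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical summary statistics with equal sample sizes.
equal_pooled = se_diff_pooled(3.0, 10, 4.0, 10)
equal_unpooled = se_diff_unpooled(3.0, 10, 4.0, 10)

# Hypothetical summary statistics with unequal sample sizes.
unequal_pooled = se_diff_pooled(3.0, 5, 4.0, 20)
unequal_unpooled = se_diff_unpooled(3.0, 5, 4.0, 20)

print(equal_pooled, equal_unpooled)
print(unequal_pooled, unequal_unpooled)
```

When $n_1 = n_2$ the two standard errors coincide; they differ only when the sample sizes differ.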

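Method 1
7.3 Confidence Interval for ($\mu_1 - \mu_2$)

In chapter 6: Confidence interval for $\mu$
$$\bar{y} \pm t_{df,\,\alpha/2}\, SE_{\bar{y}}$$
df = $n - 1$: degrees of freedom
$(1 - \alpha)100\%$: confidence level
Here $t_{df,\,\alpha/2}$ is the critical value of the t distribution with upper tail area $\alpha/2$.

In chapter 7: Confidence interval for $\mu_1 - \mu_2$
$$(\bar{y}_1 - \bar{y}_2) \pm t_{df,\,\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)}$$
df: degrees of freedom
$(1 - \alpha)100\%$: confidence level

Q: How do we find df?
A: There are three methods:
1. $$df = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}}$$ where $SE_1 = SE_{\bar{y}_1}$ and $SE_2 = SE_{\bar{y}_2}$ (a good approximation)
2. The smaller of $n_1 - 1$ and $n_2 - 1$, i.e. $df = \min(n_1 - 1, n_2 - 1)$ (a bit conservative: the actual confidence level is larger)
3. $df = n_1 + n_2 - 2$ (a bit liberal: the actual confidence level is smaller)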
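The three ways of finding df can be compared side by side. This is a sketch under stated assumptions: the function names are hypothetical and the summary statistics are made up for illustration.

```python
def df_approximation(s1, n1, s2, n2):
    """Method 1: (SE1^2 + SE2^2)^2 / (SE1^4/(n1 - 1) + SE2^4/(n2 - 1))."""
    se1_sq = s1**2 / n1  # SE1 squared
    se2_sq = s2**2 / n2  # SE2 squared
    return (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))

def df_conservative(n1, n2):
    """Method 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def df_liberal(n1, n2):
    """Method 3: n1 + n2 - 2."""
    return n1 + n2 - 2

# Hypothetical summary statistics: SE1^2 = 1 and SE2^2 = 4.
s1, n1 = 3.0, 9
s2, n2 = 8.0, 16

df1 = df_approximation(s1, n1, s2, n2)
df2 = df_conservative(n1, n2)
df3 = df_liberal(n1, n2)

print(df2, df1, df3)
```

The value from method 1 always lies between the values from methods 2 and 3, which is why method 2 is described as conservative and method 3 as liberal.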

Conditions for the confidence interval to be valid
1. The samples should be random samples
2. The random samples should be independent of each other
3. The random samples should be from normal populations

Method 2
7.4 Hypothesis Testing

The hypothesis testing procedure consists of 5 steps:
1. State the null and alternative hypotheses, $H_0$ and $H_A$
2. Choose the significance level $\alpha$
3. Calculate the test statistic $t_s$
4. Calculate the p-value
5. Conclusion

Step 1: The null and alternative hypothesis
$H_0$: The null hypothesis is $\mu_1 = \mu_2$, i.e. $\mu_1 - \mu_2 = 0$: the two population means are equal.
$H_A$: The alternative hypothesis is $\mu_1 \neq \mu_2$, i.e. $\mu_1 - \mu_2 \neq 0$: the two population means are not equal.

Step 2: Choose the significance level ($\alpha$)
Typical choices are $\alpha$ = 0.1, 0.05 and 0.01, which correspond to 90%, 95% and 99% confidence intervals.

Q: When do we choose $\alpha$?
A: Before we start the procedure, i.e. at the beginning.

Q: How do we use $\alpha$?
A: If p-value $\le \alpha$ we reject $H_0$.
If p-value $> \alpha$ we do not reject $H_0$.

Step 3: Calculate the test statistic $t_s$
$$t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{(\bar{y}_1 - \bar{y}_2)}} \sim t_{df}$$

Step 4: Calculate the p-value
Take note: The t-table on p677 only gives the upper tail area.

Step 5: Conclusion
Compare the p-value with $\alpha$ (the significance level). The conclusion follows.
Take note: We only say that we reject or do not reject $H_0$. We NEVER say that we ACCEPT either one of the hypotheses.

Take Note: (p4)
1. The null and alternative hypothesis
$H_0: \mu_1 - \mu_2 = c$, where $c$ is a constant
$H_A: \mu_1 - \mu_2 \neq c$
2. Choose the significance level $\alpha$
3. Calculate the test statistic $t_s$
$$t_s = \frac{(\bar{y}_1 - \bar{y}_2) - c}{SE_{(\bar{y}_1 - \bar{y}_2)}} \sim t_{df}$$
4. Calculate the p-value
5. Conclusion
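The test statistic in step 3 can be sketched as a single function with the hypothesised difference $c$ as an optional argument, so that $c = 0$ gives the usual test of equal means. The function name, the use of the unpooled standard error, and the summary statistics are assumptions made for illustration.

```python
import math

def t_statistic(ybar1, s1, n1, ybar2, s2, n2, c=0.0):
    """t_s = ((ybar1 - ybar2) - c) / SE, using the unpooled standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((ybar1 - ybar2) - c) / se

# Hypothetical summary statistics: SE = sqrt(9/9 + 16/16) = sqrt(2).
t_equal_means = t_statistic(12.0, 3.0, 9, 8.0, 4.0, 16)        # H0: mu1 - mu2 = 0
t_shifted = t_statistic(12.0, 3.0, 9, 8.0, 4.0, 16, c=4.0)     # H0: mu1 - mu2 = 4

print(t_equal_means, t_shifted)
```

With these numbers the observed difference is 4, so the statistic is zero when the null hypothesis specifies $c = 4$: the data agree exactly with that hypothesised difference.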

7.5 Further Discussion on the t-test

Relationship between the t-test (Method 2) and the confidence interval (Method 1)

Both methods use $\bar{y}_1 - \bar{y}_2$, $SE_{(\bar{y}_1 - \bar{y}_2)}$ and $t_{df,\,\alpha/2}$ (the critical value with upper tail area $\alpha/2$), where $\alpha = 0.05$ if we look at a 95% CI.

Method 1: We do not reject $H_0: \mu_1 = \mu_2$ if the confidence interval includes zero.
Method 2: We do not reject $H_0: \mu_1 = \mu_2$ if the p-value $> \alpha$.
This is equivalent to: We do not reject $H_0: \mu_1 = \mu_2$ if the test statistic satisfies $|t_s| < t_{df,\,\alpha/2}$.

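Q: How do we know this last statement is true?
A: $t_s$ corresponds to the p-value in the same way that $t_{df,\,\alpha/2}$ corresponds to $\alpha$: the p-value is the two-tailed area beyond $\pm|t_s|$ and $\alpha$ is the two-tailed area beyond $\pm t_{df,\,\alpha/2}$. Hence $|t_s| < t_{df,\,\alpha/2}$ exactly when the p-value $> \alpha$.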

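Thus, we fail to reject $H_0$ at an $\alpha$ level of significance if and only if
$$|t_s| < t_{df,\,\alpha/2}$$
$$\Leftrightarrow \quad \frac{|\bar{y}_1 - \bar{y}_2|}{SE_{(\bar{y}_1 - \bar{y}_2)}} < t_{df,\,\alpha/2}$$
$$\Leftrightarrow \quad |\bar{y}_1 - \bar{y}_2| < t_{df,\,\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)}$$
$$\Leftrightarrow \quad -t_{df,\,\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)} < \bar{y}_1 - \bar{y}_2 < t_{df,\,\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)}$$
$$\Leftrightarrow \quad (\bar{y}_1 - \bar{y}_2) - t_{df,\,\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)} < 0 < (\bar{y}_1 - \bar{y}_2) + t_{df,\,\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)}$$
The last line says that zero lies inside the confidence interval. Now we see that Method 1 is equivalent to Method 2. ($\Leftrightarrow$ denotes "if and only if" statements.)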
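The equivalence can also be checked numerically. In the sketch below the function names and the observed differences are hypothetical, and the critical value 2.086 is an assumed table value (upper tail area 0.025 with df = 20); any other df would need its own table value.

```python
def confidence_interval(diff, se, t_crit):
    """Interval diff +/- t_crit * SE for mu1 - mu2."""
    return diff - t_crit * se, diff + t_crit * se

def ci_includes_zero(diff, se, t_crit):
    """Method 1: do not reject H0 when zero lies inside the interval."""
    lower, upper = confidence_interval(diff, se, t_crit)
    return lower < 0 < upper

def t_test_does_not_reject(diff, se, t_crit):
    """Method 2: do not reject H0 when |t_s| < critical value."""
    return abs(diff / se) < t_crit

T_CRIT = 2.086  # assumed table value: upper tail area 0.025, df = 20
SE = 1.5        # hypothetical standard error of the difference

# Hypothetical observed differences ybar1 - ybar2.
results = []
for diff in [-6.0, -2.0, 0.0, 1.0, 3.0, 4.5]:
    results.append((diff,
                    ci_includes_zero(diff, SE, T_CRIT),
                    t_test_does_not_reject(diff, SE, T_CRIT)))

for diff, by_ci, by_test in results:
    print(diff, by_ci, by_test)
```

For every difference the two columns agree, which is the numerical counterpart of the chain of "if and only if" statements.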

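Type I and type II errors

                       TRUE situation
OUR decision           $H_0$ true                  $H_0$ false
Do not reject $H_0$    Correct decision            Type II error ($\beta$)
Reject $H_0$           Type I error ($\alpha$)     Correct decision

$$\alpha = P(\text{Type I error}) = \Pr\{\text{reject } H_0 \mid H_0 \text{ true}\}$$
$$\beta = P(\text{Type II error}) = \Pr\{\text{do not reject } H_0 \mid H_0 \text{ false}\}$$
$$\text{Power} = 1 - \beta = 1 - P(\text{Type II error}) = 1 - \Pr\{\text{do not reject } H_0 \mid H_0 \text{ false}\}$$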

7.6 One-tailed t Tests

Two-tailed (previous section)
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \neq \mu_2$
Non-directional alternative

One-tailed (this section)
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 < \mu_2$ or $H_A: \mu_1 > \mu_2$
Directional alternative

You must know the direction before you collect the data.

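The one-tailed hypothesis testing procedure
1. $H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \;?\; \mu_2$ (either $<$ or $>$)
2. $\alpha$: choose your own value
3. Calculate the test statistic
$$t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{(\bar{y}_1 - \bar{y}_2)}} \sim t_{df}$$
4. Calculate the p-value
Take note: This is where the textbook specifies a 2-step procedure. First check directionality: compare $\bar{y}_1$ with $\bar{y}_2$ to see whether the data deviate from $H_0$ in the direction specified by $H_A$. If not, the p-value is greater than 0.50. If so, the p-value is the one-tailed area beyond $t_s$.
5. Conclusion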
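The p-value step for a directional alternative can be sketched as a conversion from a two-tailed p-value. This relies on the symmetry of the t distribution; the function name and the labels "less" and "greater" are hypothetical, and the pairing of $t_s = 2.086$ with a two-tailed p-value of 0.05 assumes df = 20.

```python
def one_tailed_p_value(t_s, two_tailed_p, alternative):
    """Convert a two-tailed p-value into a one-tailed p-value.

    alternative: "less" for H_A: mu1 < mu2, "greater" for H_A: mu1 > mu2.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    one_tail_area = two_tailed_p / 2
    # Directionality check: does the sign of t_s match the direction in H_A?
    agrees = t_s < 0 if alternative == "less" else t_s > 0
    # If the data point the wrong way, the p-value is the complementary area.
    return one_tail_area if agrees else 1 - one_tail_area

# Assumed example: t_s = 2.086 with df = 20 has a two-tailed p-value of 0.05.
p_greater = one_tailed_p_value(2.086, 0.05, "greater")
p_less = one_tailed_p_value(2.086, 0.05, "less")

print(p_greater, p_less)
```

When the data deviate in the direction of $H_A$ the one-tailed p-value is half the two-tailed value; when they deviate in the opposite direction it exceeds 0.50, so $H_0$ is not rejected at any usual significance level.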

7.11 The Wilcoxon-Mann-Whitney Test

This test is also used to compare two independent samples and is a competitor of the t-test.
The test can be used for populations that do not have a normal distribution (distribution-free).
Since we do not specifically use the mean or the median of the sample, the test is non-parametric.
...BUT...
We still look for a difference in location, i.e. we still look at the degree of separation / shift between the samples.

The null hypothesis and the alternative hypothesis
Let $Y_1$ denote the observations from sample 1 and let $Y_2$ denote the observations from sample 2.
$H_0$: The population distributions of $Y_1$ and $Y_2$ are the same.

The test statistic to be used: $U_s$
1. Large value: the two samples are well separated
2. Small value: the two samples are not that well separated

Example 7.39 (p90)
Soil respiration and plant growth at two different locations in a forest.

Growth: 17, 20, 170, 315, 22, 190, 64
Gap: 22, 29, 13, 16, 15, 18, 14, 6

The procedure:
1. Arrange the observations in increasing order.
2. a. $K_1$ counts the number of observations in sample 2 that are less than each observation in sample 1
b. $K_2$ counts the number of observations in sample 1 that are less than each observation in sample 2
c. As a check we use: $K_1 + K_2 = n_1 n_2$
$n_1$: number of observations in sample 1
$n_2$: number of observations in sample 2
3. $U_s = \max(K_1, K_2)$
4. Determine the critical value with $n$ the larger sample size and $n'$ the smaller sample size

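The procedure applied to example 7.39:

Count for K_1    Y_1 (Growth)    Y_2 (Gap)    Count for K_2
5                17              6            0
6                20              13           0
6.5              22              14           0
8                64              15           0
8                170             16           0
8                190             18           1
8                315             22           2.5
                                 29           3

$n_1 = 7$, $n_2 = 8$
$K_1 = 49.5$, $K_2 = 6.5$ (check: $K_1 + K_2 = 56 = n_1 n_2$)
$U_s = \max(K_1, K_2) = \max(49.5;\ 6.5) = 49.5$

The value 22 occurs in both samples; a tied observation counts as 1/2, which gives the counts 6.5 and 2.5.

Take note: You should be able to find the p-value for directional as well as non-directional alternatives.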
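The counting procedure can be sketched in a few lines. The function name is hypothetical, a tie is counted as 1/2, and the data values are an assumption: Growth = 17, 20, 170, 315, 22, 190, 64 and Gap = 22, 29, 13, 16, 15, 18, 14, 6, chosen because they reproduce $K_1 = 49.5$ and $K_2 = 6.5$ stated above.

```python
def wmw_counts(sample1, sample2):
    """Return (K1, K2) for the Wilcoxon-Mann-Whitney test; a tie counts 1/2."""
    def count_smaller(value, other_sample):
        # Number of observations in other_sample below value, ties counted as 0.5.
        return sum(1.0 if y < value else 0.5 if y == value else 0.0
                   for y in other_sample)

    k1 = sum(count_smaller(x, sample2) for x in sample1)
    k2 = sum(count_smaller(y, sample1) for y in sample2)
    return k1, k2

# Assumed data values for example 7.39 (see the note above this block).
growth = [17, 20, 170, 315, 22, 190, 64]
gap = [22, 29, 13, 16, 15, 18, 14, 6]

k1, k2 = wmw_counts(growth, gap)
u_s = max(k1, k2)

# Check from the procedure: K1 + K2 must equal n1 * n2.
assert k1 + k2 == len(growth) * len(gap)

print(k1, k2, u_s)  # 49.5 6.5 49.5
```

The function does not need the observations to be sorted first; sorting in step 1 of the procedure only makes the counting easier by hand.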

Chapter 7: Exercises 7.3 7.4 7.5 7.6 7.7 7.11 7.1 7.14 7.19 p5 p31 7.3 7.4 7.5 7.6 7.7 7.9 7.46 7.47 7.48 7.49 7.50 7.5 p43 p63 7.77 7.78 7.79 7.80 p96
Take note: These exercises are part of the textbook and can be included in any class test, semester test or exam!