Comparing Means from Two-Sample

Kwonsang Lee
University of Pennsylvania
kwonlee@wharton.upenn.edu
STAT111, April 3, 2015

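Inference from One-Sample

We have two options for making an inference about the population mean $\mu$ from one sample of size $n$: 1) a 100(1-C)% confidence interval and 2) a hypothesis test with a level $\alpha$.

1) 100(1-C)% Confidence Interval

We need to consider the case when $\sigma$ is known or unknown. Here $Z^*$ and $t^*_{n-1}$ are the critical values for the chosen confidence level.

a. Known $\sigma$: $\left( \bar{X} - Z^* \frac{\sigma}{\sqrt{n}},\ \bar{X} + Z^* \frac{\sigma}{\sqrt{n}} \right)$

b. Unknown $\sigma$: $\left( \bar{X} - t^*_{n-1} \frac{s}{\sqrt{n}},\ \bar{X} + t^*_{n-1} \frac{s}{\sqrt{n}} \right)$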
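As a rough illustration of the two intervals above, here is a minimal Python sketch. The function names (`z_interval`, `t_interval`), the parameter `conf` (the confidence level as a fraction), and all of the input numbers are assumptions made for this example; they are not from the slides. The $t^*$ value is passed in by hand, as if read from a t-table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Interval for the mean when sigma is known: xbar +/- Z* sigma/sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # upper critical value
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def t_interval(xbar, s, n, t_star):
    """Interval when sigma is unknown; t_star comes from a t-table with n-1 df."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical summary statistics (not from the lecture).
lo_z, hi_z = z_interval(xbar=50.0, sigma=10.0, n=25, conf=0.95)
lo_t, hi_t = t_interval(xbar=50.0, s=10.0, n=25, t_star=2.064)  # 24 df, 95%

print(f"known sigma:   ({lo_z:.3f}, {hi_z:.3f})")
print(f"unknown sigma: ({lo_t:.3f}, {hi_t:.3f})")
```

The t interval is slightly wider than the Z interval for the same numbers, because $t^*_{n-1}$ exceeds $Z^*$ at the same confidence level.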

Inference from One-Sample

2) Hypothesis test with a level $\alpha$

a. State the null and alternative hypotheses. (Here, a two-sided example.)
   $H_0: \mu = \mu_0$ and $H_a: \mu \neq \mu_0$.

b. Calculate a test statistic $Z_0$ (known $\sigma$) or a test statistic $T_0$ (unknown $\sigma$):
   $Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ or $T_0 = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$

c. Calculate the P-value:
   P-value = $2 P(Z \geq |Z_0|)$ if $\sigma$ is known
   P-value = $2 P(T \geq |T_0|)$ if $\sigma$ is unknown

d. Compare the P-value to the significance level $\alpha$.
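Steps b through d for the known-$\sigma$ case can be sketched in a few lines of Python. The function name `one_sample_z_test` and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (Z0, two-sided P-value) for H0: mu = mu0 with known sigma."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))  # 2 P(Z >= |Z0|)
    return z0, p_value


# Hypothetical numbers (not from the lecture).
z0, p = one_sample_z_test(xbar=52.0, mu0=50.0, sigma=10.0, n=100)
alpha = 0.05
print(f"Z0 = {z0:.3f}, P-value = {p:.4f}, reject H0: {p < alpha}")
```

The unknown-$\sigma$ case has the same shape, but the P-value needs the t-distribution with $n-1$ degrees of freedom, which the Python standard library does not provide.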

Supplement of t-test (Two-sided test)

Because the t-table doesn't give the P-value, we can modify our t-test. Instead of computing the P-value, we can find the critical value $t^*_{n-1}$ such that

   $P(T > t^*_{n-1}) = \frac{\alpha}{2}$

Then,

   Conclusion: we reject the null if $|T_0| \geq t^*_{n-1}$; we don't reject the null if $|T_0| < t^*_{n-1}$.

Note: If the one-sided alternative hypothesis is $H_a: \mu > \mu_0$, we need to find the value $t^*_{n-1}$ such that $P(T > t^*_{n-1}) = \alpha$. We reject the null if $T_0 > t^*_{n-1}$. Also, if $H_a: \mu < \mu_0$, we need to use $-t^*_{n-1}$, which satisfies $P(T < -t^*_{n-1}) = \alpha$. We reject the null if $T_0 < -t^*_{n-1}$. Draw the t-distribution and think about it!
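The critical-value rule can be written as one small function. This is only a sketch: the name `reject_null` and the `alternative` labels are assumptions for illustration. The critical value is always passed as the positive table entry (upper-tail area $\alpha/2$ for a two-sided test, $\alpha$ for a one-sided test).

```python
def reject_null(t0, t_crit, alternative="two-sided"):
    """Decide whether to reject H0 using a positive t-table critical value.

    t_crit must have upper-tail area alpha/2 (two-sided) or alpha (one-sided).
    """
    if alternative == "two-sided":
        return abs(t0) >= t_crit
    if alternative == "greater":   # Ha: mu > mu0
        return t0 > t_crit
    if alternative == "less":      # Ha: mu < mu0
        return t0 < -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Table values for 20 degrees of freedom: 2.086 (area 0.025), 1.725 (area 0.05).
print(reject_null(2.336, 2.086))            # two-sided, alpha = 0.05
print(reject_null(-1.5, 1.725, "less"))     # one-sided, alpha = 0.05
print(reject_null(1.9, 1.725, "greater"))   # one-sided, alpha = 0.05
```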

New Terminology: Standard Error

$X$ is from the population with mean $\mu$ and SD $\sigma$. We take a sample of size $n$ from the population and call it $X_1, \ldots, X_n$.

What we learned:
- A sample $(X_1, \ldots, X_n)$ has the sample mean $\bar{X}$ and the sample SD $s$.
- The sample mean $\bar{X}$ has a distribution with mean $\mu$ and SD $\frac{\sigma}{\sqrt{n}}$.

New terminology: the standard error of $\bar{X}$ is $\frac{s}{\sqrt{n}}$, i.e. $SE(\bar{X}) = \frac{s}{\sqrt{n}}$.
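A short sketch of the standard error computed from raw data, using a made-up sample of five values (the data are hypothetical, not from the lecture):

```python
from math import sqrt
from statistics import mean, stdev

sample = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical observations

n = len(sample)
xbar = mean(sample)   # sample mean
s = stdev(sample)     # sample SD (divides by n - 1)
se = s / sqrt(n)      # standard error of the sample mean

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.3f}, SE = {se:.3f}")
```

Note that `statistics.stdev` is the sample SD; `statistics.pstdev` would divide by $n$ instead and is not what the standard error uses.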

Two-Sample Example

Let's assume that we want to study household incomes in Philadelphia and New York.

- Philadelphia income distribution: mean $\mu_p$ and SD $\sigma_p$
- New York income distribution: mean $\mu_n$ and SD $\sigma_n$

Then, we take a Philadelphia sample of size $n_p$ and a New York sample of size $n_n$.

- Philadelphia sample: sample mean $\bar{x}_p$ and sample SD $s_p$
- New York sample: sample mean $\bar{x}_n$ and sample SD $s_n$

What to do? We want to compare $\mu_p$ with $\mu_n$.

1) Hypothesis test of $\mu_p = \mu_n$
2) Confidence interval of $\mu_p - \mu_n$

Inference from Two-Sample: Intro

We don't know the values of $\mu_1$ and $\mu_2$. We want to make inferences from Sample 1 and Sample 2.

We can conduct a hypothesis test or construct a confidence interval for $\mu_1 - \mu_2$. Also, we need to consider the case when $\sigma_1$ and $\sigma_2$ are known or unknown.

1) Hypothesis test of $\mu_1 = \mu_2$, or equivalently $\mu_1 - \mu_2 = 0$
2) Confidence interval of $\mu_1 - \mu_2$

Two-Sample Hypothesis Test: Known $\sigma_1$ and $\sigma_2$

Since $\sigma_1$ and $\sigma_2$ are known, we can use the Z test.

a. $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 \neq 0$.

b. Test statistic $Z_0$ is
   $Z_0 = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
   where $\mu_1 - \mu_2 = 0$ under $H_0$.

c. P-value is
   $P(Z \leq -|Z_0|) + P(Z \geq |Z_0|) = 2 P(Z \geq |Z_0|)$

d. Compare the P-value with the level $\alpha$.
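The two-sample Z test follows the same pattern as the one-sample case, with the standard deviation of $\bar{X}_1 - \bar{X}_2$ in the denominator. The sketch below uses a hypothetical function name (`two_sample_z_test`) and made-up inputs; `delta0` is the hypothesized value of $\mu_1 - \mu_2$, which is 0 under the null above.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Return (Z0, two-sided P-value) for H0: mu1 - mu2 = delta0."""
    sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z0 = ((xbar1 - xbar2) - delta0) / sd_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))  # 2 P(Z >= |Z0|)
    return z0, p_value


# Hypothetical numbers (not from the lecture).
z0, p = two_sample_z_test(xbar1=105.0, xbar2=100.0,
                          sigma1=15.0, sigma2=20.0, n1=100, n2=100)
print(f"Z0 = {z0:.3f}, P-value = {p:.4f}")
```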

Two-Sample Hypothesis Test: Unknown $\sigma_1$ and $\sigma_2$

Since $\sigma_1$ and $\sigma_2$ are unknown, we use $s_1$ and $s_2$ instead and use the t-test with a level $\alpha$.

a. $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 \neq 0$.

b. Test statistic $T_0$ is
   $T_0 = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

c. (Modified version) We can find the critical value $t^*_k$ such that $P(T > t^*_k) = \frac{\alpha}{2}$, where $k = \min(n_1 - 1, n_2 - 1)$.

d. Compare $|T_0|$ with the value $t^*_k$. ($|T_0| > t^*_k$: reject the null.)

Example 1

Product A is advertised as helping students learn Statistics more effectively. We want to test whether product A has any positive effect. Among 44 participants, we randomly select 21 people to use the product (21 treated and 23 control). After one month, all participants take a statistics test, and the scores are recorded. The following is the summary:

             n    $\bar{x}$    s
   Treated   21   51.5         11
   Control   23   41.5         17

Q: How can we conduct a hypothesis test?

Example 1

Two-sample t-test with a level $\alpha = 0.05$:

a. $H_0: \mu_t - \mu_c = 0$ and $H_a: \mu_t - \mu_c \neq 0$, where $t$ = treated and $c$ = control.

b. Test statistic $T_0$ is given by
   $T_0 = \frac{(\bar{X}_t - \bar{X}_c) - (\mu_t - \mu_c)}{\sqrt{\frac{s_t^2}{n_t} + \frac{s_c^2}{n_c}}} = \frac{(51.5 - 41.5) - 0}{\sqrt{\frac{11^2}{21} + \frac{17^2}{23}}} = 2.336$

c. Conservatively, the degrees of freedom $k$ is $\min(n_t - 1, n_c - 1) = 20$. The critical value $t^*_{20}$ is 2.086, since
   $P(T > t^*_{20}) = \frac{\alpha}{2} = 0.025$

d. Since $|T_0| = 2.336 > 2.086 = t^*_{20}$, we reject the null hypothesis.

t-table: http://bcs.whfreeman.com/ips6e/content/cat_050/ips6e_table-d.pdf
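The arithmetic of Example 1 can be checked with a few lines of Python. The function name `two_sample_t` is a hypothetical helper introduced for this sketch; the summary statistics and the critical value 2.086 are the ones given in the example.

```python
from math import sqrt


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Return (T0, conservative df k) for H0: mu1 - mu2 = delta0."""
    se_diff = sqrt(s1**2 / n1 + s2**2 / n2)
    t0 = ((xbar1 - xbar2) - delta0) / se_diff
    k = min(n1 - 1, n2 - 1)  # conservative degrees of freedom
    return t0, k


# Summary statistics from Example 1 (treated vs. control).
t0, k = two_sample_t(xbar1=51.5, s1=11, n1=21, xbar2=41.5, s2=17, n2=23)
t_crit = 2.086  # t-table value for 20 df with upper-tail area 0.025

print(f"T0 = {t0:.3f}, k = {k}, reject H0: {abs(t0) > t_crit}")
```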

Two-Sample t-test in JMP

Here are the references for the t-test.

- One-sample t-test: http://www.chem.sc.edu/faculty/morgan/resources/statistics/jmp_one_sample_t-test.pdf
- Two-sample t-test: http://www.chem.sc.edu/faculty/morgan/resources/statistics/jmp_two_sample_t-test.pdf

Steps for the two-sample t-test:

1. Open the data file.
2. Go to Analyze > Fit Y by X. For example, Y is a score variable and X is an indicator of either treated or control.
3. Click the red triangle next to "Oneway Analysis of ..." and choose t-test.

Using JMP

We can find the descriptions of each sample. We also find a confidence interval of $\mu_1 - \mu_2$ and the results of the two-sample t-test.

Using JMP, we can compute the P-value of our test statistic $T_0$ in the previous Example 1. The P-value is 0.0264, which is less than 0.05, so we reject the null hypothesis.

Here is another reference related to Example 1: http://web.utk.edu/~cwiek/201tutorials/twosamplettest/

Confidence Interval from Two-Sample

We can consider two cases: 1) known $\sigma_1$ and $\sigma_2$ and 2) unknown $\sigma_1$ and $\sigma_2$.

Confidence interval of $\mu_1 - \mu_2$ with known $\sigma_1$ and $\sigma_2$:
   $(\bar{X}_1 - \bar{X}_2) \pm Z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Confidence interval of $\mu_1 - \mu_2$ with unknown $\sigma_1$ and $\sigma_2$:
   $(\bar{X}_1 - \bar{X}_2) \pm t^*_k \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
   where $k = \min(n_1 - 1, n_2 - 1)$.
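As an illustration of the unknown-$\sigma$ interval, the sketch below applies it to the summary statistics of Example 1 (treated: n = 21, mean 51.5, SD 11; control: n = 23, mean 41.5, SD 17) with the table value $t^*_{20} = 2.086$, which corresponds to a 95% interval. The function name `two_sample_t_interval` is an assumption made for this example.

```python
from math import sqrt


def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Interval for mu1 - mu2; t_star is the table value for k = min(n1-1, n2-1) df."""
    margin = t_star * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Example 1 summary statistics with t* for 20 df (upper-tail area 0.025).
lo, hi = two_sample_t_interval(51.5, 11, 21, 41.5, 17, 23, t_star=2.086)
print(f"95% CI for mu_t - mu_c: ({lo:.2f}, {hi:.2f})")
```

The interval does not contain 0, which agrees with rejecting the null hypothesis in the two-sided test at $\alpha = 0.05$.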

Special Case: Matched Pairs

Sometimes the two samples being compared are matched pairs. For example, suppose drug A is claimed to lower blood pressure. Each subject's blood pressure is measured before taking the drug and again after intake, so each subject has two values of the outcome. We then want to test whether there is any difference between blood pressure before intake and blood pressure after intake.

            Subject 1   Subject 2   ...   Subject n
   Before   130         128         ...   126
   After    116         110         ...   108

We want to test whether blood pressure before = blood pressure after.

Matched Pairs

In this case, we can compute the difference $D = X_1 - X_2$. Here, Diff = Before - After.

            Subject 1   Subject 2   ...   Subject n
   Before   130         128         ...   126
   After    116         110         ...   108
   Diff     14          18          ...   18

Then, we can use a test like $H_0$: Diff = 0. This is a one-sample t-test.

Matched Pairs Test

From a matched pairs design, we have $X_1$ and $X_2$ for $n$ subjects. Then we compute the new variable $D = X_1 - X_2$ and compute the sample mean and the sample SD of $D$: $\bar{X}_d$ and $s_d$.

Then, we can state the null hypothesis $H_0: \mu_d = 0$ and the alternative $H_a: \mu_d \neq 0$.

We calculate the test statistic $T_0$:
   $T_0 = \frac{\bar{X}_d - \mu_d}{s_d/\sqrt{n}}$
   where $\mu_d = 0$ under $H_0$.

Then, we can find the critical value $t^*_{n-1}$ and compare it with the test statistic $T_0$.
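Starting from raw paired measurements, the procedure above reduces to a one-sample computation on the differences. The following sketch uses entirely hypothetical data for five subjects (not the blood pressure data from the lecture), and the function name `paired_t_statistic` is made up for this example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x1, x2):
    """Return (mean difference, SD of differences, T0) for H0: mu_d = 0."""
    d = [a - b for a, b in zip(x1, x2)]  # D = X1 - X2, one value per subject
    d_bar = mean(d)
    s_d = stdev(d)                       # sample SD of the differences
    t0 = d_bar / (s_d / sqrt(len(d)))
    return d_bar, s_d, t0


# Hypothetical before/after measurements for n = 5 subjects.
before = [140, 135, 150, 128, 132]
after = [132, 130, 139, 127, 125]

d_bar, s_d, t0 = paired_t_statistic(before, after)
t_crit = 2.776  # t-table value for n - 1 = 4 df, upper-tail area 0.025

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, T0 = {t0:.3f}")
print("reject H0 at alpha = 0.05:", abs(t0) > t_crit)
```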

Example 2

We consider the blood pressure drug example. The summary is:

   Subject   Before   After   D
   1         130      116     14
   2         128      110     18
   ...       ...      ...     ...
   10        126      108     18

$\bar{x}_{before} = 122.2$, $s_{before} = 6.3$
$\bar{x}_{after} = 113$, $s_{after} = 9.1$

However, in a matched pairs design, what we need is the new variable D = Before - After. We have
$\bar{x}_d = 9.2$, $s_d = 9.8$

Example 2: Under Independent Assumption

It is clear that Before and After are not independent because the two measurements come from the same subject. If we nevertheless treat Before and After as independent, what would we conclude? We want to do a hypothesis test with a level $\alpha = 0.02$.

a. $H_0: \mu_{before} - \mu_{after} = 0$ and $H_a: \mu_{before} - \mu_{after} \neq 0$.

b. The test statistic $T_0$ is
   $T_0 = \frac{(\bar{x}_{before} - \bar{x}_{after}) - (\mu_{before} - \mu_{after})}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{122.2 - 113}{\sqrt{\frac{6.3^2}{10} + \frac{9.1^2}{10}}} = 2.629$

c. $k = \min(n_1 - 1, n_2 - 1) = 9$ and $t^*_9 = 2.821$, such that $P(T > t^*_9) = \alpha/2 = 0.01$.

d. $|T_0| = 2.629 < 2.821 = t^*_9$. So, we don't reject the null. It means that there is not enough evidence that the drug has an effect on lowering blood pressure.

Example 2: Matched Pairs

Before and After from the original data are dependent, so a two-sample t-test that assumes independence isn't correct. Here is the right matched pairs t-test.

Hypothesis test with a level $\alpha = 0.02$:

a. $H_0: \mu_d = 0$ and $H_a: \mu_d \neq 0$.

b. The test statistic $T_0$ is
   $T_0 = \frac{\bar{x}_d - \mu_d}{s_d/\sqrt{n}} = \frac{9.2}{9.8/\sqrt{10}} = 2.969$

c. $k = n - 1 = 9$ and $t^*_9 = 2.821$, such that $P(T > t^*_9) = \alpha/2 = 0.01$.

d. $|T_0| = 2.969 > 2.821 = t^*_9$. So, we reject the null. It means that there is enough evidence that the drug has an effect on lowering blood pressure.

Note: It is important to use the right approach!!
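The two analyses of Example 2 can be put side by side in a short Python check. All numbers are the summary statistics and the critical value stated in the example; only the variable names are made up for this sketch.

```python
from math import sqrt

n = 10
t_crit = 2.821  # t-table value for 9 df, upper-tail area alpha/2 = 0.01

# (Incorrect here) two-sample statistic that treats Before and After as independent.
t_indep = (122.2 - 113) / sqrt(6.3**2 / n + 9.1**2 / n)

# Matched pairs statistic based on the differences D = Before - After.
t_paired = 9.2 / (9.8 / sqrt(n))

print(f"independent: T0 = {t_indep:.3f}, reject H0: {abs(t_indep) > t_crit}")
print(f"paired:      T0 = {t_paired:.3f}, reject H0: {abs(t_paired) > t_crit}")
```

Both statistics have the same numerator, 9.2. They differ only in the standard error in the denominator, and that difference is enough to change the conclusion at $\alpha = 0.02$.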

Summary

We learned confidence intervals and hypothesis tests in many situations. Here is a guide to choosing the right analysis.

1. Understand the data: is it one-sample, two-sample, or a matched pairs design?
2. Do we know the population SD $\sigma$?
3. Is our goal making a CI or doing a hypothesis test?
4. If we need to do a hypothesis test, what are the null and alternative hypotheses? (One-sided or two-sided?)

Next Week

We have been talking about inferences for the population mean $\mu$. Next week, we're going to talk about CIs and hypothesis tests for the population proportion $p$.