Chapter 3. Comparing two populations

Contents
- Hypothesis for the difference between two population means: matched pairs
- Hypothesis for the difference between two population means: independent samples
  - Two normal populations with equal (unknown) variances
  - Two normal populations with known variances
  - Two nonnormal populations with unknown variances and large samples
  - Two Bernoulli populations
- Hypothesis for the ratio of two population variances: independent samples
Learning goals

At the end of this chapter you should be able to:
- Perform a test of hypothesis for the difference between two population means and for the ratio of two population variances
- Construct confidence intervals for the difference/ratio
- Distinguish situations where a test based on matched pairs is suitable from those where a test based on independent samples is
- Calculate the power of a test and the probability of Type II error
References
- Newbold, P. Statistics for Business and Economics. Chapter 9 (9.6-9.9)
- Ross, S. Chapter 10
Introduction

In this chapter we examine the case where, instead of one random sample, two random samples are available from two populations, and the quantities of interest are:
- the difference between two population means
  - case of matched pairs
  - case of independent samples
- the ratio between two population variances
  - case of independent samples

We will draw on our experience from Chapters 1 and 2 to construct confidence intervals and perform tests of hypothesis for the above-mentioned differences/ratios of population parameters.
Tests for the difference between two means: matched pairs

Example: In a study aimed at assessing the relationship between a subject's brain activity while watching a TV commercial and the subject's subsequent ability to recall the contents of the commercial, subjects were shown commercials for two brands of each of ten products. For each commercial, the ability to recall 24 hours later was measured, and each member of a pair of commercials was then designated high-recall or low-recall. The table below shows an index of the total amount of brain activity of subjects while watching these commercials.

product i                 1    2    3    4    5    6    7    8    9   10
high-recall x_i         137  135   83  125   47   46  114  157   57  144
low-recall y_i           53  114   81   86   34   66   89  113   88  111
diff. d_i = x_i - y_i    84   21    2   39   13  -20   25   44  -31   33
Tests for the difference between two means: matched pairs

Let X be a population with mean $\mu_X$ and Y be a population with mean $\mu_Y$. Suppose we have a random sample of n matched pairs of observations from these two populations, and let $d_1 = x_1 - y_1,\ d_2 = x_2 - y_2,\ \ldots,\ d_n = x_n - y_n$ denote the n differences, with mean $\bar{d}$ and quasi-standard deviation $s_d$. Assume that the population of differences is normal.

In a two-tail test $H_0: \mu_X - \mu_Y = D_0$ against $H_1: \mu_X - \mu_Y \neq D_0$:

The test statistic is
$$T = \frac{\bar{D} - D_0}{s_D/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$$

The rejection region is (at significance level $\alpha$):
$$RR_\alpha = \{t : t < -t_{n-1;\alpha/2} \ \text{or}\ t > t_{n-1;\alpha/2}\}$$
Tests for the difference between two means: matched pairs

Example (cont.)

Population: D = difference in brain activity between high- and low-recall commercials, $D \sim N(\mu_X - \mu_Y, \sigma_D^2)$

SRS: n = 10

Sample:
$$\bar{d} = \frac{210}{10} = 21, \qquad s_d^2 = \frac{14202 - 10(21)^2}{10 - 1} = 1088$$

Objective: test $H_0: \mu_X - \mu_Y \le 0$ (so $D_0 = 0$) against $H_1: \mu_X - \mu_Y > 0$ (upper-tail test)

Test statistic:
$$T = \frac{\bar{D} - D_0}{s_D/\sqrt{n}} \sim t_{n-1}$$

Observed test statistic: with $D_0 = 0$, $n = 10$, $\bar{d} = 21$ and $s_d = \sqrt{1088} = 32.98$,
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}} = \frac{21}{32.98/\sqrt{10}} = 2.014$$
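The arithmetic of this example can be checked with a short script. This is only a sketch using the Python standard library; the function name `paired_t` is an illustrative choice, not part of the course material, and the data are the brain-activity indices from the example.

```python
from math import sqrt
from statistics import mean, stdev

# Brain-activity index for the ten products (data from the example).
high = [137, 135, 83, 125, 47, 46, 114, 157, 57, 144]
low = [53, 114, 81, 86, 34, 66, 89, 113, 88, 111]


def paired_t(x, y, d0=0.0):
    """Return (t, degrees of freedom) for the matched-pairs t statistic."""
    d = [a - b for a, b in zip(x, y)]   # pairwise differences d_i = x_i - y_i
    n = len(d)
    d_bar = mean(d)                     # sample mean of the differences
    s_d = stdev(d)                      # quasi-standard deviation (divides by n - 1)
    return (d_bar - d0) / (s_d / sqrt(n)), n - 1


t_obs, df = paired_t(high, low)
print(f"t = {t_obs:.3f} with {df} degrees of freedom")
```

The critical value still has to come from a t-table (or a statistics package), since the standard library does not provide t quantiles.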
Tests for the difference between two means: matched pairs

Example (cont.)

$$p\text{-value} = P(T \ge 2.014) \in (0.025, 0.05)$$
because
$$t_{9;0.05} = 1.833 < 2.014 < 2.262 = t_{9;0.025}$$

Hence, given that p-value < $\alpha$ = 0.05, we reject the null hypothesis at this level.

[Figure: $t_{n-1}$ density; the p-value is the area to the right of t = 2.014, which lies between 1.833 and 2.262.]

Conclusion: The sample data give enough evidence to support the claim that, on average, brain activity is higher for the high-recall than for the low-recall group. If the mean brain activity were in fact the same for the two groups, the probability of finding a sample result as extreme as or more extreme than the one actually obtained would be between 0.025 and 0.05 (which is rather low).
Tests for the difference between two means: matched pairs

Example (cont.) in Excel: go to the Data menu, open Data Analysis, and choose the tool "t-Test: Paired Two Sample for Means". Columns A and B hold the data; the observed test statistic and the p-value are highlighted in yellow.
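Two-tail test for the difference between two means via CI: matched pairs

Example (cont.): Construct a 95% confidence interval for $\mu_X - \mu_Y$.

$$CI_{0.95}(\mu_X - \mu_Y) = \left(\bar{d} - t_{n-1;0.025}\frac{s_d}{\sqrt{n}},\ \bar{d} + t_{n-1;0.025}\frac{s_d}{\sqrt{n}}\right) = \left(21 - 2.262\,\frac{32.98}{\sqrt{10}},\ 21 + 2.262\,\frac{32.98}{\sqrt{10}}\right) = (-2.59,\ 44.59)$$

Since the value 0 belongs to this interval, we cannot reject the null hypothesis of equality of the two population means at the $\alpha$ = 0.05 significance level. This does not contradict the upper-tail test of the same data, which rejected at $\alpha$ = 0.05: the one-sided test puts all of $\alpha$ in the upper tail and so uses the smaller critical value 1.833 instead of 2.262.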
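A minimal sketch of the interval calculation, assuming the critical value $t_{9;0.025} = 2.262$ is read from a t-table as in the example (the helper name `paired_mean_ci` is hypothetical):

```python
from math import sqrt


def paired_mean_ci(d_bar, s_d, n, t_crit):
    """Two-sided confidence interval for the mean difference of matched pairs."""
    half_width = t_crit * s_d / sqrt(n)   # t quantile times the standard error
    return d_bar - half_width, d_bar + half_width


# Summary statistics from the example; 2.262 is the tabulated t_{9;0.025}.
lower, upper = paired_mean_ci(d_bar=21.0, s_d=sqrt(1088), n=10, t_crit=2.262)
contains_zero = lower <= 0.0 <= upper     # True means H0: mu_X - mu_Y = 0 is not rejected
print(f"CI = ({lower:.2f}, {upper:.2f}), contains 0: {contains_zero}")
```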
Tests for the difference between two means: independent normal samples, population variances equal

Let X be a population with mean $\mu_X$ and variance $\sigma_X^2$ and Y be a population with mean $\mu_Y$ and variance $\sigma_Y^2$, both normally distributed with unknown but equal population variances $\sigma^2 = \sigma_X^2 = \sigma_Y^2$. Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y.

In a two-tail test $H_0: \mu_X - \mu_Y = D_0$ against $H_1: \mu_X - \mu_Y \neq D_0$:

The test statistic is
$$T = \frac{\bar{X} - \bar{Y} - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \overset{H_0}{\sim} t_{n_1+n_2-2}$$
where the estimator of the common population variance is
$$s_p^2 = \frac{(n_1 - 1)s_X^2 + (n_2 - 1)s_Y^2}{n_1 + n_2 - 2}$$

Note: the number of degrees of freedom is $n_1 + n_2 - 2$ (the total number of observations from both samples minus two; two degrees of freedom are lost in estimating $\mu_X$ and $\mu_Y$).

The rejection region is (at significance level $\alpha$):
$$RR_\alpha = \{t : t < -t_{n_1+n_2-2;\alpha/2} \ \text{or}\ t > t_{n_1+n_2-2;\alpha/2}\}$$
Tests for the difference between two means: independent normal samples, population variances equal

Example 9.8 (Newbold): A study attempted to assess the effect of the presence of a moderator on the number of ideas generated by a group. Groups of four members, with or without a moderator, were observed. For a random sample of four groups with a moderator, the mean number of ideas generated per group was 78.0, and the sample quasi-standard deviation was 24.4. For an independent sample of four groups without a moderator, the mean number of ideas generated was 63.5, and the sample quasi-standard deviation was 20.2. Assuming that the population distributions are normal with equal variances, test the null hypothesis ($\alpha$ = 0.1) that the population means are equal against the alternative that the true mean is higher for groups with a moderator.

Population 1: X = number of ideas in groups with a moderator, $X \sim N(\mu_X, \sigma_X^2)$
SRS: $n_1 = 4$
Sample: $\bar{x} = 78.0$, $s_x = 24.4$

Population 2: Y = number of ideas in groups without a moderator, $Y \sim N(\mu_Y, \sigma_Y^2)$
SRS: $n_2 = 4$
Sample: $\bar{y} = 63.5$, $s_y = 20.2$

Assume independent normal samples and $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.
Tests for the difference between two means: independent normal samples, population variances equal

Example 9.8 (Newbold, cont.)

Objective: test $H_0: \mu_X - \mu_Y = 0$ (so $D_0 = 0$) against $H_1: \mu_X - \mu_Y > 0$ (upper-tail test)

Test statistic:
$$T = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \overset{H_0}{\sim} t_{n_1+n_2-2}$$

Data: $D_0 = 0$, $n_1 = 4$, $n_2 = 4$, $\bar{x} = 78.0$, $s_x = 24.4$, $\bar{y} = 63.5$, $s_y = 20.2$

Observed test statistic:
$$s_p^2 = \frac{(n_1 - 1)s_x^2 + (n_2 - 1)s_y^2}{n_1 + n_2 - 2} = \frac{(4 - 1)24.4^2 + (4 - 1)20.2^2}{4 + 4 - 2} = 501.7, \qquad s_p = \sqrt{501.7} = 22.4$$
$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{78.0 - 63.5}{22.4\sqrt{1/4 + 1/4}} = 0.915$$

Rejection region:
$$RR_{0.1} = \{t : t > t_{6;0.1}\} = \{t : t > 1.440\}$$

Since $t = 0.915 \notin RR_{0.1}$, we cannot reject the null hypothesis at the 10% level.

Conclusion: The sample data do not contain strong evidence that, on average, more ideas are generated by groups with moderators. However, with such small sample sizes we cannot expect the test to have much power, so quite large differences in the population means would be needed to reject the null hypothesis at low significance levels.
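The pooled-variance calculation can be sketched from the summary statistics alone. The function name `pooled_t` is illustrative, and the critical value $t_{6;0.1} = 1.440$ is taken from the t-table as quoted in the example rather than computed:

```python
from math import sqrt


def pooled_t(mean_x, s_x, n_x, mean_y, s_y, n_y, d0=0.0):
    """Equal-variance two-sample t statistic from summary statistics.

    Returns (t, pooled variance, degrees of freedom).
    """
    df = n_x + n_y - 2
    # Pooled estimator of the common variance: weighted average of the two quasi-variances.
    s2_p = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / df
    t = (mean_x - mean_y - d0) / (sqrt(s2_p) * sqrt(1 / n_x + 1 / n_y))
    return t, s2_p, df


t_obs, s2_p, df = pooled_t(78.0, 24.4, 4, 63.5, 20.2, 4)
T_CRIT = 1.440                 # tabulated t_{6;0.1}, as quoted in the example
reject = t_obs > T_CRIT        # upper-tail rejection rule
print(f"s_p^2 = {s2_p:.1f}, t = {t_obs:.3f}, df = {df}, reject H0: {reject}")
```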
Two-tail test for the difference between two means via CI: independent normal samples, population variances equal

Example 9.8 (Newbold, cont.): Construct a 99% confidence interval for $\mu_X - \mu_Y$.

$$CI_{0.99}(\mu_X - \mu_Y) = \left(\bar{x} - \bar{y} \mp t_{n_1+n_2-2;0.005}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) = \left(78.0 - 63.5 \mp 3.707 \cdot 22.4\sqrt{\frac{1}{4} + \frac{1}{4}}\right) = (-44.22,\ 73.22)$$

Since the value 0 belongs to this interval, we cannot reject the null hypothesis of equality of the two population means at the $\alpha$ = 0.01 significance level.
Tests for the difference between two means: independent large samples or two normal populations with known variances

Let X be a population with mean $\mu_X$ and variance $\sigma_X^2$ and Y be a population with mean $\mu_Y$ and variance $\sigma_Y^2$. Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y, and:
- either both $n_1$ and $n_2$ are large and $\sigma_X^2$ and $\sigma_Y^2$ are unknown,
- or X and Y are normally distributed and $\sigma_X^2$ and $\sigma_Y^2$ are known.

In a two-tail test $H_0: \mu_X - \mu_Y = D_0$ against $H_1: \mu_X - \mu_Y \neq D_0$:

The test statistic is either (large samples, unknown variances)
$$Z = \frac{\bar{X} - \bar{Y} - D_0}{\sqrt{\frac{s_X^2}{n_1} + \frac{s_Y^2}{n_2}}} \overset{H_0,\ \text{approx.}}{\sim} N(0, 1)$$
or (normal populations, known variances)
$$Z = \frac{\bar{X} - \bar{Y} - D_0}{\sqrt{\frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}}} \overset{H_0}{\sim} N(0, 1)$$

The rejection region is (at significance level $\alpha$):
$$RR_\alpha = \{z : z < -z_{\alpha/2} \ \text{or}\ z > z_{\alpha/2}\}$$
Tests for the difference between two means: independent large samples or two normal populations with known variances

Example 9.7 (Newbold): A survey of practicing certified public accountants on attitudes to women in the profession was carried out. Survey respondents were asked to react on a scale from one (strongly disagree) to five (strongly agree) to the statement: "Women in public accounting are given the same job assignments as men." For a sample of 186 male accountants, the mean response was 4.059 and the sample quasi-standard deviation was 0.839. For an independent random sample of 172 female accountants, the mean response was 3.680 and the sample quasi-standard deviation was 0.966. Test the null hypothesis ($\alpha$ = 0.0001) that the two population means are equal against the alternative that the true mean is higher for male accountants.

Population 1: X = response of a male accountant, $X \sim (\mu_X, \sigma_X^2)$
SRS: $n_1 = 186$
Sample: $\bar{x} = 4.059$, $s_x = 0.839$

Population 2: Y = response of a female accountant, $Y \sim (\mu_Y, \sigma_Y^2)$
SRS: $n_2 = 172$
Sample: $\bar{y} = 3.680$, $s_y = 0.966$
Tests for the difference between two means: independent large samples or two normal populations with known variances

Example 9.7 (Newbold, cont.)

Objective: test $H_0: \mu_X - \mu_Y = 0$ (so $D_0 = 0$) against $H_1: \mu_X - \mu_Y > 0$ (upper-tail test)

Test statistic:
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{s_X^2}{n_1} + \frac{s_Y^2}{n_2}}} \overset{H_0,\ \text{approx.}}{\sim} N(0, 1)$$

Data: $D_0 = 0$, $n_1 = 186$, $n_2 = 172$, $\bar{x} = 4.059$, $s_x = 0.839$, $\bar{y} = 3.680$, $s_y = 0.966$

Observed test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{s_x^2/n_1 + s_y^2/n_2}} = \frac{4.059 - 3.680}{\sqrt{0.839^2/186 + 0.966^2/172}} = 3.95$$

Rejection region:
$$RR_{0.0001} = \{z : z > z_{0.0001}\} \approx \{z : z > 3.72\}$$

Since $z = 3.95 \in RR_{0.0001}$, we reject the null hypothesis at the 0.01% level.

Conclusion: The data contain very strong evidence that the population mean response is higher for males than for females; that is, on average, males feel more strongly than females in the profession that women are given the same job assignments as men.
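For the large-sample case everything, including the normal quantile and the p-value, can be computed with the standard library. The sketch below uses `statistics.NormalDist`; the function name `two_sample_z` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, s_x, n_x, mean_y, s_y, n_y, d0=0.0):
    """Large-sample z statistic for the difference between two means."""
    se = sqrt(s_x**2 / n_x + s_y**2 / n_y)   # estimated standard error of x_bar - y_bar
    return (mean_x - mean_y - d0) / se


z_obs = two_sample_z(4.059, 0.839, 186, 3.680, 0.966, 172)

alpha = 0.0001
std_normal = NormalDist()                     # N(0, 1)
z_crit = std_normal.inv_cdf(1 - alpha)        # upper-tail critical value z_alpha
p_value = 1 - std_normal.cdf(z_obs)           # P(Z >= z_obs) for the upper-tail test
reject = z_obs > z_crit
print(f"z = {z_obs:.2f}, z_crit = {z_crit:.2f}, p-value = {p_value:.6f}, reject H0: {reject}")
```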
Tests for the difference between two means: independent large samples or two normal populations with known variances

Example 9.7 (Newbold, cont.): Construct a 95% confidence interval for $\mu_X - \mu_Y$.

$$CI_{0.95}(\mu_X - \mu_Y) = \left(\bar{x} - \bar{y} \mp z_{0.025}\sqrt{\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}}\right) = \left(4.059 - 3.680 \mp 1.96\sqrt{0.839^2/186 + 0.966^2/172}\right) = (0.19,\ 0.57)$$

Since the value 0 does not belong to this interval, we can reject the null hypothesis of equality of the two population means at the $\alpha$ = 0.05 significance level.
Tests for the difference between two proportions: independent large samples

Let $X \sim \text{Bernoulli}(p_X)$ and $Y \sim \text{Bernoulli}(p_Y)$, where $p_X$ and $p_Y$ are two population proportions of individuals with a characteristic of interest. Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y, and that both $n_1$ and $n_2$ are large.

In a two-tail test $H_0: p_X = p_Y\ (= p_0)$ against $H_1: p_X \neq p_Y$:

The test statistic is
$$Z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \overset{H_0,\ \text{approx.}}{\sim} N(0, 1), \qquad \text{where}\quad \hat{p}_0 = \frac{n_1\hat{p}_X + n_2\hat{p}_Y}{n_1 + n_2}$$

The rejection region is (at significance level $\alpha$):
$$RR_\alpha = \{z : z < -z_{\alpha/2} \ \text{or}\ z > z_{\alpha/2}\}$$
Tests for the difference between two proportions: independent large samples

Example 9.9 (Newbold): In market research, when populations of individuals or households are surveyed by mail questionnaires, it is important to achieve as high a response rate as possible. One way to improve response might be to include in the questionnaire an initial inducement question, intended to increase the respondent's interest in completing the questionnaire. Questionnaires containing an inducement question on the importance of recreation facilities in a city were sent to a sample of 250 households, yielding 101 responses. Otherwise identical questionnaires, but without the inducement question, were sent to an independent random sample of 250 households, producing 75 responses. Test the null hypothesis that the two population proportions of responses would be the same against the alternative that the response rate would be higher when the inducement question is included.

Population 1: X = 1 if a person completes the questionnaire with the inducement question, and 0 otherwise; $X \sim \text{Bernoulli}(p_X)$
SRS: $n_1 = 250$
Sample: $\hat{p}_x = 101/250 = 0.404$

Population 2: Y = 1 if a person completes the questionnaire without the inducement question, and 0 otherwise; $Y \sim \text{Bernoulli}(p_Y)$
SRS: $n_2 = 250$
Sample: $\hat{p}_y = 75/250 = 0.300$
Tests for the difference between two proportions: independent large samples

Example 9.9 (Newbold, cont.)

Objective: test $H_0: p_X = p_Y$ against $H_1: p_X > p_Y$ (upper-tail test)

Test statistic:
$$Z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \overset{H_0,\ \text{approx.}}{\sim} N(0, 1)$$

Data: $n_1 = 250$, $n_2 = 250$, $\hat{p}_x = 0.404$, $\hat{p}_y = 0.300$

Observed test statistic:
$$\hat{p}_0 = \frac{n_1\hat{p}_x + n_2\hat{p}_y}{n_1 + n_2} = \frac{250(0.404) + 250(0.300)}{250 + 250} = 0.352$$
$$z = \frac{\hat{p}_x - \hat{p}_y}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.404 - 0.300}{\sqrt{0.352(1 - 0.352)\left(\frac{1}{250} + \frac{1}{250}\right)}} = 2.43$$

$$p\text{-value} = P(Z \ge z) = P(Z \ge 2.43) = 0.0075$$

Since the p-value is very small, the null hypothesis can be rejected at any significance level bigger than 0.0075.

Conclusion: The sample data contain very strong evidence that a higher response rate will be achieved when an inducement question is included than when it is not.
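The same calculation as a short standard-library sketch, working from the response counts (101 and 75 out of 250 each). The function name `two_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_x, n_x, successes_y, n_y):
    """Pooled z statistic for H0: p_X = p_Y; returns (z, pooled proportion)."""
    p_x = successes_x / n_x
    p_y = successes_y / n_y
    # Pooled estimate of the common proportion under H0.
    p_0 = (successes_x + successes_y) / (n_x + n_y)
    se = sqrt(p_0 * (1 - p_0) * (1 / n_x + 1 / n_y))
    return (p_x - p_y) / se, p_0


z_obs, p_0 = two_proportion_z(101, 250, 75, 250)
p_value = 1 - NormalDist().cdf(z_obs)   # upper-tail p-value P(Z >= z_obs)
print(f"p_0 = {p_0:.3f}, z = {z_obs:.2f}, p-value = {p_value:.4f}")
```

Because the unrounded statistic is used here, the p-value may differ from the table-based 0.0075 in the fourth decimal place.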
Tests for the difference between two proportions: independent large samples

Example 9.9 (Newbold, cont.): Construct a 95% confidence interval for $p_X - p_Y$.

$$CI_{0.95}(p_X - p_Y) = \left(\hat{p}_x - \hat{p}_y \mp z_{0.025}\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right) = \left(0.404 - 0.300 \mp 1.96\sqrt{0.352(1 - 0.352)\left(\frac{1}{250} + \frac{1}{250}\right)}\right) = (0.0203,\ 0.1877)$$

Since the value 0 does not belong to this interval, we can reject the null hypothesis of equality of the two population proportions at the $\alpha$ = 0.05 significance level.
Tests for the ratio of variances: normal samples

Let X be a population with mean $\mu_X$ and variance $\sigma_X^2$ and Y be a population with mean $\mu_Y$ and variance $\sigma_Y^2$, both normally distributed. Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y.

In a two-tail test $H_0: \sigma_X^2 = \sigma_Y^2\ (= \sigma^2)$ against $H_1: \sigma_X^2 \neq \sigma_Y^2$:

The test statistic is
$$F = \frac{s_X^2}{s_Y^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1}$$

The rejection region is (at significance level $\alpha$):
$$RR_\alpha = \{f : f < F_{n_1-1,\,n_2-1;\,1-\alpha/2} \ \text{or}\ f > F_{n_1-1,\,n_2-1;\,\alpha/2}\}$$
F distribution

Recall: let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ denote independent random variables, all following an N(0, 1) distribution. Then the random variable
$$F = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i^2}{\frac{1}{m}\sum_{i=1}^{m} Y_i^2}$$
follows an $F_{n,m}$ distribution with n and m degrees of freedom. We can view it as a ratio of two normalized chi-square random variables.

This is where the result from the previous page comes from: under $H_0$ (common variance $\sigma^2$),
$$\frac{s_X^2}{s_Y^2} \overset{H_0}{=} \frac{\frac{1}{n_1-1}\cdot\frac{(n_1-1)s_X^2}{\sigma^2}}{\frac{1}{n_2-1}\cdot\frac{(n_2-1)s_Y^2}{\sigma^2}} \sim F_{n_1-1,\,n_2-1}$$
since $\frac{(n_1-1)s_X^2}{\sigma^2} \sim \chi^2_{n_1-1}$ and $\frac{(n_2-1)s_Y^2}{\sigma^2} \sim \chi^2_{n_2-1}$ are independent.

[Figure: F densities for (df1, df2) = (30, 30), (10, 15), (8, 8) and (5, 3).]
Tests for the ratio of variances: normal samples

Example 9.10 (Newbold): For a random sample of 17 newly issued AAA-rated industrial bonds, the quasi-variance of maturities (in years squared) was 123.35. For an independent random sample of 11 issued CCC-rated industrial bonds, the quasi-variance of maturities was 8.02. If the respective population variances are denoted $\sigma_X^2$ and $\sigma_Y^2$, perform a two-sided test of their equality at the 10% level.

Population 1: X = maturity of AAA-rated bonds (in years), $X \sim N(\mu_X, \sigma_X^2)$
SRS: $n_1 = 17$
Sample: $s_x^2 = 123.35$

Population 2: Y = maturity of CCC-rated bonds (in years), $Y \sim N(\mu_Y, \sigma_Y^2)$
SRS: $n_2 = 11$
Sample: $s_y^2 = 8.02$
Tests for the ratio of variances: normal samples

Example 9.10 (Newbold, cont.)

Objective: test $H_0: \sigma_X^2 = \sigma_Y^2$ against $H_1: \sigma_X^2 \neq \sigma_Y^2$ (two-tail test)

Test statistic:
$$F = \frac{s_X^2}{s_Y^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1}$$

Observed test statistic: with $n_1 = 17$, $n_2 = 11$, $s_x^2 = 123.35$ and $s_y^2 = 8.02$,
$$f = \frac{123.35}{8.02} = 15.38$$

Rejection region:
$$RR_{0.10} = \{f : f < F_{16,10;\,1-0.05}\} \cup \{f : f > F_{16,10;\,0.05}\} = \{f : f < 0.402\} \cup \{f : f > 2.83\}$$

Note: the quantile $F_{16,10;0.05} = 2.83$ is directly available from the F-table, but the other one is not. We can obtain it, however, from the following property of the F distribution:
$$F_{n,m;\alpha} = \frac{1}{F_{m,n;1-\alpha}}$$
Hence
$$F_{16,10;\,1-0.05} = \frac{1}{F_{10,16;\,0.05}} = \frac{1}{2.49} = 0.402$$

We see that $f = 15.38 \in RR_{0.10}$.

Conclusion: There is very strong evidence that the population variances are different.
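A small sketch of the two-tail decision rule, assuming the two upper-tail quantiles $F_{16,10;0.05} = 2.83$ and $F_{10,16;0.05} = 2.49$ are read from the F-table as in the example (the standard library has no F quantile function). The name `f_test_two_sided` is illustrative.

```python
def f_test_two_sided(s2_x, s2_y, f_upper_xy, f_upper_yx):
    """Two-tail F test for equality of variances using tabulated upper quantiles.

    f_upper_xy: upper alpha/2 quantile of F with (n1 - 1, n2 - 1) degrees of freedom
    f_upper_yx: upper alpha/2 quantile of F with (n2 - 1, n1 - 1) degrees of freedom
    Returns (f, lower critical value, upper critical value, reject).
    """
    f = s2_x / s2_y
    lower_crit = 1.0 / f_upper_yx        # reciprocal property: F_{n,m;1-a} = 1 / F_{m,n;a}
    upper_crit = f_upper_xy
    return f, lower_crit, upper_crit, (f < lower_crit or f > upper_crit)


# Tabulated values quoted in the example: F_{16,10;0.05} = 2.83, F_{10,16;0.05} = 2.49.
f_obs, lower_crit, upper_crit, reject = f_test_two_sided(123.35, 8.02, 2.83, 2.49)
print(f"f = {f_obs:.2f}, RR: f < {lower_crit:.3f} or f > {upper_crit:.2f}, reject H0: {reject}")
```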
Two-tail test for the ratio of variances via confidence interval

Example 9.10 (Newbold, cont.): Construct a 90% confidence interval for the ratio of the variances.

$$CI_{0.90}\!\left(\frac{\sigma_X^2}{\sigma_Y^2}\right) = \left(\frac{s_x^2}{s_y^2}\cdot\frac{1}{F_{n_1-1,\,n_2-1;\,0.05}},\ \frac{s_x^2}{s_y^2}\cdot\frac{1}{F_{n_1-1,\,n_2-1;\,1-0.05}}\right) = \left(\frac{123.35}{8.02}\cdot\frac{1}{2.83},\ \frac{123.35}{8.02}\cdot\frac{1}{0.402}\right) = (5.43,\ 38.26)$$

As expected, the value 1 does not belong to this interval, so we can reject the null hypothesis of equality of the two population variances at the $\alpha$ = 0.1 significance level.
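The interval can be reproduced in a few lines. This is a sketch under the assumption that both F quantiles (2.83 and 0.402) are supplied from the table values used in the example; the helper name `variance_ratio_ci` is hypothetical.

```python
def variance_ratio_ci(s2_x, s2_y, f_upper, f_lower):
    """Confidence interval for sigma_X^2 / sigma_Y^2.

    f_upper: upper alpha/2 quantile of F with (n1 - 1, n2 - 1) degrees of freedom
    f_lower: lower alpha/2 quantile of the same distribution
    """
    ratio = s2_x / s2_y
    # Dividing by the larger quantile gives the lower limit, and vice versa.
    return ratio / f_upper, ratio / f_lower


lower, upper = variance_ratio_ci(123.35, 8.02, f_upper=2.83, f_lower=0.402)
contains_one = lower <= 1.0 <= upper   # False means equality of variances is rejected
print(f"CI = ({lower:.2f}, {upper:.2f}), contains 1: {contains_one}")
```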
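Test statistics

Each entry lists the assumptions followed by the test statistic and its distribution under $H_0$.

Parameter: $\mu_X - \mu_Y = D_0$
- Normal differences, matched pairs:
  $$\frac{\bar{D} - D_0}{s_D/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$$
- Normal populations, equal (common) variance:
  $$\frac{\bar{X} - \bar{Y} - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \overset{H_0}{\sim} t_{n_1+n_2-2}$$
- Normal populations, known variances:
  $$\frac{\bar{X} - \bar{Y} - D_0}{\sqrt{\frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}}} \overset{H_0}{\sim} N(0, 1)$$
- Nonnormal populations, unknown variances, large samples:
  $$\frac{\bar{X} - \bar{Y} - D_0}{\sqrt{\frac{s_X^2}{n_1} + \frac{s_Y^2}{n_2}}} \overset{H_0,\ \text{approx.}}{\sim} N(0, 1)$$

Parameter: $p_X - p_Y = 0$
- Bernoulli populations, large samples:
  $$\frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \overset{H_0,\ \text{approx.}}{\sim} N(0, 1)$$

Parameter: $\sigma_X^2/\sigma_Y^2 = 1$
- Normal populations:
  $$\frac{s_X^2}{s_Y^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1}$$

Question: How would you define $RR_\alpha$ in upper- and lower-tail tests?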