Two Sample Problems

The goal of inference is to compare the responses in two groups.
- Each group is a sample from a different population.
- The responses in each group are independent of those in the other group.

Example: Effects of ozone

The effects of ozone were studied in a controlled randomized experiment.
- 45 70-day-old rats were randomly assigned to treatment or control.
- Treatment group: 22 rats were kept in an environment containing ozone.
- Control group: 23 rats were kept in an ozone-free environment.
- Data: weight gains after 7 days.

We are interested in the difference in weight gain between the treatment and control groups.

Question: Do the weight gains differ between groups?

  $x_1, \ldots, x_{22}$: weight gains for the treatment group
  $y_1, \ldots, y_{23}$: weight gains for the control group

Test problem: $H_0: \mu_X = \mu_Y$ vs $H_a: \mu_X \neq \mu_Y$

Idea: Reject the null hypothesis if $|\bar{x} - \bar{y}|$ is large.

[Figure: weight gain (in grams) by group, treatment vs. control]

Two Sample Tests, Feb 23, 2004
Comparing Means

Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be two independent normally distributed samples. Then

$$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}\right)$$

Two-sample t test

Two-sample t statistic:

$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_X^2}{m} + \dfrac{s_Y^2}{n}}}$$

Under $H_0$, the distribution of $T$ can be approximated by a t distribution.

Two-sided test: $H_0: \mu_X = \mu_Y$ against $H_a: \mu_X \neq \mu_Y$
  reject $H_0$ if $|T| > t_{df,\alpha/2}$

One-sided test: $H_0: \mu_X = \mu_Y$ against $H_a: \mu_X > \mu_Y$
  reject $H_0$ if $T > t_{df,\alpha}$

Degrees of freedom:
- Approximations for df are provided by statistical software.
- Satterthwaite approximation (commonly used, conservative approximation):

$$df = \frac{\left(\dfrac{s_X^2}{m} + \dfrac{s_Y^2}{n}\right)^2}{\dfrac{1}{m-1}\left(\dfrac{s_X^2}{m}\right)^2 + \dfrac{1}{n-1}\left(\dfrac{s_Y^2}{n}\right)^2}$$

- Otherwise: use $df = \min(m - 1, n - 1)$.
Comparing Means

Example: Effects of ozone

Data:
  Treatment group: $\bar{x} = 11.01$, $s_X = 19.02$, $m = 22$
  Control group: $\bar{y} = 22.43$, $s_Y = 10.78$, $n = 23$

Test problem: $H_0: \mu_X = \mu_Y$ vs $H_a: \mu_X \neq \mu_Y$

$\alpha = 0.05$, $df = \min(m - 1, n - 1) = 21$, $t_{21,0.025} = 2.08$

The value of the test statistic is

$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_X^2}{m} + \dfrac{s_Y^2}{n}}} = -2.46$$

The corresponding P-value is

$$P(|T| \geq |t|) = P(|T| \geq 2.46) = 0.023$$

Thus we reject the hypothesis that ozone has no effect on weight gain.

Two-sample t test with STATA (group 0 is the control group, group 1 the treatment group, so STATA reports the difference with the opposite sign):

    . ttest weight, by(group) unequal

    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      23    22.42609    2.247108    10.77675    17.76587     27.0863
           1 |      22    11.00909    4.054461    19.01711    2.577378     19.4408
    ---------+--------------------------------------------------------------------
    combined |      45    16.84444    2.422057    16.24765    11.96311    21.72578
    ---------+--------------------------------------------------------------------
        diff |              11.417    4.635531                1.985043    20.84895
    ------------------------------------------------------------------------------
    Satterthwaite's degrees of freedom:  32.9179

                      Ho: mean(0) - mean(1) = diff = 0

         Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
           t =   2.4629                t =   2.4629              t =   2.4629
       P < t =   0.9904          P > |t| =   0.0192          P > t =   0.0096
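The ozone calculation can be checked from the summary statistics alone. The following is a minimal Python sketch, not part of the original slides: the function name `welch_t` and its argument order are illustrative assumptions, and the inputs are the unrounded means and standard deviations printed in the Stata output. It returns the two-sample t statistic and the Satterthwaite degrees of freedom.

```python
import math


def welch_t(mean_x, sd_x, m, mean_y, sd_y, n):
    """Return (t, df) for the two-sample t test with unequal variances.

    Hypothetical helper for illustration; df is the Satterthwaite approximation.
    """
    var_x = sd_x ** 2 / m  # estimated variance of the sample mean of X
    var_y = sd_y ** 2 / n  # estimated variance of the sample mean of Y
    se = math.sqrt(var_x + var_y)
    t = (mean_x - mean_y) / se
    df = (var_x + var_y) ** 2 / (var_x ** 2 / (m - 1) + var_y ** 2 / (n - 1))
    return t, df


# Treatment group is x, control group is y (values copied from the Stata table).
t_stat, df_satt = welch_t(11.00909, 19.01711, 22, 22.42609, 10.77675, 23)
df_conservative = min(22 - 1, 23 - 1)  # the fallback rule min(m - 1, n - 1)

print(f"t = {t_stat:.4f}")
print(f"Satterthwaite df = {df_satt:.4f}, conservative df = {df_conservative}")
```

The statistic comes out negative here because the sketch computes treatment minus control, whereas Stata reports mean(0) - mean(1), that is, control minus treatment. The P-value is left out because the Python standard library has no t distribution.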
Comparing Means

Suppose that $\sigma_X^2 = \sigma_Y^2 = \sigma^2$. Then

$$\frac{\sigma^2}{m} + \frac{\sigma^2}{n} = \sigma^2\left(\frac{1}{m} + \frac{1}{n}\right).$$

Estimate $\sigma^2$ by the pooled sample variance

$$s_p^2 = \frac{(m - 1)\,s_X^2 + (n - 1)\,s_Y^2}{m + n - 2}.$$

Pooled two-sample t test

Two-sample t statistic:

$$T = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$

Under $H_0$, $T$ is t distributed with $m + n - 2$ degrees of freedom.

Two-sided test: $H_0: \mu_X = \mu_Y$ against $H_a: \mu_X \neq \mu_Y$
  reject $H_0$ if $|T| > t_{m+n-2,\alpha/2}$

One-sided test: $H_0: \mu_X = \mu_Y$ against $H_a: \mu_X > \mu_Y$
  reject $H_0$ if $T > t_{m+n-2,\alpha}$

Remarks:
- If $m \approx n$, the test is reasonably robust against nonnormality and unequal variances.
- If the sample sizes differ a lot, the test is very sensitive to unequal variances.
- Tests for differences in variances are sensitive to nonnormality.
Comparing Means

Example: Parkinson's disease

Study on Parkinson's disease:
- Parkinson's disease, among other things, affects a person's ability to speak.
- The overall condition can be improved by an operation.
- How does the operation affect the ability to speak?
- Treatment group: eight patients received the operation.
- Control group: fourteen patients.
- Data: scores on several tests; high scores indicate problems with speaking.

[Figure: speaking ability by group, treatment vs. control]

Pooled two-sample t test with STATA (group 0 is the control group, group 1 the treatment group):

    . infile ability group using parkinson.txt
    . ttest ability, by(group)

    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      14    1.821429     .148686    .5563322    1.500212    2.142645
           1 |       8        2.45      .14516    .4105745    2.106751    2.793249
    ---------+--------------------------------------------------------------------
    combined |      22        2.05    .1249675    .5861497    1.790116    2.309884
    ---------+--------------------------------------------------------------------
        diff |           -.6285714    .2260675                -1.10014   -.1570029
    ------------------------------------------------------------------------------
    Degrees of freedom: 20

                      Ho: mean(0) - mean(1) = diff = 0

         Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
           t =  -2.7805                t =  -2.7805              t =  -2.7805
       P < t =   0.0058          P > |t| =   0.0115          P > t =   0.9942
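The pooled test for the Parkinson's data can be reproduced from the group summaries in the Stata table. The following Python sketch is illustrative only: the helper name `pooled_t` is an assumption, and group 0 is passed first so that the sign matches Stata's diff = mean(0) - mean(1).

```python
import math


def pooled_t(mean_x, sd_x, m, mean_y, sd_y, n):
    """Return (t, df, se) for the pooled two-sample t test.

    Hypothetical helper for illustration; assumes equal population variances.
    """
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((m - 1) * sd_x ** 2 + (n - 1) * sd_y ** 2) / (m + n - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / m + 1 / n)
    t = (mean_x - mean_y) / se
    return t, m + n - 2, se


# Group 0 (14 patients) first, then group 1 (8 patients), as in the Stata output.
t_pooled, df_pooled, se_pooled = pooled_t(1.821429, 0.5563322, 14,
                                          2.45, 0.4105745, 8)

print(f"t = {t_pooled:.4f}, df = {df_pooled}, std. err. = {se_pooled:.7f}")
```

The standard error agrees with the Std. Err. entry of the diff row, which is a quick way to confirm that Stata used the pooled variance here.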
Comparing Variances

Example: Parkinson's disease

In order to apply the pooled two-sample t test, the variances of the two groups have to be equal. Are the data compatible with this assumption?

F test for equality of variances

The F test statistic

$$F = \frac{s_X^2}{s_Y^2}$$

is, under $H_0$ and for normally distributed data, F distributed with $m - 1$ and $n - 1$ degrees of freedom.

    . sdtest ability, by(group)

    Variance ratio test
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      14    1.821429     .148686    .5563322    1.500212    2.142645
           1 |       8        2.45      .14516    .4105745    2.106751    2.793249
    ---------+--------------------------------------------------------------------
    combined |      22        2.05    .1249675    .5861497    1.790116    2.309884
    ------------------------------------------------------------------------------

                               Ho: sd(0) = sd(1)

                   F(13,7) observed   = F_obs          =    1.836
                   F(13,7) lower tail = F_L = 1/F_obs  =    0.545
                   F(13,7) upper tail = F_U = F_obs    =    1.836

       Ha: sd(0) < sd(1)         Ha: sd(0) != sd(1)          Ha: sd(0) > sd(1)
     P < F_obs = 0.7865     P < F_L + P > F_U = 0.3767     P > F_obs = 0.2135

Result: We cannot reject the null hypothesis that the variances are equal.

Problem: Are the data normally distributed?

[Figure: normal Q-Q plots of speaking ability against theoretical quantiles, one panel for the treatment group and one for the control group]
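The observed variance ratio is simple to verify by hand. The short Python sketch below (the helper name `variance_ratio` is an assumption) recomputes F_obs, its reciprocal F_L, and the degrees of freedom from the standard deviations in the Stata table. The tail probabilities are not recomputed, since the Python standard library does not provide the F distribution.

```python
def variance_ratio(sd_x, sd_y):
    """Return the F statistic s_X^2 / s_Y^2 (hypothetical helper)."""
    return sd_x ** 2 / sd_y ** 2


# Group 0 (14 patients) in the numerator, group 1 (8 patients) in the denominator.
f_obs = variance_ratio(0.5563322, 0.4105745)
f_lower = 1 / f_obs              # the value Stata labels F_L
df_num, df_den = 14 - 1, 8 - 1   # m - 1 and n - 1

print(f"F({df_num},{df_den}) observed = {f_obs:.3f}, lower tail value = {f_lower:.3f}")
```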
Comparing Proportions

Suppose we have two populations with unknown proportions $p_1$ and $p_2$.
- Random samples of size $n_1$ and $n_2$ are drawn from the two populations.
- $\hat{p}_1$ is the sample proportion for the first population.
- $\hat{p}_2$ is the sample proportion for the second population.

Question: Are the two proportions $p_1$ and $p_2$ different?

Test problem: $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$

Idea: Reject $H_0$ if $|\hat{p}_1 - \hat{p}_2|$ is large.

Note that, approximately,

$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$

This suggests the test statistic

$$T = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $\hat{p}$ is the combined proportion of successes in both samples,

$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2},$$

with $X_1$ and $X_2$ denoting the number of successes in each sample.

Under $H_0$, the test statistic is approximately standard normally distributed.
Comparing Proportions

Example: Question wording

The ability of question wording to affect the outcome of a survey can be a serious issue. Consider the following two questions:

1. Would you favor or oppose a law that would require a person to obtain a police permit before purchasing a gun?
2. Would you favor or oppose a law that would require a person to obtain a police permit before purchasing a gun, or do you think such a law would interfere too much with the right of citizens to own guns?

In two surveys, the following results were obtained:

    Question    Yes     No    Total
       1        463    152     615
       2        403    182     585

Question: Is the true proportion of people favoring the permit law the same in both groups or not?

    . prtesti 615 0.753 585 0.689

    Two-sample test of proportion                      x: Number of obs =      615
                                                       y: Number of obs =      585
    ------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |       .753   .0173904                      .7189155    .7870845
           y |       .689   .0191387                      .6514889    .7265111
    ---------+--------------------------------------------------------------------
        diff |       .064   .0258595                      .0133163    .1146837
             |  under Ho:   .0258799    2.47   0.013
    ------------------------------------------------------------------------------

                  Ho: proportion(x) - proportion(y) = diff = 0

         Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
           z =    2.473                z =    2.473              z =    2.473
       P < z =   0.9933          P > |z| =   0.0134          P > z =   0.0067
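The z statistic for the question-wording example follows directly from the pooled-proportion formula. The Python sketch below is an illustration under stated assumptions: the function name `two_proportion_z` is hypothetical, and it takes the rounded proportions 0.753 and 0.689 that were passed to `prtesti` rather than the raw counts. The two-sided P-value uses the standard normal approximation from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1_hat, n1, p2_hat, n2):
    """Return (z, two-sided P-value) for H0: p1 = p2 (hypothetical helper)."""
    # Combined proportion of successes in both samples.
    pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    # Standard error of the difference under H0.
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z_stat, p_value = two_proportion_z(0.753, 615, 0.689, 585)
print(f"z = {z_stat:.3f}, two-sided P = {p_value:.4f}")
```

Using the exact counts (463/615 and 403/585) instead of the rounded proportions changes z slightly in the third decimal place, but not the conclusion.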
Final Remarks

Statistical theory focuses on the significance level, the probability of a type I error. In practice, the power of a test is also important.

Example: Efficient market hypothesis

Efficient market hypothesis for stock prices:
- Future stock prices show only random variation.
- The market incorporates all information available now in present prices.
- No information available now will help to predict future stock prices.

Testing of the efficient market hypothesis:
- Many studies tested $H_0$: the market is efficient vs $H_a$: prediction is possible.
- Almost all studies failed to find good evidence against $H_0$.
- Consequently, the efficient market hypothesis became quite popular.

Problem:
- Power was generally low in the significance tests employed in the studies.
- Failure to reject $H_0$ is no evidence that $H_0$ is true.
- More careful studies showed that the size of a company and measures of value, such as the ratio of stock price to earnings, do help predict future stock prices.
Final Remarks

Example: IQ of 1000 women and 1000 men

  $\hat{\mu}_w = 100.68$, $\hat{\sigma}_w = 14.91$
  $\hat{\mu}_m = 98.90$, $\hat{\sigma}_m = 14.68$

Pooled two-sample t test: $T = 2.7009$

Reject $H_0: \mu_w = \mu_m$ since $|T| > t_{1998,0.005} = 2.58$.

The difference in IQ is statistically significant at the 0.01 level. However, we might conclude that a difference of less than two IQ points is scientifically irrelevant.

Note: Significance at a low level (a small P-value) does not mean there is a large difference, but only that there is strong evidence that there is some difference.
Final Remarks

Example: Is radiation from cell phones harmful?

- Observational study.
- Comparison of brain cancer patients and a similar group without brain cancer.
- No statistically significant association was found between cell phone use and a group of brain cancers known as gliomas.
- Separate analyses for 20 types of gliomas found an association between phone use and one rare form. Risk seemed to decrease with greater mobile phone use.

Think for a moment:
- Suppose all 20 null hypotheses are true.
- Each test has a 5% chance of being significant: the outcome is Bernoulli distributed with parameter 0.05.
- Treating the tests as independent, the number of false positive tests is binomially distributed: $N \sim \mathrm{Bin}(20, 0.05)$.
- The probability of getting one or more positive results is

$$P(N \geq 1) = 1 - P(N = 0) = 1 - 0.95^{20} = 0.64.$$

We therefore might have expected at least one significant association.

Beware of searching for significance.
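The multiple-testing arithmetic can be checked in a few lines. This Python sketch assumes, as the binomial model does, that the 20 tests are independent and that every null hypothesis is true; the function name `prob_any_false_positive` is illustrative.

```python
def prob_any_false_positive(k, alpha):
    """P(at least one of k independent true-null tests is significant)."""
    return 1 - (1 - alpha) ** k


k_tests, alpha = 20, 0.05
p_any = prob_any_false_positive(k_tests, alpha)
expected_false_positives = k_tests * alpha  # mean of Bin(20, 0.05)

print(f"P(N >= 1) = {p_any:.2f}, expected false positives = {expected_false_positives:.0f}")
```

With an expected count of one false positive among 20 tests, a single significant subtype is close to what chance alone would produce.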
Final Remarks

Problem: If several tests are performed, the probability of a type I error increases.

Idea: Adjust the significance level of each single test.

Bonferroni procedure:
- Perform $k$ tests.
- Use significance level $\alpha/k$ for each of the $k$ tests.
- If all null hypotheses are true, the probability that any of the tests rejects its null hypothesis is at most $\alpha$.

Example

Suppose we perform $k = 6$ tests with $\alpha = 0.05$, so $\alpha/k = 0.0083$, and obtain the following P-values:

    P-value:   0.476   0.032   0.241   0.008*   0.010   0.001*

After the Bonferroni adjustment, only two tests (*) are significant at the overall 0.05 level.
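The Bonferroni example can be reproduced with a small Python sketch. The function name `bonferroni_significant` is an assumption for illustration, and the P-values are the six listed in the example.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test threshold alpha/k and the P-values below it."""
    threshold = alpha / len(p_values)
    return threshold, [p for p in p_values if p < threshold]


p_values = [0.476, 0.032, 0.241, 0.008, 0.010, 0.001]
threshold, significant = bonferroni_significant(p_values)
unadjusted = [p for p in p_values if p < 0.05]  # no correction, for comparison

print(f"alpha/k = {threshold:.4f}")
print(f"significant after Bonferroni: {significant}")
print(f"significant without adjustment: {unadjusted}")
```

Without the adjustment, four of the six tests would fall below 0.05. The correction keeps only the two with P-values below $\alpha/k$.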