Stat 529 (Winter 2011)
Experimental Design for the Two-Sample Problem

Reading: 2.4 2.6.

Motivation: Designing a new silver coins experiment
Sample size calculations
Margin of error for the pooled two sample t CI
The power of the pooled two sample t-test
The paired experiment versus the two independent samples experiment
Comparing the standard errors for the teachers example

Motivation: Designing a new silver coins experiment

Problem: to distinguish whether there was a change in the silver content in coins minted during the reign of....

Design: Two samples are chosen from coins minted in the early and late periods of the reign.

Analysis: A two-sample pooled t-test along with a 95% confidence interval for $\mu_1 - \mu_2$, the difference in the mean silver content.

Goals:
1. To provide an interval with a margin of error of 0.2.
2. What sample sizes would we need to detect a difference of 0.2 with a two-sided two-sample pooled t-test at level $\alpha = 0.05$ with 90% power?

Experimental design

Experimental design is the act of evaluating and choosing between different experiments.

Sample size calculations are commonly used either before or after an experimental design is chosen.

Here are two different approaches to sample size calculation for the two-sample pooled problem:
1. You want to select the sample sizes for a C% confidence interval for $\mu_1 - \mu_2$ with a certain margin of error, m.
2. You want to select the sample sizes for testing $H_0: \mu_1 = \mu_2$ with a certain significance level and power.
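
The margin of error

Remember the pooled t-based $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{Y}_1 - \bar{Y}_2) \pm t_{n_1+n_2-2}(1-\alpha/2)\,\mathrm{S.E.}(\bar{Y}_1 - \bar{Y}_2),$$

where

$$\mathrm{S.E.}(\bar{Y}_1 - \bar{Y}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

In the above interval, the margin of error is

$$m = t_{n_1+n_2-2}(1-\alpha/2)\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

For the silver coins example, for a confidence level of C = 95%, say we want a margin of error of m = 0.2.

Why is this hard to solve?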

Making approximations

We have

$$m = t_{n_1+n_2-2}(0.975)\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Approximation 1: Setting $n_1 = n_2 = n$ we have:

$$m = t_{2n-2}(0.975)\, s_p\sqrt{\frac{2}{n}}.$$

Approximation 2: Plug in an estimate of $s_p$ (we will use the value of 0.474 from the data from Manuel I's reign).

Approximation 3: Setting the df = we get a guess for n:
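
One way to carry the three approximations through numerically is sketched below. It assumes the degrees of freedom are taken to be infinite, so that the t quantile is replaced by the standard normal quantile; that choice is an assumption made here, not a value stated on the slide. The function name `n_per_group_for_margin` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_for_margin(m, s_p, conf=0.95):
    """Smallest equal group size n with z * s_p * sqrt(2/n) <= m.

    Assumes n1 = n2 = n, a plug-in value for s_p, and the normal
    quantile z in place of the t quantile (df taken as infinite).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Solve m = z * s_p * sqrt(2/n) for n, then round up.
    return math.ceil(2 * (z * s_p / m) ** 2)


# Silver coins: margin of error 0.2, s_p = 0.474 from Manuel I's reign.
n_per_group = n_per_group_for_margin(m=0.2, s_p=0.474)
print(n_per_group)
```

Because the t quantile is always larger than the normal quantile, this guess is a lower bound; it can be checked by plugging n back into the Approximation 1 formula with the $t_{2n-2}(0.975)$ quantile.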

Assumptions, assumptions

This calculation for $n_1$ and $n_2$ assumes that
1. A two-sample pooled t procedure will be appropriate.
2. We can actually obtain samples of size $n_1$ and $n_2$.
3. The variability in the sample we will collect is similar to that of our Byzantine coins.

The power of a significance test

Remember, the power of a significance test is related to the probability of a type II error:

Power = 1 − P(Type II error)
      = 1 − P(fail to reject $H_0$ when $H_0$ is false)
      = P(reject $H_0$ when $H_0$ is false).

We need to be specific about what "when $H_0$ is false" means.

For the two-sample t-test:
$H_0$ true: $\mu_1 - \mu_2 = 0$
$H_0$ false: $\mu_1 - \mu_2 \neq 0$

To compute power, we must specify which value of $\mu_1 - \mu_2$ in the alternative hypothesis we mean when we say $H_0$ is false.

The power of the pooled two sample t-test

We use MINITAB.

Stat → Power and Sample Size → 2-Sample t.

Under Options select the Alternative Hypothesis and Significance Level.

Then enter any two of the following three items:
1. Sample sizes:
2. Differences: (the difference between the $\mu_1 - \mu_2$ value under $H_a$ and the $\mu_1 - \mu_2$ value under $H_0$).
3. Power values:

Enter the Standard deviation ($s_p$) and click OK.

Power calculation for the silver coins

What sample sizes would we need to detect a difference of 0.2 with a two-sided two-sample pooled t-test at level $\alpha = 0.05$ with 90% power?

    Power and Sample Size

    2-Sample t Test

    Testing mean 1 = mean 2 (versus not =)
    Calculating power for mean 1 = mean 2 + difference
    Alpha = 0.05  Assumed standard deviation = 0.474

                Sample  Target
    Difference    Size   Power  Actual Power
           0.2     120     0.9      0.902368

    The sample size is for each group.

We need $n_1 = n_2 = 120$.
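
As a rough cross-check on the MINITAB output, the power calculation can be sketched with a normal approximation. This is not MINITAB's method: the sketch below replaces the t distributions with the standard normal, so it lands near, but not exactly on, the reported values (about 119 per group rather than 120). The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def approx_power(n, diff, sd, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test, n per group."""
    se = sd * math.sqrt(2 / n)           # S.E. of the difference in means
    z_crit = Z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = diff / se                    # true difference in S.E. units
    # Probability of rejecting in either tail when the true difference is diff.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def approx_n_per_group(diff, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a target power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_pow = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_pow) * sd / diff) ** 2)


n_approx = approx_n_per_group(diff=0.2, sd=0.474)
power_at_120 = approx_power(n=120, diff=0.2, sd=0.474)
print(n_approx, round(power_at_120, 3))
```

The normal approximation is slightly optimistic because it ignores the extra uncertainty from estimating the standard deviation, which is why the t-based calculation asks for a slightly larger sample.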

The paired experiment versus the two independent samples experiment

To compare these two experiments, we will evaluate the standard error (S.E.) under each setup for the Spanish teachers.

We could also compare the margins of error of the confidence intervals.

Spanish teachers example

Suppose that the data had been collected on two separate sets of 20 teachers: one set at the beginning of the course and one set at the end of the course.

Here are the statistics required for the two-sample analysis (check this for yourself):

For sample 1 (pre):  $n_1 = 20$, $\bar{Y}_1 = 27.3$, $s_1^2 = 25.38$
For sample 2 (post): $n_2 = 20$, $\bar{Y}_2 = 28.75$, $s_2^2 = 22.51$

The pooled estimate of $\sigma$ is

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)}} = 4.89.$$

The two sample analysis (exercise!)

Hypotheses: $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 < \mu_2$, where $\mu_1$ is the pre-mean score, and $\mu_2$ is the post-mean score.

The two-sample pooled t statistic, with $\delta = 0$ under $H_0$, is

$$t = \frac{\bar{Y}_1 - \bar{Y}_2 - \delta}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{-1.45}{4.89\sqrt{\frac{1}{20} + \frac{1}{20}}} = \frac{-1.45}{1.546} = -0.937.$$

If T is a t distributed random variable on $n_1 + n_2 - 2 = 20 + 20 - 2 = 38$ degrees of freedom, the p-value is $P(T < -0.937) = 0.177$.

Conclusion:
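
The arithmetic for the exercise can be checked from the summary statistics alone. The sketch below recomputes the pooled standard deviation, the standard error, and the t statistic; the variable names are chosen here for illustration. It carries $s_p$ at full precision, so the standard error comes out as 1.547 rather than the 1.546 obtained by first rounding $s_p$ to 4.89.

```python
import math

# Summary statistics for the Spanish teachers (pre and post samples).
n1, ybar1, var1 = 20, 27.3, 25.38
n2, ybar2, var2 = 20, 28.75, 22.51

# Pooled estimate of sigma.
s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / ((n1 - 1) + (n2 - 1)))

# Standard error of the difference in sample means.
se = s_p * math.sqrt(1 / n1 + 1 / n2)

# Pooled t statistic with hypothesized difference delta = 0.
delta = 0.0
t_stat = (ybar1 - ybar2 - delta) / se
df = n1 + n2 - 2

print(round(s_p, 2), round(se, 3), round(t_stat, 3), df)
```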

Comparing the SEs for the two experimental designs

The paired experiment: Test the same n = 20 teachers before and after the course. The S.E. for the sample mean of the differences, $\bar{Y}$, is

$$\mathrm{S.E.}(\bar{Y}) = \frac{s_d}{\sqrt{n}} = 0.716.$$

The non-paired experiment: Test different teachers before and after the course, with $n_1 = n_2 = 20$.

$$\mathrm{S.E.}(\bar{Y}_1 - \bar{Y}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.546.$$

Was it wise to originally use the paired t-test?
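
To put the two standard errors on a common scale, one can ask how many teachers per group the non-paired design would need to match the paired design's S.E. The sketch below assumes $s_p$ would stay the same at the larger sample size, which is an assumption made here for illustration; it uses only the two S.E. values quoted above.

```python
import math

se_paired = 0.716    # S.E. of the mean difference, n = 20 pairs
se_unpaired = 1.546  # S.E. of the difference in means, n1 = n2 = 20

# How many times larger the non-paired S.E. is.
ratio = se_unpaired / se_paired

# S.E. shrinks like 1/sqrt(n), so matching the paired S.E. needs
# ratio**2 times as many teachers per group (assuming s_p is unchanged).
n_multiplier = ratio ** 2
n_needed = math.ceil(20 * n_multiplier)

print(round(ratio, 2), round(n_multiplier, 2), n_needed)
```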