Chapter 7

Reading: 7.1, 7.2
Questions: 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112

Introduction

In Chapters 5 and 6, we emphasized sampling distributions for a mean and assumed that the sampling distribution was either Normal or could be approximated as Normal. We also generally assumed that we knew the population standard deviation. These assumptions allowed us to develop an understanding of a number of concepts such as the confidence interval, significance tests, P-values, and Type I and Type II errors. They allowed us to compute a z score and to establish critical values:

Confidence intervals: \bar{x} \pm z^* \sigma/\sqrt{n}, where z^* is the critical value corresponding to confidence level C.

Hypothesis testing: the P-value is the probability, under the null hypothesis, of obtaining a z value more extreme than the z score for the sample. A critical value z^* can be used for a significance level \alpha.

The presentation used generic terms such as the test statistic, critical values, and the standard deviation of the test statistic. These general terms anticipated the fact that most often, all we have is a sample! We don't know the standard deviation of the population. Nor can we assume that our test statistic is normally distributed.

The z score/statistic represents a centering and rescaling of a random variable. It represents a deviation (from a mean) expressed in units of standard deviations. The t statistic can be thought of as an estimate of the z statistic that uses an estimate of the standard deviation. In Chapter 1, we computed/estimated the variance and standard deviation from a sample as:

    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}
When we know the standard deviation, σ, for the population, we computed the standard deviation for the sample mean as

    \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

When we don't know the standard deviation, we estimate the standard deviation for the sample mean as

    SE_{\bar{x}} = \frac{s}{\sqrt{n}}

SE_{\bar{x}} is known as the standard error or standard error of the mean. The t statistic is:

    t = \frac{\bar{x} - \mu}{s/\sqrt{n}}

If x_i \sim N(\mu, \sigma) (normally distributed with a mean μ and standard deviation σ), then t will have a t distribution with n − 1 degrees of freedom. If we took repeated samples and recalculated t many times, the probability that the value of t would be in some interval dt would be independent of μ and σ but would depend on the sample size n. For any particular sample size, there is a corresponding t distribution: t(1) is the t distribution based on a sample size of 2, t(2) is the t distribution based on a sample size of 3, and so on. The t distribution (for any sample size) looks very similar to the z distribution: it is a symmetric, bell-shaped curve.
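The quantities just defined can be sketched in a few lines of Python. This is only an illustration: the sample values and the hypothesized mean mu0 below are made up for the example.

```python
import math
import statistics

# Hypothetical sample of n = 8 measurements (illustrative data only)
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0]
mu0 = 5.0                          # hypothesized population mean

n = len(sample)
xbar = statistics.mean(sample)     # sample mean, x-bar
s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)              # standard error of the mean, s / sqrt(n)
t = (xbar - mu0) / se              # t statistic with n - 1 degrees of freedom
```

Note that `statistics.stdev` already divides by n − 1, matching the Chapter 1 formula for s.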
[Figure 1: z density function (smooth) and t(5) density function (circles)]

The t distribution (for all degrees of freedom) is shorter (less probability in the middle) and so fatter at the tails (more probability in the tails) than the standard normal. Critical t* values can be computed for use in developing margins of error for confidence intervals or for one- or two-tailed hypothesis tests. The critical values identify points that partition X into events (subsets of the sample space) with specific probabilities.

Inference for the mean (one sample)

Whether we are using a z statistic (with a known σ) or a t statistic (with an estimated s), we are always assuming that our population is normally distributed or at least can be assumed to be approximately normally distributed. With this assumption, all of the previous formulas used when σ was known are repeated, but with \sigma/\sqrt{n} replaced with SE = s/\sqrt{n} and z values replaced with corresponding t values.
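The "fatter tails" claim can be checked by simulation, echoing the idea of taking repeated samples and recalculating t many times. A minimal sketch (the sample size n = 3 and trial count are arbitrary choices for illustration):

```python
import math
import random
import statistics

# Monte Carlo check that the t statistic has fatter tails than z.
# Draw many samples of size n = 3 from a Normal(0, 1) population,
# compute t each time, and count how often |t| exceeds 1.96
# (the z* that cuts off 5% in the tails of the standard normal).
random.seed(0)
n, trials = 3, 20000
exceed = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    t = xbar / (s / math.sqrt(n))
    if abs(t) > 1.96:
        exceed += 1
frac = exceed / trials
print(round(frac, 3))   # well above 0.05 -- t(2) has much heavier tails
```

For a Normal test statistic the fraction would be about 0.05; for t with 2 degrees of freedom it is closer to 0.19, which is why Table D's small-df critical values are so large.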
Table 1: Basic formulae used in hypothesis testing and parameter estimation. Critical values of the t statistic, t*, are based on a t distribution with k = n − 1 degrees of freedom (t(k)).

C level confidence interval for the mean:
    Known σ:      \bar{x} \pm z^* \sigma/\sqrt{n},  where P(-z^* \le z \le z^*) = C
    Estimated s:  \bar{x} \pm t^* s/\sqrt{n},       where P(-t^* \le t \le t^*) = C

Hypothesis test (two-sided alternative) with a significance level α:
    Known σ:      reject H_0 if |z| \ge z^*,  where P(|z| \ge z^*) = \alpha
    Estimated s:  reject H_0 if |t| \ge t^*,  where P(|t| \ge t^*) = \alpha

Hypothesis test (one-sided alternative) with a significance level α (the following considers just one of the two one-sided alternatives):
    Known σ:      reject H_0 if z \ge z^*,  where P(z \ge z^*) = \alpha
    Estimated s:  reject H_0 if t \ge t^*,  where P(t \ge t^*) = \alpha

P-value:
    Known σ:      P(z is more extreme than the observed z)
    Estimated s:  P(t is more extreme than the observed t)

Table D

Critical values, z*, used in the above formulas can be determined from Table A, a table of standard normal probabilities. The determination is tedious, particularly if you are using a confidence level or a significance level for a two-tailed test: C or α must be divided by two, and this probability must be isolated in the body of the table. The probability is seldom exactly represented, leaving the user to consider interpolation between a couple of neighbouring values. What we really want in these situations is a table that allows us to look up a specific value of C (like 95%) or a specific value of α (like 5%) and find the appropriate corresponding critical value. Table D does this! Equivalent tables to Table A for determining critical values of t* would require a two-page table for each possible number of degrees of freedom. Table D instead condenses the summary information from 36 different t distributions together with summary information from Table A for the standard normal distribution. Each row of Table D displays critical values for t(k) (the last row displays critical values for z). The header row identifies an upper-tail probability. The footer row identifies a confidence level C.
A two-tailed probability (α) can be computed either as two times the one-tailed probability or as 1 − C. The body of the table displays the corresponding critical value. The degrees of freedom (sample size minus 1, n − 1) are used to identify each row. The degrees of freedom range from 1 to 30 and then jump to 40, 50, 60, 80, 100, and 1000. The last row displays the critical values determined from the standard normal. Inspection of the table reveals that as the sample size increases, the critical values from the corresponding t distribution get closer and closer to the critical values of the z distribution. (This makes intuitive sense in that the t distribution uses s to estimate σ. As the sample size increases, this estimate gets better and better.)
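The last row of Table D (the standard normal critical values) can be reproduced with Python's standard library; this is just a sketch of where those numbers come from, not a substitute for the table. For a confidence level C, z* is the (1 + C)/2 quantile of the standard normal:

```python
from statistics import NormalDist

# z* satisfies P(-z* <= Z <= z*) = C, so z* is the (1 + C)/2 quantile.
for C in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf((1 + C) / 2)
    print(C, round(z_star, 3))   # 1.645, 1.96, 2.576 -- the z row of Table D
```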
As an example of the use of Table D, consider a random sample of 25 SAT scores from a population of students. The sample mean is 450, the sample standard deviation, s, is 20, and the degrees of freedom, k, is 24. The 95% confidence interval is:

    \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 450 \pm 2.064 \times \frac{20}{\sqrt{25}} = 450 \pm 8.3

The interpretation is that if you claimed (in situations like this) that the population mean was between 442 and 458, you would be correct 95% of the time (or wrong 5% of the time).

Two Sample Tests

The simplest two-sample test is not really a two-sample test, as it involves matched pairs. This can arise in an experimental situation that involves selecting n pairs of similar experimental units, randomly assigning each unit in a matched pair to one of two treatments, and testing to see if there is a treatment
effect. In other situations, the response can be measured on the same individual before and after a treatment. In matched pairs, the focus is on the difference between the responses within a pair. It is a single sample (of size n, for the n pairs) of differences. Any of the one-sample inferences (Table 1) can be applied to the differences within the pairs of a sample of matched pairs. The column you use will depend on whether you have a known or assumed standard deviation for the population of differences or have estimated the standard deviation from the sample of differences. \bar{x} is replaced by \bar{d}, the average of the observed differences.

The next level of complexity involves two samples that are assumed to come from a population or populations that are normally distributed (or approximately Normal) and have a common standard deviation. As an example, a comparative experiment involving a control and a treatment would be considered, under the null hypothesis, as two independent samples from a common population. Is there a sufficient difference between the sample means to reject the null hypothesis? If the common population standard deviation is σ, the two population means are \mu_1 and \mu_2, and the samples are independent, then the mean of the difference between the two sample means will be the difference between the population means:

    \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2

and the variance of the difference will be the sum of the two variances:

    \sigma_d^2 = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}

(Keep in mind that the above mean and variance represent the mean and variance of the difference of just two numbers, the means of the two samples. This difference, a single number, is a random variable and has a sampling distribution.) If it is additionally assumed that the distributions of the sample means are Normal or can be approximated as Normal, then:

    \bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}\right)

The left-hand column of Table 1 can now be used. \bar{x} is replaced by d = \bar{x}_1 - \bar{x}_2, the observed difference between the two sample means, and \sigma_{\bar{x}} is replaced by \sigma_d, the population standard deviation for the difference, d.
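A minimal sketch of the known-σ case just described (all the numbers here, the common σ, the sample sizes, and the two sample means, are assumptions for illustration):

```python
import math

# Known-sigma two-sample setup: illustrative numbers only.
sigma = 15.0                  # common population standard deviation (assumed known)
n1, n2 = 30, 40               # the two sample sizes
xbar1, xbar2 = 103.0, 99.0    # the two sample means

d = xbar1 - xbar2                              # observed difference between means
sigma_d = sigma * math.sqrt(1 / n1 + 1 / n2)   # standard deviation of the difference
z = d / sigma_d                                # z statistic for H0: mu1 - mu2 = 0
```

From here, the left-hand column of Table 1 applies exactly as in the one-sample case.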
One can construct a confidence interval around the estimate, d, of the difference
between the two population averages. Or one can create a one-tailed or two-tailed hypothesis test with some hypothesized value for the difference between the two population means. If σ is not known and must be estimated, we can use the two estimates, s_1 and s_2, and combine them to provide a pooled estimate of σ. Each estimate is combined in proportion to its degrees of freedom relative to the total degrees of freedom for the two samples. (The degrees of freedom reflect the number of independent observations that have been used to generate the estimate. An estimate of the sample variation uses n observations but also uses the estimated mean. The mean is said to use one degree of freedom. To put it another way, if I told you the mean for a sample of n points, and told you the values of n − 1 points, you would be able to tell me the value of the nth point. The nth point, given the mean and the previous n − 1 points, is, in a sense, superfluous. The effective number of points being used to estimate the variance is one less than the sample size.) Pooling estimates in proportion to their degrees of freedom places greater emphasis on those estimates that are based on larger samples.

    s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

s_p^2 is the pooled estimate of the common variance; s_1^2 and s_2^2 are the estimated variances from the two samples. The best pooled estimated variance for each of the sample means uses this pooled estimate and each of the sample sizes:

    \frac{s_p^2}{n_1} \quad \text{and} \quad \frac{s_p^2}{n_2}

The estimated variance for the difference is the sum of these estimates:

    SE_d^2 = \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2} = s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)

(The t statistic, a test statistic, will have a t distribution with degrees of freedom equal to the sum of the degrees of freedom for the two samples, n_1 + n_2 - 2.) Any of the one-sample inferences (Table 1) can be applied to the difference between sample averages (with an assumed common standard deviation).
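The pooled calculation can be sketched as a small helper function; the trailing usage line plugs in the two-school SAT numbers used as an example later in the chapter (s = 40 and 42, n = 25 and 36):

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Standard error of the difference between two sample means,
    assuming a common population standard deviation (pooled t)."""
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # SE of the difference: sp^2 * (1/n1 + 1/n2), then square root
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(pooled_se(40, 25, 42, 36), 1))   # about 10.7
```

The resulting t statistic, (d − Δ₀)/SE_d, is referred to a t distribution with n_1 + n_2 − 2 degrees of freedom.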
The column you use will depend on whether you have a known or assumed standard deviation for the population or have estimated the standard error of the difference (standard deviation) from the pooled estimates of the sample variances. All occurrences of \bar{x} are replaced by d. \sigma_d and SE_d are computed from the above equations. And n and the degrees of
freedom are the total number of observations in the two samples or the sum of the degrees of freedom from the two samples, respectively. As an example, consider two schools that tested 25 and 36 randomly selected students. In the first school, the SAT scores had an average and standard deviation of 470 and 40, respectively. In the second school, the average and standard deviation were 455 and 42. Assuming that the distribution of SAT scores for both schools can be approximated as Normal with a common standard deviation, we can estimate the 95% confidence interval for the difference between the mean school scores:

    s_p^2 = \frac{24 \times 40^2 + 35 \times 42^2}{25 + 36 - 2} \approx 1697, \qquad SE_d = \sqrt{s_p^2\left(\frac{1}{25} + \frac{1}{36}\right)} \approx 10.73

The critical value t* for the t statistic with 59 degrees of freedom for a confidence level of 95% is around 2.00. The estimated difference between the two schools is 15 \pm 2.00 \times 10.73 \approx 15 \pm 21.5. A two-tailed hypothesis test with H_0: \mu_1 - \mu_2 = 0 would not be rejected at a significance level of .05. (The 95% confidence interval contains the point 0.) One can also use the formula

    t = \frac{d - 0}{SE_d} = \frac{15}{10.73} \approx 1.40

substituting d and SE_d in place of \bar{x} and SE_{\bar{x}}. The null hypothesis would be rejected if |t| \ge 2.00. As expected, we cannot conclude that there is a significant difference in the average SAT scores between the two schools.

The final complication would be to consider two samples coming from two different populations with two different population variances. The computations are similar to those above. If we assume we know the two variances, \sigma_1^2 and \sigma_2^2, then the expected value (mean) of the difference between the sample means will be the difference of the expected values, and the variance of the difference will be the sum of the variances of the two sample means:

    \mu_d = \mu_1 - \mu_2, \qquad \sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
The z score for the difference is:

    z = \frac{d - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}

As before, we can proceed with the left column of Table 1 by substituting d for \bar{x} and \sigma_d for \sigma_{\bar{x}}. A problem arises when we have to estimate the two population variances. We would like to use as a test statistic some standardized score for the difference between the two means. Recall that we developed the t statistic as if it were an approximation to the z statistic but with an estimate for the standard deviation. We can estimate SE_d as the square root of the sum of the estimated variances of the two sample means:

    SE_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad t = \frac{d - (\mu_1 - \mu_2)}{SE_d}

If we do this, t is referred to as the two-sample t statistic. It does not have a t distribution! In fact, the actual distribution for the two-sample t statistic depends on the unknown variances \sigma_1^2 and \sigma_2^2. However, the shape is still bell shaped and can be approximated by a t distribution. A good approximation is provided if the degrees of freedom, k, for the t distribution are chosen to be one less than the smaller sample size. (Computer software can be used to estimate a better value for k. Since this value for k is being used to pick a t distribution curve to approximate a two-sample t distribution, the value of k may be a decimal number. The value of k is referred to as the degrees of freedom for the two-sample t procedure, but it is really just a value that allows you to pick the best approximating t distribution.) Once again, we can use Table 1 if we make all of the appropriate substitutions, including the estimated degrees of freedom. As an example, consider again the two schools that tested 25 and 36 randomly selected students. In the first school, the SAT scores had an average and standard deviation of 470 and 40, respectively. In the second school, the average and standard deviation were 455 and 42. Again, we assume that the distribution of SAT scores for both schools can be approximated as Normal, but we don't assume that they necessarily have a common standard deviation.
We can estimate the 95% confidence interval for the difference between the mean school scores by first estimating the standard error of the difference:

    SE_d = \sqrt{\frac{40^2}{25} + \frac{42^2}{36}} = \sqrt{64 + 49} \approx 10.6
and then computing a critical value t*(24) = 2.064. The 95% confidence interval for the difference in mean SAT scores for the two schools is:

    15 \pm 2.064 \times 10.6 = 15 \pm 21.9

This is just slightly more conservative (a broader confidence interval) than the pooled two-sample t procedure based on an estimate of a common variance. The two are very similar, reflecting the similarity in the two estimates of the standard error of the difference (10.6 versus about 10.7).

Summary

The focus of this chapter is on comparing the difference between the means of two samples. In all situations, the data collected from the two samples is condensed into a single number (test statistic) that is a standardized value for the difference between the two populations. The test statistic will, under suitable assumptions, have a standard normal distribution, a t distribution, or a two-sample t distribution. The test statistic can be used to develop confidence intervals around the sample difference (the difference between the two sample means) or to compute P-values for hypothesis tests. The complications involve determining the standard deviation or standard error to be used in computing a standardized test statistic. Under the null hypothesis, the test statistic can be computed as:

Known σ (z distribution):
    z = \frac{d - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}

Unknown σ, matched pairs (t distribution, n − 1 degrees of freedom):
    t = \frac{\bar{d} - \Delta_0}{s_d/\sqrt{n}}

Unknown σ, pooled two-sample procedure, equal variances (t distribution, df_1 + df_2 degrees of freedom):
    t = \frac{d - \Delta_0}{s_p\sqrt{1/n_1 + 1/n_2}}

Unknown σ, two-sample procedure, unequal variances (approximated by a t distribution):
    t_2 = \frac{d - \Delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

σ is the population standard deviation, \bar{d} is the average difference between matched pairs, n is the number of matched pairs, s_d is the estimated standard deviation of the sample of differences between the matched pairs, d is the difference between the two sample means, n_1 is the sample size for population 1, n_2 is the sample size for population 2, df_1 is the degrees of freedom for the standard error
for sample 1, df_2 is the degrees of freedom for the standard error for sample 2, s_1 and s_2 are the estimated standard deviations from sample 1 and sample 2, and t_2 is a two-sample t statistic that has a distribution that is approximated by a t distribution with degrees of freedom equal to the smaller of df_1 or df_2. \Delta_0 is the hypothesized difference between the two population means and is generally assumed to be zero.
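The unequal-variance case from the summary can be sketched as a small function; the usage line below reruns the two-school SAT example from this chapter with the conservative degrees of freedom (one less than the smaller sample size):

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Two-sample t statistic (unequal variances) with the conservative
    degrees of freedom: one less than the smaller sample size."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    t = (xbar1 - xbar2 - delta0) / se         # standardized difference
    df = min(n1, n2) - 1                      # conservative approximating df
    return t, df, se

# The two-school SAT example from the text:
t, df, se = two_sample_t(470, 40, 25, 455, 42, 36)
print(round(t, 2), df, round(se, 1))   # 1.41 24 10.6
```

Statistical software instead estimates a (usually fractional) df between min(n_1, n_2) − 1 and n_1 + n_2 − 2; the conservative choice used here simply picks a wider approximating t distribution.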