ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only on the mean and the proportion. The goal was to show that the difference between the value of a sample statistic and the population parameter was large enough that it could not be due to chance alone. When the difference was large, we argued that the population parameter had changed. When the difference was small, we said that the population parameter had not changed. This approach, however, did not give us any information about what caused the change in the population parameter. In this part of our course we compare two populations, using independent and dependent samples, to determine what causes the change.

Definition of Independent vs. Dependent Samples
Two samples drawn from two populations are independent if the selection of one sample from one population does not affect the selection of the second sample from the second population. Otherwise, the samples are dependent.

Mean, Standard Deviation and Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
Suppose we select two independent large samples from two different populations that are referred to as Population 1 and Population 2, with the following notation:
Population means: $\mu_1$ and $\mu_2$
Population standard deviations: $\sigma_1$ and $\sigma_2$
Sample sizes: $n_1 \ge 30$ and $n_2 \ge 30$
Sample means: $\bar{x}_1$ and $\bar{x}_2$
The Central Limit Theorem says that:
1. $\bar{x}_1$ is approximately normally distributed with mean $\mu_1$ and std. dev. $\sigma_1/\sqrt{n_1}$.
2. $\bar{x}_2$ is approximately normally distributed with mean $\mu_2$ and std. dev. $\sigma_2/\sqrt{n_2}$.
Using these results, we state the following:
1. The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_{\bar{x}_1-\bar{x}_2} = \mu_1 - \mu_2$.
2. The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ (the variances add because the two samples are independent).
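3. The shape of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal regardless of the shape of the two populations. This is because, by the Central Limit Theorem, each sample mean is approximately normal for large samples, and the difference between two independent, normally distributed random variables is also normally distributed.
4. For these results to hold, both samples must be large.

Sampling Distribution, Mean and Std. Dev. of $\bar{x}_1 - \bar{x}_2$
For two large and independent samples selected from two different populations, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal with mean and std. dev. as follows:
The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_{\bar{x}_1-\bar{x}_2} = \mu_1 - \mu_2$.
The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
The value of $s_{\bar{x}_1-\bar{x}_2}$, which gives an estimate of $\sigma_{\bar{x}_1-\bar{x}_2}$, is calculated as
$$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $s_1$ and $s_2$ are the standard deviations of the two samples.

Interval Estimation of $\mu_1 - \mu_2$
Confidence Interval for $\mu_1 - \mu_2$: The $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm z\,\sigma_{\bar{x}_1-\bar{x}_2} \quad \text{if } \sigma_1 \text{ and } \sigma_2 \text{ are known}$$
$$(\bar{x}_1 - \bar{x}_2) \pm z\,s_{\bar{x}_1-\bar{x}_2} \quad \text{if } \sigma_1 \text{ and } \sigma_2 \text{ are not known}$$
where the value of z is obtained from the standard normal distribution table for the given confidence level.
Examples to be given in class.

Hypothesis Testing about $\mu_1 - \mu_2$
The value of the test statistic z for $\bar{x}_1 - \bar{x}_2$ is computed as follows:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1-\bar{x}_2}}$$
If $\sigma_1$ and $\sigma_2$ are not known, $s_{\bar{x}_1-\bar{x}_2}$ is used in place of $\sigma_{\bar{x}_1-\bar{x}_2}$. The value of $\mu_1 - \mu_2$ is the one stated in the null hypothesis.

Inferences about the Two Population Means for Small and Independent Samples: Equal Standard Deviations
How and when to use the t-distribution to make inferences about $\mu_1 - \mu_2$
The t-distribution is used to make inferences about $\mu_1 - \mu_2$ when the following assumptions hold true:
1. The two populations from which the two samples are drawn are approximately normally distributed.
2. The samples are small ($n_1 < 30$ and $n_2 < 30$) and independent.
3. The std. devs. of the two populations are unknown, but they are assumed to be equal.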
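Returning briefly to the large-sample case, the confidence interval and z statistic for $\mu_1 - \mu_2$ given above can be sketched in Python using only the standard library. This is a minimal illustration, not the course's official procedure: the function names are hypothetical, and the critical value z is computed with `statistics.NormalDist` instead of being read from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def sd_diff_means(sd1, n1, sd2, n2):
    """Standard deviation of x̄1 - x̄2: sqrt(sd1²/n1 + sd2²/n2).

    Pass sigma1, sigma2 if they are known; otherwise pass the
    sample standard deviations s1, s2 to get s_{x̄1-x̄2}.
    """
    return sqrt(sd1**2 / n1 + sd2**2 / n2)


def z_interval(x1bar, sd1, n1, x2bar, sd2, n2, confidence=0.95):
    """Large-sample (1 - alpha)100% confidence interval for mu1 - mu2."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    diff = x1bar - x2bar
    margin = z * sd_diff_means(sd1, n1, sd2, n2)
    return diff - margin, diff + margin


def z_statistic(x1bar, sd1, n1, x2bar, sd2, n2, hypothesized_diff=0.0):
    """z = ((x̄1 - x̄2) - (mu1 - mu2)) / sd of x̄1 - x̄2; mu1 - mu2 comes from H0."""
    diff = x1bar - x2bar
    return (diff - hypothesized_diff) / sd_diff_means(sd1, n1, sd2, n2)
```

For example, with made-up samples $\bar{x}_1 = 80$, $s_1 = 10$, $n_1 = 50$ and $\bar{x}_2 = 75$, $s_2 = 12$, $n_2 = 72$, the estimated standard deviation is $\sqrt{2 + 2} = 2$, so `z_statistic(80, 10, 50, 75, 12, 72)` returns 2.5.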
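Pooled Standard Deviation for Two Samples
The pooled std. dev. for two samples is computed as
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
where $n_1$ and $n_2$ are the sizes of the two samples and $s_1^2$ and $s_2^2$ are the variances of the two samples. Here, $s_p$ is an estimator of $\sigma$, the common value of $\sigma_1$ and $\sigma_2$. When $s_p$ is used as an estimator of $\sigma$, the std. dev. of $\bar{x}_1 - \bar{x}_2$, $\sigma_{\bar{x}_1-\bar{x}_2}$, is estimated by $s_{\bar{x}_1-\bar{x}_2}$.

Estimator of the Standard Deviation of $\bar{x}_1 - \bar{x}_2$
The estimator of the standard deviation of $\bar{x}_1 - \bar{x}_2$ is:
$$s_{\bar{x}_1-\bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Now we are ready to discuss the procedures used to construct confidence intervals and test hypotheses about $\mu_1 - \mu_2$ for small and independent samples selected from two populations with unknown but equal standard deviations.

Interval Estimation of $\mu_1 - \mu_2$
As mentioned earlier, the difference $\bar{x}_1 - \bar{x}_2$ is the point estimator of $\mu_1 - \mu_2$. The following formula gives the confidence interval for $\mu_1 - \mu_2$ when the t-distribution is used.
Confidence Interval for $\mu_1 - \mu_2$: The $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t\,s_{\bar{x}_1-\bar{x}_2}$$
where the value of t is obtained from the t-distribution table for the given confidence level and $n_1 + n_2 - 2$ degrees of freedom.

Hypothesis Testing about $\mu_1 - \mu_2$
When the three assumptions listed earlier are satisfied, the t-distribution is used to test hypotheses about $\mu_1 - \mu_2$. The value of the test statistic t for $\bar{x}_1 - \bar{x}_2$ is calculated as
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1-\bar{x}_2}}$$
with $n_1 + n_2 - 2$ degrees of freedom, where the value of $\mu_1 - \mu_2$ is the one stated in the null hypothesis.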
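The pooled procedure above can be sketched in a few lines of Python. This is a minimal sketch with hypothetical function names; following the notes, the critical value `t_crit` is not computed but passed in after reading it from a t table for the chosen confidence level and $n_1 + n_2 - 2$ degrees of freedom.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """s_p = sqrt(((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2))."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_se(s1, n1, s2, n2):
    """Estimated std. dev. of x̄1 - x̄2: s_p * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)


def pooled_t_interval(x1bar, s1, n1, x2bar, s2, n2, t_crit):
    """Confidence interval (x̄1 - x̄2) ± t * s_{x̄1-x̄2}.

    t_crit is read from a t table for the chosen confidence level
    and df = n1 + n2 - 2.
    """
    diff = x1bar - x2bar
    margin = t_crit * pooled_se(s1, n1, s2, n2)
    return diff - margin, diff + margin


def pooled_t_statistic(x1bar, s1, n1, x2bar, s2, n2, hypothesized_diff=0.0):
    """Return (t, df) with t = ((x̄1 - x̄2) - (mu1 - mu2)) / s_{x̄1-x̄2}."""
    t = ((x1bar - x2bar) - hypothesized_diff) / pooled_se(s1, n1, s2, n2)
    return t, n1 + n2 - 2
```

With made-up samples $\bar{x}_1 = 25$, $\bar{x}_2 = 21$, $s_1 = s_2 = 4$ and $n_1 = n_2 = 8$, the pooled std. dev. is 4, the estimated std. dev. of $\bar{x}_1 - \bar{x}_2$ is $4\sqrt{1/8 + 1/8} = 2$, and `pooled_t_statistic(25, 4, 8, 21, 4, 8)` returns `(2.0, 14)`. Using the table value $t = 2.145$ (95%, 14 df), the interval is about $(-0.29, 8.29)$.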
Inferences about the Two Population Means for Small and Independent Samples: Unequal Standard Deviations
If the first two assumptions made above for the equal-standard-deviation case are satisfied, but the std. devs. of the two populations are not only unknown but also unequal, then the procedures for constructing confidence intervals and testing hypotheses about $\mu_1 - \mu_2$ remain similar to what we have learned, except for two differences. When the std. devs. are unknown and unequal:
1. The degrees of freedom are no longer given by $n_1 + n_2 - 2$.
2. The std. dev. of $\bar{x}_1 - \bar{x}_2$ is not calculated using the pooled std. dev.

Degrees of Freedom
If
1. the two populations from which the two samples are drawn are approximately normally distributed,
2. the samples are small ($n_1 < 30$ and $n_2 < 30$) and independent, and
3. the std. devs. of the two populations are unknown and unequal,
then the t-distribution is used to make inferences about $\mu_1 - \mu_2$, and the degrees of freedom for the t-distribution are given by:
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
The number given by this formula is always rounded down to obtain df.
Because the std. devs. of the two populations are not known, we use $s_{\bar{x}_1-\bar{x}_2}$ as a point estimator of $\sigma_{\bar{x}_1-\bar{x}_2}$. The following formula is used to calculate this standard deviation:
$$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
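The two differences listed above, the rounded-down degrees of freedom (commonly known as the Welch-Satterthwaite approximation) and the unpooled standard deviation, can be sketched in Python as follows. This is a minimal sketch with hypothetical function names; the t critical value would still come from a t table with the returned df.

```python
from math import floor, sqrt


def welch_se(s1, n1, s2, n2):
    """s_{x̄1-x̄2} = sqrt(s1²/n1 + s2²/n2), with no pooling."""
    return sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for unknown, unequal std. devs., rounded down."""
    v1 = s1**2 / n1  # s1²/n1
    v2 = s2**2 / n2  # s2²/n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return floor(df)


def welch_t_statistic(x1bar, s1, n1, x2bar, s2, n2, hypothesized_diff=0.0):
    """Return (t, df) with t = ((x̄1 - x̄2) - (mu1 - mu2)) / s_{x̄1-x̄2}."""
    t = ((x1bar - x2bar) - hypothesized_diff) / welch_se(s1, n1, s2, n2)
    return t, welch_df(s1, n1, s2, n2)
```

For made-up samples with $s_1 = 2$, $n_1 = 4$ and $s_2 = 3$, $n_2 = 9$, both terms $s^2/n$ equal 1, so the formula gives $4 / (1/3 + 1/8) = 96/11 \approx 8.73$, which is rounded down to 8.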
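Interval Estimation of $\mu_1 - \mu_2$
The difference between the sample means, $\bar{x}_1 - \bar{x}_2$, is the point estimator of the difference between the population means, $\mu_1 - \mu_2$. The following formula gives the confidence interval for $\mu_1 - \mu_2$ when the t-distribution is used and the population std. devs. are unknown and presumed to be unequal.
Confidence Interval for $\mu_1 - \mu_2$: The $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t\,s_{\bar{x}_1-\bar{x}_2}$$
where $s_{\bar{x}_1-\bar{x}_2} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$ and the value of t is obtained from the t-distribution for the given confidence level, using the rounded-down degrees of freedom given above.

Hypothesis Testing about $\mu_1 - \mu_2$
The value of the test statistic t for $\bar{x}_1 - \bar{x}_2$ is computed as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1-\bar{x}_2}}$$
with the same degrees of freedom.

POPULATION PROPORTIONS
Inferences about the Difference between Two Population Proportions for Large and Independent Samples
Mean, Standard Deviation and Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
For two large and independent samples selected from two different populations, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal with mean and std. dev. as follows:
The mean of $\hat{p}_1 - \hat{p}_2$ is $\mu_{\hat{p}_1-\hat{p}_2} = p_1 - p_2$.
The standard deviation of $\hat{p}_1 - \hat{p}_2$ is
$$\sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.
To construct a confidence interval and test hypotheses about $p_1 - p_2$ for large and independent samples, we use the normal distribution. As indicated earlier, a single sample is large for inferences about a proportion if $np > 5$ and $nq > 5$. In the case of two samples, both samples are large if:
$$n_1 p_1 > 5,\; n_1 q_1 > 5 \quad \text{and} \quad n_2 p_2 > 5,\; n_2 q_2 > 5$$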
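The large-sample condition and the standard deviation of $\hat{p}_1 - \hat{p}_2$ can be sketched in Python as below. This is a minimal sketch with hypothetical function names. Because $p_1$ and $p_2$ are usually unknown in practice, the caller is assumed to plug in the sample proportions $\hat{p}_1$ and $\hat{p}_2$ for the check; that substitution is a common textbook practice, not something stated in these notes.

```python
from math import sqrt


def samples_are_large(n1, p1, n2, p2, threshold=5):
    """True if n1*p1, n1*q1, n2*p2 and n2*q2 all exceed threshold, with q = 1 - p.

    Assumption: if p1, p2 are unknown, pass the sample proportions instead.
    """
    q1, q2 = 1 - p1, 1 - p2
    return all(count > threshold for count in (n1 * p1, n1 * q1, n2 * p2, n2 * q2))


def sd_diff_proportions(n1, p1, n2, p2):
    """sigma of p̂1 - p̂2 = sqrt(p1*q1/n1 + p2*q2/n2)."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
```

For instance, `samples_are_large(100, 0.5, 80, 0.25)` returns `True` because the four products are 50, 50, 20 and 60, while `samples_are_large(40, 0.1, 100, 0.5)` returns `False` because $n_1 p_1 = 4$.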
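Interval Estimation of $p_1 - p_2$
The difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$, is the point estimator of the difference between the two population proportions, $p_1 - p_2$. Because we do not know $p_1$ and $p_2$, we cannot calculate the value of $\sigma_{\hat{p}_1-\hat{p}_2}$. Therefore we use $s_{\hat{p}_1-\hat{p}_2}$ as the point estimator of $\sigma_{\hat{p}_1-\hat{p}_2}$. We construct the confidence interval for $p_1 - p_2$ using the following formulas:
Confidence Interval for $p_1 - p_2$: The $(1-\alpha)100\%$ confidence interval for $p_1 - p_2$ is
$$(\hat{p}_1 - \hat{p}_2) \pm z\,s_{\hat{p}_1-\hat{p}_2}$$
where
$$s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
with $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$, and the value of z is obtained from the standard normal distribution for the given confidence level.
Examples to be given in class.

Hypothesis Testing about $p_1 - p_2$
Testing hypotheses about $p_1 - p_2$ for large and independent samples involves the same five steps as learned earlier. The standard deviation of $\hat{p}_1 - \hat{p}_2$ used in the test is calculated as follows. When a test of hypothesis about $p_1 - p_2$ is performed, the null hypothesis is $p_1 = p_2$ (that is, $p_1 - p_2 = 0$), and the values of $p_1$ and $p_2$ are not known. Assuming the null hypothesis is true and $p_1 = p_2$, a common value of $p_1$ and $p_2$, denoted by $\bar{p}$, is calculated using one of the following formulas:
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \qquad \text{or} \qquad \bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$
Which formula to use depends on whether the values of $x_1$ and $x_2$ or the values of $\hat{p}_1$ and $\hat{p}_2$ are known. Here $x_1$ and $x_2$ are the numbers of elements in the two samples that possess a certain characteristic. The quantity $\bar{p}$ is called the pooled sample proportion. Using the value of the pooled sample proportion, the standard deviation of $\hat{p}_1 - \hat{p}_2$ is given by the formula:
$$s_{\hat{p}_1-\hat{p}_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad \text{where } \bar{q} = 1 - \bar{p}$$
The test statistic z for $\hat{p}_1 - \hat{p}_2$ is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{s_{\hat{p}_1-\hat{p}_2}}$$
where $p_1 - p_2 = 0$ under the null hypothesis.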
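The interval (with the unpooled $s_{\hat{p}_1-\hat{p}_2}$) and the test statistic (with the pooled proportion $\bar{p}$) can be sketched in Python as follows. This is a minimal sketch with hypothetical function names; it takes the counts $x_1, x_2$ as input, uses the first formula for $\bar{p}$, and computes z with `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """(1 - alpha)100% CI for p1 - p2 using the unpooled s_{p̂1-p̂2}."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    s = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    diff = p1_hat - p2_hat
    return diff - z * s, diff + z * s


def pooled_z_statistic(x1, n1, x2, n2):
    """z for H0: p1 = p2, using the pooled proportion p̄ = (x1 + x2) / (n1 + n2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    s = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    # Under H0, p1 - p2 = 0, so it drops out of the numerator.
    return (p1_hat - p2_hat) / s
```

With made-up counts $x_1 = 60$ of $n_1 = 100$ and $x_2 = 40$ of $n_2 = 100$, the pooled proportion is $\bar{p} = 0.5$, the pooled standard deviation is $\sqrt{0.25 \times 0.02} \approx 0.0707$, and `pooled_z_statistic(60, 100, 40, 100)` is about 2.83.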
The tree diagram below is presented as an aid to understanding how the formulas are connected and which one to use in calculating the test statistic.
[Figure: tree diagram. DECISION branches into Large Sample and Small Sample, with further branches on the population standard deviations (σ).]
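The decision the tree diagram summarizes for $\mu_1 - \mu_2$ can also be sketched in Python, based only on the rules stated in the text of these notes. This is a minimal sketch: the function name and the returned labels are hypothetical, and cases the notes do not cover (small samples with known σ, or one large and one small sample) raise an error instead of guessing.

```python
def choose_procedure(n1, n2, sigmas_known, equal_sigmas=None):
    """Return which inference procedure for mu1 - mu2 applies, per these notes."""
    if n1 >= 30 and n2 >= 30:
        # Large samples: normal distribution; use sigma if known, otherwise s.
        if sigmas_known:
            return "z, using sigma1 and sigma2"
        return "z, using s1 and s2"
    if n1 < 30 and n2 < 30:
        # Small samples: the notes assume normal populations and unknown sigmas.
        if sigmas_known:
            raise ValueError("small samples with known sigmas are not covered here")
        if equal_sigmas is None:
            raise ValueError("state whether the unknown sigmas are assumed equal")
        if equal_sigmas:
            return "t, pooled s_p, df = n1 + n2 - 2"
        return "t, unpooled s, df from the rounded-down formula"
    raise ValueError("one large and one small sample is not covered here")
```

For example, `choose_procedure(12, 15, False, equal_sigmas=True)` returns `"t, pooled s_p, df = n1 + n2 - 2"`, matching the equal-standard-deviation case.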