Chapter 20 Comparing Groups
Comparing Proportions
Example: Researchers want to test the effect of a new anti-anxiety medication. In clinical testing, 64 of 200 people taking the medicine reported symptoms of anxiety. Of the 200 people receiving the placebo, 92 reported symptoms of anxiety. Does the medication work any differently than the placebo? Test the claim at the α = .05 significance level.
The Conditions
Notice that we are working with two samples here, so we need a process for dealing with two proportions. It is largely the same as the process for one proportion, with different formulas, but there are some important differences. First, the conditions.
1. Independence Assumption: The observations within each sample are independent of one another.
2. Randomization: Each sample is an SRS.
3. 10% Condition: Each sample is no more than 10% of its population.
4. Independent Groups Assumption: The two groups must be independent of each other.
The Formulas
The sampling distribution for the difference between the two sample proportions is modeled on the one for a single proportion, but with mean $\mu = p_1 - p_2$ and standard deviation
$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
So the confidence interval is again a z-interval, as it was for one proportion:
$$(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
What we look for in the confidence interval is whether 0 falls inside it. A C% confidence interval means we are C% confident that the true difference between the population proportions lies between the endpoints. In repeated sampling, intervals built this way capture the true difference C% of the time. If 0 is in the interval, we have no evidence that the proportions are different, so we fail to reject the null hypothesis.
Back To Our Example
All of the assumptions are met, so we will calculate the pertinent statistics.
$\hat{p}_1 = .32$, $\hat{p}_2 = .46$, $n_1 = 200$, $n_2 = 200$
What are our hypotheses and claim?
$H_0: p_1 = p_2$
$H_A: p_1 \neq p_2$ (claim)
$$(.32 - .46) \pm 1.96\sqrt{\frac{.32(.68)}{200} + \frac{.46(.54)}{200}}$$
$$-.2346 < p_1 - p_2 < -.0454$$
Since 0 is not in the interval, we reject $H_0$, and this supports the claim.
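As a quick check of the hand arithmetic, here is a minimal Python sketch (standard library only) of the two-proportion z-interval from the formulas slide. The function name `two_prop_z_interval` is our own and is not a calculator or library function. Like the interval formula, it uses the unpooled standard error. It then reports whether 0 falls in the interval.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Unpooled z-interval for p1 - p2 (hypothetical helper name)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Medication example: 64 of 200 (medication) vs 92 of 200 (placebo)
lo, hi = two_prop_z_interval(64, 200, 92, 200)
print(f"{lo:.4f} < p1 - p2 < {hi:.4f}")
print("0 in interval:", lo <= 0 <= hi)
```

Because 0 lies outside the printed interval, the code reaches the same decision as the hand calculation.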
Using Technology
We now have two samples, so we use 2-PropZInt. The inputs are the number of successes and the total number of observations from each population, which is similar to what we did for one sample. For our situation, we get $-.2346 < p_1 - p_2 < -.0454$. This is the same interval we found by hand, and it yields the same conclusion.
What About Probabilities?
We could also have used probabilities (a P-value) to test this hypothesis. The beginning is the same: we make sure the conditions are met, then calculate the sample proportions. However, the standard error used for the test is different from the one used for the confidence interval. The null hypothesis says $p_1 = p_2$, so we estimate that common proportion with what is called the pooled proportion:
$$\hat{p} = \frac{\text{success}_1 + \text{success}_2}{n_1 + n_2}$$
In our problem, we'd have
$$\hat{p} = \frac{64 + 92}{200 + 200} = .39$$
Probabilities Continued
The standard error based on the pooled proportion is
$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
It is better not to find this separately, but rather to find it as part of the z-score:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)} = \frac{.32 - .46}{\sqrt{.39(.61)\left(\frac{1}{200} + \frac{1}{200}\right)}} = -2.87$$
Since this is 2-sided, we double the probability, so
$$2P(z < -2.87) = 2(.0021) = .0042 < \alpha$$
And we reach the same conclusion.
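The pooled test can be sketched the same way. This is an illustrative Python version that uses the standard library's `statistics.NormalDist` for Normal probabilities. The function `two_prop_z_test` and its `alternative` argument are hypothetical names chosen here. The function computes the pooled proportion, the pooled standard error, z, and the P-value for the chosen alternative.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-proportion z-test.

    alternative: 'two-sided' (p1 != p2), 'greater' (p1 > p2) or 'less' (p1 < p2).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                   # pooled proportion
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)                         # upper tail
    elif alternative == "less":
        p_value = cdf(z)                             # lower tail
    else:
        p_value = 2 * cdf(-abs(z))                   # double the tail beyond |z|
    return p_pool, z, p_value

# Medication example, 2-sided
p_pool, z, p = two_prop_z_test(64, 200, 92, 200)
print(f"pooled p = {p_pool:.2f}, z = {z:.2f}, P-value = {p:.4f}")
```

Since it works from the unrounded z, this P-value matches the technology value more closely than the table value does.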
Probabilities Continued
We could have used technology for the probability as well, with the function 2-PropZTest. We input the raw counts and then the type of test (here, 2-sided), and we get a P-value of .0041. This is close to the probability we found by hand and gives the same conclusion.
AAA Example
Example: AAA investigated whether a man or a woman is more likely to stop and ask for directions. A random sample showed that 300 of 811 women would stop and ask for directions, while 255 of 750 men indicated they would. At the α = .05 significance level, test the claim that women are more likely to ask for directions.
After checking conditions, we move on to the hypotheses and claim.
$H_0: p_1 \le p_2$
$H_A: p_1 > p_2$ (claim)
Is this a 1-sided or 2-sided claim? Since it is 1-sided, we will use probabilities (a P-value). The confidence intervals we have been building correspond to 2-sided tests.
AAA Example
First, we find the proportions and the pooled proportion.
$\hat{p}_1 = .3699$, $\hat{p}_2 = .34$, $n_1 = 811$, $n_2 = 750$, $\hat{p} = .3555$
Next, the z-score.
$$z = \frac{.3699 - .34}{\sqrt{.3555(.6445)\left(\frac{1}{811} + \frac{1}{750}\right)}} = 1.23$$
And the probability?
$$P(z > 1.23) = 1 - .8907 = .1093 > \alpha$$
With technology, we get a P-value of .10868. This probability is large enough that we fail to reject the null hypothesis. We therefore do not support the claim that women are more likely to ask for directions.
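Here is the same computation as a short standalone Python sketch (standard library only). It works from the raw counts rather than the rounded proportions. Because nothing is rounded, the P-value lands near the technology value instead of the table value. Only the upper tail is used, since $H_A$ is $p_1 > p_2$.

```python
import math
from statistics import NormalDist

# AAA example: women (group 1) vs men (group 2); HA: p1 > p2
x1, n1 = 300, 811
x2, n2 = 255, 750
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # pooled proportion
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled
p_value = 1 - NormalDist().cdf(z)                     # upper tail only (1-sided)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```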
ACME Drug Company Example
Suppose the ACME Drug Company develops a new drug designed to prevent colds. The company states that the drug is equally effective for men and women. To test the claim, they choose an SRS of 100 women and 200 men from a population of 100,000 volunteers. At the end of the study, 38% of the women caught a cold and 51% of the men caught a cold. Based on these findings, can we reject the company's claim that the drug is equally effective for men and women? Test at the 5% significance level.
We first check the conditions.
Now, the hypotheses and the claim.
$H_0: p_1 = p_2$ (claim)
$H_A: p_1 \neq p_2$
Is this 1-sided or 2-sided?
ACME Drug Example
The needed statistics come next.
$\hat{p}_1 = .38$, $\hat{p}_2 = .51$, $n_1 = 100$, $n_2 = 200$, $\hat{p} = .47$
If we want to find the probability, the z-score would be next.
$$z = \frac{.38 - .51}{\sqrt{.47(.53)\left(\frac{1}{100} + \frac{1}{200}\right)}} = -2.13$$
And then the probability...
$$2P(z < -2.13) = 2(.0166) = .0332 < \alpha$$
With technology (2-PropZTest), we get a P-value of .0333.
Confidence Intervals
We also could have used confidence intervals.
$$(.38 - .51) \pm 1.96\sqrt{\frac{.38(.62)}{100} + \frac{.51(.49)}{200}}$$
$$-.13 \pm 1.96(.0600)$$
$$-.2477 < p_1 - p_2 < -.0123$$
The interval is the same whether we use the calculator or do it by hand.
Conclusion? Since 0 is not in the interval, we reject $H_0$ and do not support the claim that the drug is equally effective for each gender.
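For a 2-sided test at the matching level, the test and the interval should agree, and here they do. The small Python sketch below (standard library only) makes the comparison explicit. The counts 38 and 102 are reconstructed from 38% of 100 and 51% of 200. The test uses the pooled standard error, while the interval uses the unpooled one, just as on the slides.

```python
import math
from statistics import NormalDist

# ACME example: women (group 1) vs men (group 2)
x1, n1 = 38, 100     # 38% of 100 women
x2, n2 = 102, 200    # 51% of 200 men
alpha = 0.05
std_normal = NormalDist()

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# 2-sided test: pooled standard error, because H0 says p1 = p2
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_value = 2 * std_normal.cdf(-abs(z))

# Confidence interval: unpooled standard error
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = std_normal.inv_cdf(1 - alpha / 2)
lo, hi = diff - z_star * se, diff + z_star * se

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
print(f"{lo:.4f} < p1 - p2 < {hi:.4f}, 0 excluded: {not (lo <= 0 <= hi)}")
```

Both lines lead to rejecting $H_0$. The two methods can disagree only in borderline cases, because their standard errors differ slightly.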
Last Proportions Example
Example: In a study done in Michigan, it was determined that 38 of 62 underprivileged children who attended preschool needed social services later in life, compared with 49 of 61 children who did not attend preschool. Does this provide evidence that preschool reduces the need for social services later in life? Test at the .01 significance level.
First, check to make sure the conditions are met.
Now the statements.
$H_0: p_1 \ge p_2$
$H_A: p_1 < p_2$ (claim)
Confidence interval or P-value? This is a 1-sided test, so we will use only a P-value.
Last Example's Solution
First, the statistics we need.
$\hat{p}_1 = .6129$, $\hat{p}_2 = .8033$, $n_1 = 62$, $n_2 = 61$, $\hat{p} = .7073$
Now the z-score.
$$z = \frac{.6129 - .8033}{\sqrt{.7073(.2927)\left(\frac{1}{62} + \frac{1}{61}\right)}} = -2.32$$
Now we find the probability.
$$P(z < -2.32) = .0102 > \alpha$$
Note: the calculator rounds to the same probability.
Conclusion? We do not have enough evidence at the α = .01 significance level, so we fail to reject the null hypothesis and do not support the claim.
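This example sits right at the line (.0102 against α = .01), so it is worth seeing the unrounded numbers. Below is a minimal Python sketch of the lower-tail pooled test, using only the standard library.

```python
import math
from statistics import NormalDist

# Preschool (group 1) vs no preschool (group 2); HA: p1 < p2
x1, n1 = 38, 62
x2, n2 = 49, 61
alpha = 0.01

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # 87/123
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled
p_value = NormalDist().cdf(z)                         # lower tail only (1-sided)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Even without rounding, the P-value stays just above .01. The decision therefore does not depend on rounding.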
Comparing Means
We can also use t-statistics to compare the characteristics of two populations or to compare the responses to two treatments.
For example, suppose we were studying the sale prices of houses in one area of town (where σ is unknown). That would be a one-sample t problem from the last chapter. Suppose instead we were comparing the sale prices of houses from two different areas of town, with separate means and standard deviations. That would be a two-sample t problem (provided we know neither σ).
We will compare means here in a way not that different from the way we compared proportions. We could, if we so chose, compare spreads, but that doesn't tell us much on its own.
Conditions for Comparing Two Means
1. We have 2 SRSs from two distinct populations. The samples must be independent, but we are measuring the same variable for both.
2. Both populations are ideally Normally distributed with unknown means and standard deviations. In practice it is often enough to know that the distributions have the same shape and that there are no strong outliers. If we are not told that the populations are Normal, we often plot the data with box plots or histograms to get an idea of the shape. If we are not told, remember the 15 and 40 rules...
3. We must have random samples.
4. We must hold to the 10% rule.
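Two-Sample t-statistics
$$t = \frac{\bar{y}_1 - \bar{y}_2}{SE(\bar{y}_1 - \bar{y}_2)}, \quad \text{where} \quad SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The number of degrees of freedom is (conservatively) $\min\{n_1 - 1,\ n_2 - 1\}$.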
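These formulas translate directly into code. Below is a minimal Python sketch. `two_sample_t` is a hypothetical helper name, and it uses the conservative degrees of freedom from this slide. The inputs are the summary statistics from the captopril example that follows.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic (unpooled SE) with conservative df = min(n1 - 1, n2 - 1)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # SE(y1bar - y2bar)
    t = (mean1 - mean2) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

# Captopril summary statistics: before (group 1) vs after (group 2)
t, df = two_sample_t(137.5, 30.35, 10, 125.6, 30.42, 10)
print(f"t = {t:.4f} with {df} degrees of freedom")
```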
Captopril Example
Example: A dose of the drug captopril, designed to lower systolic blood pressure, is administered to 10 randomly selected volunteers, and the following results are recorded.
Before: 120 136 160 98 115 110 180 190 138 128
After: 118 122 143 105 98 98 180 175 105 112
Test the claim that systolic blood pressure is not affected by the pill. Use an α = .05 significance level.
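Captopril Example
We first have to make sure the conditions are met. (Strictly speaking, the before and after readings come from the same 10 volunteers, so the two samples are not independent, and a paired analysis would be more appropriate. Here we treat them as two independent samples to practice the two-sample method.)
Now the pertinent statistics.
Before: $n_1 = 10$, $\bar{x}_1 = 137.5$, $s_1 = 30.35$
After: $n_2 = 10$, $\bar{x}_2 = 125.6$, $s_2 = 30.42$
What are our hypotheses and claim?
$H_0: \mu_1 = \mu_2$ (claim)
$H_A: \mu_1 \neq \mu_2$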
Captopril Example
Now we need to find t.
$$t = \frac{137.5 - 125.6}{\sqrt{\frac{30.35^2}{10} + \frac{30.42^2}{10}}} = .8757$$
To find the probability, we will use 9 degrees of freedom.
$$.4 < 2P(T > .8757) < .5$$
This probability is greater than α.
Conclusion? We fail to reject the null hypothesis. Consequently, we do not have enough evidence to reject the claim that systolic blood pressure is not affected by captopril.
Using Technology
Of course, there is a calculator function to find probabilities such as this one. The function we'd need is 2-SampTTest, which we find by pressing STAT and scrolling to TESTS.
What we have to be careful about here is which list holds which sample if we are inputting data. For a 1-sided test, if we have them backwards, we will get the complement of the probability we seek. So my suggestion is to determine what type of test this is, write the hypotheses, and then put the data for population 1 in L1 and the data for population 2 in L2.
Here, when we plug in the values, we get a P-value of .3927. This is slightly smaller than the range we got by hand. The difference comes from the degrees of freedom: the calculator uses a more exact (and larger) number of degrees of freedom instead of the conservative 9.
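To see why the calculator's P-value is a little smaller, here is a Python sketch that computes the 2-sided P-value two ways. One uses the conservative 9 degrees of freedom. The other uses the Welch–Satterthwaite degrees of freedom (the more exact formula on the Degrees of Freedom slide). The standard library has no t distribution, so `t_sf` is our own helper that integrates the t density with Simpson's rule. Treat it as an illustration, not a replacement for a statistics package or the calculator.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution (df may be fractional)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), via Simpson's rule on the density between 0 and t."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Captopril summary statistics (before = group 1, after = group 2)
m1, s1, n1 = 137.5, 30.35, 10
m2, s2, n2 = 125.6, 30.42, 10

v1, v2 = s1**2 / n1, s2**2 / n2
t = (m1 - m2) / math.sqrt(v1 + v2)
df_conservative = min(n1, n2) - 1
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

p_conservative = 2 * t_sf(abs(t), df_conservative)
p_welch = 2 * t_sf(abs(t), df_welch)
print(f"t = {t:.4f}")
print(f"df = {df_conservative}: P-value = {p_conservative:.4f}")
print(f"df = {df_welch:.2f}: P-value = {p_welch:.4f}")
```

More degrees of freedom means thinner tails, so the Welch P-value comes out smaller. It falls just below the .4 to .5 range from the table.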
Degrees of Freedom
In the first example, we used the conservative approximation $\min\{n_1 - 1, n_2 - 1\}$ for the number of degrees of freedom. The text, however, uses a more exact number of degrees of freedom in its examples:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
This will almost always give us a decimal, which is only a problem when using the table. When it is a decimal, round down to be safe. If we are using technology, this isn't a problem.
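This degrees-of-freedom formula is tedious by hand, so here is a minimal Python sketch of it. The function name `welch_df` is our own; the formula is the Welch–Satterthwaite approximation shown above. Applied to the summary statistics in this chapter, it gives about 18 for the captopril data and 17.957 for the traffic data. For the phone-call data it gives about 56.87.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for a two-sample t procedure."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # s^2 / n for each sample
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"captopril: {welch_df(30.35, 10, 30.42, 10):.2f}")
print(f"traffic:   {welch_df(5.95, 15, 2.26, 15):.3f}")
print(f"phone:     {welch_df(8.65, 40, 4.93, 20):.2f}")
```

When using the table, we would still round each result down.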
Traffic Example
Example: Two different independent procedures are used to control traffic at the airport. The numbers of operations for 30 randomly selected hours (15 under each system) are listed below. At the α = .05 significance level, test the claim that the use of system 2 results in a mean number of operations per hour exceeding the mean for system 1.
System 1: 63 62 67 66 53 72 62 57 49 57 60 68 58 64 61
System 2: 62 63 67 61 68 66 62 67 66 62 62 65 65 66 64
Traffic Example
We first check our conditions.
Now the necessary statistics.
System 1: $n_1 = 15$, $\bar{x}_1 = 61.27$, $s_1 = 5.95$
System 2: $n_2 = 15$, $\bar{x}_2 = 64.4$, $s_2 = 2.26$
What are the hypotheses? The claim?
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 < \mu_2$ (claim)
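Traffic Example
Now to find the t-statistic.
$$t = \frac{61.27 - 64.4}{\sqrt{\frac{5.95^2}{15} + \frac{2.26^2}{15}}} = -1.905$$
How many degrees of freedom do we have?
$$df = \frac{\left(\frac{5.95^2}{15} + \frac{2.26^2}{15}\right)^2}{\frac{1}{14}\left(\frac{5.95^2}{15}\right)^2 + \frac{1}{14}\left(\frac{2.26^2}{15}\right)^2} = 17.957$$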
Traffic Example
What is the probability we are looking for? By symmetry, $P(T < -1.905) = P(T > 1.905)$, with 17 degrees of freedom.
$$.025 < P(T > 1.905) < .05 < \alpha$$
Using technology, we get a P-value of .0363.
So, what is our conclusion? We reject $H_0$ and support the claim that system 2 averages more operations per hour than system 1.
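The whole traffic analysis can also be run from the raw data. This Python sketch uses `statistics.mean` and `statistics.stdev` from the standard library. It reuses a hand-rolled `t_sf` helper (our own Simpson's-rule integration of the t density, not a library function). Because the summary statistics are not rounded, t comes out about −1.908 rather than −1.905. That is a small difference in the third decimal and does not change the conclusion.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution (df may be fractional)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), via Simpson's rule on the density between 0 and t."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

system1 = [63, 62, 67, 66, 53, 72, 62, 57, 49, 57, 60, 68, 58, 64, 61]
system2 = [62, 63, 67, 61, 68, 66, 62, 67, 66, 62, 62, 65, 65, 66, 64]
n1, n2 = len(system1), len(system2)

m1, m2 = mean(system1), mean(system2)
v1 = stdev(system1) ** 2 / n1          # s1^2 / n1
v2 = stdev(system2) ** 2 / n2          # s2^2 / n2
t = (m1 - m2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# HA: mu1 < mu2 is lower-tailed: P(T < t), which by symmetry equals P(T > -t)
p_value = t_sf(-t, df)
alpha = 0.05
print(f"means {m1:.2f} and {m2:.2f}, t = {t:.3f}, df = {df:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```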
Phone Call Lengths
Example: The Lectrolyte Company collects data samples on the lengths of telephone calls (in minutes) made by employees in two different divisions; the results are shown below. At the 0.02 significance level, test the claim that there is no difference between the mean times of all long-distance calls made in the two divisions.
Sales Division: $n_s = 40$, $\bar{x}_s = 10.26$, $s_s = 8.65$
Customer Service Division: $n_c = 20$, $\bar{x}_c = 6.93$, $s_c = 4.93$
Phone Call Lengths
First we check the conditions.
Now, the hypotheses and claim.
$H_0: \mu_s = \mu_c$ (claim)
$H_A: \mu_s \neq \mu_c$
Is this one- or two-sided? It is two-sided, so we have a choice between a confidence interval and a P-value. Here we will use the t-score and a P-value.
Phone Call Lengths
So, first we find t.
$$t = \frac{10.26 - 6.93}{\sqrt{\frac{8.65^2}{40} + \frac{4.93^2}{20}}} = 1.896$$
Now we need to know how many degrees of freedom we are working with.
$$df = \frac{\left(\frac{8.65^2}{40} + \frac{4.93^2}{20}\right)^2}{\frac{1}{39}\left(\frac{8.65^2}{40}\right)^2 + \frac{1}{19}\left(\frac{4.93^2}{20}\right)^2} = 56.87$$
We want to find $2P(T > 1.896)$ based on 56 degrees of freedom. When we refer to our table, what do we get?
$$.05 < 2P(T > 1.896) < .10$$
Using technology, we get a P-value of .0631.
Phone Call Lengths
Conclusion? The P-value is greater than α = .02, so we fail to reject $H_0$. Thus, we do not have enough evidence to reject the claim that the mean times are the same for the two divisions.
Solvent Example
Example: In order to evaluate the degree of suspension of a polyethylene, its gel contents are determined after extraction using a solvent. The method is called the gel proportion estimation method. A study was run to compare two solvents, ethanol and toluene, using an extraction time of 8 hours.
Ethanol: 94.7 96.4 93.0 95.4 96.4 94.6 94.8 96.9 94.8 94.8
Toluene: 96.6 95.5 95.9 95.0 96.1 95.3 95.4 96.8 95.2 96.1
At the 5% significance level, is the true mean gel content extracted different for the two solvents?
Solvent Example
First we check the conditions. Now the hypotheses and claim.
$H_0: \mu_e = \mu_t$
$H_A: \mu_e \ne \mu_t$ (claim)
And the pertinent statistics.
Ethanol: $n_e = 10$, $\bar{x}_e = 95.18$, $s_e = 1.142$
Toluene: $n_t = 10$, $\bar{x}_t = 95.79$, $s_t = 0.608$
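The summary statistics can be reproduced from the raw data. This sketch assumes the flattened data table reads row by row, each row holding one ethanol value followed by one toluene value; that reading reproduces the slide's means. It uses only Python's standard `statistics` module, and the list names are hypothetical.

```python
from statistics import mean, stdev

# Solvent data, read row by row from the flattened table (hypothetical variable names)
ethanol = [94.7, 96.4, 93.0, 95.4, 96.4, 94.6, 94.8, 96.9, 94.8, 94.8]
toluene = [96.6, 95.5, 95.9, 95.0, 96.1, 95.3, 95.4, 96.8, 95.2, 96.1]

for name, data in [("Ethanol", ethanol), ("Toluene", toluene)]:
    # stdev() divides by n - 1, so it returns the sample standard deviation s
    print(f"{name}: n = {len(data)}, mean = {mean(data):.2f}, s = {stdev(data):.3f}")
```

It prints `Ethanol: n = 10, mean = 95.18, s = 1.142` and `Toluene: n = 10, mean = 95.79, s = 0.608`, so 95.79 and 0.608 are the toluene values to carry into t.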
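Solvent Example
First, we find t.
$$t = \frac{95.18 - 95.79}{\sqrt{\dfrac{1.142^2}{10} + \dfrac{0.608^2}{10}}} \approx -1.491$$
Degrees of freedom?
$$df = \frac{\left(\dfrac{1.142^2}{10} + \dfrac{0.608^2}{10}\right)^2}{\dfrac{1}{9}\left(\dfrac{1.142^2}{10}\right)^2 + \dfrac{1}{9}\left(\dfrac{0.608^2}{10}\right)^2} \approx 13.72$$
So, we want to find 2P(T > 1.491) with 13 degrees of freedom. From the table, .10 < 2P(T > 1.491) < .20; using technology, the P-value is .1586.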
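Instead of bracketing the P-value with a table, it can be approximated directly. This sketch integrates the t density numerically with Simpson's rule using only the standard library; it is not how the calculator computes it, and the names `t_pdf` and `two_sided_p` are hypothetical. It assumes the unrounded Welch df of about 13.72 (from s_t = 0.608); the result should land between .10 and .20, agreeing with the table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional (as in Welch's test)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """2P(T > |t|): integrate the density over [0, |t|] with Simpson's rule (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3      # P(0 < T < |t|)
    return 2 * (0.5 - area)   # the density is symmetric about 0

# Solvent example: t = -1.491 with the unrounded Welch df of about 13.72
p = two_sided_p(-1.491, 13.72)
print(f"P-value = {p:.4f}")
```

As a sanity check, `two_sided_p(2.160, 13)` returns almost exactly .05, since 2.160 is the table's 95% critical value for 13 df.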
Solvent Example
Conclusion? Since the P-value (.1586) is greater than α = .05, we fail to reject the null hypothesis and do not support the claim that the mean gel content extracted is different for the two solvents.
Impulse Shopping
Example: White Hen Pantry has two grocery stores located in Salem. One store is located on North Street and the other on Highland Ave, and each is run by a different manager. Each manager claims that her store's layout maximizes the amounts customers will purchase on impulse. Both managers surveyed a sample of their customers and asked them how much more they spent than they had planned to; in other words, how much did they spend on impulse? The following table shows the sample data collected from the two stores. Upper-level management at White Hen Pantry wants to know if there is a difference in the mean amounts purchased on impulse at the two stores and has hired you to perform the statistical analysis. At the 10% significance level, do you tell them that the stores are statistically the same in this regard or that one of the locations has a higher rate of impulse shopping?
Impulse Shopping
We don't need the data to check the conditions and to write the hypotheses and claim.
$H_0: \mu_N = \mu_H$ (claim)
$H_A: \mu_N \ne \mu_H$
Now the data (North Street, Highland Ave): 15.78 17.73 15.19 18.22 10.61 15.79 15.38 15.96 14.22 13.82 21.92 12.87 13.45 12.86 12.47 13.96 10.82 12.85 13.74 13.74 18.40 17.79 10.83
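Impulse Shopping
Now the statistics we need.
North Street: $n_N = 10$, $\bar{x}_N = 13.79$, $s_N = 2.22$
Highland Ave: $n_H = 13$, $\bar{x}_H = 15.42$, $s_H = 3.01$
And t.
$$t = \frac{13.79 - 15.42}{\sqrt{\dfrac{2.22^2}{10} + \dfrac{3.01^2}{13}}} \approx -1.494$$
Degrees of freedom are next.
$$df = \frac{\left(\dfrac{2.22^2}{10} + \dfrac{3.01^2}{13}\right)^2}{\dfrac{1}{9}\left(\dfrac{2.22^2}{10}\right)^2 + \dfrac{1}{12}\left(\dfrac{3.01^2}{13}\right)^2} \approx 20.982$$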
Impulse Shopping
Now the probability. From the table with 20 degrees of freedom, .10 < 2P(T > 1.494) < .20. The calculator gives a P-value of .1510.
And the conclusion. Since .1510 > α = .10, we fail to reject the null hypothesis and support the claim that the stores are statistically the same.
Two-Sample t-Confidence Intervals
The idea behind these is basically the same as for two proportions, with the obvious exception that we are talking about means. The setup and the interpretation are the same. We compare means this way by finding a confidence interval for the difference µ1 − µ2 or by testing the hypothesis of no difference, µ1 = µ2.
The Confidence Interval for µ1 − µ2:
$$(\bar{x}_1 - \bar{x}_2) \pm t^*_{df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
House Price Example
Example: Random samples of home selling prices are obtained from two different zones in a county.
North Zone: $n_1 = 11$, $\bar{x}_1 = \$142{,}318$, $s_1 = \$46{,}068$
South Zone: $n_2 = 14$, $\bar{x}_2 = \$138{,}237$, $s_2 = \$21{,}336$
Construct and interpret a 95% confidence interval for the difference between the two population means.
House Price Example
Now we need degrees of freedom so that we can get the correct t*.
$$df = \frac{\left(\dfrac{46068^2}{11} + \dfrac{21336^2}{14}\right)^2}{\dfrac{1}{10}\left(\dfrac{46068^2}{11}\right)^2 + \dfrac{1}{13}\left(\dfrac{21336^2}{14}\right)^2} \approx 13.363 \to 13$$
Now, the confidence interval, using t* = 2.160 for 13 degrees of freedom.
$$(142318 - 138237) \pm 2.160\sqrt{\frac{46068^2}{11} + \frac{21336^2}{14}}$$
$$-28351.30 < \mu_1 - \mu_2 < 36513.30$$
And... the interpretation? We are 95% confident that the true difference in mean selling prices (North Zone minus South Zone) is between -$28,351.30 and $36,513.30. Because this interval contains 0, these samples cannot tell us whether house prices are really higher in the North Zone or whether the observed difference is just sampling variability.
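As a quick arithmetic check of this interval, the sketch below plugs the summary statistics into the confidence interval formula with the table value t* = 2.160. The function name `welch_interval` is hypothetical, and t* is passed in rather than looked up.

```python
import math

def welch_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """(x̄1 - x̄2) ± t* · sqrt(s1²/n1 + s2²/n2), with t* supplied by the caller."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se

# House Price example: North Zone is group 1, t* = 2.160 from the table (13 df)
low, high = welch_interval(142318, 46068, 11, 138237, 21336, 14, t_star=2.160)
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")  # -28351.30 < mu1 - mu2 < 36513.30
```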
Using Technology
We have the option of using the calculator for this as well. What we need is 2-SampTInt, which is again found in the STAT menu in the TESTS submenu.
Here, we get -28268 < µ1 − µ2 < 36430. This interval is a little narrower than the one by hand because the calculator uses the unrounded df ≈ 13.36 instead of rounding down to 13, which gives a slightly smaller t*.
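The calculator's slightly narrower interval can be reproduced by finding t* for the unrounded df instead of reading the 13-df row of the table. This is a sketch, assuming the calculator uses the unrounded Welch df; the names `upper_tail` and `t_critical` are hypothetical. It integrates the t density with Simpson's rule and then bisects for the t* whose upper tail is .025.

```python
import math

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the density integrated over [0, t] (Simpson's rule)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def t_critical(conf, df, hi=20.0):
    """t* with P(-t* < T < t*) = conf, found by bisection on the upper tail."""
    target = (1 - conf) / 2
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > target:
            lo = mid  # tail area still too large, so t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2

# House Price example with the unrounded df instead of 13
t_star = t_critical(0.95, 13.363)
se = math.sqrt(46068**2 / 11 + 21336**2 / 14)
diff = 142318 - 138237
low, high = diff - t_star * se, diff + t_star * se
print(f"t* = {t_star:.4f}: {low:.0f} < mu1 - mu2 < {high:.0f}")
```

The resulting t* is a little below 2.160, which is exactly why the technology interval is narrower; `t_critical(0.95, 13)` recovers the table's 2.160.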
Cold Medicine Example
Example: Twelve different independent samples from each of two competing cold medicines are tested for the amount of acetaminophen, and the results (in mg) are given below. Test the claim that the amount of acetaminophen is the same in each brand. Use the α = .05 significance level.
Brand X: 472, 487, 506, 512, 489, 503, 511, 501, 495, 504, 494, 462
Brand Y: 562, 512, 523, 528, 554, 513, 516, 510, 524, 510, 524, 508
Cold Medicine Example
First we check our conditions. Now the hypotheses and claim.
$H_0: \mu_X = \mu_Y$ (claim)
$H_A: \mu_X \ne \mu_Y$
Now the statistics we need.
Brand X: $n_X = 12$, $\bar{x}_X = 494.67$, $s_X = 15.27$
Brand Y: $n_Y = 12$, $\bar{x}_Y = 523.67$, $s_Y = 17.42$
Cold Medicine Example
Now the degrees of freedom.
$$df = \frac{\left(\dfrac{15.27^2}{12} + \dfrac{17.42^2}{12}\right)^2}{\dfrac{1}{11}\left(\dfrac{15.27^2}{12}\right)^2 + \dfrac{1}{11}\left(\dfrac{17.42^2}{12}\right)^2} \approx 21.629 \to 21$$
Which leads to our confidence interval, using t* = 2.080 for 21 degrees of freedom:
$$(494.67 - 523.67) \pm 2.080\sqrt{\frac{15.27^2}{12} + \frac{17.42^2}{12}}$$
$$-42.91 < \mu_X - \mu_Y < -15.09$$
And finally our conclusion. Since 0 is not in our confidence interval, we have sufficient evidence to reject the null hypothesis. Therefore, we do not support the claim that the amount of acetaminophen is the same in the two brands. Because the entire interval is negative, it suggests that Brand Y contains more acetaminophen.
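Recomputing the interval from the raw data is a good check on the hand arithmetic. This sketch uses Python's standard `statistics` module and the table value t* = 2.080 (21 df); the variable names are illustrative only. It gives df ≈ 21.63 and an interval of about -42.91 to -15.09, close to the calculator's interval.

```python
import math
from statistics import mean, stdev

brand_x = [472, 487, 506, 512, 489, 503, 511, 501, 495, 504, 494, 462]
brand_y = [562, 512, 523, 528, 554, 513, 516, 510, 524, 510, 524, 508]

# Estimated variance of each sample mean, s^2 / n
v_x = stdev(brand_x) ** 2 / len(brand_x)
v_y = stdev(brand_y) ** 2 / len(brand_y)
df = (v_x + v_y) ** 2 / (v_x**2 / (len(brand_x) - 1) + v_y**2 / (len(brand_y) - 1))

t_star = 2.080  # table value for 95% confidence with 21 df
diff = mean(brand_x) - mean(brand_y)
margin = t_star * math.sqrt(v_x + v_y)
low, high = diff - margin, diff + margin
print(f"df = {df:.3f}")
print(f"{low:.2f} < mu_X - mu_Y < {high:.2f}")
```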
Confidence Interval
Had we used the calculator, we would have gotten the interval -42.88 < µX − µY < -15.12, which does not contain 0 and therefore gives the same conclusion.
Student Performance
Example: Within a school district, students were randomly assigned to one of two Math teachers, Mrs. Smith and Mrs. Jones. After the assignment, Mrs. Smith had 30 students, and Mrs. Jones had 25 students. At the end of the year, each class took the same standardized test. Mrs. Smith's students had an average test score of 78, with a standard deviation of 10, and Mrs. Jones's students had an average test score of 85, with a standard deviation of 15. Test the hypothesis that Mrs. Smith and Mrs. Jones are equally effective teachers. Use a 0.10 level of significance. (Assume that student performance is approximately Normal.)
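Student Performance
After checking the conditions, we proceed to the hypotheses and claim, where µ1 is the mean for Mrs. Smith's students and µ2 the mean for Mrs. Jones's students.
$H_0: \mu_1 = \mu_2$ (claim)
$H_A: \mu_1 \ne \mu_2$
We will use a confidence interval, so we need the number of degrees of freedom.
$$df = \frac{\left(\dfrac{10^2}{30} + \dfrac{15^2}{25}\right)^2}{\dfrac{1}{29}\left(\dfrac{10^2}{30}\right)^2 + \dfrac{1}{24}\left(\dfrac{15^2}{25}\right)^2} \approx 40.4766 \to 40$$
Now the interval. With α = 0.10 we use 90% confidence, so t* = 1.684 for 40 degrees of freedom.
$$(78 - 85) \pm 1.684\sqrt{\frac{10^2}{30} + \frac{15^2}{25}}$$
$$-12.914 < \mu_1 - \mu_2 < -1.086$$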
Student Performance
If we used the calculator, we would have gotten -12.91 < µ1 − µ2 < -1.088. Notice that 0 is not in this interval either.
Conclusion? Since 0 is not in our interval, we reject the null hypothesis and reject the claim that the teachers are equally effective. In fact, because the entire interval is negative, we are led to believe that Mrs. Jones is the more effective teacher.
Education of Officers
Example: You have obtained the number of years of education from a random sample of 38 police officers from City A and from a second random sample of 30 police officers from City B. The average years of education for the sample from City A is 15 years with a standard deviation of 2 years. The average years of education for the sample from City B is 14 years with a standard deviation of 2.5 years. Is there a statistically significant difference between the education levels of police officers in City A and City B? Assume both distributions are roughly symmetric with no strong outliers.
After we check assumptions, we move on to the hypotheses and claim.
$H_0: \mu_A = \mu_B$
$H_A: \mu_A \ne \mu_B$ (claim)
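Officer Education
Since this is two-sided, we will use a confidence interval here, which means we need the number of degrees of freedom.
$$df = \frac{\left(\dfrac{2^2}{38} + \dfrac{2.5^2}{30}\right)^2}{\dfrac{1}{37}\left(\dfrac{2^2}{38}\right)^2 + \dfrac{1}{29}\left(\dfrac{2.5^2}{30}\right)^2} \approx 54.75 \to 54$$
What is t* here? For a 95% interval, t* = 2.009 (the table row for 50 degrees of freedom, the closest row below 54).
So, our confidence interval is what?
$$(15 - 14) \pm 2.009\sqrt{\frac{2^2}{38} + \frac{2.5^2}{30}}$$
$$-0.125 < \mu_A - \mu_B < 2.125$$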
Officer Education
Using technology, we get -0.1224 < µA − µB < 2.1224.
And finally, our conclusion. Since 0 is in the interval, we fail to reject the null hypothesis and therefore do not support the claim that there is a difference in education level between officers in the two cities.
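The decision rule used in these last examples (reject H0: µ1 = µ2 exactly when 0 falls outside the interval) can be written out as a short sketch. The helper names `welch_interval` and `ci_decision` are hypothetical, and the t* values are the table values used on the slides.

```python
import math

def welch_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """(x̄1 - x̄2) ± t* · sqrt(s1²/n1 + s2²/n2), with t* supplied by the caller."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se

def ci_decision(interval):
    """Two-sided test via the interval: reject H0 exactly when 0 lies outside it."""
    low, high = interval
    return "fail to reject H0" if low <= 0 <= high else "reject H0"

# Student Performance: 90% interval, t* = 1.684 (40 df)
smith_jones = welch_interval(78, 10, 30, 85, 15, 25, t_star=1.684)
# Education of Officers: 95% interval, t* = 2.009 (table row for 50 df)
city_a_b = welch_interval(15, 2, 38, 14, 2.5, 30, t_star=2.009)

for name, ci in [("Teachers", smith_jones), ("Officers", city_a_b)]:
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f}) -> {ci_decision(ci)}")
```

This only matches a two-sided test at α = 1 − (confidence level); one-sided questions such as the next example need a P-value instead.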
Music Leads To Productivity?
Example: Do employees perform better at work with music playing? The music was turned on during the working hours of a business with 45 employees. Their productivity level averaged 5.2 with a standard deviation of 2.4. On a different day the music was turned off and there were 40 workers. The workers' productivity level averaged 4.8 with a standard deviation of 1.2. At the .05 level, can we conclude that music makes employees perform better?
After checking conditions, we proceed to the hypotheses.
$H_0: \mu_M = \mu_N$
$H_A: \mu_M > \mu_N$ (claim)
Is this a one- or two-sided problem? This is a one-sided problem, so we will use probabilities (a P-value) rather than a confidence interval.
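A minimal sketch of the one-sided calculation, assuming the same unpooled t statistic and Welch df as the earlier examples; the helper names `t_pdf` and `upper_tail` are hypothetical and the tail area is approximated with Simpson's rule rather than a table. Because H_A says µ_M > µ_N, only the right tail counts, and the resulting P-value is then compared with α = .05.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for any real t, via Simpson's rule on [0, |t|] and symmetry."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

# Music example: M = music on (n = 45), N = music off (n = 40)
v_m = 2.4**2 / 45
v_n = 1.2**2 / 40
t = (5.2 - 4.8) / math.sqrt(v_m + v_n)
df = (v_m + v_n) ** 2 / (v_m**2 / 44 + v_n**2 / 39)
p = upper_tail(t, df)  # one-sided: only large positive t supports H_A
print(f"t = {t:.3f}, df = {df:.2f}, P-value = {p:.4f}")
```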