Chapter 24. Comparing Means


Chapter 24 Comparing Means

Homework: p. 579: 5, 7, 8, 10, 11, 17, 31, 32


Objective: Students test null and alternate hypotheses about two population means.

Plot the Data
The intuitive display for comparing two groups is a pair of box-and-whisker plots of the data for the two groups, arranged side by side for easy comparison. Of course, the means of the two groups are not part of the display, but the comparison gives you a heads-up for when you go on to compare the means. What do you think the values of the means for the two groups are?

Comparing Two Means
You have just spent time comparing proportions for two groups. Comparing two means is very similar to comparing two proportions. The parameter in this case is the difference between the two means, μ₁ − μ₂. Remember: for independent random variables, variances add. The standard deviation of the distribution of differences between two sample means is

SD(ȳ₁ − ȳ₂) = √(σ₁²/n₁ + σ₂²/n₂)

When we do not know the true standard deviations of the two groups, we use the standard error

SE(ȳ₁ − ȳ₂) = √(s₁²/n₁ + s₂²/n₂)
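The variances-add rule can be sketched in a couple of lines of Python (a minimal helper of our own, not from the slides; it uses only the standard library):

```python
import math

def se_diff(s1, n1, s2, n2):
    """Standard error of (ybar1 - ybar2): sqrt(s1**2/n1 + s2**2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Pulse-rate summary statistics from the example later in the chapter;
# the slides there show the same value, 1.4445.
print(round(se_diff(5, 26, 6, 32), 4))  # 1.4445
```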

Comparing Two Means
When working with means and the standard error of the difference between means, the appropriate sampling model for the test statistic is Student's t. We build a confidence interval, called a two-sample t-interval, for the difference between the population means. The corresponding hypothesis test is called a two-sample t-test.

TI-84
Two-sample t-interval: STAT TESTS 0:2-SampTInt
  Inpt: Data Stats
  With Stats: x̄1, Sx1, n1, x̄2, Sx2, n2, C-Level, Pooled: No Yes, Calculate
  With Data: List1, List2, Freq1, Freq2, C-Level, Pooled: No Yes, Calculate
Two-sample t-test: STAT TESTS 4:2-SampTTest
  Inpt: Data Stats
  With Stats: x̄1, Sx1, n1, x̄2, Sx2, n2, μ1: ≠μ2 <μ2 >μ2, Pooled: No Yes, Calculate
  With Data: List1, List2, Freq1, Freq2, μ1: ≠μ2 <μ2 >μ2, Pooled: No Yes, Calculate

Comparing Two Means
When the conditions are met, the standardized sample difference between the means of two independent groups,

t = [(ȳ₁ − ȳ₂) − (μ₁ − μ₂)] / SE(ȳ₁ − ȳ₂),

is modeled by a Student's t with a number of degrees of freedom found with a special formula. We estimate the standard error with

SE(ȳ₁ − ȳ₂) = √(s₁²/n₁ + s₂²/n₂)

Assumptions and Conditions
Independence Assumption (each condition needs to be checked for both groups):
Randomization Condition: Were the data collected with suitable randomization (representative random samples or a randomized experiment)?
10% Condition: This condition is rarely applied for differences of means. Check it only with very small populations or an extremely large sample.

Assumptions and Conditions
Normal Population Assumption: The Nearly Normal Condition must be checked for both groups; a violation by either one violates the condition. Check by examining a histogram of each sample.
Independent Groups Assumption: The two groups we are comparing must be independent of each other. For dependent groups there is a special test, covered in the next chapter.

Two-Sample t-Interval
With the appropriate conditions met, we find the confidence interval for the difference between the means of two independent groups, μ₁ − μ₂:

(ȳ₁ − ȳ₂) ± t* × SE(ȳ₁ − ȳ₂),  where SE(ȳ₁ − ȳ₂) = √(s₁²/n₁ + s₂²/n₂)

The critical value t* (or t*df) depends on the particular confidence level, C, that you specify and on the number of degrees of freedom, which we get from the sample sizes and a special formula.

Degrees of Freedom
The special formula for the degrees of freedom for our t critical value ain't fun:

df = (s₁²/n₁ + s₂²/n₂)² / [ (1/(n₁−1))·(s₁²/n₁)² + (1/(n₂−1))·(s₂²/n₂)² ]

Because of this, we will let technology calculate degrees of freedom for us when we find a 2-SampTInt or run a 2-SampTTest.
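Nobody computes that by hand. A short Python helper (our own, standard library only) mirrors what the calculator does internally:

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for two-sample t procedures."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# The income example from this chapter: s1 = 1236, n1 = 8; s2 = 5324, n2 = 10.
print(round(welch_df(1236, 8, 5324, 10), 3))  # 10.194
```

Note the answer need not be a whole number; the calculator reports the same fractional df.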

Testing the Difference
The hypothesis test we use is the two-sample t-test for means. The conditions for a two-sample t-test for the difference between the means of two independent groups are the same as for the two-sample t-interval. We test the hypothesis H₀: μ₁ − μ₂ = Δ₀, where the hypothesized difference Δ₀ is nearly always 0, using the statistic

t = [(ȳ₁ − ȳ₂) − Δ₀] / SE(ȳ₁ − ȳ₂),  SE(ȳ₁ − ȳ₂) = √(s₁²/n₁ + s₂²/n₂)

When the conditions are met and H₀ is true, this statistic follows a Student's t-model with df given by the special formula.

Example
A study was conducted to compare college majors requiring 12 or more hours of mathematics to majors that required fewer than 12 hours of math. The mean income of 8 randomly chosen majors requiring 12 or more hours of math was $45,692/year, with a standard deviation of $1236. The mean income of 10 randomly chosen majors that require fewer than 12 hours of math was $40,416/year, with a standard deviation of $5324. Test for a difference in incomes at α = .01. What sampling method was used?
We will test the difference between the sample means and compare it to a hypothesized population difference of 0.
H₀: μ₁ = μ₂ and H₁: μ₁ ≠ μ₂, or H₀: μ₁ − μ₂ = 0 and H₁: μ₁ − μ₂ ≠ 0
μ₁ = mean income for majors requiring 12 or more hours; μ₂ = mean income for majors requiring fewer than 12 hours

Example
8 majors, 12 or more hours: x̄₁ = $45,692/year, s₁ = $1236; 10 majors, fewer than 12 hours: x̄₂ = $40,416/year, s₂ = $5324; α = .01
Independence Assumption / Independence Condition: The individual majors are independent. Randomization Condition: The majors were randomly selected. 10% Condition: 8 and 10 are less than 10% of the majors offered.
Normal Population Assumption / Nearly Normal Condition: Our samples are small, but it is reasonable to assume salaries are normally distributed for all majors.
Independent Groups Assumption: The two groups are independent.

Example
We will conduct a 2-tailed 2-Sample T-test for the model N(0, 1739.3848), where √(1236²/8 + 5324²/10) = 1739.3848.
STAT TESTS 4:2-SampTTest
Inpt: Data Stats; x̄1: 45692, Sx1: 1236, n1: 8; x̄2: 40416, Sx2: 5324, n2: 10; μ1: ≠μ2; Pooled: No; Calculate
Result: μ1 ≠ μ2; t = 3.0335697; p = .013463074; df = 10.194039; x̄1 = 45692, x̄2 = 40416, Sx1 = 1236, Sx2 = 5324, n1 = 8, n2 = 10

Draw a Picture
The t-score we calculated, 3.033, does not reach the rejection region: the tail probability beyond it is greater than .005 (the area in each tail for a two-tailed test at α = .01).
t = 3.033 < t* = 3.1558 (df = 10.194)
Fail to reject H₀.

Example
Remember how to calculate the test statistic (no need to calculate df by hand):

t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂) = (45692 − 40416 − 0) / √(1236²/8 + 5324²/10) = 3.033

df = (1236²/8 + 5324²/10)² / [ (1/(8−1))·(1236²/8)² + (1/(10−1))·(5324²/10)² ] = 10.19

Example
Here is where df comes into play:
2nd Vars 4:invT(.995, 10.194) gives t* = 3.1558
p(|t| > 3.033) = p(|x̄₁ − x̄₂| > 5276) = .0135
Since t = 3.033 < t* = 3.1558 (equivalently, p = .0135 > α = .01), we fail to reject the null.
There is not sufficient evidence at the .01 level to suggest that the mean income for majors requiring 12 or more hours of math is different from the mean income for majors requiring fewer than 12 hours of math.
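The calculator's output can be cross-checked with a short standard-library script (a sketch built from the summary statistics above; the p-value is left to the calculator, since computing a t-distribution tail area takes more than a few lines):

```python
import math

# Summary statistics from the income example: 8 majors requiring 12+
# hours of math vs. 10 majors requiring fewer than 12 hours.
x1, s1, n1 = 45692, 1236, 8
x2, s2, n2 = 40416, 5324, 10

a, b = s1**2 / n1, s2**2 / n2
se = math.sqrt(a + b)                  # standard error of the difference
t = (x1 - x2 - 0) / se                 # hypothesized difference is 0
df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

print(round(se, 4), round(t, 3), round(df, 3))  # 1739.3848 3.033 10.194
```

The standard error, t-score, and df all match the 2-SampTTest screen.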

When to Pool
Remember that for tests on proportions, the proportion itself determines the standard deviation. So under the null hypothesis that two proportions are equal, their variances must be equal as well. That allowed us to pool the data for the hypothesis test.
For means, there is also a pooled t-test. Like the two-proportions z-test, this test is used when we assume that the variances in the two groups are equal. But we must be careful: for means there is no direct connection between the mean and the value of the standard deviation.
If we are willing to assume that the variances of the two groups are equal, we can pool the data from the two groups to improve our estimate of the common variance and make the degrees-of-freedom formula much simpler.

*The Pooled t-Test
We are still estimating the pooled standard deviation from the data, so we use Student's t-model, and the test is called a pooled t-test for the difference between means. If we assume that the variances are equal, we can estimate the common variance from the numbers we already have:

s²_pooled = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / [(n₁ − 1) + (n₂ − 1)]

SE_pooled(ȳ₁ − ȳ₂) = s_pooled √(1/n₁ + 1/n₂)

Our degrees of freedom become df = n₁ + n₂ − 2.

*The Pooled t-Test
The conditions for the pooled t-test and corresponding confidence interval are the same as for our earlier two-sample t procedures, with the additional assumption that the variances of the two groups are the same. For a hypothesis test, the calculated test statistic is

t = [(ȳ₁ − ȳ₂) − Δ₀] / [s_pooled √(1/n₁ + 1/n₂)],  with df = n₁ + n₂ − 2.

The confidence interval would be

(ȳ₁ − ȳ₂) ± t*df × s_pooled √(1/n₁ + 1/n₂)
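For completeness, the pooled estimate is just as easy to sketch (again a helper of our own, assuming equal variances; the chapter's advice remains: don't pool):

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Pooled standard error of (ybar1 - ybar2), assuming equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

# Pulse-rate summary statistics; with pooling, df is simply n1 + n2 - 2.
print(round(pooled_se(5, 26, 6, 32), 4), 26 + 32 - 2)  # 1.4722 56
```

Notice the pooled SE here barely differs from the unpooled 1.4445, which is the chapter's point: the gain from pooling is small.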

To Pool or Not to Pool
So, now that I have told you all that, when should you use pooled-t methods rather than two-sample t methods? Never. (OK, rarely.) Because the advantages of pooling are small, and you are allowed to pool only rarely (when the equal-variance assumption is met), we do not pool. It's never wrong not to pool: not pooling is the more conservative test or interval, giving a wider interval that makes it more difficult to reject the null hypothesis. So let's make it simple: when comparing means, do not pool.

Variances Are Equal?
There is a hypothesis test for homogeneity of variance, but it is very sensitive to failures of the necessary assumptions and is not reliable for small sample sizes. Since small samples are exactly the conditions under which we need the two-sample t-test, there is not much justification for pooling: the test is unreliable under the very conditions most affected by pooling, which are also the conditions under which we would most like to pool.

Never?
Is there a place for pooling variances from two groups? Yeperdoo. In a randomized comparative experiment, we begin by randomly assigning experimental units (experimental units!) to treatments. So each treatment group comes from the same population and thus begins with the same population variance. In this case, assuming the variances are equal is a more plausible assumption, but there are still conditions that need to be verified to justify it. We ain't goin' there.

Caution
Do not assume that because you have two samples you therefore have two independent samples. We have not yet talked about paired data (matched pairs); that will be a new (much simpler) test. The difference between two means and the mean of the differences are most assuredly not the same thing. Know which you have, and make certain your groups are independent before you use the two-sample t-test.

Example
Resting pulse rates for a random sample of 26 smokers had a mean of 80 beats per minute (bpm) and a standard deviation of 5 bpm. Among 32 randomly chosen nonsmokers, the mean and standard deviation were 74 and 6 bpm. Both sets of data were roughly symmetric and had no outliers. Is there evidence of a difference in mean pulse rate between smokers and nonsmokers? How big is it?
We will test the difference between the sample means and compare it to a hypothesized difference of 0.
H₀: μsmokers = μnonsmokers and H₁: μsmokers ≠ μnonsmokers, or H₀: μsmokers − μnonsmokers = 0 and H₁: μsmokers − μnonsmokers ≠ 0

Example
26 smokers: mean 80 bpm, standard deviation 5 bpm; 32 nonsmokers: mean 74 bpm, standard deviation 6 bpm; α = .05
Independence Assumption / Independence Condition: The individuals are independent. Randomization Condition: The samples were randomly selected. 10% Condition: 26 and 32 are less than 10% of the populations of smokers and nonsmokers.
Normal Population Assumption / Nearly Normal Condition: We are told both distributions are roughly symmetric with no outliers.
Independent Groups Assumption: The two groups are certainly independent.

Example
We will conduct a 2-tailed 2-Sample T-test for the model N(0, 1.4445), where √(5²/26 + 6²/32) = 1.4445.
t* = 2nd Vars 4:invT(.975, 55.953) = 2.0033
STAT TESTS 4:2-SampTTest
Inpt: Data Stats; x̄1: 80, Sx1: 5, n1: 26; x̄2: 74, Sx2: 6, n2: 32; μ1: ≠μ2; Pooled: No; Calculate
Result (*note the p-value* *note the df*): μ1 ≠ μ2; t = 4.15377991; p = 1.189501×10⁻⁴; df = 55.95304534; x̄1 = 80, x̄2 = 74, Sx1 = 5, Sx2 = 6, n1 = 26, n2 = 32

Example
The t-score, 4.1537, lies far beyond t* = 2.0033 (df = 55.953); the area in each tail beyond ±4.1537 is about .00006.
p(t < −4.1537 or t > 4.1537) = p(|x̄₁ − x̄₂| > 6) ≈ .0001
Decision: Because the p-value is so small, the observed difference is unlikely to be just sampling error. We reject the null hypothesis.
Conclusion: We have strong evidence of a difference in mean resting pulse rates for smokers and nonsmokers. Based on the data, it is plausible that the mean pulse rate for nonsmokers is lower than the mean pulse rate for smokers.

Example
Calculate t; let the calculator handle df:

t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂) = (80 − 74 − 0) / √(5²/26 + 6²/32) = 4.1537

df = (5²/26 + 6²/32)² / [ (1/(26−1))·(5²/26)² + (1/(32−1))·(6²/32)² ] = 55.953

Example
What is the difference in mean pulse rate between smokers and nonsmokers? Since we rejected the null hypothesis that the difference is 0, the next step is to find an interval within which we believe the true difference falls. We will find a 95% confidence t-interval for the difference between mean pulse rates, with model N(6, 1.4445). The assumptions and conditions were met in the previous hypothesis test.
t* = 2nd Vars 4:invT(.975, 55.953) = 2.0033

(ȳ₁ − ȳ₂) ± t* √(s₁²/n₁ + s₂²/n₂) = (80 − 74) ± 2.0033 √(5²/26 + 6²/32) = 6 ± 2.8937 = (3.1063, 8.8937)

Example
Stat Tests 0:2-SampTInt(Stats, 80, 5, 26, 74, 6, 32, .95, No)
Result: (3.1063, 8.8937); df = 55.95304534; x̄1 = 80, x̄2 = 74, Sx1 = 5, Sx2 = 6, n1 = 26, n2 = 32
We can be 95% confident that the average pulse rate for smokers is between 3.1 and 8.9 beats per minute higher than for nonsmokers.
Interpreting 95% confidence: if we were to repeatedly sample 26 smokers and 32 nonsmokers, 95% of the calculated intervals would capture the true difference in mean pulse rates.
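As a final cross-check, the interval can be reproduced from the summary statistics in a few lines (a sketch; the critical value 2.0033 is taken from the slides' invT(.975, 55.953) rather than recomputed):

```python
import math

x1, s1, n1 = 80, 5, 26   # smokers
x2, s2, n2 = 74, 6, 32   # nonsmokers

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_star = 2.0033          # invT(.975, 55.953), read off the slides
me = t_star * se         # margin of error

print(round(x1 - x2 - me, 4), round(x1 - x2 + me, 4))  # 3.1063 8.8937
```

The endpoints agree with the calculator's 2-SampTInt output above.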