Unit 27 One-Way Analysis of Variance


Objectives: To perform the hypothesis test in a one-way analysis of variance for comparing more than two population means

Recall that a two sample t test is applied when two population means are being compared with data consisting of two independent random samples. We discussed two available test statistics: the pooled t test statistic and the separate t test statistic. Our recommendation was that we just use the separate t test statistic, but as can be seen from the formulas (displayed in an earlier unit and in Section B.2 of Appendix B), each of these test statistics is a difference between sample means x1 - x2 divided by an appropriate standard error.

We shall now consider a hypothesis test for comparing more than two population means with data consisting of independent random samples. We let k represent the number of populations; consequently, the data consists of k independent random samples, one from each population. As an illustration, we shall consider Table 27-1, which displays data consisting of three random samples of rope breaking strengths (in pounds): one for rope brand Deluxe, one for rope brand Econ, and one for rope brand Nogood. Our interest is in a hypothesis test to see if there is any evidence of a difference in mean breaking strength among the rope brands. The null hypothesis of course states that all the means are equal. Since the null hypothesis in a two sample t test involves only two means, the alternative hypothesis could be one-sided or two-sided, depending on whether we are looking for evidence of a difference in only one direction or in either direction. However, when the null hypothesis involves more than two means, there is only one type of alternative hypothesis we consider: one stating that there is at least one difference among the means.
For instance, in comparing the mean breaking strength among the three rope brands, the null hypothesis states that the mean breaking strength is the same for the three rope brands. If this null hypothesis were not true, there are many different possibilities: the mean could be largest for Deluxe and equal for Econ and Nogood, the mean could be smallest for Deluxe and equal for Econ and Nogood, the mean could be different for all three brands, being largest for Deluxe and smallest for Nogood, the mean could be different for all three brands, being largest for Deluxe and smallest for Econ, etc. The fact that so many possibilities exist is why the only alternative hypothesis we consider is the one stating that there is at least one difference among the means. As a result, a hypothesis test involving more than two means is always treated as one-sided.

In our illustration, the variable "Brand of Rope" is a qualitative variable whose different categories define the different populations being compared. In general, one may consider more than one qualitative variable, but we will not consider these more complex analyses here. When only one qualitative variable is used to define the populations whose means are being compared, a hypothesis test can be derived from a technique known as a one-way analysis of variance, abbreviated as one-way ANOVA. Analysis of variance may at first seem like a strange name for a procedure whose purpose is to compare means, but this name is actually very appropriate because the test statistic in an ANOVA is the ratio of two measures of variation: the variation between samples and the variation within samples. This test statistic is

similar to a two sample t test statistic, which has the difference between the two sample means as its numerator and a standard error as its denominator. We can think of the difference between means in the numerator as a measure of variation (i.e., the difference) between the two samples, and we can think of the standard error in the denominator as a measure of variation within samples.

The numerator of our one-way ANOVA test statistic with the data in Table 27-1 is going to measure how different from each other the three sample means are, that is, the amount of variation among the sample means. It should be an easy task for you to calculate each of these sample means and complete Table 27-2 by entering the sample means and sample sizes. (You should find that n D = 4, x D = 162, n E = 3, x E = 159, n N = 3, x N = 155.) The amount of variation among the sample means is measured from the deviations of the sample means from what we call the grand mean (similar to the way the sample variance measures the variation in a sample from the deviations of observations from the sample mean). This grand mean, denoted as x, is the mean obtained when all samples are combined together into one sample; we represent this total sample size as n. You can easily check that when all n = 10 observations of Table 27-1 are combined together, the grand mean is x = 159. Variation among the sample means is based on squaring the deviation of each sample mean from the grand mean, multiplying by the corresponding sample size, and summing these results. This sum of squares is called the between samples sum of squares, which we shall denote as SSB. For the data of Table 27-1, you can easily check that SSB = 84.
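The grand mean and SSB described above can be sketched in a few lines of Python; note that only the sample sizes and sample means entered in Table 27-2 are needed, not the individual observations.

```python
# Between-samples sum of squares (SSB) for the Table 27-1 summary statistics.
sizes = {"Deluxe": 4, "Econ": 3, "Nogood": 3}
means = {"Deluxe": 162.0, "Econ": 159.0, "Nogood": 155.0}

n = sum(sizes.values())                                   # total sample size n
grand_mean = sum(sizes[b] * means[b] for b in sizes) / n  # all samples combined

# SSB: square each sample mean's deviation from the grand mean,
# multiply by the corresponding sample size, and sum.
ssb = sum(sizes[b] * (means[b] - grand_mean) ** 2 for b in sizes)

print(n, grand_mean, ssb)  # 10 159.0 84.0
```

The weighting by sample size is what lets SSB be computed from summary statistics alone.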
The df (degrees of freedom) associated with this sum of squares for the numerator of our one-way ANOVA test statistic is one less than the number of samples being compared (i.e., one less than the number of categories of the qualitative variable which defines the populations in our one-way ANOVA). With k samples being compared, we have df = k - 1 associated with SSB, which is 3 - 1 = 2 for the data of Table 27-1; we shall call this the numerator df for our test statistic. The numerator of our one-way ANOVA test statistic is SSB/(k - 1), which is called the between samples mean square and denoted as MSB. In general a mean square is a sum of squares divided by its associated degrees of freedom. For the data of Table 27-1, MSB = 84/2 = 42.

While the numerator of our one-way ANOVA test statistic measures how different from each other the k sample means are, the denominator of the test statistic measures how different from each other the observations within any particular sample are, that is, the amount of variation within any particular sample. This within sample variation is measured from the deviations of individual observations from the corresponding sample mean by squaring the deviation of each observation from its respective sample mean and summing these squares. This sum of squares is called the within samples sum of squares or the error sum of squares (or sometimes the residual sum of squares). We shall denote this sum of squares as SSE. For the data of Table 27-1, you can easily check that SSE = 86. The df (degrees of freedom) associated with this sum of squares for the denominator of our one-way ANOVA test statistic is n - k. That is, we have df = n - k associated with SSE, which is 10 - 3 = 7 for the data of Table 27-1; we shall call this the denominator df for our test statistic. The denominator of our one-way ANOVA test statistic is SSE/(n - k), which is called the within samples mean square or the error mean square (or the residual mean square) and is denoted as MSE.
For the data of Table 27-1, MSE = 86/7 ≈ 12.29. The formula for our one-way ANOVA test statistic is

MSB/MSE = [SSB/(k - 1)] / [SSE/(n - k)].

For the data of Table 27-1, you should now easily be able to verify that the one-way ANOVA test statistic is 42/(86/7) ≈ 3.42. While we have outlined how this test statistic is calculated, a more detailed description of the calculation is provided in Section B.3 in Appendix B for the interested reader. By making use of the appropriate statistical software or programmable calculator, it is easy to obtain the sums of squares, degrees of freedom, mean squares, and test statistic value without having to grind through all the calculations. With data sets larger than that of Table 27-1, these calculations can be cumbersome.

This one-way ANOVA test statistic is similar to the pooled two sample t statistic in that the denominator measures variation within samples by assuming that the variation is roughly the same within the different samples. Recall that we recommended using the separate t statistic instead of the pooled t statistic to avoid having to be concerned about whether or not the assumption of the same variation within all samples is reasonable. While alternative test statistics are available for a one-way ANOVA for cases where samples show drastically different variation (and are readily available with the appropriate statistical software or programmable calculator), we shall not discuss these in detail here. Our focus will be exclusively on the one-way ANOVA test statistic obtained by dividing the between samples mean square by the error mean square.

A two-sample t statistic is compared to the appropriate Student's t distribution in order to decide whether or not to reject the null hypothesis. We denote this test statistic by t(df) and refer to it as a t statistic with degrees of freedom df. In a one-way ANOVA, we decide whether or not to reject the null hypothesis by comparing our test statistic to an appropriate Fisher's f distribution. In order to perform the hypothesis test in a one-way ANOVA, we must first study the f distributions.
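The calculation just outlined, from sums of squares to the test statistic, can be sketched for the Table 27-1 quantities (SSB = 84 with k - 1 = 2 df, SSE = 86 with n - k = 7 df):

```python
# Mean squares and the one-way ANOVA f statistic for the Table 27-1 data.
ssb, sse = 84.0, 86.0
k, n = 3, 10

msb = ssb / (k - 1)  # between samples mean square, 84/2 = 42
mse = sse / (n - k)  # error mean square, 86/7 ≈ 12.29

f_stat = msb / mse   # one-way ANOVA test statistic ≈ 3.42
print(round(msb, 2), round(mse, 2), round(f_stat, 2))  # 42.0 12.29 3.42
```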
Just as Table A.2 provides us with information about the standard normal distribution and Table A.3 provides us with information about the t distributions, Table A.4, which extends over several pages in Appendix A, provides us with information about the f distributions. The figures at the top of Tables A.2 and A.3 indicate that the standard normal distribution and the t distributions are each symmetric and bell-shaped. In contrast, the figure at the top of the first page of Table A.4 indicates that each f distribution is positively skewed. The figure also indicates that values from an f distribution are always nonnegative, unlike values from a normal distribution or a t distribution, which may be negative or positive. Table A.4 contains information about several different f distributions. Each different t distribution is identified by df (degrees of freedom); however, each f distribution depends on two different df: one called numerator degrees of freedom and one called denominator degrees of freedom. Each column of Table A.4 is labeled by a value for numerator df, and each row is labeled by a value for denominator df. Corresponding to each combination of numerator df and denominator df is a set of four values. As indicated at the top of each page of Table A.4, the first value of the set is the f-score above which lies 0.10 of the area under the corresponding f density curve, the second value is the f-score above which lies 0.05 of the area under the corresponding f density curve, the third value is the f-score above which lies 0.01 of the area under the corresponding f density curve, and the fourth value is the f-score above which lies the smallest tabulated area under the corresponding f density curve. The notation we shall use to represent the f-scores in Table A.4 is similar to the notation we use to represent t-scores in Table A.3.
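The f-scores tabulated in Table A.4 can be reproduced with the f distribution in Python's scipy library (a sketch, assuming scipy is installed); f.ppf(1 - a, num_df, den_df) returns the f-score above which lies area a. The degrees of freedom below match the rope-strength example.

```python
# Reproduce Table A.4 upper-tail f-scores for numerator df = 2, denominator df = 7.
from scipy.stats import f

for area in (0.10, 0.05, 0.01):
    score = f.ppf(1 - area, 2, 7)  # f-score with upper-tail area `area`
    print(f"upper-tail area {area}: f = {score:.2f}")
```

The 0.10 and 0.01 values should agree with the 3.26 and 9.55 quoted from Table A.4 in this unit.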
For instance, just as we use t 5; 0.01 to represent the t-score above which lies 0.01 of the area under the density curve for the Student's t distribution with df = 5, we shall use f 5, 13; 0.01 to represent the f-score above which lies 0.01 of the area under the density curve for the Fisher's f distribution with numerator df = 5 and denominator df = 13. Notice that when designating f-scores in Table A.4, we include in the subscript both the numerator and denominator degrees of freedom followed by the area above the f-score. Recall that our test statistic in a one-way ANOVA is MSB divided by MSE. As we have seen, the numerator df is k - 1, since MSB is obtained by dividing SSB by k - 1; similarly, we have seen that the denominator df is n - k, since MSE is obtained by dividing SSE by n - k. Consequently, we shall use f k - 1, n - k to denote this test statistic; that is, we can write

f k - 1, n - k = MSB/MSE = [SSB/(k - 1)] / [SSE/(n - k)],

which we can refer to as Fisher's f statistic with k - 1 numerator degrees of freedom and n - k denominator degrees of freedom. Recall that for the data of Table 27-1, we have found that f 2, 7 = 3.42.

Our null hypothesis in a one-way ANOVA states that the population means are all equal. If all sample means turned out to be equal to each other, the f statistic would be equal to zero (since SSB would be equal to zero). However, even if the null hypothesis in a one-way ANOVA is true, we still expect there to be some random variation among the sample means. The figure at the top of the first page of Table A.4 illustrates the distribution of the f test statistic if the null hypothesis is true. Figure 27-1 displays this same typical density curve for an f distribution. Notice that the shape of this density curve suggests that a high proportion of the f-scores for most f distributions are, roughly speaking, in the range from 0 to 2. In a one-way ANOVA, an f test statistic which is unusually large would provide us with evidence that at least one population mean is different from the others. We then define the rejection region in a one-way ANOVA by the f-score above which lies α of the area under the corresponding density curve, where α is of course the significance level. The shaded area in the figure at the top of the first page of Table A.4 graphically illustrates the type of rejection region we use in a one-way ANOVA.

The same four steps we have used in the past to perform hypothesis tests are used in a one-way ANOVA. Let us use a one-way ANOVA with the data of Table 27-1 to see if there is any evidence of a difference in mean breaking strength among the rope brands Deluxe, Econ, and Nogood. We shall choose a 0.01 significance level for the one-way ANOVA. The first step is to state the null and alternative hypotheses, and choose a significance level.
The null hypothesis of course states that mean breaking strength is equal for the rope brands Deluxe, Econ, and Nogood. We can state this null hypothesis in an abbreviated form and complete the first step of the hypothesis test as follows: H 0 : µ D = µ E = µ N vs. H 1 : At least one of µ D, µ E, and µ N is different (α = 0.01).

We complete the second step of the hypothesis test by calculating the f test statistic. Recall that we did this earlier and found that f 2, 7 = 3.42. The third step is to define the rejection region, decide whether or not to reject the null hypothesis, and obtain the p-value of the test. As we have indicated earlier, our rejection region is defined by the f-score above which lies an area equal to the significance level. For this hypothesis test, the rejection region is defined by the f-score above which lies 0.01 of the area under the f density curve with 2 numerator degrees of freedom and 7 denominator degrees of freedom; from Table A.4, we see that this f-score is f 2, 7; 0.01 = 9.55. We can then define our rejection region algebraically as f 2, 7 ≥ 9.55. Graphically, we can picture the rejection region as corresponding to the shaded area in the figure at the top of the first page of Table A.4, where the value on the horizontal axis defining the rejection region is f 2, 7; 0.01 = 9.55. Since our test statistic f 2, 7 = 3.42 is not in the rejection region, our decision is not to reject H 0 : µ D = µ E = µ N ; in other words, our data does not provide sufficient evidence against H 0 : µ D = µ E = µ N.

The p-value of this hypothesis test is the probability of obtaining a test statistic value f 2, 7 which represents greater differences among the sample means than does the value actually observed, f 2, 7 = 3.42. Graphically, we can picture the p-value of this hypothesis test as corresponding to the shaded area in the figure at the top of the first page of Table A.4, where the value on the horizontal axis is our observed test statistic value f 2, 7 = 3.42. By looking at the entries of Table A.4 corresponding to 2 numerator degrees of freedom and 7 denominator degrees of freedom, we find that the observed test statistic f 2, 7 = 3.42 is between f 2, 7; 0.10 = 3.26 and f 2, 7; 0.05 = 4.74. This tells us that the area above f 2, 7 = 3.42, which is the p-value, is between 0.05 and 0.10. We indicate this by writing 0.05 < p-value < 0.10. The fact that 0.05 < p-value < 0.10 confirms to us that H 0 is not rejected with α = 0.01. However, this also tells us that H 0 would be rejected with α = 0.10, but would not be rejected with α = 0.05.

To complete the fourth step of the hypothesis test, we can summarize the results of the hypothesis test as follows: Since f 2, 7 = 3.42 and f 2, 7; 0.01 = 9.55, we do not have sufficient evidence to reject H 0. We conclude that the mean breaking strength is not different for the Deluxe, Econ, and Nogood rope brands (0.05 < p-value < 0.10). Since H 0 is not rejected, no further analysis is necessary.

Notice that in the statement of results, the last sentence indicates that no further analysis is necessary. This is always true in a one-way ANOVA when we do not reject the null hypothesis, since concluding that the population means are equal leaves us with no unanswered questions. On the other hand, whenever we reject the null hypothesis in a one-way ANOVA, we are concluding that at least one of the population means is different, and this leaves us with the question of which population mean or means are different. Answering this question requires further analysis.
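Table A.4 only brackets the p-value; an exact value can be obtained in Python with scipy's survival function for the f distribution (a sketch, assuming scipy is available):

```python
# Exact upper-tail area above the observed statistic under the f(2, 7) curve.
from scipy.stats import f

f_stat, num_df, den_df = 3.42, 2, 7
p_value = f.sf(f_stat, num_df, den_df)  # sf = 1 - cdf, the area above f_stat

print(round(p_value, 4))
assert 0.05 < p_value < 0.10  # consistent with the bracket read from Table A.4
```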
One type of analysis available to identify significantly different population means is discussed in the next unit. Often, the results from an ANOVA are summarized in an ANOVA table. Table 27-3 displays how a one-way ANOVA table is organized. The first column is labeled Source. This refers to the three sources of variation on which the one-way ANOVA is based: the variation between samples, the variation within samples, and the total variation. We can think of the total variation as consisting of two components: the variation between samples and the variation within samples. This is because SSB + SSE = SST. The second and third columns of the ANOVA table are respectively labeled SS and df, referring to sums of squares and degrees of freedom. Notice that the total degrees of freedom is one less than the total sample size, which is the sum of the between samples degrees of freedom and the error degrees of freedom. The last three columns of the ANOVA table contain respectively the two mean squares, the f statistic, and the p-value.
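The organization just described can be sketched by assembling the Table 27-1 quantities into rows; the row labels here ("Between", "Error", "Total") are assumptions about the labels used in Table 27-3.

```python
# A minimal one-way ANOVA table for SSB = 84, SSE = 86, k = 3, n = 10.
ssb, sse, k, n = 84.0, 86.0, 3, 10
sst = ssb + sse                       # total sum of squares: SSB + SSE = SST
msb, mse = ssb / (k - 1), sse / (n - k)
f_stat = msb / mse

rows = [
    ("Between", ssb, k - 1, round(msb, 2), round(f_stat, 2)),
    ("Error",   sse, n - k, round(mse, 2), ""),
    ("Total",   sst, n - 1, "", ""),   # total df = n - 1 = (k - 1) + (n - k)
]
print(f"{'Source':<8}{'SS':>8}{'df':>4}{'MS':>8}{'f':>6}")
for source, ss, df, ms, fval in rows:
    print(f"{source:<8}{ss:>8}{df:>4}{ms:>8}{fval:>6}")
```

The Total row illustrates both identities from the text: SST = SSB + SSE and total df = n - 1.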

Table 27-4 is the one-way ANOVA table for the data of Table 27-1. Notice that in place of the label "Between" we have put the label "Rope Brands" to emphasize that the three different brands of rope were being compared in this one-way ANOVA. Using the calculations we have done previously for the data of Table 27-1, complete the ANOVA table in Table 27-4. (After you complete the ANOVA table, you should have 84, 86, and 170 respectively in the SS column; you should have 2, 7, and 9 in the df column; you should have 42 and 12.29 respectively in the MS column; you should have 3.42 in the f column; you should have either 0.05 < p-value < 0.10 or an exact p-value, obtained from a calculator or computer, in the p-value column.)

Even though the one-way ANOVA is based on the assumption that independent random samples are selected from normally distributed populations all having the same standard deviation, the f test in the one-way ANOVA is known to be somewhat robust, which means that it will still perform well in several situations when the assumptions are not satisfied. In general, it is advisable to check how far the data departs from the necessary assumptions in order to make a decision as to whether or not to employ an alternative to the one-way ANOVA.

Self-Test Problem 27-1. A 0.01 significance level is chosen for a hypothesis test to see if there is any evidence of a difference in mean yearly income among different political party affiliations in a particular state. The individuals selected for the SURVEY DATA, displayed as Set 1-1 at the end of Unit 1, are treated as comprising four random samples: one from Republicans, one from Democrats, one from Independents, and one from Others. (a) Explain how the data for this hypothesis test is appropriate for a one-way ANOVA.
(b) Complete the four steps of the hypothesis test by completing the table titled Hypothesis Test for Self-Test Problem 27-1. As part of the second step, complete the construction of the one-way ANOVA table, displayed as Table 27-5, where you should find the values of SSB and SSE, and that f 3, 26 = 0.22. (c) Decide which of the following would be best as a graphical display for the data and say why: (i) four pie charts, (ii) four scatter plots, (iii) four box plots. (d) Considering the results of the hypothesis test, decide which of the Type I or Type II errors is possible, and describe this error. (e) Decide whether H 0 would have been rejected or would not have been rejected with each of the following significance levels: (i) α = 0.05, (ii) α = 0.10. (f) What would the presence of one or more outliers in the data suggest about using the f statistic?


Answers to Self-Test Problems

27-1 (a) The data consists of independent random samples selected from more than two populations, and the purpose of a one-way ANOVA is to compare means of such data. (b) Step 1: H 0 : µ R = µ D = µ I = µ O, H 1 : At least one of µ R, µ D, µ I, and µ O is different (α = 0.01). Step 2: n R = 10, x R = 44.5, n D = 8, x D = 49.4, n I = 4, x I = 42.5, n O = 8, x O = 44.0, f 3, 26 = 0.22. Step 3: The rejection region is f 3, 26 ≥ 4.64. H 0 is not rejected; 0.10 < p-value. Step 4: Since f 3, 26 = 0.22 and f 3, 26; 0.01 = 4.64, we do not have sufficient evidence to reject H 0. We conclude that there is no difference in mean yearly income among the four political party affiliations in the state (0.10 < p-value). Since H 0 is not rejected, no further analysis is necessary. In Table 27-5, the first label in the Source column should be Party Affiliation; the entries in the SS column should respectively be ___, ___, and ___; the entries in the df column should respectively be 3, 26, and 29; the entries in the MS column should respectively be ___ and ___; the entry in the f column should be 0.22; the entry in the p-value column should be either 0.10 < p-value or an exact p-value, obtained from a calculator or computer. (c) Since yearly income is a quantitative variable, four box plots, one for each party affiliation category, is an appropriate graphical display. (d) Since H 0 is not rejected, the Type II error is possible, which is concluding that the means are all equal when in reality at least one mean is different. (e) H 0 would not have been rejected with both α = 0.05 and α = 0.10. (f) The presence of one or more outliers in the data would suggest that the f statistic may not be appropriate.

Summary

A two sample t test statistic is the difference between sample means x1 - x2 divided by an appropriate standard error. A hypothesis test for comparing k > 2 population means with data consisting of independent random samples is available from the one-way analysis of variance (ANOVA).
The test statistic in the one-way ANOVA is somewhat similar to a two sample t test statistic, in the sense that it is the ratio of two measures of variation: the variation between samples and the variation within samples. The grand mean, denoted as x, is the mean obtained when all samples are combined together into one sample; we represent this total sample size as n. The squared deviation of each sample mean from the grand mean is multiplied by the corresponding sample size, and these results are summed to produce what is called the between samples sum of squares, abbreviated as SSB. We divide SSB by k - 1 to obtain the between samples mean square, abbreviated as MSB. The MSB is the numerator of the test statistic in the one-way ANOVA and is a measure of the variation between samples. The deviation of each observation from its respective sample mean is squared, and these results are summed to produce what is called the within samples sum of squares or the error sum of squares (or sometimes the residual sum of squares), abbreviated as SSE. We divide SSE by n - k to obtain the within samples mean square or the error mean square (or the residual mean square), abbreviated as MSE. The MSE is the denominator of the test statistic in the one-way ANOVA and is a measure of how different from each other the observations within any particular sample are, that is, the amount of variation within any particular sample. In general, a mean square is a sum of squares divided by its associated degrees of freedom. The test statistic in a one-way ANOVA is the ratio of the mean square MSB divided by the mean square MSE. This test statistic is called a Fisher's f statistic. If the sample means were all equal, this test statistic would be equal to

zero, since the numerator MSB would be zero. The null hypothesis in a one-way ANOVA states that the population means are all equal and is rejected when the Fisher's f statistic f k - 1, n - k = MSB/MSE = [SSB/(k - 1)] / [SSE/(n - k)] is sufficiently large when compared to the Fisher's f distribution with k - 1 numerator degrees of freedom and n - k denominator degrees of freedom. The results from a one-way ANOVA are often organized into the one-way ANOVA table displayed as Table 27-3. The one-way ANOVA is based on the assumption that independent random samples are selected from normally distributed populations all having the same standard deviation. The f test in the one-way ANOVA is known to be somewhat robust, which means that it will still perform well in several situations when the assumptions are not satisfied. Alternative tests are available when there is uncertainty about whether one or more of the assumptions are satisfied. Access to the appropriate statistical software or programmable calculator allows one to avoid having to do much of the calculation necessary in a one-way ANOVA.
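As the summary notes, statistical software can carry out the whole procedure from raw data. A sketch with scipy's f_oneway, which returns the Fisher's f statistic and its exact p-value, is shown below; the three samples are hypothetical illustrations, not the actual Table 27-1 observations (which are not reproduced here).

```python
# One-way ANOVA from raw data using scipy; samples below are hypothetical.
from scipy.stats import f_oneway

sample_a = [160, 158, 165, 161]
sample_b = [155, 159, 154]
sample_c = [150, 153, 149]

result = f_oneway(sample_a, sample_b, sample_c)
print(f"f = {result.statistic:.2f}, p-value = {result.pvalue:.4f}")

# Reject H0 (all population means equal) when the p-value falls below alpha.
alpha = 0.01
print("reject H0" if result.pvalue < alpha else "do not reject H0")
```

With k = 3 samples and n = 10 observations in total, the statistic is referred to the f distribution with 2 and 7 degrees of freedom, exactly as in the rope-strength example.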


Chapter 1 Statistical Inference

Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

Inferences About the Difference Between Two Means

7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

One-Way Analysis of Variance (ANOVA) Paul K. Strode, Ph.D.

One-Way Analysis of Variance (ANOVA) Paul K. Strode, Ph.D. Purpose While the T-test is useful to compare the means of two samples, many biology experiments involve the parallel measurement of three or

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts

Statistical methods for comparing multiple groups Lecture 7: ANOVA Sandy Eckel seckel@jhsph.edu 30 April 2008 Continuous data: comparing multiple means Analysis of variance Binary data: comparing multiple

Lecture 41 Sections Mon, Apr 7, 2008

Lecture 41 Sections 14.1-14.3 Hampden-Sydney College Mon, Apr 7, 2008 Outline 1 2 3 4 5 one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories,

One-way Analysis of Variance. Major Points. T-test. Ψ320 Ainsworth

One-way Analysis of Variance Ψ30 Ainsworth Major Points Problem with t-tests and multiple groups The logic behind ANOVA Calculations Multiple comparisons Assumptions of analysis of variance Effect Size

4.1. Introduction: Comparing Means

4. Analysis of Variance (ANOVA) 4.1. Introduction: Comparing Means Consider the problem of testing H 0 : µ 1 = µ 2 against H 1 : µ 1 µ 2 in two independent samples of two different populations of possibly

ANOVA: Comparing More Than Two Means

1 ANOVA: Comparing More Than Two Means 10.1 ANOVA: The Completely Randomized Design Elements of a Designed Experiment Before we begin any calculations, we need to discuss some terminology. To make this

Comparing Several Means: ANOVA

Comparing Several Means: ANOVA Understand the basic principles of ANOVA Why it is done? What it tells us? Theory of one way independent ANOVA Following up an ANOVA: Planned contrasts/comparisons Choosing

Orthogonal, Planned and Unplanned Comparisons

This is a chapter excerpt from Guilford Publications. Data Analysis for Experimental Design, by Richard Gonzalez Copyright 2008. 8 Orthogonal, Planned and Unplanned Comparisons 8.1 Introduction In this

Fractional Factorial Designs

k-p Fractional Factorial Designs Fractional Factorial Designs If we have 7 factors, a 7 factorial design will require 8 experiments How much information can we obtain from fewer experiments, e.g. 7-4 =

DISTRIBUTIONS USED IN STATISTICAL WORK

DISTRIBUTIONS USED IN STATISTICAL WORK In one of the classic introductory statistics books used in Education and Psychology (Glass and Stanley, 1970, Prentice-Hall) there was an excellent chapter on different

What Is ANOVA? Comparing Groups. One-way ANOVA. One way ANOVA (the F ratio test)

What Is ANOVA? One-way ANOVA ANOVA ANalysis Of VAriance ANOVA compares the means of several groups. The groups are sometimes called "treatments" First textbook presentation in 95. Group Group σ µ µ σ µ

W&M CSCI 688: Design of Experiments Homework 2. Megan Rose Bryant

W&M CSCI 688: Design of Experiments Homework 2 Megan Rose Bryant September 25, 201 3.5 The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically.

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA

Chapter Seven: Multi-Sample Methods 1/52

Chapter Seven: Multi-Sample Methods 1/52 7.1 Introduction 2/52 Introduction The independent samples t test and the independent samples Z test for a difference between proportions are designed to analyze

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means 4.1 The Need for Analytical Comparisons...the between-groups sum of squares averages the differences

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

QUEEN MARY, UNIVERSITY OF LONDON

QUEEN MARY, UNIVERSITY OF LONDON MTH634 Statistical Modelling II Solutions to Exercise Sheet 4 Octobe07. We can write (y i. y.. ) (yi. y i.y.. +y.. ) yi. y.. S T. ( Ti T i G n Ti G n y i. +y.. ) G n T

The Multiple Regression Model

Multiple Regression The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & or more independent variables (X i ) Multiple Regression Model with k Independent Variables:

Wolf River. Lecture 19 - ANOVA. Exploratory analysis. Wolf River - Data. Sta 111. June 11, 2014

Aldrin in the Wolf River Wolf River Lecture 19 - Sta 111 Colin Rundel June 11, 2014 The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests

Hypothesis Testing hypothesis testing approach

Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

Hypothesis T e T sting w ith with O ne O One-Way - ANOV ANO A V Statistics Arlo Clark Foos -

Hypothesis Testing with One-Way ANOVA Statistics Arlo Clark-Foos Conceptual Refresher 1. Standardized z distribution of scores and of means can be represented as percentile rankings. 2. t distribution

Quantitative Techniques - Lecture 8: Estimation

Quantitative Techniques - Lecture 8: Estimation Key words: Estimation, hypothesis testing, bias, e ciency, least squares Hypothesis testing when the population variance is not known roperties of estimates

Wolf River. Lecture 15 - ANOVA. Exploratory analysis. Wolf River - Data. Sta102 / BME102. October 22, 2014

Wolf River Lecture 15 - Sta102 / BME102 Colin Rundel October 22, 2014 The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane

Chapter 12. ANalysis Of VAriance. Lecture 1 Sections:

Chapter 1 ANalysis Of VAriance Lecture 1 Sections: 1.1 1. ANOVA test is an extension of two sample independent t-test demonstrated how to compare differences of means between two groups. The t-test is

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

ANOVA CIVL 7012/8012

ANOVA CIVL 7012/8012 ANOVA ANOVA = Analysis of Variance A statistical method used to compare means among various datasets (2 or more samples) Can provide summary of any regression analysis in a table called

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

STATS Analysis of variance: ANOVA

STATS 1060 Analysis of variance: ANOVA READINGS: Chapters 28 of your text book (DeVeaux, Vellman and Bock); on-line notes for ANOVA; on-line practice problems for ANOVA NOTICE: You should print a copy

Section 9.5. Testing the Difference Between Two Variances. Bluman, Chapter 9 1

Section 9.5 Testing the Difference Between Two Variances Bluman, Chapter 9 1 This the last day the class meets before spring break starts. Please make sure to be present for the test or make appropriate

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

2 and F Distributions. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006

and F Distributions Lecture 9 Distribution The distribution is used to: construct confidence intervals for a variance compare a set of actual frequencies with expected frequencies test for association

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

Average weight of Eisenhower dollar: 23 grams

Average weight of Eisenhower dollar: 23 grams Average cost of dinner in Decatur: 23 dollars Would it be more surprising to see A dinner that costs more than 27 dollars, or An Eisenhower dollar that weighs

Introduction to Analysis of Variance (ANOVA) Part 2

Introduction to Analysis of Variance (ANOVA) Part 2 Single factor Serpulid recruitment and biofilms Effect of biofilm type on number of recruiting serpulid worms in Port Phillip Bay Response variable:

ANOVA: Analysis of Variance

ANOVA: Analysis of Variance Marc H. Mehlman marcmehlman@yahoo.com University of New Haven The analysis of variance is (not a mathematical theorem but) a simple method of arranging arithmetical facts so

Ratio of Polynomials Fit One Variable

Chapter 375 Ratio of Polynomials Fit One Variable Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Examples of this type of model are: and Y = A0 + A1 X

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

What If There Are More Than. Two Factor Levels?

What If There Are More Than Chapter 3 Two Factor Levels? Comparing more that two factor levels the analysis of variance ANOVA decomposition of total variability Statistical testing & analysis Checking

Calculating Fobt for all possible combinations of variances for each sample Calculating the probability of (F) for each different value of Fobt

PSY 305 Module 5-A AVP Transcript During the past two modules, you have been introduced to inferential statistics. We have spent time on z-tests and the three types of t-tests. We are now ready to move

9 Correlation and Regression

9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals

Past weeks: Measures of central tendency (mean, mode, median) Measures of dispersion (standard deviation, variance, range, etc). Working with the normal curve Last week: Sample, population and sampling

Topic 1. Definitions

S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

ONE FACTOR COMPLETELY RANDOMIZED ANOVA

MALLOY PSYCH 3000 1-ANOVA PAGE 1 ONE FACTOR COMPLETELY RANDOMIZED ANOVA Sampling Distribution of F F is a test statistic [ ][ ][ ][ ] Test Statistic: F = MALLOY PSYCH 3000 1-ANOVA PAGE 2 ONE WAY ANOVA

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

Variance Estimates and the F Ratio. ERSH 8310 Lecture 3 September 2, 2009

Variance Estimates and the F Ratio ERSH 8310 Lecture 3 September 2, 2009 Today s Class Completing the analysis (the ANOVA table) Evaluating the F ratio Errors in hypothesis testing A complete numerical

Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

16.400/453J Human Factors Engineering. Design of Experiments II

J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential

Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures

12.1 Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures 12.9 Repeated measures analysis Sometimes researchers make multiple measurements on the same experimental unit. We have

Lecture 13 Extra Sums of Squares

Lecture 13 Extra Sums of Squares STAT 512 Spring 2011 Background Reading KNNL: 7.1-7.4 13-1 Topic Overview Extra Sums of Squares (Defined) Using and Interpreting R 2 and Partial-R 2 Getting ESS and Partial-R

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

Chapter 2. Continued. Proofs For ANOVA Proof of ANOVA Identity. the product term in the above equation can be simplified as n

Chapter 2. Continued Proofs For ANOVA Proof of ANOVA Identity We are going to prove that Writing SST SSR + SSE. Y i Ȳ (Y i Ŷ i ) + (Ŷ i Ȳ ) Squaring both sides summing over all i 1,...n, we get (Y i Ȳ

Simple Linear Regression

9-1 l Chapter 9 l Simple Linear Regression 9.1 Simple Linear Regression 9.2 Scatter Diagram 9.3 Graphical Method for Determining Regression 9.4 Least Square Method 9.5 Correlation Coefficient and Coefficient

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

Correlation & Simple Regression

Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

ANOVA: Analysis of Variance

ANOVA: Analysis of Variance Marc H. Mehlman marcmehlman@yahoo.com University of New Haven The analysis of variance is (not a mathematical theorem but) a simple method of arranging arithmetical facts so

This module focuses on the logic of ANOVA with special attention given to variance components and the relationship between ANOVA and regression.

WISE ANOVA and Regression Lab Introduction to the WISE Correlation/Regression and ANOVA Applet This module focuses on the logic of ANOVA with special attention given to variance components and the relationship

ECON 312 FINAL PROJECT

ECON 312 FINAL PROJECT JACOB MENICK 1. Introduction When doing statistics with cross-sectional data, it is common to encounter heteroskedasticity. The cross-sectional econometrician can detect heteroskedasticity

Two Factor ANOVA. March 2, 2017

Two Factor ANOVA March, 07 Contents Two Factor ANOVA Example : peanut butter and jelly Within-cell variance - the denominator of all three F-tests Main effects for rows: Does peanut butter affect taste

BALANCED INCOMPLETE BLOCK DESIGNS

BALANCED INCOMPLETE BLOCK DESIGNS V.K. Sharma I.A.S.R.I., Library Avenue, New Delhi -110012. 1. Introduction In Incomplete block designs, as their name implies, the block size is less than the number of