Lecture notes 13: ANOVA (a.k.a. Analysis of Variance)


Outline: Testing for a difference in means; Notation; Sums of squares; Mean squares; The F distribution; The ANOVA table; Part II: multiple comparisons; Worked example

Testing for a difference in more than two means. Previously we saw how to test for a difference between two means, using a two-sample t-test. But what if we want to test whether there are differences in a set of more than two means? The tool for doing this is called ANOVA, which is short for analysis of variance.

How to detect differences in means: two cases. In both of these cases, the sample means for the three boxplots are about 5, 10, and 15, respectively. In the first case, it seems clear that the true means must also differ. In the second case, this isn't so clear.

In other words, did these three boxplots come from populations with the same mean, or from populations with different means?

Using variance to test for a difference in means. What makes these two cases so different is that in the first, the data are not very spread out, while in the second they are. More variation reflects greater uncertainty about the values of the true unknown means. With ANOVA, we compare average between-group variance to average within-group variance.

Using variance to test for a difference in means. If the average amount of variation between the groups is substantially larger than the average amount of variation within the groups, then the true group means likely differ. If not, then we cannot rule out that they are equal. As you might guess, we will make this decision using a hypothesis test. For ANOVA, the hypotheses are H0: µ1 = µ2 = ... = µk versus Ha: not all of the µi are equal.

Male lab rats were randomly assigned to 3 groups differing in access to food for 40 days. The rats were weighed (in grams) at the end. Group 1: access to chow only. Group 2: chow plus access to cafeteria food restricted to 1 hour every day. Group 3: chow plus extended access to cafeteria food (all day long, every day). Cafeteria food: bacon, sausage, cheesecake, pound cake, frosting, and chocolate. Does the type of food access influence the body weight of male lab rats significantly?

H0: µ_chow = µ_restricted = µ_extended
Ha: H0 is not true (at least one mean µ is different from at least one other mean µ)

Group       N   Mean    StDev
Chow        19  605.63  49.64
Restricted  16  657.31  50.68
Extended    15  691.13  63.41
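As an aside (not part of the original notes), the whole ANOVA computation for this example can be sketched in a few lines of plain Python from just the group sizes, means, and standard deviations above, using the sum-of-squares formulas developed later in these notes:

```python
# One-way ANOVA from summary statistics for the food-access example.
# Order of groups: Chow, Restricted, Extended (as in the table above).
ns    = [19, 16, 15]
means = [605.63, 657.31, 691.13]
sds   = [49.64, 50.68, 63.41]
k = len(ns)       # number of groups
N = sum(ns)       # overall sample size

# grand mean is the size-weighted average of the group means
grand = sum(n * m for n, m in zip(ns, means)) / N

# between-group variation (treatment sum of squares)
ss_tr = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
# within-group variation (error sum of squares)
ss_e  = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

ms_tr = ss_tr / (k - 1)
ms_e  = ss_e / (N - k)
F = ms_tr / ms_e
print(f"F({k - 1}, {N - k}) = {F:.2f}")  # 10.70 here; slightly off 10.71 because the summary stats are rounded
```

Software such as Minitab works from the raw data, so its F value (shown later in these notes) differs in the last digit from this summary-statistics version.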

The ANOVA table. All of the information regarding this ANOVA procedure can be summarized in an ANOVA table:

Source of variation  df   Sum of squares  Mean square        F statistic
Treatment            k-1  SSTr            MSTr = SSTr/(k-1)  F = MSTr/MSE
Error                N-k  SSE             MSE = SSE/(N-k)
Total                N-1  SSTo

Notation. We will be using the following notation in our calculations.

k: number of groups
µ1, ..., µk: true means (groups 1 through k)
x̄1, ..., x̄k: sample means (groups 1 through k)
s1, ..., sk: sample standard deviations (groups 1 through k)
n1, ..., nk: sample sizes (groups 1 through k)
N: overall sample size for all groups combined = n1 + n2 + ... + nk
T: sum of all data from all groups = n1 x̄1 + n2 x̄2 + ... + nk x̄k
x̄: grand mean of all data from all groups = T/N

Sums of squares. Sums of squares refers to sums of squared deviations from some mean. ANOVA compares between-group to within-group variation, and these types of variation are quantified using sums of squares. The three important sums of squares we will consider are:

SSTo: total sum of squares
SSTr: treatment sum of squares
SSE: error sum of squares

Total sum of squares. Total sum of squares is calculated by summing up the squared deviations of every point in the data from the grand mean:

SSTo = Σ (x − x̄)², summed over all N data points

This quantifies the amount of overall variation in the data, for all of the groups put together. The degrees of freedom associated with SSTo are df = N − 1.

Treatment sum of squares.

SSTr = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + ... + nk(x̄k − x̄)²

Treatment sum of squares is calculated by summing the squared deviations of each sample mean from the grand mean. This quantifies how much the sample means vary. It is what we are referring to when we say between-group variation. The degrees of freedom associated with SSTr are df = k − 1.

Error sum of squares.

SSE = (n1 − 1)s1² + (n2 − 1)s2² + ... + (nk − 1)sk²

Error sum of squares is calculated by adding up the squared deviations of each data point around its group mean. This quantifies how much the data vary around their group sample means. It is what we are referring to when we say within-group variation. The degrees of freedom associated with SSE are df = N − k.

Putting the sums of squares together. SSTo = SSTr + SSE. In other words, we can express the total amount of variation as the sum of between-group variation (SSTr) and within-group variation (SSE).
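This identity is easy to verify numerically. Here is a small check (mine, not from the notes) on a made-up dataset of three groups:

```python
# Verify SSTo = SSTr + SSE on a toy dataset with three groups
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 7.0, 9.0]]

data  = [x for g in groups for x in g]
N     = len(data)
grand = sum(data) / N                      # grand mean of all observations

# total variation around the grand mean
ss_to = sum((x - grand) ** 2 for x in data)
# between-group variation: group means around the grand mean
ss_tr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
# within-group variation: observations around their own group mean
ss_e  = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

print(ss_to, ss_tr + ss_e)   # the two totals agree
```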

Mean squares. A mean square is a sum of squares divided by its associated degrees of freedom:

MSTr = mean square for treatments = SSTr/(k − 1)
MSE = mean square for error = SSE/(N − k)

The F distribution. The statistic we will calculate is F = MSTr/MSE. This follows an F distribution, which arises as a ratio of two independent chi-square variables, each divided by its degrees of freedom. It looks like a chi-square distribution (skewed right). It has separate degrees of freedom for the numerator and for the denominator.

Using Table F. The F distribution is asymmetrical and has two distinct degrees of freedom. It is named in honor of Fisher, hence the label "F". Once again, we calculate the value of F for our sample data and then look up the corresponding area under the curve in Table F.

The F distribution. To find a p-value based on an F statistic with your calculator, use Fcdf(lower bound, upper bound, numerator df, denominator df), which in this case is Fcdf(test statistic, 1000, k − 1, N − k).
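If no calculator Fcdf is at hand, the same right-tail area can be approximated by integrating the F density directly. This stdlib-only sketch (mine, not from the notes; for real work use a stats table, calculator, or library) applies the trapezoidal rule:

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 (numerator) and d2 (denominator) df."""
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2) - log_beta)
    return math.exp(log_pdf)

def f_tail(f_stat, d1, d2, upper=1000.0, steps=50_000):
    """Approximate P(F > f_stat) by trapezoidal integration from f_stat to `upper`."""
    h = (upper - f_stat) / steps
    total = 0.5 * (f_pdf(f_stat, d1, d2) + f_pdf(upper, d1, d2))
    total += sum(f_pdf(f_stat + i * h, d1, d2) for i in range(1, steps))
    return total * h

# the food-access example later in the notes: F = 10.71 with df1 = 2, df2 = 47
p = f_tail(10.71, 2, 47)
print(f"p = {p:.6f}")   # about 0.00015
```

The `upper=1000` cutoff mirrors the calculator idiom of passing a large number as the upper bound; the tail beyond it is negligible for these degrees of freedom.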

The ANOVA table. All of the information regarding this ANOVA procedure can be summarized in an ANOVA table:

Source of variation  df   Sum of squares  Mean square        F statistic
Treatment            k-1  SSTr            MSTr = SSTr/(k-1)  F = MSTr/MSE
Error                N-k  SSE             MSE = SSE/(N-k)
Total                N-1  SSTo

Assumptions of ANOVA. The ANOVA procedure makes two main assumptions: i. The population distribution of each group is normal. ii. The population standard deviations of the groups are equal.

Assumptions of ANOVA. We cannot know for sure if our assumptions are met, but we can eyeball our data to make sure they aren't being clearly violated. If the data look approximately normal around each mean, and no sample standard deviation is more than twice as big as another, we're probably in good shape.
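As a quick illustration of the twice-as-big rule of thumb (code mine, not from the notes), using the standard deviations from the food-access example earlier in the notes:

```python
# Rule-of-thumb check for the equal-variance assumption:
# the largest sample SD should be no more than twice the smallest.
sds = {"Chow": 49.64, "Restricted": 50.68, "Extended": 63.41}

ratio = max(sds.values()) / min(sds.values())
print(f"largest/smallest SD = {ratio:.2f}")   # about 1.28, comfortably under 2
assert ratio < 2, "equal-variance assumption looks questionable"
```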

Example. A researcher who studies sleep is interested in the effects of ethanol on sleep time. She gets a sample of 20 rats and gives each an injection having a particular concentration of ethanol per body weight. There are 4 treatment groups, with 5 rats per treatment. She records REM sleep time for each rat over a 24-hour period. The results are summarized below.

Set up hypotheses. We'd like to use the ANOVA procedure to see if we have evidence that any of these means differ from the others: H0: µ1 = µ2 = µ3 = µ4 versus Ha: not all of the means are equal. We'll use a level of significance of 0.05.

And here are the relevant summary statistics:

Treatment       Sample mean (x̄)  Sample standard deviation (s)
1) 0 (control)  83.0             9.7
2) 1 g/kg       59.4             9.0
3) 2 g/kg       49.8             9.66
4) 4 g/kg       37.6             8.8

First, we find the grand sample mean: T = 5(83.0) + 5(59.4) + 5(49.8) + 5(37.6) = 1149 and N = 20, so x̄ = T/N = 1149/20 = 57.45.

The ANOVA table. Here is what we can fill out so far in the ANOVA table. The goal is to get all the information we need in order to compute the F statistic.

Source of variation  df  Sum of squares  Mean square    F statistic
Treatment            3   SSTr            MSTr = SSTr/3  F = MSTr/MSE
Error                16  SSE             MSE = SSE/16
Total                19  SSTo

Calculate SSTr and SSE.

SSTr = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + n3(x̄3 − x̄)² + n4(x̄4 − x̄)²
     = 5(83.0 − 57.45)² + 5(59.4 − 57.45)² + 5(49.8 − 57.45)² + 5(37.6 − 57.45)²
     = 5545.75

Calculate SSTr and SSE.

SSE = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3² + (n4 − 1)s4²
    = 4(9.7)² + 4(9.0)² + 4(9.66)² + 4(8.8)²
    = 1383.38

The ANOVA table. Now we can update our ANOVA table with these sums of squares.

Source of variation  df  Sum of squares  Mean square    F statistic
Treatment            3   5545.75         MSTr = SSTr/3  F = MSTr/MSE
Error                16  1383.38         MSE = SSE/16
Total                19  6929.13

Calculate MSTr, MSE, and F.

MSTr = SSTr/(k − 1) = 5545.75/3 = 1848.58
MSE = SSE/(N − k) = 1383.38/16 = 86.46
F = MSTr/MSE = 1848.58/86.46 = 21.38
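The arithmetic above can be checked with a short pure-Python sketch (mine, not part of the notes), working from the summary table for the sleep example:

```python
# Worked sleep example: k = 4 treatment groups, n = 5 rats each
ns    = [5, 5, 5, 5]
means = [83.0, 59.4, 49.8, 37.6]
sds   = [9.7, 9.0, 9.66, 8.8]
k, N  = len(ns), sum(ns)

grand = sum(n * m for n, m in zip(ns, means)) / N          # grand mean, 57.45

# between-group and within-group sums of squares
ss_tr = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
ss_e  = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

ms_tr = ss_tr / (k - 1)   # numerator df = 3
ms_e  = ss_e / (N - k)    # denominator df = 16
F = ms_tr / ms_e
print(f"SSTr = {ss_tr:.2f}, SSE = {ss_e:.2f}, F = {F:.2f}")
# SSTr = 5545.75, SSE = 1383.38, F = 21.38
```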

The ANOVA table. Now we can finish filling in our ANOVA table.

Source of variation  df  Sum of squares  Mean square  F statistic
Treatment            3   5545.75         1848.58      21.38
Error                16  1383.38         86.46
Total                19  6929.13

Compute the p-value. Now we have to see if our F statistic is big enough in order to reject H0. We do this the usual way: by finding the area under the F distribution to the right of the test statistic, i.e., by finding the p-value. Remember that the F distribution has both a numerator df and a denominator df. In this example, numerator df = 3 and denominator df = 16.

Results. This p-value is very small, so we reject H0 and conclude that at least one group's average sleep time differs significantly from the others. Note that this overall ANOVA test does not identify which means differ from which other means. It only tells us that at least one is significantly different from at least one other.

Male lab rats were randomly assigned to 3 groups differing in access to food for 40 days. The rats were weighed (in grams) at the end. Does the type of access to food influence the body weight of lab rats significantly?

Chow:       516 547 546 564 577 570 582 594 597 599 606 606 624 623 641 655 667 690 703
Restricted: 546 599 612 627 629 638 645 635 660 676 695 687 693 713 726 736
Extended:   564 611 625 644 660 679 687 688 706 714 738 733 744 780 794

Group       N   Mean    StDev
Chow        19  605.63  49.64
Restricted  16  657.31  50.68
Extended    15  691.13  63.41

Minitab, One-way ANOVA: Chow, Restricted, Extended

Source  DF  SS      MS     F      P
Factor   2   63400  31700  10.71  0.000
Error   47  139174   2961
Total   49  202573

S = 54.42   R-Sq = 31.30%   R-Sq(adj) = 28.37%
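The Minitab table can be reproduced from the raw weights with a plain-Python sketch (mine, not from the notes), using the sums-of-squares formulas from earlier:

```python
# Reproduce the Minitab one-way ANOVA from the raw weights (grams)
chow = [516, 547, 546, 564, 577, 570, 582, 594, 597, 599, 606, 606,
        624, 623, 641, 655, 667, 690, 703]
restricted = [546, 599, 612, 627, 629, 638, 645, 635, 660, 676, 695,
              687, 693, 713, 726, 736]
extended = [564, 611, 625, 644, 660, 679, 687, 688, 706, 714, 738,
            733, 744, 780, 794]
groups = [chow, restricted, extended]

data  = [x for g in groups for x in g]
k, N  = len(groups), len(data)
grand = sum(data) / N

# between-group and within-group sums of squares
ss_tr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_e  = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

F = (ss_tr / (k - 1)) / (ss_e / (N - k))
# matches the Minitab Factor/Error rows up to rounding
print(f"SS Factor = {ss_tr:.0f}, SS Error = {ss_e:.0f}, F = {F:.2f}")
```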

The test statistic is F = 10.71, with df1 = 2 and df2 = 47. The area under the F(2, 47) distribution to the right of 10.71 is 0.0001471. The F statistic (10.71) is large and the P-value is very small (<0.001). We reject H0. The population mean weights are not all the same; food access type influences weight significantly in male lab rats.