ANOVA: Analysis of Variance

Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven

"The analysis of variance is (not a mathematical theorem but) a simple method of arranging arithmetical facts so as to isolate and display the essential features of a body of data with the utmost simplicity."
Sir Ronald A. Fisher

Table of Contents
1 ANOVA: One Way Layout
2 Comparing Means
3 ANOVA: Two Way Layout
4 Chapter #11 R Assignment

ANOVA (analysis of variance) tests whether the means of k different populations are equal when the populations are independent, normal, and share the same unknown variance. An ANOVA test compares the variability (variance) within groups (populations) to the variability between groups. To test whether the means of all the populations are equal, one uses the ratio

$$\frac{\text{variance between groups}}{\text{variance within groups}}$$

as the test statistic. A large ratio indicates a difference in means between the groups.

ANOVA: One Way Layout

The Idea of ANOVA

The figure compares two sets, (a) and (b), each made up of three samples. The sample means of the three samples are the same in each set, so the variation among sample means in (a) is identical to that in (b). The variation among the individuals within the three samples is much smaller for (b).

CONCLUSION: the samples in (b) show more variation among the sample means relative to the variation within the samples, so ANOVA will find more significant differences among the means in (b), assuming equal sample sizes for (a) and (b).

Note: with larger samples, ANOVA is more likely to find significant differences.

Note: When k = 2, one usually uses the two-sample t test. However, ANOVA will give the same result.

When k > 2, testing the populations two at a time does not work well. For instance, with four populations and each test run at significance level 0.05, there are $\binom{4}{2} = 6$ pairwise tests, and (treating the tests as independent) the significance level of all six tests taken together would be $1 - (1 - 0.05)^6 = 0.265$.

The ANOVA procedure is computationally intensive, so one usually uses a computer program.
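The inflation of the overall error rate can be checked numerically. The following is a minimal R sketch; the function name `familywise_error` is made up for illustration, and the calculation assumes the pairwise tests are independent, which is only an approximation.

```r
# Chance of at least one false rejection across all pairwise tests,
# assuming (as a simplification) that the tests are independent.
familywise_error <- function(k, alpha = 0.05) {
  m <- choose(k, 2)        # number of pairwise comparisons among k groups
  1 - (1 - alpha)^m
}

familywise_error(4)        # four populations, six pairwise tests
familywise_error(2:6)      # how the rate grows with the number of groups
```

With k = 4 this gives the 0.265 quoted above, and the rate keeps climbing as groups are added, which is the motivation for a single overall F test.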

Assumptions for doing ANOVA
1 The populations are normal.
2 The populations have the same (unknown) variance.

ANOVA is robust to mild violations of these conditions: one can use it if the populations are approximately normal (otherwise use the Kruskal-Wallis test, a nonparametric test) and the population variances are approximately equal.

Convention (rule for establishing equal variance): If the largest sample standard deviation is less than twice the smallest sample standard deviation, one can use ANOVA techniques under the assumption that the variances are all the same. Some textbooks use four times the smallest sample variance instead of just twice.
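The rule of thumb is easy to automate. The sketch below is illustrative only: `equal_variance_rule` is a hypothetical helper name, and the sample variances are the ones listed in the photography example in these slides.

```r
# Rule of thumb: largest sample SD must be less than `ratio` times the smallest.
equal_variance_rule <- function(sds, ratio = 2) {
  max(sds) < ratio * min(sds)
}

# Sample variances from the photography example, converted to SDs
s <- sqrt(c(Canon = 2.1, Nikon = 3.3, Pentax = 2.9, Samsung = 2.0, Sony = 1.9))
equal_variance_rule(s)
```

The check is a convention, not a formal test, so it is best read alongside boxplots of the groups.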

The Treatment or Factor is what differs between populations.

Example: A blood pressure drug is administered to k populations in k different doses. One samples from each of the k populations.

dosage #1: $X_{11}, \ldots, X_{1n_1}$
$\vdots$
dosage #k: $X_{k1}, \ldots, X_{kn_k}$

Definition. Let

$k$ = number of levels (populations)
$n_i$ = sample size of the random sample from the $i$th population
$N = n_1 + n_2 + \cdots + n_k$ = total number of random variables
$\bar{x}_i$ = sample mean from the $i$th population
$s_i^2$ = sample variance from the $i$th population
$\bar{x}$ = the grand mean $= \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$

Definition.

$SS_{TOT} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$ = Sum of Squares Total

$SS_A$ = Sum of Squares between levels $= n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2$

$SS_E$ = Sum of Squares within the levels $= (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$

Theorem. $SS_{TOT} = SS_A + SS_E$.
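The decomposition can be verified numerically. The sketch below types in the sixteen values from the carpet durability example that appears in these slides (rather than reading `carpet.dat`); the variable names are chosen for illustration.

```r
# Carpet durability values, four observations for each of four carpet types
durability <- c(18.95, 12.62, 11.94, 14.42,
                10.06,  7.19,  7.03, 14.66,
                10.92, 13.28, 14.52, 12.51,
                10.46, 21.40, 18.10, 22.50)
carpet <- factor(rep(1:4, each = 4))

n_i    <- as.numeric(tapply(durability, carpet, length))  # group sizes
xbar_i <- as.numeric(tapply(durability, carpet, mean))    # group means
s2_i   <- as.numeric(tapply(durability, carpet, var))     # group variances
xbar   <- mean(durability)                                # grand mean

ss_tot <- sum((durability - xbar)^2)
ss_a   <- sum(n_i * (xbar_i - xbar)^2)   # between levels
ss_e   <- sum((n_i - 1) * s2_i)          # within levels

c(SS_A = ss_a, SS_E = ss_e, SS_TOT = ss_tot)
all.equal(ss_tot, ss_a + ss_e)
```

The two pieces should agree with the Sum Sq column of the R ANOVA table shown for the carpet example.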

Definition.

$MS_A$ = Mean Squares between levels (groups)
$$MS_A = \frac{SS_A}{k - 1} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2}{k - 1}$$

$MS_E$ = Mean Squares within the levels = pooled sample variance = Mean Squared Error
$$MS_E = \frac{SS_E}{N - k} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{N - k}$$

Theorem. The Mean Square Error, $MS_E$, is an unbiased estimator of $\sigma^2$.
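A short justification of the theorem, sketched under the stated ANOVA assumption that every population has the same variance $\sigma^2$: each sample variance $s_i^2$ is unbiased for $\sigma^2$, and $MS_E$ is a weighted average of them.

```latex
\begin{align*}
E[SS_E] &= \sum_{i=1}^{k} (n_i - 1)\, E[s_i^2]
         = \sum_{i=1}^{k} (n_i - 1)\, \sigma^2
         = (N - k)\, \sigma^2, \\
E[MS_E] &= \frac{E[SS_E]}{N - k} = \sigma^2 .
\end{align*}
```

Note that this argument does not use $H_0$, so $MS_E$ estimates $\sigma^2$ whether or not the population means are equal.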

Theorem (ANOVA F Test). To test

$$H_0: \mu_1 = \cdots = \mu_k \quad \text{vs} \quad H_A: \text{not } H_0$$

use the test statistic

$$F = \frac{MS_A}{MS_E} \sim F(k - 1, N - k) \text{ under } H_0.$$

When $H_0$ is false, $F$ tends to be large, so one uses a right-tailed test.

One creates an ANOVA table, where f is the observed value of F:

Source    df      SS       MS      F             p
Between   k - 1   SS_A     MS_A    MS_A / MS_E   P(F(k - 1, N - k) ≥ f)
Within    N - k   SS_E     MS_E
Total     N - 1   SS_TOT

Example: Judges at the Parisian photography contest, FotoGras, numerically scored photographs submitted by a number of photographers on a scale of 0 to 10. A one-way ANOVA test was performed to see whether the brand of camera a photograph was taken with had anything to do with the judges' numerical scores. A summary of the data is given below:

Brand     Sample Size   Sample Mean   Sample Variance
Canon     11            7.6           2.1
Nikon     9             8.0           3.3
Pentax    5             8.7           2.9
Samsung   3             8.3           2.0
Sony      8             8.0           1.9

The scores awarded for each brand were verified as being (mostly) normally distributed and independent of the scores awarded for other brands. Create an ANOVA table from the scores and decide whether there is a brand effect at a 0.05 significance level.

Example (cont.)

Solution: The largest sample variance, 3.3, is less than four times the smallest sample variance, 1.9 (equivalently, the largest sample standard deviation is less than twice the smallest), so we can assume the population variances are all the same.

$k = 5$ and $N = 11 + 9 + 5 + 3 + 8 = 36$

$$\bar{x} = \frac{11(7.6) + 9(8.0) + 5(8.7) + 3(8.3) + 8(8.0)}{36} = 8.0$$

$$SS_A = 11(7.6 - 8.0)^2 + 9(8.0 - 8.0)^2 + 5(8.7 - 8.0)^2 + 3(8.3 - 8.0)^2 + 8(8.0 - 8.0)^2 = 4.48$$

$$SS_E = (11 - 1)2.1 + (9 - 1)3.3 + (5 - 1)2.9 + (3 - 1)2.0 + (8 - 1)1.9 = 76.3$$

$$SS_{TOT} = SS_A + SS_E = 4.48 + 76.3 = 80.78$$

$$MS_A = \frac{SS_A}{k - 1} = \frac{4.48}{5 - 1} = 1.12$$

$$MS_E = \frac{SS_E}{N - k} = \frac{76.3}{36 - 5} = 2.46129$$

$$f = \frac{MS_A}{MS_E} = \frac{1.12}{2.46129} = 0.4550459$$

$$p \text{ value} = P(F(4, 31) \ge f) = 0.7679706$$

Source    df   SS      MS        F         p
Between   4    4.48    1.12      0.45505   0.76797
Within    31   76.3    2.46129
Total     35   80.78

Since the p value is larger than 0.05, one does not reject the hypothesis that there is no brand effect.
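The arithmetic of this example needs only the group sizes, means and variances. A minimal R sketch follows; `anova_from_summary` is a hypothetical helper written for illustration, not a function from base R or any package.

```r
# One-way ANOVA from summary statistics (sizes, means, variances per group)
anova_from_summary <- function(n, xbar, s2) {
  k     <- length(n)
  N     <- sum(n)
  grand <- sum(n * xbar) / N              # grand mean
  ss_a  <- sum(n * (xbar - grand)^2)      # between levels
  ss_e  <- sum((n - 1) * s2)              # within levels
  ms_a  <- ss_a / (k - 1)
  ms_e  <- ss_e / (N - k)
  f     <- ms_a / ms_e
  p     <- pf(f, k - 1, N - k, lower.tail = FALSE)  # right-tail probability
  list(ss_a = ss_a, ss_e = ss_e, ms_a = ms_a, ms_e = ms_e, f = f, p = p)
}

# Canon, Nikon, Pentax, Samsung, Sony
res <- anova_from_summary(n    = c(11, 9, 5, 3, 8),
                          xbar = c(7.6, 8.0, 8.7, 8.3, 8.0),
                          s2   = c(2.1, 3.3, 2.9, 2.0, 1.9))
str(res)
```

The returned values can be compared line by line with the hand calculation above.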

Example: Given data on carpet durability

> cdat=read.table("carpet.dat",h=TRUE)
> cdat
Durability Carpet
18.95      1
12.62      1
11.94      1
14.42      1
10.06      2
 7.19      2
 7.03      2
14.66      2
10.92      3
13.28      3
14.52      3
12.51      3
10.46      4
21.40      4
18.10      4
22.50      4

Test whether durability depends on which carpet type one chooses.

Example (continued)

> cdat=read.table("carpet.dat",h=TRUE)
> Carpet.F = as.factor(cdat$Carpet) # change to a categorical variable
> g.lm=lm(cdat$Durability~Carpet.F)
> anova(g.lm)
Analysis of Variance Table

Response: cdat$Durability
          Df  Sum Sq Mean Sq F value  Pr(>F)
Carpet.F   3 146.374  48.791  3.5815 0.04674 *
Residuals 12 163.477  13.623
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> kruskal.test(cdat$Durability~Carpet.F) # Kruskal-Wallis Test

Kruskal-Wallis rank sum test

data: cdat$Durability by Carpet.F
Kruskal-Wallis chi-squared = 5.2059, df = 3, p-value = 0.1573
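For readers without the file `carpet.dat`, the same analysis can be run on a data frame typed in from the listing above. This is a sketch; the object names `fit`, `tab` and `kw` are chosen here for illustration.

```r
# Carpet durability data entered by hand instead of read from carpet.dat
cdat <- data.frame(
  Durability = c(18.95, 12.62, 11.94, 14.42,
                 10.06,  7.19,  7.03, 14.66,
                 10.92, 13.28, 14.52, 12.51,
                 10.46, 21.40, 18.10, 22.50),
  Carpet = factor(rep(1:4, each = 4))   # categorical, not numeric
)

fit <- lm(Durability ~ Carpet, data = cdat)
tab <- anova(fit)                                     # one-way ANOVA table
kw  <- kruskal.test(Durability ~ Carpet, data = cdat) # nonparametric alternative

print(tab)
print(kw)
```

Passing `data = cdat` to the formula interface avoids the `cdat$` prefixes and keeps the response label in the output short.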

Comparing Means

If $H_0$ is rejected, i.e. not all means are equal, how does one find out how the population means differ from each other?

Answer:
boxplots (all in one graph).
multiple comparison methods such as the Bonferroni Multiple Comparison Test.

Continuing with the carpet durability example, using R one can create boxplots:

> boxplot(cdat$Durability[1:4], cdat$Durability[5:8], cdat$Durability[9:12], cdat$Durability[13:16])

[Figure: side-by-side boxplots of durability (axis ticks at 10, 15, 20) for carpet types 1, 2, 3 and 4.]

It seems that type 4 carpet is the most durable and type 2 is the least durable, but both of these types have more variability in durability than types 1 and 3. One should be careful about how strongly one uses the word "seems", as only four carpets of each type were used.

Definition. A least significant differences (LSD) method is a multiple comparisons procedure that tests each pair of levels and rejects $H_0: \mu_1 = \cdots = \mu_k$ if any of the $\binom{k}{2}$ tests is significant. The Bonferroni Multiple Comparison Test is an LSD method.

Theorem (Bonferroni Multiple Comparison Test). To test $H_0$ at the $\alpha$ significance level, for every $1 \le i < j \le k$:

Step #1: Calculate the test statistic
$$t_{ij} = \frac{\bar{x}_j - \bar{x}_i}{\sqrt{MS_E \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}} \sim t(N - k).$$

Step #2: Test whether the means of levels $i$ and $j$ are equal at the $\alpha / \binom{k}{2}$ level using a two-sided test with the test statistic $t_{ij}$.

If any of the $\binom{k}{2}$ tests are significant, reject $H_0$. Otherwise do not reject $H_0$.
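The two steps can be carried out by hand in R. The sketch below uses the carpet durability values from these slides, typed in directly; it reports Bonferroni-adjusted p values (raw p value times the number of pairs, capped at 1), and comparing an adjusted p value to $\alpha$ is equivalent to comparing the raw p value to $\alpha / \binom{k}{2}$. Variable names are illustrative.

```r
durability <- c(18.95, 12.62, 11.94, 14.42,
                10.06,  7.19,  7.03, 14.66,
                10.92, 13.28, 14.52, 12.51,
                10.46, 21.40, 18.10, 22.50)
carpet <- factor(rep(1:4, each = 4))

k      <- nlevels(carpet)
N      <- length(durability)
n_i    <- as.numeric(tapply(durability, carpet, length))
xbar_i <- as.numeric(tapply(durability, carpet, mean))
s2_i   <- as.numeric(tapply(durability, carpet, var))
ms_e   <- sum((n_i - 1) * s2_i) / (N - k)    # pooled variance estimate

pairs <- combn(k, 2)                          # each column is one pair (i, j)
p_adj <- apply(pairs, 2, function(ij) {
  i <- ij[1]; j <- ij[2]
  # Step 1: test statistic for levels i and j
  t_ij <- (xbar_i[j] - xbar_i[i]) / sqrt(ms_e * (1 / n_i[i] + 1 / n_i[j]))
  # Step 2: two-sided p value, then the Bonferroni adjustment
  p_raw <- 2 * pt(abs(t_ij), df = N - k, lower.tail = FALSE)
  min(1, choose(k, 2) * p_raw)
})
names(p_adj) <- paste(pairs[1, ], pairs[2, ], sep = "-")
round(p_adj, 3)
```

Only the pair of carpet types 2 and 4 has an adjusted p value below 0.05, in line with the `pairwise.t.test` output in the example that follows.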

Example

> pairwise.t.test(cdat$Durability, Carpet.F, "bonferroni")

Pairwise comparisons using t tests with pooled SD

data: cdat$Durability and Carpet.F

  1     2     3
2 0.564 -     -
3 1.000 1.000 -
4 1.000 0.045 0.388

P value adjustment method: bonferroni

ANOVA: Two Way Layout

Same assumptions as before, plus:
1 Treatment A has I levels.
2 Treatment B has J levels.
3 The design is balanced, i.e. all sample sizes equal K.

One is interested in:
1 Is there an effect for treatment A?
2 Is there an effect for treatment B?
3 Is there an effect for the interaction of the treatments?

One can't answer 3 if the sample size is K = 1. Two-way ANOVA is more efficient than doing two one-way ANOVAs, and it also gives information about the interaction of the two factors.

Definition.

$SS_A$ = Sum of Squares for Treatment A
$SS_B$ = Sum of Squares for Treatment B
$SS_{AB}$ = Sum of Squares of the Non Additive part
$SS_E$ = Sum of Squares within treatments
$SS_{TOT}$ = Total Sum of Squares

Here A and B are the two main effects from each of the two factors, and AB represents the interaction of factors A and B.

Theorem. $SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_E$.
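The slides do not spell out formulas for the two-way sums of squares. The sketch below uses the standard balanced-design formulas (an assumption added here, not taken from the slides) and the carpet and composition data from the two-way example in these slides, typed in directly, to check the decomposition numerically.

```r
durability <- c(18.95, 12.62, 11.94, 14.42,
                10.06,  7.19,  7.03, 14.66,
                10.92, 13.28, 14.52, 12.51,
                10.46, 21.40, 18.10, 22.50)
carpet      <- factor(rep(1:4, each = 4))           # treatment A, I = 4 levels
composition <- factor(rep(c("A", "B"), times = 8))  # treatment B, J = 2 levels

I <- nlevels(carpet); J <- nlevels(composition)
K <- length(durability) / (I * J)                   # observations per cell

grand   <- mean(durability)
mean_a  <- as.numeric(tapply(durability, carpet, mean))
mean_b  <- as.numeric(tapply(durability, composition, mean))
mean_ab <- tapply(durability, list(carpet, composition), mean)  # I x J cell means

ss_a  <- J * K * sum((mean_a - grand)^2)
ss_b  <- I * K * sum((mean_b - grand)^2)
# Interaction: cell mean minus what the two main effects alone would predict
ss_ab <- K * sum((mean_ab - outer(mean_a, mean_b, "+") + grand)^2)
# Error: each observation minus its own cell mean
cell  <- cbind(as.integer(carpet), as.integer(composition))
ss_e  <- sum((durability - mean_ab[cell])^2)
ss_tot <- sum((durability - grand)^2)

c(SS_A = ss_a, SS_B = ss_b, SS_AB = ss_ab, SS_E = ss_e, SS_TOT = ss_tot)
all.equal(ss_tot, ss_a + ss_b + ss_ab + ss_e)
```

These formulas rely on the design being balanced; with unequal cell sizes the sums of squares depend on the order in which terms enter the model.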

Definition. With $N = IJK$ the total number of observations,

$MS_A = \frac{SS_A}{I - 1}$ = Mean Squares of Treatment A

$MS_B = \frac{SS_B}{J - 1}$ = Mean Squares of Treatment B

$MS_{AB} = \frac{SS_{AB}}{(I - 1)(J - 1)}$ = Mean Squares of the Non Additive part

$MS_E = \frac{SS_E}{N - IJ}$ = Mean Squares within treatments

Theorem. $MS_E$ is an unbiased estimator of the population variance, $\sigma^2$.

One creates a Two Way ANOVA Table:

Source        df               SS       MS      F              p
Treatment A   I - 1            SS_A     MS_A    MS_A / MS_E    P(F(I - 1, N - IJ) ≥ observed F)
Treatment B   J - 1            SS_B     MS_B    MS_B / MS_E    P(F(J - 1, N - IJ) ≥ observed F)
Interaction   (I - 1)(J - 1)   SS_AB    MS_AB   MS_AB / MS_E   P(F((I - 1)(J - 1), N - IJ) ≥ observed F)
Error         N - IJ           SS_E     MS_E
Total         N - 1            SS_TOT

Here:
The p value in the first row is for a test of H_0: there is no effect for treatment A versus H_A: there is an effect.
The p value in the second row is for a test of H_0: there is no effect for treatment B versus H_A: there is an effect.
The p value in the third row is for a test of H_0: there is no non-additive interactive effect for treatments A and B versus H_A: there is an effect.

Example: Given data on carpet durability

> cdat=read.table("carpet.dat",h=TRUE)
> cdat
Durability Carpet Composition
18.95      1      A
12.62      1      B
11.94      1      A
14.42      1      B
10.06      2      A
 7.19      2      B
 7.03      2      A
14.66      2      B
10.92      3      A
13.28      3      B
14.52      3      A
12.51      3      B
10.46      4      A
21.40      4      B
18.10      4      A
22.50      4      B

Test whether durability depends on which carpet and which composition one chooses.

Example

> cdat=read.table("carpet.dat",h=TRUE)
> Carpet.F=as.factor(cdat$Carpet)
> Composition.F=as.factor(cdat$Composition)
> gc3=lm(Durability~Carpet.F+Composition.F+Carpet.F:Composition.F,data=cdat)
> anova(gc3)
Analysis of Variance Table

Response: Durability
                       Df  Sum Sq Mean Sq F value  Pr(>F)
Carpet.F                3 146.374  48.791  4.0981 0.04912 *
Composition.F           1  17.222  17.222  1.4466 0.26347
Carpet.F:Composition.F  3  51.007  17.002  1.4281 0.30462
Residuals               8  95.247  11.906
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
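A self-contained version of this analysis, for readers without `carpet.dat`, might look like the sketch below. The data are typed in from the listing in these slides, and the formula `Carpet * Composition` is R shorthand for both main effects plus their interaction; the object names are illustrative.

```r
cdat <- data.frame(
  Durability  = c(18.95, 12.62, 11.94, 14.42,
                  10.06,  7.19,  7.03, 14.66,
                  10.92, 13.28, 14.52, 12.51,
                  10.46, 21.40, 18.10, 22.50),
  Carpet      = factor(rep(1:4, each = 4)),
  Composition = factor(rep(c("A", "B"), times = 8))
)

# Main effects for Carpet and Composition plus the Carpet:Composition interaction
fit2 <- lm(Durability ~ Carpet * Composition, data = cdat)
tab2 <- anova(fit2)
print(tab2)
```

At the 0.05 level only the carpet main effect is significant here; neither composition nor the interaction reaches significance with two observations per cell.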

Chapter #11 R Assignment

Enter the following in R to create the data.frame, data, that contains one factor with three levels.

> y1 = c(18.2, 20.1, 17.6, 16.8, 18.8, 19.7, 19.1)
> y2 = c(17.4, 18.7, 19.1, 16.4, 15.9, 18.4, 17.7)
> y3 = c(15.2, 18.8, 17.7, 16.5, 15.9, 17.1, 16.7)
> y = c(y1, y2, y3)
> group = rep(1:3, c(7, 7, 7))
> data = data.frame(y = y, group = factor(group))

1 Do a qqnorm plot for each of y1, y2 and y3 to check for normality.
2 Check to see if one can assume the population variances are all equal.
3 Make a boxplot showing y1, y2 and y3.
4 Create an ANOVA table.

The data file data2way.csv, found on math.newhaven.edu/mhm/courses/bstat/items.html, contains a hypothetical sample of 27 participants who are divided into three stress reduction treatment groups (mental, physical and medical) and three age groups (young, mid, and old). The stress reduction values are represented on a scale that ranges from 0 to 10. Read this data into R using

> data2way = read.csv("data2way.csv")

Create a two-way ANOVA table and use the table for the following four problems:

5 Consider a test that the treatments have no effect on stress versus there is an effect. What is the p value of this test?
6 Consider a test that age has no effect on stress versus there is an effect. What is the p value of this test?
7 What is $SS_{TOT}$?
8 What are the degrees of freedom for $SS_{TOT}$?