Analysis of Variance

Similar documents
Statistical Techniques II EXST7015 Simple Linear Regression

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Chapter 12. ANalysis Of VAriance. Lecture 1 Sections:

Inference for Regression Simple Linear Regression

Calculating Fobt for all possible combinations of variances for each sample Calculating the probability of (F) for each different value of Fobt

Mathematical Notation Math Introduction to Applied Statistics

One-Way Analysis of Variance. With regression, we related two quantitative, typically continuous variables.

The Random Effects Model Introduction

Analysis of Variance. Contents. 1 Analysis of Variance. 1.1 Review. Anthony Tanbakuchi Department of Mathematics Pima Community College

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Lec 1: An Introduction to ANOVA

Categorical Predictor Variables

Unit 12: Analysis of Single Factor Experiments

Confidence Intervals, Testing and ANOVA Summary

df=degrees of freedom = n - 1

Review of Statistics 101

What If There Are More Than. Two Factor Levels?

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

PLSC PRACTICE TEST ONE

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

Hypothesis testing: Steps

psyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests

Sampling distribution of t. 2. Sampling distribution of t. 3. Example: Gas mileage investigation. II. Inferential Statistics (8) t =

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

STAT 115:Experimental Designs

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

One-way Analysis of Variance. Major Points. T-test. Ψ320 Ainsworth

Sociology 6Z03 Review II

Analysis Of Variance Compiled by T.O. Antwi-Asare, U.G

ANOVA Analysis of Variance

Formulas and Tables. for Essentials of Statistics, by Mario F. Triola 2002 by Addison-Wesley. ˆp E p ˆp E Proportion.

Design & Analysis of Experiments 7E 2009 Montgomery

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs)

Inference for Regression Inference about the Regression Model and Using the Regression Line

Analysis of Variance and Design of Experiments-I

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

PLS205 Lab 2 January 15, Laboratory Topic 3

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018

Analysis of Variance (ANOVA)

EXST Regression Techniques Page 1 SIMPLE LINEAR REGRESSION WITH MATRIX ALGEBRA

Formulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc.

using the beginning of all regression models

Chap The McGraw-Hill Companies, Inc. All rights reserved.

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs)

COMPARING SEVERAL MEANS: ANOVA

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

Comparing the means of more than two groups

Regression With a Categorical Independent Variable: Mean Comparisons

Introduction to the Analysis of Variance (ANOVA)

VIII. ANCOVA. A. Introduction

Lecture 7: Hypothesis Testing and ANOVA

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

Assignment #7. Chapter 12: 18, 24 Chapter 13: 28. Due next Friday Nov. 20 th by 2pm in your TA s homework box

Central Limit Theorem ( 5.3)

One-way between-subjects ANOVA. Comparing three or more independent means

Elementary Statistics for the Biological and Life Sciences

Chapter 4. Regression Models. Learning Objectives

Factorial designs. Experiments

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.

Comparing Nested Models

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

22s:152 Applied Linear Regression. Take random samples from each of m populations.

Tables Table A Table B Table C Table D Table E 675

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

22s:152 Applied Linear Regression. 1-way ANOVA visual:

10/31/2012. One-Way ANOVA F-test

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

Analysis of variance (ANOVA) ANOVA. Null hypothesis for simple ANOVA. H 0 : Variance among groups = 0

Unit 27 One-Way Analysis of Variance

Two-Sample Inferential Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Inferences for Regression

One-Way ANOVA Cohen Chapter 12 EDUC/PSY 6600

Data Analysis and Statistical Methods Statistics 651

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Correlation Analysis

Chapter 3 Multiple Regression Complete Example

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:

PubH 7405: REGRESSION ANALYSIS. MLR: INFERENCES, Part I

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Lecture 18: Analysis of variance: ANOVA

Analysis of Variance

STAT Chapter 10: Analysis of Variance

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

Two-Way Factorial Designs

What Is ANOVA? Comparing Groups. One-way ANOVA. One way ANOVA (the F ratio test)

Chapter 4: Regression Models

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data

Transcription:

Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1

Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also known a "One-way" Analysis of Variance. Unlike regression, which fits slopes for regression lines and calculates a measure of random variation about those lines, ANOVA fits means and variation about those means. 15a_ANOVA_Introduction 2

Design (continued) The hypotheses tested are hypotheses about the equality of means H 0 : µ 1 = µ 2 = µ 3 = µ 4 =... = µ t Where the µ i represent means of the levels of some categorical variable "t" is the number of levels in the categorical variable. H 1 : some µ i is different 15a_ANOVA_Introduction 3

Design (continued) We will generically refer to the categorical variable as the "treatment" even though it may not actually be an experimenter manipulated effect. The number of treatments will be designated "t". The number of observations within treatments will be designated n for a balanced design (the same number of observations in each treatment), or n i for an unbalanced design (for i = 1 to t). 15a_ANOVA_Introduction 4

Design (continued) The assumptions for basic ANOVA are very similar to those of regression. The residuals, or deviations of observations within groups should be normally distributed. The treatments are independently sampled. The variance of each treatment is the same (homogeneous variance). 15a_ANOVA_Introduction 5

ANOVA review I am borrowing some material from my EXST7005 notes on t-test and ANOVA. See those notes for a more complete review of the introduction to Analysis of Variance (ANOVA). Start with the logic behind ANOVA. 15a_ANOVA_Introduction 6

Prior to R. A. Fisher's development of ANOVA, investigators were likely to have used a series of t tests to test among t treatment levels. What is wrong with that? Recall the Bonferroni adjustment. Each time we do a test we increase the chance of error. To test among 3 treatments we need to do 3 tests, among 4 treatments, 6 tests, 5 treatments are 10 tests, etc. 15a_ANOVA_Introduction 7

What is needed is ONE test for a difference among all tests with one overall value of α specified by the investigator (usually 0.05). Fisher's solution was simple, but elegant. 15a_ANOVA_Introduction 8

Suppose we have a treatment with 5 categories or levels. We can calculate a mean and variance for each treatment level. In order to get one really good estimate of variance we can pool the individual variances of the 5 categories (assuming homogeneity of variance). This pooled variance can be calculated as a weighted mean of the variance (weighted by the degrees of freedom). 15a_ANOVA_Introduction 9

Y Y A B C D E Group 15a_ANOVA_Introduction 10

And since (n 1-1)S 12 = SS 1, the weighted mean is simply the sum of the SS divided by the sum of the d.f. 2 2 2 2 2 ( n1 1) S1 + ( n2 1) S2 + ( n3 1) S3 + ( n4 1) S4 + ( n 1) S Sp = ( n 1) + ( n 1) + ( n 1) + ( n 1) + ( n 1) 1 2 3 4 5 2 5 5 S 2 p = SS1+ SS2 + SS3 + SS4 + SS5 ( n 1) + ( n 1) + ( n 1) + ( n 1) + ( n 1) 1 2 3 4 5 15a_ANOVA_Introduction 11

15a_ANOVA_Introduction 12

So we have one very good estimate of the random variation, or sampling error, S 2. Then what? 15a_ANOVA_Introduction 13

Now consider the treatments. Why don't they all fall on the overall mean? Actually, under the null hypothesis, they should, except for some random variation. So if we estimate that random variation, it should be equal to the same error we already estimated within groups? 15a_ANOVA_Introduction 14

If we estimate a variance with means, we are estimating the variance of means, which is S 2 /n. If we multiply this by "n" it should actually be equal to S 2, which we estimated with S 2 p, the pooled variance estimate. 15a_ANOVA_Introduction 15

So if the null hypothesis is true, the mean square of the deviations within groups should be equal to the mean square of the deviations of the means multiplied by "n"!!!! 15a_ANOVA_Introduction 16

Y Deviations within groups Y Means A B C D E Deviations between groups Group 15a_ANOVA_Introduction 17

Now, if the null hypothesis is not true, and some µ i is different, then what? Y Y A B C D E Group 15a_ANOVA_Introduction 18

Then, when we calculate a mean square of deviations of the means from the overall mean, it should be larger than the previously estimated S 2 p. So we have two estimates of variance, S 2 p and the variance from the treatment means. If the null hypothesis is true, they should not be significantly different. 15a_ANOVA_Introduction 19

If the null hypothesis is FALSE, the treatment mean square should be larger. It will therefore be a ONE TAILED TEST! 15a_ANOVA_Introduction 20

We usually present this in an "Analysis of Variance" table. Source d.f. Sum of Squares Mean Square Treatment t-1 SSTreatment MSTreatment Error t(n-1) SSError MSError Total tn-1 SSTotal 15a_ANOVA_Introduction 21

Degrees of freedom There are tn observations total (Σn i if unbalanced). After the correction factor, there are tn-1 for the corrected total. There are t-1 degrees of freedom for the t treatment levels. Each group contributes n-1 d.f. to the pooled error term. There are t groups, so the pooled error (MSE) has t(n-1) d.f. 15a_ANOVA_Introduction 22

The SSTreatments is the SS deviations of the treatment means from the overall mean. Each deviation is denoted τ i, and is called an treatment "effect". SSTreatments = (Y - Y) t i=1 i 2 t = τ 2 i=1 i 15a_ANOVA_Introduction 23

The model for regression is Y i = β 0 + β 1 X i + ε i The effects model for a CRD is Y ij = µ + τ i + ε ij where the treatments are i=1, 2,... t and the observations are j=1, 2,... n, or n i for unbalanced data The means model is Y ij = µ i + ε ij 15a_ANOVA_Introduction 24

The calculations. The SSTotal is exactly the same as regression, the sum of all ΣY 2 ij observations (squared first). The correction factor is exactly the same too, all observations are summed, the sum is squared and divided by the number of observations, (ΣY ij ) 2 /tn. UncorrectedSSTreatments = 15a_ANOVA_Introduction 25 t i=1 n 2 ( Y ) ij j=1

Obs Group 1 Group 2 Group 3 Group 4 1 Y 11 Y 21 Y 31 Y 41 2 Y 12 Y 22 Y 32 Y 42... n Y 1n Y 2n Y 3n Y 4n sum ΣY 1 ΣY 2 ΣY 3 ΣY 4 mean Y 1 Y 2 Y 3 Y 4 15a_ANOVA_Introduction 26

Calculations are the same as regression for the corrected sum of squares total. The corrected SS treatments is the uncorrected treatments calculated from the marginals less the same correction factor used for the total. Error is usually calculated as the SSTotal minus the SSTreatments. 15a_ANOVA_Introduction 27

We use an F test to test the equality of two means. An Analysis of Variance usually proceeds with an F test of the MSTreatment using the MSError. The test has t-1 and t(n-1) degrees of freedom. This F test will be ONE TAILED since we expect the treatment variance to be too large if the null hypothesis is not true. 15a_ANOVA_Introduction 28

The MSError estimate a variance we designate σ 2 or σ 2 ε. If the null hypothesis is true, the MSTreatments estimates the SAME VARIANCE, σ 2. However, if the null hypothesis is false the MSTreatment variance is the same σ 2 plus some amount due to the differences between treatments. This is designated σ 2 + nσ 2 τ. 15a_ANOVA_Introduction 29

Since the treatment variance can be designated σ 2 + nσ 2 τ, we can see that the null hypothesis can be stated as either the usual H 0 :µ 1 =µ 2 =...=µ t, or as H 0 :σ 2 τ=0. Which is best depends on the nature of the treatment. If the treatment levels are randomly chosen from a large number of treatment levels, then they estimate the variance of that treatment population and would be random. This would be σ 2 τ. 15a_ANOVA_Introduction 30

However, if the treatments are not chosen from a large number of treatments; if they are either all of the levels of interest or all of the levels that exist, then they are said to be FIXED. Fixed treatment levels represent a group of means that are of interest to the investigator, so H 0 :µ 1 =µ 2 =...=µ t is a better representation of the null hypothesis than H 0 :σ 2 τ=0. 15a_ANOVA_Introduction 31

For fixed treatments we still calculate a sum of squared treatment effects and divide by d.f. This is designated nστ 2 i,/(n-1) and the F test is the same. These simply do not represent a variance. The two values estimated by the MSTreatment (σ 2 + nσ 2 τ or σ 2 + nστ 2 i,/(n-1)) and MSError (σ 2 ) are called expected mean squares. 15a_ANOVA_Introduction 32

One final note on the F test. Given that MSTreatments and MSError estimate these EMS (expected mean squares) we can rewrite the F test as F=(σ 2 + nσ 2 τ)/(σ 2 ). From this value we can see that it must be a one tailed test because nσ 2 τ cannot be negative, so the ratio is always >1. We can also see that increasing n increases power. 15a_ANOVA_Introduction 33

Example See SAS handout. 15a_ANOVA_Introduction 34

Summary? Y Means Y Deviations A B C D E Group 15a_ANOVA_Introduction 35