What Is ANOVA? Comparing Groups. One-way ANOVA. One way ANOVA (the F ratio test)

Similar documents
1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

Degrees of freedom df=1. Limitations OR in SPSS LIM: Knowing σ and µ is unlikely in large

Statistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data

SIMPLE REGRESSION ANALYSIS. Business Statistics

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 4. Regression Models. Learning Objectives

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

Inferences for Regression

ANOVA: Analysis of Variation

Lecture 7: Hypothesis Testing and ANOVA

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Chapter 10. Design of Experiments and Analysis of Variance

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Chapter 16. Simple Linear Regression and Correlation

Tentative solutions TMA4255 Applied Statistics 16 May, 2015

The Multiple Regression Model

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

10 One-way analysis of variance (ANOVA)

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Chapter 16. Simple Linear Regression and dcorrelation

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

Mathematics for Economics MA course

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

PLSC PRACTICE TEST ONE

2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1

Finding Relationships Among Variables

ECN221 Exam 1 VERSION B Fall 2017 (Modules 1-4), ASU-COX VERSION B

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Factorial designs. Experiments

Correlation Analysis

Inferences About the Difference Between Two Means

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

Statistics for Managers using Microsoft Excel 6 th Edition

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Analysis of Variance

Econ 3790: Business and Economic Statistics. Instructor: Yogesh Uppal

Chapter 4: Regression Models

16.3 One-Way ANOVA: The Procedure

MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES. Business Statistics

Analysis of variance (ANOVA) Comparing the means of more than two groups

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

STA121: Applied Regression Analysis

Statistics For Economics & Business

n i n T Note: You can use the fact that t(.975; 10) = 2.228, t(.95; 10) = 1.813, t(.975; 12) = 2.179, t(.95; 12) =

Unit 27 One-Way Analysis of Variance

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

The simple linear regression model discussed in Chapter 13 was written as

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:

Week 14 Comparing k(> 2) Populations

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

STA2601. Tutorial letter 203/2/2017. Applied Statistics II. Semester 2. Department of Statistics STA2601/203/2/2017. Solutions to Assignment 03

13: Additional ANOVA Topics

4.1. Introduction: Comparing Means

Exam details. Final Review Session. Things to Review

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Statistiek II. John Nerbonne using reworkings by Hartmut Fitz and Wilbert Heeringa. February 13, Dept of Information Science

ANOVA: Comparing More Than Two Means

One-Way Analysis of Variance (ANOVA)

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics

EPSE 592: Design & Analysis of Experiments

STAT 350 Final (new Material) Review Problems Key Spring 2016

STATS Analysis of variance: ANOVA

Lecture 9: Linear Regression

Lec 1: An Introduction to ANOVA

Hypothesis Testing hypothesis testing approach

Bayesian Analysis LEARNING OBJECTIVES. Calculating Revised Probabilities. Calculating Revised Probabilities. Calculating Revised Probabilities

Introduction to Analysis of Variance (ANOVA) Part 2

Review of Statistics 101

Regression Analysis. BUS 735: Business Decision Making and Research

22s:152 Applied Linear Regression. Take random samples from each of m populations.

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

What is a Hypothesis?

Wolf River. Lecture 15 - ANOVA. Exploratory analysis. Wolf River - Data. Sta102 / BME102. October 22, 2014

Analysis of Variance

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

Formal Statement of Simple Linear Regression Model

Advanced Experimental Design

Statistics for EES Factorial analysis of variance

ANOVA CIVL 7012/8012

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Ordinary Least Squares Regression Explained: Vartanian

One-way between-subjects ANOVA. Comparing three or more independent means

Analysis Of Variance Compiled by T.O. Antwi-Asare, U.G

Comparing the means of more than two groups

Wolf River. Lecture 15 - ANOVA. Exploratory analysis. Wolf River - Data. Sta102 / BME102. October 26, 2015

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Non-parametric tests, part A:

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

Sociology 6Z03 Review II

Basic Business Statistics 6 th Edition

Transcription:

What Is ANOVA? One-way ANOVA ANOVA ANalysis Of VAriance ANOVA compares the means of several groups. The groups are sometimes called "treatments" First textbook presentation in 95. Group Group σ µ µ σ µ σ Group Group 4 σ µ 44 µ σ σ 44 One way ANOVA (the F ratio test) A One-Way ANOVA and a Two-Way ANOVA were talking shop one day. The One-Way said, "I sure do envy the interaction you have with your variables. The Two-Way frowned and replied, "Yah, but the minute it diminishes to any significant extent they really become independent and go their own separate ways." it is the standard parametric test of difference between or more samples, it is the parametric equivalent of the Kruskal- Wallis test H 0 µ µ µ µ k the population means are all equal so that if just one is different, we will reject H 0 Comparing Groups Basketball Player Heights This one is easy! HSFresh HSSenior College NBA 59.00 6.50 64.00 66.50 69.00.50 4.00 6.50 9.00 8.50 84.00 86.50 Player Height (inches)

Why not multiple t-tests? If you have 5 groups you end up with 0 t- tests, too difficult to evaluate The greater the number of tests you make, the more likely you commit a type-i error (that is, you reject H 0 when you should accept it) There are methods to do pair wise tests that we ll discuss later Some Terminology The ANOVA model specifies a single dependent variable (continuous) There will be one or more explanatory factors (categorical) Each factor has several levels (or groups) called treatments. ANOVA reveals whether or not the mean depends on the treatment group from which an observation is taken. Assumptions we expect that the sample means will be different, question is, are they significantly different from each other? H 0 : differences are not significant - differences between sample means have been generated in a random sampling process H : differences are significant - sample means likely to have come from different population means ) data must be at the interval/ratio level ) sample data must be drawn from a normally distributed population ) sample data must be drawn from independent random samples One-Factor ANOVA 4) populations have the same/similar variances - homoscedasticity assumption from which the samples are drawn The dependent variable should have the same variance in each category of the independent variable. The reason for this assumption is that the denominator of the F-ratio is the within-group mean square, which is the average of group variances taking group sizes into account. When groups differ widely in variances, this average is a poor summary measure. However, ANOVA is robust for small and even moderate departures from homogeneity of variance Model Form Hypotheses H 0 : τ j 0 (no treatment effect exists) H : τ j 0 (treatment effect exists) X ij µ + τ j + ε ij Definitions X ij observation i in group j µ common mean τ j effect due to treatment group j ε ij random error Note If H 0 is true, the model collapses to X ij µ + ε ij which says that each observed data value is just the mean perturbed by some random error.

One Factor: DVD Price General Form: Specific Form: Verbal Form: X f(t) X ij µ + τ j + ε ij Price f(store Type) Music Store Bookstore Discount Store 8.95 4.95.50 4.95 5.95.50 5.95.95 9.50.00.5.5.00.5 4.50.00 X price of a recent DVD (continuous variable) T store type (discrete - treatment levels) This example shows that group sizes can be unequal. ANOVA examines differences in means by looking at estimates of variances essentially ANOVA poses the following partition of a data value observationoverall mean + deviation of group mean from overall mean + deviation of observation from group mean the overall mean is a constant to all observations Case example deviation of the group mean from the overall mean is taken to represent the effect of belonging to a particular group deviation from the group mean is taken to represent the effect of all other variables other than the group variable Example consider the numbers below as constituting One data set Sample Sample Sample observations 4 x... Much of the variation is within each row, it is difficult to tell if the means are significantly different as stated before, the variability in all observations can be divided into parts ) variations due to differences within rows ) variations due to differences between rows ) variations due to sampling errors

Example much of the variation is between each row, it is easy to tell if the means are significantly different [note: way ANOVA does not need equal numbers of observations in each row] Sample Sample Sample observations 4 8 x.5..5 in example : between row variation is small compared to row variation, therefore F will be small large values of F indicate difference conclude there is no significant difference between means in example : between row variation is large compared to within row variation, therefore, F will be large conclude there is a significant difference Sample vs Sample. obs S S 4. S. x obs x the first step in ANOVA is to make two.5 S S 4. S 8.5 estimates of the variance of the hypothesized common population ) the within samples variance estimate ) the between samples variance estimate within sample variance estimate Between sample variance σ w k n ( x j N k σ b k nx ( x G ) k k - is the number of sample N - total number of individuals in all samples X bar - is the mean x G is the grand mean 4

F ratio having calculated estimates of the population variance how probable is it that values are estimates of the same population variance to answer this we use the statistic known as the F ratio between row variation F ratio within row variation estimate of variance between samples Fratio estimate of variance within samples significance: critical values are available from tables df are calculated as ) for the between sample variance estimate they are the number of sample means minus (k-) ) for the within sample variance they are the total number of individuals in the data minus the number of samples (N-k) since the calculations are somewhat complicated it should be done in a table Example Test winning times for the men s Olympic 00 meter dash over several time periods Hypotheses 900-9 90-9 96-956 winning time (in seconds) 0.8 0.8 0.8 0.6 0.8 0. 0. 0.4 0.8 0. 0.5 X k 0.85 0.65 0.5 H 0 : There is no significant difference in winning times. The difference in means have been generated in a random sampling process H : There are significant differences in winning times. Given observed differences in sample means, it is likely they have been drawn from different populations. Confidence at p0.0, 99% confident from different population source: Chatterjee, S. and Chatterjee, S. (98) New lamps for old: an exploratory analysis of running times in Olympic Games, Applied statistics,, 4-. 900-9 0.8 0.8 0.8 x4.4 X 0.85 90-9 0.8 0.6 0.8 0. x4.5 X 0.65 96-956 0. 0. 0.4 0.5 x4.5 X 0.5 X G 0.6 900-9 90-9 96-956 900-9 90-9 96-956 ( x ( x ( x ( x ( x ( x -.05.5 -.05.005.006.0056.5 -.05 -.05.05.0006.0056 -.05.5.05.005.006.0006 -.05 -.5.5.005.056.056 Σ ( x.0 ( x.64 ( x.04 5

calculation of between samples variance estimate nx x G 0 6 0 0 4 σ B ( ). + +. 058. k σ w k n ( x j 048. 005. N k 900-9 90-9 X0.85 X0.65 n(x-x G ) n(x-x G ) 4(0.85-0.6) 4(0.65-0.6) 4(.059) 4(.0000).6 0 96-956 X0.5 n(x-x G ) 4(0.5-0.6) 4(.0600).4 estimate of between samples Fratio variance estimate of within samples 0. 58 variance 05 9. 0. variance estimate df critical value8.0, page in text, α.0 therefore we reject the null hypothesis between samples.58 (k-) within samples.05 9 (N-k) one problem in using these formulas, however, is that if the means are approximated, the multiple subtractions compound rounding errors. To get around this, the formulas can rewritten as: Ntotal number of observations T is the total sum of observations T r j r k SST ( x n T ij ) k j x ij It can be shown that: Total sum of squares Between row sum of squares + within row sum of squares 6

SST SSR + SSE 900-9 0.8 0.8 0.8 X 6.64 6.64 6.64 90-9 0.8 0.6 0.8 0. X 6.64.6 6.64 06.09 96-956 0. 0. 0.4 0.5 X 06.09 06.09 08.6 0.5 Where: SST ( x N T T i ij ) SSR n N T i T sum of all observations T i is the sum of all the observations in a row Totals 40.9 45. 40.59 5.4 k Ti SSR n N T 44. 4. 5 45. [ + + i ] [ ] [ (. 4 ] 0. 45 4 4 4 i SSESST-SSR SST 564 4.. 068. SSE.68 -.45 0. MSR SSR 045. k 05. SSE MSE N k 0. 0055. MSR 05. F MSE 0055. 88. df k df N k 9 Source of variation between rows/groups /samples within rows/groups /samples Total df k- () N-k (9) N- sum of squares SSR (0.45) SSE (0.) 0.68 mean square SSR/k- (0.5) SSE/N-k (0.055) F-statistic MSR/MSE (8.8) F distribution is One tail only (no such thing as a two tailed test), it doesn t make sense to specify direction with k samples computed value of F (8.8) > critical value df, df 9, α0.05, critical value4.6 df, df 9, α0.0, critical value8.0 We can be > 99% certain the differences between the time periods are significant and that the observations in each time period are drawn from distributions with different population means

Violations Lack of independence Example is time series Outliers Outliers tend to increase the estimate of sample variance, thus decreasing the calculated F statistic for the ANOVA and lowering the chance of rejecting the null hypothesis Nonnormality Do a histogram to check If sample size is small it may be difficult to detect If the samples are not seriously imbalanced in size skewness won t have much impact Do a normal Q-Q plot or normal quantile-quantile plot, it s a plot of the ordered data values (as Y) against the associated quantiles of the normal distribution (as X) Expected Normal Value Normal Q-Q Plot of discharge cubic km 4000 000 000 000 0-000 -000-000 0 000 4000 6000 8000 Observed Value Pairwise comparisons There are procedures available for doing comparisons between the means of the classes in the anova Some of those available in SPSS Scheffe s test Bonferrani s test Tukey-Kramer best for ANOVA of unequal sample sizes Tukey test best for balanced designs 8