STAT 705 Chapter 16: One-way ANOVA


Timothy Hanson
Department of Statistics, University of South Carolina
Stat 705: Data Analysis II

What is ANOVA?

Analysis of variance (ANOVA) models are regression models with qualitative predictors, called factors or treatments. Factors have different levels. For example, the factor education may have the levels high school, undergraduate, and graduate; the factor gender has two levels, female and male. We may have several factors as predictors, e.g. race and gender may be used to predict annual salary in dollars.

There are two types of factors:
Classification (investigator cannot control)
Experimental (investigator can control)

ANOVA

A control treatment (or control factor level) is sometimes used to measure effects of (new or experimental) treatments under investigation, relative to the status quo. E.g. ibuprofen, aspirin, and placebo: we have 3 factor levels. Without the placebo, we do not know how ibuprofen or aspirin does relative to no pain killer, only relative to each other.

Uses of ANOVA models: find the best/worst treatment, measure the effectiveness of a new treatment, compare treatments. Often we are interested in determining whether there is a difference in treatments.

Read Sections 16.1-16.8 in the text.

16.3 Cell means model

Have r different treatments or factor levels. At each level i, have $n_i$ observations from group i. The total number of observations is $n_T = n_1 + n_2 + \cdots + n_r$.

The response is $Y_{ij}$, where $i = 1, \ldots, r$ indexes the factor level and $j = 1, \ldots, n_i$ indexes observations within that factor level.

Example: one factor (degree program) with two levels, MS and PhD; $Y_{ij}$ is age in years. In Spring of 2014 we observe
$$Y_{11} = 28,\; Y_{12} = 24,\; Y_{13} = 24,\; Y_{14} = 22,\; Y_{15} = 26,\; Y_{16} = 23,$$
$$Y_{21} = 29,\; Y_{22} = 23,\; Y_{23} = 26,\; Y_{24} = 25,\; Y_{25} = 22,\; Y_{26} = 23,\; Y_{27} = 38,\; Y_{28} = 33,\; Y_{29} = 30,\; Y_{2,10} = 27.$$
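As a quick worked check on the notation (added here; the numbers come from the example above), the group sample sizes and means are $n_1 = 6$, $n_2 = 10$, $n_T = 16$, and
$$\bar{Y}_{1\cdot} = \frac{28+24+24+22+26+23}{6} = 24.5, \qquad \bar{Y}_{2\cdot} = \frac{29+23+26+25+22+23+38+33+30+27}{10} = 27.6.$$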

One-way ANOVA model

$$Y_{ij} = \mu_i + \epsilon_{ij}, \qquad \epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma^2).$$

Can rewrite as $Y_{ij} \stackrel{ind}{\sim} N(\mu_i, \sigma^2)$: the data are normal, the data are independent, and the variance is constant across groups. $\mu_i$ is allowed to be different for each group; $\mu_1, \ldots, \mu_r$ are the r population means of the response. A picture helps.

Questions: what is $E\{Y_{ij}\}$? What is $\sigma^2\{Y_{ij}\}$?
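For the record (answering the questions just posed, which follow directly from the model statement): $E\{Y_{ij}\} = \mu_i$ and $\sigma^2\{Y_{ij}\} = \sigma^2$, i.e. the mean depends on the group but the variance does not.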

Matrix formulation (pp. 683-684, 710-712)

For r = 3 we have
$$\begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n_1} \\ Y_{21} \\ Y_{22} \\ \vdots \\ Y_{2n_2} \\ Y_{31} \\ Y_{32} \\ \vdots \\ Y_{3n_3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n_1} \\ \epsilon_{21} \\ \epsilon_{22} \\ \vdots \\ \epsilon_{2n_2} \\ \epsilon_{31} \\ \epsilon_{32} \\ \vdots \\ \epsilon_{3n_3} \end{bmatrix},$$
or $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$.

16.4 Fitting the model

For r = 3, let
$$Q(\mu_1, \mu_2, \mu_3) = \sum_{i=1}^{3}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2.$$
We need to minimize this over all possible $(\mu_1, \mu_2, \mu_3)$ to find the least-squares (LS) solution. One can easily show that $Q(\mu_1, \mu_2, \mu_3)$ is minimized at
$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{bmatrix} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix},$$
where $\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$ is the sample mean from the ith group (pp. 687-688). These $\hat{\boldsymbol{\beta}}$ are also the maximum likelihood estimates.
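A one-line justification of the minimizer (added here; the text gives the full argument on pp. 687-688): setting each partial derivative of Q to zero gives
$$\frac{\partial Q}{\partial \mu_i} = -2\sum_{j=1}^{n_i}(Y_{ij} - \mu_i) = 0 \;\;\Longrightarrow\;\; \hat{\mu}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij} = \bar{Y}_{i\cdot}.$$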

Matrix formula of least-squares estimators (r = 3)

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} n_1 & 0 & 0 \\ 0 & n_2 & 0 \\ 0 & 0 & n_3 \end{bmatrix}, \qquad (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} n_1^{-1} & 0 & 0 \\ 0 & n_2^{-1} & 0 \\ 0 & 0 & n_3^{-1} \end{bmatrix}, \qquad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} Y_{1\cdot} \\ Y_{2\cdot} \\ Y_{3\cdot} \end{bmatrix},$$
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix}.$$

Residuals

As in regression (STAT 704),
$$e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \hat{\mu}_i = Y_{ij} - \bar{Y}_{i\cdot}.$$
As usual, $\hat{Y}_{ij}$ is the estimated mean response under the model. Note that $\sum_{j=1}^{n_i} e_{ij} = 0$ [check this!]. In matrix terms,
$$\mathbf{e} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{Y} - \hat{\mathbf{Y}}.$$
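The 'check this!' is quick (worked out here):
$$\sum_{j=1}^{n_i} e_{ij} = \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = n_i \bar{Y}_{i\cdot} - n_i \bar{Y}_{i\cdot} = 0.$$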

Kenton Food Company example

There are r = 4 box designs for a new breakfast cereal. Twenty stores with roughly equal sales volumes were picked to participate; $n_i = 5$ was planned for each design. A fire occurred at one store that had design 3, so we ended up with $n_T = 19$ instead of 20, with $n_1 = n_2 = n_4 = 5$ and $n_3 = 4$.

Kenton Foods example

data kenton;
  input sales design @@;
  datalines;
11 1 17 1 16 1 14 1 15 1
12 2 10 2 15 2 19 2 11 2
23 3 20 3 18 3 17 3
27 4 33 4 22 4 26 4 28 4
;
proc sgscatter;
  plot sales*design;
run;
proc glm plots=all;  * zero/one dummy variables, but recover cell means via lsmeans;
  class design;
  model sales=design;
  lsmeans design;
run;

16.5 ANOVA table (pp. 690-698)

Define the following:
$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij} = i\text{th group sum}, \qquad \bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij} = i\text{th group mean},$$
$$Y_{\cdot\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij} = \sum_{i=1}^{r} Y_{i\cdot} = \text{sum of all observations}, \qquad \bar{Y}_{\cdot\cdot} = \frac{1}{n_T}\sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij} = \frac{1}{n_T}\sum_{i=1}^{r} Y_{i\cdot} = \text{mean of all observations}.$$
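As a worked illustration (computed here from the Kenton data listed above): $Y_{1\cdot} = 73$, $Y_{2\cdot} = 67$, $Y_{3\cdot} = 78$, $Y_{4\cdot} = 136$, so $\bar{Y}_{1\cdot} = 14.6$, $\bar{Y}_{2\cdot} = 13.4$, $\bar{Y}_{3\cdot} = 19.5$, $\bar{Y}_{4\cdot} = 27.2$, and $Y_{\cdot\cdot} = 354$ with $\bar{Y}_{\cdot\cdot} = 354/19 \approx 18.63$.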

Sums of squares for treatments, error, and total

$$SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \text{variability in the } Y_{ij}\text{'s},$$
$$SSTR = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (\hat{Y}_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (\hat{\mu}_i - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \text{variability explained by the ANOVA model},$$
$$SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} e_{ij}^2 = \text{variability NOT explained by the ANOVA model}.$$

Comments

As before in regression,
$$\underbrace{SSTO}_{\text{total}} = \underbrace{SSTR}_{\text{treatment effects}} + \underbrace{SSE}_{\text{leftover randomness}}.$$
SSE = 0 $\Leftrightarrow$ $Y_{ij} = Y_{ik}$ for all $j \neq k$ within each group.
SSTR = 0 $\Leftrightarrow$ $\bar{Y}_{i\cdot} = \bar{Y}_{\cdot\cdot}$ for $i = 1, \ldots, r$.

ANOVA table (p. 694)

Source   SS                                                                        df          MS
SSTR     $\sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$   $r - 1$     $SSTR/(r - 1)$
SSE      $\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$                 $n_T - r$   $SSE/(n_T - r)$
SSTO     $\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$             $n_T - 1$
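For the Kenton data, plugging in gives (computed here and rounded; the output from proc glm should agree up to rounding):

Source   SS        df    MS
SSTR     588.22    3     196.07
SSE      158.20    15    10.55
SSTO     746.42    18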

Degrees of freedom

SSTO has $n_T - 1$ df because there are $n_T$ terms $Y_{ij} - \bar{Y}_{\cdot\cdot}$ in the sum, but they add up to zero (1 constraint).
SSE has $n_T - r$ df because there are $n_T$ terms $Y_{ij} - \bar{Y}_{i\cdot}$ in the sum, but there are r constraints of the form $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$.
SSTR has $r - 1$ df because there are r terms $n_i(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})$ in the sum, but they sum to zero (1 constraint).

Expected mean squares

$$E\{MSE\} = \sigma^2, \quad \text{so MSE is an unbiased estimate of } \sigma^2.$$
$$E\{MSTR\} = \sigma^2 + \frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2}{r - 1}, \qquad \text{where } \mu_\cdot = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$
is a weighted average of $\mu_1, \ldots, \mu_r$ (pp. 696-698).

If $\mu_i = \mu_j$ for all $i, j \in \{1, \ldots, r\}$ then $E\{MSTR\} = \sigma^2$; otherwise $E\{MSTR\} > \sigma^2$. Hence, if any group means are different then
$$\frac{E\{MSTR\}}{E\{MSE\}} > 1.$$

16.6 F test of $H_0: \mu_1 = \cdots = \mu_r$

Fact: if $\mu_1 = \cdots = \mu_r$, then
$$F^* = \frac{MSTR}{MSE} \sim F(r - 1, n_T - r).$$
To perform an $\alpha$-level test of $H_0: \mu_1 = \cdots = \mu_r$ versus $H_a: \mu_i \neq \mu_j$ for some $i \neq j$:
Accept $H_0$ if $F^* \leq F(1 - \alpha; r - 1, n_T - r)$, equivalently if the p-value $\geq \alpha$.
Reject $H_0$ if $F^* > F(1 - \alpha; r - 1, n_T - r)$, equivalently if the p-value $< \alpha$.
Here the p-value is $P\{F(r - 1, n_T - r) \geq F^*\}$.

Example: Kenton Foods (see below).
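Finishing the Kenton example with the numbers computed earlier (worked here; the critical value is approximate): $F^* = MSTR/MSE \approx 196.07/10.55 \approx 18.6$, while $F(0.95; 3, 15) \approx 3.29$. Since $18.6 > 3.29$ (equivalently, the p-value is far below 0.05), we reject $H_0$ at the $\alpha = 0.05$ level and conclude that mean sales differ across the four box designs.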

Comments

If r = 2 then $F^* = (t^*)^2$, where $t^*$ is the t-statistic from the two-sample pooled-variance t-test.

The F-test may be obtained from the general nested linear hypotheses approach (big model / little model). Here the full model is $Y_{ij} = \mu_i + \epsilon_{ij}$ and the reduced model is $Y_{ij} = \mu + \epsilon_{ij}$:
$$F^* = \frac{\left[ SSE(R) - SSE(F) \right] / (df_{E_R} - df_{E_F})}{SSE(F)/df_{E_F}} = \frac{MSTR}{MSE}.$$
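A short verification of the last equality (added here): under the reduced model every fitted value is $\bar{Y}_{\cdot\cdot}$, so $SSE(R) = SSTO$, $df_{E_R} = n_T - 1$, and $df_{E_F} = n_T - r$. The numerator is then $(SSTO - SSE)/(r - 1) = SSTR/(r - 1) = MSTR$ and the denominator is $SSE/(n_T - r) = MSE$.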

16.7 Alternative formulations

SAS will fit the cell means model (discussed so far) with the noint option in the model statement; however, the F-test will not be correct. Your textbook discusses an alternative parameterization that is not easy to get out of the SAS procedures we will use.

By default, SAS fits the model
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad \text{with } \alpha_r = 0.$$
Then $E\{Y_{rj}\} = \mu$: $\mu$ is the cell mean for the rth level. For $i < r$, $E\{Y_{ij}\} = \mu + \alpha_i$: $\alpha_i$ is group i's offset to group r's mean $\mu$.

Note that SAS's default corresponds to a regression model where categorical predictors are modeled using the usual zero-one dummy variables.

In class, let's find the design matrix $\mathbf{X}$ for SAS's model for r = 3 and $n_1 = n_2 = n_3 = 2$; one possible answer is sketched below.
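A worked answer (added here; this is just the zero-one dummy coding with level 3 as the baseline, as described above), with rows ordered as $(Y_{11}, Y_{12}, Y_{21}, Y_{22}, Y_{31}, Y_{32})$:
$$\mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix}.$$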

SAS's baseline & offset model

Even though SAS parameterizes the model differently, with the rth level as baseline, the ANOVA table and F-test are the same as for the cell means model. Also, $\hat{\mu} = \bar{Y}_{r\cdot}$ and $\hat{\alpha}_i = \bar{Y}_{i\cdot} - \bar{Y}_{r\cdot}$ are the OLS and ML estimators. These are reported in SAS using, e.g., model sales=design / solution;. The cell means $\hat{\mu}_i$ are obtained in SAS by adding lsmeans to glm or glimmix.
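A minimal SAS sketch putting these two pieces together, assuming the kenton data set created earlier in these notes:

proc glm data=kenton;
  class design;                       * last level of design is the baseline;
  model sales = design / solution;    * prints the estimates of mu and each alpha_i;
  lsmeans design;                     * prints the cell means, i.e. each group sample mean;
run;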