Topic 17 - Single Factor Analysis of Variance. Outline: One-way ANOVA; The Data / Notation; Cell means model; Factor effects model.


Topic 17 - Single Factor Analysis of Variance - Fall 2013

Outline
- One-way ANOVA
- Cell means model
- Factor effects model

One-way ANOVA
- The response variable Y is continuous.
- The explanatory variable is categorical. It is often called a factor, and its possible values are called its levels.
- The approach is a generalization of the independent two-sample t-test (i.e., it can be used when there are more than two treatments).

The Data / Notation
- Y is the response variable.
- X is the factor with r levels. These levels are often called groups or treatments.
- Let Y_ij denote the jth observation (j = 1, 2, ..., n_i) in the ith group (i = 1, 2, ..., r).

ANOVA vs Regression
- ANOVA is a special case of regression using indicator variables.
- Recall that in comparing regression lines, indicator variables were used to describe differences in intercepts.
- Consider the linear model Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i involving three groups, where X_1 is the indicator for group 1 and X_2 is the indicator for group 2:

  Group 1: Y_i = β_0 + β_1 + ε_i = µ_1 + ε_i
  Group 2: Y_i = β_0 + β_2 + ε_i = µ_2 + ε_i
  Group 3: Y_i = β_0 + ε_i = µ_3 + ε_i

- This allows each level of the factor to have a different intercept (i.e., mean). There is no linear structure imposed on these means. (A small SAS sketch of this indicator-variable view appears after the data listing below.)

Example (page 685)
- The Kenton Food Company wants to test four different package designs for a new breakfast cereal.
- Twenty similar stores were selected to be part of the experiment.
- Package designs were randomly and equally assigned to stores; a fire hit one store, so it was dropped.
- Y is the number of cases sold.
- X is the package design with r = 4 levels: i = 1, 2, 3, 4 and j = 1, 2, ..., n_i, where n_i = 5, 5, 4, 5, respectively. We write n when n_i is constant.

SAS Commands

data a1;
  infile 'u:\.www\datasets525\ch16ta01.txt';
  input cases design store;
proc print;

symbol1 v=circle i=none;
proc gplot data=a1;
  plot cases*design;

proc means data=a1;
  var cases;
  by design;
  output out=a2 mean=avcases;
proc print data=a2;

symbol1 v=circle i=join;
proc gplot data=a2;
  plot avcases*design;
run;

The Data

Obs  cases  design  store
  1    11      1      1
  2    17      1      2
  3    16      1      3
  4    14      1      4
  5    15      1      5
  6    12      2      1
  7    10      2      2
  8    15      2      3
  9    19      2      4
 10    11      2      5
 11    23      3      1
 12    20      3      2
 13    18      3      3
 14    17      3      4
 15    27      4      1
 16    33      4      2
 17    22      4      3
 18    26      4      4
 19    28      4      5
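To make the indicator-variable view concrete, here is a minimal sketch (an addition to the notes, not part of the original code) that fits the same model with PROC REG on the a1 data. The data set name a1reg and the dummy-variable names d1-d3 are illustrative, with design 4 serving as the reference level; the overall model F test from this regression matches the ANOVA F test for design.

data a1reg;
  set a1;
  d1 = (design = 1);   * indicator for package design 1;
  d2 = (design = 2);   * indicator for package design 2;
  d3 = (design = 3);   * indicator for package design 3 (design 4 is the reference);
run;

proc reg data=a1reg;
  model cases = d1 d2 d3;   * intercept estimates mu_4 and each d coefficient estimates mu_i - mu_4;
run;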

[Figures: Scatterplot of cases vs. design; Profile Plot of the design means (avcases vs. design)]

The Model
- Since this is a special case of linear regression, the same assumptions on the errors hold. This implies:
  - All observations are assumed independent.
  - All observations are Normally distributed, with a mean that may depend on the level of the factor, and with constant variance.
- The model is often presented in terms of the cell means or the factor effects.

The Cell Means Model
- Expressed numerically: Y_ij = µ_i + ε_ij, where µ_i is the theoretical mean of all observations at level i (or in cell i).
- The ε_ij are iid N(0, σ²), which implies the Y_ij are independent N(µ_i, σ²).
- Parameters: µ_1, µ_2, ..., µ_r and σ².
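As an illustration only (not from the notes), here is a small SAS sketch that simulates data from the cell means model; the cell means in mu and the value of sigma are made-up numbers chosen to resemble the Kenton example.

data sim;
  call streaminit(20130917);                 * arbitrary seed;
  array mu{4} _temporary_ (15 13 20 27);     * hypothetical cell means mu_i;
  sigma = 3;                                 * hypothetical common standard deviation;
  do design = 1 to 4;
    do j = 1 to 5;
      cases = mu{design} + sigma*rand('normal');   * Y_ij = mu_i + eps_ij;
      output;
    end;
  end;
run;

proc glm data=sim;
  class design;
  model cases = design;   * one-way ANOVA on the simulated data;
run;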

Primary Question
- In simple linear regression we ask: does the explanatory variable X help explain Y?
- Since the factor levels only affect the cell means, we can similarly ask: does µ_i depend on i?

  H_0: µ_1 = µ_2 = ... = µ_r = µ
  H_a: at least one µ_i is different

Estimates / Inference
- Estimate µ_i by the sample mean of the observations at level i:  $\hat{\mu}_i = \bar{Y}_{i.}$
- For each level i, there is also an estimate of the variance:  $s_i^2 = \sum_j (Y_{ij} - \bar{Y}_{i.})^2 / (n_i - 1)$
- These s_i² are combined to estimate σ².
- NOTE: These are the same summaries computed for the independent two-sample t-test.

Estimates / Inference (continued)
- If the n_i were constant, s² could be computed by averaging the s_i².
- The more general formula pools the s_i² using weights proportional to sample size (i.e., degrees of freedom):

  $s^2 = \frac{\sum_i (n_i - 1) s_i^2}{\sum_i (n_i - 1)} = \frac{\sum_i (n_i - 1) s_i^2}{n_T - r}$

- NOTE: Do not pool or average the s_i's (the standard deviations) themselves; the pooling is done on the variances s_i². (A small PROC MEANS sketch of this computation follows below.)

ANOVA Table
- The ANOVA table is constructed as in regression, plugging in Ȳ_i. as the fitted value.

  Source of Variation    df         SS
  Model                  r - 1      $\sum_i n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$
  Error                  n_T - r    $\sum_{i,j} (Y_{ij} - \bar{Y}_{i.})^2$
  Total                  n_T - 1    $\sum_{i,j} (Y_{ij} - \bar{Y}_{..})^2$

  where n_T is the total number of observations, $\bar{Y}_{..} = \sum_{i,j} Y_{ij} / n_T$, and $\bar{Y}_{i.} = \sum_j Y_{ij} / n_i$.
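As an aside (not in the original notes), here is a minimal SAS sketch of the pooling computation using the a1 data set read earlier; the data set names gstats and pooled and the variable names ni, ybar, and s2 are illustrative.

proc means data=a1 noprint nway;
  class design;
  var cases;
  output out=gstats n=ni mean=ybar var=s2;   * n_i, group means, and group variances s_i^2;
run;

data pooled;
  set gstats end=last;
  num + (ni - 1)*s2;         * accumulate (n_i - 1) s_i^2;
  den + (ni - 1);            * accumulate (n_i - 1), totalling n_T - r;
  if last then do;
    s2_pooled = num / den;   * pooled variance s^2 = MSE (about 10.55 for these data);
    put s2_pooled=;          * write the result to the log;
  end;
run;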

Expected Mean Squares
- All mean squares are random variables.
- It can be shown that E(MSE) = σ² (page 696).
- It can also be shown (page 697) that

  $E(MSR) = \sigma^2 + \frac{\sum_i n_i (\mu_i - \mu_.)^2}{r - 1}$, where $\mu_. = \frac{\sum_i n_i \mu_i}{n_T}$

- If H_0 is true, MSR is an unbiased estimate of σ².
- In more complicated ANOVA models, the EMS tell us how to construct F tests.

SAS Commands

proc glm data=a1;
  class design;
  model cases=design;
  means design;
  lsmeans design / stderr;
run;

proc mixed data=a1;
  class design;
  model cases=design;
  lsmeans design;
run;

***Don't use***

Output - GLM

Class Level Information
Class     Levels   Values
design         4   1 2 3 4

Number of observations  19

                                Sum of
Source            DF           Squares    Mean Square   F Value   Pr > F
Model              3       588.2210526    196.0736842     18.59   <.0001
Error             15       158.2000000     10.5466667
Corrected Total   18       746.4210526

R-Square   Coeff Var   Root MSE   cases Mean
0.788055   17.43042    3.247563   18.63158

Source    DF   Type I SS     Mean Square   F Value   Pr > F
design     3   588.2210526   196.0736842     18.59   <.0001

Source    DF   Type III SS   Mean Square   F Value   Pr > F
design     3   588.2210526   196.0736842     18.59   <.0001
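As a quick arithmetic check on how the table entries fit together (this worked step is an addition, not part of the original slides): each mean square is its sum of squares divided by its degrees of freedom, and the F statistic is their ratio, referred to an F(r - 1, n_T - r) = F(3, 15) distribution.

  $MSR = \frac{588.221}{3} = 196.074, \qquad MSE = \frac{158.200}{15} = 10.547, \qquad F = \frac{MSR}{MSE} \approx 18.59$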

Output - GLM (continued)

Level of             ------------cases------------
design        N      Mean           Std Dev
1             5      14.6000000     2.30217289
2             5      13.4000000     3.64691651
3             4      19.5000000     2.64575131
4             5      27.2000000     3.96232255

Least Squares Means
                              Standard
design    cases LSMEAN        Error        Pr > |t|
1         14.6000000          1.4523544    <.0001
2         13.4000000          1.4523544    <.0001
3         19.5000000          1.6237816    <.0001
4         27.2000000          1.4523544    <.0001

Note: 4(2.30)² + 4(3.65)² + 3(2.65)² + 4(3.96)² = 158.24. Except for rounding, this is equal to the SSE. Also, 19 - 4 = 15, which is the error df in the ANOVA table.

Output - MIXED

Model Information
Data Set                      WORK.A1
Dependent Variable            cases
Covariance Structure          Diagonal
Estimation Method             REML          *****
Residual Variance Method      Profile
Fixed Effects SE Method       Model-Based
Degrees of Freedom Method     Residual

Class Level Information
Class     Levels   Values
design         4   1 2 3 4

Dimensions
Covariance Parameters       1   *****
Columns in X                5
Columns in Z                0
Subjects                    1   *****
Max Obs Per Subject        19   *****

Covariance Parameter Estimates
Cov Parm     Estimate
Residual      10.5467

Fit Statistics
-2 Res Log Likelihood       84.1
AIC (smaller is better)     86.1
BIC (smaller is better)     86.8

Type 3 Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
design         3       15     18.59   <.0001

Least Squares Means
design   Estimate   Std. Error   DF   t Value   Pr > |t|
1         14.6000       1.4524   15     10.05   <.0001
2         13.4000       1.4524   15      9.23   <.0001
3         19.5000       1.6238   15     12.01   <.0001
4         27.2000       1.4524   15     18.73   <.0001

The Factor Effects Model
- A reparameterization of the cell means model.
- A very useful way of looking at more complicated ANOVA models (i.e., models with more than one factor).
- Null hypotheses are easier to state.
- Expressed numerically: Y_ij = µ + τ_i + ε_ij.

The Factor Effects Model (continued)
- Parameters: µ, τ_1, τ_2, ..., τ_r and σ².
- The factor effects model has r + 2 parameters, while the cell means model has r + 1 parameters.
- It is overparameterized, so there is not a unique solution. Consider r = 3 with µ_1 = 10, µ_2 = 0, and µ_3 = 20. Each of the following choices reproduces these cell means:

  µ = 0:    τ_1 = 10,   τ_2 = 0,     τ_3 = 20
  µ = 10:   τ_1 = 0,    τ_2 = -10,   τ_3 = 10
  µ = 100:  τ_1 = -90,  τ_2 = -100,  τ_3 = -80

The Factor Effects Model (constraints)
- Because the factor effects model has no unique solution, we place a constraint on the τ_i's. Examples of constraints:
  - τ_r = 0 (the SAS approach)
  - Σ_i τ_i = 0 (the conceptual approach)
- A constraint reduces the number of free parameters by 1, so we then have a unique solution.

Consequences of Constraint Choice
- Consider r = 3 with n_i = n.
- Factor effects model with the constraint Σ_i τ_i = 0:

  $E(\bar{Y}_{..}) = \frac{3\mu + \sum_i \tau_i}{3} = \mu, \qquad E(\bar{Y}_{i.}) = \mu + \tau_i$

  In this case µ is the grand mean and τ_i is the effect of the ith factor level.
- Factor effects model with τ_r = 0 (here τ_3 = 0):

  $E(\bar{Y}_{3.}) = \mu, \qquad E(\bar{Y}_{1.} - \bar{Y}_{3.}) = \mu + \tau_1 - \mu = \tau_1$

  In this case µ is the mean of the rth group and τ_i is the difference between the means of group i and group r.

Consequences of Constraints
- Different constraints result in different parameters / parameter estimates.
- Many estimates, however, are the same regardless of the constraint. Recall our example:

  $\hat\mu + \hat\tau_1$ = treatment 1 mean
  $\hat\mu + \hat\tau_3$ = treatment 3 mean
  $\hat\tau_1 - \hat\tau_3$ = difference between the treatment 1 and treatment 3 means
  $\hat\tau_1 - \hat\tau_2$ = difference between the treatment 1 and treatment 2 means

  These are primarily the quantities of interest.
- The details are a bit more complicated when n_i is not constant (pages 709-710), but the same concept applies. (A small PROC GLM sketch of the SAS constraint follows below.)
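To see the SAS constraint (τ_r = 0) directly, a minimal sketch (an addition, not part of the original code) adds the SOLUTION option to the MODEL statement in PROC GLM, applied to the Kenton data a1 with design 4 as the reference level; under this parameterization the intercept estimates the design 4 mean and each design coefficient estimates the difference from design 4.

proc glm data=a1;
  class design;
  model cases = design / solution;   * print parameter estimates under the tau_r = 0 constraint;
run;
* Expected: intercept = 27.2 (design 4 mean);
* design 1 estimate = 14.6 - 27.2 = -12.6, and similarly for designs 2 and 3;

PROC GLM flags these estimates with a 'B' and a note that they are not unique, which is exactly the overparameterization point made above: the estimates depend on the constraint chosen.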

Hypotheses
- The cell means hypotheses

  H_0: µ_1 = µ_2 = ... = µ_r = µ
  H_a: at least one µ_i is different

  translate into the factor effects hypotheses

  H_0: τ_1 = τ_2 = ... = τ_r = 0
  H_a: at least one τ_i ≠ 0

Background Reading
- KNNL Sections 16.1-16.7
- knnl686.sas and knnl717.sas
- KNNL Section 16.8