Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction,

Similar documents
Two-Way Analysis of Variance - no interaction

Contents. TAMS38 - Lecture 8 2 k p fractional factorial design. Lecturer: Zhenxia Liu. Example 0 - continued 4. Example 0 - Glazing ceramic 3

Contents. TAMS38 - Lecture 6 Factorial design, Latin Square Design. Lecturer: Zhenxia Liu. Factorial design 3. Complete three factor design 4

Multiple Regression Examples

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Contents. 2 2 factorial design 4

One-Way Analysis of Variance (ANOVA)

Confidence Interval for the mean response

1 Use of indicator random variables. (Chapter 8)

22s:152 Applied Linear Regression. Take random samples from each of m populations.

Unit 8: A Mixed Two-Factor Design

Model Building Chap 5 p251

DESAIN EKSPERIMEN BLOCKING FACTORS. Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

TAMS38 Experimental Design and Biostatistics, 4 p / 6 hp Examination on 19 April 2017, 8 12

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 )

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

STAT 705 Chapter 19: Two-way ANOVA

STAT 705 Chapter 19: Two-way ANOVA

This document contains 3 sets of practice problems.

STAT 525 Fall Final exam. Tuesday December 14, 2010

Formal Statement of Simple Linear Regression Model

Unit 8: A Mixed Two-Factor Design

The Random Effects Model Introduction

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Lecture 18: Simple Linear Regression

Basic Business Statistics 6 th Edition

Inference for the Regression Coefficient

Contents. TAMS38 - Lecture 10 Response surface. Lecturer: Jolanta Pielaszkiewicz. Response surface 3. Response surface, cont. 4

SMAM 319 Exam 1 Name. 1.Pick the best choice for the multiple choice questions below (10 points 2 each)

Stat 6640 Solution to Midterm #2

2-way analysis of variance

Chapter 12. Analysis of variance

STAT 506: Randomized complete block designs

Correlation & Simple Regression

Ch 2: Simple Linear Regression

ACOVA and Interactions

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means

Orthogonal contrasts for a 2x2 factorial design Example p130

More about Single Factor Experiments

STAT 135 Lab 10 Two-Way ANOVA, Randomized Block Design and Friedman s Test

Histogram of Residuals. Residual Normal Probability Plot. Reg. Analysis Check Model Utility. (con t) Check Model Utility. Inference.

Simple Linear Regression. Steps for Regression. Example. Make a Scatter plot. Check Residual Plot (Residuals vs. X)

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Multiple comparisons - subsequent inferences for two-way ANOVA

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Steps for Regression. Simple Linear Regression. Data. Example. Residuals vs. X. Scatterplot. Make a Scatter plot Does it make sense to plot a line?

23. Inference for regression

School of Mathematical Sciences. Question 1

3. Design Experiments and Variance Analysis

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

Examination paper for TMA4255 Applied statistics

Lecture 10. Factorial experiments (2-way ANOVA etc)

Factorial designs. Experiments

INTRODUCTION TO ANALYSIS OF VARIANCE

ANOVA (Analysis of Variance) output RLS 11/20/2016

Inference for Regression

Joint Probability Distributions

Statistics for Managers using Microsoft Excel 6 th Edition

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design

Simple Linear Regression: A Model for the Mean. Chap 7

Data files & analysis PrsnLee.out Ch9.xls

Concordia University (5+5)Q 1.

Sleep data, two drugs Ch13.xls

Model Checking. Chapter 7

Unit 7: Random Effects, Subsampling, Nested and Crossed Factor Designs

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Inference for Regression Inference about the Regression Model and Using the Regression Line

W&M CSCI 688: Design of Experiments Homework 2. Megan Rose Bryant

Stat 705: Completely randomized and complete block designs

The simple linear regression model discussed in Chapter 13 was written as

Regression. Marc H. Mehlman University of New Haven

Design of Experiments. Factorial experiments require a lot of resources

MODULE 4 SIMPLE LINEAR REGRESSION

INFERENCE FOR REGRESSION

VARIANCE COMPONENT ANALYSIS

28. SIMPLE LINEAR REGRESSION III

BIOS 2083 Linear Models c Abdus S. Wahed

3. Factorial Experiments (Ch.5. Factorial Experiments)

iron retention (log) high Fe2+ medium Fe2+ high Fe3+ medium Fe3+ low Fe2+ low Fe3+ 2 Two-way ANOVA

STAT 360-Linear Models

Conditions for Regression Inference:

Multiple Regression an Introduction. Stat 511 Chap 9

(1) The explanatory or predictor variables may be qualitative. (We ll focus on examples where this is the case.)

Confidence Intervals, Testing and ANOVA Summary

Chapter 12 - Lecture 2 Inferences about regression coefficient

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.

20.1. Balanced One-Way Classification Cell means parametrization: ε 1. ε I. + ˆɛ 2 ij =

Statistics 191 Introduction to Regression Analysis and Applied Statistics Practice Exam

Scatter plot of data from the study. Linear Regression

Six Sigma Black Belt Study Guides

SMAM 314 Practice Final Examination Winter 2003

Design & Analysis of Experiments 7E 2009 Montgomery

Stat 217 Final Exam. Name: May 1, 2002

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

STATISTICS 368 AN EXPERIMENT IN AIRCRAFT PRODUCTION Christopher Wiens & Douglas Wiens

Multiple Predictor Variables: ANOVA

Transcription:

Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction, is Y ijk = µ ij + ɛ ijk = µ + α i + β j + γ ij + ɛ ijk with i = 1,..., I, j = 1,..., J, k = 1,..., K. In carrying out the F tests for interaction, and for the main effects of factors A and B, we have assumed that ɛ ijk are as sample from N(0, σ 2 ). Among other things, this means that: the distribution of the errors (and in particular, the variance σ 2 ) does not differ depending on the level of factor A, the level of factor B, or the mean of the response (µ ij = µ + α i + β j + γ ij ) the errors are a sample from a normal distribution If these assumptions hold, then the p-values for the tests of interaction and main effects are valid. If the assumptions do not hold, then the p-values may substantially over- or under-estimate the evidence against the null hypotheses. Residuals are usually defined as the difference dataprediction. In the twoway anova model with interaction, the predicted value of Y ijk is ˆµ ij, and so the residuals are r ijk = Y ijk ˆµ ij = Y ijk Ȳij.

(Another way of writing the residual for the twoway model with interaction is r ijk = Y ijk ˆµ ˆα i ˆβ j ˆγ ij.) If the sample size is moderately large, the residuals should be approximately equal to the errors ɛ ijk, and so we use the residuals (which are known to us) in place of the errors ɛ ijk (which are unknown) to assess the plausibility of the model assumptions. The following plots are often useful in this regard: 1. A QQ plot of the residuals is used to assess the assumption of normality of errors 2. To assess the assumption that the distribution of the errors (in particular the variance of the distribution) does not depend on the levels of either factor A or factor B, the residuals should be plotted against: (a) the levels of factor A (b) the levels of factor B (c) the fitted values Ȳij. Example: A two factor experiment was carried out in which the survival times (in units of 10 hours) were measured for groups of four animals (replicates) randomly allocated to three poisons and four treatments. The data were as follows:

Poison Treatment Data I A 0.31 0.45 0.46 0.43 II A 0.36 0.29 0.40 0.23 III A 0.22 0.21 0.18 0.23 I B 0.82 1.10 0.88 0.72 II B 0.92 0.61 0.49 1.24 III B 0.30 0.37 0.38 0.29 I C 0.43 0.45 0.63 0.76 II C 0.44 0.35 0.31 0.40 III C 0.23 0.25 0.24 0.22 I D 0.45 0.71 0.66 0.62 II D 0.56 1.02 0.71 0.38 III D 0.30 0.36 0.31 0.33 The data were entered into minitab, and a twoway anova was carried out, as follows: MTB > print c1 0.31 0.45 0.46 0.43 0.36 0.29 0.40 0.23 0.22 0.21 0.18 0.23 0.82 1.10 0.88 0.72 0.92 0.61 0.49 1.24 0.30 0.37 0.38 0.29 0.43 0.45 0.63 0.76 0.44 0.35 0.31 0.40 0.23 0.25 0.24 0.22 0.45 0.71 0.66 0.62 0.56 1.02 0.71 0.38 0.30 0.36 0.31 0.33

MTB > set c2 DATA> 4(1 1 1 1 2 2 2 2 3 3 3 3) DATA> set c3 DATA> 12(1) 12(2) 12(3) 12(4) MTB > twoway c1 c2 c3; SUBC> residuals c4; SUBC> fits c5. Two-way ANOVA: C1 versus C2, C3 Source DF SS MS F P C2 2 1.03301 0.516506 23.22 0.000 C3 3 0.92121 0.307069 13.81 0.000 Interaction 6 0.25014 0.041690 1.87 0.112 Error 36 0.80073 0.022242 Total 47 3.00508 S = 0.1491 R-Sq = 73.35% R-Sq(adj) = 65.21% MTB > nscores c4 c6 The normal scores and residual plots are as follows:

Figure 1: Normal scores plot of residuals

Figure 2: Plot of residuals vs type of poison

Figure 3: Plot of residuals vs treatment

Figure 4: Plot of residuals vs fitted values If the QQ plot shows evidence of non-normality, or if the disribution of the residuals appears to depend on the levels of one or both factors, then the inferences (eg p- values) concerning the model parameters may be invalid. In this case, the QQ plot provides some suggestion of non-normality. The plots of residual vs factor level suggest that the variance of the residuals is not constant across

levels of either factor. A definite pattern can be seen in the plot of residuals vs predicted values, in which variance of the residual is increasing as the fitted value increases. This suggests that the variance of Y is increasing with the mean of Y. Consequently, our conclusions regarding the significance of effects and interactions may be in error due to incorrect assumptions. In such cases, one approach which is often taken is to try to find a transformation of the dependent variable to a form for which the model assumptions are better satisfied. Transformations which are sometimes tried are to replace Y by Y, log(y ), or 1/Y. There are some results from probability and statistical theory which provide techniques to search for so-called variance stabilizing transformations. These ideas are studied in some higher level statistics courses. After careful examination of the pattern of residuals, we are led to consider the reciprocal transformation, Z ijk = 1/Y ijk. (In this case, where Y are measurements of time, then Z = 1/Y are described as rates, and have units of 1/time.)

A twoway model was fit for Z ijk, leading to the following output: MTB > let c7=1/c1 MTB > twoway c7 c2 c3; SUBC> residuals c8; SUBC> fits c9. Two-way ANOVA: C7 versus C2, C3 Source DF SS MS F P C2 2 34.8771 17.4386 72.63 0.000 C3 3 20.4143 6.8048 28.34 0.000 Interaction 6 1.5708 0.2618 1.09 0.387 Error 36 8.6431 0.2401 Total 47 65.5053 S = 0.4900 R-Sq = 86.81% R-Sq(adj) = 82.77% MTB > nscores c8 c10 The normal scores and residual plots for the transformed data are as follows:

Figure 5: Plot of residuals vs poison - transformed data

Figure 6: Plot of residuals vs treatment - transformed data

Figure 7: Plot of residuals vs fitted values - transformed data

Figure 8: Normal scores plot of residuals - transformed data These residual plots suggest few departures from the model assumptions, and so we can be confident about the validity of our conclusions for the transformed data.

Reference: Box and Cox (1964) An analysis of transformation. J.Roy.Stat.Soc.B, 26, 211. The data were part of a larger investigation to combat the effects of toxic agents.