STAT 705 Chapter 19: Two-way ANOVA

STAT 705 Chapter 19: Two-way ANOVA
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 705: Data Analysis II

Two-way ANOVA

Material covered in Sections 19.2-19.4, but a bit differently. We have two factors, A and B. Levels of A are indexed by i = 1, ..., a; levels of B are indexed by j = 1, ..., b. The number sampled when A = i and B = j is n_ij. If the data are balanced then n_ij = n for all i, j. n_T = sum_{i=1}^a sum_{j=1}^b n_ij; if balanced, n_T = nab. Y_ijk is the kth replicate at A = i and B = j. Let µ_ij = E{Y_ijk}. We have ab different means:

                     Factor B
Factor A   1       2       ...   b
1          µ_11    µ_12    ...   µ_1b
2          µ_21    µ_22    ...   µ_2b
...        ...     ...           ...
a          µ_a1    µ_a2    ...   µ_ab

Two-way ANOVA

The model is written Y_ijk = µ_ij + ε_ijk, ε_ijk iid N(0, σ²). The most general, least restrictive case is when each combination of levels (i, j) has its own distinct mean. We'll look at special cases that impose structure on {µ_ij}:

I    Y_ijk = µ + ε_ijk
II   Y_ijk = µ + α_i + ε_ijk
III  Y_ijk = µ + β_j + ε_ijk
IV   Y_ijk = µ + α_i + β_j + ε_ijk
V    Y_ijk = µ + α_i + β_j + (αβ)_ij + ε_ijk

I. µ_ij = µ, neither A nor B important

When a = b = 2, the means are

                Factor A
Factor B   1    2
1          µ    µ
2          µ    µ

I. µ_ij = µ, neither A nor B important

If this model fits, you're done! Nothing to look at. The overall µ is estimated by ˆµ = Ȳ···. Fit in SAS proc glm as model response = ;

II. µ_ij = µ + α_i, only A important

When a = b = 2, the means are

                   Factor A
Factor B   1          2
1          µ + α_1    µ + α_2
2          µ + α_1    µ + α_2

II. µ_ij = µ + α_i, only A important

If this fits, we have a one-way model in A. Interested in L = sum_{i=1}^a c_i α_i. SAS sets ˆµ = Ȳ_a· and ˆα_i = Ȳ_i· − Ȳ_a·. Fit in SAS proc glm as model response = A; Can get pairwise differences in factor A levels via, e.g., lsmeans A / pdiff adj=tukey; General contrasts are available in estimate or lsmestimate.

III. µ_ij = µ + β_j, only B important

When a = b = 2, the means are

                   Factor A
Factor B   1          2
1          µ + β_1    µ + β_1
2          µ + β_2    µ + β_2

III. µ_ij = µ + β_j, only B important

If this fits, we have a one-way model in B. Interested in L = sum_{j=1}^b c_j β_j. SAS sets ˆµ = Ȳ_·b and ˆβ_j = Ȳ_·j − Ȳ_·b. Fit in SAS proc glm as model response = B; Can get pairwise differences in factor B levels via, e.g., lsmeans B / pdiff adj=tukey; General contrasts are available in estimate or lsmestimate.

IV. µ_ij = µ + α_i + β_j, A and B additive

When a = b = 2, the means are

                         Factor A
Factor B   1                  2
1          µ + α_1 + β_1      µ + α_2 + β_1
2          µ + α_1 + β_2      µ + α_2 + β_2

IV. µ_ij = µ + α_i + β_j, both A and B important, but additive

Differences in factor A level means are the same for each level of B, and differences in factor B level means are the same for each level of A. For example, comparing mean differences for A = 1 to A = 2 we have

µ_1j − µ_2j = (µ + α_1 + β_j) − (µ + α_2 + β_j) = α_1 − α_2,

independent of j! Similarly, µ_i1 − µ_i2 = β_1 − β_2, independent of i. SAS computes the LS estimates as ˆβ = (X'X)^{-1}X'Y, which doesn't simplify much. SAS sets α_a = β_b = 0. Fit in SAS proc glm as model response = A B; Can get pairwise differences in factor A levels via, e.g., lsmeans A / pdiff adj=tukey; and in factor B levels via lsmeans B / pdiff adj=tukey; General forms L = sum_{i=1}^a sum_{j=1}^b c_ij µ_ij can be computed in estimate or lsmestimate (more later).

V. µ ij = µ + α i + β j + (αβ) ij, interaction model When a = b = 2, means are Factor A Factor B 1 2 1 µ + α 1 + β 1 + (αβ) 11 µ + α 2 + β 1 + (αβ) 12 2 µ + α 1 + β 2 + (αβ) 21 µ + α 2 + β 2 + (αβ) 22 12 / 38

V. µ ij = µ + α i + β j + (αβ) ij, interaction model Now we have Also µ 1j µ 2j = α 1 α 2 + (αβ) 1j (αβ) 2j. }{{} depends on B = j µ i1 µ i2 = β 1 β 2 + (αβ) i1 (αβ) }{{ i2. } depends on A = i No longer have parallel curves; mean differences in A change with levels of B and vice-versa. SAS sets α a = β b = 0, (αβ) aj = 0 for j < b, and (αβ) ib = 0 for i < a. Estimates can be obtained from solving Ȳ ij = ˆµ + ˆα i + ˆβ j + (αβ) ij. 13 / 38

Comments on model V

The interaction model gives each pairing (i, j) its own distinct mean, with no structure on {µ_ij}. Same as a one-way model on r = ab groups. Your book focuses on the model where sum_{i=1}^a α_i = 0, sum_{j=1}^b β_j = 0, sum_{i=1}^a (αβ)_ij = 0 for each j, and sum_{j=1}^b (αβ)_ij = 0 for each i. This model has enhanced interpretability, but is not straightforward to fit in SAS. Interaction plots estimate the means using model V, e.g. ˆµ_ij = Ȳ_ij·. The overall shape of these plots gives clues as to which of models I, II, III, IV, V is most appropriate. More shortly. Fit in SAS using either model response = A B A*B; or model response = A|B; General forms L = sum_{i=1}^a sum_{j=1}^b c_ij µ_ij can be computed in estimate or lsmestimate (more later).

Model fitting

Say a = 3, b = 2, and n_ij = n = 2 for each pairing (i, j). The parameters are β = (µ, α_1, α_2, β_1, (αβ)_11, (αβ)_21)'. Note that α_3 = β_2 = (αβ)_12 = (αβ)_22 = (αβ)_32 = (αβ)_31 = 0. In general, the degrees of freedom for A, B, and A*B are the number of free parameters associated with each of these:

dfA  = a − 1            (α_1, α_2, ..., α_{a−1})
dfB  = b − 1            (β_1, β_2, ..., β_{b−1})
dfAB = (a − 1)(b − 1)   (αβ)_11       (αβ)_12       ...  (αβ)_{1,b−1}
                        (αβ)_21       (αβ)_22       ...  (αβ)_{2,b−1}
                        ...           ...                ...
                        (αβ)_{a−1,1}  (αβ)_{a−1,2}  ...  (αβ)_{a−1,b−1}

Matrix formulation

As usual, Y = Xβ + ε. For our example,

[Y_111]   [1 1 0 1 1 0]                 [ε_111]
[Y_112]   [1 1 0 1 1 0]                 [ε_112]
[Y_121]   [1 1 0 0 0 0]   [µ      ]     [ε_121]
[Y_122]   [1 1 0 0 0 0]   [α_1    ]     [ε_122]
[Y_211]   [1 0 1 1 0 1]   [α_2    ]     [ε_211]
[Y_212] = [1 0 1 1 0 1]   [β_1    ]  +  [ε_212]
[Y_221]   [1 0 1 0 0 0]   [(αβ)_11]     [ε_221]
[Y_222]   [1 0 1 0 0 0]   [(αβ)_21]     [ε_222]
[Y_311]   [1 0 0 1 0 0]                 [ε_311]
[Y_312]   [1 0 0 1 0 0]                 [ε_312]
[Y_321]   [1 0 0 0 0 0]                 [ε_321]
[Y_322]   [1 0 0 0 0 0]                 [ε_322]
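As a numerical check, a small sketch in Python/NumPy (rather than the slides' SAS): it builds the X above and fits it to the Castle Bakery sales data analyzed later in these slides, recovering SAS's corner-point estimates; the fitted values are the cell means Ȳ_ij·.

```python
import numpy as np

# Design matrix for a = 3, b = 2, n = 2 under SAS's corner-point
# constraints; columns are (mu, alpha1, alpha2, beta1, (ab)11, (ab)21).
X = np.array([
    [1, 1, 0, 1, 1, 0], [1, 1, 0, 1, 1, 0],   # cell (1,1)
    [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],   # cell (1,2)
    [1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 0, 1],   # cell (2,1)
    [1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0],   # cell (2,2)
    [1, 0, 0, 1, 0, 0], [1, 0, 0, 1, 0, 0],   # cell (3,1)
    [1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0],   # cell (3,2)
], dtype=float)

# Castle Bakery sales, ordered (i, j, k) to match the rows of X.
Y = np.array([47, 43, 46, 40, 62, 68, 67, 71, 41, 39, 42, 46], dtype=float)

# Least squares: beta_hat = (X'X)^{-1} X'Y; fitted values are Ybar_ij.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
fitted = X @ beta_hat
```

With the corner-point constraints, ˆµ is the last cell's mean Ȳ_32· = 44, and the remaining entries of beta_hat are contrasts against the last row and column of cells.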

Fitting

Your textbook has a lot on fitting, parameter estimation under balance, etc. In general, easy-to-compute closed-form estimates do not exist, especially with unbalanced data. In that case, matrix algebra saves the day. The LS estimates are easily computed as ˆβ = (X'X)^{-1}X'Y. Recall that ˆβ ~ N_p(β, σ²(X'X)^{-1}), where p is the number of mean parameters in β. Let c = (c_1, ..., c_p)'. Then ˆL = c'ˆβ ~ N(c'β, σ² c'(X'X)^{-1}c). Any linear combination L of mean parameters is easily estimated. Recall that (ˆL − L)/se(ˆL) ~ t(n_T − p).

SAS estimate command

You can estimate any linear combination of model parameters in β using estimate. Say a = 3, b = 3. To estimate L_1 = α_2 − α_1 use

estimate 'L1' A -1 1 0;

To estimate L_2 = β_3 − β_1 + (αβ)_23 − (αβ)_21 use

estimate 'L2' B -1 0 1 A*B 0 0 0 -1 0 1 0 0 0;

To find the order of levels for main effects and the interaction, look at the table of estimated coefficients. µ is called the intercept. Confidence intervals are obtained via the clparm option.

SAS lsmeans command

By definition, the least-squares means are either the raw µ_ij or simple averages of these, under any of the models I, II, III, IV, V. You can get estimates of these from lsmeans. For two-way models there are three: lsmeans A, lsmeans B, and lsmeans A*B, yielding estimates of

µ_i = (1/b) sum_{j=1}^b µ_ij,    µ_j = (1/a) sum_{i=1}^a µ_ij,    and µ_ij.

These are defined on p. 818 in your text; your book uses, e.g., µ_i· instead of µ_i.

SAS lsmeans command

pdiff gives all pairwise differences in LS means. If the additive model IV fits, then lsmeans A / pdiff; gives all µ_i1 − µ_i2 = α_i1 − α_i2, and lsmeans B / pdiff; gives all µ_j1 − µ_j2 = β_j1 − β_j2. You can adjust these using Tukey or Bonferroni. If V fits and you are looking at a single factor A or B, then pdiff can still be used, but pairwise differences are averaged over the levels of the remaining factor(s); then µ_i1 − µ_i2 ≠ α_i1 − α_i2 and lsmeans gives the former. cl adds confidence intervals; the intervals are adjusted if using, e.g., adjust=tukey. Can also look at lsmeans A*B, which gives estimates of each {µ_ij}; pdiff gives all possible pairwise comparisons here too. Alternatively, specify lsmeans A*B / slice=A; or lsmeans A*B / slice=B; to look at all pairwise differences at each level of the other factor.

Grouping like factor levels

A lines plot shows groups of levels that are not significantly different from each other, usually with an overall FER, say FER = 0.05. These can be obtained automatically in proc glimmix by adding a lines subcommand to lsmeans. For example:

proc glimmix data=bakery;
 class height width;
 model sales=height width / solution;
 lsmeans height / pdiff adjust=tukey lines;

SAS lsmestimate command

You can use lsmestimate in proc glimmix to obtain inference for general linear combinations L = sum_{i=1}^a sum_{j=1}^b c_ij µ_ij under any of the models. You can obtain simultaneous inference using Bonferroni, Scheffe, or Tukey (if pairwise differences). The theory behind the multiple comparisons is similar to that from the one-way model, but a bit different; see pp. 848-861, Sections 19.8 & 19.9. SAS takes care of the details for us. Just make sure you know what you are estimating. As before, Tukey works best when looking at all pairwise comparisons, Bonferroni works best when looking at only a few linear combinations, and Scheffe can work better when looking at a large number of combinations.

ANOVA table

The ANOVA table lists rows for Treatments (depending on which of models I-V you are fitting), Error, and Total, as before. As usual, the Pythagorean Theorem tells us

||Xˆβ − 1_{n_T} Ȳ||² + ||Y − Xˆβ||² = ||Y − 1_{n_T} Ȳ||²,

i.e., SSTR + SSE = SSTO. The p-value in the ANOVA table tests whether anything is important beyond a simple intercept µ. For example, in the interaction model V, the F-test is for H_0: all α_i = 0, β_j = 0, (αβ)_ij = 0. In model IV, H_0: all α_i = 0, β_j = 0, etc.

Type 3 sums of squares and associated tests

There are also Type 3 sums of squares and associated tests. In the interaction model V, there are SSA, SSB, and SSAB; these measure variability explained by the model due to factors A, B, and the interaction, respectively (pp. 836-841). By definition, these are the differences in sums of squared errors comparing model V to (a) a model with B and A*B, (b) a model with A and A*B, and (c) a model with A and B. Only (c) is a hierarchical model, so that's the only test of interest here. If the data are balanced, n_ij = n for all i, j, then SSTR = SSA + SSB + SSAB. The book notes this (pp. 837-838) and discusses the associated testing at some length.

Type 3 sums of squares and associated tests

Tim recommends fitting V and testing H_0: all (αβ)_ij = 0, i.e. that the additive model fits. If IV fits, then interpretation simplifies. Furthermore, if IV fits, you may want to test whether you can drop A or B from model IV; these two Type 3 tests are given to you automatically after fitting IV via model response=A B; All of these are standard nested-linear-hypothesis F-tests. Your book does not recommend refitting the model when you accept H_0: all (αβ)_ij = 0. This goes against the book's own advice for fitting general regression models in STAT 704. Tim recommends using the additive model if you accept that the interaction is not important; discussed briefly in 19.10.

Type III test of no interaction in simple example

Recall the data where a = 3, b = 2, and n_ij = n = 2. We want to test H_0: all (αβ)_ij = 0. The full model V is Y = X_F β_F + ε_F; here

[Y_111]   [1 1 0 1 1 0]                 [ε_111]
[Y_112]   [1 1 0 1 1 0]                 [ε_112]
[Y_121]   [1 1 0 0 0 0]   [µ      ]     [ε_121]
[Y_122]   [1 1 0 0 0 0]   [α_1    ]     [ε_122]
[Y_211]   [1 0 1 1 0 1]   [α_2    ]     [ε_211]
[Y_212] = [1 0 1 1 0 1]   [β_1    ]  +  [ε_212]
[Y_221]   [1 0 1 0 0 0]   [(αβ)_11]     [ε_221]
[Y_222]   [1 0 1 0 0 0]   [(αβ)_21]     [ε_222]
[Y_311]   [1 0 0 1 0 0]                 [ε_311]
[Y_312]   [1 0 0 1 0 0]                 [ε_312]
[Y_321]   [1 0 0 0 0 0]                 [ε_321]
[Y_322]   [1 0 0 0 0 0]                 [ε_322]

The reduced model is Y = X_R β_R + ε_R; here

[Y_111]   [1 1 0 1]             [ε_111]
[Y_112]   [1 1 0 1]             [ε_112]
[Y_121]   [1 1 0 0]             [ε_121]
[Y_122]   [1 1 0 0]   [µ  ]     [ε_122]
[Y_211]   [1 0 1 1]   [α_1]     [ε_211]
[Y_212] = [1 0 1 1]   [α_2]  +  [ε_212]
[Y_221]   [1 0 1 0]   [β_1]     [ε_221]
[Y_222]   [1 0 1 0]             [ε_222]
[Y_311]   [1 0 0 1]             [ε_311]
[Y_312]   [1 0 0 1]             [ε_312]
[Y_321]   [1 0 0 0]             [ε_321]
[Y_322]   [1 0 0 0]             [ε_322]

Type III test, continued...

For the full model, ˆβ_F = (X_F'X_F)^{-1}X_F'Y, SSE(F) = ||Y − X_F ˆβ_F||², dfe(F) = 12 − 6 = 6, and MSE(F) = SSE(F)/dfe(F). For the reduced model, ˆβ_R = (X_R'X_R)^{-1}X_R'Y, SSE(R) = ||Y − X_R ˆβ_R||², and dfe(R) = 12 − 4 = 8. Define

F = [{SSE(R) − SSE(F)} / {dfe(R) − dfe(F)}] / MSE(F).

Then if H_0: (αβ)_11 = (αβ)_21 = 0 is true, F ~ F(dfe(R) − dfe(F), dfe(F)). Use the Type III test for A*B in SAS.
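The same computation can be sketched numerically (Python/NumPy here rather than SAS; Y is the Castle Bakery sales vector from the next slide, which has exactly this a = 3, b = 2, n = 2 layout):

```python
import numpy as np

# Full-model design matrix X_F (corner-point coding); the reduced,
# additive-model matrix X_R just drops the two interaction columns.
X_F = np.array([
    [1, 1, 0, 1, 1, 0], [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0], [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0],
], dtype=float)
X_R = X_F[:, :4]
Y = np.array([47, 43, 46, 40, 62, 68, 67, 71, 41, 39, 42, 46], dtype=float)

def sse(X, Y):
    """Residual sum of squares from the least-squares fit of Y on X."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta
    return resid @ resid

sse_F, sse_R = sse(X_F, Y), sse(X_R, Y)
dfe_F, dfe_R = 12 - 6, 12 - 4
F = ((sse_R - sse_F) / (dfe_R - dfe_F)) / (sse_F / dfe_F)
```

This gives SSE(F) = 62, SSE(R) = 86, and F = 12/(62/6) ≈ 1.16 on (2, 6) df; because the design is balanced, SSE(R) − SSE(F) equals the Type III sum of squares for A*B.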

Castle Bakery (p. 833)

Castle Bakery supplies Italian bread to stores in a large city. They want to sell more bread, so they designed an experiment. Factor A is shelf height, i = 1, 2, 3 for bottom, middle, or top. Factor B is the width of the display, j = 1, 2 for regular or wide. The design is balanced, with n_ij = n = 2 stores receiving each of the (i, j) pairings.

data bakery;
 input sales height width store;
 datalines;
47 1 1 1
43 1 1 2
46 1 2 1
40 1 2 2
62 2 1 1
68 2 1 2
67 2 2 1
71 2 2 2
41 3 1 1
39 3 1 2
42 3 2 1
46 3 2 2
;

* initial fit of model V;
proc glm data=bakery plots=all;
 class height width;
 model sales = height width height*width / solution;
run;
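As a quick check on the data layout (a pure-Python sketch, separate from the SAS analysis): under model V the fitted values are just the cell means Ȳ_ij·, and the grand mean here is Ȳ··· = 51.

```python
# Castle Bakery observations as (sales, height i, width j) triples.
obs = [
    (47, 1, 1), (43, 1, 1), (46, 1, 2), (40, 1, 2),
    (62, 2, 1), (68, 2, 1), (67, 2, 2), (71, 2, 2),
    (41, 3, 1), (39, 3, 1), (42, 3, 2), (46, 3, 2),
]

# Cell means Ybar_ij. (the model-V fitted values) and the grand mean.
cells = {}
for sales, i, j in obs:
    cells.setdefault((i, j), []).append(sales)
cell_means = {ij: sum(v) / len(v) for ij, v in cells.items()}
grand_mean = sum(s for s, _, _ in obs) / len(obs)
```

The six cell means (45, 43, 65, 69, 40, 44) are what the interaction plot from plots=all displays.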

More Castle Bakery

This duplicates your textbook's analyses:

* Example 1, pp. 853-855;
proc glm data=bakery;
 class height width;
 model sales=height width height*width / solution;
 lsmeans height / pdiff adjust=tukey alpha=0.05 cl;
run;

* Example 2, p. 855;
proc glimmix data=bakery;
 class height width;
 model sales=height width height*width / solution;
 lsmestimate height*width "reg width middle" 0 0 1 0 0 0,
                          "reg width high " 0 0 0 0 1 0 / adjust=bon alpha=0.1 cl;
run;

Alternatively, you could use model sales = height width; in these commands instead of model sales = height width height*width;, following Tim's advice.

Diagnostics and remedial measures

Interaction plots are given by plots=all when fitting model V. These will tell you which of models I-V are good candidates for the data. Residuals are defined as usual; for example, under model IV, e_ijk = Y_ijk − Ŷ_ijk = Y_ijk − [ˆµ + ˆα_i + ˆβ_j]. Look at least at e_ijk vs. Ŷ_ijk. Can plot the {e_ijk} vs. the indices i and j for two additional plots. Look at a normal probability plot. If the data have nonconstant variance but are normal, can use repeated / group=A*B; in proc mixed. If the variance only changes with levels of A or B, can instead use repeated / group=A; or repeated / group=B; If the data are nonnormal and have nonconstant variance, try a Box-Cox transformation of the Y_ijk in proc transreg. Sometimes a Box-Cox transformation of the Y_ijk can also get rid of a significant interaction (pp. 826-827).

Strategy for analysis

Check whether A and B interact with the Type 3 test. If not, base inference on the additive model, or else a model with A or B only. Typically one looks at pairwise mean differences from lsmeans. If A and B significantly interact, then you can examine pairwise differences of averaged effects, e.g. µ_·j1 − µ_·j2, or else pairwise difference slices µ_ij1 − µ_ij2 for i = 1, ..., a. These are interpreted differently. For example, the slices may be significant whereas the averaged differences may not. Check the appropriateness of the model with standard diagnostic plots. If you have both non-constant variance and non-normal data, consider a Box-Cox transformation of the response. Often a Box-Cox transformation will also eliminate significant interactions.

Example pp. 870-871

n_T = 24 programmers were asked to predict how long a big project would take in programmer-days. After the project was over, Y_ijk is actual minus predicted programmer-days (prediction errors). Programmers are classified by type of experience (A = 1 small systems, A = 2 small & large) and years of experience (B = 1 is < 5 years, B = 2 is 5-10 years, B = 3 is 10+ years).

data predict;
 input days exper years @@;
 datalines;
240.0 1 1 206.0 1 1 217.0 1 1 225.0 1 1
110.0 1 2 118.0 1 2 103.0 1 2 95.0 1 2
56.0 1 3 60.0 1 3 68.0 1 3 58.0 1 3
71.0 2 1 53.0 2 1 68.0 2 1 57.0 2 1
47.0 2 2 52.0 2 2 31.0 2 2 49.0 2 2
37.0 2 3 33.0 2 3 40.0 2 3 45.0 2 3
;

* gives all a*b choose 2 pairwise comparisons via Tukey;
* slice subcommand only gives F-test within each slice;
proc glm data=predict plots=all;
 class exper years;
 model days=exper|years;
 lsmeans exper*years / adjust=tukey slice=exper;

* slice command in glimmix better;
* gives PW comparisons within each slice, i.e.;
* b choose 2 comparisons within each of a slices;
proc glimmix;
 class exper years;
 model days=exper|years;
 slice exper*years / sliceby=exper adjust=tukey cl;

proc glimmix;
 class exper years;
 model days=exper|years;
 * adjust=tukey not needed;
 slice exper*years / sliceby=years adjust=tukey cl;

Outline of book's approach

Everything in Chapter 19 requires balance, n_ij = n. Balance is nice if you are computing ANOVAs by hand, but more often than not data are unbalanced and the nice formulae do not apply. Chapter 23 is an entire chapter devoted to unbalanced analyses.

19.1 Three examples of designs leading to two-way ANOVA.
19.2 Interpretation of the model, overall means, additive and interaction models, important and non-important interactions, transforming the data to get rid of an interaction.
19.3 Cell-means model with two factors.
19.4 Interaction model: fitting via least squares, partitioning sums of squares & degrees of freedom, an augmented ANOVA table with SSTR = SSA + SSB + SSAB.
19.5 Residual analysis.

Outline of book's approach

19.6 F tests: three of them, from fitting only model V, using the augmented table. Kimball inequality α ≤ 1 − (1 − α_1)(1 − α_2)(1 − α_3) (has to do with doing sequential tests on whether factors are important).
19.7 Strategy for analysis & flowchart.
19.8 Analyzing factor effects without an interaction (within the context of V!).
19.9 Analyzing factor effects with an interaction.
19.10 Pooling sums of squares, i.e. using IV instead of V when the interaction is not important.

Focus on model V (as in textbook)

This is Minitab's model. LS parameter estimates minimize

Q(µ, α, β, αβ) = sum_{i=1}^a sum_{j=1}^b sum_{k=1}^{n_ij} (Y_ijk − (µ + α_i + β_j + (αβ)_ij))²,

subject to sum_i α_i = sum_j β_j = sum_i (αβ)_ij = sum_j (αβ)_ij = 0. These are given by

ˆµ = Ȳ···
ˆα_i = Ȳ_i·· − Ȳ···
ˆβ_j = Ȳ_·j· − Ȳ···
(αβ)ˆ_ij = Ȳ_ij· − Ȳ_i·· − Ȳ_·j· + Ȳ···

We have

Ȳ_ij· − Ȳ···  =  (Ȳ_i·· − Ȳ···)  +  (Ȳ_·j· − Ȳ···)  +  (Ȳ_ij· − Ȳ_i·· − Ȳ_·j· + Ȳ···),
[ˆµ_ij − ˆµ]       [ˆα_i]              [ˆβ_j]              [(αβ)ˆ_ij]

Fitted values are Ŷ_ijk = Ȳ_ij·. These closed-form estimates hold only for balanced data!
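These sum-to-zero estimates are easy to verify numerically; a pure-Python sketch using the Castle Bakery cell means Ȳ_ij· (i = height, j = width) from earlier in the slides:

```python
# Cell means Ybar_ij. for the Castle Bakery data (i = height, j = width).
cell = {(1, 1): 45, (1, 2): 43, (2, 1): 65, (2, 2): 69, (3, 1): 40, (3, 2): 44}
a, b = 3, 2

row = {i: sum(cell[i, j] for j in range(1, b + 1)) / b for i in range(1, a + 1)}  # Ybar_i..
col = {j: sum(cell[i, j] for i in range(1, a + 1)) / a for j in range(1, b + 1)}  # Ybar_.j.
mu_hat = sum(cell.values()) / (a * b)                                             # Ybar_...

# Sum-to-zero (textbook/Minitab) effect estimates.
alpha_hat = {i: row[i] - mu_hat for i in row}
beta_hat = {j: col[j] - mu_hat for j in col}
ab_hat = {(i, j): cell[i, j] - row[i] - col[j] + mu_hat for (i, j) in cell}
```

Each set of effects sums to zero, and ˆµ + ˆα_i + ˆβ_j + (αβ)ˆ_ij reproduces Ȳ_ij·, as the decomposition above requires; compare this with SAS's corner-point solution, which instead constrains the last levels to zero.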

Matrix formulation

For the example considered earlier,

[Y_111]   [1  1  0  1  1  0]                 [ε_111]
[Y_112]   [1  1  0  1  1  0]                 [ε_112]
[Y_121]   [1  1  0 -1 -1  0]   [µ      ]     [ε_121]
[Y_122]   [1  1  0 -1 -1  0]   [α_1    ]     [ε_122]
[Y_211]   [1  0  1  1  0  1]   [α_2    ]     [ε_211]
[Y_212] = [1  0  1  1  0  1]   [β_1    ]  +  [ε_212]
[Y_221]   [1  0  1 -1  0 -1]   [(αβ)_11]     [ε_221]
[Y_222]   [1  0  1 -1  0 -1]   [(αβ)_21]     [ε_222]
[Y_311]   [1 -1 -1  1 -1 -1]                 [ε_311]
[Y_312]   [1 -1 -1  1 -1 -1]                 [ε_312]
[Y_321]   [1 -1 -1 -1  1  1]                 [ε_321]
[Y_322]   [1 -1 -1 -1  1  1]                 [ε_322]

This uses α_3 = −α_1 − α_2, β_2 = −β_1, (αβ)_12 = −(αβ)_11, (αβ)_22 = −(αβ)_21, (αβ)_31 = −(αβ)_11 − (αβ)_21, and (αβ)_32 = −(αβ)_12 − (αβ)_22 = (αβ)_11 + (αβ)_21.

Model V SS

Recall that model V is simply the one-way model with cell means {µ_ij}.

SSTO = sum_{i=1}^a sum_{j=1}^b sum_{k=1}^{n_ij} (Y_ijk − Ȳ···)²
SSTR = sum_{i=1}^a sum_{j=1}^b n_ij (Ȳ_ij· − Ȳ···)²
SSE  = sum_{i=1}^a sum_{j=1}^b sum_{k=1}^{n_ij} (Y_ijk − Ȳ_ij·)²

Note that each deviation can be broken up as

Y_ijk − Ȳ···     =   (Ȳ_ij· − Ȳ···)      +   (Y_ijk − Ȳ_ij·).
[ijkth deviation]    [explained by model]    [left over]

Model V, balanced case n_ij = n (p. 841)

Can show that SSTR = SSA + SSB + SSAB, where

SSA  = nb sum_{i=1}^a (Ȳ_i·· − Ȳ···)²
SSB  = na sum_{j=1}^b (Ȳ_·j· − Ȳ···)²
SSAB = n sum_{i=1}^a sum_{j=1}^b (Ȳ_ij· − Ȳ_i·· − Ȳ_·j· + Ȳ···)²

SSA, SSB, and SSAB measure the portion of variability explained by the model due to factors A, B, and the interaction, respectively. This leads to a refined ANOVA table:

Source  SS     df           MS                         p-value
A       SSA    a−1          MSA = SSA/(a−1)            P{F(a−1, (n−1)ab) > MSA/MSE}
B       SSB    b−1          MSB = SSB/(b−1)            P{F(b−1, (n−1)ab) > MSB/MSE}
AB      SSAB   (a−1)(b−1)   MSAB = SSAB/[(a−1)(b−1)]   P{F((a−1)(b−1), (n−1)ab) > MSAB/MSE}
Error   SSE    (n−1)ab      MSE = SSE/[(n−1)ab]
Total   SSTO   nab−1

The three tests test H_0: all α_i = 0, H_0: all β_j = 0, and H_0: all (αβ)_ij = 0, respectively. Only the last one, the test for the interaction, yields a hierarchical model if accepted.
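A numeric illustration of the balanced-case decomposition (pure Python, using the Castle Bakery data with a = 3, b = 2, n = 2):

```python
# Replicates Y_ijk for the Castle Bakery data: (i, j) -> [Y_ij1, Y_ij2].
data = {
    (1, 1): [47, 43], (1, 2): [46, 40],
    (2, 1): [62, 68], (2, 2): [67, 71],
    (3, 1): [41, 39], (3, 2): [42, 46],
}
a, b, n = 3, 2, 2

cell = {ij: sum(v) / n for ij, v in data.items()}                                 # Ybar_ij.
row = {i: sum(cell[i, j] for j in range(1, b + 1)) / b for i in range(1, a + 1)}  # Ybar_i..
col = {j: sum(cell[i, j] for i in range(1, a + 1)) / a for j in range(1, b + 1)}  # Ybar_.j.
g = sum(cell.values()) / (a * b)                                                  # Ybar_...

SSA = n * b * sum((row[i] - g) ** 2 for i in row)
SSB = n * a * sum((col[j] - g) ** 2 for j in col)
SSAB = n * sum((cell[i, j] - row[i] - col[j] + g) ** 2 for (i, j) in cell)
SSTR = n * sum((cell[ij] - g) ** 2 for ij in cell)
SSE = sum((y - cell[ij]) ** 2 for ij, v in data.items() for y in v)
```

This yields SSA = 1544, SSB = 12, SSAB = 24, and SSE = 62, with SSTR = 1580 = SSA + SSB + SSAB exactly because the design is balanced; with unequal n_ij the three pieces no longer add up, which is the subject of Chapter 23.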