STAT 705 Chapters 23 and 24: Two factors, unequal sample sizes; multi-factor ANOVA

Similar documents
STAT 705 Chapter 19: Two-way ANOVA

STAT 705 Chapter 19: Two-way ANOVA

Topic 29: Three-Way ANOVA

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

Topic 28: Unequal Replication in Two-Way ANOVA

STAT 705 Chapter 16: One-way ANOVA

Chapter 20 : Two factor studies one case per treatment Chapter 21: Randomized complete block designs

STAT 705 Chapters 22: Analysis of Covariance

Unbalanced Designs Mechanics. Estimate of σ 2 becomes weighted average of treatment combination sample variances.

More than one treatment: factorials

Lec 5: Factorial Experiment

1 Tomato yield example.

STAT 506: Randomized complete block designs

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

Lecture 9: Factorial Design Montgomery: chapter 5

WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS

Factorial Treatment Structure: Part I. Lukas Meier, Seminar für Statistik

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

y = µj n + β 1 b β b b b + α 1 t α a t a + e

STAT 401A - Statistical Methods for Research Workers

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek

Unbalanced Designs & Quasi F-Ratios

Unbalanced Data in Factorials Types I, II, III SS Part 2

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1

Analysis of Covariance

Outline Topic 21 - Two Factor ANOVA

STAT 135 Lab 10 Two-Way ANOVA, Randomized Block Design and Friedman s Test

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

Two-Way Analysis of Variance - no interaction

Sections 7.1, 7.2, 7.4, & 7.6

df=degrees of freedom = n - 1

Stat 705: Completely randomized and complete block designs

Introduction. Chapter 8

General Linear Model (Chapter 4)

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Regression Models for Quantitative and Qualitative Predictors: An Overview

Incomplete Block Designs

Multiple Predictor Variables: ANOVA

4.8 Alternate Analysis as a Oneway ANOVA

Factorial and Unbalanced Analysis of Variance

Contents. TAMS38 - Lecture 6 Factorial design, Latin Square Design. Lecturer: Zhenxia Liu. Factorial design 3. Complete three factor design 4

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Analysis of Variance Bios 662

Reference: Chapter 6 of Montgomery(8e) Maghsoodloo

Lecture 7: Latin Square and Related Design

Analysis of Covariance

Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Theorem A: Expectations of Sums of Squares Under the two-way ANOVA model, E(X i X) 2 = (µ i µ) 2 + n 1 n σ2

Formal Statement of Simple Linear Regression Model

Unbalanced Data in Factorials Types I, II, III SS Part 1

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.

Food consumption of rats. Two-way vs one-way vs nested ANOVA

PLS205 Lab 2 January 15, Laboratory Topic 3

3. Factorial Experiments (Ch.5. Factorial Experiments)

y response variable x 1, x 2,, x k -- a set of explanatory variables

Lecture 15 Topic 11: Unbalanced Designs (missing data)

Power & Sample Size Calculation

6 Designs with Split Plots

Biological Applications of ANOVA - Examples and Readings

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Lecture 10: Experiments with Random Effects

REVIEW 8/2/2017 陈芳华东师大英语系

This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed.

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

Topic 20: Single Factor Analysis of Variance

Least Squares Analyses of Variance and Covariance

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:

Analysis of Variance

8 Analysis of Covariance

STAT 7030: Categorical Data Analysis

Nested Designs & Random Effects

Orthogonal contrasts for a 2x2 factorial design Example p130

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data

Lec 1: An Introduction to ANOVA

This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points.

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan

Chapter 19. More Complex ANOVA Designs Three-way ANOVA

Lecture 13 Extra Sums of Squares

22s:152 Applied Linear Regression. Take random samples from each of m populations.

4.1. Introduction: Comparing Means

Orthogonal and Non-orthogonal Polynomial Constrasts

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Assignment 6 Answer Keys

Unit 12: Analysis of Single Factor Experiments

Chapter 11. Analysis of Variance (One-Way)

VIII. ANCOVA. A. Introduction

Pairwise multiple comparisons are easy to compute using SAS Proc GLM. The basic statement is:

ANOVA (Analysis of Variance) output RLS 11/20/2016

Unit 8: 2 k Factorial Designs, Single or Unequal Replications in Factorial Designs, and Incomplete Block Designs

V. Experiments With Two Crossed Treatment Factors

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Stat 217 Final Exam. Name: May 1, 2002

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

Transcription:

STAT 705 Chapters 23 and 24: Two factors, unequal sample sizes; multi-factor ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 22

Balanced vs. unbalanced data Balance is nice if calculating by hand! Typically, data are not balanced. Why? Observational studies don t get to impose treatments on groups of same size. Subjects may drop out of a planned experiment. Cost considerations some treatments more expensive. Notation and model is exactly the same for balanced (n ij = n) and unbalanced: Y ijk = µ + α i + β j + (αβ) ij + ɛ ijk. Here, i = 1,..., a, j = 1,..., b, and k = 1,..., n ij. I have already covered two-way ANOVA assuming unbalanced data. 2 / 22

Fitting The model is fit as a regression model. There are (a 1) binary predictors for factor A, (b 1) binary predictors for factor B, and (a 1)(b 1) interaction predictors obtained by multiplying factor A predictors by factor B predictors. See example, pp. 954 957. In general, SSTR SSA + SSB +SSAB as defined in Chapter 19; orthogonality is lost in unbalanced designs. 3 / 22

Type III tests We treat model as regression model with (a 1) + (b 1) + (a 1)(b 1) = ab 1 predictors, but we only test dropping blocks of predictors from this full model V corresponding to A, B, or AB, using general nested linear hypotheses ( big model / little model ), as in regression. Recall n T = a i=1 b j=1 n ij. SAS gives Type III tests for F A = MSE(A B, AB) MSE F (a 1, n T dfe) if H 0 : α i = 0. MSE(B A, AB) F B = F (b 1, n T dfe) if H 0 : β i = 0. MSE MSE(AB A, B) F AB = F ((a 1)(b 1), n T dfe) if H 0 : (αβ) ij = 0. MSE Only the last test leaves a hierarchical model (additive model IV). 4 / 22

Modeling strategy Say a = 2 and b = 3. If accept H 0 : (αβ) ij = 0 then can look at, e.g., L 1 = β 1 1 2 (β 2 + β 3 ) and L 2 = β 2 β 3 via lsmestimate B "L1" 1.0-0.5-0.5, "L2" 0.0 1.0-1.0 / adjust=bonf; If reject H 0 : (αβ) ij = 0 then look at linear combinations of µ ij = µ + α i + β j + (αβ) ij. For example, maybe µ 21 µ 11, µ 22 µ 12, and µ 23 µ 13 (differences in A over levels of B). lsmestimate A*B "mu21-mu11" -1 0 0 1 0 0, "mu22-mu12" 0-1 0 0 1 0, "mu23-mu13" 0 0-1 0 0 1 / adjust=bonf; Note that Tukey still works for pairwise comparisons, but FER < α rather than FER = α. Note: can also work directly with model parameters using estimate. 5 / 22

Bone growth, pp. 954 959 Synthetic growth hormone given to n T = 14 children, all hormone deficient and short. Y ijk is difference in growth rate during month of treatment vs. previous non-treatment in cm/month. i = 1, 2 is gender (male/female) and j = 1, 2, 3 is bone development (severely depressed, moderately depressed, mildly depressed). No randomization of treatments employed; treatments (gender and level of depression) are observational here. Every child gets growth hormone. 6 / 22

Analysis in SAS data growth; input ratediff gender bonedev @@; datalines; 1.4 1 1 2.4 1 1 2.2 1 1 2.1 1 2 1.7 1 2 0.7 1 3 1.1 1 3 2.4 2 1 2.5 2 2 1.8 2 2 2.0 2 2 0.5 2 3 0.9 2 3 1.3 2 3 ; * get interaction plot; * table 23.4 (p. 959) is Type III SS Table; proc glm plots=all; class gender bonedev; model ratediff=gender bonedev; * with p=0.80 we can drop the interaction and look at main effects; proc glm plots=all; class gender bonedev; model ratediff=gender bonedev; lsmeans bonedev / pdiff adjust=tukey alpha=0.05 cl; * removing gender as well; proc glm plots=all; class gender bonedev; model ratediff=bonedev; lsmeans bonedev / pdiff adjust=tukey alpha=0.05 cl; 7 / 22

Chapter 24: Multi-factor studies Say we have three factors: A, B, and C. A full, three-way interaction model is Y ijkl = µ + α i + β j + γ k + (αβ) ij + (αγ) ik + (βγ) jk + (αβγ) ijk + ɛ ijkl. Here i = 1,..., a, j = 1,..., b, and k = 1,..., c; have replicates l = 1,..., n ijk. Balanced if n ijk = n for all i, j, k. That s a lot of parameters! SAS sets parameters equal to zero that have indices i = a, j = b, or k = c. Section 24.2 has some text on interpreting the model; pp. 998 1002. 8 / 22

A model-based approach to multi-way ANOVA Use interaction plots and Type III tests for find a simpler hierarchical model to explain the data. Check residuals vs. fitted values (heteroscedascity?), histogram of residuals (skew? bimodality?), and normal probability plot to assess model adequacy. Decide what sorts of paired differences or linear combinations you want to look at. For example, if end up with A, B, C, and B*C, you can look at main effects in A, and B*C interaction effects. These might include looking at all differences in main effects of A µ i1 µ i2 = α i1 α i2 (use Tukey), and looking at slices µ j1 k µ j2 k for pairs 1 j 1 < j 2 b. The lowest order interactions in the effect left in the model determine which pairwise differences make sense to look at! 9 / 22

Hierarchical model building Recall with hierarchical model building, if we have an interaction, we must include all lower order effects that comprise the interaction. So if we have a three way interaction A C D, we must also include the effects A, C, D, A C, A D, and C D. In SAS this is accomplished including A C D in the model statement. A reasonable approach to model building is pare down higher order interactions until you have a model with largely significant effects in it, i.e. backwards elimination. This approach incurs the problem of multiple hypothesis testing, but can be somewhat eleviated using Kimball s inequality, or else by considering one overall test for dropping several effects at once; I suggest the latter. 10 / 22

Averaged effects Regardless of the final model chosen, one can always resort to the examination of so-called averaged effects. Let s consider three factors A, B, and C for simplicity. The averaged effect for A = i is given by µ i = 1 bc b j=1 k=1 c µ ijk, where µ ijk = E(Y ijkm ) under your final model. This is the mean response at A = i averaged over the levels of B and C, and are provided by SAS lsmeans A. This averaging assumes that all factors levels are weighted equally. 11 / 22

Differences in averaged effects We may furthermore look at differences in averaged effects, e.g. µ 2 µ 1. These are also interpreted as treatment differences averaged over the other effects, e.g. µ 2 µ 1 = 1 bc b j=1 k=1 c [µ 2jk µ 1jk ]. You can obtain these from adding pdiff to your lsmeans statement. Both lsmeans and lsmestimate deal with averaged effects. The rest of the averaged effects for the three-factor model are µ j = 1 ac a c i=1 k=1 µ ijk, µ k = 1 ab a i=1 j=1 b µ ijk, µ ij = 1 c µ ijk, µ i k = 1 b µ ijk, µ jk = 1 a µ ijk. c b a k=1 j=1 i=1 You can look at these effects and obtain pairwise differences by including, e.g. lsmeans A*B / pdiff adjust=tukey cl; 12 / 22

Simplification when higher order interactions are dropped Note, if A does not share any interactions with other factors, e.g. the model µ ijk = µ + α i + β j + γ k + (βγ) jk fits, then µ 2 µ 1 = α 2 α 1. This idea generalizes to the other factors as well. However, if the model µ ijk = µ + α i + β j + γ k + (αβ) ij + (βγ) jk fits, then µ 2 µ 1 α 2 α 1. In fact, µ 2 µ 1 = α 2 α 1 + 1 b b j=1 (αβ) 2j 1 b b j=1 (αβ) 1j. In general, differences in the A treatments can vary across the other two factors in a complex way. Ideally, one would then look at, e.g. µ 2jk µ 1jk for different values of j and k to see where treatment A differences occur. It can happen that the averaged difference µ 2 µ 1 is not significantly non-zero, yet one or more of the µ 2jk µ 1jk are significantly non-zero. You can examine the individual differences using estimate. 13 / 22

Example where a = b = c = 2 Say through backward elimination, the model A, B, C, A B, B C is shown to adequately describe the data; i.e. µ ijk = µ + α i + β j + γ k + (αβ) ij + (βγ) jk. Then in SAS the A B effects are listed in the order (A, B) = (1, 1), (1, 2), (2, 1), and (2, 2), same with the B C effects. Since A does not interact with C, we only need to examine A differences over B. To visualize this, note that for any k µ 2jk µ 1jk = µ + α 2 + β j + γ k + (αβ) 2j + (βγ) jk [µ + α 1 + β j + γ k + (αβ) 1j + (βγ) jk ] = α 2 α 1 + (αβ) 2j (αβ) 1j, which is independent of k, i.e. independence of factor C. There are only two of these to look at, namely α 2 α 1 + (αβ) 21 (αβ) 11, how treatment A differs when B = 1, and α 2 α 1 + (αβ) 22 (αβ) 12, how treatment A differs when B = 2. 14 / 22

Example, continued You can get these either out of estimate directly, or lsmestimate by noting that for this model µ 2j µ 1j = µ 2jk µ 1jk = α 2 α 1 + (αβ) 2j (αβ) 1j. The command estimate works with all of the effects in the model that you list, i.e. the α i s, β j s, γ k s, (αβ) ij s, etc., whereas lsmestimate works with the averaged effects µ i, µ j, µ k, µ ij, etc. 15 / 22

Final comments Output from lsmestimate is essentially always interpratable, either as a conditional or averaged linear combination, depending on the interaction structure. Similarly, output from lsmeans is also interpretable in terms of averaged effects. Output from estimate will be interpretable if you are careful. Don t be afraid to write down the model and play around with the math. This is how you can find out when estimate and lsmestimate give you the same results! 16 / 22

Interaction plots for multi-way models For multi-factor models, we can look at averaged (or marginal) interaction plots obtained in proc glm by simply fitting a model with only two of the factors, e.g. get each of model=a B;, model=a C;, and model=b C; It is also possible to get conditional interaction plots directly out of SAS. Say you have three factors, A, B, and C, each with two levels. The averaged plot for A and B uses Ȳ ij ; there are two conditional plots for A and B, one at k = 1 uses Ȳ ij1 and the other at k = 2 uses Ȳij2. Averaged plots can tell you whether two-way interactions are necessary; conditional plots can tell you whether two-way and higher interactions are necessary, but are a pain to interpret without some practice. See Section 24.2 (pp. 998 1000). 17 / 22

Averaged interaction plots for some models Model A B C has A/B, A/C, and B/C averaged plots µ ij = µ + α i + β j + γ parallel µ i k = µ + α i + β + γ k parallel µ jk = µ + ᾱ + β j + γ k parallel Model A B C A*B has A/B, A/C, and B/C averaged plots µ ij = µ + α i + β j + γ + (αβ) ij not parallel µ i k = µ + α i + β + γ k + (αβ) i parallel µ jk = µ + ᾱ + β j + γ k + (αβ) j parallel Model A B C A*B B*C has A/B, A/C, and B/C averaged plots µ ij = µ + α i + β j + γ + (αβ) ij + (βγ) j not parallel µ i k = µ + α i + β + γ k + (αβ) i + (βγ) k parallel µ jk = µ + ᾱ + β j + γ k + (αβ) j + (βγ) jk not parallel 18 / 22

Body fat Effects of gender (A), body fat (%, B), and smoking history (C) of subjects on exercise tolerance Y ijkl is minutes of bicycling until fatigue, were measured in small study of n T = 24 subjects 25 35 years old. Study happens to be balanced. Partial analysis on pp. 1005 1012. 19 / 22

Time until fall off bike, SAS analysis data tol; * 1=male vs. female, 1=low fat vs. high, 1=light smoking vs. heavy; input tol gender fat smoking @@; datalines; 24.1 1 1 1 29.2 1 1 1 24.6 1 1 1 20.0 2 1 1 21.9 2 1 1 17.6 2 1 1 14.6 1 2 1 15.3 1 2 1 12.3 1 2 1 16.1 2 2 1 9.3 2 2 1 10.8 2 2 1 17.6 1 1 2 18.8 1 1 2 23.2 1 1 2 14.8 2 1 2 10.3 2 1 2 11.3 2 1 2 14.9 1 2 2 20.4 1 2 2 12.8 1 2 2 10.1 2 2 2 14.4 2 2 2 6.1 2 2 2 ; * "conditional" interaction plot; proc sgpanel; panelby gender / rows=1 columns=2; scatter x=fat y=tol / group=smoking; reg x=fat y=tol / group=smoking; * fat*smoking averaged over gender; proc sgplot; title "averaged over gender"; scatter x=fat y=tol / group=smoking; reg x=fat y=tol / group=smoking; * can also get them the usual way through proc glm; * fat by gender interaction probably not needed; proc glm plots=all; class gender fat smoking; model tol=gender fat; * gender by smoking interaction probably not needed; proc glm plots=all; class gender fat smoking; model tol=gender smoking; * fat by smoking interaction probably needed; proc glm plots=all; class gender fat smoking; model tol=fat smoking; 20 / 22

One overall F-test for dropping effects * saturated model with all possible interactions; * we can drop any one of gender*fat, gender*smoking, or gender*fat*smoking; proc glm data=tol outstat=full; class gender fat smoking; model tol=fat smoking gender / solution; run; * let s see if we can drop all three of these effects at the same time; proc glm data=tol outstat=reduced; class gender fat smoking; model tol=gender fat smoking; run; * p-value for nested hypothesis in SAS; data test1; set full reduced; if _SOURCE_="ERROR"; * get error SS and DF; proc means data=test1 noprint; var ss df; output out=test2 min=minss mindf max=maxss maxdf; data nested; set test2; fstar=((maxss-minss)/(maxdf-mindf))/(minss/mindf); pvalue=1-cdf( f,fstar,maxdf-mindf,mindf); proc print; run; 21 / 22

Analysis of reduced model proc glm plots=all; class gender fat smoking; model tol=gender fat smoking; lsmeans gender / pdiff adjust=tukey alpha=0.05 cl; lsmeans fat*smoking / pdiff adjust=tukey alpha=0.05 cl; proc glimmix; class gender fat smoking; model tol=gender fat smoking; lsmestimate gender "gender" 1-1 / adjust=t alpha=0.05 cl; lsmestimate fat*smoking "diff1" -1 1 0 0, "diff2" 0 0-1 1 / adjust=bon alpha=0.05 cl; * for illustration consider a model with G F S G*F and F*S; * since gender DOES NOT interact with smoking, gender *differences* only change with fat type; proc glimmix; class gender fat smoking; model tol=gender fat smoking gender*fat fat*smoking; estimate "gender diff @ fat=1" gender -1 1 gender*fat -1 0 1 0; estimate "gender diff @ fat=2" gender -1 1 gender*fat 0-1 0 1; lsmestimate gender*fat "gender diff @ fat=1" -1 0 1 0; * same as above!; lsmestimate gender*fat "gender diff @ fat=1" 0-1 0-1; 22 / 22