Topic 28: Unequal Replication in Two-Way ANOVA

Outline Two-way ANOVA with unequal numbers of observations in the cells Data and model Regression approach Parameter estimates Previous analyses with constant n just special case

Data for two-way ANOVA: Y is the response variable. Factor A has levels i = 1 to a; Factor B has levels j = 1 to b. Y_ijk is the k-th observation in cell (i,j), k = 1 to n_ij, and the n_ij may vary from cell to cell.

Recall Bread Example KNNL p 833 Y is the number of cases of bread sold A is the height of the shelf display, a=3 levels: bottom, middle, top B is the width of the shelf display, b=2: regular, wide n=2 stores for each of the 3x2 treatment combinations (BALANCED)

Regression Approach: Create a-1 dummy variables to represent the levels of A. Create b-1 dummy variables to represent the levels of B. Multiply each of the a-1 dummy variables for A by each of the b-1 dummy variables for B to get the variables for AB. Let's look at the relationship among these sets of variables.

Common Set of Variables
data a2; set a1;
  X1 = (height eq 1) - (height eq 3);
  X2 = (height eq 2) - (height eq 3);
  X3 = (width eq 1) - (width eq 2);
  X13 = X1*X3;
  X23 = X2*X3;
run;
This coding corresponds to the factor-effects constraints Σ_i α_i = 0, Σ_j β_j = 0, Σ_i (αβ)_ij = 0, and Σ_j (αβ)_ij = 0.
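
For reference, the coded values these statements assign to each of the six height-by-width combinations (derived directly from the definitions above) are:

height  width   X1  X2  X3  X13  X23
1       1        1   0   1    1    0
1       2        1   0  -1   -1    0
2       1        0   1   1    0    1
2       2        0   1  -1    0   -1
3       1       -1  -1   1   -1   -1
3       2       -1  -1  -1    1    1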

Run Proc Reg
proc reg data=a2;
  model sales = X1 X2 X3 X13 X23 / XPX I;
  height: test X1, X2;
  width: test X3;
  interaction: test X13, X23;
run;

X'X Matrix
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   X1   X2   X3   X13  X23
Intercept          12    0    0    0     0    0
X1                  0    8    4    0     0    0
X2                  0    4    8    0     0    0
X3                  0    0    0   12     0    0
X13                 0    0    0    0     8    4
X23                 0    0    0    0     4    8
The sets of variables are orthogonal: the crossproducts between sets are 0.

Orthogonal X's: The order in which the variables are fit in the model does not matter, so Type I SS = Type III SS. This holds for all choices of restrictions when n_ij is constant. Orthogonality is lost when the n_ij are not constant.
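
As a quick check (a sketch only, assuming the balanced bread data are still available in data set a1 with variables sales, height, and width), proc glm reports identical Type I and Type III sums of squares for every term:

proc glm data=a1;
  class height width;
  model sales = height width height*width / ss1 ss3;
  * with equal n per cell, the Type I and Type III SS agree term by term;
run;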

KNNL Example (KNNL p 954): Y is the change in growth rate for children after a treatment. A is gender, a=2 levels: male, female. B is bone development, b=3 levels: severely, moderately, or mildly depressed. n_ij = 3, 2, 2, 1, 3, 3 children in the six groups.

Read and check the data
data a1;
  infile 'c:\...\ch23ta01.txt';
  input growth gender bone;
run;
proc print data=a1;
run;

Obs  growth  gender  bone
  1     1.4       1     1
  2     2.4       1     1
  3     2.2       1     1
  4     2.1       1     2
  5     1.7       1     2
  6     0.7       1     3
  7     1.1       1     3
  8     2.4       2     1
  9     2.5       2     2
 10     1.8       2     2
 11     2.0       2     2
 12     0.5       2     3
 13     0.9       2     3
 14     1.3       2     3

Common Set of Variables
data a3; set a1;
  X1 = (bone eq 1) - (bone eq 3);
  X2 = (bone eq 2) - (bone eq 3);
  X3 = (gender eq 1) - (gender eq 2);
  X13 = X1*X3;
  X23 = X2*X3;
run;
Again this coding corresponds to the constraints Σ_i α_i = 0, Σ_j β_j = 0, Σ_i (αβ)_ij = 0, and Σ_j (αβ)_ij = 0.

Run Proc Reg
proc reg data=a3;
  model growth = X1 X2 X3 X13 X23 / XPX I;
run;

X'X Matrix
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   X1   X2   X3   X13  X23
Intercept          14   -1    0    0     3    0
X1                 -1    9    5    3     1   -1
X2                  0    5   10    0    -1   -2
X3                  0    3    0   14    -1    0
X13                 3    1   -1   -1     9    5
X23                 0   -1   -2    0     5   10
The crossproduct terms are no longer all 0, so the order of fit matters.
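
To see where the nonzero crossproducts come from, note that the (Intercept, X1) entry is simply the sum of X1 over all 14 children: (n_11 + n_21) - (n_13 + n_23) = (3 + 1) - (2 + 3) = -1, which is zero only when the number of bone-level-1 children equals the number of bone-level-3 children. Likewise the (X1, X3) entry is n_11 - n_13 - n_21 + n_23 = 3 - 2 - 1 + 3 = 3.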

How does this impact the analysis? In regression, this happens all the time (explanatory variables are correlated). In regression, the t tests look at the significance of a variable when it is fitted last. When comparing means, the order of fit alters the null hypothesis being tested.

Prepare the data for a plot
data a1; set a1;
  if (gender eq 1)*(bone eq 1) then gb='1_msev ';
  if (gender eq 1)*(bone eq 2) then gb='2_mmod ';
  if (gender eq 1)*(bone eq 3) then gb='3_mmild';
  if (gender eq 2)*(bone eq 1) then gb='4_fsev ';
  if (gender eq 2)*(bone eq 2) then gb='5_fmod ';
  if (gender eq 2)*(bone eq 3) then gb='6_fmild';
run;

Plot the data title1 'Plot of the data'; symbol1 v=circle i=none; proc gplot data=a1; plot growth*gb; run;

Find the means proc means data=a1; output out=a2 mean=avgrowth; by gender bone; run;

Plot the means title1 'Plot of the means'; symbol1 v='m' i=join c=blue; symbol2 v='f' i=join c=green; proc gplot data=a2; plot avgrowth*bone=gender; run;

Plot of the means: avgrowth versus bone, with separate profiles for gender (M and F). The profiles are not parallel, raising the question of an interaction.

Cell means model: Y_ijk = μ_ij + ε_ijk, where μ_ij is the theoretical mean or expected value of all observations in cell (i,j) and the ε_ijk are iid N(0, σ²). Equivalently, Y_ijk ~ N(μ_ij, σ²), independent.

Estimates: Estimate μ_ij by the mean of the observations in cell (i,j): μ̂_ij = Ȳ_ij. = (Σ_k Y_ijk) / n_ij. For each (i,j) combination we can also get an estimate of the variance: s²_ij = Σ_k (Y_ijk - Ȳ_ij.)² / (n_ij - 1). We pool these to get an estimate of σ².

Pooled estimate of σ²: In general we pool the s²_ij, using weights proportional to their df, n_ij - 1. The pooled estimate is s² = Σ_ij (n_ij - 1) s²_ij / Σ_ij (n_ij - 1). In terms of parameter estimates, nothing differs from the balanced design.
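
As a check using the data listing above: the six within-cell sums of squares are 0.56, 0.08, 0.08, 0.00, 0.26, and 0.32, with 2, 1, 1, 0, 2, and 2 degrees of freedom (the cell with a single child contributes nothing), so s² = (0.56 + 0.08 + 0.08 + 0.00 + 0.26 + 0.32) / (2 + 1 + 1 + 0 + 2 + 2) = 1.30 / 8 = 0.1625, exactly the MSE reported by proc glm below.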

Run proc glm
proc glm data=a1;
  class gender bone;
  model growth=gender|bone / solution;
  means gender*bone;
run;
The bar notation gender|bone is a shorthand way to write the main effects and the interaction (gender bone gender*bone).

Parameter Estimates: The solution option on the model statement gives parameter estimates for the GLM parameterization. The constraints are: the coefficient for the last level of each main effect is zero, and any interaction term involving the last level of either factor (level a or level b) is zero. These estimates reproduce the cell means in the usual way.

Parameter Estimates
Parameter          Estimate         Standard Error   t Value   Pr > |t|
Intercept          0.90000000 B     0.2327373          3.87    0.0048
gender 1          -0.00000000 B     0.3679900         -0.00    1.0000
bone 1             1.50000000 B     0.4654747          3.22    0.0122
bone 2             1.20000000 B     0.3291403          3.65    0.0065
gender*bone 1 1   -0.40000000 B     0.5933661         -0.67    0.5192
gender*bone 1 2   -0.20000000 B     0.5204165         -0.38    0.7108
Example: μ̂_22 = 0.90 + 0.00 + 1.20 + 0.00 = 2.10
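
The other cell means are recovered the same way. For example, μ̂_11 = 0.90 + 0.00 + 1.50 + (-0.40) = 2.00, which matches the sample mean of the three children in cell (1,1): (1.4 + 2.4 + 2.2)/3 = 2.0.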

Output
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5        4.4742857    0.89485714      5.51   0.0172
Error              8        1.3000000    0.16250000
Corrected Total   13        5.7742857
Note that the DF and SS add as usual.

Output: Type I SS
Source        DF   Type I SS    Mean Square   F Value   Pr > F
gender         1   0.0028571    0.00285714       0.02   0.8978
bone           2   4.3960000    2.19800000      13.53   0.0027
gender*bone    2   0.0754286    0.03771429       0.23   0.7980
SSG + SSB + SSGB = 4.47429 (equals the model SS)

Output: Type III SS
Source        DF   Type III SS   Mean Square   F Value   Pr > F
gender         1   0.12000000    0.12000000       0.74   0.4152
bone           2   4.18971429    2.09485714      12.89   0.0031
gender*bone    2   0.07542857    0.03771429       0.23   0.7980
SSG + SSB + SSGB = 4.38514 (does not equal the model SS)

Type I vs Type III: The Type I SS add up to the model SS; the Type III SS do not necessarily add up. Type I and Type III are the same for the interaction because it is the last term in the model. The Type I and Type III analyses for the main effects are not necessarily the same because different hypotheses are being examined.

Type I vs Type III: Most people prefer the Type III analysis, but this can be misleading if the cell sizes differ greatly. Contrasts can provide some insight into the differences in the hypotheses.

Contrast for A*B: This contrast is the same for Type I and Type III. The null hypothesis is that the profiles are parallel; see the plot for interpretation: μ_12 - μ_11 = μ_22 - μ_21 and μ_13 - μ_12 = μ_23 - μ_22, i.e., μ_11 - μ_12 - μ_21 + μ_22 = 0 and μ_12 - μ_13 - μ_22 + μ_23 = 0.

A*B Contrast statement
contrast 'gender*bone Type I and III'
  gender*bone 1 -1 0 -1 1 0,
  gender*bone 0 1 -1 0 -1 1;
run;

Type III Contrast for gender
(1) μ_11 = (1)(μ + α_1 + β_1 + (αβ)_11)
(1) μ_12 = (1)(μ + α_1 + β_2 + (αβ)_12)
(1) μ_13 = (1)(μ + α_1 + β_3 + (αβ)_13)
(-1) μ_21 = (-1)(μ + α_2 + β_1 + (αβ)_21)
(-1) μ_22 = (-1)(μ + α_2 + β_2 + (αβ)_22)
(-1) μ_23 = (-1)(μ + α_2 + β_3 + (αβ)_23)
L = 3α_1 - 3α_2 + (αβ)_11 + (αβ)_12 + (αβ)_13 - (αβ)_21 - (αβ)_22 - (αβ)_23

Contrast statement: Gender Type III
contrast 'gender Type III' gender 3 -3 gender*bone 1 1 1 -1 -1 -1;

Type I Contrast for gender
(3) μ_11 = (3)(μ + α_1 + β_1 + (αβ)_11)
(2) μ_12 = (2)(μ + α_1 + β_2 + (αβ)_12)
(2) μ_13 = (2)(μ + α_1 + β_3 + (αβ)_13)
(-1) μ_21 = (-1)(μ + α_2 + β_1 + (αβ)_21)
(-3) μ_22 = (-3)(μ + α_2 + β_2 + (αβ)_22)
(-3) μ_23 = (-3)(μ + α_2 + β_3 + (αβ)_23)
L = (7α_1 - 7α_2) + (2β_1 - β_2 - β_3) + 3(αβ)_11 + 2(αβ)_12 + 2(αβ)_13 - (αβ)_21 - 3(αβ)_22 - 3(αβ)_23
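
These coefficients come directly from the cell sizes: the Type I (weighted) comparison of the gender means is (3μ_11 + 2μ_12 + 2μ_13)/7 - (1μ_21 + 3μ_22 + 3μ_23)/7, and multiplying through by 7 gives the contrast weights 3, 2, 2, -1, -3, -3 used above.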

Contrast statement: Gender Type I
contrast 'gender Type I' gender 7 -7 bone 2 -1 -1 gender*bone 3 2 2 -1 -3 -3;

Type III Contrast for bone: The null hypothesis is that the (unweighted) marginal means are the same. In terms of means, H_0: μ_.1 = μ_.2 and μ_.2 = μ_.3.
contrast 'bone Type III'
  bone 2 -2 0 gender*bone 1 -1 0 1 -1 0,
  bone 2 0 -2 gender*bone 1 0 -1 1 0 -1;

Contrast output
Contrast                      DF   Contrast SS   Mean Square   F Value   Pr > F
gender*bone Type I and III     2   0.07542857    0.03771429       0.23   0.7980
gender Type III                1   0.12000000    0.12000000       0.74   0.4152
gender Type I                  1   0.00285714    0.00285714       0.02   0.8978
bone Type III                  2   4.18971429    2.09485714      12.89   0.0031

Summary: Type I and Type III F tests test different null hypotheses, and you should be aware of the differences. Most prefer Type III as it follows logic similar to regression analysis. Be wary, however, if the cell sizes vary dramatically.

Comparing Means: If you are interested in the Type III hypotheses, use LSMEANS to do the comparisons; if you are interested in the Type I hypotheses, use MEANS. We will show this difference via the ESTIMATE statement.
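
A minimal sketch of the two statements (added to the interaction model fit earlier; only the means and lsmeans lines are new, the rest of the run is as before):

proc glm data=a1;
  class gender bone;
  model growth = gender|bone;
  means gender;     * weighted marginal means (Type I flavor);
  lsmeans gender;   * unweighted marginal means (Type III flavor);
run;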

SAS Commands: We use the earlier contrast code to set up the ESTIMATE commands.
estimate 'gender Type III' gender 3 -3 gender*bone 1 1 1 -1 -1 -1 / divisor=3;
estimate 'gender Type I' gender 7 -7 bone 2 -1 -1 gender*bone 3 2 2 -1 -3 -3 / divisor=7;

MEANS OUTPUT
Level of          ------------growth-----------
gender     N      Mean            Std Dev
1          7      1.65714286      0.62411843
2          7      1.62857143      0.75655862
Diff = 0.0286

LSMEANS OUTPUT
gender    growth LSMEAN
1         1.60000000
2         1.80000000
Diff = -0.20

Estimate output
Parameter          Estimate   Std Err
gender Type III    -0.200     0.2327
gender Type I       0.029     0.2155
Notice that these two estimates agree with the differences of the LSMEANS and MEANS estimates, respectively.
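
Both differences can be verified from the cell means: the unweighted (Type III) marginal means are (2.0 + 1.9 + 0.9)/3 = 1.60 for gender 1 and (2.4 + 2.1 + 0.9)/3 = 1.80 for gender 2, giving -0.20; the weighted (Type I) marginal means are (3(2.0) + 2(1.9) + 2(0.9))/7 = 1.657 and (1(2.4) + 3(2.1) + 3(0.9))/7 = 1.629, giving 0.029.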

Analytical Strategy: First examine the interaction. Some options when the interaction is significant: interpret the plot of means; run A at each level of B and/or B at each level of A (see the sketch below); run the analysis as a one-way with ab levels; use contrasts.
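
One way to run A at each level of B in SAS is the SLICE option on lsmeans; a minimal sketch, using the interaction model from before:

proc glm data=a1;
  class gender bone;
  model growth = gender|bone;
  lsmeans gender*bone / slice=bone;   * tests the gender effect separately at each bone level;
run;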

Analytical Strategy: Some options when the interaction is not significant: use a multiple comparison procedure for the main effects; use contrasts for the main effects; if needed, rerun without the interaction.

Example continued
proc glm data=a3;
  class gender bone;
  model growth=gender bone / solution;
  means gender bone / tukey lines;
run;
The means statement gives comparisons in the spirit of the Type I hypotheses. We pool the interaction into error here because the error df are small.

Output
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        4.3988571    1.46628571     10.66   0.0019
Error             10        1.3754286    0.13754286
Corrected Total   13        5.7742857
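
Note that the error line is exactly the previous error plus the pooled interaction: SSE = 1.3000 + 0.0754 = 1.3754 with 8 + 2 = 10 degrees of freedom, so MSE = 1.3754/10 = 0.1375.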

Output: Type I SS
Source    DF   Type I SS     Mean Square   F Value   Pr > F
gender     1   0.00285714    0.00285714       0.02   0.8883
bone       2   4.39600000    2.19800000      15.98   0.0008

Output: Type III SS
Source    DF   Type III SS   Mean Square   F Value   Pr > F
gender     1   0.09257143    0.09257143       0.67   0.4311
bone       2   4.39600000    2.19800000      15.98   0.0008
Although the null hypotheses for gender differ, neither the Type I nor the Type III test finds a significant gender effect.

Tukey comparisons
Group   Mean     N   bone
A       2.1000   4   1
A
A       2.0200   5   2
B       0.9000   5   3

Tukey Comparisons: Why don't we need a Tukey adjustment for gender? Gender has only two levels, so the F test already compares its two means. The means statement does provide the mean estimates, so you know the directionality of the F test, but that is all the statement provides.

Last slide: Read KNNL Chapter 23. We used the program topic28.sas to generate the output for today.