Lecture 14 Topic 10: ANOVA models for random and mixed effects


To this point, we have considered only the Fixed Model (Model I) ANOVA; now we will extend the method of ANOVA to other experimental objectives.

Fixed and Random Models in One-way Classification Experiments

A recap of the fixed-effects model

1. Treatment levels are selected by the researcher.
2. The interest is only in those specific treatment levels.
3. Conclusions are limited to those specific treatment levels.
4. The linear model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where the treatment effects ($\tau_i$) are fixed and $\sum_i \tau_i = \sum_i (\bar{Y}_{i\cdot} - \mu) = 0$.
5. $H_0: \tau_1 = \tau_2 = \dots = \tau_t = 0$; $H_1$: some $\tau_i \neq 0$.
6. When $H_0$ is false, there is an additional component of variance in the experiment: $r \sum_i \tau_i^2 / (t - 1)$.
7. The typical objective is to compare particular treatment means to one another in an attempt to detect differences.

A typical example: Measuring the effect of four different, specific fertilizers on yield.

Introducing the random-effects model

1. Treatment levels are a random sample drawn from a larger population of treatments with mean effect 0 and variance $\sigma_\tau^2$.
2. The interest is in this larger population of treatment levels.
3. Conclusions are meant to extend to the larger population of treatment levels.
4. The linear model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where the treatment effect ($\tau_i$) is a random variable. While $\sum_i \tau_i = \sum_i (\bar{Y}_{i\cdot} - \mu) = 0$ for the population of treatment effects, for any given random sample of treatment levels, $\sum_i \tau_i \neq 0$.
5. $H_0: \sigma_\tau^2 = 0$; $H_1: \sigma_\tau^2 \neq 0$.
6. When $H_0$ is false, there is an additional component of variance in the experiment: $r\sigma_\tau^2$.
7. The typical objective is to test for the presence of this additional component of variance and to estimate its magnitude.

Example: In a sheep breeding experiment conducted to characterize the breeding value of sires from a certain breed, four sires were randomly selected from a population and each sire was mated with six dams. The weights of all the newborn animals were recorded. These four sires are of no specific interest; they are merely a random sample from a population of sires of the breed, and they are interesting only to the extent that they represent that population. If the experiment were to be repeated, another set of sires very likely would be used.

For a one-way ANOVA, the computation is the same for the fixed and random models, but the objectives and conclusions are different. The computations following the initial significance test are also different.

Sometimes the determination of whether an effect is fixed or random is not obvious. Examples:

- Laboratories or technicians in a comparative study
- Years in a multiple-year trial
- Locations in a multiple-location study

These factors can be fixed or random depending on the objective of the study, the intended inferences to be made, and the process by which the levels of the factors were selected.

Differences between the fixed- and random-effects models

1. The objectives are different.
   Fixed-effects (the fertilizer experiment): Each fertilizer is of specific interest, and the purpose is to compare these particular fertilizers to one another.
   Random-effects (the sheep breeding experiment): Each sire is of no specific interest. The sires in the study are merely a random sample from which inferences are to be made concerning the population. The purpose is to estimate the component of variance $\sigma_\tau^2$.
2. The sampling procedures are different.
   Fixed-effects (the fertilizer experiment): The fertilizers are selected purposefully (i.e. not randomly) by the investigator. If the experiment is repeated, the same four fertilizers will be used again and only the random errors change from experiment to experiment (i.e. the $\tau_i$'s are assumed to be constants; only the $\varepsilon_{ij}$'s change).
   Random-effects (the sheep breeding experiment): The sires are selected randomly, and the unknown variance of the population of sires contributes to the total sum of squares. If the experiment is repeated, the four sires most likely will differ each time (i.e. not only the errors vary but the sire effects ($\tau_i$'s) vary as well).
3. The expected sums of the effects are different.
   Because the effects are constants, in the fixed-effects model: $\sum_i \tau_i = \sum_i (\bar{Y}_{i\cdot} - \mu) = 0$.
   But for any given sample in the random-effects model: $\sum_i \tau_i = \sum_i (\bar{Y}_{i\cdot} - \mu) \neq 0$.

4. And, therefore, the expected variances are different.
   For Model I: Var($Y_{ij}$) = Var($\mu$) + Var($\tau_i$) + Var($\varepsilon_{ij}$) = Var($\varepsilon_{ij}$) = $\sigma^2$
   For Model II: Var($Y_{ij}$) = Var($\mu$) + Var($\tau_i$) + Var($\varepsilon_{ij}$) = Var($\tau_i$) + Var($\varepsilon_{ij}$) = $\sigma_\tau^2 + \sigma^2$
   ($\sigma_\tau^2$ and $\sigma^2$ are the variance components)

The expected mean squares (EMS) for these models:

| Source | df | MS | EMS, Fixed Model | EMS, Random Model |
|---|---|---|---|---|
| Treatment | t-1 | MST | $\sigma^2 + r \sum_i \tau_i^2 / (t-1)$ | $\sigma^2 + r\sigma_\tau^2$ |
| Error | t(r-1) | MSE | $\sigma^2$ | $\sigma^2$ |

Example: Suppose five different cuts of meat are taken from each of three pigs, all from the same breed, and the fat content is measured for each cut.

1. If the goal were to compare the fat content among these five cuts of meat:
   Fixed-effects model
   Five treatment levels (cuts)
   Three replications (pigs)
2. If the goal were to assess animal-to-animal and within-animal variation for this breed:
   Random-effects model
   Three levels to estimate among-animal variation (pigs; like treatment)
   Five levels to estimate within-animal variation (cuts; like replications)

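Sticking with the pigs, let's assume the goal is to assess the components of variation (among animals and within animals):

MST = 80 (df = 2)
MSE = 20 (df = 12)
F = 80/20 = 4 (p < 0.05)

Our conclusion? $\sigma_\tau^2 > 0$

| Source | df | MS | EMS, Random Model |
|---|---|---|---|
| Treatment | t-1 | MST | $\sigma^2 + r\sigma_\tau^2$ |
| Error | t(r-1) | MSE | $\sigma^2$ |

Setting the treatment mean square equal to its expectation gives $80 = r\sigma_\tau^2 + \sigma^2$. Since $r = 5$ and $\sigma^2 = 20$, we can solve: $80 = 5\sigma_\tau^2 + 20$, so $\sigma_\tau^2 = 12$.

Fat content is nearly twice as variable among cuts within this breed of pig ($\sigma^2 = 20$) as it is among pigs of this breed ($\sigma_\tau^2 = 12$).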
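The arithmetic of the pig example can be sketched in a few lines. This is a minimal illustration, assuming a balanced one-way random-effects design with MST = 80, MSE = 20, and r = 5 observations per pig; the function name `variance_components` is hypothetical, and truncating a negative estimate at zero is a common convention rather than something stated in these notes.

```python
def variance_components(mst, mse, r):
    """Method-of-moments estimates from E[MST] = sigma2 + r*sigma2_tau and E[MSE] = sigma2."""
    sigma2 = mse                                # within-group (error) variance
    sigma2_tau = max((mst - mse) / r, 0.0)      # among-group variance, truncated at 0 (assumed convention)
    return sigma2_tau, sigma2


# Pig example: 3 pigs (random "treatments"), 5 cuts per pig
sigma2_tau, sigma2 = variance_components(mst=80.0, mse=20.0, r=5)
f_ratio = 80.0 / 20.0

print("F =", f_ratio)                    # 4.0
print("sigma2_tau =", sigma2_tau)        # 12.0
print("sigma2 =", sigma2)                # 20.0
```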

Two-way classification experiments

$$Y_{ijk} = \mu + \tau_{Ai} + \tau_{Bj} + (\tau_A \tau_B)_{ij} + \varepsilon_{ijk}$$

Consider: A field study of several varieties of sunflower (Factor A) tested across a number of locations (Factor B). In each location, a CRD is used, with each variety replicated over r plots. Let $Y_{ijk}$ represent the plot yield of the k-th plot of the i-th variety at the j-th location.

Fixed-effects model: Interested in a particular set of varieties grown in a particular set of locations.
Random-effects model: Interested in a population of varieties grown in some random set of locations which represent a particular region.
Mixed-effects model: Interested in a particular set of varieties grown in some random set of locations which represent a particular region.

| | Fixed | Random | Mixed |
|---|---|---|---|
| $\mu$ | Mean yield in the study | Mean yield of all varieties at all possible locations | Mean yield of specific varieties at all possible locations |
| A (variety) | True effect of the i-th variety; $\sum_i \tau_{Ai} = 0$ | Random effect from $N(0, \sigma_A^2)$; $\sum_i \tau_{Ai} \neq 0$ | True effect of the i-th variety; $\sum_i \tau_{Ai} = 0$ |
| B (location) | True effect of the j-th location; $\sum_j \tau_{Bj} = 0$ | Random effect from $N(0, \sigma_B^2)$; $\sum_j \tau_{Bj} \neq 0$ | Random effect from $N(0, \sigma_B^2)$; $\sum_j \tau_{Bj} \neq 0$ |
| A:B | Specific interaction effect of the i-th variety and the j-th location | Random effect from $N(0, \sigma_{AB}^2)$ | Random effect from $N(0, \sigma_{AB}^2)$ |
| Variance of Y | $\sigma^2$ | $\sigma_A^2 + \sigma_B^2 + \sigma_{AB}^2 + \sigma^2$ | $\sigma_B^2 + \sigma_{AB}^2 + \sigma^2$ |
| Objective | Estimate and test hypotheses about $\tau_{Ai}$, $\tau_{Bj}$, and $(\tau_A \tau_B)_{ij}$ | Estimate and test hypotheses about $\sigma_A^2$, $\sigma_B^2$, and $\sigma_{AB}^2$ | Estimate and test hypotheses about $\tau_{Ai}$, $\sigma_B^2$, and $\sigma_{AB}^2$ |

Expected mean squares and F tests

EMS: Algebraic expressions which specify the underlying model parameters that are estimated by the calculated mean squares. They are used to determine the appropriate error variances for F tests.

An EMS is composed of:

1. The error variance
2. Functions of variances of random effects
3. Functions of sums of squares of fixed effects

EMS table for three two-way classification experiments, featuring a varieties, b locations, and r replications:

| Source | df | MS | Fixed Model | Random Model | Mixed Model (Fixed Var, Random Loc) |
|---|---|---|---|---|---|
| Var | a-1 | MSV | $\sigma^2 + br \sum_i \tau_{Ai}^2/(a-1)$ | $\sigma^2 + r\sigma_{AB}^2 + br\sigma_A^2$ | $\sigma^2 + r\sigma_{AB}^2 + br \sum_i \tau_{Ai}^2/(a-1)$ |
| Loc | b-1 | MSL | $\sigma^2 + ar \sum_j \tau_{Bj}^2/(b-1)$ | $\sigma^2 + r\sigma_{AB}^2 + ar\sigma_B^2$ | $\sigma^2 + r\sigma_{AB}^2 + ar\sigma_B^2$ |
| V:L | (b-1)(a-1) | MSVL | $\sigma^2 + r \sum_{ij} (\tau_A \tau_B)_{ij}^2/[(a-1)(b-1)]$ | $\sigma^2 + r\sigma_{AB}^2$ | $\sigma^2 + r\sigma_{AB}^2$ |
| Error | ba(r-1) | MSE | $\sigma^2$ | $\sigma^2$ | $\sigma^2$ |

Based on the expected mean squares, correct denominators can be identified to perform the appropriate F tests on locations, varieties, and their interaction. The appropriate test statistic (F) is a ratio of mean squares chosen such that the expected value of the numerator differs from the expected value of the denominator only by the specific variance component or fixed factor being tested.

| Source | Fixed | Random | Mixed |
|---|---|---|---|
| Variety | MSV/MSE | MSV/MSVL | MSV/MSVL |
| Location | MSL/MSE | MSL/MSVL | MSL/MSVL |
| V:L | MSVL/MSE | MSVL/MSE | MSVL/MSE |

Note that the appropriate F tests change, depending on the model.

Expected Mean Squares (EMS) and Custom F Tests

Assume we are testing 8 varieties (fixed) across 4 locations (random), with 3 replications in each location. To generate the correct sums of squares, the proper R code would be:

    lm( Yield ~ Variety + Location + Variety:Location )

But to carry out the proper F tests, we need to consult the table of expected mean squares:

| Source | Mixed Model (Fixed Var, Random Loc) | With a = 8, b = 4, r = 3 |
|---|---|---|
| Var | $\sigma^2 + r\sigma_{AB}^2 + br \sum_i \tau_{Ai}^2/(a-1)$ | $\sigma^2 + 3\sigma_{AB}^2 + 12 \sum_i \tau_{Ai}^2/(a-1)$ |
| Loc | $\sigma^2 + r\sigma_{AB}^2 + ar\sigma_B^2$ | $\sigma^2 + 3\sigma_{AB}^2 + 24\sigma_B^2$ |
| V:L | $\sigma^2 + r\sigma_{AB}^2$ | $\sigma^2 + 3\sigma_{AB}^2$ |
| Error | $\sigma^2$ | $\sigma^2$ |

The appropriate F tests in this case would be:

| Source | DF | SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Var | 7 | 16.187979 | 2.312568 | 2.36 | 0.0606 |
| Loc | 3 | 0.739818 | 0.246606 | 0.25 | 0.8594 |
| Error: MS(Var:Loc) | 21 | 20.596423 | 0.980782 | | |

| Source | DF | SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Var:Loc | 21 | 20.596423 | 0.980782 | 1.08 | 0.3929 |
| Error: MS(Error) | 64 | 58.218486 | 0.909664 | | |
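The custom F ratios can be recomputed by hand from the mean squares. The sketch below assumes the mean squares from the ANOVA output for this example (2.312568 for Var, 0.246606 for Loc, 0.980782 for Var:Loc, 0.909664 for Error) and the mixed-model denominators identified from the EMS table; the dictionary names are hypothetical. It also shows what the variety test would look like if the residual error were (incorrectly) used as the denominator.

```python
# Mean squares as reported in the ANOVA output (assumed values)
mean_squares = {
    "Var": 2.312568,
    "Loc": 0.246606,
    "Var:Loc": 0.980782,
    "Error": 0.909664,
}

# Mixed model (fixed Var, random Loc): denominator chosen from the EMS table
denominators = {"Var": "Var:Loc", "Loc": "Var:Loc", "Var:Loc": "Error"}

f_values = {
    source: mean_squares[source] / mean_squares[den]
    for source, den in denominators.items()
}

# What a default fixed-effects analysis would report for Var (wrong error term here)
naive_f_var = mean_squares["Var"] / mean_squares["Error"]

for source, f in f_values.items():
    print(f"{source:8s} F = {f:.2f}  (denominator: MS {denominators[source]})")
print(f"Var tested against MSE instead: F = {naive_f_var:.2f}")
```

Testing Var against MSE inflates the F ratio (about 2.54 rather than 2.36) and uses 64 rather than 21 denominator degrees of freedom, which is why the default output of a fixed-effects fit should not be used for the variety test in this design.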

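An aside about RCBDs

| Source | Mixed Model EMS (Fixed Trtmt, Random Block) |
|---|---|
| Trtmt | $\sigma^2 + r\sigma_{\tau\beta}^2 + br \sum_i \tau_i^2/(t-1)$ |
| Block | $\sigma^2 + r\sigma_{\tau\beta}^2 + tr\sigma_\beta^2$ |
| T:B | $\sigma^2 + r\sigma_{\tau\beta}^2$ |
| Error | $\sigma^2$ |

Now we see the reasoning behind the standard practice of having just one replication per block-treatment combination. With r = 1 there are no degrees of freedom left for a separate error term, so the Trtmt:Block interaction is merged into the error. This makes some sense because the correct denominator for Treatment is the Trtmt:Block interaction (i.e. we want to generalize across the entire population of blocks represented by the blocks actually used in the study).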

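Expected Mean Squares for a three-way ANOVA

Now consider a three-factor factorial experiment: A is fixed; B and C are both random. The linear model:

$$Y_{ijkl} = \mu + \tau_{Ai} + \tau_{Bj} + \tau_{Ck} + (\tau_A \tau_B)_{ij} + (\tau_A \tau_C)_{ik} + (\tau_B \tau_C)_{jk} + (\tau_A \tau_B \tau_C)_{ijk} + \varepsilon_{ijkl}$$

| Source | Expected Mean Squares | F |
|---|---|---|
| A | $\sigma^2 + r\sigma_{ABC}^2 + br\sigma_{AC}^2 + cr\sigma_{AB}^2 + bcr \sum_i \tau_{Ai}^2/(a-1)$ | ? |
| B | $\sigma^2 + r\sigma_{ABC}^2 + ar\sigma_{BC}^2 + cr\sigma_{AB}^2 + acr\sigma_B^2$ | ? |
| C | $\sigma^2 + r\sigma_{ABC}^2 + ar\sigma_{BC}^2 + br\sigma_{AC}^2 + abr\sigma_C^2$ | ? |
| AB | $\sigma^2 + r\sigma_{ABC}^2 + cr\sigma_{AB}^2$ | MS(AB)/MS(ABC) |
| AC | $\sigma^2 + r\sigma_{ABC}^2 + br\sigma_{AC}^2$ | MS(AC)/MS(ABC) |
| BC | $\sigma^2 + r\sigma_{ABC}^2 + ar\sigma_{BC}^2$ | MS(BC)/MS(ABC) |
| ABC | $\sigma^2 + r\sigma_{ABC}^2$ | MS(ABC)/MSE |
| Error | $\sigma^2$ | |

No exact tests exist for the main effects of A, B, or C!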
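The search for an exact denominator can be written as a small lookup. The sketch below is only an illustration: it represents each EMS of the three-way model (A fixed; B and C random) as the set of terms it contains, ignoring the coefficients, which is adequate here because in a balanced design a shared term carries the same coefficient in every EMS. The names `EMS` and `exact_denominator` are hypothetical.

```python
# Each source's EMS written as the set of terms it contains.
# "e" is the error variance; "A" stands for the fixed-effect term of A.
EMS = {
    "A":     {"e", "ABC", "AC", "AB", "A"},
    "B":     {"e", "ABC", "BC", "AB", "B"},
    "C":     {"e", "ABC", "BC", "AC", "C"},
    "AB":    {"e", "ABC", "AB"},
    "AC":    {"e", "ABC", "AC"},
    "BC":    {"e", "ABC", "BC"},
    "ABC":   {"e", "ABC"},
    "Error": {"e"},
}


def exact_denominator(source):
    """Return the source whose EMS equals EMS[source] minus the tested term, or None."""
    target = EMS[source] - {source}
    for name, terms in EMS.items():
        if name != source and terms == target:
            return name
    return None  # no single mean square works: a synthetic error term is needed


for src in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(src, "->", exact_denominator(src))
```

The three main effects return `None`, while each two-way interaction is matched to ABC and ABC is matched to Error, in agreement with the F column of the table.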

Approximate F tests: synthetic errors

Satterthwaite's method (1946) uses linear combinations of existing mean squares to build appropriate error terms. Define

$$MS' = MS_A + MS_B + \dots + MS_X \quad \text{and} \quad MS'' = MS_a + MS_b + \dots + MS_n$$

where:

a. No MS appears simultaneously in MS' and MS''; and
b. EMS' - EMS'' is equal to the effect considered in the null hypothesis.

Under these conditions, the test statistic $F = MS'/MS''$ is distributed approximately as $F_{p,q}$, where p and q are the effective degrees of freedom:

$$p = \frac{(MS_A + \dots + MS_X)^2}{MS_A^2/df_A + \dots + MS_X^2/df_X} \quad \text{and} \quad q = \frac{(MS_a + \dots + MS_n)^2}{MS_a^2/df_a + \dots + MS_n^2/df_n}$$

In these expressions for p and q, $df_i$ is the number of degrees of freedom associated with the mean square $MS_i$.

Example: In the three-factor mixed effects model discussed above, an appropriate test statistic for $H_0: \tau_{A1} = \dots = \tau_{Aa} = 0$ would be:

$$F = \frac{MS_A + MS(ABC)}{MS(AB) + MS(AC)}$$

The ratio of the corresponding expected values is

$$\frac{\left[\sigma^2 + r\sigma_{ABC}^2 + br\sigma_{AC}^2 + cr\sigma_{AB}^2 + bcr \sum_i \tau_{Ai}^2/(a-1)\right] + \left[\sigma^2 + r\sigma_{ABC}^2\right]}{\left[\sigma^2 + r\sigma_{ABC}^2 + cr\sigma_{AB}^2\right] + \left[\sigma^2 + r\sigma_{ABC}^2 + br\sigma_{AC}^2\right]}$$

so the numerator and denominator differ only by $bcr \sum_i \tau_{Ai}^2/(a-1)$, the effect being tested.
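The effective degrees of freedom are easy to compute once the mean squares are in hand. The following is a minimal sketch of Satterthwaite's formula for a plain sum of mean squares; the function names and the numeric mean squares are hypothetical, chosen only to illustrate the test of A in a design with a = 3, b = 4, c = 3 (so df = 2, 6, 4, and 12 for A, AB, AC, and ABC).

```python
def satterthwaite_df(mean_squares, dfs):
    """Effective df of a sum of mean squares: (sum MS)^2 / sum(MS_i^2 / df_i)."""
    total = sum(mean_squares)
    return total ** 2 / sum(ms ** 2 / df for ms, df in zip(mean_squares, dfs))


def synthetic_f(num_ms, num_df, den_ms, den_df):
    """Return (F, p, q) for F = MS'/MS'' with effective df p (numerator) and q (denominator)."""
    f = sum(num_ms) / sum(den_ms)
    p = satterthwaite_df(num_ms, num_df)
    q = satterthwaite_df(den_ms, den_df)
    return f, p, q


# Hypothetical mean squares (not from the lecture data)
ms_a, ms_abc, ms_ab, ms_ac = 50.0, 5.0, 10.0, 8.0

f, p, q = synthetic_f(
    num_ms=[ms_a, ms_abc], num_df=[2, 12],   # MS' = MS_A + MS(ABC)
    den_ms=[ms_ab, ms_ac], den_df=[6, 4],    # MS'' = MS(AB) + MS(AC)
)
print(f"F = {f:.3f}, p = {p:.2f}, q = {q:.2f}")
```

The effective degrees of freedom are generally not integers; the p-value is then taken from an F distribution with those fractional degrees of freedom.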

Nested effects (subsamples)

Recall that subsamples force the experimental units into the linear model, and experimental units (i.e. replications) are random, by definition. So even when all factors are fixed effects, nesting transforms the model into a mixed model.

Consider a two-way factorial (fixed A, random B) with C (random) replications in each A-B combination and D subsamples measured on each replication.

| Source | Mixed Model EMS (Fixed A, Random B, Reps C, Subsamples D) |
|---|---|
| A | $\sigma^2 + d\sigma_{(AB)C}^2 + cd\sigma_{AB}^2 + bcd \sum_i \tau_{Ai}^2/(a-1)$ |
| B | $\sigma^2 + d\sigma_{(AB)C}^2 + cd\sigma_{AB}^2 + acd\sigma_B^2$ |
| A:B | $\sigma^2 + d\sigma_{(AB)C}^2 + cd\sigma_{AB}^2$ |
| (A:B):C | $\sigma^2 + d\sigma_{(AB)C}^2$ |
| Error | $\sigma^2$ |

1. To test the interaction, we must use the MS of the factor nested within the interaction (i.e. the EU), as with any previous nested design.
2. To test A or B, we must use the interaction ($MS_{A:B}$), also as before.

Blocks nested within random locations

Four eel grass restoration methods (Factor A) are tested in five locations (Factor B), selected at random within the Great Bay. At each location, the experiment is organized as an RCBD with 3 blocks.

| Source | Mixed Model EMS (Fixed Method, Random Loc) |
|---|---|
| Method | $\sigma^2 + 3\sigma_{AB}^2 + 15 \sum_i \tau_{Ai}^2/(a-1)$ |
| Loc | $\sigma^2 + 3\sigma_{AB}^2 + 4\sigma_{(B)Block}^2 + 12\sigma_B^2$ |
| Method:Loc | $\sigma^2 + 3\sigma_{AB}^2$ |
| (Loc):Block | $\sigma^2 + 4\sigma_{(B)Block}^2$ |

The methods need to be tested using the Method:Location interaction variance because we are trying to make generalizations which are valid across all locations.

Location has no exact test, so a synthetic ratio is used:

$$F_{Location} = \frac{MS_{Loc} + MS_{Error}}{MS_{Interaction} + MS_{Loc/Block}}$$
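A short check of the expected values shows why this synthetic ratio isolates the location variance. The sketch assumes the EMS of the eel grass design with a = 4 methods, b = 5 locations, and 3 blocks per location, with $\sigma^2$ as the residual error variance; the subscript labels are shorthand, not notation fixed by the notes.

```latex
\begin{aligned}
E[MS_{Loc} + MS_{Error}]
  &= \left(\sigma^2 + 3\sigma_{AB}^2 + 4\sigma_{(B)Block}^2 + 12\sigma_B^2\right) + \sigma^2 \\
  &= 2\sigma^2 + 3\sigma_{AB}^2 + 4\sigma_{(B)Block}^2 + 12\sigma_B^2 \\[4pt]
E[MS_{Method:Loc} + MS_{(Loc):Block}]
  &= \left(\sigma^2 + 3\sigma_{AB}^2\right) + \left(\sigma^2 + 4\sigma_{(B)Block}^2\right) \\
  &= 2\sigma^2 + 3\sigma_{AB}^2 + 4\sigma_{(B)Block}^2 \\[4pt]
E[\text{numerator}] - E[\text{denominator}] &= 12\sigma_B^2
\end{aligned}
```

Under $H_0: \sigma_B^2 = 0$ the two expectations coincide, so the ratio behaves approximately like an F statistic whose degrees of freedom come from Satterthwaite's formula.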

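Some typical designs with blocks nested in random "environments"

Model 1
Fixed Genotypes
Environments = Random Locations
Blocks within Locations

| Source | df | MS | F | VC |
|---|---|---|---|---|
| G | g-1 | $M_G$ | $M_G/M_{GL}$ | $\sigma_G^2 = (M_G - M_{GL})/rl$ |
| L | l-1 | $M_L$ | synthetic | $\sigma_L^2 = (M_L - M_{B(L)} - M_{GL} + M_e)/rg$ |
| Block(Loc) | (r-1)l | $M_{B(L)}$ | $M_{B(L)}/M_e$ | $\sigma_{B(L)}^2 = (M_{B(L)} - M_e)/g$ |
| GxL | (g-1)(l-1) | $M_{GL}$ | $M_{GL}/M_e$ | $\sigma_{GL}^2 = (M_{GL} - M_e)/r$ |
| Error | (r-1)(g-1)l | $M_e$ | - | $\sigma_e^2 = M_e$ |

| Source | Expected Mean Squares |
|---|---|
| Geno | $\sigma_e^2 + r\sigma_{GL}^2 + lr \sum \gamma^2/(g-1)$ |
| Loc | $\sigma_e^2 + g\sigma_{B(L)}^2 + r\sigma_{GL}^2 + gr\sigma_L^2$ |
| Geno:Loc | $\sigma_e^2 + r\sigma_{GL}^2$ |
| Block(Loc) | $\sigma_e^2 + g\sigma_{B(L)}^2$ |

The varieties need to be tested using the Geno:Loc interaction because we are trying to make generalizations which are valid across all locations.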
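The variance-component formulas in the VC column are just the EMS equations solved for each component. The sketch below checks this round trip for the random terms of Model 1: it builds mean squares from made-up "true" components (purely hypothetical values, with g = 10 genotypes and r = 3 blocks per location) and then recovers them with the VC formulas. Function and key names are illustrative only.

```python
def model1_ems(sig, g, r):
    """Expected mean squares for the random terms of Model 1 (balanced data)."""
    return {
        "e":    sig["e"],
        "GL":   sig["e"] + r * sig["GL"],
        "B(L)": sig["e"] + g * sig["B(L)"],
        "L":    sig["e"] + g * sig["B(L)"] + r * sig["GL"] + g * r * sig["L"],
    }


def model1_variance_components(ms, g, r):
    """Method-of-moments estimators from the VC column of the Model 1 table."""
    return {
        "e":    ms["e"],
        "GL":   (ms["GL"] - ms["e"]) / r,
        "B(L)": (ms["B(L)"] - ms["e"]) / g,
        "L":    (ms["L"] - ms["B(L)"] - ms["GL"] + ms["e"]) / (r * g),
    }


# Hypothetical true components, used only to verify the algebra
true_sig = {"e": 2.0, "GL": 1.0, "B(L)": 0.5, "L": 3.0}
ms = model1_ems(true_sig, g=10, r=3)
est = model1_variance_components(ms, g=10, r=3)

print(ms)    # mean squares implied by the assumed components
print(est)   # components recovered by the VC formulas
```

With real data the mean squares are estimates, so the recovered components will not match any true values exactly and can even come out negative.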

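Model 2
Fixed Genotypes
Environments = Random Locations and Random Years (shared across locations)
Blocks within (Locations x Years)

| Source | df | MS | F | VC |
|---|---|---|---|---|
| G | g-1 | $M_G$ | synthetic | |
| L | l-1 | $M_L$ | synthetic | |
| Y | y-1 | $M_Y$ | synthetic | |
| Block(LxY) | (r-1)ly | $M_{B(LY)}$ | - | |
| GxL | (g-1)(l-1) | $M_{GL}$ | $M_{GL}/M_{GLY}$ | $(M_{GL} - M_{GLY})/ry$ |
| GxY | (g-1)(y-1) | $M_{GY}$ | $M_{GY}/M_{GLY}$ | $(M_{GY} - M_{GLY})/rl$ |
| GxLxY | (g-1)(l-1)(y-1) | $M_{GLY}$ | $M_{GLY}/M_e$ | $(M_{GLY} - M_e)/r$ |
| Error | (r-1)(g-1)ly | $M_e$ | - | $M_e$ |

| Source | Expected Mean Squares |
|---|---|
| Geno | $\sigma_e^2 + r\sigma_{GLY}^2 + rl\sigma_{GY}^2 + ry\sigma_{GL}^2 + lyr \sum \gamma^2/(g-1)$ |
| Loc | $\sigma_e^2 + r\sigma_{GLY}^2 + ry\sigma_{GL}^2 + g\sigma_{B(LY)}^2 + gr\sigma_{LY}^2 + gry\sigma_L^2$ |
| Year | $\sigma_e^2 + r\sigma_{GLY}^2 + rl\sigma_{GY}^2 + g\sigma_{B(LY)}^2 + gr\sigma_{LY}^2 + grl\sigma_Y^2$ |
| Block(Loc:Year) | $\sigma_e^2 + g\sigma_{B(LY)}^2$ |
| Geno:Loc | $\sigma_e^2 + r\sigma_{GLY}^2 + ry\sigma_{GL}^2$ |
| Geno:Year | $\sigma_e^2 + r\sigma_{GLY}^2 + rl\sigma_{GY}^2$ |
| Geno:Loc:Year | $\sigma_e^2 + r\sigma_{GLY}^2$ |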

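Model 3
Fixed Genotypes
Environments = Random Locations and Random Years within Locations (years not shared across locations)
Blocks within Years within Locations

| Source | df | MS | F | VC |
|---|---|---|---|---|
| G | g-1 | $M_G$ | $M_G/M_{GL}$ | |
| L | l-1 | $M_L$ | synthetic | |
| Y(L) | (y-1)l | $M_{Y(L)}$ | synthetic | |
| Block(YxL) | (r-1)yl | $M_{B(YL)}$ | - | |
| GxL | (g-1)(l-1) | $M_{GL}$ | $M_{GL}/M_{GY(L)}$ | $(M_{GL} - M_{GY(L)})/ry$ |
| GxY(L) | (g-1)(y-1)l | $M_{GY(L)}$ | $M_{GY(L)}/M_e$ | $(M_{GY(L)} - M_e)/r$ |
| Error | (r-1)(g-1)yl | $M_e$ | - | $M_e$ |

| Source | Expected Mean Squares |
|---|---|
| Geno | $\sigma_e^2 + r\sigma_{GY(L)}^2 + ry\sigma_{GL}^2 + lyr \sum \gamma^2/(g-1)$ |
| Loc | $\sigma_e^2 + r\sigma_{GY(L)}^2 + ry\sigma_{GL}^2 + g\sigma_{B(LY)}^2 + gr\sigma_{Y(L)}^2 + gry\sigma_L^2$ |
| Year(Loc) | $\sigma_e^2 + r\sigma_{GY(L)}^2 + g\sigma_{B(LY)}^2 + gr\sigma_{Y(L)}^2$ |
| Block(Year:Loc) | $\sigma_e^2 + g\sigma_{B(LY)}^2$ |
| Geno:Loc | $\sigma_e^2 + r\sigma_{GY(L)}^2 + ry\sigma_{GL}^2$ |
| Geno:Year(Loc) | $\sigma_e^2 + r\sigma_{GY(L)}^2$ |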