PLS205 Lab 6
February 13, 2014
Laboratory Topic 9

    A word about factorials
    Specifying interactions among factorial effects in SAS
    The relationship between factors and treatment
    Interpreting results of an experiment with a factorial treatment structure
    Visualizing simple and main effects
    Visualizing three-way interactions
    APPENDIX: The Almost Practically Complete Analysis of Example 6.1
    APPENDIX 2: Graphing in Excel

A word about factorials

A factorial is not an experimental design. Why? Because the term "factorial" merely describes the structure of the treatment effects (i.e. the factors), not how they are randomized. Specifically, a factorial treatment structure is one in which all levels of every factor are present in all possible combinations with all the levels of every other factor in the experiment (i.e. the crossing of factors is complete and orthogonal). It is this complete, orthogonal structure that allows an experimenter to gain insight into interactions among factors. Seen in this way, it becomes clear that any of the true experimental designs (i.e. randomization strategies) we have discussed so far (CRD, RCBD, Latin Square) can be factorials, provided the treatments are structured correctly.

    A factorial is a complete, orthogonal structure of treatment effects
    intended to provide insight into their interactions.

Specifying Interactions Among Factorial Effects in SAS

Factorial treatment structures are specified in the Model statement of Proc GLM, using one of two notations: stars (*) or bars (|).

Stars are used to partition out specific interactions from the Treatment SS and are useful when certain interactions must be used as error terms in custom F tests. Examples:

    Model Resp = a b a*b        partitions SST into main effect A, main effect B, and the interaction AxB
    Model Resp = a a*b a*b*c    specifies main effect A and the interactions AxB and AxBxC

Bars are a convenient shortcut that partitions the Treatment SS into all possible combinations of the included factors. On a standard PC keyboard, the bar symbol (|) is typed as Shift-\. Examples:

    Model Resp = a|b      is equivalent to    Model Resp = a b a*b
    Model Resp = a|b|c    is equivalent to    Model Resp = a b c a*b a*c b*c a*b*c

Another useful trick is the "@" symbol. Used together with bars (|), "@" specifies all possible combinations of the model factors up to a certain order (e.g. two-way effects), which saves a lot of typing. An example:

    Model Resp = Block a|b|c@2    is equivalent to    Model Resp = Block a b c a*b a*c b*c

Notice that this excludes the three-way effect a*b*c.
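One way to convince yourself that the two notations are interchangeable is to fit the same model both ways and compare the ANOVA tables. The following is only a sketch: it assumes the ThreeFact data set from Example 6.2 (later in this lab) has already been created, and it is not one of the lab's own program files.

```sas
* Sketch only: the same two-way model written in bar notation and in star notation;
* Assumes the ThreeFact data set from Example 6.2 already exists;
Proc GLM Data = ThreeFact;
  Class a b c;
  Model Resp = a|b|c@2;
Run;

Proc GLM Data = ThreeFact;
  Class a b c;
  Model Resp = a b c a*b a*c b*c;
Run; Quit;
```

Both steps should list the same six effects with the same degrees of freedom and sums of squares; only the order in which SAS prints the terms may differ.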

The Relationship Between Factors and Treatment

Until now, we have had only a single 'treatment' (the effect of which we are trying to understand) with zero (CRD), one (RCBD), or two (LS) blocking variables (the effects of which we are trying to account for but not really investigate). With factorials, we now have two or more 'factors' that are experimentally equivalent to the single 'treatment' variable from the first half of the course.

To illustrate this equivalence, reconsider Example 1 from Lab 3 (Topics 4-5): an experiment with 6 treatments (L08, L12, L16, H08, H12, H16), where L/H refers to Low/High temperatures and 8/12/16 refers to hours of light. This is exactly equivalent to having temperature as one factor and light as another, organized as a factorial:

    Model Growth = Treatment;                 df Treatment  = 5

    Model Growth = Temp Light Temp*Light;     df Temp       = 1
                                              df Light      = 2
                                              df Temp*Light = 2
                                              Sum           = 5

This shows that the old classification variable Treatment is simply a combination of two factors (light and temperature). Rewriting the model in terms of the factors does not affect the Model df at all; it simply expands the class variable Treatment into Temp, Light, and Temp*Light. Before, we accomplished this opening up of the treatment through orthogonal contrasts. The insights gained through each approach are equivalent.

Example 6.1 Two-Way ANOVA with interactions [Lab6ex1.sas]

In a study comparing the relative growth of five varieties of turfgrass (VARIETY) in three experimental soil mixtures (SOIL), six pots were prepared with each VARIETY-SOIL combination. The 90 pots were randomly allocated to six growth chambers (BLOCKS), and the dry matter yields were measured by clipping the plants at the end of four weeks. In this experiment, the researchers are interested only in these five varieties and three soil mixtures, so VARIETY and SOIL can be regarded as fixed factors.

    Data RCBDFactorial;
      Do Soil = 1 to 3;
        Do Variety = 1 to 5;
          Do Block = 1 to 6;
            Input Yield @@;
            Output;
          End;
        End;
      End;
    Cards;
    22.1 24.1 19.1 22.1 25.1 18.1
    27.1 15.1 20.6 28.6 15.1 24.6
    22.3 25.8 22.8 28.3 21.3 18.3
    19.8 28.3 26.8 27.3 26.8 26.8
    20.0 17.0 24.0 22.5 28.0 22.5
    13.5 14.5 11.5  6.0 27.0 18.0
    16.9 17.4 10.4 19.4 11.9 15.4
    15.7 10.2 16.7 19.7 18.2 12.2
    15.1  6.5 17.1  7.6 13.6 21.1
    21.8 22.8 18.8 21.3 16.3 14.3
    19.0 22.0 20.0 14.5 19.0 16.0
    20.0 22.0 25.5 16.5 18.0 17.5
    16.4 14.4 21.4 19.9 10.4 21.4
    24.5 16.0 11.0  7.5 14.5 15.5
    11.8 14.3 21.3  6.3  7.8 13.8
    ;

    * This model includes all main effects as well as the Soil*Variety interaction;
    Proc GLM Data = RCBDFactorial;
      Class Soil Variety Block;
      Model Yield = Soil|Variety Block;

    * Exploratory model to examine the Block*Soil and Block*Variety interactions (see discussion below);
    Proc GLM Data = RCBDFactorial;
      Class Soil Variety Block;
      Model Yield = Soil|Variety|Block@2;
    Run; Quit;

NOTE: This initial analysis enables us to see if the interaction is significant and decide: main or simple effects?

Take a look at the resultant ANOVA table:

                                  Sum of
    Source              DF       Squares    Mean Square   F Value   Pr > F
    Model               19   1361.153778      71.639673      3.45   <.0001
    Error               70   1451.637778      20.737683
    Corrected Total     89   2812.791556

    R-Square    Coeff Var    Root MSE    Yield Mean
    0.483916     24.69855    4.553865      18.43778

    Source           DF   Type III SS   Mean Square   F Value   Pr > F
    Soil              2   953.1562222   476.5781111     22.98   <.0001
    Variety           4    11.3804444     2.8451111      0.14   0.9680
    Soil*Variety      8   374.4882222    46.8110278      2.26   0.0330
    Block             5    22.1288889     4.4257778      0.21   0.9557

This is an RCBD with 6 blocks. Even though there are six replications per Soil-Variety combination (which allows us to include their interaction in the model), there is only one replication per Soil-Variety-Block combination. The upshot is that the Block*Factor interactions are inside the experimental error for this ANOVA. In other words, if the model statement had been:

    Model Yield = Soil|Variety|Block;

there would have been no variation left to estimate the error (df error = 0), because:

    Block*Soil            10 df
    Block*Variety         20 df
    Block*Soil*Variety    40 df
                          -----
                          70 df = df error
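The degrees of freedom quoted above follow from the usual rule that the df of an interaction is the product of the df of its component factors. As a worked check, write s = 3 soils, v = 5 varieties, and b = 6 blocks (symbols introduced here for convenience; they are not used elsewhere in the lab):

```latex
\begin{align*}
df_{\text{Total}} &= svb - 1 = (3)(5)(6) - 1 = 89 \\
df_{\text{Soil}} &= s - 1 = 2 \\
df_{\text{Variety}} &= v - 1 = 4 \\
df_{\text{Soil*Variety}} &= (s-1)(v-1) = 8 \\
df_{\text{Block}} &= b - 1 = 5 \\
df_{\text{Block*Soil}} &= (b-1)(s-1) = 10 \\
df_{\text{Block*Variety}} &= (b-1)(v-1) = 20 \\
df_{\text{Block*Soil*Variety}} &= (b-1)(s-1)(v-1) = 40 \\[4pt]
df_{\text{Model}} &= 2 + 4 + 8 + 5 = 19 \\
df_{\text{Error}} &= 10 + 20 + 40 = 70 = 89 - 19 \\[4pt]
F_{\text{Soil*Variety}} &= \frac{MS_{\text{Soil*Variety}}}{MSE}
  = \frac{46.8110}{20.7377} \approx 2.26
\end{align*}
```

The seven terms add up to the 89 total df, and the F value reproduces the one in the ANOVA table, which confirms that the 70 error df are made up entirely of the three Block interactions.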

We exclude the Block interactions with a single factor (Block*Soil and Block*Variety) from the model because, in general, we don't care about them (remember, we block to reduce the error term, not to gain understanding of the effect of blocking). In other words, this is a choice we make. Excluding the Block interaction with both factors (Block*Soil*Variety) is not a choice, however; it cannot be part of the model because it is the only term we have for our error.

Of course, we still want to check these Block*Treatment interactions to see if they are significant. If they are not significant, it is justifiable to relegate them to the error term. If they are significant, you can attempt a transformation, or at least be aware that they will inflate the MSE once they are taken out of the model. To test the Block*Treatment interactions, simply place them in an exploratory model (the second Proc GLM above):

                                  Sum of
    Source              DF       Squares    Mean Square   F Value   Pr > F
    Model               49   2040.397111      41.640757      2.16   0.0069
    Error               40    772.394444      19.309861
    Corrected Total     89   2812.791556

    R-Square    Coeff Var    Root MSE    Yield Mean
    0.725399     23.83313    4.394299      18.43778

    Source            DF   Type III SS   Mean Square   F Value   Pr > F
    Soil               2   953.1562222   476.5781111     24.68   <.0001
    Variety            4    11.3804444     2.8451111      0.15   0.9631
    Soil*Variety       8   374.4882222    46.8110278      2.42   0.0308
    Block              5    22.1288889     4.4257778      0.23   0.9476
    Soil*Block        10   242.2944444    24.2294444      1.25   0.2881  NS
    Variety*Block     20   436.9488889    21.8474444      1.13   0.3589  NS

A little side note about what interactions to include in your model

Although one can do exploratory work with different interactions in the model and then merge the Block*Treatment interactions into the error, you should always keep the treatment interactions in the model, whether significant or not. This makes the status of the interaction clear to the reader and saves you a lot of explaining. Sometimes in higher-order factorials (e.g. four factors), the highest-order interactions (e.g. four-way interactions) are dropped from the model if they are not significant, in order to simplify it.

Interpreting Results of an Experiment with a Factorial Treatment Structure

The above ANOVA results indicate that there are significant differences among soil mixtures but not among varieties. More importantly, however, they show that the interaction between these two factors is significant (i.e. the effects of soil are different for the different varieties, and vice versa). Because the interaction is significant, it is not appropriate to analyze the main effects. One must compare the soil means separately for each variety (simple effects).

Example 6.1b [Lab6ex1b.sas]

To analyze simple effects, you must first sort by one of the factors (in this case, Variety) and then run an ANOVA for each level of that factor:

    Proc Sort Data = RCBDFactorial;
      By Variety;
    Proc GLM Data = RCBDFactorial;
      Class Soil Block;
      Model Yield = Soil Block;
      Means Soil / Tukey;
      By Variety;
    Run; Quit;

The above code tells SAS to generate five different ANOVAs, one for each variety. The results:

    Variety   Treatment       Block          MSD    Tukey
    1         0.0519 NS       0.1822 NS      6.45   1 = 3    3 = 2
    2         0.0746 NS       0.5530 NS      7.15   1 = 3 = 2
    3         0.0130 **       0.3708 NS      5.90   1 = 3    3 = 2
    4         0.0041 ***      0.6843 NS      8.38   1    3 = 2
    5         0.0144 **       0.8428 NS      7.50   1 = 2    2 = 3

By investigating the simple effects, we see that only some varieties are significantly affected by the soil mixture. The MSE and means separation tests vary across varieties.

Visualizing Simple and Main Effects

Example 6.1c

    proc gplot data=rcbdfactorial;

      ** Main effect plots **;
      axis1 offset=(5 pct,5 pct);
      axis2 offset=(5 pct,5 pct);
      symbol1 i=std1mtj v=none color=blue;
      plot Yield * Soil = 1 / description="means plot of Yield by Soil";
      run;

      axis1 offset=(5 pct,5 pct);
      axis2 offset=(5 pct,5 pct);
      symbol1 i=std1mtj v=none color=red;
      plot Yield * Variety = 1 / description="means plot of Yield by Variety";
      run;

      ** Two-way Plots **;
      axis1 offset=(5 pct,5 pct);
      axis2 offset=(5 pct,5 pct);
      symbol1 i=std1mtj v=none color=blue;
      symbol2 i=std1mtj v=none color=black;
      symbol3 i=std1mtj v=none color=green;
      symbol4 i=std1mtj v=none color=orange;
      symbol5 i=std1mtj v=none color=red;
      plot Yield * Soil = Variety / description="means plot of Yield by Soil and Variety";
      run;

      axis1 offset=(5 pct,5 pct);
      axis2 offset=(5 pct,5 pct);
      symbol1 i=std1mtj v=none color=blue;
      symbol2 i=std1mtj v=none color=black;
      symbol3 i=std1mtj v=none color=green;
      plot Yield * Variety = Soil / description="means plot of Yield by Variety and Soil";
      run;
    quit;

The option 'i=std1mtj' in the symbol statement determines several details of the plot: for each mean, an interval of 1 standard error (std1) to either side of the mean (m) is shown; each interval has a top and bottom line (t); and the means are joined (j). 'color' sets the color (the options include black, red, blue, green, cyan, gold). 'v' determines the symbol used for the individual observations; in this example (v=none) the individual observations are not shown in the figure.

Don't get too worried about the code! It is simply connecting the means and adding standard errors. Remember that standard error bars indicate how precisely each mean is estimated.

Plots of the main effects

[Figures: means plots of Yield by Soil and of Yield by Variety, with standard error bars]

These plots show the main effects of soil and variety on yield. In the case of having a NS interaction, the implication is that each factor affects the response variable independent of the other; so consideration of the main effects alone would be sufficient. [NOTE: Of course, this is not the case in this example.]

The interaction plots

[Figures: means plots of Yield by Soil for each Variety, and of Yield by Variety for each Soil, with standard error bars]

The non-parallel nature of the lines in this interaction plot demonstrates visually the significant interaction we found in the ANOVA.

DON'T FORGET: As always, we need to test the assumptions of these tests. In this particular example, there are eight different ANOVAs (one among varieties within each of the three soil mixtures, and one among soil mixtures within each of the five varieties), and the assumptions of each must be met. See the appendix at the end of this lab for the full, un-cut procedure.

Example 6.2 Three-Way ANOVA with one replication [Lab6ex2.sas]

The following is the code for a generic CRD with a 3x5x2 factorial treatment structure:

    Data ThreeFact;
      Input a b c resp @@;
    Cards;
    1 1 1  61   2 1 1  38   3 1 1  81
    1 1 2  31   2 1 2  27   3 1 2 113
    1 2 1  39   2 2 1  61   3 2 1  49
    1 2 2  68   2 2 2 103   3 2 2 143
    1 3 1 121   2 3 1  82   3 3 1  41
    1 3 2  78   2 3 2  57   3 3 2  63
    1 4 1  79   2 4 1  68   3 4 1  59
    1 4 2 122   2 4 2 127   3 4 2 167
    1 5 1  91   2 5 1  31   3 5 1  61
    1 5 2  92   2 5 2  43   3 5 2 128
    ;
    Proc GLM Data = ThreeFact;
      Class a b c;
      Model Resp = a|b|c;
    Run; Quit;

Running the program like this will make you sad because there are zero degrees of freedom for the error term and thus no estimation of the error SS. The result? A bunch of dots. The solution to this problem is to assume that there is no three-way interaction, allowing us to then use the three-way interaction as an estimate of the experimental error. To do this, modify the model statement above as follows:

    Model Resp = a|b|c@2;

and re-run the program. The results:

    Source   DF   Type III SS   Mean Square   F Value   Pr > F
    a         2   3599.266667   1799.633333    620.56   <.0001  ***
    b         4   6423.133333   1605.783333    553.72   <.0001  ***
    c         1   5333.333333   5333.333333   1839.08   <.0001  ***
    a*b       8   9675.066667   1209.383333    417.03   <.0001  ***
    a*c       2   5692.466667   2846.233333    981.46   <.0001  ***
    b*c       4   7987.000000   1996.750000    688.53   <.0001  ***

You should also be able to determine which assumptions to test here and how to test them.

Visualizing Three-Way Interactions

Can't we do better than just assume a three-way interaction to be NS? What is a three-way interaction anyway? Though words may only confuse the issue here, one way to think about it might be:

    A three-way interaction exists if the character of the interaction between
    two factors differs among the different levels of a third factor.

Difficult to articulate but easy to visualize. Walk through the following steps to see how one can cleverly visualize three-way interactions in a two-dimensional plot:

1. Open the Word file ThreeWayInteraction.doc. Familiarize yourself with how the new dependent variable C1-C2 was created:

       A   B   C   C1-C2   Resp
       1   1   1      30     61
       1   1   2      30     31

   The new variable (C1-C2) is simply the effect of C1 relative to C2 for any given combination of levels of Factors A and B. [Side note: If C had three levels (C1, C2, C3) instead of just two, the procedure outlined here would have to be carried out for three new variables (C1-C2, C1-C3, and C2-C3) instead of just one.]

2. Set up your graph with C1-C2 as the DEPENDENT variable (Y-axis) and A and B as the CLASS variables (A on the X-axis, and B as the group variable). See the example below. The C1-C2 variable replaces the response variable as the dependent variable.

              a1     a2     a3
       b1     30     11    -32
       b2    -29    -42    -94
       b3     43     25    -22
       b4    -43    -59   -108
       b5     -1    -12    -67

The output (it's like seeing in four dimensions!)

[Figure: line plot of C1-C2 (Y-axis, from about -120 to 60) against the levels a1, a2, a3 of Factor A (X-axis), with one line for each level of B (b1 to b5)]

One way to think about this: Each line represents one level of B, and the average of each line represents the effect of C for that level of B. While these averages differ among lines (i.e. B*C is significant), their differences are fairly constant across all levels of A. In other words, the roughly parallel nature of the lines in this interaction plot shows us that the differences in the effects of C at the different levels of B do not vary significantly across the levels of A. [Translation: No significant three-way interaction, so we are justified in using A*B*C as our error term.]

Phew!

APPENDIX: The Almost Practically Complete Analysis for Example 6.1

Step 1: Decide if you need to analyze simple effects

    Data RCBDFactorial;
      Do Soil = 1 to 3;
        Do Variety = 1 to 5;
          Do Block = 1 to 6;
            Input Yield @@;
            Output;
          End;
        End;
      End;
    Cards;
    22.1 24.1 19.1 22.1 25.1 18.1
    27.1 15.1 20.6 28.6 15.1 24.6
    22.3 25.8 22.8 28.3 21.3 18.3
    19.8 28.3 26.8 27.3 26.8 26.8
    20.0 17.0 24.0 22.5 28.0 22.5
    13.5 14.5 11.5  6.0 27.0 18.0
    16.9 17.4 10.4 19.4 11.9 15.4
    15.7 10.2 16.7 19.7 18.2 12.2
    15.1  6.5 17.1  7.6 13.6 21.1
    21.8 22.8 18.8 21.3 16.3 14.3
    19.0 22.0 20.0 14.5 19.0 16.0
    20.0 22.0 25.5 16.5 18.0 17.5
    16.4 14.4 21.4 19.9 10.4 21.4
    24.5 16.0 11.0  7.5 14.5 15.5
    11.8 14.3 21.3  6.3  7.8 13.8
    ;
    Proc GLM Data = RCBDFactorial;
      Class Soil Variety Block;
      Model Yield = Soil|Variety Block;
    Proc GLM Data = RCBDFactorial;
      Class Soil Variety Block;
      Model Yield = Soil|Variety|Block@2;
    Run; Quit;

Notice there are two Proc GLMs in this code. The first features the model we're interested in, and we run it to see if there is a significant Soil*Variety interaction (i.e. to see if we should analyze main or simple effects). The second is what we call an "exploratory model", used to check the significance of the Block*Soil and Block*Variety interactions. The output:

First Proc GLM

    Source           DF   Type III SS   Mean Square   F Value   Pr > F
    Soil              2   953.1562222   476.5781111     22.98   <.0001  ***
    Variety           4    11.3804444     2.8451111      0.14   0.9680
    Soil*Variety      8   374.4882222    46.8110278      2.26   0.0330  *
    Block             5    22.1288889     4.4257778      0.21   0.9557

There is a significant Soil*Variety interaction, so we must look at simple effects.

Second Proc GLM

    Source            DF   Type III SS   Mean Square   F Value   Pr > F
    Soil*Block        10   242.2944444    24.2294444      1.25   0.2881  NS
    Variety*Block     20   436.9488889    21.8474444      1.13   0.3589  NS

Neither of these block interactions is significant, so we're justified in merging them into the error (and gaining 30 df by doing so).

Step 2: Analyze the simple effect of Soil (i.e. for each Variety separately)

    Data RCBDFactorial;
      Do Soil = 1 to 3;
        Do Variety = 1 to 5;
          Do Block = 1 to 6;
            Input Yield @@;
            Output;
          End;
        End;
      End;
    Cards;
    22.1 24.1 19.1 22.1 25.1 18.1
    27.1 15.1 20.6 28.6 15.1 24.6
    22.3 25.8 22.8 28.3 21.3 18.3
    19.8 28.3 26.8 27.3 26.8 26.8
    20.0 17.0 24.0 22.5 28.0 22.5
    13.5 14.5 11.5  6.0 27.0 18.0
    16.9 17.4 10.4 19.4 11.9 15.4
    15.7 10.2 16.7 19.7 18.2 12.2
    15.1  6.5 17.1  7.6 13.6 21.1
    21.8 22.8 18.8 21.3 16.3 14.3
    19.0 22.0 20.0 14.5 19.0 16.0
    20.0 22.0 25.5 16.5 18.0 17.5
    16.4 14.4 21.4 19.9 10.4 21.4
    24.5 16.0 11.0  7.5 14.5 15.5
    11.8 14.3 21.3  6.3  7.8 13.8
    ;
    Proc Sort Data = RCBDFactorial;
      By Variety;
    Proc GLM Data = RCBDFactorial;
      Class Soil Block;
      Model Yield = Soil Block;
      Means Soil / Tukey;
      By Variety;
      Output Out = PR r = res p = pred;
    Proc Print Data = PR;
    Proc Univariate normal data = PR;
      Var res;
      By Variety;
    Proc GLM data = RCBDFactorial;
      Class Soil;
      Model Yield = Soil;
      Means Soil / hovtest = Levene;
      By Variety;
    Proc GLM Data = PR;
      Class Soil Block;
      Model Yield = Soil Block pred*pred;
      By Variety;
    Proc Plot Data = PR;
      Plot res*pred;
      By Variety;
    Run; Quit;

The first Proc GLM carries out five separate ANOVAs, one for each Variety; it also generates predicted and residual values. The Proc Univariate tests for normality of residuals within each ANOVA. The second Proc GLM conducts Levene's tests for Soil within each level of Variety. And the last Proc GLM tests for nonadditivity within each of the five models. The output is extensive but can be organized as shown below:

Normality of residuals (Variety 1 to Variety 5)

    Variety   Test           --Statistic---    -----p Value------
    1         Shapiro-Wilk   W     0.962536    Pr < W      0.6512
    2         Shapiro-Wilk   W     0.954449    Pr < W      0.4990
    3         Shapiro-Wilk   W     0.954754    Pr < W      0.5043
    4         Shapiro-Wilk   W     0.991391    Pr < W      0.9996
    5         Shapiro-Wilk   W     0.945644    Pr < W      0.3608

Homogeneity of variances (Variety 1 to Variety 5)

    Levene's Test for Homogeneity of Yield Variance
    ANOVA of Squared Deviations from Group Means

                              Sum of       Mean
    Variety   Source   DF    Squares     Square   F Value   Pr > F
    1         Soil      2     4967.3     2483.7      2.16   0.1502
    2         Soil      2     1479.1      739.5      3.59   0.0532
    3         Soil      2      128.7    64.3531      0.37   0.6985
    4         Soil      2     1425.9      713.0      0.93   0.4157
    5         Soil      2      734.1      367.1      0.93   0.4161

Nonadditivity (Variety 1 to Variety 5)

    Variety   Source      DF     Type I SS    Mean Square   F Value   Pr > F
    1         pred*pred    1    53.2354203     53.2354203      4.25   0.0693
    2         pred*pred    1     8.3170588      8.3170588      0.38   0.5517
    3         pred*pred    1     0.1770860      0.1770860      0.01   0.9171
    4         pred*pred    1   105.6521057    105.6521057      5.44   0.0446
    5         pred*pred    1    56.8633800     56.8633800      3.05   0.1148

The only assumption we violate is nonadditivity within Variety 4 (though a few others are close). At this point, you could try to transform the data for Variety 4 to bring that subset of your data into alignment with the ANOVA assumptions. But since you're already able to detect differences among soils within Variety 4 (see the summary of Tukey separations below), you may decide that transforming is not worth it.

    Variety   Treatment       Block          MSD    Tukey
    1         0.0519 NS       0.1822 NS      6.45   1 = 3    3 = 2
    2         0.0746 NS       0.5530 NS      7.15   1 = 3 = 2
    3         0.0130 **       0.3708 NS      5.90   1 = 3    3 = 2
    4         0.0041 ***      0.6843 NS      8.38   1    3 = 2
    5         0.0144 **       0.8428 NS      7.50   1 = 2    2 = 3

To be truly comprehensive in our analysis, we should also analyze the differences among varieties for each of the soils.
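As an aside, Proc GLM can also test simple effects in both directions from the single factorial model, using the Slice option of the LSMeans statement. The sketch below is an addition to this lab rather than one of its program files, and it assumes the RCBDFactorial data set from Example 6.1. Because the sliced tests use the pooled error of the full model (70 df), their F and p values will generally differ from those of the separate by-group ANOVAs, each of which estimates its own error.

```sas
* Sketch only, not part of the original lab programs;
Proc GLM Data = RCBDFactorial;
  Class Soil Variety Block;
  Model Yield = Soil|Variety Block;
  * Test the effect of Soil separately within each Variety;
  LSMeans Soil*Variety / Slice = Variety;
  * Test the effect of Variety separately within each Soil;
  LSMeans Soil*Variety / Slice = Soil;
Run; Quit;
```

The by-group approach used in this lab has the advantage that the assumptions can be checked separately within each group, so it is the one carried through here for the varieties within each soil.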
To analyze the varieties within each soil, sort by Soil instead of Variety and replace all the "by Variety" commands with "by Soil" commands in the code. The Class and Model statements must change as well, resulting in a final program like the one below:

    Data RCBDFactorial;
      Do Soil = 1 to 3;
        Do Variety = 1 to 5;
          Do Block = 1 to 6;
            Input Yield @@;
            Output;
          End;
        End;
      End;
    Cards;
    22.1 24.1 19.1 22.1 25.1 18.1
    27.1 15.1 20.6 28.6 15.1 24.6
    22.3 25.8 22.8 28.3 21.3 18.3
    19.8 28.3 26.8 27.3 26.8 26.8
    20.0 17.0 24.0 22.5 28.0 22.5
    13.5 14.5 11.5  6.0 27.0 18.0
    16.9 17.4 10.4 19.4 11.9 15.4
    15.7 10.2 16.7 19.7 18.2 12.2
    15.1  6.5 17.1  7.6 13.6 21.1
    21.8 22.8 18.8 21.3 16.3 14.3
    19.0 22.0 20.0 14.5 19.0 16.0
    20.0 22.0 25.5 16.5 18.0 17.5
    16.4 14.4 21.4 19.9 10.4 21.4
    24.5 16.0 11.0  7.5 14.5 15.5
    11.8 14.3 21.3  6.3  7.8 13.8
    ;
    Proc Sort Data = RCBDFactorial;
      By Soil;
    Proc GLM Data = RCBDFactorial;
      Class Variety Block;
      Model Yield = Variety Block;
      Means Variety / Tukey;
      By Soil;
      Output Out = PR r = res p = pred;
    Proc Univariate normal data = PR;
      Var res;
      By Soil;
    Proc GLM data = RCBDFactorial;
      Class Variety;
      Model Yield = Variety;
      Means Variety / hovtest = Levene;
      By Soil;
    Proc GLM Data = PR;
      Class Variety Block;
      Model Yield = Variety Block pred*pred;
      By Soil;
    Proc Plot data = PR;
      Plot res*pred;
      By Soil;
    Run; Quit;

And the results:

Normality of residuals (Soil 1 to Soil 3)

    Soil   Test           --Statistic---    -----p Value------
    1      Shapiro-Wilk   W     0.975394    Pr < W      0.6943
    2      Shapiro-Wilk   W     0.977548    Pr < W      0.7573
    3      Shapiro-Wilk   W     0.976278    Pr < W      0.7204

Homogeneity of variances (Soil 1 to Soil 3)

    Levene's Test for Homogeneity of Yield Variance
    ANOVA of Squared Deviations from Group Means

                            Sum of       Mean
    Soil   Source    DF    Squares     Square   F Value   Pr > F
    1      Variety    4     2008.6      502.2      2.47   0.0705
    2      Variety    4     4763.3     1190.8      1.40   0.2620
    3      Variety    4     1963.4      490.8      0.87   0.4950

Nonadditivity (Soil 1 to Soil 3)

    Soil   Source      DF     Type I SS   Mean Square   F Value   Pr > F
    1      pred*pred    1    1.27330875    1.27330875      0.07   0.7914
    2      pred*pred    1    72.8265041    72.8265041      2.90   0.1049
    3      pred*pred    1    17.2174555    17.2174555      1.07   0.3129

All assumptions are nicely met, so we can report the ANOVA results for Variety without reservations:

    Soil   Treatment      Block          MSD    Tukey
    1      0.3947 NS      0.7011 NS      7.10   4=3=5=2=1
    2      0.4435 NS      0.9244 NS      9.06   5=3=2=1=4
    3      0.0347 *       0.0950 NS      6.93   2=1=3=4    1=3=4=5

Interesting. While the overall ANOVA detected no differences among Varieties, here we see that within Soil 3, differences are in fact present.

This is an "almost practically complete analysis" because a complete analysis would require commentary (i.e. interpretation) of all the results generated above, a discussion as to which variety-soil combinations are recommended or not recommended, etc. One should also make efforts to visualize the data using bar charts or interaction plots. The thing to realize is that, even for a simple example like this, the necessary analysis can be substantial.

An added thorn: This analysis of simple effects involves an enormous number of separate questions: 8 Shapiro-Wilk tests, 8 Levene's tests, 8 nonadditivity tests, and 45 Tukey pairwise comparisons! This has major implications in terms of the experimentwise error rate, so be aware!

APPENDIX 2: Graphing in Excel

The same plots can easily be obtained in Excel by organizing the cell means into series (rows) as below:

             Var1   Var2   Var3   Var4   Var5
    Soil1    21.8   21.9   23.1   26.0   22.3
    Soil2    15.1   15.2   15.5   13.5   19.2
    Soil3    18.4   19.9   17.3   14.8   12.6

and then selecting Insert -> Line -> 2-D Line.

[Figure: Excel line chart of mean yield (Y-axis, 10.0 to 30.0) for Var1 to Var5 (X-axis), with one line each for Soil1, Soil2, and Soil3]

Error bars can be added by selecting Chart Tools -> Layout -> Error Bars -> Custom -> Specify Value and selecting the rows for each series from a table of standard errors (SE) organized like the table of means above:

    SE       Var1   Var2   Var3   Var4   Var5
    Soil1     1.1    2.4    1.4    1.3    1.5
    Soil2     2.9    1.4    1.5    2.3    1.4
    Soil3     1.1    1.4    1.8    2.3    2.2

The non-parallel nature of the lines in this interaction plot demonstrates visually the significant interaction we found in the ANOVA.
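The means and standard errors typed into Excel have to be computed first. One way to obtain them, sketched here under the assumption that the RCBDFactorial data set from Example 6.1 has already been created, is Proc Means with both factors as class variables. The output data set name CellStats and the variable names MeanYield and SEYield are hypothetical choices made for this illustration.

```sas
* Sketch only: mean and standard error of Yield for each Soil-Variety cell;
* CellStats, MeanYield and SEYield are hypothetical names chosen for this example;
Proc Means Data = RCBDFactorial NWay NoPrint;
  Class Soil Variety;
  Var Yield;
  Output Out = CellStats Mean = MeanYield StdErr = SEYield;
Proc Print Data = CellStats;
  Var Soil Variety MeanYield SEYield;
Run;
```

Each of the 15 rows of CellStats corresponds to one cell of the two Excel tables. For Soil 1 with Variety 1, for instance, the six yields (22.1, 24.1, 19.1, 22.1, 25.1, 18.1) give a mean of about 21.8 and a standard error of about 1.1, matching the tables.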