PLS205 Lab 2 January 15, Laboratory Topic 3

Similar documents
Laboratory Topics 4 & 5

PLS205 KEY Winter Homework Topic 3. The following represents one way to program SAS for this question:

Topic 6. Two-way designs: Randomized Complete Block Design [ST&D Chapter 9 sections 9.1 to 9.7 (except 9.6) and section 15.8]

Lecture 7 Randomized Complete Block Design (RCBD) [ST&D sections (except 9.6) and section 15.8]

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.

df=degrees of freedom = n - 1

Topic 13. Analysis of Covariance (ANCOVA) [ST&D chapter 17] 13.1 Introduction Review of regression concepts

PLS205 Lab 6 February 13, Laboratory Topic 9

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

PLS205 Winter Homework Topic 8

4.8 Alternate Analysis as a Oneway ANOVA

Power Analysis for One-Way ANOVA

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 20: Single Factor Analysis of Variance

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

Lecture 3: Inference in SLR

Week 7.1--IES 612-STA STA doc

Analysis of Variance

Topic 28: Unequal Replication in Two-Way ANOVA

Orthogonal contrasts and multiple comparisons

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Lecture 15 Topic 11: Unbalanced Designs (missing data)

5.3 Three-Stage Nested Design Example

Linear Combinations of Group Means

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Sample Size / Power Calculations

General Linear Model (Chapter 4)

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

Topic 13. Analysis of Covariance (ANCOVA) - Part II [ST&D Ch. 17]

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

Analysis of variance

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test.

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Chapter 11. Analysis of Variance (One-Way)

N J SS W /df W N - 1

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

Lecture 13 Extra Sums of Squares

Single Factor Experiments

One-Way Analysis of Variance. With regression, we related two quantitative, typically continuous variables.

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

STAT 115:Experimental Designs

ANOVA (Analysis of Variance) output RLS 11/20/2016

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

ST505/S697R: Fall Homework 2 Solution.

Lecture 4. Random Effects in Completely Randomized Design

Statistics 512: Applied Linear Models. Topic 9

Chapter 11 - Lecture 1 Single Factor ANOVA

22s:152 Applied Linear Regression. Take random samples from each of m populations.

This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed.

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

A discussion on multiple regression models

Lecture 4. Checking Model Adequacy

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

SAS/STAT 15.1 User s Guide The GLMMOD Procedure

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:

Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes)

RCB - Example. STA305 week 10 1

SplineLinear.doc 1 # 9 Last save: Saturday, 9. December 2006

Design of Experiments. Factorial experiments require a lot of resources

Outline. PubH 5450 Biostatistics I Prof. Carlin. Confidence Interval for the Mean. Part I. Reviews

Topic 14: Inference in Multiple Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Ch 2: Simple Linear Regression

Introduction to the Analysis of Variance (ANOVA)

Lecture 11: Simple Linear Regression

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan

Orthogonal and Non-orthogonal Polynomial Constrasts

Lecture 5: Comparing Treatment Means Montgomery: Section 3-5

Confidence Interval for the mean response

1 Tomato yield example.

Answer to exercise: Blood pressure lowering drugs

Lec 1: An Introduction to ANOVA

Answer Keys to Homework#10

IX. Complete Block Designs (CBD s)

Assignment 9 Answer Keys

Lecture 4 Topic 3: General linear models (GLMs), the fundamentals of the analysis of variance (ANOVA), and completely randomized designs (CRDs)

Incomplete Block Designs

STA 303H1F: Two-way Analysis of Variance Practice Problems

Repeated Measures Part 2: Cartoon data

2 Prediction and Analysis of Variance

STOR 455 STATISTICAL METHODS I

Orthogonal contrasts for a 2x2 factorial design Example p130

One-way ANOVA Model Assumptions

Addition of Center Points to a 2 k Designs Section 6-6 page 271

ANOVA Longitudinal Models for the Practice Effects Data: via GLM

SAS/STAT 12.3 User s Guide. The GLMMOD Procedure (Chapter)

COMPARING SEVERAL MEANS: ANOVA

Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED. Maribeth Johnson Medical College of Georgia Augusta, GA

Inferences for Regression

Introduction. Chapter 8

Comparison of a Population Means

2. Treatments are randomly assigned to EUs such that each treatment occurs equally often in the experiment. (1 randomization per experiment)

Chapter 31 The GLMMOD Procedure. Chapter Table of Contents

Transcription:

PLS205 Lab 2 January 15, 2015 Laboratory Topic 3 General format of ANOVA in SAS Testing the assumption of homogeneity of variances by "/hovtest" by ANOVA of squared residuals Proc Power for ANOVA One-way ANOVA of nested design by Proc Nested by Proc GLM Obtaining and interpreting Components of Variance by Proc VarComp ANOVA in SAS The primary SAS procedures for analysis of variance are Proc ANOVA and Proc GLM (General Linear Model). Proc ANOVA assumes that the dataset is "balanced" (i.e. that it has the same number of replications r for each treatment), though it can be used for single factor ANOVA even if the design is unbalanced. Proc GLM is more general and can be used for data not meeting this restriction; this is the procedure we will be using. Proc GLM has two required statements: Class and Model. CLASS: With the Class statement, you declare all the classification variables used in the linear model. All classification variable appearing in the model must first be declared here. The syntax is: Class variables ; MODEL: With the Model statement, you use the declared classification variables to build an explanatory linear model for the response variable (explained below). The syntax is: Model dependent variable = independent effects; The Model statement is a minimalist representation of a general linear effects model, where a response (i.e. dependent variable) is explained by a host of known additive deviations from a base mean: Y i = μ + κ 1 + κ 2 +... + κ n + ε i For a completely randomized design (CRD) with one treatment variable, the independent effect is simply the one classification variable and the dependent variable is simply the response. But there can be more complex models: Model y = a b ; -> main effects Model y = a b a*b ; -> interaction PLS205 2015 2.1 Lab 2 (Topic 3)

Example 3.1 From ST&D pg. 141 [Lab3ex1.sas] In this experiment, the nitrogen fixation capacities of six different rhizobia on clover are compared. The experiment is arranged as a CRD with 6 treatments (i.e. 6 different rhizobia strains) and five independent replications per treatment. Data Clover; Input Culture $ Nlevel; Cards; 3DOk1 24.1 3DOk1 32.6 3DOk1 27 3DOk1 28.9 3DOk1 31.4 3DOk5 19.1 3DOk5 24.8 3DOk5 26.3 3DOk5 25.2 3DOk5 24.3 ; Proc GLM; Class Culture; Model Nlevel = Culture; Means Culture; Proc Plot; Plot Nlevel*culture; Run; Quit; 3DOk4 17.9 3DOk4 16.5 3DOk4 10.9 3DOk4 11.9 3DOk4 15.8 3DOk7 20.7 3DOk7 23.4 3DOk7 20.5 3DOk7 18.1 3DOk7 16.7 3DOk13 14.3 3DOk13 14.4 3DOk13 11.8 3DOk13 11.6 3DOk13 14.2 Comp 17.3 Comp 19.4 Comp 19.1 Comp 16.9 Comp 20.8 * Response variable = Class variable; * Gives us the means and stdevs of the treatments; * Generates plot of NLevel (y) vs. Culture (x); Results of the GLM procedure: Dependent Variable: Nlevel Sum of Source DF Squares Mean Square F Value Pr > F Model 5 845.717667 169.143533 25.36 <.0001 Error 24 160.052000 6.668833 Corrected Total 29 1005.769667 R-Square Coeff Var Root MSE Nlevel Mean 0.840866 13.00088 2.582408 19.86333 Source DF Type III SS Mean Square F Value Pr > F Culture 5 845.7176667 169.1435333 25.36 <.0001 Interpretation Recall that the null hypothesis of an ANOVA is that all means are equal (H 0 : μ 1 = μ 2 = = μ n ) while the alternate hypothesis is that at least one mean is not equal (H 1 : μ i μ j ). With a p-value of less than 0.0001, we soundly reject H 0. A significant difference exists among the treatment means. Notice that the R-Square value is simply a measure of the amount of variation explained by the model: R-Square = 845.717667/ 1005.769667= 0.840866 PLS205 2015 2.2 Lab 2 (Topic 3)

Sums of Squares (SS) When you run an ANOVA, SAS will produce two different tables by default, one featuring "Type I SS," the other featuring "Type III SS." These Types have nothing to do with Type I and Type II errors. We'll discuss these SS Types later in the term; for now, just obey the following: 1. Always report Type III SS when performing ANOVAs. 2. Always report Type I SS when performing regressions. Testing the Assumption of Homogeneity of Variances When you perform an ANOVA, you assert that the data under consideration meet the assumptions for which an ANOVA is valid. We briefly discussed normality last week and will revisit it more formally next week. Now we will cover a second assumption, that the variances of all compared treatments are homogeneous. by / HovTest You can add the option "/ HovTest = Levene" to the Means statement following the Proc GLM. Levene s test is an ANOVA of the squares of the residuals (i.e. deviations of the observations from their expected values based on the linear model; in this case, this means deviations of the observation from 2 their respective treatment means Y Y ) ). To see how this works, modify the Means statement in Example 3.1 to read as follows: ( ij i Proc GLM; Class Culture; Model Nlevel = Culture; Means Culture / HovTest = Levene; Result of Levene's Test: Levene's Test for Homogeneity of Nlevel Variance ANOVA of Squared Deviations from Group Means Sum of Mean Source DF Squares Square F Value Pr > F Culture 5 227.0 45.3982 1.15 0.3606 Error 24 945.2 39.3821 Interpretation H 0 for Levene's Test is that the variances of the treatments are homogeneous, and H a is that they are not homogeneous. With a p-value of 0.3868, we fail to reject H 0 ; thus we have no evidence, at 95% confidence level, that the variances are not homogeneous. by ANOVA of Squared Residuals As stated above, the "/ HovTest = Levene" option performs an ANOVA of the squared residuals. To see that this is the case, we can program SAS the following way: PLS205 2015 2.3 Lab 2 (Topic 3)

Example 3.2 [Lab3ex2.sas] Proc GLM; Class Culture; Model Nlevel = Culture; Means Culture; Output Out = Residual R = Res1 P = Pred1; * Creates data set called Residual that includes all the variables as before plus two new ones: Res1 and Pred1. R tells SAS to calculate the Residuals and we name that new variable Res1 P tells SAS to calculate the Predicted and we name that new variable Pred1; Proc Print Data = Residual; Proc Plot Data = Residual; Plot Nlevel*Culture; Plot Res1*Pred1; * Take a look at this data set Residual; * It is now necessary to specify the data set; Data Levene; * Creates new data set called Levene (or any name you like); Set Residual; * Imports the data from the data set Residual; SqRes = Res1*Res1; * Adds to it a new variable called SqRes that is the square of the residuals; Proc Print Data = Levene; * Take a look at this data set Levene; Proc GLM Data = Levene; Class Culture; Model SqRes = Culture; * Perform an ANOVA of the variable SqRes; Run; Quit; Results of the GLM Procedure: Dependent Variable: SqRes Source DF Type III SS Mean Square F Value Pr > F Culture 5 226.9909083 45.3981817 1.15 0.3606 It is no coincidence that we obtain the same p-value (0.3606) as in the previous case where we used the /HovTest command!! Example 3.3 [Lab3ex3.sas] Proc Power and ANOVA From SAS, we get the mean of each treatment, and the standard deviation for each treatment. PLS205 2015 2.4 Lab 2 (Topic 3)

Level of ------------Nlevel----------- Culture N Mean Std Dev 3DOk1 5 28.8000000 3.41101158 3DOk13 5 13.2600000 1.42758537 3DOk4 5 14.6000000 3.03809151 3DOk5 5 23.9400000 2.80410414 3DOk7 5 19.8800000 2.58495648 Comp 5 18.7000000 1.60156174 To get the pooled standard deviation, we first square each standard deviation (to get the variance), and then calculate as before: Pooled Standard Deviation = SQRT ((11.635+2.037999989+9.230000023+7.86300002 +6.68200000 +2.565000007)/6) = 2.582408 Please note that the Pooled Standard Deviation is equal to the Root Mean Square Error (MSE) that we found in the ANOVA: 2.582408 proc power; onewayanova test=overall_f groupmeans = 28.8 13.26 14.6 23.94 19.88 18.7 stddev = 2.582408438 npergroup = 5 power =.; run; Overall F Test for One-Way ANOVA Method Exact Group Means 28.8 13.26 14.6 23.94 19.88 18.7 Standard Deviation 2.582408 Sample Size Per Group 5 Alpha 0.05 With multiple sample sizes Power >.999 proc power; onewayanova test=overall_f groupmeans = 28.8 13.26 14.6 23.94 19.88 18.7 stddev = 2.582408438 npergroup = 2 to 10 by 1 power =.; run; Overall F Test for One-Way ANOVA Method Exact Group Means 28.8 13.26 14.6 23.94 19.88 18.7 Standard Deviation 2.582408 Alpha 0.05 N Per Index Group Power PLS205 2015 2.5 Lab 2 (Topic 3)

1 2 0.957 2 3 >.999 3 4 >.999 4 5 >.999 5 6 >.999 6 7 >.999 7 8 >.999 8 9 >.999 9 10 >.999 One-Way ANOVA of Nested Design The following example will demonstrate how to analyze data if it has more than one observation per experimental unit using Proc GLM. The example also demonstrates how to calculate the estimates for the variance components using ProcVarcomp. Example 3.4 ST&D pg. 157-170 [Lab3ex5.sas] In this experiment, the effects of different treatments on the growth of mint plants are being measured. It is a purely nested CRD with six levels of treatment, three replications (i.e. pots) per level, and four subsamples per replication (i.e. plants). Title 'Nested Design'; Data Mint; do Trtmt = 1 to 6; do Pot = 1 to 3; do Plant = 1 to 4; Input Growth @@; end; end; end; Cards; 3.5 4.0 3.0 4.5 2.5 4.5 5.5 5.0 3.0 3.0 2.5 3.0 5.0 5.5 4.0 3.5 3.5 3.5 3.0 4.0 4.5 4.0 4.0 5.0 5.0 4.5 5.0 4.5 5.5 6.0 5.0 5.0 5.5 4.5 6.5 5.5 8.5 6.0 9.0 8.5 6.5 7.0 8.0 6.5 7.0 7.0 7.0 7.0 6.0 5.5 3.5 7.0 6.0 8.5 4.5 7.5 6.5 6.5 8.5 7.5 Output; 7.0 9.0 8.5 8.5 6.0 7.0 7.0 7.0 11.0 7.0 9.0 8.0 ; Proc GLM; * The @@ says the input line has values; * for more than 1 observation; Reads Trtmt 1, Pot 1, Plant 1, Growth = 3.5 Then Trtmt 1, Pot 1, Plant 2, Growth = 4.0 And so on Next line starting at 2.5 represents Trtmt 1, Pot 2, Plant 1, Growth = 2.5 Etc. PLS205 2015 2.6 Lab 2 (Topic 3)

Class Trtmt Pot; * Plant not included because it is the error term; Model Growth = Trtmt Pot(Trtmt); * As an ID variable, 'Pot' only has meaning once you specify the "Treatment"; Random Pot(Trtmt); * Must declare "Pot" as a random variable; Test H = Trtmt E = Pot(Trtmt); * Here we request a customized F test with the correct error term Hypothesis H = Trtmt, Error E = Pot(Trtmt); Proc VarComp Method = Type1; Class Trtmt Pot; Model Growth = Trtmt Pot(Trtmt); * To obtain VARiance COMPonents analysis; Run; Quit; Results of the GLM Procedure: Dependent Variable: Growth Sum of Source DF Squares Mean Square F Value Pr > F Model 17 205.4756944 12.0868056 12.94 <.0001 Error 54 50.4375000 0.9340278 Corrected Total 71 255.9131944 Source DF Type III SS Mean Square F Value Pr > F Trtmt 5 179.6423611 35.9284722 38.47 <.0001 Pot(Trtmt) 12 25.8333333 2.1527778 2.30 0.0186 This is the incorrect F test for Trtmt because it is using the wrong error term. By default, SAS computes F values using the residual MS as the error term (0.934 in this case). The TEST statement, however, allows you to request additional F tests using other error terms. "H" specifies which effects in the preceding model are to be used as hypothesis (numerator). "E" specifies one and only one effect to be used as the error term (denominator). Results of the TEST statement: Source Type III Expected Mean Square. Trtmt Pot(Trtmt) Var(Error) + 4 Var(Pot(Trtmt)) + Q(Trtmt) Var(Error) + 4 Var(Pot(Trtmt)) Dependent Variable: Growth Tests of Hypotheses Using the Type III MS for Pot(Trtmt) as an Error Term Source DF Type III SS Mean Square F Value Pr > F Trtmt 5 179.6423611 35.9284722 16.69 <.0001 This is the correct F test for treatment since it uses MS-pot as the error term! The results of Proc VarComp: Variance Component Estimate Var(Trtmt) 2.81464 PLS205 2015 2.7 Lab 2 (Topic 3)

Var(Pot(Trtmt)) 0.30469 Var(Error) 0.93403 It is worth noting that the same F value will result if you simply consider the averages of the subsamples, though the MS's will be different. Again, if you are not concerned with the components of the variance, there is no need to carry out a nested analysis. Just average the subsamples and treat it as an un-nested experiment. Appendix For those of you interested, here's another way you could have coded the data input routine for the first example (useful depending on how you're presented the dataset): Data Clover; Do Culture = 1 to 6; Do Rep = 1 to 5; Input Nlevel @@; Output; End; End; Cards; 24.1 32.6 27.0 32.1 33.0 17.7 24.8 27.9 25.2 24.3 17.0 19.4 9.1 11.9 15.8 20.7 21.0 20.5 18.8 18.6 14.3 14.4 11.8 11.6 14.2 17.3 19.4 19.1 16.9 20.8 ; Or if you wished to keep the names of the treatments in: Data Clover; Do Culture = '3Dok01', '3Dok05', '3Dok04', '3Dok07', '3Dok13', 'Comp00'; Do Rep = 1 to 5; Input Nlevel @@; Output; End; End; Cards; 24.1 32.6 27.0 32.1 33.0 17.7 24.8 27.9 25.2 24.3 17.0 19.4 9.1 11.9 15.8 20.7 21.0 20.5 18.8 18.6 14.3 14.4 11.8 11.6 14.2 PLS205 2015 2.8 Lab 2 (Topic 3)

17.3 19.4 19.1 16.9 20.8 ; PLS205 2015 2.9 Lab 2 (Topic 3)