Analysis of Covariance

Similar documents
Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

Single Factor Experiments

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1

Lecture 9: Factorial Design Montgomery: chapter 5

Outline Topic 21 - Two Factor ANOVA

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

Assignment 6 Answer Keys

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek

STAT 705 Chapters 22: Analysis of Covariance

Lecture 7: Latin Square and Related Design

Topic 13. Analysis of Covariance (ANCOVA) [ST&D chapter 17] 13.1 Introduction Review of regression concepts

STA 303H1F: Two-way Analysis of Variance Practice Problems

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

Topic 32: Two-Way Mixed Effects Model

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3

Analysis of Covariance

Topic 20: Single Factor Analysis of Variance

SAS Program Part 1: proc import datafile="y:\iowa_classes\stat_5201_design\examples\2-23_drillspeed_feed\mont_5-7.csv" out=ds dbms=csv replace; run;

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan

Lecture 7: Latin Squares and Related Designs

Topic 23: Diagnostics and Remedies

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

VIII. ANCOVA. A. Introduction

Lecture 4. Random Effects in Completely Randomized Design

Unbalanced Data in Factorials Types I, II, III SS Part 2

This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed.

Lecture 11 Multiple Linear Regression

Comparison of a Population Means

ANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes)

Data Set 8: Laysan Finch Beak Widths

Topic 28: Unequal Replication in Two-Way ANOVA

data proc sort proc corr run proc reg run proc glm run proc glm run proc glm run proc reg CONMAIN CONINT run proc reg DUMMAIN DUMINT run proc reg

Response Surface Methodology

Lecture 5: Comparing Treatment Means Montgomery: Section 3-5

Overview Scatter Plot Example

Least Squares Analyses of Variance and Covariance

Analysis of variance and regression. April 17, Contents Comparison of several groups One-way ANOVA. Two-way ANOVA Interaction Model checking

Chapter 20 : Two factor studies one case per treatment Chapter 21: Randomized complete block designs

STAT 705 Chapter 19: Two-way ANOVA

Linear Combinations of Group Means

STAT 350. Assignment 4

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th

Lecture 4. Checking Model Adequacy

Multiple Linear Regression

Chapter 1 Linear Regression with One Predictor

Topic 29: Three-Way ANOVA

6. Multiple regression - PROC GLM

Formula for the t-test

Lecture 11: Nested and Split-Plot Designs

STAT 705 Chapter 19: Two-way ANOVA

Categorical Predictor Variables

where x and ȳ are the sample means of x 1,, x n

Week 7.1--IES 612-STA STA doc

Analysis of variance. April 16, Contents Comparison of several groups

Ch Inference for Linear Regression

Analysis of variance. April 16, 2009

Models for Clustered Data

Outline. Analysis of Variance. Acknowledgements. Comparison of 2 or more groups. Comparison of serveral groups

Chapter 19. More Complex ANOVA Designs Three-way ANOVA

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Models for Clustered Data

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Split-plot Designs. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 21 1

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Outline. Analysis of Variance. Comparison of 2 or more groups. Acknowledgements. Comparison of serveral groups

Lecture 11: Simple Linear Regression

STAT 3A03 Applied Regression With SAS Fall 2017

N J SS W /df W N - 1

Ch 2: Simple Linear Regression

Chapter 6 Multiple Regression

11 Factors, ANOVA, and Regression: SAS versus Splus

Analysis of Variance

Analysis of variance and regression. November 22, 2007

Stat 302 Statistical Software and Its Applications SAS: Simple Linear Regression

Differences of Least Squares Means

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Multivariate analysis of variance and covariance

3 Variables: Cyberloafing Conscientiousness Age

Lecture 10: 2 k Factorial Design Montgomery: Chapter 6

STAT 525 Fall Final exam. Tuesday December 14, 2010

General Linear Models. with General Linear Hypothesis Tests and Likelihood Ratio Tests

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping

BIOL 933!! Lab 10!! Fall Topic 13: Covariance Analysis

1 Tomato yield example.

Lecture 10: Experiments with Random Effects

EXST 7015 Fall 2014 Lab 11: Randomized Block Design and Nested Design

Lecture 10: Factorial Designs with Random Factors

General Linear Model (Chapter 4)

Repeated Measures Part 2: Cartoon data

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Chapter 8 Quantitative and Qualitative Predictors

Transcription:

Analysis of Covariance (ANCOVA) Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 10 1

When to Use ANCOVA In experiment, there is a nuisance factor x that is 1 Correlated with y 2 Unaffected by treatment Can measure x but can t control it (otherwise block) Factor x then called a covariate or concomitant variable ANCOVA adjusts y for effect of covariate x Combination of regression and analysis of variance Without adjustment, effects of x on y Will inflate σ 2 May alter trt mean comparisons (in extreme cases) STAT 514 Topic 10 2

Examples Pretest/Posttest score analysis: The change in score y may be associated with current GPA. Also the posttest score y may be associated with the pretest score x. Analysis of covariance provides a way to handicap students. Weight gain experiments in animals: When comparing different feeds, the weight gain y may be associated with the dominance x of the animal. While it may be hard to control for dominance, it is not too difficult to measure. Comparing competing drug products: The effect of the drug y after two hours may be associated with the initial mental and physical shape of the subject. Variables describing mental and physical shape x at baseline may be used as covariates. STAT 514 Topic 10 3

Model Description Consider single covariate in CRD Statistical model is y ij = µ+τ i +β(x ij x.. )+ǫ ij { i = 1,2,...,a j = 1,2,...n i Additional assumptions x ij not affected by treatment x and y are linearly related Constant slope across groups (can be relaxed) Note: Subtracting off x.. is not needed (conceptual) STAT 514 Topic 10 4

Estimation Conceptual Approach: Fit one-way model (y = trt) Fit one-way model (x = trt) Regress residuals (residuals1 = residuals2) Provides estimate of slope after adjusting for trt Model estimates are ˆµ = y.. ˆβ = (y ij y i. )(x ij x i. )/ (x ij x i. ) 2 ˆτ i = y i. y.. ˆβ(x i. x.. ) STAT 514 Topic 10 5

F Tests Test H 0 : τ 1 = τ 2 =... = τ a = 0 Compare treatment means after adjusting for differences among treatments due to differences in covariate levels Trt and covariate not orthogonal (order of fit matters) F 0 = SS(trt x)/a 1 SS E /(N a 1) Test: β = 0 Sum of Squares regression (SS x ): ˆβ 2 (x ij x i. ) 2 F 0 = SS x /1 SS E /(N a 1) STAT 514 Topic 10 6

Mean Estimates Adjusted treatment means Estimate ˆµ i = ˆµ+ ˆτ i = y i. ˆβ(x i. x.. ) Using the expected value of y when x is equal to the average covariate value Can really use any value of x, just make sure it is reasonable for all factor levels Variance: ˆσ 2( 1/n+(x i. x.. ) 2 / (x ij x i. ) 2) Pairwise differences Estimate: ˆτ i ˆτ i = y i. y i. ˆβ(x i. x i.) Variance: ˆσ 2( 2/n+(x i. x i.) 2 / (x ij x i. ) 2) STAT 514 Topic 10 7

Analysis of Covariance Table 15.10 Looking at the breaking strength (in pounds) of a monofilament fiber produced by 3 different machines Known that strength depends on the fiber thickness Machines designed to keep thickness within specification limits but thickness will vary fiber to fiber Will consider diameter of the fiber as a covariate STAT 514 Topic 10 8

SAS Code data ancova; input machine str dia @@; datalines; 1 36 20 1 41 25 1 39 24 1 42 25 1 49 32 2 40 22 2 48 28 2 39 22 2 45 30 2 44 28 3 35 21 3 37 23 3 42 26 3 34 21 3 32 15 ; symbol1 i=rl v=circle; proc gplot; plot str*dia=machine; proc glm; class machine; model str = machine; lsmeans machine / adjust=tukey; proc glm; class machine; model str = machine dia; lsmeans machine / adjust=tukey; STAT 514 Topic 10 9

Boxplot STAT 514 Topic 10 10

SAS Output - No Covariate The GLM Procedure Dependent Variable: str Sum of Source DF Squares Mean Square F Value Pr > F Model 2 140.4000000 70.2000000 4.09 0.0442 Error 12 206.0000000 17.1666667 Corrected Total 14 346.4000000 R-Square Coeff Var Root MSE str Mean 0.405312 10.30664 4.143268 40.20000 Source DF Type I SS Mean Square F Value Pr > F machine 2 140.4000000 70.2000000 4.09 0.0442 Source DF Type III SS Mean Square F Value Pr > F machine 2 140.4000000 70.2000000 4.09 0.0442 STAT 514 Topic 10 11

SAS Output - No Covariate The GLM Procedure Least Squares Means Adjustment for Multiple Comparisons: Tukey-Kramer LSMEAN machine str LSMEAN Number 1 41.4000000 1 2 43.2000000 2 3 36.0000000 3 str LSMEAN LSMEAN machine Number A 43.2 2 2 A B A 41.4 1 1 B B 36.0 3 3 STAT 514 Topic 10 12

Difference Plot STAT 514 Topic 10 13

Table of Means The MEANS Procedure ------------------------------ machine=1 --------------------------------- Variable N Mean Std Dev Minimum Maximum str 5 41.4000000 4.8270074 36.0000000 49.0000000 dia 5 25.2000000 4.3243497 20.0000000 32.0000000 ------------------------------ machine=2 --------------------------------- Variable N Mean Std Dev Minimum Maximum str 5 43.2000000 3.7013511 39.0000000 48.0000000 dia 5 26.0000000 3.7416574 22.0000000 30.0000000 ------------------------------ machine=3 --------------------------------- Variable N Mean Std Dev Minimum Maximum str 5 36.0000000 3.8078866 32.0000000 42.0000000 dia 5 21.2000000 4.0249224 15.0000000 26.0000000 STAT 514 Topic 10 14

Scatterplot STAT 514 Topic 10 15

SAS Output The GLM Procedure Dependent Variable: str Sum of Source DF Squares Mean Square F Value Pr > F Model 3 318.4141104 106.1380368 41.72 <.0001 Error 11 27.9858896 2.5441718 Corrected Total 14 346.4000000 R-Square Coeff Var Root MSE str Mean 0.919209 3.967776 1.595046 40.20000 Source DF Type I SS Mean Square F Value Pr > F machine 2 140.4000000 70.2000000 27.59 <.0001 dia 1 178.0141104 178.0141104 69.97 <.0001 Source DF Type III SS Mean Square F Value Pr > F machine 2 13.2838506 6.6419253 2.61 0.1181 dia 1 178.0141104 178.0141104 69.97 <.0001 STAT 514 Topic 10 16

SAS Output The GLM Procedure Least Squares Means Adjustment for Multiple Comparisons: Tukey-Kramer LSMEAN machine str LSMEAN Number 1 40.3824131 1 2 41.4192229 2 3 38.7983640 3 str LSMEAN LSMEAN machine Number A 41.41922 2 2 A A 40.38241 1 1 A A 38.79836 3 3 ****Must use LSMEANS to get adjusted means **** STAT 514 Topic 10 17

Difference Plot STAT 514 Topic 10 18

Summary Positive linear association between diameter and strength. Are slopes constant? Will investigate shortly. Model including covariate better explains the data. Percent of explained variation jumps from 40.5% to 91.9%. MSE drops from 17.167 to 2.544. Because Machine 3 had narrower fibers, its adjusted mean strength is shifted upwards. Likewise Machine 2 had wider fibers so mean shifted downward No significant difference among the machines relies on assumption that diameter not different across machines STAT 514 Topic 10 19

Nonconstant Slope in ANCOVA Statistical model for constant slope is { i = 1,2,...,a y ij = µ+τ i +β(x ij x..)+ǫ ij j = 1,2,...n i Can allow for different slope by including interaction { i = 1,2,...,a y ij = µ+τ i +(β +(βτ) i )(x ij x..)+ǫ ij j = 1,2,...n i In SAS, simply add interaction term into model Provides test for nonconstant slope STAT 514 Topic 10 20

SAS Code data ancova; input machine str dia @@; datalines; 1 36 20 1 41 25 1 39 24 1 42 25 1 49 32 2 40 22 2 48 28 2 39 22 2 45 30 2 44 28 3 35 21 3 37 23 3 42 26 3 34 21 3 32 15 ; proc glm; class machine; model str = machine dia; lsmeans machine / adjust=tukey lines; proc glm; class machine; model str = machine dia machine*dia; lsmeans machine / adjust=tukey lines; run; STAT 514 Topic 10 21

SAS Output The GLM Procedure Sum of Source DF Squares Mean Square F Value Pr > F Model 5 321.1512879 64.2302576 22.90 <.0001 Error 9 25.2487121 2.8054125 Corrected Total 14 346.4000000 R-Square Coeff Var Root MSE str Mean 0.927111 4.166509 1.674937 40.20000 Source DF Type I SS Mean Square F Value Pr > F machine 2 140.4000000 70.2000000 25.02 0.0002 dia 1 178.0141104 178.0141104 63.45 <.0001 dia*machine 2 2.7371774 1.3685887 0.49 0.6293 Source DF Type III SS Mean Square F Value Pr > F machine 2 2.6641625 1.3320812 0.47 0.6367 dia 1 171.1192314 171.1192314 61.00 <.0001 dia*machine 2 2.7371774 1.3685887 0.49 0.6293 STAT 514 Topic 10 22

Regression Approach to ANCOVA Consider ANCOVA model with a = 3 y j = β 0 +β 1 X 1j +β 2 X 2j +β 3 X 3j +ǫ j j = 1,2,...N X 1j = 1 if Trt 1 and X 1j = 1 if Trt 3 X 2j = 1 if Trt 2 and X 2j = 1 if Trt 3 X 3j = (x j x.. ) Trt 1: y j = β 0 +β 1 +β 3 (x j x.. )+ǫ j Trt 2: y j = β 0 +β 2 +β 3 (x j x.. )+ǫ j Trt 3: y j = β 0 β 1 β 2 +β 3 (x j x.. )+ǫ j Results in estimates ˆµ = ˆβ 0 ˆτ 1 = ˆβ 1 ˆτ 2 = ˆβ 2 ˆβ = ˆβ 3 STAT 514 Topic 10 23

Analysis of Covariance Can incorporate covariate into any model For two factor model y ijk = µ+τ i +β j +(τβ) ij +β(x ijk x... )+ǫ ijk Assume constant slope for each ij combination Can include interaction terms to vary slope Plot y vs x for each combination STAT 514 Topic 10 24

Background Reading ANCOVA Model: Montgomery 15.3.1 General Regression Significance Test: Montgomery 15.3.3 STAT 514 Topic 10 25