Multiple Predictor Variables: ANOVA

Linear Models with Many Predictors

- Multiple regression has many predictors.
- But so did one-way ANOVA whenever the treatment had more than 2 levels.
- What if there are multiple treatment types and combinations?
- What if we have spatial gradients in our experiments?

Multiway ANOVA

- Extends the multiple-predictor framework
- Categorical treatments are orthogonal
- Reflects the reality of experiments
- Stepping-stone to factorial designs

Blocked Designs

What if you manipulate two factors?

    Block 1   Block 2   Block 3   Block 4
       A         B         C         D
       B         C         D         A
       C         D         A         B
       D         A         B         C

Randomized Controlled Blocked Design: a design in which each level of one treatment appears exactly once within each level of a second treatment (here, the block).

Note: The layout above is a Latin square design. Every row and every column contains one replicate of each treatment.
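
The cyclic layout above can be generated and checked programmatically. The following is a minimal sketch in base R; the object names (`treatments`, `latin`, `is_complete`) are illustrative and not part of the original slides.

```r
# Build a cyclic Latin square: each row is the previous row shifted by one.
treatments <- c("A", "B", "C", "D")
n <- length(treatments)

# Entry (r, c) holds the treatment at position ((r + c - 2) mod n) + 1.
idx <- outer(seq_len(n), seq_len(n), function(r, c) ((r + c - 2) %% n) + 1)
latin <- matrix(treatments[idx], nrow = n,
                dimnames = list(NULL, paste("Block", seq_len(n))))
print(latin)

# Check the defining property: every row and every column
# contains each treatment exactly once.
is_complete <- function(v) setequal(v, treatments) && !anyDuplicated(v)
rows_ok <- all(apply(latin, 1, is_complete))
cols_ok <- all(apply(latin, 2, is_complete))
c(rows_ok = rows_ok, cols_ok = cols_ok)
```

In a real experiment the rows, columns, and treatment labels of such a square would normally be randomized before use; the sketch only shows the structure.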

Effects of Stickleback Density on Zooplankton

Units were placed across a lake so that 1 set of each treatment was blocked together.

Treatment and Block Effects

[Figure: two panels showing the response (axis range about 1.0 to 4.0) by Treatment (control, high, low) and by Block (1 to 5).]

Modeling & Evaluating Multiple Factors

Model for Multiway ANOVA/ANODEV

\[ y_k = \beta_0 + \sum_i \beta_i x_i + \sum_j \beta_j x_j + \epsilon_k \]

\[ \epsilon_k \sim N(0, \sigma^2), \qquad x_i = 0, 1 \]

Or, with matrices:

\[ Y = X\beta + \epsilon \]

\[
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_{i1} \\ \beta_{i2} \\ \beta_{j1} \\ \beta_{j2} \end{pmatrix}
+
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{pmatrix}
\]

We can have as many groups as we need, so long as there is sufficient replication of each treatment combination.

Hypotheses for Multiway ANOVA/ANODEV

\[ \text{Treatment } H_0: \mu_{i1} = \mu_{i2} = \mu_{i3} = \dots \]

\[ \text{Block } H_0: \mu_{j1} = \mu_{j2} = \mu_{j3} = \dots \]
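
To make the matrix form concrete, here is a small sketch that builds the indicator matrix for two treatment levels crossed with two block levels, next to the matrix R builds under its default treatment contrasts. The level names (`i1`, `i2`, `j1`, `j2`) and object names are assumptions chosen to mirror the subscripts in the model, not objects from the slides.

```r
# Four observations: two treatment levels crossed with two block levels,
# listed in the order (i1,j1), (i1,j2), (i2,j1), (i2,j2).
design <- expand.grid(block = factor(c("j1", "j2")),
                      treatment = factor(c("i1", "i2")))

# Full indicator coding: one 0/1 column per level, no intercept.
X_indicator <- cbind(
  i1 = as.numeric(design$treatment == "i1"),
  i2 = as.numeric(design$treatment == "i2"),
  j1 = as.numeric(design$block == "j1"),
  j2 = as.numeric(design$block == "j2")
)

# R's default treatment contrasts: an intercept plus one column
# for each non-reference level of each factor.
X_contrast <- model.matrix(~ treatment + block, data = design)

print(X_indicator)
print(X_contrast)

# Column rank of each coding.
rank_indicator <- qr(X_indicator)$rank
rank_contrast <- qr(X_contrast)$rank
c(indicator = rank_indicator, contrast = rank_contrast)
```

The full indicator matrix has four columns, but the two treatment columns and the two block columns each sum to a column of ones, so only three are linearly independent. That is why `lm()` estimates an intercept and drops one reference level per factor.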

Sums of Squares for Multiway ANOVA

- Factors are orthogonal and balanced, so SST = SSA + SSB + SSR
- F-test using mean squares, as before
- Type I and Type II SS will produce the same result

Before we model it, make sure block is a factor:

    zoop$block <- factor(zoop$block)
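
The decomposition can be checked numerically. The zooplankton data are not reproduced in this transcript, so the sketch below simulates a hypothetical balanced design with the same shape (3 treatments, 5 blocks, one unit per cell); the effect sizes and the name `sim` are invented for illustration.

```r
set.seed(42)

# Hypothetical balanced design: 3 treatments x 5 blocks, one unit per cell.
sim <- expand.grid(treatment = factor(c("control", "high", "low")),
                   block = factor(1:5))
trt_effect <- c(0, -1.5, -1)[as.integer(sim$treatment)]
blk_effect <- c(0, 0, -0.7, -1, -0.3)[as.integer(sim$block)]
sim$y <- 3 + trt_effect + blk_effect + rnorm(nrow(sim), sd = 0.4)

# Sequential (Type I) tables with the two possible term orders.
tab_tb <- anova(lm(y ~ treatment + block, data = sim))
tab_bt <- anova(lm(y ~ block + treatment, data = sim))

# Total sum of squares around the grand mean.
sst <- sum((sim$y - mean(sim$y))^2)

# SSA + SSB + SSR reproduces SST.
c(SST = sst, sum_of_parts = sum(tab_tb[["Sum Sq"]]))

# With balanced, orthogonal factors the order of terms does not matter.
c(treatment_first = tab_tb["treatment", "Sum Sq"],
  treatment_second = tab_bt["treatment", "Sum Sq"])
```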

Two-Way ANOVA as a Linear Model

    zoop_lm <- lm(zooplankton ~ treatment + block, data = zoop)

Check Diagnostics

[Figure: diagnostic plots for zoop_lm. Panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance by observation number; Constant Leverage: Residuals vs Factor Levels (treatment: control, high, low). Observations 1, 13, and 14 are labeled in each panel.]

Residuals by Groups and No Non-Additivity

[Figure: Pearson residuals plotted against treatment (control, high, low), block (1 to 5), and fitted values.]

Tukey's Test for Non-Additivity

    library(car)
    residualPlots(zoop_lm, cex.lab = 1.4)
    #            Test stat Pr(>|t|)
    # treatment         NA       NA
    # block             NA       NA
    # Tukey test     0.474    0.635
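
One common way to construct Tukey's one-degree-of-freedom test is to refit the additive model with the squared fitted values as an extra regressor and test that single coefficient. The sketch below does this by hand in base R on simulated data (a hypothetical stand-in for `zoop`). It is an assumption that `residualPlots()` computes its "Tukey test" row this way; the reference distribution it uses may differ from the t distribution used here.

```r
set.seed(1)

# Hypothetical additive data: 3 treatments x 5 blocks, no interaction.
sim <- expand.grid(treatment = factor(c("control", "high", "low")),
                   block = factor(1:5))
sim$y <- 3 + c(0, -1.5, -1)[as.integer(sim$treatment)] +
  c(0, 0, -0.7, -1, -0.3)[as.integer(sim$block)] +
  rnorm(nrow(sim), sd = 0.4)

# Step 1: fit the additive model.
fit <- lm(y ~ treatment + block, data = sim)

# Step 2: add the squared fitted values as a single extra regressor.
sim$fit_sq <- fitted(fit)^2
aug <- lm(y ~ treatment + block + fit_sq, data = sim)

# Step 3: the test of non-additivity is the test of that one coefficient.
t_stat <- coef(summary(aug))["fit_sq", "t value"]
p_val <- coef(summary(aug))["fit_sq", "Pr(>|t|)"]
c(t = t_stat, p = p_val)
```

A large test statistic would suggest that treatment and block effects do not simply add, which matters here because with one unit per cell a full interaction term cannot be fitted.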

The ANOVA

But first, what are the DF for...

- Treatment (with 3 levels)
- Block (with 5 blocks)
- Residuals (with n = 15)

    anova(zoop_lm)
    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # treatment  2 6.8573  3.4287 16.3660 0.001488
    # block      4 2.3400  0.5850  2.7924 0.101031
    # Residuals  8 1.6760  0.2095
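
The degrees of freedom, mean squares, F values, and p-values in the table can be rebuilt from the sums of squares alone. This sketch uses only the numbers printed above; the variable names are illustrative.

```r
# Degrees of freedom from the design: 3 treatments, 5 blocks, 15 units.
a <- 3
b <- 5
n <- 15
df_treatment <- a - 1
df_block <- b - 1
df_resid <- n - 1 - df_treatment - df_block

# Sums of squares copied from the ANOVA table.
ss <- c(treatment = 6.8573, block = 2.3400, Residuals = 1.6760)

# Mean square = sum of squares / degrees of freedom.
ms_treatment <- ss[["treatment"]] / df_treatment
ms_block <- ss[["block"]] / df_block
ms_resid <- ss[["Residuals"]] / df_resid

# F = mean square of the term / residual mean square.
f_treatment <- ms_treatment / ms_resid
f_block <- ms_block / ms_resid

# Upper-tail probabilities of the F distribution.
p_treatment <- pf(f_treatment, df_treatment, df_resid, lower.tail = FALSE)
p_block <- pf(f_block, df_block, df_resid, lower.tail = FALSE)

c(df_treatment = df_treatment, df_block = df_block, df_resid = df_resid)
c(F_treatment = f_treatment, p_treatment = p_treatment)
c(F_block = f_block, p_block = p_block)
```

The results should agree with the table up to the rounding of the printed sums of squares.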

Coefficients via Treatment Contrasts

    summary(zoop_lm)$coef
    #                    Estimate Std. Error       t value     Pr(>|t|)
    # (Intercept)    3.420000e+00  0.3126766  1.093782e+01 4.330286e-06
    # treatmenthigh -1.640000e+00  0.2894823 -5.665286e+00 4.729729e-04
    # treatmentlow  -1.020000e+00  0.2894823 -3.523532e+00 7.805477e-03
    # block2         1.039137e-15  0.3737200  2.780521e-15 1.000000e+00
    # block3        -7.000000e-01  0.3737200 -1.873060e+00 9.794523e-02
    # block4        -1.000000e+00  0.3737200 -2.675800e+00 2.810839e-02
    # block5        -3.000000e-01  0.3737200 -8.027399e-01 4.453163e-01

Unique Effect of Each Treatment

    crPlots(zoop_lm)

[Figure: Component + Residual plots showing Component+Residual(zooplankton) against treatment (control, high, low) and against block (1 to 5).]
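
Under treatment contrasts the intercept is the fitted value for the reference cell (control, block 1), and every other fitted value is the intercept plus the relevant treatment and block offsets. The sketch below works through that arithmetic using the estimates printed above; `predict_cell` is a hypothetical helper, and the block 2 estimate (about 1e-15) is entered as 0.

```r
# Estimates copied from the coefficient table (block2 is numerically zero).
coefs <- c("(Intercept)" = 3.42,
           treatmenthigh = -1.64, treatmentlow = -1.02,
           block2 = 0, block3 = -0.70, block4 = -1.00, block5 = -0.30)

# Fitted value for one treatment-by-block cell: the intercept plus the
# offsets for any non-reference treatment and non-reference block.
predict_cell <- function(treatment, block) {
  terms <- c("(Intercept)",
             if (treatment != "control") paste0("treatment", treatment),
             if (block != 1) paste0("block", block))
  sum(coefs[terms])
}

control_block1 <- predict_cell("control", 1)  # reference cell
high_block1 <- predict_cell("high", 1)        # intercept + treatmenthigh
low_block4 <- predict_cell("low", 4)          # intercept + treatmentlow + block4

c(control_block1 = control_block1,
  high_block1 = high_block1,
  low_block4 = low_block4)
```

With a fitted model object, `predict(zoop_lm, newdata = ...)` would return the same quantities without the manual bookkeeping.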

Unique Effect of Each Treatment (visreg)

[Figure: visreg partial-effect plots of zooplankton against treatment (control, high, low) and against block (1 to 5).]

Exercise: Bees!

- Load the Bee Gene Expression Data
- Does bee type or colony matter?
- How much variation does this experiment explain?

Bee ANOVA

    anova(bee_lm)
    # Analysis of Variance Table
    #
    # Response: Expression
    #           Df  Sum Sq Mean Sq F value  Pr(>F)
    # type       1 2.69340 2.69340 35.3465 0.02714
    # colony     2 0.34293 0.17147  2.2502 0.30767
    # Residuals  2 0.15240 0.07620

Bee Effects

    crPlots(bee_lm)

[Figure: Component + Residual plots showing Component+Residual(Expression) against type (for, nurse) and against colony (1 to 3).]
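
The exercise's last question, how much variation the experiment explains, can be answered from the ANOVA table alone, since the sequential sums of squares add up to the total. A minimal sketch using the printed values (variable names are illustrative):

```r
# Sums of squares copied from the bee ANOVA table.
ss <- c(type = 2.69340, colony = 0.34293, Residuals = 0.15240)

# Total sum of squares is the sum of all rows of a sequential table.
sst <- sum(ss)

# R-squared: the share of total variation not left in the residuals.
r_squared <- 1 - ss[["Residuals"]] / sst

# Share of the total attributed to each row.
share <- ss / sst

round(c(R2 = r_squared), 3)
round(share, 3)
```

Roughly 95% of the variation is accounted for, most of it by bee type. With only 2 residual degrees of freedom, though, that figure should be read cautiously.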

What if my data is unbalanced?

Unbalancing the Zooplankton Data

    zoop_u <- zoop[-c(1,2),]

An Unbalanced ANOVA

    zoop_u_lm <- update(zoop_lm, data = zoop_u)
    anova(zoop_u_lm)
    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # treatment  2 4.1751 2.08754  16.481 0.003652
    # block      4 1.7480 0.43700   3.450 0.086009
    # Residuals  6 0.7600 0.12667

Is this valid? Can we use Type I sequential SS?

Unbalanced Data and Type I SS

Missing cells (i.e., treatment-block combinations) mean that order matters in testing SS:

    zoop_u_lm1 <- lm(zooplankton ~ treatment + block, data = zoop_u)
    zoop_u_lm2 <- lm(zooplankton ~ block + treatment, data = zoop_u)

Intercept versus Treatment and Block versus Block + Treatment will now produce different SS for treatment.

Treatment first (zoop_u_lm1):

    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # treatment  2 4.1751 2.08754  16.481 0.003652
    # block      4 1.7480 0.43700   3.450 0.086009
    # Residuals  6 0.7600 0.12667

Block first (zoop_u_lm2):

    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # block      4 2.2364 0.55910   4.414 0.052852
    # treatment  2 3.6867 1.84333  14.553 0.004993
    # Residuals  6 0.7600 0.12667
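
The order dependence is easy to reproduce. The sketch below simulates a hypothetical 3 x 5 design (standing in for `zoop`, which is not included here), drops the first two rows as in the slides, and compares the two sequential tables; all names and effect sizes are assumptions.

```r
set.seed(7)

# Hypothetical balanced data, then unbalanced by dropping two cells.
sim <- expand.grid(treatment = factor(c("control", "high", "low")),
                   block = factor(1:5))
sim$y <- 3 + c(0, -1.5, -1)[as.integer(sim$treatment)] +
  c(0, 0, -0.7, -1, -0.3)[as.integer(sim$block)] +
  rnorm(nrow(sim), sd = 0.4)
sim_u <- sim[-c(1, 2), ]

# Cell counts: block 1 now holds a single treatment.
print(table(sim_u$treatment, sim_u$block))

# Sequential (Type I) tables in both term orders.
tab1 <- anova(lm(y ~ treatment + block, data = sim_u))
tab2 <- anova(lm(y ~ block + treatment, data = sim_u))

# Treatment SS depends on whether block was entered first.
ss_treatment <- c(treatment_first = tab1["treatment", "Sum Sq"],
                  treatment_second = tab2["treatment", "Sum Sq"])
ss_treatment

# The residual SS is the same either way: both fits are the same model.
ss_resid <- c(tab1["Residuals", "Sum Sq"], tab2["Residuals", "Sum Sq"])
ss_resid
```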

Solution: Marginal, or Type II SS

- SS of Block: Treatment versus Treatment + Block
- SS of Treatment: Block versus Block + Treatment

Note: Because of marginality, the sum of all SS will no longer equal SST.

    Anova(zoop_u_lm1)
    # Anova Table (Type II tests)
    #
    # Response: zooplankton
    #           Sum Sq Df F value   Pr(>F)
    # treatment 3.6867  2  14.553 0.004993
    # block     1.7480  4   3.450 0.086009
    # Residuals 0.7600  6

Note the capital A: this is a function from the car package.
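
The model comparisons listed above can also be carried out in base R, without the car package. In this sketch, `drop1()` removes each term in turn from the full additive model, which for a model with no interactions gives the same marginal sums of squares as a Type II table. The data are a simulated, hypothetical stand-in for `zoop_u`.

```r
set.seed(7)

# Hypothetical unbalanced data: 3 x 5 design with two cells removed.
sim <- expand.grid(treatment = factor(c("control", "high", "low")),
                   block = factor(1:5))
sim$y <- 3 + c(0, -1.5, -1)[as.integer(sim$treatment)] +
  c(0, 0, -0.7, -1, -0.3)[as.integer(sim$block)] +
  rnorm(nrow(sim), sd = 0.4)
sim_u <- sim[-c(1, 2), ]

full <- lm(y ~ treatment + block, data = sim_u)

# Marginal SS: each term is tested against the model holding the other term.
marginal <- drop1(full, test = "F")
print(marginal)

# The same treatment SS from an explicit comparison:
# Block versus Block + Treatment.
cmp <- anova(lm(y ~ block, data = sim_u), full)
ss_treatment <- cmp[2, "Sum of Sq"]

# Marginal SS plus residual SS no longer add up to SST.
sst <- sum((sim_u$y - mean(sim_u$y))^2)
ss_total_marginal <- sum(marginal[c("treatment", "block"), "Sum of Sq"]) +
  deviance(full)
c(SST = sst, marginal_sum = ss_total_marginal)
```

The marginal SS for each term does not depend on the order in which the terms were written in the formula, which is what makes it usable for unbalanced data.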