Multiple Predictor Variables: ANOVA

Similar documents
Multiple Predictor Variables: ANOVA

Coping with Additional Sources of Variation: ANCOVA and Random Effects

Stat 5102 Final Exam May 14, 2015

Handling Categorical Predictors: ANOVA

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

MODELS WITHOUT AN INTERCEPT

36-707: Regression Analysis Homework Solutions. Homework 3

Lecture 15 Topic 11: Unbalanced Designs (missing data)

Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R

Week 7 Multiple factors. Ch , Some miscellaneous parts

Stat 5303 (Oehlert): Models for Interaction 1

Tests of Linear Restrictions

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

NC Births, ANOVA & F-tests

Workshop 7.4a: Single factor ANOVA

ANOVA: Analysis of Variation

1 Use of indicator random variables. (Chapter 8)

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

Biostatistics 380 Multiple Regression 1. Multiple Regression

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

Nested 2-Way ANOVA as Linear Models - Unbalanced Example

General Linear Statistical Models - Part III

R Output for Linear Models using functions lm(), gls() & glm()

Multiple Regression: Example

22s:152 Applied Linear Regression. Take random samples from each of m populations.

School of Mathematical Sciences. Question 1

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data

Regression Models for Quantitative and Qualitative Predictors: An Overview

STAT22200 Spring 2014 Chapter 14

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Inference for Regression

Multiple Regression Examples

More about Single Factor Experiments

Lecture 1: Linear Models and Applications

Simple Linear Regression

R 2 and F -Tests and ANOVA

Factorial and Unbalanced Analysis of Variance

Extensions of One-Way ANOVA.

Workshop 9.3a: Randomized block designs

Analysis of Covariance

Increasing precision by partitioning the error sum of squares: Blocking: SSE (CRD) à SSB + SSE (RCBD) Contrasts: SST à (t 1) orthogonal contrasts

Orthogonal contrasts for a 2x2 factorial design Example p130

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Section 4.6 Simple Linear Regression

Unbalanced Data in Factorials Types I, II, III SS Part 1

STAT 215 Confidence and Prediction Intervals in Regression

STATISTICS 110/201 PRACTICE FINAL EXAM

Analysis of Variance Bios 662

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

BIOL 933!! Lab 10!! Fall Topic 13: Covariance Analysis

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:

Mixed Model: Split plot with two whole-plot factors, one split-plot factor, and CRD at the whole-plot level (e.g. fancier split-plot p.

Booklet of Code and Output for STAC32 Final Exam

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Notes on Maxwell & Delaney

Stat 502 Design and Analysis of Experiments General Linear Model

Lecture 10. Factorial experiments (2-way ANOVA etc)

STAT22200 Chapter 14

Confidence Interval for the mean response

BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA

Topic 13. Analysis of Covariance (ANCOVA) - Part II [ST&D Ch. 17]

Data Analysis 1 LINEAR REGRESSION. Chapter 03

SCHOOL OF MATHEMATICS AND STATISTICS

Lecture 6 Multiple Linear Regression, cont.

Multivariate Linear Regression Models

CS 147: Computer Systems Performance Analysis

Lecture 22 Mixed Effects Models III Nested designs

R Demonstration ANCOVA

General Linear Model (Chapter 4)

Linear Modelling: Simple Regression

A discussion on multiple regression models

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

ST430 Exam 2 Solutions

CAS MA575 Linear Models

2-way analysis of variance

Exercise 2 SISG Association Mapping

SIMPLE REGRESSION ANALYSIS. Business Statistics

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

Pumpkin Example: Flaws in Diagnostics: Correcting Models

Regression. Marc H. Mehlman University of New Haven

Stat 6640 Solution to Midterm #2

STK4900/ Lecture 3. Program

Randomized Block Designs with Replicates

[4+3+3] Q 1. (a) Describe the normal regression model through origin. Show that the least square estimator of the regression parameter is given by

FACTORIAL DESIGNS and NESTED DESIGNS

Factorial designs. Experiments

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Unit 8: 2 k Factorial Designs, Single or Unequal Replications in Factorial Designs, and Incomplete Block Designs

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Allow the investigation of the effects of a number of variables on some response

Exam Applied Statistical Regression. Good Luck!

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

Introduction and Single Predictor Regression. Correlation

Transcription:

What if you manipulate two factors? Multiple Predictor Variables: ANOVA Block 1 Block 2 Block 3 Block 4 A B C D B C D A C D A B D A B C Randomized Controlled Blocked Design: Design where each treatment only has 1 replicate of a second treatment Note: Above is a Latin Squares Design - Every row and column contains one replicate of a treatment. Effects of Stickleback Density on Zooplankton Treatment and Block Effects Units placed across a lake so that 1 set of each treatment was blocked together 1.0 1.5 2.0 2.5 3.0 3.5 4.0 control high low 1.0 1.5 2.0 2.5 3.0 3.5 4.0 1 2 3 4 5 Treatment Block

Model for Multiway ANOVA/ANODEV Model for Multiway ANOVA/ANODEV Y = βx + ɛ yk = β0 + βixi + βjxj + ɛk ɛijk N(0, σ 2 ), xi = 0, 1 Or, with matrices... Y = βx + ɛ y1 βi1 1 0 1 0 ɛ1 y2 y3 = βi2 1 0 0 1 βj1 0 1 1 0 + ɛ2 ɛ3 y4 βj2 0 1 0 1 ɛ4 hfill We can have as many groups as we need, so long as there is sufficient replication of each treatment combination. The Treatment Contrast Model for Multiway ANOVA/ANODEV Hypotheses for Multiway ANOVA/ANODEV Y = βx + ɛ y1 1 0 0 ɛ1 y2 β0 y3 = 1 0 1 βi2 1 1 0 + ɛ2 ɛ3 βj2 y4 1 1 1 ɛ4 TreatmentHo: µi1 = µi2 = µi3 =... Block Ho: µj1 = µj2 = µj3 =... Remember, this can also be stated in terms of β

1 13 13 1 13 13 1 1 13 1 Sums of Squares for Multiway ANOVA Before we model it, make sure Block is a factor Factors are Orthogonal and Balanced, so... SST = SSA + SSB + SSR F-Test using Mean Squares as Before zoop$block <- factor(zoop$block) Two-Way ANOVA as a Linear Model Check Diagnostics zoop_lm <- lm(zooplankton treatment + block, data=zoop) Residuals 0.5 0.0 0.5 Residuals vs Fitted Normal Q Q Standardized residuals 2 1 0 1 2 Standardized residuals 0.0 0.4 0.8 1.2 Scale Location 1.0 2.0 3.0 1 0 1 1.0 2.0 3.0 Fitted values Theoretical Quantiles Fitted values Cook's distance 0.0 0.2 0.4 Constant Leverage: Cook's distance Residuals vs Factor Levels Standardized residuals 2 1 0 1 2 treatment : 2 6 10 low high Obs. number Factor Level Combinations

Residuals by Groups and No Non-Additivity Residuals by Groups and No Non-Additivity Pearson residuals 0.6 0.0 0.4 control low Pearson residuals 0.6 0.0 0.4 1 2 3 4 5 Tukey s Test for Non-Additivity library(car) residualplots(zoop_lm) Pearson residuals 0.6 0.0 0.4 treatment block Test stat Pr(> t ) treatment NA NA block NA NA Tukey test 0.474 0.635 1.0 2.0 3.0 Fitted values The ANOVA The ANOVA But first, what are the DF for... Treatment (with 3 levels) Block (with 5 blocks) Residuals (with n=15) anova(zoop_lm) Analysis of Variance Table Response: zooplankton Df Sum Sq Mean Sq F value Pr(>F) treatment 2 6.86 3.43 16.37 0.0015 block 4 2.34 0.58 2.79 0.1010 Residuals 8 1.68 0.21

Sums of Squares as Model Comparison Testing SS for a Factor is the same as comparing the residual SS of a model with v. without that factor. Here is y = intercept versus y = intercept + treatment: zoop_intonly <- lm(zooplankton 1, data=zoop) zoop_treatment <- lm(zooplankton treatment, data=zoop) anova(zoop_intonly, zoop_treatment) Analysis of Variance Table Model 1: zooplankton 1 Model 2: zooplankton treatment Res.Df RSS Df Sum of Sq F Pr(>F) 1 10.87 2 12 4.02 2 6.86 10.2 0.0025 Sums of Squares as Model Comparison Testing SS for a Factor is the same as comparing the residual SS of a model with v. without that factor. Here is y = intercept+treatment versus y = intercept + treatment+block: anova(zoop_treatment, zoop_lm) Analysis of Variance Table Model 1: zooplankton treatment Model 2: zooplankton treatment + block Res.Df RSS Df Sum of Sq F Pr(>F) 1 12 4.02 2 8 1.68 4 2.34 2.79 0.1 Sums of Squares as Model Comparison Coefficients via Treatment Contrasts summary(zoop_lm)$coef Squential model building and SS Calculation is called Type I Sums of Squares Estimate Std. Error t value Pr(> t ) (Intercept) 3.420e+00 0.3127 1.094e+01 4.330e-06 treatmenthigh -1.640e+00 0.2895-5.665e+00 4.730e-04 treatmentlow -1.020e+00 0.2895-3.524e+00 7.805e-03 block2 1.039e-15 0.3737 2.781e-15 1.000e+00 block3-7.000e-01 0.3737-1.873e+00 9.795e-02 block4-1.000e+00 0.3737-2.676e+00 2.811e-02 block5-3.000e-01 0.3737-8.027e-01 4.453e-01

Unique Effect of Each Treatment Exercise: Likelihood and Bees! crplots(zoop_lm) Component + Residual Plots Component+Residual(zooplankton) 1.0 0.5 0.0 0.5 1.0 1.5 Component+Residual(zooplankton) 0.5 0.0 0.5 1.0 Load the Bee Gene Expresion Data Does bee type or colony matter? How much variation does this experiment explain? control high low treatment 1 2 3 4 5 block Bee ANOVA Bee Effects crplots(bee_lm) Component + Residual Plots anova(bee_lm) Analysis of Variance Table Response: Expression Df Sum Sq Mean Sq F value Pr(>F) type 1 2.693 2.693 35.35 0.027 colony 2 0.343 0.171 2.25 0.308 Residuals 2 0.152 0.076 Component+Residual(Expression) 0.5 0.0 0.5 Component+Residual(Expression) 0.4 0.2 0.0 0.2 for nurse 1 2 3 type colony

Unbalancing the Zooplankton Data What if my data is unbalanced? zoop_u <- zoop[-c(1,2),] An Unbalanced ANOVA Unbalanced Data and Type I SS zoop_u_lm <- update(zoop_lm, data=zoop_u) anova(zoop_u_lm) Analysis of Variance Table Response: zooplankton Df Sum Sq Mean Sq F value Pr(>F) treatment 2 4.18 2.088 16.48 0.0037 block 4 1.75 0.437 3.45 0.0860 Residuals 6 0.76 0.127 Missing cells (i.e., treatment-block combinations) mean that order matters in testing SS zoop_u_lm1 <- lm(zooplankton treatment + block, data=zoop_u) zoop_u_lm2 <- lm(zooplankton block + treatment, data=zoop_u) Intercept versus Treatment and Block versus Treatment + Block will not produce different SS Is this valid? Can we use Type I sequential SS?

Unbalanced Data and Type I SS Solution: Marginal, or Type II SS Analysis of Variance Table Response: zooplankton Df Sum Sq Mean Sq F value Pr(>F) treatment 2 4.18 2.088 16.48 0.0037 block 4 1.75 0.437 3.45 0.0860 Residuals 6 0.76 0.127 Analysis of Variance Table Response: zooplankton Df Sum Sq Mean Sq F value Pr(>F) block 4 2.24 0.559 4.41 0.053 treatment 2 3.69 1.843.55 0.005 Residuals 6 0.76 0.127 SS of Block: Treatment versus Treatment + Block SS of Treatment: Block versus Block + Treatment Note: Because of marginality, the sum of all SS will no longer equal SST Solution: Marginal, or Type II SS Neanderthals and Sums of Squares Anova(zoop_u_lm1) Anova Table (Type II tests) Response: zooplankton Sum Sq Df F value Pr(>F) treatment 3.69 2.55 0.005 block 1.75 4 3.45 0.086 Residuals 0.76 6 Note the capital A - this is a function from the car package. How big was their brain?

Does Species Matter for Brain Size? The General Linear Model 7.5 7.4 Y = βx + ɛ lnbrain 7.3 7.2 7.1 4.0 4.1 4.2 4.3 4.4 lnmass species neanderthal We want to evaluate the species effect, controlling for brain size - is size balanced? recent This equation is huge. X can be anything - categorical, continuous, etc. One easy way to see this is if we want to control for the effect of a covariate - i.e., ANCOVA Type of SS matters, as covariate is de facto unbalanced Exercise: Fit like a cave man Type of SS Matters Fit a model that will describe brain size from this data Does species matter? Compare type I and type II results Use Component-Residual plots to evaluate results Analysis of Variance Table Response: lnbrain Df Sum Sq Mean Sq F value Pr(>F) species 1 0.0001 0.0001 0.01 0.91 lnmass 1 0.1300 0.1300 29.28 4.3e-06 Residuals 36 0.1599 0.0044 Anova Table (Type II tests) Response: lnbrain Sum Sq Df F value Pr(>F) species 0.0276 1 6.2 0.017 lnmass 0.1300 1 29.3 4.3e-06 Residuals 0.1599 36

Species Effect Species Effect Component + Residual Plots Component+Residual(lnbrain) 0.15 0.10 0.05 0.00 0.05 0.10 Component+Residual(lnbrain) 0.15 0.10 0.05 0.00 0.05 0.10 0.15 0.20 summary(neand_lm)$coefficients Estimate Std. Error t value Pr(> t ) (Intercept) 5.18807 0.39526 13.126 2.736e-15 speciesrecent 0.07028 0.02822 2.491 1.749e-02 lnmass 0.49632 0.09173 5.411 4.262e-06 neanderthal recent 4.0 4.1 4.2 4.3 4.4 species lnmass