Interactions and Factorial ANOVA


Interactions and Factorial ANOVA. STA442/2101 F 2018. See last slide for copyright information.

Interactions
Interaction between explanatory variables means "It depends": the relationship between one explanatory variable and the response variable depends on the value of the other explanatory variable. Interactions can be quantitative by quantitative, quantitative by categorical, or categorical by categorical.

Quantitative by Quantitative
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$$
$$E(Y \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
For fixed $x_2$,
$$E(Y \mid \mathbf{x}) = (\beta_0 + \beta_2 x_2) + (\beta_1 + \beta_3 x_2) x_1$$
Both slope and intercept depend on the value of $x_2$. And for fixed $x_1$, the slope and intercept relating $x_2$ to $E(Y)$ depend on the value of $x_1$.
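The regrouping for fixed x2 can be checked numerically. The sketch below is only an illustration: the function name `conditional_line` and the coefficient values are hypothetical, not taken from the slides.

```python
def conditional_line(beta, x2):
    """Return (intercept, slope) of E(Y|x) as a function of x1 with x2 held fixed.

    beta = (b0, b1, b2, b3) for E(Y|x) = b0 + b1*x1 + b2*x2 + b3*x1*x2.
    """
    b0, b1, b2, b3 = beta
    return b0 + b2 * x2, b1 + b3 * x2


# Hypothetical coefficients, chosen only for illustration.
beta = (1.0, 2.0, 0.5, -1.0)
for x2 in (0.0, 1.0, 2.0):
    intercept, slope = conditional_line(beta, x2)
    print(f"x2 = {x2}: intercept = {intercept}, slope = {slope}")
```

With these made-up values the slope relating x1 to E(Y) shrinks as x2 grows, which is the "it depends" behaviour the product term encodes.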

Quantitative by Categorical
One regression line for each category. Interaction means the slopes are not equal. Form a product of the quantitative variable by each dummy variable for the categorical variable. For example, with three treatments and one covariate, $x_1$ is the covariate and $x_2$, $x_3$ are dummy variables:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \epsilon$$

General principle
Interaction between A and B means that the relationship of A to Y depends on the value of B, and that the relationship of B to Y depends on the value of A. The two statements are formally equivalent.

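Make a table
$$E(Y \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$$

| Group | $x_2$ | $x_3$ | $E(Y \mid \mathbf{x})$ |
|---|---|---|---|
| 1 | 1 | 0 | $(\beta_0 + \beta_2) + (\beta_1 + \beta_4) x_1$ |
| 2 | 0 | 1 | $(\beta_0 + \beta_3) + (\beta_1 + \beta_5) x_1$ |
| 3 | 0 | 0 | $\beta_0 + \beta_1 x_1$ |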
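A small numerical check of the table, assuming NumPy is available. The data are hypothetical and noise-free (each group lies exactly on its own line), and the helper name `design_matrix` is invented for this sketch; the point is only that the fitted coefficients reproduce each group's intercept and slope as the table says.

```python
import numpy as np


def design_matrix(x1, group):
    """Columns: 1, x1, x2, x3, x1*x2, x1*x3, with x2 and x3 indicators for groups 1 and 2."""
    x1 = np.asarray(x1, dtype=float)
    group = np.asarray(group)
    x2 = (group == 1).astype(float)
    x3 = (group == 2).astype(float)
    return np.column_stack([np.ones_like(x1), x1, x2, x3, x1 * x2, x1 * x3])


# Hypothetical (intercept, slope) for each group; not from the slides.
lines = {1: (5.0, 2.0), 2: (3.0, -1.0), 3: (1.0, 0.5)}
x1 = np.tile([0.0, 1.0, 2.0, 3.0], 3)
group = np.repeat([1, 2, 3], 4)
y = np.array([lines[int(g)][0] + lines[int(g)][1] * x for g, x in zip(group, x1)])

beta, *_ = np.linalg.lstsq(design_matrix(x1, group), y, rcond=None)
b0, b1, b2, b3, b4, b5 = beta

# Rows of the table: group -> (intercept, slope) in terms of the betas.
table = {
    1: (b0 + b2, b1 + b4),
    2: (b0 + b3, b1 + b5),
    3: (b0, b1),
}
for g, (intercept, slope) in table.items():
    print(f"group {g}: intercept = {intercept:.3f}, slope = {slope:.3f}")
```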

Using the table of group regression lines, what null hypothesis would you test for each of the following? Equal slopes; comparing slopes for group one vs three; comparing slopes for group one vs two; equal regressions; interaction between group and $x_1$.
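For the first item, equal slopes corresponds to both product-term coefficients being zero, so one way to test it is to compare the full model with a reduced model that drops the two product columns. The sketch below assumes NumPy, uses simulated data with made-up coefficients, and computes only the F statistic and its degrees of freedom; the helper names `rss` and `f_statistic` are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def f_statistic(X_full, X_reduced, y):
    """Full versus reduced model F statistic, returned with its degrees of freedom."""
    n, p = X_full.shape
    r = p - X_reduced.shape[1]          # number of coefficients set to zero under H0
    rss_full = rss(X_full, y)
    F = ((rss(X_reduced, y) - rss_full) / r) / (rss_full / (n - p))
    return F, r, n - p


# Simulated data with unequal slopes (all numbers are made up for illustration).
rng = np.random.default_rng(0)
group = np.repeat([1, 2, 3], 20)
x1 = rng.uniform(0, 10, size=group.size)
x2 = (group == 1).astype(float)
x3 = (group == 2).astype(float)
y = (1 + 0.5 * x1 + 2 * x2 + 1 * x3 + 1.5 * x1 * x2 - 0.5 * x1 * x3
     + rng.normal(size=group.size))

X_full = np.column_stack([np.ones_like(x1), x1, x2, x3, x1 * x2, x1 * x3])
X_equal_slopes = X_full[:, :4]          # drops x1*x2 and x1*x3

F, df1, df2 = f_statistic(X_full, X_equal_slopes, y)
print(f"F = {F:.2f} on {df1} and {df2} degrees of freedom")
```

Under the same setup, equal regressions would use `X_full[:, :2]` as the reduced model, since all three groups then share one intercept and one slope.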

What to do if $H_0: \beta_4 = \beta_5 = 0$ is rejected
How do you test Group controlling for $x_1$? A reasonable choice is to set $x_1$ to its sample mean and compare treatments at that point.
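Written out for the three-group model above, comparing treatments at the sample mean amounts to the following differences. This is a worked sketch that follows from the table of group regression lines; the notation $\bar{x}_1$ for the sample mean of the covariate is an assumption.

```latex
\begin{align*}
\text{Group 1 vs 3:}\quad & E(Y \mid \text{group 1}, x_1 = \bar{x}_1) - E(Y \mid \text{group 3}, x_1 = \bar{x}_1)
    = \beta_2 + \beta_4 \bar{x}_1 \\
\text{Group 2 vs 3:}\quad & E(Y \mid \text{group 2}, x_1 = \bar{x}_1) - E(Y \mid \text{group 3}, x_1 = \bar{x}_1)
    = \beta_3 + \beta_5 \bar{x}_1 \\
\text{Group 1 vs 2:}\quad & (\beta_2 - \beta_3) + (\beta_4 - \beta_5)\, \bar{x}_1
\end{align*}
```

No difference among the groups at that point would then be $H_0: \beta_2 + \beta_4 \bar{x}_1 = 0$ and $\beta_3 + \beta_5 \bar{x}_1 = 0$, treating $\bar{x}_1$ as a fixed constant.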

Categorical by Categorical
Naturally part of factorial ANOVA in experimental studies. Also applies to purely observational data.

Factorial ANOVA
More than one categorical explanatory variable

Factorial ANOVA
Categorical explanatory variables are called factors. More than one at a time. Primarily for true experiments, but also used with observational data. If there are observations at all combinations of explanatory variable values, it's called a complete factorial design (as opposed to a fractional factorial).

The potato study
Cases are potatoes. Inoculate with bacteria, store for a fixed time period. Response variable is percent surface area with visible rot. Two explanatory variables, randomly assigned: Bacteria Type (1, 2, 3) and Temperature (1=Cool, 2=Warm).

Two-factor design

| Temp | Bacteria Type 1 | Bacteria Type 2 | Bacteria Type 3 |
|---|---|---|---|
| 1=Cool | | | |
| 2=Warm | | | |

Six treatment conditions

Factorial experiments
Allow more than one factor to be investigated in the same study: Efficiency! Allow the scientist to see whether the effect of an explanatory variable depends on the value of another explanatory variable: Interactions. Thank you again, Mr. Fisher.

Normal with equal variance and conditional (cell) means
[Table: one cell mean for each combination of Temperature (1=Cool, 2=Warm) and Bacteria Type (1, 2, 3)]

Tests
Main effects: Differences among marginal means. Interactions: Differences between differences. (What is the effect of Factor A? It depends on the level of Factor B.)
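To make "differences between differences" concrete, the sketch below works from a 2 by 3 table of cell means. The numbers are hypothetical (they are not results from the potato study), NumPy is assumed, and marginal means are taken here as unweighted averages of the cell means.

```python
import numpy as np

# Hypothetical cell means (percent rot); rows = temperature (Cool, Warm),
# columns = bacteria type (1, 2, 3). Not data from the potato study.
mu = np.array([[6.0, 10.0, 8.0],
               [10.0, 22.0, 15.0]])

# Main effects compare marginal means (here: unweighted averages of cell means).
temp_marginal = mu.mean(axis=1)
bact_marginal = mu.mean(axis=0)

# Effect of temperature (Warm minus Cool) at each bacteria type.
warm_minus_cool = mu[1] - mu[0]

# Interaction: differences between those differences (types 1 and 2 versus type 3).
diff_of_diffs = warm_minus_cool[:-1] - warm_minus_cool[-1]

print("temperature marginal means:", temp_marginal)
print("bacteria marginal means:   ", bact_marginal)
print("Warm - Cool by bacteria:   ", warm_minus_cool)
print("differences of differences:", diff_of_diffs)
```

No interaction would mean every entry of `diff_of_diffs` is zero, which is the same thing as parallel profiles in a plot of the means.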

To understand the interaction, plot the means
[Figure: Temperature by Bacteria Interaction. Rot against Bacteria Type (1, 2, 3), with separate lines for Cool and Warm.]

Either Way
[Figure: two panels, both titled Temperature by Bacteria Interaction. Left: Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm. Right: Rot against Temperature (Cool, Warm), with lines for Bact 1, Bact 2 and Bact 3.]

Non-parallel profiles = Interaction
[Figure: It Depends. Mean Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm.]

Main effects for both variables, no interaction
[Figure: Main Effects Only. Mean Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm.]

Main effect for Bacteria only
[Figure: Mean Rot against Temperature (Cool, Warm), with lines for Bact 1, Bact 2 and Bact 3.]

Main Effect for Temperature Only
[Figure: Temperature Only. Mean Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm.]

Both Main Effects, and the Interaction
[Figure: Mean Rot as a Function of Temperature and Bacteria Type. Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm.]

Should you interpret the main effects?
[Figure: It Depends. Mean Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm.]

A common error
A categorical explanatory variable with p categories, p dummy variables (rather than p-1), and an intercept. There are p population means represented by p+1 regression coefficients, so the representation is not unique.

But suppose you leave off the intercept
Now there are p regression coefficients and p population means. The correspondence is unique, and the model can be handy (less algebra). This is called cell means coding.

Cell means coding: p indicators and no intercept
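A minimal sketch of both points, assuming NumPy and a small made-up data set: with p indicators and no intercept the least-squares coefficients are the group sample means, while adding an intercept to the same p indicators makes the design matrix rank-deficient. The helper name `cell_means_design` is hypothetical.

```python
import numpy as np


def cell_means_design(group, levels):
    """One indicator column per level, no intercept."""
    return np.column_stack([(group == g).astype(float) for g in levels])


# Made-up data for three groups of unequal size.
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
y = np.array([4.0, 6.0, 5.0, 10.0, 12.0, 7.0, 9.0, 8.0, 8.0])
levels = [1, 2, 3]

X = cell_means_design(group, levels)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
group_means = np.array([y[group == g].mean() for g in levels])
print("coefficients:", beta_hat)
print("group means: ", group_means)

# The common error: p indicators plus an intercept gives p + 1 columns of rank p.
X_bad = np.column_stack([np.ones(len(y)), X])
print("columns:", X_bad.shape[1], "rank:", np.linalg.matrix_rank(X_bad))
```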

Add a covariate: x4

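Contrasts
$$c = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_p \mu_p$$
where $a_1 + a_2 + \cdots + a_p = 0$, estimated by
$$\widehat{c} = a_1 \overline{Y}_1 + a_2 \overline{Y}_2 + \cdots + a_p \overline{Y}_p$$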

In a one-factor design
Mostly, what you want are tests of contrasts, or collections of contrasts. You could do it with any dummy variable coding scheme. Cell means coding is often most convenient. With $\boldsymbol{\beta} = \boldsymbol{\mu}$, test $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{h}$. You can get a confidence interval for any single contrast using the t distribution.
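For a single contrast, the estimate, its standard error and the t statistic can be computed directly from the group means and the pooled error variance. The sketch below assumes NumPy, the usual equal-variance model, and made-up data; `contrast_t` is a hypothetical helper name. A confidence interval would additionally need a t critical value for the returned degrees of freedom, from a table or a statistics library.

```python
import numpy as np


def contrast_t(y, group, levels, a):
    """Estimate c = sum(a_j * mu_j); return (c_hat, standard error, t statistic, df)."""
    a = np.asarray(a, dtype=float)
    if abs(a.sum()) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    means = np.array([y[group == g].mean() for g in levels])
    n_j = np.array([(group == g).sum() for g in levels])
    # Pooled within-group variance (MSE) on n - p degrees of freedom.
    sse = sum(((y[group == g] - m) ** 2).sum() for g, m in zip(levels, means))
    df = len(y) - len(levels)
    mse = sse / df
    c_hat = float(a @ means)
    se = float(np.sqrt(mse * np.sum(a ** 2 / n_j)))
    return c_hat, se, c_hat / se, df


# Made-up data for three groups.
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
y = np.array([4.0, 6.0, 5.0, 10.0, 12.0, 7.0, 9.0, 8.0, 8.0])

# Contrast comparing group 1 with group 2.
c_hat, se, t, df = contrast_t(y, group, [1, 2, 3], [1, -1, 0])
print(f"c_hat = {c_hat:.3f}, se = {se:.3f}, t = {t:.3f}, df = {df}")
```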

Testing Contrasts in Factorial Designs
Differences between marginal means are definitely contrasts. Interactions are also sets of contrasts.

Interactions are sets of Contrasts

Interactions are sets of Contrasts
[Figure: Main Effects Only. Mean Rot against Bacteria Type (1, 2, 3), with lines for Cool and Warm.]

Equivalent statements
The effect of A depends upon B. The effect of B depends on A.

Three factors: A, B and C
There are three (sets of) main effects: one each for A, B and C. There are three two-factor interactions: A by B (averaging over C), A by C (averaging over B), and B by C (averaging over A). There is one three-factor interaction: AxBxC.

Meaning of the 3-factor interaction
The form of the A x B interaction depends on the value of C. The form of the A x C interaction depends on the value of B. The form of the B x C interaction depends on the value of A. These statements are equivalent. Use the one that is easiest to understand.

To graph a three-factor interaction
Make a separate mean plot (showing a 2-factor interaction) for each value of the third variable. In the potato study, that means a graph for each type of potato.

Four-factor design
Four sets of main effects, six two-factor interactions, four three-factor interactions, and one four-factor interaction: the nature of the three-factor interaction depends on the value of the 4th factor. There is an F test for each one. And so on.
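The counts follow from choosing which factors take part in each term: with n factors there are "n choose k" terms involving exactly k factors. A short check, using a hypothetical helper `effect_counts`:

```python
from math import comb


def effect_counts(n_factors):
    """Map k to the number of k-factor terms; k = 1 gives the sets of main effects."""
    return {k: comb(n_factors, k) for k in range(1, n_factors + 1)}


print(effect_counts(3))  # {1: 3, 2: 3, 3: 1}
print(effect_counts(4))  # {1: 4, 2: 6, 3: 4, 4: 1}
```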

As the number of factors increases
The higher-way interactions get harder and harder to understand. All the tests are still tests of sets of contrasts (differences between differences of differences ...). But it gets harder and harder to write down the contrasts. Effect coding becomes easier.

Effect coding
Like indicator dummy variables with intercept, but put -1 for the last category.

| Bact | $B_1$ | $B_2$ |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 0 | 1 |
| 3 | -1 | -1 |

| Temperature | $T$ |
|---|---|
| 1=Cool | 1 |
| 2=Warm | -1 |

Interaction effects are products of dummy variables
The A x B interaction: multiply each dummy variable for A by each dummy variable for B, and use these products as additional explanatory variables in the multiple regression. The A x B x C interaction: multiply each dummy variable for C by each product term from the A x B interaction. Test the sets of product terms simultaneously.

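Make a table

| Bact | Temp | $B_1$ | $B_2$ | $T$ | $B_1 T$ | $B_2 T$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 2 | 1 | 0 | -1 | -1 | 0 |
| 2 | 1 | 0 | 1 | 1 | 0 | 1 |
| 2 | 2 | 0 | 1 | -1 | 0 | -1 |
| 3 | 1 | -1 | -1 | 1 | -1 | -1 |
| 3 | 2 | -1 | -1 | -1 | 1 | 1 |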
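The table can be generated mechanically from the effect coding rule and the product rule. In this sketch the helper name `effect_code` is hypothetical; each row is (Bact, Temp, B1, B2, T, B1*T, B2*T).

```python
def effect_code(level, n_levels):
    """Effect-coded dummy variables for a level in 1..n_levels; the last level gets all -1."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if level == j else 0 for j in range(1, n_levels)]


rows = []
for bact in (1, 2, 3):
    for temp in (1, 2):
        b1, b2 = effect_code(bact, 3)
        (t,) = effect_code(temp, 2)
        # Interaction columns are products of the dummy variables.
        rows.append((bact, temp, b1, b2, t, b1 * t, b2 * t))

for row in rows:
    print(row)
```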

Cell and Marginal Means
[Table: cell means by Temperature (1=C, 2=W) and Bacteria Type (1, 2, 3), with marginal means]

We see
The intercept is the grand mean. Regression coefficients for the dummy variables are deviations of the marginal means from the grand mean. What about the interactions?

A bit of algebra shows
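A numerical check of how effect-coded coefficients relate to the means, assuming NumPy and a hypothetical table of cell means (not potato-study results). It solves the six cell equations for the six coefficients and compares them with the grand mean, the marginal deviations, and, for the interaction, the cell mean minus what the grand mean and the two main effects predict. Marginal and grand means are unweighted averages of the cell means here.

```python
import numpy as np

# Hypothetical cell means; rows = bacteria type (1, 2, 3), columns = temperature (Cool, Warm).
mu = np.array([[6.0, 10.0],
               [10.0, 22.0],
               [8.0, 15.0]])

B = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}   # effect coding for bacteria type
T = {1: 1, 2: -1}                          # effect coding for temperature

X, m = [], []
for bact in (1, 2, 3):
    for temp in (1, 2):
        b1, b2 = B[bact]
        t = T[temp]
        X.append([1, b1, b2, t, b1 * t, b2 * t])
        m.append(mu[bact - 1, temp - 1])

# Six equations, six unknowns: intercept, B1, B2, T, B1*T, B2*T.
beta = np.linalg.solve(np.array(X, dtype=float), np.array(m))

grand = mu.mean()
bact_marginal = mu.mean(axis=1)
temp_marginal = mu.mean(axis=0)

print("intercept vs grand mean:", beta[0], grand)
print("B1 vs deviation for type 1:", beta[1], bact_marginal[0] - grand)
print("T  vs deviation for Cool:  ", beta[3], temp_marginal[0] - grand)
print("B1*T vs cell minus additive prediction:",
      beta[4], mu[0, 0] - (grand + beta[1] + beta[3]))
```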

Factorial ANOVA with effect coding is pretty automatic
You don't have to make a table unless asked. It always works as you expect it will. Hypothesis tests are the same as testing sets of contrasts. Covariates present no problem. Main effects and interactions have their usual meanings, controlling for the covariates. Plot the least squares means (Y-hat at x-bar values for covariates).

Again
The intercept is the grand mean. Regression coefficients for the dummy variables are deviations of the marginal means from the grand mean. The test of main effect(s) is a test of the dummy variables for a factor. Interaction effects are products of dummy variables.

Balanced vs. Unbalanced Experimental Designs
Balanced design: cell sample sizes are proportional (maybe equal). Explanatory variables have zero relationship to one another. Numerator SS in ANOVA are independent. Everything is nice and simple. Most experimental studies are designed this way. As soon as somebody drops a test tube, it's no longer true.

Analysis of unbalanced data
When explanatory variables are related, there is potential ambiguity. A is related to Y, B is related to Y, and A is related to B. Who gets credit for the portion of variation in Y that could be explained by either A or B? With a regression approach, whether you use contrasts or dummy variables (equivalent), the answer is nobody. Think of full and reduced models. Equivalently, the general linear test.
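The ambiguity can be seen in a tiny example. The sketch below assumes NumPy and uses a hypothetical, noise-free, unbalanced 2 by 2 layout (cell sizes 6, 2, 2, 6) with main effects only. It compares the sum of squares credited to A when B is ignored with the full versus reduced sum of squares for A controlling for B.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Hypothetical unbalanced layout: cell sizes 6, 2, 2, 6, so A and B are related.
a = np.array([0] * 6 + [0] * 2 + [1] * 2 + [1] * 6, dtype=float)
b = np.array([0] * 6 + [1] * 2 + [0] * 2 + [1] * 6, dtype=float)
y = 2 * a + 3 * b                      # noise-free response, for illustration only

one = np.ones_like(y)
X_1 = one[:, None]                     # intercept only
X_a = np.column_stack([one, a])
X_b = np.column_stack([one, b])
X_ab = np.column_stack([one, a, b])    # full (main effects) model

ss_a_ignoring_b = rss(X_1, y) - rss(X_a, y)
ss_a_given_b = rss(X_b, y) - rss(X_ab, y)   # full vs reduced: A controlling for B
ss_b_given_a = rss(X_a, y) - rss(X_ab, y)   # full vs reduced: B controlling for A

print("SS for A ignoring B:      ", round(ss_a_ignoring_b, 6))
print("SS for A controlling for B:", round(ss_a_given_b, 6))
print("SS for B controlling for A:", round(ss_b_given_a, 6))
print("Total SS:                  ", round(rss(X_1, y), 6))
```

In this made-up layout the two controlled sums of squares add up to less than the total even though the full model fits exactly; the remainder is the shared portion that neither factor is credited with.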

Some software is designed for balanced data
The special purpose formulas are much simpler. They were very useful in the past. Since most data are at least a little unbalanced, they are a recipe for trouble. Most textbook data are balanced, so they cannot tell you what your software is really doing. R's anova and aov functions are designed for balanced data, though anova applied to lm objects can give you what you want if you use it with care.

Copyright Information
This slide show was prepared by Jerry Brunner, Department of Statistical Sciences, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These Powerpoint slides will be available from the course website: http://www.utstat.toronto.edu/brunner/oldclass/appliedf18