Regression Part II. One-factor ANOVA. Another dummy variable coding scheme. Contrasts. Multiple comparisons. Interactions.


One-factor Analysis of Variance. Categorical explanatory variable, quantitative response variable, p categories (groups). H0: all p population means are equal. Normal conditional distributions with equal variances.

Dummy Variables. You have seen indicator dummy variables with an intercept, and effect coding (with an intercept). Cell means coding is also useful at times.

A common error: a categorical explanatory variable with p categories, p dummy variables (rather than p-1), and an intercept. Then p population means are represented by p+1 regression coefficients, and the representation is not unique.

But suppose you leave off the intercept. Now there are p regression coefficients and p population means. The correspondence is unique, and the model can be handy (less algebra). This is called cell means coding.

Cell means coding: p indicators and no intercept
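As a sketch of cell means coding, one can build the p-indicator design matrix with no intercept and check that least squares reproduces the group sample means exactly (the data and group means below are invented for illustration):

```python
import numpy as np

# Hypothetical data: 3 groups of 10 observations with invented means.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(m, 1.0, 10) for m in (5.0, 7.0, 9.0)])
group = np.repeat([0, 1, 2], 10)

# Cell means coding: one indicator column per group, no intercept column.
X = np.zeros((30, 3))
X[np.arange(30), group] = 1.0

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# With cell means coding, each coefficient is the sample mean of its group.
group_means = np.array([y[group == g].mean() for g in range(3)])
print(np.allclose(beta, group_means))
```

This is the "unique correspondence" from the slide: the j-th regression coefficient is exactly the j-th group mean.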

Add a covariate: x4

Contrasts. A contrast is a linear combination of the population means whose weights sum to zero: c = a1 µ1 + a2 µ2 + ... + ap µp. It is estimated by replacing each population mean with the corresponding sample mean: ĉ = a1 Ȳ1 + a2 Ȳ2 + ... + ap Ȳp.

The overall F-test is a test of p-1 contrasts c = a1 µ1 + a2 µ2 + ... + ap µp.

In a one-factor design, mostly what you want are tests of contrasts, or collections of contrasts. You could do it with any dummy variable coding scheme; cell means coding is often most convenient. With β = µ, test H0: Lβ = h. You can also get a confidence interval for any single contrast using the t distribution.
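A minimal sketch of the t-based test and confidence interval for a single contrast, with invented data for p = 3 groups and weights a = (1, 1, -2), which compare the average of the first two means to the third:

```python
import numpy as np
from scipy import stats

# Invented data: three groups of 15, common SD 2, made-up means.
rng = np.random.default_rng(3)
groups = [rng.normal(m, 2.0, 15) for m in (10.0, 12.0, 16.0)]
a = np.array([1.0, 1.0, -2.0])          # contrast weights; they sum to zero

means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])
N, p = ns.sum(), len(groups)

# Pooled within-group variance (MSE), with N - p degrees of freedom.
mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - p)

c_hat = a @ means                        # estimated contrast
se = np.sqrt(mse * np.sum(a**2 / ns))    # its standard error
t = c_hat / se
pval = 2 * stats.t.sf(abs(t), N - p)

# 95% confidence interval for the contrast, using the t distribution.
tcrit = stats.t.ppf(0.975, N - p)
lo, hi = c_hat - tcrit * se, c_hat + tcrit * se
print(pval < 0.05)
```

The same machinery, written in matrix form as H0: Lβ = h with cell means coding, extends to testing several contrasts at once.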

Multiple Comparisons. Most hypothesis tests are designed to be carried out in isolation. But if you do a lot of tests and all the null hypotheses are true, the chance of rejecting at least one of them can be a lot more than α. This is inflation of the Type I error probability, otherwise known as the curse of a thousand t-tests. Multiple comparisons (sometimes called follow-up tests, post hoc tests, or probing) try to offer a solution.

Multiple Comparisons. Protect a family of tests against Type I error at some joint significance level α: if all the null hypotheses are true, the probability of rejecting at least one is no more than α.

Multiple comparison tests of contrasts in a one-factor design. The usual null hypothesis is µ1 = ... = µp. Usually you do them after rejecting the initial null hypothesis with an ordinary F test; call them follow-up tests. The big three are Bonferroni, Tukey, and Scheffé.

Bonferroni. Based on Bonferroni's inequality: Pr(A1 ∪ A2 ∪ ... ∪ Ak) ≤ Pr(A1) + Pr(A2) + ... + Pr(Ak). It applies to any collection of k tests. Assume all k null hypotheses are true, and let Aj be the event that null hypothesis j is rejected. Do the tests as usual, but reject each H0 only if p < 0.05/k. Or, adjust the p-values: multiply them by k, and reject if pk < 0.05.
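The two equivalent Bonferroni recipes on this slide can be sketched numerically (the p-values below are made up):

```python
import numpy as np

# Made-up raw p-values from k = 3 tests.
pvals = np.array([0.004, 0.020, 0.300])
k = len(pvals)
alpha = 0.05

# Recipe 1: compare each raw p-value to alpha/k.
reject = pvals < alpha / k

# Recipe 2: multiply the p-values by k (capping at 1) and compare to alpha.
adjusted = np.minimum(pvals * k, 1.0)

print(list(reject), list(adjusted))
```

Both recipes reject exactly the same hypotheses; the adjusted version is convenient for reporting.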

Bonferroni. Advantage: flexible; applies to any collection of hypothesis tests. Advantage: easy to do. Disadvantage: you must know what all the tests are before seeing the data. Disadvantage: a little conservative; the true joint significance level is less than α.

Tukey (HSD). Based on the distribution of the largest sample mean minus the smallest. Applies only to pairwise comparisons of means. If the sample sizes are equal, it is the most powerful method, period. If the sample sizes are not equal, it is a bit conservative.
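Tukey's HSD can be run directly; a minimal sketch using SciPy's `tukey_hsd` with simulated data for three groups (the means, standard deviation, and sample sizes are all invented):

```python
import numpy as np
from scipy.stats import tukey_hsd

# Invented data: three groups of 12 with common SD 2.
rng = np.random.default_rng(4)
g1 = rng.normal(10.0, 2.0, 12)
g2 = rng.normal(10.5, 2.0, 12)
g3 = rng.normal(15.0, 2.0, 12)

res = tukey_hsd(g1, g2, g3)
# res.pvalue[i, j] is the Tukey-adjusted p-value comparing group i to group j.
print(res.pvalue.shape)
```

All pairwise comparisons are adjusted jointly, so the family of tests is protected at level α.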

Scheffé. Find the usual critical value for the initial test and multiply it by p-1; this is the Scheffé critical value. The family includes all contrasts: infinitely many! You don't need to specify them in advance. Based on the union-intersection principle.
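A minimal sketch of the Scheffé recipe, assuming made-up values of p = 4 groups and N = 24 observations (and, for the adjusted p-value mentioned later, an invented observed F statistic):

```python
from scipy.stats import f

p, N, alpha = 4, 24, 0.05   # invented design: 4 groups, 24 observations

# Usual critical value of the initial test: F with p-1 and N-p df.
f_crit = f.ppf(1 - alpha, p - 1, N - p)

# Multiply by p-1: the Scheffé critical value for any single contrast's F.
scheffe_crit = (p - 1) * f_crit

# Scheffé-adjusted p-value: tail area beyond F/(p-1) under the initial
# test's null distribution (F_obs is a made-up observed statistic).
F_obs = 11.0
p_adj = f.sf(F_obs / (p - 1), p - 1, N - p)
print(scheffe_crit > f_crit, 0.0 < p_adj < 1.0)
```

Because the Scheffé critical value is (p-1) times the usual one, any contrast's test is more stringent, which is the price of protecting the infinite family.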

General principle. The intersection of the follow-up null hypothesis regions is contained in the null hypothesis region of the overall test, so if all the follow-up null hypotheses are true, the null hypothesis of the overall test is true. The union of the follow-up critical regions is contained in the critical region of the overall test, so if all the follow-up null hypotheses are true, the probability of rejecting at least one of them is no more than α, the significance level of the overall test. A follow-up test cannot reject H0 if the overall test does not.

(Diagram: sample space and parameter space.) In the sample space, the union of the follow-up critical regions is contained in the critical region of the overall test; in the parameter space, the null hypothesis region of the overall test is contained in the intersection of the follow-up null hypothesis regions. The ideal is equality, not just containment. Scheffé tests enjoy equality.

Scheffé tests are union-intersection tests. Follow-up tests cannot reject H0 if the initial F-test does not; this is not quite true of Bonferroni and Tukey. If the initial test (of p-1 contrasts) rejects H0, there is a contrast for which the Scheffé test will reject H0 (not necessarily a pairwise comparison). The adjusted p-value is the tail area beyond F/(p-1) using the null distribution of the initial test.

Which method should you use? If the sample sizes are nearly equal and you are only interested in pairwise comparisons, use Tukey, because it is the most powerful. If the sample sizes are not close to equal and you are only interested in pairwise comparisons, there is (amazingly) no harm in applying all three methods and picking the one that gives you the greatest number of significant results. (It's okay because this choice could be determined in advance based on the number of treatments, α, and the sample sizes.)

If you are interested in contrasts that go beyond pairwise comparisons and you can specify all of them before seeing the data, Bonferroni is almost always more powerful than Scheffé. (Tukey is out.) If you want lots of special contrasts but you don't know in advance exactly what they all are, Scheffé is the only honest way to go, unless you have a separate replication data set.

Interactions. Interaction between independent variables means "it depends": the relationship between one explanatory variable and the response variable depends on the value of the other explanatory variable. Can have quantitative by quantitative, quantitative by categorical, and categorical by categorical interactions.

Quantitative by Quantitative. Y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε, so E(Y|x) = β0 + β1 x1 + β2 x2 + β3 x1 x2. For fixed x2, E(Y|x) = (β0 + β2 x2) + (β1 + β3 x2) x1. Both slope and intercept depend on the value of x2. And for fixed x1, the slope and intercept relating x2 to E(Y) depend on the value of x1.
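This slope-depends-on-x2 algebra can be checked numerically; a sketch with invented coefficients and noiseless data, so least squares recovers them exactly:

```python
import numpy as np

# Invented coefficients for E(Y|x) = b0 + b1*x1 + b2*x2 + b3*x1*x2.
b0, b1, b2, b3 = 1.0, 2.0, -1.0, 0.5
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 10.0, 50)
x2 = rng.uniform(0.0, 10.0, 50)
y = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2   # no error term: exact fit

X = np.column_stack([np.ones(50), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# For fixed x2, the slope relating x1 to E(Y) is beta1 + beta3*x2,
# so it changes with x2: that is the interaction.
slope_at_0 = beta[1] + beta[3] * 0.0
slope_at_4 = beta[1] + beta[3] * 4.0
print(round(slope_at_0, 6), round(slope_at_4, 6))
```

With b1 = 2 and b3 = 0.5, the fitted slope on x1 is 2 at x2 = 0 and 4 at x2 = 4.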

Quantitative by Categorical. One regression line for each category. Interaction means the slopes are not equal. Form a product of the quantitative variable with each dummy variable for the categorical variable. For example, with three treatments and one covariate, x1 is the covariate and x2, x3 are dummy variables: Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x1 x2 + β5 x1 x3 + ε.

General principle. Interaction between A and B means the relationship of A to Y depends on the value of B, and the relationship of B to Y depends on the value of A. The two statements are formally equivalent.

Make a table. E(Y|x) = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x1 x2 + β5 x1 x3

Group  x2  x3  E(Y|x)
1      1   0   (β0 + β2) + (β1 + β4) x1
2      0   1   (β0 + β3) + (β1 + β5) x1
3      0   0   β0 + β1 x1

Group  x2  x3  E(Y|x)
1      1   0   (β0 + β2) + (β1 + β4) x1
2      0   1   (β0 + β3) + (β1 + β5) x1
3      0   0   β0 + β1 x1

What null hypothesis would you test for equal slopes? For comparing the slopes of groups one and three? Of groups one and two? For equal regressions? For an interaction between group and x1?
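From the table, the group slopes are β1 + β4, β1 + β5, and β1, so equal slopes means H0: β4 = β5 = 0, which can be tested by comparing the full model to the reduced model without the product terms. A sketch with simulated data (coefficients, sample sizes, and noise level are all invented; the true slopes differ across groups, so the test should reject):

```python
import numpy as np
from scipy import stats

# Simulated data: 3 groups of 20, covariate x1, true slopes 1, 2, 3.
rng = np.random.default_rng(2)
n = 60
x1 = rng.uniform(0.0, 10.0, n)
group = np.repeat([0, 1, 2], n // 3)
x2 = (group == 0).astype(float)   # dummy variable for group one
x3 = (group == 1).astype(float)   # dummy variable for group two
y = 1.0 + (1.0 + group) * x1 + rng.normal(0.0, 1.0, n)

def rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3, x1 * x2, x1 * x3])
X_red = np.column_stack([ones, x1, x2, x3])   # model under H0: b4 = b5 = 0

df_num, df_den = 2, n - X_full.shape[1]
F = ((rss(X_red, y) - rss(X_full, y)) / df_num) / (rss(X_full, y) / df_den)
pval = stats.f.sf(F, df_num, df_den)
print(pval < 0.05)
```

The other null hypotheses in the table (for example β4 = 0, or β4 = β5) can be tested the same way by choosing a different reduced model.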

What to do if H0: β4 = β5 = 0 is rejected. How do you test Group, controlling for x1? A reasonable choice is to set x1 to its sample mean and compare treatments at that point. How about setting x1 to the sample mean of each group (3 different values)? With random assignment to Group, all three means just estimate E(X1), and the mean of all the x1 values is a better estimate.
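The "compare treatments at x1 = its sample mean" idea amounts to evaluating each group's fitted line at x̄; a sketch with invented coefficients and an invented x̄:

```python
import numpy as np

# Invented fitted coefficients b0..b5 for
# E(Y|x) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x1*x2 + b5*x1*x3,
# and an invented overall sample mean of the covariate.
beta = np.array([2.0, 1.0, 3.0, 1.5, 0.8, -0.4])
x1_bar = 5.0

# E(Y|x) at x1 = x1_bar for groups 1, 2, 3 (dummy patterns (1,0), (0,1), (0,0)).
ey = np.array([
    (beta[0] + beta[2]) + (beta[1] + beta[4]) * x1_bar,  # group 1
    (beta[0] + beta[3]) + (beta[1] + beta[5]) * x1_bar,  # group 2
    beta[0] + beta[1] * x1_bar,                          # group 3
])
print(ey)
```

Differences among these three values, such as ey[0] - ey[2] = β2 + β4 x̄, are contrasts of the regression coefficients, so they can be tested with the same Lβ = h machinery as before.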

Categorical by Categorical. Soon. But first, an example of multiple comparisons.

Copyright Information. This slide show was prepared by Jerry Brunner, Department of Statistical Sciences, University of Toronto. It is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These PowerPoint slides are available from the course website: http://www.utstat.toronto.edu/brunner/oldclass/appliedf16