22s:152 Applied Linear Regression
Chapter 8: ANOVA

NOTE: We will meet in the lab on Monday October 10.

One-way ANOVA

- Focuses on testing for differences among group means.
- Take random samples from each of m populations.
- n_i is the sample size in the ith population, for i = 1, ..., m.
- y_ij is the jth observation in the ith population.

There are a couple of commonly used models for a one-way ANOVA with m groups.

The cell means model:

    Y_{ij} = \mu_i + \epsilon_{ij},  with  \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),
    i = 1, 2, ..., m,   j = 1, 2, ..., n_i

So E[Y_{1j}] = \mu_1, and all observations from group 1 have the same mean, \mu_1. In general, the mean of group i is \mu_i.

The mean parameters to be estimated are \mu_1, \mu_2, ..., \mu_m. There is 1 noise parameter to estimate, \sigma^2.

Estimators:

    \hat{\mu}_i = \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}

The estimate \hat{\mu}_i for a group is just the sample group mean.

Because constant variance is assumed across groups, \sigma^2 is estimated using a pooled estimate:

    \hat{\sigma}^2 = s_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_m - 1)s_m^2}{N - m}

where s_i^2 is the sample variance in the ith group and N = \sum_i n_i is the total sample size. The pooled estimate of \sigma is s_P = \sqrt{s_P^2}.

Now, a different way to parameterize the same situation...

The effects model:

    Y_{ij} = \mu + \alpha_i + \epsilon_{ij},  with  \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),
    i = 1, 2, ..., m,   j = 1, 2, ..., n_i

So E[Y_{1j}] = \mu + \alpha_1, and all observations from group 1 have the same mean, \mu + \alpha_1.

In this model there are m groups (m estimated means), but we're using m + 1 parameters to define the mean structure. This is an over-parameterization. Different sets of parameter values (\mu, \alpha_1, ..., \alpha_m) can give the same fitted values (i.e., can give the same estimated group means).
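As a quick illustration, here is a minimal R sketch of the pooled variance computation (the data frame name dat and the column names y and grp are hypothetical; any one-way layout would do):

> ni = tapply(dat$y, dat$grp, length)    # group sample sizes n_i
> s2i = tapply(dat$y, dat$grp, var)      # group sample variances s_i^2
> N = sum(ni)                            # total sample size
> m = length(ni)                         # number of groups
> s2P = sum((ni - 1) * s2i) / (N - m)    # pooled estimate of sigma^2
> sP = sqrt(s2P)                         # pooled estimate of sigma

Whether the data are balanced or not, s2P here should match the residual mean square (MSE) from lm(y ~ grp) fit to the same data.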

For example, suppose m = 3, and \bar{Y}_1 = 10, \bar{Y}_2 = 20, and \bar{Y}_3 = 30. In the over-parameterized effects model,

    \hat{Y}_{ij} = \hat{\mu} + \hat{\alpha}_i   for i = 1, 2, 3

Many different combinations of (\mu, \alpha_1, \alpha_2, \alpha_3) estimates will give these same estimated group means of (10, 20, 30), for example...

    \hat{\mu}   \hat{\alpha}_1   \hat{\alpha}_2   \hat{\alpha}_3   \hat{Y}_{1j}   \hat{Y}_{2j}   \hat{Y}_{3j}
      0             10               20               30               10             20             30
    -10             20               30               40               10             20             30
     20            -10                0               10               10             20             30

This means we have to use a constraint (or restriction) to make the parameters in the model identifiable (uniquely determined).

The effects model: Y_{ij} = \mu + \alpha_i + \epsilon_{ij}

The \alpha_m = 0 constraint: Set the last group's parameter to zero (essentially, delete the parameter for the last category). Under this constraint, group m is seen as the baseline group:

    \alpha_m = 0,  so  E[Y_{mj}] = \mu + \alpha_m = \mu

- \mu represents the mean of the mth group under this constraint.
- \alpha_i is the distance of group i from group m (the \alpha_i's give distances from the baseline group).

This may or may not be a useful interpretation for your situation.

Dummy regressor coding for the \alpha_m = 0 constraint with m = 3:

    Category   D1   D2
    group 1     1    0
    group 2     0    1
    group 3     0    0

This is the coding we've been using so far with our dummy regressors (we'll call this "baseline coding" or "indicator coding").

Regression model:

    Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i

Model by group...

    Group 1: Y_i = \mu + \alpha_1 + \epsilon_i
    Group 2: Y_i = \mu + \alpha_2 + \epsilon_i
    Group 3: Y_i = \mu + \epsilon_i

The effects model: Y_{ij} = \mu + \alpha_i + \epsilon_{ij}

There is another often-used constraint that produces easily interpretable parameters...

The sum-to-zero constraint:

    \sum_{i=1}^{m} \alpha_i = 0,  i.e.,  \alpha_m = -(\alpha_1 + \alpha_2 + \cdots + \alpha_{m-1})

Only m - 1 dummy variables are needed. Here \mu is seen as the grand mean, or the average of the population means (a nice interpretation).

- If you have balanced data: \hat{\mu} = \bar{Y}, the overall mean of the sample.
- If you have unbalanced data: \hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} \bar{Y}_i, the mean of the sample means.
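A quick way to see both codings in R is to print the built-in contrast matrices (a sketch for m = 3). One caveat: R's default contr.treatment() drops the *first* level, while the \alpha_m = 0 constraint above drops the *last*; contr.SAS() reproduces the latter.

> contr.treatment(3)   # baseline coding, baseline = group 1 (R's default)
> contr.SAS(3)         # baseline coding, baseline = group 3 (the alpha_m = 0 version above)
> contr.sum(3)         # sum-to-zero (deviation) coding: rows (1,0), (0,1), (-1,-1)

In each matrix, rows are the groups and columns are the dummy regressors.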

Under the sum-to-zero constraint, \alpha_i represents the distance of group i from the grand mean. Thus, \alpha_i is the effect of being in group i (it tells us whether the mean of group i is up or down from the grand mean).

Model by group...

    Group 1: Y_i = \mu + \alpha_1 + \epsilon_i
    Group 2: Y_i = \mu + \alpha_2 + \epsilon_i
    Group 3: Y_i = \mu - (\alpha_1 + \alpha_2) + \epsilon_i

Dummy regressor coding for the sum-to-zero constraint with m = 3:

    Category   D1   D2
    group 1     1    0
    group 2     0    1
    group 3    -1   -1

There is no baseline group in this interpretation. You still only need 2 dummy variables, since \alpha_3 = -(\alpha_1 + \alpha_2)... that's the restriction we've imposed.

These (1, 0, -1) dummy regressors are called deviation regressors, because the interpretation gives values as distances (or deviations) from the grand mean.

Regression model (looks the same as with indicator coding):

    Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i

Example: Deviation regressors - back to the Pet and Stress data

We'll now use a different dummy regressor coding of the same situation, using deviation regressors for the dummy variables.

    Category   D1   D2
    Control     1    0
    Friend      0    1
    Pet        -1   -1

> pets=read.csv("pets.csv")
> attach(pets)
> names(pets)
[1] "group" "rate"
> levels(group)
[1] "C" "F" "P"

Create the deviation regressors...

> n=nrow(pets)
> dummy.1=rep(0,n)
> dummy.1[group=="C"]= 1
> dummy.1[group=="P"]= -1
> dummy.2=rep(0,n)
> dummy.2[group=="F"]= 1
> dummy.2[group=="P"]= -1
> data.frame(group,dummy.1,dummy.2)
  group dummy.1 dummy.2
1     P      -1      -1
2     F       0       1
3     P      -1      -1
4     C       1       0
5     C       1       0
...

Regression model: Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i

> lm.out=lm(rate ~ dummy.1 + dummy.2)
> lm.out$coefficients
(Intercept)     dummy.1     dummy.2
82.44408889  0.07997778  8.88104444

These are \hat{\mu}, \hat{\alpha}_1, and \hat{\alpha}_2, respectively.
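As an aside, here is a sketch of how the same fit could be obtained without building the dummies by hand, by attaching the sum-to-zero contrast directly to the factor. Since the factor's levels are C, F, P (so Pet comes last), this should reproduce the coefficients above:

> lm.out2 = lm(rate ~ group, contrasts = list(group = "contr.sum"))
> coef(lm.out2)   # (Intercept) = mu-hat, group1 = alpha1-hat (C), group2 = alpha2-hat (F)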

Since this is balanced data, \hat{\mu} is the overall mean:

> mean(rate)
[1] 82.44409

All three group means, \bar{Y}_1, \bar{Y}_2, \bar{Y}_3:

> tapply(rate,group,mean)
       C        F        P
82.52407 91.32513 73.48307

Control treatment group: \mu + \alpha_1

> lm.out$coefficients[1]+lm.out$coefficients[2]
[1] 82.52407

Friend treatment group: \mu + \alpha_2

> lm.out$coefficients[1]+lm.out$coefficients[3]
[1] 91.32513

Pet treatment group: \mu - (\alpha_1 + \alpha_2)

> lm.out$coefficients[1]-(lm.out$coefficients[2]+lm.out$coefficients[3])
[1] 73.48307

You can use the summary statement to get an overall F-test:

> lm.out=lm(rate ~ dummy.1 + dummy.2)
> summary(lm.out)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.44409    1.37269  60.060  < 2e-16 ***
dummy.1      0.07998    1.94128   0.041    0.967
dummy.2      8.88104    1.94128   4.575 4.18e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.208 on 42 degrees of freedom
Multiple R-Squared: 0.4014, Adjusted R-squared: 0.3729
F-statistic: 14.08 on 2 and 42 DF, p-value: 2.092e-05

The F-statistic is 14.08 and the p-value is 0.00002. We reject the null and conclude that there is statistically significant evidence that at least one of the group means is different from the others.

This F-statistic and p-value are EXACTLY the same as when we fit the model using the \alpha_3 = 0 constraint in the part 1 notes (on p. 16).

Hypothesis testing in one-way ANOVA

Cell means model:
    H_0: \mu_1 = \mu_2 = \cdots = \mu_m
    H_A: at least one \mu_i is different

Effects model:
    H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0
    H_A: at least one \alpha_i \neq 0

Both hypotheses are testing the same thing... whether or not all the group means are equal.

ANOVA table and overall F-test

When we represent group by dummy regressors, R sees each dummy variable as a separate covariate (notice how it performs a test for each dummy regressor in the summary). In the pet example, we can test the significance of group by lumping the two covariates together and doing a partial F-test (which here is also the overall F-test, because they are the only predictors in the model).

What about the ANOVA table and the sums of squares?

    RegSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
    RSS   = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
    TSS   = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
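The "lump the dummies together" test can also be carried out explicitly as a nested-model comparison; a short sketch with the pets data (here it reproduces the overall F-test, since the two dummies are the only predictors):

> lm.null = lm(rate ~ 1)                   # intercept-only model
> lm.full = lm(rate ~ dummy.1 + dummy.2)   # model with the group dummies
> anova(lm.null, lm.full)                  # partial F-test: F = 14.08 on 2 and 42 df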

Computing the sums of squares directly:

> RegSS=sum((lm.out$fitted.values-mean(rate))^2)
> RegSS
[1] 2387.689
> RSS=sum((rate-lm.out$fitted.values)^2)
> RSS
[1] 3561.299

    Source       Sum of Squares   df   Mean Square        F
    Regression         2387.689    2      1193.844   14.07954
    Residuals          3561.299   42      84.79285
    Total              5948.988   44

When R sees group as a factor (categorical variable), and it's the ONLY predictor, we can get the RegSS from the anova statement:

> lm.out=lm(rate~group)
> anova(lm.out)
Analysis of Variance Table

Response: rate
          Df Sum Sq Mean Sq F value    Pr(>F)
group      2 2387.7  1193.8  14.079 2.092e-05 ***
Residuals 42 3561.3    84.8
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Classical ANOVA sums of squares

Notation for the sums of squares in a 1-way ANOVA:

    RegSS = SS_{group} = \sum_{i=1}^{m} n_i (\bar{Y}_i - \bar{Y})^2

where \bar{Y}_i is the group i mean and \bar{Y} is the overall sample mean. (Note that \hat{Y}_{ij} = \bar{Y}_i.)

Residual sum of squares:

    RSS = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2

    Source      Sum of Squares                                       df    Mean Square            F
    SS_group    \sum_{i=1}^{m} n_i (\bar{Y}_i - \bar{Y})^2           m-1   RegSS/(m-1) = RegMS    RegMS/MSE
    Residuals   \sum_{i=1}^{m} \sum_{j=1}^{n_i} (Y_{ij}-\bar{Y}_i)^2 n-m   RSS/(n-m) = MSE
    Total       \sum_{i=1}^{m} \sum_{j=1}^{n_i} (Y_{ij}-\bar{Y})^2   n-1

Assessing the assumptions of one-way ANOVA

1. Normal distribution of the response variable in each population (or group):
   - histograms and boxplots for the sample data from each population (done separately)
   - normal QQ plot for the sample data from each population (done separately)
   - normal QQ plot of all residuals from the fitted model

2. Same standard deviation (or variance) in all populations:
   - can use Levene's test for homogeneity of variance (but it assumes normality of the observations)
   - rule of thumb: if the largest sample standard deviation isn't more than twice as large as the smallest sample standard deviation, the assumption is probably met closely enough for ANOVA to be OK
   - if the n_i's are equal (balanced design), the ANOVA is less sensitive to violation of the equal-variance assumption

If one or both assumptions are violated, try a transformation. If only normality is violated, try a non-parametric procedure such as the Kruskal-Wallis test. (Both the checks and the fallback are sketched in R below.)
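A minimal sketch of these diagnostics with the pets data. Everything is base R except leveneTest, which is assumed to come from the add-on car package:

> qqnorm(residuals(lm.out)); qqline(residuals(lm.out))  # normality of the residuals
> boxplot(rate ~ group)                                 # shape and spread by group
> tapply(rate, group, sd)                               # rule of thumb: max sd < 2 * min sd
> car::leveneTest(rate ~ group)                         # Levene's test (requires the car package)
> kruskal.test(rate ~ group)                            # non-parametric fallback if normality fails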

Earlier, we mentioned that 1-way ANOVA focuses on testing for differences among group means. Can you get at the differences between means using either effects-model coding method (i.e., either constraint)?

Case 1: Baseline coding (\alpha_m = 0)
- \mu represents the baseline group mean.
- \alpha_1 is the distance of group 1 from the baseline group.
- \alpha_2 is the distance of group 2 from the baseline group.
- \alpha_2 - \alpha_1 is the distance between groups 1 and 2.

Case 2: Sum-to-zero coding (\sum \alpha_i = 0)
- \mu represents the overall or grand mean.
- \alpha_1 is the distance of group 1 from the overall mean.
- \alpha_2 is the distance of group 2 from the overall mean.
- \alpha_2 - \alpha_1 is the distance between groups 1 and 2.

The answer is yes. The interpretation of the parameters depends on the constraint used, but the important results are still the same (p-values, \hat{Y}_{ij} values, etc.). Because hypothesis tests are built on parameter interpretation, the hypothesis test used to answer a given question does depend on the constraint used; one such test is sketched below.
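For instance, here is a sketch of one way to read a specific two-group comparison off a baseline-coded fit of the pets data. R's default treatment coding uses the first level as the baseline, so relevel() can make "P" the reference; the fitted coefficients for C and F then estimate each group's distance from the Pet group, and the C-vs-F difference \alpha_2 - \alpha_1 is the difference of those two coefficients:

> g = relevel(group, ref = "P")          # make Pet the baseline group
> summary(lm(rate ~ g))$coefficients     # rows gC, gF: each group vs. Pet, with t-tests
> # the C-vs-F point estimate is coef(lm(rate ~ g))["gF"] - coef(lm(rate ~ g))["gC"]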