22s:152 Applied Linear Regression
Chapter 8: ANOVA

NOTE: We will meet in the lab on Monday, October 10.

One-way ANOVA

Focuses on testing for differences among group means.

Take random samples from each of $m$ populations.

$n_i$ is the sample size in the $i$th population, for $i = 1, \dots, m$.

$y_{ij}$ is the $j$th observation in the $i$th population.
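To fix ideas, here is a minimal R sketch (with made-up data, not the course data) of how one-way ANOVA data are typically organized: a response vector and a factor recording which of the $m$ populations each observation came from.

# Toy illustration (hypothetical data, assumed setup)
set.seed(1)
group <- factor(rep(c("A", "B", "C"), times = c(4, 5, 6)))   # n_1=4, n_2=5, n_3=6
mu    <- c(10, 20, 30)                                       # population means
y     <- rnorm(length(group), mean = mu[as.integer(group)], sd = 2)

tapply(y, group, length)   # the sample sizes n_i
tapply(y, group, mean)     # the group sample means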

There are a couple of commonly used models for a one-way ANOVA with $m$ groups.

The cell means model:
$$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$
$$i = 1, 2, \dots, m \qquad j = 1, 2, \dots, n_i$$

So $E[Y_{1j}] = \mu_1$, and all observations from group 1 have the same mean, $\mu_1$. The mean of group $i$ is $\mu_i$.

The mean parameters to be estimated are $\mu_1, \mu_2, \dots, \mu_m$. There is one noise parameter to estimate, $\sigma^2$.
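As a sketch (using the toy data above), the cell means model can be fit directly in R by removing the intercept with `- 1`, so lm() estimates one mean parameter per group:

# Cell means model: no intercept, one coefficient per group
fit.cm <- lm(y ~ group - 1)
coef(fit.cm)             # mu-hat_1, ..., mu-hat_m: exactly the group sample means
tapply(y, group, mean)   # same values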

Estimators:
$$\hat\mu_i = \bar Y_i = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}$$

The estimate $\hat\mu_i$ for a group is just the sample group mean.

$\sigma^2$ is estimated using a pooled estimate because constant variance is assumed:
$$\hat\sigma^2 = s_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_m - 1)s_m^2}{N - m}$$
where $s_i^2$ is the sample variance in the $i$th group.

Pooled estimate of $\sigma$: $s_P = \sqrt{s_P^2}$
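A minimal sketch of the pooled estimate, again on the toy data, verifying that it matches the residual standard error reported by lm():

# Pooled variance estimate s_P^2 from the group sample variances
ni  <- tapply(y, group, length)
s2i <- tapply(y, group, var)
N   <- sum(ni)
m   <- length(ni)
s2P <- sum((ni - 1) * s2i) / (N - m)
sqrt(s2P)               # pooled SD s_P
summary(fit.cm)$sigma   # lm's "Residual standard error": the same number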

Now, a different way to parameterize the same situation...

The effects model:
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{with } \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$
$$i = 1, 2, \dots, m \qquad j = 1, 2, \dots, n_i$$

So $E[Y_{1j}] = \mu + \alpha_1$, and all observations from group 1 have the same mean, $\mu + \alpha_1$.

In this model, there are $m$ groups ($m$ estimated means), and we're using $m + 1$ parameters to define the mean structure. This is an over-parameterization: different sets of parameter values $(\mu, \alpha_1, \dots, \alpha_m)$ can give the same fitted values (i.e., the same estimated group means).

For example, suppose $m = 3$, and $\bar Y_1 = 10$, $\bar Y_2 = 20$, and $\bar Y_3 = 30$. In the over-parameterized effects model, $\hat Y_{ij} = \hat\mu + \hat\alpha_i$ for $i = 1, 2, 3$. Many different combinations of $(\hat\mu, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3)$ estimates will give these same estimated group means of $(10, 20, 30)$, for example:

  mu-hat   alpha-hat_1   alpha-hat_2   alpha-hat_3   Yhat_1j   Yhat_2j   Yhat_3j
     0          10            20            30          10        20        30
   -10          20            30            40          10        20        30
    20         -10             0            10          10        20        30

This means we have to use a constraint or restriction to make the parameters in the model identifiable (uniquely determined).
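One way to see the over-parameterization concretely is to build the design matrix with an intercept plus all $m$ indicators; it is rank-deficient. A sketch, using the toy data above:

# Intercept column plus an indicator for ALL m groups
X <- cbind(1, model.matrix(~ group - 1))
ncol(X)       # m + 1 = 4 columns...
qr(X)$rank    # ...but rank is only m = 3: the indicators sum to the intercept
              # column, so (mu, alpha_1, ..., alpha_m) are not uniquely determined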

The effects model: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

The $\alpha_m = 0$ constraint: Set the last group's parameter to zero (essentially, delete the parameter for the last category). Under this constraint, group $m$ is seen as the baseline group...

$\alpha_m = 0$, so $E[Y_{mj}] = \mu + \alpha_m = \mu$.

$\mu$ represents the mean of the $m$th group under this constraint.

$\alpha_i$ is the distance of group $i$ from group $m$ (the $\alpha_i$'s give distance from the baseline group).

This may or may not be a useful interpretation for your situation.

Dummy regressor coding for the $\alpha_m = 0$ constraint with $m = 3$:

Category   D1   D2
group 1     1    0
group 2     0    1
group 3     0    0

This is the coding we've been using so far with our dummy regressors (we'll call this baseline coding or indicator coding).

Regression model: $Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i$

Model by group...
Group 1: $Y_i = \mu + \alpha_1 + \epsilon_i$
Group 2: $Y_i = \mu + \alpha_2 + \epsilon_i$
Group 3: $Y_i = \mu + \epsilon_i$
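A note on getting this coding in R: the default contrasts (contr.treatment) take the FIRST factor level as baseline rather than the last; contr.SAS uses the last level, matching the $\alpha_m = 0$ convention above. A sketch on the toy data:

# Baseline (indicator) coding with the LAST group as baseline (alpha_m = 0)
fit.base <- lm(y ~ group, contrasts = list(group = contr.SAS))
coef(fit.base)
# (Intercept) = mu-hat, the mean of the last group;
# the other coefficients are alpha-hat_i, distances from that baseline group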

The effects model: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

There is another often-used constraint that produces easily interpretable parameters...

The sum-to-zero constraint: $\sum_{i=1}^m \alpha_i = 0$, i.e.
$$\alpha_m = -(\alpha_1 + \alpha_2 + \cdots + \alpha_{m-1})$$
so only $m - 1$ dummy variables are needed.

$\mu$ is seen as the grand mean, or the average of the population means (a nice interpretation).

If you have balanced data: $\hat\mu = \bar Y$, the overall mean of the sample.

If you have unbalanced data: $\hat\mu = \frac{1}{m}\sum_{i=1}^m \bar Y_i$, the mean of the sample means.

$\alpha_i$ represents the distance of group $i$ from the grand mean. Thus, $\alpha_i$ is the effect of being in group $i$ (it tells us whether the mean of group $i$ is up or down from the grand mean).

Dummy regressor coding for the sum-to-zero constraint with $m = 3$:

Category   D1   D2
group 1     1    0
group 2     0    1
group 3    -1   -1

Regression model (looks the same as with indicator coding):
$$Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i$$

Model by group...
Group 1: $Y_i = \mu + \alpha_1 + \epsilon_i$
Group 2: $Y_i = \mu + \alpha_2 + \epsilon_i$
Group 3: $Y_i = \mu - (\alpha_1 + \alpha_2) + \epsilon_i$

There is no baseline group in this interpretation. You still only need 2 dummy variables, as $\alpha_3 = -(\alpha_1 + \alpha_2)$... that's the restriction we've imposed.

These (1, 0, -1) dummy regressors are called deviation regressors, because the interpretation gives values as distances (or deviations) from the grand mean.
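The example that follows builds the deviation regressors by hand; equivalently (a sketch, on the toy data), R can generate the same (1, 0, -1) coding automatically with contr.sum:

# Sum-to-zero (deviation) coding without constructing the dummies by hand
fit.sum <- lm(y ~ group, contrasts = list(group = contr.sum))
coef(fit.sum)
# (Intercept) = mu-hat, the mean of the group means (= overall mean if balanced);
# group1, group2 = alpha-hat_1, alpha-hat_2; alpha-hat_3 = -(alpha-hat_1 + alpha-hat_2)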

Example: Deviation regressors - Back to the Pet and Stress data

We'll now use a different dummy regressor coding of the same situation, using the deviation regressors for the dummy variables.

Category   D1   D2
Control     1    0
Friend      0    1
Pet        -1   -1

> pets=read.csv("pets.csv")
> attach(pets)
> names(pets)
[1] "group" "rate"
> levels(group)
[1] "C" "F" "P"

Create the deviation regressors...

> n=nrow(pets)
> dummy.1=rep(0,n)
> dummy.1[group=="C"]= 1
> dummy.1[group=="P"]= -1
> dummy.2=rep(0,n)
> dummy.2[group=="F"]= 1
> dummy.2[group=="P"]= -1
> data.frame(group,dummy.1,dummy.2)
  group dummy.1 dummy.2
1     P      -1      -1
2     F       0       1
3     P      -1      -1
4     C       1       0
5     C       1       0
...

Regression model: $Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i$

> lm.out=lm(rate ~ dummy.1 + dummy.2)
> lm.out$coefficients
(Intercept)     dummy.1     dummy.2
82.44408889  0.07997778  8.88104444

These are $\hat\mu$, $\hat\alpha_1$, and $\hat\alpha_2$, respectively.

Since this is balanced data, the overall mean is $\hat\mu$:

> mean(rate)
[1] 82.44409

All three group means ($\bar Y_1$, $\bar Y_2$, $\bar Y_3$):

> tapply(rate,group,mean)
       C        F        P
82.52407 91.32513 73.48307

Control treatment group: $\mu + \alpha_1$
> lm.out$coefficients[1]+lm.out$coefficients[2]
[1] 82.52407

Friend treatment group: $\mu + \alpha_2$
> lm.out$coefficients[1]+lm.out$coefficients[3]
[1] 91.32513

Pet treatment group: $\mu - (\alpha_1 + \alpha_2)$
> lm.out$coefficients[1]-(lm.out$coefficients[2]+lm.out$coefficients[3])
[1] 73.48307

You can use the summary statement to get an overall F-test:

> lm.out=lm(rate ~ dummy.1 + dummy.2)
> summary(lm.out)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.44409    1.37269  60.060  < 2e-16 ***
dummy.1      0.07998    1.94128   0.041    0.967
dummy.2      8.88104    1.94128   4.575 4.18e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.208 on 42 degrees of freedom
Multiple R-Squared: 0.4014, Adjusted R-squared: 0.3729
F-statistic: 14.08 on 2 and 42 DF, p-value: 2.092e-05

The F-statistic is 14.08 and the p-value is 0.00002. We reject the null and conclude that there is statistically significant evidence that at least one of the group means is different from the others.

This F-statistic and p-value are EXACTLY the same as when we fit the model using the $\alpha_3 = 0$ constraint in the Part 1 notes (on p. 16).

Hypothesis testing in one-way ANOVA

Cell means model:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_m \qquad H_A: \text{at least one } \mu_i \text{ different}$$

Effects model:
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0 \qquad H_A: \text{at least one } \alpha_i \neq 0$$

Both hypotheses are testing the same thing... whether or not all the group means are equal.

ANOVA table and overall F-test

When we represent group by dummy regressors, R sees each dummy variable as a separate covariate (notice how it performs a test for each dummy regressor in the summary). In the pet example, we can test the significance of group by lumping the two covariates together and doing a partial F-test (which is an overall F-test in this case, because they are the only predictors in the model).

What about the ANOVA table and the sums of squares?

$$\text{RegSS} = \sum_{i=1}^n (\hat Y_i - \bar Y)^2 \qquad \text{RSS} = \sum_{i=1}^n (Y_i - \hat Y_i)^2 \qquad \text{TSS} = \sum_{i=1}^n (Y_i - \bar Y)^2$$
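Concretely, "lumping the two covariates together" is a nested-model comparison, which anova() performs directly. A sketch with the pets data and dummies defined above:

# Overall F-test as a comparison of nested models
lm.null <- lm(rate ~ 1)                    # intercept-only model
lm.full <- lm(rate ~ dummy.1 + dummy.2)    # model with both dummy regressors
anova(lm.null, lm.full)                    # F = 14.08 on (2, 42) df, as in summary()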

> RegSS=sum((lm.out$fitted.values-mean(rate))^2)
> RegSS
[1] 2387.689
> RSS=sum((rate-lm.out$fitted.values)^2)
> RSS
[1] 3561.299

Source       Sum of Squares   df   Mean Square          F
Regression         2387.689    2      1193.844   14.07954
Residuals          3561.299   42      84.79285
Total              5948.988   44

When R sees group as a factor (categorical variable), and it's the ONLY predictor, we can get the RegSS from the anova statement.

> lm.out=lm(rate~group)
> anova(lm.out)
Analysis of Variance Table

Response: rate
          Df Sum Sq Mean Sq F value    Pr(>F)
group      2 2387.7  1193.8  14.079 2.092e-05 ***
Residuals 42 3561.3    84.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Classical ANOVA sums of squares

Notation for sums of squares in a one-way ANOVA:
$$\text{RegSS} = SS_{group} = \sum_{i=1}^m n_i (\bar Y_i - \bar Y)^2$$
where $\bar Y_i$ is the group $i$ mean and $\bar Y$ is the overall sample mean. (Note that $\hat Y_{ij} = \bar Y_i$.)

Residual sum of squares:
$$\text{RSS} = \sum_{i=1}^m \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_i)^2$$

Source      Sum of Squares                                          df    Mean Square            F
SS_group    $\sum_{i=1}^m n_i (\bar Y_i - \bar Y)^2$                m-1   RegSS/(m-1) = RegMS    RegMS/MSE
Residuals   $\sum_{i=1}^m \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_i)^2$   n-m   RSS/(n-m) = MSE
Total       $\sum_{i=1}^m \sum_{j=1}^{n_i} (Y_{ij} - \bar Y)^2$     n-1
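As a check, the classical sums of squares can be computed directly from these formulas for the pets data (a sketch):

# SS_group and RSS computed straight from the formulas
ni   <- tapply(rate, group, length)
ybar <- tapply(rate, group, mean)            # in factor-level order
SSgroup <- sum(ni * (ybar - mean(rate))^2)   # RegSS = 2387.7
RSS     <- sum((rate - ybar[group])^2)       # each obs's fitted value is its group mean
c(SSgroup = SSgroup, RSS = RSS, TSS = SSgroup + RSS)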

Assessing the assumptions of one-way ANOVA

Normal distribution of the response variable in each population (or group):
- histograms, boxplots for sample data from each population (done separately)
- normal QQ plot for sample data from each population (done separately)
- normal QQ plot of all residuals from the fitted model

Same standard deviation (or variance) in all populations:
- can use Levene's test for homogeneity of variance (but it assumes normality of observations)
- rule of thumb: if the largest sample standard deviation isn't more than twice the smallest sample standard deviation, the assumption is probably met closely enough for ANOVA to be OK

- if the $n_i$'s are equal (balanced design), the ANOVA is less sensitive to violation of equal variance

If one or both assumptions are violated, try a transformation. If only normality is violated, try a non-parametric procedure such as the Kruskal-Wallis test.
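Here is a minimal sketch of these checks in R for the pets data; leveneTest() lives in the external car package, so it is shown commented out:

lm.out <- lm(rate ~ group)

# Normality: QQ plot of all residuals from the fitted model
qqnorm(resid(lm.out)); qqline(resid(lm.out))

# Equal-variance rule of thumb: is the largest SD less than twice the smallest?
tapply(rate, group, sd)

# Levene's test (requires install.packages("car")):
# car::leveneTest(rate ~ group)

# Non-parametric alternative if normality is violated:
kruskal.test(rate ~ group)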

Earlier, we mentioned that one-way ANOVA focuses on testing for differences among group means. Can you get at the differences between means using either effects model coding method (i.e., either constraint)?

Case 1: Baseline coding ($\alpha_m = 0$)
- $\mu$ represents the baseline group mean.
- $\alpha_1$ is the distance of group 1 from the baseline group.
- $\alpha_2$ is the distance of group 2 from the baseline group.
- $\alpha_2 - \alpha_1$ is the distance between groups 1 and 2.

Case 2: Sum-to-zero coding ($\sum \alpha_i = 0$)
- $\mu$ represents the overall or grand mean.
- $\alpha_1$ is the distance of group 1 from the overall mean.
- $\alpha_2$ is the distance of group 2 from the overall mean.
- $\alpha_2 - \alpha_1$ is the distance between groups 1 and 2.

The answer is yes. The interpretation of the parameters depends on the constraint used, but the important results are still the same (p-values, $\hat Y_{ij}$ values, etc.). Because hypothesis tests are built on parameter interpretation, the hypothesis test used to answer a given question does depend on the constraint used.
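As a quick sketch, we can verify for the pets data that the estimated group 1 vs. group 2 distance, $\hat\alpha_2 - \hat\alpha_1$, is identical under the two constraints:

# alpha_m = 0 (baseline) vs. sum-to-zero fits of the same model
fit.base <- lm(rate ~ group, contrasts = list(group = contr.SAS))
fit.sum  <- lm(rate ~ group, contrasts = list(group = contr.sum))
unname(coef(fit.base)[3] - coef(fit.base)[2])   # alpha-hat_2 - alpha-hat_1
unname(coef(fit.sum)[3]  - coef(fit.sum)[2])    # same value (about 8.80 here)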