MODELS WITHOUT AN INTERCEPT

Consider the balanced two-factor design:

- Factor A: 3 levels, indexed $j = 0, 1, 2$;
- Factor B: 5 levels, indexed $l = 0, 1, 2, 3, 4$;
- $n_{jl} = 4$ replicate observations for each factor level combination;
- total sample size $n = 3 \times 5 \times 4 = 60$.

Data can be simulated as follows:

set.seed(2332)
nk<-4; m.A<-3; m.B<-5
n<-nk*m.A*m.B
A<-rep(gl(m.A,m.B,labels=c(0:(m.A-1))),nk)
B<-rep(gl(m.B,1,length=m.A*m.B,labels=c(0:(m.B-1))),nk)
beta.A<-c(0,2,-1)
beta.B<-c(1,1,2,3,1)
beta.AB<-matrix(0,nrow=m.A,ncol=m.B)
beta.AB[2,2:5]<-0.1
beta.AB[3,2:5]<--0.1
Y<-beta.A[as.numeric(A)]+beta.B[as.numeric(B)]+diag(beta.AB[as.numeric(A),as.numeric(B)])+rnorm(n)*2
table(A,B)

   B
A   0 1 2 3 4
  0 4 4 4 4 4
  1 4 4 4 4 4
  2 4 4 4 4 4

Consider first the model A that fits the intercept plus the A factor. This model supposes that the three groups of 20 subjects, defined by the three levels of factor A, each have a different mean. The mean model is

$$E[Y_i \mid x_i] = \beta_0 + \sum_{j=1}^{2} \beta^{C}_{Aj}\,\mathbb{1}_j(A_i) =
\begin{cases}
\beta_0 & A_i = 0 \\
\beta_0 + \beta^{C}_{A1} & A_i = 1 \\
\beta_0 + \beta^{C}_{A2} & A_i = 2.
\end{cases}$$

We define the group means by the usual reparameterization

$$\beta^{G}_{0} = \beta_0, \qquad \beta^{G}_{1} = \beta_0 + \beta^{C}_{A1}, \qquad \beta^{G}_{2} = \beta_0 + \beta^{C}_{A2}.$$

fit1<-lm(Y~A); summary(fit1)

Call:
lm(formula = Y ~ A)

Residuals:
    Min      1Q  Median      3Q     Max
-4.9040 -1.9531  0.0044  2.0188  5.0067

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.9141     0.5354   3.575 0.000722 ***
A1            1.1201     0.7572   1.479 0.144574
A2           -1.1159     0.7572  -1.474 0.146068
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.394 on 57 degrees of freedom
Multiple R-squared:  0.1327, Adjusted R-squared:  0.1023
F-statistic:  4.36 on 2 and 57 DF,  p-value: 0.0173
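The group means implied by this parameterization can be read off directly from the fitted coefficients. A minimal check, assuming the objects Y, A and fit1 created above are still in the workspace:

# Group means from the intercept parameterization: coef(fit1) holds
# (beta_0, beta^C_A1, beta^C_A2) under the default treatment contrasts.
b <- coef(fit1)
group.means <- c(b["(Intercept)"],
                 b["(Intercept)"] + b["A1"],
                 b["(Intercept)"] + b["A2"])
# These should agree with the raw sample means within each level of A.
cbind(from.fit1 = group.means, raw = tapply(Y, A, mean))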

The summary of fit1 reveals the estimates $\hat\beta_0 = 1.9141$, $\hat\beta^{C}_{A1} = 1.1201$, $\hat\beta^{C}_{A2} = -1.1159$ and hence the group mean estimates

$$\hat\beta^{G}_{0} = 1.9141, \qquad \hat\beta^{G}_{1} = 1.9141 + 1.1201 = 3.0342, \qquad \hat\beta^{G}_{2} = 1.9141 - 1.1159 = 0.7983.$$

Using the formula Y ~ -1 + A, we may reparameterize directly.

fit2<-lm(Y~-1+A); summary(fit2)

Call:
lm(formula = Y ~ -1 + A)

Residuals:
    Min      1Q  Median      3Q     Max
-4.9040 -1.9531  0.0044  2.0188  5.0067

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
A0   1.9141     0.5354   3.575 0.000722 ***
A1   3.0342     0.5354   5.667 5.02e-07 ***
A2   0.7983     0.5354   1.491 0.141505
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.394 on 57 degrees of freedom
Multiple R-squared:  0.4525, Adjusted R-squared:  0.4237
F-statistic: 15.71 on 3 and 57 DF,  p-value: 1.462e-07

This produces the same estimates of the group means $\hat\beta^{G}_{k}$, $k = 0, 1, 2$. The two models yield the same $SS_{Res}$ quantity, as expected.

anova(fit1)

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
A          2  50.00 24.9982    4.36 0.0173 *
Residuals 57 326.81  5.7336
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(fit2)

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)
A          3 270.15  90.052  15.706 1.462e-07 ***
Residuals 57 326.81   5.734
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, the ANOVA tables are different, as are the $R^2$ quantities. This is because the analysis with the intercept included uses the decomposition
$$SS_T = SS_{Res} + SS_R$$
whereas the model with the intercept excluded uses the decomposition
$$SS_T^{\ast} = SS_{Res} + SS_R^{\ast},$$
where
$$SS_T = \sum_{i=1}^{n} y_i^2 - n(\bar y)^2, \qquad SS_T^{\ast} = \sum_{i=1}^{n} y_i^2 = SS_T + n(\bar y)^2, \qquad SS_R^{\ast} = SS_R + n(\bar y)^2.$$
Thus the difference between the Sum Sq quantities is $n(\bar y)^2 = 220.1585031$:
$$270.15 - 50.00 = 220.15.$$
The $R^2$ values for the models with and without the intercept are
$$\frac{SS_R}{SS_T} = \frac{SS_R}{SS_{Res} + SS_R} = \frac{50.00}{50.00 + 326.81} = 0.13269$$
and
$$\frac{SS_R^{\ast}}{SS_T^{\ast}} = \frac{SS_R^{\ast}}{SS_{Res} + SS_R^{\ast}} = \frac{270.15}{270.15 + 326.81} = 0.45254$$
respectively.
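The two decompositions can be reproduced by hand; a short sketch, assuming Y, fit1 and fit2 from above are still available:

SS.Res   <- deviance(fit1)        # residual SS, identical for fit1 and fit2
SS.T     <- sum((Y - mean(Y))^2)  # centred total SS used when the intercept is included
SS.Tstar <- sum(Y^2)              # uncentred total SS used when the intercept is excluded
c(difference = SS.Tstar - SS.T, n.ybar.sq = length(Y) * mean(Y)^2)
c(R2.with    = 1 - SS.Res/SS.T,      # matches summary(fit1)$r.squared
  R2.without = 1 - SS.Res/SS.Tstar)  # matches summary(fit2)$r.squared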

We now consider the model A + B with and without the intercept.

fit3<-lm(Y~A+B); summary(fit3)

Call:
lm(formula = Y ~ A + B)

Residuals:
    Min      1Q  Median      3Q     Max
-4.0764 -1.7721  0.2442  1.7323  3.8037

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.3097     0.7606   0.407  0.68552
A1            1.1201     0.7042   1.591  0.11764
A2           -1.1159     0.7042  -1.585  0.11900
B1            2.1083     0.9091   2.319  0.02428 *
B2            1.9145     0.9091   2.106  0.03997 *
B3            3.0233     0.9091   3.326  0.00161 **
B4            0.9760     0.9091   1.074  0.28789
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.227 on 53 degrees of freedom
Multiple R-squared:  0.3025, Adjusted R-squared:  0.2235
F-statistic: 3.831 on 6 and 53 DF,  p-value: 0.003026

fit4<-lm(Y~-1+A+B); summary(fit4)

Call:
lm(formula = Y ~ -1 + A + B)

Residuals:
    Min      1Q  Median      3Q     Max
-4.0764 -1.7721  0.2442  1.7323  3.8037

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
A0   0.3097     0.7606   0.407  0.68552
A1   1.4298     0.7606   1.880  0.06564 .
A2  -0.8062     0.7606  -1.060  0.29400
B1   2.1083     0.9091   2.319  0.02428 *
B2   1.9145     0.9091   2.106  0.03997 *
B3   3.0233     0.9091   3.326  0.00161 **
B4   0.9760     0.9091   1.074  0.28789
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.227 on 53 degrees of freedom
Multiple R-squared:  0.5597, Adjusted R-squared:  0.5016
F-statistic: 9.626 on 7 and 53 DF,  p-value: 1.162e-07

The parameter estimates for the levels of factor A change in the same way as before, absorbing the intercept, while the contrast parameter estimates for B do not change.
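A quick numerical check of this, assuming fit3 and fit4 from above:

# Each A-level estimate in the no-intercept fit is the fit3 intercept plus the
# corresponding A contrast (zero for the baseline level A0)...
coef(fit3)["(Intercept)"] + c(0, coef(fit3)[c("A1","A2")])  # equals coef(fit4)[1:3]
# ...while the B contrast estimates are identical in the two fits.
cbind(with.intercept = coef(fit3)[4:7], without.intercept = coef(fit4)[4:7])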

tapply(Y,INDEX=list(A=A,B=B),mean)   #Data means by group

   B
A            0        1         2        3          4
  0  0.4438347 2.918221 2.1054444 3.781890  0.3212851
  1 -0.3813390 3.287376 3.5894262 4.500478  4.1752655
  2  0.8708594 1.052791 0.9820351 1.720837 -0.6352464

tapply(fitted(fit3),INDEX=list(A=A,B=B),mean)   #Fitted means with intercept

   B
A            0        1        2        3         4
  0  0.3097095 2.418054 2.224226 3.332993 1.2856925
  1  1.4298158 3.538160 3.344333 4.453099 2.4057988
  2 -0.8061702 1.302174 1.108347 2.217113 0.1698128

tapply(fitted(fit4),INDEX=list(A=A,B=B),mean)   #Fitted means without intercept

   B
A            0        1        2        3         4
  0  0.3097095 2.418054 2.224226 3.332993 1.2856925
  1  1.4298158 3.538160 3.344333 4.453099 2.4057988
  2 -0.8061702 1.302174 1.108347 2.217113 0.1698128

In an interaction model:

fit5<-lm(Y~A*B); summary(fit5)

Call:
lm(formula = Y ~ A * B)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5253 -1.3724  0.2075  1.3955  4.0841

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4438     1.0951   0.405   0.6872
A1           -0.8252     1.5488  -0.533   0.5968
A2            0.4270     1.5488   0.276   0.7840
B1            2.4744     1.5488   1.598   0.1171
B2            1.6616     1.5488   1.073   0.2891
B3            3.3381     1.5488   2.155   0.0365 *
B4           -0.1225     1.5488  -0.079   0.9373
A1:B1         1.1943     2.1903   0.545   0.5883
A2:B1        -2.2925     2.1903  -1.047   0.3009
A1:B2         2.3092     2.1903   1.054   0.2974
A2:B2        -1.5504     2.1903  -0.708   0.4827
A1:B3         1.5438     2.1903   0.705   0.4846
A2:B3        -2.4881     2.1903  -1.136   0.2620
A1:B4         4.6792     2.1903   2.136   0.0381 *
A2:B4        -1.3836     2.1903  -0.632   0.5308
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.19 on 45 degrees of freedom
Multiple R-squared:  0.4271, Adjusted R-squared:  0.2488
F-statistic: 2.396 on 14 and 45 DF,  p-value: 0.01353

fit6<-lm(Y~-1+A*B); summary(fit6)

Call:
lm(formula = Y ~ -1 + A * B)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5253 -1.3724  0.2075  1.3955  4.0841

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
A0      0.4438     1.0951   0.405   0.6872
A1     -0.3813     1.0951  -0.348   0.7293
A2      0.8709     1.0951   0.795   0.4307
B1      2.4744     1.5488   1.598   0.1171
B2      1.6616     1.5488   1.073   0.2891
B3      3.3381     1.5488   2.155   0.0365 *
B4     -0.1225     1.5488  -0.079   0.9373
A1:B1   1.1943     2.1903   0.545   0.5883
A2:B1  -2.2925     2.1903  -1.047   0.3009
A1:B2   2.3092     2.1903   1.054   0.2974
A2:B2  -1.5504     2.1903  -0.708   0.4827
A1:B3   1.5438     2.1903   0.705   0.4846
A2:B3  -2.4881     2.1903  -1.136   0.2620
A1:B4   4.6792     2.1903   2.136   0.0381 *
A2:B4  -1.3836     2.1903  -0.632   0.5308
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.19 on 45 degrees of freedom
Multiple R-squared:  0.6384, Adjusted R-squared:  0.5178
F-statistic: 5.296 on 15 and 45 DF,  p-value: 6.465e-06

The fitted values from this model recover the group means exactly:

tapply(Y,INDEX=list(A=A,B=B),mean)

   B
A            0        1         2        3          4
  0  0.4438347 2.918221 2.1054444 3.781890  0.3212851
  1 -0.3813390 3.287376 3.5894262 4.500478  4.1752655
  2  0.8708594 1.052791 0.9820351 1.720837 -0.6352464

tapply(fitted(fit6),INDEX=list(A=A,B=B),mean)

   B
A            0        1         2        3          4
  0  0.4438347 2.918221 2.1054444 3.781890  0.3212851
  1 -0.3813390 3.287376 3.5894262 4.500478  4.1752655
  2  0.8708594 1.052791 0.9820351 1.720837 -0.6352464

anova(fit3,fit5)

Analysis of Variance Table

Model 1: Y ~ A + B
Model 2: Y ~ A * B
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     53 262.82
2     45 215.88  8    46.941 1.2231 0.3078
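The same comparison is obtained when both models are fitted without the intercept, since fit3/fit4 and fit5/fit6 are merely different parameterizations of the same two fitted subspaces. A brief check, assuming all four fits from above:

anova(fit3, fit5)                      # with intercepts
anova(fit4, fit6)                      # without intercepts: same RSS, df, F and p-value
all.equal(fitted(fit5), fitted(fit6))  # identical fitted values, hence identical SS_Res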

anova(fit5)

Analysis of Variance Table

Response: Y
          Df  Sum Sq Mean Sq F value   Pr(>F)
A          2  49.996 24.9982  5.2108 0.009215 **
B          4  63.988 15.9971  3.3345 0.017858 *
A:B        8  46.941  5.8677  1.2231 0.307797
Residuals 45 215.883  4.7974
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
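Note that the A:B line of this sequential table is the same test as the nested-model comparison anova(fit3, fit5) above: 8 degrees of freedom, sum of squares 46.941, F = 1.2231, p = 0.3078. A minimal way to place the two side by side, assuming fit3 and fit5 are still available:

anova(fit5)["A:B", ]                                    # interaction row of the sequential table
anova(fit3, fit5)[2, c("Df","Sum of Sq","F","Pr(>F)")]  # same Df, Sum of Sq, F and p-value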