
Multiple linear regression is used to relate a continuous response (or dependent) variable Y to several explanatory (or independent, or predictor) variables X1, X2, …, Xk. The model assumes a linear relationship between the mean of Y and the X's, with additive normal errors.

Notation: Xij is the value of independent variable j for subject i, and Yi is the value of the dependent variable for subject i, i = 1, 2, …, n.

Statistical model:

Yi = β0 + β1 Xi1 + β2 Xi2 + … + βk Xik + ɛi

where the errors ɛi are assumed to be a sample from N(0, σ²).

Example 1: The data set heights.mtw can be found on WebCT, in the data files folder. The variables in the data set include Y (height of student), X1 (height of same sex parent), X2 (height of opposite sex parent), and the sex of the student (X3 = 1 for males; X3 = 0 for females). The following Minitab output gives the Pearson correlations for the different pairs of variables.

Results for: HEIGHTS.MTW

MTB > corr c1-c4

               Student heig  Height of sa  Height of op
Height of sa    0.641
                0.000

Height of op   -0.193        -0.408
                0.205         0.005

Male or Fema    0.692         0.606        -0.539
                0.000         0.000         0.000

Cell Contents: Pearson correlation
               P-Value
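The same correlations can be obtained in R. A minimal sketch, assuming the worksheet has been exported and read into a data frame called heights with columns student, same.sex, opp.sex, and male (these names are assumptions, not part of the original worksheet):

heights <- read.csv("heights.csv")            # assumed export of the Minitab worksheet
round(cor(heights), 3)                        # Pearson correlations for all pairs of variables
cor.test(heights$student, heights$same.sex)   # correlation and p-value for one pair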

There are a number of possible regression models, including regressions on single or multiple independent variables. As with simple linear regression, the parameters are estimated by least squares. Details of estimation and statistical inference are discussed later.

Student height versus height of same sex parent

MTB > regress c1 1 c2

Yi = β0 + β1 Xi1 + ɛi

The regression equation is
Student height = 30.7 + 0.557 Height of same sex parent

Predictor                 Coef  SE Coef     T      P
Constant                30.702    6.765  4.54  0.000
Ht of same sex parent   0.5567   0.1017  5.47  0.000

S = 2.74051   R-Sq = 41.1%   R-Sq(adj) = 39.7%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  225.07  225.07  29.97  0.000
Residual Error  43  322.95    7.51
Total           44  548.02

Student height versus height of opposite sex parent

MTB > regress c1 1 c3

Yi = β0 + β1 Xi2 + ɛi

The regression equation is
Student height = 81.9 - 0.209 Height of opposite sex parent

Predictor                  Coef  SE Coef      T      P
Constant                  81.89    11.06   7.40  0.000
Ht opposite sex parent  -0.2094   0.1626  -1.29  0.205

S = 3.50309   R-Sq = 3.7%   R-Sq(adj) = 1.5%

Analysis of Variance
Source          DF      SS     MS     F      P
Regression       1   20.34  20.34  1.66  0.205
Residual Error  43  527.68  12.27
Total           44  548.02

The regression equation is
Student height = 22.7 + 0.586 Height of same sex parent + 0.090 Height of opposite sex parent

Yi = β0 + β1 Xi1 + β2 Xi2 + ɛi

MTB > regress c1 2 c2 c3

Predictor                  Coef  SE Coef     T      P
Constant                  22.66    14.30  1.59  0.120
Ht same sex parent       0.5859   0.1122  5.22  0.000
Ht opposite sex parent   0.0897   0.1403  0.64  0.526

S = 2.75953   R-Sq = 41.6%   R-Sq(adj) = 38.9%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  228.19  114.09  14.98  0.000
Residual Error  42  319.83    7.62
Total           44  548.02

The regression equation is
Student height = 21.5 + 0.338 Height of same sex parent + 0.325 Height of opposite sex parent + 4.44 Male or Female (1=M, 0=F)

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + ɛi

MTB > regress c1 3 c2 c3 c4

Predictor                  Coef  SE Coef     T      P
Constant                  21.50    11.70  1.84  0.073
Ht same sex parent       0.3375   0.1061  3.18  0.003
Ht opposite sex parent   0.3249   0.1254  2.59  0.013
Male/Female (1=M,0=F)    4.4446   0.9534  4.66  0.000

S = 2.25790   R-Sq = 61.9%   R-Sq(adj) = 59.1%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  339.00  113.00  22.16  0.000
Residual Error  41  209.02    5.10
Total           44  548.02

Multiple Regression Model

Statistical model:

Yi = β0 + β1 Xi1 + β2 Xi2 + … + βk Xik + ɛi

Assumptions: The errors ɛi are a sample from N(0, σ²).

Estimation: The regression parameters are estimated using least squares. That is, we choose β0, β1, …, βk to minimize

SSE = Σ_{i=1}^{n} (yi − β0 − β1 xi1 − … − βk xik)²

Inferences: The ANOVA table is similar to that for simple linear regression, with the degrees of freedom changed to match the number of predictor variables.

Source       df     SS   MS                  F
Regression    k     SSR  MSR = SSR/k         Fobs = MSR/MSE
Residual    n-k-1   SSE  MSE = SSE/(n-k-1)
Total        n-1    SST

where SST = SSR + SSE.

H0: β1 = β2 = … = βk = 0
HA: one or more of β1, β2, …, βk are non-zero
p-value = P(F_{k, n-k-1} > Fobs)

The goodness of fit of the linear regression is measured by the coefficient of determination R²:

R² = SSR / SST

R² is the fraction of the total variability in y accounted for by the regression, and it ranges between 0 and 1. R² = 1.00 indicates a perfect (linear) fit, while R² = 0.00 indicates a complete lack of linear fit.
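The whole fit can also be reproduced with R's lm() function. A sketch, continuing the hypothetical heights data frame from the earlier R example (the column names are still assumptions):

fit <- lm(student ~ same.sex + opp.sex + male, data = heights)
summary(fit)   # coefficient table, S (residual standard error), R-Sq, and the overall F test
anova(fit)     # sequential sums of squares; the predictor rows add up to the regression SS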

Matrix Methods for Multiple Linear Regression

Yi = β0 + β1 Xi1 + β2 Xi2 + … + βk Xik + ɛi

where Xij is the value of explanatory variable j for subject i. Writing the model for all n subjects at once,

[ Y1 ]   [ 1  X11  X12  ...  X1k ] [ β0 ]   [ ɛ1 ]
[ Y2 ] = [ 1  X21  X22  ...  X2k ] [ β1 ] + [ ɛ2 ]
[ .. ]   [ ..  ..   ..   ...  .. ] [ .. ]   [ .. ]
[ Yn ]   [ 1  Xn1  Xn2  ...  Xnk ] [ βk ]   [ ɛn ]

or, in matrix form,

Y = Xβ + ɛ

where Y = (Y1, Y2, …, Yn)' is the vector of responses, X is the n × (k+1) design matrix whose i-th row is (1, Xi1, Xi2, …, Xik), β = (β0, β1, …, βk)' is the vector of regression coefficients, and ɛ = (ɛ1, ɛ2, …, ɛn)' is the vector of errors.

Let ˆβ0, ˆβ1, …, ˆβk denote the estimates of the true regression coefficients β0, β1, …, βk. The estimates minimize the sum of squared residuals, or error sum of squares (SSE); hence they are called least squares estimates. The error sum of squares is

SSE = Σ_{i=1}^{n} ei² = Σ_{i=1}^{n} (yi − (ˆβ0 + ˆβ1 xi1 + … + ˆβk xik))² = e'e

SSE is minimized when all of the partial derivatives ∂SSE/∂ˆβj equal 0, j = 0, 1, …, k. These conditions give a system of normal equations which are solved simultaneously to obtain the estimates of β0, β1, …, βk. In matrix form the normal equations are

(X'X) ˆβ = X'Y

The unique solution (when it exists) to the normal equations is

ˆβ = (ˆβ0, ˆβ1, …, ˆβk)' = (X'X)⁻¹ X'Y

Least squares estimates are calculated using this equation.

The vector Ŷ = (Ŷ1, Ŷ2, …, Ŷn)' contains the predicted values, where Ŷi = ˆβ0 + ˆβ1 Xi1 + ˆβ2 Xi2 + … + ˆβk Xik. In matrix form the predicted values are given by

Ŷ = X ˆβ

The error variance, σ², is estimated as

ˆσ² = MSE = SSE / (n − (k+1)) = Σ (yi − ŷi)² / (n − (k+1))

which is the sum of squares for error (SSE) divided by the error degrees of freedom, n − (k+1). As before, the total sum of squares is Σ (yi − ȳ)², and the total degrees of freedom is n − 1. With these quantities, an ANOVA table can be constructed in the same manner as for simple linear regression.

Source       df     SS   MS    F
Regression    k     SSR  MSR   Fobs = MSR/MSE
Error       n-k-1   SSE  MSE
Total        n-1    SST

The observed value of F is used to test the overall utility of the regression model. The formal hypotheses are

H0: β1 = β2 = … = βk = 0   vs.   HA: at least one of β1, β2, …, βk is non-zero

The p-value is P(F_{k, n-k-1} > Fobs).

Example: Consider the following data set with 8 values of the response, Y, and four explanatory variables, X1, X2, X3, and X4.

     Y   X1  X2  X3  X4
  78.5    7  26   6  60
  74.3    1  29  15  52
 104.3   11  56   8  20
  87.6   11  31   8  47
  95.9    7  52   6  33
 109.2   11  55   9  22
 102.7    3  71  17   6
  72.5    1  31  22  44

The regression model using all four explanatory variables is

ŷ = ˆβ0 + ˆβ1 X1 + ˆβ2 X2 + ˆβ3 X3 + ˆβ4 X4

In MINITAB the data are assembled as shown below.

MTB > set c1    # Y
DATA> 78.5 74.3 104.3 87.6 95.9 109.2 102.7 72.5
DATA> set c2    # X1
DATA> 7 1 11 11 7 11 3 1
DATA> set c3    # X2
DATA> 26 29 56 31 52 55 71 31
DATA> set c4    # X3
DATA> 6 15 8 8 6 9 17 22
DATA> set c5    # X4

DATA> 60 52 20 47 33 22 6 44
DATA> end
MTB > print c1-c5

          C1  C2  C3  C4  C5
 Row       Y  X1  X2  X3  X4
   1    78.5   7  26   6  60
   2    74.3   1  29  15  52
   3   104.3  11  56   8  20
   4    87.6  11  31   8  47
   5    95.9   7  52   6  33
   6   109.2  11  55   9  22
   7   102.7   3  71  17   6
   8    72.5   1  31  22  44

MTB > copy c1 m1      # m1 = Y vector
MTB > set c6
DATA> 8(1)
DATA> end

MTB > copy c6 c2 c3 c4 c5 m2
MTB > print m2

Matrix M2             # m2 = X matrix

 1   7  26   6  60
 1   1  29  15  52
 1  11  56   8  20
 1  11  31   8  47
 1   7  52   6  33
 1  11  55   9  22
 1   3  71  17   6
 1   1  31  22  44

To determine the values of the regression coefficients, the matrix equation ˆβ = (X'X)⁻¹ X'Y given above must be evaluated. Doing so first requires the elements of three other important matrices: X'X, X'Y, and (X'X)⁻¹.

MTB > transpose m2 m3
MTB > multiply m3 m2 m4

MTB > inverse m4 m5
MTB > print m5

Matrix M5             # m5 = inv(X'X)

 18.8008  -2.26974  -1.90160  -2.09861  -1.85740
 -2.270    0.2914    0.2263    0.2622    0.2223
 -1.902    0.2263    0.1931    0.2100    0.1882
 -2.099    0.2622    0.2100    0.2425    0.2059
 -1.857    0.2223    0.1882    0.2059    0.1839

MTB > multiply m3 m1 m6
MTB > multiply m5 m6 m7
MTB > print m7

Matrix M7             # m7 = b

 48.5583
  1.6154
  0.6921
  0.0588
  0.0150

The resulting regression model is

Ŷ = 48.5583 + 1.6154 X1 + 0.6921 X2 + 0.0588 X3 + 0.0150 X4

The error variance, σ², is then estimated as MSE, below.

MTB > multiply m2 m7 m8
MTB > subtract m8 m1 m9
MTB > transpose m9 m10
MTB > multiply m10 m9 m11

Answer = 28.6406      # SSE

This is SSE = Σ_{i=1}^{n} (yi − ŷi)² = ê'ê.

MTB > let k1 = 8-(4+1)      # k1 = n-(k+1)
MTB > let k2 = 1/k1
MTB > multiply k2 m11 m12

Answer = 9.5469       # MSE

The regression coefficient values estimated above are point estimates. To construct confidence intervals, we first need to estimate their standard errors. In matrix form, the standard errors are the square roots of the diagonal elements of the covariance matrix of the regression coefficients.

The covariance matrix of the regression coefficients contains the variances of ˆβ0, ˆβ1, …, ˆβk on its diagonal, and related quantities called covariances off the diagonal:

Cov(ˆβ) = σ² (X'X)⁻¹

The diagonal elements of this covariance matrix, from the (1,1) position to the (k+1, k+1) position, are, respectively, V(ˆβ0), V(ˆβ1), …, V(ˆβk). The covariance matrix is estimated by replacing σ² in the expression above by MSE. The square roots of the diagonal elements of this estimated covariance matrix are the standard errors of the corresponding elements of ˆβ.

MTB > multiply m12 m5 m13
MTB > print m13

Matrix M13            # Cov(b)

 179.489  -21.6689  -18.1543  -20.0351  -17.7324
 -21.67     2.782     2.161     2.504     2.122
 -18.15     2.161     1.844     2.005     1.797
 -20.04     2.504     2.005     2.315     1.965
 -17.73     2.122     1.797     1.965     1.756

MTB > let k3 = sqrt(179.489)

MTB > let k4 = sqrt(2.782)
MTB > let k5 = sqrt(1.844)
MTB > let k6 = sqrt(2.315)
MTB > let k7 = sqrt(1.756)
MTB > print k3 - k7

K3    13.3974     # SEb0
K4     1.66793    # SEb1
K5     1.35794    # SEb2
K6     1.52151    # SEb3
K7     1.32514    # SEb4

To test the hypotheses

H0: variable Xj has no effect, given that all other predictor variables are in the model
HA: variable Xj has an effect, given that all other predictor variables are in the model

or, equivalently,

H0: variable Xj provides no significant additional reduction in SSE, given that all other predictor variables are in the model
HA: variable Xj provides a significant additional reduction in SSE, given that all other predictor variables are in the model

calculate the test statistic

tobs = ˆβj / se(ˆβj)

p-value = 2 P(t_{n-1-k} > |tobs|)
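A standalone R sketch of these t tests, plugging in the estimates and standard errors obtained above (in the R session later in these notes the same numbers come from bhat and sqrt(diag(COV))):

bhat <- c(48.5583, 1.6154, 0.6921, 0.0588, 0.0150)       # least squares estimates from m7
se   <- c(13.3974, 1.66793, 1.35794, 1.52151, 1.32514)   # standard errors k3-k7
tobs <- bhat / se                                        # t statistics
pval <- 2 * pt(abs(tobs), df = 8 - 4 - 1, lower.tail = FALSE)   # two-sided p-values, error df = 3
cbind(bhat, se, tobs, pval)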

Confidence intervals for the βi are constructed using the standard errors computed above, as

ˆβi ± t_{α/2, n-(k+1)} se(ˆβi)

(The degrees of freedom for t is the error degrees of freedom in the ANOVA table, n − 1 − k.)

Example: 95% confidence intervals for the regression coefficients are computed using t.025,3 = 3.182.

MTB > print m7

Matrix M7             # m7 = b

 48.5583
  1.6154
  0.6921
  0.0588
  0.0150

MTB > let k10 = 48.5583 + 3.182*k3
MTB > let k11 = 48.5583 - 3.182*k3
MTB > print k10 k11

K10    91.1888    # 95% CI for Beta 0
K11     5.9278

MTB > let k12 = 1.6154 + 3.182*k4
MTB > let k13 = 1.6154 - 3.182*k4

MTB > print k12 k13

K12     6.92276   # 95% CI for Beta 1
K13    -3.69196

MTB > let k14 = 0.6921 + 3.182*k5
MTB > let k15 = 0.6921 - 3.182*k5
MTB > print k14 k15

K14     5.01306   # 95% CI for Beta 2
K15    -3.62886

MTB > let k16 = 0.0588 + 3.182*k6
MTB > let k17 = 0.0588 - 3.182*k6
MTB > print k16 k17

K16     4.90025   # 95% CI for Beta 3
K17    -4.78265

MTB > let k18 = 0.0150 + 3.182*k7
MTB > let k19 = 0.0150 - 3.182*k7
MTB > print k18 k19

K18     4.23160   # 95% CI for Beta 4
K19    -4.20160
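Continuing the R sketch above, all five 95% confidence intervals can be computed at once (bhat and se as defined there); the Minitab values printed here serve as a check:

tcrit <- qt(0.975, df = 3)    # t.025,3 = 3.182
cbind(lower = bhat - tcrit * se, upper = bhat + tcrit * se)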

The various calculations are tedious in Minitab, which isn't well set up for matrix arithmetic. Here are the corresponding commands for R or S-Plus. (The course Stat2050 teaches selected aspects of the R and/or S-Plus programming languages. These are the programs of choice for statisticians in North America and Europe, and they have superb graphics capabilities.) You can download the free program R at cran.r-project.org.

X<-matrix(c(
1,7,26,6,60,
1,1,29,15,52,
1,11,56,8,20,
1,11,31,8,47,
1,7,52,6,33,
1,11,55,9,22,
1,3,71,17,6,
1,1,31,22,44),byrow=T,ncol=5)
y<-c(78.5,74.3,104.3,87.6,95.9,109.2,102.7,72.5)
XpX<-t(X)%*%X       # X prime X
XpXi<-solve(XpX)    # (X prime X) inverse
Xpy<-t(X)%*%y       # X prime y
bhat<-XpXi%*%Xpy    # least squares estimates

yhat<-X%*%bhat      # predicted values
ymyhat<-y-yhat      # residuals
SSE<-sum(ymyhat^2)  # error sum of squares
MSE<-SSE/(8-1-4)    # error mean square
COV<-MSE*XpXi       # covariance matrix of beta hat
sqrt(diag(COV))     # standard errors of the beta hats

Cutting and pasting into R or S-Plus leads to:

> XpX<-t(X)%*%X
> XpX
     [,1] [,2]  [,3] [,4]  [,5]
[1,]    8   52   351   91   284
[2,]   52  472  2381  447  1744
[3,]  351 2381 17345 3983 10361
[4,]   91  447  3983 1279  3142
[5,]  284 1744 10361 3142 12458
>
> XpXi<-solve(XpX)
> XpXi
        [,1]    [,2]    [,3]    [,4]    [,5]
[1,] 18.8007 -2.2697 -1.9015 -2.0986 -1.8574
[2,] -2.269   0.291   0.226   0.262   0.222
[3,] -1.901   0.226   0.193   0.210   0.188
[4,] -2.098   0.262   0.210   0.242   0.205
[5,] -1.857   0.222   0.188   0.205   0.183

> Xpy<-t(X)%*%y
>
> bhat<-XpXi%*%Xpy
> bhat
           [,1]
[1,] 48.5583488
[2,]  1.6153626
[3,]  0.6920995
[4,]  0.0587741
[5,]  0.0149964
>
> yhat<-X%*%bhat
> ymyhat<-y-yhat
> SSE<-sum(ymyhat^2)
> MSE<-SSE/(8-1-4)
> MSE
[1] 9.546873
>
> COV<-MSE*XpXi
> sqrt(diag(COV))
[1] 13.3973  1.667  1.357  1.521  1.325

Confidence intervals for the mean, and prediction intervals

Where x* = (1, x1*, x2*, …, xk*)', a 100(1−α)% confidence interval for the mean of Y at x* is given by

x*'ˆβ ± t_{α/2, n-k-1} sqrt( MSE · x*'(X'X)⁻¹x* )

A 100(1−α)% prediction interval for Y when x = x* is

x*'ˆβ ± t_{α/2, n-k-1} sqrt( MSE · (1 + x*'(X'X)⁻¹x*) )

Example: In the four-variable multiple regression, suppose that we wish to find a 95% confidence interval for the mean, and a 95% prediction interval for Y, when x* = (1, 8, 30, 10, 40)'. The syntax of the appropriate Minitab regress call is:

MTB > regress c20 4 c21-c24;
SUBC> pred 8 30 10 40.

Predicted Values for New Observations

New Obs     Fit  SE Fit           95% CI           95% PI
      1   83.43   12.59  (43.36, 123.51)  (42.17, 124.69) XX

In R or S-Plus the appropriate commands, with output, are:

# 95% CI for the mean at x = xstar
xstar<-c(1,8,30,10,40)
ypred<-sum(xstar*bhat)
tcrit<-qt(0.975,3)
pm<-tcrit*sqrt(MSE)*sqrt(xstar%*%(XpXi%*%xstar))
ucl<-ypred+pm
lcl<-ypred-pm
lcl
#         [,1]
# [1,] 43.3576
ypred
# [1] 83.43183
ucl
#          [,1]
# [1,] 123.5061

# 95% prediction interval for y at x = xstar
pm<-tcrit*sqrt(MSE)*sqrt(1+ xstar%*%(XpXi%*%xstar))
ucl<-ypred+pm
lcl<-ypred-pm
lcl
#          [,1]
# [1,] 42.16884
ypred
# [1] 83.43183
ucl
#          [,1]
# [1,] 124.6948

Partial F test for comparing nested models

Model 1 - k independent variables:

Yi = β0 + β1 Xi1 + β2 Xi2 + … + βk Xik + ɛi

Model 2 - m < k independent variables:

Yi = β0 + β1 Xi1 + β2 Xi2 + … + βm Xim + ɛi

Model 2 is obtained from model 1 by setting βm+1 = βm+2 = … = βk = 0. That is, k − m of the model 1 parameters are set to 0 in model 2. We say that model 2 is nested within the full model, model 1.

Model   SSE    MSE    Regression df   Error df
  1     SSE1   MSE1        k           n-k-1
  2     SSE2   MSE2        m           n-m-1

To test the hypothesis H0: βm+1 = βm+2 = … = βk = 0 against the alternative that at least one of βm+1, βm+2, …, βk is nonzero, calculate the test statistic:

Fobs = [ (SSE2 − SSE1) / (k − m) ] / MSE1

The p-value for the test is P(F_{k-m, n-k-1} > Fobs). The numerator degrees of freedom for F is the number of parameters set to 0 under H0. The denominator degrees of freedom for F is the error degrees of freedom for the bigger model.
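In R the partial F test is carried out by fitting both models and comparing them with anova(). A sketch for the four-variable example, testing whether X3 and X4 can be dropped (this particular choice of reduced model is only an illustration), using the X matrix and y vector from the R session above:

dat <- data.frame(y = y, x1 = X[,2], x2 = X[,3], x3 = X[,4], x4 = X[,5])
full    <- lm(y ~ x1 + x2 + x3 + x4, data = dat)   # model 1: k = 4 predictors
reduced <- lm(y ~ x1 + x2, data = dat)             # model 2: m = 2 predictors
anova(reduced, full)   # F = [(SSE2 - SSE1)/(k - m)] / MSE1 on (k - m, n - k - 1) df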

Types of (Linear) Regression Models

Curve:

Conc = β0 + β1 t + β2 t²

[Plot: fitted quadratic curve, Conc = 0 + 0.00460 t − 0.00004 t², for t from 0 to 60.]

One continuous, one binary predictor (two parallel lines):

Conc = β0 + β1 time + β2 X,   where X = 0 for Males, 1 for Females

[Plot: two parallel concentration-time lines, one for each sex.]

Two nonparallel lines:

Conc = β0 + β1 time + β2 X + β3 time·X,   where X = 0 for Males, 1 for Females

[Plot: two concentration-time lines with different intercepts and slopes.]

Two continuous predictors, first order:

Conc = β0 + β1 time + β2 Dose   (effect of dose constant over time)

[Plot: parallel concentration-time lines at Dose = 1.0 and Dose = 0.1.]

Interaction:

Conc = β0 + β1 time + β2 Dose + β3 time·Dose   (effect of dose changes with time)

[Plot: concentration-time lines at Dose = 1.0 and Dose = 0.1 with different slopes.]
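The "two nonparallel lines" structure could, for instance, be fit to the student height data from the first example by interacting the same-sex parent's height with the sex indicator. A sketch, again assuming the hypothetical heights data frame and column names used in the earlier R examples:

fit.int <- lm(student ~ same.sex * male, data = heights)   # expands to same.sex + male + same.sex:male
summary(fit.int)   # separate intercept and slope adjustment for males and females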

One-Way Analysis of Variance using Regression

We can carry out any analysis of variance using multiple regression, by suitable definition of the independent variables.

Recall Example: A group of 32 rats were randomly assigned to one of 4 diets labelled A, B, C, and D. The response is the liver weight as a percentage of body weight. Two rats escaped and another died, resulting in the following data.

   A     B     C     D
 3.42  3.17  3.34  3.65
 3.96  3.63  3.72  3.93
 3.87  3.38  3.81  3.77
 4.19  3.47  3.66  4.18
 3.58  3.39  3.55  4.21
 3.76  3.41  3.51  3.88
 3.84  3.55        3.96
       3.44        3.91

ANOVA model:

Xij = µi + eij

Assumptions: the eij are independent N(0, σ²), i = 1, 2, …, a, j = 1, 2, …, ni.

Indicator Variables

Define a − 1 indicator variables to identify the a groups. Index the subjects using a single index i, i = 1, …, n. Let

Xi1 = 1 if subject i is in group 1, and 0 otherwise
Xi2 = 1 if subject i is in group 2, and 0 otherwise
Xi3 = 1 if subject i is in group 3, and 0 otherwise
...
Xi,a-1 = 1 if subject i is in group a − 1, and 0 otherwise

Regression model:

Yi = β0 + β1 Xi1 + β2 Xi2 + … + βa-1 Xi,a-1 + ɛi

Table of Means

Group   ANOVA mean   Regression mean
  1        µ1          β0 + β1
  2        µ2          β0 + β2
 ...
 a-1       µa-1        β0 + βa-1
  a        µa          β0

Equivalent ANOVA and regression hypotheses:

H0: µ1 = µ2 = … = µa   is equivalent to   H0: β1 = β2 = … = βa-1 = 0

MTB > oneway c1 c2

One-way ANOVA: C1 versus C2

Source  DF      SS      MS      F      P
C2       3  1.1649  0.3883  10.84  0.000
Error   25  0.8954  0.0358
Total   28  2.0603

S = 0.1893   R-Sq = 56.54%   R-Sq(adj) = 51.32%

MTB > indicator c2 c3-c6
MTB > print c2 c3-c6

Data Display

 Row  C2  C3  C4  C5  C6
   1   1   1   0   0   0
   2   1   1   0   0   0
   3   1   1   0   0   0
 ...
   7   1   1   0   0   0
   8   2   0   1   0   0
   9   2   0   1   0   0
  10   2   0   1   0   0
 ...
  18   3   0   0   1   0
  19   3   0   0   1   0
  20   3   0   0   1   0
 ...
  28   4   0   0   0   1

  29   4   0   0   0   1

MTB > regress c1 3 c3-c5

Regression Analysis: C1 versus C3, C4, C5

The regression equation is
C1 = 3.94 - 0.133 C3 - 0.506 C4 - 0.338 C5

Predictor      Coef  SE Coef      T      P
Constant    3.93625  0.06691  58.83  0.000
C3         -0.13339  0.09795  -1.36  0.185
C4         -0.50625  0.09463  -5.35  0.000
C5          -0.3379   0.1022  -3.31  0.003

S = 0.189253   R-Sq = 56.5%   R-Sq(adj) = 51.3%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       3  1.16490  0.38830  10.84  0.000
Residual Error  25  0.89541  0.03582
Total           28  2.06032
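The same equivalence can be demonstrated in R, where lm() builds the indicator variables automatically from a factor. A sketch, assuming the liver-weight data are entered with the group sizes shown in the table above (this data entry is a reconstruction of that table, not separate output):

liver <- c(3.42, 3.96, 3.87, 4.19, 3.58, 3.76, 3.84,             # diet A (n = 7)
           3.17, 3.63, 3.38, 3.47, 3.39, 3.41, 3.55, 3.44,       # diet B (n = 8)
           3.34, 3.72, 3.81, 3.66, 3.55, 3.51,                   # diet C (n = 6)
           3.65, 3.93, 3.77, 4.18, 4.21, 3.88, 3.96, 3.91)       # diet D (n = 8)
diet <- factor(rep(c("A", "B", "C", "D"), times = c(7, 8, 6, 8)))

fit <- lm(liver ~ diet)   # R creates a-1 = 3 indicator variables for the factor
anova(fit)                # reproduces the one-way ANOVA F test
summary(fit)              # intercept = mean of the baseline diet; other coefficients are differences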