STAT5044: Regression and ANOVA
Inyoung Kim
Outline
1. Multiple Linear Regression
Basic Idea

An extra sum of squares is the marginal reduction in the error sum of squares when one or several predictor variables are added to the regression model, given that the other predictor variables are already in the model. Equivalently, an extra sum of squares measures the marginal increase in the regression sum of squares when one or several predictor variables are added to the model.
Example

A study of the relation of amount of body fat ($Y$) to several possible predictor variables, based on a sample of 20 healthy females 25-34 years old. The possible predictor variables are triceps skinfold thickness ($X_1$), thigh circumference ($X_2$), and midarm circumference ($X_3$). The amount of body fat for each of the 20 persons was obtained by a cumbersome and expensive procedure requiring immersion of the person in water. It would therefore be very helpful if a regression model with some or all of these predictor variables could provide reliable estimates of the amount of body fat, since the measurements needed for the predictor variables are easy to obtain.
Multiple Linear Regression

When there are $m$ independent variables,
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im} + \varepsilon_i,$$
where $\varepsilon_i \sim N(0, \sigma^2)$.

A special case is polynomial regression: $X_m = x^m$.

In matrix form, $Y = X\beta + \varepsilon$ with $X = [1\ x_1\ x_2\ \cdots\ x_m]$.

The least squares estimator is
$$\hat\beta = (X^t X)^{-1} X^t y \sim N\big(\beta,\ (X^t X)^{-1}\sigma^2\big).$$
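The least squares formula above can be sketched numerically. This is a minimal illustration on simulated data (the seed, sample size, and coefficient values are arbitrary choices, not from the lecture); it solves the normal equations $X^tX\hat\beta = X^ty$ directly.

```python
import numpy as np

# Simulated data: n observations, m = 3 predictors plus an intercept column
rng = np.random.default_rng(0)
n, m = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])  # design matrix [1 x1 x2 x3]
beta_true = np.array([2.0, 1.0, -0.5, 0.25])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# LSE: solve the normal equations (X^t X) beta_hat = X^t y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

In practice one would use `np.linalg.lstsq` (QR/SVD based) rather than forming $X^tX$ explicitly, since the normal equations are numerically less stable for ill-conditioned designs.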
Multiple Regression

A set of methods for estimating model parameters and testing hypotheses about a model relating a response to explanatory variables:
- Is there a linear relationship?
- Which variables are important?
- Is there a best model to use?
- Predict a new value of the response.
- Predict the value of the explanatory variable that produces a specified response.
Multiple Regression

Similar to SLR, but with multiple explanatory variables:
- graphical assessment is more difficult
- interpretation is more complex

Tools:
- scatterplot matrix
- sequential and partial tests
- variable selection methods
- residual analysis
Gauss-Markov Theorem

Assume $E(Y) = X\beta$ and $\mathrm{Var}(Y) = \sigma^2 I$, and let $\hat\beta$ be the least squares estimator of $\beta$. Then:
- $C^t\hat\beta$ is an unbiased estimator of $C^t\beta$.
- For any other linear unbiased estimator $\tilde\Psi$ of $C^t\beta$, $\mathrm{Var}(\tilde\Psi) \ge \mathrm{Var}(\hat\Psi)$, where $\hat\Psi = C^t\hat\beta$.

NOTE: the normality assumption is NOT needed.
Testing

$H_0$: there is no regression relationship between $Y$ and the $X$ variables, i.e. $\beta_1 = \beta_2 = \cdots = \beta_m = 0$.
$H_1$: at least one parameter $\ne 0$.

ANOVA table:

  Source        SS                                df
  regression    $\sum(\hat Y_i - \bar Y)^2$       $m$
  error         $\sum(Y_i - \hat Y_i)^2$          $n-m-1$
  total         $\sum(Y_i - \bar Y)^2$            $n-1$

Test statistic:
$$F_{m,\,n-m-1} = \frac{SS_{reg}/m}{SSE/(n-m-1)}.$$
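The ANOVA decomposition above can be checked numerically. A minimal sketch on simulated data (seed, $n$, and coefficients are illustrative only): with an intercept in the model, the total sum of squares partitions exactly into regression and error pieces.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
y = X @ np.array([1.0, 0.8, -0.6]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat

ss_reg = np.sum((y_hat - y.mean()) ** 2)   # regression SS, df = m
sse = np.sum((y - y_hat) ** 2)             # error SS, df = n - m - 1
ss_total = np.sum((y - y.mean()) ** 2)     # total SS, df = n - 1

# Overall F statistic, F_{m, n-m-1}
F = (ss_reg / m) / (sse / (n - m - 1))
```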
Testing $H_0: \beta_3 = 0$

$H_0: \beta_3 = 0$ vs $H_a: \beta_3 \ne 0$.

Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$, with $SSE(R) = SSE(X_1, X_2)$.
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$, with $SSE(F) = SSE(X_1, X_2, X_3)$.

Test statistic (what is $df_F$? what is $df_R$?):
$$F = \frac{(SSE(R) - SSE(F))/(df_R - df_F)}{SSE(F)/df_F}.$$
Testing $H_0: \beta_k = 0$

$H_0: \beta_k = 0$ vs $H_a: \beta_k \ne 0$.

Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{k-1} X_{i,k-1} + \beta_{k+1} X_{i,k+1} + \cdots + \beta_m X_{im} + \varepsilon_i$, with $SSE(R) = SSE(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_m)$.
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \cdots + \beta_m X_{im} + \varepsilon_i$, with $SSE(F) = SSE(X_1, \ldots, X_k, \ldots, X_m)$.

Test statistic (what is $df_F$? what is $df_R$?):
$$F = \frac{(SSE(R) - SSE(F))/(df_R - df_F)}{SSE(F)/df_F}.$$
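The general linear (extra sum of squares) F test above can be sketched as a small function comparing nested fits. The data and helper name are illustrative assumptions; the block also checks the standard fact that for a single dropped coefficient, $F = t^2$.

```python
import numpy as np

def extra_ss_f(X_full, X_red, y):
    """General linear F test from SSE(R), SSE(F) and their df."""
    def sse_df(X):
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return resid @ resid, len(y) - X.shape[1]
    sse_f, df_f = sse_df(X_full)
    sse_r, df_r = sse_df(X_red)
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

rng = np.random.default_rng(2)
n = 60
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + rng.normal(size=n)        # x2 is truly irrelevant here
X_full = np.column_stack([np.ones(n), x1, x2])  # df_F = n - 3
X_red = np.column_stack([np.ones(n), x1])       # df_R = n - 2
F = extra_ss_f(X_full, X_red, y)                # tests H0: beta_2 = 0

# Equivalence with the t test for a single coefficient: F = t^2
bhat = np.linalg.lstsq(X_full, y, rcond=None)[0]
resid = y - X_full @ bhat
sigma2 = resid @ resid / (n - 3)
cov = sigma2 * np.linalg.inv(X_full.T @ X_full)
t2 = bhat[2] ** 2 / cov[2, 2]
```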
Testing $H_0: \beta_2 = \beta_3 = 0$

$H_0: \beta_2 = \beta_3 = 0$ vs $H_a$: not both $\beta_2$ and $\beta_3$ equal zero.

Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$, with $SSE(R) = SSE(X_1)$.
Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$, with $SSE(F) = SSE(X_1, X_2, X_3)$.

$$SSE(X_1) - SSE(X_1, X_2, X_3) = SSR(X_2, X_3 \mid X_1)$$

Test statistic:
$$F = \frac{(SSE(X_1) - SSE(X_1,X_2,X_3))/((n-2)-(n-4))}{SSE(X_1,X_2,X_3)/(n-4)} = \frac{SSR(X_2,X_3 \mid X_1)/2}{SSE(X_1,X_2,X_3)/(n-4)} = \frac{MSR(X_2,X_3 \mid X_1)}{MSE(X_1,X_2,X_3)}.$$
Test $H_0$: some $\beta_k = 0$

$H_0: \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0$ vs $H_a$: not all of the $\beta_k$ in $H_0$ equal zero.

Test statistic:
$$F = \frac{SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})/(p-q)}{SSE(X_1, \ldots, X_{p-1})/(n-p)} = \frac{MSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})}{MSE}$$

NOTE:
$$SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1}) = SSR(X_q \mid X_1, \ldots, X_{q-1}) + \cdots + SSR(X_{p-1} \mid X_1, \ldots, X_{p-2}).$$
Testing $H_0: \beta_1 = \beta_2$

Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$.

Test $H_0: \beta_1 = \beta_2$ vs $H_a: \beta_1 \ne \beta_2$. Let $\beta_c$ denote the common coefficient for $\beta_1$ and $\beta_2$ under $H_0$.

Reduced model: $Y_i = \beta_0 + \beta_c(X_{i1} + X_{i2}) + \beta_3 X_{i3} + \varepsilon_i$, where $X_{i1} + X_{i2}$ is the corresponding new $X$ variable.

Test statistic (what is $df_R$? what is $df_F$?):
$$F = \frac{(SSE(R) - SSE(F))/(df_R - df_F)}{SSE(F)/df_F}.$$
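The equality-of-coefficients test reduces to an ordinary nested comparison once the combined predictor $X_1 + X_2$ is formed. A minimal sketch on simulated data (seed, $n$, and coefficients are illustrative; the data are generated with $\beta_1 = \beta_2$ so $H_0$ holds):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 0.7 * x1 + 0.7 * x2 - 0.3 * x3 + rng.normal(scale=0.5, size=n)

def sse(X):
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return r @ r

X_full = np.column_stack([np.ones(n), x1, x2, x3])   # df_F = n - 4
X_red = np.column_stack([np.ones(n), x1 + x2, x3])   # common slope beta_c; df_R = n - 3

# df_R - df_F = 1
F = (sse(X_red) - sse(X_full)) / (sse(X_full) / (n - 4))
```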
Test whether some $\beta_k = 0$

When the full and reduced models both contain the intercept term $\beta_0$, the test statistic can be stated equivalently in terms of the coefficients of multiple determination for the two models:
$$F = \frac{(R^2_{y|x_1,\ldots,x_{p-1}} - R^2_{y|x_1,\ldots,x_{q-1}})/(p-q)}{(1 - R^2_{y|x_1,\ldots,x_{p-1}})/(n-p)},$$
where $R^2_{y|x_1,\ldots,x_{p-1}}$ is the coefficient of multiple determination when $Y$ is regressed on all $X$ variables, and $R^2_{y|x_1,\ldots,x_{q-1}}$ is the coefficient when $Y$ is regressed on $x_1, \ldots, x_{q-1}$ only.

Q: What is the coefficient of multiple determination?
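The equivalence of the SSE form and the $R^2$ form of the F statistic can be verified directly. A minimal sketch (simulated data; seed, dimensions $n$, $p$, $q$, and coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 40, 4, 2   # full model: intercept + 3 slopes; reduced: intercept + 1 slope
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.9, 0.4, 0.0]) + rng.normal(size=n)

def r2_and_sse(Xd):
    y_hat = Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]
    sse = np.sum((y - y_hat) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    return 1 - sse / syy, sse

r2_full, sse_full = r2_and_sse(X)
r2_red, sse_red = r2_and_sse(X[:, :q])  # first q columns = reduced model

# SSE form and R^2 form of the same F statistic
F_sse = ((sse_red - sse_full) / (p - q)) / (sse_full / (n - p))
F_r2 = ((r2_full - r2_red) / (p - q)) / ((1 - r2_full) / (n - p))
```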
Coefficients of Partial Determination

Extra sums of squares are useful not only for tests on the regression coefficients of a multiple regression model; they also appear in descriptive measures of relationship called coefficients of partial determination. Recall that the coefficient of multiple determination, $R^2$, measures the proportionate reduction in the variation of $Y$ achieved by introducing the entire set of $X$ variables considered in the model. A coefficient of partial determination, in contrast, measures the marginal contribution of one $X$ variable when all others are already included in the model.
Calculation of Coefficients of Partial Determination

$$R^2_{y,x_1|x_2} = \frac{SSE(X_2) - SSE(X_1,X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}$$
$$R^2_{y,x_2|x_1} = \frac{SSE(X_1) - SSE(X_1,X_2)}{SSE(X_1)} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}$$
$$R^2_{y,x_1|x_2,x_3} = \frac{SSR(X_1 \mid X_2,X_3)}{SSE(X_2,X_3)} \qquad R^2_{y,x_2|x_1,x_3} = \frac{SSR(X_2 \mid X_1,X_3)}{SSE(X_1,X_3)}$$
$$R^2_{y,x_3|x_1,x_2} = \frac{SSR(X_3 \mid X_1,X_2)}{SSE(X_1,X_2)} \qquad R^2_{y,x_4|x_1,x_2,x_3} = \frac{SSR(X_4 \mid X_1,X_2,X_3)}{SSE(X_1,X_2,X_3)}$$
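The first formula above can be sketched numerically. This is a minimal illustration on simulated data (seed, sample size, and coefficients are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

def sse(*cols):
    """SSE from regressing y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(n), *cols])
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return r @ r

# R^2_{y,x1|x2} = [SSE(X2) - SSE(X1,X2)] / SSE(X2)
r2_partial = (sse(x2) - sse(x1, x2)) / sse(x2)
```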
Coefficients of Partial Determination

The coefficients of partial determination can take on values between 0 and 1, as the definitions readily indicate.
Coefficients of Partial Determination

A coefficient of partial determination can be interpreted as a coefficient of simple determination. Suppose we regress $Y$ on $X_2$ and obtain the residuals
$$e_i(Y \mid X_2) = Y_i - \hat Y_i(X_2),$$
where $\hat Y_i(X_2)$ denotes the fitted values of $Y$ when $X_2$ is in the model. Suppose we further regress $X_1$ on $X_2$ and obtain the residuals
$$e_i(X_1 \mid X_2) = X_{i1} - \hat X_{i1}(X_2),$$
where $\hat X_{i1}(X_2)$ denotes the fitted values of $X_1$ in the regression of $X_1$ on $X_2$.
Coefficients of Partial Determination

The coefficient of simple determination $R^2$ between these two sets of residuals equals the coefficient of partial determination $R^2_{Y,X_1|X_2}$. This coefficient measures the relation between $Y$ and $X_1$ when both of these variables have been adjusted for their linear relationships to $X_2$.
Added Variable Plots / Partial Regression Plots

The plot of the residuals $e_i(Y \mid X_2)$ against $e_i(X_1 \mid X_2)$ provides a graphical representation of the strength of the relationship between $Y$ and $X_1$, adjusted for $X_2$. Such residual plots are called added variable plots or partial regression plots.
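The residual-on-residual interpretation can be checked numerically: the squared simple correlation between the two residual sets equals the coefficient of partial determination computed from extra sums of squares. A minimal sketch on simulated data (seed, $n$, and coefficients are illustrative); plotting `e_y` against `e_x1` would give the added variable plot itself.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 0.6 * x1 + 0.9 * x2 + rng.normal(size=n)

def resid(v, x):
    """Residuals from regressing v on an intercept and x."""
    X = np.column_stack([np.ones(n), x])
    return v - X @ np.linalg.lstsq(X, v, rcond=None)[0]

e_y = resid(y, x2)     # e(Y | X2)
e_x1 = resid(x1, x2)   # e(X1 | X2)

# Squared simple correlation between the residual sets
r2_resid = np.corrcoef(e_y, e_x1)[0, 1] ** 2

def sse(*cols):
    X = np.column_stack([np.ones(n), *cols])
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return r @ r

# Partial determination from extra sums of squares
r2_partial = (sse(x2) - sse(x1, x2)) / sse(x2)
```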
$R^2$ and Partial Correlation

$$R^2_{Y,X_2|X_1} = [r_{Y,X_2|X_1}]^2 = \frac{(r_{Y,X_2} - r_{X_1,X_2}\, r_{Y,X_1})^2}{(1 - r^2_{X_1,X_2})(1 - r^2_{Y,X_1})}$$
$$R^2_{Y,X_2|X_1,X_3} = [r_{Y,X_2|X_1,X_3}]^2 = \frac{(r_{Y,X_2|X_3} - r_{X_1,X_2|X_3}\, r_{Y,X_1|X_3})^2}{(1 - r^2_{X_1,X_2|X_3})(1 - r^2_{Y,X_1|X_3})}$$
Partial Correlation

$$0 \le r^2_{y,x_k|x_1,\ldots,x_{k-1}} \le 1$$
$$r^2_{y,x_k|x_1,\ldots,x_{k-1}} = 1 \iff r^2_{y,\hat y(x_1,\ldots,x_{k-1},x_k)} = 1$$
$$r^2_{y,x_k|x_1,\ldots,x_{k-1}} = \frac{\{y^t(I - H_{[k-1]})x_k\}^2}{\{y^t(I - H_{[k-1]})y\}\{x_k^t(I - H_{[k-1]})x_k\}}$$
where $H_{[k-1]} = X_{[k-1]}[X^t_{[k-1]} X_{[k-1]}]^{-1} X^t_{[k-1]}$ and $X_{[k-1]} = [1, x_1, \ldots, x_{k-1}]$.
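The quadratic-form expression above can be sketched directly: form the hat matrix of the earlier predictors, apply the residual-maker $I - H$, and compare with the extra sum of squares formulation. Simulated data; seed, $n$, and coefficients are illustrative (here $x_k = x_2$ and the prior predictor set is $\{x_1\}$).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 150
x1, x2 = rng.normal(size=(2, n))
y = 0.5 * x1 + 0.7 * x2 + rng.normal(size=n)

X_prev = np.column_stack([np.ones(n), x1])                 # X_{[k-1]} = [1, x1]
H = X_prev @ np.linalg.solve(X_prev.T @ X_prev, X_prev.T)  # hat matrix H_{[k-1]}
M = np.eye(n) - H                                          # residual-maker I - H

# Partial correlation squared as a ratio of quadratic forms
r2_qform = (y @ M @ x2) ** 2 / ((y @ M @ y) * (x2 @ M @ x2))

# Same quantity from extra sums of squares
def sse(*cols):
    X = np.column_stack([np.ones(n), *cols])
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return r @ r

r2_sse = (sse(x1) - sse(x1, x2)) / sse(x1)
```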
Recall: $R^2$

$R^2$: the coefficient of determination,
$$R^2 = \frac{SS_{reg}}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}.$$
In the simple linear regression case,
$$R^2 = (r_{y,x})^2 = \frac{[\sum(x_i - \bar x)(y_i - \bar y)]^2}{S_{yy} S_{xx}} = r^2.$$
In general,
$$R^2 = (r_{y,\hat y})^2 = \frac{[\sum(Y_i - \bar Y)(\hat Y_i - \bar{\hat Y})]^2}{\sum(Y_i - \bar Y)^2 \sum(\hat Y_i - \bar{\hat Y})^2}.$$
Proof

In general,
$$R^2 = (r_{y,\hat y})^2 = \frac{[\sum(Y_i - \bar Y)(\hat Y_i - \bar{\hat Y})]^2}{\sum(Y_i - \bar Y)^2 \sum(\hat Y_i - \bar{\hat Y})^2}.$$
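The identity $R^2 = (r_{y,\hat y})^2$ can be checked numerically before proving it. A minimal sketch (simulated data; seed, dimensions, and coefficients are illustrative), comparing the definitional $1 - SSE/S_{yy}$ with the squared correlation of $y$ and the fitted values:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.4]) + rng.normal(size=n)

y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
syy = np.sum((y - y.mean()) ** 2)

r2_def = 1 - np.sum((y - y_hat) ** 2) / syy   # R^2 = 1 - SSE/Syy
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2    # (r_{y, y_hat})^2
```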