Finansiell Statistik, GN, 5 hp, VT 2008
Lecture 5: Multiple Linear Regression & Correlation
Gebrenegus Ghilagaber, PhD, Associate Professor
May 5, 2008
1 Introduction

In the simple linear regression model

    Y_i = \alpha + \beta X_i + \varepsilon_i    (1)

the least-squares estimates of the parameters were obtained from the normal equations:

    \sum Y_i = n a + b \sum X_i    (2)

and

    \sum X_i Y_i = a \sum X_i + b \sum X_i^2    (3)
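As an aside (not on the original slides), the normal equations (2)-(3) can also be solved numerically. A minimal Python sketch using numpy, with made-up illustration data:

    import numpy as np

    # Made-up illustration data (not from the lecture).
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.0, 3.0, 5.0, 4.0, 7.0])
    n = len(X)

    # Normal equations (2)-(3) in matrix form:
    #   [ n        sum(X)   ] [a]   [ sum(Y)   ]
    #   [ sum(X)   sum(X^2) ] [b] = [ sum(X*Y) ]
    A = np.array([[n, X.sum()], [X.sum(), (X**2).sum()]])
    rhs = np.array([Y.sum(), (X * Y).sum()])
    a, b = np.linalg.solve(A, rhs)
    print(a, b)  # intercept and slope of the fitted line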
2 The Model and its assumptions

Suppose we extend the simple linear regression model by including one more explanatory variable. The population multiple regression model then becomes

    Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i    (4)

and its sample estimate is given by

    Y_i = a + b_1 X_{1i} + b_2 X_{2i} + e_i    (5)

The sum of squares of the error terms is then given by

    \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - b_1 X_{1i} - b_2 X_{2i})^2    (6)
and the least-squares estimates of the parameters are obtained from the normal equations:

    \sum Y_i = n a + b_1 \sum X_{1i} + b_2 \sum X_{2i}    (7)

    \sum X_{1i} Y_i = a \sum X_{1i} + b_1 \sum X_{1i}^2 + b_2 \sum X_{1i} X_{2i}    (8)

    \sum X_{2i} Y_i = a \sum X_{2i} + b_1 \sum X_{1i} X_{2i} + b_2 \sum X_{2i}^2    (9)
3 Standard Assumptions for Multiple Regression (with two explanatory variables)

Normality: the \varepsilon_i are normally distributed.

Zero mean: the \varepsilon_i have zero mean, E(\varepsilon_i) = 0.

Constant variance (homoscedasticity): the \varepsilon_i have constant variance \sigma^2, so that \varepsilon_i \sim N(0, \sigma^2).

Independence: the \varepsilon_i are independent, Cov(\varepsilon_i, \varepsilon_j) = 0 for i \neq j.
X_i and \varepsilon_i are uncorrelated: the X_i are either fixed, or random but uncorrelated with the \varepsilon_i, Cov(\varepsilon_i, X_i) = 0.

No multicollinearity: the explanatory variables X_1 and X_2 are not strongly correlated.
4 Estimating Multiple Regression Parameters

In obtaining the least-squares estimates of the parameters in multiple regression, it is easier to work with the deviations

    y_i = Y_i - \bar{Y}, \quad x_{1i} = X_{1i} - \bar{X}_1, \quad x_{2i} = X_{2i} - \bar{X}_2

instead of Y_i, X_{1i}, and X_{2i}. In such a case,

    \sum y_i = \sum (Y_i - \bar{Y}) = 0, \quad \sum x_{1i} = \sum (X_{1i} - \bar{X}_1) = 0, \quad \sum x_{2i} = \sum (X_{2i} - \bar{X}_2) = 0.
Thus, equations (8) and (9) may be rewritten as

    \sum x_{1i} y_i = b_1 \sum x_{1i}^2 + b_2 \sum x_{1i} x_{2i}    (10)

    \sum x_{2i} y_i = b_1 \sum x_{1i} x_{2i} + b_2 \sum x_{2i}^2    (11)
From equations (10) and (11) we get

    b_1 = \frac{(\sum x_{2i}^2)(\sum x_{1i} y_i) - (\sum x_{1i} x_{2i})(\sum x_{2i} y_i)}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}

    b_2 = \frac{(\sum x_{1i}^2)(\sum x_{2i} y_i) - (\sum x_{1i} x_{2i})(\sum x_{1i} y_i)}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}

and (it can be shown that)

    a = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2
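To make the formulas concrete, here is a minimal Python sketch (an illustration added here, with hypothetical data vectors) that evaluates b_1, b_2, and a directly from these deviation sums:

    import numpy as np

    # Hypothetical data; replace with the observed Y, X1, X2.
    Y  = np.array([2.0, 3.0, 5.0, 4.0, 7.0])
    X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    X2 = np.array([3.0, 4.0, 5.0, 1.0, 2.0])

    # Deviations from the means.
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

    # Denominator shared by b1 and b2.
    den = (x1**2).sum() * (x2**2).sum() - (x1 * x2).sum()**2
    b1 = ((x2**2).sum() * (x1 * y).sum() - (x1 * x2).sum() * (x2 * y).sum()) / den
    b2 = ((x1**2).sum() * (x2 * y).sum() - (x1 * x2).sum() * (x1 * y).sum()) / den
    a  = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
    print(a, b1, b2)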
5 Decomposing the total variance - the ANOVA Table

As we did in simple linear regression, we can now decompose the total variance into its various sources and create an ANOVA table:

    Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})

so that

    \sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2 \sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})
                           = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2

since the cross-product term can be shown to equal zero.
The corresponding ANOVA table will then be given by

    Source of     Degrees of    Sum of                                Mean                      F-ratio
    variation     freedom       Squares                               Squares
    Regression    k             SSR = \sum (\hat{Y}_i - \bar{Y})^2    MSR = SSR / k             F = MSR / MSE
    Error         n - k - 1     SSE = \sum (Y_i - \hat{Y}_i)^2        MSE = SSE / (n - k - 1)
    Total         n - 1         SST = \sum (Y_i - \bar{Y})^2          MST = SST / (n - 1)

Note that the degrees of freedom and the sums of squares are additive, but not the mean squares:

    k + (n - k - 1) = n - 1, \quad SSR + SSE = SST, \quad MSR + MSE \neq MST
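For illustration, a small Python helper (not from the slides) that computes the entries of this ANOVA table from the observed and fitted values:

    import numpy as np

    def anova_table(Y, Y_hat, k):
        # SST, SSE, SSR and the mean squares for a regression with k
        # explanatory variables; returns the F-ratio as well.
        n = len(Y)
        SST = ((Y - Y.mean())**2).sum()
        SSE = ((Y - Y_hat)**2).sum()
        SSR = ((Y_hat - Y.mean())**2).sum()
        MSR = SSR / k
        MSE = SSE / (n - k - 1)
        return SSR, SSE, SST, MSR, MSE, MSR / MSE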
6 The Residual Standard Error, S_e, & the coefficient of multiple determination, R^2

Both S_e^2 and R^2 may be used to evaluate the goodness-of-fit of our multiple regression model. The residual standard error, S_e, is just the standard deviation of the error terms:

    S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - k - 1}} = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{SSE}{n - 3}} \quad \text{when } k = 2.
The coefficient of multiple determination, R^2, is given by

    R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}

It gives the proportion (percentage) of the total variation in the dependent variable (Y) that is explained by the explanatory variables X_1 and X_2. The larger the value of R^2, the better the fit of the model. The adjusted R^2, R^2_{adj}, which takes due account of the degrees of freedom, is given by

    R^2_{adj} = 1 - \frac{SSE / (n - k - 1)}{SST / (n - 1)} = 1 - \frac{SSE}{SST} \cdot \frac{n - 1}{n - k - 1} = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}
Again, note that R^2_{adj} \leq R^2, indicating that the unadjusted R^2 is an overestimate. Both S_e and R^2 measure the goodness-of-fit of a regression model, but S_e is an absolute measure while R^2 is a relative measure.
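A short Python sketch (added for illustration) computing S_e, R^2, and R^2_{adj} exactly as defined above:

    import numpy as np

    def fit_summary(Y, Y_hat, k):
        # Residual standard error, R^2 and adjusted R^2, as defined above.
        n = len(Y)
        SSE = ((Y - Y_hat)**2).sum()
        SST = ((Y - Y.mean())**2).sum()
        Se = np.sqrt(SSE / (n - k - 1))
        R2 = 1.0 - SSE / SST
        R2_adj = 1.0 - (1.0 - R2) * (n - 1) / (n - k - 1)
        return Se, R2, R2_adj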
7 Testing for the overall model significance

To test

    H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0
    H_1: \beta_i \neq 0 \text{ for at least one } i,

the appropriate test statistic is

    F = \frac{MSR}{MSE} = \frac{\sum (\hat{Y}_i - \bar{Y})^2 / k}{\sum (Y_i - \hat{Y}_i)^2 / (n - k - 1)}

which is to be compared with F_{(k, n-k-1; \alpha)}.
This is a global test in the sense that if the test is significant (H_0 is rejected), we do not yet know which of the \beta_i is (are) significantly different from 0. Note also that the test statistic may be related to the coefficient of multiple determination, R^2, as follows:

    F = \frac{\sum (\hat{Y}_i - \bar{Y})^2 / k}{\sum (Y_i - \hat{Y}_i)^2 / (n - k - 1)}
      = \frac{\left[ \sum (\hat{Y}_i - \bar{Y})^2 / \sum (Y_i - \bar{Y})^2 \right] / k}{\left[ \sum (Y_i - \hat{Y}_i)^2 / \sum (Y_i - \bar{Y})^2 \right] / (n - k - 1)}
      = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
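A minimal sketch (assuming scipy is available) of the overall F-test computed from R^2 alone:

    from scipy.stats import f

    def overall_F_test(R2, n, k, alpha=0.05):
        # F computed from R^2, compared with F_(k, n-k-1; alpha).
        F = (R2 / k) / ((1.0 - R2) / (n - k - 1))
        p_value = f.sf(F, k, n - k - 1)   # P(F_{k, n-k-1} > F)
        return F, p_value, p_value < alpha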
8 Tests on individual regression coefficients

To test, say,

    H_0: \beta_i = 0
    H_1: \beta_i \neq 0

for the individual coefficients, we may use the t-statistic

    t = \frac{b_i}{S(b_i)}
and compare the calculated value of t with that of t_{(n-k-1; \alpha/2)}. The standard errors of the individual estimates are given by

    S(b_1) = \sqrt{\frac{S_e^2 \sum x_{2i}^2}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}}

and

    S(b_2) = \sqrt{\frac{S_e^2 \sum x_{1i}^2}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}}

where x_{1i} = X_{1i} - \bar{X}_1 and x_{2i} = X_{2i} - \bar{X}_2.
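A sketch of these standard errors and t-statistics in Python (the function name and arguments are illustrative, not from the slides):

    import numpy as np
    from scipy.stats import t as t_dist

    def coef_t_tests(x1, x2, b1, b2, Se2, n, alpha=0.05, k=2):
        # x1, x2 are the deviations X_1i - mean(X_1) and X_2i - mean(X_2).
        den = (x1**2).sum() * (x2**2).sum() - (x1 * x2).sum()**2
        S_b1 = np.sqrt(Se2 * (x2**2).sum() / den)
        S_b2 = np.sqrt(Se2 * (x1**2).sum() / den)
        t_crit = t_dist.ppf(1.0 - alpha / 2.0, n - k - 1)
        return b1 / S_b1, b2 / S_b2, t_crit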
9 Confidence Interval for the mean response

Once we get the least-squares estimates of the model parameters, the estimated regression model is given by

    \hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i}

This model may be used, among other things, to predict values of Y for given values of X_1 and X_2. Thus, for new values X_{1,n+1} and X_{2,n+1}, the predicted value of Y is given by

    \hat{Y}_{n+1} = a + b_1 X_{1,n+1} + b_2 X_{2,n+1}

Since \hat{Y}_{n+1} is a statistic (computed from a sample) it is subject to variation. This variation is measured by its standard error, which is given by

    S_{\hat{Y}_{n+1}} = \sqrt{\frac{S_e^2}{n}} = \sqrt{\frac{MSE}{n}} = \sqrt{\frac{SSE}{n(n - k - 1)}}
This may then be used to construct a (1 - \alpha)100% confidence interval for the predicted population mean response, E(Y_{n+1} | X_{1,n+1}, X_{2,n+1}), as

    \left( \hat{Y}_{n+1} - t_{(n-3; \alpha/2)} \frac{S_e}{\sqrt{n}}, \quad \hat{Y}_{n+1} + t_{(n-3; \alpha/2)} \frac{S_e}{\sqrt{n}} \right)
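A minimal Python sketch of this interval, using the simplified standard error S_e / \sqrt{n} given above:

    import numpy as np
    from scipy.stats import t as t_dist

    def mean_response_ci(y_hat_new, Se, n, k=2, alpha=0.05):
        # (1 - alpha)100% CI for the mean response at a new point.
        half = t_dist.ppf(1.0 - alpha / 2.0, n - k - 1) * Se / np.sqrt(n)
        return y_hat_new - half, y_hat_new + half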
10 Example(s)

    i       Y_i     X_{1i}    X_{2i}
    1       2       1         3
    2       3       2         4
    3       5       3         5
    4       4       4         1
    5       7       5         2
    Sum     21      15        15
    Mean    4.2     3         3

together with empty worksheet columns for y_i, x_{1i}, x_{2i}, x_{1i}^2, x_{2i}^2, x_{1i} y_i, x_{2i} y_i, and x_{1i} x_{2i}.

(a) Fit a simple linear regression \hat{Y}_i = a_1 + b_1 X_{1i} and estimate all relevant quantities (ANOVA, S_e^2, R^2, etc.).

(b) Do the same with \hat{Y}_i = a_2 + b_2 X_{2i}.

(c) Fit a multiple linear regression \hat{Y}_i = a_3 + b_3 X_{1i} + b_4 X_{2i} (with ANOVA, S_e^2, R^2, etc.) and compare the results with those in (a) and (b).
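A numpy sketch (not part of the exercise) that fits all three models to the table above, useful for checking the hand calculations:

    import numpy as np

    Y  = np.array([2.0, 3.0, 5.0, 4.0, 7.0])
    X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    X2 = np.array([3.0, 4.0, 5.0, 1.0, 2.0])
    one = np.ones_like(Y)

    # (a), (b): simple regressions; (c): multiple regression.
    for name, X in [("(a)", np.column_stack([one, X1])),
                    ("(b)", np.column_stack([one, X2])),
                    ("(c)", np.column_stack([one, X1, X2]))]:
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        Y_hat = X @ b
        SSE = ((Y - Y_hat)**2).sum()
        SST = ((Y - Y.mean())**2).sum()
        print(name, b.round(4), "R^2 =", round(1 - SSE / SST, 4))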
2 Introduction to Matrix Algebra

(This section is extra! It is not part of the course, but it may be helpful to know!)

2.1 Definition & Notation

A matrix is a rectangular array of numbers. If A has n rows and p columns, we say it is of order n x p. For instance, n observations on p
variables give an n x p matrix A as follows:

    A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix}

A vector is a matrix with only one row or column:

    a = (a_1, a_2, \ldots, a_c)
is a row-vector, while

    b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_r \end{pmatrix}

is a column-vector.
2.2 Elementary Operations with Matrices

If

    A = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{np} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{np} \end{pmatrix}

then their sum is given by

    A + B = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1p} + b_{1p} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{np} + b_{np} \end{pmatrix}
For a constant c,

    cA = \begin{pmatrix} c a_{11} & \cdots & c a_{1p} \\ \vdots & & \vdots \\ c a_{n1} & \cdots & c a_{np} \end{pmatrix}

Further, if the number of columns in A is equal to the number of rows in B (p = n), then their product is given by

    AB = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} + \cdots + a_{1p} b_{p1} & \cdots & a_{11} b_{1p} + a_{12} b_{2p} + \cdots + a_{1p} b_{pp} \\ \vdots & & \vdots \\ a_{n1} b_{11} + a_{n2} b_{21} + \cdots + a_{np} b_{p1} & \cdots & a_{n1} b_{1p} + a_{n2} b_{2p} + \cdots + a_{np} b_{pp} \end{pmatrix}
2.3 Row Exchanges, Inverse, Transpose

The transpose of an r x c matrix A is denoted by A' and is the c x r matrix formed by interchanging the roles of rows and columns:

    A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{pmatrix}
The inverse of a matrix A is denoted by A^{-1} and is such that

    A A^{-1} = A^{-1} A = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}

where I is the identity matrix, whose elements are 1s on the main diagonal and 0s elsewhere.
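A quick numpy illustration of these operations (the matrices are arbitrary examples):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    B = np.array([[1.0, 0.0],
                  [4.0, 5.0]])

    print(A + B)          # elementwise sum
    print(3 * A)          # scalar multiple
    print(A @ B)          # matrix product
    print(A.T)            # transpose A'
    A_inv = np.linalg.inv(A)
    print(A @ A_inv)      # ~ identity matrix I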
2.4 Square Matrices, Symmetric Matrices, etc.

A matrix is said to be a square matrix if its numbers of rows and columns are equal.

A matrix A is said to be a symmetric matrix if A = A' (if it is equal to its transpose).
2.5 Determinants

The determinant of a matrix A is denoted by det(A) or |A| and is defined only for square matrices. For a 2 x 2 matrix

    A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}

its determinant is given by

    det(A) = |A| = a_{11} a_{22} - a_{12} a_{21}
while for a 3 x 3 matrix

    A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}

its determinant is given by

    det(A) = |A| = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{31} a_{22} a_{13} - a_{11} a_{23} a_{32} - a_{21} a_{12} a_{33}

Computation of determinants of larger matrices gets more complicated, but there are special methods.
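The 2 x 2 formula checked numerically on the earlier example matrix (an added illustration):

    import numpy as np

    # det = 2*3 - 1*1 = 5 for the matrix used above.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    print(np.linalg.det(A))   # -> 5.0 (up to floating-point error)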
2.6 Eigen-values and eigen-vectors

2.7 Positive-definite matrices
3 The Matrix-approach to Linear Regression

3.1 Model formulation

Let

    Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}

be a column-vector of n observations of the response variable (dependent variable),
    X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}

an n x (p+1) matrix of explanatory variables (including a constant column for the intercept),

    \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}

a column-vector of regression coefficients (one intercept and
p slopes), and

    \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}

a column-vector of disturbance (error) terms. Then the multiple regression model may be written in matrix form as

    Y = X\beta + \varepsilon
3.2 Model Assumptions

Some of the standard assumptions are

    E(\varepsilon) = 0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
and

    Cov(\varepsilon) = E(\varepsilon \varepsilon') = \sigma^2 I = \sigma^2 \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}

Thus,

    E(Y) = E(X\beta + \varepsilon) = E(X\beta) + E(\varepsilon) = E(X\beta) = X\beta
3.3 Estimation of Parameters

If

    e = Y - \hat{Y} = Y - Xb = \begin{pmatrix} y_1 - \hat{y}_1 \\ y_2 - \hat{y}_2 \\ \vdots \\ y_n - \hat{y}_n \end{pmatrix}

is the estimated vector of error terms, then the vector of coefficients is estimated by minimizing the sum of squares of these error terms (least-squares method):

    e'e = (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb
This sum of squares is then minimized by differentiating e'e with respect to b, equating to 0, and solving for b:

    \frac{\partial (e'e)}{\partial b} = 0 \implies -2X'Y + 2X'Xb = 0

so that

    X'Xb = X'Y \implies b = (X'X)^{-1} X'Y

and the fitted regression model is given by

    \hat{Y} = Xb = X(X'X)^{-1}X'Y

Moreover,

    E(b) = E\left[ (X'X)^{-1}X'Y \right] = (X'X)^{-1}X' E(Y) = (X'X)^{-1}X'X\beta = \beta

showing that the least-squares estimate b is an unbiased estimator of the true parameter \beta.
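A minimal numpy sketch of this estimator (an added illustration; solving the normal equations is numerically preferable to forming the inverse explicitly):

    import numpy as np

    def ols(X, Y):
        # b = (X'X)^{-1} X'Y; X must already contain the column of 1s.
        return np.linalg.solve(X.T @ X, X.T @ Y)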
3.4 Numerical Examples

Let

    Y_1 = (5, 7, 9, 11, 13, 15, 17, 19, 21, 23)'

    Y_2 = (28, 25, 22, 19, 16, 13, 10, 7, 4, 1)'

    X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ \vdots & \vdots \\ 1 & 10 \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}

Then, Y_1 = X\beta + \varepsilon \implies b = (X'X)^{-1} X'Y_1
where

    X'X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 10 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ \vdots & \vdots \\ 1 & 10 \end{pmatrix} = \begin{pmatrix} 10 & 55 \\ 55 & 385 \end{pmatrix}

and

    (X'X)^{-1} = \begin{pmatrix} 10 & 55 \\ 55 & 385 \end{pmatrix}^{-1} = \begin{pmatrix} 0.466667 & -0.066667 \\ -0.066667 & 0.012121 \end{pmatrix}
while

    X'Y_1 = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 10 \end{pmatrix} \begin{pmatrix} 5 \\ 7 \\ \vdots \\ 23 \end{pmatrix} = \begin{pmatrix} 140 \\ 935 \end{pmatrix}
Thus,

    b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = (X'X)^{-1} X'Y_1 = \begin{pmatrix} 0.466667 & -0.066667 \\ -0.066667 & 0.012121 \end{pmatrix} \begin{pmatrix} 140 \\ 935 \end{pmatrix} = \begin{pmatrix} 0.466667 \cdot 140 - 0.066667 \cdot 935 \\ -0.066667 \cdot 140 + 0.012121 \cdot 935 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}

    \implies b_0 = 3 \text{ and } b_1 = 2
Similarly, Y_2 = X\beta + \varepsilon \implies b = (X'X)^{-1} X'Y_2, where

    X'Y_2 = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 10 \end{pmatrix} \begin{pmatrix} 28 \\ 25 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 145 \\ 550 \end{pmatrix}
so that

    b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = (X'X)^{-1} X'Y_2 = \begin{pmatrix} 0.466667 & -0.066667 \\ -0.066667 & 0.012121 \end{pmatrix} \begin{pmatrix} 145 \\ 550 \end{pmatrix} = \begin{pmatrix} 0.466667 \cdot 145 - 0.066667 \cdot 550 \\ -0.066667 \cdot 145 + 0.012121 \cdot 550 \end{pmatrix} = \begin{pmatrix} 31 \\ -3 \end{pmatrix}

    \implies b_0 = 31 \text{ and } b_1 = -3

The results are intuitively appealing since both data sets lie exactly on a straight line, Y_1 = 3 + 2X and Y_2 = 31 - 3X, and the least-squares fits recover these lines exactly.
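An added numpy check of both fits:

    import numpy as np

    # Verification of the numerical example above (a sketch, not from the slides).
    X = np.column_stack([np.ones(10), np.arange(1, 11)])
    Y1 = 3 + 2 * np.arange(1, 11)      # 5, 7, ..., 23
    Y2 = 31 - 3 * np.arange(1, 11)     # 28, 25, ..., 1

    for Y in (Y1, Y2):
        b = np.linalg.solve(X.T @ X, X.T @ Y)
        print(b)    # -> [ 3.  2.] and [31. -3.]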