REVIEW (MULTIVARIATE LINEAR REGRESSION)

- Explain/obtain the LS estimator ($\hat b$) of the vector of coefficients ($b$)
- Explain/obtain the variance-covariance matrix of $\hat b$
- Both in the bivariate case (two regressors) and in the multivariate case ($k$ regressors)
- Decompose the residual sum of squares
- The coefficient of multiple correlation
- Information criteria
BIVARIATE REGRESSION

Consider the following bivariate regression:
\[ Y_i = b_1 + b_2 X_i + e_i, \qquad i = 1, \dots, N \]
The above expression can be written in matrix form as
\[ \underset{(N\times 1)}{Y} = \underset{(N\times 2)}{X}\,\underset{(2\times 1)}{b} + \underset{(N\times 1)}{e}, \]
or
\[ \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} \]
SUM OF THE SQUARED ERRORS

The sum of the squared errors can be written as
\[ \sum_{i=1}^{N} e_i^2 = e'e = \begin{bmatrix} e_1 & e_2 & \cdots & e_N \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = e_1^2 + e_2^2 + \cdots + e_N^2 \]
The vector of the errors is given by
\[ \underset{(N\times 1)}{e} = \underset{(N\times 1)}{Y} - \underset{(N\times 2)}{X}\,\underset{(2\times 1)}{b} \]
X'X FOR THE BIVARIATE CASE

$X'X$ is given by
\[ \underset{(2\times 2)}{X'X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_N \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{bmatrix} = \begin{bmatrix} N & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \]
From the above equation it follows that
\[ (X'X)^{-1} = \frac{1}{N\sum X^2 - \left(\sum X\right)^2} \begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix}, \]
where $x_i \equiv X_i - \bar X$ denotes the deviation from the mean, so that $N\sum x^2 = N\sum X^2 - \left(\sum X\right)^2$.
X'Y FOR THE BIVARIATE CASE

Similarly, $X'Y$ is given by
\[ \underset{(2\times 1)}{X'Y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_N \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix} \]
LS ESTIMATOR OF THE SLOPE COEFFICIENT

The LS estimator is
\[ \begin{bmatrix} \hat b_1 \\ \hat b_2 \end{bmatrix} = (X'X)^{-1}X'Y = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix} = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 \sum Y - \sum X \sum XY \\ N\sum XY - \sum X \sum Y \end{bmatrix} \]
Thus, since $N\sum xy = N\sum XY - \sum X \sum Y$ (with $y_i \equiv Y_i - \bar Y$):
\[ \hat b_2 = \frac{N\sum XY - \sum X \sum Y}{N\sum x^2} = \frac{\sum xy}{\sum x^2} \]
LS ESTIMATOR OF THE CONSTANT

\[ \begin{bmatrix} \hat b_1 \\ \hat b_2 \end{bmatrix} = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 \sum Y - \sum X \sum XY \\ N\sum XY - \sum X \sum Y \end{bmatrix} \]
Thus
\[ \hat b_1 = \frac{\sum X^2 \sum Y - \sum X \sum XY}{N\sum x^2} \]
It can be shown that $\hat b_1 = \bar Y - \hat b_2 \bar X$.
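As a quick numerical check, here is a minimal numpy sketch (synthetic data; all numbers are illustrative and not from the notes) verifying that the scalar formulas $\hat b_2 = \sum xy / \sum x^2$ and $\hat b_1 = \bar Y - \hat b_2 \bar X$ agree with the matrix formula $(X'X)^{-1}X'Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
X = rng.normal(5.0, 2.0, N)
Y = 1.5 + 0.8 * X + rng.normal(0.0, 1.0, N)

# Deviations from the means: x_i = X_i - Xbar, y_i = Y_i - Ybar
x = X - X.mean()
y = Y - Y.mean()

b2_hat = (x * y).sum() / (x ** 2).sum()   # slope: sum(xy) / sum(x^2)
b1_hat = Y.mean() - b2_hat * X.mean()     # intercept: Ybar - b2_hat * Xbar

# Compare with the matrix formula (X'X)^{-1} X'Y
Xmat = np.column_stack([np.ones(N), X])
b_hat = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ Y)
print(np.allclose([b1_hat, b2_hat], b_hat))   # True
```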
LS ESTIMATOR: k VARIABLES

Consider the: i) $N\times 1$ vector of the $Y$s, ii) $N\times k$ matrix of the regressors ($X$s), iii) $k\times 1$ vector of the parameters ($b$s) and, iv) $N\times 1$ vector of the errors ($e$s):
\[ \underset{(N\times 1)}{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix}, \quad \underset{(N\times k)}{X} = \begin{bmatrix} 1 & X_{21} & \cdots & X_{k1} \\ 1 & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{2N} & \cdots & X_{kN} \end{bmatrix}, \quad \underset{(k\times 1)}{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix}, \quad \underset{(N\times 1)}{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} \]
The multiple regression is
\[ Y_i = b_1 + b_2 X_{2i} + \cdots + b_k X_{ki} + e_i \]
In matrix form it can be written as
\[ \underset{(N\times 1)}{Y} = \underset{(N\times k)}{X}\,\underset{(k\times 1)}{b} + \underset{(N\times 1)}{e} \]
The LS estimator of the vector $b$, as in the bivariate case, is given by
\[ \underset{(k\times 1)}{\hat b} = \underset{(k\times k)}{(X'X)^{-1}}\,\underset{(k\times 1)}{X'Y} \]
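A minimal sketch of the $k$-regressor estimator on synthetic data (the coefficient values and sample size below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 200, 4
# N x k regressor matrix with a first column of ones
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
b = np.array([1.0, 0.5, -2.0, 0.3])   # true k x 1 coefficient vector
e = rng.normal(0.0, 1.0, N)
Y = X @ b + e

b_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # solves (X'X) b_hat = X'Y
print(b_hat)  # close to b in large samples
```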
VARIANCE-COVARIANCE MATRIX OF THE ERRORS: 2 ERRORS

Consider the $2\times 2$ matrix $ee'$:
\[ \underset{(2\times 2)}{ee'} = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} \begin{bmatrix} e_1 & e_2 \end{bmatrix} = \begin{bmatrix} e_1^2 & e_1 e_2 \\ e_2 e_1 & e_2^2 \end{bmatrix} \]
Since $E(e_i) = 0$, $E(e_i^2) = \sigma^2$ (the errors are homoskedastic) and $E(e_1 e_2) = 0$ (the errors are serially uncorrelated), the variance-covariance matrix of the vector of errors is given by
\[ \underset{(2\times 2)}{E(ee')} = E\begin{bmatrix} e_1^2 & e_1 e_2 \\ e_2 e_1 & e_2^2 \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix} = \sigma^2 I, \]
where $I$ is the identity matrix.
VARIANCE-COVARIANCE MATRIX OF THE ERRORS: N ERRORS

If we have $N$ observations:
\[ \underset{(N\times N)}{ee'} = \begin{bmatrix} e_1^2 & e_1 e_2 & \cdots & e_1 e_N \\ e_2 e_1 & e_2^2 & \cdots & e_2 e_N \\ \vdots & \vdots & \ddots & \vdots \\ e_N e_1 & e_N e_2 & \cdots & e_N^2 \end{bmatrix} \]
Accordingly, the variance-covariance matrix of the errors, $E(ee')$, is
\[ \underset{(N\times N)}{E(ee')} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I \]
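A small Monte Carlo sketch of this property (the values of $N$, $\sigma$ and the number of replications are illustrative): averaging $ee'$ over many simulated i.i.d. error vectors gives approximately $\sigma^2 I$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma, reps = 4, 1.5, 200_000
E = rng.normal(0.0, sigma, size=(reps, N))   # each row is one draw of the error vector
avg = E.T @ E / reps                          # Monte Carlo estimate of E(ee')
print(np.round(avg, 2))                       # approximately sigma**2 * np.eye(N) = 2.25 * I
```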
VARIANCE-COVARIANCE MATRIX OF THE LS ESTIMATOR VECTOR: BIVARIATE CASE

The $2\times 2$ variance-covariance matrix of the vector $\hat b$ is defined as
\[ \underset{(2\times 2)}{Var(\hat b)} = E[\underset{(2\times 1)}{(\hat b - b)}\,\underset{(1\times 2)}{(\hat b - b)'}] \]
Note that if $\hat b$ were a scalar with expected value $b$, then $Var(\hat b) = E(\hat b - b)^2$. Since $\hat b$ is a $2\times 1$ vector:
\[ \underset{(2\times 2)}{Var(\hat b)} = \begin{bmatrix} Var(\hat b_1) & Cov(\hat b_1, \hat b_2) \\ Cov(\hat b_1, \hat b_2) & Var(\hat b_2) \end{bmatrix} \]
It can be shown that
\[ \underset{(2\times 2)}{Var(\hat b)} = \sigma^2 (X'X)^{-1} \]
VARIANCE OF THE ESTIMATOR OF THE SLOPE COEFFICIENT

Recall that
\[ (X'X)^{-1} = \frac{1}{N\sum x^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \]
Thus, $\underset{(2\times 2)}{Var(\hat b)} = \sigma^2 (X'X)^{-1}$ is given by
\[ \begin{bmatrix} Var(\hat b_1) & Cov(\hat b_1, \hat b_2) \\ Cov(\hat b_1, \hat b_2) & Var(\hat b_2) \end{bmatrix} = \frac{\sigma^2}{N\sum x^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \]
Moreover, the above expression implies
\[ Var(\hat b_2) = \frac{\sigma^2 N}{N\sum x^2} = \frac{\sigma^2}{\sum x^2} \]
VARIANCE OF THE ESTIMATOR OF THE CONSTANT

Since
\[ Var(\hat b) = \frac{\sigma^2}{N\sum x^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix}, \]
it can be shown that
\[ Var(\hat b_1) = \sigma^2\left[\frac{1}{N} + \frac{\bar X^2}{\sum x^2}\right]. \]
BIVARIATE CASE: COVARIANCE BETWEEN THE TWO ESTIMATORS

\[ Var(\hat b) = \frac{\sigma^2}{N\sum x^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \]
Finally,
\[ Cov(\hat b_1, \hat b_2) = \frac{-\sigma^2 \sum X}{N\sum x^2} = \frac{-\sigma^2 \bar X}{\sum x^2}. \]
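A minimal check that the three scalar formulas above match $\sigma^2(X'X)^{-1}$ entrywise (synthetic regressor values and an assumed $\sigma^2$, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma2 = 50, 2.0
X = rng.normal(10.0, 3.0, N)
x2 = ((X - X.mean()) ** 2).sum()            # sum of squared deviations, sum(x^2)

Xmat = np.column_stack([np.ones(N), X])
V = sigma2 * np.linalg.inv(Xmat.T @ Xmat)   # sigma^2 (X'X)^{-1}

var_b1 = sigma2 * (1.0 / N + X.mean() ** 2 / x2)   # Var(b1_hat)
var_b2 = sigma2 / x2                               # Var(b2_hat)
cov_b1b2 = -sigma2 * X.mean() / x2                 # Cov(b1_hat, b2_hat)
print(np.allclose(V, [[var_b1, cov_b1b2], [cov_b1b2, var_b2]]))  # True
```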
VARIANCE-COVARIANCE MATRIX OF THE LS VECTOR: MULTIVARIATE CASE

For the multiple linear regression we also have
\[ \underset{(k\times k)}{Var(\hat b)} = \sigma^2 \underset{(k\times k)}{(X'X)^{-1}} \]
In other words, in this case we have $k(k+1)/2$ unknowns: the $k$ variances and the $k(k-1)/2$ covariances.
DECOMPOSITION OF RESIDUAL SUM OF SQUARES (RSS)

It can be shown that
\[ \overbrace{\sum y^2}^{TSS} = \overbrace{\hat b' X'X \hat b - N\bar Y^2}^{ESS} + \overbrace{\hat e'\hat e}^{RSS} \]
TSS: total sum of squares; ESS: explained sum of squares; RSS: residual sum of squares.
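A numerical verification of the decomposition on a fitted regression (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 80
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ np.array([2.0, 1.0]) + rng.normal(0.0, 1.0, N)

b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ b_hat

TSS = ((Y - Y.mean()) ** 2).sum()                  # sum(y^2) = sum(Y^2) - N*Ybar^2
ESS = b_hat @ X.T @ X @ b_hat - N * Y.mean() ** 2  # b_hat' X'X b_hat - N*Ybar^2
RSS = e_hat @ e_hat                                # e_hat' e_hat
print(np.isclose(TSS, ESS + RSS))  # True
```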
R^2

We define the $R^2$ of the regression (the coefficient of multiple correlation) as
\[ R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} \]
The adjusted $R^2$ ($\bar R^2$) is given by
\[ \bar R^2 = 1 - \frac{RSS/(N-k)}{TSS/(N-1)} \]
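A minimal sketch computing both quantities from a fitted regression (synthetic data; $k$ counts all coefficients including the constant):

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0.0, 1.0, N)

b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
RSS = ((Y - X @ b_hat) ** 2).sum()
TSS = ((Y - Y.mean()) ** 2).sum()

R2 = 1.0 - RSS / TSS
R2_adj = 1.0 - (RSS / (N - k)) / (TSS / (N - 1))
print(R2, R2_adj)   # the adjusted R^2 penalizes extra regressors
```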
INFORMATION CRITERIA

Next we define two information criteria, the Schwarz and Akaike information criteria (SIC and AIC respectively):
\[ SIC = \ln\frac{\hat e'\hat e}{N} + \frac{k}{N}\ln(N), \qquad AIC = \ln\frac{\hat e'\hat e}{N} + \frac{2k}{N} \]
The preferred model is the one with the minimum information criterion.
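The two criteria as small helper functions (a sketch; the RSS, $N$ and $k$ values passed in below are illustrative):

```python
import numpy as np

def sic(RSS, N, k):
    # SIC = ln(RSS/N) + (k/N) ln(N)
    return np.log(RSS / N) + k / N * np.log(N)

def aic(RSS, N, k):
    # AIC = ln(RSS/N) + 2k/N
    return np.log(RSS / N) + 2.0 * k / N

# Example: for a fitted model with RSS = 120 on N = 100 observations and
# k = 3 coefficients; the model with the smaller criterion is preferred.
print(sic(RSS=120.0, N=100, k=3), aic(RSS=120.0, N=100, k=3))
```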
SUMMARY (MULTIPLE LINEAR REGRESSION)

We obtained the least squares estimator of the vector of coefficients:
\[ \hat b = (X'X)^{-1}X'Y \]
In the bivariate case this gives us:
\[ \hat b_1 = \bar Y - \hat b_2 \bar X, \qquad \hat b_2 = \frac{\sum xy}{\sum x^2} \]
We obtained the variance-covariance matrix of $\hat b$:
\[ Var(\hat b) = \sigma^2 (X'X)^{-1} \]
In the bivariate case this gives us:
\[ Var(\hat b_1) = \sigma^2\left[\frac{1}{N} + \frac{\bar X^2}{\sum x^2}\right], \qquad Var(\hat b_2) = \frac{\sigma^2}{\sum x^2}, \qquad Cov(\hat b_1, \hat b_2) = \frac{-\sigma^2 \bar X}{\sum x^2} \]
We decomposed the TSS:
\[ \overbrace{\sum y^2}^{TSS} = \overbrace{\hat b' X'X \hat b - N\bar Y^2}^{ESS} + \overbrace{\hat e'\hat e}^{RSS} \]
We defined the $R^2$ of the regression:
\[ R^2 = 1 - \frac{RSS}{TSS} \]
We defined two information criteria:
\[ SIC = \ln\frac{\hat e'\hat e}{N} + \frac{k}{N}\ln(N), \qquad AIC = \ln\frac{\hat e'\hat e}{N} + \frac{2k}{N} \]
PROOFS (FROM WEEK 7 ONWARDS; ONLY FOR EC5501)

SUM OF THE SQUARED ERRORS

The sum of the squared errors can be written as
\[ \sum_{i=1}^{N} e_i^2 = e'e = \begin{bmatrix} e_1 & e_2 & \cdots & e_N \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = e_1^2 + e_2^2 + \cdots + e_N^2 \]
The vector of the errors is given by
\[ \underset{(N\times 1)}{e} = \underset{(N\times 1)}{Y} - \underset{(N\times 2)}{X}\,\underset{(2\times 1)}{b} \]
The sum of the squared errors, $\sum_{i=1}^N e_i^2$, can be written as
\[ \sum_{i=1}^{N} e_i^2 = e'e = (Y - Xb)'(Y - Xb) = (Y' - b'X')(Y - Xb) = Y'Y - b'X'Y - Y'Xb + b'X'Xb \]
SUM OF THE SQUARED ERRORS

\[ \sum_{i=1}^{N} e_i^2 = e'e = Y'Y - b'X'Y - Y'Xb + b'X'Xb \]
Next note that since $\underset{(1\times N)}{Y'}\,\underset{(N\times 2)}{X}\,\underset{(2\times 1)}{b}$ is a scalar, we have $(Y'Xb)' = b'X'Y = Y'Xb$. Thus
\[ \sum_{i=1}^{N} e_i^2 = e'e = Y'Y - 2b'X'Y + b'X'Xb \]
LS ESTIMATOR OF THE VECTOR b

\[ \sum_{i=1}^{N} e_i^2 = e'e = Y'Y - 2b'X'Y + b'X'Xb \]
The least squares (LS) principle is to choose the vector of the parameters $b$ in order to minimize $\sum_{i=1}^N e_i^2$. Thus we take the first derivative of $e'e$ with respect to $b$ and set it equal to zero:
\[ \frac{\partial(e'e)}{\partial b} = -2X'Y + 2X'X\hat b = 0 \]
The above equation implies that the LS estimator of $b$, denoted by $\hat b$, is given by
\[ X'X\hat b = X'Y \;\Rightarrow\; \hat b = (X'X)^{-1}X'Y \]
In the bivariate case this is a system of two equations in two unknowns ($\hat b_1$, $\hat b_2$), because $X'X$ is $(2\times N)(N\times 2)$ and $X'Y$ is $(2\times N)(N\times 1)$.
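A sanity-check sketch of the first-order condition, equivalent to $X'(Y - X\hat b) = 0$, and of the minimization itself: $e'e$ evaluated at $\hat b$ is no larger than at any perturbed coefficient vector (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, N)

b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
# Gradient condition: -2X'Y + 2X'X b_hat = 0, i.e. X'(Y - X b_hat) = 0
print(np.allclose(X.T @ (Y - X @ b_hat), 0.0))   # True

def sse(b):
    r = Y - X @ b
    return r @ r

# b_hat minimizes the sum of squared errors
print(all(sse(b_hat) <= sse(b_hat + d) for d in rng.normal(size=(20, 2))))  # True
```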
X'X FOR THE BIVARIATE CASE

$X'X$ is given by
\[ \underset{(2\times 2)}{X'X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_N \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{bmatrix} = \begin{bmatrix} N & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \]
From the above equation it follows that
\[ (X'X)^{-1} = \frac{1}{N\sum X^2 - \left(\sum X\right)^2} \begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \]
X'Y FOR THE BIVARIATE CASE

Similarly, $X'Y$ is given by
\[ \underset{(2\times 1)}{X'Y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_N \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix} \]
LS ESTIMATOR OF THE SLOPE COEFFICIENT

The LS estimator is
\[ \begin{bmatrix} \hat b_1 \\ \hat b_2 \end{bmatrix} = (X'X)^{-1}X'Y = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix} = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 \sum Y - \sum X \sum XY \\ N\sum XY - \sum X \sum Y \end{bmatrix} \]
Thus, since $N\sum xy = N\sum XY - \sum X \sum Y$:
\[ \hat b_2 = \frac{N\sum XY - \sum X \sum Y}{N\sum x^2} = \frac{\sum xy}{\sum x^2} \]
LS ESTIMATOR OF THE CONSTANT

\[ \begin{bmatrix} \hat b_1 \\ \hat b_2 \end{bmatrix} = \frac{1}{N\sum x^2} \begin{bmatrix} \sum X^2 \sum Y - \sum X \sum XY \\ N\sum XY - \sum X \sum Y \end{bmatrix} \]
Further,
\[ \hat b_1 = \frac{\sum X^2 \sum Y - \sum X \sum XY}{N\sum x^2} = \frac{\sum X^2\, \bar Y - \bar X \sum XY}{\sum x^2} \]
Next, in the numerator we add and subtract $N\bar Y\bar X^2$ to get
\[ \hat b_1 = \frac{\bar Y\left(\sum X^2 - N\bar X^2\right) - \bar X\left(\sum XY - N\bar Y\bar X\right)}{\sum x^2} = \frac{\bar Y \sum x^2 - \bar X \sum xy}{\sum x^2} = \bar Y - \hat b_2 \bar X \]
PROPERTIES OF THE LS RESIDUALS

From the least squares methodology we have $(X'X)\hat b = X'Y$. Since $Y = X\hat b + \hat e$, it follows that
\[ (X'X)\hat b = X'(X\hat b + \hat e) \;\Rightarrow\; X'\hat e = 0 \]
In other words,
\[ \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_N \end{bmatrix}\begin{bmatrix} \hat e_1 \\ \hat e_2 \\ \vdots \\ \hat e_N \end{bmatrix} = \begin{bmatrix} \sum \hat e_i \\ \sum X_i \hat e_i \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
That is, the residuals $\hat e_i$ have mean zero, $\sum \hat e_i = 0$, and are orthogonal to the regressor $X_i$: $\sum X_i \hat e_i = 0$. Recall also that $\bar Y = \hat b_1 + \hat b_2 \bar X$; in other words, $(\bar X, \bar Y)$ satisfies the estimated linear relationship.
The predicted values for $Y$, denoted by $\hat Y$, are given by $\hat Y = X\hat b$. Thus
\[ \hat Y'\hat e = (X\hat b)'\hat e = \hat b' X'\hat e = 0 \]
In other words, the vector of regression (fitted) values for $Y$ is uncorrelated with $\hat e$.
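A minimal sketch verifying all three residual properties at once (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(0.0, 1.0, N)

b_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b_hat
e_hat = Y - Y_hat

print(np.allclose(X.T @ e_hat, 0.0))   # X'e_hat = 0: zero mean and orthogonality
print(np.isclose(Y_hat @ e_hat, 0.0))  # Y_hat'e_hat = 0
# (Xbar, Ybar) lies on the fitted line
print(np.isclose(Y.mean(), b_hat[0] + b_hat[1] * X[:, 1].mean()))
```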
EXPECTED VALUE OF THE LS ESTIMATOR

Recall that the LS vector estimator is $\hat b = (X'X)^{-1}X'Y$. Since $Y = Xb + e$, it follows that
\[ \hat b = (X'X)^{-1}X'(Xb + e) = (X'X)^{-1}X'Xb + (X'X)^{-1}X'e = b + (X'X)^{-1}X'e \tag{1} \]
Taking expectations on both sides of the above equation gives
\[ E(\hat b) = b + (X'X)^{-1}X'E(e) = b, \]
since $E(e) = 0$. That is, $\hat b$ is an unbiased estimator of $b$.
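A Monte Carlo illustration of unbiasedness, holding $X$ fixed and redrawing the errors (all values below are illustrative assumptions): the average of $\hat b$ across replications is close to the true $b$.

```python
import numpy as np

rng = np.random.default_rng(8)
N, reps = 50, 5_000
b = np.array([1.0, 2.0])
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # X held fixed across replications
XtXinv_Xt = np.linalg.solve(X.T @ X, X.T)               # (X'X)^{-1} X'

draws = np.array([XtXinv_Xt @ (X @ b + rng.normal(0.0, 1.0, N)) for _ in range(reps)])
print(draws.mean(axis=0))   # approximately [1.0, 2.0]
```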
VARIANCE-COVARIANCE MATRIX OF THE LS ESTIMATOR VECTOR: BIVARIATE CASE

The $2\times 2$ variance-covariance matrix of the vector $\hat b$ is defined as
\[ \underset{(2\times 2)}{Var(\hat b)} = E[\underset{(2\times 1)}{(\hat b - b)}\,\underset{(1\times 2)}{(\hat b - b)'}] \]
Note that if $\hat b$ were a scalar with expected value $b$, then $Var(\hat b) = E(\hat b - b)^2$. Since $\hat b$ is a $2\times 1$ vector:
\[ \underset{(2\times 2)}{Var(\hat b)} = \begin{bmatrix} Var(\hat b_1) & Cov(\hat b_1, \hat b_2) \\ Cov(\hat b_1, \hat b_2) & Var(\hat b_2) \end{bmatrix} \]
From equation (1) we have $\hat b - b = (X'X)^{-1}X'e$. Hence,
\[ E[\underset{(2\times 1)}{(\hat b - b)}\,\underset{(1\times 2)}{(\hat b - b)'}] = E\{(X'X)^{-1}X'e\,[(X'X)^{-1}X'e]'\} = E[(X'X)^{-1}X'ee'X(X'X)^{-1}], \]
since $[(X'X)^{-1}]' = (X'X)^{-1}$ and $(X')' = X$. Next, we have
\[ E[(\hat b - b)(\hat b - b)'] = (X'X)^{-1}X'E(ee')X(X'X)^{-1} \]
Using the fact that $E(ee') = \sigma^2 I$, we get
\[ \underset{(2\times 2)}{Var(\hat b)} = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1} \]
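A companion Monte Carlo check of this result (same fixed-$X$ setup as the unbiasedness sketch; values illustrative): the empirical covariance matrix of the simulated $\hat b$ draws is close to $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(9)
N, reps, sigma = 50, 20_000, 1.0
b = np.array([1.0, 2.0])
X = np.column_stack([np.ones(N), rng.normal(size=N)])
XtXinv = np.linalg.inv(X.T @ X)

draws = np.array([XtXinv @ X.T @ (X @ b + rng.normal(0.0, sigma, N))
                  for _ in range(reps)])
print(np.round(np.cov(draws.T), 4))     # empirical covariance of b_hat
print(np.round(sigma**2 * XtXinv, 4))   # theoretical sigma^2 (X'X)^{-1}
```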
VARIANCE OF THE ESTIMATOR OF THE CONSTANT

Recall that
\[ (X'X)^{-1} = \frac{1}{N\sum x^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \]
Thus, $\underset{(2\times 2)}{Var(\hat b)} = \sigma^2 (X'X)^{-1}$ is given by
\[ \begin{bmatrix} Var(\hat b_1) & Cov(\hat b_1, \hat b_2) \\ Cov(\hat b_1, \hat b_2) & Var(\hat b_2) \end{bmatrix} = \frac{\sigma^2}{N\sum x^2}\begin{bmatrix} \sum X^2 & -\sum X \\ -\sum X & N \end{bmatrix} \]
Thus,
\[ Var(\hat b_1) = \frac{\sigma^2 \sum X^2}{N\sum x^2} \]
\[ Var(\hat b_1) = \frac{\sigma^2 \sum X^2}{N\sum x^2} \]
Using $\sum x^2 = \sum X^2 - N\bar X^2 \;\Rightarrow\; \sum X^2 = \sum x^2 + N\bar X^2$, we get
\[ Var(\hat b_1) = \frac{\sigma^2\left(\sum x^2 + N\bar X^2\right)}{N\sum x^2} = \sigma^2\left[\frac{1}{N} + \frac{\bar X^2}{\sum x^2}\right]. \]
DECOMPOSITION OF RESIDUAL SUM OF SQUARES (RSS)

Recall that
\[ Y = X\hat b + \hat e = \hat Y + \hat e \]
In other words, we decompose the $Y$ vector into: the part explained by the regression, $\hat Y = X\hat b$ (the predicted values of $Y$), and the unexplained part $\hat e$ (the residuals).
Thus
\[ \sum Y^2 = Y'Y = (\hat Y + \hat e)'(\hat Y + \hat e) = \hat Y'\hat Y + \hat e'\hat e = \sum \hat Y_i^2 + \sum_{i=1}^{N} \hat e_i^2, \]
since $\hat Y'\hat e = \hat e'\hat Y = 0$ (from the first-order conditions). Note that
\[ \hat Y'\hat Y = (X\hat b)'X\hat b = \hat b' X'X\hat b \]
\[ \sum Y^2 = \hat Y'\hat Y + \hat e'\hat e = \hat b' X'X \hat b + \hat e'\hat e \]
Recall that $\sum y^2 = \sum Y^2 - N\bar Y^2 = N\,\widehat{Var}(Y)$. Thus
\[ \overbrace{\sum y^2}^{TSS} = \overbrace{\hat b' X'X \hat b - N\bar Y^2}^{ESS} + \overbrace{\hat e'\hat e}^{RSS} \]
TSS: total sum of squares; ESS: explained sum of squares; RSS: residual sum of squares.