LINEAR REGRESSION ANALYSIS
MODULE IX
Lecture - 31 Multicollinearity
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
6. Ridge regression

The OLSE is the best linear unbiased estimator of the regression coefficient in the sense that it has minimum variance in the class of linear and unbiased estimators. However, if the condition of unbiasedness is relaxed, then it is possible to find a biased estimator $\hat{\beta}$ of the regression coefficient $\beta$ that has smaller variance than the unbiased OLSE $b$. The mean squared error (MSE) of $\hat{\beta}$ is

$$
\begin{aligned}
MSE(\hat{\beta}) &= E(\hat{\beta} - \beta)^2 \\
&= E\left[\{\hat{\beta} - E(\hat{\beta})\} + \{E(\hat{\beta}) - \beta\}\right]^2 \\
&= Var(\hat{\beta}) + \left[E(\hat{\beta}) - \beta\right]^2 \\
&= Var(\hat{\beta}) + \left[Bias(\hat{\beta})\right]^2 .
\end{aligned}
$$

Thus $MSE(\hat{\beta})$ can be made smaller than $Var(\hat{\beta})$ by introducing a small bias in $\hat{\beta}$. One approach to do so is ridge regression. The ridge regression estimator is obtained by modifying the normal equations of least squares estimation as

$$
(X'X + \delta I)\,\hat{\beta}_{ridge} = X'y
\quad\Rightarrow\quad
\hat{\beta}_{ridge} = (X'X + \delta I)^{-1} X'y .
$$

Here $\hat{\beta}_{ridge}$ is the ridge regression estimator of $\beta$, and $\delta \geq 0$ is any characterizing scalar termed the biasing parameter.
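As a quick numerical sketch of the modified normal equations (NumPy, with a small synthetic dataset whose two columns are nearly collinear; the data, seed, and the helper name `ridge` are all illustrative, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix with two nearly collinear columns.
x1 = rng.normal(size=20)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=20)])
beta = np.array([1.0, 2.0])
y = X @ beta + 0.1 * rng.normal(size=20)

def ridge(X, y, delta):
    """Ridge estimator from the modified normal equations (X'X + delta*I) b = X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + delta * np.eye(k), X.T @ y)

b_ols = ridge(X, y, 0.0)      # delta = 0 recovers the OLSE
b_ridge = ridge(X, y, 1.0)    # delta > 0 shrinks the coefficient vector towards zero

assert np.linalg.norm(b_ridge) < np.linalg.norm(b_ols)
```

Solving the linear system directly (rather than inverting $X'X + \delta I$) is the numerically preferable route, especially when multicollinearity makes $X'X$ ill-conditioned.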
As $\delta \to 0$, $\hat{\beta}_{ridge} \to b$ (OLSE), and as $\delta \to \infty$, $\hat{\beta}_{ridge} \to 0$. So the larger the value of $\delta$, the larger the shrinkage towards zero. Note that the OLSE is inappropriate to use in the sense that it has very high variance when multicollinearity is present in the data. On the other hand, a very small value of $\hat{\beta}_{ridge}$ may tend to accept the null hypothesis $H_0: \beta = 0$, indicating that the corresponding variables are not relevant. The value of the biasing parameter $\delta$ controls the amount of shrinkage in the estimates.

Bias of ridge regression estimator

The bias of $\hat{\beta}_{ridge}$ is

$$
\begin{aligned}
Bias(\hat{\beta}_{ridge}) &= E(\hat{\beta}_{ridge}) - \beta \\
&= (X'X + \delta I)^{-1} X' E(y) - \beta \\
&= \left[(X'X + \delta I)^{-1} X'X - I\right]\beta \\
&= (X'X + \delta I)^{-1}\left[X'X - X'X - \delta I\right]\beta \\
&= -\delta\,(X'X + \delta I)^{-1}\beta .
\end{aligned}
$$

Thus the ridge regression estimator is a biased estimator of $\beta$.
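The closed form of the bias can be checked numerically. A minimal sketch (the design matrix, $\beta$, and $\delta$ below are arbitrary choices for illustration): since $E(y) = X\beta$, the expectation of the ridge estimator is $(X'X + \delta I)^{-1}X'X\beta$, and subtracting $\beta$ should reproduce $-\delta(X'X + \delta I)^{-1}\beta$ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
beta = np.array([2.0, -1.0, 0.5])
delta = 0.7

XtX = X.T @ X
A = np.linalg.inv(XtX + delta * np.eye(3))

# E(beta_hat_ridge) - beta, computed directly from E(y) = X beta.
bias_direct = A @ XtX @ beta - beta

# Closed form derived above: Bias = -delta * (X'X + delta I)^{-1} beta.
bias_formula = -delta * (A @ beta)

assert np.allclose(bias_direct, bias_formula)
```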
Covariance matrix

The covariance matrix of $\hat{\beta}_{ridge}$ is defined as

$$
V(\hat{\beta}_{ridge}) = E\left[\{\hat{\beta}_{ridge} - E(\hat{\beta}_{ridge})\}\{\hat{\beta}_{ridge} - E(\hat{\beta}_{ridge})\}'\right].
$$

Since

$$
\hat{\beta}_{ridge} - E(\hat{\beta}_{ridge}) = (X'X + \delta I)^{-1} X'y - (X'X + \delta I)^{-1} X'X\beta = (X'X + \delta I)^{-1} X'(y - X\beta) = (X'X + \delta I)^{-1} X'\varepsilon,
$$

so

$$
V(\hat{\beta}_{ridge}) = (X'X + \delta I)^{-1} X'\, V(\varepsilon)\, X (X'X + \delta I)^{-1} = \sigma^2 (X'X + \delta I)^{-1} X'X (X'X + \delta I)^{-1}.
$$
Mean squared error

Writing $X'X = \Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_k)$, the mean squared error of $\hat{\beta}_{ridge}$ is

$$
\begin{aligned}
MSE(\hat{\beta}_{ridge}) &= Var(\hat{\beta}_{ridge}) + \left[bias(\hat{\beta}_{ridge})\right]^2 \\
&= tr\left[V(\hat{\beta}_{ridge})\right] + \left[bias(\hat{\beta}_{ridge})\right]^2 \\
&= \sigma^2\, tr\left[(X'X + \delta I)^{-1} X'X (X'X + \delta I)^{-1}\right] + \delta^2\, \beta'(X'X + \delta I)^{-2}\beta \\
&= \sigma^2 \sum_{j=1}^{k} \frac{\lambda_j}{(\lambda_j + \delta)^2} + \delta^2 \sum_{j=1}^{k} \frac{\beta_j^2}{(\lambda_j + \delta)^2}
\end{aligned}
$$

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the eigenvalues of $X'X$. Thus as $\delta$ increases, the bias in $\hat{\beta}_{ridge}$ increases but its variance decreases. The trade-off between bias and variance hinges upon the value of $\delta$. It can be shown that there exists a value of $\delta$ such that $MSE(\hat{\beta}_{ridge}) < Var(b)$ provided $\beta'\beta$ is bounded.

Choice of $\delta$

The ridge regression estimator depends upon the value of $\delta$. Various approaches have been suggested in the literature to determine the value of $\delta$. The value of $\delta$ can be chosen on the basis of criteria like
- stability of the estimators with respect to $\delta$,
- reasonable signs of the estimated coefficients,
- magnitude of the residual sum of squares, etc.

We consider here the determination of $\delta$ by inspection of the ridge trace.
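The two MSE components derived above can be verified numerically. A sketch with arbitrary synthetic quantities: for a general (non-diagonal) $X'X$ with spectral decomposition $X'X = P\Lambda P'$, the role of $\beta_j$ in the eigenvalue form is played by $\alpha = P'\beta$, the coefficient vector expressed in the eigenbasis; when $X'X$ is already diagonal, $\alpha = \beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
beta = rng.normal(size=4)
sigma2, delta = 1.5, 0.8

XtX = X.T @ X
lam, P = np.linalg.eigh(XtX)          # eigenvalues and orthonormal eigenvectors of X'X
A = np.linalg.inv(XtX + delta * np.eye(4))

# Matrix forms of the variance and squared-bias components.
var_term = sigma2 * np.trace(A @ XtX @ A)
bias_term = delta**2 * (beta @ A @ A @ beta)

# Eigenvalue forms, with alpha = P' beta.
alpha = P.T @ beta
var_eig = sigma2 * np.sum(lam / (lam + delta) ** 2)
bias_eig = delta**2 * np.sum(alpha**2 / (lam + delta) ** 2)

assert np.isclose(var_term, var_eig) and np.isclose(bias_term, bias_eig)
```

Evaluating these sums over a grid of $\delta$ makes the variance-bias trade-off visible directly: the first sum is decreasing in $\delta$ while the second is increasing from zero.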
Ridge trace

The ridge trace is the graphical display of the ridge regression estimator versus $\delta$. If multicollinearity is present and is severe, then the instability of the regression coefficients is reflected in the ridge trace. As $\delta$ increases, some of the ridge estimates vary dramatically, and they stabilize at some value of $\delta$. The objective in the ridge trace is to inspect the trace (curve) and find the reasonably small value of $\delta$ at which the ridge regression estimators are stable. The ridge regression estimator with such a choice of $\delta$ will have smaller MSE than the variance of the OLSE.

An example of a ridge trace for a model with 6 parameters is as follows. In this trace, $\hat{\beta}_{ridge}$ is evaluated for various choices of $\delta$, and the corresponding values of all the ridge regression coefficients $\hat{\beta}_j(\delta)$, $j = 1, 2, \ldots, 6$, are plotted versus $\delta$. These values are denoted by different symbols and are joined by a smooth curve. This produces a ridge trace for the respective parameter. Now choose the value of $\delta$ where all the curves stabilize and become nearly parallel. For example, the curves in the following figure become nearly parallel starting from $\delta = 4$ or so. Thus one possible choice of $\delta$ is $\delta = 4$, and the parameters can be estimated as

$$
\hat{\beta}_{ridge} = (X'X + 4I)^{-1} X'y .
$$
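The numbers behind a ridge trace can be computed with a few lines; plotting each column of the result against $\delta$ gives the curves described above. The dataset below is an invented example with three nearly identical columns, chosen only to make the instability near $\delta = 0$ visible:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: columns 0-2 are nearly collinear, column 3 is independent.
z = rng.normal(size=50)
X = np.column_stack([z,
                     z + 0.01 * rng.normal(size=50),
                     z - 0.01 * rng.normal(size=50),
                     rng.normal(size=50)])
y = X @ np.array([1.0, 1.0, 1.0, -2.0]) + 0.5 * rng.normal(size=50)

deltas = np.linspace(0.0, 10.0, 101)
# trace[i, j] is the ridge estimate of coefficient j at deltas[i].
trace = np.array([np.linalg.solve(X.T @ X + d * np.eye(4), X.T @ y)
                  for d in deltas])

# Near delta = 0 the collinear coefficients typically swing wildly;
# as delta grows, every component shrinks and the curves flatten out.
assert np.linalg.norm(trace[0]) > np.linalg.norm(trace[-1])
```

The final assertion reflects the shrinkage property: the norm of $\hat{\beta}_{ridge}(\delta)$ is strictly decreasing in $\delta$.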
The figure drastically exposes the presence of multicollinearity in the data. The behaviour of $\hat{\beta}_{ridge}(\delta)$ near $\delta = 0$ is very different than at other values of $\delta$. For small values of $\delta$, the estimates change rapidly; the estimates stabilize gradually as $\delta$ increases. The value of $\delta$ at which all the estimates stabilize gives the desired value of $\delta$, because moving away from such $\delta$ will not bring any appreciable reduction in the residual sum of squares. If multicollinearity is present, then the variation in the ridge regression estimators is rapid around $\delta = 0$. The optimal $\delta$ is chosen such that beyond that value of $\delta$, almost all traces stabilize.
Limitations

1. The choice of $\delta$ is data dependent and therefore is a random variable. Using it as a random variable violates the assumption that $\delta$ is a constant. This will disturb the optimal properties derived under the assumption of constancy of $\delta$.

2. The value of $\delta$ lies in the interval $(0, \infty)$. So a large number of values may be required for exploration. This results in wasting of time. However, this is not a big issue when working with software.

3. The choice of $\delta$ from a graphical display may not be unique. Different people may choose different $\delta$, and consequently the values of the ridge regression estimators will differ. However, $\delta$ is chosen so that the estimators of all the coefficients stabilize. Hence a small variation in choosing the value of $\delta$ may not produce much change in the estimators of the coefficients. Another choice of $\delta$ is

$$\delta = \frac{k\hat{\sigma}^2}{b'b}$$

where $b$ and $\hat{\sigma}^2$ are obtained from the least squares estimation.

4. The stability of numerical estimates of the $\hat{\beta}_j$'s is a rough way to determine $\delta$. Different estimates may exhibit stability for different $\delta$'s, and it may often be hard to strike a compromise. In such a situation, generalized ridge regression estimators are used.

5. There is no guidance available regarding the testing of hypotheses and for confidence interval estimation.
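The data-driven choice $\delta = k\hat{\sigma}^2 / (b'b)$ from limitation 3 is straightforward to compute from the least squares fit. A sketch on invented data (the degrees-of-freedom correction $n - k$ for $\hat{\sigma}^2$ assumes a model without an intercept, matching the notation here):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 60, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.5, -1.5]) + rng.normal(size=n)

# Least squares quantities.
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2_hat = resid @ resid / (n - k)      # unbiased estimator of sigma^2

# delta = k * sigma2_hat / (b'b), then plug into the ridge formula.
delta = k * sigma2_hat / (b @ b)
beta_ridge = np.linalg.solve(X.T @ X + delta * np.eye(k), X.T @ y)
```

Since $\hat{\sigma}^2 > 0$ and $b'b > 0$, this always yields a strictly positive $\delta$, and hence genuine shrinkage relative to $b$.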
Idea behind ridge regression estimator

The problem of multicollinearity arises because some of the eigenvalues (characteristic roots) of $X'X$ are close to zero or are zero. So if $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the characteristic roots, and if

$$X'X = \Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_k),$$

then

$$\hat{\beta}_{ridge} = (\Lambda + \delta I)^{-1} X'y = (I + \delta\Lambda^{-1})^{-1} b$$

where $b$ is the OLSE of $\beta$ given by

$$b = (X'X)^{-1} X'y = \Lambda^{-1} X'y .$$

Thus a particular element in $\hat{\beta}_{ridge}$ will be of the form

$$\frac{x_j' y}{\lambda_j + \delta} = \frac{\lambda_j}{\lambda_j + \delta}\, b_j .$$

So a small quantity $\delta$ is added to $\lambda_j$ so that even if $\lambda_j = 0$, the quantity $\frac{1}{\lambda_j + \delta}$ remains meaningful.
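The component-wise shrinkage factor $\lambda_j / (\lambda_j + \delta)$ can be seen exactly by constructing a design matrix whose $X'X$ is diagonal, as in the derivation above. The eigenvalues and $\delta$ below are arbitrary illustrative choices, with one eigenvalue deliberately near zero:

```python
import numpy as np

rng = np.random.default_rng(5)
# Build X with orthogonal columns so that X'X = diag(lam).
Q, _ = np.linalg.qr(rng.normal(size=(30, 3)))   # orthonormal columns
lam = np.array([9.0, 1.0, 0.01])                # eigenvalues, one near zero
X = Q * np.sqrt(lam)                            # scale columns: X'X = diag(lam)
y = rng.normal(size=30)
delta = 0.5

b = (X.T @ y) / lam                             # OLSE, component-wise since X'X is diagonal
beta_ridge = np.linalg.solve(X.T @ X + delta * np.eye(3), X.T @ y)

# Each ridge component is the OLS component shrunk by lam_j / (lam_j + delta).
assert np.allclose(beta_ridge, lam / (lam + delta) * b)
```

The near-zero eigenvalue direction is where the OLSE blows up ($b_j = x_j'y/\lambda_j$), and also where the shrinkage factor damps it the hardest.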
Another interpretation of ridge regression estimator

In the model $y = X\beta + \varepsilon$, obtain the least squares estimator of $\beta$ subject to the constraint $\beta'\beta = \sum_{j=1}^{k}\beta_j^2 = C$, where $C$ is some constant. So minimize

$$S(\beta) = (y - X\beta)'(y - X\beta) + \delta(\beta'\beta - C)$$

where $\delta$ is the Lagrangian multiplier. Differentiating $S(\beta)$ with respect to $\beta$, the normal equations are obtained as

$$\frac{\partial S(\beta)}{\partial \beta} = 0 \;\Rightarrow\; -2X'y + 2X'X\beta + 2\delta\beta = 0 \;\Rightarrow\; \hat{\beta}_{ridge} = (X'X + \delta I)^{-1} X'y .$$

Note that if $C$ is very small, it may indicate that most of the regression coefficients are close to zero, and if $C$ is large, then it may indicate that the regression coefficients are away from zero. So the constraint puts a sort of penalty on the regression coefficients to enable their estimation.
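This penalized-least-squares interpretation can be checked directly: at the ridge solution the gradient of $S(\beta)$ vanishes, and since $X'X + \delta I$ is positive definite, $S$ is strictly convex, so any perturbation increases it. A minimal sketch on invented data (the constant $C$ drops out of the gradient, so it is omitted):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(25, 3))
y = rng.normal(size=25)
delta = 1.2

def S(beta):
    """Penalized sum of squares: (y - X beta)'(y - X beta) + delta * beta'beta."""
    r = y - X @ beta
    return r @ r + delta * (beta @ beta)

beta_ridge = np.linalg.solve(X.T @ X + delta * np.eye(3), X.T @ y)

# Gradient of S at the ridge solution: -2 X'y + 2 X'X beta + 2 delta beta = 0.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta_ridge + 2 * delta * beta_ridge
assert np.allclose(grad, 0)

# Strict convexity: any perturbed point has a larger penalized objective.
assert S(beta_ridge) <= S(beta_ridge + 0.01 * rng.normal(size=3))
```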