Multicollinearity

The term multicollinearity is due to Ragnar Frisch (1934). Originally it meant the existence of a "perfect," or exact, linear relationship among some or all of the explanatory variables of a regression model. For the K variables x_1, x_2, ..., x_K (where x_1 = 1 for every observation, to allow for the intercept term), an exact linear relationship exists if

    \lambda_1 x_{1i} + \lambda_2 x_{2i} + \cdots + \lambda_K x_{Ki} = 0

for constants \lambda_1, \lambda_2, \ldots, \lambda_K that are not all zero simultaneously. Today the term is also used in the broader sense of imperfect collinearity, where the x variables are intercorrelated but not perfectly so:

    \lambda_1 x_{1i} + \lambda_2 x_{2i} + \cdots + \lambda_K x_{Ki} + v_i = 0,

where v_i is a stochastic error term.

As an example, consider the model

    y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + u_i,

in which the regressors share a common time trend -- a typical source of multicollinearity. Collect the regressors in the matrix

    X = [\, x_2 \;\; x_3 \;\; \cdots \;\; x_K \,],        (3.30)

whose columns we now center and scale.
Center and scale each regressor:

    x^*_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j}, \qquad i = 1, \ldots, n; \; j = 2, \ldots, K, \qquad S_j^2 = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2.

After centering and scaling, X^{*\prime} X^* is a correlation matrix:

    X^{*\prime} X^* =
    \begin{pmatrix}
    1      & r_{23} & \cdots & r_{2K} \\
    r_{23} & 1      & \cdots & r_{3K} \\
    \vdots & \vdots & \ddots & \vdots \\
    r_{2K} & r_{3K} & \cdots & 1
    \end{pmatrix}.

In terms of the raw X, the model becomes

    y = \iota \beta_1 + X^* \beta^* + u,        (3.31)

where \beta^* = (\beta_2^*, \ldots, \beta_K^*)', X^* is the n \times (K-1) matrix of centered and scaled regressor variables, and X^{*\prime} X^* is the (K-1) \times (K-1) correlation matrix.

What is multicollinearity?

Multicollinearity refers to near linear dependences among the columns x^*_2, \ldots, x^*_K of X^*: there exist constants c_2, \ldots, c_K, not all zero, such that

    \sum_{j=2}^{K} c_j x^*_j \approx 0.        (3.32)

If the relation holds exactly, the collinearity is exact and X^{*\prime} X^* is singular.
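The centering-and-scaling claim above can be checked numerically. The following sketch (simulated data and variable names are invented for illustration, not part of the text) verifies that after centering each column and dividing by the square root of its centered sum of squares, X*'X* is exactly the correlation matrix of the raw regressors:

```python
import numpy as np

# Sketch: after centering and scaling, X*'X* is the correlation matrix R.
rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(scale=0.5, size=n)  # deliberately intercorrelated
X = np.column_stack([x2, x3])

# x*_ij = (x_ij - xbar_j) / S_j  with  S_j^2 = sum_i (x_ij - xbar_j)^2
Xc = X - X.mean(axis=0)
Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))

R = Xs.T @ Xs
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True: off-diagonals are r_jk
print(np.allclose(np.diag(R), 1.0))                  # True: unit diagonal
```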
Use of eigenvalues and eigenvectors to explain multicollinearity

Let V = [\, v_2 \;\; v_3 \;\; \cdots \;\; v_K \,] be the orthogonal matrix whose columns are the normalized eigenvectors of X^{*\prime} X^*, and let \Lambda = \mathrm{diag}(\lambda_2, \lambda_3, \ldots, \lambda_K) hold the corresponding eigenvalues, so that

    V' (X^{*\prime} X^*) V = \Lambda.        (3.33)

If some eigenvalue \lambda_l is (near) zero, then for the corresponding eigenvector v_l

    v_l' (X^{*\prime} X^*) v_l = \lambda_l \approx 0, \quad \text{i.e.} \quad \| X^* v_l \|^2 \approx 0, \quad \text{so} \quad \sum_j v_{jl} \, x^*_j \approx 0.

Each small eigenvalue thus identifies one near linear dependence among the regressors, with weights given by the elements of its eigenvector.

OLS under perfect multicollinearity. The OLS estimator and its covariance matrix are

    b = (X'X)^{-1} X'y, \qquad \mathrm{var}(b) = \sigma^2 (X'X)^{-1},

neither of which exists when X'X is singular. To see the breakdown, consider the two-regressor model. In deviation form,

    b_2 = \frac{\sum (y_i - \bar{y})(x_{2i} - \bar{x}_2) \sum (x_{3i} - \bar{x}_3)^2 - \sum (y_i - \bar{y})(x_{3i} - \bar{x}_3) \sum (x_{2i} - \bar{x}_2)(x_{3i} - \bar{x}_3)}{\sum (x_{2i} - \bar{x}_2)^2 \sum (x_{3i} - \bar{x}_3)^2 - \left( \sum (x_{2i} - \bar{x}_2)(x_{3i} - \bar{x}_3) \right)^2}

and

    \mathrm{var}(b_2) = \frac{\sigma^2 \sum (x_{3i} - \bar{x}_3)^2}{\sum (x_{2i} - \bar{x}_2)^2 \sum (x_{3i} - \bar{x}_3)^2 - \left( \sum (x_{2i} - \bar{x}_2)(x_{3i} - \bar{x}_3) \right)^2} = \frac{\sigma^2}{\sum (x_{2i} - \bar{x}_2)^2 \, (1 - r_{23}^2)}.

If x_{3i} = \lambda x_{2i} exactly, then r_{23}^2 = 1 and both the numerator and the denominator of b_2 vanish: b_2 = 0/0 is indeterminate.
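The eigenvalue argument above can be illustrated numerically (a sketch with made-up data, not from the text): when one regressor nearly duplicates another, the smallest eigenvalue of the scaled X*'X* is close to zero, and its eigenvector supplies the weights of the near dependence, since ||X* v||^2 = v' X*'X* v = \lambda.

```python
import numpy as np

# Sketch: a near-zero eigenvalue of X*'X* flags a near linear dependence;
# the matching eigenvector v gives weights with sum_l v_l x*_l ≈ 0.
rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x3 = 2.0 * x2 + rng.normal(scale=0.01, size=n)  # x3 ≈ 2*x2
X = np.column_stack([x2, x3])
Xc = X - X.mean(axis=0)
Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))

lam, V = np.linalg.eigh(Xs.T @ Xs)  # eigenvalues in ascending order
lam_min, v = lam[0], V[:, 0]
print(lam_min)                      # tiny: near-exact collinearity
print(np.linalg.norm(Xs @ v) ** 2)  # equals lam_min (= v' X*'X* v)
```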
By the same argument b_3 = 0/0 as well. Substituting x_{3i} = \lambda x_{2i} into the model gives, in deviation form,

    y_i - \bar{y} = b_2 (x_{2i} - \bar{x}_2) + b_3 \lambda (x_{2i} - \bar{x}_2) + e_i = a (x_{2i} - \bar{x}_2) + e_i, \qquad a = b_2 + \lambda b_3,

and OLS delivers the unique estimate

    \hat{a} = \frac{\sum (y_i - \bar{y})(x_{2i} - \bar{x}_2)}{\sum (x_{2i} - \bar{x}_2)^2}.

Thus only the linear combination \beta_2 + \lambda \beta_3 is an estimable function; \beta_2 and \beta_3 cannot be estimated separately.

High but imperfect multicollinearity. When the collinearity is near rather than exact, X'X remains invertible and the OLS estimator remains BLUE. In case of near or high multicollinearity, one is likely to encounter the following consequences:

1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.
2. Due to consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the "zero null hypothesis" more readily.
3. Also due to consequence 1, the t ratio of one or more coefficients tends to be statistically insignificant (i.e., the t values become smaller).
4. Although the t ratios of one or more coefficients are statistically insignificant, R^2, the overall measure of goodness of fit, can be very high. Indeed, this is one of the signals of multicollinearity: insignificant t values but a high overall R^2 (and a significant F value)!
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.

4 Detection

1. High R^2 but few significant t ratios. The overall F test rejects the hypothesis that all slope coefficients are zero, yet the individual t ratios are insignificant.
2. High pair-wise correlations among regressors.
3. Examination of partial correlations (Farrar and Glauber). With regressors x_2, x_3, x_4, a high R^2_{y \cdot x_2 x_3 x_4} is compared with the partial correlations r^2_{12 \cdot 34}, r^2_{13 \cdot 24}, r^2_{14 \cdot 23} to see which regressors are responsible.
4. Auxiliary regressions. Regress each x_j on the remaining regressors and compute the corresponding R^2_{x_j}. Under the null hypothesis of no (exact or approximate) linear relation among the regressors,

    F_j = \frac{R^2_{x_j \cdot x_2 \cdots x_K} / (K - 2)}{\left( 1 - R^2_{x_j \cdot x_2 \cdots x_K} \right) / (n - K + 1)}

follows the F distribution with K - 2 and n - K + 1 degrees of freedom. If the computed F_j exceeds the critical value, x_j is judged collinear with the other regressors. A simpler criterion is Klein's rule of thumb: multicollinearity is troublesome only if an auxiliary R^2 exceeds the overall R^2 obtained from regressing y on all the regressors.
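The auxiliary-regression diagnostic can be sketched as follows (the `r_squared` helper and the simulated data are invented for illustration): each regressor is regressed on the others and the resulting auxiliary R^2 is inspected, in the spirit of Klein's rule.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (intercept included)."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(3)
n = 80
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.1, size=n)  # nearly duplicates x2
x4 = rng.normal(size=n)                  # unrelated regressor
X = np.column_stack([x2, x3, x4])

# Auxiliary R^2_j: regress each x_j on the remaining regressors
aux_r2 = [r_squared(X[:, j], np.delete(X, j, axis=1)) for j in range(X.shape[1])]
print(aux_r2)  # high for x2 and x3, low for x4
```

A very high auxiliary R^2 (here for x2 and x3) flags exactly the regressors involved in the near dependence, while the unrelated x4 is left alone.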
5. Eigenvalues and condition index. The SAS package uses eigenvalues and the condition index to diagnose multicollinearity. From these eigenvalues we can derive what is known as the condition number k, defined as

    k = \frac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}},

and the condition index (CI), defined as

    \mathrm{CI} = \sqrt{\frac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}} = \sqrt{k}.

Then we have this rule of thumb: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if the CI is between 10 and 30 there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.

6. Tolerance and variance inflation factor. In the two-regressor model,

    \mathrm{var}(b_2) = \frac{\sigma^2}{\sum (x_{2i} - \bar{x}_2)^2 \, (1 - r_{23}^2)},        (7.4.12)

    \mathrm{var}(b_3) = \frac{\sigma^2}{\sum (x_{3i} - \bar{x}_3)^2 \, (1 - r_{23}^2)},        (7.4.15)

    \mathrm{cov}(b_2, b_3) = \frac{-r_{23} \, \sigma^2}{(1 - r_{23}^2) \sqrt{\sum (x_{2i} - \bar{x}_2)^2} \sqrt{\sum (x_{3i} - \bar{x}_3)^2}}.        (7.4.17)

The speed with which these variances and the covariance increase as r_{23}^2 \to 1 is captured by the variance-inflating factor,

    \mathrm{VIF} = \frac{1}{1 - r_{23}^2}.

More generally (Myers),

    \mathrm{var}(b_j) = \frac{\sigma^2}{\sum (x_{ji} - \bar{x}_j)^2} \cdot \mathrm{VIF}_j = \frac{\sigma^2}{\sum (x_{ji} - \bar{x}_j)^2} \cdot \frac{1}{1 - R_j^2},        (7.5.6)

where b_j is the partial regression coefficient of x_j and R_j^2 is the R^2 in the regression of x_j on the remaining regressors.
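The general VIF_j = 1/(1 - R_j^2) can be computed directly from the data. A minimal sketch (the `vif` helper and the simulated data are assumptions for illustration, not the text's own code):

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R_j^2), with R_j^2 from regressing x_j on the others."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ b
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.1, size=n)  # collinear pair
x4 = rng.normal(size=n)                  # independent regressor
X = np.column_stack([x2, x3, x4])

v = vif(X)
print(v)  # far above 10 for x2 and x3, near 1 for x4
```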
The inverse of the VIF is called tolerance (TOL):

    \mathrm{TOL}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}.

Some authors use the VIF as an indicator of multicollinearity. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if R_j^2 exceeds 0.9, that variable is said to be highly collinear.

5 Remedies

Rule-of-thumb procedures:

1. A priori information.
2. Combining cross-sectional and time-series data (pooling the data).
3. Dropping a variable(s) and specification bias. But in dropping a variable from the model we may be committing a specification bias, or specification error, which arises from incorrect specification of the model used in the analysis. Suppose the true model is

    y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon

but we mistakenly fit the model

    y = b_1 + b_2 x_2 + e.
Then it can be shown that

    E(b_2) = \beta_2 + \beta_3 b_{32},

where b_{32} is the slope coefficient in the regression of x_3 on x_2. It is obvious that b_2 will be a biased estimator of \beta_2 as long as b_{32} differs from zero. If b_{32} does not approach zero as the sample size is increased indefinitely, then b_2 will be not only biased but also inconsistent. Of course, if b_{32} is zero, we have no multicollinearity problem to begin with.

4. Transformation of variables. Given the model

    y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,

dividing through by x_3 gives the ratio transformation

    \frac{y}{x_3} = \beta_1 \frac{1}{x_3} + \beta_2 \frac{x_2}{x_3} + \beta_3 + \frac{\varepsilon}{x_3}.

But the first-difference or ratio transformations are not without problems: the first-difference error v_t = \varepsilon_t - \varepsilon_{t-1} is serially correlated even if \varepsilon_t is not, and the ratio-transformed error \varepsilon / x_3 is heteroscedastic even if \varepsilon is homoscedastic.

5. Additional or new data.

6. Other methods of remedying multicollinearity. Multivariate statistical techniques such as factor analysis and principal components, or techniques such as stepwise regression and ridge regression, are often employed to solve the problem of multicollinearity.
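As one concrete illustration of the last remedy, here is a minimal ridge-regression sketch (the `ridge` helper, the simulated data, and the penalty value are assumptions for illustration, not the text's own implementation): ridge adds k·I to X'X, which keeps the inverse well-conditioned under near collinearity at the cost of some bias.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator b(k) = (Xc'Xc + k I)^(-1) Xc'yc on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + k * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(6)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)  # near-collinear regressors
y = 1.0 + 2.0 * x2 + 2.0 * x3 + rng.normal(size=n)
X = np.column_stack([x2, x3])

print(ridge(X, y, 0.0))   # k = 0 reproduces OLS: erratic under collinearity
print(ridge(X, y, 10.0))  # k > 0 shrinks the coefficient vector
```

Increasing k always shrinks the length of the coefficient vector; the practical difficulty, not pursued here, is choosing k.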