Filippo Ferroni (Business Conditions and Macroeconomic Forecasting Directorate, Banque de France). Course in Econometrics and Data Analysis, IESEG, September 22, 2011.
Multicollinearity

We have multicollinearity when two or more regressors are linear combinations of other regressors (perfect) or highly correlated (imperfect). The usual interpretation of a regression coefficient (as the average impact of one variable, ceteris paribus) no longer applies. Perfect multicollinearity violates one of the assumptions needed for the Gauss-Markov theorem to hold, but it is almost impossible in practice. Imperfect multicollinearity causes:

1. Estimates are typically unbiased, but not very precise (large standard errors around the OLS estimates, $s_k$).
2. Thus, we tend to underestimate the t-statistic. Recall that under $H_0 : \beta_k = 0$,
   $$t_s = \frac{b_k - \beta_{H_0}}{s_k} = \frac{b_k}{s_k},$$
   so a large $s_k$ drives $t_s$ toward zero.
3. As a consequence, we are more likely to fail to reject the null hypothesis (that the parameter is zero).
4. A further danger of such data redundancy is overfitting.
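To see these consequences concretely, here is a minimal simulation sketch (not from the course material; it assumes NumPy and statsmodels are available): two nearly collinear regressors both matter in the data-generating process, yet each coefficient comes out with a large standard error and a small individual t-statistic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# x2 is x1 plus a little noise, so the two regressors are almost collinear
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # corr(x1, x2) is close to 1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Estimates are roughly unbiased, but the standard errors on b1 and b2
# are large and the individual t-statistics small, so each coefficient
# can look insignificant despite a strong joint relationship with y.
print(fit.summary())
print("pairwise correlation:", np.corrcoef(x1, x2)[0, 1])
```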
Detecting

Compute the pairwise sample correlations among the regressors. If any are large, then we have issues.

Variance Inflation Factors (VIF). Suppose you want to detect multicollinearity in the following equation:
$$y = \alpha + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon$$
1. Run the following regression:
   $$x_1 = \gamma + \delta_2 x_2 + \dots + \delta_k x_k + \varepsilon$$
2. Compute
   $$VIF(b_1) = \frac{1}{1 - R_1^2},$$
   where $R_1^2$ is the $R^2$ of the regression in step 1.
3. Repeat steps 1 and 2 for all the regressors; if $VIF(b_j) > 5$ for some $j$, we have multicollinearity.
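The three steps above translate directly into code. A minimal sketch, assuming NumPy and statsmodels; the helper name `compute_vifs` is mine, not from the course:

```python
import numpy as np
import statsmodels.api as sm

def compute_vifs(X):
    """VIF for each column of X (an n-by-k regressor matrix, no constant)."""
    _, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        # Step 1: regress x_j on all the other regressors (plus a constant)
        others = sm.add_constant(np.delete(X, j, axis=1))
        r2 = sm.OLS(X[:, j], others).fit().rsquared
        # Step 2: VIF(b_j) = 1 / (1 - R_j^2)
        vifs[j] = 1.0 / (1.0 - r2)
    # Step 3: one VIF per regressor; values above 5 signal multicollinearity
    return vifs

# Example with two nearly collinear regressors and one independent one
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=100), rng.normal(size=100)])
print(compute_vifs(X))   # first two VIFs far above 5, third near 1
```

statsmodels also ships a ready-made `variance_inflation_factor` in `statsmodels.stats.outliers_influence` that computes the same quantity.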
Remedies

- Do nothing. If the estimates in your original specification are significant, then multicollinearity is not a problem worth worrying about.
- Drop a redundant variable (the one with the largest VIF or the largest pairwise correlation), as sketched below.
- Add more data if possible.

The best regression models are those where the regressors correlate highly with the dependent (outcome) variable but correlate at most only minimally with each other.
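The "drop a redundant variable" remedy can be automated by repeatedly removing the regressor with the largest VIF until all VIFs fall below the threshold. A minimal sketch, reusing the hypothetical `compute_vifs` helper from the previous slide:

```python
import numpy as np

def drop_high_vif(X, names, threshold=5.0):
    """Iteratively drop the regressor whose VIF is largest and above threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        vifs = compute_vifs(X)            # helper defined on the previous slide
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break                         # all remaining VIFs are acceptable
        print(f"dropping {names[worst]} (VIF = {vifs[worst]:.1f})")
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names
```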