Filippo Ferroni (Business Conditions and Macroeconomic Forecasting Directorate, Banque de France). Course in Econometrics and Data Analysis, IESEG, September 22, 2011.
Multicollinearity

We have multicollinearity when two or more regressors are linear combinations of other regressors (perfect) or highly correlated (imperfect). The usual interpretation of a regression coefficient (as the average impact of one variable, ceteris paribus) no longer applies. Perfect multicollinearity violates one of the assumptions needed for the Gauss-Markov theorem to hold, but it is almost impossible in practice. Imperfect multicollinearity causes:

1. Estimates are typically unbiased, but not very precise (large standard errors around the OLS estimates, $s_k$).
2. Thus, we tend to underestimate the t-statistic. Recall that under $H_0 : \beta_k = 0$,
   $$t_s = \frac{b_k - \beta_{H_0}}{s_k} = \frac{b_k}{s_k},$$
   so a large $s_k$ drives $t_s$ toward zero.
3. As a consequence, we are more likely to fail to reject the null hypothesis (that the parameter is zero).
4. A further danger of such data redundancy is overfitting.
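To see these consequences concretely, here is a minimal simulation sketch (not from the course material; it assumes NumPy and statsmodels are available): two nearly collinear regressors both matter in the data-generating process, yet each coefficient comes out with a large standard error and a small individual t-statistic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# x2 is x1 plus a little noise, so the two regressors are almost collinear
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # corr(x1, x2) is close to 1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Estimates are roughly unbiased, but the standard errors on b1 and b2
# are large and the individual t-statistics small, so each coefficient
# can look insignificant despite a strong joint relationship with y.
print(fit.summary())
print("pairwise correlation:", np.corrcoef(x1, x2)[0, 1])
```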
Detecting

Compute the pairwise sample correlations among the regressors. If any are large, then we have issues.

Variance Inflation Factors (VIF). Suppose you want to detect multicollinearity in the following equation:
$$y = \alpha + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon$$
1. Run the following regression:
   $$x_1 = \gamma + \delta_2 x_2 + \dots + \delta_k x_k + \varepsilon$$
2. Compute
   $$VIF(b_1) = \frac{1}{1 - R_1^2},$$
   where $R_1^2$ is the $R^2$ of the regression in step 1.
3. Repeat steps 1 and 2 for all the regressors; if $VIF(b_j) > 5$ for some $j$, we have multicollinearity.
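The three steps above translate directly into code. A minimal sketch, assuming NumPy and statsmodels; the helper name `compute_vifs` is mine, not from the course:

```python
import numpy as np
import statsmodels.api as sm

def compute_vifs(X):
    """VIF for each column of X (an n-by-k regressor matrix, no constant)."""
    _, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        # Step 1: regress x_j on all the other regressors (plus a constant)
        others = sm.add_constant(np.delete(X, j, axis=1))
        r2 = sm.OLS(X[:, j], others).fit().rsquared
        # Step 2: VIF(b_j) = 1 / (1 - R_j^2)
        vifs[j] = 1.0 / (1.0 - r2)
    # Step 3: one VIF per regressor; values above 5 signal multicollinearity
    return vifs

# Example with two nearly collinear regressors and one independent one
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=100), rng.normal(size=100)])
print(compute_vifs(X))   # first two VIFs far above 5, third near 1
```

statsmodels also ships a ready-made `variance_inflation_factor` in `statsmodels.stats.outliers_influence` that computes the same quantity.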
Remedies

- Do nothing. If the estimates in your original specification are significant, then multicollinearity is not a problem worth worrying about.
- Drop a redundant variable (the one with the largest VIF or the largest pairwise correlation), as sketched below.
- Add more data if possible.

The best regression models are those where the regressors correlate highly with the dependent (outcome) variable but correlate at most only minimally with each other.
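The "drop a redundant variable" remedy can be automated by repeatedly removing the regressor with the largest VIF until all VIFs fall below the threshold. A minimal sketch, reusing the hypothetical `compute_vifs` helper from the previous slide:

```python
import numpy as np

def drop_high_vif(X, names, threshold=5.0):
    """Iteratively drop the regressor whose VIF is largest and above threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        vifs = compute_vifs(X)            # helper defined on the previous slide
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break                         # all remaining VIFs are acceptable
        print(f"dropping {names[worst]} (VIF = {vifs[worst]:.1f})")
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names
```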