Multicollinearity. Filippo Ferroni. Course in Econometrics and Data Analysis, IESEG, September 22, 2011. Banque de France.

Filippo Ferroni (Business Condition and Macroeconomic Forecasting Directorate, Banque de France). Course in Econometrics and Data Analysis, IESEG, September 22, 2011.

We have multicollinearity when two or more regressors are linear combinations of other regressors (perfect multicollinearity) or highly correlated with them (imperfect multicollinearity). The usual interpretation of a regression coefficient (as the average impact of one variable, ceteris paribus) no longer applies.

Perfect multicollinearity violates one of the assumptions required for the Gauss-Markov theorem to hold; it is almost impossible to encounter in practice.

Imperfect multicollinearity causes:
1. Estimates are typically unbiased but not very precise (large standard errors $s_k$ around the OLS estimates).
2. Thus, we tend to underestimate the t-statistic. Recall that under $H_0: \beta_k = 0$, $t_s = \frac{b_k - \beta_{H_0}}{s_k} = \frac{b_k}{s_k}$.
3. As a consequence, we are more likely to accept (fail to reject) the null hypothesis that the parameter is zero.
4. A further danger of such data redundancy is overfitting.
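As a quick illustration of the loss of precision (not part of the original slides), a small simulation along the following lines can be used. It assumes NumPy and statsmodels are available; the sample size, coefficients, and correlation values are arbitrary choices.

```python
# Minimal sketch: imperfect multicollinearity inflates OLS standard errors.
# Illustrative only; sample size, coefficients, and correlations are arbitrary.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

def se_of_b1(corr):
    """Simulate y = 1 + 0.5 x1 + 0.5 x2 + u with corr(x1, x2) = corr and
    return the estimated standard error s_1 of the OLS coefficient on x1."""
    cov = np.array([[1.0, corr], [corr, 1.0]])
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = 1.0 + 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    return fit.bse[1]  # standard error of b_1

print("s_1, corr = 0.0 :", se_of_b1(0.0))
print("s_1, corr = 0.95:", se_of_b1(0.95))  # markedly larger, so t = b_1/s_1 shrinks
```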

Detecting multicollinearity.

Compute the pairwise sample correlations among the regressors. If they are large, there is a problem.

Variance inflation factors (VIF). Suppose you want to detect multicollinearity in the equation
$y = \alpha + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon$.
1. Run the following auxiliary regression: $x_1 = \gamma + \delta_2 x_2 + \dots + \delta_k x_k + \varepsilon$.
2. Compute $VIF(b_1) = \frac{1}{1 - R_1^2}$, where $R_1^2$ is the R-squared of the auxiliary regression.
3. Repeat steps 1 and 2 for all the regressors; if $VIF(b_j) > 5$, we have multicollinearity.
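The VIF steps above can be sketched in Python (this example is not in the original slides; it assumes NumPy and statsmodels are installed, and the data in the example are made up):

```python
# Sketch of the VIF steps above: regress each x_j on the others, take R_j^2,
# and report VIF(b_j) = 1 / (1 - R_j^2). Illustrative code, hypothetical data.
import numpy as np
import statsmodels.api as sm

def vif(X):
    """Return the variance inflation factor of each column of the regressor matrix X."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)                       # step 1: auxiliary regressors
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        vifs.append(1.0 / (1.0 - r2))                          # step 2: VIF(b_j)
    return vifs                                                # step 3: one VIF per regressor

# Example with a nearly redundant regressor, x3 close to x1 + x2:
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))  # VIFs far above 5 flag multicollinearity
```

statsmodels also ships a ready-made variance_inflation_factor in statsmodels.stats.outliers_influence that performs the same auxiliary-regression calculation.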

Remedies.

Do nothing. If the estimates in your original specification are significant, multicollinearity is not a practical concern.
Drop a redundant variable (the one with the largest VIF or the largest pairwise correlation).
Add more data, if possible.

The best regression models are those where the regressors correlate highly with the dependent (outcome) variable but correlate at most only minimally with each other.
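A minimal sketch of the "drop a redundant variable" remedy, reusing the vif() helper from the detection example above; the threshold of 5 follows the slide, while the function name and the iterative drop-the-worst rule are illustrative assumptions rather than a prescription from the course.

```python
# Sketch: iteratively drop the regressor with the largest VIF until all
# remaining VIFs are at or below the chosen threshold. Reuses vif() from above.
import numpy as np

def drop_most_collinear(X, names, threshold=5.0):
    """Return the reduced regressor matrix and the names of the kept columns."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        print(f"dropping {names[worst]} (VIF = {vifs[worst]:.1f})")
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# e.g. drop_most_collinear(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```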