Violation of OLS assumption - Multicollinearity


What, why and so what?
Lars Forsberg
Uppsala University, Department of Statistics
October 17, 2014

Econometrics - Objectives and exam

Violations of assumptions - multicollinearity:

1. Explain what multicollinearity is.
2. Formulate perfect multicollinearity using formulae.
3. Give an empirical example of when two regressors could be highly correlated.

4. Tell the consequences of multicollinearity (for the expectation and variance of the OLS estimators).
5. Explain how one can detect multicollinearity.
6. Do a t-test of a slope coefficient in the presence of high (but not perfect) multicollinearity and interpret the result.

Questions to ask ourselves

1. How to spell it?
2. What is multicollinearity?
3. Is it a problem? In what situations?

4. Detection: how do I know if there is a multicollinearity problem?
5. Why: how does multicollinearity come about?
6. Remedy: what can we do about it?

How to spell it

Multicollinearity.

Multicollinearity - What?

Perfect multicollinearity: there exist constants $\lambda_1, \dots, \lambda_k$, not all zero, such that $\lambda_1 X_1 + \dots + \lambda_k X_k = 0$ exactly. It is also a problem when, for some small random $\nu$,

$$\lambda_1 X_1 + \dots + \lambda_k X_k + \nu = 0.$$

In practice, it is not a question of IF we have multicollinearity, but of the degree of multicollinearity.

Multicollinearity - What?

How, then, do we measure the degree of multicollinearity, and when does it become a problem? Consequences: what kind of problem(s) does it cause? Remedy: what can we do about it?

Multicollinearity - What?

Perfect multicollinearity: assume that $X_3 = \alpha X_2$ (so $X_3$ is just a scaled version of $X_2$), and that we want to estimate the parameters of

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u \quad (1)$$

Multicollinearity - What?

Substitute $X_3 = \alpha X_2$ into (1):

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 (\alpha X_2) + u$$

giving

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 \alpha X_2 + u$$

Multicollinearity - What?

$$Y = \beta_1 + (\beta_2 + \beta_3 \alpha) X_2 + u$$

$$Y = \beta_1 + \gamma X_2 + u$$

Multicollinearity - What?

We note that $\gamma = \beta_2 + \beta_3 \alpha$.

1. $\beta_2$, $\beta_3$ and $\alpha$ "stick" together.
2. We cannot separate them.
3. Not only are $\beta_2$ and $\beta_3$ not identified - OLS breaks down entirely.
4. If we try to estimate model (1), we will not get any numbers out of Eviews.
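To see the breakdown numerically, here is a minimal numpy sketch (illustrative, not from the slides): when $X_3$ is an exact multiple of $X_2$, the design matrix loses a rank, so the normal equations $X'X \hat\beta = X'Y$ have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 50, 2.0

x2 = rng.normal(size=n)
x3 = alpha * x2                          # X3 is an exact multiple of X2
X = np.column_stack([np.ones(n), x2, x3])

print(np.linalg.matrix_rank(X))          # 2, not 3: columns are dependent
try:
    np.linalg.inv(X.T @ X)               # the normal equations cannot be solved
except np.linalg.LinAlgError as err:
    print("OLS breaks down:", err)       # "Singular matrix"
```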

Multicollinearity - What?

So, it is a matter of degree. If the collinearity is not exact, the correct variance of the estimator $\hat\beta_j$ is

$$V(\hat\beta_j) = \frac{\sigma^2}{\sum_i (X_{ij} - \bar{X}_j)^2} \cdot \frac{1}{1 - R_j^2},$$

$R_j^2$ being the $R^2$ of the regression of $X_j$ on the other regressors.

Multicollinearity - What?

For instance, $R_2^2$ would be the $R^2$ from the regression

$$X_2 = \alpha_1 + \alpha_2 X_3 + u.$$

If this $R^2$ is high, it means that $X_3$ can explain a lot of the variation in $X_2$, and that is not a good thing. In the "best of regressions" the regressors should be independent (orthogonal), or at least uncorrelated: different regressors should explain "different parts" of the variation in the dependent variable.

Multicollinearity - Consequences

What happens when we have a (high degree of) multicollinearity?

1. The OLS estimators are still unbiased.
2. The variances are large, but the estimates are still BLUE (still the best we can use).
3. Confidence intervals are too wide (a consequence of the too-large variances, i.e. too-large standard errors).

4. t statistics are too small (see above), leading to "non-rejection"/acceptance of $H_0: \beta_j = 0$.
5. But $R^2$ is high (the model explains the variation in $Y$, although some of the $X$'s explain the same thing).
6. The estimates are sensitive to small changes in the data.

Multicollinearity - Why?

What could be the reason? (If we knew, we could correct for it.)

1. Natural constraints on the model/data (e.g. number of rooms in a flat and square meters).
2. Model specification, such as a polynomial:
$$Y_i = \beta_0 X_i^0 + \beta_1 X_i^1 + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i,$$
that is,
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i.$$
3. Too many variables in the model.
4. Common trends in time series (two variables trending together).

Multicollinearity - Detection

How do we know if we have multicollinearity?

1. Using the VIF (Variance Inflation Factor); see the formula for the variance:
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$
2. Insignificant t ratios $\Rightarrow$ the model "is NO good",
3. but a "high" $R^2$,
4. and the test of the whole model is significant $\Rightarrow$ the model "is good".

(A small numeric sketch of the VIF computation follows below.)
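As an illustration of what the VIF measures (the helper function and the made-up data below are mine, not the lecture's): regress each regressor on the others, take that regression's $R^2$, and form $1/(1 - R_j^2)$.

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the remaining columns plus an intercept."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
x3 = x2 + 0.1 * rng.normal(size=100)   # nearly collinear with x2
X = np.column_stack([x2, x3])
print(vif(X, 0))                        # large, on the order of 100
```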

Multicollinearity - Consequences

If the variances of the slope estimators are too big, then what? In terms of t-ratios:

$\hat\sigma_{\hat\beta_j}$ too BIG
$\Downarrow$
$t = \hat\beta_j / \hat\sigma_{\hat\beta_j}$ too SMALL
$\Downarrow$
never reject $H_0: \beta_j = 0$
$\Downarrow$
never significance
$\Downarrow$
we think the model is "worse" than it actually is.

Multicollinearity - Consequences

But the F-test of the model will still be significant. Analysing gives:

Analysis                                          Result            Interpretation
t-tests of the parameters (different from zero)   Not significant   => model is "NO GOOD"
F-test of the model (at least one beta_j != 0)    Significant       => model is "OK"

Multicollinearity - Detection

How else do we know if we have multicollinearity?

1. Change one observation and see what happens (OLS is on the borderline of breaking down, so it should react).
2. A scatterplot of the $X$'s ($X_2$ vs $X_3$), to see if there is a strong correlation.
3. The correlation matrix of the regressors (why not the covariance matrix? see the sketch after this list).
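A minimal numpy sketch, with made-up data, of why the correlation matrix rather than the covariance matrix is the thing to inspect: correlations are scale-free, while covariances depend on the units of measurement.

```python
import numpy as np

rng = np.random.default_rng(3)
x2 = rng.normal(size=100)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=100)
X = np.column_stack([x2, 1000.0 * x3])   # x3 rescaled, e.g. different units

# Correlation is scale-free: the near-collinearity (~0.99) is obvious.
print(np.corrcoef(X, rowvar=False))
# Covariance mostly reflects the arbitrary units chosen for x3.
print(np.cov(X, rowvar=False))
```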

Multicollinearity - Remedy

OK, we have (a high degree of) multicollinearity; what should/can we do?

1. Nothing (fine if we only need point predictions; it is the standard errors that are messed up).
2. Add data, or use another dataset.
3. Drop variable(s).
4. Transform the data, e.g. logs or differences, which will "destroy" the linear dependency (see the sketch after this list).
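A small illustration, with made-up trending series, of why differencing helps: in levels the common trend dominates the correlation, but first differences remove it.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(100.0)
x1 = t + rng.normal(scale=5.0, size=100)   # two series sharing a trend
x2 = t + rng.normal(scale=5.0, size=100)

print(np.corrcoef(x1, x2)[0, 1])                      # near 1: trend dominates
print(np.corrcoef(np.diff(x1), np.diff(x2))[0, 1])    # near 0: trend removed
```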

Multicollinearity - Example

Example (Table 8.8): a model for the number of employed, yearly data. Variables:

Y    number of employed
X1   GNP price deflator
X2   GNP
X3   number of unemployed
X4   number in armed forces
X5   noninstitutionalized population
X6   year

Multicollinearity - Example

The original data. [Data table shown on the slide; not reproduced in the transcription.]

Multicollinearity - Example

We estimate the model. [Regression output shown on the slide; not reproduced.] What do we note?

Multicollinearity - Example

Take a look at the correlation matrix. [Correlation matrix shown on the slide; not reproduced.]

Multicollinearity - Example

Run the "auxiliary" (helper) regression: $X_1$ on the other $X$'s (note that the dependent variable is now $X_1$). [Output shown on the slide; not reproduced.]
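The variable list above matches the classic Longley employment data, which statsmodels happens to ship. Assuming that is the dataset behind the slides, the auxiliary regression can be sketched as follows (the column names are statsmodels' own, not the lecture's).

```python
import statsmodels.api as sm

data = sm.datasets.longley.load_pandas().data

# Auxiliary regression: X1 (GNP deflator) on the other regressors.
exog = sm.add_constant(data[["GNP", "UNEMP", "ARMED", "POP", "YEAR"]])
aux = sm.OLS(data["GNPDEFL"], exog).fit()

r2 = aux.rsquared
print(r2, 1.0 / (1.0 - r2))   # very high R_1^2, hence a VIF in the hundreds
```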

Multicollinearity - Example

Given the above output, we can calculate the Variance Inflation Factor (VIF):

$$\mathrm{VIF}_1 = \frac{1}{1 - R_1^2} = \frac{1}{1 - 0.992622} = 135.54$$

This is the "inflation" of the variance of $\hat\beta_1$ caused by $X_1$ being correlated with the other variables. Recall:

$$V(\hat\beta_j) = \frac{\sigma^2}{\sum_i (X_{ij} - \bar{X}_j)^2} \cdot \frac{1}{1 - R_j^2}$$

Multicollinearity - Example

For illustration, we can calculate what the variance would be without multicollinearity (we do have it, so do not try this at home):

$$\sigma^2_{\hat\beta_j} = \frac{\sigma^2}{\sum_i (X_{ij} - \bar{X}_j)^2} \cdot \frac{1}{1 - R_j^2}$$

Without multicollinearity (the case $R_j^2 = 0$):

$$\sigma^2_{\hat\beta_j} = \frac{\sigma^2}{\sum_i (X_{ij} - \bar{X}_j)^2} \cdot \frac{1}{1 - 0} = \frac{\sigma^2}{\sum_i (X_{ij} - \bar{X}_j)^2}$$

Multicollinearity - Example

The SE without multicollinearity:

$$\sigma_{\hat\beta_j,\text{without M}} = \sqrt{\frac{\sigma^2}{\sum_i (X_{ij} - \bar{X}_j)^2}} = \sqrt{V_{\text{with M}}(\hat\beta_j) \cdot (1 - R_j^2)} = \frac{\sigma_{\hat\beta_j,\text{with M}}}{\sqrt{\mathrm{VIF}_j}}$$

Multicollinearity - Example

$$\sigma_{\hat\beta_j,\text{without M}} = \frac{\sigma_{\hat\beta_j,\text{with M}}}{\sqrt{\mathrm{VIF}_j}} = \frac{8.491493}{\sqrt{135.54}} = 0.729$$

Multicollinearity - Example

For $X_1$ we have $\sigma_{\hat\beta_1,\text{with M}} = 8.491493$. Recall that $R_1^2 = 0.992622$, so

$$\mathrm{VIF}_1 = \frac{1}{1 - 0.992622} = 135.54.$$

Multicollinearity - Example

Comparison:

Measure                      With M.      Without M.
$\sqrt{V(\hat\beta_1)}$      8.491        0.729
$t_{obs}$                    0.177        2.066
$H_0: \beta_1 = 0$           Not reject   Reject

Multicollinearity - Example

Change one observation, namely the first observation in $X_1$:

X1  being the original data
X11 being the manipulated data

Multicollinearity - Example

The regression results with the original data vs. the regression results with the manipulated data [both outputs shown on the slide; not reproduced]: there is a big difference in the estimates of $\beta_1$. Thus, we have a problem.
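A sketch of this perturbation check on the same (assumed) Longley data; the slides do not show how X11 was constructed, so the 10% change to the first observation of X1 below is an arbitrary choice for illustration.

```python
import statsmodels.api as sm

data = sm.datasets.longley.load_pandas().data
y = data["TOTEMP"]
X = sm.add_constant(data[["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP", "YEAR"]])

b_orig = sm.OLS(y, X).fit().params["GNPDEFL"]

X11 = X.copy()
X11.iloc[0, X11.columns.get_loc("GNPDEFL")] *= 1.10   # bump first obs by 10%
b_pert = sm.OLS(y, X11).fit().params["GNPDEFL"]

print(b_orig, b_pert)   # under near-collinearity the estimate moves a lot
```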