Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)


I. Definitions and Problems

A. Perfect Multicollinearity

Definition: Perfect multicollinearity exists in the following K-variable regression

    Y_i = β_1 + β_2 X_2i + β_3 X_3i + ... + β_K X_Ki + u_i

if:

    λ_2 X_2i + λ_3 X_3i + ... + λ_K X_Ki = 0

where the 'lambdas' are a set of parameters (not all equal to zero). This must be true for all observations i. Alternatively, we could write any independent variable as an exact linear function of the others:

    X_2i = -(λ_3/λ_2) X_3i - ... - (λ_K/λ_2) X_Ki

This says essentially that X_2 is redundant: just a linear combination of the other regressors.

Problems with perfect multicollinearity:

(1) Coefficients can't be estimated. For example, in the 2-variable regression:

    β̂_2 = Σ x_i y_i / Σ x_i²

Suppose X_i = λ for all i (X is an exact multiple of the constant term). As a result x_i = X_i - X̄ = 0 for all i, and hence the denominator is 0. Thus, the estimated slope coefficient is undefined. This result applies to MLR as well.

Intuition: Again, we want to estimate the partial coefficients. Each depends on the variation in one variable and its ability to explain the variation in the dependent variable that can't be explained by the other regressors. But we can't get variation in one without getting variation in the other, by definition.

(2) Standard errors can't be estimated. In the 3-variable regression model, the standard error on β̂_2 can be written:

    se(β̂_2) = σ̂ / √( Σ x_2i² (1 - r²_23) )

But perfect multicollinearity implies r_23 = 1 or r_23 = -1 (r²_23 = 1 in either case), and the denominator is zero. Thus, standard errors are undefined.

The solution to perfect multicollinearity is trivial: drop one or several of the regressors.

B. Imperfect Multicollinearity

Definition: Imperfect multicollinearity exists in a K-variable regression if:

    λ_2 X_2i + λ_3 X_3i + ... + λ_K X_Ki + v_i = 0

where v_i is a stochastic variable with mean zero and small variance. As Var(v_i) → 0, imperfect multicollinearity becomes perfect multicollinearity. Alternatively, we could write any particular independent variable as an 'almost' exact linear function of the others:

    X_2i = -(λ_3/λ_2) X_3i - ... - (λ_K/λ_2) X_Ki - v_i/λ_2

If you know K-1 of the variables, you don't know the Kth variable precisely.

What are the problems with imperfect multicollinearity? Coefficients can be estimated: OLS estimators are still unbiased and minimum variance (i.e., BLUE), because imperfect multicollinearity does not violate the classical assumptions. But standard errors 'blow up'. They increase with the degree of multicollinearity, which reduces the precision of our coefficient estimates.
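Both cases can be checked numerically. The following is a minimal sketch with simulated data (the variable names and numbers are illustrative, not from the lecture): under an exact linear dependence among the regressors, X'X is rank-deficient and cannot be inverted, so the coefficients are undefined; under an almost-exact dependence it can be inverted, but the implied variance of β̂_2 grows as Var(v) shrinks.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x2 = rng.normal(size=n)

# Perfect multicollinearity: X3 is an exact linear function of X2.
X_perfect = np.column_stack([np.ones(n), x2, 2.0 * x2])
rank = np.linalg.matrix_rank(X_perfect.T @ X_perfect)
print(rank)  # 2, not 3 -- (X'X)^(-1) does not exist, coefficients undefined

# Imperfect multicollinearity: X3 = 2*X2 + v, with Var(v) shrinking toward zero.
variances = []
for sd_v in (1.0, 0.1, 0.01):
    x3 = 2.0 * x2 + rng.normal(scale=sd_v, size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    # Var(beta_2_hat) is sigma^2 times the [1, 1] element of (X'X)^(-1); take sigma^2 = 1.
    variances.append(np.linalg.inv(X.T @ X)[1, 1])
print(variances)  # each entry larger than the last: se 'blows up' as Var(v) -> 0
```

The point of the loop is that nothing breaks algebraically under imperfect multicollinearity; the estimator simply becomes very imprecise.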

For example, recall:

    se(β̂_2) = σ̂ / √( Σ x_2i² (1 - r²_23) )

As r²_23 → 1, the standard error → ∞.

Numerical example: Suppose the standard error is 1 when r_23 = 0.

    If r_23 = .10, then the standard error = 1.01.
    If r_23 = .25, then the standard error = 1.03.
    If r_23 = .50, then the standard error = 1.15.
    If r_23 = .75, then the standard error = 1.51.
    If r_23 = .90, then the standard error = 2.29.
    If r_23 = .99, then the standard error = 7.09.

[Figure: se(β̂) plotted against r_23; the curve rises slowly at first, then steeply as r_23 approaches 1.]

The standard error increases at an increasing rate with the multicollinearity between the explanatory variables. The result is wider confidence intervals and insignificant t-ratios on our coefficient estimates (e.g., you'll have more difficulty rejecting the null that a slope coefficient is equal to zero).
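The entries in the numerical example follow directly from the formula: relative to the r_23 = 0 baseline, the standard error is inflated by the factor 1/√(1 - r²_23). A quick check:

```python
import math

# Inflation of se(beta_2_hat) relative to the r_23 = 0 baseline: 1/sqrt(1 - r^2).
table = {r: round(1.0 / math.sqrt(1.0 - r**2), 2)
         for r in (0.10, 0.25, 0.50, 0.75, 0.90, 0.99)}
print(table)
# {0.1: 1.01, 0.25: 1.03, 0.5: 1.15, 0.75: 1.51, 0.9: 2.29, 0.99: 7.09}
```

Note how little happens below r_23 = .5 and how quickly the factor explodes beyond .9.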

This problem is closely related to the problem of a small sample size. In both cases, standard errors blow up. With a small sample size, the denominator is reduced by the lack of variation in the explanatory variable.

II. Methods of Detection

Three general indicators or diagnostic tests.

1. t-Ratios vs. R². Look for a high R² but few significant t-ratios. A common 'rule of thumb': we can't reject the null hypotheses that the coefficients are individually equal to zero (t-tests), but we can reject the null hypothesis that they are simultaneously equal to zero (F-test). This is not an 'exact test'. What do we mean by 'few' significant t-tests, and a 'high' R²? Too imprecise. It also depends on other factors, like sample size.

2. Correlation Matrix of Regressors. Look for high pair-wise correlation coefficients in the correlation matrix of the regressors. Multicollinearity refers to a linear relationship among all or some of the regressors: any pair of independent variables may not be highly correlated, and yet one variable may be a linear function of a number of others. In the 3-variable regression, multicollinearity is simply the correlation between the two explanatory variables. It is often said that a high pairwise correlation is a sufficient, but not a necessary, condition for multicollinearity. In other words, if you've got a high pairwise correlation, you've got problems; however, the absence of one isn't conclusive evidence of an absence of multicollinearity.

3. Auxiliary Regressions. Run a series of regressions to look for these linear relationships among the explanatory variables. Given the definition of multicollinearity above, regress one independent variable against the others and 'test' for this linear relationship.

For example, estimate the following:

    X_2i = α_1 + α_3 X_3i + ... + α_K X_Ki + e_i

where our hypothesis is that X_2 is a linear function of the other regressors. We test the null hypothesis that the slope coefficients in this auxiliary regression are simultaneously equal to zero:

    H_0: α_3 = α_4 = ... = α_K = 0

with the following F test:

    F = [ R² / (K - 2) ] / [ (1 - R²) / (n - K + 1) ]

where R² is the coefficient of determination with X_2 as the dependent variable, and K is the number of coefficients in the original regression. This is related to the high Variance Inflation Factors discussed in the textbook, where VIF = 1/(1 - R²); if VIF > 5, the multicollinearity is severe. But ours is a formal test.

Summary: There is no single test for multicollinearity.

III. Remedial Measures

Once we're convinced that multicollinearity is present, what can we do about it? Just as diagnosis of the ailment isn't clear cut, neither is the treatment. The appropriateness of the following remedial measures varies from one situation to another.

EXAMPLE: Estimating the labour supply of married women from 1950-1999:

    HRS_t = β_0 + β_1 W_W,t + β_2 W_M,t + e_t

where:

    HRS = average annual hours of work of married women
    W_W = average wage rate for married women
    W_M = average wage rate for married men
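Before estimating a model like the labour-supply equation above, the regressors can be screened with the auxiliary-regression/VIF diagnostic just described. A minimal sketch (simulated stand-in wage data, since the lecture's series isn't reproduced here):

```python
import numpy as np

def vif(X, j):
    """VIF for regressor j: regress column j on the other columns
    (plus a constant), then VIF = 1 / (1 - R^2)."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
n = 100
w_w = rng.normal(10.0, 2.0, size=n)        # hypothetical wives' wage series
w_m = w_w + rng.normal(0.0, 0.2, size=n)   # husbands' wage: nearly collinear

X = np.column_stack([w_w, w_m])
print(vif(X, 0))                            # far above the rule-of-thumb cutoff of 5

X_ind = np.column_stack([w_w, rng.normal(10.0, 2.0, size=n)])
print(vif(X_ind, 0))                        # close to 1: no collinearity problem
```

The F statistic in the notes and the VIF use the same auxiliary R²; the VIF just repackages it as an informal rule of thumb.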

Suppose we estimate the following (standard errors in parentheses):

    HRŜ_t = 733.7 + 48.37 W_W,t - .9 W_M,t        R² = .847
                   (34.97)      (9.)

Multicollinearity is a problem here. The first tipoff is that the t-ratios are less than 1.5 and 1 respectively (insignificant at 10% levels). Yet R² is .847. And it is easy to confirm multicollinearity in this case: the correlation between the mean wage rates is .99 over our sample period! The standard errors blow up, and we can't separate the two wage effects on the labour supply of married women.

Possible Solutions?

1. A Priori Information. If we know the relationship between the slope coefficients, we can substitute this restriction into the regression and eliminate the multicollinearity. This relies heavily on economic theory. For example, suppose that β_2 = -0.5 β_1 (we expect that β_1 > 0 and β_2 < 0). Then:

    HRS_t = β_0 + β_1 W_W,t + (-0.5 β_1) W_M,t + e_t
          = β_0 + β_1 (W_W,t - 0.5 W_M,t) + e_t
          = β_0 + β_1 W*_t + e_t

where we compute W*_t = W_W,t - 0.5 W_M,t. Suppose we re-estimate and find:

    HRŜ_t = 78. + 46.8 W*_t
                 (6.7)

Clearly, this has eliminated the multicollinearity by reducing this from a 3-variable to a 2-variable regression. Using our earlier assumption that β_2 = -0.5 β_1, we get the individual coefficient estimates:

    β̂_1 = 46.8        β̂_2 = -23.4

Unfortunately, such a priori information is extremely rare.
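The substitution can be checked by simulation. A minimal sketch with made-up data in which the restriction β_2 = -0.5 β_1 holds exactly (all numbers here are hypothetical, not the lecture's series):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
w_w = rng.normal(10.0, 2.0, size=n)
w_m = w_w + rng.normal(0.0, 0.2, size=n)      # nearly collinear with w_w

# True model obeys the restriction beta_2 = -0.5 * beta_1.
b1_true = 40.0
hrs = 1700.0 + b1_true * w_w - 0.5 * b1_true * w_m + rng.normal(0.0, 5.0, size=n)

# Substitute the restriction: regress HRS on W* = W_W - 0.5 * W_M only.
w_star = w_w - 0.5 * w_m
Z = np.column_stack([np.ones(n), w_star])
b0_hat, b1_hat = np.linalg.lstsq(Z, hrs, rcond=None)[0]
b2_hat = -0.5 * b1_hat                         # recovered from the restriction
print(b1_hat, b2_hat)                          # approximately 40 and -20
```

Because the restriction collapses the two collinear wage variables into the single regressor W*, the precision problem disappears; the cost is that the estimates are only as good as the assumed β_2 = -0.5 β_1.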

2. Dropping a Variable. Suppose we omit the wage of married men and estimate:

    HRS_t = α_0 + α_1 W_W,t + v_t

The problem is that we're introducing 'specification bias': we're substituting one problem for another, and the remedy may be worse than the disease. Recall the fact that the estimate of α_1 is likely to be a biased estimate of β_1:

    E(α̂_1) = β_1 + β_2 b_21

where the latter term, b_21, comes from the regression of the omitted variable on the included regressor. In fact, the bias is increased by the multicollinearity.

3. Transformation of the Variables. One of the simplest things to do with time series regressions is to run 'first differences'. Start with the original specification at time t:

    HRS_t = β_0 + β_1 W_W,t + β_2 W_M,t + e_t

The same linear relationship holds for the previous period as well:

    HRS_t-1 = β_0 + β_1 W_W,t-1 + β_2 W_M,t-1 + e_t-1

Subtract the second equation from the first:

    (HRS_t - HRS_t-1) = β_1 (W_W,t - W_W,t-1) + β_2 (W_M,t - W_M,t-1) + (e_t - e_t-1)

or

    ΔHRS_t = β_1 ΔW_W,t + β_2 ΔW_M,t + Δe_t

The advantage is that changes in wage rates may not be as highly correlated as their levels.
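Both the payoff and the cost of differencing show up in a small simulation (made-up trending wage series, not the lecture's data): two levels series that share a trend are almost perfectly correlated while their first differences are not, and differencing a well-behaved error term e_t induces serial correlation in Δe_t, whose first autocorrelation is about -0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)

# Two hypothetical wage series that share a common upward trend.
w_w = 5.0 + 0.10 * t + rng.normal(0.0, 0.5, size=100)
w_m = 6.0 + 0.11 * t + rng.normal(0.0, 0.5, size=100)

corr_levels = np.corrcoef(w_w, w_m)[0, 1]
corr_diffs = np.corrcoef(np.diff(w_w), np.diff(w_m))[0, 1]
print(corr_levels, corr_diffs)   # levels near 1; differences much lower

# Differencing a classical (serially uncorrelated) error term
# induces serial correlation in the differenced errors.
e = rng.normal(size=100_000)
de = np.diff(e)
rho1 = np.corrcoef(de[1:], de[:-1])[0, 1]
print(rho1)                      # close to -0.5
```

The -0.5 autocorrelation is the simulated counterpart of the Cov(Δe_t, Δe_t-1) = -σ² result discussed under the disadvantages of this transformation.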

The disadvantages are:

(i) The number of observations is reduced (i.e., the loss of a degree of freedom). The sample period is now 1951-1999.

(ii) It may lead to serial correlation:

    Cov(Δe_t, Δe_t-1) = Cov(e_t - e_t-1, e_t-1 - e_t-2) = -Var(e_t-1) = -σ² ≠ 0

Again, the cure may be worse than the disease: this violates one of the classical assumptions. More on serial correlation later.

4. New Data. Two possibilities here:

(i) Extend the time series. Multicollinearity is a 'sample phenomenon'. The wage rates may be correlated over the period 1950-1999; add more years, for example by going back to 1940, and the correlation may be reduced. The problem is that the data may not be available, or the relationship among the variables may have changed (i.e., the regression function isn't 'stable'). The more likely problem is that the data isn't there: if it was, why wasn't it included initially?

(ii) Change the Nature or Source of the Data. Switch from time-series to cross-sectional analysis by changing the 'unit of observation': use a random sample of households at a point in time. The degree of multicollinearity in wages may be relatively lower between spouses. Or combine data sources and use 'panel data': follow a random sample of households over a number of years.

5. 'Do Nothing' (A Remedy!). Multicollinearity is not a problem if the objective of the analysis is forecasting, since it doesn't affect the overall 'explanatory power' of the regression (i.e., R²). It is more of a problem if the objective is to test the significance of individual partial coefficients.

However, the estimated coefficients are unbiased; multicollinearity simply reduces the 'precision' of the estimates. Multicollinearity is often given too much emphasis in the list of common problems with regression analysis. If it's imperfect multicollinearity, which is almost always going to be the case, then it doesn't violate the classical assumptions. It is much more of a problem if the goal is to test the significance of individual coefficients, and less of a problem for forecasting and prediction.

IV. Questions for Discussion: Q8.

V. Computing Exercise: Example 8.5. (Johnson, Ch 8)