
Matematické Metody v Ekonometrii 7.
Multicollinearity
Blanka Šedivá, KMA, winter semester 2016/2017

One of the assumptions of the classical and normal regression models is that the columns of X are linearly independent (i.e. X is a matrix of full rank). So-called multicollinearity is a high dependency among the columns of X: the matrix $X^T X$ is then almost singular, and consequently it is problematic to find its inverse. Multicollinearity can be caused by adding polynomial terms or other regressors derived from already existing regressors. Other causes might be including too many variables in the model, some of which measure the same conceptual variable, or a wrong data collection procedure.
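For instance (a minimal numpy sketch, our own illustration, not from the slides): a regressor measured on a narrow range is nearly collinear with its own square, which shows up as a huge condition number of $X^T X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(10, 11, n)          # narrow range makes x and x**2 nearly collinear

X1 = np.column_stack([np.ones(n), x])          # intercept + x
X2 = np.column_stack([np.ones(n), x, x**2])    # add a polynomial term

# Condition number of X^T X; a large value signals near-singularity.
print(np.linalg.cond(X1.T @ X1))
print(np.linalg.cond(X2.T @ X2))
```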

Gauss–Markov theorem for normally distributed disturbances

Model $Y \sim N(X\beta, \sigma^2 I)$:

(i) $b \sim N\left(\beta, \sigma^2 (X^T X)^{-1}\right)$;
(ii) $\mathrm{RSE}/\sigma^2 = s^2 (n-p)/\sigma^2 \sim \chi^2$ with $\nu = n-p$ degrees of freedom (df);
(iii) $b$ and $\mathrm{RSE}$ are independent;
(iv) $E(a^T b) = a^T \beta$, where $a = (a_0, a_1, \ldots, a_k)^T \neq 0$;
(v) $\mathrm{Var}(a^T b) = \sigma^2\, a^T (X^T X)^{-1} a$;
(vi) $T = \dfrac{a^T b - a^T \beta}{\sqrt{s^2\, a^T (X^T X)^{-1} a}}$ has a t-distribution with $\nu = n-p$ df.
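Properties (i) and (vi) can be checked empirically. Below is a minimal Monte Carlo sketch (our own illustration, assuming numpy and scipy are available): it simulates the model repeatedly and compares the empirical distribution of T with the t-distribution with $n-p$ df.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta, sigma = np.array([1.0, 2.0, -0.5]), 1.5
a = np.array([0.0, 1.0, 0.0])        # a^T b picks out the second coefficient

XtX_inv = np.linalg.inv(X.T @ X)
T = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - p)
    T.append((a @ b - a @ beta) / np.sqrt(s2 * a @ XtX_inv @ a))

# Compare the empirical distribution of T with t_{n-p}, e.g. via a KS test.
print(stats.kstest(T, stats.t(df=n - p).cdf))
```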

The mean squared error (MSE), or mean squared deviation (MSD), of an estimator

The MSE of an estimator $\hat\theta$ with respect to an unknown parameter $\theta$ is defined as
$$\mathrm{MSE}(\hat\theta) = E\bigl(\hat\theta - \theta\bigr)^2 = \mathrm{Var}(\hat\theta) + \bigl(E\hat\theta - \theta\bigr)^2 = \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta, \theta)^2.$$
For the model $Y \sim (X\beta, \sigma^2 I)$ we have
$$E\bigl(\hat Y^T \hat Y\bigr) = \beta^T X^T X \beta + \sigma^2\, \mathrm{rank}(X),$$
and if X has linearly independent columns,
$$E\bigl(b^T b\bigr) = \beta^T \beta + \sigma^2\, \mathrm{tr}\left((X^T X)^{-1}\right).$$
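The decomposition follows by adding and subtracting $E\hat\theta$ inside the square; the cross term vanishes. A short derivation:

```latex
\begin{align*}
\operatorname{MSE}(\hat\theta)
  &= \mathbb{E}\bigl(\hat\theta - \theta\bigr)^2
   = \mathbb{E}\bigl[(\hat\theta - \mathbb{E}\hat\theta)
                   + (\mathbb{E}\hat\theta - \theta)\bigr]^2 \\
  &= \mathbb{E}(\hat\theta - \mathbb{E}\hat\theta)^2
   + 2\,(\mathbb{E}\hat\theta - \theta)\,
       \underbrace{\mathbb{E}(\hat\theta - \mathbb{E}\hat\theta)}_{=\,0}
   + (\mathbb{E}\hat\theta - \theta)^2 \\
  &= \operatorname{Var}(\hat\theta) + \operatorname{Bias}(\hat\theta,\theta)^2 .
\end{align*}
```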

Consequences of multicollinearity

- Small changes in X result in large changes in the estimates of β (i.e. the OLS procedure is ill-conditioned).
- The standard errors of the estimated coefficients tend to be large, so the coefficients often seem statistically insignificant despite a high value of $R^2$ and high significance of the model as a whole.
- An estimated coefficient can have the wrong sign or an unexpected value that does not correspond with the economic interpretation of the model.

Note: An important sign of multicollinearity is also the fact that increasing the number of observations neither reduces the standard errors of the estimates nor helps to eliminate the other problems mentioned above.

Detection of multicollinearity – pairwise correlation coefficients

The basic approach is based on the pairwise correlation coefficients between the columns of X: compute $\mu_{ij} = \mathrm{cor}(X_i, X_j)$, $i, j = 1, 2, \ldots, k$, for all pairs of columns. We can use several rules of thumb to test for multicollinearity; we say that multicollinearity is present in the model if:

- there exists $|\mu_{ij}| > 0.75$ (some literature suggests 0.8 or even 0.9);
- there exists $|\mu_{ij}| > R^2$, where $R^2$ is the coefficient of determination of the regression model.

This method is not very effective when the dependency is generated by three or more columns. A simple check is sketched below.
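A minimal Python sketch of this rule of thumb (our own illustration, assuming numpy; collinear_pairs is a hypothetical name):

```python
import numpy as np

def collinear_pairs(X, threshold=0.75):
    """Return (i, j, r) for column pairs of X with |correlation| above the threshold."""
    R = np.corrcoef(X, rowvar=False)          # columns of X are the regressors
    k = R.shape[0]
    return [(i, j, R[i, j])
            for i in range(k) for j in range(i + 1, k)
            if abs(R[i, j]) > threshold]
```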

Detection of multicollinearity – auxiliary regressions

Perform the auxiliary regression of column $X_j$ on the remaining columns,
$$X_j = X_1 \alpha_1 + \ldots + X_{j-1} \alpha_{j-1} + X_{j+1} \alpha_{j+1} + \ldots + X_k \alpha_k + \varepsilon_j,$$
and let $R_j^2$ be the coefficient of determination of this auxiliary regression. Again, we can use several rules of thumb; we say that multicollinearity is present in the model if:

- there exists $R_j^2 > R^2$;
- there exists $\mathrm{VIF}_j = \dfrac{1}{1 - R_j^2} > 10$; we call $\mathrm{VIF}_j$ the variance inflation factor of regressor j, and it quantifies the severity of the influence of multicollinearity on the standard error of the coefficient $b_j$; the $\mathrm{VIF}_j$ are the diagonal elements of the inverse of the correlation matrix, $\mathrm{diag}(\mathrm{cor}(X)^{-1}) = \mathrm{VIF}$;
- the test statistic $F_j = \dfrac{R_j^2}{1 - R_j^2} \cdot \dfrac{n - p}{p - 1} \sim F_{\nu_1, \nu_2}$, $\nu_1 = p - 1$, $\nu_2 = n - p$, exceeds its critical value.

A one-line computation of the VIFs is sketched below.
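Since the $\mathrm{VIF}_j$ are the diagonal of the inverse correlation matrix, they can be computed directly; a sketch (our own illustration, assuming numpy; X is taken without the intercept column):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), read off as diag(cor(X)^{-1}); X without the intercept."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))
    # values above 10 indicate severe multicollinearity by the rule of thumb above
```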

Solutions to multicollinearity

We have several options when we detect multicollinearity:

(i) normalise or transform the columns of X so that the multicollinearity is eliminated;
(ii) select a submodel in which the regressors that cause the multicollinearity are omitted;
(iii) use transformed regressors that are linear combinations of the original regressors (so-called principal component regression, PCR);
(iv) use so-called ridge regression.

Ridge regression

Consider a linear regression model $Y \sim (X\beta, \sigma^2 I)$ with diagnosed multicollinearity. Because multicollinearity causes the matrix $X^T X$ to be ill-conditioned, we use the ridge regression estimator
$$b_\delta = (X^T X + \delta I)^{-1} X^T Y, \qquad \delta \geq 0.$$
The relation between $b = (X^T X)^{-1} X^T Y$ and $b_\delta$ can be expressed as
$$b_\delta = (X^T X + \delta I)^{-1} X^T X (X^T X)^{-1} X^T Y = (X^T X + \delta I)^{-1} (X^T X)\, b = \left[I + \delta (X^T X)^{-1}\right]^{-1} b.$$
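A direct implementation of the estimator (a sketch, assuming numpy; we use solve rather than an explicit inverse for numerical stability):

```python
import numpy as np

def ridge(X, y, delta):
    """Ridge estimator b_delta = (X'X + delta*I)^{-1} X'y; delta = 0 reduces to OLS."""
    return np.linalg.solve(X.T @ X + delta * np.eye(X.shape[1]), X.T @ y)
```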

Ridge regression – statistical properties of $b_\delta$

Obviously, the ridge regression estimator $b_\delta$ is biased, which is not a desired property. However, it can be shown that under certain conditions the ridge regression estimator is somewhat favourable: for
$$0 < \delta < \frac{2\sigma^2}{\|\beta\|^2}$$
it holds that $\mathrm{MSE}(b_\delta) \leq \mathrm{MSE}(b)$.

In practice, however, we do not know the real values of β and σ², so we are not able to determine such a δ directly. Hence the usual choice is
$$\delta_1 = \frac{k\, s^2}{b^T b} = \frac{k\, s^2}{\sum_{j=1}^{k} b_j^2},$$
or we can compute $b_\delta$ for values $\delta \in (0, \delta_{\max})$ and plot them to create the so-called ridge trace; the desired value of δ is where $b_\delta$ stabilises.
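The plug-in choice $\delta_1$ and a ridge-trace scan can be sketched as follows (our own illustration, assuming numpy; hkb_delta is a hypothetical name for the $\delta_1$ formula above):

```python
import numpy as np

def ridge(X, y, delta):
    return np.linalg.solve(X.T @ X + delta * np.eye(X.shape[1]), X.T @ y)

def hkb_delta(X, y):
    """Plug-in delta_1 = k * s^2 / (b'b), computed from the OLS fit (delta = 0)."""
    n, k = X.shape
    b = ridge(X, y, 0.0)
    e = y - X @ b
    return k * (e @ e / (n - k)) / (b @ b)

# Ridge trace: evaluate b_delta on a grid and plot each coefficient against delta;
# choose delta in the region where the coefficients stabilise.
# deltas = np.linspace(0.0, delta_max, 100)
# trace  = np.array([ridge(X, y, d) for d in deltas])
```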

Choice of the submodel – overspecification and underspecification of the model

Overspecification (including irrelevant variables in the regression model): the true model is $y = X\beta + \varepsilon$, $\mathrm{rank}(X) = k$, and the falsely selected model is $y = X\beta + X_2\beta_2 + \varepsilon_1$, $\mathrm{rank}([X\ X_2]) = k' > k$. The estimates b are unbiased, but the variances of these OLS estimators are higher; there is also a risk of multicollinearity.

Underspecification (misspecification by omitting relevant variables): the true model is $y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$, $\mathrm{rank}(X) = k$, and the falsely selected model is $y = X_1\beta_1 + \varepsilon_1$, $\mathrm{rank}(X_1) = k_1 < k$. The estimates $b_1$ are biased:
$$E(b_1) = \beta_1 + (X_1^T X_1)^{-1} X_1^T X_2 \beta_2.$$
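The bias formula for $E(b_1)$ can be verified by simulation; a minimal sketch (our own construction, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.8 * X1[:, 1] + rng.normal(scale=0.6, size=n)).reshape(-1, 1)  # correlated with X1
beta1, beta2 = np.array([1.0, 2.0]), np.array([1.5])

# Theoretical bias of b1 when X2 is omitted: (X1'X1)^{-1} X1'X2 beta2
bias = np.linalg.solve(X1.T @ X1, X1.T @ X2) @ beta2
print("theoretical bias:", bias)

# Monte Carlo check of E(b1) - beta1
b1s = []
for _ in range(2000):
    y = X1 @ beta1 + X2 @ beta2 + rng.normal(size=n)
    b1s.append(np.linalg.solve(X1.T @ X1, X1.T @ y))
print("empirical bias:  ", np.mean(b1s, axis=0) - beta1)
```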

Model and submodel

Consider the relationship between the model $Y \sim N(X\beta, \sigma^2 I)$, $\mathrm{rank}(X) = k$, and a submodel with $\mathrm{rank}(X_0) = k_0$. Denote the quantities estimated from the submodel by $b_0$, $s_0^2$, $e_0 = \hat y_0 - y$ and $\mathrm{RSE}_0 = e_0^T e_0$. Then it holds that
$$F_0 = \frac{(\mathrm{RSE}_0 - \mathrm{RSE})/(k - k_0)}{\mathrm{RSE}/(n - k)} \sim F_{\nu_1, \nu_2}, \qquad \nu_1 = k - k_0,\ \nu_2 = n - k.$$
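A sketch of this test (our own illustration, assuming numpy and scipy, with the submodel columns X0 nested in the full-model columns X; submodel_F_test is a hypothetical name):

```python
import numpy as np
from scipy import stats

def submodel_F_test(X, X0, y):
    """F test of a submodel (columns X0) against the full model (columns X)."""
    n, k = X.shape
    k0 = X0.shape[1]
    rse  = y @ y - y @ X  @ np.linalg.solve(X.T  @ X,  X.T  @ y)   # e'e, full model
    rse0 = y @ y - y @ X0 @ np.linalg.solve(X0.T @ X0, X0.T @ y)   # e0'e0, submodel
    F = ((rse0 - rse) / (k - k0)) / (rse / (n - k))
    p_value = stats.f.sf(F, k - k0, n - k)
    return F, p_value
```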

Choice of the model – criteria

There are many different criteria derived for the choice of the model:

- minimization of the residual sum of squares $\mathrm{RSE} = e^T e$;
- maximization of the coefficient of determination $R^2 = 1 - \dfrac{e^T e}{y^T y}$;
- maximization of the adjusted coefficient of determination $R^2_{\mathrm{adj}} = 1 - \dfrac{(e^T e)/(n-k)}{(y^T y)/(n-1)}$;
- minimization of the residual variance $s^2 = \dfrac{e^T e}{n-k}$;
- minimization of Mallows' $C_k$, $C_k = \dfrac{\mathrm{RSS}_0}{s^2} + 2 k_0 - n$;
- minimization of an information criterion such as
  $\mathrm{AIC} = \ln s^2 + \dfrac{2k}{n}$ (Akaike),
  $A = s^2\left(1 + k\, n^{-1/4}\right)$ (Anděl et al.),
  $\mathrm{SR} = \ln s^2 + \dfrac{k \ln n}{n}$ (Schwarz, Rissanen),
  $\mathrm{HQ} = \ln s^2 + \dfrac{2 c\, k \ln(\ln n)}{n}$, $c = 2$ or $3$ (Hannan, Quinn).

Several of these criteria are computed in the sketch below.
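A sketch computing some of these criteria for one candidate model (our own illustration, assuming numpy; note it follows the slide's uncentred $y^T y$ convention for $R^2_{\mathrm{adj}}$):

```python
import numpy as np

def selection_criteria(X, y):
    """RSE, adjusted R^2, and per-observation AIC / Schwarz-Rissanen scores."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    rse = e @ e
    s2 = rse / (n - k)
    r2_adj = 1.0 - s2 / (y @ y / (n - 1))
    aic = np.log(s2) + 2 * k / n
    sr  = np.log(s2) + k * np.log(n) / n
    return {"RSE": rse, "R2_adj": r2_adj, "AIC": aic, "SR": sr}
```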

Stepwise regression

- Forward selection involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until no addition improves the model (see the sketch below).
- Backward elimination involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) whose removal improves the model the most, and repeating this process until no further improvement is possible.
- Bidirectional elimination is a combination of the above, testing at each step for variables to be included or excluded.
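A minimal sketch of forward selection with a "lower is better" criterion (our own illustration, assuming numpy; forward_selection is a hypothetical name):

```python
import numpy as np

def forward_selection(X, y, criterion):
    """Greedy forward selection: add the column that most improves the criterion
    (lower is better); stop when no addition improves it."""
    n, k = X.shape
    selected, best = [], np.inf
    improved = True
    while improved:
        improved = False
        for j in (c for c in range(k) if c not in selected):
            Xs = X[:, selected + [j]]
            b = np.linalg.lstsq(Xs, y, rcond=None)[0]
            score = criterion(y - Xs @ b, Xs.shape[1], n)
            if score < best:
                best, best_j, improved = score, j, True
        if improved:
            selected.append(best_j)
    return selected

# Example criterion matching the AIC form above:
# aic = lambda e, k, n: np.log(e @ e / (n - k)) + 2 * k / n
# forward_selection(X, y, aic)
```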
