Diagnostics of Linear Regression


Junhui Qian. October 27, 2014

The Objectives

After estimating a model, we should always perform diagnostics on it. In particular, we should check whether the assumptions we made are valid. For OLS estimation, we should usually check:
- Is the relationship between x and y linear?
- Are the residuals serially uncorrelated?
- Are the residuals uncorrelated with the explanatory variables (endogeneity)?
- Does homoscedasticity hold?

Residuals

The errors are unobservable, but they can be estimated by the residuals û_i = y_i − x_i′β̂. In matrix language, û = (I − P_X)Y. If β̂ is close to β, then û_i is close to u_i. Letting ŷ_i = x_i′β̂ denote the fitted value, the explained variable decomposes as y_i = ŷ_i + û_i.
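As a numerical sketch of this decomposition (not code from the slides; the toy data and names below are invented), the fitted values and residuals of a one-regressor OLS fit can be computed in closed form:

```python
# Hypothetical illustration: closed-form OLS for y_i = b0 + b1*x_i + u_i,
# then fitted values and residuals. Data and names are invented.

def ols_simple(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]             # yhat_i
    resid = [yi - fi for yi, fi in zip(y, fitted)]  # uhat_i = y_i - yhat_i
    return b0, b1, fitted, resid

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
b0, b1, yhat, uhat = ols_simple(x, y)
# The decomposition y_i = yhat_i + uhat_i holds exactly by construction.
assert all(abs(yi - (fi + ui)) < 1e-12 for yi, fi, ui in zip(y, yhat, uhat))
```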

Variations

SST (total sum of squares): SST = Σ_{i=1}^n (y_i − ȳ)² = Y′(I − P_ι)Y
SSE (explained sum of squares): SSE = Σ_{i=1}^n (ŷ_i − ȳ)² = Y′(P_X − P_ι)Y
SSR (sum of squared residuals): SSR = Σ_{i=1}^n û_i² = Y′(I − P_X)Y

We have SST = SSE + SSR.

Goodness of Fit

The R² of the regression: R² = SSE/SST = 1 − SSR/SST. R² is the fraction of the sample variation in y that is explained by x, and 0 ≤ R² ≤ 1. R² does NOT validate a model: a high R² only says that y is predictable from the information in x, which in social science is generally not the case. If additional regressors are added to a model, R² never decreases. The adjusted R², denoted R̄², is designed to penalize the number of regressors:
R̄² = 1 − [SSR/(n − 1 − k)] / [SST/(n − 1)].
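A sketch of these quantities on toy data (invented numbers and names, not from the slides), including a direct check that SST = SSE + SSR for OLS with an intercept:

```python
# Hypothetical illustration: SST = SSE + SSR and (adjusted) R-squared
# for a one-regressor OLS fit; data and names are invented.

def ols_fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
b0, b1 = ols_fit(x, y)
ybar = sum(y) / len(y)
yhat = [b0 + b1 * xi for xi in x]
uhat = [yi - fi for yi, fi in zip(y, yhat)]

sst = sum((yi - ybar) ** 2 for yi in y)     # total sum of squares
sse = sum((fi - ybar) ** 2 for fi in yhat)  # explained sum of squares
ssr = sum(ui ** 2 for ui in uhat)           # sum of squared residuals
assert abs(sst - (sse + ssr)) < 1e-9        # SST = SSE + SSR

n, k = len(y), 1                            # k regressors besides the constant
r2 = 1.0 - ssr / sst
r2_adj = 1.0 - (ssr / (n - 1 - k)) / (sst / (n - 1))
```

Adding any regressor can only lower SSR, so r2 never falls; r2_adj can fall because the penalty term n − 1 − k shrinks.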

Figure: scatter plots of Y against X illustrating low R² (left) and high R² (right).

Residual Plots

We can plot:
- Residuals
- Residuals versus fitted values
- Residuals versus explanatory variables

Any pattern in residual plots suggests nonlinearity or endogeneity.

Figure: Residual plots (regression line, residuals, residuals vs. regressor, residuals vs. fitted value). DGP: y = 0.2 + x + 0.5x² + u.

Partial Residual Plots

To see whether there is nonlinearity in a regressor, say the j-th explanatory variable x_j, we can plot û + β̂_j x_j versus x_j, where û is the residual from the full model. Partial residual plots may help us find the true (nonlinear) functional form of x_j.

Partial Residual Plots: Example

Suppose the true model is y = β₀ + β₁x + β₂z + g(z) + u, where g(z) is a nonlinear function. We mistakenly estimate y = β̂₀ + β̂₁x + β̂₂z + û. If we plot β̂₂z + û versus z, we will likely be able to detect the nonlinearity in g(z).
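A numerical sketch of this example (invented data; the noise u is set to zero for clarity, and g(z) = z² is an assumed form): fit the misspecified linear model by solving the normal equations, then form the partial residual β̂₂z + û and observe that it inherits the curvature of g(z):

```python
# Hypothetical illustration of a partial residual: the true DGP has a z^2
# term, but we fit y on (1, x, z) only. Data is invented; u = 0 for clarity.

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination (X'X beta = X'y)."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= f * M[i][c]
    beta = [0.0] * 3
    for i in range(2, -1, -1):
        beta[i] = (M[i][3] - sum(M[i][c] * beta[c] for c in range(i + 1, 3))) / M[i][i]
    return beta

z = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
xv = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
y = [0.2 + xi + 0.5 * zi + zi ** 2 for xi, zi in zip(xv, z)]  # g(z) = z^2

X = [[1.0, xi, zi] for xi, zi in zip(xv, z)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b0, b1, b2 = solve3(XtX, Xty)

uhat = [yi - (b0 + b1 * xi + b2 * zi) for yi, xi, zi in zip(y, xv, z)]
pr = [b2 * zi + ui for zi, ui in zip(z, uhat)]  # partial residual for z

def corr(a, b):
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    cov = sum((ai - am) * (bi - bm) for ai, bi in zip(a, b))
    return cov / (sum((ai - am) ** 2 for ai in a) ** 0.5 *
                  sum((bi - bm) ** 2 for bi in b) ** 0.5)

# The partial residual tracks the omitted quadratic shape of g(z):
# corr(pr, z^2) is large even though the fitted model is purely linear.
```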

Figure: Partial residual plot. DGP: y = 0.2 + x + 0.5z + z² + u.

The iid Assumption

The CLR assumptions dictate that the errors be iid. It is generally difficult to determine whether a given set of observations comes from the same distribution. But if there is a natural order to the observations (e.g., time), we can check whether the residuals are correlated. If there is correlation, the iid assumption is violated.

Residuals with Time

Consider a time series regression, for example π_t = β₀ + β₁m_t + u_t, where π_t is the inflation rate and m_t is the growth rate of the money supply, both indexed by time t. Here the natural order is time, and a time series plot of the estimated residuals is informative.

Residual Plots

We can plot:
- Residuals over time
- Residuals vs. the previous residual
- The correlogram

Figure: Residuals over time for u_t = αu_{t−1} + ε_t with α = 0, 0.5, 0.95, from top to bottom.

Figure: Residuals vs. previous residual for u_t = αu_{t−1} + ε_t with α = 0, 0.5, 0.95, from left to right.

Figure: Correlograms for u_t = αu_{t−1} + ε_t with α = 0, 0.5, 0.95, from left to right.
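What a correlogram summarizes at lag 1 can be sketched numerically (a made-up simulation, not the slides' code): simulate u_t = αu_{t−1} + ε_t and compare lag-1 sample autocorrelations for α = 0 and α = 0.95:

```python
# Hypothetical simulation: lag-1 sample autocorrelation of AR(1) residuals
# u_t = alpha*u_{t-1} + eps_t, eps_t ~ N(0,1). Seed and sizes are arbitrary.
import random

def simulate_ar1(alpha, T, seed):
    rng = random.Random(seed)
    u, series = 0.0, []
    for _ in range(T):
        u = alpha * u + rng.gauss(0.0, 1.0)
        series.append(u)
    return series

def lag1_autocorr(u):
    n = len(u)
    m = sum(u) / n
    num = sum((u[t] - m) * (u[t - 1] - m) for t in range(1, n))
    den = sum((ut - m) ** 2 for ut in u)
    return num / den

u0 = simulate_ar1(0.0, 500, seed=42)    # no correlation
u95 = simulate_ar1(0.95, 500, seed=42)  # strong positive correlation
ac0, ac95 = lag1_autocorr(u0), lag1_autocorr(u95)
# ac0 should be near zero, ac95 near 0.95.
```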

Durbin-Watson Test

The Durbin-Watson test is the formal test for independence, or more precisely, for non-correlation. It assumes an AR(1) model for u_t: u_t = αu_{t−1} + ε_t. The null hypothesis is H₀: α = 0. The test statistic is
DW = Σ_{t=2}^T (û_t − û_{t−1})² / Σ_{t=1}^T û_t².

Durbin-Watson Test

DW ∈ [0, 4], and DW ≈ 2 indicates no autocorrelation. If DW is substantially less than 2, there is evidence of positive serial correlation; as a rough rule of thumb, a DW below 1.0 may be cause for alarm. Small values of DW indicate that successive error terms are, on average, close in value to one another, i.e., positively correlated. Large values of DW indicate that successive error terms are, on average, far apart in value, i.e., negatively correlated.
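A sketch of the statistic on simulated residuals (invented data and names), illustrating the familiar approximation DW ≈ 2(1 − ρ̂):

```python
# Hypothetical illustration: computing the Durbin-Watson statistic on
# simulated AR(1) residuals u_t = alpha*u_{t-1} + eps_t. Seed is arbitrary.
import random

def simulate_ar1(alpha, T, seed):
    rng = random.Random(seed)
    u, series = 0.0, []
    for _ in range(T):
        u = alpha * u + rng.gauss(0.0, 1.0)
        series.append(u)
    return series

def durbin_watson(uhat):
    num = sum((uhat[t] - uhat[t - 1]) ** 2 for t in range(1, len(uhat)))
    den = sum(u ** 2 for u in uhat)
    return num / den

dw_none = durbin_watson(simulate_ar1(0.0, 500, seed=7))    # expect DW near 2
dw_strong = durbin_watson(simulate_ar1(0.95, 500, seed=7)) # expect DW well below 2
```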

Fixing Correlation

Serial correlation most likely indicates that the model is misspecified. The usual remedies are:
- Add more explanatory variables
- Add more lags of the existing explanatory variables

Heteroscedasticity

If var(u_i | x) = σ², we call the model homoscedastic; if not, heteroscedastic. If homoscedasticity does not hold but CLR Assumptions 1-4 still hold, the OLS estimator is still unbiased and consistent; however, OLS is no longer BLUE. We can detect heteroscedasticity by plotting residuals against regressors; for simple regressions, we can inspect the regression line. We can also formally test for homoscedasticity:
- White test
- Breusch-Pagan test
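A minimal sketch of the Breusch-Pagan idea for a one-regressor model (simulated data; all names invented): regress the squared residuals on the regressor and compare LM = n·R² with the χ²(1) critical value:

```python
# Hypothetical sketch of a Breusch-Pagan-style test with one regressor:
# fit y on x, regress uhat^2 on x, and form LM = n * R^2, to be compared
# with the 5% chi-squared(1) critical value (about 3.84). Data is simulated.
import random

rng = random.Random(123)
n = 500
x = [0.5 + 2.5 * rng.random() for _ in range(n)]
# Heteroscedastic errors: the sd of u_i grows with x_i.
y = [1.0 + 1.0 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]

def ols_resid_r2(x, y):
    """Simple OLS of y on (1, x); returns residuals and R^2."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sum(u ** 2 for u in resid) / sst
    return resid, r2

uhat, _ = ols_resid_r2(x, y)
u2 = [u ** 2 for u in uhat]
_, r2_aux = ols_resid_r2(x, u2)  # auxiliary regression: uhat^2 on x
lm = n * r2_aux                  # LM statistic, asymptotically chi-squared(1)
# With this strongly heteroscedastic DGP, lm lands far above 3.84,
# so the null of homoscedasticity is rejected.
```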

Figure: Heteroscedasticity (regression line and residuals). DGP: y_i = β₀ + 0.5x_i + x_iε_i.

Fixing Heteroscedasticity

- Use a different specification for the model (different variables, or perhaps non-linear transformations of the variables).
- Use GLS (Generalized Least Squares).
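The GLS idea can be sketched in a special case where the variance form is assumed known (simulated data; names and parameter values invented): if var(u_i | x_i) = σ²x_i², dividing the whole equation by x_i restores homoscedasticity, and OLS on the transformed data is a weighted least squares estimator:

```python
# Hypothetical WLS sketch: true model y = b0 + b1*x + u with sd(u|x) = x.
# Dividing by x gives y/x = b1 + b0*(1/x) + u/x, whose error u/x is
# homoscedastic, so OLS on the transformed variables is efficient GLS here.
import random

rng = random.Random(99)
n = 1000
x = [0.5 + 2.5 * rng.random() for _ in range(n)]
y = [1.0 + 2.0 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]  # b0=1, b1=2

def ols(x, y):
    """Simple OLS of y on (1, x); returns (intercept, slope)."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

# Transformed regression: y/x on 1/x. Its intercept estimates b1 (the
# original slope) and its slope estimates b0 (the original intercept).
ystar = [yi / xi for xi, yi in zip(x, y)]
xstar = [1.0 / xi for xi in x]
b1_hat, b0_hat = ols(xstar, ystar)
```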