OSU Economics 444: Elementary Econometrics. Ch.10 Heteroskedasticity


(Pure) heteroskedasticity is caused by the error term of a correctly specified equation:

$$\mathrm{Var}(\epsilon_i) = \sigma_i^2, \quad i = 1, 2, \ldots, n,$$

i.e., the variance of the error term depends on which observation is being considered.

1) Heteroskedasticity occurs in data sets in which there is a wide disparity between the largest and smallest observed values. We may expect the error term for very large observations to have a large variance and the error term for small observations to have a small variance.

2) Heteroskedasticity is more likely to occur in cross-sectional models. Cross-sectional samples often contain observations of widely different sizes.

3) Heteroskedasticity may take on many complex forms.

4) A simple but special model of heteroskedasticity assumes that the variance of the error term is related to an exogenous variable $z$:

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \epsilon_i \quad \text{with} \quad \mathrm{var}(\epsilon_i) = \sigma^2 z_i^2.$$

(a) The variance of $\epsilon_i$ is proportional to the square of $z_i$. The higher the value of $z_i$, the higher the variance of $\epsilon_i$.

(b) An example: the relation of a household's consumption to its income. The expenditures of a low-income household are not likely to be as variable in absolute value as the expenditures of a high-income one.

(Figure 10.3 here)

Impure heteroskedasticity: heteroskedasticity that is caused by an error in specification, such as an omitted variable.

1) An omitted variable may cause a heteroskedastic error because the portion of the omitted effect not represented by the included explanatory variables may be absorbed by the error term.

2) The correct remedy is to find the omitted variable and include it in the regression.

Consequences of (pure) Heteroskedasticity

1) Pure heteroskedasticity does not cause bias in the OLSEs of the regression coefficients.

(a) Consider the simple regression model $y_i = \beta x_i + \epsilon_i$ with $\mathrm{var}(\epsilon_i) = \sigma_i^2$. The OLSE is

$$\hat{\beta} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} = \beta + \frac{\sum_{i=1}^n x_i \epsilon_i}{\sum_{i=1}^n x_i^2}.$$

Therefore, because $E(\epsilon_i \mid x_i) = 0$,

$$E(\hat{\beta}) = \beta + \frac{\sum_{i=1}^n x_i E(\epsilon_i \mid x_i)}{\sum_{i=1}^n x_i^2} = \beta.$$

2) The Gauss-Markov theorem does not hold. The OLSE may not be the estimator with the smallest variance within the class of linear unbiased estimators.

3) The usual variance formula for the OLSE is not correct. It tends to underestimate the true variance of the OLSE.

(a) For the simple regression model $y_i = \beta x_i + \epsilon_i$ with $\mathrm{var}(\epsilon_i) = \sigma_i^2$, the true variance of the OLSE $\hat{\beta}$ is

$$\mathrm{var}(\hat{\beta}) = \frac{\sum_{i=1}^n x_i^2 \sigma_i^2}{\left(\sum_{i=1}^n x_i^2\right)^2}.$$

(b) The variance formula reported by the computer (ignoring heteroskedastic variances) is

$$\frac{\sum_{i=1}^n e_i^2 / (n-1)}{\sum_{i=1}^n x_i^2}.$$

It can be shown that

$$E\left(\sum_{i=1}^n e_i^2\right) = \sum_{i=1}^n \sigma_i^2 - \frac{\sum_{i=1}^n x_i^2 \sigma_i^2}{\sum_{i=1}^n x_i^2}.$$

If $\sigma_i^2$ and $x_i^2$ are positively correlated, one has

$$\frac{\sum_{i=1}^n \sigma_i^2 - \dfrac{\sum_{i=1}^n x_i^2 \sigma_i^2}{\sum_{i=1}^n x_i^2}}{(n-1) \sum_{i=1}^n x_i^2} < \frac{\sum_{i=1}^n x_i^2 \sigma_i^2}{\left(\sum_{i=1}^n x_i^2\right)^2}.$$

That is, the expected value of the estimated variance is smaller than the true variance.

Testing for Heteroskedasticity

There are many test statistics, depending on the model. The following are two familiar tests.

The Park Test

It is designed to test for possible heteroskedasticity of the form $\mathrm{var}(\epsilon_i) = \sigma^2 z_i^{\delta}$. It has three steps:

1. Obtain the OLS residuals: estimate the regression model by OLS (ignoring possible heteroskedasticity) to get $\hat{\beta}$ and compute

$$e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \cdots - \hat{\beta}_k x_{ki}, \quad i = 1, \ldots, n.$$

2. Run the regression

$$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(z_i) + u_i,$$

where $z_i$ is a possible (best choice) proportionality factor.

3. Test the significance of $\hat{\alpha}_1$ with a t-test. If it is significant, this is evidence of heteroskedasticity; otherwise, it is not.

4. An empirical example: Woody's restaurants. The OLSE is

$$\hat{y}_i = 102{,}192 - 9075 N_i + 0.355 P_i + 1.288 I_i$$

with standard errors (2053), (0.073), (0.543) and t = -4.42, 4.88, 2.37 for $N$, $P$, $I$ respectively; n = 33, $R^2 = 0.579$, F = 15.65,

where
y = the check volume at a Woody's restaurant,
N = the number of nearby competitors,
P = the nearby population,
I = the average household income of the local area.

Park test: check whether the residuals give any indication of heteroskedasticity, using the population $P$ as the proportionality factor because large error term variances might exist in more heavily populated areas.

$$\widehat{\ln(e_i^2)} = 21.05 - 0.2865 \ln P_i$$

with standard error (0.6263) and t = -0.457; n = 33, $R^2 = 0.0067$, F = 0.209.

The calculated t-score of -0.457 is too small in absolute value, so there is no strong evidence of heteroskedasticity.

The White Test

It is more general than the Park test and does not require choosing a proportionality factor $z$ (as the Park test does).

1) It runs a regression of the squared residuals on all the original independent variables, their squares, and their cross products.

2) For example, for $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$, the White test regression equation is

$$e_i^2 = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \alpha_3 x_{1i}^2 + \alpha_4 x_{2i}^2 + \alpha_5 x_{1i} x_{2i} + u_i.$$

3) Test the overall significance of the regression coefficients of the test regression of $e_i^2$ (excluding the constant term) with an F-statistic. Alternatively, use $nR^2$, where $R^2$ is from the test regression equation, as a chi-square test with degrees of freedom equal to the number of slope coefficients.

Remedies for Heteroskedasticity

Weighted Least Squares

Weighted least squares is a version of GLS designed for the heteroskedasticity problem. The method transforms $\epsilon_i$ into a new disturbance with constant variance $\sigma^2$. OLS is then applied to the transformed equation. The resulting OLS estimator for the transformed equation is called the weighted least squares estimator.

1) This approach requires knowledge of the specification of the variance function.

2) Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the variance of $\epsilon_i$ is specified as $\mathrm{var}(\epsilon_i) = \sigma^2 z_i^2$. The transformed equation is

$$\frac{y_i}{z_i} = \beta_0 \frac{1}{z_i} + \beta_1 \frac{x_i}{z_i} + u_i,$$

because $u_i = \epsilon_i / z_i$ has variance $\sigma^2 z_i^2 / z_i^2 = \sigma^2$ and is therefore homoskedastic.

(a) Estimate the transformed equation by OLS with dependent variable $y_i / z_i$ and explanatory variables $1 / z_i$ and $x_i / z_i$.

(b) Note that the transformed equation may not have an intercept term. That is acceptable.

(c) An intercept term may appear if $z$ is one of the explanatory variables $x$. For example, if $z = x$, then the transformed equation is

$$\frac{y_i}{x_i} = \beta_0 \frac{1}{x_i} + \beta_1 + u_i,$$

where $\beta_1$ becomes the intercept term in the transformed equation.

3) The weighted least squares estimates should be interpreted as the coefficients of the original (not the transformed) regression equation.

4) The weighted least squares estimator is BLUE (assuming that the variance function is correctly specified).

Robust variance estimates for OLSE with an unknown form of heteroskedasticity

1) The OLSE (ignoring heteroskedastic variances) is unbiased, but the standard variance formula for the OLSE is not valid.

2) This approach does not attempt to obtain a better coefficient estimate. Instead, it attempts to obtain a valid (for large samples) estimate of the proper variance of the OLSE.

3) For example, for the model $y_i = \beta x_i + \epsilon_i$ (with only a single regressor and no intercept term, for illustration purposes), the heteroskedasticity-corrected variance estimate of the OLSE $\hat{\beta}$ is

$$\frac{\sum_{i=1}^n x_i^2 e_i^2}{\left(\sum_{i=1}^n x_i^2\right)^2},$$

where the $e_i$ are the OLS residuals. Its square root is the heteroskedasticity-corrected standard error.

4) The robust variance formula does not require any specification of the variance function. The technique works better in large samples.

5) The robust variance can be used in t-tests for hypothesis testing: use the square root of the robust variance (the robust standard error) in the denominator of the t-ratio formula.

Redefining the variables

Guided by theoretical reasoning, select variables in the formulation of a regression model that are less likely to be subject to heteroskedasticity.

1) As an example, consider a model of total expenditures (EXP) by the governments of different cities, which might be explained by aggregate income (INC), population (POP), and the average wage (WAGE) in each city. A regression model specified as

$$EXP_i = \beta_0 + \beta_1 POP_i + \beta_2 INC_i + \beta_3 WAGE_i + \epsilon_i$$

is likely to have heteroskedastic disturbances because larger cities have larger incomes and larger expenditures than smaller ones.

Another theoretical model may be

$$\frac{EXP_i}{POP_i} = \alpha_0 + \alpha_1 \frac{INC_i}{POP_i} + \alpha_2 WAGE_i + \epsilon_i,$$

where the variables are formulated in per capita terms. The distinction between large and small observations disappears with the per capita variables, so this specification is less likely to be subject to heteroskedasticity.

An empirical example: try to explain petroleum consumption by state (PCON), using explanatory variables that include the size of the state and the gasoline tax rate (TAX). A possible specification is

$$PCON_i = \beta_0 + \beta_1 REG_i + \beta_2 TAX_i + \epsilon_i,$$

where
$PCON_i$ = petroleum consumption in the ith state,
$REG_i$ = motor vehicle registrations in the ith state,
$TAX_i$ = the gasoline tax rate in the ith state.

1) OLS approach: the estimated equation is

$$\widehat{PCON}_i = 551.7 + 0.1861\, REG_i - 53.59\, TAX_i$$

with standard errors (0.0117) and (16.86) and t = 15.88 and -3.18 for REG and TAX respectively; $R^2 = 0.861$, n = 50. The estimated coefficients are significant and have the expected signs.

2) The equation might be subject to heteroskedasticity caused by variation in the size of the states. In a plot of the OLS residuals against REG, the residuals appear to follow a wider distribution for large values of REG than for small values of REG. (Figure 10.8 here)

3) Run a Park test with REG as the proportionality factor:

$$\widehat{\ln(e_i^2)} = 1.650 + 0.952 \ln(REG_i)$$

with standard error (0.308) and t = 3.09; $R^2 = 0.148$, n = 50, F = 9.533. The critical t-value for a 1% two-tailed t-test is about 2.7. The computed t = 3.09 is larger than 2.7 and, hence, we reject the null hypothesis of homoskedasticity.

4) Use robust estimated variances for the OLSEs:

$$\widehat{PCON}_i = 551.7 + 0.1861\, REG_i - 53.59\, TAX_i$$

with robust standard errors (0.022) and (23.90) and t = 8.64 and -2.24 for REG and TAX respectively; $R^2 = 0.861$, n = 50. The robust standard errors of the OLSEs are larger than those without correction, so the uncorrected variance formulas underestimate the proper variances of the OLSEs.

5) Estimation with the weighted least squares method, dividing the equation through by $REG_i$:

$$\widehat{\left(\frac{PCON_i}{REG_i}\right)} = 218.54 \frac{1}{REG_i} + 0.168 - 17.389 \frac{TAX_i}{REG_i}$$

with standard errors (0.014) for the constant 0.168 (the estimate of $\beta_1$) and (4.682) for $TAX_i / REG_i$, and t = 12.27 and -3.71 respectively; $R^2 = 0.333$, n = 50. The weighted least squares estimates of $\beta_1$ and $\beta_2$ have smaller (estimated) standard errors than those of the OLSEs (compared with the robust standard errors) in 4). The overall fit is worse, but this has no importance because the dependent variables are different in the two equations.

6) An alternative formulation uses per capita petroleum consumption (PCON/POP), where POP is the population of a state:

$$\widehat{\left(\frac{PCON_i}{POP_i}\right)} = 0.168 + 0.1082 \frac{REG_i}{POP_i} - 0.0103\, TAX_i$$

with standard errors (0.0716) and (0.0035) and t = 1.51 and -2.95 respectively; $R^2 = 0.165$, n = 50. This approach is quite different. It is not necessarily better and is not directly comparable to the other equations. Which specification is better will depend on the purposes of the research.
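As a rough illustration of the tools in this chapter, the following Python sketch applies the Park test, robust (White) standard errors, and weighted least squares to simulated data. It is only a sketch under stated assumptions: the data are synthetic with var(ε_i) = σ² x_i² (so z = x), the helper names `ols` and `robust_se` are hypothetical, and NumPy is assumed to be available. It does not reproduce the Woody's or petroleum data sets.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, residuals, and conventional standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)            # assumes a constant error variance
    se = np.sqrt(np.diag(s2 * XtX_inv))
    return beta, resid, se

def robust_se(X, resid):
    """Heteroskedasticity-robust (White) standard errors from OLS residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)  # sum of e_i^2 * x_i x_i'
    return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Synthetic data (an assumption for illustration): error s.d. proportional to x.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
eps = rng.normal(0.0, 1.0, size=n) * x      # var(eps_i) = sigma^2 * x_i^2, sigma = 1
y = 2.0 + 0.5 * x + eps

# OLS with conventional and robust standard errors.
X = np.column_stack([np.ones(n), x])
beta_ols, e, se_ols = ols(X, y)
se_rob = robust_se(X, e)

# Park test: regress ln(e_i^2) on ln(z_i) with z = x, then t-test the slope.
Z = np.column_stack([np.ones(n), np.log(x)])
alpha, _, se_alpha = ols(Z, np.log(e ** 2))
park_t = alpha[1] / se_alpha[1]

# WLS: divide the equation through by z_i = x_i.
# Regressors become 1/x_i (coefficient beta_0) and a constant (coefficient beta_1).
Xw = np.column_stack([1.0 / x, np.ones(n)])
beta_wls, _, se_wls = ols(Xw, y / x)

print("OLS estimates:", beta_ols)
print("conventional s.e.:", se_ols, "robust s.e.:", se_rob)
print("Park t-statistic:", park_t)
print("WLS estimates (beta_0, beta_1):", beta_wls, "s.e.:", se_wls)
```

In the no-intercept, single-regressor case, `robust_se` squared reduces to the formula Σ x_i² e_i² / (Σ x_i²)² given above. Because the transformed WLS equation here is the z = x case, the constant term of the transformed regression estimates the original slope β_1.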