Introductory Econometrics


Violation of basic assumptions: Heteroskedasticity
Barbara Pertold-Gebicka
CERGE-EI
16 November 2010

OLS assumptions

1. Disturbances are random variables drawn from a normal distribution.
2. The mean of this distribution is zero: $E[\varepsilon_i] = 0$ (in matrix notation, $E[\varepsilon] = 0$).
3. The variance of this distribution is constant: $Var[\varepsilon_i] = \sigma^2$ (homoskedasticity).
4. Disturbances are not autocorrelated: $Cov[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$.

Assumptions 3.-4. in matrix notation: $Var[\varepsilon] = \sigma^2 I_n$.
Assumptions 1.-4. can be summarized as $\varepsilon \sim NID(0, \sigma^2 I_n)$.

5. Disturbances are not correlated with the explanatory variables: $cov(x_{ik}, \varepsilon_i) = 0$ (in matrix notation, $Cov[X, \varepsilon] = 0$); this is the consistency assumption.
6. Explanatory variables are not linearly dependent (no multicollinearity).

When all the assumptions are satisfied, the OLS estimator $\hat\beta$ is a normally distributed random variable with

$$E[\hat\beta_k] = \beta_k, \quad \text{i.e. } E[\hat\beta] = \beta \;\Rightarrow\; \text{the OLS estimator is unbiased}$$

$$Var[\hat\beta_k] = \frac{\sigma^2}{SST_k (1 - R_k^2)}, \quad \text{i.e. } Var[\hat\beta] = (X^T X)^{-1} \sigma^2 \;\Rightarrow\; \text{the OLS estimator is efficient (has the lowest possible variance)}$$

Thus: the OLS estimator is BLUE (best linear unbiased estimator), and it is consistent (it remains efficient and unbiased as $n \to \infty$).

Homoskedasticity vs. Heteroskedasticity

Variance-covariance matrix of the disturbance term

$$\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \qquad Var[\varepsilon] = \begin{bmatrix} Var[\varepsilon_1] & Cov[\varepsilon_1, \varepsilon_2] & \dots & Cov[\varepsilon_1, \varepsilon_n] \\ Cov[\varepsilon_2, \varepsilon_1] & Var[\varepsilon_2] & \dots & Cov[\varepsilon_2, \varepsilon_n] \\ \vdots & \vdots & \ddots & \vdots \\ Cov[\varepsilon_n, \varepsilon_1] & Cov[\varepsilon_n, \varepsilon_2] & \dots & Var[\varepsilon_n] \end{bmatrix}$$

Homoskedasticity: $Var[\varepsilon_i] = E[\varepsilon_i^2] - (E[\varepsilon_i])^2 = E[\varepsilon_i^2] = \sigma^2$

No autocorrelation: $Cov[\varepsilon_i, \varepsilon_j] = E[\varepsilon_i \varepsilon_j] - E[\varepsilon_i]E[\varepsilon_j] = E[\varepsilon_i \varepsilon_j] = 0$

$$Var[\varepsilon] = \begin{bmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{bmatrix} = \sigma^2 I_n$$

Homoskedasticity vs. heteroskedasticity

Homoskedastic disturbance:

$$Var[\varepsilon] = \begin{bmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{bmatrix} = \sigma^2 I_n$$

Heteroskedastic disturbance:

$$Var[\varepsilon] = \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \\ 0 & \sigma_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n^2 \end{bmatrix} = \begin{bmatrix} \sigma^2 \omega_1 & 0 & \dots & 0 \\ 0 & \sigma^2 \omega_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \omega_n \end{bmatrix} = \sigma^2 \Omega$$

or: $Var[\varepsilon_i] = \sigma^2 \omega_i$
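To make the two covariance structures concrete, here is a minimal numpy sketch (all numbers are hypothetical) that builds $\sigma^2 I_n$ and $\sigma^2 \Omega$ as explicit matrices:

```python
import numpy as np

n = 5
sigma2 = 2.0

# Homoskedastic case: every disturbance has the same variance sigma^2.
V_homo = sigma2 * np.eye(n)

# Heteroskedastic case: Var[eps_i] = sigma^2 * omega_i, omega_i not constant.
omega = np.array([0.5, 1.0, 1.5, 2.0, 4.0])   # hypothetical omega_i values
V_hetero = sigma2 * np.diag(omega)

print(np.diag(V_homo))    # [2. 2. 2. 2. 2.]  -- constant diagonal
print(np.diag(V_hetero))  # [1. 2. 3. 4. 8.]  -- non-constant diagonal
```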

[Figure: scatter plot of y against x picturing heteroskedasticity in the 2-variable case]

What happens to OLS estimates under heteroskedasticity?

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i \qquad \text{or} \qquad y = X\beta + \varepsilon$$

where $E[x_{ik}\varepsilon_i] = 0$ and $E[\varepsilon_i \varepsilon_j] = 0$, but $Var[\varepsilon_i] = \sigma^2 \omega_i \neq$ const.; in matrix notation, $E[X^T \varepsilon] = 0$ but $Var[\varepsilon] = \sigma^2 \Omega$.

OLS estimate: $\hat\beta = (X^T X)^{-1} X^T y$

$E[\hat\beta] = \beta$ (by the assumption $E[\varepsilon | X] = 0$) $\Rightarrow$ $\hat\beta$ is unbiased

$$Var[\hat\beta] = (X^T X)^{-1} X^T \, Var[\varepsilon] \, X (X^T X)^{-1} = (X^T X)^{-1} X^T \sigma^2 \Omega X (X^T X)^{-1}$$

$\Rightarrow$ $Var[\hat\beta]$ is different from $(X^T X)^{-1} \sigma^2$

What happens to OLS estimates under heteroskedasticity?

$$Var[\hat\beta] = \sigma^2 (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}$$

We used to estimate the variance of the OLS estimator by

$$\widehat{Var}[\hat\beta] = \hat\sigma^2 (X^T X)^{-1} = s^2 (X^T X)^{-1}, \qquad s^2 = \frac{e^T e}{n - k - 1}$$

Is this a good estimator of $Var[\hat\beta]$?

$$E\left[\widehat{Var}[\hat\beta]\right] - Var[\hat\beta] = E[s^2](X^T X)^{-1} - \sigma^2 (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}$$

In large samples $s^2 \to \sigma^2$, yet this difference remains nonzero if $cov(\omega, X) \neq 0$.

What happens to OLS estimates under heteroskedasticity?

- The standard estimate of the OLS standard errors is biased in small samples.
- The standard estimate of the OLS standard errors is biased even in large samples if the heteroskedasticity is correlated with some explanatory variables.
- We cannot perform reliable hypothesis testing.
- The OLS estimator is no longer BLUE (best linear unbiased estimator).

The simulation sketch below illustrates the first two points.
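The following minimal Monte Carlo sketch (the data-generating process and all numbers are hypothetical, not from the lecture's dataset) shows that when the disturbance variance grows with the regressor, the conventional OLS standard error systematically understates the true sampling variability of the slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 5000
beta0, beta1 = 1.0, 2.0

slopes, conv_se = [], []
for _ in range(reps):
    x = rng.uniform(1.0, 5.0, n)
    eps = rng.normal(0.0, x)             # sd(eps_i) = x_i: heteroskedastic
    y = beta0 + beta1 * x + eps
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - 2)                 # conventional s^2 = e'e/(n-k-1)
    conv_se.append(np.sqrt(s2 * XtX_inv[1, 1]))
    slopes.append(b[1])

print("empirical sd of slope estimates:", np.std(slopes))
print("average conventional std. error:", np.mean(conv_se))
# The conventional standard error comes out too small on average,
# so t-tests based on it would over-reject.
```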

Example - marginal propensity to save

Model 1: OLS estimates using the 100 observations 1-100
Dependent variable: sav

             coefficient    std. error    t-ratio    p-value
  const      -1438.80       1477.88       -0.9736    0.3327
  inc            0.10815       0.06514     1.660     0.1001
  size          65.8668      216.69        0.3040    0.7618
  educ         143.316       105.37        1.361     0.1768

Unadjusted R-squared = 0.0814
Adjusted R-squared   = 0.0553
F-statistic (3, 96)  = 2.8957 (p-value = 0.045)

Heteroskedasticity-robust standard errors

The OLS estimator $\hat\beta$ is unbiased even under heteroskedasticity. The only thing we need to be careful about are the standard errors of the coefficients.

Knowing that $Var[\hat\beta] = (X^T X)^{-1} X^T \sigma^2 \Omega X (X^T X)^{-1}$ rather than $Var[\hat\beta] = \sigma^2 (X^T X)^{-1}$, let us find a consistent estimator of $Var[\hat\beta]$.

White (1980) showed that $\frac{1}{n} X^T \sigma^2 \Omega X$ can be consistently estimated by $\frac{1}{n} \sum_i e_i^2 x_i x_i^T$.

The heteroskedasticity-robust variance of the OLS estimator is therefore estimated as:

$$\widehat{Var}[\hat\beta] = (X^T X)^{-1} \left( \sum_{i=1}^{n} e_i^2 x_i x_i^T \right) (X^T X)^{-1}$$

Heteroskedasticity-robust standard errors

$$Var[\hat\beta] = (X^T X)^{-1} X^T \sigma^2 \Omega X (X^T X)^{-1}$$

$$X^T \sigma^2 \Omega X = \begin{bmatrix} 1 & 1 & \dots & 1 \\ x_{11} & x_{21} & \dots & x_{n1} \\ x_{12} & x_{22} & \dots & x_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1k} & x_{2k} & \dots & x_{nk} \end{bmatrix} \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \\ 0 & \sigma_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1k} \\ 1 & x_{21} & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{nk} \end{bmatrix} = \sum_{i=1}^{n} \sigma_i^2 x_i x_i^T$$

$\widehat{Var}[\hat\beta] = (X^T X)^{-1} \left( \sum_{i=1}^{n} e_i^2 x_i x_i^T \right) (X^T X)^{-1}$ is a good estimate of the above.
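As a sanity check, the sandwich formula can be coded directly. Below is a minimal sketch on simulated data (the DGP is hypothetical); the manual computation of $(X^T X)^{-1}(\sum_i e_i^2 x_i x_i^T)(X^T X)^{-1}$ should match the HC0 option in statsmodels, which implements exactly this estimator:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, x)   # heteroskedastic errors
X = np.column_stack([np.ones(n), x])

b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS coefficients
e = y - X @ b                            # OLS residuals

XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * e[:, None] ** 2).T @ X       # sum_i e_i^2 x_i x_i'
V_robust = XtX_inv @ meat @ XtX_inv
print("manual HC0 s.e.:", np.sqrt(np.diag(V_robust)))

fit = sm.OLS(y, X).fit(cov_type="HC0")
print("statsmodels HC0:", fit.bse)       # should match the manual values
```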

Example - marginal propensity to save

Model 2: OLS estimates using the 100 observations 1-100
Heteroskedasticity-robust standard errors
Dependent variable: sav

             coefficient    std. error    t-ratio    p-value
  const      -1438.80       2120.51       -0.6785    0.4991
  inc            0.10815       0.0756476   1.430     0.1560
  size          65.8668      217.064       0.3034    0.7622
  educ         143.316       152.576       0.9393    0.3499

Unadjusted R-squared = 0.0814
Adjusted R-squared   = 0.0553
F-statistic (3, 96)  = 2.3654 (p-value = 0.0795)

Why don't we always apply heteroskedasticity-robust standard errors?

Robust standard errors can be used for valid hypothesis testing only in large samples:

$$t = \frac{\hat\beta - \beta}{se(\hat\beta)} \xrightarrow{n \to \infty} t(n - k - 1)$$

In small samples, robust t-statistics might be distributed differently. We therefore prefer to use robust standard errors only where the presence of heteroskedasticity is justified, and we would like to test whether heteroskedasticity is present.

Testing for heteroskedasticity

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$$

Under homoskedasticity ($H_0$): $Var[\varepsilon] = \sigma^2 I$, or $Var(\varepsilon_i) = \sigma^2$ (constant)
Under heteroskedasticity ($H_A$): $Var[\varepsilon] = \sigma^2 \Omega$, or $Var(\varepsilon_i) \neq$ const

Let us remember that all these assumptions are conditional on the explanatory variables, i.e.

Under homoskedasticity ($H_0$): $Var[\varepsilon | X] = \sigma^2 I$, or $Var(\varepsilon_i | x_i) = \sigma^2$ (constant)
Under heteroskedasticity ($H_A$): $Var[\varepsilon | X] = \sigma^2 \Omega$, or $Var(\varepsilon_i | x_i) \neq$ const

Testing for heteroskedasticity

Another assumption states that $E[\varepsilon_i | x_i] = 0$, thus:

$$Var(\varepsilon_i | x_i) = E[\varepsilon_i^2 | x_i] - (E[\varepsilon_i | x_i])^2 = E[\varepsilon_i^2 | x_i]$$

$H_0$: $E[\varepsilon_i^2 | x_i] =$ const
$H_A$: $E[\varepsilon_i^2 | x_i] \neq$ const

Estimate $\varepsilon_i$ by the residuals $e_i$ and find out whether $E[e_i^2 | x_i] =$ const.

Testing for heteroskedasticity

$H_0$: $E[\varepsilon_i^2 | x_i] =$ const; $H_A$: $E[\varepsilon_i^2 | x_i] \neq$ const. Estimate $\varepsilon_i$ by the residuals $e_i$ and run the auxiliary regression

$$e_i^2 = \delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik} + u_i$$

$H_0$: $\delta_1 = \delta_2 = \dots = \delta_k = 0$
$H_A$: at least one $\delta_j \neq 0$

Use the F-test for the overall significance of the above regression:

$$F = \frac{(SSR_R - SSR_U)/k}{SSR_U/(n - k - 1)} = \frac{R_u^2 / k}{(1 - R_u^2)/(n - k - 1)}$$

(because $SSR_R = SST_R$, i.e. $R_R^2 = 0$)

Testing for heteroskedasticity

$$e_i^2 = \delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik} + u_i$$

$H_0$ (homoskedasticity): $\delta_1 = \delta_2 = \dots = \delta_k = 0$
$H_A$ (heteroskedasticity): at least one $\delta_j \neq 0$

Test statistic:

$$F = \frac{R_u^2 / k}{(1 - R_u^2)/(n - k - 1)} \sim F(k, n - k - 1)$$

We reject the null hypothesis (reject homoskedasticity) if the test statistic is higher than the appropriate critical value.
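Below is a minimal sketch of this test on simulated data (the DGP is hypothetical). It runs the auxiliary regression of $e_i^2$ on the regressors, forms the F statistic from $R_u^2$, and cross-checks it against statsmodels' het_breuschpagan, whose last two return values are this F statistic and its p-value:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
n, k = 100, 1
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, x)          # variance rises with x
X = np.column_stack([np.ones(n), x])

e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals

# Auxiliary regression: e_i^2 on the explanatory variables.
aux = sm.OLS(e ** 2, X).fit()
R2 = aux.rsquared
F = (R2 / k) / ((1.0 - R2) / (n - k - 1))
print("manual F:", F)

lm, lm_p, f_stat, f_p = het_breuschpagan(e, X)
print("statsmodels F:", f_stat, "p-value:", f_p)
# Reject homoskedasticity when F exceeds the F(k, n-k-1) critical value.
```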

Example - marginal propensity to save

Model 4: OLS estimates using the 100 observations 1-100
Dependent variable: e_sq

             coefficient      std. error      t-ratio    p-value
  const      -4.0264E+07      2.36545E+07     -1.702     0.0920
  inc            6.309          104.6          0.171     0.886
  size         2.91615E+06     3.4689E+06      0.8408    0.4025
  educ         3.03488E+06     1.6858E+06      1.800     0.0750

Unadjusted R-squared = 0.0524
Adjusted R-squared   = 0.0228
F-statistic (3, 96)  = 1.7699 (p-value = 0.158)

Heteroskedasticity - summary

- In small samples, heteroskedasticity always means that OLS estimates of the coefficients' standard errors are biased.
- In large samples, OLS estimates of the coefficients' standard errors are biased only if the heteroskedasticity is correlated with the explanatory variables.
- Heteroskedasticity-robust standard errors might not produce a t-distribution in small samples.
- In large samples, robust standard errors do produce a t-distribution.
- Especially in small samples, we would like to test for the presence of heteroskedasticity before applying robust standard errors.

Special form of heteroskedasticity

Heteroskedasticity is problematic when it is correlated with $X$. We could model this relationship by

$$Var(\varepsilon_i | x_i) = \sigma^2 h(x_i), \qquad h(x_i) > 0$$

Assume we know $h(x_i)$:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$$

where $E[\varepsilon | X] = 0$, $E[\varepsilon_i \varepsilon_j | X] = 0$ for all $i \neq j$, and $Var[\varepsilon_i | x_i] = \sigma^2 h(x_i)$.

Note that although $Var[\varepsilon_i | x_i] = E[\varepsilon_i^2 | x_i] = \sigma^2 h(x_i)$,

$$Var\left[\frac{\varepsilon_i}{\sqrt{h(x_i)}} \,\Big|\, x_i\right] = \frac{1}{h(x_i)} Var[\varepsilon_i | x_i] = \frac{1}{h(x_i)} \sigma^2 h(x_i) = \sigma^2$$

Moreover,

$$E\left[\frac{\varepsilon_i}{\sqrt{h(x_i)}} \,\Big|\, x_i\right] = \frac{1}{\sqrt{h(x_i)}} E[\varepsilon_i | x_i] = \frac{1}{\sqrt{h(x_i)}} \cdot 0 = 0$$

Thus, the rescaled disturbance is homoskedastic with zero mean.

Special form of heteroskedasticity

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$$

where $Var\left[\varepsilon_i / \sqrt{h(x_i)} \,\big|\, x_i\right] = \sigma^2$ and $E\left[\varepsilon_i / \sqrt{h(x_i)} \,\big|\, x_i\right] = 0$. Dividing through by $\sqrt{h(x_i)}$:

$$\frac{y_i}{\sqrt{h(x_i)}} = \beta_0 \frac{1}{\sqrt{h(x_i)}} + \beta_1 \frac{x_{i1}}{\sqrt{h(x_i)}} + \dots + \beta_k \frac{x_{ik}}{\sqrt{h(x_i)}} + \frac{\varepsilon_i}{\sqrt{h(x_i)}}$$

$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \dots + \beta_k x_{ik}^* + \varepsilon_i^*$$

The transformed equation satisfies the assumptions $E[\varepsilon^* | X^*] = 0$, $E[\varepsilon_i^* \varepsilon_j^* | X^*] = 0$ for all $i \neq j$, and $Var[\varepsilon^* | X^*] = \sigma^2 I$.

Weighted Least Squares (WLS)

$$\frac{y_i}{\sqrt{h(x_i)}} = \beta_0 \frac{1}{\sqrt{h(x_i)}} + \beta_1 \frac{x_{i1}}{\sqrt{h(x_i)}} + \dots + \beta_k \frac{x_{ik}}{\sqrt{h(x_i)}} + \frac{\varepsilon_i}{\sqrt{h(x_i)}}$$

$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \dots + \beta_k x_{ik}^* + \varepsilon_i^*$$

The OLS estimates of this equation ($\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$) are called the WLS estimates. Each observation (including the constant term) is weighted by $1/\sqrt{h(x_i)}$. The WLS estimators are more efficient than the OLS estimators in the presence of heteroskedasticity. Weighted Least Squares (WLS) estimation is a special case of Generalized Least Squares (GLS) estimation; see the sketch below.
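A minimal WLS sketch with a known heteroskedasticity function (here $h(x_i) = x_i$, a hypothetical choice): dividing every variable, including the constant, by $\sqrt{h(x_i)}$ and running OLS gives the same coefficients as sm.WLS with weights $1/h(x_i)$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(1.0, 5.0, n)
eps = rng.normal(0.0, np.sqrt(x))          # Var(eps_i | x_i) = x_i
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(n), x])

h = x                                      # known h(x_i)

# Manual transformation: divide y and every column of X by sqrt(h).
y_star = y / np.sqrt(h)
X_star = X / np.sqrt(h)[:, None]
b_manual = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
print("manual WLS:", b_manual)

# statsmodels: weights are proportional to 1 / Var(eps_i), i.e. 1/h.
fit = sm.WLS(y, X, weights=1.0 / h).fit()
print("sm.WLS    :", fit.params)           # identical coefficients
```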

Feasible GLS - estimating the heteroskedasticity function

In the above example we knew the form of the heteroskedasticity. Usually we do not know this form, but we can assume some general functional form and estimate it using the data.

The assumed functional form of heteroskedasticity:

$$Var(\varepsilon_i | x_i) = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik})$$

Thus, we assume that $h(x_i) = \exp(\delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik})$. We use the exponential function to ensure that $h(x_i)$ is positive.

Since $Var(\varepsilon_i | x_i) = E[\varepsilon_i^2 | x_i]$, replacing $\varepsilon_i^2$ by the squared residual $e_i^2$ we can write

$$e_i^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik}) \, v_i$$

where $v_i$ is a random variable with $E[v_i | x_i] = 1$.

Feasible GLS - estimating the heteroskedasticity function

Original regression: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$

Heteroskedasticity form: $Var(\varepsilon_i | x_i) = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik})$

$$e_i^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik}) \, v_i$$

Assuming that $v_i$ is independent of $x_i$ and taking logs:

$$\log e_i^2 = \underbrace{\log \sigma^2 + \delta_0}_{\alpha_0} + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik} + \underbrace{\log v_i}_{u_i}$$

$$\log e_i^2 = \alpha_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik} + u_i$$

where $E[u_i | x_i] = 0$, because $E[v_i | x_i] = 1$ and $\log(1) = 0$.

Feasible GLS - estimating the heteroskedasticity function

Original regression: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$ with $Var(\varepsilon_i) = \sigma^2 h(x_i)$, where

$$h(x_i) = \exp(\delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik})$$
$$\log[h(x_i)] = \delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik}$$

We can estimate $h(x_i)$ by regressing the log of the squared residuals $e_i^2$ from the original regression on all explanatory variables:

$$\log e_i^2 = \alpha_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik} + u_i$$

Note that the fitted values $\widehat{\log(e_i^2)}$ are estimates of $\alpha_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik}$, and thus $\exp\left(\widehat{\log(e_i^2)}\right) = \widehat{h(x_i)}$ can then be used to estimate the original equation by WLS.

Feasible GLS - the procedure

1. Estimate the original equation by OLS: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$, and record the residuals $e_i$.
2. Calculate $\log e_i^2$.
3. Estimate the following regression by OLS: $\log e_i^2 = \alpha_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \dots + \delta_k x_{ik} + u_i$, and record the fitted values $\widehat{\log(e_i^2)}$.
4. Calculate $\widehat{h(x_i)} = \exp\left(\widehat{\log(e_i^2)}\right)$, i.e. estimate $h(x_i)$ by the exponential of the fitted values.
5. Finally, estimate the original equation by WLS, using $1/\sqrt{\widehat{h(x_i)}}$ as weights.

A sketch of the whole procedure follows.
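A minimal sketch of the five steps on simulated data (the DGP is hypothetical; in practice $y$ and $X$ come from your sample):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, np.exp(0.3 * x))  # log-variance linear in x
X = np.column_stack([np.ones(n), x])

# Step 1: OLS on the original equation; record residuals e_i.
e = sm.OLS(y, X).fit().resid

# Step 2: calculate log(e_i^2).
log_e2 = np.log(e ** 2)

# Step 3: regress log(e_i^2) on the explanatory variables; keep fitted values.
g_hat = sm.OLS(log_e2, X).fit().fittedvalues

# Step 4: h_hat(x_i) = exp(fitted values).
h_hat = np.exp(g_hat)

# Step 5: WLS on the original equation with weights 1/h_hat
# (equivalently, scaling every variable by 1/sqrt(h_hat)).
fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()
print(fgls.params)
```

Because $h(x_i)$ is estimated rather than known, the FGLS estimator is no longer unbiased, but it is consistent and asymptotically more efficient than OLS when the assumed variance model is reasonable.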

Summary

- What is heteroskedasticity?
- What happens to OLS estimates under heteroskedasticity?
- How to test for the presence of heteroskedasticity?
- Three methods to deal with heteroskedasticity:
  - heteroskedasticity-robust standard errors
  - Weighted Least Squares (Generalized Least Squares)
  - Feasible Generalized Least Squares