Heteroscedasticity


Pierre Nguimkeu, BUEC 333, Summer 2011 (based on P. Lavergne, lecture notes)

Outline
- Pure versus impure heteroscedasticity
- Consequences and detection
- Remedies

Pure Heteroscedasticity

Homoscedasticity: the variance of the error term is constant.
Heteroscedasticity: the variance of the error terms varies, that is, $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$, $i = 1, \dots, n$. This violates Classical Assumption 5, which states that $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $i = 1, \dots, n$.

Pure heteroscedasticity: the model is well specified, i.e. Classical Assumptions 1, 2 and 3 hold, but there is heteroscedasticity. It occurs in particular:
- in cross-section data, when there is large variation in the dependent variable;
- in time-series data, when there is large variation in the dependent variable over time;
- when the quality of data collection changes a lot across the sample.

Heteroscedasticity: Examples

$R_i$ = rent of renter $i$, $I_i$ = income of renter $i$:
$R_i = \beta_0 + \beta_1 I_i + \varepsilon_i$
It seems sensible to expect that not only the mean of rent but also its variance (or s.d.) increases with income.

$W_i$ = wage of worker $i$, $E_i$ = education level of worker $i$, $X_i$ = experience level of worker $i$:
$W_i = \beta_0 + \beta_1 E_i + \beta_2 X_i + \varepsilon_i$
The mean wage increases with education and experience, but wage dispersion also increases with education and experience.

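Impure Heteroscedasticity

Impure heteroscedasticity is caused by misspecification in the model, i.e. Classical Assumption 1 does not hold.

If $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ but we omit $X_{2i}$, then
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i^*$, where $\varepsilon_i^* = \varepsilon_i + \beta_2 X_{2i}$.
If $X_2$ is relevant, i.e. $\beta_2 \neq 0$, then $\mathrm{Var}(\varepsilon_i^*) = f(X_{2i})$.
We should write
$Y_i = \beta_0^* + \beta_1^* X_{1i} + \varepsilon_i^*$, with $\varepsilon_i^* = \varepsilon_i + (\beta_0 - \beta_0^*) + (\beta_1 - \beta_1^*) X_{1i} + \beta_2 X_{2i}$,
because nothing ensures the coefficients are the same in the two equations.

If $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$ but we specify a linear equation, then
$Y_i = \beta_0^* + \beta_1^* X_{1i} + \varepsilon_i^*$, with $\varepsilon_i^* = \varepsilon_i + f(X_{1i})$.
$\varepsilon_i^*$ depends on $X_{1i}$, so its s.d.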

Consequences
- Estimates remain unbiased.
- OLS is not BLUE in general: it no longer has minimum variance.
- The standard errors are biased.
- t-scores don't have a t-distribution, so confidence intervals and tests are unreliable. t-scores are often too large.
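A small Monte Carlo sketch can make the first and last points concrete. It is not from the original slides, and the design is entirely hypothetical: 50 fixed values of x evenly spaced on [1, 10], a true slope of 2, and pure heteroscedasticity with an error s.d. of 0.1·x². Over 2,000 replications the average OLS slope should stay close to 2. The average conventional standard error, however, should fall noticeably short of the actual spread of the estimates. That shortfall is why t-scores tend to be too large.

```python
import math
import random
import statistics


def slope_and_conventional_se(x, y):
    """OLS slope of y on x and its usual (homoscedasticity-based) standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, math.sqrt(ssr / (n - 2) / sxx)


def monte_carlo(reps=2000, n=50, beta0=1.0, beta1=2.0, seed=333):
    rng = random.Random(seed)
    x = [1 + 9 * i / (n - 1) for i in range(n)]  # fixed regressor, evenly spaced on [1, 10]
    slopes, ses = [], []
    for _ in range(reps):
        # Hypothetical pure heteroscedasticity: the error s.d. is 0.1 * x_i^2.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 0.1 * xi ** 2) for xi in x]
        b1, se = slope_and_conventional_se(x, y)
        slopes.append(b1)
        ses.append(se)
    return statistics.mean(slopes), statistics.stdev(slopes), statistics.mean(ses)


mean_b1, actual_sd, avg_conventional_se = monte_carlo()
print(f"average slope estimate:       {mean_b1:.3f} (true value 2)")
print(f"actual s.d. of the estimates: {actual_sd:.3f}")
print(f"average conventional s.e.:    {avg_conventional_se:.3f}")
```

In this design the conventional standard error understates the true sampling s.d. by roughly a fifth. The gap depends entirely on how the error variance is tied to x.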

Preliminary Checks
- Are there any obvious specification errors? Delay testing for heteroscedasticity until you are confident with your specification.
- Is the dependent variable likely to be afflicted with heteroscedasticity? Look at the range of the dependent variable, previous studies, ...
- Is there any likely factor of heteroscedasticity? Graph the residuals against this variable.
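Graphing is the main tool here. As a numeric companion, the sketch below (Python, standard library only) does three things:
- fits the equation by OLS;
- sorts the residuals by the suspected factor;
- reports the root-mean-square residual in each quartile.

A spread that grows from group to group points to heteroscedasticity related to that factor. The function name `residual_rms_by_group` and the simulated rent/income numbers are illustrative assumptions, not the data behind the slides.

```python
import math
import random


def residual_rms_by_group(x, y, groups=4):
    """Fit y = b0 + b1*x by OLS, sort observations by x, split them into
    `groups` consecutive groups and return the root-mean-square residual
    in each group (a numeric companion to plotting residuals against x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [e for _, e in sorted((xi, yi - b0 - b1 * xi) for xi, yi in zip(x, y))]
    size = n // groups
    rms = []
    for g in range(groups):
        # The last group also takes any leftover observations.
        chunk = resid[g * size:] if g == groups - 1 else resid[g * size:(g + 1) * size]
        rms.append(math.sqrt(sum(e * e for e in chunk) / len(chunk)))
    return rms


# Hypothetical rent/income data whose error s.d. is 5% of income (made-up numbers).
rng = random.Random(7)
income = [rng.uniform(10_000, 100_000) for _ in range(400)]
rent = [5_000 + 0.06 * inc + rng.gauss(0, 0.05 * inc) for inc in income]
spread = residual_rms_by_group(income, rent)
print([round(s) for s in spread])  # residual spread from lowest to highest income quartile
```

The grouped numbers summarize what the residual plot would show. They are no substitute for a formal test such as the Park or White test.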

Rent versus Income

[Figure: Rent of renter vs. Income of renter]

Dependent Variable: RENT
Method: Least Squares
Date: 11/09/09   Time: 17:38
Sample: 1 108
Included observations: 108

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          5455.483      602.7776     9.050573      0.0000
INCOME     0.063568      0.014390     4.417505      0.0000

R-squared            0.155475    Mean dependent var      7718.111
Adjusted R-squared   0.147508    S.D. dependent var      3577.000
S.E. of regression   3302.662    Akaike info criterion   19.06119
Sum squared resid    1.16E+09    Schwarz criterion       19.11086
Log likelihood       -1027.304   F-statistic             19.51435
Durbin-Watson stat   2.012384    Prob(F-statistic)       0.000024

[Figure: RESID01 vs. Income of renter]

The Park Test

Assume that $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$, $i = 1, \dots, n$, for some proportionality factor $Z$ you can observe. Then
$\ln E(\varepsilon_i^2) = \ln \sigma^2 + 2 \ln Z_i$.
The same reasoning applies if $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^{\alpha}$, $i = 1, \dots, n$.
1. Estimate your equation by OLS and get the residuals $e_i$.
2. Run the OLS auxiliary regression $\ln e_i^2 = \alpha_0 + \alpha_1 \ln Z_i + u_i$.
3. Test the significance of $\alpha_1$ with a t-test.
The Park test assumes that there is only one proportionality factor and that you know which one it is. We look at whether the squared residuals are related to $Z$ (all in logs).
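Below is a minimal sketch of the three steps for a simple regression. It assumes a single, strictly positive proportionality factor `z` that is known in advance. The helper names `simple_ols` and `park_test` are hypothetical, and the simulated data are made up:
- in the first sample the error s.d. is proportional to x, so $\alpha_1$ should come out near 2;
- in the second sample the error s.d. is constant.

```python
import math
import random


def simple_ols(x, y):
    """Return (intercept, slope, conventional s.e. of the slope) for y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ssr / (n - 2) / sxx)


def park_test(y, x, z):
    """Park test with one observed proportionality factor z (all z > 0).

    Step 1: OLS of y on x, keep the residuals e_i.
    Step 2: OLS of ln(e_i^2) on ln(z_i).
    Step 3: return alpha_1 and its t-statistic.
    """
    a, b, _ = simple_ols(x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    log_e2 = [math.log(e * e) for e in resid]
    log_z = [math.log(zi) for zi in z]
    _, alpha1, se_alpha1 = simple_ols(log_z, log_e2)
    return alpha1, alpha1 / se_alpha1


rng = random.Random(1)
x = [rng.uniform(1, 10) for _ in range(500)]
y_het = [1 + 2 * xi + rng.gauss(0, xi) for xi in x]  # s.d. proportional to x, so Z = x
y_hom = [1 + 2 * xi + rng.gauss(0, 3) for xi in x]   # constant s.d.
alpha1_het, t_het = park_test(y_het, x, x)
alpha1_hom, t_hom = park_test(y_hom, x, x)
print(f"heteroscedastic: alpha1 = {alpha1_het:.2f}, t = {t_het:.2f}")
print(f"homoscedastic:   alpha1 = {alpha1_hom:.2f}, t = {t_hom:.2f}")
```

With real data, $Z$ would be the single factor you suspect, such as income in the rent example.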

Rent versus Income

[Figure: LOGRESIDSQ vs. Log Income of Renter]

Dependent Variable: LOG(RESID01^2)
Method: Least Squares
Date: 03/12/08   Time: 21:30
Sample: 1 108
Included observations: 108

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             2.083771      2.994793     0.695798      0.4881
LOG(INCOME)   1.216408      0.290830     4.182539      0.0001

R-squared            0.141656    Mean dependent var      14.58387
Adjusted R-squared   0.133559    S.D. dependent var      2.142308
S.E. of regression   1.994121    Akaike info criterion   4.236629
Sum squared resid    421.5110    Schwarz criterion       4.286298
Log likelihood       -226.7780   F-statistic             17.49363
Durbin-Watson stat   1.843326    Prob(F-statistic)       0.000060

What is the outcome of the test?

The White Test

1. Estimate your equation $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ by OLS and get the residuals $e_i$.
2. Run the OLS auxiliary regression
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + u_i$,
that is, regress the squared residuals on all the independent variables, their squares and their cross-products.
3. Test the significance of all coefficients but $\alpha_0$ with an F-test: $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_5 = 0$ against $H_A$: at least one is not 0.

Beware of perfect multicollinearity: if the equation is $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$, regress the squared residuals on an intercept, $X_{1i}$, $X_{1i}^2$, $X_{1i}^3$ and $X_{1i}^4$.
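The procedure can be sketched in plain Python for the two-regressor case above. The `ols` helper and `white_test` are illustrative names, not a library API, and the simulated data are hypothetical. The `ols` helper solves the normal equations by Gauss-Jordan elimination, which is adequate for small, well-scaled examples like this one.

The function reports three numbers:
- the auxiliary $R^2$;
- the F-statistic for $H_0: \alpha_1 = \dots = \alpha_5 = 0$;
- the LM form $nR^2$, which the large-sample version of the test compares with a $\chi^2$ distribution with 5 degrees of freedom.

```python
import random


def ols(columns, y):
    """OLS of y on the given regressor columns (include a column of ones for
    the intercept). Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting; returns (coefficients, residuals)."""
    k, n = len(columns), len(y)
    aug = [[sum(columns[r][i] * columns[c][i] for i in range(n)) for c in range(k)]
           + [sum(columns[r][i] * y[i] for i in range(n))] for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    coef = [aug[r][k] / aug[r][r] for r in range(k)]
    resid = [y[i] - sum(coef[j] * columns[j][i] for j in range(k)) for i in range(n)]
    return coef, resid


def white_test(y, x1, x2):
    """White test for Y = b0 + b1*X1 + b2*X2 + e. Returns (R^2, F, n*R^2) of the
    auxiliary regression of e^2 on X1, X2, X1^2, X2^2 and X1*X2."""
    n = len(y)
    ones = [1.0] * n
    _, e = ols([ones, x1, x2], y)
    e2 = [v * v for v in e]
    aux = [ones, x1, x2, [a * a for a in x1], [b * b for b in x2],
           [a * b for a, b in zip(x1, x2)]]
    _, u = ols(aux, e2)
    mean_e2 = sum(e2) / n
    r2 = 1 - sum(v * v for v in u) / sum((v - mean_e2) ** 2 for v in e2)
    q = len(aux) - 1  # number of restrictions: alpha_1 = ... = alpha_5 = 0
    f_stat = (r2 / q) / ((1 - r2) / (n - q - 1))
    return r2, f_stat, n * r2


rng = random.Random(2011)
n = 1000
x1 = [rng.uniform(1, 5) for _ in range(n)]
x2 = [rng.uniform(1, 5) for _ in range(n)]
y_het = [1 + 0.5 * a - 0.3 * b + rng.gauss(0, a) for a, b in zip(x1, x2)]  # s.d. grows with X1
y_hom = [1 + 0.5 * a - 0.3 * b + rng.gauss(0, 2) for a, b in zip(x1, x2)]  # constant s.d.
_, f_het, _ = white_test(y_het, x1, x2)
_, f_hom, _ = white_test(y_hom, x1, x2)
print(f"F (heteroscedastic data): {f_het:.2f}")
print(f"F (homoscedastic data):   {f_hom:.2f}")
```

In practice a QR-based least-squares routine from a numerical library would be the more robust choice. The hand-written solver only keeps the example dependency-free.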

Rent versus Income

Dependent Variable: RESID01^2
Method: Least Squares
Date: 11/09/09   Time: 17:58
Sample: 1 108
Included observations: 108

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -14296693     9696170.     -1.474468     0.1433
INCOME     1173.190      516.8715     2.269791      0.0253
INCOME^2   -0.009549     0.005590     -1.708341     0.0905

R-squared            0.077555    Mean dependent var      10705585
Adjusted R-squared   0.059985    S.D. dependent var      31078674
S.E. of regression   30132136    Akaike info criterion   37.30747
Sum squared resid    9.53E+16    Schwarz criterion       37.38197
Log likelihood       -2011.603   F-statistic             4.413975
Durbin-Watson stat   1.860540    Prob(F-statistic)       0.014433

What is the outcome of the test?

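Weighted Least Squares

Suppose $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$. Dividing through by $Z_i$ gives
$\frac{Y_i}{Z_i} = \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$,
where $u_i = \varepsilon_i / Z_i$ is such that $\mathrm{Var}(u_i) = \mathrm{Var}(\varepsilon_i)/Z_i^2 = \sigma^2$. Now we can use OLS! But careful: there may be no intercept in the transformed equation. If $Z_i = X_{1i}$,
$\frac{Y_i}{X_{1i}} = \beta_0 \frac{1}{X_{1i}} + \beta_1 + \beta_2 \frac{X_{2i}}{X_{1i}} + u_i$.
The transformation is only used to get the OLS estimates; interpretation relies on the original equation. For example,
$R_i = \beta_0 + \beta_1 I_i + \varepsilon_i$ becomes $\frac{R_i}{I_i} = \beta_0 \frac{1}{I_i} + \beta_1 + \frac{\varepsilon_i}{I_i}$,
and $\beta_1$ is still the marginal effect of income on rent.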
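For the rent example with $Z_i = I_i$, the transformation amounts to regressing $R_i/I_i$ on $1/I_i$ and a constant. The sketch below assumes a single regressor with $\mathrm{Var}(\varepsilon_i) = \sigma^2 X_i^2$ and $X_i > 0$. The function name is hypothetical, and so are the simulated data: true $\beta_0 = 5000$, $\beta_1 = 0.06$, and an error s.d. equal to 5% of income.

Note how the roles swap in the transformed regression:
- the slope on $1/X_i$ estimates $\beta_0$;
- the constant estimates $\beta_1$.

```python
import random


def wls_divide_by_x(x, y):
    """WLS for y_i = b0 + b1*x_i + e_i when Var(e_i) = sigma^2 * x_i^2 and x_i > 0.

    Dividing through by x_i gives y_i/x_i = b0*(1/x_i) + b1 + u_i with
    Var(u_i) = sigma^2, which is estimated by OLS. The slope on 1/x_i
    estimates b0 and the constant estimates b1.
    """
    n = len(x)
    w = [1.0 / xi for xi in x]              # transformed regressor 1/x_i
    r = [yi / xi for xi, yi in zip(x, y)]   # transformed dependent variable y_i/x_i
    w_bar, r_bar = sum(w) / n, sum(r) / n
    sww = sum((wi - w_bar) ** 2 for wi in w)
    b0_hat = sum((wi - w_bar) * (ri - r_bar) for wi, ri in zip(w, r)) / sww
    b1_hat = r_bar - b0_hat * w_bar
    return b0_hat, b1_hat


# Hypothetical data: true b0 = 5000, b1 = 0.06, error s.d. = 5% of income.
rng = random.Random(108)
income = [rng.uniform(10_000, 100_000) for _ in range(1000)]
rent = [5_000 + 0.06 * inc + rng.gauss(0, 0.05 * inc) for inc in income]
b0_hat, b1_hat = wls_divide_by_x(income, rent)
print(f"b0 (slope on 1/income): {b0_hat:.1f}")
print(f"b1 (constant, marginal effect of income): {b1_hat:.4f}")
```

This is the same as weighted least squares with weights $1/X_i^2$. It gives the efficient estimator only if the assumed variance pattern is right.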

Rent versus Income

[Figure: RATIO vs. INVINCOME]

Dependent Variable: RENT/INCOME
Method: Least Squares
Date: 11/09/09   Time: 18:05
Sample: 1 108
Included observations: 108

Variable   Coefficient   Std. Error   t-Statistic   Prob.
1/INCOME   4811.862      322.2745     14.93094      0.0000
C          0.085679      0.016701     5.130303      0.0000

R-squared            0.677746    Mean dependent var      0.291701
Adjusted R-squared   0.674706    S.D. dependent var      0.171429
S.E. of regression   0.097774    Akaike info criterion   -1.793980
Sum squared resid    1.013325    Schwarz criterion       -1.744311
Log likelihood       98.87494    F-statistic             222.9331
Durbin-Watson stat   1.900821    Prob(F-statistic)       0.000000

What is the marginal effect of income on rent?

Redefining the Model

$C_i$ = expenditure in city $i$, $Y_i$ = income in city $i$, $POP_i$ = population in city $i$, $W_i$ = average wage in city $i$:
$C_i = \beta_0 + \beta_1 Y_i + \beta_2 POP_i + \beta_3 W_i + \varepsilon_i$
When estimated by OLS, this formulation gives a large weight to the large cities (see Figure 10.5). It makes sense to consider a specification that redefines the variables relative to the size of the city, i.e.
$\frac{C_i}{POP_i} = \alpha_0 + \alpha_1 \frac{Y_i}{POP_i} + \alpha_2 W_i + u_i$
This new formulation relates per capita consumption to per capita income. There may still be heteroscedasticity.

Heteroscedasticity-Corrected Standard Errors

Instead of using another estimation method or another model, we can use OLS (unbiased and consistent) and correct the standard errors.
Heteroscedasticity-robust standard errors (White standard errors):
- estimate the standard deviation of the OLS coefficients whether there is heteroscedasticity or not;
- are often larger than the OLS standard errors;
- can be used to construct tests and confidence intervals in the usual way;
- work well in large samples;
- are given by EViews, see Options/Heteroscedasticity consistent coefficient covariance.
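The two slope variances for a simple regression are:
- White (HC0): $\sum_i (X_i - \bar X)^2 e_i^2 \big/ \left(\sum_i (X_i - \bar X)^2\right)^2$;
- conventional: $s^2 \big/ \sum_i (X_i - \bar X)^2$.

The sketch below computes both on a tiny made-up data set whose numbers can be checked by hand. The function name is hypothetical. Some packages also multiply the HC0 variance by the small-sample factor $n/(n-k)$ (often called HC1), so reported values may differ slightly from this sketch.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, conventional s.e., White/HC0 robust s.e.) for y = b0 + b1*x + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional: one error variance s^2 = SSR / (n - 2) for every observation.
    conventional = math.sqrt(sum(ei * ei for ei in e) / (n - 2) / sxx)
    # White (HC0): each observation's squared residual stands in for its own variance.
    robust = math.sqrt(sum(d * d * ei * ei for d, ei in zip(dx, e)) / sxx ** 2)
    return b1, conventional, robust


b1, conventional, robust = slope_standard_errors([0, 1, 2, 3], [0, 2, 1, 3])
print(f"{b1:.3f} {conventional:.3f} {robust:.3f}")  # 0.800 0.424 0.180
```

By hand, the residuals are -0.3, 0.9, -0.9 and 0.3. The robust variance is therefore 0.81/25 = 0.0324 (s.e. 0.18), while the conventional s.e. is $\sqrt{0.9/5} \approx 0.424$.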

Rent versus Income

Dependent Variable: RENT
Method: Least Squares
Date: 11/09/09   Time: 17:38
Sample: 1 108
Included observations: 108

Variable   Coefficient   OLS Std. Error   OLS t-Statistic   White Std. Error   White t-Statistic   Prob.
C          5455.483      602.7776         9.050573          403.2469           13.52889            0.0000
INCOME     0.063568      0.014390         4.417505          0.014759           4.307218            0.0000

The White columns come from the output titled "White Heteroskedasticity-Consistent Standard Errors & Covariance". The coefficients and all other statistics are identical in both outputs:

R-squared            0.155475    Mean dependent var      7718.111
Adjusted R-squared   0.147508    S.D. dependent var      3577.000
S.E. of regression   3302.662    Akaike info criterion   19.06119
Sum squared resid    1.16E+09    Schwarz criterion       19.11086
Log likelihood       -1027.304   F-statistic             19.51435
Durbin-Watson stat   2.012384    Prob(F-statistic)       0.000024

OK, the difference is small here for the INCOME coefficient, but not always! Is the sample size large enough?

Log Hourly Wage versus Educ and Age

Dependent Variable: LWAGE
Method: Least Squares
Date: 03/12/08   Time: 21:39
Sample: 1 340
Included observations: 340

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.056965     0.227325     -0.250589     0.8023
EDUC       0.122578      0.013614     9.003560      0.0000
AGE        0.020087      0.002445     8.213880      0.0000

R-squared            0.274597    Mean dependent var      2.424713
Adjusted R-squared   0.270292    S.D. dependent var      0.602183
S.E. of regression   0.514402    Akaike info criterion   1.517162
Sum squared resid    89.17345    Schwarz criterion       1.550947
Log likelihood       -254.9175   F-statistic             63.78458
Durbin-Watson stat   2.139412    Prob(F-statistic)       0.000000

Dependent Variable: RESID01^2
Method: Least Squares
Date: 11/09/09   Time: 18:31
Sample: 1 340
Included observations: 340

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          1.226049      1.302637     0.941206      0.3473
EDUC       -0.067572     0.162368     -0.416164     0.6776
AGE        -0.038086     0.018546     -2.053550     0.0408
EDUC^2     0.002388      0.005441     0.438937      0.6610
AGE^2      0.000419      0.000145     2.894857      0.0040
EDUC*AGE   0.000550      0.000980     0.561242      0.5750

R-squared            0.046380    Mean dependent var      0.262275
Adjusted R-squared   0.032105    S.D. dependent var      0.420260
S.E. of regression   0.413459    Akaike info criterion   1.088974
Sum squared resid    57.09682    Schwarz criterion       1.156544
Log likelihood       -179.1256   F-statistic             3.248895
Durbin-Watson stat   1.892939    Prob(F-statistic)       0.007039

Seems like there is heteroscedasticity!

Log Hourly Wage versus Educ and Age

Dependent Variable: LWAGE
Method: Least Squares
Date: 11/09/09   Time: 18:29
Sample: 1 340
Included observations: 340

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -1.461583     0.345819     -4.226436     0.0000
EDUC       0.122720      0.013108     9.361960      0.0000
AGE        0.094515      0.014381     6.572153      0.0000
AGE^2      -0.000904     0.000172     -5.246208     0.0000

R-squared            0.329518    Mean dependent var      2.424713
Adjusted R-squared   0.323531    S.D. dependent var      0.602183
S.E. of regression   0.495281    Akaike info criterion   1.444314
Sum squared resid    82.42203    Schwarz criterion       1.489360
Log likelihood       -241.5333   F-statistic             55.04395
Durbin-Watson stat   2.090796    Prob(F-statistic)       0.000000

Dependent Variable: RESID02^2
Method: Least Squares
Date: 11/09/09   Time: 18:32
Sample: 1 340
Included observations: 340

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          1.375719      1.340224     1.026485      0.3054
EDUC       -0.127623     0.167053     -0.763969     0.4454
AGE        -0.023028     0.019081     -1.206850     0.2283
EDUC^2     0.003993      0.005598     0.713217      0.4762
AGE^2      0.000188      0.000149     1.262912      0.2075
EDUC*AGE   0.000818      0.001009     0.811230      0.4178

R-squared            0.025134    Mean dependent var      0.242418
Adjusted R-squared   0.010540    S.D. dependent var      0.427649
S.E. of regression   0.425389    Akaike info criterion   1.145866
Sum squared resid    60.43937    Schwarz criterion       1.213436
Log likelihood       -188.7973   F-statistic             1.722212
Durbin-Watson stat   1.981363    Prob(F-statistic)       0.128881

It was likely impure!
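There are three White auxiliary regressions in these notes: rent on income, and the two log-wage specifications. Their reported F-statistics can be reproduced from their R-squared values with $F = \frac{R^2/q}{(1-R^2)/(n-q-1)}$, where $q$ is the number of regressors in the auxiliary regression other than the constant.

The sketch also computes the large-sample LM version $nR^2$. It compares that statistic with the 5% $\chi^2_q$ critical values from standard tables: 5.991 for $q = 2$ and 11.070 for $q = 5$. It is a cross-check of the printed outputs, not part of the original slides.

```python
# Auxiliary-regression results copied from the outputs above: (R-squared, n, q),
# where q is the number of regressors other than the constant.
CASES = {
    "rent on income": (0.077555, 108, 2),
    "lwage on educ, age": (0.046380, 340, 5),
    "lwage on educ, age, age^2": (0.025134, 340, 5),
}

# 5% critical values of the chi-squared distribution, from standard tables.
CHI2_CRIT_5PCT = {2: 5.991, 5: 11.070}


def white_statistics(r2, n, q):
    """F-statistic for 'all auxiliary slopes are zero' and the LM statistic n * R^2."""
    f_stat = (r2 / q) / ((1 - r2) / (n - q - 1))
    return f_stat, n * r2


results = {}
for label, (r2, n, q) in CASES.items():
    f_stat, lm = white_statistics(r2, n, q)
    results[label] = (f_stat, lm)
    verdict = "reject" if lm > CHI2_CRIT_5PCT[q] else "do not reject"
    print(f"{label}: F = {f_stat:.3f}, nR^2 = {lm:.2f} -> {verdict} homoscedasticity at 5%")
```

The F values come out at about 4.414, 3.249 and 1.722, matching the reported F-statistics. The LM version reaches the same conclusions as the reported Prob(F-statistic) values (0.014, 0.007 and 0.129).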