Agricultural and Applied Economics 637 Applied Econometrics II


Assignment 1: Review of GLS, Heteroskedasticity and Autocorrelation (Due: Feb. 4, 2011)

In this assignment you are asked to develop relatively simple estimation procedures that account for error terms in a linear regression model that are either heteroskedastic or autocorrelated. In answering these questions you are expected to develop the necessary R code for estimation and hypothesis testing. Please hand in all code developed for this assignment and used in parameter estimation or hypothesis testing, along with the resulting output files.

1. (65 pts) In the data section of the class website is a data series summarizing the U.S. copper industry from 1971 until 2000. The following table summarizes the variables contained in that data set. (Hint: The listing does not follow the order in which the variables are stored in the data set. Feel free to validate your R-generated results against any canned program you have used in the past.)

Description of Variables in the Copper Market Dataset

  Variable   Description                                                  Units
  CP         Twelve-Month Average U.S. Domestic Price of Copper           Cents/lb.
  AP         Twelve-Month Average Price of Aluminum                       Cents/lb.
  INDUS      Twelve-Month Average Index of Industrial Production
  GNP        Annual Gross National Product                                Bil. $
  LONDON     Twelve-Month Average London Metal Exchange Price of Copper   Pounds Sterling
  HOUSE      Number of Housing Starts per Year                            1,000 Units
  YEAR       Year of Analysis                                             #

a. (20 pts) Using the above dataset, estimate the following regression model via the Classical Regression Model (CRM) method:

  ln(CP) = β_0 + β_1 ln(INDUS) + β_2 ln(LONDON) + β_3 ln(HOUSE) + β_4 ln(AP) + ε,   [1.1]

where ε_t ~ (0, σ²).

Modify the R CRM procedure we reviewed in our workshop so that, in addition to the typical regression summary statistics (coefficient estimates, coefficient standard errors, equation F-statistic, etc.), it automatically calculates and displays (i) the Durbin-Watson statistic, (ii) the ρ value assuming AR(1), (iii) the asymptotic standard normal test statistic for AR(1) based on your estimate of ρ, and (iv) the Lagrange Multiplier (LM) test statistic for the AR(1) process.[1] What is the DW statistic for this model? Does this statistic indicate the presence of an AR(1) error structure? Using an asymptotic test, what does the estimated ρ value indicate in terms of the presence of autocorrelation? What does the LM test statistic indicate with respect to the presence of autocorrelation?

b. (5 pts) Assuming there is autocorrelation, modify the above code to present not only the traditional CRM parameter estimates and associated biased CRM parameter standard errors but also the inefficient but unbiased CRM parameter standard errors.[2] Are there any major differences in the standard error estimates?

c. (5 pts) Use the CRM procedure developed in (b) to undertake hypothesis tests of the role of the level of industrial production and of housing starts on the level of copper prices (individually), using the unbiased (but possibly inefficient) standard error estimates. Do your results make sense?

d. (20 pts) Extend the CRM procedure you developed above into a new procedure that undertakes a non-iterative, two-step Feasible Generalized Least Squares (FGLS) estimation assuming that an AR(1) process does exist. The file ar_1_general_algorithm.pdf contains, in words, a general algorithm for undertaking FGLS estimation of the AR(1) model.
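The part (a) diagnostics can be sketched in R as follows. This is a minimal illustration on simulated data; the data-generating values and the simplified LM form T·ρ̂² are assumptions for illustration, so substitute the copper-market variables and the LM_auto.ppt formulation in your actual code.

```r
# Minimal sketch of the part (a) diagnostics on simulated data.
# The data and the simple LM form T*rho^2 are illustrative assumptions.
set.seed(42)
T <- 30
X <- cbind(1, matrix(rnorm(T * 4), T, 4))      # intercept + 4 regressors
y <- X %*% c(1, 0.5, -0.3, 0.2, 0.1) + rnorm(T)

b.crm <- solve(t(X) %*% X, t(X) %*% y)         # CRM (OLS) estimates
e     <- as.vector(y - X %*% b.crm)            # CRM residuals

dw      <- sum(diff(e)^2) / sum(e^2)           # (i)  Durbin-Watson statistic
rho     <- sum(e[-1] * e[-T]) / sum(e[-T]^2)   # (ii) AR(1) rho-hat from e_t on e_{t-1}
z.stat  <- rho * sqrt(T)                       # (iii) Z ~ N(0,1) under H0: rho = 0
lm.stat <- T * rho^2                           # (iv) simple LM form, chi-sq(1) under H0
```

The Z-statistic uses the 1/T null variance described in footnote [1]; lm.stat is compared with a χ²(1) critical value.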
[1] Under suitable conditions, it can be shown that ρ̂ will be approximately normally distributed with mean ρ and variance (1-ρ²)/T. If the null hypothesis that ρ = 0 is true, the variance becomes 1/T; you can therefore use a Z-statistic to test the above hypothesis. With respect to the Lagrange Multiplier statistic, refer to the file LM_auto.ppt for more information.

[2] In general, when the error terms are autocorrelated or heteroscedastic, the correct formula for the CRM parameter covariance matrix is not σ²(X'X)⁻¹ but rather

  Σ_β = (X'X)⁻¹ X'ΦX (X'X)⁻¹ = σ² (X'X)⁻¹ X'ΨX (X'X)⁻¹,

where Φ is the error covariance matrix, Φ = σ²Ψ, σ² is a scalar, Ψ is a (T x T) symmetric matrix, and T is the number of observations.
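One possible shape for the part (d) procedure is the hedged sketch below, which follows the flow-chart steps (CRM, estimate ρ, build the P matrix with P'P = Ψ⁻¹, run OLS on the transformed data). The internals, such as how ρ is estimated, are illustrative assumptions rather than the required implementation; only the function name and return names come from the assignment.

```r
# Sketch of a two-step FGLS AR(1) procedure matching the assignment's
# calling convention. Internal details are illustrative assumptions.
EstimateAR1 <- function(rhsvar, depend) {
  X <- cbind(1, rhsvar)                        # prepend a vector of ones
  y <- as.vector(depend)
  T <- nrow(X)
  b.crm <- solve(t(X) %*% X, t(X) %*% y)       # step 1: CRM estimates
  e     <- as.vector(y - X %*% b.crm)
  dw    <- sum(diff(e)^2) / sum(e^2)
  rho   <- sum(e[-1] * e[-T]) / sum(e[-T]^2)   # step 2: CRM of e_t on e_{t-1}
  P <- diag(T)                                 # step 3: build P with P'P = Psi^{-1}
  P[1, 1] <- sqrt(1 - rho^2)
  for (t in 2:T) P[t, t - 1] <- -rho
  Xs <- P %*% X                                # step 4: transform the data
  ys <- P %*% y
  bgls <- solve(t(Xs) %*% Xs, t(Xs) %*% ys)    # step 5: FGLS = OLS on transformed data
  s2   <- sum((ys - Xs %*% bgls)^2) / (T - ncol(X))
  list(ret.bgls = bgls, ret.cov.bgls = s2 * solve(t(Xs) %*% Xs),
       ret.dw.stat = dw, ret.rho.hat = rho)
}

# Example call on simulated AR(1) data:
set.seed(1)
T <- 40
x1 <- rnorm(T); x2 <- rnorm(T)
u  <- as.vector(filter(rnorm(T), 0.6, method = "recursive"))  # AR(1) errors
y  <- 2 + 0.5 * x1 - 0.3 * x2 + u
output.estimatear1 <- EstimateAR1(cbind(x1, x2), y)
```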

The following flow chart parallels this description:

Flow Chart for AR(1) GLS Code

  Step 1: Obtain CRM estimates and the CRM residuals e_s.
  Step 2: Run a CRM of e_{s,t} = f(e_{s,t-1}); estimate ρ and the Durbin-Watson statistic.
  Step 3: Build the Ψ⁻¹ (or P) matrix.
  Step 4: Estimate the FGLS parameters: β_G = (X'Ψ⁻¹X)⁻¹ X'Ψ⁻¹y.

where

  Ψ = 1/(1-ρ²) x
      [ 1         ρ         ρ²        ...  ρ^(T-1) ]
      [ ρ         1         ρ         ...  ρ^(T-2) ]
      [ ρ²        ρ         1         ...  ρ^(T-3) ]
      [ ...       ...       ...       ...  ...     ]
      [ ρ^(T-1)   ρ^(T-2)   ρ^(T-3)   ...  1       ]

and

  P = [ sqrt(1-ρ²)   0    0   ...   0   0 ]
      [ -ρ           1    0   ...   0   0 ]
      [  0          -ρ    1   ...   0   0 ]
      [ ...         ...  ...  ...  ...  ...]
      [  0           0    0   ...  -ρ   1 ]

with P'P = Ψ⁻¹.

The R AR(1) procedure you develop should enable you to estimate an AR(1) model via an invoking command that looks something like the following:

  output.estimatear1 <- EstimateAR1(rhsvar, depend)
  out.betagls     <- output.estimatear1$ret.bgls
  out.cov.betagls <- output.estimatear1$ret.cov.bgls
  out.dwstat.stat <- output.estimatear1$ret.dw.stat
  out.rhoest.hat  <- output.estimatear1$ret.rho.hat

The procedure EstimateAR1 is a function that you define and that takes two arguments: rhsvar, the matrix of explanatory variables (which may or may not include a vector of ones, depending on how you define your procedure), and depend, a vector that identifies your dependent variable. There are four returns from this procedure: the vector of estimated coefficients, bgls; the GLS parameter covariance matrix, cov.bgls; an estimate of the DW statistic calculated from the CRM residuals, dw.stat; and an estimate of ρ, rho.hat. Make sure that your AR procedure reports the CRM results (including both the traditional but incorrect CRM parameter covariance matrix and the modified but inefficient CRM covariance matrix under AR(1)), the results of your AR(1) test, and the resulting GLS estimates, standard errors, t-values, etc.

[NOTE: As with the development of your CRM procedure, you should design your procedure for use with any data set. The matrices rhsvar and depend are defined in your R code before you call the function and can be named anything you want. For example, in another application you might call your AR(1) procedure via the following:

  output.estimatear1 <- EstimateAR1(cbind(age, income, kids), foodexp)

That is, your procedure knows that the 1st argument to the EstimateAR1 function call is the matrix of exogenous variables and the 2nd argument is the endogenous variable whose behavior we are attempting to explain.]

Apply your AR(1) procedure to the copper market dataset and obtain feasible generalized least squares [AR(1)] estimates of the parameters of relationship [1.1]. Report your GLS regression results and compare them to those obtained under the CRM.

e. (10 pts) Calculate and display the squared correlation between the predicted and actual values of the U.S. copper price (CP, not its logarithm) under the GLS-based model. (Note: Remember the functional form. What does the non-logarithmic version look like with the error term? What is E[exp(ε)]? Note that under the CRM specification E(ε_t) = 0, but E[exp(ε_t)] may not equal 1.0; in fact E[exp(ε_t)] = exp(σ²/2).)

f. (5 pts) Undertake a joint hypothesis test that housing starts and the world aluminum price have no effect on domestic copper prices, using the FGLS results. What are the results of your joint test?

2. (45 pts) As you reviewed in AAE 636, there are a variety of ways to control for the effect of heteroskedasticity in the estimation of a linear regression model.
A common approach is referred to as the multiplicative heteroskedasticity specification. Under this approach we have y_t = X_t β + ε_t, where E(ε_t²) = σ_t² = exp(Z_t α), Z_t is a (1 x S) vector containing the t-th observation on S nonstochastic explanatory variables, and α is an (S x 1) parameter vector. Thus the error variance changes across observations, the error covariances across observations are 0, and the variances depend on a set of exogenous variables. The following provides a method for obtaining FGLS estimates of the unknown parameters under this structure:

(i) Use the CRM to obtain consistent estimates of the error terms. That is, β_s = (X'X)⁻¹X'y continues to be a consistent estimator of β even with multiplicative heteroskedasticity, which implies e_s = y - Xβ_s is a consistent estimate of the true, unknown error vector.

(ii) ln(σ_t²) = Z_t α, given σ_t² = exp(Z_t α)

(iii) ln(e_{s,t}²) + ln(σ_t²) = ln(e_{s,t}²) + Z_t α

(iv) ln(e_{s,t}²) = Z_t α + ν_t, where ν_t ≡ ln(e_{s,t}²) - ln(σ_t²)

(v) α_s = (Z'Z)⁻¹ Z' ln(e_s²)

That is, one can treat (iv) as a traditional linear regression model in which ln(e_{s,t}²) is the dependent variable. It can be shown that in (v) α_{s0}, the intercept contained within the α_s vector, is an inconsistent estimator of the intercept term, with an asymptotic inconsistency of -1.2704; a consistent estimator of the intercept can therefore be obtained by calculating α_{s0} + 1.2704. Also, the matrix 4.9348(Z'Z)⁻¹ can be used to approximate the (S x S) covariance matrix of α_s.

Testing this model against one with homoscedastic errors is equivalent to testing the null hypothesis H_0: α* = 0 against the alternative H_1: α* ≠ 0, where α* is the ((S-1) x 1) parameter vector that excludes the intercept term. Let D be the matrix (Z'Z)⁻¹ with its first row and column deleted (after inversion). This implies that α*_s ~ N(α*, 4.9348 D). We can use this estimated covariance matrix of α*_s to test H_0: α* = 0 via the following statistic, which has a χ² distribution with (S-1) df:

  (1/4.9348) α*_s' D⁻¹ α*_s ~ χ²(S-1).
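Steps (i) through (v) and the χ² test can be sketched on simulated data as follows; the data-generating values and variable names are illustrative assumptions, not part of the assignment.

```r
# Sketch of the multiplicative-heteroskedasticity two-step estimator of
# alpha and the chi-square test above, on simulated data.
set.seed(7)
T <- 200
Z <- cbind(1, rnorm(T), rnorm(T))            # S = 3 variance regressors (with intercept)
X <- cbind(1, rnorm(T))
sigma.t <- sqrt(exp(Z %*% c(0.5, 0.8, 0)))   # sigma_t^2 = exp(Z_t alpha), illustrative alpha
y <- X %*% c(1, 2) + rnorm(T) * sigma.t

b.s <- solve(t(X) %*% X, t(X) %*% y)           # (i) consistent CRM estimates
e.s <- as.vector(y - X %*% b.s)
a.s <- solve(t(Z) %*% Z, t(Z) %*% log(e.s^2))  # (v) regress ln(e_s^2) on Z
a.s[1] <- a.s[1] + 1.2704                      # consistent intercept correction

D      <- solve(t(Z) %*% Z)[-1, -1]            # drop first row/column after inversion
a.star <- a.s[-1]                              # alpha* excludes the intercept
chi2   <- as.numeric(t(a.star) %*% solve(D) %*% a.star) / 4.9348  # ~ chi-sq(S-1) under H0
```

chi2 is compared with a χ²(S-1) critical value, e.g. qchisq(0.95, 2) here.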

Given the above, the following flow chart shows how one can estimate the parameters of a model characterized by multiplicative heteroscedasticity using a two-step FGLS approach:

Flow Chart for Multiplicative Heteroscedasticity GLS Code

  Step 1: Obtain CRM estimates and the CRM residuals e_s.
  Step 2: Run a CRM of ln(e_{s,t}²) = f(Z_t); estimate α.
  Step 3: Build the diagonal Ψ⁻¹ matrix.
  Step 4: Estimate the FGLS parameters: β_G = (X'Ψ⁻¹X)⁻¹ X'Ψ⁻¹y.

where

  Φ = σ²Ψ = diag[ exp(Z_1 α), exp(Z_2 α), ..., exp(Z_T α) ]

and, up to the scalar σ² (which does not affect the GLS estimates),

  Ψ⁻¹ = diag[ exp(-Z_1 α), exp(-Z_2 α), ..., exp(-Z_T α) ].

(a) (10 pts) On the class website is a dataset that provides data on 81 cars: average miles per gallon (MPG), engine horsepower (HP), cubic feet of cab space (VOL), top speed in miles per hour (SP) and vehicle weight (WGT). Estimate the following model via the CRM:

  MPG = β_0 + β_1 SP + β_2 HP + β_3 WGT + ε, where ε_t ~ (0, σ²).

Would you expect the error variance from the above to be heteroskedastic? Why or why not? Use White's test to determine whether the error variance is heteroskedastic. Using Excel or whatever software you feel comfortable with, generate a scatter plot of the vector of errors obtained from the above model against the HP and VOL variables.
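One common way to compute White's test is T·R² from an auxiliary regression of the squared residuals on the regressors, their squares, and their cross products, sketched here on simulated stand-ins for the car data (the real file is on the class website; this auxiliary-regression form is a standard choice, not necessarily the exact version covered in class).

```r
# Sketch of White's test: T*R^2 from the auxiliary regression of e^2 on
# levels, squares and cross products. SP, HP, WGT are simulated stand-ins.
set.seed(3)
T <- 81
sp <- rnorm(T); hp <- rnorm(T); wgt <- rnorm(T)
mpg <- 30 - 1.5 * sp - 2 * hp - wgt + rnorm(T) * exp(0.4 * hp)

X <- cbind(1, sp, hp, wgt)
e <- as.vector(mpg - X %*% solve(t(X) %*% X, t(X) %*% mpg))  # CRM residuals

W <- cbind(1, sp, hp, wgt, sp^2, hp^2, wgt^2,
           sp * hp, sp * wgt, hp * wgt)            # auxiliary regressors
g  <- solve(t(W) %*% W, t(W) %*% e^2)
r2 <- 1 - sum((e^2 - W %*% g)^2) / sum((e^2 - mean(e^2))^2)
white.stat <- T * r2                               # compare to qchisq(0.95, ncol(W) - 1)
```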

(b) (5 pts) Modify your CRM procedure to calculate White's heteroskedasticity-consistent estimates of your parameter standard errors, as made available in Workshop #3. How do the White standard errors compare with the traditional CRM standard errors?

(c) (20 pts) Assume that, if you have heteroskedastic errors, you will account for them via the multiplicative heteroskedasticity specification as outlined on pages 170-171 of Greene and pages 365-369 of Ch. 9 in JHGLL. Also assume that the error variance is affected by the variables VOL and HP. Using the above flowchart as a template, modify the CRM procedure used to answer (a) into a two-step estimation procedure that calculates consistent parameter estimates of the multiplicative heteroskedasticity model, along with the standard errors associated with the error-variance component of the model. Is there statistical evidence that we have multiplicative heteroskedasticity? (Hint: Use the χ² test noted above.) In modifying this procedure you should make it as general as possible so that it can be used with any linear regression model in which heteroskedasticity is assumed. That is, the call to the multiplicative heteroskedasticity procedure should look something like this:

  output.estimategls <- EstimateGls(rhsvar, depend, rhsvar2)
  out.bolsls       <- output.estimategls$ret.bls
  out.corrols.covb <- output.estimategls$ret.corr.covb
  out.bglsest      <- output.estimategls$ret.bgls
  out.gls.estcovb  <- output.estimategls$ret.gls.covb

There are four returns from this function: bls, the CRM estimated coefficients; corr.covb, the correct but inefficient CRM parameter covariance matrix accounting for heteroskedasticity; bgls, the FGLS parameter estimates; and gls.covb, the FGLS parameter covariance matrix.
And note there are 3 inputs to this function: rhsvar, the matrix of exogenous variables used in the regression; depend, the endogenous variable vector; and rhsvar2, the matrix of exogenous variables used to explain the error variance.

(d) (5 pts) Given the above error structure, modify the original CRM code further to correctly calculate CRM standard errors when we have multiplicative heteroskedasticity, where the CRM parameter covariance matrix is not σ²(X'X)⁻¹. How do these standard errors compare with the traditional CRM formulation and with White's heteroskedasticity-consistent standard errors? How do the t-statistics obtained under the FGLS model compare with the values correctly calculated under the CRM and with White's standard errors?

(e) (5 pts) Does the GLS-based regression explain a significant amount of the variability of the MPG variable? Provide a statistic to back up your answer. (Hint: By the time you answer (c) you should have an R program that can estimate the parameters of the multiplicative heteroskedasticity model for any size regression model, calculate a number of regression statistics, and undertake a variety of hypothesis tests.)
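For parts (b) and (d), the two covariance corrections can be sketched as below: White's heteroskedasticity-consistent covariance (the HC0 form is assumed here) and the corrected CRM sandwich (X'X)⁻¹X'ΦX(X'X)⁻¹ with an estimated diagonal Φ. The data are simulated for illustration.

```r
# Sketch of White's HC0 covariance (part b) and the corrected CRM
# covariance under an estimated diagonal Phi (part d), on simulated data.
set.seed(11)
T <- 100
Z <- cbind(1, rnorm(T))                      # variance regressors (illustrative)
X <- cbind(1, rnorm(T), rnorm(T))
y <- X %*% c(1, 0.5, -0.5) + rnorm(T) * sqrt(exp(Z %*% c(0.2, 0.9)))

XtXi <- solve(t(X) %*% X)
b <- XtXi %*% t(X) %*% y                     # CRM estimates
e <- as.vector(y - X %*% b)

# White (HC0): replace sigma^2 * I with diag(e_t^2) in the sandwich
cov.white <- XtXi %*% t(X) %*% (X * e^2) %*% XtXi   # X * e^2 scales row t by e_t^2
se.white  <- sqrt(diag(cov.white))

# Part (d): sandwich with Phi-hat = diag(exp(Z_t alpha-hat))
a <- solve(t(Z) %*% Z, t(Z) %*% log(e^2))
a[1] <- a[1] + 1.2704                        # consistent intercept correction
phi.hat  <- as.vector(exp(Z %*% a))
cov.corr <- XtXi %*% t(X) %*% (X * phi.hat) %*% XtXi
se.corr  <- sqrt(diag(cov.corr))
```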