ECON 312 FINAL PROJECT


JACOB MENICK

1. Introduction

When doing statistics with cross-sectional data, it is common to encounter heteroskedasticity. The cross-sectional econometrician can detect heteroskedasticity with a Lagrange multiplier test such as the Breusch-Pagan test. While there are considerations to be made about the form of the variance function for the error term, the Breusch-Pagan test is always conducted in the same general form: regress the squared residuals on some function of the regressors and compare the test statistic (NR^2) to a critical value of the χ^2 distribution.

While heteroskedasticity can be a problem in both cross-sectional and time-series data, serial correlation is an issue unique to the time-series domain. Spatial autocorrelation, an analogous issue in the cross-sectional domain, is not as ubiquitous as the ever-present autocorrelation problem in time-series regression. Detecting autocorrelation, however, is less straightforward than detecting its cross-sectional relative, heteroskedasticity. This is because the auxiliary regression in the analogous Lagrange multiplier test contains lagged values of the residuals, and, consequently, the economist must make a choice about how to deal with the observations for which lagged values of the residuals do not exist.

1.1. An Illustration of the Problem. Consider a time-series process with an AR(1) error term:

(1) y_t = α + β x_t + u_t, where u_t = ρ u_{t-1} + ɛ_t and ɛ_t is white noise

To test for first-order autocorrelation using Lagrange multipliers, we take the residuals from equation (1), û_t, and regress them on û_{t-1} and the explanatory variables from the original regression:

(2) û_t = ρ̂ û_{t-1} + β x_t + ɛ_t

H_0: no autocorrelation; under H_0, T_0 R^2 ~ χ^2(p)
H_1: autocorrelation is present
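The two-step procedure in equations (1) and (2) can be sketched numerically. The paper's own simulations are written in Stata; the following is a minimal Python sketch with arbitrary illustrative parameter values, computing the LM statistic T_0 R^2 after dropping the first observation (option (2) below).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y_t = alpha + beta*x_t + u_t with AR(1) errors u_t = rho*u_{t-1} + eps_t.
# The parameter values here are arbitrary, chosen only for illustration.
T, alpha, beta, rho = 100, 17.0, 3.0, 0.4
x = rng.normal(5, 2, size=T)
eps = rng.normal(0, 1, size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]
y = alpha + beta * x + u

def resid_and_r2(X, z):
    """OLS of z on X; return the residuals and the R-squared."""
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ b
    tss = np.sum((z - z.mean()) ** 2)
    return resid, 1 - np.sum(resid ** 2) / tss

# Step 1: residuals u-hat from the original regression of y on a constant and x.
X = np.column_stack([np.ones(T), x])
uhat, _ = resid_and_r2(X, y)

# Step 2: regress u-hat on its first lag and the original regressors,
# dropping the first observation (for which the lag does not exist).
X_aux = np.column_stack([np.ones(T - 1), x[1:], uhat[:-1]])
_, r2 = resid_and_r2(X_aux, uhat[1:])

T0 = T - 1
lm = T0 * r2          # compare with the chi-square(1) 5% critical value, 3.841
print(f"LM statistic: {lm:.2f}")
```

With ρ = .4 the statistic will usually exceed the critical value, mirroring the high detection rates reported later in the paper's tables.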

For the Breusch-Godfrey Lagrange multiplier test, our test statistic is T_0 R^2, where T_0 is the number of time periods, T, minus the first p observations, and R^2 is the coefficient of determination from the regression in equation (2). It makes sense that if the lagged residuals do little to explain the current residuals, then R^2 will be small and we will fail to reject H_0 that there is no serial correlation. If they explain a lot of the variation in the current residuals, then R^2 will be large and we will reject H_0.

In order to run this regression, though, we must decide what to do with the first observation on û. When testing for order-p autocorrelation, we must decide what to do with the first p observations, because the regression will have p lags of the residuals. According to Parker [1], there are two options at our disposal:

(1) replace û_1, û_2, ..., û_p with zeroes (their expected value) and use T_0 = T;
(2) drop û_1, û_2, ..., û_p and use T_0 = T - p.

Of the two, there is no clear favorite, and econometrics textbooks offer little guidance on how to pick between these options. Even Trevor Breusch himself neglects to advise statisticians on this choice [2]. Stata chooses option (1) by default, but altering our data in any way can feel somewhat sacrilegious, and dropping observations is a waste of information that can be valuable, especially when the sample size is small.

1.2. A Course of Inquiry. This paper attempts to determine whether either of the two options for the Breusch-Godfrey test does a better job of detecting autocorrelation across various error structures and types of variables included in the regression. It does not assess the coefficient estimates in the presence of autocorrelation; only the comparative quality of options (1) and (2) mentioned in section 1.1.

2. The Data

I conducted a Monte Carlo simulation on a dataset I created.
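To make the two options concrete, here is a Python sketch (again, a stand-in for the paper's Stata code, with a hypothetical helper name) that computes the Breusch-Godfrey statistic both ways from the same residuals:

```python
import numpy as np

def bg_statistic(uhat, X, p, fill_zero):
    """Breusch-Godfrey LM statistic T0*R^2 for order-p serial correlation.

    fill_zero=True  -> option (1): replace the p missing lagged residuals
                       with zeroes and use T0 = T.
    fill_zero=False -> option (2): drop the first p observations and use
                       T0 = T - p.
    """
    T = len(uhat)
    # Build the p lag columns, padding the missing leading values with 0.
    lags = np.column_stack(
        [np.concatenate([np.zeros(j), uhat[:T - j]]) for j in range(1, p + 1)]
    )
    if fill_zero:
        z, X_aux, T0 = uhat, np.column_stack([X, lags]), T
    else:
        z, X_aux, T0 = uhat[p:], np.column_stack([X[p:], lags[p:]]), T - p
    b, *_ = np.linalg.lstsq(X_aux, z, rcond=None)
    resid = z - X_aux @ b
    r2 = 1 - np.sum(resid ** 2) / np.sum((z - z.mean()) ** 2)
    return T0 * r2

# Toy illustration with white-noise "residuals", so both statistics should
# usually fall below the chi-square(1) 5% critical value of 3.841.
rng = np.random.default_rng(1)
T = 100
X = np.column_stack([np.ones(T), rng.normal(5, 2, T)])
uhat = rng.normal(0, 1, T)
print(bg_statistic(uhat, X, 1, fill_zero=True),
      bg_statistic(uhat, X, 1, fill_zero=False))
```

The only differences between the two branches are the zero-padding of the lag columns and the value of T_0, which is exactly the choice the paper investigates.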
I created three variables, denoted x1_t, x2_t, and x3_t, each drawn from a normal distribution with arbitrarily chosen mean and variance. Summary statistics for these three variables can be found in Table 1. The data were randomly sampled as follows:

x1_t ~ N(5, 2)
x2_t ~ N(1, 0.8)
x3_t ~ N(2.4, 0.3)

There are 100 observations on each variable, thought of as a time series.

2.1. The Data-Generating Process (DGP). Numerous data-generating processes are used so that we can compare the performances of the two versions of the Breusch-Godfrey test under various circumstances. The general form of the DGP is as follows:

(3) y_t = α + β_1 x1_t + β_2 x2_t + β_3 x3_t + θ_1 y_{t-1} + u_t,
where u_t = ρ_1 u_{t-1} + ρ_2 u_{t-2} + ... + ρ_p u_{t-p} + ɛ_t and ɛ_t is white noise

3. The Simulation

In total, I ran 72 simulations, each with 1000 trials. There were four qualities of the DGP structure that I varied:

(1) the number of explanatory variables included (one to three x's);
(2) the presence of a lagged dependent variable;
(3) the order p of the AR(p) process of the disturbance term u_t, with p ∈ {1, 2};
(4) the magnitudes of ρ_1 and ρ_2, with each ρ_i ∈ [0, 1) for stationarity.

In each of the 72 cases, I compared the success rate of the Breusch-Godfrey test between the two practices of dealing with the first p observations. The two versions of the test were conducted in two separate simulations. Each success rate was determined by the following algorithm:

- simulate y according to the DGP;
- regress y on the relevant variables;
- run the Breusch-Godfrey test for the known order of serial correlation in the DGP;
- store the χ^2 test statistic from each trial;
- divide the number of significant χ^2 statistics by the number of trials (1000).

The 72 simulations can be broken down into four categories:

(1) AR(1) error process, no lagged dependent variable included
(2) AR(2) error process, no lagged dependent variable included
(3) AR(1) error process, lagged dependent variable included
(4) AR(2) error process, lagged dependent variable included
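The success-rate algorithm above can be sketched in Python. This is a simplified re-implementation, not the paper's Stata do-files, shown here for the simplest case (AR(1) errors, one regressor, no lagged y) at the 5% level:

```python
import numpy as np

CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991}  # chi-square critical values, 5% level

def rejection_rate(rho1, T=100, trials=200, fill_zero=True, seed=0):
    """Fraction of Monte Carlo trials in which the order-1 BG test rejects H0.

    fill_zero picks between option (1) (zeroes, T0 = T) and
    option (2) (drop the first observation, T0 = T - 1).
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        # Simulate y according to the DGP y_t = 17 + 3*x1_t + u_t,
        # u_t = rho1*u_{t-1} + eps_t.
        x = rng.normal(5, 2, T)
        eps = rng.normal(0, 1, T)
        u = np.zeros(T)
        for t in range(1, T):
            u[t] = rho1 * u[t - 1] + eps[t]
        y = 17 + 3 * x + u
        # Regress y on a constant and x; keep the residuals.
        X = np.column_stack([np.ones(T), x])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        uhat = y - X @ b
        # Auxiliary regression of uhat on X and its first lag.
        if fill_zero:
            z = uhat
            X_aux = np.column_stack([X, np.concatenate([[0.0], uhat[:-1]])])
            T0 = T
        else:
            z = uhat[1:]
            X_aux = np.column_stack([X[1:], uhat[:-1]])
            T0 = T - 1
        b2, *_ = np.linalg.lstsq(X_aux, z, rcond=None)
        resid = z - X_aux @ b2
        r2 = 1 - np.sum(resid ** 2) / np.sum((z - z.mean()) ** 2)
        if T0 * r2 > CHI2_CRIT_5PCT[1]:
            rejections += 1
    return rejections / trials

# With rho1 = .4 the tables in Appendix B report success rates near .96;
# this sketch should land in the same neighborhood.
print(rejection_rate(0.4, fill_zero=True), rejection_rate(0.4, fill_zero=False))
```

Extending the sketch to AR(2) errors or a lagged dependent variable only changes the simulated DGP and the number of lag columns in the auxiliary regression.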

In each of the four categories, eighteen simulations were conducted: each model had either one, two, or three regressors (not including the lagged dependent variable), and six different values of ρ were used for each number of regressors.

3.1. No Lagged Dependent Variable Present in Regression. The first group of simulations does not include y_{t-1} as a regressor.

3.1.1. u_t as an AR(1) Process. For these models, u_t = ρ_1 u_{t-1} + ɛ_t.

DGPs 1-18:

y_t = 17 + 3 x1_t + ρ_1 u_{t-1} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + ρ_1 u_{t-1} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + .8 x3_t + ρ_1 u_{t-1} + ɛ_t

where ρ_1 varies from .1 to .6 in increments of .1, for each of the three models.

Table: found in Appendix B, Table 2. Stata output: found in Appendix C, Figure 1.

For this group of simulations, the two forms of the test were virtually indistinguishable in their ability to detect autocorrelation. The Breusch-Godfrey test with the nomiss0 option (referred to earlier as option (2)) detected autocorrelation on average 72.5% of the time, versus roughly 72% when the first observation was replaced with a zero. A difference of about half a percentage point does not seem large enough to be conclusive.

3.1.2. u_t as an AR(2) Process. For these models, u_t = ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t.

DGPs 19-36:

y_t = 17 + 3 x1_t + ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + .8 x3_t + ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t

where ρ_1 = .2 and ρ_2 varies from .1 to .6 in increments of .1, for each of the three models.

Table: found in Appendix B, Table 3. Stata output: found in Appendix C, Figure 2.

This group of simulations shows no distinguishable difference between the two forms of the test.

3.2. Lagged Dependent Variable Present in Regression. The second group of simulations does include y_{t-1} as a regressor.

3.2.1. u_t as an AR(1) Process. For these models, u_t = ρ_1 u_{t-1} + ɛ_t.

DGPs 37-54:

y_t = 17 + 3 x1_t + .4 y_{t-1} + ρ_1 u_{t-1} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + .4 y_{t-1} + ρ_1 u_{t-1} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + .8 x3_t + .4 y_{t-1} + ρ_1 u_{t-1} + ɛ_t

where ρ_1 varies from .1 to .6 in increments of .1, for each of the three models.

Table: found in Appendix B, Table 4. Stata output: found in Appendix C, Figure 3.

This group of models also shows only a small difference between the two versions of the test: the test with the nomiss0 option performs on average only about one percentage point better.

3.2.2. u_t as an AR(2) Process. For these models, u_t = ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t.

DGPs 55-72:

y_t = 17 + 3 x1_t + .4 y_{t-1} + ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + .4 y_{t-1} + ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t
y_t = 17 + 3 x1_t + .4 x2_t + .8 x3_t + .4 y_{t-1} + ρ_1 u_{t-1} + ρ_2 u_{t-2} + ɛ_t

where ρ_1 = .2 and ρ_2 varies from .1 to .6 in increments of .1, for each of the three models.

Table: found in Appendix B, Table 5. Stata output: found in Appendix C, Figure 4.

When a lagged dependent variable is included and the error term is an AR(2) process, the second version of the test performs measurably better: it detects autocorrelation on average almost six percentage points more often.

4. Results

In hindsight, the small difference between the two tests for the majority of the groups is not all that surprising; the first p observations are only one or two observations out of 100. It makes sense, then, that the difference between the two tests is often on the order of one percentage point. The interesting difference, though, is in the final group, in which the DGP has an AR(2) error process and a lagged dependent variable as a regressor. Why would dropping the first two observations have such an effect on the ability of a Lagrange multiplier test to detect autocorrelation?

One answer, I propose, is that when we substitute zeroes for the first p observations, we are not only eliminating evidence of autocorrelation; we are inserting evidence of no autocorrelation. Recall our test statistic, T_0 R^2, and recall that we reject the null hypothesis of no autocorrelation when the test statistic is larger than the critical χ^2 value with p degrees of freedom. For our last group of models, the reduction in T_0 from dropping the first two observations must be more than made up for by an increase in R^2. Indeed, three of the four groups of models saw the success rate of the test increase when option (2) was chosen, and the remaining group showed an indistinguishable difference between the two forms.

5. Conclusion and Validity Assessment

The conclusions of this paper are not easily applied elsewhere, as the results of the test choice depend heavily on the sample. The specifics of the data-generating processes used in this simulation were entirely arbitrary, including the magnitudes of the coefficients, the distributions of the variables, and the sample size. I would guess that in a smaller sample the default test option would be more robust, because there would not be much to sacrifice in the way of T_0.
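The T_0-versus-R^2 trade-off in that argument can be quantified with a back-of-the-envelope check (using the standard 5% χ^2(2) critical value, not figures from the paper's simulations): with T = 100 and p = 2, dropping two observations raises the auxiliary R^2 needed to reject by only about two percent.

```python
# Worked check of the trade-off: with T = 100 and p = 2, how large must the
# auxiliary R^2 be for each option's statistic T0*R^2 to reject at 5%?
crit = 5.991                       # chi-square(2) critical value, 5% level

r2_opt1 = crit / 100               # option (1): T0 = T = 100
r2_opt2 = crit / (100 - 2)         # option (2): T0 = T - p = 98

print(round(r2_opt1, 4), round(r2_opt2, 4))  # 0.0599 vs 0.0611
```

The mechanical cost of the smaller T_0 is thus tiny, which is consistent with the paper's explanation: the gap in the last group must come from the auxiliary R^2 itself rising once the zero-padded rows are removed.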
Nonetheless, this paper suggests that there could be cases in which the Breusch-Godfrey test performs significantly better at detecting order-p autocorrelation when the first p residuals are dropped rather than replaced with zeroes. Perhaps the folks at Stata should consider making this the default option for their estat bgodfrey command.

6. Appendices

6.1. Appendix A: A Note on Stata Commands Used. This project made use of some Stata commands that were not covered in the course. The replace command was used in the forvalues loops so that I would not try to generate the same variable twice and trigger an error. I used this command particularly for lagging the error term in the AR(1)

process and the lagged dependent variable in the models that included it as a regressor. I also had to make some tricky use of matrix commands in order to extract the chi-squared statistics from the Breusch-Godfrey tests as scalars. In the Monte program, I created a matrix to hold the results from the estat bgodfrey command, and then generated a variable to pick out the [1,1] entry as a scalar. I also generated a variable to grab results from the summarize command so that I could determine the proportion of successful tests for autocorrelation in the simulation.

Table 1. Summary Statistics

Variable   Mean      Std. Dev.   Min.       Max.      N
x1         4.84374   1.73563     -1.00358   8.46331   100
x2         0.94684   0.82075     -1.31923   3.25001   100
x3         2.33507   0.26249     1.54605    3.04348   100

6.2. Appendix B: Tables. These tables show the results of the Monte Carlo simulation for each group of models. The first column indicates the number of x's included in the regression. The second column indicates the value of either ρ_1 or ρ_2, whichever was varied. The third column is the proportion of significant test statistics in each round of 1000 simulations when option (1) is used in the Breusch-Godfrey test (substitute zero for the first p residuals). The fourth column indicates the proportion of significant test statistics when option (2) is used. The bolded number in the third or fourth column indicates at a glance which form of the test had the higher success rate for that simulation.

6.3. Appendix C: Stata Input/Output. These figures are screenshots of the do-file used for each group of simulations. As mentioned in the paper, each do-file was adjusted 18 times to include different values of ρ and a different number of x's in the regression. The do-files are also attached to this paper in case a reader is interested in replicating the results.

6.4. Appendix D: References.

Table 2. AR(1) Error, No Lagged Y

Number of X's   ρ_1   Zeros prop. success   Nomiss prop. success
1               .1    .1                    .13
1               .2    .461                  .433
1               .3    .794                  .801
1               .4    .962                  .956
1               .5    .995                  .994
1               .6    1                     1
2               .1    .115                  .159
2               .2    .460                  .472
2               .3    .809                  .794
2               .4    .969                  .965
2               .5    .999                  .995
2               .6    1                     1
3               .1    .123                  .132
3               .2    .429                  .458
3               .3    .792                  .805
3               .4    .963                  .960
3               .5    .995                  .996
3               .6    1                     1

References

[1] Jeffrey Parker, "Regression with Stationary Time Series," Chapter 2, p. 28, 2013.
[2] T. S. Breusch, "Testing for Autocorrelation in Dynamic Linear Models," Australian National University, 1978.

Table 3. AR(2) Error, No Lagged Y, ρ_1 = .2

Number of X's   ρ_2   Zeros prop. success   Nomiss prop. success
1               .1    .460                  .465
1               .2    .695                  .678
1               .3    .907                  .886
1               .4    .979                  .982
1               .5    .996                  .997
1               .6    1                     1
2               .1    .466                  .450
2               .2    .677                  .671
2               .3    .898                  .898
2               .4    .971                  .986
2               .5    .998                  .996
2               .6    1                     1
3               .1    .443                  .458
3               .2    .691                  .694
3               .3    .891                  .887
3               .4    .977                  .979
3               .5    .996                  .998
3               .6    1                     1

Table 4. AR(1) Error, Lagged Y Included

Number of X's   ρ_1   Zeros prop. success   Nomiss prop. success
1               .1    .138                  .144
1               .2    .404                  .462
1               .3    .781                  .788
1               .4    .958                  .956
1               .5    .994                  .992
1               .6    1                     1
2               .1    .115                  .121
2               .2    .409                  .435
2               .3    .795                  .811
2               .4    .963                  .961
2               .5    .995                  .996
2               .6    1                     1
3               .1    .144                  .158
3               .2    .414                  .426
3               .3    .782                  .815
3               .4    .944                  .954
3               .5    .997                  .993
3               .6    1                     1

Table 5. AR(2) Error, Lagged Y Included, ρ_1 = .2

Number of X's   ρ_2   Zeros prop. success   Nomiss prop. success
1               .1    .462                  .457
1               .2    .663                  .690
1               .3    .877                  .874
1               .4    .976                  .969
1               .5    .997                  .996
1               .6    1                     1
2               .1    .423                  .456
2               .2    .67                   .694
2               .3    .871                  .880
2               .4    .961                  .981
2               .5    .999                  .998
2               .6    1                     1
3               .1    .429                  .447
3               .2    .681                  .654
3               .3    .883                  .869
3               .4    .966                  .971
3               .5    .999                  .993
3               .6    1                     1

Figure 1. An example of a do-file for the first group of simulations

Figure 2. An example of a do-file for the second group of simulations

Figure 3. An example of a do-file for the third group of simulations

Figure 4. An example of a do-file for the fourth group of simulations