Week 11 Heteroskedasticity and Autocorrelation


Week 11 Heteroskedasticity and Autocorrelation İnsan TUNALI Econ 511 Econometrics I Koç University 27 November 2018

Lecture outline
1. OLS and assumptions on V(ε)
2. Violations of V(ε) = σ²I:
   1. Heteroskedasticity
   2. Autocorrelation
   3. Clustering
3. What to do? Under cases 1 and 2 we examine:
   1. Computation and use of robust standard errors
   2. Tests for detection of violations
   3. Deriving an alternate estimator
   4. How inference is affected
   5. Model misspecification
Draws on: Marno Verbeek, A Guide to Modern Econometrics, 2012 (4th Ed.), Ch. 4.

CLRM and OLS
Data: (y, X). Rows (y_i, x_i) are i.i.d. draws from a population which satisfies:
(A0) y_i = x_i'β + ε_i, i = 1,2,…,N.
(A1) Errors have mean zero: E{ε_i} = 0 for all i.
(A2) ε is independent of X: {ε_1, …, ε_N} is independent of {x_1, …, x_N}. All error terms are independent of all explanatory variables.
(A3) V{ε_i} = σ² for all i. All error terms have the same variance (homoskedasticity).
(A4) cov{ε_i, ε_j} = 0, i ≠ j. All error terms are uncorrelated with all other error terms (no autocorrelation).

OLS estimator properties
Under assumptions (A1) and (A2):
1. The OLS estimator b = (X'X)⁻¹X'y is unbiased. That is, E{b} = β.
Under assumptions (A1), (A2), (A3) and (A4):
2. The variance of the OLS estimator is given by
   V{b|X} = σ²(X'X)⁻¹ = σ²(Σ_i x_i x_i')⁻¹, (2.33)
   where Σ_i denotes Σ from i = 1 to N.
3. An unbiased estimator for V{b|X} may be formed by replacing σ² with its unbiased estimator
   s² = (N − K)⁻¹ Σ_i e_i², (2.35)
   giving V̂{b|X} = s²(X'X)⁻¹. (2.36)
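As a quick numerical sketch of these formulas (simulated data; the variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3

# Simulated data satisfying (A1)-(A4): homoskedastic, uncorrelated errors.
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.5, size=N)

# OLS estimator b = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# Unbiased error-variance estimator s^2 = (N-K)^{-1} sum_i e_i^2   (2.35)
e = y - X @ b
s2 = e @ e / (N - K)

# Estimated covariance matrix of b: s^2 (X'X)^{-1}                 (2.36)
V_b = s2 * XtX_inv
se = np.sqrt(np.diag(V_b))
```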

Gauss-Markov conditions
4. GM Theorem: The OLS estimator b is the best linear unbiased estimator (BLUE) for β. Denoting the N-dimensional vector of all error terms by ε, and the entire matrix of explanatory variables by X, sufficient conditions for the Gauss-Markov results are:
E{ε|X} = 0 (4.3) and V{ε|X} = σ²I, (4.4)
where I is the N×N identity matrix. This says: the distribution of the error terms given X has zero means, constant variances, and zero covariances.

Violation of A3: heteroskedasticity Heteroskedasticity arises if error terms do not have the same variance. Relevant to CS, TS and Panel data. How this might happen: TS/Panel: Variances evolve over time. Example: y = daily stock returns. General market conditions are likely to influence the variance of returns. CS/Panel: Variances depend upon one or more explanatory variables. Example: y = household income (or total expenditures). At higher income levels, we expect lower expenditures/higher savings on average, but also more variability. 6

Violation of A3: heteroskedasticity Source: Stock and Watson, Intro. to Econometrics. 7


Violation of A4: autocorrelation Autocorrelation arises if error terms are correlated across observations. Autocorrelation is a typical feature of time-series data (in which case it is also known as serial correlation). When do we expect this? Unobserved factors (included in ε) from one period partly carry over to the next. Model is missing seasonal patterns. Model is based on overlapping samples (e.g. quarterly returns observed each month). Model is otherwise misspecified (omitted variable, incorrect dynamics, etc.) 9

Positive autocorrelation Demand for ice cream (as a function of income and price index) 10

Positive autocorrelation Demand for Money Greene Example 20.1 (p.943): A naive model: 11

Violation of A3 & A4: Clustering Sometimes multiple individual observations contained in a cross-section data set are drawn from the same cluster. Examples: Multiple individuals in the same household. Multiple households from the same city. Multiple firms from the same sector, etc. This induces nonzero correlation between different error terms when observations are from the same cluster. In addition clusters might have different variances. 12

Consequences
The consequences of both problems are similar. As long as (4.3) holds, i.e. E{ε|X} = 0, the OLS estimator is unbiased. However, if (4.4) is violated (V{ε|X} ≠ σ²I), then: OLS is no longer BLUE! Furthermore: the estimator for V{b|X} given by (2.36) is not correct! Standard errors routinely calculated by your software (Stata) are incorrect. F-statistics are not correct; R-sq/RSS-based tests do not work.

What to do? Overview Four ways to deal with the problem: 1. Use OLS but compute standard errors correctly; 2. Use an alternative estimator (FGLS); 3. Test for heteroscedasticity/autocorrelation; use the appropriate (OLS or FGLS) estimator; 4. Reconsider the model specification. The first route is the most popular (easiest). When efficiency is desirable, route 2 is followed. The third may require many tests (pre-test bias?). The fourth route is often employed when autocorrelation is detected. 14

Solution 1: Use OLS
Model: y_i = x_i'β + ε_i, i = 1,2,…,N. In matrix notation: y = Xβ + ε. We have V{ε|X} = Σ; V{y|X} = Σ as well. Writing b = Ay with A = (X'X)⁻¹X',
V{b|X} = A V{y|X} A' = A Σ A' = (X'X)⁻¹ X'ΣX (X'X)⁻¹.
Estimation of V{b|X} requires estimation of Σ.

1 - (Pure) Heteroskedasticity
Easier case: when V{ε|X} is diagonal, but with different diagonal elements, we have heteroskedasticity but no autocorrelation. In this case assumption (A3) may be replaced by V{ε_i|X} = σ_i², i = 1,2,…,N; V{y_i|X} = σ_i² as well. Thus Σ is a diagonal matrix, with σ_i² as its i-th diagonal element. Let x_i' denote the i-th row of X. Then X'X = Σ_i x_i x_i' and X'ΣX = Σ_i σ_i² x_i x_i'.

Estimation of V(b|X) under heteroskedasticity
Note the difficulty: with V{ε_i|X} = σ_i², each observation has its own unknown variance parameter! Key insight: ε_i² is large when σ_i² is large; e_i² is large when ε_i² is large. Hence Σ_i e_i² x_i x_i' serves as a consistent estimator of Σ_i σ_i² x_i x_i', and we can use this idea to estimate V(b|X) consistently. This result is due to Eicker, Huber and White.

Heteroskedasticity-robust inference
The (Eicker-Huber-)White (heteroskedasticity-robust) covariance matrix estimator of the OLS estimator b is:
V̂_W{b|X} = (X'X)⁻¹ (Σ_i e_i² x_i x_i') (X'X)⁻¹. (4.30)
We use this formula to compute standard errors rather than the standard one from (2.36) and continue as before with our t-tests. (How about F-tests?) Note: this formula is still appropriate if the errors have a constant variance. "Robust": the formula is valid for arbitrary heteroskedasticity.
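As a sketch of how (4.30) could be computed directly, on simulated heteroskedastic data (all names illustrative, not the lecture's example):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

# Heteroskedastic data: the error spread grows with the regressor.
x = rng.uniform(1, 5, size=N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 0.5 * x + rng.normal(size=N) * x   # V{eps_i} proportional to x_i^2

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# White covariance (4.30): (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
meat = (X * e[:, None] ** 2).T @ X
V_white = XtX_inv @ meat @ XtX_inv
se_white = np.sqrt(np.diag(V_white))

# Conventional (2.36) standard errors, for comparison; these are invalid here.
s2 = e @ e / (N - X.shape[1])
se_ols = np.sqrt(np.diag(s2 * XtX_inv))
```

With the error variance rising in x, the conventional and White standard errors will generally differ; the robust ones remain valid either way.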

Heteroskedasticity-robust inference
To implement in STATA: include the subcommand robust after regress:
regress y x1 x2 ..., robust
Actually STATA uses a slightly different version of (4.30):
V̂_R{b|X} = N/(N − K) · V̂_W{b|X}.
This is because the e_i²'s are biased towards zero (recall the bias correction we used earlier in estimating σ²). See STATA: help vce_option. Also Greene, 9.4.4. For additional reading, you may consult Angrist and Pischke (2009), Mostly Harmless Econometrics, Ch. 8.

Example: Demand for labor We estimate a simple labor demand function for a sample of 569 Belgian firms (from 1996). We seek to explain the variation in labor in terms of variation in wages, output and capital stock. Note that the original variables were rescaled to obtain coefficients in the same order of magnitude. 20

Demand for labor: A double-log model n = 569 What do the coefficients measure? Any surprises? How might heteroskedasticity influence the results? 21

Demand for labor: A double-log model with robust standard errors Compare with Table 4.3. What changed? Should we conclude that labor demand is inelastic with respect to capital? 22

Solution 2: Derive an alternative estimator
We know that OLS is BLUE only under the Gauss-Markov conditions. How to find an efficient alternative? 1. Transform the model such that it satisfies the Gauss-Markov assumptions again. 2. Apply OLS to the transformed model. This leads to the generalized least squares (GLS) estimator, which is BLUE. See Week 9: GCR. 3. The transformation often depends upon unknown parameters (those that characterize the heteroskedasticity and/or autocorrelation). In this case we estimate Σ first, then transform the model. This leads to a feasible GLS (FGLS) or estimated GLS (EGLS) estimator, which is approximately BLUE.

Weighted Least Squares (WLS) estimator
With heteroskedasticity we have V{ε_i|X} = σ_i² = σ² h_i². (GCR handout p. 32.) Suppose the h_i's are known. Then
y_i/h_i = (x_i/h_i)'β + ε_i/h_i (4.16)
has a homoskedastic error term: V{ε_i/h_i|X} = σ². OLS applied to this transformed model yields
β̂ = (Σ_i h_i⁻² x_i x_i')⁻¹ Σ_i h_i⁻² x_i y_i, (4.17)
which is a weighted least squares (WLS) estimator.
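A minimal sketch of the transformation in (4.16), assuming the h_i are known (here h_i = x_i by construction; data are simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500

x = rng.uniform(1, 10, size=N)
X = np.column_stack([np.ones(N), x])
h = x                                   # V{eps_i} = sigma^2 h_i^2 with h_i = x_i, known here
y = 2.0 + 1.0 * x + rng.normal(size=N) * h

# Transform as in (4.16): divide y_i and each regressor by h_i, then run OLS.
Xw = X / h[:, None]
yw = y / h
b_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

# Equivalently, solve the weighted normal equations with weights 1/h_i^2,
# matching the explicit formula (4.17).
w = 1.0 / h**2
b_chk = np.linalg.solve((X * w[:, None]).T @ X, (X * w[:, None]).T @ y)
```

The two computations are algebraically identical; the transformed-regression route is simply OLS software reused on rescaled data.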

Weighted Least Squares (WLS) The weighted least squares estimator is a special least squares estimator where each observation is weighted by (a factor proportional to) the inverse of the error variance. Observations with a higher variance get a lower weight (because they provide less accurate info on β). The resulting estimator is more efficient (more accurate) than OLS. However, it can only be applied if we know h i (we rarely do) or if we can estimate it by making additional restrictive assumptions on the form of h i (it is a good idea to test first). 25

Implementation of WLS estimation (example of FGLS)
Suppose σ_i² = σ² h_i² = σ² exp(z_i'α), where z_i is a vector of observed variables (typically a subset of x_i, excluding the constant). Note that this is an example of V{ε_i|X} = σ_i² = σ² h_i² (multiplicative heteroskedasticity). The functional form h(·) has been chosen so that the variances are never negative, and the homoskedastic case obtains as a special case (when all slopes in α are zero).

WLS as FGLS, cont'd.
Assumed form of heteroskedasticity: σ_i² = σ² exp(z_i'α). To estimate α we run an auxiliary regression
log e_i² = constant + z_i'α + v_i,
where the e's denote the OLS residuals and the z's the variables assumed to drive the heteroskedasticity. This provides a consistent estimator for α, which can be used to transform the model (so that OLS on the transformed model would yield the WLS estimator). The auxiliary regression also provides a test for heteroskedasticity and sets the stage for the third approach: test, then decide whether to use OLS or GLS.

The Breusch-Pagan test of heteroskedasticity
The Breusch-Pagan test investigates whether the error variance is a function of the vector z_i. In particular, the alternative hypothesis is
σ_i² = σ² h(z_i'α) (4.36)
for some function h(·) with h(0) = 1. The "all slopes are zero" null (α = 0) corresponds to the homoskedastic case. The BP test is based on regressing the squared OLS residuals e_i² upon z_i. Often the original regressors serve as z_i. Test statistic ("LM") = N multiplied by the R-sq of the auxiliary regression; it has a Chi-squared distribution (df = number of variables in z_i).
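The mechanics of the test can be sketched as follows (simulated data with the variance driven by a single z; in the labor-demand example z_i holds several covariates):

```python
import numpy as np

def r_squared(Z, v):
    """R^2 from an OLS regression of v on Z (Z includes a constant)."""
    g = np.linalg.lstsq(Z, v, rcond=None)[0]
    resid = v - Z @ g
    return 1.0 - resid @ resid / np.sum((v - v.mean()) ** 2)

rng = np.random.default_rng(3)
N = 569                               # matches the sample size; data are simulated

z = rng.uniform(1, 4, size=N)
X = np.column_stack([np.ones(N), z])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N) * z   # heteroskedastic errors

# Step 1: OLS residuals from the model of interest.
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: regress e_i^2 on a constant and z_i; LM = N * R^2 ~ Chi-sq(dim z).
LM = N * r_squared(X, e**2)
```

Here df = 1, so LM is compared with the 5% critical value 3.84.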

A linear model of labor demand Is the variance of labor (level of employment) constant? Or is it likely to vary with covariates? Which ones? 29

Breusch-Pagan test - Demand for labor example
n = 569

The Breusch-Pagan test - Demand for labor example, cont'd.
In the auxiliary regression we see (very) high t-ratios and a high R-sq. This indicates that the squared errors are strongly related to z_i. Test statistic: LM = N×R-sq ~ Chi-sq(3); here LM₀ = 331.0, which leads to a very strong rejection of the null hypothesis of homoskedasticity. This is an example of a Lagrange Multiplier test (subject of Week 12). Recall Verbeek's point that using logs reduces heteroskedasticity (Ch. 3, Section 3.6.2). Results from the double-log model reported on the next page lend credence to this.

A double-log model of labor demand
Auxiliary regression (not shown) yields R-sq = 0.0136; LM₀ = N×R-sq = 7.74, p-value ≈ 0.05. n = 569

The White test for heteroskedasticity
The White test uses a more general h(·), that is, a more general alternative than Breusch-Pagan. It is based on regressing the squared OLS residuals upon all the regressors, their squares and their (unique) cross-products. Test statistic: N multiplied by the R² of the auxiliary regression; it has a Chi-squared distribution (df = number of variables in the auxiliary regression). Advantage: very general. Disadvantage: low power in small samples (why?).

The White test - Demand for labor example, cont'd.
Verbeek carried out the White test using the double-log model, which is arguably a better functional form choice in this context. The starting point for the investigation is Table 4.3. The auxiliary regression results are given in Table 4.4 (next page). With an R² of 0.1029, the value of the White test statistic is W₀ = 58.5 (df = 9), a highly significant Chi-squared value.

The White test Demand for labor example n = 569 35

FGLS Given the strong rejection, the next step would be to turn to WLS (FGLS). Theory: Next page. Results: Verbeek 4.5, Tables 4.6-4.7. Table 4.6: Auxiliary regression (not in lecture notes) Table 4.7: FGLS (or Estimated GLS = EGLS) results 36

Multiplicative heteroskedasticity - Demand for labor example, FGLS
To obtain the FGLS estimator, compute ĥ_i² = exp(z_i'α̂) and transform all observations to obtain
y_i/ĥ_i = (x_i/ĥ_i)'β + ε_i/ĥ_i.
The error term in this model is (approximately) homoskedastic. Applying OLS to the transformed model gives the FGLS (Verbeek: EGLS) estimator for β. Note: the transformed regression is for computational purposes only. All economic interpretations refer to the original model!


Comparison of Tables 4.5 and 4.7
We see that the efficiency gain is substantial. (Comparison with Table 4.3, OLS with incorrect standard errors, is not appropriate.) The coefficient estimates are fairly close to the OLS ones. Note that the effect of capital is now statistically significant: the employment elasticity with respect to capital is indeed negative, but small in absolute magnitude. The R² in Table 4.7 is misleading, because:
- it applies to the transformed model (not the original one);
- it is uncentered, because there is no intercept.
Recall that OLS always gives the highest R²!

About heteroskedasticity-robust standard errors
RECAP: Use of (Eicker-Huber-)White (heteroskedasticity-consistent) standard errors is often an appropriate solution to the problem of heteroskedasticity. They are easily available in most modern software (such as STATA) and allow one to make appropriate inference without specifying the type of heteroskedasticity. This is (almost) standard in many applications. Sometimes we would like a more efficient estimator, obtained by making some assumption about the form of heteroskedasticity.

2 - Autocorrelation Autocorrelation typically occurs with time series data (where observations have a natural ordering). To stress this, we shall index the observations by t = 1,,T, rather than i = 1,..,N. The error term picks up the influence of those (many) variables and factors not included in the model. If there is some persistence in these factors, (positive) autocorrelation may arise. Thus, autocorrelation may be an indication of a misspecified model (omitted variables, incorrect functional form, incorrect representation of dynamics). Accordingly, autocorrelation tests are often interpreted as misspecification tests. 41

What to do? Four ways to deal with the problem: 1. Use OLS but compute standard errors correctly; 2. Use an alternative estimator (FGLS); 3. Test for autocorrelation; use the appropriate (OLS or FGLS) estimator; 4. Reconsider the model specification. The first route is the most popular. When efficiency is desirable, route 2 is followed. The third may require many tests (pre-test bias?). The fourth route is followed often. 42

Solution 1: Use OLS
Model: y_i = x_i'β + ε_i, i = 1,2,…,N. In matrix notation: y = Xβ + ε. We have V{ε|X} = Σ; V{y|X} = Σ as well. Writing b = Ay with A = (X'X)⁻¹X',
V{b|X} = A V{y|X} A' = A Σ A' = (X'X)⁻¹ X'ΣX (X'X)⁻¹.
Estimation of V{b|X} requires estimation of Σ.

Estimation of V(b|X) under heteroskedasticity and autocorrelation
When autocorrelation is present, Σ is no longer diagonal. In this case (A3)-(A4) may be replaced by
Cov{ε_t, ε_s|X} = σ_ts, t, s = 1,2,…,T; Cov{y_t, y_s|X} = σ_ts as well.
The subscripts (t, s) keep track of time, which has a natural ordering. The sample size is denoted by T, and σ_tt = σ_t² = V{ε_t|X}, t = 1,2,…,T. TS data typically violate the strong exogeneity assumption that the GM theorem requires. We settle for consistency instead: (A2) may be replaced by E(x_t ε_t) = 0 for all t.

Estimation of V(b|X) under heteroskedasticity and autocorrelation, cont'd.
Let x_t' denote the t-th row of X. Then X'X = Σ_t x_t x_t' and X'ΣX = Σ_t Σ_s σ_ts x_t x_s'. Following the key idea of the heteroskedastic case, we might consider estimating X'ΣX by Σ_t Σ_s e_t e_s x_t x_s', where the e's denote the OLS residuals. Unfortunately this sum has T² terms, so consistency cannot be attained. We have a harder problem at hand: we need simplifying assumptions.

Estimation of V(b|X) under heteroskedasticity and autocorrelation, cont'd.
Simplification: suppose V{ε|X} is not diagonal, but the error term has certain features: (A2) E(x_t ε_t) = 0 and (A1) E(ε_t) = 0 for all t; (A3-A4) Cov(ε_t, ε_s) may be nonzero for t ≠ s, but Cov(ε_t, ε_s) = E(ε_t ε_s) = 0 for |t − s| > H ≥ 1, i.e. autocorrelation dies out beyond lag H. In this case a consistent estimator of V{b|X} which is robust to both heteroskedasticity and autocorrelation can be found. The result is due to Newey and West.

Estimation of V(b|X) under heteroskedasticity and autocorrelation, cont'd.
A consistent estimator of (1/T) X'ΣX is:
S* = (1/T) [ Σ_t e_t² x_t x_t' + Σ_{j=1}^{H} w_j Σ_{t=j+1}^{T} e_t e_{t−j} (x_t x_{t−j}' + x_{t−j} x_t') ]. (4.63)
As usual the e's denote the OLS residuals. T denotes the sample size. H denotes the lag length at which serial correlation is assumed to become zero. Observe that we obtain the (Eicker-Huber-)White version when w_j = 0 for all j.
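A sketch of this estimator in code, using the Bartlett weights w_j = 1 − j/(H+1) proposed by Newey and West (simulated AR(1) errors; all names are illustrative):

```python
import numpy as np

def hac_cov(X, e, H):
    """Newey-West covariance of b: T (X'X)^{-1} S* (X'X)^{-1}."""
    T = X.shape[0]
    S = (X * e[:, None] ** 2).T @ X            # own-variance term of S*
    for j in range(1, H + 1):
        w = 1.0 - j / (H + 1)                  # Bartlett weights (Newey-West choice)
        G = (X[j:] * (e[j:] * e[:-j])[:, None]).T @ X[:-j]
        S += w * (G + G.T)                     # cross-lag terms at lag j
    S /= T
    XtX_inv = np.linalg.inv(X.T @ X)
    return T * XtX_inv @ S @ XtX_inv

rng = np.random.default_rng(4)
T = 300
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
eps = np.zeros(T)
v = rng.normal(size=T)
for t in range(1, T):                          # AR(1) errors, rho = 0.6
    eps[t] = 0.6 * eps[t - 1] + v[t]
y = X @ np.array([1.0, 0.5]) + eps

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
se_hac = np.sqrt(np.diag(hac_cov(X, e, H=4)))
```

Setting H = 0 recovers the White estimator, consistent with the remark that w_j = 0 for all j gives the heteroskedasticity-only case.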

Heteroskedasticity- and autocorrelation-robust inference
The HAC (heteroskedasticity-and-autocorrelation robust) or Newey-West estimator of the covariance matrix of the OLS estimator b is:
V̂{b|X} = T (X'X)⁻¹ S* (X'X)⁻¹, (4.62)
where S* is given in (4.63) on the previous page. The resulting standard errors are known as HAC standard errors or Newey-West standard errors. Special case: when w_j = 1 in (4.63), the standard errors are known as Hansen-White standard errors; see Verbeek p. 126.

Heteroskedasticity- and autocorrelation-robust inference, cont'd.
STATA implementation: computation of HAC standard errors (and correct F-stats) is more complicated than heteroskedasticity-consistent inference. Unlike CS data, with TS data the correct ordering of the observations is essential. This is facilitated by teaching STATA the time-series nature of the data and the variable responsible for the ordering. To get an introduction to STATA's TS commands, type: help time series

About HAC standard errors
In many cases, using Newey-West (or Hansen-White) (heteroskedasticity-and-autocorrelation consistent) standard errors is an appropriate solution to the problems that arise with stationary time-series data. STATA handles these; read the documentation before you engage in serious work. STATA allows the lag length H to be chosen by the researcher. How to choose it? Follow those before you. What to do when the series are non-stationary? Econ 513 (TS version).

Solutions 2 and 3 -- Overview
2. Use an alternative estimator (FGLS); 3. Test for autocorrelation; use the appropriate (OLS or FGLS) estimator. These require modelling of the errors. Some popular models:
AR(1) model: ε_t = ρ ε_{t−1} + v_t, where v_t is an error with mean zero and constant variance.
MA(1) model: ε_t = v_t + α v_{t−1}, with v_t defined in similar fashion.

First-order autocorrelation
Suppose (A1)-(A3) hold, but (A4) is violated. Many forms of autocorrelation exist. The most popular one is first-order autocorrelation:
ε_t = ρ ε_{t−1} + v_t,
where v_t is an error with mean zero and constant variance. The GM conditions are restored if ρ = 0. Also known as AR(1): autoregressive model of order 1.

AR(1) -- Properties of ε_t
We examine the properties of ε_t under |ρ| < 1 (the ρ = 1 case, known as a unit root, is ignored). Covariance stationarity: means, variances and covariances do not change over time. See Verbeek Ch. 8, Greene Ch. 21 for violations. Under (covariance) stationarity we can solve for:
V{ε_t} = σ_v² / (1 − ρ²).
Note that this requires −1 < ρ < 1.

First-order autocorrelation -- Properties of ε_t, cont'd.
Further, cov{ε_t, ε_{t−1}} = ρ σ_v² / (1 − ρ²), and in general
cov{ε_t, ε_{t−s}} = ρ^s σ_v² / (1 − ρ²) (s > 0).
See GCR handout, Week 9, last page.

First-order autocorrelation
Thus, this form of autocorrelation implies that all error terms are correlated; their covariance decreases as the distance in time gets large. To transform the model such that it satisfies the Gauss-Markov conditions we use (4.46):
y_t − ρ y_{t−1} = (x_t − ρ x_{t−1})'β + v_t.
With known ρ, this produces (almost) the GLS estimator. Note: the first observation is lost by this transformation (see (4.47) on how to handle this). Of course, typically ρ is unknown.

First-order autocorrelation -- Estimating ρ
First estimate the original model by OLS. This gives the OLS residuals. Starting from ε_t = ρ ε_{t−1} + v_t, it seems natural to estimate ρ by regressing the OLS residual e_t upon its lag e_{t−1}. This gives
ρ̂ = Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=2}^{T} e_{t−1}². (4.49)
We then use (4.49) in (4.46) and apply OLS to the transformed model to get the FGLS estimator. While this estimator of ρ is typically biased, it is consistent under weak conditions.
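The full FGLS recipe — OLS, estimate ρ from the residuals, quasi-difference, OLS again — can be sketched as follows (simulated data with ρ = 0.7; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
rho_true = 0.7
eps = np.zeros(T)
v = rng.normal(size=T)
for t in range(1, T):                       # AR(1) errors
    eps[t] = rho_true * eps[t - 1] + v[t]
y = X @ np.array([1.0, 2.0]) + eps

# Step 1: OLS on the original model -> residuals e_t.
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Step 2: estimate rho by regressing e_t on e_{t-1}, as in (4.49).
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# Step 3: quasi-difference the data as in (4.46) and re-run OLS (FGLS).
y_star = y[1:] - rho_hat * y[:-1]
X_star = X[1:] - rho_hat * X[:-1]
b_fgls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
```

Note the first observation is dropped by the quasi-differencing, as the slide above remarks.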

Testing for first-order autocorrelation -- 1. Asymptotic tests
The auxiliary regression producing ρ̂ also provides a standard error for it. The resulting t-test statistic is approximately equal to √T ρ̂. We reject the null (no autocorrelation) against the alternative of nonzero autocorrelation if |t| > 1.96 (5% significance). Another form is based on (T−1) × R² of this regression, to be compared with the Chi-squared distribution with df = 1 (reject if > 3.84; 5% significance).

Testing for first-order auto-correlation 1. Asymptotic tests Remarks: If the model of interest contains lagged values of y t as explanatory variables (or other explanatory variables in lagged form that may be correlated with lagged error terms), the auxiliary regression should also include all explanatory variables. If we also suspect heteroskedasticity, White standard errors may be used in the auxiliary regression. 58

Testing for first-order autocorrelation -- 2. Durbin-Watson test
This is a very popular test, routinely computed by most regression packages (even when it is inappropriate!). Requirements: (a) an intercept in the model, and (b) no lagged dependent variables! The test statistic is given by
dw = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²,
which is approximately equal to 2(1 − ρ̂).
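A sketch of the statistic (the two series below are illustrative residual-like series, not residuals from an actual regression):

```python
import numpy as np

def durbin_watson(e):
    """dw = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, approximately 2(1 - rho_hat)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
white = rng.normal(size=500)           # uncorrelated series: dw near 2
persistent = np.cumsum(white)          # extreme positive autocorrelation: dw near 0

dw_white = durbin_watson(white)
dw_persistent = durbin_watson(persistent)
```

Strong positive autocorrelation drives dw toward 0, while an uncorrelated series gives values near 2, matching the interpretation on the next slide.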

Testing for first-order autocorrelation -- 2. Durbin-Watson test, cont'd.
The distribution is peculiar: it depends on the x_t's. In general, dw values close to 2 are fine, while dw values close to 0 imply positive autocorrelation. The exact critical value is unknown, but upper and lower bounds can be derived (for extremely slowly and quickly changing x's; see Table 4.8). Thus (to test for positive autocorrelation):
- dw less than the lower bound: reject;
- dw larger than the upper bound: do not reject;
- dw in between: inconclusive.
The inconclusive region becomes smaller as T gets large.

Bounds on critical values Durbin-Watson test The test is inconclusive if d L < dw < d U. Conservative approach: Reject null when dw < d U. 61

Example: the demand for ice cream
Based on the classic article by Hildreth and Lu (1960), using a time series of 30 (!) four-weekly observations, 1951-1953. See Figure 4.3 for plots of these series.

The demand for ice cream Figure 4.3 63

The demand for ice cream OLS results Note: A simpler model (w/o income) was used to obtain the fitted values in Fig. 4.2. 64

The demand for ice cream Actual and fitted values Based on the OLS results in Table 4.9. 65

Estimation of ρ
From ρ̂ ≈ 1 − dw/2 we get ρ̂ = 0.49. Regressing the OLS residuals on their lags gives a direct estimate of ρ. The resulting test statistics both reject the null of no autocorrelation. Use FGLS or change the model specification.

The demand for ice cream FGLS estimation Compare with Table 4.9. The starred statistics are for the transformed OLS model. New dw-stat is still low! 67

The demand for ice cream -- Augmented model with lagged temperature
The new dw-stat is in the inconclusive region.

Alternative autocorrelation patterns
With first-order (autoregressive) autocorrelation, ε_t = ρ ε_{t−1} + v_t, all errors are correlated with each other, with correlations becoming smaller as they are further apart. Two more general alternatives: 1. higher-order autoregression; or 2. moving average terms.

Higher-order autocorrelation
With quarterly or monthly (macro) data, higher-order patterns are possible (due to a periodic effect). For example, with quarterly data:
ε_t = γ ε_{t−4} + v_t,
or, more generally,
ε_t = γ_1 ε_{t−1} + γ_2 ε_{t−2} + γ_3 ε_{t−3} + γ_4 ε_{t−4} + v_t,
known as 4th-order (autoregressive) autocorrelation. Correlations between different error terms are more flexible than with 1st order.

Moving average autocorrelation
Arises if the correlation between different error terms is limited by a maximum time lag. Simplest case: MA(1), moving average of order 1:
ε_t = v_t + α v_{t−1}.
This implies that ε_t is correlated with ε_{t−1}, but not with ε_{t−2} or ε_{t−3}, etc. Moving average errors arise by construction when overlapping samples are used (see the Illustration in Section 4.11).

What to do when you find autocorrelation Autocorrelation may be due to misspecification! 1. Reconsider the model: 1.a: change functional form (e.g. use log(x) rather than x), see Figure 4.5 (on next page). 1.b: extend the model by including additional explanatory variables (seasonals) or additional lags; 2. Compute heteroskedasticity-and-autocorrelation consistent standard errors (HAC standard errors) for the OLS estimator; 3. If you can defend your model, use FGLS. 72

Wrong functional form
True model: y_t = β_0 + β_1 log(x_t) + ε_t. We regressed y on x.

Incomplete dynamics
We have been considering the static model y_t = x_t'β + ε_t, which has E{y_t|x_t} = x_t'β, with ε_t = ρ ε_{t−1} + v_t. Consider the dynamic alternative (write the static model for period t−1, solve for ε_{t−1} and substitute):
E{y_t|x_t, x_{t−1}, y_{t−1}} = x_t'β + ρ (y_{t−1} − x_{t−1}'β).
Accordingly, we can also write the linear model
y_t = x_t'β + ρ y_{t−1} − ρ x_{t−1}'β + v_t,
where the error term does not exhibit serial correlation. In many cases, including lagged values of y and/or x will eliminate the serial correlation problem.
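This respecification can be checked numerically: with simulated AR(1) errors, the static regression leaves autocorrelated residuals, while adding y_{t−1} and x_{t−1} removes the autocorrelation (a sketch; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 400
x = rng.normal(size=T)
rho, beta = 0.7, 2.0
eps = np.zeros(T)
v = rng.normal(size=T)
for t in range(1, T):                       # AR(1) errors
    eps[t] = rho * eps[t - 1] + v[t]
y = 1.0 + beta * x + eps

def resid_rho(e):
    """First-order autocorrelation coefficient of a residual series."""
    return (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# Static model: residuals inherit the AR(1) structure of eps.
Xs = np.column_stack([np.ones(T), x])
e_static = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]

# Dynamic model: y_t on x_t, y_{t-1}, x_{t-1}; the implied error is v_t.
Xd = np.column_stack([np.ones(T - 1), x[1:], y[:-1], x[:-1]])
e_dyn = y[1:] - Xd @ np.linalg.lstsq(Xd, y[1:], rcond=None)[0]
```

The residual autocorrelation is near ρ for the static fit and near zero for the dynamic fit.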

3 - Clustering-Robust Inference
If clustered CS data are used, error terms of observations drawn from the same group will be correlated, while error terms across groups will be uncorrelated. Error terms of clusters may also have different variances. The Newey-West formula (4.63) can be adjusted to take care of this situation (see Verbeek, p. 390). STATA: use the cluster subcommand to identify the variable that marks the different groups (clusters):
regress y x1 x2 ..., cluster(groupvar)

Clustering example Greene p.394 76