Single Equation Linear GMM with Serially Correlated Moment Conditions


Eric Zivot
October 28, 2009

Univariate Time Series

Let $\{y_t\}$ be an ergodic-stationary time series with $E[y_t]=\mu$ and $\mathrm{var}(y_t)<\infty$. A fundamental decomposition result is the following:

Wold Representation Theorem. $y_t$ has the representation
$$y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j} = \mu + \varepsilon_t + \psi_1\varepsilon_{t-1} + \cdots$$
with $\psi_0 = 1$, $\sum_{j=0}^{\infty}\psi_j^2 < \infty$, and $\varepsilon_t \sim \mathrm{MDS}(0,\sigma^2)$.

Remarks

1. The Wold representation shows that $y_t$ has a linear structure. As a result, the Wold representation is often called the linear process representation of $y_t$.
2. $\sum_{j=0}^{\infty}\psi_j^2 < \infty$ is called square-summability and controls the memory of the process. It implies that $\psi_j \to 0$ as $j \to \infty$ at a sufficiently fast rate.

Variance
$$\gamma_0 = \mathrm{var}(y_t) = \mathrm{var}\left(\sum_{k=0}^{\infty}\psi_k\varepsilon_{t-k}\right) = \sum_{k=0}^{\infty}\psi_k^2\,\mathrm{var}(\varepsilon_t) = \sigma^2\sum_{k=0}^{\infty}\psi_k^2 < \infty$$

Autocovariances
$$\gamma_j = E[(y_t-\mu)(y_{t-j}-\mu)] = E\left[\left(\sum_{k=0}^{\infty}\psi_k\varepsilon_{t-k}\right)\left(\sum_{h=0}^{\infty}\psi_h\varepsilon_{t-h-j}\right)\right]$$
$$= E[(\psi_0\varepsilon_t + \psi_1\varepsilon_{t-1} + \cdots + \psi_j\varepsilon_{t-j} + \cdots)(\psi_0\varepsilon_{t-j} + \psi_1\varepsilon_{t-j-1} + \cdots)]$$
$$= \sigma^2\sum_{k=0}^{\infty}\psi_{j+k}\psi_k, \quad j = 0, 1, 2, \ldots$$

Ergodicity

Ergodicity requires $\sum_{j=0}^{\infty}|\gamma_j| < \infty$. It can be shown that absolute summability of the Wold coefficients,
$$\sum_{j=0}^{\infty}|\psi_j| < \infty,$$
implies
$$\sum_{j=0}^{\infty}|\gamma_j| < \infty.$$

Example: MA(1) process (truncated Wold form)
$$Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad |\theta| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0,\sigma^2)$$
Then $\psi_1 = \theta$, $\psi_k = 0$ for $k > 1$, and
$$E[Y_t] = \mu$$
$$\gamma_0 = E[(Y_t-\mu)^2] = \sigma^2(1+\theta^2)$$
$$\gamma_1 = E[(Y_t-\mu)(Y_{t-1}-\mu)] = \sigma^2\theta$$
$$\gamma_k = 0, \quad k > 1$$
Clearly,
$$\sum_{j=0}^{\infty}\psi_j^2 = 1+\theta^2 < \infty, \quad \sum_{j=0}^{\infty}|\gamma_j| = \sigma^2(1+\theta^2+|\theta|) < \infty$$
so that $\{Y_t\}$ is both weakly stationary and ergodic.
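To make the MA(1) moment calculations concrete, here is a small simulation check. This is my own illustration, not part of the original notes; the parameter values mu, theta, and sigma are arbitrary.

```python
# Simulate an MA(1) and compare sample moments with gamma_0 = sigma^2 (1 + theta^2)
# and gamma_1 = sigma^2 * theta.
import numpy as np

rng = np.random.default_rng(0)
T, mu, theta, sigma = 100_000, 1.0, 0.5, 2.0

eps = rng.normal(0.0, sigma, T + 1)
y = mu + eps[1:] + theta * eps[:-1]            # Y_t = mu + eps_t + theta * eps_{t-1}

ybar = y.mean()
gamma0_hat = np.mean((y - ybar) ** 2)
gamma1_hat = np.mean((y[1:] - ybar) * (y[:-1] - ybar))

print(gamma0_hat, sigma**2 * (1 + theta**2))   # both should be near 5.0
print(gamma1_hat, sigma**2 * theta)            # both should be near 2.0
```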

Example: AR(1) process

Mean adjusted form:
$$Y_t - \mu = \phi(Y_{t-1}-\mu) + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{WN}(0,\sigma^2), \quad |\phi| < 1, \quad E[Y_t] = \mu$$
Regression form:
$$Y_t = c + \phi Y_{t-1} + \varepsilon_t, \quad c = \mu(1-\phi)$$
Solution by recursive substitution:
$$Y_t - \mu = \phi^{t+1}(Y_{-1}-\mu) + \phi^t\varepsilon_0 + \cdots + \phi\varepsilon_{t-1} + \varepsilon_t = \phi^{t+1}(Y_{-1}-\mu) + \sum_{i=0}^{t}\phi^i\varepsilon_{t-i} = \phi^{t+1}(Y_{-1}-\mu) + \sum_{i=0}^{t}\psi_i\varepsilon_{t-i}, \quad \psi_i = \phi^i$$

Stability and Stationarity Conditions

If $|\phi| < 1$ then
$$\lim_{j\to\infty}\phi^j = 0, \quad \lim_{j\to\infty}\phi^j(Y_{-1}-\mu) = 0$$
and the stationary solution (Wold form) for the AR(1) becomes
$$Y_t = \mu + \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j} = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}, \quad \psi_j = \phi^j$$
This is a stable (non-explosive) solution.

Remarks

1. $\sum_{j=0}^{\infty}\psi_j^2 = \sum_{j=0}^{\infty}\phi^{2j} = \dfrac{1}{1-\phi^2} < \infty$
2. For the stationary solution, think of the process as having started in the infinite past.

AR(1) in Lag Operator Notation

If $|\phi| < 1$ then
$$(1-\phi L)(Y_t - \mu) = \varepsilon_t$$
and there exists an inverse operator $(1-\phi L)^{-1}$ such that
$$(1-\phi L)^{-1} = \sum_{j=0}^{\infty}\phi^j L^j = 1 + \phi L + \phi^2 L^2 + \cdots, \quad (1-\phi L)^{-1}(1-\phi L) = 1$$

Trick to find the Wold form:
$$Y_t - \mu = (1-\phi L)^{-1}(1-\phi L)(Y_t-\mu) = (1-\phi L)^{-1}\varepsilon_t = \sum_{j=0}^{\infty}\phi^j L^j\varepsilon_t = \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j} = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}, \quad \psi_j = \phi^j$$

Trick for calculating moments: use the stationarity properties
$$E[Y_t] = E[Y_{t-j}] \text{ for all } j, \qquad \mathrm{cov}(Y_t, Y_{t-j}) = \mathrm{cov}(Y_{t-k}, Y_{t-k-j}) \text{ for all } k, j$$
Mean of the AR(1):
$$E[Y_t] = c + \phi E[Y_{t-1}] + E[\varepsilon_t] = c + \phi E[Y_t] \;\Rightarrow\; E[Y_t] = \frac{c}{1-\phi} = \mu$$

Variance of the AR(1)
$$\gamma_0 = \mathrm{var}(Y_t) = E[(Y_t-\mu)^2] = E[(\phi(Y_{t-1}-\mu) + \varepsilon_t)^2]$$
$$= \phi^2 E[(Y_{t-1}-\mu)^2] + 2\phi E[(Y_{t-1}-\mu)\varepsilon_t] + E[\varepsilon_t^2]$$
$$= \phi^2 E[(Y_t-\mu)^2] + 0 + \sigma^2 = \phi^2\gamma_0 + \sigma^2 \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1-\phi^2}$$
Note: from the Wold representation,
$$\gamma_0 = \mathrm{var}\left(\sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j}\right) = \sigma^2\sum_{j=0}^{\infty}\phi^{2j} = \frac{\sigma^2}{1-\phi^2}$$

Autocovariances and Autocorrelations

Trick: multiply $Y_t - \mu$ by $Y_{t-j} - \mu$ and take expectations:
$$\gamma_j = E[(Y_t-\mu)(Y_{t-j}-\mu)] = E[\phi(Y_{t-1}-\mu)(Y_{t-j}-\mu)] + E[\varepsilon_t(Y_{t-j}-\mu)] = \phi\gamma_{j-1} \quad \text{(by stationarity)}$$
$$\Rightarrow\; \gamma_j = \phi^j\gamma_0 = \phi^j\frac{\sigma^2}{1-\phi^2}$$
Autocorrelations:
$$\rho_j = \frac{\gamma_j}{\gamma_0} = \frac{\phi^j\gamma_0}{\gamma_0} = \phi^j = \psi_j$$
Remark: for $0 < \phi < 1$,
$$\sum_{j=0}^{\infty}\gamma_j = \gamma_0\sum_{j=0}^{\infty}\phi^j = \frac{\gamma_0}{1-\phi} < \infty$$
so the autocovariances are absolutely summable and the process is ergodic.

Asymptotic Properties of Linear Processes

LLN for Linear Processes (Phillips and Solo, Annals of Statistics, 1992). Assume
$$y_t = \mu + \psi(L)\varepsilon_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}, \quad \psi(L) = \sum_{j=0}^{\infty}\psi_j L^j, \quad \varepsilon_t \sim \mathrm{MDS}(0,\sigma^2)$$
and that $\psi(L)$ is 1-summable; i.e.,
$$\sum_{j=0}^{\infty} j|\psi_j| = 1\cdot|\psi_1| + 2\cdot|\psi_2| + \cdots < \infty$$

Then
$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T}y_t \xrightarrow{p} E[y_t] = \mu$$
$$\hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}(y_t-\hat\mu)(y_{t-j}-\hat\mu) \xrightarrow{p} \mathrm{cov}(y_t,y_{t-j}) = \gamma_j$$
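A minimal Python sketch of these sample moments (the function name sample_autocov and the input conventions are illustrative, not from the notes; the 1/T divisor follows the formula above):

```python
import numpy as np

def sample_autocov(y, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} (y_t - ybar)(y_{t-j} - ybar)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    d = y - y.mean()
    return (d[j:] * d[: T - j]).sum() / T      # note the 1/T (not 1/(T-j)) divisor

# usage: mu_hat = y.mean(); gamma_hat_j = sample_autocov(y, j)
```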

CLT for Linear Processes (Phillips and Solo, Annals of Statistics, 1992). Assume
$$y_t = \mu + \psi(L)\varepsilon_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}, \quad \psi(L) = \sum_{j=0}^{\infty}\psi_j L^j, \quad \varepsilon_t \sim \mathrm{MDS}(0,\sigma^2)$$
with $\psi(L)$ 1-summable and
$$\psi(1) = \sum_{j=0}^{\infty}\psi_j \neq 0$$

Then
$$\sqrt{T}(\hat\mu - \mu) \xrightarrow{d} N(0, \mathrm{LRV})$$
$$\mathrm{LRV} = \text{long-run variance} = \sum_{j=-\infty}^{\infty}\gamma_j = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j \quad (\text{since } \gamma_{-j} = \gamma_j) \; = \; \sigma^2\psi(1)^2$$

Intuition behind the LRV formula

Consider
$$\mathrm{var}(\sqrt{T}\bar y) = T\,\mathrm{var}\left(\frac{1}{T}\sum_{t=1}^{T}y_t\right) = \frac{1}{T}\mathrm{var}\left(\sum_{t=1}^{T}y_t\right)$$
Using the fact that
$$\sum_{t=1}^{T}y_t = \mathbf{1}'\mathbf{y}, \quad \mathbf{1} = (1,\ldots,1)', \quad \mathbf{y} = (y_1,\ldots,y_T)'$$
it follows that
$$\mathrm{var}\left(\sum_{t=1}^{T}y_t\right) = \mathrm{var}(\mathbf{1}'\mathbf{y}) = \mathbf{1}'\mathrm{var}(\mathbf{y})\mathbf{1}$$

Now
$$\mathrm{var}(\mathbf{y}) = E[(\mathbf{y}-\mu\mathbf{1})(\mathbf{y}-\mu\mathbf{1})'] = \Gamma =
\begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{T-2} \\
\gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \cdots & \gamma_0
\end{pmatrix}$$
where $\gamma_j = \mathrm{cov}(y_t, y_{t-j})$ and $\gamma_{-j} = \gamma_j$. Therefore,
$$\mathrm{var}\left(\sum_{t=1}^{T}y_t\right) = \mathbf{1}'\mathrm{var}(\mathbf{y})\mathbf{1} = \mathbf{1}'\Gamma\mathbf{1}$$

Now, $\mathbf{1}'\Gamma\mathbf{1}$ is the sum of all elements in the $T\times T$ matrix $\Gamma$. This sum may be computed by summing across the rows, or the columns, or along the diagonals. Given the band diagonal structure of $\Gamma$, it is most convenient to sum along the diagonals. Doing so gives
$$\mathbf{1}'\Gamma\mathbf{1} = T\gamma_0 + 2(T-1)\gamma_1 + 2(T-2)\gamma_2 + \cdots + 2\gamma_{T-1}$$

Then
$$\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1} = \gamma_0 + 2\frac{T-1}{T}\gamma_1 + 2\frac{T-2}{T}\gamma_2 + \cdots + 2\frac{1}{T}\gamma_{T-1} = \gamma_0 + 2\sum_{j=1}^{T-1}\left(1-\frac{j}{T}\right)\gamma_j$$
As $T\to\infty$, it can be shown that
$$\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1} \to \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j = \mathrm{LRV}$$

Remark

Since $\gamma_{-j} = \gamma_j$, we may re-express $T^{-1}\mathbf{1}'\Gamma\mathbf{1}$ as
$$\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1} = \sum_{j=-(T-1)}^{T-1}\left(1-\frac{|j|}{T}\right)\gamma_j$$

Example: MA(1) process
$$Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad |\theta| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0,\sigma^2)$$
Recall $\psi(L) = 1+\theta L$, $\gamma_0 = \sigma^2(1+\theta^2)$, $\gamma_1 = \sigma^2\theta$. Then
$$\mathrm{LRV} = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j = \sigma^2(1+\theta^2) + 2\sigma^2\theta = \sigma^2(1+\theta)^2 = \sigma^2\psi(1)^2$$

Remarks

1. If $\theta = 0$ then $\mathrm{LRV} = \sigma^2$.
2. If $\theta = -1$ then $\psi(1) = 0$, so $\mathrm{LRV} = \sigma^2\psi(1)^2 = 0$. This motivates the condition $\psi(1) \neq 0$ in the CLT for stationary and ergodic linear processes.

Example: AR(1) process
$$Y_t - \mu = \phi(Y_{t-1}-\mu) + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{WN}(0,\sigma^2), \quad |\phi| < 1, \quad E[Y_t] = \mu$$
Recall $\psi(L) = (1-\phi L)^{-1}$, $\gamma_0 = \dfrac{\sigma^2}{1-\phi^2}$, $\gamma_j = \phi^j\gamma_0$. Then
$$\mathrm{LRV} = \sigma^2\psi(1)^2 = \frac{\sigma^2}{(1-\phi)^2}$$

Straightforward algebra gives the same result:
$$\mathrm{LRV} = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j = \gamma_0 + 2\gamma_0\sum_{j=1}^{\infty}\phi^j = \gamma_0\frac{1+\phi}{1-\phi} = \frac{\sigma^2}{(1-\phi)^2}$$
Remark: $\mathrm{LRV} \to \infty$ as $\phi \to 1$.
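As an illustration (not part of the notes), the AR(1) long-run variance can be checked by Monte Carlo: simulate many AR(1) samples and compare the variance of $\sqrt{T}\bar y$ with $\sigma^2/(1-\phi)^2$. The parameter values below are arbitrary.

```python
# Monte Carlo check that var(sqrt(T) * ybar) is close to sigma^2 / (1 - phi)^2.
import numpy as np

rng = np.random.default_rng(1)
phi, sigma, T, nrep = 0.7, 1.0, 1_000, 5_000

means = np.empty(nrep)
for r in range(nrep):
    eps = rng.normal(0.0, sigma, T)
    y = np.empty(T)
    y[0] = eps[0] / np.sqrt(1 - phi**2)        # draw the first value from the stationary distribution
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t]
    means[r] = y.mean()

print(T * means.var())                         # Monte Carlo estimate of var(sqrt(T) * ybar)
print(sigma**2 / (1 - phi) ** 2)               # LRV = 11.11...
```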

Estimating the Long-Run Variance

$$y_t = \mu + \psi(L)\varepsilon_t, \quad \varepsilon_t \sim \mathrm{MDS}(0,\sigma^2)$$
$$\mathrm{LRV} = \sum_{j=-\infty}^{\infty}\gamma_j = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j = \sigma^2\psi(1)^2$$
There are two types of estimators:
- Parametric (assume a parametric model for $y_t$)
- Nonparametric (do not assume a parametric model for $y_t$)

Example: MA(1) process
$$Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad |\theta| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0,\sigma^2), \quad \mathrm{LRV} = \sigma^2(1+\theta)^2$$
A parametric LRV estimate is
$$\widehat{\mathrm{LRV}} = \hat\sigma^2(1+\hat\theta)^2, \quad \hat\sigma^2 \xrightarrow{p} \sigma^2, \quad \hat\theta \xrightarrow{p} \theta$$

Remarks

1. Parametric estimates require us to specify a model for $y_t$.
2. For the parametric estimate, we only need consistent estimates of $\sigma^2$ and $\theta$ (e.g., GMM or ML estimates) in order for $\widehat{\mathrm{LRV}}$ to be consistent.
3. Parametric estimates are generally more efficient than nonparametric estimates, provided they are based on the correct model.

For the general linear process
$$y_t = \mu + \psi(L)\varepsilon_t, \quad \varepsilon_t \sim \mathrm{MDS}(0,\sigma^2)$$
a parametric estimator is not feasible because $\psi(L) = \sum_{j=0}^{\infty}\psi_j L^j$ has an infinite number of parameters. A natural nonparametric estimator is the truncated sum of sample autocovariances
$$\widehat{\mathrm{LRV}}_q = \hat\gamma_0 + 2\sum_{j=1}^{q}\hat\gamma_j, \quad q = \text{truncation lag}$$
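A self-contained sketch of this truncated-sum estimator (the function name lrv_truncated is illustrative; the truncation lag q is supplied by the user):

```python
import numpy as np

def lrv_truncated(y, q):
    """LRV_hat_q = gamma_hat_0 + 2 * sum_{j=1}^{q} gamma_hat_j."""
    y = np.asarray(y, dtype=float)
    T = y.size
    d = y - y.mean()
    gamma = lambda j: (d[j:] * d[: T - j]).sum() / T
    return gamma(0) + 2.0 * sum(gamma(j) for j in range(1, q + 1))
```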

Problems

1. How to pick $q$?
2. $q$ must grow with $T$ in order for $\widehat{\mathrm{LRV}}_q \xrightarrow{p} \mathrm{LRV}$.
3. $\widehat{\mathrm{LRV}}_q$ can be negative in finite samples, particularly if $q$ is close to $T$. This is due to the (Parseval-type) identity that the sample autocovariances of demeaned data sum to zero:
$$\hat\gamma_0 + 2\sum_{j=1}^{T-1}\hat\gamma_j = 0$$

Kernel-Based Estimators

These are nonparametric estimators of the form (Hayashi notation)
$$\widehat{\mathrm{LRV}}_{\mathrm{ker}} = \sum_{j=-T+1}^{T-1}\kappa\!\left(\frac{j}{q(T)}\right)\hat\gamma_j$$
where
$$\kappa(\cdot) = \text{kernel weight function}, \qquad q(T) = \text{bandwidth parameter}$$

The kernel estimators are motivated by the result
$$\widehat{\mathrm{var}}(\sqrt{T}\bar y) = \sum_{j=-T+1}^{T-1}\left(1-\frac{|j|}{T}\right)\hat\gamma_j$$
which suggests
$$\kappa\!\left(\frac{j}{q(T)}\right) = 1-\frac{|j|}{T}, \quad q(T) = T$$

Examples of Kernel Weight Functions

Truncated kernel: $q(T) = q < T$,
$$\kappa(x) = \begin{cases} 1 & \text{for } |x| \le 1 \\ 0 & \text{for } |x| > 1 \end{cases}$$
Note
$$\kappa\!\left(\frac{j}{q}\right) = \begin{cases} 1 & \text{for } |j| \le q \\ 0 & \text{for } |j| > q \end{cases}$$

Example: $q = 2$
$$\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{Trunc}} = \sum_{j=-T+1}^{T-1}\kappa\!\left(\frac{j}{q(T)}\right)\hat\gamma_j = \sum_{j=-2}^{2}\hat\gamma_j = \hat\gamma_{-2} + \hat\gamma_{-1} + \hat\gamma_0 + \hat\gamma_1 + \hat\gamma_2 = \hat\gamma_0 + 2(\hat\gamma_1 + \hat\gamma_2)$$
Remarks:
1. The bandwidth parameter $q(T) = q$ acts as a lag truncation parameter.
2. Here, $\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{Trunc}}$ is not guaranteed to be positive if $q$ is close to $T$.

Bartlett kernel q(t ) < T ( 1 x κ(x) = 0 for x 1 for x > 1 d LRV ker computed with the Bartlet kernel gives the so-called Newey-West estimator. Example: q(t )=3 κ Ã j q(t )! = κ µ j 3 = ( 1 j 0 3 for j 3 for j > 3

Then
$$\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{Bartlett}} = \sum_{j=-T+1}^{T-1}\kappa\!\left(\frac{j}{q(T)}\right)\hat\gamma_j = \sum_{j=-3}^{3}\left(1-\frac{|j|}{3}\right)\hat\gamma_j$$
$$= \tfrac{1}{3}\hat\gamma_{-2} + \tfrac{2}{3}\hat\gamma_{-1} + \hat\gamma_0 + \tfrac{2}{3}\hat\gamma_1 + \tfrac{1}{3}\hat\gamma_2 = \hat\gamma_0 + 2\left(\tfrac{2}{3}\hat\gamma_1 + \tfrac{1}{3}\hat\gamma_2\right) = \hat\gamma_0 + 2\sum_{j=1}^{2}\left(1-\frac{j}{3}\right)\hat\gamma_j$$

Remarks

1. Use of the Bartlett kernel guarantees that $\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{Bartlett}} > 0$ (see Newey and West, 1987, Econometrica, for a proof).
2. In order for $\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{Bartlett}} \xrightarrow{p} \mathrm{LRV}$, it must be the case that $q(T) \to \infty$ as $T \to \infty$. There is a bias-variance tradeoff: if $q(T) \to \infty$ too slowly there will be bias; if $q(T) \to \infty$ too fast there will be an increase in variance.
3. Andrews (1991, Econometrica) proved that if $q(T) \to \infty$ at rate $T^{1/3}$ then $\mathrm{MSE}(\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{Bartlett}}, \mathrm{LRV})$ is minimized asymptotically.
4. Andrews' rate result suggests setting $q(T) = cT^{1/3}$, with $c$ some constant.

However, Andrews' result says nothing about what $c$ should be, so by itself the rate result is of limited practical use.

Remarks continued

5. Many software programs use the following rule due to Newey and West (1987):
$$q(T) = \mathrm{int}\left[4\left(\frac{T}{100}\right)^{1/4}\right]$$
6. Andrews (1991) also studied the Parzen kernel and the quadratic spectral (QS) kernel (both commonly programmed into software). He showed that $q(T) \to \infty$ at rate $T^{1/5}$ is required to asymptotically minimize $\mathrm{MSE}(\widehat{\mathrm{LRV}}_{\mathrm{ker}}, \mathrm{LRV})$ for these kernels, and that $\mathrm{MSE}(\widehat{\mathrm{LRV}}_{\mathrm{ker}}^{\mathrm{QS}}, \mathrm{LRV})$ is the smallest among the different kernels.
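Putting the Bartlett-kernel formula and the Newey-West bandwidth rule together, a minimal univariate implementation might look as follows. This is a sketch, not library code; the function name is illustrative.

```python
import numpy as np

def lrv_newey_west(y, q=None):
    """Univariate Newey-West (Bartlett kernel) estimate of the long-run variance."""
    y = np.asarray(y, dtype=float)
    T = y.size
    if q is None:
        q = int(4 * (T / 100.0) ** 0.25)       # Newey-West (1987) rule of thumb
    d = y - y.mean()
    gamma = lambda j: (d[j:] * d[: T - j]).sum() / T
    lrv = gamma(0)
    for j in range(1, q):                      # Bartlett weight (1 - j/q) is zero at j = q
        lrv += 2.0 * (1.0 - j / q) * gamma(j)
    return lrv
```

With Bartlett weights the estimate is non-negative (remark 1 above), but it remains sensitive to the bandwidth choice, as remarks 5-7 emphasize.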

Remarks continued

7. Newey and West (1994, Econometrica) gave Monte Carlo evidence that the choice of bandwidth, $q(T)$, is more important than the choice of kernel. They suggested some data-based methods for automatically choosing $q(T)$. However, as discussed in den Haan and Levin (1997), these methods are not really automatic because they depend on an initial estimate of the LRV and some pre-specified weights. See Hall (2005), section 3.5, for more details.

[Figure 1: Common kernel functions — weights of the Truncated, Bartlett, Parzen, and Quadratic Spectral kernels plotted against lags 1-10.]

Multivariate Time Series

Consider $n$ time series variables $\{y_{1t}\},\ldots,\{y_{nt}\}$. A multivariate time series is the $(n\times 1)$ vector time series $\{Y_t\}$ where the $i$th row of $Y_t$ is $y_{it}$. That is, for any time $t$, $Y_t = (y_{1t},\ldots,y_{nt})'$.

Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:
- consumption and income
- stock prices and dividends
- forward and spot exchange rates
- interest rates, money growth, income, inflation
- GMM moment conditions ($g_t = x_t\varepsilon_t$)

Stationary and Ergodic Multivariate Time Series

A multivariate time series $Y_t$ is covariance stationary and ergodic if all of its component time series are stationary and ergodic. Then
$$E[Y_t] = \mu = (\mu_1,\ldots,\mu_n)'$$
$$\mathrm{var}(Y_t) = \Gamma_0 = E[(Y_t-\mu)(Y_t-\mu)'] =
\begin{pmatrix}
\mathrm{var}(y_{1t}) & \mathrm{cov}(y_{1t},y_{2t}) & \cdots & \mathrm{cov}(y_{1t},y_{nt}) \\
\mathrm{cov}(y_{2t},y_{1t}) & \mathrm{var}(y_{2t}) & \cdots & \mathrm{cov}(y_{2t},y_{nt}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(y_{nt},y_{1t}) & \mathrm{cov}(y_{nt},y_{2t}) & \cdots & \mathrm{var}(y_{nt})
\end{pmatrix}$$
The correlation matrix of $Y_t$ is the $(n\times n)$ matrix
$$\mathrm{corr}(Y_t) = R_0 = D^{-1}\Gamma_0 D^{-1}$$
where $D$ is a diagonal matrix with $j$th diagonal element $(\gamma^0_{jj})^{1/2} = \mathrm{var}(y_{jt})^{1/2}$.

The parameters $\mu$, $\Gamma_0$, and $R_0$ are estimated from data $(Y_1,\ldots,Y_T)$ using the sample moments
$$\bar Y = \frac{1}{T}\sum_{t=1}^{T}Y_t \xrightarrow{p} E[Y_t] = \mu$$
$$\hat\Gamma_0 = \frac{1}{T}\sum_{t=1}^{T}(Y_t-\bar Y)(Y_t-\bar Y)' \xrightarrow{p} \mathrm{var}(Y_t) = \Gamma_0$$
$$\hat R_0 = \hat D^{-1}\hat\Gamma_0\hat D^{-1} \xrightarrow{p} \mathrm{corr}(Y_t) = R_0$$
where $\hat D$ is the $(n\times n)$ diagonal matrix with the sample standard deviations of $y_{jt}$ along the diagonal. The Ergodic Theorem justifies convergence of the sample moments to their population counterparts.

Cross Covariance and Correlation Matrices

For a univariate time series $y_t$ the autocovariances $\gamma_k$ and autocorrelations $\rho_k$ summarize the linear time dependence in the data.

With a multivariate time series $Y_t$ each component has autocovariances and autocorrelations, but there are also cross lead-lag covariances and correlations between all possible pairs of components. The autocovariances and autocorrelations of $y_{jt}$, for $j = 1,\ldots,n$, are defined as
$$\gamma^k_{jj} = \mathrm{cov}(y_{jt}, y_{j,t-k}), \qquad \rho^k_{jj} = \mathrm{cor}(y_{jt}, y_{j,t-k}) = \frac{\gamma^k_{jj}}{\gamma^0_{jj}}$$
and these are symmetric in $k$: $\gamma^k_{jj} = \gamma^{-k}_{jj}$, $\rho^k_{jj} = \rho^{-k}_{jj}$.

The cross lag covariances and cross lag correlations between $y_{it}$ and $y_{jt}$ are defined as
$$\gamma^k_{ij} = \mathrm{cov}(y_{it}, y_{j,t-k}), \qquad \rho^k_{ij} = \mathrm{cor}(y_{it}, y_{j,t-k}) = \frac{\gamma^k_{ij}}{\sqrt{\gamma^0_{ii}\gamma^0_{jj}}}$$
and they are not necessarily symmetric in $k$. In general,
$$\gamma^k_{ij} = \mathrm{cov}(y_{it}, y_{j,t-k}) \neq \mathrm{cov}(y_{it}, y_{j,t+k}) = \mathrm{cov}(y_{jt}, y_{i,t-k}) = \gamma^k_{ji}$$

If $\gamma^k_{ij} \neq 0$ for some $k > 0$ then $y_{jt}$ is said to lead $y_{it}$. If $\gamma^{-k}_{ij} \neq 0$ for some $k > 0$ then $y_{it}$ is said to lead $y_{jt}$. It is possible that $y_{it}$ leads $y_{jt}$ and vice-versa; in this case there is said to be feedback between the two series.

All of the lag $k$ cross covariances and correlations are summarized in the $(n\times n)$ lag $k$ cross covariance and lag $k$ cross correlation matrices
$$\Gamma_k = E[(Y_t-\mu)(Y_{t-k}-\mu)'] =
\begin{pmatrix}
\mathrm{cov}(y_{1t},y_{1,t-k}) & \mathrm{cov}(y_{1t},y_{2,t-k}) & \cdots & \mathrm{cov}(y_{1t},y_{n,t-k}) \\
\mathrm{cov}(y_{2t},y_{1,t-k}) & \mathrm{cov}(y_{2t},y_{2,t-k}) & \cdots & \mathrm{cov}(y_{2t},y_{n,t-k}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(y_{nt},y_{1,t-k}) & \mathrm{cov}(y_{nt},y_{2,t-k}) & \cdots & \mathrm{cov}(y_{nt},y_{n,t-k})
\end{pmatrix}$$
$$R_k = D^{-1}\Gamma_k D^{-1}$$
The matrices $\Gamma_k$ and $R_k$ are not symmetric in $k$, but it is easy to show that $\Gamma_{-k} = \Gamma_k'$ and $R_{-k} = R_k'$.

The matrices $\Gamma_k$ and $R_k$ are estimated from data $(Y_1,\ldots,Y_T)$ using
$$\hat\Gamma_k = \frac{1}{T}\sum_{t=k+1}^{T}(Y_t-\bar Y)(Y_{t-k}-\bar Y)', \qquad \hat R_k = \hat D^{-1}\hat\Gamma_k\hat D^{-1}$$
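A short sketch of these sample cross-covariance and cross-correlation matrices for a $T\times n$ data matrix Y (rows are $Y_t'$). This is my own illustration; the names cross_cov and cross_corr are assumptions, not from the notes.

```python
import numpy as np

def cross_cov(Y, k):
    """Gamma_hat_k = (1/T) * sum_{t=k+1}^{T} (Y_t - Ybar)(Y_{t-k} - Ybar)'."""
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    D = Y - Y.mean(axis=0)                     # demeaned data
    return D[k:].T @ D[: T - k] / T

def cross_corr(Y, k):
    """R_hat_k = D_hat^{-1} Gamma_hat_k D_hat^{-1}."""
    Dinv = np.diag(1.0 / np.sqrt(np.diag(cross_cov(Y, 0))))
    return Dinv @ cross_cov(Y, k) @ Dinv
```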

Multivariate Wold Representation

Any $(n\times 1)$ covariance stationary multivariate time series $Y_t$ has a Wold or linear process representation of the form
$$Y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \cdots = \mu + \sum_{k=0}^{\infty}\Psi_k\varepsilon_{t-k}, \quad \Psi_0 = I_n, \quad \varepsilon_t \sim \mathrm{WN}(0,\Sigma)$$
where $\Psi_k$ is an $(n\times n)$ matrix with $(i,j)$th element $\psi^k_{ij}$.

In lag operator notation, the Wold form is
$$Y_t = \mu + \Psi(L)\varepsilon_t, \quad \Psi(L) = \sum_{k=0}^{\infty}\Psi_k L^k$$
The moments of $Y_t$ are given by
$$E[Y_t] = \mu, \qquad \mathrm{var}(Y_t) = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$

Long-Run Variance

Let $Y_t$ be an $(n\times 1)$ stationary and ergodic multivariate time series with $E[Y_t] = \mu$. Phillips and Solo's CLT for linear processes states that
$$\sqrt{T}(\bar Y - \mu) \xrightarrow{d} N(0, \mathrm{LRV}), \qquad \mathrm{LRV}_{(n\times n)} = \sum_{j=-\infty}^{\infty}\Gamma_j$$
Since $\Gamma_{-j} = \Gamma_j'$, LRV may alternatively be expressed as
$$\mathrm{LRV} = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j')$$

Using the Wold representation of $Y_t$ it can be shown that
$$\mathrm{LRV} = \Psi(1)\Sigma\Psi(1)', \qquad \Psi(1) = \sum_{k=0}^{\infty}\Psi_k$$

Nonparametric Estimate of the Long-Run Variance

A consistent estimate of LRV may be computed using nonparametric methods. A popular estimator is the Newey-West weighted autocovariance estimator based on Bartlett weights:
$$\widehat{\mathrm{LRV}} = \hat\Gamma_0 + \sum_{j=1}^{q(T)-1}\left(1-\frac{j}{q(T)}\right)\left(\hat\Gamma_j + \hat\Gamma_j'\right)$$
$$\hat\Gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}(Y_t-\bar Y)(Y_{t-j}-\bar Y)', \qquad q(T) = \text{bandwidth}$$
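A minimal multivariate Newey-West sketch corresponding to the formula above (self-contained and illustrative; the Bartlett weight $1 - j/q(T)$ matches the notation in the formula):

```python
import numpy as np

def lrv_newey_west_multi(Y, q):
    """Newey-West estimate of the (n x n) long-run variance of a T x n data matrix Y."""
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    D = Y - Y.mean(axis=0)
    gamma = lambda j: D[j:].T @ D[: T - j] / T   # Gamma_hat_j
    lrv = gamma(0)
    for j in range(1, q):                        # Bartlett weight (1 - j/q), zero at j = q
        G = gamma(j)
        lrv += (1.0 - j / q) * (G + G.T)
    return lrv
```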

Single Equation Linear GMM with Serial Correlation
$$y_t = z_t'\delta_0 + \varepsilon_t, \quad t = 1,\ldots,n$$
where
- $z_t$ = $L\times 1$ vector of explanatory variables
- $\delta_0$ = $L\times 1$ vector of unknown coefficients
- $\varepsilon_t$ = random error term
- $x_t$ = $K\times 1$ vector of exogenous instruments

Moment conditions and identification for the general model:
$$g_t(w_t,\delta_0) = x_t\varepsilon_t = x_t(y_t - z_t'\delta_0)$$
$$E[g_t(w_t,\delta_0)] = E[x_t\varepsilon_t] = E[x_t(y_t - z_t'\delta_0)] = 0$$
$$E[g_t(w_t,\delta)] \neq 0 \text{ for } \delta \neq \delta_0$$

Serially Correlated Moments

It is assumed that $\{g_t\} = \{x_t\varepsilon_t\}$ is a mean zero, 1-summable linear process
$$g_t = \Psi(L)\varepsilon_t, \quad \varepsilon_t \sim \mathrm{MDS}(0,\Sigma)$$
satisfying the Phillips-Solo CLT with long-run variance
$$S = \mathrm{LRV} = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j')$$
Remark:
$$\Gamma_0 = E[g_t g_t'] = E[x_t x_t'\varepsilon_t^2], \qquad \Gamma_j = E[g_t g_{t-j}'] = E[x_t x_{t-j}'\varepsilon_t\varepsilon_{t-j}]$$
If $\{g_t\}$ is not serially correlated, then $\Gamma_j = 0$ for $j > 0$ and $\mathrm{LRV} = \Gamma_0 = E[g_t g_t']$.

Estimation of LRV

The LRV is typically estimated using the Newey-West estimator (kernel estimator with Bartlett kernel):
$$\widehat{\mathrm{LRV}} = \hat\Gamma_0 + \sum_{j=1}^{q(T)-1}\left(1-\frac{j}{q(T)}\right)\left(\hat\Gamma_j + \hat\Gamma_j'\right)$$
$$\hat\Gamma_0 = \frac{1}{T}\sum_{t=1}^{T}g_t g_t' = \frac{1}{T}\sum_{t=1}^{T}x_t x_t'\hat\varepsilon_t^2$$
$$\hat\Gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}g_t g_{t-j}' = \frac{1}{T}\sum_{t=j+1}^{T}x_t x_{t-j}'\hat\varepsilon_t\hat\varepsilon_{t-j}$$
$$\hat\varepsilon_t = y_t - z_t'\hat\delta, \qquad \hat\delta \xrightarrow{p} \delta_0 \;\;(\text{any consistent first-step estimate})$$
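A sketch of this HAC estimate for the linear GMM moments $g_t = x_t\hat\varepsilon_t$, following the formulas above. The function name s_hat_hac and the array conventions (X is $T\times K$ instruments, eps_hat is the length-$T$ residual vector) are my own assumptions.

```python
import numpy as np

def s_hat_hac(X, eps_hat, q):
    """Newey-West (Bartlett) HAC estimate of S = LRV of g_t = x_t * eps_t."""
    g = X * eps_hat[:, None]                   # rows are g_t' = (x_t * eps_hat_t)'
    T = g.shape[0]
    S = g.T @ g / T                            # Gamma_hat_0
    for j in range(1, q):
        Gj = g[j:].T @ g[: T - j] / T          # Gamma_hat_j
        S += (1.0 - j / q) * (Gj + Gj.T)       # Bartlett weight (1 - j/q)
    return S
```

Note that the moments are not demeaned here, consistent with the formulas above: under the model they have mean zero at $\delta_0$.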

Remark

The estimate $\widehat{\mathrm{LRV}}$ is often denoted $\hat S_{\mathrm{HAC}}$, where HAC stands for heteroskedasticity and autocorrelation consistent.

Efficient GMM
$$\hat\delta(\hat S_{\mathrm{HAC}}^{-1}) = \arg\min_{\delta} J(\hat S_{\mathrm{HAC}}^{-1}, \delta) = \arg\min_{\delta}\; n\, g_n(\delta)'\hat S_{\mathrm{HAC}}^{-1} g_n(\delta)$$
As with GMM with heteroskedastic errors, the following efficient GMM estimators can be computed (a sketch of the two-step estimator follows below):
1. Two-step efficient
2. Iterated efficient
3. Continuously updated (CU)
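For concreteness, here is a hedged sketch of the two-step efficient estimator (item 1), reusing the s_hat_hac() sketch above. The first-step weight matrix $(X'X/T)^{-1}$ is a common but not unique choice, and the function name gmm_two_step is illustrative.

```python
import numpy as np

def gmm_two_step(y, Z, X, q):
    """Two-step efficient linear GMM for y = Z delta + eps with instruments X."""
    T = y.shape[0]
    Sxz, Sxy = X.T @ Z / T, X.T @ y / T
    W1 = np.linalg.inv(X.T @ X / T)                            # first-step weight matrix
    d1 = np.linalg.solve(Sxz.T @ W1 @ Sxz, Sxz.T @ W1 @ Sxy)   # first-step (2SLS-type) estimate
    S = s_hat_hac(X, y - Z @ d1, q)                            # HAC estimate of S = LRV
    W2 = np.linalg.inv(S)
    d2 = np.linalg.solve(Sxz.T @ W2 @ Sxz, Sxz.T @ W2 @ Sxy)   # efficient second-step estimate
    return d2, S
```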

Note: the CU estimator solves
$$\hat\delta(\hat S_{\mathrm{HAC}}^{-1}, \mathrm{CU}) = \arg\min_{\delta}\; n\, g_n(\delta)'\hat S_{\mathrm{HAC}}(\delta)^{-1} g_n(\delta)$$
where
$$\hat S_{\mathrm{HAC}}(\delta) = \hat\Gamma_0(\delta) + \sum_{j=1}^{q(T)-1}\left(1-\frac{j}{q(T)}\right)\left(\hat\Gamma_j(\delta) + \hat\Gamma_j(\delta)'\right)$$
$$\hat\Gamma_j(\delta) = \frac{1}{T}\sum_{t=j+1}^{T}x_t x_{t-j}'\varepsilon_t(\delta)\varepsilon_{t-j}(\delta), \qquad \varepsilon_t(\delta) = y_t - z_t'\delta$$