GMM, HAC estimators, & Standard Errors for Business Cycle Statistics

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics Wouter J. Den Haan London School of Economics c Wouter J. Den Haan

Overview Generic GMM problem Estimation Heteroskedastic and Autocorrelation Consistent (HAC) estimators to calcuate optimal weighting matrix and standard errors Simple applications OLS with correct standard errors IV with multiple instruments standard errors for business cycle statistics

GMM problem Underlying true model: E [h(x t ; θ)] = 0 p θ : m 1 vector with parameters x : n 1 vector of observables h( ) : p 1 vector-valued function with p m 0 p : p 1 vector with zeros

Examples OLS: q t = a + bp t + u t [ ut E u t p t ] = [ 0 0 ] IV: q t = a + bp t + u t [ ut E u t z t ] = [ 0 0 ]

Examples DSGE: [ ] c γ t = E t β (1 + r t+1 ) c γ t+1 u t+1 = β (1 + r t+1 ) c γ t+1 c γ t E t [u t+1 f (z t )] = 0 p = E [u t+1 f (z t )] = 0 p where z t is a vector with variables in information set in period t and f ( ) is a measurable function

GMM estimation Let where Y T contains data g(θ; Y T ) = T h (x t ; θ) /T, t=1 Idea of estimation: choose θ such that g(θ; Y T ) is as small as possible Throughout the slides, remember that g(θ; Y T ) is a mean

GMM estimation θ T = arg min θ Θ g(θ; Y T) W T g(θ; Y T ) where W T : p p weighting matrix No weighting matrix needed if p = m

Asymptotic standard errors T ( θ T θ 0 ) N(0, V) V = ( DWD ) 1 DWΣ0 W D ( DWD ) 1

Asymptotic standard errors W = plim W T T D = plim T θ 0 : true θ Σ 0 = E j= g (θ; Y T ) θ θ=θ0 [h (x t ; θ 0 ) h ( x t j ; θ 0 ) ]

Terms showing up in V D measures how sensitive g is to changes in θ 0. less precise estimates of θ 0 if g is not very sensitive to changes in θ. Σ 0 is the variance-covariance matrix of Tg(θ 0 ; Y T ) as T. if the mean underlying the estimation is more volatile = estimates of θ 0 less precise obtaining an estimate for Σ 0 is often the tricky bit (more on this below)

Two cases when formula for V simplifies 1 p = m (no overidentifying restrictions) T ( θ T θ 0 ) V = N(0, V) ( DΣ 1 0 D ) 1 2 using optimal weighting matrix, i.e., the matrix W that minimizes V. W optimal = Σ 1 0 T ( θ T θ 0 ) V = N(0, V) ( DΣ 1 0 D ) 1

Example 1 Y T = {x t } T t=1 and we want to estimate the mean µ h (x t ; µ) = x t µ g (µ; Y T ) = T t=1 (x t µ) T µ T = T t=1 x t T

Example 1 D = 1 Σ T equals the variance of ( T equals variance of T ( T t=1 x t ) T t=1 (x t µ) /T, which ) /T Variance of T t=1 x t equals variance of x 1 + covariance of x 1 &x 2 + + covariance of x 1 &x T + covariance of x 1 &x 2 + variance x 2 + etc. IF x t serially uncorrelated, then variance of ( T equals variance of x t T t=1 x t ) /T

Example 2 x 1,t and x 2,t have the same mean for simplicity assume that x 1,t and x 2,t are not correlated with each other and are not serially correlated Y T = {x 1,t, x 2,t } T t=1 h (x t ; µ) = Σ 0 = W = [ ] x1,t µ x 2,t µ & D = [ 1 1 ] [ σ 2 x1 0 ] 0 σ 2 x 2 [ 1/σ 2 x1 0 ] 0 1/σ 2 x 2

Estimating variance-covariance matrix g(θ; Y T ) is the mean of h(x t ; θ) variance of g(θ; Y T ) is easy to calculate if h (x t ; θ) is serially uncorrelated but in general it is diffi cult

Estimate variance of a mean The (limit of the) variance-covariance of Tg(θ; Y T ) equals Σ 0 = E [h (x t ; θ 0 ) h ( ) ] x t j ; θ 0 j= This is the spectral density of h (x t ; θ 0 ) at frequency 0 Since Σ 0 is a variance-covariance matrix, it should be positive semi-definite (PSD), that is, z Σ 0 z should be non-negative for any non-zero column vector z

HAC estimators 1 Kernel-based (truncated, Newey-West, Andrews) 2 Parametric (Den Haan & Levin)

Estimate variance of a mean Σ T = J [h κ(j; J) min{t,t+j} (x t=max{1,j+1} t ; θ 0 ) h ( ) ] x t j ; θ 0 T j= J where κ( ; J) is the kernel with bandwidth parameter J.

Truncated kernel Truncated κ(j; J) = { 1 if j J 0 o.w. disadvantages potential bias because autocorrelation at j > J is ignored answer not necessarily positive semi-definite

Newey-West (Bartlett) kernel κ(j; J) = 1 j J + 1 for j J Newey-West kernel always gives PSD disadvantages potential bias because autocorrelation at j > J is ignored bias because k (j; J) < 1 for included terms

Choice of bandwidth parameter a high (low) J leads to less (more) bias more (less) sampling variability J should be higher if h ( ; θ 0 ) is more persistent J optimal as T but at a slower rate

Choice of bandwidth parameter There are papers that give expressions for J optimal, but adding a constant to these expressions does not affect optimality (that is, the analysis only gives an optimal rate). Thus, in a finite sample you simply have to experiment and hope your answer is robust

Cost of imposing PSD Suppose h (x t ; θ 0 ) is an MA(1) = E [ h (x t ; θ 0 ) h ( x t j ; θ 0 )] = 0 for j > 1 If J = 1 then you have a biased estimate since k(1; 1) = 1/2 If J > 1 then you must include estimates of expectations that we know are zero Suppose h(x t ; θ 0 ) has a persistent and not persistent component = you must use the same J for both components

VARHAC Idea: Estimate a flexible time-series process (VAR) for h t, that is, h (x t ; θ 0 ) = J A j h ( ) x t j ; θ 0 + εt j=1 Use implied spectrum at frequency zero as the estimate for Σ 0

VARHAC h (x t ; θ 0 ) = J A j h ( ) x t j ; θ 0 + εt j=1 Then the implied spectrum at frequency zero, i.e., Σ 0, equals Σ T = [ I p J j=1 A j ] 1 Σ ε,t [I p ] J 1 A j j=1 with Σ ε,t = T j=j+1 ε tε t T

VARHAC Estimate is PSD by construction (PSD is obtained without imposing additional bias) Only bias is due to lag length potentially being too short You can use standard model selection criteria (AIC, BIC) to determine lag length You could estimate a VARMA VARHAC also gives a consistent estimate for nonlinear processes since Σ 0 only depends on second-order processes which can be captured with VAR

Back to GMM Exactly identified case: estimate θ T and calculate V T Over-identified case: You need W T to estimate θ T W optimal T depends on Σ 0, which depends on θ 0

Back to GMM What to do in practice? obtain (consistent) estimate of θ 0 with simple W T (identity although scaling is typically a good idea) use this estimate to calculate W optimal T use W optimal T to get a more effi cient estimate of θ 0 calculate variance of θ T using ( D T Σ T 1 D ) 1 T you could iterate on this

Examples OLS with heteroskedastic and serially correlated errors IV with multiple instruments Business cycle statistics

Key lesson of today s lecture If you can write the estimation problem as E [h(x t ; θ] = 0 p then you can use GMM and we know how to calculate standard errors

OLS q t = θp t + u t h (x t ) = u t p t = q t p t θp 2 t

OLS & GMM ) T ( θ T θ 0 N(0, V) V = ( DΣ0 1 1 ( ) D = E p 2 t

OLS & GMM ( ( ( )) ( ( )) ) 1 V = E p 2 t Σ0 1 E p 2 t If E[u t u t+τ ] = 0 for τ = 0, then [ Σ0 1 = E (u t p t ) 2] and ( ( ( )) [ V = E p 2 t E (u t p t ) 2] ( ( )) ) 1 E p 2 t If errors are homoskedastic, then [ E (u t p t ) 2] [ = E ( V = E u 2 t ] ( p 2 t [ E p 2 t ] )) 1 E [ u 2 t ]

OLS & GMM Suppose that q t = θp t + u t u t = ε t p t ε t i.i.d, E t [p t ε t ] = 0, and ε t homoskedastic Then you should do GLS q t p t = θ + ε t GMM does not by itself do the transformation, i.e., it does not do GLS, but it does give you standard errors that correct for heteroskedasticity

IV Suppose that q t = θp t + u t, E [u t z 1,t ] = 0 and E [u t z 2,t ] = 0 With GMM you can find the optimal weighting of the two instruments

Business cycles statistics 1 2 ψ = standard deviation (c t) standard deviation (y t ) ρ = correlation (c t, y t ) Statistics are easy to estimate, but what is the standard error?

Business cycles statistics GMM problem for ψ h (x t ; θ 0 ) = c t µ c y t µ y ( y t µ y ) 2 ψ 2 (c t µ c ) 2

Business cycles statistics corresponding D matrix D = = 1 0 0 0 1 0 ) [ ( 2E [(c t µ c )] 2E [(y t µ y ψ 2] ) ] 2 2E y t µ y ψ 1 0 0 0 1 0 [ ( ) ] 2 0 0 2E y t µ y ψ

Steps to follow 1 Use standard estimates for µ c, µ y, and ψ 2 Obtain estimate for D 3 Obtain estimate for Σ 0 using ) h (x t ; θ T = ( y t µ y,t ) 2 ψ 2 c t µ c,t y t µ y,t ) 2 (c t µ c,t 4 Obtain estimate for V using ( D T Σ T 1 D ) 1 T

Business cycles statistics GMM problem for ρ h ( x t; θ ) = c t µ c y t µ ( ) y 2 y t µ y ψ 2 (c t µ c ) 2 ( ) 2 ) y t µ y ψρ (ct µ c ) (y t µ y

Business cycles statistics corresponding D matrix D = 1 0 0 0 0 1 0 0 [ ( ) ] 2 0 0 2E y t µ y ψ 0 [ ( ) ] [ 2 ( ) ] 2 0 0 E y t µ y ρ E y t µ y ψ

References Den Haan, W.J., and A. Levin, 1997, A practioner s guide to robust covariance matrix estimation a detailed survey of all the ways to estimate Σ 0 with more detailed information