GMM, HAC estimators, & Standard Errors for Business Cycle Statistics Wouter J. Den Haan London School of Economics c Wouter J. Den Haan
Overview Generic GMM problem Estimation Heteroskedastic and Autocorrelation Consistent (HAC) estimators to calcuate optimal weighting matrix and standard errors Simple applications OLS with correct standard errors IV with multiple instruments standard errors for business cycle statistics
GMM problem Underlying true model: E [h(x t ; θ)] = 0 p θ : m 1 vector with parameters x : n 1 vector of observables h( ) : p 1 vector-valued function with p m 0 p : p 1 vector with zeros
Examples OLS: q t = a + bp t + u t [ ut E u t p t ] = [ 0 0 ] IV: q t = a + bp t + u t [ ut E u t z t ] = [ 0 0 ]
Examples DSGE: [ ] c γ t = E t β (1 + r t+1 ) c γ t+1 u t+1 = β (1 + r t+1 ) c γ t+1 c γ t E t [u t+1 f (z t )] = 0 p = E [u t+1 f (z t )] = 0 p where z t is a vector with variables in information set in period t and f ( ) is a measurable function
GMM estimation Let where Y T contains data g(θ; Y T ) = T h (x t ; θ) /T, t=1 Idea of estimation: choose θ such that g(θ; Y T ) is as small as possible Throughout the slides, remember that g(θ; Y T ) is a mean
GMM estimation θ T = arg min θ Θ g(θ; Y T) W T g(θ; Y T ) where W T : p p weighting matrix No weighting matrix needed if p = m
Asymptotic standard errors T ( θ T θ 0 ) N(0, V) V = ( DWD ) 1 DWΣ0 W D ( DWD ) 1
Asymptotic standard errors W = plim W T T D = plim T θ 0 : true θ Σ 0 = E j= g (θ; Y T ) θ θ=θ0 [h (x t ; θ 0 ) h ( x t j ; θ 0 ) ]
Terms showing up in V D measures how sensitive g is to changes in θ 0. less precise estimates of θ 0 if g is not very sensitive to changes in θ. Σ 0 is the variance-covariance matrix of Tg(θ 0 ; Y T ) as T. if the mean underlying the estimation is more volatile = estimates of θ 0 less precise obtaining an estimate for Σ 0 is often the tricky bit (more on this below)
Two cases when formula for V simplifies 1 p = m (no overidentifying restrictions) T ( θ T θ 0 ) V = N(0, V) ( DΣ 1 0 D ) 1 2 using optimal weighting matrix, i.e., the matrix W that minimizes V. W optimal = Σ 1 0 T ( θ T θ 0 ) V = N(0, V) ( DΣ 1 0 D ) 1
Example 1 Y T = {x t } T t=1 and we want to estimate the mean µ h (x t ; µ) = x t µ g (µ; Y T ) = T t=1 (x t µ) T µ T = T t=1 x t T
Example 1 D = 1 Σ T equals the variance of ( T equals variance of T ( T t=1 x t ) T t=1 (x t µ) /T, which ) /T Variance of T t=1 x t equals variance of x 1 + covariance of x 1 &x 2 + + covariance of x 1 &x T + covariance of x 1 &x 2 + variance x 2 + etc. IF x t serially uncorrelated, then variance of ( T equals variance of x t T t=1 x t ) /T
Example 2 x 1,t and x 2,t have the same mean for simplicity assume that x 1,t and x 2,t are not correlated with each other and are not serially correlated Y T = {x 1,t, x 2,t } T t=1 h (x t ; µ) = Σ 0 = W = [ ] x1,t µ x 2,t µ & D = [ 1 1 ] [ σ 2 x1 0 ] 0 σ 2 x 2 [ 1/σ 2 x1 0 ] 0 1/σ 2 x 2
Estimating variance-covariance matrix g(θ; Y T ) is the mean of h(x t ; θ) variance of g(θ; Y T ) is easy to calculate if h (x t ; θ) is serially uncorrelated but in general it is diffi cult
Estimate variance of a mean The (limit of the) variance-covariance of Tg(θ; Y T ) equals Σ 0 = E [h (x t ; θ 0 ) h ( ) ] x t j ; θ 0 j= This is the spectral density of h (x t ; θ 0 ) at frequency 0 Since Σ 0 is a variance-covariance matrix, it should be positive semi-definite (PSD), that is, z Σ 0 z should be non-negative for any non-zero column vector z
HAC estimators 1 Kernel-based (truncated, Newey-West, Andrews) 2 Parametric (Den Haan & Levin)
Estimate variance of a mean Σ T = J [h κ(j; J) min{t,t+j} (x t=max{1,j+1} t ; θ 0 ) h ( ) ] x t j ; θ 0 T j= J where κ( ; J) is the kernel with bandwidth parameter J.
Truncated kernel Truncated κ(j; J) = { 1 if j J 0 o.w. disadvantages potential bias because autocorrelation at j > J is ignored answer not necessarily positive semi-definite
Newey-West (Bartlett) kernel κ(j; J) = 1 j J + 1 for j J Newey-West kernel always gives PSD disadvantages potential bias because autocorrelation at j > J is ignored bias because k (j; J) < 1 for included terms
Choice of bandwidth parameter a high (low) J leads to less (more) bias more (less) sampling variability J should be higher if h ( ; θ 0 ) is more persistent J optimal as T but at a slower rate
Choice of bandwidth parameter There are papers that give expressions for J optimal, but adding a constant to these expressions does not affect optimality (that is, the analysis only gives an optimal rate). Thus, in a finite sample you simply have to experiment and hope your answer is robust
Cost of imposing PSD Suppose h (x t ; θ 0 ) is an MA(1) = E [ h (x t ; θ 0 ) h ( x t j ; θ 0 )] = 0 for j > 1 If J = 1 then you have a biased estimate since k(1; 1) = 1/2 If J > 1 then you must include estimates of expectations that we know are zero Suppose h(x t ; θ 0 ) has a persistent and not persistent component = you must use the same J for both components
VARHAC Idea: Estimate a flexible time-series process (VAR) for h t, that is, h (x t ; θ 0 ) = J A j h ( ) x t j ; θ 0 + εt j=1 Use implied spectrum at frequency zero as the estimate for Σ 0
VARHAC h (x t ; θ 0 ) = J A j h ( ) x t j ; θ 0 + εt j=1 Then the implied spectrum at frequency zero, i.e., Σ 0, equals Σ T = [ I p J j=1 A j ] 1 Σ ε,t [I p ] J 1 A j j=1 with Σ ε,t = T j=j+1 ε tε t T
VARHAC Estimate is PSD by construction (PSD is obtained without imposing additional bias) Only bias is due to lag length potentially being too short You can use standard model selection criteria (AIC, BIC) to determine lag length You could estimate a VARMA VARHAC also gives a consistent estimate for nonlinear processes since Σ 0 only depends on second-order processes which can be captured with VAR
Back to GMM Exactly identified case: estimate θ T and calculate V T Over-identified case: You need W T to estimate θ T W optimal T depends on Σ 0, which depends on θ 0
Back to GMM What to do in practice? obtain (consistent) estimate of θ 0 with simple W T (identity although scaling is typically a good idea) use this estimate to calculate W optimal T use W optimal T to get a more effi cient estimate of θ 0 calculate variance of θ T using ( D T Σ T 1 D ) 1 T you could iterate on this
Examples OLS with heteroskedastic and serially correlated errors IV with multiple instruments Business cycle statistics
Key lesson of today s lecture If you can write the estimation problem as E [h(x t ; θ] = 0 p then you can use GMM and we know how to calculate standard errors
OLS q t = θp t + u t h (x t ) = u t p t = q t p t θp 2 t
OLS & GMM ) T ( θ T θ 0 N(0, V) V = ( DΣ0 1 1 ( ) D = E p 2 t
OLS & GMM ( ( ( )) ( ( )) ) 1 V = E p 2 t Σ0 1 E p 2 t If E[u t u t+τ ] = 0 for τ = 0, then [ Σ0 1 = E (u t p t ) 2] and ( ( ( )) [ V = E p 2 t E (u t p t ) 2] ( ( )) ) 1 E p 2 t If errors are homoskedastic, then [ E (u t p t ) 2] [ = E ( V = E u 2 t ] ( p 2 t [ E p 2 t ] )) 1 E [ u 2 t ]
OLS & GMM Suppose that q t = θp t + u t u t = ε t p t ε t i.i.d, E t [p t ε t ] = 0, and ε t homoskedastic Then you should do GLS q t p t = θ + ε t GMM does not by itself do the transformation, i.e., it does not do GLS, but it does give you standard errors that correct for heteroskedasticity
IV Suppose that q t = θp t + u t, E [u t z 1,t ] = 0 and E [u t z 2,t ] = 0 With GMM you can find the optimal weighting of the two instruments
Business cycles statistics 1 2 ψ = standard deviation (c t) standard deviation (y t ) ρ = correlation (c t, y t ) Statistics are easy to estimate, but what is the standard error?
Business cycles statistics GMM problem for ψ h (x t ; θ 0 ) = c t µ c y t µ y ( y t µ y ) 2 ψ 2 (c t µ c ) 2
Business cycles statistics corresponding D matrix D = = 1 0 0 0 1 0 ) [ ( 2E [(c t µ c )] 2E [(y t µ y ψ 2] ) ] 2 2E y t µ y ψ 1 0 0 0 1 0 [ ( ) ] 2 0 0 2E y t µ y ψ
Steps to follow 1 Use standard estimates for µ c, µ y, and ψ 2 Obtain estimate for D 3 Obtain estimate for Σ 0 using ) h (x t ; θ T = ( y t µ y,t ) 2 ψ 2 c t µ c,t y t µ y,t ) 2 (c t µ c,t 4 Obtain estimate for V using ( D T Σ T 1 D ) 1 T
Business cycles statistics GMM problem for ρ h ( x t; θ ) = c t µ c y t µ ( ) y 2 y t µ y ψ 2 (c t µ c ) 2 ( ) 2 ) y t µ y ψρ (ct µ c ) (y t µ y
Business cycles statistics corresponding D matrix D = 1 0 0 0 0 1 0 0 [ ( ) ] 2 0 0 2E y t µ y ψ 0 [ ( ) ] [ 2 ( ) ] 2 0 0 E y t µ y ρ E y t µ y ψ
References Den Haan, W.J., and A. Levin, 1997, A practioner s guide to robust covariance matrix estimation a detailed survey of all the ways to estimate Σ 0 with more detailed information