LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH

CHUNG-MING KUAN
Institute of Economics, Academia Sinica
October 20, 2006
ckuan@econ.sinica.edu.tw
www.sinica.edu.tw/~ckuan
Outline

- Preliminary
- HAC (heteroskedasticity and autocorrelation consistent) covariance matrix estimation
  - Kernel HAC estimators
  - Choices of the kernel function and bandwidth
- KVB approach: constructing asymptotically pivotal tests without consistent estimation of the asymptotic covariance matrix
  - Tests of parameters
  - M tests for general moment conditions
  - Over-identifying restrictions tests
Preliminary

Consider the specification: $y_t = x_t'\beta + e_t$.

[A1] For some $\beta_o$, $\epsilon_t = y_t - x_t'\beta_o$ is such that $\mathrm{IE}(x_t \epsilon_t) = 0$ and
$$T^{-1/2} \sum_{t=1}^{[Tr]} x_t \epsilon_t \Rightarrow S_o W_k(r), \qquad r \in [0,1],$$
where $\Sigma_o = \lim_{T \to \infty} \mathrm{var}\big(T^{-1/2} \sum_{t=1}^{T} x_t \epsilon_t\big)$ and $S_o$ is its matrix square root (i.e., $\Sigma_o = S_o S_o'$).

[A2] $M_{[Tr]} := [Tr]^{-1} \sum_{t=1}^{[Tr]} x_t x_t' \xrightarrow{\mathrm{IP}} M_o$ uniformly in $r \in (0,1]$.

Asymptotic normality of the OLS estimator:
$$\sqrt{T}\,(\hat\beta_T - \beta_o) = M_T^{-1}\, T^{-1/2} \sum_{t=1}^{T} x_t \epsilon_t \xrightarrow{D} M_o^{-1} S_o W_k(1),$$
which has the distribution $N(0,\, M_o^{-1} \Sigma_o M_o^{-1})$.
The null hypothesis is $R\beta_o = r$, where $R$ ($q \times k$) has full row rank. Then
$$\sqrt{T}\,(R\hat\beta_T - r) \xrightarrow{D} N(0,\, R M_o^{-1} \Sigma_o M_o^{-1} R'),$$
$$\big(R M_T^{-1} \hat\Sigma_T M_T^{-1} R'\big)^{-1/2} \sqrt{T}\,(R\hat\beta_T - r) \xrightarrow{D} N(0, I_q).$$
The Wald test is
$$W_T = T\,(R\hat\beta_T - r)' \big(R M_T^{-1} \hat\Sigma_T M_T^{-1} R'\big)^{-1} (R\hat\beta_T - r) \xrightarrow{D} \chi^2(q).$$
Consistent estimation of $\Sigma_o$ is crucial: $W_T$ would not have a limiting $\chi^2$ distribution if $\hat\Sigma_T$ were not a consistent estimator. Other large-sample tests (e.g., the LM test) also depend on consistent estimation of the asymptotic covariance matrix. When $\hat\Sigma_T$ is consistent, the resulting tests are said to be robust to heteroskedasticity and serial correlation of unknown form.
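As a concrete illustration, the sketch below (not from the lecture; the function name and argument layout are hypothetical) computes $W_T$ given the OLS estimate, $M_T^{-1}$, and some estimate $\hat\Sigma_T$ of $\Sigma_o$:

```python
# Minimal sketch of the robust Wald test above; beta_hat, M_inv = (X'X/T)^{-1},
# and Sigma_hat (any HAC estimate of Sigma_o) are assumed to be given.
import numpy as np

def wald_test(beta_hat, M_inv, Sigma_hat, R, r, T):
    """W_T = T (Rb - r)' [R M^{-1} Sigma M^{-1} R']^{-1} (Rb - r); chi2(q) under H0."""
    V = R @ M_inv @ Sigma_hat @ M_inv @ R.T   # asy. covariance of sqrt(T) R beta_hat
    d = R @ beta_hat - r
    return T * d @ np.linalg.solve(V, d)
```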
Asymptotic Covariance Matrix $\Sigma_o$

A general form:
$$\Sigma_o = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} \mathrm{IE}(x_t \epsilon_t \epsilon_s x_s') = \lim_{T \to \infty} \sum_{j=-T+1}^{T-1} \Gamma_T(j),$$
with autocovariances
$$\Gamma_T(j) = \begin{cases} \frac{1}{T} \sum_{t=j+1}^{T} \mathrm{IE}(x_t \epsilon_t \epsilon_{t-j} x_{t-j}'), & j = 0, 1, 2, \ldots, \\[4pt] \frac{1}{T} \sum_{t=-j+1}^{T} \mathrm{IE}(x_{t+j} \epsilon_{t+j} \epsilon_t x_t'), & j = -1, -2, \ldots. \end{cases}$$
When $x_t \epsilon_t$ are covariance stationary, $\Gamma_T(j) = \Gamma(j)$, and the spectral density of $x_t \epsilon_t$ at frequency $\omega$ is
$$f(\omega) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \Gamma(j) e^{-ij\omega}.$$
Thus $\Sigma_o = 2\pi f(0)$, and $\Sigma_o$ is also known as the long-run variance of $x_t \epsilon_t$.
Examples of $\Sigma_o$

When $x_t \epsilon_t$ are serially uncorrelated:
$$\Sigma_o = \lim_{T \to \infty} \Gamma_T(0) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathrm{IE}(\epsilon_t^2 x_t x_t'),$$
which can be consistently estimated by White's heteroskedasticity-consistent estimator:
$$\hat\Sigma_T = \frac{1}{T} \sum_{t=1}^{T} \hat e_t^2\, x_t x_t'.$$
When $x_t \epsilon_t$ are serially uncorrelated and $\epsilon_t$ conditionally homoskedastic:
$$\Sigma_o = \sigma_o^2 \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathrm{IE}(x_t x_t') = \sigma_o^2 M_o,$$
which can be consistently estimated by $\hat\Sigma_T = \hat\sigma_T^2 M_T$. Estimating $\Sigma_o$ is much more difficult when heteroskedasticity and serial correlation are present and of unknown form.
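In the serially uncorrelated case, White's estimator is one line of linear algebra. A minimal sketch (function name hypothetical), with `X` the $T \times k$ regressor matrix and `e_hat` the OLS residuals:

```python
import numpy as np

def white_estimator(X, e_hat):
    """Sigma_hat = T^{-1} sum_t e_hat_t^2 x_t x_t' (weighted outer-product average)."""
    return (X * e_hat[:, None] ** 2).T @ X / len(e_hat)
```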
A Consistent Estimator of $\Sigma_o$

Recall $\Sigma_o = \lim_{T \to \infty} \sum_{j=-T+1}^{T-1} \Gamma_T(j)$. A consistent estimator (White, 1984) is
$$\hat\Sigma_T = \sum_{j=-l(T)}^{l(T)} \hat\Gamma_T(j),$$
with sample autocovariances
$$\hat\Gamma_T(j) = \begin{cases} \frac{1}{T} \sum_{t=j+1}^{T} x_t \hat e_t \hat e_{t-j} x_{t-j}', & j = 0, 1, 2, \ldots, \\[4pt] \frac{1}{T} \sum_{t=-j+1}^{T} x_{t+j} \hat e_{t+j} \hat e_t x_t', & j = -1, -2, \ldots, \end{cases}$$
where $l(T)$ grows with $T$ but at a slower rate. Drawbacks:
- This estimator is not guaranteed to be a positive semi-definite matrix.
- One must determine $l(T)$ in a given sample.
Kernel HAC Estimators

Newey and West (1987) and Gallant (1987): a consistent estimator that is also positive semi-definite is
$$\hat\Sigma_T^{\kappa} = \sum_{j=-T+1}^{T-1} \kappa\!\Big(\frac{j}{l(T)}\Big)\, \hat\Gamma_T(j),$$
where $\kappa$ is a kernel function and $l(T)$ is its bandwidth. Together, $\kappa$ and $l(T)$ determine the weighting scheme on $\hat\Gamma_T(j)$ and must be selected by users.
- For a given $j$, $\kappa(j/l(T)) \to 1$ as $T \to \infty$, so as to ensure consistency.
- For a given $T$, $\kappa(j/l(T))$ should be small when $j$ is large, so as to ensure positive semi-definiteness. More formally (Andrews, 1991),
$$\int_{-\infty}^{\infty} \kappa(x) e^{-ix\omega}\, dx \ge 0, \qquad \omega \in \mathbb{R}.$$
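A sketch of the generic kernel HAC estimator (names hypothetical), assuming `V` is the $T \times k$ matrix with rows $x_t \hat e_t$ and `kappa` is any kernel function such as those on the next slide:

```python
import numpy as np

def gamma_hat(V, j):
    """Sample autocovariance Gamma_hat(j), j >= 0, of the rows of V (T x k)."""
    T = V.shape[0]
    return V[j:].T @ V[:T - j] / T

def kernel_hac(V, kappa, bandwidth):
    """Sigma_hat = sum_{|j|<T} kappa(j / l(T)) Gamma_hat(j)."""
    T = V.shape[0]
    S = gamma_hat(V, 0)
    for j in range(1, T):
        w = kappa(j / bandwidth)
        if w == 0.0:
            continue                     # e.g., Bartlett weights vanish for j >= l(T)
        G = gamma_hat(V, j)
        S += w * (G + G.T)               # Gamma_hat(-j) = Gamma_hat(j)'
    return S
```

For instance, `kernel_hac(V, lambda x: max(1 - abs(x), 0.0), 5.0)` gives the Newey-West (Bartlett) estimator with bandwidth 5.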
Kernel Functions

Bartlett kernel (Newey and West, 1987):
$$\kappa(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise}; \end{cases}$$
Parzen kernel (Gallant, 1987):
$$\kappa(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & |x| \le 1/2, \\ 2(1 - |x|)^3, & 1/2 \le |x| \le 1, \\ 0, & \text{otherwise}; \end{cases}$$
Quadratic spectral kernel (Andrews, 1991):
$$\kappa(x) = \frac{25}{12\pi^2 x^2} \Big( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \Big);$$
Daniell kernel (Ng and Perron, 1996):
$$\kappa(x) = \frac{\sin(\pi x)}{\pi x}.$$
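These four kernels translate directly into code; a minimal sketch:

```python
import numpy as np

def bartlett(x):
    return max(1.0 - abs(x), 0.0)

def parzen(x):
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
    return 2.0 * (1.0 - ax)**3 if ax <= 1.0 else 0.0

def quadratic_spectral(x):
    if x == 0.0:
        return 1.0                                  # continuous limit at x = 0
    z = 6.0 * np.pi * x / 5.0
    return 25.0 / (12.0 * np.pi**2 * x**2) * (np.sin(z) / z - np.cos(z))

def daniell(x):
    return 1.0 if x == 0.0 else np.sin(np.pi * x) / (np.pi * x)
```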
Bandwidths

Andrews (1991): by minimizing the MSE, the optimal bandwidth growth rates are
$$l^*(T) = 1.1447\,(c_1 T)^{1/3} \quad \text{(Bartlett)},$$
$$l^*(T) = 2.6614\,(c_2 T)^{1/5} \quad \text{(Parzen)},$$
$$l^*(T) = 1.3221\,(c_2 T)^{1/5} \quad \text{(quadratic spectral)},$$
where $c_1$ and $c_2$ are unknown numbers depending on the spectral density, and $\sqrt{T/l^*(T)}\,\big(\hat\Sigma_T^{\kappa} - \Sigma_o\big) = O_{\mathrm{IP}}(1)$. As far as the MSE is concerned, the quadratic spectral kernel is to be preferred:
- $\hat\Sigma_T^{B} - \Sigma_o = O_{\mathrm{IP}}(T^{-1/3})$; $\hat\Sigma_T^{P}$ and $\hat\Sigma_T^{QS}$ are $O_{\mathrm{IP}}(T^{-2/5})$.
- $\hat\Sigma_T^{QS}$ is more efficient than $\hat\Sigma_T^{P}$; $\hat\Sigma_T^{B}$ is the least efficient.
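A hypothetical helper encoding these growth rates; note that $c_1$ and $c_2$ are unknown and must come from a plug-in step (Andrews, 1991), so they are treated as given inputs here:

```python
def andrews_bandwidth(T, kernel, c1=None, c2=None):
    """Optimal bandwidth growth rates l*(T) from Andrews (1991); c1, c2 user-supplied."""
    if kernel == "bartlett":
        return 1.1447 * (c1 * T) ** (1.0 / 3.0)
    if kernel == "parzen":
        return 2.6614 * (c2 * T) ** (1.0 / 5.0)
    if kernel == "qs":
        return 1.3221 * (c2 * T) ** (1.0 / 5.0)
    raise ValueError("unknown kernel: " + kernel)
```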
Problems with the Kernel Estimators

- The performance of the kernel HAC estimator varies with the choices of the kernel and its bandwidth.
- The kernel weighting scheme yields negative bias, and such bias can be substantial in finite samples. Tests based on HAC estimators usually over-reject the null.
- The choices of kernel and bandwidth are somewhat arbitrary in practice, and hence the statistical inferences are vulnerable.
- The HAC estimator with the quadratic spectral kernel need not have better performance in finite samples.
- Andrews (1991) suggested a plug-in method to estimate the optimal growth rate $l^*(T)$, but this method requires estimation of a user-selected model to determine $c_1$ and $c_2$.
Other Improved HAC Estimators

- Andrews and Monahan (1992): pre-whitened estimator. Apply a VAR model to whiten $x_t \hat e_t$ and estimate the covariance matrix based on its residuals. The choices of the model for pre-whitening and of the VAR lag order are, again, arbitrary.
- Kuan and Hsieh (2006): compute sample autocovariances based on forecast errors $(y_t - x_t'\hat\beta_{t-1})$ instead of the OLS residuals. This does not require another user-chosen parameter. It yields a smaller bias (but a larger MSE); the resulting tests have more accurate size without sacrificing power.
- Bias reduction seems more important for improving HAC estimators.
KVB Approach

Kiefer, Vogelsang, and Bunzel (2000): a Wald-type test is
$$W_T^{\dagger} = T\,(R\hat\beta_T - r)' \big(R M_T^{-1} \hat C_T M_T^{-1} R'\big)^{-1} (R\hat\beta_T - r),$$
where a normalizing matrix $\hat C_T$ is used in place of $\hat\Sigma_T^{\kappa}$. $\hat C_T$ is inconsistent for $\Sigma_o$ but is able to eliminate the nuisance parameters in $\Sigma_o$. Advantages:
- There is no kernel bandwidth to choose.
- The resulting test remains asymptotically pivotal.
- The limiting distribution of the test approximates the finite-sample distribution very well (i.e., little size distortion).
KVB's Normalizing Matrix

Let $\hat\varphi_t = T^{-1/2} \sum_{i=1}^{t} x_i \hat e_i$. The normalizing matrix $\hat C_T$ is
$$\hat C_T = \frac{1}{T} \sum_{t=1}^{T} \hat\varphi_t \hat\varphi_t' = \frac{1}{T^2} \sum_{t=1}^{T} \Big( \sum_{i=1}^{t} x_i \hat e_i \Big) \Big( \sum_{i=1}^{t} x_i \hat e_i \Big)'.$$
The limit of $\hat\varphi_{[Tr]}$:
$$\hat\varphi_{[Tr]} = \frac{1}{\sqrt{T}} \sum_{i=1}^{[Tr]} x_i \epsilon_i - \frac{[Tr]}{T} \Big( \frac{1}{[Tr]} \sum_{i=1}^{[Tr]} x_i x_i' \Big) \sqrt{T}\,(\hat\beta_T - \beta_o) \Rightarrow S_o W_k(r) - r M_o M_o^{-1} S_o W_k(1) = S_o B_k(r).$$
Hence,
$$\hat C_T \Rightarrow S_o \Big( \int_0^1 B_k(r) B_k(r)'\, dr \Big) S_o' =: S_o P_k S_o'.$$
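Computing $\hat C_T$ needs only the partial sums of $x_t \hat e_t$. A minimal sketch (function name hypothetical), with `V` the $T \times k$ matrix of rows $x_t \hat e_t$ as before:

```python
import numpy as np

def kvb_C(V):
    """C_hat = T^{-1} sum_t phi_t phi_t' with phi_t = T^{-1/2} sum_{i<=t} v_i."""
    T = V.shape[0]
    phi = np.cumsum(V, axis=0) / np.sqrt(T)   # row t holds phi_t'
    return phi.T @ phi / T
```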
Let $G_o$ denote the matrix square root of $R M_o^{-1} S_o S_o' M_o^{-1} R'$. Then
$$R M_T^{-1} \hat C_T M_T^{-1} R' \Rightarrow R M_o^{-1} S_o P_k S_o' M_o^{-1} R' \stackrel{d}{=} G_o P_q G_o',$$
and
$$\sqrt{T}\, R\,(\hat\beta_T - \beta_o) \xrightarrow{D} R M_o^{-1} S_o W_k(1) \stackrel{d}{=} G_o W_q(1).$$
The test $W_T^{\dagger}$ is thus asymptotically pivotal:
$$W_T^{\dagger} \Rightarrow W_q(1)'\, G_o' \big( G_o P_q G_o' \big)^{-1} G_o W_q(1) = W_q(1)'\, P_q^{-1}\, W_q(1).$$
Lobato (2001) reported some quantiles of this distribution; cf. Kiefer et al. (2000). For the null of $\beta_i = r$, a t-type test is
$$t^{\dagger} = \frac{\sqrt{T}\,(\hat\beta_{i,T} - r)}{\hat\delta_i} \xrightarrow{D} W(1) \Big[ \int_0^1 B(r)^2\, dr \Big]^{-1/2}.$$
This distribution is more dispersed than the standard normal distribution.
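The quantiles of $W(1)\big[\int_0^1 B(r)^2\, dr\big]^{-1/2}$ are nonstandard but easy to simulate. A Monte Carlo sketch (the grid size and replication count are illustrative, not from the lecture):

```python
import numpy as np

def kvb_t_quantile(q, reps=20000, n=1000, seed=0):
    """Simulated quantile of W(1) / sqrt(int_0^1 B(r)^2 dr)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for i in range(reps):
        W = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # Wiener path on a grid
        B = W - (np.arange(1, n + 1) / n) * W[-1]            # Brownian bridge
        stats[i] = W[-1] / np.sqrt(np.mean(B**2))            # mean(B^2) ~ the integral
    return np.quantile(stats, q)

# e.g., kvb_t_quantile(0.975) approximates the two-sided 5% critical value
```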
Kernel-Based Normalizing Matrices

Kiefer and Vogelsang (2002a): $2\hat C_T = \hat\Sigma_T^{B}$ without truncation, i.e., with $l(T) = T$. The usual Wald test based on $\hat\Sigma_T^{B}$ without truncation is thus the same as $W_T^{\dagger}/2$; in particular, the t test based on $\hat\Sigma_T^{B}$ without truncation is $t^{\dagger}/\sqrt{2}$.

Kiefer and Vogelsang (2002b): without truncation,
$$\hat\Sigma_T^{\kappa} \Rightarrow S_o P_k^{\kappa} S_o', \qquad P_k^{\kappa} = -\int_0^1 \int_0^1 \kappa''(r - s)\, B_k(r) B_k(s)'\, dr\, ds.$$
The Wald test based on $\hat\Sigma_T^{\kappa}$ without truncation can thus also serve as a KVB-type robust test. A test based on $\hat\Sigma_T^{B}$ without truncation compares favorably with one based on $\hat\Sigma_T^{QS}$ in terms of test power. Hence, the Bartlett kernel is to be preferred in constructing a KVB test, in contrast with HAC estimation.
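The identity $2\hat C_T = \hat\Sigma_T^{B}$ with $l(T) = T$ can be checked numerically; it relies on the OLS moment vectors $x_t \hat e_t$ summing exactly to zero. A sketch on simulated data (all data-generating choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(T)
b = np.linalg.lstsq(X, y, rcond=None)[0]
V = X * (y - X @ b)[:, None]             # v_t = x_t e_hat_t; columns sum to zero

S_B = V.T @ V / T                        # Bartlett estimator with l(T) = T
for j in range(1, T):
    G = V[j:].T @ V[:T - j] / T
    S_B += (1.0 - j / T) * (G + G.T)

phi = np.cumsum(V, axis=0) / np.sqrt(T)  # KVB normalizing matrix
C = phi.T @ phi / T

print(np.allclose(S_B, 2.0 * C))         # True
```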
M Tests

The null hypothesis: $\mathrm{IE}[f(\eta_t; \theta_o)] = 0$, where $\theta_o$ is the $k \times 1$ true parameter vector and $f$ is a $q \times 1$ vector of functions.

$\theta_o$ Is Known

Define $m_{[Tr]}(\theta) = \frac{1}{[Tr]} \sum_{t=1}^{[Tr]} f(\eta_t; \theta)$ for $r \in (0,1]$. An M test is based on $m_T(\theta_o)$, the sample counterpart of the null. By a CLT,
$$\sqrt{T}\, m_T(\theta_o) \xrightarrow{D} N(0, \Sigma_o),$$
and the conventional M test is
$$M_T = T\, m_T(\theta_o)'\, \hat\Sigma_T^{-1}\, m_T(\theta_o) \xrightarrow{D} \chi^2(q),$$
when $\hat\Sigma_T$ is a consistent estimator of $\Sigma_o$. The limiting $\chi^2$ distribution hinges on consistent estimation of $\Sigma_o$.
[B1](a) Under the null, $T^{-1/2} \sum_{t=1}^{[Tr]} f(\eta_t; \theta_o) \Rightarrow S_o W_q(r)$ for $0 \le r \le 1$, where $S_o$ is the nonsingular matrix square root of $\Sigma_o$.

Let
$$C_T(\theta_o) = \frac{1}{T} \sum_{t=1}^{T} \varphi_t(\theta_o) \varphi_t(\theta_o)', \qquad \varphi_t(\theta_o) = \frac{1}{\sqrt{T}} \sum_{i=1}^{t} \big[ f(\eta_i; \theta_o) - m_T(\theta_o) \big].$$
Analogous to KVB's Wald-type test, an M test is
$$M_T^{\dagger} = T\, m_T(\theta_o)'\, C_T(\theta_o)^{-1}\, m_T(\theta_o) \xrightarrow{D} W_q(1)'\, P_q^{-1}\, W_q(1).$$
By [B1](a), $\sqrt{T}\, m_T(\theta_o) \Rightarrow S_o W_q(1)$,
$$\varphi_{[Tr]}(\theta_o) \Rightarrow S_o \big[ W_q(r) - r W_q(1) \big] = S_o B_q(r), \qquad 0 \le r \le 1,$$
and $C_T(\theta_o) \Rightarrow S_o P_q S_o'$ with $P_q = \int_0^1 B_q(r) B_q(r)'\, dr$.
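A sketch of $M_T^{\dagger}$ for known $\theta_o$ (function name hypothetical), assuming `F` stacks $f(\eta_t; \theta_o)'$ row by row; evaluating `F` at $\hat\theta_T$ instead gives the statistic on the next slide:

```python
import numpy as np

def m_test(F):
    """M = T m' C^{-1} m with phi_t = T^{-1/2} sum_{i<=t} [f_i - m]; F is T x q."""
    T = F.shape[0]
    m = F.mean(axis=0)
    phi = np.cumsum(F - m, axis=0) / np.sqrt(T)
    C = phi.T @ phi / T
    return T * m @ np.linalg.solve(C, m)
```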
$\theta_o$ Is Unknown

Replace $\theta_o$ in $m_T$ and $\varphi_t$ with a root-$T$ consistent estimator $\hat\theta_T$ that satisfies
$$\sqrt{T}\,(\hat\theta_T - \theta_o) = Q_o \Big[ \frac{1}{\sqrt{T}} \sum_{t=1}^{T} q(\eta_t; \theta_o) \Big] + o_{\mathrm{IP}}(1).$$
Kuan and Lee (2006): the M test is
$$\hat M_T^{\dagger} = T\, m_T(\hat\theta_T)'\, \hat C_T^{-1}\, m_T(\hat\theta_T),$$
where $\hat C_T = C_T(\hat\theta_T) = \frac{1}{T} \sum_{t=1}^{T} \varphi_t(\hat\theta_T) \varphi_t(\hat\theta_T)'$ with
$$\varphi_t(\hat\theta_T) = \frac{1}{\sqrt{T}} \sum_{i=1}^{t} \big[ f(\eta_i; \hat\theta_T) - m_T(\hat\theta_T) \big].$$
The limit of $\hat M_T^{\dagger}$ depends on the estimation effect of replacing $\theta_o$ with $\hat\theta_T$.
[B1](b) Under the null,
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \begin{bmatrix} f(\eta_t; \theta_o) \\ q(\eta_t; \theta_o) \end{bmatrix} \xrightarrow{D} G_o W_{q+k}(1),$$
where $G_o$ is nonsingular.

[B2] $F_{[Tr]}(\theta_o) = [Tr]^{-1} \sum_{t=1}^{[Tr]} \nabla_{\theta} f(\eta_t; \theta_o) \xrightarrow{\mathrm{IP}} F_o$ uniformly in $r \in (0,1]$, where $F_o$ is a $q \times k$ non-stochastic matrix; $\nabla_{\theta} F_{[Tr]}(\theta_o) = O_{\mathrm{IP}}(1)$.

A Taylor expansion about $\theta_o$ gives
$$\frac{[Tr]}{\sqrt{T}}\, m_{[Tr]}(\hat\theta_T) = \frac{[Tr]}{\sqrt{T}}\, m_{[Tr]}(\theta_o) + \frac{[Tr]}{T}\, F_{[Tr]}(\theta_o) \big[ \sqrt{T}\,(\hat\theta_T - \theta_o) \big] + o_{\mathrm{IP}}(1);$$
the second term is the estimation effect and converges to $r F_o Q_o A_o W_k(1)$, where $A_o$ is the matrix square root of $G_{22} G_{22}' + G_{21} G_{21}'$.
[B1](b) and [B2] imply
$$\sqrt{T}\, m_T(\hat\theta_T) \xrightarrow{D} [\,I_q \;\; F_o Q_o\,]\, G_o W_{q+k}(1) \stackrel{d}{=} V_o W_q(1),$$
where $V_o$ is the matrix square root of $[\,I_q \;\; F_o Q_o\,]\, G_o G_o'\, [\,I_q \;\; F_o Q_o\,]'$. Note $V_o = S_o$ when $F_o = 0$ (i.e., no estimation effect).

Due to centering, the estimation effects in $\varphi_{[Tr]}(\hat\theta_T)$ cancel out:
$$\varphi_{[Tr]}(\hat\theta_T) = \frac{[Tr]}{\sqrt{T}}\, m_{[Tr]}(\theta_o) + \frac{[Tr]}{T}\, F_{[Tr]}(\theta_o) \big[ \sqrt{T}\,(\hat\theta_T - \theta_o) \big] - \frac{[Tr]}{\sqrt{T}}\, m_T(\theta_o) - \frac{[Tr]}{T}\, F_T(\theta_o) \big[ \sqrt{T}\,(\hat\theta_T - \theta_o) \big] + o_{\mathrm{IP}}(1)$$
$$= \frac{[Tr]}{\sqrt{T}}\, m_{[Tr]}(\theta_o) - \frac{[Tr]}{\sqrt{T}}\, m_T(\theta_o) + o_{\mathrm{IP}}(1).$$
Hence $\hat C_T = C_T(\theta_o) + o_{\mathrm{IP}}(1) \Rightarrow S_o P_q S_o'$, regardless of the estimation effect.
When the estimation effect is present, $\hat C_T$ is unable to eliminate $V_o$, and
$$\hat M_T^{\dagger} \xrightarrow{D} W_q(1)'\, V_o' \big[ S_o P_q S_o' \big]^{-1} V_o\, W_q(1).$$
That is, $\hat M_T^{\dagger}$ depends on $S_o$ and $V_o$ and is not asymptotically pivotal. When there is no estimation effect ($F_o = 0$), $V_o = S_o$, and
$$\hat M_T^{\dagger} \xrightarrow{D} W_q(1)'\, P_q^{-1}\, W_q(1),$$
which is also the limit of $M_T^{\dagger}$.

Remark: The nonsingularity of $G_o$ required in [B1](b) is crucial for the M tests here. It excludes cases in which the moment functions ($f$) and the estimator (which depends on $q$) are asymptotically correlated, e.g., the over-identifying restrictions test in the context of GMM.
M Test under Estimation Effect

Kuan and Lee (2006):
$$\tilde C_T = \frac{1}{T} \sum_{t=k+1}^{T} \tilde\varphi_t \tilde\varphi_t', \qquad \tilde\varphi_t = \tilde\varphi_t(\tilde\theta_t, \tilde\theta_T) = \frac{1}{\sqrt{T}} \sum_{i=1}^{t} \big[ f(\eta_i; \tilde\theta_t) - m_T(\tilde\theta_T) \big],$$
where $\tilde\theta_t$ are the recursive estimators based on the first $t$ observations (so $\tilde\theta_T = \hat\theta_T$). The M test is
$$M_T^{\ddagger} = T\, m_T(\hat\theta_T)'\, \tilde C_T^{-1}\, m_T(\hat\theta_T) \xrightarrow{D} W_q(1)'\, P_q^{-1}\, W_q(1),$$
which has the same limit as $M_T^{\dagger}$. Here $\sqrt{T}\, m_T(\hat\theta_T) \Rightarrow V_o W_q(1)$, $\tilde\varphi_{[Tr]} \Rightarrow V_o B_q(r)$, and hence $\tilde C_T \Rightarrow V_o P_q V_o'$. While HAC estimation of $V_o$ is practically difficult, $M_T^{\ddagger}$ avoids estimating $V_o$ and hence is also robust to the estimation effect.
Example: Tests of Serial Correlations

Specification: $y_t = h(x_t; \theta) + e_t(\theta)$, with the NLS estimator $\hat\theta_T$. Under the null, $\mathrm{IE}(y_t \mid x_t) = h(x_t; \theta_o)$ and $\varepsilon_t := e_t(\theta_o) = y_t - h(x_t; \theta_o)$. The null hypothesis is
$$\mathrm{IE}[f_{t,q}(\theta_o)] = \mathrm{IE}(\varepsilon_t\, \varepsilon_{t-1,q}) = 0,$$
where $\varepsilon_{t-1,q} = [\varepsilon_{t-1}, \ldots, \varepsilon_{t-q}]'$. Letting $T_q = T - q$, define
$$m_{T_q}(\theta) = \frac{1}{T_q} \sum_{t=q+1}^{T} e_t(\theta)\, e_{t-1,q}(\theta).$$
We can base an M test on $m_{T_q}(\hat\theta_T) = \frac{1}{T_q} \sum_{t=q+1}^{T} e_t(\hat\theta_T)\, e_{t-1,q}(\hat\theta_T)$. However, $T_q^{1/2}\, m_{T_q}(\hat\theta_T)$ and $T_q^{1/2}\, m_{T_q}(\theta_o)$ are not asymptotically equivalent unless $F_{T_q}(\theta_o)$ converges to $F_o = 0$.
Here,
$$F_{T_q}(\theta_o) = -\frac{1}{T_q} \sum_{t=q+1}^{T} \big[ \varepsilon_{t-1,q}\, \nabla_{\theta} h_t(\theta_o)' + \varepsilon_t\, \nabla_{\theta} h_{t-1,q}(\theta_o) \big].$$
$F_o$ would be zero if $\{x_t\}$ and $\{\varepsilon_t\}$ are mutually independent. When $h(x_t; \theta_o) = x_t'\theta_o$, $F_o = 0$ when $\{x_t\}$ and $\{\varepsilon_t\}$ are mutually uncorrelated. The M test based on model residuals is
$$M_q^{\dagger} = T_q\, m_{T_q}(\hat\theta_T)'\, \hat C_q^{-1}\, m_{T_q}(\hat\theta_T) \xrightarrow{D} W_q(1)'\, P_q^{-1}\, W_q(1),$$
where the normalizing matrix is $\hat C_q = \frac{1}{T_q} \sum_{t=q+1}^{T} \hat\varphi_t \hat\varphi_t'$ with
$$\hat\varphi_t = \frac{1}{\sqrt{T_q}} \sum_{i=q+1}^{t} e_i(\hat\theta_T)\, e_{i-1,q}(\hat\theta_T) - \frac{t-q}{T_q} \frac{1}{\sqrt{T_q}} \sum_{i=q+1}^{T} e_i(\hat\theta_T)\, e_{i-1,q}(\hat\theta_T).$$
$M_q^{\dagger}$ includes the test of Lobato (2001) for raw time series as a special case.
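A sketch of $M_q^{\dagger}$ from a residual series `e` (function name hypothetical); this version is appropriate only in the no-estimation-effect case $F_o = 0$ discussed above:

```python
import numpy as np

def m_q_test(e, q):
    """KVB-type test of no serial correlation at lags 1..q, based on residuals e."""
    T, n = len(e), len(e) - q                                  # n = T_q
    # column j holds e_t * e_{t-j} for t = q+1, ..., T
    F = np.column_stack([e[q:] * e[q - j:T - j] for j in range(1, q + 1)])
    m = F.mean(axis=0)                                         # m_{T_q}(theta_hat)
    phi = np.cumsum(F - m, axis=0) / np.sqrt(n)                # centered partial sums
    C = phi.T @ phi / n
    return n * m @ np.linalg.solve(C, m)
```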
$F_o \ne 0$ for the residuals of dynamic models, such as AR models and models with lagged dependent variables. The M test based on the residuals of dynamic models is
$$M_q^{\ddagger} = T_q\, m_{T_q}(\hat\theta_T)'\, \tilde C_q^{-1}\, m_{T_q}(\hat\theta_T) \xrightarrow{D} W_q(1)'\, P_q^{-1}\, W_q(1),$$
where the normalizing matrix is $\tilde C_q = \frac{1}{T_q} \sum_{t=q+1}^{T} \tilde\varphi_t \tilde\varphi_t'$ with
$$\tilde\varphi_t = \frac{1}{\sqrt{T_q}} \sum_{i=q+1}^{t} e_i(\tilde\theta_t)\, e_{i-1,q}(\tilde\theta_t) - \frac{t-q}{T_q} \frac{1}{\sqrt{T_q}} \sum_{i=q+1}^{T} e_i(\tilde\theta_T)\, e_{i-1,q}(\tilde\theta_T),$$
and $e_i(\tilde\theta_t) = y_i - h(x_i; \tilde\theta_t)$ is the $i$th residual evaluated at the recursive NLS estimator $\tilde\theta_t$. $M_q^{\ddagger}$ is a specification test that does not require consistent estimation of the asymptotic covariance matrix.
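For a linear dynamic model $y_t = x_t'\theta + e_t$ (with lagged dependent variables among the regressors), the recursive estimators are just OLS on expanding samples. A sketch (names hypothetical; the recursion is started at `t0` so that the early estimates are identified, a detail the slide leaves implicit):

```python
import numpy as np

def m_q_test_recursive(y, X, q, t0=None):
    """M-type serial-correlation test robust to the estimation effect (F_o != 0)."""
    T, k = X.shape
    t0 = t0 if t0 is not None else k + q + 1

    def moments(theta, t):               # sum_{i=q+1}^{t} e_i * [e_{i-1}, ..., e_{i-q}]
        e = y[:t] - X[:t] @ theta
        return np.array([e[q:] @ e[q - j:t - j] for j in range(1, q + 1)])

    theta_T = np.linalg.lstsq(X, y, rcond=None)[0]
    n = T - q                            # n = T_q
    m = moments(theta_T, T) / n
    rows = []
    for t in range(t0, T + 1):           # recursive OLS on the first t observations
        theta_t = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]
        rows.append((moments(theta_t, t) - (t - q) * m) / np.sqrt(n))
    phi = np.array(rows)                 # phi_tilde_t, t = t0, ..., T
    C = phi.T @ phi / n
    return n * m @ np.linalg.solve(C, m)
```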
[Figure 1: The Bartlett, Parzen, quadratic spectral, and Daniell kernels; $\kappa(x)$ plotted for $0 \le x \le 4$.]
[Figure 2: The asymptotic local powers, at the 5% level, of the standard M test (solid), $M_T^{\dagger}$ (dashed), and $M_T^{\ddagger}$ (dotted).]