A TWO-STAGE PLUG-IN BANDWIDTH SELECTION AND ITS IMPLEMENTATION FOR COVARIANCE ESTIMATION

Size: px

Start display at page:

Download "A TWO-STAGE PLUG-IN BANDWIDTH SELECTION AND ITS IMPLEMENTATION FOR COVARIANCE ESTIMATION"

Frederica Baker
6 years ago
Views:

1 Econometric heory, 26, 2010, doi: /s A WO-SAGE PLUG-IN BANDWIDH SELECION AND IS IMPLEMENAION FOR COVARIANCE ESIMAION MASAYUKI HIRUKAWA Northern Illinois University he two most popular bandwidth choice rules for kernel HAC estimation have been proposed by Andrews 1991 and Newey and West his paper suggests an alternative approach that estimates an unknown quantity in the optimal bandwidth for the HAC estimator called normalized curvature using a general class of kernels, and derives the optimal bandwidth that minimizes the asymptotic mean squared error of the estimator of normalized curvature. It is shown that the optimal bandwidth for the kernel-smoothed normalized curvature estimator should diverge at a slower rate than that of the HAC estimator using the same kernel. An implementation method of the optimal bandwidth for the HAC estimator, which is analogous to the one for probability density estimation by Sheather and Jones 1991, is also developed. he finite sample performance of the new bandwidth choice rule is assessed through Monte Carlo simulations. 1. INRODUCION Over the last two decades considerable attention has been paid to heteroskedasticity and autocorrelation consistent HAC estimation for the long-run variance LRV matrix of random vector processes that may exhibit serial dependence and conditional heteroskedasticity of unknown form. his paper focuses on a standard, kernel-smoothing approach to HAC estimation and prescribes a suitable choice of bandwidth for the HAC estimator. he bandwidth choice for a prespecified kernel has been considered by Andrews 1991 and Newey and West While both of these papers derive I would like to thank Bruce Hansen and Kenneth West for providing advice and encouragement. Comments from two anonymous referees, Gordon Fisher, Nikolay Gospodinov, Yuichi Kitamura the co-editor, and Victoria Zinde- Walsh substantially helped the revision of this paper. I also thank David Brown, Xiaohong Chen, Sílvia Gonçalves, Guido Kuersteiner, Carlos Martins-Filho, Nour Meddahi, aisuke Otsu, Katsumi Shimotsu, Gautam ripathi, and participants at Montreal Econometrics Workshop, 2005 Canadian Economics Association Annual Meetings, and seminars at Concordia University, Hitotsubashi University, Oregon State University, Queen s University, University of okyo, and University of Wisconsin for helpful comments and suggestions. Address correspondence to Masayuki Hirukawa, Department of Economics, Northern Illinois University, Zulauf Hall 515, DeKalb, IL 60115, USA; mhirukawa@niu.edu. 710 c Cambridge University Press /10 $15.00

2 WO-SAGE PLUG-IN BANDWIDH SELECION 711 the bandwidth that minimizes the asymptotic mean squared error AMSE of the HAC estimator, they differ in their approach to estimating an unknown quantity in the AMSE-optimal bandwidth. his unknown quantity is the ratio of the spectral density of the innovation process and its generalized derivative, evaluated at zero frequency, which is referred to as normalized curvature hereinafter. Andrews 1991 estimates the normalized curvature by simply fitting an AR1 model. His approach is analogous to Silverman s rule of thumb for probability density estimation Silverman, 1986, Sect A potential problem is that, in general, the data-dependent/automatic bandwidth is not consistent for the AMSEoptimal bandwidth unless the reference model provides a correct specification of the process. Hence, this approach may perform poorly when the process is not well approximated by an AR1 model. In contrast, in order to avoid the issue of misspecification of the process, Newey and West 1994 estimate the normalized curvature nonparametrically using the truncated kernel. However, the use of the truncated kernel prevents them from providing an optimal bandwidth for the normalized curvature estimator. As a result, they implement the bandwidth for the normalized curvature estimator in an ad hoc manner. his paper suggests an alternative approach that adapts the reliable Sheather and Jones 1991 bandwidth choice rule for probability density estimation to HAC estimation. he proposed method is motivated by the parallel setting of probability and spectral density estimation: using the fact that their AMSEs have some common structure, the aim is to establish an analog to the bandwidth choice rule by Sheather and Jones 1991, which has been appraised as the most reliable among all existing methods by Jones, Marron, and Sheather Similarly to the bandwidth choice of Sheather and Jones 1991 that builds on two-stage density estimation Jones and Sheather, 1991, the approach in this paper sequentially estimates normalized curvature first-stage and LRV second-stage using a general class of kernels, where the kernels in the two parts are possibly different. For this reason, the paper calls the proposed approach two-stage plug-in bandwidth selection. he AMSE-optimal bandwidth for the normalized curvature estimator is derived, and it is used to implement the AMSE-optimal bandwidth for the HAC estimator with an algorithm analogous to the one by Sheather and Jones In a related context, Politis 2003 and Politis and White 2004 propose to estimate normalized curvature nonparametrically using the flat-top kernel for probability and spectral density estimation and for the block choice problem in the moving block bootstrap. While they argue that the flat-top kernel for normalized curvature estimation appears to be theoretically very appealing, such an infiniteorder kernel is not considered in this paper. Also, although an optimal kernel choice for normalized curvature estimation or even an optimal combination of kernels for first- and second-stage estimation is beyond the scope of this paper, this presents an interesting challenge for future research. he remainder of the paper is organized as follows: Section 2 develops the theory of two-stage plug-in bandwidth selection and the implementation method

3 712 MASAYUKI HIRUKAWA of the optimal bandwidth with theoretical ustifications. Section 3 reports the results of two Monte Carlo experiments. Section 4 summarizes the main results of the paper. All assumptions are given in Appendix A and proofs are given in Appendix B. his paper adopts the following notational conventions: [x] denotes the integer part of x; A signifies the Euclidean norm of matrix A, i.e., A = tra A } 1/2 ; vec A denotes the column-by-column vectorization function of matrix A; is used to represent the tensor or Kronecker product; and c> 0 denotes a generic constant, the quantity of which varies from statement to statement. he expression X Y is used whenever X /Y 1as. Lastly, define 0 0 1by convention. 2. WO-SAGE PLUG-IN BANDWIDH SELECION 2.1. Optimal Bandwidth for Normalized Curvature Estimation o illustrate the main ideas, consider LRV estimation in the generalized method of moments GMM framework Hansen, Suppose that an economic theory implies a set of moment conditions Egz t,θ 0 } Eg t = 0, where z t } t= is a stationary, strongly mixing process, θ R p is a parameter vector of interest with true value θ 0, and gz,θ R s p s is a known measurable vector-valued function in z, θ. Define the LRV of g t } as 1 = lim E t=1 g t g t t=1 } = Eg t g t = = Ɣ g. = When g t } exhibits serial dependence and conditional heteroskedasticity of unknown form, the inverse of a HAC estimator of consistently estimates the optimal weighting matrix that is required for efficient GMM estimation. he standard HAC estimator of takes the form of weighted autocovariances ˆ = 1 k ˆƔ g = = 1 S 1 1 k = 1 S min +, } ĝ t ĝ t, t=max1,1+ } where k is a kernel function, S R + is a nonstochastic bandwidth sequence, ĝ t = gz t, ˆθ, and ˆθ is the first-step GMM estimator. Likewise, we denote the pseudo-estimator of as = 1 k Ɣ g = = 1 S 1 1 k = 1 S min +, } g t g t, t=max1,1+ } which has the same form as ˆ but is based on the unobservable process g t } rather than ĝ t }.

4 WO-SAGE PLUG-IN BANDWIDH SELECION 713 Consider first the AMSE-optimal bandwidth S for the pseudo-estimator. Following Newey and West 1994, 1 define the mean squared error MSE of as } 2 MSE ; = E w w, 1 where w is an s 1 possibly random weighting vector that converges in probability, at a suitable rate, to a constant vector w. Also let s n = = n w Ɣ g w for n = 0,q 0,, where q is the characteristic exponent of a kernel kx Parzen, 1957 that satisfies k q lim x 0 1 kx}/ x q 0,. hen, if s q 0, 1 is approximated by MSE ; = k2 q s q 2 S 2q + S 2 s 0 2 } k 2 xdx + o S 2q + S. 2 he optimal bandwidth that minimizes 2 is S = γ qk 2 1/2q+1 q R q 2 = k 2 xdx 1/2q+1 1/2q+1, 3 where R q = s q /s 0 is the only unknown quantity in this formula called normalized curvature. Following Jones and Sheather 1991, we estimate the normalized curvature R q using a kernel l possibly different from k that has the characteristic exponent r 0, satisfying l r lim x 0 1 lx}/ x r 0,. Hereinafter, the kernels l and k are called the first- and second-stage kernels, respectively. Also let Ɣ h be the th autocovariance of the scalar process h t } = w } g t, where w is the probability limit of the weighting vector in 1. hen Ɣ h = w Ɣ g w = w E g t g t w and s n = = n Ɣ h. Also, let b R + be a nonstochastic bandwidth sequence for the first-stage kernel, and let Ɣ h = 1 min +, } t=max1,1+ } h th t. hen, the pseudo-estimator of R q is written as R q b sq s 0 1 = 1 l /b q Ɣ h = 1 1 l /b. 4 Ɣ h Now we derive the AMSE-optimal bandwidth for R q b. 2 o approximate the MSE of R q b, it is convenient to apply the idea of the delta method. Let δ = 1/s 0, s q /s 0 2 and h = s q s q, s 0 s 0. aking the first-order aylor expansion of R q b around s q, s 0 = s q,s 0 gives R q b =

5 714 MASAYUKI HIRUKAWA R q + δ h + h o p 1. hen, the asymptotic bias ABias and the asymptotic variance AVar of R q b become E s q ABias R q b = δ s q E s 0, s 0 Var s q AVar R q b = δ Cov s q, s 0 δ. Cov s q, s 0 Var s 0 Based on the assumptions given in Appendix A, the following lemmas give the approximations to the bias and variance terms of h. LEMMA 1. If Assumptions A1, A3, and A4 hold, then lim br E s q s q} = l r s q+r, E s 0 s 0} = l r s r. lim br LEMMA 2. If Assumptions A1, A3, and A4 hold, then lim lim lim b q+1 b 2q+1 Var s q = 2 s 0 2 x 2q l 2 xdx, Var s 0 = 2 s 0 2 l 2 xdx, b Cov s q, s 0 = 2 s 0 2 x q l 2 xdx. he two lemmas demonstrate that while the asymptotic biases of the spectral density and its generalized derivative estimators are of the same order, the asymptotic variance of the derivative estimator dominates in order of magnitude. heorem 1 on the AMSE of R q b and the optimal first-stage bandwidth b follows directly from these lemmas, and thus the proof is omitted. HEOREM 1. If Assumptions A1, A3, and A4 hold and s q s r s 0 s q+r 0, then the MSE of R q b is approximated by MSE R q b ; R q = l2 r C2 q,r b 2r + o b 2r + b2q+1 } 2 x 2q l 2 xdx + b2q+1, 5

6 WO-SAGE PLUG-IN BANDWIDH SELECION 715 where Cq,r = s q s r s 0 s q+r} / s 0 2. he optimal bandwidth that minimizes 5 is b = β 1/2q+2r+1 } 1/2q+2r+1 rlr 2 = C2 q,r 2q + 1 x 2q l 2 1/2q+2r+1. 6 xdx At the optimum, MSE R q b ; Rq 2r/2q+2r+1 β 2r/2q+2r+1 l 2 r C2 q,r + 2β 2q+1/2q+2r+1 } x 2q l 2 xdx. Practitioners may wish to employ the same kernel twice to estimate normalized curvature and LRV. he following corollary refers to the special case in which the same kernel is employed in both stages. his corollary is also valid when two distinct kernels that have the same characteristic exponent are employed e.g., when the Parzen and quadratic spectral QS kernels are employed in the first and the second stages, respectively. It is worth mentioning that the Bartlett and Parzen kernels can be employed twice, whereas the QS kernel cannot, because x 4 k 2 QS xdx = see able 1 in Section 2.2 and the AVar in 5 is not well defined. COROLLARY 1. Suppose that the kernels employed in the first and second stages have the same characteristic exponent, i.e., r = q. If Assumptions A1, A3, and A4 hold and s q 2 s 0 s 2q 0, then the MSE of R q b is approximated by MSE R q b ; R q = l2 q C2 q + o b 2q b 2q + b2q+1 } 2 x 2q l 2 xdx + b2q+1, 7 where Cq Cq,q = s q 2 s 0 s 2q}/ s 0 2. he optimal bandwidth that minimizes 7 is } b = β ql 2 1/4q+1 1/4q+1 q = C2 q 2q + 1 x 2q l 2 1/4q+1. xdx

7 716 MASAYUKI HIRUKAWA At the optimum, MSE R q b ; Rq 2q/4q+1 β 2q/4q+1 l 2 q C2 q + 2β 2q+1/4q+1 } x 2q l 2 xdx. heorem 1 shows that the optimal bandwidth 6 depends on yet another unknown quantity Cq, r; the next section discusses the implementation method of the optimal bandwidth, including the estimation of this unknown quantity. Corollary 1 demonstrates that if the same kernel is employed in both stages, the optimal divergence rate of the first-stage bandwidth is b = O 1/5 with MSE R 1 b ; R1 = O 2/5 for q = 1 Bartlett, and b = O 1/9 with MSE R 2 b ; R2 = O 4/9 for q = 2 Parzen. hus, the divergence rate of b is slower than that of the optimal bandwidth for the HAC estimator S using the same kernel. Next, we focus on the HAC estimator ˆ that is based on the observable process ĝ t }. Accordingly, the normalized curvature estimator should be based on } ĝ t. A random weighting vector w may need to be considered. hen, let ŝ n 1 = min +, } } t=max1,1+ } ĥ,t = 1 l /b n ˆƔ h, for n = 0,q, where ˆƔ h, = 1 ĥ,t ĥ,t is the th sample autocovariance of the process = w ĝt}. Also, let ˆR q b =ŝ q /ŝ0. Furthermore, let ŝn and ˆR q b denote the corresponding counterparts to a constant weighting vector case. Following Andrews 1991, the AMSE criterion is also modified in two respects. First, the normalized or scale-adusted version of MSE is introduced so that its dominating term becomes O1. Using the scale factor 2r/2q+2r+1, the normalized MSE of ˆR q b can be expressed as MSE ˆR q b ; R q, 2r/2q+2r+1 = 2r/2q+2r+1 MSE ˆR q b ; R q. 8 Hereinafter, the MSE refers to 8, unless otherwise stated. Second, if ˆθ has an infinite second moment, its use may dominate the normalized MSE criterion, even though the effect of replacing θ 0 with ˆθ in constructing ˆR q b is at most o p 1. hen, the MSE is truncated by the scalar m > 0. he truncated MSE of ˆR q b with the scale factor 2r/2q+2r+1 is MSE m ˆR q b ; R q, 2r/2q+2r+1 = E min 2r/2q+2r+1 ˆR q b R q 2,m }.

8 WO-SAGE PLUG-IN BANDWIDH SELECION 717 In the rest of the paper, the truncated MSE with arbitrarily large truncation point lim lim MSE m ˆR q m b ; R q, 2r/2q+2r+1 is used as the criterion of optimality. he next theorem shows that the normalized MSE of ˆR q b is asymptotically equivalent to the normalized MSE of R q b. HEOREM 2. If Assumptions A1 and A3 A6 hold and b 2q+2r+1 / β 0,, then a r/2q+2r+1 ˆR q b R q p b } 0. b lim m lim MSE m ˆR q b ; R q, 2r/2q+2r+1 = lim m lim MSE m R q b ; R q, 2r/2q+2r+1 = lim MSE R q b ; R q, 2r/2q+2r Implementation of Optimal Bandwidth for HAC Estimation Following Sheather and Jones 1991, we obtain the optimal bandwidth for the HAC estimator S by numerically solving the fixed-point problem. We refer to this implementation method as the solve-the-equation plug-in SP rule. 3 he SP bandwidth estimator of S may be derived by solving 3 for, yielding } k 2 xdx S = qk 2q+1 q 2 R q 2, 9 and then substituting 9 into 6 to get an expression for b as a function of S : b = b S α 2 q,rrl 2 } = r k 2 1/2q+2r+1 xdx q2q+1kq 2 x 2q l 2 S xdx 2q+1/2q+2r+1, 10 where αq,r = Cq,r/R q = s r /s 0 s q+r /s q. By 3 and 4, the bandwidth estimator Ŝ is given by the root of the system of nonlinear equations 10 and qk 2 S = q ˆR q 2 1/2q+1 b S k 2 1/2q xdx In case of multiple roots, the SP bandwidth estimator is defined formally as follows: 4

9 718 MASAYUKI HIRUKAWA ABLE 1. Characteristic numbers of kernels most popularly applied Kernel q k q k 2 xdx x 2 k 2 xdx x 4 k 2 xdx Bartlett 1 1 2/3 1/15 2/105 Parzen / / / Quadratic spectral 2 18π 2 / /72π 2 DEFINIION. he SP bandwidth estimator Ŝ is defined as the largest root that solves the system of equations 10 and 11. When the same kernel is employed to estimate normalized curvature and LRV so that lx = kx and r = q, many common factors are canceled out, and Ŝ is derived by the simplified system b S = S = qk 2 q ˆR q 2 b S k 2 xdx α 2 q k 2 xdx 2q + 1 x 2q k 2 xdx 1/2q+1 1/2q+1, } 1/4q+1 S 2q+1/4q+1, where αq = αq,q = s q /s 0 s 2q /s q. 5 For convenience, able 1 displays the characteristic numbers of popular kernels that are required to calculate the optimal bandwidths b and S. he only remaining problem is to determine how to deal with the unknown quantity αq. Since ˆ and ˆR q b are q/2q+1 - and q/4q+1 -consistent at the optimum, a proxy of αq with a parametric convergence rate suffices for the consistency of the HAC estimator. Park and Marron 1990 and Sheather and Jones 1991 argue that the influence of fitting a parametric model to αq at this point appears to be less crucial than fitting it directly to R q as in Andrews hen, fitting h t } to a reference AR1 model h t = φh t 1 + ɛ t is considered. A proxy of αq is obtained by substituting the least squares estimate of the AR coefficient ˆφ LS into s n, n = 0,q,2q. he formulas of the proxy ˆαq for q = 1,2 for the AR1 model are ˆφ LS / ˆφ LS 2 1 for q = 1 ˆαq = 2. ˆφ LS ˆφ LS + 1 / ˆφ LS 1 for q = Properties of Automatic Bandwidth his section provides a theoretical ustification for the automatic two-stage plugin bandwidth selection. Let ˆξ and ξ be the parameter estimator of the model fitted

10 WO-SAGE PLUG-IN BANDWIDH SELECION 719 to the process h t }, and its probability limit, respectively. In line with the parametric specification, the first- and second-stage bandwidths are rewritten as b ξ 1/2q+2r+1 and S ξ. Also let ˆb = ˆβ and Ŝ = ˆγ 1/2q+1 be the corresponding automatic bandwidths with ˆξ plugged in. he next two theorems show that using the automatic two-stage plug-in bandwidth, we can consistently estimate the normalized curvature and LRV, even when the fitted reference model is misspecified. HEOREM 3. If Assumptions A1 and A3 A7 hold and b 2q+2r+1 ξ / β ξ = rlr 2C2 ξ q,r/ 2q + 1 x 2q l 2 xdx } with C ξ q,r 0,, then a r/2q+2r+1 ˆR q ˆb R q b ξ } p 0. b lim m lim MSE m ˆR q ˆb ; R q ξ, 2r/2q+2r+1 = lim m lim MSE m R q b ξ ; R q ξ, 2r/2q+2r+1 = lim MSE R q b ξ ; R q ξ, 2r/2q+2r+1. HEOREM 4. If Assumptions A1 A7 hold and S 2q+1 / ξ γξ = qkq R 2 q 2 / ξ k 2 xdx with R q 0,, then ξ a q/2q+1 w ˆ w w w p 0. b lim m lim MSE m ˆ ;, 2q/2q+1 = lim m lim MSE m ;, 2q/2q+1 = lim MSE ;, 2q/2q+1. From a practical point of view, it is interesting to know what happens to the automatic two-stage plug-in bandwidth if the process g t } is serially uncorrelated. he next lemma shows that even in the absence of serial dependence in g t }, the consistency results still hold. LEMMA 3. Suppose that Ɣ g = 0, 0. If Assumptions A1 A7 hold, then ˆR q ˆb p R q ξ and ˆ p. Lemma 3 does not consider a random weighting scheme; for the consistency of ˆR q ˆb and ˆ, Assumption A6b should be replaced by the fairly stringent condition 1/2 w w p 0.

11 720 MASAYUKI HIRUKAWA 3. MONE CARLO RESULS 3.1. Experiment A: Accuracy of LRV Estimates his experiment investigates the accuracy of LRV estimates using the SP bandwidth estimator. he data generating processes DGPs are univariate ARMA1,1 and MA2 models. hese models are often used for Monte Carlo experiments in time series analysis. he parameter settings are given below. In all experiments, the sample size and the number of replications are 128 and 2,000, respectively. ARMA1,1: x t = ρx t 1 + ɛ t + ψɛ t 1,ɛ t iid N0,1, ρ,ψ 0,±.5,±.9}, ρ + ψ 0. iid MA2: x t = ɛ t + ψ 1 ɛ t 1 + ψ 2 ɛ t 2,ɛ t N0,1, ψ 1,ψ 2 = 1.9,.95, 1.3,.5, 1.0,.2,.67,.33, 0,.9,0,.9, 1.0,.9. LRV estimates are calculated by the following nine estimators: i the QS estimator with AR1 referenced by Andrews 1991 QS-AR; ii the Bartlett estimator by Newey and West 1994 with the bandwidth for the normalized curvature estimator set equal to [ 4/100 2/9] B-NW; iii the Bartlett estimator with AR1 reference B-AR; iv the Bartlett two-stage plug-in estimator, where C1 in b is estimated by AR1 reference B-2P; v the Bartlett SP estimator B-SP; vi the Parzen estimator with AR1 reference PZ-AR; vii the Parzen two-stage plug-in estimator, where C 2 in b is estimated by AR1 reference PZ-2P; viii the Parzen SP estimator PZ-SP; and xi the truncated estimator with AR1 reference suggested by Andrews 1991, fn. 5, p. 834 R- AR. Estimators i ii are widely applied in empirical work, while iii iv and vi vii are calculated as the benchmarks for two corresponding SP estimators. Unlike the other estimators, estimator xi does not necessarily yield nonnegative LRV estimates in finite samples. In case of a negative estimate, the bandwidth is shortened until the resulting estimate becomes positive. he root mean squared error RMSE is chosen as the performance criterion, whereas bias is reported for convenience. o avoid obtaining extraordinarily large RMSEs, the least squares estimate of the AR1 coefficient ˆφ is adusted to be less than.95 in modulus. ables 2 and 3 present the Monte Carlo results for ARMA1,1 and MA2 models, respectively. he RMSEs and the biases in parentheses of LRV estimates are reported in the first and second rows of a given DGP. For convenience, the true value of LRV is also provided. he main findings can be summarized as follows: So long as the AR1 reference correctly specifies the underlying process, QS-AR performs best. However, for DGPs with MA terms MA2 models, in particular the performance of QS-AR tends to be dominated by SP estimators.

12 WO-SAGE PLUG-IN BANDWIDH SELECION 721 Since the SP estimators are designed to limit the influence of the AR1 reference, they do not perform well for AR1 models. Once MA terms are introduced, they appear reliable in the sense that they often substantially reduce RMSEs, compared with their corresponding AR1 reference-based and 2P estimators. B-SP performs best in the presence of moderate positive serial dependence. Even in the presence of negative serial dependence, it often outperforms QS-AR, while the latter still exhibits advantages for the DGPs with dominating AR coefficients such as ARMA1,1 with ρ,ψ =.9,.5. B-SP tends to improve its RMSE mainly by reducing the variance, and as a result it possesses a large bias even in the case in which it has a smaller RMSE than QS-AR; see ARMA1,1 with ρ,ψ = 0,.9,.5,.5 and MA2 with ψ 1,ψ 2 =.67,.33, for example. he issue of large bias is remarkable particularly for highly persistent DGPs. PZ-SP performs best in the presence of negative serial dependence. However, in the presence of positive serial dependence, it often has a large RMSE and tends to be outperformed by QS-AR. Because of its way of estimating normalized curvature, B-NW is expected to work well for MA models. It indeed performs best for some MA2 models, but its overall performance does not exceed QS-AR or SP estimators. Due to the issue of negative estimates in the presence of strong negative serial dependence, R-AR performs extremely poorly for such DGPs. On the other hand, it sometimes performs best with respect to both RMSE and bias for the DGPs with positive serial dependence. he results indicate that although no dominant estimator is found, SP estimators can yield more accurate LRV estimates for a wide variety of DGPs that cannot be well approximated by AR1 models. herefore, the next experiment focuses only on SP estimators Experiment B: Size Properties of est Statistic Although the SP rule is primarily motivated by improved LRV estimation, it is also of interest whether the SP bandwidth estimator can be applied as a useful tool for inference. hen, following West 1997, this experiment investigates the size properties of a test statistic based on the linear regression y t = θ 1 + θ 2 x 2t + θ 3 x 3t + θ 4 x 4t + θ 5 x 5t + u t x t θ + u t, x 1t 1, t = 1,...,. Without loss of generality the true parameter value θ is set equal to zero. he parameter is estimated by OLS, and thus the asymptotic covariance matrix of the OLS estimator ˆθ is calculated as ˆV 1 t=1 x t x t 1 estimate of 1 t=1 x t x t 1.

13 722 MASAYUKI HIRUKAWA ABLE 2. Accuracy of LRV estimates for ARMA 1,1 models ρ ψ QS-AR B-NW B-AR B-2P B-SP PZ-AR PZ-2P PZ-SP R-AR

14 WO-SAGE PLUG-IN BANDWIDH SELECION Note: he first and second rows of each DGP are RMSEs and biases in parentheses.

15 724 MASAYUKI HIRUKAWA ABLE 3. Accuracy of LRV estimates for MA2 models ψ1 ψ2 QS-AR B-NW B-AR B-2P B-SP PZ-AR PZ-2P PZ-SP R-AR Note: he first and second rows of each DGP are RMSEs and biases in parentheses.

16 WO-SAGE PLUG-IN BANDWIDH SELECION 725 he test statistic of interest is the Wald statistic of the first slope coefficient ˆθ 2 2/ d ˆV 22 χ 2 1 under H 0 : θ 2 = 0. In all experiments, the sample size and the number of replications are 128 and 2,000, respectively. he regressors follow independent AR1 processes with a common AR parameter φ, i.e., x it = φ x it 1 + e it, i = 2,...,5, where φ =.5 or.9. he variance of the i.i.d. normal random variable e it } is chosen so that x it } has a unit variance. he error term u t } independently follows one of the time series models used in Experiment A or the AR2 model u t = 1.6u t 1.9u t 2 +v t. An important difference between the error term and the regressors is that since the innovation in each DGP of u t } follows iid v t N 0,1, the variance of u t } varies across models. he Wald statistics are calculated based on five estimators, namely, QS-AR, B-NW, B-SP, PZ-SP, and R-AR. o check whether the size properties can be improved by prewhitening Andrews and Monahan, 1992, both nonprewhitened and prewhitened versions are investigated for all estimators except R-AR. he procedure of prewhitening follows Andrews and Monahan 1992, with the eigenvalues of the fitted VAR1 coefficient matrix adusted to be less than.97 in modulus. he weighting matrix for QS-AR and R-AR is a diagonal that assigns zero to the element corresponding to the cross-product of the intercept and the error term and one otherwise, as suggested in Andrews he same rule applies to the weighting vector for the other estimators. ables 4 φ =.5 and 5 φ =.9 report finite sample reection frequencies at the 5% nominal size. he main findings can be summarized as follows: able 4 shows that the performance of each of the three nonprewhitened estimators QS-AR, B-SP and PZ-SP is similar and satisfactory in general. Although overreections are observed in the presence of positive serial dependence and they are sometimes considerable for B-SP, they are substantially reduced by prewhitening. he size properties of the three prewhitened estimators are comparable. able 5 indicates that QS-AR sometimes yields an erratic test statistic. As reported in West 1997, it often reects the null too infrequently in the presence of strong negative serial dependence, and it appears that prewhitening does not improve the size properties; see ARMA1,1 with ρ,ψ = 0,.9,.5,.9 and MA2 with ψ 1,ψ 2 = 0,.9. Moreover, there are cases in which prewhitening makes the performance of PZ-SP worse; see ARMA1,1 with ρ,ψ = 0,.9,0,.5 and MA2 with ψ 1,ψ 2 = 1.9,.95, 1.3,.5, 1.0,.2. In contrast, B-SP tends to be less sensitive to prewhitening for the same DGPs. It could be the case that the second-order spectral density derivative estimator and thus second-order normalized curvature estimator appears to be more sensitive to prewhitening than the first-order one. Overall nonprewhitened B-NW tends to exhibit overreections of the null, and prewhitening does not necessarily help to reduce them substantially.

17 726 MASAYUKI HIRUKAWA ABLE 4. Finite sample reection frequencies at the 5% nominal size φ =.5 Nonprewhitened Prewhitened QS-AR B-NW B-SP PZ-SP QS-AR B-NW B-SP PZ-SP R-AR ARMA 1,1: ρ ψ MA2: ψ1 ψ AR2: ρ1 ρ

18 WO-SAGE PLUG-IN BANDWIDH SELECION 727 ABLE 5. Finite sample reection frequencies at the 5% nominal size φ =.9 Nonprewhitened Prewhitened QS-AR B-NW B-SP PZ-SP QS-AR B-NW B-SP PZ-SP R-AR ARMA1,1: ρ ψ MA2: ψ1 ψ AR2: ρ1 ρ

19 728 MASAYUKI HIRUKAWA Again, as reported in West 1997, R-AR often yields a test statistic that is too small in the presence of negative serial dependence. Its performance in the presence of positive serial dependence is in general better than nonprewhitened QS-AR but worse than prewhitened QS-AR, B-SP, and PZ-SP. able 5 indicates that there are cases in which prewhitening does not work well for inference. For MA2 with ψ 1,ψ 2 = 0,.9, prewhitening worsens the size properties of QS-AR and B-SP. For MA2 with ψ 1,ψ 2 = 1.0,.9 and AR2 with ρ 1,ρ 2 = 1.6,.9, the nonprewhitened Wald statistic based on B-SP shows satisfactory performance. Prewhitening worsens the size properties of the Bartlett-based estimator in the MA2 case, and it makes the QS- and Bartlett-based estimators underreect in the AR2 case. he spectral densities of the three DGPs have a peak or trough at a nonzero frequency. A lesson that can be drawn from this experience is that prewhitening may adversely affect the performance of test statistics when DGPs have such nasty spectral densities. 4. CONCLUSION his paper develops a new method for bandwidth selection in HAC estimation. he proposed two-stage plug-in bandwidth selection is inspired by a well-known bandwidth choice rule in the literature of probability density estimation. he key idea is to estimate normalized curvature using a general class of kernels and then derive the AMSE-optimal bandwidth for the normalized curvature estimator. It is demonstrated that the optimal bandwidth should diverge at a slower rate than that of the HAC estimator using the same kernel. he SP rule, an implementation method for the AMSE-optimal bandwidth selection for the HAC estimator, is also developed. he Monte Carlo results indicate that for a variety of DGPs, the HAC estimator based on the SP rule can estimate LRV more accurately than the QS estimator by Andrews 1991 or the Bartlett estimator by Newey and West he test statistic constructed from the SP-HAC variance estimator has size properties that are comparable with the QS-based test statistic, and better in general than the test statistic based on the Bartlett estimator. NOES 1. In the approximation to the MSE of the HAC estimator, it is convenient to reduce the problem to a scalar one using some weighting vector, as in Newey and West Deriving only the range of divergence rates of b for the consistency of the HAC estimator is not sufficient for constructing an analog to the Sheather and Jones 1991 rule. 3. he solve-the-equation approach originally comes from Park and Marron he following definition comes from the suggestion in Park and Marron In line with this definition, a recommended root search algorithm is the grid search starting from some large positive number. GAUSS codes for SP bandwidth estimators using the Bartlett and Parzen kernels are available on the author s web page.

20 WO-SAGE PLUG-IN BANDWIDH SELECION he rest of this section and Section 3 Monte Carlo Results exclusively consider the case in which the same kernel is employed twice. REFERENCES Andrews, D.W.K Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, Andrews, D.W.K. & J.C. Monahan 1992 An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, Hannan, E.J Multiple ime Series. Wiley. Hansen, L.P Large sample properties of generalized method of moments estimators. Econometrica 50, Jansson, M Consistent covariance matrix estimation for linear processes. Econometric heory 18, Jones, M.C., J.S. Marron, & S.J. Sheather 1996 A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association 91, Jones, M.C. & S.J. Sheather 1991 Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives. Statistics & Probability Letters 11, Newey, W.K. & K.D. West 1994 Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61, Park, B.U. & J.S. Marron 1990 Comparison of data-driven bandwidth selectors. Journal of the American Statistical Association 85, Parzen, E On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics 28, Politis, D.N Adaptive bandwidth choice. Journal of Nonparametric Statistics 4 5, Politis, D.N. & H. White 2004 Automatic block-length selection for the dependent bootstrap. Econometric Reviews 23, Silverman, B.W Density Estimation for Statistics and Data Analysis. Chapman & Hall. Sheather, S.J. & M.C. Jones 1991 A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B 53, West, K.D Another heteroskedasticity- and autocorrelation-consistent covariance matrix estimator. Journal of Econometrics 76, APPENDIX A: Assumptions All the assumptions that establish the theorems are given below. Assumption A1 and A2 refer to the properties of the first- and second-stage kernels. Although these appear restrictive, every K 1 class kernel Andrews, 1991 with bounded support and a finite characteristic exponent greater than 1/2 including the Bartlett and Parzen kernels satisfies these conditions. Note that infinite-order kernels such as the truncated and flat-top kernels do not satisfy Assumption A1 or A2. he conditions 0 x 2q lxdx < and 0 kxdx < in Assumptions A1a and A2a ensure that certain Riemann-type sums defined in terms of kernels l and k converge to their integral representation counterparts; see Jansson 2002, p for discussion. Assumptions A4a and A4b are the same as Assumption 2 in Newey and West Assumption A4c is also standard for spectral density estimation. As discussed in Andrews 1991, Assumption A6a implies that the right-hand side of 8 is L 1+δ -bounded for some δ>0. Without this assumption, it would be L 1 - bounded, which would not suffice to establish the first-order equivalences of MSEs in heorems 2, 3, and 4. Assumption A6b is required only when a random weighting scheme is applied.

21 730 MASAYUKI HIRUKAWA Assumption A1. he first-stage kernel l satisfies the following conditions: a l : R [ 1,1], l0 = 1, lx = l x, x R, l is continuous at 0 and almost everywhere, the characteristic exponent r 1/2,, for a given characteristic exponent of the second-stage kernel q, sup x 0 x q lx < and 0 x 2q lxdx < where lx = sup y x ly. b lx ly c x y for some c, x, y R. c For a given characteristic exponent of the second-stage kernel q, lx c x b 1 for some c and for some b 1 > q q + 2/2q +r}. d lx has [r] + 1 continuous, bounded derivatives on [ 0, x 1 ] for some x1 > 0, with the derivatives at x = 0 evaluated as x 0+. Assumption A2. he second-stage kernel k satisfies the following conditions: a k : R [ 1,1], k0 = 1, kx = k x, x R, k is continuous at 0 and almost everywhere, the characteristic exponent q /4,, and 0 kxdx< where kx = sup y x ky. b kx ky c x y for some c, x, y R. c For a given characteristic exponent of the first-stage kernel r, kx c x b 2 for some c and for some b 2 > 1 + 2q + 2r + 1/q2r 1 1/2}, provided that q2r 1>1/2. d kx has [q] + 1 continuous, bounded derivatives on [ ] 0, x 2 for some x2 > 0, with the derivatives at x = 0 evaluated as x 0+. Assumption A3. a he first-stage bandwidth b satisfies 1 / b + b max1,r} / 2q+1/ + b 0as. b he second-stage bandwidth S satisfies 1 / S + S max1,q} / 0as. Assumption A4. a gz,θ is twice continuously differentiable with respect to θ in a neighborhood N 0 of θ 0 with probability 1. b Let g t θ gz t,θ, g tθ θ gz t,θ / θ, and g itθθ θ 2 g i z t,θ/ θ θ, where g i, is the ith component of g,. hen, there exist a measurable function ϕz and some constant K > 0 such that sup θ N0 g t θ <ϕz, sup θ N0 g tθ θ <ϕz, sup θ N0 g itθθ θ <ϕz, i = 1,...,s, and Eϕ 2 z} < K. c Let v t g t θ 0,vecg tθ θ 0 Eg tθ θ 0 = g t,vecg tθ θ 0 Eg tθ θ 0. Also let Ɣ v and κ v,abcd,, be the th-order autocovariance of the process v t } and the fourth-order cumulant of v a,t,v b,t+,v c,t+ +l,v d,t+ +l+n, where v i,t is the ith element of v t. hen, v t } is a zero-mean, fourth-order stationary sequence that satisfies = q+max1,r} Ɣ v < and = l= n= κv,abcd,l,n <, a,b,c,d s + ps. Assumption A5. 1/2 ˆθ θ 0 = O p 1.

22 WO-SAGE PLUG-IN BANDWIDH SELECION 731 Assumption A6. a he process g t } is eighth-order stationary with 1 = 7 = κ g,a 1...a 8 1,..., 7 <, a 1,...,a 8 s, where κ g,a1...a 8 1,..., 7 is the cumulant of ga1,0, g a2, 1,...,g a8, 7 and gi,t is the ith element of g t. b he random weighting vector w satisfies either q/2q+1 w w p 0 for r q2q + 1, or r/2q+2r+1 w w p 0 for r > q2q + 1. ˆξ Assumption A7. 1/2 ξ = O p 1. APPENDIX B: Proofs Proof of Lemma 1. he proof closely follows that of heorem 10 in Chapter V of Hannan Using E Ɣ h = 1 / Ɣ h, 0,±1,...,± 1},gives b r E s q s q} = b r 1 } l 1 q Ɣ h b r 1 = 1 b l q = 1 b Ɣ h b r q Ɣ h B 1 B 2 B 3. Now, 1 } 1 l /b B 1 = = 1 /b r q+r Ɣ h l r q+r Ɣ h = l r s q+r. = On the other hand, B 2 br 1 l q+1 Ɣ h = 1 b b r / = q+r Ɣ h 0 for r 1 b r / = q+1 Ɣ h 0 for r < 1. Also by b for an arbitrarily large, B 3 2 q+r Ɣ h 0, = which establishes the first approximation. Assumption A4c implies that = max1,r} Ɣ h <. hen the second approximation is immediately established if this condition is used for the term corresponding to B 2. n

23 732 MASAYUKI HIRUKAWA Proof of Lemma 2. he proof closely follows that of heorem 9 in Chapter V of Hannan he result in Hannan 1970, p. 313 gives Cov Ɣ h i, Ɣ h = Ɣ h uɣ h u + i + Ɣ h u + iɣ h u + κ h i,u,u + }ϕ u,i,, u= B.1 where κ h,, is the fourth-order cumulant generated by the process h t }, and ϕ u,i, is defined for i by 0 if u + i; 1 i u/ if + i u 0; ϕ u,i, = 1 i/ if 0 u i ; 1 + u/ if i u ; 0 if u. Hence, Var s q b 2q+1 = 1 b + 1 b + 1 b 1 i= 1 1 i= 1 1 i= 1 V 1 + V 2 + V 3. 1 = 1 1 = 1 1 = 1 i q q b b i q q b b i q q b b Let v i. hen, V 1 can be rewritten as i l l Ɣ h uɣ h u + i ϕ u,i, b b u= i l l Ɣ h u + iɣ h u ϕ u,i, b b u= i l l κ h i,u,u + ϕ u,i, b b u= 2 1 V 1 = Ɣ h uɣ h u + v v= 2 1 u= 1 b ϕ u, + v, q + v q } b l + v b b l, b where the summation over runs only for : 1, + v 1}. Picking trimming functions m = O b 1 ɛ for some ɛ 0,1 and M = O b 1+η for some η 0,ɛ/2q + 1, we can show that } 2 1 V 1 Ɣ h u u m b 2q } M b l 2 s 0 2 b x 2q l 2 xdx<.

24 WO-SAGE PLUG-IN BANDWIDH SELECION 733 A detailed argument is available on the author s web page. Similarly, we have V 2 s 0 2 x 2q l 2 x dx. Lastly, by Assumptions A1a and A4c, V 3 1 b 2 sup x lx q x 0 i= u= κ h i,u,v 0, v= which establishes the first approximation. he second approximation is a standard result of spectral density estimation. he third approximation can be shown by recognizing that x q l 2 xdx < by Assumption A1a. n Proof of heorem 2. Part a: On the right-hand side of r/2q+2r+1 ˆR q } b R q b = r/2q+2r+1 ˆR q b ˆR q b } } + r/2q+2r+1 ˆR q b R q b, the first term is o p 1 by Assumption A6b. Hence, we need to show that the second term is o p 1. aking the first-order aylor expansion of ˆR q b around ŝ q,ŝ 0 = s q, s 0 gives ˆR q b = R q b + δ ĥ + ĥ o p 1, where δ = 1/ s 0, s q / s 0 2 and ĥ = ŝ q s q,ŝ 0 s 0. hen we only need to show that r/2q+2r+1 ŝ n s n} p 0, n = 0,q. B.2 aking the second-order aylor expansion of ĥ t = w ĝ t = w g z t, ˆθ around ˆθ = θ 0 gives ĥ t = h t + h t θ ˆθ θ h t ˆθ θ 0 θ=θ0 2 θ θ ˆθ θ 0 θ= θ h t + h tθ ˆθ θ ˆθ θ 0 h tθθ ˆθ θ 0 2 for some θ oining ˆθ and θ 0. hen, ĥ t ĥ t = h t h t + h t h tθ Eh tθ + h t ht θ Eh tθ } ˆθ θ 0 + h t + h t Eh tθ ˆθ θ 0 + ˆθ θ 0 h tθ h t θ h t h tθθ h t h t θθ ˆθ θ 0 h tθ ˆθ θ 0 ˆθ θ 0 h t θθ ˆθ θ 0 + h t θ ˆθ θ } ˆθ θ 0 h tθθ ˆθ θ } ˆθ θ 0 h tθθ ˆθ θ 0 4 ˆθ θ 0 h t θθ ˆθ θ 0 }.

25 734 MASAYUKI HIRUKAWA Hence, we can rewrite r/2q+2r+1 ŝ n s n} i=1 6 D i, where } D 1 = r/2q+2r+1 0 n ˆƔ h 0 Ɣ h 0, D 2 = 2 D 3 = 2 1 r/2q+2r+1 1 l n b ht h tθ Eh tθ + h t ht θ Eh tθ } ˆθ θ 0, t= +1 1 r/2q+2r+1 l n 1 } b ht + h t Eh tθ ˆθ θ 0, t= +1 D 4 = 2 r/2q+2r+1 1 ˆθ θ 0 l n b 1 h tθ h t θ + 12 h t h tθθ + 12 } h t h t θθ ˆθ θ 0, t= +1 D 5 = 2 r/2q+2r l n b 1 h tθ ˆθ θ 0 ˆθ θ 0 h t θθ ˆθ θ 0 t= +1 + h t θ ˆθ θ 0 ˆθ θ 0 h tθθ ˆθ θ 0 } ˆθ θ 0, D 6 = 2 r/2q+2r l b n 1 ˆθ θ 0 h tθθ ˆθ θ 0 t= +1 ˆθ θ 0 h t θθ ˆθ θ 0 }. D 1 = o p 1 is obvious. Since D 2 = 2q+1/22q+2r l n b ht h tθ Eh tθ + h t ht θ Eh tθ } } 1/2 ˆθ θ 0 t= +1 2q+1/22q+2r+1} R 2 1/2 ˆθ θ 0 },

26 WO-SAGE PLUG-IN BANDWIDH SELECION 735 we only need to show that R 2 = O p 1 to establish D 2 = o p 1. R 2 is further rewritten as R 2 = 2 1 l b l b 2R R 22. n 1 n 1 } h t h tθ Eh tθ t= +1 h t ht θ Eh tθ } t= +1 Since E h t h tθ Eh tθ } and E h t ht θ Eh tθ } are autocovariances, the same arguments apply as in the proofs of Lemmas 1 and 2. hen, R 21 and R 22 can be shown to converge in mean square and thus in probability to R21 n E h t h tθ Eh tθ } and R22 n E h t ht θ Eh tθ }, respectively, where R21 and R 22 are both bounded by Assumption A4c. Hence, R 2 = O p 1. D 3 can be rewritten as D 3 = 2q+1/22q+2r+1 2 } 1/2 ˆθ θ 0 1 l b n 1 2q+1/22q+2r+1} R 3 Eh tθ 1/2 ˆθ θ 0 }. } ht + h t Eh tθ t= +1 o establish D 3 = o p 1, we only need to show that R 3 = o p 1. We can see, for example, that 1 2 l n 1 1 b h t = o p l n 1 t= +1 b h t t=1 1 = O p 1/2 l, n b where 2 1 b q+1 l n 1 b b 1 = 1 l q b b x q lx dx < by Assumption A1a. It follows from b = O 1/2q+2r+1 and r > 1/2 that R 3 = O p 1/2 b q+1 = o p 1. A similar argument can also establish that each of D 4, D 5, and D 6 is at most o p 1. herefore, B.2 is shown. Part b: he proof directly follows the proof of heorem 1c in Andrews Proof of heorem 3. Part a: By Assumption A6b we only need to show that r/2q+2r+1 ˆR q ˆb R q b ξ } p 0. aking the first-order aylor expansion of ˆR q ˆb around ŝ q ˆb, n

27 736 MASAYUKI HIRUKAWA ŝ 0 ˆb = s q b ξ, s 0 b ξ gives ˆR q ˆb = R q b ξ + δ ξ ĥξ + ĥ ξ 2 o p 1, where δ ξ = 1/ s 0 b ξ, s q b ξ / s 0 b ξ and ĥ ξ = ŝ q ˆb s q b ξ,ŝ 0 ˆb s 0 b ξ. hen, we need to demonstrate that r/2q+2r+1 ŝ n ˆb s n b ξ } p 0, n = 0,q. B.3 Observe that } r/2q+2r+1 ŝ n ˆb s n b ξ = r/2q+2r+1 1 } } l l n Ɣ h E Ɣ h = 1 ˆb b ξ + r/2q+2r+1 1 } l l E n Ɣ h = 1 ˆb b ξ + r/2q+2r+1 1 } l n ˆƔ h Ɣ h = 1 ˆb H 1 + H 2 + H 3. First, we establish H 1 = o p 1. By Assumption A1c we can pick some η 1 + 1/ 2 b1 q 1 },2 + r 2 / q + 2 [. For such η, let an integer m 1 be m 1 b η ] ξ. hen, H 1 = 2 r/2q+2r+1 m } 1 } l l n Ɣ h E Ɣ h ˆb b ξ + 2 r/2q+2r+1 1 } l n Ɣ h E Ɣ h =m 1 +1 ˆb 2 r/2q+2r+1 1 } l n Ɣ h E Ɣ h =m 1 +1 b ξ 2H H 12 2H 13. By Assumption A1b, H 11 c r/2q+2r+1 m 1 1/2q+2r+1 ˆβ βξ 1/2q+2r+1 q Ɣ h E Ɣ h = c 1/2 Ĉ2 q,r 1/2q+2r+1 C 2 ξ q,r 1/2q+2r+1

28 WO-SAGE PLUG-IN BANDWIDH SELECION 737 Ĉ2 q,rc 2 1/2q+2r+1 ξ q,r r 1/2q+2r+1 1 m 1 } q+1 1/2 Ɣ h E Ɣ h. Now, 1/2 Ĉ2 1/2q+2r+1 1/2q+2r+1 q,r Cξ 2q,r = O p 1 and Ĉ 2 q,r p Cξ 2q,r 0, by Assumption A7 and the delta method. By B.1, ϕ,, 1, and Assumption A4c, we can find a constant M which depends on neither nor such that Var 1/2 Ɣ h M. As a referee points out, the upper bound M can be made to decrease with under certain conditions. While decaying the upper bound is beyond the scope of our analysis at this moment, it would immediately open a road to refining the results here and obtaining better conditions.. It follows from m 1 q+1 = O ηq+2/2q+2r+1 and η<2 + r 2/q + 2 that r 1/2q+2r+1 1 m 1 q+1 = o1, and as a result we have H 11 = o p 1 by Markov s inequality. Next, by Assumption A1c, b 1 H 12 c r/2q+2r+1 1 1/2q+2r+1 q Ɣ h E Ɣ h =m 1 +1 ˆβ b1 /2q+2r+1 = c Ĉ2 q,r r+b 1 /2q+2r+1 1/2 1 } q b 1 1/2 Ɣ h E Ɣ h. =m 1 +1 Obviously, H 12 = o p 1 if H 12 r+b 1/2q+2r+1 1/2 1 =m 1 +1 q b 1 1/2 Ɣ h E Ɣ h } =o p 1. Assumption A1c implies that b 1 q > 1, and thus =m q b 1 = O ηq+1 b 1/2q+2r+1. It follows from η > 1 + 1/2b 1 q 1} that r+b 1/2q+2r+1 1/2 1 =m 1 +1 q b 1 = o1. hen, H 12 = o p1 follows from Var 1/2 Ɣ h M and Markov s inequality. Using a similar argument, we have H 13 = o p 1, which establishes H 1 = o p 1. Next, we demonstrate that H 2 = o p 1. Let ˆx / ˆβ 1/2q+2r+1. By Assumption A1d and the definition of the characteristic exponent, for 0 ˆx x 1 the aylorseries expansion of l ˆx around ˆx = 0gives l ˆx = 1 +l 1 0 ˆx + + l[r] 0 ˆx [r] [r]! = 1 + l[r] 0 ˆx [r] [r]! + l[r]+1 x [r] + 1! + l[r]+1 x [r] + 1! ˆx [r]+1 ˆx [r]+1 for some x oining 0 and ˆx. Similarly, let x ξ / β ξ 1/2q+2r+1. hen, for 0 x ξ x 1,

29 738 MASAYUKI HIRUKAWA lx ξ = 1 + l[r] 0 x [r] [r]! ξ + l[r]+1 x ξ [r] + 1! for some x ξ oining 0 and x ξ. Hence, x [r]+1 ξ l ˆx lx ξ = l[r] 0 [r]! ˆx [r] x [r] ξ + l[r]+1 x ˆx [r]+1 [r] + 1! l[r]+1 x ξ [r] + 1! Note that this expansion is valid for J min [ x 1 βξ 1/2q+2r+1 ]}. For such J, H 2 can be rewritten as 1, H 2 = 2 r/2q+2r+1 J } l l E n Ɣ h ˆb b ξ + 2 r/2q+2r+1 1 l E n Ɣ h =J+1 ˆb 2 r/2q+2r+1 1 l E n Ɣ h =J+1 b ξ 2H H 22 2H 23. x [r]+1 ξ. B.4 [ x 1 ˆβ 1/2q+2r+1 ], Using B.4, we can rewrite H 21 as H 21 = r/2q+2r+1 J l [r] 0 [r] [r] [r]! 1/2q+2r+1 ˆβ βξ 1/2q+2r+1 n E Ɣ h + r/2q+2r+1 J n E Ɣ h r/2q+2r+1 J l [r]+1 x ξ [r] + 1! [r]+1 βξ E n 1/2q+2r+1 Ɣ h l [r]+1 x [r]+1 [r] + 1! ˆβ 1/2q+2r+1 H H 212 H 213. If [r] < r, then l [r] 0 = 0 by the definition of the characteristic exponent, which trivially yields H 211 = o p 1. If [r] = r, then by l r 0 < and E Ɣ h Ɣh, itis

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH LECURE ON HAC COVARIANCE MARIX ESIMAION AND HE KVB APPROACH CHUNG-MING KUAN Institute of Economics Academia Sinica October 20, 2006 ckuan@econ.sinica.edu.tw www.sinica.edu.tw/ ckuan Outline C.-M. Kuan,