Asymptotically unbiased estimation of auto-covariances and auto-correlations with long panel data


Ryo Okui

February 2, 2007

Incomplete and Preliminary

Abstract

Many economic variables are correlated over time. It is important to determine whether this observed correlation comes from time-invariant unobserved heterogeneity among individuals or from temporal persistency of a shock. This paper examines how to estimate the auto-covariances and auto-correlations of individual dynamics separately from unobserved heterogeneity. When both the cross-sectional and time-series sample sizes tend to infinity, we show that the within-group auto-covariances are consistent for the auto-covariances of individual dynamics, but that they are severely biased when the length of the time series is short. The biases have a leading term that converges to the long-run variance of the individual dynamics. This paper develops methods to estimate the long-run variance in panel data settings, and methods to alleviate the biases of the within-group auto-covariances based on the proposed long-run variance estimators. Monte Carlo simulations reveal that the procedures developed in this paper effectively reduce the biases of the estimators in small samples.

Keywords: panel data; covariance structure; double asymptotics; bias correction; long-run variance.

JEL classification: C13; C23.

Valuable comments were obtained from In Choi, Songnian Chen and seminar participants at Kobe, Hokkaido and the Kansai Econometric Society Meeting (at Yokohama). The author acknowledges financial support from HKUST under Project No. DAG05/06.BM6. The author is responsible for all errors. Department of Economics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. Email: okui@ust.hk

1 Introduction

Many economic variables are correlated over time. For example, an individual who has a high income in one year tends to earn a high income in the next year. One interpretation of the correlation might be that a shock that occurred during a certain time period has a persistent effect on future income. Another potential explanation might be that individuals differ in their abilities to earn income, and that an individual with a high income in two consecutive years has a high ability. These two interpretations yield different policy implications. It is important to determine whether this correlation comes from the persistency of individual dynamics or from unobserved heterogeneity, and to examine to what extent each component contributes to the observed correlation. A way to attack this question is to investigate the auto-covariance structure of individual dynamics separately from unobserved heterogeneity. This paper aims at developing statistical tools to estimate this using panel data. In particular, we consider how to obtain asymptotically unbiased estimates of the auto-covariances, without imposing a specific structure on them, when relatively long panel data sets are available.

The existing procedures impose specific model structures on the individual dynamics. When the length of the time series of a panel is short, some restrictions are necessary: otherwise the auto-covariances of the individual dynamics are not identified (see, for example, Arellano (2003, Chapter 5)), and the sample auto-covariances and sample auto-correlations are biased even asymptotically, as pointed out by Solon (1984). As in time series analysis, autoregressive and moving average models are popular specifications. For example, early studies on income dynamics (e.g., Lillard and Willis (1978), MaCurdy (1982), and Abowd and Card (1989)) model the time-varying component as ARMA processes. Researchers have developed methods to estimate those models.

For the autoregressive models, the within-group estimator is inconsistent (Nickell (1981)). Anderson and Hsiao (1981) proposed instrumental variable estimation of first-order autoregressive (AR(1)) models. Their methods were extended to generalized method of moments estimation by Arellano and Bond (1992) and Holtz-Eakin, Newey and Rosen (1988). Baltagi and Li (1994) developed methods to estimate moving average models. Alternatively, one may consider estimating the covariance structure by the minimum distance estimator (see Chamberlain (1984)), as considered by Abowd and Card (1989). Recently, panel data with moderately long time lengths have become available, and re-

searchers have developed mathematical tools to handle asymptotic sequences under which two indexes tend to infinity. These panels and mathematical tools have motivated researchers to look into the asymptotic properties of statistics in long panel data. Alvarez and Arellano (2003) and Hahn and Kuersteiner (2002) study the asymptotic properties of the within-group estimator for panel AR(1) models when both the cross-sectional sample size ($N$) and the length of the time series ($T$) are large. Kiviet (1995) and Bun and Kiviet (2006) consider more general (but still AR(1)-type) models that include covariates, and derive the bias of the within-group estimator. Hahn and Kuersteiner (2002) also develop a bias-corrected within-group estimator for models in which the individual dynamics follow the AR(1) structure. Their results may not be valid when the AR(1) structure is not true (see Lee (2005)). Lee (2005) and Hansen (forthcoming) study the asymptotic properties of the within-group estimators in AR($p$) models and develop methods to correct their biases. Lee (2005) also considers cases in which the lag order is misspecified and proposes methods to choose the order of the auto-regression. While AR($p$) models can capture many kinds of dynamics, these methods still suffer from model misspecification. Moreover, the focus of these articles is the estimation of the coefficients in autoregressive models, and their results are not readily applicable to the purpose of this project. This paper addresses a basic, yet unanswered, question: how to estimate the auto-covariance and auto-correlation structure of individual dynamics without imposing a specific structure. The auto-covariances of individual dynamics can be identified and consistently estimated without imposing a specific structure when $T$ tends to infinity. While highly useful, the existing methods, which are all model-based, may provide misleading results when we impose wrong models.

The statistical methods developed in this paper have several potential impacts. First, they should yield a better understanding of the dynamic nature of key economic variables. For example, understanding the nature of income dynamics is important when we discuss policies related to poverty and income inequality. They are also useful for finding appropriate models even if we want to conduct a model-based analysis in empirical applications. We study the asymptotic properties of the within-group auto-covariances using double asymptotics, under which both $N$ and $T$ tend to infinity. Recent developments in asymptotic theory that handle double asymptotics make it possible to analyze the properties of estimators in long panel data. We show that the within-group auto-covariances are consistent for the auto-covariances of individual dynamics, but that these estimators are heavily biased when $T$ is only moderately large. The key finding here is that the leading terms of the biases of these estimators are

proportional to the long-run variance of individual dynamics. We consider the estimation of the biases and propose bias-corrected estimators. The key is the estimation of the long-run variance of individual dynamics. There have been numerous procedures proposed for the estimation of the long-run variance in the time series literature (see, e.g., den Haan and Levin (1997) for a review, although a large number of articles on this issue have been published since then). In this paper, we extend the long-run variance estimators described by Andrews (1991) to panel data settings and study the asymptotic properties of the proposed estimators using double asymptotics. It is interesting in its own right to study the asymptotic properties of these long-run variance estimators in panel data settings under double asymptotics. Another issue is that these estimators depend on bandwidth parameters. Following Andrews (1991), we derive the asymptotic mean square error of the estimator and choose the bandwidth parameter so that it minimizes the mean square error. An interesting finding regarding the mean square error is the following: the long-run variance estimator considered in this paper can be written as the cross-sectional average of the long-run variance estimator for each time series; increasing $N$ reduces the variance but does not affect the bias of the estimator; when $T$ is large compared with $N$, we obtain a mean square error formula similar to the one given in Andrews (1991); however, when $N$ is large relative to $T$, the variance becomes small and we need to consider an additional bias term instead of the variance (thus, we encounter a bias-bias trade-off instead of the usual bias-variance trade-off). We then develop methods to alleviate the biases of the within-group auto-covariances using the proposed long-run variance estimator.

We also consider iterated procedures in which we estimate the long-run variance based on the bias-corrected estimators of the auto-covariances and correct the bias using the new long-run variance estimator. We may repeat this iteration many times. It turns out in Monte Carlo simulations that this iteration improves the performance of the estimators. We also study the estimation of other parameters of interest, such as the auto-correlations of individual dynamics, the partial auto-correlations, the variance of the individual effects, and the ratio of the variance of unobserved heterogeneity to the observed correlation. They can be estimated easily from the estimators of the auto-covariances, and their statistical properties can be studied easily given the results concerning the auto-covariance estimators. Monte Carlo simulations are conducted to investigate the small sample properties of the proposed auto-correlation estimators. We confirm that the within-group auto-correlations approach the auto-correlations of individual dynamics as both $N$ and $T$ increase. We also confirm that they

can be severely biased when $T$ is not large, and that the biases do not depend on $N$, while $N$ affects the variances. When the persistency of individual dynamics is not large, the auto-correlation estimators based on the bias-corrected auto-covariance estimators work well even if $T$ is short. While the procedures developed in this paper effectively reduce the biases of the estimators even in cases in which the persistency is large, the bias reduction is not sufficient in those cases, and a long time series is required to obtain reliable estimates of auto-correlations. We also find that iterating the bias correction remarkably improves the performance. The remainder of the paper is organized as follows. Section 2 introduces our theoretical framework. In Section 3, we study the asymptotic properties of the within-group auto-covariance estimators. Methods to alleviate the biases of the within-group auto-covariance estimators are discussed in Section 4. Section 5 considers the estimation of the auto-correlations of individual dynamics based on the results in the previous sections. The estimation of the variance of the individual effects is considered in Section 6. In Section 7, we report the results of Monte Carlo experiments conducted to examine the properties of the auto-correlation estimators developed in this paper. Section 8 concludes this article by stating possible extensions.

2 Setting

Suppose that panel data $\{y_{it}\}$ for $i = 1, \ldots, N$ and $t = 1, \ldots, T$ are available. We consider a one-way error components model in which $y_{it}$ is generated as the sum of a time-invariant individual effect, $\eta_i$, and a time-varying stationary process, $w_{it}$:
$$y_{it} = \eta_i + w_{it}. \qquad (1)$$
We assume that $\{\eta_i\}_{i=1}^{N}$ are independently and identically distributed (i.i.d.) and that $\{\{w_{it}\}_{t=1}^{T}\}_{i=1}^{N}$ are i.i.d. across individuals and stationary over time with mean $E(w_{it}) = 0$. We also assume that $w_{it}$ and $\eta_i$ are uncorrelated. We do not impose any specific model on the auto-covariance structure of $w_{it}$.
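To fix ideas, here is a minimal simulation sketch of the one-way error components model (1). The AR(1) specification for $w_{it}$ and all parameter values are illustrative assumptions of mine; the paper leaves the covariance structure of $w_{it}$ unrestricted:

```python
import numpy as np

def simulate_panel(N, T, sigma_eta=1.0, delta=0.5, seed=0):
    """Simulate y_it = eta_i + w_it (model (1)) with eta_i ~ N(0, sigma_eta^2)
    i.i.d. across individuals and w_it a stationary AR(1) with unit innovation
    variance.  The AR(1) choice is illustrative only."""
    rng = np.random.default_rng(seed)
    eta = sigma_eta * rng.standard_normal(N)      # time-invariant effects
    burn = 200                                    # burn-in for stationarity
    eps = rng.standard_normal((N, T + burn))
    w = np.zeros((N, T + burn))
    for t in range(1, T + burn):
        w[:, t] = delta * w[:, t - 1] + eps[:, t]
    return eta[:, None] + w[:, burn:]             # y_it = eta_i + w_it

y = simulate_panel(N=4000, T=50)
# Pooled variance close to sigma_eta^2 + gamma_0 = 1 + 1/(1 - 0.25) = 2.33...
```

With these values, the pooled variance of $y_{it}$ illustrates the decomposition of the observed variance into the unobserved-heterogeneity part ($\sigma_\eta^2 = 1$) and the dynamic part ($\gamma_0 = 1/(1-\delta^2) = 4/3$).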
Let $\Gamma_k$ be the $k$-th order auto-covariance of $y_{it}$:
$$\Gamma_k \equiv E[(y_{it} - \mu)(y_{i,t-k} - \mu)], \qquad (2)$$
where $\mu \equiv E(y_{it}) = E(\eta_i)$. Let $\sigma_\eta^2$ denote the variance of $\eta_i$ and $\gamma_k$ denote the $k$-th order auto-covariance of $w_{it}$ (i.e., $\gamma_k = E(w_{it} w_{i,t-k})$). Then, $\Gamma_k$ can be decomposed into two components: $\Gamma_k = \sigma_\eta^2 + \gamma_k$. There are two sources of observed dependency of $y_{it}$ across time. One is unobserved

heterogeneity among individuals, whose magnitude is represented by $\sigma_\eta^2$. The other is state dependence, whose magnitude is represented by $\gamma_k$. Note that when $T$ is fixed, we cannot identify the $\gamma_k$'s and $\sigma_\eta^2$ without some restriction. Roughly speaking, the reason is that we only observe $T$ parameters (i.e., $\Gamma_k$, $k = 0, \ldots, T-1$) while there are $T+1$ unknown parameters. However, when $T$ tends to infinity, we can identify the $\gamma_k$'s separately from $\sigma_\eta^2$. Our main question is how to estimate the auto-covariances of $w_{it}$ when relatively long panel data sets are available. In the next section, we examine the asymptotic properties of the within-group auto-covariances.

3 Asymptotic properties of the within-group auto-covariances

First, we examine the asymptotic properties of the $k$-th within-group auto-covariance:
$$\hat\gamma_k = \frac{1}{N(T-k)} \sum_{i=1}^{N} \sum_{t=k+1}^{T} (y_{it} - \bar y_i)(y_{i,t-k} - \bar y_i), \qquad (3)$$
which may be a natural estimator of $\gamma_k$, where $\bar y_i = \sum_{t=1}^{T} y_{it}/T$. When the length of the time series is fixed, $\hat\gamma_k$ is not consistent for $\gamma_k$ (Solon (1984)). The main source of the inconsistency is that we cannot consistently estimate $\eta_i$ when $T$ is fixed. On the other hand, it is shown below that $\hat\gamma_k$ is consistent for $\gamma_k$ when both $N$ and $T$ tend to infinity under the following assumption.

Assumption 1.
1. $\{\{w_{it}\}_{t=1}^{T}\}_{i=1}^{N}$ are i.i.d. across individuals.
2. $w_{it}$ is strictly stationary within individuals and $\sum_{j=-\infty}^{\infty} |\gamma_j| < \infty$.
3. There exists $M < \infty$ such that $E(|w_{it} w_{ik} w_{im} w_{il}|) < M$ for any $t$, $k$, $m$ and $l$.

This set of assumptions is standard. Note that Assumption 1 does not impose any restriction on the probabilistic nature of $\eta_i$, as $\eta_i$ does not appear in $\hat\gamma_k$. The following theorem shows the consistency of $\hat\gamma_k$.

Theorem 1. Suppose that Assumption 1 is satisfied. Then, as $N \to \infty$ and $T \to \infty$,
$$\hat\gamma_k \to_p \gamma_k \qquad (4)$$
for any $k$.

However, $\hat\gamma_k$ may be severely biased when $T$ is not very large relative to $N$. To see this, observe that $\hat\gamma_k$ may be decomposed in the following form (see the proof of Theorem 1):
$$\hat\gamma_k = \frac{1}{N(T-k)} \sum_{i=1}^{N} \sum_{t=k+1}^{T} w_{it} w_{i,t-k} - \frac{1}{N} \sum_{i=1}^{N} (\bar w_i)^2 + \text{small}. \qquad (5)$$
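The estimator (3) and the bias just described can be checked numerically. A sketch, using an individual effect plus white noise as an illustrative data-generating process of my choosing (so $\gamma_0 = 1$, $\gamma_k = 0$ for $k \ge 1$, and the long-run variance is $V = 1$):

```python
import numpy as np

def within_group_autocov(y, k):
    """k-th within-group auto-covariance, eq. (3):
    (1/(N(T-k))) sum_i sum_{t=k+1}^T (y_it - ybar_i)(y_{i,t-k} - ybar_i)."""
    N, T = y.shape
    yd = y - y.mean(axis=1, keepdims=True)   # deviations from individual means
    return float((yd[:, k:] * yd[:, :T - k]).sum() / (N * (T - k)))

# Individual effect plus white noise: gamma_0 = 1, gamma_k = 0 for k >= 1,
# so the leading bias of each estimator is -V/T = -1/T.
rng = np.random.default_rng(1)
N, T = 5000, 20
y = rng.standard_normal((N, 1)) + rng.standard_normal((N, T))
g0 = within_group_autocov(y, 0)   # near 1 - 1/T = 0.95, not 1
g1 = within_group_autocov(y, 1)   # near -1/T = -0.05, not 0
```

Both estimates are shifted down by roughly $1/T$, matching the common leading bias term derived below.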

The term $\bar w_i\,(= \bar y_i - \eta_i)$ can be understood as the estimation error for $\eta_i$. This estimation error is the main source of the bias even when $N$ tends to infinity. Now, we have
$$E\Big\{\frac{1}{N}\sum_{i=1}^{N} (\bar w_i)^2\Big\} = E\{(\bar w_i)^2\} = \frac{1}{T}\Big\{\gamma_0 + 2 \sum_{j=1}^{T-1} \frac{T-j}{T}\,\gamma_j\Big\}, \qquad (6)$$
which is of order $O(1/T)$ by Assumption 1. Thus, the estimator $\hat\gamma_k$ exhibits a bias of order $O(1/T)$, which may be severe when $T$ is not very large. To make the argument formal, we present the theorem below concerning the asymptotic distribution of $\hat\gamma_k$. We make the following additional assumption.

Assumption 2.
1. $\sum_{t_1,\ldots,t_7=1}^{T} E\big(w_{i1} \prod_{j=1}^{7} w_{it_j}\big) = O(T^3)$.
2. $\lim_{T\to\infty} \frac{1}{T} \sum_{t_1=k+1}^{T} \sum_{t_2=k+1}^{T} \{E(w_{it_1} w_{i,t_1-k} w_{it_2} w_{i,t_2-k}) - \gamma_k^2\}$ exists for any $k$.

We use Theorem 3 of Phillips and Moon (1999) to prove the next theorem. Assumption 2.1 is used to guarantee the uniform integrability condition, which is one of the key conditions of Theorem 3 of Phillips and Moon (1999). This assumption may be relaxed as long as the uniform integrability condition is met. Assumption 2.2 guarantees that the asymptotic variance of $\hat\gamma_k$ exists. This assumption is satisfied if the fourth-order cumulants of $w_{it}$ are sufficiently small. For example, if $w_{it}$ is a Gaussian process (i.e., the fourth cumulants are zero), then $\lim_{T\to\infty} \frac{1}{T} \sum_{t_1=k+1}^{T} \sum_{t_2=k+1}^{T} \{E(w_{it_1} w_{i,t_1-k} w_{it_2} w_{i,t_2-k}) - \gamma_k^2\} = \sum_{j=-\infty}^{\infty} \gamma_j^2 + \sum_{j=-\infty}^{\infty} \gamma_{k+j}\gamma_{k-j}$, which is finite under Assumption 1.2. The following theorem gives the asymptotic distribution of $\hat\gamma_k$.

Theorem 2. Suppose that Assumptions 1 and 2 are satisfied. Then, as $N, T \to \infty$ with $N/T^3 \to 0$,
$$\sqrt{NT}\Big(\hat\gamma_k - \gamma_k + \frac{V_T}{T}\Big) \to_d N\Big(0,\ \lim_{T\to\infty} \frac{1}{T} \sum_{t_1=k+1}^{T} \sum_{t_2=k+1}^{T} \{E(w_{it_1} w_{i,t_1-k} w_{it_2} w_{i,t_2-k}) - \gamma_k^2\}\Big), \qquad (7)$$
where
$$V_T \equiv \gamma_0 + 2 \sum_{j=1}^{T-1} \frac{T-j}{T}\,\gamma_j. \qquad (8)$$

Remark 1. Let $V \equiv \sum_{j=-\infty}^{\infty} \gamma_j$ denote the long-run variance of $w_{it}$. We have $V_T \to V$ as $T \to \infty$. The leading term of the bias of $\hat\gamma_k$ converges to the long-run variance of $w_{it}$. This observation leads us to consider the possibility of correcting the bias by estimating the long-run

variance. In the next section, we examine this possibility, which turns out to be successful. This observation also implies that the bias is large if $w_{it}$ is highly persistent. Note that $V_T > 0$, which implies that the bias is downward and $\hat\gamma_k$ is, on average, smaller than $\gamma_k$. It is also notable that the leading term of the bias does not depend on the order of the auto-covariance.

Remark 2. The condition $N/T^3 \to 0$ is required to ignore the bias term of order $1/T^2$. This condition can be relaxed if the bias term of order $1/T^2$ is taken into account. However, that makes the expression of the asymptotic bias complicated, so we shall keep the condition $N/T^3 \to 0$.

Remark 3. Theorem 2 presents the asymptotic distribution of $\hat\gamma_k$ for each $k$. It is easy to find the joint asymptotic distribution of $\hat\gamma_k$ and $\hat\gamma_j$ for $j \neq k$ because $\hat\gamma_k$ has an asymptotically linear form:
$$\sqrt{NT}\Big(\hat\gamma_k - \gamma_k + \frac{V_T}{T}\Big) = \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=k+1}^{T} (w_{it} w_{i,t-k} - \gamma_k) + o_p(1). \qquad (9)$$
Note that the asymptotic covariance between $\hat\gamma_k$ and $\hat\gamma_j$ is
$$\lim_{T\to\infty} \frac{1}{T} \sum_{t_1=k+1}^{T} \sum_{t_2=j+1}^{T} \{E(w_{it_1} w_{i,t_1-k} w_{it_2} w_{i,t_2-j}) - \gamma_k \gamma_j\}. \qquad (10)$$
If $w_{it}$ is a Gaussian process, the asymptotic covariance becomes $\sum_{t=-\infty}^{\infty} \gamma_t \gamma_{t-k+j} + \sum_{t=-\infty}^{\infty} \gamma_{t+j} \gamma_{t-k}$.

4 Bias correction

In this section, we consider ways to alleviate the bias of $\hat\gamma_k$. The leading term of the bias of $\hat\gamma_k$ is proportional to $V_T$, which converges to the long-run variance of $w_{it}$. We propose to use an estimate of $V_T$ to mitigate the bias of $\hat\gamma_k$. Let $\hat V$ denote an estimator of $V_T$; it can be any estimator that satisfies the conditions in Theorem 3. We define $\tilde\gamma_k$ as:
$$\tilde\gamma_k = \hat\gamma_k + \frac{\hat V}{T}. \qquad (11)$$
The motivation for $\tilde\gamma_k$ is to correct the bias of $\hat\gamma_k$ by adding an estimate of the leading term of the bias. Let $r_{N,T}$ be the rate of convergence of $\hat V$, so that $\hat V - V_T = O_p(r_{N,T})$. The next theorem shows that the asymptotic distribution of $\tilde\gamma_k$ is centered around zero.

Theorem 3. Suppose that Assumptions 1 and 2 are satisfied. Suppose also that $N/T^3 \to 0$ and $r_{N,T}\sqrt{NT}/T \to 0$. Then, as $N \to \infty$ and $T \to \infty$,
$$\sqrt{NT}(\tilde\gamma_k - \gamma_k) \to_d N\Big(0,\ \lim_{T\to\infty} \frac{1}{T} \sum_{t_1=k+1}^{T} \sum_{t_2=k+1}^{T} \{E(w_{it_1} w_{i,t_1-k} w_{it_2} w_{i,t_2-k}) - \gamma_k^2\}\Big). \qquad (12)$$

This theorem implies that we may obtain estimates of the auto-covariances whose biases are small if we have an estimate of $V_T$. Thus, the main question of this section is how to construct a good estimator of the long-run variance of $w_{it}$ and what the rate of convergence of that estimator is.

4.1 Estimating the long-run variance

This subsection considers the estimation of the long-run variance of $w_{it}$. As is known in the time series literature, it is not trivial to estimate the long-run variance, which is a sum of auto-covariances. Unfortunately, it is well known that simply summing the sample auto-covariances does not give a consistent estimator (in fact, $\hat\gamma_0 + 2\sum_{j=1}^{T-1} \{(T-j)/T\}\hat\gamma_j = 0$ in the current setting). One must weigh down the effect of higher-order auto-covariances in order to obtain a consistent estimator of the long-run variance. In this paper, following Parzen (1957) and Andrews (1991), we consider the kernel estimators:
$$\tilde V = \sum_{j=-T+1}^{T-1} k\Big(\frac{j}{S_T}\Big) \frac{T-|j|}{T}\,\hat\gamma_{|j|}, \qquad (13)$$
where $k(\cdot)$ is a kernel function and the scalar $S_T$ is the bandwidth to be chosen by the researcher. We assume that the kernel function belongs to the class $\mathcal K$:
$$\mathcal K = \Big\{ k(\cdot): \mathbb{R} \to [-1,1] \ \Big|\ k(0) = 1,\ k(x) = k(-x)\ \forall x \in \mathbb{R},\ \int k^2(x)\,dx < \infty,\ k(\cdot) \text{ is continuous} \Big\}. \qquad (14)\text{--}(15)$$
In the Monte Carlo simulations, we use the QS kernel, which belongs to $\mathcal K$. The functional form of the QS kernel is:
$$k(x) = \frac{3}{(6\pi x/5)^2} \Big\{ \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \Big\} \qquad (16)$$
for $x \neq 0$, and $k(0) = 1$. Andrews (1991) demonstrates several attractive properties of the QS kernel. Note that $\tilde V$ is always non-negative with the QS kernel, which also means that $\tilde\gamma_0$ is non-negative with the QS kernel. We present the theorem that shows the consistency of $\tilde V$. We also provide the mean square error of $\tilde V$, which is used for choosing the bandwidth parameter. We introduce several notations to present the theorem. Let
$$q_k \equiv \lim_{x \to 0} \frac{1 - k(x)}{|x|^q}, \qquad (17)$$

for $0 \le q < \infty$. Let
$$V^{(q)} = \sum_{j=-\infty}^{\infty} |j|^q \gamma_j. \qquad (18)$$
We need the following assumption concerning the cumulants of $w_{it}$. Let $\mathrm{cum}(t_1, t_2, t_3, t_4)$ denote the fourth-order cumulant of $(w_{i,t_1}, w_{i,t_2}, w_{i,t_3}, w_{i,t_4})$. Similarly, let $\mathrm{cum}(t_1, \ldots, t_8)$ denote the eighth-order cumulant of $(w_{i,t_1}, \ldots, w_{i,t_8})$.

Assumption 3.
1. $\sum_{j_1=-\infty}^{\infty} \sum_{j_2=-\infty}^{\infty} \sum_{j_3=-\infty}^{\infty} |\mathrm{cum}(0, j_1, j_2, j_3)| < \infty$.
2. $\sum_{j_1=-\infty}^{\infty} \cdots \sum_{j_7=-\infty}^{\infty} |\mathrm{cum}(0, j_1, \ldots, j_7)| < \infty$.

The following theorem shows the consistency of $\tilde V$ and gives its rate of convergence. The mean square error formula given in the theorem also serves as the device to choose the bandwidth parameter. (We do not consider cases in which $N^3/T^{2q-2}$ converges to a nonzero, finite constant. In those cases, the MSE formula involves three terms (two bias terms and one variance term) and we cannot obtain a closed-form expression for the optimal bandwidth, which makes choosing a bandwidth parameter difficult in practice.)

Theorem 4. Suppose that Assumptions 1, 2 and 3 are satisfied. If $S_T \to \infty$ and $S_T^2/T \to 0$, then
$$\tilde V - V \to_p 0. \qquad (19)$$
Suppose also that $N^3/T^{2q-2} \to 0$ and $S_T^{2q+1}/(NT) \to \tau$, where $0 < \tau < \infty$, for some $0 < q < \infty$ for which $q_k$ and $V^{(q)}$ are finite. Then,
$$\lim_{N,T\to\infty} \frac{NT}{S_T}\,\mathrm{MSE}(\tilde V) = q_k^2 (V^{(q)})^2 \tau^{-1} + 2 V^2 \int k^2(x)\,dx. \qquad (20)$$
On the other hand, suppose that $N^3/T^{2q-2} \to \infty$ and $S_T^{q+1}/T \to \tau$, where $0 < \tau < \infty$, for some $0 < q < \infty$ for which $q_k$ and $V^{(q)}$ are finite. Then,
$$\lim_{N,T\to\infty} S_T^{2q}\,\mathrm{MSE}(\tilde V) = \Big\{ q_k V^{(q)} + \tau V \int k(x)\,dx \Big\}^2. \qquad (21)$$

Remark 4. There are two bias terms that are relevant to this result. The first bias term comes from the fact that we use a kernel function. The other stems from the fact that each $\hat\gamma_k$ is biased. When $T$ is sufficiently large relative to $N$ (i.e., $N^3/T^{2q-2} \to 0$), the MSE has a similar form to that presented in Andrews (1991). The difference is that the variance part of the MSE is of order $S_T/(NT)$ in the current setting, while it is of order $S_T/T$ in the time series

setting as in Andrews (1991). When $T$ is not very large compared with $N$ (i.e., $N^3/T^{2q-2} \to \infty$), the second bias term becomes more important than the variance term. Note that the estimator $\tilde V$ is the sample average of long-run variance estimators across individuals, and that $N$ affects the variance but does not affect the bias of $\tilde V$. Therefore, the leading term in the MSE is the square of the leading terms of the biases and does not involve the variance term.

Remark 5. When we use the QS kernel, we have $q = 2$, which implies that $r_{N,T} = (NT)^{-2/5}$ if $N^3/T^2 \to 0$ and $r_{N,T} = T^{-2/3}$ if $N^3/T^2 \to \infty$. Therefore, the QS kernel gives $r_{N,T}\sqrt{NT}/T = (N/T^9)^{1/10}$ if $N^3/T^2 \to 0$ and $r_{N,T}\sqrt{NT}/T = (N^3/T^7)^{1/6}$ if $N^3/T^2 \to \infty$. The condition $r_{N,T}\sqrt{NT}/T \to 0$ required in Theorem 3 is automatically satisfied when $N^3/T^2 \to 0$. However, when $N^3/T^2 \to \infty$, the condition $r_{N,T}\sqrt{NT}/T \to 0$ requires that $N^3/T^7 \to 0$, which is stronger than the condition $N/T^3 \to 0$.

4.2 Choosing the bandwidth parameter

We choose the bandwidth parameter by minimizing the MSE of $\tilde V$. We focus our attention on the QS kernel. For the QS kernel, we have $q = 2$, $q_k \approx 1.4212$, $\int k(x)\,dx \approx 1.2930$ and $\int k^2(x)\,dx = 1$. Let $\beta \equiv V^{(q)}/V$. Then, the value of the bandwidth parameter that minimizes the MSE is
$$S_T = \begin{cases} 1.3221(\beta^2 NT)^{1/5}, & \text{when } N^3/T^{2q-2} \to 0, \\ 1.3002(\beta T)^{1/3}, & \text{when } N^3/T^{2q-2} \to \infty \text{ and } V^{(q)} \ge 0, \\ 1.0320(-\beta T)^{1/3}, & \text{when } N^3/T^{2q-2} \to \infty \text{ and } V^{(q)} < 0. \end{cases} \qquad (22)$$
In order to operationalize the procedure, we need an estimate of $\beta$. We follow the strategy taken by Andrews (1991) and estimate $\beta$ by the formula that is valid when the true data-generating process follows the panel AR(1) model. When $w_{it}$ follows an AR(1) process with coefficient $\delta$, the parameter $\beta$ can be written as:
$$\beta = \frac{2\delta}{(1-\delta)^2}. \qquad (23)$$
There are many ways to estimate the parameter $\delta$. Here, we consider the estimator of Hahn and Kuersteiner (2002), but other estimators can also be considered. Let $\hat\delta$ be the estimator of Hahn and Kuersteiner (2002):
$$\hat\delta = \frac{\sum_{i=1}^{N} \sum_{t=2}^{T} (y_{i,t} - \bar y_{i+})(y_{i,t-1} - \bar y_{i-})}{\sum_{i=1}^{N} \sum_{t=2}^{T} (y_{i,t-1} - \bar y_{i-})^2} + \frac{1}{T}, \qquad (24)$$

where $\bar y_{i-} = \sum_{t=1}^{T-1} y_{it}/(T-1)$ and $\bar y_{i+} = \sum_{t=2}^{T} y_{it}/(T-1)$. Then, we estimate $\beta$ by
$$\hat\beta = \frac{2\hat\delta}{(1-\hat\delta)^2}. \qquad (25)$$
We use the following estimated bandwidth:
$$\hat S_T = \begin{cases} \min\{1.3221(\hat\beta^2 NT)^{1/5},\ 1.3002(\hat\beta T)^{1/3}\}, & \text{if } \hat\beta \ge 0, \\ \min\{1.3221(\hat\beta^2 NT)^{1/5},\ 1.0320(-\hat\beta T)^{1/3}\}, & \text{if } \hat\beta < 0. \end{cases} \qquad (26)$$
Note that $\Pr\{\hat S_T = 1.3221(\hat\beta^2 NT)^{1/5}\} \to_p 1$ if $N^3/T^2 \to 0$ and $\Pr\{\hat S_T = 1.3221(\hat\beta^2 NT)^{1/5}\} \to_p 0$ if $N^3/T^2 \to \infty$. While the formula of the optimal bandwidth depends on the rates at which $N$ and $T$ go to infinity, the bandwidth has the appropriate rate in large samples. Note that $\hat\delta$ converges to the first-order auto-correlation of $w_{it}$ and is bounded in probability. Thus, the estimation of $\delta$ does not affect the rate of the bandwidth asymptotically.

4.3 Iterated procedures

In this subsection, we consider an iterated procedure. We update the estimate of $V_T$ by using the bias-corrected estimators of $\gamma_k$ for $k = 0, \ldots, T-1$. Then, we re-estimate the $\gamma_k$ based on the updated estimate of $V_T$. This iteration may be repeated many times. As the $\tilde\gamma_k$'s are better estimates of the $\gamma_k$'s, the bias may be better estimated using the $\tilde\gamma_k$'s. Let
$$\tilde V(m+1) = \sum_{j=-T+1}^{T-1} k\Big(\frac{j}{S_m}\Big) \frac{T-|j|}{T}\,\tilde\gamma_{|j|}(m) \qquad (27)$$
and
$$\tilde\gamma_k(m) = \hat\gamma_k + \frac{\tilde V(m)}{T}, \qquad k = 0, \ldots, T-1, \qquad (28)$$
where $m$ denotes the number of iterations, $S_m$ is the bandwidth parameter for the $m$-th iteration, and $\tilde\gamma_k(0) = \hat\gamma_k$ for $k = 0, \ldots, T-1$. Let $\tilde\gamma(m) = (\tilde\gamma_0(m), \ldots, \tilde\gamma_{T-1}(m))'$, $\hat\gamma = (\hat\gamma_0, \ldots, \hat\gamma_{T-1})'$, $I$ be the identity matrix and $\iota$ be the vector of ones. We consider using the same bandwidth throughout the iterations except for $m = 0$. Let $\bar S$ denote the bandwidth parameter such that $S_m = \bar S$ for $m \ge 1$. Let
$$K = \Big( k(0),\ 2k\Big(\frac{1}{\bar S}\Big)\frac{T-1}{T},\ 2k\Big(\frac{2}{\bar S}\Big)\frac{T-2}{T},\ \ldots,\ 2k\Big(\frac{T-1}{\bar S}\Big)\frac{1}{T} \Big)'. \qquad (29)$$
We can write the iteration formula in the following way:
$$\tilde\gamma(m+1) = \hat\gamma + \frac{1}{T}\,\iota K' \tilde\gamma(m). \qquad (30)$$
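Sections 4.1 and 4.2 can be sketched in code. The function names are mine, and the exact algebraic form of eq. (24) follows my reading of the original, so treat this as an illustration rather than the paper's definitive procedure:

```python
import numpy as np

def qs_kernel(x):
    """Quadratic spectral kernel, eq. (16); k(0) = 1."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nz = x != 0.0
    z = 6.0 * np.pi * x[nz] / 5.0
    out[nz] = (3.0 / z**2) * (np.sin(z) / z - np.cos(z))
    return out

def within_group_autocovs(y):
    """Vector (gamma_hat_0, ..., gamma_hat_{T-1}) of eq. (3)."""
    N, T = y.shape
    yd = y - y.mean(axis=1, keepdims=True)
    return np.array([(yd[:, j:] * yd[:, :T - j]).sum() / (N * (T - j))
                     for j in range(T)])

def plugin_bandwidth(y):
    """AR(1) plug-in bandwidth, eqs. (24)-(26); form of (24) is my reading."""
    N, T = y.shape
    yb_m = y[:, :T - 1].mean(axis=1, keepdims=True)  # mean of y_i1..y_i,T-1
    yb_p = y[:, 1:].mean(axis=1, keepdims=True)      # mean of y_i2..y_iT
    num = ((y[:, 1:] - yb_p) * (y[:, :T - 1] - yb_m)).sum()
    den = ((y[:, :T - 1] - yb_m) ** 2).sum()
    delta = num / den + 1.0 / T        # with the 1/T bias-correction term
    beta = 2.0 * delta / (1.0 - delta) ** 2          # eq. (25)
    s1 = 1.3221 * (beta**2 * N * T) ** (1 / 5)
    s2 = (1.3002 * (beta * T) ** (1 / 3) if beta >= 0
          else 1.0320 * (-beta * T) ** (1 / 3))
    return min(s1, s2)                               # eq. (26)

def lrv_qs(y, S):
    """Panel long-run variance estimator V~ of eq. (13) with the QS kernel."""
    N, T = y.shape
    gam = within_group_autocovs(y)
    j = np.arange(-(T - 1), T)
    return float(np.sum(qs_kernel(j / S) * (T - np.abs(j)) / T * gam[np.abs(j)]))

# Illustrative check on an individual-effect-plus-white-noise panel (V = 1):
rng = np.random.default_rng(3)
y = rng.standard_normal((2000, 1)) + rng.standard_normal((2000, 20))
S_demo = plugin_bandwidth(y)
V_demo = lrv_qs(y, S_demo)
```

With the QS kernel the estimate is non-negative by construction, and on this near-serially-uncorrelated example it stays close to the within-group variance $\hat\gamma_0$.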

If K′ι_T/T < 1, this iteration converges and the limit γ̃(∞) can be written as:

    γ̃(∞) = (I_T + ι_T K′/(T − K′ι_T)) γ̂,        (31)

or, equivalently,

    γ̃(∞) = (I_T − (1/T)ι_T K′)^{−1} γ̂.        (32)

Note that K′ι_T/T < 1 is satisfied when we use the QS kernel.³ Note also that the iterations do not affect the first-order asymptotic results. Thus, the gain should come in terms of finite sample performance. In the Monte Carlo simulations, we see how the iteration improves the performance of the estimators. It turns out that the gain can be substantial, particularly when T is not large and/or when w_it is persistent.

The following theorem gives the asymptotic properties of the m-times iterated estimator of V.

Theorem 5. Suppose that Assumptions 1, 2 and 3 are satisfied. Suppose also that S_m → ∞ and S_m²/T → 0 for any m. Then, Ṽ(m) − V →_p 0. Let q denote a number that satisfies 0 < q < ∞, for which k_q and V^{(q)} are finite. Suppose also that S₀^{2q+1}/(NT) → τ if N³/T^{2q−2} → 0 and S₀^{q+1}/T → τ if N³/T^{2q−2} → ∞, and that S_m^{2q+1}/(NT) → τ for m ≥ 1, for 0 < τ < ∞. Then, as N, T → ∞ and N^{(q+1)²}/T^{q(3q+2)} → 0, for m ≥ 2,

    lim (NT/S_m) MSE(Ṽ(m)) = k_q²(V^{(q)})²/τ + 2V² ∫k²(x)dx.        (33)

Remark 6. In this theorem, we present only one mean square error formula. Since Ṽ(m) for m ≥ 2 is based on bias-corrected estimators, the second term in the bias becomes small when N^{(q+1)²}/T^{q(3q+2)} → 0, and we have a usual bias–variance trade-off. Note that N^{(q+1)²}/T^{q(3q+2)} → 0 is weaker than N³/T^{2q−2} → 0.

Remark 7. When we use the QS kernel, the condition N^{(q+1)²}/T^{q(3q+2)} → 0 means N⁹/T^{16} → 0, and we have r_{NT} = (NT)^{2/5} if N⁹/T^{16} → 0. Therefore, √(NT)/(T r_{NT}) = (N/T⁹)^{1/10} if N⁹/T^{16} → 0. Thus, √(NT)/(T r_{NT}) = o(1) is automatically satisfied. On the other hand, N⁹/T^{16} → 0 is stronger than N/T³ → 0 and N³/T⁷ → 0.

³ We have K′ι_T/T < 1 because k(x) < 1 for x ≠ 0.
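As an illustration, the closed form (32) can be implemented without actually iterating: since every γ̃_k(∞) equals γ̂_k plus the same constant Ṽ(∞)/T, the fixed point solves Ṽ(∞) = K′γ̂/(1 − K′ι_T/T). A minimal Python sketch (ours, not the paper's code; the helper names are hypothetical):

```python
import numpy as np

def qs_kernel(x):
    """Quadratic spectral kernel of Andrews (1991); k(0) = 1, |k(x)| < 1 for x != 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nz = x != 0
    z = 6.0 * np.pi * x[nz] / 5.0
    out[nz] = 3.0 / z ** 2 * (np.sin(z) / z - np.cos(z))
    return out

def wg_autocovariances(y):
    """Within-group autocovariances gamma_hat_0, ..., gamma_hat_{T-1}."""
    N, T = y.shape
    yc = y - y.mean(axis=1, keepdims=True)
    return np.array([(yc[:, k:] * yc[:, :T - k]).sum() / (N * (T - k))
                     for k in range(T)])

def iterated_bias_correction(y, S):
    """Limit (32) of the iteration (27)-(28): solve the scalar fixed point
    V_inf = K'gamma_hat / (1 - K'iota/T), then gamma_tilde = gamma_hat + V_inf/T."""
    N, T = y.shape
    g = wg_autocovariances(y)
    j = np.arange(T)
    K = qs_kernel(j / S) * (T - j) / T
    K[1:] *= 2.0                       # lags j and -j enter symmetrically, as in (29)
    V_inf = K @ g / (1.0 - K.sum() / T)
    return g + V_inf / T, V_inf
```

Footnote 3 guarantees that the denominator 1 − K′ι_T/T is strictly positive for the QS kernel, so the fixed point is well defined.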

As before, we use the mean square error formula as the device to choose the bandwidth parameter. For the QS kernel function, the bandwidth parameter may be chosen to be

    Ŝ_m = 1.3221(β̂₁²NT)^{1/5}.        (34)

5 Autocorrelation

This section considers the estimation of autocorrelations. Let ρ_k be the k-th order autocorrelation of w_it. By definition, ρ_k = γ_k/γ₀. In this paper, we estimate ρ_k based on estimates of γ_k and γ₀. A natural estimator of ρ_k might be ρ̂_k ≡ γ̂_k/γ̂₀. As a corollary of Theorem 1, the estimator ρ̂_k is consistent for ρ_k. However, ρ̂_k has a bias of order O(1/T).

Theorem 6. Suppose that Assumption 1 is satisfied. Then, for all k, as N → ∞ and T → ∞,

    ρ̂_k →_p ρ_k.        (35)

Suppose that Assumptions 1 and 2 are satisfied. Suppose also that N/T³ → 0. Then, for all k, as N → ∞ and T → ∞,

    √(NT) { ρ̂_k − ρ_k + (V/(Tγ₀))(1 − ρ_k) } →_d N(0, Ω_k),        (36)

where

    Ω_k = (1/γ₀²)Var(γ̂_k) − (2ρ_k/γ₀²)Cov(γ̂_k, γ̂₀) + (ρ_k²/γ₀²)Var(γ̂₀),        (37)

    Var(γ̂_k) = lim_{T→∞} (1/T) Σ_{t₁=k+1}^T Σ_{t₂=k+1}^T {E(w_{it₁} w_{i,t₁−k} w_{it₂} w_{i,t₂−k}) − γ_k²},        (38)

    Cov(γ̂_k, γ̂₀) = lim_{T→∞} (1/T) Σ_{t₁=1}^T Σ_{t₂=k+1}^T {E(w_{it₁}² w_{it₂} w_{i,t₂−k}) − γ₀γ_k},        (39)

    Var(γ̂₀) = lim_{T→∞} (1/T) Σ_{t₁=1}^T Σ_{t₂=1}^T {E(w_{it₁}² w_{it₂}²) − γ₀²}.        (40)

The proof of the consistency part is omitted because it is a simple application of the continuous mapping theorem. The proof of the asymptotic distribution part of the theorem is available in the Appendix.

Remark 8. The estimator ρ̂_k exhibits a downward bias of order 1/T. The limit of the bias term is proportional to the long-run variance: (V/γ₀)(1 − ρ_k) = (1 − ρ_k) Σ_{j=−∞}^{∞} ρ_j. It is interesting to note that the bias is small if ρ_k is close to 1, given the values of the other autocorrelations. The bias is largest when ρ_k = 0.5.
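For the panel AR(1) case, Appendix A.8 shows that the bias term in (36) reduces to −(1 + α)/T for ρ̂₁. A small simulation sketch of this (ours; illustrative only, with hypothetical function names):

```python
import numpy as np

def wg_autocorr(y, k):
    """Within-group autocorrelation rho_hat_k = gamma_hat_k / gamma_hat_0."""
    N, T = y.shape
    yc = y - y.mean(axis=1, keepdims=True)
    g0 = (yc * yc).sum() / (N * T)
    gk = (yc[:, k:] * yc[:, :T - k]).sum() / (N * (T - k))
    return gk / g0

def ar1_rho1_bias(a, T):
    """Leading O(1/T) bias of rho_hat_1 for a panel AR(1) with coefficient a,
    i.e. -(V/(T*gamma_0))*(1 - rho_1) = -(1 + a)/T (Remark 8, Appendix A.8)."""
    return -(1.0 + a) / T
```

With T = 10 and α = 0.5, ρ̂₁ concentrates near 0.5 − 1.5/10 = 0.35 rather than 0.5, up to higher-order terms, which is the downward bias the corrections in this paper remove.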

Remark 9. Note that ρ̂₁ is essentially the within-group estimator of the coefficient of the panel first-order autoregressive model. The asymptotic properties of ρ̂₁ are studied by Hahn and Kuersteiner (2002) and Alvarez and Arellano (2003) under the assumption that the true data generating process follows the panel AR(1) model. Our result is equivalent to their result when the panel AR(1) model is correct. See the Appendix. However, if the true data generating process does not have the panel AR(1) structure, the results of Hahn and Kuersteiner (2002) and Alvarez and Arellano (2003) may not hold.

Remark 10. Even if the true data generating process does not have the panel AR(1) structure, the within-group estimator of the coefficient on the lagged dependent variable can be considered as an estimator of the first-order autocorrelation. On the other hand, the instrumental variable estimator of Anderson and Hsiao (1981) cannot be considered as an estimator of ρ₁. The probability limit of the Anderson and Hsiao (1981) estimator is:

    [Σ_{i=1}^N Σ_{t=3}^T y_{i,t−2}(y_{it} − y_{i,t−1})] / [Σ_{i=1}^N Σ_{t=3}^T y_{i,t−2}(y_{i,t−1} − y_{i,t−2})] →_p (γ₁ − γ₂)/(γ₀ − γ₁).        (41)

Thus, the limit of the Anderson and Hsiao (1981) estimator is not equal to ρ₁ in general. When the model is misspecified, the Anderson and Hsiao (1981) estimator may not have a good interpretation, while the within-group estimator can still be considered as an estimator of ρ₁.

Let ρ̃_k be the estimator of ρ_k based on the bias-corrected estimators of γ_k and γ₀, so that ρ̃_k = γ̃_k/γ̃₀. The next theorem shows that the asymptotic distribution of √(NT)(ρ̃_k − ρ_k) is centered at zero.

Theorem 7. Suppose that Assumption 1 is satisfied and V̂ = O_p(1) as N → ∞ and T → ∞. Then, as N → ∞ and T → ∞, ρ̃_k →_p ρ_k. Suppose that Assumptions 1 and 2 are satisfied. Then, for all k, as N → ∞ and T → ∞ with √(NT)/(T r_{NT}) → 0,

    √(NT)(ρ̃_k − ρ_k) →_d N(0, Ω_k),        (42)

where Ω_k is defined as in Theorem 6.

The proof of the theorem is omitted because it is a simple application of the continuous mapping theorem and the Delta method.
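To see Remark 10 in numbers, take an MA(1) specification for w_it (our choice, purely for illustration): with unit innovation variance, γ₀ = 1 + θ², γ₁ = θ and γ₂ = 0, so the population limit (41) differs from ρ₁.

```python
# Population version of Remark 10 for w_it = e_it + th * e_{i,t-1} (MA(1)).
th = 0.5
g0, g1, g2 = 1.0 + th ** 2, th, 0.0
rho1 = g1 / g0                        # first-order autocorrelation: 0.4
ah_limit = (g1 - g2) / (g0 - g1)      # Anderson-Hsiao limit (41): 2/3
print(rho1, ah_limit)
```

The Anderson–Hsiao limit 2/3 is not close to ρ₁ = 0.4, whereas the within-group estimator converges to ρ₁ (with the O(1/T) bias discussed above) regardless of whether the AR(1) model is correct.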

5.1 Partial autocorrelation (PAC)

The partial autocorrelation is another popular measure of dependence over time. The k-th partial autocorrelation is defined as the population value of the coefficient on w_{i,t−k} in the regression of w_it on w_{i,t−1}, ..., w_{i,t−k}. Note that α_k is the population value of the coefficient on w_{i,t−k} in this regression; this does not mean that w_it follows an AR(k) model. We can write the k-th partial autocorrelation as a function of the autocovariances. Let α_k signify the k-th order partial autocorrelation. The parameter α_k satisfies the equation:

    (∗, ..., ∗, α_k)′ = Υ_k^{−1} (γ₁, γ₂, ..., γ_k)′,        (43)

where Υ_k is the k × k Toeplitz matrix whose (a, b)-th element is γ_{|a−b|} and where the ∗'s are elements of the vector that are irrelevant.

By replacing the γ's by their estimates, we can obtain estimates of α_k. We recommend estimating α_k by using the bias-corrected estimators of the γ's. The estimator α̃_k is obtained by solving the following equation:

    (∗, ..., ∗, α̃_k)′ = Υ̃_k^{−1} (γ̃₁, γ̃₂, ..., γ̃_k)′,        (44)

where Υ̃_k is defined as Υ_k with each γ replaced by the corresponding γ̃. The asymptotic properties of α̃_k can be studied by applying the continuous mapping theorem and the Delta method.

Theorem 8. Suppose that Assumption 1 is satisfied and V̂ = O_p(1) as N → ∞ and T → ∞. Then, as N → ∞ and T → ∞, α̃_k →_p α_k. Suppose that Assumptions 1 and 2 are satisfied. Then, for all k, as N → ∞ and T → ∞ with √(NT)/(T r_{NT}) → 0,

    √(NT)(α̃_k − α_k) →_d N(0, Λ_k),        (45)

where Λ_k is defined in the Appendix.

Remark 11. The results of Lee (2005) can be used to find the probability limit and the asymptotic distribution of a partial autocorrelation coefficient estimator based on the γ̂'s (not the γ̃'s). Note

that the q-th element of the vector α(p, q) in Lee (2005) is the q-th order partial autocorrelation when w_it follows an AR(p) process. However, we cannot use the bias correction method given by Lee (2005) for the estimation of partial autocorrelations. The problems Lee (2005) considers are to correctly select the order of the autoregression and to mitigate the bias of the estimates of the coefficients in correctly specified AR(p) models.

6 The variance of individual effects

In this section, we consider the estimation of the variance of the individual effect. A natural estimator for the variance of η_i may be the between-group variance:

    σ̂²_η = (1/N) Σ_{i=1}^N (ȳ_i − ȳ)²,        (46)

where ȳ = Σ_{i=1}^N Σ_{t=1}^T y_it/(NT).

We need assumptions on the distribution of η_i, in addition to the assumptions on w_it, to study the asymptotic properties of σ̂²_η. The following assumption is used to show the consistency of σ̂²_η.

Assumption 4.
1. {η_i}_{i=1}^N are i.i.d. across individuals.
2. E(η_i⁴) < ∞.
3. w_it and η_i are independent for any t.

This set of assumptions is standard. The next assumption is used to prove the asymptotic normality of σ̂²_η.

Assumption 5. Σ_{t₁,t₂,t₃=1}^T E(w_{i1} Π_{j=1}^3 w_{it_j}) = O(T).

The following theorem gives the asymptotic properties of σ̂²_η.

Theorem 9. Suppose that Assumptions 1 and 4 are satisfied. Then, as N → ∞ and T → ∞,

    σ̂²_η →_p σ²_η.        (47)

Suppose that Assumptions 1, 4 and 5 are satisfied. Then, as N → ∞, T → ∞ and N/T⁴ → 0,

    √N (σ̂²_η − σ²_η − V/T) →_d N(0, E{(η_i − µ)⁴} − σ⁴_η).        (48)
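The Yule–Walker-type system (43)–(44) of Section 5.1 can be solved directly once (bias-corrected) autocovariance estimates are available. A sketch (ours; NumPy's generic solver is used for clarity rather than a specialized Toeplitz routine):

```python
import numpy as np

def partial_autocorr(gammas, k):
    """k-th partial autocorrelation alpha_k from (gamma_0, ..., gamma_k):
    the last coefficient of the population regression of w_t on its k lags,
    obtained by solving the k x k Toeplitz system in (43)."""
    g = np.asarray(gammas, dtype=float)
    Ups = np.array([[g[abs(a - b)] for b in range(k)] for a in range(k)])
    return np.linalg.solve(Ups, g[1:k + 1])[-1]
```

As a sanity check, for AR(1) autocovariances γ_j = α^j/(1 − α²) the first partial autocorrelation equals α and all higher ones are zero.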

Remark 12. We have a bias of order 1/T. It is the main source of the inconsistency of σ̂²_η when T is fixed. The estimator σ̂²_η is consistent for σ²_η as T → ∞. It should also be noted that σ̂²_η is √N-consistent. Roughly speaking, this is because we observe only one η_i for each individual.

Remark 13. As for the γ̂_k's, the leading term of the bias converges to the long-run variance of w_it. However, the direction of the bias of σ̂²_η is upward. Thus, if we decompose the covariance structure of y_it by the γ̂_k's and σ̂²_η, we tend to over-evaluate the unobserved heterogeneity component, and to under-evaluate the persistency component.

We construct a bias-corrected estimator of σ²_η by

    σ̃²_η = σ̂²_η − (1/T)V̂.        (49)

Again, V̂ is an estimator of V, which may be Ṽ or Ṽ(∞), and whose rate of convergence is r_{NT}. We show that the asymptotic distribution of √N(σ̃²_η − σ²_η) is centered at zero.

Theorem 10. Suppose that Assumptions 1, 4 and 5 are satisfied. Suppose also that √N/(T r_{NT}) → 0. Then, as N → ∞, T → ∞ and N/T⁴ → 0,

    √N (σ̃²_η − σ²_η) →_d N(0, E{(η_i − µ)⁴} − σ⁴_η).        (50)

A problem of the estimator σ̃²_η is that it may take a negative value. While this problem might be of some practical concern, addressing it is beyond the scope of the current paper.

7 Monte Carlo simulations

This section reports the results of the Monte Carlo simulations. The simulations are conducted with Ox 4.0 (Doornik (2006)). The primary purpose of the simulations is to examine the finite sample properties of the proposed estimators. The statistical procedures developed in this paper are based on asymptotic results, and it is important to see whether the asymptotic results provide a good approximation of the finite sample properties of the procedures. In particular, we focus on the question of how well each procedure estimates the autocorrelations of individual dynamics.

7.1 Design

The data generating process used in the experiments is the following:

    y_it = w_it + η_i,        (51)

where η_i ~ i.i.d.(0, σ²_η), and w_it follows an ARMA(1,1) process:

    w_it = αw_{i,t−1} + ϵ_it + θϵ_{i,t−1},        (52)

with ϵ_it ~ i.i.d.(0, σ²). The initial observations are generated from the stationary distribution.⁴ Specifically, we generate (w_i0, ϵ_i0) from

    (w_i0, ϵ_i0)′ ~ N( (0, 0)′,  [ σ²(1 + θ² + 2αθ)/(1 − α²),  σ² ;  σ²,  σ² ] ).        (53)

We fix the values of σ² and σ²_η so that σ² = 1 and σ²_η = 1. Note that these variances do not affect the estimation of the autocorrelations of w_it. Each experiment is characterized by the vector (N, T, α, θ). We set N = 20, 100; T = 5, 10, 25, 50; α = 0, 0.5, 0.9; and θ = 0, 0.5.

We consider three procedures. The first procedure considered is to estimate the autocorrelations based on the within-group autocorrelations (i.e., ρ̂_k; we call them "WG"). Next, we consider the procedure based on the one-time bias-corrected autocovariances (i.e., ρ̃_k based on Ṽ(1); we call them "BWG"). Finally, we consider the procedure based on the autocovariance estimators obtained after infinitely many iterations (i.e., ρ̃_k based on Ṽ(∞); we call them "IB"). Note that the first-order asymptotic properties of BWG and IB are the same, but that their properties might be different in finite samples. The bandwidth parameters are chosen by using formula (26) for BWG and formula (34) for IB. The number of replications is 5000.

7.2 Results

Tables 1–6 summarize the results of the experiments. For each procedure, we report the biases and standard deviations (std) of the estimates of the first-, second- and third-order autocorrelations.

[Tables 1–6 around here]

First, we examine the validity of our theoretical results by looking at the results for WG. It is clear in the tables that the cross-sectional sample size N affects the standard deviations of the estimators, but it does not affect the biases. On the other hand, the length of the time series T has a substantial impact on both the biases and the standard deviations.
⁴ In order to examine the sensitivity of the results to the specification of the initial observations, we have also tried cases in which (w_i0, ϵ_i0) = (0, 0) for all i throughout the simulations. The results, which are available from the author upon request, are almost identical to those reported in the current paper.

The biases are large

when the length of the time series is short and when the degree of persistency is large (α = 0.9). These results are consistent with our theoretical results.

Next, we investigate the performance of the procedures developed in this paper that have the bias-reducing property. Note that the standard deviations of BWG and IB are similar to those of WG, which means that the bias correction does not inflate the dispersion of the estimators. Moreover, the standard deviations for all procedures are small compared to the biases. The performance of each procedure can, therefore, be measured by the magnitude of its bias. While the procedure BWG alleviates the bias, the procedure IB mitigates the bias more effectively than BWG. The gain from iterating the bias correction is substantial, particularly when T is small (T = 5 and 10).

The effectiveness of our bias correction crucially depends on T and α. (In the current setting, α measures the persistency of individual dynamics.) When there is no persistency in individual dynamics (α = 0), our bias correction works very well and can eliminate the bias completely even if T is small. However, when there is strong persistency (α = 0.9), a long time series is required to obtain estimates that are almost unbiased. Still, our procedures (in particular, IB) are able to improve on the within-group autocorrelation estimators substantially. The parameter θ does not appear to be as significant for determining the performance as the other parameters.

To sum up, we observe that the procedures developed in this paper effectively reduce the biases without increasing the variances. They provide reliable estimates of the autocorrelations, particularly when the time dimension is moderately large or when the persistency is not very large.
On the other hand, when the length of the time series is short and the persistency is large, our procedures might not be able to eliminate the biases completely, although they perform remarkably better than does the conventional procedure. Given the results of the experiments, we think that applied researchers would benefit by using the procedures developed in this paper. In particular, the procedure IB works remarkably well.

8 Possible extensions

This paper develops methods to estimate the autocovariance and autocorrelation structure of individual dynamics separately from unobserved heterogeneity. There are many directions in which our procedures could be extended. First, in several applications, we may be interested in the covariance structure of the error terms in panel regression models. In those cases, the variable y_it may not be directly observed and we need to take the estimation error into account when we study the asymptotic properties of the statistics. This question is related to the construction of standard errors of estimators for panel data models and the GLS estimation of random effects models (see, e.g., Hansen (forthcoming) and references therein).

Another important extension would be to consider models that allow non-stationarity. Popular specifications are two-way error components models (i.e., models with time effects) and models with incidental trend effects. Hahn and Moon (2006) consider the estimation of panel AR(1) models with both individual and time fixed effects. The estimation of panel autoregressive models with incidental trends is examined in Phillips and Sul (2003) and Phillips and Sul (forthcoming). Extending our procedures to these models would enhance the range of application of our methods and may be useful.

A Technical appendix

A.1 Proof of Theorem 1

Proof. We have the following decomposition:

    γ̂_k = (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_it w_{i,t−k} + (1/N) Σ_{i=1}^N (w̄_i)²        (54)
        − (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_it w̄_i − (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_{i,t−k} w̄_i.        (55)

By Lemmas 1, 2 and 6, we get γ̂_k →_p γ_k.

A.2 Proof of Theorem 2

Proof. We have the following decomposition:

    √(NT)(γ̂_k − γ_k + V/T)        (56)
    = √(NT) (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T (w_it w_{i,t−k} − γ_k)        (57)
    + √(NT) { (1/N) Σ_{i=1}^N (ȳ_i − η_i)² − V/T }        (58)
    − √(NT) (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_it w̄_i − √(NT) (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_{i,t−k} w̄_i.        (59)

The first term on the right-hand side is asymptotically normal by Lemma 1, and the second term is o_p(1) by Lemma 2. The last two terms on the right-hand side are o_p(1) by Lemma 6.

the covariance structure of error terms in panel regression models. In those cases, the variable y it may not be directly observed and we need to tae the estimation error into account when we study the asymptotic properties of the statistics. his question is related to the construction of standard errors of estimators for penal data models and the GLS estimation of random effects models (see, e.g., Hansen (forthcoming) and references therein). Another important extension would be to consider the models that allow non-stationarity. Popular specifications are two-way error components models (i.e., models with time effects) and models with incidental trend effects. Hahn and Moon (2006) consider the estimation of panel AR() models with both individual and time fixed effects. he estimation of panel autoregressive models with incidental trends is examined in Phillips and Sul (2003) and Phillips and Sul (forthcoming). Extending our procedures to these models would enhance the range of application of our methods and may be useful. A echnical appendix A. Proof of heorem Proof. We have the following decomposition: ˆγ = ( ) 2 ( ) i= t=+ ( w i ) 2 + i= w it w i,t By Lemmas, 2 and 6, we get ˆγ p γ. A.2 Proof of heorem 2 ( ) ( w i ) 2 (54) i= i= t= w it w i + ( ) i= t= w it w i. (55) Proof. We have the following decomposition: (ˆγ γ ) { } (56) = +2 + ( ) ( ) ( ) i= t=+ (w it w i,t γ ) i= (ȳ i η i ) 2 V (57) ( w i ) 2 (58) i= i= t= w it w i + ( ) i= t= w it w i. (59) he first term on the right hand side is asymptotically normal by Lemma, and the second and third terms are o p () by Lemma 2. he last two terms in the right hand side are o p () by Lemma 6. 2

A.3 Proof of heorem 4 Proof. First, we consider the bias. ote that where We have j j B j = E 2 } E {Ṽ V j= + E(Ṽ ) = = j= + E(ˆγ j ) = j i= ( w i ) 2 + j= + + V ( ) j j E(ˆγ j ). (60) S γ j + j j i= t= V + B j, (6) w it w i + i= t= j w it w i. (62) (63) { ( ) } j j γ j (64) S j= + ( ) j j S + j= + ( ) j B j (65) S As shown in Parzen (957), S q times the first term on the right hand side converges to q V (q). ext, we consider the second term. First we have V V. Observing that S j= + (j/s) (x)dx, the second term is of order O(S/ ). It is smaller than the order of the first term when 3 / 2q 2 0 and S 2q+ /( ) τ. When 3 / 2q 2 and S q+ / τ, the orders of the first and second terms are the same. Lastly, we consider the third term on the right hand side. We observe that ( ) { } j E 2 j ( ) j 2 j ( w i ) 2 = S S 2 V (66) and that j= + j= + j= + ( ) j S E ( ) j j S 2 i= j= + 2S 2 V j w it w i i= t= γ m S 2 γ m m= by Lemma 6. Similarly, we can show that ( ) j E S m= i= t= j 22 j= + j= + ( w it w i S 2 = O 2 ( ) j = O S ( ) j = O S ( S 2 2 ( S 2 2 ), (67) (68) ), (69) ). (70)

herefore we have j= + ( ) j S B j ( ) S 2 = O 2. (7) o sum up, when S q+ / 0, } S q E {Ṽ V q V (q), (72) and, when S q+ / τ, where 0 < τ <, we have } S q E {Ṽ V q V (q) V (x)dx. (73) ext, we consider the variance. We note that ˆV is the sample average across cross section of the long-run variance estimator for each time series. Let where Ṽ,i j= + Ṽ = ( ) j S ˆV,i, (74) i= t= j + (y it ȳ i )(y i,t j ȳ i ). (75) herefore, we have var(ṽ ) = var(ṽ,i). (76) We verify Assumptions B, C and D of Andrews (99), under which we can use the variance formula for ˆV,i provided by Andrews (99). ote that θ, ˆθ and Vt (θ) in Assumptions B, C and D of Andrews (99) are η i, ȳ i and y it η i, respectively in our case. Observing that (y it η i )/( η i ) =, we can easily verify that Assumptions B, C and D are satisfied. herefore, we have S var(ṽ (S)) 2V 2 (x) 2 dx. (77) A.4 Proof of heorem 5 Proof. In this proof, we use the notations defined in the proof of heorem 4. First, we consider the case with m =. ow, j E{Ṽ (2)} = j= + E( γ j ()) = j ( ) j j E({ γ j ()}. (78) S γ j + B j + j 23 [V E{ ˆV ()}]. (79)

When 3 / 2q 2 0 and S 2q+ 0 /( ) τ (0 < τ < ), the term V E( ˆV ) is of smaller order than S q. hus, the formula for the bias part of the mean square is the same as that in heorem 4. ow, we consider cases with 3 / 2q 2 and S q+ 0 / τ (0 < τ < ). hen, we have V E{Ṽ ()} = O(S 0 / ). Observe that j= + ( ) j j S [V E{Ṽ ()}] = O ( ) S S 0 2 = O ) (S 2q+ q+. (80) herefore, when S 2q+ /( ) τ (0 < τ < ) and (q+)2 / q(3q+2) 0, we have [Ṽ (2) E{Ṽ (2)}] q V (q), (8) S q if (q+)2 / q(3q+2) 0. For m 2, by mathematical induction, we can show that [Ṽ (m + ) E{Ṽ (m + )}] q V (q), (82) S q m if S 2q+ m /( ) τ and (q+)2 / q(3q+2) 0. ext, we consider the variance of Ṽ (2). Ṽ (2) E(Ṽ (2)) (83) = Ṽ () E(Ṽ ()) + ( ) j j (Ṽ () E(Ṽ ())). S (84) j= + Since j= + (j/s){( j )/ } = O(S), we have var(ṽ (2)) 2V 2 S (x) 2 dx. (85) Again, by mathematical induction, for any positive integer m > 2 we have var(ṽ (m)) 2V 2 (x) 2 dx. (86) S m A.5 Proof of heorem 6 Proof. First, we observe that ˆγ γ + V ˆγ 0 γ 0 + V d ow, ( 0, ( )) V ar(ˆγ ) Cov(ˆγ, ˆγ 0 ). (87) Cov(ˆγ, ˆγ 0 ) V ar(ˆγ ) ˆρ ρ + V ρ = ˆγ γ + γ 0 ˆγ 0 γ 0 V ρ (88) γ 0 = (ˆγ γ + ) γ 0 V ρ ( ˆγ 0 γ 0 + ) γ 0 V (89) + γ 0 (ˆγ γ )(ˆγ 0 γ 0 ) + ˆγ (γ 0 )3 (ˆγ 0 γ 0 ) 2, (90) 24

where γ0 is between γ 0 and ˆγ 0. Since (ˆγ γ )(ˆγ 0 γ 0 ) and (ˆγ 0 γ 0 ) 2 are of order O(( ) + /2 3/2 + 2 ), we have ( ˆρ ρ + ) V ρ (9) γ 0 ( ( ) ( ) ( )) d 0, γ 0 ρ V ar(ˆγ ) Cov(ˆγ, ˆγ 0 ) γ 0 (92) Cov(ˆγ, ˆγ 0 ) V ar(ˆγ ) γ 0 ρ γ 0 A.6 Proof heorem 8 he proofs of the consistency and the asymptotic normality are omitted as they can be easily shown by applying the continuous mapping theorem and the Delta method. We discuss the asymptotic variance of α in this section. Let γ 0, = (γ 0,..., γ ) and γ 0, = ( γ 0,..., γ ). Let γ 0 γ... γ γ 0 γ... γ γ Υ = γ 2... γ 2, and Υ γ = γ 2... γ 2. (93)........................ γ γ 2... γ 0 γ γ 2... γ 0 Let A = (I, 0 ) and A + = (0, I ), where I is the identity matrix and 0 is the vector of zeros. Let e j be the dimensional vector whose j-th element is and other elements are zero. he matrix E j signifies the matrix whose ( + j, j + )-th and ( j, j + )-th elements for j = 0,... are one if well defined and other elements are zero. ow, α and α are the -th elements of the vectors A + γ 0, and Υ A +γ 0,, respectively. We can write Υ ( Υ A + γ 0, Υ A +γ 0, ) (94) = {Υ he matrix Υ Υ can be written as: A +( γ 0, γ 0, ) Υ ( Υ Υ )Υ A +γ 0, } + o p (). (95) Υ Υ = E j A ( γ 0, γ 0, )e j. (96) j= 25

herefore, we have {Υ A +( γ 0, γ 0, ) Υ ( Υ Υ )Υ A +γ 0, } + o p () (97) = Υ A +( γ 0, γ 0, ) Υ E j A ( γ 0, γ 0, )e j Υ A +γ 0, + o p () (98) j= = Υ A + Υ γ 0, A + Υ e je j A ( γ 0, γ 0, ) + o p (). (99) j= Let Σ be the matrix whose (i, j)-th element is lim t =j t 2 =j so that Σ is the asymptotic variance matrix of γ 0,. denoted by Λ is the (, )-th element of Υ A + Υ j= Υ A + Υ A.7 Proof of heorem 9 {E(w it w i,t i+w it2 w i,t2 j+) γ i γ j }, (00) j= Proof. We have the following decomposition: ˆσ 2 η = = γ 0, A + Υ γ 0, A + Υ hen, the asymptotic variance of α e je j e je j A Σ (0) A. (02) (ȳ i ȳ) 2 (03) i= (η i µ) 2 + i= (ȳ i η i ) 2 (ȳ µ) 2 + 2 i= (ȳ i η i )(η i µ). (04) By Lemmas 2, 3(), 4 and 5, we get ˆσ η 2 p ση. 2 ext, we examine the asymptotic distribution of ˆσ η. 2 We have the following decomposition: ( ˆσ η 2 ση 2 ) V (05) = {(η i µ) 2 ση} 2 + { } (ȳ i η i ) 2 V (06) i= (ȳ µ) 2 + 2 i= i= i= (ȳ i η i )(η i µ). (07) he first term converges to ( 0, [ E{(η i µ) 4 } σ 4 η]), by Lemma 3(2). he other terms are o p () by Lemmas 2, 4 and 5. 26

A.8 he asymptotic distribution of ˆρ when w it follows an AR() model Suppose that w it = αw i,t + ϵ it, where α < and ϵ it s are i.i.d. over time (and across individuals) with mean zero and variance σ 2. It is easy to see that γ = α σ 2 /( α 2 ). herefore, the asymptotic bias of ˆρ is V ( ρ ) = γ 0 = j= α σ 2 ( α 2 ) ( α)( α2 ) ( + 2α α ow we consider the asymptotic variance of ˆρ. First we observe herefore, V ar(ˆγ ) = lim Cov(ˆγ, ˆγ 0 ) = lim t =2 t 2 =2 σ 2 (08) ) ( α) = ( + α). (09) { α 2 E(w 2 it w 2 it 2 ) + 2αE(ϵ it w i,t w 2 i,t 2 ) (0) +E(ϵ it ϵ it2 w i,t w i,t2 ) α 2 γ 2 0 t =2 t 2 =2 } () { αe(w 2 it wi,t 2 2 ) + E(ϵ it w i,t wit 2 2 ) αγ0 2 }. (2) Ω = γ0 2 V ar(ˆγ ) 2 ρ γ0 2 Cov(ˆγ, ˆγ 0 ) + ρ2 γ0 2 V ar(ˆγ 0 ) (3) = γ0 2 lim E(ϵ it ϵ it2 w i,t w i,t2 ) (4) = γ 2 0 = σ2 γ 0 = lim σ2 t =2 t 2 =2 E(ϵ 2 itwi,t 2 ) (5) t=2 = α 2. (6) σ 2 α 2 hese results match the results of Hahn and Kuersteiner (2002) and Alvarez and Arellano (2003) on the within-group estimator for panel AR() models. A.9 Lemmas Lemma. Suppose Assumption is satisfied. hen, for any, as and, ( ) i= t=+ w it w i,t p γ. (7) 27

Suppose that Assumptions 1 and 2 are satisfied. Then, for any k, as N → ∞ and T → ∞,

    √(NT) (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T (w_it w_{i,t−k} − γ_k)        (118)
    →_d N( 0, lim_{T→∞} (1/T) Σ_{t₁=k+1}^T Σ_{t₂=k+1}^T { E(w_{it₁} w_{i,t₁−k} w_{it₂} w_{i,t₂−k}) − γ_k² } ).        (119)

Proof. First, we observe that

    E{ (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_it w_{i,t−k} } = E(w_it w_{i,t−k}) = γ_k.        (120)

The variance term is

    var{ (1/(N(T − k))) Σ_{i=1}^N Σ_{t=k+1}^T w_it w_{i,t−k} } = (1/(N(T − k)²)) E{ [ Σ_{t=k+1}^T (w_it w_{i,t−k} − γ_k) ]² }        (121)
    ≤ M/(N(T − k)) → 0.        (122)

By the Chebyshev inequality, we obtain the desired result.

We apply Theorem 3 of Phillips and Moon (1999) to show the asymptotic normality. It is easy to see that Conditions (i), (ii), and (iv) of Theorem 3 of Phillips and Moon (1999) are satisfied. A sufficient condition for Condition (iii) is

    E{ [ (1/√T) Σ_{t=k+1}^T (w_it w_{i,t−k} − γ_k) ]⁴ } < ∞.        (123)

Now,

    E{ [ (1/√T) Σ_{t=k+1}^T (w_it w_{i,t−k} − γ_k) ]⁴ }        (124)
    = (1/T²) E{ [ Σ_{t=k+1}^T (w_it w_{i,t−k} − γ_k) ]⁴ }        (125)
    = O(1),        (126)

by Assumption 2. Therefore, we have the desired result.

Lemma 2. Suppose Assumption 1 is satisfied. Then, as N → ∞ and T → ∞,

    (1/N) Σ_{i=1}^N (ȳ_i − η_i)² →_p 0.        (127)

Suppose that Assumptions 1 and 5 are satisfied. Then, as N → ∞ and T → ∞,

    (1/N) Σ_{i=1}^N (ȳ_i − η_i)² − V/T = O_p(1/(√N T) + 1/T²).        (128)

Proof. First, we note that (1/N) Σ_{i=1}^N (ȳ_i − η_i)² ≥ 0. Its expectation is

    E{ (1/N) Σ_{i=1}^N (ȳ_i − η_i)² } = E{(w̄_i)²}        (129)
    = (1/T) Σ_{j=−T+1}^{T−1} ((T − |j|)/T) γ_j ≤ (1/T) Σ_{j=−∞}^{∞} |γ_j| → 0.        (130)

By the Markov inequality, the term is o_p(1). Now, the variance is

    var{ (1/N) Σ_{i=1}^N (ȳ_i − η_i)² } = (1/N) [ E{(w̄_i)⁴} − (E{(w̄_i)²})² ]        (131)
    = (1/N) [ (1/T⁴) Σ_{t₁,t₂,t₃,t₄=1}^T E(w_{it₁} w_{it₂} w_{it₃} w_{it₄}) − (E{(w̄_i)²})² ]        (132)
    = O(1/(NT²)),        (133)

by Assumption 5. By the Chebyshev inequality, we obtain the desired result.

Lemma 3. Suppose Assumption 4 is satisfied. Then, as N → ∞:

1. (1/N) Σ_{i=1}^N (η_i − µ)² →_p σ²_η;
2. (1/√N) Σ_{i=1}^N {(η_i − µ)² − σ²_η} →_d N(0, E{(η_i − µ)⁴} − σ⁴_η).

The results hold independently of the rate at which T tends to infinity.

Proof. Note that {(η_i − µ)²}_{i=1}^N is an i.i.d. sequence. Therefore, the results can be obtained by applying the Khinchine law of large numbers and the Lindeberg–Lévy central limit theorem.

Lemma 4. Suppose Assumptions 1 and 4 are satisfied. Then, as N → ∞ and T → ∞,

    (1/N) Σ_{i=1}^N (ȳ_i − η_i)(η_i − µ) = O_p(1/√(NT)).        (134)

Proof. First, we note that

    (1/N) Σ_{i=1}^N (ȳ_i − η_i)(η_i − µ) = (1/N) Σ_{i=1}^N w̄_i (η_i − µ).        (135)

Its expectation is 0, and its variance is

    var{ (1/N) Σ_{i=1}^N w̄_i (η_i − µ) } = (1/N) E[(w̄_i)²] E[(η_i − µ)²]        (136)
    = (1/N) (V/T + o(1/T)) σ²_η = O(1/(NT)).        (137)

By the Chebyshev inequality, we obtain the desired result.