Bias in Dynamic Panel Models under Time Series Misspecification

Yoonseok Lee*

August 2

Abstract

We consider within-group estimation of higher-order autoregressive panel models with exogenous regressors and fixed effects, where the lag order is possibly misspecified. Even when the misspecification bias is disregarded, the fixed-effect bias formula differs substantially from that of the correctly specified case, though its asymptotic order remains the same under stationarity. We suggest some bias reduction methods under possible misspecification.

Keywords: Bias, dynamic panel, fixed effects, misspecification, bias reduction.

JEL Classifications: C23, C33

* An earlier version of this paper is part of my dissertation at Yale University and was circulated as "A General Approach to Bias Correction in Dynamic Panels under Time Series Misspecification." I thank Peter Phillips, Donald Andrews, Yuichi Kitamura and seminar participants at Cornell, MSU, Texas A&M, SUNY-Binghamton, Yale, the 2005 Inter University Conference at Princeton, the 2007 Midwest Econometrics Group Meeting, and the Conference in Honor of Peter C. B. Phillips. I also would like to thank an anonymous referee and coeditors for their detailed and constructive comments. All errors are solely mine. Address: Department of Economics, University of Michigan, 611 Tappan Street, Ann Arbor, MI 48109-1220. E-mail: yoolee@umich.edu.

1 Introduction

Since the influential papers by Nerlove (1967) and Nickell (1981), finite sample autoregressive bias in fixed-effect dynamic panel models has been well understood, and many bias reduction methods have been proposed in the context of within-group estimators (e.g., Kiviet, 1995; Hahn and Kuersteiner, 2002; Alvarez and Arellano, 2003; Bun and Carree, 2005; and Phillips and Sul, 2007, to name a few). Dynamic panel studies have long relied on the first-order autoregressive structure, which is indeed unavoidable especially when the length of the time series ($T$) is small. As longer panel data become available, however, it is more natural to consider higher-order dynamics when first-order models are suspected to be misspecified.

This paper evaluates the effects of this type of misspecification, particularly on the fixed-effect bias in dynamic panel regressions. Specifically, we extend the bias formula of Nickell (1981) to the case where the dynamics follow general autoregressive forms with exogenous regressors, ARX($p$), but the lag order could be misspecified. We also develop the $\sqrt{NT}$-normalized limit distribution of the within-group estimator that allows for lag order misspecification, when both $N$ (the cross section sample size) and $T$ get large at the same rate. Besides the order misspecification bias and the fixed-effect bias, the analytical results reveal an additional bias, which is generated by combining order misspecification with the incidental parameters problem (Neyman and Scott, 1948). The additional bias is, however, still of the same order of magnitude as the standard fixed-effect bias (i.e., $O(1/T)$) and thus it is possible to develop a bias reduction method.

Attempts to adjust for the bias using formulae that correct for first-order dynamics could be wrong and could even exacerbate the bias under such misspecification. It is found, however, that we can reduce the fixed-effect bias robustly to lag order misspecification using the penalized likelihood function approach (e.g., Hahn and Kuersteiner, 2004; Arellano and Hahn, 2006; Bester and Hansen, 2007) or the model selection based approach (e.g., Lee, 2006).

2 The Model

We consider a panel process $\{y_{i,t}\}$ generated from the homogeneous $p$th-order univariate autoregressive model with exogenous regressors $X_{i,t} \in \mathbb{R}^r$ (i.e., ARX($p$)) given by

$$y_{i,t} = \mu_i + \sum_{j=1}^{p} \alpha_{pj}\, y_{i,t-j} + \beta' X_{i,t} + u_{i,t} \quad \text{for } i = 1, \ldots, N \text{ and } t = 1, \ldots, T, \tag{1}$$

where the lag order $p$ is assumed to be finite.[fn. 1] We let the initial values $(y_{i,0}, y_{i,-1}, \ldots, y_{i,-p+1})$ be observed for all $i$. We first assume the following conditions.

[fn. 1] As noted in Bhansali (1981) and Kunitomo and Yamamoto (1985), we can consider the infinite $p$ case given suitable definitions of the infinite dimensional autoregressive coefficient matrix $A$ and the choice matrices $e_1$ and $J_q$. This is the case of estimating an approximate ARX($p_T$) model with $p_T \to \infty$, where $p_T^3/T \to 0$ and $T^{1/2}\sum_{j=p_T+1}^{\infty} |\alpha_j| \to 0$ as $T \to \infty$ (e.g., Bhansali, 1978). See Lee (2006) for further discussions.

Assumption E (i) $\{u_{i,t}\}$ is i.i.d. across $i$ and $t$; (ii) $E(u_{i,t}\mu_i) = E(u_{i,t}X_{i,s}) = 0$, $E u_{i,t}^2 = \sigma^2$ with $0 < \sigma^2 < \infty$ and $E u_{i,t}^8 < \infty$ for all $i$, $s$ and $t$; (iii) $\{X_{i,t}\}$ is i.i.d. across $i$ and strictly stationary in $t$ for each $i$; (iv) $\|E(X_{i,t}X_{i,t}')\| < \infty$.

Assumption S $\sum_{j=1}^{p} |\alpha_{pj}| < \infty$ and all roots of the characteristic equation $1 - \sum_{j=1}^{p} \alpha_{pj} z^j = 0$ lie outside the unit circle.

Assumption E implies weak (or sequential) exogeneity in $y_{i,t-1}$ but strict exogeneity in $X_{i,t}$. It still allows for non-zero correlation between the unobserved individual effect $\mu_i$ and $X_{i,t}$. Note that $X_{i,t}$ could be serially correlated; but we assume that $X_{i,t}$ and the higher order lags of $y_{i,t}$ capture all the persistence and thus the error term does not have any serial correlation. Assuming independence of $u_{i,t}$ over $t$ makes the expressions simple. We also exclude cross sectional dependence in $u_{i,t}$. Finally, note that the eighth moment condition is required to derive the joint asymptotic CLT in the later section.

We eliminate the fixed effects $\mu_i$ by subtracting the individual sample average over time (i.e., the within-group transformation) from equation (1) to obtain

$$\tilde y_{i,t} = \sum_{j=1}^{p} \alpha_{pj}\, \tilde y_{i,t-j} + \beta' \tilde X_{i,t} + \tilde u_{i,t}, \tag{2}$$

where for any variable $w_{i,t}$ we define $\tilde w_{i,t-j} = w_{i,t-j} - \bar w_{i,-j}$ and $\bar w_{i,-j} = (1/T)\sum_{s=1}^{T} w_{i,s-j}$ for $j = 0, 1, \ldots, p$. Formula (2) readily transforms into the first-order $p$-dimensional vector autoregressive process with exogenous regressors given by

$$\tilde Y_{i,t} = A \tilde Y_{i,t-1} + B \tilde X_{i,t} + \tilde U_{i,t}, \tag{3}$$

where $\tilde Y_{i,t-j} = (\tilde y_{i,t-j}, \tilde y_{i,t-j-1}, \ldots, \tilde y_{i,t-j-p+1})'$ for $j = 0, 1$, and

$$A = \begin{bmatrix} \alpha(p)' \\ I_{p-1} \;\; 0_{(p-1)\times 1} \end{bmatrix},$$

with $\alpha(p) = (\alpha_{p1}, \alpha_{p2}, \ldots, \alpha_{pp})'$ and $I_k$ being the identity matrix of rank $k$. We also let $\tilde U_{i,t} = e_1 \tilde u_{i,t}$ and $B = e_1 \beta'$, where $e_1$ is the $p \times 1$ column vector with one in the first element and zeros elsewhere.

Note that, from (3), Assumption S is equivalent to $\det[I - Az] \neq 0$ for all $|z| \le 1$, or that each eigenvalue of $A$ has modulus less than one. It thus guarantees that the sequence $\{A^j\}$ is absolutely summable and $\sum_{j=0}^{\infty} A^j = (I - A)^{-1}$ exists. Hence, if we define vector linear processes $V_{i,t} = \sum_{j=0}^{\infty} A^j U_{i,t-j}$ and $Z_{i,t} = \sum_{j=0}^{\infty} A^j B X_{i,t-j}$ with $U_{i,t} = e_1 u_{i,t}$, then both $V_{i,t}$ and $Z_{i,t}$ exist in the mean square sense and we can rewrite (3) as $\tilde Y_{i,t} = \tilde Z_{i,t} + \tilde V_{i,t}$. Moreover, if we let $\Gamma_j = E(V_{i,t}V_{i,t+j}')$ for $j = 0, \pm 1, \pm 2, \ldots$, we have $\Gamma_j = \Gamma_0 A'^{j}$ for $j \ge 0$, where $\Gamma_0 = \sigma^2 \sum_{j=0}^{\infty} A^j e_1 e_1' A'^{j}$. For later convenience we also introduce the long-run covariance matrix of $V_{i,t}$ as $\Omega = \sum_{j=-\infty}^{\infty} \Gamma_j = \Gamma_0 + \Lambda + \Lambda'$, where $\Lambda = \sum_{j=1}^{\infty} \Gamma_j$ and $\Lambda' = \sum_{j=1}^{\infty} \Gamma_j' = \sum_{j=-\infty}^{-1} \Gamma_j$. Assumptions E and S guarantee that $\Omega$ exists.

In most cases, the true lag order $p$ of the underlying autoregressive process $\{y_{i,t}\}$ in (1) is unknown. Hence we consider the situation in which $\{y_{i,t}\}$ is fitted to an ARX($q$) process instead of ARX($p$), where $q \le p$.[fn. 2] By stacking cross section observations first and then time series observations, (3) can be rewritten as $\tilde Y = \tilde Y_{-1} A' + \tilde X B' + \tilde U$, from which the within-group estimators in this case are defined as

$$\hat\alpha(p,q) = \left(J_q' \tilde Y_{-1}' Q_X \tilde Y_{-1} J_q\right)^{-1} J_q' \tilde Y_{-1}' Q_X \tilde Y e_1, \tag{4}$$

$$\hat\beta = (\tilde X'\tilde X)^{-1} \tilde X' \left(\tilde Y e_1 - \tilde Y_{-1} J_q\, \hat\alpha(p,q)\right), \tag{5}$$

where $Q_X = I_{NT} - \tilde X(\tilde X'\tilde X)^{-1}\tilde X'$, $\hat\alpha(p,q) = (\hat\alpha_{p,q1}, \hat\alpha_{p,q2}, \ldots, \hat\alpha_{p,qq})'$ and $\hat\alpha_{p,qr}$ for $r = 1, 2, \ldots, q$ is the within-group estimator for the coefficient of $y_{i,t-r}$ when the ARX($p$) process is fitted to ARX($q$). The $p \times q$ choice matrix $J_q$ is defined as $J_q = [I_q, 0_{q \times (p-q)}]'$ if $q < p$; $J_q = I_p$ if $q = p$. We further assume the invertibility of $J_q' \tilde Y_{-1}' Q_X \tilde Y_{-1} J_q$ and $\tilde X'\tilde X$, which is implied by the following condition.

Assumption R (i) For given $p$, $\tilde Y_{-1}' Q_X \tilde Y_{-1}$ is a full column rank matrix; (ii) $\tilde X$ is a full column rank matrix and it does not include time-invariant variables; (iii) $q \le p$.
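The within-group estimator in (4) can be illustrated with a minimal sketch for the special case without exogenous regressors (so that $Q_X$ reduces to the identity). The function name `within_group_ar` and the array layout are assumptions made for this illustration only. As a sanity check, the sketch fits AR(1) to pure noise plus fixed effects, for which the probability limit of the estimate as $N \to \infty$ is $-1/T$.

```python
import numpy as np


def within_group_ar(y, q):
    """Within-group (fixed-effects) estimate of AR(q) coefficients.

    y : array of shape (N, T + q); the first q columns are initial values.
    Returns the length-q coefficient vector on (y_{t-1}, ..., y_{t-q}).
    No exogenous regressors are handled in this sketch.
    """
    y = np.asarray(y, dtype=float)
    n_units, n_cols = y.shape
    T = n_cols - q
    # Dependent variable y_{i,t} for t = 1, ..., T.
    dep = y[:, q:]
    # Lagged regressors: lags[:, :, r - 1] holds y_{i,t-r}.
    lags = np.stack([y[:, q - r:n_cols - r] for r in range(1, q + 1)], axis=2)
    # Within-group transformation: subtract each unit's time average.
    dep_w = dep - dep.mean(axis=1, keepdims=True)
    lags_w = lags - lags.mean(axis=1, keepdims=True)
    # Pooled least squares on the demeaned, stacked data.
    X = lags_w.reshape(n_units * T, q)
    z = dep_w.reshape(n_units * T)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return coef


rng = np.random.default_rng(0)
N, T = 20000, 5
mu = rng.uniform(-0.5, 0.5, size=(N, 1))        # fixed effects
y_noise = mu + rng.standard_normal((N, T + 1))  # y_{i,t} = mu_i + u_{i,t}
alpha_hat = within_group_ar(y_noise, q=1)[0]
print(alpha_hat, -1.0 / T)
```

With a large cross section the printed estimate should be close to $-1/T$ even though the data contain no dynamics at all, which is the fixed-effect bias in its simplest form.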
[fn. 2] The case of $q > p$ is less interesting since there is no variable omission bias. In this case, we simply let $\alpha(p,q) = (\alpha(p)', 0_{q-p}')'$. An interesting example is when $(p,q) = (0,1)$ without exogenous regressors $X_{i,t}$ (i.e., $y_{i,t} = \mu_i + u_{i,t}$ is the data generating model). In this case, for the within-group estimator $\hat\alpha(0,1) = \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde u_{i,t-1}\tilde u_{i,t} \big/ \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde u_{i,t-1}^2$, we can derive that $\mathrm{plim}_{N\to\infty}\, \hat\alpha(0,1) = -1/T$.

3 Bias Formulae

When the number of lags is not correctly chosen, and especially when the data are fitted to a lower order ARX model, the within-group estimators in (4) and (5) are expected to have variable omission bias on top of the standard fixed-effect bias. Specifically, we let $\hat\Gamma_0 = \tilde Y_{-1}' Q_X \tilde Y_{-1}/NT$, $\hat\Gamma_1 = \tilde Y_{-1}' Q_X \tilde Y/NT$, $\bar\Gamma_0 = \Gamma_0 + \mathrm{plim}_{N\to\infty}\, \tilde Z_{-1}' Q_X \tilde Z_{-1}/NT$ and $\bar\Gamma_1 = \Gamma_1 + \mathrm{plim}_{N\to\infty}\, \tilde Z_{-1}' Q_X \tilde Z_{-1} A'/NT$. Then, using the identity $C_1^{-1} = (I - C_1^{-1}(C_1 - C_2))C_2^{-1}$ for symmetric invertible matrices $C_1$ and $C_2$, we can rewrite (4) as

$$\begin{aligned}
\hat\alpha(p,q) - J_q'\alpha(p) &= \{\alpha(p,q) - J_q'\alpha(p)\} + \{\hat\alpha(p,q) - \alpha(p,q)\} \\
&= \{\alpha(p,q) - J_q'\alpha(p)\} + \Big\{ (J_q'\bar\Gamma_0 J_q)^{-1} J_q'(\hat\Gamma_1 - \bar\Gamma_1)e_1 \\
&\qquad - (J_q'\hat\Gamma_0 J_q)^{-1}\,[J_q'(\hat\Gamma_0 - \bar\Gamma_0)J_q]\,(J_q'\bar\Gamma_0 J_q)^{-1} J_q'\hat\Gamma_1 e_1 \Big\} \\
&\equiv b_1(\hat\alpha(p,q)) + b_2(\hat\alpha(p,q)),
\end{aligned} \tag{6}$$

where $\alpha(p,q) = (J_q'\bar\Gamma_0 J_q)^{-1} J_q'\bar\Gamma_1 e_1 = (\alpha_{p,q1}, \alpha_{p,q2}, \ldots, \alpha_{p,qq})'$ is the theoretical parameter value from the ARX($q$) fitting. For example, $\alpha(p,1) = \alpha_{p,11}$ is the first-order autocorrelation coefficient of $y_{i,t}$ for the pure autoregressive case. In addition, since we can rewrite (3) as $J_q'\tilde Y_{i,t} = J_q' A J_q J_q' \tilde Y_{i,t-1} + J_q' B \tilde X_{i,t} + J_q' \tilde U_{i,t}$, (5) can be decomposed as

$$\begin{aligned}
\hat\beta - \beta &= \left\{ (\tilde X'\tilde X)^{-1}\tilde X'\tilde Y_{-1} J_q \left(J_q'\alpha(p) - \alpha(p,q)\right) \right\} + \left\{ (\tilde X'\tilde X)^{-1}\tilde X'\tilde Y_{-1} J_q \left(\alpha(p,q) - \hat\alpha(p,q)\right) \right\} + o_p(1) \\
&\equiv b_1(\hat\beta) + b_2(\hat\beta) + o_p(1),
\end{aligned} \tag{7}$$

by letting $(\tilde X'\tilde X)^{-1}\tilde X'\tilde U e_1 = o_p(1)$ from the strict exogeneity. Note that in these two bias expressions (6) and (7), $b_1(\hat\alpha(p,q))$ and $b_1(\hat\beta)$ are the variable omission biases, which cannot be eliminated unless the model is correctly specified. This part of the bias, therefore, should be taken care of by proper lag order selection methods (e.g., Lee, 2006, 2010b). On the other hand, $b_2(\hat\alpha(p,q))$ and $b_2(\hat\beta)$ are the biases from the within-group transformation. In this section, we will show that the second part of the biases has different expressions from the standard fixed-effect bias formula and indeed has additional terms polluted by the misspecification. $b_2(\hat\alpha(p,q))$ and $b_2(\hat\beta)$ are, however, shown to be still $O_p(1/T)$, so they disappear as $T \to \infty$, whereas the misspecification biases $b_1(\hat\alpha(p,q))$ and $b_1(\hat\beta)$ do not vanish even when $N, T \to \infty$. The main interest here is, therefore, in the biases $b_2(\hat\alpha(p,q))$ and $b_2(\hat\beta)$, which are manageable, instead of the entire biases $\hat\alpha(p,q) - J_q'\alpha(p)$ and $\hat\beta - \beta$.[fn. 3] This is the case when the researcher is only able to run the ARX($q$) regression (with $q < p$) due to the lack of enough time periods, or simply believes ARX($q$) is true, but tries to find the bias formula from the parameters in ARX($q$) in order to correct the fixed-effect bias. A leading example is fitting an AR($\infty$) process using a finite number of lags. The main lesson is that the standard bias formula for the within-group estimator is no longer valid even around the theoretical parameter values (i.e., even if we disregard the misspecification bias) and thus the standard bias correction methods would not work properly in this case.

[fn. 3] The theoretical parameters $\alpha(p,q)$ indeed correspond to the autocorrelation coefficients of $y_{i,t}$ up to the $q$th order, which are not model specific. Thus, the results in this paper could alternatively be seen as the bias in estimating autocorrelations in fixed-effects models. The bias from the theoretical parameter value is also studied in the standard time series literature under misspecification (e.g., Bhansali, 1981; Kunitomo and Yamamoto, 1985).

3.1 Nickell bias

We first examine the asymptotic bias of the within-group estimator when $N$ tends to infinity but $T$ is fixed. Even in the case of correct specification, the standard within-group estimator in AR(1) fixed-effects models is not consistent for large $N$ (e.g., Nerlove, 1967; Nickell, 1981). As well described in Phillips and Sul (2007), such autoregressive bias arises from the correlation of the error and the lagged dependent variables after the unknown mean is estimated and removed. Not surprisingly, such bias becomes more complicated when the lag order is not correctly specified. The first theorem generalizes the Nickell bias to the case of possible time series misspecification.

Theorem 1 Let $\{y_{i,t}\}$ be generated from (1) and suppose $S_X = \mathrm{plim}_{N\to\infty}\, \tilde Z_{-1}' Q_X \tilde Z_{-1}/NT$ exists. Under Assumptions E, R and S,

$$\mathrm{plim}_{N\to\infty}\left(\hat\alpha(p,q) - \alpha(p,q)\right) = -(I_q - R_{q,0})^{-1} R_{q,2} - (I_q - R_{q,0})^{-1}\left\{ R_{q,1} - R_{q,0}\, D_q^X J_q'(S_X A' + \Gamma_1)e_1 \right\}, \tag{8}$$

where $D_q^X = (J_q'(\Gamma_0 + S_X)J_q)^{-1}$, $R_{q,0} = D_q^X J_q' G_0 J_q$, $R_{q,1} = D_q^X J_q' G_1 e_1$ and $R_{q,2} = D_q^X J_q' G_2 e_1$. Expressions of $G_0$, $G_1$ and $G_2$ are given in (A.3), (A.4) and (A.5) in the Appendix.

The first term of the bias (8) is mainly from the nonzero correlation between $\tilde Y_{i,t-1}$ and $\tilde U_{i,t}$, and the second term is mainly from order misspecification (more precisely, from the combination of endogeneity and order misspecification). It can be easily verified that the asymptotic bias of $\hat\alpha(p,q)$ around $\alpha(p,q)$ becomes negligible as $T$ grows.
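The bias expression of (5) follows easily from (7) and (8). If there are no exogenous regressors, the expression for $\mathrm{plim}_{N\to\infty}(\hat\alpha(p,q) - \alpha(p,q))$ remains the same without the $S_X$ term, which can be approximated as

$$-\frac{\sigma^2}{T}\, D_q J_q'(I - A)^{-1} e_1 - \frac{1}{T}\, D_q J_q'\,\Omega\left(I - J_q D_q J_q'\Gamma_0\right)A' e_1 + O\!\left(\frac{1}{T^2}\right) \tag{9}$$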

for large $T$, where $D_q = (J_q'\Gamma_0 J_q)^{-1}$.

Remark 1 When $q = 1$, (9) could be further reduced to

2` X (T ) (T ) k! k! + O k= T 2, (10)

where $\ell = 1/(1 - \sum_{k=1}^{p}\alpha_{pk})$ and $\omega_k = \sum_j \gamma_{j+k}$ with $\gamma_j$ being the autocovariances of $y_{i,t}$. For a particular example with $(p,q) = (2,1)$, the asymptotic bias can be obtained as

$$-\frac{1}{T-2}(1 + \alpha_{22})(1 + \rho_1) - \frac{1}{T-2}\,\frac{\alpha_{22}(1 + \alpha_{22})}{1 - \alpha_{22}}\,(1 + \rho_1) + O\!\left(\frac{1}{T^2}\right) \tag{11}$$

using the Yule-Walker equation, where $\rho_1 = \alpha_{21}/(1 - \alpha_{22})$ is the first-order autocorrelation coefficient of $y_{i,t}$ in AR(2), which indeed corresponds to the theoretical coefficient $\alpha(2,1) = \alpha_{2,11}$.[fn. 4] If $p = q = 1$, then $\alpha_{22} = 0$ and (11) is equivalent to the bias expression of Nickell (1981), which can also be seen from (10) by letting $\alpha_{pk} = 0$ for $k \ge 2$. Note that since the true parameters satisfy $|\alpha_{22}| < 1$, $\alpha_{22} + \alpha_{21} < 1$ and $\alpha_{22} - \alpha_{21} < 1$ under Assumption S (e.g., Marmol, 1995), it can be verified that the first part of the bias expression (11) is always negative, whereas the direction of the second part depends on the sign of $\alpha_{22}$. On the other hand, the sum of these two terms, $-(1/(T-2))((1 + \alpha_{22})/(1 - \alpha_{22}))(1 + \rho_1)$, is always negative under stationarity, so that both the direction and the asymptotic order of the fixed-effect bias of the AR(1) coefficient $\hat\alpha(2,1)$ about the first-order autocorrelation coefficient $\rho_1 = \alpha(2,1)$ remain the same as in the standard (correctly specified) case.[fn. 5] Another finding is that the absolute value of the bias (11) increases with positive $\alpha_{22}$. In particular, the second part of the bias, which is mainly from the misspecification, explodes as $\alpha_{22}$ gets close to unity even when $\alpha_{21} = 0$. A similar phenomenon was also discussed by Kunitomo and Yamamoto (1985) in the time series context.

[fn. 4] In an independent work by Okui (2008), it is also shown that the within-group estimator of the panel AR(1) coefficient converges to the first-order autocorrelation coefficient as $N \to \infty$ and $T \to \infty$ even under dynamic misspecification. This finding is a particular example of Theorem 1 since $\hat\alpha(p,1) - \alpha(p,1) \to_p 0$ as $N \to \infty$ and $T \to \infty$, where $\alpha(p,1)$ should be the first-order autocorrelation coefficient for any $p$ by construction.

[fn. 5] Based on a number of simulation studies, we expect that the direction of the bias (11) still remains negative even for $p \ge 3$.

3.2 Noncentrality in the asymptotic distribution

We now consider the asymptotic distribution of the within-group estimator when both $N$ and $T$ are large. Hahn and Kuersteiner (2002), Alvarez and Arellano (2003), and Lee (2010a) consider a similar case, where the asymptotic ratio of $N$ to $T$ is a finite constant. More precisely, we assume the following condition.

Assumption NT $\lim_{N,T\to\infty} N/T = \kappa$, where $0 < \kappa < \infty$.

Under Assumption NT, the standard $\sqrt{NT}$-normalized within-group estimator has a nondegenerate asymptotic bias, which is proportional to $\sqrt{\kappa}$, the square root of the limiting sample size ratio. As in Theorem 1, however, the asymptotic bias could increase when the dynamic specification is incorrect. We also assume the following initial condition.

Assumption I (i) $Y_{i,0} \sim \mathrm{iid}\left(\mu_i (I - A)^{-1} e_1 + E(Z_{i,t}),\; \Gamma_0 + \mathrm{var}(Z_{i,t})\right)$ for each $i$, where $\mathrm{var}(Z_{i,t}) < \infty$; (ii) $(1/N)\sum_{i=1}^{N} \mu_i^2 = O_p(1)$.

The initial condition in Assumption I is standard in time series models when studying stationary autoregressive processes. Since we assume large $T$, we use this initial condition without loss of generality, as the initial values become less important for longer stationary panels.

Theorem 2 We assume that $\lim_{T\to\infty} S_X = \mathrm{plim}_{N,T\to\infty}\, \tilde Z_{-1}' Q_X \tilde Z_{-1}/NT > 0$ and that it exists. Under Assumptions E, R, S, I and NT,

$$\sqrt{NT}\left(\hat\alpha(p,q) - \alpha(p,q)\right) \to_d N\!\left(\sqrt{\kappa}\,\zeta_q^X,\; \sigma^2 D_q^X\right) \tag{12}$$

as $N, T \to \infty$ jointly, where $\zeta_q^X = -\sigma^2 D_q^X J_q'(I - A)^{-1} e_1 - D_q^X J_q'\Omega A' e_1 + D_q^X (J_q'\Omega J_q) D_q^X J_q'(\Gamma_1 + \lim_{T\to\infty} S_X A')e_1$ and $D_q^X = (J_q'(\Gamma_0 + \lim_{T\to\infty} S_X)J_q)^{-1}$.

The asymptotic distribution of (5) follows easily from (7) and (12). Theorem 2 shows that the asymptotic bias depends on the sample size ratio, $\kappa$, and thus large $T$ does not attenuate the bias unless $\kappa$ is zero. If there are no exogenous regressors, the expression reduces to

$$\sqrt{NT}\left(\hat\alpha(p,q) - \alpha(p,q)\right) \to_d N\!\left(\sqrt{\kappa}\,\zeta_q,\; \sigma^2 D_q\right) \tag{13}$$

with $\zeta_q = -\sigma^2 D_q J_q'(I - A)^{-1} e_1 - D_q J_q'\Omega(I - J_q D_q J_q'\Gamma_0)A' e_1$, where the first component mainly contributes to the well-known negative bias from the within-group transformation, which always underestimates $\alpha(p,q)$ even when the number of lags is correctly chosen. As discussed in Alvarez and Arellano (2003), we can further derive that $\sqrt{NT}\left(\hat\alpha(p,q) - [\alpha(p,q) + (1/T)\zeta_q]\right) \to_d N(0, \sigma^2 D_q)$ using a higher order expansion of the bias term (e.g., Lee, 2006) provided that $N/T^3 \to 0$. The asymptotic distribution under the correct time series specification (e.g., Hahn and Kuersteiner, 2002) is a special case of (13) by letting $p = q$, which is $\sqrt{NT}(\hat\alpha(p) - \alpha(p)) \to_d N\!\left(-\sqrt{\kappa}\,\sigma^2(\Gamma_0 - \Gamma_1')^{-1} e_1,\; \sigma^2\Gamma_0^{-1}\right)$. In particular, when $p = 1$, we have $\sqrt{NT}(\hat\alpha(1) - \alpha(1)) \to_d N\!\left(-\sqrt{\kappa}\,(1 + \alpha(1)),\; 1 - \alpha(1)^2\right)$.

4 Bias Reductions

Theorems 1 and 2 show that the within-group estimators have additional bias even around the theoretical parameter values when the lag order is not correctly chosen. Therefore, most existing bias corrections would not work properly because the correction formulae assume correct model specification. In fact, attempts to adjust for the bias using formulae that correct for AR(1) dynamics would be wrong and may even exacerbate the bias when the true lag order is larger than one. For example, we consider estimating an AR(1) panel regression with a bias correction of the Hahn and Kuersteiner (2002) type when the true data generating process is AR(2): $\tilde\alpha(2,1) = ((T+1)/T)\,\hat\alpha(2,1) + (1/T)$, where $\hat\alpha(2,1)$ is the within-group estimator. From (11) and using $\alpha(2,1) = \rho_1$, however,

$$\tilde\alpha(2,1) - \alpha(2,1) = \left(\hat\alpha(2,1) - \alpha(2,1)\right) + \frac{1}{T}\left(\hat\alpha(2,1) + 1\right) \to_p -\frac{1}{T}(1 + \rho_1)\,\frac{2\alpha_{22}}{1 - \alpha_{22}} + O\!\left(\frac{1}{T^2}\right)$$

as $N \to \infty$, where the leading bias term can be zero only when $\alpha_{22} = 0$ (i.e., AR(1) is indeed the true data generating process) under stationarity. Unfortunately, the direction and the size of the bias after the correction would vary depending on the unknown parameter value $\alpha_{22}$. Note that this result holds even without considering the pure misspecification bias (i.e., $\alpha(2,1) - \alpha_{21}$), which adds further bias relative to the true parameter value, $\tilde\alpha(2,1) - \alpha_{21}$. This example shows that a precise dynamic specification is important, particularly when we attempt to correct the fixed-effect bias.

A reasonable approach is to conduct model selection before any bias corrections as suggested in Lee (2006).[fn. 6] By doing so, the additional bias terms from the misspecification will disappear asymptotically and we can focus on eliminating the pure fixed-effect bias using the bias formulae derived.
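The AR(2)-fitted-to-AR(1) example can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: it takes the no-regressor bias coefficient to be $\zeta_q = -\sigma^2 D_q J_q'(I-A)^{-1}e_1 - D_q J_q'\Omega(I - J_q D_q J_q'\Gamma_0)A'e_1$ with $D_q = (J_q'\Gamma_0 J_q)^{-1}$ and $\Omega = \sigma^2 (I-A)^{-1} e_1 e_1' (I-A')^{-1}$, builds it from the companion matrix, and compares it with the closed form $-\frac{1+\alpha_{22}}{1-\alpha_{22}}(1+\rho_1)$. The function names `companion` and `zeta` are hypothetical.

```python
import numpy as np


def companion(alpha):
    """Companion matrix A of an AR(p) process with coefficients alpha."""
    p = len(alpha)
    A = np.zeros((p, p))
    A[0, :] = alpha
    A[1:, :-1] = np.eye(p - 1)
    return A


def zeta(alpha, q, sigma2=1.0):
    """Leading fixed-effect bias coefficient zeta_q (no exogenous regressors)."""
    p = len(alpha)
    A = companion(alpha)
    I = np.eye(p)
    e1 = I[:, 0]
    M = np.outer(e1, e1)
    # Gamma_0 solves Gamma_0 = A Gamma_0 A' + sigma2 * M (row-major vec).
    gamma0 = np.linalg.solve(np.eye(p * p) - np.kron(A, A),
                             (sigma2 * M).reshape(-1)).reshape(p, p)
    inv_IA = np.linalg.inv(I - A)
    omega = sigma2 * inv_IA @ M @ inv_IA.T   # long-run covariance of V
    J = I[:, :q]                             # p x q choice matrix
    D = np.linalg.inv(J.T @ gamma0 @ J)
    first = -sigma2 * D @ J.T @ inv_IA @ e1
    second = -D @ J.T @ omega @ (I - J @ D @ J.T @ gamma0) @ A.T @ e1
    return first + second


a21, a22 = 0.4, 0.4
rho1 = a21 / (1.0 - a22)
z_matrix = zeta([a21, a22], q=1)[0]
z_closed = -(1.0 + a22) / (1.0 - a22) * (1.0 + rho1)
# T times the leading bias left after the AR(1)-type correction.
hk_residual = z_matrix + (1.0 + rho1)
hk_closed = -(1.0 + rho1) * 2.0 * a22 / (1.0 - a22)
print(z_matrix, z_closed, hk_residual, hk_closed)
```

Under these assumptions the matrix expression and the closed form agree, and adding the correction term $(1+\rho_1)$ leaves a residual coefficient of $-(1+\rho_1)\,2\alpha_{22}/(1-\alpha_{22})$, which vanishes only when $\alpha_{22} = 0$. Calling `zeta([0.5], q=1)` recovers the familiar $-(1+\alpha)$ of the correctly specified AR(1) case.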
An additional benefit of this approach is that the bias correction works not only toward the theoretical parameter but also toward the true parameter, since the pure misspecification bias (e.g., $b_1(\hat\alpha(p,q))$ in (6)) will disappear from the model selection stage.

[fn. 6] Also see Lee (2010b) for a modified lag order selection criterion developed for the AR($p$) dynamic panel models with fixed effects. It is important to note that we cannot use the standard information-based lag order selection criteria (e.g., AIC, BIC) in this case. Intuitively, this is because the ML estimator and the KL information criterion could be inconsistent for small $T$ in the presence of incidental parameters (e.g., the fixed effects).

Alternatively, particularly when we are interested in bias correction toward the theoretical parameter, we can address it, even under possible lag order misspecification, using the penalized likelihood function approach (e.g., Hahn and Kuersteiner, 2004; Arellano and Hahn, 2006; Bester and Hansen, 2007). Though the original formulae were developed under the correct model specification, we can use this bias correction approach robustly to dynamic order misspecification since the formulae are constructed using the HAC estimator for the variance of the scores. Note that any lag order misspecification yields erroneous serial correlation in the error term (or in the scores in general). More precisely, we can derive the penalty term in this context as

$$\frac{1}{2\sigma^2 T}\sum_{i=1}^{N}\sum_{\ell=-m}^{m}\;\sum_{t=\max\{1,\ell+1\}}^{\min\{T,T+\ell\}} u_{i,t}\,u_{i,t-\ell} \tag{14}$$

for some truncation parameter $m$ satisfying $m/T^{1/2} \to 0$, which is based on the HAC estimator for the long-run variance of $u_{i,t}$ and thus indeed allows for general forms of serial correlation in $u_{i,t}$. Note that Bester and Hansen (2007) suggest using a small $m$ in practice (e.g., $m = 1$ for the AR(1) model), but we need to use a large $m$, as in standard HAC estimation, to cover general forms of serial correlation from possible order misspecification.

Tables I and II summarize some Monte Carlo simulation results showing how the bias reductions work even under lag order misspecification. The true model is AR(2) without exogenous regressors but it is fitted to AR(1). The true coefficients are $\alpha_{21} = \alpha_{22} = 0.4$ or $-0.4$. The theoretical parameter value $\alpha_{2,11} = \alpha(2,1)$ is the first-order autocorrelation coefficient in the given AR(2) process (about $0.67$ and $-0.29$, respectively), which is calculated using the true parameter values. $\mu_i$ are randomly drawn from $U(-0.5, 0.5)$ and $u_{i,t}$ from $N(0,1)$. $\hat\alpha_{2,11}$ is the within-group estimator before bias correction; $\tilde\alpha^{AR(1)}_{2,11}$ is the Hahn-Kuersteiner bias-corrected estimator assuming AR(1) is correct; $\tilde\alpha^{PL}_{2,11}$ is the bias-corrected ML estimator from the penalized likelihood using (14) with $m$ being the smallest integer larger than $T^{1/4}$, which is supposed to be robust to order misspecification; $\tilde\alpha_{\hat p} = e_1'\left(\hat\alpha(\hat p) - \mathrm{plim}_{N\to\infty}[\hat\alpha(\hat p,\hat p) - \alpha(\hat p,\hat p)]\right)$ and $\tilde\alpha_{\hat p,1} = \hat\alpha_{2,11} - \mathrm{plim}_{N\to\infty}(\hat\alpha(\hat p,1) - \alpha(\hat p,1))$ are the bias-corrected estimators using the bias formula (8) and assuming the selected lag order $\hat p$ is true, where $\hat p$ is chosen as in Lee (2006 or 2010b).[fn. 7] Table I tabulates the biases from the theoretical parameter values, which are $b_2(\hat\alpha(p,q))$ in (6), whereas Table II tabulates the biases from the true parameter values, which thus include the misspecification biases (i.e., $b_1(\hat\alpha(p,q)) + b_2(\hat\alpha(p,q))$ in (6)). The values are obtained by averaging over the replications.

[fn. 7] In other words, $\tilde\alpha_{\hat p}$ is the first element of the standard bias-corrected within-group estimator of the correctly specified AR($\hat p$); $\tilde\alpha_{\hat p,1}$ is the bias-corrected estimator of AR(1), but the correction formula (8) assumes $\hat p$ is true. Both cases take $\hat p$ to be correctly chosen, but the target parameters are different.
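The uncorrected part of this experiment can be sketched as follows. This is a minimal illustration, not the paper's simulation design: the sample sizes, the number of replications, the burn-in length and the function names are assumptions chosen so the example runs quickly. It simulates an AR(2) panel with fixed effects, fits AR(1) by the within-group estimator, and compares the average bias around $\rho_1$ with the leading-order approximation $-\frac{1}{T}\frac{1+\alpha_{22}}{1-\alpha_{22}}(1+\rho_1)$.

```python
import numpy as np


def simulate_ar2_panel(n_units, T, a1, a2, rng, burn=200):
    """Simulate y_{i,t} = mu_i + a1*y_{i,t-1} + a2*y_{i,t-2} + u_{i,t}.

    Returns an (n_units, T + 1) array: one initial value plus T periods.
    """
    mu = rng.uniform(-0.5, 0.5, size=n_units)
    total = burn + T + 1
    y = np.zeros((n_units, total))
    u = rng.standard_normal((n_units, total))
    for t in range(2, total):
        y[:, t] = mu + a1 * y[:, t - 1] + a2 * y[:, t - 2] + u[:, t]
    return y[:, burn:]


def within_group_ar1(y):
    """Within-group AR(1) coefficient from an (N, T + 1) panel."""
    lag, cur = y[:, :-1], y[:, 1:]
    lag_w = lag - lag.mean(axis=1, keepdims=True)
    cur_w = cur - cur.mean(axis=1, keepdims=True)
    return float((lag_w * cur_w).sum() / (lag_w ** 2).sum())


rng = np.random.default_rng(12345)
a1, a2 = 0.4, 0.4
N, T, reps = 250, 25, 200          # illustrative sizes, not the paper's design
rho1 = a1 / (1.0 - a2)             # theoretical AR(1) target
estimates = np.array([within_group_ar1(simulate_ar2_panel(N, T, a1, a2, rng))
                      for _ in range(reps)])
bias_theoretical = estimates.mean() - rho1   # bias around alpha(2,1)
bias_true = estimates.mean() - a1            # bias around alpha_21
approx = -(1.0 + a2) / (1.0 - a2) * (1.0 + rho1) / T
print(bias_theoretical, bias_true, approx)
```

The bias around the theoretical parameter is negative and of order $1/T$, while the bias around the true coefficient $\alpha_{21}$ has the opposite sign here because the misspecification bias $\rho_1 - \alpha_{21}$ dominates.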

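Table I: Bias from the theoretical parameter and RMSE of bias corrected estimates (N=25 ; percentage of biases to the target parameter values are in the parentheses)

| $\alpha_{2,11}$ | $T$ | $\hat\alpha_{2,11} - \alpha_{2,11}$: bias (%) | rmse | $\tilde\alpha^{AR(1)}_{2,11} - \alpha_{2,11}$: bias (%) | rmse | $\tilde\alpha^{PL}_{2,11} - \alpha_{2,11}$: bias (%) | rmse | $\tilde\alpha_{\hat p,1} - \alpha_{2,11}$: bias (%) | rmse |
| .67 | 2 | -.357(53.6) | .358 | -.248(37.2) | .25 | -.27(4.5) | .27 | -.27(4.5) | .27 |
| | 25 | -.7(25.5) | .7 | -.(6.5) | .2 | -.(6.6) | .2 | -.6(5.9) | .7 |
| | 5 | -.8(2.) | .8 | -.49 (7.3) | .5 | -.5 (7.4) | .5 | -.42 (6.3) | .44 |
| | | -.39 (5.8) | .4 | -.23 (3.4) | .24 | -.2 (3.) | .22 | -.8 (2.7) | .2 |
| | 25 | -.4 (2.) | .5 | -.8 (.2) | .9 | -.7 (.) | .9 | -.6 (.9) | .8 |
| -.29 | 2 | -.25 (8.9) | .28 | .32(.2) | .35 | -.8 (2.8) | .5 | -.6 (2.) | .3 |
| | 25 | -.2 (4.3) | .5 | .6 (5.5) | .8 | -.8 (2.7) | . | -.2 (.5) | .8 |
| | 5 | -.6 (2.) | .8 | .8 (2.9) | . | -.3 (.9) | .6 | . (.) | .6 |
| | | -.3 (.) | .5 | .4 (.5) | .6 | . (.) | .4 | . (.) | .4 |
| | 25 | -. (.4) | .3 | .2 (.7) | .3 | . (.) | .3 | . (.) | .3 |

Table II: Bias from the true parameter and RMSE of bias corrected estimates (N=25 ; percentage of biases to the target parameter values are in the parentheses)

| $\alpha_{21}$ | $T$ | $\hat\alpha_{2,11} - \alpha_{21}$: bias (%) | rmse | $\tilde\alpha^{AR(1)}_{2,11} - \alpha_{21}$: bias (%) | rmse | $\tilde\alpha^{PL}_{2,11} - \alpha_{21}$: bias (%) | rmse | $\tilde\alpha_{\hat p} - \alpha_{21}$: bias (%) | rmse |
| .4 | 2 | -.9(22.7) | .94 | .8 (4.6) | .32 | -.3 (.8) | .24 | -.62(4.4) | .67 |
| | 25 | .97(24.2) | .98 | .57(39.) | .57 | .56(39.) | .57 | -.55(3.8) | .57 |
| | 5 | .86(46.6) | .87 | .28(54.5) | .28 | .27(54.3) | .27 | -.2 (5.3) | .23 |
| | | .228(57.) | .228 | .244(6.) | .244 | .246(6.5) | .246 | -.9 (2.2) | . |
| | 25 | .252(63.) | .252 | .259(64.8) | .259 | .26(64.9) | .26 | -.3 (.6) | .5 |
| -.4 | 2 | .89(22.3) | .9 | .46(36.6) | .47 | .6(26.6) | .7 | -.24 (5.9) | .5 |
| | 25 | .2(25.5) | .2 | .3(32.5) | .3 | .7(26.7) | .7 | -.5 (.2) | .3 |
| | 5 | .9(27.) | .9 | .23(3.7) | .23 | .2(27.9) | .2 | -.2 (.4) | .9 |
| | | .(27.9) | .2 | .9(29.6) | .9 | .5(28.7) | .5 | -. (.2) | .6 |
| | 25 | .3(28.3) | .3 | .6(29.) | .6 | .5(28.7) | .5 | . (.) | .4 |

The results make several important points. First, for the original patterns of the fixed-effect bias, Table I shows that the direction of the bias is always negative (under stationarity) and that it decreases as $T$ gets large, which corresponds to the analytical findings in the main theorems, whereas Table II shows that this is not the case when the misspecification bias is also considered.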
Second, the standard bias correction ($\tilde\alpha^{AR(1)}_{2,11}$) could exacerbate the overall bias depending on the parameter values, though it reduces the bias from $\alpha_{2,11}$ to some degree in most cases (Table I). This is mainly because the Hahn-Kuersteiner bias correction term is positive unless $\hat\alpha_{2,11} < -1$ (which is a rare case under stationarity) and it would offset the negative fixed-effect bias. The correction, however, could increase the bias when the correction size is too large compared to the fixed-effect bias. Third, the penalized likelihood based approach ($\tilde\alpha^{PL}_{2,11}$) reduces the bias from the theoretical parameter value well and robustly to the order misspecification (Table I), and it is comparable to the model-selection-embedded procedure ($\tilde\alpha_{\hat p,1}$). In addition to the robustness, the penalized likelihood based approach never increases the mean square error, which indicates that the bias correction barely increases the sample variance. However, $\tilde\alpha^{PL}_{2,11}$ as well as $\tilde\alpha^{AR(1)}_{2,11}$ does not work properly toward the true parameter $\alpha_{21}$ (Table II), which is to be expected since these two procedures are not designed to correct the pure misspecification bias (e.g., $b_1(\hat\alpha(p,q))$ in (6)). In comparison, the correction method involving model selection ($\tilde\alpha_{\hat p}$ and $\tilde\alpha_{\hat p,1}$) stands out in most of the cases (both for $\alpha_{2,11}$ and $\alpha_{21}$) and its performance improves as $T$ increases, mainly because the correct selection probability of the lag order selection procedure improves with $T$.

5 Concluding Remarks

This paper calls into question the simple ARX(1) structure in dynamic panels with fixed effects. When the lag orders are unknown, the first-order models are most likely misspecified. In such cases, attempts to adjust for the bias using formulae that correct for ARX(1) models would be wrong and may even exacerbate the bias. To address these concerns, we undertake an in-depth investigation of the asymptotic bias of the within-group estimator, where the bias formulae are derived under possible lag order misspecification. It should be noted, therefore, that the main focus of this paper is different from developing bias correction methods that are robust to serial correlation in the error (e.g., Hahn and Kuersteiner, 2004). It is closely related to Solon (1984), who considers autocorrelation estimators of the serially correlated error term, but deriving explicit bias formulae for the autoregressive parameters is new.
For dynamic panel regression, instrumental variables estimation after first differencing (e.g., Arellano and Bond, 1991) is an alternative approach, which does not require any bias correction. However, the instruments are taken from the lagged values of the dependent variable, presuming that the error term does not have serial correlation. With lag order misspecification, the regression error could exhibit serial correlation and thus the instruments could no longer be valid, which makes the estimator inconsistent.

This paper emphasizes lag order misspecification in the linear model. When linearity is in doubt, however, we could consider the nonparametric approach as in Lee (2010a), in which a proper bias correction is also developed for nonlinear models.

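Appendix: Mathematical proofs

For $V_{i,t} = \sum_{j=0}^{\infty} A^j U_{i,t-j}$, we first derive the following lemmas. See Lee (2009) for the proofs.

Lemma A1 Under Assumptions E and S,

$$\mathrm{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde V_{i,t-1}\tilde U_{i,t}' = -\frac{\sigma^2}{T}(I - A)^{-1}(I - H_T)M,$$

$$\mathrm{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde V_{i,t-1}\tilde V_{i,t-1}' = \Gamma_0 - \frac{1}{T}(I - A)^{-1}(I - H_T A)\Gamma_0 - \frac{1}{T}\left\{(I - A)^{-1}(I - H_T)A\Gamma_0\right\}',$$

where $M = e_1 e_1'$ and $H_T = (I - A)^{-1}(I - A^T)/T$.

Lemma A2 Under Assumptions E, S, I and NT,

$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathrm{vec}\left(\tilde V_{i,t-1}\tilde U_{i,t}'\right) \to_d N\!\left(-\sqrt{\kappa}\,\sigma^2\,\mathrm{vec}((I - A)^{-1}M),\; \sigma^2(M \otimes \Gamma_0)\right)$$

as $N, T \to \infty$ jointly, where $\Gamma_0 = E V_{i,t}V_{i,t}' = \sigma^2\sum_{j=0}^{\infty} A^j M A'^{j}$.

Proof of Theorem 1 Recall that $\tilde Y = \tilde Y_{-1}A' + \tilde X B' + \tilde U = \tilde Z + \tilde V$. Lemma A1 implies that

$$\mathrm{plim}_{N\to\infty}\hat\Gamma_0 = \mathrm{plim}_{N\to\infty}\frac{1}{NT}\tilde Z_{-1}' Q_X \tilde Z_{-1} + \mathrm{plim}_{N\to\infty}\frac{1}{NT}\tilde V_{-1}'\tilde V_{-1} = S_X + \Gamma_0 - G_0, \tag{A.1}$$

$$\mathrm{plim}_{N\to\infty}\hat\Gamma_1 = \mathrm{plim}_{N\to\infty}\frac{1}{NT}\tilde Y_{-1}' Q_X \tilde Y_{-1}A' + \mathrm{plim}_{N\to\infty}\frac{1}{NT}\tilde V_{-1}'\tilde U = S_X A' + \Gamma_1 - G_1 - G_2, \tag{A.2}$$

from the strict exogeneity of $X_{i,t}$, where

$$G_0 = \frac{1}{T}\left[(I - A)^{-1}(I - H_T A)\Gamma_0 + \left\{(I - A)^{-1}(I - H_T)A\Gamma_0\right\}'\right], \tag{A.3}$$

$$G_1 = \frac{1}{T}\left[(I - A)^{-1}(I - H_T A)\Gamma_1 + \left\{A(I - A)^{-1}(I - H_T)\Gamma_1'\right\}'\right], \tag{A.4}$$

$$G_2 = \frac{\sigma^2}{T}(I - A)^{-1}(I - H_T)M. \tag{A.5}$$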
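The closed forms in Lemma A1 can be sanity-checked in the scalar AR(1) case ($A = a$, $M = 1$), where both moments can also be written as direct double sums. The sketch below is an illustration only; it assumes the lemma reads $-\frac{\sigma^2}{T}(1-a)^{-1}(1-H_T)$ for the cross moment and $\gamma_0 - \frac{1}{T}(1-a)^{-1}(1-H_T a)\gamma_0 - \frac{1}{T}(1-a)^{-1}(1-H_T)a\gamma_0$ for the variance, with $H_T = (1-a)^{-1}(1-a^T)/T$. The function names are hypothetical.

```python
import numpy as np


def lemma_a1_scalar(a, T, sigma2=1.0):
    """Closed forms in Lemma A1 for a scalar AR(1) (A = a, M = 1)."""
    gamma0 = sigma2 / (1.0 - a ** 2)
    h_T = (1.0 - a ** T) / ((1.0 - a) * T)
    cross = -(sigma2 / T) * (1.0 - h_T) / (1.0 - a)
    var = (gamma0
           - (1.0 - h_T * a) * gamma0 / ((1.0 - a) * T)
           - (1.0 - h_T) * a * gamma0 / ((1.0 - a) * T))
    return cross, var


def brute_force_scalar(a, T, sigma2=1.0):
    """Same two moments from direct double sums over t and s."""
    gamma0 = sigma2 / (1.0 - a ** 2)
    t = np.arange(1, T + 1)
    # E[v_{t-1} u_s] = sigma2 * a**(t-1-s) for s <= t-1, zero otherwise.
    diff = t[:, None] - 1 - t[None, :]
    cov_vu = np.where(diff >= 0, sigma2 * a ** np.abs(diff), 0.0)
    cross = -cov_vu.sum() / T ** 2
    # E[v_{t-1} v_{s-1}] = gamma0 * a**|t-s|.
    cov_vv = gamma0 * a ** np.abs(t[:, None] - t[None, :])
    var = gamma0 - cov_vv.sum() / T ** 2
    return cross, var


for a in (0.5, -0.3, 0.9):
    for T in (5, 20):
        print(a, T, lemma_a1_scalar(a, T), brute_force_scalar(a, T))
```

The double sums follow from the within-group transformation: demeaning replaces the cross moment by minus the covariance of the two time averages, and the variance by $\gamma_0$ minus the variance of the time average.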

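Note that $\Gamma_1 = \Gamma_0 A'$. Then, from (6), the bias expression for $\hat\alpha(p,q)$ follows as

$$\begin{aligned}
\mathrm{plim}_{N\to\infty}\left(\hat\alpha(p,q) - \alpha(p,q)\right)
&= -\left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}J_q'(G_1 + G_2)e_1 \\
&\quad + \left[I_q - \left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}J_q' G_0 J_q\right]^{-1}\left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}J_q' G_0 J_q \\
&\qquad \times \left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}J_q'\left(S_X A' + \Gamma_1 - G_1 - G_2\right)e_1 \\
&= -(R_{q,1} + R_{q,2}) + (I_q - R_{q,0})^{-1}R_{q,0}\left\{\left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}J_q'(S_X + \Gamma_0)A'e_1 - R_{q,1} - R_{q,2}\right\} \\
&= -(I_q - R_{q,0})^{-1}R_{q,2} - (I_q - R_{q,0})^{-1}\left\{R_{q,1} - R_{q,0}\left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}J_q'(S_X + \Gamma_0)A'e_1\right\}
\end{aligned}$$

since $\left(J_q'(S_X + \Gamma_0)J_q - J_q' G_0 J_q\right)^{-1} = \left(I_q - (J_q'(S_X + \Gamma_0)J_q)^{-1}J_q' G_0 J_q\right)^{-1}\left(J_q'(S_X + \Gamma_0)J_q\right)^{-1}$ and by letting $R_{q,0} = (J_q'(S_X + \Gamma_0)J_q)^{-1}J_q' G_0 J_q$, $R_{q,1} = (J_q'(S_X + \Gamma_0)J_q)^{-1}J_q' G_1 e_1$ and $R_{q,2} = (J_q'(S_X + \Gamma_0)J_q)^{-1}J_q' G_2 e_1$.

Proof of Theorem 2 First note that

$$\sqrt{NT}\left(\hat\Gamma_1 - \bar\Gamma_1\right) = \sqrt{\frac{N}{T}}\;T\left(\hat\Gamma_0 - \bar\Gamma_0\right)A' + \frac{1}{\sqrt{NT}}\tilde Y_{-1}' Q_X \tilde U,$$

in which the first term satisfies $\sqrt{N/T}\;T(\hat\Gamma_0 - \bar\Gamma_0)A' \to_p -\sqrt{\kappa}\,\Omega A'$ as $N, T \to \infty$ from (A.1) and Lemma B2 in Lee (2009). For the second term, we have

$$\frac{1}{\sqrt{NT}}\tilde Y_{-1}' Q_X \tilde U = \frac{1}{\sqrt{NT}}\tilde Z_{-1}' Q_X \tilde U + \frac{1}{\sqrt{NT}}\tilde V_{-1}' Q_X \tilde U \equiv C^1_{N,T} + C^2_{N,T}.$$

It can be verified that $\mathrm{vec}(C^1_{N,T}) \to_d N\!\left(0,\; \sigma^2 M \otimes \mathrm{plim}_{N,T\to\infty}(\tilde Z_{-1}' Q_X \tilde Z_{-1}/NT)\right)$ as $N, T \to \infty$ provided $\mathrm{plim}_{N,T\to\infty}(\tilde Z_{-1}' Q_X \tilde Z_{-1}/NT)$ is nonzero and exists. However, since $\tilde X'\tilde X/NT = O_p(1)$, $\tilde X'\tilde U/\sqrt{NT} = O_p(1)$ and $\tilde V_{-1}'\tilde X/NT = o_p(1)$ from the strict exogeneity, we have

$$C^2_{N,T} = \frac{1}{\sqrt{NT}}\tilde V_{-1}'\tilde U - \left(\frac{\tilde V_{-1}'\tilde X}{NT}\right)\left(\frac{\tilde X'\tilde X}{NT}\right)^{-1}\frac{\tilde X'\tilde U}{\sqrt{NT}} = \frac{1}{\sqrt{NT}}\tilde V_{-1}'\tilde U + o_p(1).$$

Lemma A2 implies, therefore, that $\mathrm{vec}(C^2_{N,T}) \to_d N\!\left(-\sqrt{\kappa}\,\sigma^2\,\mathrm{vec}((I - A)^{-1}M),\; \sigma^2(M \otimes \Gamma_0)\right)$ as $N, T \to \infty$. In sum, since $C^1_{N,T}$ and $C^2_{N,T}$ are uncorrelated, we have

$$\sqrt{NT}\,\mathrm{vec}\left(\hat\Gamma_1 - \bar\Gamma_1\right) \to_d N\!\left(-\sqrt{\kappa}\,\mathrm{vec}\left(\sigma^2(I - A)^{-1}M + \Omega A'\right),\; \sigma^2 M \otimes \left(\Gamma_0 + \mathrm{plim}_{N,T\to\infty}(\tilde Z_{-1}' Q_X \tilde Z_{-1}/NT)\right)\right).$$

Further note that, as shown in Lee (2009), $\lim_{T\to\infty} S_X = \mathrm{plim}_{N,T\to\infty}\,\tilde Z_{-1}' Q_X \tilde Z_{-1}/NT$ (i.e., equivalence between the sequential and the joint probability limits; e.g., Phillips and Moon, 1999).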

Therefore, from (6), for a random vector $W \sim N\!\left(-\sqrt{\kappa}\left(\sigma^2(I - A)^{-1}M + \Omega A'\right)e_1,\; \sigma^2(\Gamma_0 + \lim_{T\to\infty} S_X)\right)$, we can conclude that (using (A.1), (A.2) and Lemma B2 in Lee (2009))

$$\begin{aligned}
\sqrt{NT}\left(\hat\alpha(p,q) - \alpha(p,q)\right)
&= (J_q'\bar\Gamma_0 J_q)^{-1}J_q'\left\{\sqrt{NT}\left(\hat\Gamma_1 - \bar\Gamma_1\right)e_1\right\} \\
&\quad - (J_q'\hat\Gamma_0 J_q)^{-1}\left[J_q'\sqrt{N/T}\;T\left(\hat\Gamma_0 - \bar\Gamma_0\right)J_q\right](J_q'\bar\Gamma_0 J_q)^{-1}J_q'\hat\Gamma_1 e_1 \\
&\to_d \left(J_q'(\Gamma_0 + \lim_{T\to\infty} S_X)J_q\right)^{-1}J_q' W \\
&\quad + \left(J_q'(\Gamma_0 + \lim_{T\to\infty} S_X)J_q\right)^{-1}\left[\sqrt{\kappa}\,J_q'\Omega J_q\right]\left(J_q'(\Gamma_0 + \lim_{T\to\infty} S_X)J_q\right)^{-1}J_q'\left(\Gamma_1 + \lim_{T\to\infty} S_X A'\right)e_1 \\
&=_d N\!\left(\sqrt{\kappa}\,\zeta_q^X,\; \sigma^2 D_q^X\right)
\end{aligned}$$

as $N, T \to \infty$, where $D_q^X = (J_q'(\Gamma_0 + \lim_{T\to\infty} S_X)J_q)^{-1}$ and $\zeta_q^X = -D_q^X J_q'\left(\sigma^2(I - A)^{-1}M + \Omega A'\right)e_1 + D_q^X (J_q'\Omega J_q) D_q^X J_q'\left(\Gamma_1 + \lim_{T\to\infty} S_X A'\right)e_1$.

References

[1] Alvarez, J. and M. Arellano (2003). The time series and cross-section asymptotics of dynamic panel data estimators, Econometrica, 71, 1121-1159.
[2] Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, Review of Economic Studies, 58, 277-297.
[3] Arellano, M. and J. Hahn (2006). A likelihood-based approximate solution to the incidental parameter problem in dynamic nonlinear models with multiple effects, CEMFI Working Paper No. 0613.
[4] Bester, C.A. and C. Hansen (2007). A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects, Journal of Business and Economic Statistics, forthcoming.
[5] Bhansali, R.J. (1978). Linear prediction by autoregressive model fitting in the time domain, Annals of Statistics, 6, 224-231.
[6] Bhansali, R.J. (1981). Effects of not knowing the order of an autoregressive process on the mean squared error of prediction I, Journal of the American Statistical Association, 76, 588-597.
[7] Bun, M.J.G. and M.A. Carree (2005). Bias-corrected Estimation in Dynamic Panel Data Models, Journal of Business and Economic Statistics, 23, 200-210.
[8] Hahn, J. and G. Kuersteiner (2002). Asymptotically unbiased inference for a dynamic panel model with fixed effects, Econometrica, 70, 1639-1657.

[9] Hahn, J. and G. Kuersteiner (2004). Bias reduction for dynamic nonlinear panel models with fixed effects, unpublished manuscript, UCLA.
[10] Kiviet, J.F. (1995). On bias, inconsistency, and efficiency of various estimators in dynamic panel models, Journal of Econometrics, 68, 53-78.
[11] Kunitomo, N. and T. Yamamoto (1985). Properties of predictors in misspecified autoregressive time series models, Journal of the American Statistical Association, 80, 941-950.
[12] Lee, Y. (2006). Nonparametric Approaches to Dynamic Panel Modelling and Bias Correction, Ph.D. dissertation, Yale University.
[13] Lee, Y. (2009). Supplementary Appendix to "Bias in Dynamic Panel Models under Time Series Misspecification," available at www-personal.umich.edu/~yoolee/research.html.
[14] Lee, Y. (2010a). Nonparametric Estimation of dynamic panel models with fixed effects, unpublished manuscript, University of Michigan.
[15] Lee, Y. (2010b). Model selection in the presence of incidental parameters, unpublished manuscript, University of Michigan.
[16] Marmol, F. (1995). The stationarity conditions for an AR(2) process and Schur's theorem, Econometric Theory, 11, 1180-1182.
[17] Nerlove, M. (1967). Experimental evidence on the estimation of dynamic economic relations from a time series of cross-sections, Economic Studies Quarterly, 18, 42-74.
[18] Neyman, J. and E. Scott (1948). Consistent estimates based on partially consistent observations, Econometrica, 16, 1-32.
[19] Nickell, S. (1981). Biases in dynamic models with fixed effects, Econometrica, 49, 1417-1426.
[20] Okui, R. (2008). Panel AR(1) estimators under misspecification, Economics Letters, 101, 210-213.
[21] Phillips, P.C.B. and H.R. Moon (1999). Linear Regression Limit Theory for Nonstationary Panel Data, Econometrica, 67, 1057-1111.
[22] Phillips, P.C.B. and D. Sul (2007). Bias in Dynamic Panel Estimation with Fixed Effects, Incidental Trends and Cross Section Dependence, Journal of Econometrics, 137, 162-188.
[23] Solon, G. (1984). Estimating autocorrelations in fixed-effects models, NBER Technical Working Paper No. 32.