Robust Standard Errors in Transformed Likelihood Estimation of Dynamic Panel Data Models with Cross-Sectional Heteroskedasticity

Similar documents
Robust Standard Errors in Transformed Likelihood Estimation of Dynamic Panel Data Models with Cross-Sectional Heteroskedasticity

Robust Standard Errors in Transformed Likelihood Estimation of Dynamic Panel Data Models

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM based inference for panel data models

Improving GMM efficiency in dynamic models for panel data with mean stationarity

Short T Dynamic Panel Data Models with Individual and Interactive Time E ects

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models

GMM estimation of spatial panels

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

Chapter 2. Dynamic panel data models

Strength and weakness of instruments in IV and GMM estimation of dynamic panel data models

Econometrics of Panel Data

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Testing for Regime Switching: A Comment

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

A Course on Advanced Econometrics

Linear dynamic panel data models

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

1 Estimation of Persistent Dynamic Panel Data. Motivation

Likelihood Ratio Based Test for the Exogeneity and the Relevance of Instrumental Variables

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Bias Correction Methods for Dynamic Panel Data Models with Fixed Effects

1 Introduction The time series properties of economic series (orders of integration and cointegration) are often of considerable interest. In micro pa

Notes on Generalized Method of Moments Estimation

Transformed Maximum Likelihood Estimation of Short Dynamic Panel Data Models with Interactive Effects

Judging contending estimators by simulation: tournaments in dynamic panel data models

Panel VAR Models with Spatial Dependence

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

Iterative Bias Correction Procedures Revisited: A Small Scale Monte Carlo Study

Estimating the Number of Common Factors in Serially Dependent Approximate Factor Models

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation

Dynamic Panels. Chapter Introduction Autoregressive Model

Chapter 1. GMM: Basic Concepts

On the optimal weighting matrix for the GMM system estimator in dynamic panel data models

A Bias-Corrected Method of Moments Approach to Estimation of Dynamic Short-T Panels *

Asymptotic Properties of Empirical Likelihood Estimator in Dynamic Panel Data Models

Notes on Time Series Modeling

Panel VAR Models with Spatial Dependence

The Time Series and Cross-Section Asymptotics of Empirical Likelihood Estimator in Dynamic Panel Data Models

2. Multivariate ARMA

Some Recent Developments in Spatial Panel Data Models

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels.

Testing for a Trend with Persistent Errors

13 Endogeneity and Nonparametric IV

Consistent OLS Estimation of AR(1) Dynamic Panel Data Models with Short Time Series

Econometric Analysis of Cross Section and Panel Data

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

1. The Multivariate Classical Linear Regression Model

What s New in Econometrics. Lecture 15

The Hausman Test and Weak Instruments y

Estimation of Dynamic Panel Data Models with Sample Selection

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Subset-Continuous-Updating GMM Estimators for Dynamic Panel Data Models

Large Panel Data Models with Cross-Sectional Dependence: A Surevey

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

1. GENERAL DESCRIPTION

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012

Pseudo panels and repeated cross-sections

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

Simple Estimators for Semiparametric Multinomial Choice Models

Parametric Inference on Strong Dependence

We begin by thinking about population relationships.

Introduction: structural econometrics. Jean-Marc Robin

Spatial Dynamic Panel Model and System GMM: A Monte Carlo Investigation

Spatial panels: random components vs. xed e ects

GMM Estimation with Noncausal Instruments

11. Bootstrap Methods

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

New GMM Estimators for Dynamic Panel Data Models

Asymptotic distributions of the quadratic GMM estimator in linear dynamic panel data models

GMM ESTIMATION OF SHORT DYNAMIC PANEL DATA MODELS WITH INTERACTIVE FIXED EFFECTS

Contents. University of York Department of Economics PhD Course 2006 VAR ANALYSIS IN MACROECONOMICS. Lecturer: Professor Mike Wickens.

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Comment on HAC Corrections for Strongly Autocorrelated Time Series by Ulrich K. Müller

The Case Against JIVE

GMM Estimation of Empirical Growth Models

Estimation and Inference of Linear Trend Slope Ratios

Averaging and the Optimal Combination of Forecasts

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

Economics 241B Estimation with Instruments

Simple Estimators for Monotone Index Models

Chapter 2. GMM: Estimating Rational Expectations Models

Estimation of a Local-Aggregate Network Model with. Sampled Networks

Comparing Nested Predictive Regression Models with Persistent Predictors

ECONOMETRICS II (ECO 2401) Victor Aguirregabiria (March 2017) TOPIC 1: LINEAR PANEL DATA MODELS

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Specification Test for Instrumental Variables Regression with Many Instruments

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Cointegration Tests Using Instrumental Variables Estimation and the Demand for Money in England

When is a copula constant? A test for changing relationships

Speci cation of Conditional Expectation Functions

GMM Estimation of SAR Models with. Endogenous Regressors

Bootstrapping the Grainger Causality Test With Integrated Data

Lecture 4: Linear panel models

Transcription:

Robust Standard Errors in ransformed Likelihood Estimation of Dynamic Panel Data Models with Cross-Sectional Heteroskedasticity Kazuhiko Hayakawa Hiroshima University M. Hashem Pesaran University of Southern California, and rinity College, Cambridge January 27, 24 Abstract his paper extends the transformed maximum likelihood approach for estimation of dynamic panel data models by Hsiao, Pesaran, and ahmiscioglu (22) to the case where the errors are cross-sectionally heteroskedastic. his extension is not trivial due to the incidental parameters problem and its implications for estimation and inference. We approach the problem by working with a mis-speci ed homoskedastic model, and then show that the transformed maximum likelihood estimator continues to be consistent even in the presence of cross-sectional heteroskedasticity. We also obtain standard errors that are robust to cross-sectional heteroskedasticity of unknown form. By means of Monte Carlo simulations, we investigate the nite sample behavior of the transformed maximum likelihood estimator and compare it with various GMM estimators proposed in the literature. Simulation results reveal that, in terms of median absolute errors and accuracy of inference, the transformed likelihood estimator outperforms the GMM estimators in almost all cases. Keywords: Dynamic Panels, Cross-sectional heteroskedasticity, Monte Carlo simulation, ransformed MLE, GMM estimation JEL Codes: C2, C3, C23 We are grateful to three referees and the participants at the 8th International Conference on Panel Data, and seminars at Osaka, Sogang and anyang echnological University for helpful comments. his paper was written whilst Hayakawa was visiting the University of Cambridge as a JSPS Postdoctoral Fellow for Research Abroad. He acknowledges the nancial support from the JSPS Fellowship and the Grant-in-Aid for Scienti c Research (KAKEHI 227378) provided by the JSPS. Pesaran acknowledges nancial support from the ESRC Grant o. ES/3626/. Elisa osetti contributed to a preliminary version of this paper. Her assistance in coding of the transformed ML estimator and some of the derivations is gratefully acknowledged.

Introduction In dynamic panel data models where the time dimension ( ) is short, the presence of lagged dependent variables among the regressors makes standard panel estimators inconsistent, and complicates statistical inference on the model parameters considerably. o deal with these di culties a sizable literature has emerged, starting with the seminal papers of Anderson and Hsiao (98, 982) who proposed the application of the instrumental variable (IV) approach to the rst-di erenced form of the model. More recently, a large number of studies have been focusing on the generalized method of moments (GMM), see, among others, Holtz-Eakin, ewey, and Rosen (988), Arellano and Bond (99), Arellano and Bover (995), Ahn and Schmidt (995) and Blundell and Bond (998). One important reason for the popularity of GMM in applied economic research is that it provides asymptotically valid inference under a minimal set of statistical assumptions. Arellano and Bond (99) proposed GMM estimators based on moment conditions where lagged variables in levels are used as instruments. Blundell and Bond (998) showed that the performance of this estimator deteriorates when the parameter associated with the lagged dependent variable is close to unity and/or the variance ratio of the individual e ects to the idiosyncratic errors is large, since in such cases the instruments are only weakly related to the lagged dependent variables. he poor nite sample properties of GMM estimators has been documented using Monte Carlo studies by Kiviet (27), for example. o deal with the weak instrument problem, Arellano and Bover (995) and Blundell and Bond (998) proposed the use of extra moment conditions arising from the model in levels, which become available when the initial observations satisfy certain conditions. he resulting GMM estimator, known as system GMM, combines moment conditions for the model in rst di erences with moment conditions for the model in levels. We refer to Blundell, Bond, and Windmeijer (2) for an extension to the multivariate case, and for a Monte Carlo study of the properties of GMM estimators using moment conditions from either the rst di erenced and/or levels models. Bun and Windmeijer (2) proved that the equation in levels su ers from a weak instrument problem when the variance ratio is large. Hayakawa (27) also shows that the nite sample bias of the system GMM estimator becomes large when the variance ratio is relatively large. he GMM estimators have been used in a large number of empirical studies to investigate problems in areas such as labour, development, health, macroeconomics and nance. heoretical and applied research on dynamic panels has mostly focused on the GMM, and has by and large neglected the maximum likelihood (ML) approach though there are several theoretical advances such as Hsiao, Pesaran, and ahmiscioglu (22), Binder, Hsiao, and Pesaran (25), Alvarez and Arellano (24) and Kruiniger (28). Hsiao, Pesaran, and ahmiscioglu (22) propose the transformed likelihood approach while Binder, Hsiao, and Pesaran (25) have extended the approach to estimating panel VAR (PVAR) models. Alvarez and Arellano (24) have studied ML estimation of autoregressive panels in the presence of time-speci c heteroskedasticity (see also Bhargava and Sargan (983)). Kruiniger See also the discussion in Binder, Hsiao, and Pesaran (25), who proved that the asymptotic variance of the Arellano and Bond (99) GMM estimator depends on the variance of the individual e ects. 2

(28) considers ML estimation of a stationary/unit root AR() panel data models. here are several reasons why the GMM approach is preferred compared to the ML approach. First, the regularity conditions required to prove consistency and asymptotic normality of the GMM type estimators are relatively mild and allow for the presence of cross-sectional heteroskedasiticity of the errors. In particular, see Arellano and Bond (99), Arellano and Bover (995) and Blundell and Bond (998). Second, for the ML approach, the incidental parameters problem and the initial values problem lead to a violation of the standard regularity conditions, which causes inconsistency. Although Hsiao, Pesaran, and ahmiscioglu (22) developed a transformed likelihood approach to overcome some of the weaknesses of the GMM approach (particularly the weak IV problem), their analysis still requires the idiosyncratic errors to be homoskedastic, which is likely to be restrictive in many empirical applications. 2 It is therefore highly desirable to extend the transformed ML approach of Hsiao, Pesaran and ahmiscioglu (HP) so that heteroskedastic errors can be permitted. 3. his is accomplished in this paper. he extension is not trivial due to the incidental parameters problem that arises, in particular its implications for inference. We follow the time series literature, and initially ignore the error variance heterogeneity and work with a misspeci ed homoskedastic model, but show that the transformed maximum likelihood estimator by Hsiao, Pesaran, and ahmiscioglu (22) continues to be consistent. We then derive, under fairly general conditions, a covariance matrix estimator for the quasi-ml (QML) estimator which is robust to cross-sectional heteroskedasticity. Using Monte Carlo simulations, we investigate the nite sample performance of the transformed QML estimator and compare it with a range of GMM estimators. Simulation results reveal that, in terms of median absolute errors and accuracy of inference, the transformed likelihood estimator outperforms the GMM estimators in almost all cases when the model contains an exogenous regressor, and in many cases if we consider pure autoregressive panels. he rest of the paper is organized as follows. Section 2 describes the model and its underlying assumptions. Section 3 proposes the transformed QML estimator for cross-sectionally heteroskedastic errors. Section 4 provides an overview of the GMM estimators used in the simulation exercise. Section 5 describes the Monte Carlo design and comments on the small sample properties of the transformed likelihood and GMM estimators. Finally, Section 6 ends with some concluding remarks. 2 he dynamic panel data model Consider the following dynamic panel data model y it = i + y i;t + x it + u it ; i = ; 2; :::; ; () 2 In the application of the GMM approach to dynamic panels, it is generally di cult to avoid the so-called many/weak instruments problem, which is shown to result in biased estimates and substantially distorted test outcomes. See Section 5 for further evidence. 3 ote, however, that since the transformed ML approach does not impose any restrictions on the individual e ects, the errors of the original panel (before di erencing) can have any arbitrary degree of cross-sectional heteroskedasticity. 3

where i, i = ; 2; :::; are the unobserved individual e ects, u it is an idiosyncratic error term, x it is observed regressor assumed to vary over time (t) and across the individuals (i). It is further assumed that x it is a scaler variable to simplify the notations. 4 he coe cients of interest are and, which are assumed to be xed nite constants. o restrictions are placed on the individual e ects, i. hey can be heteroskedastic, correlated with x jt and u jt, for all i and j, and can be cross-sectionally dependent. In contrast, the idiosyncratic errors, u it, are assumed to be uncorrelated with x it for all i, t and t. However, we allow the variance of u it to vary across i, and let the variance ratio, 2 = i= V ar ( i) = i= V ar (u it) to take any positive value. We shall investigate the robustness of the QML and GMM estimators to choices of and. Follwoing the literature we take rst di erences of () to eliminate the individual e ects 5 y it = y i;t + x it + u it ; (2) and make the following assumptions: Assumption (Starting period) he dynamic processes () have started at time t = m, (m being a positive constant) but only the time series data, y it ; x it ; i = ; 2; :::; ; t = ; ; :::; ; are observed. Assumption 2 (Exogenous variable) It is assumed that x it is generated either by or X x it = i + t + a j " i;t j ; j= X x it = + d j " i;t j ; j= X ja j j < (3) j= X jd j j < (4) j= where i can either be xed or random. " it are independently distributed over i and t with E(" it ) =, and var(" it ) = 2 "i with < 2 "i < K <, and independent of u is for all s and t. Assumption 3 (Initialization) Depending on whether the y it process has reached stationarity, one of the following two assumptions holds: (i) j j< ; and the process has been going on for a long time, namely m! ; (ii) he process has started from a nite period in the past not too far back from the th period, namely for given values of y i; m with m nite, such that E(y i; m+ jx i ; x i2 ; :::; x i ) = b m + mx i ; for all i; where b m is a nite constant, m is a -dimensional vector of constants, and x i = (x i ; x i2 ; :::; x i ). 4 Extension to the case of multiple regressors is straightforward at the expense of notational complexity. 5 As shown in Appendix A of Hsiao, Pesaran, and ahmiscioglu (22), other transformations can be used to eliminate the individual e ects and the QML estimator proposed in this paper is invariant to the choice of such transformations. 4

Assumption 4 (idiosyncratic shocks) Disturbances u it are serially and cross-sectionally independently distributed, with E (u it ) = ; and E u 2 it = 2 i such that < 2 i < K <, for i = ; 2; :::; and t = ; 2; :::;. Remark Assumption 3.(ii) constrains the expected changes in the initial values to be the same across all individuals, but does not necessarily require that j j<. ote that this does not require the initial values, y i; m, to have the same mean across i, and allows y i; m to vary both with i and i. It is only required that y i; m+ y i; m is free of the incidental paramter problem. Remark 2 Assumptions 2, and 4 allow for heteroskedastic disturbances in the equations for y it and x it : Also Assumption 2 requires x it to be strictly exogenous. hese restrictions can be relaxed by considering a panel vector autoregressive speci cation of the type considered in Binder, Hsiao, and Pesaran (25). However, these further developments are beyond the scope of the present paper. See also the remarks in Section 6. 3 ransformed likelihood estimation he rst-di erenced model (2) is well de ned for t = 2; 3; :::;, and can be used to derive the joint distribution of (y i2 ; y i3 ; :::; y i ) conditional on y i. o obtain the (unconditional) distribution of y i, starting from y i; m+, and by continuous substitution, we note that y i = m y i; mx mx m+ + j x i; j + j u i; j : (5) j= j= ote that the mean of y i conditional on y i; m+ ; x i ; x i ; :::, is given by mx i = E (y i jy i; m+ ; x i ; x i ; :::) = m y i; m+ + j x i; j ; (6) which depends on the unknown values y i; m+, and x i; j ; for j = ; 2; :::; m, i = ; 2; :::;. o solve this problem, we need to express the expected value of i, conditional on the observables, in a way that it only depends on a nite number of parameters. he following theorem provides the conditions under which it is possible to derive a marginal model for y i, which is a function of a nite number of unknown parameters. heorem Consider model (2), where x it follows either (3) or (4). Suppose that Assumptions, 2, 3, and 4 hold. hen y i can be expressed as: y i = b + x i + v i ; (7) j= where b is a constant, is a -dimensional vector of constants, x i = (x i ; x i2 ; :::; x i ), and v i 5

is independently distributed across i, such that E(v i ) = ; and E(v 2 i ) =! i 2 i ; with <! i < K <, for all i. It is now possible to derive the likelihood function of the transformed model given by equations (2) for t = 2; 3; :::; and (7). Let y i = (y i ; y i2 ; :::; y i ), x i y W i = i x i2 B C ( +3) @.... A ; (8) y i; x i and note that the transformed model can be rewritten as y i = W i ' + r i ; (9) with ' = (b; ; ; ). he covariance matrix of r i = (v i ; u i2 ; :::; u i ) has the form: E(r i r i) = 2 i B @! i 2......... 2 2 where! i > is a free parameter de ned in heorem. he log-likelihood function of the transformed model (9) is given by = 2 i (! i ) ; () C A ` ( ) = 2 ln (2) 2 2 2 i= i ln 2 i i= 2 ln [ + (! i )] i= (y i W i ') (! i ) (y i W i ') ; () where = ' ;! ;! 2 :::;! ; 2. ; 2 2 ; :::2 Unfortunately, the maximum likelihood estimation based on `( ) encounters the incidental parameters problem of eyman and Scott (948) since the number of parameters grows with the sample size,. Following the mis-speci cation literature in econometrics, (White, 982; Kent, 982), we examine the asymptotic properties of the ML estimators of the parameters of interest, ', using a mis-speci ed model where the heteroskedastic nature of the errors is ignored. Accordingly, suppose that it is incorrectly assumed that the regression errors u it are homoskedastic, i.e., 2 i = 2 and! i =!, for i = ; 2; :::;. hen under this mis-speci cation, the pseudo log- 6

likelihood function of the transformed model (9), is given by `p () = 2 ln (2) 2 ln 2 ln [ + (! )] 2 2 2 (y i W i ') (!) (y i W i ') ; (2) i= where = ' ;!; 2 is the vector of unknown parameters. Let b be the estimator obtained by maximizing the pseudo log-likelihood function in (2). o characterize the relationship between the true parameter values = ' ;! ; :::;! ; 2 ; :::2 and the pseudo true values = ' ;! ; 2 that maximizes the pseudo log-likelihood function (2), we introduce the following average parameter measures. Assumption 5 he average true parameter values 2 ; = X 2 i; i= and! ; = P i=! i 2 i P i= 2 i ; (3) have nite limits (as! ) given by value. 2 = lim! 2 ; ; and! = lim! P i=! i 2 i lim! P : (4) i= 2 i he following theorem establishes the relationship between the true value and the pseudo true heorem 2 Suppose that Assumptions, 2, 3, 4, and 5 hold, and let = ' ;! ; 2 be the solution of the rst-order condition E [@`p ( ) =@] = of the pseudo log-likelihood function, (2), where expectations are taken with respect to the true probability measures. hen, = ' ;! ; 2 : his theorem summarizes one of the key results of the paper, and holds under fairly general conditions. Assumptions, 2, 3 are identical to those used in Hsiao et. al. (22). Assumption 4 allows the error variances of the error terms to be heteroskedastic in an arbitrary manner. Assumption 5 only requires the indiviudal error variances and their ratios to be nite. he possible non-uniqeness of the pseudo true values in the case of heterogenous! i, is analogous to the non-uniqueness of the ML estimators encountered in the case of the random e ect models as demonstarted initially by Maddala (97) and further discussed by Breusch (987) who proposes a practical approach to detecting the presence of local maxima. 6 heoretically, it is quite complicated to demonsrate which solution leads 6 his result follows since, as established by Grassetti (2), the transformed likelihood function can be written equivalently in the form of a random e ects model with endogenous regressors. For further details see Section A.3. 7

to the global maximum. However, the Monte Carlo simulation results in Section 5 suggest that the solution = ' ;! ; 2 is assocaited with the global maximum. In what follows we assume that the global maximum of the probability limit of the pseudo log-likelihood function is atatined at = ' ;! ; 2 =. he following theorem establishes the asymptotic distribution of the ML estimator of the transformed model. heorem 3 Suppose that Assumptions, 2, 3, 4, and 5 hold and let b = b' ; b!; b 2 be the QML estimator obtained by maximizing the pseudo log-likelihood function in (2). hen as tends to in nity, b is asymptotically normal with p b d! ; A B A (5) where = ' ;! ; 2, A = lim E! @ 2`p ( ) @@ ; and B = lim E! @`p ( ) @ @`p ( ) @ : (6) A consistent estimator of A, denoted by b A which is robust to unknown heteroskedasticity, is given by ba = B @ P b 2 i= W i (b!) W i g(b!) 2 b 2 P i= br i W i g(b!) 2 b 2 P i= W i br i 2 2g(b!) 2 2g(b!)b 2 2g(b!)b 2 2b 4 C A ; (7) where br i = y i W i b'; g(^!) = + (^! ), is de ned by (47) in the Appendix, and b 2 = P i= br i (b!) br i. Partitioning B, and its estimator b B, accordingly to the above partioned form of b A, we have bb = b 4 Wi(b!) br i br i(b!) W i ; B b 22 = i= ( bb 33 = 2 br i (b!) 2 ) br i 4b 8 b 4 ; B b 2 = i= bb 32 = bb 3 = 2b 6 " 2 4g(b!) 2 b 6 i= ( 2 4g(b!) 4 b 4 2g(b!) 2 b 4 i= ) br i br 2 i g(b!) 2 b 4 ; i= (8) br i(b!) W i br i br i ; (9) br i(b!) W i br i (b!) br i ; (2) br i br i br i (b!) br i i= g(b!)b 4 # : (2) 8

See also lemmas A5 and A6 and section A.4. 4 GMM Estimators: an overview In this section, we review, and for completeness, de ne the GMM type estimators which are included in our simulation exercise. he GMM approach assumes that i and u it have an error components structure, E ( i ) = ; E (u it ) = ; E( i u it ) = ; (i = ; ::; ; t = ; 2; :::; ); (22) and the errors are uncorrelated with the initial values E (y i u it ) = ; (i = ; ::; ; t = ; 2; :::; ): (23) As with the transformed likelihood approach, it is also assumed that the errors, u it, are serially and cross-sectionally independent: E (u it u is ) = ; (i = ; ::; ; t = ; 2; :::; ): (24) However, note that under the transformed QML no restrictions are placed on E( i u it ), and E( i u it ) are allowed to be non-zero and heterogenous across i. 4. Estimation 4.. he rst-di erence GMM estimator Under (22)-(24), and focusing on the equation in rst di erences, (2), Arellano and Bond (99) suggest the following ( )=2 moment conditions: E (y is u it ) = ; (s = ; ; :::; t 2; t = 2; 3; :::; ): (25) If regressors, x it, are strictly exogenous, i.e., if E (x is u it ) =, for all t and s, then the following additional moments can also be used E (x is u it ) = ; (s; t = 2; :::; ): (26) he moment conditions (25) and (26) can be written compactly as: h i E _Z i _u i = ; 9

where _u i = _q i _Z i = _W i, = (; ) = ( ; 2 ) and y i ; x i ; :::; x i ::: y i ; y i ; x i ; :::; x i ::: B. @.... C A ; _q i = B @ y i2. y i ::: y i ; :::; y i; 2 ; x i ; :::; x i C A ; _W i = B @ y i. x i2 y i; x i. C A : he one and two-step rst-di erence GMM estimators based on the above moment conditions are given by b dif GMM = b dif GMM2 = _S _ ZW D _SZW step _S _ ZW D _SZq step ; (27) _S _ ZW D _SZW 2step _S _ ZW D _SZq 2step ; (28) where S _ ZW = P _ i= Z _ i W i, S _ Zq = P _ i= Z i _q i, D _ step = P _ i= Z i H Z _ i, D _ 2step = P _ i= Z b i _u i b_u _ i Z i, dif b_u i = _q i _W i b GMM, and H is a matrix with 2 s on the main diagonal, - s on the rst upper and lower sub-diagonals and s otherwise. 4..2 System GMM estimator Although consistency of the rst-di erence GMM estimator is obtained under the no serial correlation assumption, Blundell and Bond (998) demonstrated that it su ers from the so called weak instruments problem when is close to unity, and/or the variance ratio 2 = i= var( i)= i= var(u it) is large. As a solution, these authors propose the system GMM estimator due to Arellano and Bover (995) and show that it works well even if is close to unity. But as shown recently by Bun and Windmeijer (2), the system GMM estimator continues to su er from the weak instruments problem when the variance ratio, 2 is large. See also Appendix of Binder, Hsiao, and Pesaran (25) where it is shown that the asymptotic variance of the GMM estimator is an increasing function of 2. o introduce the moment conditions for the system GMM estimator, the following additional homogeneity assumptions are required: E(y is i ) = E(y it i ); for all s and t; E(x is i ) = E(x it i ); for all s and t:

Under these assumptions, we have the following moment conditions: E [y is ( i + u it )] = ; (s = ; :::; t ; t = 2; 3; :::; ); (29) E [x is ( i + u it )] = ; (s; t = 2; 3; :::; ): (3) For the construction of the moment conditions for the system GMM estimator, given the moment conditions for the rst-di erence GMM estimator, some moment conditions in (29) and (3) are redundant. Hence, to implement the system GMM estimation, in addition to (25) and (26), we use the following moment conditions: E [y i;t ( i + u it )] = ; (t = 2; 3; :::; ); (3) E [x it ( i + u it )] = ; (t = 2; 3; :::; ): (32) he moment conditions (25), (26), (3) and (32) can be written compactly as h i E Z i u i = ; where u i = q i Wi, Z i = diag _Zi ; Z i ; Zi = B @ y i ; x i2 ::: y i2 ; x i3..... C A ; q i = _q i q i! ; q i = B @ y i2. y i ::: y i; ; x i! y i x i2 C A ; Wi _W i = ; Wi = B W @.. i y i; x i C A : he one and two-step system GMM estimators based on the above conditions are given by b sys GMM = b sys GMM2 = S Dstep SZW S Dstep SZq ZW ZW ; (33) S D2step SZW S D2step SZq ZW ZW ; (34) where S ZW = P i= Z i W i, S Zq = P i= Z i q i and D step = diag P i= _ Z i H _ Z i ; P i= Z i Z i. he two-step system GMM estimator is obtained by replacing D step with D 2step = P i= Z b iu i bu iz i ; where b sys u i = q i Wi b GMM. Although additional moment conditions proposed by Ahn and Schmidt (995) could be used, we mainly focus on the above two set of moment conditions since they are often used in applied research.

4..3 Continuous-updating GMM estimator Since the two-step GMM estimators have undesirable nite sample bias property, (ewey and Smith, 24), alternative estimation methods have been proposed in the literature. hese include the empirical likelihood estimator, (Qin and Lawless, 994), the exponential tilting estimator (Kitamura and Stutzer, 997; Imbens, Spady, and Johnson, 998) and the continuous updating (CU-) GMM estimator (Hansen, Heaton, and Yaron, 996), where these are members of the generalized empirical likelihood estimator (ewey and Smith, 24). Amongst these estimators, we focus on the CU-GMM estimator as an alternative to the two-step GMM estimator. o de ne the CU-GMM estimator, we need some additional notation. Let Z i denote Z _ i or Z i, and u i denote _u i or u i, and set g i () = Z i u i; bg () = g i (); b () = [g i () bg()] [g i () bg()] : i= i= hen, the CU-GMM estimator is de ned as b GMM CU = arg min bg () b () bg (): (35) ewey and Smith (24) demonstrate that the CU-GMM estimator has a smaller nite sample bias than the two-step GMM estimator. 4.2 Inference using GMM estimators 4.2. Alternative standard errors In the case of GMM estimators the choice of the covariance matrix is often as important as the choice of the estimator itself for inference. Although, it is clearly important that the estimator of the covariance matrix should be consistent, in practice it might not have favorable nite sample properties and could result in inaccurate inference. o address this problem, some modi ed standard errors have been proposed. For the two-step GMM estimators, Windmeijer (25) proposes corrected standard errors for linear static panel data models which are applied to dynamic panel models by Bond and Windmeijer (25). For the CU-GMM, while it is asymptotically equivalent to the two-step GMM estimator, it is more dispersed than the two-step GMM estimator in nite samples and inference based on conventional standard errors formula results in a large size distortions. o overcome this problem, ewey and Windmeijer (29) propose an alternative estimator for the covariance matrix of CU-GMM estimator under many-weak moments asymptotics and demonstrate by simulation that the use of the modi ed standard errors improve the size property of the tests based on the CU-GMM estimators. 7 7 For the precise de nition of many weak moments asymptotics, see ewey and Windmeijer (29). 2

4.2.2 Weak instruments robust inference As noted above, the rst-di erence and system GMM estimators could be subject to the weak instruments problem. In the presence of weak instruments, the estimators are biased and inference becomes inaccurate. As a remedy for this, some tests that have correct size regardless of the strength of instruments have been proposed in the literature. hese include Stock and Wright (2) and Kleibergen (25). Stock and Wright (2) propose a GMM version of the Anderson and Rubin(AR) test (Anderson and Rubin, 949). Kleibergen (25) proposes a Lagrange Multiplier (LM) test. his author also extends the conditional likelihood ratio (CLR) test of Moreira (23) to the GMM case since the CLR test performs better than other tests in linear homoskedastic regression models. We now introduce these tests. Wright (2) is de ned as he GMM version of the AR statistic proposed by Stock and AR() = 2 Q(); (36) where Q () = bg() b () bg()=2: Under the null hypothesis H : =, this statistic is asymptotically (as! ) distributed as 2 n; regardless of the strength of the instruments, where n is the dimension of g i (). he LM statistic proposed by Kleibergen (25) is LM() = @Q() @ where D() b = bd (); d b 2 () with h bd() b () b D() i @Q() @ ; (37) bd j () = i= @g i () @ j i=! @g i () g i () b() bg(); for j = ; 2: @ j Under the null hypothesis H : =, this statistic follows 2 2, asymptotically. he GMM version of the CLR statistic proposed by Kleibergen (25) is given by CLR() = r # 2 "AR() R() b + AR() R() b + 4LM() R() b 2 (38) where R() b is a statistic which is large when instruments are strong and small when the instruments are weak, and is random only through D() b asymptotically. In the simulation, following ewey and Windmeijer (29), we use R() b = min bd() () b D() b where min (A) denotes the smallest eigenvalue of A. Under the null hypothesis H : =, this statistic asymptotically follows a nonstandard distribution whose critical values can be obtained by simulation 8. hese tests are derived under the standard asymptotics where the number of moment conditions is xed. Recently, ewey and Windmeijer (29) show that these results are valid even under many 8 For the details of computation, see Kleibergen (25) or ewey and Windmeijer (29). 3

weak moments asymptotics. 5 Monte Carlo simulations In this section, we conduct Monte Carlo simulations to investigate the nite sample properties of the transformed QML approach and compare them to those of the various GMM estimators proposed in the literature and reviewed in the previous section. 5. ARX() model We rst consider a distributed lag model with one exogenous regressor, ARX(), which is likely to be more relevant in practice than the pure AR() model which will be considered later. 5.. Monte Carlo design For each i, the time series processes fy it g are generated as y it = i + y i;t + x it + u it ; for t = m + ; m + 2; ::; ; ; :::; ; (39) where u it (; 2 i ); with 2 i U[:5; :5], so that E(2 i ) =. For the initial values, we set y i; m = and note that for m su ciently large, y i t m mx i + j= mx j x i; j + j= j u i; j : We discard the rst m = 5 observations, and use the observations t = through for estimation and inference. 9 he regressor, x it, is generated as where i iid (; ) x it = i + it ; for t = m; m + ; ::; ; ; :::; ; (4) it = i;t + " it ; for t = 49 m; 48 m; :::; ; ; :::; ; (4) " it (; 2 "i); i; m 5 = : (42) with jj <. We also generate a set of heteroskedastic errors for the x it process and generate 2 "i U[:5; :5], independently of 2 i, and ensure heterogeneous variance ratios 2 i =2 "i across i. We discard the rst 5 observations of it and use the remaining + + m observations for generating x it and y it. 9 Hence, + is the actual length of the estimation sample. 4

In the simulations, we try the values = :4; :9, and = :5. he slope coe cient,, is set to ensure a reasonable degree of t. But to deal with the error variance heterogeneity across the di erent equations in the panel we use the following average measure R 2 y = P i= V ar(u it) P i= V ar(y itjc i ) ; where V ar(y it jc i ) is the time-series variation of i th unit. Since y it is stationary and it is assumed to have started some time in the past we have X X X y it = c i + j i;t j + j u i;t j = c i + w it + j u i;t j ; j= j= j= c i = ( i + i ) =( ), and w it is an AR(2) process, w it = ' w i;t + ' 2 w i;t 2 + " it, with parameters ' = +, ' 2 =, and having the variance (Hamilton, 994, p. 58) V ar (w it ) = ( + ) 2 "i ( ) h( + ) 2 ( + ) 2i = ( + ) 2 "i ( 2 ) 2 ( ) : Hence R 2 y = P i= 2 i 2 (+)( P i= 2 "i) ( 2 )( 2 )( ) + P i= 2 i ( 2 ) = 2 (+) 2 " ( 2 )( ) + 2 2 2 (+) 2 " ( 2 )( ) + 2 ; 2 = P i= 2 i, and 2 " = P i= 2 "i. For su ciently large we now have (note that 2 and 2 "! with! ) and R 2 y = 2 (+) ( 2 )( ) + 2 ; 2 (+) + ( 2 )( )! 2 = R2 y 2 2 ( ) Ry 2 : ( + ) We set such that R 2 y = 2 + :. For = :4 and = :9 we have R 2 y = :26 and R 2 y = :9. For the individual e ects, we set i = ( i + u i + v i ) ; where u i = P t= u it, and v i iid (; ). ote that i is correlated with u it : o set we consider the variance ratio, 2 = P i= V ar( i) P i= V ar(u it) = 2 ( 2 + 2) 2 ; 5

and use two values for, namely a low value of = often set in the Monte Carlo experiments conducted in the literature, and the high value of = 5. he sample sizes considered are = 5; 5; 5 and = 5; ; 5. For the computation of the transformed ML estimators, we try two procedures. One is to maximize the log likelihood function directly, while the other is to use an iterative procedure suggested by Grassetti (2). For the starting value of the nonlinear optimization, we use the minimum distance estimator of Hsiao, Pesaran, and ahmiscioglu (22) where! is estimated by the one-step rstdi erence GMM estimator (27) in which _ Z i is replaced with _Z i = B @ y i x i y i x i2 y i x i.. y i; 2 x i; y i; 3 x i; 2 his GMM estimator is also used as the starting value for the iterative procedure. For the GMM estimators, although there are many moment conditions for the rst-di erence GMM estimator as in (25) and (26), we consider three sets of moment conditions which only exploit a subset of instruments. he rst set of moment conditions, denoted as DIF", consists of E(y is u it ) = for s = ; :::; t. 2; t = 2; :::; and E(x is u it ) = for s = ; :::; t; t = 2; :::;. In this case, the number of moment conditions are 24; 99; 224 for = 5; ; 5, respectively. he second set of moment conditions, denoted as DIF2", consist of E(y i;t 2 l u it ) = with l = for t = 2, l = ; for t = 3; :::; and E(x i;t l u it ) = with l = ; for t = 2, l = ; ; 2 for t = 3; :::;. In this case, the number of moment conditions are 8; 43; 68 for = 5; ; 5, respectively. he third set of moment conditions, denoted as DIF3", consist of P t=2 E(y i;t 2u it ) =, P t=2 E(y i;t 2u it ) =, P t=2 E(x itu it ) = ; and P t=2 E(x itu it ) =. he number of moment conditions for this case, often called the stacked instruments, are 4 for all : Similarly, for the system GMM estimator, we add moment conditions (3) and (32) in addition to DIF" and DIF2", which are denoted as SYS" and SYS2", respectively. For SYS" we have 32, 7, 252 moment conditions for = 5; ; 5, respectively, while for SYS2" we have 26, 6, 96 moment conditions for = 5; ; 5, respectively. Also, we add moment conditions P t=2 E [y i;t ( i + u it )] =, P t=2 E [y i;t ( i + u it )] =, P t=2 E [x it( i + u it )] = and P t=2 E [x it( i + u it )] = in addition to DIF3", which is denoted as SYS3". In this case, the number of moment conditions is 8 for any : In a number of cases where is not su ciently large relative to the number of moment conditions (for example, when = 5 and = 5) the inverse of the weighting matrix can not be computed. Such cases are denoted as " in the summary result tables. For inference, we use the robust standard errors formula given in heorem 2 for the transformed QML estimator. For the GMM estimators, in addition to the conventional standard errors, we also compute Windmeijer (25) s standard errors with nite sample correction for the two-step GMM estimators and ewey and Windmeijer (29) s alternative standard errors formula for the CU-GMM. C A : 6

estimators. For the computation of optimal weighting matrix, a centered version is used except for the CU-GMM. In addition to the Monte Carlo results for and, we also report simulation results for the longrun coe cient de ned by = =( ). We report median biases, median absolute errors (MAE), size and power for, and. he power is computed at :, : and ( :)=( ( :)), for selected null values of and. All tests are carried out at the 5% signi cance level, and all experiments are replicated ; times. 5..2 Results for the ARX() model o save space, we report the results of the GMM estimators which exploit moment conditions DIF2" and SYS2" with one-step estimation procedure only. he reason for selecting these moment conditions is that, in practice, these moment conditions are often used to mitigate the nite sample bias caused by using too many instruments. A complete set of results giving the remaining GMM estimators that make use of additional instruments are provided in a supplement available from the authors on request. he small sample results for and are summarized in ables to 4. We rst focus on the results of and then discuss the results for : able (and A.2 in the supplement) provide the results of bias and MAE for the case of = :4, and shows that the transformed QMLE has a smaller bias than the GMM estimators in all cases with the exception of the CU-GMM estimator (see able A.2). In terms of MAE the transformed QMLE outperforms the GMM estimators in all cases. As for the e ect of increasing the variance ratio,, on the various estimators, we rst recall that the transformed QMLE is invariant to the choice of. In contrast, as to be expected the performance of the GMM estimators deteriorates (in some case substantially) as is increased from to 5. his tendency is especially evident in the case of the system GMM estimators, and is in sharp contrast to the performance of the transformed QMLE which is robust to changes in. hese observations also hold if we consider the experiments with = :9 (able 2). Although the GMM estimators have smaller biases than the transformed likelihood estimator in a few cases, in terms of MAE, the transformed QMLE performs best in all cases (see also able A.2 in the supplement). We next consider size and power of the various tests, summarized in ables 3 and 4 (A.3 and A.3 in the supplement). he results in these tables show that the empirical size of the transformed QMLE is close to the nominal size of 5% for all values of,, and. In contrast, for the GMM estimators, we nd that the test sizes vary considerably depending on ;,,, the estimation method (step, 2step, CU), and whether corrections are applied to the standard errors. In the case of the GMM results without standard error corrections, most of the GMM methods are subject to substantial size distortions when is small. For instance, when = :4; = 5, = 5, and =, the size of the test based on the two-step procedure using moment conditions DIF2" estimator is 34:2%. But the In the earlier version, we used centered weighting matirx. However, in this version, uncentered weighting matrix is used for the CU-GMM since it gave better performance than using centered weighting matrix. he corresponding tables in the supplement are labelled as ables A. to A.9. 7

size distortion gets smaller as increases. Increasing to 5, reduces the size of this test to 7:7%. However, even with = 5, the size distortion gets larger for two-step and CU-GMM estimators as increases. As to the e ects of changes in on the estimators, we nd that the system GMM estimators are signi cantly a ected when is increased. When = 5, all the system GMM estimators have large size distortions even when = 5 and = 5; where conventional asymptotics are expected to work well. his may be due to large nite sample biases caused by a large. For the tests based on corrected GMM standard errors, Windmeijer (25) s correction seems to be quite useful, and in many cases it leads to accurate inference, although the corrections do not seem able to mitigate the size problem of the system GMM estimator when is large. he standard errors of ewey and Windmeijer (29) are also helpful: they improve the size property in many cases. Comparing power of the tests, we observe that the transformed likelihood estimator is in general more powerful than the GMM estimators. Speci cally, the transformed likelihood estimators have higher power than the most e cient two-step system GMM estimator based on SYS" with Windmeijer s correction. he above conclusions for size and power hold generally when we consider experiments with = :9 (able 4 and A.3), except that the system GMM estimators now perform rather poorly even for a relatively large. For example, when = :9, = 5, = 5 and =, size distortions of the system GMM estimators are substantial, as compared to the case where = :4. Although it is known that the system GMM estimators break down when is large 2, the simulation results in able 4 and A.3 reveal that they perform poorly even when is not so large ( = ). We next consider the small sample results for (ables to 4, A.4 to A.6, and A.4 to A.6). he outcomes are similar to the results reported for. he transformed likelihood estimator tends to have smaller biases and MAEs than the GMM estimators in many cases, and there are almost no size distortions for all values of, and. he performance of the GMM estimators crucially depends on the values of, and. Unless is large, the GMM estimators perform poorly and the system GMM estimators are subject to substantial size distortions when is large even for = 5, although the magnitude of size distortions are somewhat smaller than those reported for. he results for the long-run coe cient, = =( ), which are reported in the supplement (ables A.7 to A.9 and A.7 to A.9), are very similar to those of and. Although the GMM estimators outperform the transformed likelihood estimator in some cases, in terms of MAE, the transformed likelihood estimator performs best in almost all cases. As for inference, the transformed likelihood estimator has correct sizes for all values of, and when = :4. However, it shows some size distortions when = :9 and the sample size is small, say, when = 5 and = 5. However, size improves as and/or increase(s). When = 5 and = 5, there is essentially no size distortions. For the GMM estimators, it is observed that although the sizes are correct in some cases, say, the case with = 5 and = 5 when = :4, it is not the case when = :9; even for the case 2 See Hayakawa (27) and Bun and Windmeijer (2). 8

of = 5 and = 5, there are size distortions and a large aggravates the size distortions. Finally, we consider weak instruments robust tests, which are reported in ables 5, A. and A.2. We nd that test sizes are close to the nominal value only when = 5 and = 5. In other cases, especially when is small and/or is large, there are substantial size distortions. Although ewey and Windmeijer (29) prove the validity of these tests under many weak moments asymptotics, they are essentially imposing n 2 =! or a stronger restriction where n is the number of moment conditions, which is unlikely to hold when is small and/or is large. herefore, the weak instruments robust tests are less appealing, considering the very satisfactory size properties of the transformed likelihood estimator, the di culty of carrying out inference on subset of the parameters using the weak instruments robust tests, and large size distortions observed for these tests when is small. In summary, for estimation of ARX panel data models the transformed likelihood estimator has several favorable properties over the GMM estimators in that the transformed likelihood estimator generally performs better than the GMM estimators in terms of biases, MAEs, size and power, and unlike GMM estimators, it is not a ected by the variance ratio of individual e ects to disturbances. 5.2 AR() model 5.2. Monte Carlo design he data generating process is the same as that in the previous section with =. More speci cally, y it are generated as y it = i + y i;t + u it ; (t = m + ; :::; ; :::; ; i = ; :::; ); (43) with y i; m = where u it (; 2 i ) with 2 i y i t m U[:5; :5], and mx i + j= j u i; j : Individual e ects are generated as i = (u i + v i ) ; where v i iid (; ), and is set so that particular values of the variance ratio, 2 = P i= V ar( i) P i= V ar(u it) = 2 ( 2 + ) 2 ; is achieved. ote that for su ciently large 2 t 2 ( + = ): For parameters and sample sizes, we consider = :4; :9, = 5; ; 5; 2 = 5; 5; 5; and = ; 5. Some comments on the computations are in order. For the starting value in the nonlinear optimization routine used to compute the transformed log-likelihood estimator, we use ( e b; e; e!; e 2 ) where 9

e b = P i= y i, e is the one-step rst-di erence GMM estimator (27) where _W i and Z _ i are replaced with 3 y i y i y i y i _W i = B @. C A ; Zi _ = y i2 y i y i ; y B i; @... C A y i; 2 y i; 3 y i; 4 e! = [( )e 2 u] P 2 i= y i e b and e 2 u = [2( 2)] P i= (y it ey i;t ) 2. For the rst-di erence GMM estimators, we consider three sets of moment conditions. he rst set of moment conditions, denoted as DIF", consists of E(y is u it ) = for s = ; :::; t 2; t = 2; :::;. In this case, the number of moment conditions are, 45, 5; 9 for = 5; ; 5; 2, respectively. he second set of moment conditions, denoted by DIF2", consist of E(y i;t 2 l u it ) = with l = for t = 2, and l = ; for t = 3; :::;. In this case, the number of moment conditions are 7; 7; 27; 37 for = 5; ; 5; 2, respectively. he third set of moment conditions, denoted as DIF3", consists of P t=2 E(y i;t 2u it ) =, P t=2 E(y i;t 2u it ) = and P 2 t=2 E(y i;t 2u it ) = : In this case, the number of moment conditions are 3 for all : Similarly, for the system GMM estimator, we add moment conditions E[y i;t ( i + u it )] = for t = 2; :::; in addition to DIF" and DIF2", which are denoted as SYS" and SYS2", respectively. We also add moment conditions P t=2 E [y i;t ( i + u it )] =, P t=2 E [y i;t ( i + u it )] =, P t=2 E [x it( i + u it )] = and P t=2 E [x it( i + u it )] = in addition to DIF3". For the moment conditions SYS", we have 4; 54; 9; 29 moment conditions for = 5; ; 5; 2, respectively, while for the moment conditions SYS2", we have ; 26; 4; 56 moment conditions for = 5; ; 5; 2, respectively. he number of moment conditions for SYS3" are 6 for all : With regard to the inference, we use the robust standard errors formula given in heorem 2 for the transformed loglikelihood estimator. For the GMM estimators, in addition to the conventional standard errors, we also compute Windmeijer (25) s standard errors for the two-step GMM estimators and ewey and Windmeijer (29) s standard errors for the CU-GMM estimators. We report the median biases, median absolute errors (MAE), sizes ( = :4 and :9) and powers (resp. = :3 and :8) with the nominal size set to 5%. As before, the number of replications is set to ;. 5.2.2 Results As in the case of ARX() experiments, to save space, we report the results of the transformed likelihood estimator and the GMM estimators exploiting moment conditions DIF2" and SYS2" with one-step estimation procedure. Complete set of results are provided in a supplement, which is available upon 3 his type of estimator is considered in Bun and Kiviet (26). Since the number of moment conditions are three, this estimator is always computable for any values of and considered in this paper. Also, since there are two more moments, we can expect that the rst and second moments of the estimator to exist. 2

request. In the following, ables 6 to 8 are given in the paper and ables A.2 to A.28 are given in a supplement. he biases and MAEs of the various estimators for the case of = :4 are summarized in ables 6, A.22 and A.26. As can be seen from these tables, the transformed likelihood estimator performs best (in terms of MAE) in almost all cases, the exceptions being the CU-GMM estimators that show smaller biases in some experiments. As to be expected, the one- and two-step GMM estimators deteriorate as the variance ratio,, is increased from to 5, and this tendency is especially evident for the system GMM estimator. For the case of = :9, we nd that the system GMM estimators have smaller biases and MAEs than the transformed likelihood estimator in some cases. However, when = 5, the transformed likelihood estimator outperforms the GMM estimators in all cases, both in terms of bias and MAE. Consider now the size and power properties of the alternative procedures. he results for = :4 are summarized in ables 7 and A.23. We rst note that the transformed likelihood procedure shows almost correct sizes for all experiments. For the GMM estimators, although there are substantial size distortions when = 5, the empirical sizes become close to the nominal value as is increased. When = 5; and = 5 and =, the size distortions of the GMM estimators are small. However, when = 5, there are severe size distortions for the system GMM estimator even when = 5. For the e ects of corrected standard errors, similar results to the ARX() case are obtained. amely, Windmeijer (25) s correction is quite useful, and in many cases it leads to accurate inference although the corrections do result in severely under-sized tests in some cases. Also, this correction does not seem that helpful in mitigating the size problem of the system GMM estimator when is large. he standard errors of ewey and Windmeijer (29) used for the CU-GMM estimators are also helpful: they improve the size property in many cases. Size and power of the tests in the case of experiments with = :9 are summarized in ables 7 and A.27, and show signi cant size distortions in many cases. he size distortion of the transformed likelihood gets reduced for relatively large sample sizes and its size declines to 8.% when =, = 5 and = 2. As to be expected, increasing the variance ratio,, to 5, does not change this result. A similar pattern can also be seen in the case of rst-di erence GMM estimators if we consider =. But the size results are much less encouraging if we consider the system GMM estimators. Also, as to be expected, size distortions of GMM type estimators become much more pronounced when the variance ratio is increased to = 5. Finally, we consider the small sample performance of the weak instruments robust tests which are provided in ables 8, A.24 and A.28. hese results show that size distortions are reduced only when is large ( = 5). In general, size distortions of these tests get worse as, or the number of moment conditions, increases. In terms of power, the Lagrange multiplier test and conditional likelihood ratio test based on SYS2" have almost the same power as the transformed likelihood estimator when = :4, = 5, = 5 and = : For the case of = :9, the results are very similar to the case of = :4. Size distortions are small only when is large. When is small, there are substantial size 2

distortions. 6 Concluding remarks In this paper, we proposed the transformed likelihood approach to estimation and inference in dynamic panel data models with cross-sectionally heteroskedastic errors. It is shown that the transformed likelihood estimator by Hsiao, Pesaran, and ahmiscioglu (22) continues to be consistent and asymptotically normally distributed, but the covariance matrix of the transformed likelihood estimators must be adjusted to allow for the cross-sectional heteroskedasticity. By means of Monte Carlo simulations, we investigated the nite sample performance of the transformed likelihood estimator and compared it with a range of GMM estimators. Simulation results revealed that the transformed likelihood estimator for an ARX() model with a single exogenous regressor has very small bias and accurate size property, and in most cases outperformed GMM estimators, whose small sample properties vary considerably across parameter values ( and ), the choice of moment conditions, and the value of the variance ratio,. In this paper, x it is assumed to be strictly exogenous. However, in practice, the regressors may be endogenous or weakly exogenous (c.f. Keane and Runkle, 992). o allow for endogenous and weakly exogenous variables, one could consider extending the panel VAR approach advanced in Binder, Hsiao, and Pesaran (25) to allow for cross-sectional heteroskedasticity. More speci cally, consider the following bivariate model: y it = yi + y i;t + x it + u it x it = xi + y i;t + x i;t + v it where cov(u it ; v it ) =. In this model, x it is strictly exogenous if = and =, weakly exogenous if =, and endogenous if 6=. his model can be written as a PVAR() model as follows y it x it! = yi + xi xi! + +! y i;t x i;t! + u it + v it v it! ; for i = ; 2; :::;. Let A = fa ij g(i; j = ; 2) be the coe cient matrix of (y i;t ; x i;t ) in the above VAR model. hen, we have = a 2 =a 22, = a a 2 a 2 =a 22, = a 22 and = a 2. hus, if we estimate a PVAR model in (y it ; x it ), allowing for xed e ects and cross-sectional heteroskedasticity, we can recover the parameters of interest, and ; from the estimated coe cients of such a PVAR model. However, detailed analysis of such a model is beyond the scope of the present paper and is left to future research. 22