GMM Estimation of SAR Models with Endogenous Regressors


GMM Estimation of SAR Models with Endogenous Regressors

Xiaodong Liu, Department of Economics, University of Colorado Boulder. E-mail: xiaodong.liu@colorado.edu
Paulo Saraiva, Department of Economics, University of Colorado Boulder. E-mail: saraiva@colorado.edu

August 2015

Abstract: In this paper, we extend the GMM estimator in Lee (2007) to estimate SAR models with endogenous regressors. We propose a new set of quadratic moment conditions exploiting the correlation of the spatially lagged dependent variable with the disturbance term of the main regression equation and with the endogenous regressor. The proposed GMM estimator is more efficient than the IV-based linear estimators in the literature, and computationally simpler than the ML estimator. With carefully constructed quadratic moment equations, the GMM estimator can be asymptotically as efficient as the ML estimator under normality. Monte Carlo experiments show that the proposed GMM estimator performs well in finite samples.

Key Words: Spatial models, endogeneity, simultaneous equations, moment conditions, efficiency
JEL Classification: C1, C, R15

We would like to thank the Co-Editor and an anonymous referee for helpful comments and suggestions. The remaining errors are our own.

1 Introduction

In recent years, spatial econometric models have played a vital role in empirical research on regional and urban economics. By expanding the notion of space from geographic space to economic space and social space, these models can be used to study cross-sectional interactions in much wider applications, including education (e.g., Lin, 2010; Sacerdote, 2011; Carrell et al., 2013), crime (e.g., Patacchini and Zenou, 2012; Lindquist and Zenou, 2014), industrial organization (e.g., König et al., 2014), finance (e.g., Denbee et al., 2014), etc. Among spatial econometric models, the spatial autoregressive (SAR) model introduced by Cliff and Ord (1973, 1981) has received the most attention. In this model, the cross-sectional dependence is modeled as the weighted average outcome of neighboring units, typically referred to as the spatially lagged dependent variable. As the spatially lagged dependent variable is endogenous, likelihood- and moment-based methods have been proposed to estimate the SAR model (e.g., Kelejian and Prucha, 1998; Lee, 2004; Lee, 2007; Lee and Liu, 2010). In particular, for the SAR model with exogenous regressors, Lee (2007) proposes a generalized method of moments (GMM) estimator that combines linear moment conditions, with the (estimated) mean of the spatially lagged dependent variable as the instrumental variable (IV), and quadratic moment conditions based on the covariance structure of the spatially lagged dependent variable and the model disturbance term. The GMM estimator improves the estimation efficiency of the IV-based linear estimators in Kelejian and Prucha (1998) and is computationally simple relative to the maximum likelihood (ML) estimator in Lee (2004). Furthermore, Lin and Lee (2010) show that a sub-class of the GMM estimators is consistent in the presence of an unknown form of heteroskedasticity in the model disturbances, and is thus more robust relative to the ML estimator. For SAR models with endogenous regressors, Liu (2012) and Liu and Lee (2013) consider, respectively, the limited information
maximum likelihood (LIML) and two stage least squares (2SLS) estimators, in the presence of many potential IVs. Liu and Lee (2013) also propose a criterion based on the approximate mean square error of the 2SLS estimator to select the optimal set of IVs. The SAR model with endogenous regressors can be considered as an equation in a system of simultaneous equations. For the full information estimation of the system, Kelejian and Prucha (2004) propose a three stage least squares (3SLS) estimator and, in a recent paper, Yang and Lee (2014) consider the quasi-maximum likelihood (QML) approach. The QML estimator is asymptotically more efficient

than the 3SLS estimator under normality, but can be computationally difficult to implement. The existing estimators for the SAR model with endogenous regressors are summarized in Table 1.

Table 1: Existing Estimators for SAR Models with Endogenous Regressors

                               single-equation estimator   system estimator
  IV-based linear estimator    Liu and Lee (2013)          Kelejian and Prucha (2004)
  likelihood-based estimator   Liu (2012)                  Yang and Lee (2014)

In this paper, we extend the GMM estimator in Lee (2007) to estimate SAR models with endogenous regressors. We propose a new set of quadratic moment equations exploiting (i) the covariance structure of the spatially lagged dependent variable and the disturbance term of the main regression equation and (ii) the covariance structure of the spatially lagged dependent variable and the endogenous regressor. We establish the identification, consistency and asymptotic normality of the proposed GMM estimator. The GMM estimator is more efficient than the 2SLS and 3SLS estimators, and computationally simpler than the ML estimator. With carefully constructed quadratic moment equations, the GMM estimator can be asymptotically as efficient as the ML estimator under normality. We also conduct a limited Monte Carlo experiment to show that the proposed GMM estimator performs well in finite samples.

The rest of the paper is organized as follows. In Section 2, we introduce the SAR model with endogenous regressors. In Section 3, we define the GMM estimator and discuss the identification of the model parameters. In Section 4, we study the asymptotic properties of the GMM estimator and discuss the optimal moment conditions to use. Section 5 reports the Monte Carlo experiment results. Section 6 briefly concludes. The proofs are collected in the appendix.

Throughout the paper, we adopt the following notation. For an $n \times n$ matrix $A = [a_{ij}]_{i,j=1,\dots,n}$, let $A^{(s)} = A + A'$, $\mathrm{vec}_D(A) = (a_{11},\dots,a_{nn})'$, and $\mathrm{diag}(A) = \mathrm{diag}(a_{11},\dots,a_{nn})$. The row (or column) sums of $A$ are uniformly bounded in absolute value if $\max_{i=1,\dots,n}\sum_{j=1}^{n}|a_{ij}|$ (or $\max_{j=1,\dots,n}\sum_{i=1}^{n}|a_{ij}|$) is bounded.

2 Model

Consider a SAR model with an endogenous regressor¹ given by
$$y_1 = \lambda_0 W y_1 + \delta_0 y_2 + X_1 \beta_0 + u_1, \qquad (1)$$

¹ In this paper, we focus on the model with a single endogenous regressor for expositional purposes. The model and the proposed estimator can be easily generalized to accommodate any fixed number of endogenous regressors.

where $y_1$ is an $n \times 1$ vector of observations on the dependent variable, $W$ is an $n \times n$ nonstochastic spatial weights matrix with a zero diagonal, $y_2$ is an $n \times 1$ vector of observations on an endogenous regressor, $X_1$ is an $n \times K_1$ matrix of observations on $K_1$ nonstochastic exogenous regressors, and $u_1$ is an $n \times 1$ vector of iid innovations. $W y_1$ is usually referred to as the spatially lagged dependent variable. Let $X = [X_1, X_2]$, where $X_2$ is an $n \times K_2$ matrix of observations on $K_2$ excluded nonstochastic exogenous variables. The reduced form of the endogenous regressor $y_2$ is assumed to be
$$y_2 = X \gamma_0 + u_2, \qquad (2)$$
where $u_2$ is an $n \times 1$ vector of iid innovations.

Let $\theta_0 = (\psi_0', \gamma_0')'$, with $\psi_0 = (\lambda_0, \delta_0, \beta_0')'$, denote the vector of true parameter values in the data generating process (DGP). The following regularity conditions are common in the literature on SAR models (see, e.g., Lee, 2007; Kelejian and Prucha, 2010).

Assumption 1. Let $u_{1,i}$ and $u_{2,i}$ denote, respectively, the $i$-th elements of $u_1$ and $u_2$. (i) $(u_{1,i}, u_{2,i})' \sim iid(0, \Sigma)$, where
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}.$$
(ii) $E|u_{k,i} u_{l,i} u_{r,i} u_{s,i}|^{1+\eta}$ is bounded for $k,l,r,s = 1,2$ and some small constant $\eta > 0$.

Assumption 2. (i) The elements of $X$ are uniformly bounded constants. (ii) $X$ has full column rank $K_X = K_1 + K_2$. (iii) $\lim_{n\to\infty} n^{-1} X'X$ exists and is nonsingular.

Assumption 3. (i) All diagonal elements of the spatial weights matrix $W$ are zero. (ii) $\lambda_0 \in (-c_\lambda, c_\lambda)$ with $0 < c_\lambda < 1$. (iii) $S(\lambda) = I_n - \lambda W$ is nonsingular for all $\lambda \in (-c_\lambda, c_\lambda)$. (iv) The row and column sums of $W$ and $S(\lambda_0)^{-1}$ are uniformly bounded in absolute value.

Assumption 4. $\theta_0$ is in the interior of a compact parameter space $\Theta$.²

² $y_1, y_2, u_1, u_2, X, W$ are allowed to depend on the sample size $n$, i.e., to formulate triangular arrays as in Kelejian and Prucha (2010). Nevertheless, we suppress the subscript $n$ to simplify the notation.
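To fix ideas, the DGP in (1) and (2) is easy to simulate: draw correlated innovations, build $y_2$ from the reduced form, and solve $S(\lambda_0) y_1 = \delta_0 y_2 + X_1\beta_0 + u_1$ for $y_1$. The NumPy sketch below uses a hypothetical circulant weights matrix and arbitrary parameter values; none of the specific numbers are from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K1, K2 = 100, 2, 1
lam, delta = 0.3, 0.5                      # stand-ins for lambda_0 and delta_0

# Row-normalized "one ahead / one behind" neighbors: zero diagonal by construction.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X1 = rng.standard_normal((n, K1))
X2 = rng.standard_normal((n, K2))
X = np.hstack([X1, X2])
beta = np.ones(K1)
gamma = np.ones(K1 + K2)

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])  # Var of (u_{1,i}, u_{2,i})'
U = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
u1, u2 = U[:, 0], U[:, 1]

y2 = X @ gamma + u2                         # reduced form (2)
S = np.eye(n) - lam * W                     # S(lambda_0) = I_n - lambda_0 W
y1 = np.linalg.solve(S, delta * y2 + X1 @ beta + u1)  # solve (1) for y_1
```

Solving the linear system rather than inverting $S$ explicitly is the standard numerically stable choice.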

3 GMM Estimation

3.1 Estimator

Let $S = S(\lambda_0) = I_n - \lambda_0 W$ and $G = W S^{-1}$. Under Assumption 3, model (1) has a reduced form
$$y_1 = S^{-1} X_1 \beta_0 + \delta_0 S^{-1} X \gamma_0 + S^{-1} u_1 + \delta_0 S^{-1} u_2, \qquad (3)$$
which implies that
$$W y_1 = G X_1 \beta_0 + \delta_0 G X \gamma_0 + G u_1 + \delta_0 G u_2. \qquad (4)$$

As $W y_1$ and $y_2$ are endogenous, consistent estimation of (1) requires IVs for $W y_1$ and $y_2$. From (4), the deterministic part of $W y_1$ is a linear combination of the columns in $GX = [G X_1, G X_2]$. Therefore, $GX$ can be used as an IV matrix for $W y_1$.³ From (2), $X$ can be used as an IV matrix for $y_2$. In general, let $Q$ be an $n \times K_Q$ matrix of IVs such that $E(Q'u_1) = E(Q'u_2) = 0$.

Let $u_1(\psi) = S(\lambda) y_1 - \delta y_2 - X_1 \beta$ and $u_2(\gamma) = y_2 - X\gamma$, where $\psi = (\lambda, \delta, \beta')'$. The linear moment function for the GMM estimation is given by
$$g_1(\theta) = (I_2 \otimes Q)' u(\theta),$$
where $\otimes$ denotes the Kronecker product, $u(\theta) = [u_1(\psi)', u_2(\gamma)']'$, and $\theta = (\psi', \gamma')'$.⁴

Besides the linear moment functions, Lee (2007) proposes to use quadratic moment functions based on the covariance structure of the spatially lagged dependent variable and the model disturbances to improve estimation efficiency. We generalize this idea to SAR models with endogenous regressors. Substitution of (2) into (1) leads to a pseudo reduced form
$$y_1 = \lambda_0 W y_1 + \delta_0 X \gamma_0 + X_1 \beta_0 + u_1 + \delta_0 u_2. \qquad (5)$$

By exploiting the covariance structure of the spatially lagged dependent variable $W y_1$ and the disturbances of (5), we propose the following quadratic moment functions $g_2(\theta) = [g_{2,11}(\theta)', g_{2,12}(\theta)', g_{2,21}(\theta)', g_{2,22}(\theta)']'$ with
$$g_{2,11}(\theta) = [\Delta_1 u_1(\psi), \dots, \Delta_m u_1(\psi)]' u_1(\psi)$$
$$g_{2,12}(\theta) = [\Delta_1 u_1(\psi), \dots, \Delta_m u_1(\psi)]' u_2(\gamma)$$
$$g_{2,21}(\theta) = [\Delta_1 u_2(\gamma), \dots, \Delta_m u_2(\gamma)]' u_1(\psi)$$
$$g_{2,22}(\theta) = [\Delta_1 u_2(\gamma), \dots, \Delta_m u_2(\gamma)]' u_2(\gamma)$$
where $\Delta_j$ is an $n \times n$ constant matrix with $\mathrm{tr}(\Delta_j) = 0$ for $j = 1,\dots,m$.⁵ Possible candidates for $\Delta_j$ are $W$, $W^2 - n^{-1}\mathrm{tr}(W^2) I_n$, etc. These quadratic moment functions are based on the moment conditions that $E(u_1' \Delta_j u_1) = E(u_1' \Delta_j u_2) = E(u_2' \Delta_j u_1) = E(u_2' \Delta_j u_2) = 0$ for $j = 1,\dots,m$.

Let
$$g(\theta) = [g_1(\theta)', g_2(\theta)']', \qquad (6)$$
and $\Omega = \mathrm{Var}[g(\theta_0)]$. The following assumption is from Lee (2007).

Assumption 5. (i) The elements of $Q$ are uniformly bounded constants. (ii) $\Delta_j$ is an $n \times n$ constant matrix with $\mathrm{tr}(\Delta_j) = 0$ for $j = 1,\dots,m$. The row and column sums of $\Delta_j$ are uniformly bounded in absolute value. (iii) $\lim_{n\to\infty} n^{-1}\Omega$ exists and is nonsingular.

Combining both linear and quadratic moment functions, the GMM estimator of $\theta_0$ is given by
$$\tilde{\theta}_{gmm} = \arg\min_\theta g(\theta)' F' F g(\theta), \qquad (7)$$
for some matrix $F$ such that $\lim_{n\to\infty} F$ exists and has full row rank greater than or equal to $\dim(\theta)$. In the GMM literature, $F'F$ is known as the GMM weighting matrix. For instance, one can use the identity matrix as the weighting matrix to implement the GMM.

³ The IV matrix $GX$ is not feasible as $G$ involves the unknown parameter $\lambda_0$. Under Assumption 3, $GX = WX + \lambda_0 W^2 X + \lambda_0^2 W^3 X + \cdots$. Therefore, we can use the leading order terms $WX, W^2X, W^3X$ of the series expansion as feasible IVs for $W y_1$.
⁴ In practice, we could use two different IV matrices $Q_1$ and $Q_2$ to construct the linear moment functions $Q_1' u_1(\psi)$ and $Q_2' u_2(\gamma)$. The GMM estimator with $g_1(\theta)$ is (asymptotically) no less efficient than that with $Q_1' u_1(\psi)$ and $Q_2' u_2(\gamma)$ if $Q$ includes all linearly independent columns of $Q_1$ and $Q_2$.
⁵ In practice, we could use different sets of weighting matrices $\{\Delta_{11,j}\}_{j=1}^{m_{11}}$, $\{\Delta_{12,j}\}_{j=1}^{m_{12}}$, $\{\Delta_{21,j}\}_{j=1}^{m_{21}}$ and $\{\Delta_{22,j}\}_{j=1}^{m_{22}}$ for the quadratic moment functions $g_{2,11}(\theta)$, $g_{2,12}(\theta)$, $g_{2,21}(\theta)$ and $g_{2,22}(\theta)$ respectively. The quadratic moment functions $g_2(\theta)$ are (asymptotically) no less efficient than those with $\{\Delta_{11,j}\}_{j=1}^{m_{11}}$, $\{\Delta_{12,j}\}_{j=1}^{m_{12}}$, $\{\Delta_{21,j}\}_{j=1}^{m_{21}}$ and $\{\Delta_{22,j}\}_{j=1}^{m_{22}}$ if $\{\Delta_1,\dots,\Delta_m\} = \{\Delta_{11,j}\}_{j=1}^{m_{11}} \cup \{\Delta_{12,j}\}_{j=1}^{m_{12}} \cup \{\Delta_{21,j}\}_{j=1}^{m_{21}} \cup \{\Delta_{22,j}\}_{j=1}^{m_{22}}$. We discuss the optimal $Q$ and $\Delta$ in Section 4.
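To make the construction concrete, the stacked moment vector $g(\theta) = [g_1(\theta)', g_2(\theta)']'$ can be evaluated numerically as below. The helper `moments` and its argument layout are illustrative names of ours, not notation from the paper.

```python
import numpy as np

def moments(theta, y1, y2, X1, X, W, Q, Deltas):
    """Evaluate g(theta) = [g_1(theta)', g_2(theta)']' at a trial parameter.

    theta stacks (lambda, delta, beta', gamma')'; Deltas is a list of
    trace-zero n x n matrices Delta_j.
    """
    K1 = X1.shape[1]
    lam, delta = theta[0], theta[1]
    beta, gamma = theta[2:2 + K1], theta[2 + K1:]
    u1 = y1 - lam * (W @ y1) - delta * y2 - X1 @ beta   # u_1(psi)
    u2 = y2 - X @ gamma                                 # u_2(gamma)
    g1 = np.concatenate([Q.T @ u1, Q.T @ u2])           # (I_2 (x) Q)' u(theta)
    # Quadratic moments, stacked block-by-block as g_{2,11}, g_{2,12}, g_{2,21}, g_{2,22}.
    pairs = [(u1, u1), (u1, u2), (u2, u1), (u2, u2)]
    g2 = [u @ D @ v for (u, v) in pairs for D in Deltas]
    return np.concatenate([g1, np.array(g2)])
```

At the true parameters with the innovations set to zero, both residual vectors vanish, so every component of $g(\theta_0)$ is exactly zero; with noisy data, $n^{-1} g(\theta_0)$ vanishes only in expectation, by the moment conditions above.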

The asymptotic efficiency of the GMM estimator depends on the choice of the weighting matrix, as discussed in Section 4.1.

3.2 Identification

For $\theta_0$ to be identified through the moment functions $g(\theta)$, $\lim_{n\to\infty} n^{-1} E[g(\theta)] = 0$ needs to have a unique solution at $\theta = \theta_0$ (Hansen, 1982). As $S(\lambda) S^{-1} = I_n + (\lambda_0 - \lambda) G$, it follows from (2) and (3) that
$$u_1(\psi) = d_1(\psi) + [I_n + (\lambda_0 - \lambda) G] u_1 + [(\delta_0 - \delta) I_n + \delta_0 (\lambda_0 - \lambda) G] u_2$$
and $u_2(\gamma) = d_2(\gamma) + u_2$, where $d_1(\psi) = [G X_1 \beta_0 + \delta_0 G X \gamma_0, X\gamma_0, X_1](\psi_0 - \psi)$ and $d_2(\gamma) = X(\gamma_0 - \gamma)$.

For the linear moment functions, we have
$$\lim_{n\to\infty} n^{-1} E[Q' u_1(\psi)] = \lim_{n\to\infty} n^{-1} Q' d_1(\psi) = \lim_{n\to\infty} n^{-1} Q'[G X_1 \beta_0 + \delta_0 G X \gamma_0, X\gamma_0, X_1](\psi_0 - \psi)$$
and
$$\lim_{n\to\infty} n^{-1} E[Q' u_2(\gamma)] = \lim_{n\to\infty} n^{-1} Q' d_2(\gamma) = \lim_{n\to\infty} n^{-1} Q' X (\gamma_0 - \gamma).$$
Therefore, $\lim_{n\to\infty} n^{-1} E[g_1(\theta)] = 0$ has a unique solution at $\theta = \theta_0$ if $Q'[G X_1 \beta_0 + \delta_0 G X \gamma_0, X\gamma_0, X_1]$ and $Q'X$ have full column rank for large enough $n$. This sufficient rank condition implies the necessary rank condition that $[G X_1 \beta_0 + \delta_0 G X \gamma_0, X\gamma_0, X_1]$ and $X$ have full column rank and that the rank of $Q$ is at least $\max\{\dim(\psi), K_X\}$, for large enough $n$.

Suppose $[X\gamma_0, X_1]$ has full column rank for large enough $n$.⁷ The necessary rank condition for identification does not hold if $G X_1 \beta_0 + \delta_0 G X \gamma_0$ and $[X\gamma_0, X_1]$ are asymptotically linearly dependent.⁸ $G X_1 \beta_0 + \delta_0 G X \gamma_0$ and $[X\gamma_0, X_1]$ are linearly dependent if there exist a constant scalar $c_1$ and a $K_1 \times 1$ constant vector $c_2$ such that $G X_1 \beta_0 + \delta_0 G X \gamma_0 = c_1 X\gamma_0 + X_1 c_2$, which implies that
$$d_1(\psi) = [(\lambda_0 - \lambda) c_1 + (\delta_0 - \delta)] X\gamma_0 + X_1[(\lambda_0 - \lambda) c_2 + (\beta_0 - \beta)].$$
Hence, the solutions of the linear moment equations $\lim_{n\to\infty} n^{-1} E[Q' u_1(\psi)] = 0$ are characterized by
$$\delta = \delta_0 + (\lambda_0 - \lambda) c_1 \quad \text{and} \quad \beta = \beta_0 + (\lambda_0 - \lambda) c_2, \qquad (8)$$
as long as $Q'[X\gamma_0, X_1]$ has full column rank for large enough $n$. In this case, $\delta_0$ and $\beta_0$ can be identified if and only if $\lambda_0$ can be identified from the quadratic moment equations.

Given (8), we have
$$E[u_1(\psi)' \Delta_j u_1(\psi)] = (\lambda_0 - \lambda)(\sigma_1^2 + \delta_0 \sigma_{12}) \mathrm{tr}(\Delta_j^{(s)} G) + (\lambda_0 - \lambda)^2 \left[(\sigma_1^2 + 2\delta_0 \sigma_{12} + \delta_0^2 \sigma_2^2) \mathrm{tr}(G' \Delta_j G) - c_1 (\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_j^{(s)} G)\right]$$
and
$$E[u_1(\psi)' \Delta_j u_2(\gamma)] = (\lambda_0 - \lambda)(\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_j' G),$$
$$E[u_2(\gamma)' \Delta_j u_1(\psi)] = (\lambda_0 - \lambda)(\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_j G),$$
for $j = 1,\dots,m$. If $(\sigma_1^2 + \delta_0 \sigma_{12}) \lim_{n\to\infty} n^{-1} \mathrm{tr}(\Delta_j^{(s)} G) \neq 0$ for some $j \in \{1,\dots,m\}$, the quadratic moment equation $\lim_{n\to\infty} n^{-1} E[u_1(\psi)' \Delta_j u_1(\psi)] = 0$ has two roots, $\lambda = \lambda_0$ and
$$\lambda = \lambda_0 + \frac{\sigma_1^2 + \delta_0 \sigma_{12}}{(\sigma_1^2 + 2\delta_0 \sigma_{12} + \delta_0^2 \sigma_2^2) \lim_{n\to\infty}[\mathrm{tr}(G' \Delta_j G)/\mathrm{tr}(\Delta_j^{(s)} G)] - c_1(\sigma_{12} + \delta_0 \sigma_2^2)}.$$
As $(\sigma_1^2 + 2\delta_0 \sigma_{12} + \delta_0^2 \sigma_2^2) > 0$, if $\lim_{n\to\infty}[\mathrm{tr}(G' \Delta_j G)/\mathrm{tr}(\Delta_j^{(s)} G)] \neq \lim_{n\to\infty}[\mathrm{tr}(G' \Delta_k G)/\mathrm{tr}(\Delta_k^{(s)} G)]$ for some $j \neq k$, the moment equations $\lim_{n\to\infty} n^{-1} E[u_1(\psi)' \Delta_j u_1(\psi)] = 0$ and $\lim_{n\to\infty} n^{-1} E[u_1(\psi)' \Delta_k u_1(\psi)] = 0$ have a unique common root $\lambda = \lambda_0$. On the other hand, if $(\sigma_{12} + \delta_0 \sigma_2^2) \lim_{n\to\infty} n^{-1} \mathrm{tr}(\Delta_j' G) \neq 0$ for some $j \in \{1,\dots,m\}$, the quadratic moment equation $\lim_{n\to\infty} n^{-1} E[u_1(\psi)' \Delta_j u_2(\gamma)] = 0$ has a unique root $\lambda = \lambda_0$; and if $(\sigma_{12} + \delta_0 \sigma_2^2) \lim_{n\to\infty} n^{-1} \mathrm{tr}(\Delta_j G) \neq 0$ for some $j \in \{1,\dots,m\}$, the quadratic moment equation $\lim_{n\to\infty} n^{-1} E[u_2(\gamma)' \Delta_j u_1(\psi)] = 0$ has a unique root $\lambda = \lambda_0$.

To wrap up, the sufficient identification condition for $\theta_0$ is summarized in the following assumption.

Assumption 6. $\lim_{n\to\infty} n^{-1} Q'X$ and $\lim_{n\to\infty} n^{-1} Q'[X\gamma_0, X_1]$ both have full column rank, and at least one of the following conditions is satisfied: (i) $\lim_{n\to\infty} n^{-1} Q'[G X_1 \beta_0 + \delta_0 G X \gamma_0, X\gamma_0, X_1]$ has full column rank; (ii) $(\sigma_1^2 + \delta_0 \sigma_{12}) \lim_{n\to\infty} n^{-1} \mathrm{tr}(\Delta_j^{(s)} G) \neq 0$ for some $j \in \{1,\dots,m\}$, and $\lim_{n\to\infty} n^{-1}[\mathrm{tr}(\Delta_1^{(s)} G),\dots,\mathrm{tr}(\Delta_m^{(s)} G)]'$ is linearly independent of $\lim_{n\to\infty} n^{-1}[\mathrm{tr}(G' \Delta_1 G),\dots,\mathrm{tr}(G' \Delta_m G)]'$; (iii) $(\sigma_{12} + \delta_0 \sigma_2^2) \lim_{n\to\infty} n^{-1} \mathrm{tr}(\Delta_j G) \neq 0$ or $(\sigma_{12} + \delta_0 \sigma_2^2) \lim_{n\to\infty} n^{-1} \mathrm{tr}(\Delta_j' G) \neq 0$ for some $j \in \{1,\dots,m\}$.

⁷ As $X\gamma_0 = X_1\gamma_{1,0} + X_2\gamma_{2,0}$, a necessary condition for $[X\gamma_0, X_1]$ to have full column rank is $\gamma_{2,0} \neq 0$.
⁸ A necessary condition for $G X_1 \beta_0 + \delta_0 G X \gamma_0$ and $[X\gamma_0, X_1]$ to be asymptotically linearly independent is $(\beta_0', \delta_0\gamma_0')' \neq 0$.

4 Asymptotic Properties

4.1 Consistency and Asymptotic Normality

The GMM estimator defined in (7) falls into the class of Z-estimators (see Newey and McFadden, 1994). Therefore, to establish consistency and asymptotic normality, it suffices to show that the GMM estimator satisfies the sufficient conditions for Z-estimators to be consistent and asymptotically normally distributed when properly normalized and centered. A similar argument has been adopted by Lee (2007) to establish the asymptotic normality of the GMM estimator for the SAR model with exogenous regressors.

Let $\mu_{r,s} = E(u_{1,i}^r u_{2,i}^s)$ for $r + s = 3, 4$. By Lemmas B1 and B2 in the Appendix, we have
$$\Omega = \mathrm{Var}[g(\theta_0)] = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{12}' & \Omega_{22} \end{bmatrix} \qquad (9)$$
with $\Omega_{11} = \mathrm{Var}[g_1(\theta_0)] = \Sigma \otimes (Q'Q)$,
$$\Omega_{12} = E[g_1(\theta_0) g_2(\theta_0)'] = \begin{bmatrix} \mu_{3,0} & \mu_{2,1} & \mu_{2,1} & \mu_{1,2} \\ \mu_{2,1} & \mu_{1,2} & \mu_{1,2} & \mu_{0,3} \end{bmatrix} \otimes (Q'\omega)$$

and
$$\Omega_{22} = \mathrm{Var}[g_2(\theta_0)] = \begin{bmatrix} \mu_{4,0} - 3\sigma_1^4 & \mu_{3,1} - 3\sigma_1^2\sigma_{12} & \mu_{3,1} - 3\sigma_1^2\sigma_{12} & \mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2 \\ \mu_{3,1} - 3\sigma_1^2\sigma_{12} & \mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2 & \mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2 & \mu_{1,3} - 3\sigma_{12}\sigma_2^2 \\ \mu_{3,1} - 3\sigma_1^2\sigma_{12} & \mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2 & \mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2 & \mu_{1,3} - 3\sigma_{12}\sigma_2^2 \\ \mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2 & \mu_{1,3} - 3\sigma_{12}\sigma_2^2 & \mu_{1,3} - 3\sigma_{12}\sigma_2^2 & \mu_{0,4} - 3\sigma_2^4 \end{bmatrix} \otimes (\omega'\omega)$$
$$+ \begin{bmatrix} \sigma_1^4 & \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_{12} & \sigma_{12}^2 \\ \sigma_1^2\sigma_{12} & \sigma_{12}^2 & \sigma_1^2\sigma_2^2 & \sigma_{12}\sigma_2^2 \\ \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_2^2 & \sigma_{12}^2 & \sigma_{12}\sigma_2^2 \\ \sigma_{12}^2 & \sigma_{12}\sigma_2^2 & \sigma_{12}\sigma_2^2 & \sigma_2^4 \end{bmatrix} \otimes \Pi_1 + \begin{bmatrix} \sigma_1^4 & \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_{12} & \sigma_{12}^2 \\ \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_2^2 & \sigma_{12}^2 & \sigma_{12}\sigma_2^2 \\ \sigma_1^2\sigma_{12} & \sigma_{12}^2 & \sigma_1^2\sigma_2^2 & \sigma_{12}\sigma_2^2 \\ \sigma_{12}^2 & \sigma_{12}\sigma_2^2 & \sigma_{12}\sigma_2^2 & \sigma_2^4 \end{bmatrix} \otimes \Pi_2,$$
where $\omega = [\mathrm{vec}_D(\Delta_1),\dots,\mathrm{vec}_D(\Delta_m)]$ and
$$\Pi_1 = \begin{bmatrix} \mathrm{tr}(\Delta_1 \Delta_1) & \cdots & \mathrm{tr}(\Delta_1 \Delta_m) \\ \vdots & \ddots & \vdots \\ \mathrm{tr}(\Delta_m \Delta_1) & \cdots & \mathrm{tr}(\Delta_m \Delta_m) \end{bmatrix}, \qquad \Pi_2 = \begin{bmatrix} \mathrm{tr}(\Delta_1' \Delta_1) & \cdots & \mathrm{tr}(\Delta_1' \Delta_m) \\ \vdots & \ddots & \vdots \\ \mathrm{tr}(\Delta_m' \Delta_1) & \cdots & \mathrm{tr}(\Delta_m' \Delta_m) \end{bmatrix}.$$

Let
$$D = E\left[\tfrac{\partial}{\partial \theta'} g(\theta_0)\right] = [D_1', D_2']', \qquad (10)$$
where
$$D_1 = E\left[\tfrac{\partial}{\partial \theta'} g_1(\theta_0)\right] = -\begin{bmatrix} Q'(G X_1 \beta_0 + \delta_0 G X \gamma_0) & Q' X\gamma_0 & Q' X_1 & 0 \\ 0 & 0 & 0 & Q' X \end{bmatrix}$$

and
$$D_2 = E\left[\tfrac{\partial}{\partial \theta'} g_2(\theta_0)\right] = -\begin{bmatrix} (\sigma_1^2 + \delta_0 \sigma_{12}) \mathrm{tr}(\Delta_1^{(s)} G) & 0_{1 \times (K_X + K_1 + 1)} \\ \vdots & \vdots \\ (\sigma_1^2 + \delta_0 \sigma_{12}) \mathrm{tr}(\Delta_m^{(s)} G) & 0_{1 \times (K_X + K_1 + 1)} \\ (\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_1' G) & 0_{1 \times (K_X + K_1 + 1)} \\ \vdots & \vdots \\ (\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_m' G) & 0_{1 \times (K_X + K_1 + 1)} \\ (\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_1 G) & 0_{1 \times (K_X + K_1 + 1)} \\ \vdots & \vdots \\ (\sigma_{12} + \delta_0 \sigma_2^2) \mathrm{tr}(\Delta_m G) & 0_{1 \times (K_X + K_1 + 1)} \\ 0_{m \times 1} & 0_{m \times (K_X + K_1 + 1)} \end{bmatrix}.$$

The following proposition establishes the consistency and asymptotic normality of the GMM estimator.

Proposition 1. Suppose Assumptions 1-6 hold. Then $\tilde{\theta}_{gmm}$ defined in (7) is a consistent estimator of $\theta_0$ and has the asymptotic distribution
$$\sqrt{n}(\tilde{\theta}_{gmm} - \theta_0) \xrightarrow{d} N(0, \mathrm{AsyVar}(\tilde{\theta}_{gmm})),$$
where
$$\mathrm{AsyVar}(\tilde{\theta}_{gmm}) = \lim_{n\to\infty} [(n^{-1}D)' F'F (n^{-1}D)]^{-1} (n^{-1}D)' F'F (n^{-1}\Omega) F'F (n^{-1}D) [(n^{-1}D)' F'F (n^{-1}D)]^{-1},$$
with $\Omega$ and $D$ defined in (9) and (10) respectively.

Close inspection of $\mathrm{AsyVar}(\tilde{\theta}_{gmm})$ reveals that the optimal $F'F$ is $(n^{-1}\Omega)^{-1}$, by the generalized Schwarz inequality. The following proposition establishes the consistency and asymptotic normality of the GMM estimator with the estimated optimal weighting matrix. It also suggests an over-identifying restrictions (OIR) test based on the proposed GMM estimator.

Proposition 2. Suppose Assumptions 1-6 hold and $n^{-1}\hat{\Omega}$ is a consistent estimator of $n^{-1}\Omega$ defined in (9). Then,
$$\hat{\theta}_{gmm} = \arg\min_\theta g(\theta)' \hat{\Omega}^{-1} g(\theta) \qquad (11)$$

is a consistent estimator of $\theta_0$ and
$$\sqrt{n}(\hat{\theta}_{gmm} - \theta_0) \xrightarrow{d} N(0, [\lim_{n\to\infty} n^{-1} D' \Omega^{-1} D]^{-1}),$$
where $D$ is defined in (10). Furthermore,
$$g(\hat{\theta}_{gmm})' \hat{\Omega}^{-1} g(\hat{\theta}_{gmm}) \xrightarrow{d} \chi^2_{\dim(g) - \dim(\theta)}.$$

4.2 Asymptotic Efficiency

When only the linear moment function $g_1(\theta)$ is used for the GMM estimation, the GMM estimator defined in (11) reduces to the generalized spatial 3SLS estimator in Kelejian and Prucha (2004), because
$$\hat{\theta}_{3sls} = \arg\min_\theta g_1(\theta)' \hat{\Omega}_{11}^{-1} g_1(\theta) = \arg\min_\theta u(\theta)'(\hat{\Sigma}^{-1} \otimes P) u(\theta) = [Z'(\hat{\Sigma}^{-1} \otimes P) Z]^{-1} Z'(\hat{\Sigma}^{-1} \otimes P) y,$$
where $P = Q(Q'Q)^{-1}Q'$, $y = (y_1', y_2')'$,
$$Z = \begin{bmatrix} W y_1 & y_2 & X_1 & 0 \\ 0 & 0 & 0 & X \end{bmatrix},$$
and $\hat{\Sigma}$ is a consistent estimator of $\Sigma$. It follows from Proposition 2 that
$$\sqrt{n}(\hat{\theta}_{3sls} - \theta_0) \xrightarrow{d} N(0, [\lim_{n\to\infty} n^{-1} D_1' \Omega_{11}^{-1} D_1]^{-1}).$$
As
$$D' \Omega^{-1} D - D_1' \Omega_{11}^{-1} D_1 = (D_2 - \Omega_{12}' \Omega_{11}^{-1} D_1)' (\Omega_{22} - \Omega_{12}' \Omega_{11}^{-1} \Omega_{12})^{-1} (D_2 - \Omega_{12}' \Omega_{11}^{-1} D_1),$$
which is positive semi-definite, the proposed GMM estimator is asymptotically more efficient than the 3SLS estimator.

The asymptotic efficiency of the proposed GMM estimator depends on the choices of $Q$ and $\Delta_1, \dots, \Delta_m$. Following Lee (2007), our discussion of asymptotic efficiency focuses on two cases: (i) $u = (u_1', u_2')' \sim N(0, \Sigma \otimes I_n)$, and (ii) $\Delta_j$ has a zero diagonal for all $j = 1,\dots,m$. Let $\mathcal{P}_\Delta$ be the subset of all $\Delta$'s satisfying Assumption 5 such that $\mathrm{diag}(\Delta) = 0$ for all $\Delta \in \mathcal{P}_\Delta$. The sub-class of quadratic moment functions using $\mathcal{P}_\Delta$ is of particular interest because these quadratic moment functions can be robust against an unknown form of heteroskedasticity, as shown in Lin and Lee (2010). Let
$$g^*(\theta) = [g_1^*(\theta)', g_2^*(\theta)']', \qquad (12)$$
where $g_1^*(\theta) = (I_2 \otimes Q^*)' u(\theta)$ and
$$g_2^*(\theta) = [u_1(\psi)' \Delta^* u_1(\psi),\; u_1(\psi)' \Delta^* u_2(\gamma),\; u_2(\gamma)' \Delta^* u_1(\psi),\; u_2(\gamma)' \Delta^* u_2(\gamma)]'.$$
In cases (i) and (ii),
$$\Omega^* = \mathrm{Var}[g^*(\theta_0)] = \begin{bmatrix} \Sigma \otimes (Q^{*\prime} Q^*) & 0 \\ 0 & \Omega_{22}^* \end{bmatrix}, \qquad (13)$$
where
$$\Omega_{22}^* = \begin{bmatrix} \sigma_1^4 & \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_{12} & \sigma_{12}^2 \\ \sigma_1^2\sigma_{12} & \sigma_{12}^2 & \sigma_1^2\sigma_2^2 & \sigma_{12}\sigma_2^2 \\ \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_2^2 & \sigma_{12}^2 & \sigma_{12}\sigma_2^2 \\ \sigma_{12}^2 & \sigma_{12}\sigma_2^2 & \sigma_{12}\sigma_2^2 & \sigma_2^4 \end{bmatrix} \mathrm{tr}(\Delta^* \Delta^*) + \begin{bmatrix} \sigma_1^4 & \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_{12} & \sigma_{12}^2 \\ \sigma_1^2\sigma_{12} & \sigma_1^2\sigma_2^2 & \sigma_{12}^2 & \sigma_{12}\sigma_2^2 \\ \sigma_1^2\sigma_{12} & \sigma_{12}^2 & \sigma_1^2\sigma_2^2 & \sigma_{12}\sigma_2^2 \\ \sigma_{12}^2 & \sigma_{12}\sigma_2^2 & \sigma_{12}\sigma_2^2 & \sigma_2^4 \end{bmatrix} \mathrm{tr}(\Delta^{*\prime} \Delta^*).$$
The following proposition gives the infeasible best GMM (BGMM) estimator
$$\tilde{\theta}_{bgmm} = \arg\min_\theta g^*(\theta)' \Omega^{*-1} g^*(\theta) \qquad (14)$$
with the optimal $Q^*$ and $\Delta^*$ in cases (i) and (ii) respectively.

Proposition 3. Suppose Assumptions 1-6 hold. Let $G = WS^{-1}$. (i) Suppose $u \sim N(0, \Sigma \otimes I_n)$. The BGMM estimator defined in (14) with $Q^* = [GX, X]$ and $\Delta^* = G - n^{-1}\mathrm{tr}(G) I_n$ is the most efficient one in the class of GMM estimators defined in (7). (ii) Without the normality assumption on $u$, the BGMM estimator defined in (14) with $Q^* = [GX, X]$ and $\Delta^* = G - \mathrm{diag}(G)$ is the most efficient one in the sub-class of GMM estimators defined in (7) with $\Delta_j \in \mathcal{P}_\Delta$ for all $j = 1,\dots,m$.

Under normality, the model can also be efficiently estimated by the ML estimator.
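For reference, the closed-form 3SLS expression above can be computed directly. A minimal sketch, assuming generic inputs; the function name is ours, and the block layout of $Z$ follows the display above.

```python
import numpy as np

def spatial_3sls(y1, y2, X1, X, W, Q, Sigma_hat):
    """Generalized spatial 3SLS: [Z'(Sigma^-1 (x) P)Z]^-1 Z'(Sigma^-1 (x) P)y."""
    n, K1 = X1.shape
    P = Q @ np.linalg.solve(Q.T @ Q, Q.T)       # projection onto the IV space
    Z = np.block([
        [np.column_stack([W @ y1, y2, X1]), np.zeros((n, X.shape[1]))],
        [np.zeros((n, 2 + K1)), X],
    ])
    y = np.concatenate([y1, y2])
    A = np.kron(np.linalg.inv(Sigma_hat), P)    # (Sigma_hat^-1 (x) P)
    return np.linalg.solve(Z.T @ A @ Z, Z.T @ A @ y)
```

With noiseless data satisfying (1) and (2) exactly, $y = Z\theta_0$, so the estimator reproduces $\theta_0$ whenever $Z'(\hat{\Sigma}^{-1}\otimes P)Z$ is nonsingular.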

To gain some intuition for the optimal $Q^*$ and $\Delta^*$ in case (i), we compare the linear and quadratic moment functions utilized by the GMM estimator with the first order partial derivatives of the log likelihood function. Let $G(\lambda) = W S(\lambda)^{-1}$, where $S(\lambda) = I_n - \lambda W$. The log likelihood function based on the joint normal distribution of $y = (y_1', y_2')'$ is⁹
$$L(\theta, \Sigma) = -n \ln(2\pi) - \tfrac{1}{2} \ln|\Sigma \otimes I_n| + \ln|S(\lambda)| - \tfrac{1}{2} u(\theta)'(\Sigma \otimes I_n)^{-1} u(\theta), \qquad (15)$$
with first order partial derivatives
$$\frac{\partial L(\theta, \hat{\Sigma})}{\partial \lambda} = [(\delta X\gamma + X_1\beta)' G(\lambda)', 0_{1\times n}](\hat{\Sigma}^{-1} \otimes I_n) u(\theta) + \frac{\hat{\sigma}_2^2}{|\hat{\Sigma}|} u_1(\psi)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda)) I_n] u_1(\psi)$$
$$- \frac{\hat{\sigma}_{12}}{|\hat{\Sigma}|} u_2(\gamma)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda)) I_n] u_1(\psi) + \frac{\delta \hat{\sigma}_2^2}{|\hat{\Sigma}|} u_1(\psi)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda)) I_n] u_2(\gamma)$$
$$- \frac{\delta \hat{\sigma}_{12}}{|\hat{\Sigma}|} u_2(\gamma)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda)) I_n] u_2(\gamma)$$
and
$$\frac{\partial L(\theta, \hat{\Sigma})}{\partial \delta} = [(X\gamma)', 0_{1\times n}](\hat{\Sigma}^{-1} \otimes I_n) u(\theta),$$
$$\frac{\partial L(\theta, \hat{\Sigma})}{\partial \beta} = [X_1', 0_{K_1 \times n}](\hat{\Sigma}^{-1} \otimes I_n) u(\theta),$$
$$\frac{\partial L(\theta, \hat{\Sigma})}{\partial \gamma} = [0_{K_X \times n}, X'](\hat{\Sigma}^{-1} \otimes I_n) u(\theta),$$
where $\hat{\Sigma}$ is the ML estimator of $\Sigma$ given by
$$\hat{\Sigma} = \begin{bmatrix} \hat{\sigma}_1^2 & \hat{\sigma}_{12} \\ \hat{\sigma}_{12} & \hat{\sigma}_2^2 \end{bmatrix} = n^{-1} \begin{bmatrix} u_1(\psi)' u_1(\psi) & u_1(\psi)' u_2(\gamma) \\ u_1(\psi)' u_2(\gamma) & u_2(\gamma)' u_2(\gamma) \end{bmatrix}.$$
Close inspection reveals the similarity between the ML and BGMM estimators under normality, as the first order partial derivatives of the log likelihood function can be considered as linear combinations of the moment functions $Q^*(\lambda)' u_1(\psi)$, $Q^*(\lambda)' u_2(\gamma)$, $u_1(\psi)' \Delta^*(\lambda) u_1(\psi)$, $u_1(\psi)' \Delta^*(\lambda) u_2(\gamma)$, $u_2(\gamma)' \Delta^*(\lambda) u_1(\psi)$, and $u_2(\gamma)' \Delta^*(\lambda) u_2(\gamma)$, with $Q^*(\lambda) = [G(\lambda)X, X]$ and $\Delta^*(\lambda) = G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda)) I_n$.

⁹ The detailed derivation of the log likelihood function and its partial derivatives can be found in Appendix A.

The optimal $Q^*$ and $\Delta^*$ are not feasible as $G$ involves the unknown parameter $\lambda_0$. Suppose there exists a $\sqrt{n}$-consistent preliminary estimator $\check{\lambda}$ of $\lambda_0$ (say, the 2SLS estimator with IV matrix $Q = [WX, X]$). Then, the feasible optimal $\hat{Q}^*$ and $\hat{\Delta}^*$ can be obtained by replacing $\lambda_0$ in $Q^*$ and $\Delta^*$ by $\check{\lambda}$. Furthermore, suppose $\check{\sigma}_1^2, \check{\sigma}_{12}, \check{\sigma}_2^2$ are consistent preliminary estimators of $\sigma_1^2, \sigma_{12}, \sigma_2^2$. Then, $n^{-1}\hat{\Omega}^*$ is a consistent estimator of $n^{-1}\Omega^*$ defined in (13), with the unknown parameters in $\Omega^*$ replaced by $\check{\lambda}, \check{\sigma}_1^2, \check{\sigma}_{12}, \check{\sigma}_2^2$. The feasible BGMM estimator is then given by
$$\hat{\theta}_{bgmm} = \arg\min_\theta \hat{g}^*(\theta)' \hat{\Omega}^{*-1} \hat{g}^*(\theta), \qquad (16)$$
where $\hat{g}^*(\theta)$ is obtained by replacing $Q^*$ and $\Delta^*$ in $g^*(\theta)$ with $\hat{Q}^*$ and $\hat{\Delta}^*$. Following an argument similar to the corresponding proof in Lee (2007), the feasible BGMM estimator $\hat{\theta}_{bgmm}$ can be shown to have the same limiting distribution as its infeasible counterpart $\tilde{\theta}_{bgmm}$.

Proposition 4. Suppose Assumptions 1-6 hold, $\check{\lambda}$ is a $\sqrt{n}$-consistent estimator of $\lambda_0$, and $\check{\Sigma}$ is a consistent estimator of $\Sigma$. The feasible BGMM estimator $\hat{\theta}_{bgmm}$ defined in (16) is asymptotically equivalent to the corresponding infeasible BGMM estimator $\tilde{\theta}_{bgmm}$.

Under Assumption 3, $G = WS^{-1} = W + \lambda_0 W^2 + \lambda_0^2 W^3 + \cdots$. Thus, $G$ can be approximated by the leading order terms of the series expansion, i.e., $W, W^2, W^3, \dots$. Therefore, a convenient alternative to the BGMM estimator under normality for empirical researchers would be the GMM estimator with $Q = [WX, \dots, W^m X, X]$ and $\Delta_1 = W$, $\Delta_2 = W^2 - n^{-1}\mathrm{tr}(W^2) I_n$, $\dots$, $\Delta_m = W^m - n^{-1}\mathrm{tr}(W^m) I_n$, for some fixed $m$.

5 Monte Carlo Experiments

We conduct a small Monte Carlo simulation experiment to study the finite sample performance of the proposed GMM estimator based on the following model:
$$y_1 = \lambda_0 W y_1 + \delta_0 y_2 + \beta_0 x_1 + u_1$$
$$y_2 = \gamma_0 x_2 + u_2.$$
In the DGP, we set $\lambda_0 = 0.3$ and $\gamma_0 = 1$, and generate $x_1$, $x_2$ and $u = (u_1', u_2')'$ as $x_1 \sim N(0, I_n)$, $x_2 \sim N(0, I_n)$, and $u \sim N(0, \Sigma \otimes I_n)$, where
$$\Sigma = \begin{bmatrix} 1 & \sigma_{12} \\ \sigma_{12} & 1 \end{bmatrix}.$$
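The convenient alternative just described is mechanical to construct: powers of $W$ supply both the IV blocks and the trace-centered weighting matrices. A short sketch follows; the helper name is ours, not the paper's.

```python
import numpy as np

def power_instruments(W, X, m):
    """Build Q = [WX, ..., W^m X, X] and Delta_j = W^j - n^-1 tr(W^j) I_n, j = 1..m."""
    n = W.shape[0]
    blocks, Deltas, Wp = [], [], np.eye(n)
    for _ in range(m):
        Wp = Wp @ W                                   # current power W^j
        blocks.append(Wp @ X)
        Deltas.append(Wp - np.trace(Wp) / n * np.eye(n))
    return np.hstack(blocks + [X]), Deltas
```

Since $\mathrm{tr}(W) = 0$ under Assumption 3, the first weighting matrix reduces to $W$ itself, matching the choice $\Delta_1 = W$ above.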

We conduct 1000 replications in the simulation experiment for di erent speci cations with n f45; 490g, 1 f0:1; 0:5; 0:9g, and ( 0 ; 0 ) f(0:5; 0:5); (0:; 0:)g From the reduce form equation (4), E(Wy 1 ) = Gx 1 0 + 0 Gx 0 Therefore, 0 = 0 = 0:5 corresponds to the case that the IVs based on E(Wy 1 ) are informative and 0 = 0 = 0: corresponds to the case that the IVs based on E(Wy 1 ) are less informative Let W 0 denote the spatial weights matrix for the study of crimes across 49 districts in Columbus, Ohio, in Anselin (1988) For n = 45, we set W = I 5 W 0, and for n = 490, we set W = I 10 W 0 Let b G = W(I n b W) 1, where b is the SLS estimator of 0 using the IV matrix Q = [WX; W X; X], where X = [x 1 ; x ] Let Q b = [ GX; b X] In the experiment, we consider the following estimators (a) The SLS estimator of equation (1) with the linear moment function Q b 0 u 1 () (b) The SLS estimator of equations (1) and () with the linear moment function (I Q) b 0 u() (c) The single-equation GMM (GMM-1) estimator of equation (1) with the linear moment function bq 0 u 1 () and the quadratic moment function u 1 () 0 [ b G n 1 tr( b G)I n ]u 1 () (d) The system GMM (GMM-) estimator of equations (1) and () with the linear moment function (I Q) b 0 u() and the quadratic moment functions u 1 () 0 [ G b n 1 tr( G)I b n ]u 1 (), u 1 () 0 [ G b n 1 tr( G)I b n ]u (), u () 0 [ G b n 1 tr( G)I b n ]u 1 (), and u () 0 [ G b n 1 tr( G)I b n ]u () Although the SLS estimator and the single-equation GMM estimator only use limited information in equation (1) and thus may not be as e cient as their counterparts (ie the SLS estimator and the system GMM estimator respectively) that use full information in the whole system, these estimators require weaker assumptions on the reduced form equation () and thus may be desirable under certain circumstances The estimation results are reported in Tables -5 We report the mean and standard deviation (SD) of the empirical distributions of the estimates To 
facilitate the comparison of different estimators, we also report their root mean squared errors (RMSE). The main observations from the experiment are summarized as follows.

[Insert Tables 2-5 here]

(i) The 2SLS and 3SLS estimators of $\lambda_0$ are upwards biased, with large SDs, when the IVs for $Wy_1$ are less informative. For example, when $n = 245$ and $\sigma_{12} = 0.1$, the 2SLS and 3SLS estimates of $\lambda_0$ reported in Table 4 are upwards biased by about 10%. The biases and SDs are reduced as the sample size increases. The 3SLS estimators perform better as $\sigma_{12}$ increases.

(ii) The single-equation GMM (GMM-1) estimator of $\lambda_0$ is upwards biased when the IVs for $Wy_1$ are less informative. When $n = 245$ and $\sigma_{12} = 0.1$, the GMM-1 estimates of $\lambda_0$ reported in Table 4 are also upwards biased. The bias is reduced as the sample size increases. The GMM-1 estimator of $\lambda_0$ has a smaller SD than the 2SLS estimator, and the SD reduction is more significant when the IVs for $Wy_1$ are less informative: in both Table 2 and Table 4, when $\sigma_{12} = 0.1$, the GMM-1 estimator reduces the SD of the 2SLS estimator substantially.

(iii) The system GMM (GMM-2) estimator of $\lambda_0$ is upwards biased when the sample size is moderate ($n = 245$) and the IVs for $Wy_1$ are less informative. The bias is reduced as $\sigma_{12}$ increases. When $n = 490$, the GMM-2 estimator is essentially unbiased even if the IVs are weak. The GMM-2 estimators of $\lambda_0$ and $\phi_0$ have smaller SDs than the corresponding GMM-1 estimators. The reduction in the SD is more significant when the endogeneity problem is more severe (i.e., $\sigma_{12}$ is larger) and/or the IVs for $Wy_1$ are less informative. For example, in Tables 2 and 5, when $\sigma_{12} = 0.9$, the GMM-2 estimator of $\lambda_0$ has a markedly smaller SD than the GMM-1 estimator, and in both cases the GMM-2 estimator of $\phi_0$ also reduces the SD of the corresponding GMM-1 estimator.

6 Conclusion

In this paper, we propose a general GMM framework for the estimation of SAR models with endogenous regressors. We introduce a new set of quadratic moment conditions to construct the GMM estimator, based on the correlation structure of the spatially lagged dependent variable with the model disturbance term and with the endogenous regressor. We establish the consistency and asymptotic normality of the proposed GMM estimator and discuss the optimal choice of moment conditions. We also conduct a Monte Carlo experiment to show that the GMM estimator works well in finite samples. The proposed GMM estimator
utilizes the correlation across equations (1) and (2) to construct moment conditions, and thus can be considered a full information estimator. If we only use the moment conditions based on $u_1(\theta)$, i.e., the residual function of equation (1), the proposed GMM estimator becomes a single-equation GMM estimator. Although the single-equation GMM estimator may not be as efficient as the full information GMM estimator, the single-equation GMM estimator requires weaker assumptions on the reduced form equation (2) and thus may be desirable under certain circumstances. The Monte Carlo experiment shows that the full information GMM estimator improves the efficiency of the single-equation GMM estimator when the endogeneity problem is severe and/or the IVs for the spatially lagged dependent variable are weak.

References

Anselin, L. (1988). Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dordrecht.

Breusch, T., Qian, H., Schmidt, P. and Wyhowski, D. (1999). Redundancy of moment conditions, Journal of Econometrics 91: 89-111.

Carrell, S. E., Sacerdote, B. I. and West, J. E. (2013). From natural variation to optimal policy? The importance of endogenous peer group formation, Econometrica 81: 855-882.

Cliff, A. and Ord, J. (1973). Spatial Autocorrelation, Pion, London.

Cliff, A. and Ord, J. (1981). Spatial Processes: Models and Applications, Pion, London.

Denbee, E., Julliard, C., Li, Y. and Yuan, K. (2014). Network risk and key players: A structural analysis of interbank liquidity. AXA working paper series, FMG discussion paper, LSE.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators, Econometrica 50: 1029-1054.

Kelejian, H. H. and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances, Journal of Real Estate Finance and Economics 17: 99-121.

Kelejian, H. H. and Prucha, I. R. (2004). Estimation of simultaneous systems of spatially interrelated cross sectional equations, Journal of Econometrics 118: 27-50.

Kelejian, H. H. and Prucha, I. R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances, Journal of Econometrics 157: 53-67.

König, M., Liu, X. and Zenou, Y. (2014). R&D networks: Theory, empirics and policy implications. CEPR discussion paper.

Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models, Econometrica 72: 1899-1925.

Lee, L. F. (2007). GMM and 2SLS estimation of mixed regressive, spatial autoregressive models, Journal of Econometrics 137: 489-514.

Lee, L. F. and Liu, X. (2010). Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances, Econometric Theory 26: 187-230.

Lin, X. (2010). Identifying peer effects in student academic achievement by a spatial autoregressive model with group unobservables, Journal of Labor Economics 28: 825-860.

Lin, X. and Lee, L. F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics 157: 34-52.

Lindquist, M. J. and Zenou, Y. (2014). Key players in co-offending networks. Working paper, Stockholm University.

Liu, X. (2012). On the consistency of the LIML estimator of a spatial autoregressive model with many instruments, Economics Letters 116: 472-475.

Liu, X. and Lee, L. F. (2013). Two stage least squares estimation of spatial autoregressive models with endogenous regressors and many instruments, Econometric Reviews 32: 734-753.

Newey, W. K. and McFadden, D. (1994). Large sample estimation and hypothesis testing, in R. F. Engle and D. L. McFadden (eds), Handbook of Econometrics, Vol. IV, Elsevier, Amsterdam, pp. 2111-2245.

Patacchini, E. and Zenou, Y. (2012). Juvenile delinquency and conformism, Journal of Law, Economics, and Organization 28: 1-31.

Peracchi, F. (2001). Econometrics, John Wiley & Sons, New York.

Sacerdote, B. (2011). Peer effects in education: How might they work, how big are they and how much do we know thus far?, in E. A. Hanushek, S. Machin and L. Woessmann (eds), Handbook of the Economics of Education, Vol. 3, Elsevier, pp. 249-277.

Yang, K. and Lee, L. F. (2014). Identification and QML estimation of multivariate and simultaneous spatial autoregressive models. Working paper, The Ohio State University.

A Likelihood Function of the SAR Model with Endogenous Regressors

Let
$$\mu_y(\theta) = \begin{bmatrix} S^{-1}(\lambda)(X_2\gamma\phi + X_1\beta) \\ X_2\gamma \end{bmatrix} \quad \text{and} \quad R(\lambda, \phi) = \begin{bmatrix} S^{-1}(\lambda) & \phi S^{-1}(\lambda) \\ 0 & I_n \end{bmatrix},$$
where $S(\lambda) = I_n - \lambda W$. From the reduced form equations, $y = (y_1', y_2')' = \mu_y(\theta_0) + R(\lambda_0, \phi_0)u$, where $u = (u_1', u_2')'$. Under normality, $u \sim N(0, \Sigma_0 \otimes I_n)$, and thus $y \sim N(\mu_y, R(\Sigma_0 \otimes I_n)R')$, where $\mu_y = \mu_y(\theta_0)$ and $R = R(\lambda_0, \phi_0)$. Hence, the log likelihood function of (1) and (2) is given by
$$L(\theta, \Sigma) = -n\ln(2\pi) - \tfrac12 \ln\big|R(\lambda,\phi)(\Sigma\otimes I_n)R(\lambda,\phi)'\big| - \tfrac12 [y - \mu_y(\theta)]'\big[R(\lambda,\phi)(\Sigma\otimes I_n)R(\lambda,\phi)'\big]^{-1}[y - \mu_y(\theta)].$$
As $u(\theta) = R^{-1}(\lambda,\phi)[y - \mu_y(\theta)]$ and $|R^{-1}(\lambda,\phi)| = |S(\lambda)|$, the log likelihood function can be written as
$$L(\theta, \Sigma) = -n\ln(2\pi) - \tfrac12 \ln|\Sigma\otimes I_n| + \ln|S(\lambda)| - \tfrac12 u(\theta)'(\Sigma\otimes I_n)^{-1}u(\theta).$$
The first order partial derivatives of the log likelihood function are
$$\frac{\partial L(\theta,\Sigma)}{\partial\lambda} = -\mathrm{tr}(G(\lambda)) + [y_1'W', 0](\Sigma\otimes I_n)^{-1}u(\theta), \qquad \frac{\partial L(\theta,\Sigma)}{\partial\phi} = [y_2', 0](\Sigma\otimes I_n)^{-1}u(\theta),$$
$$\frac{\partial L(\theta,\Sigma)}{\partial\beta} = [X_1', 0](\Sigma\otimes I_n)^{-1}u(\theta), \qquad \frac{\partial L(\theta,\Sigma)}{\partial\gamma} = [0, X_2'](\Sigma\otimes I_n)^{-1}u(\theta),$$
and
$$\frac{\partial L(\theta,\Sigma)}{\partial(\Sigma\otimes I_n)^{-1}} = \tfrac12(\Sigma\otimes I_n) - \tfrac12 u(\theta)u(\theta)',$$
where $G(\lambda) = WS(\lambda)^{-1}$. Since $Wy_1 = G(\lambda)(u_1(\theta) + \phi u_2(\theta)) + G(\lambda)(X_2\gamma\phi + X_1\beta)$ and $y_2 =$
$X_2\gamma + u_2(\theta)$, then
$$\frac{\partial L(\theta,\Sigma)}{\partial\lambda} = -\mathrm{tr}(G(\lambda)) + [(X_2\gamma\phi + X_1\beta)'G(\lambda)', 0](\Sigma\otimes I_n)^{-1}u(\theta) + [(u_1(\theta) + \phi u_2(\theta))'G(\lambda)', 0](\Sigma\otimes I_n)^{-1}u(\theta) \tag{18}$$
and
$$\frac{\partial L(\theta,\Sigma)}{\partial\phi} = [(X_2\gamma)', 0](\Sigma\otimes I_n)^{-1}u(\theta) + [u_2(\theta)', 0](\Sigma\otimes I_n)^{-1}u(\theta). \tag{19}$$
From the derivative with respect to $(\Sigma\otimes I_n)^{-1}$, the ML estimator for $\Sigma$ is given by
$$\hat\Sigma = \begin{bmatrix} \hat\sigma_1^2 & \hat\sigma_{12} \\ \hat\sigma_{12} & \hat\sigma_2^2 \end{bmatrix} = n^{-1}\begin{bmatrix} u_1(\theta)'u_1(\theta) & u_1(\theta)'u_2(\theta) \\ u_1(\theta)'u_2(\theta) & u_2(\theta)'u_2(\theta) \end{bmatrix}.$$
Substitution of $\hat\Sigma$ into (18) and (19) gives
$$\frac{\partial L(\theta,\hat\Sigma)}{\partial\lambda} = [(X_2\gamma\phi + X_1\beta)'G(\lambda)', 0](\hat\Sigma\otimes I_n)^{-1}u(\theta) + \frac{\hat\sigma_2^2}{|\hat\Sigma|}u_1(\theta)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda))I_n]u_1(\theta)$$
$$- \frac{\hat\sigma_{12}}{|\hat\Sigma|}u_2(\theta)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda))I_n]u_1(\theta) + \frac{\phi\hat\sigma_2^2}{|\hat\Sigma|}u_1(\theta)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda))I_n]u_2(\theta)$$
$$- \frac{\phi\hat\sigma_{12}}{|\hat\Sigma|}u_2(\theta)'[G(\lambda) - n^{-1}\mathrm{tr}(G(\lambda))I_n]u_2(\theta)$$
and
$$\frac{\partial L(\theta,\hat\Sigma)}{\partial\phi} = [(X_2\gamma)', 0](\hat\Sigma\otimes I_n)^{-1}u(\theta).$$

B Lemmas

For ease of reference, we list some useful results without proofs. Lemmas B1-B6 can be found in (or are straightforward extensions of the lemmas in) Lee (2007). Lemma B7 is a special case of a lemma in Yang and Lee (2014). Lemmas B8 and B9 are from Breusch et al. (1999).

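As a numerical sanity check on the first of the moment identities below (Lemma B1), consider the Gaussian special case, where $\mu_{4,0} = 3\sigma_1^4$ so the $\mathrm{vec}_D$ term vanishes and, for trace-zero $A$ and $B$, $\mathrm{E}(u_1'Au_1 \cdot u_1'Bu_1) = \sigma_1^4\,\mathrm{tr}(AB^{(s)})$ with $B^{(s)} = B + B'$. A simulation sketch (matrices and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arbitrary trace-zero matrices.
n = 4
A = rng.standard_normal((n, n))
A -= (np.trace(A) / n) * np.eye(n)
B = rng.standard_normal((n, n))
B -= (np.trace(B) / n) * np.eye(n)

# Monte Carlo estimate of E(u1'Au1 * u1'Bu1) with u1 ~ N(0, I), sigma_1^2 = 1.
reps = 500_000
u1 = rng.standard_normal((reps, n))
qA = np.einsum('ri,ij,rj->r', u1, A, u1)      # u1'Au1, one value per draw
qB = np.einsum('ri,ij,rj->r', u1, B, u1)      # u1'Bu1
lhs = np.mean(qA * qB)

rhs = np.trace(A @ (B + B.T))                 # sigma_1^4 * tr(A B^{(s)}) with sigma_1 = 1
```

The simulated mean and the trace formula agree up to Monte Carlo error, illustrating why only the $\mathrm{tr}(AB^{(s)})$ terms survive under normality.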
Lemma B1 Let $A$ and $B$ be $n \times n$ nonstochastic matrices such that $\mathrm{tr}(A) = \mathrm{tr}(B) = 0$. Then,
(i) $\mathrm{E}(u_1'Au_1 \cdot u_1'Bu_1) = (\mu_{4,0} - 3\sigma_1^4)\mathrm{vec}_D(A)'\mathrm{vec}_D(B) + \sigma_1^4\,\mathrm{tr}(AB^{(s)})$;
(ii) $\mathrm{E}(u_1'Au_1 \cdot u_1'Bu_2) = (\mu_{3,1} - 3\sigma_1^2\sigma_{12})\mathrm{vec}_D(A)'\mathrm{vec}_D(B) + \sigma_1^2\sigma_{12}\,\mathrm{tr}(AB^{(s)})$;
(iii) $\mathrm{E}(u_1'Au_1 \cdot u_2'Bu_2) = (\mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2)\mathrm{vec}_D(A)'\mathrm{vec}_D(B) + \sigma_{12}^2\,\mathrm{tr}(AB^{(s)})$;
(iv) $\mathrm{E}(u_1'Au_2 \cdot u_1'Bu_2) = (\mu_{2,2} - \sigma_1^2\sigma_2^2 - 2\sigma_{12}^2)\mathrm{vec}_D(A)'\mathrm{vec}_D(B) + \sigma_1^2\sigma_2^2\,\mathrm{tr}(AB') + \sigma_{12}^2\,\mathrm{tr}(AB)$;
(v) $\mathrm{E}(u_1'Au_2 \cdot u_2'Bu_2) = (\mu_{1,3} - 3\sigma_{12}\sigma_2^2)\mathrm{vec}_D(A)'\mathrm{vec}_D(B) + \sigma_{12}\sigma_2^2\,\mathrm{tr}(AB^{(s)})$;
(vi) $\mathrm{E}(u_2'Au_2 \cdot u_2'Bu_2) = (\mu_{0,4} - 3\sigma_2^4)\mathrm{vec}_D(A)'\mathrm{vec}_D(B) + \sigma_2^4\,\mathrm{tr}(AB^{(s)})$;
where $\mu_{r,s} = \mathrm{E}(u_{1i}^r u_{2i}^s)$.

Lemma B2 Let $A$ be an $n \times n$ nonstochastic matrix and $c$ be an $n \times 1$ nonstochastic vector. Then,
(i) $\mathrm{E}(u_1'Au_1 \cdot u_1'c) = \mu_{3,0}\,\mathrm{vec}_D(A)'c$;
(ii) $\mathrm{E}(u_1'Au_1 \cdot u_2'c) = \mathrm{E}(u_1'Au_2 \cdot u_1'c) = \mu_{2,1}\,\mathrm{vec}_D(A)'c$;
(iii) $\mathrm{E}(u_1'Au_2 \cdot u_2'c) = \mathrm{E}(u_2'Au_2 \cdot u_1'c) = \mu_{1,2}\,\mathrm{vec}_D(A)'c$;
(iv) $\mathrm{E}(u_2'Au_2 \cdot u_2'c) = \mu_{0,3}\,\mathrm{vec}_D(A)'c$.

Lemma B3 Let $A$ be an $n \times n$ nonstochastic matrix with row and column sums uniformly bounded in absolute value. Then, (i) $n^{-1}u_1'Au_1 = O_p(1)$ and $n^{-1}u_1'Au_2 = O_p(1)$; and (ii) $n^{-1}[u_1'Au_1 - \mathrm{E}(u_1'Au_1)] = o_p(1)$ and $n^{-1}[u_1'Au_2 - \mathrm{E}(u_1'Au_2)] = o_p(1)$.

Lemma B4 Let $A$ be an $n \times n$ nonstochastic matrix with row and column sums uniformly bounded in absolute value, and let $c$ be an $n \times 1$ nonstochastic vector with uniformly bounded elements. Then, $n^{-1/2}c'Au_r = O_p(1)$ and $n^{-1}c'Au_r = o_p(1)$. Furthermore, if $\lim_{n\to\infty} n^{-1}c'AA'c$ exists and is positive, then $n^{-1/2}c'Au_r \xrightarrow{d} N(0, \sigma_r^2 \lim_{n\to\infty} n^{-1}c'AA'c)$, for $r = 1, 2$.

Lemma B5 Suppose $n^{-1}[Z_n(\theta) - \bar Z(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, where $\bar Z(\theta)$ is uniquely minimized at $\theta_0$. Define $\hat\theta = \arg\min Z_n(\theta)$ and $\tilde\theta = \arg\min \tilde Z_n(\theta)$. If $n^{-1}[Z_n(\theta) - \tilde Z_n(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, then both $\hat\theta$ and $\tilde\theta$ are consistent estimators of $\theta_0$. Furthermore, assume that $n^{-1}\partial^2 Z_n(\theta)/\partial\theta\partial\theta'$ converges uniformly to a matrix that is nonsingular at $\theta_0$, and $n^{-1/2}\partial Z_n(\theta_0)/\partial\theta = O_p(1)$. If $n^{-1}[\partial^2 Z_n(\theta)/\partial\theta\partial\theta' - \partial^2 \tilde Z_n(\theta)/\partial\theta\partial\theta'] = o_p(1)$ uniformly in $\theta \in \Theta$ and $n^{-1/2}[\partial Z_n(\theta_0)/\partial\theta - \partial \tilde Z_n(\theta_0)/\partial\theta] = o_p(1)$, then $\sqrt n(\hat\theta - \theta_0) - \sqrt n(\tilde\theta - \theta_0) = o_p(1)$.

Lemma B6 Let $A$ and $B$ be $n \times n$ nonstochastic matrices with row and column sums uniformly bounded in absolute value, and let $c_1$ and $c_2$ be $n \times 1$ nonstochastic vectors with uniformly bounded elements. Let $\mathcal G$ be either $G$, $G - n^{-1}\mathrm{tr}(G)I_n$ or $G - \mathrm{diag}(G)$, and let $\hat{\mathcal G}$ be obtained by replacing $\lambda_0$ in $\mathcal G$ by its $\sqrt n$-consistent estimator $\hat\lambda$. Suppose the maintained assumptions hold. Then, $n^{-1}c_1'(\hat{\mathcal G} - \mathcal G)c_2 = o_p(1)$, $n^{-1/2}c_1'(\hat{\mathcal G} - \mathcal G)Au_r = o_p(1)$, $n^{-1}u_r'A'(\hat{\mathcal G} - \mathcal G)Bu_s = o_p(1)$, and $n^{-1/2}u_r'(\hat{\mathcal G} - \mathcal G)u_s = o_p(1)$, for $r, s = 1, 2$.

Lemma B7 Let $A_{r,s}$ be an $n \times n$ nonstochastic matrix with row and column sums uniformly bounded in absolute value, for $r, s = 1, 2$. Let $c_1$ and $c_2$ be $n \times 1$ nonstochastic vectors with uniformly bounded elements. Let $\sigma_v^2 = \mathrm{Var}(v)$, where $v = \sum_{r=1}^2 c_r'u_r + \sum_{s=1}^2\sum_{r=1}^2 \big(u_s'A_{r,s}u_r - \mathrm{E}[u_s'A_{r,s}u_r]\big)$. Suppose $\sigma_v^2 = O(n)$ and $n^{-1}\sigma_v^2$ is bounded away from zero. Then, $\sigma_v^{-1}v \xrightarrow{d} N(0, 1)$.

Lemma B8 Consider the set of moment conditions $g(\theta) = [g_1(\theta)', g_2(\theta)']'$ with $\mathrm{E}[g(\theta_0)] = 0$. Define $D_i = \mathrm{E}[\partial g_i(\theta_0)/\partial\theta']$ and $\Omega_{ij} = \mathrm{E}[g_i(\theta_0)g_j(\theta_0)']$ for $i, j = 1, 2$. The following statements are equivalent: (i) $g_2$ is redundant given $g_1$; (ii) $D_2 = \Omega_{21}\Omega_{11}^{-1}D_1$; and (iii) there exists a matrix $A$ such that $D_2 = \Omega_{21}A$ and $D_1 = \Omega_{11}A$.

Lemma B9 Consider the set of moment conditions $g(\theta) = [g_1(\theta)', g_2(\theta)', g_3(\theta)']'$ with $\mathrm{E}[g(\theta_0)] = 0$. Then $(g_2', g_3')'$ is redundant given $g_1$ if and only if $g_2$ is redundant given $g_1$ and $g_3$ is redundant given $g_1$.

C Proofs

Proof of Proposition 1: To prove consistency, first we need to show the uniform convergence of $n^{-2}g(\theta)'F'Fg(\theta)$ in probability. For some typical row $F_i$ of $F$,
$$F_i g(\theta) = f_{1,i}Q'u_1(\theta) + f_{2,i}Q'u_2(\theta) + u_1(\theta)'\Big(\sum_{j=1}^m f_{1,ij}P_j\Big)u_1(\theta) + u_1(\theta)'\Big(\sum_{j=1}^m f_{2,ij}P_j\Big)u_2(\theta) + u_2(\theta)'\Big(\sum_{j=1}^m f_{3,ij}P_j\Big)u_1(\theta) + u_2(\theta)'\Big(\sum_{j=1}^m f_{4,ij}P_j\Big)u_2(\theta),$$
where $F_i = (f_{1,i}, f_{2,i}, f_{1,i1}, \dots, f_{1,im}, \dots, f_{4,i1}, \dots, f_{4,im})$, and $f_{1,i}$ and $f_{2,i}$ are row sub-vectors. As $u_1(\theta) = d_1(\theta) + r_1(\theta)$, where $d_1(\theta) = (\lambda_0 - \lambda)G(\phi_0 X_2\gamma_0 + X_1\beta_0) + (\phi_0 - \phi)X_2\gamma_0 + X_1(\beta_0 - \beta)$ and
$r_1(\theta) = u_1 + (\lambda_0 - \lambda)(Gu_1 + \phi_0 Gu_2) + (\phi_0 - \phi)u_2$, we have
$$u_1(\theta)'\Big(\sum_{j=1}^m f_{1,ij}P_j\Big)u_1(\theta) = d_1(\theta)'\Big(\sum_{j=1}^m f_{1,ij}P_j\Big)d_1(\theta) + l_1(\theta) + q_1(\theta),$$
where $l_1(\theta) = d_1(\theta)'\sum_{j=1}^m f_{1,ij}P_j^{(s)}r_1(\theta)$ and $q_1(\theta) = r_1(\theta)'\sum_{j=1}^m f_{1,ij}P_j r_1(\theta)$. It follows by Lemmas B3 and B4 that $n^{-1}l_1(\theta) = o_p(1)$ and $n^{-1}q_1(\theta) - n^{-1}\mathrm{E}[q_1(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, where
$$\mathrm{E}[q_1(\theta)] = (\lambda_0 - \lambda)\big[\sigma_1^2 + \phi_0\sigma_{12} + (\sigma_{12} + \phi_0\sigma_2^2)(\phi_0 - \phi)\big]\sum_{j=1}^m f_{1,ij}\mathrm{tr}(GP_j^{(s)}) + (\lambda_0 - \lambda)^2\big(\sigma_1^2 + 2\phi_0\sigma_{12} + \phi_0^2\sigma_2^2\big)\sum_{j=1}^m f_{1,ij}\mathrm{tr}(G'P_jG).$$
Hence, $n^{-1}u_1(\theta)'\sum_{j=1}^m f_{1,ij}P_j u_1(\theta) - n^{-1}\mathrm{E}[u_1(\theta)'\sum_{j=1}^m f_{1,ij}P_j u_1(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, where $\mathrm{E}[u_1(\theta)'\sum_{j=1}^m f_{1,ij}P_j u_1(\theta)] = d_1(\theta)'\sum_{j=1}^m f_{1,ij}P_j d_1(\theta) + \mathrm{E}[q_1(\theta)]$. As $u_2(\theta) = d_2(\theta) + u_2$, where $d_2(\theta) = X_2(\gamma_0 - \gamma)$, we have
$$u_1(\theta)'\Big(\sum_{j=1}^m f_{2,ij}P_j\Big)u_2(\theta) = d_1(\theta)'\Big(\sum_{j=1}^m f_{2,ij}P_j\Big)d_2(\theta) + l_2(\theta) + q_2(\theta),$$
where $l_2(\theta) = r_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j d_2(\theta) + d_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j u_2$ and $q_2(\theta) = r_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j u_2$. It follows by Lemmas B3 and B4 that $n^{-1}l_2(\theta) = o_p(1)$ and $n^{-1}q_2(\theta) - n^{-1}\mathrm{E}[q_2(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, where
$$\mathrm{E}[q_2(\theta)] = (\lambda_0 - \lambda)(\sigma_{12} + \phi_0\sigma_2^2)\sum_{j=1}^m f_{2,ij}\mathrm{tr}(G'P_j).$$
Hence, $n^{-1}u_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j u_2(\theta) - n^{-1}\mathrm{E}[u_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j u_2(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, where $\mathrm{E}[u_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j u_2(\theta)] = d_1(\theta)'\sum_{j=1}^m f_{2,ij}P_j d_2(\theta) + \mathrm{E}[q_2(\theta)]$. Similarly, $n^{-1}u_2(\theta)'\sum_{j=1}^m f_{3,ij}P_j u_1(\theta)$, $n^{-1}u_2(\theta)'\sum_{j=1}^m f_{4,ij}P_j u_2(\theta)$, $n^{-1}f_{1,i}Q'u_1(\theta)$, and $n^{-1}f_{2,i}Q'u_2(\theta)$ each converge in probability to their expectations uniformly in $\theta \in \Theta$. Therefore, $n^{-1}Fg(\theta) - n^{-1}F\mathrm{E}[g(\theta)] = o_p(1)$ uniformly in $\theta \in \Theta$, and hence $n^{-2}g(\theta)'F'Fg(\theta)$ converges in probability to a well defined limit uniformly in $\theta \in \Theta$. As $g(\theta)$ is a quadratic function of $\theta$, $n^{-1}F\mathrm{E}[g(\theta)]$ is uniformly equicontinuous on $\Theta$ by Assumption 4. The identification condition and the uniform equicontinuity of $n^{-1}F\mathrm{E}[g(\theta)]$
imply that the identification uniqueness condition for $n^{-2}\mathrm{E}[g(\theta)]'F'F\mathrm{E}[g(\theta)]$ must be satisfied. The consistency of $\hat\theta$ follows by Theorem 15.1 of Peracchi (2001).

For the asymptotic normality of $\tilde\theta_{\mathrm{gmm}}$, by the mean value theorem,
$$\sqrt n(\tilde\theta_{\mathrm{gmm}} - \theta_0) = -\Big[n^{-1}\frac{\partial g(\tilde\theta_{\mathrm{gmm}})'}{\partial\theta}F'F\,n^{-1}\frac{\partial g(\bar\theta)}{\partial\theta'}\Big]^{-1} n^{-1}\frac{\partial g(\tilde\theta_{\mathrm{gmm}})'}{\partial\theta}F'\,n^{-1/2}Fg(\theta_0),$$
where $\bar\theta = t\tilde\theta_{\mathrm{gmm}} + (1 - t)\theta_0$ for some $t \in [0, 1]$ and
$$\frac{\partial g(\theta)}{\partial\theta'} = -\begin{bmatrix} Q'Wy_1 & Q'y_2 & Q'X_1 & 0 \\ 0 & 0 & 0 & Q'X_2 \\ u_1(\theta)'P_1^{(s)}Wy_1 & u_1(\theta)'P_1^{(s)}y_2 & u_1(\theta)'P_1^{(s)}X_1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ u_1(\theta)'P_m^{(s)}Wy_1 & u_1(\theta)'P_m^{(s)}y_2 & u_1(\theta)'P_m^{(s)}X_1 & 0 \\ u_2(\theta)'P_1'Wy_1 & u_2(\theta)'P_1'y_2 & u_2(\theta)'P_1'X_1 & u_1(\theta)'P_1X_2 \\ \vdots & \vdots & \vdots & \vdots \\ u_2(\theta)'P_m'Wy_1 & u_2(\theta)'P_m'y_2 & u_2(\theta)'P_m'X_1 & u_1(\theta)'P_mX_2 \\ u_2(\theta)'P_1Wy_1 & u_2(\theta)'P_1y_2 & u_2(\theta)'P_1X_1 & u_1(\theta)'P_1'X_2 \\ \vdots & \vdots & \vdots & \vdots \\ u_2(\theta)'P_mWy_1 & u_2(\theta)'P_my_2 & u_2(\theta)'P_mX_1 & u_1(\theta)'P_m'X_2 \\ 0 & 0 & 0 & u_2(\theta)'P_1^{(s)}X_2 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & u_2(\theta)'P_m^{(s)}X_2 \end{bmatrix}.$$
Using Lemmas B3 and B4, it follows by an argument similar to the proof of Proposition 1 in Lee (2007) that $n^{-1}\partial g(\tilde\theta_{\mathrm{gmm}})/\partial\theta' - n^{-1}D = o_p(1)$ and $n^{-1}\partial g(\bar\theta)/\partial\theta' - n^{-1}D = o_p(1)$, with $D$ given by (10). By Lemma B7 and the Cramér-Wold device, we have $n^{-1/2}Fg(\theta_0) \xrightarrow{d} N(0, \lim_{n\to\infty} n^{-1}F\Omega F')$, with $\Omega$ given by (9). The desired result follows.

Proof of Proposition 2: Note that
$$n^{-1}g(\theta)'\hat\Omega^{-1}g(\theta) = n^{-1}g(\theta)'\Omega^{-1}g(\theta) + n^{-1}g(\theta)'(\hat\Omega^{-1} - \Omega^{-1})g(\theta).$$
With $F = (n^{-1}\Omega)^{-1/2}$, uniform convergence of $n^{-1}g(\theta)'\Omega^{-1}g(\theta)$ in probability follows by a similar
argument in the proof of Proposition 1. On the other hand,
$$\big|n^{-1}g(\theta)'(\hat\Omega^{-1} - \Omega^{-1})g(\theta)\big| \le n^{-2}\|g(\theta)\|^2 \big\|(n^{-1}\hat\Omega)^{-1} - (n^{-1}\Omega)^{-1}\big\|,$$
where $\|\cdot\|$ is the Euclidean norm for vectors and matrices. By a similar argument as in the proof of Proposition 1, we have $n^{-1}g(\theta) - n^{-1}\mathrm{E}[g(\theta)] = o_p(1)$ and $n^{-1}\mathrm{E}[g(\theta)] = O(1)$ uniformly in $\theta \in \Theta$, which in turn implies that $n^{-2}\|g(\theta)\|^2 = O_p(1)$ uniformly in $\theta \in \Theta$. Therefore, $n^{-1}g(\theta)'(\hat\Omega^{-1} - \Omega^{-1})g(\theta) = o_p(1)$ uniformly in $\theta \in \Theta$. The consistency of $\hat\theta_{\mathrm{gmm}}$ follows.

For the asymptotic normality of $\sqrt n(\hat\theta_{\mathrm{gmm}} - \theta_0)$, note that from the proof of Proposition 1 we have $n^{-1}\partial g(\hat\theta_{\mathrm{gmm}})/\partial\theta' - n^{-1}D = o_p(1)$, since $\hat\theta_{\mathrm{gmm}}$ is consistent. Let $\bar\theta = t\hat\theta_{\mathrm{gmm}} + (1 - t)\theta_0$ for some $t \in [0, 1]$; then, by the mean value theorem,
$$\sqrt n(\hat\theta_{\mathrm{gmm}} - \theta_0) = -\Big[n^{-1}\frac{\partial g(\hat\theta_{\mathrm{gmm}})'}{\partial\theta}(n^{-1}\hat\Omega)^{-1}n^{-1}\frac{\partial g(\bar\theta)}{\partial\theta'}\Big]^{-1}n^{-1}\frac{\partial g(\hat\theta_{\mathrm{gmm}})'}{\partial\theta}(n^{-1}\hat\Omega)^{-1}n^{-1/2}g(\theta_0)$$
$$= -\big[n^{-1}D'(n^{-1}\Omega)^{-1}n^{-1}D\big]^{-1}n^{-1}D'(n^{-1}\Omega)^{-1}n^{-1/2}g(\theta_0) + o_p(1),$$
which concludes the first part of the proof, since it is established in the proof of Proposition 1 that $n^{-1/2}g(\theta_0)$ converges in distribution.

For the overidentification test, by the mean value theorem, for some $t \in [0, 1]$ and $\bar\theta = t\hat\theta_{\mathrm{gmm}} + (1 - t)\theta_0$,
$$n^{-1/2}g(\hat\theta_{\mathrm{gmm}}) = n^{-1/2}g(\theta_0) + n^{-1}\frac{\partial g(\bar\theta)}{\partial\theta'}\sqrt n(\hat\theta_{\mathrm{gmm}} - \theta_0) = n^{-1/2}g(\theta_0) + n^{-1}D\sqrt n(\hat\theta_{\mathrm{gmm}} - \theta_0) + o_p(1) = An^{-1/2}g(\theta_0) + o_p(1),$$
where $A = I_{\dim(g)} - n^{-1}D\big[n^{-1}D'(n^{-1}\Omega)^{-1}n^{-1}D\big]^{-1}n^{-1}D'(n^{-1}\Omega)^{-1}$. Therefore,
$$g(\hat\theta_{\mathrm{gmm}})'\hat\Omega^{-1}g(\hat\theta_{\mathrm{gmm}}) = n^{-1/2}g(\hat\theta_{\mathrm{gmm}})'(n^{-1}\Omega)^{-1}n^{-1/2}g(\hat\theta_{\mathrm{gmm}}) + o_p(1) = n^{-1/2}g(\theta_0)'A'(n^{-1}\Omega)^{-1}An^{-1/2}g(\theta_0) + o_p(1)$$
$$= \big[(n^{-1}\Omega)^{-1/2}n^{-1/2}g(\theta_0)\big]'B\big[(n^{-1}\Omega)^{-1/2}n^{-1/2}g(\theta_0)\big] + o_p(1),$$
where $B = I_{\dim(g)} - (n^{-1}\Omega)^{-1/2}n^{-1}D\big[n^{-1}D'(n^{-1}\Omega)^{-1}n^{-1}D\big]^{-1}n^{-1}D'(n^{-1}\Omega)^{-1/2}$ is symmetric and idempotent. Therefore,
$$g(\hat\theta_{\mathrm{gmm}})'\hat\Omega^{-1}g(\hat\theta_{\mathrm{gmm}}) \xrightarrow{d} \chi^2_{\mathrm{tr}(B)}, \quad \text{where } \mathrm{tr}(B) = \dim(g) - \dim(\theta).$$

Proof of Proposition 3: To establish the asymptotic efficiency, we use an argument of Breusch et al. (1999) to show that any additional moment conditions $g^\circ$ given the best moment conditions $g^*$ are redundant. Following Breusch et al. (1999), $g^\circ$ is redundant given $g^*$ if the asymptotic variance of the estimator based on the moment equations $\mathrm{E}[g^*(\theta)] = 0$ and $\mathrm{E}[g^\circ(\theta)] = 0$ together is the same as that of the estimator based on $\mathrm{E}[g^*(\theta)] = 0$ alone. In cases (i) and (ii), $\Omega^\sharp = \mathrm{E}[g^*(\theta_0)g^\circ(\theta_0)']$ is block-diagonal: its linear-moment block is $\Sigma \otimes (Q^{*\prime}Q^\circ)$, and, under normality, the entries of its quadratic-moment block are linear combinations of $\mathrm{tr}(P_j^*P_k^\circ)$ and $\mathrm{tr}(P_j^*P_k^{\circ\prime})$, with coefficients $\sigma_1^4$, $\sigma_1^2\sigma_{12}$, $\sigma_1^2\sigma_2^2$, $\sigma_{12}^2$, $\sigma_{12}\sigma_2^2$ and $\sigma_2^4$, as given by Lemma B1.
One can verify that there exists a nonstochastic matrix $A$, with entries depending on $\beta_0$, $\phi_0$, $\gamma_0$ and $\Sigma$ through $(\sigma_1^2\sigma_2^2 - \sigma_{12}^2)^{-1}$ and the selection matrix $C = [I_{K_1}, 0]'$ (so that $X_1 = XC$), such that $D^\circ = \Omega^\sharp A$, where $D$ is defined in (10). Based on Lemma B8, $g^\circ$ is redundant given $g^*$. Furthermore, Lemma B9 tells us that any subset of $g^\circ$ is redundant given $g^*$.

Proof of Proposition 4: To show the desired result, we only need to show that $\hat Z(\theta) = \hat g^*(\theta)'\hat\Omega^{*-1}\hat g^*(\theta)$ and $Z(\theta) = g^*(\theta)'\Omega^{*-1}g^*(\theta)$ satisfy the conditions of Lemma B5. First, $n^{-1}[\hat g_1^*(\theta) - g_1^*(\theta)] = n^{-1}[I_2 \otimes (\hat Q^* - Q^*)]'u(\theta)$ and $n^{-1}[\hat g_{2,rs}^*(\theta) - g_{2,rs}^*(\theta)] = n^{-1}u_r(\theta)'(\hat P^* - P^*)u_s(\theta)$ for $r, s = 1, 2$. Moreover,
$$\frac{\partial g^*(\theta)}{\partial\theta'} = -\begin{bmatrix} Q^{*\prime}Wy_1 & Q^{*\prime}y_2 & Q^{*\prime}X_1 & 0 \\ 0 & 0 & 0 & Q^{*\prime}X_2 \\ u_1(\theta)'P^{*(s)}Wy_1 & u_1(\theta)'P^{*(s)}y_2 & u_1(\theta)'P^{*(s)}X_1 & 0 \\ u_2(\theta)'P^{*\prime}Wy_1 & u_2(\theta)'P^{*\prime}y_2 & u_2(\theta)'P^{*\prime}X_1 & u_1(\theta)'P^*X_2 \\ u_2(\theta)'P^*Wy_1 & u_2(\theta)'P^*y_2 & u_2(\theta)'P^*X_1 & u_1(\theta)'P^{*\prime}X_2 \\ 0 & 0 & 0 & u_2(\theta)'P^{*(s)}X_2 \end{bmatrix},$$
with $\partial\hat g^*(\theta)/\partial\theta'$ obtained by replacing $Q^*$ and $P^*$ with $\hat Q^*$ and $\hat P^*$, where $Q^* = [GX, X]$, $P^*$ is either $G - n^{-1}\mathrm{tr}(G)I_n$ or $G - \mathrm{diag}(G)$, $\partial u_1(\theta)/\partial\theta' = -[Wy_1, y_2, X_1, 0]$, and $\partial u_2(\theta)/\partial\theta' = -[0, 0, 0, X_2]$. By inspection of each term of the above matrices, we conclude
that $n^{-1}[\hat g^*(\theta) - g^*(\theta)] = o_p(1)$ and $n^{-1}[\partial\hat g^*(\theta)/\partial\theta' - \partial g^*(\theta)/\partial\theta'] = o_p(1)$ uniformly in $\theta \in \Theta$, by Lemma B6. Second, as $\hat G - G = (\hat\lambda - \lambda_0)G^2 + (\hat\lambda - \lambda_0)^2\hat GG^2$, we have $n^{-1}\mathrm{tr}(\hat P^*\hat P^*) - n^{-1}\mathrm{tr}(P^*P^*) = o_p(1)$ and $n^{-1}\mathrm{tr}(\hat P^*\hat P^{*\prime}) - n^{-1}\mathrm{tr}(P^*P^{*\prime}) = o_p(1)$. Therefore, as $\hat\Sigma$ is a consistent estimator of $\Sigma$, we have $n^{-1}(\hat\Omega^* - \Omega^*) = o_p(1)$. Hence, we can conclude that $n^{-1}[\hat Z(\theta) - Z(\theta)] = o_p(1)$ and $n^{-1}[\partial^2\hat Z(\theta)/\partial\theta\partial\theta' - \partial^2 Z(\theta)/\partial\theta\partial\theta'] = o_p(1)$ uniformly in $\theta \in \Theta$. Finally, since $n^{-1/2}g^*(\theta_0) = O_p(1)$ by an argument similar to the proof of Proposition 1, and $n^{-1/2}[\hat g^*(\theta_0) - g^*(\theta_0)] = o_p(1)$ by Lemma B6,
$$n^{-1/2}\Big[\frac{\partial\hat Z(\theta_0)}{\partial\theta} - \frac{\partial Z(\theta_0)}{\partial\theta}\Big] = 2\frac{\partial\hat g^*(\theta_0)'}{\partial\theta}\hat\Omega^{*-1}n^{-1/2}[\hat g^*(\theta_0) - g^*(\theta_0)] + 2\Big[\frac{\partial\hat g^*(\theta_0)'}{\partial\theta}\hat\Omega^{*-1} - \frac{\partial g^*(\theta_0)'}{\partial\theta}\Omega^{*-1}\Big]n^{-1/2}g^*(\theta_0) = o_p(1).$$
The desired result follows.
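The overidentification statistic derived in the proof of Proposition 2 can be illustrated with a generic linear-IV GMM example: with $\dim(g)$ moment conditions and $\dim(\theta)$ parameters, the minimized two-step GMM objective is asymptotically $\chi^2$ with $\dim(g) - \dim(\theta)$ degrees of freedom. The data, instruments, and parameter values below are all hypothetical, and the moment functions are ordinary linear-IV moments rather than the paper's spatial moments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correctly specified linear model with 3 instruments and 1 parameter,
# so the J-statistic has 3 - 1 = 2 degrees of freedom.
n = 2000
z = rng.standard_normal((n, 3))                      # instruments
x = z @ np.array([1.0, 0.5, 0.2]) + rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)

Zx, Zy = z.T @ x / n, z.T @ y / n
W1 = np.linalg.inv(z.T @ z / n)                      # first-step weight matrix
b1 = (Zx @ W1 @ Zy) / (Zx @ W1 @ Zx)                 # first-step GMM estimate
e = y - b1 * x
Omega = (z * e[:, None]).T @ (z * e[:, None]) / n    # moment covariance estimate
W2 = np.linalg.inv(Omega)                            # optimal second-step weight
b2 = (Zx @ W2 @ Zy) / (Zx @ W2 @ Zx)                 # two-step GMM estimate

g = z.T @ (y - b2 * x) / n                           # moments at the estimate
J = n * g @ W2 @ g                                   # compare to chi2(2) critical values
df = z.shape[1] - 1
```

Under correct specification, $J$ is asymptotically $\chi^2_2$; a large $J$ relative to the $\chi^2_{\dim(g)-\dim(\theta)}$ critical value signals a failure of the overidentifying restrictions.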

Table 2: 2SLS, 3SLS and GMM estimation (n = 245). True values: $\lambda_0 = 0.6$, $\phi_0 = 0.5$, $\beta_0 = 0.5$, $\gamma_0 = 1$. Cells report Mean(SD)[RMSE].

                 $\lambda_0$         $\phi_0$          $\beta_0$         $\gamma_0$
$\sigma_{12} = 0.1$
  2SLS     001(018)[018]     049(008)[008]    049(00)[00]      -
  3SLS     001(01)[01]       049(008)[008]    049(00)[00]      0999(00)[00]
  GMM-1    00(005)[005]      0499(00)[00]     0498(005)[005]   -
  GMM-2    00(0051)[0051]    049(00)[008]     0498(005)[005]   0999(00)[00]
$\sigma_{12} = 0.5$
  2SLS     001(018)[018]     049(008)[008]    0495(00)[00]     -
  3SLS     00(0111)[0111]    049(008)[008]    0498(005)[005]   1000(00)[00]
  GMM-1    00(004)[004]      0501(00)[00]     0498(005)[005]   -
  GMM-2    004(0045)[0045]   0495(008)[008]   0499(005)[005]   0999(00)[00]
$\sigma_{12} = 0.9$
  2SLS     001(010)[010]     0495(000)[000]   0494(00)[00]     -
  3SLS     00(0059)[0059]    049(008)[008]    0500(009)[009]   1001(00)[00]
  GMM-1    00(0050)[0050]    050(00)[00]      0498(005)[005]   -
  GMM-2    001(00)[00]       0495(000)[000]   0500(008)[008]   0999(00)[00]

Table 3: 2SLS, 3SLS and GMM estimation (n = 490). True values: $\lambda_0 = 0.6$, $\phi_0 = 0.5$, $\beta_0 = 0.5$, $\gamma_0 = 1$. Cells report Mean(SD)[RMSE].

                 $\lambda_0$         $\phi_0$          $\beta_0$         $\gamma_0$
$\sigma_{12} = 0.1$
  2SLS     000(0080)[0080]   049(004)[004]    049(004)[004]    -
  3SLS     0599(009)[009]    049(004)[004]    049(004)[004]    1000(0045)[0045]
  GMM-1    000(005)[005]     0498(004)[004]   0498(004)[004]   -
  GMM-2    00(004)[004]      049(004)[004]    0498(0045)[004]  1000(004)[004]
$\sigma_{12} = 0.5$
  2SLS     000(0081)[0081]   049(0048)[0048]  049(004)[004]    -
  3SLS     000(008)[008]     049(004)[004]    0499(0040)[0040] 0999(004)[004]
  GMM-1    000(000)[000]     0499(004)[004]   0498(004)[004]   -
  GMM-2    001(009)[009]     049(004)[0048]   0499(0040)[0040] 0999(004)[004]
$\sigma_{12} = 0.9$
  2SLS     001(008)[008]     049(0048)[0048]  049(004)[004]    -
  3SLS     001(004)[004]     049(004)[0048]   0500(000)[000]   0999(004)[004]
  GMM-1    000(00)[00]       0500(004)[004]   0498(0045)[0045] -
  GMM-2    000(0015)[0015]   0494(0048)[0049] 0500(000)[000]   099(0048)[0048]

Table 4: 2SLS, 3SLS and GMM estimation (n = 245). True values: $\lambda_0 = 0.6$, $\phi_0 = 0.2$, $\beta_0 = 0.2$, $\gamma_0 = 1$. Cells report Mean(SD)[RMSE].

                 $\lambda_0$         $\phi_0$          $\beta_0$         $\gamma_0$
$\sigma_{12} = 0.1$
  2SLS     0(044)[049]       019(005)[00]     0194(000)[001]   -
  3SLS     00(048)[048]      0195(00)[00]     0195(000)[000]   0999(00)[00]
  GMM-1    0(01)[01]         001(00)[00]      0198(00)[00]     -
  GMM-2    040(0145)[0150]   0199(008)[008]   0198(005)[005]   0999(00)[00]
$\sigma_{12} = 0.5$
  2SLS     08(049)[044]      0195(000)[000]   0194(008)[008]   -
  3SLS     05(05)[01]        0195(009)[009]   019(0058)[0059]  1000(00)[00]
  GMM-1    048(0189)[0195]   00(00)[00]       0198(00)[00]     -
  GMM-2    04(0109)[011]     019(008)[008]    0199(005)[005]   0999(00)[00]
$\sigma_{12} = 0.9$
  2SLS     088(089)[099]     0194(000)[000]   0194(008)[008]   -
  3SLS     0(018)[010]       019(00)[008]     0199(009)[009]   1001(00)[00]
  GMM-1    04(018)[0184]     004(00)[00]      0198(005)[005]   -
  GMM-2    008(005)[005]     019(008)[009]    000(009)[009]    0999(00)[00]

Table 5: 2SLS, 3SLS and GMM estimation (n = 490). True values: $\lambda_0 = 0.6$, $\phi_0 = 0.2$, $\beta_0 = 0.2$, $\gamma_0 = 1$. Cells report Mean(SD)[RMSE].

                 $\lambda_0$         $\phi_0$          $\beta_0$         $\gamma_0$
$\sigma_{12} = 0.1$
  2SLS     05(051)[05]       0195(004)[0048]  0195(004)[004]   -
  3SLS     04(05)[05]        0195(004)[0048]  0195(004)[004]   1000(0045)[0045]
  GMM-1    010(0094)[0094]   0198(004)[004]   0198(004)[004]   -
  GMM-2    010(001)[00]      019(004)[004]    0198(0045)[0045] 1000(004)[004]
$\sigma_{12} = 0.5$
  2SLS     0(0)[00]          0195(0048)[0048] 0195(004)[004]   -
  3SLS     00(0195)[019]     0195(004)[0048]  019(0040)[0040]  0999(004)[004]
  GMM-1    011(009)[009]     0199(004)[004]   0198(004)[004]   -
  GMM-2    004(004)[004]     019(004)[004]    0199(0040)[0040] 0999(004)[004]
$\sigma_{12} = 0.9$
  2SLS     08(080)[08]       0195(0048)[0048] 0194(004)[004]   -
  3SLS     00(0100)[0100]    019(004)[0048]   000(000)[000]    0999(004)[004]
  GMM-1    01(009)[0098]     000(004)[004]    0198(004)[004]   -
  GMM-2    00(004)[004]      0195(0048)[0048] 000(000)[000]    099(004)[004]