GMM Estimation of Spatial Autoregressive Models in a. System of Simultaneous Equations with Heteroskedasticity

Similar documents
GMM Estimation of SAR Models with. Endogenous Regressors

GMM estimation of spatial panels

Identi cation and E cient Estimation of Simultaneous. Equations Network Models

Estimation of a Local-Aggregate Network Model with. Sampled Networks

Simple Estimators for Semiparametric Multinomial Choice Models

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

Spatial panels: random components vs. xed e ects

A Robust Test for Network Generated Dependence

GMM Estimation of Spatial Autoregressive Models in a System of Simultaneous Equations

Panel VAR Models with Spatial Dependence

Some Recent Developments in Spatial Panel Data Models

Chapter 2. Dynamic panel data models

Simple Estimators for Monotone Index Models

Spatial Regression. 11. Spatial Two Stage Least Squares. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Notes on Time Series Modeling

Multivariate Choice and Identi cation of Social Interactions

Chapter 1. GMM: Basic Concepts

A SPATIAL CLIFF-ORD-TYPE MODEL WITH HETEROSKEDASTIC INNOVATIONS: SMALL AND LARGE SAMPLE RESULTS

Panel VAR Models with Spatial Dependence

Estimating the Number of Common Factors in Serially Dependent Approximate Factor Models

A Course on Advanced Econometrics

13 Endogeneity and Nonparametric IV

Identification and QML Estimation of Multivariate and Simultaneous Spatial Autoregressive Models

GMM Estimation of Social Interaction Models with. Centrality

Notes on Generalized Method of Moments Estimation

1 Estimation of Persistent Dynamic Panel Data. Motivation

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Chapter 2. GMM: Estimating Rational Expectations Models

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

ECON0702: Mathematical Methods in Economics

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

GMM Estimation of the Spatial Autoregressive Model in a System of Interrelated Networks

Estimation of xed e ects panel regression models with separable and nonseparable space-time lters

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Parametric Inference on Strong Dependence

Simultaneous Equations with Binary Outcomes. and Social Interactions

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

The properties of L p -GMM estimators

Testing for Regime Switching: A Comment

When is it really justifiable to ignore explanatory variable endogeneity in a regression model?

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Likelihood Ratio Based Test for the Exogeneity and the Relevance of Instrumental Variables

GMM based inference for panel data models

We begin by thinking about population relationships.

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Spatial Dynamic Panel Model and System GMM: A Monte Carlo Investigation

Nonparametric Trending Regression with Cross-Sectional Dependence

Testing Random Effects in Two-Way Spatial Panel Data Models

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models

Tests for Cointegration, Cobreaking and Cotrending in a System of Trending Variables

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence)

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012

4.3 - Linear Combinations and Independence of Vectors

Identi cation of Positive Treatment E ects in. Randomized Experiments with Non-Compliance

Chapter 6. Maximum Likelihood Analysis of Dynamic Stochastic General Equilibrium (DSGE) Models

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Introduction: structural econometrics. Jean-Marc Robin

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

ECON2285: Mathematical Economics

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer

MC3: Econometric Theory and Methods. Course Notes 4

Web Appendix to Multivariate High-Frequency-Based Volatility (HEAVY) Models

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT

Tests forspatial Lag Dependence Based onmethodof Moments Estimation

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Rank Estimation of Partially Linear Index Models

A Conditional-Heteroskedasticity-Robust Con dence Interval for the Autoregressive Parameter

Control Functions in Nonseparable Simultaneous Equations Models 1

Outline. Overview of Issues. Spatial Regression. Luc Anselin

The Case Against JIVE

Large Sample Properties of Estimators in the Classical Linear Regression Model

Bootstrapping the Grainger Causality Test With Integrated Data

Estimation of the Conditional Variance in Paired Experiments

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels.

ESTIMATION PROBLEMS IN MODELS WITH SPATIAL WEIGHTING MATRICES WHICH HAVE BLOCKS OF EQUAL ELEMENTS*

1. The Multivariate Classical Linear Regression Model

GMM Estimation with Noncausal Instruments

A New Approach to Robust Inference in Cointegration

Estimation and Inference with Weak Identi cation

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation

the error term could vary over the observations, in ways that are related

Short T Dynamic Panel Data Models with Individual and Interactive Time E ects

Estimation of simultaneous systems of spatially interrelated cross sectional equations

Robust Standard Errors to spatial and time dependence in. state-year panels. Lucciano Villacorta Gonzales. June 24, Abstract

Robust Standard Errors in Transformed Likelihood Estimation of Dynamic Panel Data Models with Cross-Sectional Heteroskedasticity

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley

11. Bootstrap Methods

Cointegration Tests Using Instrumental Variables Estimation and the Demand for Money in England

Intermediate Econometrics

On Expected Gaussian Random Determinants

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

R = µ + Bf Arbitrage Pricing Model, APM

Economics 241B Estimation with Instruments

Strength and weakness of instruments in IV and GMM estimation of dynamic panel data models

The Asymptotic Variance of Semi-parametric Estimators with Generated Regressors

Transcription:

GMM Estimation of Spatial Autoregressive Models in a System of Simultaneous Equations with Heteroskedasticity Xiaodong Liu Department of Economics, University of Colorado Boulder Email: xiaodong.liu@colorado.edu Paulo Saraiva Department of Economics, University of Colorado Boulder Email: saraiva@colorado.edu October, 1 Abstract This paper proposes a GMM estimation framework for the SAR model in a system of simultaneous equations with heteroskedastic disturbances. Besides linear moment conditions, the proposed GMM estimator also utilizes quadratic moment conditions based on the covariance structure of model disturbances within and across equations. Compared with the QML approach, the GMM estimator is easier to implement and robust under heteroskedasticity of unknown form. We derive the heteroskedasticity-robust standard error for the GMM estimator. Monte Carlo experiments show that the proposed GMM estimator performs well in nite samples. JEL classi cation: C1, C Key words: simultaneous equations, spatial autoregressive models, quadratic moment conditions, unknown heteroskedasticity 1 Introduction The spatial autoregressive (SAR) model introduced by Cli and Ord (19, 1981) has received considerable attention in various elds of economics as it provides a convenient framework to model the interaction between economic agents. However, with a few exceptions (e.g., Kelejian We thank the editor, the associate editor, and two anonymous referees for valuable comments and suggestions. The remaining errors are our own. 1

and Prucha, ; Baltagi and Pirotte, 11; Yang and Lee, 1), most theoretical works in the spatial econometrics literature focus on the single-equation SAR model, which assumes that an economic agent s choice (or outcome) in a certain activity is isolated from her and other agents choices (or outcomes) in related activities. This restrictive assumption potentially limits the usefulness of the SAR model in many contexts. To incorporate the interdependence of economic agents choices and outcomes across di erent activities, Kelejian and Prucha () extends the single-equation SAR model to the simultaneousequation SAR model. They propose both limited information two stage least squares (SLS) and full information three stage least squares (SLS) estimators for the estimation of model parameters and establish the asymptotic properties of the estimators. In a recent paper, Yang and Lee (1) study the identi cation and estimation of the simultaneous-equation SAR model by the full information quasi-maximum likelihood (QML) approach. They give identi cation conditions for the simultaneous-equation SAR model that are analogous to the rank and order conditions for the classical simultaneous-equation model and derive asymptotic properties of the QML estimator. The QML estimator is asymptotically more e cient than the SLS estimator under normality but can be computationally di cult to implement. In this paper, we propose a generalized method of moments (GMM) estimator for the identi- cation and estimation of simultaneous-equation SAR models with heteroskedastic disturbances. Similar to the GMM estimator proposed by Lee () and Lin and Lee (1) for single-equation SAR models, the GMM estimator utilizes both linear moment conditions based on the orthogonality condition between the instrumental variable (IV) and model disturbances, and quadratic moment conditions based on the covariance structure of model disturbances. While the single-equation GMM estimator can be considered as an equation-by-equation limited information estimator for a system of simultaneous equations, the simultaneous-equation GMM estimator proposed in this paper exploits the correlation structure of disturbances within and across equations and thus is a full information estimator. We study the identi cation of model parameters under the GMM framework and derive asymptotic properties of the GMM estimator under heteroskedasticity of unknown form. Furthermore, we propose a heteroskedasticity-robust estimator for the asymptotic variance-covariance matrix of the GMM estimator in the spirit of White (198). The GMM estimator is asymptotically more e cient than the SLS estimator. Compared with the QML estimator considered in Yang and Lee (1), the GMM estimator is easier to implement and robust under heteroskedasticity. Monte

Carlo experiments show that the proposed GMM estimator performs well in nite samples. The remaining of this paper is organized as follows. In Section, we describe the model and give the moment conditions used to construct the GMM estimator. In Section, we establish the identi cation for the model under the GMM framework. We derive the asymptotic properties of the GMM estimator in Section. Results of Monte Carlo simulation experiments are reported in Section. Section brie y concludes. Proofs are collected in the Appendix. Throughout the paper, we adopt the following notation. For an n n matrix A = [a ij ], let diag(a) denote an n n diagonal matrix with the i-th diagonal element being a ii, i.e., diag(a) = diag(a 11 ; ; a nn ). Let (A) denotes the spectral radius of the square matrix A. For an n m matrix B = [b ij ], the vectorization of B is denoted by vec(b) = (b 11 ; ; b n1 ; b 1 ; ; b nm ). 1 Let I n denote the n n identity matrix and i n;k denote the k-th column of I n. Model and Moment Conditions.1 Model The model considered in this paper is given by a system of m simultaneous equations for n cross sectional units, Y = Y + WY + XB + U; (1) where Y = [y 1 ; ; y m ] is an n m matrix of endogenous variables, X is an n K X matrix of exogenous variables, and U = [u 1 ; ; u m ] is an n m matrix of disturbances. W = [w ij ] is an n n nonstochastic matrix of spatial weights, with w ij representing the proximity between cross sectional units i and j. The diagonal elements of W are normalized to be zeros. In the literature, WY is usually referred to as the spatial lag., and B are, respectively, m m, m m and K X m matrices of true parameters in the data generating process (DGP). The diagonal elements of are normalized to be zeros. In general, the identi cation of simultaneous-equation models needs exclusion restrictions. Let k;, k; and k; denote vectors of nonzero elements of the k-th columns of, and B respectively under some exclusion restrictions. Let Y k, Y k and X k denote the corresponding matrices containing columns of Y, Y = WY and X that appear in the right hand side of the k-th equation. 1 If A, B, C are conformable matrices, then vec(abc) = (C A)vec(B), where denotes the Kronecker product. In this paper, all variables are allowed to depend on the sample size, i.e., are allowed to formulate triangular arrays as in Kelejian and Prucha (1). Nevertheless, we suppress the subscript n to simplify the notation. For SAR models, the notion of proximity is not limited to the geographical sense. It can be economic proximity, technology proximity, or social proximity. Hence the SAR model has a broad range of applications.

Then, the k-th equation of model (1) is y k = Y k k; + Y k k; + X k k; + u k : () We maintain the following assumptions regarding the DGP. Assumption 1 Let u ik denote the (i; k)-th element of U and u denote the vectorization of U, i.e., u = vec(u). (i) (u i1 ; ; u im ) are independently distributed across i with zero mean. (ii) E(uu ) = 11 1m..... m1 mm is nonsingular, with kl = lk = diag( 1;kl ; ; n;kl ). (iii) Eju ik u il u is u it j 1+ is bounded for any i = 1; ; n and k; l; s; t = 1; ; m, for some positive constant. Assumption The elements of X are uniformly bounded constants. X has full column rank K X. lim n!1 n 1 X X exists and is nonsingular. Assumption is nonsingular with a zero diagonal. ( (I m ) 1 ) < 1=(W). Assumption W has a zero diagonal. The row and column sums of W and (I mn I n W) 1 are uniformly bounded in absolute value. Assumption k; = ( k; ; k;; k;) is in the interior of a compact and convex parameter space for k = 1; ; m. The above assumptions are based on some standard assumptions in the literature of SAR models (see, e.g., Kelejian and Prucha, ; Lee, ; Lin and Lee, 1). In particular, Assumption is from Yang and Lee (1). Under this assumption, S = I mn I n W is nonsingular, and hence the simultaneous-equation SAR model (1) has a well de ned reduced form y = S 1 (B I n )x + S 1 u; () where y = vec(y), x = vec(x), and u = vec(u). Note that, when m = 1, we have = and = 11;. Then, ( (I m ) 1 ) < 1=(W) becomes the familiar parameter constraint j 11; j < 1=(W) for the single-equation SAR model.

. Motivating examples To illustrate the empirical relevance of the proposed model, we give two motivating examples in this subsection. The rst example models the spillover e ect of scal policies between local jurisdictions. The second example characterizes the conformity e ect in social networks...1 Fiscal policy interaction Consider a set of n local jurisdictions. Each jurisdiction chooses expenditures in m publicly provided services to maximize the social welfare. The social welfare function of jurisdiction i is given by V i = ( ik + k=1 n X lk l=1 j=1 w ij y jl )y ik 1 k=1 l=1 lk y ik y il ; () where ik captures the existing condition of publicly provided service k in jurisdiction i (and other exogenous characteristics of jurisdiction i), y ik represents the spending by jurisdiction i on publicly provided service k, and w ij measures the geographical proximity between jurisdictions i and j (e.g., one could let w ij = 1 if jurisdictions i and j share a common border and w ij = otherwise). The rst term of the social welfare function () captures the social welfare gain from publicly provided services. The marginal social welfare gain of the expenditure y ik is given by ik + P m l=1 lk P n j=1 w ijy jl, where P n j=1 w ijy jl represents the spending by neighboring jurisdictions on publicly provided services and its coe cient lk captures the spillover e ect of publicly provided services (see, e.g., Allers and Elhorst, 11). For example, people may use libraries, parks and other recreation facilities in neighboring jurisdictions. In this case, the spending by a jurisdiction on such facilities might be considered as a substitute for the spending by a neighboring jurisdiction on similar or related facilities and thus generate a negative spillover e ect (i.e. lk < ). On the other hand, if the services provided by neighboring jurisdictions are complements (e.g. road networks and business parks, or hospitals and nursing homes), then the spillover e ect is expected to be positive (i.e. lk > ). The second term of the social welfare function () captures the cost of publicly provided services. The coe cient lk ( lk = kl ) represents the substitution e ect between expenditures by a jurisdiction on publicly provided services k and l. See Hauptneier et al. (1) for some discussion on including the cost of publicly provided services instead of imposing a budget constraint in the social welfare function.

Maximizing the social welfare function () yields the best response function y ik = lk y ik + n X lk l=1;l=k l=1 j=1 w ij y jl + ik where lk = lk = kk, lk = lk = kk, and ik = ik= kk. Suppose ik = x ik k + u ik, where x ik is a vector of the observable characteristics of jurisdiction i and u ik captures the unobservable heterogeneity of jurisdiction i. Then the best response function implies the simultaneous-equation SAR model ()... Social conformity Patacchini and Zenou (1) consider a social conformity model where the social norm is given by the average behavior of peers in a certain activity. We generalize their model by de ning the social norm based on a portfolio of di erence activities. Suppose a set of n individuals interact in a social network. The network topology is captured by the matrix W = [w ij ]. Let d i denote the number of friends of individual i. A possible speci cation of W is such that w ij = 1=d i if individuals i and j are friends and w ij = otherwise. Individual i choose e ort levels y i1 ; ; y im simultaneously in m activities to maximize her utility function U i = k=1 ik y ik 1 k=1 l=1 lk y ik y il m X k=1 1 k(y ik X n % lk w ij y jl ) : () l=1 j=1 The rst term of the utility function () captures the payo from the e orts with the productivity of individual i in activity k given by ik. The second term is the cost from the e orts with the substitution e ect between e orts in di erent activities captured by lk ( lk = kl ). The last term re ects the in uence of an individual s friends on her own behavior. It is such that each individual wants to minimize the social distance between her own behavior y ik to the social norm of that activity. The social norm for activity k is given by the weighted average behavior of her friends in m activities P m l=1 % P n lk j=1 w ijy jl with the weights % lk such that P m l=1 % lk = 1. The coe cient k captures the taste for conformity. Maximizing the utility function () yields the best response function y ik = l=1;l=k lk y ik + X n lk l=1 j=1 w ij y jl + ik

where lk = lk =( kk + k ), lk = k % lk =( kk + k ), and ik = ik=( kk + k ). Suppose ik = x ik k + u ik. Then the best response function leads to the econometric model ().. Moment conditions Following Lee () and Lin and Lee (1), for the estimation of the simultaneous-equation SAR model (1), we consider both linear moment conditions E(Q u k ) = ; () where Q is an n K Q matrix of IVs, and quadratic moment conditions E(u k r u l ) = tr( r kl ); for r = 1; ; p; where r s are n n constant matrices. Note that, if the diagonal elements of r s are zeros, then the quadratic moment conditions become E(u k r u l ) = ; for r = 1; ; p: () As an example, we could use Q = [X; WX; ; W p X] and 1 = W; = W diag(w ); ; p = W p diag(w p ), where p is some predetermined positive integer, to construct the linear and quadratic moment conditions. The quadratic moment conditions () exploit the covariance structure of model disturbances both within and across equations, and hence are more general than the quadratic moment conditions considered in Lee () and Lin and Lee (1). Let the residual function for the k-th equation be u k ( k ) = y k Y k k Y k k X k k ; where k = ( k ; k; k). The empirical linear moment functions based on () can be written as g 1;k g 1;k ( k ) = Q u k ( k ) (8) Suppose the parameters in the best response function can be identi ed, then parameters in the utility function () can be identi ed up to a scale factor. This issue is P common for simultaneous-equation models (Schmidt, 19). In this example, if lk can be identi ed, then k = m kk l=1 P m lk=(1 l=1 lk), which is identi able up to a scale factor kk.

and the empirical quadratic moment functions based on () can be written as g ;kl g ;kl ( k ; l ) = [ 1u k ( k ); ; pu k ( k )] u l ( l ) (9) for k; l = 1; ; m. Combining both linear and quadratic moment functions by de ning g() = g 1() g () ; (1) where = ( 1; ; m), g 1 () = (g 1;1; ; g 1;m), and g () = (g ;11; ; g ;1m; g ;1; ; g ;mm). The identi cation and estimation of the simultaneous-equation SAR model (1) is then based on the moment conditions E[g( )] =. We maintain the following assumption regarding the moment conditions. Assumption (i) The elements of Q are uniformly bounded. (ii) The diagonal elements of r are zeros, and the row and column sums of r are uniformly bounded, for r = 1; ; p. (iii) lim n!1 n 1 exists and is nonsingular, where = Var[g( )]. Identi cation Following Yang and Lee (1), we establish the identi cation of the simultaneous-equation SAR model in two steps. In the rst step, we consider the identi cation of the pseudo reduced form parameters in equation (11). In the second step, we recover the structural parameters from the pseudo reduced form parameters..1 Identi cation of the pseudo reduced form parameters When is nonsingular, the simultaneous-equation SAR model (1) has a pseudo reduced form Y = WY + X + V; (11) where = (I m ) 1, = B (I m ) 1, and V = U(I m ) 1. Equation (11) has the speci cation of a multivariate SAR model (see, Yang and Lee, 1; Liu, 1). First, we consider the identi cation of the pseudo reduced form parameters = [ lk; ] and = [ 1; ; ; m; ] under the GMM framework. 8

The k-th equation in model (11) is given by y k = l=1 lk;wy l + X k; + v k ; where Wy l = H l ( I n )x + H l v (1) with H l = (i m;l W)[I mn ( W)] 1, x = vec(x), and v = vec(v). Hence, the residual function for the k-th equation can be written as v k ( k ) = y k m X l=1 lkwy l X k = d k ( k ) + v k + ( lk; lk )H l v; (1) l=1 where k = ( 1k ; ; mk ; k ) and d k ( k ) = [E(Wy 1 ); ; E(Wy m ); X]( k; k ). The pseudo reduced form parameters in model (11) can be identi ed by the moment conditions described in the previous section. Similar to (8) and (9), the linear moment functions can be written as f 1;k f 1;k ( k ) = Q v k ( k ) and the quadratic moment functions can be written as f ;kl f ;kl ( k ; l ) = [ 1v k ( k ); ; pv k ( k )] v l ( l ) for k; l = 1; ; m. Let f() = [f 1 () ; f () ], where = ( 1; ; m), f 1 () = (f1;1; ; f1;m), and f () = (f;11; ; f;1m; f;1; ; f;mm). For to be identi ed by the moment conditions E[f( )] =, the moment equations lim n!1 n 1 E[f()] = need to have a unique solution at = (Hansen, 198). It follows from (1) that lim n 1 E[f 1;k ( k )] = lim n 1 Q d k ( k ) = lim n 1 Q [E(Wy 1 ); ; E(Wy m ); X]( k; k ) n!1 n!1 n!1 for k = 1; ; m. The linear moment equations, lim n!1 n 1 E[f 1;k ( k )] =, have a unique solution at k = k;, if Q [E(Wy 1 ); ; E(Wy m ); X] has full column rank for n su ciently large. A necessary condition for this rank condition is that [E(Wy 1 ); ; E(Wy m ); X] has full column rank 9

of m + K X and rank(q) m + K X for n su ciently large. If, however, [E(Wy 1 ); ; E(Wy m ); X] does not have full column rank, then the model may still be identi able via the quadratic moment conditions. Suppose for some m f; 1; ; m 1g, E(Wy l ) and the columns of [E(Wy 1 ); ; E(Wy m ); X] are linearly dependent, i.e., E(Wy l ) = P m k=1 c 1;klE(Wy k ) + Xc ;l for some vector of constants (c 1;1l ; ; c 1; ml ; c ;l ) R m+k X, for l = m + 1; ; m. In this case, d k ( k ) = E(Wy j )[ jk; jk + ( lk; lk )c 1;jl ] + X[ k; k + ( lk; lk )c ;l ]; j=1 l= m+1 l= m+1 and hence lim n!1 n 1 E[f 1;k ( k )] = implies that jk = jk; + ( lk; lk )c 1;jl (1) l= m+1 k = k; + ( lk; lk )c ;l ; l= m+1 for j = 1; ; m and k = 1; ; m, provided that Q [E(Wy 1 ); ; E(Wy m ); X] has full column rank for n su ciently large. Therefore, ( 1k; ; ; mk; ; k; ) can be identi ed if lk; (for l = m + 1; ; m) can be identi ed from the quadratic moment conditions. With k given by (1), we have E[v k ( k ) r v l ( l )] = ( ik; ik )tr[h i r E(v l v )] + ( jl; jl )tr[ r H j E(vv k)] i=1 + i=1 j=1 j=1 ( ik; ik )( jl; jl )tr[h i r H j E(vv )]; where E(vv ) = [(I m ) 1 I n ][(I m ) 1 I n ] and E(v k v ) = E(vv k ) = (i m;k I n)e(vv ). Therefore, the quadratic moment equations, lim n!1 n 1 E[f ;kl ( k ; l )] = for k; l = 1; ; m, have a unique solution at, if the equations lim n 1 f ( ik; ik )tr[h i r E(v l v )] + ( jl; jl )tr[ r H j E(vv k)] (1) n!1 i=1 + i=1 j=1 j=1 ( ik; ik )( jl; jl )tr[h i r H j E(vv )]g = ; We adopt the convention that [E(Wy 1 ); ; E(Wy m ); X] = X for m =. 1

for r = 1; ; p and k; l = 1; ; m, have a unique solution at. To wrap up, su cient conditions for the identi cation of the pseudo reduced form parameters are summarized in the following assumption. Assumption At least one of the following conditions holds. (i) lim n!1 n 1 Q [E(Wy 1 ); ; E(Wy m ); X] exists and has full column rank. (ii) lim n!1 n 1 Q [E(Wy 1 ); ; E(Wy m ); X] exists and has full column rank for some m m 1. The equations (1), for r = 1; ; p and k; l = 1; ; m, have a unique solution at. Example 1 To better understand the identi cation conditions in Assumption, consider the pseudo reduced form equations (11) with m = and X = [x; Wx], where x is n 1 vector of individualspeci c exogenous characteristics. In this case, equations (11) can be written as y 1 = 11; Wy 1 + 1; Wy + 11; x + 1; Wx + u 1 (1) y = 1; Wy 1 + ; Wy + 1; x + ; Wx + u : The pseudo reduced form equations (1) have the speci cation of a multivariate SAR model (Yang and Lee, 1). In (1), Wx is a spatial lag of the exogenous variable and its coe cient captures the contextual e ect (Manski, 199). How to identify the endogenous peer e ect captured by kk; from the contextual e ect has been a major interest in the literature of social interaction models. In the multivariate SAR model (1), the identi cation problem is even more challenging because of the presence of the cross-equation peer e ect captured by lk; (l = k). Identi cation of model (1) can be achieved via linear moment conditions if Assumption (i) holds, or via linear and quadratic moment conditions if Assumption (ii) holds. A necessary condition for Assumption (i) is that [E(Wy 1 ); E(Wy ); x; Wx] has full column rank for n su ciently large. It follows by a similar argument as in Cohen-Cole et al. (1) that [E(Wy 1 ); E(Wy ); x; Wx] has full column rank if and only if the matrices I n ; W; W ; W are linearly independent 8 and the A weaker identi cation condition can be derived based on (1) and (1) if the constants c 1;1l ; ; c 1; ml ; c ;l are known to the researcher. 8 For example, the weights matrix W for a star network is given by (n 1) 1 (n 1) 1 1 W =........ : 1 As W = W, the linear independence of I n; W; W ; W does not hold. 11

parameter matrix 11; 1; 1 1; 1; ; 11; + 1; 1; 11; 11; 1; + ; ( 11; + ; ) 1; ; ; 1; 1; 1; 11; ; 11; ; 1; 1; (1) has full column rank. Suppose 11; = 1; = 1; = ; = in the DGP. Then the parameter matrix (1) does not have full column rank. Therefore, [E(Wy 1 ); E(Wy ); x; Wx] does not have full column rank and Assumption (i) fails to hold. In this case, model (1) can be identi ed if Assumption (ii) holds. From (1), E(Wy k ) = H k ( I n )x = as =, which implies that d k ( k ) = [E(Wy 1 ); E(Wy m ); x; Wx]( k; k ) = ( 1k; 1k )x + ( k; k )Wx. Hence, from the linear moment equations lim n!1 n 1 Q d k ( k ) =, only 1k; and k; can be identi ed if lim n!1 n 1 Q [x; Wx] exists and has full column rank. Identi cation of lk; has to be achieved via the quadratic moment equations (1). It follows from (1) that, for k = l = 1, ( 11; 11 ) lim n!1 n 1 tr[h 1( r + r)e(v 1 v )] + ( 1; 1 ) lim n!1 n 1 tr[h ( r + r)e(v 1 v )] +( 11; 11 ) lim n!1 n 1 tr[h 1 r H 1 E(vv )] + ( 1; 1 ) lim n!1 n 1 tr[h r H E(vv )] +( 11; 11 )( 1; 1 ) lim n!1 n 1 tr[h 1( r + r)h E(vv )] = ; (18) for r = 1; ; p. If the matrix lim n 1 n!1 tr[h 1( 1 + 1)E(v 1 v )] tr[h 1( p + p)e(v 1 v )] tr[h ( 1 + 1)E(v 1 v )] tr[h ( p + p)e(v 1 v )] tr[h 1 1 H 1 E(vv )] tr[h 1 p H 1 E(vv )] tr[h 1 H E(vv )] tr[h p H E(vv )] tr[h 1( 1 + 1)H E(vv )] tr[h 1( p + p)h E(vv )] has full row rank, (18) has a unique solution at ( 11; ; 1; ) and hence ( 11; ; 1; ) can be identi ed. Similarly, ( 1; ; ; ) can be identi ed from (1) with k = l =. 1

. Identi cation of the structural parameters Provided that the pseudo reduced form parameters and can be identi ed from the linear and quadratic moment conditions as discussed above. Then, the identi cation problem of the structural parameters in = [(I m ) ; ; B ] through the linear restrictions = (I m ) 1 and = B (I m ) 1 is essentially the same one as in the classical linear simultaneous-equation model (see, e.g., Schmidt, 19). Let # k; denote the k-th column of. Suppose there are R k restrictions on # k; of the form R k # k; = where R k is a R k (m+k X ) matrix of known constants. Following a similar argument as in Yang and Lee (1), the su cient and necessary rank condition for identi cation is rank(r k ) = m 1, and the necessary order condition is R k m 1, for k = 1; ; m. Assumption 8 For k = 1; ; m, R k # k; = for some R k (m + K X ) constant matrix R k with rank(r k ) = m 1: Example To better understand the rank condition in Assumption 8, consider the model y 1 = 1; y + 11; Wy 1 + X 1; + u 1 (19) y = 1; y 1 + ; Wy + X ; + u with pseudo reduced-form equations y 1 = 11; Wy 1 + 1; Wy + X 1; + v 1 () y = 1; Wy 1 + ; Wy + X ; + v ; where 11; 1; 1; ; = (1 1; 1; ) 1 11; 1; 11; 1; ; ; (1) and [ 1; ; ; ] = (1 1; 1; ) 1 [ 1; + 1; ; ; ; + 1; 1; ]: () Suppose Assumption holds for () and thus the pseudo reduced-form parameters can be identi- 1

ed. Then, the structural parameters = [(I m ) ; ; B ] = 1 1; 11; 1; 1; 1 ; ; can be identi ed from (1) and () if the rank condition holds. The exclusion restriction for the rst equation of model (19) can be represented by R 1 = [; ; ; 1; 1KX ]. Then R 1 = [; ; ], which has rank 1 if ; =. Similarly, the exclusion restriction for the second equation can be represented by R = [; ; 1; ; 1KX ]. Then R = [ 11; ; ] which has rank 1 if 11; =. Indeed, if 11; = ; = in the DGP, then (19) becomes a classical linear simultaneous equations model, which cannot be identi ed without imposing exclusion restrictions on k;. GMM Estimation.1 Consistency and asymptotic normality Based on the moment conditions E[g( )] =, the GMM estimator for the simultaneous-equation SAR model (1) is given by e gmm = arg min g() F Fg() () where F is some conformable matrix such that lim n!1 F exists with full row rank greater than or equal to dim (). Usually, F F is referred to as the GMM weighting matrix. To characterize the asymptotic distribution of the GMM estimator, rst we need to derive = Var[g( )] and D = E[ @ @ g( )]. Lemmas A.1 and A. in the Appendix that As r s have zero diagonals for r = 1; ; p, it follows by = Var[g( )] = 11 ; () where 11 = Var[g 1 ( )] = (I m Q) (I m Q) = Q 11 Q Q 1m Q..... Q 1m Q Q mm Q 1

and = Var[g ( )] with a typical block matrix in given by E(g ;ij g;kl)j = = tr( il 1 jk 1 ) + tr( ik 1 jl 1) tr( il 1 jk 1 ) + tr( ik p jl p)..... tr( il p jk 1 ) + tr( ik p jl 1) tr( il p jk p ) + tr( ik p jl p) : The explicit expression for D depends on the speci c restrictions imposed on the model parameters. Let Z k = [Y k ; Y k ; X k ]. Then, @ D = E[ @ g( )] = [D 1; D ] ; () where @ D 1 = E[ @ g 1( )] = Q E(Z 1 )... Q E(Z m ) and @ D = E[ @ g ( )] = 1;11. 1;1m... 1;m1. + ;11.... ;m1... ;1m ; 1;mm ;mm with 1;kl = [E(Z k 1u l ); ; E(Z k pu l )] and ;kl = [E(Z l 1u k ); ; E(Z l pu k )]. In the following proposition we establish consistency and asymptotic normality of the GMM estimator e gmm de ned in (). Proposition 1 Suppose Assumptions 1-8 hold, Then, p n( e gmm ) d! N(; AsyVar( e gmm )) 1

where AsyVar( e gmm ) = lim n!1 [(n 1 D) F F(n 1 D)] 1 (n 1 D) F F(n 1 )F F(n 1 D)[(n 1 D) F F(n 1 D)] 1 with and D de ned in () and () respectively. With F F in () replaced by (n 1 ) 1, AsyVar( e gmm ) reduces to (lim n!1 n 1 D 1 D) 1. Therefore, by the generalized Schwarz inequality, (n 1 ) 1 is the optimal GMM weighting matrix. However, since depends on the unknown matrix, the GMM estimator with the optimal weighting matrix (n 1 ) 1 is infeasible. The following proposition extends the result in Lin and Lee (1) to the simultaneous-equation SAR model by suggesting consistent estimators for n 1 and n 1 D under heteroskedasticity of unknown form. With consistently estimated n 1 and n 1 D, the feasible optimal GMM estimator and its heteroskedasticity-robust standard error can be obtained. Proposition Suppose Assumptions 1-8 hold. Let e be a consistent estimator of and e kl = diag(eu 1k eu 1l ; ; eu nk eu nl ) where eu ik is the i-th element of eu k = u k ( e k ). Let n 1 D e and n 1 e be estimators of n 1 and n 1 D, with and kl in and D replaced by e and e kl respectively. Then, n 1 D e n 1 D = o p (1) and n 1 e n 1 = o p (1). Finally Proposition establishes asymptotic normality of the feasible optimal GMM estimator. Proposition Suppose Assumptions 1-8 hold. The optimal GMM estimator is given by b gmm = arg min g() e g(); () where n 1 e is a consistent estimator of n 1. Then, p n( b gmm ) d! N(; (lim n!1 n 1 D D) 1 ). Note that, the SLS estimator can be treated as a special case of the optimal GMM estimator using only linear moment conditions, i.e., b SLS = arg min g 1 () e 1 11 g 1() = (Z e PZ) 1 Z e Py; where Z = Z 1... Z m 1

and e P = (I m Q)[(I m Q ) e (I m Q)] 1 (I m Q ). Similar to Proposition, we can show that p n( b SLS )! d N(; ( lim n 1 D 1 1 n!1 11 D 1 1 )): Since D 1 D D 1 1 11 D 1 = D 1 D, which is positive semi-de nite, the proposed GMM estimator is asymptotically more e cient than the SLS estimator.. Best moment conditions under homoskedasticity The above optimal GMM estimator is only optimal given the chosen moment conditions. The asymptotic e ciency of the optimal GMM estimator can be improved by choosing the best moment conditions. As discussed in Lin and Lee (1), under heteroskedasticity of unknown form, the best moment conditions may not be available. However, under homoskedasticity, it is possible to nd the best linear and quadratic moment conditions with Q and r s satisfying Assumption. In general, the best moment conditions depend on the model speci cation. For expositional purpose, we consider a two-equation SAR model given by y 1 = 1; y + 11; Wy 1 + 1; Wy + X 1 1; + u 1 () y = 1; y 1 + 1; Wy 1 + ; Wy + X ; + u where X 1 and X are respectively n K 1 and n K sub-matrices of X. Suppose u 1 and u are n 1 vectors of i.i.d. random variables with zero mean and E(u 1 u 1) = 11 I n, E(u u ) = I n and E(u 1 u ) = 1 I n. The reduced form of () is y 1 y = S 1 X 1 1; + S 1 X ; u 1 u ; (8) where S = I n I n W = 1 1; 1; 1 I n 11; 1; 1; ; W: 1

Let (S 1 ) kl denote the (k; l)-th block matrix of S 1, i.e., (S 1 ) kl = (i ;k I n)s 1 (i ;l I n ), for k; l = 1;. Then, from (8), y 1 = (S 1 ) 11 X 1 1; + (S 1 ) 1 X ; + (S 1 ) 11 u 1 + (S 1 ) 1 u (9) y = (S 1 ) 1 X 1 1; + (S 1 ) X ; + (S 1 ) 1 u 1 + (S 1 ) u : With the residual functions u 1 ( 1 ) = y 1 1 y 11 Wy 1 1 Wy X 1 1 u ( ) = y 1 y 1 1 Wy 1 Wy X ; the moment functions are given by g() = [g 1 () ; g () ], where g 1 () = Q u 1 ( 1 ) Q u ( ) and g () = g ;11 ( 1 ; 1 ) g ;1 ( 1 ; ) g ;1 ( ; 1 ) g ; ( ; ) ; with g ;kl ( k ; l ) = [ 1u k ( k ); ; pu k ( k )] u l ( l ). Then, the asymptotic variance-covariance matrix for the optimal GMM estimator de ned in () is AsyVar( b gmm ) = (lim n!1 n 1 D D) 1. Under homoskedasticity, de ned in () can be simpli ed so that 11 = Var[g 1 ( )] = 11Q Q 1 Q Q 1 Q Q Q Q 18

and = Var[g ( )] = 11 11 1 11 1 1 11 1 1 11 1 11 1 11 1 1 1 + 11 11 1 11 1 1 11 1 11 1 1 11 1 1 11 1 ; 1 1 1 1 1 1 with 1 = tr( 1 1 ) tr( 1 p )..... tr( p 1 ) tr( p p ) and = tr( 1 1) tr( 1 p)..... tr( p 1) tr( p p) : Furthermore, D de ned in () can be simpli ed so that @ D 1 = E[ @ g 1( )] = Q E(Z 1 ) Q E(Z ) () and @ D = E[ @ g ( )] = 1;11 1;1 1;1 + ;11 ;1 ;1 ; (1) 1; ; with 1;kl = [E(Z k 1u l ); ; E(Z k pu l )] and ;kl = [E(Z l 1u k ); ; E(Z l pu k )]. Let G kl = W(S 1 ) kl for k; l = 1;. From the reduced form (9), E(Z k ) = [(S 1 ) k;1 X 1 1; +(S 1 ) k; X ; ; G 11 X 1 1; +G 1 X ; ; G 1 X 1 1; +G X ; ; X k ] 19

and E(Z kau l ) = 1l tr[a (S 1 ) k;1 ] + l tr[a (S 1 ) k; ] 1l tr(a G 11 ) + l tr(a G 1 ) 1l tr(a G 1 ) + l tr(a G ) for k; l = 1;, where A can be replaced by either r or r for r = 1; ; p. As we can see now, both and D depend on Q and r s. The best Q and r s are those that minimize the asymptotic variance-covariance matrix of b gmm given by (lim n!1 n 1 D D) 1. Under homoskedasticity, we can use the criterion for the redundancy of moment conditions (Breusch et al., 1999) to show that the best Q and r s satisfying Assumption are Q = [X; (S 1 ) 11 X 1 ; (S 1 ) 1 X ; (S 1 ) 1 X 1 ; (S 1 ) X ; G 11 X 1 ; G 1 X ; G 1 X 1 ; G X ] 1 = (S 1 ) 11 diag[(s 1 ) 11 ] = G 11 diag(g 11 ) = (S 1 ) 1 diag[(s 1 ) 1 ] = G 1 diag(g 1 ) = (S 1 ) 1 diag[(s 1 ) 1 ] = G 1 diag(g 1 ) = (S 1 ) diag[(s 1 ) ] 8 = G diag(g ): () Proposition For model () with homoskedastic disturbances, the best Q and r s that satisfy Assumption are given in (). The best moment conditions with Q and r s given in () are not feasible as Q and r s involve unknown parameters and. With some consistent preliminary estimators for and, the feasible best moment conditions can be obtained in a similar manner as in Lee (). Note that, (S 1 ) kl = P 1 j= c jw j for some constants c ; c 1 ;, for k; l = 1;. Thus, we can use the leading order terms of the series expansion, i.e., I n ; W; ; W p to approximate the unknown (S 1 ) kl and G kl = W(S 1 ) kl in Q and r s. Since the approximation error goes to zero very fast as p increases, in practice, we could use Q = [X; WX; ; W p X] and 1 = W; = W diag(w ); ; p = W p diag(w p ) for some small p to construct the moment conditions.

Monte Carlo To study the nite sample performance of the proposed GMM estimator, we conduct a small Monte Carlo simulation experiment. The model considered in the experiment is given by y 1 = 1; y + 11; Wy 1 + 1; Wy + 1; x 1 + u 1 y = 1; y 1 + 1; Wy 1 + ; Wy + ; x + u : In the DGP, we set 1; = 1; = :, 11; = ; = :, and 1; = 1; = :. For the spatial weights matrix W = [w ij ], we take guidance from the speci cation in Kelejian and Prucha (1). More speci cally, we divide the sample into two halves, with each half formulating a circle. In the rst circle (containing n= cross sectional units), every unit is connected with the unit ahead of it and the unit behind. In the second circle, every unit is connected with the 9 units ahead of it and the 9 units behind. There are no connections across those two circles. Let d i be the number of connections of the i-th unit. We set w ij = 1=d i if units i and j are connected and w ij = otherwise. We generate x 1 and x from independent standard normal distributions. We generate u 1 = (u 11 ; ; u n1 ) and u = (u 1 ; ; u n ) such that u i1 = p & i i1 and u i = p & i i, where i1 and i are respectively the i-th elements of 1 and with 1 B N @; I n 1 I n 1 I n I n 1 C A : For the heteroskedastic case, we set & i = p d i =. The average variances of u 1 and u are. For the homoskedastic case, we set & i =. We conduct 1 replications in the simulation experiment for di erent speci cations with n f; g, 1 f:; :; :g, and ( 1; ; ; ) f(1; 1); (:; :)g. Let u 1 ( 1 ) = y 1 1 y 11 Wy 1 1 Wy 1 x 1 u ( ) = y 1 y 1 1 Wy 1 Wy x ; where 1 = ( 1 ; 11 ; 1 ; 1 ) and = ( 1 ; 1 ; ; ). Let Q = [x 1 ; x ; Wx 1 ; Wx ; W x 1 ; W x ] and = W. We consider the following estimators: 1

(a) The SLS estimator of 1 based on the linear moment function Q u 1 ( 1 ). (b) The SLS estimator of = ( 1; ) based on the linear moment function (I Q) u(), where u() = [u 1 ( 1 ) ; u ( ) ]. (c) The single-equation GMM (GMM-1) estimator of 1 based on the linear moment function Q u 1 ( 1 ) and quadratic moment function u 1 ( 1 ) u 1 ( 1 ). (d) The system GMM (GMM-) estimator of based on the linear moment function (I Q) u() and the quadratic moment functions u 1 ( 1 ) u 1 ( 1 ), u 1 ( 1 ) u ( ), and u ( ) u ( ). (e) The QML estimator of described in Yang and Lee (1). Among the above estimators, the SLS and GMM-1 are equation-by-equation limited information estimators, while SLS, GMM- and QML are full information estimators. To obtain the heteroskedasticity-robust optimal weighting matrix for the SLS and GMM estimators, we consider preliminary estimators of 1 and given by e 1 = arg min u 1 ( 1 ) Q(Q Q) 1 Q u 1 ( 1 ) + [u 1 ( 1 ) u 1 ( 1 )] e = arg min u ( ) Q(Q Q) 1 Q u ( ) + [u ( ) u ( )] : The estimation results are reported in Tables 1-. We report the mean and standard deviation (SD) of the empirical distributions of the estimates. To facilitate the comparison of di erent estimators, we also report their root mean square errors (RMSE). The main observations from the experiment are summarized as follows. [Insert Tables 1- here] With homoskedastic disturbances, all the estimators are essentially unbiased when n =. The QML estimator for ( 11; ; 1; ) has the smallest SD with the GMM- estimator being a close runner up. As reported in Table 1, when n = and 1 = :, the QML estimator of 11; reduces the SDs of the SLS, SLS, GMM-1 and GMM- estimators by, respectively,.9%,.%, 1.9% and.1%; and the QML estimator of 1; reduces the SDs of the SLS, SLS, GMM-1 and GMM- estimators by, respectively,.1%,.%,.1% and.%. On the other hand, the SDs of all the estimators for 1; and 1; are largely the same.

With heteroskedastic disturbances, the QML estimator for ( 11; ; 1; ) is downwards biased. The bias remains as sample size increases. The GMM- estimator for ( 11; ; 1; ) has the smallest SD. As reported in Table, when n = and 1 = :, the GMM- estimator of 11; reduces the SDs of the SLS, SLS, GMM-1 and QML estimators by, respectively,.%, 1.%,.% and 1.8%; and the GMM- estimator of 1; reduces the SDs of the SLS, SLS, GMM-1 and QML estimators by, respectively, 1.%, 1.8%, 11.% and 19.%. When 1; = ; = :, the IV matrix Q is less informative and the e ciency improvement of the GMM- estimator relative to the other estimators is more prominent. As reported in Table, when n = and 1 = :, the GMM- estimator of 11; reduces the SDs of the SLS, SLS, GMM-1 and QML estimators by, respectively,.%,.%, 1.% and.%; and the GMM- estimator of 1; reduces the SDs of the SLS, SLS, GMM-1 and QML estimators by, respectively,.9%,.9%, 1.% and 9.%. In this case, the GMM- estimators for 1; and 1; also reduce the SDs of the QML estimators by, respectively, 9.% and.8%. The computational cost of the GMM estimator is much lower than that of the QML estimator. For example, when n =, 1; = ; = 1, 1 = : and disturbances are homoskedastic, the average computation time for GMM-1, GMM- and QML estimators are, respectively,.,.1 and 9.8 seconds. 9 Summary In this paper, we propose a general GMM framework for the estimation of SAR models in a system of simultaneous equations with unknown heteroskedasticity. We introduce a new set of quadratic moment conditions to construct the GMM estimator based on the correlation structure of model disturbances within and across equations. We establish the consistency and asymptotic normality of the proposed the GMM estimator and discuss the optimal choice of moment conditions. We also provide heteroskedasticity-robust estimators for the optimal GMM weighting matrix and the asymptotic variance-covariance matrix of the GMM estimator. The Monte Carlo experiments show that the proposed GMM estimator perform well in nite samples. In particular, the optimal GMM estimator with some simple moment functions is robust under heteroskedasticity with no apparent loss in e ciency under homoskedasticity, whereas the 9 The Monte Carlo experiments are performed on a computer with Intel (R) Xeon (R) CPU X @. GHz and. GB RAM.

QML estimator for the spatial lag coe cient is biased with a large standard deviation in the presence of heteroskedasticity. Furthermore, the computational cost of the GMM approach is drastically smaller than that of the QML. References Allers, M. A. and Elhorst, J. P. (11). A simultaneous equations model of scal policy interactions, Journal of Regional Science 1: 1 91. Baltagi, B. H. and Pirotte, A. (11). Seemingly unrelated regressions with spatial error components, Empirical Economics : 9. Breusch, T., Qian, H., Schmidt, P. and Wyhowski, D. (1999). Redundancy of moment conditions, Journal of Econometrics 91: 89 111. Cli, A. and Ord, J. (19). Spatial Autecorrelation, Pion, London. Cli, A. and Ord, J. (1981). Spatial Processes, Models and Applications, Pion, London. Cohen-Cole, E., Liu, X. and Zenou, Y. (1). Multivariate choices and identi cation of social interactions. Working paper, University of Colorado Boulder. Hansen, L. P. (198). Large sample properties of generalized method of moments estimators, Econometrica : 19 1. Hauptneier, S., Mittermaier, F. and Rincke, J. (1). Fiscal competition over taxes and public inputs, Regional Science and Urban Economics : 19. Kelejian, H. H. and Prucha, I. R. (). Estimation of simultaneous systems of spatially interrelated cross sectional equations, Journal of Econometrics 118:. Kelejian, H. H. and Prucha, I. R. (1). Speci cation and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances, Journal of Econometrics 1:. Lee, L. F. (). GMM and SLS estimation of mixed regressive, spatial autoregressive models, Journal of Econometrics 1: 89 1. Lin, X. and Lee, L. F. (1). GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics 1:.

Liu, X. (1). GMM identi cation and estimation of peer e ects in a system of simultaneous equations. Working paper, Univeristy of Colorado Boulder. Manski, C. F. (199). Identi cation of endogenous social e ects: the re ection problem, The Review of Economic Studies : 1. Patacchini, E. and Zenou, Y. (1). Juvenile delinquency and conformism, Journal of Law, Economics, and Organization 8: 1 1. Schmidt, P. (19). Econometrics, Marcel Dekker, New York. White, H. (198). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 8: 81 88. White, H. (199). Estimation, Inference and Speci cation Analysis, Cambridge Univsersity Press, New York. Yang, K. and Lee, L. F. (1). Identi cation and QML estimation of multivariate and simultaneous equations spatial autoregressive models, Journal of Econometrics 19: 19 1.

A Lemmas In the following, we list some lemmas useful for proving the main results in this paper. Lemma A.1 Let A = [a ij ] and B = [b ij ] be n n nonstochastic matrices with zero diagonals. Let 1 ; ; ; be n 1 vectors of independent random variables with zero mean. Let kl = E( k l ) for k; l = 1; ; ;. Then, E( 1A B ) = tr( 1 A B ) + tr( 1 A B): Proof: As a ii = b ii = for all i, E( 1A B ) nx nx nx nx = E( a ij b kl 1;i ;j ;k ;l ) = i=1 j=1 k=1 l=1 nx a ii b ii E( 1;i ;i ;i ;i ) + i=1 + nx nx a ii b jj E( 1;i ;i )E( ;j ;j ) i=1 j=i nx nx nx nx a ij b ij E( 1;i ;i )E( ;j ;j ) + a ij b ji E( 1;i ;i )E( ;j ;j ) i=1 j=i i=1 j=i = tr( 1 A B ) + tr( 1 A B): Lemma A. Let A = [a ij ] be an n n nonstochastic matrix with a zero diagonal and c = (c 1 ; ; c n ) be an n 1 nonstochastic vector. Let 1 ; ; be n 1 vectors of independent random variables with zero mean. Then, E( 1A c) = : Proof: As a ii = for all i, nx nx nx nx E( 1A c) = E( a ij c k 1;i ;j ;k ) = a ii c i E( 1;i ;i ;i ) = : i=1 j=1 k=1 i=1

Lemma A. Let A be an mn mn nonstochastic matrix with row and column sums uniformly bounded in absolute value. Suppose u satis es Assumption 1. Then (i) n 1 u Au = O p (1) and (ii) n 1 [u Au E(u Au)] = o p (1). Proof: A trivial extension of Lemma A. in Lin and Lee (1). Lemma A. Let A be an mn mn nonstochastic matrix with row and column sums uniformly bounded in absolute value. Let c be an mn1 nonstochastic vector with uniformly bounded elements. Suppose u satis es Assumption 1. Then n 1= c Au = O p (1) and n 1 c Au = o p (1). Furthermore, if lim n!1 n 1 c AA c exists and is positive de nite, then n 1= c Au! d N(; lim n!1 n 1 c AA c). Proof: A trivial extension of Lemma A. in Lin and Lee (1). Lemma A. Let A kl be an n n nonstochastic matrix with row and column sums uniformly bounded in absolute value and c k an n 1 nonstochastic vector with uniformly bounded elements for k; l = 1; ; m. Suppose u satis es Assumption 1. Let = Var(), where = P m k=1 c k u k + P m k=1 P m l=1 [u k A klu l tr(a kl kl )]. If n 1 is bounded away from zero, then 1 d! N(; 1). Proof: A trivial extension of Lemma in Yang and Lee (1). Lemma A. Let c 1 and c be mn 1 nonstochastic vectors with uniformly bounded elements. Let S = I mn ( I n ) ( W) and e S = I mn ( e I n ) ( e W), where e and e are consistent estimators of and respectively. Then, n 1 c 1( e S 1 S 1 )c = o p (1). Proof: A trivial extension of Lemma A.9 in Lee (). Lemma A. Let f() = [f 1 () ; f () ] with E[f( )] =. De ne D i = E[ @ @ f i ()] and ij = E[f i ()f j () ] for i; j = 1;. The following statements are equivalent (i) f is redundant given f 1 ; (ii) D = 1 1 11 D 1 and (iii) there exists a matrix A such that D = 1 A and D 1 = 11 A. Proof: See Breusch et al. (1999). B Proofs Proof of Proposition 1: For consistency, we rst need to show that n 1 Fg() n 1 E[Fg()] = o p (1) uniformly in. Suppose the i-th row of F can be written as F i = [f i;1 ; ; f i;m ; f i;11;1 ; ; f i;11;p ; ; f i;mm;1 ; ; f i;mm;p ]:

Then, px F i g() = f i;k Q u k ( k ) + f i;kl;r u k ( k ) r u l ( l ): k=1 k=1 l=1 r=1 Let k; = ( 1k; ; ; mk; ) and k; = ( 1k; ; ; mk; ) denote, respectively, the k-th column of and, including the restricted parameters. From the reduced form (), u k ( k ) = y k Y k k Y k k X k k () = Y k ( k; k ) + Y k ( k; k ) + X k ( k; k ) + u k = Y( k; k ) + WY( k; k ) + X k ( k; k ) + u k = d k ( k ) + u k + [( lk; lk )(i m;l I n ) + ( lk; lk )(i m;l W)]S 1 u l=1 where d k ( k ) = [( lk; lk )(i m;l I n ) + ( lk; lk )(i m;l W)]S 1 (B I n )x + X k ( k; k ) l=1 and S = I mn I n W. This implies that E[Q u k ( k )] = Q d k ( k ) 8

and E[u k ( k ) r u l ( l )] = d k ( k ) r d l ( l ) + ( jl; jl )tr[ r (i m;j I n )S 1 E(uu k)] + ( jl; jl )tr[ r (i m;j W)S 1 E(uu k)] j=1 + ( ik; ik )tr[ r(i m;i I n )S 1 E(uu l)] + ( ik; ik )tr[ r(i m;i W)S 1 E(uu l)] + + + + i=1 i=1 j=1 i=1 j=1 i=1 j=1 i=1 j=1 ( ik; ik )( jl; jl )tr[s 1 (i m;i I n ) r (i m;j I n )S 1 ] ( ik; ik )( jl; jl )tr[s 1 (i m;i W) r (i m;j I n )S 1 ] ( ik; ik )( jl; jl )tr[s 1 (i m;i I n ) r (i m;j W)S 1 ] ( ik; ik )( jl; jl )tr[s 1 (i m;i W) r (i m;j W)S 1 ]: j=1 i=1 As F i g() is a quadratic function of and the parameter space of is bounded, it follows by Lemmas A. and A. that n 1 F i g() n 1 E[F i g()] = o p (1) uniformly in. Furthermore, n 1 E[Fg()] is uniformly equicontinuous in. The identi cation condition and the uniform equicontinuity of n 1 E[Fg()] imply that the identi cation uniqueness condition for n E[g() ]F FE[g()] holds. Therefore, e gmm is a consistent estimator of (White, 199). For the asymptotic normality, we use the mean value theorem to write p n( e gmm ) = 1 @ n @ g(e gmm ) F 1 n F @ 1 @ g() 1 @ n @ g(e gmm ) F 1 pn Fg( ) where is as convex combination of e gmm and. By Lemma A. together with the Cramer Wald device, 1 pn Fg( ) converges in distribution to N(; lim n!1 n 1 FF ). Furthermore, consistency of e gmm implies that also converges in probability to. Therefore it su ces to show that n 1 @ @ g() lim n!1 n 1 E[ @ @ g()] = o p (1) uniformly in. We divide the remainder of the proof into two parts focusing respectively on the partial derivatives of g 1 () and g (). First, note that @ @ g 1() = Q Z 1... : Q Z m 9

From the reduced form (), we have y l = (i m;l I n )S 1 [(B I n )x + u]: By Lemma A., we have n 1 Q y l n 1 E(Q y l ) = o p (1) and n 1 Q Wy l = n 1 E(Q Wy l ) + o p (1), which implies that that n 1 @ g 1 () Next, note that @ n 1 E[ @ @ g 1 ()] = o p (1). @ @ g () = 1;11 ( 1 ). 1;1m ( m )... 1;m1 ( 1 ). 1;mm ( m ) ;11 ( 1 ) ;m1 ( m )....... ;1m ( 1 ) ;mm ( m ) ; where 1;kl ( l ) = [Z k 1u l ( l ); ; Z k pu l ( l )] and ;kl ( k ) = [Z l 1u k ( k ); ; Z l pu k ( k )]. As u k ( k ) can be expanded as (), it follows by Lemmas A. and A. that n 1 Z k ru l ( l ) n 1 E[Z k ru l ( l )] = o p (1) and n 1 Z l ru k ( k ) @ n 1 E[Z l ru k ( k )] = o p (1) uniformly in. Therefore, n 1 @ g () n 1 E[ @ @ g ()] = o p (1) and the desired result follows. Proof of Proposition : We divide the proof into two parts. First we prove the consistency of n 1 e. Then we prove the consistency of n 1 e D. (1) To show n 1 e n 1 = o p (1), we need to show that (1a) n 1 Q e kl Q n 1 Q kl Q = o p (1) and (1b) n 1 tr( e kl A 1 e st A ) n 1 tr( kl A 1 st A ) = o p (1), for k; l; s; t = 1; ; m, for nn zerodiagonal matrices A 1 = [a 1;ij ] and A = [a ;ij ] that are uniformly bounded in row and column sums. The consistency of n 1 Q e kl Q in (1a) can be shown by a similar argument as in White (198). Thus, we focus on the consistency of n 1 tr( e kl A 1 e st A ) = n 1 P n i;j=1 a 1;ija ;ji eu ik eu il eu js eu jt. It follows by a similar argument as in Lin and Lee (1) that n 1 P n i;j=1 a 1;ija ;ji u ik u il u js u jt n 1 P n i;j=1 a 1;ija ;ji i;kl j;st = o p (1). Therefore, to show (1b) holds, we only need to show that n 1 P n i;j=1 a 1;ija ;ji eu ik eu il eu js eu jt n 1 P n i;j=1 a 1;ija ;ji u ik u il u js u jt = o p (1).

Note that X n n 1 X n a 1;ij a ;ji eu ik eu il eu js eu jt n 1 a 1;ij a ;ji u ik u il u js u jt i;j=1 i;j=1 X n X n = n 1 a 1;ij a ;ji (eu ik eu il u ik u il )u js u jt + n 1 a 1;ij a ;ji u ik u il (eu js eu jt u js u jt ) i;j=1 i;j=1 X n +n 1 a 1;ij a ;ji (eu ik eu il u ik u il )(eu js eu jt u js u jt ): i;j=1 From (), we have eu k = u k ( e k ) = y k Y k e k Y k e k X k e k = d k ( e k ) + u k + e k ( e k ) where d k ( e k ) = e k ( e k ) = [( lk; e lk )(i m;l I n ) + ( lk; lk e )(i m;l W)]S 1 (B I n )x + X k ( e k; k ) l=1 [( lk; e lk )(i m;l I n ) + ( lk; lk e )(i m;l W)]S 1 u l=1 and S = I mn ( I n ) ( W). Let d ik and e ik denote the i-th element of d k ( e k ) and e k ( e k ) respectively. Then, eu ik eu il = u ik u il + d ik d il + e ik e il + (u ik d il + d ik u il ) + (u ik e il + e ik u il ) + (d ik e il + e ik d il ): To show n 1 P n i;j=1 a 1;ija ;ji (eu ik eu il orders in u il. One of such terms is u ik u il )u js u jt = o p (1), we focus on terms that are of higher X n n 1 i;j=1 a 1;ij a ;ji e ik u il u js u jt = X n ( rk; e rk )n 1 a 1;ij a ;ji u il u js u jt (i m;r i n;i)s 1 u r=1 + i;j=1 ( X rk; rk e n )n 1 a 1;ij a ;ji u il u js u jt (i m;r w i )S 1 u; r=1 i;j=1 where w i denotes the i-th row of W. By Assumption 1, we can show Eju hk u il u js u jt j c for some constant c using Cauchy s inequality, which implies Ejn 1 P n i;j=1 P m r=1 a 1;ija ;ji u il u js u jt (i m;r i n;i )S 1 uj = O(1) and Ejn 1 P n i;j=1 P m r=1 a 1;ija ;ji u il u js u jt (i m;r w i )S 1 uj = O(1) because A 1, 1

A, W and S 1 are uniformly bounded in row and column sums. Hence, n 1 P n i;j=1 a 1;ija ;ji e ik u il u js u jt = o p (1). Similarly, we can show other terms in n 1 P n i;j=1 a 1;ija ;ji (eu ik eu il u ik u il )u js u jt are of order o p (1). With a similar argument as above or as in Lin and Lee (1), n 1 P n i;j=1 a 1;ija ;ji u ik u il (eu js eu jt u js u jt ) = o p (1) and n 1 P n i;j=1 a 1;ija ;ji (eu ik eu il u ik u il )(eu js eu jt u js u jt ) = o p (1). Therefore, the consistency of n 1 tr( e kl A 1 e st A ) in (1b) follows. () Some typical entries in D that involve unknown parameters are Q E(y k ) = Q (i m;k I n )S 1 (B I n )x, Q E(Wy k ) = Q (i m;k W n)s 1 (B I n )x, E(u l Ay k) = tr[(i m;l A)(i m;k I n )S 1 ] and E(u l AWy k) = tr[(i m;l A)(i m;k W)S 1 ], where A = [a ij ] is an nn zero-diagonal matrix uniformly bounded in row and column sums. To show n 1 e D n 1 D = o p (1), we need to show that (a) n 1 Q (i m;k I n) e S 1 ( e B I n )x n 1 Q (i m;k I n)s 1 (B I n )x = o p (1) and n 1 Q (i m;k W)e S 1 ( e B I n )x n 1 Q (i m;k W)S 1 (B I n )x = o p (1); (b) n 1 tr[(i m;l A)(i m;k I n) e S 1 e ] n 1 tr[(i m;l A)(i m;k I n)s 1 ] = o p (1) and n 1 tr[(i m;l A)(i m;k W) e S 1 e ] n 1 tr[(i m;l A)(i m;k W)S 1 ] = o p (1). where e S = I mn ( e I n ) ( e W) and e = e 11 1m e..... e m1 mm e : As (a) follows by Lemma A. and (b) follows by a similar argument as in Lin and Lee (1), we conclude n 1 e D n 1 D = o p (1). Proof of Proposition : For consistency, note that g() e 1 g() = g() 1 g() + g() ( e 1 1 )g(): From the proof of Proposition 1, n 1 g() 1 g() n 1 E[g() 1 g()] = o p (1) uniformly in. Hence, it su ces to show that n 1 g() ( e 1 1 )g() = o p (1) uniformly in. Let k k denote the Euclidian norm for vectors and matrices. Then, 1 n g() ( e 1 1 )g() 1 1 1 1 n kg()k n e 1 n : From the proof of Proposition 1, n 1 g() n 1 E[g()] = o p (1) uniformly in. As n 1 E[Q u k ( k )] =