GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Lung-fei Lee* Department of Economics. Ohio State University, Columbus, OH

Size: px

Start display at page:

Download "GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Lung-fei Lee* Department of Economics. Ohio State University, Columbus, OH"

Damon Martin
5 years ago
Views:

1 GMM ad 2SLS Estimatio of Mixed Regressive, Spatial Autoregressive Models by Lug-fei Lee* Departmet of Ecoomics Ohio State Uiversity, Columbus, OH October 200, October 2004; curret revisio July 2005 Abstract The GMM method ad the classical 2SLS method are cosidered for the estimatio of mixed regressive, spatial autoregressive models. These methods have computatioal advatage over the covetioal maximum likelihood method. The proposed GMM estimators are show to be cosistet ad asymptotically ormal. Withi certai classes of GMM estimators, best oes are derived. The proposed GMM estimators improve upo the 2SLS estimators ad are applicable eve if all regressors are irrelevat. A best GMM estimator may have the same limitig distributio as the ML estimator with ormal disturbaces). Key Words: Spatial ecoometrics, spatial autoregressive models, regressio, 2SLS, GMM, ML, efficiecy. Classificatio: C3, C2, R5 Correspodig address: Lug-fei Lee, 40 Arps Hall, Departmet of Ecoomics, Ohio State Uiversity, 945 N. High Street, Columbus, OH lflee@eco.ohio-state.edu * I am grateful to the Natioal Sciece Foudatio, Ecoomics Program for fiacial support for my research uder the grat umber SES-0380 ad SES I thak Mr. Ji Tao, who has worked as my RA, to coduct the Mote Carlo study i this paper. I appreciate havig valuable commets ad suggestios from Professor Christopher R. Bolliger, a aoymous referee ad a associate editor of this joural. i

2 . Itroductio I this paper, we propose a geeral GMM framework for the estimatio of mixed regressive, spatial autoregressive MRSAR) models. The GMM estimatio for those models ca be computatioally simpler tha the maximum likelihood ML) or quasi maximum likelihood QML) methods i a geeral settig. The GMM estimator GMME) may be asymptotically more efficiet tha the two-stage least squares 2SLS) estimator 2SLSE) ad may be asymptotically efficiet as the ML estimator MLE). The 2SLS method has bee oted for the estimatio of the MRSAR model i Aseli 988, 990), Lad ad Deae 992), Kelejia ad Robiso 993), Kelejia ad Prucha 997, 998), ad Lee 2003), amog others. The istrumetal variables IVs) are geerated from exogeous regressors ad the spatial weights matrix of the model. However, there are some issues o the 2SLS approach. The mai issue is that the proposed 2SLSE is iefficiet relative to the MLE. For the estimatio of a covetioal liear simultaeous equatio model, the 2SLSE is asymptotically efficiet as the limited iformatio MLE see, e.g., Amemiya 985). This is ot so for the estimatio of the MRSAR model. The 2SLSE has bee show to be cosistet ad asymptotically ormally distributed Kelejia ad Prucha 998). Lee 2003) discusses the best oe B2SLSE) withi the class of IV estimators. By comparig the limitig variace matrices, the 2SLSE ad B2SLSE are less efficiet relative to the MLE whe the disturbaces are ormally distributed). A subtle issue is that the 2SLS method would ot be cosistet whe all the exogeous regressors i the MRSAR model are really irrelevat. Furthermore, it is ot possible to test the joit sigificace of all the exogeous regressors based o those IV estimators Kelejia ad Prucha 998). O the cotrary, the MLE of a MRSAR is cosistet ad its limitig distributio ca be used for testig the joit sigificace of regressio coefficiets. These show that the 2SLS approach is less satisfactory tha the ML approach i some of its statistical properties. However, the 2SLS approach is computatioally simpler tha the ML approach ad is distributio free. For pure spatial autoregressive SAR) processes, a method of momets MOM) has bee itroduced i Kelejia ad Prucha 200). The MOM method is computatioally simpler tha the ML method. Their MOM estimator is cosistet but is ulikely to be efficiet relative to the MLE. Recetly, Lee 200) exteds the MOM estimatio ito a more geeral GMM estimatio framework. Withi that GMM framework, a The asymptotic distributio of their MOM estimator has ot bee established i their article. They have provided Mote Carlo results to demostrate that their MOM estimator is oly slightly iefficiet relative to the QML estimator uder various distributios.

3 best GMM estimator BGMME) ca have the same limitig distributio as the MLE or QML estimator. I this paper, we cosider the possible geeralizatio of the MOM method for the MRSAR model. We suggest a combiatio of the momets i the 2SLS framework with momet fuctios origiated for the estimatio of pure SAR processes. We show that the resultig GMME ca be asymptotically efficiet relative to the 2SLSE ad B2SLSE. The best GMME ca be made available ad it ca be efficiet as the MLE. With this GMM framework, oe may also test the joit sigificace of all the possible exogeous regressors. This paper is orgaized as follows. I Sectio 2, we cosider the estimatio of a MRSAR model. We discuss the momet fuctios that ca be used i additio to the momets based o the orthogoality of exogeous regressors with the model disturbace. Cosistecy ad asymptotic distributio of the GMME will be derived i Sectio 3. I Sectio 4, the GMME ad the 2SLSE are compared. The best selectio of momet fuctios ad IVs will be discussed ad its possible efficiecy property is derived. All the proofs of the results are collected i the appedices. Sectio 5 provides some Mote Carlo results for the compariso of fiite sample properties of estimators. Coclusios are draw i Sectio GMM Estimatio ad Idetificatio of the MRSAR Model The MRSAR model differs from a pure SAR process i the presece of exogeous regressors X as explaatory variables i the model: Y = λw Y + X β + ɛ, 2.) where X is a k dimesioal matrix of ostochastic exogeous variables, W is a spatial weights matrix of kow costats with a zero diagoal, ad the disturbaces ɛ i, i =,,, of the -dimesioal vector ɛ are i.i.d. 0,σ 2 ). Specifically, we assume that Assumptio. The ɛ i are i.i.d. with zero mea, variace σ 2 ad that a momet of order higher tha the fourth exists. Assumptio 2. The elemets of X are uiformly bouded costats, X has the full rak k, ad lim X X exists ad is osigular. Because statistics ivolvig quadratic forms of ɛ will be preset i the estimatio, the existece of the fourth order momet of ɛ i s will guaratee fiite variaces for the quadratic forms. The higher tha the fourth momet coditio i Assumptio is eeded i order to apply a cetral limit theorem due to Kelejia ad Prucha 200). The ostochastic X ad its uiform boudedess coditios i Assumptio 2 are for 2

4 coveiece. If the elemets of X are stochastic ad have ubouded rages, coditios i Assumptio 2 ca be replaced by some fiite momet coditios. The W Y i 2.) is called a spatial lag ad its coefficiet is supposed to represet the spatial effect due to the ifluece of eighborig uits o a sigle spatial uit. The mai iterest i estimatio of the model is, i geeral, the parameters λ ad β. I order to distiguish the true parameters from other possible values i the parameter space, we deote λ 0,β 0, ad σ0 2 as the true parameters which geerate a observed sample. Let θ =λ, β ) ad θ 0 =λ 0,β 0). This model is supposed to be a equilibrium model. The structural equatio 2.) implies the reduced form equatio that Y =I λ 0 W ) X β 0 +I λ 0 W ) ɛ. 2.2) It follows that W Y = W I λ 0 W ) X β 0 + W I λ 0 W ) ɛ ad W Y is correlated with ɛ because, i geeral, EW I λ 0 W ) ɛ ) ɛ )=σ 2 0trW I λ 0 W ) ) 0. There are some regularity coditios o W ad I λ 0 W ) which will be eeded i order that the spatial correlatios betwee uits ca be maageable. The followig assumptio 3 is origiated i the works of Kelejia ad Prucha, e.g., Kelejia ad Prucha 998). A sequece of square matrices {A }, where A = [a,ij ], is said to be uiformly bouded i row sums colum sums) i absolute value if the sequece of row sum matrix orm A = max i=,, j= a,ij colum sum matrix orm A = max j=,, i= a,ij ) are bouded. 2 Assumptio 3. The spatial weights matrices {W } ad {I λw ) } at λ = λ 0 are uiformly bouded i both row ad colum sums i absolute value. Note that we have imposed the uiform boudedess coditio o {I λ 0 W ) } oly at λ = λ 0. The stroger assumptio that {I λw ) } is uiformly bouded i both row ad colum sums i absolute value, uiformly i λ i a compact parameter space of λ) is ot imposed. 3 With the ormal distributio for ɛ, the ukow parameters λ, β ad σ 2 ca be estimated by the ML or QML) method Ord 975). The ML method ivolves the computatio of the determiat I λw ) of I λw ), at each possible value of λ durig a optimizatio search. For the case that W is rowormalized ad the correspodig spatial matrix before row ormalizatio is symmetric, the eigevalues of 2 Properties of those matrix orms ca be foud i Hor ad Johso 985), pp The latter stroger assumptio is eeded for the maximum likelihood approach. For the GMM method that we propose, because the GMM fuctio is a polyomial fuctio of θ, which is relatively simpler fuctio, our aalysis does ot require the stroger uiform boudedess assumptio. 3

5 W are all real. As I λw ) is solely a fuctio of λ ad eigevalues of W, Ord 975) poits out that computatio of I λw ) ca be easily updated as the eigevalues of W eed to be computed oly oce. 4 With a geeral spatial weights matrix which does ot have special properties, such as sparseess ad the symmetry property i Ord 975), the MLE would be difficult to be computed for sample with a large size. Recetly, Smirov ad Aseli 999) discuss the attractability of usig a characteristic polyomial approach. For a computatioal poit of view, a 2SLS method remais the simplest. Kelejia ad Prucha 998) suggest the use of W X, W 2 X, etc., together with X as IVs i a 2SLS for estimatig θ. Lee 2003) shows that the B2SLS correspods to use W I λ 0 W ) X ad X as IV matrices. These 2SLSE ad B2SLSE are computatioally simple ad have closed form expressios. However, the 2SLS ad B2SL methods have the limitatio i that at least oe ocostat regressors i X must have sigificat coefficiets i order that valid istrumetal variables ca be geerated from them. As the 2SLSE ad B2SLSE are based o the existece of relevat ocostat regressors, it is impossible to test the joit sigificace of X with those estimators. 5 Eve if valid istrumetal variables do exist, the 2SLSE may be iefficiet relative to the MLE. By comparig the limitig variace matrices of the MLE ad 2SLSE e.g., i Aseli 988; Aseli ad Bera 998), either the 2SLSE or the B2SLE have the same limitig distributio of the MLE. I this paper, we suggest to icorporate some other momet coditios i additio to those based o X i order to improve upo the efficiecy of the 2SLSE. Let Q be a k x matrix of IVs costructed as fuctios of X ad W i a 2SLS approach. Deote ɛ θ) =I λw )Y X β for ay possible value θ. The momet fuctios correspodig to the orthogoality coditios of Q ad ɛ are Q ɛ θ). Let P be the class of costat matrices which have a zero trace. A subclass P 2 of P cosistig of matrices with a zero diagoal is also iterestig. By selectig matrices P,...,P m from P, we suggest the use of P j ɛ θ)) ɛ θ) i additio to Q ɛ θ) to form a set of momet fuctios. For aalytical tractability, the matrices i P are assumed to have the uiformly boudedess properties as W. 4 However, Kelejia ad Prucha 999) have poited out that Ord s method may suffer from umerically imprecise problems for large sample. 5 Whe β 0 = 0, the model would be a pure spatial autoregressive process. For a spatial autoregressive model with oly a ozero itercept term but o other spatially varyig regressors, W l will ot be a useful istrumet for W Y whe W is row ormalized. Whe W is row-ormalized, W l = l where l is the vector of oes. 4

6 Assumptio 4. The matrices P j s from P are uiformly bouded i both row ad colum sums i absolute value, ad elemets of Q are uiformly bouded. With the selected matrices P j s ad IV matrices Q, the set of momet fuctios forms a vector g θ) =P ɛ θ),,p m ɛ θ),q ) ɛ θ) =ɛ θ)p ɛ θ),,ɛ θ)p mɛ θ),ɛ θ)q ) 2.3) for the GMM estimatio. At θ 0, g θ 0 )=ɛ P ɛ,,ɛ P m ɛ,ɛ Q ), which has a zero mea because EQ ɛ )=Q Eɛ ) = 0 ad Eɛ P j ɛ )=σ 2 0trP j ) = 0 for j =,,m. 6 The ituitio is as follows. The IV variables i Q, ca be used as IV variables for W Y ad X i 2.). The P j ɛ is ucorrelated with ɛ.asw Y = W I λ 0 W ) X β 0 + ɛ ), P j ɛ ca be used as a IV for W Y as log as P j ɛ ad W I λ 0 W ) ɛ are correlated. For ay possible value θ, d θ)p d θ)+σ0tri 2 λ 0 W ) I λw )P I λw )I λ 0 W ) ) Eg θ)) =. d θ)p md θ)+σ0 2trI λ 0 W ) I λw )P mi λw )I λ 0 W ) ), 2.4) Q d θ) where d θ) =X β 0 β) +λ 0 λ)w I λ 0 W ) X β 0. For these momet fuctios to be useful, they have to idetify the true parameter θ 0 of the model. I the GMM framework, the idetificatio coditio requires the uique solutio of the limitig equatios, lim Eg θ)) = 0 at θ 0 Hase 982). The momet equatios correspodig to Q are lim Q d θ) = lim Q X,Q W I λ 0 W ) X β 0 )β 0 β),λ 0 λ) = 0. They will have a uique solutio at θ 0 if Q X,Q W I λ 0 W ) X β 0 ) has a full colum rak, i.e., rak k + ), for large eough. This sufficiet rak coditio implies the ecessary rak coditio that X,W I λ 0 W ) X β 0 ) has a full colum rak k+) ad that Q has a rak at least k + ), for large eough. The sufficiet rak coditio requires Q to be correlated with W Y i the limit as goes to ifiity. This is so because EQ W Y )=Q W I λ 0 W ) X β 0. Uder the sufficiet rak coditio, θ 0 ca thus be idetified via lim Q d θ) =0. The ecessary full rak coditio of X,W I λ 0 W ) X β 0 ) for large is possible oly if the set cosistig of W I λ 0 W ) X β 0 ad X is ot asymptotically liearly depedet. This rak coditio would ot hold, i particular, if β 0 were zero. There are other cases whe this depedece ca occur see, e.g., Kelejia ad Prucha 998). As X has rak k, ifx,w I λ 0 W ) X β 0 ) does ot have a full rak 6 The selectio of the umber of m is ot a importat issue for our GMM approach, because there exists a best P as discussed i a subsequet sectio. Furthermore, from our Mote Carlo study, the selectio of W ad W 2 trw 2 ) I ) provides accurate approximatios for the best oe. 5

7 k + ), its rak will be k, ad there will exist a costat vector c such that W I λ 0 W ) X β 0 = X c. The, d θ) =X β 0 β +λ 0 λ)c) ad Q d θ) =Q X β 0 β +λ 0 λ)c). The correspodig momet equatios Q d θ) = 0 will have may solutios but the solutios are all described by the relatio that β = β 0 +λ 0 λ)c as log as Q X has a full rak k. Uder this sceario, β 0 ca be idetified oly if λ 0 is idetifiable. The idetificatio of λ 0 will rely o the remaiig quadratic momet equatios lim tri λ 0 W ) I λw )P j I λw )I λ 0 W ) ) = 0 for j =,,m. I this case, Y = λ 0 W I λ 0 W ) X β 0 )+X β 0 +I λ 0 W ) ɛ = X β 0 + λ 0 c)+u, where u =I λ 0 W ) ɛ. The relatioship u = λ 0 W u + ɛ is a SAR process. The idetificatio of λ 0 thus comes from the SAR process u. Oe ca see from Lee 200) that the set of the limitig quadratic momet equatios has a uique solutio at λ 0 if lim [trp s W I λ 0 W ) ),,trp s m W I λ 0 W ) )] is liearly idepedet of lim [tri λ 0 W ) W P W I λ 0 W ) ),,tri λ 0 W ) W P m W I λ 0 W ) )], where A s = A+A for ay square matrix A. The followig assumptio summarizes some sufficiet coditios for the idetificatio of θ 0 from the momet equatios lim Eg θ)) = 0. Assumptio 5. Either i) lim Q W I λ 0 W ) X β 0,X ) has the full rak k +), or ii) lim Q X has the full rak k, lim trp s j W I λ 0 W ) ) 0 for some j, ad lim [trp s W I λ 0 W ) ),,trp s m W I λ 0 W ) )] is liearly idepedet of lim [tri λ 0 W ) W P W I λ 0 W ) ),,tri λ 0 W ) W P m W I λ 0 W ) )]. I terms of computatio of the GMM estimator, because the momet fuctios i g θ) are quadratic fuctios i λ ad β, the GMM objective fuctio will be of polyomial of order four. The derivatio of the GMM estimator will ivolve the miimizatio of a polyomial fuctio i θ. The computatio of polyomial coefficiets, which do ot ivolve the ukow θ, eed to be doe oce. The evaluatio of the correspodig objective fuctio will thus ivolve the multiplicatio of these polyomial coefficiets with powers of θ s at differet values of θ. The computatio is more complicated tha that of the 2SLS but shall be simpler tha that of the ML approach. The variace matrix of these momets fuctios ivolves variaces ad covariaces of liear ad quadratic forms of ɛ. For ay square matrix A, let vec D A) =a,,a ) deote the colum vector formed with the diagoal elemets of A. The, EQ ɛ ɛ P ɛ )=Q i= j= p,ijeɛ ɛ i ɛ j )= Q vec D P )µ 3 ad Eɛ P j ɛ ɛ P l ɛ )=µ 4 3σ0)vec 4 D P j)vec D P l )+σ0trp 4 j Pl s ) by Lemma A.2, 6

8 where µ 3 = Eɛ 3 i) ad µ 4 = Eɛ 4 i). It follows that varg θ 0 )) = Ω where Ω = ) µ4 3σ0)ω 4 mω m µ 3 ω mq µ 3 Q + V ω m 0, 2.5) with ω m =[vec D P ),,vec D P m )] ad trp P) s trp P s m) 0 V = σ0 4.. ). trp m P s ) trp mpm s ) 0 = m 0 σ4 0 0 Q, 2.6) σ Q Q σ0 2 Q 0 where m =[vecp ),,vecp m )] [vecp s ),,vecp s m )], by usig trab) =veca ) vecb) for ay coformable matrices A ad B. Whe ɛ is ormally distributed, Ω is simplified to V because µ 3 = 0 ad µ 4 =3σ 4 0. If P j s are from P 2,Ω = V also because ω m = 0. I geeral, from 2.3), Ω is osigular if ad oly if both vecp ),,vecp m )) ad Q have full colum raks. This is so, because Ω would be sigular if ad oly if the momets i g θ 0 ) are fuctioally depedet, equivaletly, if ad oly if m j= a jp j = 0 ad Q b = 0 for some costat vector a,,a m,b ) 0. As elemets of P j s ad Q are uiformly bouded by Assumptio 4, it is apparet that ω mq, ω mω m ad Q Q are of order O). Furthermore, because P j P s l is bouded i row or colum sums i absolute value, trp jp s l )=O). Cosequetly, Ω = O). It is thus meaigful to impose the followig covetioal regularity coditio o the limit of Ω. Assumptio 6. The limit of Ω exists ad is a osigular matrix. 7 The variace matrix Ω is eeded to formulate the optimum GMME with g θ). 3. Cosistecy ad Asymptotic Distributios The followig propositio provides the asymptotic distributio of the GMME with a liear trasformatio of the momet equatios, a g θ), where a is a matrix with a full row rak greater tha or equal to k + ). The a is assumed to coverge to a costat full rak matrix a 0. This correspods to the Hase s GMM settig, which illustrates the optimal weightig issue. Assumptio 7. θ 0 is i the iterior of the parameter space Θ, which is a compact subset of R k I this paper, for simplicity, we do ot cosider the large group iteractios sceario as i Case 99), where lim Ω might be sigular. It is possible to exted our aalysis to cover that case but the aalysis would become much algebraically complicated. For a aalysis o the MLE, see Lee 2004). 8 For a oliear extremum estimator, the parameter space would usually be assumed to be a compact set Amemiya 985). The extremum estimate always exist whe the objective fuctio is cotiuous o a compact set. Furthermore, for the proof of cosistecy of the extremum estimator, the uiform covergece argumet will usually require a compact parameter space see the uiform covergece theorem i Amemiya). 7

9 Propositio. Uder Assumptios -5, suppose that P j for j =,,m, are from P ad Q is a k x IV matrix so that a 0 lim Eg θ)) = 0 has a uique root at θ 0 i Θ. The, the GMME ˆθ derived from mi θ Θ g θ)a a g θ) is a cosistet estimator of θ 0, ad ˆθ θ 0 ) D N0, Σ), where ad [ Σ = lim D )a a ] D ) D )a a Ω )a a [ D ) D )a a )] D, 3.) D ) σ 2 = 0 trpw s I λ 0 W ) ) σ0trp 2 mw s I λ 0 W ) ) W I λ 0 W ) X β 0 ) Q 0 0 X Q, 3.2) uder the assumptio that lim a D exists ad has the full rak k +). The rak coditios i Assumptio 5 imply that lim g θ 0 ) = 0 has a uique root at θ 0 ad, hece, its correspodig gradiet matrix lim D of 2.8) has rak k + ). I the presece of Assumptio 5, the extra coditios that lim a Eg θ)) = 0 has a uique root at θ 0 ad the limit of a D has full rak k + ), are simply to elimiate the bad choice of a sequece {a } for the liear combiatio a g θ), which may result i the loss of idetificatio from the origial g. If Assumptio 5 were ot satisfied, the coditio that a 0 lim g θ) = 0 has a uique root at θ 0 would ot be satisfied. 9 The idetificatio coditio is specified for the limitig fuctio. The weaker requiremet that a Eg θ)) = 0 has a uique root at θ 0 for large eough is ot sufficiet for θ 0 to be idetifiably uique, because the objective fuctio might flatte out i the limit. From Propositio, with the momet fuctios g θ) i 2.3), the optimal choice of a weightig matrix a a is Ω ) by the geeralized Schwartz iequality. If P j s are selected from the subclass P 2 or ɛ i s are ormally distributed, Ω i 2.5) will be reduced to the simpler matrix V i 2.6). These variace matrices ca be used to form the optimal GMM objective fuctio with g θ). The σ 2, µ 3, ad µ 4 ca be cosistetly estimated by usig estimated residuals of ɛ from a iitial cosistet estimate of θ The Ω For our GMM approach, the momet fuctios are quadratic fuctios of θ. Because of its simple oliear structure, the uiform covergece argumet of the sample objective fuctio after proper ormalizatio ca be easily established as log as the parameter space is bouded. So the compact parameter space assumptio may be relaxed to a bouded set as log as the miimum of the objective fuctio exists i such a parameter space. 9 Whe the set of W I λ 0 W ) X β 0 ad X were liearly depedet, Propositio would ot cover the large group iteractio case of Case 99) because, i this situatio, trp s j W I λ 0 W ) ) would vaish i the limit ad Assumptios 5 ad 6 eeded be stregtheed. The MLE ˆλ of λ 0 i this situatio is kow to have a slower rate of covergece tha that of the MLE ˆβ of β 0 see, Lee 2004)). 0 The detailed proof is straightforward but tedious ad is omitted here 8

10 ca the be cosistetly estimated as ˆΩ. The followig propositio shows that the feasible optimum GMME OGMME) with a cosistetly estimated ˆΩ has the same limitig distributio as that of the OGMME based o Ω. With the optimum GMM objective fuctio, a overidetificatio test is available. Propositio 2. OGMME ˆθ o, Uder Assumptios -6, suppose that ˆΩ ) Ω ) = o P ), the the feasible derived from mi θ Θ g θ)ˆω g θ) based o g θ) i 2.3) with P j s from P has the asymptotic distributio ˆθo, θ 0 ) D N0, lim D Ω D ) ). 3.3) Furthermore, g ˆθ )ˆΩ g ˆθ ) D χ 2 m + k x ) k + )). 3.4) The 2SLSE ˆβ 2sl, would be icosistet whe the set of W I λ 0 W ) X β 0 ad X is liearly depedet. A obvious example is β 0 = 0. Aother example is a model where the oly relevat variable i X is the itercept term l ad W is row-ormalized Kelejia ad Prucha 998). I that case, W I λ 0 W ) X β 0 = β0 λ 0 l because I λ 0 W ) l = λ 0 l ad W l = l. However, eve whe the set of W I λ 0 W ) X β 0 ad X is liearly depedet, the GMM approach may still work because of the additioal momet fuctios with P j s. The asymptotic distributio of the GMME i 2.9) ca be used to formulate a Wald statistic for testig the overall sigificace of all exogeous variables while that of the 2SLSE ca ot. 4. Efficiecy ad the BGMME θ 0 is The optimal GMME ˆθ o, ca be compared with the 2SLSE. With Q as the IV matrix, the 2SLSE of ˆθ 2sl, =[Z Q Q Q ) Q Z ] Z Q Q Q ) Q Y, 4.) where Z =W Y,X ). The asymptotic distributio of ˆθ 2sl, is ˆθ2sl, θ 0 ) D N0,σ0 2 lim { W I λ 0 W ) X β 0,X ) Q Q Q ) Q W I λ 0 W ) X β 0,X )} ), 4.2) uder the assumptios that lim Q Q is osigular ad lim Q W I λ 0 W ) X β 0,X ) has the full colum rak k + ) Kelejia ad Prucha 998). Because the 2SLSE ca be derived from mi θ ɛ θ)q Q Q ) Q ɛ θ), the 2SLS approach is a special case of the GMM estimatio i Propositio 9

11 with a =0, Q Q ) /2 ) ad a g θ) = Q Q ) /2 Q ɛ θ). It follows from Propositio 2, ˆθ o, shall be efficiet relative to ˆθ 2sl,. Withi the 2SLS framework, by the geeralized Schwartz iequality applied to the asymptotic variace of ˆθ 2ls, i 4.2), the best IV matrix Q will be W I λ 0 W ) X β 0,X ). Usig the best IV matrix for Q i the GMM framework, the resultig GMME shall be efficiet relative to the B2SLE. There is a related questio o whether W I λ 0 W ) X β 0,X ) ca be the best IV matrix i the class of matrices Q with give P j s. The aswer ca be affirmative for cases where the momets Q ɛ do ot iteract with the momets ɛ P jɛ via their correlatios. The covariace of Q ɛ ad ɛ P jɛ, j =,,m,isµ 3 Q ω m, which ca be zero whe µ 3 =0orω m =0. The remaiig issue is o the best selectio of P j s. Whe the disturbace ɛ is ormally distributed or P j s are from P 2, D Ω Cm D = m C m ) + σ0 2 W I λ 0 W ) X β 0,X ) W I λ 0 W ) X β 0,X ), 4.3) where C m =[trp s W I λ 0 W ) ),,trp s mw I λ 0 W ) )]. Note that, because trp j P s l )= 2 trp j s P l s ), m ca be rewritte as m = 2. i) Whe P j s are from P 2, trp s P s ) trp s P m s ). trpm s P s ) trp m s P m s ) = 2 [vecp s ) vecp s m)] [vecp s ) vecp s m)]. trp s jw I λ 0 W ) )= 2 tr P s j[w I λ 0 W ) DiagW I λ 0 W ) )] s) = 2 vec [W I λ 0 W ) DiagW I λ 0 W ) )] s) vecp s j ) for j =,,m,ic m, where DiagA) deotes the diagoal matrix formed by the diagoal elemets of a square matrix A. Therefore, the geeralized Schwartz iequality implies that C m m C m 2 vec [W I λ 0 W ) DiagW I λ 0 W ) )] s) vec [W I λ 0 W ) DiagW I λ 0 W ) )] s) = tr [W I λ 0 W ) DiagW I λ 0 W ) )] s W I λ 0 W ) ). Thus, i the subclass P 2,[W I λ 0 W ) DiagW I λ 0 W ) )] ad together with [W I λ 0 W ) X β 0,X ] provide the set of best IV fuctios. Note that the best selected matrix for the quadratic momet is a sigle matrix which is best relative to ay fiite umber of P j. So ay additioal P j i additio to the best oe will ot play a role i asymptotically) efficiet estimatio. This is so also for the best IV matrix Q for liear momets. 0

12 ii) For the case where ɛ is N0,σ 2 0I ), because, for ay P j P, trpj s W I λ 0 W ) )= 2 vec [W I λ 0 W ) trw I λ 0 W ) ) I ] s) vecpj s ), for j =,, m, the geeralized Schwartz iequality implies that C m mc m tr [W I λ 0 W ) trw I λ 0 W ) ) I ] s W I λ 0 W ) ). Hece, i the broader class P,[W I λ 0 W ) trwi λ0w) ) I ]ad[w I λ 0 W ) X β 0,X ] provide the best set of IV fuctios. For all those cases, i the evet that the set of W I λ 0 W ) X β 0 ad X is liearly depedet, W I λ 0 W ) X β 0 is redudat ad the best IV matrix shall simply be X. 2 I practice, with iitial cosistet estimates ˆλ, ˆβ of λ 0 ad β 0, W I λ 0 W ) ca be estimated as W I ˆλ W ), ad W I λ 0 W ) X β 0 by ŴI ˆλ W ) X ˆβ. The correspodig variace matrix V of these best momet fuctios ca be estimated as ˆV. The followig propositio summarizes the results ad shows that the feasible BGMME has the same limitig distributio as the BGMME. Propositio 3. Uder Assumptios -3, suppose that ˆλ is a -cosistet estimate of λ 0, ˆβ is a cosistet estimate of β 0, ad ˆσ 2 is a cosistet estimate of σ 2 0. Withi the class of GMME s derived with P 2, the BGMME ˆθ 2b, has the limitig distributio that ˆθ2b, θ 0 ) D N0, Σ 2b ) where Σ 2b = lim tr[g DiagG )) s G ]+ G σ0 2 X β 0 ) G X β 0 ) with G = W I λ 0 W ), which is assumed to exist. σ 2 0 σ 2 0 X G X β 0 ) σ 2 0 X X ) G X β 0 ) X, 4.4) I the evet that ɛ N0,σ 2 0 I ), withi the broader class of GMME s derived with P, the BGMME ˆθ b, has the limitig distributio that ˆθ b, θ 0 ) D N0, Σ b ) where ) tr[g trg) I ) s G ]+ G σ Σ b = lim 0 2 X β 0 ) G X β 0 ) G σ0 2 X β 0 ) X X σ G 0 2 X β 0 ) X, 4.5) σ X 0 2 which is assumed to exist. 2 The best IV vector W I λ 0 W ) X β 0 together with X will satisfy the idetificatio Assumptio 5i) as log as they are ot liearly depedet i the limit. I the evet that they are, the model idetificatio will follow from the best matrix W I λ 0 W ) trw I λ 0 W ) )I or W I λ 0 W ) DiagW I λ 0 W ) ) for the quadratic momet Lee 200). With those best IV vectors ad matrices, its variace matrix will be osigular ad Assumptio 6 will also be satisfied.

13 Whe ɛ is N0,σ 2 0I ), the model 2.) ca be estimated by the ML method. The log likelihood fuctio of the MRSAR model via its reduced form equatio i 2.2) is l L = 2 l2π) 2 l σ2 +l I λw ) 2σ 2 [Y I λw ) X β] I λw )I λw )[Y I λw ) X β]. 4.6) The asymptotic variace of the MLE ˆθ ml,, ˆσ 2 ml, )is trg 2 )+trg G )+ G σ 2 X β 0 ) G X β 0 ) AsyVarˆθ ml,, ˆσ ml,) 2 0 = σ0 2 X σ G 0 2 X β 0 ) trg ) σ0 2 X G X β 0 σ 2 0 X X 0 trg ) σ σ 4 0 see, e.g., Aseli ad Bera 998), p.256). From the iverse of a partitioed matrix, the asymptotic variace of the MLE ˆθ ml, is trg 2 )+trg G )+ G σ0 AsyV arˆθ ml, )= 2 X β 0 ) G X β 0 ) 2 tr2 G ) σ 2 0 σ 2 0 ) X G X β 0 ). X G X β 0 σ 2 0 X X As trg 2 )+trg G ) 2 tr2 G ) = trg trg) I ) s G ), the GMME ˆθ b, has the same limitig distributio as the MLE of θ 0 from Propositio 3. There is a ituitio o the best GMM approach compared with the maximum likelihood oe. The derivatives of the log likelihood i 4.6) are ad l L σ 2 l L β = σ 2 X ɛ θ), = 2σ 2 + 2σ 4 ɛ θ)ɛ θ), l L λ = trw I λw ) )+ σ 2 [W I λw ) X β] ɛ θ)+ σ 2 ɛ θ)[w I λw ) ] ɛ θ). The equatio l L σ 2 = 0 implies that the MLE is ˆσ 2 θ) = ɛ θ)ɛ θ) for a give value θ. By substitutig ˆσ 2 θ) ito the remaiig likelihood equatios, the MLE ˆθ ml, will be characterized by the momet equatios: X ɛ θ) =0, ad [W I λw ) X β] ɛ θ)+ɛ [W θ) I λw ) ] trw I λw ) ) ɛ θ) =0. The similarity of the best GMM momets ad the above likelihood equatios is revealig. The best GMM approach has the liear ad quadratic momets of ɛ θ) i its formatio ad uses cosistetly estimated matrices i its liear ad quadratic forms. 2

14 5. Some Mote Carlo Results The model i the Mote Carlo study is specified as Y = λw Y + X β + X 2 β 2 + X 3 β 3 + ɛ, where x i, x i2 ad x i3 are three idepedetly geerated stadard ormal variables ad are i.i.d. for all i, ad ɛ i s are i.i.d. N0,σ 2 ). Whe the sample size is = 49, the spatial weights matrix W correspods to the weights matrix for the study of crimes across 49 districts i Columbus, Ohio i Aseli 988). For large sample sizes of = 245 ad 490, the correspodig spatial weights matrices are block diagoal matrices with the precedig matrix as their diagoal blocks. These correspod to the poolig, respectively, of five ad te separate districts with similar eighborig structures i each district. The estimatio methods cosidered are the.) 2SLS the two stage least squares method with IV s X,W X, ad W 2 X ; 2.) GMM a simple uweighted GMM approach usig Q =X,W X,W 2 X ) for liear momets ad W ad W 2 trw 2 ) I for quadratic momets with a idetity matrix as the distace matrix); 3.) OGMM the optimum GMM approach usig Q =X,W X,W 2 X ) for liear momets ad W ad W 2 trw 2 ) I for quadratic momets with the iverse of their estimated) variace matrix as the distace matrix); 3 4.) BGMM the best optimum GMM approach by usig X ad I ˆλ W ) X ˆβ for the liear momets, ad W I ˆλ W ) tr[w I ˆλ W ) ]I for the quadratic momet, where ˆλ, ˆβ ) is a iitial cosistet estimate; 5.) ML the maximum likelihood approach. The umber of repetitios is,000 for each case i this Mote Carlo experimet. The regressors are radomly redraw for each repetitio. I each case, we report the mea Mea ad stadard deviatio SD of the empirical distributios of the estimates. To facilitate the compariso of various estimators, their root mea square errors RMSE are also reported. I all the cases of this study, the true λ 0 is set to 0.6. The smallest sample size is = 49, ad the moderate sample sizes are 245 ad 490. The variace of the equatio errors σ 2 0 is 2. The β coefficiets are varied i the experimets. 3 The ormality of the disturbaces is assumed. The σ 2 i the variace matrix of the momets is estimated with the estimated residuals of the model equatio with the simple GMM estimates as its coefficiets. Note that the miimizatio of a objective fuctio i various GMM approaches is performed globally without imposig a restricted parameter space, such as λ lies i, ), i our study. 3

15 Table reports the results of the case where β 0 =.0, β 20 = 0 ad β 30 =.0. I this case, the correspodig variace ratio of xβ 0 with the sum of variaces of xβ 0 ad ɛ is 0.5. If oe igores the iteractio term, this ratio would represet R 2 =0.5 i a regressio equatio. The results idicate that the mai differeces of the various estimatio approaches are o the estimatio of the spatial effect λ. For the small sample size N = 49, the 2SLS is biased upward by 2.7% ad it has also the largest SD compared with the various GMME s ad the MLE. The MLE is biased dowward by 4% ad the OGMME is biased upward by 6.8%. The GMME ad BGMME are essetially ubiased. The BGMME is ot better tha the GMME ad OGMME i terms of SD ad RMSE with this small sample. Amog the various GMM estimates, the OGMME is better i terms of SD ad RMSE. The MLE has the smallest SD ad RMSE amog all these estimates. The estimates of β s of the various methods do ot have much differeces. The estimates of β s have small biases. Whe icreases to 245 or 490, the upward bias of the 2SLS estimate of λ is reduced. All the other estimates are ubiased. Eve for the moderate sample sizes, the 2SLS estimates of λ have apparetly larger SD ad RMSE s tha the correspodig various GMMEs ad MLEs. The BGMME is slightly better tha the OGMME, ad, i tur, the OGMME is slightly efficiet relative to the GMME. The BGMM is efficiet as the MLE whe N = 490. I Table 2, the true parameters are β = 0.2, β 2 = 0, ad β 3 =0.2. The variace of xβ 0 is much smaller tha the variace of ɛ. If oe igores the iteractio term, the implied R 2 is about 0.04 i a regressio equatio. I this case, the λ may be relatively more difficult to be estimated by the 2SLS. The upward bias of the 2SLSE of λ ca be large. The SD ad RMSE of the 2SLS estimates are larger tha those of the MLE ad various GMME s. The OGMME ca be the best amog the various GMM estimates whe N = 49. The BGMME ca be better tha the other GMMEs with larger N. The BGMME estimates ca be as efficiet as the MLE with large N = 490. For the estimates of the β s, there are ot much differeces amog the various estimates. I summary, the 2SLSE of λ has larger biases ad SD s tha those of the various GMME s. 4 The performace of the 2SLSE becomes worse whe the value of R 2 becomes small. The various GMME s ca substatially improve upo the 2SLSE. The OGMME ad BGMME ca be efficiet as the MLE with large sample sizes. 5 The differeces of the various estimators occur oly for the estimatio of λ but ot for 4 This may be so because our cases have oly moderate or very small R 2 values due to the explaatory variables. 5 However, i some repetitios, the BGMME may be sesitive to iitial cosistet estimates i the costructio of G. The results i Table ad Table 2 are based, respectively, o 2SLSE ad GMME 4

16 estimatio of the β s. 6. Coclusio I this paper, we cosider the estimatio of the MRSAR model. The 2SLS method has bee suggested i the literature for the estimatio of the MRSAR model. The 2SLS method ca be applicable oly if some of the spatially varyig exogeous variables are really relevat. It is impossible i the 2SLS framework to test the overall sigificace of all the exogeous variables i the MRSAR model. It is kow that the 2SLSE does ot attai the same limitig distributio of the MLE uder ormal disturbaces) of the MRSAR model. This paper improves upo the 2SLS approach by itroducig additioal momet fuctios i the GMM framework. The resulted GMME ca be efficiet relative to the 2SLSE. It is possible to derive the best GMME s withi certai classes of GMME s. Oe of the BGMME s ca attai the same limitig distributio of the MLE uder ormal disturbaces). Withi the GMM estimatio framework, it is possible to test the overall sigificace of all the exogeous variables i the model. The GMM approach may, i priciple, be geeralized for the estimatio of MRSAR models with higher order spatial lags ad models with both spatial lags ad/or SAR disturbaces. However, may issues for the models with higher order momets have ot bee well uderstood, for example, the proper parameter space of the lag coefficiets ad its idetificatio problem. These will be studied i aother occasio. as iitial estimates. 5

17 Appedix A: Some Useful Lemmas I this appedix, we list some lemmas which are useful for the proofs of the results i the text. Lemma A. Suppose that the sequeces of -dimesioal colum vectors {z } ad {z 2 } are uiformly bouded. If {A } are uiformly bouded i either row or colum sums i absolute value, the z A z 2 = O). Proof: This is trivial. Lemma A.2 Suppose that ɛ,,ɛ are i.i.d. radom variables with zero mea, fiite variace σ 2 ad fiite fourth momet µ 4. The, for ay two square matrices A ad B, Eɛ Aɛ ɛ Bɛ )=µ 4 3σ 4 0)vec DA)vec D B)+σ 4 0[trA)trB)+trAB s )], where B s = B + B. Proof: This is a Lemma i Lee 200). Lemma A.3 Suppose that {A } are uiformly bouded i both row ad colum sums i absolute value. The ɛ,...,ɛ are i.i.d. with zero mea ad its fourth momet exists. The, Eɛ A ɛ )=O), varɛ A ɛ )=O), ɛ A ɛ = O P ), ad ɛ A ɛ Eɛ A ɛ )=o P ). Proof: Lee 200). Lemma A.4 Suppose that A is a matrix with its colum sums beig uiformly bouded i absolute value, elemets of the k matrix C are uiformly bouded, ad ɛ,,ɛ are i.i.d. with zero mea ad fiite variace σ 2. The, C A ɛ = O P ) ad C A ɛ = o p ). Furthermore, if the limit of C A A C exists ad is positive defiite, the C A ɛ D N0,σ 2 lim C A A C ). Proof: See Lee 2004). Lemma A.5 Suppose that {A } is a sequece of symmetric matrices with row ad colum sums uiformly bouded i absolute value ad b = b,,b ) is a -dimesioal vector such that sup i= b i 2+η < for some η > 0. The ɛ,,ɛ are i.i.d. radom variables with zero mea ad fiite variace σ 2, ad its momet E ɛ 4+2δ ) for some δ>0 exists. Let σ 2 Q be the variace of Q where Q = ɛ A ɛ + b ɛ σ 2 tra ). Assume that the variace σ 2 Q is bouded away from zero at the rate. The, Q σ Q D N0, ). Proof: See Kelejia ad Prucha 200). Lemma A.6 Suppose that Q θ) Q θ)) coverges i probability to zero uiformly i θ Θ which is a covex set, ad { Q θ)} satisfies the idetificatio uiqueess coditio at θ 0. Let ˆθ ad ˆθ be, 6

18 respectively, the miimizers of Q θ) ad Q θ) i Θ. If Q θ) Q θ)) = o P ) uiformly i θ Θ, the both ˆθ ad ˆθ coverge i probability to θ 0. I additio, suppose that 2 Q θ) coverges i probability to a well defied limitig matrix, uiformly i θ Θ, which is osigular at θ 0, ad Q θ 0) = O P ). If 2 Q θ) i θ Θ ad Q θ0) distributio. 2 Q θ) )=o P ) uiformly Qθ0) )=o P ), the ˆθ θ 0 ) ad ˆθ θ 0 ) have the same limitig Proof: The covergece of ˆθ to θ 0 follows from the uiform covergece of Q θ) Q θ)) to zero i probability ad the uiqueess idetificatio coditio of { Q θ)} White 994). As Q θ) Q θ)) = Q θ) Q θ))+ Q θ) Q θ)) = o P ) uiformly i θ Θ, the covergece of ˆθ to θ 0 i probability follows. For the limitig distributio, the Taylor expasio of Q θ) ˆθ θ 0 )= = = 2 Q ) θ ) Q θ 0 ) ) 2 Q θ ) + o P ) 2 Q θ ) at θ 0 implies that Q θ 0 ) ) Q θ 0 ) + o P ). + o P )) Thus, ˆθ θ 0 ) ad ˆθ θ 0 ) have the same limitig distributio. Q.E.D. Lemma A.7 Suppose that the elemets of the k matrices X are uiformly bouded for all ; ad lim X X exists ad this limitig matrix is osigular, the the projectors, X X X ) X ad I X X X ) X, are uiformly bouded i both row ad colum sums i absolute value. Proof: See Lee 2004). Lemma A.8 Suppose that { W } ad { S λ 0 ) }, where is a matrix orm, are bouded. The { S λ) }, where S λ) =I λw, are uiformly bouded i a eighborhood of λ 0. Proof: See Lee 2004). I the followig Lemmas ad Appedix B, some simplified otatios shall be used to miimize the presetatio of mathematical terms. For ay square matrix A, we shall deote the adjusted matrix A tra) I )ora DiagA)) by A d. Furthermore, with the spatial weights matrix W, deote S λ) = I λw, S = S λ 0 ), G λ) =W I λw ), ad G = G λ 0 ). Lemma A.9 Suppose that z ad z 2 are -dimesioal colum vectors of costats which elemets are uiformly bouded, the costat matrices A are uiformly bouded i the maximum colum sum orm, ad ɛ i s i ɛ =ɛ,,ɛ ), are i.i.d. with zero mea ad a fiite variace σ 2. Let ˆλ be a 7

19 -cosistet estimate of λ0. The, uder Assumptio 3, ) z G ˆλ ) G ) z 2 = o P ), z G ˆλ ) G ) d z 2 = o P ); ad 2) z G ˆλ ) G ) A ɛ = o P ), z G ˆλ ) G ) d A ɛ = o P ). Proof: As S S ˆλ ) = ˆλ λ 0 )W, it follows that G ˆλ ) G = W [S ˆλ ) S ] = W S ˆλ )[S S ˆλ )]S =ˆλ λ 0 )G ˆλ )G, ad G ˆλ ) G ) d =ˆλ λ 0 )G ˆλ )G ) d. A further expasio implies that G ˆλ ) G =ˆλ λ 0 )G 2 +ˆλ λ 0 ) 2 G ˆλ )G 2 ad G ˆλ ) G ) d = ˆλ λ 0 )G 2d +ˆλ λ 0 ) 2 G ˆλ )G 2 )d. We ote that, uder Assumptio 3, because S is uiformly bouded i both row ad colum sums i absolute value, S λ) ad, hece, G λ) must be uiformly bouded i both row ad colum sums i absolute value uiformly i λ i a small eighborhood of λ 0 by Lemma A.8. As ˆλ is cosistet, it follow that G ˆλ ) is uiformly bouded i both row ad colum sums i absolute value with probability oe. Therefore, Lemma A. implies that z G G ˆλ )z 2 = O P ). Hece, z G ˆλ ) G ) z 2 =ˆλ λ 0 ) z G G ˆλ )z 2 = o P ) as ˆλ λ 0 = o P ). Similarly, z G ˆλ ) G ) d z 2 = o P ). This proves ). For 2), by the further expasio of G ˆλ ) aroud G, z G ˆλ ) G ) A ɛ = ˆλ λ 0 ) z G 2 A ɛ +ˆλ λ 0 ) 2 z G 2 G ˆλ )A ɛ. Lemma A.4 implies that z G 2 A ɛ = o P ). Therefore, with the -cosistet ˆλ, the first term o the right had side is o P ). The remaider term is also o P ). This is so as follows. Let be the maximum colum sum orm. Because the product of matrices uiformly bouded i colum sums i absolute value is uiformly bouded i colum sums i absolute value, G G ˆλ )A c for some costat c for all. As elemets of z are uiformly bouded, there exists a costat c 2 such that z c 2. It follows that ˆλ λ 0 ) 2 z 2 G G ˆλ )A ɛ [ ˆλ λ 0 )] 2 ɛ 3/2 z G 2 G ˆλ )A c c 2 [ ˆλ λ 0 )] 2 ɛ i. 2 The result 2) follows because λ λ 0 )=o P ) ad i= ɛ i = O P ) by the strog law of large i= umbers. Similarly argumets are applicable to z G ˆλ ) G ) d A ɛ. Q.E.D. Lemma A.0 Let A ad B be matrices, uiformly bouded i both row ad colum sums i absolute value. The ɛ i s i ɛ =ɛ,,ɛ ) are i.i.d. with zero mea ad its fourth momet exists. Suppose that ˆλ is a -cosistet estimator of λ 0. The, uder Assumptio 3, i) ɛ A G ˆλ ) G ) d B ɛ = o P ), ad 8

20 ii) ɛ G ˆλ ) G ) d ɛ = o P ). Proof: This is a case of Lemma A.9 i Lee 200). Q.E.D. Appedix B: Proofs Proof of Propositio. For cosistecy, we first show that a g θ) a Eg θ)) will coverge i probability uiformly i θ Θ to zero. Let a =a,,a m,a x ) where a x is a row) subvector. The a g θ) =ɛ θ) m j= a jp j )ɛ θ)+a x Q ɛ θ). By expasio, ɛ θ) =d θ)+ɛ +λ 0 λ)g ɛ where d θ) =λ 0 λ)g X β 0 + X β 0 β). It follows that m m ɛ θ) a j P j )ɛ θ) =d θ) a j P j )d θ)+l θ)+q θ), j= j= where l θ) =d θ) m j= a jp s j)ɛ +λ 0 λ)g ɛ ) ad q θ) =ɛ +λ 0 λ)ɛ G ) m j= a jp j )ɛ + λ 0 λ)g ɛ ). The term l θ) is liear i ɛ. By expasio, l θ) =λ 0 λ) X β 0 ) G m a j Pj)ɛ s +β 0 β) m X a j Pj)ɛ s j= +λ 0 λ) 2 m X β 0 ) G a j Pj)G s ɛ +λ 0 λ)β 0 β) m X a j Pj)G s ɛ = o P ), j= by Lemma A.4, uiformly i θ Θ. The uiform covergece i probability follows because l θ) is simply a quadratic fuctio of λ ad β ad Θ is a bouded set. Similarly, q θ) = ɛ m a j P j )ɛ +λ 0 λ) m ɛ G a j Pj)ɛ s +λ 0 λ) 2 m ɛ G a j P j )G ɛ j= =λ 0 λ) σ2 0 m j= j= a j trg P s j)+λ 0 λ) 2 σ2 0 j= j= j= m a j trg P j G )+o P ), uiformly i θ Θ, by Lemma A.3 ad Eɛ P j ɛ )=σ 2 0trP j ) = 0 for all j =,,m. Cosequetly, ɛ θ) m a j P j )ɛ θ) = m d θ) j= j= +λ 0 λ) 2 σ2 0 j= a j P j )d θ)+λ 0 λ) σ2 0 m a j trpjg s ) j= m a j trg P jg )+o P ), uiformly i θ Θ. As g θ) is a quadratic fuctio of θ ad Θ is bouded, j= a Eg θ)) is uiformly equicotiuous o Θ. The idetificatio coditio ad the uiform equicotiuity of a Eg θ)) imply that the idetificatio uiqueess coditio for 2 Eg θ)a a Eg θ)) must be satisfied. The cosistecy of the GMME ˆθ follows from the uiform covergece ad the idetificatio uiqueess coditio White 994). 9

21 For the asymptotic distributio of ˆθ, by the Taylor expasio of g ˆθ ) a a g ˆθ )=0atθ 0, As ɛθ) ˆθ θ 0 )= [ g ˆθ ) = W Y,X ), it follows that gθ) a a g θ ) ] g ˆθ ) a a g θ 0 ). = P s ɛ θ),,p s mɛ θ),q ) W Y,X ). Explicitly, ɛ θ)p j s W Y = ɛ θ)p j s G X β 0 + ɛ θ)p j s G ɛ. By Lemmas A.4 ad A.3, ɛ θ)pjg s X β 0 = d θ)pjg s X β 0 + ɛ PjG s X β 0 +λ 0 λ) ɛ G PjG s X β 0 ad uiformly i θ Θ. Hece, = d θ)p s jg X β 0 + o P ), ɛ θ)p s jg ɛ = d θ)p s jg ɛ + ɛ P s jg ɛ + λ 0 λ)ɛ G P s jg ɛ = σ2 0 trp s jg )+λ 0 λ) σ2 0 trg P s jg )+o P ), ɛ θ)p s jw Y = d θ)p s jg X β 0 + σs 0 trp s jg )+λ 0 λ) σ2 0 trg P s jg )+o P ), uiformly i θ Θ. At θ 0, d θ 0 ) = 0 ad, hece, ɛ θ 0)Pj s W Y = σ2 0 trpj s G )+o P ). At θ 0, ɛ θ 0 )Pj s X = o P ). Fially, Q W Y = Q G X β 0 + Q G ɛ = Q G X β 0 + o P ). I coclusio, g θ ) = D + o P ) with D i 3.2). O the other had, Lemma A.5 implies that a g θ 0 )= [ɛ m j= a j P j )ɛ + a x Q ɛ ] D N0, lim a Ω a ). The asymptotic distributio of ˆλ λ 0 ) follows. Q.E.D. Proof of Propositio 2. The geeralized Schwartz iequality implies that the optimal weightig matrix for a a i Propositio is Ω ). For cosistecy, cosider g θ)ˆω g θ) = g θ)ω g θ)+ g θ)ˆω Ω )g θ). With a = Ω ) /2 i Propositio, Assumptio 6 implies that a 0 = lim Ω ) /2 exits. Because a 0 is osigular, the idetificatio coditio of θ 0 correspods to the uique root of lim E g θ)) = 0 at θ 0, which is satisfied by Assumptio 5. Hece, the uiform covergece i probability of g θ)ω g θ) to a well defied limit uiformly i θ Θ follows by a similar argumet i the proof of Propositio. So it remais to show that g θ)ˆω orm for vectors ad matrices. The, Ω )g θ) =o P ) uiformly i θ Θ. Let be the Euclidea g θ)ˆω Ω )g θ) g θ) ) 2 ˆΩ ) Ω ). 20

22 To show that this is of probability order o P ) uiformly i θ Θ, it is sufficiet to show that g θ) = O P ) uiformly i θ Θ. From the proof of Propositio, [g θ) Eg θ))] = o P ) uiformly i θ Θ. O the other had, as d θ)p j d θ) =λ 0 λ) 2 X β 0 ) G P j G X β 0 +λ 0 λ) X β 0 ) G P s jx β 0 β) +β 0 β) X P j X β 0 β) =O P ), uiformly i θ Θ by Lemma A., it follows that Eɛ θ)p jɛ θ)) = d θ)p jd θ)+λ 0 λ)σ0 2 trp j s G )+λ 0 λ) 2 σ0 2 trg P jg )=O), uiformly i θ Θ. Similarly, EQ ɛ θ)) = Q d θ) =λ 0 λ) Q G X β 0 + Q X β 0 β) =O) uiformly i θ Θ. These imply that Eg θ)) = O) uiformly i θ Θ. Cosequetly, by the Markov iequality, g θ) = O P ) uiformly i θ Θ. Therefore, g θ)ˆω Ω )g θ) = o P ), uiformly i θ Θ. The cosistecy of the feasible optimum GMME ˆθ o, follows. For the limitig distributio, as g ˆθ ) = D + o P ) from the proof of Propositio, ˆθo, θ 0 )= = [ D g ˆθ ) Ω ) ˆΩ ) D ] D g ˆθ ) Ω g ˆθ ) ) g θ 0 )+o P ). ) ˆΩ g θ 0 ) The limitig distributio of ˆθ o θ 0 ) i 3.3) follows from this expasio. For the overidetificatio test, by the Taylor expasio, g ˆθ o, )= g θ 0 ) + where A = I D [ D g ˆθ o )ˆΩ g ˆθ o ) = g θ 0)A Ω = g θ 0) Ω ) /2 Ω D χ 2 m + k x ) k + )), g θ ) ˆθo, θ 0 )= g θ 0 ) ) D ] D Ω ). Therefore, ) A g θ 0 )+o P ) D g θ 0 ) ˆθo, θ 0 )+o P ) = A + o P ), { I Ω D ) /2 [D Ω D } ) ] D Ω ) /2 Ω ) /2 g θ 0 )+o P ) because g θ 0 ) D N0, lim Ω ) as i the proof of Propositio. Q.E.D. 2

23 Proof of Propositio 3. For the feasible best GMM estimatio with P, the vector of momet fuctios is ĝ b, θ) =ɛ θ)ĝ trĝ ) I )ɛ θ),ɛ θ)ĝx ˆβ,ɛ θ)x ), ad the correspodig estimated V is ) ˆV =ˆσ 4 trĝ trĝ ) I ) s Ĝ ) 0. 0 ˆσ ĜX 2 ˆβ,X ) ĜX ˆβ,X ) Whe G X β 0 ad X are liearly depedet, the liear momet of Ĝ X ˆβ would be redudat ad should be dropped. The momet fuctio g b, θ) ad the correspodig proper weightig fuctio ˆV should simply be ĝ b, θ) =ɛ θ)ĝ trĝ ) I )ɛ θ),ɛ θ)x ), ad ˆV =ˆσ 4 trĝ trĝ ) I ) sĝ ) 0 0 ˆσ 2 X X ) The feasible best GMME with P will be derived from mi θ Θ g b, θ) ˆV g b, θ). For the best GMM estimatio with the subclass P 2, its momet fuctios ĝ 2b, ad estimated variace matrix are those with Ĝ DiagĜ ) replacig Ĝ trĝ ) I ) i the precedig expressios. We shall show that the objective fuctios Q θ) =ĝ b, θ) ˆV ĝ b, θ) ad Q θ) =g b, θ)v g b, θ), where g b, is the couter part of ĝ b, with G ad β 0 replacig, respectively, Ĝ ad ˆβ, will satisfy the coditios i Lemma A.6. If so, the GMME from the miimizatio of Q θ) will have the same limitig distributio as that of the miimizatio of Q θ). The differece of Q θ) ad Q θ) ad its derivatives ). ivolve the differece of ĝ b, θ) ad g b, θ) ad their derivatives. Furthermore, oe has to cosider the differece of ˆV ad V. First, cosider ĝ b,θ) g b, θ)). Explicitly, ĝ b,θ) g b, θ)) = ɛ θ)ĝ G ) d ɛ θ), ) Ĝ X ˆβ G X β 0 ) ɛ θ), 0. The ɛ θ) is related to ɛ as ɛ θ) =ɛ +λ 0 λ)g ɛ +d θ) where d θ) =λ 0 λ)g X β 0 +X β 0 β). It follows that X G ɛ θ) = X G ɛ +λ 0 λ) X G G ɛ + X G d θ) = X G d θ) +o P ) by Lemma A.4, uiformly i θ Θ. O the other had, Lemma A. implies that X G d θ) =O P ) uiformly i θ Θ. The uiformity follows because d θ) is liear i λ ad β. Hece X G ɛ θ) =O P ) uiformly i θ Θ. Similarly, Lemma A.9 implies X Ĝ G ) ɛ θ) = X Ĝ G ) ɛ +λ 0 λ) X Ĝ G ) G ɛ + X Ĝ G ) d θ) =o P ) uiformly i θ Θ. It follows that Ĝ X ˆβ G X β 0 ) ɛ θ) = ˆβ X Ĝ G ) ɛ θ) +ˆβ β 0 ) X G ɛ θ) = o P ) because ˆβ β 0 = o P ). Similarly, Lemmas A.9 ad A.0 imply that ɛ θ)ĝ G ) d ɛ θ) =o P ) uiformly i θ Θ. Hece, we coclude that ĝ b,θ) g b, θ)) = o P ) uiformly i θ Θ. 22

24 Cosider the derivatives of ĝ b, θ) ad g b, θ). As the secod derivatives of ɛ θ) with respect to θ are zero because ɛ θ) is liear i θ, it follows that ɛ g b, θ) θ)g ds ɛ θ) = G X β 0 ) ɛ θ) X ɛ θ), 2 g b, θ) = ɛ θ) G ds ɛ θ) 0. 0 The first order derivatives of ɛ θ) is ɛθ) = W Y,X ). Because W Y = G X β 0 + G ɛ, W Y ) Ĝ G ) ds ɛ θ) = G X β 0 ) Ĝ G ) ds d θ)+ G X β 0 ) Ĝ G ) ds ɛ +λ 0 λ)g ɛ ) + d θ)ĝ G ) ds G ɛ + ɛ G Ĝ G ) ds ɛ +λ 0 λ)g ɛ )=o P ), uiformly i θ Θ, ad W Y ) Ĝ G ) ds W Y = X β 0 ) G Ĝ G ) ds G X β X β 0 ) G Ĝ G ) ds G ɛ + ɛ G Ĝ G ) ds G ɛ = o P ), by Lemmas A.9 ad A.0. Similarly, Lemmas A.9 ad A.0 imply that X Ĝ G ) ds ɛ θ) =o P ), X Ĝ G ) ds W Y = o P ), ad X Ĝ G ) ds X = o p ). Hece, we coclude that ĝb,θ) g b,θ) )=o P ) ad 2 ĝ b,θ) 2 g b,θ) )=o P ) uiformly i θ Θ. For the cases uder cosideratio, ˆV ad V are block diagoal matrices. Without loss of geerality, cosider the situatio that G X β 0 ad X are ot liearly depedet for large. Thus, ˆV = trĝ ds Ĝ) ) ) 0 trg ds 0 ˆσ Â 2 G, V = ) 0 0 σ0a 2, where A =G X β 0,X ) G X β 0,X ) ad Â =Ĝ X ˆβ,X ) Ĝ X ˆβ,X ). The differece of Ĝ ad G is Ĝ G )=ˆλ λ 0 )G 2 +ˆλ λ 0 ) 2 Ĝ G 2. This implies that trgds Ĝ G )) = ˆλ λ 0 ) trgds G 2 )+ˆλ λ 0 ) 2 trgds ĜG 2 )=o P ), because trgds G 2 )=O), trgds ĜG 2 )=O P ) ad ˆλ λ 0 )=o P ). Similarly, tr[ĝ G ) ds Ĝ ]=ˆλ λ 0 ) trg2ds Ĝ)+ˆλ λ 0 ) 2 tr[ĝ G 2 )ds Ĝ ]=o P ). Therefore, [trĝds ds Ĝ) trg G )] = tr[ĝds G ds )Ĝ+G ds Ĝ G )] = o P ). Cosider the remaiig block matrix. Because ˆσ 2 is a cosistet estimate of σ 2 0 ad A = O) by Lemma A., ˆσ2 Â σ0a 2 )=ˆσ 2 Â A )+ˆσ 2 σ0) 2 A =ˆσ 2 Â A )+o P ). 23

25 The differece Â A )iso p ) because Ĝ X ˆβ ) X G X β 0 ) X = ˆβ X Ĝ G )X + ˆβ β 0 ) X G X = o P ) ad ĜX ˆβ ) ĜX ˆβ ) G X β 0 ) G X β 0 ) = ˆβ X Ĝ G ) Ĝ X ˆβ + ˆβ X G Ĝ G )X ˆβ +ˆβ + β 0 ) X G G X ˆβ β 0 )=o P ), by Lemmas A. ad A.9. I coclusio, ˆV V = o P ). It follows that ˆV ) V ) = o P ) by the cotiuous mappig theorem. Furthermore, because ĝ b,θ) g b, θ)) = o P ), ad [g b,θ) Eg b, θ))] = o P ) uiformly i θ Θ, ad sup θ Θ Eg b,θ)) = O) i the proof of Propositio 2, g b,θ) ad ĝb,θ) are stochastically bouded, uiformly i θ Θ. Similarly, g b,θ), ĝ b,θ), 2 g b,θ) ad 2 ĝ b,θ) are stochastically bouded, uiformly i θ Θ. With the uiform covergece i probability ad uiformly stochastic boudedess properties, the differece of Q θ) ad Q θ) ca be ivestigated. By expasio, Q θ) Q θ)) = ĝ b,θ) ˆV ĝ b, θ) g b, θ)) + g b,θ) ˆV uiformly i θ Θ. Similarly, for each compoet θ l of θ, 2 Q θ) 2 Q θ) l l = 2 [ ĝ b, θ) ˆV ĝ b, θ) +ĝ l b, θ) ˆV 2 ĝ b, θ) l = o P ). Fially, because ĝ b, θ0) ˆV limit theorems i Lemmas A.4 ad A.5, Q θ 0) =2{ ĝ b, θ 0) =2 ĝ b, θ 0) Q θ 0 ) ) ˆV ˆV This differece will be of order o P ) if V )ĝ b, + g b,θ)v ĝ b, θ) g b, θ)) = o P ), g b,θ) l V g b, θ) + g b, θ)v 2 g b, θ) l )] g b, θ0) V )=o P ) as above, ad g b, θ 0 )=O P ) by the cetral ĝ b, θ 0 ) g b, θ 0 )) + ĝ b,θ 0 ) ĝ b, θ 0 ) g b, θ 0 )) + o P ). ˆV ĝ b, θ 0 ) g b, θ 0 )) = o P ). g b, θ 0) V ) g b, θ 0 )} Lemma A.0 implies that the compoet ɛ Ĝ G ) d ɛ = o P ). Lemmas A.4 ad A.9 imply that [ĜX ˆβ ) G X β 0 ) ]ɛ = ˆβ X Ĝ G ) ɛ +ˆβ β 0 ) X G ɛ = o P ), ad ˆβ X ɛ β 0 X ɛ )=ˆβ β 0 ) X ɛ = o P ). Hece ĝ b, θ 0 ) g b, θ 0 )) = o P ). Fially, the results of the propositio follows from Lemma A.6 ad the precedig propositios. Q.E.D. 24

GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Lung-fei Lee* Department of Economics. Ohio State University, Columbus, OH

GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Lung-fei Lee* Department of Economics. Ohio State University, Columbus, OH GMM ad 2SLS Estimatio of Mixed Regressive, Spatial Autoregressive Models by Lug-fei Lee* Departmet of Ecoomics Ohio State Uiversity, Columbus, OH October 200 Abstract The GMM method ad the classical 2SLS