ESTIMATION BIAS IN SPATIAL MODELS WITH STRONGLY CONNECTED WEIGHT MATRICES


Tony E. Smith
Department of Systems and Electrical Engineering
University of Pennsylvania
January, 2008 (Revised June 8, 2008)

Abstract

In this paper it is shown that for both spatial lag and spatial autoregressive models with strongly connected weight matrices, maximum likelihood estimates of the spatial dependence parameter are necessarily biased downward. In addition, it is shown that the same bias is present in general Moran tests of spatial dependency, so that positive dependencies will often fail to be detected when weight matrices are strongly connected. A simulated numerical example is presented to illustrate some of the practical consequences of these biases.

Key Words: bias, spatial lag models, spatial autoregressive models, Moran test

1. Introduction

In a recent simulation study, Mizruchi and Neuman (2008) have shown that for spatial lag models with strongly connected (high density) weight matrices, there is often a severe downward bias in maximum-likelihood estimates of the spatial dependency parameter.[1] This same bias is also reported by Farber, Páez and Volz (2008) in their recent simulation study of the influence of network topology on tests of spatial dependencies. Hence the central purpose of this paper is to clarify the nature of this bias from an analytical perspective. In addition, it is shown that the same bias is present in both spatial autoregressive models and in the more general Moran test of spatial dependency. In all cases this bias implies that significantly positive spatial dependencies will often fail to be detected when weight matrices are strongly connected.

To establish these results, the analytical strategy will be to consider the extreme case of maximally connected weight matrices, and to obtain exact results for this case. The rest will then follow from simple continuity considerations. To avoid repetition, the analytical development of spatial regression models will focus on spatial lag models. Parallel results for spatial autoregressive models will simply be sketched.

Hence to fix ideas, we begin with the following standard spatial lag model (SL) for $n$ spatial units:

(1)  $y = \rho W y + X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n)$

where $y \in \mathbb{R}^n$ is some variable of interest and $X = [1_n, x_1,..,x_k] \in \mathbb{R}^{n \times (k+1)}$ represents a relevant set of $k$ explanatory variables, with $1_n = (1,..,1)'$ denoting the unit $n$-vector (corresponding to the intercept term in this linear model). [Throughout the following analysis it will always be assumed that $X$ has full column rank, $k+1$, so that $(X'X)^{-1}$ exists.] The unknown parameters of the model include the vector, $\beta = (\beta_0, \beta_1,..,\beta_k)'$, of beta coefficients, the variance, $\sigma^2$, of each residual in $\varepsilon$, and the spatial dependence parameter, $\rho$, which is of primary interest in the present analysis. Also of major interest is the structure of the spatial weight matrix, $W$.
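As a rough illustration of model (1), the following sketch draws $y$ from its reduced form, $y = (I_n - \rho W)^{-1}(X\beta + \varepsilon)$. The function name `simulate_spatial_lag`, the seed, the toy sizes, and the equal-weight matrix are all assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np

def simulate_spatial_lag(W, X, beta, rho, sigma, rng):
    """Draw y from the reduced form of model (1):
    y = (I - rho*W)^{-1} (X beta + eps), with eps ~ N(0, sigma^2 I)."""
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

rng = np.random.default_rng(0)                 # hypothetical seed
n, k = 6, 2                                    # toy sizes, chosen for illustration
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)    # zero diagonal, equal weights
X = np.column_stack([np.ones(n), rng.uniform(size=(n, k))])
beta = np.array([1.0, 2.0, 3.0])
y = simulate_spatial_lag(W, X, beta, rho=0.5, sigma=1.0, rng=rng)

# With sigma = 0 the draw must satisfy the structural equation (1) exactly.
y0 = simulate_spatial_lag(W, X, beta, rho=0.5, sigma=0.0, rng=rng)
print(np.allclose(y0, 0.5 * W @ y0 + X @ beta))
```

Solving the linear system rather than forming the inverse explicitly is a common numerical choice; either would serve for matrices of this size.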
For purposes of the present analysis, it is convenient to begin by characterizing spatial weight matrices in the following way. First we choose a fixed positive scalar, $b$, to serve as an upper bound on weight values. With respect to this bound, an $n$-square matrix, $W = (w_{ij} : i,j = 1,..,n)$, is designated as a weight matrix iff (i) $w_{ii} = 0$ and (ii) $0 \le w_{ij} \le b$ for all $i,j = 1,..,n$. As usual, condition (i) specifies that dependencies are defined only between distinct spatial units. Condition (ii) can be thought of as a normalization condition that allows each weight, $w_{ij}$, to be interpreted as the degree of connectivity between $i$ and $j$, where $w_{ij} = b$ implies a maximal degree of connectivity. This is particularly appropriate for applications of model (1) to social networks among agents. For the present, the bound $b$ only serves as a convenient conceptual device, and can be set equal to one without loss of generality. However, the question of appropriate matrix normalizations for the estimation of $\rho$ is of some importance, and will be addressed below.

If the class of all $n$-square weight matrices is denoted by $\mathcal{W}_n \subset \mathbb{R}^{n \times n}$ (where the fixed scale parameter $b$ is taken to be implicit), then the relevant geometry of this set can be depicted for the $n = 2$ case as follows. Observe that each matrix $W \in \mathcal{W}_2$ is of the form

(2)  $W = \begin{pmatrix} 0 & w_{12} \\ w_{21} & 0 \end{pmatrix}$

and thus is fully characterized by the 2-vector $(w_{12}, w_{21})$. Hence the entire class $\mathcal{W}_2$ is seen to be equivalent to the points in the square, $[0,b]^2$, shown below:

Figure 1 here

Here the lower left hand corner corresponds to the minimally connected weight matrix, $\underline{W}_2$, with all zero components, and the upper right hand corner corresponds to the maximally connected[2] weight matrix, $\overline{W}_2$, with all off-diagonal elements equal to $b$. This depiction for the $2 \times 2$ case makes it clear that $\underline{W}_n$ and $\overline{W}_n$ are the two natural extreme weight matrices in $\mathcal{W}_n$ for all $n$.[3] Since $\underline{W}_n$ corresponds to complete statistical independence in model (1), attention has naturally focused on those weight matrices, $W$, that are sufficiently close to $\underline{W}_n$ to inherit all of its desirable large-sample properties (such as consistency and asymptotic normality of parameter estimates). Thus most of the literature has focused on those matrices in the lower left neighborhood shown in Figure 1. In this context, the distinguishing feature of the present analysis is that it focuses rather on the upper right neighborhood in Figure 1, which for the moment we loosely designate as strongly connected weight matrices.[4] Our central objective is not only to show that such weight matrices fail to share the desirable properties of the independence case, but also to determine the exact nature of this failure. Of particular interest will be the severe downward bias in maximum likelihood estimates of the spatial dependency parameter, $\rho$.

[1] I am indebted to a referee for pointing out that similar observations were made by Bao and Ullah (2007) with respect to the second order bias of these estimates in the context of a pure spatial lag model with circular weight matrices of varying degrees of connectivity.

[2] This terminology is not to be confused with the graph-theoretical notion of totally connected, which refers only to the presence of nonzero links between all distinct node pairs.

[3] This can also be expressed in terms of the (cell-wise) matrix inequalities $\underline{W}_n \le W \le \overline{W}_n$ for all $W \in \mathcal{W}_n$.

[4] Here it should be noted that maximally connected spatial weight matrices have been previously studied in a somewhat different context by Kelejian and Prucha (2002), who described them simply as models with equal spatial weights [see also Kelejian, Prucha and Yuzefovich (2006) and Baltagi (2006)].

To establish this result in a self-contained manner, it is convenient to begin with a detailed development of the maximum-likelihood estimation problem for the spatial lag model in Section 2 below. This is followed in Section 3 by an analysis of the maximally connected case, $\overline{W}_n$, in the upper right corner. The results for this case are extended by continuity in Section 4 to all matrices sufficiently close to $\overline{W}_n$ in an appropriate sense, and some numerical illustrations are given. In Section 5 it is shown that these results are essentially the same for spatial autoregressive models. Finally it is shown in Section 6 that strong connectivity also has consequences for Moran diagnostic tests of spatial independence.

2. Maximum Likelihood Estimation for SL Models

Model (1) implies that $y$ is multinormally distributed, and in particular that for any given data, $(y,X)$, the log likelihood function for parameters $(\beta, \sigma^2, \rho)$ takes the form:[5]

(3)  $L(\beta, \sigma^2, \rho \mid y, X) = \text{const} - \frac{n}{2}\ln(\sigma^2) + \ln|\det(I_n - \rho W)| - \frac{1}{2\sigma^2}\,[(I_n - \rho W)y - X\beta]'\,[(I_n - \rho W)y - X\beta]$

where $I_n$ is the $n$-square identity matrix, and where all terms not involving the parameters are subsumed in const. As with all generalized linear models, one proceeds by first fixing the covariance parameters (in this case, $\rho$) and maximizing the likelihood function in $\beta$ and $\sigma^2$ to produce the well-known closed form conditional estimates:

(4)  $\hat\beta_{sl}(\rho) = (X'X)^{-1}X'(I_n - \rho W)y$

(5)  $\hat\sigma^2_{sl}(\rho) = (1/n)\,[(I_n - \rho W)y - X\hat\beta_{sl}(\rho)]'\,[(I_n - \rho W)y - X\hat\beta_{sl}(\rho)]$

where the subscript $sl$ denotes the SL model. These are then substituted into (3) to yield a reduced function designated as the concentrated likelihood function, $L_{sl}$, for $\rho$. After some simple cancelling of terms, this function takes the form:

(6)  $L_{sl}(\rho \mid y, X) = \text{const} + \ln|\det(I_n - \rho W)| - (n/2)\ln[\hat\sigma^2_{sl}(\rho)]$

One then maximizes this function to obtain the maximum likelihood estimate, $\hat\rho$, of $\rho$ and then substitutes this value into (4) and (5) to obtain the corresponding maximum likelihood estimates, $\hat\beta = \hat\beta_{sl}(\hat\rho)$ and $\hat\sigma^2 = \hat\sigma^2_{sl}(\hat\rho)$, of $\beta$ and $\sigma^2$ respectively. However, our primary interest here is in $\hat\rho$ itself.

[5] Most of the following development is quite standard, and can be found in many references including Anselin (1988) and Anselin and Bera (1998, section III.B).
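The two-step procedure in (4) through (6) can be sketched numerically as follows. This is a minimal illustration rather than a production estimator: the function name `concentrated_loglik_sl`, the ring-shaped toy weight matrix, the seed, and the simple grid search (used here in place of a line search) are all assumptions made for this example.

```python
import numpy as np

def concentrated_loglik_sl(rho, y, X, W):
    """Concentrated log likelihood (6), omitting the additive constant.
    Also returns the conditional estimates (4) and (5)."""
    n = y.shape[0]
    Ay = y - rho * (W @ y)                              # (I - rho W) y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Ay)       # equation (4)
    resid = Ay - X @ beta_hat
    sigma2_hat = resid @ resid / n                      # equation (5)
    _, logabsdet = np.linalg.slogdet(np.eye(n) - rho * W)
    return logabsdet - 0.5 * n * np.log(sigma2_hat), beta_hat, sigma2_hat

# Toy data: each unit linked to its two ring neighbors with weight 0.5.
rng = np.random.default_rng(1)                          # hypothetical seed
n = 30
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
beta_true, rho_true = np.array([1.0, 2.0, 3.0]), 0.5
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + rng.normal(size=n))

# The eigenvalues of this W lie in [-1, 1], so (-0.9, 0.9) is a safe search range.
grid = np.linspace(-0.9, 0.9, 181)
values = np.array([concentrated_loglik_sl(r, y, X, W)[0] for r in grid])
rho_hat = grid[np.argmax(values)]
_, beta_hat, sigma2_hat = concentrated_loglik_sl(rho_hat, y, X, W)
print(rho_hat, beta_hat, sigma2_hat)
```

At $\rho = 0$ the conditional estimates reduce to ordinary least squares, which gives a convenient sanity check on any implementation.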

To analyze the function, $L_{sl}$, one can make further reductions as follows [see also Anselin (1988, Section 1.1.1)]. First let

(7)  $M = I_n - X(X'X)^{-1}X'$

denote the orthogonal projection onto the orthogonal complement of the span of $X$, so that by construction, $M' = M$,

(8)  $MX = X - X(X'X)^{-1}X'X = X - X = 0$, and

(9)  $MM = (I_n - X(X'X)^{-1}X')(I_n - X(X'X)^{-1}X') = I_n - X(X'X)^{-1}X' = M$

The substitution of (4) and (7) into (5) yields the more compact form of the conditional variance estimate,

(10)  $\hat\sigma^2_{sl}(\rho) = (1/n)\,\big([I_n - X(X'X)^{-1}X'](I_n - \rho W)y\big)'\big([I_n - X(X'X)^{-1}X'](I_n - \rho W)y\big)$
      $= (1/n)\,\big(M(I_n - \rho W)y\big)'\big(M(I_n - \rho W)y\big)$
      $= (1/n)\; y'(I_n - \rho W)'M(I_n - \rho W)y$

This in turn allows the concentrated likelihood in (6) to be written as

(11)  $L_{sl}(\rho \mid y, X) = \text{const} + \ln|\det(I_n - \rho W)| - (n/2)\ln[\,y'(I_n - \rho W)'M(I_n - \rho W)y\,]$

where the term $-(n/2)\ln(1/n)$ has now been absorbed into the constant. Further reduction is possible by observing that if the eigenvalues of $W$ are denoted by $\lambda(W) = \{\lambda_i : i = 1,..,n\}$, then the corresponding eigenvalues of $(I_n - \rho W)$ are well known to be given by $\lambda(I_n - \rho W) = \{1 - \rho\lambda_i : i = 1,..,n\}$. To avoid complications in the analysis to follow, it is convenient to restrict our attention to weight matrices, $W$, with real eigenvalues (which, most importantly, includes all $W$ which are either symmetric or are row normalizations of symmetric matrices). In addition, it will also be assumed that the maximum eigenvalue, $\lambda_{\max}(W)$, of $W$ is positive.[6] (In particular this includes all nonzero symmetric weight matrices.) Hence we now restrict our attention to the subset:

(12)  $\mathcal{W}_n^{+} = \{W \in \mathcal{W}_n : \lambda(W) \text{ is real, and } \lambda_{\max}(W) > 0\}$

[6] This maximum eigenvalue is always nonnegative [Horn and Johnson (1985, Th.8.1.3)], but need not be positive even when $W$ has positive elements. Even for $n = 2$ the matrix, $W = [0\ 1;\ 0\ 0]$, has $\lambda(W) = \{0, 0\}$.

Given the subset $\mathcal{W}_n^{+}$ in (12), together with the fact that the determinant of any matrix is the product of its eigenvalues [Horn and Johnson (1985, Th.1.2.12)], it then follows that

(13)  $\det(I_n - \rho W) = \prod_i (1 - \rho\lambda_i) \;\Rightarrow\; \ln|\det(I_n - \rho W)| = \sum_i \ln|1 - \rho\lambda_i|$

as long as each term, $1 - \rho\lambda_i$, on the right hand side is nonzero. This of course requires further restrictions on $\rho$. To specify these conditions, we first note that since the trace of every matrix is the sum of its eigenvalues [Horn and Johnson (1985, Th.1.2.12)], it follows that

(14)  $\sum_i \lambda_i = \mathrm{tr}(W) = \sum_i w_{ii} = 0$

for all $W \in \mathcal{W}_n$. But since $\lambda_{\max}(W) > 0$ for all $W \in \mathcal{W}_n^{+}$, this in turn implies that the minimum eigenvalue, $\lambda_{\min}(W)$, must be negative. These observations together imply that for any $W \in \mathcal{W}_n^{+}$, all terms, $1 - \rho\lambda_i$, in (13) will be positive if the admissible values of $\rho$ are restricted to the open interval

(15)  $[W] = \big(1/\lambda_{\min}(W),\; 1/\lambda_{\max}(W)\big)$

Hence we now restrict $\rho$ to the interval, $[W]$. Under this restriction, (13) allows (11) to be reduced to the explicit form,

(16)  $L_{sl}(\rho \mid y, X) = \text{const} + \sum_i \ln|1 - \rho\lambda_i| - (n/2)\ln[\,y'(I_n - \rho W)'M(I_n - \rho W)y\,]$

which is more readily analyzed (and computed). At this point one typically proceeds by observing that since $\ln|\det(I_n - \rho W)| \to -\infty$ on the boundaries of $[W]$, it is reasonable to assume that $L_{sl}$ has a well-defined differentiable maximum in the open interval $[W]$. This will be true as long as the second term in (16) is bounded above. To ensure this, it must of course be assumed that

(17)  $M(I_n - \rho W)y \ne 0$ for all $\rho \in [W]$

To interpret this condition, observe that model (1) can be equivalently written as $(I_n - \rho W)y = X\beta + \varepsilon$, where $(I_n - \rho W)y$ represents the value of $y$ after spatial lag effects have been accounted for. If this variable is designated as the effective value of $y$,

(18)  $y(\rho) = (I_n - \rho W)y$

in model (1), then as a parallel to classical regression, it is here assumed that for the given data vector, $y$, none of its effective values, $\{y(\rho) : \rho \in [W]\}$, is perfectly fitted by $X$ (i.e., lies in the span of $X$). We designate data sets $(y,X)$ satisfying (17) as $W$-regular.
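The interval (15) and the eigenvalue form (13) of the log-determinant are easy to check numerically. The sketch below is one way to do so; the helper name `admissible_interval`, the tolerance, and the five-node path graph used as a test matrix are illustrative assumptions, not part of the paper.

```python
import numpy as np

def admissible_interval(W, tol=1e-10):
    """Return (1/lambda_min, 1/lambda_max, eigenvalues) as in (15),
    after checking the two conditions that define the subset in (12)."""
    lam = np.linalg.eigvals(W)
    if np.max(np.abs(lam.imag)) > tol:
        raise ValueError("W has complex eigenvalues")
    lam = lam.real
    if lam.max() <= 0:
        raise ValueError("maximum eigenvalue of W is not positive")
    return 1.0 / lam.min(), 1.0 / lam.max(), lam

# Toy symmetric binary W: a path graph on 5 nodes (eigenvalues 2cos(j*pi/6)).
n = 5
i = np.arange(n - 1)
W = np.zeros((n, n))
W[i, i + 1] = 1.0
W[i + 1, i] = 1.0

lo, hi, lam = admissible_interval(W)
rho = 0.3                                              # any value inside (lo, hi)
logdet_eig = np.sum(np.log(1.0 - rho * lam))           # right side of (13)
_, logdet_direct = np.linalg.slogdet(np.eye(n) - rho * W)
print((lo, hi), logdet_eig, logdet_direct)
```

Computing the eigenvalues once and reusing them for every trial value of $\rho$ is what makes (16) cheaper to evaluate than (11). Passing the $2 \times 2$ matrix $[0\ 1;\ 0\ 0]$ to this helper raises an error, since its maximum eigenvalue is zero.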

Notice that for $\rho = 0$ condition (17) implies the usual regularity condition that $My \ne 0$. Data $(y,X)$ satisfying only this (classical regression) condition is simply said to be regular.

3. Biased Estimation for the Maximally Connected Case in SL Models

Given the simple form of the concentrated likelihood function, $L_{sl}$, in (16), one can proceed to search for a maximum, $\hat\rho$, in the interval $[W]$ (typically by standard line search procedures). However, it turns out that for the maximally connected case, $\overline{W}_n$, this maximization procedure is doomed to fail. Indeed, the main result of this section will be to show that even for regular data sets, $L_{sl}$ is always unbounded on $[\overline{W}_n]$. To establish this result, we begin by analyzing the properties of $\overline{W}_n$. First observe that since the $n$-square unit matrix is constructible as the outer product, $1_n 1_n'$, the maximally connected weight matrix, $\overline{W}_n$, can be written as:

(19)  $\overline{W}_n = \begin{pmatrix} 0 & b & \cdots & b \\ b & 0 & \cdots & b \\ \vdots & & \ddots & \vdots \\ b & b & \cdots & 0 \end{pmatrix} = b\,(1_n 1_n' - I_n)$

With this explicit form, the following result shows that the eigenvalues of $\overline{W}_n$ are computable in closed form:

Lemma 1. For all $b > 0$ the eigenvalues of $\overline{W}_n$ in (19) are given by

(20)  $\lambda(\overline{W}_n) = \{-b,..,-b,\; b(n-1)\}$

where the eigenvalue, $-b$, has multiplicity $n-1$.

Proof: It follows from Searle (1982, Section 1.3.d) that the eigenvalues of any matrix of the form $A = aI_n + c\,1_n 1_n'$ are given by

(21)  $\lambda(A) = \{a,...,a,\; (a + nc)\}$

where $a$ has multiplicity $n-1$. Hence the eigenvalues of

(22)  $\overline{W}_n = b\,(1_n 1_n' - I_n) = (-b)I_n + (b)1_n 1_n'$

are immediately seen to be those in (20).

The second (and most important) property of maximally connected weight matrices is the following identity:

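Lemma 2. If $M$ is the orthogonal projection matrix in (7) associated with any data matrix, $X = [1_n, x_1,..,x_k]$, for model (1) then,

(23)  $M\,\overline{W}_n = -bM$

(24)  $\overline{W}_n M = -bM$

Proof: Simply observe from (19) and (7) that

$M\,\overline{W}_n = (I_n - X(X'X)^{-1}X')\; b\,(1_n 1_n' - I_n) = b\,[\,1_n 1_n' - I_n - X(X'X)^{-1}X'1_n 1_n' + X(X'X)^{-1}X'\,]$

But since $[X(X'X)^{-1}X']X = X$ and since $1_n$ is the first column of $X$, it follows in particular that $[X(X'X)^{-1}X']1_n = 1_n$. Hence we see that

(25)  $M\,\overline{W}_n = b\,[\,1_n 1_n' - I_n - 1_n 1_n' + X(X'X)^{-1}X'\,] = b\,[\,X(X'X)^{-1}X' - I_n\,] = -bM$

Next observe that since $[X(X'X)^{-1}X']1_n = 1_n \Rightarrow 1_n' = 1_n'[X(X'X)^{-1}X']$, it also follows that

(26)  $\overline{W}_n M = b\,(1_n 1_n' - I_n)(I_n - X(X'X)^{-1}X')$
      $= b\,[\,1_n 1_n' - 1_n 1_n' X(X'X)^{-1}X' - I_n + X(X'X)^{-1}X'\,]$
      $= b\,[\,1_n 1_n' - 1_n 1_n' - I_n + X(X'X)^{-1}X'\,]$
      $= -b\,[\,I_n - X(X'X)^{-1}X'\,] = -bM$.

One useful consequence of this result is the following:

Lemma 3. Every regular data set, $(y,X)$, is $\overline{W}_n$-regular.

Proof: First observe from Lemma 2, together with the symmetry of $\overline{W}_n$ and $M$, that for any $\rho \in [\overline{W}_n]$ and data set $(y,X)$,

(27)  $y'(I_n - \rho\overline{W}_n)'M(I_n - \rho\overline{W}_n)y = y'(I_n - \rho\overline{W}_n)(M - \rho M\overline{W}_n)y$
      $= y'(M - \rho M\overline{W}_n - \rho\overline{W}_n M + \rho^2\,\overline{W}_n M \overline{W}_n)y$
      $= y'(M + \rho bM + \rho bM + \rho^2 b^2 M)y$
      $= (1 + 2\rho b + \rho^2 b^2)\,y'My$
      $= (1 + \rho b)^2\,y'My$

But $\rho \in [\overline{W}_n]$ then implies that $\rho > -1/b$ and hence that $1 + \rho b > 0$. Thus $\overline{W}_n$-regularity of $(y,X)$ will follow if it can be shown that $y'My > 0$. But since $M$ is an orthogonal projection matrix and hence is positive semidefinite, it follows that $y'My \ge 0$ for all $y$, and moreover that $y'My = 0 \Leftrightarrow My = 0$ [Horn and Johnson (1985, p.400)]. Finally since the regularity of $(y,X)$ implies that $My \ne 0$, it must then be true that $y'My > 0$, and thus that $\overline{W}_n$-regularity holds.

With these properties, we are now ready to establish our main result, namely that $L_{sl}$ is unbounded on $[\overline{W}_n]$. In particular, we show that $L_{sl}$ increases without bound as $\rho$ approaches the lower boundary of $[\overline{W}_n]$. To do so, observe from (15) and (20) that this lower boundary point, $\underline{\rho}_n$, is given by

(28)  $\underline{\rho}_n = 1/\lambda_{\min}(\overline{W}_n) = -1/b$

With this definition we now have:

Proposition 1. If $W = \overline{W}_n$ in model (1), then for all regular data sets $(y,X)$ and all decreasing sequences $(\rho_m)$ in $[\overline{W}_n]$, with $\lim_{m\to\infty}\rho_m = \underline{\rho}_n$,

(29)  $\lim_{m\to\infty} L_{sl}(\rho_m \mid y, X) = \infty$

Proof: The strategy will be to use Lemmas 1 and 2 to show that the concentrated likelihood function in (16) is reducible to a simple analytical form for which the result is obvious. To do so, we first observe from Lemma 1 and the positivity of $\min_i\{1 - \rho\lambda_i(\overline{W}_n)\}$ on $[\overline{W}_n]$ that for any $\rho \in [\overline{W}_n]$ we must have[7]

(30)  $\sum_i \ln|1 - \rho\lambda_i| = \sum_i \ln(1 - \rho\lambda_i) = (n-1)\ln[1 - \rho(-b)] + \ln[1 - \rho b(n-1)]$
      $= (n-1)\ln(1 + \rho b) + \ln[1 - \rho b(n-1)]$

Moreover, we see from (27) above that

(31)  $-(n/2)\ln[\,y'(I_n - \rho\overline{W}_n)'M(I_n - \rho\overline{W}_n)y\,] = -(n/2)\ln[(1 + \rho b)^2\,y'My]$
      $= -\{\,n\ln(1 + \rho b) + (n/2)\ln(y'My)\,\}$

[7] For the case of $b = 1/(n-1)$ this result appears in section 2.5 of Kelejian and Prucha (2002).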

Notice also from Lemma 3 that this log expression is well defined for all $\rho \in [\overline{W}_n]$. Hence by substituting (30) and (31) into (16) we obtain the following simple expression for the concentrated likelihood function,

(32)  $L_{sl}(\rho \mid y, X) = \text{const} + \{(n-1)\ln(1 + \rho b) + \ln[1 - \rho b(n-1)]\} - \{n\ln(1 + \rho b) + (n/2)\ln(y'My)\}$
      $= \text{const} - \ln(1 + \rho b) + \ln[1 - \rho b(n-1)]$

where the term, $(n/2)\ln(y'My)$, not containing $\rho$ has again been absorbed in const. From here we need only observe that since $\underline{\rho}_n = -1/b$, it follows that for any decreasing sequence $(\rho_m)$ in $[\overline{W}_n]$, with $\lim_{m\to\infty}\rho_m = \underline{\rho}_n$, we must have

(33)  $\lim_{m\to\infty} L_{sl}(\rho_m \mid y, X) = \text{const} - \lim_{m\to\infty}\ln(1 + \rho_m b) + \lim_{m\to\infty}\ln[1 - \rho_m b(n-1)]$
      $= \text{const} - \ln(1 + \underline{\rho}_n b) + \ln[1 - \underline{\rho}_n b(n-1)]$
      $= \text{const} - \ln(0) + \ln(n) = \infty$

and the result is established.[8]

Hence from a formal viewpoint, it may be concluded that no maximum likelihood estimator of $\rho$ exists for model (1) when $W = \overline{W}_n$.[9] This is somewhat surprising in view of the fact that existence of maximum likelihood estimators for model (1) is generally assumed to hold as long as $W \in \mathcal{W}_n^{+}$ and $\rho \in [W]$. Moreover, it is interesting to note that from a practical viewpoint, such a failure would most likely not even be detected by standard software. Indeed, one would typically observe that the line-search algorithm has converged to some value of $\rho$ very close to $\underline{\rho}_n$.

To gain further insights here, it is useful to illustrate this finding with a numerical example, as shown in Figure 2 below. This is taken from the numerical simulation example presented in Section 4 below (for a sample size of $n = 50$). The First Term and Second Term shown in Figure 2 correspond, respectively, to the log-determinant expression (30) and the log-quadratic expression (31) in Proposition 1 above. Notice that the log-determinant term is always well behaved, since it is a sum of simple concave functions, $\ln(1 - \rho\lambda_i)$, on $[\overline{W}_n]$. Hence the culprit here is the log-quadratic term, which in the present case not only diverges to $\infty$ at $\underline{\rho}_n$, but does so at a faster rate than the corresponding divergence of the log-determinant to $-\infty$.

Figure 2 Here

Before examining the practical consequences of this result for strongly connected weight matrices, we give an alternative statement of Proposition 1 that will also prove useful for applications. Recall that our basic regularity assumption on data $(y,X)$ was designed to avoid cases where some effective $y$-value, $y(\rho)$, was perfectly fitted by the data, $X$, in model (1). We now show that Proposition 1 results from the fact that for every data set $(y,X)$ in model (1), if $W = \overline{W}_n$, then $X$ must yield a perfect fit to the effective $y$-value, $y(\underline{\rho}_n)$, on the lower boundary of $[\overline{W}_n]$. This fact depends critically on the presence of an intercept term in model (1) [as should already be apparent from the proof of Lemma 2]. Hence it is now convenient to make this intercept term explicit by rewriting model (1) as

(34)  $y = \rho W y + \beta_0 1_n + \tilde{X}\tilde\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n)$

where $\tilde{X} = [x_1,..,x_k]$ and $\tilde\beta = (\beta_1,..,\beta_k)'$. For the particular case of $\overline{W}_n$ it then follows that for any choice of $\rho$,

(35)  $y = \rho\overline{W}_n y + \beta_0 1_n + \tilde{X}\tilde\beta + \varepsilon$
      $\Rightarrow (I_n - \rho\overline{W}_n)y = \beta_0 1_n + \tilde{X}\tilde\beta + \varepsilon$
      $\Rightarrow y(\rho) = \beta_0 1_n + \tilde{X}\tilde\beta + \varepsilon$

With this notation, recall from Lemma 3 and (17) that for any regular data set $(y,X) = (y,[1_n,\tilde{X}])$ there exists no $\rho \in [\overline{W}_n]$ such that the effective value $y(\rho)$ is a perfect fit to $X$, i.e., such that

(36)  $y(\rho) = \beta_0(\rho)1_n + \tilde{X}\tilde\beta(\rho)$

for some choice of $[\beta_0(\rho), \tilde\beta(\rho)]$. But even when this is true, it turns out that a perfect fit of the form (36) always holds at the lower boundary value, $\underline{\rho}_n$, of $[\overline{W}_n]$, as we now show:[10]

[8] Note that $L_{sl}$ is also unbounded at the upper boundary of $[\overline{W}_n]$, namely $\rho = 1/\lambda_{\max}(\overline{W}_n) = 1/[b(n-1)]$. But since $L_{sl}(\rho \mid y, X) \to -\infty$ there, this is of little interest for maximum likelihood estimation.

[9] This failure of existence is an instance of the more general result of Arnold (1979, Th.3) regarding the non-existence of maximum-likelihood estimators for covariance parameters in linear models with exchangeably distributed errors. I am indebted to Federico Martellosio for pointing this out to me.

[10] The following result is essentially contained in Theorem 1 of Kelejian, Prucha and Yuzefovich (2006), where it is employed to study the consistency properties of 2SLS estimation in the case of equal spatial weights.

Proposition 2. If $W = \overline{W}_n$ in model (1), then for any data set $(y,X)$,

(37)  $y(\underline{\rho}_n) = \beta_0(\underline{\rho}_n)\,1_n + \tilde{X}\tilde\beta(\underline{\rho}_n)$

where $\beta_0(\underline{\rho}_n) = 1_n'y$ and $\tilde\beta(\underline{\rho}_n) = 0$.

Proof: First recall from (28) that for any positive bound $b$,

(38)  $\underline{\rho}_n = 1/\lambda_{\min}(\overline{W}_n) = -1/b < 0$

Hence it follows that,

(39)  $y(\underline{\rho}_n) = (I_n - \underline{\rho}_n\overline{W}_n)y = \{I_n - (-1/b)[\,b\,(1_n 1_n' - I_n)]\}y$
      $= [\,I_n + (1_n 1_n' - I_n)\,]y = 1_n 1_n'y$
      $= (1_n'y)\,1_n + \tilde{X}\,0$
      $= \beta_0(\underline{\rho}_n)\,1_n + \tilde{X}\tilde\beta(\underline{\rho}_n)$

and the result is established.

The advantage of viewing this result in terms of perfect fits is that it provides information about the bias of other parameter estimates. For when $\hat\rho \approx \underline{\rho}_n$, expression (37) suggests that one should have $\hat\beta_0 \approx 1_n'y$ and $\hat\beta_j \approx 0$ for all $j = 1,..,k$. Moreover, since a perfect fit necessarily implies zero variance of residuals, this in turn suggests that $\hat\sigma^2 \approx 0$.[11] We shall explore the practical consequences of these findings in the next section.

4. Consequences for Strongly Connected Weight Matrices in SL Models

The above results show that for the extreme case of maximally connected matrices, we can obtain an exact analytical formulation of the bias inherent in maximum likelihood estimation for spatial lag models. This in turn suggests that such bias should be inherited by matrices, $W$, that are close to $\overline{W}_n$ in some appropriate sense. To make this precise, it is convenient to endow $\mathcal{W}_n$ with a matrix norm that will allow some explicit measure of closeness. Here there are many choices. For example, the 1-norm of any matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ is $\|A\|_1 = \sum_{ij}|a_{ij}|$, and the 2-norm (Euclidean norm) of $A$ is $\|A\|_2 = (\sum_{ij} a_{ij}^2)^{1/2}$.[12] However, for our present purposes, the following scaled version of the 1-norm is useful for weight matrices, $W \in \mathcal{W}_n$,

(40)  $\|W\|_{rc} = \frac{1}{b\,n(n-1)}\,\|W\|_1 = \frac{1}{n(n-1)}\sum_{ij}(w_{ij}/b)$

which we designate as the relative connectivity norm.[13] If $(w_{ij}/b)$ denotes the relative connectivity between units (agents) $i$ and $j$, then this is simply the average of these relative connectivities over all distinct $(i,j)$ pairs. In the case of binary matrices, $W$, this is easily seen to reduce to the graph-theoretic notion of average link density. Given this norm (or any other matrix norm), the induced distance between $W$ and $\overline{W}_n$ is then given by

(41)  $\|\overline{W}_n - W\|_{rc} = \frac{1}{b\,n(n-1)}\sum_{i \ne j}|w_{ij} - b| = \frac{1}{n(n-1)}\sum_{i \ne j}[\,1 - (w_{ij}/b)\,]$

Next, we observe that up to this point the actual magnitude of $\rho$ has not been considered. All that has been asserted is that for any given weight matrix, $W \in \mathcal{W}_n^{+}$, these values must lie in the open interval $[W]$ of expression (15), and that this interval contains zero (so that both positive and negative values of $\rho$ are always possible). But to gain further insight, it is useful to evaluate this interval in specific cases. In the numerical illustration below we shall use a sample size, $n = 50$. Hence, by setting the bound at $b = 1$, it follows from Lemma 1 that for the maximally connected matrix, $\overline{W}_{50}$, we have $\lambda_{\min}(\overline{W}_{50}) = -1$ and $\lambda_{\max}(\overline{W}_{50}) = 49$. The corresponding bounds on $\rho$ for this case are thus seen to be

(42)  $[\overline{W}_{50}] = \big(-1/b,\; 1/[b(n-1)]\big) = (-1,\; 1/49) \approx (-1,\; .02)$

which, from a practical viewpoint, is seen to offer little room for positive spatial dependencies at all. Since positive dependencies are by far the most interesting for practical applications, it is clear that a better choice of $b$ should be considered. Here the most natural choice is to set $b = 1/(n-1)$, so that under this normalization we obtain

(43)  $\lambda_{\max}(\overline{W}_n) = b(n-1) = (n-1)/(n-1) = 1$

This will ensure that the interval, $[0,1)$, of nonnegative $\rho$-values used for most applications actually lies in $[\overline{W}_n]$. For $n = 50$ we then have

(44)  $[\overline{W}_n] = (-1/b,\; 1) = (-(n-1),\; 1) = (-49,\; 1)$

[11] These results illustrate the more general finding of Arnold (1979, p.196) regarding the inconsistency of standard parameter estimates for linear models with exchangeably distributed errors.

[12] Many other choices are illustrated in Horn and Johnson (1985, Section 5.6).

[13] Since every positive scaling of a norm yields another norm, the first equality shows that this is indeed a matrix norm.
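The intervals in (42) and (44), together with the properties of the maximally connected matrix that they rest on (Lemma 1 and Lemma 2), can be confirmed with a few lines of code. In this sketch the helper names `max_connected`, `rc_norm`, and `admissible_interval`, the seed, and the uniformly distributed regressors are assumptions introduced for illustration.

```python
import numpy as np

def max_connected(n, b):
    """Maximally connected weight matrix (19): b * (1 1' - I)."""
    return b * (np.ones((n, n)) - np.eye(n))

def rc_norm(W, b):
    """Relative connectivity norm (40)."""
    n = W.shape[0]
    return np.abs(W).sum() / (b * n * (n - 1))

def admissible_interval(W):
    """Interval (15) for a symmetric W (eigvalsh returns ascending eigenvalues)."""
    lam = np.linalg.eigvalsh(W)
    return 1.0 / lam[0], 1.0 / lam[-1]

n = 50
results = {}
for b in (1.0, 1.0 / (n - 1)):
    Wbar = max_connected(n, b)
    results[b] = admissible_interval(Wbar)
    print(f"b = {b:.4f}: interval = {results[b]}, rc norm = {rc_norm(Wbar, b):.1f}")

# Lemma 2: M Wbar = Wbar M = -b M whenever X contains an intercept column.
rng = np.random.default_rng(2)                    # hypothetical seed
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
b = 1.0 / (n - 1)
Wbar = max_connected(n, b)
lemma2_ok = np.allclose(M @ Wbar, -b * M) and np.allclose(Wbar @ M, -b * M)
print(lemma2_ok)
```

Dropping the column of ones from `X` in this sketch makes the Lemma 2 check fail, which illustrates the role played by the intercept term.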

One additional feature of this normalization that is particularly useful for the present analysis is that[14]

(45)  $[0,1) \subset [W]$ for all $W \in \mathcal{W}_n^{+}$

So this same interval of $\rho$-values is available for every choice of $W$.[15]

Given this normalization, the objective of this section is to extend the bias results for maximally connected weight matrices in Proposition 1 to all weight matrices, $W$, that are strongly connected in the sense that they are sufficiently close to $\overline{W}_n$ in the relative connectivity norm. To do so, it is convenient to introduce the following additional conventions. First, for any given $W \in \mathcal{W}_n^{+}$ and data set $(y,X)$ for model (1), we shall write the maximum likelihood estimator for $\rho$ as $\hat\rho_W(y,X)$. As pointed out above, this estimator can fail to exist even when $(y,X)$ is $W$-regular. But for weight matrices close to $\overline{W}_n$ (in relative connectivity), it should be clear that if $(y,X)$ is $W$-regular, then a differentiable maximum, $\hat\rho_W(y,X)$, fails to exist only when $L_{sl}$ is unbounded at the lower boundary of $[W]$. In such cases, we simply set $\hat\rho_W(y,X)$ equal to this lower boundary, so that $\hat\rho_W(y,X)$ can be treated as a well-defined value for each $W$. Next, to quantify the possible bias of these estimates, it is convenient to focus only on the most important case of positive dependencies in model (1), i.e., $\rho > 0$, and to quantify various degrees of underestimation by inequalities of the form,

(46)  $\hat\rho_W(y,X) < \rho/(1 + \alpha)$

where parameter $\alpha > 0$ can be interpreted as a bias factor. For example, a bias factor of $\alpha = 1$ would imply that $\hat\rho_W(y,X)$ is less than half the true value of $\rho$. More generally, higher bias factors correspond to more severe underestimation of $\rho$. With these conventions, we now have the following consequence of Proposition 1:

Proposition 3. For any regular data set $(y,X)$ with $n \ge 3$ and any given value, $\rho_0 \in (0,1)$, of the spatial dependence parameter for model (1), there exists for each choice of bias factor, $\alpha \in (0,1)$, a sufficiently small $\varepsilon = \varepsilon(\alpha, \rho_0, y, X) > 0$ such that for all $W \in \mathcal{W}_n^{+}$,

(47)  $\|\overline{W}_n - W\|_{rc} < \varepsilon \;\Rightarrow\; \hat\rho_W(y,X) < \rho_0/(1 + \alpha)$

[14] Expression (45) follows from the fact that $0 < \lambda_{\max}(W) \le \lambda_{\max}(\overline{W}_n) = 1 \Rightarrow 1/\lambda_{\max}(W) \ge 1$ [see Horn and Johnson (1985)].

[15] An alternative normalization that also shares this property is to set $b$ equal to the reciprocal of the smallest row or column sum, as proposed by Kelejian and Prucha (2008, Lemma 2). Though less standard than the present convention, this normalization has the advantage of being much easier to compute for large weight matrices.

Proof Sketch: The proof of this result is rather technical, and is deferred to the Appendix. But the basic idea is simple. Observe from Figure 2 that not only does the concentrated likelihood function, $L_{sl}$, diverge to $\infty$ at $\underline{\rho}_n$, but in fact its derivative is everywhere negative in $[\overline{W}_n]$. Hence, if we now write the concentrated likelihood function as, $L_{sl}(\rho \mid y, X, W)$, to emphasize its dependence on $W$ [as well as data $(y,X)$], then the strategy of the proof is to show that the corresponding derivative, $L'_{sl}(\rho \mid y, X, W)$, with respect to $\rho$ is continuous in $W$ at the point $\overline{W}_n$. Using this continuity property, it is then possible to show that for any choice of bias factor, $\alpha$, when $W$ is sufficiently close to $\overline{W}_n$ [i.e., when $\varepsilon$ in (47) is sufficiently small], one can guarantee that $L'_{sl}(\rho \mid y, X, W)$ will be negative for all $\rho \in [W]$ with $\rho \ge \rho_0/(1 + \alpha)$, and thus that $L_{sl}(\rho \mid y, X, W)$ can only achieve a maximum on $[\,1/\lambda_{\min}(W),\; \rho_0/(1 + \alpha))$. In other words, for any degree of bias, $\alpha > 0$, there is some threshold level of strong connectivity, $\|\overline{W}_n - W\|_{rc} < \varepsilon$, which is sufficient to ensure this degree of bias.

The proof sketched above also shows (from the persistence of negative slopes) that under conditions of no spatial dependence (i.e., $\rho = 0$) this null hypothesis will tend to be falsely rejected in favor of negative dependencies ($\rho < 0$) for strongly connected weight matrices. Moreover, in cases where such dependencies are indeed negative, the strength of these dependencies will tend to be overestimated.

But as with all such continuity results, Proposition 3 still leaves open the question of how strong this connectivity must be in order to see a substantial effect. While such questions can only be answered definitively by extensive simulations, it is nonetheless possible to illustrate the potential significance of these results by means of a typical example.[16] Here we set $n = 50$, $k = 2$, and construct $x$-data $(x_1, x_2)$ by simulating two uniformly distributed random vectors, so that $X = [1_{50}, x_1, x_2]$. Model (1) was then parameterized with $\beta = (\beta_0, \beta_1, \beta_2) = (1, 2, 3)$ and standard deviation $\sigma = 1$.
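For the design just described ($n = 50$, $k = 2$, $\beta = (1,2,3)$, $\sigma = 1$), the monotonicity invoked in the proof sketch can be checked directly in the maximally connected case. The sketch below compares a numerical evaluation of (16) with the closed form (32); the seed, the value `rho0 = 0.5` used to generate `y`, the grid, and all function names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)                    # hypothetical seed
n, k = 50, 2
b = 1.0 / (n - 1)                                 # normalization giving lambda_max = 1
Wbar = b * (np.ones((n, n)) - np.eye(n))
X = np.column_stack([np.ones(n), rng.uniform(size=(n, k))])
beta, sigma, rho0 = np.array([1.0, 2.0, 3.0]), 1.0, 0.5   # rho0 is an arbitrary choice
y = np.linalg.solve(np.eye(n) - rho0 * Wbar, X @ beta + rng.normal(0.0, sigma, n))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

def loglik_numeric(rho):
    """Concentrated log likelihood (16) without the constant, computed directly."""
    A = np.eye(n) - rho * Wbar
    v = M @ (A @ y)
    _, logabsdet = np.linalg.slogdet(A)
    return logabsdet - 0.5 * n * np.log(v @ v)

def loglik_closed(rho):
    """Closed form (32), which absorbs -(n/2) ln(y'My) into the constant."""
    return -np.log(1.0 + rho * b) + np.log(1.0 - rho * b * (n - 1))

# Grid strictly inside the admissible interval (-1/b, 1) = (-49, 1).
grid = np.linspace(-1.0 / b + 0.1, 0.99, 200)
numeric = np.array([loglik_numeric(r) for r in grid])
closed = loglik_closed(grid)
offset = numeric - closed                          # should be constant in rho

print(np.allclose(offset, -0.5 * n * np.log(y @ M @ y)))
print(np.all(np.diff(closed) < 0))                 # strictly decreasing in rho
```

Because the function is strictly decreasing, any search over a grid of this kind stops at the smallest value of $\rho$ that the grid allows, regardless of the value of $\rho$ used to generate the data.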
Again for sake of illustration the single value, $\rho = .5$, was chosen to represent (substantial) positive spatial dependency in model (1). To analyze the effects of strong connectivity, only symmetric binary weight matrices were used in order to allow an average link density interpretation of the matrix norm in (40). A number of matrices, $W$, with different average link densities, $d = \|W\|_{rc} \in (0,1)$, were randomly sampled. In particular, the values $d \in \{.30, .50, .80, .90, .95, .99\}$ were chosen for study. For each $d$ a matrix, $W(d)$, was randomly sampled from the distribution independently assigning $w_{ij} = 1$ with probability $d$ and $w_{ij} = 0$ otherwise.[17] In order to make the results at different density levels more readily comparable, each matrix $W_d$ was normalized in the same manner as $\overline{W}_n$, by dividing $W_d$ by its maximum eigenvalue. This rescaling ensures that the positive values of $\rho$ in each simulated model are exactly the same, namely $(0,1)$.[18] For each of these matrices, 1000 $y$-vectors were then simulated for model (1), and corresponding maximum-likelihood estimates $\{\hat\rho_s(d) : s = 1,..,1000\}$ were computed.[19] Perhaps the simplest way to summarize these results is to compare the sample mean values of $\hat\rho_d$ for each of these densities with the true value, $\rho = .50$, as in column 2 of Table 1 below.

Table 1 Here

As expected, one sees underestimation in all cases, with steadily increasing severity for higher densities. For comparison, the maximally connected case, $d = 1$, has been added to show that this extreme case is vastly worse than all others. But nonetheless, one can see the continuity properties in Proposition 3 at work. Underestimation becomes quite severe as connectivity density increases. Note also that in Table 1 the corresponding $\rho$-intervals, $[W_d]$, in (15) above are given in column 4 (column 3 will be discussed below). To provide a fuller comparison, selected histograms of $\{\hat\rho_s(d) : s = 1,..,1000\}$ are shown for the cases $d = .50, .80, .90, .99$ in Figure 3 below.[20] Here the true value, $\rho = .50$, is indicated by a bold arrow in each case to facilitate the visual comparison of these estimates.

Figure 3 Here

So at average link-density levels of at least 80% ($d \ge .80$) there is a substantial downward bias in estimates. Another way to see this is to ask what fraction of these estimates are reported as significantly different from zero in the standard two-sided tests using asymptotic z-values.[21] Here, for a true value of $\rho = .50$, only the upper 4% of sample estimates at $d = .80$ are significantly different from zero. When the density is increased to $d = .90$ this drops to less than 15%. Further investigations of such significance questions will be taken up in Section 5 below.

Finally, it is of interest to recall from the discussion following Proposition 2 above that this underestimation of $\rho$ has consequences for the bias of other parameter estimates that are at least qualitatively predictable. While it is difficult to place magnitudes on the degree of these biases, they can at least be illustrated for the simulations of model (1) above. The mean estimates for all parameters are shown in Table 2 below (where the means for $\hat\rho$ have been repeated from Table 1):

Table 2 Here

Recall from Proposition 2 that for the perfect fit case in the last row of Table 2, one would predict an intercept coefficient, $\hat\beta_0 \approx 1_n'y$. In the present case, the mean value of $1_n'y$ was about 351, which is in clear agreement with Table 2. Hence for strongly connected weight matrices this is seen to result in extreme overestimation of $\beta_0$ in the present case. It is also interesting to note that while the limiting estimates of $\tilde\beta = (\beta_1, \beta_2)$ and $\sigma^2$ in Table 2 also agree with the zero values predicted by Proposition 2, these biases seem to disappear much more rapidly as link density decreases. However, it is worth noting that even a slight downward bias in $\hat\sigma^2$ (and hence $\hat\sigma$) can have potentially serious consequences for testing, where it can lead to erroneous significance of beta parameters.

5. Extension to Spatial Autoregressive Models

The results above demonstrate that strong connectivity of weight matrices can lead to severe bias in the estimation of spatial dependencies in spatial lag models. Hence it is natural to ask whether similar behavior is exhibited by spatial autoregressive models. Our main result here is to show that with respect to spatial dependence parameters, the results for these two models are essentially identical. To establish this, we begin by formulating this model and sketching the parallel maximum likelihood estimation problem for this case.

[16] This example is only meant to illustrate the practical consequences of the analytical results above. As mentioned in the introduction, more extensive and systematic simulations can be found in both Mizruchi and Neuman (2008) and Farber, Páez and Volz (2008).

[17] Note that density values, $d$, can only be approximated by this sampling procedure. However, repeated samples at each density level yielded variations that were too small to warrant reporting. In all cases the matrix, $W(d)$, chosen had an average link density well within .01 of $d$.

[18] The normalization, $b = 1/(n-1) = 1/49$, used above has the theoretical advantage of preserving all relative connectivity relationships. But the present scaling to unit maximum eigenvalues is a more typical normalization used in practice. For comparison, calculations were also done for the 1/49 scaling, and produced even more dramatic underestimation results than those presented here.

[19] The estimation was done in Matlab using a modified version of the LeSage (1999) suite of programs.

[20] Cases $d = .30$ and $d = .95$ are, respectively, very similar to $d = .50$ and $d = .90$, and are omitted.
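The simulation design of Section 4 can be sketched in miniature as follows. This is only an illustrative reconstruction under stated assumptions: it uses Python rather than the Matlab routines mentioned in the paper, replaces the line search with a grid search over the interval (15), and uses far fewer replications than the 1000 reported there. The function names, seed, grid size, and padding near the interval boundaries are all hypothetical choices.

```python
import numpy as np

def sample_weight_matrix(n, d, rng):
    """Symmetric binary W with link probability d, scaled to unit max eigenvalue."""
    upper = np.triu((rng.random((n, n)) < d).astype(float), k=1)
    W = upper + upper.T
    return W / np.linalg.eigvalsh(W)[-1]

def ml_rho_grid(y, M, W, grid, logdet):
    """Grid maximizer of (16), using the expansion
    y'(I - rho W)'M(I - rho W)y = y'My - 2 rho y'MWy + rho^2 (Wy)'M(Wy)."""
    Wy = W @ y
    My, MWy = M @ y, M @ Wy
    q = y @ My - 2.0 * grid * (y @ MWy) + grid**2 * (Wy @ MWy)
    return grid[np.argmax(logdet - 0.5 * len(y) * np.log(q))]

def simulate_estimates(d, X, beta, rho0, sigma, reps, rng, m=400):
    """Return `reps` grid-based ML estimates of rho for one sampled W, plus the grid."""
    n = X.shape[0]
    W = sample_weight_matrix(n, d, rng)
    lam = np.linalg.eigvalsh(W)                    # ascending eigenvalues
    lo, hi = 1.0 / lam[0], 1.0 / lam[-1]           # interval (15)
    pad = 1e-3 * (hi - lo)                         # stay strictly inside the interval
    grid = np.linspace(lo + pad, hi - pad, m)
    logdet = np.log(1.0 - np.outer(grid, lam)).sum(axis=1)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    A_inv = np.linalg.inv(np.eye(n) - rho0 * W)
    est = np.empty(reps)
    for s in range(reps):
        y = A_inv @ (X @ beta + rng.normal(0.0, sigma, n))
        est[s] = ml_rho_grid(y, M, W, grid, logdet)
    return est, grid

rng = np.random.default_rng(4)                     # hypothetical seed
n = 50
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
beta = np.array([1.0, 2.0, 3.0])
for d in (0.30, 0.80, 0.99, 1.00):
    est, _ = simulate_estimates(d, X, beta, rho0=0.5, sigma=1.0, reps=100, rng=rng)
    print(f"d = {d:.2f}: mean rho_hat = {est.mean():.3f}")
```

With `d = 1.00` every off-diagonal weight is sampled as 1, so the matrix is maximally connected and each estimate falls on the lowest grid point, in line with Proposition 1. Results for the other densities depend on the seed and on the grid, so this sketch should not be expected to reproduce the figures in Table 1.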
As a parallel to model (1), the standard spatial autoregressive model (SAR) for $n$ spatial units [22] is given by:

$$ y = X\beta + u, \quad u = \rho W u + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n) \tag{48} $$

[21] Note that for tests of positive $\rho$ it is theoretically more appropriate to consider a one-sided test ($\rho > 0$). But such results are not reported in standard spatial regression software.

[22] This is also known as the spatial errors model, to emphasize the spatial dependence among errors.

Here the spatial dependence parameter, $\rho$, and spatial weight matrix, $W$, now characterize possible spatial dependencies among the residuals rather than the dependent variable, $y$ [23]. If one solves for $u$ and writes this model in reduced form as

$$ y = X\beta + (I_n - \rho W)^{-1}\varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n) \tag{49} $$

then it becomes clear that $\rho$ and $W$ directly influence the covariance structure of the residuals, $u = (I_n - \rho W)^{-1}\varepsilon$. Again $y$ is multinormally distributed, where the log likelihood function for parameters $(\beta, \sigma^2, \rho)$ in (3) now takes the form:

$$ L(\beta, \sigma^2, \rho \mid y, X) = \text{const} - \frac{n}{2}\ln(\sigma^2) + \ln\det(I_n - \rho W) - \frac{1}{2\sigma^2}(y - X\beta)'(I_n - \rho W)'(I_n - \rho W)(y - X\beta) \tag{50} $$

The parallel between (3) and (50) is even more clear when one solves for the conditional estimates of $\beta$ and $\sigma^2$ given $\rho$,

$$ \hat\beta_{sar}(\rho) = [X'(I_n - \rho W)'(I_n - \rho W)X]^{-1} X'(I_n - \rho W)'(I_n - \rho W)y \tag{51} $$

$$ \hat\sigma^2_{sar}(\rho) = \frac{1}{n}\,(y - X\hat\beta_{sar}(\rho))'(I_n - \rho W)'(I_n - \rho W)(y - X\hat\beta_{sar}(\rho)) \tag{52} $$

and substitutes into (50) to obtain the concentrated likelihood function, $L_{sar}$, for $\rho$. Again, after cancelling of terms, this function reduces to

$$ L_{sar}(\rho \mid y, X) = \text{const} + \ln\det(I_n - \rho W) - \frac{n}{2}\ln[\hat\sigma^2_{sar}(\rho)] \tag{53} $$

which is seen to be identical in form to (6) [24]. Hence these concentrated likelihood functions differ only with respect to their corresponding conditional variance estimates in (5) and (52). However, for the special case of maximally connected weight matrices, $W_n$, it turns out that these conditional variance estimates are identical, as we now show.

[23] Although the spatial dependence parameter in this model acts on residuals rather than $y$, we choose to keep the same notation, $\rho$, in order to emphasize the parallels between these two models.

[24] In particular the constant terms, const, are also easily shown to be identical.

To do so, we begin with the following preliminary result on a certain class of orthogonal projections, which are exemplified by the key projection, $X(X'X)^{-1}X'$, embodied in expression (7) for $M$. If for any matrix, $A \in \mathbb{R}^{n \times k}$, of full column rank $k$ we now let $P_A = A(A'A)^{-1}A'$ denote the orthogonal projection of $\mathbb{R}^n$ into the span of $A$ (so that by definition, $P_A A = A$) then we have the following useful condition for equality between such projections:

Lemma 4. For any matrices, $A, B \in \mathbb{R}^{n \times k}$, of full column rank,

$$ P_A = P_B \iff P_A B = B \tag{54} $$

Proof: Since $P_B B = B$, it follows at once that $P_A = P_B \Rightarrow P_A B = P_B B = B$. So we need only show the converse. To do so, observe first that

$$ P_A B = B \;\Rightarrow\; P_A B[(B'B)^{-1}B'] = B[(B'B)^{-1}B'] \;\Rightarrow\; P_A P_B = P_B \tag{55} $$

Moreover, it also follows that

$$ \begin{aligned} B = P_A B = A(A'A)^{-1}A'B &= A[(A'A)^{-1}(A'B)] \\ \Rightarrow\; B'B = B'A(A'A)^{-1}A'B \;&\Rightarrow\; \det(B'B) = \det(B'A)\,\det[(A'A)^{-1}]\,\det(A'B) \\ &\Rightarrow\; \det(A'B)^2 = \det(B'B)\,\det(A'A) > 0 \end{aligned} \tag{56} $$

and hence that $A'B$ is nonsingular. Thus by the first line of (56) we have

$$ P_B B = B \;\Rightarrow\; P_B A\,(A'A)^{-1}(A'B) = A\,(A'A)^{-1}(A'B) \;\Rightarrow\; P_B A = A \tag{57} $$

where the last implication follows by post-multiplication of both sides by the inverse of the nonsingular matrix $(A'A)^{-1}(A'B)$. Hence by the argument in (55)

$$ P_B A = A \;\Rightarrow\; P_B P_A = P_A \;\Rightarrow\; P_A P_B = P_A \tag{58} $$

where here the last implication follows by taking transposes of both sides and using the symmetry of $P_A$ and $P_B$. Hence it may be concluded from (55) and (58) that

$$ P_A = P_A P_B = P_B \tag{59} $$

and the result is established.

With this result, we now have the following key identity between SL models (1) and SAR models (49) for the case of maximally connected weight matrices.

Proposition 4. If $W = W_n$ in models (1) and (49), then the concentrated likelihood functions $L_{sl}$ and $L_{sar}$ are identical for all admissible $\rho$.

Proof: To establish this result, it is clear from (6) and (53) that it suffices to show that the conditional variance estimates in (10) and (52) are identical for all admissible $\rho$. But if for notational convenience we now let

$$ B_\rho = I_n - \rho W_n = I_n - \rho b\,(1_n 1_n' - I_n) \tag{60} $$

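[where $b = 1/(n-1)$ is one possibility] then by the first line of (10), it follows that for the SL model (1),

$$ \begin{aligned} \hat\sigma^2_{sl}(\rho) &= \frac{1}{n}\,\big([I_n - X(X'X)^{-1}X']B_\rho y\big)'\big([I_n - X(X'X)^{-1}X']B_\rho y\big) \\ &= \frac{1}{n}\,\big\|[I_n - X(X'X)^{-1}X']B_\rho y\big\|^2 = \frac{1}{n}\,\big\|(I_n - P_X)B_\rho y\big\|^2 \end{aligned} \tag{61} $$

To compare this with the SAR model (49), observe from (51) that

$$ \hat\beta_{sar}(\rho) = (X'B_\rho'B_\rho X)^{-1}X'B_\rho'B_\rho y = [(B_\rho X)'(B_\rho X)]^{-1}(B_\rho X)'B_\rho y \tag{62} $$

and hence from (52) that

$$ \begin{aligned} \hat\sigma^2_{sar}(\rho) &= \frac{1}{n}\,(y - X\hat\beta_{sar}(\rho))'B_\rho'B_\rho(y - X\hat\beta_{sar}(\rho)) = \frac{1}{n}\,\big\|B_\rho(y - X\hat\beta_{sar}(\rho))\big\|^2 \\ &= \frac{1}{n}\,\big\|B_\rho y - B_\rho X[(B_\rho X)'(B_\rho X)]^{-1}(B_\rho X)'B_\rho y\big\|^2 \\ &= \frac{1}{n}\,\big\|\{I_n - (B_\rho X)[(B_\rho X)'(B_\rho X)]^{-1}(B_\rho X)'\}B_\rho y\big\|^2 = \frac{1}{n}\,\big\|(I_n - P_{B_\rho X})B_\rho y\big\|^2 \end{aligned} \tag{63} $$

In this form it is clear that the result will follow if it can be shown that

$$ P_X = P_{B_\rho X} \quad \text{for all admissible } \rho \tag{64} $$

But since $X = [1_n, x_1, .., x_k]$ and $P_X X = X$ together imply that $P_X 1_n = 1_n$, we must have

$$ \begin{aligned} P_X(B_\rho X) &= P_X X - \rho b\,P_X(1_n 1_n' - I_n)X = X - \rho b\,(P_X 1_n)1_n'X + \rho b\,P_X X \\ &= X - \rho b\,(1_n 1_n')X + \rho b\,X = X - \rho b\,(1_n 1_n' - I_n)X = B_\rho X \end{aligned} \tag{65} $$

and may conclude from Lemma 4 that (64) holds for all admissible $\rho$.

Hence it follows at once from Proposition 4 that for maximally connected weight matrices, $W_n$, it will always be true that the maximum likelihood estimates, $\hat\rho$, of $\rho$ in corresponding SL and SAR models are identical. This in turn implies that Proposition 1 must hold intact if the SL model in (1) is replaced by the SAR model in (49). Hence the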
same type of continuity argument in Proposition 2 can be used to show that the spatial dependence parameter, $\rho$, in SAR models will be underestimated for strongly connected weight matrices. Rather than repeat such arguments here, we simply report the corresponding estimation results for the SAR model based on the same data $X$, parameters $(\beta, \sigma^2, \rho)$, and weight matrices, $W_d$, $d \in \{.30, .50, .80, .90, .95, .99, 1.00\}$ used in Section 3 above. The results for $\hat\rho$ are displayed in column 3 of Table 1 in that section and show that, as predicted by Proposition 4, these estimates converge to the same extreme value as $d$ approaches unity. However it is also clear that (at least in this particular example) the underestimation of $\rho$ is even more severe than for the SL model above.

Table 3 Here

The results for other parameter estimates are shown in Table 3 above. Notice first that all mean beta estimates appear to be remarkably accurate even in the maximally connected case. This is explained by the well known fact that for the SAR model, $\hat\beta$ is always an unbiased estimator of $\beta$ for a correctly specified model, since

$$ \begin{aligned} E(\hat\beta \mid X) &= [X'(I_n - \rho W)'(I_n - \rho W)X]^{-1}X'(I_n - \rho W)'(I_n - \rho W)\,E(y \mid X) \\ &= [X'(I_n - \rho W)'(I_n - \rho W)X]^{-1}X'(I_n - \rho W)'(I_n - \rho W)X\beta = \beta \end{aligned} \tag{66} $$

Notice also that there is some slight underestimation of residual variance, as in the case of SL models. This does not appear to be too severe (in the present example). But again, even slight underestimation of variance can lead to erroneous significance of beta parameters. Moreover, in the extreme case of maximal connectivity these estimates are in fact completely unstable, as can be seen by the dependency of $\hat\beta$ on $\hat\rho$ in the conditional beta estimator of (62) above. If we set

$$ \hat\rho = 1/\lambda_{\min}(W_n) = -1/b \tag{67} $$

in this extreme case, then

$$ B_{\hat\rho} = I_n - (-1/b)[b\,(1_n 1_n' - I_n)] = 1_n 1_n' \tag{68} $$

together with $1_n' 1_n = n$ implies that

$$ \hat\beta_{sar}(\hat\rho) = (X'B_{\hat\rho}'B_{\hat\rho}X)^{-1}X'B_{\hat\rho}'B_{\hat\rho}\,y = (n\,X'1_n 1_n'X)^{-1}\,n\,X'1_n 1_n'\,y \tag{69} $$

Hence if there is at least one explanatory variable other than the intercept (i.e., if $k \ge 1$) then the matrix, $X'1_n 1_n'X$, is singular and the inverse in (69) will not exist. In practice
however, what typically happens is that estimation algorithms converge to values close to $-1/b$ which will yield well-defined answers. In the case illustrated above, where $-1/(1/49) = -49$, even such values of $\rho$ continue to produce reasonable looking estimates on average.

6. Consequences for Moran Tests of Spatial Autocorrelation

Aside from the above consequences for spatial regression models such as SL and SAR models, strong connectivity of weight matrices has broader implications for diagnostic analyses of spatial autocorrelation. This is most evident with respect to the single most widely used test for spatial autocorrelation, namely the Moran Test. In particular, suppose that one considers the null hypothesis of independence ($\rho = 0$), under which both SL and SAR models reduce to the standard linear model:

$$ y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n) \tag{70} $$

If one constructs the standard maximum-likelihood (OLS) estimates of $\beta$ under this hypothesis,

$$ \hat\beta = (X'X)^{-1}X'y \tag{71} $$

and forms the corresponding vector of residual estimates:

$$ \hat\varepsilon = y - \hat y = y - X\hat\beta \tag{72} $$

then for any given candidate choice of a spatial weight matrix, $W$, the associated Moran statistic, $I$, is defined by [see for example Anselin (1988, Section 8.1.1)]:

$$ I = \alpha\,\frac{\hat\varepsilon' W \hat\varepsilon}{\hat\varepsilon'\hat\varepsilon} \tag{73} $$

where the positive constant, $\alpha = n/(1_n' W 1_n) = n/\sum_{ij} w_{ij}$, plays no substantive role in the analysis to follow. This can be expressed in a more convenient form (again following Anselin) by noting from (71) and (72) that

$$ \hat\varepsilon = y - X(X'X)^{-1}X'y = [I_n - X(X'X)^{-1}X']\,y = My \tag{74} $$

and hence from (9) that $I$ can be equivalently written as

$$ I = \alpha\,\frac{y'MWMy}{y'My} \tag{75} $$

Under the hypothesis of independence in (70), the mean and variance of $I$ are well known to be [Anselin (1988, Section 8.1.1)]:

$$ E(I) = \frac{\alpha\,\operatorname{tr}(MW)}{n - (k+1)} \tag{76} $$

$$ \operatorname{var}(I) = \frac{\alpha^2\,\{\operatorname{tr}(MWMW') + \operatorname{tr}(MWMW) + [\operatorname{tr}(MW)]^2\}}{[n - (k+1)]\,[n - (k+1) + 2]} - [E(I)]^2 \tag{77} $$

In this setting, our main result is to show that for maximally connected weight matrices, $W_n$, this Moran statistic is degenerate [25]. In particular it is completely concentrated at the mean, $E(I)$, and hence can never detect spatial autocorrelation. To establish this result, we first note from (75) that this statistic is only meaningful for data sets $(y, X)$ with $y'My \neq 0$. But since $y'My = 0 \iff My = 0$ (as shown in the proof of Lemma 3 above), this is in turn equivalent to the condition that $My \neq 0$. Hence, for purposes of this section we again assume regularity of $(y, X)$. In addition, we employ the normalization convention, $b = 1/(n-1)$, for $W_n$ so that $\lambda_{\max}(W_n) = 1$. Finally, for each regular data set, $(y, X)$, we let $I(y, X)$ denote the corresponding sample value of $I$ in (75). With these conventions, we have the following result:

[25] This degeneracy is also an instance of the more general result in Arnold (1979, Th.5) for the class of invariant test statistics for linear models with exchangeably distributed errors. A more explicit version relating to the present case is given in Martellosio (2008, Props. 3.4 and 3.6).

Proposition 5. For all regular data sets $(y, X)$,

$$ I(y, X) = E(I) \tag{78} $$

Proof: First observe that since $b = 1/(n-1) \Rightarrow 1_n' W_n 1_n = n(n-1)[1/(n-1)] = n$, it follows that

$$ \alpha = n/(1_n' W_n 1_n) = n/n = 1 \tag{79} $$

and hence that the Moran statistic for this case reduces to

$$ I(y, X) = \frac{y'MW_nMy}{y'My} \tag{80} $$

Thus we see from Lemma 3 that

$$ I(y, X) = \frac{y'(-bM)y}{y'My} = -\frac{1}{n-1}\,\frac{y'My}{y'My} = -\frac{1}{n-1} \tag{81} $$

and may conclude that $I$ is indeed concentrated at a single value. To show that this value is precisely the mean, $E(I)$, under independence, we first note that since the trace of an orthogonal projection (symmetric idempotent) matrix, $M$, is equal to the dimension of its image space [Searle (1982, Section 12.2)], and since the dimension of the complement of the span of $X$ is $n - (k+1)$, it follows that

$$ \operatorname{tr}(M) = n - (k+1) \tag{82} $$

This in turn implies from Lemma 3 that

$$ \operatorname{tr}(MW_n) = \operatorname{tr}(-bM) = -\frac{1}{n-1}\operatorname{tr}(M) = -\frac{n - (k+1)}{n-1} \tag{83} $$

and hence from (76) and (79) that

$$ E(I) = \frac{\operatorname{tr}(MW_n)}{n - (k+1)} = -\frac{1}{n-1} \tag{84} $$

Thus the result follows from (81) and (84).

As a consequence of Proposition 5, it follows that (with probability one) [26] the realized value of $I$ is precisely its expected value under independence. Hence no evidence for spatial dependence can ever be detected in this extreme case. More generally, the same type of continuity argument used in Proposition 3 above shows that for weight matrices, $W$, that are sufficiently close to $W_n$ [say in terms of the relative connectivity norm] it must be true that the possible values of $I$ are concentrated close to the mean $E(I)$. So again this statistic should have little ability to detect spatial dependence.

[26] It is a simple matter to show that for any $X$, the set of $y$ with $My = 0$ has probability measure zero.

To make these ideas more concrete, we choose to focus on the standard z-test for Moran statistics found in most software. If the standard deviation of $I$ under independence is denoted by $\sigma(I) = \operatorname{var}(I)^{1/2}$, then it is well known [Cliff and Ord (1981, Section 8.5.1)] that the standardized z-value

$$ Z = \frac{I - E(I)}{\sigma(I)} \tag{85} $$

is approximately distributed $N(0,1)$ for large $n$. Hence one can use this distribution theory to test the hypothesis of spatial independence with respect to weight matrix $W$ [27].

[27] It is worth noting here that the exact distribution of $I$ under independence has been obtained by Tiefelsdorf and Boots (1995). However, most statistical packages rely on the asymptotic approximation above.
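Proposition 5 can be checked numerically. The following is a minimal sketch in plain Python (not the Matlab code used for the simulations in this paper). The sample size, regressors and data are arbitrary illustrative choices, and the helper names `ols_residuals`, `moran_i` and `max_connected` are hypothetical. It builds $W_n = b(1_n 1_n' - I_n)$, computes the OLS residuals $My$, and evaluates the Moran statistic (73) for several simulated $y$ vectors.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols_residuals(y, columns):
    """Residuals My of y regressed on the given columns (modified Gram-Schmidt)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(v, q)
            v = [a - c * b for a, b in zip(v, q)]
        norm = dot(v, v) ** 0.5
        basis.append([a / norm for a in v])
    e = list(y)
    for q in basis:
        c = dot(e, q)
        e = [a - c * b for a, b in zip(e, q)]
    return e

def moran_i(e, W):
    """Moran statistic (73): alpha * e'We / e'e with alpha = n / sum_ij w_ij."""
    n = len(e)
    alpha = n / sum(sum(row) for row in W)
    We = [dot(row, e) for row in W]
    return alpha * dot(e, We) / dot(e, e)

def max_connected(n, b):
    """W_n = b(1 1' - I): every pair of distinct units is linked with weight b."""
    return [[0.0 if i == j else b for j in range(n)] for i in range(n)]

random.seed(0)
n, k = 50, 2
ones = [1.0] * n
X_cols = [ones] + [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
W = max_connected(n, 1.0 / (n - 1))

values = []
for _ in range(5):
    y = [random.gauss(7, 3) for _ in range(n)]  # arbitrary data
    values.append(moran_i(ols_residuals(y, X_cols), W))

expected = -1.0 / (n - 1)  # E(I) from (84)
print(values)
print(expected)
```

In this sketch every simulated value agrees with $-1/(n-1)$ up to rounding error, whatever $y$ is drawn. Because $\alpha$ in (73) rescales by the total weight, the same value is obtained for any $b > 0$.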

To study the behavior of this test for strongly connected weight matrices, we shall focus only on the simulation results in Section 3 above based on the SL model. Here it was assumed that $\rho = .5$ and hence that a substantial degree of positive spatial dependence is present. To determine whether this dependence can be detected by the Moran statistic for a given weight matrix, $W$, it suffices to compute $I(y, X)$ for simulated data sets from model (1), and then examine the frequency distribution of z-values, $Z(y, X)$, generated by this data. For a one-sided test of $\rho > 0$ at the $\alpha = .05$ level, one need only count the fraction of z-values above $z_\alpha = 1.65$ to determine the power of this test to detect positive spatial dependence, given the true value $\rho = .5$. For the 1000 simulated values at each link density level in Section 3 above, the resulting estimated power levels are shown in Table 4 below.

Table 4 Here

Here it is clear that at link densities above .80 the distribution is so concentrated around the null mean, $E(I)$, that even a dependency level of $\rho = .5$ is detectable less than 10% of the time [28]. It is also of interest to note that even though the distribution of $I$ concentrates at the null mean as link density approaches 1, the power levels do not appear to fall to zero in Table 4. The reason for this is that concentration of $I$ values drives the variance in (77) to zero (as can easily be verified by the same calculations as for the mean in the proof of Proposition 5 [29]). Hence when $I$ is highly concentrated, the standardized value, $Z$, becomes unstable (as it approaches the limiting indeterminate value 0/0 for $W_n$).

7. Concluding Remarks

In this paper it has been shown that the presence of strongly connected spatial weight matrices can introduce serious biases into both the estimation and testing of spatial autocorrelation. Hence one is led to ask whether there is any simple intuitive explanation for this. One possibility relates to the notion of effective sample size.
[28] It is also of interest to note that the .054 value for density .99 is consistent with a limiting value of $\alpha = .05$ for the maximally connected case, as implied by results of Martellosio (2008, Prop. 3.5).

[29] Note in particular from Lemma 2 that for $W = W_n$, $\operatorname{tr}(MWMW) = \operatorname{tr}[(-bM)(-bM)] = b^2\operatorname{tr}(MM) = b^2\operatorname{tr}(M) = [n - (k+1)]/(n-1)^2$.

It has long been observed that the presence of statistical dependencies essentially reduces the amount of information gained from each individual observation. For example, the observation of a sequence of perfectly correlated coin tosses will offer no more information than the observation of only the first toss, no matter how long the sequence is. Hence it can be argued that in so far as strong spatial connectivity reflects strong dependencies among
units (or agents), there should be less statistical information available for estimation or tests of hypotheses. But while this argument has intuitive appeal, and is no doubt true to some extent, it fails to explain, for example, why maximum-likelihood methods should systematically underestimate the parameter $\rho$ in SL and SAR models.

In this paper it has been shown that much can be learned by studying the extreme case of maximally connected weight matrices, $W_n$. In particular, both concentrated likelihood functions and Moran statistics reduce to particularly simple forms in this case, and can be studied in detail. But even in this extreme case, the subtlety of the underestimation question above is underscored by the fact that quite different arguments were used to bound the values for each term in the concentrated log likelihood function. In particular, both the eigenvalue structure of $W_n$ and the relation of $W_n$ to the regression projection operator, $I_n - X(X'X)^{-1}X'$, were involved. So in some respects, these results serve to raise as many theoretical questions as they answer.

Even more important are questions relating to the practical consequences of these results. While the single simulation example presented here is very suggestive, it can provide no definitive guidelines for applications. Hence the actual severity of these biases can only be determined by more extensive and systematic simulation studies, as already begun by Mizruchi and Neuman (2008) and Farber, Páez and Volz (2008).

References

Anselin, L. (1988) Spatial Econometrics: Methods and Models, Kluwer: Netherlands.

Anselin, L. and A. Bera (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In A. Ullah and D. Giles (eds.), Handbook of Applied Economic Statistics, New York: Marcel Dekker.

Arnold, S.F. (1979) Linear models with exchangeably distributed errors, Journal of the American Statistical Association, 74.

Baltagi, B.H. (2006) Random effects and spatial autocorrelation with equal weights, Econometric Theory, 22.

Bao, Y. and A. Ullah (2007) Finite sample properties of maximum likelihood estimator in spatial models, Journal of Econometrics, 137.

Farber, S., A. Páez and E. Volz (2008) Topology and dependency tests in spatial and network autoregressive models, forthcoming in Geographical Analysis.

Horn, R.A. and C.R. Johnson (1985) Matrix Analysis, Cambridge University Press: Cambridge.

Kelejian, H.H. and I.R. Prucha (1998) Generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances, The Journal of Real Estate Finance and Economics, 17.

Kelejian, H.H. and I.R. Prucha (2002) 2SLS and OLS in a spatial autoregressive model with equal spatial weights, Regional Science and Urban Economics, 32.

Kelejian, H.H., I.R. Prucha and Y. Yuzefovich (2006) Estimation problems in models with spatial weighting matrices which have blocks of equal elements, Journal of Regional Science, 46.

Kelejian, H.H. and I.R. Prucha (2008) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances, forthcoming in the Journal of Econometrics.

LeSage, J. (1999) Spatial Econometrics Toolbox.

Martellosio, F. (2008) Testing for spatial autocorrelation: the regressors that make the power disappear, Working Paper, Department of Economics, University of Reading, Reading RG6 6AW, UK.

Mizruchi, M.S. and E.J. Neuman (2008) The effect of density on the levels of bias in the network autocorrelation model, forthcoming in Social Networks.

Ord, K. (1975) Estimation methods for models of spatial interaction, Journal of the American Statistical Association, 70.

Searle, S.R. (1982) Matrix Algebra Useful for Statistics, Wiley: New York.

Tiefelsdorf, M. and B.N. Boots (1995) The exact distribution of Moran's I, Environment and Planning A, 27.

Acknowledgements: The author is grateful to Mark Mizruchi, Eric Neuman, Oleg Smirnov, Harry Kelejian and Federico Martellosio for many helpful comments on an earlier draft of this paper.
