W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

1. Introduction

When it comes to applying econometric models to georeferenced data, researchers are well aware that ignoring spatial dependence leads to inefficient and biased estimates. Many theoretical and simulation studies have sought efficient and consistent estimators for spatial data analysis. One major issue concerns how to specify the structure of spatial dependence, including the type of spatial weights to be incorporated into the model. Our study therefore evaluates two approaches that explicitly model spatial autocorrelation: classical W-based regression and the recently proposed latent variables spatial autoregressive model.

(Another approach to dealing with spatial autocorrelation is spatial filtering. It comes down to the removal of spatial dependence in a spatially autocorrelated variable by partitioning it into a filtered nonspatial variable and a residual spatial variable, such that conventional regression techniques can be applied to the filtered data; see, amongst others, Getis and Griffith (2002) and Tiefelsdorf and Griffith (2007). Spatial filtering is not considered in this paper.)

In the spatial econometrics literature, W-based spatial regressions have been predominant and most commonly used. The approach is based on one or more spatial structural matrices, usually denoted W, that account for spatial dependence and spillover effects. The selection of spatial weight matrices is a crucial feature of spatial models because they impose a priori a structure of spatial dependence on the models and affect both the estimates (Bhattacharjee and Jensen-Butler, 2006; Anselin, 2002; Fingleton, 2003) and the substantive interpretation of the research findings (Hepple, 1995).

Autoregressive models can include several types of spatial weight matrices. Most common are the contiguity-based matrices. Two regions are said to be first-order contiguous if they share a common border; higher-order contiguity is defined in a similar way. Contiguity matrices differ in form (whether the units of observation share common boundaries or vertices) and in order of contiguity. The other type of weight matrix is distance based and is calculated using either a function of distance, such as inverse distance or inverse distance squared, or a fixed distance band.

Much progress has been made with respect to the comparison of spatial weights matrices (Hepple, 1995), the construction of spatial weights matrices (Getis and Aldstadt, 2004; Aldstadt and Getis, 2006) and the estimation of spatial weights matrices that are consistent with an observed pattern of spatial dependence rather than assuming a priori the nature of spatial interaction. In spite of all these developments, the most common procedure is still to assume first-order contiguity a priori, as expressed by a matrix W with diagonal elements equal to zero and off-diagonal elements equal to one if two regions are first-order contiguous and zero otherwise.

The alternative, a latent variables approach, was introduced by Folmer and Oud (2008). It proceeds on the basis of a structural equation model (SEM). SEM allows simultaneous handling of observed and latent variables within one model framework. Latent variables refer to phenomena that are supposed to exist but cannot be observed directly. One example is socio-economic status: it captures an individual's standing in society, which cannot be observed or measured directly but can be measured via observable indicators such as educational attainment, income and occupational status. The structure of a SEM includes two parts: (i) the structural model, presenting the causal relationships between latent variables, and (ii) the measurement model, showing the relationships between the latent variables and their observable indicators (Oud and Folmer, 2008). The latent variables approach replaces the spatially lagged variables in the structural model (which is the analogue of the spatial dependence model in the W-based approach) by latent variables, and models the relationship between the latent spatially lagged variables and their observed indicators in the measurement model.
Since one latent variable can be measured by several indicators, this approach allows for the straightforward inclusion of spatial dependence in the model. It can also test distance decay in a straightforward fashion, so as to identify the spatial units that evoke a distance effect and those that do not. Moreover, it is possible to include different types of contiguity, because spatial dependence can be measured by several sets of indicators. In that case more than one latent spatial dependence variable enters the structural model, with corresponding indicators in the measurement model.

Folmer and Oud (2008) show that the latent variables approach can produce the same estimates as the W-based approach, but also that it is more general than the W-based approach. However, further comparison is needed to draw up the pros and cons of each approach. To gain more insight into the quality of the estimators of the regression coefficients, including the spatial autocorrelation coefficient, in both approaches, we carry out a number of Monte Carlo simulations. The comparison is based on Anselin's (1988) Columbus, Ohio, crime dataset. The performance of the approaches is analyzed in terms of bias and mean squared error for various values of spatial autocorrelation.

The rest of the paper is organized as follows. Section 2 explains the W-based spatial regression approach and the latent variables approach and specifies their model structures. A detailed description of the experimental design is given in Section 3. In Section 4 we report the simulation results, and Section 5 concludes the paper.

2. Model Specifications

2.1 The W-based Autoregressive Model

We only consider the spatial lag model, which assumes that the dependent variable is a function of exogenous variables and of the dependent variable at other locations. We refer to this model as the W-based spatial autoregressive model. The specification of the model reads:

$$y = \rho W y + X\beta + \varepsilon \tag{1}$$

$$\varepsilon \sim N(0, \sigma^2 I_n) \tag{2}$$

where y is an n × 1 vector of observations on the dependent variable, X is an n × k data matrix of explanatory variables with associated coefficient vector β, ε is an n × 1 vector of error terms, W is the row-standardized n × n spatial weight (contiguity) matrix, and ρ is the spatial lag parameter.

Anselin (1988) presents a maximum likelihood method for estimating the parameters of this model, which he labels a mixed regressive spatial autoregressive model because it combines the standard regression model with a spatially lagged dependent variable (analogous to the lagged dependent variable model in time-series analysis). Maximum likelihood estimation of this model is based on a concentrated likelihood function, in the sense that the maximization can be reduced to an expression in one parameter, conditional upon the values of the other parameters. The corresponding log-likelihood function, with N the number of observations, is of the form:

$$L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 + \ln|A| - \frac{1}{2\sigma^2}(Ay - X\beta)'(Ay - X\beta) \tag{3}$$

with

$$A = I - \rho W. \tag{4}$$

A few regressions are carried out, along with a univariate optimization of the concentrated likelihood function over values of the autoregressive parameter ρ. The steps are described in Anselin (1988):

1. Regress y on X by OLS: $y = X\beta_O + \varepsilon_O$.
2. Regress Wy on X by OLS: $Wy = X\beta_L + \varepsilon_L$.
3. Compute the residuals: $e_O = y - X\hat\beta_O$, $e_L = Wy - X\hat\beta_L$.
4. Given $e_O$ and $e_L$, find the ρ that maximizes the concentrated likelihood function
   $$L_C = C - \frac{N}{2}\ln\left[\frac{1}{N}(e_O - \rho e_L)'(e_O - \rho e_L)\right] + \ln|I - \rho W|,$$
   which results from substituting the estimates of β and σ² into the log-likelihood, with C a constant.
5. Given the $\hat\rho$ that maximizes $L_C$, compute $\hat\beta = \hat\beta_O - \hat\rho\,\hat\beta_L$ and $\hat\sigma^2 = \frac{1}{N}(e_O - \hat\rho e_L)'(e_O - \hat\rho e_L)$.
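The five steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the routine used for the results in this paper: the univariate optimization in step 4 is replaced by a plain grid search over ρ, the search range and step are arbitrary, the log-determinant is evaluated from the eigenvalues of W, and the names `fit_spatial_lag` and `ring_weights` (a hypothetical ring-shaped weight matrix used only to obtain test data) are introduced here.

```python
import numpy as np


def fit_spatial_lag(y, X, W, grid=None):
    """Concentrated ML for y = rho*W*y + X*beta + eps, via a grid over rho."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 397)  # assumed search range and step
    n = y.shape[0]
    Wy = W @ y

    # Steps 1 and 2: OLS of y on X and of Wy on X.
    b_O = np.linalg.lstsq(X, y, rcond=None)[0]
    b_L = np.linalg.lstsq(X, Wy, rcond=None)[0]

    # Step 3: the two residual vectors.
    e_O = y - X @ b_O
    e_L = Wy - X @ b_L

    # Step 4: concentrated log-likelihood (constant dropped) on the grid.
    # ln|I - rho*W| is evaluated as sum_i ln(1 - rho*w_i), w_i eigenvalues of W.
    w = np.linalg.eigvals(W)
    logdet = np.log(1.0 - np.outer(grid, w)).sum(axis=1).real
    resid = e_O[None, :] - grid[:, None] * e_L[None, :]
    ssr = (resid ** 2).sum(axis=1)
    loglik = -0.5 * n * np.log(ssr / n) + logdet
    rho_hat = grid[np.argmax(loglik)]

    # Step 5: back out beta and sigma^2 at rho_hat.
    beta_hat = b_O - rho_hat * b_L
    e = e_O - rho_hat * e_L
    sigma2_hat = e @ e / n
    return rho_hat, beta_hat, sigma2_hat


def ring_weights(n):
    """Row-standardized W for a hypothetical ring: two neighbours per unit."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W
```

A grid search is used only because it keeps the sketch dependency-free; any bounded scalar optimizer over the admissible range of ρ would serve the same purpose.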
The likelihood function discussed above contains a Jacobian term ln|A|. Since the matrix A has dimension equal to the number of observations, its presence in the function to be optimized makes the numerical analysis considerably more complex: the determinant and its derivative need to be evaluated at each iteration for a new value of the spatial parameter ρ. Ord (1977) derived a simplification of determinants such as |A| in terms of eigenvalues, more specifically:

$$\ln|I - \rho W| = \ln\prod_i (1 - \rho w_i) = \sum_i \ln(1 - \rho w_i) \tag{5}$$

where the $w_i$ are the eigenvalues of W.

2.2 The Latent Variables Autoregressive Model

A SEM in its general form consists of three basic equations:

$$y = \Lambda_y \eta + \varepsilon \quad \text{with} \quad \operatorname{cov}(\varepsilon) = \Theta_\varepsilon, \tag{6}$$

$$x = \Lambda_x \xi + \delta \quad \text{with} \quad \operatorname{cov}(\delta) = \Theta_\delta, \tag{7}$$

$$\eta = B\eta + \Gamma\xi + \zeta \quad \text{with} \quad \operatorname{cov}(\xi) = \Phi, \ \operatorname{cov}(\zeta) = \Psi. \tag{8}$$

In the measurement models (6) and (7), the vectors y and x contain the observed endogenous and exogenous variables; the vectors η and ξ contain the latent endogenous and exogenous variables; the matrices $\Lambda_y$ and $\Lambda_x$ specify the loadings of the observed variables (indicators) on the vectors of latent variables η and ξ; and $\Theta_\varepsilon$ and $\Theta_\delta$ are the measurement error covariance matrices. Directly observed variables can be conveniently handled in the structural model by specifying an identity relationship between a given observed variable and the corresponding latent variable in the measurement model. In the structural model (8), B specifies the structural relationships among the latent endogenous variables and Γ contains the impacts of the exogenous latent variables on the endogenous ones. Φ is the covariance matrix of the latent exogenous variables and Ψ the covariance matrix of the errors in the structural model. The measurement errors in ε and δ are assumed to be uncorrelated with the latent variables in η and ξ as well as with the structural errors in ζ.

In a SEM modelling framework, parameter estimation is done by minimizing the distance between the theoretical covariance matrix (based on hypotheses relating to the model structure as specified in the parameter matrices $\Lambda_x$, $\Lambda_y$, $\Theta_\varepsilon$, $\Theta_\delta$, B, Γ, Φ and Ψ) and the observed covariance matrix. Several estimators for SEM have been developed, including instrumental variables (IV), two-stage least squares (TSLS), unweighted least squares (ULS), generalized least squares (GLS), fully weighted (WLS) and diagonally weighted least squares (DWLS), and maximum likelihood (ML). ML is the most commonly used method for estimating SEM models and the default in the statistical packages Mx and LISREL. Below we apply ML estimation. It maximizes the log-likelihood function:

$$l(\theta \mid Y) = -\frac{N}{2}\ln|\Sigma| - \frac{N}{2}\operatorname{tr}(S\Sigma^{-1}) - \frac{pN}{2}\ln 2\pi \tag{9}$$

where Σ is the theoretical variance-covariance matrix in terms of the free and constrained elements in the 8 parameter matrices:

$$\Sigma = \begin{pmatrix} \Lambda_y (I-B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I-B')^{-1}\Lambda_y' + \Theta_\varepsilon & \Lambda_y (I-B)^{-1}\Gamma\Phi\Lambda_x' \\ \Lambda_x\Phi\Gamma'(I-B')^{-1}\Lambda_y' & \Lambda_x\Phi\Lambda_x' + \Theta_\delta \end{pmatrix} \tag{10}$$

and S is the observed variance-covariance matrix for the given data Y. The ML estimator $\hat\theta = \arg\max_\theta l(\theta \mid Y)$ chooses the value of θ that maximizes $l(\theta \mid Y)$. Minimizing the fit function

$$F_{ML} = \ln|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \ln|S| - p \tag{11}$$

gives the same results as maximizing the above likelihood function (Oud and Folmer, 2008). The LISREL software package also contains a variety of test and model evaluation statistics and gives hints about identification problems (see Jöreskog and Sörbom, 1996, for details).

On the basis of SEM, we propose a model structure to represent the spatial lag model. We specify the lag model as a SEM and estimate it using the software package Mx.
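To make the ML fit function concrete, the following sketch evaluates $F_{ML} = \ln|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \ln|S| - p$ for a given observed matrix S and model-implied matrix Σ. It is an illustration only and does not reproduce how Mx or LISREL compute or minimize the function. The helper `one_factor_cov`, which builds an implied covariance of the form $\Lambda\Phi\Lambda' + \Theta$ for a single latent variable, and all numerical values are assumptions made for the example.

```python
import numpy as np


def one_factor_cov(loadings, factor_var, error_vars):
    """Implied covariance Lambda*Phi*Lambda' + Theta for a single latent variable."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(error_vars)


def f_ml(S, Sigma):
    """ML fit function: ln|Sigma| + tr(S Sigma^{-1}) - ln|S| - p."""
    p = S.shape[0]
    sign_sigma, logdet_sigma = np.linalg.slogdet(Sigma)
    sign_s, logdet_s = np.linalg.slogdet(S)
    if sign_sigma <= 0 or sign_s <= 0:
        raise ValueError("S and Sigma must be positive definite")
    # tr(S Sigma^{-1}) equals tr(Sigma^{-1} S); solve avoids an explicit inverse.
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return logdet_sigma + trace_term - logdet_s - p


# Hypothetical "observed" matrix and a deliberately misspecified candidate.
S_obs = one_factor_cov([1.0, 0.8, 0.6], 2.0, [0.5, 0.5, 0.5])
Sigma_candidate = one_factor_cov([1.0, 0.5, 0.5], 1.0, [1.0, 1.0, 1.0])
fit_at_truth = f_ml(S_obs, S_obs)
fit_misspecified = f_ml(S_obs, Sigma_candidate)
```

The function is zero when Σ reproduces S exactly and positive otherwise, which is why minimizing it over the free parameters drives the implied covariance matrix towards the observed one.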
This SEM representation of the spatial lag model is referred to as the latent variables spatial autoregressive model in this paper. We apply the concept of a latent variable to the spatially lagged dependent variable: in the structural model Wy is replaced by a latent variable, while in the measurement model the latent spatially lagged dependent variable is related to its indicators. The structural equation replaces the spatially lagged variable Wy of the W-based spatial autoregressive model by a latent variable η:

$$y = \rho\eta + \gamma' x + \zeta \tag{12}$$

and is completed by a measurement equation:

$$y = \Lambda\eta + \varepsilon \tag{13}$$

with

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}, \quad \Lambda = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_m \end{pmatrix}, \quad \Theta = \begin{pmatrix} \sigma^2_{\varepsilon_1} & & & \\ & \sigma^2_{\varepsilon_2} & & \\ & & \ddots & \\ & & & \sigma^2_{\varepsilon_m} \end{pmatrix} \tag{14}$$

where m denotes the number of indicators. (Observe that without an appropriate constraint the loadings $\lambda_i$ are not identified. For that reason, one often chooses $\lambda_1 = 1$.)

We assume that the spatially lagged observed variables are chosen on the basis of theoretical or ad hoc considerations. Moreover, as pointed out in the introduction, more than one spatial feature can be taken into account. For instance, both the relationships to neighbouring regions and the relationships to the core regions could be included in the case of the crime model.

We first turn to the measurement model. This model is constructed by means of selection functions or selection matrices $S_i$, which select the relevant observations from the vector of observations as follows. In order to distinguish between the standard definition of a SEM in terms of variables and a SEM defined in units of observations, as in standard spatial econometrics, we denote the latter by a tilde (~):

$$\tilde y_1 = S_1 \tilde y, \quad \tilde y_2 = S_2 \tilde y, \quad \ldots, \quad \tilde y_m = S_m \tilde y. \tag{15}$$

That is, $S_1$ selects the values for the first indicator $\tilde y_1$, $S_2$ those for the second indicator $\tilde y_2$, etc. For example, $S_1$ may be defined as the selector of the observations on, say, crime in the nearest contiguous neighbour, $S_2$ as the selector of the observations on crime in the next nearest contiguous neighbour, $S_3$ as the spillover of crime from the core, measured, for example, as crime in the core divided by a region's distance from the core, etc. Thus, for the measurement model we obtain:

$$\tilde y_1 = S_1 \tilde y = \lambda_1 \tilde\eta + \tilde\varepsilon_1, \quad \tilde y_2 = S_2 \tilde y = \lambda_2 \tilde\eta + \tilde\varepsilon_2, \quad \ldots, \quad \tilde y_m = S_m \tilde y = \lambda_m \tilde\eta + \tilde\varepsilon_m. \tag{16}$$

From the above one observes that spatial dependence is captured by two kinds of parameters, ρ and $\lambda_i$, whereas in the standard lag model only the average effect ρWy shows up. This means that a much richer representation and testing of the spatial structure can be obtained than by way of standard spatial econometric approaches. For instance, testing the significance of the $\lambda_i$ coefficients allows for determining those units that evoke a distance effect and those that do not, as suggested by Getis and Aldstadt (2004).

The standard SEM log-likelihood function needs a correction to account for the presence of the spatially lagged dependent variable among the explanatory variables. From Folmer and Oud (2008), for

$$A = I - \frac{\rho}{\lambda_1} S_1 - \frac{\rho}{\lambda_2} S_2 - \cdots - \frac{\rho}{\lambda_m} S_m \tag{17}$$

we need to add ln|A| to the log-likelihood function.
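The selection matrices and the Jacobian correction can be illustrated with a small sketch. It rests on two assumptions that go beyond the text: neighbours are ranked by Euclidean distance between hypothetical coordinates rather than by contiguity, and the matrix A is built as the identity minus the sum of the terms (ρ/λ_i) S_i, which is how we read equation (17). The function names `nearest_neighbour_selectors` and `log_det_A` and the toy data are introduced here for illustration only.

```python
import numpy as np


def nearest_neighbour_selectors(coords, m):
    """Return S_1..S_m, where S_i[r, c] = 1 if unit c is the i-th nearest unit to r."""
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)      # a unit is never its own neighbour
    order = np.argsort(dist, axis=1)    # neighbours sorted from nearest to farthest
    selectors = []
    for i in range(m):
        S = np.zeros((n, n))
        S[np.arange(n), order[:, i]] = 1.0
        selectors.append(S)
    return selectors


def log_det_A(rho, lambdas, selectors):
    """ln|A| for A = I - sum_i (rho / lambda_i) * S_i, as read from equation (17)."""
    n = selectors[0].shape[0]
    A = np.eye(n)
    for lam, S in zip(lambdas, selectors):
        A -= (rho / lam) * S
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("|A| is not positive for these parameter values")
    return logdet


# Toy example: four units on a line, with a made-up outcome vector.
coords = np.array([[0.0], [1.0], [3.0], [7.0]])
y_tilde = np.array([10.0, 20.0, 30.0, 40.0])
S1, S2 = nearest_neighbour_selectors(coords, 2)
indicator_1 = S1 @ y_tilde   # outcome in the nearest neighbour of each unit
indicator_2 = S2 @ y_tilde   # outcome in the second nearest neighbour
jacobian_term = log_det_A(0.3, [1.0, 0.8], [S1, S2])
```

With a single indicator and λ_1 fixed at 1, the matrix A in this sketch reduces to I − ρS_1, the same form as A = I − ρW in the W-based model.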
3. Experimental Design

To investigate the performance of the two modelling approaches specified in Section 2, we conduct a number of Monte Carlo simulations. Both approaches are applied to Anselin's (1988) Columbus, Ohio, crime dataset. The first step is the design of a data-generating procedure. We first explain the data-generating procedure for the W-based spatial lag model. For this purpose we rewrite equation (1) according to the number of variables and observations involved in Anselin's crime model:

$$y = \rho W y + \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \tag{18}$$

Thus the expression for y is:

$$y = (I - \rho W)^{-1}(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon) \tag{19}$$

The steps for generating samples are:

1. Fix values for the vector $(\beta_0, \beta_1, \beta_2)$ as $\beta_0 = 45$, $\beta_1 = -1.0$, $\beta_2 = -0.3$, and vary ρ over the interval [0, 1).
2. Generate samples of the error term ε by randomly drawing from a uniform (, ) distribution.
3. Compute y according to (19).

Next, the spatial lag model is estimated for each sample, which gives estimates of ρ, $\beta_0$, $\beta_1$, $\beta_2$ and σ², on the basis of which we compute bias and mean squared error (MSE).

The procedure is similar for the latent variables approach. Specifically, the Wy of the standard spatial lag model is replaced by a latent variable, while in the measurement model the latent spatially lagged dependent variable is related to its indicators. As indicators we take the observations for the three nearest contiguous neighbouring regions. We then estimate the latent spatial lag model for each of the same samples generated in steps 1 to 3 of the simulation procedure for the standard spatial lag model, which gives estimates of ρ, $\beta_0$, $\beta_1$, $\beta_2$ and σ² (amongst others). Then the bias and MSE of the estimators are calculated.

The number of replications is set to , and the main dimension over which the variation in power of the approaches and the distribution of the pre-test estimators are investigated is the value of the spatial lag parameter ρ. We examine the parameter space [0.0, 0.9] using increments of 0.1. The experiment consists of comparisons of the two approaches in terms of the model estimators and their bias and MSE. The results are displayed in tables and graphs in the next section.

4. Simulation Results
Table 1. Mean of estimation results on the Columbus, Ohio, crime dataset by W-based and latent variables spatial autoregressive models

Parameter        ρ                   β1                  β2                  β0                  σ²
ρ          W-based   Latent    W-based   Latent    W-based   Latent    W-based   Latent    W-based   Latent
0.0        -.44      -.9       -.8       -.964     -.97      -.9       45.97     47.538    9.846     88.68
           (.69)     (.43)     (.35)     (.44)     (.93)     (.3)      (6.8)     (88.589)  (9.345)   (8.78)
0.1        .55       .57       -.        -.        -.98      -.95      46.87     45.49     9.3       89.8
           (.63)     (.45)     (.38)     (.38)     (.93)     (.)       (6.334)   (37.36)   (9.376)   (.86)
0.2        .54       .38       -.4       -.4       -.98      -.9       46.88     44.78     9.8       9.8
           (.55)     (.3)      (.3)      (.38)     (.94)     (.6)      (6.67)    (.58)     (9.43)    (9.8)
0.3        .54       .45       -.8       -.5       -.98      -.93      46.56     43.343    9.43      93.63
           (.46)     (.54)     (.33)     (.394)    (.94)     (.3)      (6.98)    (.44)     (9.5)     (.453)
0.4        .354      .37       -.3       -.4       -.99      -.86      46.78     4.5       9.68      95.79
           (.35)     (.4)      (.36)     (.345)    (.94)     (.)       (7.44)    (.59)     (9.6)     (.635)
0.5        .455      .495      -.37      -.47      -.99      -.84      47.5      39.595    9.964     96.6
           (.)       (.6)      (.39)     (.34)     (.94)     (.94)     (7.633)   (.)       (9.737)   (.895)
0.6        .558      .599      -.43      -.35      -.99      -.77      47.5      4.595     93.84     96.68
           (.8)      (.4)      (.3)      (.3)      (.94)     (.94)     (8.6)     (9.65)    (9.886)   (5.45)
0.7        .66       .69       -.49      -.3       -.3       -.7       48.44     4.76      93.678    94.983
           (.93)     (.3)      (.36)     (.33)     (.94)     (.94)     (8.874)   (9.5)     (.43)     (6.83)
0.8        .768      .793      -.55      -.969     -.99      -.58      48.778    4.759     94.58     9.38
           (.78)     (.6)      (.334)    (.369)    (.94)     (.93)     (.39)     (4.363)   (.34)     (3.348)
0.9        .876      .4        -.64      .         -.98      -.        5.387     -.58      95.53     .49
           (.6)      (.69)     (.353)    (.433)    (.94)     (.)       (4.83)    (58.67)   (.65)     (3.55)
Table 2. Bias of the estimators for ρ, β1 and β2, with different values for the spatial lag parameter ρ

Parameter        ρ                   β1                  β2
ρ          W-based   Latent    W-based   Latent    W-based   Latent
0.0        .38       .4        .44       .3        .74       .96
0.1        .33       .59       .46       .85       .74       .84
0.2        .7        .8        .48       .86       .74       .85
0.3        .         .78       .5        .78       .75       .79
0.4        .         .5        .53       .66       .75       .78
0.5        .         .         .55       .59       .75       .76
0.6        .89       .94       .58       .57       .75       .77
0.7        .76       .76       .6        .57       .75       .79
0.8        .6        .77       .69       .9        .75       .8
0.9        .47       .5        .83       .47       .75       .5
Table 3. Mean squared error ( ) of the estimators for ρ, β1 and β2, with different values for the spatial lag parameter ρ

Parameter        ρ                   β1                  β2
ρ          W-based   Latent    W-based   Latent    W-based   Latent
0.0        3.3       5.93      9.34      8.47      .868      .74
0.1        .847      6.6       9.499     4.63      .87       .5
0.2        .65       5.68      9.67      4.555     .874      .36
0.3        .34       6.77      9.86      5.743     .877      .64
0.4        .3        4.8       .67       .4        .88       .9
0.5        .696      .56       .94       .696      .883      .96
0.6        .344      .545      .544      .468      .886      .98
0.7        .3        .6        .865      .53       .887      .973
0.8        .7        .349      .48       3.663     .89       .4
0.9        .47       .988      .83       38.359    .889      4.867
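For readers who want to see how bias and MSE figures of this kind arise, the sketch below runs a small Monte Carlo experiment for the W-based estimator of ρ only. Nothing in it reproduces the experiment reported above: the 7 × 7 rook lattice stands in for the Columbus contiguity structure, the regressors are synthetic, the coefficient values (45, −1.0, −0.3), the symmetric uniform error bounds and the number of replications are assumptions, and ρ is estimated by a grid search on the concentrated likelihood. The names `rook_lattice_W` and `simulate_bias_mse` are introduced here.

```python
import numpy as np


def rook_lattice_W(side):
    """Row-standardized rook-contiguity matrix for a side x side lattice."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def simulate_bias_mse(rho_true, W, X, beta, n_rep, rng, half_width=15.0):
    """Bias and MSE of the concentrated-ML estimate of rho over n_rep samples."""
    n = W.shape[0]
    grid = np.linspace(-0.99, 0.99, 397)              # assumed search grid
    eig = np.linalg.eigvals(W)
    logdet = np.log(1.0 - np.outer(grid, eig)).sum(axis=1).real
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # OLS residual maker
    A_inv = np.linalg.inv(np.eye(n) - rho_true * W)
    estimates = np.empty(n_rep)
    for rep in range(n_rep):
        # Assumed error law: uniform on (-half_width, half_width).
        eps = rng.uniform(-half_width, half_width, size=n)
        y = A_inv @ (X @ beta + eps)                   # reduced form of the lag model
        e_O, e_L = M @ y, M @ (W @ y)
        resid = e_O[None, :] - grid[:, None] * e_L[None, :]
        loglik = -0.5 * n * np.log((resid ** 2).sum(axis=1) / n) + logdet
        estimates[rep] = grid[np.argmax(loglik)]
    bias = estimates.mean() - rho_true
    mse = np.mean((estimates - rho_true) ** 2)
    return bias, mse


rng = np.random.default_rng(1)
W = rook_lattice_W(7)
n = W.shape[0]
# Synthetic regressors standing in for the two explanatory variables.
X = np.column_stack([np.ones(n), rng.uniform(5, 30, n), rng.uniform(20, 90, n)])
beta = np.array([45.0, -1.0, -0.3])                    # assumed coefficient values
results = {rho: simulate_bias_mse(rho, W, X, beta, 100, rng)
           for rho in (0.1, 0.5, 0.9)}
```

Bias is computed as the mean estimate minus the true value and MSE as the mean squared deviation from the true value, so the MSE can never fall below the squared bias.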
5. Conclusions
References

Aldstadt, J. and Getis, A. (2006). Using AMOEBA to create a spatial weights matrix and identify spatial clusters. Geographical Analysis, 38, 327-343.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.
Anselin, L. (2002). Under the hood: Issues in the specification and interpretation of spatial regression models. Agricultural Economics, 27, 3, 247-267.
Bhattacharjee, A. and Jensen-Butler, C. (2006). Estimation of the spatial weights matrix in the spatial error model, with an application to diffusion in housing demand. Available: http://www.econ.ca.ac.uk/panel6/papers/bhattacharjeepaper3.pdf.
Fingleton, B. (2003). Externalities, economic geography and spatial econometrics: conceptual and modeling developments. International Regional Science Review, 26, 2, 197-207.
Folmer, H. and Oud, J. (2008). How to get rid of W? A latent variables approach to modeling spatially lagged variables. Environment and Planning A, 40, 2526-2538.
Getis, A. and Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis, 36, 2, 90-104.
Hepple, L.W. (1995). Bayesian techniques in spatial and network econometrics: 1. Model comparison and posterior odds. Environment and Planning A, 27, 447-469.
Hepple, L.W. (1995). Bayesian techniques in spatial and network econometrics: 2. Computational methods and algorithms. Environment and Planning A, 27, 615-644.
LeSage, J.P. (1999). The Theory and Practice of Spatial Econometrics. Available: http://www.spatial-econometrics.com/html/sbook.pdf.
Neale, M.C., Boker, S.M., Xie, G. and Maes, H.H. (2003). Mx: Statistical Modeling. VCU Box 900126, Richmond, VA 23298: Department of Psychiatry. 6th Edition.