W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

An Liu, University of Groningen
Henk Folmer, University of Groningen and Wageningen University
Han Oud, Radboud University Nijmegen

Objective

To evaluate the performance of the W-based and latent variables spatial modeling approaches by means of Monte Carlo simulations in a more comprehensive setting

Motivated by previous simulation studies:

- Two data generation schemes
  - one based on the structure of a classical W-based spatial autoregressive model using a contiguity or inverse distance weight matrix
  - one also based on the W-based model structure, but with spatial dependence incorporated as spillover from hotspots weighted by inverse distance
- Evaluation of estimators in terms of bias and RMSE
  - each approach outperformed the other in some cases, but neither was clearly dominant in both settings

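W-based spatial autoregressive model

$y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n)$

Maximum likelihood estimation:

$L = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln\sigma^2 + \ln|A| - \frac{1}{2\sigma^2}(Ay - X\beta)'(Ay - X\beta), \qquad A = I - \rho W$

Jacobian correction:

$\ln|I - \rho W| = \ln \prod_i (1 - \rho w_i) = \sum_i \ln(1 - \rho w_i)$, where the $w_i$ are the eigenvalues of $W$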
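The eigenvalue form of the Jacobian term can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name `sar_loglik` and the toy 4-cell ring with row-standardised weights are assumptions made for illustration only.

```python
import numpy as np

def sar_loglik(rho, beta, sigma2, y, X, W):
    """Log-likelihood of y = rho*W*y + X*beta + eps, eps ~ N(0, sigma2*I)."""
    n = y.shape[0]
    # Jacobian term via the eigenvalues of W: ln|I - rho*W| = sum_i ln(1 - rho*w_i).
    w = np.linalg.eigvals(W)
    log_det = np.sum(np.log(1.0 - rho * w)).real
    # Residual Ay - X*beta with A = I - rho*W.
    resid = y - rho * (W @ y) - X @ beta
    return (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * n * np.log(sigma2)
            + log_det - resid @ resid / (2.0 * sigma2))

# Hypothetical toy example: 4 cells on a ring, row-standardised contiguity weights.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float) / 2.0
rho = 0.4

# The eigenvalue shortcut and the direct log-determinant should agree.
eig_logdet = np.sum(np.log(1.0 - rho * np.linalg.eigvals(W))).real
direct_logdet = np.linalg.slogdet(np.eye(4) - rho * W)[1]

y = np.array([1.0, 2.0, 0.5, 1.5])
X = np.ones((4, 1))
ll = sar_loglik(rho, np.array([1.0]), 1.0, y, X, W)
```

The eigenvalues of W are computed once, so the Jacobian term is cheap to re-evaluate for each trial value of ρ during optimisation.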

Latent variables

- Refer to phenomena that are supposed to exist but cannot be directly observed
- Can be measured by means of observables
- Example: socio-economic status, measured by income, education level and employment status

Structural equation modeling (SEM)

- structural model: relationships between the latent variables
  $\eta = B\eta + \zeta$ with $\mathrm{cov}(\zeta) = \Psi$
- measurement model: relationships between the latent variables and their observable indicators
  $y = \Lambda\eta + \varepsilon$ with $\mathrm{cov}(\varepsilon) = \Theta$

Maximum likelihood estimation:

$l(\theta \mid Y) = -\frac{N}{2}\ln|\Sigma| - \frac{N}{2}\mathrm{tr}(S\Sigma^{-1}) - \frac{pN}{2}\ln 2\pi$

- $\Sigma$: theoretical covariance matrix, $\Sigma = \Lambda (I - B)^{-1} \Psi (I - B')^{-1} \Lambda' + \Theta$
- $S$: observed covariance matrix
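The model-implied covariance matrix and the likelihood can be illustrated with a small numerical sketch. The function names and the one-factor, three-indicator parameter values are hypothetical, and p is taken here to be the number of observed variables (an assumption, since the slide does not define it).

```python
import numpy as np

def implied_cov(Lam, B, Psi, Theta):
    """Sigma = Lam (I - B)^{-1} Psi (I - B')^{-1} Lam' + Theta."""
    k = B.shape[0]
    inv = np.linalg.inv(np.eye(k) - B)
    # (I - B')^{-1} is the transpose of (I - B)^{-1}.
    return Lam @ inv @ Psi @ inv.T @ Lam.T + Theta

def sem_loglik(S, Sigma, N):
    """-N/2 ln|Sigma| - N/2 tr(S Sigma^{-1}) - pN/2 ln(2*pi)."""
    p = S.shape[0]  # assumed: number of observed variables
    _, logdet = np.linalg.slogdet(Sigma)
    trace_term = np.trace(S @ np.linalg.inv(Sigma))
    return -0.5 * N * logdet - 0.5 * N * trace_term - 0.5 * p * N * np.log(2.0 * np.pi)

# Hypothetical example: one latent variable measured by three indicators.
Lam = np.array([[1.0], [0.8], [0.6]])   # factor loadings
B = np.zeros((1, 1))                    # no relations among latent variables
Psi = np.array([[2.0]])                 # variance of the latent variable
Theta = 0.5 * np.eye(3)                 # measurement error variances

Sigma = implied_cov(Lam, B, Psi, Theta)

# For a given S, the likelihood is largest when Sigma reproduces S exactly.
ll_exact = sem_loglik(Sigma, Sigma, N=100)
ll_off = sem_loglik(Sigma, Sigma + 0.3 * np.eye(3), N=100)
```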

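SEM representation of the observed spatial lag model

Structural equation: $y = \rho\eta + \gamma' x + \zeta$
Measurement equation: $y = \Lambda\eta + \varepsilon$

Jacobian correction:

$l(\theta \mid y) = \ln|\tilde{A}| - \frac{N}{2}\ln|\Sigma| - \frac{N}{2}\mathrm{tr}(S\Sigma^{-1}) - \frac{pN}{2}\ln 2\pi$

$\tilde{A} = I - \frac{\rho}{m\lambda_1} S_1 - \frac{\rho}{m\lambda_2} S_2 - \cdots - \frac{\rho}{m\lambda_m} S_m$

- $S_m$: the selection function for the mth indicator
- $\lambda_m$: the factor loading of the mth indicator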

Simulation study design

Rationale: two types of spatial dependence are considered together to address a common situation, e.g. economic activity in a region is usually influenced by both its neighbors and an economic centre (hotspot).

By introducing two spatial lag parameters, $\rho_1$ (for the hotspot) and $\rho_2$ (for neighbors), we get
- a broader and more inclusive definition of spatial dependence in sample generation from a practical perspective
- a more comprehensive comparison of the performance of the classical W-based model and the SEM approach

Simulation study design (cont'd)

- Map: regular lattice structures of dimensions 7 × 7, 10 × 10 and 15 × 15 (N = 49, 100, 225)
- Samples generated based on the structure of the standard spatial lag model:
  $y = \rho_1 W_1 y + \rho_2 W_2 y + x\beta + \varepsilon$
- $W_1$ is the inverse distance matrix with elements equal to $1/d_{ij}$ for cell i and hotspot j, and zero elsewhere; $W_2$ is a first-order contiguity or inverse distance matrix.
- The hotspot needs to be fixed before sample generation: it is chosen according to the values of x (largest value)
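The two weight matrices can be sketched for a regular lattice as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, contiguity is taken to be rook contiguity with row standardisation, and distances are Euclidean distances between cell centres, none of which the slides specify.

```python
import numpy as np

def lattice_coords(side):
    """Row/column coordinates of the cells of a side x side lattice."""
    rows, cols = np.divmod(np.arange(side * side), side)
    return np.column_stack([rows, cols]).astype(float)

def contiguity_matrix(side):
    """W2 candidate: first-order (rook) contiguity, row-standardised (assumed)."""
    coords = lattice_coords(side)
    # Manhattan distance of exactly 1 means the cells share an edge.
    d = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)
    W = (d == 1).astype(float)
    return W / W.sum(axis=1, keepdims=True)

def hotspot_matrix(side, hotspot):
    """W1: element (i, hotspot) equals 1/d_ij, all other elements are zero."""
    coords = lattice_coords(side)
    d = np.linalg.norm(coords - coords[hotspot], axis=1)
    W1 = np.zeros((side * side, side * side))
    mask = d > 0  # the hotspot has no spillover onto itself
    W1[mask, hotspot] = 1.0 / d[mask]
    return W1

# Example on the 7 x 7 lattice: the hotspot is the cell with the largest x.
rng = np.random.default_rng(0)
x = rng.uniform(size=49)
hotspot = int(np.argmax(x))
W1 = hotspot_matrix(7, hotspot)
W2 = contiguity_matrix(7)
```

W1 has a single non-zero column, so every cell receives spillover only from the hotspot, while W2 spreads dependence across adjacent cells.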

Sample generation procedure

1. Generate the exogenous variable x by drawing from U(0, 1);
2. Fix β = 1 for all simulation runs;
3. $\rho_1$ and $\rho_2$ take the values 0, 0.1, 0.3, 0.5, 0.7 and 0.9 consecutively under the constraint $|I - \rho_1 W_1 - \rho_2 W_2| > 0$;
   - ML estimation requires $|I - \rho W| > 0$
4. Generate values for the error term ε by randomly drawing from N(0, 2);
5. Choose the hotspot according to the values of x in step 1 and compute y as: $y = (I - \rho_1 W_1 - \rho_2 W_2)^{-1}(x\beta + \varepsilon)$.
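The steps above can be sketched as a single function. This is a hedged illustration, not the authors' code: the function name, the default arguments, the rook contiguity with row standardisation for W2, and the reading of the second argument of the normal distribution as a variance are all assumptions.

```python
import numpy as np

def generate_sample(side, rho1, rho2, beta=1.0, x_high=1.0, err_var=2.0, seed=0):
    """Draw one sample from y = rho1*W1*y + rho2*W2*y + x*beta + eps."""
    rng = np.random.default_rng(seed)
    n = side * side
    rows, cols = np.divmod(np.arange(n), side)
    coords = np.column_stack([rows, cols]).astype(float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

    # Step 1: exogenous variable (upper bound x_high is an assumed parameter).
    x = rng.uniform(0.0, x_high, size=n)

    # Hotspot = cell with the largest x; W1 holds 1/d_ij in the hotspot column.
    hotspot = int(np.argmax(x))
    others = np.arange(n) != hotspot
    W1 = np.zeros((n, n))
    W1[others, hotspot] = 1.0 / dist[others, hotspot]

    # W2: rook contiguity, row-standardised (assumed specification).
    W2 = np.isclose(dist, 1.0).astype(float)
    W2 /= W2.sum(axis=1, keepdims=True)

    # Step 3: enforce the constraint |I - rho1*W1 - rho2*W2| > 0.
    A = np.eye(n) - rho1 * W1 - rho2 * W2
    sign, _ = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("constraint |I - rho1*W1 - rho2*W2| > 0 is violated")

    # Step 4: error term with variance err_var (assumed to be a variance).
    eps = rng.normal(0.0, np.sqrt(err_var), size=n)

    # Step 5: solve A y = x*beta + eps instead of forming the inverse explicitly.
    y = np.linalg.solve(A, x * beta + eps)
    return y, x, W1, W2, eps

y, x, W1, W2, eps = generate_sample(side=7, rho1=0.3, rho2=0.5)
```

Solving the linear system gives the same y as multiplying by the inverse, and is numerically more stable.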

Estimation and analysis

- Repeat the estimation procedure for the W-based models (I. true model; II. first-order contiguity or inverse distance, depending on the $W_2$ used in sample generation) and for SEM (first three nearest neighbors and spillover from hotspot as indicators)
- Number of replications: 5
- Compute bias and RMSE of the estimators for β, the only comparable regression coefficient
- Compare the two approaches across value combinations of the spatial lag parameters, specifications of the weight matrices and sample sizes
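Bias and RMSE over the replications can be computed as in the sketch below. The function name and the four example estimates of β are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Bias = mean(estimate) - truth; RMSE = sqrt(mean((estimate - truth)^2))."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return bias, rmse

# Hypothetical estimates of beta from four replications, with true beta = 1.
beta_hats = [0.9, 1.1, 1.2, 1.0]
bias, rmse = bias_and_rmse(beta_hats, true_value=1.0)
```

RMSE combines bias and sampling variability, so an estimator with small bias can still have a large RMSE if its estimates vary strongly across replications.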

Simulation results in graphs

Figure: Bias in absolute value (N = 49, W2 = contiguity). Vertical axis: abs(bias); horizontal axis: rho2-contiguity and rho1-hotspot; series: TRUE, CONT, SEM.

Figure: RMSE (N = 49, W2 = contiguity). Vertical axis: RMSE; horizontal axis: rho2-contiguity and rho1-hotspot; series: TRUE, CONT, SEM.

Figure: Bias in absolute value (N = 49, W2 = inverse distance). Vertical axis: abs(bias); horizontal axis: rho2-inverse distance and rho1-hotspot; series: TRUE, INVD, SEM.

Figure: RMSE (N = 49, W2 = inverse distance). Vertical axis: RMSE; horizontal axis: rho2-inverse distance and rho1-hotspot; series: TRUE, INVD, SEM.

Figure: Mean of bias grouped by $\rho_1$ (N = 49, 100, 225)

Figure: Mean of RMSE grouped by $\rho_1$ (N = 49, 100, 225)

Conclusions

- SEM frequently has smaller bias and RMSE than the misspecified W-based models
- SEM increasingly outperforms the W-based models as the spatial lag parameter for spillover from the hotspot increases
- Both approaches perform better, and their differences in RMSE get smaller, with larger sample sizes
- The chance that SEM takes the lead grows with sample size
- SEM is also more stable than the misspecified W-based models in terms of variation in bias and RMSE

Discussion

SEM: not all model search options were exploited
- indicators were fixed, whereas the optimal choice and number of indicators could be tested and identified for each sample
- the option of using more than one latent variable would bring SEM closer to the correctly specified model, i.e. one latent variable for spillover from hotspots and one for neighbors

Thank you for your attention!