
Spatial Regression
Luc Anselin
University of Illinois, Urbana-Champaign
http://www.spacestat.com

Outline
  Overview of Issues
  Spatial Regression Specifications
  Space-Time Models
  Spatial Latent Variable Models

Overview of Issues

Motivation
  Model-Driven
    new focus on spatial interaction
      » new economic geography, interacting agents, spatial equity, spatial externalities, etc.
  Data-Driven
    use of geo-referenced information
      » GIS data, data integration

Four Elements of Spatial Econometrics
  Specifying the Structure of Spatial Dependence
      » which locations/observations interact
  Testing for the Presence of Spatial Dependence
      » what type of dependence, what is the alternative
  Estimating Models with Spatial Dependence
      » spatial lag, spatial error, higher order
  Spatial Prediction
      » interpolation, missing values

Spatial Dependence
  Estimating the Form/Extent of Spatial Interaction
    substantive spatial dependence
    spatial lag models: $y$
  Correcting for the Effect of Spatial Spill-overs
    spatial dependence as a nuisance
    spatial error models: $\varepsilon$

Specifying Spatial Dependence
  Substantive Spatial Dependence
    lag dependence
    include $Wy$ as explanatory variable in regression
    $y = \rho W y + X\beta + \varepsilon$
  Dependence as a Nuisance
    error dependence
    non-spherical error variance
    $E[\varepsilon\varepsilon'] = \Omega$
      » where $\Omega$ incorporates dependence structure

Interpretation of Spatial Lag
  True Contagion
    related to economic-behavioral process
    only meaningful if areal units appropriate (ecological fallacy)
    interesting economic interpretation (substantive)
  Apparent Contagion
    scale problem, spatial filtering

Interpretation of Spatial Error
  Spill-Over in Ignored Variables
    poor match of process with unit of observation or level of aggregation
    apparent contagion: regional structural change
    economic interpretation less interesting
      » nuisance parameter
  Common in Empirical Practice
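As a small illustration of the spatial lag term $Wy$ that enters the lag specification $y = \rho W y + X\beta + \varepsilon$, the following Python sketch builds a weights matrix for four hypothetical locations on a line and computes $Wy$. The contiguity pattern, the row-standardization of the weights, and the values of `y` are assumptions made for this example and do not come from the slides.

```python
import numpy as np

# Hypothetical example: four locations on a line; adjacent locations are neighbors.
contiguity = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)


def row_standardize(w):
    """Scale each row of a weights matrix so that it sums to one."""
    row_sums = w.sum(axis=1, keepdims=True)
    return w / row_sums  # assumes every location has at least one neighbor


W = row_standardize(contiguity)
y = np.array([10.0, 20.0, 30.0, 40.0])  # made-up observations

# Spatial lag: with row-standardized weights, the neighbor average of y.
Wy = W @ y
print(Wy)
```

With row-standardized weights, each element of `Wy` is the average of `y` over that location's neighbors, which is one common way to read the lag term.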

Cost of Ignoring Spatial Dependence
  Ignoring Spatial Lag
    omitted variable problem
    OLS estimates biased and inconsistent
  Ignoring Spatial Error
    efficiency problem
    OLS still unbiased, but inefficient
    OLS standard errors and t-tests biased

Specification of Spatial Econometric Models
  Issues to address
    formal statistical framework
      » spatial stochastic process
    what type of spatial dependence
      » lag or error, conditional or simultaneous
    extent of autocovariance
      » spatial weights, distance decay
    dependence vs. heterogeneity
      » some forms of dependence induce heterogeneity
    simple or higher order model

Formal Structure for Spatial Dependence
  Direct Representation
    covariance between observations a direct function of a distance metric
      » $\mathrm{Cov}[y_i, y_j] = f(\theta, d_{ij})$, with Cov symmetric and positive definite
    continuous function of distance, isotropic
  Spatial Process Models
    spatial random process $\{Z_i, i \in D\}$
      » Markov random field
    form of process determines form of covariance
    spatial weights to specify spatial interaction
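The direct representation $\mathrm{Cov}[y_i, y_j] = f(\theta, d_{ij})$ can be sketched numerically. The example below assumes an exponential distance-decay function, $f = \sigma^2 \exp(-d_{ij}/\theta)$, which is one standard isotropic choice; the slides do not name a specific $f$, and the coordinates and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical coordinates for five locations (illustrative values only).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])


def exponential_covariance(coords, sigma2, theta):
    """Cov[y_i, y_j] = sigma2 * exp(-d_ij / theta); an assumed form for f."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distance matrix d_ij
    return sigma2 * np.exp(-d / theta)


C = exponential_covariance(coords, sigma2=1.0, theta=2.0)

# A valid covariance matrix is symmetric with strictly positive eigenvalues.
eigenvalues = np.linalg.eigvalsh(C)
print(np.allclose(C, C.T), eigenvalues.min() > 0)
```

The covariance depends only on distance, not direction, which is what the isotropy requirement on the slide refers to.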

Tests for Spatial Dependence
  Spatial Autocorrelation Tests
    Moran's I for regression residuals
    moments-based tests
    alternative hypothesis is NOT a specific spatial process model
  Maximum Likelihood Based Tests
    Lagrange Multiplier/Score tests
      » based on OLS residuals
    Wald and Likelihood Ratio tests
      » based on Maximum Likelihood estimation of spatial model

Estimation of Spatial Econometric Models
  Maximum Likelihood
    assume normality
    likelihood function includes Jacobian
      » numerical problems
  GMM/GM/IV
    robust to non-normality
    not very efficient
    no/difficult inference in some cases

Spatial Regression Specifications

Spatial Autoregressive Model (simultaneous - SAR)
  Specification
    assume $E[y] = 0$, w.l.o.g.
    $y = \rho W y + u$ and $E[u] = 0$
    $(I - \rho W) y = u$, or, $y = (I - \rho W)^{-1} u$
  Covariance Structure
    $E[yy'] = (I - \rho W)^{-1} E[uu'] (I - \rho W')^{-1}$
      » with $E[uu'] = \sigma^2 I$ [no need to assume gaussian]
      » $E[yy'] = \sigma^2 [(I - \rho W)'(I - \rho W)]^{-1}$

Conditional Autoregressive Model - CAR
  Specification
    $E[y_i \mid y^*] = \mu_i + \rho \sum_{j \neq i} w_{ij} (y_j - \mu_j)$
  Covariance Structure
    assuming $[y_i \mid y^*]$ gaussian
    then joint density of $y$ is $MVN[\mu, \Sigma]$
    provided constraints on interaction terms are satisfied

SAR vs CAR
  SAR is NOT First Order CAR
    SAR variance structure (for symmetric $W$)
      » $[(I - \rho W)'(I - \rho W)]^{-1} = [I - 2\rho W + \rho^2 W^2]^{-1}$
      » compare to CAR $= [I - \rho W]^{-1}$
    SAR corresponds to conditional model with first AND second order neighbors
  CAR is NOT a First Order SAR
    SAR representation of CAR not useful
      » requires Choleski decomposition of $I - \rho W$
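The SAR versus CAR comparison can be checked numerically. The sketch below assumes a symmetric binary contiguity matrix for four hypothetical locations on a line, with $\rho = 0.2$ and $\sigma^2 = 1$; these values are illustrative only. It confirms the expansion of the SAR variance for symmetric $W$ and shows that the SAR precision matrix links second order neighbors, while the CAR precision matrix does not.

```python
import numpy as np

# Hypothetical symmetric contiguity matrix: four locations on a line.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
rho, sigma2 = 0.2, 1.0  # illustrative parameter values
I = np.eye(4)

# SAR: E[yy'] = sigma^2 [(I - rho W)'(I - rho W)]^{-1}
sar_precision = (I - rho * W).T @ (I - rho * W)
sar_cov = sigma2 * np.linalg.inv(sar_precision)

# For symmetric W the same matrix is [I - 2 rho W + rho^2 W^2]^{-1}.
sar_cov_expanded = sigma2 * np.linalg.inv(I - 2 * rho * W + rho**2 * (W @ W))

# CAR with the same W and rho: covariance proportional to [I - rho W]^{-1}.
car_precision = I - rho * W
car_cov = sigma2 * np.linalg.inv(car_precision)

# Locations 0 and 2 are second order neighbors (they share neighbor 1).
print(sar_precision[0, 2], car_precision[0, 2])
```

The nonzero (0, 2) entry of the SAR precision matrix is the $\rho^2 W^2$ term at work, which is why the slide describes SAR as a conditional model with first and second order neighbors.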

Spatial Moving Average - SMA
  Specification
    $y = \rho W u + u$
    $y = (I + \rho W) u$
  Covariance Structure
    $E[yy'] = (I + \rho W) E[uu'] (I + \rho W)'$
      » with $E[uu'] = \sigma^2 I$ [no need to assume gaussian]
      » $E[yy'] = \sigma^2 (I + \rho W)(I + \rho W)' = \sigma^2 [I + \rho (W + W') + \rho^2 WW']$

Mixed Regressive Spatial Autoregressive Model
  Specification
    $y = \rho W y + X\beta + u$
    with $u$ as i.i.d.
  Spatial Filter
    $(I - \rho W) y = X\beta + u$
  Reduced Form
    $y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} u$
    $E[y \mid X] = (I - \rho W)^{-1} X\beta$

Spatial Multiplier
  Expansion of Reduced Form
    for $w_{ij} < 1$ and $|\rho| < 1$
    $(I - \rho W)^{-1} = I + \rho W + \rho^2 W^2 + \dots$
      » Leontief inverse
  Spatial Multiplier
    $E[y \mid X] = [I + \rho W + \rho^2 W^2 + \dots] X\beta$
      » function of $X, WX, W^2 X, \dots$
      » first, second, third order neighbors
      » all locations involved, but with distance decay
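The spatial multiplier expansion can be illustrated with a partial sum of the power series. This is a minimal sketch, assuming row-standardized weights for four hypothetical locations on a line and $\rho = 0.5$; the helper name `multiplier_series` is introduced here for illustration.

```python
import numpy as np

# Hypothetical row-standardized weights: four locations on a line.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = A / A.sum(axis=1, keepdims=True)
rho = 0.5  # illustrative value with |rho| < 1


def multiplier_series(W, rho, order):
    """Partial sum I + rho W + rho^2 W^2 + ... + rho^order W^order."""
    n = W.shape[0]
    total = np.eye(n)
    term = np.eye(n)
    for _ in range(order):
        term = rho * (term @ W)  # next power: rho^k W^k
        total = total + term
    return total


exact = np.linalg.inv(np.eye(4) - rho * W)
approx = multiplier_series(W, rho, order=50)

# Row 0 shows the effect on location 0 of a unit change at each location.
print(exact[0])
```

In this example every entry of the inverse is positive, so all locations are involved, and the weight on the most distant location (column 3 of row 0) is smaller than the weight on the immediate neighbor (column 1), which is the distance decay noted on the slide.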

Higher Order Models
  Spatial Autoregressive, Moving Average Process (SARMA) of order p, q
    AR in y
      $y = \rho_1 W_1 y + \rho_2 W_2 y + \dots + \rho_p W_p y + \varepsilon$
    MA in error term
      $\varepsilon = \lambda_1 W_1 \xi + \lambda_2 W_2 \xi + \dots + \lambda_q W_q \xi + \xi$
  Special Cases
    biparametric SAR (Brandsma and Ketellapper)
      » $y = \rho_1 W_1 y + \rho_2 W_2 y + X\beta + u$

Higher Order Models (continued)
  Both Lag and Error AR
    $y = \rho W_1 y + X\beta + \varepsilon$
    $\varepsilon = \lambda W_2 \varepsilon + u$
    or $(I - \rho W_1) y = X\beta + (I - \lambda W_2)^{-1} u$
    $(I - \lambda W_2)(I - \rho W_1) y = (I - \lambda W_2) X\beta + u$
  Identification Problems
    for $W_1 = W_2 = W$
      $y = (\lambda + \rho) W y - \lambda\rho W^2 y + X\beta - \lambda W X\beta + u$
    for $W_1$ orthogonal to $W_2$, or $W_1 W_2 = 0$
      $y = \lambda W_2 y + \rho W_1 y + X\beta - \lambda W_2 X\beta + u$

Spatial AR Error Process
  Spatial Autoregressive Error
    $y = X\beta + \varepsilon$ with $\varepsilon = \lambda W \varepsilon + \xi$
    $\Omega = \sigma^2 [(I - \lambda W)'(I - \lambda W)]^{-1}$
      » $\Omega = \sigma^2 (I + \lambda W + \lambda^2 W^2 + \dots)(I + \lambda W + \lambda^2 W^2 + \dots)'$
      » range: all observations
  Spatial Common Factor Model
    $(I - \lambda W) y = (I - \lambda W) X\beta + \eta$
      » OLS on spatially filtered variables
    spatial Durbin model: $y = \lambda W y + X\beta - \lambda W X\beta + \eta$
    $y = \gamma_1 W y + X\gamma_2 + W X \gamma_3 + \eta$ with $\gamma_1 \gamma_2 = -\gamma_3$
      » spatial common factor constraint
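The link between the spatial AR error model and the spatial Durbin form can be verified on simulated data. The sketch below assumes five hypothetical locations on a line with row-standardized weights, $\lambda = 0.4$, and randomly drawn $X$ and $\eta$ (fixed seed); all values are illustrative. It generates $y$ from the error model and checks that the Durbin equation holds exactly when $\gamma_1 = \lambda$, $\gamma_2 = \beta$ and $\gamma_3 = -\lambda\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
n = 5

# Hypothetical row-standardized weights for five locations on a line.
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = A / A.sum(axis=1, keepdims=True)

lam = 0.4                      # illustrative autoregressive error parameter
beta = np.array([1.0, -2.0])   # illustrative regression coefficients
X = rng.normal(size=(n, 2))
eta = rng.normal(size=n)

# Spatial AR error model: y = X beta + (I - lam W)^{-1} eta
y = X @ beta + np.linalg.solve(np.eye(n) - lam * W, eta)

# Durbin coefficients implied by the common factor model.
gamma1, gamma2, gamma3 = lam, beta, -lam * beta
durbin_rhs = gamma1 * (W @ y) + X @ gamma2 + W @ X @ gamma3 + eta

print(np.allclose(y, durbin_rhs))
```

An unconstrained Durbin regression estimates $\gamma_1$, $\gamma_2$ and $\gamma_3$ freely; the common factor constraint $\gamma_1 \gamma_2 = -\gamma_3$ is what reduces it to the error model.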

Global Spillovers
  Nature of Interaction
    $(I - \rho W)^{-1}$
    all locations interact
      » spatial multiplier
  Model Taxonomy
    unmodeled effects only: spatial AR error
      » $y = X\beta + (I - \lambda W)^{-1} u$
    both unmodeled and X: spatial lag
      » $y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} u$
    X only: spatial lag with MA error
      » $y = (I - \rho W)^{-1} X\beta + u$, or $y = \rho W y + X\beta + u - \rho W u$

Local Spillovers
  Nature of Interaction
    $W$ (spatial lag): only immediate neighbors interact
  Model Taxonomy
    unmodeled effects only: spatial MA error
      » $y = X\beta + u + \lambda W u$
    X only: spatial cross-regressive
      » $y = X\beta + \gamma W X\beta + u$
    unmodeled effects and X: MA error with WX
      » $y = X\beta + \gamma W X\beta + u + \lambda W u$
      » $y = (I + \lambda W)(X\beta + u)$ (common factor)

Spatial Dependence and Heteroskedasticity
  Spatial dependence induces heteroskedasticity
  MA process
    diagonal element
      » $1 + \lambda (w_{ii} + w_{ii}) + \lambda^2 (WW')_{ii}$
      » $(WW')_{ii} = \sum_j w_{ij}^2$
      » depends on number of neighbors
      » not constant -> heteroskedastic errors
  AR process similar
      » also higher order powers
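The heteroskedasticity induced by a spatial MA error can be seen in a small example. The sketch below assumes row-standardized weights for four hypothetical locations on a line, $\lambda = 0.5$ and $\sigma^2 = 1$; these are illustrative choices. The end locations have one neighbor and the interior locations have two, so the diagonal of the error covariance differs between them.

```python
import numpy as np

# Hypothetical row-standardized weights: four locations on a line.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = A / A.sum(axis=1, keepdims=True)
lam = 0.5  # illustrative MA parameter
I = np.eye(4)

# MA error e = u + lam W u with E[uu'] = I, so E[ee'] = (I + lam W)(I + lam W)'.
cov = (I + lam * W) @ (I + lam * W).T
variances = np.diag(cov)

# Diagonal formula from the slide: 1 + 2 lam w_ii + lam^2 sum_j w_ij^2.
by_formula = 1 + 2 * lam * np.diag(W) + lam**2 * (W**2).sum(axis=1)

print(variances)
```

Here the end locations get variance 1.25 and the interior locations 1.125, so the errors are heteroskedastic even though $u$ is homoskedastic.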

Space-Time Models

Time and Spatial Dependence
  Example on Single Dimension
    processes on a line
      » locations $i-1$, $i$, $i+1$
    autoregressive process in time (AR)
      » $y_{i,t} = \rho y_{i,t-1} + u_{i,t}$: y at i and t related to y at i in t-1
      » with process stable across space
      » $y_{i-1,t} = \rho y_{i-1,t-1} + u_{i-1,t}$: same for y at i-1 (and i+1)
    autoregressive process on line (SAR)
      » $y_{i,t} = \lambda (y_{i-1,t} + y_{i+1,t}) + \varepsilon_{i,t}$: y at i in t related to i-1, i+1 in t
      » with process stable over time
      » $y_{i,t-1} = \lambda (y_{i-1,t-1} + y_{i+1,t-1}) + \varepsilon_{i,t-1}$: same for y at t-1

Identification Problems
  Space-Time Dependence
    substitute the SAR expression for $y_{i,t-1}$ into AR
      » $y_{i,t} = \rho\lambda (y_{i-1,t-1} + y_{i+1,t-1}) + [\rho \varepsilon_{i,t-1} + u_{i,t}]$
    substitute the AR expressions for $y_{i-1,t}$, $y_{i+1,t}$ into SAR
      » $y_{i,t} = \lambda\rho (y_{i-1,t-1} + y_{i+1,t-1}) + [\lambda (u_{i-1,t} + u_{i+1,t}) + \varepsilon_{i,t}]$
  Identification Problem
    without further structure, $\lambda$ and $\rho$ are not separately identifiable from simple space-time AR model
      » identification problem: only the product $\rho\lambda$ appears
  Identification Requires Separable Models
    spatially dependent time process
    time dependent spatial process

Model Dependence
  General Framework
    $y = \sum_r \rho_r W_r y + X\beta + \varepsilon$
      » $y$ as NT by 1 vector, stacked by time or region
      » $W_r$ as NT by NT weights matrix expressing all space-time dependencies
      » different $W_r$ to express spatial and/or time dependence
  time dependence: initial value problem
  space-time dependence: what weights?

Space-Time Dependence
  Pure space-recursive
    i at t depends on neighbors at t-1
      » spatial lag at t-1 is exogenous
    $y_{it} = \gamma [Wy]_{i,t-1} + f(z) + \varepsilon_{it}$
      » spatial lag becomes endogenous for space-time error dependence, not for either serial or spatial alone
  Time-space recursive
    i at t depends on i at t-1 and neighbors at t-1
      » serial lag exogenous in absence of serial error dependence
    $y_{it} = \rho y_{i,t-1} + \gamma [Wy]_{i,t-1} + f(z) + \varepsilon_{it}$
      » serial lag endogenous for serial error dependence
      » spatial lag endogenous for space-time error dependence

Space-Time Dependence (2)
  Time-space simultaneous
    i at t depends on i at t-1 and neighbors at t
      » spatial lag is endogenous
    $y_{it} = \rho y_{i,t-1} + \gamma [Wy]_{i,t} + f(z) + \varepsilon_{it}$
      » implies dependence of $y_{i,t}$ on spatial lag at t-1
  The works
    i at t depends on t-1 and current and past neighbors
    $y_{it} = \rho y_{i,t-1} + \lambda [Wy]_{i,t} + \gamma [Wy]_{i,t-1} + f(z) + \varepsilon_{it}$
      » identification problems

Error Dependence
  Identification Problem
    $\mathrm{Cov}[y_{it}, y_{js}] \neq 0$ for some i, j, t, s
  Impose Structure
    groupwise dependence
      » limit dependence to one dimension
      » $E[\varepsilon_{i,t} \varepsilon_{j,s}] = \sigma_h$ for all $(i,t,j,s) \in S_h$
      » classical SUR: $E[\varepsilon_{i,t} \varepsilon_{j,s}] = \sigma_{ij}$ for all t, s
      » spatial SUR: $E[\varepsilon_{i,t} \varepsilon_{j,s}] = \sigma_{ts}$ for all i, j
    parametric dependence
      » spatial AR or serial AR
      » identification issues between two dimensions

Spatial SUR Model
  Parameters vary by time, fixed across space
    $y_{it} = x_{it} \beta_t + \varepsilon_{it}$
    spatial weights fixed: $W$ is N by N
  Spatial Lag Dependence
    $y_{it} = \rho_t [W y_t]_{i,t} + x_{it} \beta_t + \varepsilon_{it}$
    $\mathrm{Var}[\varepsilon_{it}] = \sigma^2_t$, $E[\varepsilon_{it} \varepsilon_{is}] = \sigma^2_{ts}$
  Spatial Error Dependence
    $\varepsilon_{it} = \lambda_t [W \varepsilon_t]_{i,t} + u_{it}$
    $\mathrm{Var}[u_{it}] = \sigma^2_t$, $E[u_{it} u_{is}] = \sigma^2_{ts}$

Practical Modeling Strategies
  To Pool or Not to Pool
    test constraints on $\beta_{it} = \beta$
      » Chow test and generalizations
    test for groupwise heteroskedasticity
    locational and/or time dummies
  SUR or Not
    test on diagonality of cross-equation covariance
    with fixed $\beta$ or variable $\beta_{it}$
  Spatial Effects
    care in fixed effects models
      » condition if locational dummies
    lag vs error

The Two Effects Model - One Cross-Section
  Common Regression Coefficient
    in each time period t, a cross-section
    $y_{it} = x_{it} \beta + u_{it}$
  Error Components
    $u_{it} = \mu_i + \nu_{it}$
  Spatial Autoregressive Errors
    $\nu_{it} = \lambda \sum_j w_{ij} \nu_{jt} + \varepsilon_{it}$
    or $\nu_t = (I - \lambda W)^{-1} \varepsilon_t = B^{-1} \varepsilon_t$
  Matrix Form
    $y_t = X_t \beta + \mu + B^{-1} \varepsilon_t$

The Two Effects Model - All Cross Sections
  Stacked Equations
    $y_{NT \times 1} = X_{NT \times k} \beta_{k \times 1} + (\iota_T \otimes I_N) \mu_{N \times 1} + (I_T \otimes B^{-1}_{N \times N}) \varepsilon_{NT \times 1}$
    T cross sections of N observations
      » $\iota_T$ is T by 1 vector of ones
      » $I_N$ ($I_T$) is N by N (T by T) identity matrix
      » Kronecker products yield NT by NT matrices

Error Variance
  Var[u]
    $E[uu'] = (\iota_T \iota_T' \otimes I_N) \sigma^2_\mu + [I_T \otimes (B'B)^{-1}] \sigma^2$
      » with $E[\mu\mu'] = \sigma^2_\mu I_N$ and $E[\varepsilon\varepsilon'] = \sigma^2 I_{NT}$
    $\Omega = \sigma^2 \Psi = \sigma^2 [(J_T \otimes I_N) \phi + (I_T \otimes (B'B)^{-1})]$
      » with $J_T = \iota_T \iota_T'$ (a matrix of ones), $\phi = \sigma^2_\mu / \sigma^2$
  Det($\Psi$) and $\Psi^{-1}$
    using some special matrix properties
    $|\Psi| = |(B'B)^{-1} + T\phi I_N| \cdot |B|^{-2(T-1)}$
    $\Psi^{-1} = J^*_T \otimes [(B'B)^{-1} + T\phi I_N]^{-1} + E_T \otimes (B'B)$
      » with $J^*_T = (1/T) J_T$ and $E_T = I_T - J^*_T$
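The closed forms for $|\Psi|$ and $\Psi^{-1}$ can be checked against brute-force computation on a small problem. The sketch below assumes $N = 3$ hypothetical locations on a line with row-standardized weights, $T = 4$ periods, $\lambda = 0.3$ and $\phi = 0.5$; all values are illustrative. It builds $\Psi$ with Kronecker products, with time as the outer index as in the stacked equations, and compares the results.

```python
import numpy as np

N, T = 3, 4           # illustrative dimensions
lam, phi = 0.3, 0.5   # illustrative parameter values

# Hypothetical row-standardized weights: three locations on a line.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
I_N, I_T = np.eye(N), np.eye(T)

B = I_N - lam * W
BtB = B.T @ B
BtB_inv = np.linalg.inv(BtB)

J = np.ones((T, T))   # J_T = iota_T iota_T'
J_star = J / T
E = I_T - J_star

# Psi = (J_T kron I_N) phi + I_T kron (B'B)^{-1}
Psi = phi * np.kron(J, I_N) + np.kron(I_T, BtB_inv)

# Closed-form inverse and determinant from the slide.
inner = BtB_inv + T * phi * I_N
Psi_inv = np.kron(J_star, np.linalg.inv(inner)) + np.kron(E, BtB)
det_formula = np.linalg.det(inner) * np.linalg.det(B) ** (-2 * (T - 1))

print(np.allclose(Psi @ Psi_inv, np.eye(N * T)))
print(np.isclose(np.linalg.det(Psi), det_formula))
```

The practical point is that only N by N matrices need to be inverted or have their determinant taken, rather than the full NT by NT matrix.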

Spatial Latent Variable Models
  Latent Variable Structure
    $y_i^* = x_i \beta + \varepsilon_i$
      » $y_i^*$ is unobservable, $x_i \beta$ is index function, $\varepsilon_i$ random error
    observables
      » $\mathrm{Prob}[y_i^* > 0] = \mathrm{Prob}[x_i \beta + \varepsilon_i > 0]$
      » requires specification of marginal probabilities
  Spatial Autocorrelation
    lag: spatial dependence in $y_i^*$
      » $\mathrm{Cov}[y_i^*, y_j^*] \neq 0$, for i-j neighbors
    error: spatial dependence in $\varepsilon_i$
      » $\mathrm{Cov}[\varepsilon_i, \varepsilon_j] \neq 0$, for i-j neighbors

Substantive and Nuisance Spatial Dependence
  Substantive: Lag Dependence
    $y_i^* = \rho \sum_j w_{ij} y_j^* + x_i \beta + \varepsilon_i$
    latent $y_i^*$ function of latent values at neighbors
    Interaction between underlying propensity
    not the same as observed $y_i$ = revealed decisions
  Nuisance: Error Dependence
    $y_i^* = x_i \beta + \varepsilon_i$ with $\varepsilon_i = \lambda \sum_j w_{ij} \varepsilon_j + u_i$ ($u_i$ iid)
    randomness $\varepsilon_i$ joint dependent with errors at neighbors
    unobservables have some spatial structure

Spatial Lag Probit Model - Simultaneous
  Simultaneous Model
    $y^*$ are jointly determined
    $y^* = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \varepsilon$
      » with $\varepsilon_i \sim N(0,1)$
      » $u = (I - \rho W)^{-1} \varepsilon$ and $u \sim MVN(0, [(I - \rho W)'(I - \rho W)]^{-1})$
    no longer independent nor homoskedastic
      » $u_i$ is marginal of MVN
      » integrate out N-1 dimensions
      » standardize by location-specific variance
    condition for $y_i = 1$ or $y_i^* \geq 0$
      » $x_i \beta + \rho [WX\beta]_i + \rho^2 [W^2 X\beta]_i + \dots + u_i \geq 0$
      » $\mathrm{Prob}[y_i = 1] = \mathrm{Prob}[u_i < G(X, \beta, \rho)]$ depends on all $x_i$

Spatial Lag Probit Model - Conditional
  Conditional Model
    not $y^*$ but observed $y_i$
    $y_i^* = \rho \sum_j w_{ij} E[y_j^* \mid X] + x_i \beta + \varepsilon_i$
      » $E[y_j^* \mid X]$ exogenous?
      » if so, set $z_i = \sum_j w_{ij} E[y_j^* \mid X] = \sum_j w_{ij} y_j$
      » average of observed 1 for neighbors
    $y_i^* = \rho z_i + x_i \beta + \varepsilon_i$
      » treat as standard probit
      » requires much larger N to compensate for loss in information = coding approach only
      » conditional perspective not same for inference
    different interpretation, suitable for interpolation, but NOT for explaining complete spatial pattern

Spatial Error Probit Model
  Error Distribution
    $y_i^* = x_i \beta + \varepsilon_i$ with $\varepsilon_i = \lambda \sum_j w_{ij} \varepsilon_j + u_i$
    with $u_i \sim N(0, 1)$: $\varepsilon \sim MVN(0, [(I - \lambda W)'(I - \lambda W)]^{-1})$
  Characteristics
    multivariate, not univariate normal
    heteroskedastic
      » standard probit inconsistent with heteroskedasticity
      » $\mathrm{Var}[\varepsilon_i] = \omega_{ii}$, with $\omega_{ii}$ = diagonal of $[(I - \lambda W)'(I - \lambda W)]^{-1}$
      » no analytical expression for $\omega_{ii}$
    $P[\varepsilon_i < x_i \beta]$ is marginal of N-dimensional MVN
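The location-specific variances $\omega_{ii}$ in the spatial error probit can be computed numerically even though they have no analytical expression. The sketch below assumes row-standardized weights for four hypothetical locations on a line, $\lambda = 0.5$, and made-up index values $x_i \beta$. It evaluates only the marginal probabilities $\Phi(x_i \beta / \sqrt{\omega_{ii}})$, that is, the standardization by location-specific variance; it is not an estimator and does not evaluate the full N-dimensional likelihood. The function names are introduced here for illustration.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spatial_error_probit(index, W, lam):
    """Marginal Prob[y_i = 1] under a spatial AR error with u_i ~ N(0, 1)."""
    n = W.shape[0]
    A = np.eye(n) - lam * W
    omega = np.linalg.inv(A.T @ A)       # [(I - lam W)'(I - lam W)]^{-1}
    omega_diag = np.diag(omega)          # location-specific variances
    z = index / np.sqrt(omega_diag)      # standardized index
    probs = np.array([normal_cdf(v) for v in z])
    return probs, omega_diag


# Hypothetical row-standardized weights: four locations on a line.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
W = adj / adj.sum(axis=1, keepdims=True)
index = np.array([-1.0, 0.0, 1.0, 2.0])  # made-up values of x_i beta

p_standard, _ = spatial_error_probit(index, W, lam=0.0)  # reduces to standard probit
p_spatial, omega_diag = spatial_error_probit(index, W, lam=0.5)
print(omega_diag)
print(p_spatial)
```

In this example the variances differ between end and interior locations and all exceed one, so the marginal probabilities are pulled toward 0.5 relative to the standard probit.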