Spatial Dependence in Regressors and its Effect on Estimator Performance

R. Kelley Pace
LREC Endowed Chair of Real Estate
Department of Finance, E.J. Ourso College of Business Administration
Louisiana State University, Baton Rouge, LA

James P. LeSage
Fields Endowed Chair
Department of Finance and Economics
Texas State University - San Marcos, San Marcos, TX
jlesage@spatial-econometrics.com

Shuang Zhu
Department of Finance, E.J. Ourso College of Business Administration
Louisiana State University, Baton Rouge, LA

June 2, 2010

Abstract

In econometrics most work focuses on spatial dependence in the regressand or disturbances. However, LeSage and Pace (2009) and Pace and LeSage (2009) showed that the bias in β from applying OLS to a regressand generated from a spatial autoregressive process was exacerbated by spatial dependence in the regressor. Also, the marginal likelihood function or restricted maximum likelihood (REML) function includes a determinant of a function of the spatial parameter and the regressors. Therefore, high dependence in the regressor may affect the likelihood through this term. Finally, the notion of effective sample size for dependent data suggests that the loss of information from dependence may have implications for the information content of various instruments when using instrumental variables. Empirically, many common economic regressors such as income, race, and employment show high levels of spatial autocorrelation. Based on these empirical results, we conduct a Monte Carlo study using maximum likelihood, restricted maximum likelihood, and two instrumental variable specifications for the lag y model (SAR) and spatial Durbin model (SDM) in the presence of correlated regressors while varying signal-to-noise, spatial dependence, and weight matrix specifications. We find that REML outperforms ML in the presence of correlated regressors and that instrumental variable performance is affected by such dependence. The combination of correlated regressors and the SDM provides a challenging environment for instrumental variable techniques. In addition, we examine the estimation of marginal effects and show that this can behave better than estimation of component parameters. We also make suggestions for improving Monte Carlo experiments.

KEYWORDS: regressor autocorrelation, spatial Durbin model, REML, spatial autoregression, maximum likelihood, spatial econometrics.

1 Introduction

In spatial settings, such as real estate prices in a city, the dependence of each observation on potentially every other observation impedes theoretical derivation of finite sample results. Consequently, almost all finite sample investigations of spatial econometric methods rely upon Monte Carlo experiments. Such experiments usually employ iid random variables for the regressors, yet empirically regressors often display substantial spatial dependence. In this note, we document the large spatial dependence exhibited by common variables such as income, education, and race, and show that the dependence in regressors can make a material difference in the performance of two-stage least squares, a common spatial method (Anselin, 1988; Kelejian and Prucha, 1998; Lee, 2007). Essentially, spatial dependence reduces the information content of variables, which exacerbates the weak instrument problem (Bound, Jaeger, and Baker, 1995; Staiger and Stock, 1997). The spatial dependence interacts with other well-known weak instrument characteristics such as goodness-of-fit and number of instruments. In addition, we show that estimation of the common spatial Durbin model (Anselin, 1988) becomes particularly sensitive to traditional weak instrument considerations as well as to the spatial dependence in the regressors.

In section 2 we lay out the data generating processes, empirical data that suggest high levels of spatial dependence in the regressors may be common, and two stage least squares estimation. In section 3 we discuss the design of the Monte Carlo experiment and the results. Based on these findings and other information we offer some suggestions for the design of Monte Carlo experiments in spatial settings in section ??. In section 5 we discuss the implications of this work for future spatial research.

2 Spatial Dependence in y and Regressors

In this section we set forth the data generating processes for the spatial autoregressive and spatial Durbin models in section 2.1. In section 2.2 we empirically measure spatial dependence in common regressors, which informs our choice of a spatial autoregressive process for the regressors. We discuss likelihood approaches (maximum likelihood and residual maximum likelihood) to estimation in section 2.3. In section 2.4 we lay out the two stage least squares and best instruments approaches to estimating the parameters of the spatial autoregressive and spatial Durbin models. In particular, we discuss how these model choices affect the instrument set. Finally, we discuss marginal effects and possible bias problems in section 2.5.

2.1 Spatial Models

Assume that the n by 1 dependent variable y follows a spatial autoregressive DGP (often abbreviated as SAR; see footnote 1).

$$y = (I_n - \rho W)^{-1} X\beta + (I_n - \rho W)^{-1}\varepsilon \tag{1}$$

$$\varepsilon \sim N(0, \sigma_\varepsilon^2 I_n) \tag{2}$$

where X contains n observations on k exogenous regressors, ε is an n by 1 vector of normal iid disturbances with variance $\sigma_\varepsilon^2$, and β is a k by 1 vector of regression parameters. The spatial character of (1) is determined by the n by n spatial weight matrix W and the spatial scalar parameter ρ. When observation or region j is a neighbor to observation or region i, $W_{ij} > 0$, and otherwise $W_{ij} = 0$. These values are exogenous. By convention, observations cannot serve as neighbors to themselves and therefore $W_{ii} = 0$. For simplicity, assume that W has real eigenvalues, such as occur when a matrix is similar to a real, symmetric matrix. Since scaling a matrix by its principal eigenvalue can always yield a new matrix with a principal eigenvalue of 1, assume W has a principal eigenvalue of 1 and a minimum eigenvalue of $\lambda_{min}$. Since the diagonal of W contains zeros, tr(W) = 0. Therefore, the sum of the eigenvalues equals 0. Given a positive principal eigenvalue, $\lambda_{min} < 0$. Consequently, $\rho \in (\lambda_{min}^{-1}, 1)$ yields a symmetric positive definite $(I_n - \rho W)$. Given (1), (3) becomes the corresponding estimation equation.

$$y = X\beta + \rho W y + \varepsilon \tag{3}$$

Footnote 1: In spatial econometrics the acronym SAR usually refers to an autoregression in the dependent variable, while in spatial statistics it usually refers to an autoregression in the disturbances (Ripley, 1981).

Since functions of the dependent variable appear on both sides of (3), estimation by OLS is biased (LeSage and Pace, 2009; Pace and LeSage, 2009) and requires alternative means of estimation such as those based on the likelihood or, in the case under examination, instrumental variables.

As with any model, the composition of the explanatory variable matrix X becomes important. The SAR model has two variants in terms of the regressors, where ι is the n by 1 vector of ones and U is an n by p matrix of non-constant regressors. The first variant, $X_{SAR}$ in (4), contains the regressors by themselves, with a p by 1 parameter vector γ for the non-constant variables and a scalar parameter α for the constant vector. The second variant, $X_{SDM}$ in (6), contains the regressors with the p by 1 parameter vector γ, their spatial lags with a p by 1 parameter vector θ, and the constant vector with scalar parameter α. This latter variant leads to the spatial Durbin model (SDM), which nests the SAR (when θ = 0), the spatial error model or SEM (when θ = −ργ), the spatial lag of X model or SLX (when ρ = 0), and the conventional iid disturbance model (when ρ = 0 and θ = 0). LeSage and Pace (2009) and Pace and LeSage (2009) show that the SDM arises naturally as a byproduct of spatial disturbances and omitted variables that are correlated with the included variables.

$$X_{SAR} = [\,\iota \;\; U\,] \tag{4}$$

$$\beta_{SAR} = [\,\alpha \;\; \gamma'\,]' \tag{5}$$

$$X_{SDM} = [\,\iota \;\; U \;\; WU\,] \tag{6}$$

$$\beta_{SDM} = [\,\alpha \;\; \gamma' \;\; \theta'\,]' \tag{7}$$

2.2 Spatial Dependence in the Regressors

Many regressors display spatial dependence as well. To obtain an idea of such dependence we took some common variables and estimated a univariate SAR model for each of these via maximum likelihood. The county data are from the contiguous US states, the census tract data are from New York state, and the block group data are from the Bronx. All variables are logged. Observations with missing values were dropped from the sample. The initial weight matrix used the 30 nearest neighbors, where the (t+1)th nearest neighbor received five percent less weight than the tth nearest neighbor. The final weight matrix was this initial matrix plus its transpose, reweighted to make it symmetric and doubly stochastic with a principal eigenvalue of 1.

Table 1 documents that many common variables display a high degree of spatial dependence. The estimates range from a high for the autocorrelation in county house prices to a low for the autocorrelation in female employment across block groups in the Bronx. To allow for the observed spatial dependence in the explanatory variables we assume that these follow a univariate SAR model as specified in (8) with a DGP in (10).

$$U_j = \phi W U_j + R_j, \qquad j = 1, \dots, k \tag{8}$$

$$R_j \sim N(0, \sigma_{R_j}^2 I_n) \tag{9}$$

$$U_j = (I_n - \phi W)^{-1} R_j \tag{10}$$
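As a minimal sketch of the DGPs in (1) and (10), the following Python code first draws spatially dependent regressors and then a SAR dependent variable. The ring-lattice weight matrix, the function names, and the parameter values are illustrative assumptions rather than the weight matrices or settings used in the paper, and the dense solves are only practical for small n.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical W: each observation has two neighbours on a ring.
    Symmetric, zero diagonal, doubly stochastic, principal eigenvalue 1."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def simulate_regressors(W, phi, p, rng):
    """Columns follow U_j = (I - phi W)^{-1} R_j with R_j ~ N(0, I), as in (10)."""
    n = W.shape[0]
    R = rng.standard_normal((n, p))
    return np.linalg.solve(np.eye(n) - phi * W, R)

def simulate_sar(W, X, beta, rho, sigma, rng):
    """y = (I - rho W)^{-1} (X beta + eps), as in (1)."""
    n = W.shape[0]
    eps = sigma * rng.standard_normal(n)
    y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
    return y, eps

rng = np.random.default_rng(0)
n, p = 200, 3
W = ring_weights(n)
U = simulate_regressors(W, phi=0.95, p=p, rng=rng)
X = np.column_stack([np.ones(n), U])          # X_SAR = [iota, U]
beta = np.concatenate([[0.0], np.ones(p)])    # intercept 0, slopes 1
y, eps = simulate_sar(W, X, beta, rho=0.4, sigma=1.0, rng=rng)
```

Rearranging the simulated data reproduces the estimation equation (3): `y - 0.4 * W @ y - X @ beta` recovers the disturbances up to rounding error.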

Table 1: Estimates of spatial dependence for selected census variables in different levels of geography by spatial autoregressive regression model.
[Table values lost in extraction. Columns: County, Census Tract, Block Group. Rows: Black; Age; Age; Bachelor's Degree; Civilian Employment; Female Employment; Median HH Income; Per Capita Income; Median Rent; Median House Price; Number of Observations.]

2.3 Likelihood-based Estimators

Concentrated maximum likelihood finds the value of ρ that maximizes the concentrated log-likelihood function (11). The optimum value ($\rho^*$) equals the maximum likelihood estimate ($\rho^* = \hat\rho_{ML}$).

$$L_{ML}(\rho) = \kappa + \ln|I_n - \rho W| - \frac{n}{2}\ln(e(\rho)'e(\rho)) \tag{11}$$

$$e(\rho) = (I_n - \rho W)y - X\beta(\rho) \tag{12}$$

$$\beta(\rho) = (X'X)^{-1}X'(I_n - \rho W)y \tag{13}$$

It is well-known that maximum likelihood often has a downward bias in estimation of ρ in small samples. The restricted or residual maximum likelihood (REML) introduced by Patterson and Thompson (1971) results in a concentrated log-likelihood function that augments the usual log-likelihood with another determinant term and multiplies the log of the sum-of-squared error term by 0.5(n − k) rather than by 0.5n. REML can produce unbiased estimates of variances and covariances, whereas maximum likelihood only produces consistent estimates.
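The concentrated maximum likelihood objective in (11)-(13) can be sketched as a one-dimensional grid search over ρ. This is a minimal illustration under stated assumptions: a small symmetric ring-lattice W (hypothetical), an exact log-determinant computed from the eigenvalues of W, and made-up parameter values. It is not the large-sample implementation used in the experiments.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic W (two neighbours on a ring)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def concentrated_loglik(rho, y, X, W, eigs):
    """Concentrated log-likelihood (11), omitting the constant kappa."""
    n = y.shape[0]
    Ay = y - rho * (W @ y)                         # (I - rho W) y
    beta = np.linalg.lstsq(X, Ay, rcond=None)[0]   # beta(rho) in (13)
    e = Ay - X @ beta                              # e(rho) in (12)
    logdet = np.sum(np.log(1.0 - rho * eigs))      # ln|I - rho W| from eigenvalues
    return logdet - 0.5 * n * np.log(e @ e)

def ml_rho(y, X, W, grid):
    """Return the grid value of rho that maximizes (11)."""
    eigs = np.linalg.eigvalsh(W)                   # valid because W is symmetric
    values = [concentrated_loglik(r, y, X, W, eigs) for r in grid]
    return grid[int(np.argmax(values))]

rng = np.random.default_rng(1)
n, rho_true = 400, 0.5
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.0, 1.0])
eps = rng.standard_normal(n)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

grid = np.linspace(-0.9, 0.95, 371)                # step of 0.005
rho_ml = ml_rho(y, X, W, grid)
```

A grid keeps the sketch transparent; a scalar optimizer over the same interval would serve equally well.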

The REML concentrated log-likelihood is

$$L_{RE}(\rho) = \kappa - \frac{1}{2}\ln\left|X'(I_n - \rho W)'(I_n - \rho W)X\right| + \ln|I_n - \rho W| - \frac{n-k}{2}\ln(e(\rho)'e(\rho)) \tag{14}$$

Note that the REML concentrated log-likelihood has a determinant term that can be affected by the dependence in X. Suppose X consists of a single regressor U (SAR specification) and W is symmetric. Further, suppose U follows an autoregressive process as described by (10) with $\sigma_{R_j}^2 = 1$. The expectation of the term equals (15), and since the determinant of a scalar is that scalar, this is also the expectation of the determinant.

$$E(U'(I_n - \rho W)^2 U) = \mathrm{tr}\left[(I_n - \phi W)^{-2}(I_n - \rho W)^2\right] \tag{15}$$

By inspection, high values of φ will tend to inflate the magnitude of this term. Therefore, spatial dependence in a regressor could cause REML to differ from ML in such cases.

2.4 Instrumental Variables

Instrumental variable techniques have been widely used in spatial econometrics (Anselin, 1988; Kelejian and Prucha, 1998; Lee, 2007). In practice, due to the widespread availability of software and its intuitive appeal, two stage least squares (2SLS) has often been used in applications (Anselin, 1988; Land and Deane, 1992; Byrne, 2005; Millimet and Rangaprasad, 2007; Richards and Padilla, 2009). Given X, two stage least squares uses a set of instruments Z that includes X and some additional variables V as shown in (16). The problematic element in estimating (3) is the simultaneity between y and Wy. Two stage least squares replaces Wy with its predicted value based on the instruments, $P_Z W y$, as in (18).

$$Z = [\,X \;\; V\,] \tag{16}$$

$$P_Z = Z(Z'Z)^{-1}Z' \tag{17}$$

$$y = X\beta + \rho(P_Z W y) + \varepsilon \tag{18}$$

It remains to specify V. Kelejian and Prucha (1998) motivated possible instruments through the combination of the Taylor series expansion of $(I_n - \rho W)^{-1}$ in (19) and the expression for the expectation of Wy in the SAR model in (20).

$$(I_n - \rho W)^{-1} = I_n + \rho W + \rho^2 W^2 + \rho^3 W^3 + \dots \tag{19}$$

$$E(Wy) = WX\beta + \rho W^2 X\beta + \rho^2 W^3 X\beta + \dots \tag{20}$$

Their suggested instruments for the SAR model were (21), which involve the first two terms of the expectation of Wy. By analogy, the first two terms of the expectation of Wy in the SDM that are not contained in the regressors appear in (22).

$$V_{SAR} = [\,WU \;\; W^2U\,] \tag{21}$$

$$V_{SDM} = [\,W^2U \;\; W^3U\,] \tag{22}$$

Lee (2003) adopted a different instrumental variable approach, termed the best instruments, which also used (20) to suggest replacing Wy with $W(I_n - \rho W)^{-1}X\beta$. In this theoretical form (which uses the true parameters) the best instruments provide a bound on the performance of feasible instrumental variable approaches (Kelejian, Prucha, and Yuzefovich, 2004).

Examination of the finite sample properties of these techniques has largely been conducted through Monte Carlo experiments (Kelejian, Prucha, and Yuzefovich, 2004; Klotz, 2004; Lee, 2007), and most of these examine cases with moderate to high levels of signal-to-noise. For example, Kelejian, Prucha, and Yuzefovich (2004) excluded low signal-to-noise scenarios. In contrast, Lee (2007) examined a case where the $R^2$ was 0.04, in which scenario 2SLS performed poorly.
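The two stage least squares steps in (16)-(18), with the SAR instruments from (21), can be sketched as follows. This is an illustration under assumptions: the ring-lattice W, iid regressors, the noise level, and all names are hypothetical choices made so that the instruments are strong, not the experimental design of the paper.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic W (two neighbours on a ring)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def two_stage_least_squares(y, X, W, V):
    """2SLS for y = X beta + rho W y + eps with instruments Z = [X, V]."""
    Z = np.column_stack([X, V])                            # (16)
    Wy = W @ y
    Wy_hat = Z @ np.linalg.lstsq(Z, Wy, rcond=None)[0]     # P_Z W y, using (17)
    D = np.column_stack([X, Wy_hat])                       # regressors in (18)
    coef = np.linalg.lstsq(D, y, rcond=None)[0]
    return coef[:-1], coef[-1]                             # (beta_hat, rho_hat)

rng = np.random.default_rng(2)
n, p, rho_true = 500, 2, 0.4
W = ring_weights(n)
U = rng.standard_normal((n, p))                # iid regressors (phi = 0)
X = np.column_stack([np.ones(n), U])           # X_SAR
beta_true = np.array([0.0, 1.0, 1.0])
eps = 0.5 * rng.standard_normal(n)             # fairly high signal-to-noise
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

V_sar = np.column_stack([W @ U, W @ W @ U])    # instruments in (21)
beta_iv, rho_iv = two_stage_least_squares(y, X, W, V_sar)
```

For the SDM, X would gain the columns WU and the additional instruments would shift to W²U and W³U as in (22), which is exactly why fewer identifying instruments remain.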

The poor performance of 2SLS at low signal-to-noise suggests that the weak instrument problems of 2SLS in other contexts (Bound, Jaeger, and Baker, 1995; Staiger and Stock, 1997) may carry over into spatial estimation. Factors affecting the weak instrument problem include goodness-of-fit and number of instruments. These problems can reduce performance even in relatively large sample sizes. In addition to these well-known problems, the spatial dependence of regressors may affect the weak instrument problem. In fact, Bowden and Turkington (1984) found in a time series context that increasing the correlation in the regressors had a non-monotonic effect. On one hand, such correlation can increase the correlation between the instrumented out variable and its instruments; on the other hand, it increases the correlation between the instruments and the exogenous variables (lower marginal information).

The SDM has the potential to further affect the performance of instrumental variables, since including the spatial lags of already included regressors in the model precludes using these in aiding the identification of Wy. Specifically, the spatial Durbin model by construction reduces the possible number of identifying instruments for Wy relative to the SAR model. To further examine this we note that the SAR model in (3) along with the regressors given by (6) defines the SDM as shown in (23). This has the associated DGP in (24). Recognizing that in (24) the term $(I_n - \rho W)^{-1}U\gamma$ also equals $U\gamma + (I_n - \rho W)^{-1}WU\rho\gamma$ and combining terms yields (25).

$$y = \alpha + U\gamma + WU\theta + \rho W y + \varepsilon \tag{23}$$

$$y = \alpha + (I_n - \rho W)^{-1}U\gamma + (I_n - \rho W)^{-1}WU\theta + (I_n - \rho W)^{-1}\varepsilon \tag{24}$$

$$y = \alpha + U\gamma + (I_n - \rho W)^{-1}WU(\rho\gamma + \theta) + (I_n - \rho W)^{-1}\varepsilon \tag{25}$$

An important part of the two stage least squares method is the regression of the variable to be instrumented out (Wy) in (26) on the instruments Z as in (27), with its expanded counterpart (28). Note that matrices commute with functions of themselves, so that $W(I_n - \rho W)^{-1}$ equals $(I_n - \rho W)^{-1}W$.

$$Wy = \alpha + WU\gamma + (I_n - \rho W)^{-1}W^2U(\rho\gamma + \theta) + (I_n - \rho W)^{-1}W\varepsilon \tag{26}$$

$$Wy = Z\delta + \epsilon \tag{27}$$

$$Wy = X_{SDM}\delta^{(1)} + V_{SDM}\delta^{(2)} + \epsilon \tag{28}$$

An important aspect of the weak instrument problem is the magnitude of $\delta^{(2)}$, the parameter vector associated with the identifying instruments $V_{SDM}$. To address this more directly we employ the Frisch-Waugh-Lovell technique and examine the regression where both sides are multiplied by, in this case, $M_{X_{SDM}}$. This does not affect the parameter estimates of $\delta^{(2)}$, but does remove $\delta^{(1)}$ from the model, as done in (30).

$$M_{X_{SDM}} = I_n - X_{SDM}(X_{SDM}'X_{SDM})^{-1}X_{SDM}' \tag{29}$$

$$M_{X_{SDM}}Wy = M_{X_{SDM}}V_{SDM}\delta^{(2)} + M_{X_{SDM}}\epsilon \tag{30}$$

Consider the expectation of the regressand in (30), using (26), as shown in (31). This is the systematic part to be explained (the signal). If this has a small magnitude relative to the random part (the noise), this sets up the conditions leading to the weak instrument problem.

$$E(M_{X_{SDM}}Wy) = M_{X_{SDM}}(I_n - \rho W)^{-1}W^2U(\rho\gamma + \theta) \tag{31}$$

Note that the signal has zero magnitude when the true DGP is an error model (θ = −ργ). In this case the two stage least squares technique will not work, as this violates the conditions underlying the proof of consistency (instruments must have explanatory power). In practice, θ rarely exactly equals −ργ. However, it often has the opposite sign of γ, and this suggests that weak instruments will often be a problem when estimating the SDM using two stage least squares.
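The claim that the first-stage signal vanishes under an error-model DGP can be checked numerically. The sketch below computes E(M W y) directly from the SDM DGP (24) with α = 0, once for a SAR DGP (θ = 0) and once for an error-model DGP (θ = −ργ). The ring-lattice W, the dimensions, and the function names are hypothetical.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic W (two neighbours on a ring)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def annihilate(X, v):
    """M_X v: the residual from projecting v on the columns of X, as in (29)."""
    return v - X @ np.linalg.lstsq(X, v, rcond=None)[0]

def first_stage_signal(W, U, gamma, theta, rho):
    """E(M_X W y) under the SDM DGP (24) with alpha = 0."""
    n = W.shape[0]
    mean_y = np.linalg.solve(np.eye(n) - rho * W, U @ gamma + W @ U @ theta)
    X_sdm = np.column_stack([np.ones(n), U, W @ U])    # (6)
    return annihilate(X_sdm, W @ mean_y)

rng = np.random.default_rng(3)
n, rho = 300, 0.4
W = ring_weights(n)
U = rng.standard_normal((n, 2))
gamma = np.array([1.0, 1.0])

signal_sar = first_stage_signal(W, U, gamma, np.zeros(2), rho)     # theta = 0
signal_sem = first_stage_signal(W, U, gamma, -rho * gamma, rho)    # theta = -rho*gamma

norm_sar = np.linalg.norm(signal_sar)   # clearly positive
norm_sem = np.linalg.norm(signal_sem)   # zero up to rounding error
```

Under θ = −ργ the expected value of y collapses to Uγ, so Wy has mean WUγ, which already lies in the column space of the SDM regressors and leaves nothing for the identifying instruments to explain.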

2.5 Marginal Effects and Bias Correction

A number of authors have pointed out that the derivative of the regressand with respect to the regressor in these models does not equal β, as it does in the case of iid models fitted by OLS. In fact, the partial derivatives of the expected value of the regressand with respect to regressor j (marginal effects) for the SAR DGP equal (32).

$$\frac{\partial E(y)}{\partial X_j'} = (I_n - \rho W)^{-1}\beta_j \tag{32}$$

$$= S_j(\rho) \tag{33}$$

Since this is an n by n matrix and since there are k − 1 non-constant variables, this results in $(k-1)n^2$ partial derivatives, which provides an overwhelming amount of information. To deal with this volume of information LeSage and Pace (2009) suggested summarizing these partial derivatives. Specifically, they suggested averaging all the column or row sums to arrive at the average total effect or impact, the average of the diagonal of $S_j(\rho)$ to arrive at the average direct effect or impact, and the average of the off-diagonal elements of $S_j(\rho)$ to arrive at the average indirect effect or impact. As shown in LeSage and Pace (2009), the simplest case arises for a row or doubly stochastic W. In this case the average total effect is a simple non-linear function of ρ as shown in (34).

$$T_j(\rho) = (1 - \rho)^{-1}\beta_j \tag{34}$$

Due to the non-linearity and the sampling variability of estimates of ρ, the expected total effect does not equal the total effect evaluated at the estimate of ρ, as shown in (36).

$$E(T_j(\rho)) \neq (1 - \hat\rho)^{-1}\hat\beta_j \tag{36}$$
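The summary measures and the non-linearity in (34) and (36) can be illustrated with a short sketch. The ring-lattice W, the parameter values, and the hypothetical normal draws standing in for estimates of ρ are all assumptions, and the indirect effect is computed as total minus direct, which is one common scaling of the off-diagonal average.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic W (two neighbours on a ring)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def summary_effects(W, rho, beta_j):
    """Average direct, indirect and total effects from S_j(rho) in (32)-(33)."""
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta_j
    total = S.sum(axis=1).mean()       # average row sum
    direct = np.trace(S) / n           # average diagonal element
    return direct, total - direct, total

W = ring_weights(100)
direct, indirect, total = summary_effects(W, rho=0.4, beta_j=1.0)

# Non-linearity behind (36), using made-up draws in place of estimates of rho.
rng = np.random.default_rng(4)
rho_draws = rng.normal(0.4, 0.05, size=1001)    # odd count: the median is a draw
total_draws = 1.0 / (1.0 - rho_draws)           # (34) with beta_j = 1
mean_gap = total_draws.mean() - 1.0 / (1.0 - rho_draws.mean())
median_gap = np.median(total_draws) - 1.0 / (1.0 - np.median(rho_draws))
```

Because W is doubly stochastic, `total` matches (1 − ρ)⁻¹β_j from (34). The mean of the transformed draws exceeds the transform of the mean, whereas the median passes through the monotone map ρ → (1 − ρ)⁻¹ unchanged.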

Fortunately, the quantiles (q) of the total effects are easier to use. Since $(1 - \rho)^{-1}$ is a monotonic transformation, q(f(x)) equals f(q(x)) as in (37).

$$q(T_j(\rho)) = (1 - q(\rho))^{-1}q(\beta_j) \tag{37}$$

3 Monte Carlo Experiments

In this section we discuss the design of the Monte Carlo experiments and the experimental results. Specifically, in section 3.1 we examine a Monte Carlo experiment with different levels of spatial dependence in y, goodness-of-fit, and dependence in the regressors. In section 3.2 we take the most difficult case from the previous experiments and see how the estimators perform as a function of sample size. Finally, in section 3.3 we examine the performance of the various estimators in estimating the marginal effects.

In all of the experiments that follow, two-stage least squares uses $W^2U$ and $W^3U$ for the SDM and $WU$ and $W^2U$ for the SAR, as in (22) and (21). There are 15 non-constant explanatory variables and therefore the number of additional instruments equals 30. We do not focus on the problem of a large number of instruments. However, one would expect the findings of the experiments to be mitigated with fewer variables (hence fewer additional instruments) and exacerbated with more variables (additional instruments).

Many researchers have an idea of typical levels of spatial dependence in y and typical levels of signal-to-noise for data in their field of interest. In the experiments we set $\sigma_\varepsilon^2$ to the level required to yield a given signal-to-noise as defined in (39). To maintain succinct notation, we call this measure $R^2$.

$$R^2 = 1 - \frac{E(\varepsilon'(I_n - \rho W)^{-2}\varepsilon)}{\beta'X'(I_n - \rho W)^{-2}X\beta} \tag{38}$$

$$= 1 - \frac{\sigma_\varepsilon^2\,\mathrm{tr}\left((I_n - \rho W)^{-2}\right)}{\beta'X'(I_n - \rho W)^{-2}X\beta} \tag{39}$$

In generating $U_j$, which provides the components for $X_{SAR}$ and $X_{SDM}$, we set the variance $\sigma_{R_j}^2$ to 1 in (10). In all the experiments the intercept was set to 0, while the regression parameters associated with U in $X_{SAR}$ and $X_{SDM}$ were set to 1 and the regression parameters associated with WU in $X_{SDM}$ were set to 0. Therefore, the DGP was always SAR. We did this to facilitate comparisons across the SAR and SDM results (especially for the marginal effects). The best instrumental variable estimator uses the true values of β and ρ in computing $(I_n - \rho W)^{-1}WX\beta$. This obviously is the best possible case for an instrumental variable technique.

3.1 Results by Signal-to-Noise, Spatial Dependence in Regressors and Regressand

In the following experiments, we only examine a sample size of 3,000 observations, normally considered a large sample size. We designed the experiments to facilitate matching the cases to researchers' knowledge of their data. Specifically, we simulated using (1) and (10) with levels of ρ equal to 0.4 and 0.8, which reflect moderate and strong spatial dependence in y. Given the empirical levels of spatial dependence in the regressors shown in Table 1, we set φ to equal 0, 0.5, and 0.95 to represent the iid case, the case of moderate spatial dependence in the regressors, and finally the case of high spatial dependence in the regressors. The $\sigma_\varepsilon^2$ was set to a level that assured that the $R^2$ as defined by (39) equaled 0.1 (low signal-to-noise), 0.5 (moderate signal-to-noise), or 0.9 (high signal-to-noise). Therefore, the experiment examined 18 cases that vary by spatial dependence in the regressors, signal-to-noise, and spatial dependence in y. Each case involves 1,000 trials.
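Setting the disturbance variance to hit a target signal-to-noise amounts to solving (39) for $\sigma_\varepsilon^2$. The sketch below does this for a symmetric W and enumerates the 18 design cells. The ring-lattice W, the small n, the function names, and the ordering of the cases are assumptions; the paper's case numbering is not reproduced here.

```python
import itertools
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic W (two neighbours on a ring)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def signal_and_noise_scale(W, X, beta, rho):
    """Return beta'X'(I - rho W)^{-2} X beta and tr((I - rho W)^{-2}) for symmetric W."""
    n = W.shape[0]
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    signal = A_inv @ (X @ beta)
    return signal @ signal, np.trace(A_inv @ A_inv)

def sigma2_for_r2(W, X, beta, rho, r2):
    """Solve (39) for sigma^2_eps given a target R^2."""
    s, t = signal_and_noise_scale(W, X, beta, rho)
    return (1.0 - r2) * s / t

def implied_r2(W, X, beta, rho, sigma2):
    """Evaluate (39) for a given sigma^2_eps."""
    s, t = signal_and_noise_scale(W, X, beta, rho)
    return 1.0 - sigma2 * t / s

rng = np.random.default_rng(5)
n, p = 100, 3
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
beta = np.concatenate([[0.0], np.ones(p)])

# Design grid: rho in {0.4, 0.8}, R^2 in {0.9, 0.5, 0.1}, phi in {0, 0.5, 0.95}
cases = list(itertools.product([0.4, 0.8], [0.9, 0.5, 0.1], [0.0, 0.5, 0.95]))
sigma2 = sigma2_for_r2(W, X, beta, rho=0.4, r2=0.5)
check = implied_r2(W, X, beta, rho=0.4, sigma2=sigma2)
```

In a full experiment X would be regenerated for each φ via (10) before calibrating the variance, since the signal term depends on the regressors.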

In the tables we recorded the median estimate of ρ as well as the median absolute deviation (MAD) of the estimated values of ρ from all the estimators. In the tables the subscripts RE, B, IV, and ML stand for REML, the Best IV, two stage least squares, and maximum likelihood. We used the medians and median absolute deviations to control the problems which occurred due to sometimes wild estimates from the best and two stage least squares instrumental variables estimators. To simplify the presentation, we discuss all SAR results in section 3.1.1 and all SDM results in section 3.1.2.

3.1.1 SAR Results

Table 2 shows the usual instrumental variable results in the spatial literature in the presence of high or moderate signal-to-noise ($R^2$ = 0.9, 0.5) and no spatial dependence in the regressors (φ = 0). Namely, both the best instrumental variable specification and two-stage least squares show very low levels of bias in those cases. Also, the results confirm those of Lee (2007) for the low signal-to-noise cases, where two stage least squares shows greater bias. Note that the bias of maximum likelihood increases monotonically with φ. Although REML displays some bias, it is lower than that of ML and the lowest of the estimators.

However, Table 2 shows the great importance of spatial dependence in the regressors. All of the worst cases in terms of bias and RMSE happen in the high spatial dependence of regressor (φ = 0.95) cases for the Best IV, two stage least squares, and maximum likelihood estimators (relative to REML). The performance of the IV estimators is non-monotonic, with the highest relative RMSEs occurring for both the lowest and the highest levels of φ. In terms of producing estimates outside of the interval (0, 1), the Best IV estimator with true parameters yielded estimates of ρ of less than 0 in 3 out of 1,000 trials, and greater than 1 in 4 out of 1,000 trials when $R^2$ = 0.1, φ = 0.95, and ρ = 0.4 (case 18). Two stage least squares produced estimates greater than 1 in 9.6 percent of the trials for this case.
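The robust summaries reported in the tables (median, MAD, and the share of estimates falling outside the unit interval) can be computed as in the sketch below. The MAD is taken around the median, which is an assumption about the exact definition used, and the input values are made up to show how a single wild estimate affects each summary.

```python
import numpy as np

def summarize_estimates(rho_hats):
    """Median, MAD around the median, and shares of estimates outside (0, 1)."""
    rho_hats = np.asarray(rho_hats, dtype=float)
    med = np.median(rho_hats)
    mad = np.median(np.abs(rho_hats - med))
    return {
        "median": med,
        "mad": mad,
        "share_below_0": np.mean(rho_hats < 0.0),
        "share_above_1": np.mean(rho_hats > 1.0),
    }

# Made-up estimates of rho with one wild value, as can occur with weak instruments.
estimates = [0.38, 0.40, 0.41, 0.39, 5.0]
summary = summarize_estimates(estimates)
spread_sd = np.std(estimates)   # the standard deviation is dominated by the outlier
```

Here the median and MAD stay near 0.40 and 0.01, while the standard deviation is driven almost entirely by the single outlying estimate.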

Table 2: Estimation of the SAR Model via Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, and the estimates of ρ from ML, REML, Best IV, 2SLS, and OLS.]

Table 3: SAR Model RMSE from Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, and the RMSE for ML, REML, Best IV, 2SLS, and OLS.]

3.1.2 SDM Results

Table 4 demonstrates the challenges posed by the SDM for the ML, Best IV, and two stage least squares estimators. Only REML displays very low bias across all the cases. Even ML shows increasing bias as dependence in the regressors (φ) rises. In the worst case (15), where φ = 0.95, ρ = 0.4, and $R^2$ = 0.1, the average ML estimate was biased and the RMSE for ML was 12.1 percent higher than for REML. In contrast, the average REML estimate showed less bias. More dramatically, the best IV (with true starting values) gave estimates of ρ of 0.46 or a bias of 0.16, and two stage least squares produced biased average estimates as well. The RMSE for the best IV (with true starting values) and two stage least squares increased by factors of over 11 and 19 relative to REML.

Table 5 shows that only 53.8 and 59.8 percent of the trials resulted in estimates for the best IV and two stage least squares in the (0, 1) interval. For ML and REML all of the ρ estimates fell in this interval. For two stage least squares in the worst case (18), when ρ = 0.8, φ = 0.95, and $R^2$ = 0.1, only 29.4 percent of the trials had ρ estimates in (0, 1). Even when $R^2$ equaled 0.5, some of the cases resulted in ρ estimates outside of (0, 1) for the best IV and two stage least squares estimators.

Table 4: Estimation of the SDM Model via Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, and the estimates of ρ from ML, REML, Best IV, 2SLS, and OLS.]

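Table 5: Proportion of the SDM Model estimates in (0, 1) for Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, the proportions of Best IV estimates below 0 and above 1, the proportion of 2SLS estimates above 1, and an OLS column whose threshold is truncated in the source.]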

Table 6: SDM Model RMSE from Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, and the RMSE for ML, REML, Best IV, 2SLS, and OLS.]

3.2 Performance by Sample Size

In section 3.1.2 the most difficult case (15) for the best instruments in estimating the SDM was for ρ = 0.4, $R^2$ = 0.1, and φ = 0.95. Both the best instruments and two stage least squares produced biased estimates for ρ when n = 3,000. However, the theory establishes the consistency of these techniques. In this section we examine how the bias and performance of the IV techniques vary with sample size.

Table 7 shows that the best instruments show low bias after 500,000 observations, while the two stage least squares estimates take 2,500,000 observations to achieve low bias. In contrast, maximum likelihood displayed low bias across all sample sizes (despite using an approximation to the log-determinant), while OLS displayed high (but relatively constant) bias across all sample sizes.

In terms of computation, we used a Delaunay triangle routine to determine W based on contiguity. This took slightly over one minute on a 2005 vintage computer. We used the Barry and Pace (1999) log-determinant approximation in calculating maximum likelihood (with some modifications described in LeSage and Pace (2009)). The reported times for maximum likelihood combine the times for computing the log-determinant and estimating the model once. Naturally, in the simulation the log-determinant approximation was only run once, and the marginal time required for estimating the model for another y approximately equaled that of OLS. The best IV estimator, due to computing $(I_n - \rho W)^{-1}X\beta$, actually required more time than maximum likelihood when n = 2,500,000. Nonetheless, all the techniques were computationally feasible, even for large sample sizes.
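Log-determinant approximations of the kind cited above rest on the power series ln|I − ρW| = −Σ ρ^k tr(W^k)/k. The sketch below evaluates a truncated version of that series with exact traces on a small dense matrix and compares it with an exact log-determinant. This is an illustration only: Barry and Pace (1999) estimate the traces by Monte Carlo rather than forming matrix powers, and the ring-lattice W, truncation order, and function names are assumptions.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic W (two neighbours on a ring)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def logdet_series(W, rho, order=50):
    """Truncated series ln|I - rho W| = -sum_{k=1}^{order} rho^k tr(W^k) / k."""
    total = 0.0
    Wk = np.eye(W.shape[0])
    for k in range(1, order + 1):
        Wk = Wk @ W                          # W^k
        total -= rho ** k * np.trace(Wk) / k
    return total

n, rho = 200, 0.4
W = ring_weights(n)
approx = logdet_series(W, rho)
sign, exact = np.linalg.slogdet(np.eye(n) - rho * W)
```

With a principal eigenvalue of 1 the series converges for |ρ| < 1, and the truncation error shrinks geometrically in the order, so a modest number of terms suffices at ρ = 0.4.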

Table 7: Estimation of the SDM Model for large n
[Table values lost in extraction. Columns: sample sizes n up to 2,500,000. Rows: mean and s.d. of the estimates of ρ from ML, Best IV, 2SLS, and OLS; mean R²; computation times in seconds for the Delaunay routine, ML, Best IV, 2SLS, and OLS.]

3.3 Marginal Effects

We begin with the total marginal effects for the SAR model as shown in Table 8. First, the ML, REML, and Best IV estimators usually display low biases across the cases. However, two stage least squares displays a positive bias which becomes serious for some of the cases when $R^2$ = 0.5 and all of the cases when $R^2$ = 0.1. In almost every situation the variability of the estimates of the total effects from all the estimators rises with φ. In some situations the variability more than doubles.

For the SDM, Table 9 displays the total effect results across the estimators and cases. In this case REML displays the lowest unadjusted total effect biases, although it often displays slightly more variability than ML. In the cases where $R^2$ = 0.9 the best IV slightly underestimates the total effects. This underestimation becomes somewhat more prominent as the $R^2$ falls. However, two stage least squares displays very serious biases when $R^2$ equals 0.5 and especially 0.1. In case 15, due to having estimates of ρ > 1, two stage least squares reverses the sign of the total effect.

Table 8: Estimated Total Effects from the SAR Model via Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, and the total effect estimates from ML, REML, Best IV, 2SLS, and OLS.]

Table 9: Estimated Total Effects from the SDM Model via Instrumental Variables and Maximum Likelihood
[Table values lost in extraction. Columns: ρ, R², φ, and the total effect estimates from ML, REML, Best IV, 2SLS, and OLS.]

4 Sensitivity to W

The above analysis has always used a contiguity-based weight matrix. A common finding in spatial econometrics is that the structure of W matters in some cases. To investigate this, we examined the case with a SDM DGP where n = 3,000, φ = 0.95, ρ = 0.4, and $R^2$ = 0.9. This is similar to the case used in analysing models with large n in section 3.2. However, the sample size is smaller and the $R^2$ was raised from 0.1 to 0.9 to allow better performance of the IV estimators for this sample size of 3,000 observations.

We examined the aforementioned contiguity weight matrix; weight matrices based on 6, 20, and 30 nearest neighbors (NN-6, NN-20, NN-30); weight matrices where the 6, 20, and 30 diagonals closest to the main diagonal are positive (D-6, D-20, and D-30); a random matrix where approximately a 6/n proportion of the off-diagonal elements are positive (Random); and a block diagonal matrix built from a 50 by 50 submatrix S in which the 6 diagonals nearest the main diagonal are positive (and 0 otherwise), so that the candidate weight matrix equals $I_{60} \otimes S$. All candidate weight matrices were symmetrized and scaled to be doubly stochastic.

Table 10 shows that the weight matrices made a difference in the average estimates of ρ across the 1,000 trials. For all the weight matrices, REML gave answers close to the true parameter value of 0.4. The Best IV estimator also came close to this value, with the most bias appearing for the matrix with 20 positive diagonals (D-20). Interestingly, maximum likelihood showed considerable bias in the case of the D-30 weight matrix, and 2SLS also showed its highest bias in this case. OLS showed its lowest bias in the case of NN-30. Typically, the determinant plays a smaller role the denser the weight matrix and the less clustered neighbors are around the main diagonal.
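One way to build a banded candidate matrix and scale it to be symmetric and doubly stochastic is alternating row and column normalization (Sinkhorn-Knopp iteration), sketched below. The choice of that algorithm, the small dimensions, and the function names are assumptions; the text does not say which scaling procedure was used.

```python
import numpy as np

def banded_weights(n, half_width):
    """0/1 matrix with `half_width` positive diagonals on each side of the main one."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist >= 1) & (dist <= half_width)).astype(float)

def make_doubly_stochastic(A, tol=1e-12, max_iter=100_000):
    """Alternately normalize rows and columns, then symmetrize the result."""
    S = A.copy()
    for _ in range(max_iter):
        S = S / S.sum(axis=1, keepdims=True)    # rows sum to 1
        S = S / S.sum(axis=0, keepdims=True)    # columns sum to 1
        if np.abs(S.sum(axis=1) - 1.0).max() < tol:
            break
    return 0.5 * (S + S.T)

# A "D-6"-style matrix: 3 diagonals on each side of the main diagonal.
A = banded_weights(30, half_width=3)
W = make_doubly_stochastic(A)

# Block diagonal candidate in the spirit of I (x) S, with 4 blocks instead of 60.
W_block = np.kron(np.eye(4), W)
```

The zero diagonal is preserved by the scaling, and the Kronecker product with an identity matrix keeps the block matrix doubly stochastic because each block already is.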
Table 10: Average Estimates of ρ for the SDM Model Across W
[Table values lost in extraction. Columns: ML, REML, BEST, 2SLS, OLS. Rows: Contiguity, NN-6, D-6, NN-20, D-20, NN-30, D-30, Random, Block Diagonal.]

Table 11 shows the RMSE for ρ estimates for the different estimators across the different weight matrices, normalized by the REML RMSE. Interestingly, maximum likelihood performed much worse than REML for some of the weight matrices. In the case of a weight matrix with positive entries in the 15 closest off-diagonals on either side of the main diagonal (D-30), ML had 67.2 percent more RMSE than REML. Almost all of that, however, was bias. The Best IV estimator had RMSE worse than OLS for the D-20, D-30, and Random weight matrices. Two stage least squares had RMSE worse than OLS for the NN-20, D-20, NN-30, D-30, and Random weight matrices.

Table 11: Relative RMSE of ρ for the SDM Model Across W
[Table values lost in extraction. Columns: ML, REML, BEST, 2SLS, OLS. Rows: Contiguity, NN-6, D-6, NN-20, D-20, NN-30, D-30, Random, Block Diagonal.]

5 Conclusion

Much of the research conducted through Monte Carlo experiments in spatial econometrics assumes that the regressors are iid. This is not a realistic assumption given the very high level of measured spatial dependence found in common economic variables. Dependence in the regressors may not be very important in all situations. For example, estimation of regression parameters in error models is unbiased in the presence of spatially dependent disturbances, and this will not change for dependent regressors (although it might affect estimation precision). However, as this manuscript documents, the performance of instrumental variable techniques when estimating SAR and especially SDM models can be sensitive to spatial dependence in the regressors even when using thousands of observations. In previous research, the bias of OLS when estimating SAR models also displayed sensitivity to dependence in the regressors (LeSage and Pace, 2009; Pace and LeSage, 2009). Therefore, we recommend that investigations into the performance of various spatial techniques examine the sensitivity of these techniques to spatial dependence in explanatory variables in conjunction with other spatial features of interest.

References

Anselin, L. (1988). Spatial Econometrics: Methods and Models, Dordrecht: Kluwer Academic.

Barry, R. P. and R. K. Pace (1999). A Monte Carlo Estimator of the Log Determinant of Large Sparse Matrices, Linear Algebra and its Applications, Vol. 289, pp.

Bound, J., D. A. Jaeger and R. Baker (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak, Journal of the American Statistical Association, Vol. 90, pp.

Bowden, R. J. and D. A. Turkington (1984). Instrumental Variables, Cambridge: Cambridge University Press.

Byrne, P. F. (2005). Strategic Interaction and the Adoption of Tax Increment Financing, Regional Science and Urban Economics, Vol. 35, Issue 3, pp.

Cressie, N. (1993). Statistics for Spatial Data, Revised edition, New York: John Wiley.

Kelejian, H. and I. Prucha (1998). A Generalized Spatial Two Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate Finance and Economics, Vol. 17, pp.

Kelejian, H., I. Prucha, and Y. Yuzefovich (2004). Instrumental Variable Estimation of a Spatial Autoregressive Model with Autoregressive Disturbances: Large and Small Sample Results, in J. P. LeSage and R. K. Pace (eds.), Advances in Econometrics, Volume 18: Spatial and Spatiotemporal Econometrics, Oxford: Elsevier Ltd, pp.

Klotz, S. (2004). Cross Sectional Dependence in Spatial Econometrics: With an Application to German Start-Up Activity Data, Münster: LIT Verlag Münster.

Land, K. and G. Deane (1992). On the Large-Sample Estimation of Regression Models with Spatial or Network-Effects Terms: A Two Stage Least Squares Approach, in P. Marsden (ed.), Sociological Methodology, San Francisco: Jossey-Bass, pp.

Lee, L. F. (2003). Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances, Econometric Reviews, Vol. 22, pp.

Lee, L. F. (2007). GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models, Journal of Econometrics, Vol. 137, pp.

LeSage, J. P. and R. K. Pace (2009). Introduction to Spatial Econometrics, Boca Raton: CRC Press/Taylor & Francis.

Millimet, D. L. and V. Rangaprasad (2007). Strategic Competition Amongst Public Schools, Regional Science and Urban Economics, Vol. 37, Issue 2, pp.

Pace, R. K. and J. P. LeSage (2009). Biases of OLS and Spatial Lag Models in the Presence of an Omitted Variable and Spatially Dependent Variables, in Antonio Páez, Julie Gallo, Ron N. Buliung, and Sandy Dall'erba (eds.), Progress in Spatial Analysis: Methods and Applications, Berlin: Springer-Verlag.

Patterson, H. D. and R. Thompson (1971). Recovery of Inter-Block Information when Block Sizes are Unequal, Biometrika, Vol. 58, pp.

Richards, T. J. and L. Padilla (2009). Promotion and Fast Food Demand, American Journal of Agricultural Economics, Vol. 91, Issue 1, pp.

Ripley, B. (1981). Spatial Statistics, New York: Wiley.

Staiger, D. and J. H. Stock (1997). Instrumental Variables Regression with Weak Instruments, Econometrica, Vol. 65, Issue 3, pp.
