Lecture 3: Bayesian estimation of spatial regression models

Lecture 3: Bayesian estimation of spatial regression models

James P. LeSage
University of Toledo, Department of Economics
Toledo, OH
March 2004

Introduction

Applying Bayesian estimation methods to SAR, SDM and SEM spatial regression models with a very large number of observations should produce estimates nearly identical to those from maximum likelihood. This is a standard result in econometrics: prior information is dominated by a large amount of sample information.

Three points:

1) Bayesian methods can solve the problem of inference in maximum likelihood estimation computed using numerical Hessians, which are not always very good.

2) Bayesian methods can relax the assumption of constant-variance normal disturbances made by maximum likelihood methods, resulting in extended models.

3) Bayesian methods can formally solve model comparison problems. We can compare models based on: different weight matrices, different explanatory (X) variables, or different model specifications, e.g., SAR, SDM, SEM.

We can develop Bayesian variants of the SAR, SDM and SEM models that generalize maximum likelihood by allowing for non-constant variance over space.

ALL Bayesian econometric theory in 2 overheads

In econometrics we care about 1) inference about parameters, 2) model comparisons, and 3) prediction.

For random variables A and B, the basic probability rules give:

p(A, B) = p(A|B) p(B)
p(A, B) = p(B|A) p(A)

Setting these two expressions equal and rearranging yields Bayes' rule:

p(B|A) = p(A|B) p(B) / p(A)    (1)

Letting y = A represent model data and θ = B denote model parameters:

p(θ|y) = p(y|θ) p(θ) / p(y)    (2)

With reference to 1) inference: all Bayesian learning/inference about θ given y is based on the posterior p(θ|y). Since p(y) does not involve the parameters θ, our object of interest, we can simplify to:

p(θ|y) ∝ p(y|θ) p(θ)    (3)

Overhead #2

With reference to 2) model comparison, each of m models is characterized by a likelihood function and a prior:

p(θ_i|y, M_i) = p(y|θ_i, M_i) p(θ_i|M_i) / p(y|M_i)    (4)

Using Bayes' rule again to expand terms like p(y|M_i) yields the posterior model probabilities, the basis for inference about different models given the sample data:

p(M_i|y) = p(y|M_i) p(M_i) / p(y)    (5)

p(y|M_i) is called the marginal likelihood, and we can solve for this key quantity needed for model comparison by finding:

p(y|M_i) = ∫ p(y|θ_i, M_i) p(θ_i|M_i) dθ_i    (6)

To avoid dealing with p(y), we can form the posterior odds ratio:

PO_ij = p(M_i|y) / p(M_j|y) = p(y|M_i) p(M_i) / [p(y|M_j) p(M_j)]    (7)

With reference to 3) prediction of y* based on y, the rules of probability give:

p(y*|y) = ∫ p(y*, θ|y) dθ = ∫ p(y*|y, θ) p(θ|y) dθ    (8)
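As a minimal MATLAB sketch of how (5) can be applied in practice, suppose equal prior model probabilities and that the log-marginal likelihoods for each model have already been computed via (6); the numeric values below are assumptions for illustration only:

lmarg = [-1200.4; -1198.7; -1205.1]; % assumed log-marginal likelihoods for 3 models
lmarg = lmarg - max(lmarg);          % subtract the max to avoid underflow in exp()
probs = exp(lmarg)/sum(exp(lmarg));  % posterior model probabilities, equal priors

Subtracting the maximum before exponentiating matters here, because log-marginal likelihoods for large spatial samples are far too negative to exponentiate directly.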

Markov Chain Monte Carlo estimation

Bayesian ordinary least-squares regression (non-spatial example):

y = Xβ + ε,  ε ~ N(0, σ²I_n)    (9)

The parameters to be estimated in (9) are (β, σ), for which we assign a prior density of the form π(β, σ) = π₁(β) π₂(σ), with a normal prior for β and a diffuse prior for σ:

Rβ ~ N(r, T)    (10)
π₁(β) ∝ exp{−(1/2)(Rβ − r)′ T⁻¹ (Rβ − r)}
π₂(σ) ∝ (1/σ)

(Like Theil and Goldberger (1961) regression.) For our purposes it is convenient to express (10) in an alternative (equivalent) form based on a factorization of T⁻¹ into Q′Q = T⁻¹, with q = Qr, leading to (11).

Qβ ~ N(q, I_m)    (11)
π₁(β) ∝ exp{−(1/2)(Qβ − q)′(Qβ − q)}

Following the usual Bayesian methodology, we combine the likelihood function for our simple model:

L(β, σ) ∝ (1/σⁿ) exp[−(y − Xβ)′(y − Xβ)/2σ²]    (12)

with the priors π₁(β) and π₂(σ) to produce the posterior density for (β, σ) shown in (13).

p(β, σ) ∝ (1/σⁿ⁺¹) exp{−(1/2)[β − β̂(σ)]′ [V(σ)]⁻¹ [β − β̂(σ)]}
β̂(σ) = (X′X + σ²Q′Q)⁻¹(X′y + σ²Q′q)
V(σ) = σ²(X′X + σ²Q′Q)⁻¹    (13)

In (13), we have used the notation β̂(σ) to convey that the mean of the posterior, β̂, is conditional on the parameter σ, as is the variance, denoted V(σ). This conditioning on the single parameter σ prevents an analytical solution of the Bayesian regression problem.

In order to overcome this problem, Theil and Goldberger (1961) observed that conditional on σ, the posterior density for β is multivariate normal. They proposed that σ² be replaced by an estimated value, σ̂², based on the least-squares estimates β̂. Their solution produces a point estimate, which we label β̂_TG, and an associated variance-covariance estimate, both shown in (14):

β̂_TG = (X′X + σ̂²Q′Q)⁻¹(X′y + σ̂²Q′q)
var(β̂_TG) = σ̂²(X′X + σ̂²Q′Q)⁻¹    (14)

The advantage of this solution is that the estimation problem can be solved using existing least-squares regression software.
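A minimal MATLAB sketch of (14), assuming y (n x 1), x (n x k) and the prior quantities Q and q are already in memory:

bols = (x'*x)\(x'*y);                             % least-squares estimates
sig2 = (y - x*bols)'*(y - x*bols)/(n - k);        % estimated sigma^2
btg  = (x'*x + sig2*(Q'*Q))\(x'*y + sig2*(Q'*q)); % Theil-Goldberger point estimate
vtg  = sig2*inv(x'*x + sig2*(Q'*Q));              % variance-covariance estimate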

An MCMC Sampling solution

The Gibbs sampler provides a way to sample from a multivariate probability density based only on the densities of subsets of the parameter vector conditional on all others. A two-step Gibbs sampler for the posterior distribution (13) of our Bayesian regression model alternates between the distribution of β conditional on σ and the distribution of σ conditional on β.

For our regression problem, the posterior density for β conditional on σ, p(β|σ), is multivariate normal with mean and variance as indicated in (15):

p(β|σ) ~ N[β̂(σ), V(σ)]    (15)
β̂(σ) = (X′X + σ²Q′Q)⁻¹(X′y + σ²Q′q)
V(σ) = σ²(X′X + σ²Q′Q)⁻¹

The posterior density for σ conditional on β, p(σ|β), is based on:

(y − Xβ)′(y − Xβ)/σ² | β ~ χ²(n)    (16)

so that σ² can be drawn as (e′e)/χ²(n), where e = y − Xβ.

The Gibbs sampler suggested by these two conditional posterior distributions involves the following computations:

1. Begin with arbitrary values for the parameters, which we designate β⁰ and σ⁰.
2. Compute the mean and variance of β from (15) conditional on the initial value σ⁰.
3. Use the computed mean and variance of β to draw a multivariate normal random vector, which we label β¹.
4. Use the value β¹ along with a random χ²(n) draw to determine σ¹ using (16).

These four steps are known as a single pass through our (two-step) Gibbs sampler, where we have replaced the initial arbitrary values β⁰ and σ⁰ with new values labeled β¹ and σ¹. We now return to step 2 using the new values β¹ and σ¹ in place of the initial values, and make another pass through the sampler. This produces a new set of values, β² and σ².

An Example

% A simple Gibbs sampler
n = 100; k = 3;                      % set nobs and nvars
x = randn(n,k);                      % generate data set
b = ones(k,1);
y = x*b + randn(n,1);
r = ones(k,1);                       % prior means (assumed values)
R = eye(k); T = eye(k);              % prior variance
Q = chol(inv(T)); q = Q*r;           % convenience transform
QpQ = Q'*Q; Qpq = Q'*q;              % precompute prior cross-products
b0 = (x'*x)\(x'*y);                  % use ols as initial values
sige = (y - x*b0)'*(y - x*b0)/(n-k);
ndraw = 6000; nomit = 1000;          % set the number of draws
bsave = zeros(ndraw,k);              % allocate storage for results
ssave = zeros(ndraw,1);
xpx = x'*x; xpy = x'*y;              % compute these before sampling
tic;
for i = 1:ndraw;                     % Start the sampling
  xpxi = inv(xpx + sige*QpQ);        % update b
  b = xpxi*(xpy + sige*Qpq);
  b = norm_rnd(sige*xpxi) + b;       % draw from the conditional for beta
  bsave(i,:) = b';
  e = y - x*b;                       % update sige
  chi = chis_rnd(1,n);
  sige = (e'*e)/chi;
  ssave(i,1) = sige;
end;                                 % End the sampling
toc;
bhat = mean(bsave(nomit+1:ndraw,:)); % calculate means
bstd = std(bsave(nomit+1:ndraw,:));  % and std deviations
shat = mean(ssave(nomit+1:ndraw,1));
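The functions norm_rnd and chis_rnd are drawing routines from LeSage's Econometrics Toolbox. If the toolbox is not on the path, minimal stand-ins (a sketch, not the toolbox originals; each function would live in its own .m file) could look like:

function y = norm_rnd(V)
% draw a zero-mean multivariate normal vector with covariance matrix V
y = chol(V)'*randn(size(V,1),1);

function x = chis_rnd(ndraws,df)
% ndraws independent chi-squared(df) draws; valid for integer df
x = sum(randn(df,ndraws).^2,1)';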

Results

Gibbs estimates (1.22 seconds for 6,000 draws): coefficient, t-statistic and t-probability reported for each of the three variables.

Theil-Goldberger Regression Estimates: R-squared, Rbar-squared, sigma^2 and Durbin-Watson statistics; Nobs, Nvars = 100, 3; the prior mean and standard deviation for each variable; and posterior estimates (coefficient, t-statistic, t-probability) for each of the three variables.

Bayesian spatial models

The setup is similar to the least-squares model we already did. The parameters are β, σ, ρ. We need the conditional distributions for each of these, and then we can develop a sampler. We use a normal-gamma conjugate prior for β and σ, and a uniform prior for ρ. The prior distributions are indicated using π.

y = ρWy + Xβ + ε
ε ~ N(0, σ²I_n)
π(β) ~ N(c, T)
π(1/σ²) ~ Γ(d, ν)
π(ρ) ~ U[0, 1]    (17)

In the case of large samples involving 3,000 observations, the normal-gamma priors for β, σ should exert relatively little influence. Setting c to zero and T to a very large number results in a diffuse prior for β. Diffuse settings for σ involve setting d = 0, ν = 0. For completeness, we develop the results for the case of a normal-gamma prior on β, σ.

In contrast, assigning an informative prior to the parameter ρ associated with spatial dependence should exert an impact on the estimation outcomes even in large samples. This is due to the important role played by spatial dependence in these models. In typical applications, where the magnitude and significance of ρ is a subject of interest, a diffuse prior would be used. It is however possible to rely on an informative prior for this parameter.

The parameters β, σ and ρ in the SAR model can be estimated by drawing sequentially from the conditional distributions of these parameters.
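A sketch of these diffuse prior settings in MATLAB (1e12 is just an arbitrarily large value standing in for "a very large number"):

c = zeros(k,1);   % prior mean for beta set to zero
T = eye(k)*1e12;  % very large prior variance => effectively diffuse prior on beta
d = 0; nu = 0;    % diffuse settings for sigma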

Conditional distributions

To implement this estimation method, we need to determine the conditional distributions for each parameter in our Bayesian SAR model. The conditional distribution for β follows from the maximum likelihood model:

p(β|ρ, σ) ~ N(b, B)    (18)
b = A(X′Sy + σ²T⁻¹c)
B = σ²A
A = (X′X + σ²T⁻¹)⁻¹
S = (I_n − ρW)

We see that the conditional for β is a multivariate normal distribution from which it is easy to sample a vector β. The conditional distribution for σ given the other parameters takes the form (see Gelman, Carlin, Stern and Rubin, 1995):

p(σ²|β, ρ) ∝ (σ²)^(−(n/2 + d + 1)) exp[−(e′e + 2ν)/(2σ²)]
e = (I_n − ρW)y − Xβ
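A sketch of the β draw implied by (18), assuming W, the current values of rho and sige, the prior quantities c and T, and the norm_rnd stand-in from above:

S  = speye(n) - rho*W;         % S = I_n - rho*W (sparse identity for large n)
Ti = inv(T);                   % prior precision
A  = inv(x'*x + sige*Ti);      % A from (18)
b  = A*(x'*(S*y) + sige*Ti*c); % conditional posterior mean
beta = norm_rnd(sige*A) + b;   % draw from N(b, sige*A)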

which is proportional to an inverse gamma distribution with parameters (n/2) + d and e′e + 2ν. Again, this would be an easy distribution from which to sample a scalar value for σ. Finally, the conditional posterior distribution of ρ takes the form:

p(ρ|β, σ) ∝ |S(ρ)| (s²(ρ))^(−(n−k)/2) π(ρ)    (19)
s²(ρ) = (Sy − Xb(ρ))′(Sy − Xb(ρ))/(n − k)
S = (I_n − ρW)

A problem arises here in that this distribution is not one for which established algorithms exist to produce random draws. There are however two ways to sample from this conditional distribution:

1) Use a Metropolis-Hastings algorithm.
2) Use a griddy Gibbs sampler, with univariate numerical integration to find the conditional posterior distribution, followed by a draw using inversion.

The MCMC sampler

By way of summary, an MCMC estimation scheme involves starting with arbitrary initial values for the parameters, which we denote β⁰, σ⁰, V⁰, ρ⁰. We then sample sequentially from the following set of conditional distributions for the parameters in our model.

p(β|σ⁰, ρ⁰), which is a multivariate normal distribution with mean and variance defined in (18). This updated value for the parameter vector β we label β¹.

p(σ|β¹, ρ⁰), which is chi-squared distributed with n + 2d degrees of freedom. Note that we rely on the updated value of the parameter vector β = β¹ when evaluating this conditional density. We label the updated parameter σ = σ¹, and note that we will continue to employ the updated values of previously sampled parameters when evaluating the next conditional densities in the sequence.

p(ρ|β¹, σ¹), which we could sample using a Metropolis step or univariate numerical integration and inversion.

Sampling for ρ using griddy Gibbs

This is a recent discovery of mine:

p(ρ|y, X, W) ∝ |S| (s²)^(−(n−k)/2) π(ρ)    (20)

We work with the log of the expression in (20), and construct a vector associated with a grid of q values for ρ in the feasible interval that takes the form in (21):

[ln p(ρ₁) ]    [ln|S(ρ₁)| ]               [ln(s²(ρ₁)) ]
[ln p(ρ₂) ] ∝  [ln|S(ρ₂)| ]  − ((n−k)/2)  [ln(s²(ρ₂)) ]    (21)
[    ⋮    ]    [     ⋮    ]               [     ⋮     ]
[ln p(ρ_q)]    [ln|S(ρ_q)|]               [ln(s²(ρ_q))]

Where:

s²(ρᵢ) = e_o′e_o − 2ρᵢ e_d′e_o + ρᵢ² e_d′e_d
e = e_o − ρ e_d
e_o = y − Xβ_o
e_d = Wy − Xβ_d

β_o = (X′X)⁻¹X′y
β_d = (X′X)⁻¹X′Wy    (22)

This vector allows univariate numerical integration using a simple method such as Simpson's rule. For this model, you can compute the log-determinant once, over the grid of ρ values, before you begin sampling.
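A minimal sketch of the griddy draw for ρ under these expressions, assuming a precomputed grid rgrid (q x 1) of ρ values in the feasible interval and a matching vector lndet (q x 1) of log-determinants ln|I_n − ρW| computed once before sampling; a simple cumulative sum stands in for Simpson's rule:

bo = (x'*x)\(x'*y);      % beta_o from (22)
bd = (x'*x)\(x'*(W*y));  % beta_d from (22)
eo = y - x*bo;  ed = W*y - x*bd;
s2  = (eo'*eo) - 2*(ed'*eo)*rgrid + (ed'*ed)*rgrid.^2; % s^2(rho) over the grid
lnp = lndet - ((n-k)/2)*log(s2);   % log conditional kernel from (21)
lnp = lnp - max(lnp);              % guard against underflow
cdf = cumsum(exp(lnp));            % crude numerical integration over the grid
cdf = cdf/cdf(end);                % normalized CDF
rho = rgrid(find(cdf >= rand, 1)); % inversion: first grid point past a uniform draw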

Computational issues using griddy Gibbs

There are four separate terms involved in the univariate integration problem over the range of support for the parameter ρ. These are shown in (23) as T₁, T₂, T₃ and T₄. In addition, there are constants of proportionality that arise during the analytical integration of β and σ that include a Gamma function involving n − k, where n denotes the number of observations and k the number of explanatory variables. For cases involving model comparisons where the number of explanatory variables k varies, we must also include this expression during the solution of our integration problem; we will refer to it as T₀.

T₁(ρ) = |I_n − ρW|    (23)
T₂ = |X′X|^(−1/2)
T₃(ρ) = s²(ρ)^(−(n−k)/2)
s²(ρ) = [(I_n − ρW)y − Xβ]′[(I_n − ρW)y − Xβ]
T₄(ρ) = [1/Be(α, α)] (1 + ρ)^(α−1) (1 − ρ)^(α−1) / 2^(2α−1)
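Continuing the grid sketch above, the prior term T₄ can be evaluated over the grid and folded into the log kernel (α = 1.01 is just an assumed hyperparameter value; beta() is MATLAB's beta function):

a   = 1.01;  % assumed value for the beta prior hyperparameter alpha
T4  = (1/beta(a,a)) * ((1+rgrid).^(a-1)) .* ((1-rgrid).^(a-1)) / 2^(2*a-1);
lnp = lndet - ((n-k)/2)*log(s2) + log(T4); % grid kernel with the rho prior included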

Figure 1: Griddy Gibbs sampling for ρ — the cumulative marginal distribution for ρ maps uniform (0,1) values into ρ draws.

Applied examples

load elect.dat;                  % load data on votes
y  = log(elect(:,7)./elect(:,8));
x1 = log(elect(:,9)./elect(:,8));
x2 = log(elect(:,10)./elect(:,8));
x3 = log(elect(:,11)./elect(:,8));
latt = elect(:,5); long = elect(:,6);
n = length(y);
x = [ones(n,1) x1 x2 x3];
n = 3107;
[junk W junk] = xy2cont(latt,long); % construct the spatial weight matrix
vnames = strvcat('voters','const','educ','homeowners','income');
result = sar(y,x,W);             % maximum likelihood estimates
prt(result,vnames);
ndraw = 2500; nomit = 500;
prior.novi = 1;                  % homoscedastic model
result2 = sar_g(y,x,W,ndraw,nomit,prior); % MCMC estimation
result2.tflag = 'tstat';
prt(result2,vnames);
plt(result2);
result = sdm(y,x,W);             % maximum likelihood estimates
prt(result,vnames);
ndraw = 2500; nomit = 500;
prior.novi = 1;                  % homoscedastic model
result2 = sdm_g(y,x,W,ndraw,nomit,prior); % MCMC estimation
result2.tflag = 'tstat';
prt(result2,vnames);

Results

sar: hessian not positive definite, augmenting small eigenvalues

Spatial autoregressive Model Estimates (maximum likelihood)
Dependent Variable = voters
Nobs, Nvars = 3107, 4; # of iterations = 12
Reported: R-squared, Rbar-squared, sigma^2, log-likelihood, min and max rho, total time, time for lndet and for t-stats
Pace and Barry, 1999 MC lndet approximation used (order = 50, iter = 30)
Coefficient, asymptotic t-statistic and z-probability reported for const, educ, homeowners, income, and rho.

Bayesian spatial autoregressive model (homoscedastic version)
Dependent Variable = voters
Nobs, Nvars = 3107, 4; ndraws, nomit = 2500, 500
Reported: R-squared, Rbar-squared, mean of sige draws, total time, time for lndet and for sampling
Pace and Barry, 1999 MC lndet approximation used (order = 50, iter = 30)
Numerical integration used for rho; min and max rho reported
Posterior estimates: coefficient, asymptotic t-statistic and z-probability for const, educ, homeowners, income, and rho.

sar: hessian not positive definite, augmenting small eigenvalues

Spatial Durbin model (maximum likelihood)
Dependent Variable = voters
Nobs, Nvars = 3107, 4; # iterations = 17
Reported: R-squared, Rbar-squared, sigma^2, log-likelihood, min and max rho, total time, time for lndet and for t-stats
Pace and Barry, 1999 MC lndet approximation used (order = 50, iter = 30)
Coefficient, asymptotic t-statistic and z-probability reported for const, educ, homeowners, income, W-educ, W-homeowners, W-income, and rho.

Bayesian Spatial Durbin model (homoscedastic version)
Dependent Variable = voters
Nobs, Nvars = 3107, 7; ndraws, nomit = 2500, 500
Reported: R-squared, mean of sige draws, total time, time for lndet and for sampling
Pace and Barry, 1999 MC lndet approximation used (order = 50, iter = 30)
Numerical integration used for rho; min and max rho reported
Coefficient, asymptotic t-statistic and z-probability reported for const, educ, homeowners, income, W-educ, W-homeowners, W-income, and rho.

Using plt(results)

Figure 2: Plotting the results structure (SAR heteroscedastic Gibbs) — panels show actual vs. predicted values, the residuals, the mean of the V_i draws, and the posterior density for ρ.

Bayesian heteroscedastic/robust spatial models

We introduce a more general version of the SAR, SDM and SEM models that allows for non-constant variance across space as well as outliers. When dealing with spatial datasets one can encounter what have become known as enclave effects, where a particular region does not follow the same relationship as the majority of spatial observations. For example, all counties in a single state might represent aberrant observations that differ from those in all other counties. This leads to fat-tailed errors that are not normal, but more likely to follow a Student-t distribution.

We introduce a set of variance scalars (v₁, v₂, ..., v_n) as unknown parameters that need to be estimated. This allows us to assume ε ~ N(0, σ²V), where V = diag(v₁, v₂, ..., v_n). The prior distribution for the v_i terms takes the form of an independent χ²(r)/r distribution. Recall that the χ² distribution is a single-parameter distribution, where we have represented this parameter as r. This allows us to estimate the additional n parameters v_i in the model by adding only the single parameter r to our estimation procedure.

Figure 3: Prior v_i distributions for various values of r (r = 2, 5, 20), plotting the prior probability density against v_i values.

The prior mean of the v_i equals unity and the prior variance of the v_i is 2/r. This implies that as r becomes very large, the terms v_i will all approach unity, resulting in V = I_n, the traditional assumption of constant variance across space. On the other hand, small values of r lead to a skewed distribution that permits large values of v_i that deviate greatly from the prior mean of unity. The role of these large v_i values is to accommodate outliers or observations containing large variances by downweighting these observations.
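A quick simulation sketch of this prior (r = 5 is just an assumed illustration value). Strictly, the prior is placed on the reciprocals, 1/v_i ~ χ²(r)/r, which has mean unity and variance 2/r:

r   = 5;                            % assumed hyperparameter value
ivi = sum(randn(r,100000).^2,1)'/r; % draws of 1/v_i ~ chi^2(r)/r
[mean(ivi) var(ivi)]                % close to [1  2/r]
vi  = 1./ivi;                       % the implied variance scalars v_i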

Note that ε ~ N(0, σ²V), with V diagonal, implies a generalized least-squares (GLS) correction to the vector y and the explanatory variables matrix X. The GLS correction involves dividing each observation through by √v_i, so that large v_i values function to downweight those observations. Even in large samples, this prior will exert an impact on the estimation outcome.

Bayesian heteroscedastic SAR model

We use a normal-gamma conjugate prior for β and σ, and a uniform prior for ρ, in addition to the chi-squared prior for the terms in V. The prior distributions are indicated using π.

y = ρWy + Xβ + ε
ε ~ N(0, σ²V)
V = diag(v₁, ..., v_n)
π(β) ~ N(c, T)
π(r/v_i) ~ IID χ²(r)
π(1/σ²) ~ Γ(d, ν)
π(ρ) ~ U[0, 1]    (24)

Estimation of Bayesian spatial models

An unfortunate complication that arises with this extension is that the addition of the chi-squared prior creates a complicated posterior distribution. Assume for the moment diffuse priors for β, σ. A key insight is that if we knew V, this problem would look like a GLS version of the SAR model. That is, conditional on V, we would arrive at expressions similar to those from maximum likelihood, where y and X are transformed by dividing through by √diag(V). We rely on a Markov Chain Monte Carlo (MCMC) estimation method that exploits this fact. The parameters β, V and σ in the heteroscedastic SAR model can be estimated by drawing sequentially from the conditional distributions of these parameters.
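A sketch of that conditional GLS transform, given a current n x 1 vector vi of variance scalars (repmat is used so the sketch runs on older MATLAB without implicit expansion):

sv  = sqrt(vi);            % square roots of the variance scalars
ys  = y ./ sv;             % transformed dependent variable
Wys = (W*y) ./ sv;         % transformed spatial lag
xs  = x ./ repmat(sv,1,k); % transformed explanatory variables
% conditional on V, (ys, Wys, xs) define a homoscedastic SAR problem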

Conditional distributions

To implement this estimation method, we need to determine the conditional distributions for each parameter in our Bayesian heteroscedastic SAR model. The conditional distribution for β follows from the insight that, given V, we can rely on standard Bayesian GLS regression results to show that:

p(β|ρ, σ, V) ~ N(b, B)    (25)
b = A(X′V⁻¹Sy + σ²T⁻¹c)
B = σ²A
A = (X′V⁻¹X + σ²T⁻¹)⁻¹

We see that the conditional for β is a multivariate normal distribution from which it is easy to sample a vector β. The conditional distribution for σ given the other parameters takes the form (see Gelman, Carlin, Stern and Rubin, 1995):

p(σ²|β, ρ, V) ∝ (σ²)^(−(n/2 + d + 1)) exp[−(e′V⁻¹e + 2ν)/(2σ²)]
e = (I_n − ρW)y − Xβ

which is proportional to an inverse gamma distribution with parameters (n/2) + d and e′V⁻¹e + 2ν. Again, this would be an easy distribution from which to sample a scalar value for σ. Geweke (1993) shows that the conditional distribution of V given the other parameters is proportional to a chi-square density with r + 1

degrees of freedom. Specifically, we can express the conditional posterior of each v_i as:

[(e_i²/σ²) + r]/v_i | (β, ρ, σ², v_{−i}) ~ χ²(r + 1)    (26)

where v_{−i} = (v₁, ..., v_{i−1}, v_{i+1}, ..., v_n) for each i, and e is as defined above. Again, this represents a known distribution from which it is easy to construct a scalar draw. Finally, the conditional posterior distribution of ρ takes the form:

p(ρ|β, σ, V) ∝ |S(ρ)| (s²(ρ))^(−(n−k)/2) π(ρ)    (27)
s²(ρ) = (Sy − Xb(ρ))′V⁻¹(Sy − Xb(ρ))/(n − k)

Again, we can use either Metropolis-Hastings or univariate numerical integration to draw ρ values.

Example code

% note: in = ones(n,1); V holds the reciprocals 1./vi
while (iter <= ndraw);            % ============> start sampling
  xs  = matmul(x,sqrt(V));        % ============> update beta
  ys  = sqrt(V).*y;               %   scaling by sqrt(V) divides through by sqrt(vi)
  Wys = sqrt(V).*Wy;
  AI  = inv(xs'*xs + sige*TI);    %   TI = inv(T), the prior precision
  yss = ys - rho*Wys;
  b   = xs'*yss + sige*TIc;       %   TIc = TI*c, precomputed
  b0  = AI*b;
  bhat = norm_rnd(sige*AI) + b0;
  xb  = xs*bhat;
  nu1 = n + 2*nu;                 % ============> update sige
  e   = (yss - xb);
  d1  = 2*d0 + e'*e;
  chi = chis_rnd(1,nu1);
  sige = d1/chi;
  ev  = y - rho*Wy - x*bhat;      % ============> update vi, using the
                                  %   untransformed residuals, as in (26)
  chiv = chis_rnd(n,rval+1);
  vi  = ((ev.*ev/sige) + in*rval)./chiv;
  V   = in./vi;
  b0  = AI*xs'*ys;                % ============> update rho
  bd  = AI*xs'*Wys;
  e0  = ys - xs*b0;
  ed  = Wys - xs*bd;
  epe0 = e0'*e0; eped = ed'*ed; epe0d = ed'*e0;
  rho = draw_rho(detval,epe0,eped,epe0d,n,k,rho);
  bsave(iter,1:k) = bhat';        % ============> save draws
  ssave(iter,1) = sige;
  psave(iter,1) = rho;
  vmean = vmean + vi;
  iter = iter + 1;
end;                              % ============> end sampling

Applied Example

To illustrate the heteroscedastic Bayesian SAR model, we generate data with outliers:

y = (I_n − ρW)⁻¹Xβ + (I_n − ρW)⁻¹ε    (28)

load anselin.dat;                % 49 observations from Columbus, OH
latt = anselin(:,4); long = anselin(:,5);
% create W-matrix using Anselin's neighborhood crime data set
[junk W junk] = xy2cont(latt,long);
[n junk] = size(W); k = 3; IN = eye(n);
sige = 0.1; rho = 0.7;           % true values for sige and rho
x = randn(n,k);                  % generate random normal X-matrix
beta = ones(k,1);                % true parameter values
ndraw = 2500;                    % # of draws to carry out
nomit = 500;                     % # of draws to exclude for burn-in
% generate y based on the SAR model, inflating the error variance
% at observations 30, 35 and 40 to create outliers
out = ones(n,1); out(30,1) = 10; out(35,1) = 10; out(40,1) = 10;
y = inv(IN - rho*W)*x*beta + inv(IN - rho*W)*(randn(n,1).*out);
% estimate maximum likelihood model
result = sar(y,x,W);
prt(result);
prior.rval = 4;                  % heteroscedastic prior
result2 = sar_g(y,x,W,ndraw,nomit,prior);
prt(result2);

Results

Spatial autoregressive Model Estimates (maximum likelihood)
Nobs, Nvars = 49, 3; # of iterations = 16
Reported: R-squared, Rbar-squared, sigma^2, log-likelihood, min and max rho, total time, time for lndet and for t-stats
Pace and Barry, 1999 MC lndet approximation used (order = 50, iter = 30)
Coefficient, asymptotic t-statistic and z-probability reported for the three variables and rho.

Results

Bayesian spatial autoregressive model (heteroscedastic model)
r-value = 4; Nobs, Nvars = 49, 3; ndraws, nomit = 2500, 500
Reported: R-squared, sigma^2, total time, time for lndet and for sampling
Pace and Barry, 1999 MC lndet approximation used (order = 50, iter = 30)
Numerical integration used for rho; min and max rho reported
Posterior estimates: coefficient, standard deviation and p-level for the three variables and rho.

Figure 4: Maximum likelihood actual vs. predicted values and residuals for the SAR model.

Figure 5: Posterior means of the v_i estimates, plotted by observation.
