Quasi Maximum Likelihood Estimation of Spatial Models with Heterogeneous Coefficients


Michele Aquaro, Natalia Bailey, M. Hashem Pesaran

CESifo Working Paper No. 5428
Category: Empirical and Theoretical Methods
June 2015

An electronic version of the paper may be downloaded from the SSRN, RePEc, and CESifo websites.

Abstract

This paper considers spatial autoregressive panel data models and extends their analysis to the case where the spatial coefficients differ across the spatial units. It derives conditions under which the spatial coefficients are identified and develops a quasi maximum likelihood (QML) estimation procedure. Under certain regularity conditions, it is shown that the QML estimators of individual spatial coefficients are consistent and asymptotically normally distributed when both the time and cross section dimensions of the panel are large. It derives the asymptotic covariance matrix of the QML estimators allowing for the possibility of non-Gaussian error processes. Small sample properties of the proposed estimators are investigated by Monte Carlo simulations for Gaussian and non-Gaussian errors, and with spatial weight matrices of differing degrees of sparseness. The simulation results are in line with the paper's key theoretical findings and show that the QML estimators have satisfactory small sample properties for panels with moderate time dimensions, irrespective of the number of cross section units in the panel, under certain sparsity conditions on the spatial weight matrix.

JEL-Code: C21, C23.

Keywords: spatial panel data models, heterogeneous spatial lag coefficients, identification, quasi maximum likelihood (QML) estimators, non-Gaussian errors.

Michele Aquaro, University of Warwick, Coventry, CV4 7AL, United Kingdom, M.Aquaro@warwick.ac.uk
Natalia Bailey, Queen Mary, University of London, London E1 4NS, United Kingdom, n.bailey@qmul.ac.uk
M. Hashem Pesaran, University of Southern California, Los Angeles, CA, USA, pesaran@usc.edu

June 2015

The authors would like to acknowledge helpful comments from Bernard Fingleton, Harry Kelejian, Ron Smith and Cynthia Fan Yang. Financial support under ESRC Grant ES/I031626/1 is also gratefully acknowledged.

1 Introduction

Following the pioneering contributions of Whittle (1954) and Cliff and Ord (1973), important advances have been made in the analysis of spatial models. The original maximum likelihood approach of Cliff and Ord, developed for a large number of spatial units (N) observed at a point in time or over a given time interval, has been extended to cover panel data models with fixed effects and dynamics. Other estimation and testing techniques, such as the generalised method of moments (GMM), have also been proposed. Some of the key references to the literature include Upton and Fingleton (1985), Anselin (1988), Cressie (1993), Kelejian and Robinson (1993), Ord and Getis (1995), Anselin and Bera (1998), and more recently, Haining (2003), Lee (2004), Kelejian and Prucha (1999), Kelejian and Prucha, Lin and Lee, Lee and Yu, LeSage and Pace, Arbia, Cressie and Wikle, and Elhorst (2014). Extensions to dynamic panels are provided by Anselin, Baltagi et al. (2003), Kapoor et al. (2007), Baltagi et al. (2007), and Yu et al. (2008).

One important feature of the above contributions is that they all assume that, except for unit-specific effects, all other parameters, including the spatial lag coefficients, are homogeneous. This assumption might be needed in the case of pure spatial models or spatial panel data models with a short time dimension, but with the increasing availability of large panel data sets where N and T are both reasonably large it seems desirable to allow the spatial lag coefficients to vary across the spatial units. Examples of such data sets include large panels that cover regions, counties, states, or countries in the analysis of economic variables such as house prices, real wages, employment and income. For instance, in the empirical applications by Baltagi and Levin (1986) on demand for tobacco consumption, and by Holly et al. on house price diffusion across States in the US, the maintained assumption that spillover effects from neighbouring States are the same across all the 48 mainland States seems unduly restrictive, particularly considering the large size of the US and the uneven distribution of economic activity across it.

This paper considers spatial autoregressive panel data models and extends their analysis to the case where the spatial coefficients differ across the spatial units. It derives conditions under which the spatial coefficients are globally and locally identified and proposes a quasi maximum likelihood (QML) estimation procedure for estimation and inference. It shows that the QML estimators are consistent and asymptotically normal under general regularity conditions when both the time and cross section dimensions of the panel are large. Asymptotic covariance matrices of the QML estimators are derived under Gaussian errors as well as when the errors are non-Gaussian, and consistent estimators of these covariance matrices are proposed for inference. The pure spatial model is further extended to include exogenous regressors, allowing the slope coefficients of the regressors to vary over the cross section units. The model allows for spatial dependence directly through contemporaneous dependence of individual units on their neighbours, and indirectly through possible cross-sectional dependence in the regressors.

The small sample performance of the QML estimators is investigated by Monte Carlo simulations for different choices of the spatial weight matrices. The simulation results are in line with the paper's key theoretical findings, and show that the proposed estimators have good small sample properties for panels with moderate time dimensions, irrespective of the number of cross section units in the panel, although under non-Gaussian errors, tests based on QML estimators of the spatial parameters can be slightly distorted when the time dimension is relatively small.

The rest of the paper is organised as follows. Section 2 sets up the first order spatial autoregressive model with heterogeneous coefficients (HSAR), formulates the assumptions, and derives the log-likelihood function. The identification problem is discussed in Section 3, followed by an account of consistency and asymptotic normality of the QML estimator in Sections 4 and 5, respectively. Section 6 considers the inclusion of heteroskedastic error variances and exogenous regressors in the HSAR model. Section 7 outlines the Monte Carlo design and reports small sample results (bias, root mean square errors, size and power) for different parameter values and sample size combinations. Some concluding remarks are provided in Section 8.

Notation: We denote the largest and the smallest eigenvalues of the $N \times N$ matrix $A = (a_{ij})$ by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$, respectively, its trace by $\operatorname{tr}(A) = \sum_{i=1}^{N} a_{ii}$, its maximum absolute column sum norm by $\|A\|_1 = \max_{1 \le j \le N} \sum_{i=1}^{N} |a_{ij}|$, and its maximum absolute row sum norm by $\|A\|_\infty = \max_{1 \le i \le N} \sum_{j=1}^{N} |a_{ij}|$. $\odot$ stands for the Hadamard (element-wise) matrix product operator, $\to_p$ denotes convergence in probability, and $\to_d$ convergence in distribution. All asymptotics are carried out for a given $N$ and as $T \to \infty$. $K$ and $\epsilon$ will be used to denote finite large and non-zero small positive numbers, respectively.

2 A heterogeneous spatial autoregressive model (HSAR)

The standard first-order spatial autoregressive panel data model is given by (see, for example, Anselin (1988))

$$ y_{it} = \psi \sum_{j=1}^{N} w_{ij} y_{jt} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \; t = 1, 2, \ldots, T, \tag{1} $$

where the scalar $\psi$ is the spatial autoregressive parameter, assumed to be the same over all cross section units. Further, $w_i' y_t = \sum_{j=1}^{N} w_{ij} y_{jt}$, where $w_i = (w_{i1}, w_{i2}, \ldots, w_{iN})'$ with $w_{ii} = 0$, and $y_t = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$. Here $w_i$ denotes an $N \times 1$ non-stochastic vector which is determined a priori. In its non-normalised form, it comprises binary elements taking values of one or zero depending on whether unit $i$ is connected with unit $j$ under a suitably defined distance metric, for $i \neq j$, $i, j = 1, 2, \ldots, N$. Stacking the observations by the $N$ individual units, (1) becomes

$$ y_t = \psi W y_t + \varepsilon_t, \quad t = 1, 2, \ldots, T, \tag{2} $$

where $W = (w_{ij})$, $i, j = 1, 2, \ldots, N$, is an $N \times N$ network or spatial weight matrix that characterises all the connections, and $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$. A heterogeneous version of (1) can be written as

$$ y_{it} = \psi_i \sum_{j=1}^{N} w_{ij} y_{jt} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \; t = 1, 2, \ldots, T. \tag{3} $$

Initially, we assume that the error variances are homoskedastic and set $\sigma_i^2 = \sigma^2$ for all $i$, so that we focus on the heterogeneity of $\psi_i$. Extensions of the model to the case where the errors are heteroskedastic and (3) includes exogenous regressors are considered in Section 6.
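As an illustration of the data generating process in (3), the following minimal sketch draws a panel from the stacked form $(I_N - \Psi W) y_t = \varepsilon_t$. The function name `simulate_hsar`, the ring-shaped weight matrix, the parameter values and the Gaussian errors are all assumptions made for the example; they are not taken from the paper's Monte Carlo design. The sketch also assumes that $I_N - \Psi W$ is invertible, which holds here because every $|\psi_i|$ is below one and the rows of $W$ sum to one.

```python
import numpy as np

def simulate_hsar(W, psi, sigma2, T, rng):
    """Draw T cross sections from (I_N - Psi W) y_t = eps_t with Gaussian errors."""
    N = W.shape[0]
    S = np.eye(N) - np.diag(psi) @ W            # S(psi) = I_N - Psi W
    eps = rng.normal(scale=np.sqrt(sigma2), size=(N, T))
    Y = np.linalg.solve(S, eps)                 # column t holds y_t = S^{-1} eps_t
    return Y, eps

# Hypothetical example: N = 4 units on a ring, two neighbours each, rows normalised.
N, T = 4, 50
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5

psi = np.array([0.2, 0.5, -0.3, 0.7])           # unit-specific spatial coefficients
rng = np.random.default_rng(0)
Y, eps = simulate_hsar(W, psi, sigma2=1.0, T=T, rng=rng)

# Unit-by-unit check of (3): y_it - psi_i * sum_j w_ij y_jt reproduces eps_it.
resid = Y - psi[:, None] * (W @ Y)
```

Solving the linear system rather than forming the inverse explicitly is a numerical convenience only; both give the same $y_t$.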

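Again, stacking the observations on individual units for each time period, $t$, we have

$$ (I_N - \Psi W) y_t = \varepsilon_t, \quad t = 1, 2, \ldots, T, \tag{4} $$

where $\Psi = \operatorname{diag}(\psi)$, $\psi = (\psi_1, \psi_2, \ldots, \psi_N)'$, and $I_N$ is an $N \times N$ identity matrix. For each $i = 1, 2, \ldots, N$, the true value of $\psi_i$ will be denoted by $\psi_{i0}$, and accordingly $\Psi_0$ denotes the value of $\Psi$ evaluated at the true values, $\psi_0 = (\psi_{10}, \psi_{20}, \ldots, \psi_{N0})'$. In what follows, we also make use of the notations $S(\psi) = I_N - \Psi W$ and $S_0 = I_N - \Psi_0 W$. For the analysis of identification and estimation of the heterogeneous spatial autoregressive (HSAR) model, we adopt the following assumptions:

Assumption 1 The $N \times N$ spatial weight matrix, $W = (w_{ij})$, is exactly sparse such that $h_N = \max_{i \le N} \sum_{j=1}^{N} I(w_{ij} \neq 0)$ is bounded in $N$, where $I(A)$ denotes the indicator function that takes the value of 1 if $A$ holds and 0 otherwise, and all diagonal elements of $W$ are zero, i.e. $w_{ii} = 0$ for all $i = 1, 2, \ldots, N$.

Remark 1 This assumption ensures that the maximum number of non-zero elements in each row is bounded in $N$, and the matrix norms $\|W\|_1$ and $\|W\|_\infty$ are bounded in $N$. Note that we do not need the weight matrix to be exactly sparse, and Assumption 1 can be replaced by an approximate sparsity condition.

Assumption 2 The error terms, $\varepsilon_{it}$, $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$, are independently distributed over $i$ and $t$, have zero means, and constant variances, $E(\varepsilon_{it}^2) = \sigma_0^2$, where $0 < \kappa_1 \le \sigma_0^2 \le \kappa_2 < \infty$, and $\kappa_1$, $\kappa_2$ are finite generic constants independent of $N$.

Assumption 3 The $(N+1) \times 1$ parameter vector, $\theta = (\psi', \sigma^2)' \in \Theta$, is a sub-set of the $N+1$ dimensional Euclidean space, $\mathbb{R}^{N+1}$. $\Theta$ is a closed and bounded (compact) set and includes the true value of $\theta$, denoted by $\theta_0 = (\psi_0', \sigma_0^2)'$, as an interior point.

Assumption 4 $\lambda_{\min}[S'(\psi) S(\psi)] > 0$, for all values of $\psi \in \Theta$ and $N$.

Assumption 5 Let $y_t = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$, and consider the sample covariance matrix of $y_t$,

$$ \hat{\Sigma} = \frac{1}{T} \sum_{t=1}^{T} y_t y_t'. \tag{5} $$

For a given $N$ we have, as $T \to \infty$,

$$ \hat{\Sigma} \to_p \Sigma_0, \tag{6} $$

uniformly in $\theta_0 = (\psi_0', \sigma_0^2)'$, where $\Sigma_0 = \sigma_0^2 (I_N - \Psi_0 W)^{-1} (I_N - \Psi_0 W)'^{-1} = \sigma_0^2 (S_0' S_0)^{-1}$.

Remark 2 Assumptions 1 and 2 are standard in the spatial econometrics literature. Also, it is easily seen that under Assumption 1, $S(\psi) = I_N - \Psi W$ will be globally invertible if $\sup_i |\psi_i| \, \|W\|_\infty < 1$. This follows since the condition $\sup_i |\psi_i| \, \|W\|_\infty < 1$ ensures that the matrix $I_N - \Psi W$ is strictly diagonally dominant, and hence invertible (see footnote 1). But for identification, estimation, and inference when $N$ is large, we also need a similar condition on the columns of $W$, namely $\sup_i |\psi_i| \, \|W\|_1 < 1$. Combining the two conditions we have

$$ \sup_i |\psi_i| < \max\{ 1/\|W\|_1, \; 1/\|W\|_\infty \}. \tag{7} $$

See Appendix A. This result reduces to the condition obtained by Kelejian and Prucha for the homogeneous case where $\psi_i = \psi$ for all $i$.

Footnote 1: The $N \times N$ matrix $A = (a_{ij})$ is said to be strictly diagonally dominant if $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$, for all $i = 1, 2, \ldots, N$. Then by the Levy-Desplanques theorem it follows that $A$ is non-singular.

Remark 3 Note also that under Assumption 1 and condition (7), for all values of $\psi$ and $N$ we have

$$ \lambda_{\max}[S'(\psi) S(\psi)] \le \|S(\psi)\|_1 \|S(\psi)\|_\infty \le \left(1 + \sup_i |\psi_i| \, \|W\|_1\right) \left(1 + \sup_i |\psi_i| \, \|W\|_\infty\right) < K. $$

This result, together with Assumptions 2 and 4, ensures that for all $\theta = (\psi', \sigma^2)' \in \Theta$, and $N$,

$$ \lambda_{\min}[\Sigma(\theta)] > 0 \quad \text{and} \quad \lambda_{\max}[\Sigma(\theta)] < K < \infty, \tag{8} $$

and

$$ \lambda_{\min}[\Sigma^{-1}(\theta)] > 0 \quad \text{and} \quad \lambda_{\max}[\Sigma^{-1}(\theta)] < K < \infty, \tag{9} $$

where $\Sigma(\theta) = \sigma^2 (I_N - \Psi W)^{-1} (I_N - \Psi W)'^{-1} = \sigma^2 [S'(\psi) S(\psi)]^{-1}$.

2.1 The log-likelihood function

Under Assumption 1 and assuming condition (7) is met, $S(\psi) = I_N - \Psi W$ is non-singular and (4) can be expressed as

$$ y_t = S^{-1}(\psi) \varepsilon_t, \quad t = 1, 2, \ldots, T. \tag{10} $$

Under Assumption 2 and assuming that the errors, $\varepsilon_{it}$, are normally distributed, the joint density function of $y_1, y_2, \ldots, y_T$ is given by

$$ (2\pi)^{-NT/2} \, |S(\psi)|^{T} \, |\sigma^2 I_N|^{-T/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{t=1}^{T} y_t' S'(\psi) S(\psi) y_t \right\}, $$

and the quasi log-likelihood function can be written as

$$ \ell(\theta) = -\frac{NT}{2} \ln(2\pi) - \frac{NT}{2} \ln \sigma^2 + T \ln |S(\psi)| - \frac{1}{2\sigma^2} \sum_{t=1}^{T} y_t' S'(\psi) S(\psi) y_t, \tag{11} $$

where as before $\theta = (\psi', \sigma^2)'$. The last term of the log-likelihood function can be written more conveniently as

$$ \sum_{t=1}^{T} y_t' S'(\psi) S(\psi) y_t = \sum_{t=1}^{T} \operatorname{tr}\left[ y_t' S'(\psi) S(\psi) y_t \right] = \operatorname{tr}\left[ S'(\psi) S(\psi) \sum_{t=1}^{T} y_t y_t' \right] = T \operatorname{tr}\left[ S'(\psi) S(\psi) \hat{\Sigma} \right], $$

where $\hat{\Sigma}$ is defined by (5). Hence,

$$ \ell(\theta) = -\frac{NT}{2} \ln(2\pi) - \frac{NT}{2} \ln \sigma^2 + T \ln |S(\psi)| - \frac{T}{2\sigma^2} \operatorname{tr}\left[ S'(\psi) S(\psi) \hat{\Sigma} \right]. \tag{12} $$

3 Identification

We now investigate the conditions under which $\theta_0$ is identified. We first note that

$$ E_0(\hat{\Sigma}) = \frac{1}{T} \sum_{t=1}^{T} E_0(y_t y_t'), $$

and using (10), $E_0(\hat{\Sigma}) = \sigma_0^2 (S_0' S_0)^{-1}$, where $E_0$ denotes expectation taken under the DGP characterised by the true parameter vector $\theta_0$ and, as before, $S_0 = S(\psi_0) = I_N - \Psi_0 W$. Now taking expectations of the log-likelihood function, (12), at the true value of $\theta$, $\theta_0$, and using the above result we have

$$ \frac{1}{T} E_0[\ell(\theta_0)] = -\frac{N}{2} \ln(2\pi) - \frac{N}{2} \ln \sigma_0^2 + \ln |S_0| - \frac{N}{2}. \tag{13} $$

Next, we take expectations of the log-likelihood at some other parameter value, $\theta$:

$$ \frac{1}{T} E_0[\ell(\theta)] = -\frac{N}{2} \ln(2\pi) - \frac{N}{2} \ln \sigma^2 + \ln |S(\psi)| - \frac{\sigma_0^2}{2\sigma^2} \operatorname{tr}\left[ S'(\psi) S(\psi) (S_0' S_0)^{-1} \right]. \tag{14} $$

Let

$$ G_0 \equiv G(\psi_0) = W (I_N - \Psi_0 W)^{-1}, \tag{15} $$

and

$$ D = \Psi - \Psi_0. \tag{16} $$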
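As a computational aside, the trace form of the quasi log-likelihood, $\ell(\theta) = -\frac{NT}{2}\ln(2\pi) - \frac{NT}{2}\ln\sigma^2 + T\ln|S(\psi)| - \frac{T}{2\sigma^2}\operatorname{tr}[S'(\psi)S(\psi)\hat{\Sigma}]$, depends on the data only through $\hat{\Sigma}$. The sketch below evaluates it and checks it against the direct sum over $t$. The function names, the weight matrix and the parameter values are assumptions chosen for the example, and the log-determinant is taken in absolute value, which coincides with $\ln|S(\psi)|$ whenever the determinant is positive.

```python
import numpy as np

def hsar_loglik(psi, sigma2, W, Y):
    """Quasi log-likelihood in trace form; uses the data only via Sigma_hat = Y Y' / T."""
    N, T = Y.shape
    S = np.eye(N) - np.diag(psi) @ W
    _, logabsdet = np.linalg.slogdet(S)         # ln |det S(psi)|
    Sigma_hat = Y @ Y.T / T
    quad = np.trace(S.T @ S @ Sigma_hat)
    return (-0.5 * N * T * np.log(2.0 * np.pi)
            - 0.5 * N * T * np.log(sigma2)
            + T * logabsdet
            - 0.5 * T * quad / sigma2)

def hsar_loglik_direct(psi, sigma2, W, Y):
    """Same quantity, computed from the sum over t of y_t' S' S y_t."""
    N, T = Y.shape
    S = np.eye(N) - np.diag(psi) @ W
    _, logabsdet = np.linalg.slogdet(S)
    resid = S @ Y                               # column t is S(psi) y_t
    return (-0.5 * N * T * np.log(2.0 * np.pi)
            - 0.5 * N * T * np.log(sigma2)
            + T * logabsdet
            - 0.5 * np.sum(resid ** 2) / sigma2)

# Hypothetical inputs: three units on a line, arbitrary data.
rng = np.random.default_rng(1)
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
psi = np.array([0.3, -0.2, 0.6])
Y = rng.normal(size=(3, 20))

ll_trace = hsar_loglik(psi, 1.5, W, Y)
ll_direct = hsar_loglik_direct(psi, 1.5, W, Y)
```

A QML estimate could then be obtained by passing the negative of `hsar_loglik` to a numerical optimiser over $(\psi', \sigma^2)'$; that step is not shown here.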

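Then,

$$ S(\psi) = I_N - \Psi_0 W - (\Psi - \Psi_0) W = \left[ I_N - (\Psi - \Psi_0) W (I_N - \Psi_0 W)^{-1} \right] (I_N - \Psi_0 W) = (I_N - D G_0) S_0. \tag{17} $$

Note that $I_N - D G_0 = S(\psi) S_0^{-1}$ is invertible so long as condition (7) is met for all values of $\psi_i$, including the true values, $\psi_{i0}$. Consider now the expectations of the difference of the log-likelihoods evaluated at $\theta_0$ and $\theta$ given in (13) and (14). We have

$$ \frac{1}{NT} E_0[\ell(\theta_0) - \ell(\theta)] = -\frac{1}{2} \ln\left( \frac{\sigma_0^2}{\sigma^2} \right) - \frac{1}{N} \ln |I_N - D G_0| + \frac{1}{2} \left\{ \frac{\sigma_0^2}{N \sigma^2} \operatorname{tr}\left[ (I_N - \Psi W)(I_N - \Psi_0 W)^{-1} (I_N - \Psi_0 W)'^{-1} (I_N - \Psi W)' \right] - 1 \right\}. $$

But,

$$ (I_N - \Psi W)(I_N - \Psi_0 W)^{-1} = (I_N - D G_0)(I_N - \Psi_0 W)(I_N - \Psi_0 W)^{-1} = I_N - D G_0. \tag{18} $$

Therefore, after some algebra, and denoting $(NT)^{-1} E_0[\ell(\theta_0) - \ell(\theta)]$ by $Q_N(\phi)$, we have

$$ Q_N(\phi) = -\frac{1}{2} [\ln(1 - \delta) + \delta] - \frac{1}{N} \ln |I_N - D G_0| - \frac{1 - \delta}{N} \operatorname{tr}(D G_0) + \frac{1 - \delta}{2N} \operatorname{tr}(G_0 G_0' D^2), \tag{19} $$

where $\phi = (d', \delta)'$, $\delta = (\sigma^2 - \sigma_0^2)/\sigma^2 < 1$, and $d = (d_1, d_2, \ldots, d_N)'$, $d_i = \psi_i - \psi_{i0}$, for $i = 1, 2, \ldots, N$. Using the derivative results in Appendix B, we have

$$ \frac{\partial Q_N(\phi)}{\partial \phi} = \begin{pmatrix} \frac{1}{N} \operatorname{diag}\left[ G_0 (I_N - D G_0)^{-1} \right] \tau_N - \frac{1 - \delta}{N} \operatorname{diag}(G_0) \tau_N + \frac{1 - \delta}{N} \operatorname{diag}(G_0 G_0' D) \tau_N \\ \frac{\delta}{2(1 - \delta)} + \frac{1}{N} \operatorname{tr}(D G_0) - \frac{1}{2N} \operatorname{tr}(G_0 G_0' D^2) \end{pmatrix}, \tag{20} $$

and $\partial^2 Q_N(\phi) / \partial \phi \, \partial \phi' = N^{-1} \Lambda_N(\phi)$, where

$$ \Lambda_N(\phi) = \begin{pmatrix} A \odot A' + (1 - \delta) \operatorname{diag}(G_0 G_0') & \operatorname{diag}(G_0) \tau_N - \operatorname{diag}(G_0 G_0' D) \tau_N \\ \tau_N' \operatorname{diag}(G_0) - \tau_N' \operatorname{diag}(G_0 G_0' D) & \dfrac{N}{2(1 - \delta)^2} \end{pmatrix}, \tag{21} $$

$\tau_N$ is an $N \times 1$ vector of ones, $\odot$ is the Hadamard product matrix operator, and $A = G_0 (I_N - D G_0)^{-1} = W (I_N - \Psi W)^{-1}$ (see footnote 2).

Footnote 2: Using (18) we have $I_N - D G_0 = (I_N - \Psi W)(I_N - \Psi_0 W)^{-1}$, and hence $G_0 (I_N - D G_0)^{-1} = W (I_N - \Psi_0 W)^{-1} (I_N - \Psi_0 W)(I_N - \Psi W)^{-1} = W (I_N - \Psi W)^{-1}$, as required.

Hence, $\partial Q_N(0) / \partial \phi = 0$, and using the mean-value expansion of $Q_N(\phi)$ we have

$$ Q_N(\phi) = \frac{1}{2N} \phi' \Lambda_N(\bar{\phi}) \phi, $$

where $\bar{\phi} = (\bar{d}', \bar{\delta})' = \left( \bar{\psi}_1 - \psi_{10}, \bar{\psi}_2 - \psi_{20}, \ldots, \bar{\psi}_N - \psi_{N0}, (\bar{\sigma}^2 - \sigma_0^2)/\bar{\sigma}^2 \right)'$, $\bar{\psi}_i$ lies on the line segment joining $\psi_{i0}$ and $\psi_i$, for $i = 1, 2, \ldots, N$, and $\bar{\sigma}^2$ lies between $\sigma_0^2$ and $\sigma^2$. Furthermore,

$$ Q_N(\phi) \ge \frac{1}{2} \left( \frac{\phi' \phi}{N} \right) \lambda_{\min}[\Lambda_N(\bar{\phi})], $$

and $Q_N(\phi) = 0$ if and only if $N^{-1} \phi' \phi = 0$, so long as $\lambda_{\min}[\Lambda_N(\bar{\phi})] > 0$ for all values of $\bar{\psi}_i$ lying on the line segment joining $\psi_{i0}$ and $\psi_i$ and of $\bar{\sigma}^2$ lying between $\sigma_0^2$ and $\sigma^2$. Hence, for a fixed $N$, the parameters $\psi_{i0}$, for $i = 1, 2, \ldots, N$, and $\sigma_0^2$ are globally identified if $\lambda_{\min}[\Lambda_N(\bar{\phi})] > 0$ for all such values of $\bar{\psi}_i$ and $\bar{\sigma}^2$, and are locally identified if $\lambda_{\min}[\Lambda_N(0)] > 0$. For future reference we note that

$$ \Lambda_N(0) = \begin{pmatrix} G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') & \operatorname{diag}(G_0) \tau_N \\ \tau_N' \operatorname{diag}(G_0) & N/2 \end{pmatrix}. \tag{22} $$

Also, using the inverse of partitioned matrices, a necessary and sufficient condition for the identification of $\psi_0$ is given by the following rank condition

$$ \operatorname{Rank}\left[ G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') - \frac{2}{N} \operatorname{diag}(G_0) \tau_N \tau_N' \operatorname{diag}(G_0) \right] = N, \tag{23} $$

which reduces to the simple inequality condition under homogeneity of spatial coefficients, $\psi_i = \psi$ for all $i$ (or $\Psi = \psi I_N$),

$$ \frac{1}{N} \left[ \tau_N' (G_0 \odot G_0') \tau_N + \tau_N' \operatorname{diag}(G_0 G_0') \tau_N \right] - 2 \left[ \frac{1}{N} \tau_N' \operatorname{diag}(G_0) \tau_N \right]^2 > 0. $$

This condition further simplifies to

$$ \frac{1}{N} \operatorname{tr}(G_0^2) + \frac{1}{N} \operatorname{tr}(G_0 G_0') > 2 \left[ \frac{1}{N} \operatorname{tr}(G_0) \right]^2. \tag{24} $$

The two identification conditions (23) and (24) provide an interesting contrast between the spatial models with homogeneous and heterogeneous spatial effects. Under the homogeneity restriction, using (24) it is easily seen that $\psi_0 = 0$ is identified so long as $N^{-1} \operatorname{tr}(W W') > 0$ for all $N$, including as $N \to \infty$. But the same is not true of the individual spatial coefficients, $\psi_{i0}$, under heterogeneity. To see this, note that when $\psi_i = 0$, the $i$-th row of $W$ no longer enters the log-likelihood function and the $i$-th unit becomes totally disconnected from the rest of the units. As an example consider the case where $N = 2$ and note that in this case

$$ W = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, $$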

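and hence the rank condition (23) reduces to

$$ \operatorname{Rank} \begin{pmatrix} 1 + \psi_{20}^2 & 1 - \psi_{10} \psi_{20} \\ 1 - \psi_{10} \psi_{20} & 1 + \psi_{10}^2 \end{pmatrix} = 2. $$

This condition is clearly satisfied unless $\psi_{10} = \psi_{20} = 0$.

Consider now the case at the other extreme where $N$ is allowed to rise without bounds. Then, for identification, it is also required that $Q_N(\phi)$ tends to a finite limit as $N \to \infty$. To simplify the exposition, we focus on local identification. Using (22) we note that

$$ Q_N(\phi) = \frac{1}{2N} d' \left[ G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') \right] d + \frac{\delta}{N} d' \operatorname{diag}(G_0) \tau_N + \frac{\delta^2}{4}, $$

where $d' \operatorname{diag}(G_0) \tau_N = \sum_{i=1}^{N} d_i g_{0,ii}$, and $g_{0,ii}$ is the $i$-th diagonal element of $G_0$. Also, since $Q_N(\phi) \ge 0$, then

$$ Q_N(\phi) = |Q_N(\phi)| \le \frac{1}{2} \left( \frac{d'd}{N} \right) \lambda_{\max}\left[ G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') \right] + |\delta| \left( \frac{1}{N} \sum_{i=1}^{N} g_{0,ii}^2 \right)^{1/2} \left( \frac{d'd}{N} \right)^{1/2} + \frac{\delta^2}{4}. $$

Furthermore,

$$ \lambda_{\max}\left[ G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') \right] \le \| G_0 \odot G_0' \|_\infty + \| \operatorname{diag}(G_0 G_0') \|_\infty, $$

and by the results in Appendix A, and noting that $\| G_0 \odot G_0' \|_\infty \le \| G_0 \|_\infty \| G_0' \|_\infty$, then

$$ Q_N(\phi) \le K \left( \frac{d'd}{N} \right) + \frac{\delta^2}{4}, $$

where $K$ is a fixed constant which is bounded in $N$. As a result, $Q_N(\phi)$ is also bounded in $N$, noting that $\psi_i$ and $\sigma^2$ are assumed to be bounded. Therefore, assuming that $\lambda_{\min}[\Lambda_N(0)] > 0$ for all values of $N$, including when $N \to \infty$, we have

$$ K \left( \frac{d'd}{N} \right) + \frac{\delta^2}{4} \ge Q_N(\phi) \ge \frac{1}{2} \left( \frac{\phi' \phi}{N} \right) \lambda_{\min}[\Lambda_N(0)] > 0, $$

and the $N$ spatial parameters, $\psi_{i0}$ for $i = 1, 2, \ldots, N$, are identified by the condition

$$ \lim_{N \to \infty} \frac{d'd}{N} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (\psi_i - \psi_{i0})^2 = 0. $$

Clearly, as we have shown above, all the spatial parameters are identified if $\lambda_{\min}[\Lambda_N(\bar{\phi})] > 0$. But if $N$ is sufficiently large, some of the parameters might be unidentified. This is because the condition $\lim_{N \to \infty} \sum_{i=1}^{N} (\psi_i - \psi_{i0})^2 / N = 0$ does not necessarily imply $\psi_i = \psi_{i0}$ for all $i$. The main identification result is summarised in the following proposition.

Proposition 1 Consider the heterogeneous spatial autoregressive (HSAR) model given by (3) and suppose that Assumptions 1 to 5 hold and the invertibility condition (7) is met. Then the true parameter values, $\sigma_0^2$ and $\psi_{i0}$, for $i = 1, 2, \ldots, N$, are identified if $\lambda_{\min}[\Lambda_N(\bar{\phi})] > 0$, where $\Lambda_N(\phi)$ is defined by (21). In the case where $N$ is large and rising without bounds, $\lambda_{\min}[\Lambda_N(\bar{\phi})] > 0$ is necessary for identification but need not be sufficient if the aim is to identify all the spatial coefficients, $\psi_{i0}$ for $i = 1, 2, \ldots, N$, as $N \to \infty$. For local identification the condition simplifies to $\lambda_{\min}(H_{11 \cdot 2}) > \epsilon > 0$, for all $N$, where

$$ H_{11 \cdot 2} = G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') - \frac{2}{N} \operatorname{diag}(G_0) \tau_N \tau_N' \operatorname{diag}(G_0), \tag{25} $$

$G_0 \equiv G(\psi_0) = W (I_N - \Psi_0 W)^{-1}$, $\Psi_0 = \operatorname{diag}(\psi_0)$, $\psi_0 = (\psi_{10}, \psi_{20}, \ldots, \psi_{N0})'$, the $i$-th element of $\operatorname{diag}(G_0 G_0')$ is given by $g_{0i}' g_{0i}$, where $g_{0i}'$ is the $i$-th row of $G_0$, and $W$ is the spatial weight matrix defined under Assumption 1. For large $N$ it is also required that $\lambda_{\max}(H_{11 \cdot 2}) < K$ for all $N$.

4 Consistency of the QML estimator

The quasi maximum likelihood estimator, $\hat{\theta} = (\hat{\psi}', \hat{\sigma}^2)'$, is defined by

$$ \hat{\theta} = \arg\max_{\theta \in \Theta} \ell(\theta). \tag{26} $$

We now show that, under Assumptions 1 to 5 and assuming that the invertibility condition (7) holds, we have $\operatorname{plim}_{T \to \infty} \hat{\theta} = \theta_0 = (\psi_0', \sigma_0^2)'$, namely, $\hat{\theta}$ is a consistent estimator of $\theta_0$. First, for the true parameter values $\theta_0 \in \Theta$ we have

$$ \theta_0 = \arg\max_{\theta \in \Theta} E_0\left[ \frac{\ell(\theta)}{T} \right], $$

which is unique over $\Theta$ (see Section 3). Therefore,

$$ \operatorname{plim}_{T \to \infty} \frac{\ell(\hat{\theta})}{T} \le \lim_{T \to \infty} E_0\left[ \frac{\ell(\theta_0)}{T} \right]. $$

Also,

$$ \operatorname{plim}_{T \to \infty} \frac{\ell(\hat{\theta})}{T} \ge \operatorname{plim}_{T \to \infty} \frac{\ell(\theta_0)}{T}. $$

But under Assumption 4 we have

$$ \operatorname{plim}_{T \to \infty} \frac{\ell(\theta_0)}{T} = \lim_{T \to \infty} E_0\left[ \frac{\ell(\theta_0)}{T} \right], \tag{27} $$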

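and given (27) we have that

$$ \operatorname{plim}_{T \to \infty} \frac{\ell(\hat{\theta})}{T} \ge \lim_{T \to \infty} E_0\left[ \frac{\ell(\theta_0)}{T} \right]. $$

Therefore,

$$ \operatorname{plim}_{T \to \infty} \frac{\ell(\hat{\theta})}{T} = \lim_{T \to \infty} E_0\left[ \frac{\ell(\theta_0)}{T} \right], $$

and since $\theta_0$ presents the unique maximum of $E_0[\ell(\theta)/T]$, then

$$ \operatorname{plim}_{T \to \infty} \hat{\theta} = \theta_0. \tag{28} $$

5 Asymptotic normality

The QML estimator, $\hat{\theta} = (\hat{\psi}', \hat{\sigma}^2)'$, solves the score function, $U(\hat{\theta}) = \partial \ell(\hat{\theta}) / \partial \theta = 0$, associated with the log-likelihood function given by (12). For a given $N$, consider now the mean-value expansion of $U(\hat{\theta})$ around $\theta_0$:

$$ 0 = \frac{1}{\sqrt{T}} U(\hat{\theta}) = \frac{1}{\sqrt{T}} U(\theta_0) - H(\bar{\theta}) \sqrt{T} (\hat{\theta} - \theta_0), $$

where $H(\bar{\theta}) = -T^{-1} \, \partial^2 \ell(\bar{\theta}) / \partial \theta \, \partial \theta'$, and $\bar{\theta}$ lies on the line segments joining $\hat{\theta}$ and $\theta_0$ (see footnote 3). In view of the consistency result, (28), we also have $\operatorname{plim}_{T \to \infty} \bar{\theta} = \theta_0$, and given that under our assumptions $H(\theta)$ is a smooth function of $\theta$, then

$$ H(\bar{\theta}) \to_p E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \theta \, \partial \theta'} \right] = H(\theta_0), $$

which is a positive definite matrix under the identification condition. Thus, the limiting distribution of $\sqrt{T} (\hat{\theta} - \theta_0)$ is given by the limiting distribution of $T^{-1/2} U(\theta_0)$.

Footnote 3: More specifically, denoting typical elements of $\theta_0$, $\bar{\theta}$, and $\hat{\theta}$ by $\theta_{i0}$, $\bar{\theta}_i$, and $\hat{\theta}_i$, the $(i, j)$ element of $H(\bar{\theta})$ is evaluated at $(\bar{\theta}_i, \bar{\theta}_j)$, where $\bar{\theta}_i$ is a convex combination of $\hat{\theta}_i$ and $\theta_{i0}$.

But $U(\theta_0) = \left( \partial \ell(\theta_0) / \partial \psi', \; \partial \ell(\theta_0) / \partial \sigma^2 \right)'$, with

$$ \frac{\partial \ell(\theta_0)}{\partial \psi} = -T \operatorname{diag}(G_0) \tau_N + \frac{1}{\sigma_0^2} \sum_{t=1}^{T} y_t^* \odot (y_t - \Psi_0 y_t^*), $$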

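and

$$ \frac{\partial \ell(\theta_0)}{\partial \sigma^2} = -\frac{NT}{2\sigma_0^2} + \frac{1}{2\sigma_0^4} \sum_{t=1}^{T} y_t' (I_N - \Psi_0 W)' (I_N - \Psi_0 W) y_t = -\frac{NT}{2\sigma_0^2} + \frac{1}{2\sigma_0^4} \sum_{t=1}^{T} \varepsilon_t' \varepsilon_t, $$

where $G_0 = W (I_N - \Psi_0 W)^{-1} = (g_{0,ij})$, $y_t^* = (y_{1t}^*, y_{2t}^*, \ldots, y_{Nt}^*)' = W y_t$, and $\tau_N$ is an $N \times 1$ vector of ones. Consider first the $i$-th component of $T^{-1/2} \partial \ell(\theta_0) / \partial \psi$, and note that it can be written as

$$ \frac{1}{\sqrt{T}} \frac{\partial \ell(\theta_0)}{\partial \psi_i} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( -g_{0,ii} + \frac{1}{\sigma_0^2} y_{it}^* \varepsilon_{it} \right). $$

Also $y_{it}^* = e_{i,N}' G_0 \varepsilon_t$, where $e_{i,N}$ is an $N$ dimensional vector with its $i$-th element unity and zeros elsewhere. Then

$$ \frac{1}{\sqrt{T}} \frac{\partial \ell(\theta_0)}{\partial \psi_i} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \eta_{it}, \tag{29} $$

where

$$ \eta_{it} = e_{i,N}' G_0 \zeta_t \zeta_{it} - g_{0,ii}, $$

$\zeta_t = \varepsilon_t / \sigma_0 = (\zeta_{1t}, \zeta_{2t}, \ldots, \zeta_{Nt})'$, $E(\zeta_t) = 0$, $E(\zeta_t \zeta_{it}) = e_{i,N}$, and $Var(\zeta_t) = I_N$. Also, $E(\eta_{it}) = 0$, and

$$ E(\eta_{it}^2) = e_{i,N}' G_0 E(\zeta_t \zeta_t' \zeta_{it}^2) G_0' e_{i,N} - g_{0,ii}^2, $$

which yields

$$ Var(\eta_{it}) = g_{0,ii}^2 \left[ E(\zeta_{it}^4) - 2 \right] + \sum_{j=1}^{N} g_{0,ij}^2. \tag{30} $$

Hence, $E\left[ T^{-1/2} \partial \ell(\theta_0) / \partial \psi_i \right] = 0$, and since, under Assumption 2, the $\eta_{it}$ are distributed independently over $t$, then so long as $E|\varepsilon_{it}|^{4+\epsilon} < K$ for some small positive $\epsilon > 0$, as $T \to \infty$, $T^{-1/2} \partial \ell(\theta_0) / \partial \psi_i$ will be distributed as $N(0, \omega_{ii})$, where

$$ \omega_{ii} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} Var(\eta_{it}) = g_{0,ii}^2 \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left[ E(\zeta_{it}^4) - 2 \right] + \sum_{j=1}^{N} g_{0,ij}^2. $$

The above result holds even if $N \to \infty$, so long as $\sum_{j=1}^{N} g_{0,ij}^2$ is bounded in $N$. This latter condition is met if the maximum row sum norm of the matrix $G_0$ is bounded, which is ensured under Assumption 1 and assuming condition (7) holds. See Appendix A. Similarly, $T^{-1/2} \partial \ell(\theta_0) / \partial \sigma^2$ can be written as

$$ \frac{1}{\sqrt{T}} \frac{\partial \ell(\theta_0)}{\partial \sigma^2} = \frac{1}{2\sigma_0^2} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left[ \frac{\varepsilon_t' \varepsilon_t}{\sigma_0^2} - N \right], $$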

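or

$$ \frac{1}{\sqrt{T}} \frac{\partial \ell(\theta_0)}{\partial \sigma^2} = \frac{1}{2\sigma_0^2} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \xi_{tN}, \tag{31} $$

where $\xi_{tN} = \sum_{i=1}^{N} (\zeta_{it}^2 - 1)$. Hence, for a fixed $N$ and under Assumption 2, $\xi_{tN}$, for $t = 1, 2, \ldots, T$, have zero means and are serially independent with a finite variance, if $E|\varepsilon_{it}|^{4+\epsilon} < K$ for some small positive $\epsilon$. Hence, under these conditions, $T^{-1/2} \partial \ell(\theta_0) / \partial \sigma^2$ also tends to $N(0, \omega_\sigma^2)$, where

$$ \omega_\sigma^2 = \frac{1}{4\sigma_0^4} \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{N} Var(\zeta_{it}^2). $$

For large $N$, one needs to consider the limiting distribution of $(NT)^{-1/2} \partial \ell(\theta_0) / \partial \sigma^2$, which is also asymptotically normally distributed with variance now given by

$$ \frac{1}{4\sigma_0^4} \lim_{N, T \to \infty} \frac{1}{NT} \sum_{t=1}^{T} \sum_{i=1}^{N} Var(\zeta_{it}^2). $$

Finally, using (29) and (30) we have, for $i \neq j$,

$$ E_0\left[ \frac{1}{T} \frac{\partial \ell(\theta_0)}{\partial \psi_i} \frac{\partial \ell(\theta_0)}{\partial \psi_j} \right] = \frac{1}{T} \sum_{t=1}^{T} \sum_{t'=1}^{T} E(\eta_{it} \eta_{jt'}) = \frac{1}{T} \sum_{t=1}^{T} E\left[ (e_{i,N}' G_0 \zeta_t \zeta_{it} - g_{0,ii})(e_{j,N}' G_0 \zeta_t \zeta_{jt} - g_{0,jj}) \right] $$

$$ = e_{i,N}' G_0 E(\zeta_{it} \zeta_{jt} \zeta_t \zeta_t') G_0' e_{j,N} - g_{0,ii} g_{0,jj} = g_{0,ij} g_{0,ji}, \quad \text{for } i \neq j. $$

Recall from (30) that

$$ E_0\left[ \frac{1}{T} \left( \frac{\partial \ell(\theta_0)}{\partial \psi_i} \right)^2 \right] = \frac{1}{T} \sum_{t=1}^{T} E(\eta_{it}^2) = g_{0,ii}^2 \left[ \frac{1}{T} \sum_{t=1}^{T} E(\zeta_{it}^4) - 2 \right] + \sum_{j=1}^{N} g_{0,ij}^2. $$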
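The variance expression $Var(\eta_{it}) = g_{0,ii}^2 [E(\zeta_{it}^4) - 2] + \sum_{j} g_{0,ij}^2$ can be checked numerically. The sketch below is a Monte Carlo illustration under assumptions made only for this example: a four-unit ring weight matrix, arbitrary values of $\psi_{i0}$, and standardised uniform errors (mean zero, unit variance, fourth moment $9/5$), chosen because they are clearly non-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(42)
N, R = 4, 400_000                               # R = number of simulated draws of zeta_t

# Hypothetical ring weight matrix and spatial coefficients.
W = np.zeros((N, N))
for k in range(N):
    W[k, (k - 1) % N] = 0.5
    W[k, (k + 1) % N] = 0.5
psi0 = np.array([0.2, 0.5, -0.3, 0.7])
G0 = W @ np.linalg.inv(np.eye(N) - np.diag(psi0) @ W)

# Uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1, fourth moment 9/5.
zeta = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(R, N))
m4 = 9.0 / 5.0

i = 1                                           # unit whose score contribution is examined
eta = (zeta @ G0[i]) * zeta[:, i] - G0[i, i]    # eta_it = e_i' G0 zeta_t zeta_it - g_ii

var_formula = G0[i, i] ** 2 * (m4 - 2.0) + np.sum(G0[i] ** 2)
var_sim = eta.var()
mean_sim = eta.mean()
```

With Gaussian errors the fourth moment is 3, so the bracketed term equals one and the variance reduces to $g_{0,ii}^2 + \sum_j g_{0,ij}^2$.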

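Finally,

$$ E_0\left[ \frac{1}{T} \frac{\partial \ell(\theta_0)}{\partial \sigma^2} \frac{\partial \ell(\theta_0)}{\partial \psi_i} \right] = \frac{1}{2\sigma_0^2 T} \sum_{t=1}^{T} \sum_{t'=1}^{T} E(\xi_{tN} \eta_{it'}) = \frac{1}{2\sigma_0^2 T} \sum_{t=1}^{T} \sum_{j=1}^{N} E\left[ (\zeta_{jt}^2 - 1)(e_{i,N}' G_0 \zeta_t \zeta_{it} - g_{0,ii}) \right] $$

$$ = \frac{g_{0,ii}}{2\sigma_0^2} \left[ \frac{1}{T} \sum_{t=1}^{T} E(\zeta_{it}^4) - 1 \right] = \frac{g_{0,ii}}{2\sigma_0^2} \frac{1}{T} \sum_{t=1}^{T} Var(\zeta_{it}^2). $$

Let

$$ \gamma = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left[ E(\zeta_{it}^4) - 1 \right] = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} Var(\zeta_{it}^2). \tag{32} $$

We collect the various terms of

$$ J(\theta_0, \gamma) = \lim_{T \to \infty} E_0\left[ \frac{1}{T} \frac{\partial \ell(\theta_0)}{\partial \theta} \frac{\partial \ell(\theta_0)}{\partial \theta'} \right] = (J_{0,ij}), $$

where

$J_{0,ii} = g_{0,ii}^2 (\gamma - 1) + \sum_{j=1}^{N} g_{0,ij}^2$, for $i = 1, 2, \ldots, N$,
$J_{0,ij} = g_{0,ij} g_{0,ji}$, for $i \neq j = 1, 2, \ldots, N$,
$J_{0,i,N+1} = \gamma g_{0,ii} / (2\sigma_0^2)$, for $i = 1, 2, \ldots, N$,
$J_{0,ii} = N\gamma / (4\sigma_0^4)$, for $i = N + 1$.

Alternatively, in matrix notation, we have

$$ J(\theta_0, \gamma) = \begin{pmatrix} (G_0 \odot G_0') \odot \left[ \tau_N \tau_N' + (\gamma - 2) I_N \right] + \operatorname{diag}(G_0 G_0') & \dfrac{\gamma}{2\sigma_0^2} \operatorname{diag}(G_0) \tau_N \\ \dfrac{\gamma}{2\sigma_0^2} \tau_N' \operatorname{diag}(G_0) & \dfrac{N\gamma}{4\sigma_0^4} \end{pmatrix}. \tag{33} $$

Having established that the score vector is asymptotically normally distributed, it is now easily seen that, for a fixed $N$ and as $T \to \infty$,

$$ \sqrt{T} (\hat{\theta} - \theta_0) \to_d N\left[ 0, AsyVar(\hat{\theta}) \right], \tag{34} $$

where $AsyVar(\hat{\theta}) = H^{-1}(\theta_0) J(\theta_0, \gamma) H^{-1}(\theta_0)$ and $H(\theta_0) = E_0\left[ -T^{-1} \partial^2 \ell(\theta_0) / \partial \theta \, \partial \theta' \right]$. In the case where the errors, $\varepsilon_{it}$, are Gaussian, $\gamma = 2$ and, as to be expected, $H(\theta_0) = J(\theta_0, 2)$. This is easily verified noting that

$$ H(\theta_0) = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \theta \, \partial \theta'} \right] = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}_{(N+1) \times (N+1)}, \tag{35} $$

where

$$ H_{11} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \psi'} \right] = G_0 \odot G_0' + \operatorname{diag}\left[ \frac{1}{\sigma_0^2 T} \sum_{t=1}^{T} E_0(y_{it}^{*2}), \; i = 1, \ldots, N \right], $$

$$ H_{12} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \sigma^2} \right] = \frac{1}{\sigma_0^4 T} \sum_{t=1}^{T} E_0\left[ y_t^* \odot (y_t - \Psi_0 y_t^*) \right], $$

$$ H_{22} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial (\sigma^2)^2} \right] = -\frac{N}{2\sigma_0^4} + \frac{1}{\sigma_0^6 T} \sum_{t=1}^{T} E_0\left[ y_t' (I_N - \Psi_0 W)' (I_N - \Psi_0 W) y_t \right], $$

and $H_{21} = H_{12}'$. Since $y_t^* = (y_{1t}^*, y_{2t}^*, \ldots, y_{Nt}^*)' = W y_t$ then, for each $i = 1, 2, \ldots, N$, we have

$$ \frac{1}{T} \sum_{t=1}^{T} y_{it}^{*2} = \frac{1}{T} \sum_{t=1}^{T} (w_i' y_t)^2 = w_i' \left( \frac{1}{T} \sum_{t=1}^{T} y_t y_t' \right) w_i. $$

Then,

$$ \frac{1}{\sigma_0^2 T} \sum_{t=1}^{T} E_0(y_{it}^{*2}) = \frac{1}{\sigma_0^2} w_i' \left[ \frac{1}{T} \sum_{t=1}^{T} E_0(y_t y_t') \right] w_i = \frac{1}{\sigma_0^2} w_i' (I_N - \Psi_0 W)^{-1} \left[ \frac{1}{T} \sum_{t=1}^{T} E(\varepsilon_t \varepsilon_t') \right] (I_N - W' \Psi_0)^{-1} w_i $$

$$ = \frac{1}{\sigma_0^2} w_i' (I_N - \Psi_0 W)^{-1} (\sigma_0^2 I_N) (I_N - W' \Psi_0)^{-1} w_i = g_{0i}' g_{0i}, $$

where $g_{0i}'$ is the $i$-th row of $G_0$. Stacking over the $N$ units and collecting the different components we obtain

$$ H_{11} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \psi'} \right] = G_0 \odot G_0' + \operatorname{diag}(G_0 G_0'). $$

Next, for each $i = 1, 2, \ldots, N$, we have (recall that $\varepsilon_{it} = y_{it} - \psi_{i0} y_{it}^*$)

$$ E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi_i \, \partial \sigma^2} \right] = \frac{1}{\sigma_0^4 T} \sum_{t=1}^{T} E_0(y_{it}^* \varepsilon_{it}) = \frac{1}{\sigma_0^2} g_{0,ii}, $$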

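where the last equality follows from

$$ \frac{1}{T} \sum_{t=1}^{T} E_0(y_{it}^* \varepsilon_{it}) = \frac{1}{T} \sum_{t=1}^{T} E_0(w_i' y_t \varepsilon_{it}) = \frac{1}{T} \sum_{t=1}^{T} E_0\left[ w_i' (I_N - \Psi_0 W)^{-1} \varepsilon_t \varepsilon_{it} \right] = \sigma_0^2 w_i' (I_N - \Psi_0 W)^{-1} e_{i,N} = \sigma_0^2 g_{0,ii}, $$

and, as before, $e_{i,N}$ is an $N$ dimensional vector with its $i$-th element unity and zeros elsewhere. Stacking over the $N$ units we obtain

$$ H_{12} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \sigma^2} \right] = \frac{1}{\sigma_0^2} \operatorname{diag}(G_0) \tau_N, $$

where as before $\tau_N$ is an $N \times 1$ vector of ones. Finally,

$$ H_{22} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial (\sigma^2)^2} \right] = E_0\left[ -\frac{N}{2\sigma_0^4} + \frac{1}{\sigma_0^6 T} \sum_{t=1}^{T} \varepsilon_t' \varepsilon_t \right] = \frac{N}{2\sigma_0^4}. $$

Using the above results we now have

$$ H(\theta_0) = \begin{pmatrix} G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') & \dfrac{1}{\sigma_0^2} \operatorname{diag}(G_0) \tau_N \\ \dfrac{1}{\sigma_0^2} \tau_N' \operatorname{diag}(G_0) & \dfrac{N}{2\sigma_0^4} \end{pmatrix}, \tag{36} $$

which is equal to $J(\theta_0, \gamma)$ defined by (33) for $\gamma = 2$, as required.

To obtain the inverse of $H(\theta_0)$, let $H_{11 \cdot 2} = H_{11} - H_{12} H_{22}^{-1} H_{21}$ and note that

$$ H^{-1}(\theta_0) = \begin{pmatrix} H_{11 \cdot 2}^{-1} & -H_{11 \cdot 2}^{-1} H_{12} H_{22}^{-1} \\ -H_{22}^{-1} H_{21} H_{11 \cdot 2}^{-1} & H_{22}^{-1} + H_{22}^{-1} H_{21} H_{11 \cdot 2}^{-1} H_{12} H_{22}^{-1} \end{pmatrix}. $$

The asymptotic covariance matrix of $\hat{\psi}$ is given by $H_{11 \cdot 2}^{-1}$, where

$$ H_{11 \cdot 2} = H_{11} - H_{12} H_{22}^{-1} H_{21} = G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') - \frac{2}{N} \operatorname{diag}(G_0) \tau_N \tau_N' \operatorname{diag}(G_0). $$

Under the identification conditions established in Proposition 1, $H_{11 \cdot 2}$ is full rank. Further, $G_0$ has bounded maximum absolute row and column sum norms (see Appendix A), thus it also follows that $G_0 \odot G_0'$ has a bounded maximum absolute row (column) sum norm, and hence $H_{11 \cdot 2}$ has a bounded maximum absolute row (column) sum norm. The main result of this section can now be summarised in the following proposition.

Proposition 2 Consider the heterogeneous spatial autoregressive (HSAR) model given by (3) and suppose that: (a) Assumptions 1 to 5 hold, (b) the invertibility condition (7) is met, (c) the $N \times N$ information matrix

$$ H_{11 \cdot 2} = G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') - \frac{2}{N} \operatorname{diag}(G_0) \tau_N \tau_N' \operatorname{diag}(G_0) $$

is full rank, where $G_0 = W (I_N - \Psi_0 W)^{-1}$, $\Psi_0 = \operatorname{diag}(\psi_0)$, $\psi_0 = (\psi_{10}, \psi_{20}, \ldots, \psi_{N0})'$, the $i$-th element of $\operatorname{diag}(G_0 G_0')$ is given by $g_{0i}' g_{0i}$, and $W$ is the spatial weight matrix, and (d) $\varepsilon_{it} \sim IIDN(0, \sigma_0^2)$. Then the maximum likelihood estimator of $\psi_0$, denoted by $\hat{\psi}$ and computed by (26), has the following asymptotic distribution as $T \to \infty$,

$$ \sqrt{T} (\hat{\psi} - \psi_0) \to_d N\left[ 0, AsyVar(\hat{\psi}) \right], $$

where

$$ AsyVar(\hat{\psi}) = \left[ G_0 \odot G_0' + \operatorname{diag}(G_0 G_0') - \frac{2}{N} \operatorname{diag}(G_0) \tau_N \tau_N' \operatorname{diag}(G_0) \right]^{-1}, \tag{37} $$

which does not depend on $\sigma_0^2$.

Remark 4 In the case where $\varepsilon_{it}$ are non-Gaussian but $E|\varepsilon_{it}|^{4+\epsilon} < K$ holds for some small positive $\epsilon$, the quasi maximum likelihood estimator, $\hat{\psi}$, continues to be asymptotically normally distributed but its asymptotic covariance matrix is given by the upper $N \times N$ partition of $H^{-1}(\theta_0) J(\theta_0, \gamma) H^{-1}(\theta_0)$, where $J(\theta_0, \gamma)$ and $H(\theta_0)$ are defined by (33) and (36), respectively. Note that $\gamma$ is defined by (32), and as noted earlier under Gaussian errors it takes the value of $\gamma = 2$, and we have $J(\theta_0, 2) = H(\theta_0)$.

5.1 Consistent estimation of $AsyVar(\hat{\theta})$

The asymptotic covariance matrix of $\hat{\theta}$ can be constructed directly by using the expressions derived for (33) and (36), yielding the standard formula

$$ AsyVar(\hat{\theta}) = H^{-1}(\theta_0), \tag{38} $$

when the information matrix equality holds in the case of $\varepsilon_{it} \sim IIDN(0, \sigma_0^2)$ and $\gamma = 2$, and the sandwich formula

$$ AsyVar(\hat{\theta}) = H^{-1}(\theta_0) J(\theta_0, \gamma) H^{-1}(\theta_0), \tag{39} $$

otherwise. Consistent estimators of $H(\theta_0)$ and $J(\theta_0, \gamma)$ can be obtained by replacing $\theta_0$ with its QML estimator, $\hat{\theta}$, and estimating $\gamma$ by

$$ \hat{\gamma} = \frac{1}{NT} \sum_{t=1}^{T} \sum_{i=1}^{N} \left( \frac{\hat{\varepsilon}_{it}}{\hat{\sigma}} \right)^4 - 1, $$

where $\hat{\varepsilon}_{it} = y_{it} - \hat{\psi}_i \sum_{j=1}^{N} w_{ij} y_{jt}$, with $\hat{\sigma}^2$ and $\hat{\psi}_i$ being the QML estimators of $\sigma_0^2$ and $\psi_{i0}$, respectively. Alternatively, one can use the sample counterparts of $J(\theta_0, \gamma)$ and $H(\theta_0)$ and estimate the covariance matrix of the QML estimators by

$$ \widehat{Var}(\hat{\theta}) = \frac{1}{T} \hat{H}_T^{-1}(\hat{\theta}), \tag{40} $$

and

$$ \widehat{Var}(\hat{\theta}) = \frac{1}{T} \hat{H}_T^{-1}(\hat{\theta}) \hat{J}_T(\hat{\theta}) \hat{H}_T^{-1}(\hat{\theta}), \tag{41} $$

where $\hat{H}_T(\hat{\theta}) = -T^{-1} \, \partial^2 \ell(\hat{\theta}) / \partial \theta \, \partial \theta'$ and $\hat{J}_T(\hat{\theta}) = T^{-1} \sum_{t=1}^{T} \left[ \partial \ell_t(\hat{\theta}) / \partial \theta \right] \left[ \partial \ell_t(\hat{\theta}) / \partial \theta' \right]$, with $\ell(\theta) = \sum_{t=1}^{T} \ell_t(\theta)$. The first and second derivatives are provided in Appendix C for the general case discussed in the next section.

6 The HSAR model with heteroskedastic error variances and exogenous regressors

The heterogeneous spatial autoregressive model (3) can be extended to include exogenous regressors as well as heteroskedastic errors. In this case the spatial model can be written as

$$ y_{it} = \psi_i \sum_{j=1}^{N} w_{ij} y_{jt} + \beta_i' x_{it} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \; t = 1, 2, \ldots, T, \tag{42} $$

where, as before, $\sum_{j=1}^{N} w_{ij} y_{jt} = w_i' y_t$, $y_t = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$ and $w_i = (w_{i1}, w_{i2}, \ldots, w_{iN})'$ with $w_{ii} = 0$. Now, we also introduce a $k \times 1$ vector of exogenous regressors $x_{it} = (x_{i1,t}, x_{i2,t}, \ldots, x_{ik,t})'$ with parameters $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{ik})'$. The above specification is sufficiently general and allows for the inclusion of fixed effects by setting one of the regressors, say $x_{i1,t}$, equal to unity. We also allow the errors, $\varepsilon_{it}$, to be cross-sectionally heteroskedastic, namely $Var(\varepsilon_{it}) = \sigma_i^2$ for $i = 1, 2, \ldots, N$. Stacking by individual units for each time period $t$, (42) becomes

$$ y_t = \Psi W y_t + B x_t + \varepsilon_t, \quad t = 1, 2, \ldots, T, \tag{43} $$

where $\Psi = \operatorname{diag}(\psi)$ and $\psi = (\psi_1, \psi_2, \ldots, \psi_N)'$, $W = (w_{ij})$, $i, j = 1, 2, \ldots, N$, $B = \operatorname{diag}(\beta_1', \beta_2', \ldots, \beta_N')$, $x_t = (x_{1t}', x_{2t}', \ldots, x_{Nt}')'$, and $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$. Under Assumption 1 and assuming condition (7) holds, (43) can be written as

$$ y_t = (I_N - \Psi W)^{-1} (B x_t + \varepsilon_t), \quad t = 1, 2, \ldots, T. \tag{44} $$

The quasi log-likelihood function can then be written as (assuming that the errors are Gaussian)

$$ \ell(\theta) = -\frac{NT}{2} \ln(2\pi) - \frac{T}{2} \sum_{i=1}^{N} \ln \sigma_i^2 + T \ln |I_N - \Psi W| - \frac{1}{2} \sum_{t=1}^{T} \left[ (I_N - \Psi W) y_t - B x_t \right]' \Sigma_\varepsilon^{-1} \left[ (I_N - \Psi W) y_t - B x_t \right], \tag{45} $$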

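where $\Sigma_\varepsilon = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)$. Alternatively, it is often more convenient to write the above log-likelihood function as

$$ \ell(\theta) = -\frac{NT}{2} \ln(2\pi) - \frac{T}{2} \sum_{i=1}^{N} \ln \sigma_i^2 + T \ln |I_N - \Psi W| - \frac{1}{2} \sum_{i=1}^{N} \frac{(y_i - \psi_i y_i^* - X_i \beta_i)' (y_i - \psi_i y_i^* - X_i \beta_i)}{\sigma_i^2}, \tag{46} $$

where $\theta = (\psi', \beta_1', \beta_2', \ldots, \beta_N', \sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)'$ with $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{ik})'$, $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$ is the $T \times k$ matrix of regressors on the $i$-th cross section unit with $x_{it} = (x_{i1,t}, x_{i2,t}, \ldots, x_{ik,t})'$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, and $y_i^* = (y_{i1}^*, y_{i2}^*, \ldots, y_{iT}^*)'$, with the elements $y_{it}^* = \sum_{j=1}^{N} w_{ij} y_{jt} = w_i' y_t$. The analysis of this model can proceed as in the case of the HSAR model discussed earlier without further conceptual complications. We make the following assumptions:

Assumption 6 The $N(k+2) \times 1$ parameter vector, $\theta = (\psi', \beta_1', \beta_2', \ldots, \beta_N', \sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)' \in \Theta$, is a sub-set of the $N(k+2)$ dimensional Euclidean space, $\mathbb{R}^{N(k+2)}$. $\Theta$ is a closed and bounded (compact) set and includes the true value of $\theta$, denoted by $\theta_0$, as an interior point, and $\sup_i \|\beta_i\| < K$.

Assumption 7 The error terms, $\varepsilon_{it}$, $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$, are independently distributed over $i$ and $t$, have zero means, and constant variances, $E(\varepsilon_{it}^2) = \sigma_{i0}^2$, where $0 < \kappa_1 \le \sigma_{i0}^2 \le \kappa_2 < \infty$, and $\kappa_1$, $\kappa_2$ are finite generic constants independent of $N$.

Assumption 8 The regressors, $x_{it}$, for $i = 1, 2, \ldots, N$, are exogenous such that $E(x_{it} \varepsilon_{jt}) = 0$ for all $i$ and $j$, and $T^{-1} \sum_{t=1}^{T} x_{it} \varepsilon_{jt} \to_p 0$, uniformly in $i$ and $j = 1, 2, \ldots, N$. The covariance matrices $E(x_{it} x_{jt}') = \Sigma_{ij}$, for all $i$ and $j$, are time-invariant and finite, $\Sigma_{ii}$ is non-singular, $T^{-1} X_i' X_j \to_p \Sigma_{ij}$, $\sup_i \lambda_{\max}(T^{-1} X_i' X_i) < K$, and $\inf_i \lambda_{\min}(T^{-1} X_i' X_i) > 0$.

Remark 5 Assumption 8 is standard and allows for the regressors to be cross-sectionally correlated. This is sufficiently general and applies both when $N$ is finite and when it rises without bounds.

The analysis of identification can now proceed as before and will be based on $H(\theta_0) = E_0\left[ -T^{-1} \partial^2 \ell(\theta_0) / \partial \theta \, \partial \theta' \right]$. The relevant partial derivatives of the associated Hessian matrix are given in Appendix C. Setting $\beta = (\beta_1', \beta_2', \ldots, \beta_N')'$ and $\sigma^2 = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)'$, we write

$$ H(\theta_0) = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \theta \, \partial \theta'} \right] = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ H_{12}' & H_{22} & H_{23} \\ H_{13}' & H_{23}' & H_{33} \end{pmatrix}_{N(k+2) \times N(k+2)}, \tag{47} $$

where

$$ H_{11} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \psi'} \right], \quad H_{12} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \beta'} \right], \quad H_{13} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \psi \, \partial \sigma^{2\prime}} \right], $$

$$ H_{22} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \beta \, \partial \beta'} \right], \quad H_{23} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \beta \, \partial \sigma^{2\prime}} \right], \quad H_{33} = E_0\left[ -\frac{1}{T} \frac{\partial^2 \ell(\theta_0)}{\partial \sigma^2 \, \partial \sigma^{2\prime}} \right]. $$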

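$H_{11}$ is given by the $N \times N$ matrix

$$ H_{11} = G_0 \odot G_0' + \operatorname{diag}\left[ \frac{1}{\sigma_{i0}^2 T} \sum_{t=1}^{T} E_0(y_{it}^{*2}), \; i = 1, 2, \ldots, N \right], $$

where, as before, $G_0 = W (I_N - \Psi_0 W)^{-1}$ with its $i$-th row denoted by $g_{0i}'$, and

$$ \frac{1}{T} \sum_{t=1}^{T} E_0(y_{it}^{*2}) = w_i' (I_N - \Psi_0 W)^{-1} \left[ B_0 E(x_t x_t') B_0' + \Sigma_{\varepsilon 0} \right] (I_N - W' \Psi_0)^{-1} w_i = g_{0i}' \left[ B_0 E(x_t x_t') B_0' + \Sigma_{\varepsilon 0} \right] g_{0i}, $$

where $B_0 E(x_t x_t') B_0'$ is an $N \times N$ matrix with its $(r, s)$ element given by $\beta_{r0}' \Sigma_{rs} \beta_{s0}$. $H_{12}$ is an $N \times kN$ matrix with its $i$-th row given by a $1 \times kN$ vector of zeros except for its $i$-th block, which is given by the $1 \times k$ vector $\sigma_{i0}^{-2} T^{-1} E_0(y_i^{*\prime} X_i)$, namely

$$ H_{12} = \begin{pmatrix} \sigma_{10}^{-2} T^{-1} E_0(y_1^{*\prime} X_1) & 0 & \cdots & 0 \\ 0 & \sigma_{20}^{-2} T^{-1} E_0(y_2^{*\prime} X_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{N0}^{-2} T^{-1} E_0(y_N^{*\prime} X_N) \end{pmatrix}, $$

where

$$ \frac{1}{T} E_0(y_i^{*\prime} X_i) = \frac{1}{T} \sum_{t=1}^{T} E_0(y_{it}^* x_{it}') = \frac{1}{T} \sum_{t=1}^{T} E_0(w_i' y_t x_{it}') = w_i' (I_N - \Psi_0 W)^{-1} B_0 E(x_t x_{it}') = g_{0i}' \left( \Sigma_{i1} \beta_{10}, \Sigma_{i2} \beta_{20}, \ldots, \Sigma_{iN} \beta_{N0} \right)'. $$

$H_{13}$ is an $N \times N$ diagonal matrix with its $i$-th element given by $\sigma_{i0}^{-2} w_i' (I_N - \Psi_0 W)^{-1} e_{i,N} = \sigma_{i0}^{-2} g_{0,ii}$, $H_{22}$ is an $Nk \times Nk$ block diagonal matrix with its $i$-th block given by $\sigma_{i0}^{-2} \Sigma_{ii}$, $H_{23} = 0$, and $H_{33} = \operatorname{diag}\left( 1/(2\sigma_{10}^4), 1/(2\sigma_{20}^4), \ldots, 1/(2\sigma_{N0}^4) \right)$.

For a finite $N$ and as $T \to \infty$, all the parameters are identified under Assumptions 1, 4, 5, 6, 7, and 8, and assuming that $H(\theta_0)$, defined by (47), is a positive definite matrix. Since under these Assumptions $\sigma_{i0}^2 > 0$ and $\Sigma_{ii} > 0$ for all $i$, then once the $\psi_i$'s are identified, the $\beta_i$'s and $\sigma_i^2$'s are also identified conditional on the $\psi_i$'s. To derive the necessary and sufficient condition for identification of the spatial parameters we partition $H(\theta_0)$ as follows

$$ H(\theta_0) = \begin{pmatrix} H_{11} & \tilde{H}_{12} \\ \tilde{H}_{12}' & \tilde{H}_{22} \end{pmatrix}, $$

where $\tilde{H}_{12} = (H_{12}, H_{13})$ is an $N \times (Nk + N)$ matrix, and since $H_{23} = H_{32}' = 0$, then $\tilde{H}_{22} = \operatorname{diag}(H_{22}, H_{33})$, which is an $(Nk + N) \times (Nk + N)$ matrix. The asymptotic covariance matrix of $\hat{\psi}$ is given by $H_{11 \cdot 2}^{-1}$, where

$$ H_{11 \cdot 2} = H_{11} - \tilde{H}_{12} \tilde{H}_{22}^{-1} \tilde{H}_{12}' = H_{11} - H_{12} H_{22}^{-1} H_{12}' - H_{13} H_{33}^{-1} H_{13}'. \tag{48} $$

But

$$ H_{11} = G_0 \odot G_0' + \operatorname{diag}\left[ \sigma_{i0}^{-2} g_{0i}' \left( B_0 \Sigma_{xx} B_0' + \Sigma_{\varepsilon 0} \right) g_{0i}, \; i = 1, 2, \ldots, N \right], $$

where the $(r, s)$ element of $B_0 \Sigma_{xx} B_0'$ is given by $\beta_{r0}' \Sigma_{rs} \beta_{s0}$ as before, and $g_{0i}' \Sigma_{\varepsilon 0} g_{0i} = \sum_{s=1}^{N} \sigma_{s0}^2 g_{0,is}^2$. Hence

$$ H_{11} = G_0 \odot G_0' + \operatorname{diag}\left[ \sum_{s=1}^{N} (\sigma_{s0}^2 / \sigma_{i0}^2) g_{0,is}^2, \; i = 1, 2, \ldots, N \right] + \operatorname{diag}\left[ \sigma_{i0}^{-2} \sum_{r=1}^{N} \sum_{s=1}^{N} g_{0,is} g_{0,ir} \beta_{r0}' \Sigma_{rs} \beta_{s0}, \; i = 1, 2, \ldots, N \right]. $$

Similarly,

$$ H_{12} H_{22}^{-1} H_{12}' = \operatorname{diag}\left[ \sigma_{i0}^{-2} w_i' (I_N - \Psi_0 W)^{-1} B_0 E(x_t x_{it}') \Sigma_{ii}^{-1} E(x_{it} x_t') B_0' (I_N - W' \Psi_0)^{-1} w_i, \; i = 1, 2, \ldots, N \right] $$

$$ = \operatorname{diag}\left[ \sigma_{i0}^{-2} g_{0i}' \left( \Sigma_{i1} \beta_{10}, \ldots, \Sigma_{iN} \beta_{N0} \right)' \Sigma_{ii}^{-1} \left( \Sigma_{i1} \beta_{10}, \ldots, \Sigma_{iN} \beta_{N0} \right) g_{0i}, \; i = 1, 2, \ldots, N \right] $$

$$ = \operatorname{diag}\left[ \sigma_{i0}^{-2} \sum_{r=1}^{N} \sum_{s=1}^{N} g_{0,is} g_{0,ir} \beta_{r0}' \Sigma_{ri} \Sigma_{ii}^{-1} \Sigma_{is} \beta_{s0}, \; i = 1, 2, \ldots, N \right], $$

and

$$ H_{13} H_{33}^{-1} H_{13}' = \operatorname{diag}\left[ 2 g_{0,ii}^2, \; i = 1, 2, \ldots, N \right]. $$

Using the above results in (48) now yields

$$ H_{11 \cdot 2} = G_0 \odot G_0' + \operatorname{diag}\left[ -g_{0,ii}^2 + \sum_{s=1, s \neq i}^{N} (\sigma_{s0}^2 / \sigma_{i0}^2) g_{0,is}^2, \; i = 1, 2, \ldots, N \right] + \operatorname{diag}\left[ \sigma_{i0}^{-2} \sum_{r=1}^{N} \sum_{s=1}^{N} g_{0,is} g_{0,ir} \beta_{r0}' \left( \Sigma_{rs} - \Sigma_{ri} \Sigma_{ii}^{-1} \Sigma_{is} \right) \beta_{s0}, \; i = 1, 2, \ldots, N \right]. \tag{49} $$

The necessary and sufficient condition for identification of the $\psi_{i0}$'s in the general model, (42), is now given by $\operatorname{Rank}(H_{11 \cdot 2}) = N$. In the special case where the regressors are cross-sectionally uncorrelated, namely when $\Sigma_{rs} = 0$ if $r \neq s$, the third term in the above result vanishes and we have

$$ H_{11 \cdot 2} = G_0 \odot G_0' + \operatorname{diag}\left[ -g_{0,ii}^2 + \sum_{s=1, s \neq i}^{N} (\sigma_{s0}^2 / \sigma_{i0}^2) g_{0,is}^2, \; i = 1, 2, \ldots, N \right], $$

which does not depend on the $\beta_i$'s or the exogenous regressors.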
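To make the rank condition concrete, the following sketch evaluates the expression $G_0 \odot G_0' + \operatorname{diag}[-g_{0,ii}^2 + \sum_{s \neq i} (\sigma_{s0}^2/\sigma_{i0}^2) g_{0,is}^2]$ and compares it with the Schur complement $H_{11} - H_{13} H_{33}^{-1} H_{13}'$ built from the individual blocks. It covers only the case with no regressors ($\beta_i = 0$ for all $i$), where the regressor term is absent by construction. The function names, the five-unit ring weight matrix and all parameter values are assumptions made for the illustration.

```python
import numpy as np

def h11_2_no_regressors(G0, sigma2):
    """G0 (Hadamard) G0' + diag(-g_ii^2 + sum_{s != i} (sigma_s^2 / sigma_i^2) g_is^2)."""
    g_diag = np.diag(G0)
    ratio = sigma2[None, :] / sigma2[:, None]   # entry (i, s) is sigma_s^2 / sigma_i^2
    weighted = (ratio * G0 ** 2).sum(axis=1)    # sum over all s, including s = i
    # Dropping the s = i term (g_ii^2) and adding -g_ii^2 gives weighted - 2 g_ii^2.
    return G0 * G0.T + np.diag(weighted - 2.0 * g_diag ** 2)

def schur_from_blocks(G0, sigma2):
    """H11 - H13 H33^{-1} H13' assembled from the blocks, with beta_i = 0."""
    g_diag = np.diag(G0)
    ratio = sigma2[None, :] / sigma2[:, None]
    H11 = G0 * G0.T + np.diag((ratio * G0 ** 2).sum(axis=1))
    H13 = np.diag(g_diag / sigma2)
    H33 = np.diag(1.0 / (2.0 * sigma2 ** 2))
    return H11 - H13 @ np.linalg.inv(H33) @ H13.T

# Hypothetical example: five units on a ring, heteroskedastic variances.
N = 5
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5
psi0 = np.array([0.2, 0.4, -0.3, 0.5, 0.1])
sigma2 = np.array([1.0, 0.5, 2.0, 1.5, 0.8])
G0 = W @ np.linalg.inv(np.eye(N) - np.diag(psi0) @ W)

H = h11_2_no_regressors(G0, sigma2)
H_check = schur_from_blocks(G0, sigma2)
rank_H = np.linalg.matrix_rank(H)               # identification requires rank_H == N
min_eig = np.linalg.eigvalsh(H).min()           # and, for large N, a lower bound on this
```

The diagonal of `H` equals $\sum_{s \neq i} (\sigma_{s0}^2/\sigma_{i0}^2) g_{0,is}^2$, since the $g_{0,ii}^2$ contributed by the Hadamard term cancels the $-g_{0,ii}^2$ inside the diagonal term.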

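In the case where $N$ is large, for identification of $\psi_0$ we require the following two conditions

$$\lambda_{\min}\left(H_{11\cdot 2}\right) > \epsilon > 0, \text{ for all } N,$$

$$\lambda_{\max}\left(H_{11\cdot 2}\right) < K < \infty, \text{ for all } N.$$

The first condition is the usual rank condition when $N$ is finite. Regarding the second condition we first note that since under the identification condition $H_{11\cdot 2}$ is a symmetric positive definite matrix, then $\lambda_{\max}\left(H_{11\cdot 2}\right) = \sqrt{\lambda_{\max}\left(H_{11\cdot 2}'H_{11\cdot 2}\right)} = \left\|H_{11\cdot 2}\right\|$. Also it is easily seen that

$$\left\|H_{11\cdot 2}\right\| \leq \left\|G_0 \odot G_0'\right\| + \sup_i\left|-g_{0,ii}^{2} + \sum_{s=1,\,s\neq i}^{N}\left(\sigma_{s0}^{2}/\sigma_{i0}^{2}\right)g_{0,is}^{2}\right| + \sup_i\left|\sigma_{i0}^{-2}\sum_{r=1}^{N}\sum_{s=1}^{N}g_{0,is}\,g_{0,ir}\,\beta_{r0}'\left(\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right)\beta_{s0}\right|. \qquad (50)$$

Recall that $\left\|G_0\right\|_1 < K$ and $\left\|G_0\right\|_\infty < K$ are bounded in $N$ under the conditions of Lemma 2 in Appendix A, from which it also follows that $\left\|G_0 \odot G_0'\right\| < K$. Furthermore,

$$\left|-g_{0,ii}^{2} + \sum_{s=1,\,s\neq i}^{N}\left(\sigma_{s0}^{2}/\sigma_{i0}^{2}\right)g_{0,is}^{2}\right| \leq \frac{\sup_i \sigma_{i0}^{2}}{\inf_i \sigma_{i0}^{2}}\sum_{s=1}^{N}g_{0,is}^{2} < K,$$

since the maximum absolute row (column) norm of matrix $G_0$ is bounded, and $0 < \kappa_1 \leq \sigma_{i0}^{2} \leq \kappa_2 < \infty$ by assumption. Consider now the last term of (50) and note that

$$\left|\sigma_{i0}^{-2}\sum_{r=1}^{N}\sum_{s=1}^{N}g_{0,is}\,g_{0,ir}\,\beta_{r0}'\left(\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right)\beta_{s0}\right| \leq \sigma_{i0}^{-2}\sum_{r=1}^{N}\sum_{s=1}^{N}\left|g_{0,is}\right|\left|g_{0,ir}\right|\left|\beta_{r0}'\left(\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right)\beta_{s0}\right|$$

$$\leq \sigma_{i0}^{-2}\sup_{r,s}\left|\beta_{r0}'\left(\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right)\beta_{s0}\right|\sum_{r=1}^{N}\sum_{s=1}^{N}\left|g_{0,is}\right|\left|g_{0,ir}\right|$$

$$\leq \sigma_{i0}^{-2}\sup_r\left\|\beta_{r0}\right\|\sup_s\left\|\beta_{s0}\right\|\sup_{r,s}\left\|\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right\|\left(\sum_{s=1}^{N}\left|g_{0,is}\right|\right)^{2}.$$

Under Assumptions 6, 7 and 8, $\inf_i \sigma_{i0}^{2} > 0$, and $\sup_r\left\|\beta_{r0}\right\|$, $\sup_s\left\|\beta_{s0}\right\|$, $\Sigma_{rs}$ and $\Sigma_{ii}^{-1}$ exist and are finite. Also $\sup_i\sum_{s=1}^{N}\left|g_{0,is}\right| = \left\|G_0\right\|_\infty$, which is bounded under our assumptions. Hence it follows that

$$\sup_i\left|\sigma_{i0}^{-2}\sum_{r=1}^{N}\sum_{s=1}^{N}g_{0,is}\,g_{0,ir}\,\beta_{r0}'\left(\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right)\beta_{s0}\right| < K.$$

Therefore, overall we have $\lambda_{\max}\left(H_{11\cdot 2}\right) = \sqrt{\lambda_{\max}\left(H_{11\cdot 2}'H_{11\cdot 2}\right)} < K$, for all $N$. Combining this result with the identification condition we have

$$\epsilon < \lambda_{\min}\left(H_{11\cdot 2}\right) < \lambda_{\max}\left(H_{11\cdot 2}\right) < K,$$

for some small positive $\epsilon$ and a finite (possibly large) positive constant $K$. Inverting the above inequality (recall that for any positive definite matrix $A$ we have $\lambda_{\max}\left(A^{-1}\right) = 1/\lambda_{\min}(A)$) also yields

$$\frac{1}{K} \leq \lambda_{\min}\left(H_{11\cdot 2}^{-1}\right) \leq \lambda_{\max}\left(H_{11\cdot 2}^{-1}\right) \leq \frac{1}{\epsilon},$$

which establishes that $H_{11\cdot 2}^{-1}$ exists and is bounded even for $N$ large. The main result of this section can now be summarised in the following proposition.

Proposition 3 Consider the heterogeneous spatial autoregressive (HSAR) model given by (4) and suppose that: (a) Assumptions, 4, 5, and 6, 7, and 8 hold, (b) the invertibility condition (7) is met, (c) $\lambda_{\min}\left(H_{11\cdot 2}\right) > \epsilon > 0$, for all $N$, where $H_{11\cdot 2}$ is the $N \times N$ matrix

$$H_{11\cdot 2} = \left(G_0 \odot G_0'\right) + \mathrm{diag}\left[-g_{0,ii}^{2} + \sum_{s=1,\,s\neq i}^{N}\left(\sigma_{s0}^{2}/\sigma_{i0}^{2}\right)g_{0,is}^{2},\ i=1,2,\ldots,N\right] + \mathrm{diag}\left[\sigma_{i0}^{-2}\sum_{r=1}^{N}\sum_{s=1}^{N}g_{0,is}\,g_{0,ir}\,\beta_{r0}'\left(\Sigma_{rs}-\Sigma_{ri}\Sigma_{ii}^{-1}\Sigma_{is}\right)\beta_{s0},\ i=1,2,\ldots,N\right],$$

$G_0 = W\left(I_N-\Psi_0 W\right)^{-1} = \left(g_{0,ij}\right)$, $\Psi_0 = \mathrm{diag}\left(\psi_0\right)$, $\psi_0 = \left(\psi_{10}, \psi_{20}, \ldots, \psi_{N0}\right)'$, and $W$ is the spatial weight matrix, and (d) $\varepsilon_{it} \sim IIDN\left(0, \sigma_{i0}^{2}\right)$. Then the maximum likelihood estimator of $\psi_0$, denoted by $\hat{\psi}$ and computed by (6), has the following asymptotic distribution as $T \to \infty$,

$$\sqrt{T}\left(\hat{\psi}-\psi_0\right) \to_d N\left[0, AsyVar\left(\hat{\psi}\right)\right],$$

where $AsyVar\left(\hat{\psi}\right) = H_{11\cdot 2}^{-1}$. The asymptotic results hold for any $N$.

Remark 6 The results of the above proposition readily extend to spatial models with non-Gaussian errors, but as in the case of pure spatial models the estimates of the variance matrix of the QML estimators must be based on the sandwich formula

$$Var\left(\hat{\theta}\right) = H^{-1}\left(\hat{\theta}\right)J\left(\hat{\theta}\right)H^{-1}\left(\hat{\theta}\right),$$

where $H\left(\hat{\theta}\right)$ and $J\left(\hat{\theta}\right)$ are given in Appendix C.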
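The algebra of the sandwich formula in Remark 6 can be sketched as follows. This is only an illustration: the inputs `H` and `J` stand for estimates of the two matrices given in Appendix C, which are not constructed here, the function name `sandwich_cov` is hypothetical, and any normalisation by $T$ is left to the caller.

```python
import numpy as np

def sandwich_cov(H, J):
    """Return H^{-1} J H^{-1} using linear solves instead of explicit inverses."""
    H = np.asarray(H, dtype=float)
    J = np.asarray(J, dtype=float)
    left = np.linalg.solve(H, J)            # H^{-1} J
    return np.linalg.solve(H.T, left.T).T   # (H^{-1} J) H^{-1}
```

When `J` equals `H`, the product collapses to $H^{-1}$, which corresponds to the standard covariance formula used for Gaussian errors in Proposition 3.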

7 Monte Carlo study

We investigate the small sample properties of the proposed QML estimator through a Monte Carlo simulation study. We consider the following data generating process

$$y_{it} = a_i + \psi_i\sum_{j=1}^{N}w_{ij}y_{jt} + \beta_i x_{it} + \varepsilon_{it}, \quad i=1,2,\ldots,N;\ t=1,2,\ldots,T. \qquad (51)$$

We include one exogenous regressor, $x_{it}$, with coefficient $\beta_i$ in each regression. Stacking by individual units, (51) becomes

$$y_t = a + \Psi W y_t + B x_t + \varepsilon_t, \quad t=1,2,\ldots,T, \qquad (52)$$

where $a = \left(a_1, a_2, \ldots, a_N\right)'$, $\Psi = \mathrm{diag}(\psi)$ and $\psi = \left(\psi_1, \psi_2, \ldots, \psi_N\right)'$, $W = \left(w_{ij}\right)$, $i,j = 1,2,\ldots,N$, $B = \mathrm{diag}(\beta)$, where $\beta = \left(\beta_1, \beta_2, \ldots, \beta_N\right)'$, $x_t = \left(x_{1t}, x_{2t}, \ldots, x_{Nt}\right)'$, and $\varepsilon_t = \left(\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt}\right)'$. The unknown parameters are summarised in the vector $\theta = \left(a', \psi', \beta', \sigma^{2\prime}\right)'$, $\sigma^2 = \left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2\right)'$. In total there are $4N$ unknown parameters. We generate $x_{it}$ as

$$x_{it} = \varphi_i w_i' x_t + v_{it}, \qquad (53)$$

or in matrix form $x_t = \left(I_N-\Phi W\right)^{-1}v_t$, where $\Phi = \mathrm{diag}\left(\varphi_1, \varphi_2, \ldots, \varphi_N\right)$ and $v_t = \left(v_{1t}, v_{2t}, \ldots, v_{Nt}\right)'$, with $v_{it} \sim IIDN\left(0, \sigma_v^2\right)$. We set $\varphi_i = 0.5$, representing a moderate degree of spatial dependence, and

$$\sigma_v^2 = \frac{N}{\mathrm{tr}\left[\left(I_N-\Phi W\right)^{-1}\left(I_N-\Phi W\right)^{-1\prime}\right]}.$$

This ensures that $N^{-1}\sum_{i=1}^{N}Var\left(x_{it}\right) = 1$. For $W = \left(w_{ij}\right)$ in (53) we use the 4-connection spatial weight matrix described below.

We consider both Gaussian and non-Gaussian errors. Specifically we consider the following two error generating processes: $\varepsilon_{it}/\sigma_{i0} \sim IIDN(0,1)$, and $\varepsilon_{it}/\sigma_{i0} \sim IID\left[\chi^2(2)-2\right]/2$, for $i=1,2,\ldots,N$ and $t=1,2,\ldots,T$, where $\chi^2(2)$ is a chi-square variate with 2 degrees of freedom. The $\sigma_{i0}^2$ are generated as independent draws from $\chi^2(2)/4 + 0.5$, for $i=1,2,\ldots,N$, and kept fixed across the replications.

For the weight matrix, $W = \left(w_{ij}\right)$, we first use contiguity criteria to generate the non-normalised weights, $w_{ij}^{o}$, then row normalise the resultant weight matrices to obtain $w_{ij}$. More specifically, we consider $W$ matrices with 2, 4 and 10 connections and generate $w_{ij}^{o}$, for $i,j = 1,2,\ldots,N$, as

2 connections: $w_{i,i-1}^{o} = w_{i,i+1}^{o} = 1$, and zero otherwise,
4 connections: $w_{i,i-2}^{o} = w_{i,i-1}^{o} = w_{i,i+1}^{o} = w_{i,i+2}^{o} = 1$, and zero otherwise,
10 connections: $w_{i,i-1}^{o} = \cdots = w_{i,i-5}^{o} = w_{i,i+1}^{o} = \cdots = w_{i,i+5}^{o} = 1$, and zero otherwise.

Since by construction $\left\|W\right\|_\infty = 1$, condition (7) is satisfied if $\sup_i\left|\psi_i\right| < 1$, which ensures that $I_N-\Psi W$ is invertible. Then,

$$y_t = \left(I_N-\Psi W\right)^{-1}\left(a + B x_t + \varepsilon_t\right); \quad t=1,2,\ldots,T.$$

We consider two main experiments: Experiment A, where the $a_{i0}$ are IID normal draws, the $\beta_{i0}$ are set to zero and $\psi_{i0} \sim IIDU(0, 0.8)$, for $i=1,2,\ldots,N$, and Experiment B, where the $a_{i0}$ are IID normal draws, $\beta_{i0} \sim IIDU(0,1)$ and $\psi_{i0} \sim IIDU(0, 0.8)$, for $i=1,2,\ldots,N$. We consider the following $(N,T)$ combinations: $N = 25, 50, 75, 100$ and $T = 25, 50, 100, 200$, and use $R$ replications for each experiment. Across the replications, $\theta_0$ and the weight matrix, $W$, are kept fixed, whilst the errors and the regressors, $\varepsilon_{it}$ and $x_{it}$ (and hence $y_{it}$), are re-generated randomly in each replication. Note that, as $N$ increases, supplementary units are added to the original vector $\theta_0$ generated initially for $N = 25$.

Due to the problem of simultaneity, the degree of time variation in $y_{it}^{*} = w_i'y_t$ for each unit $i$ depends on the choice of $W$ as well as the number of cross section units, $N$. Naturally, this is reflected in the performance of the $\psi_i$ estimators and the tests based on them. We compute both bias and RMSE of the QML estimators for individual cross section units, as well as their averages across all $N$ cross section units. In addition, we report empirical sizes based on the individual spatial autoregressive parameter estimates and power functions for three units whose distinct true spatial autoregressive parameters, $\psi_{i0}$, are selected to be low, medium and large in magnitude. The experiments are carried out for spatial weight matrices, $W$, with two, four and ten connections and the spatial parameter estimates are denoted by $\hat{\psi}_{iR} = R^{-1}\sum_{r=1}^{R}\hat{\psi}_{i,r}$, where $\hat{\psi}_{i,r}$ refers to the QML estimator of $\psi_i$ in the $r$-th replication.

7.1 Bias and RMSE results

The QML estimators are computed using the likelihood function (46).
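The Monte Carlo design described in this section can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `band_weights` and `simulate_hsar`, the treatment of edge units (a non-circular band, so the first and last units have fewer neighbours) and the default value `phi=0.5` are assumptions of the example.

```python
import numpy as np

def band_weights(N, n_conn):
    """Row-normalised contiguity weights with n_conn/2 neighbours on each side (non-circular)."""
    half = n_conn // 2
    idx = np.arange(N)
    dist = np.abs(idx[:, None] - idx[None, :])
    W0 = ((dist <= half) & (dist > 0)).astype(float)
    return W0 / W0.sum(axis=1, keepdims=True)

def simulate_hsar(W, a, psi, beta, sigma2, T, rng, phi=0.5, gaussian=True, W_x=None):
    """Draw one sample (Y, X), each of shape (T, N), from y_t = a + Psi W y_t + B x_t + eps_t."""
    N = W.shape[0]
    I = np.eye(N)
    # sufficient invertibility condition: sup_i |psi_i| * ||W||_inf < 1
    if np.max(np.abs(psi)) * np.abs(W).sum(axis=1).max() >= 1:
        raise ValueError("sup_i |psi_i| * ||W||_inf must be below one")
    W_x = W if W_x is None else W_x
    # regressors: x_t = (I - Phi W_x)^{-1} v_t, scaled so that the average variance of x_it is one
    Sx = np.linalg.inv(I - phi * W_x)
    sigma_v2 = N / np.trace(Sx @ Sx.T)
    V = rng.normal(scale=np.sqrt(sigma_v2), size=(T, N))
    X = V @ Sx.T
    # standardised errors: N(0, 1) or centred and scaled chi-square with 2 degrees of freedom
    if gaussian:
        U = rng.standard_normal((T, N))
    else:
        U = (rng.chisquare(2, size=(T, N)) - 2.0) / 2.0
    E = U * np.sqrt(sigma2)
    # reduced form: y_t = (I - Psi W)^{-1} (a + B x_t + eps_t)
    S_inv = np.linalg.inv(I - np.diag(psi) @ W)
    Y = (a + X * beta + E) @ S_inv.T
    return Y, X
```

Passing `W_x=band_weights(N, 4)` mirrors the use of the 4-connection matrix for the regressor process, while `W` itself can be any of the three weight matrices.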
We start with the basic model in Experiment A that assumes a first order heterogeneous spatial autoregressive model without any exogenous variables and with normally distributed errors. Table A1 reports the average bias and RMSE of the mean $\hat{\psi}_{iR}$ estimates, averaged across replications and over all cross section units. For each spatial weight matrix used ($W$ with two, four or ten connections) the average biases lie in the vicinity of zero while their corresponding RMSEs decline with $T$ for a given $N$, as is to be expected. Table A2 displays bias and RMSE results for the spatial autoregressive parameters, $\hat{\psi}_{iR}$, of individual units arranged with respect to their true values in ascending order, from the lowest to the highest, for clarity of exposition. These are shown as $N$ increases from 25 to 100. Due to the large number of spatial parameter estimates for the cross section sizes $N$ considered, some of these have been excluded from Table A2 and are available upon request from the authors. Also, to save space, we only report results based on the spatial weight matrix, $W$, with four connections. Results for other choices of spatial weight matrices are available upon request. Overall, it is evident that the biases of the individual spatial autoregressive parameters are close to zero even for small values of $T$, and irrespective of the magnitude of the spatial parameter. As the theory suggests, the quality of the individual spatial parameter estimates is not affected by the size of $N$, but improves with the time dimension. The reported RMSEs decline with $T$ for all values of $N$.

Similar results can be obtained even if the errors are non-Gaussian. Tables A4 and A5 give bias and RMSE of the QML estimators of $\psi_i$ when the errors are generated as IID chi-square random variables. These tables follow the same layout as Tables A1 and A2, and differ from them only in the way the errors are generated. A comparison of the results across these two sets of tables shows that the QML estimator is reasonably robust to non-normal errors, and the rate at which RMSEs decline with $T$ when the errors are generated as IID chi-square tends to be similar to when the errors are generated as IID $N(0,1)$.

The bias and RMSE results for Experiment B with normally distributed errors are summarised in Table B1 for the average estimates across the spatial units, and in Table B2 for the individual spatial estimates, as for Experiment A. It is clear that adding exogenous regressors to the spatial model does not alter the main conclusions and, if anything, their inclusion can marginally improve the precision with which individual spatial parameters are estimated. To save space, results for the non-Gaussian errors scenario for Experiment B are not shown, but the conclusions are qualitatively analogous to those for Experiment A. These results are available upon request from the authors.

7.2 Size and power results

The empirical sizes of the tests based on the individual QML estimates of $\psi_i$ for Experiment A, in the case where the errors are IID normal, are summarized in Table A3. We present results based on the standard and sandwich covariance matrix formulae. As can be seen, in general the tests are correctly sized at 5 per cent for relatively large $T$, although for small values of $T$ there are size distortions when both the standard and sandwich formulae are used.
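The summary statistics reported in the tables can be computed from the replication output along the following lines. This is a sketch under stated assumptions: `estimates` and `std_errs` are hypothetical $R \times N$ arrays holding the QML estimates and their standard errors across replications, and the size calculation assumes a two-sided test at the 5 per cent level with the normal critical value 1.96.

```python
import numpy as np

def bias_rmse(estimates, truth):
    """Per-unit bias and RMSE from an (R, N) array of estimates and an (N,) vector of true values."""
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    bias = err.mean(axis=0)
    rmse = np.sqrt((err ** 2).mean(axis=0))
    return bias, rmse

def empirical_size(estimates, std_errs, truth, crit=1.96):
    """Per-unit rejection frequency of the true null, using t-ratios across replications."""
    t_ratio = (np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)) / np.asarray(std_errs, dtype=float)
    return (np.abs(t_ratio) > crit).mean(axis=0)
```

Averaging the returned vectors over units gives the kind of cross-sectional averages reported alongside the individual results.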
The results based on the standard and sandwich formulae both converge to 5 per cent as $T$ increases, irrespective of the value of $N$. The empirical power functions of the tests are displayed in Figures A1-A3 for three cross section units with a low, medium and high spatial parameter. More precisely, Figure A1 shows a number of power functions under different alternative hypotheses $\psi_i = \psi_{i0} + \delta$, with $\psi_{i0}$ = .3374 and $\delta$ = .8, .79, ..., .79, .8, or until the boundaries of the parameter space are reached, for $i = 1, 2, \ldots, N$. Similarly, Figures A2 and A3 depict the same functions when $\psi_{i0}$ takes a medium value ($\psi_{i0}$ = .559) and a high value ($\psi_{i0}$ = .7676), respectively. Note that in order to save space, the depicted power functions are based on tests where the sandwich formula is used (equivalent power functions based on the standard covariance matrix estimator are available upon request). Overall, for a specific cross section unit, $i$, the empirical power functions are similar for all $N$ but improve with $T$. Furthermore, perhaps not surprisingly, the empirical power functions become more and more asymmetrical as the $\psi_{i0}$ move closer and closer to the boundary value of 1.

Empirical size estimates for Experiment A when errors are non-normal are summarized in Table A6. There are some size distortions when $T$ is small, irrespective of whether the standard or sandwich formulae are used. But as $T$ increases the size distortions of the tests based on both formulae tend to zero. See, for example, the size estimates for $N$ = 5 and $T$ =. The estimates of the power functions, computed using the sandwich formula, are provided in Figures A4-A6, and are comparable to those shown above for the Gaussian case.
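One way to trace out an empirical power function is sketched below. The text does not spell out the mechanics, so this example makes an explicit assumption: the data are generated under the true value $\psi_{i0}$ and, for each $\delta$, the hypothesis $\psi_i = \psi_{i0} + \delta$ is tested with a two-sided t-test, so that the rejection frequency at $\delta = 0$ is the empirical size. The function name `power_curve` and its arguments are hypothetical.

```python
import numpy as np

def power_curve(est, se, psi0, deltas, crit=1.96):
    """Rejection frequencies of psi = psi0 + delta over a grid of deltas for a single unit.

    est, se: length-R arrays of estimates and standard errors across replications.
    """
    est = np.asarray(est, dtype=float)[:, None]
    se = np.asarray(se, dtype=float)[:, None]
    hypothesised = psi0 + np.asarray(deltas, dtype=float)[None, :]
    t_ratio = (est - hypothesised) / se
    return (np.abs(t_ratio) > crit).mean(axis=0)
```

Values of $\psi_{i0} + \delta$ falling outside the admissible parameter space would need to be dropped from the grid before plotting.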

Size and power results for Experiment B with Gaussian errors are summarised in Tables B3-B4, and in Figures B1-B6. The results are comparable to those obtained for Experiment A: there are some size distortions when $T$ is small, but the size estimates converge to their nominal value of 5 per cent as $T$ increases. For completeness, the associated power functions are displayed in Figures B1-B3, which tend to be sharper than those shown in Figures A1-A3 for the experiments without exogenous regressors. Adding an exogenous regressor to the spatial model seems to result in more precise estimates of the spatial coefficients.

Turning to size and power of tests based on the regression coefficients, $\beta_i$, the size results are summarized in Table B4. As can be seen, empirical sizes using the standard variance formula are very close to the nominal value of 5 per cent for all $N$ and $T$ combinations, although there are some size distortions when $T$ is small. On the other hand, using the sandwich formula produces larger size distortions when $T$ is small. The associated power functions computed using the sandwich formula are displayed in Figures B4-B6 for a low value of $\beta_{i0}$ ($\beta_{i0}$ = .344), a medium value ($\beta_{i0}$ = .4898), and a high value ($\beta_{i0}$ = .9649), respectively. Again the empirical power functions are similar across $N$ and improve with $T$.

8 Conclusion

Standard spatial econometric models assume a single parameter to characterise the intensity or strength of spatial dependence. In the case of pure cross section models or panel data models with a short $T$, such a restrictive parameter specification might be inevitable. However, in a data rich environment where both the time ($T$) and cross section ($N$) dimensions are large, this assumption can be relaxed. This paper investigates a spatial autoregressive panel data model with fully heterogeneous spatial parameters.

The asymptotic properties of the quasi maximum likelihood estimator are analysed assuming a sparse spatial structure with each individual unit having at least one connection. Conditions under which the QML estimators of the spatial parameters, $\psi_i$, are consistent and asymptotically normal are derived. It is also shown that under certain bound conditions on $W\left(I_N-\Psi W\right)^{-1}$ the asymptotic properties of the individual estimates are not affected by the size of $N$. The Monte Carlo simulation results provided are supportive of the theoretical findings. Extensions of this model specification that incorporate richer temporal and spatial dynamics and that accommodate negative as well as positive connections are interesting avenues for future research. The methods developed in the paper can also be applied to hierarchical panel data models where spatial parameters are assumed to be the same within regions (groups) but allowed to differ across regions or groups.

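Appendix A Technical lemmas

Lemma 1 Let $W$ comply with Assumption. Then matrix $S(\psi) = I_N-\Psi W$ is non-singular if

$$\sup_i\left|\psi_i\right| < \max\left\{1/\left\|W\right\|_1,\ 1/\left\|W\right\|_\infty\right\}, \quad i=1,2,\ldots,N.$$

Proof. Let $\varrho(\Psi W)$ be the spectral radius of matrix $\Psi W$. Non-singularity of $S(\psi) = I_N-\Psi W$ is ensured if

$$\varrho(\Psi W) < 1. \qquad (A.1)$$

However, since for any matrix norm $\left\|A\right\|$, $\varrho(A) \leq \left\|A\right\|$, then using the maximum column sum matrix norm we have

$$\varrho(\Psi W) \leq \left\|\Psi W\right\|_1 \leq \left\|\Psi\right\|_1\left\|W\right\|_1 = \sup_i\left|\psi_i\right|\left\|W\right\|_1, \qquad (A.2)$$

and from (A.1) we have $\sup_i\left|\psi_i\right|\left\|W\right\|_1 < 1$. Similarly, using the maximum row sum matrix norm we have $\sup_i\left|\psi_i\right|\left\|W\right\|_\infty < 1$, where we have used the result $\left\|\Psi\right\|_1 = \left\|\Psi\right\|_\infty = \sup_i\left|\psi_i\right|$. Therefore, matrix $S(\psi) = I_N-\Psi W$ is invertible if

$$\sup_i\left|\psi_i\right| < \max\left\{1/\left\|W\right\|_1,\ 1/\left\|W\right\|_\infty\right\}, \quad i=1,2,\ldots,N. \qquad (A.3)$$

Lemma 2 Let $G(\psi) = W\left(I_N-\Psi W\right)^{-1}$, and suppose that Assumption holds. Then

$$\left\|G(\psi)\right\|_1 < K \text{ and } \left\|G(\psi)\right\|_\infty < K, \qquad (A.4)$$

for all values of $\psi = \left(\psi_1, \psi_2, \ldots, \psi_N\right)'$ that satisfy condition (7).

Proof. Under condition (7), we have $G(\psi) = W + W\Psi W + W\Psi W\Psi W + \ldots$, and

$$\left\|G(\psi)\right\|_1 \leq \left\|W\right\|_1 + \left\|W\right\|_1^{2}\left\|\Psi\right\|_1 + \left\|W\right\|_1^{3}\left\|\Psi\right\|_1^{2} + \ldots$$

But $\left\|\Psi\right\|_1^{s} = \left[\sup_i\left|\psi_i\right|\right]^{s}$, and under condition (7) we have $\sup_i\left|\psi_i\right|\left\|W\right\|_1 < 1$. Hence

$$\left\|G(\psi)\right\|_1 \leq \frac{\left\|W\right\|_1}{1-\sup_i\left|\psi_i\right|\left\|W\right\|_1}.$$

Similarly,

$$\left\|G(\psi)\right\|_\infty \leq \frac{\left\|W\right\|_\infty}{1-\sup_i\left|\psi_i\right|\left\|W\right\|_\infty}.$$

The boundedness of the column and row matrix norms of $G(\psi)$ now follows since, under Assumption, $\left\|W\right\|_1$ and $\left\|W\right\|_\infty$ are bounded, $1-\sup_i\left|\psi_i\right|\left\|W\right\|_1 > 0$, and $1-\sup_i\left|\psi_i\right|\left\|W\right\|_\infty > 0$.

Appendix B Derivatives of $Q_N(\phi)$

We consider the first and second derivatives of function $Q_N(\phi)$, as specified in (9) and repeated below for convenience:

$$Q_N(\phi) = -\frac{1}{2}\left[\ln(1-\delta) + \delta\right] - \frac{1}{N}\ln\left|I_N-DG_0\right| - \frac{(1-\delta)}{N}\mathrm{tr}\left(DG_0\right) + \frac{(1-\delta)}{2N}\mathrm{tr}\left(G_0G_0'D^{2}\right),$$

where $\phi = \left(d', \delta\right)'$, $\delta = \left(\sigma^{2}-\sigma_0^{2}\right)/\sigma^{2} < 1$, $D = \mathrm{diag}(d)$, and $d = \left(d_1, d_2, \ldots, d_N\right)'$, $d_i = \psi_i-\psi_{i0}$, for $i=1,2,\ldots,N$.

First derivatives

For the first derivatives, we have:

$$\frac{\partial Q_N(\phi)}{\partial \phi} = \begin{pmatrix} \partial Q_N(\phi)/\partial d \\ \partial Q_N(\phi)/\partial \delta \end{pmatrix}_{(N+1)\times 1},$$

$$\frac{\partial Q_N(\phi)}{\partial d} = \left(\frac{\partial Q_N(\phi)}{\partial d_i}\right), \quad \frac{\partial Q_N(\phi)}{\partial d_i} = \frac{1}{N}g_{0i}'\left(I_N-DG_0\right)^{-1}e_i - \frac{(1-\delta)}{N}g_{0,ii} + \frac{(1-\delta)}{N}g_{0i}'g_{0i}\,d_i,$$

where $e_i$ is the $i$-th column of the $N \times N$ identity matrix, $E_{ij}$ is the $N \times N$ matrix of zeros bar element $(i,j)$, which is equal to unity, and $g_{0i}'$ is the $i$-th row of $G_0$. Also,

$$\frac{\partial Q_N(\phi)}{\partial \delta} = \frac{1}{2}\left\{\frac{\delta}{1-\delta} + \frac{1}{N}\left[2\,\mathrm{tr}\left(DG_0\right) - \mathrm{tr}\left(G_0G_0'D^{2}\right)\right]\right\}.$$

Overall, stacking over cross section units we get:

$$\frac{\partial Q_N(\phi)}{\partial \phi} = \begin{pmatrix} \frac{1}{N}\mathrm{diag}\left[G_0\left(I_N-DG_0\right)^{-1}\right]\tau_N - \frac{(1-\delta)}{N}\mathrm{diag}\left(G_0\right)\tau_N + \frac{(1-\delta)}{N}\mathrm{diag}\left(G_0G_0'D\right)\tau_N \\ \frac{1}{2}\left\{\frac{\delta}{1-\delta} + \frac{1}{N}\left[2\,\mathrm{tr}\left(DG_0\right) - \mathrm{tr}\left(G_0G_0'D^{2}\right)\right]\right\} \end{pmatrix},$$

where $\tau_N$ is an $N \times 1$ vector of ones.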
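The first derivatives in Appendix B can be checked against finite differences. The sketch below assumes that $Q_N(\phi) = -\frac{1}{2}[\ln(1-\delta)+\delta] - N^{-1}\ln|I_N-DG_0| - N^{-1}(1-\delta)\,\mathrm{tr}(DG_0) + (2N)^{-1}(1-\delta)\,\mathrm{tr}(G_0G_0'D^2)$ with $D = \mathrm{diag}(d)$; the function names `q_n` and `grad_q_n` are hypothetical.

```python
import numpy as np

def q_n(d, delta, G0):
    """Q_N(phi) for phi = (d', delta)', under the formula stated in the lead-in."""
    N = G0.shape[0]
    D = np.diag(d)
    _, logdet = np.linalg.slogdet(np.eye(N) - D @ G0)
    return (-0.5 * (np.log(1.0 - delta) + delta)
            - logdet / N
            - (1.0 - delta) / N * np.trace(D @ G0)
            + (1.0 - delta) / (2.0 * N) * np.trace(G0 @ G0.T @ D @ D))

def grad_q_n(d, delta, G0):
    """Analytic gradient, stacked as (dQ/dd_1, ..., dQ/dd_N, dQ/ddelta)."""
    N = G0.shape[0]
    D = np.diag(d)
    A = G0 @ np.linalg.inv(np.eye(N) - D @ G0)   # G0 (I_N - D G0)^{-1}
    GG = G0 @ G0.T
    g_d = (np.diag(A) - (1.0 - delta) * np.diag(G0) + (1.0 - delta) * np.diag(GG) * d) / N
    g_delta = 0.5 * (delta / (1.0 - delta)
                     + (2.0 * np.trace(D @ G0) - np.trace(GG @ D @ D)) / N)
    return np.append(g_d, g_delta)
```

At $d = 0$ and $\delta = 0$ both the function and its gradient are zero, as expected at the true parameter values.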

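Second derivatives

For the second derivatives, we have:

$$\frac{\partial^{2}Q_N(\phi)}{\partial \phi\,\partial \phi'} = \begin{pmatrix} \dfrac{\partial^{2}Q_N(\phi)}{\partial d\,\partial d'} & \dfrac{\partial^{2}Q_N(\phi)}{\partial d\,\partial \delta} \\ \dfrac{\partial^{2}Q_N(\phi)}{\partial \delta\,\partial d'} & \dfrac{\partial^{2}Q_N(\phi)}{\partial \delta^{2}} \end{pmatrix}_{(N+1)\times(N+1)}.$$

First,

$$\frac{\partial^{2}Q_N(\phi)}{\partial d\,\partial d'} = \left(\frac{\partial^{2}Q_N(\phi)}{\partial d_i\,\partial d_j}\right),$$

where

$$\frac{\partial^{2}Q_N(\phi)}{\partial d_i\,\partial d_j} = \begin{cases} \frac{1}{N}g_{0i}'\left(I_N-DG_0\right)^{-1}E_{ii}G_0\left(I_N-DG_0\right)^{-1}e_i + \frac{(1-\delta)}{N}g_{0i}'g_{0i}, & \text{if } i = j, \\ \frac{1}{N}g_{0i}'\left(I_N-DG_0\right)^{-1}E_{jj}G_0\left(I_N-DG_0\right)^{-1}e_i, & \text{if } i \neq j. \end{cases}$$

Further,

$$\frac{\partial^{2}Q_N(\phi)}{\partial d\,\partial \delta} = \left(\frac{\partial^{2}Q_N(\phi)}{\partial d_i\,\partial \delta}\right) = \left(\frac{1}{N}g_{0,ii} - \frac{1}{N}g_{0i}'g_{0i}\,d_i\right),$$

and finally

$$\frac{\partial^{2}Q_N(\phi)}{\partial \delta^{2}} = \frac{1}{2(1-\delta)^{2}}.$$

Overall, stacking over cross section units we get $\partial^{2}Q_N(\phi)/\partial \phi\,\partial \phi' = N^{-1}\Lambda_N(\phi)$, where:

$$\Lambda_N(\phi) = \begin{pmatrix} \left(A \odot A'\right) + (1-\delta)\,\mathrm{diag}\left(G_0G_0'\right) & \mathrm{diag}\left(G_0\right)\tau_N - \mathrm{diag}\left(G_0G_0'D\right)\tau_N \\ \tau_N'\,\mathrm{diag}\left(G_0\right) - \tau_N'\,\mathrm{diag}\left(G_0G_0'D\right) & \dfrac{N}{2(1-\delta)^{2}} \end{pmatrix},$$

where $\tau_N$ is an $N \times 1$ vector of ones, $\odot$ is the Hadamard product matrix operator, and $A = G_0\left(I_N-DG_0\right)^{-1} = W\left(I_N-\Psi W\right)^{-1}$.

Appendix C Estimator of $AsyVar\left(\hat{\theta}\right)$

Derivatives of the log-likelihood function

The vector of maximum likelihood estimates, $\hat{\theta}$, in Section 6 is obtained by maximising the log-likelihood function (46), which we reproduce here for convenience:

$$\ell(\theta) = -\frac{NT}{2}\ln(2\pi) - \frac{T}{2}\sum_{i=1}^{N}\ln\sigma_i^{2} + T\ln\left|I_N-\Psi W\right| - \frac{1}{2}\sum_{i=1}^{N}\frac{\left(y_i-\psi_i y_i^{*}-X_i\beta_i\right)'\left(y_i-\psi_i y_i^{*}-X_i\beta_i\right)}{\sigma_i^{2}}, \qquad (C.5)$$

where $\theta = \left(\psi', \beta_1', \beta_2', \ldots, \beta_N', \sigma_1^{2}, \sigma_2^{2}, \ldots, \sigma_N^{2}\right)' = \left(\psi', \beta_1', \beta_2', \ldots, \beta_N', \sigma^{2\prime}\right)'$, with $\sigma^{2} = \left(\sigma_1^{2}, \sigma_2^{2}, \ldots, \sigma_N^{2}\right)'$.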
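For readers who wish to evaluate the log-likelihood numerically, a minimal sketch is given below. It assumes the form $\ell(\theta) = -\frac{NT}{2}\ln(2\pi) - \frac{T}{2}\sum_i\ln\sigma_i^2 + T\ln|I_N-\Psi W| - \frac{1}{2}\sum_i \sigma_i^{-2}(y_i-\psi_i y_i^*-X_i\beta_i)'(y_i-\psi_i y_i^*-X_i\beta_i)$ with $y_{it}^* = w_i'y_t$. The function name `hsar_loglik` and the array layout (`Y` of shape $T \times N$, `X` of shape $T \times N \times k$, `beta` of shape $N \times k$) are choices made for this example.

```python
import numpy as np

def hsar_loglik(psi, beta, sigma2, Y, X, W):
    """Gaussian log-likelihood of the HSAR model for Y (T, N), X (T, N, k), beta (N, k)."""
    T, N = Y.shape
    # log of the absolute value of the determinant of I_N - Psi W
    _, logdet = np.linalg.slogdet(np.eye(N) - np.diag(psi) @ W)
    Y_star = Y @ W.T                                   # element (t, i) is w_i' y_t
    fitted = np.einsum("tnk,nk->tn", X, beta)          # element (t, i) is x_it' beta_i
    resid = Y - Y_star * psi - fitted
    return (-0.5 * N * T * np.log(2.0 * np.pi)
            - 0.5 * T * np.sum(np.log(sigma2))
            + T * logdet
            - 0.5 * np.sum(resid ** 2 / sigma2))
```

The negative of this function can be handed to a generic numerical optimiser; whether that matches the computational procedure used for the reported results is not something this sketch can confirm.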
