The Saddlepoint Approximation of Moran's I's and Local Moran's I_i's Reference Distributions and Their Numerical Evaluation

Michael Tiefelsdorf

Global Moran's I and local Moran's I_i are the most commonly used test statistics for spatial autocorrelation in univariate map patterns or in regression residuals. They belong to the general class of ratios of quadratic forms, for which a whole array of approximation techniques has been proposed in the statistical literature, such as the prominent saddlepoint approximation by Offer Lieberman (1994). The saddlepoint approximation outperforms other approximation methods with respect to its accuracy and computational costs. In addition, only the saddlepoint approximation is capable of handling, in analytical terms, reference distributions of Moran's I that are subject to significant underlying spatial processes. The accuracy and computational benefits of the saddlepoint approximation are demonstrated for a set of local Moran's I_i statistics, either under the assumption of global spatial independence or subject to an underlying global spatial process. Local Moran's I_i is known to have an excessive kurtosis, which rules out the simple approximation methods for its reference distribution. The results demonstrate how well the saddlepoint approximation fits the reference distribution of local Moran's I_i. Furthermore, for local Moran's I_i under the assumption of global spatial independence, several algebraic simplifications lead to substantial gains in numerical efficiency. This makes it possible to evaluate the significance of local Moran's I_i in large spatial tessellations.

Moran's I and several related spatial statistics, such as Geary's c (see Cliff and Ord 1981, p. 167), which can be expressed as quadratic forms, are frequently encountered in the spatial statistical literature and are implemented in several spatial software packages to test for spatial autocorrelation in regression residuals.
So far, the significance of the observed values of these statistics has been assessed either under the assumption that the statistics approximately follow a normal distribution whose expectation and variance can be calculated, or by extensive simulation experiments that either randomize the locations of the observed residuals or assume a specific error structure for the underlying regression disturbances. In addition, for normally distributed regression residuals we can give the exact reference distribution of these ratios of quadratic forms. These exact distributions allow us to evaluate the performance of any approximation.

The author thanks J. Keith Ord for his valuable comments and suggestions on an earlier draft of this paper. Michael Tiefelsdorf is assistant professor of geography at The Ohio State University (tiefelsdorf.1@osu.edu). Geographical Analysis, Vol. 34, No. 3 (July 2002), The Ohio State University. Submitted: 9/10/01. Revised version accepted: 1/04/02.

This paper investigates the performance of the saddlepoint approximation method as a substitute for the exact reference distribution of Moran's I. The results demonstrate that the saddlepoint approximation provides a substantial improvement over simpler approximation methods and that it can be applied in a wider range of conditions than the conventional methods.

While under the assumption of spatial independence a correctly specified normal approximation is, in most instances, quite feasible for empirical tessellations with more than 100 spatial objects, there are numerous situations in spatial statistics where the normal approximation leads to a misjudgment of the significance of an observed test statistic. Most exceptions are related to unusual forms of the spatial link matrix or to peculiar sets of exogenous variables in the regression model. For example, the reference distributions of local Moran's I_i, of global Moran's I defined by higher-order neighbor link matrices (Boots and Tiefelsdorf 2000), and of the general spatial cross-product statistic (see Costanzo, Hubert, and Golledge 1983) do not necessarily converge asymptotically toward the normal distribution as the number of spatial observations increases.

Another often overlooked issue in practical applications of the Moran's I test is the specification of its expectation and variance, on which the normal approximation is based. The moments of Moran's I depend on the underlying set of exogenous variables X in the regression model (see, for instance, Hepple 1998, or Tiefelsdorf 2000).
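The dependence of the moments on X can be made concrete with the expectation under spatial independence, E(I) = tr(M ½(V + Vᵀ))/(n − k), the standard trace result for a ratio of quadratic forms in normal residuals. The following Python sketch is illustrative only (it is not the author's SPSS macro); the four-cell chain, the C-coded link matrix, and the covariate are hypothetical toy data, and plain lists are used so that no third-party packages are assumed.

```python
import math


def projection_matrix(X):
    """M = I - X (X^T X)^{-1} X^T for an n x k design matrix X (list of rows).

    Built from an orthonormal basis of the columns of X (Gram-Schmidt),
    so no explicit matrix inverse is needed. Assumes X has full column rank.
    """
    n, k = len(X), len(X[0])
    basis = []
    for j in range(k):
        v = [X[r][j] for r in range(n)]
        for q in basis:  # remove the components along earlier basis vectors
            dot = sum(v[r] * q[r] for r in range(n))
            v = [v[r] - dot * q[r] for r in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return [[(1.0 if r == c else 0.0) - sum(q[r] * q[c] for q in basis)
             for c in range(n)] for r in range(n)]


def expected_moran(X, V):
    """E(I) = tr(M A) / (n - k) with A = (V + V^T)/2, under spatial independence."""
    n, k = len(X), len(X[0])
    M = projection_matrix(X)
    A = [[0.5 * (V[r][c] + V[c][r]) for c in range(n)] for r in range(n)]
    trace_ma = sum(M[r][c] * A[c][r] for r in range(n) for c in range(n))
    return trace_ma / (n - k)


# Hypothetical toy data: four cells on a line (1-2-3-4), binary contiguity G,
# globally standardized C-coding V = n * G / sum(G).
G = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
n = len(G)
s0 = sum(map(sum, G))
V = [[n * g / s0 for g in row] for row in G]

X_intercept = [[1.0] for _ in range(n)]
X_trend = [[1.0, float(i)] for i in range(n)]  # intercept plus a made-up covariate

print(expected_moran(X_intercept, V))  # about -1/3, i.e. -1/(n - 1)
print(expected_moran(X_trend, V))      # about -2/3 for this design
```

With an intercept only, the familiar value −1/(n − 1) is recovered; adding the covariate moves E(I) to −2/3 for this toy design, which is why a single hard-coded expectation is inappropriate for regression residuals.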
Most expositions in the geographical literature, as well as software implementations of Moran's I, skip this generalized specification of the moments. For regression residuals, which depend on the exogenous variables, Tiefelsdorf and Boots (1995) have shown that this neglect leads, as in the case of the Durbin-Watson d statistic for serial autocorrelation, to an indeterminate area. This indeterminate area envelops the true significance of an observed Moran's I_obs from below and above. As the number of explanatory variables increases, the envelope becomes wider, and as the number of observations increases, it shrinks. Furthermore, even if the normal approximation is feasible under the assumption of spatial independence, it no longer holds for spatial test statistics under the influence of a significant spatial process. For example, like the well-known Pearson product-moment correlation coefficient ρ, the possible value range of Moran's I is bounded from below and above. However, in contrast to Pearson's ρ, the bounds of Moran's I need not be symmetrical around its expectation [see Tiefelsdorf (1998) for the calculation of the bounds of Moran's I]. In the presence of an underlying spatial process with significant positive or negative spatial autocorrelation, the conditional reference distribution of Moran's I is shifted toward the upper or the lower bound, respectively. This shift skews the reference distribution and invalidates the normal approximation, which assumes symmetry around its center. In addition, the regular expressions for the moments under the assumption of spatial independence are no longer valid, and the evaluation of moments conditional on an underlying spatial process is complex and requires numerical integration (see Tiefelsdorf 2000, section 9.2). This departure from the assumption of spatial independence rules out all moment-based approximation methods.
In lieu of analytical knowledge about the reference distribution under the influence of a significant spatial process, it is common practice to fall back upon simulation experiments. For the general class of simultaneous and conditional autoregressive Gaussian processes, as well as moving average Gaussian processes in spatial regression models, however, the exact conditional distribution of global Moran's I and local Moran's I_i can be calculated, which enables us to study the power of these tests directly and to adjust the probability of local Moran's I_i for the underlying global spatial process. The gains of using, whenever possible, the exact analytical approach instead of simulation techniques are quite substantial: (1) the reference distributions obtained by the exact approach are free from the random fluctuations that are present even in simulation experiments with several thousand repetitions; (2) the analyst has full control over all parameters of the model, and a change in the model constellation does not require repeating the simulation experiment; and (3) the exact approach is numerically more efficient. Because the saddlepoint approximation method investigated in this paper uses the same input information as the exact approach, it is expected to come close to the exact reference distribution.

The choice of local Moran's I_i as an example for the saddlepoint approximation is motivated by two considerations. First, it is well known that the reference distribution of local Moran's I_i deviates substantially from the normal distribution and thus puts all approximation methods to the test. Second, local Moran's I_i is a highly relevant test statistic in model-driven spatial data analysis, such as the local analysis of regression residuals. It singles out spatial clusters (that is, cliques of equally signed extreme regression residuals) and hot spots (that is, regression residuals that stand out with respect to their neighborhoods) within a global spatial pattern. The definitions of hot spots and (clinical) spatial clusters depend on the spatial scale; this paper follows the process-based definitions given by Wartenberg and Greenberg (1990, p. S158; see also Waller and Lawson 1995, p. 94).
These process-based definitions should not be confused with the pattern-based perspectives given elsewhere in the literature, which use the terms hot spot, cold spot, and spatial cluster synonymously to signify positive local autocorrelation and the term spatial outlier to denote negative local autocorrelation. Another prominent test for local spatial association is Getis and Ord's G_i statistic (Ord and Getis 1995), which was developed further by Ord and Getis (2001) to incorporate a conditional perspective based on an underlying positively autocorrelated spatial process. Boots and Tiefelsdorf (2000, p. 30) give a brief comparison of the two local spatial tests. Basically, local Moran's I_i is able to uncover negative local spatial autocorrelation, whereas Getis and Ord's local G_i statistic is able to distinguish between clusters of above-average and below-average data values. In addition, Tiefelsdorf and Boots (1997) discuss an immanent design property of the local Moran's I_i statistic: it is sensitive only if the regression residual at the reference location deviates substantially from the underlying regression model. In the light of spatial residual analysis in regression models, this is a desirable property. In contrast, local Geary's c_i (see Anselin 1995) cannot distinguish between local spatial clusters of average (zero) regression residuals and clusters of either positive or negative regression residuals, and it assigns the same statistical significance to each of these different local patterns.
Under the spatial modelling perspective of socioeconomic, epidemiological, or other processes, which goes far beyond purely exploratory spatial data analysis, extreme spatial objects (signified by either large negative or large positive regression residuals) are of greater interest than average spatial objects (denoted by regression residuals around zero), and the story that negative local spatial autocorrelation conveys about the underlying spatial process is as relevant as the account that positive local spatial autocorrelation gives. This paper is laid out with the practical implementation¹ of the saddlepoint approximation technique for local Moran's I_i in mind. Nevertheless, some theoretical discussion is required first. The next section gives the general forms of global Moran's I and local Moran's I_i in terms of a ratio of quadratic forms and its associated spectrum of eigenvalues. This spectrum of eigenvalues provides the key to the saddlepoint approximation and to the exact reference distribution, as well as general equations for the moments. This section is followed by a comparative discussion of alternative approximation methods for the distribution of Moran's I under the assumption of spatial independence. Section 3 states the saddlepoint approximation in general terms for global Moran's I and local Moran's I_i. The applied section compares the accuracy of the saddlepoint approximation for local Moran's I_i against the normal approximation, the Edgeworth approximation, and the exact distribution. This section also gives several algebraic simplifications of the implementation of the saddlepoint approximation for local Moran's I_i under the assumption of global spatial independence. These simplifications reduce the numerical burden of calculating the saddlepoint approximation substantially. The comparisons in this section are performed for one local Moran's I_i in the interior of a hexagonal tessellation with 64 cells under the assumption of global spatial independence, and for an empirical data set of the 19 counties of the former German Democratic Republic using the conditional distributions of the local Moran's I_i (see Tiefelsdorf 1998). Some concluding statements and observations close this paper.

¹ An implementation of the saddlepoint approximation as an SPSS macro, as well as of the moments of the global and local Moran test statistics under the assumption of spatial independence, can be found at the author's homepage at …

1. SPECIFICATION OF MORAN'S I AND ITS SPECTRUM OF EIGENVALUES

The purpose of this section is to review the specifications of Moran's I and local Moran's I_i. A detailed derivation of the specification and probability distribution of global and local Moran's I in the presence of a significant spatial process can be found in Tiefelsdorf (1998 or 2000). Assume that we are dealing with a system of n spatially distributed observations y that are related to a set of k exogenous variables in the design matrix X via a linear regression model y = Xβ + ε.
Let the vector of underlying disturbances of the regression model be distributed as ε ~ N(0, σ²Ω). The n × k design matrix X of exogenous variables includes the constant vector 1 = (1, 1, …, 1)ᵀ to model the intercept. The k × 1 vector β comprises the regression parameters. The disturbances ε follow a covariance structure that is reflected by the n × n positive-definite matrix Ω, and the parameter σ² is the variance of the disturbances. If the disturbances are independent, the covariance matrix reduces to σ²I, where I is the n × n identity matrix. The regression residuals ε̂ = [I − X(XᵀX)⁻¹Xᵀ]y are then distributed as ε̂ ~ N(0, σ²MΩM), where M = I − X(XᵀX)⁻¹Xᵀ is the projection matrix. For an underlying autoregressive spatial process, the inner term of the covariance matrix is defined by Ω = (I − ρV)⁻¹(I − ρVᵀ)⁻¹. The parameter ρ measures the degree of spatial autocorrelation in the disturbances with respect to the underlying spatial structure matrix V. The n × n matrix V reflects the standardized relationships among pairs of spatial objects, which are specified in a general spatial relationship matrix G. A similarity metric between the pairs of spatial objects (i, j) is used; that is, the elements of G are greater than or equal to zero. Zero indicates that a pair of spatial objects is unrelated, and any value greater than zero signifies a relationship between the pair. By definition, a spatial object is not related to itself, so the diagonal elements of G are always zero. The matrix G is assumed to be symmetric. The spatial structure matrix V is the standardized form of the general spatial relationship matrix G. The three standardization methods, also known as coding schemes, are the row-sum standardized coding scheme W, the globally standardized coding scheme C, and the variance-stabilizing coding scheme S [see Tiefelsdorf, Griffith, and Boots (1999) for the characteristics and specifications of the three coding schemes].
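A minimal Python sketch of the three coding schemes, assuming a symmetric G with zero diagonal and no isolated objects. W and C follow directly from their descriptions above; the S scheme is written here as our reading of Tiefelsdorf, Griffith, and Boots (1999) (row-wise division by the square root of the row's sum of squares, followed by a global rescaling so that all elements sum to n), so treat that function as an assumption rather than a verified definition.

```python
import math


def w_coding(G):
    """Row-sum standardized coding scheme W: every row sums to one."""
    return [[g / sum(row) for g in row] for row in G]


def c_coding(G):
    """Globally standardized coding scheme C: n * G / (sum of all g_ij)."""
    n = len(G)
    total = sum(map(sum, G))
    return [[n * g / total for g in row] for row in G]


def s_coding(G):
    """Variance-stabilizing coding scheme S (assumed form): scale each row of G by
    the inverse square root of its sum of squares, then rescale globally to sum n."""
    n = len(G)
    Q = [[g / math.sqrt(sum(x * x for x in row)) for g in row] for row in G]
    total = sum(map(sum, Q))
    return [[n * q / total for q in row] for row in Q]


# Hypothetical 4-cell chain; every cell needs at least one neighbor.
G = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
W, C, S = w_coding(G), c_coding(G), s_coding(G)
print(W[0][1], W[1][0])  # 1.0 0.5: W is not symmetric even though G is
```

The printed pair shows why the text stresses that V need not be symmetric: row standardization breaks the symmetry of G, whereas the C scheme preserves it.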
The spatial structure matrix V does not need to be symmetric. Global Moran's I is defined by the regression residuals ε̂ and a global spatial link matrix V as a ratio of quadratic forms:

$$I = \frac{\hat{\varepsilon}^{T}\,\tfrac{1}{2}(V + V^{T})\,\hat{\varepsilon}}{\hat{\varepsilon}^{T}\hat{\varepsilon}} \qquad (1)$$

The denominator ε̂ᵀε̂ is related to an estimate of the variance σ² and makes Moran's I a scale-free function. While the observed value I_obs does not change whether the symmetric link matrix ½(V + Vᵀ) or the potentially asymmetric link matrix V is used, the symmetric specification must be used for the numerator to be a quadratic form in a symmetric matrix. Several transformations (see Tiefelsdorf 1998 or 2000) must be applied to evaluate the distribution Pr(I ≤ I_obs | Ω) of Moran's I conditional on an underlying spatial process. These transformations lead to

$$\Pr(I \le I_{obs} \mid \Omega) = \Pr\left[\delta^{T}\,\Omega^{T/2} M \left(\tfrac{1}{2}(V + V^{T}) - I_{obs}\, I\right) M\, \Omega^{1/2}\,\delta \le 0\right] \qquad (2)$$

with δ ~ N(0, σ²I) and Ω^{T/2} = (Ω^{1/2})ᵀ. In the case of an autoregressive spatial process, Ω^{1/2} = (I − ρV)⁻¹. The spectrum of eigenvalues {γ_1, …, γ_n} of the inner term

$$\Omega^{T/2} M \left(\tfrac{1}{2}(V + V^{T}) - I_{obs}\, I\right) M\, \Omega^{1/2} \qquad (3)$$

in equation (2) fully characterizes the reference distribution of Moran's I. It is used in Imhof's method (Imhof 1961) to calculate the exact reference distribution by numerical integration, and it is also the key building block for the saddlepoint approximation proposed in this paper. Recall that under the assumption of spatial independence, Ω^{1/2} must be replaced by the identity matrix I. Under this assumption, a revised spectrum of eigenvalues {λ_1, …, λ_n}, which is based on M ½(V + Vᵀ) M, proves useful. This revised spectrum gives the moments of Moran's I under the assumption of spatial independence. However, some precautions must be taken to exclude from the calculation of the moments those k eigenvalues that are necessarily zero owing to the rank defect of the projection matrix M (see Tiefelsdorf 2000, section 9.1). The smallest and largest eigenvalues, λ_1 and λ_n, respectively, determine the feasible range of the Moran statistic. This range depends on the underlying regression matrix X and on the spatial arrangement represented by the spatial link matrix V.
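Equation (1) and the spectrum behind the moments can be checked numerically. The sketch below computes I_obs with both V and ½(V + Vᵀ) and the eigenvalues of M ½(V + Vᵀ) M. It is a pure-Python illustration under stated assumptions: the data are hypothetical, M is formed from a Gram-Schmidt basis of X, and the eigenvalues come from the cyclic Jacobi method, a generic symmetric eigensolver chosen here for self-containment rather than anything prescribed by the paper.

```python
import math


def ols_residuals(X, y):
    """Residuals e = M y, M = I - X (X^T X)^{-1} X^T, via a Gram-Schmidt basis of X.
    Returns the residuals and the orthonormal basis (needed to rebuild M)."""
    n, k = len(X), len(X[0])
    basis = []
    for j in range(k):
        v = [X[r][j] for r in range(n)]
        for q in basis:
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    e = list(y)
    for q in basis:  # project y onto the orthogonal complement of the columns of X
        d = sum(a * b for a, b in zip(e, q))
        e = [a - d * b for a, b in zip(e, q)]
    return e, basis


def moran_i(e, V):
    """Equation (1). e^T V e equals e^T V^T e, so V and (V + V^T)/2 give the same value."""
    n = len(e)
    num = sum(e[r] * V[r][c] * e[c] for r in range(n) for c in range(n))
    return num / sum(a * a for a in e)


def jacobi_eigenvalues(A, tol=1e-22, max_sweeps=100):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    for _ in range(max_sweeps):
        if sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A J (columns p and q)
                    arp, arq = A[r][p], A[r][q]
                    A[r][p], A[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- J^T A (rows p and q)
                    apr, aqr = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted(A[i][i] for i in range(n))


def link_spectrum(basis, V):
    """Eigenvalues lambda_1..lambda_n of M (V + V^T)/2 M; k of them are zero."""
    n = len(V)
    M = [[(1.0 if r == c else 0.0) - sum(q[r] * q[c] for q in basis)
          for c in range(n)] for r in range(n)]
    A = [[0.5 * (V[r][c] + V[c][r]) for c in range(n)] for r in range(n)]
    MA = [[sum(M[r][t] * A[t][c] for t in range(n)) for c in range(n)] for r in range(n)]
    MAM = [[sum(MA[r][t] * M[t][c] for t in range(n)) for c in range(n)] for r in range(n)]
    return jacobi_eigenvalues(MAM)


# Hypothetical toy data: 4-cell chain, row-standardized (asymmetric) W, intercept only.
W = [[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]]
X = [[1.0] for _ in range(4)]
y = [1.0, 2.0, 4.0, 3.0]
e, basis = ols_residuals(X, y)
W_sym = [[0.5 * (W[r][c] + W[c][r]) for c in range(4)] for r in range(4)]
i_obs = moran_i(e, W)
lams = link_spectrum(basis, W)
print(i_obs, moran_i(e, W_sym))  # both 0.3 up to rounding
print(lams)                      # one eigenvalue is numerically zero because k = 1
```

The single zero eigenvalue reflects the rank defect of M, the eigenvalues sum to tr(M ½(V + Vᵀ)) = −1 here (dividing by n − k = 3 gives E(I) = −1/3), and I_obs necessarily lies between the smallest and the largest eigenvalue.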
For global Moran's I and rectangular tessellations in the rook adjacency specification, the range is approximately [−1, 1], and for empirical irregular tessellations with an average of six neighbors for interior cells, it is approximately [−0.5, 1] (see Boots and Tiefelsdorf 2000). The reference distribution can then be evaluated for any observed value I_obs at the shifted spectrum {γ_1, …, γ_n} = {λ_1 − I_obs, …, λ_n − I_obs} by Imhof's method or by the saddlepoint approximation. The difference between global Moran's I and local Moran's I_i for the ith spatial reference object lies in the specification of the spatial link matrix. In fact, the local link matrices V_i are the building blocks of the global spatial link matrix ½(V + Vᵀ). All elements in a local link matrix are zero except for those in the ith row and column, which are copies of the ith row and column of the general spatial relationship matrix G.² This gives a star-shaped symmetric local spatial link matrix V_i of the structural form

$$V_i = s_i \begin{pmatrix}
0 & \cdots & 0 & g_{1i} & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & g_{i-1,i} & 0 & \cdots & 0 \\
g_{i1} & \cdots & g_{i,i-1} & 0 & g_{i,i+1} & \cdots & g_{in} \\
0 & \cdots & 0 & g_{i+1,i} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & g_{ni} & 0 & \cdots & 0
\end{pmatrix}$$

where s_i is a coding-scheme-specific scaling parameter for the ith spatial object and g_ij are the relevant elements of G. For the definitions of s_i, see Tiefelsdorf, Griffith, and Boots (1999). The sum of the local link matrices over all spatial objects reconstructs the global link matrix,³ that is, ½(V + Vᵀ) = Σ_{i=1}^{n} V_i, and by this additivity property (Anselin 1995) links the local Moran's I_i to the global Moran's I statistic. Substituting the local link matrix V_i for the global link matrix ½(V + Vᵀ) shows that local Moran's I_i is also defined as a ratio of quadratic forms:

$$I_i = \frac{\hat{\varepsilon}^{T} V_i\, \hat{\varepsilon}}{\hat{\varepsilon}^{T}\hat{\varepsilon}}$$

Consequently, all definitions, statistical procedures, and general distributional properties also apply to local Moran's I_i. Specific distributional properties of local Moran's I_i are outlined in section 4.

2. APPROXIMATION METHODS FOR MORAN'S I UNDER SPATIAL INDEPENDENCE

All approximation methods proposed so far in the geographical literature focus on modeling the reference distribution of global Moran's I under the assumption of spatial independence and normally distributed regression residuals (or normally distributed variations around the mean of random variables in a univariate map pattern analysis). These approximation methods are based on the moments of Moran's I up to the fourth order: the expectation E(I) = µ_1, the variance Var(I) = µ_2, the skewness µ_3/µ_2^{3/2}, and the kurtosis µ_4/µ_2². For the numerical specification of these moments in either the eigenvalue or the trace formulation, see Tiefelsdorf (2000, ch. 9). Under alternative hypotheses of a significant underlying spatial process, these moments are no longer valid.
For instance, even under spatial independence but with heteroskedastic regression disturbances, Waldhör (1996) shows that the regular expressions for the moments of Moran's I break down because the numerator and the denominator of Moran's I are no longer independent. The geographical literature proposes the following approximation methods for Moran's I under the assumption of spatial independence.

² A note of caution is required here: we cannot construct the local link matrices V_i by simply extracting the ith row and column from the global spatial link matrix ½(V + Vᵀ). Such an operation does not preserve the properties of the coding schemes.

³ Note that this equation is sometimes stated in terms of the arithmetic mean of the local link matrices, ½(V + Vᵀ) = (1/n) Σ_{i=1}^{n} V_i, in which case the scaling parameter changes to s_i* = n·s_i.
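The additivity property of the local link matrices from Section 1, and the local form of the statistic, can be illustrated on toy data. In this sketch the scaling s_i = n/(2 S_0) for a C-coded, symmetric G is derived here so that Σ_i V_i = V holds (each g_ij enters both V_i and V_j); it is an assumption for illustration, not the coding-scheme definitions of Tiefelsdorf, Griffith, and Boots (1999). The residual vector is hypothetical.

```python
def local_link_matrix(G, i, s_i):
    """Star-shaped local link matrix V_i: s_i times row i and column i of G,
    zeros elsewhere (G assumed symmetric with zero diagonal)."""
    n = len(G)
    Vi = [[0.0] * n for _ in range(n)]
    for j in range(n):
        Vi[i][j] = s_i * G[i][j]
        Vi[j][i] = s_i * G[j][i]
    return Vi


def quad_ratio(e, A):
    """e^T A e / e^T e, the common ratio-of-quadratic-forms form of I and I_i."""
    n = len(e)
    num = sum(e[r] * A[r][c] * e[c] for r in range(n) for c in range(n))
    return num / sum(x * x for x in e)


# Hypothetical toy data: 4-cell chain, C-coding V = n G / S0 with S0 = sum of g_ij.
G = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
n = len(G)
S0 = sum(map(sum, G))
V = [[n * g / S0 for g in row] for row in G]
s_i = n / (2.0 * S0)          # assumed scaling: each g_ij is counted in V_i and in V_j
e = [-1.5, -0.5, 1.5, 0.5]    # mean-centred residuals

V_sum = [[sum(local_link_matrix(G, i, s_i)[r][c] for i in range(n)) for c in range(n)]
         for r in range(n)]
local_I = [quad_ratio(e, local_link_matrix(G, i, s_i)) for i in range(n)]
global_I = quad_ratio(e, V)
print(local_I, sum(local_I), global_I)  # the local values add up to the global I
```

Multiplying s_i by n, as in note 3, turns the global I into the arithmetic mean of the local values instead of their sum.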

Normal Approximation

Cliff and Ord (1971) and Sen (1976) have investigated conditions under which it is reasonable to assume that the distribution of Moran's I approaches the normal distribution. These conditions are based on regularity properties of the spatial link matrix (in particular, that no subset of spatial objects dominates the spatial link matrix). Then, as the number of spatial objects in the underlying tessellation increases, the higher-order moments approach those of the normal distribution. Assuming that these conditions are satisfied, the test statistic

$$\tilde{I} = \frac{I_{obs} - \mu_1}{\sqrt{\mu_2}} \sim N(0, 1)$$

is approximately standard normally distributed, and its probability can be evaluated by

$$\Pr(I \le I_{obs}) \approx \Phi(\tilde{I}) \qquad (4)$$

where Φ(·) is the distribution function of the standard normal distribution. This naïve approach is commonly used to evaluate the significance of an observed value I_obs, and it provides a satisfactory approximation for global Moran's I in empirical tessellations with more than one hundred spatial objects and well-behaved spatial link matrices V (see Boots and Tiefelsdorf 2000).

Pearson Type III Approximation

The Pearson type III approximation uses the first, second, and third moments of Moran's I to approximate its reference distribution. The gamma distribution and its special case, the chi-square distribution (Mood, Graybill, and Boes 1974, p. 4), belong to the class of Pearson type III distributions. The third moment allows the Pearson type III approximation to capture skewness in the reference distribution; however, higher-order deviations such as kurtosis cannot be accommodated. Costanzo, Hubert, and Golledge (1983), Tango (1995), and others use the gamma distribution to approximate the upper tail probabilities of general spatial cross-product statistics. These statistics also belong to the class of quadratic forms and can be accommodated by the methodologies outlined in this paper. Imhof (1961, p. 45) extended Pearson's three-moment χ²-approximation to evaluate the significance of quadratic forms in noncentral χ²-distributed random variables. All three moments are used to transform the observed value of Moran's I into a chi-square distributed variable and to approximate its degrees of freedom. See also Kuonen (1999, p. 930) for a general discussion of Pearson's three-moment χ²-approximation in the context of quadratic forms. Pearson's approximation is not feasible for local Moran's I_i because it cannot accommodate the excessive kurtosis of local Moran's I_i.

Beta Approximations

This approximation approach uses the beta distribution as reference and can be implemented in two different ways. The beta density is given by

$$f(I^*) = \frac{(I^*)^{p-1}\,(1 - I^*)^{q-1}}{B(p, q)}$$

with I* ∈ [0, 1], p, q > 0, and B(p, q) being the beta function. The cumulative distribution function of a beta-distributed random variable is often called the incomplete beta function, and most statistical software packages have functions to calculate it. The feasible range of a beta-distributed random variable is bounded from below by 0 and from above by 1. Since Moran's I is also bounded from below and above, we can bring it into the [0, 1] range by the transformation

$$I^* = \frac{I - I_{min}}{I_{max} - I_{min}} \qquad (5)$$

The choice of the lower bound I_min and the upper bound I_max depends on the selected beta approximation method. Another appealing property of the beta distribution is that, depending on the parameters p and q, it can model a wide range of density shapes. These include U-shaped, J-shaped, and unimodal distributions as well as the uniform distribution. However, not all bounded distributions can be approximated by the beta distribution, because the combination of its skewness and kurtosis may vary only within specific limits. See, for example, Figure 6.1 in Stuart and Ord (1994, p. 16) for the Pearson distribution family, of which the beta distribution is a type I member.

The Durbin-Watson Approach. The Durbin-Watson approach (Durbin and Watson 1951) was adopted by Cliff and Ord (1972) for Moran's I. It assumes that the feasible range of Moran's I is available. The exact limits are given by the smallest and largest eigenvalues of the matrix M ½(V + Vᵀ) M, that is, I_min = λ_1 and I_max = λ_n. The Durbin-Watson approach uses the first two moments, µ_1 and µ_2, of the Moran's I statistic, together with its feasible range, to estimate the parameters p and q of the beta distribution by the method of moments. The probability Pr(I ≤ I_obs) of an observed value I_obs is then calculated by applying transformation (5) and evaluating the incomplete beta function at I*. This approach has limited flexibility in modeling the reference distribution of Moran's I because it is based on a functional relationship between the parameters p and q.

The Henshaw Approach. The approach taken by Henshaw (1966, with corrections in 1968) uses higher-order moments.
It starts off by matching the skewness and kurtosis of the beta distribution to the skewness and kurtosis of the Moran's I statistic in order to estimate the parameters p and q (see Henshaw 1966, pp. …). These parameters, together with the expectation and variance of Moran's I, are used to estimate I_min and I_max, and the probability Pr(I ≤ I_obs) of an observed value I_obs is then calculated by evaluating the incomplete beta function at I*. By using the skewness and kurtosis to fit the beta distribution to the underlying reference distribution of Moran's I, substantial flexibility is gained. For instance, in contrast to the Durbin-Watson approach, the skewness no longer depends on the location of the expectation with respect to the bounds but is modeled explicitly. However, the restrictions on the feasible combinations of skewness and kurtosis of the beta distribution prevent us from using this approximation in a wide range of situations. For instance, as observed by Hepple (1998), for spatial link matrices defined on higher-order spatial lags, a feasible estimate of the parameter p > 0 could not be established. In addition, the kurtosis of local Moran's I_i is too large to allow us to use Henshaw's approach to fit the beta distribution.

Edgeworth Series Approximation

An Edgeworth series approximation to the reference distribution of Moran's I was introduced by Terui and Kikuchi (1994). Usually, moments or cumulants up to the fourth order are used in Edgeworth series approximations, so that this approximation method is able to model the skewness and kurtosis of the underlying reference distribution. A concise example of the use of the Edgeworth approximation to derive the density function is given in Seeber (1992). Details of the Edgeworth approximation can be found in Stuart and Ord (1994). In general, the Edgeworth approximation is accurate in the center of the reference distribution. However, it is inadequate for modeling the tails of the distribution, where significance tests are usually performed. In the tails, the approximated density function can even become negative or exhibit spurious additional modes. See, for instance, Table 6.1 in Stuart and Ord (1994) for multimodality and negative approximated density functions or, in the context of Moran's I, the negative approximated densities in the left-hand tails of Figures 1 to 4 in Terui and Kikuchi (1994).

In this paper, the Edgeworth series approximation has been implemented for comparison purposes under the assumption of global spatial independence. To perform the Edgeworth approximation of the distribution function, the third and fourth cumulants of the standardized Moran's I have been used: κ_3 = µ_3/µ_2^{3/2} and κ_4 = (µ_4 − 3µ_2²)/µ_2² [see Stuart and Ord (1994), eq. (3.43)]. The Edgeworth series [see Stuart and Ord (1994), eqs. (6.4) and (6.43), including the note on the truncation of the series] up to the fourth order is

$$\Pr(I \le I_{obs}) \approx \Phi(\tilde{I}) - \varphi(\tilde{I})\left[\frac{\kappa_3}{6}\,h_2(\tilde{I}) + \frac{\kappa_4}{24}\,h_3(\tilde{I}) + \frac{\kappa_3^2}{72}\,h_5(\tilde{I})\right]$$

where Φ(·) and φ(·) are the distribution and density functions of the standard normal distribution, respectively, evaluated at the standardized observed Moran coefficient Ĩ = (I_obs − µ_1)/√µ_2. The Hermite polynomials [see Stuart and Ord (1994), eq. (6.3)] are h_2(Ĩ) = Ĩ² − 1, h_3(Ĩ) = Ĩ³ − 3Ĩ, and h_5(Ĩ) = Ĩ⁵ − 10Ĩ³ + 15Ĩ. This specification is more general than the one given in Terui and Kikuchi (1994) because it is defined on regression residuals (through κ_3 and κ_4) and not only on the univariate variation of a random variable around its mean. It also accommodates the correct degrees of freedom. Note that the Edgeworth approximation differs from the standard normal approximation in equation (4) only in the second term.
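For comparison, the moment-based approximations of this section reduce to a few lines once µ_1 to µ_4 are known. The Python sketch below takes the central moments and the feasible range as given inputs (computing them is described in Tiefelsdorf 2000, ch. 9) and implements equation (4), the fourth-order Edgeworth series above, and the method-of-moments beta parameters of the Durbin-Watson approach. The numerical moments are hypothetical.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx(i_obs, mu1, mu2):
    """Equation (4): Pr(I <= I_obs) ~ Phi((I_obs - mu1) / sqrt(mu2))."""
    return Phi((i_obs - mu1) / math.sqrt(mu2))


def edgeworth_approx(i_obs, mu1, mu2, mu3, mu4):
    """Fourth-order Edgeworth series for Pr(I <= I_obs) from the central moments."""
    z = (i_obs - mu1) / math.sqrt(mu2)
    k3 = mu3 / mu2 ** 1.5                   # standardized third cumulant
    k4 = (mu4 - 3.0 * mu2 ** 2) / mu2 ** 2  # standardized fourth cumulant
    h2 = z ** 2 - 1.0
    h3 = z ** 3 - 3.0 * z
    h5 = z ** 5 - 10.0 * z ** 3 + 15.0 * z
    return Phi(z) - phi(z) * (k3 / 6.0 * h2 + k4 / 24.0 * h3 + k3 ** 2 / 72.0 * h5)


def durbin_watson_beta(mu1, mu2, i_min, i_max):
    """Method-of-moments beta parameters (p, q) for I* = (I - I_min)/(I_max - I_min)."""
    width = i_max - i_min
    m = (mu1 - i_min) / width   # mean of I*
    v = mu2 / width ** 2        # variance of I*
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Hypothetical moments and feasible range, for illustration only.
mu1, mu2, mu3, mu4 = -0.05, 0.01, 0.0002, 0.00035
print(normal_approx(0.15, mu1, mu2))
print(edgeworth_approx(0.15, mu1, mu2, mu3, mu4))
print(durbin_watson_beta(mu1, mu2, -0.5, 1.0))
```

Because p and q are pinned down by µ_1, µ_2, and the bounds alone, the fitted beta's skewness follows from where the mean sits in the range, which is the inflexibility noted for the Durbin-Watson approach. Evaluating the incomplete beta function at I* would complete that approach; it is omitted to keep the sketch dependency-free.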
In the tails of the distribution, the polynomial second term of the Edgeworth approximation can even worsen the accuracy compared with the standard normal approximation given by the first term.

3. THE SADDLEPOINT APPROXIMATION

This discussion is based on findings by Offer Lieberman (1994), who developed the saddlepoint approximation for ratios of quadratic forms in normal variables. General secondary sources on the saddlepoint approximation are, in increasing order of complexity, Seeber (1992), Goutis and Casella (1999), Kolassa (1997), and Jensen (1995). While the implementation of the saddlepoint approximation is relatively simple and numerically efficient, the underlying theory is quite advanced and is left to the reader to explore in the references noted above. Noteworthy here, however, is the flexibility of the saddlepoint approximation; for instance, it is also applied to evaluate the distribution of likelihood estimators, or to approximate tail probabilities by means of the Lugannani and Rice (1980) formula, in which case the relative error is of order O(n^{−3/2}). The key to the outstanding performance of the saddlepoint approximation is that the entire cumulant generating function is used and that the approximation is readjusted at each value of the random variable by the saddlepoint ω̂ to optimize its fit. In the context of global Moran's I, Terui and Kikuchi (1994) mention the potential use of the saddlepoint approximation in their conclusions.

Since Moran's I and local Moran's I_i, whether under the assumption of independence or conditional on a significant spatial process, belong to the class of ratios of quadratic forms, the key equations of Lieberman (1994) are reviewed here in order to perform the saddlepoint approximation. The Lugannani-Rice formula for Moran's I is

$$\Pr(I \le I_{obs}) \approx \Phi(r) + \varphi(r)\left(\frac{1}{r} - \frac{1}{u}\right) \qquad (6)$$

where Φ(·) and φ(·) are the distribution and density functions of the standard normal distribution, respectively, and

$$r = \operatorname{sign}(\hat{\omega})\sqrt{\sum_{i=1}^{n}\ln(1 - 2\hat{\omega}\gamma_i)}, \qquad u = \hat{\omega}\sqrt{2\sum_{i=1}^{n}\frac{\gamma_i^2}{(1 - 2\hat{\omega}\gamma_i)^2}}$$

The function sign(ω̂) is −1 for ω̂ < 0, 0 for ω̂ = 0, and 1 for ω̂ > 0. The parameter ω̂ is the solution of the saddlepoint equation

$$\sum_{i=1}^{n}\frac{\gamma_i}{1 - 2\hat{\omega}\gamma_i} = 0 \qquad (7)$$

Recall that the spectrum of eigenvalues {γ_1, …, γ_n} in equation (2) depends, among other factors, on the observed value I_obs. The saddlepoint ω̂ is consequently updated at each value of I_obs. The saddlepoint ω̂ is bounded from below and above by the open interval

$$\frac{1}{2\gamma_1} < \hat{\omega} < \frac{1}{2\gamma_n}$$

where the eigenvalue spectrum is assumed to be sorted in ascending order. The Lugannani-Rice formula (6) has a singularity at the mean of the distribution because u collapses to u = 0 there. At this point the probability becomes

$$\Pr(I \le E(I)) = \frac{1}{2} + \frac{\sum_{i=1}^{n}\gamma_i^3}{3\sqrt{\pi}\left(\sum_{i=1}^{n}\gamma_i^2\right)^{3/2}}$$

Determining the root ω̂ of the saddlepoint equation (7), along with calculating the eigenvalues, must usually be done by numerical methods; these steps are computationally the most expensive part of the saddlepoint approximation. Barndorff-Nielsen [1990, equations (6.1) and (6.2)] has suggested an alternative to the Lugannani-Rice formula (6), which is

$$\Pr(I \le I_{obs}) \approx \Phi\left(r + \frac{1}{r}\ln\frac{u}{r}\right)$$

It uses the same arguments r and u and gives virtually identical results to the Lugannani-Rice formula.
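A compact Python sketch of equations (6) and (7) and of the Barndorff-Nielsen variant, taking the shifted spectrum {γ_i} as input. The root of (7) is found here by plain bisection on the open interval given above (any bracketing root finder would do); near the mean the sketch switches to the limiting value instead of evaluating the 0/0 expression, with an ad hoc cutoff eps. The toy spectrum is hypothetical and chosen because its exact probability, Pr(Q ≤ 0) = 2/3 for Q = Σ γ_i z_i², is known in closed form.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def saddlepoint(gammas, tol=1e-14, max_iter=200):
    """Root of equation (7), sum g / (1 - 2 w g) = 0, by bisection on the open
    interval (1/(2 g_min), 1/(2 g_max)); needs g_min < 0 < g_max."""
    g_min, g_max = min(gammas), max(gammas)
    if not g_min < 0.0 < g_max:
        raise ValueError("spectrum needs both negative and positive eigenvalues")
    lo, hi = 1.0 / (2.0 * g_min), 1.0 / (2.0 * g_max)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # The left side of (7) increases in w, from -inf at lo to +inf at hi.
        if sum(g / (1.0 - 2.0 * mid * g) for g in gammas) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def r_and_u(gammas):
    """The arguments r and u shared by the Lugannani-Rice and Barndorff-Nielsen formulas."""
    w = saddlepoint(gammas)
    r2 = sum(math.log(1.0 - 2.0 * w * g) for g in gammas)
    r = math.copysign(math.sqrt(max(r2, 0.0)), w)
    u = w * math.sqrt(2.0 * sum(g * g / (1.0 - 2.0 * w * g) ** 2 for g in gammas))
    return r, u


def at_mean(gammas):
    """Limit of (6) at the mean, where r = u = 0."""
    s2 = sum(g * g for g in gammas)
    s3 = sum(g ** 3 for g in gammas)
    return 0.5 + s3 / (3.0 * math.sqrt(math.pi) * s2 ** 1.5)


def lugannani_rice(gammas, eps=1e-4):
    """Equation (6): saddlepoint approximation of Pr(I <= I_obs) from the spectrum of (3)."""
    r, u = r_and_u(gammas)
    if abs(r) < eps or abs(u) < eps:  # too close to the singularity at the mean
        return at_mean(gammas)
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / u)


def barndorff_nielsen(gammas, eps=1e-4):
    """Alternative formula Phi(r + ln(u / r) / r) using the same r and u."""
    r, u = r_and_u(gammas)
    if abs(r) < eps or abs(u) < eps:
        return at_mean(gammas)
    return Phi(r + math.log(u / r) / r)


# Toy spectrum with a known exact answer: Pr(Q <= 0) = 2/3.
gammas = [-2.0, -2.0, 1.0, 1.0]
print(lugannani_rice(gammas), barndorff_nielsen(gammas))  # both about 0.6647
```

For this spectrum the saddlepoint equation has the closed-form root ω̂ = 1/8, and both formulas land within about 0.002 of the exact probability 2/3.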

4. APPLICATIONS OF THE SADDLEPOINT APPROXIMATION FOR LOCAL MORAN'S $I_i$

It is well known that the central limit theorem cannot be used to evaluate the distribution of local Moran's $I_i$. Local Moran's $I_i$ is not asymptotically normally distributed; instead, it deviates more and more from the normal distribution as the number of spatial objects increases, because its kurtosis grows rather than shrinks. Cliff and Ord (1981, p. 50) first mentioned this effect for star-shaped spatial link matrices $V_i$, which have only rank 2. Consequently, the matrix $M V_i M$ has only one significant positive eigenvalue $\lambda_n$ and one significant negative eigenvalue $\lambda_1$, with the remaining eigenvalues being zero. More general criteria for the limiting distribution of quadratic forms can be found in Johnson and Kotz (1970, p. 167). Table 1 shows the first four moments of local Moran's $I_i$ and its feasible range for a spatial object in the interior of a hexagonal tessellation. This example assumes the C-coding scheme, global spatial independence, and a projection matrix $M_{(1)} = I - \mathbf{1}(\mathbf{1}^T\mathbf{1})^{-1}\mathbf{1}^T$, which models the variation $\mathbf{y} - \bar{y}\,\mathbf{1}$ of a georeferenced variable around its global mean. As the number of spatial objects n in the tessellation increases, the feasible range $[\lambda_1, \lambda_n]$ and, with it, the kurtosis grow. This growing deviation from the normal distribution puts all approximation methods to the test and makes local Moran's $I_i$ an excellent candidate for assessing how well any approximation method models the exact reference distribution. The exact reference distribution can be calculated by means of Imhof's (1961) method, which has been outlined for Moran's I in Tiefelsdorf and Boots (1995) and in detail in Tiefelsdorf (2000).
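Imhof's method serves as the exact benchmark throughout this section. It evaluates $\Pr(Q \le 0) = \frac{1}{2} - \frac{1}{\pi}\int_0^\infty \frac{\sin\theta(u)}{u\,\rho(u)}\,du$ with $\theta(u) = \frac{1}{2}\sum_i \arctan(\gamma_i u)$ and $\rho(u) = \prod_i (1 + \gamma_i^2 u^2)^{1/4}$ for the central quadratic form $Q = \sum_i \gamma_i z_i^2$ evaluated at 0, which is the case needed here. Below is a minimal pure-Python sketch. The function name `imhof_cdf`, the substitution $u = t/(1-t)$, and the plain midpoint rule with `n_nodes` nodes are illustrative choices, not the quadrature used for the results reported in this paper. A more careful version would use adaptive quadrature together with Imhof's truncation bound.

```python
import math


def imhof_cdf(gammas, n_nodes=20000):
    """Pr(sum_i gamma_i z_i^2 <= 0) by Imhof's (1961) inversion formula.

    Pr(Q <= 0) = 1/2 - (1/pi) * int_0^inf sin(theta(u)) / (u rho(u)) du with
    theta(u) = 0.5 * sum_i atan(gamma_i u) and rho(u) = prod_i (1 + gamma_i^2 u^2)^(1/4).
    """
    g = [x for x in gammas if x != 0.0]  # zero eigenvalues contribute nothing
    if not g:
        return 1.0                       # Q is identically zero

    def integrand(u):
        theta = 0.5 * sum(math.atan(x * u) for x in g)
        log_rho = 0.25 * sum(math.log1p((x * u) ** 2) for x in g)  # avoids overflow of rho
        return math.sin(theta) * math.exp(-log_rho) / u

    # Map u in (0, inf) onto t in (0, 1) via u = t / (1 - t), du = dt / (1 - t)^2,
    # and apply the composite midpoint rule, which never evaluates t = 0 or t = 1.
    h = 1.0 / n_nodes
    total = 0.0
    for j in range(n_nodes):
        t = (j + 0.5) * h
        u = t / (1.0 - t)
        total += integrand(u) / (1.0 - t) ** 2
    return 0.5 - total * h / math.pi
```

For the spectrum $\{1, -1, -1\}$ the exact probability is $1/\sqrt{2}$, and the sketch reproduces it to well within $10^{-4}$. The substitution keeps the transformed integrand bounded near $t = 1$ as long as at least two eigenvalues are nonzero, which always holds for the spectra of local Moran's $I_i$.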
In order to evaluate the exact reference distribution, numerical integration must be performed using the spectrum of eigenvalues $\{\gamma_1, \dots, \gamma_n\}$ as input.

4.1 The Algebraic and Numerical Simplifications of the Saddlepoint Approximation for Local Moran's $I_i$ under Global Spatial Independence

If we assume global spatial independence, then for any projection matrix M both the spectrum of eigenvalues $\{\lambda_1, \dots, \lambda_n\}$ and the saddlepoint $\hat{\omega}$ of local Moran's $I_i$ can be given in analytical terms. This increases computational efficiency, because it avoids computing the eigenvalue spectrum numerically and finding the root of the saddlepoint equation (7) iteratively. Such efficient solutions allow us to perform local spatial autocorrelation tests in very large tessellations. The restriction imposed by the assumption of global spatial independence may be achieved heuristically in regression models by applying either nonparametric or parametric spatial filtering methods (see Griffith 2000 or Haining 1991). More generally, within the likelihood estimation framework of global spatial regression models, there is considerable interest in deriving efficient analytical expressions or accurate approximations of the eigenvalue spectrum of spatial link matrices associated with large tessellations (see Griffith and Sone 1995 and Smirnov and Anselin 2001).

TABLE 1
Distributional Characteristics of Local Moran's $I_i$ in Dependence of an Increasing Hexagonal Tessellation for an Internal Cell i under the Assumption of Global Spatial Independence
Columns: n, $E(I_i)$, $Var(I_i)$, skewness, kurtosis, feasible range $[\lambda_1, \lambda_n]$
Feasible ranges: [−15.31, 14.1]; [−56.69, 55.60]; [−17.81, 16.77]

Boots and Tiefelsdorf (2000, p. 36) have shown that the eigenvalues of the matrix $M V_i M$ are

$$\lambda_{1,n} = \frac{t_1 \mp \sqrt{2t_2 - t_1^2}}{2} \tag{8}$$

and $\lambda_j = 0$ for $j \in \{2, \dots, n-1\}$, where $t_1 = \operatorname{trace}(M V_i M)$ and $t_2 = \operatorname{trace}([M V_i M]^2)$. As J. Keith Ord has pointed out (personal communication), the expressions for $t_1$ and $t_2$ can be simplified further algebraically. This substantially reduces the number of computational operations needed to evaluate these matrix products:

$$\begin{aligned} t_1 &= \operatorname{trace}(M V_i M) = -\operatorname{trace}\big(X^T V_i X (X^T X)^{-1}\big), \\ t_2 &= \operatorname{trace}([M V_i M]^2) = \operatorname{trace}(V_i^2) - 2\operatorname{trace}\big(X^T V_i^2 X (X^T X)^{-1}\big) + \operatorname{trace}\big([X^T V_i X (X^T X)^{-1}]^2\big). \end{aligned} \tag{9}$$

To derive these expressions, use has been made of the idempotency of the projection matrix $M = I - X(X^T X)^{-1} X^T$, that is, $M \cdot M = M$, and of the facts that $\operatorname{trace}(V_i) = 0$ and $\operatorname{trace}(A \cdot B) = \operatorname{trace}(B \cdot A)$. Most of the matrix products on the right-hand side of equation (9) involve only $k \times k$ matrices instead of the $n \times n$ matrices on the left-hand side. Since the local spatial link matrix $V_i$ is star-shaped and extremely sparse, with only $d_i$ nonzero elements in either its ith row or column, calculating $V_i^2$ requires only $d_i \times d_i$ operations. The number of computational operations could be reduced further by exploiting the inherent symmetries of the matrix terms in equation (9).

From this spectrum of eigenvalues, the eigenvalues of $M (V_i - I_{i,obs} I) M$ for any observed value $I_{i,obs}$ follow by shifting the initial spectrum:

$$\{\gamma_1, \dots, \gamma_n\} = \{\lambda_1 - I_{i,obs}, \underbrace{-I_{i,obs}, \dots, -I_{i,obs}}_{m \text{ times}}, \underbrace{0, \dots, 0}_{k \text{ times}}, \lambda_n - I_{i,obs}\},$$

where $m = n - k - 2$ and k is the number of linearly independent variables, including the constant vector $\mathbf{1}$, in the design matrix X. The k eigenvalues must remain zero because the projection matrix M has only rank $n - k$. A detailed discussion can be found in Tiefelsdorf (2000, pp. 80–82). Under the explicit incorporation of a spatial process $\Omega$, as in equation (3), the eigenvalues of local Moran's $I_i$ can no longer be given in analytical terms.
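Equations (8) and (9) and the shifted spectrum can be checked numerically against a brute-force eigen-decomposition. The NumPy sketch below makes several assumptions that are not from the text: the test tessellation is a hypothetical 5 × 5 rook-contiguity grid rather than hexagons, the star-shaped $V_i$ is built as $\frac{1}{2}(\mathbf{e}_i \mathbf{w}_i^T + \mathbf{w}_i \mathbf{e}_i^T)$ with an arbitrary scaling, and the helper names are illustrative. It uses the symmetry of $V_i$ to write $X^T V_i^2 X = (V_i X)^T (V_i X)$, so no $n \times n$ product of link matrices is formed.

```python
import numpy as np


def rook_contiguity(nrow, ncol):
    """Binary rook-contiguity matrix of an nrow x ncol grid (hypothetical test tessellation)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c, rr * ncol + cc] = 1.0
    return W


def star_link_matrix(W, i):
    """Symmetric star-shaped V_i, nonzero only in row i and column i (assumed scaling)."""
    e = np.zeros(W.shape[0])
    e[i] = 1.0
    return 0.5 * (np.outer(e, W[i]) + np.outer(W[i], e))


def extreme_eigenvalues(V, X):
    """lambda_1 and lambda_n of M V M from equations (8) and (9); only k x k matrices are inverted."""
    XtX_inv = np.linalg.inv(X.T @ X)                 # k x k
    VX = V @ X                                       # n x k, cheap if V is stored sparsely
    B = X.T @ VX @ XtX_inv                           # X^T V X (X^T X)^{-1}, k x k
    t1 = -np.trace(B)                                # uses trace(V) = 0
    t2 = (np.sum(V * V)                              # trace(V^2) for symmetric V
          - 2.0 * np.trace(VX.T @ VX @ XtX_inv)      # trace(X^T V^2 X (X^T X)^{-1})
          + np.trace(B @ B))
    root = np.sqrt(max(2.0 * t2 - t1 ** 2, 0.0))
    return 0.5 * (t1 - root), 0.5 * (t1 + root)


def shifted_spectrum(lam1, lamn, I_obs, n, k):
    """Eigenvalues of M (V_i - I_obs I) M: the analytical spectrum shifted by -I_obs."""
    m = n - k - 2
    return np.concatenate(([lam1 - I_obs], np.full(m, -I_obs), np.zeros(k), [lamn - I_obs]))


# Usage: interior cell 12 of the 5 x 5 grid, constant plus one random covariate (k = 2).
W = rook_contiguity(5, 5)
V = star_link_matrix(W, 12)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
M = np.eye(25) - X @ np.linalg.inv(X.T @ X) @ X.T
lam1, lamn = extreme_eigenvalues(V, X)
numeric = np.linalg.eigvalsh(M @ V @ M)              # brute-force reference, ascending order
```

The brute-force spectrum has exactly two nonzero eigenvalues, which agree with the analytical $\lambda_1$ and $\lambda_n$, and the sorted shifted spectrum agrees with the eigenvalues of $M(V_i - I_{i,obs} I)M$.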
Under global spatial independence, because of the replicated eigenvalues, the saddlepoint equation (7) reduces to the simple form

$$\frac{\lambda_1 - I_{i,obs}}{1 - 2\hat{\omega}(\lambda_1 - I_{i,obs})} - \frac{m\,I_{i,obs}}{1 + 2\hat{\omega} I_{i,obs}} + \frac{\lambda_n - I_{i,obs}}{1 - 2\hat{\omega}(\lambda_n - I_{i,obs})} = 0, \tag{10}$$

since the k zero eigenvalues contribute nothing to the sum. Figure 1 shows a graph of the saddlepoint equation for local Moran's $I_i$ at an interior cell in a tessellation with sixty-four hexagons. An observed value of local Moran's $I_{i,obs}$ (I_i) (I_i) is assumed and its associated saddlepoint is $\hat{\omega}$ 0
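Equation (10) is a rational equation whose three denominators are positive on the admissible interval for $\hat{\omega}$. Clearing them gives the quadratic $-4b\,\hat{\omega}^2 + 2a\,\hat{\omega} + [\lambda_1 + \lambda_n - (m+2)I_{i,obs}] = 0$, with $a$ and $b$ as in equation (11). The admissible root is the one at which this quadratic crosses zero from below, which is $(a - \sqrt{c})/(4b)$. The Python sketch below checks this closed form against a bisection solution of equation (10). The helper names, the test values, and the linear fallback used when $b = 0$ (for example $I_{i,obs} = 0$) are illustrative assumptions.

```python
import math


def saddlepoint_eq10(w, lam1, lamn, I_obs, m):
    """Left-hand side of equation (10)."""
    A, B = lam1 - I_obs, lamn - I_obs
    return A / (1 - 2 * w * A) - m * I_obs / (1 + 2 * w * I_obs) + B / (1 - 2 * w * B)


def saddlepoint_closed_form(lam1, lamn, I_obs, m):
    """Admissible root of (10) from the quadratic -4b w^2 + 2a w + c0 = 0."""
    a = (m * I_obs * (lam1 + lamn - 2 * I_obs)
         + I_obs * (3 * lam1 + 3 * lamn - 4 * I_obs)
         - 2 * lam1 * lamn)
    b = I_obs * (m + 2) * (lam1 - I_obs) * (lamn - I_obs)
    c0 = lam1 + lamn - (m + 2) * I_obs
    if b == 0.0:                        # quadratic degenerates to 2a w + c0 = 0
        return -c0 / (2 * a)
    c = a * a + 4 * b * c0              # discriminant
    return (a - math.sqrt(max(c, 0.0))) / (4 * b)


def saddlepoint_bisection(lam1, lamn, I_obs, m, tol=1e-14, max_iter=500):
    """Reference solution of (10) by bisection on (1/(2 gamma_min), 1/(2 gamma_max))."""
    gammas = (lam1 - I_obs, -I_obs, lamn - I_obs)
    lo, hi = 0.5 / min(gammas), 0.5 / max(gammas)
    pad = 1e-12 * (hi - lo)             # stay strictly inside the open interval
    left, right = lo + pad, hi - pad
    for _ in range(max_iter):
        mid = 0.5 * (left + right)
        if saddlepoint_eq10(mid, lam1, lamn, I_obs, m) < 0:
            left = mid                  # equation (10) increases in w
        else:
            right = mid
        if right - left < tol:
            break
    return 0.5 * (left + right)
```

For example, with $\lambda_1 = -1$, $\lambda_n = 2$, $I_{i,obs} = 0.5$, and $m = 3$, both functions give $\hat{\omega} \approx 0.1266$. The closed form needs no iteration, which matters when the equation is as flat as in Figure 1.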

FIG. 1. Saddlepoint Equation (7) for Local Moran's $I_i$ at $I_{i,obs}$ (I_i) (I_i) at an Interior Cell in a Hexagonal Tessellation with Sixty-four Cells

The saddlepoint equation is fairly flat around $f(\hat{\omega}) = 0$, which makes it difficult for numerical algorithms to find the root efficiently. This flatness can be explained by the m replications of the eigenvalue $-I_{i,obs}$. Because all three denominators in equation (10) are positive on the admissible interval for $\hat{\omega}$, multiplying through by them turns the equation into a quadratic in $\hat{\omega}$, so the saddlepoint can be solved in analytical terms. It takes the form

$$\hat{\omega} = \frac{1}{4}\,\frac{a - \sqrt{c}}{b} \tag{11}$$

with the terms a, b, and c defined as

$$\begin{aligned} a &= m I_{i,obs}(\lambda_1 + \lambda_n - 2I_{i,obs}) + I_{i,obs}(3\lambda_1 + 3\lambda_n - 4I_{i,obs}) - 2\lambda_1\lambda_n, \\ b &= I_{i,obs}(m + 2)(\lambda_1 - I_{i,obs})(\lambda_n - I_{i,obs}), \text{ and} \\ c &= a^2 + 4b\,[\lambda_1 + \lambda_n - (m + 2)I_{i,obs}]. \end{aligned}$$

These expressions can be implemented easily in any software environment.

4.2 The Accuracy of the Saddlepoint Approximation for the Reference Distribution of Local Moran's $I_i$

The first example compares the saddlepoint approximation, the Edgeworth approximation, and the normal approximation against the exact reference distribution of local Moran's $I_i$ under the assumption of global spatial independence. Again, a hexagonal setting with sixty-four cells has been selected, local Moran's $I_i$ is evaluated at an interior cell in the C-coding scheme, and the projection matrix is $M_{(1)}$. Figure 2 shows the exact distribution function together with the three approximated distribution functions over the full probability range [0, 1]. The Edgeworth approximation clearly performs poorly: it is particularly far off in the tails of the distribution, although in the center of the distribution it outperforms the normal approximation. Overall, its accuracy is only slightly better than that of the normal approximation. Distribution functions are by definition monotonically increasing; the Edgeworth approximation, however, is partially decreasing in the tails. This effect is associated with a partially negative density function in the tails of the Edgeworth approximation. Furthermore, the normal approximation substantially overstates the significance of local Moran's $I_i$ in the tails, because it approaches $\Pr(I_i \le I_{i,obs}) \to 0$ for negative local spatial autocorrelation, or $\Pr(I_i \le I_{i,obs}) \to 1$ for positive local spatial autocorrelation, much faster than the reference distribution does. This is particularly critical when testing multiple local Moran's $I_i$'s, where the overall α-level is adjusted downward, for instance, by a Bonferroni-type correction. Here the normal approximation would flag several local Moran's $I_i$'s in both tails as significant, whereas the reference distribution correctly indicates that these local Moran's $I_i$'s do not deviate significantly from the null hypothesis of local spatial independence. These findings indicate that neither the Edgeworth approximation nor the normal approximation is a valid model for the reference distribution of local Moran's $I_i$. By contrast, the reference distribution and the saddlepoint approximation are virtually indistinguishable, and the accuracy even improves in the tails.
This is clearly illustrated in Figures 3 and 4, which zoom in on the negative and positive tails of the reference distribution and the three approximations.

FIG. 2. The Exact Distribution Function and the Three Approximated Distribution Functions for Local Moran's $I_i$ under the Assumption of Spatial Independence in a Hexagonal Tessellation with Sixty-four Cells

FIG. 3. Detailed View of the Negative Tail of the Distribution Function in Figure 2

FIG. 4. Detailed View of the Positive Tail of the Distribution Function in Figure 2

One may argue, however, that the assumption of global spatial independence is inappropriate in empirical settings, which mostly exhibit some degree of global spatial autocorrelation. From a confirmatory point of view, this argument is highly relevant, and the distribution of local Moran's $I_i$ conditional upon the global spatial process must be used. Extreme tail probabilities of local Moran's $I_i$'s then indicate a tendency to exhibit either local clusters or local singularities (hot spots) beyond the average autocorrelation level of the global spatial process. Consequently, the conditional distribution identifies local pockets of spatial heterogeneity in the underlying global spatial process. In contrast, if global spatial independence is the reference level against which local Moran's $I_i$ is assessed, then the individual local Moran's $I_i$'s indicate their contribution to the global autocorrelation level. This follows from the additivity constraint $I = \sum_{i=1}^n I_i$.

The next example uses an empirical setting to investigate the conditional distribution of local Moran's $I_i$ and its saddlepoint approximation. This example is taken from Tiefelsdorf (1998 and 2000) and investigates the spatial distribution of regression residuals from a bladder cancer incidence model in the 19 counties of the former German Democratic Republic. The regression model uses four explanatory and control variables. A global autoregressive spatial process with an autocorrelation level ρ had been identified. Local Moran's $I_i$ is specified in the variance-stabilizing S-coding scheme. Its probabilities $\Pr(I_i \le I_{i,obs} \mid \Omega)$ are calculated subject to the identified autoregressive spatial process. Each local Moran's $I_i$ has its own distribution, because the local spatial link matrix $V_i$, and consequently the spectrum of eigenvalues $\{\gamma_1, \dots, \gamma_n\}$, differs from county to county. Recall that in the presence of a spatial process the moments of local and global Moran's I are no longer readily available, and thus none of the approximation methods in section 2 is directly applicable. Figure 5 compares the exact conditional reference distribution with the saddlepoint approximation in a P-P plot. Points on the main diagonal indicate a perfect correspondence between the reference distribution and its saddlepoint approximation.
The saddlepoint approximation matches the reference distribution in the tails, where statistical tests are performed, whereas mild deviations can be observed in the center of the distribution. The 19 probability comparisons do not line up on a smooth curve, because each local Moran's $I_i$ has its own distributional characteristics.

FIG. 5. P-P Plot of Local Moran's $I_i$ Conditional Distribution for the 19 Counties of the Former GDR against the Saddlepoint Approximation
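Under a global spatial process, the eigenvalues that feed equations (6) and (7) come from $\Omega^{1/2\,T} M (V_i - I_{i,obs} I) M \Omega^{1/2}$, as listed in Table 2. The NumPy sketch below shows that computation. The choice of process is an assumption for illustration only: simultaneous autoregressive errors with $\Omega = [(I - \rho W)^T (I - \rho W)]^{-1}$ on a hypothetical ring of cells. The Cholesky factor $L$ with $\Omega = L L^T$ serves as the square root of $\Omega$; any other square root gives the same eigenvalues. The dense $n \times n$ inverse and products are exactly the cost that becomes prohibitive in large tessellations.

```python
import numpy as np


def ring_contiguity(n):
    """Row-standardised first-order contiguity on a ring of n cells (hypothetical layout)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W


def conditional_spectrum(V, X, W, rho, I_obs):
    """Eigenvalues of L^T M (V - I_obs I) M L, where Omega = L L^T.

    Assumed process (illustration only): simultaneous autoregressive errors,
    Omega = [(I - rho W)^T (I - rho W)]^{-1}.
    """
    n = V.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    A = np.eye(n) - rho * W
    Omega = np.linalg.inv(A.T @ A)              # dense n x n: the expensive part for large n
    L = np.linalg.cholesky(Omega)               # one valid square root of Omega
    G = L.T @ M @ (V - I_obs * np.eye(n)) @ M @ L
    return np.linalg.eigvalsh(0.5 * (G + G.T))  # symmetrise against round-off


# Usage: star-shaped V for cell 0 of a 20-cell ring; X holds only the constant (k = 1).
n = 20
W = ring_contiguity(n)
e0 = np.zeros(n)
e0[0] = 1.0
w0 = (W[0] > 0).astype(float)
V = 0.5 * (np.outer(e0, w0) + np.outer(w0, e0))
X = np.ones((n, 1))
gam_indep = conditional_spectrum(V, X, W, 0.0, 0.2)   # rho = 0 recovers independence
gam_sar = conditional_spectrum(V, X, W, 0.5, 0.2)
```

With $\rho = 0$ the spectrum reduces to that of $M(V_i - I_{i,obs} I)M$. With $\rho \ne 0$ the eigenvalues change, but by Sylvester's law of inertia the signs are preserved, including the single zero eigenvalue from the null space of M. The resulting eigenvalues can then be passed to a saddlepoint routine exactly as in the independent case.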

This example demonstrates that the saddlepoint approximation also works well for the conditional reference distribution in a larger spatial setting. The inherent autoregressive global spatial process in the data has been properly accounted for, because, if the conditional probabilities were projected onto either axis, they would follow, as required by theory, a rectangular (uniform) distribution (see Robins, Van der Vaart, and Ventura 2000).

4.3 The Numerical Efficiency of the Saddlepoint Approximation for Local Moran's $I_i$'s Reference Distribution

The remaining question is whether the saddlepoint approximation yields any computational gains. This question is addressed for the bladder cancer model by assuming either an underlying global spatial process or global spatial independence. Table 2 shows the cumulative computing times for all 19 counties using the vector-optimized numerical programming package GAUSS 3.5 (Aptech Systems, 2000) on a Pentium III processor running at 850 MHz. Whenever possible, use has been made of GAUSS's facility for efficiently handling operations on sparse matrices. All intermediate but constant matrix products were held in core memory, ready to evaluate local Moran's $I_i$ at all reference locations in the spatial tessellation. The first notable finding is that, under the assumption of global spatial independence, the analytical solution for the 19 spectra of eigenvalues is far more efficient than calculating the matrix products $M V_i M$ and their spectra of eigenvalues numerically. Explicit elementwise programming of the matrix products that involve the sparse local link matrix $V_i$ would increase the numerical efficiency even further. Consequently, the analytical eigenvalue approach should be used for local Moran's $I_i$ under the assumption of spatial independence. The analytical identification of the saddlepoints $\hat{\omega}$ takes virtually no time compared with an ad hoc implementation of the bisection algorithm (see Press et al.
1992). The time needed to approximate local Moran's $I_i$'s probabilities by the saddlepoint approximation is likewise negligible, whereas the numerical integration in Imhof's formulation takes a substantial amount of time. Analytical solutions for the spectrum of eigenvalues of local Moran's $I_i$ and for the saddlepoint $\hat{\omega}$ are no longer available in the presence of an underlying global spatial process; their numerical counterparts must be used. Given that the eigenvalues have already been calculated, the key finding is that the saddlepoint approximation, including the numerical search for the saddlepoint $\hat{\omega}$, is roughly one hundred times faster than the numerical integration of the exact method. The methods for the numerical search of the saddlepoint and for the numerical integration perform slightly better in the presence of a global spatial process than under the assumption of global spatial independence. This is because the eigenvalue spectrum is more diverse under a global spatial process than under global spatial independence, where it accumulates around the single value $-I_{i,obs}$.

TABLE 2
Comparison of Computing Times between the Saddlepoint Approximation for Local Moran's $I_i$ and the Evaluation of the Exact Reference Distribution by Numerical Integration

Computing times under the assumption of global spatial independence:
Eigenvalues of $M V_i M$, analytical solution including matrix products (equations 8 and 9): 7.3 sec.
Eigenvalues of $M V_i M$, numerical evaluation including matrix products:
Saddlepoint $\hat{\omega}$, analytical solution (equation 11): 0.0 sec.
Saddlepoint $\hat{\omega}$, numerical evaluation by bisection search (equation 7): 3.9 sec.
Probabilities, saddlepoint approximation (equation 6): 0.1 sec.
Probabilities, reference distribution via numerical integration:

Computing times in the presence of a global spatial process:
Eigenvalues of $\Omega^{1/2\,T} M (V_i - I_{i,obs} I) M \Omega^{1/2}$, numerical evaluation including matrix products: 73.9 sec.
Saddlepoint $\hat{\omega}$, numerical evaluation by bisection search (equation 7): 3.3 sec.
Probabilities, saddlepoint approximation: 0.1 sec.
Probabilities, reference distribution via numerical integration:

Note: Several analytical solutions (see section 4.1) are employed under the assumption of global spatial independence. The reported times are cumulative over the 19 computations, one for each county in the former GDR.

A note of caution, however, is appropriate. Even in extremely large spatial tessellations, the saddlepoint-approximated probabilities of local Moran's $I_i$ can be evaluated under the assumption of global spatial independence. The same evaluation in the presence of a significant global spatial process, however, becomes prohibitive, because products of $n \times n$ matrices must now be calculated and their eigenvalue spectra determined numerically. For instance, for a spatial system of 3,053 contiguous U.S. counties (excluding Alaska and Hawaii) in the atlas of cancer mortality (see Devesa et al. 1999), calculating the matrix products and their associated eigenvalue spectrum for just one local Moran's $I_i$ takes over an hour in the presence of a global spatial process. The subsequent calculation of the saddlepoint approximation consumes only several seconds, whereas the exact calculation of the reference distribution via numerical integration takes several minutes.

5. CONCLUSIONS

The flexibility of the saddlepoint approximation, together with its accuracy and numerical efficiency in evaluating the distribution of spatial statistics such as global and local Moran's I, makes it the approximation method of first choice. This holds under small-sample conditions and also, in special cases, under large-sample circumstances when the central limit theorem does not apply. In particular, if the distribution of a Moran's statistic does not converge to normality, as for local Moran's $I_i$ and other special specifications of spatial link matrices, the saddlepoint approximation is the only choice besides calculating the exact reference distribution. In addition, when the moments of Moran's I are not readily available, as for the distribution of Moran's I subject to a spatial process, for power function evaluations, or for heteroskedastic disturbances, it is again the only feasible approximation method, because the underlying covariance matrix $\Omega$ can be accommodated explicitly. The saddlepoint approximation performs particularly well in the tails of distributions, where statistical significance tests are usually conducted.

Throughout this paper it has been assumed that the residuals are normally distributed. At first sight, the normality assumption is quite restrictive. Nevertheless, it is common practice in standard regression modelling and testing to work with it or to achieve it by data transformations. Beyond normality, however, the saddlepoint approximation becomes applicable for other error distributions whenever the cumulant generating function of the test statistic can be derived (Huzurbazar 1999), even if the reference distribution is not readily available. This property could lead to informed significance tests in spatial statistics that neither require the normality assumption nor rely on extensive simulation experiments. Strawderman (2000, p. 1363) states that in univariate problems there is really no need to rely on asymptotic expansions at all, because most reasonable numerical quadrature routines can yield exact results to user-controlled levels of error. In part, this author agrees with this statement and prefers, whenever possible, to evaluate the exact distribution of Moran's I by numerical integration. However, judging from a
