The Saddlepoint Approximation of Moran's $I$'s and Local Moran's $I_i$'s Reference Distributions and Their Numerical Evaluation


Michael Tiefelsdorf

Global Moran's $I$ and local Moran's $I_i$ are the most commonly used test statistics for spatial autocorrelation in univariate map patterns or in regression residuals. They belong to the general class of ratios of quadratic forms, for which a whole array of approximation techniques has been proposed in the statistical literature, such as the prominent saddlepoint approximation by Offer Lieberman (1994). The saddlepoint approximation outperforms other approximation methods with respect to its accuracy and computational costs. In addition, only the saddlepoint approximation is capable of handling, in analytical terms, reference distributions of Moran's $I$ that are subject to significant underlying spatial processes. The accuracy and computational benefits of the saddlepoint approximation are demonstrated for a set of local Moran's $I_i$ statistics under either the assumption of global spatial independence or subject to an underlying global spatial process. Local Moran's $I_i$ is known to have an excessive kurtosis, which rules out the simple approximation methods for its reference distribution. The results demonstrate how well the saddlepoint approximation fits the reference distribution of local Moran's $I_i$. Furthermore, for local Moran's $I_i$ under the assumption of global spatial independence, several algebraic simplifications lead to substantial gains in numerical efficiency. This makes it possible to evaluate the significance of local Moran's $I_i$ in large spatial tessellations.

Moran's $I$ and several related spatial statistics, such as Geary's $c$ (see Cliff and Ord 1981, p. 167), which can be expressed as quadratic forms, are frequently encountered in the spatial statistical literature and implemented in several spatial software packages to test for spatial autocorrelation in regression residuals.
So far, the significance of the observed values of these statistics has been assessed either under the assumption that the statistics approximately follow a normal distribution, for which the expectation and variance can be calculated, or by extensive simulation experiments that either randomize the locations of the observed residuals or assume a specific error structure of the underlying regression disturbances. In addition, we can give the exact reference distribution of these ratios of quadratic forms for normally distributed regression residuals. These exact distributions allow us to evaluate the performance of any approximation.

The author thanks J. Keith Ord for his valuable comments and suggestions on an earlier draft of this paper. Michael Tiefelsdorf is assistant professor of geography at The Ohio State University. E-mail: tiefelsdorf.1@osu.edu. Geographical Analysis, Vol. 34, No. 3 (July 2002). The Ohio State University. Submitted: 9/10/01. Revised version accepted: 1/04/0

This paper investigates the performance of the saddlepoint approximation method as a substitute for the exact reference distribution of Moran's $I$. The results demonstrate that the saddlepoint approximation provides a substantial improvement over simpler approximation methods, and that it can be applied in a wider range of conditions than the conventional methods. While, in most instances, under the assumption of spatial independence a correctly specified normal approximation is quite feasible for empirical tessellations with more than 100 spatial objects, there are numerous situations in spatial statistics where the normal approximation leads to a misjudgment of the significance of an observed test statistic. Most exceptions are related to unusual forms of the spatial link matrix or to peculiar sets of exogenous variables in the regression model. For example, the reference distributions of local Moran's $I_i$ and also global Moran's $I$, which is defined by higher-order neighbor link matrices (Boots and Tiefelsdorf 2000), as well as the general spatial cross-product statistic (see Costanzo, Hubert, and Golledge 1983) do not necessarily converge asymptotically toward the normal distribution as the number of spatial observations increases.

Another often overlooked issue in practical applications of the Moran's $I$ test is the specification of its expectation and variance. The normal approximation is based on both statistics. The moments of Moran's $I$ depend on the underlying set of exogenous variables $X$ in the regression model (see, for instance, Hepple 1998, or Tiefelsdorf 2000).
Most expositions in the geographical literature as well as software implementations of Moran's $I$ skip this generalized specification of the moments. For regression residuals, which depend on the exogenous variables, Tiefelsdorf and Boots (1995) have shown that this neglect leads, as in the case of the Durbin-Watson $d$ statistic for serial autocorrelation, to an indeterminate area. This indeterminate area envelops the true significance of an observed Moran's $I^{\mathrm{obs}}$ from below and above. As the number of explanatory variables increases, the envelope becomes wider, and as the number of observations increases, it shrinks.

Furthermore, even if the normal approximation is feasible under the assumption of spatial independence, it will no longer hold for spatial test statistics under the influence of a significant spatial process. For example, like the well-known Pearson product moment correlation coefficient $\rho$, the possible value range of Moran's $I$ is bounded from below and above. However, in contrast to Pearson's $\rho$, the bounds of Moran's $I$ do not need to be symmetrical around its expectation [see Tiefelsdorf (1998) for the calculation of the bounds of Moran's $I$]. The conditional reference distribution of Moran's $I$ will be shifted toward either bound in the presence of an underlying spatial process with either significant positive or negative spatial autocorrelation. This shift will skew the reference distribution and will invalidate the normal approximation, which assumes symmetry of the bounds around its center. In addition, the regular expressions for the moments under the assumption of spatial independence are no longer valid, and the evaluation of moments conditional upon an underlying spatial process is complex and requires numerical integration (see Tiefelsdorf 2000, section 9.2). This departure from the assumption of spatial independence rules out all moment-based approximation methods.
In lieu of analytical knowledge about the reference distribution under the influence of a significant spatial process, it is common practice to fall back upon simulation experiments. For the general class of simultaneous and conditional autoregressive Gaussian processes as well as moving average Gaussian processes in spatial regression models, however, the exact conditional distribution of global Moran's $I$ and local Moran's $I_i$ can be calculated, which enables us to study the power of these tests directly and to adjust the probability of local Moran's $I_i$ subject to the global underlying spatial process. The gains of using, whenever possible, the exact analytical approach over simulation techniques are quite substantial: (1) the resulting reference distributions of the exact approach are free from random influences, which are present even in simulation experiments with several thousand repetitions; (2) the analyst has full control over all parameters of the model, and a change in the model constellation does not require a repetition of the simulation experiment; and (3) the exact approach is numerically more efficient. Because the saddlepoint approximation method, which is investigated in this paper, uses the same information as input as the exact approach, it is expected to come close to the exact reference distribution.

The choice of local Moran's $I_i$ as an example for the saddlepoint approximation is motivated by two considerations. First, it is well known that the reference distribution of local Moran's $I_i$ deviates substantially from the normal distribution, and it thus puts all approximation methods to the test. Second, local Moran's $I_i$ is a highly relevant test statistic in model-driven spatial data analysis such as the local analysis of regression residuals. It singles out spatial clusters (that is, those cliques of equally signed extreme regression residuals) and hot spots (that is, those outstanding regression residuals with respect to their neighborhoods) within a global spatial pattern. The definitions of hot spots and (clinical) spatial clusters depend on the spatial scale, and in this paper they follow the process-based definitions given by Wartenberg and Greenberg (1990, p. S158; see also Waller and Lawson 1995, p. 94).
These process-based definitions should not be confused with the pattern-based perspectives given elsewhere in the literature, which use the terms hot spot, cold spot, and spatial cluster synonymously to signify positive local autocorrelation and the term spatial outlier to denote negative local autocorrelation.

Another prominent spatial test for local spatial association is Getis and Ord's $G_i$ statistic (Ord and Getis 1995), which has been developed further by Ord and Getis (2001) to incorporate a conditional perspective based on an underlying positively autocorrelated spatial process. Boots and Tiefelsdorf (2000, p. 30) give a brief comparison of the two local spatial tests. Basically, local Moran's $I_i$ is able to uncover negative local spatial autocorrelation, whereas Getis and Ord's local $G_i$ statistic is able to distinguish between clusters of above-average data values and clusters of below-average data values. In addition, Tiefelsdorf and Boots (1997) discuss an inherent design property of the local Moran's $I_i$ statistic: it is only sensitive if the regression residual at the reference location deviates substantially from the underlying regression model. In the light of spatial residual analysis in regression models this is a desirable property. In contrast, local Geary's $c_i$ (see Anselin 1995) cannot distinguish between local spatial clusters in average (zero) regression residuals and clusters in either positive or negative regression residuals, and it assigns the same statistical significance to each of these different local patterns.
Under the spatial modelling perspective of socioeconomic, epidemiological, or other processes, which goes far beyond pure exploratory spatial data analysis, extreme spatial objects (signified by either large negative or large positive regression residuals) are of higher interest than average spatial objects (denoted by regression residuals around zero), and the story that negative local spatial autocorrelation conveys about the underlying spatial process is as relevant as the account that positive local spatial autocorrelation gives.

This paper is laid out with the practical implementation¹ of the saddlepoint approximation technique for local Moran's $I_i$ in mind. Nevertheless, some theoretical discussion is required first. The next section gives the general forms of global Moran's $I$ and local Moran's $I_i$ in terms of a ratio of quadratic forms and its associated spectrum of eigenvalues. This spectrum of eigenvalues provides the key to the saddlepoint approximation and the exact reference distribution as well as general equations for the moments. This section is followed by a comparative discussion of alternative approximation methods for the distribution of Moran's $I$ under the assumption of spatial independence. Section 3 states the saddlepoint approximation in general terms for global Moran's $I$ and local Moran's $I_i$. The applied section compares the accuracy of the saddlepoint approximation for local Moran's $I_i$ against the normal approximation, the Edgeworth approximation, and the exact distribution. This section also gives several algebraic simplifications of the implementation of the saddlepoint approximation for local Moran's $I_i$ under the assumption of global spatial independence. These simplifications substantially reduce the numerical burden of calculating the saddlepoint approximation. The comparisons in this section are performed under the assumption of global spatial independence for one local Moran's $I_i$ in the interior of a hexagonal tessellation with 64 cells and for an empirical data set of the 219 counties of the former German Democratic Republic using the conditional distributions of the local Moran's $I_i$'s (see Tiefelsdorf 1998). Some concluding statements and observations close this paper.

1. An implementation as an SPSS macro of the saddlepoint approximation as well as the moments for the global and local Moran test statistics under the assumption of spatial independence can be found at the author's homepage.

1. SPECIFICATION OF MORAN'S I AND ITS SPECTRUM OF EIGENVALUES

The purpose of this section is to review the specifications of Moran's $I$ and local Moran's $I_i$. A detailed derivation of the specification and probability distribution of global and local Moran's $I$ in the presence of a significant spatial process can be found in Tiefelsdorf (1998 or 2000). Assume that we are dealing with a system of $n$ spatially distributed observations $y$ that are related to a set of $k$ exogenous variables in the design matrix $X$ via a linear regression model $y = X\beta + \varepsilon$.
Let the vector of underlying disturbances of a regression model be distributed as $\varepsilon \sim N(0, \sigma^2 \Omega)$. The $n \times k$ design matrix $X$ of exogenous variables includes the constant vector $\mathbf{1} = (1, 1, \ldots, 1)^T$ to model the intercept. The $k \times 1$ vector $\beta$ comprises the regression parameters. The disturbances $\varepsilon$ follow a covariance structure, which is reflected by the $n \times n$ positive-definite matrix $\Omega$, and the parameter $\sigma^2$ is the variance of the disturbances. The covariance matrix reduces to $\sigma^2 I$, with $I$ the $n \times n$ identity matrix, if the disturbances are independent. The regression residuals $\hat{\varepsilon} = [I - X(X^T X)^{-1} X^T]\, y$ are then distributed as $\hat{\varepsilon} \sim N(0, \sigma^2 M \Omega M)$, where $M = I - X(X^T X)^{-1} X^T$ is the projection matrix. The inner term of the covariance matrix is defined by $\Omega = (I - \rho V)^{-1} (I - \rho V^T)^{-1}$ for an underlying autoregressive spatial process. The parameter $\rho$ measures the degree of spatial autocorrelation in the disturbances with respect to the underlying spatial structure matrix $V$.

The $n \times n$ matrix $V$ reflects the standardized relationships among pairs of spatial objects, which are specified in a general spatial relationship matrix $G$. A similarity metric between the pairs of spatial objects $(i, j)$ is used, that is, the elements of $G$ are greater than or equal to zero. Zero indicates that a pair of spatial objects is unrelated, and any value greater than zero signifies a relationship between a pair of spatial objects. A spatial object is not related to itself by definition, so that the diagonal elements of $G$ are always zero. The matrix $G$ is supposed to be symmetric. The spatial structure matrix $V$ is the standardized form of the general spatial relationship matrix $G$. The three standardization methods, also known as coding schemes, are the row-sum standardized coding scheme $W$, the globally standardized coding scheme $C$, and the variance stabilizing coding scheme $S$ [see Tiefelsdorf, Griffith, and Boots (1999) for the characteristics and specifications of the three coding schemes].
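The three coding schemes can be illustrated with a minimal sketch. The text does not state the formulas, so the definitions below are assumptions based on common usage: W divides each row of $G$ by its row sum, C rescales all elements so that they sum to $n$, and S divides each row by the square root of its sum of squares and then rescales so that all elements sum to $n$. The function name `code_link_matrix` is hypothetical, and the sketch assumes that no spatial object is isolated (every row of $G$ has at least one positive entry).

```python
import math


def code_link_matrix(G, scheme):
    """Standardize a relationship matrix G (list of lists) into V.

    Assumed definitions (not taken from the paper itself):
      "W": each row divided by its row sum
      "C": all elements scaled so that they sum to n
      "S": rows divided by the root of their sum of squares,
           then all elements scaled so that they sum to n
    """
    n = len(G)
    if scheme == "W":
        return [[g / sum(row) for g in row] for row in G]
    if scheme == "C":
        total = sum(sum(row) for row in G)
        return [[n * g / total for g in row] for row in G]
    if scheme == "S":
        star = [[g / math.sqrt(sum(x * x for x in row)) for g in row]
                for row in G]
        total = sum(sum(row) for row in star)
        return [[n * g / total for g in row] for row in star]
    raise ValueError("scheme must be 'W', 'C' or 'S'")


# Three objects on a line: 1 - 2 - 3 (binary adjacency, symmetric).
G = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
W = code_link_matrix(G, "W")  # asymmetric: W[0][1] = 1.0 but W[1][0] = 0.5
C = code_link_matrix(G, "C")  # symmetric: every link equals 0.75
```

Under these assumed definitions the elements of $V$ sum to $n$ in all three schemes, and the W-coded example shows why $V$ need not be symmetric even though $G$ is.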
The spatial structure matrix $V$ does not need to be symmetric. Global Moran's $I$ is defined by the regression residuals $\hat{\varepsilon}$ and a global spatial link matrix $V$ as a ratio of quadratic forms

$$ I = \frac{\hat{\varepsilon}^T \, \tfrac{1}{2}(V + V^T) \, \hat{\varepsilon}}{\hat{\varepsilon}^T \hat{\varepsilon}}. \qquad (1) $$

The denominator $\hat{\varepsilon}^T \hat{\varepsilon}$ is related to an estimate of the variance $\sigma^2$ and makes Moran's $I$ a scale-free function. While the observed value of Moran's $I^{\mathrm{obs}}$ does not change whether the symmetric link matrix $\tfrac{1}{2}(V + V^T)$ or the potentially asymmetric link matrix $V$ is used, the symmetric specification must be used in order for the numerator to be a quadratic form. Several transformations (see Tiefelsdorf 1998 or 2000) must be applied in order to evaluate the distribution $\Pr(I \le I^{\mathrm{obs}} \mid \Omega)$ of Moran's $I$ conditional on an underlying spatial process. These transformations lead to

$$ \Pr(I \le I^{\mathrm{obs}} \mid \Omega) = \Pr\!\left( \delta^T \, \Omega^{\frac{1}{2}T} M \left[ \tfrac{1}{2}(V + V^T) - I^{\mathrm{obs}} \cdot I \right] M \, \Omega^{\frac{1}{2}} \, \delta \le 0 \right) \qquad (2) $$

with $\delta \sim N(0, \sigma^2 I)$. In the case of an autoregressive spatial process, $\Omega^{\frac{1}{2}} = (I - \rho V)^{-1}$. The spectrum of eigenvalues $\{\gamma_1, \ldots, \gamma_n\}$ of the inner term

$$ \Omega^{\frac{1}{2}T} M \left[ \tfrac{1}{2}(V + V^T) - I^{\mathrm{obs}} \cdot I \right] M \, \Omega^{\frac{1}{2}} \qquad (3) $$

in equation (2) completely characterizes the reference distribution of Moran's $I$. It is used in Imhof's method (Imhof 1961) to calculate the exact reference distribution by numerical integration, and it is also the key building block for the saddlepoint approximation that is proposed in this paper.

Recall that, under the assumption of spatial independence, $\Omega^{\frac{1}{2}}$ must be substituted by the identity matrix $I$. Under this assumption, a revised spectrum of eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$, which is based on $M \, \tfrac{1}{2}(V + V^T) \, M$, proves to be useful. This revised spectrum gives the moments of Moran's $I$ under the assumption of spatial independence. However, some precautions must be taken to exclude from the calculation of the moments those $k$ eigenvalues that are necessarily zero due to the rank defect of the projection matrix $M$ (see Tiefelsdorf 2000, section 9.1). The smallest and largest eigenvalues, $\lambda_1$ and $\lambda_n$ respectively, determine the feasible range of the Moran statistic. This range depends on the underlying regression matrix $X$ and the spatial arrangement represented by the spatial link matrix $V$.
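Equation (1) can be illustrated with a short sketch. It assumes the simplest design matrix, $X = \mathbf{1}$ (intercept only), so that the residuals are the deviations of $y$ from its mean; the function name `moran_i` and the toy data are hypothetical. The `symmetrize` switch shows that the observed value is the same for $V$ and for $\tfrac{1}{2}(V + V^T)$.

```python
def moran_i(y, V, symmetrize=True):
    """Moran's I as a ratio of quadratic forms, equation (1).

    Assumes an intercept-only regression, so the residuals are
    simply the deviations of y from its arithmetic mean.
    """
    n = len(y)
    mean = sum(y) / n
    e = [value - mean for value in y]  # residual vector
    numerator = 0.0
    for i in range(n):
        for j in range(n):
            if symmetrize:
                link = 0.5 * (V[i][j] + V[j][i])
            else:
                link = V[i][j]
            numerator += link * e[i] * e[j]
    denominator = sum(r * r for r in e)
    return numerator / denominator


# Row-standardized (W-coded) links for three objects on a line.
V = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
y = [1.0, 2.0, 4.0]
i_sym = moran_i(y, V)                     # uses (V + V^T) / 2
i_raw = moran_i(y, V, symmetrize=False)   # uses V directly
```

Both calls return the same value (here $-1/28$), because $\hat{\varepsilon}^T V \hat{\varepsilon}$ is a scalar and therefore equals its own transpose $\hat{\varepsilon}^T V^T \hat{\varepsilon}$.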
For global Moran's $I$ and rectangular tessellations in the rook adjacency specification the range is approximately $[-1, 1]$, and for empirical irregular tessellations with an average of six neighbors for interior cells it is approximately $[-0.5, 1]$ (see Boots and Tiefelsdorf 2000). The reference distribution can then be evaluated for any observed value of Moran's $I^{\mathrm{obs}}$ at the shifted spectrum $\{\gamma_1, \ldots, \gamma_n\} = \{\lambda_1 - I^{\mathrm{obs}}, \ldots, \lambda_n - I^{\mathrm{obs}}\}$ by Imhof's method or by the saddlepoint approximation.

The difference between global Moran's $I$ and local Moran's $I_i$ for the $i$th spatial reference object lies in the specification of the spatial link matrix. In fact, the set of local link matrices $V_i$ are the building blocks of the global spatial link matrix $\tfrac{1}{2}(V + V^T)$. All elements in a local link matrix are zero except for those elements in the $i$th row and column, which are copies of the $i$th row and column of the general spatial relationship matrix $G$.² This gives a star-shaped symmetric local spatial link matrix $V_i$ of the structural form

$$
V_i = s_i \cdot
\begin{pmatrix}
0 & \cdots & 0 & g_{1i} & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & g_{i-1,i} & 0 & \cdots & 0 \\
g_{i1} & \cdots & g_{i,i-1} & 0 & g_{i,i+1} & \cdots & g_{in} \\
0 & \cdots & 0 & g_{i+1,i} & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & g_{ni} & 0 & \cdots & 0
\end{pmatrix}
$$

where $s_i$ is a coding scheme specific scaling parameter for the $i$th spatial object and $g_{ij}$ are the relevant elements of $G$. For the definitions of $s_i$ see Tiefelsdorf, Griffith, and Boots (1999). The sum of the local link matrices over all spatial objects reconstructs the global link matrix,³ that is, $\tfrac{1}{2}(V + V^T) = \sum_{i=1}^{n} V_i$, and by this additivity property (Anselin 1995) associates the local Moran's $I_i$'s with the global Moran's $I$ statistic. It can be seen, by substituting the local link matrix $V_i$ for the global link matrix $\tfrac{1}{2}(V + V^T)$, that local Moran's $I_i$ is also defined as a ratio of quadratic forms by

$$ I_i = \frac{\hat{\varepsilon}^T V_i \, \hat{\varepsilon}}{\hat{\varepsilon}^T \hat{\varepsilon}}. $$

Consequently, all definitions, all statistical procedures, and all general distributional properties also apply to local Moran's $I_i$. Specific distributional properties of local Moran's $I_i$ are outlined in section 4.

2. APPROXIMATION METHODS FOR MORAN'S I UNDER SPATIAL INDEPENDENCE

All approximation methods proposed so far in the geographical literature focus on modeling the reference distribution of global Moran's $I$ under the assumption of spatial independence and normally distributed regression residuals (or normally distributed variations around the mean of random variables in a univariate map pattern analysis). These approximation methods are based on the central moments of Moran's $I$ up to the fourth order: the expectation $E(I) = \mu_1$, the variance $\mathrm{Var}(I) = \mu_2$, the skewness $\mu_3 / \mu_2^{3/2}$, and the kurtosis $\mu_4 / \mu_2^{4/2}$. For the numerical specification of these moments in either the eigenvalue or the trace formulation see Tiefelsdorf (2000, ch. 9). Under alternative hypotheses of a significant underlying spatial process, these moments are no longer valid.
For instance, even under spatial independence but with heteroskedastic regression disturbances, Waldhör (1996) shows that the regular expressions for the moments of Moran's $I$ break down because the numerator and denominator of Moran's $I$ are no longer independent. The geographical literature proposes the following approximation methods for Moran's $I$ under the assumption of spatial independence:

2. A note of caution is required here: we cannot construct the local link matrices $V_i$ by simply extracting the $i$th row and column from the global spatial link matrix $\tfrac{1}{2}(V + V^T)$. Such an operation does not preserve the properties of the coding schemes.

3. Note that this equation is sometimes stated in terms of the arithmetic mean of the local link matrices, $\tfrac{1}{2}(V + V^T) = \tfrac{1}{n} \sum_{i=1}^{n} V_i$, where the scaling parameter changes to $s_i^* = n \cdot s_i$.

Normal Approximation

Cliff and Ord (1971) and Sen (1976) have investigated conditions under which it is reasonable to assume that the distribution of Moran's $I$ approaches the normal distribution. These conditions are based on regularity properties of the spatial link matrix (in particular, that no subset of spatial objects dominates the spatial link matrix). Under these conditions the higher-order moments approach those of the normal distribution as the number of spatial objects in the underlying tessellation increases. Assuming that these conditions are satisfied, the test statistic

$$ z(I) = \frac{I^{\mathrm{obs}} - \mu_1}{\sqrt{\mu_2}} \sim N(0, 1) $$

is approximately standard normally distributed and its probability can be evaluated by

$$ \Pr(I \le I^{\mathrm{obs}}) \approx \Phi(z(I)) \qquad (4) $$

where $\Phi(\cdot)$ is the distribution function of the standard normal distribution. This naïve approach is commonly used to evaluate the significance of an observed value of Moran's $I^{\mathrm{obs}}$, and it provides a satisfactory approximation for global Moran's $I$ in empirical tessellations with more than one hundred spatial objects and well-behaved spatial link matrices $V$ (see Boots and Tiefelsdorf 2000).

Pearson Type III Approximation

The Pearson type III approximation uses the first, second, and third moments of Moran's $I$ to approximate its reference distribution. The gamma distribution and, as a special case of it, the chi-square distribution (Mood, Graybill, and Boes 1974, p. 4) belong to the class of Pearson type III distributions. The third moment guarantees that the Pearson type III approximation can capture any skewness in the reference distribution; however, higher-order deviations such as any kurtosis cannot be accommodated. Costanzo, Hubert, and Golledge (1983), Tango (1995), and others use the gamma distribution to approximate the upper tail probabilities of general spatial cross-product statistics. These statistics also belong to the class of quadratic forms and can be accommodated by the methodologies that are outlined in this paper. Imhof (1961, p. 45) extended Pearson's three-moment $\chi^2$-approximation to evaluate the significance of quadratic forms in noncentral $\chi^2$-distributed random variables. All three moments are used to transform the observed value of Moran's $I$ into a chi-square distributed variable and to approximate the degrees of freedom. See also Kuonen (1999, p. 930) for a general discussion of Pearson's three-moment $\chi^2$-approximation in the context of quadratic forms. Pearson's approximation is not feasible for local Moran's $I_i$ because it cannot accommodate the excessive kurtosis of local Moran's $I_i$.

Beta Approximations

This approximation approach uses the beta distribution as reference and can be implemented in two different ways. The beta distribution is given by

$$ f(I^*) = \frac{(I^*)^{p-1} (1 - I^*)^{q-1}}{\beta(p, q)} $$

with $I^* \in [0, 1]$, $p, q > 0$, and $\beta(p, q)$ being the beta function. The cumulative distribution function of a beta-distributed random variable is often called the incomplete beta, and most statistical software packages have functions to calculate it. The feasible range of a random variable following a beta distribution is bounded from below by 0 and from above by 1. Since Moran's $I$ is also bounded from above and below, we can bring it into the $[0, 1]$ range by the transformation

$$ I^* = \frac{I - I_{\min}}{I_{\max} - I_{\min}}. \qquad (5) $$

The choice of the lower bound $I_{\min}$ and the upper bound $I_{\max}$ depends on the selected beta approximation method. Another appealing property of the beta distribution is that it can model, depending on the parameters $p$ and $q$, a wide range of shapes of density functions. These include U-shaped, J-shaped, and unimodal distributions as well as the uniform distribution. However, not all bounded distributions can be approximated by the beta distribution, as the combination of its skewness and kurtosis may vary only within specific limits. See, for example, Figure 6.1 in Stuart and Ord (1994, p. 16) for the Pearson distribution family, of which the beta distribution is a type I member.

The Durbin-Watson Approach. The Durbin-Watson approach (Durbin and Watson 1951) has been followed by Cliff and Ord (1972) for Moran's $I$. It assumes that the feasible range of Moran's $I$ is available. The exact limits are given by the smallest and largest eigenvalues of the matrix $M \, \tfrac{1}{2}(V + V^T) \, M$, that is, $I_{\min} = \lambda_1$ and $I_{\max} = \lambda_n$. The Durbin-Watson approach uses the first two moments of the Moran's $I$ statistic, that is, $\mu_1$ and $\mu_2$, as well as its feasible range to estimate the parameters $p$ and $q$ of the beta distribution by the method of moments. The probability $\Pr(I \le I^{\mathrm{obs}})$ of an observed value of Moran's $I^{\mathrm{obs}}$ is then calculated by using the transformation (5) and evaluating the incomplete beta at $I^*$. This approach has limited flexibility in modeling the reference distribution of Moran's $I$ because it is based on a functional relationship between the parameters $p$ and $q$.

The Henshaw Approach. The approach taken by Henshaw (1966 with corrections in 1968) uses higher-order moments.
It starts off by matching the skewness and kurtosis of the beta distribution to the skewness and kurtosis of the Moran's $I$ statistic in order to estimate the parameters $p$ and $q$ (see Henshaw 1966). These parameters as well as the expectation and variance of Moran's $I$ are used to estimate $I_{\min}$ and $I_{\max}$ in order to calculate the probability $\Pr(I \le I^{\mathrm{obs}})$ of an observed value of Moran's $I^{\mathrm{obs}}$ by evaluating the incomplete beta at $I^*$. By using the skewness and kurtosis to fit the beta distribution to the underlying reference distribution of Moran's $I$, substantial flexibility is gained. For instance, in contrast to the Durbin-Watson approach, the skewness no longer depends on the location of the expectation with respect to the bounds but is modeled explicitly. However, the restrictions on the feasible combinations of the kurtosis and the skewness of the beta distribution prohibit us from using this approximation in a wide range of situations. For instance, as observed by Hepple (1998), for spatial link matrices defined on higher-order spatial lags a feasible estimate of the parameter $p > 0$ could not be established. In addition, the kurtosis of local Moran's $I_i$ is too large to allow us to use Henshaw's approach to fit the beta distribution.

Edgeworth Series Approximation

An Edgeworth series approximation to the reference distribution of Moran's $I$ has been introduced by Terui and Kikuchi (1994). Usually moments or cumulants up to the fourth order are used in Edgeworth series approximations, so that this approximation method is able to model the kurtosis and skewness of the underlying reference distribution. A concise example of the use of the Edgeworth approximation to derive the density function is given in Seeber (1992). Details of the Edgeworth approximation can be found in Stuart and Ord (1994). The Edgeworth approximation is in general accurate in the center of the reference distribution. However, it is inadequate for modeling the tails of the distribution, where significance tests are usually performed. In the tails, the approximated density function can even become negative or exhibit spurious modes. See, for instance, Table 6.1 in Stuart and Ord (1994) for multimodality and negative approximated density functions or, in the context of Moran's $I$, negative approximated densities in the left-hand tails of Figures 1 to 4 in Terui and Kikuchi (1994).

In this paper, the Edgeworth series approximation has been implemented for comparison purposes under the assumption of global spatial independence. In order to perform the Edgeworth approximation for the distribution function, the third and fourth cumulants of the standardized Moran's $I$ have been used: $\kappa_3 = \mu_3 / \mu_2^{3/2}$ and $\kappa_4 = (\mu_4 - 3 \mu_2^2) / \mu_2^{4/2}$ [see Stuart and Ord (1994), eq. (3.43)]. The Edgeworth series [see Stuart and Ord (1994), eqs. (6.4) and (6.43) including the note on the truncation of the series] up to the fourth order is

$$ \Pr(I \le I^{\mathrm{obs}}) \approx \Phi(z(I)) - \varphi(z(I)) \left[ \frac{\kappa_3 \, h_2(z(I))}{6} + \frac{\kappa_4 \, h_3(z(I))}{24} + \frac{\kappa_3^2 \, h_5(z(I))}{72} \right] $$

where $\Phi(\cdot)$ and $\varphi(\cdot)$ are the distribution and density functions of the standard normal distribution, respectively, which are evaluated at the standardized observed Moran coefficient $z(I) = (I^{\mathrm{obs}} - \mu_1) / \sqrt{\mu_2}$. The Hermite polynomials [see Stuart and Ord (1994), eq. (6.3)] are $h_2(z) = z^2 - 1$, $h_3(z) = z^3 - 3z$, and $h_5(z) = z^5 - 10 z^3 + 15 z$. This specification is more general than the one given in Terui and Kikuchi (1994) because it is defined upon regression residuals (through $\kappa_3$ and $\kappa_4$) and not only by the univariate variation of a random variable around its mean. In addition, it accommodates the correct degrees of freedom. Note that the Edgeworth approximation differs only in the second term from the standard normal approximation in equation (4).
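The normal approximation of equation (4) and its Edgeworth correction can be sketched in a few lines. This is a minimal illustration, not the author's SPSS implementation: the function name `edgeworth_cdf` is hypothetical, and the central moments $\mu_1, \ldots, \mu_4$ are assumed to be supplied by the caller (the paper refers to Tiefelsdorf 2000, ch. 9, for their calculation).

```python
import math


def std_normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """Density function of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def edgeworth_cdf(i_obs, mu1, mu2, mu3, mu4):
    """Return (normal, edgeworth) approximations of Pr(I <= i_obs).

    mu1 is the expectation; mu2, mu3, mu4 are central moments.
    """
    z = (i_obs - mu1) / math.sqrt(mu2)          # standardized statistic
    k3 = mu3 / mu2 ** 1.5                       # third cumulant
    k4 = (mu4 - 3.0 * mu2 ** 2) / mu2 ** 2      # fourth cumulant
    h2 = z ** 2 - 1.0                           # Hermite polynomials
    h3 = z ** 3 - 3.0 * z
    h5 = z ** 5 - 10.0 * z ** 3 + 15.0 * z
    normal = std_normal_cdf(z)                  # first term, equation (4)
    correction = std_normal_pdf(z) * (
        k3 * h2 / 6.0 + k4 * h3 / 24.0 + k3 ** 2 * h5 / 72.0
    )
    return normal, normal - correction
```

When the supplied moments are those of a normal distribution ($\mu_3 = 0$, $\mu_4 = 3\mu_2^2$), both cumulants vanish and the two returned values coincide. The second value is not guaranteed to lie in $[0, 1]$ or to be monotone in `i_obs`.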
In the tails of the distribution, the polynomial second term of the Edgeworth approximation can even worsen the accuracy when compared to the standard normal approximation in the first term.

3. THE SADDLEPOINT APPROXIMATION

This discussion is based on findings by Offer Lieberman (1994), who developed the saddlepoint approximation for ratios of quadratic forms in normal variables. General secondary sources on the saddlepoint approximation are, in increasing level of complexity, Seeber (1992), Goutis and Casella (1999), Kolassa (1997), and Jensen (1995). While the implementation of the saddlepoint approximation is relatively simple and numerically efficient, the underlying theory is quite advanced and is left to the reader to explore by studying the references noted above. Noteworthy at this point, however, is the flexibility of the saddlepoint approximation; for instance, it is also applied to evaluate the distribution of likelihood estimators or to approximate tail probabilities by means of the Lugannani and Rice (1980) equation, in which case the relative error is of magnitude $O(n^{-3/2})$. The key to the outstanding performance of the saddlepoint approximation is that the entire cumulant generating function is used and that it is readjusted at each value of the random variable by the saddlepoint $\hat{\omega}$ to optimize the fit of the approximation. In the context of global Moran's $I$, Terui and Kikuchi (1994) mention in their conclusions the potential use of the saddlepoint approximation.

Since Moran's $I$ and local Moran's $I_i$, under either the assumption of independence or conditional on a significant spatial process, belong to the class of ratios of quadratic forms, the key equations of Lieberman (1994) are reviewed here in order to perform the saddlepoint approximation. The Lugannani-Rice formula for Moran's $I$ is

$$ \Pr(I \le I^{\mathrm{obs}}) \approx \Phi(r) + \varphi(r) \left\{ \frac{1}{r} - \frac{1}{u} \right\} \qquad (6) $$

where $\Phi(\cdot)$ and $\varphi(\cdot)$ are the distribution and density functions of the standard normal distribution, respectively, and

$$ r = \mathrm{sign}(\hat{\omega}) \sqrt{\sum_{i=1}^{n} \ln(1 - 2 \hat{\omega} \gamma_i)}, \qquad u = \hat{\omega} \sqrt{2 \sum_{i=1}^{n} \frac{\gamma_i^2}{(1 - 2 \hat{\omega} \gamma_i)^2}}. $$

The function $\mathrm{sign}(\hat{\omega})$ is $1$ for $\hat{\omega} > 0$, $0$ for $\hat{\omega} = 0$, and $-1$ for $\hat{\omega} < 0$. The parameter $\hat{\omega}$ is the solution of the saddlepoint equation

$$ \sum_{i=1}^{n} \frac{\gamma_i}{1 - 2 \hat{\omega} \gamma_i} = 0. \qquad (7) $$

Recall that the spectrum of eigenvalues $\{\gamma_1, \ldots, \gamma_n\}$ in equation (2) depends, among other factors, on the observed value $I^{\mathrm{obs}}$. The saddlepoint $\hat{\omega}$ is consequently updated at each value of Moran's $I^{\mathrm{obs}}$. The saddlepoint $\hat{\omega}$ is bounded from below and above by the open interval

$$ \hat{\omega} \in \left( \frac{1}{2 \gamma_1}, \frac{1}{2 \gamma_n} \right) $$

where the eigenvalue spectrum is assumed to be sorted in ascending order. The Lugannani-Rice formula (6) has a singularity at the mean of the distribution because then $r$ and $u$ both collapse to $0$. Taking the limit $\hat{\omega} \to 0$ of formula (6) with the above definitions of $r$ and $u$, the probability function at this point becomes

$$ \Pr(I \le E(I)) \approx \frac{1}{2} + \frac{\sum_{i=1}^{n} \gamma_i^3}{3 \sqrt{\pi} \left( \sum_{i=1}^{n} \gamma_i^2 \right)^{3/2}}. $$

Determining the root $\hat{\omega}$ of the saddlepoint equation (7), along with the calculation of the eigenvalues, must usually be performed by numerical methods, which are computationally the most expensive part of the saddlepoint approximation. Barndorff-Nielsen [1990, equations (6.1) and (6.)] has suggested an alternative to the Lugannani-Rice formula (6), which is

$$ \Pr(I \le I^{\mathrm{obs}}) \approx \Phi\!\left( r - \frac{1}{r} \ln\!\left( \frac{r}{u} \right) \right). $$

It uses the same arguments $r$ and $u$ and gives virtually identical results to the Lugannani-Rice formula.
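The steps of this section can be sketched as follows. The code takes a spectrum $\{\gamma_i\}$ as given (for example the shifted spectrum $\lambda_i - I^{\mathrm{obs}}$), solves the saddlepoint equation (7) by bisection inside the open interval $(1/(2\gamma_1), 1/(2\gamma_n))$, and evaluates the Lugannani-Rice formula (6). The function name, the choice of bisection as root finder, and the threshold for switching to the limiting value at $\hat{\omega} \approx 0$ are assumptions of this sketch; it does not guard against saddlepoints that lie numerically on the interval bounds in the extreme tails.

```python
import math


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def saddlepoint_cdf(gammas, iterations=200):
    """Lugannani-Rice approximation of Pr(sum_i gamma_i * delta_i^2 <= 0)
    for independent standard normal delta_i."""
    g_min, g_max = min(gammas), max(gammas)
    if g_min >= 0.0:          # quadratic form can never be negative
        return 0.0
    if g_max <= 0.0:          # quadratic form can never be positive
        return 1.0

    def k_prime(w):           # left-hand side of equation (7)
        return sum(g / (1.0 - 2.0 * w * g) for g in gammas)

    # k_prime increases from -inf to +inf on the open interval,
    # so bisection on the midpoints always brackets the root.
    lo, hi = 1.0 / (2.0 * g_min), 1.0 / (2.0 * g_max)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if k_prime(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)

    # Near the mean r and u both vanish: use the limiting value.
    if abs(w) * max(-g_min, g_max) < 1e-6:
        s2 = sum(g * g for g in gammas)
        s3 = sum(g ** 3 for g in gammas)
        return 0.5 + s3 / (3.0 * math.sqrt(math.pi) * s2 ** 1.5)

    log_sum = sum(math.log(1.0 - 2.0 * w * g) for g in gammas)
    r = math.copysign(math.sqrt(max(log_sum, 0.0)), w)
    u = w * math.sqrt(2.0 * sum(g * g / (1.0 - 2.0 * w * g) ** 2
                                for g in gammas))
    return _cdf(r) + _pdf(r) * (1.0 / r - 1.0 / u)
```

As a plausibility check, for the two-point spectrum `[-1.0, 3.0]` the exact probability is $1/3$ (the ratio of two independent standard normal variables is Cauchy distributed), and the sketch returns a value within about 0.01 of it even though $n = 2$ is far from the asymptotic regime.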

4. APPLICATIONS OF THE SADDLEPOINT APPROXIMATION FOR LOCAL MORAN'S $I_i$

It is well known that the central limit theorem is inapplicable to the distribution of local Moran's $I_i$. Local Moran's $I_i$ is not asymptotically normally distributed; instead, it deviates more and more from the normal distribution as the number of spatial objects increases, because the kurtosis increases rather than shrinks. Cliff and Ord (1981, p. 50) first mentioned this effect for star-shaped spatial link matrices $V_i$, which have only rank 2. Consequently, the matrix $M V_i M$ has only one significant positive eigenvalue $\lambda_n$ and one significant negative eigenvalue $\lambda_1$, with the remaining eigenvalues being zero. More general criteria for the limiting distribution of quadratic forms can be found in Johnson and Kotz (1970, p. 167). See Table 1, which shows the first four moments of local Moran's $I_i$ and its feasible range for a spatial object in the interior of a hexagonal tessellation. In this example the C-coding scheme, a projection matrix $M_{(1)} = I - 1(1^T 1)^{-1} 1^T$ modelling the variation $y - \bar{y} \cdot 1$ of a georeferenced variable around its global mean, and global spatial independence have been assumed. It can be seen that, as the number of spatial objects $n$ in the tessellation increases, the feasible range $[\lambda_1, \lambda_n]$ and, associated with it, the kurtosis grow. This deviation from the asymptotic normal distribution as $n$ increases puts all approximation methods to the test and makes local Moran's $I_i$ an excellent candidate for assessing how well any approximation method models the exact reference distribution. The exact reference distribution can be calculated by means of Imhof's (1961) method, which has been outlined for Moran's $I$ in Tiefelsdorf and Boots (1995) and in detail in Tiefelsdorf (2000).
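For orientation, the exact evaluation mentioned above can be sketched as follows. This is a simplified illustration that assumes Imhof's inversion formula in its central form with one degree of freedom per eigenvalue, evaluated at zero; the fixed truncation point `upper` and the plain Simpson rule are simplifications of this sketch (Imhof's paper supplies proper truncation bounds), and the function name is hypothetical.

```python
import math

def imhof_cdf_at_zero(gammas, upper=200.0, steps=20000):
    """Pr(sum_i gammas[i] * z_i**2 <= 0) for independent standard normal z_i."""
    def integrand(u):
        if u == 0.0:
            # Limit of sin(theta(u)) / (u * rho(u)) as u -> 0.
            return 0.5 * sum(gammas)
        theta = 0.5 * sum(math.atan(g * u) for g in gammas)
        rho = math.exp(0.25 * sum(math.log1p((g * u) ** 2) for g in gammas))
        return math.sin(theta) / (u * rho)

    # Composite Simpson rule on [0, upper]; `steps` must be even.
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    integral = total * h / 3.0

    # Imhof: Pr(Q > 0) = 1/2 + integral/pi, hence the lower tail below.
    return 0.5 - integral / math.pi
```

With many non-zero eigenvalues the integrand decays quickly and the fixed truncation is harmless; with only two or three non-zero eigenvalues the decay is slow, so `upper` would need to be raised or replaced by a proper error bound.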
In order to evaluate the exact reference distribution, numerical integration must be performed using as input the spectrum of eigenvalues $\{\gamma_1, \ldots, \gamma_n\}$.

4.1 The Algebraic and Numerical Simplifications of the Saddlepoint Approximation for Local Moran's $I_i$ under Global Spatial Independence

If we assume global spatial independence, we can give, for any projection matrix $M$, the spectrum of eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ as well as the saddlepoint $\hat{\omega}$ of local Moran's $I_i$ in analytical terms. This increases the computational efficiency because it avoids both the numerical calculation of the spectrum of eigenvalues and the iterative search for the root of the saddlepoint equation (7). The availability of such efficient solutions allows us to perform local spatial autocorrelation tests in very large tessellations. The restriction imposed by the assumption of global spatial independence may be achieved heuristically in regression models by the application of either nonparametric or parametric spatial filtering methods (see Griffith 2000 or Haining 1991). There is also, in general, within the likelihood estimation framework of global spatial regression models, considerable interest in deriving efficient analytical expressions or accurate approximations of the eigenvalue spectrum of spatial link matrices associated with large tessellations (see Griffith and Sone 1995 and Smirnov and Anselin 2001).

TABLE 1
Distributional Characteristics of Local Moran's $I_i$ as a Function of an Increasing Hexagonal Tessellation for an Internal Cell $i$ under the Assumption of Global Spatial Independence

Columns: $n$; $E(I_i) = \mu_1$; $\mathrm{Var}(I_i) = \mu_2$; skewness; kurtosis; feasible range.
Only the feasible-range column survives in this transcription, with signs and possibly some digits lost: [ 15.31, 14.1], [ 56.69, 55.60], [ 17.81, 16.77].

Boots and Tiefelsdorf (2000, p. 36) have shown that the eigenvalues of the matrix $M V_i M$ are

$$\lambda_{1,n} = \frac{t_1 \mp \sqrt{2 t_2 - t_1^2}}{2} \qquad (8)$$

and $\lambda_j = 0$ for $j \in \{2, \ldots, n-1\}$, where $t_1 = \mathrm{trace}(M V_i M)$ and $t_2 = \mathrm{trace}([M V_i M]^2)$. As has been pointed out by J. Keith Ord (personal communication), the expressions $t_1$ and $t_2$ can be further simplified algebraically. This substantially reduces the number of computational operations required to evaluate these matrix products:

$$\begin{aligned}
\mathrm{trace}(M V_i M) &= -\mathrm{trace}\bigl(X^T V_i X (X^T X)^{-1}\bigr) \\
\mathrm{trace}([M V_i M]^2) &= \mathrm{trace}(V_i^2) - 2\,\mathrm{trace}\bigl(X^T V_i^2 X (X^T X)^{-1}\bigr) + \mathrm{trace}\bigl([X^T V_i X (X^T X)^{-1}]^2\bigr).
\end{aligned} \qquad (9)$$

In order to derive these expressions, use has been made of the idempotency of the projection matrix $M = I - X (X^T X)^{-1} X^T$, that is, $M \cdot M = M$, and of the facts that $\mathrm{trace}(V_i) = 0$ and $\mathrm{trace}(A \cdot B) = \mathrm{trace}(B \cdot A)$. Most of the matrix products on the right-hand side of equation (9) involve only $k \times k$ matrices instead of the $n \times n$ matrices on the left-hand side. Since the local spatial link matrix $V_i$ is star-shaped and extremely sparse, with only $d_i$ nonzero elements in either its $i$th row or column, it requires only $d_i \times d_i$ operations to calculate $V_i^2$. The number of computational operations could be further reduced by making use of the inherent symmetries in the matrix terms of equation (9).

From this spectrum of eigenvalues, the set of eigenvalues of $M (V_i - I_{i,\mathrm{obs}} I) M$ for any observed value $I_{i,\mathrm{obs}}$ is given by shifting the initial spectrum:

$$\{\gamma_1, \ldots, \gamma_n\} = \{\lambda_1 - I_{i,\mathrm{obs}},\ \underbrace{-I_{i,\mathrm{obs}}, \ldots, -I_{i,\mathrm{obs}}}_{m\ \text{times}},\ \underbrace{0, \ldots, 0}_{k\ \text{times}},\ \lambda_n - I_{i,\mathrm{obs}}\}$$

where $m = n - k - 2$ and $k$ is the number of linearly independent variables, including the constant vector $1$, in the design matrix $X$. Thus $k$ eigenvalues must remain zero because the projection matrix $M$ has only rank $n - k$. A detailed discussion can be found in Tiefelsdorf (2000, pp. 80-82). Under the explicit incorporation of a spatial process $\Omega$, as in equation (3), the eigenvalues of local Moran's $I_i$ can no longer be given in analytical terms.
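The trace shortcut, the two non-zero eigenvalues, and the shifted spectrum can be illustrated together for the simplest design matrix $X = 1$ (so $k = 1$). The sketch below is an illustration under stated assumptions, not the paper's code: the star matrix uses a hypothetical weight of 0.5 per link, all function names are made up for this example, and a brute-force dense computation is included only to check the shortcut.

```python
import math

def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def star_link_matrix(n, i, neighbours, weight=0.5):
    """Symmetric star: `weight` in row i and column i for every neighbour of i."""
    v = [[0.0] * n for _ in range(n)]
    for j in neighbours:
        v[i][j] = v[j][i] = weight
    return v

def traces_mean_only(v):
    """t1 and t2 via the reduced trace expressions for X = 1 (scalars only)."""
    n = len(v)
    row_sums = [sum(row) for row in v]        # V 1 (V is symmetric)
    s1 = sum(row_sums)                        # 1' V 1
    s2 = sum(x * x for x in row_sums)         # 1' V^2 1
    tr_v2 = sum(v[a][b] * v[b][a] for a in range(n) for b in range(n))
    t1 = -s1 / n
    t2 = tr_v2 - 2.0 * s2 / n + (s1 / n) ** 2
    return t1, t2

def traces_brute_force(v):
    """Same traces from the full n x n products M V M, for checking only."""
    n = len(v)
    m = [[(1.0 if a == b else 0.0) - 1.0 / n for b in range(n)] for a in range(n)]
    k = matmul(matmul(m, v), m)
    return trace(k), trace(matmul(k, k))

def eigenvalues_from_traces(t1, t2):
    """The two non-zero eigenvalues of a rank-2 matrix with these traces."""
    root = math.sqrt(2.0 * t2 - t1 * t1)
    return 0.5 * (t1 - root), 0.5 * (t1 + root)

def shifted_spectrum(lam_1, lam_n, i_obs, n, k):
    """Spectrum of M (V_i - i_obs * I) M under global spatial independence."""
    m = n - k - 2
    return [lam_1 - i_obs] + [-i_obs] * m + [0.0] * k + [lam_n - i_obs]
```

For example, `eigenvalues_from_traces(*traces_mean_only(star_link_matrix(7, 0, [1, 2, 3])))` yields one negative and one positive eigenvalue whose sum equals $t_1$, and `shifted_spectrum` then produces the list that a saddlepoint or Imhof routine would take as input.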
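The saddlepoint equation (7) reduces under global spatial independence, due to the replication of eigenvalues, to the simple form

$$\frac{\lambda_1 - I_{i,\mathrm{obs}}}{1 - 2\hat{\omega}(\lambda_1 - I_{i,\mathrm{obs}})} + m\,\frac{-I_{i,\mathrm{obs}}}{1 + 2\hat{\omega} I_{i,\mathrm{obs}}} + k\,\frac{0}{1 - 2\hat{\omega} \cdot 0} + \frac{\lambda_n - I_{i,\mathrm{obs}}}{1 - 2\hat{\omega}(\lambda_n - I_{i,\mathrm{obs}})} = 0. \qquad (10)$$

Figure 1 shows a graph of the saddlepoint equation for local Moran's $I_i$ at an internal cell in a tessellation with sixty-four hexagons. An observed value of local Moran's $I_{i,\mathrm{obs}}$, defined through $E(I_i)$ and $\mathrm{Var}(I_i)$, is assumed together with its associated saddlepoint $\hat{\omega}$ (the numerical values are not recoverable here).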
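Because the reduced equation has only three distinct non-zero terms, clearing the denominators leaves a quadratic in $\hat{\omega}$. The sketch below solves that quadratic directly and keeps the root inside the admissible interval. It is a worked illustration rather than the paper's closed-form expression: the coefficient names `A`, `B`, `C` are local to this example, and it assumes $\lambda_1 < I_{i,\mathrm{obs}} < \lambda_n$ with $I_{i,\mathrm{obs}} \neq 0$.

```python
import math

def reduced_saddlepoint_function(omega, lam_1, lam_n, i_obs, m):
    """Left-hand side of the reduced saddlepoint equation (zero eigenvalues drop out)."""
    a1, an = lam_1 - i_obs, lam_n - i_obs
    return (a1 / (1.0 - 2.0 * omega * a1)
            - m * i_obs / (1.0 + 2.0 * omega * i_obs)
            + an / (1.0 - 2.0 * omega * an))

def saddlepoint_from_quadratic(lam_1, lam_n, i_obs, m):
    """Root of the reduced equation inside (1/(2*a1), 1/(2*an))."""
    a1, an, q = lam_1 - i_obs, lam_n - i_obs, -i_obs
    # With w = 2*omega, multiplying out the three fractions gives
    # A*w**2 - B*w + C = 0.
    A = (m + 2) * a1 * an * q
    B = 2.0 * a1 * an + (m + 1) * q * (a1 + an)
    C = a1 + an + m * q
    disc = math.sqrt(B * B - 4.0 * A * C)
    lower, upper = 1.0 / (2.0 * a1), 1.0 / (2.0 * an)
    for w in ((B + disc) / (2.0 * A), (B - disc) / (2.0 * A)):
        omega = 0.5 * w
        # The equation is monotone on the interval, so exactly one root qualifies.
        if lower < omega < upper:
            return omega
    raise ValueError("no root inside the admissible interval")
```

Selecting the root by the interval test, rather than by a fixed sign of the square root, keeps the sketch independent of any particular sign convention. No eigenvalue routine or iterative search is needed, which is the source of the efficiency gain under global spatial independence.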

FIG. 1. Saddlepoint Equation (7) for Local Moran's $I_i$ at the Assumed $I_{i,\mathrm{obs}}$ at an Interior Cell in a Hexagonal Tessellation with Sixty-four Cells

The saddlepoint equation is fairly flat around $f(\hat{\omega}) = 0$, which makes it difficult for numerical algorithms to find the root efficiently. This flatness can be explained by the $m$ replications of the eigenvalue $-I_{i,\mathrm{obs}}$. The saddlepoint of equation (10) can be solved in analytical terms. It takes the form

$$\hat{\omega} = \frac{1}{4} \cdot \frac{a - \sqrt{c}}{b} \qquad (11)$$

with the terms $a$, $b$, and $c$ being defined as

$$\begin{aligned}
a &= m\,I_{i,\mathrm{obs}}(\lambda_1 + \lambda_n - 2 I_{i,\mathrm{obs}}) + I_{i,\mathrm{obs}}(3\lambda_1 + 3\lambda_n - 4 I_{i,\mathrm{obs}}) - 2\lambda_1\lambda_n, \\
b &= I_{i,\mathrm{obs}}\,(m + 2)\,(\lambda_n - I_{i,\mathrm{obs}})\,(\lambda_1 - I_{i,\mathrm{obs}}), \text{ and} \\
c &= \lambda_1^2 I_{i,\mathrm{obs}}^2 (m + 1)^2 + \lambda_n^2 I_{i,\mathrm{obs}}^2 (m + 1)^2 \\
  &\quad + 2\lambda_1\lambda_n\bigl(2\lambda_1\lambda_n - 2\lambda_1 I_{i,\mathrm{obs}} - 2\lambda_n I_{i,\mathrm{obs}} - 2 m I_{i,\mathrm{obs}}^2 - m^2 I_{i,\mathrm{obs}}^2 + I_{i,\mathrm{obs}}^2\bigr).
\end{aligned}$$

These expressions can be implemented easily in any software environment.

4.2 The Accuracy of the Saddlepoint Approximation for the Reference Distribution of Local Moran's $I_i$

The first example compares the saddlepoint approximation, the Edgeworth approximation, and the normal approximation against the exact reference distribution of local Moran's $I_i$ under the assumption of global spatial independence. Again, a hexagonal setting with sixty-four cells has been selected, and local Moran's $I_i$ is evaluated at an interior cell in the C-coding scheme with the projection matrix $M_{(1)}$. Figure 2 shows the exact as well as the three approximated distribution functions over the full probability range [0,1]. Obviously, the Edgeworth approximation performs poorly. It is particularly off in the tails of the distribution, whereas in the center of the distribution it outperforms the normal approximation. Overall, its accuracy is slightly better than that of the normal approximation. Distribution functions are by definition monotonically increasing; however, the Edgeworth approximation is partially decreasing in the tails. This effect is associated with a partially negative density function in the tails of the Edgeworth approximation. One can further observe that the normal approximation substantially overstates the significance of local Moran's $I_i$ in the tails, as it approaches either $\Pr(I_i \le I_{i,\mathrm{obs}}) \to 0$ for negative local spatial autocorrelation or $1 - \Pr(I_i > I_{i,\mathrm{obs}}) \to 1$ for positive local spatial autocorrelation much faster than the reference distribution. This is particularly critical in the situation of multiple testing of several local Moran's $I_i$'s, where the overall $\alpha$-level is adjusted downward, for instance, by a Bonferroni-type correction. Here the normal approximation would flag several local Moran's $I_i$'s in both tails as significant, whereas the reference distribution correctly indicates that these local Moran's $I_i$'s deviate insignificantly from the null hypothesis of local spatial independence. These findings indicate that neither the Edgeworth approximation nor the normal approximation is a valid model for the reference distribution of local Moran's $I_i$. On the other hand, the reference distribution and the saddlepoint approximation are virtually indistinguishable, and the accuracy improves in the tails.
This is clearly illustrated in Figures 3 and 4, which zoom in on the negative and positive tails of the reference distribution and the three approximations.

FIG. 2. The Exact Distribution Function and the Three Approximated Distribution Functions for Local Moran's $I_i$ under the Assumption of Spatial Independence in a Hexagonal Tessellation with Sixty-four Cells

FIG. 3. Detailed View on the Negative Tail of the Distribution Function in Figure 2

FIG. 4. Detailed View on the Positive Tail of the Distribution Function in Figure 2

One may argue, however, that the assumption of global spatial independence is inappropriate in empirical settings, which mostly exhibit some degree of global spatial autocorrelation. From a confirmatory point of view, this argument is highly relevant, and the distribution of local Moran's $I_i$ conditional upon the global spatial process must be used. Extreme tail probabilities of local Moran's $I_i$'s then indicate a tendency to exhibit either local clusters or local singularities (hot spots) beyond the average autocorrelation level of the global spatial process. Consequently, the conditional distribution identifies local pockets of spatial heterogeneity in the underlying global spatial process. In contrast, if global spatial independence denotes the reference level against which local Moran's $I_i$ is assessed, then the individual local Moran's $I_i$'s indicate their contribution to the global autocorrelation level. This is due to the additivity constraint $I = \sum_{i=1}^{n} I_i$.

The next example uses an empirical setting to investigate the conditional distribution of local Moran's $I_i$ and its saddlepoint approximation. This example is taken from Tiefelsdorf (1998 and 2000) and investigates the spatial distribution of regression residuals from a bladder cancer incidence model in the 219 counties of the former German Democratic Republic. The regression model uses four explanatory and control variables. A global autoregressive spatial process with an autocorrelation level $\rho$ had been identified. Local Moran's $I_i$ is specified in the variance-stabilizing S-coding scheme. Its probabilities $\Pr(I_i \le I_{i,\mathrm{obs}} \mid \Omega)$ are calculated subject to the identified autoregressive spatial process. Each local Moran's $I_i$ has its own distribution because the local spatial link matrix $V_i$, and consequently the spectrum of eigenvalues $\{\gamma_1, \ldots, \gamma_n\}$, differs from county to county. Recall that in the presence of a spatial process the moments of local and global Moran's $I$ are no longer readily available, and thus all approximation methods in section 2 are no longer directly applicable. Figure 5 compares the exact conditional reference distribution against the saddlepoint approximation in a P-P plot. Points on the main diagonal indicate a perfect correspondence between the reference distribution and its saddlepoint approximation.
The saddlepoint approximation matches the reference distribution in the tails, where statistical tests are performed, whereas mild deviations can be observed in the center of the distribution. The 219 probability comparisons do not line up on a smooth curve because each local Moran's $I_i$ has its individual distributional characteristics.

FIG. 5. P-P Plot of Local Moran's $I_i$ Conditional Distribution for the 219 Counties of the Former GDR against the Saddlepoint Approximation

This example demonstrates that the saddlepoint approximation also works well for the conditional reference distribution in a larger spatial setting. The inherent autoregressive global spatial process in the data has been adjusted for properly because, if the conditional probabilities were projected on either axis, they would follow, as required by theory, a rectangular distribution (see Robins, Van der Vaart, and Ventura 2000).

4.3 The Numerical Efficiency of the Saddlepoint Approximation for Local Moran's $I_i$'s Reference Distribution

The remaining question is whether there are any computational gains from using the saddlepoint approximation. This question is addressed for the bladder cancer model by assuming either an underlying global spatial process or global spatial independence. Table 2 shows the cumulative computing times for all 219 counties using the vector-optimized numerical programming package GAUSS 3.5 (Aptech Systems, 2000) on a Pentium III processor running at 850 MHz. Whenever possible, use has been made of GAUSS's facility to handle operations on sparse matrices efficiently. All intermediate but constant matrix products were held in core memory, ready to evaluate local Moran's $I_i$ at all reference locations in the spatial tessellation. The first notable finding is that under the assumption of global spatial independence, the analytical solution for deriving the 219 spectra of eigenvalues is far more efficient than numerically calculating the matrix products $M V_i M$ and their spectra of eigenvalues. Explicit elementwise programming of the matrix products, which involve the sparse local link matrix $V_i$, would increase the numerical efficiency even further. Consequently, the analytical eigenvalue approach should be used for local Moran's $I_i$ under the assumption of spatial independence. The analytical identification of the saddlepoints $\hat{\omega}$ takes virtually no time compared to an ad hoc implementation of the bisection algorithm (see Press et al.
1992). Also, the time to approximate local Moran's $I_i$'s probabilities by the saddlepoint approximation is negligible, whereas the numerical integration in Imhof's formulation takes a substantial amount of time. Analytical solutions for the spectrum of eigenvalues of local Moran's $I_i$ and for the saddlepoint $\hat{\omega}$ are no longer available in the presence of an underlying global spatial process. Their numerical counterparts must be used. Given that the eigenvalues have already been calculated, the key finding is that the saddlepoint approximation, including the numerical search for the saddlepoint $\hat{\omega}$, is roughly one hundred times faster than the numerical integration of the exact method. The methods for the numerical search of the saddlepoint and the numerical integration perform slightly better in the presence of a global spatial process than under the assumption of global spatial independence. This is because the eigenvalue spectrum is more diverse under a global spatial process than under global spatial independence, where it accumulates around the single value $-I_{i,\mathrm{obs}}$.

TABLE 2
Comparison of Computing Times between the Saddlepoint Approximation for Local Moran's $I_i$ and the Evaluation of the Exact Reference Distribution by Numerical Integration

Computing times under the assumption of global spatial independence
Eigenvalues of $M V_i M$:
  Analytical solution including matrix products (equations 8 and 9): 7.3 sec.
  Numerical evaluation of eigenvalues including matrix products: [...] sec.
Saddlepoint $\hat{\omega}$:
  Analytical solution (equation 11): 0.0 sec.
  Numerical evaluation by bisection search (equation 7): 3.9 sec.
Probabilities:
  Saddlepoint approximation (equation 6): 0.1 sec.
  Reference distribution via numerical integration: [...] sec.

Computing times under the presence of a global spatial process
Eigenvalues of $\Omega^{1/2\,T} M (V_i - I_{i,\mathrm{obs}} I) M \Omega^{1/2}$:
  Numerical evaluation of eigenvalues including matrix products: 73.9 sec.
Saddlepoint $\hat{\omega}$:
  Numerical evaluation by bisection search (equation 7): 3.3 sec.
Probabilities:
  Saddlepoint approximation: 0.1 sec.
  Reference distribution via numerical integration: [...] sec.

NOTE: Several analytical solutions (see section 4.1) are employed under the assumption of global spatial independence. The reported times are cumulative over the 219 computations, one for each county in the former GDR.

A note of caution, however, is appropriate. Under the assumption of global spatial independence, the saddlepoint-approximated probabilities of local Moran's $I_i$ can be evaluated even in extremely large spatial tessellations. However, the same evaluation in the presence of a significant global spatial process becomes prohibitive because products of $n \times n$ matrices need to be calculated and their eigenvalue spectra need to be determined numerically. For instance, for a spatial system of 3,053 contiguous U.S. counties (excluding Alaska and Hawaii) in the atlas of cancer mortality (see Devesa et al. 1999), the calculation of the matrix products and their associated eigenvalue spectrum for just one local Moran's $I_i$ takes over an hour in the presence of a global spatial process. The subsequent calculation of the saddlepoint approximation consumes only several seconds, whereas the exact calculation of the reference distribution via numerical integration is performed within minutes.

5. CONCLUSIONS

The flexibility of the saddlepoint approximation, together with its accuracy and numerical efficiency in evaluating the distribution of spatial statistics such as global and local Moran's $I$, makes it the approximation method of first choice. This holds either under small-sample conditions or, in special cases, under large-sample circumstances when the central limit theorem does not apply. In particular, if the distribution of the Moran's statistic does not converge to normality, such as for local Moran's $I_i$ and other special specifications of spatial link matrices, the saddlepoint approximation is the only choice besides calculating the exact reference distribution. In addition, when the moments of Moran's $I$ are not readily available, such as for the distribution of Moran's $I$ subject to a spatial process, for power function evaluations, or for heteroskedasticity in the disturbances, it is again the only feasible approximation method because the underlying covariance matrix $\Omega$ can be explicitly accommodated. The saddlepoint approximation performs particularly well in the tails of distributions, where statistical significance tests are usually conducted. Throughout this paper it has been assumed that the residuals are normally distributed. The normality assumption is at first sight quite restrictive. Nevertheless, it is common practice in standard regression modelling and testing to work with it or to achieve it by data transformations. However, for other error distributions, whenever the cumulant generating function of the test statistic can be derived, the saddlepoint approximation becomes applicable (Huzurbazar 1999), even if the reference distribution is not readily available. This property has the potential to lead to informed significance tests in spatial statistics without the need for the normality assumption and without extensive simulation experiments. Strawderman (2000, p. 1363) states that in univariate problems, there is really no need to rely on asymptotic expansions at all, because most reasonable numerical quadrature routines can yield exact results to user-controlled levels of error. In part, this author agrees with this statement and prefers, whenever possible, to evaluate the exact distribution of Moran's $I$ by numerical integration. However, judging from a


1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Spatial Regression. 10. Specification Tests (2) Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Spatial Regression. 10. Specification Tests (2) Luc Anselin.  Copyright 2017 by Luc Anselin, All Rights Reserved Spatial Regression 10. Specification Tests (2) Luc Anselin http://spatial.uchicago.edu 1 robust LM tests higher order tests 2SLS residuals specification search 2 Robust LM Tests 3 Recap and Notation LM-Error

More information

Spatial Regression. 9. Specification Tests (1) Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Spatial Regression. 9. Specification Tests (1) Luc Anselin.   Copyright 2017 by Luc Anselin, All Rights Reserved Spatial Regression 9. Specification Tests (1) Luc Anselin http://spatial.uchicago.edu 1 basic concepts types of tests Moran s I classic ML-based tests LM tests 2 Basic Concepts 3 The Logic of Specification

More information

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May 5-7 2008 Peter Schlattmann Institut für Biometrie und Klinische Epidemiologie

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

Testing Random Effects in Two-Way Spatial Panel Data Models

Testing Random Effects in Two-Way Spatial Panel Data Models Testing Random Effects in Two-Way Spatial Panel Data Models Nicolas Debarsy May 27, 2010 Abstract This paper proposes an alternative testing procedure to the Hausman test statistic to help the applied

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

Lecture 6: Hypothesis Testing

Lecture 6: Hypothesis Testing Lecture 6: Hypothesis Testing Mauricio Sarrias Universidad Católica del Norte November 6, 2017 1 Moran s I Statistic Mandatory Reading Moran s I based on Cliff and Ord (1972) Kelijan and Prucha (2001)

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Mapping and Analysis for Spatial Social Science

Mapping and Analysis for Spatial Social Science Mapping and Analysis for Spatial Social Science Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign http://sal.agecon.uiuc.edu Outline

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Chapter Four Gelfond s Solution of Hilbert s Seventh Problem (Revised January 2, 2011)

Chapter Four Gelfond s Solution of Hilbert s Seventh Problem (Revised January 2, 2011) Chapter Four Gelfond s Solution of Hilbert s Seventh Problem (Revised January 2, 2011) Before we consider Gelfond s, and then Schneider s, complete solutions to Hilbert s seventh problem let s look back

More information

Greene, Econometric Analysis (7th ed, 2012)

Greene, Econometric Analysis (7th ed, 2012) EC771: Econometrics, Spring 2012 Greene, Econometric Analysis (7th ed, 2012) Chapters 2 3: Classical Linear Regression The classical linear regression model is the single most useful tool in econometrics.

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case Areal data models Spatial smoothers Brook s Lemma and Gibbs distribution CAR models Gaussian case Non-Gaussian case SAR models Gaussian case Non-Gaussian case CAR vs. SAR STAR models Inference for areal

More information

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria

More information

Reading Assignment. Serial Correlation and Heteroskedasticity. Chapters 12 and 11. Kennedy: Chapter 8. AREC-ECON 535 Lec F1 1

Reading Assignment. Serial Correlation and Heteroskedasticity. Chapters 12 and 11. Kennedy: Chapter 8. AREC-ECON 535 Lec F1 1 Reading Assignment Serial Correlation and Heteroskedasticity Chapters 1 and 11. Kennedy: Chapter 8. AREC-ECON 535 Lec F1 1 Serial Correlation or Autocorrelation y t = β 0 + β 1 x 1t + β x t +... + β k

More information

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1 PANEL DATA RANDOM AND FIXED EFFECTS MODEL Professor Menelaos Karanasos December 2011 PANEL DATA Notation y it is the value of the dependent variable for cross-section unit i at time t where i = 1,...,

More information

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Spatial Analysis I. Spatial data analysis Spatial analysis and inference Spatial Analysis I Spatial data analysis Spatial analysis and inference Roadmap Outline: What is spatial analysis? Spatial Joins Step 1: Analysis of attributes Step 2: Preparing for analyses: working with

More information

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Christopher Paciorek, Department of Statistics, University

More information

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Ref.:   Spring SOS3003 Applied data analysis for social science Lecture note SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton

More information

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction Acta Math. Univ. Comenianae Vol. LXV, 1(1996), pp. 129 139 129 ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES V. WITKOVSKÝ Abstract. Estimation of the autoregressive

More information

Spatial Effects and Externalities

Spatial Effects and Externalities Spatial Effects and Externalities Philip A. Viton November 5, Philip A. Viton CRP 66 Spatial () Externalities November 5, / 5 Introduction If you do certain things to your property for example, paint your

More information

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations Basics of Experimental Design Review of Statistics And Experimental Design Scientists study relation between variables In the context of experiments these variables are called independent and dependent

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Multicollinearity and correlation among local regression coefficients in geographically weighted regression

Multicollinearity and correlation among local regression coefficients in geographically weighted regression J Geograph Syst (2005) 7: 161 187 DOI: 10.1007/s10109-005-0155-6 ORIGINAL PAPER David Wheeler Æ Michael Tiefelsdorf Multicollinearity and correlation among local regression coefficients in geographically

More information

splm: econometric analysis of spatial panel data

splm: econometric analysis of spatial panel data splm: econometric analysis of spatial panel data Giovanni Millo 1 Gianfranco Piras 2 1 Research Dept., Generali S.p.A. and DiSES, Univ. of Trieste 2 REAL, UIUC user! Conference Rennes, July 8th 2009 Introduction

More information

Adaptive Localization: Proposals for a high-resolution multivariate system Ross Bannister, HRAA, December 2008, January 2009 Version 3.

Adaptive Localization: Proposals for a high-resolution multivariate system Ross Bannister, HRAA, December 2008, January 2009 Version 3. Adaptive Localization: Proposals for a high-resolution multivariate system Ross Bannister, HRAA, December 2008, January 2009 Version 3.. The implicit Schur product 2. The Bishop method for adaptive localization

More information

SPATIAL ECONOMETRICS: METHODS AND MODELS

SPATIAL ECONOMETRICS: METHODS AND MODELS SPATIAL ECONOMETRICS: METHODS AND MODELS STUDIES IN OPERATIONAL REGIONAL SCIENCE Folmer, H., Regional Economic Policy. 1986. ISBN 90-247-3308-1. Brouwer, F., Integrated Environmental Modelling: Design

More information

,i = 1,2,L, p. For a sample of size n, let the columns of data be

,i = 1,2,L, p. For a sample of size n, let the columns of data be MAC IIci: Miller Asymptotics Chapter 5: Regression Section?: Asymptotic Relationship Between a CC and its Associated Slope Estimates in Multiple Linear Regression The asymptotic null distribution of a

More information

POLSCI 702 Non-Normality and Heteroskedasticity

POLSCI 702 Non-Normality and Heteroskedasticity Goals of this Lecture POLSCI 702 Non-Normality and Heteroskedasticity Dave Armstrong University of Wisconsin Milwaukee Department of Political Science e: armstrod@uwm.edu w: www.quantoid.net/uwm702.html

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

F9 F10: Autocorrelation

F9 F10: Autocorrelation F9 F10: Autocorrelation Feng Li Department of Statistics, Stockholm University Introduction In the classic regression model we assume cov(u i, u j x i, x k ) = E(u i, u j ) = 0 What if we break the assumption?

More information

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of Index* The Statistical Analysis of Time Series by T. W. Anderson Copyright 1971 John Wiley & Sons, Inc. Aliasing, 387-388 Autoregressive {continued) Amplitude, 4, 94 case of first-order, 174 Associated

More information

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III) Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial

More information

Spatial Autocorrelation

Spatial Autocorrelation Spatial Autocorrelation Luc Anselin http://spatial.uchicago.edu spatial randomness positive and negative spatial autocorrelation spatial autocorrelation statistics spatial weights Spatial Randomness The

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*

More information

The Robustness of the Multivariate EWMA Control Chart

The Robustness of the Multivariate EWMA Control Chart The Robustness of the Multivariate EWMA Control Chart Zachary G. Stoumbos, Rutgers University, and Joe H. Sullivan, Mississippi State University Joe H. Sullivan, MSU, MS 39762 Key Words: Elliptically symmetric,

More information

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS 1 W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS An Liu University of Groningen Henk Folmer University of Groningen Wageningen University Han Oud Radboud

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Spatial Analysis 1. Introduction

Spatial Analysis 1. Introduction Spatial Analysis 1 Introduction Geo-referenced Data (not any data) x, y coordinates (e.g., lat., long.) ------------------------------------------------------ - Table of Data: Obs. # x y Variables -------------------------------------

More information

Outline. Overview of Issues. Spatial Regression. Luc Anselin

Outline. Overview of Issues. Spatial Regression. Luc Anselin Spatial Regression Luc Anselin University of Illinois, Urbana-Champaign http://www.spacestat.com Outline Overview of Issues Spatial Regression Specifications Space-Time Models Spatial Latent Variable Models

More information

Research Notes and Comments I 347

Research Notes and Comments I 347 Research Notes and Comments I 347 mum-likelihood estimation of the constant and does not need to be applied a posteriori. Overall, the replacement of the lognormal model by the Poisson model provides a

More information

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization. Statistical Tools in Evaluation HPS 41 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific number

More information

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013 STRUCTURAL EQUATION MODELING Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013 Introduction: Path analysis Path Analysis is used to estimate a system of equations in which all of the

More information

1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11.

1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11. Chapter 4 Statistics 45 CHAPTER 4 BASIC QUALITY CONCEPTS 1.0 Continuous Distributions.0 Measures of Central Tendency 3.0 Measures of Spread or Dispersion 4.0 Histograms and Frequency Distributions 5.0

More information

Nonparametric Estimation of the Spatial Connectivity Matrix Using Spatial Panel Data

Nonparametric Estimation of the Spatial Connectivity Matrix Using Spatial Panel Data Geographical Analysis (2012), Nonparametric Estimation of the Spatial Connectivity Matrix Using Spatial Panel Data Michael Beenstock 1, Daniel Felsenstein 2 1 Department of Economics, Hebrew University

More information

Econometrics Part Three

Econometrics Part Three !1 I. Heteroskedasticity A. Definition 1. The variance of the error term is correlated with one of the explanatory variables 2. Example -- the variance of actual spending around the consumption line increases

More information

Module 3. Function of a Random Variable and its distribution

Module 3. Function of a Random Variable and its distribution Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

Coskewness and Cokurtosis John W. Fowler July 9, 2005

Coskewness and Cokurtosis John W. Fowler July 9, 2005 Coskewness and Cokurtosis John W. Fowler July 9, 2005 The concept of a covariance matrix can be extended to higher moments; of particular interest are the third and fourth moments. A very common application

More information

Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances

Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances METRON - International Journal of Statistics 2008, vol. LXVI, n. 3, pp. 285-298 SHALABH HELGE TOUTENBURG CHRISTIAN HEUMANN Mean squared error matrix comparison of least aquares and Stein-rule estimators

More information