Identification of Local Clusters for Count Data: A Model-Based Moran's I Test

Tonglin Zhang and Ge Lin
Purdue University and West Virginia University
February 14, 2007

Department of Statistics, Purdue University, 250 North University Street, West Lafayette, IN, tlzhang@stat.purdue.edu
Department of Geology and Geography, West Virginia University, Morgantown, WV, glin@wvu.edu

Abstract

We set out $I_{DR}$ as a loglinear model-based Moran's I test for Poisson count data that resembles the Moran's I residual test for Gaussian data. We evaluate its type I and type II error probabilities via simulations and demonstrate its utility via a case study. When population sizes are heterogeneous, $I_{DR}$ retains an acceptable type I error probability and, together with local association terms, is effective in detecting local clusters. When used in conjunction with local spatial association terms in loglinear models, $I_{DR}$ can also indicate the existence of a first-order global clustering trend that can hardly be removed by local spatial association terms; in this situation, $I_{DR}$ should not be directly applied for local cluster detection. In the case study of St. Louis homicides, we bridge loglinear model methods for parameter estimation to exploratory data analysis, so that a uniform association term can be defined with spatially varied contributions among spatial neighbors. The method makes use of exploratory tools such as Moran's I scatter plots and residual plots to evaluate the magnitude of deviance residuals, and it is effective in modeling the shape, the elevation and the magnitude of a local cluster in the model-based test.

Keywords: Cluster and clustering; deviance residual; Moran's I; permutation test; spatial autocorrelation; type I error probability.

1 Introduction

Count and cross-tabulated frequency data are common in geographical analyses. Many spatial phenomena, such as births, deaths, crimes and species richness, can be counted by spatial unit, either as a raw count or as a ratio over some exposure. Prior to the 1970s, count data were often converted to rates for statistical analyses because of limited computational power in categorical statistics. In the late 1970s, computationally expensive methods, such as loglinear models for cross-tabulated data, were introduced into the social sciences and geography [15, 38], and they were quickly included in many statistical packages. In spatial statistical analyses, however, counts are still frequently converted to rates so that a testing method for continuous variables, such as Moran's I [26, 27] or Getis-Ord's G [20], can be directly applied. Yet when population sizes are heterogeneous across spatial units, converting counts to rates often leads to variance inflation and biased type I error probabilities. Some authors propose incorporating a population weight into the test statistics [29, 35], but the heterogeneity problem still remains [5]. Since a loglinear model can take account of population sizes in its likelihood ratio test, it is natural to extend spatial statistics to the loglinear model framework.

In this article, we set out a loglinear model-based test statistic for Poisson count data that corresponds to Moran's I for continuous data. We chose Moran's I because of its popularity and its ease of implementation. There have been hundreds of applications and extensions of the statistic since Moran's I was first published in 1948 [26]. Currently, most researchers focus on estimation methods [13, 22, 37], spatial distribution properties [7, 19], and adjustment for heterogeneous population sizes in count data [5, 29, 35]. A concurrent theme focuses on local spatial statistics or indicators [16]. It has been pointed out that the extent of spatial correlation may vary locally because of omitted, misspecified, or deficient measurements of a stationary spatial relationship [17]. A significant Moran's I test may thus be caused either by a global trend of spatial autocorrelation or by a few local spatial associations. Attempts have been made to partition space for spatially varied parameterization [10], and to decompose a global autocorrelation measure, such as Moran's I, into local indicators of spatial association (LISA) [3].
With auxiliary information, LISA is able to locate spatial associations, such as hot spots and cool spots [33]. Our model-based test should complement LISA, because it can not only explicitly indicate high-value and low-value clusters but also account for heterogeneous population sizes. As its name suggests, a model-based test depends on a particular statistical model. In a linear regression model, a dependent variable is often associated with a set of explanatory variables. After a final model is derived, a residual Moran's I test for spatial autocorrelation can also be performed to detect spatial clustering in the unexplained variation ([14], p. 197). When a regression model does not include any explanatory variables, the residual Moran's I test is identical to the Moran's I test of the dependent variable. Bridging this test for spatial autocorrelation to a loglinear model would likely narrow the apparent knowledge gap between Moran's I for continuous data and other autocorrelation tests for count data ([38], p. 307).

There are some recent advances in incorporating count data in spatial statistics. Griffith [21] introduced a spatial filter specification of the autoregressive logistic model that is able to remove the global clustering effect. The model is likely to provide unbiased parameter estimates for autoregressive logistic regression, but because of its focus on model correction, the method may not be able to detect a local association. Several test statistics, such as $I_{pop}$ and $I^*_{pop}$ [29], the modified I [35], the Empirical Bayes Index (EBI), $C_G$ [34] and the spatial $\chi^2$ test [31], are able to account for heterogeneous population sizes and to detect a local cluster, but none of them can account for ecological or geographic covariates. Lin's [23] spatial logit association model is able to include ecological covariates and spatial associations, but the significance of a logit association term is not a direct measure of spatial clustering. Although Apanasovich and his coauthors [6] used Pearson residuals to test for spatial autocorrelation in their autoregressive model, the test was not formally specified and evaluated for wider applications. In this paper, we demonstrate that Moran's I based on loglinear residuals can be used not only as a global indicator of spatial autocorrelation, but also as a tool for modeling the location, the shape, the elevation and the size of a local spatial cluster.
In the remaining sections of the paper, we first briefly review the permutation test of Moran's I for regression residuals and reformulate it for Poisson data using the deviance residuals of a loglinear model. We then use simulations to evaluate its statistical properties under the null and alternative hypotheses of spatial independence. In Section 4, we apply the deviance residual Moran's I test to the St. Louis crime data. Finally, we provide some concluding remarks.

2 A Model-Based Moran's I

Consider a study area that has m regions indexed by i. Let $X_i$ be the variable of interest in region i. Moran's I [26, 27] is expressed as

$$I = \frac{\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}(X_i-\bar{X})(X_j-\bar{X})}{\left[\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}\right]\left[\sum_{i=1}^{m}(X_i-\bar{X})^2/m\right]}, \qquad (1)$$

where $\bar{X} = \sum_{i=1}^{m} X_i/m$, and $w_{ij}$, with $w_{ii} = 0$, is the (i, j)-th element of a spatial weight matrix W. Commonly, $w_{ij}$ is defined by the adjacency of spatial units: $w_{ij} = 1$ if regions i and j are adjacent (neighbors) and $w_{ij} = 0$ otherwise. A significant positive value of Moran's I indicates positive autocorrelation, that is, the existence of high-value or low-value clustering. A significant negative value indicates negative autocorrelation, or a tendency toward the juxtaposition of high values next to low values.

The null hypothesis of Moran's I is usually based on the assumption that the distributions of the $X_i$ are homogeneous. The p-value of Moran's I is computed from a z-test based on the z-value $Z(I) = [I - E(I)]/\sqrt{V(I)}$, where E(I) and V(I) are the theoretical mean and variance, respectively, under the null hypothesis. Under the null hypothesis of no spatial autocorrelation, Z(I) is assumed to be asymptotically distributed as N(0, 1) as $m \to \infty$. The theoretical values of E(I) and V(I) are usually computed under the random permutation scheme:

$$E(I) = -\frac{1}{m-1} \qquad (2)$$

and

$$V(I) = \frac{m\left[(m^2-3m+3)S_1 - mS_2 + 3S_0^2\right] - b_2\left[(m^2-m)S_1 - 2mS_2 + 6S_0^2\right]}{(m-1)(m-2)(m-3)S_0^2} - E^2(I), \qquad (3)$$

where $S_0 = \sum_{i=1}^{m}\sum_{j=1, j\ne i}^{m}(w_{ij}+w_{ji})/2$, $S_1 = \sum_{i=1}^{m}\sum_{j=1, j\ne i}^{m}(w_{ij}+w_{ji})^2/2$, $S_2 = \sum_{i=1}^{m}(w_{i\cdot}+w_{\cdot i})^2$ with $w_{i\cdot} = \sum_{j=1, j\ne i}^{m} w_{ij}$ and $w_{\cdot i} = \sum_{j=1, j\ne i}^{m} w_{ji}$, and $b_2 = m\sum_{i=1}^{m}(X_i-\bar{X})^4/\left[\sum_{i=1}^{m}(X_i-\bar{X})^2\right]^2$ ([14], p. 21).

When observations are counts, such as crimes, $X_i$ in (1) often takes the form of a case rate, $X_i = n_i/\xi_i$, where $n_i$ is the number of cases and $\xi_i$ is the at-risk population size in region i. However, the homogeneity assumption under this specification may not be valid [11]. Since loglinear models can relax this assumption, we can specify a loglinear model and use its deviance residuals to test for spatial autocorrelation.

Suppose that the random counts $N_i$, with observed values $n_i$, $i = 1, \ldots, m$, follow Poisson distributions and are independent. Suppose also that a set of geographical covariates is observed together with the counts $n_i$. Then a loglinear model can be set out by taking the observed geographical covariates as explanatory variables and the logarithm of the at-risk population size, $\log(\xi_i)$, as an offset term. When the parameters are estimated by maximum likelihood, the estimate $\hat{n}_i$ of the expected count $E(N_i)$ can be derived, and the conventional deviance residual ([1], p. 588) for region i is

$$r_{i,d} = \operatorname{sign}(n_i-\hat{n}_i)\left\{2\left[n_i\log(n_i/\hat{n}_i) - n_i + \hat{n}_i\right]\right\}^{1/2}, \qquad (4)$$

where sign(a) is 1 if a > 0, 0 if a = 0, and -1 if a < 0, and $n_i\log(n_i/\hat{n}_i)$ is taken as 0 when $n_i = 0$.

The concepts and statistical properties of deviance residuals in loglinear models are well established, and we can readily extend them to spatial statistics. Note that the numerator of Moran's I is a martingale if the $X_i$ are independent with mean 0. When $X_i = r_{i,d}$ with $r_{i,d}$ given in (4) and $\hat{n}_i$ replaced by the expected count $E(N_i)$, the variables $X_1, \ldots, X_m$ are independent and $E(X_i)$ is almost 0 if $E(N_i)$ is large (e.g., $E(N_i) > 5$). When $E(N_i)$ is estimated by $\hat{n}_i$ from a loglinear model, then under the model assumption $\hat{n}_i$ is a consistent estimator of $E(N_i)$, and the joint distribution of $(r_{1,d}, \ldots, r_{m,d})$ is approximately normal with mean 0 and a variance-covariance matrix equal to an orthogonal projection matrix, denoted by $P_m$ [1, 30].
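Equations (1) to (3) can be checked numerically. The following is a minimal pure-Python sketch (the helper names `rook_weights` and `morans_i_test` are ours, not from the paper) that computes Moran's I, its randomization mean and variance, and Z(I) from a dense weight matrix. It assumes $w_{ii} = 0$ and is intended for small m only.

```python
import math


def rook_weights(nrow, ncol):
    """Binary rook-contiguity weight matrix for an nrow x ncol lattice (row-major order)."""
    m = nrow * ncol
    w = [[0.0] * m for _ in range(m)]
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    w[r * ncol + c][rr * ncol + cc] = 1.0
    return w


def morans_i_test(x, w):
    """Return (I, E(I), V(I), Z(I)) following eqs. (1)-(3); diagonal weights are ignored."""
    m = len(x)
    xbar = sum(x) / m
    dev = [v - xbar for v in x]
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    sum_sq = sum(d * d for d in dev)
    # Eq. (1): weighted cross-products over (sum of weights) x (variance with divisor m).
    s0 = sum(w[i][j] for i, j in pairs)
    moran = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs) / (s0 * sum_sq / m)
    # Eq. (2): randomization mean.
    e_i = -1.0 / (m - 1)
    # Ingredients of eq. (3).
    s1 = sum((w[i][j] + w[j][i]) ** 2 for i, j in pairs) / 2.0
    row = [sum(w[i][j] for j in range(m) if j != i) for i in range(m)]   # w_i.
    col = [sum(w[j][i] for j in range(m) if j != i) for i in range(m)]   # w_.i
    s2 = sum((row[i] + col[i]) ** 2 for i in range(m))
    b2 = m * sum(d ** 4 for d in dev) / sum_sq ** 2                      # sample kurtosis
    num = (m * ((m * m - 3 * m + 3) * s1 - m * s2 + 3 * s0 ** 2)
           - b2 * ((m * m - m) * s1 - 2 * m * s2 + 6 * s0 ** 2))
    v_i = num / ((m - 1) * (m - 2) * (m - 3) * s0 ** 2) - e_i ** 2
    return moran, e_i, v_i, (moran - e_i) / math.sqrt(v_i)


w = rook_weights(4, 4)
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
print(morans_i_test(checkerboard, w))
```

On a 4 x 4 checkerboard with rook weights, every neighboring pair has deviations of opposite sign, so I = -1 while E(I) = -1/15, which gives a quick sanity check of the implementation.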
For a fixed number of covariates and large m, the orthogonal projection matrix $P_m$ is almost equivalent to the identity matrix, since the dimension of the kernel of the projection matrix equals the number of covariates. As $m \to \infty$, $r_{1,d}, \ldots, r_{m,d}$ are approximately independent, and the asymptotic normality of Z(I) can be proven by a martingale approximation of the numerator of Moran's I together with the Martingale Central Limit Theorem ([9], p. 475). In addition, one must also consider the convergence of the permutation mean and variance of Moran's I in this scenario [32]. This asymptotic formulation for deviance residuals is analogous to that for regression residuals ([14], p. 198).

Deviance residuals are very flexible in loglinear models: they reflect categorical structure (in this case, spatial structure) while controlling for potentially heterogeneous population sizes ([1], p. 495). We can therefore test deviance residuals for spatial autocorrelation by specifying a loglinear model. Since a loglinear model, such as a log-rate model, can incorporate geographic (or ecological) covariates, we can test its residuals for spatial autocorrelation in the presence or absence of ecological covariates. A natural approach is to apply the random permutation test, so that Moran's I based on the deviance residuals of a loglinear model is analogous to Moran's I based on the residuals of a regression model [6]. Given that deviance residuals are approximately multivariate normal, we can test the residuals for spatial autocorrelation by replacing $X_i$ in (1) with $r_{i,d}$ from (4); we label the resulting statistic $I_{DR}$. The mean and variance of $I_{DR}$ follow from the random permutation scheme of the conventional Moran's I, as given by equations (2) and (3), respectively.

To implement $I_{DR}$, we can simply estimate the expected counts under the null model with an intercept only, which gives $\hat{n}_i = \xi_i(n/\xi)$ with $n = \sum_{i=1}^{m} n_i$ and $\xi = \sum_{i=1}^{m}\xi_i$. In this case, the i-th deviance residual $r_{i,d}$ is obtained by inserting $\hat{n}_i$ into (4). If $I_{DR}$ is positive and significant, it suggests spatial clustering, which can be contributed either by a first-order clustering trend or by a few local clusters. We can detect the clustering contributions by applying spatial association models [23, 24]. First, a number of spatial association terms are added to the null model.
Then the parameter estimates and residuals are derived in the model fitting process, and the existence of spatial autocorrelation is tested again via $I_{DR}$ on the model residuals. If $I_{DR}$ is significant in the null model but not in the model with local association terms, the significance found in the null model is likely to be accounted for by the local association terms. If a few spatial association terms cannot remove the significance of $I_{DR}$ found in the null model, this suggests the existence of a first-order global clustering tendency.
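The null-model version of $I_{DR}$ can be sketched in a few lines of pure Python. To stay self-contained, this sketch swaps the analytic z-test based on (2) and (3) for a Monte Carlo permutation p-value, a different but standard way of carrying out the random permutation test. The function names, the one-sided alternative (positive autocorrelation) and the number of permutations are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def rook_neighbours(nrow, ncol):
    """Rook-contiguity neighbour lists for an nrow x ncol lattice (row-major order)."""
    return [[(r + dr) * ncol + (c + dc)
             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            for r in range(nrow) for c in range(ncol)]


def deviance_residual(n, mu):
    """Poisson deviance residual of eq. (4), with n*log(n/mu) taken as 0 when n == 0."""
    term = (n * math.log(n / mu) if n > 0 else 0.0) - n + mu
    sign = (n > mu) - (n < mu)
    return sign * math.sqrt(max(2.0 * term, 0.0))


def morans_i(x, nbrs):
    """Moran's I of eq. (1) for binary weights stored as neighbour lists."""
    m = len(x)
    xbar = sum(x) / m
    d = [v - xbar for v in x]
    s0 = sum(len(nb) for nb in nbrs)
    cross = sum(d[i] * d[j] for i in range(m) for j in nbrs[i])
    return cross / (s0 * sum(v * v for v in d) / m)


def i_dr_null(counts, pops, nbrs, n_perm=999, seed=1):
    """I_DR under the intercept-only model, with a one-sided Monte Carlo permutation p-value."""
    rate = sum(counts) / sum(pops)                 # n / xi
    fitted = [x * rate for x in pops]             # hat n_i = xi_i (n / xi)
    resid = [deviance_residual(n, mu) for n, mu in zip(counts, fitted)]
    observed = morans_i(resid, nbrs)
    rng = random.Random(seed)
    shuffled = resid[:]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                      # random relabelling of locations
        if morans_i(shuffled, nbrs) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


nbrs = rook_neighbours(4, 4)
counts = [30, 30, 10, 10, 30, 30, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(i_dr_null(counts, [1000] * 16, nbrs, n_perm=199))
```

The toy counts place a block of high counts in one corner of a 4 x 4 lattice with equal populations, so the deviance residuals cluster and $I_{DR}$ is positive.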

Under the assumption that there is a local cluster in the study area, a loglinear model with a spatial association term is

$$\log(\lambda_i) = \log(\xi_i) + \beta_0 + \beta_1 d_{i(j)}, \qquad (5)$$

where $\lambda_i = E(N_i)$, $\beta_0$ is the grand mean, and $\beta_1$ is the unknown parameter for the spatial association term $d_{i(j)}$, which is 1 if unit i is believed to be in a cluster centered at unit j and 0 otherwise. We test whether $\beta_1$ differs significantly from 0; its significance is determined by the p-value of the likelihood ratio test against the null model without the spatial association term. If $\beta_1$ is significantly positive, the local cluster is a hot spot; if it is significantly negative, the local cluster is a cool spot.

Besides this likelihood ratio test, we can gauge the contribution of the $\beta_1$ term to the local cluster by comparing $I_{DR}$ with and without the term in model (5). If $I_{DR}$ is not significant when the spatial association term is included, then the clustering effect in the null model is sufficiently removed by the association term. If the inclusion of the spatial association term in model (5) does not change the significance of $I_{DR}$ from the null model, then the clustering effect remains. To further improve the model fit and to account for the remaining clustering effect, one can either refine the spatial association term already in the model or include another spatial association term. Finally, if a local cluster is accompanied by a first-order global clustering trend, the likelihood ratio test for the local association term may still be significant, but including the term is unlikely to turn a significant $I_{DR}$ into a non-significant one.
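Model (5) has only two parameters, so it can be fitted by Newton-Raphson without a GLM library. The sketch below is an illustration under stated assumptions (the function names are hypothetical): it maximizes the Poisson likelihood with offset $\log(\xi_i)$, then computes the likelihood ratio statistic against the intercept-only model and its $\chi^2_1$ p-value via $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The vector `d` may be binary, as in (5), or hold graded weights.

```python
import math


def poisson_deviance(n, mu):
    """Residual deviance 2 * sum[n log(n/mu) - (n - mu)], with 0 log 0 = 0."""
    return 2.0 * sum((ni * math.log(ni / mi) if ni > 0 else 0.0) - ni + mi
                     for ni, mi in zip(n, mu))


def fit_model5(n, xi, d, max_iter=50, tol=1e-10):
    """Newton-Raphson for log(lambda_i) = log(xi_i) + b0 + b1 * d_i (Poisson MLE)."""
    b0, b1 = math.log(sum(n) / sum(xi)), 0.0       # start at the null-model estimate
    for _ in range(max_iter):
        mu = [x * math.exp(b0 + b1 * di) for x, di in zip(xi, d)]
        # Score vector and information matrix for the canonical log link.
        g0 = sum(ni - mi for ni, mi in zip(n, mu))
        g1 = sum((ni - mi) * di for ni, mi, di in zip(n, mu, d))
        h00 = sum(mu)
        h01 = sum(mi * di for mi, di in zip(mu, d))
        h11 = sum(mi * di * di for mi, di in zip(mu, d))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    mu = [x * math.exp(b0 + b1 * di) for x, di in zip(xi, d)]
    return b0, b1, mu


def association_lrt(n, xi, d):
    """Likelihood ratio test of beta_1 = 0 in model (5), one degree of freedom."""
    rate = sum(n) / sum(xi)
    null_dev = poisson_deviance(n, [x * rate for x in xi])
    b0, b1, mu = fit_model5(n, xi, d)
    stat = null_dev - poisson_deviance(n, mu)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))   # P(chi2_1 > stat)
    return b1, stat, p_value


counts = [30, 30, 30, 30, 10, 10, 10, 10]
pops = [1000] * 8
d = [1, 1, 1, 1, 0, 0, 0, 0]        # hypothetical cluster membership
print(association_lrt(counts, pops, d))
```

Because the canonical log link makes the Poisson log-likelihood concave in $(\beta_0, \beta_1)$, Newton's method started from the null estimate usually converges in a few steps; a production fit would rather use a GLM routine with step-halving. In this toy example the rate inside the cluster is three times the rate outside, so $\hat{\beta}_1 = \log 3 > 0$, indicating a hot spot.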

3 Simulation Assessment of $I_{DR}$

We designed Monte Carlo simulation experiments to assess the effectiveness of the model-based test $I_{DR}$ under population heterogeneity. Type I error rates were evaluated under the null hypothesis of homogeneous rates with heterogeneous population sizes, while spatial cluster modeling was evaluated in the presence and absence of a first-order global clustering trend. All simulation experiments were based on a lattice with $w_{ij}$ defined by the rook rule of spatial adjacency. We set the significance level at α = 0.05 to assess the rejection rates of $I_{DR}$ in each set of simulations. In the presence of a local cluster, a residual plot was also furnished to facilitate the evaluation.

In addition to $I_{DR}$, we included the original Moran's I computed on rates, denoted by $I_r$, which is defined by letting $X_i = r_i = n_i/\xi_i$ in (1). Previous studies have demonstrated that $I_r$ is sensitive to heterogeneous populations, so $I_r$ serves as a baseline for comparison. We also included the Empirical Bayes Index (EBI), denoted by $I_{EBI}$, a population-adjusted Moran's I proposed by Assunção and Reis [5]. $I_{EBI}$ has been found to be effective in adjusting for population sizes in the presence of population heterogeneity, and it has been included in GeoDa, a popular freeware package for exploratory spatial data analysis [4]. However, $I_{EBI}$ is not a model-based test, and it cannot include ecological covariates. This can be seen from its definition, in which $z_i = r_{i,ebi} = (p_i - b)/\sqrt{\nu_i}$, where $p_i = n_i/\xi_i$, $b = n/\xi$, $\nu_i = a + b/\xi_i$, $a = s^2 - b/(\xi/m)$ and $s^2 = \sum_{i=1}^{m}\xi_i(p_i - b)^2/\xi$. Hence

$$I_{EBI} = \frac{\sum_{i=1}^{m}\sum_{j\ne i} w_{ij}(z_i - \bar{z})(z_j - \bar{z})}{\left[\sum_{i=1}^{m}\sum_{j\ne i} w_{ij}\right]\left[\sum_{l=1}^{m}(z_l - \bar{z})^2/m\right]}, \qquad \bar{z} = \frac{1}{m}\sum_{l=1}^{m}\frac{p_l - b}{\sqrt{\nu_l}}. \qquad (6)$$

Assess $I_{DR}$ for population heterogeneity.
Following the simulation studies of Walter [36] and Assunção and Reis [5], we compared the type I error probabilities of $I_{DR}$, $I_{EBI}$ and $I_r$ based on Monte Carlo simulations. Walter [36] reported that densely populated areas with a pocket of sparsely populated area could cause an excessive type I error probability for $I_r$. To represent this pattern, we generated a relatively low population of $10^6(1-\eta)^2$ for lattice points within a 2-unit circle centered at (3, 3), and $10^6$ for the others. The value η indexes population heterogeneity and ranged from 0 to 0.8 in increments of 0.04. When η = 0, all populations were homogeneous; as η approaches 1, the populations become increasingly heterogeneous. Based on these population patterns, we generated independent Poisson random variables with mean $10^{-4}$ times the population size for each lattice point. Since identical rates were expected across all lattice points, there should be no spatial clustering, and the rejection rate therefore reflects the type I error probability of the spatial autocorrelation test. For each η value, we calculated type I error probabilities based on 10,000 simulations and the resulting z-values.

The results (Figure 1) show that both $I_{DR}$ and $I_{EBI}$ were able to account for population heterogeneity, with almost identical type I error probabilities around 0.05 for all η values. The type I error probability of $I_r$, however, was acceptable only when η was small, that is, with little variation in population sizes. As η increased, the type I error rate of $I_r$ also increased. When population sizes varied substantially (η = 0.8), the rejection rate was as high as 25%, a result consistent with Walter's simulations.

Assess $I_{DR}$ for local cluster detection.

Based on the previous simulation result, we devised a fixed heterogeneous population pattern: the population was $10^5$ if a point on the lattice was within the circle and $10^6$ otherwise. We generated independent Poisson random variables with the mean equal to times the population size for each lattice point. We then inserted a 2-unit circle for a cluster effect centered at (7, 7), and set the mean there equal to (1 + δ) times the population. The value δ represents the strength and direction of the cluster effect, and it ranged from -0.8 to 0.8 in steps of 0.04.
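The population-heterogeneity experiment for type I error can be sketched in pure Python. Several details are assumptions rather than statements from the paper: a 10 x 10 lattice with coordinates 1 to 10 (the lattice size is not given in this text), "within the circle" read as distance at most 2, a per-capita rate of 1e-4 (so cells with population $10^6$ have mean 100), a two-sided z-test at α = 0.05, Knuth's Poisson sampler, and the fallback $\nu_i = b/\xi_i$ when $\nu_i \le 0$ in the EBI standardization. The z-value uses (1) to (3) specialized to symmetric binary weights, for which $S_1 = 2S_0$ and $S_2 = 4\sum_i \deg_i^2$.

```python
import math
import random


def rook_neighbours(nrow, ncol):
    """Rook-contiguity neighbour lists for an nrow x ncol lattice (row-major order)."""
    return [[(r + dr) * ncol + (c + dc)
             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            for r in range(nrow) for c in range(ncol)]


def moran_z(x, nbrs):
    """Z(I) from eqs. (1)-(3), specialised to symmetric binary weights."""
    m = len(x)
    xbar = sum(x) / m
    d = [v - xbar for v in x]
    deg = [len(nb) for nb in nbrs]
    s0 = float(sum(deg))
    s1 = 2.0 * s0                           # (w_ij + w_ji)^2 = 4 for each ordered neighbour pair
    s2 = 4.0 * sum(k * k for k in deg)      # (w_i. + w_.i)^2 = (2 deg_i)^2
    ss = sum(v * v for v in d)
    moran = m * sum(d[i] * d[j] for i in range(m) for j in nbrs[i]) / (s0 * ss)
    b2 = m * sum(v ** 4 for v in d) / ss ** 2
    e_i = -1.0 / (m - 1)
    num = (m * ((m * m - 3 * m + 3) * s1 - m * s2 + 3 * s0 ** 2)
           - b2 * ((m * m - m) * s1 - 2 * m * s2 + 6 * s0 ** 2))
    v_i = num / ((m - 1) * (m - 2) * (m - 3) * s0 ** 2) - e_i ** 2
    return (moran - e_i) / math.sqrt(v_i)


def poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for the moderate means used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def deviance_residual(n, mu):
    """Eq. (4), with n*log(n/mu) taken as 0 when n == 0."""
    term = (n * math.log(n / mu) if n > 0 else 0.0) - n + mu
    return ((n > mu) - (n < mu)) * math.sqrt(max(2.0 * term, 0.0))


def ebi_scores(counts, pops):
    """EBI standardized rates z_i = (p_i - b) / sqrt(nu_i), as defined before eq. (6)."""
    m, xi = len(counts), sum(pops)
    b = sum(counts) / xi
    p = [c / x for c, x in zip(counts, pops)]
    s2 = sum(x * (pi - b) ** 2 for x, pi in zip(pops, p)) / xi
    a = s2 - b / (xi / m)
    # Assumption: fall back to nu_i = b / xi_i whenever a + b / xi_i is not positive.
    nus = [a + b / x if a + b / x > 0 else b / x for x in pops]
    return [(pi - b) / math.sqrt(nu) for pi, nu in zip(p, nus)]


def population(r, c, eta):
    """Low population inside the 2-unit circle centred at (3, 3), 10^6 elsewhere."""
    inside = (r - 3) ** 2 + (c - 3) ** 2 <= 4
    return 1e6 * (1 - eta) ** 2 if inside else 1e6


def rejection_rates(eta, reps=200, size=10, seed=2007):
    """Empirical type I error rates of I_r, I_EBI and I_DR (two-sided, alpha = 0.05)."""
    rng = random.Random(seed)
    nbrs = rook_neighbours(size, size)
    pops = [population(r, c, eta) for r in range(1, size + 1) for c in range(1, size + 1)]
    rejects = {"I_r": 0, "I_EBI": 0, "I_DR": 0}
    for _ in range(reps):
        counts = [poisson(1e-4 * x, rng) for x in pops]   # identical rates, so H0 holds
        rate = sum(counts) / sum(pops)
        values = {
            "I_r": [c / x for c, x in zip(counts, pops)],
            "I_EBI": ebi_scores(counts, pops),
            "I_DR": [deviance_residual(c, x * rate) for c, x in zip(counts, pops)],
        }
        for name, x in values.items():
            if abs(moran_z(x, nbrs)) > 1.96:
                rejects[name] += 1
    return {name: k / reps for name, k in rejects.items()}


print(rejection_rates(0.8, reps=100))
```

Calling `rejection_rates(eta)` for η from 0 to 0.8 traces out curves comparable in spirit to Figure 1; with far fewer replications than the paper's 10,000, the estimates are noisy.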
If δ < 0, the circle represented a low-value cluster; if δ > 0, it represented a high-value cluster. Again based on 10,000 simulations for each δ value under population heterogeneity, the rejection rates of $I_{DR}$ with and without the spatial association term from model (5) are shown in Figure 2. The rejection rate without the spatial association term indicates the statistical power of $I_{DR}$, while the rate with the spatial association term indicates the effectiveness of the model-based test for a high-value or low-value cluster. If the model-based test is effective, the test statistic should no longer be significant once the spatial association term covering the exact circle is included.

The results show that $I_{DR}$ under the null model had reasonable power (Figure 2). When δ values were around 0, the rejection rate was around When the absolute δ values were greater than 0.25, the rejection rates were about 15%. When δ reached -0.8 (a cool spot), the rejection rate was almost 100%; when δ reached 0.8, the rejection rate was about 85%. Both results suggest that $I_{DR}$ under the null model is likely to be significant when there is a strong local cluster. However, when the cluster tendency was accounted for by the spatial association term, the rejection rates were consistently around 0.05, suggesting that $I_{DR}$ was unlikely to be significant once a spatial association term absorbed the cluster effect. Since the relative risks within the cluster were all similarly higher or lower than those in the rest of the area in our simulations, once the cluster effect was removed by the spatial association term, the study area became spatially independent, a result consistent with previous simulations of the spatial logit association model [23].

The effect of the spatial association term can be illustrated by the residual and QQ-normal plots from a single simulation. The upper panel of Figure 3 displays the results under the null model. The $I_{DR}$ test had a p-value of primarily due to a number of extremely high deviance residuals from the clustered area. Likewise, the QQ-plot shows that a number of high values are concentrated in the upper tail, suggesting the existence of extreme values. The lower panel of Figure 3 shows that once the spatial association term was added to the model, the extremely large residuals of the null model disappeared, and the p-value of $I_{DR}$ changed to 0.12 with evenly distributed residuals.
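A single run of the local-cluster experiment can be sketched similarly. Because the base rate is missing from the source text, `base_rate = 1e-4` is a hypothetical choice, and the 10 x 10 lattice with coordinates 1 to 10 is likewise assumed. For a binary $d_{i(j)}$, the maximum likelihood fit of model (5) has a closed form: each group (inside and outside the circle at (7, 7)) receives its own pooled rate, which the sketch uses instead of an iterative fit.

```python
import math
import random


def rook_neighbours(nrow, ncol):
    """Rook-contiguity neighbour lists for an nrow x ncol lattice (row-major order)."""
    return [[(r + dr) * ncol + (c + dc)
             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            for r in range(nrow) for c in range(ncol)]


def moran_z(x, nbrs):
    """Z(I) from eqs. (1)-(3), specialised to symmetric binary weights."""
    m = len(x)
    xbar = sum(x) / m
    d = [v - xbar for v in x]
    deg = [len(nb) for nb in nbrs]
    s0 = float(sum(deg))
    s1, s2 = 2.0 * s0, 4.0 * sum(k * k for k in deg)
    ss = sum(v * v for v in d)
    moran = m * sum(d[i] * d[j] for i in range(m) for j in nbrs[i]) / (s0 * ss)
    b2 = m * sum(v ** 4 for v in d) / ss ** 2
    e_i = -1.0 / (m - 1)
    num = (m * ((m * m - 3 * m + 3) * s1 - m * s2 + 3 * s0 ** 2)
           - b2 * ((m * m - m) * s1 - 2 * m * s2 + 6 * s0 ** 2))
    v_i = num / ((m - 1) * (m - 2) * (m - 3) * s0 ** 2) - e_i ** 2
    return (moran - e_i) / math.sqrt(v_i)


def poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for the moderate means used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def deviance_residual(n, mu):
    """Eq. (4), with n*log(n/mu) taken as 0 when n == 0."""
    term = (n * math.log(n / mu) if n > 0 else 0.0) - n + mu
    return ((n > mu) - (n < mu)) * math.sqrt(max(2.0 * term, 0.0))


def cluster_run(delta, size=10, base_rate=1e-4, seed=7):
    """One simulated lattice: z-values of I_DR without and with the association term."""
    rng = random.Random(seed)
    nbrs = rook_neighbours(size, size)
    cells = [(r, c) for r in range(1, size + 1) for c in range(1, size + 1)]
    pops = [1e5 if (r - 3) ** 2 + (c - 3) ** 2 <= 4 else 1e6 for r, c in cells]
    d = [1 if (r - 7) ** 2 + (c - 7) ** 2 <= 4 else 0 for r, c in cells]   # cluster circle
    counts = [poisson(base_rate * x * (1 + delta * di), rng) for x, di in zip(pops, d)]
    # Null model: hat n_i = xi_i * n / xi.
    rate = sum(counts) / sum(pops)
    z_null = moran_z([deviance_residual(n, x * rate) for n, x in zip(counts, pops)], nbrs)
    # Model (5) with binary d: the MLE gives each group its own pooled rate.
    group_rate = {}
    for g in (0, 1):
        n_g = sum(n for n, di in zip(counts, d) if di == g)
        x_g = sum(x for x, di in zip(pops, d) if di == g)
        group_rate[g] = n_g / x_g
    fitted = [x * group_rate[di] for x, di in zip(pops, d)]
    z_assoc = moran_z([deviance_residual(n, mu) for n, mu in zip(counts, fitted)], nbrs)
    beta1 = math.log(group_rate[1] / group_rate[0])
    return z_null, z_assoc, beta1


print(cluster_run(0.8))
```

With δ = 0.8, the null-model z-value is typically large, while the z-value after adding the association term behaves like a draw under spatial independence, mirroring the two curves in Figure 2.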
This result is also corroborated by the QQ-plot, with all values lying along a straight line.

Assess $I_{DR}$ for a local cluster in the presence of first-order global clustering.

It is known that a local cluster and a first-order clustering trend can operate simultaneously. In the presence of global clustering, it is often necessary to de-trend first before fitting a spatial regression model [2, 12]. We evaluate the performance of $I_{DR}$ in this situation by generating the global spatial structure from a log-normal distribution and inserting the local cluster from the previous simulation with δ = 2.0. If a local association term leaves the first-order clustering tendency in place, $I_{DR}$ remains significant, which indicates the existence of global clustering.

In the simulation process, we first generated 100 independent and identically distributed (iid) N(0, 1) random variables, denoted by $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_{100})$. Next, we calculated a vector u by letting $u = (\mathbf{I} - \rho W)^{-1}\epsilon$, where $\mathbf{I}$ is the identity matrix and ρ increases from 0 to 0.2 in increments of 0.01, so that u satisfies $u = \rho W u + \epsilon$, with ρ the coefficient of global spatial association [2, 5]. Then we let $\lambda_i = (1 + 2d_i)e^{u_i}$, where $\lambda = (\lambda_1, \ldots, \lambda_{100})$ is the vector of Poisson intensities for generating counts and $d_i$ is 1 if lattice point i lies in the cluster circle and 0 otherwise. We generated conditionally independent Poisson random variables $N_i$ with parameter $\lambda_i$ times the i-th population size. When ρ = 0, there was only a local cluster in the simulated pattern; when ρ > 0, there were both local and global clustering tendencies.

We assessed the effectiveness of $I_{DR}$ by comparing its rejection rates with and without the spatial association term. Based on 10,000 simulations for each ρ value, the results (Figure 4) show that the spatial association term was unable to reduce the clustering effect except when the global clustering trend was very weak. For instance, when ρ = 0, the rejection rate of $I_{DR}$ in the null model was about 28%, suggesting spatial clustering. When the spatial association term was included in this case, the local clustering tendency was reduced, as in the previous simulation. As the global clustering trend ρ increased, the rejection rates of $I_{DR}$ also increased, and $I_{DR}$ was likely to be significant both with and without the association term even for a modest increase in ρ.
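The data-generating step for the global-plus-local experiment can be sketched as follows. Rather than inverting $\mathbf{I} - \rho W$ explicitly, the sketch solves $u = \rho W u + \epsilon$ by fixed-point iteration (a truncated Neumann series); this converges for rook weights whenever ρ < 1/4, because each row of W has at most four ones, which covers the range 0 to 0.2 used here. The 10 x 10 lattice, the cluster circle at (7, 7) and the seed are assumptions carried over from the earlier design.

```python
import math
import random


def rook_neighbours(nrow, ncol):
    """Rook-contiguity neighbour lists for an nrow x ncol lattice (row-major order)."""
    return [[(r + dr) * ncol + (c + dc)
             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            for r in range(nrow) for c in range(ncol)]


def global_local_intensity(rho, size=10, delta=2.0, seed=11, n_iter=500):
    """Poisson intensities lambda_i = (1 + delta d_i) exp(u_i) with u = rho W u + eps."""
    rng = random.Random(seed)
    nbrs = rook_neighbours(size, size)
    m = size * size
    eps = [rng.gauss(0.0, 1.0) for _ in range(m)]          # iid N(0, 1)
    # Fixed-point iteration u <- eps + rho W u; it contracts because 4 * rho < 1.
    u = eps[:]
    for _ in range(n_iter):
        u = [eps[i] + rho * sum(u[j] for j in nbrs[i]) for i in range(m)]
    cells = [(r, c) for r in range(1, size + 1) for c in range(1, size + 1)]
    d = [1 if (r - 7) ** 2 + (c - 7) ** 2 <= 4 else 0 for r, c in cells]
    lam = [(1 + delta * di) * math.exp(ui) for di, ui in zip(d, u)]
    return u, eps, lam, nbrs


u, eps, lam, nbrs = global_local_intensity(0.15)
residual = max(abs(u[i] - 0.15 * sum(u[j] for j in nbrs[i]) - eps[i]) for i in range(len(u)))
print(residual, min(lam), max(lam))
```

Counts would then be drawn as Poisson variables with mean $\lambda_i$ times the i-th population size; when ρ = 0 the iteration returns u = ε, so only the local cluster remains.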
The inclusion of the spatial association term had little effect on removing the local clustering effect in the presence of the global clustering tendency. This further suggests that even when an association term is significant in terms of the likelihood ratio test, the local effect might not be trustworthy, because the global effect overshadows the local effect. Figure 5 displays the residual and QQ-plots of the deviance residuals with and without the spatial association term from a single simulation run (ρ = 0.15). Evidently, there were only a few large deviance residuals in absolute value, and they were not clumped together, a pattern in sharp contrast with the one in Figure 3. In addition, the p-values of $I_{DR}$ with and without the spatial association term were very close: with the spatial association term and without. These results suggest that the inclusion of a local association term is unlikely to reduce the significance of $I_{DR}$ because of the overall global clustering effect.

In summary, $I_{DR}$ is effective in reducing the inflated type I error probabilities of the traditional Moran's I caused by heterogeneous population sizes, and its performance is comparable to that of $I_{EBI}$. An advantage of $I_{DR}$ over $I_{EBI}$ is its ability to include ecological or other spatial covariates. When a significant $I_{DR}$ is contributed mainly by a local cluster, we can devise a spatial association term to remove the cluster effect, so that the spatial autocorrelation observed in the null model is no longer significant. The exact form of the association term can be determined either by a stepwise regression method [23] or by an exploratory method, such as deviance residual plots. Since a local spatial association term can remove the effect of a local cluster but not that of a first-order global trend, comparing $I_{DR}$ with and without the term can indicate whether a first-order global clustering trend exists.

4 St. Louis Homicide Data Analysis

In this section, we apply $I_{DR}$ to analyzing homicides in the St. Louis region. The data set was originally analyzed by Messner et al. [25], and it is also included as part of the exercises in GeoDa [4], a simple spatial analysis package developed by Anselin and his associates.
In the original paper, homicide rates for two periods were analyzed at the county level, and a number of local clusters, including one centered at St. Louis City, were identified by LISA. Here, we use the model-based $I_{DR}$ to detect spatial clustering based on homicide incidents and the at-risk population. Analogous to LISA, we also included a local version of the deviance residual Moran's I, or deviance residual LISA, denoted by $I_{DR,i}$. The results of $I_{DR,i}$ were compared with those of the local versions of $I_r$ and $I_{EBI}$, denoted by $I_{r,i}$ and $I_{EBI,i}$ respectively, where $I_{DR,i}$, $I_{r,i}$ and $I_{EBI,i}$ are defined according to the formula given by Anselin [3] as

$$I_i = \frac{\sum_{j=1, j\ne i}^{m} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_{l=1}^{m}(x_l-\bar{x})^2/m} \qquad (7)$$

by letting $x_i = r_{i,d}$, $x_i = r_i$ and $x_i = r_{i,ebi}$, respectively. $I_{DR,i}$, $I_{r,i}$ and $I_{EBI,i}$ all provide additional tools for exploratory spatial analysis of count data, such as cluster maps. However, only $I_{DR,i}$ can provide an additional clustering analysis when a covariate is accounted for.

In the preliminary analysis, we found that $I_{DR}$ for the earlier period was not significant, whereas $I_{DR}$ for the later period was significant. We therefore focused on the latter period. Between 1988 and 1993, there were 2,650 homicides, and the average homicide rate was about 10 per 100,000. County populations in the study area vary substantially: St. Louis County was the largest, with more than one million residents, and five other counties (St. Louis City, St. Clair, Boone, Sangamon and Macon) had at least 50,000 residents.

To detect spatial clustering of homicides, we first fitted the null model. The results from $I_{DR}$ indicated a significant clustering tendency, with a z-value of 2.72. When we plotted the deviance residuals by five equal intervals (Figure 6), St. Louis City was in the first interval, St. Clair County was in the third interval (18.06), and no county fell in the second interval. This indicated that St. Louis City was the only county at the core of a high-value cluster surrounded by St. Clair, St. Louis and Madison counties.
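Equation (7) is straightforward to compute from neighbor lists. The minimal sketch below (binary weights assumed; the function names and the four-unit example are illustrative) returns the local values $I_i$. Summed over i, they equal $S_0$ times the global Moran's I from (1), which gives a convenient check; a Bonferroni threshold for 78 counties at the 0.05 level is simply 0.05/78.

```python
def local_morans(x, nbrs):
    """Local Moran's I_i of eq. (7) for binary weights given as neighbour lists."""
    m = len(x)
    xbar = sum(x) / m
    d = [v - xbar for v in x]
    m2 = sum(v * v for v in d) / m           # denominator: sum_l (x_l - xbar)^2 / m
    return [d[i] * sum(d[j] for j in nbrs[i]) / m2 for i in range(m)]


def global_morans(x, nbrs):
    """Global Moran's I of eq. (1) for the same binary weights."""
    m = len(x)
    xbar = sum(x) / m
    d = [v - xbar for v in x]
    s0 = sum(len(nb) for nb in nbrs)
    cross = sum(d[i] * d[j] for i in range(m) for j in nbrs[i])
    return cross / (s0 * sum(v * v for v in d) / m)


# Four units on a line (1-2-3-4) with hypothetical values.
nbrs = [[1], [0, 2], [1, 3], [2]]
x = [1.0, 2.0, 3.0, 4.0]
local = local_morans(x, nbrs)
s0 = sum(len(nb) for nb in nbrs)
print(local, sum(local), s0 * global_morans(x, nbrs))
bonferroni_level = 0.05 / 78   # per-county significance level for 78 counties
```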
In addition, we plotted the deviance residual LISA using GeoDa and computed the standardized values of $I_{DR,i}$, $I_{r,i}$ and $I_{EBI,i}$; for $I_{DR,i}$, they were 17.5 when i indicated St. Louis County and 12.27 when i indicated St. Clair County. The values for the remaining counties were much lower than those of these two counties. The LISA plot also indicated that St. Louis and Madison counties were next to high-valued counties, presumably the two very high-valued counties St. Louis City and St. Clair.

Based on the above information, we decided not to adopt a spatial association term that assigns equal contributions to all units in the cluster. Instead, we refined the shape of the cluster by examining each individual residual within the adjacent counties, and devised a spatially varied association term to capture the magnitude of residual variation within the cluster. Based on the principle of the uniform association model [1], a large residual value should correspond to a large $d_{i(j)}$ value, and a relatively small residual value should correspond to a small $d_{i(j)}$ value. When a neighboring county has a negligible absolute residual value, it can be dropped from the spatial specification. In the five-equal-interval classification, St. Louis City was in the first interval, St. Clair in the third, and Madison and St. Louis in the fourth. Accordingly, we assigned 4 to St. Louis City, 2 to St. Clair, and 1 to Madison and St. Louis counties (Model I); this assignment could be made automatically in our search algorithm because standard intervals were used. The results show that the model with the spatial association term was highly significant, reducing the deviance by about 2,610, from 2,944 in the null model to 334 in the alternative model. Meanwhile, the coefficient of the spatial association term indicates a high-value cluster, and its inclusion changed $I_{DR}$ from significant in the null model to non-significant in the alternative model. This suggests that the spatial association term can remove the effect of the local cluster and that there was no global clustering trend.
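The five-equal-interval classification and the resulting graded weights can be sketched as below. The rule "weight = 5 minus interval number, restricted to the candidate cluster" is our hypothetical reading: it reproduces the reported assignment (first interval to 4, third to 2, fourth to 1), but the search algorithm itself is not spelled out in the text, and the residual values in the example are made up.

```python
import math


def equal_interval_classes(values, k=5):
    """Class 1 holds the largest values; the range is split into k equal-width intervals."""
    hi, lo = max(values), min(values)
    if hi == lo:
        return [1] * len(values)
    width = (hi - lo) / k
    return [min(k, math.floor((hi - v) / width) + 1) for v in values]


def graded_weights(residuals, cluster_units, k=5):
    """Hypothetical rule: d_i = k - class_i inside the candidate cluster, 0 elsewhere."""
    classes = equal_interval_classes(residuals, k)
    return [k - cls if i in cluster_units else 0 for i, cls in enumerate(classes)]


# Hypothetical residuals; units 0-3 form the candidate cluster, unit 4 lies outside it.
res = [50.0, 33.0, 25.0, 12.0, 0.0]
print(equal_interval_classes(res), graded_weights(res, {0, 1, 2, 3}))
```

The resulting weights can be used directly as graded $d_{i(j)}$ values in model (5), in place of a binary indicator.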
In addition, inspecting the deviance residuals individually for the 4 counties, we could see 40, 20, and 5. Based on this information, we further experimented with assigning 4 to St. Louis City, 2.5 to St. Clair, and 1 to Madison and St. Louis counties (Model II); this assignment further reduced the deviance to 177 (see Table 1 for the corresponding $I_{DR}$ result). In both cases, the values of $I_{DR,i}$ for St. Louis and St. Clair counties decreased significantly to a very low level, and $I_{DR,i}$ was almost nowhere significant throughout the region at the 0.05 probability level after adjusting for multiple testing across the 78 units with Bonferroni's method (see [28], p. 153), i.e., at a per-unit level of 0.05/78 ≈ 0.00064. It is worth noting that odds ratios can be used to describe the shape of a cluster. For instance, in Model I the odds ratio between St. Louis county and other counties, $e^{\hat\beta_1 \times 1} = e^{\hat\beta_1}$, indicates that St. Louis county was $e^{\hat\beta_1}$ times as likely as other counties to have a homicide. Similarly, St. Louis City would be $e^{4\hat\beta_1}$ times and St. Clair $e^{2\hat\beta_1}$ times as likely. Alternatively, we can use geographical covariates to explain the detected clustering tendency. For instance, it is known that St. Louis City had a high concentration of Blacks. We obtained the percentage of Blacks from the 1990 census for all 78 counties and used it as an ecological covariate in place of the spatial association term. The results (Table 1, last row) show that the percentage of Blacks was positively associated with the likelihood of homicides in the study area. The ecological model performed slightly better than the spatial association model in terms of the likelihood ratio statistic, i.e., it had a smaller deviance with the same number of degrees of freedom. In addition, when the ecological variable was included, the $I_{DR}$ test was not significant, suggesting that no spatial autocorrelation remained. This result implies that, in this case study, the St. Louis City cluster detected by the association term can be explained by the percentage of Blacks. Both an ecological variable and a spatial association term can yield useful information to describe and quantify a detected cluster.

5 Concluding Remarks

In this paper, we have specified and evaluated $I_{DR}$ as a loglinear model-based Moran's I test for Poisson count data that resembles the Moran's I residual test for Gaussian data.
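$I_{DR}$ applies Moran's I to the deviance residuals of a fitted loglinear model, and it can be assessed with a permutation test. The following Python sketch shows the generic computation under stated assumptions: a binary contiguity weights matrix `W` with zero diagonal, residuals centered at their mean, and a one-sided permutation p-value for positive autocorrelation. The function names `morans_i` and `permutation_test` and the toy 10-unit chain are illustrative, and the exact standardization used for $I_{DR}$ in this paper may differ.

```python
import numpy as np


def morans_i(x, W):
    """Moran's I = (N / S0) * (z' W z) / (z' z) with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()                 # center the residuals
    s0 = W.sum()                     # sum of all weights
    return (len(x) / s0) * (z @ W @ z) / (z @ z)


def permutation_test(resid, W, n_perm=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation.

    Returns the observed I, a z-value standardized by the permutation
    distribution, and the permutation p-value.
    """
    rng = np.random.default_rng(seed)
    resid = np.asarray(resid, dtype=float)
    observed = morans_i(resid, W)
    # Reassign residuals to locations at random to mimic the null.
    perms = np.array([morans_i(rng.permutation(resid), W)
                      for _ in range(n_perm)])
    p_value = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    z_value = (observed - perms.mean()) / perms.std(ddof=1)
    return observed, z_value, p_value


# Toy example: 10 units on a line, each adjacent to its neighbors.
W = np.zeros((10, 10))
for i in range(9):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered = np.array([3.0] * 5 + [-3.0] * 5)   # hypothetical residuals
alternating = np.array([1.0, -1.0] * 5)

I_obs, z_obs, p_obs = permutation_test(clustered, W, n_perm=999, seed=1)
```

With 999 permutations the smallest attainable p-value is 0.001. For the clustered toy residuals the statistic equals 7/9, far above its null expectation of $-1/(N-1)$, while the alternating pattern gives a negative value.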
Building on previous studies, we pointed out that loglinear residuals are not only asymptotically normal but also applicable to the permutation test of Moran's I for a correctly specified model. We evaluated type I and type II error rates via simulations, and found that $I_{DR}$ was effective in accounting for heterogeneous population sizes and in detecting a local cluster in the absence of a global trend. In the presence of a global trend, the power to detect a local cluster was very low, a problem that also exists for a continuous dependent variable in a linear regression model [12].

In the case study, we extended Lin's [23] spatial association model, which assigns equal contributions to spatial neighbors, to an ordered or uniform spatial association model that captures spatially varied contributions among spatial neighbors within a cluster. This model has several advantages. First, it makes use of exploratory tools such as Moran's I scatter plots and residual plots to evaluate the magnitude of deviance residuals. Second, the shape of a cluster can be determined in terms of its geographic coverage and, via odds ratios, its slope. In other words, the spatially varied association term yields a 3-dimensional cluster whose magnitude varies across space. Third, this analysis can be extended to probit, logit [6] and other limited dependent variable models under the loglinear framework. Finally, our model-based $I_{DR}$ test complements recent developments in residual-based spatial statistical approaches [8].

Future research should extend $I_{DR}$ to other test statistics, such as Getis-Ord's G [20] and Geary's c [18], and assess their effectiveness for various spatial problems. Likewise, there are many conventional methods for modeling categorical associations, and their effectiveness should be examined both for constructing a spatially varied association term and for specifying various forms of loglinear models in the context of spatial analysis. The current study does not offer any detrending method for the presence of a global trend, and how to detrend while locating and explaining local clusters remains a challenging issue.
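The deviance comparisons and odds ratios discussed here come from Poisson loglinear models with a population offset. The Python sketch below fits $\log\mu_i = \log n_i + \beta_0 + \beta_1 d_i$ by iteratively reweighted least squares with NumPy, returns the model deviance, and converts $\hat\beta_1$ into the ratios $e^{\hat\beta_1 d_i}$ used to describe cluster shape. The model form, the function name `fit_poisson_offset`, and the noise-free example data (with scores echoing Model I) are assumptions for illustration, not the authors' code or the St. Louis data.

```python
import numpy as np


def fit_poisson_offset(y, n, X, max_iter=100, tol=1e-10):
    """Fit log(mu_i) = log(n_i) + X_i @ beta by IRLS; return (beta, deviance).

    Column 0 of X is assumed to be the intercept.
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    X = np.asarray(X, dtype=float)
    offset = np.log(n)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / n.sum())  # start from the overall rate
    for _ in range(max_iter):
        eta = offset + X @ beta
        mu = np.exp(eta)
        # Working response (offset removed) and weights mu for the log link.
        z = (eta - offset) + (y - mu) / mu
        XtW = X.T * mu                   # X^T diag(mu)
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(offset + X @ beta)
    safe_y = np.where(y > 0, y, 1.0)
    ylogy = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    deviance = 2.0 * np.sum(ylogy - (y - mu))
    return beta, deviance


# Hypothetical example: six units, association scores echoing Model I.
n = np.array([1000.0, 2000.0, 1500.0, 3000.0, 2500.0, 4000.0])
d = np.array([4.0, 2.0, 1.0, 1.0, 0.0, 0.0])
y = n * np.exp(-5.0 + 0.5 * d)          # noise-free "counts" for illustration

beta_null, dev_null = fit_poisson_offset(y, n, np.ones((6, 1)))
beta_alt, dev_alt = fit_poisson_offset(y, n, np.column_stack([np.ones(6), d]))

# Ratios relative to a score-0 unit, for scores 4, 2 and 1.
ratios = np.exp(beta_alt[1] * np.array([4.0, 2.0, 1.0]))
```

Because the association term adds one parameter, `dev_null - dev_alt` can be referred to a chi-square distribution with one degree of freedom. An ecological covariate can be compared with the association term by swapping `d` for the covariate, since both models then have the same number of degrees of freedom.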
Finally, as with other model-based tests, when a model is misspecified, the result of a model-based test such as $I_{DR}$ can be misleading, so criteria for a correctly specified model should be established for spatial loglinear models.

Acknowledgements: The authors would like to thank a reviewer for the detailed comments and suggestions, which have substantially improved the quality of the paper.

References
[1] Agresti, A. (2002). Categorical Data Analysis. Wiley, New York.
[2] Anselin, L. (1990). Spatial dependence and spatial structural instability in applied regression analysis. Journal of Regional Science, 30.
[3] Anselin, L. (1995). Local indicators of spatial association-LISA. Geographical Analysis, 27.
[4] Anselin, L., Syabri, I. and Kho, Y. (2006). GeoDa: An introduction to spatial data analysis. Geographical Analysis, 38.
[5] Assunção, R. and Reis, E. (1999). A new proposal to adjust Moran's I for population density. Statistics in Medicine, 18.
[6] Apanasovich, T. V., Sheather, S., Lupton, J. R., Popovic, N., Turner, N. D., Chapkin, R. S., Braby, L. A. and Carroll, R. J. (2003). Testing for spatial correlation in nonstationary binary data, with application to aberrant crypt foci in colon carcinogenesis. Biometrics, 59.
[7] Bennett, R. J. and Haining, R. P. (1985). Spatial structure and spatial interaction modeling approaches to the statistical analysis of geographic data. Journal of the Royal Statistical Society A, 148.
[8] Baddeley, A., Turner, R. and Hazelton, M. (2005). Residual analysis for spatial point processes. Journal of the Royal Statistical Society B, 67.
[9] Billingsley, P. (1995). Probability and Measure. Wiley, New York.
[10] Brunsdon, C., Aitkin, M., Fotheringham, S. and Charlton, M. (1999). A comparison of random coefficient modeling and geographically weighted regression for spatial non-stationary regression problems. Geographical and Environmental Modelling, 3.
[11] Besag, J. and Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society A, 154.
[12] Cressie, N. (1993). Statistics for Spatial Data. Wiley, New York.
[13] Cliff, A. D. and Ord, J. K. (1972). Testing for spatial autocorrelation among regression residuals. Geographical Analysis, 4.
[14] Cliff, A. D. and Ord, J. K. (1981). Spatial Processes: Models and Applications. Pion, London.
[15] Fingleton, B. (1983b). Loglinear models with dependent spatial data. Environment and Planning A, 15.
[16] Fotheringham, S. (1997). Trends in quantitative geography I: stressing the local. Progress in Human Geography, 21.
[17] Fotheringham, S. (1999). Guest editorial: local modeling. Geographical and Environmental Modelling.
[18] Geary, R. C. (1954). The contiguity ratio and statistical mapping. The Incorporated Statistician, 5.
[19] Getis, A. and Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis, 36.
[20] Getis, A. and Ord, J. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24.
[21] Griffith, D. (2002). A spatial filtering specification for the auto-Poisson model. Statistics and Probability Letters, 58.
[22] Lee, S. I. (2004). A generalized significance testing method for global measures of spatial association: an extension of the Mantel test. Environment and Planning A, 36.
[23] Lin, G. (2003). A spatial logit association model for cluster detection. Geographical Analysis, 35.
[24] Lin, G. and Zhang, T. (2005). Loglinear residual tests of Moran's I autocorrelation and their applications to Kentucky breast cancer data. Geographical Analysis, to appear.
[25] Messner, S., Anselin, L., Baller, R., Hawkins, D., Deane, G. and Tolnay, S. (1999). The spatial patterning of county homicide rates: an application of exploratory spatial data analysis. Journal of Quantitative Criminology, 15.
[26] Moran, P. A. P. (1948). The interpretation of statistical maps. Journal of the Royal Statistical Society B, 10.
[27] Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37.
[28] Neter, J., Kutner, M. H., Nachtsheim, C. and Wasserman, W. (1996). Applied Linear Statistical Models, 4th Edition. McGraw Hill, New York.
[29] Oden, N. (1995). Adjusting Moran's I for population density. Statistics in Medicine, 14.
[30] Pierce, D. and Schafer, D. (1986). Residuals in generalized linear models. Journal of the American Statistical Association, 81.
[31] Rogerson, P. A. (1999). The detection of clusters using a spatial version of the chi-square goodness-of-fit statistic. Geographical Analysis, 31.
[32] Sen, A. (1976). Large sample-size distribution of statistics used in testing for spatial correlation. Geographical Analysis, 8.
[33] Sokal, R. R., Oden, N. L. and Thomson, B. A. (1998). Local spatial autocorrelation in a biological model. Geographical Analysis, 30.
[34] Tango, T. (1995). A class of tests for detecting general and focused clustering of rare diseases. Statistics in Medicine, 14.
[35] Waldhör, T. (1996). The spatial autocorrelation coefficient Moran's I under heteroscedasticity. Statistics in Medicine, 15.
[36] Walter, S. D. (1992). The analysis of regional patterns in health data. American Journal of Epidemiology, 136.
[37] Whittemore, A., Friend, N., Brown, B. and Holly, E. (1987). A test to detect clusters of disease. Biometrika, 74.
[38] Wrigley, N. (1985). Categorical Data Analysis for Geographers and Environmental Scientists. Longman, New York.

Figure 1: Type I error rates of $I_r$, $I_{DR}$ and $I_{EBI}$ under heterogeneity (α = 0.05). [Plot: rejection rate against η for $I_r$, $I_{EBI}$ and $I_{DR}$.]

Figure 2: Rejection rate of $I_{DR}$ with and without the spatial association term (α = 0.05). [Plot, panel "Local Cluster": rejection rate against δ for $I_{DR}$ without and with the term.]

Figure 3: Residual plots and QQ-plots in the presence of a local cluster (δ = 0.5). [Panels: residual plots (deviance residuals against index) and QQ-plots (sample against theoretical quantiles), "Without" and "With".]

Figure 4: Power functions of $I_{DR}$ with and without the spatial association term. [Plot, panel "Global and Local Trend": rejection rate against δ for $I_{DR}$ without and with the term.]

Figure 5: Residual plots and QQ-plots in the presence of local and global clustering structures. [Panels: residual plots (deviance residuals against index) and QQ-plots (sample against theoretical quantiles), "Without" and "With".]

Figure 6: Deviance residuals of the null model for St. Louis homicides. [Map with labels St. Louis, Madison and St. Clair; legend: deviance; scale bar in miles.]

Table 1: Loglinear model estimates and $I_{DR}$ results for St. Louis homicides.
Columns: Model; $\hat\beta_1$; p-value; $G^2$; d.f.; $I_{DR}$; p-value.
Rows: Null; Spatial association I (St. Louis); Spatial association II (St. Louis)*; Ecological covariate (% of Blacks).
Note: variables captured by $\hat\beta_1$ are in parentheses. Model I assigns 4 to St. Louis City, 2 to St. Clair county, and 1 to the other adjacent counties; Model II differs by assigning 2.5 to St. Clair county.


More information

Geographically Weighted Regression as a Statistical Model

Geographically Weighted Regression as a Statistical Model Geographically Weighted Regression as a Statistical Model Chris Brunsdon Stewart Fotheringham Martin Charlton October 6, 2000 Spatial Analysis Research Group Department of Geography University of Newcastle-upon-Tyne

More information

Testing Random Effects in Two-Way Spatial Panel Data Models

Testing Random Effects in Two-Way Spatial Panel Data Models Testing Random Effects in Two-Way Spatial Panel Data Models Nicolas Debarsy May 27, 2010 Abstract This paper proposes an alternative testing procedure to the Hausman test statistic to help the applied

More information

Research Notes and Comments I 347

Research Notes and Comments I 347 Research Notes and Comments I 347 mum-likelihood estimation of the constant and does not need to be applied a posteriori. Overall, the replacement of the lognormal model by the Poisson model provides a

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND Testing For Unit Roots With Cointegrated Data NOTE: This paper is a revision of

More information

Exploratory Spatial Data Analysis Using GeoDA: : An Introduction

Exploratory Spatial Data Analysis Using GeoDA: : An Introduction Exploratory Spatial Data Analysis Using GeoDA: : An Introduction Prepared by Professor Ravi K. Sharma, University of Pittsburgh Modified for NBDPN 2007 Conference Presentation by Professor Russell S. Kirby,

More information

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 ARIC Manuscript Proposal # 1186 PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 1.a. Full Title: Comparing Methods of Incorporating Spatial Correlation in

More information

Rob Baller Department of Sociology University of Iowa. August 17, 2003

Rob Baller Department of Sociology University of Iowa. August 17, 2003 Applying a Spatial Perspective to the Study of Violence: Lessons Learned Rob Baller Department of Sociology University of Iowa August 17, 2003 Much of this work was funded by the National Consortium on

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Bayesian Areal Wombling for Geographic Boundary Analysis

Bayesian Areal Wombling for Geographic Boundary Analysis Bayesian Areal Wombling for Geographic Boundary Analysis Haolan Lu, Haijun Ma, and Bradley P. Carlin haolanl@biostat.umn.edu, haijunma@biostat.umn.edu, and brad@biostat.umn.edu Division of Biostatistics

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Rate Maps and Smoothing

Rate Maps and Smoothing Rate Maps and Smoothing Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign http://sal.agecon.uiuc.edu Outline Mapping Rates Risk

More information

Surveillance of Infectious Disease Data using Cumulative Sum Methods

Surveillance of Infectious Disease Data using Cumulative Sum Methods Surveillance of Infectious Disease Data using Cumulative Sum Methods 1 Michael Höhle 2 Leonhard Held 1 1 Institute of Social and Preventive Medicine University of Zurich 2 Department of Statistics University

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Asymptotic equivalence of paired Hotelling test and conditional logistic regression Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

On dealing with spatially correlated residuals in remote sensing and GIS

On dealing with spatially correlated residuals in remote sensing and GIS On dealing with spatially correlated residuals in remote sensing and GIS Nicholas A. S. Hamm 1, Peter M. Atkinson and Edward J. Milton 3 School of Geography University of Southampton Southampton SO17 3AT

More information

The Study on Trinary Join-Counts for Spatial Autocorrelation

The Study on Trinary Join-Counts for Spatial Autocorrelation Proceedings of the 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Shanghai, P. R. China, June 5-7, 008, pp. -8 The Study on Trinary Join-Counts

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Negative Multinomial Model and Cancer. Incidence

Negative Multinomial Model and Cancer. Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar,

More information

School on Modelling Tools and Capacity Building in Climate and Public Health April Point Event Analysis

School on Modelling Tools and Capacity Building in Climate and Public Health April Point Event Analysis 2453-12 School on Modelling Tools and Capacity Building in Climate and Public Health 15-26 April 2013 Point Event Analysis SA CARVALHO Marilia PROCC FIOCRUZ Avenida Brasil 4365 Rio De Janeiro 21040360

More information

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions JKAU: Sci., Vol. 21 No. 2, pp: 197-212 (2009 A.D. / 1430 A.H.); DOI: 10.4197 / Sci. 21-2.2 Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions Ali Hussein Al-Marshadi

More information

Sample size determination for logistic regression: A simulation study

Sample size determination for logistic regression: A simulation study Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This

More information

Basics of Geographic Analysis in R

Basics of Geographic Analysis in R Basics of Geographic Analysis in R Spatial Autocorrelation and Spatial Weights Yuri M. Zhukov GOV 2525: Political Geography February 25, 2013 Outline 1. Introduction 2. Spatial Data and Basic Visualization

More information

A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS

A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS A COEFFICIENT OF DETEMINATION FO LOGISTIC EGESSION MODELS ENATO MICELI UNIVESITY OF TOINO After a brief presentation of the main extensions of the classical coefficient of determination ( ), a new index

More information

GLM models and OLS regression

GLM models and OLS regression GLM models and OLS regression Graeme Hutcheson, University of Manchester These lecture notes are based on material published in... Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist:

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information