A Comparison of Three Exploratory Methods for Cluster Detection in Spatial Point Patterns


A. Stewart Fotheringham and F. Benjamin Zhan

This paper compares the performance of three exploratory methods for cluster detection in spatial point patterns where the at-risk population is known. After reviewing two existing methods, those of Openshaw et al. (1987) and Besag and Newell (1991), an alternative method is introduced. These three methods are then compared empirically using two point patterns drawn from a disaggregate housing database consisting of 28,832 observations. Each observation in the data set contains attributes of a single-family detached dwelling in the City of Amherst, New York. This paper provides some new insights into the performance of the three methods, as previous applications have used spatially aggregated (and hence rather inaccurate) data. The paper also demonstrates the utility of GIS for this type of spatial analysis.

The idea for this paper began as part of Research Initiative #14, Spatial Analysis and GIS, of the National Center for Geographic Information and Analysis in the United States, supported by a grant from the National Science Foundation (SBR [...]). Continued support for A. Stewart Fotheringham was provided by the North-East Regional Research Laboratory in the United Kingdom and for F. Benjamin Zhan by a faculty research enhancement grant from Southwest Texas State University. The authors thank Dr. Barry Lentnek for allowing them to use the Amherst housing data; David Phillips and Martin Camacho for their assistance with the data set; and Professor Stan Openshaw for his comments. Generous help from Fuxiang Xia and Ge Lin is also greatly appreciated.

A. Stewart Fotheringham is professor of geography at the North-East Regional Research Laboratory, University of Newcastle. F. Benjamin Zhan is assistant professor of geography and planning, Southwest Texas State University.

Geographical Analysis, Vol. 28, No. 3 (July 1996) Ohio State University Press. Final version accepted 12/20/94.

1. INTRODUCTION

The analysis of spatial point patterns has long been an important concern in geographical inquiry [see, for example, Boots and Getis (1988) and the references therein]. The availability of georeferenced point data in digital form and the advantages that geographical information systems (GIS) offer for analyzing spatial point data suggest that interest in spatial point pattern analysis will increase. Indeed, there has been much recent interest from researchers across several disciplines (Clayton and Kaldor 1987; Openshaw et al. 1987; Stone 1988; Doll 1989; Gardner 1989; Hills and Alexander 1989; Wheldon 1989; Cuzick and Edwards 1990; Besag and Newell 1991), and particularly in the study of spatial patterns of disease (Marshall 1991). Given the rich availability of data in GIS, and the nature of spatial point pattern analysis, where the underlying statistical assumptions are often hard to specify and selection biases are usually present (Besag and Newell 1991), it seems particularly important to examine the detection of clusters using exploratory techniques (Openshaw et al. 1987; Besag and Newell 1991).

Two broad categories of point patterns can be identified: those for which the at-risk population is known, and those for which the at-risk population is unknown. While it is recognized that there may be many situations where the at-risk population is unknown, such as the occurrence of lightning strikes, this paper concentrates solely on the former. The reason for this is that knowledge of the spatial distribution of the at-risk population allows more interesting clusters to be distinguished from those that arise purely from spatial variations in the density of the at-risk population. For example, a map of the incidence of some disease is relatively uninformative if the underlying distribution of the population is unknown: clusters of the disease will inevitably appear in areas of high population density. The geographically interesting question is not "where is the sample clustered?" but "where is the sample clustered relative to the population?"

Regardless of the specific technique used for cluster detection, the general procedure for hypothesis testing is basically the same: a null hypothesis ($H_0$) and an alternative (research) hypothesis ($H_1$) are specified; a test statistic is computed from the observed point pattern; and a technique is chosen for assessing the significance of the statistic. Ideally the test statistic should be computed from a comparison of the observed points and the underlying at-risk population.
This is a problem if data are aggregated to a certain level, as in Openshaw et al. (1987) and Besag and Newell (1991), where the observed cases and the population at risk are aggregated into census enumeration districts (EDs) or census tracts and georeferenced to the centroids of these zones.

The purpose of this paper is to compare the performance of three exploratory methods used for detecting clusters in spatial point patterns using examples from a file containing georeferenced data on 28,832 houses in Amherst, New York. We first give a brief review of the existing exploratory methods for cluster detection in Section 2. Section 3 presents an alternative method to those that currently exist. The design of the empirical research is presented in Section 4 and results are discussed in Section 5. Conclusions are drawn in Section 6.

2. TWO EXISTING METHODS FOR DETECTING SPATIAL POINT CLUSTERS

Reviews of general point pattern analysis can be found in Ripley (1981), Diggle (1983), and Upton and Fingleton (1985), and those particularly related to geographical research can be found in Boots and Getis (1988). Reviews of the methods used for the analysis of clusters in spatial point patterns concerned with disease are provided by Hills and Alexander (1989) and Marshall (1991). Because our concern here is with the detection of clusters in spatial point patterns using exploratory methods, the literature review is focused on such methods.

The first attempt to detect spatial point clusters using exploratory methods is the Geographical Analysis Machine (GAM) developed by Openshaw et al. (1987). For convenience, the method will be called the Openshaw method hereafter. GAM consists of four components: (1) a spatial hypothesis generator, (2) a procedure for assessing significance, (3) a GIS to handle retrieval of spatial data, and (4) a geographical display and map processing system (Openshaw et al. 1987, p. 338).

The technique used by Openshaw et al. (1987) is illustrated in Figure 1. First, a universe of all possible circle-based hypotheses is generated using the following algorithm.

(1) Construct an initial grid over the area of interest, and define the minimum, maximum, and incremental values of the radii of the circles to be located at the intersections on the grid. The length of each side of the grid and the radius of a circle are chosen in such a way that the initial grid lattice is sufficiently fine-grained and that the circles can overlap to a large degree.
(2) For a constructed grid mesh and a determined circle size, move the circle so that it is located on each grid intersection systematically. Compute the test statistic for each circle at each grid intersection. If the test statistic passes the significance test (see below), the location and the circle are stored for later visualization.
(3) Increase the radius of the circle by the specified increment, and construct a new grid mesh accordingly.
(4) Repeat steps 2 and 3 until the radius reaches the maximum value.

Openshaw et al. (1987), in their GAM, used Monte Carlo simulation to assess significance. Circles are located systematically in the study area as discussed above. The count of observed cases within each circle is used as the test statistic. That is, the count of observed cases within a circle for the observed point data is compared with the count of simulated cases in the circle for each of the $a - 1$ sampled data sets. The circle and its location are recorded if and only if the count of observed cases in the circle for the observed point data is the largest among the $a$ test statistics. For $a - 1$ simulated sample data sets, the significance level is $1/a$. Using a Monte Carlo significance test has a number of advantages as described by Hope (1968, p. 582). Essentially, the technique is assumption free and can always be used when underlying distributions are unknown or when the necessary conditions for applying a test are not met. It may also be used when only vague alternative hypotheses exist and when only a vague definition of the test criteria can be given.

However, as identified by Besag and Newell (1991) and Marshall (1991), there are weaknesses in the method used by Openshaw et al. (1987). One such weakness is that there is no control for multiple testing both locally and globally (Besag and Newell 1991, p. 148). The global aspect means that clusters may be produced by chance alone when the circle used is large. The local aspect is related to the problem that the change of radius and the shifts in location are not taken into account in the calculation of the significance levels. Secondly, it is very difficult to calculate the observed cases and to define the population at risk within the circular area given that the data are aggregated into irregular districts.

More recently, Besag and Newell (1991) proposed a method (hereafter referred to as the Besag method) that avoids some of these deficiencies. In the Besag method, under the null hypothesis $H_0$ it is assumed that the total observed cases in a circle (defined in the same way as in the Openshaw method) are located randomly among the population at risk ($p_M$) with the mean probability $P_{mean} = n/N$ ($n$ is the total number of observed cases and $N$ is the total population at risk). The probability of observing exactly $x$ cases among the population at risk can then be approximated by the Poisson term

$$\frac{e^{-\lambda}\lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, \ldots \qquad (1)$$

where $\lambda = P_{mean} \times p_M$. It follows that, for a prespecified value $k$, the probability of observing $k$ or more cases among $p_M$ is

$$1 - \sum_{x=0}^{k-1} \frac{e^{-\lambda}\lambda^{x}}{x!}. \qquad (2)$$

[Flowchart: input the observed data (number of observations = n); input the data of population at risk (number of observations = N); for a significance test at level α, randomly sample (1/α) - 1 data sets from the population at risk, making sure that each data set contains n observations; set the minimum, maximum, and incremental values for the radii of the circles; obtain the radius of a circle and construct a grid mesh so that the length of the side of a cell in the grid mesh is some fraction of the radius of the circle; move the circle so that each time it is located on one of the intersections of the grid mesh consecutively; compute a test statistic for the observed data within the circle; compute a test statistic for each of the (1/α) - 1 sampled data sets within the circle; use a Monte Carlo significance test to assess the significance; store the circle and the location.]

FIG. 1. Openshaw et al.'s Procedure for Cluster Detection
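The procedure in Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' C implementation: the function names (`count_in_circle`, `gam_scan`), the `grid_fraction` parameter, the rule that ties with a simulated count do not qualify, and the toy lattice data are all hypothetical choices made for the example.

```python
import random


def count_in_circle(points, cx, cy, r):
    """Count the points lying within distance r of the center (cx, cy)."""
    r2 = r * r
    return sum((x - cx) ** 2 + (y - cy) ** 2 <= r2 for x, y in points)


def gam_scan(cases, population, radii, n_sim=19, grid_fraction=0.5, seed=0):
    """Return the circles (cx, cy, r) that pass the Monte Carlo test.

    n_sim simulated data sets correspond to a significance level of
    1 / (n_sim + 1), for example 19 simulations for 0.05.
    """
    rng = random.Random(seed)
    # Each simulated data set has as many points as the observed one.
    sims = [rng.sample(population, len(cases)) for _ in range(n_sim)]
    xs = [p[0] for p in population]
    ys = [p[1] for p in population]
    x0, y0 = min(xs), min(ys)
    width, height = max(xs) - x0, max(ys) - y0
    kept = []
    for r in radii:
        step = grid_fraction * r  # grid spacing tied to the circle radius
        for i in range(int(width / step) + 2):
            for j in range(int(height / step) + 2):
                cx, cy = x0 + i * step, y0 + j * step
                observed = count_in_circle(cases, cx, cy, r)
                if observed < 1:  # require at least one case in the circle
                    continue
                # Assumption: ties with a simulated count do not qualify.
                if all(observed > count_in_circle(s, cx, cy, r) for s in sims):
                    kept.append((cx, cy, r))
    return kept


# Toy data (hypothetical): a 20 x 20 lattice of at-risk points with
# 30 cases packed into one corner.
population = [(float(x), float(y)) for x in range(20) for y in range(20)]
cases = [(float(x), float(y)) for x in range(6) for y in range(5)]
circles = gam_scan(cases, population, radii=[3.0], n_sim=19)
print(len(circles), "circles retained")
```

Passing several radii, for example `radii=[3.0, 4.0, 5.0]`, reproduces the outer loop over circle sizes; the grid is rebuilt for each radius because its spacing is a fraction of that radius.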

Formula (2) is used to calculate the significance level for each potential cluster. In this method, a cluster is detected by asking whether an observed case forms the center of a cluster of cases, which is judged from the number of nearest zones, $M$, needed to accumulate a prespecified $k$ cases. Suppose that at least one case is observed in zone $i = 0$, labeled $A_0$. In order to check whether there is a cluster around $A_0$, all other zones are labeled $A_i$, $i = 1, 2, \ldots$, sequentially, the sequence depending on the distance between the centroid of zone $i \neq 0$ and the centroid of zone $i = 0$. Let $x_i$ be the observed number of cases in zone $i$ and $y_i$ be the population at risk in zone $i$. The accumulated number of cases and the accumulated population at risk in the zones can be defined, respectively, as follows:

$$D_i = \sum_{j=0}^{i} x_j \qquad (3)$$

$$p_i = \sum_{j=0}^{i} y_j \qquad (4)$$

Let

$$M = \min\{i : D_i \geq k\} \qquad (5)$$

where $k$ is a predetermined number of observed cases (for example, $k = 4$) and $M$ is defined in such a way that zones $A_0, \ldots, A_M$ contain at least $k$ cases. A small value of $M$ is indicative of a cluster around $A_0$. It should be noted that $D_i$ and $p_i$ as defined in (3) and (4) are slightly different from the definition in Besag and Newell (1991) in that no observed case is discounted. Formulas (3) and (4) are used in the experiment conducted in this paper because of the use of disaggregated data.

To understand the physical basis of this method, one has to appreciate that the observed cases and the population at risk are aggregated data and are georeferenced to the centroids of zones [census enumeration districts (EDs) or census tracts] distributed over the study area. If individual data were available, the method would be more subjective because of the lack of predefined zones. The method can also be criticized because the value of $k$ is chosen in an ad hoc manner, although the results for different values of $k$ can obviously be displayed to obviate this problem.

In each example presented by Openshaw et al. (1987) and Besag and Newell (1991), the data used are aggregated into census tracts or enumeration districts. The observed cases and population at risk are georeferenced to the centroids of these zones. There is an obvious fundamental problem in computing the test statistic and defining the population at risk for any given circle when the Openshaw method is used (Besag and Newell 1991) because the computation is based on the census enumeration districts (EDs) whose centroids are within the circle. This apparently does not reflect the real situation. It would be more desirable to conduct the analysis using the true coordinates of the observed cases and the population at risk, that is, using disaggregated data.
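As an illustration of formulas (2) to (5), the following Python sketch orders zones by distance from a chosen zone, accumulates cases and population until at least k cases are reached, and evaluates the Poisson tail probability. The names `poisson_tail` and `besag_newell`, the tuple layout of a zone, and the four toy zones are assumptions made for this example only; as in the text, no observed case is discounted.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), i.e. 1 - sum_{x=0}^{k-1} e^-lam lam^x / x!."""
    return 1.0 - sum(
        math.exp(-lam) * lam ** x / math.factorial(x) for x in range(k)
    )


def besag_newell(zones, center, k):
    """Return (M, significance) for the zone at index `center`.

    zones: list of (x, y, cases, population) tuples, one per zone centroid.
    M is the smallest i such that zones A_0..A_i hold at least k cases.
    """
    n = sum(z[2] for z in zones)  # total observed cases
    N = sum(z[3] for z in zones)  # total population at risk
    cx, cy = zones[center][0], zones[center][1]
    # A_0 is the center zone (distance 0); the rest follow by distance.
    ordered = sorted(zones, key=lambda z: math.hypot(z[0] - cx, z[1] - cy))
    D = 0  # accumulated cases, formula (3)
    p = 0  # accumulated population at risk, formula (4)
    for M, (_, _, cases, pop) in enumerate(ordered):
        D += cases
        p += pop
        if D >= k:  # formula (5)
            lam = p * n / N  # expected cases among the accumulated population
            return M, poisson_tail(k, lam)
    raise ValueError("the study area contains fewer than k cases")


# Toy zones (hypothetical): (x, y, cases, population)
zones = [(0, 0, 3, 100), (1, 0, 2, 100), (5, 5, 0, 400), (6, 5, 1, 400)]
M, significance = besag_newell(zones, center=0, k=4)
print(M, round(significance, 4))
```

With these toy zones the two nearest zones are enough to reach k = 4, so M = 1 and the expected count is 200 × 6 / 1000 = 1.2 cases.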

3. AN ALTERNATIVE METHOD FOR DETECTING SPATIAL POINT CLUSTERS

A third method for detecting spatial point clusters is introduced in this paper. The procedure for the method is illustrated in Figure 2. It differs from the Openshaw method in two basic respects: the location and size of a circle are determined randomly within specified ranges, and the Poisson probability distribution is used directly for assessing significance. Let the total population at risk in the area of interest be $N$ and the total number of observed cases (with a particular attribute) be $n$; then the mean probability of observing a case in the entire area is

$$P_{mean} = \frac{n}{N}. \qquad (6)$$

For any circle whose location and radius are determined randomly, the number of cases ($x$) and the population at risk ($y$) within the circle can be obtained. The expected number of cases ($\lambda$) in the circle can then be determined as

$$\lambda = P_{mean} \times y. \qquad (7)$$

The probability $P(x \mid \lambda)$ of observing exactly $x$ cases in the circle with expected cases $\lambda$ can then be determined using the Poisson distribution (Getis and Boots 1978, p. 19):

$$P(x \mid \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}. \qquad (8)$$

Two methods can be used to assess the significance. In the first, $P(x \mid \lambda)$ in (8) is used directly as the measure of significance. That is, if $P(x \mid \lambda) < \alpha$, where $\alpha$ is a prespecified level of significance, the radius and location of the circle generating $P(x \mid \lambda)$ are stored. The second method of significance testing that can be applied is that adopted in the Besag method described above. The only difference here is that the locations and radii of the circles are determined randomly, and $k$ is the number of observations with a particular attribute that lie within each circle. In what follows, the results of the two significance testing procedures are very similar so only the results of the first one are reported. Hereafter, this third method is called the Fotheringham and Zhan method, or Fotheringham method for short.

4. RESEARCH DESIGN FOR COMPARING THE THREE METHODS

The methods described above for detecting point clusters were coded in C and linked with ARC/INFO 6.1, running on a SUN workstation. The purpose of this section is to discuss the experimental design for testing these methods in terms of the preparation of data and the choice of search parameters in the programs.

4.1 The Preparation of Data

In this study, the objects under investigation are houses in Amherst, New York. A master database consisting of 28,832 houses (observations) is constructed, stored, and managed using the ARC/INFO database. The data in the master database are derived from a data file containing information about the single-family detached dwellings in the City of Amherst, New York. These houses are geocoded using two-dimensional coordinates.

[Flowchart: input the observed data (number of observations = n); input the data of population at risk (number of observations = N); compute the mean probability n / N; set the minimum and maximum values of the radii of the circle; randomly select the radius of a circle within the specified range; randomly locate the circle in the area of interest; compute the number of points from the observed data within the circle (x); compute the population at risk within the circle; compute the expected number of points in the circle using the mean probability and the population at risk in the circle; compute the probability of observing x points in the circle using the Poisson distribution and the expected number of points; store the circle and the location; repeat until a sufficient number of circles has been seeded.]

FIG. 2. Fotheringham and Zhan's Procedure for Cluster Detection
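The procedure in Figure 2 can be sketched as follows. This is a hedged illustration rather than the authors' C code: the names `random_circle_scan` and `poisson_pmf`, the use of the population's bounding box for seeding circle centers, and the toy lattice data are assumptions introduced here. A circle is retained only when it holds more cases than expected and the Poisson probability of the observed count falls below the chosen significance level.

```python
import math
import random


def count_in_circle(points, cx, cy, r):
    """Count the points lying within distance r of the center (cx, cy)."""
    r2 = r * r
    return sum((px - cx) ** 2 + (py - cy) ** 2 <= r2 for px, py in points)


def poisson_pmf(x, lam):
    """Probability of exactly x events under a Poisson law with mean lam."""
    if lam == 0:
        return 1.0 if x == 0 else 0.0
    # Computed on the log scale to avoid overflow for large counts.
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def random_circle_scan(cases, population, r_min, r_max, n_circles, alpha, seed=0):
    """Seed random circles and keep those whose case count is significant."""
    rng = random.Random(seed)
    p_mean = len(cases) / len(population)  # mean probability n / N
    xs = [p[0] for p in population]
    ys = [p[1] for p in population]
    kept = []
    for _ in range(n_circles):
        r = rng.uniform(r_min, r_max)
        # Assumption: centers are drawn from the population's bounding box.
        cx = rng.uniform(min(xs), max(xs))
        cy = rng.uniform(min(ys), max(ys))
        x = count_in_circle(cases, cx, cy, r)
        if x < 1:  # require at least one case in the circle
            continue
        y = count_in_circle(population, cx, cy, r)
        lam = p_mean * y  # expected number of cases in the circle
        prob = poisson_pmf(x, lam)
        if x > lam and prob < alpha:
            kept.append((cx, cy, r, x, lam, prob))
    return kept


# Toy data (hypothetical): a 20 x 20 lattice of at-risk points with
# 30 cases packed into one corner.
population = [(float(x), float(y)) for x in range(20) for y in range(20)]
cases = [(float(x), float(y)) for x in range(6) for y in range(5)]
kept = random_circle_scan(cases, population, 2.0, 4.0, 500, alpha=0.01)
print(len(kept), "circles retained out of 500")
```

Because no simulated data sets are needed, each circle costs only two point counts, which is one reason a random-seeding design can afford fewer and cheaper tests than an exhaustive grid.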

The locations of all 28,832 houses are shown in Figure 3. In addition to the x, y coordinates, other attributes such as age, type of construction, quality, and price are available for every observation. The 28,832 houses can be regarded as the population at risk in the area, and houses with various attributes are drawn from this population. Polygons of census tracts covering Amherst are also created and added to the map, but are used solely for visualization.

Two data sets were drawn from the total population at risk. Data set 1 consists of houses whose overall construction quality is rated in the lowest category (1-5) and contains the 277 points mapped in Figure 4. Data set 2, illustrated in Figure 5, consists of two hundred randomly selected points from the master database. In Figures 4 and 5, various clusters seem to be present although different clusters may be apparent to different people. It is also not clear whether a cluster is worthy of further investigation because it results from some clustering process or whether it is simply a reflection of the distribution of the underlying at-risk population. For example, Table 1 shows that standard tests such as the variance-to-mean ratio and nearest neighbor analysis indicate extremely strong clustering in both spatial distributions (one of which is a random drawing) in Figures 4 and 5. This is because the distribution of the at-risk population in each case is strongly clustered but this is ignored in the calculation of the variance-to-mean and nearest neighbor statistics. These statistics cannot differentiate between distributions that are clustered because of the distribution of the underlying population and those that exhibit clusters that are geographically interesting. Conversely, there might well be points that do not appear to merit investigation when examined without exogenous knowledge, and only appear as significant clusters when compared to the at-risk population. The three methods described above are designed to remove these problems by using information on the at-risk population to automate the identification of spatial clusters that warrant further geographic investigation.

4.2 The Choice of Search Parameters

Before each program is run, a number of search parameters must be decided. One parameter that is used in all three methods is the minimum number of observed cases to be considered in a circle. This parameter is set to one (1) for the Fotheringham and Openshaw methods, which means that significance assessment is conducted as long as there is at least one case within a given circle at a particular location. Because the Besag method directly uses a prespecified number (k), the minimum value of k is set to two (2), following the experiment conducted by Besag and Newell (1991).

Other search parameters to be set are the minimum and maximum radii of the circles. After a number of trials, the minimum radius is set to [...] meters (600 feet), and the maximum radius is set to [...] meters (2,100 feet). In Amherst, the study area extends [...] meters (35,704 feet) east-west and [...] meters (49,866 feet) north-south. It should be pointed out here that the range of the radii is data dependent and only becomes clear after several trials. Circles that are too small will not detect the extent of large clusters and may miss clusters altogether, while circles that are too large risk hiding variations at smaller scales. This is one of the reasons why exploratory data analysis is important, and the optimal choice of the parameters remains subject to further investigation.
For the Openshaw and the Besag methods, one other parameter, the increment of the radius, has to be determined; based on the results of a number of trials, it is here set to 76.2 meters (250 feet).

Various tests of sensitivity of each of the three methods are employed. For the Openshaw method, the numbers of simulated sample data sets are chosen as 19, 49, 99, 199, and 499, so that they are equivalent to significance levels of 0.05, 0.02, 0.01, 0.005, and 0.002, respectively. A significance level of [...] is not used for the Openshaw method because of the computer time required to investigate 999 simulated samples. The Besag method is sensitive to the value of k and six values are reported (k = 2, 3, 4, 5, 6, and 7), all at the 0.05 significance level. In the Fotheringham method, significance levels are set to 0.05, 0.02, 0.01, 0.005, 0.002, and [...], and clusters are displayed at each of the levels.

Because data set 2 is a sample drawn randomly from the at-risk population and hence is subject to sampling variation, ten such samples are drawn and results are reported for the average of all ten. For the Openshaw and the Besag tests, 29,218 circles were seeded for each random sample at each significance level and the proportion of circles displayed (that is, those containing significant clusters of points) is calculated for each significance level in the case of the Openshaw method and for each value of k in the case of the Besag method. For the Fotheringham test, five thousand circles were seeded and the proportion of circles displayed is calculated for each significance level. The Fotheringham method uses a random placement of circles and hence fewer circles need to be seeded.

FIG. 3. Houses in the Study Area (At-Risk Population): The Master Database

FIG. 4. Test Data Set 1: Lowest Construction Quality Category (1-5)

FIG. 5. Test Data Set 2: Randomly Selected Points from the At-Risk Population

TABLE 1
Classical Point-Pattern Analysis Results for Data Sets 1 and 2
Columns: Variance-mean ratio; t value; R (nearest neighbor)
Rows: Data Set 1 (Figure 4); Data Set 2 (Figure 5)
NOTE: * significantly different from 1.0 at the 99 percent confidence level.

5. RESULTS

5.1 Visualization

In order to demonstrate the relative performance of the three techniques of automatic cluster detection, the results (all retained circles and locations) were displayed using the ARC/INFO GIS and are reported in Figures 6 and 7. The two figures refer to the data displayed in Figures 4 and 5, respectively, and both are composed of three sets of circles derived from the Openshaw, Besag, and Fotheringham methods, respectively. Every circle represents a significant cluster of points using the significance testing procedures described above. Figures 6a-c contain results using data set 1, defined as houses with the lowest construction quality ranking, and Figures 7a-c show the results from the two hundred points in data set 2, which are drawn randomly from the master database. Both sets of figures contain a separate window displaying the point pattern on which the results are based.

FIG. 6a. Detecting Spatial Point Clusters Using the Openshaw Method: Actual Point Pattern (one panel per significance level, plus the observed point pattern)

FIG. 6b. Detecting Spatial Point Clusters Using the Besag and Newell Method: Actual Point Pattern (panels for k = 2, 3, 4, 5, 6, and 7, plus the observed point pattern)

FIG. 6c. Detecting Spatial Point Clusters Using the Fotheringham and Zhan Method: Actual Point Pattern (one panel per significance level, plus the observed point pattern)

Figure 6a shows the results of applying Openshaw's method to data set 1. It is clear that the technique identifies a large number of clusters, especially at traditional significance levels (0.05 and 0.01) where almost every point appears as a significant cluster. Even at significance levels as extreme as 0.002, the technique identifies large numbers of clusters. The Besag method, the results of which are shown in Figure 6b, is much more conservative although the results are clearly dependent on k, a predefined number of points within a circle. Above k = 4, the technique picks out only the clusters of points in the southern part of the map and only two areas of the map appear to have interesting clusters. The Fotheringham methodology appears marginally more selective than Openshaw's at more extreme levels of significance but essentially depicts similar results. One general finding is that the techniques would appear to be more useful when used with an extreme significance level such as [...] so that a limited number of significant clusters is identified. At less extreme values, the techniques possibly identify too many clusters to be useful as an exploratory tool.

To place the above results in perspective, each of the three methods is applied to a set of points randomly drawn from the at-risk population and these results are shown in Figures 7a-c. It is important to emphasize that the point pattern in this data set does not appear random because the distribution reflects the distribution of the at-risk population from which the sample is drawn. Given that the population is nonrandomly located in space, the sample is similarly spatially nonrandom. A logical test of each method is therefore to see whether it can separate a visual cluster from a geographically interesting one, the latter being a set of points that is significantly more clustered than the distribution of the underlying at-risk population would suggest.

FIG. 7a. Detecting Spatial Point Clusters Using the Openshaw Method: Random Point Pattern (one panel per significance level, plus the observed point pattern)

FIG. 7b. Detecting Spatial Point Clusters Using the Besag and Newell Method: Random Point Pattern (panels for k = 2, 3, 4, 5, 6, and 7, plus the observed point pattern)

FIG. 7c. Detecting Spatial Point Clusters Using the Fotheringham and Zhan Method: Random Point Pattern (one panel per significance level, plus the observed point pattern)
The Openshaw technique performs slightly less satisfactorily in this regard: clusters appear at significance levels even as extreme as [...]. It is more difficult to evaluate the Besag technique because although clusters are identified at all values of k, the significance level is [...]. The Fotheringham technique identifies relatively fewer clusters and identifies none at a significance level above [...]. These results are encouraging because if, in the random samples, geographically interesting clusters can be separated from clusters that result merely from the distribution of the underlying population, clusters that are identified in the nonrandom distributions can be treated as geographically interesting. For instance, the results of the Fotheringham method at significance levels [...] and [...] with the random data suggest that the clusters identified by this method in data set 1 are of geographic interest in that they probably arise from a spatial process and not from variations in the underlying population density.

5.2 A Further Test Based on Random Distributions

A further test of the three techniques is undertaken by examining the performance of each technique on ten different random drawings from the at-risk population. These results are summarized in Table 2, where each of the ten distributions consists of two hundred randomly drawn points. Table 2a contains the results of the Openshaw technique applied to each of the ten distributions. At each significance level, 29,218 circles are seeded and the average number of these circles that are displayed (and hence contain a significant cluster of points) is given in column 3. These average frequencies are converted to average proportions in column 4. Given that the distributions are random drawings from the at-risk population, a comparison of these proportions across the different techniques yields some insights into the probability of each technique identifying false positives (although it says nothing about failure to identify real positives). Unfortunately, the Besag results in Table 2b depend on the value of k, the minimum number of points within a circle, and so are not directly comparable. The results for the Fotheringham technique, shown in Table 2c, result from only five thousand seeded circles at each significance level because in this technique the circles are seeded randomly, whereas in the other two techniques the circles are uniformly placed over the study area.

TABLE 2
Performance Indicators of the Three Techniques on Ten Random Samples
Columns: Significance level or k value; Number of circles seeded; Average number of circles displayed; Average proportion of circles displayed
Panels: a. Openshaw et al. method; b. Besag and Newell method (significance level: 0.05); c. Fotheringham and Zhan method
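The false-positive check described in this section can be sketched as a small harness: draw several random samples from the at-risk population, test every seeded circle against each sample, and average the proportion of circles that would be displayed. The sketch below uses the direct Poisson test of the third method; the function names, the synthetic clustered population, the reuse of one fixed set of circles across samples, and the scaled-down sizes (1,000 houses, 200 circles) are all assumptions chosen to keep the example short, and its output is not a reproduction of Table 2.

```python
import math
import random


def in_circle(points, cx, cy, r):
    """Count the points lying within distance r of the center (cx, cy)."""
    r2 = r * r
    return sum((px - cx) ** 2 + (py - cy) ** 2 <= r2 for px, py in points)


def poisson_pmf(x, lam):
    """Probability of exactly x events under a Poisson law with mean lam."""
    if lam == 0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def average_proportion_displayed(population, n_cases, n_samples, circles,
                                 alpha, seed=0):
    """Average share of circles flagged when cases are purely random draws."""
    rng = random.Random(seed)
    p_mean = n_cases / len(population)
    # The population inside each circle does not depend on the sample.
    pop_counts = [in_circle(population, cx, cy, r) for cx, cy, r in circles]
    proportions = []
    for _ in range(n_samples):
        sample = rng.sample(population, n_cases)
        shown = 0
        for (cx, cy, r), y in zip(circles, pop_counts):
            x = in_circle(sample, cx, cy, r)
            lam = p_mean * y
            if x >= 1 and x > lam and poisson_pmf(x, lam) < alpha:
                shown += 1
        proportions.append(shown / len(circles))
    return sum(proportions) / n_samples


# Synthetic at-risk population (hypothetical): four clustered neighborhoods.
gen = random.Random(1)
centers = [(2.0, 2.0), (8.0, 3.0), (4.0, 8.0), (9.0, 9.0)]
population = [(gen.gauss(cx, 1.0), gen.gauss(cy, 1.0))
              for cx, cy in centers for _ in range(250)]
circles = [(gen.uniform(-2.0, 12.0), gen.uniform(-2.0, 12.0),
            gen.uniform(0.5, 1.5)) for _ in range(200)]

for alpha in (0.05, 0.002):
    share = average_proportion_displayed(population, 200, 10, circles, alpha)
    print(alpha, share)
```

Since every sample here is random by construction, any circle that is flagged is a false positive, so the printed shares play the role of the "average proportion of circles displayed" column.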
The results for all three methods are encouraging in that the average proportion of circles displayed is always less than half the significance level (circles are displayed only when a significantly larger number of points is observed than would be expected). The Besag procedure is particularly impressive when it is noted that the proportions are all calculated at a significance level of The results again suggest that the circles identified at extreme significance levels in

16 A. Stewart Fotheringham and F. Benjamin Zhan / 215 Figure 6a-c would therefore seem to represent the outcomes of some interesting geographic processes. It is useful to note that the Fotheringham method appears to be less sensitive than the other two methods at low levels of significance but is more sensitive at higher levels of significance. This suggests that the simpler procedure of randomly assigning circles (the Fotheringham method) works just as well as comprehensively covering the study area, and may in fact be more selective when extreme significance levels are used. 5.3 Sensitivity to Circle Definition All three methods of point pattern analysis depend upon a definition of circle size. The above results, for instance, are for circles that have a radius between and meters. In order to examine the potential sensitivity of the results to this definition, some other ranges were selected and the methodology described in section 5.1 repeated. The results of one significance level corresponding to data set one are shown in Figures 8a-c. Each technique has a similar sensitivity to circle definition in that as the circles increase in size, the circles in which significant clusters occur increasingly overlap and give an exaggerated appearance to a cluster. That is, regardless of the effect on statistical detection, varying the size of circles used affects the perception of the results. Given that all three techniques are intended to be used in an exploratory mode, this perceptual sensitivity needs more attention. It could be argued that an advantage of exploratory techniques is that analyses can be undertaken under many different conditions and in this case maps can be reported with different circle ranges. 6. CONCLUSIONS The increasing prevalence of GIS technology and the concomitant access to disaggregate spatial data sets will lead to a greater demand for automated cluster detection techniques. 
Such techniques have obvious applications in the investigation of the incidence of certain types of disease, but they can also be applied to a host of subjects in the social and environmental sciences. Openshaw et al. (1987) popularized the automation and visualization of cluster detection through a randomized version of quadrat analysis, although earlier work such as that by Hudson (1969) predates this by about two decades. Openshaw et al.'s work not only has the advantage of producing a visual output showing the locations of significant clusters of points, but it also overcomes the disadvantage of standard quadrat analysis by allowing the units in which occurrences are counted to be of random size. Besag and Newell (1991) present an alternative methodology for the automated detection of point clusters, and a third approach is provided in this paper.

FIG. 8a. Detecting Spatial Point Clusters Using the Openshaw Method: The Effect of Circle Size. Recoverable panel labels: observed point pattern; 183 m ≤ R < 259 m; 411 m ≤ R < 564 m.

FIG. 8b. Detecting Spatial Point Clusters Using the Besag and Newell Method: The Effect of Circle Size. Panels: observed point pattern; 183 m ≤ R < 259 m; 259 m ≤ R < 411 m; 411 m ≤ R < 564 m; 564 m ≤ R.

FIG. 8c. Detecting Spatial Point Clusters Using the Fotheringham and Zhan Method: The Effect of Circle Size. Panels: observed point pattern; 183 m ≤ R < 259 m; 259 m ≤ R < 411 m; 411 m ≤ R < 564 m; 564 m ≤ R.

A comparison of the three techniques is presented based on a disaggregate housing data set in which all 28,832 points in the at-risk population have been geocoded. The Openshaw et al. and Besag and Newell techniques have previously been applied only to point patterns aggregated to larger spatial units and, to our knowledge, this is the first application of all three techniques to a disaggregate data set. Testing the techniques with point patterns randomly drawn from a known spatial distribution provides encouraging results in that they appear to perform well in separating geographically interesting clusters from those that result merely from the distribution of the underlying population. The Besag and Newell method appears to be particularly good at avoiding false positives, although the Fotheringham and Zhan technique is easier to apply and is not dependent on a definition of minimum cluster size. Finally, the results demonstrate that there are still perceptual issues concerning exploratory graphics that need to be resolved.

The techniques evaluated in this paper are potentially very useful in identifying clusters of points that warrant further investigation. Although each relies upon a statistical procedure to determine whether the number of points within a random circle is significant, the overall result in each case is a visual representation of the set of circles in which significant clusters are found. This leads to perceptual questions, not addressed here, about the way such information should be presented.

LITERATURE CITED

Besag, J., and P. J. Diggle (1977). Simple Monte Carlo Tests for Spatial Pattern.
Applied Statistics 26 (3).

Besag, J., and J. Newell (1991). The Detection of Clusters in Rare Diseases. Journal of the Royal Statistical Society, A 154, Part 1.

Boots, B. N., and A. Getis (1988). Point Pattern Analysis. The Publishers of Professional Social Science. Newbury Park: Sage Publications.

Clayton, D., and J. Kaldor (1987). Empirical Bayes Estimates of Age-standardized Relative Risks for Use in Disease Mapping. Biometrics 43.

Cuzick, J., and R. Edwards (1990). Spatial Clustering for Inhomogeneous Populations (with discussion). Journal of the Royal Statistical Society, B 52.

Diggle, P. J. (1983). Statistical Analysis of Spatial Point Patterns. New York: Academic Press.

Doll, R. (1989). The Epidemiology of Childhood Leukaemia. Journal of the Royal Statistical Society, A 152.

Gardner, M. J. (1989). Review of Reported Increases of Childhood Cancer Rates in the Vicinity of Nuclear Installations in the UK. Journal of the Royal Statistical Society, A 152.

Getis, A., and B. Boots (1978). Models of Spatial Processes: An Approach to the Study of Point, Line, and Area Patterns. Cambridge, London: Cambridge University Press.

Hope, A. C. A. (1968). A Simplified Monte Carlo Significance Test Procedure. Journal of the Royal Statistical Society, B 30 (3).

Hills, M., and F. Alexander (1989). Statistical Method Used in Assessing the Risk of Disease near a Source of Environmental Pollution: A Review. Journal of the Royal Statistical Society, A 152.

Hudson, J. C. (1969). Pattern Recognition in Empirical Map Analysis. Journal of Regional Science 9 (2).

Marshall, R. J. (1991). A Review of Methods for the Statistical Analysis of Spatial Patterns of Disease. Journal of the Royal Statistical Society, A 154, Part 3.

Openshaw, S., M. E. Charlton, C. Wymer, and A. W. Craft (1987). A Mark 1 Geographical Analysis Machine for the Automated Analysis of Point Data Sets. International Journal of Geographical Information Systems 1 (4).

Ripley, B. D. (1981). Spatial Statistics. New York: John Wiley.

Stone, R. A. (1988). Investigations of Excess Environmental Risks around Putative Sources: Statistical Problems and a Proposed Test. Statistics in Medicine 7.

Upton, G. J. G., and B. Fingleton (1985). Spatial Data Analysis by Example, Vol. 1, Point Pattern and Quantitative Data. Chichester: John Wiley and Sons.

Wheldon, T. E. (1989). The Assessment of Risk of Radiation-Induced Childhood Leukaemia in the Vicinity of Nuclear Installations. Journal of the Royal Statistical Society, A 152.
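
The conclusions describe each technique as relying on a statistical procedure to decide whether the number of points within a circle is significant, and section 5.3 varies the range of circle radii. The Python sketch below illustrates that idea under stated assumptions and should not be read as the procedure of any one of the three methods. It assumes a one-sided Poisson test, circle centres placed uniformly at random, and a synthetic at-risk population in place of the Amherst housing data. The function names, the 5,000 m study square, the sample sizes, the number of circles, and the 0.05 significance level are all hypothetical; only the three radius ranges are taken from the panel labels of Figure 8.

```python
import math
import random


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def count_within(points, cx, cy, radius):
    """Count the points lying inside or on the circle."""
    r2 = radius * radius
    return sum((x - cx) ** 2 + (y - cy) ** 2 <= r2 for x, y in points)


def flagged_proportion(population, cases, n_circles, r_min, r_max, alpha, side, rng):
    """Share of random circles holding significantly more cases than expected."""
    rate = len(cases) / len(population)  # overall case rate in the at-risk population
    flagged = 0
    for _ in range(n_circles):
        # Assumption: centres are uniform over the square study area.
        cx, cy = rng.uniform(0, side), rng.uniform(0, side)
        radius = rng.uniform(r_min, r_max)
        expected = rate * count_within(population, cx, cy, radius)
        observed = count_within(cases, cx, cy, radius)
        if poisson_upper_tail(observed, expected) < alpha:
            flagged += 1
    return flagged / n_circles


# Hypothetical synthetic data: 2,000 at-risk points, 100 cases drawn at random.
rng = random.Random(42)
SIDE = 5000.0
population = [(rng.uniform(0, SIDE), rng.uniform(0, SIDE)) for _ in range(2000)]
cases = rng.sample(population, 100)

proportions = {}
for r_min, r_max in [(183, 259), (259, 411), (411, 564)]:  # radius ranges in meters
    share = flagged_proportion(population, cases, 300, r_min, r_max, 0.05, SIDE, rng)
    proportions[(r_min, r_max)] = share
    print(f"{r_min} m <= R < {r_max} m: proportion of circles flagged = {share:.3f}")
```

Because the cases in this sketch are a random sample of the at-risk population, any flagged circle is a false positive, so the printed proportions give a rough sense of how often chance alone would produce a displayed circle at each circle size.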


More information

ESRI 2008 Health GIS Conference

ESRI 2008 Health GIS Conference ESRI 2008 Health GIS Conference An Exploration of Geographically Weighted Regression on Spatial Non- Stationarity and Principal Component Extraction of Determinative Information from Robust Datasets A

More information

Mapping Landscape Change: Space Time Dynamics and Historical Periods.

Mapping Landscape Change: Space Time Dynamics and Historical Periods. Mapping Landscape Change: Space Time Dynamics and Historical Periods. Bess Moylan, Masters Candidate, University of Sydney, School of Geosciences and Archaeological Computing Laboratory e-mail address:

More information

Roger S. Bivand Edzer J. Pebesma Virgilio Gömez-Rubio. Applied Spatial Data Analysis with R. 4:1 Springer

Roger S. Bivand Edzer J. Pebesma Virgilio Gömez-Rubio. Applied Spatial Data Analysis with R. 4:1 Springer Roger S. Bivand Edzer J. Pebesma Virgilio Gömez-Rubio Applied Spatial Data Analysis with R 4:1 Springer Contents Preface VII 1 Hello World: Introducing Spatial Data 1 1.1 Applied Spatial Data Analysis

More information

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO

More information

The CrimeStat Program: Characteristics, Use, and Audience

The CrimeStat Program: Characteristics, Use, and Audience The CrimeStat Program: Characteristics, Use, and Audience Ned Levine, PhD Ned Levine & Associates and Houston-Galveston Area Council Houston, TX In the paper and presentation, I will discuss the CrimeStat

More information

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall Announcement New Teaching Assistant Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall 209 Email: michael.harrigan@ttu.edu Guofeng Cao, Texas Tech GIST4302/5302, Lecture 2: Review of Map Projection

More information

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis Guofeng Cao http://www.spatial.ttu.edu Department of Geosciences Texas Tech University guofeng.cao@ttu.edu

More information

Geographers Perspectives on the World

Geographers Perspectives on the World What is Geography? Geography is not just about city and country names Geography is not just about population and growth Geography is not just about rivers and mountains Geography is a broad field that

More information

Enhanced Subsurface Interpolation by Geological Cross-Sections by SangGi Hwang, PaiChai University, Korea

Enhanced Subsurface Interpolation by Geological Cross-Sections by SangGi Hwang, PaiChai University, Korea Enhanced Subsurface Interpolation by Geological Cross-Sections by SangGi Hwang, PaiChai University, Korea Abstract Subsurface geological structures, such as bedding, fault planes and ore body, are disturbed

More information

EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST

EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST TIAN ZHENG, SHAW-HWA LO DEPARTMENT OF STATISTICS, COLUMBIA UNIVERSITY Abstract. In

More information

GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis

GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis Guofeng Cao www.spatial.ttu.edu Department of Geosciences Texas Tech University guofeng.cao@ttu.edu Fall 2018 Spatial Point Patterns

More information

Chapter Three. Hypothesis Testing

Chapter Three. Hypothesis Testing 3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being

More information

Combining Incompatible Spatial Data

Combining Incompatible Spatial Data Combining Incompatible Spatial Data Carol A. Gotway Crawford Office of Workforce and Career Development Centers for Disease Control and Prevention Invited for Quantitative Methods in Defense and National

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

Chapter 1. Gaining Knowledge with Design of Experiments

Chapter 1. Gaining Knowledge with Design of Experiments Chapter 1 Gaining Knowledge with Design of Experiments 1.1 Introduction 2 1.2 The Process of Knowledge Acquisition 2 1.2.1 Choosing the Experimental Method 5 1.2.2 Analyzing the Results 5 1.2.3 Progressively

More information

DATA DISAGGREGATION BY GEOGRAPHIC

DATA DISAGGREGATION BY GEOGRAPHIC PROGRAM CYCLE ADS 201 Additional Help DATA DISAGGREGATION BY GEOGRAPHIC LOCATION Introduction This document provides supplemental guidance to ADS 201.3.5.7.G Indicator Disaggregation, and discusses concepts

More information

Acknowledgments xiii Preface xv. GIS Tutorial 1 Introducing GIS and health applications 1. What is GIS? 2

Acknowledgments xiii Preface xv. GIS Tutorial 1 Introducing GIS and health applications 1. What is GIS? 2 Acknowledgments xiii Preface xv GIS Tutorial 1 Introducing GIS and health applications 1 What is GIS? 2 Spatial data 2 Digital map infrastructure 4 Unique capabilities of GIS 5 Installing ArcView and the

More information

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME: 1) A GIS data model using an array of cells to store spatial data is termed: a) Topology b) Vector c) Object d) Raster 2) Metadata a) Usually includes map projection, scale, data types and origin, resolution

More information

Applications of GIS in Health Research. West Nile virus

Applications of GIS in Health Research. West Nile virus Applications of GIS in Health Research West Nile virus Outline Part 1. Applications of GIS in Health research or spatial epidemiology Disease Mapping Cluster Detection Spatial Exposure Assessment Assessment

More information

Superiority by a Margin Tests for One Proportion

Superiority by a Margin Tests for One Proportion Chapter 103 Superiority by a Margin Tests for One Proportion Introduction This module provides power analysis and sample size calculation for one-sample proportion tests in which the researcher is testing

More information

Development of Integrated Spatial Analysis System Using Open Sources. Hisaji Ono. Yuji Murayama

Development of Integrated Spatial Analysis System Using Open Sources. Hisaji Ono. Yuji Murayama Development of Integrated Spatial Analysis System Using Open Sources Hisaji Ono PASCO Corporation 1-1-2, Higashiyama, Meguro-ku, TOKYO, JAPAN; Telephone: +81 (03)3421 5846 FAX: +81 (03)3421 5846 Email:

More information

Spatial Analysis 1. Introduction

Spatial Analysis 1. Introduction Spatial Analysis 1 Introduction Geo-referenced Data (not any data) x, y coordinates (e.g., lat., long.) ------------------------------------------------------ - Table of Data: Obs. # x y Variables -------------------------------------

More information

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara Statistical Perspectives on Geographic Information Science Michael F. Goodchild University of California Santa Barbara Statistical geometry Geometric phenomena subject to chance spatial phenomena emphasis

More information