Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Arthur Getis* and Jared Aldstadt**
*San Diego State University
**SDSU/UCSB Joint PhD Program
Paper presented at the Regional Research Institute, West Virginia University, Morgantown, West Virginia, December 8, 2005

AMOEBA
A design for the construction of a spatial weights matrix using empirical data.
Multidirectional: searches for spatial association in all specified directions.
Optimal: optimum in the sense that the scale is local (the finest scale) and the analysis reveals all spatial association.
Ecotope-Based: the ecotope is a specialized region (a particular habitat) within a larger region.
Algorithm: the algorithm for finding the ecotope is based on an analytical system that often finds highly irregular (amoeba-like) sub-regions of spatial association.

The Issues
Question 1: How does one create an appropriate spatial weights matrix?
Question 2: Can we have confidence in the identification of spatial clusters?

Question 1 How does one create an appropriate spatial weights matrix?

The Spatial Weights Matrix
In a regression context, W is the formal expression of spatial dependence between spatial units (the spatial effects). It is used in, for example, the spatial lag model:
$y = \rho W y + X\beta + \varepsilon$

The Typical W Matrix
        j →   1      2      3     ...    n
i = 1       w_11   w_12   w_13   ...   w_1n
i = 2       w_21   w_22
i = 3       w_31
i = n       w_n1                       w_nn

Some Traditional W Schemes
Contiguity
Inverse distances
Lengths of shared borders, perimeters
n-th nearest neighbor distance
All centroids within d
Ranked distances
Network links
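Two of the traditional schemes can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `rook_contiguity`, `inverse_distance`, and `row_standardize` are hypothetical, and the regular grid and the cutoff `d_max` are assumptions made for the example.

```python
import numpy as np

def rook_contiguity(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid (units in row-major order)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:              # neighbor below
                j = (r + 1) * ncols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncols:              # neighbor to the right
                j = r * ncols + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

def inverse_distance(coords, d_max):
    """Inverse-distance weights for all centroid pairs within d_max; zero diagonal."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = np.zeros_like(dist)
    mask = (dist > 0) & (dist <= d_max)
    W[mask] = 1.0 / dist[mask]
    return W

def row_standardize(W):
    """Divide each row by its sum; rows that are all zero stay zero."""
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

W_rook = rook_contiguity(3, 3)             # 9 units on a 3 x 3 grid
W_dist = inverse_distance([(0, 0), (1, 0), (3, 0)], d_max=1.5)
```

Both matrices are exogenous: they depend only on geometry, not on the observed values, which is the contrast the later slides draw with AMOEBA.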

Commentators on W
Anselin: outlined the problem
Dacey: varying results given schemes
Cliff and Ord: rook's and queen's cases
Griffith: better under-specified
Florax & Rey: over-specification reduces power
Kooijman: maximize Moran's
Openshaw: computer search for best model
Bartels: binary defensible
Hammersley-Clifford: near neighbors in Markov
Tiefelsdorf, Griffith, Boots: standardization
Florax and Graff: bias due to matrix sparseness
GEODA listserv

Some Recent W Schemes
Fotheringham, Brunsdon, and Charlton's bandwidth distance decay (1996)
LeSage's Gaussian distance decline (1999)
McMillen's tri-cube distance decline (1998)
Getis and Aldstadt's local statistics model (2001, 2002)
Fotheringham, Charlton, Brunsdon's optimized bandwidth (2002)
LeSage's Bayesian approach (2003)
Aldstadt and Getis's AMOEBA (2003)

W: Theory or Reality?
Exogenous versus endogenous
Estimation versus prediction
Model driven versus data driven
The AMOEBA approach

AMOEBA: Critical Number of Links Identification
Local statistic values are computed around each observation as the number of links (d) increases. When the absolute values fail to rise, the cluster diameter is reached. The first peak equals G_i* at d_c.
[Figure: G_i* (vertical axis) plotted against distance in links (horizontal axis)]
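The stopping rule on this slide can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `critical_links` is hypothetical, and it assumes the local statistic has already been computed for 0, 1, 2, ... links, with index 0 standing for the focal unit alone.

```python
def critical_links(g_values):
    """Return d_c, the number of links at the first peak of |G_i*|.

    g_values[d] is the local statistic computed with all units within d links
    of the focal unit (assumed layout; d = 0 is the focal unit alone).
    """
    d_c = 0
    for d in range(1, len(g_values)):
        if abs(g_values[d]) > abs(g_values[d - 1]):
            d_c = d          # the statistic is still rising in absolute value
        else:
            break            # first failure to rise: the peak was at d - 1
    return d_c

# The statistic rises up to 2 links, then falls, so d_c = 2
# even though a later value is larger.
example_dc = critical_links([0.8, 1.5, 2.1, 1.9, 2.4])
```

Stopping at the first peak, rather than the global maximum, keeps the scale local, which matches the "finest scale" emphasis of the method.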

AMOEBA: Weight Calculation
When $d_c > 0$,
$$w_{ij} = \frac{P(z \le Z_{d_c}) - P(z \le Z_{d_{ij}})}{P(z \le Z_{d_c}) - P(z \le Z_{0})}, \quad \text{for all } j \text{ where } d_{ij} \le d_c,$$
and $w_{ij} = 0$ otherwise.
When $d_c = 0$, $w_{ij} = 0$ for all $j$.
$P(z)$ is the cumulative probability associated with the standard variate of the normal distribution.
Weights vary between 0 and 1.
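A small numerical sketch of the weight rule may help. It assumes the slide's formula reads w_ij = [P(z ≤ Z at d_c) − P(z ≤ Z at d_ij)] / [P(z ≤ Z at d_c) − P(z ≤ Z at 0 links)] for d_ij ≤ d_c, and 0 otherwise; the function names and the dictionary layout are hypothetical.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def amoeba_weights(z_by_links, d_c, links_to_j):
    """Weights for one focal unit i.

    z_by_links[d]: standardized local statistic at d links (d = 0 .. d_c).
    links_to_j:    mapping from neighbor j to d_ij, its number of links from i.
    """
    if d_c == 0:
        # No local autocorrelation around i: the whole row is zero.
        return {j: 0.0 for j in links_to_j}
    p_c = norm_cdf(z_by_links[d_c])
    p_0 = norm_cdf(z_by_links[0])
    # p_c differs from p_0 whenever d_c > 0, because |Z| rose at least once.
    weights = {}
    for j, d_ij in links_to_j.items():
        if d_ij <= d_c:
            weights[j] = (p_c - norm_cdf(z_by_links[d_ij])) / (p_c - p_0)
        else:
            weights[j] = 0.0
    return weights

w = amoeba_weights([0.5, 1.0, 1.5, 2.0], d_c=3,
                   links_to_j={"a": 1, "b": 2, "c": 3, "d": 4})
```

Under this reading, nearer units receive larger weights, units exactly at d_c receive zero, and the same ratio stays between 0 and 1 for clusters of low values because numerator and denominator change sign together.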

AMOEBA: Links Designations
$d_{ij}$ is the number of links from the focus spatial unit i to another spatial unit j.
$d_c$ is the critical number of links: the number of links from i beyond which no further autocorrelation exists.

AMOEBA as W and U in an Autoregressive Spatial Lag Model
Rows of the weights matrix can be completely filled with zeroes, indicating that there is no local spatial autocorrelation surrounding an observation. To compensate for this zero-row effect, we create a dummy variable, U, that assigns 1 to all observations with no dependence structure and 0 otherwise.
$y = \theta W y + \alpha U + X\beta + \varepsilon$
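The construction of U from W can be sketched directly. This is an illustrative sketch only: `nonspatial_dummy` and `lag_rhs_terms` are hypothetical names, and the small matrix below is made-up data, not the Amman example.

```python
import numpy as np

def nonspatial_dummy(W):
    """U_i = 1 where row i of W is entirely zero (no dependence structure), else 0."""
    W = np.asarray(W, dtype=float)
    return (np.abs(W).sum(axis=1) == 0).astype(float)

def lag_rhs_terms(y, X, W):
    """Columns [W y, U, X]: the right-hand-side terms of the lag model with U."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    return np.column_stack([W @ y, nonspatial_dummy(W), X])

# Made-up example: unit 2 has an all-zero row, so it is flagged by U.
W_demo = np.array([[0.0, 0.6, 0.0],
                   [0.4, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
y_demo = np.array([2.0, 3.0, 5.0])
X_demo = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]])
U_demo = nonspatial_dummy(W_demo)
terms = lag_rhs_terms(y_demo, X_demo, W_demo)
```

Because W y contains the dependent variable, the lag coefficient is normally estimated by maximum likelihood or instrumental variables rather than by regressing y on these columns with ordinary least squares.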

AMOEBA as W and U in an Autoregressive Spatial Error Model
$y = \alpha U + X\beta + (I - \kappa W)^{-1}\varepsilon$

AMOEBA: The non-spatial and spatial matrices
[Matrix display: U, a column vector with four entries equal to 1 and 0 elsewhere; W, a matrix of weights w_{i,j} for 14 spatial units, with no diagonal entries and with blank (zero) entries for the observations that have no dependence structure.]

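Generalized AMOEBA
[Equation: an autoregressive model in partitioned form, with Y, the unit vector 1, W and ε split into blocks (Y_c, 1_c, W_cc, ε_c and their complements) and with parameters α, ρ and β.]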

An Example: Total Fertility Rates, Amman, Jordan, 1994 (data by census units)

[Map of the region. Labels: Mediterranean Sea, Lebanon, Syria, Iraq, Gaza, Palestinian Authority, Israel, Saudi Arabia, Egypt]

Explanatory Variables
Regressor social variables:
1. Percent of females with higher education (called "hi-ed")
2. Percent of females married (called "married")

Ordinary Least Squares (No W or U)
AIC         165.35
t values
constant      6.266
hi-ed       -14.344
married       1.261

AMOEBA in Spatial Error Models
                           AMOEBA      AMOEBA      AMOEBA
             Contiguity    G_i         I_i         c_i
AIC           167.352      79.159     147.43       11.1
t values
constant        6.499       6.499       7.21        6.342
hi-ed         -13.4       -11.55      -13.316      -4.68
married         1.164       1.978       1.227       1.154
lambda          1.634      98.792       1.187      14.5
non-spatial                12.588      -4.48        7.89

Comparison of Spatial Contiguity and AMOEBA: Spatial Error Model
G_i AMOEBA has an AIC much lower than contiguity (79.159 to 16.625).
All AMOEBA models are an improvement over contiguity.
G_i AMOEBA has an extremely high lambda and non-spatial vector: a good descriptor of spatial and non-spatial effects.
G_i AMOEBA shows the social variables to be significant in explaining TFR.

AMOEBA in Spatial Lag Models
                           AMOEBA      AMOEBA      AMOEBA
             Contiguity    G_i         I_i         c_i
AIC            16.625      18.27      148.481     123.881
t values
constant        5.419       3.866       5.68        4.742
hi-ed          -9.927       -.87       -9.51       -8.642
married         1.164       2.16        1.341       1.21
rho             -.5         7.435       1.819       5.443
non-spatial                 7.594      -1.657       8.58

Comparison of Spatial Contiguity and AMOEBA: Spatial Lag Model
Again, all AMOEBA models have a lower AIC than contiguity; G_i AMOEBA is best.
All variables are significant.

Question 2 Can we have confidence in the identification of spatial clusters?

Problems with Spatial Clusters
Not explicit (what is a cluster?)
Are they statistically significant? (degree of confidence)
What is the appropriate spatial scale?
Often arbitrary, too general
Over- and under-identification
Appropriate shape (too circular, ellipsoidal)
In general, the believability problem

AMOEBA Procedure I
For each observation i, local statistic values (e.g., G_i*, Z[I_i], Z[c_i]) are obtained for all combinations of near neighbors j of i within distance d of i. The j observations in the set that maximizes the local statistic become members of the ecotope, together with the i-th observation.


AMOEBA Subsequent Procedures
The procedure is repeated at increasing distances from i. At each distance d from i, only the j observations that are contiguous to the existing ecotope are evaluated. Again using the local statistic, all combinations of these observations, together with the existing ecotope members, are evaluated. The j observations in the new set that maximizes the local statistic become members of the ecotope.
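The growth procedure can be sketched as below. This is a simplified illustration, not the authors' implementation: it uses the Getis-Ord G_i* statistic with binary weights over the current member set, searches only for clusters of high values, and does not handle details the slides leave unstated (for example, how units rejected at an earlier step are treated). The names `g_star` and `grow_ecotope` and the toy data are hypothetical.

```python
from itertools import combinations
import numpy as np

def g_star(x, members):
    """Getis-Ord G_i* with binary weights over `members` (focal unit included)."""
    x = np.asarray(x, dtype=float)
    n, m = x.size, len(members)
    if m == n:
        return 0.0                       # statistic is undefined for the full set
    num = x[list(members)].sum() - x.mean() * m
    den = x.std() * np.sqrt((n * m - m * m) / (n - 1))
    return num / den

def grow_ecotope(x, neighbors, i):
    """Grow an ecotope from unit i by exhaustive search over contiguous additions."""
    ecotope = {i}
    best = g_star(x, ecotope)
    while True:
        # Units contiguous to the current ecotope but not yet members.
        frontier = sorted({j for k in ecotope for j in neighbors[k]} - ecotope)
        best_addition = None
        for size in range(1, len(frontier) + 1):
            for combo in combinations(frontier, size):
                g = g_star(x, ecotope | set(combo))
                if g > best:
                    best, best_addition = g, set(combo)
        if best_addition is None:        # no combination raises the statistic
            return ecotope, best
        ecotope |= best_addition

# Toy example: seven units in a chain with a block of high values in the middle.
values = [0, 0, 5, 6, 5, 0, 0]
chain = {k: [j for j in (k - 1, k + 1) if 0 <= j < 7] for k in range(7)}
members, statistic = grow_ecotope(values, chain, i=3)
```

Evaluating every combination is exponential in the number of contiguous candidates, so the sketch is only practical for small neighborhoods.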


Hypothetical Clusters
[Figure: two panels, each labeled mean = 0, variance = 1]

AMOEBA Example 1
[Figure panels: LSM; AMOEBA G_i; AMOEBA I_i; AMOEBA c_i]

AMOEBA Example 2
[Figure panels: LSM; AMOEBA G_i; AMOEBA I_i; AMOEBA c_i]

AMOEBA Example 3
[Figure panels: LSM; AMOEBA G_i; AMOEBA I_i; AMOEBA c_i]

AMOEBA Example 4
[Figure panels: LSM; AMOEBA G_i; AMOEBA I_i; AMOEBA c_i]

AMOEBA Example 5
[Figure panels: LSM; AMOEBA G_i; AMOEBA I_i; AMOEBA c_i]

Heterogeneous Clusters
This is like the data used in the GA paper.

Homogeneous Clusters
These are the same 6 clusters, with radii 2, 4, and 6. The high clusters have a mean of 0.5 and the low clusters have a mean of -0.5. These means are added to random values from the Normal(0, 1) distribution.
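The homogeneous-cluster surface described here can be simulated along these lines. The grid size, the cluster centers, and the function name `simulate_clusters` are assumptions for illustration; only the radii (2, 4, 6), the cluster means (0.5 and -0.5), and the Normal(0, 1) noise come from the slide.

```python
import numpy as np

def simulate_clusters(shape, clusters, seed=0):
    """Return (signal, data): circular cluster means plus Normal(0, 1) noise.

    clusters: iterable of (row, col, radius, mean) tuples; positions are hypothetical.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.indices(shape)
    signal = np.zeros(shape)
    for r0, c0, radius, mean in clusters:
        inside = (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
        signal[inside] = mean
    return signal, signal + rng.standard_normal(shape)

# Three high and three low clusters with radii 2, 4 and 6 (centers are made up).
layout = [(10, 10, 2, 0.5), (10, 30, 4, 0.5), (10, 50, 6, 0.5),
          (30, 10, 2, -0.5), (30, 30, 4, -0.5), (30, 50, 6, -0.5)]
signal, data = simulate_clusters((40, 60), layout)
```

With a shift of only half a standard deviation, individual cells are hard to tell apart from noise, which is what makes this a demanding test for any cluster-detection method.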

Peaked Clusters

Real World Example
Clustering of dengue hemorrhagic fever in Thailand by province and by month.
14 years of data: 168 monthly observations

STARS: A GIS System Rey, Sergio. Space-Time Analysis of Regional Systems (STARS). Available as an open source program on the Internet.

Other Clustering Algorithms
SaTScan by Kulldorff (1997; v4.0, 2004) (Communications in Statistics)
FleXScan by Tango and Takahashi (2004, 2005) (International Journal of Health Geographics)

Bases of Clustering Methods
AMOEBA: Based on values of the local statistic as d increases in many directions from an index location.
SaTScan: Based on a moving circle of varying radii, searching for the circle that is the least likely to have occurred by chance.
FleXScan: Based on the spatial scan statistic used on irregularly shaped windows formed by connecting adjacent neighbors.

Clustering Methods Tests
AMOEBA. Ho: The sum of the observed values within ecotopes is greater (lesser) than expected by chance. The p value is calculated based on the location of the local statistic values of the observed ecotope within Monte Carlo permutations.
SaTScan. Ho: The sum of observed cases within the circular search region is proportional to the population size. The p value is calculated based on Poisson realizations using the global rate.
FleXScan. Ho and p: Same as SaTScan, but within the irregular search region.
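The Monte Carlo step for the AMOEBA test can be sketched generically: the observed statistic is ranked within the values obtained from random permutations of the data. This is a minimal sketch under assumptions, not the authors' procedure: `permutation_p_value` is a hypothetical name, the statistic is passed in as a function, and the test shown is one-sided for high values.

```python
import numpy as np

def permutation_p_value(x, statistic, n_perm=999, seed=0):
    """Pseudo p value: rank of the observed statistic among permuted values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = statistic(x)
    as_large = sum(statistic(rng.permutation(x)) >= observed for _ in range(n_perm))
    # The observed arrangement counts as one of the n_perm + 1 realizations.
    return (as_large + 1) / (n_perm + 1)

# Made-up data: three high values sit in the candidate cluster (units 0 to 2).
x_demo = np.array([10.0, 10.0, 10.0] + [0.0] * 17)
p_demo = permutation_p_value(x_demo, lambda v: v[:3].sum())
```

In a fuller version the ecotope search itself would be rerun on each permuted dataset, so that the reference distribution reflects the flexibility of the search.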

Clustering Comparison
                          High Risk Provinces       Low Risk Provinces
                          Cluster    No Cluster     Cluster    No Cluster
Relative Risk Expected      38           0             0          178
AMOEBA                      34           4             0          178
SaTScan                     35           3            21          157
FleXScan                    35           3             3          175
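The comparison can be summarized as two rates per method. The sketch below reads the blank cells of the table as 0, which the row totals (38 high-risk and 178 low-risk provinces in every row) support; the labels "hit rate" and "false-alarm rate" and the names in the code are ours, not the slide's.

```python
# (high risk in cluster, high risk not in cluster,
#  low risk in cluster,  low risk not in cluster)
counts = {
    "AMOEBA":   (34, 4, 0, 178),
    "SaTScan":  (35, 3, 21, 157),
    "FleXScan": (35, 3, 3, 175),
}

def rates(high_in, high_out, low_in, low_out):
    """Share of high-risk provinces captured and of low-risk provinces wrongly included."""
    hit_rate = high_in / (high_in + high_out)
    false_alarm_rate = low_in / (low_in + low_out)
    return hit_rate, false_alarm_rate

summary = {name: rates(*cells) for name, cells in counts.items()}
for name, (hit, false_alarm) in summary.items():
    print(f"{name:9s} hit rate {hit:.3f}  false-alarm rate {false_alarm:.3f}")
```

On these counts the three methods capture a similar share of the high-risk provinces (34 or 35 of 38), while they differ mainly in how many low-risk provinces they place in clusters: 0 for AMOEBA, 3 for FleXScan, and 21 for SaTScan.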