The Use of Spatial Weights Matrices and the Effect of Geometry and Geographical Scale

António Manuel RODRIGUES 1, José António TENEDÓRIO 2

1 Research fellow, e-geo Centre for Geography and Regional Planning, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, Email: amrodrigues@fcsh.unl.pt
2 Associate Professor, Department of Geography and Regional Planning, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, Email: ja.tenedorio@fcsh.unl.pt

ABSTRACT

Spatial analysis of lattice data depends greatly on the way distances between spatial units (regions) are measured. These distances are the basis of the weights matrices used to obtain statistics which take into account neighborhood relations between agents. In the social sciences, data are normally collected for sets of regions. The method used to divide the spatial surface into irregular units influences inferences about social phenomena, mainly because these units are neither homogeneous nor regular. This paper tests the effect of different data aggregation methodologies on global spatial autocorrelation statistics. It is shown that regional consistency and administrative functions positively influence the existence and observation of clear spatial patterns. Resident population is the variable chosen; two study areas and five different datasets are used to consistently test the effect of different spatial aggregation methods.

KEYWORDS

Spatial autocorrelation, spatial weights matrix, spillover effects

1. INTRODUCTION

What all geographical phenomena have in common is that their location can be identified relative to some model of the surface of the Earth. Moreover, any individual or collective action is conditioned by (and conditions) the actions taken by agents located nearby. Hence, we may speak of a contagion or spillover effect which results in clear spatial patterns.
Distance becomes a chief component in any geographical model, because this variable summarizes the relation in space between intervening agents. In statistics, because geographical information is commonly compiled and made available for a set of spatial units/regions (particularly in the social sciences), a set of Exploratory Spatial Data Analysis tools has been developed which measure the relations between these units. Central to this set of tools is the use of spatial weights matrices, where each cell represents the geographical relation between a pair of spatial units. Distance measurement methodologies range from the existence of contiguity to the quantification of time-distances between centroids or urban nodes. Because there are no clear and formal methodologies to assist in choosing the most adequate proximity measure, model robustness is somewhat endangered. For this reason, this is a field where further research is needed (Getis 2009).

This paper tests the effect of distinct spatial data aggregation methods on measures of spatial autocorrelation. Through the use of different specifications for the spatial weights matrix, sensitivity to size and irregularity will be tested using sets of administrative regions and regular hexagonal grids superimposed on the study-areas. Observed variations in global spatial autocorrelation statistics serve as guidelines for further research. The main objective is to test the stability of spatial autocorrelation measures to different methods of spatial data aggregation and to the use of different distance metrics.

Most individual actions and, in general, spatial individual attributes (e.g. where someone works, lives or goes shopping) take place in a specific point location. Yet, most available datasets aggregate such attributes according to pre-defined regional settings. There are several problems related to these aggregation exercises. First, using summarized indicators which refer to the average behavior of individuals or the sum of some demographic attribute may lead to misinterpretations of social phenomena, since the internal area of each spatial unit is not homogeneous. The second problem is that individual attributes and actions are in general not independent in space; there are spatial patterns associated with each phenomenon which may be quantified using spatial autocorrelation metrics. These measure the relation between the values of an attribute aggregated for a number of spatial units/regions and the weighted mean of the same variable over a set of neighbors. These exploratory measures are highly dependent on the concept of neighborhood used, that is, on the way distance between neighbors is quantified.

The next section starts by describing the datasets used, as well as the study area. This is followed by a brief discussion of the types of spatial weights matrices used as well as the global spatial autocorrelation statistics. Section 3 describes the results and section 4 consists of concluding remarks.

2. METHODOLOGY

This paper intends to analyze how, given a spatial object/surface, the inferences made in respect to any anthropic or socio-economic variable are affected by structural characteristics of the data. Moreover, the study of spatial phenomena is particularly sensitive to the method used to measure proximity between spatial units/regions. These proximity measures are in fact the backbone of geographical data analysis, since they are the distinctive characteristic of spatial analysis. Two study areas will be analyzed: Continental Portugal and the Lisbon local council.
For each, the distribution of resident population (Census 2001 data) is the variable of interest. Resident population, rather than population density, was chosen since one objective of the study is to examine how regional irregularities affect spatial analysis.

2.1. Study-areas and data

Figure 1: Continental Portugal: NUTS 3 (n=28), local councils (n=278), hexagonal grid (n=278)

The spatial independence of observations will be tested in order to identify existing spatial patterns. For each study area, the distribution of the variable of interest will be examined for different geographical levels of aggregation: in respect to Continental Portugal, the datasets used correspond to the NUTS 3 regions (28 spatial units) and the local councils (278). An extra dataset will be used, which results from the aggregation of the data from the highest disaggregation level (over 170 thousand census tracts) according to a set of 278 regular hexagons. In respect to the Lisbon area, two geographical datasets will be

where $f_{ij}$ represents the common border between regions $i$ and $j$. In the present study, a binary specification was chosen, since one of the spatial structures analyzed is a regular hexagonal grid, in which case contiguity measures are equal across the spatial plane. For the same reason, the chosen nearest neighbors matrix is a binary matrix, where each element $w_{ij}$ is equal to one when $j$ belongs to the set of the $k$ nearest neighbors of $i$ and zero otherwise. Formally:

$$ w_{ij} = \begin{cases} 1 & \text{if } j \in N_k(i) \\ 0 & \text{otherwise} \end{cases} $$

where $N_k(i)$ is the set of the $k$ nearest neighbors of $i$.
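The two binary specifications described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the border list and the centroid coordinates are hypothetical, and nearest neighbors are assumed to be measured by Euclidean distance between centroids.

```python
import numpy as np

def contiguity_weights(borders, n):
    """Binary contiguity matrix: w[i, j] = 1 if units i and j share a border."""
    w = np.zeros((n, n), dtype=int)
    for i, j in borders:
        w[i, j] = 1
        w[j, i] = 1  # contiguity is symmetric
    return w

def knn_weights(centroids, k):
    """Binary k-nearest-neighbor matrix: w[i, j] = 1 if j is one of the k units closest to i."""
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between centroids (an assumption of this sketch)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a unit is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    w = np.zeros((n, n), dtype=int)
    w[np.arange(n)[:, None], nearest] = 1
    return w

# Hypothetical example: four units in a row, centroids at x = 0, 1, 3, 6
W_c = contiguity_weights([(0, 1), (1, 2), (2, 3)], n=4)
W_k = knn_weights([(0, 0), (1, 0), (3, 0), (6, 0)], k=1)
```

Unlike the contiguity matrix, the nearest neighbors matrix is in general not symmetric: in this example unit 1 is the nearest neighbor of unit 2, but the nearest neighbor of unit 1 is unit 0. Every row of the nearest neighbors matrix sums to k, whereas the number of contiguous neighbors varies from unit to unit.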

2.2. Spatial Autocorrelation

In this study, two global spatial autocorrelation statistics are used: Moran's I and Geary's C, equations 1a and 1b (Getis 2008). Both use the set of neighborhoods specified in the spatial weights matrix in order to obtain a measure of the co-variation of the variable of interest in relation to the spatially weighted mean of the set of neighbors. In their standard form, with $x_i$ the value of the variable in region $i$, $\bar{x}$ its mean, $w_{ij}$ the elements of the spatial weights matrix and $n$ the number of spatial units:

$$ I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2} \qquad (1a) $$

$$ C = \frac{n - 1}{2\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - x_j)^2}{\sum_{i} (x_i - \bar{x})^2} \qquad (1b) $$

While Moran's I varies between -1 and 1 (-1 representing perfect negative autocorrelation and 1 perfect positive), Geary's C varies between 0 (perfect positive) and 2 (perfect negative). While the former is centered around 0, the latter is centered around 1.

3. RESULTS

As mentioned above, the variable of interest in the present study, resident population, is aggregated according to different regional schemas. Figures 3 and 4 show the spatial distribution of the variable in Continental Portugal according to the existing 278 local councils (figure 3) and in the Lisbon local council according to the 53 existing local authorities (figure 4). The map in figure 3 also shows the borders of the 28 NUTS 3 regions which cover the study-area. In relation to Continental Portugal, there is a concentration of population in the North and along the coastal area. Figures 3 and 4 also include the graphical representation of the empirical density functions, which show in both cases the existence of a small group of regions with large resident population numbers. In relation to Continental Portugal, it is interesting to note the existence of an intermediate group of medium to large size local councils. Figure 4 also shows the empirical density associated with resident population values for the hexagonal grid. It is possible to conclude that eliminating shape differences does not eliminate the skewness in the data.

Figure 3: Spatial distribution of resident population (Continental Portugal)

Figure 4: Spatial distribution of resident population (Lisbon local council)

Next, the two global spatial autocorrelation statistics, Moran's I and Geary's C, were calculated for the two study areas, using first a contiguity spatial weights matrix. In relation to Continental Portugal, the statistics were computed for the distribution of resident population for the 28 NUTS 3 regions, the 278 local councils and the 278 regular hexagons. For the NUTS 3 regions, there is weak but significant positive autocorrelation in the data, a fact explained by the non-functional nature of this regional aggregation level and the large mean area. NUTS regions were created for statistical purposes and in most cases lack physical, social and/or urban coherence. When data is disaggregated at the local council level, spatial autocorrelation rises (higher Moran's I and lower Geary's C). Hence, for the same variable, it is shown that spatial patterns are considerably different as the geographical level of analysis changes.

Still related to Continental Portugal, as mentioned above, the same data was aggregated according to a hexagonal grid with the number of spatial units equal to the number of local councils. In this case, positive spatial autocorrelation decreases according to both statistics. Again, regular hexagonal grids lack functional coherence, although the values are higher than at the NUTS 3 level. This is not surprising, since for both indicators the number of spatial units (n) is in the numerator.

Table 1: Global spatial autocorrelation statistics

For the Lisbon local council, a similar behavior is observed. There is strong positive autocorrelation in the data at the administrative level (local authorities), whilst the spatial pattern is weaker when a hexagonal grid is used. Still for this study-area, it was considered important to analyze, using a Moran scatterplot, the influence of outliers. This was so given the observed differences in terms of area of spatial units as distance from the older urban core increases.
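As an illustration of how the two global statistics respond to a given weights matrix, the following minimal sketch computes Moran's I and Geary's C from their standard textbook definitions. It is not the software used in the study: the function names and the toy data (four units in a row with a binary contiguity matrix) are hypothetical.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    s0 = w.sum()  # sum of all weights
    return len(x) / s0 * (z @ w @ z) / (z @ z)

def gearys_c(x, w):
    """Geary's C: ((n - 1) / (2 * S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    sq_diff = (x[:, None] - x[None, :]) ** 2  # squared differences for every pair
    return (len(x) - 1) / (2 * s0) * (w * sq_diff).sum() / (z @ z)

# Hypothetical contiguity matrix for four units in a row
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

smooth = [1, 2, 3, 4]       # neighboring values are similar
alternating = [1, 4, 1, 4]  # neighboring values are dissimilar

i_smooth, c_smooth = morans_i(smooth, W), gearys_c(smooth, W)
i_alt, c_alt = morans_i(alternating, W), gearys_c(alternating, W)
```

For the smooth series the sketch gives I = 1/3 and C = 0.3 (positive autocorrelation: I above 0, C below 1), while for the alternating series it gives I = -1 and C = 1.5 (negative autocorrelation). Passing a different weights matrix for the same data changes both values, which is the sensitivity examined in this paper.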

Figure 5: Moran scatterplot (administrative units and hexagonal grid), Lisbon local council

The Moran scatterplot represents on the horizontal axis the variable of interest and on the vertical axis the spatially weighted mean value of the same variable. Figure 5 shows that there is a group of administrative units whose values of resident population and of the lagged variable are well above the rest. Moreover, all these regions are located away from the old urban core. In order to test the influence of heterogeneity in terms of regional size, global spatial autocorrelation was computed for a set of units whose area is below the median (figure 6a). This represented a total of 27 spatial units, only one of which was located away from the core; this one was excluded from the dataset, as contiguity would be lost otherwise. Finally, in order to test the effect of the size of spatial units, two datasets were merged: one representing the 26 urban core local authorities and the other the aggregated census tracts (note 1). This resulted in a dataset with 728 spatial units (figure 6b).

Note 1: Census tracts, in Portugal, have two levels of aggregation: smaller units called sub-secções and larger ones called secções. The latter, called aggregated census tracts in the main text, were used in this exercise.

Figure 6: (a) Set of the 27 smaller administrative units; (b) Set of 728 spatial units, Lisbon local council

Spatial autocorrelation for the old urban core, in spite of the small number of spatial units (26), is very high (Moran's I = 0.62 and Geary's C = 0.26), with both statistics significant at the 99% level. This demonstrates that for a small set of small functional urban regions, homogeneous in terms of area, spatial patterns are very strong. For the merged dataset, consisting of 728 spatial units covering the whole of the Lisbon local council, autocorrelation is weaker, in spite of the much higher number of regions (Moran's I = 0.42 and Geary's C = 0.26). This highlights a behavior which is consistent in all datasets: functional, long-established administrative limits condition self-organizing urban growth around central administrative nodes; it stresses the importance of the administrative function in shaping urban form. The greater interdependence represented by high positive spatial autocorrelation is evidence of long-established regularity. Finally, this regularity, highlighted by strong positive spatial autocorrelation, is also likely to be a result of internal homogeneity in terms of type of construction.

Figure 7: Moran's I statistic using $W_k$ spatial weights matrices with varying number of neighbors

The final objective of the paper was to test the impact of an extra form for the spatial weights matrix: $W_k$ matrices with a varying number of neighbors. Figure 7 shows the variation of the Moran's I statistic as the number of neighbors increases. The first point to be made is that the statistic reaches its maximum when k is equal to or less than three, which is important to note considering that the mean number of contiguous regions in all datasets is between five and six. Also, for the NUTS 3 dataset, Moran's I is highest with k equal to one, which results from the large size and non-functional nature of this geographical level. Also, and confirming prior results, autocorrelation is stronger when using administrative regions (local councils and local authorities). This confirms that regularity comes second to function.

4. CONCLUDING REMARKS

This article intended to demonstrate how sensitive measures of global spatial autocorrelation are to the geographical level of aggregation as well as to the regularity of spatial units in terms of size and shape. Using several datasets for two study-areas, it was shown that, first, consistency in terms of self-organizing human activity around urban nodes at the national level results in stronger spatial patterns.
This factor is more important than regularity, as the values for the autocorrelation statistics are smaller when using hexagonal grids. Also, at the urban level, regularity of the urban core, centered around administrative units, results in higher values for the statistics. This result is amplified by the weaker patterns observed when considering a large dataset of administrative regions and census tracts combined. This shows that census tracts lack functional homogeneity.

The results of this study are important in terms of providing some guidelines for when global spatial autocorrelation measures are used. In particular, when including spatial dependence in statistical models, one should take into consideration the nature of the geographical units; in other words, it is important to study the regularity of the spatial units and their functional homogeneity. Nonetheless, the results should be confirmed in future studies using different datasets, so as to reach a more general set of rules for choosing the correct set of regions, or at least to acknowledge the effect of differences in terms of spatial aggregation methodologies.

References

Anselin, L. (1992) Spatial Data Analysis with GIS: An Introduction to Application in the Social Sciences, Technical Report 92-10, National Center for Geographic Information and Analysis, University of California
Bivand, R., Pebesma, E., Gómez-Rubio, V. (2008) Applied Spatial Data Analysis, Series Use R, Springer
Cliff, A., Ord, K. (1970) Spatial Autocorrelation: A Review of Existing and New Measures with Applications, Economic Geography, 46, pp. 269-292
Getis, A. (2007) Reflections on spatial autocorrelation, Regional Science and Urban Economics, 37, pp. 491-496
Getis, A. (2008) A History of the Concept of Spatial Autocorrelation: A Geographer's Perspective, Geographical Analysis, 40, pp. 297-309
Getis, A. (2009) Spatial Weights Matrices, Geographical Analysis, 41, pp. 404-410
Getis, A., Ord, J. (1992) The Analysis of Spatial Association by Use of Distance Statistics, Geographical Analysis, 24, pp. 189-206
Goodchild, M. (2009) What Problem? Spatial Autocorrelation and Geographic Information Science, Geographical Analysis, 41, pp. 411-417
Rodrigues, A. (2006) Labour Productivity Dynamics in Europe: Alternative Explanations for a Well Known Problem, Rome, International Workshop on Spatial Econometrics and Statistics
Rogerson, P. (2001) Statistical Methods for Geography, SAGE Publications