The Use of Spatial Weights Matrices and the Effect of Geometry and Geographical Scale

António Manuel RODRIGUES 1, José António TENEDÓRIO 2

1 Research fellow, e-geo Centre for Geography and Regional Planning, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, Email: amrodrigues@fcsh.unl.pt
2 Associate Professor, Department of Geography and Regional Planning, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, Email: ja.tenedorio@fcsh.unl.pt

ABSTRACT
Spatial analysis of lattice data depends greatly on the way distances between spatial units (regions) are measured. These distances are the basis of the weights matrices used to obtain statistics which take into account neighborhood relations between agents. In the social sciences, data is normally collected for sets of regions. The method used to divide the spatial surface into irregular units influences inferences about social phenomena, mainly because these units are neither homogeneous nor regular. This paper tests the effect of different data aggregation methodologies on global spatial autocorrelation statistics. It is shown that regional consistency and administrative functions positively influence the existence and observation of clear spatial patterns. Resident population is the variable chosen; two study areas and five different datasets are used to consistently test the effect of different spatial aggregation methods.

KEYWORDS
Spatial autocorrelation, spatial weights matrix, spillover effects

1. INTRODUCTION
What all geographical phenomena have in common is that their location can be identified relative to a model which represents the surface of the Earth. Moreover, any individual or collective action conditions, and is conditioned by, actions taken by agents located nearby. Hence, we may speak of a contagion or spillover effect, which results in clear spatial patterns.
Distance becomes a chief component in any geographical model, because this variable summarizes the relation in space between intervening agents. In statistics, because geographical information is commonly compiled and made available for a set of spatial units/regions (particularly in the social sciences), a set of Exploratory Spatial Data Analysis tools has been developed to measure the relations between these units. Central to this set of tools is the use of spatial weights matrices, where each cell represents the geographical relation between a pair of spatial units. Distance measurement methodologies range from the existence of contiguity to the quantification of time-distances between centroids or urban nodes. Because there are no clear and formal methodologies to assist in the choice of the most adequate proximity measure, model robustness is somewhat endangered. For this reason, this is a field where further research is needed (Getis 2009).

This paper tests the effect of distinct spatial data aggregation methods on measures of spatial autocorrelation. Through the use of different specifications for the spatial weights matrix, sensitivity to size and irregularity will be tested using sets of administrative regions and regular hexagonal grids superimposed on the study areas. Observed variations in global spatial autocorrelation statistics serve as guidelines for further research. The main objective is to test the stability of spatial autocorrelation measures under different methods of spatial data aggregation and different distance metrics.

Most individual actions and, in general, spatial individual attributes (e.g. where someone works, lives or goes shopping) take place in a specific point location. Yet most available datasets aggregate such attributes according to pre-defined regional settings. There are several problems related to these aggregation exercises. First, using summarized indicators which refer to the average behavior of individuals, or to the sum of some demographic attribute, may lead to misinterpretations of social phenomena, since the internal area of each spatial unit is not homogeneous. The second problem is that individual attributes and actions are in general not independent in space; there are spatial patterns associated with each phenomenon which may be quantified using spatial autocorrelation metrics. These measure the relation between the values of an attribute aggregated for a number of spatial units/regions and the weighted mean of the same variable over a set of neighbors. These exploratory measures are highly dependent on the concept of neighborhood used, that is, on the way distance between neighbors is quantified.

The next section starts by describing the datasets used, as well as the study areas. This is followed by a brief discussion of the types of spatial weights matrices used and of the global spatial autocorrelation statistics. Section 3 describes the results and section 4 consists of concluding remarks.

2. METHODOLOGY
This paper intends to analyze how, given a spatial object/surface, the inferences made with respect to any anthropic or socio-economic variable are affected by structural characteristics of the data. Moreover, the study of spatial phenomena is particularly sensitive to the method used to measure proximity between spatial units/regions. These proximity measures are in fact the backbone of geographical data analysis, since they are the distinctive characteristic of spatial analysis. Two study areas will be analyzed: Continental Portugal and the Lisbon local council.
For each, the distribution of resident population (Census 2001 data) is the variable of interest. Resident population, rather than population density, was chosen since one objective of the study is to examine how regional irregularities affect spatial analysis.

2.1. Study-areas and data

Figure 1: Continental Portugal: NUTS3 (n=28), local councils (n=278), hexagonal grid (n=278)

The spatial independence of observations will be tested in order to identify existing spatial patterns. For each study area, the distribution of the variable of interest will be examined for different geographical levels of aggregation. With respect to Continental Portugal, the datasets used correspond to the NUTS 3 regions (28 spatial units) and the local councils (278). An extra dataset will be used, which results from aggregating the data at the highest disaggregation level (over 170 thousand census tracts) according to a set of 278 regular hexagons. In respect to the Lisbon area, two geographical datasets will be

where $f_{ij}$ represents the common border between regions $i$ and $j$. In the present study, a binary specification was chosen, since one of the spatial structures analyzed is a regular hexagonal grid, in which case contiguity measures are equal across the spatial plane. For the same reason, the chosen nearest neighbors matrix is a binary matrix, where each element $w_{ij}$ is equal to one when $j$ belongs to the set of the $k$ nearest neighbors of $i$. Formally: $w_{ij} = 1$ if $j \in N_k(i)$ and $w_{ij} = 0$ otherwise, where $N_k(i)$ is the set of $i$'s $k$ nearest neighbors.
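
As an illustration of the two binary specifications described above, the following Python sketch builds a contiguity matrix from shared border lengths and a k-nearest-neighbors matrix from centroid coordinates. The function names, the use of NumPy, the Euclidean distance between centroids and the tie-breaking rule are assumptions made for this example; they are not taken from the study.

```python
import numpy as np


def contiguity_weights(borders):
    """Binary contiguity weights from a matrix of shared border lengths f_ij.

    w[i, j] = 1 when regions i and j share a border (f_ij > 0), else 0.
    """
    f = np.asarray(borders, dtype=float)
    w = (f > 0).astype(int)
    np.fill_diagonal(w, 0)  # a region is not its own neighbor
    return w


def knn_weights(coords, k):
    """Binary k-nearest-neighbors weights from centroid coordinates.

    w[i, j] = 1 when j is among the k centroids closest to i, else 0.
    Ties are broken by index order (an assumption of this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    # Pairwise Euclidean distances between centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude each unit from its own neighbor set
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    w = np.zeros((n, n), dtype=int)
    w[np.arange(n)[:, None], nearest] = 1
    return w


# Hypothetical centroids on a line; real centroids would come from the region polygons.
example_centroids = [[0, 0], [1, 0], [3, 0], [7, 0]]
example_w = knn_weights(example_centroids, k=1)
print(example_w)
```

Note that a nearest-neighbors matrix built this way is generally not symmetric: a region may count another among its k nearest neighbors without the reverse being true, whereas a contiguity matrix is symmetric by construction.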

2.2. Spatial Autocorrelation
In this study, two global spatial autocorrelation statistics are used: Moran's I and Geary's C (equations 1a and 1b; Getis 2008). Both use the set of neighborhoods specified in the spatial weights matrix to obtain a measure of the co-variation of the variable of interest in relation to the spatially weighted mean of the set of neighbors. Moran's I varies between -1 and 1 (-1 representing perfect negative autocorrelation and 1 perfect positive autocorrelation), whereas Geary's C is non-negative and typically varies between 0 (perfect positive autocorrelation) and 2 (strong negative autocorrelation). While the former is centered around 0 (its expected value under spatial independence is -1/(n-1)), the latter is centered around 1.

3. RESULTS
The variable of interest in the present study, resident population, is, as mentioned above, aggregated according to different regional schemas. Figures 3 and 4 show the spatial distribution of the variable in Continental Portugal according to the existing 278 local councils (figure 3) and in the Lisbon local council according to the 53 existing local authorities (figure 4). The map in figure 3 also shows the borders of the 28 NUTS 3 regions which cover the study area. In Continental Portugal, population is concentrated in the North and along the coast. Figures 3 and 4 also include the graphical representation of the empirical density functions, which show in both cases the existence of a small group of regions with large resident population numbers. For Continental Portugal, it is interesting to note the existence of an intermediate group of medium to large local councils. Figure 4 also shows the empirical density associated with resident population values for the hexagonal grid. It is possible to conclude that eliminating shape differences does not eliminate the skewness in the data.

Figure 3: Spatial distribution of resident population (Continental Portugal)
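
For reference, the two global statistics named in section 2.2 are commonly written as below. This is the standard textbook notation and is offered only as an assumption about the form of equations 1a and 1b, with $x_i$ the value of the variable in region $i$, $\bar{x}$ its mean over the $n$ regions, $w_{ij}$ the elements of the spatial weights matrix and $S_0$ the sum of all weights.

```latex
% Moran's I (assumed form of equation 1a)
I = \frac{n}{S_0} \,
    \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Geary's C (assumed form of equation 1b)
C = \frac{n - 1}{2 S_0} \,
    \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - x_j)^2}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Sum of all weights
S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}
```

Moran's I is built on cross-products of deviations from the mean, whereas Geary's C is built on squared differences between pairs of neighbors, which is why the two statistics move in opposite directions as positive autocorrelation strengthens.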

Figure 4: Spatial distribution of resident population (Lisbon local council)

Next, the two global spatial autocorrelation statistics, Moran's I and Geary's C, were calculated for the two study areas, first using a contiguity spatial weights matrix. For Continental Portugal, the statistics were computed for the distribution of resident population over the 28 NUTS 3 regions, the 278 local councils and the 278 regular hexagons. For the NUTS 3 regions, there is significant but weak positive autocorrelation in the data, which is explained by the non-functional nature of this regional aggregation level and by its large mean area. NUTS regions were created for statistical purposes and in most cases lack physical, social and/or urban coherence. When data is disaggregated to the local council level, spatial autocorrelation rises (higher Moran's I and lower Geary's C). Hence, for the same variable, spatial patterns are shown to differ considerably as the geographical level of analysis changes. Still for Continental Portugal, as mentioned above, the same data was aggregated according to a hexagonal grid with the number of spatial units equal to the number of local councils. In this case, positive spatial autocorrelation decreases according to both statistics. Again, regular hexagonal grids lack functional coherence, although the values are higher than at the NUTS 3 level. Since the number of spatial units (n) appears in the numerator of both indicators, this is not surprising.

Table 1: Global spatial autocorrelation statistics

For the Lisbon local council, a similar behavior is observed. There is strong positive autocorrelation in the data at the administrative level (local authorities), whilst the spatial pattern is weaker when a hexagonal grid is used. Still for this study area, it was considered important to analyze the influence of outliers using a Moran scatterplot, given the observed differences in the area of spatial units as distance from the older urban core increases.
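
The global statistics and the Moran scatterplot coordinates used in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the computation used in the study: it uses the standard textbook forms of Moran's I and Geary's C, a row-standardized spatial lag for the vertical axis of the scatterplot, hypothetical function names, and a toy chain of four regions in place of the census data. Significance testing is left out.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I for values x and spatial weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()  # deviations from the mean
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


def gearys_c(x, w):
    """Global Geary's C for values x and spatial weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2  # (x_i - x_j)^2 for every pair
    return ((x.size - 1) / (2.0 * w.sum())) * (w * sq_diff).sum() / (z @ z)


def spatial_lag(x, w):
    """Weighted mean of x over each unit's neighbors (row-standardized w)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every unit needs at least one neighbor")
    return (w / row_sums) @ x


# Toy example: four regions in a chain, each contiguous to the next one.
chain_w = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
population = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical values

print("Moran's I:", morans_i(population, chain_w))
print("Geary's C:", gearys_c(population, chain_w))
# Moran scatterplot: x on the horizontal axis, its spatial lag on the vertical axis.
print("Scatterplot points:", list(zip(population, spatial_lag(population, chain_w))))
```

The same functions can be applied to any binary weights matrix, so the statistic can be recomputed for contiguity matrices and for nearest-neighbors matrices with different values of k without changing the code.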

Figure 5: Moran scatterplot (administrative units and hexagonal grid), Lisbon local council

The Moran scatterplot represents on the horizontal axis the variable of interest and on the vertical axis the spatially weighted mean value of the same variable. Figure 5 shows that there is a group of administrative units whose values of resident population and of the lagged variable are well above the rest. Moreover, all these regions are located away from the old urban core. In order to test the influence of heterogeneity in terms of regional size, global spatial autocorrelation was computed for a set of units whose area is below the median (figure 6a). This represented a total of 27 spatial units, only one of which was located away from the core; this one was excluded from the dataset, as contiguity would otherwise be lost. Finally, in order to test the effect of the size of spatial units, two datasets were merged: one representing the 26 urban core local authorities and the other the aggregated census tracts¹. This resulted in a dataset with 728 spatial units (figure 6b).

Figure 6: (a) Set of the 27 smaller administrative units; (b) Set of 728 spatial units, Lisbon local council

Spatial autocorrelation for the old urban core, in spite of the small number of spatial units (26), is very strong (Moran's I = 0.62 and Geary's C = 0.26), with both statistics significant at the 99% level. This demonstrates that for a small set of small functional urban regions, homogeneous in terms of area, spatial patterns are very strong. For the merged dataset, consisting of 728 spatial units covering the whole of the Lisbon local council, autocorrelation is weaker, in spite of the much higher number of regions (Moran's I = 0.42 and Geary's C = 0.26). This highlights a behavior which is consistent across all datasets: functional, long-established administrative limits condition self-organizing urban growth around central administrative nodes. It stresses the importance of the administrative function in shaping urban form. The greater interdependence represented by high positive spatial autocorrelation is evidence of long-established regularity. Finally, this regularity is also likely to be a result of internal homogeneity in terms of type of construction.

¹ Census tracts in Portugal have two levels of aggregation: smaller units called sub-secções and larger ones called secções. The latter, referred to in the main text as aggregated census tracts, were used in this exercise.

Figure 7: Moran's I statistic using W_k spatial weights matrices with varying number of neighbors

The final objective of the paper was to test the impact of an additional form of the spatial weights matrix: W_k matrices with a varying number of neighbors. Figure 7 shows the variation of the Moran's I statistic as the number of neighbors increases. The first point to be made is that the statistic reaches its maximum when k is equal to or less than three, which is worth noting given that the mean number of contiguous regions in all datasets is between five and six. Also, for the NUTS 3 dataset, Moran's I is highest with k equal to one, which results from the large size and non-functional nature of this geographical level. Finally, confirming prior results, autocorrelation is stronger when using administrative regions (local councils and local authorities). This confirms that regularity comes second to function.

4. CONCLUDING REMARKS
This article intended to demonstrate how sensitive measures of global spatial autocorrelation are to the geographical level of aggregation, as well as to the regularity of spatial units in terms of size and shape. Using several datasets for two study areas, it was shown that, first, consistency in terms of self-organizing human activity around urban nodes at the national level results in stronger spatial patterns.
This factor is more important than regularity, as the values of the autocorrelation statistics are smaller when using hexagonal grids. Second, at the urban level, regularity of the urban core, centered around administrative units, also results in higher values for the statistics. This result is reinforced by the weaker patterns observed when considering a large dataset of administrative regions and census tracts combined, which shows that census tracts lack functional homogeneity.

The results of this study provide some guidelines for the use of global spatial autocorrelation measures. In particular, when including spatial dependence in statistical models, one should take into consideration the nature of the geographical units; in other words, it is important to study the regularity of the spatial units and their functional homogeneity. Nonetheless, the results should be confirmed in future studies using different datasets, so as to reach a more general set of rules for choosing the correct set of regions, or at least to acknowledge the effect of differences in spatial aggregation methodologies.

References
Anselin, L. (1992) Spatial Data Analysis with GIS: An Introduction to Application in the Social Sciences, Technical Report 92-10, National Center for Geographic Information and Analysis, University of California
Bivand, R., Pebesma, E., Gómez-Rubio, V. (2008) Applied Spatial Data Analysis, Series Use R, Springer
Cliff, A., Ord, K. (1970) Spatial Autocorrelation: A Review of Existing and New Measures with Applications, Economic Geography, 46, pp. 269-292
Getis, A. (2007) Reflections on spatial autocorrelation, Regional Science and Urban Economics, 37, pp. 491-496
Getis, A. (2008) A History of the Concept of Spatial Autocorrelation: A Geographer's Perspective, Geographical Analysis, 40, pp. 297-309
Getis, A. (2009) Spatial Weights Matrices, Geographical Analysis, 41, pp. 404-410
Getis, A., Ord, J. (1992) The Analysis of Spatial Association by Use of Distance Statistics, Geographical Analysis, 24, pp. 189-206
Goodchild, M. (2009) What Problem? Spatial Autocorrelation and Geographic Information Science, Geographical Analysis, 41, pp. 411-417
Rodrigues, A. (2006) Labour Productivity Dynamics in Europe: Alternative Explanations for a Well Known Problem, Rome, International Workshop on Spatial Econometrics and Statistics
Rogerson, P. (2001) Statistical Methods for Geography, SAGE Publications