
Clustering: global indexes (to measure the global degree of clustering for the whole set of events)
-> methods based on quadrats (joint count) vs. methods based on distances

AVERAGE NEAREST NEIGHBOUR: is the distance between events smaller (clustering) or larger (inhibitory pattern) than the distance expected under complete spatial randomness (CSR)? (Clark-Evans, 1950s)
Nearest neighbour ratio = observed mean distance / expected mean distance (CSR)
-> Toolbox / Spatial statistics / Analyzing patterns
Input: Points: unweighted (= 1) / Projected coordinate system! (Polygons and lines: converted into points with x, y = centroids)
Output:
- Observed Mean Distance
- Expected Mean Distance
- Nearest Neighbor Index
- Graphic report
- Test variables:
  p-value: probability that the observed spatial distribution is the result of a random process
  z-score: number of standard deviations separating the observed value from the expected value
- Measure the ANN for firms within the GRA (selection of rm_immig.shp)

Bivariate point patterns: co-agglomeration, co-location, competition/cooperation, related variety: Bivariate/Cross K function, Pairwise interaction point process.. Crimestat, R..
Risk-Adjusted Nearest Neighbor Hierarchical Spatial Clustering (Rnnh) (Crimestat): clustering index in which the probability of identifying clusters for certain categories of events is assessed in relation to the spatial distribution of all events, by using an interpolation between the (kernel) density surfaces of the primary file (e.g. crimes) and the secondary file (e.g. population)
Multi-variate point patterns ( ). -> Bivariate point pattern analysis for each pair of patterns
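The nearest neighbour ratio above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: the function name and the sample grid are hypothetical, the study area is passed in by hand, and the expected distance and standard error use the usual Clark-Evans expressions for CSR (0.5 / sqrt(n / A) and 0.26136 / sqrt(n^2 / A)).

```python
import math

def average_nearest_neighbor(points, area):
    """Return (observed mean distance, expected mean distance, ratio, z-score)."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # distance from point i to its closest other point
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)          # mean distance under CSR
    std_error = 0.26136 / math.sqrt(n * n / area)
    z_score = (observed - expected) / std_error
    return observed, expected, observed / expected, z_score

# Hypothetical data: a perfectly regular 4 x 4 grid in a 4 x 4 study area
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
observed, expected, ratio, z_score = average_nearest_neighbor(grid, area=16.0)
print(observed, expected, ratio)  # ratio > 1 points to an inhibitory (dispersed) pattern
```

A ratio below 1 would instead point to clustering. Because the area enters the expected distance directly, the same points give a different ratio when the study area changes.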

Clustering processes at different scales
In the figure: 10 clusters of first order, 8 clusters of second order, 3 of third order, and so on..

NEAREST NEIGHBOR HIERARCHICAL CLUSTER: constant-distance clustering routine for non-weighted events. Hierarchical: first-order clusters are treated as points which may cluster at the second order and so on, until criteria are satisfied (for each order).
Output (dbf, shp): number of clusters, mean center, deviational ellipse and convex hull (polyline) of the points belonging to each cluster, area and cluster density.
Results are heavily influenced by the identified first-order clusters.

RIPLEY'S K-FUNCTION: to identify clustered/inhibitory/random point patterns at different scales/distances between points (Ripley 1976, 1981 Spatial statistics)
Two uses:
- to confirm/reject the null (random) hypothesis at various scales/distances
- to identify the scale/distance at which clustering/inhibition is most intense/weakest
K(d) = expected number of other events within distance d of an event / overall density of events (n / A)
In case of complete spatial randomness: $K(d) = \pi d^2$
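As a rough illustration of the K function, the sketch below counts ordered pairs of points closer than d and rescales by the area. It is a naive estimator with no boundary correction and no weights; the function name, the A / (n (n - 1)) scaling and the sample points are assumptions made for this example, and ArcGIS and Crimestat apply their own scaling and corrections.

```python
import math

def ripley_k(points, area, distances):
    """Naive K estimate for each distance in `distances` (no edge correction)."""
    n = len(points)
    estimates = []
    for d in distances:
        # number of ordered pairs (i, j), i != j, whose distance is at most d
        pairs = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j and math.hypot(points[i][0] - points[j][0],
                                     points[i][1] - points[j][1]) <= d
        )
        estimates.append(area * pairs / (n * (n - 1)))
    return estimates

# Hypothetical data: two tight clusters inside a 10 x 10 study area
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (9.0, 9.0), (9.1, 9.0), (9.0, 9.1)]
k_values = ripley_k(clustered, area=100.0, distances=[0.5])
csr_value = math.pi * 0.5 ** 2  # value expected under CSR at d = 0.5
print(k_values[0], csr_value)   # observed K far above the CSR value: clustering at this scale
```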

Linearization of the K function: L function (Besag 1977)
In case of complete spatial randomness: $L(d) = d$ (ArcGIS), or $L(d) = 0$ (Crimestat)

[Figure: Ripley's K plot showing the observed K value (clustering), the expected value, and the lower and upper confidence envelopes]

Lower and upper confidence envelopes: results lying beyond them may be considered significant.
Confidence envelopes are estimated by repeating a Monte Carlo simulation (Crimestat: 100 simulations; ArcGIS: 0, 9, 99 or 999 permutations, i.e. no envelope or a 90%, 99% or 99.9% confidence level).
Corollary: simulations work better if the number of points is not small (> 100)

-> Spatial statistics / Analyzing patterns / Multi-Distance Spatial Cluster Analysis
Maximum distance: Crimestat: SQRT(A)/3; ArcGIS: ?
Distance ranges: Crimestat: 100; ArcGIS: from 1 to 100 (or: beginning distance + distance increment)
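The two conventions for L and the idea of a simulated envelope can be sketched as follows. This is an illustration under stated assumptions, not the procedure of either package: the helper names are hypothetical, the K estimate is the naive uncorrected one, the study area is a plain rectangle, and the envelope is simply the minimum and maximum of L over 99 simulated CSR patterns.

```python
import math
import random

def k_hat(points, area, d):
    """Naive K estimate at distance d (no edge correction)."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.hypot(points[i][0] - points[j][0],
                                 points[i][1] - points[j][1]) <= d
    )
    return area * pairs / (n * (n - 1))

def l_arcgis(k):
    """L(d) = sqrt(K / pi): equals d under CSR."""
    return math.sqrt(k / math.pi)

def l_crimestat(k, d):
    """L(d) = sqrt(K / pi) - d: equals 0 under CSR."""
    return math.sqrt(k / math.pi) - d

def csr_envelope(n, width, height, d, simulations=99, seed=0):
    """Min and max of L(d) over simulated random patterns of n points."""
    rng = random.Random(seed)
    values = []
    for _ in range(simulations):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        values.append(l_arcgis(k_hat(sim, width * height, d)))
    return min(values), max(values)

# Hypothetical data: two tight clusters inside a 10 x 10 study area
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (9.0, 9.0), (9.1, 9.0), (9.0, 9.1)]
observed_l = l_arcgis(k_hat(clustered, 100.0, 0.5))
low, high = csr_envelope(len(clustered), 10.0, 10.0, 0.5)
print(observed_l, low, high)  # an observed L above the upper envelope suggests clustering
```

With so few points the envelope is very wide, which is the practical reason behind the advice to use more than 100 points.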

(K function) other parameters:
Weight field: default: 1; otherwise a weight (number of events at each point). The weighted estimation gives different results (clustering is likely to be higher). Note: points cannot have distance = 0

Problems with the analysis of spatial data #1: study area extension
- If too small, the analysis may not include elements which are important to provide an exhaustive explanation.
- If too big, the spatial distribution pattern may be due to a diversity of processes which have nothing to do with what we want to explain. Example: suburban, scattered and low-density urban areas.
The K function is an area-sensitive tool: results are influenced by the extension of the area.
Study area methods:
- Default: minimum enclosing rectangle
- User provided: via a polygon layer -> «Study Area» -> reduces the size of the area
Create a mask of the area within the GRA (ring road) by selecting (manually) the zone urbanistiche within the GRA and exporting the selection as mask_area.shp

Specific problems in the analysis of spatial data #2: boundary problems
Given the probability of non-observed events beyond the study area's boundaries (with a similar or dissimilar spatial distribution), clustering near the boundaries is under-estimated.
Boundary correction methods:
- NONE: because events are only to be found within the boundaries, or because the point layer is wider than the study area: points beyond the boundaries of the study area are used for estimating the K function (!!!)
- SIMULATE_OUTER_BOUNDARY_VALUES: simulates a «mirrored» distribution of points beyond the boundaries
- REDUCE_ANALYSIS_AREA: reduces the study area
- RIPLEY'S_EDGE_CORRECTION_FORMULA: points that are closer to the boundary than to other points are weighted more (good only for regular study areas)

Output: table (+ Display results graphically):
- ExpK (expected value of K in case of CSR)
- Envelopes (confidence intervals)
- ObservedK (value of K)
- DiffK (ObservedK - ExpK)

Cautions:
- Works better for clustered than for inhibitory processes
- It is mainly a tool for identifying second-order clusters, i.e. localized clusters, intra-regional scales or medium distances
- Not reliable for small numbers of events (use more than 30, ideally more than 100)
- Not reliable for strongly irregular areas (if the boundary problem cannot be solved adequately)

Measure the Ripley K function for the distribution of firms owned by foreigners within the GRA (ring road)
Space-time Ripley's K
- Input: vv/rm_immig_wdata.shp (Confidence envelope: 0 permutations)
- Click Display results graphically
- Distance bands: 20
- Weight field: CNT
- Beginning distance: 250
- Distance increments: 250
- Boundary correction method: NONE, because: Study area: User provided = dropbox/corsimemotef/lezgis16/4/mask.shp
- Verify the graphic and table (diff) output

Taxonomy of spatial analysis tools (in ArcGIS and Crimestat)
Of events (spatial distribution):
- Global indexes: Average nearest neighbour; (Multi-scale) Ripley's K
- Local indexes: Kernel density maps; Nearest neighbour hierarchical clustering; Risk-Adjusted Nearest Neighbor Hierarchical Clustering
Of intensities (spatial association):
- Global indexes of spatial autocorrelation: Moran's I; Geary's C
- Local indicators of spatial association (LISA): Anselin local Moran's I (Cluster and outlier analysis); Getis-Ord Gi (Hot spot analysis)

3. Global indexes of spatial AUTOCORRELATION

First law of geography (Tobler) = "Everything is related to everything else, but near things are more related than distant things."
It is a form of spatial dependence (positive or negative): the degree to which nearby features are similar or dissimilar, vs. a hypothesis of complete spatial randomness.
- Similar to time series analysis, but with both proximity AND direction/position (2D)

Why estimate the degree of spatial autocorrelation:
- To understand the process (or the variety of processes..) which explains the geographical distribution of intensities
- To estimate the degree to which nearby features potentially influence each other (= interaction, interdependence, attraction, contagion, clustering, segregation, etc.)
- To verify the degree to which the observed variables are (not) statistically independent (e.g. autocorrelation reduces the dataset's information content or obscures what is specific about each area, because intensities in one area are partially influenced by what is happening nearby)
- (E.g. to test the spatial autocorrelation of model residuals)
- (E.g. to assist in the identification of the spatial sample size)
Exploratory Spatial Data Analysis (and mapping) vs. Modelling (formal verification and testing of hypotheses)

Spatial autocorrelation: global indexes
MORAN'S I: global covariance index adapted from the analysis of the memory effect in time series (Moran 1940s, Whittle 1954). Measures the global degree of similarity between the (high and low) intensities (-/+) of nearby features.
$$I = \frac{n}{\sum_i \sum_j W_{ij}} \cdot \frac{\sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$
- $X_i$ = intensity at point i; $\bar{X}$ = average intensity
- $(X_i - \bar{X})(X_j - \bar{X})$: cross-product, high if values are similar
- $W_{ij}$: spatial weights (/influences) matrix
Clustered/high autocorrelation if I is high (I > 0), dispersed/low autocorrelation if I is low (I < 0), vs. the CSR hypothesis: $I_{exp} = -1/(n-1)$
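A minimal sketch of the global Moran's I, assuming the standard formulation in terms of the intensities X, their mean and a weights matrix W with a zero diagonal. The function name and the four-zone example (zones in a row, bordering zones weighted 1) are hypothetical and only meant to show the sign of the index.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of intensities and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # sum of all weights between distinct features
    s0 = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    # weighted sum of cross-products of deviations from the mean
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n) if i != j)
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical example: four zones in a row, bordering zones have weight 1
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(4)] for i in range(4)]
i_smooth = morans_i([1, 2, 3, 4], w)       # similar values side by side
i_alternating = morans_i([1, 4, 1, 4], w)  # high and low values alternate
i_expected = -1 / (4 - 1)                  # expected value under randomness
print(i_smooth, i_alternating, i_expected)
```

The smooth gradient gives a positive index and the alternating pattern a negative one; significance still has to be judged through the z-score and p-value.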

-> Spatial statistics / Analyzing patterns / Spatial autocorrelation (Moran's I)

Conceptualization of spatial relationships:
- Inverse distance (squared): spatial relationships between features are inversely proportional to their (squared) distance. Computational problems with small distances (Crimestat: «adjust for small distances») and with no threshold (n to n).
- Fixed distance band: every feature within the threshold (band) has weight 1. Appropriate in the case of non-uniform polygons, and for large point datasets.
- Zone of indifference: neighbors (or features within the distance threshold) have weight 1. The weight of other features is inversely proportional to their distance. Appropriate as above, when the influence of distant features is relevant. Computational problems.
- Polygon contiguity (adjacency!): considers only bordering features (1 if bordering, 0 for all the others). Appropriate only for regular polygons (original Moran's I. Generalized by Cliff and Ord 1973. Widely used in spatial econometrics)

Conceptualization of spatial relationships (2):
-> Spatial statistics / Modeling spatial relationships / Generate spatial weight matrix
- Distance Band or Threshold Distance (mostly for large datasets): threshold beyond which influence is null. With inverse distance: i) 0: all features are considered; ii) empty: applies a default threshold distance (the minimum distance at which every feature has a neighbour); iii) defined by the user
- Weights Matrix from file: uses a spatial weight matrix file (.swm) created/adapted by the user
Spatial weight matrix: table in .swm format in which each cell contains an expression of the distance, time, cost, influence or spatial relationship between a pair of features (presence/absence or intensity)
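To make two of these conceptualizations concrete, the sketch below builds a weights matrix from point coordinates. It is an illustrative assumption rather than the ArcGIS or Crimestat code: the function name and the scheme labels are hypothetical, and coincident points are rejected outright because an inverse distance weight is undefined at distance 0.

```python
import math

def spatial_weights(points, scheme, threshold=None, exponent=1):
    """Return an n x n weights matrix for 'inverse_distance' or 'fixed_distance_band'."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a feature is not its own neighbour here
            d = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            if d == 0:
                raise ValueError("coincident points: distance 0 is not allowed")
            if scheme == "inverse_distance":
                # influence decays with distance; optional cut-off at the threshold
                if threshold is None or d <= threshold:
                    w[i][j] = 1.0 / d ** exponent
            elif scheme == "fixed_distance_band":
                # weight 1 inside the band, 0 outside
                w[i][j] = 1.0 if d <= threshold else 0.0
            else:
                raise ValueError("unknown scheme")
    return w

# Hypothetical points on a line at x = 0, 1 and 3
pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
w_inverse_sq = spatial_weights(pts, "inverse_distance", exponent=2)
w_band = spatial_weights(pts, "fixed_distance_band", threshold=1.5)
print(w_inverse_sq[0], w_band[0])
```

With no threshold the inverse distance matrix is dense (n to n), which is where the computational cost for large datasets comes from.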

Conceptualization of spatial relationships (3):
- INVERSE_DISTANCE: ( ) + Exponent (!), e.g. 2
- FIXED_DISTANCE: ( )
- K_NEAREST_NEIGHBORS: considers only the K closest features
- CONTIGUITY_EDGES_ONLY: considers only features which share a boundary («rook»)
- CONTIGUITY_EDGES_CORNERS: considers only features which share a boundary and/or a vertex («queen»)
- DELAUNAY_TRIANGULATION: creates non-overlapping triangles connecting the polygons' centroids, and considers as neighbours only features joined by a triangle edge
- CONVERT_TABLE: to specify spatial relationships in a table [Convert spatial weight matrix to table (utilities)]

ROW STANDARDIZATION: values in the spatial weight matrix are standardized so that each row sums to 1. This prevents the indexes from being influenced by the different number of nearby features: appropriate in the case of sample data and compulsory in the case of polygon contiguity, because (irregular) polygons have a different and arbitrary number of bordering features.

Test variables: z-score (in standard deviations) / p-value
Normality: does the z-score follow a normal distribution?
Output:
- Moran's index
- Expected index
- Variance
- z-score and p-value
Cautions:
- Significant only above a certain number of features (> 30)
Vs. Geary's autocorrelation index (Moran is more robust)

HIGH/LOW CLUSTERING (Getis & Ord): the probability for high or low values (+) to be clustered or dispersed (similar to Average Nearest Neighbour)

LAB: a spatial analysis of public school quality in Rome = to what extent does school quality depend on the context? = what is the degree of spatial autocorrelation of school quality?
Input: spatial17/addxy/schools_roma_xy_dprv.dbf = a table with XY coordinates of all primary and secondary schools in Rome, including a (normalized) «deprivation index» f(dropouts, repetitions, students to teachers ratio, students per classroom, foreign students ratio).
1. Georeference the Schools_Roma dbf using Add XY data + export the data output setting its coordinate system «as the data frame»
2. Estimate the global autocorrelation of the normalized deprivation index, using arctoolbox/spatial statistics/analyzing patterns/Moran's I, and setting all the parameters..
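Row standardization and the link between z-score and p-value, both mentioned above, can be sketched as follows. The function names are hypothetical; the p-value assumes a two-sided test against a standard normal distribution, which is only appropriate if the normality assumption holds.

```python
import math

def row_standardize(weights):
    """Divide every weight by its row total, so that each non-empty row sums to 1."""
    result = []
    for row in weights:
        total = sum(row)
        # a feature with no neighbours keeps a row of zeros
        result.append([w / total if total else 0.0 for w in row])
    return result

def two_sided_p_value(z_score):
    """Probability of a value at least this extreme under a standard normal."""
    return math.erfc(abs(z_score) / math.sqrt(2))

# Hypothetical contiguity matrix: zone 0 borders two zones, zone 1 one, zone 2 none
contiguity = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
standardized = row_standardize(contiguity)
p_value = two_sided_p_value(1.96)
print(standardized, p_value)
```

After standardization a zone with many bordering zones no longer carries more total weight than a zone with few, which is why it matters for irregular polygons.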

LAB: what is the degree of spatial autocorrelation of school quality?
2. Estimate the global autocorrelation of the normalized deprivation index, using Moran's I. Parameters:
- Input feature class: schools
- Input field = «DPRV_NORM»
- Conceptualization of spatial relationships: ?
- Row standardization: ?
- Threshold distance: 10,000 meters
- Generate report
-> Verify the graphic report and test variables: what is the result? Is it statistically significant?
Do high or low quality schools cluster in certain zones, and where? -> Local indicators of spatial autocorrelation

4. LOCAL INDEX OF SPATIAL AUTOCORRELATION
To measure the degree of autocorrelation for each geographical feature (where, and which features?)

Local indexes of spatial association/autocorrelation
Anselin local Moran's I (Anselin L. 1995, Local indicators of spatial association LISA. Geographical Analysis 27, 93-115)
Assigns to each feature a degree of high/low autocorrelation based on its (high/low) intensity being similar/dissimilar to that of nearby features
Z: intensity, S: variance, W: spatial weight matrix
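A sketch of a local Moran's I, under stated assumptions: each feature's deviation from the mean is multiplied by the weighted sum of its neighbours' deviations and divided by the variance (here the plain population variance, which differs slightly from the ArcGIS definition). The function name, the six-zone example and the labelling by sign only are hypothetical; no significance test is performed, so the labels are candidates rather than confirmed clusters.

```python
def local_morans_i(values, weights):
    """Return (local I values, cluster/outlier labels) without significance testing."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / n
    local_i, labels = [], []
    for i in range(n):
        # weighted sum of the neighbours' deviations (spatial lag)
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        local_i.append(dev[i] * lag / variance)
        own = "H" if dev[i] > 0 else "L"
        around = "H" if lag > 0 else "L"
        labels.append(own + around)  # HH, LL = cluster; HL, LH = outlier
    return local_i, labels

# Hypothetical example: six zones in a row, row-standardized contiguity weights
values = [10, 9, 8, 1, 2, 1]
n = len(values)
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
w = [[x / sum(row) for x in row] for row in w]
local_i, labels = local_morans_i(values, w)
print(labels)
```

Zone 2 has a high value but a slightly negative neighbourhood average because it borders the first low zone, so it is labelled as a potential high-low outlier.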

Input: polygons (Crimestat) and points (ArcGIS)
Output:
[Figures: degree of segregation between areas with a prevalence of Chinese entrepreneurs and areas with a prevalence of Italian entrepreneurs; local contribution to segregation between areas with a prevalent presence of firms run by Chinese or Italian entrepreneurs; Anselin local Moran's I of the distance of entrepreneurs from their country of origin]

Cluster type (COType) identifies (and renders):
- Features which are part of clusters of high (HH) or low (LL) values, because nearby features have similar values, and which are statistically significant (high and positive z-score)
- Outlier features, with high or low values, surrounded by features with low (HL) or high (LH) values, and which are statistically significant (low and negative z-score)
-> Spatial statistics / Mapping clusters / Cluster and outlier analysis

LAB: a spatial analysis of public school quality in Rome = to what extent does school quality depend on the context? Do high or low quality schools cluster in certain zones, and where?
1. Identify and render those schools which are part of clusters of nearby low or high quality schools using arctoolbox/spatial statistics/mapping clusters/cluster and outlier analysis
Input: Schools with data shapefile, input field: DPRV_NORM
Spatial relationships: Inverse distance
-> Modify the symbology of the output layer in order to visualize only the schools in clusters with high or low and significant spatial autocorrelation values
-> Open and verify the output layer attribute table
-> In a copied layer, represent the value of the index (LMiIndex) regardless of its degree of significance
-> Check (and try to make sense of) the outliers

Local indexes of spatial autocorrelation (2): Getis-Ord Gi, high/low clustering (Hot Spot Analysis)
Identifies features which are part of «hot spots»: areas with unusual clustering of high or low values (Cliff & Ord, Spatial autocorrelation, 1973), based on the value of the GiZScore (categorized according to the standard deviation): the higher the GiZScore, the more nearby features have high values, and vice versa.
(You may produce a density map using the Z-Score as weight)
Cautions:
- Reliable only with large datasets (> 30 features)
- Test problems (the significance test is based on global indexes of spatial autocorrelation)
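The GiZScore can be sketched as follows, assuming the usual Gi* formulation in which each feature counts as its own neighbour and the weighted local sum is compared with what the global mean and standard deviation would predict. The function name and the six-zone example with binary weights are hypothetical.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every feature (weights include the feature itself)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq = sum(w * w for w in weights[i])
        local_sum = sum(weights[i][j] * values[j] for j in range(n))
        denominator = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        z_scores.append((local_sum - mean * w_sum) / denominator)
    return z_scores

# Hypothetical example: six zones in a row; neighbours = the zone itself and bordering zones
values = [10, 9, 8, 1, 2, 1]
n = len(values)
w = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
gi_z = getis_ord_gi_star(values, w)
print(gi_z)  # positive scores at the high end of the row, negative at the low end
```

With only six features the scores are illustrative only, in line with the caution that the statistic needs a large dataset.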

LAB:
1. Measure the (global) spatial autocorrelation of the distribution of all foreigners (and of Chinese) in Rome's zone urbanistiche
2. Identify (local) clusters of contiguous zones with a high or low density of foreigners (and of Chinese)
Input: spatial17/vv/zurb_wdata.shp
Input field: ?
Arctoolbox tools: ?
Conceptualization of spatial relationships: ?
Standardization: ?
Threshold distance: ?
Results?

Spatial interpolation: to obtain surface data from point sample observations
Spatial interpolation: INVERSE DISTANCE WEIGHTED
Spatial interpolation: KRIGING
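Inverse distance weighted interpolation can be illustrated with a short sketch: the value at an unsampled location is the average of the sample values, each weighted by the inverse of its distance raised to a power. The function name, the default power of 2 and the two sample points are assumptions for this example; GIS tools add options such as search radius and number of neighbours.

```python
import math

def idw(samples, x, y, power=2):
    """Estimate the value at (x, y) from a list of ((sx, sy), value) samples."""
    numerator = 0.0
    denominator = 0.0
    for (sx, sy), value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return float(value)  # exactly on a sample point: return its value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical samples: value 10 at (0, 0) and value 20 at (2, 0)
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw(samples, 1.0, 0.0)
near_first = idw(samples, 0.5, 0.0)
print(midpoint, near_first)  # the estimate moves towards the closer sample
```

Unlike kriging, this approach does not model the spatial autocorrelation of the samples and gives no measure of the estimation error.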

Spatial interpolation: KRIGING

(..more) problems with the analysis of spatial data
The modifiable areal unit problem (MAUP): any geographical discontinuity is artificial, (more or less) arbitrary, modifiable, and influences the results and their explanation
- Scale problem, f(spatial resolution). E.g. statistical relations are stronger the lower the spatial resolution, because variance is lower = the more we aggregate data, the stronger they correlate. The more we disaggregate data (and increase spatial resolution), the higher the variance and the risk that results are due to chance or mistakes
- Zoning problem, f(geodata geometry): for any given number of zones, results are influenced by their shape
- Non-uniformity: a uniform geographical partition will be non-uniform in terms of statistical attributes, and vice versa (e.g. population). Data in less dense areas are less reliable.
- Irregularity, vs. compactness (e.g. administrative divisions)

Example of the MAUP: gerrymandering (distortions due to the shape of electoral partitions)
[Figure: The urban (and mostly liberal) concentration of Columbus, Ohio, located at the center of the map, is split into thirds, each segment then attached to - and outnumbered by - largely conservative suburbs.]

And..
- Ecological fallacy: the results of aggregate analysis cannot be attributed to each individual, or to higher scales (the suicide rate is higher where more Catholics live = are Catholics more prone to suicide?)
- Outliers: very frequent in spatial data. The higher the spatial resolution of the data, the higher the probability of outliers.
- Geodata quality (accuracy, completeness, consistency, resolution..). Specific problems: measurement mistakes are not independent (e.g. population subtracted from one area is attributed to its neighbour). The denser the areas, the lower the data quality (but the lower the distortion due to measurement mistakes)
- Categorical data: spatial analysis tools for categorical data are still largely missing
- Coincident locations (distance = 0) -> collect events (to turn coincident points of unique events into weighted points)

ArcGIS desktop/online Help:
http://forums.arcgis.com
http://support.esri.com/en/
http://mappingcenter.esri.com
http://blogs.esri.com/esri/arcgis/
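The «collect events» step mentioned above for coincident locations amounts to counting how many events share each coordinate pair. A minimal sketch, with a hypothetical function name and sample coordinates, and with exact coordinate equality as the matching rule:

```python
from collections import Counter

def collect_events(points):
    """Turn a list of (x, y) events into unique (x, y, count) weighted points."""
    counts = Counter(points)
    return [(x, y, count) for (x, y), count in counts.items()]

# Hypothetical events: two of them fall on exactly the same location
events = [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]
weighted_points = collect_events(events)
print(weighted_points)
```

The resulting count can then be used as the weight field in tools that do not accept points at distance 0 from each other.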