Interaction Analysis of Spatial Point Patterns


Geog 210C: Introduction to Spatial Data Analysis

Phaedon C. Kyriakidis
www.geog.ucsb.edu/~phaedon
Department of Geography, University of California Santa Barbara
Santa Barbara, CA 936-
phaedon@geog.ucsb.edu
Spring Quarter 9

Ph. Kyriakidis (UCSB), Geog 210C, Spring 9

Spatial Point Patterns

Definition
Set of point locations with recorded "events" within a study region, e.g., locations of trees, disease or crime incidents.

[Figure: two example point patterns in a study region, one with clustered events and one with random events]

- point locations could correspond to all possible events or to subsets of them (mapped versus sampled point pattern)
- attribute values could also have been measured at event locations, e.g., tree diameter (marked point pattern); not considered in this handout

Objective of this handout
Introduce statistical tools for quantifying spatial interaction of events, e.g., clustering versus randomness or regularity.

Outline

- Concepts & Notation
- Distance & Distance Matrices
- Distances Involved in Spatial Point Patterns
- Quantifying Spatial Interaction: G Function
- Quantifying Spatial Interaction: F Function
- Quantifying Spatial Interaction: K Function
- Points To Remember

Concepts & Notation: Some Notation

Point events
Set of N locations of events occurring in a study area: $\{\mathbf{u}_i,\ i = 1, \ldots, N\}$, $\mathbf{u}_i \in D \subset \mathbb{R}^K$
- $\mathbf{u}_i$ = coordinate vector of the i-th event location, e.g., in 2D $\mathbf{u}_i = (x_i, y_i)$
- $\in$ = "belongs to"
- $D$ = study domain, a subset of a K-dimensional space $\mathbb{R}^K$

Variable of interest
$y(s)$ = number of events (a count) within an arbitrary domain or support $s$ with measure (length, area, volume) $|s|$. Support $s$ is centered at an arbitrary location $\mathbf{u}$ and can also be denoted as $s(\mathbf{u})$. In statistics, $y(s)$ is treated as a realization of a random variable (RV) $Y(s)$.

Objective
Quantify interaction, e.g., covariation, between outcomes of any two RVs $Y(s)$ and $Y(s')$. To do so, all RVs must lie in the same "environment"; in other words, the long-term average (expectation) of RV $Y(s)$ should be similar to that of $Y(s')$.

Concepts & Notation: Intensity of Events

Local intensity $\lambda(\mathbf{u})$
Mean number of events per unit area at an arbitrary location or point $\mathbf{u}$, formally defined as:

$$\lambda(\mathbf{u}) = \lim_{|s| \to 0} \left\{ \frac{E\{Y(s)\}}{|s|} \right\}, \quad \mathbf{u} \in D$$

where $E\{Y(s)\}$ denotes the expectation (mean) of RV $Y(s)$ within region $s(\mathbf{u})$ centered at $\mathbf{u}$, and $|s|$ is the area of that region.

Overall intensity $\lambda$
Estimated as $\hat{\lambda} = N / |D|$, where $N$ = number of events and $|D|$ = measure (area) of study region $D$.

First-order stationarity
Any RV $Y(s)$ should have the same long-term average, for a fixed areal unit $s$. This implies a constant intensity, $\lambda(\mathbf{u}) = \lambda,\ \forall \mathbf{u} \in D$, and the expected number of events within a region $s$ is just a function of $|s|$:

$$E\{Y(s)\} = \lambda |s|, \quad \forall s \subset D$$

Concepts & Notation: Interaction Between Count RVs

Second-order intensity
Long-term average (expectation) of products of counts per unit area at any two arbitrary points $\mathbf{u}$ and $\mathbf{u}'$, formally defined as:

$$\sigma(\mathbf{u}, \mathbf{u}') = \lim_{|s|, |s'| \to 0} \left\{ \frac{E\{Y(s)\, Y(s')\}}{|s|\, |s'|} \right\}, \quad \mathbf{u}, \mathbf{u}' \in D$$

Note that $E\{Y(s)\, Y(s')\}$ is not the same as $E\{Y(s)\}\, E\{Y(s')\}$, unless the variables are independent.

Some terminology
- second-order stationarity: the expectation of all RVs is constant (first-order stationarity), and the second-order intensity is a function of the separation vector between any two locations $\mathbf{u}$ and $\mathbf{u}'$
- isotropy: only the distance (not the orientation) of the separation vector matters

Outlook
Quantifying interaction in spatial point patterns within the above assumptions or working hypotheses amounts to studying distances between events.
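As a small illustration of the overall intensity estimate and of the first-order stationarity relation $E\{Y(s)\} = \lambda |s|$, the sketch below uses made-up event coordinates and an assumed rectangular study region; the names `overall_intensity` and `expected_count` are hypothetical helpers, not part of the handout.

```python
# Hypothetical example data: five events in an assumed 10 x 10 study region D.
events = [(1.0, 2.0), (3.5, 7.2), (4.1, 4.4), (8.3, 1.9), (9.0, 9.5)]
region_width, region_height = 10.0, 10.0

def overall_intensity(events, area):
    """Estimate lambda-hat = N / |D| (events per unit area)."""
    return len(events) / area

def expected_count(intensity, sub_area):
    """Under first-order stationarity, E{Y(s)} = lambda * |s|."""
    return intensity * sub_area

area_D = region_width * region_height
lam_hat = overall_intensity(events, area_D)
print(lam_hat)                        # 0.05 events per unit area
print(expected_count(lam_hat, 20.0))  # expected count in a window of area 20
```

The second call only makes sense if the constant-intensity assumption holds; with a spatially varying $\lambda(\mathbf{u})$ the expected count would depend on where the window sits.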

Distance & Distance Matrices: Distance

A measure of proximity (typically along a crow's flight path) between any two locations or spatial entities.

Euclidean distance
Consider two points in a 2D (geographical or other) space with coordinates $\mathbf{u}_i = (x_i, y_i)$ and $\mathbf{u}_j = (x_j, y_j)$. The Euclidean distance $d_{ij}$ between points $\mathbf{u}_i$ and $\mathbf{u}_j$ is computed via Pythagoras's theorem as:

$$d_{ij} = d(\mathbf{u}_i, \mathbf{u}_j) = \|\mathbf{u}_i - \mathbf{u}_j\| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

- $\|\mathbf{u}_i - \mathbf{u}_j\|$ is called the 2-norm of vector $\mathbf{h}_{ij} = \mathbf{u}_i - \mathbf{u}_j$
- locations $\mathbf{u}_i$ and $\mathbf{u}_j$ are called, respectively, the tail and head of vector $\mathbf{h}_{ij}$

[Figure: points $\mathbf{u}_i$ and $\mathbf{u}_j$ in the x-y plane; $d_{ij}$ is the hypotenuse of a right triangle with sides $x_i - x_j$ and $y_i - y_j$]

Distance & Distance Matrices: Distance Metric

Formal characteristics of a distance metric
A measure $d_{ij}$ of proximity between locations $\mathbf{u}_i$ and $\mathbf{u}_j$ is a valid distance metric if it satisfies the following requirements:
- the distance between a point and itself is always zero: $d_{ii} = 0$
- the distance between a point and another one is always positive: $d_{ij} > 0$
- the distance between two points is the same no matter which point you consider first: $d_{ij} = d_{ji}$
- the triangular inequality holds: the sum of the lengths of two sides of a triangle cannot be smaller than the length of the third side: $d_{ij} \le d_{il} + d_{lj}$

A measure $d_{ij}$ need not always be Euclidean, hence it should be checked to ensure that it is a valid distance metric.
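The four metric requirements can be checked mechanically on any square table of pairwise proximities. The sketch below is one way to do this, assuming distinct points and a small numerical tolerance; `is_valid_metric`, the coordinates, and the travel-time values are all hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between 2D points p = (x, y) and q = (x, y)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def is_valid_metric(d, tol=1e-12):
    """Check the four requirements on a square matrix d of pairwise proximities."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:                    # d_ii = 0
            return False
        for j in range(n):
            if i != j and d[i][j] <= 0:           # d_ij > 0 for distinct points
                return False
            if abs(d[i][j] - d[j][i]) > tol:      # symmetry: d_ij = d_ji
                return False
            for l in range(n):                    # triangle inequality
                if d[i][j] > d[i][l] + d[l][j] + tol:
                    return False
    return True

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]    # hypothetical coordinates
D = [[euclidean(p, q) for q in points] for p in points]
print(D[0][1])             # 5.0 (a 3-4-5 right triangle)
print(is_valid_metric(D))  # True

# Hypothetical one-way travel times: asymmetric, so not a valid metric.
travel_time = [[0.0, 10.0], [25.0, 0.0]]
print(is_valid_metric(travel_time))  # False
```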

Distance & Distance Matrices: Non-Euclidean Distances

Alternative distance measures
(i) over a road or railway, (ii) along a river, (iii) over a network

[Figure: locations $\mathbf{u}_1, \ldots, \mathbf{u}_5$, contrasting the Euclidean distance between locations with the network distance between locations]

Even more exotic distance measures
(i) travel time over a network, (ii) perceived travel time between urban landmarks, (iii) volume of exports/imports

[Figure: Euclidean distances between network nodes versus actual or perceived distances on the network]

The latter might not even be formal distance metrics, i.e., $d_{ij} \ne d_{ji}$.

Distance & Distance Matrices: Minkowski's Generalized Distance

Definition
Consider two points in a K-dimensional (geographical or other) space $\mathbb{R}^K$ with coordinate vectors $\mathbf{u}_i = [u_{i1}, \ldots, u_{ik}, \ldots, u_{iK}]$ and $\mathbf{u}_j = [u_{j1}, \ldots, u_{jk}, \ldots, u_{jK}]$. The Minkowski distance of order $p$ (with $p > 0$), denoted as $d^{(p)}_{ij}$, between points $\mathbf{u}_i$ and $\mathbf{u}_j$ is computed as:

$$d^{(p)}_{ij} = \left( \sum_{k=1}^{K} |u_{ik} - u_{jk}|^p \right)^{1/p}$$

Particular cases
- Manhattan or city-block distance: $d^{(1)}_{ij} = \sum_{k=1}^{K} |u_{ik} - u_{jk}|$
- Euclidean distance: $d^{(2)}_{ij} = \sqrt{\sum_{k=1}^{K} |u_{ik} - u_{jk}|^2}$
- infinity norm or Chebyshev distance, as $p \to \infty$: $d^{(\infty)}_{ij} = \max(|u_{i1} - u_{j1}|, \ldots, |u_{ik} - u_{jk}|, \ldots, |u_{iK} - u_{jK}|)$

Distances computed from points in multidimensional spaces are routinely used in statistical pattern recognition; points represent objects or cases, each described by K attribute values.
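The particular cases of the Minkowski distance can be verified numerically. This is a minimal sketch with hypothetical coordinate vectors; `minkowski` and `chebyshev` are illustrative names, and the Chebyshev case is computed directly as a maximum rather than as a limit.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p > 0 between coordinate vectors u and v."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def chebyshev(u, v):
    """Limit of the Minkowski distance as p grows without bound."""
    return max(abs(a - b) for a, b in zip(u, v))

u_i = (1.0, 2.0, 3.0)   # hypothetical points in a K = 3 dimensional space
u_j = (4.0, 6.0, 3.0)

print(minkowski(u_i, u_j, 1))    # 7.0, Manhattan: 3 + 4 + 0
print(minkowski(u_i, u_j, 2))    # 5.0, Euclidean: sqrt(9 + 16 + 0)
print(chebyshev(u_i, u_j))       # 4.0, largest coordinate difference
print(minkowski(u_i, u_j, 50))   # close to the Chebyshev value
```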

Distance & Distance Matrices: Euclidean Distance Matrix, Single Set of Points

Definition
Consider a set of N points $\{\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_N\}$ in a K-dimensional (geographical or other) space. The distance matrix $\mathbf{D}$ is a square $(N \times N)$ matrix containing the distances $\{d(\mathbf{u}_i, \mathbf{u}_j),\ i = 1, \ldots, N,\ j = 1, \ldots, N\}$ between all $N \times N$ possible pairs of points in the set.

Example with N = 5 points:

    u_i   u_1   u_2   u_3   u_4   u_5
    x_i   x_1   x_2   x_3   x_4   x_5
    y_i   y_1   y_2   y_3   y_4   y_5

By convention, $\mathbf{u}_1$ is the coordinate vector of the 1st point in the set (1st entry in the data file).

$$\mathbf{D} =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} \\
d_{41} & d_{42} & d_{43} & d_{44} & d_{45} \\
d_{51} & d_{52} & d_{53} & d_{54} & d_{55}
\end{bmatrix}
=
\begin{bmatrix}
0 & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{12} & 0 & d_{23} & d_{24} & d_{25} \\
d_{13} & d_{23} & 0 & d_{34} & d_{35} \\
d_{14} & d_{24} & d_{34} & 0 & d_{45} \\
d_{15} & d_{25} & d_{35} & d_{45} & 0
\end{bmatrix}
= [d_{ij}]$$

- the i-th row (or column) contains the distances between the i-th point $\mathbf{u}_i$ and all others (including itself)
- $\mathbf{D}$ is symmetric with zeros along its diagonal

Distance & Distance Matrices: Euclidean Distance Matrix, Two Sets of Points

Definition
Consider 2 sets of points $\{\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_N\}$ and $\{\mathbf{t}_1, \ldots, \mathbf{t}_j, \ldots, \mathbf{t}_M\}$ in a K-dimensional (geographical or other) space. The distance matrix $\mathbf{D}$ is an $(N \times M)$ matrix containing the Euclidean distances $\{d(\mathbf{u}_i, \mathbf{t}_j),\ i = 1, \ldots, N,\ j = 1, \ldots, M\}$ between all $N \times M$ possible pairs formed by these two sets of points.

Example with N = 5 and M = 7 points:

    u_i   u_1   u_2   u_3   u_4   u_5
    x_i   x_1   x_2   x_3   x_4   x_5
    y_i   y_1   y_2   y_3   y_4   y_5

    t_j   t_1   t_2   t_3   t_4   t_5   t_6   t_7
    x_j   x_1   x_2   x_3   x_4   x_5   x_6   x_7
    y_j   y_1   y_2   y_3   y_4   y_5   y_6   y_7

By convention, $\mathbf{u}_1$ is the coordinate vector of the 1st datum in data set #1, and similarly for $\mathbf{t}_1$.

$$\mathbf{D} =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} & d_{16} & d_{17} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} & d_{26} & d_{27} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} & d_{36} & d_{37} \\
d_{41} & d_{42} & d_{43} & d_{44} & d_{45} & d_{46} & d_{47} \\
d_{51} & d_{52} & d_{53} & d_{54} & d_{55} & d_{56} & d_{57}
\end{bmatrix}
= [d_{ij}]$$

- the i-th row contains the distances between the i-th point $\mathbf{u}_i$ in set #1 and all points in set #2
- the j-th column contains the distances between the j-th point $\mathbf{t}_j$ in set #2 and all points in set #1
- $\mathbf{D}$ is not symmetric, i.e., $d_{12} \ne d_{21}$: pair $\{\mathbf{u}_1, \mathbf{t}_2\}$ is not the same as pair $\{\mathbf{u}_2, \mathbf{t}_1\}$
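Both kinds of distance matrix can be built with the same routine. The following sketch uses plain Python lists and made-up coordinates; `distance_matrix` is a hypothetical helper that returns the square single-set matrix when the second set is omitted.

```python
import math

def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(set_a, set_b=None):
    """Return the (N x M) matrix of distances between set_a and set_b.

    With set_b omitted, returns the square (N x N) matrix for a single set.
    """
    if set_b is None:
        set_b = set_a
    return [[dist(p, q) for q in set_b] for p in set_a]

events = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]   # hypothetical u_1, u_2, u_3
points = [(0.0, 4.0), (6.0, 0.0)]               # hypothetical t_1, t_2

D_single = distance_matrix(events)          # 3 x 3, symmetric, zero diagonal
D_two = distance_matrix(events, points)     # 3 x 2, not symmetric

print(D_single[0][1], D_single[1][0])   # 5.0 5.0
print(D_two[0][1], D_two[1][0])         # 6.0 3.0
```

In the two-set case the entries `D_two[0][1]` and `D_two[1][0]` refer to different pairs ($\{\mathbf{u}_1, \mathbf{t}_2\}$ and $\{\mathbf{u}_2, \mathbf{t}_1\}$), which is why they differ.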

Distances Involved in Spatial Point Patterns: Distances Between Events in a Point Pattern

Event-to-event distance
Distance $d_{ij}$ between an event at location $\mathbf{u}_i$ and another event at location $\mathbf{u}_j$:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

Point-to-event distance
Distance $d_{pj}$ between a randomly chosen point at location $\mathbf{t}_p = (\tilde{x}_p, \tilde{y}_p)$ and an event at location $\mathbf{u}_j$:

$$d_{pj} = \sqrt{(\tilde{x}_p - x_j)^2 + (\tilde{y}_p - y_j)^2}$$

Event-to-nearest-event distance
Distance $d_{\min}(\mathbf{u}_i)$ between an event at location $\mathbf{u}_i$ and its nearest neighbor event:

$$d_{\min}(\mathbf{u}_i) = \min_{j \ne i} \{d_{ij},\ j = 1, \ldots, N\}$$

Point-to-nearest-event distance
Distance $d_{\min}(\mathbf{t}_p)$ between a randomly chosen point at location $\mathbf{t}_p$ and its nearest neighbor event:

$$d_{\min}(\mathbf{t}_p) = \min \{d_{pj},\ j = 1, \ldots, N\}$$

Distances Involved in Spatial Point Patterns: Event-to-Nearest-Event Distances

[Figure: pattern with N=5 events $\mathbf{u}_1, \ldots, \mathbf{u}_5$, shown next to its distance matrix]

- e.g., 598 = $d_{\min}(\mathbf{u}_1)$, 762 = $d_{\min}(\mathbf{u}_2)$
- some events might be nearest neighbors of each other, e.g., $\mathbf{u}_4$, $\mathbf{u}_5$, or have the same nearest neighbor, e.g., $\mathbf{u}_2$, $\mathbf{u}_3$, $\mathbf{u}_4$ are nearest neighbors of $\mathbf{u}_5$

Mean nearest neighbor distance
Average of all $d_{\min}(\mathbf{u}_i)$ values:

$$\bar{d}_{\min} = \frac{1}{N} \sum_{i=1}^{N} d_{\min}(\mathbf{u}_i)$$

Drawback: a single number does not suffice to describe a point pattern.
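The two nearest-neighbor distances and the mean nearest neighbor distance can be sketched as follows. The coordinates are invented for illustration, the function names are hypothetical, and the brute-force search over all pairs is only sensible for small N.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def event_to_nearest_event(events):
    """d_min(u_i): distance from each event to its nearest other event."""
    return [min(dist(u, v) for j, v in enumerate(events) if j != i)
            for i, u in enumerate(events)]

def point_to_nearest_event(points, events):
    """d_min(t_p): distance from each reference point to its nearest event."""
    return [min(dist(t, u) for u in events) for t in points]

events = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0), (10.0, 0.0)]   # hypothetical events
points = [(0.0, 1.0), (9.0, 0.0)]                            # hypothetical points

e2ne = event_to_nearest_event(events)
p2ne = point_to_nearest_event(points, events)
mean_nn = sum(e2ne) / len(e2ne)   # mean nearest neighbor distance

print(e2ne)      # first three entries are 5.0, 1.0, 1.0
print(p2ne)      # [1.0, 1.0]
print(mean_nn)
```

Note that the event-to-nearest-event search excludes the event itself (the `j != i` condition), whereas the point-to-nearest-event search considers every event.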

Quantifying Spatial Interaction: The G Function

Definition
Proportion of event-to-nearest-event distances $d_{\min}(\mathbf{u}_i)$ no greater than a given distance cutoff $d$, estimated as:

$$\hat{G}(d) = \frac{\#\{d_{\min}(\mathbf{u}_i) \le d,\ i = 1, \ldots, N\}}{N}$$

This is the cumulative distribution function (CDF) of all N event-to-nearest-event distances; instead of computing the average $\bar{d}_{\min}$ of the $d_{\min}$ values, compute their CDF.

[Figure: for the N=5 point pattern above, sample histogram of event-to-nearest-neighbor distances and sample G function $\hat{G}(d)$, both plotted against event-to-nearest-neighbor distance $d$]

- for a larger number of events N, $\hat{G}(d)$ becomes smoother

Quantifying Spatial Interaction: Event-to-Nearest-Event (E2NE) Distance Histograms

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its histogram of E2NE distances against event-to-nearest-neighbor distance $d$]

- for evenly-spaced events, more E2NE distances are similar to the spacing of events
- for clustered events, there are more small E2NE distances and fewer large such distances
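A direct transcription of the $\hat{G}(d)$ estimator is sketched below on a hypothetical five-event pattern. It applies no edge correction, and `g_hat` and `nearest_event_distances` are illustrative names.

```python
import math

def nearest_event_distances(events):
    """d_min(u_i) for every event, excluding the event itself."""
    return [min(math.hypot(u[0] - v[0], u[1] - v[1])
                for j, v in enumerate(events) if j != i)
            for i, u in enumerate(events)]

def g_hat(events, d):
    """Proportion of event-to-nearest-event distances no greater than d."""
    d_min = nearest_event_distances(events)
    return sum(1 for x in d_min if x <= d) / len(d_min)

# Hypothetical pattern: two tight pairs and one isolated event.
events = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0), (9.0, 1.0)]

for d in (0.5, 1.0, 2.0, 6.0):
    print(d, g_hat(events, d))   # 0.0, 0.4, 0.8, 1.0
```

The step-like output reflects the small N; with more events the sample CDF becomes smoother, as noted above.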

Quantifying Spatial Interaction: Sample G Function Examples

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its sample G function $\hat{G}(d)$ against event-to-nearest-neighbor distance $d$]

- for evenly-spaced events, $\hat{G}(d)$ rises gradually up to the distance at which most events are spaced, and then increases rapidly
- for clustered events, $\hat{G}(d)$ rises rapidly at short distances, and then levels off at larger d-values

Quantifying Spatial Interaction: The F Function

Definition
Proportion of point-to-nearest-event distances $d_{\min}(\mathbf{t}_j)$ no greater than a given distance cutoff $d$, estimated as:

$$\hat{F}(d) = \frac{\#\{d_{\min}(\mathbf{t}_j) \le d,\ j = 1, \ldots, M\}}{M}$$

This is the cumulative distribution function (CDF) of all M point-to-nearest-event distances.

[Figure: pattern with N=5 events and M random points, with its sample F function $\hat{F}(d)$ against point-to-nearest-neighbor distance $d$]

- for a larger number M of random points, $\hat{F}(d)$ becomes even smoother

Note: The F function provides information on event proximity to voids.
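The $\hat{F}(d)$ estimator differs from $\hat{G}(d)$ only in where the distances start: at M randomly placed points rather than at the events. The sketch below assumes a 10 x 10 square study region, uniformly distributed random points, a fixed seed, and M = 200; all of these are illustrative choices, and `f_hat` is a hypothetical name.

```python
import math
import random

def f_hat(events, points, d):
    """Proportion of point-to-nearest-event distances no greater than d."""
    d_min = [min(math.hypot(t[0] - u[0], t[1] - u[1]) for u in events)
             for t in points]
    return sum(1 for x in d_min if x <= d) / len(d_min)

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical events in an assumed 10 x 10 study region.
events = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0), (9.0, 1.0)]
# M = 200 random points placed uniformly over the same region.
points = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(200)]

cutoffs = (0.5, 1.0, 2.0, 4.0, 15.0)
f_values = [f_hat(events, points, d) for d in cutoffs]
for d, f in zip(cutoffs, f_values):
    print(d, f)
```

Because M is under the analyst's control, the curve can be made as smooth as desired by placing more random points; the last cutoff exceeds the diagonal of the region, so the final value is 1.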

Quantifying Spatial Interaction: Point-to-Nearest-Event (P2NE) Distance Histograms

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its histogram of P2NE distances against point-to-nearest-neighbor distance $d$]

- for evenly-spaced events, there are more nearest events at small distances from randomly placed points
- for clustered events, P2NE distances are generally larger than in the previous case, and there are a few large such distances

Quantifying Spatial Interaction: Sample F Function Examples

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its sample F function $\hat{F}(d)$ against point-to-nearest-neighbor distance $d$]

- for evenly-spaced events, $\hat{F}(d)$ rises rapidly up to the distance at which most events are spaced, and then levels off (more nearest neighbors at small distances from randomly placed points)
- for clustered events, $\hat{F}(d)$ rises rapidly at short distances, and then levels off at larger d-values

Quantifying Spatial Interaction: Comparing Sample G and F Functions

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its sample G and F functions, $\hat{G}(d)$ and $\hat{F}(d)$, plotted as proportion against distance $d$]

- for evenly-spaced events, there is more open space (smaller point-to-event distances), hence $\hat{F}(d)$ rises faster than $\hat{G}(d)$
- for clustered events, the reverse is true

Quantifying Spatial Interaction: The Sample K Function

Concept building
1. construct a set of concentric circles (of increasing radius $d$) around each event
2. count the number of events in each distance band
3. the cumulative number of events up to radius $d$ around all events gives the sample K function $\hat{K}(d)$

[Figure: example of K function estimation, showing the number of events within a fixed distance $h$ of the events at locations $\mathbf{u}_1$, $\mathbf{u}_2$ and $\mathbf{u}_3$]

Formal definition

$$K(d) = \frac{E\{\text{\# of events within distance } d \text{ of any arbitrary event}\}}{E\{\text{\# of events within study domain}\}}$$

estimated as:

$$\hat{K}(d) = \frac{1}{\hat{\lambda} N}\, \#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j (\ne i) = 1, \ldots, N\}$$

Quantifying Spatial Interaction: Interpreting the Sample K Function

Re-expressing $\hat{K}(d)$:

$$\hat{K}(d) = \frac{1}{\hat{\lambda} N}\, \#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j (\ne i) = 1, \ldots, N\}$$

$$= \frac{|D|}{N \cdot N}\, \#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j (\ne i) = 1, \ldots, N\}$$

$$= |D| \cdot (\text{proportion of event-to-event distances} \le d)$$

In other words: function $\hat{K}(d)$ is the sample cumulative distribution function (CDF) of all $N^2 - N$ event-to-event distances, scaled by $|D|$.

[Figure: pattern with N=5 events $\mathbf{u}_1, \ldots, \mathbf{u}_5$ (left), sample histogram of event-to-event distances (center), and sample K function plotted as $\hat{K}(d)/|D|$ against event-to-event distance $d$ (right)]

Note: Ignore the bin at $d = 0$ (center plot) and the point at $d = 0$ (right plot).

Quantifying Spatial Interaction: Event-to-Event Distance Histograms

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its histogram of event-to-event distances]

- for evenly-spaced events, there are more medium-sized E2E distances than small or large such distances
- for clustered events, the distribution of E2E distances is multi-modal
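The sample K function can be computed by counting ordered pairs of distinct events within distance $d$ and scaling by $1 / (\hat{\lambda} N) = |D| / N^2$. The sketch below does this without any edge correction; the four-event pattern, the assumed region area of 16, and the name `k_hat` are all hypothetical.

```python
import math

def k_hat(events, area, d):
    """Sample K function without edge correction.

    K-hat(d) = (1 / (lambda-hat * N)) * #{d_ij <= d, i != j},
    with lambda-hat = N / area, which equals area * count / N**2.
    """
    n = len(events)
    lam_hat = n / area
    count = sum(
        1
        for i, u in enumerate(events)
        for j, v in enumerate(events)
        if i != j and math.hypot(u[0] - v[0], u[1] - v[1]) <= d
    )
    return count / (lam_hat * n)

# Hypothetical pattern: corners of a unit square inside a region of area 16.
events = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
area_D = 16.0

for d in (0.5, 1.0, 1.5):
    print(d, k_hat(events, area_D, d))   # 0.0, 8.0, 12.0
```

At $d = 1$ each event has two neighbors, giving 8 ordered pairs; at $d = 1.5$ the diagonals are included as well, giving all 12 ordered pairs. Events near the boundary of a real study region have neighbors outside it that go uncounted, so an uncorrected estimate like this one is biased downward at larger $d$.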

Quantifying Spatial Interaction: Event-to-Event Distance CDFs

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its cumulative CDF of event-to-event distances]

- for clustered events, there are multiple bumps in the CDF of E2E distances due to the grouping of events in space

Quantifying Spatial Interaction: Sample K Function Examples

[Figure: random stratified (evenly spaced) events and clustered events in a study region, each with its sample K function, plotted as area proportion $\hat{K}(d)$ against event-to-event distance $d$]

- the sample K function $\hat{K}(d)$ is monotonically increasing and is a scaled (by domain measure $|D|$) version of the CDF of E2E distances

Points To Remember: Recap

Quantifying interaction in spatial point patterns
- event-to-nearest-event distances: use the sample G function $\hat{G}(d)$
- point-to-nearest-event distances: use the sample F function $\hat{F}(d)$
- event-to-event distances: use the sample K function $\hat{K}(d)$
- the K function looks at information beyond nearest neighbors

Caveats
- clustering is always a function of the overall intensity of a point pattern
- clustering might occur due to local intensity variations or due to interaction; it is very difficult to disentangle each contribution

Watch out for
- boundaries and edge effects
- distance distortions due to map projections
- sampled versus mapped point patterns
