Overview of Spatial analysis in ecology


Austin Cameron
5 years ago

Spatial Point Patterns & Complete Spatial Randomness - II
Geog 210C Introduction to Spatial Data Analysis
Chris Funk
Lecture 8

Overview of Spatial analysis in ecology
- The 1st step in understanding an ecological process is to identify patterns
- Spatial auto-correlation might indicate patterns or processes
- Processes can operate on multiple scales, patches, gradients
- Auto-correlation may be spurious, interpolative, true or induced
  True = caused by interaction among neighboring locations
  Induced = caused by a causal relationship with another variable (or variables) which is itself auto-correlated

Nearest neighbor distances (average $d_i$)
- Under CSR, counts follow a Poisson distribution, and the average $d_i$ follows a Weibull distribution
- $E(\bar{d}_i) = c_i (A/n)^{0.5}$, where A = area, n = number of points, and $c_i$ = a constant which varies as a function of the i-th neighbor

Ripley's K function
- Under CSR, the expected # of points within distance d of an event is $\lambda \pi d^2$, where d is the distance lag
- Ripley's L(d) function linearizes K(d) and stabilizes the variances: $L(d) = (K(d)/\pi)^{0.5} - d$
- Under CSR, E(L(d)) = 0; positive values imply clustering, negative values imply stratification

Complete Spatial Randomness (CSR)
Loose definition
A spatial process, here a spatial point process, serving as a generating mechanism of spatial point patterns, with the following characteristics:
- the intensity $\lambda$ (mean # of events per unit area) is constant in any subregion s of the study domain D: no environmental or first-order effects
- the position or occurrence of any event is independent of the occurrence of any other event: no event-to-event interaction or second-order effects

Two versions of CSR point process models
- Binomial point process: there are n events in study domain D, which are located at random
- Poisson point process: the number of events n is a realization from a Poisson distribution; once a realization $n_l$ of n is generated, these $n_l$ points are located at random within D

For a Poisson point process, the number of events n in study region D varies from realization to realization, whereas this number is fixed for a Binomial point process. In other words, if we generate L sets of simulated point patterns from a Poisson point process, there will in general be L different numbers of events over the L realizations; for the Binomial process, these L numbers will all be the same.

Homogeneous Poisson Point Process
Formal definition
- The number of events y = y(s), a count, within an arbitrary subregion s with area |s| is a realization of a random variable Y = Y(s) with a Poisson PDF:
  $P\{Y(s) = y\} = \frac{(\lambda |s|)^{y} e^{-\lambda |s|}}{y!}, \quad y = 0, 1, 2, \ldots$
- Any two RVs Y(s) and Y(s') defined over two nonoverlapping subregions s and s' are independent
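The contrast between the two CSR models can be illustrated with a small sketch. The helper names (`poisson_pmf`, `sample_poisson`) and the numeric values (intensity 0.01 over a 100 x 100 region) are assumptions chosen for illustration; the count sampler uses Knuth's multiplication method, which is one of several possible choices and is not taken from the lecture.

```python
import math
import random


def poisson_pmf(y, lam, area):
    """P{Y(s) = y} when the count over a region of the given area is Poisson(lam * area)."""
    mu = lam * area
    return mu ** y * math.exp(-mu) / math.factorial(y)


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) count with Knuth's multiplication method (fine for moderate mu)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
expected_events = 0.01 * 100 * 100  # illustrative: lambda * |D| = 100

# Poisson process: the number of events changes from realization to realization.
poisson_counts = [sample_poisson(expected_events, rng) for _ in range(5)]
# Binomial process: the number of events is fixed in advance.
binomial_counts = [100 for _ in range(5)]
print(poisson_counts, binomial_counts)
```

Only the counts are drawn here; placing the events at random within D is a separate step.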


Homogeneous Poisson Point Process: Simulation (I)
Setting
- Consider a study region D of size |D| = 100x100 and an overall intensity $\lambda$ = 0.01, leading to an expected count of $E\{Y(D)\} = \lambda |D| = 100$ events within D.
- Let D be partitioned into Q = 25 square quadrats of equal size $|s_q|$ = 20x20, for all q. One can now define a set of Q = 25 random variables $\{Y(s_q), q = 1, \ldots, Q\}$, one per quadrat.
- Under CSR, the RV Y(s) associated with any quadrat has an expected count of $E\{Y(s)\} = \lambda |s| = 4$ events (per 1st-order stationarity), and counts across different quadrats are independent.

Objective
- Generate a realization (a point pattern) from a homogeneous Poisson process; in other words, simulate counts from the Q = 25 RVs $\{Y(s_q); q = 1, \ldots, Q\}$. Once a count y(s) is simulated for quadrat s, y(s) events (points) are placed at random within s.
- Since E{Y(s)} = 4, we need to generate, on average, 4 events within any quadrat s. Since counts across quadrats are independent, simulated events within s do not influence the generation of events outside s.
- All this amounts to zooming in to a particular quadrat s, generating a count y(s) from a Poisson distribution with mean E{Y(s)} = 4, and then repeating for all Q quadrats.
- This is the same as generating, on average, 100 events randomly within D from a RV Y(D) with a Poisson distribution with mean $E\{Y(D)\} = \lambda |D| = 100$; then, y(D) would denote a simulated count over D.

Homogeneous Poisson Point Process: Flowchart (II)
Let L be the number of realizations (alternative point patterns) to generate, and $n_l$ be the number of events of the (to be) simulated point pattern in the l-th realization (using the previous notation, $n_l$ = y(D)).
1. generate L numbers (counts) $\{n_l; l = 1, \ldots, L\}$ from a Poisson distribution with mean $\lambda |D|$ (= expected # of events); these L counts serve as numbers of events for the point patterns to be simulated
2. for the l-th realization, simulate the locations of $n_l$ events in D, by generating $n_l$ values of x- and y-coordinates, independent and uniformly distributed along the two sides of a rectangle enclosing D
3. reject any events that do not lie in D, and repeat step 2 until $n_l$ events are obtained within D; steps 2 & 3 constitute a realization from a Binomial process with $n_l$ events
4. repeat steps 2 and 3 with another # of events $n_{l'}$, to generate another realization, i.e., the l'-th simulated point pattern
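The four flowchart steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the function names are hypothetical, the Poisson sampler is Knuth's multiplication method, and the study region used in the demo (a disc of radius 50 inside a 100 x 100 bounding rectangle) is an invented example chosen so that the rejection step actually does something.

```python
import math
import random


def sample_poisson(mu, rng):
    """Step 1 helper: one Poisson(mu) count via Knuth's multiplication method."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_binomial(n, bbox, inside, rng):
    """Steps 2-3: place exactly n events uniformly in D by rejection from the bounding rectangle."""
    xmin, ymin, xmax, ymax = bbox
    events = []
    while len(events) < n:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if inside(x, y):  # reject candidates that fall outside D
            events.append((x, y))
    return events


def simulate_poisson_patterns(lam, area, bbox, inside, n_realizations, rng):
    """Steps 1-4: draw a count n_l per realization, then a Binomial pattern with n_l events."""
    patterns = []
    for _ in range(n_realizations):
        n_l = sample_poisson(lam * area, rng)
        patterns.append(simulate_binomial(n_l, bbox, inside, rng))
    return patterns


def in_disc(x, y):
    """Hypothetical study region D: disc of radius 50 centered at (50, 50)."""
    return math.hypot(x - 50.0, y - 50.0) <= 50.0


rng = random.Random(42)
disc_area = math.pi * 50.0 ** 2
patterns = simulate_poisson_patterns(0.01, disc_area, (0.0, 0.0, 100.0, 100.0), in_disc, 20, rng)
print([len(p) for p in patterns])
```

For a rectangular D the predicate always returns True and no candidate is rejected.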

Realizations from a Binomial Point Process
[Figure: two realizations from a Binomial spatial point process with n = 50 events]
- Events can appear clustered, but this is due to chance
- if 1st-order effects were present, i.e., if $\lambda$ varied through the study region, more events should appear at the same places from one realization to another; hence, clusters would be formed around high-intensity areas in each realization, even if no interaction was included in the model
- if strong 2nd-order effects were present, events would appear clustered in every realization; such clusters, however, would appear in different places from one realization to another if no 1st-order effects were present

Sampling Distribution of a Statistic Under CSR (I)
Sample statistic
Mean event-to-nearest-event (ENE) distance; here the variable of interest is the distance (ENE) between any event and its nearest neighbor event, and the selected summary statistic is the mean of those distances:
$\bar{d}_{min} = \frac{1}{n} \sum_{i=1}^{n} d_{min}(u_i)$

Constructing the sampling distribution of mean ENE via simulation
1. Adopt a null hypothesis, here CSR, as a mechanism for generating point patterns; that null hypothesis also includes the parameters, here $\lambda$, of the population
2. Generate (simulate) one realization of a point pattern under CSR
3. Compute the simulated average $d_{min}$ value from that realization
4. Repeat steps (2) and (3) a large number of times, say L on the order of 1000, to obtain L simulated average $d_{min}$ values
5. Histogram of the L simulated average $d_{min}$ values = sampling distribution of the mean ENE distance under the null hypothesis
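A minimal sketch of the simulation recipe for the mean ENE distance follows. It assumes a rectangular domain and a Binomial (fixed n) version of CSR; the function names, the 100 x 100 domain, and the number of simulations are illustrative choices, not values prescribed by the lecture.

```python
import math
import random


def mean_ene_distance(events):
    """Mean event-to-nearest-event distance of a point pattern (brute force, O(n^2))."""
    total = 0.0
    for i, (xi, yi) in enumerate(events):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events)
            if j != i
        )
    return total / len(events)


def csr_pattern(n, width, height, rng):
    """One Binomial CSR realization: n events placed uniformly in a width x height rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def sampling_distribution(n, width, height, n_sim, rng):
    """Simulated mean ENE values under the CSR null hypothesis."""
    return [mean_ene_distance(csr_pattern(n, width, height, rng)) for _ in range(n_sim)]


rng = random.Random(2024)
simulated = sampling_distribution(n=50, width=100.0, height=100.0, n_sim=200, rng=rng)
print(min(simulated), sum(simulated) / len(simulated), max(simulated))
```

A histogram of `simulated` is the sampling distribution referred to in step 5; because no edge correction is applied, boundary effects are built into the simulated values in the same way they affect an observed pattern on the same domain.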

Sampling Distribution of a Statistic Under CSR (II)
[Figure: two realizations of a Binomial point process with n = 50 events; sampling distribution (histogram) of average $d_{min}$ values from 500 simulated (under CSR) point patterns, each having n = 50 events]

Sampling Distribution of a Statistic Under CSR (III)
[Figure: two realizations of a Binomial point process with n = 100 events; sampling distribution (histogram) of average $d_{min}$ values from 500 simulated (under CSR) point patterns, each having n = 100 events]

Looking at Observed Point Patterns (I)
[Figure: sampling distribution of average $d_{min}$ values under CSR; two observed point patterns with n = 100 events]
Question: Could these two point patterns be realizations under CSR?
Answer: No, and this can be said with great confidence; the pattern on the left (right) has a much larger (smaller) mean ENE distance than expected under CSR

Looking at Observed Point Patterns (II)
[Figure: observed point pattern with n = 100 events, and sampling distribution of average $d_{min}$ under CSR]
Question: Is the observed point pattern more clustered than a CSR-generated one?
Answer: Most probably no, since the observed average $d_{min}$ = 5.8 (black vertical bar) lies at the center of the sampling distribution of average $d_{min}$ values under CSR

Looking at Observed Point Patterns (III)
[Figure: observed point pattern with n = 100 events, and sampling distribution of average $d_{min}$ under CSR]
Question: Is this pattern more clustered than a CSR-generated one?
Equivalent question: Since small average $d_{min}$ values indicate clustering, what is the area to the left of the observed average $d_{min}$ on the sampling distribution under CSR?
Answer: The area under the curve of the sampling distribution to the left of the observed average $d_{min}$ = 4.65 is an indication of how unlikely the observed pattern is to be generated by CSR: the smaller that area, the more unlikely the pattern is to be a realization under CSR.
NOTE: if we were asking whether the observed point pattern was more even (less clustered) than a CSR-generated one, we would be looking at the area under the curve to the right of 4.65, since we would be interested in distance values larger than those expected under CSR

P-Value of An Observed Sample Statistic
- P-value: Area under the curve of the sampling distribution, from the observed statistic in the direction of the alternative hypothesis = probability of observing a statistic at least that extreme by chance (i.e., under the null hypothesis). Here, the probability of an average $d_{min}$ value no greater than 4.65
- Direction dependence in defining the P-value comes into play for one-sided tests; when we are just interested in whether the null hypothesis holds or not, no matter the direction of the alternative hypothesis (two-sided test), the final P-value is defined as twice the above P-value (for a symmetric sampling distribution)
- Interpretation: The P-value is a measure of how unlikely the observed pattern is to be generated by the null hypothesis: the smaller the P-value, the more unlikely the pattern is to be a realization under the null hypothesis, here CSR
- Any P-value is associated with a null hypothesis, since a P-value is computed from a sampling distribution which in turn is generated under a null hypothesis; here, the null hypothesis involves a spatial point process model (CSR) and some fixed quantities, i.e., the # of events and the particular domain (with its boundaries)
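The P-value definition above translates directly into a proportion of simulated values. The sketch below is an assumption-laden illustration: the function names are hypothetical, the simulated values are a toy list rather than real output, and the plain proportion is used (some practitioners prefer a (k + 1)/(L + 1) variant, which is not discussed in the lecture).

```python
def one_sided_p_value(observed, simulated, alternative="less"):
    """Proportion of simulated statistics at least as extreme as the observed one.

    alternative="less" looks at the left tail (e.g., clustering for mean ENE distance);
    alternative="greater" looks at the right tail (e.g., evenly spaced events).
    """
    if alternative == "less":
        extreme = sum(1 for value in simulated if value <= observed)
    elif alternative == "greater":
        extreme = sum(1 for value in simulated if value >= observed)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return extreme / len(simulated)


def two_sided_p_value(observed, simulated):
    """Twice the smaller one-sided P-value, capped at 1 (assumes a roughly symmetric distribution)."""
    smaller = min(
        one_sided_p_value(observed, simulated, "less"),
        one_sided_p_value(observed, simulated, "greater"),
    )
    return min(1.0, 2.0 * smaller)


# Toy sampling distribution standing in for simulated average d_min values.
toy_simulated = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(one_sided_p_value(2, toy_simulated, "less"))
print(two_sided_p_value(2, toy_simulated))
```

In practice `simulated` would hold the L simulated average d_min values generated under CSR for the same number of events and the same domain as the observed pattern.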

Sampling Distribution of G Function Under CSR
[Figure: envelopes of simulated G(d) curves under CSR]
Interpretation: Plots provide the envelope of simulated minimum and maximum G(d) curves under the null hypothesis of CSR, for a given overall intensity $\lambda$ computed as n/|D|, hence tied to the # of events considered and the particular domain. The larger n is (more events in the domain), the tighter that envelope.
Link to hypothesis testing: To assess whether an observed point pattern can be regarded as a realization from a CSR null process, evaluate the relative position (within that envelope) of the observed G(d) curve

Testing Observed Ghat Plots Against CSR (I)
[Figure: two observed point patterns with n = 100 events]
Question: Could these two point patterns be realizations under CSR?
Answer: Most probably no, since the observed G(d) curve lies outside the simulation envelope
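The envelope idea can be sketched as below. Everything here is an illustrative assumption: the function names, the distance cutoffs, the 100 x 100 domain, the 30 simulations, and the regular grid used as a stand-in "observed" pattern. The envelope is the pointwise minimum and maximum of G(d) over simulated Binomial CSR patterns.

```python
import math
import random


def nearest_distances(events):
    """Distance from each event to its nearest neighboring event."""
    return [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events) if j != i)
        for i, (xi, yi) in enumerate(events)
    ]


def g_function(events, cutoffs):
    """Empirical G(d): proportion of nearest-event distances no greater than each cutoff d."""
    d_min = nearest_distances(events)
    return [sum(1 for v in d_min if v <= d) / len(d_min) for d in cutoffs]


def csr_envelope(n, width, height, cutoffs, n_sim, rng):
    """Pointwise min and max of G(d) over n_sim Binomial CSR patterns with n events."""
    lower = [1.0] * len(cutoffs)
    upper = [0.0] * len(cutoffs)
    for _ in range(n_sim):
        pattern = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        g_sim = g_function(pattern, cutoffs)
        lower = [min(a, b) for a, b in zip(lower, g_sim)]
        upper = [max(a, b) for a, b in zip(upper, g_sim)]
    return lower, upper


rng = random.Random(7)
cutoffs = [2.0, 4.0, 6.0, 8.0, 10.0]
lower, upper = csr_envelope(100, 100.0, 100.0, cutoffs, 30, rng)

# A perfectly regular 10 x 10 grid (spacing 10) as a stand-in for an "observed" pattern.
grid = [(5.0 + 10.0 * i, 5.0 + 10.0 * j) for i in range(10) for j in range(10)]
g_obs = g_function(grid, cutoffs)
outside = [not (lo <= g <= hi) for g, lo, hi in zip(g_obs, lower, upper)]
print(outside)
```

An observed curve that leaves the envelope at some distance, as the grid does here at short distances, is evidence against CSR for that number of events and that domain.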

Testing Observed Ghat Plots Against CSR (II)
[Figure: observed point pattern with n = 100 events]
Question: Could this point pattern be a realization under CSR?
Answer: Most probably yes, since the observed G(d) curve lies very close to the mean simulated plot, and is well within the simulation envelope

Analytically-Derived Sampling Distributions
Concept
- For simple domains, e.g., rectangles, there exist mathematical formulae that provide the expected values of sample statistics under CSR; in other words, people have already calculated what the mean of a very large number of simulated average $d_{min}$ or G(d) values under CSR is, without ever touching a computer
- These formulae were derived before the advent of powerful computers, and have been used for a long time in point pattern analysis
- Since no simulation runs are involved, such analytically-derived formulae can be easily used without resorting to computer-intensive simulation procedures
Limitations
- Analytically-derived formulae need to account for the fact that events near the boundary of the study region do not have the same number of neighbors as events in the middle of that region
- Such edge effects can be taken care of when the study region has simple geometry, e.g., for rectangles

CSR-Expected Mean Nearest Neighbor Distance
Definition
Average of all n ENE values: $\bar{d}_{min} = \frac{1}{n} \sum_{i=1}^{n} d_{min}(u_i)$
Note that a single number does not suffice to describe a point pattern
Checking for CSR
1. Compute the expected value of the mean nearest-neighbor distance under CSR: $E\{\bar{d}_{min}\} = \frac{1}{2\sqrt{\lambda}}$, with $\lambda = n/|D|$
2. Form the ratio R: $R = \bar{d}_{min} / E\{\bar{d}_{min}\}$
Interpretation:
- R < 1: observed nearest neighbor distances are shorter than expected, i.e., a tendency towards clustering
- R > 1: a tendency towards evenly spaced events
The result depends heavily upon the study area definition (used to compute $\lambda$)

CSR-Expected G and F Functions
G function definition: Proportion of event-to-nearest-event distances $d_{min}(u_i)$ no greater than a given distance cutoff d, i.e., the cumulative distribution function (CDF) of all n event-to-nearest-event distances:
$G(d) = \frac{\#\{d_{min}(u_i) \le d,\ i = 1, \ldots, n\}}{n}$
F function definition: Proportion of point-to-nearest-event distances $d_{min}(t_p)$ no greater than a given distance cutoff d, i.e., the CDF of all m point-to-nearest-event distances:
$F(d) = \frac{\#\{d_{min}(t_p) \le d,\ p = 1, \ldots, m\}}{m}$
Expected G and F functions under CSR, for relatively small distances to avoid edge effects:
$E\{G(d)\} = E\{F(d)\} = 1 - e^{-\lambda \pi d^2}$
Checking for CSR: compare the empirical functions G(d) and F(d) with their theoretical counterparts E{G(d)} and E{F(d)} under CSR
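As a worked illustration of the nearest-neighbor ratio and the CSR-expected G function, the sketch below uses the standard textbook expressions, namely an expected mean nearest-neighbor distance of 1/(2 sqrt(lambda)) and an expected G(d) of 1 - exp(-lambda pi d^2), both without edge correction. The function names and the two test patterns (a regular grid and a tight cluster in a 100 x 100 region) are assumptions made for the example.

```python
import math
import random


def mean_ene_distance(events):
    """Mean event-to-nearest-event distance (brute force)."""
    return sum(
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events) if j != i)
        for i, (xi, yi) in enumerate(events)
    ) / len(events)


def expected_mean_ene(n, area):
    """CSR-expected mean nearest-neighbor distance, ignoring edge effects."""
    lam = n / area
    return 1.0 / (2.0 * math.sqrt(lam))


def nn_ratio(events, area):
    """R < 1 suggests clustering; R > 1 suggests evenly spaced events."""
    return mean_ene_distance(events) / expected_mean_ene(len(events), area)


def expected_g(d, lam):
    """CSR-expected G(d) (and F(d)) for small d, ignoring edge effects."""
    return 1.0 - math.exp(-lam * math.pi * d * d)


area = 100.0 * 100.0
grid = [(5.0 + 10.0 * i, 5.0 + 10.0 * j) for i in range(10) for j in range(10)]
rng = random.Random(3)
cluster = [(50.0 + rng.random(), 50.0 + rng.random()) for _ in range(100)]

print(nn_ratio(grid, area), nn_ratio(cluster, area))
print(expected_g(5.0, 100 / area))
```

Because the ratio divides by a quantity computed from n/|D|, enlarging the assumed study area around the same events lowers R, which is the dependence on study area definition noted above.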

Examples of Observed and CSR-Expected G Functions
[Figure: observed and CSR-expected G functions]

Examples of Observed and CSR-Expected F Functions
[Figure: observed and CSR-expected F functions]

Example with Evenly Spaced Events
[Figure: example with evenly spaced events]

The K Function
1. construct a set of concentric circles (of increasing radius d) around each event
2. compute the # of events in each distance band, excluding the event at the center
3. the cumulative number of events up to radius d around all events, averaged over the n events and divided by the intensity $\lambda$, becomes the sample K function K(d)

CSR-Expected K Function
K(d) & L(d) functions under CSR
- $E\{K(d)\} = \pi d^2$
- this can become a very large number (due to $d^2$), and consequently small differences between K(d) and E{K(d)} cannot be easily resolved
- use the L function instead: $L(d) = \sqrt{K(d)/\pi} - d$, with E{L(d)} = 0

Interpreting the L function
- L(d) > 0 implies clustering
- L(d) < 0 implies stratification
- Watch out for edge effects
- Reality tends to be patchy
- Can we use Monte Carlo simulations instead of edge effect corrections?

Examples of L Functions
[Figure: examples of L functions]
L(d) > 0: more events are separated by distances up to d than expected under CSR, i.e., clustering
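A naive version of the sample K and L functions can be sketched as follows. This is an illustration under assumptions: no edge correction is applied (which biases K downward near the boundary), the intensity is estimated as n/|D|, and the function names and test patterns are invented for the example.

```python
import math
import random


def k_function(events, area, d):
    """Naive sample K(d): ordered pairs within distance d, averaged over events, divided by intensity."""
    n = len(events)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(events)
        for j, (xj, yj) in enumerate(events)
        if i != j and math.hypot(xi - xj, yi - yj) <= d
    )
    lam = n / area  # estimated intensity
    return pairs / (n * lam)


def l_function(events, area, d):
    """L(d) = sqrt(K(d) / pi) - d, which has expectation 0 under CSR."""
    return math.sqrt(k_function(events, area, d) / math.pi) - d


area = 100.0 * 100.0
rng = random.Random(11)
cluster = [(50.0 + rng.random(), 50.0 + rng.random()) for _ in range(20)]
grid = [(5.0 + 10.0 * i, 5.0 + 10.0 * j) for i in range(10) for j in range(10)]

print(l_function(cluster, area, 2.0))  # positive: clustering
print(l_function(grid, area, 8.0))     # negative: no neighbors within 8 units
```

One way to sidestep an explicit edge correction is to compare the naive L(d) of the observed pattern against naive L(d) curves from CSR simulations on the same domain, since both are then affected by the boundary in the same way.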

Other Spatial Point Process Models
Heterogeneous with no second-order effects
- Heterogeneous Poisson process: the intensity is made spatially varying, $\lambda(u)$, and could be linked to covariates. Simulation proceeds by generating events from a homogeneous Poisson process with intensity $\lambda_{max} = \max\{\lambda(u)\}$, and then independently keeping an event at u with probability $\lambda(u)/\lambda_{max}$
- Cox process: spatially varying intensity $\lambda(u)$ in a non-deterministic way (doubly stochastic process); a field of $\lambda(u)$-values is first simulated, and then simulation proceeds as in the heterogeneous Poisson model
Homogeneous with second-order effects
- Poisson cluster process: i) simulate centroids of parent events from a homogeneous Poisson process, ii) associate a simulated number of offspring with each parent centroid, iii) simulate the locations of offspring around each parent centroid according to some bivariate PDF, and iv) keep only the locations of offspring as the final simulated point pattern
There also exist processes with both first- and second-order effects, e.g., the inhomogeneous Poisson cluster process

Recap (I)
Confirmatory analysis of spatial point patterns
- Allows us to quantify the departure of results obtained via exploratory tools, e.g., average $d_{min}$ or G(d), from expected results derived under a specific null hypothesis (here CSR)
- Can be used to assess to what extent observed point patterns can be regarded as realizations from a particular spatial process (here CSR)
- CSR involves: (i) a constant intensity and (ii) no event-to-event interaction
Sampling distribution of a test statistic
- Lies at the heart of any statistical hypothesis testing procedure, and is tied to a particular null hypothesis (and a particular study domain)
- Simulation and analytical derivations are two alternative ways of computing such sampling distributions (the latter being increasingly replaced by the former)
- Watch out for edge effects when using analytically derived sampling distributions
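The thinning procedure described for the heterogeneous Poisson process can be sketched as below. The intensity surface (a linear west-to-east gradient peaking at 0.02), the rectangular 100 x 100 domain, and all function names are hypothetical choices for illustration; the Poisson count sampler is Knuth's multiplication method.

```python
import math
import random


def sample_poisson(mu, rng):
    """One Poisson(mu) count via Knuth's multiplication method."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_heterogeneous_poisson(intensity, lam_max, width, height, rng):
    """Thinning: simulate a homogeneous process at lam_max, keep each event with prob intensity/lam_max."""
    n_candidates = sample_poisson(lam_max * width * height, rng)
    kept = []
    for _ in range(n_candidates):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if rng.random() < intensity(x, y) / lam_max:
            kept.append((x, y))
    return kept


def gradient_intensity(x, y):
    """Hypothetical intensity surface: rises linearly from 0 at x = 0 to 0.02 at x = 100."""
    return 0.02 * x / 100.0


rng = random.Random(1)
events = simulate_heterogeneous_poisson(gradient_intensity, 0.02, 100.0, 100.0, rng)
west = sum(1 for x, _ in events if x < 50.0)
east = len(events) - west
print(len(events), west, east)
```

With this surface the eastern half is expected to hold about three times as many events as the western half, a first-order effect that would recur in the same place across realizations.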

Recap (II)
More interesting spatial point process models
- Heterogeneous Poisson process, Cox process, Poisson cluster process
- Note: It is almost impossible to assess whether an observed point pattern (a single realization from a hypothesized point process) stems from a process with only first-order effects, only second-order effects, or a combination thereof; different processes could yield indistinguishable realizations under certain parameter combinations (equi-finality)
Parameter estimation?
- In practice, we are most often dealing with the problem of estimating the parameters of a spatial point process model from data, i.e., from an observed spatial point pattern. This is an inverse problem, as opposed to the forward problem of generating patterns from processes.
- The inverse problem, however, is under-determined, mostly because we only have one realization (the observed pattern) from a hypothesized process
[Figure: Schrödinger's Box. Forward problem: data-generating process to data/map. Inverse problem: data/map back to data-generating process]
