Overview of Spatial analysis in ecology


Spatial Point Patterns & Complete Spatial Randomness - II
Geog 210C Introduction to Spatial Data Analysis
Chris Funk
Lecture 8

Overview of Spatial analysis in ecology
- The 1st step in understanding an ecological process is to identify patterns
- Spatial auto-correlation might indicate patterns or processes
- Processes can operate on multiple scales, patches, gradients
- Auto-correlation may be spurious, interpolative, true or induced
  - True = caused by interaction among neighboring locations
  - Induced = caused by a causal relationship with another variable (or variables) which is itself auto-correlated
- Nearest neighbor distances (average $d_i$)
  - Under CSR, counts follow a Poisson distribution, and average $d_i$ follows a Weibull distribution
  - $E(\text{average } d_i) = \gamma_i (A/n)^{0.5}$, where $A$ = area, $n$ = number of points, and $\gamma_i$ = a constant which varies as a function of the $i$-th neighbor
- Ripley's K function
  - Under CSR, the expected # of points within distance $d$ of an event is $\lambda \pi d^2$, where $d$ is the distance lag and $\lambda$ is the intensity
- Ripley's L(d) function linearizes K and stabilizes the variances
  - $L(d) = (K(d)/\pi)^{0.5} - d$
  - Under CSR, $E(L(d)) = 0$; positive values imply clustering, negative values imply stratification

Complete Spatial Randomness (CSR)
Loose definition
A spatial process, here a spatial point process, serving as a generating mechanism of spatial point patterns, with the following characteristics:
- The intensity $\lambda$ (mean # of events per unit area) is constant in any subregion $s$ of the study domain $D$: no environmental or first-order effects
- The position or occurrence of any event is independent of the occurrence of any other event: no event-to-event interaction or second-order effects

Two versions of CSR point process models
- Binomial point process: there are $n$ events in study domain $D$, which are located at random
- Poisson point process: the number of events $n$ is a realization from a Poisson distribution; once a realization $n_l$ of $n$ is generated, these $n_l$ points are located at random within $D$
For a Poisson point process, the number of events $n$ in study region $D$ varies from realization to realization, whereas this number is fixed for a Binomial point process. In other words, if we generate $L$ sets of simulated point patterns from a Poisson point process, the numbers of events will in general differ across the $L$ realizations; for the Binomial process, these $L$ numbers will all be the same.

Homogeneous Poisson Point Process
Formal definition
- The number of events $y \equiv y(s)$, a count, within an arbitrary subregion $s$ with area $|s|$ is a realization of a random variable $Y \equiv Y(s)$ with a Poisson PDF:
  $P\{Y(s) = y\} = \frac{(\lambda |s|)^{y} e^{-\lambda |s|}}{y!}, \quad y = 0, 1, 2, \ldots$
- Any two RVs $Y(s)$ and $Y(s')$ defined over two non-overlapping subregions $s$ and $s'$ are independent
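The difference between the two CSR models can be seen by simulating a few patterns from each. The following is a minimal sketch, not part of the original slides: the function names, the rectangular 100 x 100 domain and the intensity 0.01 are illustrative assumptions, and the Poisson count is drawn by counting unit-rate exponential arrivals because the Python standard library has no Poisson sampler.

```python
import random

def poisson_count(mean, rng):
    """Draw one Poisson(mean) count by counting unit-rate exponential
    arrivals that fall in the interval [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def binomial_pattern(n, width, height, rng):
    """Binomial point process: exactly n events, located uniformly at random."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def poisson_pattern(intensity, width, height, rng):
    """Homogeneous Poisson point process: random count, then uniform locations."""
    n = poisson_count(intensity * width * height, rng)
    return binomial_pattern(n, width, height, rng)

rng = random.Random(0)
binomial_sizes = [len(binomial_pattern(100, 100, 100, rng)) for _ in range(5)]
poisson_sizes = [len(poisson_pattern(0.01, 100, 100, rng)) for _ in range(5)]
print(binomial_sizes)  # the same count in every realization
print(poisson_sizes)   # counts scatter around the expected value of 100
```

Conditional on its count, each Poisson realization is a Binomial realization, which is why the second function simply reuses the first.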

Homogeneous Poisson Point Process: Simulation (I)
Setting
- Consider a study region $D$ of size $|D| = 100 \times 100$ and an overall intensity $\lambda = 0.01$, leading to an expected count of $E\{Y(D)\} = \lambda |D| = 100$ events within $D$.
- Let $D$ be partitioned into $Q = 25$ square quadrats of equal size $|s_q| = 20 \times 20$, for all $q$. One can now define a set of $Q = 25$ random variables $\{Y(s_q),\ q = 1, \ldots, Q\}$, one per quadrat.
- Under CSR, the RV $Y(s)$ associated with any quadrat has an expected count of $E\{Y(s)\} = \lambda |s| = 4$ events (per 1st-order stationarity), and counts across different quadrats are independent.
Objective
- Generate a realization (a point pattern) from a homogeneous Poisson process; in other words, simulate counts from the $Q = 25$ RVs $\{Y(s_q);\ q = 1, \ldots, Q\}$. Once a count $y(s)$ is simulated for quadrat $s$, $y(s)$ events (points) are placed at random within $s$.
- Since $E\{Y(s)\} = 4$, we need to generate, on average, 4 events within any quadrat $s$. Since counts across quadrats are independent, simulated events within $s$ do not influence the generation of events outside $s$.
- All this amounts to zooming in to a particular quadrat $s$, generating a count $y(s)$ from a Poisson distribution with mean $E\{Y(s)\} = 4$, and then repeating for all $Q$ quadrats.
- This is the same as generating, on average, 100 events randomly within $D$ from a RV $Y(D)$ with a Poisson distribution with mean $E\{Y(D)\} = \lambda |D| = 100$; then, $y(D)$ would denote a simulated count over $D$.

Homogeneous Poisson Point Process: Flowchart (II)
Let $L$ be the number of realizations (alternative point patterns) to generate, and $n_l$ be the number of events of the (to be) simulated point pattern in the $l$-th realization (using the previous notation, $n_l \equiv y(D)$).
1. Generate $L$ numbers (counts) $\{n_l;\ l = 1, \ldots, L\}$ from a Poisson distribution with mean $\lambda |D|$ (= expected # of events); these $L$ counts serve as numbers of events for the point patterns to be simulated.
2. For the $l$-th realization, simulate the locations of $n_l$ events in $D$, by generating $n_l$ values of x- and y-coordinates, independent and uniformly distributed along the two sides of a rectangle enclosing $D$.
3. Reject any events that do not lie in $D$, and repeat step 2 until $n_l$ events are obtained within $D$; steps 2 & 3 constitute a realization from a Binomial process with $n_l$ events.
4. Repeat steps 2 and 3 with another # of events $n_{l'}$, to generate another realization, i.e., the $l'$-th simulated point pattern.
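The four flowchart steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the lecture's own code: the disc-shaped domain, the helper names and the seed are hypothetical, chosen so that the rejection step (3) actually does some work.

```python
import math
import random

def poisson_count(mean, rng):
    """Poisson(mean) draw via unit-rate exponential arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def inside_disc(x, y):
    """Hypothetical study domain D: a disc of radius 50 centred at (50, 50)."""
    return (x - 50.0) ** 2 + (y - 50.0) ** 2 <= 50.0 ** 2

def simulate_poisson_patterns(intensity, domain_area, bbox, inside, n_realizations, rng):
    xmin, ymin, xmax, ymax = bbox
    # Step 1: draw the L event counts n_l from a Poisson distribution.
    counts = [poisson_count(intensity * domain_area, rng) for _ in range(n_realizations)]
    patterns = []
    for n_l in counts:  # Step 4: repeat steps 2 and 3 for every count.
        events = []
        while len(events) < n_l:
            # Step 2: uniform coordinates along the sides of the enclosing rectangle.
            x = rng.uniform(xmin, xmax)
            y = rng.uniform(ymin, ymax)
            # Step 3: keep the candidate only if it lies in D.
            if inside(x, y):
                events.append((x, y))
        patterns.append(events)
    return patterns

rng = random.Random(42)
disc_area = math.pi * 50.0 ** 2
patterns = simulate_poisson_patterns(
    0.01, disc_area, (0.0, 0.0, 100.0, 100.0), inside_disc, 20, rng
)
print([len(p) for p in patterns])  # counts scatter around 0.01 * disc_area
```

Rejection keeps the accepted locations uniform over D because every candidate is uniform over the enclosing rectangle and acceptance depends only on membership in D.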

Realizations from a Binomial Point Process
[Figure: two realizations from a Binomial spatial point process with n = 50 events]
- Events can appear clustered, but this is due to chance
- If 1st-order effects were present, i.e., if $\lambda$ varied through the study region, more events should appear at the same places from one realization to another; hence, clusters would be formed around high-intensity areas in each realization, even if no interaction was included in the model
- If strong 2nd-order effects were present, events would appear clustered in every realization; such clusters, however, would appear in different places from one realization to another if no 1st-order effects were present

Sampling Distribution of a Statistic Under CSR (I)
Sample statistic
- Mean event-to-nearest-event (ENE) distance; here the variable of interest is the distance (ENE) between any event and its nearest neighbor event, and the selected summary statistic is the mean of those distances:
  $\bar{d}_{\min} = \frac{1}{n} \sum_{i=1}^{n} d_{\min}(u_i)$
Constructing the sampling distribution of mean ENE via simulation
1. Adopt a null hypothesis, here CSR, as a mechanism for generating point patterns; that null hypothesis also includes the parameters, here $\lambda$, of the population
2. Generate (simulate) one realization of a point pattern under CSR
3. Compute the simulated average $d_{\min}$ value from that realization
4. Repeat steps (2) and (3) a large number of times $L$ to obtain $L$ simulated average $d_{\min}$ values
5. Histogram of the $L$ simulated average $d_{\min}$ values = sampling distribution of the mean ENE distance under the null hypothesis
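Steps 1 to 5 above translate almost directly into code. The sketch below is one possible reading, with several assumptions of its own: a 100 x 100 rectangular domain, a Binomial CSR process with 50 events, a brute-force nearest-neighbour search, and 200 simulations (fewer than one would normally use, to keep the example quick).

```python
import math
import random

def mean_nn_distance(events):
    """Average event-to-nearest-event distance (brute force, O(n^2))."""
    total = 0.0
    for i, (xi, yi) in enumerate(events):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events)
            if j != i
        )
    return total / len(events)

def csr_mean_nn_samples(n_events, width, height, n_sims, rng):
    """Simulate n_sims Binomial CSR patterns and return their mean ENE distances."""
    samples = []
    for _ in range(n_sims):
        # Step 2: one realization under the null hypothesis (CSR).
        pattern = [
            (rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_events)
        ]
        # Step 3: summary statistic for that realization.
        samples.append(mean_nn_distance(pattern))
    return samples  # Step 5: a histogram of these values is the sampling distribution.

rng = random.Random(0)
samples = sorted(csr_mean_nn_samples(50, 100.0, 100.0, 200, rng))
print(samples[0], samples[len(samples) // 2], samples[-1])  # min, median, max
```

Sorting the simulated values is only a convenience for reading off quantiles; any histogram routine could be applied to the same list.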

Sampling Distribution of a Statistic Under CSR (II)
[Figure: two realizations of a Binomial point process with n = 50 events]
[Figure: sampling distribution (histogram) of average $d_{\min}$ values from 500 simulated (under CSR) point patterns, each having n = 50 events]

Sampling Distribution of a Statistic Under CSR (III)
[Figure: two realizations of a Binomial point process with n = 100 events]
[Figure: sampling distribution (histogram) of average $d_{\min}$ values from 500 simulated (under CSR) point patterns, each having n = 100 events]

Looking at Observed Point Patterns (I)
[Figure: sampling distribution of average $d_{\min}$ values under CSR, and two observed point patterns with n = 100 events]
Question: Could these two point patterns be realizations under CSR?
Answer: No, and this can be said with great confidence; the pattern on the left (right) has a much larger (smaller) mean ENE distance than expected under CSR

Looking at Observed Point Patterns (II)
[Figure: observed point pattern with n = 100 events, and sampling distribution of average $d_{\min}$ under CSR]
Question: Is the observed point pattern more clustered than a CSR-generated one?
Answer: Most probably not, since the observed average $d_{\min}$ = 5.8 (black vertical bar) lies at the center of the sampling distribution of average $d_{\min}$ values under CSR

Looking at Observed Point Patterns (III)
[Figure: observed point pattern with n = 100 events, and sampling distribution of average $d_{\min}$ under CSR]
Question: Is this pattern more clustered than a CSR-generated one?
Equivalent question: Since small average $d_{\min}$ values indicate clustering, what is the area to the left of the observed average $d_{\min}$ on the sampling distribution under CSR?
Answer: The area under the curve of the sampling distribution to the left of the observed average $d_{\min}$ = 4.65 is an indication of how unlikely the observed pattern is to be generated by CSR: the smaller that area, the more unlikely the pattern is to be a realization under CSR.
NOTE: if we were asking whether the observed point pattern was more even (less clustered) than a CSR-generated one, we would be looking at the area under the curve to the right of 4.65, since we would be interested in distance values larger than those expected under CSR

P-Value of An Observed Sample Statistic
- P-value: area under the curve of the sampling distribution, from the observed statistic in the direction of the alternative hypothesis = probability of obtaining by chance (i.e., under the null hypothesis) a statistic at least as extreme as the one observed. Here, the probability of an average $d_{\min}$ value $\le$ 4.65
- Direction dependence in defining the P-value comes into play for one-sided tests; when we are just interested in whether the null hypothesis holds or not, no matter the direction of the alternative hypothesis (two-sided test), the final P-value is defined as twice the above P-value (for a symmetric sampling distribution)
- Interpretation: The P-value is a measure of how unlikely the observed pattern is to be generated by the null hypothesis: the smaller the P-value, the more unlikely the pattern is to be a realization under the null hypothesis, here CSR
- Any P-value is associated with a null hypothesis, since a P-value is computed from a sampling distribution which in turn is generated under a null hypothesis; here, the null hypothesis involves a spatial point process model (CSR) and some fixed quantities, i.e., the # of events and the particular domain (with its boundaries)
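With a simulated sampling distribution, the "area under the curve" becomes a simple proportion of simulated values. The sketch below illustrates this; the function names are hypothetical, and the list of simulated values is a made-up placeholder (an evenly spaced grid of numbers), not output from any real simulation.

```python
def one_sided_p_value(simulated, observed, alternative):
    """Proportion of simulated statistics at least as extreme as the observed one.

    alternative="clustered": small mean ENE distances are extreme (left tail).
    alternative="even":      large mean ENE distances are extreme (right tail).
    """
    if alternative == "clustered":
        extreme = sum(1 for value in simulated if value <= observed)
    elif alternative == "even":
        extreme = sum(1 for value in simulated if value >= observed)
    else:
        raise ValueError("alternative must be 'clustered' or 'even'")
    return extreme / len(simulated)

def two_sided_p_value(simulated, observed):
    """Twice the smaller tail, capped at 1 (assumes a roughly symmetric distribution)."""
    smaller_tail = min(
        one_sided_p_value(simulated, observed, "clustered"),
        one_sided_p_value(simulated, observed, "even"),
    )
    return min(1.0, 2.0 * smaller_tail)

# Placeholder "simulated" values standing in for L = 200 mean ENE distances.
simulated = [4.005 + 0.01 * k for k in range(200)]
observed = 4.65

print(one_sided_p_value(simulated, observed, "clustered"))
print(one_sided_p_value(simulated, observed, "even"))
print(two_sided_p_value(simulated, observed))
```

Some texts add one to both the numerator and the denominator so that the observed pattern counts as one of the realizations; the plain proportion is used here because it matches the area-under-the-histogram description most directly.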

Sampling Distribution of G Function Under CSR
[Figure: simulation envelopes of G(d) under CSR]
- Interpretation: The plots provide the envelope of simulated minimum and maximum G(d) curves under the null hypothesis of CSR, for a given overall intensity $\lambda$ computed as $n/|D|$, hence tied to the # of events considered and the particular domain. The larger $n$ is (more events in the domain), the tighter that envelope.
- Link to hypothesis testing: To assess whether an observed point pattern can be regarded as a realization from a CSR null process, evaluate the relative position (within that envelope) of the observed G(d) curve

Testing Observed Ghat Plots Against CSR (I)
[Figure: two observed point patterns with n = 100 events, with their observed G(d) curves and simulation envelopes]
Question: Could these two point patterns be realizations under CSR?
Answer: Most probably not, since the observed G(d) curve lies outside the simulation envelope
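An envelope test of this kind can be sketched as follows. The code is an illustration under assumptions that are not in the slides: a 100 x 100 rectangular domain, 99 Binomial CSR simulations, no edge correction, and a deliberately extreme hypothetical "observed" pattern with all events crowded into one corner.

```python
import math
import random

def nn_distances(events):
    """Event-to-nearest-event distance for every event (brute force)."""
    return [
        min(math.hypot(xi - xj, yi - yj) for j, (xj, yj) in enumerate(events) if j != i)
        for i, (xi, yi) in enumerate(events)
    ]

def g_function(events, cutoffs):
    """Empirical G(d): proportion of nearest-event distances no greater than d."""
    d_min = nn_distances(events)
    return [sum(1 for v in d_min if v <= d) / len(d_min) for d in cutoffs]

def csr_envelope(n_events, width, height, cutoffs, n_sims, rng):
    """Pointwise minimum and maximum of G(d) over n_sims CSR realizations."""
    lower = [1.0] * len(cutoffs)
    upper = [0.0] * len(cutoffs)
    for _ in range(n_sims):
        pattern = [
            (rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_events)
        ]
        g_sim = g_function(pattern, cutoffs)
        lower = [min(lo, g) for lo, g in zip(lower, g_sim)]
        upper = [max(hi, g) for hi, g in zip(upper, g_sim)]
    return lower, upper

def leaves_envelope(g_observed, lower, upper):
    """True if the observed curve falls outside the envelope at any cutoff."""
    return any(g < lo or g > hi for g, lo, hi in zip(g_observed, lower, upper))

rng = random.Random(7)
cutoffs = [float(d) for d in range(1, 16)]
lower, upper = csr_envelope(50, 100.0, 100.0, cutoffs, 99, rng)

# Hypothetical observed pattern: 50 events crowded into a 5 x 5 corner.
clustered = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(50)]
g_clustered = g_function(clustered, cutoffs)
print(leaves_envelope(g_clustered, lower, upper))
```

Because the envelope is built from pointwise extremes, increasing the number of simulations widens it slightly, whereas increasing the number of events per pattern tightens it.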

Testing Observed Ghat Plots Against CSR (II)
[Figure: observed point pattern with n = 100 events, with its observed G(d) curve and simulation envelope]
Question: Could this point pattern be a realization under CSR?
Answer: Most probably yes, since the observed G(d) curve lies very close to the mean simulated curve, and is well within the simulation envelope

Analytically-Derived Sampling Distributions
Concept
- For simple domains, e.g., rectangles, there exist mathematical formulae that provide the expected values of sample statistics under CSR; in other words, people have already calculated what the mean of a very large number of simulated average $d_{\min}$ or G(d) values under CSR is, without ever touching a computer
- These formulae were derived before the advent of powerful computers, and have been used for a long time in point pattern analysis
- Since no simulation runs are involved, such analytically-derived formulae can be used easily, without resorting to computer-intensive simulation procedures
Limitations
- Analytically-derived formulae need to account for the fact that events near the boundary of the study region do not have the same number of neighbors as events in the middle of that region
- Such edge effects can be taken care of when the study region has simple geometry, e.g., for rectangles
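As an illustration of what such an analytical derivation looks like, the CSR-expected distribution of nearest-event distances follows in a few lines from the Poisson property. The sketch below ignores edge effects (it treats the process as extending over the whole plane); the symbols $\lambda$ for the intensity and $d_{\min}$ for the nearest-event distance are notational assumptions.

```latex
% The number of events in a disc of radius d has a Poisson distribution
% with mean \lambda \pi d^2, so the disc is empty with probability
P\{d_{\min} > d\} = e^{-\lambda \pi d^{2}} .

% Hence the CSR-expected CDF of nearest-event distances is
E\{G(d)\} = P\{d_{\min} \le d\} = 1 - e^{-\lambda \pi d^{2}} ,

% and the expected nearest-event distance is
E\{d_{\min}\} = \int_{0}^{\infty} P\{d_{\min} > d\}\, \mathrm{d}d
             = \int_{0}^{\infty} e^{-\lambda \pi d^{2}}\, \mathrm{d}d
             = \frac{1}{2}\sqrt{\frac{\pi}{\lambda \pi}}
             = \frac{1}{2\sqrt{\lambda}} .
```

Near the boundary of a finite study region part of the disc falls outside the domain, which is exactly the edge effect listed under the limitations.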

CSR-Expected Mean Nearest Neighbor Distance
Definition
- Average of all $n$ ENE values: $\bar{d}_{\min} = \frac{1}{n} \sum_{i=1}^{n} d_{\min}(u_i)$
- Note that a single number does not suffice to describe a point pattern
Checking for CSR
1. Compute the expected value of the mean nearest-neighbor distance under CSR: $E\{d_{\min}\} = \frac{1}{2\sqrt{\lambda}}$
2. Form the ratio $R$: $R = \bar{d}_{\min} / E\{d_{\min}\}$
Interpretation
- $R < 1$: observed nearest neighbor distances are shorter than expected, i.e., a tendency towards clustering
- $R > 1$: a tendency towards evenly spaced events
- The result depends heavily upon the study area definition (used to compute $\lambda$)

CSR-Expected G and F Functions
- G function definition: proportion of event-to-nearest-event distances $d_{\min}(u_i)$ no greater than a given distance cutoff $d$, i.e., the cumulative distribution function (CDF) of all $n$ event-to-nearest-event distances:
  $G(d) = \frac{\#\{d_{\min}(u_i) \le d,\ i = 1, \ldots, n\}}{n}$
- F function definition: proportion of point-to-nearest-event distances $d_{\min}(t_p)$ no greater than a given distance cutoff $d$, i.e., the CDF of all $m$ point-to-nearest-event distances:
  $F(d) = \frac{\#\{d_{\min}(t_p) \le d,\ p = 1, \ldots, m\}}{m}$
- Expected G and F functions under CSR, for relatively small distances to avoid edge effects:
  $E\{G(d)\} = E\{F(d)\} = 1 - e^{-\lambda \pi d^{2}}$
- Checking for CSR: compare the empirical functions G(d) and F(d) with their theoretical counterparts $E\{G(d)\}$ and $E\{F(d)\}$ under CSR
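The ratio check can be written compactly. The sketch below assumes the standard CSR expectation $1/(2\sqrt{\lambda})$ for the mean nearest-neighbor distance, with $\lambda$ estimated as $n/|D|$, and applies no edge correction; the function names and the regular 10 x 10 grid used as input are illustrative choices, not data from the lecture.

```python
import math

def mean_nn_distance(events):
    """Average event-to-nearest-event distance (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(events):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events)
            if j != i
        )
    return total / len(events)

def nearest_neighbor_ratio(events, domain_area):
    """R = observed mean ENE distance / CSR-expected mean ENE distance."""
    intensity = len(events) / domain_area          # lambda estimated as n / |D|
    expected = 1.0 / (2.0 * math.sqrt(intensity))  # CSR expectation, no edge correction
    return mean_nn_distance(events) / expected

def expected_g_under_csr(d, intensity):
    """CSR-expected G (and F) function for small distances d."""
    return 1.0 - math.exp(-intensity * math.pi * d ** 2)

# Evenly spaced events: a regular grid with spacing 10 in a 100 x 100 domain.
grid = [(5.0 + 10.0 * i, 5.0 + 10.0 * j) for i in range(10) for j in range(10)]
ratio = nearest_neighbor_ratio(grid, 100.0 * 100.0)
print(ratio)  # about 2, i.e. R > 1: tendency towards evenly spaced events
print(expected_g_under_csr(5.0, 0.01))
```

Changing `domain_area` while keeping the events fixed changes the estimated intensity and therefore R, which is the dependence on study area definition noted above.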

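Examples of Observed and CSR-Expected G Functions
[Figure: examples of observed and CSR-expected G functions]

Examples of Observed and CSR-Expected F Functions
[Figure: examples of observed and CSR-expected F functions]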

Example with Evenly Spaced Events
[Figure: example with evenly spaced events]

The K Function
1. Construct a set of concentric circles (of increasing radius $d$) around each event
2. Compute the # of events in each distance band, excluding the event at the center
3. The cumulative number of events up to radius $d$ around all events becomes the sample K function K(d)
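The three steps above amount to counting, for each event, the other events within distance d. The sketch below adds one assumption that the slide text does not spell out: the cumulative count is scaled by $|D|/n^2$ (equivalently, divided by $n\hat{\lambda}$ with $\hat{\lambda} = n/|D|$), a common normalization that makes K(d) comparable with $\pi d^2$ under CSR. No edge correction is applied.

```python
import math

def sample_k(events, domain_area, d):
    """Naive sample K function at distance d (no edge correction).

    Counts ordered pairs of distinct events no more than d apart, then
    scales by |D| / n^2 (an assumed, commonly used normalization).
    """
    n = len(events)
    pairs_within = 0
    for i, (xi, yi) in enumerate(events):
        for j, (xj, yj) in enumerate(events):
            # Exclude the event at the centre of the circle (i == j).
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs_within += 1
    return domain_area * pairs_within / (n * n)

# Tiny hand-checkable example: two events 3 units apart in a domain of area 100.
pair = [(0.0, 0.0), (3.0, 0.0)]
print(sample_k(pair, 100.0, 2.0))  # no pairs within distance 2
print(sample_k(pair, 100.0, 3.0))  # both ordered pairs are within distance 3
```

Evaluating the function on a grid of d values gives the K curve; events near the boundary see fewer neighbors than interior events, so values at larger d are biased downward without an edge correction.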

CSR-Expected K Function
K(d) & L(d) functions under CSR
- $E\{K(d)\} = \pi d^{2}$
- This can become a very large number (due to $d^2$), and consequently small differences between K(d) and $E\{K(d)\}$ cannot be easily resolved
- Use the L function instead: $L(d) = \sqrt{K(d)/\pi} - d$, with $E\{L(d)\} = 0$
Interpreting the L function
- $L(d) > 0$ implies clustering
- $L(d) < 0$ implies stratification
- Watch out for edge effects
- Reality tends to be patchy
- Can we use Monte Carlo simulations instead of edge effect corrections?

Examples of L Functions
[Figure: examples of L functions]
- $L(d) > 0$: more events lie within distance $d$ of one another than expected under CSR, i.e., clustering
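The effect of the L transformation is easy to check numerically. In this small sketch the function name is hypothetical and the K values are constructed by hand: first the CSR expectation $\pi d^2$, then that expectation inflated and deflated by 50% to mimic clustered and regular patterns.

```python
import math

def l_from_k(k_value, d):
    """L(d) = sqrt(K(d) / pi) - d, which has expectation 0 under CSR."""
    return math.sqrt(k_value / math.pi) - d

for d in (1.0, 5.0, 10.0, 20.0):
    k_csr = math.pi * d ** 2  # CSR-expected K grows with d squared
    print(
        d,
        round(k_csr, 1),
        round(abs(l_from_k(k_csr, d)), 6),   # 0 under CSR
        round(l_from_k(1.5 * k_csr, d), 3),  # K above CSR: L > 0
        round(l_from_k(0.5 * k_csr, d), 3),  # K below CSR: L < 0
    )
```

The K column spans three orders of magnitude over these distances, while the L values stay on the scale of d itself, which is why departures from CSR are easier to read from an L plot.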

Other Spatial Point Process Models
Heterogeneous with no second-order effects
- Heterogeneous Poisson process: the intensity is made spatially varying, $\lambda(u)$, and could be linked to covariates. Simulation proceeds by generating events from a homogeneous Poisson process with intensity $\lambda_{\max} = \max\{\lambda(u)\}$, and then independently keeping an event at $u$ with probability $\lambda(u)/\lambda_{\max}$
- Cox process: the intensity $\lambda(u)$ varies spatially in a non-deterministic way (doubly stochastic process); a field of $\lambda(u)$-values is first simulated, and then simulation proceeds as in the heterogeneous Poisson model
Homogeneous with second-order effects
- Poisson cluster process:
  i) Simulate centroids of parent events from a homogeneous Poisson process
  ii) Associate a simulated number of offspring with each parent centroid
  iii) Simulate the locations of offspring around each parent centroid according to some bivariate PDF
  iv) Keep only the locations of offspring as the final simulated point pattern
There also exist processes with both first- and second-order effects, e.g., the inhomogeneous Poisson cluster process

Recap (I)
Confirmatory analysis of spatial point patterns
- Allows us to quantify the departure of results obtained via exploratory tools, e.g., average $d_{\min}$ or G(d), from expected results derived under a specific null hypothesis (here CSR)
- Can be used to assess to what extent observed point patterns can be regarded as realizations from a particular spatial process (here CSR)
- CSR involves: (i) a constant intensity and (ii) no event-to-event interaction
Sampling distribution of a test statistic
- Lies at the heart of any statistical hypothesis testing procedure, and is tied to a particular null hypothesis (and a particular study domain)
- Simulation and analytical derivations are two alternative ways of computing such sampling distributions (the latter being increasingly replaced by the former)
- Watch out for edge effects when using analytically derived sampling distributions
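Returning to the heterogeneous Poisson process described under Other Spatial Point Process Models, the thinning procedure can be sketched as below. Everything specific here is an assumption made for illustration: the 100 x 100 domain, the linear intensity surface rising from 0 at the left edge to 0.02 at the right edge, and the helper names.

```python
import random

def poisson_count(mean, rng):
    """Poisson(mean) draw via unit-rate exponential arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def linear_intensity(x, y):
    """Hypothetical intensity surface: 0 at x = 0, rising to 0.02 at x = 100."""
    return 0.02 * x / 100.0

def heterogeneous_poisson(intensity, lambda_max, width, height, rng):
    """Simulate by thinning a homogeneous Poisson process of intensity lambda_max."""
    n_candidates = poisson_count(lambda_max * width * height, rng)
    events = []
    for _ in range(n_candidates):
        x = rng.uniform(0, width)
        y = rng.uniform(0, height)
        # Keep the candidate at u = (x, y) with probability lambda(u) / lambda_max.
        if rng.random() < intensity(x, y) / lambda_max:
            events.append((x, y))
    return events

rng = random.Random(3)
pattern = heterogeneous_poisson(linear_intensity, 0.02, 100.0, 100.0, rng)
left = sum(1 for x, _ in pattern if x < 50.0)
print(len(pattern), left, len(pattern) - left)  # more events on the high-intensity side
```

The same function covers the Cox process if the intensity surface passed in is itself a simulated random field, provided `lambda_max` bounds it from above.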

Recap (II)
More interesting spatial point process models
- Heterogeneous Poisson process, Cox process, Poisson cluster process
- Note: It is almost impossible to assess whether an observed point pattern (a single realization from a hypothesized point process) stems from a process with only first-order effects, only second-order effects, or a combination thereof; different processes could yield indistinguishable realizations under certain parameter combinations (equi-finality)
Parameter estimation?
- In practice, we are most often dealing with the problem of estimating the parameters of a spatial point process model from data, i.e., from an observed spatial point pattern. This is an inverse problem, as opposed to the forward problem of generating patterns from processes.
- The inverse problem, however, is under-determined, mostly because we only have one realization (the observed pattern) from a hypothesized process
[Figure: the forward problem leads from the data-generating process to the data/map; the inverse problem runs in the opposite direction; labelled Schrödinger's Box]