MODULE 14: Spatial Statistics in Epidemiology and Public Health Lecture 3: Point Processes

Similar documents
Agile GIS : Building applicationspecific spatial analytic software from freely available software tools

Points. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Detection of Clustering in Spatial Data

Chapter 6 Spatial Analysis

Spatial point processes

A Spatio-Temporal Point Process Model for Firemen Demand in Twente

Point Pattern Analysis

Spatial point process. Odd Kolbjørnsen 13.March 2017

Getting started with spatstat

Overview of Spatial analysis in ecology

CONDUCTING INFERENCE ON RIPLEY S K-FUNCTION OF SPATIAL POINT PROCESSES WITH APPLICATIONS

Lecture 9 Point Processes

Detection of Clustering in Spatial Data

Lecture 10 Spatio-Temporal Point Processes

Summary statistics for inhomogeneous spatio-temporal marked point patterns

On Model Fitting Procedures for Inhomogeneous Neyman-Scott Processes

GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis

Second-Order Analysis of Spatial Point Processes

Statistícal Methods for Spatial Data Analysis

Outline. Practical Point Pattern Analysis. David Harvey s Critiques. Peter Gould s Critiques. Global vs. Local. Problems of PPA in Real World

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Spatial Point Pattern Analysis

School on Modelling Tools and Capacity Building in Climate and Public Health April Point Event Analysis

Tests for spatial randomness based on spacings

Introduction. Spatial Processes & Spatial Patterns

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation

Bayesian Hierarchical Models

An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University

Cluster Analysis using SaTScan

Lab #3 Background Material Quantifying Point and Gradient Patterns

Log-Density Estimation with Application to Approximate Likelihood Inference

Testing of mark independence for marked point patterns

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Cluster Analysis using SaTScan. Patrick DeLuca, M.A. APHEO 2007 Conference, Ottawa October 16 th, 2007

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

Spatial Data Analysis in Archaeology Anthropology 589b. Kriging Artifact Density Surfaces in ArcGIS

Ripley s K function. Philip M. Dixon Iowa State University,

Hierarchical Modeling and Analysis for Spatial Data

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

Exponential Distribution and Poisson Process

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Statistical Data Analysis Stat 3: p-values, parameter estimation

review session gov 2000 gov 2000 () review session 1 / 38

Statistics 222, Spatial Statistics. Outline for the day: 1. Problems and code from last lecture. 2. Likelihood. 3. MLE. 4. Simulation.

Spatial Clusters of Rates

Controlling Processes That Change Land

A Preliminary Home-Range Analysis of Loggerhead Sea Turtles Released in Virginia & North Carolina

ASCALE-SENSITIVETESTOF ATTRACTION AND REPULSION BETWEEN SPATIAL POINT PATTERNS

Many natural processes can be fit to a Poisson distribution

If we want to analyze experimental or simulated data we might encounter the following tasks:

Lecture - 30 Stationary Processes

Jean-Michel Billiot, Jean-François Coeurjolly and Rémy Drouilhet

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Test of Complete Spatial Randomness on Networks

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Chapter 4 Part 3. Sections Poisson Distribution October 2, 2008

Convergence of Random Processes

First- and Second-Order Properties of Spatiotemporal Point Processes in the Space-Time and Frequency Domains.

1 Probability and Random Variables

Using regression to study economic relationships is called econometrics. econo = of or pertaining to the economy. metrics = measurement

Stat 890 Design of computer experiments

Lecture 21: October 19

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore

Applied Spatial Analysis in Epidemiology

Temporal point processes: the conditional intensity function

PERCENTILE ESTIMATES RELATED TO EXPONENTIAL AND PARETO DISTRIBUTIONS

An adapted intensity estimator for linear networks with an application to modelling anti-social behaviour in an urban environment

This does not cover everything on the final. Look at the posted practice problems for other topics.

18.440: Lecture 28 Lectures Review

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

GRE Quantitative Reasoning Practice Questions

Defect Detection using Nonparametric Regression

Question 1. Question Solution to final homework!! Friday 5 May Please report any typos immediately for a timely response.

Brief introduction to Markov Chain Monte Carlo

How to estimate quantiles easily and reliably

STATISTICAL DATA ANALYSIS AND MODELING OF SKIN DISEASES

DRAFT: A2.1 Activity rate Loppersum

Bob Van Dolah. Marine Resources Research Institute South Carolina Department of Natural Resources

REFERENCES AND FURTHER STUDIES

Applied Spatial Analysis in Epidemiology

Motion Along a Straight Line

Hypothesis Testing hypothesis testing approach

Lecture 7. Poisson and lifetime processes in risk analysis

BEST TESTS. Abstract. We will discuss the Neymann-Pearson theorem and certain best test where the power function is optimized.

Math 304 Handout: Linear algebra, graphs, and networks.

The exponential distribution and the Poisson process

Applied Statistics. Monte Carlo Simulations. Troels C. Petersen (NBI) Statistics is merely a quantisation of common sense

Markov Processes. Stochastic process. Markov process

2.8 Linear Approximation and Differentials

Poisson line processes. C. Lantuéjoul MinesParisTech

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Systems of Linear Equations

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Stat 451 Lecture Notes Simulating Random Variables

Expected Values, Exponential and Gamma Distributions

Statistical Methods in Particle Physics

More on Estimation. Maximum Likelihood Estimation.

Transcription:

MODULE 14: Spatial Statistics in Epidemiology and Public Health Lecture 3: Point Processes Jon Wakefield and Lance Waller 1 / 90

CSR Monte Carlo testing Heterogeneous Poisson process K functions Estimating K functions Edge correction Monte Carlo envelopes Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment 2 / 90

References Baddeley, A., Rubak, E., and Turner. R. (2015) Spatial Point Patterns: Methodology and Applications in R. Boca Raton, FL: CRC/Chapman & Hall. Diggle, P.J. (1983) Statistical Analysis of Spatial Point Patterns. London: Academic Press. Diggle, P.J. (2013) Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition CRC/Chapman & Hall. Waller and Gotway (2004, Chapter 5) Applied Spatial Statistics for Public Health Data. New York: Wiley. Møller, J. and Waagepetersen (2004) Statistical Inference and Simulation for Spatial Point Processes. Boca Raton, FL: CRC/Chapman & Hall. 3 / 90

Goals Describe basic types of spatial point patterns. Introduce mathematical models for random patterns of events. Introduce analytic methods for describing patterns in observed collections of events. Illustrate the approaches using examples from archaeology, conservation biology, and epidermal neurology. 4 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Terminology Event: An occurrence of interest (e.g., disease case). Event location: Where an event occurs. Realization: An observed set of event locations (a data set). Point: Any location in the study area. Point: Where an event could occur. Event: Where an event did occur. 5 / 90

CSR Monte Carlo testing Heterogeneous Poisson process We use probability models to generate patterns so, in effect, all of the patterns we consider are random. Usually, random pattern refers to a pattern not influenced by the factors under investigation. 6 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Complete spatial randomness (CSR) Start with a model of lack of pattern. Events equally likely to occur anywhere in the study area. Event locations independent of each other. 7 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Six realizations of CSR 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 8 / 90

CSR Monte Carlo testing Heterogeneous Poisson process CSR as a boundary condition CSR serves as a boundary between: Patterns that are more clustered than CSR. Patterns that are more regular than CSR. 9 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Too Clustered (top), Too Regular (bottom) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v Clustered 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v Clustered 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v Clustered 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v Regular 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v Regular 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v Regular 10 / 90

CSR Monte Carlo testing Heterogeneous Poisson process The role of scale Eyeballing clustered/regular sometimes difficult. In fact, an observed pattern may be clustered at one spatial scale, and regular at another. Scale is an important idea and represents the distances at which the underlying process generating the data operates. Many ecology papers on estimating scale of a process (e.g., plant disease, animal territories) but little in public health literature (so far). 11 / 90

Outline CSR Monte Carlo testing Heterogeneous Poisson process Clusters of regular patterns/regular clusters Regular pattern of clusters Cluster of regular patterns v 0.0 0.2 0.4 0.6 0.8 1.0 v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u u 12 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Spatial Point Processes Mathematically, we treat our point patterns as realizations of a spatial stochastic process. A stochastic process is a collection of random variables X 1, X 2,..., X N. Examples: Number of people in line at grocery store. For us, each random variable represents an event location. 13 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Stationarity/Isotropy Stationarity: Properties of process invariant to translation. Isotropy: Properties of process invariant to rotation around an origin. Why do we need these? They provide a sort of replication in the data that allows statistical estimation. Not required, but development easier. 14 / 90

CSR Monte Carlo testing Heterogeneous Poisson process CSR as a Stochastic Process Let N(A) = number of events observed in region A, and λ = a positive constant. A homogenous spatial Poisson point process is defined by: (a) N(A) Pois(λ A ) (b) given N(A) = n, the locations of the events are uniformly distributed over A. λ is the intensity of the process (mean number of events expected per unit area). 15 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Is this CSR? Criterion (b) describes our notion of uniform and independently distributed in space. Criteria (a) and (b) give a recipe for simulating realizations of this process: Generate a Poisson random number of events. Distribute that many events uniformly across the study area. runif(n,min(x),max(x)) runif(n,min(y),max(y)) 16 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Compared to temporal Poisson process Recall the three magic features of a Poisson process in time: Number of events in non-overlapping intervals are Poisson distributed, Conditional on the number of events, events are uniformly distributed within a fixed interval, and Interevent times are exponentially distributed. In space, no ordering so no (uniquely defined) interevent distances. 17 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Spatial Poisson process equivalent Criteria (a) and (b) equivalent to 1. # events in non-overlapping areas is independent 2. Let A = region, A = area of A Pr[ exactly one event in A] lim = λ > 0 A 0 A 3. Pr[2 or more in A] lim = 0 A 0 A 18 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Interesting questions Test for CSR Simulate CSR Estimate λ (or λ(s)) Compare λ 1 (s) and λ 2 (s); s D for two point processes. (same underlying intensity? e.g. Z 1 (s) = location of disease cases Z 2 (s) = location of population at risk or environmental exposure levels.)) 19 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Monte Carlo hypothesis testing Review of basic hypothesis testing framework: T = a random variable representing the test statistic. Under H 0 : U F 0. From the data, observe T = t obs (the observed value of the test statistic). p-value = 1 F 0 (t obs ) Sometimes it is easier to simulate F 0 ( ) than to calculate F 0 ( ) exactly. Besag and Diggle (1977). 20 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Steps in Monte Carlo testing 1. observe u 1. 2. simulate u 2,..., u m from F 0. 3. p.value = rank of u 1 m. For tests of CSR (complete spatial randomness), M.C. tests are very useful since CSR is easy to simulate but distribution of test statistic may be complex. 21 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Monte Carlo tests very helpful 1. when distribution of U is complex but the spatial distribution associated with H 0 is easy to simulate. 2. for permutation tests e.g., 592 cases of leukemia in 1 million people in 790 regions permutation test requires all possible permutations. Monte Carlo assigns cases to regions under H 0. 22 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Testing for CSR A good place to start analysis because: 1. rejecting CSR a minimal prerequisite to model observed pattern (i.e. if not CSR, then there is a pattern to model) 2. tests aid in formulation of possible alternatives to CSR (e.g. regular, clustered). 3. attained significance levels measure strength of evidence against CSR. 4. informal combination of several complementary tests to show how pattern departs from CSR (although this has not been explored in depth). 23 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Final CSR notes CSR: 1. is the white noise of spatial point processes. 2. characterizes the absence of structure (signal) in data. 3. often the null hypothesis in statistical tests to determine if there is structure in an observed point pattern. 4. not as useful in public health? Why not? 24 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Heterogeneous population density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 u v 25 / 90

CSR Monte Carlo testing Heterogeneous Poisson process What if intensity not constant? Let d(x, y) = tiny region around (x, y) λ(x, y) = E[N(d(x, y))] lim d(x,y) 0 d(x, y) Pr[N(d(x, y)) = 1] lim d(x,y) 0 d(x, y) If we use λ(x, y) instead of λ, we get a Heterogeneous (Inhomogeneous) Poisson Process 26 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Heterogeneous Poisson Process ( ) 1. N(A) = Pois (x,y) A λ(x, y)d(x, y) ( A = (x,y) A d(x, y)) 2. Given N(A) = n, events distributed in A as an independent sample from a distribution on A with p.d.f. proportional to λ(x, y). 27 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Example intensity function lambda u v v 0 5 10 15 20 0 5 10 15 20 u 28 / 90

CSR Monte Carlo testing Heterogeneous Poisson process Six realizations 0 5 10 15 20 0 5 10 15 20 u v 0 5 10 15 20 0 5 10 15 20 u v 0 5 10 15 20 0 5 10 15 20 u v 0 5 10 15 20 0 5 10 15 20 u v 0 5 10 15 20 0 5 10 15 20 u v 0 5 10 15 20 0 5 10 15 20 u v 29 / 90

CSR Monte Carlo testing Heterogeneous Poisson process IMPORTANT FACT! Without additional information, no analysis can differentiate between: 1. independent events in a heterogeneous (non-stationary) environment 2. dependent events in a homogeneous (stationary) environment 30 / 90

How do we estimate intensities? For an heterogeneous Poisson process, λ(s) is closely related to the density of events over A (λ(s) density). So density estimators provide a natural approach (for details on density estimation see Silverman (1986) and Wand and Jones (1995, KernSmooth R library!)). Main idea: Put a little kernel of density at each data point, then sum to give the estimate of the overall density function. 31 / 90

What you need In order to do kernel estimation, you need to choose: 1. kernel (shape) - Epeuechnikov (1986) shows any reasonable kernel gives near optimal results. 2. bandwidth (range of influence). The larger b, the bandwidth, the smoother the estimated function. 32 / 90

Kernels and bandwidths Kernel variance = 0.02 Kernel variance = 0.03 0.0 0.2 0.4 0.6 0.8 1.0 s Kernel variance = 0.04 0.0 0.2 0.4 0.6 0.8 1.0 s Kernel variance = 0.1 0.0 0.2 0.4 0.6 0.8 1.0 s 0.0 0.2 0.4 0.6 0.8 1.0 s 33 / 90

1-dim kernel estimation If we have observations u 1, u 2,..., u N (in one dimension), the kernel density estimate is f (u) = 1 Nb N ( ) u ui Kern b i=1 (1) where Kern( ) is a kernel function satisfying Kern(s)ds = 1 D and b the bandwidth. To estimate the intensity function, replace N 1 by D 1. 34 / 90

What do we do with this? Evaluate intensity (density) at each of a grid of locations. Make surface or contour plot. 35 / 90

How do we pick the bandwidth? Circular question... Minimize the Mean Integrated Squared Error (MISE) Cross validation Scott s rule (good rule of thumb). b u = σ u N 1/(dim+4) (2) where σ u is the sample standard deviation of the u-coordinates, N represents the number of events in the data set (the sample size), and dim denotes the dimension of the study area. 36 / 90

Kernel estimation in R base density() one-dimensional kernel library(mass) kde2d(x, y, h, n = 25, lims = c(range(x), range(y))) library(kernsmooth) bkde2d(x, bandwidth, gridsize=c(51, 51), range.x=<<see below>>, truncate=true) block kernel density estimation dpik() to pick bandwidth 37 / 90

More kernel estimation in R library(splancs) kernel2d(pts,poly,h0,nx=20, ny=20,kernel= quartic ) library(spatstat) ksmooth.ppp(x, sigma, weights, edge=true) 38 / 90

Data Break: Early Medieval Grave Sites What do we have? Alt and Vach (1991). (Data sent from Richard Wright Emeritus Professor, School of Archaeology, University of Sydney.) Archeaological dig in Neresheim, Baden-Württemberg, Germany. The anthropologists and archaeologists involved wonder if this particular culture tended to place grave sites according to family units. 143 grave sites. 39 / 90

What do we want? 30 with missing or reduced wisdom teeth ( affected ). Does the pattern of affected graves (cases) differ from the pattern of the 113 non-affected graves (controls)? How could estimates of the intensity functions for the affected and non-affected grave sites, respectively, help answer this question? 40 / 90

Outline of analysis Read in data. Plot data (axis intervals important!). Call 2-dimensional kernel smoothing functions (choose kernel and bandwidth). Plot surface (persp()) and contour (contour()) plots. Visual comparison of two intensities. 41 / 90

Plot of the data 4000 6000 8000 10000 4000 6000 8000 10000 u v Grave locations (=grave, O=affected) O O O O O O O O O O O O O O O O O O O O O O O O O O O O 42 / 90

Case intensity Estimated intensity function Affected grave locations Intensity u v v 4000 8000 4000 8000 u 43 / 90

Control intensity u v Intensity Estimated intensity function 4000 6000 8000 10000 4000 6000 8000 10000 u v Non affected grave locations 44 / 90

What we have/don t have Kernel estimates suggest similarities and differences. Suggest locations where there might be differences. No significance testing (yet!) 45 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes First and Second Order Properties The intensity function describes the mean number of events per unit area, a first order property of the underlying process. What about second order properties relating to the variance/covariance/correlation between event locations (if events non independent...)? 46 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Ripley s K function Ripley (1976, 1977 introduced) the reduced second moment measure or K function K(h) = E[# events within h of a randomly chosen event], λ for any positive spatial lag h. NOTE: Use of λ implies assumption of stationary process! 47 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Properties of K(h) Ripley (1977) shows specifying K(h) for all h > 0, equivalent to specifying Var[N(A)] for any subregion A. Under CSR, K(h) = πh 2 (area of circle of with radius h). Clustered? K(h) > πh 2. Regular? K(h) < πh 2. 48 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Second order intensity? K(h) not universally hailed as the way (or even a good way) to describe second order properties. Provides a nice introduction for us, but another related property is... The second order intensity, λ 2 (s, u), λ 2 (s, u) = E(N(d(s))N(d(u)) lim d(s ) 0 d(s) d(u) d(u) 0 49 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Relationship to K(h) How does K( ) relate to λ 2 (s, u)? In R 2, for a stationary, isotropic process λ 2 (s, u) = λ 2 ( s u ). Then, λk(h) = 2π λ h 0 uλ 2 (u)du So λ 2 (h) = λ2 K (h) 2πh. Which to use (K( ) or λ 2 ( ))? In theory, λ 2 (h) often used but K(h) is easier to estimate from a set of data. 50 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Estimating K(h) Start with definition, replacing expectation with average yields K(h) = 1 λn N i=1 N δ(d(i, j) < h), j=1 j i for N events, where d(i, j) = distance between events i and j δ(d(i, j) < h) = 1 if d(i, j) < h, 0 otherwise. 51 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Edges Think about events near edges of study area. How do we count events within h if distance between edge and observed event is < h? Unobservable data drive increasing. More of a problem as h increases... Edge effects call for edge correction. 52 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Edge correction Add guard area around study area and only calculate K(h) for h < width of guard area. Toroidal correction. Ripley s (1976) edge correction K ec (h) = λ 1 N i=1 N j=1 j i w 1 ij δ(d(i, j) < h) where w ij = proportion of the circumference of the circle centered at event i with radius d(i, j) within the study area. 53 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Ripley s weights Conceptually, conditional probability of observing an event at distance d(i, j) given an event occurs d(i, j) from event i. Works for holes in study area too. (Astronomy application). Requires definition of study boundary (and way of calculating w ij ). 54 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Calculating K(h) in R library(splancs) khat(pts,poly,s,newstyle=false) poly defines polygon boundary (important!!!). library(spatstat) Kest(X, r, correction=c("border", "isotropic", "Ripley", "translate")) Boundary part of X (point process object ). 55 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Plots with K(h) Plotting (h, K(h)) for CSR is a parabola. K(h) = πh 2 implies ( ) K(h) 1/2 = h. π Besag (1977) suggests plotting h versus L(h) where ( ) 1/2 Kec (h) L(h) = h π 56 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Variability and Envelopes Are there any distributional results for K(h)? Some, but mostly for particular region shapes. Monte Carlo approaches more general. Observe K(h) from data. Simulate a realization of events from CSR. Find K(h) for the simulated data. Repeat simulations many times. Create simulation envelopes from simulation-based K(h) s. 57 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Example: Regular clusters and clusters of regularity Estimated K function, regular pattern of clusters sqrt(khat/pi) h 0.1 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Distance (h) Estimated K function, cluster of regular patterns sqrt(khat/pi) h 0.1 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Distance (h) 58 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Data break: Medieval gravesites Lhat distance 400 400 L plot for all gravesites, rectangle 0 1000 2000 3000 4000 5000 Distance Lhat distance 400 400 L plot for affected gravesites, rectangle 0 1000 2000 3000 4000 5000 Distance Lhat distance 400 400 L plot for non affected gravesites, rectangle 0 1000 2000 3000 4000 5000 Distance 59 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Clustering! Looks like strong clustering, but wait... Compares all, cases, controls to CSR. Oops, forgot to adjust for polygon, clustering inside square area! Significant clustering, but interesting clustering? Let s try again with polygon adjustment. 60 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Medieval graves: K functions with polygon adjustment Lhat distance 400 400 L plot for all gravesites, polygon 0 1000 2000 3000 4000 5000 Distance Lhat distance 400 400 L plot for affected gravesites, polygon 0 1000 2000 3000 4000 5000 Distance Lhat distance 400 400 L plot for non affected gravesites, polygon 0 1000 2000 3000 4000 5000 Distance 61 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Clustering? Clustering of cases at very shortest distances. Likely due to two coincident-pair sites (both cases in both pairs). Could we try random labelling here? Construct envelopes based on random samples of 30 cases from set of 143 locations. 62 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes A few notes Since K(h) is a function of distance (h), it can indicate clustering at one scale and regularity at another. K(h) measures cumulative aggregation (# events up to h). May result in delayed response to change from clustering to regularity. λ 2 (or similar measures) based on K (h) may respond more instantaneously. Often see envelopes based on extremes, but what does this mean with respect to increasing numbers of simulations? Quantiles may be better (at least they will converge to something). 63 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Testing Envelopes provide pointwise inference at particular values of h, but not correct to find deviation then assess whether it is significant. Can conduct test of H 0 : L(h) = h (equivalent to H 0 : K(h) = πh 2 ) for a predefined interval for h (Stoyan et al., 1995, p. 51). Monte Carlo test based on T = max L(h) h. 0 h h max 64 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes Notes Just as the first and second moments do not uniquely define a distribution, λ(s) and K(h) do not uniquely define a spatial point pattern (Baddeley and Silverman 1984, and in Section 5.3.4 ). Analyses based on λ(s) typically assume independent events, describing the observed pattern entirely through a heterogeneous intensity. Analyses based on K(h) typically assume a stationary process (with constant λ), describing the observed pattern entirely through correlations between events. Remember Barlett s (1964) result (IMPORTANT FACT! above). 65 / 90

K functions Estimating K functions Edge correction Monte Carlo envelopes What questions can we answer? Are events uniformly distributed in space? Test CSR. If not, where are events more or less likely? Intensity esimtation. Do events tend to occur near other events, and, if so, at what scale? K functions with Monte Carlo envelopes. 66 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Sea turtle biology Existed (at least) since Jurassic period ( 200-120 mya). Air breathing reptile. Spends life in ocean, returns to natal beach to lay eggs. Lays eggs on land ( 100 eggs per clutch, buried in sand). Hatchlings appear en masse, head to the ocean. 67 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Sea Turtle Species 68 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Consider the hatchling Life fraught with peril. On menu for crabs, gulls, etc. Goal 1: Get to the ocean. Follow the light! 69 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Sea Turtles and Humans: Regulatory implications All species threatened or endangered. Regulations in place for fisheries (TEDs), beach-front lighting, beach-front development. East coast of Florida (USA), prime nesting site for loggerhead turtles. Also popular among green turtles (biannual cycle). We consider impact of two different construction projects on Juno Beach, Florida. 70 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Two construction projects Construction of 990-foot fishing pier in 1998. Environmental impact statement approved if collect data to determine impact on sea turtle nesting. Beach nourishment project 2000. Dredge up sand offshore and rebuild beach. Past studies on impact of beach nourishment on turtle nesting (reduced for 2 nesting seasons, off-shore slope impact?) 71 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Standard approach Patrol beach each morning during nesting season. Count trackways in each Index Nesting Beach Zone. Compare distribution of counts between years. 72 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Juno Beach data Marinelife Center of Juno Beach volunteers patrol beach each morning. Differential GPS coordinates for apex of each crawl. Store: Date. Nest? Species. Predation? Approximately 10,000 points per year. 73 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Juno Beach Data 74 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Loggerhead Displacement from Beach 75 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Displacement of Green Emergences Displacement of nest sites to loess beach: Greens Displacement from loess 100 0 100 200 Nourishment Project Pre pier (1997 1998) Pre Nourishment (1999 2000) Post Nourishment (2001 2002) 1 2 3 4 5 6 7 8 9 10 11 20000 15000 10000 5000 0 5000 10000 Feet along beach 76 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Linearizing the beach While we have sub-meter accurate GPS locations, depth on beach of secondary interest. We want to know where differences in nesting pattern occur and whether these observed differences are significant. Treat emergence locations as realization of a spatial point process in one dimension (events on a line). Ignore depth on beach by projecting all points to loess curve representing beach. Three time periods of interest: Pre-pier (1997-1998), pre-nourishment (post-pier) (1999-2000), post-nourishment (2001-2002). 77 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Kernel estimates for each time period Let λ(s) denote intensity function of point process (mean number of events per unit distance). Intensity proportional to pdf. Data: locations s 1,..., s n. Estimate via kernel estimation: λ(s) = 1 [ ] n nb i=1 Kern (s s i ) b Kern( ) is kernel function, b bandwidth (governing smoothness). 78 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Ratio of kernel estimates Kelsall and Diggle (1995, Stat in Med) consider ratio of two intensity estimates as spatial measure of relative risk. Suppose we have n 1 before events, n 2 after events. λ 1 (s), λ 2 (s) associated before/after intensity functions. Conditional on n 1 and n 2, data equivalent to two random samples from pdfs: f 1 (s) = λ 1 (s)/ λ 1 (u)du and f 2 (s) = λ 2 (s)/ λ 2 (u)du. 79 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Ratio of kernel estimates Consider log relative risk r(s) = log f 1(s) f 2 (s). Some algebra yields: r(s) = log H 0 : r(s) = 0 for all s. How do we test this? [ ] [ ] λ1 (s) λ1 (u)du log. λ 2 (s) λ2 (u)du 80 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Monte Carlo inference Conditional on all n 1 + n 2 locations, randomize n 1 to before, n 2 to after. Calculate f (s) and ĝ (s), get ratio r (s). Repeat many (999) times. Calculate pointwise 95% tolerance region. 81 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Densities density 0 e+00 2 e 05 4 e 05 6 e 05 pre pier CC (loggerhead) pre pier CM (green) pre nourishment CC (loggerhead) pre nourishment CM (green) post nourishment CC (loggerhead) post nourishment CM (green) Nesting locations along beach Nourishment Project 1 2 3 4 5 6 7 8 9 10 11 0 5000 10000 15000 20000 25000 30000 x=distance along beach (ft) 82 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Pre- vs. post-pier, Loggerheads Pre pier vs. pre nourishment, Loggerheads Densities 0 e+00 5 e 05 Pre pier (1997 1998) Pre nourishment (1999 2000) 0 5000 10000 15000 20000 25000 30000 Feet along beach Ratio of Densities 0.0 1.0 2.0 Nourishment Project 1 2 3 4 5 6 7 8 9 10 11 0 5000 10000 15000 20000 25000 30000 Feet along beach 83 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Pre- vs. post-pier, Greens Pre pier vs. pre nourishment, Greens Densities 0 e+00 5 e 05 Pre pier (1997 1998) Pre nourishment (1999 2000) 0 5000 10000 15000 20000 25000 30000 Feet along beach Ratio of Densities 0.0 1.0 2.0 Nourishment Project 1 2 3 4 5 6 7 8 9 10 11 0 5000 10000 15000 20000 25000 30000 Feet along beach 84 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Differences Green densities smoother (wider bandwidth due to smaller sample size). Clear impact of pier on loggerheads (but very local). Conservation strategy? Maintain protected areas nearby. Note: impact on all crawls (nesting and non-nesting). 85 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Pre- vs. post-nourishment, Loggerheads Pre nourishment vs. post nourishment, Loggerheads Densities 0 e+00 5 e 05 Pre pier (1997 1998) Pre nourishment (1999 2000) 0 5000 10000 15000 20000 25000 30000 Feet along beach Ratio of Densities 0.0 1.0 2.0 Nourishment Project 1 2 3 4 5 6 7 8 9 10 11 0 5000 10000 15000 20000 25000 30000 Feet along beach 86 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Pre- vs. post-nourishment, Greens Pre nourishment vs. post nourishment, Greens Densities 0 e+00 5 e 05 Pre pier (1997 1998) Pre nourishment (1999 2000) 0 5000 10000 15000 20000 25000 30000 Feet along beach Ratio of Densities 0.0 1.0 2.0 Nourishment Project 1 2 3 4 5 6 7 8 9 10 11 0 5000 10000 15000 20000 25000 30000 Feet along beach 87 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Differences Reduced nesting in nourishment zone, increases just to south. Offshore current suggests approach from north (left on plot). Impact on all crawls (nesting and non-nesting). Suggests impact of nourishment before turtle emerges. Slope? 88 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Conclusions Pairwise comparisons suggest: Local impact of pier for loggerheads, but not for greens. Shift to south in emergences (nesting and non-nesting) due to nourishment for both loggerheads and greens. 89 / 90

Sea Turtle Biology Juno Beach, Florida Data Comparing Nesting patterns Pre- vs. post-nourishment Case study conclusion Questions? 90 / 90