Spatial Point Pattern Analysis

Jamie Monogan
University of Georgia
Spatial Data Analysis

Objectives

By the end of this meeting, participants should be able to:
- Test for complete spatial randomness in a point pattern.
- Analytically formulate the motivation behind a point process model.
- Use a generalized additive logistic regression model to predict the relative risk of events in space.

Three Point Patterns

[Figure: three point patterns, each plotted on the unit square with both axes running from 0.0 to 1.0. Panels: Cells (regular pattern), Japanese Pines (homogeneously distributed), Redwoods (clustered pattern). Source: Bivand, Pebesma, and Gómez-Rubio 2008, Table 7.1.]

Testing Complete Spatial Randomness

G function: the distribution of distances from an arbitrary event to its nearest event.
- Define the distances as $d_i = \min_j \{d_{ij}, \forall j \neq i\}$, $i = 1, \ldots, n$.
- We can estimate the function as $\hat{G}(r) = \frac{\#\{d_i : d_i \leq r, \forall i\}}{n}$, where the numerator is the number of events whose nearest-event distance is less than or equal to $r$ and $n$ is the total number of points.
- Under complete spatial randomness: $G(r) = 1 - \exp\{-\lambda \pi r^2\}$, where $\lambda$ is the intensity of the process.

F function: the distribution of distances from an arbitrary point in the area to its nearest event.
- Often called the empty space function because it measures the average space between events.
- Under complete spatial randomness: $F(r) = 1 - \exp\{-\lambda \pi r^2\}$.
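The G-function estimator can be illustrated with a short sketch. The Python code below is an assumption of this write-up (the slides contain no code); the function names, the simulated pattern, the seed, and the choice of radii are all hypothetical. It computes the nearest-event distances, evaluates the empirical G at a few radii, and prints the theoretical value under complete spatial randomness beside it. No edge correction is applied, so the comparison is only rough near the boundary of the study area.

```python
import math
import random

def nearest_neighbor_distances(points):
    """Return d_i = min over j != i of d_ij for every event i (Euclidean)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def g_hat(distances, r):
    """Empirical G: share of events whose nearest event lies within r."""
    return sum(d <= r for d in distances) / len(distances)

def g_csr(r, lam):
    """Theoretical G under complete spatial randomness with intensity lam."""
    return 1.0 - math.exp(-lam * math.pi * r ** 2)

# Hypothetical data: 200 events placed uniformly on the unit square.
rng = random.Random(42)              # fixed seed so the run is repeatable
n = 200
events = [(rng.random(), rng.random()) for _ in range(n)]
d = nearest_neighbor_distances(events)
lam = n / 1.0                        # intensity estimate n / |A|, with |A| = 1

for r in (0.01, 0.02, 0.04, 0.06, 0.08):
    print(f"r={r:.2f}  G_hat={g_hat(d, r):.3f}  G_csr={g_csr(r, lam):.3f}")
```

An empirical curve that rises faster than the theoretical one suggests clustering, while one that rises more slowly suggests a regular pattern.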

Data Generating Processes for Point Patterns
The Poisson Distribution

Homogeneous Poisson process:
- The number of events in a region $A$ with area $|A|$ is Poisson distributed with mean $\lambda |A|$.
- Given $n$ observed events in $A$, they are uniformly distributed in $A$.
- An unbiased estimator of the intensity parameter: $\hat{\lambda} = n / |A|$.

Inhomogeneous Poisson process: suppose the intensity parameter can vary across space.
- Estimator: $\hat{\lambda}(x) = \frac{1}{h^2} \sum_{i=1}^{n} \kappa\left(\frac{\|x - x_i\|}{h}\right) / q(\|x\|)$, where:
  - $h$ measures the level of smoothing,
  - $x_i$ are the $n$ observed points,
  - $q(\|x\|)$ is a border correction to compensate for observations missing near the boundary of the region, and
  - $\kappa$ is a bivariate and symmetrical kernel function.
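Both processes can be sketched in a few lines. The Python code below is illustrative only: the function names, the rectangle, the seed, and the bandwidth are hypothetical. The quartic kernel is one common choice of $\kappa$, not necessarily the one intended in the slides, and the border correction is fixed at q = 1 (no correction). The sketch simulates a homogeneous Poisson process, computes $\hat{\lambda} = n / |A|$, and evaluates the kernel intensity estimator at one location.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson count by counting unit-rate arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_homogeneous(lam, width, height, rng):
    """Homogeneous Poisson process on a width x height rectangle."""
    n = sample_poisson(lam * width * height, rng)   # N ~ Poisson(lam * |A|)
    # Given N = n, the events are uniformly distributed over the rectangle.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def quartic_kernel(u):
    """Bivariate quartic kernel at scaled distance u = ||x - x_i|| / h."""
    return (3.0 / math.pi) * (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0

def kernel_intensity(x, events, h, q=1.0):
    """lambda_hat(x) = (1 / h^2) * sum_i kappa(||x - x_i|| / h) / q.

    q = 1.0 means no border correction (an assumption of this sketch).
    """
    total = sum(quartic_kernel(math.dist(x, e) / h) for e in events)
    return total / (h ** 2 * q)

rng = random.Random(7)               # fixed seed so the run is repeatable
events = simulate_homogeneous(lam=100.0, width=1.0, height=1.0, rng=rng)
print("n =", len(events), " lambda_hat = n / |A| =", len(events) / 1.0)
print("kernel estimate at the center:",
      kernel_intensity((0.5, 0.5), events, h=0.2))
```

For a homogeneous pattern the kernel estimate at an interior location should land in the neighborhood of $n / |A|$; larger values of h give smoother but more biased surfaces.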

Using Covariates to Model the Location of Events

[Figure: John Snow's map of cholera deaths in the Soho District of London, summer 1854 (Ward & Gleditsch 2008, 12).]

- Inhomogeneous Poisson process: use covariates to determine the intensity parameter, or relative rate of events.
- Binary estimator: choose a good control group and use covariates to determine the relative risk of a case over a control.
- Example: distance from the Broad Street well with Snow's data. This covariate would work under either approach, though the binary estimator would also require a control group.

Point Patterns and Relative Risk I

- Spatial point processes can be thought of as Poisson processes where events can occur in an arbitrarily small space.
- In this vein, epidemiologists often try to model where events occur relative to the location of the population at risk.
- With a well-selected control group reflecting the underlying population, the relative risk can be thought of as:

  $r(x) = \log\{f_1(x) / f_2(x)\}$

  where $f_1$ is the spatial density of cases (such as a disease), $f_2$ is the spatial density of the control group (reflecting the broader population), and $x$ is the location of a case or control in space.

Point Patterns and Relative Risk II

- Kelsall & Diggle (1995) approach this problem by estimating Poisson processes with spatially varying intensity parameters and then calculating the ratio of the parameters to get the relative risk:

  $\rho(x) = r(x) + c_1 = \log\{\lambda_1(x) / \lambda_2(x)\}$

  where $c_1$ is an additive constant.
- Later, Kelsall & Diggle (1998) observe that pooling cases and controls and using a binary estimator yields another valid estimate of relative risk:

  $\text{logit}\{p(x)\} = \rho(x) + c_2 = r(x) + c_1 + c_2$

  where $c_2$ is an additive constant.
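A small numerical check can make the role of the additive constants concrete. The Python sketch below rests on assumptions that are not in the slides: the two densities, the sample sizes, the convention that each intensity equals the sample size times the density, and the convention that p(x) is the share of the pooled intensity contributed by cases. Under those assumptions the log intensity ratio and the logit differ from r(x) only by the constant log(n_cases / n_controls), so in this simplified setup $c_1 = \log(n_1 / n_2)$ and $c_2 = 0$.

```python
import math

n_cases, n_controls = 300, 700       # hypothetical sample sizes

def f1(x, y):
    """Hypothetical case density on the unit square (integrates to 1)."""
    return 2.0 * x

def f2(x, y):
    """Hypothetical control density: uniform on the unit square."""
    return 1.0

def r(x, y):
    """Log relative risk r(x) = log{f1(x) / f2(x)}."""
    return math.log(f1(x, y) / f2(x, y))

def rho(x, y):
    """Log intensity ratio, assuming lambda_j(x) = n_j * f_j(x)."""
    return math.log((n_cases * f1(x, y)) / (n_controls * f2(x, y)))

def logit_p(x, y):
    """Logit of the probability that a pooled event at x is a case."""
    lam1 = n_cases * f1(x, y)
    lam2 = n_controls * f2(x, y)
    p = lam1 / (lam1 + lam2)
    return math.log(p / (1.0 - p))

for x in (0.1, 0.5, 0.9):
    print(f"x={x:.1f}  r={r(x, 0.5):+.4f}  rho={rho(x, 0.5):+.4f}  "
          f"logit p={logit_p(x, 0.5):+.4f}  "
          f"logit p - r={logit_p(x, 0.5) - r(x, 0.5):+.4f}")
```

The last column is the same at every location, which is the sense in which the binary estimator recovers the relative risk surface up to an additive constant.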

Using a Generalized Additive Model with a Logit Link

More on GAM models: http://monogan.myweb.uga.edu/teaching/statcomp/gam.pdf

A logistic regression that pools cases and controls in a binary estimator and also allows the incorporation of covariate terms when estimating the relative risk (Kelsall & Diggle 1998):

  $P(y = 1) = p(x, u)$
  $\text{logit}\{p(x, u)\} = u'\beta + g(x)$

- $y$ is a dichotomous variable coded 1 for a case observation and 0 for a control observation,
- $x$ refers to a location in latitude and longitude,
- $u$ is a vector of covariates observed at location $x$,
- $\beta$ is a vector of coefficients for the covariates,
- $g(x)$ is a smooth function over space that is not dependent on the covariates, and
- $p(x, u)$ is the probability a case observation (rather than a control) will be placed at location $x$ given covariates $u$.
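To show how the pieces of this model combine, the sketch below evaluates $p(x, u)$ for made-up inputs. Everything numeric here is hypothetical: the smooth surface g is a single Gaussian bump chosen for illustration, and the coefficients and covariate values are invented. In practice g would be estimated from data by GAM software; this sketch only evaluates the formula.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def g(lon, lat):
    """Hypothetical smooth spatial surface: one Gaussian bump."""
    return 1.5 * math.exp(-((lon + 90.0) ** 2 + (lat - 35.0) ** 2) / 50.0)

def case_probability(lon, lat, u, beta):
    """p(x, u) from logit{p(x, u)} = u'beta + g(x)."""
    linear = sum(u_k * b_k for u_k, b_k in zip(u, beta))
    return inv_logit(linear + g(lon, lat))

beta = [-0.5, 0.8]       # hypothetical coefficients: intercept, one covariate
u = [1.0, 0.25]          # leading 1.0 carries the intercept

print("at the bump center:", case_probability(-90.0, 35.0, u, beta))
print("far from the bump: ", case_probability(-120.0, 45.0, u, beta))
```

Holding the covariates fixed, the two printed probabilities differ only because of g(x), which is how the model separates a spatial baseline from covariate effects.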

Example: Spatial Placement of Major Air Polluters
Monogan, Konisky, & Woods

- Enforcement of the Clean Air Act is bifurcated: the federal and state governments each play a major role.
- States have an incentive to free ride by trying to place major air polluters downwind of their own residents.
- Case group: major air polluters.
- Control group: hazardous waste treatment, storage, and disposal facilities. This group should reflect the larger distribution of where polluters would be located with relatively low incentives to free ride.
- Outcome variable: the probability that a particular site in latitude and longitude hosts a major air polluter, rather than a hazardous waste facility (Kelsall & Diggle 1998):

  $\text{logit}\{p(x, u)\} = u'\beta + g(x)$

- Hypothesis: the farther a site is from the downwind border, the less likely it is to host a major air polluter.

Features of the Polluter Placement Model

[Figure: two map panels with axes Longitude and Latitude. Panel titles: "Wind Direction, 1930-1996" and "Baseline Relative Risk".]

Results from the Polluter Placement Model
Generalized Additive Model, Logit Link

Covariate                       Estimate   Std. Error   t-ratio     p-value
Distance from leeward border    -0.00028   0.00008      -3.49717    0.00047
Intercept                       -0.40901   0.01861      -21.97748   0.00000

Notes: Approximate significance test for intercept smoothed over latitude and longitude: $\chi^2_{28.62} = 3916.741$ (p < .001). N = 39727, AIC = 49095. Estimates computed with R 2.15.1.
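The reported coefficients can be read on the odds scale with a little arithmetic. The sketch below uses only the rounded values from the table; the 1000-unit increment is an arbitrary illustration, and the slides do not state the unit of distance. The baseline probability sets both the distance and the spatial smooth to zero, which is an assumption made here for illustration. Because the inputs are rounded, the recomputed t-ratio differs slightly from the reported -3.49717.

```python
import math

# Rounded values copied from the results table.
estimate_distance = -0.00028     # per unit of distance (unit not stated)
se_distance = 0.00008
estimate_intercept = -0.40901

# Ratio of estimate to standard error, recomputed from rounded inputs.
t_ratio = estimate_distance / se_distance

def odds_ratio(delta):
    """Multiplicative change in the odds of a site hosting a major air
    polluter when distance from the leeward border grows by delta units."""
    return math.exp(estimate_distance * delta)

# Probability on the logit scale at distance 0 with the smooth term at 0
# (an illustrative assumption, not a quantity reported in the slides).
baseline_probability = 1.0 / (1.0 + math.exp(-estimate_intercept))

print("t-ratio from rounded inputs:", round(t_ratio, 3))
print("odds ratio for +1000 distance units:", round(odds_ratio(1000.0), 3))
print("baseline probability:", round(baseline_probability, 3))
```

An odds ratio below 1 for a positive increment matches the hypothesis that sites farther from the downwind border are less likely to host a major air polluter.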

May 1, 3:30-6:30, Baldwin 302

In-class presentations of student papers.
- 12-minute talk.
- 3-minute question-and-answer.