Spatial Point Pattern Analysis


Jamie Monogan, University of Georgia
Spatial Data Analysis

Objectives
By the end of this meeting, participants should be able to:
- Test for complete spatial randomness in a point pattern.
- Analytically formulate the motivation behind a point process model.
- Use a generalized additive logistic regression model to predict the relative risk of events in space.

Three Point Patterns
Bivand, Pebesma, and Gómez-Rubio 2008, Table 7.1
[Figure: three point patterns, each plotted on the unit square. Cells: regular pattern. Japanese pines: homogeneously distributed. Redwoods: clustered pattern.]

Testing Complete Spatial Randomness
G function: distribution of the distances from an arbitrary event to its nearest event.
- Define the distances as $d_i = \min_j \{d_{ij}, \forall j \neq i\}$, $i = 1, \ldots, n$.
- We can estimate the function as $\hat{G}(r) = \#\{d_i : d_i \leq r, \forall i\} / n$, where the numerator is the number of events whose nearest-event distance is less than or equal to $r$ and $n$ is the total number of points.
- Under complete spatial randomness: $G(r) = 1 - \exp\{-\lambda \pi r^2\}$.
F function: distribution of the distances from an arbitrary point in the area to its nearest event.
- Often called the empty space function because it measures the average space between events.
- Under complete spatial randomness: $F(r) = 1 - \exp\{-\lambda \pi r^2\}$.
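As a rough illustration of the G function test, the sketch below computes nearest-event distances, the empirical G, and the value expected under complete spatial randomness. The simulated points, the unit-square study area, and the function names are assumptions made for this example. No edge correction is applied, so the comparison is only approximate near the border.

```python
import math
import random


def nearest_neighbor_distances(points):
    """Return d_i = min over j != i of d_ij for every event i."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        d_i = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(d_i)
    return distances


def g_hat(distances, r):
    """Empirical G: share of events whose nearest event lies within r."""
    return sum(1 for d in distances if d <= r) / len(distances)


def g_csr(r, lam):
    """Theoretical G under complete spatial randomness with intensity lam."""
    return 1.0 - math.exp(-lam * math.pi * r ** 2)


# Simulated events on the unit square (illustrative data, not from the slides).
rng = random.Random(42)
n = 200
points = [(rng.random(), rng.random()) for _ in range(n)]
d = nearest_neighbor_distances(points)
lam_hat = n / 1.0  # intensity estimate n / |A| with |A| = 1

for r in (0.02, 0.04, 0.06):
    print(f"r={r:.2f}  G_hat={g_hat(d, r):.3f}  G_CSR={g_csr(r, lam_hat):.3f}")
```

An empirical curve that rises faster than the theoretical one suggests clustering. One that rises more slowly suggests a regular pattern.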

Data Generating Processes for Point Patterns
The Poisson Distribution
Homogeneous Poisson process:
- The number of events in a region $A$ with area $|A|$ is Poisson distributed with mean $\lambda |A|$.
- Given $n$ observed events in $A$, they are uniformly distributed in $A$.
- An unbiased estimator of the intensity parameter: $\hat{\lambda} = n / |A|$.
Inhomogeneous Poisson process: suppose the intensity parameter can vary across space.
- Estimator: $\hat{\lambda}(x) = \frac{1}{h^2} \sum_{i=1}^{n} \kappa\left(\frac{\|x - x_i\|}{h}\right) / q(\|x\|)$, where $h$ measures the level of smoothing, $x_i$ are the $n$ observed points, $q(\|x\|)$ is a border correction to compensate for missing values, and $\kappa$ is a bivariate and symmetrical kernel function.
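The two intensity estimators can be sketched in a few lines. The slides do not say which kernel is used, so the quartic kernel here is an assumption, as are the simulated events, the bandwidth h = 0.2, and the choice to set the border correction q to 1 (no correction).

```python
import math
import random


def quartic_kernel(u):
    """Bivariate quartic kernel at scaled distance u >= 0 (assumed kernel)."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u <= 1.0 else 0.0


def kernel_intensity(x, events, h, q=1.0):
    """Kernel estimate of lambda(x); q is the border correction (1 = none)."""
    total = sum(
        quartic_kernel(math.hypot(x[0] - ex, x[1] - ey) / h)
        for ex, ey in events
    )
    return total / (h * h) / q


# Simulated homogeneous pattern on the unit square (illustrative data only).
rng = random.Random(0)
area = 1.0
events = [(rng.random(), rng.random()) for _ in range(500)]

lam_homogeneous = len(events) / area          # n / |A|
lam_center = kernel_intensity((0.5, 0.5), events, h=0.2)
print(f"homogeneous estimate: {lam_homogeneous:.1f}")
print(f"kernel estimate at (0.5, 0.5): {lam_center:.1f}")
```

For a homogeneous pattern the kernel estimate should fluctuate around n / |A|. Large, systematic departures across space point toward an inhomogeneous process.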

Using Covariates to Model the Location of Events
John Snow's map of cholera deaths in the Soho District of London, summer 1854 (Ward & Gleditsch 2008, 12)
- Inhomogeneous Poisson process: use covariates to determine the intensity parameter, or relative rate of events.
- Binary estimator: choose a good control group and use covariates to determine the relative risk of a case over a control.
- Example: distance from the Broad Street well with Snow's data. This covariate would work with either approach, though the binary estimator requires a control group.

Point Patterns and Relative Risk I
Spatial point processes can be thought of as Poisson processes where events can occur in an arbitrarily small space. In this vein, epidemiologists often try to model where events occur relative to the location of the population at risk.
With a well-selected control group reflecting the underlying population, the relative risk can be thought of as:
$r(x) = \log\{f_1(x) / f_2(x)\}$
where $f_1$ is the spatial density of cases (such as a disease), $f_2$ is the spatial density of the control group (reflecting the broader population), and $x$ is the location of a case or control in space.
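One simple way to approximate this log relative risk is to take the ratio of two kernel density estimates, one for cases and one for controls. The sketch below is a minimal version under stated assumptions: the Gaussian kernel, the common bandwidth h = 0.1, and the simulated case and control locations are all choices made for illustration, not taken from the slides.

```python
import math
import random


def gaussian_kde(x, sample, h):
    """Bivariate Gaussian kernel density estimate at location x."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(sample))
    return norm * sum(
        math.exp(-((x[0] - sx) ** 2 + (x[1] - sy) ** 2) / (2.0 * h * h))
        for sx, sy in sample
    )


def log_relative_risk(x, cases, controls, h):
    """r(x) = log{f1(x) / f2(x)} with f1 and f2 estimated by kernel smoothing."""
    return math.log(gaussian_kde(x, cases, h) / gaussian_kde(x, controls, h))


rng = random.Random(1)
# Hypothetical data: controls spread evenly, cases concentrated near (0.3, 0.3).
controls = [(rng.random(), rng.random()) for _ in range(400)]
cases = [(rng.gauss(0.3, 0.1), rng.gauss(0.3, 0.1)) for _ in range(200)]

near = log_relative_risk((0.3, 0.3), cases, controls, h=0.1)
far = log_relative_risk((0.8, 0.8), cases, controls, h=0.1)
print(f"r near cluster = {near:.2f}, r far from cluster = {far:.2f}")
```

Positive values mark locations where cases are over-represented relative to controls. Negative values mark locations where they are under-represented.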

Point Patterns and Relative Risk II
Kelsall & Diggle (1995) approach this problem by estimating Poisson processes with spatially varying intensity parameters and then calculating the ratio of the parameters to get the relative risk:
$\rho(x) = r(x) + c_1 = \log\{\lambda_1(x) / \lambda_2(x)\}$
where $c_1$ is an additive constant.
Later, Kelsall & Diggle (1998) observe that pooling cases and controls and using a binary estimator yields another valid estimate of relative risk:
$\mathrm{logit}\{p(x)\} = \rho(x) + c_2 = r(x) + c_1 + c_2$
where $c_2$ is an additive constant.
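A short derivation may help show where the two additive constants come from. It is a sketch under assumptions that the slides do not spell out: cases and controls are treated as independent Poisson processes on a region $A$, and $q_1$, $q_2$ are hypothetical sampling fractions for cases and controls, introduced here only for illustration.

```latex
% Densities are intensities normalized over the study region A:
f_j(x) = \frac{\lambda_j(x)}{\int_A \lambda_j(s)\,ds}, \qquad j = 1, 2.

% Hence the log intensity ratio differs from r(x) by a constant:
\rho(x) = \log\frac{\lambda_1(x)}{\lambda_2(x)}
        = \log\frac{f_1(x)}{f_2(x)}
          + \log\frac{\int_A \lambda_1(s)\,ds}{\int_A \lambda_2(s)\,ds}
        = r(x) + c_1 .

% Pool the sampled cases and controls. Given an event at x,
% the probability that it is a case is
p(x) = \frac{q_1 \lambda_1(x)}{q_1 \lambda_1(x) + q_2 \lambda_2(x)},

% so the log odds are
\operatorname{logit}\{p(x)\}
  = \log\frac{q_1 \lambda_1(x)}{q_2 \lambda_2(x)}
  = \rho(x) + \log\frac{q_1}{q_2}
  = \rho(x) + c_2 .
```

Under these assumptions neither constant depends on $x$, so the spatial variation in $\mathrm{logit}\{p(x)\}$ is the same as the spatial variation in $r(x)$.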

Using a Generalized Additive Model with a Logit Link
More on GAM models: http://monogan.myweb.uga.edu/teaching/statcomp/gam.pdf
A logistic regression that pools cases and controls in a binary estimator and also allows the incorporation of covariate terms when estimating the relative risk (Kelsall & Diggle 1998):
$P(y = 1) = p(x, u)$
$\mathrm{logit}\{p(x, u)\} = u'\beta + g(x)$
- $y$ is a dichotomous variable coded 1 for a case observation and 0 for a control observation,
- $x$ refers to a location in latitude and longitude,
- $u$ is a vector of covariates observed at location $x$,
- $\beta$ is a vector of coefficients for the covariates,
- $g(x)$ is a smooth function over space that is not dependent on the covariates, and
- $p(x, u)$ is the probability a case observation (rather than a control) will be placed at location $x$ given covariates $u$.
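The structure of this model can be illustrated with a plain logistic regression fitted by Newton-Raphson. This is not a generalized additive model: the smooth term g(x) is replaced here by a linear trend in longitude and latitude, purely so the example stays self-contained. The simulated data, the coefficient values, and all function names are assumptions. A real analysis would use a GAM routine with a proper spatial smoother.

```python
import math
import random


def inv_logit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(rows, y, iterations=25):
    """Newton-Raphson maximum likelihood for a logistic regression."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(rows, y):
            p = inv_logit(sum(b * v for b, v in zip(beta, row)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Simulated pooled cases (y = 1) and controls (y = 0); all values hypothetical.
rng = random.Random(7)
rows, y = [], []
for _ in range(3000):
    lon, lat, u = rng.random(), rng.random(), rng.random()
    g_x = 0.8 * lon - 0.4 * lat          # stand-in for the smooth g(x)
    p = inv_logit(-0.5 - 1.0 * u + g_x)  # true covariate effect is -1.0
    rows.append([1.0, u, lon, lat])
    y.append(1 if rng.random() < p else 0)

beta_hat = fit_logit(rows, y)
print("intercept, covariate, lon, lat:", [round(b, 3) for b in beta_hat])
```

The second coefficient plays the role of $\beta$ for the covariate $u$, and the longitude and latitude terms absorb the spatial trend that g(x) would capture in the full model.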

Example: Spatial Placement of Major Air Polluters
Monogan, Konisky, & Woods
- Enforcement of the Clean Air Act is bifurcated: federal and state governments each play a major role.
- States have an incentive to free ride by trying to place major air polluters downwind of their own residents.
- Case group: major air polluters.
- Control group: hazardous waste treatment, storage, and disposal facilities. This group should reflect the larger distribution of where polluters would be located with relatively low incentives to free ride.
- Outcome variable: the probability that a particular site in latitude and longitude hosts a major air polluter, rather than a hazardous waste facility (Kelsall & Diggle 1998).
$\mathrm{logit}\{p(x, u)\} = u'\beta + g(x)$
- Hypothesis: the farther from the downwind border a site is, the less likely it will host a major air polluter.

Features of the Polluter Placement Model
[Figure: two panels, "Wind Direction, 1930-1996" and "Baseline Relative Risk", with the baseline relative risk shown as contours over longitude and latitude.]

Results from the Polluter Placement Model
Generalized Additive Model, Logit Link

Covariate                      Estimate   Std. Error   t-ratio     p-value
Distance from leeward border   -0.00028   0.00008      -3.49717    0.00047
Intercept                      -0.40901   0.01861      -21.97748   0.00000

Notes: Approximate significance test for intercept smoothed over latitude and longitude: $\chi^2_{28.62} = 3916.741$ ($p < .001$). $N = 39727$, AIC $= 49095$. Estimates computed with R 2.15.1.
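The reported quantities for the distance covariate can be checked with a little arithmetic. The sketch below assumes a standard normal reference distribution for the p-value, which is nearly identical to a t distribution at this sample size. The slides do not state the unit of distance, so the 1000-unit odds ratio is only an illustration of how the coefficient scales.

```python
import math

estimate = -0.00028    # distance from leeward border (reported, rounded)
std_error = 0.00008    # reported, rounded
t_reported = -3.49717  # reported, presumably from unrounded values


def two_sided_normal_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


t_from_rounded = estimate / std_error
p_value = two_sided_normal_p(t_reported)
odds_ratio_per_unit = math.exp(estimate)
odds_ratio_per_1000 = math.exp(1000 * estimate)

print(f"t-ratio from rounded values: {t_from_rounded:.2f}")
print(f"two-sided p-value: {p_value:.5f}")
print(f"odds ratio per unit of distance: {odds_ratio_per_unit:.5f}")
print(f"odds ratio per 1000 units of distance: {odds_ratio_per_1000:.3f}")
```

The negative coefficient is consistent with the hypothesis: the odds that a site hosts a major air polluter rather than a hazardous waste facility fall as distance from the leeward border grows.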

May 1, 3:30-6:30, Baldwin 302
In-class presentations of student papers: 12-minute talk, 3-minute question-and-answer.