Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle


Statistical Analysis of Spatio-temporal Point Process Data Peter J Diggle Department of Medicine, Lancaster University and Department of Biostatistics, Johns Hopkins University School of Public Health

Gastroenteric disease in Hampshire, UK
- 3374 incident cases, 1 August 2000 to 26 August 2001
- largely sporadic incidence pattern
- concentration in population centres
- occasional clusters of cases?

Questions
- establish normal spatio-temporal pattern of reported cases (NHS Direct)
- identify spatially and temporally localised anomalies in incidence pattern (real-time surveillance)

The 2001 UK FMD (foot-and-mouth disease) epidemic
- First confirmed case: 20 February 2001
- Approximately 140,000 at-risk farms in the UK (cattle and/or sheep)
- Outbreaks in 44 counties; epidemic particularly severe in Cumbria and Devon
- Last confirmed case: 30 September 2001
- Consequences included:
  - more than 6 million animals slaughtered (4 million for disease control, 2 million for "welfare reasons")
  - estimated direct cost £8 billion

[Figure: five map panels dated 28 February, 31 March, 30 April, 31 May and 30 June; axes x and y (map coordinates)]

Progress of the epidemic in Cumbria
- predominant pattern is of transmission between near-neighbouring farms
- but also some apparently spontaneous outbreaks
- qualitatively similar pattern in other English counties

Questions
- What factors affected the spread of the epidemic?
- How effective were control strategies in limiting the spread?

Analysis strategies for continuous-time processes
1. Empirical: log-Gaussian Cox process models
   Poisson process with space-time intensity $\Lambda(x,t) = \exp\{S(x,t)\}$
2. Mechanistic: work with conditional intensity function
   $H_t$ = complete history (locations and times of events)
   $\lambda(x,t \mid H_t)$ = conditional intensity (hazard) for new event at location $x$, time $t$, given history $H_t$

Analysis strategies for continuous-time processes (1)
Log-Gaussian Cox process model
- relatively tractable (e.g. closed-form expressions for second-moment structure)
- also able to generate a wide range of aggregated patterns
- scientifically natural if major determinant of pattern is environmental variation
- otherwise, often still a sensible empirical model

Model for gastroenteric disease data
Notation
$\lambda_0(x,t)$ = normal intensity of incident cases
$\lambda(x,t)$ = actual intensity of incident cases
$R(x,t)$ = spatio-temporal variation from normal pattern
$\lambda(x,t) = \lambda_0(x,t)\,R(x,t)$
Scientific objective
Use incident data up to time $t$ to construct predictive distribution for current risk surface, $R(x,t)$, and hence identify anomalies for further investigation.

Spatio-temporal model formulation
$\lambda(x,t) = \lambda_0(x,t)\,R(x,t)$
$\lambda_0(x,t) = \lambda_0(x)\,\mu_0(t)$
$R(x,t) = \exp\{S(x,t)\}$
$S(x,t)$ = spatio-temporal Gaussian process:
  $E[S(x,t)] = -0.5\sigma^2$
  $\mathrm{Var}\{S(x,t)\} = \sigma^2$
  $\mathrm{Corr}\{S(x,t), S(x-u,t-v)\} = \rho(u,v)$
conditional on $R(x,t)$, incident cases form an inhomogeneous Poisson process with intensity $\lambda(x,t)$
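The choice of mean $-0.5\sigma^2$ for $S(x,t)$ is what makes the risk surface average to one, since a lognormal variable $\exp\{S\}$ with $S \sim N(\mu, \sigma^2)$ has mean $\exp(\mu + \sigma^2/2)$. The following is a minimal sketch that checks this by simulation at a single location only; it ignores the spatio-temporal correlation $\rho(u,v)$, and the function name and the value of `sigma2` are illustrative assumptions rather than anything from the fitted model.

```python
import math
import random


def simulate_risk(sigma2, n, seed=0):
    """Draw n values of R = exp(S) with S ~ N(-sigma2/2, sigma2).

    Marginal check only: draws are independent, so the correlation
    structure rho(u, v) of the full model is deliberately ignored.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    return [math.exp(rng.gauss(-0.5 * sigma2, sd)) for _ in range(n)]


# Illustrative variance (an assumption, not an estimate from the data)
draws = simulate_risk(sigma2=0.5, n=200_000)
mean_r = sum(draws) / len(draws)
print(f"sample mean of R: {mean_r:.3f}")  # should be close to 1
```

With this parameterisation $\lambda_0(x,t)$ keeps its interpretation as the normal intensity, and $R(x,t)$ measures departures from it on a multiplicative scale.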

Parameter estimation
- $\lambda_0(x)$: locally adaptive kernel smoothing
- $\mu_0(t)$: Poisson log-linear regression
- $\sigma^2$, $\rho(u,v)$: matching empirical and theoretical second moments (but could also use Monte Carlo MLE)

Spatial prediction
- plug-in for estimated model parameters
- MCMC to generate samples from conditional distribution of $S(x,t)$ given data up to time $t$
- choose critical threshold value $c > 1$
- map empirical exceedance probabilities, $p_t(x) = P(\exp\{S(x,t)\} > c \mid \text{data})$
- web-reporting with daily updates
Do we need to take account of parameter uncertainty?
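The empirical exceedance probability at each location is simply the fraction of MCMC draws in which $\exp\{S(x,t)\}$ exceeds $c$. A minimal sketch follows, assuming the draws are stored as one list of $S$ values per MCMC iteration on a fixed set of grid cells; the function name and the toy numbers are hypothetical and only illustrate the calculation.

```python
import math


def exceedance_probability(samples, c):
    """Fraction of draws with exp(S) > c, computed separately per grid cell.

    samples : list of MCMC draws; each draw is a list of S values,
              one per grid cell (same cell order in every draw)
    c       : critical threshold, c > 1
    """
    log_c = math.log(c)  # exp(S) > c  is equivalent to  S > log(c)
    n_draws = len(samples)
    n_cells = len(samples[0])
    return [
        sum(1 for draw in samples if draw[k] > log_c) / n_draws
        for k in range(n_cells)
    ]


# Toy input: 4 draws on 3 grid cells (made-up values)
toy_samples = [
    [0.1, 0.8, 1.2],
    [0.0, 0.9, 0.2],
    [-0.3, 0.5, 1.0],
    [0.2, 1.1, 0.9],
]
p_hat = exceedance_probability(toy_samples, c=2.0)
print(p_hat)  # [0.0, 0.75, 0.75]
```

Because the parameters are plugged in, these probabilities do not reflect parameter uncertainty, which is the question raised on the slide.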

Spatial prediction: results for 6 March 2003, $c = 2$

Analysis strategies for continuous-time processes (2)
Analysis via conditional intensity function
$H_t$ = complete history (locations and times of events)
$\lambda(x,t \mid H_t)$ = conditional intensity (hazard) for new event at location $x$, time $t$, given history $H_t$

Likelihood analysis
Log-likelihood for data $(x_i,t_i) \in A \times [0,T]$, $i = 1,\dots,n$, with $t_1 < t_2 < \dots < t_n$, is
$$L(\theta) = \sum_{i=1}^{n} \log \lambda(x_i,t_i \mid H_{t_i}) - \int_0^T \int_A \lambda(x,t \mid H_t)\,dx\,dt$$
Rarely tractable, but Monte Carlo methods are becoming available in special cases (e.g. log-Gaussian Cox processes)
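The two terms of the log-likelihood can be evaluated numerically when the conditional intensity is available as a function. The sketch below is only illustrative: it assumes a rectangular region $A$, approximates the integral with a midpoint rule, and takes a user-supplied function `lam(x, y, t, history)`; all names and the grid size are assumptions, and the brute-force quadrature shows why the full likelihood is rarely practical.

```python
import math


def log_likelihood(events, lam, area, T, n_grid=20):
    """Approximate log-likelihood of a spatio-temporal point process.

    events : list of (x, y, t) tuples, sorted by increasing t
    lam    : function lam(x, y, t, history) returning a positive intensity,
             where history is the list of events strictly before time t
    area   : (xmin, xmax, ymin, ymax), a rectangular region A (assumption)
    T      : end of the observation window [0, T]
    """
    # First term: sum of log-intensities at the observed events
    total = 0.0
    for i, (x, y, t) in enumerate(events):
        total += math.log(lam(x, y, t, events[:i]))

    # Second term: midpoint-rule approximation of the integral over A x [0, T]
    xmin, xmax, ymin, ymax = area
    dx = (xmax - xmin) / n_grid
    dy = (ymax - ymin) / n_grid
    dt = T / n_grid
    integral = 0.0
    for a in range(n_grid):
        t = (a + 0.5) * dt
        history = [e for e in events if e[2] < t]
        for b in range(n_grid):
            x = xmin + (b + 0.5) * dx
            for c in range(n_grid):
                y = ymin + (c + 0.5) * dy
                integral += lam(x, y, t, history) * dx * dy * dt
    return total - integral


# Check against a case with a known answer: constant intensity 2 on the
# unit square over [0, 5] gives 3*log(2) - 2*1*5 for three events.
toy_events = [(0.2, 0.3, 1.0), (0.7, 0.1, 2.5), (0.5, 0.9, 4.0)]
ll_const = log_likelihood(toy_events, lambda x, y, t, h: 2.0, (0, 1, 0, 1), 5.0)
```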

Partial likelihood analysis
Data $(x_i,t_i) \in A \times [0,T]$, $i = 1,\dots,n$, with $t_1 < t_2 < \dots < t_n$
- condition on locations $x_i$ and times $t_i$, derive log-likelihood for observed ordering $1,2,\dots,n$
- can allow for right-censored event-times if relevant
$R_i$ = risk-set at time $t_i$
$p_i = \lambda(x_i,t_i \mid H_{t_i}) \big/ \sum_{j \in R_i} \lambda(x_j,t_i \mid H_{t_i})$ (discrete $R_i$)
$p_i = \lambda(x_i,t_i \mid H_{t_i}) \big/ \int_{R_i} \lambda(x,t_i \mid H_{t_i})\,dx$ (continuous $R_i$)
partial log-likelihood: $L_p(\theta) = \sum_{i=1}^{n} \log p_i$
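For a discrete risk set the partial log-likelihood is a sum of log-ratios, one per event. The following minimal sketch assumes the candidates are indexed so that the $i$th event happens to candidate $i$, and that the conditional intensity is supplied as a function `intensity(j, i)`; these conventions and names are hypothetical choices made for the example.

```python
import math


def partial_log_likelihood(n_events, risk_sets, intensity):
    """Sum of log p_i over events, for discrete risk sets.

    risk_sets[i]    : indices of candidates at risk at the i-th event time
    intensity(j, i) : conditional intensity of candidate j at the i-th
                      event time, given the history before that time
    The i-th event is assumed to happen to candidate i, which must
    belong to risk_sets[i].
    """
    total = 0.0
    for i in range(n_events):
        denom = sum(intensity(j, i) for j in risk_sets[i])
        total += math.log(intensity(i, i) / denom)
    return total


# Toy example with time-constant intensities (made-up weights):
# p_0 = 1/6, p_1 = 2/5, p_2 = 3/3
weights = [1.0, 2.0, 3.0]
toy_risk_sets = [[0, 1, 2], [1, 2], [2]]
lp = partial_log_likelihood(3, toy_risk_sets, lambda j, i: weights[j])
```

Any factor common to numerator and denominator, such as a baseline $\lambda_0(t)$, cancels in each $p_i$, which is what makes the partial likelihood attractive when the baseline is left unspecified.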

A model for the FMD epidemic (after Keeling et al, 2001)
Notation
$H_t$ = history of process up to $t$
$\lambda(x,t \mid H_t)$ = conditional intensity
$\lambda_{jk}(t)$ = rate of transmission from farm $j$ to farm $k$
Farm-specific covariates for farm $i$
$n_{1i}$ = number of cows
$n_{2i}$ = number of sheep

Transmission kernel
$f(u) = \exp\{-(u/\phi)^{0.5}\} + \rho$
At-risk indicator for transmission of infection
$I_{jk}(t) = 1$ if farm $k$ not infected and not slaughtered by time $t$, and farm $j$ infected and not slaughtered by time $t$
Reporting delay
Simplest assumption is that reporting date is infection date plus $\tau$ (latent period of disease plus reporting delay, if any)

Resulting statistical model
$\lambda_{jk}(t) = \lambda_0(t)\,A_j\,B_k\,f(\lVert x_j - x_k \rVert)\,I_{jk}(t)$
$\lambda_0(t)$ = arbitrary
$A_j = (\alpha n_{1j} + n_{2j})$
$B_k = (\beta n_{1k} + n_{2k})$

Fitting the model
- rate of infection for farm $k$ at time $t$ is $\lambda_k(t) = \sum_j \lambda_{jk}(t)$
- partial likelihood contribution from $i$th case is $p_i = \lambda_i(t_i) \big/ \sum_k \lambda_k(t_i)$
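The partial likelihood for the farm-to-farm transmission model can be sketched in a few lines. This is an illustration under simplifying assumptions that are not in the slides: slaughter dates and the reporting delay $\tau$ are ignored, the first (index) case is skipped because it has no infected source within the data, and the dictionary keys and function names are hypothetical. The baseline $\lambda_0(t)$ cancels in each $p_i$, so it does not appear.

```python
import math


def kernel(u, phi, rho):
    """Transmission kernel f(u) = exp{-(u/phi)^0.5} + rho."""
    return math.exp(-math.sqrt(u / phi)) + rho


def farm_rate(k, t, farms, alpha, beta, phi, rho):
    """lambda_k(t) / lambda_0(t): infection rate for farm k just before t.

    Each farm is a dict with keys x, y, cows, sheep and t_inf (infection
    time, or None if never infected). Slaughter is ignored (assumption).
    """
    fk = farms[k]
    if fk["t_inf"] is not None and fk["t_inf"] < t:
        return 0.0  # farm k already infected, so no longer at risk
    b_k = beta * fk["cows"] + fk["sheep"]
    total = 0.0
    for j, fj in enumerate(farms):
        if j == k or fj["t_inf"] is None or fj["t_inf"] >= t:
            continue  # farm j is not infectious at time t
        a_j = alpha * fj["cows"] + fj["sheep"]
        dist = math.hypot(fj["x"] - fk["x"], fj["y"] - fk["y"])
        total += a_j * b_k * kernel(dist, phi, rho)
    return total


def fmd_partial_loglik(farms, alpha, beta, phi, rho):
    """Partial log-likelihood: sum over cases of log(lambda_i / sum_k lambda_k)."""
    cases = sorted(
        (f["t_inf"], i) for i, f in enumerate(farms) if f["t_inf"] is not None
    )
    total = 0.0
    for t_i, i in cases[1:]:  # skip the index case (assumption)
        num = farm_rate(i, t_i, farms, alpha, beta, phi, rho)
        den = sum(
            farm_rate(k, t_i, farms, alpha, beta, phi, rho)
            for k in range(len(farms))
        )
        total += math.log(num / den)
    return total


# Toy data: three sheep-only farms on a line (made-up values)
toy_farms = [
    {"x": 0.0, "y": 0.0, "cows": 0, "sheep": 1, "t_inf": 1.0},
    {"x": 1.0, "y": 0.0, "cows": 0, "sheep": 1, "t_inf": 2.0},
    {"x": 3.0, "y": 0.0, "cows": 0, "sheep": 1, "t_inf": None},
]
toy_ll = fmd_partial_loglik(toy_farms, alpha=1.0, beta=1.0, phi=1.0, rho=0.0)
```

In the toy example only the second case contributes: farm 1 at distance 1 from the infected farm competes with farm 2 at distance 3, so $p = e^{-1} / (e^{-1} + e^{-\sqrt{3}})$. Maximising this function over $(\alpha, \beta, \phi, \rho)$ with a general-purpose optimiser would be one way to fit the model.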

FMD results
Common parameter values in Cumbria and Devon?
Likelihood ratio test: $\chi^2_4 = 2.98$
Parameter estimates
$(\hat\alpha, \hat\beta, \hat\phi, \hat\rho) = (4.92, 30.68, 0.39, 9.9 \times 10^{-5})$
But note that likelihood ratio test rejects $\rho = 0$.

Model extensions
- sub-linear dependence of infectivity/susceptibility on stock size
  $A_j = (\alpha n_{1j}^{\gamma} + n_{2j}^{\gamma})$
  $B_k = (\beta n_{1k}^{\gamma} + n_{2k}^{\gamma})$
  Likelihood ratio test: $\chi^2_1 = 334.9$
- other farm-specific covariates, e.g. $z_j$ = area of farm $j$
  $A_j = (\alpha n_{1j}^{\gamma} + n_{2j}^{\gamma}) \exp(z_j \delta)$
  and similarly for $B_k$
  Likelihood ratio test: $\chi^2_1 = 3.26$
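The reported likelihood ratio statistics can be converted to approximate p-values by referring them to chi-squared distributions. A minimal sketch using only the standard library follows; it relies on closed-form upper-tail probabilities that hold for 1 and 4 degrees of freedom specifically, and the function and variable names are illustrative.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-squared with df in {1, 4} (closed forms only)."""
    if df == 1:
        # X = Z^2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 4:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1 or df = 4 are implemented in this sketch")


p_common = chi2_upper_tail(2.98, 4)   # common parameters in both counties
p_gamma = chi2_upper_tail(334.9, 1)   # sub-linear stock-size extension
p_area = chi2_upper_tail(3.26, 1)     # farm-area covariate extension

print(f"common parameters: p = {p_common:.2f}")   # about 0.56
print(f"stock-size power:  p = {p_gamma:.1e}")    # vanishingly small
print(f"farm area:         p = {p_area:.3f}")     # about 0.071
```

On these numbers there is no evidence against common parameter values in the two counties, overwhelming evidence for the sub-linear extension, and only weak evidence for the farm-area covariate.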

Baseline intensity: Nelson-Aalen estimator
Write $\lambda_{ij}(t)$ as $\lambda_{ij}(t) = \lambda_0(t)\,\rho_{ij}(t)$
Nelson-Aalen estimator is
$$\hat\Lambda_0(t) = \int_0^t \hat\rho(u)^{-1}\,dN(u) = \sum_{i : t_i \le t} \hat\rho(t_i)^{-1}$$
where $\hat\rho(t)$ is plug-in from fitted model.
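The estimator is a step function that jumps by $\hat\rho(t_i)^{-1}$ at each event time. A minimal sketch follows, assuming $\hat\rho(t)$ is available as a function returning the fitted total relative rate at time $t$ (for example the sum of the fitted $\hat\rho_{ij}(t)$ over all farm pairs, which is an assumption here since the slide does not spell it out); the names and toy values are hypothetical.

```python
def nelson_aalen(event_times, rho_hat):
    """Cumulative baseline hazard estimate at each event time.

    event_times : iterable of event times t_i
    rho_hat     : function t -> fitted total relative rate at time t (> 0)
    Returns a list of (t_i, Lambda0_hat(t_i)) pairs in time order.
    """
    cumulative = 0.0
    steps = []
    for t in sorted(event_times):
        cumulative += 1.0 / rho_hat(t)  # jump of size 1 / rho_hat(t_i)
        steps.append((t, cumulative))
    return steps


# Toy example with a made-up, decreasing relative rate
toy_steps = nelson_aalen([4.0, 1.0, 2.0], lambda t: 10.0 - t)
# jumps are 1/9 at t=1, 1/8 at t=2 and 1/6 at t=4
```

Plotting the resulting pairs as a step function gives a cumulative hazard curve of the kind compared between counties.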

[Figure: Nelson-Aalen estimates for Cumbria (solid line) and Devon (dotted line); x-axis: time (days since 1 Feb), y-axis: cumulative hazard]

An ecological application
- data record locations $x_i$ and arrival times $t_i$ of nesting birds on several small off-shore islands
- birds known to prefer higher ground for nesting
- physical limit on distance between any two nests, 25cm
- does spatio-temporal pattern of nesting sites show any evidence of spatial interaction beyond minimum separation distance?

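Model for the pattern of nesting sites
Interaction function
$$h(u) = \begin{cases} 0 & : u \le \delta_0 \\ \theta & : \delta_0 < u \le \delta \\ 1 & : u > \delta \end{cases}$$
Conditional intensity is
$\lambda(x,t \mid H_t) = \lambda_0(t) \exp\{z(x)\beta\}\, g(x,t \mid H_t)$
$z(x)$ = elevation
$u(t) = \min_{j : t_j < t} \lVert x - x_j \rVert$
$g(x,t_i \mid H_{t_i}) = h\{u(t_i)\}$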
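The interaction term depends only on the distance from a candidate location to the nearest nest that is already occupied. The sketch below is a hedged illustration: it assumes two thresholds with $\delta_0 < \delta$, returns 1 when no nest has yet been established (a convention chosen here, not stated in the slides), and uses made-up parameter values and names.

```python
import math


def h(u, theta, delta0, delta):
    """Piecewise-constant interaction function of nearest-nest distance u."""
    if u <= delta0:
        return 0.0   # closer than the minimum separation: impossible
    if u <= delta:
        return theta  # interaction range: intensity scaled by theta
    return 1.0        # beyond range delta: no interaction


def g(x, t, nests, theta, delta0, delta):
    """Interaction term for location x at time t, given earlier nests.

    nests : list of ((x, y), t_j) pairs for nest locations and arrival times
    """
    earlier = [loc for loc, t_j in nests if t_j < t]
    if not earlier:
        return 1.0  # assumption: no interaction before the first nest
    u = min(math.hypot(x[0] - loc[0], x[1] - loc[1]) for loc in earlier)
    return h(u, theta, delta0, delta)


# Toy values (illustrative only): distances in metres
toy_nests = [((0.0, 0.0), 1.0), ((5.0, 0.0), 2.0)]
pars = dict(theta=1.5, delta0=0.25, delta=2.0)
g_mid = g((1.0, 0.0), 3.0, toy_nests, **pars)     # within range: theta
g_close = g((0.1, 0.0), 3.0, toy_nests, **pars)   # too close: 0
g_far = g((2.5, 0.0), 1.5, toy_nests, **pars)     # only first nest counts: 1
```

A value of $\theta$ above 1 would indicate attraction to existing nests within the interaction range, and a value below 1 would indicate inhibition.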

[Figure: Final pattern on four islands; panels: Island 84, Island 74, Island 61, Island 56; axes x and y (map coordinates)]

[Figure: Confidence envelope for h(u); x-axis: Distance (meter), y-axis: exp(theta)]

Conclusions
- spatio-temporal point process data-sets becoming widely available
- different problems require different modelling strategies
- temporal should often take precedence over spatial
- routine implementation is an important consideration