Statistical Analysis of Spatio-temporal Point Process Data Peter J Diggle Department of Medicine, Lancaster University and Department of Biostatistics, Johns Hopkins University School of Public Health
Gastroenteric disease in Hampshire, UK
- 3374 incident cases, 1 August 2000 to 26 August 2001
- largely sporadic incidence pattern
- concentration in population centres
- occasional clusters of cases?
Questions
- establish normal spatio-temporal pattern of reported cases (NHS Direct)
- identify spatially and temporally localised anomalies in incidence pattern (real-time surveillance)
The 2001 UK FMD (foot-and-mouth disease) epidemic
- First confirmed case 20 February 2001
- Approximately 140,000 at-risk farms in the UK (cattle and/or sheep)
- Outbreaks in 44 counties, epidemic particularly severe in Cumbria and Devon
- Last confirmed case 30 September 2001
- Consequences included:
  - more than 6 million animals slaughtered (4 million for disease control, 2 million for welfare reasons)
  - estimated direct cost £8 billion
[Figure: five map panels dated 28 February, 31 March, 30 April, 31 May and 30 June; axes x and y (map coordinates).]
Progress of the epidemic in Cumbria
- predominant pattern is of transmission between near-neighbouring farms
- but also some apparently spontaneous outbreaks
- qualitatively similar pattern in other English counties
Questions
- What factors affected the spread of the epidemic?
- How effective were control strategies in limiting the spread?
Analysis strategies for continuous-time processes
1. Empirical: log-Gaussian Cox process models
   Poisson process with space-time intensity $\Lambda(x, t) = \exp\{S(x, t)\}$
2. Mechanistic: work with conditional intensity function
   $H_t$ = complete history (locations and times of events)
   $\lambda(x, t \mid H_t)$ = conditional intensity (hazard) for new event at location $x$, time $t$, given history $H_t$
Analysis strategies for continuous-time processes
(1) Log-Gaussian Cox process model
- relatively tractable (eg closed-form expressions for second-moment structure)
- also able to generate a wide range of aggregated patterns
- scientifically natural if major determinant of pattern is environmental variation
- otherwise, often still a sensible empirical model
Model for gastroenteric disease data
Notation
- $\lambda_0(x, t)$ = normal intensity of incident cases
- $\lambda(x, t)$ = actual intensity of incident cases
- $R(x, t)$ = spatio-temporal variation from normal pattern
$$\lambda(x, t) = \lambda_0(x, t) R(x, t)$$
Scientific objective
Use incident data up to time $t$ to construct predictive distribution for current risk surface, $R(x, t)$, hence identify anomalies, for further investigation.
Spatio-temporal model formulation
$$\lambda(x, t) = \lambda_0(x, t) R(x, t)$$
$$\lambda_0(x, t) = \lambda_0(x) \mu_0(t)$$
$$R(x, t) = \exp\{S(x, t)\}$$
- $S(x, t)$ = spatio-temporal Gaussian process:
  $E[S(x, t)] = -0.5\sigma^2$
  $\mathrm{Var}\{S(x, t)\} = \sigma^2$
  $\mathrm{Corr}\{S(x, t), S(x - u, t - v)\} = \rho(u, v)$
- conditional on $R(x, t)$, incident cases form an inhomogeneous Poisson process with intensity $\lambda(x, t)$
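The mean of -0.5σ² is what makes E[R(x, t)] = 1, so that λ0 keeps its interpretation as the normal intensity. The following is a minimal sketch of that property at a single location over time, assuming an exponential temporal correlation exp(-v/theta) (the slides do not specify the form of ρ). The function names, `theta`, and the value of `lambda0` are illustrative choices, not taken from the slides.

```python
import math
import random

def simulate_log_risk(n_steps, sigma2, theta, rng):
    """Simulate S(t) on a unit-spaced time grid at one location.

    S is stationary Gaussian with mean -sigma2/2, variance sigma2 and
    correlation exp(-v/theta) at lag v (an assumed form; an AR(1)
    recursion reproduces it exactly on a regular grid).
    """
    mean = -0.5 * sigma2
    a = math.exp(-1.0 / theta)                  # lag-one correlation
    innov_sd = math.sqrt(sigma2 * (1.0 - a * a))
    s = [rng.gauss(mean, math.sqrt(sigma2))]    # draw from the stationary law
    for _ in range(n_steps - 1):
        s.append(mean + a * (s[-1] - mean) + rng.gauss(0.0, innov_sd))
    return s

def poisson_draw(mu, rng):
    """Draw a Poisson(mu) count by Knuth's multiplication method (small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)
sigma2, theta = 0.5, 3.0                        # hypothetical parameter values
s = simulate_log_risk(20000, sigma2, theta, rng)
r = [math.exp(v) for v in s]                    # R(t) = exp{S(t)}
mean_r = sum(r) / len(r)                        # should be close to 1
lambda0 = 2.0                                   # assumed normal intensity per time step
counts = [poisson_draw(lambda0 * ri, rng) for ri in r]
mean_count = sum(counts) / len(counts)          # should be close to lambda0
```

Conditional on the simulated R, the counts are Poisson, which mirrors the last bullet of the model formulation; a full spatio-temporal simulation would replace the AR(1) recursion with a Gaussian field over space and time.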
Parameter estimation
- $\lambda_0(x)$: locally adaptive kernel smoothing
- $\mu_0(t)$: Poisson log-linear regression
- $\sigma^2$, $\rho(u, v)$: matching empirical and theoretical second moments (but could also use Monte Carlo MLE)
Spatial prediction
- plug-in for estimated model parameters
- MCMC to generate samples from conditional distribution of $S(x, t)$ given data up to time $t$
- choose critical threshold value $c > 1$
- map empirical exceedance probabilities, $p_t(x) = P(\exp\{S(x, t)\} > c \mid \text{data})$
- web-reporting with daily updates
Do we need to take account of parameter uncertainty?
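The empirical exceedance probability at each location is the fraction of MCMC draws in which exp{S(x, t)} exceeds c. A minimal sketch follows, assuming the draws are stored as a list of lists with one S value per prediction location. The layout, the function name, and the toy numbers are hypothetical.

```python
import math

def exceedance_probabilities(samples, c):
    """Estimate p_t(x) = P(exp{S(x, t)} > c | data) at each location.

    samples: list of MCMC draws; each draw is a list of S values,
             one per prediction location (a hypothetical layout).
    """
    log_c = math.log(c)          # exp(S) > c  is equivalent to  S > log(c)
    n_draws = len(samples)
    n_locs = len(samples[0])
    return [sum(draw[j] > log_c for draw in samples) / n_draws
            for j in range(n_locs)]

# Toy input: 4 draws at 3 locations (made-up values, not real output).
samples = [
    [0.1, 0.9, -0.3],
    [0.8, 1.2, 0.0],
    [0.2, 0.7, -0.1],
    [1.0, 0.75, 0.5],
]
p = exceedance_probabilities(samples, c=2.0)   # [0.5, 1.0, 0.0]
```

Locations with p close to 1 are the ones a surveillance map would flag for further investigation.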
Spatial prediction: results for 6 March 2003, $c = 2$
Analysis strategies for continuous-time processes
(2) Analysis via conditional intensity function
- $H_t$ = complete history (locations and times of events)
- $\lambda(x, t \mid H_t)$ = conditional intensity (hazard) for new event at location $x$, time $t$, given history $H_t$
Likelihood analysis
Log-likelihood for data $(x_i, t_i) \in A \times [0, T]$, $i = 1, \ldots, n$, with $t_1 < t_2 < \cdots < t_n$, is
$$L(\theta) = \sum_{i=1}^n \log \lambda(x_i, t_i \mid H_{t_i}) - \int_0^T \int_A \lambda(x, t \mid H_t)\,dx\,dt$$
Rarely tractable, but Monte Carlo methods are becoming available in special cases (eg log-Gaussian Cox processes)
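To make the two terms of the log-likelihood concrete, here is a minimal sketch for the easy case of a deterministic intensity (no dependence on the history) on the unit square. The integral term is approximated by a midpoint rule. The function name, the choice A = [0, 1]², and the toy events are assumptions for illustration; history-dependent intensities are exactly the case where this integral becomes hard.

```python
import math

def log_likelihood(events, intensity, T, n_grid=50):
    """Poisson-process log-likelihood on A = [0, 1]^2 and time window [0, T].

    events:    list of ((x1, x2), t) tuples
    intensity: function (x, t) -> rate, assumed not to depend on history
    The integral is a midpoint-rule approximation on an n_grid^3 lattice.
    """
    log_term = sum(math.log(intensity(x, t)) for x, t in events)
    h, dt = 1.0 / n_grid, T / n_grid
    integral = 0.0
    for i in range(n_grid):
        for j in range(n_grid):
            x = ((i + 0.5) * h, (j + 0.5) * h)
            for k in range(n_grid):
                integral += intensity(x, (k + 0.5) * dt) * h * h * dt
    return log_term - integral

# Constant intensity 5 on [0, 1]^2 x [0, 2]: the integral is exactly 10.
events = [((0.2, 0.3), 0.4), ((0.7, 0.1), 1.1), ((0.5, 0.9), 1.8)]
ll = log_likelihood(events, lambda x, t: 5.0, T=2.0)
# ll is approximately 3 * log(5) - 10
```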
Partial likelihood analysis
- Data $(x_i, t_i) \in A \times [0, T]$, $i = 1, \ldots, n$, with $t_1 < t_2 < \cdots < t_n$
- Condition on locations $x_i$ and times $t_i$, derive log-likelihood for observed ordering $1, 2, \ldots, n$
- can allow for right-censored event-times if relevant
- $R_i$ = risk-set at time $t_i$
$$p_i = \lambda(x_i, t_i \mid H_{t_i}) \Big/ \sum_{j \in R_i} \lambda(x_j, t_i \mid H_{t_i}) \quad \text{(discrete } R_i\text{)}$$
$$p_i = \lambda(x_i, t_i \mid H_{t_i}) \Big/ \int_{R_i} \lambda(x, t_i \mid H_{t_i})\,dx \quad \text{(continuous } R_i\text{)}$$
- partial log-likelihood: $L_p(\theta) = \sum_{i=1}^n \log p_i$
A model for the FMD epidemic (after Keeling et al, 2001)
Notation
- $H_t$ = history of process up to $t$
- $\lambda(x, t \mid H_t)$ = conditional intensity
- $\lambda_{jk}(t)$ = rate of transmission from farm $j$ to farm $k$
Farm-specific covariates for farm $i$
- $n_{1i}$ = number of cows
- $n_{2i}$ = number of sheep
Transmission kernel
$$f(u) = \exp\{-(u/\phi)^{0.5}\} + \rho$$
At-risk indicator for transmission of infection
$I_{jk}(t) = 1$ if farm $k$ not infected and not slaughtered by time $t$, and farm $j$ infected and not slaughtered by time $t$ (and 0 otherwise)
Reporting delay
Simplest assumption is that reporting date is infection date plus $\tau$ (latent period of disease plus reporting delay if any)
Resulting statistical model
$$\lambda_{jk}(t) = \lambda_0(t) A_j B_k f(\|x_j - x_k\|) I_{jk}(t)$$
- $\lambda_0(t)$ = arbitrary
- $A_j = (\alpha n_{1j} + n_{2j})$
- $B_k = (\beta n_{1k} + n_{2k})$
Fitting the model
- rate of infection for farm $k$ at time $t$ is $\lambda_k(t) = \sum_j \lambda_{jk}(t)$
- partial likelihood contribution from $i$th case is $p_i = \lambda_i(t_i) \big/ \sum_k \lambda_k(t_i)$
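Because the baseline λ0(t) appears in both numerator and denominator of each p_i, it cancels, so the partial likelihood depends only on (α, β, φ, ρ). A minimal sketch of the computation follows, assuming infection times are known exactly (the reporting delay τ is ignored) and that index cases with no infectious source contribute nothing. The dictionary layout, the function names, and the three toy farms are hypothetical; the parameter values in the example call are the estimates quoted in the FMD results.

```python
import math

def kernel(u, phi, rho):
    """Transmission kernel f(u) = exp{-(u/phi)^0.5} + rho."""
    return math.exp(-math.sqrt(u / phi)) + rho

def partial_log_likelihood(farms, alpha, beta, phi, rho):
    """Partial log-likelihood for the observed ordering of infections.

    farms: list of dicts with keys 'xy', 'cows', 'sheep', 't_inf', 't_cull'
    (a hypothetical layout); t_inf / t_cull are math.inf if the farm was
    never infected / never slaughtered.
    """
    def rate(k, t):
        # lambda_k(t) / lambda_0(t): sum over farms j infectious at time t
        fk = farms[k]
        if fk['t_inf'] < t or fk['t_cull'] <= t:
            return 0.0                      # farm k is no longer at risk
        total = 0.0
        for j, fj in enumerate(farms):
            if j != k and fj['t_inf'] < t and fj['t_cull'] > t:
                a_j = alpha * fj['cows'] + fj['sheep']
                total += a_j * kernel(math.dist(fj['xy'], fk['xy']), phi, rho)
        b_k = beta * fk['cows'] + fk['sheep']
        return b_k * total

    loglik = 0.0
    for i, fi in enumerate(farms):
        t = fi['t_inf']
        if math.isinf(t):
            continue                        # never infected: not a case
        num = rate(i, t)
        if num == 0.0:
            continue                        # index case: no infectious source yet
        den = sum(rate(k, t) for k in range(len(farms)))
        loglik += math.log(num / den)
    return loglik

inf = math.inf
farms = [
    {'xy': (0.0, 0.0), 'cows': 100, 'sheep': 200, 't_inf': 0.0, 't_cull': inf},
    {'xy': (1.0, 0.0), 'cows': 100, 'sheep': 200, 't_inf': 1.0, 't_cull': inf},
    {'xy': (-1.0, 0.0), 'cows': 100, 'sheep': 200, 't_inf': inf, 't_cull': inf},
]
# Two identical at-risk farms equidistant from the source, so p = 1/2.
ll = partial_log_likelihood(farms, alpha=4.92, beta=30.68, phi=0.39, rho=9.9e-5)
```

Maximising this function over the four parameters with a general-purpose optimiser would give the partial-likelihood estimates; the sketch evaluates it at a single parameter value only.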
FMD results
- Common parameter values in Cumbria and Devon? Likelihood ratio test: $\chi^2_4 = 2.98$
- Parameter estimates: $(\hat\alpha, \hat\beta, \hat\phi, \hat\rho) = (4.92, 30.68, 0.39, 9.9 \times 10^{-5})$
- But note that likelihood ratio test rejects $\rho = 0$.
Model extensions
- sub-linear dependence of infectivity/susceptibility on stock size
  $A_j = (\alpha n_{1j}^\gamma + n_{2j}^\gamma)$
  $B_k = (\beta n_{1k}^\gamma + n_{2k}^\gamma)$
  Likelihood ratio test: $\chi^2_1 = 334.9$.
- other farm-specific covariates, eg $z_j$ = area of farm $j$
  $A_j = (\alpha n_{1j}^\gamma + n_{2j}^\gamma) \exp(z_j \delta)$
  and similarly for $B_k$.
  Likelihood ratio test: $\chi^2_1 = 3.26$
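The likelihood ratio statistics quoted for the FMD model can be turned into p-values using the closed-form upper tails of the chi-squared distribution for 1 and 4 degrees of freedom. This is a minimal sketch using only the standard library; the function and variable names are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 4 only)."""
    if df == 1:
        # X = Z^2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 4:
        # Closed form for even df: exp(-x/2) * (1 + x/2)
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("closed form implemented only for df in {1, 4}")

p_common = chi2_sf(2.98, 4)    # common parameters in Cumbria and Devon
p_gamma = chi2_sf(334.9, 1)    # sub-linear dependence on stock size
p_area = chi2_sf(3.26, 1)      # farm area as an extra covariate
```

Under these assumptions p_common is about 0.56, so there is no evidence against common parameter values; p_gamma is effectively zero; and p_area is about 0.07, which falls short of significance at the conventional 5% level.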
Baseline intensity: Nelson-Aalen estimator
Write $\lambda_{ij}(t)$ as $\lambda_{ij}(t) = \lambda_0(t)\rho_{ij}(t)$
Nelson-Aalen estimator is
$$\hat\Lambda_0(t) = \int_0^t \hat\rho(u)^{-1}\,dN(u) = \sum_{i: t_i \le t} \hat\rho(t_i)^{-1}$$
where $\hat\rho(t)$ is plug-in from fitted model.
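The estimator is a step function that jumps by 1/ρ̂(t_i) at each event time. A minimal sketch follows, assuming the fitted model is available as a function returning ρ̂(t). The function name and the toy `rho_hat` are hypothetical stand-ins for the plug-in quantity.

```python
def nelson_aalen(event_times, rho_hat):
    """Return [(t_i, Lambda0_hat(t_i))], the cumulative baseline hazard.

    rho_hat: function t -> fitted value of rho(t) (plug-in from the model).
    """
    cumulative, steps = 0.0, []
    for t in sorted(event_times):
        cumulative += 1.0 / rho_hat(t)   # jump of size 1 / rho_hat(t_i)
        steps.append((t, cumulative))
    return steps

# Toy example with a made-up rho_hat(t) = 10 t.
steps = nelson_aalen([3.0, 1.0, 2.0], lambda t: 10.0 * t)
# Final value is 1/10 + 1/20 + 1/30.
```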
[Figure: Nelson-Aalen estimates for Cumbria (solid line) and Devon (dotted line); x-axis: time (days since 1 Feb), y-axis: cumulative hazard.]
An ecological application
- data record locations $x_i$ and arrival times $t_i$ of nesting birds on several small off-shore islands
- birds known to prefer higher ground for nesting
- physical limit on distance between any two nests (25 cm)
- does spatio-temporal pattern of nesting sites show any evidence of spatial interaction beyond minimum separation distance?
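Model for the pattern of nesting sites
Interaction function
$$h(u) = \begin{cases} 0 & : u \le \delta_0 \\ \theta & : \delta_0 < u \le \delta \\ 1 & : u > \delta \end{cases}$$
Conditional intensity is
$$\lambda(x, t \mid H_t) = \lambda_0(t) \exp\{z(x)\beta\}\, g(x, t \mid H_t)$$
- $z(x)$ = elevation
- $u(t) = \min_{j: t_j < t} \|x - x_j\|$
- $g(x, t_i \mid H_{t_i}) = h\{u(t_i)\}$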
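The interaction term depends on the history only through the distance from x to the nearest nest that arrived earlier. A minimal sketch follows, reading the interaction function as a step function with thresholds delta0 < delta: value 0 up to delta0, theta between delta0 and delta, and 1 beyond delta. The function names, the convention g = 1 when no nest has yet arrived, and the numerical values of theta and delta are assumptions for illustration; delta0 = 0.25 corresponds to the 25 cm minimum separation if distances are in metres.

```python
import math

def h(u, theta, delta0, delta):
    """Step interaction function: 0, theta or 1 depending on distance u."""
    if u <= delta0:
        return 0.0          # closer than the physical minimum separation
    if u <= delta:
        return theta        # interaction range
    return 1.0              # no interaction

def nearest_distance(x, t, nests):
    """Distance from x to the nearest nest with arrival time < t.

    nests: list of (location, arrival_time) pairs. Returns math.inf when
    no nest has arrived yet (a convention assumed for this sketch).
    """
    d = [math.dist(x, xj) for xj, tj in nests if tj < t]
    return min(d) if d else math.inf

def g(x, t, nests, theta, delta0, delta):
    """Interaction term g(x, t | H_t) = h{u(t)}."""
    return h(nearest_distance(x, t, nests), theta, delta0, delta)

nests = [((0.0, 0.0), 1.0), ((5.0, 0.0), 2.0)]
params = dict(theta=1.5, delta0=0.25, delta=2.0)   # theta, delta hypothetical
g_too_close = g((0.1, 0.0), 3.0, nests, **params)  # 0.0
g_in_range = g((1.0, 0.0), 3.0, nests, **params)   # 1.5
g_first = g((1.0, 0.0), 0.5, nests, **params)      # 1.0, no earlier nests
```

A value of theta above 1 would indicate that new nests favour sites within the interaction range of existing nests, and a value below 1 that they avoid them.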
[Figure: Final pattern on four islands; panels: Island 84, Island 74, Island 61, Island 56; axes x and y (map coordinates).]
[Figure: Confidence envelope for h(u); x-axis: distance (metres), y-axis: exp(theta).]
Conclusions
- spatio-temporal point process data-sets becoming widely available
- different problems require different modelling strategies
- temporal should often take precedence over spatial
- routine implementation is an important consideration