Andrew B. Lawson 2019 BMTRY 763

FMD
Foot-and-mouth disease (FMD) is a viral disease of cloven-footed animals. It is extremely contagious and spreads rapidly. It reduces animal production and can sometimes be fatal in young stock. Counts of new cases of FMD were available for parishes within the county of Cumbria, Northern England, for the period February 2001-August 2001. The data were available as half-monthly counts, giving 13 time periods for analysis.

FMD incidence, NW England (Cumbria), 2001
Infected premises (IPs) are to be modeled.
Within parishes we have counts of IPs and a record of the total number of premises, which changes over time.
138 parishes and 13 half-monthly time periods (February 2001-August 2001).

FMD first 6 time periods case/pop ratios (row-wise)

Data example: FMD in Cumbria
Foot-and-mouth outbreak in 2001 in Cumbria, UK
138 parishes in area
Time period: single biweekly count of IPs
Files: FMD_case_parish_data.txt, FMD_INLA_Rcode.txt, FMD_spatial_worked_examples_Rcode_ABDM.txt

FMD (FMDonePERIOD_data.txt)
library(maptools)  # provides readSplus()
# The premises denominator (pop) used in the INLA call is not listed on this slide.
FMDframe <- list(n = 138,
  count = c(1,0,1,0,2,3,1,0,1,0,3,2,0,3,1,6,1,10,2,1,8,4,4,1,1,
    3,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
    0,0,0,1,1,1,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0))
FMDmap <- readSplus("FMD_splusMAP.txt")
plot(FMDmap)  # R is case sensitive: the object is FMDmap, not fmdmap

Polygon Plot

FMD data
Counts in parishes (y)
Population of premises in parishes (n)
Crude rate (y/n)
[Figure: crude rate map. Legend (class: count): [0,0.17): 131; [0.17,0.33): 1; [0.33,0.5): 3; [0.5,0.67): 2; [0.67,0.83): 0; [0.83,1]: 1]

Possible descriptive model
A binomial model may be assumed, with a logit link:
$y_i \sim \mathrm{bin}(n_i, p_i)$
$\mathrm{logit}(p_i) = \alpha_0 + v_i$
$\alpha_0 \sim N(0, \tau_0^{-1})$
$v_i \sim N(0, \tau_v^{-1})$

INLA code
library(INLA)
ind <- 1:138  # one index per parish for the iid (UH) effect
formula1 <- count ~ 1 + f(ind, model = "iid", param = c(2, 1))
res1 <- inla(formula1, family = "binomial", data = FMDframe, Ntrials = pop,
             control.compute = list(dic = TRUE))
summary(res1)

Results: UH component
UH <- res1$summary.random$ind[, 2]  # column 2 holds the posterior mean
fillmap(FMDmap, "posterior mean UH component", UH, n.col = 4)

UH component
[Figure: map of posterior mean UH component. Legend (class: count): [-0.94,0.64): 108; [0.64,2.23): 9; [2.23,3.81): 13; [3.81,5.4]: 7]

Results: residuals
fit <- res1$summary.fitted.values$mean
resid <- FMDframe$count - fit
fillmap(FMDmap, "estimated crude residuals", resid, n.col = 4)
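The residual above subtracts the fitted value directly from the count. Assuming the binomial fitted values are on the probability scale (my reading of the INLA output, not something the slides state), a count-scale residual would first multiply the fitted probability by the denominator. A minimal sketch with invented values for three parishes:

```r
# All values below are invented for illustration only.
count <- c(0, 3, 10)          # observed infected premises
pop   <- c(40, 25, 21)        # premises at risk (binomial denominators)
fit   <- c(0.01, 0.10, 0.45)  # hypothetical fitted probabilities

resid_prob  <- count - fit        # residual as computed above
resid_count <- count - pop * fit  # residual on the count scale

print(cbind(count, pop, fit, resid_prob, resid_count))
```

The two versions differ most where the denominator is large, so the choice affects which parishes stand out on the residual map.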

Residuals
[Figure: map of estimated crude residuals. Legend (class: count): [-0.01,2.45): 118; [2.45,4.92): 7; [4.92,7.38): 1; [7.38,9.84]: 3]

Results: infection probability
linpred <- res1$summary.linear.predictor$mean
prob <- exp(linpred) / (1 + exp(linpred))  # inverse logit
fillmap(FMDmap, "posterior mean infection probability", prob, n.col = 4)
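The inverse-logit step above can also be written with base R's `plogis()`, which avoids overflow when the linear predictor is large. A minimal sketch, with made-up `linpred` values standing in for `res1$summary.linear.predictor$mean`:

```r
# Hypothetical linear-predictor values on the logit scale.
linpred <- c(-6, -2.5, 0, 1.2)

prob_manual <- exp(linpred) / (1 + exp(linpred))  # formula used above
prob_plogis <- plogis(linpred)                    # base R inverse logit

print(all.equal(prob_manual, prob_plogis))
```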

Infection probability
[Figure: map of posterior mean infection probability. Legend (class: count): [0,0.12): 129; [0.12,0.24): 4; [0.24,0.36): 2; [0.36,0.48]: 3]

Results: DIC
LOCdic <- res1$dic$local.dic
fillmap(FMDmap, "local DIC", LOCdic, n.col = 5)

DIC
[Figure: map of local DIC. Legend (class: count): [0.21,1.35): 107; [1.35,2.5): 1; [2.5,3.65): 13; [3.65,4.8): 10; [4.8,5.94]: 6]

Space Time Modeling

Space and time considerations
Often we need to consider both spatial and temporal effects in disease data.
We could have a location and date of diagnosis, OR we could have counts of disease within small areas and fixed time periods.
The second of these is more common and more aggregated.
I will only consider this latter situation here.

Ohio county-level respiratory cancer, 10 years
A well-known dataset (full dataset: 21 years); 1979-1988 shown here.
SIRs displayed.

Space-time (ST) modeling: some notation
Assume counts within fixed spatial and temporal periods: map evolutions.
Both space and time are subscripts in the analysis.
Consider separable models (with spatial and separate temporal terms).
Also interaction effects.

Notation
outcome: $y_{ij}$; relative risk: $\theta_{ij}$; expected count: $e_{ij}$
$i = 1, \ldots, m$: small areas
$j = 1, \ldots, J$: time periods

Expected counts
Computation (simplest: overall average):
$e_{ij} = p_{ij} \cdot \dfrac{\sum_i \sum_j y_{ij}}{\sum_i \sum_j p_{ij}}$
where $p_{ij}$ is the population of the $ij$th unit.
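The overall-average computation above can be sketched in a few lines of R. The matrices here are toy values (3 areas by 2 periods), not the Ohio or FMD data, and the object names are my own:

```r
# Toy data: rows are small areas (i), columns are time periods (j).
y <- matrix(c(2, 0, 5, 1, 3, 4), nrow = 3)            # observed counts y_ij
p <- matrix(c(100, 50, 200, 110, 40, 210), nrow = 3)  # populations p_ij

overall_rate <- sum(y) / sum(p)  # pooled rate over all areas and periods
e <- p * overall_rate            # expected counts e_ij
sir <- y / e                     # crude relative risk estimate per cell

print(round(e, 2))
print(round(sir, 2))
```

With this internal standardization the expected counts sum to the observed total, which is a quick sanity check on the calculation.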

Basic retrospective model
Infinite population; small disease probability: Poisson assumption.
$y_{ij} \sim \mathrm{Pois}(e_{ij}\theta_{ij})$
$\log(\theta_{ij}) = \alpha_0 + S_i + T_j + ST_{ij}$
$S_i$: spatial terms; $T_j$: temporal terms; $ST_{ij}$: interaction.

Full data set: 21 years of Ohio lung cancer
10 years of SMRs, standardized with the statewide rate: 1979-1988.
Frequently analyzed.
Row-wise from 1979.

Some random effect models
model 1a: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \beta t_j$
model 1b: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_j$
model 2: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_{1j} + \gamma_{2j}$
model 3: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_{2j} + \psi_{ij}$
model 4: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_{1j} + \gamma_{2j} + \psi_{ij}$
model 5: variants of (3) with $\psi_{ij}$

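Simple separable models
Spatial components (BYM) plus linear time trend
model 1a: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \beta t_j$
$t_j$: time of the $j$th period
$v_i \sim N(0, \tau_v^{-1})$
$u_i \mid u_{-i} \sim N(\bar{u}_{\delta_i}, \tau_u^{-1} / n_{\delta_i})$
where $\delta_i$ is the neighborhood of area $i$ and $n_{\delta_i}$ is its number of neighbors.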

Simple trend models
Spatial components + random time effect
model 1b: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_j$

Random walk prior distribution
Model 1b: we assume a random effect for the time element, and this has a random walk prior distribution:
$\gamma_j \sim N(\gamma_{j-1}, \tau_\gamma^{-1})$
More generally, an AR1 prior could be used:
$\gamma_j \sim N(\lambda\gamma_{j-1}, \tau_\gamma^{-1}); \quad 0 < \lambda < 1$
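To see how the two priors differ, one can simulate a path from each using the same innovations. This is only an illustrative sketch: the precision, the AR1 coefficient and the starting value of zero are assumptions of mine, not values from the course material.

```r
set.seed(1)
J      <- 10   # number of time periods (as in the Ohio example)
tau    <- 4    # assumed precision, so sd = 1 / sqrt(tau)
lambda <- 0.5  # assumed AR1 coefficient, 0 < lambda < 1
innov  <- rnorm(J, mean = 0, sd = 1 / sqrt(tau))

# Random walk: gamma_j = gamma_{j-1} + innovation, started from zero
gamma_rw1 <- cumsum(innov)

# AR1: gamma_j = lambda * gamma_{j-1} + innovation, started from zero
gamma_ar1 <- numeric(J)
gamma_ar1[1] <- innov[1]
for (j in 2:J) {
  gamma_ar1[j] <- lambda * gamma_ar1[j - 1] + innov[j]
}

print(round(cbind(gamma_rw1, gamma_ar1), 3))
```

The AR1 path is pulled back towards zero at each step, whereas the random walk carries its full previous value forward.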

Simple space-time models
Two random time effects:
model 2: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_{1j} + \gamma_{2j}$
$\gamma_{1j} \sim N(0, \tau_{\gamma_1}^{-1})$
$\gamma_{2j} \sim N(\gamma_{2,j-1}, \tau_{\gamma_2}^{-1})$

Interaction models
Separable models can be enhanced by using additive interaction terms.
Model 3: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_{2j} + \psi_{ij}$

Interaction priors
A variety of priors for the interaction can be assumed (both correlated and non-separable).
Knorr-Held (2000) first suggested dependent priors (see Lawson (2013), ch. 12).
Two simple separable examples of possible priors are:
$\psi_{ij} \sim N(0, \tau_\psi^{-1})$: uncorrelated (model 3)
$\psi_{ij} \sim N(\psi_{i,j-1}, \tau_\psi^{-1})$: random walk (model 5)

Interaction models
Model 4: as for (3), but with an added uncorrelated time effect.
Model 5: as for (3), but with a correlated interaction prior in time.
model 4: $\log(\theta_{ij}) = \alpha_0 + v_i + u_i + \gamma_{1j} + \gamma_{2j} + \psi_{ij}$
model 5: variants of (3) with $\psi_{ij}$

Model fitting results (WinBUGS)
Model   DIC      pd
1a      5759     80
1b      5759     80
2       5759.4   79
3       5751.4   129
4       5755.3   129
5       5750.6   115

Interpretation
The temporal trend model (1a) does not provide a better fit than the random walk model (1b).
The extra random effect in model 2 is not needed.
Including the interaction in model 3 improves the fit (lower DIC), but model 4 does not do as well.
Model 5, with the random walk interaction, seems best: it has the lowest DIC and a smaller pd than model 3.
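The comparison above can be reproduced by tabulating the reported WinBUGS values and ranking them. The numbers are copied from the results table; the data frame and column names are my own.

```r
# DIC and pd values as reported for the WinBUGS fits
fits <- data.frame(
  model = c("1a", "1b", "2", "3", "4", "5"),
  DIC   = c(5759, 5759, 5759.4, 5751.4, 5755.3, 5750.6),
  pd    = c(80, 80, 79, 129, 129, 115),
  stringsAsFactors = FALSE
)

fits$delta <- fits$DIC - min(fits$DIC)  # difference from the best model
best <- fits$model[which.min(fits$DIC)]

print(fits[order(fits$DIC), ])
print(best)
```

Laid out this way, model 3 sits 0.8 DIC units above model 5, so the smaller pd is what separates the two most clearly.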

Model fitting on INLA
Data setup: space-time data is often in the form of a matrix: y(i,j), e(i,j).
Rows are small areas (e.g. counties, parishes).
Columns are time units (e.g. years, months or days).
Ohio data: i = 1,...,88; j = 1,...,10, giving a matrix of dimension 88 x 10.
For INLA use, it is convenient to reformat the data into long form: each row is one small area at one time period, so the repeated measurements on a small area appear as repeated rows.

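R code
For matrices read in with an 88 x 10 structure (in R), the code for conversion is:
yl <- rep(0, 880)
el <- rep(0, 880)
nT <- 10  # number of time periods (avoids masking T, and the undefined t in the original)
for (i in 1:88) {
  for (j in 1:10) {
    k <- j + nT * (i - 1)  # position of cell (i, j) in the long vector
    yl[k] <- y[i, j]
    el[k] <- e[i, j]
  }
}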

Index setup
Indices can now be set up to address different effects:
year <- rep(1:10, len = 880)
region <- rep(1:88, each = 10)
region2 <- region
ind2 <- 1:880
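The long-format conversion and the index vectors can be checked against each other on a small example. The sketch below uses a toy 3 x 4 matrix in place of the 88 x 10 Ohio matrix; `m`, `nT` and the `_loop`/`_vec` names are mine.

```r
m  <- 3  # number of small areas (88 for Ohio)
nT <- 4  # number of time periods (10 for Ohio)
y  <- matrix(1:12, nrow = m, ncol = nT)  # toy counts, rows = areas

# Loop version: area-major order, time running fastest
yl_loop <- rep(0, m * nT)
for (i in 1:m) {
  for (j in 1:nT) {
    yl_loop[j + nT * (i - 1)] <- y[i, j]
  }
}

# Vectorised equivalent: transpose, then read off column by column
yl_vec <- as.vector(t(y))

# Index vectors built the same way as in the index setup
year   <- rep(1:nT, length.out = m * nT)
region <- rep(1:m, each = nT)

print(identical(as.numeric(yl_vec), yl_loop))
print(all(y[cbind(region, year)] == yl_loop))
```

The second check confirms that row k of the long data really is area `region[k]` at time `year[k]`, which is what the model terms rely on.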

OHIO_Stmapping_INLA_Rcode.txt
All models are described in this file.
Data is stored in OHIO_STmapping_INLA_RcodeFAL.txt.
Paste the contents of the file into R.
This will yield y, e, and indicator variables in the data.frame called data.

Models fitted on INLA
1. Spatial only (UH)
2. Spatial only (UH+CH)
3. UH+CH+time trend (model 1a)
4. UH+CH+time iid
5. UH+CH+time rw1 (model 1b)
6. UH+CH+time (iid, rw1) (model 2)
7. UH+time rw1+ST int
8. UH+CH+time rw1+ST int (model 3)
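As a rough guide to how two of these models might be written as INLA formulas, here is a sketch for the models with UH, time rw1 and ST interaction, with and without CH. It is my own reading of the model list, not the contents of the course code file; the adjacency file name `ohio.graph` is hypothetical, and the `inla()` call is left commented out because it needs the INLA package and the real data.

```r
# Index vectors in long format (88 areas x 10 years)
year    <- rep(1:10, length.out = 880)
region  <- rep(1:88, each = 10)
region2 <- region  # separate copy so the UH and CH terms have distinct index names
ind2    <- 1:880   # one level per area-year cell, for the interaction

graph_file <- "ohio.graph"  # HYPOTHETICAL adjacency file name

# UH + time rw1 + unstructured space-time interaction
formula7 <- y ~ 1 + f(region, model = "iid") +
  f(year, model = "rw1") +
  f(ind2, model = "iid")

# Same, plus a spatially structured (CH) term with an intrinsic CAR prior
formula8 <- y ~ 1 + f(region, model = "iid") +
  f(region2, model = "besag", graph = graph_file) +
  f(year, model = "rw1") +
  f(ind2, model = "iid")

# Not run: requires INLA and the data.frame called data (with y and e)
# res7 <- INLA::inla(formula7, family = "poisson", data = data, E = e,
#                    control.compute = list(dic = TRUE))
```

The expected counts enter through `E = e`, so the linear predictor models the log relative risk rather than the raw count.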

DIC comparison
INLA model   Description               DIC       pd       WB model
1            Spatial only (UH)         5758.2    79.65
2            UH+CH                     5757.4    79.66
3            UH+CH+time trend          5759.28   80.05    1a
4            UH+CH+time iid            5760.4    80.58
5            UH+CH+time RW1            5760.6    80.60    1b
6            UH+CH+time (iid, rw1)     5763.1    81.97    2
7            UH+time rw1+ST int        5753.80   116.78
8            UH+CH+time rw1+ST int     5757.9    86.41    3

Results: INLA model 7, UH
[Figure: map of UH component. Legend (class: count): [-0.6,-0.36): 2; [-0.36,-0.12): 16; [-0.12,0.12): 46; [0.12,0.36]: 23]

Results: INLA model 7, time effect
[Figure: time effect (yearr) plotted against time, periods 1-10; y-axis from about -0.002 to 0.002]

Break

FMD modeling in space-time
Load data: FMD_case_parish_data.txt
INLA code in: FMD_INLA_ST_Rcode.txt

Model development
Data are counts within parishes in 13 time periods.
We also have the current population of premises.
A binomial model may be appropriate:
$y_{ij} \sim \mathrm{Bin}(p_{ij}, n_{ij})$
$\mathrm{logit}(p_{ij}) = f_1(y_{i,j-1}) + f_2(\text{predictors}) + f_3(\text{REs})$
A Poisson model could also be considered if we modulate the Poisson mean with population:
$y_{ij} \sim \mathrm{Pois}(\mu_{ij})$
$\log(\mu_{ij}) = f_1(n_{ij}) + f_2(y_{i,j-1}) + f_3(\text{predictors and REs})$

Simpler Poisson model
We will look at the Poisson model.
We also only use random effects.
Lagged dependencies can be accommodated using the copying facility in INLA.
We will only have lagged REs.

Models
Purely spatial (UH+CH), pl offset
Purely spatial (UH+CH), pl predictor
UH+CH + pl predictor + time RW1
UH+CH + pl offset + time RW1
UH+CH + pl predictor + time RW1 + STint

Results: DICs
Model   Description                                DIC       pd
1a      UH+CH + pl offset                          1911.67   93.13
1b      UH+CH + pl predictor                       1915.35   93.68
2       UH+CH + pl predictor + time RW1            1711.73   98.68
3       UH+CH + pl offset + time RW1               1776.67   98.99
4       UH+CH + pl predictor + time RW1 + STint    1289.52   292.66

Model 4
This model has spatial and temporal effects with a pl predictor and ST interaction:
$y_{ij} \sim \mathrm{Pois}(\mu_{ij})$
$\mu_{ij} = \exp\{\alpha_0 + \alpha_1 n_{ij} + v_i + u_i + \gamma_j + \psi_{ij}\}$
NB: lagged effects in count and population are not included, BUT a lagged dependency random effect is.

Results: UH
[Figure: map of UH component. Legend (class: count): [-0.73,0.05): 76; [0.05,0.82): 47; [0.82,1.6): 13; [1.6,2.38]: 1]

Results: CH
[Figure: map of CH component. Legend (class: count): [-1.45,-0.62): 31; [-0.62,0.2): 56; [0.2,1.03): 35; [1.03,1.85]: 16]

Results
time <- 1:13
plot(time, yearr)
lines(time, yearr)
[Figure: time effect (yearr) plotted against time, periods 1-13; y-axis from about -0.5 to 1.5]

Results: ST interaction, period 1
[Figure: map of ST interaction, period 1. Legend (class: count): [-1.07,-0.45): 15; [-0.45,0.17): 93; [0.17,0.8): 10; [0.8,1.42): 6; [1.42,2.05): 4; [2.05,2.67]: 9]

Results: ST interaction, period 2
[Figure: map of ST interaction, period 2. Legend (class: count): [-1.13,-0.44): 13; [-0.44,0.26): 89; [0.26,0.95): 13; [0.95,1.64): 11; [1.64,2.34): 6; [2.34,3.03]: 6]

Finally: other INLA features
Measurement error in predictors (mec, meb)
Missingness in outcomes (copy facility)
Geographically weighted regression, e.g. f(ind, x1, model="besag", graph= )
Smoothed predictors, e.g. f(x1, model="rw1")
Modeling point processes via SPDE facilities (LGCP)

Finally: INLA versus OpenBUGS
Runs on R. OpenBUGS: x (only through BRugs or R2WinBUGS).
Large datasets; Mixtures: x x
Posterior functionals: ? x
Special spatial models. INLA: X, some (LGCP for point processes). OpenBUGS: X, GeoBUGS + CAR models.
Missingness. INLA: only outcomes in general, but can handle drop-out models. OpenBUGS: can handle a range of missingness.

Cautionary references
A study has found that the default prior distributions for precisions in INLA can lead to very inaccurate precision estimates for random effects:
Carroll, R., Lawson, A. B., Faes, C., Kirby, R. S. and Aregay, M. (2015) Comparing INLA and OpenBUGS for hierarchical Poisson modeling in disease mapping. Spatial and Spatio-temporal Epidemiology, 14-15, 45-54.
See also:
Taylor, B. and Diggle, P. (2014) INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. Journal of Statistical Computation and Simulation, 84(10), 2266-2284.
Teng, M., Nathoo, F. S. and Johnson, T. D. (2017) Bayesian computation for log-Gaussian Cox processes: a comparative analysis of methods. Journal of Statistical Computation and Simulation. DOI: 10.1080/00949655.2017.1326117
So be warned!

Conclusions
Thanks for your attention!
Contact address: lawsonab@musc.edu
INLA examples are given in Appendix D of Lawson, A. B. (2013) Bayesian Disease Mapping: hierarchical modeling in spatial epidemiology. 2nd Ed. CRC Press, New York, and also in chapter 15 of Lawson (2018), 3rd Ed.
Full 2 x 1 x 2 day courses on BDM (including WinBUGS, INLA, CARBayes and Nimble) are given at MUSC (March).
Contact for MUSC courses: June Watson, email: watsonju@musc.edu
