Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information

Shengde Liang, Bradley P. Carlin, and Alan E. Gelfand
shengdel@biostat.umn.edu, brad@biostat.umn.edu, and alan@stat.duke.edu
Division of Biostatistics, School of Public Health, University of Minnesota, and Institute of Statistics and Decision Sciences, Duke University

Background
- In disease mapping, data are typically presented as aggregated counts over areal regions (counties, zip codes, etc.), so they are analyzed with areal or lattice models
- Example: Model regional counts as Poisson($E_i e^{\mu_i}$) and assume the $\mu_i$ follow an area-level conditionally autoregressive (CAR) distribution
- But if precise geographic coordinates are available, the data are properly viewed as a spatial point pattern
- Under a non-homogeneous Poisson process, the likelihood for the intensity surface generating the locations is known, but complex...
- Spatial point process methods and computations are both more challenging: retreat to the Poisson-CAR?...

But we can't retreat!
We often have individual-level covariates (either "of interest" or "nuisance") we want to incorporate:
- We seek to compare patterns across certain treatment covariates that "mark" the point pattern
- Other non-spatial covariates (e.g., patient characteristics or risk factors) are nuisances
- Still other spatial covariates may be available at either point or areal summary level
We propose modeling point patterns jointly over geographic and non-spatial nuisance covariate space (precludes aggregation to counts)
Interest lies in certain covariate effects, and the marginal intensity associated with geographic space
Dependence between treatment-specific intensity surfaces calls for multivariate spatial process modeling!

Model-Based Approach
- Write the intensity of the process as $\lambda(s)$, where $s$ is a location in a spatial domain $D$.
- Cumulative intensity over any block $A$ (say, county or zip code) is $\int_A \lambda(s)\,ds$, which is $\lambda |A|$, where $|A|$ is the area of $A$, if $\lambda(s)$ is free of $s$
- Likelihood for observed locations $s_i$, $i = 1, \dots, n$, is then
  $L(\lambda(s), s \in D; \{s_i\}_{i=1}^n) = e^{-\int_D \lambda(s)\,ds} \prod_{i=1}^n \lambda(s_i)$
- Parametrizing $\lambda(s)$ by $\theta$ and adding a prior distribution $p(\theta)$ gives a posterior distribution $p(\lambda(s;\theta) \mid \{s_i\})$ for the intensity surface
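A minimal sketch of the likelihood above, on the log scale. It assumes a unit-square domain $D$ and a midpoint-rule approximation of $\int_D \lambda(s)\,ds$; the function names, the grid size, and the log-linear intensity are illustrative choices, not taken from the talk.

```python
import math

def nhpp_log_likelihood(points, intensity, grid_size=50):
    """log L = -integral_D lambda(s) ds + sum_i log lambda(s_i) on the unit square.

    The integral is approximated by a midpoint rule on a regular
    grid_size x grid_size grid (an assumption made for this sketch).
    """
    cell_area = 1.0 / grid_size**2
    integral = 0.0
    for a in range(grid_size):
        for b in range(grid_size):
            x = (a + 0.5) / grid_size
            y = (b + 0.5) / grid_size
            integral += intensity(x, y) * cell_area
    log_terms = sum(math.log(intensity(x, y)) for x, y in points)
    return -integral + log_terms

def make_loglinear_intensity(theta0, theta1):
    """Hypothetical parametric intensity lambda(s; theta) = exp(theta0 + theta1 * x)."""
    return lambda x, y: math.exp(theta0 + theta1 * x)

# Three made-up locations, evaluated under a constant intensity of 5 per unit area.
points = [(0.2, 0.3), (0.7, 0.1), (0.9, 0.8)]
ll_const = nhpp_log_likelihood(points, lambda x, y: 5.0)

# The same points under a log-linear intensity that increases from west to east.
ll_trend = nhpp_log_likelihood(points, make_loglinear_intensity(1.0, 1.0))
```

With a constant intensity the integral reduces to $\lambda |D|$, which gives a quick check of the numerical integration.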

Modeling (cont'd)
- We typically think of $\lambda(s)$ as a log-Gaussian process (GP) realization, for which the prior might be $p(\lambda(s) \mid \mu(s), \sigma^2, \phi)$, where $\mu(s)$ is the mean, and $\sigma^2$ and $\phi$ are the variance parameters of the GP.
- We express $\mu(s)$ in part as $z'(s)\beta$, for some spatially referenced covariates $z(s)$
- Since $\lambda(s)$ is being modeled as a random realization of a spatial process, the integral in the likelihood is stochastic, precluding explicit evaluation. Computational challenges thus include:
  - the stochastic integration
  - the often large collection of spatial locations (the "big n problem")
  - a prior specification that is only available through finite dimensional distributions.

Really Brief Literature Review
- Wolpert and Ickstadt (1998): fully Bayesian approaches for spatially nonhomogeneous Poisson process data
- Beneš et al. (2002): Bayesian analysis of a log Gaussian Cox process model, assuming $\lambda(s)$ constant over grid cells, and no joint modeling of multiple surfaces or non-spatially indexed covariates
- Diggle (2007): nice review of log Gaussian Cox process modeling both for static and dynamic point patterns
- Brix and Diggle (2001) and Duan et al. (2006): spatiotemporal case, using differential equations to drive the evolution of the intensity surface over time.
- Then there's the probabilistic discussion in Møller and Waagepetersen (2004), but really not much else out there for fully Bayesian inference...

N Minnesota Breast Cancer Data
[Figure: map with axes Longitude and Latitude (latitude ticks from 46.0 to 49.0).] Jittered residential locations of cases, as well as radiation treatment facilities (RTFs; triangles), northern Minnesota, 1998-2002.
Not shown: treatment (BCS vs. mastectomy), age, stage, census variables (education, poverty, race, etc.)

N Minnesota Breast Cancer Data
Two options for surgery:
- mastectomy: more invasive and disfiguring, but usually does not require follow-up radiation treatment
- breast conserving surgery (BCS, or "lumpectomy"): requires daily radiation therapy over a 5-week period
Primary task: compare the BCS and mastectomy point patterns, and account for their dependence when estimating main effects common to both.
Secondary question: Are women living in more rural areas (as measured by estimated driving distance to the nearest RTF) more likely to opt for mastectomy?
Covariates we use:
- distance to nearest RTF (spatial, exact)
- census tract poverty rate (spatial, areal only)
- patient age and stage (non-spatial)

Modeling with Spatial Covariates
- Let $X = \{s_i\}_{i=1}^n$ be a set of random locations modeled using a nonhomogeneous Poisson process with intensity $\lambda(s)$
- We take $\lambda(s) = r(s)\pi(s)$, where $r(s)$ is the population density surface at location $s$. In practice, we let $r(s) = (\text{\# points in } A)/|A|$ for all $s \in A$
- Thus $\pi(s)$ is interpreted as a population adjusted (or relative) intensity, which we model on the log scale as $\pi(s) = \exp(\beta' z(s) + w(s))$, where $w(s)$ is a zero-centered stochastic process, and $\beta$ is an unknown vector of regression coefficients.
- If $w(s)$ is taken to be a Gaussian process, then the original point process is called a log Gaussian Cox process (LGCP)
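To make $\lambda(s) = r(s)\exp(\beta' z(s) + w(s))$ concrete, here is a hedged sketch that evaluates one LGCP intensity realization on a coarse grid over the unit square. The exponential covariance function, the parameter values, the covariate (the x-coordinate), and the constant $r(s)$ are all assumptions made for illustration; none of them come from the talk.

```python
import numpy as np

def exponential_cov(coords_a, coords_b, sigma2, phi):
    """Assumed GP covariance: sigma2 * exp(-phi * distance)."""
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(0)

# Cell centroids of a 10 x 10 grid on the unit square.
side = 10
axis = (np.arange(side) + 0.5) / side
gx, gy = np.meshgrid(axis, axis)
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Draw w(s) at the centroids from a zero-centered GP via a Cholesky factor.
sigma2, phi = 0.5, 3.0
cov = exponential_cov(grid, grid, sigma2, phi)
chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(grid)))  # small jitter for stability
w = chol @ rng.standard_normal(len(grid))

z = grid[:, [0]]               # hypothetical covariate z(s): the x-coordinate
beta = np.array([-1.0])        # hypothetical regression coefficient
r = np.full(len(grid), 200.0)  # hypothetical constant population density r(s)

pi = np.exp(z @ beta + w)      # relative intensity pi(s)
lam = r * pi                   # intensity lambda(s) = r(s) * pi(s)

# Riemann-sum approximation to the expected number of points in D, given w.
expected_count = lam.sum() / side**2
```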

Modeling with Spatial Covariates
- Likelihood for $\beta$ and $w_D = \{w(s) : s \in D\}$ given $X$:
  $L(\beta, w_D; X) \propto \exp\left(-\int_D r(s)\pi(s)\,ds\right) \prod_{s_i \in X} r(s_i)\pi(s_i)$.
- Adding priors on $w_D$ and $\beta$ leads to $p(\beta, w_D \mid X) \propto L(\beta, w_D; X)\,p(\beta)\,p(w_D)$, which is intractable!
- In practice, partition $D$ into $A_i$, $i = 1, 2, \dots, m$, integrate the intensity surface over each $A_i$, and create a Poisson likelihood for the counts given $\lambda(A_i)$. This still requires $\int_{A_i} r(s)\pi(s)\,ds$, which cannot be done explicitly.
- Could model at the areal level ($\log \pi(A_i)$), but this precludes use of point level covariate information
- $\pi(A_i) = \int_{A_i} \pi(s)\,ds \neq \exp\left(\int_{A_i} (\beta' z(s) + w(s))\,ds\right)$; the latter, simpler integration could lead to ecological fallacy

Computational Approach
- Suppose we replace $\int_D \lambda(s)\,ds$ with some numerical integration (analytic or Monte Carlo), so we replace $w_D$ with a finite set, say $w = \{w(s_j), j = 1, 2, \dots, m\}$, at the $m$ integration points.
- Revise the likelihood to $L(\beta, w, w(s_1), \dots, w(s_n); X)\,p(w, w(s_1), \dots, w(s_n))\,p(\beta)$. Now, we only need to work with an $(n + m)$-dimensional random variable to handle the $w$'s, whose prior is just an $(n + m)$-dimensional normal distribution.
- We need $z(s)$ at each $s_j$, but they are not random so may be interpolated or tiled; we just need to be able to assign a value of $z$ for each $s \in D$.
- Our case: distance to RTF = exact, poverty = tiled
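The revised likelihood and its $(n + m)$-dimensional normal prior might be coded roughly as follows. This is a sketch under stated assumptions: the integral is replaced by a weighted sum over $m$ integration points with user-supplied quadrature weights, and all function and argument names are hypothetical.

```python
import numpy as np

def discretized_log_likelihood(beta, w_grid, w_obs, z_grid, z_obs,
                               r_grid, r_obs, weights):
    """Log-likelihood with the integral over D replaced by a weighted sum.

    w_grid, z_grid, r_grid, weights: values at the m integration points.
    w_obs, z_obs, r_obs: values at the n observed locations.
    """
    integral = np.sum(weights * r_grid * np.exp(z_grid @ beta + w_grid))
    point_terms = np.sum(np.log(r_obs) + z_obs @ beta + w_obs)
    return -integral + point_terms

def gaussian_log_prior(w_all, cov):
    """Log density of the (n + m)-dimensional N(0, cov) prior on the stacked w values."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, w_all)            # alpha @ alpha = w' cov^{-1} w
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (alpha @ alpha + log_det + len(w_all) * np.log(2.0 * np.pi))

# Toy setup: m = 4 equal-area cells on the unit square, n = 2 observed points.
beta = np.array([0.0])
weights = np.full(4, 0.25)
z_grid, z_obs = np.zeros((4, 1)), np.zeros((2, 1))
r_grid, r_obs = np.ones(4), np.ones(2)
w_grid, w_obs = np.zeros(4), np.zeros(2)

log_lik = discretized_log_likelihood(beta, w_grid, w_obs, z_grid, z_obs,
                                     r_grid, r_obs, weights)
log_prior = gaussian_log_prior(np.concatenate([w_grid, w_obs]), np.eye(6))
log_post_kernel = log_lik + log_prior   # up to the log p(beta) term
```

In an MCMC sampler, a function like `log_post_kernel` would be re-evaluated at each proposed value of $\beta$ and the stacked $w$ vector; the identity covariance here is only a placeholder for a spatial covariance matrix.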

Introducing Non-spatial Covariates
- Recall two types of non-spatial covariates:
  - the marks (our case: treatment [MAS vs BCS])
  - the nuisances (our case: age and cancer stage)
- We wish to adjust intensity to reflect age and stage for each treatment group.
- Writing the nuisances as continuous, introduce a second argument into the definition of the intensity, $\pi(s,v) = \exp(\beta' z(s) + w(s,v))$. This is a surface over the product space $D \times V$ (i.e., geographic space by covariate space)
- Interest is in the marginal spatial intensity associated with $\pi(s,v)$, based on data $\{(s_i, v_i), i = 1, 2, \dots, n\}$.

Introducing Non-spatial Covariates
- In the interest of separating $s$ and $v$ in the model, for now let us write $w(s,v) = w(s) + u(v)$: model $w(s)$ as above (GP), but take $u(v) = v'\alpha$
- Separability allows $\pi(s,v)$ to factor as $\exp\{u(v)\}\pi(s)$, so we can study the spatial intensity scaled by the nuisances
- Separable additive marked log relative intensity: $\log \pi_k(s,v) = \beta_{0k} + z'(s)\beta_k + v'\alpha_k + w_k(s)$.
  - $\beta_{0k}$ capture the global mark effects
  - spatially-referenced covariates and nuisances can differentially affect the mark-specific intensities
  - $w_k$ provide a different GP realization for each $k$.
- Sensible reduced models: $w_k(s) = w(s)$, $\beta_k = \beta$, $\alpha_k = \alpha$.
- Or: Add multiplicative interaction between $z(s)$ and $v$
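A small sketch of the separable additive form, illustrating why the nuisance effect only rescales the spatial intensity: the log-intensity difference between two nuisance profiles is the same at every location. All coefficient and covariate values below are made up for illustration.

```python
import math

def log_relative_intensity(beta0_k, beta_k, alpha_k, z_s, v, w_k_s):
    """log pi_k(s, v) = beta_0k + z'(s) beta_k + v' alpha_k + w_k(s)."""
    return (beta0_k
            + sum(b * z for b, z in zip(beta_k, z_s))
            + sum(a * x for a, x in zip(alpha_k, v))
            + w_k_s)

# Hypothetical parameters for one mark k.
beta0_k = -2.0
beta_k = [0.5, -0.3]     # effects of z(s), e.g. distance to RTF and poverty rate
alpha_k = [0.02, 0.4]    # effects of v, e.g. age and stage

# Two hypothetical locations with their covariates z(s) and GP values w_k(s).
loc_a = {"z": [1.2, 0.10], "w": 0.3}
loc_b = {"z": [4.0, 0.25], "w": -0.7}

# Two hypothetical nuisance profiles (age, stage).
v1, v2 = [55.0, 1.0], [70.0, 2.0]

def nuisance_contrast(loc):
    """log pi_k(s, v1) - log pi_k(s, v2) at one location."""
    return (log_relative_intensity(beta0_k, beta_k, alpha_k, loc["z"], v1, loc["w"])
            - log_relative_intensity(beta0_k, beta_k, alpha_k, loc["z"], v2, loc["w"]))

diff_a = nuisance_contrast(loc_a)
diff_b = nuisance_contrast(loc_b)   # equals diff_a up to rounding: (v1 - v2)' alpha_k
```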

Introducing Non-spatial Covariates
- The intensity associated with our separable additive model is $\lambda_k(s,v) = \exp(\beta_{0k} + v'\alpha_k) \times r(s)\exp(z'(s)\beta_k + w_k(s))$. The latter part is the natural marginal spatial intensity!
- Possible dependence among the $w_k(s)$ surfaces calls for a multivariate Gaussian process over the $w_k$.
- Cross-covariances $\Gamma_w(s,s') = [\mathrm{Cov}(w_1(s), w_2(s'))]$ must be specified carefully so that the resulting joint covariance matrices remain positive definite:
  - separable forms (easy!)
  - nonseparable forms via coregionalization (Gelfand et al., 2004; cf. BCG book, 2004, Sections 7.1-2).
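The "easy" separable case can be sketched as a Kronecker product of a spatial correlation matrix and a between-mark covariance matrix $T$. The exponential correlation function, the matrix $T$, and the three locations below are assumptions chosen for illustration only.

```python
import numpy as np

def separable_cross_cov(coords, T, phi):
    """Joint covariance of [w_1(s_1), w_2(s_1), w_1(s_2), ...] under a separable model.

    Cov(w_k(s), w_l(s')) = rho(s, s'; phi) * T[k, l], with an assumed
    exponential correlation rho = exp(-phi * distance).
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    R = np.exp(-phi * d)
    return np.kron(R, T)

coords = np.array([[0.1, 0.2], [0.4, 0.9], [0.8, 0.3]])  # three made-up sites
T = np.array([[1.0, 0.6],
              [0.6, 2.0]])                               # made-up 2 x 2 mark covariance

Gamma = separable_cross_cov(coords, T, phi=2.0)

# A successful Cholesky factorization confirms positive definiteness.
L = np.linalg.cholesky(Gamma)
min_eigenvalue = np.linalg.eigvalsh(Gamma).min()
```

Because the Kronecker product of two positive definite matrices is positive definite, validity here only requires a valid correlation function and a positive definite $T$.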

Full Likelihood Specification

Letting $\{(s_{ki}, v_{ki}),\ i = 1, 2, \ldots, n_k\}$ be the locations and nuisances associated with the $n_k$ points having mark $k$, the likelihood becomes

$$\prod_k \exp\left(-\int_D \int_V \lambda_k(s,v)\,dv\,ds\right) \prod_k \prod_{(s_{ki},\,v_{ki})} \lambda_k(s_{ki}, v_{ki}).$$

Under our separable additive model, this becomes

$$\prod_k \exp\left(-q(\beta_{0k},\alpha_k)\int_D r(s)\exp\big(z^{\top}(s)\beta_k + w_k(s)\big)\,ds\right) \prod_k \prod_{i=1}^{n_k} \exp(\beta_{0k} + v_{ki}^{\top}\alpha_k)\; r(s_{ki})\exp\big(z^{\top}(s_{ki})\beta_k + w_k(s_{ki})\big),$$

where $q(\beta_{0k},\alpha_k) = \int_V \exp(\beta_{0k} + v^{\top}\alpha_k)\,dv$.

- Additive form in $z(s)$ and $v$ results in single integrals
- Our approximations permit use of the same set of integration grid points $s_j$ for each $k$.
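A minimal sketch of the log-likelihood contribution for one mark, assuming the spatial integral is approximated by a weighted sum over grid points $s_j$. The area weights `grid_area`, the function name, and the argument layout are assumptions made for illustration, and `q` is taken as already computed.

```python
import numpy as np

def mark_loglik(beta0, beta, alpha, q, grid_area, r_grid, Z_grid, w_grid,
                r_pts, Z_pts, V_pts, w_pts):
    """Log-likelihood for one mark k under the separable additive model.

    The integral over D is replaced by sum_j a_j r(s_j) exp(z(s_j)'beta + w(s_j)),
    where a_j are (assumed) area weights attached to the grid points.
    """
    integral = np.sum(grid_area * r_grid * np.exp(Z_grid @ beta + w_grid))
    point_terms = (beta0 + V_pts @ alpha + np.log(r_pts)
                   + Z_pts @ beta + w_pts)
    return -q * integral + np.sum(point_terms)

# Sanity check on a trivial case: all effects zero, r = 1, q = 1, total area 2.
# This is a homogeneous Poisson process with unit intensity, so log L = -2.
m, n, p = 4, 3, 2
check = mark_loglik(
    beta0=0.0, beta=np.zeros(p), alpha=np.zeros(p), q=1.0,
    grid_area=np.full(m, 0.5), r_grid=np.ones(m),
    Z_grid=np.zeros((m, p)), w_grid=np.zeros(m),
    r_pts=np.ones(n), Z_pts=np.zeros((n, p)),
    V_pts=np.zeros((n, p)), w_pts=np.zeros(n),
)
```

Summing this quantity over marks $k$ gives the full log-likelihood; the same grid arrays can be reused for every mark.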

Computational Issues

Our $(n + m)$-dimensional normal distributions above require working with inverses and determinants of very large covariance matrices ⇒ the "big N problem" in geostatistics!

Our (simple) approach: replace $w(s)$ by an approximation $\tilde{w}(s)$ in a lower-dimensional subspace.

Specifically, at any point $s_0$, replace the original process $w(s_0)$ in the likelihood by the predictive process

$$\tilde{w}(s_0) = E\big(w(s_0) \mid w^*\big),$$

where $w^* = [w_k(s_j^*)]_{k,j} \sim MVN(0, \Gamma^*(\theta))$ is a realization of $w(s)$ over an arbitrary set of knots $S^* = \{s_1^*, \ldots, s_m^*\}$ (Banerjee et al., 2007).

Fit the model using a Gibbs sampler to update the parameters in $\pi(s,v)$ as well as the random effects at the centroids.
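For a zero-mean Gaussian process the conditional expectation has the closed form $\tilde{w}(s_0) = c(s_0)^{\top} C^{*-1} w^*$, where $c(s_0)$ holds the covariances between $w(s_0)$ and the knot values and $C^*$ is the knot covariance matrix. The sketch below shows this for a single (univariate) surface with an assumed exponential covariance. The knot layout, parameter values, and function names are hypothetical.

```python
import numpy as np

def exp_cov(A, B, sigma2, phi):
    """Exponential covariance sigma2 * exp(-phi * distance) between point sets A and B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def predictive_process(s0, knots, w_star, sigma2, phi):
    """E(w(s0) | w*) = c(s0)' C*^{-1} w* for a zero-mean GP."""
    C_star = exp_cov(knots, knots, sigma2, phi)   # m x m knot covariance
    c0 = exp_cov(s0, knots, sigma2, phi)          # n0 x m cross-covariance
    return c0 @ np.linalg.solve(C_star, w_star)

# Hypothetical knots and one simulated realization at the knots.
knots = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rng = np.random.default_rng(1)
w_star = rng.multivariate_normal(np.zeros(4), exp_cov(knots, knots, 1.0, 2.0))

w_at_knots = predictive_process(knots, knots, w_star, 1.0, 2.0)
w_new = predictive_process(np.array([[0.5, 0.5]]), knots, w_star, 1.0, 2.0)
```

Only the $m \times m$ knot matrix is ever factorized, which is the source of the computational saving; at the knots themselves the approximation reproduces $w^*$ exactly.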

Application to N Minn Data

Location-specific covariates:
- $z_1(s)$, the log standardized distance to nearest RTF
- $z_2(s)$, the poverty rate in the census tract containing $s$

Non-location-specific covariates:
- $v_1$, the patient's stage at diagnosis (1 if "late" [regional or distant], and 0 otherwise)
- $v_2$, the patient's age at diagnosis

Assume population density $r(s)$ is constant within tract (2000 census data).

Though we have exact spatial coordinates, the figures on the next few slides are presented at tract level, since image-contour plots require preliminary interpolation (say, via interp or MBA in R), which can be misleading over our irregular areal grid.

Non-spatial covariates & crude intensity

[Figure: six tract-level maps, longitude by latitude. For mastectomy (top row) and BCS (bottom row), left: observed median age; middle: observed proportion of late diagnosis; right: observed log-relative intensity (count divided by population).]

Population and spatial covariates

[Figure: three tract-level maps. Left: population density by tract; middle: log-standardized distance to nearest RTF by tract; right: poverty rate by tract.]

Note the Red Lake Indian Reservation in the poverty map.

Model fitting

Priors: $\sigma \sim U(0, 100)$, $\tau \sim U(-0.999, 0.999)$, $\phi \sim U(0.5, 25)$ ⇒ correlation between the two closest sites, $\exp(-\phi\, d_{\min})$, varies from 0.01 to 0.91.

Results:
- Full bivariate spatial model outperforms univariate spatial and nonspatial (GLMM, bivariate GLMM) models in terms of DIC.
- Table (next slide) gives fixed effect estimates, where second rows give the differential effect in the BCS group.
- Age does not affect the relative intensity of mastectomies, but older women are somewhat less likely to choose BCS.
- The random effects models identify distance to the nearest RTF as significant for mastectomy, but the simple GLM estimate misses this.
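The stated correlation range can be checked against the prior bounds on $\phi$. The slide does not give $d_{\min}$, so the sketch below back-solves it from each endpoint; the two implied values should roughly agree if the quoted range is consistent. The variable names are illustrative.

```python
import math

phi_lo, phi_hi = 0.5, 25.0      # bounds of the U(0.5, 25) prior on phi
corr_hi, corr_lo = 0.91, 0.01   # quoted correlation range at distance d_min

# exp(-phi * d_min) = corr  =>  d_min = -log(corr) / phi
d_from_hi_phi = -math.log(corr_lo) / phi_hi   # largest phi gives smallest correlation
d_from_lo_phi = -math.log(corr_hi) / phi_lo   # smallest phi gives largest correlation

# Plugging the first value back in should reproduce both quoted endpoints.
check_hi = round(math.exp(-phi_lo * d_from_hi_phi), 2)
check_lo = round(math.exp(-phi_hi * d_from_hi_phi), 2)
```

Both routes imply a minimum inter-site distance of roughly 0.18 to 0.19 in whatever distance units the analysis used, so the prior on $\phi$ spans nearly independent to strongly correlated nearest neighbours.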

Parameter Estimates

model              biv spat res      BGLMM             GLM
MAS   int          -8.11 (0.070)     -8.12 (0.072)     -8.09 (0.049)
      log-dist     -0.15 (0.040)     -0.15 (0.047)     -0.02 (0.023)
      poverty      -1.92 (0.565)     -1.85 (0.576)     -1.16 (0.366)
      age           0.01 (0.013)      0.01 (0.012)      0.02 (0.012)
      late         -0.64 (0.046)     -0.64 (0.047)     -0.65 (0.046)
BCS   int          -0.02 (0.086)     -0.03 (0.085)      0.00 (0.070)
      log-dist     -0.14 (0.040)     -0.14 (0.043)     -0.14 (0.034)
      poverty      -0.91 (0.673)     -0.83 (0.667)     -1.06 (0.551)
      age          -0.04 (0.020)     -0.04 (0.020)     -0.04 (0.019)
      late         -1.02 (0.083)     -1.02 (0.083)     -1.01 (0.085)
τ                   0.83 (0.053)      0.82 (0.058)
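To see why the log-distance effect for mastectomy reads as significant under the random effects models but not under the GLM, one can form rough intervals from the table. The sketch assumes the entries are point estimates with standard deviations in parentheses and uses a normal approximation (estimate ± 1.96 sd); both are assumptions, since the slide does not state how significance was assessed.

```python
# MAS log-dist row of the table: (estimate, value in parentheses).
mas_log_dist = {
    "biv spat res": (-0.15, 0.040),
    "BGLMM": (-0.15, 0.047),
    "GLM": (-0.02, 0.023),
}

def approx_interval(est, sd, z=1.96):
    """Rough 95% interval under an assumed normal approximation."""
    return est - z * sd, est + z * sd

intervals = {m: approx_interval(*v) for m, v in mas_log_dist.items()}

# An interval that excludes zero is read here as a 'significant' effect.
excludes_zero = {m: not (lo <= 0.0 <= hi) for m, (lo, hi) in intervals.items()}
```

Under these assumptions the two random effects models give intervals entirely below zero, while the GLM interval straddles zero.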

Fitted intensity surfaces, full model

[Figure: six tract-level maps, longitude by latitude. For mastectomy (top row) and BCS (bottom row), and assuming the mean age and an early diagnosis, left: log-relative intensity without spatial residuals; middle: spatial residuals; right: complete log-relative intensity surfaces.]

Fitted log-relative intensity surfaces

- left column: the two spatial covariates alone encourage higher (darker) values in the south and near the population centers
- middle column: compensating residual activity in several rural and suburban tracts
- right column: resembles spatially smoothed versions of the corresponding "raw" maps (though direct comparison is not really possible due to age and stage adjustment)

Similarity of the mastectomy and BCS spatial residual maps ⇒ fairly large estimated $\hat{\tau}$ (nonspatial correlation).

Discussion

Summary:
- Extended the LGCP model to accommodate
  - covariates that are spatially referenced
  - individual-level covariates that mark the process
  - individual-level "nuisance" risk factors
- Fitted areal-level log-relative intensity maps now adjusted for the non-spatially varying covariates

Future work:
- Extending to the case of three-way interactions, e.g., mark by space by individual covariates
- Imprecision in the (typically rural) addresses?
- Space-time point pattern analysis: separable versus non-separable models for log-intensity, etc.
- Wombling to determine statistically significant boundaries in the residual or fitted relative intensity surface (Liang, Banerjee and Carlin, 2008)...

Wombling for Spatial Point Processes

Areal: assuming that spatial residuals are regional, we set $w_k(s) = w_{ki}$ if $s \in$ region $i$, and assign them a multivariate conditionally autoregressive (MCAR) distribution. Following Lu and Carlin (2005), define the boundary for mark $k$ as those segments having large values of $E(\Delta_{ij,k} \mid \text{Data})$, where $\Delta_{ij,k} = |w_{ki} - w_{kj}|$.

Point-level: for a mean-squared differentiable surface $Y(s)$ and any open curve $C$ in the domain, Banerjee and Gelfand (2006) define the wombling measure of $C$ as

$$\int_C D_{n(s)} Y(s)\,d\nu = \int_C \langle \nabla Y(s), n(s) \rangle\,d\nu,$$

i.e. the total gradient along $C$, where $\langle \cdot, \cdot \rangle$ is the inner product, $n(s)$ is the normal direction to $C$, and $D_{n(s)} Y(s)$ is the directional derivative along $n(s)$.
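The areal rule can be sketched as follows, assuming posterior draws of the regional residuals for one mark are available as an array and adjacency is given as a list of region pairs. The simulated draws, the edge list, and the function names are hypothetical, and the 20% cutoff is just one possible threshold.

```python
import numpy as np

def boundary_likelihood_values(w_draws, edges):
    """Posterior mean of |w_i - w_j| for each adjacent pair (i, j).

    w_draws: array of shape (n_draws, n_regions) for a single mark k.
    edges:   list of (i, j) index pairs for adjacent regions.
    """
    i, j = np.array(edges).T
    return np.abs(w_draws[:, i] - w_draws[:, j]).mean(axis=0)

def top_fraction(values, frac=0.2):
    """Flag edges whose value is at or above the (1 - frac) quantile."""
    return values >= np.quantile(values, 1.0 - frac)

# Hypothetical example: 5 regions in a row, with a jump between regions 2 and 3.
rng = np.random.default_rng(0)
true_w = np.array([0.0, 0.0, 0.0, 2.0, 2.0])
w_draws = true_w + 0.1 * rng.standard_normal((500, 5))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

blv = boundary_likelihood_values(w_draws, edges)
is_boundary = top_fraction(blv, frac=0.2)
```

In this toy setting only the edge between regions 2 and 3 is flagged, matching the jump built into the simulated residuals.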

Residual wombling, areal case

[Figure: two tract-level maps, longitude by latitude. Boundaries (top 20% of $E(\Delta_{ij,k} \mid \text{Data})$) shown as dark edges for mastectomy (left) and BCS (right).]

- Cook County (far NE corner) has statistically higher log intensity than Lake County (adjacent to the W)
- Red Lake Reservation (resembles a "P" rotated 90 degrees clockwise) significantly separated from all but one neighboring tract

Residual wombling, point-level case

[Figure: four panels, x by y. Top: estimated residual surfaces (left MAS, right BCS); bottom: mean predicted gradient surfaces.]

White lines are candidate boundaries; all are "Bayesianly significant" at the 0.05 level.

THE END (at last!)