Bayesian Areal Wombling for Geographic Boundary Analysis

Haolan Lu, Haijun Ma, and Bradley P. Carlin
haolanl@biostat.umn.edu, haijunma@biostat.umn.edu, and brad@biostat.umn.edu
Division of Biostatistics, School of Public Health, University of Minnesota

Background

Spatial data are typically classified in two ways:
- point-referenced (geostatistical): spatial locations are points with known coordinates;
- areal (lattice): locations are geographic regions (e.g., counties) with adjacency information.

An important topic in spatial statistics is boundary analysis, wherein we seek to identify regions of abrupt change. Such change is indicated by:
- steep gradients in a continuous surface;
- regional boundaries separating regions with drastically different measurements in a lattice surface.

Focus of this talk: boundary analysis for areal data.

Background (cont'd)

Spatial boundary analysis techniques are often called wombling, after a foundational paper by Womble (1951).

In areal (or polygonal) wombling, a dissimilarity metric measures the difference between adjacent regions. Methods for choosing boundary elements:
- absolute: dissimilarity metrics greater than some constant C;
- relative: dissimilarity metrics in the top k%.

Problems:
- The relative (top k%) thresholding method always finds a fixed number of boundary elements, whether or not real boundaries exist.
- The approach is algorithmic rather than stochastic: there is no model or likelihood, so statements about the significance of a detected boundary can only be made relative to predetermined, often unrealistic, null distributions.

Crisp vs. Fuzzy Areal Wombling

Suppose we have responses $Z_i$ for regions $i = 1, \ldots, n$. For neighboring regions $i$ and $j$ and some distance metric, assign the boundary likelihood value (BLV) $D_{ij} = |Z_i - Z_j|$.

Crisp wombling: the boundary consists of those edges having BLVs above a specified threshold, i.e., for some $c > 0$, $\{(i,j) : D_{ij} > c,\ i \text{ adjacent to } j\}$.

Fuzzy wombling: partial membership in the boundary is permitted, say by defining the boundary membership values (BMVs) $D_{ij} / \max_{ij}\{D_{ij}\}$.
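
A minimal Python sketch of the absolute, relative, and fuzzy rules, assuming the responses $Z_i$ are held in an array and adjacency is a symmetric 0-1 NumPy matrix. The function names and the four-region toy map are hypothetical illustrations, not BoundarySeer's implementation. The constant-data case at the end shows the weakness of the relative rule: it still reports a boundary edge when there is no variation at all.

```python
import math

import numpy as np


def boundary_likelihood_values(z, adj):
    """D_ij = |Z_i - Z_j| for every adjacent pair (i, j) with i < j."""
    z = np.asarray(z, dtype=float)
    adj = np.asarray(adj)
    n = len(z)
    return {(i, j): float(abs(z[i] - z[j]))
            for i in range(n) for j in range(i + 1, n) if adj[i, j]}


def crisp_absolute(blv, c):
    """Absolute rule: edges whose BLV exceeds the constant c."""
    return {edge for edge, d in blv.items() if d > c}


def crisp_relative(blv, k_percent):
    """Relative rule: edges whose BLV is in the top k% (always a fixed count)."""
    n_keep = math.ceil(len(blv) * k_percent / 100.0)
    ranked = sorted(blv, key=blv.get, reverse=True)
    return set(ranked[:n_keep])


def fuzzy_membership(blv):
    """Fuzzy BMVs D_ij / max D_ij (all zero if every BLV is zero)."""
    d_max = max(blv.values())
    if d_max == 0:
        return {edge: 0.0 for edge in blv}
    return {edge: d / d_max for edge, d in blv.items()}


# Hypothetical toy map: four regions in a row, 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
blv = boundary_likelihood_values([1.0, 1.2, 2.0, 2.1], adj)
print(crisp_absolute(blv, c=0.5))   # {(1, 2)}
print(crisp_relative(blv, 20))      # {(1, 2)}

# Constant data: no boundary exists, yet the relative rule still returns one edge.
flat = boundary_likelihood_values([1.0] * 4, adj)
print(crisp_absolute(flat, 0.5), len(crisp_relative(flat, 20)))   # set() 1
```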

Example: MN colorectal cancer data

Here $n_i$ is the total number of colorectal cancers occurring in county $i$, and $Y_i$ is the number of these that were detected late. Let
$Z_i = \mathrm{SLDR}_i = Y_i / E_i$, $i = 1, \ldots, N$,
the standardized late detection ratio (SLDR), where the expected counts are computed via internal standardization as $E_i = n_i r$, and $r = \sum_i Y_i / \sum_i n_i$ is the statewide late detection rate.

The left panel of the next slide shows BoundarySeer plots of traditional wombled boundaries.
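
Internal standardization is simple arithmetic; a Python sketch, assuming the per-county counts arrive as arrays (the counts below are toy values, not the Minnesota data). By construction $\sum_i E_i = \sum_i Y_i$, so the $E_i$-weighted average of the SLDRs is exactly 1.

```python
import numpy as np


def sldr(y, n):
    """Standardized late detection ratios via internal standardization.

    y: late-detected counts Y_i; n: total cancer counts n_i (one entry per county).
    Returns (SLDR_i, E_i, r) with r = sum(Y) / sum(n) and E_i = n_i * r.
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    r = y.sum() / n.sum()          # statewide late detection rate
    expected = n * r               # expected late-detected counts E_i
    return y / expected, expected, r


# Hypothetical two-county example.
ratios, expected, r = sldr(y=[10, 30], n=[50, 50])
print(r)        # 0.4
print(ratios)   # [0.5 1.5]
```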

Example: MN colorectal cancer data

Left: BoundarySeer choropleth map of colorectal cancer late detection SLDRs, with crisp wombled boundaries arising from the top 20% (red) and top 50% (yellow) of the $D_{ij}$.

Right: "Tricked" BoundarySeer map of fitted Bayesian colorectal cancer late detection SLDRs, with Bayesian wombled boundaries based on the top 20% of the $\hat{E}(\Delta_{ij} \mid \mathbf{y})$ values, using $G = 1500$ Gibbs samples.

Hierarchical modeling approach

We employ the Poisson log-linear form
$Y_i \sim \mathrm{Poisson}(\mu_i)$, where $\log \mu_i = \log E_i + \mathbf{x}_i'\boldsymbol{\beta} + \phi_i$,
where the $\mathbf{x}_i$ are region-specific covariates, and the random effects $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_N)$ are given a conditionally autoregressive CAR($\tau$) specification,
$\phi_i \mid \phi_{j \neq i} \sim N\left(\bar{\phi}_i,\ 1/(\tau m_i)\right)$,
where $N$ denotes the normal distribution, $\bar{\phi}_i$ is the average of the $\phi_{j \neq i}$ that are adjacent to $\phi_i$, and $m_i$ is the number of these adjacencies.
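
A Python sketch of the two model ingredients: the Poisson mean $\mu_i$ and the CAR conditional moments of $\phi_i$ given its neighbours. Adjacency is assumed to be a 0-1 NumPy matrix and the toy numbers are hypothetical. These are the prior conditionals only; under the Poisson likelihood the full conditional of $\phi_i$ is not normal, so a sampler would typically need a Metropolis step there.

```python
import numpy as np


def poisson_mean(log_e, x, beta, phi):
    """mu_i = exp(log E_i + x_i' beta + phi_i) for the Poisson log-linear model."""
    lin = np.asarray(log_e, float) + np.asarray(x, float) @ np.asarray(beta, float)
    return np.exp(lin + np.asarray(phi, float))


def car_conditional(i, phi, adj, tau):
    """Mean and variance of phi_i | phi_{j != i} under a CAR(tau) prior, 0-1 adjacency.

    The mean is the average of the neighbouring phi_j; the variance is 1 / (tau * m_i).
    """
    neighbours = np.flatnonzero(np.asarray(adj)[i])
    m_i = neighbours.size
    if m_i == 0:
        raise ValueError(f"region {i} has no neighbours (an island)")
    return float(np.asarray(phi, float)[neighbours].mean()), 1.0 / (tau * m_i)


# Hypothetical three-region chain 0 - 1 - 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(car_conditional(1, phi=[1.0, 2.0, 4.0], adj=adj, tau=2.0))   # (2.5, 0.25)

# eta_i = mu_i / E_i, the quantity wombled later on.
log_e = np.array([0.0, np.log(2.0)])
mu = poisson_mean(log_e, [[1.0], [1.0]], [0.0], [0.0, np.log(3.0)])
eta = mu / np.exp(log_e)
```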

Hierarchical modeling approach

Define the BLV for boundary $(i,j)$ as $\Delta_{ij} = |\eta_i - \eta_j|$ for all $i$ adjacent to $j$, with $\eta_i = \mu_i / E_i$.

The posterior distribution of the BLVs yields the wombled boundaries:
- Crisp: define $\Delta_{ij}$ to be part of the boundary if and only if $E(\Delta_{ij} \mid \mathbf{y}) > c$ for some constant $c > 0$, or if and only if $P(\Delta_{ij} > c \mid \mathbf{y}) > c^*$ for some constant $0 < c^* < 1$.
- Fuzzy: $P(\Delta_{ij} > c \mid \mathbf{y})$ is itself the fuzzy BMV, with Monte Carlo estimate and associated standard deviation
$\hat{p}_{ij} = \#\{\Delta_{ij}^{(g)} > c\} / G$ and $\sqrt{\hat{p}_{ij}(1 - \hat{p}_{ij}) / G}$,
retaining only every $M$th sample, for a total of $G$ samples.
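
Given MCMC output, the crisp and fuzzy summaries reduce to counting. This Python sketch assumes the posterior draws of $\eta$ are stored as a $T \times N$ array and the adjacent pairs as a list; `posterior_wombling` and `crisp_edges` are hypothetical helper names. Thinning keeps every `thin`-th draw (the $M$ above), and the standard deviation formula treats the retained draws as independent, an approximation that thinning is meant to make reasonable.

```python
import numpy as np


def posterior_wombling(eta_draws, edges, c, thin=1):
    """Posterior summaries of Delta_ij = |eta_i - eta_j| from MCMC output.

    eta_draws: (T, N) array of posterior draws of eta, one row per iteration.
    edges: list of adjacent pairs (i, j).
    thin: keep only every thin-th draw; G draws remain.
    Returns the posterior mean of Delta_ij, p_hat_ij = #{Delta_ij^(g) > c} / G,
    and the Monte Carlo standard deviation sqrt(p_hat (1 - p_hat) / G).
    """
    draws = np.asarray(eta_draws, dtype=float)[thin - 1::thin]
    g = draws.shape[0]
    i_idx = np.array([i for i, _ in edges])
    j_idx = np.array([j for _, j in edges])
    delta = np.abs(draws[:, i_idx] - draws[:, j_idx])   # shape (G, n_edges)
    p_hat = (delta > c).mean(axis=0)
    sd = np.sqrt(p_hat * (1.0 - p_hat) / g)
    return delta.mean(axis=0), p_hat, sd


def crisp_edges(edges, summary, threshold):
    """Edges whose summary (posterior mean of Delta_ij, or p_hat_ij) exceeds threshold."""
    return [e for e, s in zip(edges, summary) if s > threshold]


# Hypothetical toy output: 4 draws of eta for 2 adjacent regions.
draws = np.array([[1.0, 1.0], [1.0, 1.5], [1.0, 1.0], [1.0, 1.5]])
mean_delta, p_hat, sd = posterior_wombling(draws, [(0, 1)], c=0.3)
print(p_hat, sd)   # [0.5] [0.25]
```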

MN colorectal cancer data revisited

[Figure panels: means $\hat{p}_{ij}$; associated standard deviations.] Posterior probability areal wombling maps for the Minnesota colorectal cancer late detection data using three illustrative values of $c$ (5%, 15%, and 30%), with $M = 5$ and $G = 2000$.

MN colorectal cancer data revisited

- The probability of each segment being a member of the boundary decreases in $c$ (the threshold for being a boundary element, BE).
- The $\hat{p}$ maps suggest little evidence of strong boundaries between counties; only a few county boundaries are estimated to separate regions whose true SLDRs differ by more than 15%.
- The standard deviation plots reveal that the overall uncertainty associated with each segment tends to decrease for the more extreme values of $c$ (5% and 30%), as we become more certain that most segments either are or are not part of the boundary; indeed $\sqrt{\hat{p}(1 - \hat{p})/G}$ is small whenever $\hat{p}$ is near 0 or 1.
- Animated sequences of crisp boundaries (as in www.biostat.umn.edu/~haolanl/movie.gif) may be more enlightening than maps of posterior standard deviations.

MN colorectal cancer data revisited

To assess whether County 63 (Red Lake, a T-shaped county in the NW) is truly isolated from its only two neighbors (Counties 57 and 60), we evaluate
$p_{63} \equiv P(\Delta_{63,57} > c,\ \Delta_{63,60} > c \mid \mathbf{y})$.

Since the $\{\Delta_{ij}^{(g)}\}$ come from the joint posterior of the $\{\Delta_{ij}\}$, Monte Carlo estimates of such joint probabilities are directly available, giving simultaneous inference without a multiple comparisons problem (i.e., no need for a Bonferroni correction).

If we womble the spatial residuals $\phi_i$ (instead of the $\eta_i$):
- boundaries now separate regions with differing unmodeled heterogeneity, which helps identify missing covariates;
- no significant boundaries indicates (covariate-adjusted) mapwide equity, a.k.a. environmental justice.
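
Because each MCMC iteration delivers one draw from the joint posterior, a joint probability such as $p_{63}$ is estimated by checking all conditions within the same draw and averaging. A Python sketch, assuming the draws form a $G \times N$ array; the column indices 0, 1, 2 are hypothetical stand-ins for Counties 63, 57, and 60, and passing draws of $\phi$ instead of $\eta$ wombles the residuals. On the toy draws the joint estimate (0.5) differs from the product of the two marginal estimates (0.75 × 0.5 = 0.375), which is why the joint event is evaluated draw by draw.

```python
import numpy as np


def joint_exceedance_prob(draws, region, neighbours, c):
    """Monte Carlo estimate of P(Delta_{region,k} > c for every k in neighbours | y).

    draws: (G, N) joint posterior draws of eta (or of the residuals phi).
    Each row is one joint draw, so all conditions are checked within the same row.
    """
    draws = np.asarray(draws, dtype=float)
    delta = np.abs(draws[:, [region]] - draws[:, neighbours])   # (G, n_neighbours)
    return float(np.all(delta > c, axis=1).mean())


# Hypothetical draws: column 0 plays County 63, columns 1 and 2 its two neighbours.
toy_draws = np.array([[2.0, 1.0, 1.0],
                      [2.0, 1.0, 1.9],
                      [1.0, 1.0, 1.0],
                      [2.0, 1.5, 0.5]])
print(joint_exceedance_prob(toy_draws, 0, [1, 2], c=0.3))   # 0.5
```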

Alternative: Modeling adjacency

Note that the univariate CAR model can be written as
$\phi_i \mid \boldsymbol{\phi}_{(-i)} \sim N\left( \frac{\sum_j w_{ij}\phi_j}{\sum_j w_{ij}},\ \frac{1}{\tau \sum_j w_{ij}} \right)$,
where the weights $w_{ij}$ are equal to 1 if $i \neq j$ and regions $i$ and $j$ are adjacent, and 0 otherwise.

The CAR remains a valid distributional specification provided $0 \le w_{ij} \le 1$, which opens up more possibilities for spatial smoothing:
- choose the $w_{ij}$ inversely proportional to the distance separating the centroids of regions $i$ and $j$; or
- think of the $w_{ij}$ as additional unknown parameters to be estimated, allowing the data to help determine the degree and nature of spatial smoothing.
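
A Python sketch of the weighted CAR conditional and of one way to build inverse-distance weights from region centroids. Rescaling so that the largest weight equals 1 is our own choice to respect $0 \le w_{ij} \le 1$; the centroids and $\phi$ values are hypothetical. With 0-1 weights the formula reduces to the neighbour average and variance $1/(\tau m_i)$ of the CAR($\tau$) specification.

```python
import numpy as np


def inverse_distance_weights(centroids, adj):
    """w_ij proportional to 1 / d_ij for adjacent i != j, 0 otherwise.

    Rescaled (our assumption) so the largest weight is 1, keeping all w_ij in [0, 1].
    """
    pts = np.asarray(centroids, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    mask = np.asarray(adj, dtype=bool) & ~np.eye(len(pts), dtype=bool)
    w = np.zeros_like(d)
    w[mask] = 1.0 / d[mask]
    return w / w.max() if w.max() > 0 else w


def weighted_car_conditional(i, phi, w, tau):
    """phi_i | phi_(-i) ~ N(sum_j w_ij phi_j / w_i+, 1 / (tau * w_i+)), w_i+ = sum_j w_ij."""
    w_i = np.asarray(w, dtype=float)[i]
    w_plus = w_i.sum()
    return float(w_i @ np.asarray(phi, dtype=float) / w_plus), 1.0 / (tau * w_plus)


# Hypothetical chain 0 - 1 - 2 with centroids on a line; region 2 is farther away.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
w = inverse_distance_weights([(0, 0), (1, 0), (3, 0)], adj)
print(weighted_car_conditional(1, [1.0, 0.0, 4.0], w, tau=2.0))
```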

Alternative: Modeling adjacency

Mimicking an idea from statistical social network analysis (Wang and Wong, 1987; Hoff et al., 2002), model the $w_{ij}$ as
$w_{ij} \mid p_{ij} \sim \mathrm{Bernoulli}(p_{ij})$, where $\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \mathbf{z}_{ij}'\boldsymbol{\gamma}$.

Example: let
- $z_{1ij} = 1$ (so that $\gamma_1$ is an intercept parameter),
- $z_{2ij} = d_{ij}$, the distance between the centroids of regions $i$ and $j$,
- $z_{3ij} = (\text{area}_i + \text{area}_j)/2$, the average area of the two regions, and
- $z_{4ij} = |x_i - x_j|$, the absolute difference of some regional covariate (say, percent urban, or percent of residents who are smokers).
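
A Python sketch of the covariate rows $\mathbf{z}_{ij}$ from the example, the resulting $p_{ij}$, and one Bernoulli draw of the $w_{ij}$. It assumes Euclidean centroid distances and uses NumPy's `Generator.binomial` for sampling; the centroids, areas, covariate values, and $\boldsymbol{\gamma}$ are hypothetical. It draws from the prior only; posterior sampling of $W$ is a separate MCMC task.

```python
import numpy as np


def edge_covariates(edges, centroids, areas, x):
    """Rows z_ij = (1, d_ij, (area_i + area_j) / 2, |x_i - x_j|), one per edge."""
    rows = []
    for i, j in edges:
        d_ij = float(np.linalg.norm(np.subtract(centroids[i], centroids[j])))
        rows.append([1.0, d_ij, (areas[i] + areas[j]) / 2.0, abs(x[i] - x[j])])
    return np.array(rows)


def adjacency_probs(z, gamma):
    """p_ij = 1 / (1 + exp(-z_ij' gamma)), the inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(z, float) @ np.asarray(gamma, float))))


def sample_adjacency(p, rng):
    """One prior draw of w_ij ~ Bernoulli(p_ij) for every edge."""
    return rng.binomial(1, p)


# Hypothetical single edge between two regions.
z = edge_covariates([(0, 1)], centroids=[(0, 0), (3, 4)], areas=[2.0, 4.0], x=[0.1, 0.4])
p = adjacency_probs(z, gamma=[1.0, -0.2, 0.0, 0.0])
w = sample_adjacency(p, np.random.default_rng(0))
```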

Alternative: Modeling adjacency

- Crisp boundaries based on $w_{ij}$: those segments $ij$ having $P(w_{ij} = 0 \mid \mathbf{y}) > c$.
- Fuzzy boundaries based on $w_{ij}$: use the $P(w_{ij} = 0 \mid \mathbf{y})$ values themselves as the BMVs.
- Reilly (2001) showed $\boldsymbol{\gamma}$ is estimable even under a noninformative prior.
- MCMC sampling of $\boldsymbol{\phi}$, $\tau$, $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$, and $W$ can proceed by a tuned mixture of Gibbs and Metropolis steps.
- We try it with simulated data arising from $\gamma_0 = 1$, $\gamma_1 = 2$, $\beta_0 = 1$, and $\tau = 30$. The simulation of $\boldsymbol{\phi}$ is facilitated by WinBUGS, and the $\log E_i$ values are borrowed from a Minnesota breast cancer late detection data set.
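
Once MCMC draws of $W$ are available (from whatever sampler is used; none is implemented here), both summaries are column averages. A Python sketch with a hypothetical array of draws (4 iterations, 3 potential boundary segments).

```python
import numpy as np


def boundaries_from_w_draws(w_draws, c):
    """Fuzzy BMVs P(w_ij = 0 | y) and crisp boundary indices from MCMC draws of W.

    w_draws: (G, n_edges) array of sampled 0-1 adjacency indicators, one column
    per potential boundary segment (the edge ordering is up to the user).
    """
    w = np.asarray(w_draws)
    bmv = (w == 0).mean(axis=0)          # Monte Carlo estimate of P(w_ij = 0 | y)
    crisp = np.flatnonzero(bmv > c)      # segments with P(w_ij = 0 | y) > c
    return bmv, crisp


# Hypothetical draws: rows are iterations, columns are segments.
toy_w = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 0]])
bmv, crisp = boundaries_from_w_draws(toy_w, c=0.5)
print(bmv, crisp)   # [0.25 0.75 0.75] [1 2]
```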

Example: Simulated data

Left: map of the simulated "true" boundaries ($1 - w_{ij}$), where thick dark lines indicate boundaries and thin blue lines indicate adjacency between two regions.

Right: map of the simulated "true" $\phi_i$ on a blue color scale.

Note: most of the boundaries in the left panel are preserved in the right panel!

Example: Simulated data

Left: map of the posterior mean of $1 - p_{ij}$, with the corresponding boundaries shaded in color. Note: the majority of these lines match those in the left panel of the previous figure!

Right: map of the posterior means of the $\phi_i$ on a blue color scale. Note: the overall spatial pattern remains, but with smoothing of the $\phi_i$ among neighbors.

Extension: Boundary Adjacency

The previous approaches (especially those based on the $\Delta_{ij}$) tend to create networks of disconnected boundary segments. We would prefer an approach that draws continuous boundaries across the map. That is, given that a segment is part of the boundary, we'd like our model to favor including neighboring segments in the boundary as well.

But the CAR model is ideal for this type of modeling! We simply need to define a second CAR model on the boundary segment space, in addition to the one we already have on the areal unit space. That is, if $W$ is the $N \times N$ (random) adjacency matrix for the counties, we now also have $W^*$, an $N_{adj} \times N_{adj}$ (fixed) adjacency matrix for the boundary segments, where $N_{adj}$ is the total number of unique county adjacencies.
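
For a regular grid such as a $4 \times 4$ template, both $N_{adj}$ and a candidate segment adjacency matrix (written $W^*$ here to distinguish it from the county matrix $W$) can be computed directly. This Python sketch assumes rook adjacency between cells and declares two boundary segments neighbours when they share an endpoint; that sharing rule is our assumption about "adjacency defined by the map", not something stated on the slides. A $4 \times 4$ grid gives $N_{adj} = 24$.

```python
import itertools

import numpy as np


def unique_adjacencies(adj):
    """All unique pairs (i, j), i < j, with adj[i, j] = 1; their count is N_adj."""
    return [tuple(int(v) for v in pair) for pair in np.argwhere(np.triu(np.asarray(adj), k=1))]


def grid_adjacency(n_rows, n_cols):
    """0-1 rook adjacency for a rectangular grid; cell (r, c) has index r * n_cols + c."""
    n = n_rows * n_cols
    adj = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            a = r * n_cols + c
            if c + 1 < n_cols:
                adj[a, a + 1] = adj[a + 1, a] = 1
            if r + 1 < n_rows:
                adj[a, a + n_cols] = adj[a + n_cols, a] = 1
    return adj


def segment_endpoints(edge, n_cols):
    """Grid vertices (row, col) at the two ends of the segment shared by the edge's cells."""
    a, b = edge
    r, c = divmod(a, n_cols)
    if b == a + 1 and c + 1 < n_cols:     # left-right neighbours: vertical segment
        return {(r, c + 1), (r + 1, c + 1)}
    return {(r + 1, c), (r + 1, c + 1)}   # up-down neighbours: horizontal segment


def segment_adjacency(edges, n_cols):
    """Hypothetical W*: two segments are neighbours if they share an endpoint."""
    ends = [segment_endpoints(e, n_cols) for e in edges]
    w_star = np.zeros((len(edges), len(edges)), dtype=int)
    for k, l in itertools.combinations(range(len(edges)), 2):
        if ends[k] & ends[l]:
            w_star[k, l] = w_star[l, k] = 1
    return w_star


edges = unique_adjacencies(grid_adjacency(4, 4))
w_star = segment_adjacency(edges, n_cols=4)
print(len(edges), w_star.shape)   # 24 (24, 24)
```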

New Approach: 2-Level CAR

That is, we expand our logit model to
$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \mathbf{z}_{ij}'\boldsymbol{\gamma} + \psi_{ij}$,
where $\boldsymbol{\psi} \sim \mathrm{CAR}(\lambda, W^*)$ and $W^*$ is a 0-1 adjacency matrix on the boundary segment space, with adjacency defined by the map (fixed; no covariates this time).

Boundaries may again be obtained from posterior summarization of the $\Delta_{ij}$, the $p_{ij}$, or the $w_{ij}$.

To check, we generate data from a $4 \times 4$ "template" having a "true boundary" separating 6 regions (the 4 in the bottom row and the middle two in the 3rd row) from the rest.
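
A Python sketch of the expanded logit and of the CAR($\lambda$) conditional on the segment space, using a hypothetical three-segment chain, intercept-only $\mathbf{z}_{ij}$, and a 0-1 segment adjacency matrix called `w_star`. The second function shows the mechanism behind continuous boundaries: if a segment's neighbours have low $\psi$ (so high $1 - p$, i.e., likely boundary), its own conditional mean of $\psi$ is pulled down as well.

```python
import numpy as np


def two_level_probs(z, gamma, psi):
    """p_ij = expit(z_ij' gamma + psi_ij): logit model with segment-level CAR effects."""
    lin = np.asarray(z, float) @ np.asarray(gamma, float) + np.asarray(psi, float)
    return 1.0 / (1.0 + np.exp(-lin))


def segment_car_conditional(k, psi, w_star, lam):
    """psi_k | psi_(-k) ~ N(average of neighbouring segments' psi, 1 / (lam * m_k))."""
    neighbours = np.flatnonzero(np.asarray(w_star)[k])
    if neighbours.size == 0:
        raise ValueError(f"segment {k} has no neighbouring segments")
    return float(np.asarray(psi, float)[neighbours].mean()), 1.0 / (lam * neighbours.size)


# Hypothetical: three segments in a chain, intercept-only covariates.
w_star = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
z = np.ones((3, 1))
p = two_level_probs(z, gamma=[0.0], psi=[0.0, np.log(3.0), -np.log(3.0)])
print(1.0 - p)   # boundary measure 1 - p_ij for each segment
print(segment_car_conditional(1, psi=[-1.0, 0.0, -3.0], w_star=w_star, lam=1.0))   # (-2.0, 0.5)
```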

New Approach: 2-Level CAR

[Figure panels: "Raw Data"; "L&C, Pr(delta_ij > c | data)".]

Left: map of a simulated data set from our model. Right: posterior probability that $\Delta_{ij}$ exceeds the cutoff $c$, LC method (nonrandom adjacency matrix $W$).

Note that there are six "true" boundary segments; LC finds only 3 of them. It is badly fooled by the surprisingly low values in the northeast corner and the surprisingly high value in the 3rd row, last column.

Results: 2-Level CAR
[Figure: left panel "CAR2, $\Pr(\Delta_{ij} > c \mid \text{data})$"; right panel "CAR2, posterior mean of $(1 - p_{ij})$".]
Left: Posterior probability that $\Delta_{ij}$ exceeds the cutoff $c$ (fuzzy boundaries), 2-Level CAR (CAR2). Right: Posterior mean of $1 - p_{ij}$, 2-Level CAR. When applied to the $\Delta_{ij}$'s, the CAR2 method performs no better than LC. But when applied to the $p_{ij}$'s, the method now correctly finds all six boundary segments.
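A similar Monte Carlo summary gives the right-hand CAR2 map. Assuming posterior draws of $\gamma$ and of the edge effects $\psi$ are available from an MCMC fit of the 2-level logit model, each draw yields $p_{ij} = \text{logit}^{-1}(z_{ij}^\top\gamma + \psi_{ij})$, and averaging $1 - p_{ij}$ over draws estimates the plotted posterior mean. Turning the fuzzy map into a crisp boundary by keeping the $k$ highest-scoring segments is our illustrative choice (a threshold would work as well); all function names and inputs are hypothetical.

```python
import math

def expit(x):
    """Inverse logit, split by sign to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def post_mean_one_minus_p(gamma_draws, psi_draws, z):
    """Monte Carlo posterior mean of 1 - p_ij, where logit(p_ij) = z_ij' gamma + psi_ij.

    gamma_draws: S posterior draws of the coefficient vector gamma
    psi_draws:   S posterior draws of the edge effects (one value per segment)
    z:           covariate vector z_ij for each segment
    """
    S = len(gamma_draws)
    means = []
    for e, z_e in enumerate(z):
        total = 0.0
        for g, psi in zip(gamma_draws, psi_draws):
            eta = sum(zk * gk for zk, gk in zip(z_e, g)) + psi[e]
            total += 1.0 - expit(eta)
        means.append(total / S)
    return means

def top_k(scores, k):
    """Indices of the k segments with the largest scores (one way to get a crisp boundary)."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]

# One hypothetical draw, a single covariate, three segments.
scores = post_mean_one_minus_p([[0.0]], [[0.0, -2.0, 2.0]], [[1.0], [1.0], [1.0]])
print(top_k(scores, 1))  # [1]
```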

Current Work
- Compare methods based on $\Delta_{ij}$ and $w_{ij}$ by simulating the average probability that each method correctly classifies each potential boundary segment.
- Develop remedies for handling "islands" (regions with no neighbors) when modeling adjacency.
- Investigate $L_1$ (absolute difference) CARs, and sensitivity to the choice of the priors $p(\gamma_i)$ that control the effect of the covariate data $Z_{ij}$.
- Try to increase the impact of the response data $Y_i$ on the wombled boundaries (relative to the covariates $Z_{ij}$) while retaining boundary connectedness:
  - by eliminating the $\text{CAR}(\tau, W)$ model and the $\phi_i$'s altogether (but retaining the $\text{CAR}(\lambda, W^*)$ and the $\psi_{ij}$), or
  - by retaining both CAR models, but fitting them simultaneously rather than hierarchically.
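The comparison proposed in the first item can be sketched as follows: for each simulated data set, a method flags a set of segments, and we record how often each segment's flag matches the truth. This minimal Python version assumes the flags are already computed (for example, by thresholding posterior probabilities); the function name and inputs are hypothetical toy values, not results.

```python
def avg_correct_classification(true_boundary, flagged_by_sim):
    """Per-segment probability of correct classification, averaged over simulations.

    true_boundary:  0/1 truth for each segment
    flagged_by_sim: one 0/1 list per simulated data set (the segments a method flags)
    Returns (per-segment rates, overall average across segments).
    """
    S = len(flagged_by_sim)
    rates = [
        sum(flags[e] == t for flags in flagged_by_sim) / S
        for e, t in enumerate(true_boundary)
    ]
    return rates, sum(rates) / len(rates)

truth = [1, 0, 0, 1]
sims = [[1, 0, 1, 1], [1, 0, 0, 0]]  # hypothetical output of one method on two data sets
rates, overall = avg_correct_classification(truth, sims)
print(rates, overall)  # [1.0, 1.0, 0.5, 0.5] 0.75
```

Running the same function on the flags from each method (e.g., $\Delta_{ij}$-based versus $w_{ij}$-based) puts them on a common per-segment scale.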

Connected, $Y_i$-dominated boundaries
Eliminating the $\text{CAR}(\tau, W)$ model and the $\phi_i$'s: Suppose we think of the original data $Y_i$ as (jointly) normal. Then the differences $D_{ij} = Y_i - Y_j$ are also normal (note: signed differences here, not absolute differences). Suppose
$$D_{ij} \sim N(\eta_{ij},\, 1/\tau_e), \qquad \eta_{ij} = \gamma_0 + z_{ij}\gamma_1 + \psi_{ij}.$$
Let $z_{ij} = x_i - x_j$ (again a difference, not an absolute difference), and again let $\psi \sim \text{CAR}(\lambda, W^*)$.
This model eliminates the hard-to-estimate $w_{ij}$ (Bernoulli) parameters and retains only the "second" CAR model, which captures the similarity of neighboring boundary segments on the map. It should therefore deliver $\psi_{ij}$-based wombled boundaries that are more connected than before.
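A minimal simulation sketch of this differences model follows, assuming a fixed orientation $(i, j)$ for every segment (since $D_{ij}$ is signed, $D_{ji} = -D_{ij}$). If $\text{CAR}(\lambda, W^*)$ denotes an intrinsic (improper) CAR, it has no proper joint distribution to draw from, so the sketch swaps in a proper CAR with precision $\lambda(D_w - \rho W^*)$, where $D_w$ is the diagonal matrix of neighbor counts and $|\rho| < 1$ is an extra parameter of our own. The draw is approximated by single-site Gibbs sweeps; function names and parameter values are illustrative.

```python
import random

def gibbs_car_draw(W, lam, rho, sweeps=200, rng=random):
    """Approximate draw from a proper CAR with precision lam * (D_w - rho * W).

    Each single-site update uses the full conditional
    psi_e | rest ~ N(rho * mean of neighbors, 1 / (lam * m_e)),
    where m_e is the number of neighbors of segment e (assumed > 0) and |rho| < 1.
    """
    m = len(W)
    nbrs = [[f for f in range(m) if W[e][f]] for e in range(m)]
    psi = [0.0] * m
    for _ in range(sweeps):
        for e in range(m):
            mean = rho * sum(psi[f] for f in nbrs[e]) / len(nbrs[e])
            psi[e] = rng.gauss(mean, (lam * len(nbrs[e])) ** -0.5)
    return psi

def simulate_differences(x, pairs, psi, gamma0, gamma1, tau_e, rng=random):
    """Draw D_ij ~ N(gamma0 + gamma1 * (x_i - x_j) + psi_ij, 1 / tau_e) per oriented segment (i, j)."""
    sd = tau_e ** -0.5
    return [
        rng.gauss(gamma0 + gamma1 * (x[i] - x[j]) + psi[e], sd)
        for e, (i, j) in enumerate(pairs)
    ]

# Toy example: 4 regions, 3 oriented segments, and a hypothetical path-shaped W*.
rng = random.Random(1)
W_star = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
psi = gibbs_car_draw(W_star, lam=1.0, rho=0.9, rng=rng)
D = simulate_differences([0.0, 1.0, 3.0, 2.0], [(0, 1), (1, 2), (2, 3)], psi,
                         gamma0=0.0, gamma1=1.0, tau_e=4.0, rng=rng)
```

The seeded `random.Random` instance makes the toy run reproducible; more sweeps give a closer approximation to an exact CAR draw.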

Connected, $Y_i$-dominated boundaries
Simultaneous fitting of the two CAR models: Suppose we replace the Poisson mean structure with
$$\log \mu_i = \log E_i + x_i^\top \beta + \phi_i + \bar{\psi}_i,$$
where $\phi \sim \text{CAR}(\tau, W)$ and $\psi \sim \text{CAR}(\lambda, W^*)$ as before. Now $\bar{\psi}_i$, the average of the edge effects $\psi_{ij}$ over the edges that make up the boundary of region $i$, contributes directly to the mean structure (instead of to the variance structure via $W$). The $\psi_{ij}$ capture signed agreement, since:
- $\psi_{ij} \neq 0$ when $Y_i$ and $Y_j$ are both larger or both smaller than expected
- $\psi_{ij} \approx 0$ when one of $Y_i$ or $Y_j$ is bigger than expected and the other is smaller
We might even contemplate an interaction between the $\phi_i$ and the $\psi_{ij}$!
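The new mean structure can be assembled as in this sketch, which computes $\bar{\psi}_i$ as the plain average of $\psi_{ij}$ over the segments bordering region $i$ and then forms $\mu_i$. The linear predictor $x_i^\top\beta$ is passed in precomputed to keep the example short, and giving an island (a region with no bordering segments) $\bar{\psi}_i = 0$ is our assumption, not part of the proposal. The function names are hypothetical.

```python
import math

def region_edge_means(n_regions, pairs, psi):
    """psi_bar_i: average of psi_ij over the boundary segments that border region i."""
    sums = [0.0] * n_regions
    counts = [0] * n_regions
    for (i, j), p in zip(pairs, psi):
        for r in (i, j):  # each segment borders both of its regions
            sums[r] += p
            counts[r] += 1
    # Islands (no bordering segments) get 0.0 here: an assumption of this sketch.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def poisson_means(E, xbeta, phi, psi_bar):
    """mu_i from log mu_i = log E_i + x_i' beta + phi_i + psi_bar_i (x_i' beta precomputed)."""
    return [e * math.exp(xb + f + pb) for e, xb, f, pb in zip(E, xbeta, phi, psi_bar)]

print(region_edge_means(3, [(0, 1), (1, 2)], [1.0, 3.0]))  # [1.0, 2.0, 3.0]
```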

Another issue: Multivariate data
Suppose we have multivariate disease counts $Y_{ki}$ for disease $k$ in county $i$. A multivariate hierarchical areal wombling approach might model $Y_{ki} \overset{ind}{\sim} \text{Poisson}(\mu_{ki})$, $k = 1, \ldots, K > 1$, and $\phi_i = (\phi_{1i}, \ldots, \phi_{Ki})^\top \sim \text{MCAR}(\alpha, \Lambda)$. Wombling may now proceed much as before, but with a multivariate CAR (MCAR) model that captures correlation both across space and among the $K$ variables. Several conditional (e.g., $\phi_1$ followed by $\phi_2 \mid \phi_1$) and marginal (obtained jointly by coregionalization) MCAR models are available, many of which can be fit in WinBUGS!

Application: Hospices near Duluth, MN
- Data consist of hospice deaths and total Medicare population in each Minnesota zip code.
- Hospices have a "home base" and will typically drive only so many miles to provide in-home hospice service.
- The goal is to estimate which zip codes are "covered" and which are "not covered" by a particular hospice system.
- In the Duluth area there are two hospice systems (St. Mary's and St. Luke's).
- We seek to draw boundaries around the service regions of both hospices (individually and jointly), while accounting for any correlation between the two.

Raw Hospice Data
[Figure: map titled "MN: Zipcodes Served by Duluth Hospices" (Medicare Raw Data Map), with zip codes shaded as Not Served, St. Mary Only, St. Luke Only, or Served by both.]

Results of straight Bayesian approach
[Figure: map titled "MN: Zipcodes Served by Duluth Hospices" (Model Based Map), with zip codes shaded as Not Served, St. Mary Only, St. Luke Only, or Served by both.]

Bayes with "Backfilling" results
[Figure: map titled "MN: Zipcodes Served by Duluth Hospices" (Model Based Map), with zip codes shaded as Not Served, St. Mary Only, St. Luke Only, or Served by both.]

Application: Hospices near Duluth, MN
What sort of correlation do we expect between data from the two hospices?
- Positive, since if the St. Mary's deaths in a zip code are high, the St. Luke's deaths are also likely to be high ("consistency of cancer control" across regions).
- Negative, since the two hospices compete for patients, and the resulting zero-sum game implies negative correlation between the hospices within a zip code (though there may still be positive spatial correlation).
So we're not sure! But fortunately, the $\text{MCAR}(\alpha, \Lambda)$ distribution will automatically accommodate either possibility!
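Why either sign is possible can be checked directly under one common form of the MCAR, assuming a separable (Kronecker) joint precision $(D_w - \alpha W) \otimes \Lambda$ for the stacked effects. The covariance is then $(D_w - \alpha W)^{-1} \otimes \Lambda^{-1}$, so the spatial factor cancels in the within-zip correlation between the two hospices' effects, leaving a correlation whose sign is that of the off-diagonal entry of $\Lambda^{-1}$. The sketch below (a hypothetical helper for a $2 \times 2$ positive-definite $\Lambda$) shows both signs arising.

```python
import math

def within_region_corr(Lam):
    """Corr(phi_1i, phi_2i) under a separable MCAR with joint precision (D_w - alpha W) kron Lam.

    The covariance is (D_w - alpha W)^{-1} kron Lam^{-1}; the spatial factor cancels,
    so only Lam^{-1} matters. Lam must be a 2 x 2 positive-definite matrix.
    """
    (a, b), (c, d) = Lam
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2 x 2 inverse
    return inv[0][1] / math.sqrt(inv[0][0] * inv[1][1])

print(within_region_corr([[2.0, -1.0], [-1.0, 2.0]]))  # approximately 0.5 (positive)
print(within_region_corr([[2.0, 1.0], [1.0, 2.0]]))    # approximately -0.5 (negative)
```

Note the sign flip: a negative off-diagonal entry in the precision-scale $\Lambda$ corresponds to positive correlation, which is why the data, through the posterior for $\Lambda$, can settle the question.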

Application: Hospices near Duluth, MN
The wombled boundaries can then be:
- Mean-based, bivariate: Use means or quantiles of $\Delta^{(k)}_{ij} = |\phi_{ki} - \phi_{kj}|$, obtaining separate boundaries for each hospice system $k$.
- Variance-based, univariate: Use means or quantiles of $p_{ij}$ or $w_{ij}$ as before, since the usual MCAR has only one adjacency matrix $W$. This then gives boundaries for the larger region served by either hospice.
- Variance-based, bivariate: Use an MCAR with two adjacency matrices, one for each hospice? (an open question)
We plan to investigate all three!
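For the mean-based bivariate option, each hospice system $k$ gets its own set of segment summaries. The sketch assumes posterior draws stored as nested lists with `draw[k][i]` $= \phi_{ki}$, and reports the posterior mean together with a simple empirical quantile (the order statistic at position $\lfloor qS \rfloor$ among $S$ sorted draws) of $\Delta^{(k)}_{ij}$. The helper name and the toy draws are hypothetical.

```python
def delta_summaries(phi_draws, pairs, k, q=0.5):
    """Posterior mean and empirical q-quantile of Delta^(k)_ij = |phi_ki - phi_kj| per segment.

    phi_draws: S posterior draws, each a K x n nested list with draw[k][i] = phi_ki
    pairs:     (i, j) region indices for each boundary segment
    k:         which hospice system (variable) to womble
    q:         quantile level
    """
    out = []
    for i, j in pairs:
        # Posterior draws of Delta^(k)_ij for this segment, sorted for the quantile.
        d = sorted(abs(draw[k][i] - draw[k][j]) for draw in phi_draws)
        idx = min(int(q * len(d)), len(d) - 1)
        out.append((sum(d) / len(d), d[idx]))
    return out

# Three hypothetical draws for K = 2 hospice systems and n = 2 zip codes.
draws = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 3.0], [0.0, 5.0]],
    [[0.0, 2.0], [0.0, 1.0]],
]
print(delta_summaries(draws, [(0, 1)], k=0))  # [(2.0, 2.0)]
print(delta_summaries(draws, [(0, 1)], k=1))  # [(2.0, 1.0)]
```

The two systems share the same posterior mean here but differ in the median, which is one reason to look at quantiles as well as means.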