Bayesian Areal Wombling for Geographic Boundary Analysis

Haolan Lu, Haijun Ma, and Bradley P. Carlin
Division of Biostatistics, School of Public Health, University of Minnesota

Background
Spatial data are typically classified in two ways:
- point-referenced (geostatistical): spatial locations are points with known coordinates
- areal (lattice): locations are geographic regions (e.g., counties) with adjacency information
An important topic in spatial statistics is boundary analysis, in which we seek to identify regions of abrupt change. Such regions are indicated by:
- steep gradients in a continuous surface
- regional boundaries separating regions with drastically different measurements in a lattice surface
Focus in this talk: boundary analysis for areal data.

Background (cont'd)
Spatial boundary analysis techniques are often called wombling, after a foundational paper by Womble (1951).
In areal (or polygonal) wombling, a dissimilarity metric measures the difference between adjacent regions. Methods for choosing boundary elements:
- absolute: dissimilarity metrics greater than C
- relative: dissimilarity metrics in the top k%
Problems:
- The relative (top k%) thresholding method always finds a fixed number of boundary elements.
- The approach is algorithmic rather than stochastic: there is no model or likelihood, so statements about the significance of a detected boundary can be made only relative to predetermined, often unrealistic null distributions.

Crisp vs. Fuzzy Areal Wombling
Suppose we have responses $Z_i$ for regions $i = 1, \ldots, n$. For neighboring regions $i$ and $j$ and some distance metric $\|\cdot\|$, assign the boundary likelihood value (BLV) $D_{ij} = \|Z_i - Z_j\|$.
- Crisp wombling: the boundary consists of those edges whose BLVs exceed a specified threshold, i.e., for some $c > 0$, $\{(i,j) : D_{ij} > c,\ i \text{ adjacent to } j\}$.
- Fuzzy wombling: partial membership in the boundary is permitted, say by defining the boundary membership values (BMVs) $D_{ij} / \max_{ij}\{D_{ij}\}$.
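
The crisp and fuzzy rules above can be sketched in a few lines. This is only an illustration: the five responses and the edge list are made-up toy values, the distance metric is assumed to be the absolute difference, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: responses Z_i for 5 regions and their adjacent pairs.
Z = np.array([1.0, 1.2, 3.0, 2.9, 0.5])
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

def blv(Z, edges):
    """Boundary likelihood values D_ij = |Z_i - Z_j| for each adjacent pair."""
    return np.array([abs(Z[i] - Z[j]) for i, j in edges])

def crisp_absolute(D, c):
    """Crisp boundary: edges whose BLV exceeds the absolute threshold c."""
    return D > c

def crisp_relative(D, k):
    """Crisp boundary: edges whose BLV lies in the top k percent."""
    return D >= np.percentile(D, 100 - k)

def fuzzy_bmv(D):
    """Fuzzy boundary membership values D_ij / max D_ij, in [0, 1]."""
    return D / D.max()

D = blv(Z, edges)
print(crisp_absolute(D, 1.0))   # which edges exceed c = 1
print(crisp_relative(D, 40))    # which edges are in the top 40%
print(fuzzy_bmv(D))             # partial membership for every edge
```

Note that the relative rule selects a number of edges fixed by k regardless of how different the regions actually are, which is the first problem raised on the earlier Background slide.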

Example: MN colorectal cancer data
Here $n_i$ is the total number of colorectal cancers occurring in county $i$, and $Y_i$ is the number of these that were detected late. Let $Z_i = SLDR_i = Y_i / E_i$, $i = 1, \ldots, N$, be the standardized late detection ratio (SLDR), where the expected counts are computed via internal standardization as $E_i = n_i r$, with $r = \sum_i Y_i / \sum_i n_i$ the statewide late detection rate.
The left panel of the next slide shows BoundarySeer plots of traditional wombled boundaries.
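
As a small worked example of internal standardization, the sketch below computes the statewide rate, the expected counts, and the SLDRs. The three counties and their counts are invented for illustration, and the function name `sldr` is hypothetical.

```python
import numpy as np

def sldr(Y, n):
    """Standardized late detection ratios via internal standardization.

    Y : late-detected counts per county
    n : total cancer counts per county
    Returns (Z, E, r) with r = sum(Y)/sum(n), E_i = n_i * r, Z_i = Y_i / E_i.
    """
    Y = np.asarray(Y, dtype=float)
    n = np.asarray(n, dtype=float)
    r = Y.sum() / n.sum()      # statewide late detection rate
    E = n * r                  # expected late detections per county
    return Y / E, E, r

# Made-up counts for three counties.
Y = [10, 30, 20]
n = [40, 60, 100]
Z, E, r = sldr(Y, n)
print(r)   # 0.3
print(E)   # expected counts 12, 18, 30
print(Z)   # SLDRs; values above 1 mean more late detections than expected
```

By construction the expected counts sum to the observed total, so the SLDRs are centered around 1.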

Example: MN colorectal cancer data
Left: BoundarySeer choropleth map of colorectal cancer late detection SLDRs, with crisp wombled boundaries arising from the top 20% (red) and 50% (yellow) of the $D_{ij}$.
Right: "Tricked" BoundarySeer map of fitted Bayesian colorectal cancer detection SLDRs, with Bayesian wombled boundaries based on the top 20% of the $\hat{E}(\Delta_{ij} \mid y)$ values using $G = 1500$ Gibbs samples.

Hierarchical modeling approach
We employ the Poisson log-linear form $Y_i \sim \text{Poisson}(\mu_i)$, where $\log \mu_i = \log E_i + x_i'\beta + \phi_i$. Here the $x_i$ are region-specific covariates, and the random effects $\phi = (\phi_1, \ldots, \phi_N)'$ are given a conditionally autoregressive CAR($\tau$) specification,
$\phi_i \mid \phi_{j \neq i} \sim N(\bar{\phi}_i,\ 1/(\tau m_i))$,
where $N$ denotes the normal distribution, $\bar{\phi}_i$ is the average of the $\phi_{j \neq i}$ that are adjacent to $\phi_i$, and $m_i$ is the number of these adjacencies.
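
To make the CAR full conditional concrete, the following sketch evaluates its mean and variance for one region. The four-region chain map, the values of the random effects, and the precision are all hypothetical, and `car_full_conditional` is an illustrative name rather than part of any package.

```python
import numpy as np

def car_full_conditional(phi, W, tau, i):
    """Mean and variance of phi_i given the rest under a CAR(tau) prior.

    W is a symmetric 0-1 adjacency matrix with zero diagonal.
    The mean is the average of the neighbors' phi values and the
    variance is 1 / (tau * m_i), with m_i the number of neighbors.
    """
    nbrs = np.flatnonzero(W[i])
    m_i = len(nbrs)
    return phi[nbrs].mean(), 1.0 / (tau * m_i)

# Hypothetical map: four regions in a chain 0 - 1 - 2 - 3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
phi = np.array([0.1, -0.2, 0.4, 0.0])

mean_1, var_1 = car_full_conditional(phi, W, tau=2.0, i=1)
print(mean_1, var_1)   # neighbors of region 1 are regions 0 and 2
```

Regions with more neighbors get a smaller conditional variance, which is how the prior smooths more strongly in densely connected parts of the map.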

Hierarchical modeling approach
Define the BLV for boundary $(i,j)$ as $\Delta_{ij} = |\eta_i - \eta_j|$ for all $i$ adjacent to $j$, with $\eta_i = \mu_i / E_i$.
Posterior distribution of the BLVs $\Rightarrow$ wombled boundaries!
- Crisp: define $ij$ to be part of the boundary if and only if $E(\Delta_{ij} \mid y) > c$ for some constant $c > 0$, or if and only if $P(\Delta_{ij} \geq c \mid y) > c^*$ for some constant $0 < c^* < 1$.
- Fuzzy: $P(\Delta_{ij} \geq c \mid y)$ is itself the fuzzy BMV, with Monte Carlo estimate $\hat{p}_{ij} = \#\{\Delta_{ij}^{(g)} > c\} / G$ and associated standard deviation $\sqrt{\hat{p}_{ij}(1 - \hat{p}_{ij}) / G}$, retaining only every $M$th sample, for a total of $G$.
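
The Monte Carlo summaries on this slide could be computed from stored posterior draws roughly as follows. The draws here are a tiny hand-written array standing in for real MCMC output, and the function name and the thresholds are assumptions made for the example.

```python
import numpy as np

def fuzzy_bmv_from_draws(delta_draws, c, M=1):
    """Monte Carlo BMVs from posterior draws of the BLVs.

    delta_draws : array of shape (n_draws, n_edges), one row per MCMC draw
    c           : threshold on the BLV scale
    M           : thinning interval (keep every M-th draw)
    Returns (p_hat, sd, G), where sd = sqrt(p_hat * (1 - p_hat) / G)
    treats the G retained draws as approximately independent.
    """
    kept = delta_draws[::M]
    G = kept.shape[0]
    p_hat = (kept > c).mean(axis=0)
    sd = np.sqrt(p_hat * (1.0 - p_hat) / G)
    return p_hat, sd, G

# Hypothetical draws for two edges (rows are draws, columns are edges).
draws = np.array([[0.10, 0.30],
                  [0.20, 0.05],
                  [0.40, 0.50],
                  [0.05, 0.60]])
p_hat, sd, G = fuzzy_bmv_from_draws(draws, c=0.15)

post_mean = draws.mean(axis=0)      # estimate of E(Delta_ij | y)
crisp_by_mean = post_mean > 0.15    # first crisp rule, c = 0.15
crisp_by_prob = p_hat > 0.6         # second crisp rule, c* = 0.6
print(p_hat, sd, crisp_by_mean, crisp_by_prob)
```

The binomial-style standard deviation is only reasonable when the retained draws are close to independent, which is the motivation for thinning.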

MN colorectal cancer data revisited
[Figure: two rows of maps, the means $\hat{p}_{ij}$ and the associated standard deviations.]
Posterior probability areal wombling maps for the Minnesota colorectal cancer detection data using three illustrative values of $c$ (5, 15, and 30%), $M = 5$, and $G =$ [value missing in the source].

MN colorectal cancer data revisited
- The probability of each segment being a member of the boundary decreases in $c$ (the threshold for being a boundary element, BE).
- The $\hat{p}$ maps suggest little evidence of strong boundaries between counties; only a few county boundaries are estimated to separate regions with true SLDRs that differ by more than 15%.
- The standard deviation plots reveal that the overall uncertainty associated with each segment tends to decrease for the more extreme $c$ (5 and 30), as we become more certain that most segments either are or are not part of the boundary.
- Animated sequences of crisp boundaries (as in haolanl/movie.gif) may be more enlightening than maps of posterior standard deviations.

MN colorectal cancer data revisited
- We can assess whether County 63 (Red Lake, a T-shaped county in the NW) is truly isolated from its only two neighbors (Counties 57 and 60) by evaluating $p_{63} \equiv P(\Delta_{63,57} > c \cap \Delta_{63,60} > c \mid y)$.
- Since the $\{\Delta_{ij}^{(g)}\}$ come from the joint posterior of $\{\Delta_{ij}\}$, Monte Carlo estimates are available $\Rightarrow$ simultaneous inference without a multiple comparisons problem (i.e., no need for Bonferroni correction).
- If we womble the spatial residuals $\phi_i$ (instead of the $\eta_i$):
  - boundaries now separate regions with differing unmodeled heterogeneity $\Rightarrow$ help identify missing covariates!
  - no significant boundaries $\Rightarrow$ (covariate-adjusted) mapwide equity, a.k.a. environmental justice.
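
A joint statement such as the isolation probability for one county can be estimated by counting the draws in which both events occur together. The sketch below uses four invented draws for two edges; the function name is hypothetical.

```python
import numpy as np

def joint_exceedance(delta_a, delta_b, c):
    """Monte Carlo estimate of P(Delta_a > c and Delta_b > c | y).

    delta_a and delta_b hold draws of two BLVs taken from the SAME
    MCMC iterations, so each row pair is one draw from the joint posterior.
    """
    return np.mean((delta_a > c) & (delta_b > c))

# Hypothetical joint draws for the two edges surrounding one county.
delta_a = np.array([0.20, 0.10, 0.30, 0.40])
delta_b = np.array([0.30, 0.30, 0.05, 0.20])

p_joint = joint_exceedance(delta_a, delta_b, c=0.15)
p_a = np.mean(delta_a > 0.15)
p_b = np.mean(delta_b > 0.15)
print(p_joint, p_a * p_b)   # the joint estimate need not equal the product
```

Because the draws are joint, posterior dependence between the two edges is accounted for automatically, and no separate multiplicity adjustment is applied.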

Alternative: Modeling adjacency
Note that the univariate CAR model can be written as
$\phi_i \mid \phi_{(-i)} \sim N\left( \frac{\sum_j w_{ij}\phi_j}{\sum_j w_{ij}},\ \frac{1}{\tau \sum_j w_{ij}} \right)$,
where the weights $w_{ij}$ are equal to 1 if $i \neq j$ and regions $i$ and $j$ are adjacent, and 0 otherwise.
The CAR remains a valid distributional specification provided $0 \leq w_{ij} \leq 1$ $\Rightarrow$ more possibilities for spatial smoothing:
- choose the $w_{ij}$ inversely proportional to the distance separating the centroids of regions $i$ and $j$, or
- think of the $w_{ij}$ as additional unknown parameters to be estimated, allowing the data to help determine the degree and nature of spatial smoothing.
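
The first option, inverse-distance weights, can be illustrated with a short sketch. The centroids and adjacency below are made up, and dividing by the largest weight to keep every $w_{ij}$ in $[0, 1]$ is an assumption of this example, not something stated on the slide.

```python
import numpy as np

def inverse_distance_weights(centroids, adjacency):
    """Weights proportional to 1/distance for adjacent regions, 0 otherwise.

    The result is rescaled by its maximum so that all weights lie in [0, 1]
    (a choice made for this sketch).
    """
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = np.zeros_like(dist)
    mask = adjacency.astype(bool)      # zero diagonal, so no division by 0
    W[mask] = 1.0 / dist[mask]
    return W / W.max()

def weighted_car_conditional(phi, W, tau, i):
    """Mean and variance of phi_i | rest for a CAR with general weights."""
    w = W[i]
    w_sum = w.sum()
    return (w @ phi) / w_sum, 1.0 / (tau * w_sum)

# Hypothetical map: three regions in a row with unevenly spaced centroids.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
phi = np.array([1.0, 0.0, -2.0])

W = inverse_distance_weights(centroids, adjacency)
mean_1, var_1 = weighted_car_conditional(phi, W, tau=2.0, i=1)
print(W[1], mean_1, var_1)   # the nearer neighbor gets twice the weight
```

With 0-1 weights the same function reduces to the neighbor average and the variance $1/(\tau m_i)$ of the unweighted CAR.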

Alternative: Modeling adjacency
Mimicking an idea from statistical social network analysis (Wang and Wong, 1987; Hoff et al., 2002), model the $w_{ij}$ as
$w_{ij} \mid p_{ij} \sim \text{Bernoulli}(p_{ij})$, where $\log\left( \frac{p_{ij}}{1 - p_{ij}} \right) = z_{ij}'\gamma$.
Example: let
- $z_{1ij} = 1$ (so that $\gamma_1$ is an intercept parameter),
- $z_{2ij} = d_{ij}$, the distance between the centroids of regions $i$ and $j$,
- $z_{3ij} = (\text{area}_i + \text{area}_j)/2$, the average area of the two regions, and
- $z_{4ij} = |x_i - x_j|$, the absolute difference of some regional covariate (say, percent urban, or percent of residents who are smokers).
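
A minimal sketch of this edge-level logistic model follows. The coefficient vector and the covariate values are hypothetical numbers chosen so the arithmetic is easy to follow; the function names are illustrative only.

```python
import numpy as np

def edge_covariates(d_ij, area_i, area_j, x_i, x_j):
    """Covariate vector z_ij = (1, distance, average area, |x_i - x_j|)."""
    return np.array([1.0, d_ij, (area_i + area_j) / 2.0, abs(x_i - x_j)])

def edge_probability(z, gamma):
    """p_ij from the logit model log(p / (1 - p)) = z' gamma."""
    return 1.0 / (1.0 + np.exp(-(z @ gamma)))

# Hypothetical coefficients: adjacency becomes less likely with distance
# and with a larger covariate gap.
gamma = np.array([1.0, -0.5, 0.0, -2.0])
z = edge_covariates(d_ij=2.0, area_i=10.0, area_j=14.0, x_i=0.3, x_j=0.8)
p = edge_probability(z, gamma)      # linear predictor is 1 - 1 + 0 - 1 = -1

# One Bernoulli draw of w_ij given p_ij.
rng = np.random.default_rng(1)
w = rng.binomial(1, p)
print(z, p, w)
```

Under this parameterization a small $p_{ij}$ means the two regions are unlikely to be treated as neighbors, i.e. the edge between them is a likely boundary.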

Alternative: Modeling adjacency
- Crisp boundaries based on $w_{ij}$: those segments $ij$ having $P(w_{ij} = 0 \mid y) > c$.
- Fuzzy boundaries based on $w_{ij}$: use the $P(w_{ij} = 0 \mid y)$ values themselves as the BMVs.
- Reilly (2001) showed $\gamma$ is estimable even under a noninformative prior.
- MCMC sampling of $\phi$, $\tau$, $\beta$, $\gamma$, and $W$ can proceed by a tuned mixture of Gibbs and Metropolis steps.
- Try it with simulated data, arising from $\gamma_0 = 1$, $\gamma_1 = 2$, $\beta_0 = 1$, and $\tau = 30$.
- The simulation of $\phi$ is facilitated by WinBUGS, and the $\log E_i$s are borrowed from a Minnesota breast cancer late detection data set.
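
Given posterior draws of the adjacency indicators, the crisp and fuzzy rules on this slide reduce to averaging. The draws below are a hand-made 4-by-3 array standing in for sampler output, and the function name is an assumption of this sketch.

```python
import numpy as np

def boundaries_from_w_draws(w_draws, c):
    """Fuzzy BMVs and crisp boundaries from posterior draws of the w_ij.

    w_draws : 0-1 array of shape (n_draws, n_edges)
    c       : probability cutoff for the crisp rule
    The BMV for an edge is the fraction of draws with w_ij = 0,
    a Monte Carlo estimate of P(w_ij = 0 | y).
    """
    bmv = 1.0 - w_draws.mean(axis=0)
    return bmv, bmv > c

# Hypothetical draws for three edges.
w_draws = np.array([[1, 0, 1],
                    [1, 0, 0],
                    [1, 1, 0],
                    [0, 0, 1]])
bmv, crisp = boundaries_from_w_draws(w_draws, c=0.5)
print(bmv, crisp)
```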

Example: Simulated data
Left: Map of simulated "true" boundaries ($1 - w_{ij}$), where thick dark lines indicate boundaries and thin blue lines indicate adjacency between two regions.
Right: Map of simulated "true" $\phi_i$ on a blue color scale.
Note: Most of the boundaries in the left panel are preserved in the right panel!

Example: Simulated data
Left: Map of the posterior mean of $1 - p_{ij}$, with corresponding boundaries shaded in color. Note: the majority of these lines match those in the left panel of the previous figure!
Right: Map of posterior means of the $\phi_i$ on a blue color scale. Note: the overall spatial pattern remains, but with smoothing of the $\phi_i$ among neighbors.

Extension: Boundary Adjacency
- The previous approaches (especially those based on $\Delta_{ij}$) tend to create networks of disconnected boundary segments. We would prefer an approach that draws continuous boundaries across the map.
- That is, given that a segment is part of the boundary, we'd like our model to favor including neighboring segments in the boundary as well.
- But the CAR model is ideal for this type of modeling! We simply need to define a second CAR model on the boundary segment space, in addition to the one we already have on the areal unit space.
- That is, if $W$ is the $N \times N$ (random) adjacency matrix for the counties, now we also have $W^*$, an $N_{adj} \times N_{adj}$ (fixed) adjacency matrix for the boundary segments, where $N_{adj}$ is the total number of unique county adjacencies.
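
One way to build a segment-level adjacency matrix is sketched below. It assumes, purely for illustration, that two boundary segments are neighbors when they share a region; the slides say only that segment adjacency is defined by the map, so the authors' rule (for example, segments that physically touch) may well differ. The ring-shaped toy map and the function name are hypothetical.

```python
import numpy as np
from itertools import combinations

def segment_adjacency(edges):
    """0-1 adjacency matrix on boundary segments.

    edges : list of (i, j) region pairs, one per unique county adjacency.
    ASSUMPTION: two segments are adjacent if they have a region in common.
    """
    n_adj = len(edges)
    W_star = np.zeros((n_adj, n_adj), dtype=int)
    for a, b in combinations(range(n_adj), 2):
        if set(edges[a]) & set(edges[b]):
            W_star[a, b] = W_star[b, a] = 1
    return W_star

# Hypothetical map: four regions arranged in a ring, giving N_adj = 4 segments.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
W_star = segment_adjacency(edges)
print(W_star)
```

Whatever rule is used, the result is a fixed, symmetric matrix with one row per unique county adjacency, which is what a CAR prior on the segment space requires.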

New Approach: 2-Level CAR
That is, we need to expand our logit model to
$\log\left( \frac{p_{ij}}{1 - p_{ij}} \right) = z_{ij}'\gamma + \psi_{ij}$,
where $\psi \sim \text{CAR}(\lambda, W^*)$ and $W^*$ is a 0-1 adjacency matrix on the boundary segment space, with adjacency defined by the map (fixed; no covariates this time).
Boundaries may again be obtained from posterior summarization of the $\Delta_{ij}$, the $p_{ij}$, or the $w_{ij}$.
To check, we generate data from a $4 \times 4$ "template" having a "true" boundary separating 6 regions (the 4 in the bottom row and the middle two in the 3rd row) from the rest.
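
The template can be reproduced as a quick check. The sketch assumes rook adjacency (regions sharing a side) on the 4 x 4 grid, which is an assumption of this example; under it, the boundary around the six regions consists of six segments, matching the count of true boundary segments quoted with the results.

```python
import numpy as np

n_rows, n_cols = 4, 4

# True for the 6 regions enclosed by the boundary: the whole bottom row
# plus the middle two cells of the 3rd row (rows indexed from the top).
inside = np.zeros((n_rows, n_cols), dtype=bool)
inside[3, :] = True
inside[2, 1:3] = True

def rook_edges(n_rows, n_cols):
    """All pairs of grid cells that share a side (assumed adjacency rule)."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < n_rows:
                edges.append(((r, c), (r + 1, c)))
    return edges

edges = rook_edges(n_rows, n_cols)
# A segment is a true boundary element when it separates inside from outside.
true_boundary = [e for e in edges if inside[e[0]] != inside[e[1]]]
print(len(edges), len(true_boundary))
```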

New Approach: 2-Level CAR
[Figure: two map panels. Left panel "Raw Data", legend bins [-2.17, -1.51), [-1.51, -0.85), [-0.85, -0.19), [-0.19, 0.47), [0.47, 1.13), [1.13, 1.79]. Right panel "L&C, Pr(delta_ij > c | data)", legend bins [0.06, 0.75], (0.75, 0.94].]
Left: Map of a simulated data set from our model.
Right: Posterior probability that $\Delta_{ij}$ exceeds the cutoff $c$, LC method (nonrandom adjacency matrix $W$).
Note that there are six "true" boundary segments; LC finds only 3 of them. It is badly fooled by the surprisingly low values in the northeast corner and the surprisingly high value in the 3rd row, last column.

Results: 2-Level CAR
[Figure: two map panels. Left panel "CAR2, Pr(delta_ij > c | data)", legend bins [0.07, 0.82], (0.82, 0.97]. Right panel "CAR2, post. mean of (1 - pij)", legend bins [0.12, 0.17), [0.17, 0.21), [0.21, 0.25), [0.25, 0.29), [0.29, 0.34), [0.34, 0.38].]
Left: Posterior probability that $\Delta_{ij}$ exceeds the cutoff $c$ (fuzzy boundaries), 2-Level CAR (CAR2).
Right: Posterior mean of $1 - p_{ij}$, 2-Level CAR.
When applied to the $\Delta_{ij}$s, the CAR2 method performs no better than LC. But when applied to the $p_{ij}$s, the method now correctly finds all six boundary segments.

Current Work
- Compare methods based on $\Delta_{ij}$ and $w_{ij}$ by simulating the average probability that they correctly classify each potential boundary segment.
- Various remedies for handling "islands" (regions with no neighbors) when modeling adjacency.
- Investigate $L_1$ (absolute difference) CARs, and sensitivity to the choice of priors $p(\gamma_i)$ that control the effect of the covariate data $Z_{ij}$.
- Try to increase the impact of response data $Y_i$ on the wombled boundaries (relative to the covariates $Z_{ij}$) while retaining boundary connectedness:
  - by eliminating the CAR($\tau$, $W$) model and the $\phi_i$s altogether (but retaining the CAR($\lambda$, $W^*$) and the $\psi_{ij}$), or
  - by retaining both CAR models, but fitting them simultaneously, rather than hierarchically.

Connected, $Y_i$-dominated boundaries
Eliminating the CAR($\tau$, $W$) model and the $\phi_i$s:
- Suppose we think of the original data $Y_i$ as normal. Then the differences $D_{ij} = Y_i - Y_j$ are also normal (note: differences, not absolute differences, here).
- Suppose $D_{ij} \sim N(\eta_{ij}, 1/\tau_e)$, where $\eta_{ij} = \gamma_0 + z_{ij}\gamma_1 + \psi_{ij}$. Let $z_{ij} = x_i - x_j$ (again, a difference, not an absolute difference), and again let $\psi \sim \text{CAR}(\lambda, W^*)$.
- This model eliminates the hard-to-estimate $w_{ij}$ (Bernoulli) parameters, and retains only the "second" CAR model, which captures the similarity of neighboring boundary segments on the map.
- This model should deliver $\psi_{ij}$-based wombled boundaries that are more connected than before.

75 Connected, Y i -dominated boundaries Simultaneous fitting of the two CAR models: Suppose we replace the Poisson mean structure with log µ i = log E i + x iβ + φ i + ψ i, where φ CAR(τ,W) and ψ CAR(λ,W ) as before. Now ψ i, the average of the edge effects for those edges that comprise region i, contributes directly to the mean structure (instead of to the variance structure via W ). The ψ ij capture signed agreement since: ψ ij 0 when Y i and Y j are both larger or both smaller than expected ψ ij 0 when one of Y i or Y i is bigger than expected, and the other is smaller Might even also contemplate interaction between the φ i and the ψ ij! Bayesian Areal Wombling for Geographic Boundary Analysis p. 23/30
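The region-level term described above (the average of the edge effects over the edges that comprise region i) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `region_edge_means` and `log_poisson_mean`, the edge-list encoding, and all numeric values are hypothetical, and the edge effects are treated as given numbers rather than sampled from a CAR prior.

```python
import numpy as np

def region_edge_means(psi_edge, edges, n_regions):
    """Average the edge effects psi_ij over the edges touching each region."""
    total = np.zeros(n_regions)
    count = np.zeros(n_regions)
    for (i, j), p in zip(edges, psi_edge):
        total[i] += p
        total[j] += p
        count[i] += 1
        count[j] += 1
    # Regions with no edges get 0 rather than a division by zero.
    return total / np.maximum(count, 1)

def log_poisson_mean(E, X, beta, phi, psi_region):
    """log mu_i = log E_i + x_i' beta + phi_i + psi_i."""
    return np.log(E) + X @ beta + phi + psi_region

# Toy inputs, invented for illustration.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
psi_edge = [0.4, -0.2, 0.6, 0.0]
psi_region = region_edge_means(psi_edge, edges, n_regions=4)

E = np.array([10.0, 20.0, 15.0, 5.0])
X = np.array([[0.1], [0.9], [0.8], [0.0]])
beta = np.array([0.5])
phi = np.zeros(4)
log_mu = log_poisson_mean(E, X, beta, phi, psi_region)
```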

Another issue: Multivariate data

Suppose we have multivariate disease counts $Y_{ki}$ for disease $k$ in county $i$. A multivariate hierarchical areal wombling approach might model $Y_{ki} \overset{ind}{\sim} \mathrm{Poisson}(\mu_{ki})$, $k = 1, \ldots, K > 1$, and $\phi_i = (\phi_{1i}, \ldots, \phi_{Ki}) \sim \mathrm{MCAR}(\alpha, \Lambda)$.

Wombling may now proceed much as before, but with a Multivariate CAR model that captures correlation both across space and among the $K$ variables.

Several conditional (e.g., $\phi_1$ followed by $\phi_2 \mid \phi_1$) and marginal (obtained jointly by coregionalization) MCAR models are available, many of which can be fit in WinBUGS!
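To make the MCAR($\alpha$, $\Lambda$) structure a little more concrete, here is a minimal Python sketch of one common parametrization, in which the joint precision matrix of the stacked region vectors is the Kronecker product $(D - \alpha W) \otimes \Lambda$, with $D$ the diagonal matrix of neighbor counts. The talk does not spell out its parametrization, so this form, the function name `mcar_precision`, and the toy matrices are all assumptions.

```python
import numpy as np

def mcar_precision(W, alpha, Lam):
    """Joint precision (D - alpha * W) kron Lam for region-major stacking.

    W     : (n, n) symmetric 0/1 adjacency matrix
    alpha : spatial smoothing parameter (|alpha| < 1 assumed here)
    Lam   : (K, K) positive definite matrix linking the K variables
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # number of neighbors of each region
    return np.kron(D - alpha * W, np.asarray(Lam, dtype=float))

# Toy map: three regions in a row, two variables (K = 2).
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
Lam = np.array([[2.0, -0.5],
                [-0.5, 1.0]])
Q = mcar_precision(W, alpha=0.9, Lam=Lam)
```

In this form the off-diagonal entries of $\Lambda$ may take either sign, so the dependence between the $K$ variables within a region is not forced to be positive.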

Application: Hospices near Duluth, MN

Data consist of hospice deaths and total Medicare population in each Minnesota zip code.

Hospices have a "home base", and will typically only drive so many miles to provide in-home hospice service.

Goal is to estimate which zips are "covered" and which are "not covered" by a particular hospice system.

In the Duluth area there are two hospice systems (St. Mary's and St. Luke's).

We seek to draw boundaries around the service regions of both hospices (both individually and jointly), while accounting for any correlation between the two.

Raw Hospice Data

[Map: MN zip codes served by Duluth hospices, Medicare raw data. Legend: Not Served; St. Mary Only; St. Luke Only; Served by both.]

Results of straight Bayesian approach

[Map: MN zip codes served by Duluth hospices, model-based. Legend: Not Served; St. Mary Only; St. Luke Only; Served by both.]

"Bayes with Backfilling" results

[Map: MN zip codes served by Duluth hospices, model-based. Legend: Not Served; St. Mary Only; St. Luke Only; Served by both.]

Application: Hospices near Duluth, MN

What sort of correlation do we expect between data from the two hospices?

Positive, since if the St. Mary's deaths in a zip are high, the St. Luke's deaths are also likely to be high (consistency of "cancer control" across regions).

Negative, since the two hospices are in competition for patients, and the resulting zero-sum game means negative correlation among hospices within zip (though there may still be positive spatial correlation).

So we're not sure! But fortunately, the MCAR($\alpha$, $\Lambda$) distribution will automatically accommodate either possibility!

Application: Hospices near Duluth, MN

The wombled boundaries can then be:
- Mean-based, bivariate: Use means or quantiles of $\Delta_{ij}^{(k)} = |\phi_{ki} - \phi_{kj}|$, obtaining separate boundaries for each hospice system $k$.
- Variance-based, univariate: Use means or quantiles of $p_{ij}$ or $w_{ij}$ as before, since usual MCAR has only one adjacency matrix $W$. This then gives boundaries for the larger region served by any hospice.
- Variance-based, bivariate: Use an MCAR with two adjacency matrices, one for each hospice???...

We plan to investigate all 3!
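The mean-based option can be sketched in Python once posterior draws of the random effects are available. The sketch below assumes the edge quantity is the absolute difference of the two regions' random effects for hospice system k, and summarizes it by its posterior mean. The function name `edge_boundary_values`, the array layout (draws, systems, regions), the hand-made "draws", and the cutoff of 0.5 are all hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def edge_boundary_values(phi_draws, edges):
    """Posterior mean of |phi_ki - phi_kj| for each system k and edge (i, j).

    phi_draws : array of shape (S, K, n) holding S posterior draws
    edges     : list of (i, j) index pairs of adjacent regions
    Returns an array of shape (K, number of edges).
    """
    phi_draws = np.asarray(phi_draws, dtype=float)
    i, j = np.asarray(edges).T
    return np.abs(phi_draws[:, :, i] - phi_draws[:, :, j]).mean(axis=0)

# Two hand-made "draws" for K = 2 systems on n = 3 regions (not real output).
phi_draws = np.array([
    [[0.0, 0.0, 2.0], [1.0, 1.0, 1.0]],
    [[0.0, 0.2, 2.2], [1.0, 1.0, 3.0]],
])
edges = [(0, 1), (1, 2)]

blv = edge_boundary_values(phi_draws, edges)
# Flag an edge as a boundary when its value exceeds an arbitrary cutoff.
is_boundary = blv > 0.5
```

Replacing `.mean(axis=0)` with `np.quantile(..., q, axis=0)` would give the quantile-based variant mentioned in the same bullet.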

Web Appendices: Hierarchical and Joint Site-Edge Methods for Medicare Hospice Service Region Boundary Analysis Haijun Ma, Bradley P. Carlin and Sudipto Banerjee December 8, 2008 Web Appendix A: Selecting