Web Appendices: Hierarchical and Joint Site-Edge Methods for Medicare Hospice Service Region Boundary Analysis

Haijun Ma, Bradley P. Carlin and Sudipto Banerjee

December 8, 2008

Web Appendix A: Selecting binding strength and penalty parameters

Here we address the proper selection of the binding strength parameter ν and the penalty parameter κ in the penalized SE-Ising prior (8). To do this, for a given map (say, St. Luke's) we obtain samples φ^E_g, g = 1, ..., G = 3000, from the penalized SE-Ising distribution for different values of ν and κ. As a comparison, we also obtain i.i.d. draws φ^E_{k,g} ~ Bernoulli(p), k = 1, ..., K, g = 1, ..., G = 3000, where K is the number of edges in the map and p for each draw is the mean of the φ^E_g draws from the penalized SE-Ising prior. Thus, on average, samples from the two priors should have the same number of "on" edges.

Table 1: Comparison of the average number of strings of "on" edges, M, and the average string length, L, for the penalized SE-Ising (Ising) and i.i.d. Bernoulli (IID) models over G = 3000 prior draws, St. Luke's map. Ê(φ^E) is the empirical mean for the penalized SE-Ising.

  ν      κ    M(Ising)  M(IID)  L(Ising)  L(IID)  Ê(φ^E)
  1     10      4.5      5.01     49.5    39.57    0.99
  1      5      4.5      5.02     44.5    39.51    0.99
  1      1      4.5      5.02     49.47   39.52    0.99
  0.5   10      4.49     5.44     45.88   34.75    0.92
  0.5    5      5.41    10.99     27.95   14.18    0.73
  0.5    3     19.65    21.54      5.58    5.15    0.57
  0.5    1     41.69    30.55      1       1.4     0.29
  1     10      4.49     9.95     33.51   16.04    0.75
  1      7     10.09    15.08     13.47    9.05    0.66
  1      5     23.33    23.53      4.28    4.36    0.54
  1      1     43.25    28.94      1       1.12    0.24
  100   10     43.6     27.48      1       1.05    0.22
  100    5     43.6     27.48      1       1.05    0.22
  100    1     43.59    27.52      1       1.05    0.22

The simulation results using the St. Luke's map are given in Table 1. It can be seen that for ν = 1 and most values of κ, the penalized SE-Ising distribution performs better, delivering fewer and longer strings of "on" edges than the i.i.d. Bernoulli. For ν = 100 and most values of κ, the penalized SE-Ising instead performs worse, producing more and shorter strings of edges.
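The string summaries M and L reported in Table 1 are easy to reproduce for the i.i.d. Bernoulli reference. A minimal Python sketch, under the simplifying (and purely illustrative) assumption that the K edges form a path, so that a "string" is a maximal run of consecutive "on" edges; on a real map, strings would instead be connected components of the edge-adjacency graph:

```python
import numpy as np

def string_stats(phi):
    """Count maximal runs ('strings') of 'on' edges in a binary edge
    sequence and return (number of strings M, mean string length L).
    Path-graph simplification of the map's edge-adjacency structure."""
    phi = np.asarray(phi, dtype=int)
    # a string starts at an 'on' edge whose predecessor is 'off'
    starts = np.flatnonzero(np.diff(np.r_[0, phi]) == 1)
    # ... and ends at an 'on' edge whose successor is 'off'
    ends = np.flatnonzero(np.diff(np.r_[phi, 0]) == -1)
    lengths = ends - starts + 1
    M = len(lengths)
    L = lengths.mean() if M else 0.0
    return M, L

# i.i.d. Bernoulli reference draws, with p matched to the empirical
# mean of the Ising draws as in the appendix (values here are toy)
rng = np.random.default_rng(0)
K, G, p = 50, 3000, 0.5
M_bar = np.mean([string_stats(rng.random(K) < p)[0] for _ in range(G)])
```

The same `string_stats` summary can then be applied to draws from the penalized SE-Ising prior for a side-by-side comparison in the style of Table 1.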
For ν = 0.5 and ν = 1, as κ decreases the number of strings of "on" edges increases and the average string length decreases, with the penalized SE-Ising performing better for larger values of κ. For fixed ν, experimentation with a finer grid of κ values suggests a fairly abrupt

Table 2: Comparison of the average number of strings of "on" edges, M, and the average string length, L, for the penalized SE-Ising (Ising) and i.i.d. Bernoulli (IID) models over G = 3000 prior draws, SMDC map. Ê(φ^E) is the empirical mean for the penalized SE-Ising.

  ν      κ    M(Ising)  M(IID)  L(Ising)  L(IID)  Ê(φ^E)
  1     10      5        6.07     77      61.16    0.99
  1      5      5        6.08     77      61.01    0.99
  1      1      4.99     6.09     77.13   60.89    0.99
  0.5   10      5        7.75     70.86   45.7     0.92
  0.5    5      7.15    19.56     39.58   14.21    0.73
  0.5    3     35.78    39.62      5.57    5.04    0.57
  0.5    1     76.97    56.36      1       1.38    0.29
  1     10      6.99    17.47     41.35   16.44    0.75
  1      7     16.55    27.29     14.79    8.97    0.66
  1      5     44.8     44.37      3.99    4.08    0.53
  1      1     79.91    53.39      1       1.06    0.24
  100   10     80.61    50.96      1       1.01    0.22
  100    5     80.59    50.84      1       1.01    0.22
  100    1     80.59    50.84      1       1.01    0.22

change at a particular point, as often seen with models of this sort (Winkler, 2003). In our setting, this point is roughly κ = 3 for ν = 0.5 and κ = 7 for ν = 1. Similar patterns are observed in the simulation results for SMDC, given in Table 2. In the remainder of our paper, we continue with the ν and κ values used in Table 1 of the manuscript (ν = 0.5 and κ = 3), since they appear reasonably well justified by the prior simulations in Tables 1 and 2, and since they deliver the appealing fitted wombled maps shown below.

While we were able to obtain satisfactory boundaries in our application simply by experimenting with a few values for ν and κ, one might wonder if, like τ_ψ (for CAR2), they could be estimated from the data. A hyperprior for (ν, κ) would likely need to feature some informative content, since data-based information here will be weak, and in any case we want to encourage the model to choose longer strings of boundary segments. Designing efficient updating schemes for (ν, κ) will not be easy; moreover, such schemes may need to involve the normalizing constant of the Ising model. This amounts to finding certain vertex-induced subgraphs (see, e.g., Lund and Yannakakis, 1993) for which no polynomial-time algorithm exists.
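To see the scale of the computation, the normalizing constant of a small edge-Ising model can be evaluated by brute force. The energy used below (a binding reward ν per agreeing neighbor pair minus a penalty κ per "on" edge) is only a hypothetical stand-in for the paper's prior (8), but the 2^K sum over configurations is exactly the quantity that becomes infeasible when it must be recomputed within an MCMC update:

```python
import itertools
import math

def ising_normalizer(K, neighbors, nu, kappa):
    """Brute-force normalizing constant of a toy edge-Ising model:
    p(phi) is proportional to exp(nu * #agreeing neighbor pairs
    - kappa * #'on' edges). Costs 2^K evaluations, so it is only
    usable for very small edge graphs."""
    Z = 0.0
    for phi in itertools.product((0, 1), repeat=K):
        agree = sum(phi[i] == phi[j] for i, j in neighbors)
        Z += math.exp(nu * agree - kappa * sum(phi))
    return Z

# 4 edges on a path; neighbor pairs are consecutive edges
Z = ising_normalizer(4, [(0, 1), (1, 2), (2, 3)], nu=0.5, kappa=1.0)
```

With ν = κ = 0 every configuration contributes 1, so Z reduces to 2^K, which gives a quick sanity check.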
Since this difficult problem arises at every MCMC iteration, we considered only the case of fixed ν and κ.

As an enhancement to our SE model, we might want the binding strength to increase if the areal-level information supports connection of two edges. Leclerc (1989) recommends smoothing second differences across adjacent areas. Alternatively, note that our edge priors currently use only Gibbs distributions based on cliques of size two. While this framework is conceptually and computationally feasible, it does not permit unequivocal distinction of several important edge patterns. Consider, for example, four boundary segments arranged in a four-way edge intersection. CAR models with cliques of size two cannot differentiate between a continuation (north-south or east-west boundary only) and a cross (both boundaries crossing at the intersection), though in our context we would want the model to favor the former. The image processing literature (Dass and Nair, 2003; Winkler, 2003) uses cliques of size four to address this problem, but in our irregular lattice setting it is not at all clear how such cliques would be defined, nor how the problem could be managed computationally.
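The prior draws underlying Tables 1 and 2 can be mimicked with a single-site Gibbs sampler. The sketch below targets the same toy penalized edge-Ising energy as above (binding strength ν, penalty κ); it is a hypothetical simplification of the paper's prior (8) and edge graph, not the authors' implementation:

```python
import numpy as np

def gibbs_edge_ising(neighbors, K, nu, kappa, n_iter, rng):
    """Single-site Gibbs sampler for a toy penalized edge-Ising prior:
    p(phi) proportional to exp(nu * #agreeing neighbor pairs
    - kappa * #'on' edges). Returns an (n_iter, K) array of draws."""
    nbrs = [[] for _ in range(K)]
    for i, j in neighbors:
        nbrs[i].append(j)
        nbrs[j].append(i)
    phi = rng.integers(0, 2, size=K)
    draws = []
    for _ in range(n_iter):
        for k in range(K):
            # full conditional: compare the energy with phi_k = 1 vs 0
            e1 = nu * sum(phi[j] == 1 for j in nbrs[k]) - kappa
            e0 = nu * sum(phi[j] == 0 for j in nbrs[k])
            p1 = 1.0 / (1.0 + np.exp(e0 - e1))
            phi[k] = rng.random() < p1
        draws.append(phi.copy())
    return np.array(draws)

rng = np.random.default_rng(1)
draws = gibbs_edge_ising([(0, 1), (1, 2), (2, 3)], K=4,
                         nu=0.5, kappa=1.0, n_iter=200, rng=rng)
```

String summaries of such draws, compared against mean-matched i.i.d. Bernoulli draws, reproduce the kind of prior comparison shown in Tables 1 and 2.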

Figure 1: Posterior mean of p_i for the spatial classification model: (a) St. Luke's; (b) SMDC.

Figure 2: Posterior median of Δ_{p,ij} = |p_i − p_j| for the spatial classification model: (a) St. Luke's; (b) SMDC.

Web Appendix B: Comparisons with spatial classification

Given the complex nature of our chosen boundary analysis models, one might wonder if a simple spatial classification model (say, a logistic regression model for the probability that each ZIP code is served) might work just as well. That is, suppose we let Z_i = 1 if Y_i > 0 (i.e., there is at least one patient served by the hospice in ZIP code i), and Z_i = 0 if Y_i = 0. Suppose Z_i ~ Bernoulli(p_i), where p_i is the probability that area i is served. We adopt the mean structure

logit(p_i) = β_0 + β_1 x_i + φ_i, i = 1, ..., n,

where φ ~ CAR(τ_φ, W), and the single covariate x_i is again the intercentroidal distance from the patient's ZIP to the nearest relevant hospice home base ZIP. Figure 1 plots the posterior means of p_i, i = 1, ..., n. It can be seen that the areas closer to the headquarters of the hospices have higher chances of being served, and these probabilities decrease as the distance from the headquarters increases. However, for both hospices, the totality of regions for which E(p_i | z) ≥ 0.8 or even 0.6 is much smaller than the self-reported service area. Figures 2 and 3 give the corresponding boundary plots based on Δ_{p,ij} = |p_i − p_j| and log OR_ij = log[p_i(1 − p_j)/(p_j(1 − p_i))], respectively. Note that neither of these two criteria appears suitable: the smoothness of the fitted surface in Figure 1 has apparently led to the identification of boundary segments throughout the map. Thus, this apparently sensible class of spatial classification models is not helpful for identifying service area boundaries.
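Both candidate BLVs in this appendix are simple functions of posterior summaries of the p_i. A minimal Python sketch (the probabilities and adjacency below are hypothetical toy values, not fitted results from the St. Luke's or SMDC maps):

```python
import math

def boundary_metrics(p, adjacency):
    """For each adjacent pair (i, j), compute the two candidate
    boundary-likelihood values from Web Appendix B: the absolute
    difference in service probabilities, and the absolute log odds
    ratio |log[p_i(1-p_j) / (p_j(1-p_i))]|."""
    out = {}
    for i, j in adjacency:
        delta = abs(p[i] - p[j])
        log_or = abs(math.log(p[i] * (1 - p[j]) / (p[j] * (1 - p[i]))))
        out[(i, j)] = (delta, log_or)
    return out

# toy map: three ZIP codes, two adjacencies
p = {0: 0.9, 1: 0.85, 2: 0.2}
metrics = boundary_metrics(p, [(0, 1), (1, 2)])
```

In practice these would be evaluated on each posterior draw of the p_i and summarized by their posterior medians, as in Figures 2 and 3.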

Figure 3: Posterior median of log OR_ij = log[p_i(1 − p_j)/(p_j(1 − p_i))] for the spatial classification model: (a) St. Luke's; (b) SMDC.

Web Appendix C: Edge correction, thresholding and BLV definition

A practical issue in almost any spatial analysis is the differential effect of the methods near the outer boundary of the spatial domain. As pointed out in Subsection 2.2, areas having more neighbors are subject to a greater degree of smoothing by the CAR model. Spatial units close to the outer boundary typically have fewer neighbors, and may thus be undersmoothed. In the case of areal boundary analysis, this problem may manifest as a tendency to find spurious boundary segments that connect to the outer boundary. This motivates a search for a spatial edge correction method. Various such methods have been proposed in the literature; see Lawson et al. (1999) for a review. For our random edge models, if an edge element touches the outer boundary, we extend its edge-level neighborhood structure to include those segments of the outer boundary as its neighbors. However, these additional outer boundary segments are not treated as random variables to be estimated, but as mere placeholders in the algorithm. For SE-Ising, we set these φ^E_ij parameters equal to 1 (since we are certain they are part of the boundary).

A second practical issue concerns the detection of spurious boundary segments within a homogeneous region. We wish to find only segments that encompass a hospice's service area; minor boundaries separating different levels of service within the area are less important. To address this problem, we may use a form of thresholding, redefining the BLV in the µ-based case as Δ_{µ,ij} = µ_i − µ_j if (µ_i − c)(µ_j − c) < 0, and 0 otherwise. That is, the BLV is defined so that differences in the means of adjacent ZIP codes are recorded only when the means lie on opposite sides of c, where c is a predetermined cut-off value indicating a minimum service level. The BLV is thus 0 if the adjacent ZIP codes are both inside the service region, or both outside of it.

In our analysis, we would like to list a ZIP code as served if it contains at least one served patient over our three-year observation period. As such, we somewhat arbitrarily selected c = 1.5 as the threshold value for our µ-based maps, meaning that a posterior median difference of at least one-half a patient per year will lead to the establishment of a boundary. We found this c to perform well in eliminating intra-service-area clutter from our maps, and thus used it in the final maps presented in the main paper. However, at the urging of the associate editor, we also undertook a brief investigation of a few other values of c, namely 0.5, 1, 2, and 5. The results for c = 0.5 and 5 using the same shading levels as in the main paper are shown in Figure 4; results for c = 1 and 2 were visually indistinguishable from those for c = 1.5. On the whole, the results for the St. Luke's map are very similar to before, though the smaller value of c does lead to some minor boundaries in the northwest portion of the map, where we have a few small but nonzero counts. Differences are somewhat more

Figure 4: Maps of St. Luke's and SMDC's service area boundaries given by the penalized SE-Ising model with varying thresholding cutoffs c: (a) St. Luke's service area boundaries, c = 0.5; (b) St. Luke's service area boundaries, c = 5; (c) SMDC's service area boundaries, c = 0.5; (d) SMDC's service area boundaries, c = 5.

pronounced for the SMDC map, with the larger c causing the boundaries to contract somewhat around the more highly served ZIP code areas. Thus, at least within a reasonable range (here, 0.5 to 5), c acts as a boundary intensity parameter. While in our case c can be chosen from context (our half-a-patient-per-year requirement), it can also be determined in the way an ordinary CAR model smoothing parameter is sometimes set: by personal preference, perhaps based on context (in this case, how weak a definition of service one is willing to tolerate, and hence how large the final service area will be).

Web Appendix D: Normalizing constants in the SE-Ising model

We now return to the issue of unknown normalizing constants in (3) and (6). Several technical issues arise due to the singularity of D_w − W. Were the CAR distribution proper, the normalizing constant would involve the determinant of D_w − W and τ_φ to some power. Viewing the CAR as a degenerate multivariate normal distribution defined on a hyperplane and comparing it to an n-dimensional multivariate normal distribution, we see that the analogous normalizing constant for (1) would be (τ_φ/(2π))^{n/2} |D_w − W|^{1/2}. But since D_w − W is rank deficient, |D_w − W| = 0. Besag and Higdon (1999) suggest keeping only |D_w − W|_+, defined as the product of the non-zero eigenvalues of D_w − W; see also Lu et al. (2007). Hodges et al. (2003) recommend using rank(D_w − W)/2 as the exponent for τ_φ/(2π), and show that rank(D_w − W) = n − I, where I is the number of disconnected islands in the spatial structure. Since D_w − W is a function of φ^E, both its determinant and I will need to be updated at every MCMC iteration. To stabilize the estimation of the φ^S_i corresponding to singleton islands (islands consisting of just one areal unit)

that appear and disappear from the model, we assign each an independent N(0, 1/τ_φ) prior component. The rest of the φ^S vector retains its joint improper CAR(τ_φ, W) prior, parameterized using a sum-to-zero constraint (Besag and Kooperberg, 1995). As mentioned above, these problems can also be solved via the proper CAR that uses D_ε = Diag(w_{i+} + ε) for some small ε > 0. This, however, adds the difficult problem of choosing the value of ε, to which the results are often rather sensitive (Lu et al., 2007).

Additional references

Besag, J. and Higdon, D. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc., Ser. B, 61, 691-746.

Besag, J. and Kooperberg, C. (1995). On conditional and intrinsic autoregressions. Biometrika, 82, 733-746.

Hodges, J.S., Carlin, B.P., and Fan, Q. (2003). On the precision of the conditionally autoregressive prior in spatial models. Biometrics, 59, 317-322.

Leclerc, Y.G. (1989). Constructing simple stable descriptions for image partitioning. International Journal of Computer Vision, 3, 73-102.

Lund, C. and Yannakakis, M. (1993). The approximation of maximum subgraph problems. In Proceedings of the 20th International Colloquium on Automata, Languages, and Programming, 700, 40-51.