Spatial Analyses of Periodontal Data Using Conditionally. Autoregressive Priors Having Two Classes of Neighbor Relations

Size: px

Start display at page:

Download "Spatial Analyses of Periodontal Data Using Conditionally. Autoregressive Priors Having Two Classes of Neighbor Relations"

Heather Little
6 years ago
Views:

1 Spatial Analyses of Periodontal Data Using Conditionally Autoregressive Priors Having Two Classes of Neighbor Relations By BRIAN J. REICH a, JAMES S. HODGES b and BRADLEY P. CARLIN b a Department of Statistics, North Carolina State University, 20 Founders Drive, Box 820, Raleigh, NC 269, U.S.A. b Division of Biostatistics, School of Public Health, University of Minnesota, 222 University Ave SE, Suite 200, Minneapolis, MN 44, U.S.A. reich@stat.ncsu.edu hodges@ccbr.umn.edu brad@biostat.umn.edu Correspondence Author: Brian J. Reich Telephone: (99) -686 Fax: (99) -9 April 2, 2006 The third author was supported in part by NIH grants R0 ES00-0 and R0 CA99 0. The authors thank Prof. David Madigan of Rutgers University for showing us the no-free-terms structure.

2 Spatial Analyses of Periodontal Data Using Conditionally Autoregressive Priors Having Two Classes of Neighbor Relations Abstract Attachment loss, the extent of a tooth s root (in millimeters) that is no longer attached to surrounding bone by periodontal ligament, is often used to measure the current state of a patient s periodontal disease and monitor disease progression. Attachment loss data can be analyzed using a conditionally autoregressive (CAR) prior distribution, which smooths fitted values toward neighboring values. However, it may be desirable to have more than one class of neighbor relation in the spatial structure, so the different classes of neighbor relations can induce different degrees of smoothing. For example, we may wish to allow smoothing of neighbor pairs bridging the gap between teeth to be different from smoothing of pairs that do not bridge such gaps. Adequately modelling the spatial structure may improve monitoring of periodontal disease progression. This paper develops a two-neighbor-relation CAR model to handle this situation and presents associated theory to help explain the sometimes unusual posterior distributions of the parameters controlling the different types of smoothing. The posterior of these smoothing parameters often has long upper tails, and its shape can change dramatically depending on the spatial structure. Like previous authors, we show that the prior distribution on these parameters has little effect on the posterior of the fixed effects, but has a marked influence on the posterior of both the random effects and the smoothing parameters. Our analysis of attachment loss data also suggests that the spatial structure itself varies between individuals. Key Words: Conditional autoregressive prior; Gaussian Markov random field; identification; neighbor relations; periodontal data.

3 Introduction In periodontics, attachment loss (AL), the extent of a tooth s root (in millimeters) that is no longer attached to surrounding bone by periodontal ligament, is used to assess the cumulative damage to a patient s periodontium and to see if treatment stops disease progression. Many texts (e.g., Darby and Walsh 99) describe periodontal measurement. Figure shows AL for a particular patient (Patient in our study). One patient s mouth has up to 68 measurements (six per tooth) with at least one island (disconnected group of regions) per jaw; missing teeth can create more islands. The patient shown in Figure is missing tooth number 2 on the left side of the maxilla (upper jaw), resulting in three islands. This paper presents the first analyses of periodontal data using spatial statistical methods. The data are from a clinical trial conducted at the University of Minnesota s Dental School comparing three active treatments to placebo and to no treatment. The 0 patients presented here were at least years old, had moderate to severe periodontal disease, and were not undergoing endodontic or surgical periodontal therapy. Each patient was examined once at baseline and four times after administration of treatment, at three month intervals. The original analysis used whole-mouth averages of clinical measures (including attachment loss) or averages of subsets of sites defined by baseline disease status. These standard, non-spatial analyses found no treatment effect (Shievitz 99). A natural spatial model for analyzing attachment loss is the conditionally autoregressive (CAR) model, popularized for Bayesian disease mapping by Besag et al. (99). In an exam with n measurement sites, assume x i b + θ i is the true attachment loss at site i, i =,..., n, where x i is a column vector containing covariates having coefficients b, and θ i captures spatial variation in true attachment loss that is not explained by the covariates. We include seven population-level

4 covariates: six tooth number indicators (Figure defines the tooth numbers) and an indicator for direct sites (sites not in a gap between two teeth). Let y i be site i s observed attachment loss, and assume the likelihood y i θ i, b, τ 0 is normal with mean x i b + θ i and precision τ 0, conditionally independent across i. The spatial structure governing the θ i is described by a lattice of neighbor relations among sites. A CAR model for θ with L 2 norm (also called a Gaussian Markov random field model) has improper density ( p(θ τ) τ (n G)/2 exp τ ) 2 θ Qθ, () where the parameter τ 0 controls smoothing induced by this prior, larger values smoothing more than smaller; G is the number of islands in the spatial structure; θ = (θ,..., θ n ) ; and Q is n n with non-diagonal entries q ij = if i and j are neighbors and 0 otherwise, and diagonal entries q ii equal to the number of region i s neighbors. This is an n-variate normal kernel specified by its precision matrix τ Q instead of its covariance. Attachment loss measurements are spatially correlated, but their correlation may not be simply a function of distance. Figure 2 identifies four types of neighbor pairs, labelled I-IV. The four neighbor types may have different correlations as suggested by previous studies (e.g., Sterne et al., 988; Gunsolley et al., 994; Roberts, 999) and by the empirical correlations in Table a for the 0 subjects analyzed here. Thus, modelling these data may require two or more classes of neighbor relations in the spatial structure, with the l th class having its own τ l so the different neighbor-relation classes can induce different degrees of smoothing. Besag and Higdon (999) introduced CAR priors with two classes of neighbor relations, modelling a rectangular grid of plots with different smoothing parameters for row and column neighbors. They also extended the model to adjust for edge effects and to analyze data from multiple experiments. 2

5 Table : Empirical correlations of each type of neighbor pair and the neighbor pairs controlled by each smoothing parameter for each grid. Figure 2 defines the types. Number of pairs is the number of pairs of each type of neighbor relation for the 0 patients combined. Empirical correlation is the correlation of each type of neighbor pair using the residuals from the regression (non-spatial) of the 0 patients AL onto population-level covariates. (a) Empirical correlations Type I Type II Type III Type IV Number of pairs Empirical correlation (b) Neighbor pairs controlled by each smoothing parameter for each grid Grid Type I Type II Type III Type IV One class of neighbor pairs NR τ τ τ τ Sides vs interproximal A τ τ τ 2 τ 2 Interproximal vs direct only B τ τ 2 τ 2 τ 2 Type II vs others C τ 2 τ τ 2 τ 2 As possible models for attachment loss, we consider four neighborhood structures ( grids ) with one or two classes of neighbor relations, defined in Table b. The first grid (NR) allows only one class of neighbor relations and has just one smoothing parameter, τ. Grid A distinguishes neighbor pairs entirely on either the buccal (cheek) or lingual (tongue) sides of the teeth (types I and II) from other neighbor pairs. Grid B distinguishes neighbors bridging the gap between teeth, the interproximal region (types II, III, and IV) from type I neighbor pairs. Finally, Grid C distinguishes type II neighbor pairs from the other types. Spatial analysis of periodontal data can potentially serve several purposes. In research, it can be desirable to take periodontal measurements at only a subset of sites. For example, the National Health and Nutrition Examination Survey III (NHANES III) measured only two sites per tooth on a randomly-selected half-mouth (Drury et al., 996). Different spatial structures may imply different sampling schemes. Also, different spatial structures are consistent with different etiologies of attachment loss. Compared to the NR model, grids A and B imply a special role for interproximal regions; compared to each other, they imply different effects for different interproximal

6 sites. Clinically, measurement error is relatively large. Calibration studies commonly show that a single attachment loss measurement has an error with a standard deviation of roughly 0.4 to mm (Osborn et. al, 990; Osborn et. al, 992). Figure shows a severe case of periodontal disease, so measurement error with a mm standard deviation is substantial. Practitioners in effect do t-tests at each site to determine if an apparent change is real, and commonly a site s measured attachment loss must change by at least 2 mm to be deemed a true change. It should be possible to exploit the spatial correlation of attachment loss measurements to mitigate the effects of measurement error and improve sensitivity. Section 2 develops a spatial model for periodontal data using a CAR prior with two neighbor relations (2NRCAR). Our analysis was initially hampered by technical problems, such as MCMC autocorrelations near one for the precision parameters, requiring us to more carefully consider identification in these models. Sections and 4 derive the marginal posterior density of (z, z 2 ), for z l = log(τ l /τ 0 ), l =, 2, and examine identification of the z l. Section then applies Section 2 s model our 0-patient data set. While the spatial structure of these AL data appears to vary considerably among patients, grids with two neighbor relations are superior to the NR grid for some patients. The choice of grid has noteworthy effects on fitted values and the posterior of the fixed effects. Section 6 considers the effect of (z, z 2 ) s prior, and Section concludes. Technical results are relegated to the Appendix. 2 A spatial model for periodontal data Let y p, the n p -vector of patient p s observed AL, follow a normal distribution with mean X p b p + θ p and precision τ 0p I np, where X p is a n p k p matrix of known covariates, b p is a k p -vector of fixed effects, and θ p is patient p s n p -vector of spatial random effects, p =,..., N. Correlation between a 4

7 patient s AL at contiguous sites is modelled in the prior for the spatial random effects θ p. Although it may be possible to specify a model with spatial correlation in measurement error and in the true mean attachment loss, it would be difficult for the data to differentiate between these competing sources of spatial correlation. We have resolved this by assuming all the spatial correlation is accounted for by θ p s prior. θ p has a CAR prior with two neighbor relations, written, extending (), as p(θ p τ p, τ 2p ) c(τ p, τ 2p ) /2 exp ( ) 2 θ p{τ p Q p + τ 2p Q 2p }θ p, (2) where Q lp and τ lp respectively describe and control smoothing of class l neighbor pairs for patient p, and c(τ p, τ 2p ) is the product of the positive eigenvalues of τ p Q p + τ 2p Q 2p (see below). We assume a pair of regions are neighbors of at most one type. Q lp has rank n p G lp, G lp being the number of islands in neighbor class l s spatial structure for patient p; assume G lp < n p, i.e., the l th neighborhood structure is not null. If G p is the number of islands in patient p s combined spatial structure, G lp G p. We assume all patients have the same grid, e.g., Grid A, (although patients may have different adjacency matrices due to missing teeth), but it is possible to consider models where the grid can vary between subjects. Appendix A. derives the following results. For any Q p and Q 2p, there is a nonsingular B p such that Q p = B pd p B p and Q 2p = B pd 2p B p, where D lp is diagonal with n p G lp positive diagonal entries and G lp zero entries (Newcomb, 96). Call D lp s diagonal elements d lpj and without loss of generality assume the last G p diagonal elements of both D lp are zero. Then the product of τ p Q p + τ 2p Q 2p s positive eigenvalues is proportional to c(τ p, τ 2p ) = n p G p j= (τ p d pj + τ 2p d 2pj ). We assume the fixed effects are shared across patients, i.e., b p b and k p k, to capture known

8 patterns in attachment loss. There are k = fixed effects: six tooth number indicators (with Tooth serving as the reference) and an indicator for direct sites (sites not in the gap between teeth). Previous studies have shown these to be the only substantial and consistent fixed effects (e.g., Sterne et al., 988; Gunsolley et al., 994; Shievitz, 99; Roberts, 999). Since the rows and columns of Q p and Q 2p sum to zero, the two-neighbor relation CAR model necessarily implies a flat prior on θ p s average on each island. To ensure b s identifiability, we do not include a column for the intercept in X p, so that the intercept is implicit in θ p. The precision parameters {τ 0p, τ p, τ p } are allowed to vary between patients. The transformation from {τ 0p, τ p, τ p } to {z 0p = log(τ 0p ), z p = log(τ p /τ 0p ), z 2p = log(τ 2p /τ 0p )} allows for Gaussian priors which more naturally capture vague prior information and allow the {z lp } to be correlated a priori. Under this parameterization, z 0p sets the scale and z lp controls the amount of smoothing of class l neighbors for patient p. Our analysis in Section considers two priors for the (z 0p, z p, z 2p ): () the patient s z lp are independent a priori with z lp Uniform( 0, 0), l {0,, 2}, s =,..., 0; and (2) the patient s (z 0p, z p, z 2p ) are drawn from (and thus smoothed by) the Gaussian prior (z 0p, z p, z 2p ) N(µ, Σ) where µ = (µ 0, µ, µ 2 ) and Σ is a diagonal precision matrix with diagonal elements (η 0, η, η 2 ). While it may be reasonable to expect a patient with large z p will also have large z 2p, in the absence of any pre-existing data to this effect, we prefer to let z p and z 2p be independent a priori and let the data induce any posterior correlation. In many cases, the data overcome this prior independence; the posterior correlation of z p and z 2p is often very high. To complete the hierarchical model, the µ l and η l are given independent N(m l, p l ) and Gamma(a l, b l ) priors respectively, l {0,, 2}. Assuming the Gaussian prior on the patients precisions, the full posterior, p(θ, b, z 0, z, z 2 y), is N p= ( np z 0p exp 2 ) ez 0p 2 {(y p X p b θ p ) (y p X p b θ p )} () 6

9 n N p G p (e z p +z 0p d pj + e z 2p +z 0p d 2pj ) /2 exp p= j= N ( η /2 l exp η ) l 2 (z l p µ l ) 2 p= l {0,,2} ( ) ez 0p 2 {θ p(e z p Q p + e z 2p Q 2p )θ p } η a l l exp( p l 2 (µ l m l ) 2 b l η l ) where y = (y,..., y N ), θ = (θ,..., θ N), and z l = (z l,..., z ln ), l {0,, 2}. Section s analysis assumes m l = 0, p l = 0.00, and a l = b l = 0.0, l {0,, 2}. Exploring identification of (τ, τ 2 ) by inspecting p(τ, τ 2 θ) Our analysis of AL data was initially hampered by poor identification of the smoothing parameters {z p, z 2p }, whose posteriors often have long tails and MCMC draws with autocorrelations near one. Identification of the smoothing parameters warrants further consideration because, as our data analysis shows, they play a key role in estimating attachment loss. This section explores (τ, τ 2 ) s identification through their conditional posterior p(τ, τ 2 θ), provides sufficient conditions that ensure both are τ and τ 2 are identified, and gives characteristics of spatial grids that lead to well-identified smoothing parameters. Section 4 explores (z, z 2 ) s full marginal posterior using one patient s data. To simplify notation, both sections drop the subscripts that identifies patients. The conditional density of (τ, τ 2 ) can be re-expressed to highlight identification issues. As in Section 2, assume a non-singular B such that Q = B D B and Q 2 = B D 2 B, D and D 2 being diagonal. For θ = Bθ, p(τ, τ 2 θ ) n G j= [ (d j τ + d 2j τ 2 ) /2 exp( ] 2 θ 2 j {d j τ + d 2j τ 2 }) p(τ, τ 2 ). (4)

10 Denoting γ j = d j τ + d 2j τ 2, (τ, τ 2 ) has conditional density p(τ, τ 2 θ ) n G j= ( γ /2 j exp θ 2 j γ j 2 ) p(τ, τ 2 ), () Thus τ and τ 2 enter p(τ, τ 2 θ), and hence p(τ, τ 2 y), only through the prior p(τ, τ 2 ) and the N G linear combinations {γ j }. The conditional density p(z, z 2 θ ) is also a function of (z, z 2 ) only through the prior and n G linear functions of (e z, e z 2 ); we omit details. The j th term of the product in () is constant for (τ, τ 2 ) satisfying d j τ + d 2j τ 2 = c for c > 0, so individual terms in () s product do not identify τ and τ 2. Rather, identification arises from multiplying terms with different ratios d j /d 2j. If there are two or more distinct ratios d j /d 2j, τ and τ 2 are identified. This holds provided each pair of regions are neighbors of at most one type and neither neighborhood structure is null, as assumed; see Appendix A.2. Each term γ /2 j exp( γ iθj 2 2 ) has the form of a gamma density with variate γ j = d j τ + d 2j τ 2, mode θj 2, and an infinite set of modal (τ,τ 2 ) satisfying d j τ + d 2j τ 2 = θj 2. Terms with d j 0 and d 2j 0 give non-identified modal lines τ 2 = τ d j /d 2j + θk 2 /d 2j. Only the intercepts of these lines depend on θ; the slopes, d j /d 2j, do not. Each term in () can be deemed a free term or a mixed term depending on (d j, d 2j ). We define the j th term to be a free term for τ if d 2j = 0 and d j 0, and vice versa for τ 2. A free term for τ is a function of τ only, taking the form of a gamma density with variate τ. Mixed terms have both d j 0 and d 2j 0. As Section 4 shows, grids with free terms give better identification than grids with no free terms. As noted, G d j are zero, G 2 d 2j are zero, and G pairs (d j, d 2j ) are (0,0), so τ has G 2 G free terms and τ 2 has G G free terms. The θ 2 j corresponding to, say, τ 2 s free terms are functions of the differences between averages of the θ s on the G islands defined by class neighbors. For 8

11 example, under Grid A, if there are no missing teeth, class neighbors define two islands on each jaw: the long strips on measurements on the jaw s lingual (tongue) and buccal (cheek) sides. For each jaw, the θ 2 j for τ 2 s free term is proportional to the difference between the average of lingual θ and buccal θ, which depends on only τ 2, not τ. Similarly, the difference between θ s average on the four sites in the gap between Teeth and 2 and θ s average on the four sites in the gap between Teeth 2 and only depends on τ, not τ 2, resulting in a free term for τ. Grid C gives no free terms for τ because neighbor pairs controlled by τ 2 (Types I, III, and IV) form a connected graph (Figure 2). Considering spatial maps outside of periodontal analysis, certain grids give no free terms for either smoothing parameter. For example, both class and class 2 neighbors in Figure s grid form connected graphs, leaving no free terms for either τ or τ 2. Both τ and τ 2 are still identified under this grid because there are mixed terms with different ratios d j /d 2j, but identification is likely to be poor. If all terms are free terms, τ and τ 2 are conditionally independent a posteriori if they are independent a priori. This happens if, for example, the data consist of two islands, each with its own τ l. Mixed terms induce negative correlation between τ and τ 2 conditional on θ. Specifically, a quadratic approximation to log p(τ, τ 2 θ) gives corr(τ, τ 2 θ) 2 / 22, where ab = n G j= d aj d bj (d j τ +d 2j τ 2 ) 2. This approximate conditional correlation is never positive, but the marginal posterior correlation of τ and τ 2 can be. 4 Exploring p(z, z 2 y) This section derives and explores the marginal posterior of the smoothing parameters using Patient s data (Figure ). No tidy expression like () is available for p(z, z 2 y) except in special cases. Of course, an MCMC algorithm draws from p(z, z 2 τ 0, θ, y) = p(z, z 2 θ), so the free/mixed 9

12 terms ideas developed in Section for p(τ, τ 2 θ) may still help explain p(z, z 2 y). To compute the marginal posterior of (z, z 2 ), we temporarily ignore the fixed effects and give τ 0 a Gamma(a 0, b 0 ) prior, giving the full posterior p(θ, τ 0, τ, τ 2 y) p(τ, τ 2 )τ n/2+a 0 0 ( exp 2 {τ 0 n G j= (τ d j + τ 2 d 2j ) /2 (6) ) [ (y θ) ] (y θ) + 2b 0 + θ(τ Q + τ 2 Q 2 )θ}, where p(τ, τ 2 ) is (τ, τ 2 ) s prior. Next, reparameterize to z l = log(τ l /τ 0 ) and integrate θ and τ 0 out of (6), leaving the marginal posterior of the smoothing parameters (z, z 2 ), n G p(z, z 2 y) p(z, z 2 ) (e z d j + e z 2 d 2j ) /2 I n + e z Q + e z 2 Q 2 /2 R a, () j= where R = b 0 + ( 2 {y I n (I n + e z Q + e z 2 Q 2 ) ) y} and a = (n G)/2 + a 0. Figure 4 contains contour plots of the log marginal posterior of (z, z 2 ) under each grid using Subject s data (Figure ). With each contour plot we present a graph of the n G = 9 unidentified lines evaluated at the marginal posterior median of (z, z 2 ), e.g., the set of (x, x 2 ) satisfying d j e x + d 2j e x 2 = d e z j + d e z 2 2j, i =,..., N G, where z l is the posterior median of z l. In these plots, the straight lines represent the unidentified modal lines arising from free terms for z (vertical) or z 2 (horizontal), and the curved lines represent the unidentified modal curves arising from mixed terms. Table 2 gives counts of free and mixed terms for the three grids with two neighbor relations. The shape of (z, z 2 ) s posterior is largely determined by the free terms. Grids A and B (Figures 4b and 4d) have long upper tails at specific z and z 2 arising from the free terms, e.g., at z and z 2 2. for Grid A. The long upper tails (as opposed to lower tails) may reflect positive 0

13 Table 2: Free-term counts for each periodontal grid for Patient Grid n G G G 2 Free Terms for z Free Terms for z 2 Mixed Terms A B C skewness in the distribution of the free terms intercepts. Grid C has no free terms for z, which explains the absence of a long upper tail for z 2 (Figure 4f). Grid A has many free terms for z, few for z 2, and many mixed terms. The disparity in free terms implies better identification of z than z 2. The ridge along the vertical line z in Figure 4b is more narrow than the ridge along the horizontal line z 2 2. The many mixed terms (Figure 4a) induce some curvature of the L-shaped contours near the intersection of the lines of mass along z and z 2 2. Grid B has many free terms for both z and z 2 and far fewer mixed terms than Grid A. Again, z has more free terms than z 2, so the density is more peaked along z than along z 2 (Figure 4d). The relative paucity of mixed terms (Figure 4c) means little curvature in the L-shape of (z, z 2 ) s posterior. The absence of free terms for z under Grid C explains the poor identification of z in Figure 4f. For large z 2, class 2 neighbors are highly smoothed; because they form a connected graph, class neighbors are forced to be similar regardless of z, that is, the data provide little information for z. Thus, for large z 2, say z 2 4, z s posterior is nearly flat (Figure 4f). For small z 2, say z 2 =, smoothing of class 2 neighbors does not obscure smoothing of class neighbors, allowing the data to rule out z < 0. For z 2 =, z s posterior is still flat for large z because all large z correspond to almost complete smoothing of class neighbors.

14 Analysis of periodontal data for many patients This section uses Section 2 s model to analyze baseline attachment loss from N = 0 patients in the clinical trial described in Section, originally described and analyzed by Shievitz (99). MCMC convergence was improved by integrating out the CAR random effects and sampling from the marginal posterior of ( ) b, z 0p, z p, z 2p. For each model, structured MCMC (Sargent et al., 2000) with blocks b and (z 0p, z p, z 2p ) was used to make 20,000 draws from p(b, z 0p, z p, z 2p y). Draws of θ p were then generated from the conditional distribution of θ p given each iteration s (b, z 0p, z p, z 2p ). Convergence was assessed by comparing summaries of the (z p, z 2p ) draws to contour plots of the exact posterior of (z p, z 2p ) given by Equation (). Figure a is a scatterplot of ( z 0p, z p ) from the NR grid assuming the patients z lp have independent Uniform(-0,0) priors, where z lp is z lp s posterior median. Most of the z 0p = log( τ 0p ) are near zero, which corresponds to measurement error with standard deviation roughly.0, as mentioned in Section. The most striking feature of this plot is the patient-to-patient variation in the smoothing parameters z p. For example, z = 6.48, i.e., smoothing of θ is negligible (Figure 6) and z =.9, i.e., θ is substantially smoothed (Figure ). The different amounts of smoothing for these two patients may be driven by the steady increase in AL from the front to the back of the lower jaw for Patient. In contract with the random scatter for Patient, this distinct spatial pattern indicates that the site-to-site variation in Patient s AL is real and should not be smoothed over. By contrast, the model with (z 0p, z p, z 2p ) N(µ, Σ) a priori smooths the ( z 0p, z p ) towards (0,0) (Figure b). Figure c also shows considerable patient-to-patient variation in ( z p, z 2p ) for Grid A assuming the z lp have independent Uniform(-0,0) priors. For the most part, z p (0, 4), but the z 2p vary almost uniformly from - to. The plot of posterior medians resembles the shape of the marginal 2

15 posterior distribution of Patient s (z, z 2 ) under Grid A, in Figure 4b. This may be explained by the counts of free terms in Table 2. Since Grid A gives many free terms for z and few free terms for z 2, we expect the data to provide less information about z 2 than z. Therefore, the sampling distribution of z 2p should have more variation than the sampling distribution of z p, as in Figure c. The counts of free terms may also explain ( z p, z 2p ) s shrinkage under the model where the (z 0p, z p, z 2p ) have a multivariate normal prior. Figure d shows that the z p are moderately shrunk compared to Figure c, but the z 2p are shrunk almost completely. Since z has more free terms than z 2, the data are more informative for z than z 2 and the prior has less influence on z s posterior than z 2 s posterior. We compare the models using the deviance information criterion (DIC) of Speigelhalter et al. (2002). Defining D(θ, b, z 0, z, z 2 ) = 2 log f(y θ, b, z 0, z, z 2 ), DIC = D + P D where P D = D ˆD, D = E(D(θ, b, z 0, z, z 2 ) y), and ˆD = D(E(θ, b, z 0, z, z 2 y)), the expectations being taken with respect to the full posterior. The model s fit is measured by the posterior mean of the deviance, D, while the model s complexity is captured by P D, the effective number of parameters in the model. Models with smaller DIC and D are favored. DIC can also be computed individually for each patient. Defining D p (θ, b, z 0, z, z 2 ) = 2 log f(y p θ, b, z 0, z, z 2 ), DIC p = D p + P Dp where P Dp = D p ˆD p, Dp = E(D p (θ, b, z 0, z, z 2 ) y), and ˆD = D p (E(θ, b, z 0, z, z 2 y)). Since Np= D p (θ, b, z 0, z, z 2 ) = D(θ, b, z 0, z, z 2 ), N p= DIC p = DIC and N p= P Dp = P D. Table gives DIC for each grid and prior choice for the z lp. Although the spatial models are more complex (i.e., have larger P D ) than the non-spatial models, DIC strongly favors spatial modelling of these periodontal data because of the substantial improvement in fit (i.e., reduction in D); the two models with only fixed effects and without spatial CAR random effects ( FE only ) have the largest DIC statistics by far. No patient s DIC p favors these non-spatial models. Also,

16 Table : Summary of DIC for various models. DIC is the DIC specific to only Patient. # pats min DIC p is the number of patients that have the smallest DIC p for the given model. FE only is the non-spatial model with only fixed effects. (a) z lp Uniform(-0,0) a priori, independent across l and p FE only NR Grid Grid A Grid B Grid C DIC D P D DIC # pats min DIC p (b) (z 0p, z p, z 2p ) N(µ, Σ) a priori FE only NR Grid Grid A Grid B Grid C DIC D P D for each grid, DIC and D overwhelmingly favor independent z lp over shrunken z lp. This is not surprising considering the large patient-to-patient variation and non-gaussian scatters in Figures a and c. Assuming independent z lp a priori, the NR grid has smaller DIC and D than each twoneighbor-relation grid (Table a). However, patients are far from unanimous in favoring the NR grid. A two-neighbor-relation grid minimizes of the 0 patients DIC p ( # pats min DIC p ). Grid A has the smallest DIC of the 2NR grids and minimizes DIC p for more patients than the NR grid. Even Grid C, which has the largest DIC of the spatial grids, minimizes DIC p for 0 of the 0 patients. Table 4 shows posterior summaries for the fixed effects. Under the non-spatial model ( FE only ) each fixed-effect posterior interval except Tooth 4 excludes zero. Under this model, mean AL is higher at direct sites than sites in the gap between teeth, lower on teeth 2 and than tooth, and higher on teeth - than tooth. As expected, the 9% interval of each fixed effect is wider under each spatial model than under the FE-only model. Also, the posterior medians of the tooth-number fixed effects are generally closer to zero under the spatial models, especially Grid 4

17 Table 4: Posterior median and 9% interval of the fixed effects under different grids assuming z lp Uniform(-0,0) independent across l and p. FE only refers to the non-spatial model with only fixed effect. FE only NR Grid A Grid B Grid C Direct 0.8 ( 0., 0.22) 0.2 ( 0.6, 0.26) 0.20 ( 0., 0.2) 0.2 ( 0.6, 0.26) 0.2 ( 0.6, 0.26) Tooth (-0.2, -0.06) (-0.9, 0.0) -0.0 (-0.20, 0.00) -0.0 (-0.20, 0.00) (-0.8, 0.02) Tooth (-0., -0.8) (-0.4, -0.0) (-0.6, -0.) -0.2 (-0.8, -0.) (-0.4, -0.0) Tooth (-0.0, 0.06) 0.0 (-0.4, 0.4) -0.0 (-0.4, 0.2) -0.0 (-0., 0.) 0.0 (-0., 0.) Tooth 0.9 ( 0., 0.2) 0.9 ( 0.0, 0.) 0.8 ( 0.02, 0.) 0. ( 0.04, 0.29) 0.20 ( 0.0, 0.) Tooth ( 0.2, 0.88) 0.6 ( 0.6, 0.94) 0.6 ( 0.8, 0.92) 0.69 ( 0.0, 0.88) 0.4 ( 0.4, 0.92) Tooth 0.9 ( 0.8, 0.99) 0.82 ( 0.6,.0) 0.8 ( 0.6,.0) 0.4 ( 0., 0.96) 0.8 ( 0.62,.0) B, compared to the FE-only model. Under the spatial models, variation in θ absorbs some of the tooth-number effects and nudges the tooth-number fixed effects towards zero. Reich et al. (2006) explore the effect of adding CAR parameters on the fixed effects in spatial regression. These authors show that spatial smoothing has the largest effect on covariates that vary smoothly in space, which explains why the changes in medians from the non-spatial model to the spatial models are larger for the tooth-number effects than for the direct-site effect. Grid A minimizes DIC, the DIC specific to Patient. Figure plots Patient s data (symbols) and fitted values, i.e., θ + X b s posterior mean, under the NR grid (solid lines) and Grid A (dashed lines). The fitted values under the NR grid are similar to the fitted values under Grids B and C (not shown), but often differ from Grid A s fitted values by over 0. mm. For Grid A, draws of z are generally larger than draws of z 2 and ( z, z 2 ) = (.6, 2.0); here z controls smoothing of Type I and II neighbor pairs, which form long strips of sites along the buccal and lingual sides of each jaw. Large z and small z 2 smooth substantially within these long strips but not between them. Figure shows that this is preferable for the upper jaw, where attachment loss is similar across tooth number but is larger for lingual sites (mean AL =.49) than buccal sites (mean AL = 2.20). Grids B, C, and NR smooth more between these strips.

18 6 Effect of the prior distribution Choosing parameterizations and priors for the scale parameters in hierarchical models is an important and unresolved issue (Daniels and Kass, 999,200; Natarajan and Kass, 2000; Kelsall and Wakefield, 2002) that we cannot settle here. However, the generally poor identification of the smoothing parameters calls for an investigation of the sensitivity of Section s results to changes in the priors for the scale parameters. This section considers five putatively vague parameterizations/priors, each independent across patients: Uniform(-0,0) and LogGamma(0.0,0.0) priors for the z lp, Uniform(0,0) priors on the standard deviations σ lp = τ /2 l p, Gamma(0.0,0.0) priors on the variances σ 2 l p, and a prior motivated by Besag and Higdon (999) having τ 0p G(0.0,0.0), λ p = τ p + τ 2p G(0.0,0.0), and β p = τ p /(τ p + τ 2p ) U(0,). For each parameterization/prior, we consider Grid A and discuss the types of fit it encourages and examine the influence on fixedeffect estimates, fitted values, and the posteriors of the smoothing parameters. Figure 8 summarizes the induced prior (contour lines) and posterior samples (boxes and whiskers) of the (z p, z 2p ) for each parameterization/prior. For several patients, the posterior of the smoothing parameters is affected by the prior, especially for patients with large z p or z 2p. For example, Figure 8b s LogGamma(0.0,0.0) prior for z lp favors small z lp and precludes extremely smooth fits with z p > or z 2p >. The priors in Figures 8c 8e all have mode (0,0) and encourage moderate levels of smoothing. Of these three priors, the uniform prior on the standard deviations (Figure 8c) shrinks the (z p, z 2p ) most towards (0, 0) and the inverse gamma prior for the variances (Figure 8d) is most like the uniform prior on z lp. The Uniform(0,) prior on the β p, which control the relative amount of smoothing of the two classes of neighbor pairs, discourages fits with large z lp and small z 2p (i.e., β p ), or small z lp and large z 2p (i.e., β p 0), and shrinks the smoothing parameters towards the line z = z 2 (Figure 8e). 6

19 Table : Posterior median and 9% interval of the fixed effects under Grid A and different priors/parameteriztions for the scale parameters. Each column corresponds to a different parameteriztion/prior. The scale parameters are independent across and within patients a priori under each prior. The transformations used in this table are σ 2 l p = /τ lp, λ p = τ p +τ 2p, and β p = τ p /(τ p +τ 2p ). z 0p U(-,) z 0p LG(.0,.0) σ 0p U(0,0) σ0 2 p IG(.0,.0) τ 0p G(.0,.0) z p U(-,) z p LG(.0,.0) σ p U(0,0) σ 2 p IG(.0,.0) λ p G(.0,.0) z 2p U(-,) z 2p LG(.0,.0) σ 2p U(0,0) σ2 2 p IG(.0,.0) β p U(0,) Direct 0.20 ( 0., 0.2) 0.20 ( 0., 0.26) 0.2 ( 0.6, 0.26) 0.2 ( 0., 0.26) 0.20 ( 0., 0.2) Tooth (-0.20, 0.00) -0.0 (-0.20, 0.00) (-0.20, 0.0) -0.0 (-0.20, 0.00) (-0.9, 0.0) Tooth (-0.6, -0.) (-0.6, -0.2) -0.2 (-0.6, -0.) (-0.6, -0.) (-0.6, -0.2) Tooth ( -0.4, 0.2) (-0.6, 0.2) -0.0 (-0.6, 0.) (-0., 0.2) -0.0 (-0.6, 0.) Tooth 0.8 ( 0.02, 0.) 0.6 ( 0.0, 0.2) 0. ( 0.00, 0.) 0. ( 0.02, 0.2) 0. ( 0.0, 0.) Tooth 6 0. ( 0.8, 0.92) 0. ( 0.6, 0.90) 0. ( 0.4, 0.92) 0. ( 0.8, 0.92) 0.4 ( 0., 0.92) Tooth 0.82 ( 0.6,.0) 0.80 ( 0.6, 0.99) 0.8 ( 0.60,.02) 0.82 ( 0.6,.0) 0.80 ( 0.60,.00) The prior also has a noticeable effect on Patient s fitted values. Figure 9 shows X β + θ s posterior mean for Patient s left maxillary island under two priors for the scale parameters: z lp Uniform(-0,0) (solid lines) and the prior motivated by Besag and Higdon (dashed lines). Under Grid A, z controls smoothing of Type I and II neighbor pairs, which form long strips of sites along the buccal and lingual sides of each jaw, that is, z controls smoothing of adjacent sites within Figure 9 s two rows. With z lp Uniform(-0,0), β s posterior median is 0.998, i.e., z is generally larger than z 2, and the fitted values within Figure 9 s rows are very similar. Recall that the uniform prior on β p discourages fits with large z and small z 2 (Figure 8e). Under this prior, β s posterior median is 0.86 and the fitted values within Figure 9 s rows are less smooth, differing from the fitted values under the uniform prior on the z lp by as much as.4 mm (Tooth ). The fixed-effect estimates and intervals in Table are nearly identical under the five priors. This agrees with previous work in this area, e.g., Daniels and Kass (999). These authors show that the prior on the scale parameter may lead to substantial changes in the posterior of the scale parameters, but such changes typically do not affect the fixed effects posteriors.

20 Discussion In our periodontal data analysis, our model choice statistic sometimes favored models with two neighbor relations despite their increased complexity. Although the NR grid had a smaller overall DIC than any of the two-neighbor-relation grids, Grid A minimized the patient-specific DIC for more patients than the NR grid. The spatial structure appeared to vary considerably among these 0 patients, who were haphazardly selected from the study population. The empirical correlations in Table a suggest a fourth two-neighbor-relation grid, Grid D, that allows for differential smoothing of neighbor pairs on the same tooth (Types I and III) and neighbor pairs on different teeth (Types II and IV). Assuming the z ls have independent Unif(-0,0) priors, Grid D has a smaller overall DIC than Grid A. However, as Figure 0 shows, this result is largely driven by Patients,, 29, 6 and 8. These patients all have severe periodontal disease (e.g., Patient in Figure 6) and may not represent the study population. Although these patients do not seem to be influential on the fixed effects, they do influence the choice of spatial grid; Grid A has smaller patient-specific DIC than Grid D for 28 of the 4 remaining patients. Stratification into more homogeneous groups of patients, such as moderate and advanced groups, may be needed for one grid to prevail as best for all patients. Stratification may also allow the degree of smoothing to be the same for all patients in a stratum, although our model still enables different smoothing for different patients when this is appropriate (Figures 6 and ). Modelling with random grids may also be possible (Lu, 200). The smoothing parameters z l = log(τ l /τ 0 ) are identified except in trivial cases, but identification can be poor depending on the spatial structure. Free terms greatly enhance identification. Free terms arise from prior contrasts in θ with precision depending on only one z l. Generally, z l with no free terms are poorly identified, especially if neighbor pairs of the other class are highly 8

21 smoothed. This may cause computing problems such as poor MCMC convergence. MCMC algorithms exploiting the free-term structure may give better performance. For CAR models with two neighbor relations, the prior on the smoothing parameters is very important. The posterior of several patients smoothing parameters were affected by the priors considered in Section 6. The prior also had a large effect on Patient s fitted values. However, similar to other work in this area (e.g., Daniels and Kass, 999), for these data, the smoothing parameters prior did not affect the fixed-effect estimates. Finally, this paper presents a method for analyzing baseline periodontal data. In practice, longitudinal data may also be of interest. The 2NRCAR prior could be applied in this spatiotemporal setting by defining the two neighbor types to be spatial neighbors at the same visit and temporal neighbors at the same spatial location. References Besag J, Higdon D (999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Stat. Soc., Series B, 6: Besag J, York JC, Mollié A (99). Bayesian image restoration, with two applications in spatial statistics (with discussion). Ann. Inst. Stat. Math., 4: 9. Best NG, Waller LA, Thomas A, Conlon EM, Arnold RA (999). Bayesian models for spatially correlated disease and exposure data (with discussion). Bayesian Statistics 6: 6. Daniels MJ, Kass RE (999). Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. J. Amer. Statist. Assoc., 94: Daniels MJ, Kass RE (200) Shrinkage estimators for covariance matrices. Biometrics, : 84. Darby ML, Walsh MM (99). Periodontal and Oral Hygiene Assessment. Dental Hygiene Theory and Practice. Philadelphia: W. B. Saunders. Drury TF, Winn DM, Snowden CB, Kingman A, Kleinman DV, Lewis B (996). An overview of the oral health component of the National Health and Nutrition Examination Survey (NHANES III-Phase ). J. Dent. Research : Graybill FA (98). Matrices with Applications in Statistics. Pacific Grove, CA: Wadsworth & Brooks. Gunsolley JC, Williams DA, Schenkein HA (994). Variance component modeling of attachment level measurements. J. Clin. Perio., 2:

22 Hodges JS, Carlin BP, Fan Q (200). On the precision of the conditionally autoregressive prior in spatial models. Biometrics, 9: 22. Kelsall JE, Wakefield JC (2002). Modeling spatial variation in disease risk: A geostatistical approach. J. Amer. Statist. Assoc., 9: Lu H (200). Measuring the complexity of generalized linear hierarchical models and Bayesian areal Wombling for boundary analysis. PhD Thesis, University of Minnesota. Natarajan R, Kass RE (2000). Reference bayesian methods for generalized linear models. J. Amer. Statist. Assoc., 9:22 2. Newcomb RW (96). On the simultaneous diagonalization of two semi-definite matrices. Quart. Appl. Math., 9: Osborn J, Stoltenberg J, Huso B, Aeppli D, Pihlstrom B. (990). Comparison of measurement variability using a standard and constant force periodontal probe. J. Periodontology, 6:49 0. Osborn JB, Stoltenberg JL, Huso BA, Aeppli DM, Pihlstrom BL. (992). Comparison of measurement variability in subjects with moderate periodontitis using a conventional and constant force periodontal probe. J. Periodontology, 6: Reich BJ, Hodges JS, Zadnik V (2006). Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics, In press. Roberts T (999). Examining mean structure, covariance structure and correlation of interproximal sites in a large periodontal data set. MS Thesis, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN. Sargent DJ, Hodges JS, Carlin BP (2000). Graph. Stat. 9:2 24. Structured Markov chain Monte Carlo. J. Comp. Shievitz P (99). The effect of a non-steroidal anti-inflammatory drug on periodontal clinical parameters after scaling. MS Thesis, School of Dentistry, University of Minnesota, Minneapolis, MN. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002). Bayesian measures of model complexity and fit (with discussion and rejoinder) J. Roy. Statist. Soc., Ser. B, 64:8 69. Sterne JAC, Johnson NW, Wilton JMA, Joyston-Bechel S, Smales FC (988). Variance components analysis of data from periodontal research. J. Perio. Res., 2:48. Appendix A. The CAR Prior with Two Neighbor Relations Newcomb (96) showed how to construct a nonsingular B such that Q = B D B and Q 2 = B D 2 B, where D l is diagonal with n G l positive diagonal entries and G l zero entries. Thus (2) s exponent can be written as 2 θ B {τ D + τ 2 D 2 }Bθ. B is orthogonal only if Q Q 2 is symmetric (Graybill 98, Theorem 2.2.2). Also, B is not unique, but apart from permuting rows or 20

23 columns, any B can be obtained from any other B by pre-multiplying by a diagonal matrix with positive diagonal entries. As will become clear, any such change, or any permutation of B s rows or columns, has no noteworthy effect. For a given B, define D l s diagonal elements as d lj 0, l =, 2 and j =,..., n. (The d lj depend on B; we suppress this for simplicity.) For exactly G values of j, d j = d 2j = 0. To see this, set τ = τ 2, turning the problem back into a CAR prior with one class of neighbor relation; D + D 2 has exactly G zero diagonal entries, and the result follows. Without loss of generality, define B so D l s last G diagonal entries are zero and d j + d 2j > 0 for j =,..., n G. Following Hodges et al. (200), define θ = Bθ and partition θ as θ = (θ, θ 2 ), where θ has length n G and θ 2 has length G. Then (2) s exponent is 2 θ diag{τ d j +τ 2 d 2j }θ, diag{ν j } being a diagonal matrix with {ν j } on the diagonal in the order j =,..., n G. This exponent is the kernel of a proper multivariate normal density for θ, which has multiplier n G j= (τ d j + τ 2 d 2j ) /2. (8) For j with both d lj positive, the j th term s contribution to the multiplier is determined by the ratio d j /d 2j, because d 2j > 0 can be factored out and disappears in the proportionality constant. For j with only one d lj positive, that d lj can likewise be factored out. Thus the proper version of this CAR prior is unique even though B is not. A.2 Proof that z and z 2 are identified in non-trivial cases If d j /d 2j = c for j =,..., n G, then D = cd 2, which implies Q = cq 2. Since off-diagonal elements of Q l are 0 or, either c = and Q = Q 2, so each neighbor pair is a pair of both classes, or c = 0 and Q is the zero matrix, i.e., its neighborhood structure is null. Both possibilities were ruled out by assumption, so there are at least two distinct d j /d 2j. 2

24 Figure : Patient s attachment loss. The shaded boxes represent teeth and the circles represent measurement sites. Maxillary and Mandibular refer to upper and lower jaws respectively. The maxilla s second tooth on the left is missing. 2 4 Maxilliary AL(mm) > Mandibular

25 Figure 2: Neighbor pairs in a three-tooth periodontal grid. Rectangles represent teeth, circles represent sites where AL is measured, and different line types represent the types of neighbor pairs. Type I Type II Type III Type IV Figure : Non-periodontal grid with no free terms for either τ or τ 2. The solid lines are class neighbors and the dashed lines are class 2 neighbors. 2

26 Figure 4: Posterior of (z, z 2 ) for Patient under various spatial grids z2 z z (a) Non-identified curves based on (z, z 2 ) s posterior median under Grid A 0 (b) Contour plot of (z, z 2 ) s log marginal posterior under Grid A z 44 0 z2 z z (c) Non-identified curves based on (z, z 2 ) s posterior median under Grid B 0 (d) Contour plot of (z, z 2 ) s log marginal posterior under Grid B z z2 z z (e) Non-identified curves based on (z, z 2 ) s posterior median under Grid C 0 (f) Contour plot of (z, z 2 ) s log marginal posterior under Grid C z 44 24

27 Figure : Summary of the posterior of each patient s smoothing parameters. The boxes represent the posterior medians and the whiskers represent the interquartile ranges of each patient s (z 0p, z p ) or (z p, z 2p ), p =,..., 0. z z (a) (z 0p, z p ) under the NR grid assuming the z lp are independent a priori z0 (b) (z 0p, z p ) under the NR grid assuming the (z 0p, z lp ) are shrunk with a normal prior z0 z z z z (c) (z p, z 2p ) under Grid A assuming the precisions are independent a priori (d) (z p, z 2p ) under Grid A assuming the (z 0p, z lp, z 2p ) are shrunk with a normal prior 2

28 Figure 6: Patient s data (symbols) and posterior mean of X β + θ (solid line) for the NR grid assuming the z ls are independent a priori. Maxillary and Mandibular refer to upper and lower jaws respectively, while buccal and lingual refer to the cheek and the tongue sides of the teeth, respectively. Attachment Loss (mm) D M D M D M D M D M D M D M M D M D M D M D M D M D M D Tooth Number Maxillary/ Lingual Maxillary/ Buccal Observed Data Mandibular/ Lingual Posterior Mean Mandibular/ Buccal 26

29 Figure : Patient s data (symbols) and posterior mean of X β + θ assuming the z lp are independent a priori for the NR grid (solid lines) and Grid A (dashed lines). Maxillary and Mandibular refer to upper and lower jaws respectively, while buccal and lingual refer to the cheek and the tongue sides of the teeth, respectively. Attachment Loss (mm) D M D M D M D M D M D M D M M D M D M D M D M D M D M D Tooth Number Maxillary/ Lingual Maxillary/ Buccal Observed Data Mandibular/ Lingual Posterior Means Mandibular/ Buccal 2

30 Figure 8: Summary each patient s smoothing parameters under different priors. The boxes represent the posterior medians and the whiskers represent the interquartile ranges of each patient s (z p, z 2p ), p =,..., 0. The shaded lines are contours of (z p, z 2p ) s induced prior density. The transformations used are σ 2 l p = /τ lp, λ p = τ p + τ 2p, and β p = τ p /(τ p + τ 2p ). (a) z lp Unif(-0,0), l {0,, 2} (b) z lp LG(.0,.0), l {0,, 2} z z z z (c) σ lp U(0,0), l {0,, 2} (d) σ 2 l p IG(0.0,0.0), l {0,, 2} z z z z (e) τ 0p G(.0,.0), λ p G(.0,.0), β p U(0,) z z

31 Figure 9: Data (symbols) and posterior mean of X β +θ for Patient s left maxillary island under Grid A assuming z lp Uniform(-0,0) (solid lines) and τ 0p G(.0,.0), λ p G(.0,.0), and β p U(0,) (dashed lines), each independent across l {0,, 2} and p {,..., 0}. Buccal and lingual refer to the cheek and the tongue sides of the teeth, respectively. Attachment Loss (mm) D M M D M D M D M D M D M D M D Lingual Buccal Tooth Number Observed Data Posterior Means Figure 0: Plot of each patient s DIC p under grids A and D. DICp under Grid D DICp under Grid A 29

Spatial Analyses of Periodontal Data Using Conditionally Autoregressive Priors Having Two Classes of Neighbor Relations

Spatial Analyses of Periodontal Data Using Conditionally Autoregressive Priors Having Two Classes of Neighbor Relations Brian J. REICH, James S. HODGES, and Bradley P. CARLIN Attachment loss, the extent