Bayesian semiparametric inference for the accelerated failure time model using hierarchical mixture modeling with N-IG priors


Raffaele Argiento (1), Alessandra Guglielmi (2), Antonio Pievatolo (1), Fabrizio Ruggeri (1)
(1) CNR-IMATI, Milano; (2) Politecnico di Milano, Milano

Abstract

We pursue a Bayesian semiparametric approach for an Accelerated Failure Time regression model, usually considered in survival analysis, when the error distribution is a mixture of parametric densities with a nonparametric mixing measure. The Dirichlet process is a popular choice for the mixing measure, yielding a Dirichlet process mixture model for the error; this paper considers the same model but, as an alternative to the Dirichlet process, takes the mixing measure to be a normalized inverse-Gaussian prior, built from normalized inverse-Gaussian finite-dimensional distributions, as recently proposed in the literature. A comparison between the two models is carried out. Markov chain Monte Carlo techniques are used to estimate the predictive distribution of the survival time, along with the posterior distribution of the regression parameters. The efficiency of the computational methods is also compared, using both real and simulated data.

Keywords: AFT regression models, Bayesian semiparametrics, Mixture models, MCMC algorithms.

1 Introduction

In the survival literature, the Accelerated Failure Time (AFT) model usually expresses the multiplicative effect of a fixed p-vector of covariates x = (x_1, ..., x_p)' on the failure time T, i.e. log T = x'β + W, where β = (β_1, ..., β_p)' is the vector of regression parameters and W denotes the error. Recently this model has received much attention in the Bayesian community, in particular in papers where the error W (or exp(W)) has been represented hierarchically as a mixture of parametric densities with a Dirichlet process as mixing measure, i.e., the well-known Dirichlet process mixture (DPM) models introduced by Lo (1984).
In Kuo and Mallick (1997), the authors model exp(W) as a Dirichlet process location mixture of normals (with variance fixed and small). Kottas and Gelfand (2001) and Gelfand and Kottas (2003) propose a flexible semiparametric class of zero-median distributions for the error, which essentially consists of a Dirichlet process scale mixture of split 0-mean normals (with skewness handled parametrically). In Ghosh and Ghosal (2006) the distribution of exp(W) is given as a scale mixture of Weibull distributions with a Dirichlet process as mixing measure, and the consistency of the posterior distribution of the regression parameters is established. Pointing out that the marginal prior for exp(W) in Kuo and Mallick (1997) gives positive probability to the negative reals, Hanson (2006) proposes a DPM model of gamma densities for the distribution of exp(W), mixing over both the scale and the shape of the gammas. In this paper we consider the model T = e^{x'β} V, where V := e^W is a mixture of some parametric family of densities on the positive real line, mixed by a random distribution function G on R^s (s a positive integer). The paper mainly focuses on the performance of two hierarchical mixture models, comparing Bayesian inferences on the regression parameters and the survival times. On one hand we assume that G has a Dirichlet process prior, yielding DPM models for V; on the other hand, G has the normalized inverse-Gaussian prior (N-IG prior), as introduced in Lijoi et al. (2005), thus defining what we call N-IG mixtures for short. The matching between the two priors is achieved by centering G at the same distribution function G_0 and assuming equal prior means for the number of components in the mixture. N-IG mixtures of normals have been studied in Lijoi et al. (2006), but, to the best of our knowledge, no papers are available on N-IG mixtures with kernels supported on R^+, including a regression component.
The N-IG prior preserves the same tractability as the Dirichlet process prior and, furthermore, is characterized by a more elaborate clustering property: the weight given to a single observation depends on the total number of ties in the sample. Moreover, the prior distribution of the number of components under the N-IG mixture is more spread out than the corresponding prior under the DPM model. Following Hanson (2006), we consider hierarchical mixtures of gamma densities, mixed on both the scale and the shape parameters. The centering distribution G_0 we assume yields an infinite-mean marginal prior for V; however, prior information can be expressed by means of the median of V. Posterior inference on β and P(T > t) is carried out via Gibbs sampling, incorporating censoring when necessary. Of course, density estimation can be performed within this model by ignoring the regression aspect (i.e. simply assuming a null vector of covariates). The two competing models are tested on real and simulated data. We use the same notation to denote distribution functions and the corresponding probability measures.

2 The model

Let T_1, ..., T_n be the survival times of n subjects, and let x_i = (x_i1, ..., x_ip)' be the covariate vector associated with the (observed or censored) time t_i, i = 1, ..., n. The model we consider can be hierarchically expressed as follows:

T_i = e^{x_i'β} V_i,   i = 1, ..., n,
V_i | θ_i ~ind k(·; θ_i),   θ_i | G ~iid G,
G ~ q,   G_0(A) := E_q(G(A)),   A ∈ B(Θ),
β ⊥ G,   β ~ π(β),     (1)

where k(·; θ_i) is a family of densities on R^+, depending on a vector of parameters θ_i belonging to a Borel subset Θ of R^s, and q is the prior distribution on the random d.f. G; G_0 is a d.f. on Θ, expressing the mean of G. Here we assume that G is a Dirichlet process or an N-IG process; in both cases, the prior is completely known once an additional positive parameter is given. Indeed, we write G ~ DP(aG_0) to denote the Dirichlet prior with parameter aG_0, and G ~ N-IG(M, G_0) to denote the N-IG prior; see Lijoi et al. (2005) for notation and definitions. Recall that, for both the Dirichlet process and N-IG priors, E(G(A)) = G_0(A), whereas Var(G(A)) = G_0(A)(1 − G_0(A)) M^2 e^M Γ(−2; M) for the N-IG prior (Γ(·;·) denotes the incomplete gamma function), while Var(G(A)) = G_0(A)(1 − G_0(A))/(a + 1) under the Dirichlet prior. The Bayesian model specification is completed by assuming that G_0 depends on s hyperparameters γ_1, ..., γ_s (possibly random and distributed according to π(γ_1, ..., γ_s)). Observe that, if G is an N-IG process with diffuse mean G_0, then the predictive distribution of θ_{n+1}, given θ_1, ..., θ_n, is

P(θ_{n+1} ∈ A | θ_1, ..., θ_n) = w_{0,n} G_0(A) + w_{1,n} Σ_{j=1}^{k} (n_j − 1/2) δ_{θ*_j}(A),   A ∈ B(Θ),     (2)

where θ*_1, ..., θ*_k (with multiplicities n_1, ..., n_k, Σ_j n_j = n) denote the k distinct values within θ_1, ..., θ_n and, writing C(n, r) for the binomial coefficient,

w_{0,n} = [Σ_{r=0}^{n} C(n, r) (−M^2)^{−r+1} Γ(k+1+2r−2n; M)] / [2n Σ_{r=0}^{n−1} C(n−1, r) (−M^2)^{−r} Γ(k+2+2r−2n; M)],
w_{1,n} = [Σ_{r=0}^{n} C(n, r) (−M^2)^{−r+1} Γ(k+2r−2n; M)] / [n Σ_{r=0}^{n−1} C(n−1, r) (−M^2)^{−r} Γ(k+2+2r−2n; M)].     (3)

Moreover, the distribution of the number of distinct values k in a sample of size n is

p_{N-IG}(k | n) = C(2n−k−1, n−1) [e^M (−M^2)^{n−1} / (2^{2n−k−1} Γ(k))] Σ_{r=0}^{n−1} C(n−1, r) (−M^2)^{−r} Γ(k+2+2r−2n; M),   k = 1, ..., n.     (4)

It should be pointed out that, for fixed n, the corresponding value of the mean number of components has a lower bound, given by the limit as M goes to 0.
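The matching between the two priors equates the prior expected number of distinct components. Under the N-IG prior this mean must be computed from (4), which requires the incomplete gamma evaluations discussed in Section 5; under the Dirichlet process it has the standard closed form E(K_n) = Σ_{i=1}^{n} a/(a + i − 1). A minimal sketch (an illustration, not the authors' code):

```python
def dp_expected_clusters(a, n):
    """Prior mean of the number of distinct values K_n in a sample of size n
    drawn from a Dirichlet process with total mass a (standard result):
    E(K_n) = sum_{i=1}^{n} a / (a + i - 1)."""
    return sum(a / (a + i - 1) for i in range(1, n + 1))
```

The hyperparameter pairs used in Section 4 are consistent with this formula: for instance, a = 14.16 with n = 100 gives E(K_n) ≈ 30, and a = 3.271 with n = 121 gives E(K_n) ≈ 12.4.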
As mentioned in the Introduction, we consider hierarchical mixtures of gamma densities k(·; θ), θ = (ϑ_1, ϑ_2), with mean ϑ_1/ϑ_2, and choose the centering distribution G_0 on R^+ × R^+ as the product of two exponential distributions; i.e., under G_0, ϑ_1 and ϑ_2 are independent, exponentially distributed with parameters γ_1 and γ_2, respectively. Assuming independent exponential distributions for G_0 is computationally convenient since, in this case, a closed-form expression for the marginal prior of V is available, which is needed in the implementation of the Gibbs sampling procedure (a conjugate hierarchical mixture model is obtained). Indeed, for model (1), under both Dirichlet process and N-IG priors, the marginal prior density of V is, for v > 0,

f_V(v) = ∫_Θ k(v; θ) G_0(dθ) = γ_1 γ_2 / [ v (v + γ_2) (γ_1 + log((v + γ_2)/v))^2 ],     (5)

with distribution function

F_V(v) = γ_1 / (γ_1 + log(1 + γ_2/v)),   v > 0.

This distribution has infinite mean, but information about the hyperparameters γ_1 and γ_2 can be derived by fixing the median m of V, thus resulting in a semiparametric m-median regression model, analogously to Kottas and Gelfand (2001). Therefore, in the paper, we assume γ_1 = log(1 + γ_2/m). On the other hand, the hyperparameter γ_2 controls the dispersion of V: the interquartile range of the marginal prior, as a function of m and γ_2, is

γ_2 [ ((1 + γ_2/m)^{1/3} − 1)^{−1} − ((1 + γ_2/m)^{3} − 1)^{−1} ],

which, for fixed m, increases with increasing γ_2. Observe that the 90% prior probability interval for V is

[ γ_2 / ((1 + γ_2/m)^{19} − 1),  γ_2 / ((1 + γ_2/m)^{1/19} − 1) ].

However, we point out that this choice for G_0 is not extremely flexible, since f_V is decreasing with an asymptote at 0 for any γ_1, γ_2, in such a way that it always puts an appreciable amount of mass near zero, even if γ_1 = log(1 + γ_2/m). A flat prior distribution is often imposed on the β regression parameters (Kuo and Mallick (1997), Hanson (2006)), regardless of the form of the kernel density k(·; θ).
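The centering choice above can be checked numerically. The sketch below (illustrative, not the authors' code) fixes γ_1 from the median m via γ_1 = log(1 + γ_2/m) and inverts F_V to obtain the quantiles behind the IQR and the 90% prior interval:

```python
import math

def gamma1_from_median(m, g2):
    # F_V(m) = 1/2  <=>  gamma_1 = log(1 + gamma_2 / m)
    return math.log1p(g2 / m)

def F_V(v, m, g2):
    """Marginal prior d.f. of V:  F_V(v) = g1 / (g1 + log(1 + g2/v))."""
    g1 = gamma1_from_median(m, g2)
    return g1 / (g1 + math.log1p(g2 / v))

def quantile_V(p, m, g2):
    """Invert F_V:  F_V(q) = p  <=>  q = g2 / (exp(g1 * (1-p)/p) - 1)."""
    g1 = gamma1_from_median(m, g2)
    return g2 / math.expm1(g1 * (1.0 - p) / p)

def iqr_V(m, g2):
    """Interquartile range of the marginal prior, as a function of (m, g2)."""
    return quantile_V(0.75, m, g2) - quantile_V(0.25, m, g2)
```

For m = 5.67 and γ_2 = 0.01 the 90% interval evaluates to approximately [0.29, 107.8], matching the corresponding entry of Table 1 below.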
We introduce instead the reparametrization α_j = e^{β_j}, j = 1, ..., p, and assign independent gamma priors to the α_j's. In this way, with gamma kernel densities, the full conditional posterior distributions of the α_j's associated with binary covariates are still gamma.

3 The algorithm

Bayesian inference is carried out through MCMC simulation. The algorithm for the AFT model is built upon one for the hierarchical mixture model by adding a module for updating the regression parameters. The desired output of the algorithm for the hierarchical mixture model is an estimate of the predictive density of a new observation V_{n+1}, given the data v_1, ..., v_n. This can be obtained as the ergodic average of the conditional densities of V_{n+1}, given θ^{(j)} = (θ^{(j)}_1, ..., θ^{(j)}_n) and the data, where {θ^{(j)}}_{j≥1} is the output of a Markov chain having f(θ | data) as its equilibrium distribution:

f̂_{V_{n+1}}(v | data) = (1/J) Σ_{j=1}^{J} f_{V_{n+1}}(v | data, θ^{(j)}).     (6)
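Once the Gibbs output is available, the ergodic average (6) is straightforward to assemble. The sketch below (illustrative, with generic placeholder arguments for the kernel and the marginal f_V) implements it for the DPM case, whose conditional density has the closed form derived next in (7):

```python
def dpm_cond_density(v, theta, a, f_V, kernel):
    """Conditional density of V_{n+1} given the data and theta under the DPM
    model:  (a * f_V(v) + sum_i k(v; theta_i)) / (a + n)."""
    n = len(theta)
    return (a * f_V(v) + sum(kernel(v, th) for th in theta)) / (a + n)

def predictive_density(v, theta_draws, a, f_V, kernel):
    """Ergodic average (6) over the J retained Gibbs draws theta^(j)."""
    J = len(theta_draws)
    return sum(dpm_cond_density(v, th, a, f_V, kernel) for th in theta_draws) / J
```

The N-IG case is analogous, with the conditional density replaced by the mixture in (8).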

The density f_{V_{n+1}}(v | data, θ) can be found by conditioning on G first, and then integrating G out with respect to its distribution given θ and the data, which leaves us with the following closed-form expressions:

f_{V_{n+1}}(v | data, θ) = (1/(a + n)) [ a f_V(v) + Σ_{i=1}^{n} k(v; θ_i) ]     (7)

for the DPM model, and

f_{V_{n+1}}(v | data, θ) = w_{0,n} f_V(v) + w_{1,n} Σ_{j=1}^{k} (n_j − 1/2) k(v; θ*_j)     (8)

for the N-IG mixture, with f_V(v) in (7) and (8) as given in (5). To produce the Markov sequence {θ^{(j)}}_{j≥1} with f(θ | data) as its stationary distribution, we use a Polya urn Gibbs sampler scheme such as Algorithm 2 in Neal (2000). The algorithm updates in turn, and componentwise, a vector of cluster indexes (c_1, ..., c_n), where c_i = c_j if θ_i = θ_j, and the vector of distinct values θ* = (θ*_1, ..., θ*_k). The cluster indexes are linked to the distinct values of the parameters by letting c_i = j if θ_i = θ*_j. The algorithms for the N-IG mixture model and for the DPM model are identical, except for the weights used for choosing the urn. For the moment, let us denote w_{i,n} by w_{i,n}(k), in order to stress the dependence on k. Let us also denote by n^{(−i)}_j the multiplicity of θ*_j after removing θ_i from θ (which produces θ^{(−i)}), by c^{(−i)} the corresponding vector of indexes, and by k^{(−i)} the dimension of θ*^{(−i)}. Then the full conditional of c_i is

P(c_i = j | c^{(−i)}, θ*^{(−i)}, v_i) = w_{1,n−1}(k^{(−i)}) (n^{(−i)}_j − 1/2) k(v_i; θ*^{(−i)}_j) / [ w_{0,n−1}(k^{(−i)}) a_i + w_{1,n−1}(k^{(−i)}) b_i ]

if j ≤ k^{(−i)}, and

P(c_i = k^{(−i)} + 1 | c^{(−i)}, θ*^{(−i)}, v_i) = w_{0,n−1}(k^{(−i)}) a_i / [ w_{0,n−1}(k^{(−i)}) a_i + w_{1,n−1}(k^{(−i)}) b_i ],

where a_i := f_V(v_i) and b_i := Σ_{j=1}^{k^{(−i)}} (n^{(−i)}_j − 1/2) k(v_i; θ*^{(−i)}_j). When c_i takes on a new value k^{(−i)} + 1, a corresponding new value θ_i must be sampled from the posterior distribution proportional to k(v_i; θ_i) g_0(θ_i), with the same method given by Hanson (2006) as Result 2, thus enlarging θ*. The full conditional of θ*_j is simply proportional to

g_0(θ*_j) Π_{i: c_i = j} k(v_i; θ*_j),

where g_0 is the density function associated with G_0.
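The cluster-index update just described can be sketched as follows. This is an illustrative implementation, not the authors' R code; it takes the N-IG weights w_{0,n−1}(k^{(−i)}) and w_{1,n−1}(k^{(−i)}) as precomputed inputs (the DPM analogue uses w0 = a/(a+n−1), w1 = 1/(a+n−1) and multiplicities n_j in place of n_j − 1/2):

```python
import random

def nig_cluster_probs(v_i, clusters, kernel, f_V_vi, w0, w1):
    """Unnormalized full conditional of c_i under the N-IG mixture:
      existing cluster j:  w1 * (n_j - 1/2) * k(v_i; theta_j*)
      new cluster:         w0 * f_V(v_i)
    `clusters` is a list of (n_j, theta_j*) pairs with observation i removed;
    w0, w1 are the weights w_{0,n-1}(k^(-i)) and w_{1,n-1}(k^(-i))."""
    probs = [w1 * (n_j - 0.5) * kernel(v_i, th) for n_j, th in clusters]
    probs.append(w0 * f_V_vi)      # last entry = open a new cluster
    return probs

def draw_index(probs, rng=random):
    """Draw an index with probability proportional to the entries of `probs`."""
    u = rng.random() * sum(probs)
    acc = 0.0
    for j, p in enumerate(probs):
        acc += p
        if u < acc:
            return j
    return len(probs) - 1
```

If draw_index returns len(clusters), a new θ* is then sampled from the density proportional to k(v_i; θ) g_0(θ), as in Result 2 of Hanson (2006) for the gamma kernel.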
The full conditional of θ*_j is easy to sample from when n_j = 1, by the above-mentioned Result 2, whereas it requires a Metropolis step otherwise. The extension of the Gibbs sampler to the AFT model is obtained by letting

v_i = e^{−x_i'β} t_i = t_i Π_{j=1}^{p} α_j^{−x_ij}

(with the reparametrization introduced in the previous section) and by updating the indexes and θ* as just described. The α_j parameters corresponding to binary covariates have a full conditional gamma distribution, whereas a Metropolis step is required for the remaining regression parameters. The actual state of the Markov chain is thus (c, θ*, α). In the survival analysis context, the predictive distribution for an individual with covariate value x is usually presented as the predictive survival function. Therefore, with data t_1, ..., t_n, instead of (6) one calculates

Ŝ_{T_{n+1}}(t | x, data) = (1/J) Σ_{j=1}^{J} S_{V_{n+1}}(e^{−x'β^{(j)}} t | data, θ^{(j)}).

The survival function S_{V_{n+1}}(· | data, θ) can be easily found by integrating (7) or (8). When censored survival times are present, one can proceed by augmenting the unknown parameter space with the unobserved survival times, enlarging θ and c accordingly. The unobserved survival times are sampled one at a time from their full conditional distributions, truncated at their respective censoring points.

4 Data illustrations

4.1 Simulated data for density estimation

We consider simulated data from a mixture of three gamma densities to perform density estimation. We generated a random sample of size 100 from the density 0.2 gamma(40, 20) + 0.6 gamma(6, 1) + 0.2 gamma(200, 20) (whose mean is 6 and variance is 10.12) and computed the posterior density estimates from the Dirichlet and the N-IG mixtures of gammas (the no-covariate AFT model). We assumed M = 0.01 and M = 5.39, and matched the Dirichlet process prior by equating the prior means of the number of components, obtaining a = 3.10 and a = 14.16, respectively (the latter corresponding to a prior mean of 30 components). We set the median m of V equal to 5.67 (i.e.
the true median), 56.7 and 0.57, and the hyperparameter γ_2 in G_0 equal to 0.01, 1 and 10, corresponding to the values of the IQRs and the 90% prior probability intervals listed in Table 1. For each choice of the hyperparameters we ran several MCMC chains and observed that they stabilize after around 2000 iterations. We then ran the chains for 50000

iterations, keeping the values every 50 iterations to reduce autocorrelation.

Figure 1: Histogram of the simulated data and density estimates under the Dirichlet process prior (dotted red) and the N-IG prior (dashed blue), for a common value of γ_2. Panel (a): a = 3.10, M = 0.01, m = 5.67; Panel (b): a = 14.16, M = 5.39, m = 5.67; Panel (c): a = 3.10, M = 0.01, m = 56.7; Panel (d): a = 14.16, M = 5.39, m = 56.7. In each graph the solid (green) line denotes the true density.

Table 1: IQR and 90% probability interval for the marginal prior of V (rows indexed by γ_2, columns by m; 90% intervals shown, with a dash where an entry is unavailable):

γ_2      m = 0.57            m = 5.67             m = 56.7
0.01     [0.03, 10.86]       [0.29, 107.82]       [2.98, – ]
1        [4·10^-9, 18.20]    [0.05, 116.48]       [2.54, – ]
10       [ – , 60.08]        [4·10^-8, 181.95]    [0.48, – ]

Table 2: Errors in the uniform metric for the simulated dataset of size 100 between the true and estimated distribution functions, for the combinations of m, M and a shown in Figure 1.

As a measure of the performance of the models, we computed the error in the uniform metric (EUM) between the true distribution function and the predictive distributions (under both priors). Figure 1 displays the true density, the histogram of the simulated data, and the density estimates under the Dirichlet and N-IG process priors for some values of the hyperparameters; in each graph the two estimates are indistinguishable. Table 2 presents the errors for the values of the hyperparameters as in Figure 1, confirming the impression gained from the graphs. To quantify the difference in estimating the true density for repeated samples and at a smaller sample size, we computed the mean error between the exact d.f. and the estimates. If F̂_T(· | T_1, ..., T_n) denotes the Bayesian estimate of the true F_T(·), based on a data sample (T_1, ..., T_n), and EUM(T_1, ..., T_n) := sup_t |F̂_T(t | T_1, ..., T_n) − F_T(t)|, we computed E_{T_1,...,T_n}(EUM(T_1, ..., T_n)) generating 200 samples of size n = 30. The mean errors (and the corresponding standard deviations) of the estimates for some choices of the hyperparameters are presented in Table 3.
The values obtained seem to confirm a substantial equivalence of the two nonparametric priors in density estimation. However, according to Figure 2, the posterior distribution of the number of clusters under the N-IG model (corresponding to the prior (4)) is more robust to the choice of the hyperparameters. Its modal values are indeed closer than those under the DPM model to the number of clusters in the true density, and more so when the prior expected number of clusters is farther from the true value (panels (b) and (d)).

Table 3: Mean errors (and corresponding standard deviations, in brackets) between the estimates and the true distribution for samples of size 30 from Example 1. The error is the distance, in the uniform metric, between distribution functions; columns give the N-IG and DIR results under E(K_n) = 6.2 and E(K_n) = 14.5, with rows indexed by γ_2 and m.

Figure 2: Posterior distributions of the number of clusters under the N-IG mixtures (solid blue) and the DPM model (dotted red) when γ_2 = 0.01, for the simulated dataset of size 100; panels (b) and (d) correspond to E(K_n) = 30.
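The simulation setup of Section 4.1 is easy to reproduce. The sketch below (illustrative; the authors worked in R) draws from the three-component gamma mixture, computes its exact moments (mean 6, variance 10.12), and evaluates the EUM between two distribution functions on a grid:

```python
import random

WEIGHTS = [0.2, 0.6, 0.2]
SHAPES  = [40.0, 6.0, 200.0]   # gamma(shape, rate) components
RATES   = [20.0, 1.0, 20.0]

def sample_mixture(n, rng=random):
    """Draw n values from 0.2 gamma(40,20) + 0.6 gamma(6,1) + 0.2 gamma(200,20)."""
    comps = rng.choices(range(3), weights=WEIGHTS, k=n)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    return [rng.gammavariate(SHAPES[j], 1.0 / RATES[j]) for j in comps]

def mixture_moments():
    """Exact mean and variance of the mixture."""
    mean = sum(w * a / b for w, a, b in zip(WEIGHTS, SHAPES, RATES))
    second = sum(w * (a / b ** 2 + (a / b) ** 2)
                 for w, a, b in zip(WEIGHTS, SHAPES, RATES))
    return mean, second - mean ** 2

def eum(F_hat, F_true, grid):
    """Error in the uniform metric, approximated on a grid of points."""
    return max(abs(F_hat(t) - F_true(t)) for t in grid)
```

In practice F_hat would be the predictive d.f. obtained by integrating the density estimate (6), and F_true the d.f. of the gamma mixture above.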

Figure 3: Estimated survival functions under the Dirichlet process prior (dotted red) and the N-IG prior (dashed blue) for two patients (covariate (1, 36) in the left column and covariate (0, 36) in the right column) from Example 2, when m = 2.436 and γ_2 = 1. In the first row a = 3.271 and M = 0.001 (E(K_n) = 12.4); in the second row M = 10 (E(K_n) = 41.9), with the matching value of a.

4.2 Dataset involving censoring

As a second example, we consider survival times (in thousands of days) from the small-cell lung cancer data with right censoring of Ying, Jung and Wei (1995), studied also in Walker and Mallick (1999), Yang (1999), Kottas and Gelfand (2001) and Hanson (2006). The standard therapy is a combination of etoposide (E) and cisplatin (P); however, the optimal sequencing and administration schedule had not been defined. The data consist of n = 121 survival times of patients with limited-stage small-cell lung cancer who were randomly assigned to two different regimens (Treatment A: P followed by E, administered to 62 patients; Treatment B: E followed by P, administered to 59 patients); moreover, 23 patients were administratively right-censored. In this case the covariates are x_i = (x_i1, x_i2), with x_i1 = 0 if patient i was assigned to Treatment A, and x_i2 denoting the patient's entry age. Figure 3 shows the estimated survival functions, under the two semiparametric Bayesian models considered here, for two patients and some values of the hyperparameters. Observe that, for high values of M and a (i.e. strong confidence in the prior), both estimates of the survival function show a bump near the origin. This is imputable to the choice of the marginal prior of V, which has a heavy asymptote at zero for every value of the hyperparameters.
The Bayesian estimates of α_1 and α_2, together with 90% credible intervals, for m = 2.436 and different values of γ_2, are presented in Table 4; the two models provide very similar results, which agree with the Bayesian estimates available in the literature. We ran the MCMC chains with a thinning of 100, after a burn-in period. To establish a predictive fit measure for the models considered, we adopted a cross-validation approach via the predictive densities of each non-censored survival time, given all the others; see Gelfand, Dey and Chang (1992). If ỹ_j, j = 1, ..., n, is the median (estimated from the Gibbs sampling procedure) of the predictive distribution of T_j given T^{(−j)}, and s_j is the estimated predictive median of |T_j − ỹ_j| given T^{(−j)}, then (t_j − ỹ_j)/s_j represents a standardized residual. A predictive fit index can then be defined as

I := Σ_{j ∈ S} |t_j − ỹ_j| / s_j,

where S denotes the index set of non-censored data in the sample. For m = 2.436, γ_2 = 1 and E(K_n) = 12.41, the index of predictive fit is smaller for the N-IG mixture than for the DPM model (I_DIR = 99.99), indicating a (slightly) better fit for the N-IG mixture; see Figure 4.

5 Comments

From a predictive point of view, the illustrative examples show that there is not a substantial difference between the DPM and N-IG process priors in mixture density estimation. However,

Table 4: Estimates of α_1 and α_2, with 90% credible intervals, for the small-cell lung cancer dataset under the N-IG mixture and DPM priors, when m = 2.436; row groups correspond to the three values of γ_2 considered.

N-IG     M = 0.001 (E(K_n) = 12.41)    M = 1 (E(K_n) = 19.31)        M = 10 (E(K_n) = 41.93)
α̂_1      1.149 (1.181, 1.819)          1.421 (1.124, 1.762)          1.409 (1.115, 1.782)
α̂_2      1.011 (1.000, 1.023)          1.017 (1.006, 1.028)          1.018 (1.007, 1.028)
α̂_1      1.533 (1.203, 1.903)          1.529 (1.201, 1.895)          1.522 (1.181, 1.926)
α̂_2      1.015 (1.005, 1.025)          1.015 (1.005, 1.025)          1.015 (1.006, 1.024)
α̂_1      1.521 (1.022, 1.042)          1.531 (1.168, 1.941)          1.516 (1.158, 1.967)
α̂_2      1.032 (1.022, 1.042)          1.031 (1.021, 1.041)          1.029 (1.021, 1.039)

DIR      a = 3.271 (E(K_n) = 12.41)    a = – (E(K_n) = 19.31)        a = – (E(K_n) = 41.93)
α̂_1      1.437 (1.149, 1.772)          1.409 (1.123, 1.753)          1.387 (1.096, 1.761)
α̂_2      1.014 (1.003, 1.025)          1.016 (1.005, 1.028)          1.021 (1.013, 1.029)
α̂_1      1.515 (1.174, 1.887)          1.512 (1.192, 1.869)          1.530 (1.156, 1.984)
α̂_2      1.016 (1.007, 1.025)          1.018 (1.009, 1.027)          1.020 (1.012, 1.028)
α̂_1      1.518 (1.150, 1.947)          1.521 (1.128, 1.943)          1.537 (1.123, 2.049)
α̂_2      1.032 (1.022, 1.043)          1.032 (1.023, 1.041)          1.033 (1.025, 1.042)

Figure 4: Standardized residuals against the continuous covariate for the small-cell lung cancer dataset under the N-IG mixture (blue) and DPM (red) priors, when m = 2.436.
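The cross-validation index of Section 4.2 is simple to compute once the predictive medians are available. A minimal sketch (illustrative, with hypothetical inputs):

```python
def predictive_fit_index(t, y_med, s, censored):
    """I = sum over non-censored j of |t_j - y~_j| / s_j, where y~_j and s_j
    are the cross-validated predictive medians of T_j and |T_j - y~_j|."""
    return sum(abs(tj - yj) / sj
               for tj, yj, sj, cj in zip(t, y_med, s, censored) if not cj)
```

Smaller values of I indicate better predictive fit; the index is comparable across the two priors because both use the same cross-validated residual scale.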

the N-IG process prior seems to be more effective in the estimation of the number of components in the mixture. In the regression example, the model with the N-IG process prior on the error fits the data a little better. However, a similar experiment on the Feigl and Zelen (1965) leukemia dataset (see Argiento, 2007) gives the opposite result. In both cases the difference in the predictive fit indexes is not dramatic, and it is difficult to choose between the two process priors based only on this. This uncertainty remains even if we consider that the result on the Feigl and Zelen dataset is influenced by a few abnormal observations, whose removal produces essentially equal predictive fit indexes. Although the N-IG process prior seems to be better than the DPM prior in finding clustering structures in the data, the computation of the weights (3) requires multiple-precision arithmetic, because of the presence of sums of several incomplete gamma functions. Therefore we did all the computations and MCMC simulations using R (R Development Core Team, 2006), but used Maple to set up a table with the necessary weights (which do not change during the simulations). An alternative to Maple is the PARI C library (The PARI Group, 2006), which can be used both at initialization and at run time, because C subroutines can be loaded into R. Given the availability of these multiple-precision computational tools, the calculation of the weights is not a serious concern.

Acknowledgments

The authors are indebted to Ramses Mena, who provided some code and precious suggestions. Moreover, the authors would like to thank Antonio Lijoi and Igor Pruenster for helpful discussions.

References

Argiento, R. (2007), Bayesian semiparametric inference for the accelerated failure time model, PhD Thesis, Bocconi University.
Cook, R. D. and Weisberg, S. (1982), Residuals and Influence in Regression, Chapman & Hall, London.
Feigl, P. and Zelen, M. (1965), Estimation of exponential survival probabilities with concomitant information, Biometrics, 21.
Gelfand, A. E., Dey, D. K. and Chang, H. (1992), Model determination using predictive distributions, with implementation via sampling-based methods (with discussion), in Bayesian Statistics 4, Proceedings of the Fourth Valencia International Meeting.
Gelfand, A. E. and Kottas, A. (2003), Bayesian semiparametric regression for median residual life, Scandinavian Journal of Statistics, 30.
Ghosh, S. K. and Ghosal, S. (2006), Semiparametric accelerated failure time models for censored data, in Bayesian Statistics and its Applications (S. K. Upadhyay et al., eds.), Anamaya Publishers, New Delhi.
Hanson, T. E. (2006), Modeling censored lifetime data using a mixture of gammas baseline, Bayesian Analysis, 1.
Kalbfleisch, J. D. and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data, Second Edition, Wiley, Hoboken, NJ.
Kottas, A. (2006), Nonparametric Bayesian survival analysis using mixtures of Weibull distributions, Journal of Statistical Planning and Inference, 136.
Kottas, A. and Gelfand, A. E. (2001), Bayesian semiparametric median regression modeling, Journal of the American Statistical Association, 96.
Kuo, L. and Mallick, B. (1997), Bayesian semiparametric inference for the accelerated failure-time model, The Canadian Journal of Statistics, 25.
Lijoi, A., Mena, R. H. and Pruenster, I. (2005), Hierarchical mixture modeling with normalized inverse-Gaussian priors, Journal of the American Statistical Association, 100.
Lijoi, A., Mena, R. H. and Pruenster, I. (2006), Controlling the reinforcement in Bayesian nonparametric mixture models, under review.
Lo, A. Y. (1984), On a class of Bayesian nonparametric estimates: I. Density estimates, The Annals of Statistics, 12.
Neal, R. M. (2000), Markov chain sampling methods for Dirichlet process mixture models, Journal of Computational and Graphical Statistics, 9.
R Development Core Team (2006), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
The PARI Group (2006), PARI/GP, version 2.3.1, Bordeaux.
Walker, S. and Mallick, B. K. (1999), A Bayesian semiparametric accelerated failure time model, Biometrics, 55.
Yang, S. (1999), Censored median regression using weighted empirical survival and hazard functions, Journal of the American Statistical Association, 94.
Ying, Z., Jung, S. H. and Wei, L. J. (1995), Survival analysis with median regression models, Journal of the American Statistical Association, 90.


More information

Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach

Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach Catalina A. Vallejos 1 Mark F.J. Steel 2 1 MRC Biostatistics Unit, EMBL-European Bioinformatics Institute. 2 Dept.

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information

A DDP Model for Survival Regression

A DDP Model for Survival Regression A DDP Model for Survival Regression Maria De Iorio, 1, Wesley O. Johnson, 2 Peter Müller 3 and Gary L. Rosner 3 1 Department of Epidemiology and Public Health, Imperial College London W2 1PG, U.K. 2 Department

More information

Nonparametric Bayesian Inference for Mean Residual. Life Functions in Survival Analysis

Nonparametric Bayesian Inference for Mean Residual. Life Functions in Survival Analysis Nonparametric Bayesian Inference for Mean Residual Life Functions in Survival Analysis VALERIE POYNOR 1 and ATHANASIOS KOTTAS 2 1 California State University, Fullerton 2 University of California, Santa

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Statistical Inference for Stochastic Epidemic Models

Statistical Inference for Stochastic Epidemic Models Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,

More information

Bayesian semiparametric modeling and inference with mixtures of symmetric distributions

Bayesian semiparametric modeling and inference with mixtures of symmetric distributions Bayesian semiparametric modeling and inference with mixtures of symmetric distributions Athanasios Kottas 1 and Gilbert W. Fellingham 2 1 Department of Applied Mathematics and Statistics, University of

More information

Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Flexible Regression Modeling using Bayesian Nonparametric Mixtures Flexible Regression Modeling using Bayesian Nonparametric Mixtures Athanasios Kottas Department of Applied Mathematics and Statistics University of California, Santa Cruz Department of Statistics Brigham

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

An Alternative Infinite Mixture Of Gaussian Process Experts

An Alternative Infinite Mixture Of Gaussian Process Experts An Alternative Infinite Mixture Of Gaussian Process Experts Edward Meeds and Simon Osindero Department of Computer Science University of Toronto Toronto, M5S 3G4 {ewm,osindero}@cs.toronto.edu Abstract

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis

Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis Weilan Yang wyang@stat.wisc.edu May. 2015 1 / 20 Background CATE i = E(Y i (Z 1 ) Y i (Z 0 ) X i ) 2 / 20 Background

More information

Slice Sampling Mixture Models

Slice Sampling Mixture Models Slice Sampling Mixture Models Maria Kalli, Jim E. Griffin & Stephen G. Walker Centre for Health Services Studies, University of Kent Institute of Mathematics, Statistics & Actuarial Science, University

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University September 28,

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Bayesian semiparametric modeling for stochastic precedence, with applications in epidemiology and survival analysis

Bayesian semiparametric modeling for stochastic precedence, with applications in epidemiology and survival analysis Bayesian semiparametric modeling for stochastic precedence, with applications in epidemiology and survival analysis ATHANASIOS KOTTAS Department of Applied Mathematics and Statistics, University of California,

More information

Bayesian Nonparametric Autoregressive Models via Latent Variable Representation

Bayesian Nonparametric Autoregressive Models via Latent Variable Representation Bayesian Nonparametric Autoregressive Models via Latent Variable Representation Maria De Iorio Yale-NUS College Dept of Statistical Science, University College London Collaborators: Lifeng Ye (UCL, London,

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Abstract INTRODUCTION

Abstract INTRODUCTION Nonparametric empirical Bayes for the Dirichlet process mixture model Jon D. McAuliffe David M. Blei Michael I. Jordan October 4, 2004 Abstract The Dirichlet process prior allows flexible nonparametric

More information

Bayesian Mixture Modeling of Significant P Values: A Meta-Analytic Method to Estimate the Degree of Contamination from H 0 : Supplemental Material

Bayesian Mixture Modeling of Significant P Values: A Meta-Analytic Method to Estimate the Degree of Contamination from H 0 : Supplemental Material Bayesian Mixture Modeling of Significant P Values: A Meta-Analytic Method to Estimate the Degree of Contamination from H 0 : Supplemental Material Quentin Frederik Gronau 1, Monique Duizer 1, Marjan Bakker

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

A Nonparametric Model-based Approach to Inference for Quantile Regression

A Nonparametric Model-based Approach to Inference for Quantile Regression A Nonparametric Model-based Approach to Inference for Quantile Regression Matthew Taddy and Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA

More information

Lecturer: David Blei Lecture #3 Scribes: Jordan Boyd-Graber and Francisco Pereira October 1, 2007

Lecturer: David Blei Lecture #3 Scribes: Jordan Boyd-Graber and Francisco Pereira October 1, 2007 COS 597C: Bayesian Nonparametrics Lecturer: David Blei Lecture # Scribes: Jordan Boyd-Graber and Francisco Pereira October, 7 Gibbs Sampling with a DP First, let s recapitulate the model that we re using.

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

FAV i R This paper is produced mechanically as part of FAViR. See for more information.

FAV i R This paper is produced mechanically as part of FAViR. See  for more information. Bayesian Claim Severity Part 2 Mixed Exponentials with Trend, Censoring, and Truncation By Benedict Escoto FAV i R This paper is produced mechanically as part of FAViR. See http://www.favir.net for more

More information

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

A Fully Nonparametric Modeling Approach to. BNP Binary Regression A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Multivariate spatial modeling

Multivariate spatial modeling Multivariate spatial modeling Point-referenced spatial data often come as multivariate measurements at each location Chapter 7: Multivariate Spatial Modeling p. 1/21 Multivariate spatial modeling Point-referenced

More information

Bayesian nonparametrics

Bayesian nonparametrics Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Dependent mixture models: clustering and borrowing information

Dependent mixture models: clustering and borrowing information ISSN 2279-9362 Dependent mixture models: clustering and borrowing information Antonio Lijoi Bernardo Nipoti Igor Pruenster No. 32 June 213 www.carloalberto.org/research/working-papers 213 by Antonio Lijoi,

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

A general mixed model approach for spatio-temporal regression data

A general mixed model approach for spatio-temporal regression data A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression

More information

Maximum Likelihood and Bayes Estimations under Generalized Order Statistics from Generalized Exponential Distribution

Maximum Likelihood and Bayes Estimations under Generalized Order Statistics from Generalized Exponential Distribution Applied Mathematical Sciences, Vol. 6, 2012, no. 49, 2431-2444 Maximum Likelihood and Bayes Estimations under Generalized Order Statistics from Generalized Exponential Distribution Saieed F. Ateya Mathematics

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Bayesian nonparametric models for bipartite graphs

Bayesian nonparametric models for bipartite graphs Bayesian nonparametric models for bipartite graphs François Caron Department of Statistics, Oxford Statistics Colloquium, Harvard University November 11, 2013 F. Caron 1 / 27 Bipartite networks Readers/Customers

More information

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

Infinite Mixtures of Gaussian Process Experts

Infinite Mixtures of Gaussian Process Experts in Advances in Neural Information Processing Systems 14, MIT Press (22). Infinite Mixtures of Gaussian Process Experts Carl Edward Rasmussen and Zoubin Ghahramani Gatsby Computational Neuroscience Unit

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008 Ages of stellar populations from color-magnitude diagrams Paul Baines Department of Statistics Harvard University September 30, 2008 Context & Example Welcome! Today we will look at using hierarchical

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013 BSquare: An R package for Bayesian simultaneous quantile regression Luke B Smith and Brian J Reich North Carolina State University May 21, 2013 BSquare in an R package to conduct Bayesian quantile regression

More information

Introduction to BGPhazard

Introduction to BGPhazard Introduction to BGPhazard José A. García-Bueno and Luis E. Nieto-Barajas February 11, 2016 Abstract We present BGPhazard, an R package which computes hazard rates from a Bayesian nonparametric view. This

More information