The Annals of Statistics 2003, Vol. 31, No. 2, 493–511
© Institute of Mathematical Statistics, 2003

BAYESIAN ANALYSIS OF PROPORTIONAL HAZARD MODELS

BY YONGDAI KIM AND JAEYONG LEE

Ewha Womans University and Pennsylvania State University

This paper is concerned with Bayesian analysis of the proportional hazard model with left truncated and right censored data. We use a process neutral to the right as the prior of the baseline survival function, and a finite-dimensional prior is placed on the regression coefficient. We then obtain the exact form of the joint posterior distribution of the regression coefficient and the baseline cumulative hazard function. As a by-product, we prove the propriety of the posterior distribution with the constant prior on the regression coefficient.

Received February 2001; revised May 2002. Supported in part by KOSEF through the Statistical Research Center for Complex Systems at Seoul National University.
AMS 2000 subject classifications. Primary 62C10; secondary 60G55.
Key words and phrases. Bayesian analysis, proportional hazard model, left truncation, censoring, neutral to the right process, propriety of posterior.

1. Introduction. This paper is concerned with Bayesian inference for the proportional hazard model when observations are both left truncated and right censored (LTRC). Bayesian analysis of the proportional hazard model with right censored data has been studied, for example, by Kalbfleisch (1978), Hjort (1990) and Laud, Damien and Smith (1998). However, their results cannot be extended directly to LTRC data. To the best of our knowledge, even with the popular prior processes (gamma and beta), a Markov chain Monte Carlo (MCMC) algorithm for LTRC data is not available.

The main contribution of this paper is the derivation of the posterior distribution in closed form for LTRC survival data. There are two important consequences. First, any Markov chain Monte Carlo algorithm for right censored data can be modified to implement a Bayesian analysis of LTRC data. Second, the theoretical study of the posterior distribution for LTRC data becomes simpler because of the availability of its closed form. In particular, we prove the propriety of the posterior when an improper constant prior is used for the regression coefficient. To our knowledge, this result is the first propriety result concerning prior processes neutral to the right.

Our proof also has novel features. We derive the posterior distribution by embedding the LTRC data into a counting process model. Because it handles left truncated data easily, this approach, using counting processes, has various advantages over earlier approaches. First, the proof is much simpler and more systematic. Kalbfleisch (1978) used gamma process priors for the log of the survival function and obtained the posterior distribution by deriving finite-dimensional distributions of the posterior process. The proof involves nontrivial

details, particularly when there are ties among the observations. Our approach handles ties in a consistent way. Second, the posterior distribution can be derived for a much wider class of priors. While the previous results require specific knowledge of a prior for the baseline distribution, such as a gamma or beta process, our approach requires only that the prior of the baseline distribution be a process neutral to the right. In Example 1 of Section 3, Kalbfleisch's result is deduced from the main theorem of this paper. In Example 2 of the same section, we derive the posterior distribution of the regression coefficient with beta process priors, which extends Hjort (1990) to tied observations. Laud, Damien and Smith (1998) implemented an MCMC algorithm for the proportional hazard model with beta process priors. However, the derivation of the conditional posterior distribution of the cumulative hazard function given the regression coefficients, which is used in their algorithm, is not available in the literature. Our results justify its use.

The paper is organized as follows. Section 2 introduces the model and the class of priors. Section 3 presents the posterior distribution. Section 4 gives the propriety result. Sections 5 and 6 give the proofs of the results.

2. Proportional hazard models and processes neutral to the right. In this section, we introduce the proportional hazard model for LTRC data and review prior processes neutral to the right for the baseline survival function. We begin by modelling the complete data and then introduce the truncation and censoring mechanisms.

The postulates of the proportional hazard model are as follows. Let $X_1, X_2, \ldots$ be survival times with covariates $Z_1, Z_2, \ldots$, where $Z_i \in R^p$. Suppose the distribution $F_i$ of $X_i$ with covariate $Z_i$ is given by
$$1 - F_i(t) = (1 - F(t))^{\exp(\beta^T Z_i)},$$
where $\beta \in R^p$ is the unknown regression coefficient and $F$ is the unknown baseline distribution function. The survival times are only partially observed due to the presence of truncation and censoring variables $(W_i, C_i)$, which are assumed to be independent random vectors independent of the $X_i$'s. Let $T_i = \min(X_i, C_i)$ and $\delta_i = I(X_i \le C_i)$. In the presence of LTRC data, one observes $(T_i, \delta_i)$ only when $T_i > W_i$. Finally, the data consist of $n$ copies of $(T_i, \delta_i, W_i, Z_i)$ with $T_i > W_i$. By setting $W_i = 0$, data subject to only right censoring can be treated in this framework.
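[To make the sampling scheme concrete, the following sketch simulates LTRC data from the model above. The exponential baseline and the uniform truncation and exponential censoring laws are illustrative assumptions of this sketch only; the paper does not fix them.]

```python
# Sketch: simulate left truncated, right censored (LTRC) data from the
# proportional hazard model 1 - F_i(t) = (1 - F(t))^{exp(beta^T Z_i)}.
# Baseline, censoring and truncation laws below are assumptions, not
# choices made in the paper.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ltrc(n, beta, base_rate=1.0):
    """Draw (T_i, delta_i, W_i, Z_i) for subjects satisfying T_i > W_i."""
    beta = np.atleast_1d(beta)
    data = []
    while len(data) < n:
        Z = rng.normal(size=beta.size)
        # Under the PH model an exponential baseline stays exponential,
        # with the rate multiplied by exp(beta^T Z).
        X = rng.exponential(1.0 / (base_rate * np.exp(beta @ Z)))
        C = rng.exponential(2.0)        # right censoring time (assumed law)
        W = rng.uniform(0.0, 0.3)       # left truncation time (assumed law)
        T, delta = min(X, C), float(X <= C)
        if T > W:                       # subject is observed only if T_i > W_i
            data.append((T, delta, W, Z))
    T, delta, W = (np.array([d[k] for d in data]) for k in range(3))
    Z = np.vstack([d[3] for d in data])
    return T, delta, W, Z

T, delta, W, Z = simulate_ltrc(200, beta=[0.5, -1.0])
print(f"censoring rate: {1 - delta.mean():.2f}")
```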

There are two parameters in the proportional hazard model: the regression coefficient $\beta$ and the baseline cumulative distribution function (c.d.f.) $F$. We use a process neutral to the right [Doksum (1974)] for $F$ and a finite-dimensional distribution with a density for $\beta$. Processes neutral to the right include popular prior processes such as Dirichlet processes [Ferguson (1973)], gamma processes [Kalbfleisch (1978); Lo (1982)] and beta processes [Hjort (1990)].

Before deriving the posterior distribution, we review several features of processes neutral to the right. A random distribution function $F$ is a process neutral to the right if $F(t) = 1 - \exp(-Y(t))$, where $Y(t)$ is a nondecreasing Lévy process with $Y(0) = 0$ and $\lim_{t\to\infty} Y(t) = \infty$ with probability 1 [Doksum (1974)]. We can redefine it in terms of a cumulative hazard function (c.h.f.), taking the approach initiated by Hjort (1990). Let $A$ be the c.h.f. of $F$,
$$A(t) = \int_0^t (1 - F(s-))^{-1}\, dF(s).$$
Then it can be shown that $F$ is a process neutral to the right if and only if $A$ is a nondecreasing Lévy process such that $A(0) = 0$, $\Delta A(t) \le 1$ for all $t$ with probability 1, and either $\Delta A(t) = 1$ for some $t > 0$ or $\lim_{t\to\infty} A(t) = \infty$ a.s. In what follows, we simply use the term Lévy process for a prior process of the c.h.f. $A$ that induces a process neutral to the right on $F$.

For any given Lévy process $A(t)$ on $[0,\infty)$, there exists a unique random measure $\mu$ on $[0,\infty)\times[0,1]$ such that
(1)
$$A(t) = \int_{[0,t]\times[0,1]} x\,\mu(ds, dx),$$
where $\mu$ is defined by
(2)
$$\mu([0,t]\times B) = \sum_{s\le t} I\big(\Delta A(s) \in B\big)$$
for any Borel subset $B$ of $[0,1]$ and for all $t > 0$. Since $\mu$ is a Poisson random measure [Jacod and Shiryaev (1987), page 70], there exists a unique $\sigma$-finite measure $\nu$ on $[0,\infty)\times[0,1]$ such that $E(\mu([0,t]\times B)) = \nu([0,t]\times B)$. Conversely, for a given $\sigma$-finite measure $\nu$ with $\int_0^t\int_0^1 x\,\nu(ds,dx) < \infty$ for all $t > 0$, we can construct a Lévy process through (1). Thus, we can conveniently characterize a Lévy process by $\nu$, which we call the Lévy measure of $A$ (or $F$). The expectation of $A(t)$ is also neatly expressed as
(3)
$$E(A(t)) = \int_0^t\int_0^1 x\,\nu(ds, dx).$$
The relationship between Lévy processes and Poisson random measures is well known in probability theory. For its complete treatment, see Jacod and Shiryaev (1987). For their application to Bayesian survival models, which is relatively new, refer to Kim (1999) and Kim and Lee (2001).

3. The posterior distribution. A priori, let the baseline c.d.f. $F$ be a process neutral to the right with a Lévy measure $\nu$ of the form $\nu(dt,dx) = f_t(x)\,dx\,dt$ for $x\in[0,1]$, and let $\pi(\beta)$ be the prior density function for $\beta$. Let $q_n$ be the number of distinct uncensored observations and let $t_1 < t_2 < \cdots < t_{q_n}$ be the ordered distinct uncensored observations. Define $D_n(t) = \{i: T_i = t,\ \delta_i = 1,\ i = 1,\ldots,n\}$, $R_n(t) = \{i: W_i < t \le T_i,\ i = 1,\ldots,n\}$ and $R_n^+(t) = R_n(t)\setminus D_n(t)$. Thus, $D_n(t)$ and $R_n(t)$ are the sets of indices of the failures at time $t$ and of the observations at risk at time $t$, respectively.
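[The sets just defined are straightforward to compute. A minimal sketch, reusing the variable names of the simulation above (an assumption of these notes, not of the paper):]

```python
# Sketch: the death and at-risk sets of Section 3, computed from LTRC data.
# D_n(t) = {i : T_i = t, delta_i = 1},  R_n(t) = {i : W_i < t <= T_i},
# R_n^+(t) = R_n(t) \ D_n(t).
import numpy as np

def risk_sets(T, delta, W):
    """Return t_1 < ... < t_{q_n} and, for each t_i, the index sets
    (D_n(t_i), R_n(t_i), R_n^+(t_i))."""
    t_uncensored = np.unique(T[delta == 1])          # the q_n distinct death times
    sets = []
    for t in t_uncensored:
        D = set(np.flatnonzero((T == t) & (delta == 1)))
        R = set(np.flatnonzero((W < t) & (t <= T)))  # left truncation enters here
        sets.append((D, R, R - D))
    return t_uncensored, sets
```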

The next theorem is the main result of the paper and is proven in Section 5.

THEOREM 3.1. Let the observations be denoted by $D_n = ((T_1,\delta_1,W_1,Z_1),\ldots,(T_n,\delta_n,W_n,Z_n))$.

(i) Conditional on $\beta$ and $D_n$, the posterior distribution of $F$ is a process neutral to the right with Lévy measure
(4)
$$\nu(dt,dx\mid\beta,D_n) = (1-x)^{\sum_{j\in R_n(t)}\exp(\beta^TZ_j)}\,f_t(x)\,dx\,dt + \sum_{i=1}^{q_n} dH_i(x\mid\beta)\,\delta_{t_i}(dt),$$
where $\delta_a$ is the degenerate probability measure at $a$ and $H_i(\cdot\mid\beta)$ is a probability measure on $[0,1]$ with density proportional to
(5)
$$h_i(x\mid\beta) = \prod_{j\in D_n(t_i)}\big[1-(1-x)^{\exp(\beta^TZ_j)}\big]\,(1-x)^{\sum_{j\in R_n^+(t_i)}\exp(\beta^TZ_j)}\,f_{t_i}(x).$$

(ii) The marginal posterior distribution of $\beta$ is
(6)
$$\pi(\beta\mid D_n) \propto e^{-\rho(\beta)}\prod_{i=1}^{q_n}\int_0^1 h_i(x\mid\beta)\,dx\;\pi(\beta),$$
where
$$\rho(\beta) = \sum_{i=1}^n\int_{W_i}^{T_i}\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_i)}\big)(1-x)^{\sum_{j=1}^{i-1}Y_j(t)\exp(\beta^TZ_j)}\,f_t(x)\,dx\,dt,$$
$Y_j(t) = I(W_j < t\le T_j)$ for $j = 1,\ldots,n$, and $\sum_{j=1}^{i-1}Y_j(t)\exp(\beta^TZ_j) = 0$ when $i = 1$.

REMARK. Setting $Z_j = 0$ for all $j$, we can deduce the posterior of the c.h.f. when there are no covariates, which coincides with the results in Hjort (1990) and Kim (1999).

REMARK. The marginal posterior density of $\beta$ given in (6) has the form $L(\beta)\pi(\beta)$, where $L(\beta) = \exp(-\rho(\beta))\prod_{i=1}^{q_n}\int_0^1 h_i(x\mid\beta)\,dx$. Thus, $L(\beta)$ is the integrated likelihood of $\beta$ with the c.h.f. integrated out. This result can be used for the computation of posterior model probabilities in selecting a subset of appropriate covariates or in Bayesian model averaging.

REMARK. For only right censored observations, the posterior distribution can be obtained from Theorem 3.1 by letting $W_i = 0$ for all $i$.
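[As a numerical illustration of the second Remark, the sketch below evaluates $\log L(\beta)$ for the beta process prior of Example 2 below, with constant $c(t) = c$ and $A_0(t) = a_0 t$; these hyperparameter choices are conveniences of the sketch, not steps taken in the paper. The inner $x$-integral of $\rho(\beta)$ uses the classical identity $\int_0^1 (1-(1-x)^r)\,x^{-1}(1-x)^{s+c-1}\,dx = \psi(r+s+c)-\psi(s+c)$, with $\psi$ the digamma function; the tied jump integrals are done by quadrature.]

```python
# Sketch: numerical evaluation of the integrated likelihood L(beta) of the
# second Remark, for a beta process prior with constant c(t) = c and
# A_0(t) = a0 * t (assumptions of this sketch).
import numpy as np
from scipy.special import digamma
from scipy.integrate import quad

def log_L(beta, T, delta, W, Z, c=1.0, a0=1.0):
    beta = np.atleast_1d(beta)
    r = np.exp(Z @ beta)                  # r_i = exp(beta^T Z_i)
    n = len(T)
    # rho(beta): the x-integral is closed-form,
    #   int_0^1 (1-(1-x)^r)(c/x)(1-x)^(s+c-1) dx = c*(digamma(r+s+c)-digamma(s+c)),
    # and s(t) = sum_{j<i} Y_j(t) r_j is a step function of t, constant
    # between consecutive points of {W_j, T_j}; the t-integral is a finite sum.
    rho = 0.0
    for i in range(n):
        cuts = [W[i], T[i]] + [u for j in range(i) for u in (W[j], T[j])
                               if W[i] < u < T[i]]
        brk = np.unique(cuts)
        for lo, hi in zip(brk[:-1], brk[1:]):
            t = 0.5 * (lo + hi)           # midpoint: s(t) is constant on (lo, hi)
            s = sum(r[j] for j in range(i) if W[j] < t <= T[j])
            rho += a0 * c * (digamma(r[i] + s + c) - digamma(s + c)) * (hi - lo)
    # log of prod_i int_0^1 h_i(x | beta) dx; the tie product is handled by
    # one-dimensional quadrature.
    log_jumps = 0.0
    for t in np.unique(T[delta == 1]):
        D = (T == t) & (delta == 1)
        Rp = (W < t) & (t <= T) & ~D
        s = r[Rp].sum()
        integrand = lambda x: (np.prod(1 - (1 - x) ** r[D]) / x
                               * (1 - x) ** (s + c - 1))
        val, _ = quad(integrand, 0.0, 1.0)
        log_jumps += np.log(a0 * c * val)
    return -rho + log_jumps
```

[Since $\pi(\beta\mid D_n) \propto \exp(\log L(\beta))\,\pi(\beta)$, such a function can be handed to any off-the-shelf MCMC routine over $\beta$ alone, with the c.h.f. already integrated out.]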

In the next two examples of the application of Theorem 3.1, we derive the posterior distributions for two well-known families of prior processes: gamma and beta processes.

EXAMPLE 1 (Gamma process prior). A priori, assume that $Y(t) = -\log(1-F(t))$ is a gamma process with parameters $(c(t), A_0(t))$. Here, the gamma process with parameters $(c(t), A_0(t))$ is defined to be a Lévy process whose log moment generating function is
$$\log E\big(\exp(-\theta Y(t))\big) = \int_0^t\int_0^\infty (e^{-\theta x}-1)\,\frac{c(s)}{x}\exp(-c(s)x)\,dx\,dA_0(s),$$
provided $A_0(t)$ is continuous. This class of prior processes was proposed by Lo (1982). Kalbfleisch (1978) used a subclass of these processes with constant $c(t)$ for the proportional hazard model. The cumulative hazard function $A$ of the gamma process is a Lévy process with a Lévy measure $\nu$ given by
$$\nu([0,t]\times B) = \int_0^t\int_B \frac{-1}{\log(1-x)}\,c(s)(1-x)^{c(s)-1}\,dx\,dA_0(s).$$
Therefore, from Theorem 3.1(i), we can establish that the posterior distribution of the c.h.f. given $\beta$ is a Lévy process with Lévy measure
$$\nu(dt,dx\mid\beta,D_n) = \frac{-c(t)}{\log(1-x)}(1-x)^{\sum_{j\in R_n(t)}\exp(\beta^TZ_j)+c(t)-1}\,dx\,dA_0(t) + \sum_{i=1}^{q_n} dH_i(x\mid\beta)\,\delta_{t_i}(dt),$$
where
$$dH_i(x\mid\beta) \propto \frac{-1}{\log(1-x)}\prod_{j\in D_n(t_i)}\big[1-(1-x)^{\exp(\beta^TZ_j)}\big]\,(1-x)^{\sum_{j\in R_n^+(t_i)}\exp(\beta^TZ_j)+c(t_i)-1}\,dx.$$
Even though the continuous part of $Y$ is a gamma process, the gamma process prior is not conjugate, because the distributions of the jumps are not gamma distributions. This fact was overlooked by Clayton (1991), who used a gamma process in implementing an MCMC algorithm for the frailty model. In the case of constant $c(t)$ without left truncation, our result coincides with that of Kalbfleisch (1978).
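[The non-conjugacy just noted can be seen numerically: the jump density $dH_i(x\mid\beta)$ is not of gamma form, but it is one-dimensional and easily normalized on a grid, which is one practical way to sample it. In the sketch below, c_ti, r_D and sum_Rplus are illustrative stand-ins for $c(t_i)$, $\{\exp(\beta^TZ_j): j\in D_n(t_i)\}$ and $\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)$.]

```python
# Sketch: grid normalization and inverse-CDF sampling of the posterior jump
# density dH_i(x | beta) under the gamma process prior.  Inputs are
# illustrative stand-ins, not values from the paper.
import numpy as np

def gamma_jump_density(x, c_ti, r_D, sum_Rplus):
    """Unnormalized h_i(x | beta) for the gamma process prior."""
    tie_prod = np.prod(1.0 - (1.0 - x[:, None]) ** np.asarray(r_D), axis=1)
    return (-1.0 / np.log1p(-x)) * tie_prod \
           * (1.0 - x) ** (sum_Rplus + c_ti - 1.0)

x = np.linspace(1e-6, 1 - 1e-6, 100_000)
h = gamma_jump_density(x, c_ti=2.0, r_D=[1.5], sum_Rplus=4.0)
dx = x[1] - x[0]
pdf = h / (h.sum() * dx)                  # normalize on the grid
cdf = np.cumsum(pdf) * dx
u = np.random.default_rng(1).uniform(size=5)
print(np.interp(u, cdf, x))               # inverse-CDF draws from H_i
```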

EXAMPLE 2 (Beta process prior). The beta process with mean $A_0$ and scale parameter $c$ is a Lévy process with Lévy measure
$$\nu(dt,dx) = c(t)\,x^{-1}(1-x)^{c(t)-1}\,dx\,dA_0(t).$$
If the prior of the c.h.f. $A$ is the beta process given above, the posterior of $A$ given $\beta$ is a Lévy process with Lévy measure
$$\nu(dt,dx\mid\beta,D_n) = \frac{c(t)}{x}(1-x)^{\sum_{j\in R_n(t)}\exp(\beta^TZ_j)+c(t)-1}\,dx\,dA_0(t) + \sum_{i=1}^{q_n} dH_i(x\mid\beta)\,\delta_{t_i}(dt),$$
where
$$dH_i(x\mid\beta) \propto \frac{1}{x}\prod_{j\in D_n(t_i)}\big[1-(1-x)^{\exp(\beta^TZ_j)}\big]\,(1-x)^{\sum_{j\in R_n^+(t_i)}\exp(\beta^TZ_j)+c(t_i)-1}\,dx.$$
Also, the marginal posterior distribution of $\beta$ is given by (6) with $h_i(x\mid\beta)\,dx = dH_i(x\mid\beta)$ and
$$\rho(\beta) = \sum_{i=1}^n\int_{W_i}^{T_i}\int_0^1\frac{c(t)}{x}\big(1-(1-x)^{\exp(\beta^TZ_i)}\big)(1-x)^{\sum_{j=1}^{i-1}Y_j(t)\exp(\beta^TZ_j)+c(t)-1}\,dx\,dA_0(t).$$
Hjort (1990) derived the posterior distribution with a beta process prior when there are no ties and observations are subject to only right censoring. This example extends his results to tied observations.

With the result given above, we can devise a Markov chain Monte Carlo algorithm for left truncated and right censored data. Based on the idea that a positive increasing Lévy process is an integral of a Poisson random measure, Lee and Kim (2002) developed an approximate algorithm to generate a beta process and illustrated their algorithm with the proportional hazard model when the observations are subject to right censoring only. The algorithm can be used verbatim for left truncated and right censored data, except that the risk set $R_n(t)$ is adjusted for left truncation, that is, when the $W_i$'s are not 0.
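[A minimal sketch of the Poisson-random-measure idea mentioned above, assuming constant $c$, $A_0(t) = a_0 t$ and an $\varepsilon$-truncation of small jumps. This is only a crude stand-in for the approximation of Lee and Kim (2002), whose algorithm is not reproduced here.]

```python
# Sketch: simulate an approximate beta process path A(t) on [0, tau] from
# its Poisson random measure representation (1), keeping only jumps x > eps.
import numpy as np

def beta_process_path(tau, c=1.0, a0=1.0, eps=1e-4, grid=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(eps, 1 - 1e-12, grid)
    levy = c / x * (1 - x) ** (c - 1)       # Levy density in x, per unit of A_0
    dx = x[1] - x[0]
    mass = a0 * tau * levy.sum() * dx       # expected number of jumps > eps
    n_jumps = rng.poisson(mass)
    times = rng.uniform(0, tau, n_jumps)    # jump times, uniform under dA_0 = a0 dt
    cdf = np.cumsum(levy) / levy.sum()      # grid CDF of the jump-size density
    sizes = np.interp(rng.uniform(size=n_jumps), cdf, x)
    order = np.argsort(times)
    return times[order], np.cumsum(sizes[order])   # jump times t_k and A(t_k)

t, A = beta_process_path(tau=3.0)           # E[A(tau)] is about a0 * tau here
```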

4. Propriety of the posterior. In this section, we consider the propriety of the posterior distribution with the constant improper prior on $\beta$. For a set of vectors $\{x_1,\ldots,x_m\}$ in $R^p$, a conical (nonnegative linear) combination is a point $x = \sum_{j=1}^m \lambda_jx_j$, where $\lambda_j \ge 0$ for $j = 1,\ldots,m$. For a set $A$ in $R^p$, the conical hull of $A$ is the collection of all conical combinations of vectors from $A$, or
$$\mathrm{coni}(A) = \Big\{\sum_{j=1}^m \lambda_jx_j : x_j\in A,\ \lambda_j\ge 0\ \text{and}\ m\ \text{is a positive integer}\Big\}.$$
We assume the following conditions on the prior process of the baseline c.h.f.

A1. There exists a positive number $\varsigma_1$ such that
$$\sup_{t\in[0,\tau],\,x\in[0,1]} x\,f_t(x)\,(1-x)^{1-\varsigma_1}\ (= M_1) < \infty,$$
where $\tau = \max\{T_1,\ldots,T_n\}$.

A2. There exist positive constants $M_2$ and $\varsigma_2$ and a positive function $a_0(t)$ continuous on $(0,\tau)$ such that $x\,f_t(x) \ge M_2(1-x)^{\varsigma_2-1}a_0(t)$ for all $x\in[0,1]$ and $t\le\tau$.

THEOREM 4.1. Let $B = \{Z_j - Z_k : i = 1,\ldots,q_n,\ j\in D_n(t_i),\ k\in R_n^+(t_i)\}$. Assume that A1 and A2 hold. If $\mathrm{coni}(B) = R^p$, the posterior distribution with a constant prior on the regression coefficients is proper.

The sketch of the proof of Theorem 4.1 is as follows. Conditions A1 and A2 yield that the marginal posterior distribution of $\beta$ with a flat constant prior is bounded by $J\bigwedge_{z\in B}(e^{\beta^Tz}\wedge 1)$ for some positive constant $J$, where $\bigwedge_i a_i$ is the minimum value of the $a_i$'s. Thus, $B$ is the set of vectors for which the likelihood decreases exponentially along any direction $\beta$ with $\beta^Tz < 0$. The condition $\mathrm{coni}(B) = R^p$ implies that the likelihood decreases exponentially in every direction. Consequently, integrability is guaranteed. The detailed proof of Theorem 4.1 is presented in Section 6.

REMARK. The condition $\mathrm{coni}(B) = R^p$ is equivalent to the uniqueness of the maximum likelihood estimator and to the log-concavity of the partial likelihood; see Jacobsen (1989) and Andersen, Borgan, Gill and Keiding (1993). The condition can be seen as a counterpart of the linear independence of the covariates in linear regression models.

Consider the gamma process prior in Example 1. Suppose $A_0(t) = \int_0^t a_0(s)\,ds$ for all $t > 0$, where $a_0(t)$ is positive, continuous and bounded on $(0,\tau]$. If $0 < \inf_{t\in[0,\tau]}c(t) \le \sup_{t\in[0,\tau]}c(t) < \infty$, then assumptions A1 and A2 are satisfied by any positive constants $\varsigma_1$ and $\varsigma_2$ such that $\varsigma_1 < \inf_{t\in[0,\tau]}c(t)$ and $\varsigma_2 > \sup_{t\in[0,\tau]}c(t)$. Hence, we can use the constant improper prior on $\beta$ as long as $\mathrm{coni}(B) = R^p$ holds. Similar arguments hold for the beta process prior in Example 2.
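[The condition $\mathrm{coni}(B) = R^p$ can be checked numerically: a convex cone equals $R^p$ exactly when it contains $+e_j$ and $-e_j$ for every coordinate direction, and each such membership is a linear feasibility problem. This mirrors the representation $e_1 = \sum_i c_iz_i$ used in the proof in Section 6. A sketch with scipy:]

```python
# Sketch: checking coni(B) = R^p.  Each membership "target = sum_i l_i z_i,
# l_i >= 0" is a linear program with zero objective; feasibility for all
# +-e_j is equivalent to the cone being all of R^p.
import numpy as np
from scipy.optimize import linprog

def coni_spans_Rp(B):
    """B: (m, p) array whose rows are the vectors Z_j - Z_k of Theorem 4.1."""
    m, p = B.shape
    for j in range(p):
        for sign in (1.0, -1.0):
            target = np.zeros(p)
            target[j] = sign
            res = linprog(c=np.zeros(m), A_eq=B.T, b_eq=target,
                          bounds=[(0, None)] * m, method="highs")
            if res.status != 0:   # infeasible: +-e_j is not a conical combination
                return False
    return True

B = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
print(coni_spans_Rp(B))          # True: these three vectors positively span R^2
```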

5. Proof of Theorem 3.1. We prove Theorem 3.1 in this section. The proofs of parts (i) and (ii) of the theorem are given in the following two subsections. Throughout this section, the probability $P$ refers to the joint probability measure of the sampling distribution and the prior, and $E(\cdot)$ refers to its expectation.

Define counting processes $N_i$ and $Y_i$ on $[0,\infty)$ by $N_i(t) = I(T_i\le t,\ \delta_i = 1)$ and $Y_i(t) = I(W_i < t\le T_i)$. Let $\mathcal F_t = \sigma(N_1(s),\ldots,N_n(s),\ s\le t)$. Then the compensator $\Lambda_i$ of $N_i$ with respect to $\mathcal F_t$, conditional on $\beta$ and $F$, is $\Lambda_i(t) = \int_0^t Y_i(s)\,dA_i(s)$, where $A_i(t)$ is the cumulative hazard function of $F_i$. See Fleming and Harrington (1991) or Andersen, Borgan, Gill and Keiding (1993) for the theory of counting processes. Since the truncation and censoring times are assumed to be independent of the survival times, the posterior distribution conditional on $D_n$ is the same as that conditional on $\mathbf N_n = (N_1,\ldots,N_n)$ and $\mathbf Z_n = (Z_1,\ldots,Z_n)$. In what follows, we derive the posterior distribution conditioning on $\mathbf N_n$ and $\mathbf Z_n$, and we drop $\mathbf Z_n$ from the formulas unless there is a possibility of confusion.

5.1. The posterior distribution of $F$ given $\beta$ and the data. In this subsection, we prove part (i) of Theorem 3.1. Noting that $A_i(t)$ is also a Lévy process, we derive the posterior distribution of $A$ (and, consequently, of $F$) by first deriving the posterior distribution of $A_i$ using Theorem 3.2 in Kim (1999) and then inverting it to $A$ using Theorem A.1 in the Appendix.

Suppose the Lévy measure $\nu$ of $F$ is given by
(7)
$$\nu(dt,dx) = f_t(x)\,dx\,dt + \sum_{j=1}^l dG_j(x)\,\delta_{v_j}(dt),$$
where, for each $t$, $f_t(x)\,dx$ is a $\sigma$-finite measure on $[0,1]$ with $\int_0^t\int_0^1 x\,f_s(x)\,dx\,ds < \infty$ for all $t$, and the $G_j(x)$ are distributions on $[0,1]$. Here, $L = \{v_1,\ldots,v_l\}$ is the set of fixed discontinuities of $F$. For a given counting process $N_1(t)$ with covariate $Z_1$, define $N_1^c(t)$ by $N_1^c(t) = N_1(t) - \sum_{s\le t}\Delta N_1(s)\,I(s\in L)$. For given $r > 0$, define $F(t;r)$ by $F(t;r) = 1-(1-F(t))^r$ and let $A(t;r)$ be the cumulative hazard function of $F(t;r)$.

LEMMA 5.1. For given $\beta$ and $N_1$, the posterior distribution of $F$ is a process neutral to the right with Lévy measure
(8)
$$\begin{aligned}\nu(dt,dx\mid\beta,N_1) ={}& (1-x)^{Y_1(t)\exp(\beta^TZ_1)}\,f_t(x)\,dx\,dt\\ &+ \sum_{j=1}^l c_j^{-1}\big[(1-x)^{Y_1(v_j)\exp(\beta^TZ_1)}\big]^{1-a_j}\big[1-(1-x)^{\exp(\beta^TZ_1)}\big]^{a_j}\,dG_j(x)\,\delta_{v_j}(dt)\\ &+ c(t)^{-1}\big[1-(1-x)^{\exp(\beta^TZ_1)}\big]\,f_t(x)\,dx\,dN_1^c(t),\end{aligned}$$
where $c_j$ and $c(t)$ are normalizing constants and $a_j = I(\Delta N_1(v_j) = 1)$.

PROOF. Let $r = \exp(\beta^TZ_1)$. Theorem A.1 with $h(x) = 1-(1-x)^r$ implies that $A(t;r)$ is a Lévy process with Lévy measure
$$\nu(dt,dx;r) = \frac{1}{r}(1-x)^{1/r-1}\,f_t\big(1-(1-x)^{1/r}\big)\,dx\,dt + \sum_{j=1}^l dG_j\big(1-(1-x)^{1/r}\big)\,\delta_{v_j}(dt).$$
Since the compensator of $N_1(t)$ is $\int_0^t Y_1(s)\,dA(s;r)$, Theorem 3.2 of Kim (1999) implies that the posterior distribution of $A(t;r)$, given $\beta$ and $N_1$, is a Lévy process

with Lévy measure $\varrho$:
$$\begin{aligned}\varrho(dt,dx\mid\beta,N_1) ={}& \big(1-Y_1(t)x\big)\,\frac{1}{r}(1-x)^{1/r-1}\,f_t\big(1-(1-x)^{1/r}\big)\,dx\,dt\\ &+ \sum_{j=1}^l c_j^{-1}\big(1-Y_1(v_j)x\big)^{1-a_j}x^{a_j}\,dG_j\big(1-(1-x)^{1/r}\big)\,\delta_{v_j}(dt)\\ &+ c(t)^{-1}x\,\frac{1}{r}(1-x)^{1/r-1}\,f_t\big(1-(1-x)^{1/r}\big)\,dx\,dN_1^c(t).\end{aligned}$$
Since
$$A(t) = \sum_{s\le t}\Big(1-\big(1-\Delta A(s;r)\big)^{1/r}\Big)\,I\big(\Delta A(s;r) > 0\big),$$
Theorem A.1 with $h(x) = 1-(1-x)^{1/r}$ implies that the posterior distribution of $A$ conditional on $\beta$ and $N_1$ is a Lévy process with the Lévy measure $\nu(\cdot\mid\beta,N_1)$ defined by (8).

PROOF OF (i) OF THEOREM 3.1. For $n = 1$, Lemma 5.1 with $L = \emptyset$ implies the desired result. For $n\ge 2$, we complete the proof by applying Lemma 5.1 repeatedly.

5.2. The marginal posterior distribution of $\beta$. In this subsection, we prove part (ii) of Theorem 3.1. The proof consists of two parts. First, we derive the marginal compensator of the counting process $N_{k+1}$, given $\beta$ and $\mathbf N_k$. Second, we derive the likelihood of $\beta$ using Jacod's formula for the likelihood ratio [Jacod (1975); Andersen, Borgan, Gill and Keiding (1993)].

Assume $L = \emptyset$ in (7). Let $r_k = \exp(\beta^TZ_k)$ for $k = 1,\ldots,n$.

LEMMA 5.2. For $k\ge 1$, conditional on $\beta$ and $\mathbf N_k$, $N_{k+1}$ is a multiplicative counting process with compensator $\int_0^t Y_{k+1}(s)\,dE[A(s;r_{k+1})\mid\beta,\mathbf N_k]$, where
(9)
$$E[A(t;r_{k+1})\mid\beta,\mathbf N_k] = \int_0^t\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)\,\nu(ds,dx\mid\beta,\mathbf N_k).$$

PROOF. Part (i) of Theorem 3.1 implies that, conditional on $\beta$ and $\mathbf N_k$, the posterior distribution of $A$ is a Lévy process with Lévy measure $\nu(\cdot\mid\beta,\mathbf N_k)$. Hence, $A(t;r_{k+1})$ is also a Lévy process and we have
$$P(X_{k+1} > t\mid\beta,\mathbf N_k) = E\big(1-F(t;r_{k+1})\mid\beta,\mathbf N_k\big) = E\Big(\prod_{s\le t}\big(1-dA(s;r_{k+1})\big)\,\Big|\,\beta,\mathbf N_k\Big) = \prod_{s\le t}\big(1-dE[A(s;r_{k+1})\mid\beta,\mathbf N_k]\big).$$

Hence, conditional on $\beta$ and $\mathbf N_k$, the cumulative hazard function of $X_{k+1}$ is $E[A(t;r_{k+1})\mid\beta,\mathbf N_k]$. By (3) and the transformation of variables technique, we have
$$E[A(t;r_{k+1})\mid\beta,\mathbf N_k] = \int_0^t\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)\,\nu(ds,dx\mid\beta,\mathbf N_k)$$
and the proof is complete.

LEMMA 5.3. Let $P_{k+1}(\cdot\mid\beta,\mathbf N_k)$ be the probability measure of $N_{k+1}$ conditional on $\beta$ and $\mathbf N_k$ and let
$$L(N_{k+1}\mid\beta,\mathbf N_k) = \frac{dP_{k+1}(\cdot\mid\beta,\mathbf N_k)}{dP_{k+1}(\cdot\mid 0,\mathbf N_k)}.$$
Then
(10)
$$\begin{aligned}L(N_{k+1}\mid\beta,\mathbf N_k) \propto{}& \prod_{i=1}^{q_k} c_i^{-1}\int_0^1\Big[\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)^{\Delta N_{k+1}(t_i)}\big((1-x)^{\exp(\beta^TZ_{k+1})}\big)^{1-\Delta N_{k+1}(t_i)}\Big]^{Y_{k+1}(t_i)}\\ &\hspace{3em}\times\prod_{j\in D_k(t_i)}\big(1-(1-x)^{\exp(\beta^TZ_j)}\big)\,(1-x)^{\sum_{j\in R_k^+(t_i)}\exp(\beta^TZ_j)}\,f_{t_i}(x)\,dx\\ &\times\exp\bigg[-\int_{W_{k+1}}^{T_{k+1}}\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)(1-x)^{\sum_{j\in R_k(t)}\exp(\beta^TZ_j)}\,f_t(x)\,dx\,dt\bigg]\\ &\times\bigg[\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)(1-x)^{\sum_{j\in R_k(T_{k+1})}\exp(\beta^TZ_j)}\,f_{T_{k+1}}(x)\,dx\bigg]^{\xi_{k+1}},\end{aligned}$$
where $q_k$, $(t_1,\ldots,t_{q_k})$, $D_k$, $R_k$ and $R_k^+$ are defined similarly to $q_n$, $(t_1,\ldots,t_{q_n})$, $D_n$, $R_n$ and $R_n^+$ in Theorem 3.1, except that only the first $k$ observations are used. Also, $\xi_{k+1} = I(\delta_{k+1} = 1,\ T_{k+1}\ne t_i,\ i = 1,\ldots,q_k)$ and
(11)
$$c_i = \int_0^1\prod_{j\in D_k(t_i)}\big(1-(1-x)^{\exp(\beta^TZ_j)}\big)\,(1-x)^{\sum_{j\in R_k^+(t_i)}\exp(\beta^TZ_j)}\,f_{t_i}(x)\,dx.$$

PROOF. Let $B(t) = E[A(t;r_{k+1})\mid\beta,\mathbf N_k]$, let $B_d(t) = \sum_{i=1}^{q_k}\Delta B(t_i)\,I(t_i\le t)$ and let $B_c(t) = B(t)-B_d(t)$. By a result of Jacod (1975) [or see Andersen, Borgan, Gill and Keiding (1993)] and the definition of the product integral, we have
(12)
$$\begin{aligned}L(N_{k+1}\mid\beta,\mathbf N_k) &\propto \prod_{t\in[0,\tau]}\big(Y_{k+1}(t)\,dB(t)\big)^{\Delta N_{k+1}(t)}\big(1-Y_{k+1}(t)\,dB(t)\big)^{1-\Delta N_{k+1}(t)}\\ &= b_c(T_{k+1})^{\xi_{k+1}}\exp\bigg[-\int_0^\infty Y_{k+1}(t)\,dB_c(t)\bigg]\prod_{i=1}^{q_k}\Delta B_d(t_i)^{\Delta N_{k+1}(t_i)}\big(1-Y_{k+1}(t_i)\,\Delta B_d(t_i)\big)^{1-\Delta N_{k+1}(t_i)},\end{aligned}$$
where $b_c(t)$ is the first derivative of $B_c(t)$. From the definition of $B(t)$ in (9), we have
(13)
$$\Delta B_d(t_i) = c_i^{-1}\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)\prod_{j\in D_k(t_i)}\big(1-(1-x)^{\exp(\beta^TZ_j)}\big)\,(1-x)^{\sum_{j\in R_k^+(t_i)}\exp(\beta^TZ_j)}\,f_{t_i}(x)\,dx$$
and
(14)
$$B_c(t) = \int_0^t\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_{k+1})}\big)(1-x)^{\sum_{j\in R_k(s)}\exp(\beta^TZ_j)}\,f_s(x)\,dx\,ds.$$
The proof is complete by substituting (13) and (14) into (12).

PROOF OF (ii) OF THEOREM 3.1. When $n = 1$, (6) is valid by Lemma 5.3. Now assume that (6) is true for $n = k$. Since $\pi(\beta\mid\mathbf N_{k+1}) \propto L(N_{k+1}\mid\beta,\mathbf N_k)\,\pi(\beta\mid\mathbf N_k)$ and the $c_i$ in (11) is the same as $\int_0^1 h_i(x\mid\beta)\,dx$ in (5) with $n$ replaced by $k$, we complete the proof by rearranging the equations $L(N_{k+1}\mid\beta,\mathbf N_k)$ in (10) and $\pi(\beta\mid\mathbf N_k)$ in (6).

6. Proof of Theorem 4.1. In this section, we prove the propriety of the posterior distribution of $\beta$ with the constant prior. Let
$$L(\beta) = e^{-\rho(\beta)}\prod_{i=1}^{q_n}\int_0^1 h_i(x\mid\beta)\,dx.$$
This is the right-hand side of (6) with $\pi(\beta)\equiv 1$. We will prove that $L(\beta)$ is integrable.

Let $0 = r_0 < r_1 < \cdots < r_m < \infty$ be the ordered values of all the distinct points of $\{T_1,\ldots,T_n,W_1,\ldots,W_n\}$. For a given sequence of real numbers $\{a_i\}$, $\bigvee_i a_i$ denotes the maximum and $\bigwedge_i a_i$ the minimum of the $a_i$'s. For each $k$, let $n_k$ be the integer such that $t_k = r_{n_k}$ and let
$$\rho_k(\beta) = \int_{r_{n_k-1}}^{r_{n_k}}\int_0^1\sum_{i=1}^n Y_i(t)\big(1-(1-x)^{\exp(\beta^TZ_i)}\big)(1-x)^{\sum_{j=1}^{i-1}Y_j(t)\exp(\beta^TZ_j)}\,f_t(x)\,dx\,dt.$$

LEMMA 6.1. For $k = 1,\ldots,q_n$, $\rho_k(\beta) \ge C_k\bigwedge_{j\in D_n(t_k)}\rho_{kj}(\beta)$, where
$$\rho_{kj}(\beta) = \int_0^1\frac{1-(1-x)^{\exp(\beta^TZ_j)}}{x}\,(1-x)^{\sum_{l\in R_n^+(t_k)}\exp(\beta^TZ_l)+\varsigma_2-1}\,dx$$
and $C_k = M_2\int_{r_{n_k-1}}^{r_{n_k}}a_0(t)\,dt$.

PROOF. Note that, by telescoping the sum over $i$,
$$\rho_k(\beta) = \int_{r_{n_k-1}}^{r_{n_k}}\int_0^1\Big(1-(1-x)^{\sum_{j=1}^n Y_j(t)\exp(\beta^TZ_j)}\Big)\,f_t(x)\,dx\,dt,$$
and that $Y_l(t) = 1$ for every $l\in R_n(t_k)$ and $t\in(r_{n_k-1},r_{n_k})$. Hence, for any $j\in D_n(t_k)$,
$$\rho_k(\beta) \ge \int_{r_{n_k-1}}^{r_{n_k}}\int_0^1\big(1-(1-x)^{\exp(\beta^TZ_j)}\big)(1-x)^{\sum_{l\in R_n^+(t_k)}\exp(\beta^TZ_l)}\,f_t(x)\,dx\,dt \ge \int_{r_{n_k-1}}^{r_{n_k}}\rho_{kj}(\beta)\,M_2\,a_0(t)\,dt = C_k\,\rho_{kj}(\beta).$$
The last inequality is due to A2. Since $j$ was chosen arbitrarily, the proof is complete.

Let $l_i(\beta) = \int_0^1 h_i(x\mid\beta)\,dx$. For $j\in D_n(t_i)$, define $l_{ij}(\beta)$ by
$$l_{ij}(\beta) = \int_0^1\frac{1-(1-x)^{\exp(\beta^TZ_j)}}{x}\,(1-x)^{\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)+\varsigma_1-1}\,dx.$$

LEMMA 6.2. $l_i(\beta) \le M_1\bigwedge_{j\in D_n(t_i)}l_{ij}(\beta)$.

PROOF. For any $j\in D_n(t_i)$, we have
$$l_i(\beta) \le \int_0^1\big(1-(1-x)^{\exp(\beta^TZ_j)}\big)(1-x)^{\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)}\,f_{t_i}(x)\,dx \le M_1\,l_{ij}(\beta).$$
The first inequality holds because every other factor of the product over $D_n(t_i)$ in (5) is at most 1; the second inequality is due to A1. Since $j$ is arbitrary, this completes the proof.

LEMMA 6.3. Let $\rho_i^*(\beta) = \bigwedge_{j\in D_n(t_i)}\rho_{ij}(\beta)$ and let $l_i^*(\beta) = \bigwedge_{j\in D_n(t_i)}l_{ij}(\beta)$. Then
$$L(\beta) \le M_1^{q_n}\prod_{i=1}^{q_n}\exp\big(-C_i\rho_i^*(\beta)\big)\,l_i^*(\beta),$$
where the $C_i$ are defined in Lemma 6.1.

PROOF. Lemma 6.1 implies
$$\rho(\beta) \ge \sum_{k=1}^{q_n}\rho_k(\beta) \ge \sum_{k=1}^{q_n}C_k\,\rho_k^*(\beta),$$
and Lemma 6.2 implies
$$\prod_{i=1}^{q_n}l_i(\beta) \le M_1^{q_n}\prod_{i=1}^{q_n}l_i^*(\beta).$$
The above two inequalities lead to the conclusion.

Let $\psi(x) = \int_0^1\big(1-(1-y)^{x-1}\big)\big/y\,dy$, so that $\rho_{ij}$ and $l_{ij}$ are increments of $\psi$. Direct calculation yields that, for any positive $\eta$,
(15)
$$\psi^*(\eta) = \sup_{\eta\le x<\infty}x\,\psi'(x) < \infty,$$
where $\psi'(x) = d\psi(x)/dx$.
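[A quick numerical look at (15): substituting $t = 1-y$ shows that $\psi(x)$ equals $\mathrm{digamma}(x) + \gamma$, so $\psi'(x)$ is the trigamma function and $x\,\psi'(x)\to 1$ as $x\to\infty$, which makes the supremum over $[\eta,\infty)$ finite. The sketch below checks this by direct quadrature; the value $\eta = 1$ and the scanned range are arbitrary choices of the sketch.]

```python
# Sketch: numerical check that psi*(eta) = sup_{x >= eta} x * psi'(x) < inf,
# where psi(x) = int_0^1 (1 - (1-y)^(x-1)) / y dy.  Differentiating under
# the integral sign gives psi'(x) = int_0^1 -(1-y)^(x-1) log(1-y) / y dy.
import numpy as np
from scipy.integrate import quad
from scipy.special import polygamma

def psi_prime(x):
    return quad(lambda y: -(1 - y) ** (x - 1) * np.log1p(-y) / y, 0, 1)[0]

xs = np.geomspace(1.0, 100.0, 20)            # eta = 1; range is arbitrary
vals = [x * psi_prime(x) for x in xs]
print(max(vals))                             # ~1.645, attained at x = 1
print(float(xs[0] * polygamma(1, xs[0])))    # cross-check: psi'(x) = trigamma(x)
```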

LEMMA 6.4. There exists a constant $K > 0$ such that
$$\sup_{\beta\in R^p}\exp\big(-C_i\rho_i^*(\beta)\big)\,l_i^*(\beta) \le K \qquad\text{for } i = 1,\ldots,q_n.$$

PROOF. First consider $\rho_{ij}(\beta)-l_{ij}(\beta)$ for $j\in D_n(t_i)$. Writing $s_i = \sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)$, we have $\rho_{ij}(\beta) = \psi(\exp(\beta^TZ_j)+s_i+\varsigma_2)-\psi(s_i+\varsigma_2)$ and $l_{ij}(\beta) = \psi(\exp(\beta^TZ_j)+s_i+\varsigma_1)-\psi(s_i+\varsigma_1)$. Then, using the mean value theorem, we can write
$$\rho_{ij}(\beta)-l_{ij}(\beta) = (\varsigma_2-\varsigma_1)\,\psi'(a_1) - (\varsigma_2-\varsigma_1)\,\psi'(a_2)$$
for some constants $a_1 > \varsigma_3$ and $a_2 > \varsigma_3$, where $\varsigma_3 = \min\{\varsigma_1,\varsigma_2\}$. From (15), we have
$$\rho_{ij}(\beta)-l_{ij}(\beta) \ge -\,\frac{|\varsigma_1-\varsigma_2|\,2\psi^*(\varsigma_3)}{\varsigma_3}.$$
Since
$$\rho_i^*(\beta)-l_i^*(\beta) \ge \bigwedge_{j\in D_n(t_i)}\big(\rho_{ij}(\beta)-l_{ij}(\beta)\big) \ge -\,\frac{|\varsigma_1-\varsigma_2|\,2\psi^*(\varsigma_3)}{\varsigma_3},$$
we have
$$\exp\big(-C_i\rho_i^*(\beta)\big)\,l_i^*(\beta) \le \exp\Big(C_i\frac{|\varsigma_1-\varsigma_2|\,2\psi^*(\varsigma_3)}{\varsigma_3}\Big)\exp\big(-C_il_i^*(\beta)\big)\,l_i^*(\beta) \le \exp\Big(C_i\frac{|\varsigma_1-\varsigma_2|\,2\psi^*(\varsigma_3)}{\varsigma_3}\Big)\Big/C_i,$$
since $\sup_{u\ge0}u\,e^{-C_iu} \le 1/C_i$. Now, letting
$$K = \max\Big\{\exp\Big(C_i\frac{|\varsigma_1-\varsigma_2|\,2\psi^*(\varsigma_3)}{\varsigma_3}\Big)\Big/C_i : i = 1,\ldots,q_n\Big\},$$
we complete the proof.

LEMMA 6.5. For $i = 1,\ldots,q_n$,
$$l_i^*(\beta) \le \psi^*(\varsigma_1)\bigwedge_{j\in D_n(t_i)}\ \bigwedge_{k\in R_n^+(t_i)}\exp\big(-\beta^T(Z_k-Z_j)\big).$$

PROOF. It suffices to show that, for any $j\in D_n(t_i)$,
$$l_{ij}(\beta) \le \psi^*(\varsigma_1)\bigwedge_{k\in R_n^+(t_i)}\exp\big(-\beta^T(Z_k-Z_j)\big).$$
Since $\psi'$ is decreasing, the mean value theorem yields
$$\begin{aligned}l_{ij}(\beta) &= \psi\Big(\exp(\beta^TZ_j)+\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)+\varsigma_1\Big) - \psi\Big(\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)+\varsigma_1\Big)\\ &\le \exp(\beta^TZ_j)\,\psi'\Big(\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)+\varsigma_1\Big)\\ &\le \frac{\exp(\beta^TZ_j)}{\sum_{k\in R_n^+(t_i)}\exp(\beta^TZ_k)+\varsigma_1}\,\psi^*(\varsigma_1)\\ &\le \frac{\psi^*(\varsigma_1)}{\sum_{k\in R_n^+(t_i)}\exp(\beta^T(Z_k-Z_j))} \le \psi^*(\varsigma_1)\bigwedge_{k\in R_n^+(t_i)}\exp\big(-\beta^T(Z_k-Z_j)\big).\end{aligned}$$

LEMMA 6.6. For some constant $J > 0$,
$$L(\beta) \le J\bigwedge_{z\in B}\big(e^{\beta^Tz}\wedge 1\big).$$

PROOF. Lemma 6.4 implies that, for each $i = 1,\ldots,q_n$,
$$\sup_{\beta\in R^p}\prod_{k=1,\,k\ne i}^{q_n}\exp\big(-C_k\rho_k^*(\beta)\big)\,l_k^*(\beta) \le K^{q_n-1}.$$
Since $\exp(-C_i\rho_i^*(\beta)) \le 1$, Lemmas 6.3 and 6.5 yield
$$L(\beta) \le M_1^{q_n}K^{q_n-1}\psi^*(\varsigma_1)\bigwedge_{j\in D_n(t_i)}\ \bigwedge_{k\in R_n^+(t_i)}\exp\big(\beta^T(Z_j-Z_k)\big).$$
Since $i$ is arbitrary, we have
$$L(\beta) \le M_1^{q_n}K^{q_n-1}\psi^*(\varsigma_1)\bigwedge_{i=1,\ldots,q_n}\ \bigwedge_{j\in D_n(t_i)}\ \bigwedge_{k\in R_n^+(t_i)}\exp\big(\beta^T(Z_j-Z_k)\big).$$

On the other hand, by Lemmas 6.3 and 6.4, we have
$$L(\beta) \le \sup_{\beta\in R^p}M_1^{q_n}\prod_{k=1}^{q_n}\exp\big(-C_k\rho_k^*(\beta)\big)\,l_k^*(\beta) \le (M_1K)^{q_n}.$$
Finally, by letting $J = \max\{M_1^{q_n}K^{q_n-1}\psi^*(\varsigma_1),\ (M_1K)^{q_n}\}$ and noting that $Z_j - Z_k$ with $j\in D_n(t_i)$ and $k\in R_n^+(t_i)$ ranges exactly over $B$, the proof is complete.

PROOF OF THEOREM 4.1. Suppose $B = \{z_1,\ldots,z_m\}$. First, we show that there exist $h_j > 0$ such that
(16)
$$\bigwedge_{i=1}^m\big(e^{\beta^Tz_i}\wedge 1\big) \le e^{-h_j|\beta_j|}$$
for each $j = 1,2,\ldots,p$. Take $j = 1$ and let $e_1$ be the unit vector whose first coordinate is 1 and all other coordinates are 0. Since $\mathrm{coni}\{z_1,\ldots,z_m\} = R^p$, there exist nonnegative numbers $c_1,\ldots,c_m$ such that
$$e_1 = \sum_{i=1}^m c_iz_i.$$
Let $d = 1\big/\big(m\bigvee_{i=1}^m c_i\big)$ and $d_i = dc_i$. Note that $d$ is well defined because not all the $c_i$ can be zero, and that $md_i \le 1$ for all $i = 1,2,\ldots,m$:
$$\bigwedge_{i=1}^m\big(e^{\beta^Tz_i}\wedge 1\big) \le \bigwedge_{i=1}^m\big(e^{md_i\beta^Tz_i}\wedge 1\big) \le \exp\Big\{\sum_{i=1}^m d_i\beta^Tz_i\Big\}\wedge 1 = e^{d\beta_1}\wedge 1.$$
The first inequality holds because $md_i \le 1$; the second inequality holds because the minimum of $m$ real numbers is always less than or equal to their average. Similarly, starting from a conical representation of $-e_1$, we can find $g > 0$ such that
$$\bigwedge_{i=1}^m\big(e^{\beta^Tz_i}\wedge 1\big) \le e^{-g\beta_1}\wedge 1.$$
Now let $h_1 = d\wedge g$; then (16) holds for $j = 1$. By mimicking the above derivation with $e_j$, the unit vector whose $j$th coordinate is 1 and all the others are 0, we can find $h_j$ for (16).

Finally, let $h = \min\{h_1,\ldots,h_p\}$. Combining Lemma 6.6 and (16), we have
$$\begin{aligned}\int_{R^p}L(\beta)\,d\beta &\le J\int_{R^p}\bigwedge_{j=1}^p e^{-h_j|\beta_j|}\,d\beta \le J\int_{R^p}e^{-h\bigvee_{i=1}^p|\beta_i|}\,d\beta\\ &= Jp\int_{\{\beta:\,|\beta_1|=\bigvee_i|\beta_i|\}}e^{-h|\beta_1|}\,d\beta = 2Jp\int_0^\infty\bigg(\int_{-\beta_1}^{\beta_1}\cdots\int_{-\beta_1}^{\beta_1}d\beta_p\cdots d\beta_2\bigg)e^{-h\beta_1}\,d\beta_1\\ &= 2Jp\int_0^\infty(2\beta_1)^{p-1}e^{-h\beta_1}\,d\beta_1 < \infty,\end{aligned}$$
and the proof is complete.
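[For completeness, the final one-dimensional integral is a Gamma integral, $\int_0^\infty \beta_1^{p-1}e^{-h\beta_1}\,d\beta_1 = (p-1)!/h^p$. A two-line quadrature check, with illustrative values of $p$ and $h$:]

```python
# Sketch: verify the Gamma integral that closes the proof of Theorem 4.1.
from math import factorial
from scipy.integrate import quad
import numpy as np

p, h = 3, 0.7
val, _ = quad(lambda b: b ** (p - 1) * np.exp(-h * b), 0, np.inf)
print(val, factorial(p - 1) / h ** p)   # both ~5.831
```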

APPENDIX

Transformation of nondecreasing Lévy processes. Let $SC[0,1]$ be the set of all strictly increasing differentiable functions $h$ defined on $[0,1]$ with $h(0) = 0$ and $h(1) = 1$. We prove the following theorem.

THEOREM A.1. Let $A$ be a Lévy process with Lévy measure $\nu$ given by (7). For a given $h\in SC[0,1]$, define a process $B$ by
$$B(t) = \sum_{s\le t}h\big(\Delta A(s)\big)\,I\big(\Delta A(s) > 0\big).$$
Then $B$ is a Lévy process with Lévy measure $\nu_B$, where
(17)
$$\nu_B(dt,dx) = \frac{dh^{-1}(x)}{dx}\,f_t\big(h^{-1}(x)\big)\,dx\,dt + \sum_{j=1}^l dG_j\big(h^{-1}(x)\big)\,\delta_{v_j}(dt)$$
and $h^{-1}$ is the inverse function of $h$.

PROOF. Let $A_n$ be a Lévy process defined by
$$A_n(t) = \sum_{s\le t}\Delta A(s)\,I\big(1/n < \Delta A(s) \le 1\big).$$
Let $N(t)$ be a Poisson process with intensity function $a(t) = \int_{1/n}^1 f_t(x)\,dx$. Conditional on $\{N(s): s\le t\}$, let $Y_1,\ldots,Y_{N(t)}$ be independent random variables with densities $a^{-1}(t_i)f_{t_i}(x)\,I(1/n < x\le 1)$ for $i = 1,\ldots,N(t)$, where $t_i$ is the time of the $i$th jump of $N$ on $[0,t]$. Also, let $V_1,\ldots,V_l$ be random variables, independent of each other as well as of $N$ and $Y_1,\ldots,Y_{N(t)}$, such that the distribution function of $V_i$ is $G_i$ left truncated at $1/n$. Since these two Lévy processes have the common Lévy measure $\nu_{A_n}(dt,dx) = I(1/n < x\le 1)\,\nu(dt,dx)$, we have
(18)
$$A_n(t) \stackrel{d}{=} \sum_{i=1}^{N(t)}Y_i + \sum_{i=1}^l V_i\,I(v_i\le t).$$
Define a Lévy process $B_n$ by
$$B_n(t) = \sum_{s\le t}h\big(\Delta A(s)\big)\,I\big(1/n < \Delta A(s)\le 1\big).$$
Then (18) implies
$$B_n(t) \stackrel{d}{=} \sum_{i=1}^{N(t)}h(Y_i) + \sum_{i=1}^l h(V_i)\,I(v_i\le t).$$
Since the density of $h(Y_i)$ is $a^{-1}(t_i)\,\big(dh^{-1}(x)/dx\big)\,f_{t_i}(h^{-1}(x))\,I(h(1/n) < x\le 1)$ and the distribution function of $h(V_i)$ is $G_i(h^{-1}(x))$ left truncated at $h(1/n)$, the Lévy measure of $B_n$ is
$$\nu_{B_n}(dt,dx) = \frac{dh^{-1}(x)}{dx}\,f_t\big(h^{-1}(x)\big)\,I\big(h(1/n) < x\le 1\big)\,dx\,dt + \sum_{j=1}^l dG_{nj}\big(h^{-1}(x)\big)\,\delta_{v_j}(dt),$$
where $G_{nj}$ is $G_j$ left truncated at $1/n$.

Let $B'$ be the Lévy process with Lévy measure $\nu_B$ in (17). Theorem 3.13 in Jacod and Shiryaev (1987) implies that $B_n \stackrel{d}{\to} B'$. On the other hand, from its construction, it is obvious that $B_n \stackrel{d}{\to} B$. Hence, $B \stackrel{d}{=} B'$ and the proof is complete.

Acknowledgment. We thank the Associate Editor and a referee for valuable comments that helped us improve the paper.

REFERENCES

ANDERSEN, P. K., BORGAN, Ø., GILL, R. D. and KEIDING, N. (1993). Statistical Models Based on Counting Processes. Springer, New York.
CLAYTON, D. G. (1991). A Monte Carlo method for Bayesian inference in frailty models. Biometrics 47 467–485.
DOKSUM, K. A. (1974). Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab. 2 183–201.
FERGUSON, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.

FLEMING, T. R. and HARRINGTON, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.
HJORT, N. L. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18 1259–1294.
JACOBSEN, M. (1989). Existence and unicity of MLEs in discrete exponential family distributions. Scand. J. Statist. 16 335–349.
JACOD, J. (1975). Multivariate point processes: Predictable projection, Radon–Nikodym derivatives, representation of martingales. Z. Wahrsch. Verw. Gebiete 31 235–253.
JACOD, J. and SHIRYAEV, A. N. (1987). Limit Theorems for Stochastic Processes. Springer, New York.
KALBFLEISCH, J. D. (1978). Nonparametric Bayesian analysis of survival time data. J. Roy. Statist. Soc. Ser. B 40 214–221.
KIM, Y. (1999). Nonparametric Bayesian estimators for counting processes. Ann. Statist. 27 562–588.
KIM, Y. and LEE, J. (2001). On posterior consistency of survival models. Ann. Statist. 29 666–686.
LAUD, P. W., DAMIEN, P. and SMITH, A. F. M. (1998). Bayesian nonparametric and covariate analysis of failure time data. In Practical Nonparametric and Semiparametric Bayesian Statistics. Lecture Notes in Statist. 133 213–225. Springer, New York.
LEE, J. and KIM, Y. (2002). A new algorithm to generate beta processes. Unpublished manuscript.
LO, A. Y. (1982). Bayesian nonparametric statistical inference for Poisson point processes. Z. Wahrsch. Verw. Gebiete 59 55–66.

DEPARTMENT OF STATISTICS
EWHA WOMANS UNIVERSITY
11-1 DAEHYUN-DONG, SEODAEMUN-GU
SEOUL 120-750
KOREA
E-MAIL: ydkim@mm.ewha.ac.kr

DEPARTMENT OF STATISTICS
PENNSYLVANIA STATE UNIVERSITY
326 THOMAS BUILDING
UNIVERSITY PARK, PENNSYLVANIA 16802
E-MAIL: leej@stat.psu.edu