BAYESIAN ANALYSIS OF PROPORTIONAL HAZARD MODELS. Ewha Womans University and Pennsylvania State University


The Annals of Statistics 2003, Vol. 31, No. 2, 493–511. Institute of Mathematical Statistics, 2003

BAYESIAN ANALYSIS OF PROPORTIONAL HAZARD MODELS

BY YONGDAI KIM 1 AND JAEYONG LEE

Ewha Womans University and Pennsylvania State University

This paper is concerned with Bayesian analysis of the proportional hazard model with left truncated and right censored data. We use a process neutral to the right as the prior of the baseline survival function, and a finite-dimensional prior is placed on the regression coefficient. We then obtain the exact form of the joint posterior distribution of the regression coefficient and the baseline cumulative hazard function. As a by-product, we prove the propriety of the posterior distribution with the constant prior on the regression coefficient.

1. Introduction. This paper is concerned with Bayesian inference for the proportional hazard model when observations are both left truncated and right censored (LTRC). Bayesian analysis of the proportional hazard model with right censored data has been studied, for example, by Kalbfleisch (1978), Hjort (1990) and Laud, Damien and Smith (1998). However, their results cannot be extended directly to LTRC data. To the best of our knowledge, even with the popular prior processes, the gamma and beta processes, a Markov chain Monte Carlo (MCMC) algorithm is not available.

The main contribution of this paper is the derivation of the posterior distribution in a closed form for LTRC survival data. There are two important consequences. First, any Markov chain Monte Carlo algorithm for right censored data can be modified to implement a Bayesian analysis of LTRC data. Second, the theoretical study of the posterior distribution for LTRC data becomes simpler because of the availability of its closed form. In particular, we prove the propriety of the posterior when an improper constant prior is used for the regression coefficient. To our knowledge, this result is the first propriety result concerning prior processes neutral to the right.
Our proof also has novel features. We derive the posterior distribution by embedding the LTRC data into a counting process model. Because it handles left truncated data easily, this approach, using counting processes, has various advantages over earlier approaches. First, the proof is much simpler and more systematic. Kalbfleisch (1978) used gamma process priors for the log of the survival function and obtained the posterior distribution by deriving finite-dimensional distributions of the posterior process. The proof involves nontrivial

Received February 2001; revised May 2002.

1 Supported in part by KOSEF through the Statistical Research Center for Complex Systems at Seoul National University.

AMS 2000 subject classifications. Primary 62C10; secondary 60G55.

Key words and phrases. Bayesian analysis, proportional hazard model, left truncation, censoring, neutral to the right process, propriety of posterior.

details, particularly when there are ties among observations. Our approach handles ties in a consistent way. Second, the posterior distribution can be derived for a much wider class of priors. While the previous results require specific knowledge of a prior for the baseline distribution, such as a gamma or beta process, our approach requires only that the prior of the baseline distribution be a process neutral to the right. In Example 1 of Section 3, Kalbfleisch's result is deduced from the main theorem of this paper. In Example 2 of the same section, we derive the posterior distribution of the regression coefficient with beta process priors, which extends Hjort (1990) to tied observations. Laud, Damien and Smith (1998) implemented an MCMC algorithm for the proportional hazard model with beta process priors. However, the derivation of the conditional posterior distribution of the cumulative hazard function given the regression coefficients, which is used in their algorithm, is not available in the literature. Our results justify its use.

The paper is organized as follows. Section 2 introduces the model and the class of priors. Section 3 presents the posterior distribution. Section 4 gives the propriety result. Sections 5 and 6 give the proofs of the results.

2. Proportional hazard models and processes neutral to the right. In this section, we introduce the proportional hazard model for LTRC data and review prior processes neutral to the right for the baseline survival function. We begin by modelling the complete data and then introduce the truncation and censoring mechanisms.

The postulates of the proportional hazard model are as follows. Let X_1, X_2,... be survival times with covariates Z_1, Z_2,..., where Z_i ∈ R^p. Suppose the distribution F_i of X_i with covariate Z_i is given by

1 − F_i(t) = (1 − F(t))^{exp(β^T Z_i)},

where β ∈ R^p is the unknown regression coefficient and F is the unknown baseline distribution function.
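The model above is easy to simulate from, together with the truncation and censoring mechanism of the data. The following sketch uses a unit-exponential baseline F, an exponential censoring law and a uniform truncation law; these distributional choices are illustrative assumptions of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ltrc(n, beta, rng=rng):
    """Draw n LTRC observations (T_i, delta_i, W_i, Z_i) from
    1 - F_i(t) = (1 - F(t))^{exp(beta^T Z_i)} with a unit-exponential
    baseline F; censoring and truncation laws are illustrative choices."""
    out = []
    while len(out) < n:
        Z = rng.normal(size=len(beta))
        X = rng.exponential(1.0 / np.exp(beta @ Z))  # baseline hazard 1, scaled by exp(beta'Z)
        C = rng.exponential(2.0)                     # right-censoring time C_i
        W = rng.uniform(0.0, 0.5)                    # left-truncation time W_i
        T, delta = min(X, C), int(X <= C)
        if T > W:                                    # observed only when T_i > W_i
            out.append((T, delta, W, Z))
    return out

data = simulate_ltrc(100, beta=np.array([0.8, -0.4]))
```

Subjects with T_i ≤ W_i never enter the sample, which is exactly the selection effect that distinguishes LTRC data from purely right censored data.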
The survival times are only partially observed due to the presence of truncation and censoring variables (W_i, C_i), which are assumed to be independent random vectors independent of the X_i's. Let T_i = min(X_i, C_i) and δ_i = I(X_i ≤ C_i). In the presence of LTRC data, one observes (T_i, δ_i) only when T_i > W_i. Finally, the data consist of n copies of (T_i, δ_i, W_i, Z_i) with T_i > W_i. By setting W_i = 0, data subject to only right censoring can be treated in this framework.

There are two parameters in the proportional hazard model: the regression coefficient β and the baseline cumulative distribution function (c.d.f.) F. We use a process neutral to the right [Doksum (1974)] as the prior for F and a finite-dimensional distribution with a density for β. Processes neutral to the right include popular prior processes such as Dirichlet processes [Ferguson (1973)], gamma processes [Kalbfleisch (1978); Lo (1982)] and beta processes [Hjort (1990)].

Before deriving the posterior distribution, we review several features of processes neutral to the right. A random distribution function F is a process neutral to the right if F(t) = 1 − exp(−Y(t)), where Y(t) is a nondecreasing Lévy process with Y(0) = 0 and lim_{t→∞} Y(t) = ∞ with probability 1 [Doksum (1974)]. We can

redefine it in terms of a cumulative hazard function (c.h.f.), taking the approach initiated by Hjort (1990). Let A be the c.h.f. of F,

A(t) = ∫_0^t (1 − F(s−))^{−1} dF(s).

Then, it can be shown that F is a process neutral to the right if and only if A is a nondecreasing Lévy process such that A(0) = 0, ΔA(t) ≤ 1 for all t with probability 1, and either ΔA(t) = 1 for some t > 0 or lim_{t→∞} A(t) = ∞ a.s. In what follows, we simply use the term Lévy process for a prior process of the c.h.f. A that induces a process neutral to the right on F.

For any given Lévy process A(t) on [0, ∞), there exists a unique random measure µ on [0, ∞) × [0, 1] such that

(1) A(t) = ∫_{[0,t]×[0,1]} x µ(ds, dx),

where µ is defined by

(2) µ([0, t] × B) = Σ_{s≤t} I(ΔA(s) ∈ B)

for any Borel subset B of [0, 1] and for all t > 0. Since µ is a Poisson random measure [Jacod and Shiryaev (1987), page 70], there exists a unique σ-finite measure ν on [0, ∞) × [0, 1] such that E(µ([0, t] × B)) = ν([0, t] × B). Conversely, for a given σ-finite measure ν with ∫_0^t ∫_0^1 x ν(ds, dx) < ∞ for all t > 0, we can construct a Lévy process through (1). Thus, we can conveniently characterize a Lévy process by ν, which we call the Lévy measure of A (or F). The expectation of A(t) is also neatly expressed as

(3) E(A(t)) = ∫_0^t ∫_0^1 x ν(ds, dx).

The relationship between Lévy processes and Poisson random measures is well known in probability theory. For its complete treatment, see Jacod and Shiryaev (1987). For their application to Bayesian survival models, which is relatively new, refer to Kim (1999) and Kim and Lee (2001).

3. The posterior distribution. A priori, let the baseline c.d.f. F be a process neutral to the right with a Lévy measure ν of the form ν(dt, dx) = f_t(x) dx dt for x ∈ [0, 1], and let π(β) be the prior density function for β. Let q_n be the number of distinct uncensored observations and let t_1 < t_2 < ... < t_{q_n} be the ordered distinct uncensored observations.
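The Poisson random measure characterization (1)-(3) also suggests an approximate way to draw a prior c.h.f.: truncate the Lévy measure at a small jump size ε, so that only finitely many jumps remain, and generate them from the corresponding Poisson random measure. A minimal sketch for the Lévy measure ν(dt, dx) = x^{−1} dx dt (a beta process with c(t) ≡ 1 and A_0(t) = t in the notation used later; the truncation level ε and all numerical choices are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_process_jumps(tau=1.0, eps=0.01, rng=rng):
    """One approximate draw of the c.h.f. A on [0, tau] for the Levy measure
    nu(dt, dx) = x^{-1} dx dt, keeping only the jumps of size x > eps."""
    lam = tau * (-np.log(eps))        # nu((0,tau] x (eps,1]) = tau * int_eps^1 dx/x
    n = rng.poisson(lam)              # Poisson number of retained jumps
    times = rng.uniform(0.0, tau, size=n)
    # jump sizes have density proportional to 1/x on (eps, 1]: log-uniform draw
    sizes = np.exp(rng.uniform(np.log(eps), 0.0, size=n))
    return times, sizes

# By (3), E(A(tau)) for the truncated measure is
# int_0^tau int_eps^1 x * x^{-1} dx dt = (1 - eps) * tau.
draws = np.array([beta_process_jumps()[1].sum() for _ in range(2000)])
```

Averaging A(τ) over many draws recovers E(A(τ)) ≈ (1 − ε)τ, in agreement with (3) applied to the truncated Lévy measure.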
Define D_n(t) = {i : T_i = t, δ_i = 1, i = 1,...,n}, R_n(t) = {i : W_i < t ≤ T_i, i = 1,...,n} and R_n^+(t) = R_n(t) \ D_n(t). Thus, D_n(t) and R_n(t) are the sets of the failures at time t and of the observations at risk at time t, respectively. The next theorem is the main result of the paper and is proven in Section 5.

THEOREM 3.1. Let the observations be denoted by D_n = ((T_1, δ_1, W_1, Z_1),...,(T_n, δ_n, W_n, Z_n)).

(i) Conditional on β and D_n, the posterior distribution of F is a process neutral to the right with Lévy measure

(4) ν(dt, dx | β, D_n) = (1 − x)^{Σ_{j∈R_n(t)} exp(β^T Z_j)} f_t(x) dx dt + Σ_{i=1}^{q_n} dH_i(x | β) δ_{t_i}(dt),

where δ_a is the degenerate probability measure at a and H_i(· | β) is a probability measure on [0, 1] with density proportional to

(5) h_i(x | β) = ∏_{j∈D_n(t_i)} [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{j∈R_n^+(t_i)} exp(β^T Z_j)} f_{t_i}(x).

(ii) The marginal posterior distribution of β is

(6) π(β | D_n) ∝ e^{−ρ(β)} ∏_{i=1}^{q_n} ∫_0^1 h_i(x | β) dx π(β),

where

ρ(β) = Σ_{i=1}^n ∫_{W_i}^{T_i} ∫_0^1 [1 − (1 − x)^{exp(β^T Z_i)}] (1 − x)^{Σ_{j=1}^{i−1} Y_j(t) exp(β^T Z_j)} f_t(x) dx dt,

Y_j(t) = I(W_j < t ≤ T_j) for j = 1,...,n and Σ_{j=1}^{i−1} Y_j(t) exp(β^T Z_j) = 0 when i = 1.

REMARK. Setting Z_j = 0 for all j, we can deduce the posterior of the c.h.f. when there are no covariates, which coincides with the results in Hjort (1990) and Kim (1999).

REMARK. The marginal posterior density of β given in (6) has the form L(β)π(β), where L(β) = exp(−ρ(β)) ∏_{i=1}^{q_n} ∫_0^1 h_i(x | β) dx. Thus, L(β) is the integrated likelihood of β with the c.h.f. integrated out. This result can be used for the computation of posterior model probabilities in selecting a subset of appropriate covariates or in Bayesian model averaging.

REMARK. For only right censored observations, the posterior distribution can be obtained from Theorem 3.1 by letting W_i = 0 for all i.

In the next two examples of the application of Theorem 3.1, we derive the posterior distributions for the two well-known families of prior processes, gamma and beta processes.
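The sets D_n(t), R_n(t) and R_n^+(t) appearing in Theorem 3.1 are straightforward to compute from the data. A minimal sketch (the helper name is ours, and R_n^+(t) is read as the at-risk set at t with the failures at t removed):

```python
def risk_and_death_sets(data, t):
    """Compute R_n(t) = {i : W_i < t <= T_i}, D_n(t) = {i : T_i = t, delta_i = 1}
    and R_n^+(t) = R_n(t) \ D_n(t) for LTRC data given as tuples
    (T_i, delta_i, W_i, Z_i).  With all W_i = 0 the risk set reduces to the
    usual one for right censored data."""
    R = {i for i, (T, delta, W, Z) in enumerate(data) if W < t <= T}
    D = {i for i, (T, delta, W, Z) in enumerate(data) if T == t and delta == 1}
    return R, D, R - D

# three observations (T_i, delta_i, W_i, Z_i); the third is left truncated at 0.5
obs = [(2.0, 1, 0.0, None), (3.0, 0, 1.0, None), (2.0, 1, 0.5, None)]
print(risk_and_death_sets(obs, 2.0))
```

Note that left truncation only changes the risk set: an observation is at risk at t only when W_i < t ≤ T_i, so subjects contribute nothing before their truncation times.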

EXAMPLE 1 (Gamma process prior). A priori, assume that Y(t) = −log(1 − F(t)) is a gamma process with parameters (c(t), A_0(t)). Here, the gamma process with parameters (c(t), A_0(t)) is defined to be a Lévy process whose log moment generating function is

log E(exp(−θ Y(t))) = ∫_0^t ∫_0^∞ (e^{−θx} − 1) c(s) x^{−1} exp(−c(s)x) dx dA_0(s),

provided A_0(t) is continuous. This class of prior processes was proposed by Lo (1982). Kalbfleisch (1978) used a subclass of these processes with constant c(t) for the proportional hazard model. The cumulative hazard function A of the gamma process is a Lévy process with a Lévy measure ν given by

ν([0, t] × B) = ∫_0^t ∫_B [−log(1 − x)]^{−1} c(s) (1 − x)^{c(s)−1} dx dA_0(s).

Therefore, from Theorem 3.1(i), we can establish that the posterior distribution of the c.h.f. given β is a Lévy process with Lévy measure

ν(dt, dx | β, D_n) = [−log(1 − x)]^{−1} c(t) (1 − x)^{Σ_{j∈R_n(t)} exp(β^T Z_j) + c(t) − 1} dx dA_0(t) + Σ_{i=1}^{q_n} dH_i(x | β) δ_{t_i}(dt),

where

dH_i(x | β) ∝ [−log(1 − x)]^{−1} ∏_{j∈D_n(t_i)} [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{j∈R_n^+(t_i)} exp(β^T Z_j) + c(t_i) − 1} dx.

Even though the continuous part of Y is a gamma process, the gamma process prior is not conjugate, because the distributions of the jumps are not gamma distributions. This fact was overlooked by Clayton (1991), who used a gamma process in implementing an MCMC algorithm for the frailty model. In the case of constant c(t) without left truncation, our result coincides with that of Kalbfleisch (1978).

EXAMPLE 2 (Beta process prior). The beta process with mean A_0 and scale parameter c is a Lévy process with Lévy measure ν(dt, dx) = c(t) x^{−1} (1 − x)^{c(t)−1} dx dA_0(t). If the prior of the c.h.f. A is the beta process given above, the posterior of A given β is a Lévy process with Lévy measure

ν(dt, dx | β, D_n) = c(t) x^{−1} (1 − x)^{Σ_{j∈R_n(t)} exp(β^T Z_j) + c(t) − 1} dx dA_0(t) + Σ_{i=1}^{q_n} dH_i(x | β) δ_{t_i}(dt),

where

dH_i(x | β) ∝ x^{−1} ∏_{j∈D_n(t_i)} [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{j∈R_n^+(t_i)} exp(β^T Z_j) + c(t_i) − 1} dx.

Also, the marginal posterior distribution of β is given by (6) with h_i(x | β) dx = dH_i(x | β) and

ρ(β) = Σ_{i=1}^n ∫_{W_i}^{T_i} ∫_0^1 c(t) [1 − (1 − x)^{exp(β^T Z_i)}] x^{−1} (1 − x)^{Σ_{j=1}^{i−1} Y_j(t) exp(β^T Z_j) + c(t) − 1} dx dA_0(t).

Hjort (1990) derived the posterior distribution with a beta process prior when there are no ties and observations are subject to only right censoring. This example extends his results to tied observations.

With the result given above, we can devise a Markov chain Monte Carlo algorithm for left truncated and right censored data. Based on the idea that a positive increasing Lévy process is an integral of a Poisson random measure, Lee and Kim (2002) developed an approximate algorithm to generate a beta process and illustrated their algorithm with the proportional hazard model when the observations are subject to right censoring only. The algorithm can be used verbatim for left truncated and right censored data, except that the risk set R_n(t) is adjusted for left truncation; that is, the W_i's are no longer all 0.

4. Propriety of the posterior. In this section, we consider the propriety of the posterior distribution with the constant improper prior on β. For a set of vectors {x_1,...,x_m} in R^p, a conical (nonnegative linear) combination is a point x = Σ_{j=1}^m λ_j x_j, where λ_j ≥ 0 for j = 1,...,m. For a set A in R^p, the conical hull of A is the collection of all conical combinations of vectors from A, or

coni(A) = { Σ_{j=1}^m λ_j x_j : x_j ∈ A, λ_j ≥ 0 and m is a positive integer }.

We assume the following conditions on the prior process of the baseline c.h.f.:

A1. There exists a positive number ς_1 such that sup_{t∈[0,τ], x∈[0,1]} x f_t(x) (1 − x)^{1−ς_1} (= M_1) < ∞, where τ = max{T_1,...,T_n}.

A2. There exist positive constants M_2 and ς_2 and a positive function a_0(t) continuous on (0, τ) such that x f_t(x) ≥ M_2 (1 − x)^{ς_2 − 1} a_0(t) for all x ∈ [0, 1] and t ≤ τ.
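Whether a conical hull is all of R^p can be checked numerically: coni(B) = R^p exactly when every vector ±e_j of the standard basis is a nonnegative combination of the elements of B, which is a linear feasibility problem. A sketch (the helper is ours and assumes scipy is available):

```python
import numpy as np
from scipy.optimize import linprog

def conical_hull_is_Rp(vectors):
    """Check coni({z_1,...,z_m}) = R^p by testing whether each of +e_j and
    -e_j can be written as sum_i lambda_i z_i with lambda_i >= 0."""
    Z = np.asarray(vectors, dtype=float)      # rows z_i, shape (m, p)
    m, p = Z.shape
    for j in range(p):
        for sign in (1.0, -1.0):
            target = np.zeros(p)
            target[j] = sign
            res = linprog(np.zeros(m), A_eq=Z.T, b_eq=target,
                          bounds=[(0, None)] * m, method="highs")
            if res.status != 0:               # infeasible: target not in the cone
                return False
    return True

print(conical_hull_is_Rp([[1, 0], [0, 1], [-1, -1]]))  # positively spans R^2
print(conical_hull_is_Rp([[1, 0], [0, 1]]))            # only the first quadrant
```

The reduction to ±e_j suffices because a convex cone containing all 2p signed basis vectors contains every point of R^p.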

THEOREM 4.1. Let B = {Z_j − Z_k : i = 1,...,q_n, j ∈ D_n(t_i), k ∈ R_n^+(t_i)}. Assume that A1 and A2 hold. If coni(B) = R^p, the posterior distribution with a constant prior on the regression coefficients is proper.

The sketch of the proof of Theorem 4.1 is as follows. Conditions A1 and A2 yield the result that the marginal posterior distribution of β with a flat constant prior is bounded by J ∧_{z∈B} (e^{β^T z} ∧ 1) for some positive constant J, where ∧_i a_i denotes the minimum value of the a_i's. Thus, B is the set of vectors along which the likelihood decreases exponentially: it decreases in any direction β with β^T z < 0 for some z ∈ B. The condition coni(B) = R^p implies that the likelihood decreases exponentially in every direction. Consequently, integrability is guaranteed. The detailed proof of Theorem 4.1 is presented in Section 6.

REMARK. The condition coni(B) = R^p is equivalent to the uniqueness of the maximum likelihood estimator and the log-concavity of the partial likelihood. See Jacobsen (1989) and Andersen, Borgan, Gill and Keiding (1993). This condition can be seen as a counterpart of the linear independence of covariates in linear regression models.

Consider the gamma process prior in Example 1. Suppose A_0(t) = ∫_0^t a_0(s) ds for all t > 0, where a_0(t) is positive, continuous and bounded on (0, τ]. If 0 < inf_{t∈[0,τ]} c(t) ≤ sup_{t∈[0,τ]} c(t) < ∞, then assumptions A1 and A2 are satisfied by any positive constants ς_1 and ς_2 such that ς_1 < inf_{t∈[0,τ]} c(t) and ς_2 > sup_{t∈[0,τ]} c(t). Hence, we can use the constant improper prior on β as long as coni(B) = R^p holds. Similar arguments hold for the beta process prior in Example 2.

5. Proof of Theorem 3.1. We prove Theorem 3.1 in this section. The proofs of parts (i) and (ii) of the theorem are given in the following two subsections. Throughout this section, the probability P refers to the joint probability measure of the sampling distribution and the prior, and E(·) refers to its expectation.
Define counting processes N_i and Y_i on [0, ∞) by N_i(t) = I(T_i ≤ t, δ_i = 1) and Y_i(t) = I(W_i < t ≤ T_i). Let F_t = σ(N_1(s),...,N_n(s), s ≤ t). Then the compensator Λ_i of N_i with respect to F_t conditional on β and F is Λ_i(t) = ∫_0^t Y_i(s) dA_i(s), where A_i(t) is the cumulative hazard function of F_i. See Fleming and Harrington (1991) or Andersen, Borgan, Gill and Keiding (1993) for the theory of counting processes. Since the truncation and censoring times are assumed to be independent of the survival times, the posterior distribution conditional on D_n is the same as that conditional on N^n = (N_1,...,N_n) and Z^n = (Z_1,...,Z_n). In the following, we derive the posterior distribution conditioning on N^n and Z^n, and we drop Z^n from the formulas unless there is a possibility of confusion.

5.1. The posterior distribution of F given β and data. In this subsection, we prove part (i) of Theorem 3.1. Noting that A_i(t) is also a Lévy process, we derive the posterior distribution of A (and, consequently, of F) by first deriving the posterior distribution of A_1 using Theorem 3.2 in Kim (1999) and then inverting it to A using Theorem A.1 in the Appendix.

Suppose the Lévy measure ν of F is given by

(7) ν(dt, dx) = f_t(x) dx dt + Σ_{j=1}^l dG_j(x) δ_{v_j}(dt),

where, for each t, f_t(x) dx is a σ-finite measure on [0, 1] with ∫_0^t ∫_0^1 x f_s(x) dx ds < ∞ for all t, and the G_j(x) are distributions on [0, 1]. Here, L = {v_1,...,v_l} is the set of fixed discontinuities of F. For a given counting process N_1(t) with covariate Z_1, define N_1^c(t) by N_1^c(t) = N_1(t) − Σ_{s≤t} ΔN_1(s) I(s ∈ L). For given r > 0, define F(t; r) by F(t; r) = 1 − (1 − F(t))^r and let A(t; r) be the cumulative hazard function of F(t; r).

LEMMA 5.1. Given β and N_1, the posterior distribution of F is a process neutral to the right with Lévy measure

(8) ν(dt, dx | β, N_1) = (1 − x)^{Y_1(t) exp(β^T Z_1)} f_t(x) dx dt + Σ_{j=1}^l c_j^{−1} [(1 − x)^{Y_1(v_j) exp(β^T Z_1)}]^{1−a_j} [1 − (1 − x)^{exp(β^T Z_1)}]^{a_j} dG_j(x) δ_{v_j}(dt) + c(t)^{−1} [1 − (1 − x)^{exp(β^T Z_1)}] f_t(x) dx dN_1^c(t),

where c_j and c(t) are normalizing constants and a_j = I(ΔN_1(v_j) = 1).

PROOF. Let r = exp(β^T Z_1). Theorem A.1 with h(x) = 1 − (1 − x)^r implies that A(t; r) is a Lévy process with Lévy measure

ν(dt, dx; r) = (1/r)(1 − x)^{1/r−1} f_t(1 − (1 − x)^{1/r}) dx dt + Σ_{j=1}^l dG_j(1 − (1 − x)^{1/r}) δ_{v_j}(dt).

Since the compensator of N_1(t) is ∫_0^t Y_1(s) dA(s; r), Theorem 3.2 of Kim (1999) implies that the posterior distribution of A(t; r), given β and N_1, is a Lévy process

with Lévy measure ϱ:

ϱ(dt, dx | β, N_1) = (1 − Y_1(t)x)(1/r)(1 − x)^{1/r−1} f_t(1 − (1 − x)^{1/r}) dx dt + Σ_{j=1}^l c_j^{−1} (1 − Y_1(v_j)x)^{1−a_j} x^{a_j} dG_j(1 − (1 − x)^{1/r}) δ_{v_j}(dt) + c(t)^{−1} x (1/r)(1 − x)^{1/r−1} f_t(1 − (1 − x)^{1/r}) dx dN_1^c(t).

Since

A(t) = Σ_{s≤t} [1 − (1 − ΔA(s; r))^{1/r}] I(ΔA(s; r) > 0),

Theorem A.1 with h(x) = 1 − (1 − x)^{1/r} implies that the posterior distribution of A conditional on β and N_1 is a Lévy process with Lévy measure ν(· | β, N_1) defined by (8).

PROOF OF (i) OF THEOREM 3.1. For n = 1, Lemma 5.1 with L = ∅ implies the desired result. For n ≥ 2, we complete the proof by applying Lemma 5.1 repeatedly.

5.2. The marginal posterior distribution of β. In this subsection, we prove part (ii) of Theorem 3.1. The proof consists of two parts. First, we derive the marginal compensator of the counting process N_{k+1}, given β and N^k. Second, we derive the likelihood of β using Jacod's formula for the likelihood ratio [Jacod (1975); Andersen, Borgan, Gill and Keiding (1993)]. Assume L = ∅ in (7). Let r_k = exp(β^T Z_k) for k = 1,...,n.

LEMMA 5.2. For k ≥ 1, conditional on β and N^k, N_{k+1} is a multiplicative counting process with compensator ∫_0^t Y_{k+1}(s) dE[A(s; r_{k+1}) | β, N^k], where

(9) E[A(t; r_{k+1}) | β, N^k] = ∫_0^t ∫_0^1 [1 − (1 − x)^{exp(β^T Z_{k+1})}] ν(ds, dx | β, N^k).

PROOF. Part (i) of Theorem 3.1 implies that, conditional on β and N^k, the posterior distribution of A is a Lévy process with Lévy measure ν(· | β, N^k). Hence, A(t; r_{k+1}) is also a Lévy process and we have

P(X_{k+1} > t | β, N^k) = E(1 − F(t; r_{k+1}) | β, N^k) = E( ∏_{s≤t} [1 − dA(s; r_{k+1})] | β, N^k ) = ∏_{s≤t} (1 − dE[A(s; r_{k+1}) | β, N^k]).

Hence, conditional on β and N^k, the cumulative hazard function of X_{k+1} is E[A(t; r_{k+1}) | β, N^k]. By (3) and the change of variables technique, we have

E[A(t; r_{k+1}) | β, N^k] = ∫_0^t ∫_0^1 [1 − (1 − x)^{exp(β^T Z_{k+1})}] ν(ds, dx | β, N^k)

and the proof is complete.

LEMMA 5.3. Let P_{k+1}(· | β, N^k) be the probability measure of N_{k+1} conditional on β and N^k and let

L(N_{k+1} | β, N^k) = dP_{k+1}(· | β, N^k) / dP_{k+1}(· | 0, N^k).

Then

(10) L(N_{k+1} | β, N^k) ∝ ∏_{i=1}^{q_k} [ c_i^{−1} ∫_0^1 [(1 − x)^{exp(β^T Z_{k+1})}]^{1−ΔN_{k+1}(t_i)} [1 − (1 − x)^{exp(β^T Z_{k+1})}]^{ΔN_{k+1}(t_i)} ∏_{j∈D_k(t_i)} [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{j∈R_k^+(t_i)} exp(β^T Z_j)} f_{t_i}(x) dx ]^{Y_{k+1}(t_i)} × exp( −∫_{W_{k+1}}^{T_{k+1}} ∫_0^1 [1 − (1 − x)^{exp(β^T Z_{k+1})}] (1 − x)^{Σ_{j∈R_k(t)} exp(β^T Z_j)} f_t(x) dx dt ) × [ ∫_0^1 [1 − (1 − x)^{exp(β^T Z_{k+1})}] (1 − x)^{Σ_{j∈R_k(T_{k+1})} exp(β^T Z_j)} f_{T_{k+1}}(x) dx ]^{ξ_{k+1}},

where q_k, (t_1,...,t_{q_k}), D_k, R_k and R_k^+ are defined similarly to q_n, (t_1,...,t_{q_n}), D_n, R_n and R_n^+ in Theorem 3.1, except that only the first k observations are used. Also, ξ_{k+1} = I(δ_{k+1} = 1, T_{k+1} ≠ t_i, i = 1,...,q_k) and

(11) c_i = ∫_0^1 ∏_{j∈D_k(t_i)} [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{j∈R_k^+(t_i)} exp(β^T Z_j)} f_{t_i}(x) dx.

PROOF. Let B(t) = E[A(t; r_{k+1}) | β, N^k], B_d(t) = Σ_{i=1}^{q_k} ΔB(t_i) I(t_i ≤ t) and B_c(t) = B(t) − B_d(t). By a result of Jacod (1975) [or see Andersen, Borgan, Gill and Keiding (1993)] and the definition of the product integral, we have

(12) L(N_{k+1} | β, N^k) ∝ ∏_{t∈[0,τ]} [Y_{k+1}(t) dB(t)]^{ΔN_{k+1}(t)} [1 − Y_{k+1}(t) dB(t)]^{1−ΔN_{k+1}(t)} = b_c(T_{k+1})^{ξ_{k+1}} exp( −∫_0^∞ Y_{k+1}(t) dB_c(t) ) ∏_{i=1}^{q_k} ΔB_d(t_i)^{ΔN_{k+1}(t_i)} [1 − Y_{k+1}(t_i) ΔB_d(t_i)]^{1−ΔN_{k+1}(t_i)},

where b_c(t) is the first derivative of B_c(t). From the definition of B(t) in (9), we have

(13) B_d(t) = Σ_{i=1}^{q_k} c_i^{−1} ∫_0^1 [1 − (1 − x)^{exp(β^T Z_{k+1})}] ∏_{j∈D_k(t_i)} [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{j∈R_k^+(t_i)} exp(β^T Z_j)} f_{t_i}(x) dx I(t_i ≤ t)

and

(14) B_c(t) = ∫_0^t ∫_0^1 [1 − (1 − x)^{exp(β^T Z_{k+1})}] (1 − x)^{Σ_{j∈R_k(s)} exp(β^T Z_j)} f_s(x) dx ds.

The proof is complete by substituting (13) and (14) into (12).

PROOF OF (ii) OF THEOREM 3.1. When n = 1, (6) is valid by Lemma 5.3. Now, assume that (6) is true for n = k. Since π(β | N^{k+1}) ∝ L(N_{k+1} | β, N^k) π(β | N^k) and the c_i in (11) is the same as ∫_0^1 h_i(x | β) dx in (5) with n replaced by k, we complete the proof by rearranging the equations L(N_{k+1} | β, N^k) in (10) and π(β | N^k) in (6).

6. Proof of Theorem 4.1. In this section, we prove the propriety of the posterior distribution of β with the constant prior. Let

L(β) = e^{−ρ(β)} ∏_{i=1}^{q_n} ∫_0^1 h_i(x | β) dx.

This is the right-hand side of (6) with π(β) ≡ 1. We will prove that L(β) is integrable.

Let 0 = r_0 < r_1 < ... < r_m < ∞ be the ordered values of all the distinct points of {T_1,...,T_n, W_1,...,W_n}. For a given sequence of real numbers {a_i}, ∨_i a_i is defined to be the maximum value and ∧_i a_i the minimum value of the {a_i}. For each k, let n_k be the integer such that t_k = r_{n_k} and let

ρ_k(β) = ∫_{r_{n_k−1}}^{r_{n_k}} ∫_0^1 Σ_{i=1}^n Y_i(t) [1 − (1 − x)^{exp(β^T Z_i)}] (1 − x)^{Σ_{j=1}^{i−1} Y_j(t) exp(β^T Z_j)} f_t(x) dx dt.

LEMMA 6.1. For k = 1,...,q_n, ρ_k(β) ≥ C_k ∨_{j∈D_n(t_k)} ρ*_kj(β), where

ρ*_kj(β) = ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j)}] x^{−1} (1 − x)^{Σ_{l∈R_n^+(t_k)} exp(β^T Z_l) + ς_2 − 1} dx

and C_k = M_2 ∫_{r_{n_k−1}}^{r_{n_k}} a_0(t) dt.

PROOF. Note that

ρ_k(β) = ∫_{r_{n_k−1}}^{r_{n_k}} ∫_0^1 [1 − (1 − x)^{Σ_{j=1}^n Y_j(t) exp(β^T Z_j)}] f_t(x) dx dt.

Hence, for any j ∈ D_n(t_k),

ρ_k(β) ≥ ∫_{r_{n_k−1}}^{r_{n_k}} ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j) + Σ_{l∈R_n^+(t_k)} exp(β^T Z_l)}] f_t(x) dx dt ≥ ∫_{r_{n_k−1}}^{r_{n_k}} ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{l∈R_n^+(t_k)} exp(β^T Z_l)} f_t(x) dx dt ≥ ∫_{r_{n_k−1}}^{r_{n_k}} ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j)}] x^{−1} (1 − x)^{Σ_{l∈R_n^+(t_k)} exp(β^T Z_l) + ς_2 − 1} dx M_2 a_0(t) dt = C_k ρ*_kj(β).

The third inequality is due to A2. Since j is chosen arbitrarily, the proof is complete.

Let l_i(β) = ∫_0^1 h_i(x | β) dx. For j ∈ D_n(t_i), define l*_ij(β) by

l*_ij(β) = ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j)}] x^{−1} (1 − x)^{Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 − 1} dx.

LEMMA 6.2. l_i(β) ≤ M_1 ∧_{j∈D_n(t_i)} l*_ij(β).

PROOF. For any j ∈ D_n(t_i), we have

l_i(β) ≤ ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j)}] (1 − x)^{Σ_{k∈R_n^+(t_i)} exp(β^T Z_k)} f_{t_i}(x) dx ≤ M_1 ∫_0^1 [1 − (1 − x)^{exp(β^T Z_j)}] x^{−1} (1 − x)^{Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 − 1} dx = M_1 l*_ij(β).

The second inequality is due to A1. Since j is arbitrary, this completes the proof.

LEMMA 6.3. Let ρ*_i(β) = ∨_{j∈D_n(t_i)} ρ*_ij(β) and let l*_i(β) = ∧_{j∈D_n(t_i)} l*_ij(β). Then

L(β) ≤ M_1^{q_n} ∏_{i=1}^{q_n} exp(−C_i ρ*_i(β)) l*_i(β),

where the C_i are defined in Lemma 6.1.

PROOF. Lemma 6.1 implies

ρ(β) ≥ Σ_{k=1}^{q_n} ρ_k(β) ≥ Σ_{k=1}^{q_n} C_k ρ*_k(β)

and Lemma 6.2 implies

∏_{i=1}^{q_n} l_i(β) ≤ M_1^{q_n} ∏_{i=1}^{q_n} l*_i(β).

The above two inequalities lead to the conclusion.

Let ψ(x) = ∫_0^1 [1 − (1 − y)^{x−1}] y^{−1} dy. Direct calculation yields that, for any positive η,

(15) ψ*(η) = sup_{η≤x<∞} x ψ′(x) < ∞,

where ψ′(x) = dψ(x)/dx.

LEMMA 6.4. There exists a constant K > 0 such that

sup_{β∈R^p} exp(−C_i ρ*_i(β)) l*_i(β) ≤ K for i = 1,...,q_n.

PROOF. First consider ρ*_ij(β) − l*_ij(β) for j ∈ D_n(t_i). Using the mean value theorem, we can write

ρ*_ij(β) − l*_ij(β) = [ψ( exp(β^T Z_j) + Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_2 ) − ψ( exp(β^T Z_j) + Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 )] − [ψ( Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_2 ) − ψ( Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 )] = (ς_2 − ς_1) ψ′(a_1) − (ς_2 − ς_1) ψ′(a_2)

for some constants a_1 > ς_3 and a_2 > ς_3, where ς_3 = min{ς_1, ς_2}. From (15), we have

|ρ*_ij(β) − l*_ij(β)| ≤ |ς_1 − ς_2| 2ψ*(ς_3)/ς_3.

Since

ρ*_i(β) − l*_i(β) ≥ ∧_{j∈D_n(t_i)} ( ρ*_ij(β) − l*_ij(β) ) ≥ −|ς_1 − ς_2| 2ψ*(ς_3)/ς_3,

we have

exp(−C_i ρ*_i(β)) l*_i(β) ≤ exp(−C_i l*_i(β)) l*_i(β) exp( C_i |ς_1 − ς_2| 2ψ*(ς_3)/ς_3 ) ≤ exp( C_i |ς_1 − ς_2| 2ψ*(ς_3)/ς_3 ) / C_i.

Now, letting

K = max{ exp( C_i |ς_1 − ς_2| 2ψ*(ς_3)/ς_3 ) / C_i : i = 1,...,q_n },

we complete the proof.

LEMMA 6.5. For i = 1,...,q_n,

l*_i(β) ≤ ψ*(ς_1) ∧_{j∈D_n(t_i), k∈R_n^+(t_i)} exp( β^T (Z_j − Z_k) ).

PROOF. It suffices to show that, for any j ∈ D_n(t_i),

l*_ij(β) ≤ ψ*(ς_1) ∧_{k∈R_n^+(t_i)} exp( β^T (Z_j − Z_k) ).

Since ψ′ is decreasing, the mean value theorem yields

l*_ij(β) = ψ( exp(β^T Z_j) + Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 ) − ψ( Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 ) ≤ exp(β^T Z_j) ψ′( Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 ) ≤ [ exp(β^T Z_j) / ( Σ_{k∈R_n^+(t_i)} exp(β^T Z_k) + ς_1 ) ] ψ*(ς_1) ≤ ψ*(ς_1) / Σ_{k∈R_n^+(t_i)} exp( β^T (Z_k − Z_j) ) ≤ ψ*(ς_1) ∧_{k∈R_n^+(t_i)} exp( β^T (Z_j − Z_k) ).

LEMMA 6.6. For some constant J > 0,

L(β) ≤ J ∧_{z∈B} ( e^{β^T z} ∧ 1 ).

PROOF. Lemma 6.4 implies that, for i = 1,...,q_n,

sup_{β∈R^p} ∏_{k=1, k≠i}^{q_n} exp(−C_k ρ*_k(β)) l*_k(β) ≤ K^{q_n−1}.

Since exp(−C_i ρ*_i(β)) ≤ 1, Lemma 6.5 yields

L(β) ≤ M_1^{q_n} K^{q_n−1} ψ*(ς_1) ∧_{j∈D_n(t_i), k∈R_n^+(t_i)} exp( β^T (Z_j − Z_k) ).

Since i is arbitrary, we have

L(β) ≤ M_1^{q_n} K^{q_n−1} ψ*(ς_1) ∧_{i=1,...,q_n} ∧_{j∈D_n(t_i), k∈R_n^+(t_i)} exp( β^T (Z_j − Z_k) ).

On the other hand, by Lemma 6.4, we have

L(β) ≤ sup_{β∈R^p} ∏_{k=1}^{q_n} exp(−C_k ρ*_k(β)) M_1 l*_k(β) ≤ (M_1 K)^{q_n}.

Finally, by letting J = max{ M_1^{q_n} K^{q_n−1} ψ*(ς_1), (M_1 K)^{q_n} }, the proof is complete.

PROOF OF THEOREM 4.1. Suppose B = {z_1,...,z_m}. First, we show that there exists h_j > 0 such that

(16) ∧_{i=1}^m ( e^{β^T z_i} ∧ 1 ) ≤ e^{−h_j |β_j|}

for each j = 1, 2,...,p. Take j = 1 and let e_1 be the unit vector whose first coordinate is 1 and all the other coordinates are 0. Since coni{z_1,...,z_m} = R^p, there exist nonnegative numbers c_1,...,c_m such that

e_1 = Σ_{i=1}^m c_i z_i.

Let d = 1/(m ∨_i c_i) and d_i = d c_i. Note that d is well defined because not all the c_i can be zero, and m d_i ≤ 1 for all i = 1, 2,...,m:

∧_{i=1}^m ( e^{β^T z_i} ∧ 1 ) ≤ ∧_{i=1}^m ( e^{m d_i β^T z_i} ∧ 1 ) ≤ exp( Σ_{i=1}^m d_i β^T z_i ) ∧ 1 = e^{d β_1} ∧ 1.

The first inequality holds because m d_i ≤ 1; the second inequality holds because the minimum of m real numbers is always less than or equal to their average, applied here to the exponents. Similarly, writing −e_1 as a conical combination of z_1,...,z_m, we can find g > 0 such that

∧_{i=1}^m ( e^{β^T z_i} ∧ 1 ) ≤ e^{−g β_1} ∧ 1.

Now let h_1 = d ∧ g; then (16) holds for j = 1. By mimicking the above derivation with e_j, the unit vector whose jth coordinate is 1 and all the others are 0, we can find h_j for (16).

Finally, let h = min{h_1,...,h_p}. Combining Lemma 6.6 and (16), we have

∫_{R^p} L(β) dβ ≤ J ∫_{R^p} ∧_{j=1}^p e^{−h_j |β_j|} dβ ≤ J ∫_{R^p} e^{−h ∨_{j=1}^p |β_j|} dβ ≤ Jp ∫_{{β : |β_1| ≥ |β_i|, i = 2,...,p}} e^{−h |β_1|} dβ = Jp ∫_{−∞}^{∞} (2|β_1|)^{p−1} e^{−h |β_1|} dβ_1 = Jp 2^p ∫_0^∞ β_1^{p−1} e^{−h β_1} dβ_1 < ∞,

and the proof is complete.

APPENDIX

Transformation of nondecreasing Lévy processes. Let SC[0, 1] be the set of all strictly increasing differentiable functions h defined on [0, 1] with h(0) = 0 and h(1) = 1. We prove the following theorem.

THEOREM A.1. Let A be a Lévy process with Lévy measure ν given by (7). For a given h ∈ SC[0, 1], define a process B by

B(t) = Σ_{s≤t} h(ΔA(s)) I(ΔA(s) > 0).

Then B is a Lévy process with Lévy measure ν_B, where

(17) ν_B(dt, dx) = [dh^{−1}(x)/dx] f_t(h^{−1}(x)) dx dt + Σ_{j=1}^l dG_j(h^{−1}(x)) δ_{v_j}(dt)

and h^{−1} is the inverse function of h.

PROOF. Let A_n be the Lévy process defined by

A_n(t) = Σ_{s≤t} ΔA(s) I(1/n < ΔA(s) ≤ 1).

Let N(t) be a Poisson process with intensity function a(t) = ∫_{1/n}^1 f_t(x) dx. Conditional on {N(s) : s ≤ t}, let Y_1,...,Y_{N(t)} be independent random variables with densities a(t_i)^{−1} f_{t_i}(x) I(1/n < x ≤ 1) for i = 1,..., where t_i is the

time of the ith jump of N on [0, t]. Also, let V_1,...,V_l be random variables, independent of each other as well as of N and Y_1,...,Y_{N(t)}, such that the distribution function of V_i is G_i left truncated at 1/n. Since these two Lévy processes have the common Lévy measure ν_{A_n}(dt, dx) = I(1/n < x ≤ 1) ν(dt, dx), we have

(18) A_n(t) =_d Σ_{i=1}^{N(t)} Y_i + Σ_{i=1}^l V_i I(v_i ≤ t).

Define a Lévy process B_n by

B_n(t) = Σ_{s≤t} h(ΔA(s)) I(1/n < ΔA(s) ≤ 1).

Then (18) implies

B_n(t) =_d Σ_{i=1}^{N(t)} h(Y_i) + Σ_{i=1}^l h(V_i) I(v_i ≤ t).

Since the density of h(Y_i) is a(t_i)^{−1} [dh^{−1}(x)/dx] f_{t_i}(h^{−1}(x)) I(h(1/n) < x ≤ 1) and the distribution function of h(V_i) is G_i(h^{−1}(x)) left truncated at h(1/n), the Lévy measure of B_n is

ν_{B_n}(dt, dx) = [dh^{−1}(x)/dx] f_t(h^{−1}(x)) I(h(1/n) < x ≤ 1) dx dt + Σ_{j=1}^l dG_{nj}(h^{−1}(x)) δ_{v_j}(dt),

where G_{nj} is G_j left truncated at 1/n. Let B* be the Lévy process with Lévy measure ν_B in (17). Theorem 3.13 in Jacod and Shiryaev (1987) implies that B_n converges in distribution to B*. On the other hand, from its construction, it is obvious that B_n converges in distribution to B. Hence, B =_d B* and the proof is complete.

Acknowledgment. We thank the Associate Editor and a referee for valuable comments that helped us improve the paper.

REFERENCES

ANDERSEN, P. K., BORGAN, Ø., GILL, R. D. and KEIDING, N. (1993). Statistical Methods Based on Counting Processes. Springer, New York.

CLAYTON, D. G. (1991). A Monte Carlo method for Bayesian inference in frailty models. Biometrics.

DOKSUM, K. A. (1974). Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab.

FERGUSON, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist.

FLEMING, T. R. and HARRINGTON, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.

HJORT, N. L. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist.

JACOBSEN, M. (1989). Existence and unicity of MLEs in discrete exponential family distributions. Scand. J. Statist.

JACOD, J. (1975). Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Z. Wahrsch. Verw. Gebiete.

JACOD, J. and SHIRYAEV, A. N. (1987). Limit Theorems for Stochastic Processes. Springer, New York.

KALBFLEISCH, J. D. (1978). Nonparametric Bayesian analysis of survival time data. J. Roy. Statist. Soc. Ser. B.

KIM, Y. (1999). Nonparametric Bayesian estimators for counting processes. Ann. Statist.

KIM, Y. and LEE, J. (2001). On posterior consistency of survival models. Ann. Statist.

LAUD, P. W., DAMIEN, P. and SMITH, A. F. M. (1998). Bayesian nonparametric and covariate analysis of failure time data. In Practical Nonparametric and Semiparametric Bayesian Statistics. Lecture Notes in Statist.

LEE, J. and KIM, Y. (2002). A new algorithm to generate beta processes. Unpublished manuscript.

LO, A. Y. (1982). Bayesian nonparametric statistical inference for Poisson point processes. Z. Wahrsch. Verw. Gebiete.

DEPARTMENT OF STATISTICS
EWHA WOMANS UNIVERSITY
11-1 DAEHYUN-DONG, SEODAEMUN-GU
SEOUL
KOREA
ydkim@mm.ewha.ac.kr

DEPARTMENT OF STATISTICS
PENNSYLVANIA STATE UNIVERSITY
326 THOMAS BUILDING
UNIVERSITY PARK, PENNSYLVANIA
leej@stat.psu.edu


More information

Asymptotic statistics using the Functional Delta Method

Asymptotic statistics using the Functional Delta Method Quantiles, Order Statistics and L-Statsitics TU Kaiserslautern 15. Februar 2015 Motivation Functional The delta method introduced in chapter 3 is an useful technique to turn the weak convergence of random

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Chapters 13-15 Stat 477 - Loss Models Chapters 13-15 (Stat 477) Parameter Estimation Brian Hartman - BYU 1 / 23 Methods for parameter estimation Methods for parameter estimation Methods

More information

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data Efficiency Comparison Between Mean and Log-rank Tests for Recurrent Event Time Data Wenbin Lu Department of Statistics, North Carolina State University, Raleigh, NC 27695 Email: lu@stat.ncsu.edu Summary.

More information