and Comparison with NPMLE

Size: px

Start display at page:

Download "and Comparison with NPMLE"

Shawn Sharp
5 years ago
Views:

1 NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA and Comparison with NPMLE Mai Zhou Department of Statistics, University of Kentucky, Lexington, KY USA mai/

2 1. Introduction Assume lifetimes are non-negative, iid with a distribution F ( ): X 1,, X n. (1.1) However, these lifetimes are subject to censoring. right censoring: { { Xi if X Z i = i C i 1 if Xi C and C i if X i > C i = i (1.2) i 0 if X i > C i where C i are the (right) censoring times. Double censoring: (Chang and Yang 1987) Z i = X i if Y i X i C i C i if X i > C i and i = Y i if X i < Y i 1 if Y i X i C i 0 if X i > C i. 2 if X i < Y i (1.3)

3 (C i, Y i ), i = 1,, n are the left and right censoring times with C i > Y i. WLOG let the observations be arranged such that Z 1,, Z k are the uncensored observations, i.e. 1 = 1,, k = 1. In the Bayesian estimation of F ( ) we need not make assumptions about the distributions of the left and right censoring times C i and Y i. The observations can be described in three groups Z 1,, Z k where X i = Z i ; and Z k+1,, Z m where X i > Z i ; and Z m+1,, Z n, where X i < Z i.

4 The current status data or case 1 interval censored data: (Huang and Wellner (1996)) we observe T i, i = { 0 if Xi > T i 2 if X i < T i. Usually T i are assumed iid. This iid assumption does not make a difference here. Therefore, the current status data is a special case of (1.3) where all the observations are either left or right censored, i.e. k = 0. In the case 2 interval censoring, we assume X 1,, X k are observed exactly (k non-random, and may be zero), and only the observations X k+1,, X n are interval censored, and the following information is known:

5 (1) We observe n k intervals. With some abuse of notation, we denote the intervals by [L j, Z j ) for j = k + 1,, n. (1.4) (2) We know that L j X j < Z j. For our analysis here case 2 and case k do not make a difference. To make the notation consistent with the doubly censored case, we let Z 1 = X 1,, Z k = X k for directly observable outcomes. The interval censored outcomes are [L j, Z j ) for j = k + 1,, n. When the interval [L j, Z j ) = [a, ), this reduces to right censoring, and when [L j, Z j ) = [0, a), this reduces to left censoring.

6 Arbitrary Censored/truncated Data of Turnbull (1976) Suppose independent random variables, X i, are drawn from the conditional distribution functions P (X t X B i ). Here B i are truncation sets. Furthermore, X i is censored into the set A i, i.e. we only know that X i A i. Turnbull supposed A i = j [L ij, R ij ]. The empirical (or nonparametric) likelihood, based on the censored and truncated observations (A i, B i ), pertaining to the CDF of X is L = i P (X A i ) P (X B i ),

7 see Turnbull 1976 (3.6). Turnbull (1976) identified the (disjoint) intervals, [q j, p j ], where the NPMLE may have positive probability masses. Let us denote the mid point of those intervals by t j and the corresponding probability masses by s j. Turnbull proposed an EM/self-consistant algorithm to find the NPMLE of the CDF of X. The self-consistent equation of Turnbull: π j (s) = s j j = 1,..., m.

8 We seek to maximize the empirical likelihood with an extra mean constraint for given µ and g(t). m j=1 s j g(t j ) = g(t)df (t) = µ, (2) Our self-consistent equation is just where π j (s) = s j j = 1, 2,... m (3) π j (s) = Ni=1 {µ ij (s) + ν ij (s)} M(s) + λ(g(t j ) µ). (4)

9 In the above M(s), µ ij, ν ij are as defined in Turnbull (1976), and λ is the solution of the following equation: 0 = m j=1 (g(t j ) µ) N i=1 {µ ij (s) + ν ij (s)} M(s) + λ(g(t j ) µ). (5) M(s) = i j (µ ij + ν ij ), µ ij = µ ij (s) = EI {xi [q j,p j ]}, ν ij = EJ ij. Because of truncation, each observation X i = x i, can be considered a remnant of a group, the size which is unknown and all (except the

10 one observed) with x-values in B c i. They can be thought of as X i s ghosts. J ij is the number in the group corresponding to the i th observation which have values in [q j, p j ].

11 Remark: Other constraints of the type g 1 (t)df (t) = g 2 (t)df (t) for one, two or more samples (F i (t) can be handled similarly. Remark: Using a similar modified EM method, we can compute the NPMLE under weighted hazard constraint: g(t)dλ(t) = θ for arbitrary censored/truncated data. (constrained NPMLE of hazard function). This can even be generalized to cover the case where X s follow a Cox proportional hazard model. Modified self-consistent/em algorithm can

12 be used to compute the empirical likelihood ratio for β (the regression coefficient) there with arbitrary censored/truncated data, etc.

13 Using the NPMLEs, with and without extra constraint, we can compute Empirical Likelihood Ratio: 2 log L( ˆF ) L( ˆF ) = χ where ˆF is the NPMLE without additional constraint. ˆF is the NPMLE with a constraint. Test the hypothesis specified by the constraint: Reject H 0 if χ > 3.84 The Empirical likelihood method was first proposed by Thomas and Grunkmier (1975) to obtain better confidence intervals in connection

14 with the Kaplan-Meier estimator. Owen (1988, 1990) and many others developed this into a general methodology. It has many desirable statistical properties, see Owen s (2001) book Empirical Likelihood. One of the nice features of the empirical likelihood ratio that is particularly appreciated in the censored/truncated data analysis is that we can test hypothesis/construct confidence intervals without estimating the variance of the statistic. Some of the computational algorithms above have been implemented: package emplik for software R. ( )

15 Back to Bayes: Prior on the probability F ( ). Assume F ( ) is distributed as a Dirichlet process with parameter α. Under the Dirichlet process prior assumption, the probability measure P (A) = A df has the following property: given any partition of the real line: A 1, A u the joint distribution of the random vector (P (A 1 ),, P (A u )) has a Dirichlet distribution with parameter given by α(a 1 ),, α(a u ). For more discussion and properties of Dirichlet process prior, please see Ferguson (1973), Susarla and Van Ryzin (1976) and Ferguson, Phadia and Tiwari (1993).

16 Susarla and Van Ryzin (1976) obtained the Bayes estimator for F ( ) under Dirichlet process prior, using squared error loss, for right censored data. They also showed that ˆF Bayes ˆF NP MLE = Kaplan Meier Est. as α 0 Some later papers studied the large sample consistency of the Bayes estimator (Susarla and Van Ryzin 1978) and the posterior distribution (Ghosh and Ramamoorthi 1995). Huffer and Doss (1999) used Monte Carlo methods to compute the nonparametric Bayes estimator.

17 We show that for any sequence of priors the nonparametric Bayes estimators under squared error loss cannot always converge to the corresponding NPMLE with doubly/interval censored data. This is a bit surprising since in most cases, MLEs are limits of Bayes estimators. 2. Bayes estimator with right, left/interval censored observations The Bayes estimator of 1 F ( ) under squared error loss of SV is the conditional expectation of 1 F given all the observations. The conditional expectation is computed in two steps: first given all the uncensored observations we find theconditional distribution of 1 F.

18 Second, given all the censored observations we compute the conditional expectation, where the distribution of those lifetimes before censoring are given by the result of the first step. The following theorem specifies the conditional distribution of F ( ) given all the uncensored observations, which accomplishes the first step. Theorem 1 The posterior distribution of the random probability measure P given ( 1 = 1, Z 1 ),..., ( k = 1, Z k ) is the Dirichlet process with parameter β = α + k i=1 δ Zi, where δ a is a unit measure on the point a.

19 Next, the conditional expectation of 1 F (u) = P [u, ) is computed given the remaining n k 1 censored observations. 2.1 Bayes estimator with One interval/left censored observation We first present the Bayes estimator with many right censored observations but with only one interval censored observation, denoted by [L w, Z w ). By the similar argument as in SV Corollary 1, the conditional expectation, E β, of 1 F (u) = P [u, ) = P (X u) given all the right censored data and one interval censored observation is Ŝ D (u) = E β{p [u, )P [L w, Z w ) right censored P [Z i, )} E β {P [L w, Z w ) right censored P [Z i, )}. (2.1)

20 This is also the desired Bayes estimator of 1 F (u). Straightforward calculation yields Ŝ D (u) = E βp [u, ){P [L w, ) P [Z w, )} r c P [Z i, ) E β {P [L w, ) P [Z w, )} r c P [Z i, ) = E β{p [u, )P [L w, ) r c P [Z i, )} E β {P [u, )P [Z w, ) r c P [Z i, )} E β {P [L w, ) r c P [Z i, )} E β {P [Z w, ) r c P [Z i, )} = E β 1 E β 2 E β 3 E β 4. (2.3) The four expectations in (2.3) are all of a same type and can be computed explicitly by the Lemma below.

21 Given a set of positive numbers 0 < a k+1 < a k+2 < < a m <, it partitions the nonnegative half line R + into intervals [0, a k+1 ), [a k+1, a k+2 ),, [a m, ). By Theorem 1 the random vector P [0, a k+1 ), P [a k+1, a k+2 ), P [a m, ) has a Dirichlet distribution with parameter vector (β k+1,, β m+1 ) where β k+1 = β[0, a k+1 ),, β m+1 = β[a m, ). The measure β is given as before by β = α + uncensored δ Zi. Lemma 1 (Susarla and Van Ryzin) With the notation above, we have E β m i=k+1 P [a i, ) = m k 1 i=0 i + i j=0 β m+1 j i + β(r + ) = m k 1 i=0 ( i + β[am i, ) i + β(r + ) )

22 When α(r + ) = 0 the expression on the right hand side of Lemma 1 is still well defined unless there are no uncensored observations in the sample. Remark: It is clear from the definition of β that when α(r + ) 0, β is integer valued. This implies that the limit of the expectation in Lemma 1 has a rational value (finite product of rational numbers) as α(r + ) Many interval censored observations

23 The Bayes estimator of 1 F (u) = P (X u) is Ŝ D (u) = E β{p [u, ) i c P [L w, Z w ) r c P [Z i, )} E β { i c P [L w, Z w ). (2.5) r c P [Z i, )} Let us recall the identity m i=1 (b i a i ) = y 1 y 2 y m (2.7) where y i = either b i or a i and the summation is over all possible 2 m choices. The integer m is defined as m = number of interval censored observations.

24 By using (2.7), we can rewrite P [L w, Z w ) = {P [L w, ) P [Z w, )} = P 1 P 2 P m i c i c where each P w = either P [L w, ) or P [Z w, ) and the summation is over all 2 m different choices. To make the expression more specific we introduce the follow notation. Define vectors ξ = (ξ 1,, ξ m ) where each ξ i = either 0 or 1. There are 2 m such vectors. Given m interval censored observations, [L i, Z i ), we define 2 m sets of numbers (each of size m) {c i (ξ), i = 1, 2,, m} where c i (ξ) = L i if ξ i = 0 otherwise c i (ξ) = Z i.

25 Associate with each set of {c i (ξ), i = 1, m} we also define a sign: if the set contains even number of Z i s then the sign is positive, if the set contains odd number of Z i s the sign is negative, i.e. ± = ( 1) ξ i. With these definition we can further rewrite i c P [L w, Z w ) = P 1 P 2 P m = ξ ± m i=1 P [c i (ξ), ) where the summation is over all 2 m different ξ s, and ± is the associated sign we defined above. Finally, we define new sets of numbers by adding r (r = #{r c}) right censored observations Z 1,, Z r to {c i (ξ), i = 1,, m}: {b j (ξ), j = 1,, m + r} = {c i (ξ), i = 1,, m} {Z 1,, Z r }.

26 For any sets of real numbers b 1, b 2,, b k, we denote by b ( 1), b ( 2), b ( k) the reversely ordered numbers (descending). So, b + (ξ), i = 1,, m+ ( i) r is a set of m + r numbers ordered from largest to smallest. With these sets of numbers defined, the denominator of (2.5) can be written as Eβ P 1 P 2 P m P [Z i, ) r c = ξ ± m+r i=1 i 1 + β[b + (ξ), ) ( i) i 1 + β(r + ) where the summation is over 2 m different ξ s. We can similarly compute the numerator of (2.5) except there is one more term, P [u, ), included with the right censored observations. Define {b + j (ξ)} = {c i(ξ), i = 1,, m} {Z 1,, Z r, u}.

27 Theorem 2 The nonparametric Bayes estimator of the survival function S(u) = 1 F (u) with right censored and interval censored data under Dirichlet process prior is Ŝ D (u) = Eβ { P1 P 2 P m r c P [Z i, ) P [u, ) } Eβ { P1 P 2 P m r c P [Z i, ) }, = ξ ( 1) ξ ( 1) m+r+1 ξs i=1 m+r ξs i=1 i 1 + β[b + (ξ), ) ( i) i 1 + β(r + ) i 1 + β[b ( i) (ξ), ) i 1 + β(r + ). (2.8) The two sums in (2.8) are over all 2 m possible ξ s.

28 Admittedly the two summations above involves 2 m terms when there are m interval censored observations. Also, in the summations, there are both positive and negative terms that will cancel and diminishing significant digits. Rounding errors will be magnified. Remark: From Lemma 1 and Theorem 2, we can infer that the limit of the Bayes estimator (2.8) when the α measure approaches zero is a step function, at least for u < maximum observed value. This is because all the E β involved will be step functions according to Lemma 1. We can also infer that when the α measure approaches zero, the limit of the Bayes estimator (2.8) is always rational valued, since the E β involved will be all rational valued.

29 Examples We shall pay close attention to the limit of the Bayes estimator when α 0 in the Dirichlet prior (non-informative prior), and compare the estimator with NPMLE. Example 1 These are the data used by SV (1976) in their example but with an added left censored observation at Z = 4. Z i s : : Table 1. Data with one left and four right censored observations.

30 We plot the estimator with α(u, ) = B exp( θu). The plot shows estimators for B = 8, θ = 0.12 and B = 0.001, θ = The latter is indistinguishable in appearance with the limit of α 0. t NPMLE Limit Bayes Table 2. NPMLE and limit of Bayes estimator for data in table 1

32 Example 2: We took the first 10 observations from the breast cosmesis data with radiation of Finkelstein and Wolfe as reported by Fay (1999). Out of the 10, there are 4 right censored observations, 4 interval censored observations and 2 left censored observations (i.e. interval censored with left-ends = 0). Data: [45, ), [6, 10), [0, 7), [46, ), [46, ), [7,16), [17, ), [7,14), [37, 44), [0, 8). We compute the nonparametric Bayes estimator with α(u, ) = B exp( θu) The resulting estimator with B = 8 and θ = 0.3 is computed using the software we developed and plotted in Figure 2.

34 In the following two examples, the Bayes estimators are obtained with formula (2.8) and then we let α(r + ) 0 to obtain the limit. The NPMLE s are also calculated not by software but analytically. Example 3 Here we took a small example with one left and one right censored observation. The NPMLE and limit of non-parametric Bayes estimator turns out to be exactly the same. Z i s : Z (1) Z (2) Z (3) Z (4) Z (5) : Jump of 1 ˆF (u) Table 4. Data with one left and one right censored observation. Example 4 The order of two censoring indicators in Example 3 are switched and the NPMLE is different from the limit of Bayes estimator.

35 The limit of Bayes estimator is not self-consistent either. To calculate the NPMLE, we first note for this data the NPMLE F ( ) have only three jumps at Z (1), Z (3) and Z (5). Denote the jumps by p 1, p 2, p 3. By symmetry we must have p 1 = p 3. Using the constraint p i = 1, we can reduce the likelihood, L = p 1 (p 2 + p 3 )p 2 (p 1 + p 2 )p 3, to a function of only p 2. Straightforward calculation show p 2 = 5/5 maximizes the likelihood. Therefore p 1 = p 3 = (5 5)/10 which is the entry in the table. Z i s : Z (1) Z (2) Z (3) Z (4) Z (5) : Jump of limit Bayes Jump of NPMLE

36 Remark: This example showed that with positive probability, the NPMLE 1 ˆF ( ) for doubly/interval censored data can take irrational values. A closer look at the Example 4 provides some insight as why the two estimators are different, as described in next section. Remark: Example 3 and 4 reveal two different situations. The difference is that left and right censored data overlap (right censored observation is smaller then the left censored observation) in example 4. The overlap in Example 3 is not real, since there is no probability mass inside the overlap. 4. Limit of Bayes and NPMLE

37 The argument below is valid for any prior, not just the Dirichlet process prior. Theorem 3. Suppose a sequence of priors, π v ; v = 1, 2,, is such that the (nonparametric) Bayes estimator 1 ˆF v ( ) under squared error loss, converges to the Kaplan-Meier estimator whenever the data contains only right censored observations. Then this same sequence of Bayes estimators cannot converge, in general, to the NPMLE for interval/doubly censored data. Proof: The Bayes estimator under squared error loss can be written as 1 ˆF v (u) = E πp [u, )L F (data) E π L F (data),

38 where L F (data) is the likelihood of the data when its distribution is F. The assumption of the Theorem for right censored data says that as v we always have E π {P [u, ) P [x i, ) r c E π r c P [x i, ) uncensor uncensor P ({x j })} P ({x j })} 1 F K M (u). (4.1) Notice the Kaplan-Meier estimator, F K M (u), is always rational valued. Now we look at a particular sample configuration with just one left censored observation, for example, the data as in Example 4. The Bayes estimator of 1 F (u) for this data can be written as E π P [u, )P ({Z 1 })P [Z 2, )P ({Z 3 })P [0, Z 4 )P ({Z 5 }) E π P ({Z 1 })P [Z 2, )P ({Z 3 })P [0, Z 4 )P ({Z 5 }).

39 Let us use the notation P (Z) = P ({Z}), and P (Z + ) = P [Z, ). Write P [0, Z 4 ) = 1 P [Z 4, ) = 1 P (Z 4 + ) and expand to get E π P (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 )P (u+ ) E π P (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 )P (Z+ 4 )P (u+ ) E π P (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 ) E πp (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 )P (Z+ 4 ) (4.2) If we divide the numerator of (4.2) by E π P (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 )P (u+ ) then as v, the numerator will converge to the limit: 1 [1 F K M (Z 4)] according to (4.1). Here the Kaplan-Meier estimator is based on three uncensored observations: Z 1, Z 3, Z 5 and two right censored observations: Z 2, u. Similarly, if we divide the denominator of (4.2) by EP (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 ) then it has the limit 1 [1 F K M (Z 4)], where the Kaplan-Meier es-

40 timator is based on three uncensored observations, Z 1, Z 3, Z 5 and one right censored observation Z 2. In other words, multiply (4.2) by E π P (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 ) E π P (Z 1 )P (Z 3 )P (Z 5 )P (Z + 2 )P (u+ ) (4.3) then it will have a rational limit. This factor, (4.3), itself have a rational limit as v ( = [1 F K M (u)] 1 ). This imply the limit of (4.2), as v, is F K M (Z 4) F K M (Z 4) [1 F K M (u)]. But that cannot be the NPMLE as Example 4 show the NPMLE is irrational valued.

41 This also serve as the proof for the interval censored case since the left censored observation is just [0, Z (4) ) interval censored. Corollary 1. There is no sequence of priors, π v, such that the resulting sequence of Bayes estimators under squared error loss, 1 F v ( ), always converge to the NPMLE in the interval/doubly censored data case. Proof: Suppose, to the contrary, there is such a sequence of priors. Since right censoring is a special case of double/interval censoring (zero left censoring or all intervals are of the form [a i, )), this sequence of estimators must converge to the Kaplan-Meier estimator with right censored data. But by Theorem 3 such sequence cannot converge to the NPMLE for doubly/interval censored data case in general.

42 References Chang, M. (1990). Weak convergence of a self-consistent estimator of the survival function with doubly censored data. Ann. Statist. 18, Chang, M. and Yang. G.L. (1987). Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Ann. Statist. 15, Chen, K. and Zhou, M. (2003). Non-parametric hypothesis testing and confidence intervals with doubly censored data. Lifetime Data Analysis 9, Fay, M. (1999). Splus functions for nonparametric estimate and test for interval censored data. Ferguson, T. (1973). A Bayesian analysis of some nonparametric Problems. Ann. Statist Ferguson, T., Phadia, E.G. and Tiwari, R.C. (1993). Bayesian nonparametric inference. Current Issues in Statistical Inference: Essays in honor of D. Basu (edited by M. Ghosh and P.K. Pathak). IMS Lecture Notes Monograph series 34, Ghosh, J.K. and Ramamoorthi, R.V. (1995). Consistency of Bayesian inference for survival analysis with or without censoring. Analysis of Censored Data (edited by Koul, H. and Deshpande, J.V.) IMS Lecture Notes Monograph Series 27, Groeneboom, P. and Wellner, J. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhauser, Basel. Gu, M.G. and Zhang, C.H. (1993). Asymptotic properties of self-consistent estimators based on doubly censored data. Ann. Statist. 21,

43 Hjort, N. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18, Huang, J. and Wellner, J. (1996). Interval censored survival data: a review of recent progress. Proceedings of the First Seattle Symposium in Biostatistics. Lin, D.Y. and Fleming, T. Editors Huffer, F. and Doss, H. (1999). Software for Bayesian analysis of censored data using mixtures of Dirichlet priors. Preprint. Kaplan, E. and Meier, P. (1958), Non-parametric estimator from incomplete observations. J. Amer. Statist. Assoc. 53, Susarla, V. and Van Ryzin, J. (1976), Nonparametric Bayesian estimation of survival curves from incomplete observations. J. Amer. Statist. Assoc. 71, Susarla, V. and Van Ryzin, J. (1978), Large sample theory for a Bayesian nonparametric survival curves estimator based on censored samples. Ann. Statist. 6, Turnbull, B. (1974). Nonparametric estimation of a survivorship function with doubly censored data. JASA, Department of Statistics University of Kentucky Lexington, KY

NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA

Statistica Sinica 14(2004, 533-546 NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA Mai Zhou University of Kentucky Abstract: The non-parametric Bayes estimator with