Pseudo full likelihood estimation for prospective survival analysis with a general semiparametric shared frailty model: asymptotic theory


David M. Zucker¹
Department of Statistics, Hebrew University, Mt. Scopus, Jerusalem 91905, Israel
mszucker@mscc.huji.ac.il

Malka Gorfine
Faculty of Industrial Engineering and Management, Technion, Technion City, Haifa 32000, Israel, and Department of Mathematics, Bar-Ilan University, Ramat-Gan 52900, Israel
gorfinm@ie.technion.ac.il

Li Hsu
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
lih@fhcrc.org

August 29, 2007

¹ To whom correspondence should be addressed.

Abstract

In this work we present a simple estimation procedure for a general frailty model for analysis of prospective correlated failure times. Earlier work showed this method to perform well in a simulation study. Here we provide rigorous large-sample theory for the proposed estimators of both the regression coefficient vector and the dependence parameter, including consistent variance estimators.

Key words: Correlated failure times; EM algorithm; Frailty model; Prospective family study; Survival analysis.

1 Introduction

Many epidemiological studies involve failure times that are clustered into groups, such as families or schools. Unobserved characteristics shared by members of the same cluster (e.g., genetic information or unmeasured shared environmental exposures) could influence time to the studied event. Frailty models express within-cluster dependence through a shared unobservable random effect. Estimation in the frailty model has received much attention under various frailty distributions, including gamma (Gill, 1985, 1989; Nielsen et al., 1992; Klein, 1992, among others), positive stable (Hougaard, 1986; Fine et al., 2003), inverse Gaussian, compound Poisson (Henderson and Oman, 1999) and log-normal (McGilchrist, 1993; Ripatti and Palmgren, 2000; Vaida and Xu, 2000, among others). Hougaard (2000) provides a comprehensive review of the properties of the various frailty distributions. In a frailty model, the parameters of interest typically are the regression coefficients, the cumulative baseline hazard function, and the dependence parameters in the random effect distribution. Since the frailties are latent covariates, the Expectation-Maximization (EM) algorithm is a natural estimation tool, with the latent covariates estimated in the E-step and the likelihood maximized in the M-step after substituting in the estimated latent quantities. Gill (1985), Nielsen et al. (1992) and Klein (1992) discussed EM-based maximum likelihood estimation for the semiparametric gamma frailty model. One problem with the EM algorithm is that variance estimates for the estimated parameters are not readily available (Louis, 1982; Gill, 1989; Nielsen et al., 1992; Andersen et al., 1997). It has been suggested (Gill, 1989; Nielsen et al., 1992) that a nonparametric information calculation could yield consistent variance estimators. Parner (1998), building on Murphy (1994, 1995), proved the consistency and asymptotic normality of the maximum likelihood estimator in the gamma frailty model.
Parner also presented a consistent estimator of the limiting covariance matrix of the estimator, based on inverting a discrete observed information matrix. He noted that since the dimension of the observed information matrix grows with the number of observed survival times, inverting the matrix is practically infeasible for a large data set with many distinct failure times. He therefore suggested an alternative approach to estimating the covariance, based on solving a discrete version of a second-order Sturm-Liouville equation, along the lines of Bickel (1985). This covariance estimator requires less computational effort, but is still not simple to implement.

We (Gorfine et al., 2006) developed a new method that can handle any parametric frailty distribution with finite moments. Nonconjugate frailty distributions can be handled by a simple univariate numerical integration over the frailty distribution. Our new method possesses a number of desirable properties: a non-iterative procedure for estimating the cumulative hazard function; consistency and asymptotic normality of the parameter estimates; a direct consistent covariance estimator; and easy computation and implementation. The method was found to perform well in a simulation study, with results very similar to those of the EM-based method. Indeed, on a dataset-by-dataset basis, the correlation between our estimator and the EM estimator was found to be 95% for the covariate regression parameter and 98-99% for the within-cluster dependence parameter. The purpose of the current paper is to present in detail the theoretical justification for the method.

Our technical approach resembles that of Bagdonavicius and Nikulin (1999) and Dabrowska (2006a, 2006b). These works, however, dealt with a univariate data context, whereas we deal with a clustered data context. Dabrowska works with a transformation model with unknown transformation. She discusses the univariate gamma frailty model, but assumes that the shape parameter of the frailty distribution is known.
Indeed, as discussed in Dabrowska (2006a, pp ), identifiability problems arise in the univariate gamma frailty model with unknown shape parameter when an unknown transformation is involved. In fact, even when the transformation is known, if there are no covariate effects on the hazard rate (i.e., in model (1) below, the regression parameter vector β is equal to zero), the shape parameter cannot be identified from univariate data (Lancaster and Nickell, 1980). In our setting, there is no unknown transformation, and we have clustered data. In this case, the shape parameter is identifiable irrespective of whether β is zero or nonzero. In our work, we are specifically interested in estimating the shape parameter, which expresses the within-cluster dependence. In genetic research and other contexts, this cluster dependence parameter is itself of significant scientific interest, because it provides insight into the impact of genetic and environmental factors on disease incidence.

Dabrowska (2006b) discusses a one-step method for converting a consistent estimator into a semiparametric efficient estimator. In principle, this approach could be applied to our estimator as well. In our simulations, however, we found that our estimator was comparable in efficiency to the full nonparametric MLE. Thus, although our estimator is not theoretically semiparametric efficient, in practical terms it closely approaches semiparametric efficiency.

The plan of the paper is as follows. Section 2 presents the estimation procedure. Section 3 presents the consistency and asymptotic normality results, along with the covariance estimator for the parameter estimates. Section 4 presents a simulation study. Section 5 presents the technical conditions required for our theoretical results and the proofs of these results. The proofs are patterned after Zucker (2005), but with a number of significant differences, which are described at the beginning of Section 5.

2 The Proposed Approach

Consider n families, with family i containing $m_i$ members, $i = 1, \ldots, n$. Following Parner (1998, p. 187), we regard $m_i$ as a random variable over $\{1, \ldots, m\}$ for some m, and build up the remainder of the model conditional on $m_i$. Let $T^0_{ij}$ and $C_{ij}$ denote the failure and censoring times, respectively, for individual ij. The observed follow-up time is $T_{ij} = \min(T^0_{ij}, C_{ij})$, and the failure indicator is $\delta_{ij} = I(T^0_{ij} \le C_{ij})$. On each individual, we observe a p-vector of covariates $Z_{ij}$. In addition, we associate with family i an unobservable family-level covariate $W_i$, the frailty, which induces dependence among family members. The conditional hazard function for individual ij, given the family frailty $W_i$, is taken to be

$$\lambda_{ij}(t) = W_i \lambda_0(t) \exp(\beta^T Z_{ij}), \quad i = 1, \ldots, n, \; j = 1, \ldots, m_i, \qquad (1)$$

where $\lambda_0$ is an unspecified conditional baseline hazard and β is a p-vector of unknown regression coefficients. This is an extension of the Cox (1972) proportional hazards model, with the hazard function for an individual in family i multiplied by $W_i$. Conditional on $W_i$, the individuals within a family are assumed independent. We also assume that, given $Z_{ij}$ and $W_i$, the censoring is independent and noninformative for $W_i$ and $(\beta, \Lambda_0)$ (Andersen et al., 1993, Sec. III.2.3). We assume further that the frailty $W_i$ is independent of $Z_{ij}$ and has a density $f(w; \theta)$, where θ is an unknown parameter. For simplicity we assume that θ is a scalar, but the development extends readily to the case where θ is a vector. Finally, we assume that for any given family, there is a positive probability of at least two failures. This condition is necessary to ensure identifiability of the model; see Nielsen et al. (1992, Sec. 4, end). Let τ be the end of the observation period. The full likelihood of the data then can be written as

$$L = \prod_{i=1}^n \int \prod_{j=1}^{m_i} \{\lambda_{ij}(T_{ij})\}^{\delta_{ij}} S_{ij}(T_{ij}) f(w)\,dw = \prod_{i=1}^n \prod_{j=1}^{m_i} \{\lambda_0(T_{ij}) \exp(\beta^T Z_{ij})\}^{\delta_{ij}} \prod_{i=1}^n \int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f(w)\,dw, \qquad (2)$$

where $N_{ij}(t) = \delta_{ij} I(T_{ij} \le t)$, $N_{i\cdot}(t) = \sum_{j=1}^{m_i} N_{ij}(t)$, $H_{ij}(t) = \Lambda_0(T_{ij} \wedge t) \exp(\beta^T Z_{ij})$, $a \wedge b = \min\{a, b\}$, $\Lambda_0(\cdot)$ is the baseline cumulative hazard function, $S_{ij}(\cdot)$ is the conditional survival function of subject ij, and $H_{i\cdot}(t) = \sum_{j=1}^{m_i} H_{ij}(t)$. The log-likelihood is given by

$$l = \sum_{i=1}^n \sum_{j=1}^{m_i} \delta_{ij} \log\{\lambda_0(T_{ij}) \exp(\beta^T Z_{ij})\} + \sum_{i=1}^n \log \int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f(w)\,dw.$$

The normalized scores (log-likelihood derivatives) for $(\beta_1, \ldots, \beta_p)$ are given by

$$U_r = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} \delta_{ij} Z_{ijr} - \frac{1}{n} \sum_{i=1}^n \left[\sum_{j=1}^{m_i} H_{ij}(T_{ij}) Z_{ijr}\right] \frac{\int w^{N_{i\cdot}(\tau)+1} \exp\{-w H_{i\cdot}(\tau)\} f(w)\,dw}{\int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f(w)\,dw} \qquad (3)$$

for $r = 1, \ldots, p$. The normalized score for θ is

$$U_{p+1} = \frac{1}{n} \sum_{i=1}^n \frac{\int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f'(w)\,dw}{\int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f(w)\,dw},$$

where $f'(w) = (d/d\theta) f(w)$. Let $\gamma = (\beta^T, \theta)^T$ and $U(\gamma, \Lambda_0) = (U_1, \ldots, U_p, U_{p+1})^T$. To obtain estimators $\hat\beta$ and $\hat\theta$, we propose to substitute an estimator $\hat\Lambda_0$ of $\Lambda_0$ into the equations $U(\gamma, \Lambda_0) = 0$.

Let $Y_{ij}(t) = I(T_{ij} \ge t)$ and let $\mathcal{F}_t$ denote the entire observed history up to time t, that is,

$$\mathcal{F}_t = \sigma\{N_{ij}(u), Y_{ij}(u), Z_{ij};\ i = 1, \ldots, n;\ j = 1, \ldots, m_i;\ 0 \le u \le t\}.$$

Then, as discussed by Gill (1992) and Parner (1998), the stochastic intensity process for $N_{ij}(t)$ with respect to $\mathcal{F}_t$ is given by

$$\lambda_0(t) \exp(\beta^T Z_{ij}) Y_{ij}(t) \psi_i(\gamma, \Lambda_0, t-), \qquad (4)$$

where $\psi_i(\gamma, \Lambda, t) = E(W_i \mid \mathcal{F}_t)$. Using a Bayes theorem argument and the joint density (2) with observation time restricted to $[0, t)$, we obtain $\psi_i(\gamma, \Lambda, t) = \phi_{2i}(\gamma, \Lambda, t)/\phi_{1i}(\gamma, \Lambda, t)$, where

$$\phi_{ki}(\gamma, \Lambda, t) = \int w^{N_{i\cdot}(t)+(k-1)} \exp\{-w H_{i\cdot}(t)\} f(w)\,dw, \quad k = 1, \ldots, 4.$$

Given the intensity model (4), in which $\exp(\beta^T Z_{ij}) \psi_i(\gamma, \Lambda_0, t-)$ may be regarded as a time-dependent covariate effect, a natural estimator of $\Lambda_0$ is a Breslow (1974) type estimator along the lines of Zucker (2005). For given values of β and θ we estimate $\Lambda_0$ as a step function with jumps at the observed failure times $\tau_k$, $k = 1, \ldots, K$, with

$$\Delta\hat\Lambda_0(\tau_k) = \frac{d_k}{\sum_{i=1}^n \psi_i(\gamma, \hat\Lambda_0, \tau_{k-1}) \sum_{j=1}^{m_i} Y_{ij}(\tau_k) \exp(\beta^T Z_{ij})}, \qquad (5)$$

where $d_k$ is the number of failures at time $\tau_k$. Note that, given the intensity model (4), the estimator of the kth jump depends on $\hat\Lambda_0$ only up to and including time $\tau_{k-1}$. By this approach, we avoid complicating the iterative optimization process with a further iterative scheme for estimating the cumulative hazard. This feature, however, does not necessarily translate into a computational advantage relative to the EM-based method, because $\psi_i(\gamma, \hat\Lambda_0, \tau_{k-1})$ has to be computed at every jump. Bagdonavicius and Nikulin (1999) proposed a similar estimator in a univariate survival context, for a model which they called the generalized proportional hazards model, which includes univariate frailty-type models.

3 Asymptotic Properties

Let $\gamma_0 = (\beta_0^T, \theta_0)^T$, with $\beta_0$, $\theta_0$ and $\Lambda_0(t)$ denoting the respective true values of β, θ and the cumulative baseline hazard, and let $\hat\gamma = (\hat\beta^T, \hat\theta)^T$. We assume the technical conditions listed in Section 5.2.
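To make the non-iterative estimator (5) concrete, here is a minimal Python sketch; it is our illustration, not the authors' code. It writes $\psi_i = \phi_{2i}/\phi_{1i}$ as a function $\psi^*(r, h)$ of $r = N_{i\cdot}(t)$ and $h = H_{i\cdot}(t)$ and assumes (i) a gamma frailty with mean 1 and variance θ, for which $\psi^*(r, h) = (1 + \theta r)/(1 + \theta h)$ in closed form, and (ii) as an example of a nonconjugate choice handled by one-dimensional numerical integration, a lognormal frailty $W = \exp(\sigma X - \sigma^2/2)$ with X standard normal, evaluated by Gauss-Hermite quadrature. The function names `psi_gamma`, `psi_lognormal` and `breslow_frailty` are hypothetical, and we take $\tau_0 = 0$.

```python
import numpy as np

def psi_gamma(r, h, theta):
    """psi*(r, h) = phi_2/phi_1 for a gamma frailty with mean 1 and variance theta."""
    return (1.0 + theta * r) / (1.0 + theta * h)

def psi_lognormal(r, h, sigma, n_nodes=60):
    """psi*(r, h) for W = exp(sigma*X - sigma^2/2), X ~ N(0, 1), by Gauss-Hermite quadrature."""
    x, wts = np.polynomial.hermite.hermgauss(n_nodes)   # nodes/weights for weight exp(-x^2)
    log_w = sigma * np.sqrt(2.0) * x - 0.5 * sigma**2    # log W at the nodes
    w = np.exp(log_w)
    base = wts * np.exp(r * log_w - h * w)               # w^r e^{-hw}; normalizing constant cancels
    return np.sum(base * w) / np.sum(base)

def breslow_frailty(T, delta, Z, fam, beta, psi):
    """Hypothetical sketch of estimator (5) for fixed (beta, theta).

    psi(r, h) must return psi*(r, h) for the chosen frailty distribution.
    Returns the distinct failure times tau_k and Lambda0_hat(tau_k).
    """
    T = np.asarray(T, float)
    delta = np.asarray(delta, int)
    risk = np.exp(np.asarray(Z, float) @ np.atleast_1d(beta))   # exp(beta^T Z_ij)
    _, idx = np.unique(fam, return_inverse=True)                 # family index of each subject
    n_fam = idx.max() + 1
    times = np.unique(T[delta == 1])                             # distinct failure times
    jumps = np.zeros(len(times))
    for k, tk in enumerate(times):
        t_prev = times[k - 1] if k > 0 else 0.0                  # tau_{k-1}, with tau_0 = 0
        # Lambda0_hat(T_ij ^ tau_{k-1}) built from the jumps already computed
        t_eval = np.minimum(T, t_prev)
        lam = np.array([jumps[:k][times[:k] <= s].sum() for s in t_eval])
        # N_i.(tau_{k-1}) and H_i.(tau_{k-1}) for every family
        r = np.bincount(idx, weights=((T <= t_prev) & (delta == 1)).astype(float), minlength=n_fam)
        h = np.bincount(idx, weights=lam * risk, minlength=n_fam)
        psi_fam = np.array([psi(r[f], h[f]) for f in range(n_fam)])
        at_risk = (T >= tk).astype(float)                         # Y_ij(tau_k)
        d_k = np.sum((T == tk) & (delta == 1))
        jumps[k] = d_k / np.sum(psi_fam[idx] * at_risk * risk)
    return times, np.cumsum(jumps)
```

For example, `breslow_frailty(T, delta, Z, fam, beta, lambda r, h: psi_gamma(r, h, 2.0))` returns the failure times and the cumulative hazard there. With θ = 0 every $\psi_i$ equals 1 and the output reduces to the ordinary Breslow estimator. Each jump uses only jumps already computed, which is why no inner iteration is needed; the price is one evaluation of $\psi^*$ per family per jump.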

In Section 5.4, we establish the following results.

A. $\hat\Lambda_0(t, \gamma)$ converges almost surely to a limit $\Lambda_0(t, \gamma)$, uniformly in t and γ.
B. $U(\gamma, \hat\Lambda_0(\cdot, \gamma))$ converges almost surely, uniformly in γ, to a limit $u(\gamma, \Lambda_0(\cdot, \gamma))$.
C. There exists a unique consistent root to $U(\hat\gamma, \hat\Lambda_0(\cdot, \hat\gamma)) = 0$.

In Section 5.5, we show that $n^{1/2}(\hat\gamma - \gamma_0)$ is asymptotically normally distributed. We accomplish this by analyzing in turn each of the terms in the following decomposition:

$$0 = U(\hat\gamma, \hat\Lambda_0(\cdot, \hat\gamma)) = U(\gamma_0, \Lambda_0) + [U(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0)) - U(\gamma_0, \Lambda_0)] + [U(\hat\gamma, \hat\Lambda_0(\cdot, \hat\gamma)) - U(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0))]. \qquad (6)$$

We show further that the covariance matrix of $\hat\gamma$ can be consistently estimated by a sandwich estimator of the form

$$\widehat{\mathrm{Cov}}(\hat\gamma) = n^{-1} D^{-1}(\hat\gamma)\{\hat V(\hat\gamma) + \hat G(\hat\gamma) + \hat C(\hat\gamma)\} D^{-1}(\hat\gamma)^T. \qquad (7)$$

The matrix D consists of the derivatives of the $U_r$'s with respect to the parameters γ. V is the asymptotic covariance matrix of $n^{1/2} U(\gamma_0, \Lambda_0)$, G is the asymptotic covariance matrix of $n^{1/2}[U(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0)) - U(\gamma_0, \Lambda_0)]$, and C accounts for the asymptotic covariance between these two quantities (both cross-covariance terms). The term G + C reflects the added variance resulting from the need to estimate the cumulative hazard function. All these matrices are defined explicitly in Section 5.5.

4 Simulation Study for the Gamma Frailty Case

In Gorfine et al. (2006), we presented a simulation study comparing our method to the EM method under the gamma frailty distribution with expectation 1 and variance θ. Here we extend the simulation study by considering larger θ values and family sizes larger than two. Gorfine et al. (2006) describe in detail the steps of the EM-based algorithm, as given in Nielsen et al. (1992), in parallel with the corresponding steps in our procedure; we refer the reader to that paper for details.

The setup for the simulation study, which is patterned after Hsu et al. (2004), is as follows. We worked with a sample size of 3 families, with a common family size of 2 or 5. We generated for each family a common frailty value W from the gamma distribution with shape and rate (inverse scale) parameters both equal to $\theta^{-1}$, so that W has mean 1 and variance θ, and for each individual a single covariate Z from the standard normal distribution. Conditional on W, the survivor function was taken to be $S(t \mid Z, W) = \exp\{-W \exp(\beta Z)(0.01t)^{4.6}\}$. We took the censoring distribution to be $N(60, 15^2)$. The β values examined were β = ln(2) and β = ln(3), leading to censoring levels of approximately 85% and 80%, respectively. The censoring distribution was chosen in order to generate an appropriate mean age at onset and age-of-onset distribution, similar to what is often observed for late-onset diseases. With censoring distributed according to $N(130, 15^2)$ the respective censoring levels are approximately 35% and 30%. The θ values examined were θ = 2 and θ = 4.

Tables 1-2 present the simulation results for the two estimation techniques, based on 1,000 replications. For our method, we compare the mean estimated standard error based on our theoretical formula with the empirical standard error, and provide the empirical coverage rate of the 95% Wald-type confidence interval. For the EM-based method, we report only the empirical standard error. In addition, the empirical correlation between the EM-based estimators and our estimators is presented. The additional simulation results confirm our earlier findings. Both estimation techniques perform very well in terms of bias. Also, for our method, fairly good agreement was observed between the estimated and the empirical standard errors, although some differences were seen in some cases. The high values of the correlations imply similarity between the two estimation techniques not only on average, but on a data-set-by-data-set basis.

5 Technical Conditions and Proofs

5.1 Introductory Remarks

This section presents the technical conditions we assume for the asymptotic results and the proofs of these results. Some details have been omitted for the sake of brevity. These details are provided in an expanded version of this paper, which is available at the Front for the Mathematics ArXiv under Statistics, publication number: math.st/

The general pattern of the argument follows that of Zucker (2005), but with some significant changes. Our estimator for the cumulative hazard is based on the formula

$$\hat\Lambda_0(t) = \int_0^t \frac{n^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} dN_{ij}(s)}{n^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} \psi_i(\gamma, \hat\Lambda_0, s-) Y_{ij}(s) \exp(\beta^T Z_{ij})}.$$

The quantity $\psi_i(\gamma, \hat\Lambda_0, s-)$ involves terms of the form $\hat\Lambda_0(s \wedge T_{ij})$, i.e., it involves $\hat\Lambda_0$ values at $T_{ij}$ as well as at s. By contrast, the corresponding integrand in Zucker's (2005) estimator involves only $\hat\Lambda(s-)$. (The estimator of Bagdonavicius and Nikulin (1999) is similar to that of Zucker (2005) in this respect.) This difference in the structure of the estimators entails the need for substantial extensions to the argument. In particular, Zucker's consistency proof for the cumulative hazard estimate makes use of a result of the form

$$\sup_{\beta, t, c} |A(\beta, t, c) - a(\beta, t, c)| \to 0 \quad \text{a.s.},$$

where $A(\beta, t, c)$ is a certain empirical process, $a(\beta, t, c)$ is its expectation, and the supremum is over $\beta \in \mathcal{B}$, $t \in [0, \tau]$, and $c \in [0, \Lambda_{\max}]$. In our consistency proof, we need the more complex result given in (20) below, whose proof requires a more delicate argument. In the asymptotic normality proof, a number of extra steps are required, relative to Zucker's proof, to deal with the middle term in the decomposition (6). In particular, we need to introduce the decomposition of $\hat\Lambda_0(t, \gamma_0) - \Lambda_0(t)$ given in (25) below, and the interchange of integrals that is carried out right after introducing this decomposition. Furthermore, unlike in Zucker (2005), the first two terms in the decomposition (6) are correlated, so that extra development is needed to deal with the correlation (Step III of the asymptotic normality proof). The structure of the derivative matrix of the score function vector is more complex than in Zucker (2005). Finally, in contrast with Zucker (2005), we use mainly the classical CLT for sums of iid random vectors rather than the martingale CLT.

We take note here that since β and $Z_{ij}$ are bounded, there exists a constant ν > 0 such that

$$\nu^{-1} \le \exp(\beta^T Z_{ij}) \le \nu. \qquad (8)$$

This fact is used repeatedly in our proofs. We also introduce here some basic definitions. Recall that

$$\psi_i(\gamma, \Lambda, t) = \frac{\int w^{N_i(t)+1} e^{-H_i(t) w} f(w)\,dw}{\int w^{N_i(t)} e^{-H_i(t) w} f(w)\,dw},$$

with $H_i(t) = H_i(t, \gamma, \Lambda) = \sum_{j=1}^{m_i} \Lambda(T_{ij} \wedge t) \exp(\beta^T Z_{ij})$ (here we define $H_i$ so as to allow dependence on a general γ and Λ, which will often not be explicitly indicated in the notation). We define, for $0 \le r \le m$ and $h \ge 0$,

$$\psi^*(r, h) = \frac{\int w^{r+1} e^{-hw} f(w)\,dw}{\int w^r e^{-hw} f(w)\,dw}. \qquad (9)$$

We further define $\psi^*_{\min}(h) = \min_{0 \le r \le m} \psi^*(r, h)$ and $\psi^*_{\max}(h) = \max_{0 \le r \le m} \psi^*(r, h)$. In (9), the numerator and denominator are bounded above since W is assumed to have finite (m + 2)-th moment. Also, since W is nondegenerate, the numerator and denominator are strictly positive. Thus $\psi^*_{\max}(h)$ is finite and $\psi^*_{\min}(h)$ is strictly positive. The following result can be proved by elementary calculus (details in the expanded version).

Lemma 1: The function $\psi^*(r, h)$ is decreasing in h. Hence for all $\gamma \in G$ and all t,

$$\psi_i(\gamma, \Lambda, t) \le \psi^*_{\max}(0), \qquad (10)$$
$$\psi_i(\gamma, \Lambda, t) \ge \psi^*_{\min}(m\nu\Lambda(t)). \qquad (11)$$

5.2 Technical Conditions

In deriving the asymptotic properties of $\hat\gamma$ we make the following assumptions:

1. The random vectors $(m_i, T^0_{i1}, \ldots, T^0_{im_i}, C_{i1}, \ldots, C_{im_i}, Z_{i1}, \ldots, Z_{im_i}, W_i)$, $i = 1, \ldots, n$, are independent and identically distributed.
2. There is a finite maximum follow-up time τ > 0, with $E[\sum_{j=1}^{m_i} Y_{ij}(\tau)] = y_0 > 0$ for all i.
3. (a) Conditional on $Z_{ij}$ and $W_i$, the censoring is independent and noninformative of $W_i$ and $(\beta, \Lambda_0)$. (b) $W_i$ is independent of $Z_{ij}$ and of $m_i$.
4. The frailty random variable $W_i$ has finite moments up to order (m + 2).
5. $Z_{ij}$ is bounded.
6. The parameter γ lies in a compact subset G of $\mathbb{R}^{p+1}$ containing an open neighborhood of $\gamma_0$.
7. There exist B > 0 and $h_0 > 0$ (independent of θ) such that, for all $h \ge h_0$, we have $\psi^*_{\min}(h) \ge B h^{-1}$.

8. The baseline hazard function $\lambda_0(t)$ is bounded over $[0, \tau]$ by some fixed (but not necessarily known) constant $\lambda_{\max}$.
9. The function $f'(w; \theta) = (d/d\theta) f(w; \theta)$ is absolutely integrable.
10. The censoring distribution has at most finitely many jumps on $[0, \tau]$.
11. For any given family, there is a positive probability of at least two failures.
12. The matrix $[(\partial/\partial\gamma) U(\gamma, \hat\Lambda_0(\cdot, \gamma))]_{\gamma=\gamma_0}$ is invertible with probability going to 1 as $n \to \infty$.

In regard to Assumption 7, the assumption is satisfied if either one of the following two conditions holds.

a. There exist $b(\theta) > 0$ and $C(\theta) > 0$ such that $\sup_\theta |f(w; \theta)/\{C(\theta) w^{b(\theta)-1}\} - 1| \to 0$ as $w \to 0$, with $b(\theta)$ bounded from below over θ.
b. We have $\lim_{w \to 0} \sup_\theta f(w; \theta) = 0$, and there exists a > 0 independent of θ such that $f(w; \theta)$ is increasing in w over $w \in [0, a]$.

These conditions cover a wide range of frailty distributions, including popular choices such as the gamma, inverse Gaussian, and lognormal.

5.3 Preliminary Lemmas

Lemma 2: Define $\bar\Lambda = 1.3 e^{m\sigma} h_0/(m\nu)$, with $\sigma = 1.1 m\nu^2/(B y_0)$, and with $h_0$ and B as in Assumption 7. Then, with probability one, there exists $n_0$ such that, for all $t \in [0, \tau]$ and $\gamma \in G$,

$$\hat\Lambda_0(t, \gamma) \le \bar\Lambda \quad \text{for } n \ge n_0. \qquad (12)$$

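Thus, $\hat\Lambda_0(t, \gamma)$ is naturally bounded, with no need to impose an upper bound artificially.

Proof: To simplify the writing below, we will suppress the argument γ in $\hat\Lambda_0(t, \gamma)$. Recall

$$\Delta\hat\Lambda_0(\tau_k) = \frac{1}{\sum_{i=1}^n \psi_i(\gamma, \hat\Lambda_0, \tau_{k-1}) \sum_{j=1}^{m_i} Y_{ij}(\tau_k) \exp(\beta^T Z_{ij})},$$

where we now take $d_k = 1$ since the survival time distribution is assumed continuous. Using Lemma 1 and (8), we have

$$\Delta\hat\Lambda_0(\tau_k) \le n^{-1} \nu \, \psi^*_{\min}(m\nu\hat\Lambda_0(\tau_{k-1}))^{-1} \left[\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} Y_{ij}(\tau)\right]^{-1}.$$

By the strong law of large numbers, there exists with probability one some $n_0$ such that

$$\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} Y_{ij}(\tau) \ge 0.999 y_0 \quad \text{for } n \ge n_0. \qquad (13)$$

We thus have, for $n \ge n_0$,

$$\Delta\hat\Lambda_0(\tau_k) \le n^{-1} \left(\frac{1.1\nu}{y_0}\right) \psi^*_{\min}(m\nu\hat\Lambda_0(\tau_{k-1}))^{-1}. \qquad (14)$$

Given this result, the desired conclusion can be obtained via simple technical manipulations, detailed in the expanded version of this paper.

Lemma 3: We have $\sup_{s \in [0, \tau]} |\hat\Lambda_0(s, \gamma) - \hat\Lambda_0(s-, \gamma)| \to 0$ as $n \to \infty$, as an immediate consequence of Lemma 2 and (14).

5.4 Consistency

We now show the almost sure consistency of $\hat\gamma$ and $\hat\Lambda_0$. The argument is built on Claims A-C of Section 3, which we prove below. Our argument follows Zucker (2005, Appendix A.3).

Claim A: $\hat\Lambda_0(t, \gamma)$ converges a.s. to some function $\Lambda_0(t, \gamma)$ uniformly in t and γ.

Proof: Whenever a functional norm is written below, the relevant uniform norm is intended. We define $\Lambda_{\max} = \max(\bar\Lambda, \lambda_{\max}\tau)$, $h_{\max} = m\nu\Lambda_{\max}$, and $\tilde\psi^*(r, h) = \psi^*(r, h \wedge h_{\max})$. It is easy to see that $\tilde\psi^*(r, h)$ is Lipschitz continuous in h, uniformly in r. Recall that $\psi_i(\gamma, \Lambda, t) = \psi^*(N_i(t), H_i(t, \gamma, \Lambda))$. Lemma 2 implies that $H_i(t, \gamma, \hat\Lambda_0(\cdot, \gamma)) \le h_{\max}$ for all $t \in [0, \tau]$ and $\gamma \in G$ (for $n \ge n_0$). Hence $\psi_i(\gamma, \hat\Lambda_0(\cdot, \gamma), t) = \tilde\psi^*(N_i(t), H_i(t, \gamma, \hat\Lambda_0(\cdot, \gamma)))$. Now define, for a general function Λ,

$$\Xi_n(t, \gamma, \Lambda) = \int_0^t \frac{n^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} dN_{ij}(s)}{n^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma, \Lambda)) Y_{ij}(s) \exp(\beta^T Z_{ij})}$$

and

$$\Xi(t, \gamma, \Lambda) = \int_0^t \frac{E[\sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma_0, \Lambda_0)) Y_{ij}(s) \exp(\beta_0^T Z_{ij})]}{E[\sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma, \Lambda)) Y_{ij}(s) \exp(\beta^T Z_{ij})]} \lambda_0(s)\,ds.$$

By definition, $\hat\Lambda_0(t, \gamma)$ satisfies the equation

$$\hat\Lambda_0(t, \gamma) = \Xi_n(t, \gamma, \hat\Lambda_0(\cdot, \gamma)). \qquad (15)$$

Next, define

$$q_\gamma(s, \Lambda) = \frac{E[\sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma_0, \Lambda_0)) Y_{ij}(s) \exp(\beta_0^T Z_{ij})]}{E[\sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma, \Lambda)) Y_{ij}(s) \exp(\beta^T Z_{ij})]} \lambda_0(s).$$

By (8), Lemma 1 and the definition of $\tilde\psi^*$, this function is uniformly bounded by $B^* = \nu^2 [\psi^*_{\max}(0)/\psi^*_{\min}(h_{\max})] \lambda_{\max}$. Moreover, by the Lipschitz continuity of $\tilde\psi^*(r, h)$ with respect to h, it satisfies a Lipschitz-like condition of the form

$$|q_\gamma(s, \Lambda_1) - q_\gamma(s, \Lambda_2)| \le K \sup_{u \le s} |\Lambda_1(u) - \Lambda_2(u)|.$$

Hence, by mimicking the argument of Hartman (1973, Theorem 1.1), we find that the equation $\Lambda(t) = \Xi(t, \gamma, \Lambda)$ has a unique solution, which we denote by $\Lambda_0(t, \gamma)$. The claim then is that $\hat\Lambda_0(t, \gamma)$ converges almost surely (uniformly in t and γ) to $\Lambda_0(t, \gamma)$.

Define $\tilde\Lambda^{(n)}(t, \gamma)$ to be a modified version of $\hat\Lambda_0(t, \gamma)$ defined by linear interpolation between the jumps. Lemma 3 implies that, with probability one,

$$\sup_{t, \gamma} |\tilde\Lambda^{(n)}(t, \gamma) - \hat\Lambda_0(t, \gamma)| \to 0, \qquad (16)$$

and thus

$$\sup_{t, \gamma} |\Xi_n(t, \gamma, \tilde\Lambda^{(n)}(\cdot, \gamma)) - \Xi_n(t, \gamma, \hat\Lambda_0(\cdot, \gamma))| \to 0. \qquad (17)$$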
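The existence-and-uniqueness step for $\Lambda = \Xi(t, \gamma, \Lambda)$ uses a Volterra-type Lipschitz condition, which is the setting in which successive (Picard) approximations converge. As an illustration only, the Python sketch below solves a generic equation $\Lambda(t) = \int_0^t q(s, \Lambda)\,ds$ on a grid by Picard iteration with the trapezoidal rule. The function name `picard_volterra` and the toy integrand $q(s, \Lambda) = 1 + \Lambda(s)/2$, whose exact solution is $\Lambda(t) = 2(e^{t/2} - 1)$, are our assumptions; in the proof the integrand would be $q_\gamma(s, \Lambda)$, which depends on the path of Λ up to s.

```python
import numpy as np

def picard_volterra(q, t_grid, tol=1e-12, max_iter=500):
    """Solve Lambda(t) = int_0^t q(s, Lambda) ds on t_grid by Picard iteration.

    q(t_grid, lam) returns the integrand at every grid point given the current
    path lam; its value at t_k should depend on lam only through lam[:k+1]
    (Volterra-type dependence, as for q_gamma(s, Lambda) in the text).
    Returns the solution on the grid and the number of iterations used.
    """
    t_grid = np.asarray(t_grid, float)
    dt = np.diff(t_grid)
    lam = np.zeros_like(t_grid)                      # starting path Lambda = 0
    for it in range(max_iter):
        qv = q(t_grid, lam)
        # cumulative trapezoidal rule, with Lambda(0) = 0
        new = np.concatenate(([0.0], np.cumsum(0.5 * dt * (qv[1:] + qv[:-1]))))
        if np.max(np.abs(new - lam)) < tol:
            return new, it + 1
        lam = new
    raise RuntimeError("Picard iteration did not converge")
```

For instance, `picard_volterra(lambda s, L: 1.0 + 0.5 * L, np.linspace(0.0, 2.0, 2001))` reproduces $2(e^{t/2} - 1)$ up to the discretization error of the trapezoidal rule. The Lipschitz constant K controls the convergence: the error after k sweeps is at most of order $(K\tau)^k/k!$, which tends to zero for any finite τ.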

Lemma 2 shows that the family $\mathcal{L} = \{\tilde\Lambda^{(n)}(t, \gamma), n \ge n_0\}$ is uniformly bounded. We can show further that $\mathcal{L}$ is equicontinuous, using arguments similar to those of Zucker (2005). The first step is to note that, with $\bar N(t) = n^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} N_{ij}(t)$, we have $\bar N(t) \to E[N_i(t)]$ as $n \to \infty$ uniformly in t with probability one. From this we can obtain the following result: with probability one, for any ε > 0 there exists $n_0(\epsilon)$ such that for all t and u with u < t,

$$|\hat\Lambda_0(t, \gamma) - \hat\Lambda_0(u, \gamma)| \le B^*(t - u) + \frac{\epsilon}{2} \quad \text{for all } n \ge n_0(\epsilon). \qquad (18)$$

Moreover, $\hat\Lambda_0(t, \gamma)$ is Lipschitz continuous in γ, uniformly in γ and t. The equicontinuity follows. Given that $\mathcal{L}$ is a.s. uniformly bounded and equicontinuous, the Arzelà-Ascoli theorem implies that it is (almost surely) a relatively compact set in $C([0, \tau] \times G)$.

Next, define

$$A(\gamma, \Lambda, s) = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma, \Lambda)) Y_{ij}(s) \exp(\beta^T Z_{ij}),$$
$$a(\gamma, \Lambda, s) = E\left[\sum_{j=1}^{m_i} \tilde\psi^*(N_i(s-), H_i(s-, \gamma, \Lambda)) Y_{ij}(s) \exp(\beta^T Z_{ij})\right].$$

For any fixed continuous Λ, the functional strong law of large numbers of Andersen and Gill (1982, Appendix III) implies that

$$\sup_{s, \gamma} |A(\gamma, \Lambda, s) - a(\gamma, \Lambda, s)| \to 0 \quad \text{a.s.} \qquad (19)$$

Here we need the following more complex result:

$$\sup_{s, \gamma} |A(\gamma, \tilde\Lambda^{(n)}, s) - a(\gamma, \tilde\Lambda^{(n)}, s)| \to 0 \quad \text{a.s.} \qquad (20)$$

The proof of (20) is involved; we give the details in Section 5.6 below. In outline, the proof involves two steps: (1) showing that, for any given ε > 0, we can define an appropriate finite class $\mathcal{L}_\epsilon$ of functions Λ such that $\tilde\Lambda^{(n)}$ can be suitably approximated by some member of the class; (2) applying the result (19), which will hold uniformly over the finite class.

Given (20) and the a.s. uniform convergence of $\bar N(t)$ to $E[N_i(t)]$, we can infer that

$$\sup_{t, \gamma} |\Xi_n(t, \gamma, \tilde\Lambda^{(n)}(\cdot, \gamma)) - \Xi(t, \gamma, \tilde\Lambda^{(n)}(\cdot, \gamma))| \to 0 \quad \text{a.s.} \qquad (21)$$

This result is obtained by adapting the argument of Aalen (1976, Lemma 6.1). From (15), (16), (17), and (21) it follows that any limit point of $\{\tilde\Lambda^{(n)}(t, \gamma)\}$ must satisfy the equation $\Lambda = \Xi(t, \gamma, \Lambda)$. Since $\Lambda_0(t, \gamma)$ is the unique solution of this equation, it is the unique limit point of $\{\tilde\Lambda^{(n)}(t, \gamma)\}$. Thus $\{\tilde\Lambda^{(n)}(t, \gamma)\}$ is a sequence in a compact set with unique limit point $\Lambda_0(t, \gamma)$. Hence $\tilde\Lambda^{(n)}(t, \gamma)$ converges a.s. uniformly in t and γ to $\Lambda_0(t, \gamma)$. In view of (16), the same holds for $\hat\Lambda_0(t, \gamma)$, which is the desired result.

Note that $\Lambda_0(\cdot, \gamma_0) = \Lambda_0(\cdot)$. Indeed, if we plug $\gamma = \gamma_0$ and $\Lambda = \Lambda_0$ into the expression for $\Xi(t, \gamma, \Lambda)$, the expectation terms cancel, and so we are left with the integral of $\lambda_0(s)$, which is $\Lambda_0(t)$. Thus, $\Lambda_0$ is the solution to the equation $\Lambda = \Xi(t, \gamma_0, \Lambda)$.

Claim B: With $u(\gamma, \Lambda_0(\cdot, \gamma)) = E[U(\gamma, \Lambda_0(\cdot, \gamma))]$, we have $U(\gamma, \hat\Lambda_0(\cdot, \gamma)) \to u(\gamma, \Lambda_0(\cdot, \gamma))$ uniformly in $\gamma \in G$ with probability one.

Proof: As in Zucker (2005).

Claim C: There exists a unique consistent root to $U(\hat\gamma, \hat\Lambda_0(\cdot, \hat\gamma)) = 0$.

Proof: We apply Foutz's (1977) consistency theorem for maximum likelihood type estimators. The following conditions must be established:

F1. $\partial U(\gamma, \hat\Lambda_0(\cdot, \gamma))/\partial\gamma$ exists and is continuous in an open neighborhood about $\gamma_0$.
F2. The convergence of $\partial U(\gamma, \hat\Lambda_0(\cdot, \gamma))/\partial\gamma$ to its limit is uniform in an open neighborhood of $\gamma_0$.
F3. $U(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0)) \to 0$ as $n \to \infty$.
F4. The matrix $[\partial U(\gamma, \hat\Lambda_0(\cdot, \gamma))/\partial\gamma]_{\gamma=\gamma_0}$ is invertible with probability going to 1 as $n \to \infty$.

(In Foutz's paper, the matrix in question is symmetric, and so he stated the condition in terms of positive definiteness. But his proof, which is based on the inverse function theorem, shows that the basic condition needed is invertibility.)

It is easily seen that Condition F1 holds. Given Assumptions 2, 4, and 5, Condition F2 follows from the previously cited functional law of large numbers. As for Condition F3, Claim B says that $U(\gamma, \hat\Lambda_0(\cdot, \gamma))$ converges a.s. uniformly to $u(\gamma, \Lambda_0(\cdot, \gamma)) = E[U(\gamma, \Lambda_0(\cdot, \gamma))]$. We noted already that $\Lambda_0(\cdot, \gamma_0) = \Lambda_0(\cdot)$. Thus we need only show that $E[U(\gamma_0, \Lambda_0)] = 0$. Since U is a score function derived from a classical iid likelihood, this result follows from classical likelihood theory. Condition F4 has been assumed in Assumption 12. With Conditions F1-F4 established, the result follows.

5.5 Asymptotic Normality

To show that $\hat\gamma$ is asymptotically normally distributed, we write

$$0 = U(\hat\gamma, \hat\Lambda_0(\cdot, \hat\gamma)) = U(\gamma_0, \Lambda_0) + [U(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0)) - U(\gamma_0, \Lambda_0)] + [U(\hat\gamma, \hat\Lambda_0(\cdot, \hat\gamma)) - U(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0))].$$

In the following we consider each of the terms on the right-hand side of the equation.

Step I

We can write $U(\gamma_0, \Lambda_0) = n^{-1} \sum_{i=1}^n \xi_i$, where $\xi_i$ is a (p + 1)-vector with r-th element, $r = 1, \ldots, p$, given by

$$\xi_{ir} = \sum_{j=1}^{m_i} \delta_{ij} Z_{ijr} - \left[\sum_{j=1}^{m_i} H_{ij}(\tau) Z_{ijr}\right] \frac{\int w^{N_{i\cdot}(\tau)+1} \exp\{-w H_{i\cdot}(\tau)\} f(w; \theta_0)\,dw}{\int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f(w; \theta_0)\,dw}$$

and (p + 1)-th element given by

$$\xi_{i(p+1)} = \frac{\int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f'(w; \theta_0)\,dw}{\int w^{N_{i\cdot}(\tau)} \exp\{-w H_{i\cdot}(\tau)\} f(w; \theta_0)\,dw}.$$

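Thus $U(\gamma_0, \Lambda_0)$ is the mean of the iid mean-zero random vectors $\xi_i$. It hence follows from the central limit theorem that $n^{1/2} U(\gamma_0, \Lambda_0)$ is asymptotically mean-zero multivariate normal. To estimate the covariance matrix, let $\tilde\xi_i$ be the counterpart of $\xi_i$ with estimates of γ and $\Lambda_0$ substituted for the true values. Then an empirical estimator of the covariance matrix is given by $\hat V(\hat\gamma) = n^{-1} \sum_{i=1}^n \tilde\xi_i \tilde\xi_i^T$. This is a consistent estimator of the covariance matrix since $\hat\Lambda_0(t, \gamma)$ converges to $\Lambda_0(t, \gamma)$ a.s. uniformly in t and γ (Claim A), and $\hat\gamma$ is a consistent estimator of $\gamma_0$ (Claim C).

Step II

Let $\hat U_r = U_r(\gamma_0, \hat\Lambda_0)$, $r = 1, \ldots, p$, and $\hat U_{p+1} = U_{p+1}(\gamma_0, \hat\Lambda_0)$ (in this segment of the proof, when we write $(\gamma_0, \hat\Lambda_0)$ the intent is to signify $(\gamma_0, \hat\Lambda_0(\cdot, \gamma_0))$). A first-order Taylor expansion of $\hat U_r$ about $\Lambda_0$, $r = 1, \ldots, p + 1$, gives

$$n^{1/2}\{U_r(\gamma_0, \hat\Lambda_0) - U_r(\gamma_0, \Lambda_0)\} = n^{-1/2} \sum_{i=1}^n \sum_{j=1}^{m_i} Q_{ijr}(\gamma_0, \Lambda_0, T_{ij}) \{\hat\Lambda_0(T_{ij}, \gamma_0) - \Lambda_0(T_{ij})\} + o_p(1), \qquad (22)$$

where, differentiating (3) with respect to $\Lambda(T_{ij})$,

$$Q_{ijr}(\gamma, \Lambda, T_{ij}) = -\frac{\phi_{2i}(\gamma, \Lambda, \tau)}{\phi_{1i}(\gamma, \Lambda, \tau)} R_{ij} Z_{ijr} + \left\{\frac{\phi_{3i}(\gamma, \Lambda, \tau)}{\phi_{1i}(\gamma, \Lambda, \tau)} - \frac{\phi^2_{2i}(\gamma, \Lambda, \tau)}{\phi^2_{1i}(\gamma, \Lambda, \tau)}\right\} R_{ij} \sum_{l=1}^{m_i} H_{il}(T_{il}) Z_{ilr}$$

for $r = 1, \ldots, p$, and

$$Q_{ij(p+1)}(\gamma, \Lambda, T_{ij}) = R_{ij} \left\{\frac{\phi_{2i}(\gamma, \Lambda, \tau) \phi^{(\theta)}_{1i}(\gamma, \Lambda, \tau)}{\phi^2_{1i}(\gamma, \Lambda, \tau)} - \frac{\phi^{(\theta)}_{2i}(\gamma, \Lambda, \tau)}{\phi_{1i}(\gamma, \Lambda, \tau)}\right\},$$

with $R_{ij} = \exp(\beta^T Z_{ij})$ and

$$\phi^{(\theta)}_{ki}(\gamma, \Lambda, t) = \int w^{N_{i\cdot}(t)+(k-1)} \exp\{-w H_{i\cdot}(t)\} f'(w)\,dw, \quad k = 1, 2.$$

The validity of the approximation (22) can be seen by an argument similar to that used in connection with (24) below.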
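For the gamma frailty with mean 1 and variance θ, the Step I score vectors $\xi_i$ and the estimator $\hat V$ take a simple form, because $\int w^N e^{-wH} f(w; \theta)\,dw = \{\Gamma(1/\theta + N)/\Gamma(1/\theta)\}\, \theta^N (1 + \theta H)^{-(1/\theta + N)}$. The Python sketch below (hypothetical helper names `log_phi1`, `xi_gamma`, `v_hat`) evaluates $\xi_i$ for one family given the values $\Lambda_0(T_{ij})$. The θ-component uses the fact that, under Assumption 9, $\int w^N e^{-wH} f'\,dw / \int w^N e^{-wH} f\,dw = \partial \log\phi_1/\partial\theta$; we approximate this derivative by a central difference instead of writing out digamma terms. Substituting $\hat\gamma$ and $\hat\Lambda_0$ would give the $\tilde\xi_i$ of Step I.

```python
import math
import numpy as np

def log_phi1(N, H, theta):
    """log int w^N exp(-w H) f(w; theta) dw for a gamma frailty, mean 1, variance theta."""
    k = 1.0 / theta
    return (math.lgamma(k + N) - math.lgamma(k) + N * math.log(theta)
            - (k + N) * math.log1p(theta * H))

def xi_gamma(delta, Z, Lam_T, beta, theta, eps=1e-6):
    """Score contribution xi_i of one family (Step I), given Lambda0(T_ij) values.

    delta: failure indicators (m_i,); Z: covariates (m_i, p); Lam_T: Lambda0(T_ij) (m_i,).
    Assumes theta > eps so the central difference stays inside the parameter space.
    """
    delta = np.asarray(delta, float)
    Z = np.asarray(Z, float)
    H = np.asarray(Lam_T, float) * np.exp(Z @ np.asarray(beta, float))   # H_ij(tau)
    N, Htot = delta.sum(), H.sum()                                        # N_i.(tau), H_i.(tau)
    psi = (1.0 + theta * N) / (1.0 + theta * Htot)                        # phi_2 / phi_1
    xi_beta = delta @ Z - (H @ Z) * psi
    # theta-score d/dtheta log phi_1, by a central difference
    xi_theta = (log_phi1(N, Htot, theta + eps) - log_phi1(N, Htot, theta - eps)) / (2 * eps)
    return np.append(xi_beta, xi_theta)

def v_hat(xis):
    """V_hat = n^{-1} sum_i xi_i xi_i^T from the stacked per-family score vectors."""
    X = np.asarray(xis, float)
    return X.T @ X / X.shape[0]
```

A quick sanity check is that the β-components agree with a numerical derivative of the family's log-likelihood contribution $\sum_j \delta_{ij}\beta^T Z_{ij} + \log\phi_{1i}$ with $\Lambda_0$ held fixed, since $\partial\log\phi_1/\partial H = -\phi_2/\phi_1$.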

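Given the intensity process (4), the process

$$M_{ij}(t) = N_{ij}(t) - \int_0^t \lambda_0(u) \exp(\beta_0^T Z_{ij}) Y_{ij}(u) \psi_i(\gamma_0, \Lambda_0, u-)\,du$$

is a mean-zero martingale with respect to the filtration $\mathcal{F}_t$. Also, by Lemma 3, $\sup_{s \in [0, \tau]} |\hat\Lambda_0(s, \gamma_0) - \hat\Lambda_0(s-, \gamma_0)|$ converges to zero. Thus, replacing s− by s, we obtain the following approximation, uniformly over $t \in [0, \tau]$:

$$\hat\Lambda_0(t, \gamma_0) - \Lambda_0(t) \approx \frac{1}{n} \int_0^t \{\mathcal{Y}(s, \Lambda_0)\}^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} dM_{ij}(s) + \frac{1}{n} \int_0^t \left[\{\mathcal{Y}(s, \hat\Lambda_0)\}^{-1} - \{\mathcal{Y}(s, \Lambda_0)\}^{-1}\right] \sum_{i=1}^n \sum_{j=1}^{m_i} dN_{ij}(s), \qquad (23)$$

where

$$\mathcal{Y}(s, \Lambda) = \frac{1}{n} \sum_{i=1}^n \psi_i(\gamma_0, \Lambda, s) \sum_{j=1}^{m_i} Y_{ij}(s) \exp(\beta_0^T Z_{ij}).$$

Now let $\mathcal{W}(s, r) = \{\mathcal{Y}(s, \Lambda_0 + r\Delta)\}^{-1}$ with $\Delta = \hat\Lambda_0 - \Lambda_0$. Define $\dot{\mathcal{W}}$ and $\ddot{\mathcal{W}}$ as the first and second derivatives of $\mathcal{W}$ with respect to r, respectively. Then, computing the necessary derivatives and carrying out a first-order Taylor expansion of $\mathcal{W}(s, r)$ around r = 0, evaluated at r = 1, with Lagrange remainder (Abramowitz and Stegun, 1972, p. 88), we get

$$\{\mathcal{Y}(s, \hat\Lambda_0)\}^{-1} - \{\mathcal{Y}(s, \Lambda_0)\}^{-1} = \dot{\mathcal{W}}(s, 0) + \tfrac{1}{2}\ddot{\mathcal{W}}(s, \tilde r(s)) = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} \left[\frac{R_{i\cdot}(s) \eta_{1i}(0, s)}{\{\mathcal{Y}(s, \Lambda_0)\}^2} + \frac{1}{2} h_i(\tilde r(s), s)\right] \exp(\beta_0^T Z_{ij}) \{\hat\Lambda_0(T_{ij} \wedge s) - \Lambda_0(T_{ij} \wedge s)\}, \qquad (24)$$

where $R_{ij}(u) = \exp(\beta_0^T Z_{ij}) Y_{ij}(u)$, $R_{i\cdot}(u) = \sum_{j=1}^{m_i} R_{ij}(u)$, $\tilde r(s) \in [0, 1]$,

$$\eta_{1i}(r, s) = \frac{\phi_{3i}(\gamma_0, \Lambda_0 + r\Delta, s)}{\phi_{1i}(\gamma_0, \Lambda_0 + r\Delta, s)} - \left\{\frac{\phi_{2i}(\gamma_0, \Lambda_0 + r\Delta, s)}{\phi_{1i}(\gamma_0, \Lambda_0 + r\Delta, s)}\right\}^2,$$

and $h_i(r, s)$ is as defined in Section 5.7 below, and shown there to be o(1) uniformly in r and s.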
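The derivative computation behind (24) rests on the identity $\partial\psi^*(r, h)/\partial h = -\{\phi_3/\phi_1 - (\phi_2/\phi_1)^2\}$, where the bracket is the conditional variance of W given r failures and cumulative hazard h; evaluated at the family's history it is $\eta_{1i}$, and its positivity is also why $\psi^*$ is decreasing in h (Lemma 1). For the gamma frailty with mean 1 and variance θ the bracket equals $\theta(1 + \theta r)/(1 + \theta h)^2$ in closed form. The Python sketch below checks the identity numerically for a lognormal frailty $W = \exp(\sigma X - \sigma^2/2)$, X ~ N(0, 1), using Gauss-Hermite quadrature; the frailty choice and the helper names `phi_ratios` and `eta` are illustrative assumptions.

```python
import numpy as np

def phi_ratios(r, h, sigma, n_nodes=80):
    """Return (phi_2/phi_1, phi_3/phi_1) for W = exp(sigma*X - sigma^2/2), X ~ N(0, 1).

    Gauss-Hermite quadrature; the common normalizing constant cancels in the ratios.
    """
    x, wts = np.polynomial.hermite.hermgauss(n_nodes)
    w = np.exp(sigma * np.sqrt(2.0) * x - 0.5 * sigma**2)
    base = wts * w**r * np.exp(-h * w)        # integrand of phi_1 at the nodes
    phi1 = base.sum()
    return (base * w).sum() / phi1, (base * w**2).sum() / phi1

def eta(r, h, sigma):
    """eta = phi_3/phi_1 - (phi_2/phi_1)^2, the conditional variance of W given (r, h)."""
    m1, m2 = phi_ratios(r, h, sigma)
    return m2 - m1**2
```

A central difference of `phi_ratios(r, h, sigma)[0]` in h should match `-eta(r, h, sigma)`. At h = 0 the lognormal moments give $\eta = e^{2\sigma^2 r}(e^{\sigma^2} - 1)$, which provides an exact reference value for the quadrature.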

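Let $\eta_{1i}(s) = \eta_{1i}(0, s)$. Plugging (24) into (23) we get

$$\begin{aligned}
\hat\Lambda_0(t, \gamma_0) - \Lambda_0(t) \approx{}& n^{-1} \int_0^t \{\mathcal{Y}(s, \Lambda_0)\}^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} dM_{ij}(s) \\
&+ \int_0^t n^{-2} \sum_{k=1}^n \sum_{l=1}^{m_k} \frac{I(T_{kl} > s) R_{k\cdot}(s) \eta_{1k}(s)}{\{\mathcal{Y}(s, \Lambda_0)\}^2} \exp(\beta_0^T Z_{kl}) \{\hat\Lambda_0(s) - \Lambda_0(s)\} \sum_{i=1}^n \sum_{j=1}^{m_i} dN_{ij}(s) \\
&+ \int_0^t n^{-2} \sum_{k=1}^n \sum_{l=1}^{m_k} \frac{I(T_{kl} \le s) R_{k\cdot}(s) \eta_{1k}(s)}{\{\mathcal{Y}(s, \Lambda_0)\}^2} \exp(\beta_0^T Z_{kl}) \{\hat\Lambda_0(T_{kl}) - \Lambda_0(T_{kl})\} \sum_{i=1}^n \sum_{j=1}^{m_i} dN_{ij}(s) \\
&+ \int_0^t n^{-2} \sum_{k=1}^n \sum_{l=1}^{m_k} \tfrac{1}{2} h_k(\tilde r(s), s) \exp(\beta_0^T Z_{kl}) \{\hat\Lambda_0(T_{kl} \wedge s) - \Lambda_0(T_{kl} \wedge s)\} \sum_{i=1}^n \sum_{j=1}^{m_i} dN_{ij}(s). \qquad (25)
\end{aligned}$$

The third term of the above equation can be written, by interchanging the order of integration, as

$$n^{-2} \sum_{k=1}^n \sum_{l=1}^{m_k} \sum_{i=1}^n \sum_{j=1}^{m_i} \int_0^t \left[\int_0^s \frac{R_{k\cdot}(s) \eta_{1k}(s)}{\{\mathcal{Y}(s, \Lambda_0)\}^2} \exp(\beta_0^T Z_{kl}) \{\hat\Lambda_0(u) - \Lambda_0(u)\}\,d\tilde N_{kl}(u)\right] dN_{ij}(s) = \int_0^t \{\hat\Lambda_0(s) - \Lambda_0(s)\} \sum_{i=1}^n \sum_{j=1}^{m_i} \Omega_{ij}(s, t)\,d\tilde N_{ij}(s),$$

where $\tilde N_{ij}(t) = I(T_{ij} \le t)$ and

$$\Omega_{ij}(s, t) = n^{-2} \int_s^t \{\mathcal{Y}(u, \Lambda_0)\}^{-2} R_{i\cdot}(u) \eta_{1i}(u) \exp(\beta_0^T Z_{ij}) \sum_{k=1}^n \sum_{l=1}^{m_k} dN_{kl}(u).$$

Hence we get

$$\hat\Lambda_0(t, \gamma_0) - \Lambda_0(t) = n^{-1} \int_0^t \{\mathcal{Y}(s, \Lambda_0)\}^{-1} \sum_{i=1}^n \sum_{j=1}^{m_i} dM_{ij}(s) + \int_0^t \{\hat\Lambda_0(s, \gamma_0) - \Lambda_0(s)\} \sum_{i=1}^n \sum_{j=1}^{m_i} \{\delta_{ij} \Upsilon(s) + \Omega_{ij}(s, t) + o(n^{-1})\}\,d\tilde N_{ij}(s),$$

where

$$\Upsilon(s) = n^{-2} \{\mathcal{Y}(s, \Lambda_0)\}^{-2} \sum_{k=1}^n \sum_{l=1}^{m_k} I(T_{kl} > s) R_{k\cdot}(s) \eta_{1k}(s) \exp(\beta_0^T Z_{kl}).$$

The $o(n^{-1})$ term is uniform in t (see Section 5.7) and is dominated by Ω and Υ, which are of order $n^{-1}$. Hence the $o(n^{-1})$ term can be ignored.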

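An argument similar to that of Yang and Prentice (1999) and Zucker (2005) now yields the martingale representation
$$\hat\Lambda_0(t,\gamma_0)-\Lambda_0(t)\approx\frac{1}{n\hat p(t)}\int_0^t\hat p(s-)\frac{\sum_{i=1}^n\sum_{j=1}^{m_i}dM_{ij}(s)}{\mathcal{Y}(s,\Lambda_0)},\qquad(26)$$
where
$$\hat p(t) = \prod_{s\le t}\left[1+\sum_{i=1}^n\sum_{j=1}^{m_i}\{\delta_{ij}\Upsilon(s)+\Omega_{ij}(s,t)\}\,d\tilde N_{ij}(s)\right].$$
Based on (22), we can write
$$U_r(\gamma_0,\hat\Lambda_0)-U_r(\gamma_0,\Lambda_0)\approx n^{-1}\sum_{i=1}^n\sum_{j=1}^{m_i}\int_0^\tau Q_{ijr}(\gamma_0,\Lambda_0,s)\{\hat\Lambda_0(s,\gamma_0)-\Lambda_0(s)\}\,d\tilde N_{ij}(s).$$
Plugging the martingale representation (26) into the above equation and carrying out some more algebra (again involving an interchange of integrals) gives
$$U_r(\gamma_0,\hat\Lambda_0)-U_r(\gamma_0,\Lambda_0)\approx n^{-1}\int_0^\tau\pi_r(s,\gamma_0,\Lambda_0)\hat p(s-)\frac{\sum_{k=1}^n\sum_{l=1}^{m_k}dM_{kl}(s)}{\mathcal{Y}(s,\Lambda_0)},\qquad(27)$$
where
$$\pi_r(s,\gamma,\Lambda) = n^{-1}\int_s^\tau\frac{\sum_{i=1}^n\sum_{j=1}^{m_i}Q_{ijr}(\gamma,\Lambda,t)\,d\tilde N_{ij}(t)}{\hat p(t)}.$$
Therefore, $n^{1/2}[U(\gamma_0,\hat\Lambda_0(\cdot,\gamma_0))-U(\gamma_0,\Lambda_0)]$ is asymptotically mean-zero multivariate normal, with a covariance matrix that can be consistently estimated by
$$\hat G_{rl}(\hat\gamma) = n^{-1}\int_0^\tau\pi_r(s,\hat\gamma,\hat\Lambda_0)\pi_l(s,\hat\gamma,\hat\Lambda_0)\{\hat p(s-)\}^2\frac{\sum_{i=1}^n\sum_{j=1}^{m_i}dN_{ij}(s)}{\{\mathcal{Y}(s,\hat\Lambda_0)\}^2}$$
for $r,l = 1,\ldots,p+1$.

Step III

We now examine the sum of $U(\gamma_0,\Lambda_0)$ and $U(\gamma_0,\hat\Lambda_0(\cdot,\gamma_0))-U(\gamma_0,\Lambda_0)$. From (27), we have
$$U_r(\gamma_0,\hat\Lambda_0(\cdot,\gamma_0))-U_r(\gamma_0,\Lambda_0)\approx n^{-1}\sum_{k=1}^n\sum_{l=1}^{m_k}\int_0^\tau\alpha_r(s)\,dM_{kl}(s) = \frac{1}{n}\sum_{k=1}^n\mu_{kr},$$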

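where $\alpha_r(s)$ is the limiting value of $\pi_r(s,\gamma_0,\Lambda_0)\hat p(s-)/\mathcal{Y}(s,\Lambda_0)$ and $\mu_{kr}$ is defined as
$$\mu_{kr} = \sum_{l=1}^{m_k}\int_0^\tau\alpha_r(s)\,dM_{kl}(s).$$
Arguments in Yang and Prentice (1999, Appendix A) can be used to show that $\hat p(s-)$ has a limit. Also, clearly $E[\mu_{kr}] = 0$. We thus have
$$U_r(\gamma_0,\Lambda_0)+[U_r(\gamma_0,\hat\Lambda_0(\cdot,\gamma_0))-U_r(\gamma_0,\Lambda_0)]\approx\frac{1}{n}\sum_{i=1}^n(\xi_{ir}+\mu_{ir}),$$
which is a mean of $n$ i.i.d. random variables. Hence $n^{1/2}\{U_r(\gamma_0,\Lambda_0)+[U_r(\gamma_0,\hat\Lambda_0(\cdot,\gamma_0))-U_r(\gamma_0,\Lambda_0)]\}$ is asymptotically normally distributed. The covariance matrix may be estimated by $\hat V(\hat\gamma)+\hat G(\hat\gamma)+\hat C(\hat\gamma)$, where
$$\hat C_{rl}(\hat\gamma) = \frac{1}{n}\sum_{i=1}^n(\hat\xi_{ir}\hat\mu_{il}+\hat\xi_{il}\hat\mu_{ir}),\qquad r,l = 1,\ldots,p+1,$$
with
$$\hat\mu_{ir} = \sum_{j=1}^{m_i}\int_0^\tau\frac{\pi_r(s,\hat\gamma,\hat\Lambda_0)\hat p(s-)}{\mathcal{Y}(s,\hat\Lambda_0)}\,d\hat M_{ij}(s)$$
and
$$\hat M_{ij}(t) = N_{ij}(t)-\int_0^t\exp(\hat\beta^T Z_{ij})Y_{ij}(u)\psi_i(\hat\gamma,\hat\Lambda_0,u-)\,d\hat\Lambda_0(u).$$

Step IV

A first-order Taylor expansion of $U(\hat\gamma,\hat\Lambda_0(\cdot,\hat\gamma))$ about $\gamma_0 = (\beta_0^T,\theta_0)^T$ gives
$$U(\hat\gamma,\hat\Lambda_0(\cdot,\hat\gamma)) = U(\gamma_0,\hat\Lambda_0(\cdot,\gamma_0))+D(\gamma_0)(\hat\gamma-\gamma_0)+o_p(1),$$
where $D_{ls}(\gamma) = \partial U_l(\gamma,\hat\Lambda_0(\cdot,\gamma))/\partial\gamma_s$ for $l,s = 1,\ldots,p+1$, with $\gamma_{p+1} = \theta$.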
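Computing $D(\gamma)$ requires derivatives of the jumps of $\hat\Lambda_0$ with respect to $\beta$, which the formulas of this step obtain by differentiating the jump recursion. A minimal sketch follows, assuming, as the form of those derivative formulas indicates, that the jump at $\tau_k$ equals $d_k/\sum_i\psi_i(\gamma,\hat\Lambda_0,\tau_{k-1})R_{i\cdot}(\tau_k)$. For illustration only, it takes a scalar covariate, a gamma frailty with mean 1 and variance $\theta$ (so $\psi_i = (N+a)/(H+a)$ with $a = 1/\theta$, and $\partial\psi_i/\partial H = \psi_i^2-\phi_{3i}/\phi_{1i} = -(N+a)/(H+a)^2$), events counted up to $\tau_{k-1}$, and a hypothetical list-of-tuples data layout. The function name is hypothetical.

```python
import math


def jumps_and_beta_derivatives(beta, theta, clusters, event_times, d):
    """Baseline-hazard jumps and their derivatives with respect to a scalar beta.

    clusters    : list of clusters, each a list of (T, delta, Z) tuples (scalar Z)
    event_times : sorted distinct failure times tau_1 < ... < tau_K
    d           : number of failures at each tau_k
    Illustrative assumption: gamma frailty (mean 1, variance theta),
    so psi = (N + a) / (H + a) with a = 1/theta.
    """
    a = 1.0 / theta
    jumps, djumps = [], []
    for k, tau_k in enumerate(event_times):
        tau_prev = event_times[k - 1] if k > 0 else 0.0
        denom = ddenom = 0.0
        for members in clusters:
            H = dH = R = dR = 0.0
            N = 0
            for T, delta, Z in members:
                e = math.exp(beta * Z)
                cut = min(T, tau_prev)
                # Lambda_hat(T ^ tau_{k-1}) and its beta-derivative, from earlier jumps
                lam = sum(j for j, tm in zip(jumps, event_times) if tm <= cut)
                dlam = sum(dj for dj, tm in zip(djumps, event_times) if tm <= cut)
                H += lam * e
                dH += dlam * e + lam * e * Z
                N += int(delta == 1 and T <= tau_prev)
                if T >= tau_k:  # member still at risk at tau_k
                    R += e
                    dR += e * Z
            psi = (N + a) / (H + a)
            dpsi_dH = -(N + a) / (H + a) ** 2  # psi**2 - phi_3/phi_1 for the gamma case
            denom += psi * R
            ddenom += dpsi_dH * dH * R + psi * dR
        jumps.append(d[k] / denom)
        # derivative of d_k / denom: -d_k * (d denom / d beta) / denom**2
        djumps.append(-d[k] * ddenom / denom ** 2)
    return jumps, djumps
```

To check the recursion, one can compare `djumps` with the central difference of `jumps` at $\beta \pm h$ for a small $h$; the cumulative $\hat\Lambda_0$ and its derivative are the running sums of the two lists.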

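For $l,s = 1,\ldots,p$ we have
$$D_{ls}(\gamma) = -n^{-1}\sum_{i=1}^n\left\{\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}\sum_{j=1}^{m_i}\frac{\partial\hat H_{ij}(T_{ij})}{\partial\beta_s}Z_{ijl}-\left[\frac{\phi_{3i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}-\frac{\phi_{2i}^2(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}^2(\gamma,\hat\Lambda_0,\tau)}\right]\frac{\partial\hat H_{i\cdot}(\tau)}{\partial\beta_s}\sum_{j=1}^{m_i}\hat H_{ij}(T_{ij})Z_{ijl}\right\},\qquad(28)$$
where
$$\frac{\partial\hat H_{ij}(\tau_k)}{\partial\beta_s} = \frac{\partial\hat\Lambda_0(T_{ij}\wedge\tau_k)}{\partial\beta_s}\exp(\beta^T Z_{ij})+\hat\Lambda_0(T_{ij}\wedge\tau_k)\exp(\beta^T Z_{ij})Z_{ijs}$$
and
$$\frac{\partial\{\hat\Lambda_0(\tau_k)-\hat\Lambda_0(\tau_{k-1})\}}{\partial\beta_s} = -d_k\left\{\sum_{i=1}^n\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}R_{i\cdot}(\tau_k)\right\}^{-2}\sum_{i=1}^n\left[\left\{\frac{\phi_{2i}^2(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}^2(\gamma,\hat\Lambda_0,\tau_{k-1})}-\frac{\phi_{3i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}\right\}\frac{\partial\hat H_{i\cdot}(\tau_{k-1})}{\partial\beta_s}R_{i\cdot}(\tau_k)+\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}\sum_{j=1}^{m_i}R_{ij}(\tau_k)Z_{ijs}\right].$$
For $l = 1,\ldots,p$ we have
$$D_{l(p+1)}(\gamma) = -n^{-1}\sum_{i=1}^n\left\{\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}\sum_{j=1}^{m_i}\frac{\partial\hat H_{ij}(T_{ij})}{\partial\theta}Z_{ijl}+\left[\frac{\phi^{(\theta)}_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}-\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau)\phi^{(\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau)}{\phi^2_{1i}(\gamma,\hat\Lambda_0,\tau)}+\left\{\frac{\phi^2_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi^2_{1i}(\gamma,\hat\Lambda_0,\tau)}-\frac{\phi_{3i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}\right\}\frac{\partial\hat H_{i\cdot}(\tau)}{\partial\theta}\right]\sum_{j=1}^{m_i}\hat H_{ij}(T_{ij})Z_{ijl}\right\}\qquad(29)$$
and
$$D_{(p+1)l}(\gamma) = n^{-1}\sum_{i=1}^n\left\{\frac{\phi^{(\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau)\phi_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi^2_{1i}(\gamma,\hat\Lambda_0,\tau)}-\frac{\phi^{(\theta)}_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}\right\}\frac{\partial\hat H_{i\cdot}(\tau)}{\partial\beta_l}.\qquad(30)$$
Finally,
$$D_{(p+1)(p+1)}(\gamma) = n^{-1}\sum_{i=1}^n\left[\frac{\phi^{(\theta,\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}-\left\{\frac{\phi^{(\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}\right\}^2+\left\{\frac{\phi^{(\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau)\phi_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi^2_{1i}(\gamma,\hat\Lambda_0,\tau)}-\frac{\phi^{(\theta)}_{2i}(\gamma,\hat\Lambda_0,\tau)}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau)}\right\}\frac{\partial\hat H_{i\cdot}(\tau)}{\partial\theta}\right],\qquad(31)$$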

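where
$$\phi^{(\theta,\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau) = \int w^{N_{i\cdot}(\tau)}\exp\{-w\hat H_{i\cdot}(\tau)\}\frac{d^2f(w)}{d\theta^2}\,dw,\qquad\frac{\partial\hat H_{ij}(\tau_k)}{\partial\theta} = \frac{\partial\hat\Lambda_0(T_{ij}\wedge\tau_k)}{\partial\theta}\exp(\beta^T Z_{ij}),$$
and
$$\frac{\partial\{\hat\Lambda_0(\tau_k)-\hat\Lambda_0(\tau_{k-1})\}}{\partial\theta} = -d_k\left\{\sum_{i=1}^n\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}R_{i\cdot}(\tau_k)\right\}^{-2}\sum_{i=1}^nR_{i\cdot}(\tau_k)\left[\frac{\phi^{(\theta)}_{2i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}-\frac{\phi_{2i}(\gamma,\hat\Lambda_0,\tau_{k-1})\phi^{(\theta)}_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi^2_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}+\frac{\partial\hat H_{i\cdot}(\tau_{k-1})}{\partial\theta}\left\{\frac{\phi^2_{2i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi^2_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}-\frac{\phi_{3i}(\gamma,\hat\Lambda_0,\tau_{k-1})}{\phi_{1i}(\gamma,\hat\Lambda_0,\tau_{k-1})}\right\}\right].$$

Step V

Combining the results above, we get that $n^{1/2}(\hat\gamma-\gamma_0)$ is asymptotically zero-mean normally distributed with a covariance matrix that can be consistently estimated by $\hat D^{-1}(\hat\gamma)\{\hat V(\hat\gamma)+\hat G(\hat\gamma)+\hat C(\hat\gamma)\}\hat D^{-1}(\hat\gamma)^T$.

5.6 Proof of (20)

The goal is to prove that
$$\sup_{s,\gamma}|A(\gamma,\Lambda^{(n)},s)-a(\gamma,\Lambda^{(n)},s)|\to0\quad\text{a.s.}\qquad(32)$$
This involves several steps. First, it is easy to see that there exists a constant $\kappa$ (independent of $\gamma$ and $s$) such that
$$\sup_{s,\gamma}|A(\gamma,\Lambda_1,s)-A(\gamma,\Lambda_2,s)|\le\kappa\|\Lambda_1-\Lambda_2\|,\qquad(33)$$
$$\sup_{s,\gamma}|a(\gamma,\Lambda_1,s)-a(\gamma,\Lambda_2,s)|\le\kappa\|\Lambda_1-\Lambda_2\|.\qquad(34)$$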

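Next, for any fixed continuous $\Lambda$, the functional strong law of large numbers of Andersen and Gill (1982, Appendix III) implies that, with probability one,
$$\sup_{s,\gamma}|A(\gamma,\Lambda,s)-a(\gamma,\Lambda,s)|\to0.\qquad(35)$$
Now, given $\epsilon>0$, define the sets $\{t_j^{(\epsilon)}\}$, $\{\gamma_k^{(\epsilon)}\}$ and $\{\Lambda_l^{(\epsilon)}\}$ to be finite partition grids of $[0,\tau]$, $G$ and $[0,\Lambda_{\max}]$, respectively, with distance of no more than $\epsilon$ between grid points. Define $L_\epsilon$ to be the set of functions of $t$ and $\gamma$ defined by linear interpolation through vertices of the form $(t_j^{(\epsilon)},\gamma_k^{(\epsilon)},\Lambda_l^{(\epsilon)})$. Obviously $L_\epsilon$ is a finite set. Hence, in view of (35), there exists a probability-one set of realizations $\Omega_\epsilon$ for which
$$\sup_{s\in[0,\tau],\,\gamma\in G,\,\Lambda\in L_\epsilon}|A(\gamma,\Lambda,s)-a(\gamma,\Lambda,s)|\to0.\qquad(36)$$
Define $\Omega^* = \bigcap_{l=1}^\infty\Omega_{1/l}$ and $\Omega^{**} = \Omega^*\cap\tilde\Omega$, with $\tilde\Omega$ as defined earlier. Clearly $\Pr(\Omega^{**}) = 1$. From now on, we restrict attention to $\Omega^{**}$.

Now let $\epsilon>0$ be given. Choose $l>\epsilon^{-1}$. In view of (18) and (36), we can find for any $\omega\in\Omega^{**}$ a suitable positive integer $n(\epsilon,\omega)$ such that, whenever $n\ge n(\epsilon,\omega)$,
$$|\Lambda^{(n)}(t,\gamma)-\Lambda^{(n)}(u,\gamma)|\le B|t-u|+\frac{\epsilon}{2}\quad\forall\,t,u,\qquad(37)$$
and
$$\sup_{s\in[0,\tau],\,\gamma\in G,\,\Lambda\in L_{1/l}}|A(\gamma,\Lambda,s)-a(\gamma,\Lambda,s)|\le\epsilon.\qquad(38)$$
Next, let $\check\Lambda^{(n)}$ denote the function defined by linear interpolation through $(t_j^{(\epsilon)},\gamma_k^{(\epsilon)},\Lambda_{jk}^{(\epsilon)})$, where $\Lambda_{jk}^{(\epsilon)}$ is the element of $\{\Lambda_l^{(\epsilon)}\}$ that is closest to $\Lambda^{(n)}(t_j^{(\epsilon)},\gamma_k^{(\epsilon)})$. It is clear that
$$|\Lambda^{(n)}(t_j^{(\epsilon)},\gamma_k^{(\epsilon)})-\check\Lambda^{(n)}(t_j^{(\epsilon)},\gamma_k^{(\epsilon)})|\le\epsilon\quad\forall\,j,k.$$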
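The approximation step of this proof replaces $\Lambda^{(n)}$ by an interpolant whose vertex values lie on a fixed finite grid, so that only finitely many candidate functions can arise. The sketch below is a one-dimensional simplification (with $\gamma$ held fixed, unlike the two-argument functions in the proof): it snaps the values of a Lipschitz, increasing, nonnegative function at the $t$-grid points to the nearest point of a grid with spacing $\epsilon$ and interpolates linearly. The test function and all names are hypothetical.

```python
import math


def snapped_interpolant(f, tau, eps):
    """Piecewise-linear function through (t_j, nearest eps-grid value of f(t_j)).

    The t-grid has spacing at most eps on [0, tau]; each vertex value is rounded
    to the nearest point of {0, eps, 2*eps, ...} (f assumed nonnegative), so for
    fixed tau, eps and an upper bound on f only finitely many interpolants exist.
    """
    m = math.ceil(tau / eps)
    knots = [tau * j / m for j in range(m + 1)]
    values = [eps * round(f(t) / eps) for t in knots]

    def interp(t):
        j = min(int(t / tau * m), m - 1)
        w = (t - knots[j]) / (knots[j + 1] - knots[j])
        return (1 - w) * values[j] + w * values[j + 1]

    return interp


if __name__ == "__main__":
    cum_haz = lambda t: 2 * t + 0.3 * math.sin(5 * t)  # Lipschitz with constant 3.5
    approx = snapped_interpolant(cum_haz, 1.0, 0.05)
    grid = [i / 1000 for i in range(1001)]
    print(max(abs(cum_haz(t) - approx(t)) for t in grid))
```

For a Lipschitz constant $B$ and grid spacing $h\le\epsilon$, the sup-distance is at most $Bh/2+\epsilon/2$, which is of order $\epsilon$ in the same spirit as the bound $\bar B\epsilon$ used next in the proof.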

Using (37) and the Lipschitz continuity of $\Lambda^{(n)}(t,\gamma)$ with respect to $\gamma$ (which follows from the corresponding property of $\hat\Lambda_0(t,\gamma)$), we thus obtain
$$\sup_{t,\gamma}|\Lambda^{(n)}(t,\gamma)-\check\Lambda^{(n)}(t,\gamma)|\le\bar B\epsilon$$
for a suitable fixed constant $\bar B$ (depending on $B$ and $C$). Combining this with (33), (34) and (38), we obtain
$$\sup_{s,\gamma}|A(\gamma,\Lambda^{(n)},s)-a(\gamma,\Lambda^{(n)},s)|\le(2\kappa\bar B+1)\epsilon$$
for all $n\ge n(\epsilon,\omega)$. Since $\epsilon$ was arbitrary, the desired conclusion (32) follows, and the proof is complete.

5.7 Definition and behavior of $h_i(r,s)$

The quantity $h_i(r,s)$ appearing in (24) is given by
$$h_i(r,s) = \frac{2R_{i\cdot}(s)\eta_{1i}(r,s)}{\{\mathcal{Y}(s,\Lambda_0+r\Delta)\}^3}\,\frac{1}{n}\sum_{l=1}^n\sum_{j=1}^{m_l}R_{l\cdot}(s)\eta_{1l}(r,s)\exp(\beta_0^T Z_{lj})\Delta(T_{lj}\wedge s)-\frac{R_{i\cdot}(s)\eta_{2i}(r,s)}{\{\mathcal{Y}(s,\Lambda_0+r\Delta)\}^2}\sum_{j=1}^{m_i}\exp(\beta_0^T Z_{ij})\Delta(T_{ij}\wedge s),$$
where $\Delta(T_{ij}\wedge s) = \hat\Lambda_0(T_{ij}\wedge s)-\Lambda_0(T_{ij}\wedge s)$ and
$$\eta_{2i}(r,s) = 2\left\{\frac{\phi_{2i}(\gamma_0,\Lambda_0+r\Delta,s)}{\phi_{1i}(\gamma_0,\Lambda_0+r\Delta,s)}\right\}^3+\frac{\phi_{4i}(\gamma_0,\Lambda_0+r\Delta,s)}{\phi_{1i}(\gamma_0,\Lambda_0+r\Delta,s)}-\frac{3\phi_{2i}(\gamma_0,\Lambda_0+r\Delta,s)\phi_{3i}(\gamma_0,\Lambda_0+r\Delta,s)}{\{\phi_{1i}(\gamma_0,\Lambda_0+r\Delta,s)\}^2}.$$
For all $i = 1,\ldots,n$ and $s\in[0,\tau]$, we have $R_{i\cdot}(s)\le m\nu$, where $\nu$ is as in (8). Moreover, for $k = 1,\ldots,4$, we have
$$E[W_i^{r_{\min}+(k-1)}\exp\{-W_i m e^{\beta^T Z}\Lambda(\tau)\}]\le\phi_{ki}(\gamma,\Lambda,s)\le E[W_i^{r_{\max}+(k-1)}],$$
where $r_{\max} = \arg\max_{1\le r\le m}E(W_i^r)$ and $r_{\min} = \arg\min_{1\le r\le m}E(W_i^r)$. Hence $\eta_{1i}$ and $\eta_{2i}$ are bounded. In addition, the proof of Lemma 2 shows that $\mathcal{Y}(s,\Lambda_0+r\Delta)$ is uniformly bounded away from zero for $n$ sufficiently large. Finally, in the consistency proof we obtained $\|\Delta\| = o(1)$. Therefore $h_i(r,s)$ is $o(1)$ uniformly in $r$ and $s$.
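The expression for $\eta_{2i}$ is the second derivative of $\psi_i$ with respect to the cumulative hazard, which is what the second-order term $\ddot W$ in (24) requires. A minimal numerical check, assuming $\phi_{ki} = \int w^{N_{i\cdot}+k-1}\exp(-wH_{i\cdot})f(w)\,dw$ and $\psi_i = \phi_{2i}/\phi_{1i}$ (definitions taken from earlier in the paper, outside this excerpt) and, for illustration only, a gamma frailty with mean 1 and variance $\theta$: it compares $2(\phi_2/\phi_1)^3+\phi_4/\phi_1-3\phi_2\phi_3/\phi_1^2$ with a central second difference of $\psi$. In the gamma case both reduce to $2(N+a)/(H+a)^3$ with $a = 1/\theta$. The function names are hypothetical.

```python
import math


def phi(k, n_events, cum_haz, theta):
    """phi_k for a gamma frailty (mean 1, variance theta), in closed form.

    Gamma(N + k - 1 + a) / (Gamma(a) * theta**a * (H + a)**(N + k - 1 + a)), a = 1/theta.
    """
    a = 1.0 / theta
    shape = n_events + k - 1 + a
    return math.exp(math.lgamma(shape) - math.lgamma(a) - a * math.log(theta)
                    - shape * math.log(cum_haz + a))


def psi(n_events, cum_haz, theta):
    """Conditional mean frailty phi_2 / phi_1."""
    return phi(2, n_events, cum_haz, theta) / phi(1, n_events, cum_haz, theta)


def eta2(n_events, cum_haz, theta):
    """2 (phi_2/phi_1)**3 + phi_4/phi_1 - 3 phi_2 phi_3 / phi_1**2."""
    p1, p2, p3, p4 = (phi(k, n_events, cum_haz, theta) for k in (1, 2, 3, 4))
    return 2 * (p2 / p1) ** 3 + p4 / p1 - 3 * p2 * p3 / p1 ** 2


if __name__ == "__main__":
    N, H, theta, h = 2, 1.3, 0.7, 1e-3
    second_diff = (psi(N, H + h, theta) - 2 * psi(N, H, theta) + psi(N, H - h, theta)) / h ** 2
    print(second_diff, eta2(N, H, theta))
```

Because $\eta_{2i}$ is a fixed rational combination of the $\phi_{ki}$, the two-sided bounds on $\phi_{ki}$ stated above are what make it bounded.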

6 Acknowledgements

We thank the referees for their helpful comments, and for calling our attention to the work of Dabrowska (2006a, 2006b).

7 References

Aalen, O. O. (1976). Nonparametric inference in connection with multiple decrement models. Scand. J. Statist. 3.
Abramowitz, M. and Stegun, I. A. (Eds.) (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover.
Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Berlin: Springer-Verlag.
Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study. Ann. Statist. 10.
Andersen, P. K., Klein, J. P., Knudsen, K. M. and Palacios, R. T. (1997). Estimation of variance in Cox's regression model with shared gamma frailty. Biometrics 53.
Bagdonavicius, V. B. and Nikulin, M. S. (1999). Generalized proportional hazards model based on modified partial likelihood. Lifetime Data Analysis 5.
Bickel, P. (1985). Efficient testing in a class of transformation models. Bull. Int. Statist. Inst. 51, 53-81, Meeting 23, Amsterdam.

Breslow, N. (1974). Covariance analysis of censored survival data. Biometrics 30.
Cox, D. R. (1972). Regression models and life tables (with discussion). J. R. Statist. Soc. B 34.
Dabrowska, D. (2006a). Estimation in a class of semi-parametric transformation models. In Optimality: Second Erich L. Lehmann Symposium, Institute of Mathematical Statistics Lecture Notes and Monographs Series Vol. 49 (J. Rojo, ed.). Beachwood, OH: Institute of Mathematical Statistics.
Dabrowska, D. (2006b). Information bounds and efficient estimation in a class of censored transformation models. Technical report. Available at arxiv:math.st/6888.
Fine, J. P., Glidden, D. V. and Lee, K. (2003). A simple estimator for a shared frailty regression model. J. R. Statist. Soc. B 65.
Foutz, R. V. (1977). On the unique consistent solution to the likelihood equation. J. Amer. Statist. Ass. 72.
Gill, R. D. (1985). Discussion of the paper by D. Clayton and J. Cuzick. J. R. Statist. Soc. A 148.
Gill, R. D. (1989). Non- and semi-parametric maximum likelihood estimators and the von Mises method (Part 1). Scand. J. Statist. 16.
Gill, R. D. (1992). Marginal partial likelihood. Scand. J. Statist. 79.
Gorfine, M., Zucker, D. M. and Hsu, L. (2006). Prospective survival analysis with a general semiparametric shared frailty model: a pseudo full likelihood approach. Biometrika 93.

Hartman, P. (1973). Ordinary Differential Equations, 2nd ed. (reprinted 1982). Boston: Birkhauser.
Henderson, R. and Oman, P. (1999). Effect of frailty on marginal regression estimates in survival analysis. J. R. Statist. Soc. B 61.
Hougaard, P. (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73.
Hougaard, P. (2000). Analysis of Multivariate Survival Data. New York: Springer.
Klein, J. P. (1992). Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 48.
Lancaster, T. and Nickell, S. J. (1980). The analysis of re-employment probabilities for the unemployed. J. R. Statist. Soc. A 143.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. J. R. Statist. Soc. B 44.
McGilchrist, C. A. (1993). REML estimation for survival models with frailty. Biometrics 49.
Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect. Ann. Statist. 22.
Murphy, S. A. (1995). Asymptotic theory for the frailty model. Ann. Statist. 23.

Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sorensen, T. I. (1992). A counting process approach to maximum likelihood estimation of frailty models. Scand. J. Statist. 19.
Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 26.
Ripatti, S. and Palmgren, J. (2000). Estimation of multivariate frailty models using penalized partial likelihood. Biometrics 56.
Vaida, F. and Xu, R. H. (2000). Proportional hazards model with random effects. Statist. Med. 19.
Yang, S. and Prentice, R. L. (1999). Semiparametric inference in the proportional odds regression model. J. Amer. Statist. Ass. 94.
Zucker, D. M. (2005). A pseudo partial likelihood method for semi-parametric survival regression with covariate errors. J. Amer. Statist. Ass. 100.

Table 1: Simulation results for family size 2. A: empirical mean; B: empirical standard deviation; C: estimated standard deviation; D: coverage rate; E: correlation. For each setting, statistics A to E are reported for $\hat\beta$ and $\hat\theta$, each under our approach and under the EM algorithm. Settings: $\theta = 2$; ($\beta$, censoring %) = ($\ln 2$, 35), ($\ln 3$, 30), ($\ln 2$, 50), ($\ln 3$, 45).

Table 2: Simulation results for family size 5. A: empirical mean; B: empirical standard deviation; C: estimated standard deviation; D: coverage rate; E: correlation. For each setting, statistics A to E are reported for $\hat\beta$ and $\hat\theta$, each under our approach and under the EM algorithm. Settings: $\theta = 2$; ($\beta$, censoring %) = ($\ln 2$, 35), ($\ln 3$, 30), ($\ln 2$, 50), ($\ln 3$, 45).
