Maximum Likelihood Estimation for the Proportional Odds Model with Random Effects

Size: px
Start display at page:

Download "Maximum Likelihood Estimation for the Proportional Odds Model with Random Effects"

Transcription

1 Maximum Likelihood Estimation for the Proportional Odds Model with Random Effects DONGLIN ZENG, D. Y. LIN, and GUOSHENG YIN In this article, we study the semiparametric proportional odds model with random effects for correlated, right-censored failure time data. We estalish that the maximum likelihood estimators for the parameters of this model are consistent and asymptotically Gaussian. Furthermore, the limiting variances achieve the semiparametric efficiency ounds and can e consistently estimated. Simulation studies show that the asymptotic approximations are accurate for practical sample sizes and that the efficiency gains of the proposed estimators over those of Cai, Cheng and Wei (22, JASA) can e sustantial. A real example is provided to illustrate the proposed methods. KEY WORDS: Correlated failure time data; Frailty model; Linear transformation model; Proportional hazards; Semiparametric efficiency; Survival data. Donglin Zeng is Assistant Professor and D. Y. Lin is Dennis Gillings Distinguished Professor, Department of Biostatistics, CB# 742, University of North Carolina, Chapel Hill, NC Guosheng Yin is Assistant Professor, Department of Biostatistics, M. D. Anderson Cancer Center, Houston, TX 773. This research was supported y the National Institutes of Health. 1

2 1. INTRODUCTION In many scientific studies, there exists natural or artificial clustering of study sujects such that the survival times or failure times of the sujects within the same cluster are correlated. A common approach to accommodating the intra-class dependence is to incorporate an unoserved random effect, the so-called frailty, into the Cox (1972) proportional hazards model. Specifically, the hazard function for the jth suject of the ith cluster associated with a d 1 -vector of covariates X ij is postulated to take the form λ(t X ij, ξ i ) = ξ i λ (t)e XT ijα, i = 1,..., n; j = 1,..., ni, (1) where λ ( ) is an unspecified aseline hazard function, α is a vector of unknown regression parameters, and ξ i is the unoserved frailty for the ith cluster. Although various parametric distriutions for the frailty have een suggested, the existing literature has een focused on the simple case of gamma frailty. The consistency and asymptotic distriution of the maximum likelihood estimator for the gamma frailty model have een rigorously studied y Murphy (1994; 1995) for the case of no covariates and y Parner (1998) for the case with covariates. Model (1) imposes a common gamma frailty on all memers of the same cluster. Several authors have extended this shared gamma frailty model to accommodate more flexile dependence among cluster memers. In particular, Pertersen (1998) allowed different additive frailties for different memers of the same cluster. Parner (1998) assumed that the frailty for each cluster consists of two independent components: a common cluster-level effect and a suject-specific effect, and showed that the maximum likelihood estimator is efficient. Under model (1), the conditional hazard functions given frailties are required to e proportionate over time among different sets of covariate values. This assumption of proportional hazards may not e satisfied in certain applications. For independent failure time data, an attractive alternative to the proportional hazards model is the proportional odds model (Pettitt 1982; Bennett 1983). The proportional odds model constraints the ratio of the odds of survival associated with two sets of covariate values to e constant over time, and consequently the ratio of the hazards to converge to unity as time increases. By contrast, the proportional hazards model constraints the hazard ratio to e constant while the odds ratio tends to or infinity. Physical and iological rationale ehind the proportional odds model was provided y Bennett (1983) and others. Statistical inference is much more challenging under the proportional odds model than under the proportional hazards model. Important contriutions have een made 2

3 y Bennett (1983), Pettitt (1984), Cuzick (1988), Darowska and Doksum (1988), Cheng, Wei and Ying (1995), Wu (1995), Murphy, Rossini and van der Vaart (1997), Shen (1998), Lam and Leung (21), and Chen, Jin and Ying (22) among others. In this article, we consider the proportional odds model with random effects for correlated failure time data. Specifically, logit{s(t X ij, Z ij, i )} = G(t) + X T ijβ + Z T ij i, i = 1,..., n; j = 1,..., n i, (2) where X ij is a d 1 -vector of covariates, as defined earlier, Z ij is a d 2 -vector of covariates, which usually contains 1 and part of X ij, G( ) is an unspecified strictly increasing function, β is a set of unknown regression parameters, i is a set of unoserved random effects, and S( X ij, Z ij, i ) is the survival function conditional on X ij Z ij, and i. We assume that i follows a normal distriution with mean zero and unknown covariance matrix Σ. Note that model (2) allows covariate-specific or suject-specific random effects whereas model (1) only allows a clusterspecific frailty. Two recent articles are concerned with special versions of model (2). Specifically, Cai, Cheng and Wei (22) studied model (2) with a scalar random effect (i.e., Z ij 1). The parameter estimators are otained y minimizing the empirical sum of squares of the differences etween certain oserved quantities and their expected values. The estimators are not asymptotically efficient, and the variance estimation is computationally demanding. The censoring mechanism is required to e purely random and independent of covariates. Lam, Lee and Leung (22) considered the proportional odds model with scalar random effects µ ij, i = 1,..., n and j = 1,..., n i. Within the ith cluster, µ ij, j = 1,..., n i, are multivariate normal with a specific covariance structure. Lam et al. (22) otained the estimators for the regression parameters y maximizing a marginal likelihood ased on the ranks of the failure times. They did not provide formal asymptotic results or consider the prolem of survival function estimation. In this article, we study the maximum likelihood estimation of model (2). The estimators are shown to e consistent and asymptotically efficient. The asymptotic distriutions of the estimators and consistent variance estimators are also otained. Numerical studies reveal that the proposed estimators perform well for practical sample sizes and the efficiency gains over the estimators of Cai et al. (22) can e sustantial. We descrie in greater detail the data structure and model assumptions in the next section, and develop the estimation theory in Section 3. We then present the results of our numerical studies in Section 4 and provide an application to a real medical study in Section 5. Some 3

4 concluding remarks are given in Section 6. Most of the technical details are relegated to the appendix. 2. DATA STRUCTURE AND MODEL ASSUMPTIONS Suppose that there is a random sample of n clusters with potentially different sizes. For i = 1,..., n and j = 1,..., n i, let T ij and Cij e the latent failure time and censoring time for the jth memer of the ith cluster, and let X ij and Z ij e the corresponding d 1 - and d 2 - vectors of covariates. The regression relationship etween T ij and (X ij, Z ij ) is given y model (2). The data consist of (Y ij, ij, X ij, Z ij ) (i = 1,..., n; j = 1,..., n i ), where Y ij = T ij C ij, ij = I(T ij C ij ), and C ij = Cij τ. Here and in the sequel, a = min(a, ), a = max(a, ), I( ) is the indicator function, and τ is a fixed constant denoting the end of the study. We impose the following regularity conditions. C.1. Conditional on covariates X ij and Z ij, the censoring time Cij is independent of the failure time T ij and random effects i. C.2. There exists some positive constant δ such that Pr(Cij τ X ij, Z ij ) δ almost surely. C.3. All the X ij and Z ij are ounded. In addition, if there exist a constant vector c and a symmetric matrix Σ such that [1, X T ij]c + Z T ijσz ij =, j = 1,..., n i, and Z T ijσz ij =, j j ; j, j = 1,..., n i almost surely, then c = and Σ =. C.4. The true value G (t) of G(t) is a strictly increasing function in [, τ] and is continuously differentiale. In addition, G () =, de G(t) /dt t=+ >, and G (τ) <. C.5. The true values of β and Σ, β and Σ, elong to the interior of a known compact set Θ = { (β, Σ) : β B for some constant B, Σ is positive definite and its eigenvalues are ounded away from and }. C.6. The cluster size is completely random. In addition, there exists a positive integer n such that 1 n i n and Pr(n i 2) >. 4

5 Remark 1. Conditions C.3, C.4 and C.6 ensure the identifiaility of the parameters in model (2). If Z ij = Z ij in condition C.3 for continuous covariates, then the two displays in this condition are equivalent to the linear independence of [1, X T ij] and the linear independence of Z ij. In condition C.4, the equality G () = follows from the fact that S( X ij, Z ij, i ) = 1, and the inequality G (τ) < implies that Pr(T ij > τ X ij, Z ij, i ) >. The ound G (τ) is unknown in practice. Condition C.6 implies that the cluster size is ounded and some clusters have at least two sujects. 3. MAXIMUM LIKELIHOOD ESTIMATION Define H(t) = e G(t) and H (t) = e G (t). Note that H () =. Under model (2) and condition C.1, the likelihood function for the parameters (β, Σ, H ) is proportional to n n i e (XT ijβ+z T ij ) 1 ij i=1 H(Y ij ) + e (XT ijβ+z T ij ) e (XT ijβ+z T ij ) H ij (Y ij ) Σ 1/2 e T Σ 1 /2 d, (H(Y ij ) + e (XT ijβ+z T ij ) ) 2 where H (t) is the derivative of H(t). It would seem natural to calculate the maximum likelihood estimators of (β, Σ, H ) y maximizing the aove likelihood function. The maximum of this function, however, is infinity since we can always choose some function H( ) with fixed values at the Y ij while letting H (Y ij ) go to infinity for some Y ij with ij = 1. Thus, we relax H( ) to e right-continuous and allow H( ) to have jumps at the Y ij. We then maximize the following function n n i e (XT ijβ+z T ij ) 1 ij L n (β, Σ, H) i=1 H(Y ij ) + e (XT ijβ+z T ij ) e (XT ijβ+z T ij ) ij H{Y ij } Σ 1/2 e T Σ 1 /2 d, (3) (H(Y ij ) + e (XT ijβ+z T ij ) ) 2 where H{t} denotes the jump size of H(t) at t. To e specific, we maximize L n (β, Σ, H) over the parameter space {(β, Σ, H) : (β, Σ) Θ, H(t) is an increasing right-continuous function in [, τ] with H() = }. The resulting estimators, denoted y β n, Σ n and Ĥn, are referred to as the nonparametric maximum likelihood estimators (NPMLEs) (Parner, 1998) or the sieve maximum likelihood estimators (Huang and Rossini, 1997; Murphy, Rossini and van der Vaart, 1997). 5

6 The existence of the maximizers follows from the following arguments. First, for any (β, Σ, H) in the parameter space, the ith term on the right-hand side of (3) is ounded y max X ij,z ij,(β,σ) Θ n i e XT ijβ+z T ij Σ 1/2 e T Σ 1 /2 d <, where the inequality follows from the oundedness of X ij and Z ij, and the compactness of Θ. Second, for any H, we can always construct a new increasing function H which is a step function with jumps only at the Y ij for which ij = 1 such that H (Y ij ) = H(Y ij ). Clearly, H {Y ij } H{Y ij } for ij = 1 so that L n (β, Σ, H ) L n (β, Σ, H). This implies that the function H which maximizes L n (β, Σ, H) should e a step function with positive jumps only at the Y ij for which ij = 1. Third, if H{Y ij } = for some Y ij, then it is easy to see that L n (β, Σ, H) =. Therefore, we conclude that the maximizers exist. The aove arguments imply that Ĥn(t) is a step function with jumps only at the Y ij for which ij = 1. Thus, the NPMLEs for (β, Σ, H ) can e otained y maximizing L n (β, Σ, H) over the parameter space (β, Σ) Θ and the jump sizes of H at the Y ij. This maximization can e realized via many optimization algorithms such as the large-scale unconstrained optimization function fminunc in MATLAB, which is descried in the next section. The asymptotic properties of the proposed estimators are stated in the following theorems. Theorem 1. Under conditions C.1 C.6, β n β, Σ n Σ and sup t [,τ] Ĥn(t) H (t) almost surely, where is the Euclidean norm. Theorem 2. Under conditions C.1 C.6, the random element n( β T n β T, Σ T n Σ T, Ĥn( ) H ( )) T converges weakly to a zero-mean Gaussian process in the metric space R d 1 R d 2(d 2 +1)/2 l [, τ], where Σ n and Σ are treated as extended column vectors consisting of the upper triangle elements, and l [, τ] is a normed space consisting of all the ounded functions and the norm is defined as the supremum norm on [, τ]. Furthermore, β n and Σ n are asymptotically efficient. Remark 2. Theorem 1 presents the consistency of the maximum likelihood estimators. In conditions C.1 C.6, H( ) is not assumed to e a ounded function, which means that the weakcompactness of the parameter H( ) is not assumed. Thus, otaining a ound for the maximum likelihood estimator Ĥn( ) is a key to the proof of Theorem 1. The proof of Theorem 1 adopts some ideas from Murphy s (1994) proof of the consistency for the gamma frailty model, ut the technical details are quite different. Once the consistency is estalished, the asymptotic 6

7 distriutions of the maximum likelihood estimators stated in Theorem 2 can e derived along the lines of Murphy (1995) and Parner (1998), although the verification of the continuous invertiility of the information operator is sustantially different from theirs. In the statement of Theorem 2, asymptotically efficient estimators mean that the asymptotic variances attain the semiparametric efficiency ounds as defined in Bickel et al. (1993, Ch. 3). The proofs of Theorems 1 and 2 are given in the appendix. It is essential to estimate the asymptotic covariance matrices of β n and Σ n. Intuitively, the variation in estimating the parameter H( ) arises from the variation in estimating the jump sizes of H( ) at the Y ij for which ij = 1. Thus, we can regard the oserved likelihood function as a likelihood function indexed y the parameters β, Σ, and the parameters which represent the jump sizes of H( ) at the Y ij for which ij = 1. From the Fisher information theory in the parametric setting, the asymptotic covariance matrix in Theorem 2 can e estimated y the inverse of the oserved information matrix for all the parameters. constant vector (h 1, h 2 ) R d 1 Specifically, for any R d 2(d 2 +1)/2 and any ounded function h 3, the asymptotic variance of h T 1 β n + h T 2 Σ n + τ h 3(t)dĤn(t) is equal to the asymptotic variance of h T 1 β n + h T Σ 2 n + ij =1 h 3 (Y ij )Ĥn{Y ij } so that it can e estimated y h T nj 1 n h n, where h n is the vector comprising of h 1, h 2 and the h 3 (Y ij ) for which ij = 1, and J n is the negative Hessian matrix of log L n ( β, Σ, Ĥ) with respect to (β, Σ) and the jump sizes of H at the Y ij for which ij = 1. The next theorem formalizes this approximation. Theorem 3. Let V (h 1, h 2, h 3 ) e the asymptotic variance of the random variale n 1/2 {h T 1 ( β n β ) + h T 2 ( Σ n Σ ) + τ h 3(t)d(Ĥn(t) H (t))}. Under conditions C.1 C.6, the estimator nh T nj 1 n h n V (h 1, h 2, h 3 ) uniformly in (h 1, h 2, h 3 ) in proaility. Theorem 3 implies that, when the numer of uncensored oservations is not too large, one can simply invert the oserved information matrix for all the parameters, including β, Σ, and the H{Y ij } for which ij = 1 to calculate the variances and covariances. Our numerical studies revealed that this approximation is satisfactory for practical sample sizes. 4. NUMERICAL STUDIES Simulation studies were conducted to evaluate the finite-sample properties of the proposed methods. In the first set of studies, the failure times were generated from the following special case of model (2): logit{s(t X 1ij, X 2ij, i )} = log t X 1ij + X 2ij + i, i = 1,..., n; j = 1, 2, 7

8 where X 1i1 =, X 1i2 = 1, X 2i1 X 2i2 is a uniform(, 1) random variale, and i is zeromean normal with variance σ 2. The censoring times were generated from the uniform(, 15) distriution, corresponding to approximately 33% censoring rate. We used the optimization algorithm fminunc in the optimization toolox of MATLAB to otain the maximum likelihood estimates of β 1, β 2, σ and H. When the gradients and the Hessian derivatives of the likelihood function are provided, the search algorithm is a suspace trust region method and is ased on the interior-reflective Newton method descried in Coleman and Li (1994; 1996). In each iteration of the search, a large linear system is approximately solved y using the method of preconditioned conjugate gradients. The algorithm converges when the search step size and the norm of the search gradients are smaller than certain thresholds. To avoid negative estimates of the jump sizes for H or negative estimates of σ, we used the logarithms of the jumps sizes and log σ as the parameters during the search. The starting values for (β 1, β 2, σ) were set to e (,, 1). The starting value for the jump size H{Y ij } at the failure time Y ij was given y expression (A.1) in Appendix A.1, on the right-hand side of which the values for (β 1, β 2, σ) were set to e the initial values and H(t) was set to e t. In general, the search algorithm converged within 1 iterations. After the algorithm converged, the variance estimates were calculated y inverting the oserved information matrix. Tale 1 displays the results of these simulation studies with n = 2. likelihood estimators for all the parameters show little ias. The maximum The proposed standard error estimators agree well with the empirical standard errors, and the confidence intervals provide reasonale coverages. For comparisons, we also computed the estimates ased on the method of Cai et al. (22), which minimizes the following criterion function nρ n n i i=1 l k=1 ili(y il Y ik t ) Ĝ w (Y il ) α, + n α n i j i=1 k=1 l=1 e (t+xt ikβ+) (t+x e T jlβ+ ) 1 + e (t+xt ik β+) e (t+xt ikβ+) e (t+xt ilβ+) 1 e 2 2σ 2 dtd 1 + e (t+xt ik β+) {1 + e (t+xt il β+) } 2 2πσ n j [ jl I(Y jl Y ik t ) Ĝ 2 c(y jl ) 1 e 2 1 2σ 2 e 2 2σ 2 dtdd {1 + e (t+xt jl β+ ) } 2 2πσ 2πσ 2, (4) where t is the minimum of the 95th percentile of the oserved Y ij and the 98th percentile of the oserved Y ij for which ij = 1. In (4), ρ is chosen to minimize the asymptotic variance of the estimator for β, Ĝc(t) = e Λc (t) and Ĝw(t) = e Λw (t), where Λ c is the Nelson-Aalen estimator 8 2

9 ased on (Y ij, 1 ij ), i = 1,..., n; j = 1,..., n i, and Λ w is the Nelson-Aalen estimator ased on {Y ik Y il, 1 (I(Y ik Y il ) ik + I(Y ik > Y il ) il )}, i = 1,..., n, 1 k < l n i. The mean squared errors for Cai et al. s estimators of β 1, β 2 and σ turned out to e.96,.749 and.935 under σ = 3, and.55,.192 and.311 under σ = 1. Thus, these estimators can e consideraly less efficient than the maximum likelihood estimators, especially when there is strong intra-class dependence. It would e interesting to make comparisons with Lam et al. s method. Their estimators, however, are not easy to program. Tale 1. Summary Statistics for the Simulation Studies With One Random Effect Paramter Value Mean SE SEE 95% CP MSE σ = 3 β β σ H(τ/4) H(τ/2) H(3τ/4) H(τ) σ = 1 β β σ H(τ/4) H(τ/2) H(3τ/4) H(τ) NOTE: Mean and SE stand for the mean and standard error of the estimator. SEE is the mean of the standard error estimator, and 95% CP is the coverage proaility of the 95% confidence interval. MSE is the mean squared error. Each entry is ased on 5 simulated data sets. In our second set of studies, we considered ivariate normal random effects. The failure times were generated from the following model: logit{s(t X i, 1i, 2i )} = log { (1 + t/2) 2 1 } +.5X 1ij.5X 2ij + 1i + X 2ij 2i, i = 1,..., n; j = 1, 2, where X 1i1 =, X 1i2 = 1, X 2i1 X 2i2 is a uniform(, 1) random variale, and ( 1i, 2i ) has a ivariate normal distriution with zero means, unit variances and covariance.4. censoring times were set to e min(3, C ), where C is uniform(3/8, 11/8), so that approximately 9 The

10 34% of the failure times were censored. The optimization algorithm fminunc was again used to find the maximum likelihood estimates. We used the logarithms of the jump sizes of H and the elements in the square-root of the covariate matrix of the random effects as the parameters during the search. To ensure that the covariance matrix estimate is positive definite, we let the ojective function e a large negative value (i.e., 1 5 ) if any condition for a positive definite matrix was violated. This penalization essentially restricts the search within the meaningful regions of the parameters. As efore, oth the gradients and the Hessian derivatives of the ojective function were supplied in the search algorithm. The starting values for the regression parameters and the covariance matrix were zeros and the identity matrix respectively, while the starting values for the jump sizes of H were determined y (A.1), in which the parametric components were set to e the initial values and H(t) to e t. Tale 2. Summary Statistics for the Simulation Studies With Two Random Effects Parameter Value Mean SE SEE 95% CP MSE n = 2 β β σ σ σ H(τ/4) H(τ/2) H(3τ/4) H(τ) n = 4 β β σ σ σ H(τ/4) H(τ/2) H(3τ/4) H(τ) NOTE: See Note to Tale 1. Tale 2 displays the results for n = 2 and n = 4. For n = 2, the search usually converged after aout 1 iterations, and it took less than 2 hours to complete 5 repetitions on GHz Athlon machines. For n = 4, it took aout 1 hours to complete. In the tale, 1

11 σ 11, σ 22 and σ 12 are the elements in the square-root of the covariance matrix for 1 and 2 so that σ 11 = σ 22 =.979 and σ 12 =.24. For the regression parameters and the function H, the maximum likelihood estimators show little ias and the proposed standard error estimators agree well with the empirical standard errors. The parameters for the covariance matrix of the random effects are estimated less well, although there is a trend for improvement as n increases. 5. AN EXAMPLE We now illustrate the proposed methods with the well-known Diaetic Retinopathy Study (Huster et al. 1989), which was conducted to assess the effectiveness of laser photocoagulation in delaying visual loss among patients with diaetic retinopathy. One eye of each patient was randomly selected to receive the laser treatment while the other eye was used as a control. The failure time of interest is the time to visual loss as measured y visual acuity less than 5/2. Following previous authors, we confine our attention to a suset of 197 high-risk patients, and consider three covariates: X 1ij indicates, y the values 1 versus, whether or not the jth eye (j = 1 for the left eye and j = 2 for the right eye) of the ith patient was treated with laser photocoagulation, X 2i1 X 2i2 indicates, y the values 1 versus, whether the ith patient had adult-onset or juvenile-onset diaetics, and X 3ij = X 1ij X 2ij. We fit model (2) with these three covariates, along with random effects i to account for the correlation etween the two eyes of the same patient. We used the fminunc function with the starting values descried in the previous section. The results of the analysis are shown in Tale 3. There is a high degree of dependence etween the failure times of the two eyes from the same patient. Both the treatment indicator and the interaction term are significant, whereas the diaetic type is not. Cai et al. (22) reported estimates of β 1, β 2 and β 3 of.46,.74 and 1.41 with estimated standard errors of.3,.38 and.54. The conclusions ased on the two sets of results would e somewhat different. Tale 3. Maximum Likelihood Estimates of the Random-Effect Proportional Odds Model for the Diaetic Retinopathy Study Parameter Estimate SE Est/SE p-value β β β σ <.1 11

12 NOTE: SE is the estimated standard error, and p-value pertains to the two-sided test of zero parameter value. To compare the fit of competing models, Cai et al. (22, p. 516) proposed a distance measure which summarizes the differences etween the oserved and fitted values of the failure times. This distance measure turns out to e for the proportional odds model with normal random effect as opposed to for the proportional hazards model with gamma frailty. Thus, the former model appears to fit the data slightly etter than the latter. One important application of random-effects models is to predict the future survival experience of one memer given the survival history of the other memers of the same cluster. In the Diaetic Retinopathy Study, one may e interested in estimating, for example, the conditional survival proailities of the treated eye given that it has not failed efore 3 months while the untreated eye failed etween 24 and 3 months, i.e., Pr(T 2 > t T 2 > 3, 24 < T 1 < 3, X 11 =, X 12 = 1, X 2 ) for t > 3, where T 2 is the failure time for the treated eye and T 1 is the failure time for the untreated eye, X 1k is the treatment status for the kth eye, and X 2 is the diaetic type for this patient. It is straightforward to show that = Pr(T 2 > t T 2 > 3, 24 < T 1 < 3, X 11 =, X 12 = 1, X 2 ) u g(u, t, 1, X 2; β, σ, H){g(u, 24,, X 2 ; β, σ, H) g(u, 3,, X 2 ; β, σ, H)}φ(u)du u g(u, 3, 1, X 2; β, σ, H){g(u, 24,, X 2 ; β, σ, H) g(u, 3,, X 2 ; β, σ, H)}φ(u)du, (5) where φ( ) is the standard normal density function, and g(u, t, X 1, X 2 ; β, σ, H) = e β 1X 1 β 2 X 2 β 3 X 1 X 2 σu H(t) + e β 1X 1 β 2 X 2 β 3 X 1 X 2 σu. We can easily estimate this proaility function y replacing β 1, β 2, σ and H in (5) y their respective maximum likelihood estimates and then evaluating the integration via the Gaussianquadrature formula. The variance function is given y D T J 1 n D, where D is the derivative of (5) with respect to (β 1, β 2, σ) and the jump sizes of H at the Y ij for which ij = 1. Figure 1 displays the estimated survival curves along with the 95% confidence intervals for the two diaetic types. 6. DISCUSSION We have developed consistent and efficient estimators for the proportional odds model with random effects, which is a useful alternative to the popular proportional hazards model with 12

13 Survival proailities Follow-up time (months) Figure 1: Estimated conditional survival proailities for the diaetic retinopathy patients: the curve represents the point estimate of the survival function for the adult-onset diaetics, and the curves the corresponding 95% confidence limits; the curve represents the the point estimate of the survival function for the juvenile-onset diaetics, and the curves the corresponding 95% confidence limits. 13

14 gamma frailty. The proposed estimators are more efficient than those of Cai et al. (22). It is computationally less demanding to evaluate the variances of the proposed estimators than those of Cai et al. s estimators as the latter requires a multi-layer summation. The proposed numerical algorithm does not guarantee a gloal maximum. This is a common prolem for all maximum likelihood estimators in complex settings. Our experience, however, indicates that the proposed algorithm works well in practice. One approach to increase one s confidence in the estimates is to employ different starting values. We have tried different starting values in our simulated and real data and otained very similar answers. Note that the existing ad hoc estimating equations may have multiple solutions as well. For the variance estimation, we invert the oserved information matrix on the asis of Theorem 3. When the numer of uncensored oservations is large, the matrix inversion may potentially e unstale. An alternative approach is to use the numerical differentiation of the profile log-likelihood function, as implemented y Huang and Rossini (1997) and Murphy, Rossini and van der Vaart (1997). In the latter approach, the choice of the neighorhood is aritrary and no variance estimates are availale for the survival function estimators. It would e worthwhile to study the maximum likelihood estimation for a general class of linear transformation models with random effects ψ{s(t X ij, Z ij, i )} = G(t) + X T ijβ + Z T ij i, (6) where ψ is a given link function. A versatile family of link functions is ψ(s) = log{λ 1 (s λ 1)}, λ, which contains oth the proportional hazards model (λ = ) and the proportional odds model (λ = 1). General linear transformation models have een studied y Bickel (1986), Darowka and Doksum (1988), Cheng et al. (1995) and Chen et al. (23) among others for independent failure time data and y Cai et al. (22) for clustered failure time data (with a scalar random effect), although asymptotically efficient estimators have yet to e developed. It is expected that the asymptotic normality and the efficiency of the maximum likelihood estimators for this class of models depend on the smoothness property of ψ. We are currently investigating the conditions for ψ and developing the requisite asymptotic theory. The proposed methods are ased on the normality of the random effects. The normality assumption may not e satisfied in some applications. It would e desirale to relax this assumption and to require only that the random effects have zero means. One possile approach is to approximate the density of random effects with a truncated series expansion (Davidian and Giltinana 1995, Ch. 7). We will pursue this generalization in our future work. 14

15 APPENDIX. PROOFS OF THEOREMS A.1 Proof of Theorem 1 The proof of Theorem 1 mimics Murphy s (1994) proof of consistency for the proportional hazards model with gamma frailty. Sustantial technical complications arise from the fact that, unlike the gamma frailty model, the random effects in our setting cannot e integrated out explicitly. The proof consists of two major steps: in the first step, we show that Ĥn( ) has an upper ound in [, τ] with proaility one; in the second step, we show that any convergent susequence of ( β n, Σ n, Ĥn) must converge to (β, Σ, H ). Step 1. We prove that Ĥn( ) has an upper ound in [, τ] with proaility one. Our approach is to show that since Ĥn maximizes L n it cannot diverge. Let l n (β, Σ, H) = log L n (β, Σ, H). By definition, l n ( β n, Σ n, Ĥn) l n (β, Σ, H) for any β, Σ and H. We wish to show that if Ĥn diverges, then the difference in the log-likelihood must e negative, which will e a contradiction. If H is continuous, l n (β, Σ, H) will e infinite for finite n. Thus, the choice of H = H is excluded. The key is to construct a suitale function H n that uniformly converges to H. Suppose that Ĥn(τ) in some sample space with positive proaility. We will show that n 1 {l n ( β n, Σ n, Ĥn) l n (β, Σ, H n )} diverges to if Ĥn(τ). We will construct the function H n y imitating Ĥn. By differentiating l n (β, Σ, H) with respect to H{Y ij } and setting the derivative to zero, we see that Ĥn{Y ij } satisfies the equation n ij H{Y ij } = R 1k( β n, H, )R 2k (Y ij, β 1 T n, H, )e Σ n /2 Σ n 1/2 d k=1 R 1k( β 1 T n, H, )e Σ n /2 Σ n 1/2, d (A.1) where R 1k (β, H, ) = n k l=1 e (XT klβ+z T kl ) { H(Ykl ) + e (XT kl β+zt kl )} 1+ kl, R 2k (t, β, H, ) = n k l=1 (1 + kl )I(Y kl t) H(Y kl ) + e (XT kl β+zt kl ). Thus, we define H n (t) as a step function with jumps only at the Y ij for which ij = 1 and the jump size H n {Y ij } satisfies the equation ij H n {Y ij } = n k=1 R 1k(β, H, )R 2k (Y ij, β, H, )e T Σ 1 /2 Σ 1/2 d Specifically, H n (t) = n i=1 ni I(Y ij t) H n {Y ij }. R 1k(β, H, )e T Σ 1 /2 Σ 1/2 d 15. (A.2)

16 We will show that H n (t) converges to H (t) uniformly in t [, τ] with proaility one. By the Glivenko-Cantelli theorem (van der Vaart and Wellner 1996, p. 122), H n (t) converges almost surely to E { ni I(Y ij t) ij /µ(y ij ) }, where µ(y) = E R 1k(β, H, )R 2k (y, β, H, )e T Σ 1 /2 Σ 1/2 d R 1k(β, H, )e T Σ 1 /2 Σ 1/2 d n k (1 + = E kl )I(Y kl y) E H (Y kl ) + e (XT β kl +ZT kl ) n k. l=1 If we denote y S c ( X kl, Z kl ) the survival function of C kl given (X kl, Z kl ), then (1 + kl )I(Y kl y) E H (Y kl ) + e (XT β kl +ZT kl ) n k 1 H = E 2 (t) y H (t) + e (XT β kl +ZT kl ) { H (t) + e (XT β kl +ZT )} 2 S c(t X kl, Z kl )dt kl 1 1 E y H (t) + e (XT β kl +ZT kl ) H (t) + e (XT β ds c(t X kl, Z kl ) kl +ZT kl ) n k S = E c (y X kl, Z kl ) {H (y) + e (XT β kl +ZT kl ) } 2 n k, where the second equality follows from integration y part. Thus, n i I(Y ij t) n i ij E µ(y ij ) = E t S E c (y X ij, Z ij )H (y) dy µ(y){h (y) + e (XT ijβ +Z T ij ) } 2 n i = t H (y)dy = H (t). Consequently, H n (t) uniformly converges to H (t) in [, τ]. By plugging equation (A.1) into l n ( β n, Σ n, Ĥn), we otain { l n ( β n, Σ n } n, Ĥn) = log R 1i ( β n, Ĥn, ) Σ 1 n 1/2 T e Σ n /2 d i=1 n n i n ij log R 1k( β n, Ĥn, )R 2k (Y ij, β 1 n, Ĥn, T )e Σ n /2 Σ n 1/2 d i=1 k=1 R 1k( β 1 n, Ĥn, T )e Σ n /2 Σ n 1/2. d Likewise, y plugging equation (A.2) into l n (β, Σ, H n ) and applying the Glivenko-Cantelli theorem, we see that n 1 l n (β, Σ, H n ) = O(1) n 1 n i=1 ni ij log(n), where O(1) denotes a random variale ounded away from infinity almost surely. Thus, n 1 {l n ( β n, Σ n, Ĥn) l n (β, Σ, H n { n )} = O(1)+n 1 log R 1i ( β n, Ĥn, ) Σ n 1/2 e 16 i=1 } 1 T Σ n /2 d

17 n n 1 n i n ij log i=1 n 1 R 1k( β n, Ĥn, )R 2k (Y ij, β 1 n, Ĥn, T )e Σ n /2 Σ n 1/2 d k=1 R 1k( β 1 n, Ĥn, T )e Σ n /2 Σ n 1/2. d (A.3) We will show that if Ĥn(τ), then the right-hand side of (A.3) will diverge to. To this end, we will ound each term in (A.3). Let m and M e constants such that < m e XT kl β n M < almost surely for all k = 1,..., n; l = 1,..., nk. Since Ĥ n (y) + e (XT kl β Ĥ n +Z T kl ) n (y) + e XT kl β n, if Z T kl, } e ZT kl {Ĥ n (y) + e XT kl β n, if Z T kl >, we have Ĥn(y) + e (XT kl β n +Z T kl ) e ZT kl {Ĥn(y) + m}. Similarly, Ĥn(y) + e (XT kl β n +Z T kl ) e ZT kl {Ĥn(y) + M}. Thus, there exist constants C 1 and C 2 such that the following results hold: R 1i ( β n, Ĥn, ) Σ n 1/2 e 1 T Σ n /2 d n i n i C 1 R 1k ( β n, Ĥn, )R 2k (Y ij, β n, Ĥn, ) Σ n 1/2 e n k l=1 n k C 2 e (XT kl β n +Z T kl ) (1+ kl) Z T kl l =1 (Ĥn(Y kl ) + M) 1+ kl I(Y kl Y ij ) Ĥ n (Y kl ) + M n k l=1 e (XT ij β n +Z T ij )+(1+ ij) Z T ij (Ĥn(Y ij ) + m) 1+ ij Σ n 1/2 e 1 T Σ n /2 d 1, (A.4) (Ĥn(Y ij ) + m) 1+ ij 1 T Σ n /2 d After plugging (A.4) and (A.5) into (A.3), we otain n 1 {l n ( β n, Σ n, Ĥn) l n (β, Σ, H n )} = O(1) n 1 n 1 n n i i=1 n k (1 + kl )I(Y kl Y ij )e ZT kl l =1 Ĥ n (Y kl ) + M Σ n 1/2 e T Σ 1 n /2 d 1. (A.5) (Ĥn(Y kl ) + M) 1+ kl log [ n 1 n n k k=1 l =1 Because there exists a constant C 3 such that n n i i=1 { I(Ykl Y ij ) Ĥ n (Y kl ) + M (1 + ij ) log(ĥn(y ij ) + m) } nk l=1 C 2(Ĥn(Y kl ) + m) 1+ ] kl nk l=1 C. 1(Ĥn(Y kl ) + M) 1+ kl we conclude that nk l=1 C 2(Ĥn(Y kl ) + m) 1+ kl nk l=1 C 1(Ĥn(Y kl ) + M) 1+ kl C 3 >, n 1 {l n ( β n, Σ n, Ĥn) l n (β, Σ, H n )} = O(1) n 1 17 n n i i=1 (1 + ij ) log(ĥn(y ij ) + m)

18 n 1 n n i i=1 ij log { n 1 n n k k=1 l=1 } I(Y kl Y ij ). (A.6) Ĥ n (Y kl ) + M It remains to show that, if Ĥn(τ), the right-hand side of (A.6) diverges to. To this end, we choose a partition of [, τ] as follows: with s = τ, choose s 1 < s such that 1 2 E n i (1 + ij )I(Y ij = s ) > E n i ij I(Y ij [s 1, s )). By conditions C.2 and C.4, such an s 1 exists. Define a constant ɛ (, 1) such that ɛ 1 ɛ < E { ni I(Y ij [s 1, s )) } E { ni ij I(Y ij [, τ)) }. If s 1 >, we can choose s 2 max(, s) such that s is the minimum value less than s 1 satisfying that n i (1 ɛ)e (1 + ij )I(Y ij [s 1, s )) E n i ij I(Y ij [s, s 1 )). Clearly, s 2 exists under condition C.4, and s 2 < s 1. This process is continued so that we otain a sequence: τ s > s 1 > s 2 >... such that 1 2 E n i (1 + ij )I(Y ij = s ) E n i ij I(Y ij [s 1, s )), n i (1 ɛ)e (1 + ij )I(Y ij [s p, s p 1 )) E n i ij I(Y ij [s p+1, s p )), p 1. We claim that such a sequence cannot e infinite, i.e., there exists a finite N such that s N+1 = ; otherwise, s p s for some s [, τ). By the definition of s p, n i n i (1 ɛ)e (1 + ij )I(Y ij [s p, s p 1 )) = E ij I(Y ij [s p+1, s p )), p 1. We sum the aove equations over p = 1, 2,..., and y the continuity of true densities, we otain n i (1 ɛ)e (1 + ij )I(Y ij [s, τ)) = E n i ij I(Y ij [s, s 1 )). Thus, n i (1 ɛ)e I(Y ij [s, τ)) ɛe n i ij I(Y ij [s, s 1 )), 18

19 which contradicts the choice of ɛ. Therefore, the sequence is finite: τ = s >... > s N+1 =. Now the right-hand side of (A.6) can e ounded y n n 1 n i I(Y ij = τ)(1 + ij ) log(ĥn(τ) + m) i=1 N n 1 n n i (1 + ij )I(Y ij [s p+1, s p )) log(ĥn(s p+1 ) + m) p= i=1 N n { n 1 n i n } ij I(Y ij [s p+1, s p )) log n 1 n k I(Y kl Y ij, Y kl [s p+1, s p )) + O(1) p= i=1 k=1 l=1 Ĥ n (s p ) + M 1 n n i (1 + ij )I(Y ij = τ) log(ĥn(τ) + m) 2n i=1 1 n n i (1 + ij )I(Y ij = τ) log(ĥn(τ) + m) 2n i=1 n n 1 n i ij I(Y ij [s 1, s )) log(ĥn(τ) + M) i=1 N n n i n 1 (1 + ij )I(Y ij [s p, s p 1 )) log(ĥn(s p ) + m) p=1 i=1 n n 1 n i ij I(Y ij [s p+1, s p )) log(ĥn(s p ) + M) i=1 N n { n 1 n i n } ij I(Y ij [s p+1, s p )) log n 1 n k I(Y kl Y ij, Y kl [s p+1, s p )) + O(1). p= i=1 k=1 l=1 (A.7) The first term on the right-hand side of (A.7) diverges to as Ĥn(τ). The second term is negative as n is large due to the choice of s 1. By the selection of s p, p = 1,..., N, the third term cannot diverge to +. Finally, the fourth term is ounded ecause of the Glivenko- Cantelli theorem. Hence, the right-hand side of (A.7) diverges to. This contradicts the fact that the left-hand side (A.6) is non-negative. In conclusion, we have shown that Ĥn(τ) has an upper ound with proaility 1. Thus, it follows from Helly s selection theorem that there exists a convergent susequence, still denoted y Ĥn( ), which converges point-wise to a monotone function H ( ) in [, τ]. Since β n and Σ n elong to a compact set, y choosing a further susequence, we can assume that β n β and Σ n Σ for some random vectors β and Σ. Step 2. We will show that β = β, Σ = Σ and H (t) = H (t). Define R 3k (t, β, Σ, H) = R 1k(β, H, )R 2k (t, β, H, )e T Σ 1 /2 d R. 1k(β, H, )e T Σ 1 /2 d 19

20 In view of equations (A.1) and (A.2), we see that Ĥn(t) is asolutely continuous with respect to H n (t), and Ĥ n (t) = t nk=1 R 3k (u, β, Σ, H ) nk=1 R 3k (u, β n, Σ n, Ĥn) d H n (u). By taking the limits on oth sides of the aove display, we conclude that H (t) is asolutely continuous with respect to H (t) so that H (t) is differentiale with respect to t. In addition, dĥn(t)/d H n (t) converges to dh (t)/dh (t) uniformly in t. On the other hand, n 1 {l n ( β n, Σ n, Ĥn) l n (β, Σ, H n )} n { = n 1 log R 1i ( β n, Ĥn, ) Σ n 1/2 e i=1 n n 1 i=1 { log } 1 T Σ n /2 d R 1i (β, H } n, ) Σ 1/2 e T Σ 1 /2 d n i n +n 1 ij log (Ĥn{Y ij }/ H n {Y ij } ). i=1 (A.8) By letting n in (A.8), we have E log R 1i(β, H, ) Σ 1/2 e T Σ 1 /2 d n i H (Y ij ) ij R 1i(β, H, ) Σ 1/2 e T Σ 1 /2 d n i H (Y ij ) ij Because the right-hand side is the negative Kullack-Leiler information, we have = n i n i almost surely. In other words, n i H (Y ij ) ij H (Y ij ) ij R 1i (β, H, ) Σ 1/2 e T Σ 1 /2 d R 1i (β, H, ) Σ 1/2 e T Σ 1 /2 d e (XT ijβ +Z T ij ) H (Y ij ) ij { } H (Y ij ) + e (XT ijβ 1+ ij Σ 1/2 e T Σ +Z T ij ) 1 /2 d. = n i e (XT ijβ +Z T ij ) H (Y ij ) ij { } 1+ ij Σ 1/2 e T Σ H (Y ij ) + e (XT ijβ +Z T ij ) 1 /2 d. (A.9) We will show that equation (A.9) entails that β = β, Σ = Σ and H = H. Fix an integer k such that 1 k n i. We let ij = 1, Y ij = in (A.9) for j = 1,..., k; for those 2

21 j such that j > k, we perform the following action on the jth term on oth sides of (A.9): if ij =, we replace Y ij with τ; if ij = 1, we integrate Y ij from to τ. Thus, we otain k = {H ()e XTijβ } ni +Z Tij k j=k+1 Σ 1/2 e T Σ 1 /2 d {H ()e XTijβ } ni +Z Tij H (τ)e XT ijβ +Z T ij H (τ)e XT ijβ +Z T ij ij 1 H (τ)e XT ijβ +Z T ij + 1 H (τ)e XT ijβ +Z T ij ij ij 1 1 ij j=k+1 H (τ)e XT ijβ +Z T ij + 1 H (τ)e XT ijβ +Z T ij + 1 Σ 1/2 e T Σ 1 /2 d. (A.1) Since { ij : j = k + 1,..., n i } are aritrary, we sum the two sides of (A.1) over all possile ij, j = k + 1,..., n i to yield Thus, k = k k exp k = exp {H ()e XTijβ } +Z Tij Σ 1/2 e T Σ 1 /2 d {H ()e XTijβ +Z Tij } Σ 1/2 e T Σ 1 /2 d. X T ijβ + ( k Z ij ) T Σ ( k Z ij ) 2 H () k X T ijβ + ( k Z ij ) T Σ ( k Z ij ) 2 H () k. (A.11) Condition C.4 implies that H () >. Note that the index set {1,..., k} in equation (A.11) can e replaced y any suset of {1,..., n i }. Thus, it is easy to derive from (A.11) that and Z T ijσ Z ij = Z T ijσ Z ij, j j ; j, j = 1,..., n i, X T ijβ + ZT ijσ Z ij 2 + log H () = X T ijβ + ZT ijσ Z ij 2 + log H (), j = 1,..., n i. According to condition C.3, Σ = Σ, β = β and H () = H (). To show that H = H, we let i1 = 1 in (A.1) and integrate Y i1 from to y; we also perform the following action on the jth term on oth sides of (A.1) for j = 2,... n i : if ij =, 21

22 we replace Y ij with τ; if ij = 1, we integrate Y ij from to τ. Then we sum the resulting equalities over all possile { ij : j = 2,..., n i } to yield = H (y)e XT i1β +Z T i1 H (y)e XT i1β +Z T i1 + 1 Σ 1/2 e T Σ 1 /2 d H (y)e XT i1β +Z T i1 H (y)e XT i1β +Z T i1 + 1 Σ 1/2 e T Σ 1 /2 d. Because the two sides of the aove equation are strictly monotone in H (y) and H (y), respectively, we have H (y) = H (y). Comining the results from Step 1 and Step 2, we conclude that, almost surely, β n β, Σ n Σ, and Ĥn(y) H (y). The uniform convergence of Ĥn to H follows from the fact that H is a continuous function. A.2. Proof of Theorem 2 Consider the set H = {(h 1, h 2, h 3 ) : h 1 R d 1, h 2 R d 2(d 2 +1)/2, h 3 ( ) is a function on [, τ]; h 1 1, h 2 1, h 3 V 1}, where h 3 V denotes the total variation of h 3 ( ) in [, τ]. We define a sequence of maps S n mapping a neighorhood of (β, Σ, H ), denoted y U, in the parameter space for (β, Σ, H) into l (H) (i.e., the space consisting of ounded functionals on H) as follows: S n (β, Σ, H)[h 1, h 2, h 3 ] n 1 d dɛ l n(β + ɛh 1, Σ + ɛh 2, H(t) + ɛ A n1 [h 1 ] + A n2 [h 2 ] + A n3 [h 3 ], t h 3 (s)dh(s)) where A np, p = 1, 2, 3, are linear functionals on R d 1, R d 2(d 2 +1)/2, and BV [, τ], respectively, and BV [, τ] is the space of functions with finite total variation in [, τ]. In fact, if we let l β, l Σ, l H [h 3 ] e the score function for β, the score function for Σ, and the score for H along the path H(t) + ɛ t h 3(s)dH(s) for a single cluster, then ɛ= A n1 [h 1 ] = P n [h T 1 l β ], A n2 [h 2 ] = P n [h T 2 l Σ ], A n3 [h 3 ] = P n [l H [h 3 ]], where P n denotes the empirical measure ased on n independent clusters. 22

23 We can explicitly write the functionals A np,p = 1, 2, 3, as follows. Define the operation etween two matrices M 1 and M 2 of the same size as the trace of (M 1 M T 2 ), and for each h 2 R d 2(d 2 +1)/2, let D(h 2 ) e the symmetric matrix such that the extended vector taken from D(h 2 ) is the same as h 2. Then, A n1 [h 1 ] = n 1 n i=1 A n2 [h 2 ] = n 1 n A n3 [h 3 ] = n 1 n R 1i(β, H, ) n i i=1 { n i i=1 X T ijh 1 [(1 + ij )/{1 + H(Y ij )e XT ijβ+z T ij } 1]e T Σ 1 /2 d R, 1i(β, H, )e T Σ 1 /2 d R 1i(β, H, )e T Σ 1 /2 { T Σ 1 D(h 2 )Σ 1 /2 Σ 1 D(h 2 )/2}d R 1i(β, H, )e T Σ 1 /2 d ij h 3 (Y ij ) Yij (1+ ij ) h 3 (y)dh(y) R 1i(β, H, )/(H(Y ij ) + e (XT ijβ+z T ij ) )e T Σ 1 } /2 d R. 1i(β, H, )e T Σ 1 /2 d Correspondingly, we define the limit map S : (β, Σ, H) l (H) as S(β, Σ, H)[h 1, h 2, h 3 ] = A 1 [h 1 ] + A 2 [h 2 ] + A 3 [h 3 ], where the linear functionals A p, p = 1, 2, 3, are otained y replacing the the empirical sum in the A np y the expectation. Clearly, S n ( β n, Σ n, Ĥn) =, and S(β, Σ, H ) =. The desired asymptotic normality will follow if we can verify the four conditions stated in Theorem 2 of Murphy (1995). The first condition that n(s n (β, Σ, H ) S(β, Σ, H )) weakly converges to a tight Gaussian process on l (H) holds since H is a Donsker class and the functionals A np are ounded Lipschitz functionals with respect to H. By the smoothness of S(β, Σ, H), the Fréchet differentiaility holds and the derivative of S(β, Σ, H) at (β, Σ, H ), denoted y Ṡ(β, Σ, H ), is a map from the space {(β β, Σ Σ, H H ) : (β, Σ, H) is in the neighorhood U of (β, Σ, H )} to l (H). The approximation condition that sup (h 1,h 2,h 3 ) H (S n S)( β n, Σ n, Ĥn)[h 1, h 2, h 3 ] (S n S)(β, Σ, H )[h 1, h 2, h 3 ] = o p (n 1/2 { β n β + Σ n Σ + sup Ĥn(y) H (y) }) y [,τ] can e verified along the lines of Murphy (1995, appendix)., 23

24 It remains to show that the linear map Ṡ(β, Σ, H ), denoted y Λ, is continuously invertile on its range. Note that Λ maps (β β, Σ Σ, H H ) to a ounded functional on H. Algeraic manipulations yield Λ(β β, Σ Σ, H H )[h 1, h 2, h 3 ] = (β β ) T Q 1 (h 1, h 2, h 3 ) + (Σ Σ ) T Q 2 (h 1, h 2, h 3 ) + τ Q 3 (h 1, h 2, h 3 )d(h H ), where ( ) h1 Q 1 (h 1, h 2, h 3 ) = B 1 + h 2 ) Q 2 (h 1, h 2, h 3 ) = B 2 ( h1 h 2 ) Q 3 (h 1, h 2, h 3 ) = B 3 ( h1 h 2 + τ τ h 3 (y)d 1 (y)dy, h 3 (y)d 2 (y)dy, + 4 h 3 (y) + τ h 3 (t)d 3 (t, y)dt, B 1, B 2, B 3 are constant matrices, D 1 (y), D 2 (y), D 3 (t, y) are continuously differentiale functions depending on the true densities, and 4 >. Therefore, the operator Q(h 1, h 2, h 3 ) (Q 1 (h 1, h 2, h 3 ), Q 2 (h 1, h 2, h 3 ), Q 3 (h 1, h 2, h 3 )) T can e considered as a sum of a continuously invertile linear operator and a compact operator from the linear span of H to itself. The invertiility of Λ Ṡ(β, Σ, H ) is equivalent to the invertiility of the linear operator Q(h 1, h 2, h 3 ). It suffices to prove that Q(h 1, h 2, h 3 ) is a one-to-one map (Rudin 1973, pp ). If Q(h 1, h 2, h 3 ) =, then Λ(β β, Σ Σ, H H )[h 1, h 2, h 3 ] = for any (β, Σ, H) in the neighorhood U. In particular, we choose y β = β + ɛh 1, Σ = Σ + ɛh 2, H(y) = H (y) + ɛ h 3 (t)dh (t) for a small constant ɛ. By the definition of Λ, = Λ(β β, Σ Σ, H H )[h 1, h 2, h 3 ] = ɛe{(l β [h 1 ] + l Σ [h 2 ] + l H [h 3 ]) 2 }. Thus, l β [h 1 ] + l Σ [h 2 ] + l H [h 3 ] = almost surely. After writing out the expression of this equation, we otain n i R 4i (β, H, ) X T ijh (1 + ij ) d (H (Y ij )e (XT ijβ +Z T ij ) N(, Σ ) + 1) n i + { Σ 1 D(h 2 ) + 1 } 2 T Σ 1 D(h 2 )Σ 1 R 4i (β, H, )d N(, Σ ) R 4i (β, H, ) ijh 3 (Y ij ) (1 + ij) Yij h 3 (y)dh (y) (H (Y ij ) + e (XT ijβ +Z T ij ) d N(, Σ ) =, ) 24 (A.12)

25 where R 4i (β, H, ) = R 1i (β, H, ) n i {H (Y ij )} ij. We will show that (A.12) entails h 1 =, h 2 = and h 3 = y adopting the ideas used in the proof of the identifiaility for Theorem 1. First, we let X ij and Z ij e fixed. Then for a fixed k such that 1 k n i, we define measures µ 1,... µ ni for any Borel set A [, τ], µ m ({} A) =, µ m ({1} A) = I( A), m k, on the set {, 1} [, τ] as follows: µ m ({} A) = I(τ A), µ m ({1} A) = I A dx, m > k. We integrate oth sides of (A.12) over {( i,1, Y 1 ),..., ( i,ni, Y i,ni )} with respect to the product measure dµ 1 dµ ni. In other words, we let im = 1 and Y im = for m k; we sum all the equalities of (A.12) for all possile cominations of { i,k+1,..., i,ni } {, 1} n i k, in which we choose Y im = τ if im = and integrate Y im from to τ if im = 1. The resulting integration is. We study the integral of each term on the left-hand side of (A.12) with respect to the measure n i m=1 µ m. For the first term on the left-hand side of (A.12), from the expression of R 4i (β, H, ), we have that for any, if j k, = X T ijh 1 R 4i (β, H, )X T m k = X T ijh 1 ijh 1 {H ()e XTimβ +Z Tim } δ im {,1},m>k m>k m k if j > k, it holds that = X T ijh 1 {H ()e XTimβ +Z Tim } ; R 4i (β, H, )X T m k 1 + (1 + ij ) d( (H (Y ij )e (XT ijβ +Z T ij ) + 1) n i m=1 µ m ) δ im (H (τ)e XT imβ +Z T im + 1) 1 δ im H (τ)e XT imβ +Z T im + 1 ijh 1 {H ()e XTimβ +Z Tim } δ im {,1},m>k,m j m>k,m j 1 + (1 + ij ) d( (H (Y ij )e (XT ijβ +Z T ij ) + 1) n i m=1 µ m ) δ im (H (τ)e XT imβ +Z T im + 1) 1 δ im H (τ)e XT imβ +Z T im

26 (1 δ 1 1 ij) 1 + δ ij {,1} (H (τ)e XT ijβ +Z T ij + 1) H (τ)e XT ijβ +Z T ij + 1 τ H +δ (t)e (XT ijβ +Z T ij ) 2 ij 1 + dt (H (t) + e (XT ijβ +Z T ij ) ) 2 H (t)e XT ijβ +Z T ij + 1 = X T ijh 1 {H ()e XTimβ } +Z Tim =. m k Therefore, δ ij {,1} n i = X T ijh 1 j k Likewise, = (1 δ 1 1 ij) 1 + (H (τ)e XT ijβ +Z T ij + 1) H (τ)e XT ijβ +Z T ij + 1 τ H +δ (t)e (XT ijβ +Z T ij ) 2 ij 1 + dt (H (t) + e (XT ijβ +Z T ij ) ) 2 H (t)e XT ijβ +Z T ij + 1 R 4i (β, H, ) X T ijh (1 + ij ) d (H (Y ij )e (XT ijβ +Z T ij ) N(, Σ )d( + 1) {H ()e XTimβ } +Z Tim d N(, Σ ). m k { 1 2 Σ 1 D(h 2 ) + 1 } 2 T Σ 1 D(h 2 )Σ 1 R 4i (β, H, )d N(, Σ )d( { 1 2 Σ 1 D(h 2 ) + 1 } 2 T Σ 1 D(h 2 )Σ 1 {H ()e XTimβ } +Z Tim d N(, Σ ). m k Furthermore, if j k, if j > k, = m k R 4i (β, H, ) ijh 3 (Y ij ) (1 + ij) Yij h 3 (y)dh (y) ni H (Y ij ) + e (XT ijβ +Z T ij ) d( µ m ) m=1 = h 3 () {H ()e XTimβ } +Z Tim ; m k R 4i (β, H, ) ijh 3 (Y ij ) (1 + ij) Yij h 3 (y)dh (y) ni H (Y ij ) + e (XT ijβ +Z T ij ) d( µ m ) m=1 {H ()e XTimβ } +Z Tim δ ij {,1} n i m=1 n i m=1 µ m ) µ m ) (A.13) (A.14) (1 δ 1 τ ij) h 3(y)dH (y) H (τ)e XT ijβ +Z T ij + 1 H (τ) + e (XT ijβ +Z T ij ) 26

27 Thus, τ H +δ (t)e (XT ijβ +Z T ij ) 2 t ij h 3 (t) h 3(y)dH (y) dt =. (H (t) + e (XT ijβ +Z T ij ) ) 2 H (t) + e (XT ijβ +Z T ij ) n i = h 3 () j k R 4i (β, H, ) ijh 3 (Y ij ) (1 + ij) Yij h 3 (y)dh (y) (H (Y ij ) + e (XT ijβ +Z T ij ) d N(, Σ )d( ) {H ()e XTimβ } +Z Tim d N(, Σ ). m k Comining (A.13), (A.14) and (A.15) and integrating over, we otain k X T ijh ( k ) T Z ij D(h2 ) ( k ) Z ij + kh3 () =. 2 Since the order for the suscripts j = 1,..., k is aritrary, it holds that k 2 j=k 1 +1 X T ijh ( k 2 j=k 1 +1 ) T Z ij D(h2 ) ( k 2 j=k 1 +1 Z ij ) + (k2 k 1 )h 3 () = n i m=1 µ m ) (A.15) for any 1 k 1 < k 2 n i. Thus, Z T ijd(h 2 )Z ij = for j j and X T ijh 1 +Z T ijd(h 2 )Z ij /2+h 3 () =. By condition C.3, D(h 2 ) =. As a result, h 2 = and h 1 =. In (A.13), we set Y ij =, j = 2,..., n i, and ij = 1, j = 1,..., n i, so as to otain h 3 (Y i1 ) = Yi1 2 h 3 (y)dh (y) i1β +Z T )+ n i e (XT i1 j=2 (XT ijβ +Z T ij ) e T Σ 1 /2 /(H (Y i1 ) + e (XT i1β +Z T i1 ) ) 3 d e (XT i1β +Z T i1 )+ n i j=2 (XT ijβ +Z T ij ) e T Σ 1 /2 /(H (Y i1 ) + e (XT i1β +Z T i1 ) ) 2 d That is, g(y) y h 3(t)dH (t) satisfies the homogeneous equation g (y) H (y) g(y) e (XT i1β +Z T i1 )+ n i j=2 (XT ijβ +Z T ij ) e T Σ 1 /2 /(H (Y i1 ) + e (XT i1β +Z T i1 ) ) 3 d e (XT i1β +Z T i1 )+ n i j=2 (XT ijβ +Z T ij ) e T Σ 1 /2 /(H (Y i1 ) + e (XT i1β +Z T i1 ) ) 2 d with oundary condition g() =. Thus, it is clear that g(y) =, i.e., h 3 (y) =. Hence, we have verified that Q is one-to-one map and have thus shown the invertiility of Ṡ(β, Σ, H ). The asymptotic distriution stated in Theorem 2 now follows from Theorem 2 of Murphy (1995). Furthermore, n Ṡ(β, Σ, H )( β n β, Σ n Σ, Ĥn H )[h 1, h 2, h 3 ] = n( β n β ) T Q 1 (h) + n( Σ n Σ ) T Q 2 (h) + τ n Q 3 (h)d(ĥn H ) = n(p n P)[h T 1 l β + h T 2 l Σ + l H [h 3 ]] + o p (1) (A.16) 27 =.

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased

More information

Semiparametric Regression Analysis of Multiple Right- and Interval-Censored Events

Semiparametric Regression Analysis of Multiple Right- and Interval-Censored Events Journal of the American Statistical Association ISSN: 162-1459 Print 1537-274X Online Journal homepage: http://www.tandfonline.com/loi/uasa2 Semiparametric Regression Analysis of Multiple Right- and Interval-Censored

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Spiking problem in monotone regression : penalized residual sum of squares

Spiking problem in monotone regression : penalized residual sum of squares Spiking prolem in monotone regression : penalized residual sum of squares Jayanta Kumar Pal 12 SAMSI, NC 27606, U.S.A. Astract We consider the estimation of a monotone regression at its end-point, where

More information

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

Reduced-rank hazard regression

Reduced-rank hazard regression Chapter 2 Reduced-rank hazard regression Abstract The Cox proportional hazards model is the most common method to analyze survival data. However, the proportional hazards assumption might not hold. The

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data Jaeun Choi A dissertation sumitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Statistical Inference and Methods

Statistical Inference and Methods Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Vector Spaces. EXAMPLE: Let R n be the set of all n 1 matrices. x 1 x 2. x n

Vector Spaces. EXAMPLE: Let R n be the set of all n 1 matrices. x 1 x 2. x n Vector Spaces DEFINITION: A vector space is a nonempty set V of ojects, called vectors, on which are defined two operations, called addition and multiplication y scalars (real numers), suject to the following

More information

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Statistica Sinica 23 (2013), 383-408 doi:http://dx.doi.org/10.5705/ss.2011.151 A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Chi-Chung Wen and Yi-Hau Chen

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Nonparametric Estimation of Smooth Conditional Distributions

Nonparametric Estimation of Smooth Conditional Distributions Nonparametric Estimation of Smooth Conditional Distriutions Bruce E. Hansen University of Wisconsin www.ssc.wisc.edu/~hansen May 24 Astract This paper considers nonparametric estimation of smooth conditional

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO

HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO The Annals of Statistics 2006, Vol. 34, No. 3, 1436 1462 DOI: 10.1214/009053606000000281 Institute of Mathematical Statistics, 2006 HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO BY NICOLAI

More information

ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL

ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL Statistica Sinica 19 (2009), 997-1011 ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL Anthony Gamst, Michael Donohue and Ronghui Xu University

More information

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model Luis Manuel Santana Gallego 100 Appendix 3 Clock Skew Model Xiaohong Jiang and Susumu Horiguchi [JIA-01] 1. Introduction The evolution of VLSI chips toward larger die sizes and faster clock speeds makes

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

Smoothing the Nelson-Aalen Estimtor Biostat 277 presentation Chi-hong Tseng

Smoothing the Nelson-Aalen Estimtor Biostat 277 presentation Chi-hong Tseng Smoothing the Nelson-Aalen Estimtor Biostat 277 presentation Chi-hong seng Reference: 1. Andersen, Borgan, Gill, and Keiding (1993). Statistical Model Based on Counting Processes, Springer-Verlag, p.229-255

More information

Simple Examples. Let s look at a few simple examples of OI analysis.

Simple Examples. Let s look at a few simple examples of OI analysis. Simple Examples Let s look at a few simple examples of OI analysis. Example 1: Consider a scalar prolem. We have one oservation y which is located at the analysis point. We also have a ackground estimate

More information

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS APPLICATIONES MATHEMATICAE 9,3 (), pp. 85 95 Erhard Cramer (Oldenurg) Udo Kamps (Oldenurg) Tomasz Rychlik (Toruń) EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS Astract. We

More information

MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA

MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA Statistica Sinica 25 (215), 1231-1248 doi:http://dx.doi.org/1.575/ss.211.194 MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA Yuan Yao Hong Kong Baptist University Abstract:

More information

arxiv: v4 [math.st] 29 Aug 2017

arxiv: v4 [math.st] 29 Aug 2017 A Critical Value Function Approach, with an Application to Persistent Time-Series Marcelo J. Moreira, and Rafael Mourão arxiv:66.3496v4 [math.st] 29 Aug 27 Escola de Pós-Graduação em Economia e Finanças

More information

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam Symmetry and Quantum Information Feruary 6, 018 Representation theory of S(), density operators, purification Lecture 7 Michael Walter, niversity of Amsterdam Last week, we learned the asic concepts of

More information

I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S (I S B A)

I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S (I S B A) I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S (I S B A) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2011/2 ESTIMATION

More information

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0, Accelerated failure time model: log T = β T Z + ɛ β estimation: solve where S n ( β) = n i=1 { Zi Z(u; β) } dn i (ue βzi ) = 0, Z(u; β) = j Z j Y j (ue βz j) j Y j (ue βz j) How do we show the asymptotics

More information

and Comparison with NPMLE

and Comparison with NPMLE NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA and Comparison with NPMLE Mai Zhou Department of Statistics, University of Kentucky, Lexington, KY 40506 USA http://ms.uky.edu/

More information

Empirical Processes & Survival Analysis. The Functional Delta Method

Empirical Processes & Survival Analysis. The Functional Delta Method STAT/BMI 741 University of Wisconsin-Madison Empirical Processes & Survival Analysis Lecture 3 The Functional Delta Method Lu Mao lmao@biostat.wisc.edu 3-1 Objectives By the end of this lecture, you will

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Mai Zhou 1 University of Kentucky, Lexington, KY 40506 USA Summary. Empirical likelihood ratio method (Thomas and Grunkmier

More information

Maximum likelihood estimation in semiparametric regression models with censored data

Maximum likelihood estimation in semiparametric regression models with censored data Proofs subject to correction. Not to be reproduced without permission. Contributions to the discussion must not exceed 4 words. Contributions longer than 4 words will be cut by the editor. J. R. Statist.

More information

1 Systems of Differential Equations

1 Systems of Differential Equations March, 20 7- Systems of Differential Equations Let U e an open suset of R n, I e an open interval in R and : I R n R n e a function from I R n to R n The equation ẋ = ft, x is called a first order ordinary

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Chaos and Dynamical Systems

Chaos and Dynamical Systems Chaos and Dynamical Systems y Megan Richards Astract: In this paper, we will discuss the notion of chaos. We will start y introducing certain mathematical concepts needed in the understanding of chaos,

More information

arxiv: v1 [cs.fl] 24 Nov 2017

arxiv: v1 [cs.fl] 24 Nov 2017 (Biased) Majority Rule Cellular Automata Bernd Gärtner and Ahad N. Zehmakan Department of Computer Science, ETH Zurich arxiv:1711.1090v1 [cs.fl] 4 Nov 017 Astract Consider a graph G = (V, E) and a random

More information

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following

More information

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models doi: 10.1111/j.1467-9469.2005.00487.x Published by Blacwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA Vol 33: 1 23, 2006 Ran Regression Analysis

More information

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2 Least squares: Mathematical theory Below we provide the "vector space" formulation, and solution, of the least squares prolem. While not strictly necessary until we ring in the machinery of matrix algera,

More information

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Statistica Sinica 20 (2010), 441-453 GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Antai Wang Georgetown University Medical Center Abstract: In this paper, we propose two tests for parametric models

More information

Online Supplementary Appendix B

Online Supplementary Appendix B Online Supplementary Appendix B Uniqueness of the Solution of Lemma and the Properties of λ ( K) We prove the uniqueness y the following steps: () (A8) uniquely determines q as a function of λ () (A) uniquely

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Upper Bounds for Stern s Diatomic Sequence and Related Sequences

Upper Bounds for Stern s Diatomic Sequence and Related Sequences Upper Bounds for Stern s Diatomic Sequence and Related Sequences Colin Defant Department of Mathematics University of Florida, U.S.A. cdefant@ufl.edu Sumitted: Jun 18, 01; Accepted: Oct, 016; Pulished:

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Frailty Probit model for multivariate and clustered interval-censor

Frailty Probit model for multivariate and clustered interval-censor Frailty Probit model for multivariate and clustered interval-censored failure time data University of South Carolina Department of Statistics June 4, 2013 Outline Introduction Proposed models Simulation

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof)

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof) JOURNAL OF MODERN DYNAMICS VOLUME 4, NO. 4, 010, 637 691 doi: 10.3934/jmd.010.4.637 STRUCTURE OF ATTRACTORS FOR (a, )-CONTINUED FRACTION TRANSFORMATIONS SVETLANA KATOK AND ILIE UGARCOVICI (Communicated

More information

Empirical Likelihood in Survival Analysis

Empirical Likelihood in Survival Analysis Empirical Likelihood in Survival Analysis Gang Li 1, Runze Li 2, and Mai Zhou 3 1 Department of Biostatistics, University of California, Los Angeles, CA 90095 vli@ucla.edu 2 Department of Statistics, The

More information

IN this paper we study a discrete optimization problem. Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach

IN this paper we study a discrete optimization problem. Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach Ying Xiao, Student Memer, IEEE, Krishnaiyan Thulasiraman, Fellow, IEEE, and Guoliang Xue, Senior Memer, IEEE Astract

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky EMPIRICAL ENVELOPE MLE AND LR TESTS Mai Zhou University of Kentucky Summary We study in this paper some nonparametric inference problems where the nonparametric maximum likelihood estimator (NPMLE) are

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Published online: 10 Apr 2012.

Published online: 10 Apr 2012. This article was downloaded by: Columbia University] On: 23 March 215, At: 12:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

Semiparametric Regression Analysis of Survival and Longitudinal Data

Semiparametric Regression Analysis of Survival and Longitudinal Data Semiparametric Regression Analysis of Survival and Longitudinal Data A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George Mason University

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

A Reduced Rank Kalman Filter Ross Bannister, October/November 2009

A Reduced Rank Kalman Filter Ross Bannister, October/November 2009 , Octoer/Novemer 2009 hese notes give the detailed workings of Fisher's reduced rank Kalman filter (RRKF), which was developed for use in a variational environment (Fisher, 1998). hese notes are my interpretation

More information

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data Biometrika (28), 95, 4,pp. 947 96 C 28 Biometrika Trust Printed in Great Britain doi: 1.193/biomet/asn49 Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival

More information

arxiv: v1 [cs.gt] 4 May 2015

arxiv: v1 [cs.gt] 4 May 2015 Econometrics for Learning Agents DENIS NEKIPELOV, University of Virginia, denis@virginia.edu VASILIS SYRGKANIS, Microsoft Research, vasy@microsoft.com EVA TARDOS, Cornell University, eva.tardos@cornell.edu

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2008 Paper 85 Semiparametric Maximum Likelihood Estimation in Normal Transformation Models for Bivariate Survival Data Yi Li

More information

10 Lorentz Group and Special Relativity

10 Lorentz Group and Special Relativity Physics 129 Lecture 16 Caltech, 02/27/18 Reference: Jones, Groups, Representations, and Physics, Chapter 10. 10 Lorentz Group and Special Relativity Special relativity says, physics laws should look the

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

A732: Exercise #7 Maximum Likelihood

A732: Exercise #7 Maximum Likelihood A732: Exercise #7 Maximum Likelihood Due: 29 Novemer 2007 Analytic computation of some one-dimensional maximum likelihood estimators (a) Including the normalization, the exponential distriution function

More information

On Estimation of Partially Linear Transformation. Models

On Estimation of Partially Linear Transformation. Models On Estimation of Partially Linear Transformation Models Wenbin Lu and Hao Helen Zhang Authors Footnote: Wenbin Lu is Associate Professor (E-mail: wlu4@stat.ncsu.edu) and Hao Helen Zhang is Associate Professor

More information

Testing near or at the Boundary of the Parameter Space (Job Market Paper)

Testing near or at the Boundary of the Parameter Space (Job Market Paper) Testing near or at the Boundary of the Parameter Space (Jo Market Paper) Philipp Ketz Brown University Novemer 7, 24 Statistical inference aout a scalar parameter is often performed using the two-sided

More information

Analysis of transformation models with censored data

Analysis of transformation models with censored data Biometrika (1995), 82,4, pp. 835-45 Printed in Great Britain Analysis of transformation models with censored data BY S. C. CHENG Department of Biomathematics, M. D. Anderson Cancer Center, University of

More information

Section 8.5. z(t) = be ix(t). (8.5.1) Figure A pendulum. ż = ibẋe ix (8.5.2) (8.5.3) = ( bẋ 2 cos(x) bẍ sin(x)) + i( bẋ 2 sin(x) + bẍ cos(x)).

Section 8.5. z(t) = be ix(t). (8.5.1) Figure A pendulum. ż = ibẋe ix (8.5.2) (8.5.3) = ( bẋ 2 cos(x) bẍ sin(x)) + i( bẋ 2 sin(x) + bẍ cos(x)). Difference Equations to Differential Equations Section 8.5 Applications: Pendulums Mass-Spring Systems In this section we will investigate two applications of our work in Section 8.4. First, we will consider

More information

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement Open Journal of Statistics, 07, 7, 834-848 http://www.scirp.org/journal/ojs ISS Online: 6-798 ISS Print: 6-78X Estimating a Finite Population ean under Random on-response in Two Stage Cluster Sampling

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

IN this paper, we consider the estimation of the frequency

IN this paper, we consider the estimation of the frequency Iterative Frequency Estimation y Interpolation on Fourier Coefficients Elias Aoutanios, MIEEE, Bernard Mulgrew, MIEEE Astract The estimation of the frequency of a complex exponential is a prolem that is

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Overview of today s class Kaplan-Meier Curve

More information

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

1 Glivenko-Cantelli type theorems

1 Glivenko-Cantelli type theorems STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

1 Hoeffding s Inequality

1 Hoeffding s Inequality Proailistic Method: Hoeffding s Inequality and Differential Privacy Lecturer: Huert Chan Date: 27 May 22 Hoeffding s Inequality. Approximate Counting y Random Sampling Suppose there is a ag containing

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

A converse Gaussian Poincare-type inequality for convex functions

A converse Gaussian Poincare-type inequality for convex functions Statistics & Proaility Letters 44 999 28 290 www.elsevier.nl/locate/stapro A converse Gaussian Poincare-type inequality for convex functions S.G. Bokov a;, C. Houdre ; ;2 a Department of Mathematics, Syktyvkar

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

CHAPTER 5. Linear Operators, Span, Linear Independence, Basis Sets, and Dimension

CHAPTER 5. Linear Operators, Span, Linear Independence, Basis Sets, and Dimension A SERIES OF CLASS NOTES TO INTRODUCE LINEAR AND NONLINEAR PROBLEMS TO ENGINEERS, SCIENTISTS, AND APPLIED MATHEMATICIANS LINEAR CLASS NOTES: A COLLECTION OF HANDOUTS FOR REVIEW AND PREVIEW OF LINEAR THEORY

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS330 / MAS83 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-0 8 Parametric models 8. Introduction In the last few sections (the KM

More information

Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC)

Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC) Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC) Authors: Alexei C. Ionan, Mei-Yin C. Polley, Lisa M. McShane, Kevin K. Doin Section Page

More information

A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models

A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models Jian Huang Department of Statistics University of Iowa Ying Zhang and Lei Hua Department of Biostatistics University

More information

Fixed-b Inference for Testing Structural Change in a Time Series Regression

Fixed-b Inference for Testing Structural Change in a Time Series Regression Fixed- Inference for esting Structural Change in a ime Series Regression Cheol-Keun Cho Michigan State University imothy J. Vogelsang Michigan State University August 29, 24 Astract his paper addresses

More information

Chapter 2 Canonical Correlation Analysis

Chapter 2 Canonical Correlation Analysis Chapter 2 Canonical Correlation Analysis Canonical correlation analysis CCA, which is a multivariate analysis method, tries to quantify the amount of linear relationships etween two sets of random variales,

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Log-mean linear regression models for binary responses with an application to multimorbidity

Log-mean linear regression models for binary responses with an application to multimorbidity Log-mean linear regression models for inary responses with an application to multimoridity arxiv:1410.0580v3 [stat.me] 16 May 2016 Monia Lupparelli and Alerto Roverato March 30, 2016 Astract In regression

More information

arxiv: v1 [cs.lg] 15 Jun 2016

arxiv: v1 [cs.lg] 15 Jun 2016 A Class of Parallel Douly Stochastic Algorithms for Large-Scale Learning arxiv:1606.04991v1 [cs.lg] 15 Jun 2016 Aryan Mokhtari Alec Koppel Alejandro Rieiro Department of Electrical and Systems Engineering

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information