Maximum likelihood estimation for Cox's regression model under nested case-control sampling


Biostatistics (2004), 5, 2. Printed in Great Britain.

THOMAS H. SCHEIKE
Department of Biostatistics, University of Copenhagen, Blegdamsvej 3, DK-2200 KBH N, Denmark
ts@kubism.ku.dk

ANDERS JUUL
Department of Growth and Reproduction, University Hospital of Copenhagen, Blegdamsvej 9, Denmark
and Centre of Preventive Medicine, Glostrup County Hospital, Denmark

SUMMARY

Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM-algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM-aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease, where controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information beyond the relative risk estimates of the covariates.

Keywords: Cox model; Efficiency; Nested case-control; Proportional hazards model; Survival data.

1. INTRODUCTION

Large cohort studies are designed to learn about covariate effects for relatively rare diseases. Often the covariates of interest are expensive to obtain, and the study will therefore be very expensive to carry out. Thomas (1977) suggested an alternative design, called the nested case-control (NCC) design, where each case is compared to a random sample from the risk set.
This design will typically reduce the amount of data dramatically. The standard analysis for nested case-control sampling (see Thomas, 1977; Oakes, 1981; Goldstein and Langholz, 1992; Borgan et al., 1995) can be implemented with standard software. Samuelsen (1997) suggested several procedures for more efficient analysis.

To find out whether a low serum level of insulin-like growth factor I (IGF-I) was associated with increased risk of ischemic heart disease (IHD), we carried out matched nested case-control sampling within a large cohort study (Juul et al., 2002). We had two major reasons for conducting the study as an NCC. First, IGF-I was measured by a relatively costly analysis of blood samples that were retrieved from a freezer where they had been stored since the initiation of the study in …. Secondly, since the blood samples are used up by the IGF-I analysis, we also wanted to limit the use of valuable biological material that might be useful for other scientific studies.

To whom correspondence should be addressed.
Biostatistics Vol. 5 No. 2 © Oxford University Press 2004; all rights reserved.

The proportional hazards model with time-constant covariates is often used to analyze survival data. We here suggest a maximum likelihood estimator (MLE) that appears to have better finite-sample properties than the standard partial likelihood procedure. For the parameter values found in the case study, the standard analysis and the MLE had the same efficiency, but when covariate effects were large the MLE was more efficient than the standard analysis. Maximization of the likelihood function is easily carried out by the EM-algorithm. This is particularly simple in the present context since standard software can be applied. Standard errors are obtained by a numerical profile likelihood approach based on EM-aided differentiation as in Chen and Little (1999); see also Murphy et al. (1997).

The EM-algorithm (Dempster et al., 1977) has been used in a number of non-parametric maximum likelihood settings; see, e.g., Turnbull (1976), Nielsen et al. (1992), Murphy et al. (1997), Wellner and Zhan (1997), Martinussen (1999) and Chen and Little (1999). The papers by Martinussen (1999) and Chen and Little (1999) on missing covariates in a proportional hazards setting are closely related to the present approach, but our approach differs from theirs by a new non-parametric way of handling the distribution of the covariates: Martinussen (1999) and Chen and Little (1999) rely on parametric assumptions about the distribution of the partly observed covariates.
We here suggest a simple and elegant approach that handles the distribution of the covariates completely non-parametrically by combining the approach of Wellner and Zhan (1997) with the EM-algorithm for survival data. Chen (2002) extends Chen and Little (1999) and makes a similar suggestion for the missing data problem. The missing data problem deals with covariates that are missing at random (MAR), and nested case-control sampling can be viewed as producing covariates that are missing at random, since the missingness of the covariates depends solely on the survival times and censoring indicators in the data. Even though the probability that a subject has missing covariates does not depend on the unobserved covariates, nested case-control sampling differs from the set-up described in the missing data papers because the missingness is not independent across subjects. Despite these differences, nested case-control sampling leads to a likelihood and EM-algorithm that are equivalent to those of Chen (2002).

We also present some new models in which the stratified baseline hazard is modelled by a proportional hazards model, and show how the algorithm can be modified to estimate cumulative baseline hazards stratified according to the covariate that is only partly observed. Both of these models are new for nested case-control sampling, but they are easy to estimate in a maximum likelihood setting.

The paper is organized as follows. Section 2 presents nested case-control sampling and introduces notation. Section 3 introduces the new non-parametric MLE and outlines the estimation procedure. In Section 4 we show some extensions and discuss various simplifying assumptions. Section 5 gives a simulation study aimed at validating the results from the application in Section 6. Section 7 contains some closing remarks.

2. NESTED CASE-CONTROL SAMPLING

Let U and C denote the survival and censoring times of an individual.
Let $T = \min(U, C)$ denote the observed time at risk, with $\delta = I(U \le C)$ the indicator of failure, and let $(X, Z)$ denote a $(q + p)$-dimensional covariate vector. We assume that $U$ and $C$ are independent given $(Z, X)$. An additional important assumption is that the censoring is non-informative on $Z$, i.e. that the censoring distribution may depend only on $X$. To obtain an MLE we also assume that the censoring distribution is non-informative on the parameters of interest. It is crucial for the further analysis that the covariate vector does not vary with time.

Define the associated counting process $N(t) = \delta\, I(T \le t)$ and the at-risk indicator $Y(t) = I(T \ge t)$. Consider a cohort of $M$ independent and identically distributed subjects, denoted $(T_i, \delta_i, Z_i, X_i)$, $i = 1, \dots, M$, or equivalently $(N_i(t), Y_i(t), X_i, Z_i)$, $i = 1, \dots, M$. We do not observe the full cohort, only data from a nested case-control sampling design. We thus observe $X_i$ at the initiation of the study for all subjects (covariates such as gender and age), but we observe $Z_i$ only for those subjects that become cases or are selected as controls. The observations are $(N_i(t), Y_i(t), X_i)$, $i = 1, \dots, M$, as well as $Z_i$ for those subjects that become cases or controls.

Let $R = \{1, \dots, M\}$ be the index set of all subjects in the cohort, and let $R_1$ and $R_0$ denote, respectively, the index sets of all cases and all non-cases in $R$, defined as $R_j = \{i \in R : \delta_i = j\}$ for $j = 0, 1$. When the index sets are functions of time they are restricted to subjects at risk at that time, so that, for example, $R(t) = \{i \in R : Y_i(t) = 1\}$ is the set of subjects at risk just before time $t$ and $R_1(t) = \{i \in R(t) : i \in R_1\}$. Let $\tilde R_0$ denote the index set of the selected controls that are not in $R_1$.

We assume that the intensity of $N(t)$ is given by a stratified version of Cox's regression model,
$$\lambda(t) = Y(t)\,\lambda(X, t)\exp(Z^{T}\beta_0), \tag{2.1}$$
where $\beta_0$ is a $p$-dimensional unknown regression parameter and $X$ takes $S$ distinct values, resulting in $S$ strata. We denote these values by $x_j$, $j = 1, \dots, S$, and let $\lambda_j(t) = \lambda(x_j, t)$. The stratified baseline hazard $\lambda(x, t)$ is non-parametric, which gives the model a great deal of flexibility. We need additional notation to describe how the controls are selected in the NCC.
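To make model (2.1) concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) that simulates a full cohort from the stratified Cox model. It assumes a discrete stratum variable X, constant stratum-specific baseline hazards and administrative censoring at a fixed time tau; the names `simulate_cohort` and `at_risk` are hypothetical. The at-risk indicator is taken as I(T >= t), so that a subject failing at t is counted as at risk just before t.

```python
import numpy as np

def simulate_cohort(M, beta, base_hazards, tau, rng):
    """Hypothetical simulator for model (2.1): intensity
    Y(t) * lambda_X(t) * exp(Z' beta), with a constant baseline hazard in each
    stratum (an illustrative assumption) and censoring at the fixed time tau."""
    beta = np.asarray(beta, dtype=float)
    base_hazards = np.asarray(base_hazards, dtype=float)
    S = base_hazards.size
    X = rng.integers(0, S, size=M)               # stratum label, always observed
    Z = rng.standard_normal((M, beta.size))      # covariate of interest
    rate = base_hazards[X] * np.exp(Z @ beta)    # subject-specific hazard
    U = rng.exponential(1.0 / rate)              # survival times U
    C = np.full(M, float(tau))                   # censoring times C
    T = np.minimum(U, C)                         # observed time T = min(U, C)
    delta = (U <= C).astype(int)                 # failure indicator I(U <= C)
    return X, Z, T, delta

def at_risk(T, t):
    """At-risk indicators Y_i(t) = I(T_i >= t) for all subjects."""
    return (np.asarray(T) >= t).astype(int)
```

For example, `simulate_cohort(4000, [-0.5, 0.5], [0.004], 15.0, np.random.default_rng(1))` produces a single-stratum cohort of the size used later in the simulations.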
Given that subject $i$ becomes a case at time $t$, we select a set of controls, $R_{ncc}(t)$, from the risk set at time $t$. The controls at time $t$ are selected as a random sample of the subjects at risk, $R(t)$, and we allow the selection to depend only on $X$. When likelihood-based inference is the aim it is crucial that the probability of sampling a given risk set depends solely on the observed data. This is the case for nested case-control sampling. We restrict attention to simple random sampling of those at risk, possibly matched on $X$. This implies that a subject may be sampled as a control several times. The sampling of controls must not depend on knowledge of who becomes a case; therefore subjects who become cases after time $t$ should be available for sampling as controls at $t$.

Based on full information about the entire cohort, the partial likelihood estimator $\hat\beta$ for Cox's regression model would be the solution to the score equation
$$U_0(\beta) = \sum_{i=1}^{M} \int \Big( Z_i - \frac{S_1(\beta, R(t))}{S_0(\beta, R(t))} \Big)\, dN_i(t) = 0,$$
where $S_k(\beta, R(t)) = \sum_{i \in R(t)} Y_i(t)\exp(Z_i^{T}\beta) Z_i^{\otimes k}$ for $k = 0, 1$, and where for a vector $a$, $a^{\otimes 0} = 1$ and $a^{\otimes 1} = a$. The standard analysis of the nested case-control design (Thomas, 1977; Oakes, 1981; Goldstein and Langholz, 1992; Borgan et al., 1995) is based on the score equation for the regression parameter $\beta_0$,
$$U_{ncc}(\beta) = \sum_{i=1}^{M} \int \Big( Z_i - \frac{S_1(\beta, R_{ncc}(t) \cup \{i\})}{S_0(\beta, R_{ncc}(t) \cup \{i\})} \Big)\, dN_i(t) = 0,$$
where the risk set $R_{ncc}(t)$ is a random sample of the risk set at time $t$ as above. Borgan et al. (1995) present an asymptotic analysis in the martingale framework. We consider two different sampling methods for the controls: (1) the controls are a simple random sample from the risk set; and (2) the controls are randomly sampled from those in the risk set that match the case on some aspects of $X$. The matched sampling is the more general situation and is dealt with in the next section.

3. MLE FOR NESTED CASE-CONTROL SAMPLING

In this section we propose a maximum likelihood estimator for $\beta_0$ in the semi-parametric setting of Cox's regression model. A traditional survival analysis conditions on the observed covariates, so no modelling of their distribution is needed. Similarly, the standard nested case-control analysis does not require modelling of the distribution of $Z$. However, when a likelihood approach is pursued with $Z$ observed only for some subjects, we need to model the distribution of $Z$ given $X$ to represent precisely the information contributed by the subjects for whom $Z$ is unobserved. Because $X$ is fully observed it makes no difference whether it is conditioned on or considered random, and we therefore do not include its distribution in the likelihood. Note, however, that the covariates must not depend on time if a full likelihood approach is to be used.

Let $f(z \mid x)$ and $f(z \mid U = t, x)$ denote the conditional distribution of $Z$ given $X = x$ and the conditional distribution of $Z$ given $U = t$ and $X = x$, respectively, and let $f(z)$ denote the marginal distribution of $Z$. With $\Lambda(x, t) = \int_0^t \lambda(x, s)\,ds$ the cumulative baseline hazard in stratum $x$,
$$f(z \mid U = t, x) = \lambda(x, t)\exp(z^{T}\beta_0)\exp\{-\Lambda(x, t)\exp(z^{T}\beta_0)\}\, f(z \mid x)/g(t \mid x),$$
where
$$g(t \mid x) = \int \lambda(x, t)\exp(z^{T}\beta_0)\exp\{-\Lambda(x, t)\exp(z^{T}\beta_0)\}\, f(z \mid x)\, dz$$
is the marginal density of $U$ given $X = x$, and the conditional distribution of $Z$ given $U \ge t$ and $X = x$ is
$$f(z \mid U \ge t, x) = \exp\{-\Lambda(x, t)\exp(z^{T}\beta_0)\}\, f(z \mid x)/G(t \mid x), \tag{3.1}$$
where
$$G(t \mid x) = P(U \ge t \mid X = x) = \int \exp\{-\Lambda(x, t)\exp(z^{T}\beta_0)\}\, f(z \mid x)\, dz.$$

The likelihood for the data of the $M$ i.i.d. subjects consists of terms from cases, controls and other subjects. The likelihood contribution for a subject that becomes a case at time $T$, when the covariate $Z$ is recorded, is
$$g(T \mid X)\, f(Z \mid U = T, X) = \lambda(X, T)\exp(Z^{T}\beta)\exp\{-\Lambda(X, T)\exp(Z^{T}\beta)\}\, f(Z \mid X);$$
the likelihood contribution for a subject who is selected as a control at time $T_z$ and is then followed until $T$ is
$$G(T_z \mid X)\, f(Z \mid U \ge T_z, X)\exp[-\{\Lambda(X, T) - \Lambda(X, T_z)\}\exp(Z^{T}\beta)]\{\lambda(X, T)\exp(Z^{T}\beta)\}^{\delta} = \{\lambda(X, T)\exp(Z^{T}\beta)\}^{\delta}\exp\{-\Lambda(X, T)\exp(Z^{T}\beta)\}\, f(Z \mid X);$$
finally, the likelihood contribution for a subject whose $Z$ covariate is never observed is $G(T \mid X)$. The combined likelihood is proportional to
$$\prod_{i \in R_1 \cup \tilde R_0} \{\lambda(X_i, T_i)\exp(Z_i^{T}\beta)\}^{\delta_i}\exp\{-\Lambda(X_i, T_i)\exp(Z_i^{T}\beta)\}\, f(Z_i \mid X_i) \tag{3.2}$$
$$\times \prod_{i \in R \setminus (R_1 \cup \tilde R_0)} G(T_i \mid X_i), \tag{3.3}$$
where $\setminus$ denotes set difference. The first part of the likelihood, (3.2), is of the standard Cox form, while the second part, (3.3), is more complicated. We have omitted all the sampling probabilities of the controls, since these were assumed to depend only on the observed data.
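The likelihood above splits the cohort into cases, selected controls and subjects whose Z is never observed, and that split is produced entirely by the control-sampling scheme of Section 2. The following sketch shows one way to draw matched NCC risk sets and form this partition. It assumes m controls per case, drawn without replacement within a sampled risk set and matched on a discrete X (pass a constant X for unmatched sampling); the helper name `sample_ncc` is hypothetical.

```python
import numpy as np

def sample_ncc(T, delta, X, m, rng):
    """Simple random NCC sampling of m controls per case, matched on X.

    Returns the sampled risk sets, the cases R1, the selected controls that
    never fail (R0 tilde) and the subjects whose Z is never observed.
    """
    T, delta, X = np.asarray(T), np.asarray(delta), np.asarray(X)
    cases = np.flatnonzero(delta == 1)
    risk_sets = {}
    for i in cases:
        # at risk just before T_i and in the same stratum; later cases remain
        # eligible, and a subject may be drawn for several different cases
        eligible = np.flatnonzero((T >= T[i]) & (X == X[i]))
        eligible = eligible[eligible != i]
        k = min(m, eligible.size)
        risk_sets[i] = rng.choice(eligible, size=k, replace=False)
    R1 = set(cases.tolist())
    sampled = set()
    for controls in risk_sets.values():
        sampled.update(controls.tolist())
    R0_tilde = sampled - R1                      # Z observed, never a case
    unobserved = set(range(T.size)) - R1 - R0_tilde
    return risk_sets, R1, R0_tilde, unobserved
```

Subjects in `R1` and `R0_tilde` contribute factors of the form (3.2); those in `unobserved` contribute G(T_i | X_i) as in (3.3).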

The key assumption of this paper is that the MLE of the distribution of $Z$ given $X$ has point masses at the observed covariate values; see Wellner and Zhan (1997) for a discussion of the consequences of this assumption. Let $S(X)$ be the mapping that takes $X$ into its stratum number. Denote the distinct values among the observed $Z$ covariates for which $X = x_s$ by $W_s(1), \dots, W_s(l_s)$ ($l_s$ is the number of distinct values in stratum $s$) and the corresponding point masses by $p_s = (p_s(1), \dots, p_s(l_s))$, such that $\sum_{i=1}^{l_s} p_s(i) = 1$ for all $s = 1, \dots, S$. We use the notation $p_s(Z_i) = \sum_{k=1}^{l_s} p_s(k) I(Z_i = W_s(k))$. The second term of the likelihood, (3.3), can then be written using
$$G(T \mid X) = \sum_{i=1}^{l_{S(X)}} \exp\{-\Lambda(X, T)\exp(W_{S(X)}(i)^{T}\beta)\}\, p_{S(X)}(i).$$

It would be appealing to maximize the likelihood with respect to the two non-parametric terms $p_j$ and $\Lambda_j(\cdot)$ ($j = 1, \dots, S$) for given $\beta$, to obtain a profile likelihood for $\beta$ on which inference and asymptotic properties could be based. Unfortunately, this does not seem tractable because of the $G(T \mid X)$ terms. Instead we apply the EM-algorithm, since maximization is easy in the full data situation. The full log-likelihood given all covariates is
$$l(\Lambda, \beta, p) = \sum_{i=1}^{M} \big[\delta_i \log\{\lambda(X_i, T_i)\exp(Z_i^{T}\beta)\} - \Lambda(X_i, T_i)\exp(Z_i^{T}\beta) + \log f(Z_i \mid X_i)\big]$$
$$= \sum_{i=1}^{M} \big[\delta_i \log\{\lambda(X_i, T_i)\exp(Z_i^{T}\beta)\} - \Lambda(X_i, T_i)\exp(Z_i^{T}\beta)\big] + \sum_{s=1}^{S}\sum_{k=1}^{l_s} a_s(k)\log p_s(k), \tag{3.4}$$
with $a_s(j) = \sum_{i=1}^{M} I(Z_i = W_s(j), X_i = x_s)$. We further define $b_s = \sum_{i=1}^{M} I(X_i = x_s)$, and let $D_i$ denote the data of the $i$th subject. We write $\Lambda = (\Lambda_1(\cdot), \dots, \Lambda_S(\cdot))$, where $\Lambda_s(\cdot) = \Lambda(x_s, \cdot)$, and $p = (p_1, \dots, p_S)$. The E-step of the EM-algorithm consists of computing
$$Q\{(\Lambda, \beta, p), (\Lambda^k, \beta^k, p^k)\} := E_k\{l(\Lambda, \beta, p) \mid D_1, \dots, D_M, \Lambda^k, \beta^k, p^k\},$$
i.e. the conditional expectation of $l(\Lambda, \beta, p)$ given the observed data $D_1, \dots, D_M$ and the current parameter estimates $(\Lambda^k, \beta^k, p^k)$. We thus need to compute the conditional expectations $E_k\{\exp(Z_i^{T}\beta) \mid D_i\}$ and $E_k\{\log p_{S(X_i)}(Z_i) \mid D_i\}$. These expectations are simple to compute because $Z$ takes only finitely many values in each stratum. Using (3.1) leads to
$$P_k(Z_i = W_s(j) \mid U_i \ge T_i, X_i = x_s) = \frac{\exp\{-\Lambda^k(x_s, T_i)\exp(W_s(j)^{T}\beta^k)\}\, p_s^k(j)}{\sum_{l=1}^{l_s}\exp\{-\Lambda^k(x_s, T_i)\exp(W_s(l)^{T}\beta^k)\}\, p_s^k(l)}.$$
Define $\alpha_{ij}^k = P_k(Z_i = W_{S(X_i)}(j) \mid D_i)$. For subjects that become cases or are selected as controls, $\alpha_{ij}^k = I(Z_i = W_{S(X_i)}(j))$, whereas for the other subjects $\alpha_{ij}^k = P_k(Z_i = W_{S(X_i)}(j) \mid U_i \ge T_i, X_i)$. Now,
$$E_k\{\exp(Z_i^{T}\beta) \mid U_i \ge T_i, X_i = x_s\} = \frac{\sum_{j=1}^{l_s}\exp(W_s(j)^{T}\beta)\,\alpha_{ij}^k}{\sum_{j=1}^{l_s}\alpha_{ij}^k}.$$
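The E-step weights above are posterior probabilities over the finite support W_s(1), ..., W_s(l_s), and G(T | X) is the matching mixture. A minimal sketch for a single subject whose Z is unobserved, assuming the current cumulative baseline hazard at T_i has already been evaluated; the function names are hypothetical, and the weights are normalized on the log scale for numerical stability.

```python
import numpy as np

def estep_weights(Lambda_T, W, p, beta):
    """alpha_j = P_k(Z = W[j] | U >= T, X = x_s) for a subject with unobserved Z.

    Lambda_T : current cumulative baseline hazard Lambda^k(x_s, T_i)
    W        : (l_s, p) array of the distinct observed Z values in stratum s
    p        : (l_s,) current point masses p_s^k (all positive)
    beta     : (p,) current regression parameter beta^k
    """
    eta = np.asarray(W, float) @ np.asarray(beta, float)
    log_w = -Lambda_T * np.exp(eta) + np.log(p)
    log_w -= log_w.max()                 # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def G(Lambda_T, W, p, beta):
    """Mixture form of G(T | X) with point masses p at the support points W."""
    eta = np.asarray(W, float) @ np.asarray(beta, float)
    return float(np.sum(np.exp(-Lambda_T * np.exp(eta)) * np.asarray(p, float)))
```

For cases and controls no computation is needed: their weight vector is the indicator of the observed value.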

Therefore $E_k\{l(\Lambda, \beta, p) \mid D_1, \dots, D_M, \Lambda^k, \beta^k, p^k\}$ is computed as
$$\sum_{i \in R_1 \cup \tilde R_0} \big[\delta_i \log\{\lambda(X_i, T_i)\exp(Z_i^{T}\beta)\} - \Lambda(X_i, T_i)\exp(Z_i^{T}\beta) + \log p_{S(X_i)}(Z_i)\big] - \sum_{i \in R \setminus (R_1 \cup \tilde R_0)}\sum_{j=1}^{l_{S(X_i)}} \Lambda(X_i, T_i)\exp(W_{S(X_i)}(j)^{T}\beta)\,\alpha_{ij}^k \tag{3.5}$$
$$+ \sum_{i \in R \setminus (R_1 \cup \tilde R_0)}\sum_{j=1}^{l_{S(X_i)}} \log\{p_{S(X_i)}(j)\}\,\alpha_{ij}^k. \tag{3.6}$$

The conditional expectation of the full data likelihood given the observed data, $Q\{(\Lambda, \beta, p), (\Lambda^k, \beta^k, p^k)\}$, is maximized in $p_1, \dots, p_S$, subject to $\sum_{j=1}^{l_s} p_s(j) = 1$, by
$$\hat p_s(j) = \sum_{i=1}^{M} I(X_i = x_s)\,\alpha_{ij}^k / b_s, \qquad s = 1, \dots, S; \tag{3.7}$$
in $\beta$ by the solution $\hat\beta$ of the partial likelihood score equation
$$\sum_{i=1}^{M}\sum_{s=1}^{S} I(X_i = x_s)\int \Big(Z_i - \frac{S_1(\beta, t, s)}{S_0(\beta, t, s)}\Big)\, dN_i(t) = 0; \tag{3.8}$$
and in $\Lambda$ by the Breslow estimator
$$\hat\Lambda(s, t) = \int_0^t \frac{1}{S_0(\hat\beta, u, s)}\, dN_s(u), \tag{3.9}$$
where $N_s(t) = \sum_i I(X_i = x_s) N_i(t)$ and, for $h = 0, 1$,
$$S_h(\beta, t, s) = \sum_{i \in R_1 \cup \tilde R_0} I(X_i = x_s) Y_i(t)\exp(Z_i^{T}\beta) Z_i^{\otimes h} + \sum_{i \in R \setminus (R_1 \cup \tilde R_0)}\sum_{j} I(X_i = x_s) Y_i(t)\exp(W_s(j)^{T}\beta)\,\alpha_{ij}^k\, W_s(j)^{\otimes h}.$$
The last maximization, in $\beta$ and $\Lambda$, is equivalent to a standard Cox regression with offsets, so the algorithm is simple to implement with standard software. In the case where all subjects that are not selected as cases or controls are right censored at the same time, the expressions simplify.

In recent work, Chen (2002) shows that the MLE of the relative risk parameters is consistent and asymptotically Normal when based on independent identically distributed observations with an independent missing data mechanism. For nested case-control sampling, however, the missingness is not independent across subjects: the probability that a subject is sampled as a control depends on both the number of failures and the failure times of all subjects. The likelihood and the EM-algorithm are nevertheless equivalent, and we expect the results to carry through. Standard errors for the regression parameters may be obtained by bootstrapping techniques or directly from the information matrix as in Louis (1982).
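The M-step updates (3.7) and (3.9) reduce to weighted sums once every subject with unobserved Z is "expanded" into one row per support point W_s(j) carrying weight alpha_ij. The sketch below implements both updates for a single stratum and a fixed beta. The expanded-data layout and the names `update_p` and `breslow` are our own illustrative assumptions.

```python
import numpy as np

def update_p(alpha, X, s):
    """Update (3.7) for stratum s.

    alpha : (M, l_s) array of E-step weights; indicator rows for cases and
            controls, posterior probabilities for the other subjects
    X     : (M,) stratum labels; only rows with X == s contribute
    """
    in_s = np.asarray(X) == s
    return alpha[in_s].sum(axis=0) / in_s.sum()   # divide by b_s

def breslow(T, delta, Zrows, wrows, owner, beta, t):
    """Breslow estimator (3.9) at time t for one stratum, fixed beta.

    Expanded data: a subject with observed Z gives one row with weight 1; a
    subject with unobserved Z gives one row per W_s(j) with weight alpha_ij.
    owner[r] is the subject index of row r.
    """
    T = np.asarray(T, float)
    owner = np.asarray(owner)
    eta = np.asarray(Zrows, float) @ np.asarray(beta, float)
    risk = np.asarray(wrows, float) * np.exp(eta)   # weighted exp(z' beta)
    cum = 0.0
    for i in np.flatnonzero((np.asarray(delta) == 1) & (T <= t)):
        rows_at_risk = T[owner] >= T[i]             # Y(T_i) for each row
        cum += 1.0 / risk[rows_at_risk].sum()       # dN_s / S_0(beta, T_i, s)
    return cum
```

The beta-update (3.8) can reuse the same rows: since alpha_ij exp(W^T beta) = exp(W^T beta + log alpha_ij), a Cox fit on the expanded data with offset log alpha_ij (rows with alpha_ij = 0 dropped) yields the risk-set sums S_0 and S_1 above, which is the sense in which the step is a Cox regression with offsets.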
The total number of parameters will, however, be large if the number of events and the number of different covariate values are large. Therefore, as interest primarily centres on the regression parameters, it seems preferable to use techniques that focus on these parameters. We suggest estimating standard errors by EM-aided differentiation of the profile likelihood for $\beta$, as in Chen and Little (1999). With $\hat\Lambda^{\beta}$ and $\hat p^{\beta}$ denoting the maximizers of the observed data log-likelihood $l_O(\Lambda, \beta, p)$ for given $\beta$, the derivative of the observed data profile likelihood is
$$\frac{\partial}{\partial\beta} l_O(\hat\Lambda^{\beta}, \beta, \hat p^{\beta}) = E\Big\{\frac{\partial}{\partial\beta} l(\hat\Lambda^{\beta}, \beta, \hat p^{\beta}) \,\Big|\, D_1, \dots, D_M, \hat\Lambda^{\beta}, \beta, \hat p^{\beta}\Big\},$$
where the conditional expectation is computed with the parameters $(\hat\Lambda^{\beta}, \beta, \hat p^{\beta})$. Now, let $\beta_{N,j}$ denote a perturbed version of $\hat\beta$ in which the $j$th component is perturbed by $d$. Since the profile score is zero at $\hat\beta$, a difference quotient gives
$$\frac{\partial^2 l_O(\hat\Lambda, \hat\beta, \hat p)}{\partial\beta\,\partial\beta_j} \approx \frac{1}{d}\, E\Big\{\frac{\partial}{\partial\beta} l(\hat\Lambda^{\beta_{N,j}}, \beta_{N,j}, \hat p^{\beta_{N,j}}) \,\Big|\, D_1, \dots, D_M, \hat\Lambda^{\beta_{N,j}}, \beta_{N,j}, \hat p^{\beta_{N,j}}\Big\}.$$
Chen and Little (1999) suggest that $d = 1/N$ gives reasonable performance. Our simulations also indicate that the standard errors perform quite well for the sample sizes considered in the case study.

4. EXTENSIONS AND ADDITIONAL MODELLING

Various additional assumptions may be made to reduce the number of parameters in the model. We consider two possible assumptions, one about the baseline hazards and one about the distribution of $Z$ given $X$, and illustrate their use in the case study. We may, for example, assume that the stratified Cox model is in fact a proportional hazards model,
$$\lambda(X, t) = \lambda_0(t)\exp(X^{T}\eta). \tag{4.1}$$
This leads to a similar analysis, except that the score equation for the relative risk parameters, (3.8), and the Breslow estimator, (3.9), are no longer stratified. If the conditional distribution of $Z$ given $X$, $f(z \mid x)$, is known not to depend on $X$, so that the stratification variable contains no information about the distribution of $Z$, the maximization (3.7) is not stratified.

It may also be of interest to stratify the baseline hazards according to the partly unobserved covariates. This gives the model
$$\lambda(t) = Y(t)\,\lambda(Z, t)\exp(X^{T}\beta), \tag{4.2}$$
where we now assume that the partly unobserved covariate $Z$ takes only a finite number of distinct values, and that the fully observed covariate $X$ has proportional effects. This model can be maximized similarly to the full data likelihood in the standard case (3.4): subjects with unobserved $Z$ are simply distributed to the strata according to the distribution of $Z$ given the observed data.
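In model (4.2), "distributing" a subject with unobserved Z to the strata amounts to giving it fractional membership in every Z-stratum, with weights equal to its E-step probabilities. A minimal sketch of the resulting Breslow-type estimate of Lambda(z, t) for a fixed beta, assuming those membership probabilities have already been computed; `stratum_cumhaz` is a hypothetical name.

```python
import numpy as np

def stratum_cumhaz(T, delta, Xcov, member, beta, t):
    """Breslow-type estimate of Lambda(z, t) in model (4.2) for every level z.

    member : (M, L) matrix; row i is the indicator of Z_i for cases and
             controls, and P(Z_i = level | observed data) for everyone else
    Xcov   : (M, q) fully observed covariates with proportional effect beta
    """
    T = np.asarray(T, float)
    r = np.exp(np.asarray(Xcov, float) @ np.asarray(beta, float))
    out = np.zeros(member.shape[1])
    for i in np.flatnonzero((np.asarray(delta) == 1) & (T <= t)):
        z = int(np.argmax(member[i]))        # a case has Z observed (one-hot)
        at_risk = T >= T[i]
        # fractional contributions of all subjects at risk in stratum z
        out[z] += 1.0 / np.sum(member[at_risk, z] * r[at_risk])
    return out
```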
Model (4.2) may be used for examining the proportionality of the covariates and can give additional insight into time-varying covariate effects.

5. SIMULATIONS FOR NESTED CASE-CONTROL SAMPLING

To learn about the finite-sample properties of the MLE in a situation comparable to the case study discussed in the next section, we simulated survival times for a cohort of size 4000, censored at time 15. We first considered two continuous covariates that were independent standard Normals with log-relative risks −0.25 and 0.25, respectively, and a constant baseline hazard at three levels, 0.004, … and 0.016. We also varied the log-relative risks to (0, 0), (−0.5, 0.5) and (−1, 1). Due to the equivariance of the likelihood approach, the minus sign could be omitted without otherwise changing the results; it is included here only to mimic the values of the application. Table 1 shows the performance of a Cox regression analysis of the fully observed cohort (4000 subjects), the standard nested case-control analysis (standard NCC) and the MLE, with log-relative risk parameters (0, 0), (−0.5, 0.5) and (−1, 1). We computed the empirical mean, the empirical standard deviation and the mean of the estimated standard errors for the different estimators. The empirical standard deviations are also given relative to those of the full cohort Cox analysis.

Table 1. Simulations of cohort size 4000 (two controls per case) with two covariates, three different baseline hazard levels and three levels of the relative risk parameters, (0, 0), (−0.5, 0.5) and (−1, 1). Empirical mean (Emp. mean), empirical standard deviation (Emp. sd.) and mean of estimated standard deviation (Mean est. sd.). The empirical standard deviations are given in absolute size as well as relative (in parentheses) to the full cohort Cox standard errors. Simulations based on 1000 replications.

Log-relative risk  λ0(t)  Method        Emp. sd. (covariate 1)  Emp. sd. (covariate 2)
(0, 0)             0.004  Standard NCC  … (1.24)                0.08 (1.22)
                          MLE           … (1.23)                0.08 (1.21)
                   …      Standard NCC  … (1.21)                0.05 (1.15)
                          MLE           … (1.24)                0.06 (1.16)
                   0.016  Standard NCC  … (1.24)                0.04 (1.25)
                          MLE           … (1.26)                0.05 (1.30)
(−0.5, 0.5)        0.004  Standard NCC  … (1.48)                0.08 (1.60)
                          MLE           … (1.44)                0.08 (1.52)
                   …      Standard NCC  … (1.42)                0.06 (1.39)
                          MLE           … (1.35)                0.06 (1.31)
                   0.016  Standard NCC  … (1.29)                0.04 (1.25)
                          MLE           … (1.28)                0.04 (1.21)
(−1, 1)            0.004  Standard NCC  … (1.66)                0.15 (1.84)
                          MLE           … (1.25)                0.10 (1.25)
                   …      Standard NCC  … (1.67)                0.09 (1.70)
                          MLE           … (1.32)                0.07 (1.26)
                   0.016  Standard NCC  … (1.89)                0.06 (1.73)
                          MLE           … (1.12)                0.04 (1.21)

For the low relative risk effects the standard NCC analysis performed very well and led to estimates that were almost as efficient as the full cohort Cox analysis. The MLE behaved similarly but had slightly higher variances. The findings are consistent across the three levels of the baseline hazard. Log-relative risks at absolute level 0.25 led to similar conclusions and are not shown. For log-relative risks at absolute level 0.5 the estimators have similar performance, but the MLE is slightly better. For the high-risk case, (−1, 1), the MLE leads to an important improvement in efficiency compared to the standard analysis. The variability of all estimators was well estimated by the estimated standard errors. Note also that the MLE has a slight bias. In conclusion, when the NCC has good efficiency the MLE has a similar performance. This appears to be the case for log-relative risk effects smaller than 0.5 in absolute value at the baseline hazard levels used in the simulation. For larger effects the MLE improves considerably on the NCC.
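To relate the simulation design to the number of cases it produces, note that with a constant baseline hazard lambda_0 and censoring at tau = 15, the expected number of cases in a cohort of size M is M(1 − E[exp(−tau lambda_0 exp(Z^T beta))]) with Z standard bivariate Normal. A Monte Carlo sketch of that expectation, assuming exactly this design; `expected_cases` is a hypothetical name.

```python
import numpy as np

def expected_cases(M, lam0, beta, tau=15.0, n_mc=200_000, seed=0):
    """Monte Carlo approximation of M * P(U <= tau) when Z ~ N(0, I) and the
    baseline hazard is constant at lam0 (the design assumed above)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, float)
    Z = rng.standard_normal((n_mc, beta.size))
    surv = np.exp(-tau * lam0 * np.exp(Z @ beta))   # P(U > tau | Z)
    return M * (1.0 - surv.mean())
```

With beta = (0, 0) the expectation is exact: `expected_cases(4000, 0.004, [0.0, 0.0])` equals 4000(1 − exp(−0.06)), about 233 cases.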

6. APPLICATION

To study the effect of IGF-I and its binding protein 3 (IGFBP-3) on IHD, Juul et al. (2002) performed a nested case-control study based on a Danish population. The study was based on a population of 3784 Danes among whom 231 cases of IHD occurred; each case was matched on age and gender to two randomly selected controls. Study participants were recruited at approximate ages of 30, 40, 50 and 60 years. A detailed description of the study can be found in Juul et al. (2002). Here we give some additional analyses of the key covariates to illustrate the use of the MLE.

First we report the analysis of Juul et al. (2002), focusing on the effects of the covariates IGF-I and IGFBP-3. Table 2 shows the estimates from the standard analysis and from the MLE, where IGF-I and IGFBP-3 were included as standardized continuous covariates. The estimates and standard errors are almost the same for the two methods. Increased levels of IGF-I lead to a significantly lower risk of IHD when adjusting for IGFBP-3. Conversely, increased levels of IGFBP-3 lead to a significantly higher risk of IHD.

Table 2. Log-relative risk of standardized IGF-I and IGFBP-3 for IHD data (standard errors in parentheses)

           Standard NCC   MLE
IGF-I      … (0.118)      … (0.091)
IGFBP-3    … (0.114)      … (0.115)

The baseline hazards stratified by age and gender were estimated by the MLE and are shown in Figure 1. They reveal that the baseline hazards varied considerably with age group and gender. Generally, females had a risk of IHD between one-half and one-third of that of males in all age groups, and the risk increased with age. To further summarize the effects of age and gender, we assumed that the stratified baseline hazards were proportional, using the additional modelling of the stratified baseline hazard described in Section 4 (model (4.1)). The effects of age and gender were then estimated by the MLE method.
We found that females had a log-relative risk (standard error) of −1.08 (0.16), while the log-relative risk increased by 0.088 (0.0090) for every year of age. The common baseline hazard, which cannot be estimated by the standard NCC analysis, is shown in Figure 2, where the thick and thin lines give the cumulative baseline hazards for males and females, respectively.

We also grouped IGF-I and IGFBP-3 into quartiles and estimated the relative risk for these groups. The quartile analysis (Table 3) resulted in considerably smaller standard errors for the MLE in the stratified case than for the standard NCC. Not all of the log-relative risk estimates are significant, but the estimates are consistent with the linear modelling carried out above. We considered two versions of the MLE: one where the baseline hazard had proportional effects of age and gender, and a more general model, closer in spirit to the NCC analysis, where the baseline hazard was stratified according to eight groups defined by age and gender. For example, IGF-I in the lowest quartile corresponds to a log-relative risk of 0.57 (0.15) for the stratified MLE, compared with the standard NCC estimate of 0.68 (0.29). The standard errors of the stratified MLE differ considerably from those of the MLE with proportional age and gender effects. This suggests that the additional modelling of the stratified baseline hazards did not result in increased efficiency, even though the proportional modelling of the baseline hazards appears quite reasonable. The difference between the two sets of standard errors may also indicate that the EM-aided standard errors should be used with caution: we tried different convergence criteria and found some variation in the estimated standard errors.
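The gender and age effects translate into relative risks by exponentiation. The sketch below reads the female log-relative risk as −1.08, the sign implied by the lower risk for females described above, and adds a normal-approximation 95% interval; the variable names are ours.

```python
import math

log_rr_female, se_female = -1.08, 0.16   # females versus males
log_rr_age = 0.088                       # per year of age at entry

rr_female = math.exp(log_rr_female)               # about 0.34
ci_low = math.exp(log_rr_female - 1.96 * se_female)
ci_high = math.exp(log_rr_female + 1.96 * se_female)
rr_decade = math.exp(10 * log_rr_age)             # about 2.41 per 10 years
```

The point estimate of about 0.34 lies between one-third and one-half, in line with the stratified baseline hazards in Figure 1, and the interval runs from roughly 0.25 to 0.46.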

[Figure 1: cumulative baseline hazard ("cum. baseline") against time (years), one panel per age group (Age = 30, Age = 40, …).]
Fig. 1. Cumulative baseline hazards according to age and gender. Males, thick line; females, thin line.

[Figure 2: "Cumulative Baseline Hazard for IHD", cumulative baseline hazard against time (years).]
Fig. 2. Cumulative baseline hazards adjusted for effects of age and gender by proportional modelling, for subjects aged 50 on entry (see text). Males, thick line; females, thin line.

Table 3. Log-relative risk of quartiles of IGF-I and IGFBP-3 for IHD data (standard errors in parentheses). The effects of age and gender are only given for the model where they are modelled as proportional effects.

                     Standard NCC   MLE            MLE Strat.
IGF-I 1st quartile   0.68 (0.29)    0.58 (0.21)    0.57 (0.15)
IGF-I 2nd quartile   0.33 (0.27)    0.26 (0.19)    0.26 (0.20)
IGF-I 3rd quartile   0.49 (0.25)    0.51 (0.27)    0.51 (0.18)
IGF-I 4th quartile
BP-3 1st quartile
BP-3 2nd quartile    0.54 (0.26)    0.49 (0.17)    0.44 (0.22)
BP-3 3rd quartile    0.44 (0.28)    0.46 (0.15)    0.39 (0.18)
BP-3 4th quartile    0.71 (0.30)    0.63 (0.12)    0.59 (0.14)
Gender                              … (0.16)       -
Age                                 … (0.0090)     -

[Figure 3: four panels showing the cumulative distribution ("cum. distribution") of standardized IGF-I for males and for females, and of standardized IGFBP-3 for males and for females.]
Fig. 3. Marginal distribution of IGF-I and IGFBP-3 for males and females, on standardized scale.

The MLE gives an estimate of the distribution of IGF-I and IGFBP-3 given the stratification variables. In Figure 3 we show the marginal distributions of IGF-I and IGFBP-3 for the two genders in a model where the distributions of these factors were assumed to depend only on gender. Figure 3 indicates that both hormones are somewhat higher for males. The distributions are shown on the standardized scale, since the variables enter the analysis in standardized form; note, however, that the estimates are invariant under translation and change of scale. A further analysis also indicated that the distribution of the hormones did not vary much with age group.

[Figure 4: cumulative baseline hazard ("cum. baseline") against time (years), one curve per quartile of the ratio IGF-I/IGFBP-3.]
Fig. 4. Cumulative baseline hazards stratified according to level of the ratio of IGF-I to IGFBP-3 for males aged 50 on entry.

Finally, we estimated the cumulative baseline hazard stratified according to the quartiles of the ratio of IGF-I to IGFBP-3, with gender and age entering the model through proportional hazards modelling. This analysis is summarized in Figure 4, which shows that a low ratio of IGF-I to IGFBP-3 indicates a high risk of IHD. For an individual in the high-risk group the cumulative hazard of IHD is 0.12 over 15 years, which approximately translates to a 12% risk of developing IHD over the 15 years of the study. The analysis was adjusted for the effects of gender and age, which are fully observed for all individuals; these effects were similar to those reported earlier.

7. DISCUSSION

The proposed MLE for nested case-control sampling appears to result in increased efficiency compared to the standard analysis when the effects are large. In addition, the MLE may be used to obtain additional information about the covariates; in our example we were able to estimate the cumulative baseline hazard stratified according to the ratio of IGF-I to IGFBP-3. One important methodological point is that the NCC and the case-cohort study (Prentice, 1986; Self and Prentice, 1988) lead to likelihoods of the same form and can therefore be analysed by the same program. We conjecture that the two designs lead to equivalent power. The suggested MLE technique can be extended to deal with left truncation, in which case we are still able to compute the required conditional mean of the unobserved covariates given the data.
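The reading of Figure 4 in Section 6, that a cumulative hazard of 0.12 over 15 years corresponds to roughly a 12% risk, rests on the relation risk = 1 − exp(−Lambda), which is close to Lambda when Lambda is small. A short check of that arithmetic (the function name is ours):

```python
import math

def risk_from_cumhaz(cum_hazard):
    """Absolute risk over an interval from its cumulative hazard: 1 - exp(-Lambda)."""
    return 1.0 - math.exp(-cum_hazard)

risk = risk_from_cumhaz(0.12)   # about 0.113
```

The exact conversion gives about 11.3%, slightly below the first-order approximation of 12%.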
The MLE has several important limitations compared with the standard analysis. The MLE procedure can handle only time-constant covariates and applies only in a limited number of situations. Also, the censoring mechanism must be independent of the partly unobserved covariate. We estimated the standard errors of the log-relative risk parameters by EM-aided numerical differentiation, and this approach worked well in the simulations that we performed. In the application, however, we found that the standard errors showed some numerical instability: repeated analyses of the data with slightly different convergence criteria led to some variability in the estimated standard errors.

We have assumed that X is discrete. This assumption may, however, be relaxed by using smoothing techniques to estimate the conditional distribution of Z given X. Further research should study the asymptotic properties of the MLE procedure; we conjecture that the results of Chen (2002) will carry through. Note that the partly observed covariates (Z) are not allowed to influence the selection of controls. To deal with counter-matching, where risk-set selection also depends on the observed value Z_i of the covariate for the corresponding case, some modification of the suggested MLE is needed. See Borgan et al. (1995) for a discussion of various other strategies for selecting the controls in the NCC design.

ACKNOWLEDGEMENTS

The first author received partial support through a grant from the National Institutes of Health and did most of the work while employed by the Department of Mathematical Sciences at Aalborg University. We thank the Centre for Preventive Medicine for letting us use their data, and the associate editor and two reviewers for their comments that led to an improved presentation.

REFERENCES

ANDERSEN, P. K., BORGAN, Ø., GILL, R. D. AND KEIDING, N. (1993). Statistical Models Based on Counting Processes. New York: Springer.
BORGAN, Ø., GOLDSTEIN, L. AND LANGHOLZ, B. (1995). Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Annals of Statistics 23.
CHEN, H. Y. (2002). Double-semiparametric method for missing covariates in Cox regression models. Journal of the American Statistical Association 97.
CHEN, H. Y. AND LITTLE, R. J. (1999). Proportional hazards regression with missing covariates. Journal of the American Statistical Association 94.
COX, D. R. (1972). Regression models and life tables. Journal of the Royal Statistical Society B 34.
DEMPSTER, A. P., LAIRD, N. M. AND RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39.
GOLDSTEIN, L. AND LANGHOLZ, B. (1992). Asymptotic theory for nested case-control sampling in the Cox regression model. Annals of Statistics 20.
JUUL, A., SCHEIKE, T., DAVIDSEN, M., GYLLENBORG, J. AND JØRGENSEN, T. (2002). Low serum insulin-like growth factor I is associated with increased risk of ischemic heart disease: a population-based cohort study. Circulation 106.
LOUIS, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society B 44.
MARTINUSSEN, T. (1999). Cox regression with incomplete covariate measurements using the EM-algorithm. Scandinavian Journal of Statistics 26.
MURPHY, S. A., ROSSINI, A. J. AND VAN DER VAART, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association 92.
NIELSEN, G. G., GILL, R., ANDERSEN, P. K. AND SØRENSEN, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics 19.
OAKES, D. (1981). Survival times: aspects of partial likelihood (with discussion). International Statistical Review 49.
PRENTICE, R. L. (1986). A case-cohort design for epidemiological cohort studies and disease prevention trials. Biometrika 73.
SAMUELSEN, S. O. (1997). A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 84.
SELF, S. G. AND PRENTICE, R. L. (1988). Asymptotic distribution theory and efficiency results for case-cohort studies. Annals of Statistics 16.
THOMAS, D. (1977). Addendum to: Methods of cohort analysis: appraisal by application to asbestos mining, by F. D. K. Liddell, J. C. McDonald and D. C. Thomas. Journal of the Royal Statistical Society A 140.
TURNBULL, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the American Statistical Association 69.
WELLNER, J. A. AND ZHAN, Y. (1997). A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. Journal of the American Statistical Association 92.

[Received August 28, 2002; first revision October 14, 2002; second revision January 13, 2003; third revision April 2, 2003; accepted for publication August 7, 2003]


More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

Estimation for two-phase designs: semiparametric models and Z theorems

Estimation for two-phase designs: semiparametric models and Z theorems Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for

More information

Regression Calibration in Semiparametric Accelerated Failure Time Models

Regression Calibration in Semiparametric Accelerated Failure Time Models Biometrics 66, 405 414 June 2010 DOI: 10.1111/j.1541-0420.2009.01295.x Regression Calibration in Semiparametric Accelerated Failure Time Models Menggang Yu 1, and Bin Nan 2 1 Department of Medicine, Division

More information

Accelerated Failure Time Models: A Review

Accelerated Failure Time Models: A Review International Journal of Performability Engineering, Vol. 10, No. 01, 2014, pp.23-29. RAMS Consultants Printed in India Accelerated Failure Time Models: A Review JEAN-FRANÇOIS DUPUY * IRMAR/INSA of Rennes,

More information

CURE MODEL WITH CURRENT STATUS DATA

CURE MODEL WITH CURRENT STATUS DATA Statistica Sinica 19 (2009), 233-249 CURE MODEL WITH CURRENT STATUS DATA Shuangge Ma Yale University Abstract: Current status data arise when only random censoring time and event status at censoring are

More information

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes Introduction Method Theoretical Results Simulation Studies Application Conclusions Introduction Introduction For survival data,

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL The Cox PH model: λ(t Z) = λ 0 (t) exp(β Z). How do we estimate the survival probability, S z (t) = S(t Z) = P (T > t Z), for an individual with covariates

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

STATISTICAL METHODS FOR CASE-CONTROL AND CASE-COHORT STUDIES WITH POSSIBLY CORRELATED FAILURE TIME DATA

STATISTICAL METHODS FOR CASE-CONTROL AND CASE-COHORT STUDIES WITH POSSIBLY CORRELATED FAILURE TIME DATA STATISTICAL METHODS FOR CASE-CONTROL AND CASE-COHORT STUDIES WITH POSSIBLY CORRELATED FAILURE TIME DATA by Sangwoo Kang A dissertation submitted to the faculty of the University of North Carolina at Chapel

More information

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities Hutson Journal of Statistical Distributions and Applications (26 3:9 DOI.86/s4488-6-47-y RESEARCH Open Access Nonparametric rank based estimation of bivariate densities given censored data conditional

More information

Handling Ties in the Rank Ordered Logit Model Applied in Epidemiological

Handling Ties in the Rank Ordered Logit Model Applied in Epidemiological Handling Ties in the Rank Ordered Logit Model Applied in Epidemiological Settings Angeliki Maraki Masteruppsats i matematisk statistik Master Thesis in Mathematical Statistics Masteruppsats 2016:4 Matematisk

More information

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Jin Kyung Park International Vaccine Institute Min Woo Chae Seoul National University R. Leon Ochiai International

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

ST745: Survival Analysis: Cox-PH!

ST745: Survival Analysis: Cox-PH! ST745: Survival Analysis: Cox-PH! Eric B. Laber Department of Statistics, North Carolina State University April 20, 2015 Rien n est plus dangereux qu une idee, quand on n a qu une idee. (Nothing is more

More information

The Proportional Hazard Model and the Modelling of Recurrent Failure Data: Analysis of a Disconnector Population in Sweden. Sweden

The Proportional Hazard Model and the Modelling of Recurrent Failure Data: Analysis of a Disconnector Population in Sweden. Sweden PS1 Life Cycle Asset Management The Proportional Hazard Model and the Modelling of Recurrent Failure Data: Analysis of a Disconnector Population in Sweden J. H. Jürgensen 1, A.L. Brodersson 2, P. Hilber

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

β j = coefficient of x j in the model; β = ( β1, β2,

β j = coefficient of x j in the model; β = ( β1, β2, Regression Modeling of Survival Time Data Why regression models? Groups similar except for the treatment under study use the nonparametric methods discussed earlier. Groups differ in variables (covariates)

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information