Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data.

Frailty modelling of Multivariate Survival Data

Thomas Scheike, ts@biostat.ku.dk
Department of Biostatistics, University of Copenhagen
October 21

Outline

Marginal versus frailty models.
Two-stage frailty models: copula modelling.
Example and exercises.

Clustered survival data

For $k = 1,\dots,K$, $i = 1,\dots,n$, let $\tilde T_{ik}$ and $C_{ik}$ be the failure and censoring times for the $i$th individual in the $k$th cluster, and let $X_{ik}(t)$ be a $p$-vector of covariates. Put $\tilde T_k = (\tilde T_{1k},\dots,\tilde T_{nk})$, $C_k = (C_{1k},\dots,C_{nk})$, $X_k(t) = (X_{1k}(t),\dots,X_{nk}(t))$.

We assume that $(\tilde T_k, C_k, X_k(\cdot))$, $k = 1,\dots,K$, are independent and identically distributed and follow the model described in the following.

The right-censored failure time is denoted $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, and as usual we let $Y_{ik}(t) = 1(T_{ik} \ge t)$.

Clustered survival data: two basic approaches

1) Frailty models: given the random effect $Z$, the survival times are independent with hazards
$$\lambda(t)\, Z \exp(x_i^T\beta).$$

2) Marginal approach: given covariates, the marginal intensity is
$$Y_i(t)\lambda(t)\exp(x_i^T\beta)$$
(an intensity is needed to construct the likelihood of the data).

The marginal approach gives population regression parameters, but one can also characterize the dependence using a copula (or an underlying frailty distribution). Assume, for example, that given the frailty $Z$ the intensities are $Z\,h(\Lambda, \exp(x_i^T\beta))$ for a specific choice of $h$. More on this later.
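To make the notation concrete, the following is a minimal base-R sketch that simulates clustered, right-censored data from a shared frailty model and builds $T_{ik}$, $Y_{ik}(t)$ and $N_{ik}(t)$. The gamma frailty, the constant baseline hazard, the uniform censoring and all parameter values are assumptions chosen for illustration only; they are not taken from the slides.

```r
# Sketch: K clusters of size n with conditional hazard V_k * lambda0 * exp(x * beta).
# All names and parameter values below are illustrative assumptions.
set.seed(1)
K <- 500; n <- 2
theta <- 2                 # V_k ~ Gamma(shape = theta, rate = theta): mean 1, variance 1/theta
beta <- 0.5; lambda0 <- 1

id <- rep(seq_len(K), each = n)                    # cluster index k
V <- rgamma(K, shape = theta, rate = theta)[id]    # one frailty per cluster, shared within it
x <- rbinom(K * n, 1, 0.5)                         # one binary covariate per individual
Ttilde <- rexp(K * n, rate = V * lambda0 * exp(x * beta))  # failure times, independent given V
C <- runif(K * n, 0, 3)                            # independent censoring times
time <- pmin(Ttilde, C)                            # T_ik = min(Ttilde_ik, C_ik)
status <- as.numeric(Ttilde <= C)                  # 1 if the failure was observed

sim <- data.frame(id, x, time, status)

# At-risk indicator Y_ik(t) and counting process N_ik(t) evaluated at a fixed time t0
t0 <- 0.5
Y <- as.numeric(sim$time >= t0)
N <- as.numeric(sim$time <= t0 & sim$status == 1)
head(sim)
```

An individual cannot be at risk at `t0` and have an observed event before `t0`, so `Y * N` is zero for every row.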

Frailty models

$(\tilde T_k, C_k, X_k(\cdot), V_k)$, $k = 1,\dots,K$, are assumed to be i.i.d. variables. Censoring, conditional on $V_k$ and covariates, is assumed to be independent and noninformative on $V_k$.

The shared frailty model is specified solely with respect to the (unobserved) filtration $\mathcal{H}_t = \bigvee_k \mathcal{H}_t^k$, where
$$\mathcal{H}_t^k = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s), V_k : i = 1,\dots,n,\ s \le t\},$$
and often it is assumed to be a proportional hazards model
$$\lambda_{ik}^{\mathcal{H}}(t) = Y_{ik}(t)\,V_k\,\lambda(t)\exp(X_{ik}^T(t)\beta). \tag{1}$$

When the frailty variable $V_k$ is gamma-distributed with mean one and variance $\theta^{-1}$ we get the Clayton-Oakes model.

Characterizing dependence

We shall talk about estimation for the random effects models, but first some general remarks on dependence measures.

Given two survival times $(T_1, T_2)$, we can compute the Pearson correlation, the Spearman correlation, and Kendall's tau. These are different ways of interpreting the degree of association.

The Spearman correlation depends on aspects of the bivariate distribution, but not on the marginals. Kendall's tau has the same property. Both measures are rank-based.

Kendall's tau

Kendall's tau is defined via an i.i.d. copy $(T_1', T_2')$ of $(T_1, T_2)$ and measures the degree of difference between two such i.i.d. copies:
$$\tau = E\big(I((T_1 - T_1')(T_2 - T_2') > 0)\big) - E\big(I((T_1 - T_1')(T_2 - T_2') < 0)\big),$$
which is just the concordance probability minus the discordance probability.

Dependence

Consider a bivariate distribution written on copula form,
$$F(s,t) = C(F_1(s), F_2(t)),$$
where $C$ is the copula specifying a bivariate distribution on $[0,1]^2$ and $F_j$ is the $j$th marginal. Kendall's tau is computed as
$$\tau = 4\int\!\!\int C(u,v)\,C(du,dv) - 1.$$

For the model
$$Y_{ik}(t)\,V_k\,\lambda(t)\exp(X_{ik}^T(t)\beta) \tag{2}$$
with $V_k$ i.i.d. gamma with mean 1 and variance $1/\theta$,
$$\tau = \frac{1}{1+2\theta},$$
in the sense that this is the Kendall's tau we see with fixed covariates.
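The relation $\tau = 1/(1+2\theta)$ can be checked by simulation. The sketch below, in base R, draws pairs that share a gamma frailty with mean 1 and variance $1/\theta$, uses a constant baseline hazard and no covariates or censoring (all of these are simplifying assumptions for illustration), and compares the empirical Kendall's tau with the formula.

```r
# Sketch: empirical versus theoretical Kendall's tau under a shared gamma frailty.
# Parameter values are illustrative; no covariates and no censoring are assumed.
set.seed(2)
K <- 2000
theta <- 2                                   # frailty variance 1/theta = 0.5
V <- rgamma(K, shape = theta, rate = theta)  # mean 1, variance 1/theta
T1 <- rexp(K, rate = V)                      # conditionally independent given V
T2 <- rexp(K, rate = V)

tau_hat <- cor(T1, T2, method = "kendall")   # rank-based, so invariant to the marginals
tau_theory <- 1 / (1 + 2 * theta)
c(empirical = tau_hat, theoretical = tau_theory)
```

Because Kendall's tau is rank-based, replacing the constant baseline hazard by any other baseline would leave the theoretical value unchanged.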

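The frailty model

When $V_k$ are i.i.d. gamma with mean 1 and variance $1/\theta$, the observed intensity given covariates becomes
$$Y_{ik}(t)\,\frac{1+\theta^{-1}N_{\cdot k}(t-)}{1+\theta^{-1}\sum_{j}\exp(X_{jk}^T\beta)\Lambda(T_{jk}\wedge t)}\,\lambda(t)\exp(X_{ik}^T(t)\beta). \tag{3}$$
And the marginal intensity becomes
$$Y_{ik}(t)\,\frac{1+\theta^{-1}N_{ik}(t-)}{1+\theta^{-1}\exp(X_{ik}^T\beta)\Lambda(T_{ik}\wedge t)}\,\lambda(t)\exp(X_{ik}^T(t)\beta). \tag{4}$$

Estimation: NPMLE for the frailty model

The full likelihood, treating the frailties as observed, is
$$\prod_k \prod_i \Big\{\prod_{s \le t}\big(V_k\,d\Lambda(s)\exp(X_{ki}^T\beta)\big)^{\Delta N_{ki}(s)}\Big\}\exp\Big(-V_k\int_0^t Y_{ki}(s)\exp(X_{ki}^T\beta)\,d\Lambda(s)\Big), \tag{5}$$
and with $V_k$ gamma distributed with mean 1 and variance $1/\theta$, giving the additional term
$$\prod_k f(V_k). \tag{6}$$

Frailty models: exercises

For fixed $\theta$, write up the EM-algorithm for this model. How do we get standard errors for this model?

To compute the E-step we need $E(V_k \mid \text{Data})$, and this conditional mean (and distribution) can be found from the full likelihood. Recall that a gamma distribution with parameters $\lambda, \alpha$ ($\Gamma(\lambda,\alpha)$) has mean $\lambda\alpha$, variance $\lambda\alpha^2$ and density
$$(\Gamma(\lambda)\alpha^{\lambda})^{-1}x^{\lambda-1}\exp(-x/\alpha).$$

Write up the full EM-algorithm for the model where $\theta$ is not known.

How does left truncation affect the computations?

Frailty models, Attenuation

    > fit <- coxph(Surv(time,status) ~ adult*trt + frailty(id), data=diabetes)
    > fit
    Call:
    coxph(formula = Surv(time, status) ~ adult * trt + frailty(id), data = diabetes)

                coef    se(coef)  se2     Chisq   DF     p
    adult        0.397  0.259     0.25      2.35   1.0   0.13
    trt         -0.506  0.225     0.221     5.03   1.0   0.025
    frailty(id)                           122.54  88.6   0.0098
    adult:trt   -0.985  0.362     0.355     7.41   1.0   0.0065

    Iterations: 6 outer, 25 Newton-Raphson
         Variance of random effect= 0.926   I-likelihood = -847

If $\dim(X) = 1$ and the frailty is gamma with variance $\theta$ and mean 1, then the marginal intensity is
$$Y(t)\lambda(t)\exp(X^T\beta)\,\frac{1}{1+\theta\exp(X^T\beta)\Lambda(t)},$$
so the relative risk is
$$\exp(\beta)\,\frac{1+\theta\Lambda(t)}{1+\exp(\beta)\theta\Lambda(t)}, \tag{7}$$
where $\Lambda(t) = \int_0^t \lambda(s)\,ds$. The relative risk equals $\exp(\beta)$ at time $t = 0$ and tends to 1 as $t \to \infty$.

If $V$ is positive stable with Laplace transform $\phi_\theta(t) = \exp(-t^{\theta})$, $0 < \theta \le 1$, then the marginal intensity is
$$Y(t)\,\theta\,\lambda(t)\Lambda(t)^{\theta-1}\exp(\theta X^T\beta),$$
so the marginal model is again a proportional hazards model, with regression coefficient $\theta\beta$.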
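The attenuation of the marginal relative risk under gamma frailty can be tabulated directly from the formula $\exp(\beta)(1+\theta\Lambda(t))/(1+\exp(\beta)\theta\Lambda(t))$. The base-R sketch below assumes a unit baseline hazard, so that $\Lambda(t) = t$, and uses illustrative values of $\beta$ and $\theta$; the function name `rel_risk` is hypothetical.

```r
# Sketch: marginal relative risk under gamma frailty with mean 1 and variance theta.
# Lambda(t) = t (unit baseline hazard) is an assumption made for illustration.
rel_risk <- function(t, beta, theta, Lambda = function(t) t) {
  exp(beta) * (1 + theta * Lambda(t)) / (1 + exp(beta) * theta * Lambda(t))
}

beta <- 0.5; theta <- 1
t_grid <- c(0, 1, 5, 50, 1e6)
rr <- rel_risk(t_grid, beta, theta)
round(rr, 3)   # starts at exp(beta) and decreases towards 1
```

For $\beta > 0$ the curve is strictly decreasing, so the conditional effect $\exp(\beta)$ is only seen at $t = 0$ in the marginal model.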

Frailty models, Attenuation

Assume that $V$ is a positive stochastic variable with Laplace transform $\phi_\theta(t)$, that the covariate is one-dimensional, and that $\beta > 0$. The relative risk in the marginal model is
$$\exp(\beta)\,\frac{(D\log\phi_\theta)(\exp((X+1)\beta)\Lambda(t))}{(D\log\phi_\theta)(\exp(X\beta)\Lambda(t))} = \exp(\beta)\,k(t),$$
and we see that $k(t) \le 1$ if and only if
$$(D\log\phi_\theta)(\exp((X+1)\beta)\Lambda(t)) \ge (D\log\phi_\theta)(\exp(X\beta)\Lambda(t)).$$
The latter inequality holds if $\log(\phi_\theta)$ is convex, which is the case since
$$D^2\log(\phi_\theta)(t) = E(V^2 h(t,V)) - E(V h(t,V))^2 \ge 0$$
with $h(t,V) = \exp(-tV)/E(\exp(-tV))$; the right-hand side is a variance of $V$ under the distribution reweighted by $h$.

Frailty models: exercises

Consider the twin.csv data of menarche ages for pairs of twins.

First estimate the marginal effect of cohort, and assess whether or not zygosity affects the marginal models.

Fit a frailty model for the overall data, to assess the effect of cohort. Use the phmm and coxph programs and compare the estimates of the cumulative baseline. Compare also the baseline with that from the marginal model for mono- and dizygotic twins.

Fit separate models for monozygotic and dizygotic twins. Estimate also the marginal baselines using this model, and compare them.

Fit a joint model using phmm and think carefully about how to parametrize this model such that the marginal models make sense.

Is there a genetic effect on the timing of menarche?

Report a Kendall's tau for these data for mono- and dizygotic twins and overall. Is the dependence different for mono- versus dizygotic twins?

How can you make survival predictions for the monozygotic and dizygotic twins?

Frailty models

Two sets of frailty models that are quite similar:

Frailty model
The frailty model has subject-specific regression effects that rely on the choice of frailty (of course).
The frailty parameter can in principle be identified solely from univariate data, because the marginal depends on $\theta$.
In practice the frailty parameter is primarily identified from the correlation.
The frailty parameter has a Kendall's tau interpretation.
The frailty model is easier to extend to various other settings.
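Two-stage model (later today)
The marginals are fixed and the regression parameters give population effects.
The frailty accounts solely for dependence.
The frailty parameter has a Kendall's tau interpretation.

To be explicit, assume that the marginal and conditional intensities are
$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = \bar\lambda_{ik}(t), \qquad \lambda_{ik}^{\mathcal{H}}(t) = V_k\,\lambda_{ik}^{*}(t),$$
where we assume that $\lambda_{ik}^{*}(t)$ is predictable with respect to the marginal filtration. One may show that the relationship between the above two intensities is
$$\bar\lambda_{ik}(t) = Y_{ik}(t)\,(-\lambda_{ik}^{*}(t))\,(D\log\phi_\theta)\Big(\int_0^t \lambda_{ik}^{*}(s)\,ds\Big),$$
$$\lambda_{ik}^{*}(t) = Y_{ik}(t)\,(-\bar\lambda_{ik}(t))\exp\Big(-\int_0^t \bar\lambda_{ik}(s)\,ds\Big)\,(D\phi_\theta^{-1})\Big(\exp\Big(-\int_0^t \bar\lambda_{ik}(s)\,ds\Big)\Big).$$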
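The two displayed relations can be checked numerically for one concrete frailty distribution. The base-R sketch below assumes gamma frailty with mean 1 and variance $1/\theta$, whose Laplace transform is $\phi_\theta(t) = (1+t/\theta)^{-\theta}$, and a constant marginal hazard `a` for an individual who is at risk throughout; both choices, and all function names, are illustrative assumptions. It maps the marginal hazard to the conditional one and back again.

```r
# Sketch: round trip marginal -> conditional -> marginal for gamma frailty.
# phi(t) = (1 + t/theta)^(-theta); a constant marginal hazard a is assumed.
theta <- 2; a <- 0.7
dlogphi <- function(t) -1 / (1 + t / theta)        # D log phi
dphiinv <- function(u) -u^(-1 / theta - 1)         # D phi^{-1}, phi^{-1}(u) = theta*(u^(-1/theta) - 1)

lam_bar <- function(t) rep(a, length(t))           # marginal hazard
cum_bar <- function(t) a * t                       # its integral from 0 to t

# conditional hazard from the marginal one (second relation)
lam_star <- function(t) -lam_bar(t) * exp(-cum_bar(t)) * dphiinv(exp(-cum_bar(t)))
cum_star <- function(t) sapply(t, function(u) integrate(lam_star, 0, u)$value)

# marginal hazard recovered from the conditional one (first relation)
t_grid <- c(0.1, 0.5, 1, 2)
back <- -lam_star(t_grid) * dlogphi(cum_star(t_grid))
rbind(conditional = lam_star(t_grid), recovered_marginal = back)
```

In this gamma case the conditional hazard reduces to `a * exp(a * t / theta)`, and the recovered marginal hazard is constant and equal to `a`.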

Exercise: establish the connection between the two sets of intensities. Hint: compute the marginal survival for both models!

Marginal models

Right-censored failure times: $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, $Y_{ik}(t) = 1(T_{ik} \ge t)$, $N_{ik}(t) = 1(T_{ik} \le t,\ T_{ik} = \tilde T_{ik})$.

The marginal (intensity) model is a Cox model:
$$\mathcal{F}_t^{ik} = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : s \le t\}, \tag{8}$$
$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t)\lambda(t)\exp(X_{ik}^T(t)\beta). \tag{9}$$
It is important to note that (9) is not the intensity with respect to the observed filtration
$$\mathcal{F}_t = \bigvee_k \mathcal{F}_t^k, \tag{10}$$
where $\mathcal{F}_t^k = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : i = 1,\dots,n,\ s \le t\}$ is the information generated by observing all the individuals in the $k$th cluster.

Characterizing dependence: copula models

$$P(\tilde T_1 > t_1,\dots,\tilde T_n > t_n) = C_\theta(S_1(t_1),\dots,S_n(t_n)),$$
where $S_j$, $j = 1,\dots,n$, denotes the marginal survivor functions. Every multivariate distribution has this form.

Archimedean copula model family:
$$C_\theta(u_1,\dots,u_n) = \phi_\theta\big(\phi_\theta^{-1}(u_1) + \dots + \phi_\theta^{-1}(u_n)\big)$$
for some non-negative convex decreasing function $\phi_\theta$ with $\phi_\theta(0) = 1$.

Below we describe the two-stage method for the Clayton-Oakes model with marginal hazards on Cox form.

Two-stage model

Assume random effects $V_k$, $k = 1,\dots,K$, such that $(\tilde T_k, C_k, X_k(\cdot), V_k)$, $k = 1,\dots,K$, are i.i.d. variables.

Censoring, conditional on $V_k$ and covariates, is assumed to be independent and noninformative on $V_k$.

$\tilde T_{ik}$, $i = 1,\dots,n$, are independent given $V_k, X_1(\cdot),\dots,X_n(\cdot)$.

$V_k$ is gamma distributed with mean 1 and variance $\theta^{-1}$.

Let $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, $Y_{ik}(t) = 1(T_{ik} \ge t)$ and $N_{ik}(t) = 1(T_{ik} \le t,\ T_{ik} = \tilde T_{ik})$.
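For gamma frailty with mean 1 and variance $\theta^{-1}$ the Laplace transform is $\phi_\theta(t) = (1+t/\theta)^{-\theta}$, and the Archimedean construction gives the Clayton-Oakes copula in closed form. The base-R sketch below evaluates the copula both ways; the function names `phi`, `phi_inv`, `arch_copula` and `clayton` are hypothetical, and the numerical arguments are illustrative.

```r
# Sketch: Archimedean copula generated by the gamma Laplace transform.
# phi(t) = (1 + t/theta)^(-theta) corresponds to frailty with mean 1, variance 1/theta.
phi <- function(t, theta) (1 + t / theta)^(-theta)
phi_inv <- function(u, theta) theta * (u^(-1 / theta) - 1)

# generic Archimedean form: phi(phi^{-1}(u_1) + ... + phi^{-1}(u_n))
arch_copula <- function(u, theta) phi(sum(phi_inv(u, theta)), theta)

# closed form in the bivariate case
clayton <- function(u1, u2, theta) (u1^(-1 / theta) + u2^(-1 / theta) - 1)^(-theta)

theta <- 2
c(generic = arch_copula(c(0.3, 0.8), theta), closed_form = clayton(0.3, 0.8, theta))
arch_copula(c(0.3, 1), theta)   # a copula has uniform margins: C(u, 1) = u
```

Plugging marginal survivor functions $S_1(t_1), S_2(t_2)$ in for `u1`, `u2` gives the joint survivor function of the pair.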

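Now, with respect to the (unobserved) filtration
$$\mathcal{H}_t = \bigvee_k \mathcal{H}_t^k, \tag{11}$$
where
$$\mathcal{H}_t^k = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s), V_k : i = 1,\dots,n,\ s \le t\},$$
the conditional intensity is
$$\lambda_{ik}^{\mathcal{H}}(t) = V_k\,\lambda_{ik}^{*}(t,\theta,\lambda(\cdot)), \tag{12}$$
referred to as the Clayton-Oakes model, with $\lambda_{ik}^{*}$ chosen so that the marginal intensities are on Cox form
$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t)\lambda(t)\exp(X_{ik}^T(t)\beta). \tag{13}$$
Then $\lambda_{ik}^{*}$ is
$$\lambda_{ik}^{*}(t,\theta,\lambda(\cdot)) = Y_{ik}(t)\lambda(t)\exp(X_{ik}^T(t)\beta)\exp\Big(\theta^{-1}\int_0^t \exp(X_{ik}^T(s)\beta)\lambda(s)\,ds\Big).$$

It is of interest to find the intensities with respect to the observed filtration $\mathcal{F}_t$ given in (10). It can be shown that these are
$$\lambda_{ik}^{\mathcal{F}}(t) = Y_{ik}(t)\lambda(t)\exp(X_{ik}^T(t)\beta)\,f_{ik}(t), \tag{14}$$
where
$$f_{ik}(t) = \frac{(\theta + N_{\cdot k}(t-))\exp\big(\theta^{-1}\int_0^t \lambda(s)\exp(X_{ik}^T(s)\beta)\,ds\big)}{\theta\,f_k(t)},$$
$$f_k(t) = 1 + \sum_{j=1}^n\Big(\exp\Big(\theta^{-1}\int_0^t Y_{jk}(s)\lambda(s)\exp(X_{jk}^T(s)\beta)\,ds\Big) - 1\Big).$$

The observed (partial) log-likelihood function is
$$\sum_{k=1}^K\int_0^\tau \log\Big(1 + \frac{N_{\cdot k}(t-)}{\theta}\Big)dN_{\cdot k}(t) + \sum_{k=1}^K\sum_{i=1}^n\int_0^\tau \log\big(Y_{ik}(t)\tilde\lambda_{ik}(t)\big)\,dN_{ik}(t) - \sum_{k=1}^K\big[\theta + N_{\cdot k}(\tau)\big]\log\Big(1 + \theta^{-1}\sum_{i=1}^n\int_0^\tau Y_{ik}(t)\tilde\lambda_{ik}(t)\,dt\Big), \tag{15}$$
where
$$\tilde\lambda_{ik}(t) = \lambda(t)e^{X_{ik}^T(t)\beta}\exp\Big(\theta^{-1}\int_0^t e^{X_{ik}^T(s)\beta}\lambda(s)\,ds\Big).$$

The terms depending on $\theta$ in (15) give
$$\frac{1}{K}\Big(\sum_{k=1}^K\int_0^\tau \log\big(1 + \theta^{-1}N_{\cdot k}(t-)\big)\,dN_{\cdot k}(t) + \theta^{-1}\sum_{k=1}^K\sum_{i=1}^n N_{ik}(\tau)H_{ik} - \sum_{k=1}^K\big(\theta + N_{\cdot k}(\tau)\big)\log(R_k(\theta))\Big), \tag{16}$$
where
$$H_{ik} = \int_0^\tau Y_{ik}(t)e^{X_{ik}^T(t)\beta}\,d\Lambda(t), \qquad R_k(\theta) = 1 + \sum_{i=1}^n\big(\exp(\theta^{-1}H_{ik}) - 1\big).$$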
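The $\theta$-dependent criterion, built from $\log(1+\theta^{-1}N_{\cdot k}(t-))$ at the jump times, $\theta^{-1}N_{ik}(\tau)H_{ik}$ and $(\theta+N_{\cdot k}(\tau))\log R_k(\theta)$, is simple to code once the cumulative marginal hazards $H_{ik}$ are available. The base-R sketch below is an illustration under stated assumptions, not the timereg implementation: it simulates pairs with unit-exponential margins and no covariates, so that the true $H_{ik}$ equals the observed time and no first-stage estimation is needed, assumes no tied event times within a cluster, and maximizes the criterion with `optimize`. The function name `loglik_theta` and all parameter values are hypothetical.

```r
# Sketch: theta-dependent part of the log-likelihood and its maximizer.
# H: cumulative marginal hazards H_ik; status: N_ik(tau); id: cluster index.
loglik_theta <- function(theta, H, status, id) {
  Nk <- tapply(status, id, sum)                        # N_.k(tau)
  # with no ties inside a cluster, N_.k(t-) at the jump times is 0, 1, ..., N_.k(tau) - 1
  jump_term <- sum(sapply(Nk, function(m) sum(log(1 + (seq_len(m) - 1) / theta))))
  Rk <- 1 + tapply(exp(H / theta) - 1, id, sum)        # R_k(theta)
  (jump_term + sum(status * H) / theta - sum((theta + Nk) * log(Rk))) / length(Nk)
}

# Simulated Clayton-Oakes pairs with unit-exponential margins (illustrative values)
set.seed(3)
K <- 3000; n <- 2; theta0 <- 2
id <- rep(seq_len(K), each = n)
V <- rgamma(K, shape = theta0, rate = theta0)[id]      # mean 1, variance 1/theta0
E <- rexp(K * n)
Ttilde <- theta0 * log(1 + E / (V * theta0))           # inverts V * theta0 * (exp(t/theta0) - 1)
time <- pmin(Ttilde, 2)                                # administrative censoring at t = 2
status <- as.numeric(Ttilde <= 2)
H <- time                                              # Lambda(t) = t, no covariates

fit <- optimize(loglik_theta, interval = c(0.1, 20),
                H = H, status = status, id = id, maximum = TRUE)
theta_hat <- fit$maximum
c(theta_hat = theta_hat, kendall_tau = 1 / (1 + 2 * theta_hat))
```

In a real two-stage analysis `H` would be replaced by the estimate obtained from the fitted marginal Cox model, and standard errors would have to account for that first stage.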

Now, by replacing $H_{ik}$ with
$$\hat H_{ik} = \int_0^\tau Y_{ik}(t)\exp(X_{ik}^T(t)\hat\beta_I)\,d\hat\Lambda_I(t)$$
in (16), we obtain the pseudo log-likelihood for $\theta$, and maximizing this function in $\theta$ gives the two-stage estimator of $\theta$.

Under some regularity conditions, [?] showed consistency and asymptotic normality of this estimator.

Two-stage model: example

    data(diabetes)
    # Marginal Cox model with treat as covariate
    fit<-two.stage(Surv(time,status)~prop(treat)+cluster(id),
                   data=diabetes,Nit=4,theta=1)
    summary(fit)

    Dependence parameter for Clayton-Oakes-Glidden model
              Variance    SE     z    P-val  Kendall's tau
    intercept    0.962 0.355  2.71  0.00667          0.325

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.    SE Robust SE D2log(L)^-1     z   P-val
    prop(treat) -0.777 0.169     0.147       0.169 -4.59 4.4e-06

    Call: two.stage(Surv(time, status) ~ prop(treat) + cluster(id),
                    data = diabetes, Nit = 4, theta = 1)

    # Stratification after adult
    theta.des<-model.matrix(~-1+factor(adult),diabetes);
    fit.s2<-two.stage(Surv(time,status)~+1+prop(treat)+cluster(id),
                      data=diabetes,Nit=4,theta=1,theta.des=theta.des)
    summary(fit.s2)

    Dependence parameter for Clayton-Oakes-Glidden model
                   Variance    SE     z   P-val  Kendall's tau
    factor(adult)1    0.915 0.399  2.29  0.0219          0.314
    factor(adult)2    1.08  0.722  1.5   0.133           0.352

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.    SE Robust SE D2log(L)^-1     z   P-val
    prop(treat) -0.777 0.169     0.147       0.169 -4.59 4.4e-06

    # test for same variance among the two strata
    theta.des<-model.matrix(~factor(adult),diabetes);
    fit.s3<-two.stage(Surv(time,status)~+1+prop(treat)+cluster(id),
                      data=diabetes,Nit=4,theta=1,theta.des=theta.des)
    summary(fit.s3)

    Dependence parameter for Clayton-Oakes-Glidden model
                   Variance    SE      z   P-val  Kendall's tau
    (Intercept)       0.915 0.399   2.29  0.0219         0.314
    factor(adult)2    0.17  0.815  0.208  0.835          0.0782

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.    SE Robust SE D2log(L)^-1     z   P-val
    prop(treat) -0.777 0.169     0.147       0.169 -4.59 4.4e-06

    # to fit model without covariates, beta.fixed=1, but still need prop term!
    fit<-two.stage(Surv(time,status)~prop(treat)+cluster(id),
                   data=diabetes,theta=0.95,detail=0,beta.fixed=1)
    summary(fit)

    Dependence parameter for Clayton-Oakes-Glidden model
              Variance    SE    z   P-val  Kendall's tau
    intercept    0.584 0.278  2.1  0.0357          0.226
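The Kendall's tau column in these summaries follows from the variance column: with frailty variance $\theta^{-1}$, $\tau = 1/(1+2\theta) = \text{variance}/(\text{variance}+2)$. The short base-R sketch below applies this conversion to three of the reported variance estimates; the helper name `kendall_tau` is hypothetical.

```r
# Sketch: Kendall's tau implied by a gamma frailty variance (variance = 1/theta).
kendall_tau <- function(variance) variance / (variance + 2)

variance_hat <- c(0.962, 0.915, 0.584)   # variance estimates from the summaries
tau_hat <- round(kendall_tau(variance_hat), 3)
tau_hat   # 0.325 0.314 0.226
```

The map is increasing in the variance, so a larger frailty variance always means stronger within-cluster dependence.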

Two-stage model: exercises

Consider the twin data.

Fit marginal models for the mono- and the dizygotic twins.

Report a Kendall's tau. How does this Kendall's tau compare with that from the standard frailty model?

Validate the fit of the model by validating the marginal model.

Compare formally the dependence between mono- and dizygotic twins.