Frailty modelling of Multivariate Survival Data


Thomas Scheike (ts@biostat.ku.dk), Department of Biostatistics, University of Copenhagen. October 21.

Outline

Marginal versus frailty models. Two-stage frailty models: copula modelling. Example and exercises.

Clustered survival data

For $k = 1, \dots, K$ and $i = 1, \dots, n$, let $\tilde T_{ik}$ and $C_{ik}$ be the failure and censoring times for the $i$th individual in the $k$th cluster, and let $X_{ik}(t)$ be a $p$-vector of covariates. Put $\tilde T_k = (\tilde T_{1k}, \dots, \tilde T_{nk})$, $C_k = (C_{1k}, \dots, C_{nk})$ and $X_k(t) = (X_{1k}(t), \dots, X_{nk}(t))$.

We assume that $(\tilde T_k, C_k, X_k(\cdot))$, $k = 1, \dots, K$, are independent and identically distributed and follow the model described in the following. The right-censored failure time is denoted $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, and as usual we let $Y_{ik}(t) = 1(T_{ik} \ge t)$.

Clustered survival data

Two basic approaches:

1) Frailty models: given the random effect $Z$, the survival times are independent with hazards $\lambda(t) Z \exp(X_i^T \beta)$.

2) Marginal approach: given covariates, the marginal intensity is $Y_i(t) \lambda(t) \exp(X_i^T \beta)$ (an intensity is needed to construct the likelihood of the data). This gives population regression parameters, but one can also characterize the dependence using a copula (or an underlying frailty distribution). Assume, for example, that given the frailty $Z$ the intensities are $Z h(\Lambda, \exp(X_i^T \beta))$ for a specific choice of $h$. More on this later.

Frailty models

$(\tilde T_k, C_k, X_k(\cdot), V_k)$, $k = 1, \dots, K$, are assumed to be i.i.d. variables. Censoring, conditional on $V_k$ and covariates, is assumed to be independent and noninformative on $V_k$. The shared frailty model is specified solely with respect to the (unobserved) filtration $\mathcal{H}_t = \bigvee_k \mathcal{H}_t^k$, where

$$\mathcal{H}_t^k = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s), V_k : i = 1, \dots, n,\ s \le t\},$$

and often it is assumed to be a proportional hazards model

$$\lambda_{ik}^{\mathcal{H}}(t) = Y_{ik}(t) V_k \lambda(t) \exp(X_{ik}^T(t)\beta). \tag{1}$$

When the frailty variable $V_k$ is gamma-distributed with mean one and variance $\theta^{-1}$ we get the Clayton-Oakes model.

Characterizing dependence

We shall talk about estimation for the random effects models, but first some general remarks on dependence measures. Given two survival times $(T_1, T_2)$, we can compute the Pearson correlation, the Spearman correlation, and Kendall's tau. These are different ways of interpreting the degree of association. The Spearman correlation depends on the dependence structure of the bivariate distribution, but not on the marginals. The same property holds for Kendall's tau. Both measures are rank-based.

Kendall's tau

Kendall's tau is defined from an i.i.d. copy $(T_1', T_2')$ of $(T_1, T_2)$ and measures the degree of agreement between the two copies:

$$\tau = E\bigl(I((T_1 - T_1')(T_2 - T_2') > 0)\bigr) - E\bigl(I((T_1 - T_1')(T_2 - T_2') < 0)\bigr),$$

which is just the concordance probability minus the discordance probability.

Given a bivariate distribution written on copula form, $F(s, t) = C(F_1(s), F_2(t))$, where $C$ is the copula (a bivariate distribution on $[0, 1]^2$) and $F_j$ is the $j$th marginal, Kendall's tau is computed as

$$\tau = 4 \int_0^1 \int_0^1 C(u, v)\, C(du, dv) - 1.$$

If

$$\lambda_{ik}^{\mathcal{H}}(t) = Y_{ik}(t) V_k \lambda(t) \exp(X_{ik}^T(t)\beta) \tag{2}$$

with $V_k$ i.i.d. gamma with mean 1 and variance $1/\theta$, then

$$\tau = \frac{1}{1 + 2\theta},$$

in the sense that this is the Kendall's tau we see with fixed covariates.
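
The relation between the gamma frailty variance and Kendall's tau can be checked by hand. The sketch below is not from the slides: it assumes the standard generator formula for Kendall's tau of an Archimedean copula, and the symbols $a$ (the frailty variance, so $a = 1/\theta$ in the notation above) and $g$ (the inverse of the Laplace transform) are introduced here for illustration only.

```latex
% Gamma frailty with mean 1 and variance a has Laplace transform
%   \varphi(s) = (1 + a s)^{-1/a},
% with inverse (the Archimedean generator)
%   g(u) = \varphi^{-1}(u) = (u^{-a} - 1)/a,  \qquad g'(u) = -u^{-a-1}.
\begin{align*}
\tau &= 1 + 4\int_0^1 \frac{g(u)}{g'(u)}\,du
      = 1 - \frac{4}{a}\int_0^1 \bigl(u - u^{a+1}\bigr)\,du \\
     &= 1 - \frac{4}{a}\Bigl(\frac{1}{2} - \frac{1}{a+2}\Bigr)
      = 1 - \frac{2}{a+2}
      = \frac{a}{a+2}.
\end{align*}
% With a = 1/\theta this is \tau = 1/(1 + 2\theta).
```

Under these assumptions a larger frailty variance gives stronger within-cluster dependence, and $\tau \to 0$ as the variance tends to zero.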

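The frailty model

When the $V_k$ are i.i.d. gamma with mean 1 and variance $1/\theta$, the observed intensity given covariates becomes

$$Y_{ik}(t)\, \frac{1 + \theta^{-1} N_{\cdot k}(t-)}{1 + \theta^{-1} \sum_{j=1}^n \exp(X_{jk}^T(t)\beta)\, \Lambda(T_{jk} \wedge t)}\, \lambda(t) \exp(X_{ik}^T(t)\beta), \tag{3}$$

where $N_{\cdot k} = \sum_i N_{ik}$. The marginal intensity becomes

$$Y_{ik}(t)\, \frac{1 + \theta^{-1} N_{ik}(t-)}{1 + \theta^{-1} \exp(X_{ik}^T(t)\beta)\, \Lambda(T_{ik} \wedge t)}\, \lambda(t) \exp(X_{ik}^T(t)\beta). \tag{4}$$

Estimation: NPMLE for frailty model

The likelihood given the frailties is

$$\prod_k \prod_i \prod_{s \le t} \bigl(V_k\, d\Lambda(s) \exp(X_{ik}^T \beta)\bigr)^{\Delta N_{ik}(s)} \exp\bigl(-V_k Y_{ik}(s) \exp(X_{ik}^T \beta)\, d\Lambda(s)\bigr), \tag{5}$$

and with $V_k$ gamma distributed with mean 1 and variance $1/\theta$, the frailty density $f$ gives the additional term

$$\prod_k f(V_k). \tag{6}$$

Frailty models

For fixed $\theta$, write up the EM algorithm for this model. How do we get standard errors for this model? To compute the E-step we need $E(V_k \mid \text{Data})$, and this conditional mean (and distribution) can be found from the full likelihood. Recall that a gamma distribution with parameters $\lambda, \alpha$, written $\Gamma(\lambda, \alpha)$, has mean $\lambda\alpha$, variance $\lambda\alpha^2$ and density $(\Gamma(\lambda)\alpha^{\lambda})^{-1} x^{\lambda - 1} \exp(-x/\alpha)$. Write up the full EM algorithm for the model where $\theta$ is not known. How does left truncation affect the computations?

Frailty models, Attenuation

    > library(survival)
    > fit <- coxph(Surv(time, status) ~ adult * trt + frailty(id), data = diabetes)
    > fit
    Call:
    coxph(formula = Surv(time, status) ~ adult * trt + frailty(id), data = diabetes)

                  coef    se(coef)  se2     Chisq   DF    p
    adult          0.397  0.259     0.250     2.35   1.0  0.13
    trt           -0.506  0.225     0.221     5.03   1.0  0.025
    frailty(id)                             122.54  88.6  0.0098
    adult:trt     -0.985  0.362     0.355     7.41   1.0  0.0065

    Iterations: 6 outer, 25 Newton-Raphson
    Variance of random effect= 0.926   I-likelihood = -847

If $\dim(X) = 1$ and the frailty is gamma with variance $\theta$ and mean 1, then the marginal intensity is

$$Y(t) \lambda(t) \exp(X^T \beta)\, \frac{1}{1 + \theta \exp(X^T \beta) \Lambda(t)},$$

so the relative risk for a one-unit increase from $X = 0$ is

$$\exp(\beta)\, \frac{1 + \theta \Lambda(t)}{1 + \exp(\beta)\, \theta \Lambda(t)}, \tag{7}$$

where $\Lambda(t) = \int_0^t \lambda(s)\, ds$. The relative risk is $\exp(\beta)$ at time $t = 0$ and tends to 1 as $t \to \infty$.

If $V$ is positive stable with Laplace transform $\varphi_\theta(t) = \exp(-t^{\theta})$, $0 < \theta \le 1$, then the marginal intensity is

$$Y(t)\, \theta \lambda(t) \Lambda(t)^{\theta - 1} \exp(\theta X^T \beta).$$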
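
Both marginal intensities quoted above (gamma frailty and positive stable frailty) follow from one short calculation with the Laplace transform of the frailty. The following is a sketch under the assumption of a time-fixed covariate $X$ and a single individual; the symbols $\alpha(t \mid X)$ for the marginal hazard and $\varphi$ for the Laplace transform of $V$ are notation chosen here.

```latex
% Conditional on V the hazard is V \lambda(t) e^{X^T\beta}, so the marginal survival is
%   P(T > t \mid X) = E \exp(-V e^{X^T\beta}\Lambda(t)) = \varphi(e^{X^T\beta}\Lambda(t)).
\begin{align*}
\alpha(t \mid X)
  &= -\frac{d}{dt}\log \varphi\bigl(e^{X^T\beta}\Lambda(t)\bigr)
   = -(D\log\varphi)\bigl(e^{X^T\beta}\Lambda(t)\bigr)\, e^{X^T\beta}\lambda(t). \\[4pt]
\text{Gamma, mean 1, variance } \theta:\quad
  &\varphi(s) = (1+\theta s)^{-1/\theta},\quad -(D\log\varphi)(s) = \frac{1}{1+\theta s}, \\
  &\alpha(t \mid X) = \frac{\lambda(t)\, e^{X^T\beta}}{1 + \theta e^{X^T\beta}\Lambda(t)}. \\[4pt]
\text{Positive stable}:\quad
  &\varphi(s) = \exp(-s^{\theta}),\quad -(D\log\varphi)(s) = \theta s^{\theta-1}, \\
  &\alpha(t \mid X) = \theta\,\lambda(t)\,\Lambda(t)^{\theta-1}\, e^{\theta X^T\beta}.
\end{align*}
```

In the gamma case the ratio of $\alpha(t \mid X = 1)$ to $\alpha(t \mid X = 0)$ reproduces (7). In the positive stable case the marginal model is again a proportional hazards model, with coefficient $\theta\beta$ in place of $\beta$.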

Frailty models, Attenuation

Assume that $V$ is a positive stochastic variable with Laplace transform $\varphi_\theta(t)$, that the covariate is one-dimensional, and that $\beta > 0$. The relative risk in the marginal model is

$$\exp(\beta)\, \frac{(D \log \varphi_\theta)(\exp((X + 1)\beta) \Lambda(t))}{(D \log \varphi_\theta)(\exp(X\beta) \Lambda(t))} = \exp(\beta)\, k(t),$$

and we see that $k(t) \le 1$ if and only if

$$(D \log \varphi_\theta)(\exp((X + 1)\beta) \Lambda(t)) \ge (D \log \varphi_\theta)(\exp(X\beta) \Lambda(t)).$$

The latter inequality holds if $\log(\varphi_\theta)$ is convex, which is the case since

$$D^2 \log(\varphi_\theta)(t) = E(V^2 h(t, V)) - E(V h(t, V))^2$$

with $h(t, V) = \exp(-tV) / E(\exp(-tV))$.

Frailty models

Consider the twin.csv data of menarche ages for pairs of twins.

1) First estimate the marginal effect of cohort, and assess whether or not zygosity affects the marginal models.

2) Fit a frailty model for the overall data to assess the effect of cohort. Use the phmm and coxph programs and compare the estimates of the cumulative baseline. Compare also the baseline with that from the marginal model for mono- and dizygotic twins.

3) Fit separate models for monozygotic and dizygotic twins. Estimate also the marginal baselines using this model, and compare them.

4) Fit a joint model using phmm, and think carefully about how to parametrize this model such that the marginal models make sense.

5) Is there a genetic effect on the timing of menarche? Report a Kendall's tau for this data for mono- and dizygotic twins and overall. Is the dependence different for mono- versus dizygotic twins?

6) How can you make survival predictions for the monozygotic and dizygotic twins?

Frailty models

The two sets of frailty models are quite similar.

Frailty model:
The frailty model has subject-specific regression effects, which rely on the choice of frailty (of course).
The frailty parameter can in principle be identified solely from univariate data, because the marginal depends on $\theta$.
In practice the frailty parameter is primarily identified from the correlation.
The frailty parameter has a Kendall's tau interpretation.
The frailty model is easier to extend to various other settings.

Two-stage model (later today):
The marginals are fixed and the regression parameters give population effects.
The frailty accounts solely for dependence.
The frailty parameter has a Kendall's tau interpretation.

To be explicit, assume that the marginal and conditional intensities are

$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = \lambda_{ik}(t), \qquad \lambda_{ik}^{\mathcal{H}}(t) = V_k \lambda_{ik}^*(t),$$

where we assume that $\lambda_{ik}^*(t)$ is predictable with respect to the marginal filtration. One may show that the relationship between the above two intensities is

$$\lambda_{ik}(t) = Y_{ik}(t)\, (-\lambda_{ik}^*(t))\, (D \log \varphi_\theta)\Bigl(\int_0^t \lambda_{ik}^*(s)\, ds\Bigr),$$

$$\lambda_{ik}^*(t) = Y_{ik}(t)\, (-\lambda_{ik}(t)) \exp\Bigl(-\int_0^t \lambda_{ik}(s)\, ds\Bigr) (D \varphi_\theta^{-1})\Bigl(\exp\Bigl(-\int_0^t \lambda_{ik}(s)\, ds\Bigr)\Bigr).$$

Establish the connection between the two sets of intensities. Hint: compute the marginal survival for both models!

Marginal models

Right-censored failure times: $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, $Y_{ik}(t) = 1(T_{ik} \ge t)$, $N_{ik}(t) = 1(T_{ik} \le t, T_{ik} = \tilde T_{ik})$. The marginal (intensity) model is a Cox model:

$$\mathcal{F}_t^{ik} = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : s \le t\}, \tag{8}$$

$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t) \lambda(t) \exp(X_{ik}^T(t)\beta). \tag{9}$$

It is important to note that (9) is not the intensity with respect to the observed filtration

$$\mathcal{F}_t = \bigvee_k \mathcal{F}_t^k, \tag{10}$$

where $\mathcal{F}_t^k = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : i = 1, \dots, n,\ s \le t\}$ is the information generated by observing all the individuals in the $k$th cluster.

Characterizing dependence: copula models

$$P(\tilde T_1 > t_1, \dots, \tilde T_n > t_n) = C_\theta(S_1(t_1), \dots, S_n(t_n)),$$

where $S_j$, $j = 1, \dots, n$, denote the marginal survivor functions. Every multivariate distribution has this form. The Archimedean copula family is

$$C_\theta(u_1, \dots, u_n) = \varphi_\theta\bigl(\varphi_\theta^{-1}(u_1) + \dots + \varphi_\theta^{-1}(u_n)\bigr)$$

for some non-negative convex decreasing function $\varphi_\theta$ with $\varphi_\theta(0) = 1$.

Below we describe the two-stage method for the Clayton-Oakes model with marginal hazards on Cox form. Assume random effects $V_k$, $k = 1, \dots, K$, such that:

$(\tilde T_k, C_k, X_k(\cdot), V_k)$, $k = 1, \dots, K$, are i.i.d. variables.
Censoring, conditional on $V_k$ and covariates, is assumed to be independent and noninformative on $V_k$.
$\tilde T_{ik}$, $i = 1, \dots, n$, are independent given $V_k, X_1(\cdot), \dots, X_n(\cdot)$.
$V_k$ is gamma distributed with mean 1 and variance $\theta^{-1}$.

Let $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, $Y_{ik}(t) = 1(T_{ik} \ge t)$ and $N_{ik}(t) = 1(T_{ik} \le t, T_{ik} = \tilde T_{ik})$.
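
To make the Archimedean form concrete for the gamma frailty used in the Clayton-Oakes model, the copula can be written out explicitly. This is a worked sketch rather than part of the slides; it takes $\varphi_\theta$ to be the Laplace transform of a gamma variable with mean 1 and variance $\theta^{-1}$, as assumed above.

```latex
% Laplace transform of a gamma variable with mean 1 and variance 1/\theta
% (shape \theta, rate \theta):
%   \varphi_\theta(s) = (1 + s/\theta)^{-\theta}, \qquad \varphi_\theta(0) = 1,
% with inverse
%   \varphi_\theta^{-1}(u) = \theta\,(u^{-1/\theta} - 1).
\begin{align*}
C_\theta(u_1,\dots,u_n)
  &= \varphi_\theta\Bigl(\sum_{j=1}^n \varphi_\theta^{-1}(u_j)\Bigr)
   = \Bigl(1 + \sum_{j=1}^n \bigl(u_j^{-1/\theta} - 1\bigr)\Bigr)^{-\theta} \\
  &= \Bigl(\sum_{j=1}^n u_j^{-1/\theta} - (n-1)\Bigr)^{-\theta}.
\end{align*}
```

As $\theta \to \infty$ the frailty variance tends to zero and $C_\theta(u_1, \dots, u_n) \to u_1 \cdots u_n$, the independence copula.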

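Now, with respect to the (unobserved) filtration

$$\mathcal{H}_t = \bigvee_k \mathcal{H}_t^k, \tag{11}$$

where $\mathcal{H}_t^k = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s), V_k : i = 1, \dots, n,\ s \le t\}$, the intensity is

$$\lambda_{ik}^{\mathcal{H}}(t) = V_k \lambda_{ik}^*(t, \theta, \lambda(\cdot)), \tag{12}$$

referred to as the [...], and chosen so that the marginal intensities are on Cox form

$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t) \lambda(t) \exp(X_{ik}^T(t)\beta). \tag{13}$$

Then $\lambda_{ik}^*$ is

$$\lambda_{ik}^*(t, \theta, \lambda(\cdot)) = Y_{ik}(t) \lambda(t) \exp(X_{ik}^T(t)\beta) \exp\Bigl(\theta^{-1} \int_0^t \exp(X_{ik}^T(s)\beta) \lambda(s)\, ds\Bigr).$$

It is of interest to find the intensities with respect to the observed filtration $\mathcal{F}_t$ given in (10). It can be shown that these are

$$\lambda_{ik}^{\mathcal{F}}(t) = Y_{ik}(t) \lambda(t) \exp(X_{ik}^T(t)\beta) f_{ik}(t), \tag{14}$$

where

$$f_{ik}(t) = \frac{(\theta + N_{\cdot k}(t-)) \exp\bigl(\theta^{-1} \int_0^t \lambda(s) \exp(X_{ik}^T(s)\beta)\, ds\bigr)}{\theta f_k(t)},$$

$$f_k(t) = 1 + \sum_{j=1}^n \Bigl(\exp\Bigl(\theta^{-1} \int_0^t Y_{jk}(s) \lambda(s) \exp(X_{jk}^T(s)\beta)\, ds\Bigr) - 1\Bigr).$$

The observed (partial) log-likelihood function is

$$\sum_{k=1}^K \int_0^\tau \log\Bigl(1 + \frac{N_{\cdot k}(t-)}{\theta}\Bigr) dN_{\cdot k}(t) + \sum_{k=1}^K \sum_{i=1}^n \int_0^\tau \log\bigl(Y_{ik}(t) \tilde\lambda_{ik}(t)\bigr)\, dN_{ik}(t) - \sum_{k=1}^K \bigl[\theta + N_{\cdot k}(\tau)\bigr] \log\Bigl(1 + \theta^{-1} \sum_{i=1}^n \int_0^\tau Y_{ik}(t) \tilde\lambda_{ik}(t)\, dt\Bigr), \tag{15}$$

where $\tilde\lambda_{ik}(t) = \lambda(t) e^{X_{ik}^T(t)\beta} \exp\bigl(\theta^{-1} \int_0^t e^{X_{ik}^T(s)\beta} \lambda(s)\, ds\bigr)$.

The terms depending on $\theta$ in (15) give

$$\frac{1}{K} \Bigl( \sum_{k=1}^K \int_0^\tau \log\bigl(1 + \theta^{-1} N_{\cdot k}(t-)\bigr)\, dN_{\cdot k}(t) + \theta^{-1} \sum_{k=1}^K \sum_{i=1}^n N_{ik}(\tau) H_{ik} - \sum_{k=1}^K (\theta + N_{\cdot k}(\tau)) \log(R_k(\theta)) \Bigr), \tag{16}$$

where

$$H_{ik} = \int_0^\tau Y_{ik}(t) e^{X_{ik}^T(t)\beta}\, d\Lambda(t), \qquad R_k(\theta) = 1 + \sum_{i=1}^n \bigl(\exp(\theta^{-1} H_{ik}) - 1\bigr).$$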
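
The step from (15) to (16) rests on one integral that can be evaluated in closed form. The following sketch fills in that step; the shorthand $a_{ik}$ and $A_{ik}$ is introduced here and is not in the slides, and the calculation assumes each individual is at risk from time 0 until $T_{ik}$ (no left truncation).

```latex
% Shorthand (introduced for this sketch):
%   a_{ik}(t) = \lambda(t) e^{X_{ik}^T(t)\beta}, \qquad A_{ik}(t) = \int_0^t a_{ik}(s)\,ds,
% so that \tilde\lambda_{ik}(t) = a_{ik}(t) \exp(\theta^{-1} A_{ik}(t))
% and H_{ik} = A_{ik}(T_{ik} \wedge \tau).
\begin{align*}
\frac{d}{dt}\Bigl[\theta \exp\bigl(\theta^{-1}A_{ik}(t)\bigr)\Bigr] &= \tilde\lambda_{ik}(t)
\quad\Longrightarrow\quad
\int_0^\tau Y_{ik}(t)\tilde\lambda_{ik}(t)\,dt = \theta\bigl(\exp(\theta^{-1}H_{ik}) - 1\bigr), \\
1 + \theta^{-1}\sum_{i=1}^n \int_0^\tau Y_{ik}(t)\tilde\lambda_{ik}(t)\,dt
  &= 1 + \sum_{i=1}^n \bigl(\exp(\theta^{-1}H_{ik}) - 1\bigr) = R_k(\theta), \\
\log \tilde\lambda_{ik}(T_{ik}) &= \log a_{ik}(T_{ik}) + \theta^{-1} H_{ik}
\quad \text{at an observed event time.}
\end{align*}
```

The first term of $\log \tilde\lambda_{ik}$ does not involve $\theta$, which is why only $\theta^{-1} N_{ik}(\tau) H_{ik}$ survives in (16).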

Now, by replacing $H_{ik}$ with

$$\hat H_{ik} = \int_0^\tau Y_{ik}(t) \exp(X_{ik}^T(t) \hat\beta_I)\, d\hat\Lambda_I(t)$$

in (16), we obtain the pseudo log-likelihood for $\theta$, and maximizing this function in $\theta$ gives the two-stage estimator of $\theta$. Under some regularity conditions, [?] showed consistency and asymptotic normality of this estimator.

    library(timereg)
    data(diabetes)
    # Marginal Cox model with treat as covariate
    fit <- two.stage(Surv(time, status) ~ prop(treat) + cluster(id),
                     data = diabetes, Nit = 4, theta = 1)
    summary(fit)

    Dependence parameter for Clayton-Oakes-Glidden model
               Variance  SE     z     P-val    Kendall's tau
    intercept  0.962     0.355  2.71  0.00667  0.325

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.   SE     Robust SE  D2log(L)^-1  z      P-val
    prop(treat)  -0.777  0.169  0.147      0.169        -4.59  4.4e-6

    Call:
    two.stage(Surv(time, status) ~ prop(treat) + cluster(id), data = diabetes, Nit = 4, theta = 1)

    # Stratification after adult
    theta.des <- model.matrix(~ -1 + factor(adult), diabetes)
    fit.s2 <- two.stage(Surv(time, status) ~ +1 + prop(treat) + cluster(id),
                        data = diabetes, Nit = 4, theta = 1, theta.des = theta.des)
    summary(fit.s2)

    Dependence parameter for Clayton-Oakes-Glidden model
                    Variance  SE     z     P-val   Kendall's tau
    factor(adult)1  0.915     0.399  2.29  0.0219  0.314
    factor(adult)2  1.08      0.722  1.50  0.133   0.352

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.   SE     Robust SE  D2log(L)^-1  z      P-val
    prop(treat)  -0.777  0.169  0.147      0.169        -4.59  4.4e-6

    # test for same variance among the two strata
    theta.des <- model.matrix(~ factor(adult), diabetes)
    fit.s3 <- two.stage(Surv(time, status) ~ +1 + prop(treat) + cluster(id),
                        data = diabetes, Nit = 4, theta = 1, theta.des = theta.des)
    summary(fit.s3)

    Dependence parameter for Clayton-Oakes-Glidden model
                    Variance  SE     z      P-val   Kendall's tau
    (Intercept)     0.915     0.399  2.29   0.0219  0.314
    factor(adult)2  0.17      0.815  0.208  0.835   0.0782

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.   SE     Robust SE  D2log(L)^-1  z      P-val
    prop(treat)  -0.777  0.169  0.147      0.169        -4.59  4.4e-6

    # to fit model without covariates, beta.fixed=1, but still need prop term!
    fit <- two.stage(Surv(time, status) ~ prop(treat) + cluster(id),
                     data = diabetes, theta = 0.95, detail = 0, beta.fixed = 1)
    summary(fit)

    Dependence parameter for Clayton-Oakes-Glidden model
               Variance  SE     z     P-val   Kendall's tau
    intercept  0.584     0.278  2.10  0.0357  0.226
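
The Kendall's tau column in the output above can be reproduced from the Variance column. The arithmetic below is a hedged check, assuming the reported Variance is the gamma frailty variance $1/\theta$, so that $\tau = 1/(1 + 2\theta)$ becomes variance/(variance + 2); the symbol $v$ is introduced here for that variance.

```latex
% v = estimated frailty variance (= 1/\theta); then \tau = v / (v + 2).
\begin{align*}
v = 0.962 &: \quad \tau = \frac{0.962}{2.962} \approx 0.325, \\
v = 0.915 &: \quad \tau = \frac{0.915}{2.915} \approx 0.314, \\
v = 0.584 &: \quad \tau = \frac{0.584}{2.584} \approx 0.226.
\end{align*}
```

These agree with the printed values to three decimals, which supports reading the dependence parameter as a frailty variance on the original (not log) scale.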

Consider the twin data.

1) Fit marginal models for the mono- and the dizygotic twins. Report a Kendall's tau. How does this Kendall's tau compare with that from the standard frailty model?

2) Validate the fit of the model by validating the marginal model.

3) Compare formally the dependence between mono- and dizygotic twins.