Frailty modelling of Multivariate Survival Data
Thomas Scheike, ts@biostat.ku.dk
Department of Biostatistics, University of Copenhagen
November 2012

Outline
- Marginal versus frailty models.
- Two-stage frailty models: copula modelling.
- Example and exercises.

Clustered survival data
For $k = 1, \dots, K$ and $i = 1, \dots, n$, let $\tilde T_{ik}$ and $C_{ik}$ be the failure and censoring times for the $i$th individual in the $k$th cluster, and let $X_{ik}(t)$ be a $p$-vector of covariates. Put $\tilde T_k = (\tilde T_{1k}, \dots, \tilde T_{nk})$, $C_k = (C_{1k}, \dots, C_{nk})$ and $X_k(t) = (X_{1k}(t), \dots, X_{nk}(t))$. We assume that $(\tilde T_k, C_k, X_k(\cdot))$, $k = 1, \dots, K$, are independent and identically distributed and follow the model described below. The right-censored failure time is denoted $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, and as usual we let $Y_{ik}(t) = 1(T_{ik} \ge t)$.

Two basic approaches:
1) Frailty models: given the random effect $Z$, the survival times are independent with hazards $\lambda_0(t) Z \exp(X_i^T \beta)$.
2) Marginal approach: given covariates, the marginal intensity is $Y_i(t)\lambda_0(t)\exp(X_i^T\beta)$ (we need the intensity, not just the hazard, to construct the likelihood of the data).
The marginal approach gives population-level regression parameters, but one can still characterize the dependence using a copula (or an underlying frailty distribution). Assume, for example, that given the frailty $Z$ the intensities are $Z\,h(\Lambda_0, \exp(X_i^T\beta))$ for a specific choice of $h$. More on this later.
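A minimal R sketch of the data structure just described, simulating from approach 1). The helper `sim_shared_gamma`, the constant baseline hazard `lambda0`, the single binary covariate and the uniform censoring on `[0, cens_max]` are illustrative assumptions, not part of the slides.

```r
# Hypothetical helper: simulate K clusters of size n from a shared gamma
# frailty model with a constant baseline hazard lambda0 (assumption).
sim_shared_gamma <- function(K, n, theta, beta, lambda0 = 1, cens_max = 3) {
  V  <- rgamma(K, shape = theta, rate = theta)  # frailty: mean 1, variance 1/theta
  id <- rep(seq_len(K), each = n)               # cluster label k = 1, ..., K
  x  <- rbinom(K * n, 1, 0.5)                   # one binary covariate
  # Given V_k, failure times are independent exponentials with hazard
  # V_k * lambda0 * exp(x * beta)
  T_true <- rexp(K * n, rate = V[id] * lambda0 * exp(beta * x))
  C      <- runif(K * n, 0, cens_max)           # independent censoring
  data.frame(id = id, x = x,
             time = pmin(T_true, C),            # T_ik = min(T~_ik, C_ik)
             status = as.integer(T_true <= C))  # event indicator
}

set.seed(1)
d <- sim_shared_gamma(K = 200, n = 2, theta = 2, beta = 0.5)
head(d)
```

The simulated data could then be analysed with, for example, `survival::coxph(Surv(time, status) ~ x + frailty(id), data = d)` for the frailty model and compared with a marginal Cox fit.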

Frailty models
$(\tilde T_k, C_k, X_k(\cdot), V_k)$, $k = 1, \dots, K$, are assumed to be i.i.d. Censoring, conditional on $V_k$ and the covariates, is assumed to be independent and noninformative on $V_k$. The shared frailty model is specified solely with respect to the (unobserved) filtration $\mathcal H_t = \vee_k \mathcal H^k_t$, where $\mathcal H^k_t = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s), V_k : i = 1, \dots, n;\ s \le t\}$, and it is often assumed to be a proportional hazards model:
$$\lambda^{\mathcal H}_{ik}(t) = Y_{ik}(t) V_k \lambda_0(t) \exp(X_{ik}^T(t)\beta). \qquad (1)$$
When the frailty variable $V_k$ is gamma-distributed with mean one and variance $\theta^{-1}$, we get the Clayton-Oakes model.

Characterizing dependence
We shall talk about estimation for the random effects models, but first some general remarks on dependence measures. Given two survival times $(T_1, T_2)$, we can compute the Pearson correlation, the Spearman correlation and Kendall's tau, which give different ways of interpreting the degree of association. Spearman's correlation depends on the bivariate distribution only through its copula, not on the marginals, and the same property holds for Kendall's tau; both measures are rank-based. Pearson's correlation, in contrast, also depends on the marginal distributions.

Kendall's tau
Kendall's tau is defined through an i.i.d. copy $(T_1', T_2')$ of $(T_1, T_2)$ and the degree of concordance between the two copies:
$$\tau = E\{I((T_1 - T_1')(T_2 - T_2') > 0)\} - E\{I((T_1 - T_1')(T_2 - T_2') < 0)\},$$
which is just the concordance probability minus the discordance probability. Given a bivariate distribution written in copula form, $F(s, t) = C(F_1(s), F_2(t))$, where the copula $C$ is a bivariate distribution on $[0, 1]^2$ and $F_j$ is the $j$th marginal, Kendall's tau is computed as
$$\tau = 4\int\!\!\int C(u, v)\, C(du, dv) - 1.$$
For the model
$$\lambda^{\mathcal H}_{ik}(t) = Y_{ik}(t) V_k \lambda_0(t) \exp(X_{ik}^T(t)\beta) \qquad (2)$$
with $V_k$ i.i.d. gamma with mean 1 and variance $1/\theta$, we get $\tau = 1/(1 + 2\theta)$, in the sense that this is Kendall's tau between two members of a cluster with fixed covariates.
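A quick numerical check of $\tau = 1/(1 + 2\theta)$, written as a sketch: `kendall_tau` evaluates the concordance-minus-discordance definition directly over all pairs, and the pairs are simulated from a gamma frailty without covariates and with unit baseline hazard (both assumptions made only for illustration).

```r
# Kendall's tau as P(concordant) - P(discordant), over all ordered pairs
kendall_tau <- function(t1, t2) {
  n <- length(t1)
  s <- sign(outer(t1, t1, "-")) * sign(outer(t2, t2, "-"))
  sum(s) / (n * (n - 1))  # diagonal terms are 0; each pair counted twice
}
# Theoretical value under a gamma frailty with mean 1 and variance 1/theta
tau_gamma <- function(theta) 1 / (1 + 2 * theta)

set.seed(2)
K <- 1500; theta <- 1
V  <- rgamma(K, shape = theta, rate = theta)
T1 <- rexp(K, rate = V)   # no covariates, unit baseline hazard (assumption)
T2 <- rexp(K, rate = V)
c(empirical = kendall_tau(T1, T2), theory = tau_gamma(theta))
```

With these settings the empirical value should be close to 1/3; the pairwise computation is $O(K^2)$ and only meant for moderate sample sizes.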

The frailty model
When the $V_k$ are i.i.d. gamma with mean 1 and variance $1/\theta$, the observed intensity given covariates becomes
$$\frac{1 + \theta^{-1} N_{k\cdot}(t-)}{1 + \theta^{-1}\sum_i \exp(X_{ik}^T(t)\beta)\Lambda_0(T_{ik} \wedge t)}\, Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta), \qquad (3)$$
and the marginal intensity becomes
$$\frac{1 + \theta^{-1} N_{ik}(t-)}{1 + \theta^{-1}\exp(X_{ik}^T(t)\beta)\Lambda_0(T_{ik} \wedge t)}\, Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta). \qquad (4)$$
What is the observed hazard, and how does it differ from the marginal intensity? As part of the next exercise we shall also derive these expressions.

Frailty models
Estimation: NPMLE for the frailty model. Given the frailties, the likelihood is
$$\prod_k \prod_i \Big[\prod_{s \le \tau} \big(V_k \exp(X_{ki}^T\beta)\, d\Lambda_0(s)\big)^{\Delta N_{ki}(s)}\Big] \exp\Big(-\int_0^{T_{ki}} V_k \exp(X_{ki}^T\beta)\, d\Lambda_0(s)\Big), \qquad (5)$$
and with $V_k$ gamma distributed with mean 1 and variance $1/\theta$ we get the additional term
$$\prod_k f_\theta(V_k), \qquad (6)$$
where $f_\theta$ is the gamma density.
For fixed $\theta$, write up the EM algorithm for this model. How do we get standard errors for this model? To compute the E-step we need $E(V_k \mid \text{Data})$, and this conditional mean (and the conditional distribution) can be found from the full likelihood. Recall that a gamma distribution with parameters $\lambda, \alpha$, $\Gamma(\lambda, \alpha)$, has mean $\lambda\alpha$, variance $\lambda\alpha^2$ and density $(\Gamma(\lambda)\alpha^\lambda)^{-1} x^{\lambda - 1}\exp(-x/\alpha)$. Write up the full EM algorithm for the model where $\theta$ is not known.

Frailty models, attenuation
    > library(survival)
    > data(diabetes, package = "timereg")
    > fit <- coxph(Surv(time, status) ~ adult * trt + frailty(id), data = diabetes)
    > fit
    Call:
    coxph(formula = Surv(time, status) ~ adult * trt + frailty(id), data = diabetes)

                  coef   se(coef) se2    Chisq   DF   p
    adult          0.397 0.259    0.25     2.35  1.0  0.13
    trt           -0.506 0.225    0.221    5.03  1.0  0.025
    frailty(id)                          122.54 88.6  0.0098
    adult:trt     -0.985 0.362    0.355    7.41  1.0  0.0065

    Iterations: 6 outer, 25 Newton-Raphson
         Variance of random effect= 0.926   I-likelihood = -847

Fit the model yourself and try to simplify the regression part. Is the frailty variance significant, and what is it? What is the related Kendall's tau? Compare with the marginal estimates. How do the regression effects differ?

If dim(X) = 1 and the frailty is gamma with mean 1 and variance $\theta$, then the marginal intensity is
$$Y(t)\lambda_0(t)\frac{\exp(X^T\beta)}{1 + \theta\Lambda_0(t)\exp(X^T\beta)},$$
so the marginal relative risk (X = 1 versus X = 0) is
$$\exp(\beta)\,\frac{1 + \theta\Lambda_0(t)}{1 + \theta\exp(\beta)\Lambda_0(t)}, \qquad (7)$$
where $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. The relative risk equals $\exp(\beta)$ at time $t = 0$ and tends to 1 as $t \to \infty$.
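One way to carry out the EM exercise for fixed $\theta$, sketched in base R under simplifying assumptions (one covariate, no tied event times, cluster labels coded 1, ..., K). The E-step uses the gamma posterior mean $E(V_k \mid \text{Data}) = (\theta + N_{k\cdot})/(\theta + \sum_i \exp(X_{ik}\beta)\Lambda_0(T_{ik}))$, and the M-step maximizes the Cox partial likelihood with offset $\log E(V_k \mid \text{Data})$, followed by a Breslow update of $\Lambda_0$. The function name `em_gamma_frailty` is hypothetical, and standard errors are not computed.

```r
# EM sketch for the shared gamma frailty Cox model with theta fixed
# (frailty mean 1, variance 1/theta). Assumptions: one covariate x,
# no tied event times, cluster labels id coded 1, ..., K.
em_gamma_frailty <- function(time, status, x, id, theta, n_iter = 30) {
  ev <- which(status == 1)
  R  <- outer(time, time[ev], ">=")   # R[i, j] = 1(T_i >= j-th event time)
  w  <- rep(1, max(id))               # current E(V_k | data)
  beta <- 0
  for (it in seq_len(n_iter)) {
    off <- log(w[id])
    # M-step (beta): Cox partial log-likelihood with offset log E(V_k | data)
    pl <- function(b) {
      eta <- b * x + off
      sum(eta[ev] - log(colSums(R * exp(eta))))
    }
    beta <- optimize(pl, c(-5, 5), maximum = TRUE)$maximum
    # M-step (Lambda_0): Breslow estimator with the same offsets
    dL      <- 1 / colSums(R * exp(beta * x + off))
    Lambda0 <- as.vector(R %*% dL)    # Lambda_0 evaluated at each T_i
    # E-step: (theta + N_k) / (theta + sum_i exp(x beta) Lambda_0(T_ik))
    Nk <- tapply(status, id, sum)
    Hk <- tapply(exp(beta * x) * Lambda0, id, sum)
    w  <- as.numeric((theta + Nk) / (theta + Hk))
  }
  list(beta = beta, frailty_mean = w, Lambda0 = Lambda0)
}

set.seed(3)
K <- 100; id <- rep(1:K, each = 2)
V <- rgamma(K, shape = 2, rate = 2)          # true theta = 2
x <- rbinom(2 * K, 1, 0.5)
T_true <- rexp(2 * K, rate = V[id] * exp(0.5 * x))
C <- runif(2 * K, 0, 3)
fit <- em_gamma_frailty(pmin(T_true, C), as.integer(T_true <= C), x, id, theta = 2)
fit$beta
```

For unknown $\theta$, one would add an update of $\theta$ (for example by maximizing the observed-data likelihood over a grid), which this sketch leaves out.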
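If $V$ is positive stable with Laplace transform $\varphi_\theta(t) = \exp(-t^\theta)$, $0 < \theta \le 1$, then the marginal intensity is
$$Y(t)\,\theta\lambda_0(t)\Lambda_0(t)^{\theta - 1}\exp(\theta X^T\beta),$$
so the marginal model is again a proportional hazards model, with regression coefficients attenuated to $\theta\beta$.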
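The attenuation in (7) and the positive stable case can be compared numerically. In this small sketch `rr_gamma` uses variance $\theta$ as in (7), `rr_stable` uses the Laplace transform $\exp(-s^\theta)$, and both names are illustrative.

```r
# Marginal relative risk (x = 1 versus x = 0) after integrating out the frailty.
# Gamma frailty, mean 1 and variance theta: expression (7)
rr_gamma <- function(Lambda0, beta, theta)
  exp(beta) * (1 + theta * Lambda0) / (1 + theta * exp(beta) * Lambda0)
# Positive stable frailty, Laplace transform exp(-s^theta): constant exp(theta * beta)
rr_stable <- function(Lambda0, beta, theta)
  rep(exp(theta * beta), length(Lambda0))

Lambda0 <- c(0, 0.5, 1, 2, 5, 50)
round(rbind(gamma  = rr_gamma(Lambda0, beta = 0.7, theta = 1),
            stable = rr_stable(Lambda0, beta = 0.7, theta = 0.5)), 3)
```

For the gamma frailty the relative risk starts at $\exp(\beta)$ and decays towards 1 as $\Lambda_0(t)$ grows, whereas for the positive stable frailty it stays at $\exp(\theta\beta)$ for all $t$.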

Frailty models, attenuation
Assume that $V$ is a positive random variable with Laplace transform $\varphi_\theta(t)$, that the covariate is one-dimensional, and that $\beta > 0$. The relative risk in the marginal model is
$$\exp(\beta)\,\frac{(D\log\varphi_\theta)(\exp((X + 1)\beta)\Lambda_0(t))}{(D\log\varphi_\theta)(\exp(X\beta)\Lambda_0(t))} = \exp(\beta)\,k(t),$$
and, since $D\log\varphi_\theta < 0$, we see that $k(t) \le 1$ if and only if
$$(D\log\varphi_\theta)(\exp((X + 1)\beta)\Lambda_0(t)) \ge (D\log\varphi_\theta)(\exp(X\beta)\Lambda_0(t)).$$
The latter inequality holds if $\log\varphi_\theta$ is convex (so that $D\log\varphi_\theta$ is increasing), which is the case since
$$D^2\log\varphi_\theta(t) = E(V^2 h(t, V)) - E(V h(t, V))^2 \ge 0, \qquad h(t, V) = \exp(-tV)/E(\exp(-tV)).$$

Two sets of frailty models that are quite similar:

Frailty model
- Has subject-specific regression effects that rely on the choice of frailty distribution (of course).
- The frailty parameter can in principle be identified solely from univariate data, because the marginal distribution depends on $\theta$.
- In practice the frailty parameter is primarily identified from the correlation.
- The frailty parameter has a Kendall's tau interpretation.
- The frailty model is easier to extend to various other settings.

Two-stage model (later today)
- The marginals are fixed, and the regression parameters give population effects.
- The frailty accounts solely for dependence.
- The frailty parameter has a Kendall's tau interpretation.

Two-stage modelling
To be explicit, assume that the marginal and conditional intensities are
$$\lambda^{\mathcal F^{ik}}_{ik}(t) = \lambda_{ik}(t), \qquad \lambda^{\mathcal H}_{ik}(t) = V_k\tilde\lambda_{ik}(t),$$
where $\tilde\lambda_{ik}(t)$ is assumed to be predictable with respect to the marginal filtration. One may show that the relationship between these two intensities is
$$\lambda_{ik}(t) = Y_{ik}(t)\big(-\tilde\lambda_{ik}(t)\big)(D\log\varphi_\theta)\Big(\int_0^t\tilde\lambda_{ik}(s)\,ds\Big),$$
$$\tilde\lambda_{ik}(t) = Y_{ik}(t)\big(-\lambda_{ik}(t)\big)\exp\Big(-\int_0^t\lambda_{ik}(s)\,ds\Big)(D\varphi_\theta^{-1})\Big(\exp\Big(-\int_0^t\lambda_{ik}(s)\,ds\Big)\Big).$$
Establish the connection between the two sets of intensities. Hint: compute the marginal survival function under both models!
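The general expression $\exp(\beta)k(t)$ only needs the Laplace transform, so it can be evaluated numerically. In this sketch `marginal_rr` approximates $D\log\varphi_\theta$ by a central difference with step `h` (an assumption about accuracy that is adequate for smooth $\varphi_\theta$ and arguments well away from 0); `phi_gamma` is the gamma transform with mean 1 and variance 1, and `phi_stable` is the positive stable transform with $\theta = 0.5$. All names are hypothetical.

```r
# Marginal relative risk exp(beta) * k(t) for a frailty with Laplace transform phi,
# using a central difference for D log phi (a numerical sketch).
marginal_rr <- function(phi, Lambda0, beta, x = 0, h = 1e-6) {
  dlogphi <- function(s) (log(phi(s + h)) - log(phi(s - h))) / (2 * h)
  exp(beta) * dlogphi(exp((x + 1) * beta) * Lambda0) /
    dlogphi(exp(x * beta) * Lambda0)
}
phi_gamma  <- function(s) (1 + s)^(-1)   # gamma, mean 1, variance 1
phi_stable <- function(s) exp(-s^0.5)    # positive stable, theta = 0.5

Lambda0 <- seq(0.1, 5, by = 0.1)
k_gamma <- marginal_rr(phi_gamma, Lambda0, beta = 0.7) / exp(0.7)
all(k_gamma <= 1)   # log phi is convex, so k(t) <= 1 when beta > 0
```

For the gamma transform this reproduces (7), and for the positive stable transform it returns the constant $\exp(\theta\beta)$.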

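Marginal models
Right-censored failure times: $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, $Y_{ik}(t) = 1(T_{ik} \ge t)$, $N_{ik}(t) = 1(T_{ik} \le t, T_{ik} = \tilde T_{ik})$. The marginal (intensity) model is a model given
$$\mathcal F^{ik}_t = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : s \le t\}, \qquad (8)$$
for example the Cox model
$$\lambda^{\mathcal F^{ik}}_{ik}(t) = Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta). \qquad (9)$$
It is important to note that (9) is not the intensity with respect to the observed filtration
$$\mathcal F_t = \vee_k \mathcal F^k_t, \qquad (10)$$
where $\mathcal F^k_t = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : i = 1, \dots, n;\ s \le t\}$ is the information generated by observing all the individuals in the $k$th cluster. Now, with respect to the (unobserved) filtration
$$\mathcal H_t = \vee_k \mathcal H^k_t, \qquad (11)$$
where $\mathcal H^k_t = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s), V_k : i = 1, \dots, n;\ s \le t\}$, the intensity is
$$\lambda^{\mathcal H}_{ik}(t) = V_k\tilde\lambda_{ik}(t, \theta, \lambda_0(\cdot)), \qquad (12)$$
referred to as the Clayton-Oakes-Glidden model, with $\tilde\lambda_{ik}$ chosen so that the marginal intensities are of Cox form:
$$\lambda^{\mathcal F^{ik}}_{ik}(t) = Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta). \qquad (13)$$

Characterizing dependence: copula models
$$P(\tilde T_1 > t_1, \dots, \tilde T_n > t_n) = C_\theta(S_1(t_1), \dots, S_n(t_n)),$$
where $S_j$, $j = 1, \dots, n$, denotes the marginal survivor functions. Every multivariate distribution has this form. The Archimedean copula family is
$$C_\theta(u_1, \dots, u_n) = \varphi_\theta(\varphi_\theta^{-1}(u_1) + \dots + \varphi_\theta^{-1}(u_n))$$
for some non-negative convex decreasing function $\varphi_\theta$ with $\varphi_\theta(0) = 1$. For the gamma frailty with mean 1 and variance $1/\theta$, $\tilde\lambda_{ik}$ is then
$$\tilde\lambda_{ik}(t, \theta, \lambda_0(\cdot)) = Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta)\exp\Big(\theta^{-1}\int_0^t \exp(X_{ik}^T(s)\beta)\lambda_0(s)\,ds\Big).$$
It is of interest to find the intensities with respect to the observed filtration $\mathcal F_t$ given in (10). It can be shown that these are
$$\lambda^{\mathcal F}_{ik}(t) = Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta) f_{ik}(t), \qquad (14)$$
where
$$f_{ik}(t) = \frac{\theta + N_{k\cdot}(t-)}{\theta}\,\frac{\exp\big(\theta^{-1}\int_0^t\lambda_0(s)\exp(X_{ik}^T(s)\beta)\,ds\big)}{f_k(t)},$$
$$f_k(t) = 1 + \sum_{j=1}^n\Big(\exp\Big(\theta^{-1}\int_0^t Y_{jk}(s)\lambda_0(s)\exp(X_{jk}^T(s)\beta)\,ds\Big) - 1\Big).$$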
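A sketch of the gamma (Clayton-Oakes) case in R, assuming the marginal cumulative hazards are already available. `cond_cumhaz` maps a marginal cumulative hazard to the conditional one, $\varphi_\theta^{-1}(\exp(-\Lambda)) = \theta(\exp(\Lambda/\theta) - 1)$, which is the integrated form of $\tilde\lambda_{ik}$, and `f_ik` evaluates the factor in (14). All function names are hypothetical.

```r
# Gamma frailty with mean 1 and variance 1/theta: Laplace transform
phi <- function(s, theta) (1 + s / theta)^(-theta)
# Conditional cumulative hazard implied by a marginal cumulative hazard Lambda:
# phi^{-1}(exp(-Lambda)) = theta * (exp(Lambda / theta) - 1)
cond_cumhaz <- function(Lambda, theta) theta * (exp(Lambda / theta) - 1)

# Factor f_ik(t) in (14) for one subject i in cluster k at time t.
#   Lambda_i:      int_0^t exp(X_ik beta) dLambda_0 for subject i
#   Lambda_atrisk: int_0^t Y_jk exp(X_jk beta) dLambda_0 for every j in the cluster
#   N_minus:       N_k.(t-), the number of events in the cluster before t
f_ik <- function(Lambda_i, Lambda_atrisk, N_minus, theta) {
  f_k <- 1 + sum(exp(Lambda_atrisk / theta) - 1)
  (theta + N_minus) / theta * exp(Lambda_i / theta) / f_k
}

f_ik(0.3, c(0.3, 0.3), N_minus = 0, theta = 1.5)  # no events yet: factor below 1
f_ik(0.3, c(0.3, 0.3), N_minus = 1, theta = 1.5)  # one event in the cluster: factor above 1
```

In a cluster with at least two members and no events, the factor drops below 1 as exposure accumulates, while an event in the cluster raises the intensity of the remaining members, which is the positive dependence implied by the frailty. The identity $\varphi_\theta(\tilde\Lambda) = \exp(-\Lambda)$ confirms that the marginals keep their Cox form.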

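The observed (partial) log-likelihood function is
$$\sum_{k=1}^K\int\log\big(1 + \theta^{-1}N_{k\cdot}(t-)\big)\,dN_{k\cdot}(t) - \sum_{k=1}^K\big[\theta + N_{k\cdot}(\tau)\big]\log\Big(1 + \theta^{-1}\sum_{i=1}^n\int_0^\tau Y_{ik}(t)\tilde\lambda_{ik}(t)\,dt\Big) + \sum_{k=1}^K\sum_{i=1}^n\int_0^\tau\log\big(Y_{ik}(t)\tilde\lambda_{ik}(t)\big)\,dN_{ik}(t). \qquad (15)$$
The terms depending on $\theta$ in (15) give
$$\frac1K\Big(\sum_{k=1}^K\int\log\big(1 + \theta^{-1}N_{k\cdot}(t-)\big)\,dN_{k\cdot}(t) + \sum_{k=1}^K\sum_{i=1}^n\theta^{-1}N_{ik}(\tau)H_{ik} - \sum_{k=1}^K\big(\theta + N_{k\cdot}(\tau)\big)\log R_k(\theta)\Big), \qquad (16)$$
where
$$\tilde\lambda_{ik}(t) = \lambda_0(t)e^{X_{ik}^T(t)\beta}\exp\Big(\theta^{-1}\int_0^t e^{X_{ik}^T(s)\beta}\lambda_0(s)\,ds\Big), \quad H_{ik} = \int Y_{ik}(t)e^{X_{ik}^T(t)\beta}\,d\Lambda_0(t), \quad R_k(\theta) = 1 + \sum_{i=1}^n\big(\exp(\theta^{-1}H_{ik}) - 1\big).$$
Now, by replacing $H_{ik}$ with $\hat H_{ik} = \int Y_{ik}(t)\exp(X_{ik}^T(t)\hat\beta_I)\,d\hat\Lambda_I(t)$ in (16), we obtain the pseudo log-likelihood for $\theta$, and maximizing this function in $\theta$ gives the two-stage estimator of $\theta$. Under some regularity conditions, [?] showed consistency and asymptotic normality of this estimator. This is equivalent to directly computing the MLE for the Clayton-Oakes copula form
$$C_\theta(u_1, \dots, u_n) = \varphi_\theta(\varphi_\theta^{-1}(u_1) + \dots + \varphi_\theta^{-1}(u_n))$$
with the estimated marginals plugged in.
Derive this copula form for the Clayton-Oakes gamma frailty model with Cox marginals. Write up the likelihood for censored bivariate survival data based on this representation.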
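The two-stage procedure can be sketched in base R for the case without covariates: stage 1 computes the marginal Nelson-Aalen estimate $\hat H_{ik}$ at each observed time (ignoring the clustering), and stage 2 maximizes the pseudo log-likelihood (16) in $\theta$ with `optimize`. This is a from-scratch illustration assuming no tied event times within a cluster, not the implementation used by `two.stage` in timereg; `pseudo_loglik` and `nelson_aalen_at` are hypothetical names, and no standard errors are computed.

```r
# Pseudo log-likelihood (16) for theta (gamma frailty, variance 1/theta),
# given plug-in cumulative hazards H[i] = H_ik at the observed times.
# Assumes no tied event times within a cluster.
pseudo_loglik <- function(theta, H, status, id) {
  Nk <- tapply(status, id, sum)                     # N_k.(tau)
  # sum_k int log(1 + N_k.(t-)/theta) dN_k.(t) = sum_k sum_{j < N_k} log(1 + j/theta)
  term1 <- sum(sapply(Nk, function(m)
    if (m > 1) sum(log1p(seq_len(m - 1) / theta)) else 0))
  term2 <- sum(status * H) / theta                  # sum_ik N_ik(tau) H_ik / theta
  Rk    <- 1 + tapply(exp(H / theta) - 1, id, sum)  # R_k(theta)
  term3 <- sum((theta + Nk) * log(Rk))
  (term1 + term2 - term3) / length(Nk)
}

# Stage 1: marginal Nelson-Aalen estimate evaluated at each observed time
nelson_aalen_at <- function(time, status) {
  R <- outer(time, time[status == 1], ">=")  # R[i, j] = 1(time_i >= j-th event time)
  as.vector(R %*% (1 / colSums(R)))          # hat Lambda(time_i)
}

set.seed(4)
K <- 300; id <- rep(seq_len(K), each = 2); theta_true <- 1
V <- rgamma(K, shape = theta_true, rate = theta_true)
T_true <- rexp(2 * K, rate = V[id])
C <- runif(2 * K, 0, 5)
time <- pmin(T_true, C); status <- as.integer(T_true <= C)
H <- nelson_aalen_at(time, status)
# Stage 2: maximize the pseudo log-likelihood in theta
theta_hat <- optimize(pseudo_loglik, c(0.05, 20), maximum = TRUE,
                      H = H, status = status, id = id)$maximum
c(theta_hat = theta_hat, tau_hat = 1 / (1 + 2 * theta_hat))
```

Because the frailty variance is $1/\theta$ here, the estimate maps to Kendall's tau through $1/(1 + 2\hat\theta)$.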

Two-stage modelling
    library(timereg)   # cox.aalen(), two.stage() and the diabetes data
    data(diabetes)
    # Marginal Cox model with treat as covariate
    marg <- cox.aalen(Surv(time, status) ~ prop(treat) + cluster(id), data = diabetes)
    fit <- two.stage(marg, data = diabetes, theta = 1)
    summary(fit)

    Dependence parameter for Clayton-Oakes-Glidden model
              Variance    SE    z   P-val Kendall's tau
    intercept    0.962 0.355 2.71 0.00667         0.325

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.    SE Robust SE D2log(L)^-1     z   P-val
    prop(treat) -0.777 0.169     0.147       0.169 -4.59 4.4e-06

    # Stratification by adult
    theta.des <- model.matrix(~ -1 + factor(adult), diabetes)
    fit.s2 <- two.stage(marg, data = diabetes, nit = 4, theta = 1, theta.des = theta.des)
    summary(fit.s2)

    Dependence parameter for Clayton-Oakes-Glidden model
                   Variance    SE    z  P-val Kendall's tau
    factor(adult)1    0.915 0.399 2.29 0.0219         0.314
    factor(adult)2    1.08  0.722 1.50 0.133          0.352

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.    SE Robust SE D2log(L)^-1     z   P-val
    prop(treat) -0.777 0.169     0.147       0.169 -4.59 4.4e-06

Two-stage modelling: likelihoods
    # Test for the same variance in the two strata
    theta.des <- model.matrix(~ factor(adult), diabetes)
    fit.s3 <- two.stage(marg, data = diabetes, nit = 4, theta = 1, theta.des = theta.des)
    summary(fit.s3)

    Dependence parameter for Clayton-Oakes-Glidden model
                   Variance    SE     z  P-val Kendall's tau
    (Intercept)       0.915 0.399 2.29  0.0219        0.314
    factor(adult)2    0.17  0.815 0.208 0.835         0.0782

    Marginal Cox-Aalen model fit
    Proportional Cox terms :
                 Coef.    SE Robust SE D2log(L)^-1     z   P-val
    prop(treat) -0.777 0.169     0.147       0.169 -4.59 4.4e-06

Generally, given a bivariate survival distribution $G(x, y)$ and censored bivariate survival data $(X_i, \delta_i, Y_i, \nu_i)$, the likelihood is
$$\prod_i G(X_i, Y_i)^{(1 - \delta_i)(1 - \nu_i)}\,(-D_xG)(X_i, Y_i)^{\delta_i(1 - \nu_i)} \qquad (17)$$
$$\times\,(-D_yG)(X_i, Y_i)^{(1 - \delta_i)\nu_i}\,(D_xD_yG)(X_i, Y_i)^{\delta_i\nu_i}. \qquad (18)$$
    # To fit the model without covariates use beta.fixed=1, but a prop() term is still needed!
    fit <- two.stage(marg, data = diabetes, theta = 0.95, detail = 0, beta.fixed = 1)
    summary(fit)

    Dependence parameter for Clayton-Oakes-Glidden model
              Variance    SE    z  P-val Kendall's tau
    intercept    0.584 0.278 2.10 0.0357         0.226
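As a sketch of the exercise, the likelihood (17)-(18) can be written out for the Clayton copula $C_\theta(u, v) = (u^{-1/\theta} + v^{-1/\theta} - 1)^{-\theta}$ (gamma frailty with mean 1 and variance $1/\theta$), here with exponential marginals whose rates are treated as known. The exponential marginals, the function name `clayton_cens_loglik` and the toy data are assumptions for illustration; with Cox marginals, the estimated marginal survival functions and hazards would replace the exponential ones.

```r
# Censored bivariate log-likelihood (17)-(18) for a Clayton survival copula
# (gamma frailty, mean 1, variance 1/theta) with exponential marginals.
clayton_cens_loglik <- function(theta, x, dx, y, dy, rate1, rate2) {
  a  <- 1 / theta
  u  <- exp(-rate1 * x); v <- exp(-rate2 * y)          # marginal survivals
  f1 <- rate1 * u;       f2 <- rate2 * v               # marginal densities
  A   <- u^(-a) + v^(-a) - 1
  G   <- A^(-1 / a)                                     # joint survival G(x, y)
  Gx  <- A^(-1 / a - 1) * u^(-a - 1) * f1               # -D_x G
  Gy  <- A^(-1 / a - 1) * v^(-a - 1) * f2               # -D_y G
  Gxy <- (1 + a) * (u * v)^(-a - 1) * A^(-1 / a - 2) * f1 * f2  # D_x D_y G
  sum((1 - dx) * (1 - dy) * log(G) + dx * (1 - dy) * log(Gx) +
      (1 - dx) * dy * log(Gy) + dx * dy * log(Gxy))
}

x  <- c(0.5, 1.2, 0.3, 2.0); dx <- c(1, 0, 1, 1)   # toy data
y  <- c(0.7, 0.4, 1.5, 0.2); dy <- c(1, 1, 0, 0)
clayton_cens_loglik(theta = 1, x, dx, y, dy, rate1 = 1, rate2 = 1)
```

As $\theta \to \infty$ the copula tends to independence and the log-likelihood reduces to the sum of the two marginal exponential log-likelihoods, which gives a simple correctness check.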

Goodness of fit of gamma frailty
Different frailty distributions have different consequences for the type of dependence.
GOF: one idea (Shih, 1998; Shih and Louis, 1995) is to compare likelihoods from different models; another is to use a score process. Under the gamma distribution we earlier computed
$$E(Z_k \mid \text{History}) = \gamma_k(t) = \frac{1 + \theta^{-1}N_{k\cdot}(t-)}{1 + \theta^{-1}\sum_i\exp(X_{ik}^T(t)\beta)\Lambda_0(T_{ik} \wedge t)}.$$
The idea is to look at $\gamma_k(t)$, which has mean 1 under the gamma distribution. Let $W(t) = K^{-1/2}\sum_k(\gamma_k(t) - 1)$, which is asymptotically Gaussian with a variance that can also be estimated under the null. This process can be resampled, and a supremum test can be constructed.

Frailty models: exercises
Consider the twin.csv data of menarche ages for pairs of twins. Look at the two.stage help pages to see different examples of its use.
- First estimate the marginal effect of cohort, and assess whether or not zygosity affects the marginal models.
- Fit a frailty model for the overall data to assess the effect of cohort. Use the phmm and coxph programs and compare the estimates of the cumulative baseline. Also compare the baseline with that from the marginal model for mono- and dizygotic twins.
- Fit separate models for monozygotic and dizygotic twins. Also estimate the marginal baselines using this model, and compare them. Is there a genetic effect on the timing of menarche?
- Report a Kendall's tau for these data for mono- and dizygotic twins and overall. Is the dependence different for mono- versus dizygotic twins?
- How can you make survival predictions for the monozygotic and dizygotic twins?

Consider the (simulated) twin data of menarche age. Fit marginal models for the mono- and the dizygotic twins. Report a Kendall's tau. How does this Kendall's tau compare with that from the standard frailty model? Validate the fit of the model by validating the marginal model. What about the frailty part? Compare formally the dependence between mono- and dizygotic twins.
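The score-process idea can be sketched as follows, treating $\theta$, $\beta$ and a constant baseline hazard as known (an assumption; in practice they are estimated, which changes the variance of $W$). `gof_W` is a hypothetical name; it evaluates $W(t)$ on a grid, using that $\gamma_k(t)$ can equivalently be written $(\theta + N_{k\cdot}(t-))/(\theta + \sum_i \exp(X_{ik}\beta)\Lambda_0(T_{ik} \wedge t))$. The resampling step needed to calibrate the supremum statistic is not included.

```r
# W(t) = K^{-1/2} sum_k (gamma_k(t) - 1), where
# gamma_k(t) = (theta + N_k.(t-)) / (theta + sum_i exp(x_ik beta) Lambda_0(T_ik ^ t))
# is E(V_k | F_t) for a gamma frailty with mean 1 and variance 1/theta.
# Sketch: theta, beta and a constant baseline hazard lambda0 treated as known.
gof_W <- function(tgrid, time, status, x, id, theta, beta, lambda0 = 1) {
  K <- length(unique(id))
  sapply(tgrid, function(t) {
    Nk <- tapply(status * (time < t), id, sum)                      # N_k.(t-)
    Hk <- tapply(exp(beta * x) * lambda0 * pmin(time, t), id, sum)  # exposure up to t
    sum((theta + Nk) / (theta + Hk) - 1) / sqrt(K)
  })
}

set.seed(5)
K <- 500; id <- rep(seq_len(K), each = 2)
V <- rgamma(K, shape = 2, rate = 2); x <- rbinom(2 * K, 1, 0.5)
T_true <- rexp(2 * K, rate = V[id] * exp(0.5 * x)); C <- runif(2 * K, 0, 3)
time <- pmin(T_true, C); status <- as.integer(T_true <= C)
tgrid <- seq(0.1, 2.5, by = 0.1)
W <- gof_W(tgrid, time, status, x, id, theta = 2, beta = 0.5)
max(abs(W))   # observed supremum statistic
```

Under a correctly specified gamma frailty, $W(t)$ fluctuates around 0; a systematic drift would point towards a different frailty distribution.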