DAGStat Event History Analysis.


DAGStat 2016: Event History Analysis
Robin.Henderson@ncl.ac.uk

Schedule
9.00 Introduction
10.30 Break
11.00 Regression Models, Frailty and Multivariate Survival
12.30 Lunch
13.30 Time-Variation and Dynamic Covariates
15.00 Break
15.30 Competing Risks
17.00 End
Each session will consist of 45 minutes of talks and 45 minutes of computing exercises.

Session 1: Introduction
1. Types of data
2. Survival analysis recap
3. Counting processes
4. Nelson-Aalen estimator
5. In R

Single event survival (1): leukaemia data
[Figure: follow-up lines for a sample of patients over 0 to 3 years; X marks an event, C marks a censoring time.]
(1) Henderson et al, JASA, 2002

Competing risks: transplant data
[Figure: follow-up lines for a sample of patients over 0 to 10 years; R and D mark events of two different types, C marks a censoring time.]

Clustered survival (2): retinopathy data
[Figure: clustered follow-up lines over 0 to 70 months; X marks an event, C marks a censoring time.]
(2) Huster et al, Biometrics, 1989

Recurrent events I: ships data
[Figure: event histories for a sample of ships by age, 0 to 30; X marks an event, C marks the end of follow-up (censoring).]

Recurrent events II: more ships data
[Figure: event histories for a further sample of ships by age, 0 to 30; X marks an event, C marks the end of follow-up (censoring).]

Complex event history (3): diarrhoea data
[Figure: event histories for a sample of individuals over time 0 to 400; markers x and O denote different types of observation.]
(3) Borgan et al, Scand J Stat, 2007

Single event survival definitions
Time to event $T$ (assumed continuous here)
Cdf $F(t) = P(T \le t)$, pdf $f(t) = dF(t)/dt$
Survival function $S(t) = P(T > t) = 1 - F(t)$
Hazard function $\alpha(t) = f(t)/S(t)$
Approximation $P(T \in [t, t + dt) \mid T \ge t) \approx \alpha(t)\,dt$
Cumulative hazard $A(t) = \int_0^t \alpha(u)\,du$, so that $S(t) = \exp\{-A(t)\}$
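The following Python sketch (standard library only) checks these identities numerically. The Weibull distribution and the values of `k` and `lam` are illustrative choices that do not come from the slides. The code confirms that $f(t)/S(t)$ reproduces the Weibull hazard. It also confirms that $\exp\{-A(t)\}$ recovers $S(t)$ when $A(t)$ is obtained by integrating the hazard numerically.

```python
import math

# Weibull with shape k and scale lam (illustrative values, not from the slides)
k, lam = 1.5, 2.0

def F(t):
    """Cdf F(t) = P(T <= t)."""
    return 1.0 - math.exp(-((t / lam) ** k))

def f(t):
    """Density f(t) = dF(t)/dt."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))

def S(t):
    """Survival function S(t) = 1 - F(t)."""
    return 1.0 - F(t)

def hazard(t):
    """Hazard alpha(t) = f(t) / S(t)."""
    return f(t) / S(t)

def cum_hazard(t, steps=10_000):
    """A(t) = integral of alpha(u) over [0, t], by the midpoint rule."""
    h = t / steps
    return sum(hazard((i + 0.5) * h) for i in range(steps)) * h

t = 1.3
print(hazard(t), (k / lam) * (t / lam) ** (k - 1))  # f/S equals the Weibull hazard
print(S(t), math.exp(-cum_hazard(t)))               # S(t) = exp{-A(t)}
```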

Single event survival data basics
Independent right-censoring
Traditional notation: data $(t_i, \delta_i)$ with $\delta_i = 1$ if event, $\delta_i = 0$ if censored
Kaplan-Meier estimator: $\hat S(t) = 1$ for $t <$ smallest observed failure time, otherwise
$$\hat S(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d(t_i)}{n(t_i)}\right)$$
($n(t_i)$, $d(t_i)$: number at risk and number of events at $t_i$)
Log-rank test (two groups) based on
$$U = \sum_i \left( d_1(t_i) - \frac{n_1(t_i)}{n_1(t_i) + n_2(t_i)}\{d_1(t_i) + d_2(t_i)\} \right)$$
Cox model $\alpha(t \mid x) = \alpha_0(t) e^{\beta x}$
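The Kaplan-Meier product and the log-rank numerator $U$ can be coded directly from these formulas. The Python sketch below is a minimal illustration on a made-up six-subject sample, and its function names are hypothetical. It computes only $U$, not the variance of $U$ or the resulting test statistic.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct event time, as [(t_i, S_hat(t_i))].

    times: observed times t_i; events: delta_i (1 = event, 0 = censored).
    n(t_i) counts everyone with observed time >= t_i, so subjects censored
    at t_i are still at risk there (the usual convention).
    """
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    s, out = 1.0, []
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)   # n(t_i)
        s *= 1.0 - deaths[t] / n_at_risk             # factor 1 - d(t_i)/n(t_i)
        out.append((t, s))
    return out

def logrank_U(times, events, groups):
    """Log-rank numerator U: observed minus expected events in group 1.

    groups holds the labels 1 and 2; the expected count at t_i is
    n_1(t_i) / (n_1(t_i) + n_2(t_i)) times the total number of events there.
    """
    U = 0.0
    for t in sorted({u for u, d in zip(times, events) if d == 1}):
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        n1, n = at_risk.count(1), len(at_risk)
        d_all = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        d1 = sum(1 for u, d, g in zip(times, events, groups)
                 if u == t and d == 1 and g == 1)
        U += d1 - n1 / n * d_all
    return U

times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
groups = [1, 2, 1, 1, 2, 2]
km = kaplan_meier(times, events)
print(km)                                # S_hat drops at t = 1, 2, 3, 5
print(logrank_U(times, events, groups))
```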

Counting processes
$N(t)$ = number of events that have occurred up to and including time $t$
$dN(t)$ = number of events in $[t, t + dt)$ (0 or 1)
History to $t$: $\mathcal{F}_t$; history to just before $t$: $\mathcal{F}_{t-}$
Intensity $\lambda(t) = \lim_{dt \to 0} P(dN(t) = 1 \mid \mathcal{F}_{t-})/dt$
Approximations $P(dN(t) = 1 \mid \mathcal{F}_{t-}) \approx \lambda(t)\,dt$ and $E[dN(t) \mid \mathcal{F}_{t-}] \approx \lambda(t)\,dt$
Adding over lots of small intervals: $E[N(t)] = \int_0^t \lambda(u)\,du = \Lambda(t)$
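Take the simplest case, a constant intensity $\lambda(t) = \lambda$ (a homogeneous Poisson process, chosen here for illustration). Then $\Lambda(t) = \lambda t$, and a small simulation shows the average of $N(t)$ settling near $\Lambda(t)$. The rate, horizon and seed are arbitrary.

```python
import random

def simulate_N(rate, t, rng):
    """N(t) for a Poisson process with constant intensity `rate`."""
    n, clock = 0, rng.expovariate(rate)   # time of the first event
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)    # add the next inter-event time
    return n

rng = random.Random(2016)
rate, t, reps = 0.8, 5.0, 20_000
mean_N = sum(simulate_N(rate, t, rng) for _ in range(reps)) / reps
Lambda_t = rate * t                       # Lambda(t) = integral of rate over [0, t]
print(mean_N, Lambda_t)
```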

Martingales in under 140 characters
$M(t) = N(t) - \Lambda(t)$ is a martingale: $E[M(t) \mid \mathcal{F}_u] = M(u)$ for $u \le t$
If $H(t)$ is predictable, $W(t) = \int_0^t H(u)\,dM(u)$ is a martingale
Approximately $\mathrm{Var}(W(t)) = \int_0^t H^2(u)\,d\Lambda(u)$
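The sketch below simulates the martingale transform. It assumes a Poisson process with constant intensity `rate` and uses the deterministic (hence predictable) integrand $H(u) = u$. Then $W(t) = \sum_{T_k \le t} T_k - \text{rate}\,t^2/2$, and the variance formula gives $\mathrm{Var}(W(t)) = \int_0^t u^2\,\text{rate}\,du = \text{rate}\,t^3/3$. The parameter values are illustrative only.

```python
import random

def W_at(t, rate, rng):
    """W(t) = int_0^t H(u) dM(u) with H(u) = u, for a Poisson process.

    dM = dN - rate du, so W(t) = sum of event times - rate * t**2 / 2.
    """
    total, clock = 0.0, rng.expovariate(rate)
    while clock <= t:
        total += clock                    # H evaluated at each jump time
        clock += rng.expovariate(rate)
    return total - rate * t ** 2 / 2

rng = random.Random(1)
rate, t, reps = 2.0, 3.0, 20_000
ws = [W_at(t, rate, rng) for _ in range(reps)]
mean_w = sum(ws) / reps
var_w = sum((w - mean_w) ** 2 for w in ws) / (reps - 1)
print(mean_w, var_w, rate * t ** 3 / 3)   # theory: mean 0, variance 18
```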

Observed data and independent censoring
At risk: $Y(t) = 1$ if at risk just before $t$, $Y(t) = 0$ otherwise
$\alpha(t)$: underlying intensity (strictly $\alpha(t \mid \mathcal{F}_{t-})$)
$\lambda(t) = Y(t)\alpha(t)$: intensity of the observed counting process
Cumulative versions $A(t) = \int_0^t \alpha(u)\,du$, $\Lambda(t) = \int_0^t \lambda(u)\,du$
Independent censoring: $\alpha(t \mid \mathcal{F}_{t-}, Y(t)) = \alpha(t \mid \mathcal{F}_{t-})$

Nelson-Aalen estimator
Sample size $n$, no tied event times
Now $N(t) = \sum_{i=1}^n N_i(t)$ and $Y(t) = \sum_{i=1}^n Y_i(t)$
Informally $\hat\alpha(t)\,dt = dN(t)/Y(t)$, which is $1/Y(t)$ if $dN(t) = 1$ and $Y(t) > 0$, and $0$ if $dN(t) = 0$ or $Y(t) = 0$
For consistency with this convention when $Y(u) = 0$, define $J(u) = I(Y(u) > 0)$ and
$$\hat A(t) = \int_0^t \frac{J(u)}{Y(u)}\,dN(u)$$
This is the Nelson-Aalen estimator

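Variance of Nelson-Aalen estimator
Nelson-Aalen $\hat A(t) = \int_0^t \frac{J(u)}{Y(u)}\,dN(u)$
We know $M(t) = N(t) - \Lambda(t)$
So $\hat A(t) = \int_0^t \frac{J(u)}{Y(u)}\,d\Lambda(u) + \int_0^t \frac{J(u)}{Y(u)}\,dM(u)$
And, approximately, $\mathrm{Var}(\hat A(t)) = \int_0^t \frac{J(u)}{Y^2(u)}\,d\Lambda(u)$
Estimated by $\widehat{\mathrm{Var}}(\hat A(t)) = \int_0^t \frac{J(u)}{Y^2(u)}\,dN(u)$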
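The Nelson-Aalen estimate and its variance estimate are running sums of $1/Y$ and $1/Y^2$ over event times. Below is a minimal Python sketch on a small made-up sample. The slides assume no tied event times. Ties are handled here by adding $d/Y$ and $d/Y^2$, one of several possible conventions and an assumption of this sketch.

```python
from collections import Counter

def nelson_aalen(times, events):
    """[(t_i, A_hat(t_i), Var_hat(A_hat(t_i)))] at each distinct event time."""
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    A, V, out = 0.0, 0.0, []
    for t in sorted(deaths):
        Y = sum(1 for u in times if u >= t)   # number at risk just before t
        A += deaths[t] / Y                    # increment dN / Y
        V += deaths[t] / Y ** 2               # increment dN / Y^2
        out.append((t, A, V))
    return out

times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
na = nelson_aalen(times, events)
for row in na:
    print(row)
```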

In R
Standard survival: Surv(time, status)
For counting processes: Surv(time1, time2, status) (intervals open on the left, closed on the right)
E.g. individual $i = 1$ has events at 17 and 26 months and is right-censored at 36 months. Consider as

Old id  New id  time1  time2  status
1       1       0      17     1
1       2       17     26     1
1       3       26     36     0
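The expansion from one row per individual to one row per interval, as in this table, can be written as a small Python function. The name `to_counting_process` is hypothetical and is not part of the R survival package. The sketch assumes the censoring time is later than the last event time.

```python
def to_counting_process(event_times, end_time, old_id):
    """Split one subject's recurrent-event history into (start, stop] rows.

    Returns (old_id, new_id, time1, time2, status) tuples in the
    Surv(time1, time2, status) layout; the last interval is censored.
    Assumes end_time > max(event_times).
    """
    rows, start = [], 0
    for new_id, t in enumerate(sorted(event_times), start=1):
        rows.append((old_id, new_id, start, t, 1))   # interval ending in an event
        start = t
    rows.append((old_id, len(event_times) + 1, start, end_time, 0))
    return rows

for row in to_counting_process([17, 26], 36, old_id=1):
    print(row)
# (1, 1, 0, 17, 1)
# (1, 2, 17, 26, 1)
# (1, 3, 26, 36, 0)
```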


Session 2: Regression Models, Frailty and Multivariate Survival
1. Likelihood construction
2. Cox and Aalen models
3. Frailty for single-event survival
4. Frailty for clustered data

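Likelihood for single event survival
Observation $(t, \delta)$
Likelihood contribution $L(t, \delta) = f(t)$ if $\delta = 1$, and $L(t, \delta) = S(t)$ if $\delta = 0$
Since $\alpha(t) = f(t)/S(t)$,
$$L(t, \delta) = \alpha(t)^\delta S(t) = \alpha(t)^\delta \exp\{-A(t)\}, \quad \text{where } A(t) = \int_0^t \alpha(u)\,du$$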
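Consider a constant hazard $\alpha(t) = \alpha$, the exponential model, used here purely as an illustration. Then $A(t) = \alpha t$ and the log-likelihood is $\sum_i \{\delta_i \log\alpha - \alpha t_i\}$. Setting its derivative to zero gives $\hat\alpha = \sum_i \delta_i / \sum_i t_i$, the number of events divided by the total time at risk. A Python check on made-up data:

```python
import math

def exp_loglik(rate, times, events):
    """log L = sum_i [delta_i * log(alpha) - alpha * t_i] for constant hazard alpha."""
    return sum(d * math.log(rate) - rate * t for t, d in zip(times, events))

times = [2.0, 3.5, 1.0, 6.0, 4.5]
events = [1, 0, 1, 1, 0]
rate_hat = sum(events) / sum(times)   # closed-form MLE: 3 events / 17 time units
print(rate_hat, exp_loglik(rate_hat, times, events))
```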

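Alternative derivation
[Figure: time axis from 0 with an event X at time $t$, i.e. $(t, \delta = 1)$.]
Intervals $I_1, I_2, \ldots, I_K$ of length $dt$, boundaries $0 = t_0 < t_1 < t_2 < \cdots < t_K = t$
$$L(t, \delta = 1) \approx P(\text{no event in } I_1)\,P(\text{no event in } I_2 \mid \text{past}) \cdots P(\text{event in } I_K \mid \text{past}) = \prod_{j=0}^{K-2}\{1 - \alpha(t_j)\,dt\}\,\alpha(t_{K-1})\,dt$$
which, as $dt \to 0$, is proportional to $\alpha(t)\exp\{-A(t)\}$
Similar argument if $\delta = 0$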
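The step from $\prod\{1 - \alpha(t_j)\,dt\}$ to $\exp\{-A(t)\}$ can be checked numerically. The sketch below assumes an illustrative linear hazard $\alpha(u) = 0.5 + 0.3u$, so that $A(t) = 0.5t + 0.15t^2$. As the number of intervals $K$ grows, the product approaches $\exp\{-A(t)\}$.

```python
import math

def hazard(u):
    """Illustrative hazard alpha(u) = 0.5 + 0.3 u, so A(t) = 0.5 t + 0.15 t^2."""
    return 0.5 + 0.3 * u

def survival_by_product(t, K):
    """prod over K intervals of {1 - alpha(t_j) dt}, with t_j the left endpoints."""
    dt = t / K
    p = 1.0
    for j in range(K):
        p *= 1.0 - hazard(j * dt) * dt
    return p

t = 2.0
exact = math.exp(-(0.5 * t + 0.15 * t ** 2))   # exp{-A(t)}
for K in (10, 100, 10_000):
    print(K, survival_by_product(t, K), exact)
```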

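Likelihood for recurrent events
[Figure: time axis from 0 with events X at $t_1$ and $t_2$ and censoring O at $t$.]
Events at $(t_1, t_2)$, censored at $t$
$$L(\text{data}) \approx \prod_{\text{empty intervals}} P(\text{no event in } I \mid \text{past}) \prod_{\text{occupied intervals}} P(\text{event in } I \mid \text{past}) \propto \alpha(t_1)\,\alpha(t_2)\exp\{-A(t)\}$$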

Likelihood for event history data
In general
$$L(\text{data}) = \prod_{k=0}^{K-1} P(\text{data in } [t_k, t_k + dt) \mid \mathcal{F}_{t_k})$$
$$= \prod_{k=0}^{K-1} \{P(\text{events of interest in } [t_k, t_k + dt) \mid \mathcal{F}_{t_k})\; P(\text{other data in } [t_k, t_k + dt) \mid \text{events of interest in } [t_k, t_k + dt), \mathcal{F}_{t_k})\}$$
A partial likelihood is
$$L(\text{data}) = \prod_{k=0}^{K-1} P(\text{events of interest in } [t_k, t_k + dt) \mid \mathcal{F}_{t_k})$$

Regression models I: Cox proportional hazards
$\alpha_i(t \mid x_i) = \alpha_0(t) e^{\beta x_i}$
Partial likelihood
$$L = \prod_{i:\,\text{event at } t_i} \left( \frac{e^{\beta^T x_i}}{\sum_{j \in R_i} e^{\beta^T x_j}} \right)$$
where $R_i = R(t_i) = \{k : Y_k(t_i) = 1\}$ is the risk set
The same works more generally, with $\alpha_i(t \mid \mathcal{F}_{t-}) = \alpha(t \mid x_i)$
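For a single covariate and no tied event times, the log partial likelihood, its score and its information can be written out directly. Newton-Raphson then finds $\hat\beta$. The Python sketch below uses made-up data, and its function names are hypothetical. R's `coxph` additionally handles ties, strata and multiple covariates.

```python
import math

def cox_loglik_score_info(beta, times, events, x):
    """Log partial likelihood, score and information for one covariate.

    Assumes no tied event times; the risk set R_i is {k : t_k >= t_i}.
    """
    ll = score = info = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i != 1:
            continue
        risk = [x_k for t_k, x_k in zip(times, x) if t_k >= t_i]
        w = [math.exp(beta * x_k) for x_k in risk]
        s0 = sum(w)
        s1 = sum(wk * xk for wk, xk in zip(w, risk))
        s2 = sum(wk * xk * xk for wk, xk in zip(w, risk))
        ll += beta * x_i - math.log(s0)
        score += x_i - s1 / s0                # x_i minus weighted mean over R_i
        info += s2 / s0 - (s1 / s0) ** 2      # weighted variance over R_i
    return ll, score, info

def fit_cox(times, events, x, beta=0.0, tol=1e-10):
    """Newton-Raphson maximisation of the log partial likelihood."""
    for _ in range(50):
        _, score, info = cox_loglik_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

times = [5, 8, 12, 15, 20, 24, 30, 33]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]
ll0, _, _ = cox_loglik_score_info(0.0, times, events, x)
beta_hat = fit_cox(times, events, x)
print(ll0, beta_hat)
```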

Regression models II: Aalen additive
$\alpha_i(t \mid \mathcal{F}_{t-}) = \beta_0(t) + \beta_1(t) x_1 + \beta_2(t) x_2 + \cdots$
Inference on cumulative coefficients $B_j(t) = \int_0^t \beta_j(u)\,du$
[Figure: estimated cumulative coefficients for BASELINE and X1 plotted against Day, 0 to 300+.]
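With one covariate, the usual least-squares estimator of the cumulative coefficients adds $(X^T X)^{-1} X^T dN(t)$ at each event time. Here $X$ has rows $(1, x_k)$ for the subjects at risk. The Python sketch below is illustrative only: the function name is hypothetical, and the sketch assumes no ties and computes no standard errors. For a binary covariate the model is saturated. In that case $\hat B_0$ equals the Nelson-Aalen estimate in the $x = 0$ group, and $\hat B_1$ equals the difference between the two groups' Nelson-Aalen estimates. The made-up example uses this as a check.

```python
def aalen_one_covariate(times, events, x):
    """[(t, B0_hat(t), B1_hat(t))] for alpha_i(t) = beta0(t) + beta1(t) * x_i.

    At each event time the increment is (X^T X)^{-1} X^T dN over subjects at
    risk; stops if X^T X becomes singular. Assumes no tied event times.
    """
    B0 = B1 = 0.0
    path = []
    for t in sorted(u for u, d in zip(times, events) if d == 1):
        risk = [k for k, u in enumerate(times) if u >= t]
        n = len(risk)
        sx = sum(x[k] for k in risk)
        sxx = sum(x[k] ** 2 for k in risk)
        det = n * sxx - sx * sx          # determinant of X^T X = [[n, sx], [sx, sxx]]
        if det == 0:
            break
        # X^T dN = (1, x_i) for the single subject i failing at t
        i = next(k for k in risk if times[k] == t and events[k] == 1)
        B0 += (sxx - sx * x[i]) / det
        B1 += (-sx + n * x[i]) / det
        path.append((t, B0, B1))
    return path

times = [1, 2, 3, 4, 5, 6, 10, 11]
events = [1, 1, 1, 1, 1, 1, 0, 0]
x = [0, 1, 0, 1, 0, 0, 0, 1]
path = aalen_one_covariate(times, events, x)
print(path[-1])   # (6, NA in group 0, NA group 1 minus NA group 0)
```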

Missing Data

Frailty
Assume $\alpha(t \mid x) = \alpha_0(t)\exp\{\beta x\}$
$\alpha(t \mid x_{obs}, x_{miss}) = \alpha_0(t)\exp\{\beta_{obs} x_{obs} + \beta_{miss} x_{miss}\}$
$\alpha(t \mid x_{obs}, W) = \alpha_0(t)\exp\{\beta_{obs} x_{obs} + W\}$
$\alpha(t \mid x_{obs}, Z) = Z\alpha_0(t)\exp\{\beta_{obs} x_{obs}\}$
$Z$ not observed: need to work with $\alpha(t \mid x_{obs})$
Assume $E[Z] = 1$ for identifiability

Frailty distributions
Gamma: can calculate $S(t \mid x)$ and $\alpha(t \mid x)$ explicitly
Log-normal, so $W = \log Z$ is normal: motivated by the CLT for missing data
Positive stable: defined by $E[\exp\{-uZ\}] = \exp\{-u^\nu\}$; stays in the PH family
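For gamma frailty with $E[Z] = 1$ and $\mathrm{Var}(Z) = \theta$, the explicit forms are $S(t \mid x) = \{1 + \theta A(t \mid x)\}^{-1/\theta}$ and marginal hazard $\alpha(t \mid x)/\{1 + \theta A(t \mid x)\}$. Here $\alpha(t \mid x) = \alpha_0(t)e^{\beta x}$ is the conditional hazard for $Z = 1$, and $A(t \mid x)$ is its integral. The Python sketch compares the closed form with a Monte Carlo average of $\exp\{-ZA\}$. The values of $\theta$ and $A$ are arbitrary and illustrative.

```python
import math
import random

theta = 0.5   # frailty variance (illustrative); Z ~ Gamma(mean 1, variance theta)
A = 1.2       # conditional cumulative hazard A(t | x) at some fixed t (illustrative)

# Closed form: S(t | x) = E[exp(-Z A)] = (1 + theta A)^(-1/theta)
S_closed = (1.0 + theta * A) ** (-1.0 / theta)

# Monte Carlo: gammavariate(shape, scale) with shape 1/theta, scale theta
rng = random.Random(42)
reps = 100_000
S_mc = sum(math.exp(-rng.gammavariate(1.0 / theta, theta) * A)
           for _ in range(reps)) / reps

# The marginal hazard is the conditional one times this attenuation factor
attenuation = 1.0 / (1.0 + theta * A)
print(S_closed, S_mc, attenuation)
```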

Frailty distributions
[Figure: densities of log(frailty) for log-normal, gamma and positive stable frailty.]

Effect of frailty
[Figure: marginal survival curves over time 0 to 2 with no frailty and with log-normal, gamma and positive stable frailty.]

Leukaemia data: no frailty
> library(survival)
> coxph(Surv(time, cens) ~ age + male + wbc + dep, data = leukaemia)
      coef    se(coef)  z      p
age   0.0296  0.0021    14.04  0.0e+00
male  0.0522  0.0678    0.77   4.4e-01
wbc   0.0031  0.0005    6.89   5.7e-12
dep   0.0293  0.0090    3.24   1.2e-03

Leukaemia data: frailty (n = 1043)
> coxph(Surv(time, cens) ~ age + male + wbc + dep + frailty(1:n), data = leukaemia)
              coef    se(coef)  se2     Chisq   DF   p
age           0.0499  0.0032    0.0026  239.6   1    0.0e+00
male          0.1179  0.1076    0.0778  1.2     1    2.7e-01
wbc           0.0060  0.0008    0.0006  60.0    1    9.5e-15
dep           0.0591  0.0148    0.0107  15.9    1    6.6e-05
frailty(1:n)                            1106.2  479  0.0e+00
Variance of random effect = 0.952
Warning message:...

Shared frailty for clustered data
E.g. pairs $(T_1, T_2)$ with covariates $(x_1, x_2)$
Hazards
$\alpha_1(t \mid x_1, Z) = Z\alpha_0(t)\exp\{\beta_1 x_1\}$
$\alpha_2(t \mid x_2, Z) = Z\alpha_0(t)\exp\{\beta_2 x_2\}$
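A shared frailty makes the paired times dependent. The simulation sketch below switches off the covariate effects ($x_1 = x_2 = 0$) and sets $\alpha_0(t) = 1$. $Z$ is gamma with mean 1 and variance $\theta$. In this gamma case, Kendall's tau between $T_1$ and $T_2$ is $\theta/(\theta + 2)$. All values are illustrative, and the $O(n^2)$ tau is written for clarity rather than speed.

```python
import random

def kendall_tau(a, b):
    """Plain O(n^2) Kendall's tau; ties are not expected for continuous times."""
    n, s = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))

rng = random.Random(7)
theta, n = 1.0, 1000
t1, t2 = [], []
for _ in range(n):
    z = rng.gammavariate(1.0 / theta, theta)   # shared frailty: mean 1, variance theta
    # given Z, T1 and T2 are independent exponentials with hazard Z * alpha0(t) = Z
    t1.append(rng.expovariate(z))
    t2.append(rng.expovariate(z))

tau_hat = kendall_tau(t1, t2)
print(tau_hat, theta / (theta + 2))   # simulated vs theoretical tau (1/3 here)
```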


Session 3: Time-Variation and Dynamic Covariates
1. Time-varying covariates
2. Time-varying effects
3. Quick and dirty
4. Aalen revisited
5. Dynamic covariates

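Time-varying covariates
Cox: $\alpha_i(t \mid x_i) = \alpha_0(t) e^{\beta x_i}$
Partial likelihood
$$L = \prod_{i:\,\text{event at } t_i} \left( \frac{e^{\beta^T x_i}}{\sum_{j \in R_i} e^{\beta^T x_j}} \right)$$
Also works with Cox: $\alpha_i(t \mid x_i) = \alpha_0(t) e^{\beta x_i(t)}$
$$L = \prod_{i:\,\text{event at } t_i} \left( \frac{e^{\beta^T x_i(t_i)}}{\sum_{j \in R_i} e^{\beta^T x_j(t_i)}} \right)$$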

Ships data
1. Sales of 3908 ships
2. Fixed covariates: type (three types), weight, speed
3. Time-varying covariates: owner number, price index

id   type  dwt    speed  owner  index  start  stop  cens
713  1     37971  15     1      23     79     80    0
713  1     37971  15     1      23     80     81    1
713  1     37971  15     2      23     81     82    0
713  1     37971  15     2      23     82     83    0
713  1     37971  15     2      22     83     84    0
713  1     37971  15     2      22     84     85    0
713  1     37971  15     2      22     85     86    0
713  1     37971  15     2      22     86     87    0
713  1     37971  15     2      22     87     88    1
713  1     37971  15     3      22     88     89    0
...

> fit1 = coxph(Surv(start, stop, cens) ~ as.factor(type)*owner + dwt + speed + index, data = shipslong)
                        coef        se(coef)   z
as.factor(type)2        -0.3411     0.1051     -3.25
as.factor(type)3        -0.1996     0.0659     -3.03
owner                   -0.0072     0.0187     -3.86
dwt                     -1.709e-07  2.253e-07  -0.76
speed                   -0.0419     0.0107     -3.90
index                   0.4746      0.0833     5.70
as.factor(type)2:owner  0.1387      0.0401     3.46
as.factor(type)3:owner  0.0874      0.3080     2.84

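Time-varying effects
Cox: $\alpha_i(t \mid x_i) = \alpha_0(t) e^{\beta(t) x_i}$
Various methods to estimate smooth $\beta(t)$
Quick and dirty: changepoint
$$\alpha(t \mid x) = \begin{cases} \alpha_0(t) e^{\beta_1 x} & t \le \tau \\ \alpha_0(t) e^{\beta_2 x} & t > \tau \end{cases}$$
$$L(\beta_1, \beta_2) = \prod_{i:\,t_i \le \tau} \left( \frac{e^{\beta_1 x_i}}{\sum_{j \in R_i} e^{\beta_1 x_j}} \right)^{\delta_i} \prod_{i:\,t_i > \tau} \left( \frac{e^{\beta_2 x_i}}{\sum_{j \in R_i} e^{\beta_2 x_j}} \right)^{\delta_i}$$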

Time-varying effects: quick and dirty
1. Split the time axis at a point $\tau$
2. Fit a Cox model to times before $\tau$; get estimates $\hat\beta_1$
3. Fit a Cox model to times after $\tau$; get estimates $\hat\beta_2$
4. Do $\hat\beta_1$ and $\hat\beta_2$ seem to be very different?
5. Try various $\tau$; compare (log) likelihoods

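Leukaemia data
> tau = 1
> fit0 = coxph(Surv(time, cens) ~ age + male + wbc + dep, data = leukaemia)
> leuk1 = leukaemia
> leuk2 = leukaemia
> i1 = leukaemia$time > tau
> i2 = leukaemia$time <= tau
> leuk1$cens[i1] = 0   # censor events after tau: leuk1 keeps only events up to tau
> leuk2$cens[i2] = 0   # censor events up to tau: leuk2 keeps only events after tau
> fit1 = coxph(Surv(time, cens) ~ age + male + wbc + dep, data = leuk1)
> fit2 = coxph(Surv(time, cens) ~ age + male + wbc + dep, data = leuk2)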

Leukaemia data
> fit0$loglik
[1] -5457.211 -5325.523
> fit1$loglik
[1] -4248.897 -4111.588
> fit2$loglik
[1] -1208.315 -1198.187
> fit1$loglik[2] + fit2$loglik[2]
[1] -5309.774
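A likelihood ratio statistic makes this comparison of log-likelihoods concrete. The split model has four extra parameters (one $\beta_2$ per covariate). For the single prespecified $\tau = 1$, the statistic can therefore be referred to $\chi^2_4$. This calculation is illustrative and is not reported on the slides. If $\tau$ is chosen by searching over several values, the nominal $\chi^2$ reference no longer applies.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom."""
    assert df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

loglik_single = -5325.523         # fit0$loglik[2]
loglik_split = -5309.774          # fit1$loglik[2] + fit2$loglik[2]
lr = 2 * (loglik_split - loglik_single)
p_value = chi2_sf_even_df(lr, 4)
print(lr, p_value)
```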

fit1
      coef     se      z
age   0.0369   0.0027  13.84
male  0.0524   0.0788  0.66
wbc   0.0034   0.0005  7.30
dep   0.0442   0.0105  4.23
fit2
      coef     se      z
age   0.0148   0.0036  4.06
male  0.1269   0.1335  0.95
wbc   0.0016   0.0014  1.14
dep   -0.0104  0.0184  -0.57

Aalen additive
$\alpha_i(t \mid \mathcal{F}_{t-}) = \beta_0(t) + \beta_1(t) x_{i1}(t) + \beta_2(t) x_{i2}(t) + \cdots$
Inference on cumulative coefficients $B_j(t) = \int_0^t \beta_j(u)\,du$
Martingale theory for (cumulative) standard errors

Leukaemia data
[Figure: estimated cumulative coefficients for age, male, wbc and dep plotted against time, 0 to 5.]


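Recap
Counting process $N(t)$, history $\mathcal{F}_t$, intensity $\alpha(t \mid \mathcal{F}_{t-})$, at-risk indicator $Y(t)$
$E[N(t)] = \int_0^t Y(u)\,\alpha(u \mid \mathcal{F}_{u-})\,du$
Static models: $\alpha(t \mid \mathcal{F}_{t-}) = \alpha(t \mid \mathcal{F}_0)$
Cox: $\alpha(t \mid \mathcal{F}_0) = \alpha_0(t) e^{\beta x}$
Frailty: $\alpha(t \mid \mathcal{F}_0, Z) = Z\alpha_0(t) e^{\beta x}$
Aalen (constant): $\alpha(t \mid \mathcal{F}_0) = \beta_0(t) + \beta_1 x_1(t) + \cdots$
Aalen (varying): $\alpha(t \mid \mathcal{F}_0) = \beta_0(t) + \beta_1(t) x_1(t) + \cdots$
Logistic: $\alpha(t \mid \mathcal{F}_0) = \mathrm{expit}\{\beta_0(t) + \beta_1(t) x_1(t) + \cdots\}$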

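Dynamic models
$E[N(t)] = \int_0^t Y(u)\,\alpha(u \mid \mathcal{F}_{u-})\,du$
Static models: $\alpha(t \mid \mathcal{F}_{t-}) = \alpha(t \mid \mathcal{F}_0)$
Dynamic models incorporate $\mathcal{F}_{t-}$
Frailty: $\alpha(t \mid \mathcal{F}_{t-}) = E[Z \mid \mathcal{F}_{t-}]\,\alpha_0(t) e^{\beta x}$
Dynamic covariate $D_t = g(\mathcal{F}_{t-})$, e.g.
$$D_t = \frac{\text{number of events before } t}{\text{days at risk before } t}$$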
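Here are two ingredients of this slide, sketched in Python with hypothetical function names. The first is the previous-episode rate $D_t$, assuming continuous follow-up from time 0. The second is $E[Z \mid \mathcal{F}_{t-}]$ for a gamma frailty with mean 1 and variance $\theta$. For the latter, a standard conjugacy argument gives $\{1 + \theta N(t-)\}/\{1 + \theta \Lambda(t-)\}$, where $\Lambda(t-)$ is the cumulative intensity accumulated so far with $Z$ set to 1.

```python
def previous_episode_rate(event_times, t, entry=0.0):
    """D_t = (number of events strictly before t) / (time at risk before t).

    Assumes continuous follow-up from `entry`; returns 0.0 when no time has
    elapsed (a convention chosen for this sketch).
    """
    at_risk = t - entry
    if at_risk <= 0:
        return 0.0
    return sum(1 for s in event_times if entry <= s < t) / at_risk

def gamma_frailty_posterior_mean(n_events, cum_intensity, theta):
    """E[Z | F_{t-}] for Z ~ Gamma(mean 1, variance theta), given N(t-) and
    the Z = 1 cumulative intensity Lambda(t-): posterior shape 1/theta + N,
    posterior rate 1/theta + Lambda."""
    return (1.0 + theta * n_events) / (1.0 + theta * cum_intensity)

# Hypothetical episodes on days 10, 30 and 50
print(previous_episode_rate([10, 30, 50], 40))        # 2 events in 40 days
print(gamma_frailty_posterior_mean(2, 1.0, theta=0.5))
```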

Diarrhoea data
E.g. $D_t$ = previous episode rate (episodes/time)
Example: test for effect of rain-affected accommodation (Wald)

Model          Rain-affected  Previous episode rate
No dynamic     3.70
Include $D_t$  1.53           6.78

[Diagram: fixed $X$ with effect $\beta_{XY}(t)$ on $Y_t$.]

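[Diagram: fixed $X$ with direct effect $\beta_{XY.D}(t)$ on $Y_t$; $X$ affects $D_t$ through $\beta_{XD}(t)$; $D_t$ affects $Y_t$ through $\beta_{DY}(t)$.]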

Solution
Assume $D_t = X\gamma_t + Z_t$
Use $\hat Z_t = D_t - \hat D_t = D_t - X\hat\gamma_t = D_t - X(X^T X)^{-1} X^T D_t$
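At a single time $t$, the residualisation step is an ordinary least-squares projection. The minimal Python sketch below solves the normal equations directly; the function name is hypothetical. In practice $\hat\gamma_t$ would be re-estimated at each event time, and a library least-squares routine would be preferable to hand-written elimination.

```python
def residualise(D, X):
    """Z_hat = D - X (X^T X)^{-1} X^T D, solving the normal equations.

    D: n values of the dynamic covariate at one time t.
    X: n rows of fixed covariates (include a 1 for an intercept if wanted).
    """
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T D]
    M = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * d for r, d in zip(X, D))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    gamma = [M[a][p] / M[a][a] for a in range(p)]   # gamma_hat_t
    return [d - sum(g * xv for g, xv in zip(gamma, row)) for d, row in zip(D, X)]

# Intercept plus a binary covariate: residuals are deviations from group means
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
D = [0.1, 0.3, 0.5, 0.7, 0.9]
Z = residualise(D, X)
print(Z)
```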

[Diagram: $X$ with effect $\beta_{XY}(t)$ on $Y_t$; $\hat Z_t$ with effect $\beta_{ZY}(t)$ on $Y_t$.]

Example: test for effect of rain-affected accommodation
$D_t$ = previous episode rate (episodes/time)

Model               Rain-affected  Previous episode rate
No dynamic          3.70
Include $D_t$       1.53           6.78
Include $\hat Z_t$  3.79           6.77


Session 4: Competing Risks
1. Set-up
2. Cumulative incidence function
3. Cause-specific hazards
4. Subdistribution hazards
5. Words from the wise

Competing risks
Events of more than one type, e.g. death from one of several causes
Time to first event
Latent failure time interpretation: $T = \min\{T_1, T_2, \ldots\}$
Independence (of the latent times) not identifiable

Multistate interpretation
[Diagram: initial state 0 with transitions to states 1, 2 and 3.]
$C_i(t)$ = state (0, 1, 2, ...) of person $i$ at time $t$
$T = \inf\{t > 0 : C(t) \ne 0\}$

Single-event survival recap
$S(t) = \exp\{-A(t)\}$
Kaplan-Meier $\hat S(t) = \prod_{i:\,t_i \le t} \left(1 - \frac{d(t_i)}{n(t_i)}\right)$
($n(t_i)$, $d(t_i)$: number at risk and number of events at $t_i$)
Nelson-Aalen $\hat A(t) = \sum_{i:\,t_i \le t} \frac{d(t_i)}{n(t_i)}$
Assume independent right-censoring

Cumulative Incidence Function
Two causes from now on
$T$ = time in state 0, $C = C(\infty) = C(T)$
CIF: $F_j(t) = P(T \le t, C = j)$
Marginal $S(t) = 1 - F_1(t) - F_2(t)$, estimated by the usual Kaplan-Meier with event types pooled
$T$ is proper: $\lim_{t \to \infty} S(t) = 0$
But $\lim_{t \to \infty} F_j(t) = P(C = j) < 1$

Naive Kaplan-Meier
One cause at a time: treat other causes as censoring
For cause $j$: $\hat S_j(t) = \prod_{i:\,t_i \le t} \left(1 - \frac{d_j(t_i)}{n(t_i)}\right)$
($d_j(t_i)$: number of events of type $j$ at $t_i$)
Does not in general estimate $1 - F_j(t)$

Kidney transplant data
[Figure: survival against time (years, 0 to 25): naive KM for transplant failure and naive KM for death.]

Kidney transplant data
[Figure: survival against time (years, 0 to 25): 1 - naive KM for transplant failure and naive KM for death.]

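Transition probabilities and cause-specific hazards
$P_{jk}(s, t) = P(C(t) = k \mid C(s) = j)$, with $P_{00}(0, t) = S(t)$
Cause-specific hazard $\alpha_j(t) = \lim_{dt \to 0} P_{0j}(t, t + dt)/dt$
Cumulative cause-specific hazard $A_j(t) = \int_0^t \alpha_j(u)\,du$ can be estimated by
$$\hat A_j(t) = \sum_{i:\,t_i \le t} \frac{d_j(t_i)}{n(t_i)}$$
And
$$\hat F_j(t) = \sum_{i:\,t_i \le t} \hat S(t_i-)\,\frac{d_j(t_i)}{n(t_i)}$$
where $\hat S(t_i-)$ is the pooled Kaplan-Meier estimate just before $t_i$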
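This Python sketch computes $\hat F_j(t)$ for two causes alongside the naive $1 - \hat S_j(t)$, on a made-up sample of eight subjects; the function name is hypothetical. The example shows two things. First, the estimated CIFs add up to $1 - \hat S(t)$ for the pooled Kaplan-Meier. Second, each naive estimate exceeds the corresponding CIF.

```python
from collections import Counter

def cif_vs_naive(times, causes):
    """Return ({j: (F_hat_j, naive 1 - S_hat_j)}, S_hat) at the last event time.

    causes: 0 = censored, 1 or 2 = cause of failure.
    """
    d = {j: Counter(t for t, c in zip(times, causes) if c == j) for j in (1, 2)}
    S = 1.0                     # pooled Kaplan-Meier, held at S_hat(t_i -)
    F = {1: 0.0, 2: 0.0}        # cumulative incidence estimates
    naive = {1: 1.0, 2: 1.0}    # KM treating the other cause as censoring
    for t in sorted(set(d[1]) | set(d[2])):
        n = sum(1 for u in times if u >= t)          # n(t_i)
        for j in (1, 2):
            F[j] += S * d[j][t] / n                  # S_hat(t_i -) d_j(t_i) / n(t_i)
            naive[j] *= 1.0 - d[j][t] / n
        S *= 1.0 - (d[1][t] + d[2][t]) / n           # update to S_hat(t_i)
    return {j: (F[j], 1.0 - naive[j]) for j in (1, 2)}, S

times = [1, 2, 3, 4, 5, 6, 7, 8]
causes = [1, 2, 1, 0, 2, 1, 0, 2]
res, S_end = cif_vs_naive(times, causes)
print(res, S_end)
```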

Kidney transplant data
[Figure: survival against time (years, 0 to 25): 1 - KM for transplant failure, CIF for transplant failure, KM for death and 1 - CIF for death.]

Kidney transplant data
[Figure: survival against time (years, 0 to 25): 1 - KM for transplant failure, CIF for transplant failure, 1 - KM for death and CIF for death.]

Categorical covariates
CIF via
> cif = survfit(Surv(time, cens, type = "mstate") ~ 1, data = kidney)
As usual
> cif = survfit(Surv(time, cens, type = "mstate") ~ capd, data = kidney)

Kidney transplant data: CIF
[Figure: estimated CIFs against time (years, 0 to 25) for failure and death, with and without CAPD.]

General covariates
Cause-specific hazards can be modelled as usual: $\alpha_j(t \mid x) = \alpha_{0j}(t)\exp\{\beta_j x\}$
Estimation as usual

> coxph(Surv(time, censdeath) ~ rage + capd + drmm + bmm, data = kidney)
      coef     se      z
rage  -0.0131  0.0021  -6.16
capd  -0.1457  0.0651  -2.24
drmm  0.2779   0.0513  5.42
bmm   0.0865   0.0518  1.67
> coxph(Surv(time, censfail) ~ rage + capd + drmm + bmm, data = kidney)
      coef     se(coef)  z
rage  0.0456   0.0032    14.23
capd  -0.1586  0.0868    -1.83
drmm  0.2297   0.0721    3.19
bmm   0.2009   0.0681    2.95

Interpretation
Depends on all causes:
$$F_1(t) = \int_0^t S(u \mid x)\,\alpha_1(u \mid x)\,du = \int_0^t e^{-\{A_1(u \mid x) + A_2(u \mid x)\}}\,\alpha_1(u \mid x)\,du = \int_0^t e^{-\{A_{01}(u) e^{\beta_1 x} + A_{02}(u) e^{\beta_2 x}\}}\,\alpha_{01}(u) e^{\beta_1 x}\,du$$
Better to calculate $P_{0j}(0, t)$ for specific $x$
Aalen-Johansen transition matrix $P(0, t) = \prod_{u \le t}\{I + dA(u)\}$
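In matrix form, take states 0 (OK), 1 and 2 (both absorbing). Each factor $I + d\hat A(u)$ then has off-diagonal entries $d\hat A_{01}(u)$ and $d\hat A_{02}(u)$, with the diagonal chosen so that rows sum to 1. The Python sketch below plugs in made-up increments $d_j(t_i)/n(t_i)$ from a small illustrative sample, and the function names are hypothetical. For a specific $x$, the increments would instead come from the fitted cause-specific models. The first row of the result gives $\hat P_{00}$, $\hat F_1$ and $\hat F_2$.

```python
def matmul(P, Q):
    """3 x 3 matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def aalen_johansen(increments):
    """P_hat(0, t) = prod_{u <= t} (I + dA_hat(u)) for states 0, 1, 2.

    increments: (dA01, dA02) jumps at the ordered event times up to t.
    """
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for dA01, dA02 in increments:
        step = [[1.0 - dA01 - dA02, dA01, dA02],   # row 0 of I + dA sums to 1
                [0.0, 1.0, 0.0],                   # states 1 and 2 are absorbing
                [0.0, 0.0, 1.0]]
        P = matmul(P, step)
    return P

# Made-up increments d_j / n at six event times (causes 1, 2, 1, 2, 1, 2)
increments = [(1 / 8, 0.0), (0.0, 1 / 7), (1 / 6, 0.0),
              (0.0, 1 / 4), (1 / 3, 0.0), (0.0, 1.0)]
P = aalen_johansen(increments)
print(P[0])   # [P_00, F_1, F_2] at the last event time
```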

State probabilities
Recipient aged 60, no CAPD, drmm = 1, bmm = 1
[Figure: estimated state probabilities $P(t)$ for OK, Failed and Dead against time $t$, 0 to 35.]

Subdistribution hazards
Standard survival: $\alpha(t) = -\frac{d \log S(t)}{dt}$
CIF: $F_j(t) = P(T \le t, C = j)$
Define $\tilde\alpha_j(t) = -\frac{d \log\{1 - F_j(t)\}}{dt}$
Interpretation: $\tilde\alpha_j(t)\,dt \approx P(C(t + dt) = j \mid C(t) \ne j)$
Compare with cause-specific: $\alpha_j(t)\,dt \approx P(C(t + dt) = j \mid C(t) = 0)$
Fine and Gray use proportional hazards for $\tilde\alpha_j(t)$
Problem: estimation requires us to assume that individuals in state $k \ne j$ remain in the risk sets for transition to $j$
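The two hazards can differ sharply. Suppose, for illustration, that the cause-specific hazards $\alpha_1$ and $\alpha_2$ are constant. Then $F_1(t) = \frac{\alpha_1}{\alpha_1 + \alpha_2}\{1 - e^{-(\alpha_1 + \alpha_2)t}\}$. The subdistribution hazard $\tilde\alpha_1(t) = f_1(t)/\{1 - F_1(t)\}$ starts at $\alpha_1$ and then decreases towards 0, even though the cause-specific hazard stays constant. A Python sketch with illustrative values:

```python
import math

a1, a2 = 0.3, 0.7    # constant cause-specific hazards (illustrative)
a = a1 + a2

def F1(t):
    """CIF for cause 1: (a1 / a) * (1 - exp(-a t))."""
    return a1 / a * (1.0 - math.exp(-a * t))

def subdist_hazard1(t):
    """-d log(1 - F1(t)) / dt = f1(t) / (1 - F1(t)), with f1(t) = a1 exp(-a t)."""
    return a1 * math.exp(-a * t) / (1.0 - F1(t))

for t in (0.0, 1.0, 5.0):
    print(t, subdist_hazard1(t), a1)   # subdistribution vs cause-specific hazard
```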

Other approaches
$P(T \le t, C = j) = P(T \le t \mid C = j)\,P(C = j)$
$P(T \le t, C = j) = P(C = j \mid T \le t)\,P(T \le t)$
Parametric models

Final words (Andersen & Keiding, SiM, 2012)
1. Do not condition on the future
2. Do not regard individuals at risk after they have died
3. Stick to this world