Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Statistics 255 - Survival Analysis. Presented February 23, 2016. Dan Gillen, Department of Statistics, University of California, Irvine

Survival analysis involves subjects moving through time:
- Hazard may change over time
- Relative hazard may change over time
- Factors (covariates) determining relative hazard may change over time

Why should we only examine covariates that are fixed at their baseline values?

Examples

1. In a study of Medicare patients, interest is in the effect of heart failure on mortality (age at death). We wish to allow the relative hazard to change as subjects experience sentinel events such as:
   - ischemia
   - arrhythmia
   - pump failure
   The health status of subjects is dynamic.
2. In HIV studies of the time from infection (seroconversion) to AIDS, the risk of AIDS onset varies dramatically with the subject's CD4 T-cell count, which is a smooth function of time.
3. The relative hazard between high-risk and low-risk AML (leukemia) patients may decline with time.

These effects can be expressed with time-varying covariates:

1. $x_i(t) = \text{ischemia}_i(t)$: indicator for the $i$th subject of an ischemia history before age $t$
2. $x_i(t) = \text{CD4}_i(t)$: the $i$th HIV patient's CD4 count at time $t$ since HIV seroconversion
3. $x_{i1} = I(\text{risk}_i = \text{high})$ is a baseline high-risk indicator, and $x_{i2}(t) = x_{i1} \log(t)$ is the interaction of risk status with time

Conceptually, these effects are easy to include in a proportional hazards model. Why?

The partial likelihood for the proportional hazards model with time-fixed covariates is

$$L_P = \prod_{\text{failure times } j} \left( \frac{\exp(\beta^T x_{(j)})}{\sum_{i \in R_{(j)}} \exp(\beta^T x_i)} \right)$$

- $L_P$ compares the covariate $x_{(j)}$ of the failed subject to the covariates $x_i$ in the risk set $R_{(j)}$ at time $t_{(j)}$
- But there is absolutely no reason that the covariate $x_i$ for the $i$th subject has to be the same for all times $t_{(j)}$

The partial likelihood for the proportional hazards model with time-varying covariates is

$$L_P = \prod_{\text{failure times } j} \left( \frac{\exp\{\beta^T x_{(j)}(t_{(j)})\}}{\sum_{i \in R_{(j)}} \exp\{\beta^T x_i(t_{(j)})\}} \right)$$

where $x_i(t_{(j)})$ is the value of the $i$th subject's covariate at time $t_{(j)}$.

- $L_P$ now compares the covariate $x_{(j)}(t_{(j)})$ at time $t_{(j)}$ of the failed subject to the covariates $x_i(t_{(j)})$ at time $t_{(j)}$ in the risk set $R_{(j)}$
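To make the risk-set comparison concrete, here is a minimal sketch in base R that evaluates the log of this partial likelihood for a single covariate stored in (start, stop] form. The function name `log_partial_lik` and the `toy` data are hypothetical illustrations, not part of the lecture, and tied failure times are handled in the simple Breslow fashion.

```r
# Log partial likelihood for counting-process data with one covariate.
# `d` needs columns start, stop, event, x; `beta` is a scalar coefficient.
# Hypothetical helper for illustration (ties handled as in Breslow's method).
log_partial_lik <- function(beta, d) {
  total <- 0
  for (r in which(d$event == 1)) {
    t_j <- d$stop[r]
    # Risk set at t_j: rows whose interval (start, stop] contains t_j,
    # so each subject contributes the covariate value current at t_j
    at_risk <- d$start < t_j & d$stop >= t_j
    total <- total + beta * d$x[r] - log(sum(exp(beta * d$x[at_risk])))
  }
  total
}

# Toy data: subject 2 switches from x = 0 to x = 1 at time 3
toy <- data.frame(
  id    = c(1, 2, 2, 3),
  start = c(0, 0, 3, 0),
  stop  = c(4, 3, 6, 5),
  event = c(1, 0, 1, 0),
  x     = c(0, 0, 1, 0)
)

log_partial_lik(0, toy)
```

At the failure at time 4 the risk set holds subjects 1, 2 and 3 with covariate values 0, 1 and 0; at the failure at time 6 only subject 2 remains. The log partial likelihood is therefore -log(2 + exp(beta)).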

To formalize, consider the data $(y_1, \delta_1, x_1(t)), \ldots, (y_n, \delta_n, x_n(t))$, where
- $y_i$ = observed follow-up time
- $\delta_i$ = failure (vs. censoring) indicator
- $x_i(t)$ = vector covariate function of time $t$

Note that $x_i(t)$ is a set of values that may vary with $t$. $R_{(j)}$ is the set of subjects at risk for failure at $t_{(j)}$.

Q: Who is in the risk set?
A: The covariate values at risk at $t_{(j)}$ are $\{x_i(t_{(j)}) : i \in R_{(j)}\}$

The partial likelihood is constructed by comparing the risk given $x_{(j)}(t_{(j)})$ to the risk given all other $x_i(t_{(j)})$ for $i \in R_{(j)}$:

$$L_{(j)} = \frac{\Pr\{\text{subject with } x_{(j)}(t_{(j)}) \text{ fails at } t_{(j)}\}}{\Pr\{\text{some subject in } R_{(j)} \text{ failed at } t_{(j)}\}}$$

and

$$L_P = \prod_{\text{failure times } j} L_{(j)}$$

The model is

$$\lambda_i\{t \mid x_i(\cdot)\} = \lambda_0(t) \exp\{\beta^T x_i(t)\}$$

Note: The cumulative hazard function for the $i$th subject is then

$$\Lambda_i\{t \mid x_i(\cdot)\} = \int_0^t \lambda_i\{s \mid x_i(\cdot)\}\, ds = \int_0^t \lambda_0(s) \exp\{\beta^T x_i(s)\}\, ds$$

which no longer factors into $\Lambda_0(t)$ and $\exp(\beta^T x_i)$.

- The baseline cumulative hazard $\Lambda_0(t)$ now refers to a subject with $x_i(t) = 0$ for all $t$
- $\lambda_0(t)$, $\Lambda_0(t)$, $S_0(t)$ may no longer have a clear interpretation
- Recovering $\Lambda_i\{t \mid x_i(\cdot)\}$ is more difficult than before, and perhaps less meaningful
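As a worked special case (an illustration, not taken from the slides), suppose there is a single binary covariate that switches from 0 to 1 at a subject-specific time $t_i^*$, so that $x_i(t) = I(t > t_i^*)$. Splitting the integral at $t_i^*$ gives:

```latex
\Lambda_i\{t \mid x_i(\cdot)\} =
\begin{cases}
  \Lambda_0(t), & t \le t_i^* \\[4pt]
  \Lambda_0(t_i^*) + e^{\beta}\,\{\Lambda_0(t) - \Lambda_0(t_i^*)\}, & t > t_i^*
\end{cases}
```

The result depends on the switching time $t_i^*$ as well as on $\beta$, which is why it cannot be written as $\Lambda_0(t)$ times a single subject-level multiplier.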

Consider the bone-marrow transplant data (Section 1.3, K & M):
- time origin is time of transplant
- failure event is relapse or death

Intermediate events that could affect the hazard for death or relapse include:
- development of acute graft-versus-host disease (agvhd)
- development of chronic graft-versus-host disease (cgvhd)
- return of platelet count to a self-sustaining level (platelet recovery)

We are interested here in the effect of platelet recovery (iplate), a binary variable taking value 0 at t = 0 and, for some subjects, taking value 1 at some time tplate.

Let's look at the data:

    ### Read the data, sort by follow-up time, drop the first column,
    ### and add a sequential subject id
    > bmt <- read.csv( "http://www.ics.uci.edu/~dgillen/STAT295/Data/bmt.csv" )
    > bmt <- bmt[ order(bmt$tnodis), ]
    > bmt <- bmt[,-1]
    > bmt <- cbind( 1:dim(bmt)[1], bmt )
    > names(bmt)[1] <- "id"

    ### Center age of patient and donor for later interpretation
    > bmt$agep.c = bmt$agep - 28
    > bmt$aged.c = bmt$aged - 28
    > bmt[1:25,c("id","tnodis","inodis", "iplate", "tplate")]
        id tnodis inodis iplate tplate
    35   1      1      1      0      1
    108  2      2      1      0      2
    84   3     10      1      0     10
    129  4     16      1      0     16
    114  5     32      1      1     16
    87   6     35      1      0     35
    109  7     47      1      1     11
    116  8     47      1      1     28
    80   9     48      1      1     14
    132 10     48      1      1     30

Data analysis set-up

- For a subject who never experiences platelet recovery, iplate = 0 for all time t up to the end of observation tnodis. This is like one observation with one exit time.
- For a subject experiencing platelet recovery, think of the data as two observations:
  - first, we have an observation entering at time t = 0 and being censored at time t = tplate. iplate = 0 for this observation
  - then, we have an observation entering the risk set just after time t = tplate and exiting at time t = tnodis. iplate = 1 for this observation

Data analysis set-up

    ### Create a duplicate record for each subject
    > bmt.tvc <- bmt[rep(1:dim(bmt)[1],each=2),]
    > id.1 <- !duplicated( bmt.tvc$id )
    > id.2 <- duplicated( bmt.tvc$id )
    ### Deal with first record, set start, stop, event indicator,
    ### and covariate value
    > bmt.tvc$start[ id.1 | bmt.tvc$iplate==0 ] <- 0
    > bmt.tvc$stop[ id.1 & bmt.tvc$iplate==1 ] <- bmt.tvc[ id.1 & bmt.tvc$iplate==1, ]$tplate
    > bmt.tvc$inodis[ id.1 & bmt.tvc$iplate==1 ] <- 0
    > bmt.tvc$iplate[ id.1 ] <- 0
    ### Deal with second record, set start and stop
    > bmt.tvc$start[ id.2 & bmt.tvc$iplate==1 ] <- bmt.tvc[ id.2 & bmt.tvc$iplate==1, ]$tplate
    > bmt.tvc$stop[ id.2 ] <- bmt.tvc[ id.2, ]$tnodis

Data analysis set-up

    ### Remove records with missing stop value
    ### (these did not change iplate status)
    > bmt.tvc <- bmt.tvc[ !is.na(bmt.tvc$stop), ]
    > bmt.tvc[1:20,c("id","tnodis", "start", "stop", "inodis", "iplate")]
          id tnodis start stop inodis iplate
    35.1   1      1     0    1      1      0
    108.1  2      2     0    2      1      0
    84.1   3     10     0   10      1      0
    129.1  4     16     0   16      1      0
    114    5     32     0   16      0      0
    114.1  5     32    16   32      1      1
    87.1   6     35     0   35      1      0
    109    7     47     0   11      0      0
    109.1  7     47    11   47      1      1
    116    8     47     0   28      0      0
    116.1  8     47    28   47      1      1
    80     9     48     0   14      0      0
    80.1   9     48    14   48      1      1
    132   10     48     0   30      0      0
    132.1 10     48    30   48      1      1

Data analysis set-up

Interpretation of what we have so far:
- id indicates multiple observations on a single subject
- start is the entry time for each observation
- stop is the exit time for each observation
- the exit time for the previous observation is used as the entry time for the following observation

To create the Surv() response we can now specify Surv(start, stop, event):

    ### Look at the Surv() object
    > Surv( bmt.tvc$start, bmt.tvc$stop, bmt.tvc$inodis)
    [1] ( 0, 1 ] ( 0, 2 ] ( 0, 10 ] ( 0, 16 ] ( 0, 16+] ( 16, 32 ] ( 0, 35 ]
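The same start/stop layout can also be built with `tmerge()` from the survival package instead of duplicating rows by hand. The sketch below is an alternative to the slide code, not the lecture's method. It uses a small made-up data frame whose column names mirror the bmt variables, and it assumes the recovery time has been recoded to NA for subjects who never recover.

```r
library(survival)

# Hypothetical toy data in the layout of the bmt columns used above.
# tplate is NA for a subject who never experiences platelet recovery.
toy <- data.frame(
  id     = 1:3,
  tnodis = c(10, 32, 47),
  inodis = c(1, 1, 1),
  tplate = c(NA, 16, 11)
)

# event() sets the end of follow-up and the event indicator;
# tdc() creates a 0/1 covariate that switches to 1 after tplate
long <- tmerge(
  data1 = toy, data2 = toy, id = id,
  relapse = event(tnodis, inodis),
  plate   = tdc(tplate)
)

long[, c("id", "tstart", "tstop", "relapse", "plate")]
```

In the bmt data shown earlier, tplate equals tnodis when iplate is 0, so a recoding step such as `ifelse(iplate == 1, tplate, NA)` would be needed before a call like this.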

Model fitting then proceeds by calling coxph() as before, with the "expanded" Surv() object

    ### Fit Cox model with time-varying covariate iplate
    > fit <- coxph( Surv( start, stop, inodis) ~ fab + agep.c*aged.c + factor(g) + imtx + iplate, data=bmt.tvc )
    > summary(fit)
                      coef exp(coef) se(coef)     z Pr(>|z|)
    fab            0.82003   2.27056  0.28379  2.89   0.0039 **
    agep.c         0.00588   1.00590  0.01985  0.30   0.7671
    aged.c         0.00505   1.00506  0.01785  0.28   0.7773
    factor(g)2    -0.95837   0.38352  0.36382 -2.63   0.0084 **
    factor(g)3    -0.36327   0.69540  0.37352 -0.97   0.3308
    imtx           0.23177   1.26083  0.25775  0.90   0.3685
    iplate        -0.94348   0.38927  0.34109 -2.77   0.0057 **
    agep.c:aged.c  0.00288   1.00289  0.00093  3.10   0.0019 **

                  exp(coef) exp(-coef) lower.95 upper.95
    fab               2.271      0.440    1.302    3.960
    agep.c            1.006      0.994    0.968    1.046
    aged.c            1.005      0.995    0.971    1.041
    factor(g)2        0.384      2.607    0.188    0.782
    factor(g)3        0.695      1.438    0.334    1.446
    imtx              1.261      0.793    0.761    2.090
    iplate            0.389      2.569    0.199    0.760
    agep.c:aged.c     1.003      0.997    1.001    1.005

From the Wald test on the previous page, we obtain a p-value of 0.0057, indicating that platelet recovery is significantly associated with increased survival. We can also perform a likelihood ratio test...

    ### Fit Cox model without time-varying covariate for LRT
    > fit.red <- coxph( Surv( start, stop, inodis) ~ fab + agep.c*aged.c + factor(g) + imtx, data=bmt.tvc )
    > anova( fit.red, fit )
    Analysis of Deviance Table
     Cox model: response is Surv(start, stop, inodis)
     Model 1: ~ fab + agep.c * aged.c + factor(g) + imtx
     Model 2: ~ fab + agep.c * aged.c + factor(g) + imtx + iplate
      loglik Chisq Df P(>|Chi|)
    1   -356
    2   -353  6.53  1     0.011 *

Interpretation: Adjusting for FAB, group, ages of patient and donor, and MTX assignment, a randomly sampled subject who has had a platelet recovery by any given time t has about 2/5 the risk of relapse / death of a (similar) subject at the same time t who has not yet experienced platelet recovery.

Conclusion: Platelet recovery appears to be an important indicator of successful recovery, or at least of delayed relapse.
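The "about 2/5" figure and its confidence interval come straight from the iplate row of the summary output. A short sketch of the arithmetic, using the printed coefficient and standard error (the variable names here are illustrative):

```r
# Coefficient and standard error for iplate, copied from the summary output
coef_iplate <- -0.94348
se_iplate   <- 0.34109

# Hazard ratio and Wald 95% confidence interval on the hazard-ratio scale
hr <- exp(coef_iplate)
ci <- exp(coef_iplate + c(-1, 1) * qnorm(0.975) * se_iplate)

round(c(hr = hr, lower = ci[1], upper = ci[2]), 3)
```

This reproduces the printed exp(coef), lower.95 and upper.95 values for iplate (0.389, 0.199, 0.760).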

Recall we had a model with several covariates, and we were considering the effect of MTX (0 or 1), indicating receipt of a GVH prophylactic.

We had concluded via a graphical investigation:
- The hazards for the two MTX groups do not appear to be proportional

Can we model / test this conclusion?

Our original model was

$$\lambda_i(t \mid x_i) = \lambda_0(t) \exp(\eta_i)$$

where
- $\eta_i = \beta^T x_i$
- $x_{ij}$ = ages of patient and donor, fab, group, and $x_{i7} = \text{imtx}_i$
- $\beta_7$ is the estimated log hazard ratio at all times $t$ for imtx = 1 versus imtx = 0, holding other $x$'s constant

Consider the alternative model

$$\frac{\lambda_i(t \mid \text{imtx}_i = 1)}{\lambda_i(t \mid \text{imtx}_i = 0)} = t^{\beta_8} \exp(\beta_7)$$

This expresses a departure from the proportional hazards assumption:
- $\beta_8 = 0 \Rightarrow t^{\beta_8} = 1$ for all $t$, so $\lambda_i(t \mid \text{imtx} = 1) / \lambda_i(t \mid \text{imtx} = 0) = \exp(\beta_7)$
- Testing $H_0: \beta_8 = 0$ will provide a test of the proportional hazards assumption

The model now includes the covariates
- $x_{i7} = \text{imtx}_i$
- $x_{i8}(t) = \text{imtx}_i \log(t)$

i.e. a covariate by (log) time interaction...

Q: How do we test this in R?
A: Create a data set with one observation for each person at each observed failure time:
- entry: just after the last failure time $t_{(j-1)}$
- exit: observed failure time $t_{(j)}$

Note: Fitting such a model is computationally intensive and can be difficult for large datasets (with many failures)...
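The hazard ratio $t^{\beta_8}\exp(\beta_7)$ follows directly from this covariate definition. A short derivation, assuming the linear predictor gains the single extra term $\beta_8\,\text{imtx}_i \log(t)$ and that all other covariates are held fixed so that they cancel in the ratio:

```latex
\frac{\lambda_i(t \mid \text{imtx}_i = 1)}{\lambda_i(t \mid \text{imtx}_i = 0)}
  = \frac{\lambda_0(t)\,\exp\{\beta_7 \cdot 1 + \beta_8 \cdot 1 \cdot \log(t)\}}
         {\lambda_0(t)\,\exp\{\beta_7 \cdot 0 + \beta_8 \cdot 0 \cdot \log(t)\}}
  = \exp(\beta_7)\,\exp\{\beta_8 \log(t)\}
  = t^{\beta_8}\exp(\beta_7)
```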

Data analysis set-up

Here is one way we can obtain the necessary dataset...

    ##
    ##### Create dataset to look at an interaction with time
    u.evtimes <- unique( bmt$tnodis[ bmt$inodis==1 ] )
    num.event <- length( u.evtimes )
    bmt.texpand <- bmt[, c("id", "tnodis", "inodis", "fab", "agep.c", "aged.c", "g", "imtx") ]
    ##
    ##### Replicate each record for the number of observed failures
    bmt.texpand <- bmt.texpand[ rep(bmt.texpand$id,each=num.event), ]
    bmt.texpand$start <- rep( c(0,u.evtimes[1:(num.event-1)]), sum(!duplicated(bmt.texpand$id)) )
    bmt.texpand$stop <- rep( u.evtimes, sum(!duplicated(bmt.texpand$id)) )
    ##
    ##### Remove unnecessary rows and create event indicators
    ##
    bmt.texpand <- bmt.texpand[ bmt.texpand$tnodis > bmt.texpand$start, ]
    bmt.texpand <- bmt.texpand[ dim(bmt.texpand)[1]:1, ]
    bmt.texpand$stop <- ifelse( !duplicated(bmt.texpand$id), bmt.texpand$tnodis, bmt.texpand$stop )
    bmt.texpand$inodis <- ifelse( !duplicated(bmt.texpand$id), bmt.texpand$inodis, 0 )
    bmt.texpand <- bmt.texpand[ dim(bmt.texpand)[1]:1, ]

Now, our scientific question is whether the effect of imtx, as measured by the hazard ratio, changes with respect to time. This can be parameterized and tested using a multiplicative interaction.

    ### Test whether effect of imtx varies with time
    > fit <- coxph( Surv( start, stop, inodis) ~ fab + agep.c*aged.c + factor(g) + imtx + imtx:log(stop), data=bmt.texpand )
    > summary(fit)
                        coef exp(coef)  se(coef)     z Pr(>|z|)
    fab             0.881860  2.415388  0.278461  3.17   0.0015 **
    agep.c          0.003399  1.003405  0.019918  0.17   0.8645
    aged.c          0.000496  1.000496  0.018114  0.03   0.9782
    factor(g)2     -1.014874  0.362448  0.362198 -2.80   0.0051 **
    factor(g)3     -0.329875  0.719014  0.368384 -0.90   0.3705
    imtx            2.709578 15.022935  1.154639  2.35   0.0189 *
    agep.c:aged.c   0.003050  1.003054  0.000955  3.19   0.0014 **
    imtx:log(stop) -0.479984  0.618793  0.225561 -2.13   0.0333 *

    ### Test whether effect of imtx varies with time
    > fit <- coxph( Surv( start, stop, inodis) ~ fab + agep.c*aged.c + factor(g) + imtx + imtx:log(stop), data=bmt.texpand )
    > summary(fit)
                   exp(coef) exp(-coef) lower.95 upper.95
    fab                2.415     0.4140    1.399    4.169
    agep.c             1.003     0.9966    0.965    1.043
    aged.c             1.000     0.9995    0.966    1.037
    factor(g)2         0.362     2.7590    0.178    0.737
    factor(g)3         0.719     1.3908    0.349    1.480
    imtx              15.023     0.0666    1.563  144.406
    agep.c:aged.c      1.003     0.9970    1.001    1.005
    imtx:log(stop)     0.619     1.6160    0.398    0.963
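The survival package offers two shortcuts for this kind of analysis: `survSplit()` to cut follow-up at chosen times, and the `tt` argument of `coxph()` to define a covariate-by-time transform without building the expanded dataset. The sketch below is an alternative to the hand-built expansion, not the lecture's code. It uses the `veteran` dataset shipped with survival as a stand-in because the bmt file is not assumed to be available, and the names `vet_long`, `fit_split` and `fit_tt` are illustrative.

```r
library(survival)

# Route 1: split every subject's follow-up at each observed failure time
ev_times <- sort(unique(veteran$time[veteran$status == 1]))
vet_long <- survSplit(Surv(time, status) ~ ., data = veteran, cut = ev_times)

# After splitting, `time` is the interval end, so log(time) plays the
# role of log(stop) in the slide code
fit_split <- coxph(Surv(tstart, time, status) ~ trt + karno + trt:log(time),
                   data = vet_long)

# Route 2: let coxph() form trt * log(t) at each failure time internally
fit_tt <- coxph(Surv(time, status) ~ trt + karno + tt(trt),
                data = veteran,
                tt = function(x, t, ...) x * log(t))

cbind(split = coef(fit_split), tt = coef(fit_tt))
```

Both routes evaluate the interaction at the failure time of each risk set, so the two columns of coefficients should agree up to numerical tolerance. The `tt` route avoids storing the expanded data frame.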

The Wald test indicates a time-varying effect of imtx (z-statistic of -2.13 and p-value of 0.0333). We can also look at a likelihood ratio test...

    ### Fit Cox model without interaction for LRT
    Analysis of Deviance Table
     Cox model: response is Surv(start, stop, inodis)
     Model 1: ~ fab + agep.c * aged.c + factor(g) + imtx
     Model 2: ~ fab + agep.c * aged.c + factor(g) + imtx + imtx:log(stop)
      loglik Chisq Df P(>|Chi|)
    1   -356
    2   -354  5.36  1     0.021 *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:
- When log(t) = 0, i.e. at t = 1, the estimated relative hazard for an imtx = 1 subject compared to an imtx = 0 subject is 15.0, 95% CI [1.6, 144]
- At 1 year (t = 365), the estimated log relative hazard for an imtx = 1 subject compared to an imtx = 0 subject is log(15.02) + log(0.619) × log(365) = -0.122, giving a relative hazard estimate of exp(-0.122) = 0.885
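The time-specific hazard ratio can be evaluated at any follow-up time from the two coefficients printed in the summary output (imtx and imtx:log(stop)). A small sketch; the function name `hr_imtx` is illustrative, and only point estimates are computed because the coefficient covariance is not shown in the output:

```r
# Coefficients copied from the summary output of the interaction model
b_imtx      <- 2.709578
b_imtx_logt <- -0.479984

# Estimated hazard ratio for imtx = 1 vs imtx = 0 at time t (in days)
hr_imtx <- function(t) exp(b_imtx + b_imtx_logt * log(t))

hr_imtx(c(1, 30, 100, 365))

# Time at which the estimated hazard ratio crosses 1
t_cross <- exp(-b_imtx / b_imtx_logt)
t_cross
```

Under this fitted log-time interaction the estimated hazard ratio starts near 15 at day 1, falls below 1 at roughly 283 days, and is about 0.885 at one year.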

The lincontr.coxph() function makes this easy...

    ### Estimated effect of imtx at 1 year (365 days)
    > lincontr.coxph( fit, contr.names=c("imtx","imtx:log(stop)"), contr.coef=c(1,log(365)) )
    Test of H_0: exp( 1*imtx + 5.8999*imtx:log(stop) ) = 1 :
      exp( Est ) se.est  zstat  pval ci95.lo ci95.hi
    1      0.885  0.336 -0.363 0.716   0.458   1.711

Conclusion: The relative risk of relapse or death for MTX patients is very high early in the study, but drops off later in the study. The hazards are not proportional (likelihood ratio X^2 = 5.36 on 1 df).

Modeling with time-varying covariates

- With time-varying covariates, it is much more difficult to interpret the model in terms of survival time
  - the baseline survival function may not have a real meaning
  - you are always safe to maintain interpretation at the hazard level
- Keep your eyes on the risk sets!
- Time-varying covariate methods can be used to test the proportional hazards assumption

Modeling with time-varying covariates

Model-building strategy:
- Follow a similar strategy to that for time-fixed covariates
- Treat true time-varying covariates on equal footing with other covariates
- Treat interactions of time-fixed covariates with log(t) (used to test the proportional hazards assumption) on equal footing with other interactions: examine their effects later in the building strategy
- Go for parsimony, if possible

Modeling with time-varying covariates

- Need to be careful about the functional form of t in the model:
  - Good idea to categorize time. For example, consider the hazard ratio associated with your predictor of interest over 0-6 months, 6-12 months, etc. (See Homework 4)
  - Can also consider Schoenfeld residuals... coming soon
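One way to categorize time is to split follow-up into a few periods and estimate a separate hazard ratio in each. The sketch below follows the pattern used in the survival package's time-dependence vignette rather than the lecture's homework. It uses the `veteran` dataset as a stand-in, with cut points at 90 and 180 days chosen only for illustration.

```r
library(survival)

# Split follow-up into three periods: (0, 90], (90, 180], (180, max]
vet_pw <- survSplit(Surv(time, status) ~ ., data = veteran,
                    cut = c(90, 180), episode = "tgroup", id = "id")

# One treatment coefficient per period via the interaction with strata(tgroup);
# karno (Karnofsky score) enters as an ordinary time-fixed adjustment
fit_pw <- coxph(Surv(tstart, time, status) ~ karno + trt:strata(tgroup),
                data = vet_pw)

# Period-specific hazard ratios for the treatment variable
exp(coef(fit_pw))
```

Comparing the period-specific coefficients gives a check on proportional hazards that does not commit to log(t) or any other smooth functional form.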