The coxvc_1-1-1 package

Similar documents
A fast routine for fitting Cox models with time varying effects

Reduced-rank hazard regression

Package CoxRidge. February 27, 2015

Survival analysis in R

Survival analysis in R

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Time-dependent coefficients

β j = coefficient of x j in the model; β = ( β1, β2,

Survival Regression Models

Relative-risk regression and model diagnostics. 16 November, 2015

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

MAS3301 / MAS8311 Biostatistics Part II: Survival

SSUI: Presentation Hints 2 My Perspective Software Examples Reliability Areas that need work

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Lecture 7 Time-dependent Covariates in Cox Regression

Fitting Cox Regression Models

Lecture 8 Stat D. Gillen

Multistate models and recurrent event models

Extensions of Cox Model for Non-Proportional Hazards Purpose

Multistate models and recurrent event models

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Lecture 10. Diagnostics. Statistics Survival Analysis. Presented March 1, 2016

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

On a connection between the Bradley-Terry model and the Cox proportional hazards model

More on Cox-regression

Multivariable Fractional Polynomials

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Multivariable Fractional Polynomials

Checking model assumptions with regression diagnostics

Cox s proportional hazards/regression model - model assessment

Chapter 4 Regression Models

( t) Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilites. Ørnulf Borgan

On a connection between the Bradley Terry model and the Cox proportional hazards model

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

Survival Analysis. Stat 526. April 13, 2018

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Extensions of Cox Model for Non-Proportional Hazards Purpose

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Residuals and model diagnostics

Package SimSCRPiecewise

Matched Pair Data. Stat 557 Heike Hofmann

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Philosophy and Features of the mstate package

Step-Stress Models and Associated Inference

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

for Time-to-event Data Mei-Ling Ting Lee University of Maryland, College Park

Survival Analysis. STAT 526 Professor Olga Vitek

4. Comparison of Two (K) Samples

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Does anemia contribute to end-organ dysfunction in ICU patients Statistical Analysis

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

ST745: Survival Analysis: Cox-PH!

Understanding the Cox Regression Models with Time-Change Covariates

Time-varying covariates and coefficients in Cox regression models

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Practical considerations for survival models

Faculty of Health Sciences. Cox regression. Torben Martinussen. Department of Biostatistics University of Copenhagen. 20. september 2012 Slide 1/51

Chapter 7 Fall Chapter 7 Hypothesis testing Hypotheses of interest: (A) 1-sample

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

The nltm Package. July 24, 2006

Building a Prognostic Biomarker

Package ICGOR. January 13, 2017

Model Adequacy Test for Cox Proportional Hazard Model

Multivariate Survival Analysis

Cox Proportional-Hazards Regression for Survival Data in R

One-stage dose-response meta-analysis

TMA 4275 Lifetime Analysis June 2004 Solution

Multi-state Models: An Overview

Frailty Modeling for clustered survival data: a simulation study

Power and Sample Size Calculations with the Additive Hazards Model

Modelling geoadditive survival data

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Consider Table 1 (Note connection to start-stop process).

Semiparametric Regression

Package ICBayes. September 24, 2017

5. Parametric Regression Model

Flexible parametric alternatives to the Cox model, and more

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Chapter 7: Hypothesis testing

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

ST495: Survival Analysis: Maximum likelihood

MAS3301 / MAS8311 Biostatistics Part II: Survival

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

Lecture 22 Survival Analysis: An Introduction

Package JointModel. R topics documented: June 9, Title Semiparametric Joint Models for Longitudinal and Counting Processes Version 1.

Part III Measures of Classification Accuracy for the Prediction of Survival Times

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Meta-analysis of epidemiological dose-response studies

Validation. Terry M Therneau. Dec 2015

Survival models and health sequences

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

STAT 526 Spring Final Exam. Thursday May 5, 2011

Multistate Modeling and Applications

Outline. Cox's regression model Goodness-of-t methods. Cox's proportional hazards model: Survival analysis

Transcription:

Appendix A The coxvc_1-1-1 package A.1 Introduction The coxvc_1-1-1 package is a set of functions for survival analysis that run under R2.1.1 [81]. This package contains a set of routines to fit Cox models [24] with time varying effects of the covariates and reduced-rank models [77]. What makes those two modelling approaches so special is that an expanded data set has to be created before fitting, making the task computationally demanding, since even small data sets explode when stacking together all the possible risk sets. Using coxvc the models can be fitted on the original data, in a very fast and efficient algorithm, as described in [76]. The set of routines included in the package also contains some small useful functions that the authors often use when fitting survival models. The coxvc 1-1-1 requires packages MASS, splines and survival [64], which are automatically loaded when you use the package. Please refer to the manual of those packages for more information. The MASS [102] package is loaded for using the command ginverse which is essential when estimating the generalized inverse matrix of the information matrix from a reduced-rank model. Splines are loaded in order to transform some of the covariates when running the models. Note that this package is not essential (although the build in examples of the coxvc package use splines) but it is definitely useful in many applications. Last, the survival package is the base core of the package, since it is needed for creating the survival objects used in our examples. A.2 Statistical background The Cox proportional hazards models is the most common method to analyze survival data. However, the main assumption of proportionality - the hazard ratio of two different cases remain constant regardless of time- is often violated, especially in studies with long follow up. The most straightforward way to extent the model is via the inclusion of interactions of the covariates with time 123

The coxvc_1-1-1 package functions. A non-proportional Cox model may be written as: h(t X) = h 0 (t) exp(xθf ) (A.1) where h 0 (t) is the unspecified baseline hazard, X is an 1 p matrix of p covariates, F is a n q matrix of q time functions, and Θ is a p q matrix of estimable coefficients. Perperoglou, le Cessie and van Houwelingen [77] introduced the idea of reduced-rank regression to survival analysis with time varying coefficients. A reduced-rank model requires the matrix of regression coefficients Θ to be written as a product of two submatrices, B of size p r and Γ of size q r, thus resulting in Θ = BΓ, a matrix of reduced-rank r, smaller than the number of covariates p or the number of time functions q. For fitting the full model, r has to be chosen to be equal to the minimum (p, q), in which case the structure matrix Θ is of full rank. This package was created to fulfil the demand of fitting reduced- rank hazards models in a fast and efficient way. For motivation of the package use refer to [76]. The new version of the package contains an additional set of small functions that were found useful to the author in several cases when analyzing survival data. A.3 Examples First load the coxvc library: > library(coxvc) The sample data within this library come from a study of ovarian cancer patients [104]. There are in total 358 cases of patients with information of the following variables: 124 time The number of days from enrollment until death or censoring. death An indicator of death (1) or censoring (0). karn The karnofsky index measuring the ability of the patients to perform several tasks. diam The diameter of the residual tumor. figo The Figo index, denoting the site of the metastasis. x Patient id

A.3. Examples Table A.1: Definitions of variables and patients frequencies X k 0 1 2 3 4 Karnofsky 100 90 80 70 < 70 n 137 108 47 46 20 X f 0 1 Figo III IV n 262 96 X d 0 1 2 3 4 Diameter Micro < 1 1-2 2-5 > 5 n 29 67 49 68 145 For more information refer to table A.1. First attach the data: > data(ova) > attach(ova) A short summary of the data follows: > summary(ova) time death karn figo Min. : 7.0 Min. :0.000 Min. :0.000 Min. :0.0000 1st Qu.: 369.3 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000 Median : 734.5 Median :1.000 Median :1.000 Median :0.0000 Mean : 980.8 Mean :0.743 Mean :1.173 Mean :0.2682 3rd Qu.:1619.3 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:1.0000 Max. :2729.0 Max. :1.000 Max. :4.000 Max. :1.0000 diam x Min. :0.000 Min. : 1.00 1st Qu.:1.000 1st Qu.: 90.25 Median :3.000 Median :179.50 Mean :2.651 Mean :179.50 3rd Qu.:4.000 3rd Qu.:268.75 Max. :4.000 Max. :358.00 A simple Cox proportional hazards model can be fitted in the usual way using the coxph command from survival library: 125

The coxvc_1-1-1 package > fit.ph <- coxph(surv(time, death) ~ karn + diam + figo) > fit.ph Call: coxph(formula = Surv(time, death) ~ karn + diam + figo) coef exp(coef) se(coef) z p karn 0.172 1.19 0.0532 3.23 1.2e-03 diam 0.200 1.22 0.0495 4.05 5.2e-05 figo 0.542 1.72 0.1353 4.00 6.2e-05 Likelihood ratio test=64.1 on 3 df, p=7.68e-14 n= 358 A test of proportionality based on Schoenfeld residuals [92] reveals that in fact there are deviations from proportional hazards in the data. > cox.zph(fit.ph) rho chisq p karn -0.15773 7.23734 0.00714 diam -0.03464 0.31140 0.57682 figo -0.00405 0.00432 0.94760 GLOBAL NA 9.64451 0.02184 as it is indicated by the small global p-value given above. A graphical inspection given by: > par(mfrow = c(3, 1)) > plot(cox.zph(fit.ph)) The results are shown in figure A.1 and suggest that there may be an interaction of time with the covariates. A first approach will be to fit a full rank model, which includes the full Θ matrix. We choose to transform time using B-splines, thus create the F matrix to contain F 1 (t) = 1 a constant and cubic B-spline functions on 3 degrees of freedom: > Ft <- cbind(rep(1, nrow(ova)), bs(time, df = 3)) Then the full rank model is given by: > fit.r3 <- coxvc(surv(time, death) ~ karn + diam + figo, Ft, rank = 3, + data = ova) > fit.r3 126

A.3. Examples Beta(t) for karn 1 0 1 2 82 200 350 470 630 800 1100 1900 Time Beta(t) for diam 2.0 0.5 1.0 82 200 350 470 630 800 1100 1900 Time Beta(t) for figo 2 0 2 4 82 200 350 470 630 800 1100 1900 Time Figure A.1: Test of proportionality based on scaled Schoenfeld residuals along with a spline smooth with 90% confidence intervals. call: coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 3, data = ova) coef exp(coef) se(coef) z p karn 0.556 1.744 0.152 3.6579 0.00025 diam 0.199 1.220 0.168 1.1845 0.24000 figo 0.501 1.650 0.392 1.2781 0.20000 karn:f1(t) -1.370 0.254 0.730-1.8767 0.06100 diam:f1(t) 0.043 1.044 0.696 0.0618 0.95000 figo:f1(t) 0.543 1.721 1.797 0.3022 0.76000 karn:f2(t) 0.497 1.644 1.396 0.3560 0.72000 diam:f2(t) 0.465 1.592 1.072 0.4338 0.66000 figo:f2(t) -2.280 0.102 3.201-0.7123 0.48000 127

The coxvc_1-1-1 package karn:f3(t) -1.676 0.187 2.150-0.7795 0.44000 diam:f3(t) -1.550 0.212 1.546-1.0026 0.32000 figo:f3(t) 3.481 32.492 4.358 0.7988 0.42000 log-likelihood= -1373.840 algorithm converged in 5 iterations The class of object fit.r3 is coxvc. The generic function printcoxvc is included in the package for printing results from the full model. The model has 21 parameters, and in practice the results are identical with fitting a coxph model on the expanded data set. However, the fit here was done in 5 iterations, on the original data set, which makes the routine much faster and more efficient. There are in total 266 events present in the ovarian data set. The object fit.r3 also contains the baseline hazard evaluated at this event time points. The function expand.haz can be used for expanding either the baseline or the cumulative baseline hazard. > haz <- fit.r3$hazard > length(haz) [1] 266 > haz.exp <- expand.haz(haz, death, fun = "baseline") > length(haz.exp) [1] 358 When expanding the baseline hazard, the function assigns a zero value in the time points of censoring, while when expanding a cumulative baseline hazard, the function assigns the value of the cumulative baseline at the time where the previous event took place whenever there is a censored case. > cum.haz <- cumsum(haz) > cum.haz.exp <- expand.haz(cum.haz, death, fun = "cumulative") The function plotcoxvc is included in the package to draw figures of the time varying behavior of the covariates: > plotcoxvc(fit.r3, fun = "effects", xlab = "time in days") The same function can be also used for plotting the survival function. Since the object fit.r3 is a coxvc using plot(survfit(...)) will not give the survival plot. Instead, the function plotcoxvc can be used: 128

A.3. Examples 1.0 0.5 0.0 0.5 1.0 1.5 2.0 karn diam figo 0 500 1000 1500 2000 time in days Figure A.2: Estimated effects of the covariates over time, for the full rank model. > plotcoxvc(fit.r3, fun = "survival", xlab = "time in days") In figure A.2 we have seen that the time varying behavior of the covariates is too flexible, especially in the last days of the follow up. We fitted a rank=2 model at the data, to see whether the fit improves: > fit.r2 <- coxvc(surv(time, death) ~ karn + diam + figo, Ft, rank = 2, + data = ova) > fit.r2 call: coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 2, data = ova) coef exp(coef) se(coef) karn 0.545 1.7246084 0.151 129

The coxvc_1-1-1 package 0.0 0.2 0.4 0.6 0.8 1.0 0 500 1000 1500 2000 time in days Figure A.3: Survival function for the full rank model. diam 0.176 1.1924381 0.163 figo 0.616 1.8515072 0.358 karn:f1(t) -1.277 0.2788727 0.711 diam:f1(t) 0.209 1.2324450 0.646 figo:f1(t) -0.415 0.6603403 1.259 karn:f2(t) 0.239 1.2699785 1.322 diam:f2(t) 0.094 1.0985597 0.942 figo:f2(t) 0.298 1.3471618 1.455 karn:f3(t) -1.146 0.3179059 1.964 diam:f3(t) -0.830 0.4360493 1.352 figo:f3(t) -2.054 0.1282210 2.202 log-likelihood= -1375.299, Rank= 2 algorithm converged in 12 iterations 130

A.3. Examples Beta : Gamma: [,1] [,2] [,1] [,2] [1,] 5.4171-72.6259 [1,] 1.0610 0.0716 [2,] 0.3858-3.2510 [2,] 1.8601 0.1563 [3,] 3.8667-48.6655 [3,] 0.5818 0.0401 [4,] -5.4324-0.3894 > summary(fit.r2) call: coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 2, data = ova) Beta : Gamma: [,1] [,2] [,1] [,2] [1,] 5.4171-72.6259 [1,] 1.0610 0.0716 [2,] 0.3858-3.2510 [2,] 1.8601 0.1563 [3,] 3.8667-48.6655 [3,] 0.5818 0.0401 [4,] -5.4324-0.3894 The class of fit.r2 is coxrr. For reduced-rank models the generic function print.coxrr will print the estimated coefficients of the model along with their standard errors and so forth, as well as the factors of the Θ matrix, B and Γ. Moreover, the function summary.coxrr will provide also summary of the B and Γ matrices. We see that the rank=2 model, with 16 parameters in total, has a more reasonable fitting of the covariate effects > plotcoxvc(fit.r2, fun = "effects", xlab = "time in days") while the rank=1 model with 9 free parameters, is more much more rigid: > fit.r1 <- coxvc(surv(time, death) ~ karn + diam + figo, Ft, rank = 1, + data = ova) > fit.r1 call: coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 1, data = ova) 131

The coxvc_1-1-1 package 1.0 0.5 0.0 0.5 1.0 karn diam figo 0 500 1000 1500 2000 time in days Figure A.4: Estimated effects of the covariates over time, for the rank=2 model. coef exp(coef) se(coef) karn 0.334 1.39654314 0.101 diam 0.381 1.46374761 0.122 figo 0.779 2.17929188 0.258 karn:f1(t) -0.478 0.62002220 0.334 diam:f1(t) -0.545 0.57984178 0.396 figo:f1(t) -1.115 0.32791528 0.796 karn:f2(t) 0.298 1.34716179 0.562 diam:f2(t) 0.340 1.40494759 0.638 figo:f2(t) 0.695 2.00370907 1.314 karn:f3(t) -1.142 0.31918002 0.873 diam:f3(t) -1.303 0.27171542 0.981 figo:f3(t) -2.664 0.06966899 2.067 132

A.3. Examples log-likelihood= -1378.049, Rank= 1 algorithm converged in 5 iterations Beta : Gamma: [,1] [,1] [1,] 0.2146 [1,] 1.5553 [2,] 0.2450 [2,] -2.2257 [3,] 0.5008 [3,] 1.3876 [4,] -5.3198 > plotcoxvc(fit.r1, fun = "effects", xlab = " time in days") 1.0 0.5 0.0 0.5 1.0 karn diam figo 0 500 1000 1500 2000 time in days Figure A.5: Estimated effects of the covariates over time, for the rank=1 model. The package also contains a small function calc.h0 to compute the baseline hazard from a Cox model, evaluated for a case with all covariate values equal 133

The coxvc_1-1-1 package to zero. For example consider the simple proportional hazards model fit.ph. To get an estimate of the baseline hazard the function coxph.details can be used: > haz.ph <- coxph.detail(fit.ph)$haz > haz.ph0 <- calc.h0(fit.ph) The object haz.ph is the baseline hazard evaluated at the mean value of the covariates, while the object haz.ph0 is the baseline hazard evaluated for all covariate values equal to zero. This can be seen in graph A.6: > plot(time[death == 1], exp(-cumsum(haz.ph)), ylim = c(0, 1), + ylab = "", "l") > lines(time[death == 1], exp(-cumsum(haz.ph0)), col = 2) 0.0 0.2 0.4 0.6 0.8 1.0 0 500 1000 1500 2000 time[death == 1] Figure A.6: Figure of survival for an average person (black line) and a person with covariates X = 0 134