The coxvc_1-1-1 package


Homer Allison
6 years ago

Appendix A  The coxvc_1-1-1 package

A.1 Introduction

The coxvc_1-1-1 package is a set of functions for survival analysis that run under R 2.1.1 [81]. It contains routines to fit Cox models [24] with time varying effects of the covariates and reduced-rank models [77]. Both modelling approaches normally require an expanded data set to be created before fitting, which makes the task computationally demanding, since even small data sets explode when all the possible risk sets are stacked together. Using coxvc, the models can be fitted on the original data with a fast and efficient algorithm, as described in [76]. The package also contains some small functions that the authors often use when fitting survival models.

The coxvc package requires the packages MASS, splines and survival [64], which are loaded automatically when you use the package. Please refer to the manuals of those packages for more information. The MASS [102] package is loaded for the command ginv, which is needed to compute the generalized inverse of the information matrix from a reduced-rank model. The splines package is loaded in order to transform some of the covariates when running the models. It is not essential (although the built-in examples of the coxvc package use splines), but it is useful in many applications. Last, the survival package is the core of the package, since it is needed for creating the survival objects used in our examples.

A.2 Statistical background

The Cox proportional hazards model is the most common method to analyze survival data. However, its main assumption of proportionality (the hazard ratio of two different cases remains constant regardless of time) is often violated, especially in studies with long follow up. The most straightforward way to extend the model is to include interactions of the covariates with time functions. A non-proportional Cox model may be written as:

$$h(t \mid X) = h_0(t) \exp\left(X \Theta F^{\top}(t)\right) \qquad \text{(A.1)}$$

where $h_0(t)$ is the unspecified baseline hazard, X is a 1 × p matrix of p covariates, F is an n × q matrix of q time functions, and Θ is a p × q matrix of estimable coefficients.

Perperoglou, le Cessie and van Houwelingen [77] introduced the idea of reduced-rank regression to survival analysis with time varying coefficients. A reduced-rank model requires the matrix of regression coefficients Θ to be written as a product of two submatrices, B of size p × r and Γ of size q × r, so that $\Theta = B\Gamma^{\top}$ is a matrix of reduced rank r, smaller than the number of covariates p or the number of time functions q. To fit the full model, r must be set equal to min(p, q), in which case the structure matrix Θ is of full rank.

This package was created to meet the demand for fitting reduced-rank hazards models in a fast and efficient way. For the motivation behind the package refer to [76]. The new version of the package contains an additional set of small functions that the author found useful in several cases when analyzing survival data.

A.3 Examples

First load the coxvc library:

    > library(coxvc)

The sample data within this library come from a study of ovarian cancer patients [104]. There are in total 358 patients, with information on the following variables:

    time   The number of days from enrollment until death or censoring.
    death  An indicator of death (1) or censoring (0).
    karn   The Karnofsky index measuring the ability of the patients to perform several tasks.
    diam   The diameter of the residual tumor.
    figo   The Figo index, denoting the site of the metastasis.
    x      Patient id
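The reduced-rank structure from section A.2 can be made concrete with a few lines of base R. The following is only a minimal sketch under stated assumptions: the dimensions p = 3, q = 4, r = 2, the random entries of B and Gamma, and the time grid are all hypothetical and are not taken from the ovarian data or from the coxvc internals. It builds a coefficient matrix as the product of B and the transpose of Gamma, checks its rank, and evaluates the implied time varying effects on a spline basis.

```r
library(splines)

set.seed(1)
p <- 3   # number of covariates (hypothetical)
q <- 4   # number of time functions: a constant plus 3 B-spline columns
r <- 2   # chosen rank, smaller than min(p, q)

# Hypothetical factor matrices; a fitted model would estimate these.
B     <- matrix(rnorm(p * r), nrow = p, ncol = r)   # p x r
Gamma <- matrix(rnorm(q * r), nrow = q, ncol = r)   # q x r

# Structure matrix of reduced rank r.
Theta <- B %*% t(Gamma)                             # p x q

# Time functions on a hypothetical grid: constant plus cubic B-splines.
times <- seq(1, 100, length.out = 50)
Ft <- cbind(1, bs(times, df = 3))                   # n x q

# Row i holds the p covariate effects at times[i].
beta.t <- Ft %*% t(Theta)                           # n x p

qr(Theta)$rank
dim(beta.t)
```

Each column of beta.t traces one covariate effect over time. Lowering r forces these curves to share fewer underlying time patterns, which is what makes the lower-rank fits more rigid.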

[Table A.1: Definitions of variables and patients frequencies. Rows: X_k (Karnofsky, lowest category < 70), X_f (Figo, III and IV), X_d (Diameter, from Micro to > 5), each with its frequency n.]

For more information refer to table A.1. First attach the data:

    > data(ova)
    > attach(ova)

A short summary of the data follows:

    > summary(ova)
          time            death             karn             figo
     Min.   : 7.0    Min.   :0.000    Min.   :0.000    Min.   :
     1st Qu.:        1st Qu.:         1st Qu.:         1st Qu.:
     Median :        Median :1.000    Median :1.000    Median :
     Mean   :        Mean   :0.743    Mean   :1.173    Mean   :
     3rd Qu.:        3rd Qu.:         3rd Qu.:         3rd Qu.:
     Max.   :        Max.   :1.000    Max.   :4.000    Max.   :
          diam              x
     Min.   :0.000    Min.   :
     1st Qu.:         1st Qu.:
     Median :3.000    Median :
     Mean   :2.651    Mean   :
     3rd Qu.:         3rd Qu.:
     Max.   :4.000    Max.   :

A simple Cox proportional hazards model can be fitted in the usual way using the coxph command from the survival library:

    > fit.ph <- coxph(Surv(time, death) ~ karn + diam + figo)
    > fit.ph
    Call:
    coxph(formula = Surv(time, death) ~ karn + diam + figo)

         coef exp(coef) se(coef) z p
    karn                           e-03
    diam                           e-05
    figo                           e-05

    Likelihood ratio test=64.1 on 3 df, p=7.68e-14  n= 358

A test of proportionality based on Schoenfeld residuals [92] reveals that there are in fact deviations from proportional hazards in the data, as indicated by the small global p-value:

    > cox.zph(fit.ph)
           rho chisq p
    karn
    diam
    figo
    GLOBAL  NA

A graphical inspection is given by:

    > par(mfrow = c(3, 1))
    > plot(cox.zph(fit.ph))

The results are shown in figure A.1 and suggest that there may be an interaction of time with the covariates. A first approach is to fit a full rank model, which includes the full Θ matrix. We choose to transform time using B-splines, so the F matrix contains a constant, $F_1(t) = 1$, and cubic B-spline functions on 3 degrees of freedom:

    > Ft <- cbind(rep(1, nrow(ova)), bs(time, df = 3))

Then the full rank model is given by:

    > fit.r3 <- coxvc(Surv(time, death) ~ karn + diam + figo, Ft, rank = 3,
    +     data = ova)
    > fit.r3

[Figure A.1: Test of proportionality based on scaled Schoenfeld residuals along with a spline smooth with 90% confidence intervals. Panels: Beta(t) for karn, Beta(t) for diam, Beta(t) for figo, each against Time.]

    call:
    coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft,
        rank = 3, data = ova)

               coef exp(coef) se(coef) z p
    karn
    diam
    figo
    karn:f1(t)
    diam:f1(t)
    figo:f1(t)
    karn:f2(t)
    diam:f2(t)
    figo:f2(t)

    karn:f3(t)
    diam:f3(t)
    figo:f3(t)

    log-likelihood=
    algorithm converged in 5 iterations

The class of object fit.r3 is coxvc. The generic function printcoxvc is included in the package for printing results from the full model. The model has 21 parameters, and in practice the results are identical to those from fitting a coxph model on the expanded data set. However, the fit here was done in 5 iterations on the original data set, which makes the routine much faster and more efficient.

There are in total 266 events in the ovarian data set. The object fit.r3 also contains the baseline hazard evaluated at these event time points. The function expand.haz can be used for expanding either the baseline or the cumulative baseline hazard.

    > haz <- fit.r3$hazard
    > length(haz)
    [1] 266
    > haz.exp <- expand.haz(haz, death, fun = "baseline")
    > length(haz.exp)
    [1] 358

When expanding the baseline hazard, the function assigns a zero value at the time points of censoring. When expanding a cumulative baseline hazard, it assigns to each censored case the value of the cumulative baseline at the time of the previous event.

    > cum.haz <- cumsum(haz)
    > cum.haz.exp <- expand.haz(cum.haz, death, fun = "cumulative")

The function plotcoxvc is included in the package to draw figures of the time varying behavior of the covariates:

    > plotcoxvc(fit.r3, fun = "effects", xlab = "time in days")

The same function can also be used for plotting the survival function. Since the object fit.r3 is of class coxvc, using plot(survfit(...)) will not give the survival plot. Instead, the function plotcoxvc can be used:

    > plotcoxvc(fit.r3, fun = "survival", xlab = "time in days")

[Figure A.2: Estimated effects of the covariates over time, for the full rank model. Curves: karn, diam, figo; x-axis: time in days.]

Figure A.2 shows that the estimated time varying effects of the covariates are too flexible, especially in the last days of the follow up. We fitted a rank=2 model to the data to see whether the fit improves:

    > fit.r2 <- coxvc(Surv(time, death) ~ karn + diam + figo, Ft, rank = 2,
    +     data = ova)
    > fit.r2
    call:
    coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft,
        rank = 2, data = ova)

               coef exp(coef) se(coef)
    karn

    diam
    figo
    karn:f1(t)
    diam:f1(t)
    figo:f1(t)
    karn:f2(t)
    diam:f2(t)
    figo:f2(t)
    karn:f3(t)
    diam:f3(t)
    figo:f3(t)

    log-likelihood= , Rank= 2
    algorithm converged in 12 iterations

[Figure A.3: Survival function for the full rank model. x-axis: time in days.]

    Beta :              Gamma:
         [,1] [,2]           [,1] [,2]
    [1,]                [1,]
    [2,]                [2,]
    [3,]                [3,]
                        [4,]

    > summary(fit.r2)
    call:
    coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft,
        rank = 2, data = ova)

    Beta :              Gamma:
         [,1] [,2]           [,1] [,2]
    [1,]                [1,]
    [2,]                [2,]
    [3,]                [3,]
                        [4,]

The class of fit.r2 is coxrr. For reduced-rank models the generic function print.coxrr prints the estimated coefficients of the model along with their standard errors and so forth, as well as the factors of the Θ matrix, B and Γ. The function summary.coxrr also provides a summary of the B and Γ matrices. We see that the rank=2 model, with 16 parameters in total, gives a more reasonable fit of the covariate effects

    > plotcoxvc(fit.r2, fun = "effects", xlab = "time in days")

while the rank=1 model, with 9 free parameters, is much more rigid:

    > fit.r1 <- coxvc(Surv(time, death) ~ karn + diam + figo, Ft, rank = 1,
    +     data = ova)
    > fit.r1
    call:
    coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft,
        rank = 1, data = ova)

[Figure A.4: Estimated effects of the covariates over time, for the rank=2 model. Curves: karn, diam, figo; x-axis: time in days.]

               coef exp(coef) se(coef)
    karn
    diam
    figo
    karn:f1(t)
    diam:f1(t)
    figo:f1(t)
    karn:f2(t)
    diam:f2(t)
    figo:f2(t)
    karn:f3(t)
    diam:f3(t)
    figo:f3(t)

    log-likelihood= , Rank= 1
    algorithm converged in 5 iterations

    Beta :         Gamma:
         [,1]           [,1]
    [1,]           [1,]
    [2,]           [2,]
    [3,]           [3,]
                   [4,]

    > plotcoxvc(fit.r1, fun = "effects", xlab = " time in days")

[Figure A.5: Estimated effects of the covariates over time, for the rank=1 model. Curves: karn, diam, figo; x-axis: time in days.]

The package also contains a small function calc.h0 to compute the baseline hazard from a Cox model, evaluated for a case with all covariate values equal to zero. For example, consider the simple proportional hazards model fit.ph. To get an estimate of the baseline hazard, the function coxph.detail can be used:

    > haz.ph <- coxph.detail(fit.ph)$hazard
    > haz.ph0 <- calc.h0(fit.ph)

The object haz.ph is the baseline hazard evaluated at the mean value of the covariates, while the object haz.ph0 is the baseline hazard evaluated for all covariate values equal to zero. This can be seen in figure A.6:

    > plot(time[death == 1], exp(-cumsum(haz.ph)), ylim = c(0, 1),
    +     ylab = "", type = "l")
    > lines(time[death == 1], exp(-cumsum(haz.ph0)), col = 2)

[Figure A.6: Figure of survival for an average person (black line) and a person with covariates X = 0. x-axis: time[death == 1].]
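The link between the two baselines can be illustrated without coxvc. The sketch below is an assumption-laden illustration, not the implementation of calc.h0, whose internals are not shown here. It uses the ovarian data set that ships with the survival package (a different data set from ova) and the survival function basehaz, which returns the cumulative baseline hazard either at the reference covariate values stored in fit$means or at zero.

```r
library(survival)

# Toy fit on the 'ovarian' data from survival (not the 'ova' data above).
fit <- coxph(Surv(futime, fustat) ~ age + ecog.ps, data = ovarian)

# Cumulative baseline hazard at the reference (mean) covariate values ...
H.mean <- basehaz(fit, centered = TRUE)
# ... and for a subject with all covariate values equal to zero.
H.zero <- basehaz(fit, centered = FALSE)

# The two curves differ by a constant factor exp(-sum(beta * reference)).
shift <- sum(coef(fit) * fit$means)
all.equal(H.zero$hazard, H.mean$hazard * exp(-shift))

# Survival curves for the two reference subjects, as in figure A.6.
plot(H.mean$time, exp(-H.mean$hazard), type = "s", ylim = c(0, 1),
     xlab = "time in days", ylab = "")
lines(H.zero$time, exp(-H.zero$hazard), type = "s", col = 2)
```

Because the shift depends on the fitted coefficients and on how the covariates are coded, the zero-covariate curve can lie far from the average curve when zero is not a realistic covariate value.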

A fast routine for fitting Cox models with time varying effects

Chapter 3  A fast routine for fitting Cox models with time varying effects

Abstract

The S-plus and R statistical packages have implemented a counting process setup to estimate Cox models with time varying