Cox regression part 2

Ørnulf Borgan
Department of Mathematics
University of Oslo

NORBIS course
University of Oslo
4-8 December 2017

Outline:
- Recapitulation
- Estimation of cumulative hazards and survival probabilities
- Assumptions for Cox regression and check of model assumptions

Recapitulation

Assume that we have a sample of n individuals, and let $N_i(t)$ count the observed occurrences of the event of interest for individual i as a function of (study) time t.

We have the decomposition (observation = signal + noise)

$$dN_i(t) = \lambda_i(t)\,dt + dM_i(t)$$

The intensity process for individual i may be given as the product of an at risk indicator $Y_i(t)$ and a hazard rate (intensity):

$$\lambda_i(t) = Y_i(t)\,\alpha(t \mid x_i)$$

(time-dependency of covariates suppressed in the notation)

Assume that the hazard rate for individual i takes the form of a baseline hazard $\alpha_0(t)$ times a hazard ratio (relative risk):

$$\alpha(t \mid x_i) = \alpha_0(t)\, r(\beta, x_i(t))$$

The common choice of relative risk function is

$$r(\beta, x_i(t)) = \exp\{\beta^{\mathsf T} x_i(t)\} = \exp\{\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\}$$

which gives Cox's regression model.

$e^{\beta_j}$ is the hazard ratio (HR), often called relative risk, for one unit's increase in the j-th covariate, keeping all other covariates the same.
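The interpretation of $e^{\beta_j}$ follows in one step from the model. As a short illustration, assume two individuals with fixed covariate vectors $x$ and $x'$ (symbols introduced here only for this example) that agree in every component except that $x'_j = x_j + 1$:

```latex
\frac{\alpha(t \mid x')}{\alpha(t \mid x)}
  = \frac{\alpha_0(t)\exp\{\beta_1 x_1 + \cdots + \beta_j (x_j + 1) + \cdots + \beta_p x_p\}}
         {\alpha_0(t)\exp\{\beta_1 x_1 + \cdots + \beta_j x_j + \cdots + \beta_p x_p\}}
  = e^{\beta_j}
```

The baseline hazard cancels, so the ratio does not depend on t. By the same argument, an increase of k units in the j-th covariate multiplies the hazard by $e^{k\beta_j}$.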
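Partial likelihood and estimation of β

Ordinary ML-estimation does not work for the relative risk regression models (due to the nonparametric baseline).

Instead we have to use Cox's partial likelihood

$$L(\beta) = \prod_{T_j} \frac{r(\beta, x_{i_j}(T_j))}{\sum_{l \in \mathcal{R}_j} r(\beta, x_l(T_j))}$$

Here $i_j$ is the index of the individual who experiences an event at $T_j$, and $\mathcal{R}_j$ is the risk set at $T_j$.

Cumulative hazards and survival probabilities

We will estimate the cumulative baseline hazard

$$A_0(t) = \int_0^t \alpha_0(u)\,du$$

We take the aggregated counting process $N_\bullet(t) = \sum_{i=1}^n N_i(t)$ as our starting point. Its intensity process is given by

$$\lambda_\bullet(t) = \sum_{i=1}^n \lambda_i(t) = \Big\{ \sum_{i=1}^n Y_i(t)\, r(\beta, x_i(t)) \Big\}\, \alpha_0(t)$$

If we had known β, we could have repeated the argument we used to derive the Nelson-Aalen estimator to show that we could estimate $A_0(t)$ by

$$\hat A_0(t; \beta) = \int_0^t \frac{dN_\bullet(u)}{\sum_{i=1}^n Y_i(u)\, r(\beta, x_i(u))}$$

Since β is unknown, we replace it by $\hat\beta$ to obtain the Breslow estimator:

$$\hat A_0(t) = \hat A_0(t; \hat\beta) = \sum_{T_j \le t} \frac{1}{\sum_{l \in \mathcal{R}_j} r(\hat\beta, x_l(T_j))}$$

If all covariates are fixed, the cumulative hazard corresponding to an individual with covariate vector $x_0$ is

$$A(t \mid x_0) = r(\beta, x_0)\, A_0(t)$$

and it may be estimated by

$$\hat A(t \mid x_0) = r(\hat\beta, x_0)\, \hat A_0(t)$$

The corresponding survival function is given by

$$S(t \mid x_0) = \prod_{u \le t} \{1 - dA(u \mid x_0)\} = \exp\{-A(t \mid x_0)\}$$

and it may be estimated by

$$\hat S(t \mid x_0) = \prod_{T_j \le t} \{1 - \Delta \hat A(T_j \mid x_0)\}$$

Alternatively we may use (as is done in R):

$$\tilde S(t \mid x_0) = \exp\{-\hat A(t \mid x_0)\}$$

For practical purposes there is little difference between the two estimators.

The estimators of the cumulative hazards and survival functions are approximately normal, and their variances may be estimated as described in section 4.1.6 in ABG.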
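The Breslow estimator can be computed directly from its definition and compared with what R returns. The following is only a sketch: the data frame `toy` is a hypothetical data set without tied event times (it is not the melanoma data), and the comparison assumes the default settings of the `survival` package.

```r
library(survival)

# Hypothetical toy data (not the melanoma data), no tied event times
toy <- data.frame(
  time   = c(2, 3, 5, 7, 8, 11, 13, 17),
  status = c(1, 1, 0, 1, 1, 0, 1, 1),
  x      = c(1, 0, 1, 0, 1, 0, 0, 1)
)

fit <- coxph(Surv(time, status) ~ x, data = toy)
beta.hat <- coef(fit)

# Breslow estimator from its definition: at each event time T_j, add
# 1 / (sum of exp(beta.hat * x_l) over the individuals l at risk at T_j)
event.times <- sort(toy$time[toy$status == 1])
increments <- sapply(event.times, function(tj) {
  at.risk <- toy$time >= tj
  1 / sum(exp(beta.hat * toy$x[at.risk]))
})
breslow <- cumsum(increments)

# basehaz() with centered = FALSE gives the cumulative hazard for x = 0
bh <- basehaz(fit, centered = FALSE)
bh.at.events <- bh$hazard[match(event.times, bh$time)]

# Survival estimate exp(-A.hat(t | x0)) for an individual with x0 = 1,
# next to the estimate reported by survfit()
S.tilde <- exp(-breslow * exp(beta.hat))
sf <- survfit(fit, newdata = data.frame(x = 1))
S.survfit <- summary(sf, times = event.times)$surv

print(cbind(time = event.times, by.hand = breslow, basehaz = bh.at.events,
            S.tilde = S.tilde, S.survfit = S.survfit))
```

With tied event times the two cumulative hazard columns need not agree exactly, because `coxph` uses the Efron method for ties by default.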
Melanoma data

We first compare Nelson-Aalen estimates (black lines) with the cumulative hazards obtained from a Cox model with sex as only covariate (red lines).

[Figure: cumulative hazard (0 to 0.7) against years since operation (0 to 10), for females and males]

Using R

The results on the previous slide are obtained by the following commands:

    library(survival)

    # We consider a model with sex as the only covariate and start by
    # making Nelson-Aalen plots for females and males:
    fit.ss=coxph(Surv(lifetime,status==1)~strata(sex),data=melanoma)
    surv.ss=survfit(fit.ss)
    plot(surv.ss,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),ylim=c(0,0.7),
         xlab="Years since operation",ylab="Cumulative hazard",lty=c(1,3),lwd=2)
    legend("topleft",c("Females","Males"),lty=c(1,3),lwd=2)

    # We then fit a Cox model with sex as the only covariate and plot
    # the model based estimates of the cumulative hazards in the same plot:
    fit.s=coxph(Surv(lifetime,status==1)~factor(sex),data=melanoma)
    surv.s=survfit(fit.s,newdata=data.frame(sex=c(1,2)))
    lines(surv.s,fun="cumhaz",mark.time=FALSE,lty=c(1,3),lwd=2,col="red")

Then we consider a model with sex, thickness and ulceration:

                  beta-hat    HR     se(beta-hat)    Z        P
    Sex             0.459     1.58   0.267           1.72     0.085
    Thickness       0.113     1.12   0.038           2.99     0.0028
    Ulceration     -1.167     0.31   0.311          -3.75     0.00018

We will estimate cumulative hazards and survival functions for females with the following combinations of tumor thickness and ulceration:
1) Thickness: 1 mm, ulceration: absent
2) Thickness: 2 mm, ulceration: absent
3) Thickness: 2 mm, ulceration: present
4) Thickness: 4 mm, ulceration: present

[Figure: estimated cumulative hazards for the four covariate combinations]
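Because the covariates are fixed, the fitted model implies that the four cumulative hazards are proportional to each other. The sketch below works out the implied ratios from the rounded hazard ratios in the table. It assumes that the ulceration hazard ratio 0.31 compares ulceration absent with ulceration present (the coding 1 = present, 2 = absent used in the R commands suggests this); the variable names are chosen here for illustration only.

```r
# Rounded hazard ratios read off the table of estimates
hr.thickness    <- 1.12   # per 1 mm increase in tumor thickness
hr.ulcer.absent <- 0.31   # assumed: ulceration absent versus present

# The four covariate combinations for females
combos <- data.frame(
  thickn        = c(1, 2, 2, 4),
  ulcer.present = c(0, 0, 1, 1)
)

# Hazard relative to combination 1 (1 mm, ulceration absent)
combos$rel.hazard <- hr.thickness^(combos$thickn - 1) *
  (1 / hr.ulcer.absent)^combos$ulcer.present
print(combos)
```

Each estimated cumulative hazard curve is then the curve for combination 1 multiplied by the corresponding value of `rel.hazard`, up to the rounding of the hazard ratios.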
[Figure: estimated survival functions for the four covariate combinations]

Using R

The results on the previous slides are obtained by the following commands:

    # We consider the model with sex, ulceration and thickness:
    fit.stu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+thickn,
                  data=melanoma)
    summary(fit.stu)

    # We plot the cumulative hazards for females for four covariate combinations:
    # 1) thickn=1, ulcer=2    2) thickn=2, ulcer=2
    # 3) thickn=2, ulcer=1    4) thickn=4, ulcer=1
    new.covariates=data.frame(sex=c(1,1,1,1),ulcer=c(2,2,1,1),thickn=c(1,2,2,4))
    surv.stu=survfit(fit.stu,newdata=new.covariates)
    plot(surv.stu,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:4,lwd=2)
    legend("topleft",c("female, 1 mm, absent","female, 2 mm, absent",
           "female, 2 mm, present","female, 4 mm, present"),lty=1:4,lwd=2)

    # To plot the survival functions for females for the same combinations of the
    # covariates we just omit the "cumhaz" option

Assumptions for Cox regression

We consider a Cox regression model with fixed covariates:

$$\alpha(t \mid x) = \alpha_0(t) \exp(\beta^{\mathsf T} x)$$

Note that the model assumes:

1) Log-linearity:

$$\log\{\alpha(t \mid x)\} = \log\{\alpha_0(t)\} + \beta^{\mathsf T} x$$

2) Proportional hazards:

$$\frac{\alpha(t \mid x_2)}{\alpha(t \mid x_1)} = \exp\{\beta^{\mathsf T}(x_2 - x_1)\} \quad \text{(independent of time)}$$

A number of methods exist for checking these assumptions, and we will have a look at two of them (this material is not in the ABG-book, cf page 134).

Check of log-linearity

We check log-linearity for a numeric covariate, say covariate 1, assuming that log-linearity is ok for the other covariates.

We may fit a penalized smoothing spline $s(x_1)$ for the effect of covariate 1:

$$\alpha(t \mid x) = \alpha_0(t) \exp\{ s(x_1) + \beta_2 x_2 + \cdots + \beta_p x_p \}$$

and see if the spline estimate becomes fairly linear.

Melanoma data: Checking log-linearity by using a spline for tumor thickness in a model with sex and ulceration as the other covariates

[Figure: partial effect for pspline(thickn) plotted against thickn]

The non-linear parts of the smoothing spline have a significant effect (P=0.038).
When the effect of a numeric covariate is not log-linear, we may transform the covariate or use a grouped version of it.

For the melanoma data, the plots indicate that we may use log-thickness as covariate (and then log2 is a good choice).

Melanoma data: Checking log-linearity by using a spline for log2 of tumor thickness in a model with sex and ulceration as the other covariates

[Figure: partial effect for pspline(log2thick) plotted against log2thick]

The non-linear parts of the smoothing spline do not have a significant effect (P=0.41).

Using R

The results on the previous slides are obtained by the following commands:

    # To check log-linearity for thickness, we fit a model with sex, ulceration
    # and penalized smoothing spline for the effect of thickness:
    fit.spstu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+pspline(thickn),
                    data=melanoma)
    print(fit.spstu)
    termplot(fit.spstu,se=TRUE,terms=3)

    # To check log-linearity for log2(thickness) [which has to be defined as a
    # new covariate], we fit a model with sex, ulceration and penalized smoothing
    # spline for the effect of log2(thickness):
    melanoma$log2thick=log2(melanoma$thickn)
    fit.spslogtu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+pspline(log2thick),
                       data=melanoma)
    print(fit.spslogtu)
    termplot(fit.spslogtu,se=TRUE,terms=3)

Check of proportional hazards

One way to check if we have proportional hazards is to fit a model of the form

$$\alpha(t \mid x_i) = \alpha_0(t) \exp\{ \beta_{11} x_{i1} + \beta_{12} x_{i1}\, g(t) + \cdots + \beta_{p1} x_{ip} + \beta_{p2} x_{ip}\, g(t) \}$$

for a known function g(t), e.g. g(t) = log t.

We then test the null hypothesis that one or all of the $\beta_{j2} = 0$.

For the melanoma data:

                      chisq    p
    factor(sex)2       0.23    0.631
    factor(ulcer)2     0.96    0.328
    log2thick          4.02    0.045
    GLOBAL             8.77    0.033

Melanoma data: Plots that indicate possible time dependent effects of the covariates

[Figure: nonparametric estimates of the possibly time-dependent effects of sex, ulceration and log2-thickness]

The test and plots indicate that there may be a non-proportional (i.e. time-dependent) effect of log-thickness.
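To see what the extra terms mean, take g(t) = log t in the model for checking proportional hazards and consider one unit's increase in the j-th covariate, keeping all other covariates the same. The hazard ratio at time t is then

```latex
\exp\{\beta_{j1} + \beta_{j2} \log t\} = e^{\beta_{j1}}\, t^{\beta_{j2}}
```

This is constant in t exactly when $\beta_{j2} = 0$, which is why the test of $\beta_{j2} = 0$ is a test of proportional hazards for covariate j. A positive $\beta_{j2}$ corresponds to a hazard ratio that increases with time, a negative one to a hazard ratio that decreases with time.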
Using R

The results on the previous slides are obtained by the commands below.

    # fit.slogtu is the Cox model with sex, ulceration and log2-thickness.
    # Its definition is not shown on the slides; the line below is an assumed
    # reconstruction based on the rows of the test table:
    fit.slogtu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+log2thick,
                     data=melanoma)

    # We will do a formal test for proportionality of the covariates. This is done by,
    # for each covariate x, adding the time-dependent covariate x*log(t), and testing
    # whether the time-dependent covariates are significant using a score test:
    cox.zph(fit.slogtu,transform='log')

    # The test indicates that the effect of tumor thickness is not proportional.
    # The estimate we get for log-thickness is then a weighted average of the
    # time-varying effect

    # We also make plots that give nonparametric estimates of the (possible)
    # time dependent effect of the covariates:
    par(mfrow=c(1,3))
    plot(cox.zph(fit.slogtu))
    par(mfrow=c(1,1))
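The score test from `cox.zph` avoids fitting the extended model. As a complement, the model with the extra covariate x*log(t) can also be fitted explicitly with the `tt` argument of `coxph`. The following is a minimal sketch on simulated data, not on the melanoma data: the data frame `toy`, its variable names and the simulation settings are all hypothetical.

```r
library(survival)

# Hypothetical simulated data: exponential event times whose hazard
# depends on x through a proportional effect, with uniform censoring
set.seed(1)
n <- 200
x <- rnorm(n)
event.time <- rexp(n, rate = 0.1 * exp(0.5 * x))
cens.time  <- runif(n, 5, 30)
toy <- data.frame(
  time   = pmin(event.time, cens.time),
  status = as.numeric(event.time <= cens.time),
  x      = x
)

# Ordinary Cox model and the score test based on g(t) = log t
fit0 <- coxph(Surv(time, status) ~ x, data = toy)
print(cox.zph(fit0, transform = "log"))

# Extended model with the time-dependent covariate x * log(t):
# tt(x) in the formula is evaluated by the function given as tt
fit.tt <- coxph(Surv(time, status) ~ x + tt(x), data = toy,
                tt = function(x, t, ...) x * log(t))
print(summary(fit.tt)$coefficients)
```

The second row of the coefficient table gives a Wald test for the coefficient of x*log(t). It addresses the same null hypothesis as the score test, so the two should usually lead to similar conclusions.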