STK4080/9080 Survival and event history analysis

Lecture 7: Regression modelling
Relative risk regression

- Regression models
- Partial likelihood
- Estimation of cumulative hazards and survival probabilities
- Martingale residuals and model check
- Stratified models

Regression models

Assume that we have a sample of n individuals, and let $N_i(t)$ count the observed occurrences of the event of interest for individual $i$ as a function of (study) time $t$.

We have the decomposition

$$ \underbrace{dN_i(t)}_{\text{observation}} = \underbrace{\lambda_i(t)\,dt}_{\text{signal}} + \underbrace{dM_i(t)}_{\text{noise}} $$

We will consider regression models where the intensity process $\lambda_i(t)$ for individual $i$ depends on a vector of (possibly) time-dependent covariates

$$ x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{ip}(t))^T $$

The intensity process for individual $i$ may be given as

$$ \lambda_i(t) = Y_i(t)\,\alpha(t \mid x_i) $$

where $Y_i(t)$ is the at risk indicator and $\alpha(t \mid x_i)$ is the hazard rate (intensity). The time-dependency of the covariates is suppressed in the notation.

A regression model specifies how the hazard rate depends on the covariates. We will consider two types of regression models:
- Relative risk regression models (section 4.1)
- Additive regression models (section 4.2)

A note on covariates

We assume that the intensity processes depend on the covariate processes

$$ x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{ip}(t))^T, \qquad i = 1, \ldots, n $$

Throughout we will assume that the covariate processes are predictable. This implies that:
- fixed covariates should be measured in advance (i.e. at time zero) and remain fixed throughout the study
- the values at time $t$ of time-dependent covariates should be known "just before" time $t$

You should never let covariates depend on information from the future!

It is useful to distinguish between external (or exogenous) and internal (or endogenous) covariates.

Examples of external covariates are:
- Fixed covariates
- Defined time-dependent covariates: the complete covariate path is given at the outset of the study (e.g. a person's age at study time $t$)
- Ancillary time-dependent covariates: the path of a stochastic process that is not influenced by the event being studied (e.g. observed level of air pollution)

Time-dependent covariates that are not external are called internal. One example of an internal covariate is a biomarker measured for the individuals during follow-up.

Interpretation of regression analyses with internal time-dependent covariates is not at all straightforward!

Relative risk regression models

Assume that the hazard rate for individual $i$ takes the form

$$ \alpha(t \mid x_i) = \alpha_0(t)\, r(\beta, x_i(t)) $$

where $\alpha_0(t)$ is the baseline hazard and $r(\beta, x_i(t))$ is the hazard ratio (relative risk) function.

We assume $r(\beta, 0) = 1$, so the baseline hazard $\alpha_0(t)$ is the hazard for an individual with all covariates equal to zero.

We make no assumptions about the form of the baseline hazard. Thus the model contains a nonparametric part (the baseline hazard) and a parametric part (the relative risk function). We say that the model is semiparametric.

The common choice of relative risk function is

$$ r(\beta, x_i(t)) = \exp\left(\beta^T x_i(t)\right) = \exp\left(\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\right) $$

which gives Cox's regression model.

Consider two individuals, indexed 1 and 2, and assume that all components of $x_1(t)$ and $x_2(t)$ are equal, except the $j$-th component where $x_{2j}(t) = x_{1j}(t) + 1$. Then:

$$ \frac{\alpha(t \mid x_2)}{\alpha(t \mid x_1)} = \frac{\alpha_0(t)\exp\left(\beta^T x_2(t)\right)}{\alpha_0(t)\exp\left(\beta^T x_1(t)\right)} = \exp\left\{\beta_j\left(x_{2j}(t) - x_{1j}(t)\right)\right\} = e^{\beta_j} $$

Thus $e^{\beta_j}$ is the hazard ratio for one unit's increase in the $j$-th covariate, keeping all other covariates constant.

Other possible choices of the relative risk function are:
- The additive risk function: $r(\beta, x_i(t)) = 1 + \beta^T x_i(t)$
- The excess relative risk function: $r(\beta, x_i(t)) = \prod_{j=1}^{p} \{1 + \beta_j x_{ij}(t)\}$

Cox regression is the only relative risk regression model implemented in R.
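As a minimal sketch, the three relative risk functions can be written as plain R functions. The function names (`rr_cox`, `rr_additive`, `rr_excess`) and all numerical values are hypothetical and chosen only for illustration. Each function satisfies $r(\beta, 0) = 1$.

```r
# Relative risk functions r(beta, x) for a covariate vector x at a fixed time.
# Function names and example values are illustrative assumptions.
rr_cox      <- function(beta, x) exp(sum(beta * x))  # Cox: exp(beta' x)
rr_additive <- function(beta, x) 1 + sum(beta * x)   # additive: 1 + beta' x
rr_excess   <- function(beta, x) prod(1 + beta * x)  # excess: prod_j (1 + beta_j x_j)

beta <- c(0.5, -0.2)
x1 <- c(1, 3)
x2 <- c(2, 3)  # differs from x1 by one unit in the first covariate only

# Under Cox's model the hazard ratio is exp(beta_1), whatever the value
# of the other covariate
hr_cox <- rr_cox(beta, x2) / rr_cox(beta, x1)
```

For the additive and excess relative risk functions the corresponding ratio generally depends on the covariate values themselves, which is one reason the exponential form is convenient to interpret.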

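Partial likelihood and estimation of β

Ordinary ML-estimation does not work for the relative risk regression models (due to the nonparametric baseline). Instead we have to use a partial likelihood. We will see how this may be derived.

The intensity process of $N_i(t)$ is given as

$$ \lambda_i(t) = Y_i(t)\,\alpha(t \mid x_i) = Y_i(t)\,\alpha_0(t)\, r(\beta, x_i(t)) $$

The intensity process of the aggregated counting process $N_\bullet(t) = \sum_{l=1}^{n} N_l(t)$ takes the form (assuming no joint events)

$$ \lambda_\bullet(t) = \sum_{l=1}^{n} \lambda_l(t) = \sum_{l=1}^{n} Y_l(t)\,\alpha_0(t)\, r(\beta, x_l(t)) $$

We consider the conditional probability of observing an event for individual $i$ at time $t$, given the past and given that an event is observed at time $t$:

$$ \pi(i \mid t) = P\left(dN_i(t) = 1 \mid dN_\bullet(t) = 1, \mathcal{F}_{t-}\right) = \frac{P\left(dN_i(t) = 1 \mid \mathcal{F}_{t-}\right)}{P\left(dN_\bullet(t) = 1 \mid \mathcal{F}_{t-}\right)} = \frac{\lambda_i(t)}{\lambda_\bullet(t)} = \frac{Y_i(t)\, r(\beta, x_i(t))}{\sum_{l=1}^{n} Y_l(t)\, r(\beta, x_l(t))} $$

Then the intensity process of $N_i(t)$ may be factorized as

$$ \lambda_i(t) = \lambda_\bullet(t)\,\pi(i \mid t) $$

We obtain the partial likelihood by multiplying together the conditional probabilities over all observed event times (thereby disregarding the information on the regression coefficients contained in the aggregated process).

Then, if $i_j$ is the index of the individual who experiences an event at $T_j$, the partial likelihood becomes

$$ L(\beta) = \prod_{T_j} \pi(i_j \mid T_j) = \prod_{T_j} \frac{r(\beta, x_{i_j}(T_j))}{\sum_{l \in R_j} r(\beta, x_l(T_j))} $$

where $R_j = \{ l \mid Y_l(T_j) = 1 \}$ is the risk set at $T_j$.

We will show (later) that the maximum partial likelihood estimator $\hat\beta$ enjoys "the usual properties" of ML-estimators. Thus $\hat\beta$ is approximately multivariate normally distributed around the true value of $\beta$ with a covariance matrix that may be estimated by $I(\hat\beta)^{-1}$, where

$$ I(\beta) = -\frac{\partial^2}{\partial\beta\,\partial\beta^T} \log L(\beta) $$

is the observed information matrix.

For general relative risk functions it may be better to use the expected information matrix. But as this coincides with the observed information matrix for Cox regression, we will not go into these details (cf. section 4.1.5).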
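The partial likelihood can be evaluated directly for a small data set. The following is a minimal sketch in base R, assuming Cox's relative risk function, a single fixed covariate and no tied event times. The toy data and the names `log_pl` and `beta_hat` are hypothetical and not part of the lecture material.

```r
# Hypothetical toy data: six individuals, no tied event times
time   <- c(2, 3, 5, 7, 8, 11)
status <- c(1, 1, 0, 1, 1, 0)               # 1 = event, 0 = censored
x      <- c(1.2, 0.3, 0.8, -0.5, 0.9, -1.1) # fixed covariate

# Log partial likelihood for Cox's model with one fixed covariate:
# sum over event times of beta * x_{i_j} - log( sum_{l in R_j} exp(beta * x_l) )
log_pl <- function(beta) {
  event_idx <- which(status == 1)
  sum(sapply(event_idx, function(j) {
    at_risk <- time >= time[j]               # risk set at the j-th event time
    beta * x[j] - log(sum(exp(beta * x[at_risk])))
  }))
}

# Maximize over a bounded interval (the interval is an assumption of this sketch)
fit <- optimize(log_pl, interval = c(-10, 10), maximum = TRUE)
beta_hat <- fit$maximum
```

At β = 0 every individual in a risk set is equally likely to fail, so the log partial likelihood reduces to minus the sum of the log risk set sizes. In practice one would use `coxph` from the survival package rather than a hand-written optimizer.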

To test the null hypothesis $H_0: \beta_j = 0$, we may use the Wald test statistic

$$ Z = \frac{\hat\beta_j}{se(\hat\beta_j)} $$

which is approximately standard normally distributed under the null hypothesis.

To obtain a confidence interval for the hazard ratio $e^{\beta_j}$, we transform the limits of the standard confidence interval for $\beta_j$ to get the 95% confidence interval:

$$ \exp\left\{ \hat\beta_j \pm 1.96\, se(\hat\beta_j) \right\} $$

To test the simple null hypothesis $H_0: \beta = \beta_0$ for a specified value of $\beta_0$ (typically $\beta_0 = 0$) we may apply the usual likelihood based test statistics:

- The likelihood ratio test statistic: $\chi^2_{LR} = 2\{\log L(\hat\beta) - \log L(\beta_0)\}$
- The score test statistic: $\chi^2_{SC} = U(\beta_0)^T I(\beta_0)^{-1} U(\beta_0)$
- The Wald test statistic: $\chi^2_{W} = (\hat\beta - \beta_0)^T I(\hat\beta) (\hat\beta - \beta_0)$

where $U(\beta) = \frac{\partial}{\partial\beta} \log L(\beta)$ is the vector of score functions.

All the test statistics are approximately chi-squared distributed with $p$ df under the null hypothesis.

All the tests may be generalized to a composite null hypothesis, where one wants to test the hypothesis that $r$ of the regression coefficients are zero (or equivalently, after a reparameterization, that there are $r$ linear restrictions among the regression coefficients).

In particular, if $\beta^*$ is the maximum partial likelihood estimator under the null hypothesis, the likelihood ratio test statistic takes the form

$$ \chi^2_{LR} = 2\{\log L(\hat\beta) - \log L(\beta^*)\} $$

and it is approximately chi-squared distributed with $r$ df under the null hypothesis.

Using R

For illustration we use the melanoma data (cf. practical exercises 1 and 2).

    library(survival)

    # Read data:
    path="http://www.uio.no/studier/emner/matnat/math/STK4080/h14/melanoma.txt"
    melanoma=read.table(path,header=TRUE)

    # We first consider the model with log-thickness as the only covariate:
    fit.t=coxph(Surv(lifetime,status==1)~log2(thickn),data=melanoma)
    summary(fit.t)
    # Note that we use base 2 logarithms for ease of interpretation

    # Then we consider the model with log-thickness and sex as covariates:
    fit.ts=coxph(Surv(lifetime,status==1)~log2(thickn)+sex,data=melanoma)
    summary(fit.ts)
    # Note that since sex is a binary covariate (coded 1 and 2), we get the same
    # estimates if we treat sex as a numeric covariate or as a categorical
    # covariate [by using factor(sex) in the coxph-command]

    # The two models may be compared using the likelihood ratio test:
    anova(fit.t,fit.ts,test="Chisq")
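The Wald test and the confidence interval for a hazard ratio only need an estimate and its standard error. Here is a minimal sketch in base R. The helper `wald_summary` and the numbers passed to it are hypothetical and are not output from the melanoma analysis.

```r
# Wald test of H0: beta_j = 0 and confidence interval for the hazard ratio
# exp(beta_j). Helper name and input values are illustrative assumptions.
wald_summary <- function(beta_hat, se, level = 0.95) {
  z    <- beta_hat / se
  crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 for level = 0.95
  list(
    z            = z,
    p_value      = 2 * pnorm(-abs(z)), # two-sided P-value
    hazard_ratio = exp(beta_hat),
    ci           = exp(beta_hat + c(-1, 1) * crit * se)
  )
}

res <- wald_summary(beta_hat = 0.4, se = 0.1)
```

The interval is computed on the log-hazard scale and then exponentiated, so it is not symmetric around the estimated hazard ratio.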

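Estimation of cumulative hazards and survival probabilities

We will estimate the cumulative baseline hazard

$$ A_0(t) = \int_0^t \alpha_0(u)\,du $$

We take the aggregated counting process $N_\bullet(t) = \sum_{l=1}^{n} N_l(t)$ as our starting point. Its intensity process is given by

$$ \lambda_\bullet(t) = \left\{ \sum_{l=1}^{n} Y_l(t)\, r(\beta, x_l(t)) \right\} \alpha_0(t) $$

If we had known $\beta$, this would have been an example of the multiplicative intensity model. For a given value of $\beta$, we may therefore estimate $A_0(t)$ by

$$ \hat A_0(t; \beta) = \int_0^t \frac{dN_\bullet(u)}{\sum_{l=1}^{n} Y_l(u)\, r(\beta, x_l(u))} = \sum_{T_j \le t} \frac{1}{\sum_{l \in R_j} r(\beta, x_l(T_j))} $$

Since $\beta$ is unknown, we replace it by $\hat\beta$ to obtain the Breslow estimator:

$$ \hat A_0(t) = \sum_{T_j \le t} \frac{1}{\sum_{l \in R_j} r(\hat\beta, x_l(T_j))} $$

If all covariates are fixed, the cumulative hazard corresponding to an individual with a given covariate vector $x_0$ is

$$ A(t \mid x_0) = r(\beta, x_0)\, A_0(t) $$

and it may be estimated by

$$ \hat A(t \mid x_0) = r(\hat\beta, x_0)\, \hat A_0(t) $$

For a given path of an external time-dependent covariate, the cumulative hazard may be estimated by

$$ \hat A(t \mid x_0) = \int_0^t r(\hat\beta, x_0(u))\, d\hat A_0(u) $$

The corresponding survival function is given by

$$ S(t \mid x_0) = \prod_{u \le t} \{ 1 - dA(u \mid x_0) \} $$

and it may be estimated by

$$ \hat S(t \mid x_0) = \prod_{T_j \le t} \{ 1 - \Delta \hat A(T_j \mid x_0) \} $$

Alternatively we may use (as is done in R):

$$ \tilde S(t \mid x_0) = \exp\{ -\hat A(t \mid x_0) \} $$

For practical purposes there is little difference between the two estimators.

The estimators of the cumulative hazards and survival functions are approximately normal, and their variances may be estimated as described in section 4.1.6 (which is not part of the curriculum).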
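The Breslow estimator is a sum over event times of one divided by the total relative risk in the risk set. The sketch below computes it in base R for Cox's model with a single fixed covariate. The function `breslow`, the toy data and the value used for β are hypothetical. In a real analysis β would be replaced by the maximum partial likelihood estimate.

```r
# Hypothetical toy data: six individuals, no tied event times
time   <- c(2, 3, 5, 7, 8, 11)
status <- c(1, 1, 0, 1, 1, 0)               # 1 = event, 0 = censored
x      <- c(1.2, 0.3, 0.8, -0.5, 0.9, -1.1)

# Breslow estimator of the cumulative baseline hazard for a given beta
breslow <- function(time, status, x, beta) {
  event_times <- sort(unique(time[status == 1]))
  increments <- sapply(event_times, function(t) {
    d <- sum(time == t & status == 1)        # number of events at t
    d / sum(exp(beta * x[time >= t]))        # divided by total risk in the risk set
  })
  data.frame(time = event_times, cumhaz = cumsum(increments))
}

# With beta = 0 the estimator reduces to the Nelson-Aalen estimator
na <- breslow(time, status, x, beta = 0)

# Model-based survival estimate exp(-A(t | x0)) for a covariate value x0,
# using an assumed value of beta
beta <- 0.7
x0   <- 1
A0   <- breslow(time, status, x, beta)
surv_x0 <- exp(-exp(beta * x0) * A0$cumhaz)
```

The last line uses the exponential form of the survival estimator, which is the one the lecture notes R uses.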

Using R

For illustration we continue to use the melanoma data.

    # We first consider ulceration as the only covariate and start by
    # making Nelson-Aalen plots for patients with and without ulceration:
    fit.su=coxph(Surv(lifetime,status==1)~strata(ulcer),data=melanoma)
    surv.su=survfit(fit.su)
    plot(surv.su,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),ylim=c(0,0.7),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:2)
    legend("topleft",c("ulceration","no ulceration"),lty=1:2)

    # We then fit a Cox model with ulceration as the only covariate and plot
    # the model based estimates of the cumulative hazards in the same plot:
    fit.u=coxph(Surv(lifetime,status==1)~ulcer,data=melanoma)
    surv.u=survfit(fit.u,newdata=data.frame(ulcer=c(1,2)))
    lines(surv.u,fun="cumhaz",mark.time=FALSE,conf.int=FALSE,lty=1:2,col="red")

    # We then consider the model with ulceration and log-thickness
    fit.ut=coxph(Surv(lifetime,status==1)~ulcer+log2(thickn),data=melanoma)
    summary(fit.ut)

    # We will plot the cumulative hazards for the four covariate combinations
    # 1) ulcer=2, thickn=1
    # 2) ulcer=2, thickn=4
    # 3) ulcer=1, thickn=4
    # 4) ulcer=1, thickn=8
    new.covariates=data.frame(ulcer=c(2,2,1,1),thickn=c(1,4,4,8))
    surv.ut=survfit(fit.ut,newdata=new.covariates)
    plot(surv.ut,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:4)
    legend("topleft",c("1","2","3","4"),lty=1:4)

    # To plot the survival functions for the same combinations of the
    # covariates we just omit the "cumhaz" option:
    plot(surv.ut,mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",lty=1:4)
    legend("bottomleft",c("1","2","3","4"),lty=1:4)

Martingale residuals and model check

We know that the processes

$$ M_i(t) = N_i(t) - \Lambda_i(t) $$

with

$$ \Lambda_i(t) = \int_0^t \lambda_i(u)\,du = \int_0^t Y_i(u)\, r(\beta, x_i(u))\, \alpha_0(u)\,du $$

are martingales if the model is correctly specified.

We may estimate $\Lambda_i(t)$ by inserting $\hat\beta$ for $\beta$ and $d\hat A_0(u)$ for $\alpha_0(u)\,du$, where $\hat A_0(t)$ is the Breslow estimator.

Estimated cumulative intensity processes:

$$ \hat\Lambda_i(t) = \int_0^t Y_i(u)\, r(\hat\beta, x_i(u))\, d\hat A_0(u) = \sum_{T_j \le t} \frac{Y_i(T_j)\, r(\hat\beta, x_i(T_j))}{\sum_{l \in R_j} r(\hat\beta, x_l(T_j))} $$

Martingale residual processes:

$$ \hat M_i(t) = N_i(t) - \hat\Lambda_i(t) $$

Martingale residuals:

$$ \hat M_i = \hat M_i(\tau) $$

where $\tau$ is the upper time limit of the study.
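The martingale residuals follow directly from the formula for the estimated cumulative intensity processes. Below is a minimal sketch in base R for Cox's model with a single fixed covariate and no tied event times. The function `martingale_resid`, the toy data and the value of β are hypothetical. With a fitted `coxph` object one would normally call `residuals(fit, type = "martingale")` instead.

```r
# Hypothetical toy data: six individuals, no tied event times
time   <- c(2, 3, 5, 7, 8, 11)
status <- c(1, 1, 0, 1, 1, 0)               # N_i(tau): 1 = event, 0 = censored
x      <- c(1.2, 0.3, 0.8, -0.5, 0.9, -1.1)

# Martingale residuals M_i = N_i(tau) - Lambda_i(tau) for a given beta
martingale_resid <- function(time, status, x, beta) {
  risk <- exp(beta * x)
  event_times <- time[status == 1]
  cum_intensity <- sapply(seq_along(time), function(i) {
    # event times at which individual i is still at risk
    tj <- event_times[event_times <= time[i]]
    sum(vapply(tj, function(t) risk[i] / sum(risk[time >= t]), numeric(1)))
  })
  status - cum_intensity
}

res <- martingale_resid(time, status, x, beta = 0.7)  # beta value is assumed
```

Because the Breslow increments are built from the same risk sets, these residuals sum to zero for any value of β, which gives a quick sanity check on an implementation.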

In the ABG-book (section 4.1.3) a method is described for checking goodness-of-fit for relative risk regression models using grouped martingale residual processes.

We will not consider this method, but rather present the methods of Lin et al. (Biometrika 1993) for checking the assumptions of Cox regression using cumulative sums of martingale residual processes.

So consider Cox's regression model with fixed covariates:

$$ \alpha(t \mid x) = \alpha_0(t) \exp(\beta^T x) $$

The model assumes:

1) Log-linearity:

$$ \log\{\alpha(t \mid x)\} = \log\{\alpha_0(t)\} + \beta^T x $$

2) Proportional hazards:

$$ \frac{\alpha(t \mid x_2)}{\alpha(t \mid x_1)} = \exp\{\beta^T (x_2 - x_1)\} \quad \text{(independent of time)} $$

For checking log-linearity, i.e. if the $k$-th covariate has correct functional form, we may consider

$$ W_k(x) = \sum_{i=1}^{n} I(x_{ik} \le x)\, \hat M_i = \sum_{i=1}^{n} I(x_{ik} \le x)\, N_i(\tau) - \sum_{T_j} \frac{\sum_{l \in R_j} I(x_{lk} \le x) \exp(\hat\beta^T x_l)}{\sum_{l \in R_j} \exp(\hat\beta^T x_l)} $$

The two terms are the observed and expected number of failures for covariate values $\le x$.

[Figure: illustration for melanoma data with ulceration and tumor thickness (not log-transformed) as covariates; test process plotted against tumor thickness]

If the model is correctly specified, the test process should fluctuate around zero. So «large» values indicate that the covariate has a wrong functional form. But how large is «large»?
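The observed test process $W_k(x)$ is simply a cumulative sum of martingale residuals taken in order of increasing covariate value. Here is a minimal sketch in base R. The function `cum_resid_process` is a hypothetical helper, and the residual values are made up for illustration (chosen to sum to zero). They do not come from a fitted model.

```r
# Hypothetical covariate values and martingale residuals (illustration only)
x     <- c(1.2, 0.3, 0.8, -0.5, 0.9, -1.1)
resid <- c(0.6, -0.3, -0.4, 0.5, -0.2, -0.2)

# W_k(x) = sum_i I(x_ik <= x) * M_i, evaluated at the observed covariate values
cum_resid_process <- function(x, resid) {
  ord <- order(x)
  data.frame(x = x[ord], W = cumsum(resid[ord]))
}

proc  <- cum_resid_process(x, resid)
sup_W <- max(abs(proc$W))   # supremum statistic compared with simulated processes
```

Judging whether `sup_W` is large requires the simulated reference processes of Lin et al., which this sketch does not attempt to reproduce.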

Lin et al. (1993) showed that if the model is correctly specified, $W_k(x)$ is asymptotically distributed as a mean zero Gaussian process.

The limiting distribution is intractable, but Lin et al. suggested a way to approximate the distribution using Monte Carlo simulations. The trick is to consider an asymptotic approximation of $W_k(x)$ and to replace $dM_i(t)$ in this approximation by $G_i\, dN_i(t)$, where the $G_i$'s are sampled from a standard normal distribution (keeping the data fixed).

[Figure: plot of the observed test process together with 5 simulated processes (assuming a correct model); vertical axis: cumulative MG-residuals; horizontal axis: tumor thickness]

The computation may be performed using the timereg package in R, cf. below and section 6.2 in Martinussen & Scheike (Springer 2006).

The plot indicates that the model predicts too many deaths for thin tumors. To get a P-value we compare $\sup_x |W_k(x)|$ with simulated processes, giving P=.64.

For a model with log thickness and ulceration we get the following result:

[Figure: cumulative MG-residuals plotted against log tumor thickness]

Here the assumption of a log-linear effect seems fine (P=.3).

For checking proportional hazards, we for the $k$-th covariate consider

$$ U_k(t) = \sum_{i=1}^{n} x_{ik}\, \hat M_i(t) = \sum_{i=1}^{n} x_{ik}\, N_i(t) - \sum_{T_j \le t} \frac{\sum_{l \in R_j} x_{lk} \exp(\hat\beta^T x_l)}{\sum_{l \in R_j} \exp(\hat\beta^T x_l)} $$

Illustration for melanoma data with log tumor thickness and ulceration:

[Figure: two panels, prop(ulcer) (P=.4) and prop(log2(thickn)) (P=.5); vertical axis: cumulative MG-residuals; horizontal axis: years since operation]

Using R

For illustration we continue to use the melanoma data. We will use the timereg package, so this needs to be installed and loaded.

    library(timereg)

    # We first consider a model with ulceration and thickness (not log-transformed)
    # residuals=1 stores the residuals needed by cum.residuals;
    # n.sim is the number of simulated processes
    fit.ut=cox.aalen(Surv(lifetime,status==1)~prop(ulcer)+prop(thickn),
                     data=melanoma,weighted.test=0,residuals=1,
                     rate.sim=0,n.sim=1000)

    # Check of log-linearity
    resids.ut=cum.residuals(fit.ut,data=melanoma,cum.resid=1)
    plot(resids.ut,score=2,xlab="Tumor thickness")
    summary(resids.ut)

    # We then check log-linearity and proportional hazards for a model
    # with log-transformed thickness
    fit.ult=cox.aalen(Surv(lifetime,status==1)~prop(ulcer)+prop(log2(thickn)),
                      data=melanoma,weighted.test=0,residuals=1,
                      rate.sim=0,n.sim=1000)
    resids.ult=cum.residuals(fit.ult,data=melanoma,cum.resid=1)
    plot(resids.ult,score=2,xlab="log tumor thickness")
    summary(resids.ult)
    par(mfrow=c(1,2))
    plot(fit.ult,score=1,xlab="Years since operation")
    summary(fit.ult)

Stratified models

So far we have assumed a common baseline hazard for all individuals, i.e.

$$ \alpha(t \mid x_i) = \alpha_0(t)\, r(\beta, x_i(t)) $$

When this is not a realistic assumption, one may adopt a stratified version of the model. Then the study population is grouped into $k$ strata, and for an individual $i$ in stratum $s$ we assume that the hazard takes the form:

$$ \alpha(t \mid x_i, \text{stratum } s) = \alpha_{s0}(t)\, r(\beta, x_i(t)) $$

Note that the effects of the covariates are assumed to be the same across strata, while the baseline hazard may vary between strata.

We now estimate $\beta$ by maximizing the partial likelihood

$$ L(\beta) = \prod_{s=1}^{k} \prod_{T_{sj}} \frac{r(\beta, x_{i_j}(T_{sj}))}{\sum_{l \in R_{sj}} r(\beta, x_l(T_{sj}))} $$

where $T_{s1} < T_{s2} < \cdots$ are the observed event times in stratum $s$ and $R_{sj}$ is the risk set in this stratum at time $T_{sj}$.

The maximum partial likelihood estimator enjoys similar properties as for the situation without stratification, and statistical tests may be performed as before.

We may estimate the stratum-specific cumulative baseline hazards

$$ A_{s0}(t) = \int_0^t \alpha_{s0}(u)\,du $$

by the Breslow estimators

$$ \hat A_{s0}(t) = \sum_{T_{sj} \le t} \frac{1}{\sum_{l \in R_{sj}} r(\hat\beta, x_l(T_{sj}))} $$

As before these provide the basis for estimating cumulative hazards and survival functions for given values of fixed covariates (or given paths of external time-varying covariates).
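In the stratified partial likelihood each risk set is restricted to the stratum of the individual who fails, while β is shared across strata. The sketch below evaluates and maximizes it in base R for Cox's model with one fixed covariate. The toy data, the stratum labels and the name `log_pl_stratified` are hypothetical. In practice `coxph` with a `strata()` term does this.

```r
# Hypothetical toy data: two strata, no tied event times
time    <- c(2, 3, 5, 7, 8, 11)
status  <- c(1, 1, 0, 1, 1, 0)               # 1 = event, 0 = censored
x       <- c(1.2, 0.3, 0.8, -0.5, 0.9, -1.1)
stratum <- c(1, 1, 1, 2, 2, 2)

# Stratified log partial likelihood: risk sets are formed within each stratum,
# but the regression coefficient beta is common to all strata
log_pl_stratified <- function(beta) {
  total <- 0
  for (s in unique(stratum)) {
    in_s <- stratum == s
    for (j in which(in_s & status == 1)) {
      at_risk <- in_s & time >= time[j]      # stratum-specific risk set
      total <- total + beta * x[j] - log(sum(exp(beta * x[at_risk])))
    }
  }
  total
}

# Maximize over a bounded interval (the interval is an assumption of this sketch)
fit <- optimize(log_pl_stratified, interval = c(-10, 10), maximum = TRUE)
beta_hat <- fit$maximum
```

Each stratum contributes its own product of conditional probabilities, so the baseline hazards cancel stratum by stratum and never need to be specified.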

Using R

For illustration we continue to use the melanoma data.

    # We fit a model where we stratify on ulceration and use log-thickness
    # as covariate
    fit.strat=coxph(Surv(lifetime,status==1)~log2(thickn)+strata(ulcer),
                    data=melanoma)
    summary(fit.strat)

    # We may plot the cumulative baseline hazards for the two ulceration strata
    # (thickn=1 gives log2(thickn)=0, i.e. the baseline):
    baseline.covar=data.frame(thickn=1)
    surv.strat=survfit(fit.strat,newdata=baseline.covar)
    plot(surv.strat,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:2)