ST745: Survival Analysis: Cox-PH!

Eric B. Laber
Department of Statistics, North Carolina State University
April 20, 2015

"Rien n'est plus dangereux qu'une idée, quand on n'a qu'une idée." (Nothing is more dangerous than an idea, when you have only one idea.)
  Kanye West
"I would never want a book's autograph. I am a proud non-reader of books."
  Kanye West

Then and now
Last time we discussed: parametric regression models; AFT models; PH models; IPCW least squares.
Today we'll discuss: the Cox-PH model; estimation and inference; R code.

Warm-up
Explain to your stat buddy:
1. Explain how a parametric regression model might be used with survival data
2. Give an example of a semi-parametric regression model
3. Name three common complaints about non-parametric regression models
True or false:
(T/F) The original Cox-PH paper is the most highly cited statistics paper appearing in the Journal of the Royal Statistical Society, Series B.
(T/F) D.R. Cox founded the statistics department at NCSU.
(T/F) By leaving uninteresting parts of the general model completely unspecified, semi-parametric models do not require model diagnostics.
(T/F) Merlin is older in dog years than Madeline is in hamster years.

D.R. Cox
Sir David Cox (knighted 1985)
Conceived of the Cox-PH model in a fevered dream [1]
"Most real life statistical problems have one or more nonstandard features. There are no routine statistical questions; only questionable statistical routines."
  D.R. Cox
[1] Well, he was reportedly very ill anyway.

Cox-PH model
Let $T$ denote survival time and $x \in \mathbb{R}^p$ covariates. The Cox-PH model assumes
$$h(t \mid x) = h_0(t) \exp\{x^\top \beta\},$$
where $\beta \in \mathbb{R}^p$ are coefficients and $h_0(t)$ is left unspecified.
Estimation and inference do not rely on the form of $h_0(t)$.
Estimation and inference do depend on the multiplicative form.
Model diagnostics are essential to validate this assumption.

Cox-PH model cont'd
Some implications of the PH model:
Survivor function: $S(t \mid x) = \{S_0(t)\}^{\exp(x^\top \beta)}$, where $S_0(t) = \exp\{-\int_0^t h_0(s)\,ds\}$.
Subjects with covariates $x$ and $x'$ have hazard ratio
$$\frac{h(t \mid x)}{h(t \mid x')} = \exp\{(x - x')^\top \beta\},$$
thus $\log\{h(t \mid x)/h(t \mid x')\} = (x - x')^\top \beta$.
$\beta_j$ is the increase in the log hazard per unit increase in $x_j$ (why?)
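A quick numerical check of the two implications can make them concrete. The sketch below assumes a Weibull baseline hazard purely for illustration (the Cox-PH model itself leaves $h_0$ unspecified); the parameter values, coefficients, and covariate vectors are hypothetical.

```python
import math

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull baseline parameters (an assumption)


def h0(t):
    """Assumed baseline hazard h_0(t); any non-negative function would do."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def S0(t):
    """Baseline survivor function exp{-int_0^t h_0(s) ds} (closed form for Weibull)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard(t, x, beta):
    """Cox-PH hazard h(t | x) = h_0(t) * exp(x' beta)."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, beta))
    return h0(t) * math.exp(linear_predictor)


def survivor_numeric(t, x, beta, steps=20000):
    """S(t | x) = exp{-int_0^t h(s | x) ds}, integral by the trapezoid rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0, x, beta) + hazard(t, x, beta))
    total += sum(hazard(k * width, x, beta) for k in range(1, steps))
    return math.exp(-total * width)


beta = [0.7, -0.3]   # illustrative coefficients
x_a = [1.0, 2.0]     # illustrative covariate vectors
x_b = [0.0, 0.5]

# Implication 1: S(t | x) equals S_0(t) raised to the power exp(x' beta).
lp_a = sum(xj * bj for xj, bj in zip(x_a, beta))
s_power_form = S0(3.0) ** math.exp(lp_a)
s_integrated = survivor_numeric(3.0, x_a, beta)

# Implication 2: the hazard ratio does not depend on t.
ratios = [hazard(t, x_a, beta) / hazard(t, x_b, beta) for t in (0.5, 1.0, 4.0)]
expected_ratio = math.exp(sum((a - b) * bj for a, b, bj in zip(x_a, x_b, beta)))
```

Here `s_power_form` and `s_integrated` agree up to integration error, and every entry of `ratios` matches `expected_ratio`, whatever baseline is plugged in.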

Estimation
Recall that a partial likelihood is a LH with nuisance terms removed.
Cox (1975) derived a partial LH for estimation of Cox-PH.
Mathematical rigor and asymptotic results were provided by our own Tsiatis (1981).
Just when you thought you were done... recall:
$dn_i(t) = 1\{T_i \in [t, t + \Delta t),\ \delta_i = 1\}$
$Y_i(t) = 1\{T_i \ge t\}$
$dn_\cdot(t) = \sum_{i=1}^n dn_i(t)$
$dn(t) = (dn_1(t), \ldots, dn_n(t))$
Let $t_{(1)}, \ldots, t_{(K)}$ denote the unique failure times; assume one failure per time.
Let $x_{(i)}$ denote the covariates for the subject with failure time $t_{(i)}$.
Let $R_{(i)}$ denote the set of individuals at risk at $t_{(i)}$.

Estimation cont'd
We showed in L3 that under non-informative right-censoring
$$L = \prod_{t=0}^{\infty} P\{dn(t) \mid H(t)\} = \prod_{t=0}^{\infty} P\{dn(t) \mid dn_\cdot(t), H(t)\}\, P\{dn_\cdot(t) \mid H(t)\},$$
where $H(t)$ is the history of the failure/censoring process on $(0, t)$.
Drop the terms $P\{dn_\cdot(t) \mid H(t)\}$ to obtain the partial LH.

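Estimation cont'd
Since we assume unique failure times, $dn_\cdot(t)$ is 0 or 1 given $\Delta t$ sufficiently small.
If $dn_\cdot(t) = 0$ then $dn(t) \equiv 0$.
If $dn_\cdot(t) = 1$ then
$$P\{dn_i(t) = 1 \mid dn_\cdot(t) = 1, H(t)\} = \frac{Y_i(t)\, h(t \mid x_i)\, \Delta t}{\sum_{k=1}^n Y_k(t)\, h(t \mid x_k)\, \Delta t} = \frac{Y_i(t) \exp\{x_i^\top \beta\}}{\sum_{k=1}^n Y_k(t) \exp\{x_k^\top \beta\}},$$
thus the partial LH is
$$L(\beta) = \prod_{i=1}^K \frac{\exp\{x_{(i)}^\top \beta\}}{\sum_{l \in R_{(i)}} \exp\{x_l^\top \beta\}}.$$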

Try it!
Compute the partial likelihood
Patient ID   t   δ   z
1            2   1   2
2            2   0   2
3            3   1   1
4            4   1   3
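One way to check a hand computation is the short sketch below. It assumes the at-risk convention $Y_l(t) = 1\{T_l \ge t\}$, so patient 2 (censored at $t = 2$) is still in the risk set when patient 1 fails. Under that convention the risk sets are {1, 2, 3, 4}, {3, 4}, and {4}, so the partial likelihood is $\frac{e^{2\beta}}{2e^{2\beta} + e^{\beta} + e^{3\beta}} \cdot \frac{e^{\beta}}{e^{\beta} + e^{3\beta}} \cdot 1$. The function name is illustrative only.

```python
import math

# (patient id, time t, event indicator delta, covariate z) from the table above
data = [(1, 2, 1, 2), (2, 2, 0, 2), (3, 3, 1, 1), (4, 4, 1, 3)]


def partial_likelihood(beta, rows):
    """Product over observed failures of exp(beta*z_i) / sum over the risk set."""
    value = 1.0
    for _, t_i, delta_i, z_i in rows:
        if delta_i == 0:
            continue  # censored subjects contribute no factor of their own
        # Risk set at t_i: everyone whose observed time is >= t_i.
        denom = sum(math.exp(beta * z_l) for _, t_l, _, z_l in rows if t_l >= t_i)
        value *= math.exp(beta * z_i) / denom
    return value


print(partial_likelihood(0.0, data))  # 0.125, i.e. (1/4) * (1/2) * 1
```

At $\beta = 0$ every subject in a risk set is equally likely to be the one who fails, which is why the value reduces to one over the product of the risk-set sizes.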

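Estimation and inference
Estimator: $\hat\beta_n = \arg\max_\beta L(\beta)$
$L(\beta)$ is not the LH we're used to (why?); however, we'll see it can be used to conduct tests and inference.
Show that
$$L(\beta) = \prod_{i=1}^K \frac{\exp\{x_{(i)}^\top \beta\}}{\sum_{l \in R_{(i)}} \exp\{x_l^\top \beta\}} = \prod_{i=1}^n \left( \frac{\exp\{x_i^\top \beta\}}{\sum_{l=1}^n Y_l(t_i) \exp\{x_l^\top \beta\}} \right)^{\delta_i}$$
Note*: the right-hand expression is defined even with multiple failures at a given time.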

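Estimation and inference cont'd
Log partial likelihood (hereafter, log-lh):
$$l(\beta) = \sum_{i=1}^n \delta_i \left[ x_i^\top \beta - \log\left( \sum_{l=1}^n Y_l(t_i) \exp\{x_l^\top \beta\} \right) \right]$$
Score function:
$$U(\beta) = \sum_{i=1}^n \delta_i \left[ x_i - \bar{x}(t_i, \beta) \right],$$
where $\bar{x}(t, \beta) = \sum_{l=1}^n x_l Y_l(t) \exp\{x_l^\top \beta\} \big/ \sum_{l=1}^n Y_l(t) \exp\{x_l^\top \beta\}$.
coxph in R computes $\hat\beta_n$ as the solution to $U(\beta) = 0$.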
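The log-lh and score formulas translate almost line by line into code. The following is a minimal sketch, not how `coxph` is implemented: it uses the four-patient data from the Try it! slide as a single-covariate example, and the helper names are hypothetical. A central finite difference of the log-lh is included as a sanity check on the score.

```python
import math

# Four-patient example: observed times, event indicators, covariate vectors (p = 1)
times = [2, 2, 3, 4]
deltas = [1, 0, 1, 1]
xs = [[2.0], [2.0], [1.0], [3.0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_partial_likelihood(beta):
    """l(beta) = sum_i delta_i [x_i'beta - log sum_l Y_l(t_i) exp(x_l'beta)]."""
    total = 0.0
    for t_i, d_i, x_i in zip(times, deltas, xs):
        if d_i == 1:
            denom = sum(math.exp(dot(x_l, beta))
                        for t_l, x_l in zip(times, xs) if t_l >= t_i)
            total += dot(x_i, beta) - math.log(denom)
    return total


def score(beta):
    """U(beta) = sum_i delta_i [x_i - xbar(t_i, beta)], one entry per covariate."""
    u = [0.0] * len(beta)
    for t_i, d_i, x_i in zip(times, deltas, xs):
        if d_i != 1:
            continue
        # Weight Y_l(t_i) * exp(x_l'beta); zero for subjects no longer at risk.
        weights = [math.exp(dot(x_l, beta)) if t_l >= t_i else 0.0
                   for t_l, x_l in zip(times, xs)]
        w_sum = sum(weights)
        for j in range(len(beta)):
            x_bar_j = sum(w * x_l[j] for w, x_l in zip(weights, xs)) / w_sum
            u[j] += x_i[j] - x_bar_j
    return u


beta = [0.3]
eps = 1e-6
finite_diff = (log_partial_likelihood([beta[0] + eps])
               - log_partial_likelihood([beta[0] - eps])) / (2 * eps)
analytic = score(beta)[0]
```

`finite_diff` and `analytic` should agree to several decimal places, and `score([0.0])` works out to -1 for these data: the weighted covariate mean in each risk set is a plain average at $\beta = 0$.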

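Yet even more estimation and inference
Information matrix:
$$I(\beta) = \sum_{i=1}^n \delta_i \left\{ \frac{\sum_{l=1}^n Y_l(t_i) \exp\{x_l^\top \beta\} \left[ x_l - \bar{x}(t_i, \beta) \right] \left[ x_l - \bar{x}(t_i, \beta) \right]^\top}{\sum_{l=1}^n Y_l(t_i) \exp\{x_l^\top \beta\}} \right\}$$
Asymptotic inference is based on $I^{1/2} (\hat\beta_n - \beta) \overset{d}{\to} N(0, I_p)$, where $I_p$ is the $p \times p$ identity matrix.
We can also use the LRT, treating $L$ as a likelihood: $\Lambda(\beta) = 2 l(\hat\beta) - 2 l(\beta)$.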
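To see how the score and information fit together, here is a hedged sketch of a Newton-Raphson solve of $U(\beta) = 0$ for a single covariate, again on the four-patient Try it! data. It is an illustration of the formulas on this slide, not a description of `coxph` internals, and with $n = 4$ the asymptotic standard error and LRT statistic are shown only to demonstrate the arithmetic. All function names are hypothetical.

```python
import math

times = [2, 2, 3, 4]
deltas = [1, 0, 1, 1]
z = [2.0, 2.0, 1.0, 3.0]


def log_lh(beta):
    """Log partial likelihood l(beta) for a scalar covariate."""
    total = 0.0
    for t_i, d_i, z_i in zip(times, deltas, z):
        if d_i == 1:
            denom = sum(math.exp(beta * z_l) for t_l, z_l in zip(times, z) if t_l >= t_i)
            total += beta * z_i - math.log(denom)
    return total


def score_and_information(beta):
    """Return (U(beta), I(beta)); I is a weighted variance of z over each risk set."""
    u, info = 0.0, 0.0
    for t_i, d_i, z_i in zip(times, deltas, z):
        if d_i != 1:
            continue
        at_risk = [z_l for t_l, z_l in zip(times, z) if t_l >= t_i]
        w = [math.exp(beta * z_l) for z_l in at_risk]
        w_sum = sum(w)
        z_bar = sum(wi * zl for wi, zl in zip(w, at_risk)) / w_sum
        u += z_i - z_bar
        info += sum(wi * (zl - z_bar) ** 2 for wi, zl in zip(w, at_risk)) / w_sum
    return u, info


def newton_raphson(beta=0.0, tol=1e-10, max_iter=50):
    """Iterate beta <- beta + U(beta) / I(beta) until the step is negligible."""
    for _ in range(max_iter):
        u, info = score_and_information(beta)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta


beta_hat = newton_raphson()
u_hat, info_hat = score_and_information(beta_hat)
se_hat = 1.0 / math.sqrt(info_hat)               # from I^{1/2}(beta_hat - beta) ~ N(0, 1)
wald_z = beta_hat / se_hat                       # Wald statistic for H0: beta = 0
lrt = 2.0 * log_lh(beta_hat) - 2.0 * log_lh(0.0) # LRT statistic for H0: beta = 0
```

Because the log-lh is concave in $\beta$, the plain Newton iteration is usually well behaved; a production fitter would typically add safeguards such as step-halving.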

Fitting Cox-PH in R
coxph.r

Stratified Cox-PH
Consider a categorical covariate $z$ (e.g., sex).
Adjust nonparametrically for $z$ by fitting
$$h(t \mid x, z = j) = h_j(t) \exp\{x^\top \beta\},$$
where $h_j(t)$ is the baseline hazard for category $j$ of $z$.
How is this different from including $z$ as a predictor?
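The slide does not write out the stratified partial likelihood. The standard construction, assumed here, forms risk sets within each category of $z$ and multiplies the per-stratum partial likelihoods, so the log-lh is a sum over strata with a shared $\beta$. The sketch below uses a scalar covariate and a small made-up data set; both the data and the function names are hypothetical.

```python
import math


def log_partial_likelihood(beta, rows):
    """rows: list of (time, delta, x) with scalar x; risk sets use all rows given."""
    total = 0.0
    for t_i, d_i, x_i in rows:
        if d_i == 1:
            denom = sum(math.exp(beta * x_l) for t_l, _, x_l in rows if t_l >= t_i)
            total += beta * x_i - math.log(denom)
    return total


def stratified_log_partial_likelihood(beta, rows_with_stratum):
    """Sum of per-stratum log partial likelihoods; beta is shared across strata."""
    strata = {}
    for t, d, x, z in rows_with_stratum:
        strata.setdefault(z, []).append((t, d, x))
    return sum(log_partial_likelihood(beta, rows) for rows in strata.values())


# Hypothetical data: (time, delta, x, stratum z)
data = [
    (1, 1, 0.5, "F"), (3, 1, 1.0, "F"), (4, 0, 0.2, "F"),
    (2, 1, 1.5, "M"), (5, 1, 0.3, "M"),
]

stratified_at_zero = stratified_log_partial_likelihood(0.0, data)
pooled_at_zero = log_partial_likelihood(0.0, [(t, d, x) for t, d, x, _ in data])
```

In the stratified version a failure is only compared with subjects in the same category, so no relationship between the $h_j$ is imposed. Including $z$ as a predictor instead keeps one pooled risk set and forces the category-specific hazards to be proportional to one another.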