
Chapter 5: Semi-parametric modelling

We have considered parametric and nonparametric techniques for comparing survival distributions between different treatment groups. Nonparametric techniques, such as the Kaplan-Meier estimate or the log rank test, have the advantage that they require few model assumptions. However, they are not powerful and do not cope well with other covariate designs.

In contrast, parametric techniques (fitting an exponential or Weibull distribution, for example) are more focussed and can handle general covariates, but they require the chosen family to be correct. A compromise solution is a semi-parametric procedure known as the proportional hazards model.

Why focus on the hazard function?
- It is physically enlightening to consider the immediate risk attaching to an individual known to be alive at age $t$.
- Comparisons of groups of individuals are sometimes most incisively made via the hazard.
- Hazard-based models are often convenient when there is censoring or there are several types of failure.
- Comparison with an exponential distribution is particularly simple in terms of the hazard.
- The hazard is the special form of the intensity function for more elaborate point processes with multiple failures.

Example 5.1
Consider lifetimes $T_i \sim \text{Weibull}(\alpha, \lambda_i)$, where $\lambda_i$ is a function of covariates $x_i$. The hazard function for subject $i$ is
$$h_i(t) = \lambda_i^{\alpha}\,\alpha t^{\alpha-1}.$$
This separates into a function of $t$, $h_0(t) = \alpha t^{\alpha-1}$, the baseline hazard common to all subjects, and a multiplier $\lambda_i^{\alpha}$ depending on the subject index $i$. The hazard function is proportional to $h_0(t)$, and the covariate effect multiplies $h_0$ by a different quantity for different subjects.

Equivalently,
$$\frac{h_i(t)}{h_j(t)} = \frac{\lambda_i^{\alpha}}{\lambda_j^{\alpha}}$$
does not depend on $t$.
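As a quick numerical check of Example 5.1, the R sketch below computes Weibull hazards as density divided by survivor function. It assumes R's `dweibull`/`pweibull` shape-scale parametrisation with scale $= 1/\lambda_i$, which gives $h_i(t) = \alpha\lambda_i^{\alpha}t^{\alpha-1}$; the values of $\alpha$, $\lambda_i$ and $t$ are arbitrary illustrations.

```r
# Hazard h(t) = f(t) / S(t) for a Weibull(alpha, lambda) lifetime.
# Assumes R's shape/scale parametrisation with scale = 1 / lambda, so that
# h(t) = alpha * lambda^alpha * t^(alpha - 1).
weibull_hazard <- function(t, alpha, lambda) {
  dweibull(t, shape = alpha, scale = 1 / lambda) /
    pweibull(t, shape = alpha, scale = 1 / lambda, lower.tail = FALSE)
}

alpha  <- 1.5               # common shape parameter (illustrative)
lambda <- c(0.5, 2)         # rates for subjects i and j (illustrative)
t      <- c(0.1, 0.5, 1, 2) # a few time points

ratio <- weibull_hazard(t, alpha, lambda[1]) / weibull_hazard(t, alpha, lambda[2])
print(ratio)                          # the same value at every t
print((lambda[1] / lambda[2])^alpha)  # lambda_i^alpha / lambda_j^alpha
```

Both printed values agree, and changing `t` leaves the ratio unchanged, which is exactly the proportional hazards property.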

The advantage of proportional hazards models is that, as different patients have hazard functions which are proportional across time, we can make inferences about the covariate effects without having to specify the form of the baseline hazard $h_0$. We get the flexibility of parametric modelling without the sensitivity incurred by complete model specification.

Cox's Proportional Hazards Model
Cox's proportional hazards model has the form
$$h(t, x) = h_0(t)\exp(\beta^{\top} x).$$
A more general proportional hazards model is $h(t, x) = h_0(t)\,g(x)$.

Exercise 5.2
In the Cox proportional hazards model, show that the ratio of hazards for different subjects does not depend on time.

Sol: 5.2
$$\frac{h_i(t)}{h_j(t)} = \frac{h(t, x_i)}{h(t, x_j)} = \frac{h_0(t)\exp[\beta^{\top} x_i]}{h_0(t)\exp[\beta^{\top} x_j]} = \exp[\beta^{\top}(x_i - x_j)],$$
which does not depend on $t$.

Partial likelihood
The likelihood for this model requires determining the pdf $f(t)$ from the hazard function $h(t)$, via $f(t) = h(t)\exp\{-\int_0^t h(u)\,du\}$. This depends on $h_0$, and so is to be avoided. Cox's partial likelihood is based on conditioning on the times at which failures occur, and then extracting the information about the covariates by assessing who failed at those times.

Exercise 5.3
Are the phrases "failure times" and "times at which failures occur" synonymous?

Sol: 5.3 No.

Remarks
It is not clear whether the partial likelihood is a true likelihood, a marginal or integrated likelihood, or a conditional likelihood. It is a similar idea to that used in the Kaplan-Meier estimate of the survivor function.

Motivation for the partial likelihood
Consider a failure time $t$. Assume that there are no ties, so that only one failure occurred at time $t$. Denote by $R(t)$ the risk set at time $t$, i.e. the set of individuals alive immediately before time $t$. We calculate the probability that the individual who actually fails in the interval $[t, t + \delta t)$ is individual $i$, where $i \in R(t)$. Recall that $P(T \in [t, t + \delta t) \mid T \ge t) \approx h(t)\,\delta t$ for small $\delta t$.

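Now consider
$$
\begin{aligned}
P\{i \text{ fails in } [t, t+\delta t) \mid \text{one failure in } [t, t+\delta t), R(t)\}
&= \frac{P\{\text{only } i \text{ fails in } [t, t+\delta t) \mid R(t)\}}{P\{\text{one failure in } [t, t+\delta t) \mid R(t)\}} \\
&= \frac{h_i(t)\,\delta t \prod_{j \in R(t)\setminus i} \bigl(1 - h_j(t)\,\delta t\bigr)}{\sum_{k \in R(t)} \Bigl\{ h_k(t)\,\delta t \prod_{j \in R(t)\setminus k} \bigl(1 - h_j(t)\,\delta t\bigr) \Bigr\}} \\
&= \frac{h_i(t)\,\delta t\,\bigl(1 + O(\delta t)\bigr)}{\sum_{k \in R(t)} \bigl\{ h_k(t)\,\delta t\,\bigl(1 + O(\delta t)\bigr) \bigr\}} \\
&= \frac{h_i(t)}{\sum_{k \in R(t)} h_k(t)} + O(\delta t).
\end{aligned}
$$
Now let $\delta t \to 0$.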
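The limit can be checked numerically. The R sketch below evaluates the exact ratio of products for a risk set of three individuals whose hazards (illustrative values) are treated as constant over the short interval, and compares it with $h_i(t)/\sum_{k \in R(t)} h_k(t)$ as $\delta t$ shrinks; `cond_prob` is a hypothetical helper name.

```r
# Hazards at time t for the three members of the risk set R(t) (illustrative values)
h <- c(0.5, 1.0, 2.0)

# Exact probability that individual i is the single failure in [t, t + dt),
# given exactly one failure, using the product form from the derivation
cond_prob <- function(i, h, dt) {
  only <- function(k) h[k] * dt * prod(1 - h[-k] * dt)  # only k fails
  only(i) / sum(sapply(seq_along(h), only))
}

limit <- h[1] / sum(h)   # h_i(t) / sum over R(t) of h_k(t), for i = 1
for (dt in c(0.1, 0.01, 0.001)) {
  cat(sprintf("dt = %-6g  exact = %.6f  limit = %.6f\n",
              dt, cond_prob(1, h, dt), limit))
}
```

The gap between the two columns shrinks roughly in proportion to $\delta t$, matching the $O(\delta t)$ remainder.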

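Under the proportional hazards assumption,
$$\frac{h_i(t)}{\sum_{k \in R(t)} h_k(t)} = \frac{h_0(t)\exp\{\beta^{\top} x_i\}}{\sum_{k \in R(t)} h_0(t)\exp\{\beta^{\top} x_k\}} = \frac{\exp\{\beta^{\top} x_i\}}{\sum_{k \in R(t)} \exp\{\beta^{\top} x_k\}}.$$
Taking the product of such terms over all times of deaths defines the partial likelihood:
$$PL(\beta) = \prod_{i=1}^{n} \left\{ \frac{\exp\{\beta^{\top} x_i\}}{\sum_{k \in R(t_i)} \exp\{\beta^{\top} x_k\}} \right\},$$
where $t_1, \dots, t_n$ are the death times and $x_i$ is the covariate vector of the individual who dies at $t_i$.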
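A minimal R sketch of the log partial likelihood for a single covariate, assuming no tied death times (`log_partial_lik` is our own name, not a library function). Only deaths contribute factors; a censored individual enters only through the risk-set sums in the denominators. The usage lines compare it with the closed-form product for three deaths in the order 1, 2, 3.

```r
# Log partial likelihood for one covariate, assuming no tied death times.
# time: observed times; status: 1 = death, 0 = censored; x: covariate values
log_partial_lik <- function(beta, time, status, x) {
  eta <- beta * x                          # linear predictor beta * x_i
  ll <- 0
  for (i in which(status == 1)) {          # one factor per death
    at_risk <- time >= time[i]             # risk set R(t_i)
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

# Three deaths at times 2, 4, 5 (no censoring), covariates x = 0, 1, 2, beta = 0.3
beta   <- 0.3
x      <- c(0, 1, 2)
lambda <- exp(beta * x)
direct <- log(lambda[1] / sum(lambda) * lambda[2] / (lambda[2] + lambda[3]))
print(c(log_partial_lik(beta, c(2, 4, 5), c(1, 1, 1), x), direct))  # equal
```

Censoring the last individual instead of recording a death leaves the value unchanged here, because that final factor is $\lambda_3/\lambda_3 = 1$.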

Exercise 5.4
Let $n = 3$ and $\lambda_i = \exp\{\beta^{\top} x_i\}$, $i = 1, 2, 3$. Write out the PL in terms of the $\lambda$s for (i) failure times 2, 4, 5 respectively for individuals $i = 1, 2, 3$; (ii) failure times 4, 5, 2 respectively for individuals $i = 1, 2, 3$.

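Sol: 5.4
(i) The individuals fail in the order 1, 2, 3, so
$$PL(\beta) = \frac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3} \cdot \frac{\lambda_2}{\lambda_2 + \lambda_3}.$$
(ii) Individual 3 fails first (time 2), then individual 1 (time 4), then individual 2 (time 5), so the order is 3, 1, 2 and
$$PL(\beta) = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3} \cdot \frac{\lambda_1}{\lambda_1 + \lambda_2}.$$
In each case the final factor ($\lambda_3/\lambda_3$ or $\lambda_2/\lambda_2$) equals 1. Recall the exponential calculation of $P(T_1 < T_2 < T_3)$.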
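The link with the exponential calculation can be checked by simulation: for independent exponential lifetimes with rates $\lambda_1, \lambda_2, \lambda_3$, $P(T_1 < T_2 < T_3)$ equals the PL in case (i). The R sketch below uses arbitrary illustrative rates and a fixed seed.

```r
set.seed(1)                       # reproducible simulation
lambda <- c(1, 2, 3)              # illustrative rates lambda_1, lambda_2, lambda_3
n_sim  <- 200000

t1 <- rexp(n_sim, rate = lambda[1])
t2 <- rexp(n_sim, rate = lambda[2])
t3 <- rexp(n_sim, rate = lambda[3])

mc    <- mean(t1 < t2 & t2 < t3)                                       # simulated P(T1 < T2 < T3)
exact <- lambda[1] / sum(lambda) * lambda[2] / (lambda[2] + lambda[3])  # PL for failure order 1, 2, 3
print(c(simulated = mc, exact = exact))
```

With these rates the exact value is $\tfrac{1}{6}\cdot\tfrac{2}{5} = \tfrac{1}{15}$, and the simulated proportion should agree to within Monte Carlo error.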

Remarks on the PL function
- Censored individuals appear in the denominator for all $t_i$ less than their censoring time.
- $PL(\beta)$ in no way depends on $h_0(t)$, so we can use the partial likelihood to make inferences about $\beta$ (e.g. treatment or covariate effects) without specifying $h_0(t)$.
- Use it as if it were a standard likelihood: find the maximum partial likelihood estimate $\hat\beta$, and for inference use $\hat\beta \sim N(\beta, I(\beta)^{-1})$ approximately, which is established using asymptotic arguments.
- Numerical methods are generally needed to maximise the partial likelihood function.
- Ties in the data complicate the details. (Ignore for now.)
- There is no probability justification for the product.
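Since the maximisation is numerical, a small Newton-Raphson sketch in R may help. It assumes one covariate and no tied death times; `cox_newton`, `log_pl` and the data are hypothetical illustrations. The score is the sum over deaths of $x_i$ minus the risk-set weighted mean of $x$ (weights $\exp(\beta x_k)$), and the information is the sum of the corresponding weighted variances; its inverse square root gives the standard error used in the normal approximation.

```r
# Newton-Raphson for the maximum partial likelihood estimate with one covariate,
# assuming no tied death times.
cox_newton <- function(time, status, x, beta = 0, tol = 1e-10, max_iter = 50) {
  for (iter in seq_len(max_iter)) {
    score <- 0
    info  <- 0
    for (i in which(status == 1)) {
      r    <- time >= time[i]                  # risk set R(t_i)
      w    <- exp(beta * x[r])                 # weights exp(beta * x_k), k in R(t_i)
      xbar <- sum(w * x[r]) / sum(w)           # weighted mean of x over R(t_i)
      score <- score + x[i] - xbar             # first derivative of log PL
      info  <- info + sum(w * x[r]^2) / sum(w) - xbar^2  # minus second derivative
    }
    step <- score / info
    beta <- beta + step
    if (abs(step) < tol) break
  }
  # info comes from the last iteration, essentially at beta_hat once converged
  list(beta = beta, se = 1 / sqrt(info))
}

# Log partial likelihood (no ties), used only to cross-check with optimize()
log_pl <- function(beta, time, status, x) {
  sum(sapply(which(status == 1), function(i)
    beta * x[i] - log(sum(exp(beta * x[time >= time[i]])))))
}

# Small illustrative data set with censoring (status 0)
time   <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 1, 0, 1, 1, 0, 1, 1)
x      <- c(1, 0, 1, 1, 0, 0, 1, 0)

fit <- cox_newton(time, status, x)
opt <- optimize(log_pl, interval = c(-10, 10), maximum = TRUE, tol = 1e-9,
                time = time, status = status, x = x)
print(c(newton = fit$beta, optimize = opt$maximum, se = fit$se))
```

In practice `coxph` in the survival package performs this maximisation (with a choice of methods for ties), so a hand-written version like this is only for understanding.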

Example of Cox PH in R
R fits Cox's proportional hazards model using syntax similar to that used for fitting parametric models. Consider the leuk data and the linear predictor ag + log(wbc) as before:

    library(survival)   # Surv(), coxph()
    library(MASS)       # leuk data set
    coxph(Surv(time) ~ ag + log(wbc), data = leuk)

                 coef exp(coef) se(coef)     z      p
    agpresent  -1.069     0.343    0.429 -2.49 0.0130
    log(wbc)    0.368     1.444    0.136  2.70 0.0069

    Likelihood ratio test=15.6  on 2 df, p=0.000401
    n= 33

Both variables are significant at the 5% level (p = 0.013 for ag and p = 0.0069 for log(wbc)).

Exercise 5.5
Why is there no intercept term in the fitted coefficients?

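Sol: 5.5
Add an intercept to the PH model, so that $h_i(t) = h_0(t)\exp(\alpha + \beta x_i)$. However, the PL is
$$PL(\beta) = \prod_{i=1}^{n} \frac{h_i(t_i)}{\sum_{k \in R(t_i)} h_k(t_i)} = \prod_{i=1}^{n} \frac{h_0(t_i)\exp(\alpha + \beta x_i)}{\sum_{k \in R(t_i)} h_0(t_i)\exp(\alpha + \beta x_k)} = \prod_{i=1}^{n} \frac{\exp(\beta x_i)}{\sum_{k \in R(t_i)} \exp(\beta x_k)},$$
which is invariant to $\alpha$. Equivalently, the intercept is absorbed into the baseline hazard.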
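The invariance can also be seen numerically. The R sketch below (a hypothetical helper `pl_with_intercept`, made-up data, no ties) evaluates the partial likelihood for several values of an added intercept $\alpha$; the common factor $e^{\alpha}$ cancels between each numerator and denominator.

```r
# Partial likelihood with an intercept alpha added to the linear predictor
# (no ties; each death contributes exp(eta_i) / sum over R(t_i) of exp(eta_k))
pl_with_intercept <- function(alpha, beta, time, status, x) {
  eta <- alpha + beta * x
  prod(sapply(which(status == 1), function(i)
    exp(eta[i]) / sum(exp(eta[time >= time[i]]))))
}

time   <- c(3, 1, 4, 2, 5)          # illustrative data; subject 3 is censored
status <- c(1, 1, 0, 1, 1)
x      <- c(0.2, 1.5, -0.7, 0.9, 0.0)

vals <- sapply(c(-2, 0, 5), pl_with_intercept, beta = 0.8,
               time = time, status = status, x = x)
print(vals)   # identical for every alpha
```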

We can confirm that ag is informative to the model by trying to remove it:

    coxph(formula = Surv(time) ~ log(wbc), data = leuk)

    Likelihood ratio test=9.19  on 1 df, p=0.00243
    n= 33

The likelihood ratio statistic for the ag effect is 15.6 - 9.19 = 6.4, which is significant on $\chi^2_1$.
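The arithmetic of this test, using the two likelihood ratio statistics from the printed outputs, can be sketched in R as follows. Both statistics compare a model with the null model, so their difference is the statistic for ag given log(wbc). With both fits stored as `coxph` objects, applying `anova()` to the two nested fits should carry out the same comparison.

```r
lr_full    <- 15.6   # LR statistic for ag + log(wbc) against the null model
lr_reduced <- 9.19   # LR statistic for log(wbc) alone against the null model

lr_ag <- lr_full - lr_reduced                        # LR statistic for ag, adjusting for log(wbc)
p_ag  <- pchisq(lr_ag, df = 1, lower.tail = FALSE)   # upper-tail chi-squared(1) p-value
print(c(statistic = lr_ag, p.value = p_ag))
```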

Relative risk
The fitted model includes both log(wbc) and ag, and has hazard function $h(t)$ of the form:
- ag absent: $h_0(t)\exp\{0.368\log(\text{wbc})\}$
- ag present: $h_0(t)\exp\{-1.069 + 0.368\log(\text{wbc})\}$

The effect of a two-level factor variable on the response is assessed through the relative risk. For the ag factor, suppose patients in the two groups have the same values of the other covariates. The relative risk is then the ratio of hazards, $\exp(-1.069) = 0.343$.

Thus, patients with ag present have a risk around 1/3 of that of patients without ag (provided they have the same value of wbc). We note also that the coefficient of the log(wbc) variable is positive, so that increasing the level of wbc (keeping the other covariates fixed) increases the risk.
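Because wbc enters the model through log(wbc), the hazard ratio between two patients with the same ag status is $(\text{wbc}_2/\text{wbc}_1)^{0.368}$. The R lines below reproduce the relative risk for ag from the fitted coefficient and, as an illustration, the hazard ratio for a doubling of wbc.

```r
b_ag  <- -1.069   # coefficient of agpresent from the fitted model
b_wbc <-  0.368   # coefficient of log(wbc)

rr_ag         <- exp(b_ag)   # hazard ratio, ag present vs absent, same wbc
rr_wbc_double <- 2^b_wbc     # exp(b_wbc * log(2)): hazard ratio for doubling wbc, same ag
print(round(c(rr_ag = rr_ag, rr_wbc_double = rr_wbc_double), 3))
```

So, on these estimates, doubling the white blood count raises the hazard by roughly 29%, whatever the starting level.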

Estimating the baseline hazard
With estimates of the covariate effects we can construct an estimate $\hat H_0$ of the cumulative hazard function corresponding to $h_0$. Recall the (Nelson-Aalen) estimator of the cumulative hazard,
$$\hat H_0(t) = \sum_{j:\,t_j \le t} \frac{d_j}{r(t_j)} = \sum_{j:\,t_j \le t} \frac{\text{deaths at } t_j}{\text{number at risk at } t_j}.$$
Replace $r(t_j)$ by $\sum_{i \in R(t_j)} \exp(\hat\beta^{\top} x_i)$ to allow for the different risks of subjects with different values of $x$.
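A minimal base-R sketch of this weighted estimator (often called Breslow's estimator) for one covariate, taking $\hat\beta$ as given; `breslow_H0` and the data are hypothetical. With $\hat\beta = 0$ it reduces to the unweighted estimator above. In the survival package, `basehaz(fit, centered = FALSE)` returns a comparable estimate from a `coxph` fit.

```r
# Breslow-type estimate of the cumulative baseline hazard H_0(t) at each distinct
# death time, for one covariate with a given estimate beta_hat.
# Tied deaths are handled by putting d_j in the numerator (Breslow's convention).
breslow_H0 <- function(time, status, x, beta_hat) {
  death_times <- sort(unique(time[status == 1]))
  increments <- sapply(death_times, function(tj) {
    d_j  <- sum(time == tj & status == 1)        # deaths at t_j
    risk <- sum(exp(beta_hat * x[time >= tj]))   # sum over R(t_j) of exp(beta_hat * x_i)
    d_j / risk
  })
  data.frame(time = death_times, H0 = cumsum(increments))
}

time   <- c(1, 2, 2, 3, 5)     # illustrative data; subject 3 censored at t = 2
status <- c(1, 1, 0, 1, 1)
x      <- c(0, 1, 0, 1, 0)

H_na  <- breslow_H0(time, status, x, beta_hat = 0)       # beta_hat = 0: unweighted estimator
H_cox <- breslow_H0(time, status, x, beta_hat = log(2))  # subjects with x = 1 count double
print(H_na)
print(H_cox)
S0 <- exp(-H_cox$H0)           # corresponding baseline survivor estimate
```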

Then plot $\hat H_0(t)$, or $\hat S_0(t) = \exp\{-\hat H_0(t)\}$, or $\hat H_0(t)\exp(\hat\beta^{\top} x)$ for a given $x$.

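Leukaemia example
This suggests two plots of the fitted baseline survivor function in R: one for each level of ag, with wbc fixed at a representative value such as the mean (or minimum, or maximum) of log(wbc). The code below uses the minimum:

    library(survival); library(MASS)   # Surv(), coxph(), survfit(); leuk data
    out.km <- survfit(Surv(time) ~ ag, data = leuk)
    plot(out.km, lty = 2:3, log = TRUE, col = c("red", "blue")); grid()   # KM
    out.coxph <- coxph(Surv(time) ~ log(wbc) + ag, data = leuk)
    # ag is a factor with levels "absent" and "present", so pass the level names
    lines(survfit(out.coxph, newdata = data.frame(ag = "present", wbc = min(leuk$wbc))),
          col = "blue", lty = 3, lwd = 3)
    lines(survfit(out.coxph, newdata = data.frame(ag = "absent", wbc = min(leuk$wbc))),
          col = "red", lty = 2, lwd = 3)

Note that survfit can be passed the output of the coxph function.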

[Figure: baseline survivor plots from Cox's PH model fitted to the leuk data (three panels, survival probability on a log scale against time). Thin lines are Kaplan-Meier estimates that ignore blood counts; thick lines are from the Cox PH model.]

The plots show that the higher the wbc, the greater the risk; that there is a significant difference between the two ag groups; and that the log survivor functions are roughly linear (suggesting an exponential model for these data).

Furthermore, at the median value of wbc the fitted curves are very similar to the Kaplan-Meier estimates. However, close inspection reveals that in each plot the fitted lines for the two groups are very slightly closer together under the Cox model than the Kaplan-Meier estimates are. The Cox model accounts for differences between patients due to the effect of wbc, which the Kaplan-Meier estimator ignores. Thus, after allowing for the different levels of wbc in the two ag groups, the evidence for a difference in survival times between the groups is (very slightly) diminished. In other examples, properly accounting for covariates through a Cox model can be very influential.

Testing the PH assumption
The PH assumption $h(t) = h_0(t)\,g(x)$, where $g(x)$ may be $\exp(\beta^{\top} x)$, gives a survivor function
$$S(t) \equiv S(t, x) = S_0(t)^{g(x)},$$
where $S_0(t) = \exp\{-\int_0^t h_0(u)\,du\}$ is the baseline survivor function for an individual with $g(x) = 1$. Taking logs twice,
$$\log\{-\log S(t)\} = \log g(x) + \log\{-\log S_0(t)\}.$$
Hence the survivor functions of sub-groups of the data with common covariate values are parallel when plotted on the complementary log-log scale.

Plots of $\log\{-\log S(t, x_i)\}$ against $t$ or $\log t$ should be roughly parallel.
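A small numerical illustration in R, assuming a Weibull baseline $S_0(t) = \exp(-t^{1.5})$ and $g(x) = 2$ (both arbitrary choices): under PH the complementary log-log curves differ by the constant $\log 2$, whereas a survivor function with a different Weibull shape, which does not satisfy PH with respect to $S_0$, gives a gap that changes with $t$.

```r
t  <- c(0.25, 0.5, 1, 2, 3)          # illustrative time points
S0 <- exp(-t^1.5)                    # baseline survivor function (Weibull, shape 1.5)
S1 <- S0^2                           # PH with g(x) = 2: S(t, x) = S0(t)^g(x)
S2 <- exp(-t^0.8)                    # different shape: not proportional to S0's hazard

cloglog <- function(s) log(-log(s))  # complementary log-log transform
print(cloglog(S1) - cloglog(S0))     # constant, equal to log(2)
print(cloglog(S2) - cloglog(S0))     # changes with t
```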

Example 5.6
The following code fits Cox's PH model with a different baseline hazard for each ag group (see ?strata):

    out.leukph <- coxph(Surv(time) ~ log(wbc) + strata(ag), data = leuk)
    plot(survfit(out.leukph, conf.type = "log-log"), col = c("red", "blue"),
         fun = "cloglog", lty = 1:2, xlim = c(1, 200))
    grid()

By default the fitted survivor functions use the mean value of log(wbc) for all individuals, so that we can compare the slopes of the lines. The fitted lines look very close to parallel, supporting the appropriateness of the Cox PH model for this data set.
