6. Cox Regression Models. (Part I)

Size: px

Start display at page:

Download "6. Cox Regression Models. (Part I)"

Andrew Webb
5 years ago
Views:

1 6. Cox Regressio Models (Part I)

2 The Proportioal Hazards Model A proportioal hazards model proposed by D.R. Cox (197) assmes that λ t z = λ 0 t e β 1z 1 + +β p z p = λ 0 t e zt β where z is a p 1 vector of covariates sch as treatmet idicators, progositc factors, etc., ad β is a p 1 vector of regressio coefficiets. Note that there is o itercept β 0 i the model. So λ 0 t is ofte called the baselie hazard fctio. It ca be iterpreted as the hazard fctio for the poplatio of sbjects with z = 0, i.e. λ t z = 0 = λ 0 t. The baselie hazard fctio λ 0 t ca take ay shape as a fctio of t. The oly reqiremet is that λ 0 t > 0. This is the oparametric part of the model ad z t β is the parametric part of the model. So Cox's proportioal hazards model is a semiparametric model.

3 Iterpretatio of a PH model exp z It is easy to show that der PH model, S t z = S 0 t Tβ, where S t z is the srvival fctio of the sbpoplatio with covariate z ad S 0 t is the srvival fctio of baselie poplatio (z = 0). That is S 0 t = e 0 For ay two sets of covariates z 0 ad z 1, t λ0 d λ t z 1 = λ 0 t e z1 λ t z 0 λ 0 t e = e z 1 z Tβ 0, for all t > 0 z 0 T β which is a costat over time. Eqivaletly, log λ t z 1 = z λ t z 1 z T 0 β. 0 T β With oe it icrease i z k while other covariate vales beig held fixed, the log λ t z k + 1 λ t z = β k + 1 λ t z k = e β k or, k λ t z k Therefore, β k e β k is the icrease i log hazard-ratio (hazard-ratio) at ay time with it icrease i the kth covariate z k. Frthermore, sice P t T < t + Δt T t, z k + 1 P t T < t + Δt T t, z k λ t z k + 1 Δt λ t z k Δt = e β k so e β k ca be loosely iterpreted as the ratio of two coditioal probabilities of dyig i the ear ftre give a sbject is alive at ay time t.

4 Iferetial Problems From the iterpretatio of the model, it is obvios that β characterizes the effect of z. So β shold be the focs of or iferece while λ 0 t is a isace parameter. Give a sample of cesored srvival data, or iferetial problems iclde: 1. Estimate β; derive its statistical properties.. Testig hypothesis H 0 : β = 0 or for part of β. 3. Diagostics. Sice the baselie hazard λ 0 t is left completely specified (ifiite dimesioal), ordiary likelihood methods ca't be sed to estimate β. Cox coceived of the idea of a partial likelihood to remove the isace parameter λ 0 t from the proposed estimatig eqatio. Historical Note: Cox described the proportioal hazards model i JRSSB (197), i what is ow the most qoted statistical papers i history. He also otlied i this paper the method for estimatio which he referred to as sig coditioal likelihood. It was poited ot to him i the literatre that what he proposed was ot a coditioal likelihood ad that there may be some flaws i his logic. Cox (1975) was able to recast his method of estimatio throgh what he called partial likelihood ad pblished this i Biometrika. This approach seemed to be based o sod iferetial priciples. Rigoros proofs showig the cosistecy ad asymptotic ormality were ot pblished til 1981 whe Tsiatis (Aals of Statistics) demostrated these large sample properties. I 198, Aderso ad Gill (Aals of Statistics) simplified ad geeralized these reslts throgh the se of cotig processes.

5 Estimatio Usig Partial Likelihood Data: X i, Δ i, z i, i = 1,,, where for the ith idividal, T X i = mi T i, C i ; Δ i = I T i, C i ; z i = z i1, z i,, z ip is a vector of covariates. Model: λ t z i = λ 0 t e z i t β where, P t T < t + h T t, z i λ t z i = lim h 0 + h Assme that C i ad T i are coditioally idepedet give z i. The the case-specific hazard ca be sed to represet the hazard of iterest. That is (i terms of coditioal probabilities) P x X i < x + Δx, Δ i = 1 X i x, z i = P x T i < x + Δx T i x, z i λ Ti x z i Δx Similar to the case of log rak test, we eed to defie some otatio. Let s break the time axis (patiet time) ito a grid of poits. Assme the srvival time is cotios. We hece ca take the grid poits dese eogh so that at most oe death ca occr withi ay iterval.

6 Notatios Let dn i () deote the idicator for the ith idividal beig observed to die i, + Δ. Namely, dn i = I X i, + Δ, Δ i = 1 ; Let Y i () deote the idicator for whether or ot the ith idividal is at risk at time. Namely, Y i = I X i Let dn x = i=1 dn i x deote the mber of deaths for the whole sample occrrig i, + Δ. Sice we are assmig Δ is sfficietly small, so dn is either 1 or 0 at ay time. Let Y x = i=1 Y i x be the total mber from the etire sample who are at risk at time. Let F x deote the iformatio p to time x (oe of the grid poits) F x = dn i, Y i, z i, i = 1,,, for all grid poits < x ad dn x Also, F = X i, Δ i, z i, i = 1,, deote all the data i the sample. Let I x deote the idividal i the sample who died at time x if someoe died. If o oe dies at time x, the I x = 0. For example, I x = j meas that the jth idividal died i x, x + Δx. The the data (with reddacy) ca be expressed as F 1, I 1, F, I,, F, where 1 < < are the grid poits

7 The Partial Likelihood The likelihood of the parameter λ 0 t ad β ca be writte as P F 1 = f 1 ; λ 0 t, β P I 1 = i 1 F 1 = f 1 ; λ 0 t, β P F = f F 1 = f 1, I 1 = i 1 ; λ 0 t, β P I = i F = f, F 1 = f 1, I 1 = i 1 ; λ 0 t, β ad the last term ca be simplified as P I = i F = f, F 1 = f 1, I 1 = i 1 ; λ 0 t, β = P I = i F = f ; λ 0 t, β That is, the fll likelihood ca be writte as the prodct of a series of coditioal likelihoods. The partial likelihood (as defied by D.R. Cox) cosists of the prodct of every other coditioal probabilities i the above presetatio. That is PL = all grid pt P I = i F = f ; λ 0 t, β

8 Ex: Patiet ID x δ z PL = 1 eβ e β +e 3β e3β e 3β e β e β +e β +e β +e 3β Case 1: Sppose coditioal o F() we have dn = 0. That is, o death is observed at time. I sch a case, I = 0 with probability 1. Hece for ay grid poit where dn = 0, we have P I = 0 F = f = 1 Case : dn = 1. Coditioal o F, if we kow that oe idividal dies at time, the it mst be oe of the idividals still at risk (alive ad ot cesored) at time ; i.e., amog the followig idividals i: Y i = 1. Also coditioal o F, we kow the covariate vector z i associated to each idividal i sch that Y i = 1. Therefore, we ask the followig qestio: AmogY = i=1 Y i idividals, what is the probability that the observed death happeed to the ith sbject (who is actally observed to die at ) rather tha to the other patiets? A: It is proportioal to the ith sbject s case-specific hazard of dyig at time.

9 Let A i = the evet that sbject i is goig to die i, + Δ give that he/she is still alive at. If a patiet is ot at risk at (i.e., Y i = 0), the A i = φ. Sice we chose Δ to be so small that there is at most oe death i, + Δ, so we kow A 1, A,, A are mtally exclsive. Also, der idepedet cesorig assmptio, the case-specific hazard is the same as the hazard of iterest (Chapter 3 ), i.e., λ, δ i = 1 z i = λ z i. Sice Δ is chose to be very small, so P A i Y i λ, δ i = 1 z i Δ = Y i λ z i Δ = Y i λ 0 e ztβ Δ assmptio of the cox model P I = i F = f ; λ 0 t, β = P A i() A 1 A = P A i() P A l λ 0 e z T i() β Δ λ 0 eztβ l ΔY l = e zi() T β e ztβ l Y l σ Here Y i() = 1 sice we kow this patiet had to be at risk at, ad died i, + Δ. Take together: PL = all grid pt e z i T β e z l Tβ Y l dn()

10 Other eqivalet ways of writig the partial likelihood iclde: Let t 1,, t d defie the distict death times, the d PL β = j=1 e z T i t j β e z l Tβ Y l t j PL β = i=1 all grid pt e z i T β e z l Tβ Y l dn i () PL β = i=1 e z T i t j β e z l Tβ Y l t j δ i The importace of sig the partial likelihood is that this fctio depeds oly o β, the parameter of iterest, ad is free of the baselie hazard λ 0 t, which is ifiite dimesioal isace fctio. Cox sggested treatig PL as a reglar likelihood fctio ad makig iferece o β accordigly. For example, we maximize the partial likelihood to get the estimate of β, ofte called MPLE (maximm partial likelihood estimate), ad se the mis of the secod derivative of the log partial likelihood as the iformatio matrix, etc.

11 ҧ Properties of the score of the partial likelihood For ease of presetatio, let s focs o oe covariate case. the log partial likelihood fctio of β is l β = The score fctio is all grid pt l β U β = β = all grid pt Ad the secod derivative is l β β = dn() σ dn() z I β log z, β = σ z l e z l β Y l Defie e z l β Y l σ dn() z I z l e z l β Y l e z l β Y l e z l β Y l σ σ z l e z l β Y l e z l β Y l z l e z l β Y l e z l β Y l = z l w l where, w l = ez l β Y l e z l β Y l w l is the weight that is proportioal to the hazard of the idividal failig. So ҧ z, β ca be iterpreted as the weighted average of the covariate z amog those idividals still at risk at time with weights w l.

12 Defie V z, β = σ = σ z l e z l β Y l e z l β Y l = z l w l = z l e z l β Y l e z l β Y l z l zҧ, β zҧ, β σ zҧ, β z l e z l β Y l e z l β Y l This last represetatio says that V z, β ca be iterpreted as the weighted variace of the covariates amog those idividals still at risk at ad hece V z, β > 0. Coseqetly, l β β = dn V z, β < 0 w l Therefore l β has a iqe maximizer ad ca be obtaied iqely by solvig the followig partial likelihood eqatio: l β U β = β = dn() z I σ z l e z l β Y l = 0 all grid pt e z l β Y l This maximizer መβ defies the MPLE of β.

13 Termiology: The qatity l β β = dn V z, β is defied as the partial likelihood observed iformatio ad is deoted by J β. Ultimately, we wat to show that the MPLE መβ has ice statistical properties. These iclde: Cosistecy: መβ will coverge to the tre vale of β, which geerated the data as the sample size gets larger. We call this tre vale β 0. Asymptotic Normality: መβ will be approximately ormally distribted with mea β 0 ad a variace which ca be estimated from the data. This approximatio will be better as the sample size gets larger. This reslt is sefl i makig iferece for the tre β. Efficiecy: Amog all other competig estimators for β, the MPLE has the smallest variace, at least, whe the sample size gets larger. I order to show the properties for መβ, we expad U መβ at the tre vale β 0 sig Taylor expasio: 0 = U መβ U β 0 + U β 0 መβ β β 0 መβ β 0 U β 1 0 U β β 0 = J β 1 0 U β 0 This expressio idicates that we eed to ivestigate the properties of the score fctio U β 0, i.e., U β 0 = dn z l zҧ, β

14 Properties of the Score I E U β 0 = E dn z I() zҧ, β 0 = E dn z I() zҧ, β 0 = E E dn z I() zҧ, β 0 F() = E dn E z I() F() zҧ, β 0 Notice that: E z I() F() = σ z l e z l β 0Y l e z l β 0Y l = z l w l = zҧ, β 0 Therefore, E U β 0 = 0

15 Coditioal distribtio of z I() give F() Vale of I() Vale of z I() Probability 1 z 1 e z 1 β 0 Y 1 / e z l β 0Y l = w 1 z e z β 0 Y / e z l β 0Y l = w 1 e z z β 0 Y / e z l β 0Y l E z I() F() = σ z l e z l β 0Y l e z l β 0Y l = w = z l w l = zҧ, β 0 Var z I() F() = z l E z I() F() w l = = V z (, β 0 ) σ z l zҧ, β 0 e z l β 0Y l e z l β 0 Y l

16 Properties of the Score II Var U β 0 = E U β 0 = E dn = E z I() zҧ, β 0 dn z I zҧ, β 0 + E dn z I zҧ, β 0 dn z I zҧ, β 0 As sal, WLOG, assme >, ad deote A = dn z I zҧ, β 0, A = dn z I zҧ, β 0 The E A A = E E A A F = E A E A F = 0 So, the expectatio of the cross-prodct is zero. The, Var U β 0 = E A = E E A F() E A F() = E dn z I zҧ, β 0 F() dn = dn = E dn z I zҧ, β 0 F()

17 Coditioal o F, dn is kow, zҧ, β 0 is also kow, ad zҧ, β 0 = E z I() F() E A F() = dn E z I zҧ, β 0 = dn Var z I F() = dn V z (, β 0 ) F() Coseqetly, Var U β 0 = E dn V z (, β 0 ) = E dn V z, β 0 Note that the qatity σ dn V z, β 0 is a statistic (ca be calclated from the observed data), so σ dn V z, β 0 is a biased estimate of Var U β 0. I fact, σ dn V z, β 0 is the partial likelihood observed iformatio J β 0 we defied before. The score U β 0 = σ A() is a sm of coditioally correlated mea zero radom variables ad its variace ca be biasedly estimated by J β 0. By the martigale CLT, we have: U β 0 ~ a N(0, J β 0 ) መβ β 0 ~ a N(0, J 1 β 0 ) መβ β 0 ~ a N(0, J 1 መβ ) Note that: J 1 መβ = σ dn V z, መβ Sice መβ β 0 J β 0 1 U β 0 I practice, sice β 0 is kow, we ca sbstitte መβ for β 0 for the variace of U β 0.

Iferece with a Sigle Covariate Assme a proportioal hazards model with a sigle covariate z λ t = λ 0 t e zβ After we get or data x i, δ i, z i, the MPLE መβ ca be obtaied by solvig the partial

18 Iferece with a Sigle Covariate Assme a proportioal hazards model with a sigle covariate z λ t = λ 0 t e zβ After we get or data x i, δ i, z i, the MPLE መβ ca be obtaied by solvig the partial likelihood eqatio; i.e., settig the partial score to zero. The asymptotically, መβ ~ a N(β 0, J 1 መβ ) We ca se this fact to costrct cofidece iterval for β ad test the hypothesis H 0 : β = β 0, etc. For example, a 1 α CI of β is መβ ± z α/ J 1 መβ 1/ Example: Myelomatosis data (SAS code) Defie a treatmet idicator trt1 which takes vale 0 for treatmet 1 ad takes vale 1 for treatmet. A 95% CI of β is: 0.578±1.96* = [-0.46, 1.57] A 95% CI of the hazard ratio e β is: e 0.46, e 1.57 = 0.653, 4.816

19 Compariso of Score Test ad Two-Sample Log Rak Test Assme z is the dichotomos idicator for treatmet; i.e., 1 for treatmet 1 z = ቊ 0 for treatmet 0 ad the proportioal hazards model: λ t = λ 0 t e zβ Score test: Uder H 0 : β = 0, the score U(0) (evalated der H 0 ) has the distribtio U 0 ~ a N(0, J 0 ) U 0 ~ a or, χ 1 J 1/ 0 Notice that U 0 = dn z I zҧ, 0 ҧ The: 1. If a death occrs at time, the dn = 1, i which case there will a cotribtio to U 0 by addig z I z, 0. Otherwise o cotribtio.. Sice z = 1 for treatmet 1 ad z = 0 for treatmet 0, z I will the be the mber of deaths at time from treatmet Uder H 0 : β = 0, zҧ, 0 is simplified to be zҧ, 0 = zl Y l () Y l (), which is the proportio of idividals i grop 1 amog those at risk at time. Sice we oly assme oe death at time, this proportio is the expected mber of death for treatmet 1 amog those at risk at time, der the ll hypothesis of o treatmet differece.

20 4. Therefore, U 0 is the sm over the death times of the observed mber of deaths from treatmet 1 mis the expected mber of deaths der the ll hypothesis. This was the merator of the two-sample log rak test: dn 1 Y 1 Y dn() 5. The deomiator of the score test was compted as J 1 0 = dn V z, 0 sice V z, 0 = σ l z l zҧ, 0 Y l () = σ l Y l () = Y 0 Y 1 Y + Y 1 Y 0 () Y () Y() 1 = 1 Y 1 Y dn Y 0 Y 1 Y () = Y 0 Y 1 Y() Y 3 () 1 Y Y 1() Y() Y() = Y 0 Y 1 Y () Y 0 () Note that I the special case where dn()ca oly be oe or zero, the variace sed to compte the lograk test statistic is: Y 1 Y 0 dn [Y dn()] Y Y 1 = Y 0 Y 1 Y (), which is J(0)

21 Therefore, for cotios srvival time data with o ties, the score test of the hypothesis H 0 : β = 0 i the proportioal hazards model is exactly the same as the lograk test for dichotomos covariate z. I fact, the score test U 0 J 1/ 0 ca be sed to test for ay covariate vale z, whether or ot z is discrete or cotios. The hypothesis H 0 : β = 0 implies that the hazard rate at ay time t is affected by the covariate z, or that the srvival distribtio does ot deped o z. The alterative hypothesis H a : β 0 wold mea that idividals with a higher vale of z wold have stochastically larger (or smaller depedig o the sig of β) srvival distribtio tha those idividals with a smaller vales of z. The test commad i Proc Lifetest comptes the score test of the hypothesis H 0 : β = 0 for the proportioal hazards model. Note that whe sig the test commad, the covariate z is ot limited to beig dichotomos, or discrete. Example: Myelomatosis data revisited (SAS code)

22 Likelihood Ratio Test As i the ordiary likelihood theory, the (partial) likelihood ratio test ca also be sed to test the ll hypothesis: H 0 : β = β 0 The likelihood ratio test ses the fact that derh 0 : β = β 0, l መβ l β 0 ~ a χ 1 Therefore, for a give level of sigificace α, we reject H 0 : β = β 0 if A Heristic Proof: Expadig l β 0 l β 0 Sice መβ maximize l β, i. e. So, H 0 : β = β 0, U መβ l መβ l β 0 χ 1,α P χ 1 > χ 1,α = α l መβ + dl መβ dβ at the MPLE መβ, we get መβ β d l መβ! d β = dl መβ dβ = 0 d l መβ ad d β l መβ l β 0 J መβ መβ β 0 = መβ β 0 J 1/ 0 ~ a χ 1 = J መβ መβ β 0 Note: The SAS procedre Phreg ca ONLY hadle right cesored data. By መβ β 0 ~ a N(0, J 1 መβ )

LECTURE 13 SPURIOUS REGRESSION, TESTING FOR UNIT ROOT = C (1) C (1) 0! ! uv! 2 v. t=1 X2 t

LECTURE 13 SPURIOUS REGRESSION, TESTING FOR UNIT ROOT = C (1) C (1) 0! ! uv! 2 v. t=1 X2 t APRIL 9, 7 Sprios regressio LECTURE 3 SPURIOUS REGRESSION, TESTING FOR UNIT ROOT I this sectio, we cosider the sitatio whe is oe it root process, say Y t is regressed agaist aother it root process, say