Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

1 Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time
Steffen Unkel, Department of Medical Statistics, University Medical Center Göttingen, Germany
Winter term 2018/19

2 The case of complete observations
Suppose that we have a single sample of survival times in which none of the observations is censored. Then the survivor function $S(t)$ can be estimated by the empirical survivor function, given by
$$\hat{S}(t) = \frac{\text{number of individuals with survival times} > t}{\text{number of individuals in the data set}}.$$
Let $t_1, t_2, \ldots, t_n$ be the exact survival times of the $n$ individuals under study. We relabel the $n$ survival times in ascending order such that $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$.

3 The case of complete observations
The survivorship function at $t_{(i)}$ can be estimated as
$$\hat{S}(t_{(i)}) = \frac{n - i}{n} = 1 - \frac{i}{n},$$
where $n - i$ is the number of individuals surviving longer than $t_{(i)}$.
If two or more $t_{(i)}$ are equal (tied observations), the largest $i$ value is used. This gives a conservative estimate for the tied observations.
Since every individual is alive at the beginning of the study and no one survives longer than $t_{(n)}$, $\hat{S}(t_{(0)}) = 1$ and $\hat{S}(t_{(n)}) = 0$.
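The counting rule above can be sketched in a few lines of Python. The function name and the survival times are hypothetical illustrations, not the lung cancer data of the lecture; the tied value at 4 shows that counting "strictly greater than t" reproduces the largest-i convention.

```python
def empirical_survivor(times, t):
    """Proportion of uncensored survival times strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical complete survival times in months (not from the lecture).
times = [2, 4, 4, 7, 9]

# Evaluate the step function at each distinct observed time.
steps = [(t, empirical_survivor(times, t)) for t in sorted(set(times))]
print(steps)
```

At the tied time 4 the sketch returns 2/5, which equals 1 - i/n with the largest tied rank i = 3.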

4 Example: Computation of $\hat{S}(t)$ for 10 lung cancer patients

5 Plot of the empirical survivor function
Figure: Step function $\hat{S}(t)$ of lung cancer patients (x-axis: months).

6 Estimating S(t) in case of censored observations
When censored observations are present, a different method of estimating S(t) is required. Before attempting to fit a theoretical distribution to a set of survival data, we will discuss nonparametric methods for estimating S(t). They are also said to be distribution-free, since they do not require specific assumptions to be made about the underlying distribution of the survival times. If the main objective is to find a model for the data, estimates obtained by nonparametric methods and graphical methods can be helpful in choosing a distribution.

7 Kaplan-Meier estimator
Let $n$ be the total number of individuals whose survival times, censored or not, are available. Relabel the $n$ survival times in order of increasing magnitude such that $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$.
The Kaplan-Meier (product-limit) estimate of the survivor function is
$$\hat{S}(t) = \prod_{t_{(r)} \le t} \frac{n - r}{n - r + 1},$$
where $r$ runs through those positive integers for which $t_{(r)} \le t$ and $t_{(r)}$ is uncensored.
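A minimal sketch of the rank-based product may help here. The function name, the data and the tie convention (events ranked before censored observations at equal times) are assumptions made for illustration and are not stated on the slide.

```python
from fractions import Fraction


def km_rank_form(observations):
    """Kaplan-Meier estimate via ranks.

    observations: list of (time, is_event) pairs; is_event=False means censored.
    Returns (time, S_hat) after each uncensored observation.
    """
    n = len(observations)
    # Assumed tie convention: at equal times, events are ranked before censored values.
    ordered = sorted(observations, key=lambda obs: (obs[0], not obs[1]))
    s_hat = Fraction(1)
    curve = []
    for r, (time, is_event) in enumerate(ordered, start=1):
        if is_event:  # only uncensored ranks contribute a factor
            s_hat *= Fraction(n - r, n - r + 1)
            curve.append((time, s_hat))
    return curve


# Hypothetical data: (time in months, True if the event was observed).
data = [(1, True), (2, False), (3, True), (4, True), (5, False)]
print(km_rank_form(data))
```

Exact fractions are used so that each value can be read as the product of the factors (n - r)/(n - r + 1). With tied event times, the value after the last tied rank is the estimate at that time.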

8 Example
Table: Kaplan-Meier estimates $\hat{S}(t)$ of remission durations of 10 patients with solid tumors.

Remission time   Rank r   (n - r)/(n - r + 1)   Ŝ(t)
...              1        9/10                  9/10 = 0.900
...              4        6/7                   9/10 × 6/7 = 0.771
...              5        5/6                   9/10 × 6/7 × 5/6 = 0.643
...              7        3/4                   9/10 × 6/7 × 5/6 × 3/4 = 0.482
...              9        1/2                   9/10 × 6/7 × 5/6 × 3/4 × 1/2 = 0.241

9 Plot of the Kaplan-Meier survival curve
Figure: Function $\hat{S}(t)$ for remission data (x-axis: months).

10 Alternative formulation of the Kaplan-Meier estimate
Consider $m$ distinct failure times, $t_{(1)} < t_{(2)} < \cdots < t_{(m)}$, with $m \le n$.
Figure: Time axis starting at 0 with the ordered failure times $t_{(1)} < t_{(2)} < \cdots < t_{(m-1)} < t_{(m)}$ (* denotes an event).
The Kaplan-Meier estimate of $S(t)$ can be written as
$$\hat{S}(t) = \begin{cases} 1 & \text{for } t < t_{(1)}, \\ \prod_{t_{(k)} \le t} \left(1 - \dfrac{d_k}{n_k}\right) & \text{for } t \ge t_{(1)}, \end{cases}$$
where $n_k$ is the number of individuals at risk just prior to $t_{(k)}$ and $d_k$ is the number of failures at $t_{(k)}$ ($k = 1, \ldots, m$).
The discrete hazard rate for the kth interval $[t_{(k-1)}, t_{(k)})$ is $\lambda_k^d = P(T \in [t_{(k-1)}, t_{(k)}) \mid T \ge t_{(k-1)})$. The probability of surviving the kth interval, given that it was reached, is $p_k = 1 - \lambda_k^d = P(T \ge t_{(k)} \mid T \ge t_{(k-1)})$.
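The risk-set formulation can be sketched as follows. The function name and the data (including the tied event time at 3) are hypothetical and only illustrate how d_k and n_k are counted.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return rows (t_k, d_k, n_k, S_hat(t_k)) at the distinct event times."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    s_hat = Fraction(1)
    rows = []
    for t_k in event_times:
        n_k = sum(1 for t in times if t >= t_k)  # at risk just prior to t_k
        d_k = sum(1 for t, e in zip(times, events) if e and t == t_k)  # failures at t_k
        s_hat *= 1 - Fraction(d_k, n_k)
        rows.append((t_k, d_k, n_k, s_hat))
    return rows


# Hypothetical data with a tied event time at 3 (not from the lecture).
times = [1, 2, 3, 3, 4, 5]
events = [True, False, True, True, True, False]
for row in kaplan_meier(times, events):
    print(row)
```

Counting individuals with time >= t_k treats an observation censored exactly at t_k as still at risk at t_k, which is the usual convention but is an assumption of this sketch.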

11 Greenwood's formula
The variance of the Kaplan-Meier estimator is estimated by Greenwood's formula:
$$\widehat{\mathrm{Var}}[\hat{S}(t)] = [\hat{S}(t)]^2 \sum_{t_{(k)} \le t} \frac{d_k}{n_k (n_k - d_k)}.$$
The standard error of the Kaplan-Meier estimator is given by
$$\mathrm{se}\{\hat{S}(t)\} = \{\widehat{\mathrm{Var}}[\hat{S}(t)]\}^{1/2}.$$
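As a rough sketch, Greenwood's formula can be accumulated alongside the Kaplan-Meier product. The function name and the risk-set summary are hypothetical; the handling of n_k = d_k is an assumed convention, since the corresponding term is undefined.

```python
from fractions import Fraction
from math import sqrt


def km_with_greenwood(rows):
    """rows: list of (t_k, d_k, n_k) at ordered distinct event times.

    Returns (t_k, S_hat, variance, standard_error) per event time.
    """
    s_hat = Fraction(1)
    acc = Fraction(0)  # running sum of d_k / (n_k (n_k - d_k))
    out = []
    for t_k, d_k, n_k in rows:
        s_hat *= 1 - Fraction(d_k, n_k)
        if n_k > d_k:
            acc += Fraction(d_k, n_k * (n_k - d_k))
        # If n_k == d_k the term is undefined; S_hat is 0 there, so the
        # variance is reported as 0 (an assumed convention for this sketch).
        var = s_hat ** 2 * acc
        out.append((t_k, s_hat, var, sqrt(var)))
    return out


# Hypothetical risk-set summary (t_k, d_k, n_k), not from the lecture.
rows = [(1, 1, 5), (3, 1, 3), (4, 1, 2)]
result = km_with_greenwood(rows)
for t_k, s_hat, var, se in result:
    print(t_k, s_hat, var, round(se, 4))
```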

12 Nelson-Aalen estimator
The Nelson-Aalen estimator of the cumulative hazard function $H(t)$ is defined as
$$\tilde{H}(t) = \begin{cases} 0 & \text{for } t < t_{(1)}, \\ \sum_{t_{(k)} \le t} \dfrac{d_k}{n_k} & \text{for } t \ge t_{(1)}. \end{cases}$$
The estimated variance of the Nelson-Aalen estimator is given by
$$\widehat{\mathrm{Var}}[\tilde{H}(t)] = \sum_{t_{(k)} \le t} \frac{d_k}{n_k^2}.$$

13 Breslow estimator
Based on the Nelson-Aalen estimator of the cumulative hazard rate, the Breslow estimator of the survival function is given by
$$\tilde{S}(t) = \exp[-\tilde{H}(t)].$$
The estimated variance is obtained as
$$\widehat{\mathrm{Var}}[\tilde{S}(t)] = [\tilde{S}(t)]^2 \sum_{t_{(k)} \le t} \frac{d_k}{n_k^2}.$$
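The Nelson-Aalen and Breslow estimators and their variance estimates can be sketched together, since both are running sums over the event times. The function name and the risk-set summary are hypothetical.

```python
from math import exp


def nelson_aalen(rows):
    """rows: list of (t_k, d_k, n_k) at ordered distinct event times.

    Returns (t_k, H, Var[H], S_breslow, Var[S_breslow]) per event time.
    """
    h = 0.0      # cumulative hazard: running sum of d_k / n_k
    var_h = 0.0  # running sum of d_k / n_k**2
    out = []
    for t_k, d_k, n_k in rows:
        h += d_k / n_k
        var_h += d_k / n_k ** 2
        s_breslow = exp(-h)
        out.append((t_k, h, var_h, s_breslow, s_breslow ** 2 * var_h))
    return out


# Hypothetical risk-set summary (t_k, d_k, n_k), not from the lecture.
rows = [(1, 1, 5), (3, 1, 3), (4, 1, 2)]
result = nelson_aalen(rows)
for t_k, h, var_h, s_b, var_s in result:
    print(t_k, round(h, 4), round(var_h, 4), round(s_b, 4), round(var_s, 4))
```

Because exp(-x) >= 1 - x, the Breslow estimate is never below the Kaplan-Meier estimate computed from the same d_k and n_k.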

14 Example
Table: Nelson-Aalen estimates $\tilde{H}(t)$ and $\tilde{S}(t)$ and Kaplan-Meier estimates $\hat{S}(t)$ of remission time of 10 patients.
Columns: $t_{(k)}$, $d_k$, $n_k$, $d_k/n_k$, $\tilde{H}(t)$, $\tilde{S}(t)$, $\hat{S}(t)$.

15 Life-table estimate
The life-table method is one of the most traditional methods for analysing durations and lifetimes. It is mainly used in demography and life insurance, and it is also suitable for data in grouped form.
The life-table (or actuarial) estimate of the survivor function allows event times to be grouped into intervals.
Consider a discretization of the time axis into $q + 1$ adjacent, non-overlapping intervals $[a_{k-1}, a_k)$, $k = 1, \ldots, q + 1$, where $a_0 = 0$ and $a_{q+1} = \infty$.
Figure: Time axis partitioned at $a_0 = 0, a_1, a_2, a_3, \ldots, a_q$.

16 Life-table estimate (2)
Notation:
$n$: number of individuals at the start of the study.
$d_k$: number of deaths in the kth interval.
$c_k$: number of censored survival times in the kth interval.
$n_k$: number of individuals who are alive at the start of the kth interval. It holds that $n_k = n_{k-1} - d_{k-1} - c_{k-1}$ for $k = 2, \ldots, q + 1$.
$n'_k$: number of individuals at risk of experiencing the event in the kth interval, assuming that censored survival times occur uniformly throughout the kth interval: $n'_k = n_k - c_k/2$.

17 Life-table estimate (3)
The conditional proportion of deaths in the kth interval, given exposure to the risk of death in the kth interval, is $d_k/n'_k$.
The life-table estimate of $S(a_k)$ is given by
$$S^*(a_k) = S^*(a_{k-1}) \left(1 - \frac{d_k}{n'_k}\right) = \prod_{j=1}^{k} \left(1 - \frac{d_j}{n'_j}\right)$$
for $k = 1, \ldots, q$ with $S^*(a_0) = 1$.
The estimated variance of the life-table estimate is given by
$$\widehat{\mathrm{Var}}[S^*(a_k)] = [S^*(a_k)]^2 \sum_{j=1}^{k} \frac{d_j}{n'_j (n'_j - d_j)}$$
for $k = 1, \ldots, q$, with the standard error of $S^*(a_0) = 1$ being zero.
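The recursion for the number alive, the adjusted number at risk and the product can be sketched as below. The function name and the interval counts are hypothetical, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def life_table(n, deaths, censored):
    """Life-table estimate for intervals k = 1, ..., q.

    n: number at the start; deaths[k-1] = d_k; censored[k-1] = c_k.
    Returns rows (k, n_k, adjusted n_k, estimate at a_k, estimated variance).
    """
    n_k = n
    s_star = Fraction(1)  # estimate at a_0 is 1
    acc = Fraction(0)     # running sum of d_j / (n'_j (n'_j - d_j))
    rows = []
    for k, (d_k, c_k) in enumerate(zip(deaths, censored), start=1):
        n_adj = n_k - Fraction(c_k, 2)  # censoring assumed uniform over the interval
        s_star *= 1 - d_k / n_adj
        if n_adj > d_k:
            acc += d_k / (n_adj * (n_adj - d_k))
        rows.append((k, n_k, n_adj, s_star, s_star ** 2 * acc))
        n_k = n_k - d_k - c_k  # number alive at the start of the next interval
    return rows


# Hypothetical grouped data: 100 individuals, three intervals (not from the lecture).
rows = life_table(100, deaths=[10, 8, 6], censored=[4, 6, 10])
for row in rows:
    print(row[0], row[1], row[2], float(row[3]), float(row[4]))
```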

18 Example
Table: Life-table estimate of the survivor function for some data.
Columns: $k$, $a_{k-1}$, $n_k$, $d_k$, $c_k$, $n'_k$, $1 - d_k/n'_k$, $S^*(a_k)$.

19 Plot of the life-table estimate
Figure: Life-table estimate plotted for each interval (y-axis: $S(t)$; x-axis: $t$).

20 Cohort life tables
A cohort is a group of individuals who have some common origin from which the event time will be calculated. They are followed over time, and their event time or censoring time is recorded to fall in one of $q + 1$ adjacent, non-overlapping intervals, $I_k = [a_{k-1}, a_k)$, $k = 1, \ldots, q + 1$, with $a_0 = 0$ and $a_{q+1} = \infty$.
A traditional cohort life table presents the actual mortality experience of the cohort from the birth of each individual to the death of the last surviving member of the cohort.
Censoring may occur because some individuals may migrate out of the study area or drop out of observation.

21 Basic construction of a cohort life table
1. The 1st column gives the intervals $I_k$, $k = 1, \ldots, q + 1$.
2. The 2nd column gives the number of subjects $n_k$ entering the kth interval who have not experienced the event.
3. The 3rd column gives the number of censored survival times in the kth interval, $c_k$.
4. The 4th column gives the number of individuals at risk of experiencing the event in the kth interval, $n'_k$.
5. The 5th column reports the number of individuals $d_k$ who experience the event in the kth interval.
6. The 6th column gives the estimated survival function at the start of the kth interval, $S^*(a_{k-1})$, with $S^*(a_0) = 1$.

22 Basic construction of a cohort life table (2)
7. The 7th column gives the estimated pdf $\hat{f}(a_{mk})$ at the midpoint of the kth interval, $a_{mk} = (a_k + a_{k-1})/2$:
$$\hat{f}(a_{mk}) = \frac{S^*(a_{k-1}) - S^*(a_k)}{a_k - a_{k-1}}.$$
8. The 8th column gives the estimated hazard rate $\hat{h}(a_{mk})$ at the midpoint of the kth interval, $a_{mk}$:
$$\hat{h}(a_{mk}) = \frac{\hat{f}(a_{mk})}{S^*(a_{mk})} = \frac{\hat{f}(a_{mk})}{S^*(a_k) + [S^*(a_{k-1}) - S^*(a_k)]/2} = \frac{2\hat{f}(a_{mk})}{S^*(a_k) + S^*(a_{k-1})}.$$
Note that $S^*(a_{mk})$ is based on a linear approximation between the estimates of the survivor function at the endpoints of the interval.
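Columns 7 and 8 follow directly from the survival estimates at the interval endpoints, which the following sketch illustrates. The function name, the boundaries and the survival values are hypothetical.

```python
from fractions import Fraction


def midpoint_pdf_and_hazard(boundaries, survival):
    """boundaries: a_0 < a_1 < ... < a_q; survival: estimates at a_0, ..., a_q.

    Returns rows (midpoint a_mk, f_hat(a_mk), h_hat(a_mk)) for k = 1, ..., q.
    """
    rows = []
    for k in range(1, len(boundaries)):
        width = boundaries[k] - boundaries[k - 1]
        midpoint = Fraction(boundaries[k] + boundaries[k - 1], 2)
        f_hat = (survival[k - 1] - survival[k]) / width
        # Survival at the midpoint by linear interpolation is the average
        # of the two endpoint values, which gives the factor 2 below.
        h_hat = 2 * f_hat / (survival[k] + survival[k - 1])
        rows.append((midpoint, f_hat, h_hat))
    return rows


# Hypothetical interval boundaries and life-table estimates (not from the lecture).
boundaries = [0, 2, 4]
survival = [Fraction(1), Fraction(9, 10), Fraction(3, 4)]
rows = midpoint_pdf_and_hazard(boundaries, survival)
for row in rows:
    print(row)
```

The last interval, which extends to infinity, has no finite midpoint and is therefore left out of the input in this sketch.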

23 Basic construction of a cohort life table (3)
9. The 9th column gives the standard error of survival at the beginning of the kth interval, $\mathrm{se}\{S^*(a_{k-1})\} = \{\widehat{\mathrm{Var}}[S^*(a_{k-1})]\}^{1/2}$ for $k = 2, \ldots, q + 1$, with $\mathrm{se}\{S^*(a_0)\} = 0$.
10. The 10th column shows the standard error of the pdf at the midpoint of the kth interval.
11. The 11th column shows the standard error of the hazard function at the midpoint of the kth interval.

24 Example of a cohort life table
Figure: Life table for weaning example (Klein and Moeschberger 2003, p. 156).

25 Interval estimates
An estimate of the survivor function provides a summary estimate of the mortality experience of a given population. The standard error of an estimate of the survivor function provides some information about the precision of the estimate. We can use these estimators to construct a pointwise confidence interval (CI) for the corresponding value of the survivor function at a fixed time t. The intervals are constructed to assure, with prescribed probability, that the true value of the survival function, at a predetermined time t, falls in the interval we construct.

26 Standard normal distribution
Figure: Percentiles of the standard normal distribution (density $\varphi(x)$ with tail areas $1 - p$).

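27 Confidence interval
A pointwise $100(1 - \alpha)\%$ confidence interval for $S(t)$, for a given value of $t$, is given by
$$\hat{S}(t) \pm z_{1-\alpha/2}\,\mathrm{se}\{\hat{S}(t)\},$$
where $\mathrm{se}\{\hat{S}(t)\} = \{\widehat{\mathrm{Var}}[\hat{S}(t)]\}^{1/2}$ and $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution.
For all $t$ it holds that
$$P\left(\hat{S}(t) - z_{1-\alpha/2}\,\mathrm{se}\{\hat{S}(t)\} \le S(t) \le \hat{S}(t) + z_{1-\alpha/2}\,\mathrm{se}\{\hat{S}(t)\}\right) \overset{a}{\approx} 1 - \alpha$$
for a given confidence level $1 - \alpha$, where the superscript $a$ indicates that the statement holds asymptotically.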
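A minimal sketch of the pointwise interval, using the standard library's normal quantile function. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pointwise_ci(s_hat, se, alpha=0.05):
    """Linear (untransformed) pointwise CI: s_hat +/- z_{1 - alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return s_hat - z * se, s_hat + z * se


# Hypothetical inputs: S_hat(t) = 0.8 with estimated variance 0.032.
lower, upper = pointwise_ci(0.8, sqrt(0.032))
print(round(lower, 3), round(upper, 3))
```

With these inputs the upper limit exceeds 1, which illustrates that the untransformed interval is not guaranteed to stay inside [0, 1].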

28 Confidence intervals based on transformations of $\hat{S}(t)$
Alternative CIs can be constructed by first transforming $\hat{S}(t)$ and obtaining a CI for the transformed value. The resulting confidence limits are then back-transformed to give a confidence interval for $S(t)$ itself.
Possible transformations include the log transformation, $\ln[\hat{S}(t)]$, the logistic transformation, $\ln[\hat{S}(t)/\{1 - \hat{S}(t)\}]$, and the complementary log-log transformation, $\ln[-\ln\{\hat{S}(t)\}]$.
Such intervals can lead to better coverage probabilities.
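As one illustration, the complementary log-log interval can be sketched as follows. The standard error on the transformed scale is obtained here by the delta method, a step the slide does not spell out, so it should be read as an assumption of this sketch; the function name and inputs are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def loglog_ci(s_hat, se, alpha=0.05):
    """CI for S(t) from a symmetric interval for ln[-ln S(t)], back-transformed.

    Requires 0 < s_hat < 1.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Delta method: se of ln[-ln S_hat] is se / (S_hat * |ln S_hat|).
    se_v = se / (s_hat * abs(log(s_hat)))
    # Back-transforming v +/- z * se_v gives S_hat raised to exp(+/- z * se_v).
    return s_hat ** exp(z * se_v), s_hat ** exp(-z * se_v)


# Hypothetical inputs: S_hat(t) = 0.8 with estimated variance 0.032.
lower, upper = loglog_ci(0.8, sqrt(0.032))
print(round(lower, 3), round(upper, 3))
```

Unlike the untransformed interval, both limits produced this way lie strictly between 0 and 1.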

29 Simultaneous confidence bands
Pointwise confidence intervals are valid for a single fixed time at which the inference is to be made.
For simultaneous confidence bands of the survival function it should hold that
$$P\left(\hat{S}(t) - c_\alpha(t_L, t_U) \le S(t) \le \hat{S}(t) + c_\alpha(t_L, t_U) \ \text{for all } t \in [t_L, t_U]\right) \overset{a}{\approx} 1 - \alpha$$
for a given confidence level $1 - \alpha$.
We present two approaches for constructing confidence bands for $S(t)$.

30 Equal precision (EP) bands
Equal precision (EP) bands are obtained as
$$\hat{S}(t) \pm c_\alpha(a_L, a_U)\,\mathrm{se}\{\hat{S}(t)\},$$
with $c_\alpha(a_L, a_U)$ chosen such that $0 < a_L < a_U < 1$, where
$$a_L = \frac{n\hat{\sigma}_S^2(t_L)}{1 + n\hat{\sigma}_S^2(t_L)}, \qquad a_U = \frac{n\hat{\sigma}_S^2(t_U)}{1 + n\hat{\sigma}_S^2(t_U)}$$
and
$$\hat{\sigma}_S^2(t) = \frac{\widehat{\mathrm{Var}}(\hat{S}(t))}{\hat{S}(t)^2} = \sum_{t_{(k)} \le t} \frac{d_k}{n_k (n_k - d_k)}.$$

31 Equal precision (EP) bands (2)
To construct $100(1 - \alpha)\%$ confidence bands for $S(t)$ over the range $[t_L, t_U]$, we find a confidence coefficient, $c_\alpha(a_L, a_U)$.
We pick $t_L < t_U$ so that $t_L$ is greater than or equal to the smallest observed event time and $t_U$ is less than or equal to the largest observed event time.
Values of $c_\alpha(a_L, a_U)$ can be obtained from Table C.3 in Appendix C in Klein and Moeschberger (2003).
EP bands are proportional to pointwise confidence intervals.
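The quantities needed to look up the coefficient can be sketched as follows. The function names and the risk-set summary are hypothetical, and the coefficient passed to `ep_band` is a placeholder, not a value from Table C.3.

```python
from fractions import Fraction


def sigma2_s(rows, t):
    """Sum of d_k / (n_k (n_k - d_k)) over event times t_k <= t."""
    return sum(Fraction(d, n * (n - d)) for t_k, d, n in rows if t_k <= t)


def a_value(n, rows, t):
    """a = n * sigma2 / (1 + n * sigma2), used to look up the coefficient."""
    ns2 = n * sigma2_s(rows, t)
    return ns2 / (1 + ns2)


def ep_band(s_hat, se, c_alpha):
    """Equal precision band at one time point for a coefficient taken from the table."""
    return s_hat - c_alpha * se, s_hat + c_alpha * se


# Hypothetical risk-set summary (t_k, d_k, n_k) for n = 5 individuals.
rows = [(1, 1, 5), (3, 1, 3), (4, 1, 2)]
a_L = a_value(5, rows, t=1)  # t_L = smallest event time
a_U = a_value(5, rows, t=3)  # t_U chosen at or below the largest event time
print(a_L, a_U)

# Placeholder coefficient only; the real value must be read from the table.
print(ep_band(0.8, 0.18, c_alpha=2.0))
```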

32 Hall-Wellner bands
An alternative set of confidence bands is given by the Hall-Wellner bands, which are obtained as
$$\hat{S}(t) \pm \frac{k_\alpha(a_L, a_U)}{\sqrt{n}}\,[1 + n\hat{\sigma}_S^2(t)]\,\hat{S}(t).$$
To construct a $100(1 - \alpha)\%$ confidence band for $S(t)$ over the region $[t_L, t_U]$, we find a confidence coefficient, $k_\alpha(a_L, a_U)$.
For these bands, a lower limit, $t_L$, of zero is allowed.
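The shape of the Hall-Wellner band at a single time point can be sketched as below. The function name and all numerical inputs are hypothetical; in particular, the coefficient is a placeholder argument that would have to be read from the published table.

```python
from math import sqrt


def hall_wellner_band(s_hat, sigma2, n, k_alpha):
    """Hall-Wellner band at one time point.

    Half-width: k_alpha * (1 + n * sigma2) * s_hat / sqrt(n).
    """
    half_width = k_alpha * (1 + n * sigma2) * s_hat / sqrt(n)
    return s_hat - half_width, s_hat + half_width


# Placeholder inputs chosen for easy arithmetic, not taken from any data set.
print(hall_wellner_band(s_hat=0.5, sigma2=0.25, n=4, k_alpha=1.0))
```

The half-width depends on the time point through the factor (1 + n * sigma2) * s_hat rather than through the standard error alone.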

33 Hall-Wellner bands (2)
Values of $k_\alpha(a_L, a_U)$ can be obtained from Table C.4 in Appendix C in Klein and Moeschberger (2003). Hall-Wellner bands are not proportional to the pointwise confidence intervals. As in the case of pointwise confidence intervals, other forms for the confidence bands based on transformations are available.

34 Point estimates of quantiles of survival times
Recall that the pth quantile ($0 \le p \le 1$) of a random variable $T$ with survival function $S(t)$ is defined as
$$t_p = \inf\{t : S(t) \le 1 - p\}.$$
To estimate $t_p$, we find the smallest time $\hat{t}_p$ for which the value of the estimated survivor function is less than or equal to $1 - p$, that is,
$$\hat{t}_p = \inf\{t : \hat{S}(t) \le 1 - p\}.$$
Some software packages use a different estimator.
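Because the estimated survivor function only drops at event times, the infimum can be found by scanning the event times in order. The sketch below assumes a hypothetical curve given as (time, estimate) pairs; the function name is illustrative.

```python
from fractions import Fraction


def quantile_estimate(curve, p):
    """Smallest event time t with S_hat(t) <= 1 - p, or None if never reached.

    curve: list of (t_k, S_hat(t_k)) at ordered event times.
    """
    for t_k, s_hat in curve:
        if s_hat <= 1 - p:
            return t_k
    return None


# Hypothetical Kaplan-Meier curve (not from the lecture).
curve = [(1, Fraction(4, 5)), (3, Fraction(8, 15)), (4, Fraction(4, 15))]
median = quantile_estimate(curve, Fraction(1, 2))
print(median)
```

Returning None covers the case in which heavy censoring keeps the estimated curve above 1 - p, so that the quantile cannot be estimated.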

35 Interval estimates of quantiles of survival times
An estimator of the variance of $\hat{t}_p$ may be obtained from an application of the delta method. The suggested estimator for the variance of the estimator of the pth quantile is
$$\widehat{\mathrm{Var}}(\hat{t}_p) = \frac{\widehat{\mathrm{Var}}(\hat{S}(\hat{t}_p))}{[\hat{f}(\hat{t}_p)]^2}.$$
The estimator of the pdf often used is
$$\hat{f}(\hat{t}_p) = \frac{\hat{S}(\hat{u}_p) - \hat{S}(\hat{l}_p)}{\hat{l}_p - \hat{u}_p}.$$

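36 Interval estimates of quantiles of survival times (2)
The values $\hat{u}_p$ and $\hat{l}_p$ are chosen such that $\hat{u}_p < \hat{t}_p < \hat{l}_p$ and are obtained as
$$\hat{u}_p = \max\{t : \hat{S}(t) \ge 1 - p + \epsilon\} \quad \text{and} \quad \hat{l}_p = \min\{t : \hat{S}(t) \le 1 - p - \epsilon\}$$
for small values of $\epsilon$.
The endpoints of a $100(1 - \alpha)\%$ confidence interval are
$$\hat{t}_p \pm z_{1-\alpha/2}\,\mathrm{s.e.}(\hat{t}_p),$$
where $\mathrm{s.e.}(\hat{t}_p) = \sqrt{\widehat{\mathrm{Var}}(\hat{t}_p)}$.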
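The steps for the quantile interval can be sketched as follows. The curve, its variances and the function name are hypothetical, and the search for the two bracketing times is restricted to observed event times, which is an assumption of this sketch.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist


def quantile_ci(curve, p, eps, alpha=0.05):
    """curve: list of (t_k, S_hat(t_k), Var[S_hat(t_k)]) at ordered event times.

    Assumes the p-th quantile, u_p and l_p all exist on the curve.
    """
    target = 1 - p
    t_p, var_s = next((t, v) for t, s, v in curve if s <= target)
    # u_p and l_p are restricted to observed event times in this sketch.
    u_p, s_u = max((t, s) for t, s, _ in curve if s >= target + eps)
    l_p, s_l = min((t, s) for t, s, _ in curve if s <= target - eps)
    f_hat = (s_u - s_l) / (l_p - u_p)  # estimated pdf at t_p
    var_t = var_s / f_hat ** 2         # delta-method variance of t_p
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_t = sqrt(var_t)
    return t_p, var_t, (t_p - z * se_t, t_p + z * se_t)


# Hypothetical curve with hypothetical variances (not from the lecture).
curve = [
    (1, Fraction(9, 10), Fraction(9, 1000)),
    (2, Fraction(4, 5), Fraction(2, 125)),
    (3, Fraction(3, 5), Fraction(3, 125)),
    (4, Fraction(12, 25), Fraction(9, 625)),
    (5, Fraction(3, 10), Fraction(1, 50)),
    (6, Fraction(1, 10), Fraction(1, 100)),
]
t_p, var_t, ci = quantile_ci(curve, p=Fraction(1, 2), eps=Fraction(1, 20))
print(t_p, var_t, ci)
```

The choice of eps controls how far apart the two bracketing times are; a value that is too small can leave no event time on one side, in which case this sketch raises an error.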
