A note on the decomposition of number of life years lost according to causes of death

Size: px

Start display at page:

Download "A note on the decomposition of number of life years lost according to causes of death"

Damon Jeffery Gilmore
5 years ago
Views:

1 A note on the decomposition of number of life years lost according to causes of death Per Kragh Andersen Research Report 12/2 Department of Biostatistics University of Copenhagen

2 A note on the decomposition of number of life years lost according to causes of death Per Kragh Andersen, Department of Biostatistics, University of Copenhagen, Ø. Farimagsgade 5 PB299, DK-114 Copenhagen K, Denmark. pka@biostat.ku.dk Abstract The standard competing risks model is studied and we show that the cause j cumulative incidence function integrated from to τ has a natural interpretation as the expected number of life years lost due to cause j before time τ. This is analogous to the τ-restricted mean life time which is the survival function integrated from to τ. The large sample properties of a non-parametric estimator are outlined, and the method is exemplified using a standard data set on survival with malignant melanoma. It is discussed how the number of years lost may be related to subject-specific explanatory variables in a regression model based on pseudo-observartions. The method is contrasted to cause-specific measures of life years lost used in demography. Key words: demography; life expectancy; Survival analysis. 1

3 Introduction In demography the expected life time is a useful summary of the age-specific mortality experience (e.g., Preston et al., 21). If α(t) is the age-specific mortality rate (hazard function) then t S(t) = exp( α(u)du) is the corresponding survival function, and the life expectancy at birth is e(, ) = S(t)dt. In practice, is represented by the maximally attainable life length, say ω, and age-specific mortality rates between ages and ω are needed to estimate the expected life time. When death can be caused by a number, k, of exclusive and exhaustive causes j = 1,...,k, the survival function is k t S(t) = exp( α j (u)du) j=1 where α j ( ) is the cause-j-specific hazard function. Thereby, the life expectancy at birth can be written as e(, ) = S 1 (t) S k (t)dt with S j (t) = exp( t α j(u)du). So-called cause-j-deleted life-tables are 2

4 then defined via the survival function S(t)/S j (t) = exp( l j t α l (u)du) and under independence of causes of death, this is the survival function in a population where cause j is no longer operating and where (by independence ) the rate of death from cause l j is still given by the cause-specific hazard α l (t). Following those lines, life expectancy without cause j e j (, ) = S(t)/S j (t)dt and number of years lost due to cause j e j (, ) e(, ) can be defined (e.g., Beltrán-Sánchez et al., 28). In biostatistics, the assumption of independent competing risks has to a large extent been abandoned by now because of the recognition of the fact that data from this world (where all k causes of death are operating) provide little information on how the mortality experience would be in a hypothetical world where cause j has been eliminated (e.g., Andersen and Keiding, 212). It is therefore of interest to develop alternative measures of the impact on life expectancy of given causes of death which do not rely on such extrapolations. This is what we will do in this note and, as we shall see, a very simple approach to the question can be taken. 3

5 Restricted mean life time and number of life years lost In biostatistics the expected life time can rarely be estimated reliably because of right-censoring and as an alternative summary of the mortality experience, the τ-restricted mean life time is some times used (e.g., Irwin, 1949; Karrison, 1987; Andersen et al., 24). This is defined as e(,τ) = S(t)dt, (1) thus for τ = ω =, this is simply the expected life time at birth. The restricted mean life time is the expectation E(T τ) of the random variable T τ where T is the (unrestricted) life time. Therefore, the difference L(,τ) = τ E(T τ) (2) is the expected number of years lost before time τ. In a competing risks situation, the cause j cumulative incidence is the probability F j (t) = P(T t,d = j) = t S(u )α j (u)du of failing from cause j before time t (where the random variable D denotes the cause of failure with possible values j = 1,...,k). For all values of t we have k S(t) + F j (t) = 1 j=1 4

6 and, therefore, the expected number of years lost before time τ can be written as a sum L(,τ) = τ k S(t)dt = F j (t)dt j=1 with a term for each of the failure causes. We will now show that each term L j (,τ) = F j (t)dt has a quite simple interpretation. The competing risks model can conveniently be represented by a multistate model for a stochastic process, X(t), with states : alive (the initial state) and j: dead by cause j, j = 1,...,k (the final, absorbing states) (e.g., Andersen et al., 1993, Section IV.4). The j transition intensity is the cause-specific hazard α j (t) and the state occupation probabilities are P(X(t) = ) = S(t) and P(X(t) = j) = F j (t),j = 1,...,k. The failure time is the (proper) random variable, T = inf t {X(t) }, denoting the time of exit from the initial state whereas the times T (j) = inf t {X(t) = j} of entry into the final state j = 1,...,k are all improper random variables because P(T (j) = ) = 1 F j ( ) >. However, for any τ, the random variable T (j) τ is proper with an expected value given by E(T (j) τ) = ts(t)α j (t)dt + τ(1 F j (τ)) 5

7 which by partial integration equals τ F j(t)dt. This leads to the result L j (,τ) = F j (t)dt = E(τ T (j) τ). (3) Thereby, the total number of years lost before time τ can be decomposed into a sum over failure causes j of terms which can be interpreted as the number of years lost due to that cause. Restricting attention to subjects still alive at, say t, is simply handled by considering the conditional survival function S(t t ) = S(t)/S(t ) and the conditional cumulative incidences F j (t t ) = F j (t)/s(t ) given survival till t. Thereby, for those still alive at time t, the expected life time before τ > t e(t,τ) = t S(t t )dt and the expected number of years lost before τ due to cause j L j (t,τ) = t F j (t t )dt can be defined. Graphically, e(,τ) and L j (,τ),j = 1,...,k can be represented, as follows. Suppose for simplicity that k = 2 causes are considered. The cumulative incidences may be presented on a stacked plot, i.e., F 1 (t) and F 1 (t) + F 2 (t) are plotted against t and on such a plot, L 1 (,τ) is the area under the lower curve between and τ, and L 2 (,τ) the area between the two curves between and τ. Finally, e(,τ) is the area between the upper 6

8 curve and 1 between and τ (see Figure 1 for an example). Non-parametric estimation of the survival and cumulative incidence functions based on right-censored (and possibly, left-truncated) survival data is straightforward using the Kaplan-Meier and Aalen-Johansen estimators. Thereby, also the quantities e(t,τ) and L j (t,τ) are easily estimable. For the former, asymptotic properties as the number, n of subjects increases were presented by Andersen et al. (1993, Example IV.3.8). For the latter, asymptotic properties are outlined in the Appendix. In the case of no censoring, i.e. all failures before time τ are observed, both the Kaplan-Meier and the Aalen-Johansen estimators are simple relative frequencies, thus in this case the estimators of (1) and (3) are simply ê(,τ) = 1 (T i τ) n i and L j (,τ) = 1 I(D i = j)(τ T i τ). (4) n i Here, T i is the time of failure for subject i = 1,...,n and D i is the cause of failure indicator for subject i = 1,...,n with possible values j = 1,...,k. If interest focuses on how the parameters e(,τ) and L j (,τ) relate to individual explanatory variables (such as gender, ethnic groups, occupational groups etc.) then regression analysis is straightforward in the case of no censoring. In the more realistic case where some observations are rightcensored (before time τ), one possible approach would be to use so-called 7

9 pseudo-observations (e.g., Andersen et al., 23, 25). If θ denotes one of these parameters and θ the estimator based on the full sample then the ith pseudo-observation for θ is θ i = n θ (n 1) θ ( i),i = 1,...,n. (5) Here, θ ( i) is the estimator applied to the sample (of size n 1) obtained by taking subject i out of the full sample. In this way the association between the parameter and explanatory variables, Z i can be ascertained via generalized estimating equations with θ i as the outcome variable (Andersen et al., 23). The method is implemented as the function pseudoyl in the R-package pseudo available at CRAN. Example: Survival with malignant melanoma Andersen et al. (1993, Example I.3.1) presented a study of survival with skin cancer (malignant melanoma). Here, 25 patients had radical surgery at Odense University Hospital, Denmark between 1962 and At the closing date of the study, 57 patients had died from the disease (cause j = 1), 14 had died from other causes (cause j = 2) and 134 were still alive. Figure 1 shows the stacked plot of the cumulative incidences of dying from the two causes in question. For τ = 1 years the restricted mean life time is ê(, 1) = 7.51 (SD=.245) while L 1 (, 1) = 2.5 (SD=.232) years were lost due to deaths from the disease and L 2 (, 1) =.485 (SD=.133) years were lost due to deaths from other causes. The corresponding areas on the stacked plot can 8

10 be seen in Figure 1. Figure 1 around here. In regression models using pseudo-observations it was found that men on average lost 1.21 (SD=.488) more years than women because of the disease and.211 (SD=.284) more years than women due to other causes. The estimated intercepts in these models (estimated numbers of years lost for women) were for deaths from the disease and.44 for deaths from other causes. Since 79/25=38.5% of the patients were men, these results are well in line with the overall numbers of years lost (since and ). The effect of tumor thickness on the number of years lost due to the disease was.361 years (SD=.82) for each mm and the effect of age on the number of years lost due to other causes was.269 years (SD=.82) for every 1 years of age. Discussion. Accounting for standard mortality In demography, number of life years lost is usually related to life expectancy based on standard mortality tables (e.g., Vaupel and Canudas-Romo, 23, Aragón et al., 28, Shkolnikov et al., 211). Thus, for uncensored data the expected number of life years lost is defined as n L(,ω) = D i e (T i,ω)/ i=1 i D i where D i = n and e (T i,ω) is the mean residual life time at the time (age) T i of death, calculated based on standard mortality rates. For cause 9

11 j = 1,...,k the number of life years lost due to that cause is similarly defined as n L j (,ω) = I(D i = j)e (T i,ω)/ i=1 i I(D i = j). Thus, the mean residual life times at time (age) of death from cause j are averaged over those who died from that cause. In the parlance of Gardner and Sanborn (199) this is a measure of Years of potential life lost. Replacing the mean residual life time in these expressions by the difference max(τ T i, ), a measure of Premature (to τ) years of potential life lost is obtained. This is seen to resemble (4) (for uncensored data), however, we divide by n when the cause-specific measure is calculated while, in demography, division by the number i I(D i = j) of deaths from cause j (before τ) is done. As we have seen, our measure of the expected number of life years lost due to cause j before τ has an interpretation as the parameter E(τ T (j) τ) from the population where all k causes of death are operating and it does not involve standard mortality tables. In biostatistics a framework where it is feasible to account for standard mortality is that of relative survival or excess hazard models. Here, the mortality rate, α(t) for a subject is written as a sum α(t) = α (t) + α (t) of a standard mortality rate, α (t) (assumed known) and an excess mortality, α (t). Often, this model is used in the absence of cause of death information, however, formally this is the hazard rate in a competing risks model with two causes of failure in which case α (t) is interpreted as the mortality rate 1

12 from the disease while α (t) is the mortality rate from other causes. Further, if cause of death information is available and if cause-specific standard mortality rates, α j(t),j = 1,...,k are given with α (t) = j α j(t) then an excess mortality competing risks model for the cause-specific hazard α j (t) = α j (j) + α j(t) can be studied. This can be seen as a competing risks model with k + 1 cause-specific hazards: α (t) for dying from disease-unrelated causes and α j (t) = α j (t) α (t) for dying from cause j,j = 1,...,k due to the disease. For this competing risks model we have the relation S(t) + j F j (t) + F (t) = 1 (with obvious notation) and the expected total number of life years lost before time τ L(,τ) = τ S(t)dt can be decomposed into L(,τ) = j F j (t)dt + F (t)dt. Here, the interpretation is that F (t)dt is the unavoidable expected number of life years lost due to the standard mortality while F j(t)dt represents 11

13 the number of years lost from cause j due to the disease. Note that, in L(,τ) the quantity L (,τ) = F (t)dt = t S(u)α (u)dudt appeared. This represents the standard number of life years lost when the excess mortality rates α j (t),j = 1,...,k are also operating and replaces the quantity t S (u)α (u)dudt = τ S (t)dt used in demography. The latter would be the number of life years lost before τ in a population where α (t) is the only hazard rate operating, a quantity which does not play a role here since all our computations relate to this world where all causes of death are present. Acknowledgements The research was supported by the Danish Natural Science Research Council [grant number Point process modelling and statistical inference ]. Comments from Niels Keiding, Department of Biostatistics, University of Copenhagen are greatly appreciated. We thank Maja Pohar Perme, Institute of Biostatistics and Medical Informatics, University of Ljubljana for updating the pseudo package. 12

14 References Andersen, PK, Borgan, Ø, Gill, RD, Keiding, N (1993) Statistical Models Based on Counting Processes. Springer-Verlag, New York. Andersen, PK, Hansen, MG, Klein, JP (24). Regression analysis of restricted mean life time using pseudo-observations. Lifetime Data Analysis 1, Andersen, PK, Keiding, N. (212). Interpretability and importance of functionals in competing risks and multi-state models. Statistics in Medicine (in press). Andersen, PK, Klein, JP, Rosthøj, S (23). Generalized linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika 9, Aragón, TJ, Lichtensztajn, DY, Katcher, BS, Reiter, R, Katz, MH (28). Calculating expected years of life lost for assessing local ethnic disparities in causes of premature death. BMC Public Health 8, doi:1.1186/ Beltrán-Sánchez, H, Preston, SH, Canudas-Romo, V (28). An integrated approach to cause-of-death analysis: cause-deleted life tables and decompositions of life expectancy. Demographic Research 19,

15 Gardner, JW, Sanborn, JS (199). Years of potential life lost (YPLL) What does it measure? Epidemiology 1, Irwin, JO. (1949). The standard error of an estimate of expectation of life, with special reference to expectation of tumourless life in experiments with mice. Journal of Hygiene 47, Karrison, T. (1987). Restricted mean life with adjustment for covariates. Journal of the American Statistical Association 82, Preston, SH, Heuveline, P, Guillot, M (21). Demography. Measuring and Modeling Population Processes. Blackwell, Oxford. Shkolnikov, VM, Andreev, EM, Zhang, Z, Oeppen, J, Vaupel, JW. (211). Losses of expected lifetime in the United States and other developed countries: Methods and empirical analyses. Demography 48, Vaupel, JW, Canudas-Romo, V (23). Decomposing change in life expectancy: A bouquet of formulas in honor of Nathan Keifitz s 9th birthday. Demography 4,

16 Appendix: Asymptotics ( F j (t) F j (t))dt = = ( t ( t t ( (Ŝ(u) S(u))dÂj(u) + S(u)(dÂj(u) da j (u)) ) dt ) Ŝ(u) ( S(u) 1)S(u)dÂj(u) + S(u)(dÂj(u) da j (u)) dt u J(x) Y (x) dm(x)s(u)dâj(u) + S(u) J(u) ) Y (u) dm j(u) dt (Duhamel s equation and consistency of Ŝ and Âj) = ( t t (changing order of integration) u df j (x) J(u) ) J(u) dm(u) + S(u) Y (u) Y (u) dm j(u) dt = ( t (S(u) F j (t) + F j (u)) J(u) Y (u) dm j(u) (F j (t) F j (u)) J(u) Y (u) dm j(u) ) dt. In large samples, a process of the form n t f t(u)j(u)dm(u)/y (u) with a deterministic f t ( ) behaves like a mean zero Gaussian process U(t) with a covariance σ(s,t) which may be estimated by σ(s,t) = s t f s (u)f t (u) dn(u). Y 2 (u) Therefore, we have in large samples n ( F j (t) F j (t))dt U j (t)dt + U j (t)dt 15

17 where U j and U j are independent. It follows that L j (,τ) is asymptotically normal and that the variance may be estimated by ( σ j (s,t) + σ j (s,t))dsdt with f j t (u) = S(u) F j (t) + F j (u) and f j t (u) = F j (t) + F j (u). Thus, σ j (s,t) = s t (Ŝ(u) F j (s) + F j (u))(ŝ(u) F j (t) + F j (u)) dn Y (u) 2 j (u) and σ j (s,t) = s t ( F j (u) F j (s))( F j (u) F j (t)) d N Y (u) 2 r (u), r j where, N r ( ) counts failures from cause r. 16

18 Cumulative incidences YEARS Figure 1: Malignant melanoma study: Stacked plot of cumulative incidences of dying from the disease (dashed curve) and from other causes (difference between solid and dashed curves). 17

19 Research Reports available from Department of Biostatistics Department of Biostatistics University of Copenhagen Øster Farimagsgade 5 P.O. Box Copenhagen K Denmark 1/1 Andersen, P.K. & Skrondal, A. Biological interaction from a statistical point of view. 1/2 Rosthøj, S., Keiding, N., Schmiegelow, K. Application of History-Adjusted Marginal Structural Models to Maintenance Therapy of Children with Acute Lymphoblastic Leukaemia. 1/3 Parner, E.T. & Andersen, P.K. Regression analysis of censored data using pseudoobservations. 1/4 Nielsen, T. & Kreiner, S. Course Evaluation and Development: What can Learning Styles Contribute? 1/5 Lange, T. & Hansen, J.V. Direct and Indirect Effects in a Survival Context. 1/6 Andersen, P.K. & Keiding, N. Interpretability and importance of functionals in competing risks and multi-state models. 1/7 Gerds, T.A., Kattan, M.W., Schumacher, M. & Yu, C. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. 1/8 Mogensen, U.B., Ishwaran, H. & Gerds, T.A. Evaluating random forests for survival analysis using prediction error curves. 11/1 Kreiner, S. Is the foundation under PISA solid? A critical look at the scaling model underlying international comparisons of student attainment. 11/2 Andersen, P.K. Competing risks in epidemiology: Possibilities and pitfalls. 11/3 Holst, K.K. Model diagnostics based on cumulative residuals: The R-package gof. 11/4 Holst, K.K. & Budtz-Jørgensen, E. Linear latent variable models: The lava-package. 11/5 Holst, K.K., Budtz-Jørgensen, E. & Knudsen, G.M. A latent variable model with mixed binary and continuous response variables. 11/6 Cortese, G., Gerds, T.A., &Andersen, P.K. Comparison of prediction models for competing risks with time-dependent covariates.

20 11/7 Scheike, T.H. & Sun, Y. On Cross-Odds Ratio for Multivariate Competing Risks Data. 11/8 Gerds, T.A., Scheike T.H. & Andersen P.K. Absolute risk regression for competing risks: interpretation, link functions and prediction. 11/9 Scheike, T.H., Maiers, M.J., Rocha, V. & Zhang, M. Competing risks with missing covariates: Effect of haplotypematch on hematopoietic cell transplant patients. 12/1 Keiding, N., Hansen, O.K.H., Sørensen, D.N. & Slama, R. The current duration approach to estimating time to pregnancy. 12/2 Andersen, P.K. A note on the decomposition of number of life years lost according to causes of death

Multi-state models: prediction

Multi-state models: prediction Department of Medical Statistics and Bioinformatics Leiden University Medical Center Course on advanced survival analysis, Copenhagen Outline Prediction Theory Aalen-Johansen Computational aspects Applications