Concordance for prognostic models with competing risks

Size: px

Start display at page:

Download "Concordance for prognostic models with competing risks"

Winfred Hubbard
5 years ago
Views:

1 Concordance for prognostic models with competing risks Marcel Wolbers Michael T. Koller Jacquline C. M. Witteman Thomas A. Gerds Research Report 13/3 Department of Biostatistics University of Copenhagen

2 Concordance for prognostic models with competing risks MARCEL WOLBERS 1, MICHAEL T. KOLLER 2, JACQUELINE C.M. WITTEMAN 3, and THOMAS A. GERDS 4 1 Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Viet Nam, and Centre for Tropical Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, UK 2 Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, 431 Basel, Switzerland 3 Erasmus Medical Center, 315 Rotterdam, The Netherlands 4 Department of Biostatistics, University of Copenhagen, 114 Copenhagen K, Denmark February 19, 213 Abstract The concordance probability is a widely used measure to assess discrimination of prognostic models with binary and survival endpoints. In this article we discuss and formalize an earlier definition of the concordance probability for regression models with competing risks endpoints (Wolbers et al, Epidemiology 29;2: ). For right censored data, we investigate inverse probability of censoring weighted (IPCW) estimates of a truncated concordance index. The estimates are based on a working model for the censoring distribution. The simplest model assumes that the censoring distribution is independent of the predictor variables. If the working model is correctly specified then the IPCW estimate consistently estimates the concordance probability. The small sample properties of the estimates are assessed in a simulation study also against misspecification of the working model. We further illustrate the methods by computing the concordance probability for a prognostic model of coronary heart disease (CHD) events in the presence of the competing risk of non-chd death. C index; Competing risks; Concordance probability; Coronary heart disease; Prognostic models. 1

3 1 Introduction Clinical decision-making and cost-effectiveness analyses often rely on prognostic models that quantify a subject s absolute risk of a disease event of interest over time. Traditionally, these models are based on survival analysis and a large body of research discusses the development and validation of prognostic models for time-to-event data [9, 22, 24]. However, study populations of common diseases increasingly consist of elderly individuals with varying degrees of co-morbidity who are likely to experience one of several disease endpoints other than the endpoint of main interest [14]. As an example, prediction of coronary heart disease (CHD) events in elderly subjects is complicated by the fact that subjects may die from other causes prior to the observation of the disease event of interest [25, 13]. Another example is the quantification of the benefit from implantable cardioverter-defibrillators (ICDs) in patients at risk of sudden cardiac death. Patients with an ICD profit from implantation if the device appropriately treats a lifethreatening cardiac arrhythmia event (e.g. ventricular fibrillation). However, any benefit from ICD implantation is abolished if a patient dies prior to appropriate ICD therapy. It is thus of interest whether death without prior appropriate ICD therapy occurs in a relevant proportion of patients and how such patients could be identified prospectively [15]. It is well known that the naive application of standard survival analysis leads to bias and risk over-estimation if competing risks are present [8, 18]. Prognostic models that take competing risk events into account are thus of growing importance. In many applications, including the examples mentioned above, the parameter of main interest for medical decision-making is the absolute risk of the event of interest as quantified by its (covariate-dependent) cumulative incidence function [4]. Thus, regression models are particularly attractive when they provide subject specific estimates of the absolute risks based on a set of covariates [7]. Several measures for quantifying the accuracy of prognostic models have been adapted from the standard survival setting with only one failure cause to competing risks. Measures include prediction error curves [21], time-dependent sensitivity, specificity and AUC [19, 27], and reclassification methods [25, 13]. Commonly these measures assess prognoses for the event status at a fixed followup duration t. In contrast, the concordance index assesses prognoses for the order of the event times. For survival data, the concordance index [1] is the most widely reported measure of discrimination and we have previously presented a simple adaptation of the concordance index to the competing risks setting [25]. In the present paper, we formally define a cause-specific concordance index in the presence of competing risks and derive estimators suitable for right censored data. Notably, the proposed concordance index depends only on the cumulative incidence function of the event of interest. To illustrate the definition, we describe the behaviour of the concordance index for data generated according to a proportional cause-specific hazards model as well as for data from a direct model for the cumulative incidence function. 2

4 We then study estimation of a truncated concordance index in the presence of right-censoring. We introduce an inverse probability of censoring weighted (IPCW) estimator and demonstrate its consistency if the working model for the censoring distribution is correctly specified. The empirical bias and mean-square error are examined in a simulation study. Finally, we illustrate the methods for an example of coronary risk prediction in older woman using data from the Rotterdam Study [12]. 2 Definition of concordance 2.1 Assessing scores in uncensored data Competing risks data without censoring are given by pairs (T, D) of data where T is the time to the event and D is the event type. For the purpose of discussing the definition and estimation of the cause-specific concordance index it is sufficient to assume that there are only two competing events. Thus, for simplicity of presentation we let D = 1 denote the event of interest and D = 2 the occurrence of any competing event. In applications it may be important to model all competing events separately. The concordance index is defined for any prognostic score M(X) which can be used to order subjects with respect to the risk of an event of type 1. For example M(X) could be the linear predictor of a regression model for the event of interest derived on a training data set. We assume that higher values of M(X) are associated with higher risk of the event 1. The concordance probability assesses the discriminatory power of the prognostic score. Intuitively, M(X) discriminates well if it assigns high risks to individuals experiencing the event of interest early, lower risks to individuals experiencing the event of interest late, and negligible risk to those who never experience the event of interest. The latter occurs when the individual experiences a competing event [25]. To formally define the concordance index for uncensored data we assume an independent test data set consisting of i.i.d. realizations of (X i, T i, D i ) (i = 1,..., n) from the joint distribution of predictor variables and competing risks data. We then define C 1 = P ( M(X i ) > M(X j ) D i = 1 and (T i < T j or D j = 2)) (1) for any randomly chosen pair of subjects i, j from the joint distribution of predictor variables and competing risks data. According to this definition, we count two types of pairs of individuals for which the ranking of the prognostic score can be concordant or discordant. The first type are pairs where individual i experiences the event of interest while individual j is still event free. The second type are pairs where individual i experiences the event of interest while individual j experiences the competing event at any time. For both types it is intuitive to call the pair concordant when the prognostic score is higher for individual i. Pairs of individuals which both experience the competing event 3

5 are not evaluable for C 1 but are counted for the cause-specific concordance C 2 for the competing event which is defined analogously. It is important to discuss modifications of definition (1) for tied data [see 26]. For example, it may happen that M(Xi ) = M(X j ). Depending on the application, it may then be sensible to count such pairs with a weight of 1/2: C 1 = P ( M(X i ) > M(X j ) D i = 1 and (T i < T j or D j = 2)) P ( M(X i ) = M(X j ) D i = 1 and (T i < T j or D j = 2)). To simplify notation, we used definition (1) as the basis for our further developments. 2.2 Assessing prediction models in right censored data We now generalize the concordance index defined in Section 2.1 in two ways. First, we replace the simple prognostic score M(X) by a more general prediction model M(t, X) for the risk of event 1 until time t. In this way we include regression models for the cumulative incidence of the event of interest at time t, i.e. estimates of F 1 (t X) = P (T t, D = 1 X), which can be obtained by combining cause-specific hazards models, by fitting a Fine and Gray regression model or by direct binomial regression [3, 2]. Second, to include the typical application where individuals have a limited duration of follow-up (i.e. in the presence of right censoring) we define a truncated version of the concordance index. Following [23] and [6] we define C 1 (t) := P (M(t, X i ) > M(t, X j ) D i = 1 and T i t and (T i < T j or D j = 2)). (2) The parameter C 1 (t) can be interpreted as the ability of the model to correctly rank events of interest up to time t and to distinguish them from competing events. The truncation is necessary to enable estimation of C 1 (t) from right censored data. Note that C 1 = C 1 ( ) in the special case where the order of a pair of predictions does not depend on time, i.e. M(t, X i ) = M(s, X i ) for all s, t. However, C 1 ( ) is only identifiable when the cumulative incidence F 1 ( X) can be identified on [, ) for all X. But in most real world applications even the marginal Aalen-Johansen estimator for F 1 (which is independent of X) is only defined on [, t max ) where t max is the maximal follow-up length of the study. Indeed, the truncated concordance (2) can be written as a functional of F 1 and of the marginal distribution of the predictor values of a pair of individuals (X i, X j ). To see this, we introduce notation for the order of the predicted risks for a pair of individuals, Q ij (t) = I{M(t, X i ) > M(t, X j )}. 4

6 We also note that I({s < T j or D j = 2}) = 1 I({T j s, D j = 1}). Thus, P{D i = 1 and T i t and (T i < T j or D j = 2)} t = E Xi,X j (1 F 1 (s X j ))df 1 (s X i ) (3) and we can rewrite the truncated concordance index (2) as C 1 (t) = E X i,x j (Q ij (t) t (1 F 1(s X j ))df 1 (s X i )) E Xi,X j ( t (1 F. (4) 1(s X j ))df 1 (s X i )) Of note, in the form given in equation (4) the cause-specific concordance for event 1 does not depend on the cumulative incidence F 2 of the competing event. This feature is not obvious in formula (2) but desirable when the aim is to assess the discriminative ability of a model for F 1. 3 Illustration for proportional cause-specific hazards models and direct models for the cumulative incidence Cause-specific Cox regression models and direct models for the cumulative incidence are the most frequently used regression models for competing risks [18]. In this section, we illustrate the properties of the concordance probability in setting where the true data-generating mechanism is either a set of cause-specific proportional hazards models or a Fine-Gray model. In both situations, we asses the concordance of a time-independent prognostic score. Specifically, we simulate a prognostic index M(X) = X by drawing values from the standard normal distribution. Using a Monte Carlo approach we then approximated C 1 = C 1 ( ) based on simulated data sets, each consisting of n = 1, uncensored i.i.d. realizations drawn from the respective data-generating mechanism. 3.1 Proportional cause-specific hazards models Data were generated under proportional cause-specific hazards models with constant baseline hazards: Event 1: λ 1 (t X) = λ 1 exp(β 1 X) Event 2: λ 2 (t X) = λ 2 exp(β 2 X) We set λ 1 = 1 and considered three values for λ 2 = {,.5, 2}. Note that λ 2 = corresponds to no competing risks. Four different scenarios for β 2 were considered: (1) β 2 =, (2) β 2 = β 1, (3) β 2 = β 1, (4) β 2 = 2β 1. For all scenarios, we calculated the concordance probability C 1 as depending on the strength of association of the covariate with the cause-specific hazard of 5

7 the event of interest by varying β 1 from to 1. The results are displayed in Figure 1. Figure 1 shows only weak dependence of the concordance probability on the cause-specific hazard of the competing risk if the latter is not affected by the covariate (black solid lines). If the covariate is associated with an increased hazard of the event of interest but a decreased hazard of the competing event, the discrimination ability improves (black dashed lines lines). In contrast, the discrimination ability is hampered if covariates affect both cause-specific hazards with regression coefficients of the same sign (gray lines). This occurs in particular if there is a strong association with the competing risk (gray dashed lines) or if the baseline competing hazard is high (right panel). These results can be explained by the fact that the overall effect of a covariate on the cumulative incidence function of the event of interest depends on the size of the cause-specific baseline hazard of the event of interest and the competing event as well as on the size of both cause-specific hazard ratios [1, 14]. In particularly, a covariate that is positively associated with the cause-specific hazard of the event of interest can simultaneously be negatively associated with the corresponding cumulative incidence function of that event if the cause-specific hazard of the competing event is larger and shows a stronger positive association with the covariate. This explains the observed concordance probabilities <.5. Importantly, these findings illustrate that a prognostic model for the absolute risk of the event of interest with good discrimination requires predictors which are only weakly or, even better, reversely associated with the cause-specific hazard of the competing event. 3.2 Direct models for the cumulative incidence We generated data from the following Fine-Gray model [3]: P (T t, D = 1 X) = 1 [1 p(1 exp( t))] exp(βx). Note that when X =, this is a unit exponential mixture with mass 1 p at, such that the probability of an event of type 1 in the interval [, ) is p. We varied the value of p between values.1,.5, and.9. As before we varied the strength of association β from to 1. Results are displayed in Figure 2. In addition to the expected positive association between β and the concordance probability there is also a weak dependence on p. 4 Estimation of concordance in the presence of right censoring 4.1 Right censored data To indicate the end of follow up for subject i we introduce a subject specific censoring time C i. Thus, we observe only (X i, T i, D i, i ) (i = 1... n) where 6

8 T i = min(t i, C i ), i = I{T i C i } and D i = i D i. We also use the following notation: Ñ 1 i (t) = I{ T i t, D i = 1} N 1 i (t) = I{T i t, D i = 1} Ñ 2 i (t) = I{ T i t, D i = 2} N 2 i (t) = I{T i t, D i = 2} Ã ij = I{ T i < T j } A ij = I{T i < T j } B ij = I{ T i T j and D j = 2} B ij = I{T i T j and D j = 2} The event-free survival probability conditional on the covariate X i is then given by S(t X i ) = 1 E(N 1 i (t) X i ) E(N 2 i (t) X i ) = 1 F 1 (t X i ) F 2 (t X i ). We allow the censoring distribution to depend on the covariates X i but assume throughout that C i is conditionally independent of (T i, D i ) given X i. This implies P( T i > t X i ) = G(t X i ) S(t X i ), (5) where G(t X i ) = P (C i > t X i ) is the conditional probability of being uncensored at time t. Noting G(t X i ) = P (C i t X i ) we also have for k = 1, 2: E(Ñ k i (t) X i ) = E( i N k i (t) X i ) = t 4.2 Ignoring non-evaluable pairs E(C i s X i ) E(dN k i (s) X i ) = t An asymptotically biased estimate of C 1 (t) is given by Ĉ 1,naive (t) = n n i=1 j=1 (Ãij + B ij )Q ij (t)ñ i 1(t) n n i=1 j=1 (Ãij + B ij )Ñ i 1(t) G(s X i ) df k (s X i ). (6) where Q ij (t) = I{M(t, X i ) > M(t, X j )} is an indicator for the order of predicted risks at time t. This can be interpreted as the proportion of definitely concordant pairs amongst evaluable pairs, i.e. pairs for which one individual experiences the event of interest and concordance can be decided based on the observed (potentially censored) data. The evaluable pairs are all pairs of the form (( D i = 1, T i, M(t, X i )), ( D j, T j, M(t, X j ))) with T i t and T j > T i or D j = 2. A good prognostic model should assign a lower risk to an individual that experiences an event (of any type) late ( T j > T i ) or never experiences the event of interest ( D j = 2). Thus, concordant pairs are those evaluable pairs with M(t, X i ) > M(t, X j ), i.e. Q ij (t) = 1. Pairs for which both individuals are either censored or experience the competing event as well as pairs for which 7

9 the first individual experiences the event of interest and the second is censored before that time point are considered non-evaluable because it is impossible to decide whether the predictions for such a pair are concordant or not. This simple estimate evaluated at the time t max of the maximum follow-up duration has been previously defined in (author?) [25]. While simple, a major problem of this estimator is that by ignoring non-evaluable pairs without any correction, bias is introduced. It is well known that Harrel s C depends on the censoring distribution [23, 6] and Ĉ1,naive(t) shares this limitation. 4.3 IPCW estimate We use a working model to derive an estimate Ĝ for G, the conditional probability of being uncensored given the predictors. For example, we could assume a Cox regression model: G(t X i ) = exp{ t exp(γ T X i )Γ (s)ds} where Γ is an unknown baseline hazard function and γ a vector of regression coefficients that quantify the effect of the predictors on the censoring hazard. Alternatively, we could assume that the censoring is independent of the predictors and estimate G with the Kaplan-Meier estimate for the censoring times. Based on Ĝ we define weights Ŵ ij,1 = Ĝ( T i X i )Ĝ( T i X j ) and Ŵij,2 = Ĝ( T i X i )Ĝ( T j X j ). and derive the following inverse probability of censoring weighted (IPCW) estimate of C 1 (t): n n i=1 j=1 (Ãij Ŵ 1 ij,1 + B ) ij Ŵ 1 ij,2 Q ij (t)ñ i 1(t) Ĉ 1 (t) = n n i=1 j=1 (Ãij Ŵ 1 ij,1 + B (7) ij Ŵij,2) 1 Ñ 1 i (t) Lemma 1. Under conditionally independent censoring, if the working model is correctly specified in the sense that Ĝ consistently estimates G, then Ĉ1(t) consistently estimates C 1 (t) for all t with G(t x) > ɛ > uniformly over x. A proof of the lemma is given in the supplementary material. 5 Simulation study A simulation study was performed to assess bias and root mean square error (RMSE) of the proposed IPCW estimator of C 1 (t). As in Section 3.1, the competing risks data were simulated based on a standard normal covariate X and Cox proportional hazards models with constant cause-specific baseline hazards. Specifically, we chose either cause-specific hazards λ 1 (t) = 1, λ 2 (t) = 2 and regression coefficients β 1 = β 2 = 1 (scenario CR1) or λ 1 (t) = 1, λ 2 (t) =.5, 8

10 and β 1 = 2, β 2 = 1 (scenario CR2). The truncation time point t was chosen as either the median or the upper quartile q 3 of the time to event distribution. The true concordance C 1 (t) between X and failure cause 1 ranged from.62 to.86 for the different competing risks scenarios and truncation points (Table 1). We simulated both independent and dependent right-censoring for the data. Independent censoring was based on an exponential censoring distribution with the rate of the exponential distribution chosen such that (on average) 4% of events prior to t were censored. Dependent censoring was chosen according to an exponential distribution with rate λ cens exp(x) with λ cens chosen such that 3% of events prior to t were censored. Finally, we varied the sample size N between 25 and 1 observations per simulated data set. Three different estimators of C(t) were investigated: the naive estimator Ĉ 1,naive (t) ignoring non-evaluable pairs, an IPCW estimator Ĉ1,KM(t) based on a Kaplan-Meier estimate of the censoring distribution, and an IPCW estimator Ĉ 1,Cox (t) which estimated the censoring distribution with a Cox proportional hazards model depending on X. Results for all 16 simulated scenarios based on 1 simulations per scenario are displayed in Table 1. Lower empirical bias for the IPCW-based estimators and reduced bias for the Cox-model based IPCW estimator compared to the Kaplan-Meier based estimator in situations with dependent censoring are evident for almost all scenarios. The IPCW estimates also tended to have lower RMSE with substantial improvements for some scenarios. No consistent improvements in RMSE of the Cox-model based estimator over the Kaplan-Meier estimator were apparent in the situation of dependent censoring. 6 Application to coronary risk prediction Specialist medical societies recommend initiation of preventive treatment for coronary heart disease (CHD) based on a subjects predicted 1-year risk for CHD [17]. To accurately predict the absolute risk of CHD in older people, prognostic models for CHD need to account for the competing risk of non- CHD death [13]. In this section, we revisit the example of [25] on coronary risk prediction based on data of elderly women from the Rotterdam Study, a prospective, population-based cohort of elderly subjects living in a suburb area of Rotterdam, the Netherlands [12]. We analysed data from 1 years of follow-up of 4144 women aged between 55 and 9 years who were free of CHD at baseline. During that follow-up period, 389 women experienced a CHD event and 921 women died without prior CHD event. Only 41 women of those event-free had less than 1 years of follow-up. We randomly split the data set into a training data set (2763 women with 249 CHD events) and a validation data set (1381 women with 14 CHD events). Using the training set we estimated the parameters of a Fine-Gray regression model which included the traditional baseline risk factors for CHD: age, treatment for high blood pressure (yes versus no), systolic blood pressure (separate slopes depending on whether the subject was on blood pressure treatment 9

11 or not), diabetes mellitus, log-transformed total cholesterol to HDL cholesterol ratio, and smoking status (current versus never or former smoker). All these risk factors were associated with an increased CHD risk and, except for diabetes, all reached conventional significance (p <.5). We also fitted a second Fine- Gray model with age, the strongest predictor, as the only covariate. Finally we estimated two Cox regression models, one for the cause-specific hazard of CHD and one for the cause-specific hazard of non-chd death. In both Cox analyses the same factors were included as for the multivariable Fine-Gray model. Here all risk factors were significantly associated with an increased hazard of CHD. Also, the factors age, smoking and diabetes were significantly associated with an increased hazards of non-chd death whereas the log-transformed total cholesterol to HDL cholesterol ratio showed a significant inverse association. Treatment for hypertension and blood pressure were not significantly associated with the hazard of non-chd death. The two Cox analyses yielded cause-specific hazard predictions λ 1 (t x) and λ 2 (t x), respectively, which were subsequently combined into a prediction of the cumulative incidence function of cause j based on the formula: F j (t x) = 1 t { exp s } (λ 1 (u x) + λ 2 (u x))du λ j (s x)ds. Concordance estimates were obtained for these models in the validation set. The dependence of the censoring distribution on the covariates was investigated with a Cox regression model which yielded no trends and non-significant Wald tests for all variables. Thus, all IPCW estimates of concordance were based on the marginal Kaplan-Meier estimator for the censoring distribution from the validation set. The left panel of Figure 3 shows the discrimination ability of the two Fine- Gray models for varying time horizons between 1 and 1 years. The multivariable model shows higher discriminative ability compared to the model based on age alone. For both models, concordance estimates stabilize after about 2.5 years of follow-up and remain fairly stable though slightly decreasing. The decrease may occur because earlier events are easier to predict than later events. The multivariable cause-specific Cox models showed a similar trend but slightly lower c-index in the first years compared to the multivariable Fine-Gray model (curve not shown). The right panel of Figure 3 shows corresponding prediction error curves [21, 7] in the validation set which assesses the overall performance of the models (i.e. both discrimination and calibration) whereas the concordance assesses discrimination only. The upper half of Table 2 shows the estimated concordance for the cumulative incidences of CHD and non-chd death, respectively, after 1, 5 and 1 years, as were obtained with the single split into training and validation data. The multivariable models were comparable and showed superior discrimination compared to the the age only model for the prediction of CHD whereas improvements were much smaller for the prediction of non-chd death. This is 1

12 not surprising as these additional covariates are established CHD-specific risk factors which would not be expected to strongly affect non-chd death (except for other deaths related to the cardiovascular system). The lower half of Table 2 shows bootstrap-crossvalidation estimates [6] of concordance obtained by averaging the results of 1 splits into a bootstrap training set (size 4144, same as the original data) and a corresponding validation set (size n 1525). The estimated concordance for CHD at the 1 year horizon was higher as compared to the results of the single split for all models. One possible explanation for this result is that the models where trained in bootstrap samples which contains more information than the training sample used for the single split. Another possible explanation for the differences is that the single split analysis is highly dependent on how the data are split. 7 Discussion We have presented a formal definition of the concordance probability for prognostic models in the presence of competing risks. Like the concordance probability for survival or binary data, it provides a simple numeric measure of discrimination. To deal with right censored data we derived a consistent IPCW estimator of the truncated concordance probability. The fact that we estimate a truncated version of concordance rather than the unconstrained concordance probability should not be seen as a limitation of our approach. Indeed it is impossible to assess the performance of prognostic models beyond the maximum follow-up duration without strong and untestable assumptions. This is not a particularity of the competing risks situation and also known for survival data without competing risks [23, 6]. Consistency of the proposed estimator relies on the assumption that the censoring distribution is correctly specified. In many applications it will be reasonable to assume that the censoring mechanism is independent of the predictor variables and then the marginal Kaplan-Meier estimate (ignoring the predictor variables) of the censoring distribution can be used to derive a consistent IPCW estimate. The work presented here is related to the work of [19] and, especially, [27] on time-dependent sensitivity, specificity and ROC curves for competing risks. For their definition of sensitivity and specificity, these authors defined (cumulative) cases and (dynamic) controls at time t as observations with T t and D = 1 (cases) and those with T > t (controls). In addition and similar to the present work, [27] also consider alternative controls defined as observations with T > t or D = 2. With this alternative definition of controls, the area under the ROC curve at time t from [27] is similar to our definition of concordance. However, the estimators in [27] rely on cause-specific (smoothed) Cox regression models and are different from the IPCW estimator proposed here. The advantage of the IPCW estimator is that it does not require a model for the competing risks process but only an estimate of the cenroring distribution. We assumed that the prognostic model was derived on an independent train- 11

13 ing data set. While we have not analytically derived formulas for the asymptotic variance of the estimator, confidence intervals based on bootstrapping the test data set are readily available in this situation. Clearly, independent training data are not always available, and even if they are, a joint analysis of all data will be more efficient. However, some form of internal cross-validation is needed to develop and assess a prognostic model with a single data set [2, 9, 11, 5, 22]. It is important to emphasize that our definition of concordance assesses a prognostic model for the absolute risk of the event of interest in the presence of competing risks. In line with earlier work [4, 25] we regard this risk as crucial for medical decision-making in the competing risks setting. However, in many instances explicit consideration of competing events will also be important and modeling the entire competing risks multi-state process will provide further insights [1]. As an example, we saw in Section 3.1 that discrimination of prognostic models for the event of interest is hampered if covariates affect both cause-specific hazards with regression coefficients of the same sign, especially if there is a strong association with the competing risk or if the baseline competing hazard is high. This indicates that to achieve high discrimination ability one needs predictors which are only weakly or, even better, reversely associated with the cause-specific hazard of the competing event. Moreover, in settings where all competing events are of similar importance, joint accuracy criteria for the entire competing risks multi-state process are needed and their development is an important area for future research. Acknowledgments Marcel Wolbers was supported by the Wellcome Trust and the Li Ka Shing Foundation University of Oxford Global Health Programme. Conflict of Interest: None declared. References [1] J. Beyersmann, M. Dettenkofer, H. Bertz, and M. Schumacher. A competing risks analysis of bloodstream infection after stem-cell transplantation using subdistribution hazards and cause-specific hazards. Statistics in Medicine, 26: , 27. [2] Bradley Efron and Robert Tibshirani. Improvements on cross-validation: The.632+ bootstrap method. Journal of the American Statistical Association, 92:548 56, [3] J. P. Fine and R. J. Gray. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 446:496 59, [4] M. H. Gail and R. M. Pfeiffer. On criteria for evaluating models of absolute risk. Biostatistics, 6: ,

14 [5] T. A. Gerds, T. Cai, and M. Schumacher. The performance of risk prediction models. Biometrical Journal, 5: , 28. [6] T. A. Gerds, M. W. Kattan, M. Schumacher, and C. Yu. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine, page Early view: DOI: 1.12/sim.5681, 212. [7] T.A. Gerds, T.H. Scheike, and P.K. Andersen. Absolute risk regression for competing risks: interpretation, link functions, and prediction. Statistics in Medicine, 31: , 212. [8] G. L. Grunkemeier, R. Jin, M. J. Eijkemans, and J. J. Takkenberg. Actual and actuarial probabilities of competing risks: apples and lemons. Annals of Thoracic Surgery, 83: , 27. [9] F. E. Harrell. Regression Modeling Strategies. Springer Series in Statistics, New York, 21. [1] F.E. Harrell, R. M. Califf, D. B. Pryor, K. L. Lee, and R. A. Rosati. Evaluating the yield of medical tests. Journal of the American Medical Association, 247: , [11] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning (Second Edition). Springer Series in Statistics, New York, 29. [12] A. Hofman, C. M. van Duijn, O. H. Franco, M. A. Ikram, H. L. Janssen, C. C. Klaver, E. J. Kuipers, T. E. Nijsten, B. H. Stricker, H. Tiemeier, A. G. Uitterlinden, M. W. Vernooij, and J. C. Witteman. The Rotterdam Study: 212 objectives and design update. European Journal of Epidemiology, 26: , Aug 211. [13] M. T. Koller, M. J. Leening, M. Wolbers, E. W. Steyerberg, M. G. Hunink, R. Schoop, A. Hofman, H. C. Bucher, B. M. Psaty, D. M. Lloyd-Jones, and J. C. Witteman. Development and validation of a coronary risk prediction model for older U.S. and European persons in the cardiovascular health study and the Rotterdam Study. Annals of Internal Medicine, 157: , Sep 212. [14] M. T. Koller, H. Raatz, E. W. Steyerberg, and M. Wolbers. Competing risks and the clinical community: irrelevance or ignorance? Stat Med, 31: , May 212. [15] M. T. Koller, B. Schaer, M. Wolbers, C. Sticherling, H. C. Bucher, and S. Osswald. Death without prior appropriate implantable cardioverterdefibrillator therapy: a competing risk study. Circulation, 117: ,

15 [16] Ulla B. Mogensen, Hemant Ishwaran, and Thomas A. Gerds. Evaluating random forests for survival analysis using prediction error curves. Journal of Statistical Software, 5:1 23, 212. [17] NCEP. Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) final report. Circulation, 16: , 22. [18] H. Putter, M. Fiocco, and R. B. Geskus. Tutorial in biostatistics: competing risks and multi-state models. Stat Med, 26: , 27. [19] P. Saha and P. J. Heagerty. Time-dependent predictive accuracy in the presence of competing risks. Biometrics, 66: , 21. [2] Thomas Scheike, M.J. Zhang, and Thomas Alexander Gerds. Predicting cumulative incidence probability by direct binomial regression. Biometrika, 95:25 22, 28. [21] R. Schoop, J. Beyersmann, M. Schumacher, and H. Binder. Quantifying the predictive accuracy of time-to-event models in the presence of competing risks. Biometric Journal, 53:88 112, 211. [22] E. W. Steyerberg. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, New York, 29. [23] H. Uno, T. Cai, M. J. Pencina, R. B. D Agostino, and L. J. Wei. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 3: , 211. [24] H. van Houwelingen and H. Putter. Dynamic Prediction in Clinical Survival Analysis. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 211. [25] M. Wolbers, M. T. Koller, J. C. Witteman, and E. W. Steyerberg. Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology, 2: , 29. [26] G. Yan and T. Greene. Investigating the effects of ties on measures of concordance. Statistics in medicine, 27: , 28. [27] Y. Zheng, T. Cai, Y. Jin, and Z. Feng. Evaluating prognostic accuracy of biomarkers under competing risk. Biometrics, 68: ,

16 Supplementary Materials Section A of the supplementary material illustrates the software implementation of the proposed concordance estimator in the function cindex of the R package pec [16]. Section B provides a proof of Lemma 1. A Software implementation The presented concordance estimators for competing risks have been implemented in the function cindex of the R package pec [16]. The following box shows R code for evaluating Ĉ1(t) at t = 1, 5, and1 simultaneously for a Fine- Gray regression model and the combination of cause-specific Cox regression with age as the only covariate based on a marginal Kaplan-Meier model for the censoring distribution. 1 library(pec) 2 library(riskregression) 3 library(cmprsk) 4 5 # Fit Fine-Gray and cause-specific models for failure cause 1 to the training data 6 7 fg.train <- FGR(Hist(ttevent,evtype)~age,data=chd.train,cause=1) 8 cox.train <- CSC(Hist(ttevent,evtype)~age,data=chd.train,cause=1) 9 1 # Calculate truncated concordance at t=1,5,1 years 11 # for CHD in the test data cindex(list(fg.train,cox.train), 14 formula=hist(ttevent,evtype)~1, 15 cens.model="marginal", 16 data=chd.test,eval.times=c(1,5,1),cause=1) # Fit Fine-Gray and cause-specific models for failure cause 1 in all data 19 2 fg <- FGR(Hist(ttevent,evtype)~age,data=chd,cause=1) 21 cox <- CSC(Hist(ttevent,evtype)~age,data=chd,cause=1) # Calculate truncated concordance at t=1,5,1 years 24 # for CHD using bootstrap cross-validation cindex(list(fg.train,cox.train), 27 formula=hist(ttevent,evtype)~1, 28 cens.model="marginal", 29 splitmethod="bootcv", 3 B=1, 31 data=chd,eval.times=c(1,5,1),cause=1) 15

17 B Proof of Lemma 4.1 It is assumed that Ĝ is consistent for G. Thus, Slutsky s lemma shows that the weights converge in probability to W ij,1 = G( T i X i )G( T i X j ) and W ij,2 = G( T i X i )G( T j X j ) as n. By the law of large numbers and Slutsky s Lemma it follows that Ĉ 1 (t) converges in probability to ( E Xi,X j [Q ij (t) E{ÃijÑ i 1 1 (t)wij,1 X i, X j } + E{ B )] ij Ñi 1 1 (t)wij,2 X i, X j } [( E Xi,X j E{ÃijÑ i 1 1 (t)wij,1 X i, X j } + E{ B )]. (8) ij Ñi 1 1 (t)wij,2 X i, X j } By applying equations (4.4) and (4.5) we have and hence E(ÃijÑ 1 i (t) X i, X j ) = = E(ÃijÑ 1 i (t)w 1 ij,1 X i, X j ) = t t t Similarly by equation (4.5) we have E(I{ T j > s X j }) E(dÑ 1 i (s) X i ) G(s X j ) S(s X j ) G(s X i )df 1 (s X i ) S(s X j )df 1 (s X i ). E( B ij Ñ 1 i (t) X i, X j ) = E( i j B ij N 1 i (t) X i, X j ) which yields = E( B ij Ñ i (t)w ij,2 X i, X j ) = t s t G(v X j ) G(s X i )df 2 (v X j ) df 1 (s X i ) F 2 (s X j )df 1 (s X i ). Inserting shows that expression (8) equals the expression for C 1 (t) given in (2.3): [ E Xi,X j Q ij (t) ] [ t {S(s X j) + F 2 (s X j )}df 1 (s X i ) E Xi,X j Q ij (t) ] t [ ] {1 F 1(s X j )}df 1 (s X i ) t = [ ] E Xi,X j {S(s X t. j) + F 2 (s X j )}df 1 (s X i ) E Xi,X j {1 F 1(s X j )}df 1 (s X i ) 16

18 Tables and Figures Figure 1: Concordance for cause-specific hazards models depending on the effect size β 1 as well as on the baseline hazard λ 2 (t) and the regression coefficient for the competing risk. Black solid lines refer to β 2 =, black dashed lines to β 2 = β 1, gray solid lines to β 2 = β 1, and gray dashed lines to β 2 = 2β 1. λ 2 (t) = (no competing risks) λ 2 (t) =.5 λ 2 (t) = 2 Concordance C ~ Concordance C ~ Concordance C ~ β 1 β 1 β 1 17

19 Figure 2: Concordance for Fine-Gray models depending on the effect size β and the baseline probability p of an event of interest in [, ). Lines correspond to p =.1 (gray dashed line),.5 (black line), and.9 (gray dash-dotted line). Concordance C ~ β 18

20 Table 1: Simulation settings, bias and root mean square error (RMSE) for the 16 scenarios of the simulation study. Bias and RMSE for each scenario are based on results from 1 simulated datasets. CR=competing risks process. Simulation setting Bias ( 1) RMSE ( 1) CR t censoring N C1(t) Ĉ1,naive(t) Ĉ1,KM(t) Ĉ1,Cox(t) Ĉ1,naive(t) Ĉ1,KM(t) Ĉ1,Cox(t) 1 median indep median indep median dep median dep q 3 indep q 3 indep q 3 dep q 3 dep median indep median indep median dep median dep q 3 indep q 3 indep q 3 dep q 3 dep

21 Table 2: Estimated concordance in the coronary heart disease study truncated after 1, 5 and 1 years. Single split into training/validation data CHD non-chd death 1y 5y 1y 1y 5y 1y Fine-Gray (age only) Fine-Gray Cause-specific Cox Average of 1 splits into bootstrap training data and corresponding validation data CHD non-chd death 1y 5y 1y 1y 5y 1y Fine-Gray (age only) Fine-Gray Cause-specific Cox Figure 3: Left panel: IPCW estimates of C 1 (t) for the multivariable Fine and Gray model (solid line) and the model with age as the only covariate (dashed line) for a follow-up duration of 1 1 years in the validation data. Error bars at 2.5, 5, 7.5 and 1 years of follow-up correspond to boostrap standard errors. Right panel: Prediction error curves for the multivariable Fine and Gray model (black solid curve) and the model with age as the only covariate (black dashed line). The gray line shows the prediction error of the covariate-free nonparametric estimate of the cumulative incidence function as a benchmark. Concordance C^ 1(t) Prediction error (Brier score) Time t (years) Time t (years) 2

22 Research Reports available from Department of Biostatistics Department of Biostatistics University of Copenhagen Øster Farimagsgade 5 P.O. Box Copenhagen K Denmark 11/1 Kreiner, S. Is the foundation under PISA solid? A critical look at the scaling model underlying international comparisons of student attainment. 11/2 Andersen, P.K. Competing risks in epidemiology: Possibilities and pitfalls. 11/3 Holst, K.K. Model diagnostics based on cumulative residuals: The R-package gof. 11/4 Holst, K.K. & Budtz-Jørgensen, E. Linear latent variable models: The lava-package. 11/5 Holst, K.K., Budtz-Jørgensen, E. & Knudsen, G.M. A latent variable model with mixed binary and continuous response variables. 11/6 Cortese, G., Gerds, T.A., &Andersen, P.K. Comparison of prediction models for competing risks with time-dependent covariates. 11/7 Scheike, T.H. & Sun, Y. On Cross-Odds Ratio for Multivariate Competing Risks Data. 11/8 Gerds, T.A., Scheike T.H. & Andersen P.K. Absolute risk regression for competing risks: interpretation, link functions and prediction. 11/9 Scheike, T.H., Maiers, M.J., Rocha, V. & Zhang, M. Competing risks with missing covariates: Effect of haplotypematch on hematopoietic cell transplant patients. 12/1 Keiding, N., Hansen, O.K.H., Sørensen, D.N. & Slama, R. The current duration approach to estimating time to pregnancy. 12/2 Andersen, P.K. A note on the decomposition of number of life years lost according to causes of death.

23 12/3 Martinussen, T. & Vansteelandt, S. A note on collapsibility and confounding bias in Cox and Aalen regression models. 12/4 Martinussen, T. & Pipper, C.B. Estimation of Odds of Concordance based on the Aalen additive model. 12/5 Martinussen, T. & Pipper, C.B. Estimation of Causal Odds of Concordance based on the Aalen additive model. 12/6 Binder, N., Gerds, T.A. & Andersen, P.K. Pseudo-observations for competing risks with covariate dependent censoring. 12/7 Keiding, N. & Clayton, D. Standardization and control for confounding in observational studies: a historical perspective. 12/8 Hilden, J. & Gerds, T.A. Evaluating the impact of novel biomarkers: Do not rely on IDI and NRI. 12/9 Gerster, M., Madsen, M. & Andersen, P.K. Matched Survival Data in a Co-twin Control Design. 12/1 Scheike, T.H., Holst, K.K. & Hjelmborg, J.B. Estimating heritability for cause specific mortality based on twin studies. 12/11 Jensen, A.K.G., Gerds, T.A., Weeke, P., Torp-Pedersen C. & Andersen, P.K. A note on the validity of the case-time-control design for autocorrelated exposure histories. 13/1 Forman, J.L. & Sørensen M. A new approach to multi-modal diffusions with applications to protein folding. 13/2 Mogensen, U.B., Hansen, M., Bjerrum, J.T., Coskun, M., Nielsen, O.H., Olsen, J. & Gerds, T.A. Microarray based classification of inflammatory bowel disease: A comparison of modelling tools and classification scales. 13/3 Wolbers, M., Koller, M.T., Witteman, J.C.M. & Gerds, T.A. Concordance for prognostic models with competing risks.

A note on the decomposition of number of life years lost according to causes of death

A note on the decomposition of number of life years lost according to causes of death Per Kragh Andersen Research Report 12/2 Department of Biostatistics University of Copenhagen A note on the decomposition