Part III Measures of Classification Accuracy for the Prediction of Survival Times

Size: px

Start display at page:

Download "Part III Measures of Classification Accuracy for the Prediction of Survival Times"

Jemima Goodwin
5 years ago
Views:

1 Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010

2 Session Three Outline Examples Breast Cancer Cytology Data Mayo PBC Data Cystic Fibrosis Foundation Registry Data Previous approaches ROC overview TP, FP for survival outcomes ROC C/D t (p) ROC I/D t (p), AUC(t), and concordance, C τ ROC(p) interpretation and dynamic criteria, c p (t) Further work 103 ISCB 2010

3 Example: Comparing cytometry measures Breast Cancer among Younger Women N = 253 women BC diagnosed aged 20 to 44 Endpoint: time-until-death (any cause) Cytometry measurements: old (S-phase ungated) new (S-phase gated) Goal: compare measures as predictors of mortality Heagerty, Lumley & Pepe (2000) 104 ISCB 2010

4 New versus Old Method %S Gated %S Ungated 105 ISCB 2010

5 Predictive Survival Models Mayo PBC Data N = 312 subjects, 125 deaths, Baseline measurements: bilirubin, prothrombin time, albumin Goal: predict mortality; medical management 106 ISCB 2010

6 107 ISCB 2010

7 Predictive Survival Models Cystic Fibrosis Data N = 23, 530 subjects, 4, 772 deaths, n = 160, 005 longitudinal observations Longitudinal measurements: FEV1, weight, height Goal: predict mortality; transplantation selection 108 ISCB 2010

8 fev fev fev fev fev fev age age age age age age fev fev fev fev fev fev age age age age age age fev fev fev fev fev fev age age age age age age fev fev fev fev fev fev age age age ISCB 2010 age age age

9 Accuracy: Some proposals R 2 Generalizations Korn and Simon (1990) Schemper and Henderson (2000) O Quigley and Xu (2001) TP, FP, ROC Generalizations Etzioni et al (1999); Slate and Turnbull (1999) Heagerty, Lumley, and Pepe (2000) Heagerty and Zheng (2005) 109 ISCB 2010

10 Some Comments Schemper and Henderson (2000), p 249: Consequently, there have been a number of attempts to develop measures akin to R 2 for Cox proportional hazards models [[references]], though as yet, none have been generally accepted These versions of R 2 are not about variance in T They focus on average variances of the counting process: N(t) = 1(T t) 110 ISCB 2010

11 Some Comments Natural to think of survival through counting process N(t) Common to use ROC curves for logistic regression / binary classification Q: Extend classification error rate concepts to survival data? 111 ISCB 2010

12 Components of Accuracy Calibration Bias does observed match predicted? Evaluated graphically and formally Discrimination Does prediction separate subjects with different risks? Evaluated qualitatively based on K-M plots Harrell, Lee and Mark (1996) 112 ISCB 2010

13 BC Survival: New Measurement (gated) (a) Survival for %S, Gated %S: [00, 2 ), N= 86 %S: [ 2, 6 ), N= 86 %S: [ 6, 100), N= ISCB 2010

14 BC Survival: Old Measurement (ungated) (b) Survival for %S, Ungated %S: [00, 11 ), N= 86 %S: [ 11, 35 ), N= 86 %S: [ 35, 100), N= ISCB 2010

15 Binary Classification Sensitivity True Positive BINARY TEST : P (T + D = 1) CONTINUOUS MARKER : P (M > c D = 1) Specificity True Negative BINARY TEST : P (T D = 0) CONTINUOUS MARKER : P (M c D = 0) 115 ISCB 2010

16 ROC Curve An ROC curve plots the True Positive Rate, TP(c), versus the False Positive Rate, FP(c) for all possible cutpoints, c: FP(c) = P (M > c D = 0) TP(c) = P (M > c D = 1) ROC Curve : [ F P (c), T P (c) ] c 116 ISCB 2010

17 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

18 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

19 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

20 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

21 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

22 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

23 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

24 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

25 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

26 Marker versus Disease status ROC curve sensitivity specificity ISCB 2010

27 ROC Curves 1 Compare different markers over full spectrum of error combinations 2 Compare sensitivity when controlling specificity (eg TP when FP=10%) 3 AUC interpretation: For a randomly chosen case and control, the area under the ROC curve is the probability that the marker for the case is greater than the marker for the control 4 AUC is a marker-outcome concordance summary 118 ISCB 2010

28 Does a (repeated) measurement predict onset? Q: Can a measurement accurately predict which CASES will experience an event (soon)? eg FEV1 eg death time Q: Can a measurement be used to accurately guide longitudinal treatment decisions? eg lung transplantation 119 ISCB 2010

29 Classification Errors True Positive Rate P ( high measurement CASE ) False Positive Rate P ( high measurement CONTROL ) 120 ISCB 2010

30 Issues Related to Time Q: When is the measurement taken? At baseline: Y (0) At a follow-up time t: Y (t) Q: What time is used to determine when someone is a CASE or a CONTROL? Case = event (disease, death) before time t Case = event (disease, death) at time t Control = event-free through time t Control = event-free through a large follow-up time t 121 ISCB 2010

31 Our approaches measurement case control Heagerty, Lumley & Pepe (2000) baseline T t T > t Zheng & Heagerty (2004) longitudinal T = t T > t Heagerty & Zheng (2005) baseline T = t T > t Zheng & Heagerty (2007) longitudinal T t T > t Saha & Heagerty (2010) longitudinal T = t, δ = j T > t 122 ISCB 2010

32 Sensitivity and Specificity for Survival Let T denote the survival time, and let N(t) denote the counting process for the uncensored outcome: N(t) = 1(T t) Possible definitions: CASE(t) : CONTROL(t) : Cumulative N(t) = 1 Incident dn(t) = 1 Static N(t ) = 0 Dynamic N(t) = 0 Where t is a fixed large time, t >> t 123 ISCB 2010

33 HLP(2000) Time ZH(2004) Time HZ(2005) Time ZH(2007) Time 124 ISCB 2010

34 [1] Sensitivity and Specificity for Survival Define: Heagerty, Lumley & Pepe (2000) sensitivity C (c, t) : P (M > c T t) P (M > c N(t) = 1) specificity D (c, t) : P (M c T > t) P (M c N(t) = 0) T P C t (c) = P (M > c N(t) = 1) F P D t (c) = P (M > c N(t) = 0) 125 ISCB 2010

35 [1] Time-dependent ROC Curve An Cumulative/Dynamic ROC curve shows the ability of a marker to separate the cumulative cases through time t (eg T t) from the controls at time t (eg T > t) Define curve [ p, ROC C/D t (p) ]: ROC C/D t (p) = T P C { [F P D ] 1 (p) } = T P C (c p ) where c p : p = F P D (c p ) Define AUC as: AU C(t) = ROC C/D t (p) dp 126 ISCB 2010

36 Estimation: NNE for S(m, t) With censored times a valid ROC solution can be provided by using an estimator of the bivariate distribution function for [M, T ] Define: S(c, t) = P [M > c, T > t] S(c, t) = c S(t M = u)df M (u) where F M (u) is the distribution function for M 127 ISCB 2010

37 Akritas (1994): Ŝ λn (c, t) = 1 n Ŝ λn (t M = M i )1(M i > c) i where Ŝλ n (t M = M i ) is a suitable estimator of the conditional survival function characterized by a smoothing parameter λ n 128 ISCB 2010

38 Estimation: NNE for S(m, t) Define: P λn [M > c N(t) = 1] = {[1 F M (c)] Ŝλ n (c, t)} {1 Ŝλ n (t)} P λn [M c N(t) = 0] = 1 Ŝλ n (c, t) Ŝ λn (t) where Ŝλ n (t) = Ŝλ n (, t) For NNE λ n = O(n 1/3 ) sufficient for weak consistency Results from Akritas (1994) and van de Vaart and Wellner (1996) imply that the bootstrap can be used for inference 129 ISCB 2010

39 (a) ROC(40m), NNE (b) ROC(60m), NNE (c) ROC(100m), NNE sensitivity specificity (d) ROC(40m), Kaplan Meier sensitivity sensitivity specificity sensitivity sensitivity specificity (e) ROC(60m), Kaplan Meier (f) ROC(100m), Kaplan Meier sensitivity specificity 1 specificity 1 specificity 130 ISCB 2010

40 Accuracy Comparisons Sensitivity Using gated (new) measurement: P [M 1 54 N(60m) = 0] = 071 P [M 1 > 54 N(60m) = 1] = 082 Using ungated (old) measurement: P [M 2 35 N(60m) = 0] = 071 P [M 2 > 35 N(60m) = 1] = 054 Therefore, controlling M 1 and M 2 to have equal specificity, the new measure, M 1, has a greater sensitivity 131 ISCB 2010

41 Accuracy Comparisons AUC Calculations AUC for gated (new): 080, Bootstrap 95% CI: (072,089) AUC for ungated (old): 068, Bootstrap 95% CI: (056,077) 95% CI for difference in areas (new-old): (003, 026) 132 ISCB 2010

42 Summary Define time-dependent ROC curves based on prospective data and cumulative occurence of events Local survival estimation handles censored event times Tool to evaluate a marker or a survival regression model score R package: survivalroc (see web for doc/validation) Alternative definitions? More general longitudinal marker scenario? Heagerty, Lumley and Pepe (2000) Time-dependent ROC Curves for Censored Survival Data and a Diagnostic Marker Biometrics 133 ISCB 2010

43 [2] Sensitivity and Specificity for Survival Define: Heagerty & Zheng (2005) sensitivity I (c, t) : P (M > c T = t) P (M > c dn(t) = 1) specificity D (c, t) : P (M c T > t) P (M c N(t) = 0) T P I t (c) = P (M > c dn(t) = 1) F P D t (c) = P (M > c N(t) = 0) 134 ISCB 2010

44 [2] Time-dependent ROC Curve An Incident/Dynamic ROC curve shows the ability of a marker to separate the cases (T = t) from the controls (T > t) within a (potential) risk-set (T t) Define curve [ p, ROC I/D t (p) ]: ROC I/D t (p) = T P I { [F P D ] 1 (p) } = T P I (c p ) where c p : p = F P D (c p ) Define AUC as function of time: AU C(t) = ROC I/D t (p) dp 135 ISCB 2010

45 log normal survival example Marker RHO = log(time) 136 ISCB 2010

46 log-normal survival example AT RISK Marker CASE CONTROL log(time) ISCB 2010

47 I/D ROC curve for log normal sensitivity log(t) = specificity 137 ISCB 2010

48 log-normal survival example AT RISK Marker CASE CONTROL log(time) ISCB 2010

49 log-normal survival example AT RISK Marker CASE CONTROL log(time) ISCB 2010

50 log-normal survival example AT RISK Marker CASE CONTROL log(time) ISCB 2010

51 log-normal survival example AT RISK Marker CASE CONTROL log(time) ISCB 2010

52 log-normal survival example AT RISK Marker CASE CONTROL log(time) ISCB 2010

53 I/D ROC curves for log normal sensitivity log(t) = 15 log(t) = 1 log(t) = 05 log(t) = 0 log(t) = 05 log(t) = specificity 138 ISCB 2010

54 AUC(t) and Concordance Q: The I/D ROC curve and AUC(t) provide time-specific summaries of accuracy, but is there a single global summary? Concordance: C = P (M j > M k T j < T k ) C = t AUC(t) w(t) dt with w(t) = 2 f(t) S(t) 139 ISCB 2010

55 AUC(t) and Concordance Time can be restricted to (0, τ) to obtain: C τ = P (M j > M k T j < T k, T j < τ) = τ 0 AUC(t) w τ (t) dt with w τ (t) = w(t)/[1 S 2 (τ)] C is directly related to Kendall s tau, K: Korn and Simon (1990) Harrell, Lee and Mark (1996) C = K/2 + 1/2 140 ISCB 2010

56 AUC(t) curves for log normal AUC(t) w(t) rho = 09, C = 083 rho = 08, C = 078 rho = 07, C = 074 rho = 06, C = time 141 ISCB 2010

57 Estimation: Issues F Pt D (c) = P (M > c T > t) can be estimated non-parametrically for times when i R i(t) moderate-to-large, where R i (t) = 1(Ti > t), at-risk indicator However, estimation of T P I t (c) = P (M > c T = t) requires some sort of smoothing since the observed subset with T i = t may only contain one observation Essentially we are interested in regression quantiles for the marker as a function of time, T = t, but T may be censored (coarsened covariate) The hazard can be used as a bridge to estimate T P I t (p) 142 ISCB 2010

58 Estimation: Proportional Hazards Assume: (M i, T i, i) iid T i = min(t i, C i ), i = 1(T i < C i ) Independent censoring, C i No assumption for marker distribution Hazard Model: λ(t M i ) = λ 0 (t) exp(γ M i ) 143 ISCB 2010

59 Estimation: Proportional Hazards To estimate F P D t (c) we use the empirical distribution for the control set (T > t): F P D t (c) = i 1 n t R i (t+) 1(M i > c) where R i (t+) = 1(T i > t), n t = i R i (t+) 144 ISCB 2010

60 Estimation: Proportional Hazards To estimate T P I t (c) we use the exponential tilt of the empirical distribution for the risk set (T t): T P I t(c) = i π i (t, γ) 1(M i > c) where π i (t, γ) = R i (t) exp(γ M i )/W t W t = i R i (t) exp(γ M i ), R i (t) = 1(T i t) Xu and O Quigley (2000) show that the weights, π i (t, γ), applied to the risk set provide consistent estimation of P (M i > c T i = t) Partial likelihood connections: E(M T = t) 145 ISCB 2010

61 Estimation: Hazard as Bridge A general definition for the hazard is Then using a little algebra yields λ(t M i ) = P (T i = t M i ) P (T i t M i ) P (M i = m T i = t) λ(t M i = m) P (M i = m T i t) 146 ISCB 2010

62 Estimation: Hazard as Bridge A general definition for the hazard is Then using a little algebra yields λ(t M i ) = P (T i = t M i ) P (T i t M i ) P (M i = m T i = t) λ(t M i = m) }{{} P (M i = m T i t) }{{} Estimate = Smooth model + Empirical 146 ISCB 2010

63 Estimation: non-ph Estimation assuming PH can be relaxed using a varying coefficient model: Estimation of γ(t) λ(t M i ) = λ 0 (t) exp[γ(t) M i ] Hastie and Tibshirani (1993) Cai and Sun (2003) [local linear MPLE] simple smoothing of scaled Schoenfeld residuals Only assumes smooth hazard ratios, and linearity in M Linearity in M can be relaxed using functions f(m) 147 ISCB 2010

64 Estimation: non-ph Only the estimate of T P I t (c) is modified: T P I t(c) = i π i [t, γ(t)] 1(M i > c) where π i [t, γ(t)] = R i (t) exp[γ(t) M i ]/W t W t = i R i (t) exp[γ(t) M i ] 148 ISCB 2010

65 Some Simulations Data (M i, log T i ) were generated as bivariate normal with a correlation of ρ = 07 The sample size for each simulated data set was N = 200 The AUC(t) curve and the integrated curve, C τ, was estimated using: maximum likelihood assuming a bivariate normal model Cox model which assumes proportional hazards local maximum partial likelihood for the varying-coefficient model λ(t) = λ 0 (t) exp[γ(t) M i ] local linear smooth of the scaled Schoenfeld residuals to estimate the varying-coefficient model 149 ISCB 2010

66 AUC(t) Estimated Using Local-linear MPL AUC(t) log(time) 150 ISCB 2010

67 40% Censoring MLE Cox model local MPLE residual smooth log time AU C(t) mean (sd) mean (sd) mean (sd) mean (sd) (0019) 0749 (0031) 0859 (0054) 0875 (0048) (0021) 0742 (0029) 0818 (0035) 0827 (0037) (0021) 0732 (0026) 0770 (0035) 0772 (0035) (0020) 0722 (0024) 0724 (0038) 0722 (0039) (0019) 0712 (0024) 0689 (0042) 0687 (0041) (0018) 0702 (0026) 0654 (0045) 0655 (0043) (0016) 0689 (0035) 0633 (0057) 0637 (0048) (0015) 0653 (0055) 0617 (0075) 0614 (0051) (0013) 0560 (0073) 0555 (0075) 0546 (0058) C τ (0017) 0727 (0022) 0740 (0021) 0742 (0021) 151 ISCB 2010

68 Illustration: PBC Data Marker, M i, derived as linear predictor from Cox model: Covariate estimate se Z log(bilirubin) log(prothrombin time) edema albumin age Accuracy evaluated using λ 0 (t) exp[γ(t) M i ] local-linear MPLE for γ(t) (5) predictor model compared to (4) predictor excluding bilirubin 152 ISCB 2010

69 I/D ROC curves for PBC model score sensitivity year = 1 year = 2 year = 3 year = 4 year = specificity ISCB 2010

70 Illustration: PBC Data AUC based on Varying-Coefficient Cox model AUC w(t) 5 covariates: iauc = covariates: iauc = Time (days) 153 ISCB 2010

71 Summary Extension of ROC concepts to risk sets Estimation based on hazard model Varying coefficient simple methods look promising All summaries can be obtained from routine Cox model output Criterion for marker and/or model comparison Separation of marker generation and marker evaluation Note: R 2 for PBC data estimated as 032 (max possible 098) Heagerty and Zheng (2005) Survival Model Predictive Accuracy and ROC Curves Biometrics 154 ISCB 2010

72 Extension to Longitudinal Markers TP, FP based on longitudinal marker T P I t (c) = P [M(t) > c T = t] F P D t (c) = P [M(t) > c T > t] No simple concordance, but AUC(t) w(t)dt Dynamic criterion, c p (t), controls specificity c p (t) : p = P [M(t) > c p (t) T > t] Summary ROC curve shows total percent test positive when controlling FP using dynamic criterion 155 ISCB 2010

73 156 ISCB 2010

74 Illustration: CFF Data and Longitudinal Markers Split sample (development / validation) Time-dependent covariate Cox model Covariate estimate se Z fev fev1(t2) fev1(t3) gender weightz heightz Accuracy evaluated using λ 0 (t) exp[γ(t) M i (t)] Measurement spacing: median = 100, Q1 = 087, Q3 = ISCB 2010

75 CFF Survival Survival Time 158 ISCB 2010

76 Incident / Dynamic ROC for CFF Data True Positive Rate Age 10 Age 15 Age 20 Age 25 Age False Positive Rate 159 ISCB 2010

77 Summary Survival ROC Curve Dynamic criterion Saha and Heagerty (manuscript) c p (t) : p = P [M(t) > c p (t) T > t] Q: If a fixed time-dependent FP rate of p is used then what percent of cases will test positive at the appropriate time? Total Sensitivity = P [M(t) > c p (t) T = t] P [T = t]dt ROC(p) = E T [ ROC I/D T (p) ] t Note: Consider a time lag, L, such that the marker used to model the hazard is M(t L) This leads to test positive L units before failure 160 ISCB 2010

78 Time (years) model score Threshold functions for FP = 001, 005, and ISCB 2010

79 Summary Survival ROC for CFF Data sensitivity fev1, weight weight specificity 162 ISCB 2010

80 AUC(t) for CFF Data AUC(t) Time 163 ISCB 2010

81 Summary Accuracy summary ROC I/D t (p) : vary (M,t) AU C(t) : vary (t) ROC(p) : vary (M) C : global 164 ISCB 2010

82 Summary Longitudinal data as predictors of an event time Accuracy using sequential binary classification Connections to partial likelihood Software R code survivalroc R code risksetroc Now: Inference Future: Relax strong censoring assumption 165 ISCB 2010

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Patrick J. Heagerty PhD Department of Biostatistics University of Washington 166 ISCB 2010 Session Four Outline Examples