Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Nicholas C. Henderson, Thomas A. Louis, Gary Rosner, Ravi Varadhan
Johns Hopkins University
July 31, 2018

Heterogeneity of Treatment Effect (HTE)
- Heterogeneity of Treatment Effect (HTE) refers to variability in treatment effect that is attributable to observable differences in patient characteristics.
- Accurate evaluation of HTE offers many potential benefits, including informing patient decision-making and appropriately targeting existing therapies.
- Often, HTE is analyzed mainly to examine the consistency of treatment effect across key patient subgroups. This is usually done through subgroup analyses or tests for treatment-covariate interactions.

HTE Goals/Questions
Characterizing and utilizing HTE encompasses a wide range of related goals and questions. Many of these key questions go beyond what can be addressed through conventional subgroup analysis. Key questions of interest include:
- Quantifying overall heterogeneity in treatment response.
- Estimating the proportion of patients who benefit from treatment.
- Detecting cross-over (qualitative) interactions.
- Estimating individualized treatment effects.

Modeling HTE
- In contrast to subgroup analysis, many important HTE questions could be directly addressed if a sufficiently rich model describing patient outcomes were available.
- Bayesian nonparametric methods are well-suited to provide this individual-level view of HTE.
- Bayesian nonparametrics allow construction of flexible models for patient outcomes, coupled with probability modeling of all unknown quantities.
Motivation of this work: develop a flexible, nonparametric approach that can address many of the previously highlighted HTE goals.

Why Bayes?
- Emphasizes estimation of treatment effect heterogeneity rather than hypothesis testing.
- Well-suited to estimation with many parameters and small subgroups. Tends to shrink estimates when data are sparse.
- Direct probability statements for questions of interest: e.g., what is the probability that a given individual will benefit from the treatment?
- Customized treatment recommendations: can utilize the posterior for each individual and can directly weigh efficacy versus safety.

Time-to-Event Data and Notation
Our focus here is on cases where patient outcomes are times to events: $T_1, \ldots, T_n$. For the $i$th patient, we observe:
- $Y_i$ = duration of follow-up, $Y_i = \min\{T_i, C_i\}$, where $C_i$ is the censoring time.
- $\delta_i$ = event indicator: $\delta_i = 1$ if the failure time is observed, and $\delta_i = 0$ if the outcome is right-censored.
- $A_i$ = treatment assignment, $A_i = 1$ or $A_i = 0$.
- $x_i$ = a collection of baseline covariates.
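
To make the notation concrete, here is a minimal sketch that simulates right-censored observations. The exponential failure times, the uniform censoring times, and the function name `simulate_censored` are illustrative assumptions, not part of the talk.

```python
import random

def simulate_censored(n, seed=0):
    """Simulate (Y_i, delta_i, A_i) from latent failure and censoring times."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        a = rng.randint(0, 1)                         # treatment assignment A_i
        mean_time = 10.0 + 5.0 * a                    # assumed mean failure time per arm
        t = rng.expovariate(1.0 / mean_time)          # latent failure time T_i (assumed exponential)
        c = rng.uniform(0.0, 30.0)                    # censoring time C_i (assumed uniform)
        y = min(t, c)                                 # observed follow-up Y_i = min{T_i, C_i}
        delta = 1 if t <= c else 0                    # 1 if failure observed, 0 if right-censored
        records.append((y, delta, a))
    return records

data = simulate_censored(200)
n_events = sum(delta for _, delta, _ in data)
print(f"{n_events} of {len(data)} failure times observed")
```

Only `Y_i`, `delta_i`, `A_i` and the covariates reach the analyst; the latent `T_i` is hidden whenever `delta_i = 0`.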

Accelerated Failure Time (AFT) Models and Individualized Treatment Effects
We assume patients are randomly assigned to one of two treatments, $A = 1$ or $A = 0$. To explore HTE in this trial, we consider the AFT model for the log-failure time $\log T_i$:
$$\log T_i = \underbrace{m(A_i, x_i)}_{\text{Regression Function}} + \underbrace{W_i}_{\text{Error Term}}$$
The error term is assumed to satisfy the mean-zero constraint $E(W_i) = 0$.

Accelerated Failure Time (AFT) Models
$$\log T_i = \underbrace{m(A_i, x_i)}_{\text{Regression Function}} + \underbrace{W_i}_{\text{Error Term}}$$
- In contrast to Cox PH models, AFT models provide a direct, generative model linking survival times and patient covariates.
- AFT models have a natural interpretation as a regression with log-time as the response.
- They provide interpretable measures of treatment effect: i.e., differences in expected log-survival time or ratios in expected survival.

Accelerated Failure Time (AFT) Models and Individualized Treatment Effects
The Individualized Treatment Effect (ITE) $\theta(x_i)$ for the $i$th patient is the difference between expected log-failure times:
$$\theta(x_i) = E[\log T_i \mid A_i = 1, x_i] - E[\log T_i \mid A_i = 0, x_i] = m(1, x_i) - m(0, x_i).$$
The ITE $\theta(x_i)$ represents the expected treatment effect for a patient with covariate vector $x_i$.
The ratio of expected failure times offers a more interpretable measure of treatment effect:
$$\xi(x_i) = \frac{E[T_i \mid A_i = 1, x_i]}{E[T_i \mid A_i = 0, x_i]} = \exp\{\theta(x_i)\}$$
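
The relation between the two effect measures can be checked with a small sketch. The regression function `m` below is hypothetical, and the Normal(0, sigma^2) error is assumed only because it gives a closed form for the expected failure time; neither comes from the talk.

```python
import math

def m(a, x):
    """Hypothetical regression function for the log failure time (illustration only)."""
    age, ejection_fraction = x
    return 6.0 - 0.01 * age + a * (0.2 + 0.02 * (35.0 - ejection_fraction))

def ite(x):
    """theta(x) = m(1, x) - m(0, x)."""
    return m(1, x) - m(0, x)

def expected_time(a, x, sigma=0.5):
    # Assumption: W ~ Normal(0, sigma^2), so E[T | A, x] = exp(m(A, x) + sigma^2 / 2).
    return math.exp(m(a, x) + 0.5 * sigma ** 2)

x = (60.0, 25.0)                                   # hypothetical patient: age 60, ejection fraction 25
theta = ite(x)                                     # difference in expected log failure time
xi = expected_time(1, x) / expected_time(0, x)     # ratio of expected failure times
print(theta, xi, math.exp(theta))
```

The factor involving the error distribution is the same in both arms and cancels in the ratio, which is why `xi` equals `exp(theta)` whatever value `sigma` takes.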

Modeling the Regression Function
Our flexible approach to modeling the regression function $m(A_i, x_i)$ builds on Bayesian additive regression trees (BART). Advantages of BART for ITE estimation:
- Good at handling interactions and non-linearities.
- Very effective as an off-the-shelf method: works quite well without any hyperparameter tuning.
- Shown to be effective in causal inference settings.
- Seamlessly incorporates both discrete and continuous predictors.
- Automatically provides measures of uncertainty despite the complex nature of the model.

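A Fully Nonparametric AFT model
$$\log T_i = \underbrace{m(A_i, x_i)}_{\text{Regression Function}} + \underbrace{W_i}_{\text{Error Term}}$$
Choosing a parametric form for the distribution of $W_i$ may be too restrictive on the form of the baseline hazard function. Instead, assume the density $f_W$ of $W_i$ is a location mixture of normals, where $\phi$ is the standard normal density:
$$f_W(w \mid G, \sigma) = \frac{1}{\sigma}\int \phi\left(\frac{w - \tau}{\sigma}\right) dG(\tau)$$
We place a centered Dirichlet process (CDP) prior on $G$:
$$G \sim \text{CDP}(G_0, M), \qquad M \sim \text{Gamma}(\psi_1, \psi_2), \qquad G_0 = \text{Normal}(0, \sigma_\tau^2)$$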
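
A rough sketch of one draw of the mixing distribution G may help fix ideas. It assumes a truncated stick-breaking representation with K atoms and assumes that centering means subtracting the mean of G from every atom; the talk does not spell out either choice, and all function names are hypothetical.

```python
import random

def centered_dp_draw(M=1.0, sigma_tau=1.0, K=50, seed=1):
    """One truncated stick-breaking draw of G, shifted so that its mean is zero."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, M)        # stick-breaking proportion
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)              # last atom absorbs the leftover mass
    atoms = [rng.gauss(0.0, sigma_tau) for _ in range(K)]   # tau_k drawn from G_0
    mean = sum(w * t for w, t in zip(weights, atoms))
    atoms = [t - mean for t in atoms]      # assumed centering step: mean of G becomes 0
    return weights, atoms

def sample_error(weights, atoms, sigma, rng):
    """Draw W from the normal location mixture defined by (weights, atoms, sigma)."""
    tau = rng.choices(atoms, weights=weights, k=1)[0]
    return rng.gauss(tau, sigma)

weights, atoms = centered_dp_draw()
rng = random.Random(7)
errors = [sample_error(weights, atoms, 0.5, rng) for _ in range(1000)]
```

With the atoms centered, the mixture has mean zero, which matches the constraint E(W_i) = 0 imposed on the error term.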

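A Fully Nonparametric AFT model
While very flexible, our AFT model does entail certain assumptions about patient survival.
- Survival function: $S(t \mid A_i, x_i) = 1 - F_W\big(\log t - m(A_i, x_i)\big)$
- Survival curves across individuals cannot cross: because $F_W$ is nondecreasing, a patient with a larger value of $m(A_i, x_i)$ has a survival probability at least as high at every time $t$.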
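
The non-crossing property can be checked numerically. The sketch below uses a Normal(0, sigma^2) error as a stand-in for the nonparametric $F_W$; that choice and the two values of m are assumptions made only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def survival(t, m_value, sigma=1.0):
    """S(t) = 1 - F_W(log t - m), with W ~ Normal(0, sigma^2) as a stand-in."""
    return 1.0 - normal_cdf((math.log(t) - m_value) / sigma)

times = [10.0 * k for k in range(1, 151)]            # grid of follow-up times
curve_low = [survival(t, 5.5) for t in times]        # patient with smaller m(A, x)
curve_high = [survival(t, 6.0) for t in times]       # patient with larger m(A, x)

# The curve for the larger m lies on or above the other at every grid point.
never_cross = all(h >= l for h, l in zip(curve_high, curve_low))
print(never_cross)
```

Any other error distribution could replace `normal_cdf`; the ordering depends only on the CDF being nondecreasing and shared by all patients.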

Using the Nonparametric AFT model to assess HTE
The posterior distribution of all unknowns in the AFT model can be used to assess a variety of questions. For example:
- Point estimates of covariate-specific treatment effects.
- The distribution of treatment effects.
- The proportion of patients expected to benefit from treatment.
- Qualitative interactions.
The posterior can potentially be utilized in an individualized decision analysis.

Application: The SOLVD trials
- Two large placebo-controlled trials studying the efficacy of the drug enalapril in chronic heart failure patients.
- 2,569 patients were enrolled in the treatment trial and 4,228 in the prevention trial.
- We utilized 18 patient covariates common to both trials (e.g., age, gender, ejection fraction).
- Primary outcome: time to death or first hospitalization.

The SOLVD trials: basic summary statistics
- Enalapril was found to be effective in both trials.
- In the treatment trial, there were 510 primary events in the enalapril treatment arm and 452 primary events in the control arm.
- Hazard ratio in the treatment trial: 0.73, [0.64, 0.82]

Individualized treatment effect estimates for all patients in the SOLVD trial.
[Figure: axes are patient index and difference in expected log failure time.]

Distribution of treatment effects in the SOLVD trial.
[Figure: density curves for the Treatment Trial and the Prevention Trial; x-axis: treatment effect (ratio in expected survival); y-axis: density.]

Proportion Benefiting
The proportion of patients benefiting is the proportion of individuals with a positive treatment effect (i.e., $\theta(x_i) > 0$):
$$Q = \frac{1}{n}\sum_{i=1}^{n} 1\{\theta(x_i) > 0\} = \frac{1}{n}\sum_{i=1}^{n} 1\{\xi(x_i) > 1\}$$
Alternatively, one could define the proportion benefiting relative to a clinically relevant threshold $\varepsilon > 0$, i.e., $Q_\varepsilon = \frac{1}{n}\sum_{i=1}^{n} 1\{\theta(x_i) > \varepsilon\}$.
An estimate of $Q$ is obtained by taking the area under the curve to the right of 1 in the graph of the treatment effect distribution (shown on the previous slide).
The estimated percentage of patients benefiting was 96% in the treatment trial and 89% in the prevention trial.
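
Given posterior draws of the individualized effects, the proportion benefiting can be computed draw by draw, which yields a posterior sample of Q. The sketch below uses synthetic draws as a stand-in for real model output; the layout (one list of theta values per posterior draw) and the function name are assumptions.

```python
import random

def proportion_benefiting(theta_draw, eps=0.0):
    """Q_eps for one posterior draw: fraction of patients with theta(x_i) > eps."""
    return sum(1 for th in theta_draw if th > eps) / len(theta_draw)

# Synthetic stand-in for posterior draws of theta(x_i); not SOLVD output.
rng = random.Random(2)
n_patients, n_draws = 100, 500
patient_means = [rng.gauss(0.4, 0.3) for _ in range(n_patients)]
theta_draws = [[rng.gauss(mu, 0.2) for mu in patient_means] for _ in range(n_draws)]

q_draws = [proportion_benefiting(d) for d in theta_draws]               # posterior sample of Q
q_eps_draws = [proportion_benefiting(d, eps=0.1) for d in theta_draws]  # threshold eps = 0.1
q_post_mean = sum(q_draws) / n_draws
print(round(q_post_mean, 3))
```

Because Q is evaluated within each draw, its posterior uncertainty is available alongside the point summary.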

Posterior of Proportion Benefiting in the SOLVD trials.
[Figure: posterior densities for the Treatment Trial and the Prevention Trial; x-axis: proportion benefiting; y-axis: density.]

Local Evidence of Benefit
Posterior probabilities of treatment benefit, $P\{\xi(x_i) > 1 \mid y, \delta\}$. Entries are percentages of patients in each trial (each column sums to 100).

Range of P{ξ(x_i) > 1 | y, δ}    Treatment Trial    Prev. Trial
(0.99, 1]                        51.38              20.47
(0.95, 0.99]                     24.69              23.71
(0.75, 0.95]                     20.08              41.98
[0, 0.75]                        3.85               13.84
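
A table of this kind can be assembled from posterior draws in two steps: compute each patient's posterior probability of benefit, then count patients per probability band. The sketch below runs on synthetic draws, so its output does not reproduce the SOLVD numbers; the function names are hypothetical.

```python
import random

def benefit_probabilities(theta_draws):
    """Per-patient share of posterior draws with theta(x_i) > 0, i.e. xi(x_i) > 1."""
    n_draws = len(theta_draws)
    n_patients = len(theta_draws[0])
    return [
        sum(1 for draw in theta_draws if draw[i] > 0.0) / n_draws
        for i in range(n_patients)
    ]

def bin_percentages(probs):
    """Percentage of patients falling in each evidence band."""
    counts = {"(0.99, 1]": 0, "(0.95, 0.99]": 0, "(0.75, 0.95]": 0, "[0, 0.75]": 0}
    for p in probs:
        if p > 0.99:
            counts["(0.99, 1]"] += 1
        elif p > 0.95:
            counts["(0.95, 0.99]"] += 1
        elif p > 0.75:
            counts["(0.75, 0.95]"] += 1
        else:
            counts["[0, 0.75]"] += 1
    return {band: 100.0 * c / len(probs) for band, c in counts.items()}

# Synthetic stand-in for posterior draws of theta(x_i); not SOLVD output.
rng = random.Random(3)
means = [rng.gauss(0.4, 0.3) for _ in range(80)]
theta_draws = [[rng.gauss(mu, 0.2) for mu in means] for _ in range(400)]

probs = benefit_probabilities(theta_draws)
table = bin_percentages(probs)
print(table)
```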

Evidence of Differential Treatment Effect
For patient $i$, the posterior probability of a greater than average treatment effect may be defined as
$$D_i = P\{\theta(x_i) \geq \bar{\theta} \mid \text{data}\}, \qquad \bar{\theta} = \frac{1}{n}\sum_{i=1}^{n} \theta(x_i),$$
and the posterior probability of a differential treatment effect is
$$D_i^* = \max\{1 - 2D_i,\ 2D_i - 1\}.$$
Note that $D_i^*$ will be close to 1 whenever $D_i$ is either close to 1 or close to 0.

                                      Trt. Trial    Prev. Trial
Strong Evidence: D_i^* > 0.95         19.4%         7.3%
Moderate Evidence: D_i^* > 0.80       41.9%         31.6%
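
The two quantities on this slide can be estimated from posterior draws as sketched below. The sketch assumes the average effect is recomputed within each posterior draw and uses synthetic draws; names such as `differential_evidence` are hypothetical.

```python
import random

def differential_evidence(theta_draws):
    """Return (D, D_star): P{theta(x_i) >= average | data} and max{1 - 2D, 2D - 1}."""
    n_draws = len(theta_draws)
    n_patients = len(theta_draws[0])
    counts = [0] * n_patients
    for draw in theta_draws:
        avg = sum(draw) / n_patients          # average effect within this draw (assumption)
        for i, th in enumerate(draw):
            if th >= avg:
                counts[i] += 1
    d = [c / n_draws for c in counts]
    d_star = [max(1.0 - 2.0 * di, 2.0 * di - 1.0) for di in d]
    return d, d_star

# Synthetic stand-in for posterior draws of theta(x_i); not SOLVD output.
rng = random.Random(4)
means = [rng.gauss(0.4, 0.3) for _ in range(60)]
theta_draws = [[rng.gauss(mu, 0.2) for mu in means] for _ in range(400)]

d, d_star = differential_evidence(theta_draws)
strong = 100.0 * sum(1 for v in d_star if v > 0.95) / len(d_star)
moderate = 100.0 * sum(1 for v in d_star if v > 0.80) / len(d_star)
print(strong, moderate)
```

Since max{1 - 2D, 2D - 1} equals |2D - 1|, the second quantity measures how far a patient's D lies from 1/2 in either direction.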

Individual-Specific Posterior Survival Curves in SOLVD
[Figure: survival probability against time; curves for Enalapril and Placebo, with the Enalapril and Placebo Kaplan-Meier (KM) estimates overlaid.]

Examining Qualitative Interactions
- Beyond quantitative heterogeneity, examination of qualitative interactions is often of key interest.
- Qualitative interaction: occurs when the treatment effect in one subgroup has a different sign than in another subgroup.
- The presence of qualitative interactions can be examined by looking at the posterior histogram.
- For pre-specified subgroups of interest, such as male vs. female, we can look at the posteriors of the subgroup-level treatment effects
$$\theta_{male} = \frac{1}{N_{male}} \sum_{i \in \text{male}} \theta(x_i), \qquad \theta_{female} = \frac{1}{N_{female}} \sum_{i \in \text{female}} \theta(x_i)$$

SOLVD Treatment: posterior of $\theta_{male}$ and $\theta_{female}$
$$P\{\text{sign}(\theta_{male}) \neq \text{sign}(\theta_{female}) \mid \text{data}\} = 0.13$$
[Figure: posterior densities for Male and Female; x-axis: difference in log survival time (days); y-axis: density.]
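
The subgroup-level effects and the posterior probability that they differ in sign can be computed from posterior draws along these lines. The draws and the male/female labels below are synthetic, and the function names are hypothetical.

```python
import random

def subgroup_effect(draw, members):
    """Average of theta(x_i) over the patients in one subgroup, for one draw."""
    return sum(draw[i] for i in members) / len(members)

def prob_opposite_sign(theta_draws, group_a, group_b):
    """Share of posterior draws in which the two subgroup effects have different signs."""
    hits = 0
    for draw in theta_draws:
        effect_a = subgroup_effect(draw, group_a)
        effect_b = subgroup_effect(draw, group_b)
        if (effect_a > 0.0) != (effect_b > 0.0):
            hits += 1
    return hits / len(theta_draws)

# Synthetic stand-in: 40 male and 20 female patients; not SOLVD output.
rng = random.Random(5)
male = list(range(0, 40))
female = list(range(40, 60))
theta_draws = []
for _ in range(500):
    shift = rng.gauss(0.0, 0.15)              # draw-level uncertainty shared by all patients
    draw = [rng.gauss(0.30 + shift, 0.5) for _ in male]
    draw += [rng.gauss(0.05 + shift, 0.5) for _ in female]
    theta_draws.append(draw)

p_qualitative = prob_opposite_sign(theta_draws, male, female)
print(p_qualitative)
```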

Variable Importance: Partial Dependence Plots
[Figure: two partial dependence panels showing difference in log survival against age and against ejection fraction.]

Variable Importance for Treatment-Covariate Interactions
Which covariates are important in driving differences in treatment effect? (prognostic vs. predictive)
- If there are no treatment-covariate interactions, $m(1, x) - m(0, x)$ should not depend on the value of $x$.
- The treatment effect $\theta(x) = m(1, x) - m(0, x)$ should depend only on predictive covariates.
One approach: run some form of regression with $\hat{m}(1, x_i) - \hat{m}(0, x_i)$ as the responses. For example:
- Run CART to find patient subgroups.
- Perform a variable selection procedure to find a parsimonious model.
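
As a simplified illustration of regressing estimated effects on covariates, the sketch below fits a one-covariate least-squares line and reports the slope with its t-statistic. The single covariate, the synthetic effect estimates, and the function name `simple_ols` are assumptions; the analysis in the talk uses several covariates.

```python
import math
import random

def simple_ols(x, y):
    """Least-squares slope of y on x (with intercept) and the slope's t-statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    return slope, slope / std_err

# Synthetic stand-in: effect estimates that decrease with ejection fraction.
rng = random.Random(6)
ejection_fraction = [rng.uniform(10.0, 35.0) for _ in range(300)]
theta_hat = [1.0 - 0.02 * ef + rng.gauss(0.0, 0.05) for ef in ejection_fraction]

slope, t_value = simple_ols(ejection_fraction, theta_hat)
print(round(slope, 4), round(t_value, 2))
```

A large absolute t-statistic flags a covariate along which the estimated treatment effect varies, which is the sense in which a covariate is predictive rather than merely prognostic.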

Variable Importance
Regression with $\hat{m}(1, x_i) - \hat{m}(0, x_i)$ as the responses. Variables are sorted by the absolute value of the associated t-statistic.

Variable                     Estimate    t value
ejection fraction            -0.0211     -96.67
gender                        0.0983      25.62
diabetic                      0.0667      19.62
chronic pulmonary disease    -0.0401      -9.086
creatinine                    0.0339       8.54

The AFTrees package
The methods discussed here are implemented in the AFTrees package.

    ## An example:
    library(AFTrees)
    solvd.fit <- AFTrees(X, y, status, ndpost = 2000)
    ## X - design matrix
    ## y - follow-up time
    ## status - event indicator (1 if event, 0 otherwise)
    ## ndpost - number of posterior draws

The AFTrees package is available for download at https://github.com/nchenderson/aftrees

Thanks
Acknowledgements: This work was supported through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-1303-5896).