Estimation of Optimal Treatment Regimes Via Machine Learning

Marie Davidian
Department of Statistics, North Carolina State University

Triangle Machine Learning Day, April 3, 2018
Precision medicine

The right treatment for the right patient at the right time
Precision medicine

Patient heterogeneity:
- Genetic/genomic profiles
- Demographic, physiological characteristics
- Medical history, concomitant conditions
- Environment, lifestyle factors
- Adverse reactions, adherence to prior treatment, ...

Fundamental premise: A patient's characteristics are implicated in which treatment options s/he should receive
Clinical decision-making

Clinical practice: Clinicians make a series of treatment decisions over the course of a patient's disease or disorder
- Key decision points in the disease/disorder process
- Fixed schedule, milestones, events necessitating a decision
- Multiple treatment options at each decision point
- Synthesize all information on the patient to decide on an option

Goal: Make the best decisions leading to the most beneficial expected clinical outcome for this patient given his/her characteristics
Example: Acute leukemia

Two decision points:
- Decision 1: Induction chemotherapy (2 options: $C_1$, $C_2$)
- Decision 2: Maintenance treatment for patients who respond (2 options: $M_1$, $M_2$); salvage chemotherapy for those who don't respond (2 options: $S_1$, $S_2$)

Clinical outcome: Progression-free or overall survival time
Treatment regime

Precision medicine: Formalize clinical decision-making and make it evidence-based

At each decision point, we would like a formal rule that takes as input all available information on the patient to that point and outputs a recommended treatment action from among the possible, feasible options

Treatment regime: A set of decision rules, each corresponding to a decision point
- Also known as a dynamic treatment regime, adaptive treatment strategy, adaptive intervention, or treatment policy
Two-decision regime: Acute leukemia

At baseline: Information $x_1$, accrued information $h_1 = x_1 \in \mathcal{H}_1$

Decision 1: Set of options $\mathcal{A}_1 = \{C_1, C_2\}$; rule 1: $d_1(h_1): \mathcal{H}_1 \to \mathcal{A}_1$

Between Decisions 1 and 2: Collect additional information $x_2$, including responder status; accrued information $h_2 = (x_1, \text{chemotherapy at Decision 1}, x_2) \in \mathcal{H}_2$

Decision 2: Set of options $\mathcal{A}_2 = \{M_1, M_2, S_1, S_2\}$; rule 2: $d_2(h_2): \mathcal{H}_2 \to \{M_1, M_2\}$ (responder), $d_2(h_2): \mathcal{H}_2 \to \{S_1, S_2\}$ (nonresponder)

Treatment regime: $d = \{d_1(h_1), d_2(h_2)\} = (d_1, d_2)$
In general

Treatment regime with $K$ decision points:
- Baseline information $x_1 \in \mathcal{X}_1$; intermediate information $x_k \in \mathcal{X}_k$ between Decisions $k-1$ and $k$, $k = 2, \ldots, K$
- Set of treatment options $\mathcal{A}_k$ at Decision $k$, elements $a_k \in \mathcal{A}_k$
- Accrued information or history $h_1 = x_1 \in \mathcal{H}_1$, $h_k = (x_1, a_1, \ldots, x_{k-1}, a_{k-1}, x_k) \in \mathcal{H}_k$, $k = 2, \ldots, K$
- Decision rules $d_1(h_1), d_2(h_2), \ldots, d_K(h_K)$, with $d_k: \mathcal{H}_k \to \mathcal{A}_k$

Treatment regime: $d = \{d_1(h_1), \ldots, d_K(h_K)\} = (d_1, d_2, \ldots, d_K)$

Class of all possible $K$-decision regimes: $\mathcal{D}$
Optimal treatment regime

Goal: Find the best or optimal regime in $\mathcal{D}$: $d^{opt} = (d_1^{opt}, \ldots, d_K^{opt})$

Assume: There is a clinical outcome by which treatment benefit can be assessed
- Survival time, CD4 count, ...
- Coded so that larger is better

Causal inference perspective...
Optimal treatment regime

Potential outcomes: For any regime $d \in \mathcal{D}$, $Y^*(d)$ = the outcome a patient would achieve if s/he were to receive treatment according to the rules in $d$

Value of $d$: $V(d) = E\{Y^*(d)\}$, the population average outcome if all patients in the population were to receive treatment options according to $d$

Optimal regime: $d^{opt} = \arg\max_{d \in \mathcal{D}} V(d)$; i.e., $E\{Y^*(d)\} \leq E\{Y^*(d^{opt})\}$ for all $d \in \mathcal{D}$
Optimal treatment regime

Challenge: Can we estimate $d^{opt} = \arg\max_{d \in \mathcal{D}} V(d)$ from data?
- From a randomized clinical trial or observational database
- $d^{opt}$ is defined in terms of potential outcomes
- Must be able to express the definition of $d^{opt}$ equivalently in terms of the observed data
Statistical framework

Simplest setting: A single decision with two treatment options, $\mathcal{A}_1 = \{0, 1\}$

Treatment regime: $d \in \mathcal{D}$ comprises a single rule $d_1$, $d = \{d_1(h_1)\}$

Data: Independent and identically distributed (iid) $(X_{1i}, A_{1i}, Y_i)$, $i = 1, \ldots, n$
- $n$ subjects indexed by $i$
- $X_{1i}$ = baseline information observed on subject $i$
- $A_{1i}$ = treatment option in $\mathcal{A}_1$ actually received by subject $i$
- $Y_i$ = observed outcome for subject $i$

History for subject $i$: $H_{1i} = X_{1i}$
Assumptions

Consistency: $Y = Y^*(1) I(A_1 = 1) + Y^*(0) I(A_1 = 0)$

Positivity: $\mathrm{pr}(A_1 = a_1 \mid H_1 = h_1) > 0$ for all $h_1 \in \mathcal{H}_1$, $a_1 = 0, 1$

No unmeasured confounders: $\{Y^*(1), Y^*(0)\} \perp A_1 \mid H_1$
- The history $H_1$ contains all information used to assign treatments in the observed data
- Automatically satisfied for data from a randomized trial
- Standard but unverifiable assumption for observational studies
Value of a regime

Under these assumptions, it can be shown that
$$V(d) = E\{Y^*(d)\} = E\big[E\{Y^*(1) \mid H_1\} I\{d_1(H_1) = 1\} + E\{Y^*(0) \mid H_1\} I\{d_1(H_1) = 0\}\big]$$
$$= E\big[E(Y \mid H_1, A_1 = 1) I\{d_1(H_1) = 1\} + E(Y \mid H_1, A_1 = 0) I\{d_1(H_1) = 0\}\big]$$
$$= E\big[Q_1(H_1, 1) I\{d_1(H_1) = 1\} + Q_1(H_1, 0) I\{d_1(H_1) = 0\}\big],$$
where $Q_1(h_1, a_1) = E(Y \mid H_1 = h_1, A_1 = a_1)$.

Implies: Optimal regime
$$d_1^{opt}(h_1) = I\{Q_1(h_1, 1) \geq Q_1(h_1, 0)\} = I\{Q_1(h_1, 1) - Q_1(h_1, 0) \geq 0\} = I\{C_1(h_1) \geq 0\},$$
where $C_1(h_1) = Q_1(h_1, 1) - Q_1(h_1, 0)$ is the contrast function.
Regression estimator for optimal regime

Regression model: $Q_1(h_1, a_1; \beta_1)$; fitted model $Q_1(h_1, a_1; \widehat{\beta}_1)$

Estimated optimal regime: $\widehat{d}_1^{\,opt}(h_1) = I\{Q_1(h_1, 1; \widehat{\beta}_1) - Q_1(h_1, 0; \widehat{\beta}_1) \geq 0\}$

This is the simplest form of Q-learning

Concern: Misspecification of the regression model
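For concreteness, a minimal Python sketch of this single-decision Q-learning step, assuming a linear working model for $Q_1$ with treatment-covariate interactions; the names fit_q1, q1_hat, and d1_opt_hat are illustrative, not from the talk:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_q1(X1, A1, Y):
    # Posit Q_1(h_1, a_1; beta_1) linear in (h_1, a_1, h_1*a_1), fit by least squares
    Z = np.column_stack([X1, A1, X1 * A1[:, None]])
    return LinearRegression().fit(Z, Y)

def q1_hat(model, X1, a1):
    # Fitted Q_1(h_1, a_1; beta_1_hat) with treatment set to a1 for everyone
    a = np.full(X1.shape[0], a1, dtype=float)
    Z = np.column_stack([X1, a, X1 * a[:, None]])
    return model.predict(Z)

def d1_opt_hat(model, X1):
    # Estimated rule: treat (a_1 = 1) when the fitted contrast is nonnegative
    return (q1_hat(model, X1, 1) - q1_hat(model, X1, 0) >= 0).astype(int)
```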
Direct/policy search estimator for optimal regime

Restricted class of regimes $\mathcal{D}_\eta$: Indexed by $\eta_1$, $d_\eta = \{d_1(h_1; \eta_1)\}$, $\eta = \eta_1$
- Motivated by a regression model; e.g., for $h_1 = (x_{11}, x_{12})$, $d_1(h_1; \eta_1) = I(\eta_{11} + \eta_{12} x_{11} + \eta_{13} x_{12} \geq 0)$, $\eta_1 = (\eta_{11}, \eta_{12}, \eta_{13})^T$
- Based on cost, feasibility in practice, interpretability; e.g., $d_1(h_1; \eta_1) = I(x_{11} < \eta_{11}, x_{12} < \eta_{12})$, $\eta_1 = (\eta_{11}, \eta_{12})^T$
- Or $d_1(h_1; \eta_1)$ in the form of a list (if-then-else clauses), as sketched in code below

Optimal restricted regime: $d_\eta^{opt} = \{d_1(h_1; \eta_1^{opt})\}$, $\eta_1^{opt} = \arg\max_{\eta_1} V(d_\eta)$
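A sketch of the first two parametric rule classes above; eta1 is assumed to be a NumPy array, and the function names are hypothetical:

```python
import numpy as np

def d1_linear(X1, eta1):
    # d_1(h_1; eta_1) = I(eta_11 + eta_12*x_11 + eta_13*x_12 >= 0)
    # X1: (n, 2) array with columns (x_11, x_12); eta1 = (eta_11, eta_12, eta_13)
    return (eta1[0] + X1 @ eta1[1:] >= 0).astype(int)

def d1_threshold(X1, eta1):
    # d_1(h_1; eta_1) = I(x_11 < eta_11, x_12 < eta_12); eta1 = (eta_11, eta_12)
    return ((X1[:, 0] < eta1[0]) & (X1[:, 1] < eta1[1])).astype(int)
```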
Direct/policy search estimator for optimal regime

Optimal restricted regime: $d_\eta^{opt} = \{d_1(h_1; \eta_1^{opt})\}$, $\eta_1^{opt} = \arg\max_{\eta_1} V(d_\eta)$

Suggests:
- Obtain an estimator $\widehat{V}(d_\eta)$ of $V(d_\eta)$ for any fixed $\eta_1$
- Treat $\widehat{V}(d_\eta)$ as a function of $\eta_1$ and maximize in $\eta_1$

That is, estimate $\eta_1^{opt}$ by $\widehat{\eta}_1^{\,opt} = \arg\max_{\eta_1} \widehat{V}(d_\eta)$, yielding $\widehat{d}_\eta^{\,opt} = \{d_1(h_1; \widehat{\eta}_1^{\,opt})\}$
Inverse probability weighted value estimators

Define:
- Consistency indicator $\mathcal{C}_{d_\eta} = I\{A_1 = d_1(H_1; \eta_1)\}$
- Propensity of treatment consistent with $d_\eta$: $\pi_{d_\eta,1}(H_1; \eta_1) = \mathrm{pr}(\mathcal{C}_{d_\eta} = 1 \mid H_1) = \pi_1(H_1) I\{d_1(H_1; \eta_1) = 1\} + \{1 - \pi_1(H_1)\} I\{d_1(H_1; \eta_1) = 0\}$
- $\pi_1(h_1) = \mathrm{pr}(A_1 = 1 \mid H_1 = h_1)$ is the propensity score
- $\pi_1(h_1)$ is known in a randomized trial; in an observational study, posit a model $\pi_1(h_1; \gamma_1)$ and obtain $\pi_{d_\eta,1}(h_1; \eta_1, \widehat{\gamma}_1)$

Semiparametric theory for missing data yields...
Inverse probability weighted value estimators

Inverse probability weighted estimator for $V(d_\eta)$: For fixed $\eta_1$,
$$\widehat{V}_{IPW}(d_\eta) = n^{-1} \sum_{i=1}^n \frac{\mathcal{C}_{d_\eta,i} Y_i}{\pi_{d_\eta,1}(H_{1i}; \eta_1, \widehat{\gamma}_1)}$$

Doubly robust augmented inverse probability weighted estimator: More efficient and stable,
$$\widehat{V}_{AIPW}(d_\eta) = n^{-1} \sum_{i=1}^n \left[ \frac{\mathcal{C}_{d_\eta,i} Y_i}{\pi_{d_\eta,1}(H_{1i}; \eta_1, \widehat{\gamma}_1)} - \frac{\mathcal{C}_{d_\eta,i} - \pi_{d_\eta,1}(H_{1i}; \eta_1, \widehat{\gamma}_1)}{\pi_{d_\eta,1}(H_{1i}; \eta_1, \widehat{\gamma}_1)} \, Q_{d_\eta,1}(H_{1i}; \eta_1, \widehat{\beta}_1) \right],$$
where $Q_{d_\eta,1}(h_1; \eta_1, \widehat{\beta}_1) = Q_1(h_1, 1; \widehat{\beta}_1) I\{d_1(h_1; \eta_1) = 1\} + Q_1(h_1, 0; \widehat{\beta}_1) I\{d_1(h_1; \eta_1) = 0\}$
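A sketch of both value estimators under these definitions, assuming the rule output d1, the propensity pi1, and the fitted Q values are passed in as NumPy arrays; names are illustrative:

```python
import numpy as np

def v_ipw(Y, A1, d1, pi1):
    # d1: rule output d_1(H_1i; eta_1) in {0,1}; pi1: fitted pr(A_1 = 1 | H_1)
    C = (A1 == d1).astype(float)            # consistency indicator
    pi_d = np.where(d1 == 1, pi1, 1 - pi1)  # pr(C = 1 | H_1)
    return np.mean(C * Y / pi_d)

def v_aipw(Y, A1, d1, pi1, q1_treat, q0):
    # q1_treat, q0: fitted Q_1(H_1, 1; beta_1) and Q_1(H_1, 0; beta_1)
    C = (A1 == d1).astype(float)
    pi_d = np.where(d1 == 1, pi1, 1 - pi1)
    Q_d = np.where(d1 == 1, q1_treat, q0)   # Q evaluated at the regime's action
    return np.mean(C * Y / pi_d - (C - pi_d) / pi_d * Q_d)
```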
Direct/policy search estimators for optimal regime

Result: Estimators for $\eta_1^{opt}$ by maximizing $\widehat{V}_{IPW}(d_\eta)$ or $\widehat{V}_{AIPW}(d_\eta)$ in $\eta_1$

Estimators for the optimal restricted regime $d_\eta^{opt} = \{d_1(h_1; \eta_1^{opt})\}$:
$$\widehat{d}_{\eta,IPW}^{\,opt} = \{d_1(h_1; \widehat{\eta}_{1,IPW}^{\,opt})\} \quad \text{and} \quad \widehat{d}_{\eta,AIPW}^{\,opt} = \{d_1(h_1; \widehat{\eta}_{1,AIPW}^{\,opt})\}$$

Challenge: These are nonsmooth functions of $\eta_1$, so this is a nonstandard optimization problem
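Because $\widehat{V}(d_\eta)$ is piecewise constant in $\eta_1$, gradient methods do not apply; here is a crude random-search sketch over the linear rule class, reusing d1_linear and v_ipw from the sketches above. This is only a stand-in for the derivative-free global optimizers used in practice:

```python
import numpy as np

def policy_search(X1, Y, A1, pi1, n_draws=5000, seed=0):
    # Random search for eta_1 maximizing the IPW value over the linear class
    rng = np.random.default_rng(seed)
    best_eta, best_v = None, -np.inf
    for _ in range(n_draws):
        eta = rng.normal(size=3)
        eta /= np.linalg.norm(eta)   # the rule depends on eta_1 only up to scale
        v = v_ipw(Y, A1, d1_linear(X1, eta), pi1)
        if v > best_v:
            best_eta, best_v = eta, v
    return best_eta, best_v
```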
Classification analogy

So what is the connection to machine learning? Define
$$\psi_1(H_1, A_1, Y) = \frac{A_1 Y}{\pi_1(H_1)} - \frac{\{A_1 - \pi_1(H_1)\}}{\pi_1(H_1)} Q_1(H_1, 1),$$
$$\psi_0(H_1, A_1, Y) = \frac{(1 - A_1) Y}{1 - \pi_1(H_1)} + \frac{\{A_1 - \pi_1(H_1)\}}{1 - \pi_1(H_1)} Q_1(H_1, 0).$$

Then $E\{\psi_1(H_1, A_1, Y) - \psi_0(H_1, A_1, Y) \mid H_1\} = Q_1(H_1, 1) - Q_1(H_1, 0) = C_1(H_1)$, the contrast function.

Predictor of the contrast function: $\widehat{C}_1(H_{1i}, A_{1i}, Y_i) = \psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i)$ with fitted models $Q_1(h_1, a_1; \widehat{\beta}_1)$ and $\pi_1(h_1; \widehat{\gamma}_1)$ substituted
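A sketch of this contrast predictor with the fitted propensity and Q values plugged in; inputs are assumed to be NumPy arrays:

```python
import numpy as np

def contrast_hat(Y, A1, pi1, q1_treat, q0):
    # psi_1 - psi_0 with fitted models substituted;
    # E(psi_1 - psi_0 | H_1) = C_1(H_1), the contrast function
    psi1 = A1 * Y / pi1 - (A1 - pi1) / pi1 * q1_treat
    psi0 = (1 - A1) * Y / (1 - pi1) + (A1 - pi1) / (1 - pi1) * q0
    return psi1 - psi0
```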
Classification analogy

Lots of algebra: Maximizing $\widehat{V}_{AIPW}(d_\eta)$ in $\eta_1$ is equivalent to minimizing
$$n^{-1} \sum_{i=1}^n \left| \widehat{C}_1(H_{1i}, A_{1i}, Y_i) \right| \, I\Big[ I\{\widehat{C}_1(H_{1i}, A_{1i}, Y_i) \geq 0\} \neq d_1(H_{1i}; \eta_1) \Big]$$

This is a weighted classification error with
- Label $I\{\widehat{C}_1(H_{1i}, A_{1i}, Y_i) \geq 0\}$
- Weight $|\widehat{C}_1(H_{1i}, A_{1i}, Y_i)|$
- Classifier $d_1(H_{1i}; \eta_1)$

And similarly for $\widehat{V}_{IPW}(d_\eta)$ with
$$\psi_1(H_1, A_1, Y) = \frac{A_1 Y}{\pi_1(H_1)}, \qquad \psi_0(H_1, A_1, Y) = \frac{(1 - A_1) Y}{1 - \pi_1(H_1)}$$
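The objective above written out as code, a minimal sketch assuming C_hat (from contrast_hat above) and the rule output d1 are NumPy arrays:

```python
import numpy as np

def weighted_misclassification(C_hat, d1):
    # |C_hat| is the weight, I(C_hat >= 0) the label, d1 the classifier output
    labels = (C_hat >= 0).astype(int)
    return np.mean(np.abs(C_hat) * (labels != d1))
```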
Classification analogy

Result: Direct/policy search estimation of $d_\eta^{opt}$ by maximizing $\widehat{V}_{IPW}(d_\eta)$ or $\widehat{V}_{AIPW}(d_\eta)$ is equivalent to minimizing a weighted classification error
- The choice of classification approach dictates the restricted class $\mathcal{D}_\eta$; e.g., linear or nonlinear SVM, CART, random forests, etc.
- Can add a penalty to achieve a parsimonious representation
- Outcome weighted learning (O-learning) uses $\widehat{V}_{IPW}(d_\eta)$ with a nonlinear SVM
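An illustration of this recipe with scikit-learn: label each subject by the sign of the estimated contrast, weight by its magnitude, and fit a nonlinear SVM. Note the SVM's hinge loss is a convex surrogate for the weighted 0-1 error, so this sketch mirrors, rather than exactly reproduces, the O-learning estimator named above:

```python
import numpy as np
from sklearn.svm import SVC

def fit_classification_regime(X1, C_hat, kernel="rbf"):
    # Weighted classification cast of regime estimation: the classifier's
    # decision boundary defines the restricted class D_eta
    labels = (C_hat >= 0).astype(int)
    clf = SVC(kernel=kernel)
    clf.fit(X1, labels, sample_weight=np.abs(C_hat))
    return clf  # clf.predict(X1_new) gives the estimated rule d_1(h_1)
```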
K > 1 decisions

Extensions to $d^{opt} = (d_1^{opt}, \ldots, d_K^{opt})$:
- Q-learning
- Direct/policy search estimation within a restricted class $\mathcal{D}_\eta$
- Backward induction implementation with a classification representation at each step
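A simplified backward-induction Q-learning sketch for $K = 2$ with binary options at each stage and linear working models; it ignores the responder/nonresponder feasible-set structure of the leukemia example, and all names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def q_learning_two_stage(H1, A1, H2, A2, Y):
    def fit_stage(H, A, outcome):
        Z = np.column_stack([H, A, H * A[:, None]])
        return LinearRegression().fit(Z, outcome)

    def predict(model, H, a):
        A = np.full(H.shape[0], a, dtype=float)
        Z = np.column_stack([H, A, H * A[:, None]])
        return model.predict(Z)

    # Stage 2: regress Y on (H2, A2); pseudo-outcome is the max over a_2
    m2 = fit_stage(H2, A2, Y)
    v2 = np.maximum(predict(m2, H2, 0), predict(m2, H2, 1))
    # Stage 1: regress the pseudo-outcome on (H1, A1)
    m1 = fit_stage(H1, A1, v2)
    d2 = lambda H: (predict(m2, H, 1) >= predict(m2, H, 0)).astype(int)
    d1 = lambda H: (predict(m1, H, 1) >= predict(m1, H, 0)).astype(int)
    return d1, d2
```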
Discussion

Summary: Direct/policy search estimation of an optimal treatment regime can be cast as a weighted classification problem

Can exploit existing machine learning techniques to estimate an optimal treatment regime
Acknowledgement

IMPACT: Innovative Methods Program for Advancing Clinical Trials
- A joint venture of Duke, UNC-Chapel Hill, and NC State
- Supported by NCI Program Project P01 CA142538 (2010-2020)
- http://impact.unc.edu
- Statistical methods for precision cancer medicine
Upcoming

SAMSI: Statistical and Applied Mathematical Sciences Institute, https://www.samsi.info/

2018-2019 Program on Statistical, Mathematical, and Computational Methods for Precision Medicine
- Opening Workshop: August 13-17, 2018
Some references

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics 68, 1010-1018.

Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E. B. (2012). Estimating optimal treatment regimes from a classification perspective. Stat 1, 103-114.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika 100, 681-694.

Zhang, Y., Laber, E. B., Tsiatis, A. A., and Davidian, M. (2015). Using decision lists to construct interpretable and parsimonious treatment regimes. Biometrics 71, 895-904.

Zhang, Y., Laber, E. B., Davidian, M., and Tsiatis, A. A. (2018). Estimation of optimal treatment regimes using lists. Journal of the American Statistical Association, in press.

Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association 107, 1106-1118.

Zhao, Y. Q., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association 110, 583-598.