Methods for Interpretable Treatment Regimes

1 Methods for Interpretable Treatment Regimes
John Sperger, University of North Carolina at Chapel Hill
May 4th, 2018
Sperger (UNC), Interpretable Treatment Regimes, May 4th

2 Why Interpretable Methods?
Doctors and other decision makers are more likely to trust results they can understand.
Patients may be more willing to comply with treatment if they understand why it was given to them.
Hypothesis generation and exploratory data analysis.
Auditability and reporting requirements.

3 A Decision List is a sequence of if-then-else rules
Example Decision List [2]
In mathematical logic, an atomic formula is a Boolean formula with only a single variable and no logical connectives. Every atomic formula $x$ has two literals associated with it: itself, $x$, and its negation, $\neg x$. A term is a conjunction of literals, for example $(x_1 \wedge x_2)$ or $(x_1 \wedge \neg x_2)$ [3].
A group $R_j$ is made up of those subjects that satisfy the term $t_j$ but none of the terms $t_1, \ldots, t_{j-1}$. The exception is $R_{L+1}$, which comprises all subjects who do not satisfy any of the first $L$ terms.
$$R_j = \Big\{ x : \mathrm{satisfy}(x, t_j) \wedge \bigwedge_{k=1}^{j-1} \neg\,\mathrm{satisfy}(x, t_k) \Big\}$$
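
The group definition above can be illustrated with a minimal Python sketch. The covariate names (`age`, `bp`, `bmi`), the thresholds, and the helper `assign_groups` are hypothetical and chosen only for illustration; they do not come from the cited papers. Each subject lands in the group of the first term it satisfies, and subjects that satisfy no term fall into group $L+1$.

```python
from typing import Callable, Dict, List, Sequence

Subject = Dict[str, float]
Term = Callable[[Subject], bool]


def assign_groups(subjects: Sequence[Subject], terms: Sequence[Term]) -> List[int]:
    """Return the 1-based group index R_j for each subject.

    A subject belongs to R_j if it satisfies t_j and none of t_1..t_{j-1}.
    Subjects satisfying no term are placed in the default group L + 1.
    """
    groups = []
    for x in subjects:
        for j, term in enumerate(terms, start=1):
            if term(x):
                groups.append(j)
                break
        else:
            # No term matched: default group R_{L+1}.
            groups.append(len(terms) + 1)
    return groups


# Hypothetical terms: each is a conjunction of literals on made-up covariates.
terms = [
    lambda x: x["age"] > 60 and x["bp"] > 140,  # t_1
    lambda x: x["bmi"] > 30,                    # t_2
]

subjects = [
    {"age": 70, "bp": 150, "bmi": 25},  # satisfies t_1 only
    {"age": 40, "bp": 120, "bmi": 35},  # satisfies t_2 only
    {"age": 40, "bp": 120, "bmi": 22},  # satisfies neither
    {"age": 65, "bp": 160, "bmi": 33},  # satisfies both, so t_1 wins
]

groups = assign_groups(subjects, terms)
```

The last subject shows why order matters in a decision list: it satisfies both terms but is assigned to the first group.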

4 Single Stage Decision Problem
The observed data are $n$ i.i.d. trajectories $(x, a, y)$ and we're interested in a treatment rule $\pi : \mathcal{X} \to \mathcal{A}$.
$x \in \mathcal{X} \subseteq \mathbb{R}^p$ is a vector of $p$ covariates for a subject, measured at the beginning of the stage.
$a \in \mathcal{A}$ is the action. For a binary treatment, $\mathcal{A} = \{-1, 1\}$.
$y \in \mathbb{R}$ is the outcome for the patient, defined so that higher values correspond to better outcomes.

5 Q-learning with Policy Search
Define the state-action value function $Q(x, a) = E[Y \mid X = x, A = a]$.
Q-learning with policy search involves constructing an estimator $\hat{Q}$ for the Q-function and then searching for a policy $\pi$ that maximizes the value of the estimated Q-function.
Zhang et al.: Model $Q$ with kernel ridge regression. Use a greedy optimization procedure, which turns out to be consistent for the global maximizer. [6]
Lakkaraju & Rudin: Doubly robust estimation for $Q$. Represent fitting a list as an MDP where states are (partial) decision lists. Then use Monte Carlo Tree Search methods to find the best list-based treatment regime. [2]

6 Lakkaraju & Rudin Setup
Expected Outcome:
$$g_1(\pi) = \frac{1}{n} \sum_{i=1}^{n} \sum_{a \in \mathcal{A}} \left[ \frac{I(a_i = a)}{\hat{\omega}(x_i, a)} \big( y_i - \hat{Q}(x_i, a) \big) + \hat{Q}(x_i, a) \right] I(\pi(x_i) = a)$$
Expected Assessment Cost:
$$g_2(\pi) = \frac{1}{n} \sum_{i=1}^{n} \psi(x_i, \pi)$$
The assessment cost is ordered along the list: if subject $x$ belongs to group $R_j$ and subject $x'$ belongs to group $R_k$ with $j < k$, then $\psi(x) \le \psi(x')$.
Expected Treatment Cost:
$$g_3(\pi) = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i, \pi)$$
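
A minimal sketch of the doubly robust expected-outcome estimate $g_1$ may help make the formula concrete. The function `doubly_robust_value`, the toy Q-model, the constant propensity, and the three data points below are all hypothetical; this is not L&R's implementation. Because the indicator $I(\pi(x_i) = a)$ keeps only the action the policy recommends, the inner sum over actions collapses to a single term per subject.

```python
from typing import Callable, Sequence, Tuple

Covariates = Tuple[float, ...]
Observation = Tuple[Covariates, int, float]  # (x_i, a_i, y_i)


def doubly_robust_value(
    data: Sequence[Observation],
    policy: Callable[[Covariates], int],
    q_hat: Callable[[Covariates, int], float],
    propensity: Callable[[Covariates, int], float],
) -> float:
    """Doubly robust estimate of the expected outcome under `policy`."""
    total = 0.0
    for x, a_obs, y in data:
        a = policy(x)      # only the recommended action contributes
        q = q_hat(x, a)    # model-based prediction for that action
        if a_obs == a:
            # Inverse-propensity-weighted residual corrects the model prediction.
            total += (y - q) / propensity(x, a) + q
        else:
            total += q
    return total / len(data)


# Hypothetical ingredients, for illustration only.
def toy_q(x: Covariates, a: int) -> float:
    return 1.0 + a * x[0]


def toy_propensity(x: Covariates, a: int) -> float:
    return 0.5  # e.g. a 1:1 randomized trial


def toy_policy(x: Covariates) -> int:
    return 1 if x[0] > 0 else -1


toy_data = [((1.0,), 1, 3.0), ((-1.0,), -1, 0.0), ((2.0,), -1, 1.0)]
g1 = doubly_robust_value(toy_data, toy_policy, toy_q, toy_propensity)
```

In this toy data the first two subjects received the recommended action, so their residuals are reweighted; the third did not, so only the model prediction contributes.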

7 L&R Objective
Finally, L&R define the set $FP$ of frequent patterns by running a frequent pattern mining algorithm on the data. The resulting space of policies built from $FP \times \mathcal{A}$ is a subset of the overall space of list-based policies $\Pi$. Letting $\lambda_1, \lambda_2, \lambda_3$ be non-negative weights, the objective function is
$$\pi_{opt} = \operatorname*{argmax}_{\pi \in FP \times \mathcal{A}} \; \lambda_1 g_1(\pi) - \lambda_2 g_2(\pi) - \lambda_3 g_3(\pi)$$

8 Fitting a List as a Markov Decision Process $(S, A, T, R)$
States: partial decision lists (non-terminal states) or fully fitted decision lists (terminal states).
Actions: adding a rule to a decision list.
Transitions: the environment's transition model is deterministic.
Reward:
$$\frac{\lambda_1}{n} \sum_{i \in D'} o(x_i, \hat{a}) - \frac{\lambda_2}{n} \sum_{i \in D^c,\, j \in T} \psi(x_i, r_j) - \frac{\lambda_3}{n} \sum_{i \in D'} \phi(x_i, \hat{a})$$
Here $o(x_i, \hat{a})$ is the doubly robust outcome estimate, the bracketed term in $g_1$, and
$T$: the new covariates in rule $r$ that haven't been previously assessed
$D$: the set of subjects whose treatments have already been determined in state $s$
$D^c$: all subjects whose treatment assignments haven't been determined yet
$D' \subseteq D^c$: the subjects assigned a treatment by the new rule

9 Monte-Carlo Tree Search Overview [4]

10 MCTS with Customized Pruning
To choose a node in the selection step, the algorithm scores each node $s$ using the average empirical value $\bar{V}_s$ of being in that node and a function of how many times that node has been selected, $N_s$, out of the total number of iterations so far, $N$.
$$\text{UCB1}: \quad \bar{V}_s + \sqrt{\frac{2 \log N}{N_s}}$$
Over time the probability of choosing the wrong action at the initial node goes to zero [1].
L&R prune the search space by calculating an upper bound on the reward still attainable. This upper bound is calculated by assuming all the unassigned subjects get the best possible treatment without incurring any assessment or treatment costs.
$$\text{Upper Bound}(D^c) = \frac{\lambda_1}{n} \sum_{i \in D^c} \max_{a} o(i, a)$$
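
The UCB1 selection score and the optimistic upper bound can be sketched in a few lines of Python. The node representation (dictionaries with `value` and `visits` keys), the helper names, and the numbers are hypothetical; treating an unvisited node as having infinite score is a common convention assumed here, not something stated on the slide.

```python
import math
from typing import Callable, Dict, Iterable, Sequence


def ucb1_score(mean_value: float, n_visits: int, n_total: int) -> float:
    """UCB1: average value plus an exploration bonus sqrt(2 log N / N_s)."""
    if n_visits == 0:
        return math.inf  # assumption: always try unvisited nodes first
    return mean_value + math.sqrt(2.0 * math.log(n_total) / n_visits)


def select_child(children: Sequence[Dict], n_total: int) -> Dict:
    """Pick the child with the largest UCB1 score."""
    return max(children, key=lambda c: ucb1_score(c["value"], c["visits"], n_total))


def reward_upper_bound(
    o: Callable[[int, int], float],
    unassigned: Iterable[int],
    actions: Sequence[int],
    lam1: float,
    n: int,
) -> float:
    """Optimistic bound: every unassigned subject gets their best action, at zero cost."""
    return lam1 / n * sum(max(o(i, a) for a in actions) for i in unassigned)


# Hypothetical search state after 55 iterations.
children = [
    {"name": "A", "value": 0.6, "visits": 50},
    {"name": "B", "value": 0.5, "visits": 5},
]
chosen = select_child(children, n_total=55)

# Hypothetical outcome estimates o(i, a) for two unassigned subjects.
outcomes = {0: {-1: 1.0, 1: 2.0}, 1: {-1: 3.0, 1: 0.5}}
bound = reward_upper_bound(lambda i, a: outcomes[i][a], [0, 1], (-1, 1), lam1=1.0, n=4)
```

Child B is chosen despite its lower average value because it has been visited far less often, which is the exploration behaviour the bonus term is meant to produce.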

11 Q-learning Lists
Start with your favorite approach to estimate the Q-function. The authors chose kernel ridge regression.
Determine the estimated optimal decision rule from the estimated Q-function:
$$\hat{\pi}_Q(x) = \operatorname*{argmax}_{a \in \mathcal{A}} \hat{Q}(x, a)$$
Fit a decision list greedily, one clause at a time, by finding $R$ and $a$ in
$$\text{If } x \in R \text{ then } a \text{ else if } x \in \mathcal{X} \text{ then } \hat{\pi}_Q(x) \qquad (1)$$
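
As a small illustration of the plug-in rule, the sketch below takes an already-fitted Q-function and returns the action that maximizes it, along with the estimated value of following that rule. The function `q_hat_example` is a made-up closed form standing in for a fitted model such as kernel ridge regression; it is an assumption for illustration only.

```python
from typing import Callable, Sequence, Tuple

Covariates = Tuple[float, ...]
ACTIONS = (-1, 1)


def q_hat_example(x: Covariates, a: int) -> float:
    """Hypothetical fitted Q-function (stands in for any regression model)."""
    return 1.0 + 2.0 * x[0] + a * (0.5 - x[1])


def pi_q(
    x: Covariates,
    q: Callable[[Covariates, int], float] = q_hat_example,
    actions: Sequence[int] = ACTIONS,
) -> int:
    """Plug-in rule: the action maximizing the estimated Q-function at x."""
    return max(actions, key=lambda a: q(x, a))


def plug_in_value(
    xs: Sequence[Covariates],
    policy: Callable[[Covariates], int],
    q: Callable[[Covariates, int], float] = q_hat_example,
) -> float:
    """Average estimated Q-value obtained by following `policy` on the sample."""
    return sum(q(x, policy(x)) for x in xs) / len(xs)


sample = [(0.0, 0.2), (0.0, 0.9)]
recommended = [pi_q(x) for x in sample]
value = plug_in_value(sample, pi_q)
```

This plug-in rule is the fallback in the final "else" clause of the list, so every greedy step is compared against it.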

12 Fitting the first rule
Let $V(R) \in \{0, 1, 2\}$ be the number of variables used to define $R$, and let $\eta, \zeta > 0$ be tuning parameters. Then the region and action $(\hat{R}_1, \hat{a}_1)$ that define the first rule in the decision list are
$$(\hat{R}_1, \hat{a}_1) = \operatorname*{argmax}_{R, a} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ I(x_i \in R)\, \hat{Q}(x_i, a) + I(x_i \notin R)\, \hat{Q}\big(x_i, \hat{\pi}_Q(x_i)\big) \Big] + \zeta \left( \frac{1}{n} \sum_{i=1}^{n} I(x_i \in R) \right) + \eta \big( 2 - V(R) \big) \qquad (2)$$
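
The objective in (2) can be sketched as an exhaustive search over a deliberately small candidate set. This is a simplified illustration, not Zhang et al.'s algorithm: it only considers the whole space ($V(R) = 0$) and single-covariate threshold regions ($V(R) = 1$), and the Q-function, data, and tuning values are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Covariates = Tuple[float, ...]


def objective(xs, in_region, a, n_vars, q_hat, pi_q, zeta, eta):
    """Value of equation (2) for one candidate region and action."""
    n = len(xs)
    value, covered = 0.0, 0
    for x in xs:
        if in_region(x):
            value += q_hat(x, a)        # rule applies: give action a
            covered += 1
        else:
            value += q_hat(x, pi_q(x))  # fall back to the plug-in rule
    return value / n + zeta * covered / n + eta * (2 - n_vars)


def fit_first_rule(
    xs: Sequence[Covariates],
    actions: Sequence[int],
    q_hat: Callable[[Covariates, int], float],
    zeta: float,
    eta: float,
):
    """Return (score, operator, covariate index, threshold, action) of the best rule."""
    pi_q = lambda x: max(actions, key=lambda a: q_hat(x, a))
    # Candidate regions: everything (V=0) or one threshold on one covariate (V=1).
    candidates = [("all", None, None, (lambda x: True), 0)]
    for j in range(len(xs[0])):
        for c in sorted({x[j] for x in xs}):
            candidates.append(("<=", j, c, (lambda x, j=j, c=c: x[j] <= c), 1))
            candidates.append((">", j, c, (lambda x, j=j, c=c: x[j] > c), 1))
    best = None
    for op, j, c, region, n_vars in candidates:
        for a in actions:
            score = objective(xs, region, a, n_vars, q_hat, pi_q, zeta, eta)
            if best is None or score > best[0]:
                best = (score, op, j, c, a)
    return best


# Hypothetical fitted Q-function and sample.
toy_q_hat = lambda x, a: a * (x[0] - 0.5)
toy_xs = [(0.1,), (0.2,), (0.3,), (0.9,)]
best_rule = fit_first_rule(toy_xs, (-1, 1), toy_q_hat, zeta=0.1, eta=0.01)
```

With these toy inputs the search picks "if x[0] <= 0.3 then -1": it agrees with the plug-in rule on every subject it covers, and the $\zeta$ term rewards it for covering three of the four subjects.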

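13 Additional rules
$$\text{If } x \in \hat{R}_1 \text{ then } \hat{a}_1 \text{ else if } x \in R \text{ then } a \text{ else if } x \in D^c \text{ then } \hat{\pi}_Q(x) \qquad (3)$$
Here $D^c$ is the set of subjects not covered by the rules already fitted. The first term, the contribution from subjects already covered by $\hat{R}_1$, is independent of $R$ and $a$ and so can be dropped during the optimization procedure. This leads to the general form of the objective when estimating the $t$-th rule of the decision list:
$$(\hat{R}_t, \hat{a}_t) = \operatorname*{argmax}_{R, a} \; \frac{1}{n} \sum_{i=1}^{n} I(x_i \in R, x_i \in D^c)\, \hat{Q}(x_i, a) + \frac{1}{n} \sum_{i=1}^{n} I(x_i \notin R, x_i \in D^c)\, \hat{Q}\big(x_i, \hat{\pi}_Q(x_i)\big) + \zeta \left( \frac{1}{n} \sum_{i=1}^{n} I(x_i \in R, x_i \in D^c) \right) + \eta \big( 2 - V(R) \big) \qquad (4)$$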

14 Method Comparison
Both methods converge to the optimal list-based representation as $n \to \infty$ [1].
Zhang et al.'s approach is more computationally efficient. In their paper they extend it to the setting where there are $T < \infty$ decision stages.
Lakkaraju & Rudin's approach could, in theory, be extended to a multi-stage setting with Q-learning. Computational complexity may be a problem as $T$ increases, but many precision medicine and clinical trial applications have few enough stages that it is probably feasible.
Lakkaraju & Rudin's approach has multiple pre-processing steps, which are all sensible, but the statistical properties of the pipeline as a whole aren't clear.

15 Model Misspecification Matters
$R \sim N(Q_0, 1)$ where $Q_0 = 1 + 2X_1 + X_2 + 0.5X_3 + \ldots\,(1 - X_1 - X_2)A$ [7]

16 Take-away Message
There's interesting work going on with interpretable methods. You should try using them even if the model you use for treatment assignment is a black box (and they work for classification too!).
Interpretable methods are only as good as the model they are based on. Model selection and validation are still necessary for reinforcement learning [5].

17 References I
[1] Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1-43.
[2] Himabindu Lakkaraju and Cynthia Rudin. Learning cost-effective and interpretable treatment regimes. In Artificial Intelligence and Statistics.
[3] Ronald Rivest. Learning decision lists. Machine Learning, 2.

18 References II
[4] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge.
[5] Jeremy M. G. Taylor, Wenting Cheng, and Jared C. Foster. Reader reaction to "A robust method for estimating optimal treatment regimes" by Zhang et al. (2012). Biometrics, 71(1).
[6] Yichi Zhang, Eric B. Laber, Marie Davidian, and Anastasios A. Tsiatis. Estimation of optimal treatment regimes using lists. Journal of the American Statistical Association.

19 References III
[7] Yingqi Zhao, Donglin Zeng, A. John Rush, and Michael R. Kosorok. Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499).

20 L&R Sim Results

21 Zhang et al. Sim Results
