
Implementing Precision Medicine: Optimal Treatment Regimes and SMARTs. Anastasios (Butch) Tsiatis and Marie Davidian, Department of Statistics, North Carolina State University. http://www4.stat.ncsu.edu/~davidian

Precision medicine. Patient heterogeneity: demographic characteristics, physiological characteristics, medical history and concomitant conditions, genetic/genomic characteristics. Precision medicine: multiple treatment options, multiple decision points. A patient's characteristics are implicated in which treatment option s/he should receive... and when.

Example: Acute leukemias. Two decision points. Decision 1: induction chemotherapy (C, options c_1, c_2). Decision 2: maintenance treatment for patients who respond (M, options m_1, m_2); salvage chemotherapy for those who don't (S, options s_1, s_2).

Clinical decision-making. How are these decisions made? Clinical judgment, practice guidelines; a synthesis of all information on a patient up to the point of a decision to determine the next treatment action from among the possible options. Goal: make these decisions so as to lead to the best outcome for the patient. Goal: formalize clinical decision-making and make it evidence-based.

Treatment regime. Formalizing precision medicine: at any decision point, construct a rule that takes as input the accrued information on the patient to that point and outputs the next treatment from among the possible options. Decision 1 example: if age < 50 years and WBC < 10.0 × 10^3/µL, give chemotherapy c_2; otherwise, give c_1. (Dynamic) treatment regime(n), aka adaptive treatment strategy: a set of such rules, each corresponding to a decision point; an algorithm for treating an individual patient.

Single decision point. First, focus on a single decision point. x = all available information on the patient; set of treatment options, e.g., {c_1, c_2} = {0, 1}. Treatment regime: a single rule of the form d(x) such that d(x) = 0 or 1 depending on x. Example rules/regimes with x = {age, WBC, gender}: a rule involving cut-offs (a "rectangular region"), d(x) = I(age < 50 and WBC < 10); a rule involving a linear combination (a "hyperplane"), d(x) = I(age + 8.7 log(WBC) - 60 > 0).

Best treatment regime? Challenge: there is an infinitude of possible regimes d ∈ D = the class of all possible treatment regimes. Can we find the best set of rules, i.e., the best treatment regime in D? How to define "best"? Approach: formalize "best" using ideas from causal inference.

Clinical outcome. There is a clinical outcome by which treatment benefit can be assessed: survival time, CD4 count, indicator of no myocardial infarction within 30 days, ... By convention, larger outcomes are better.

Defining "best". An optimal regime d^opt should satisfy: if an individual patient were to receive treatment according to d^opt, his/her expected outcome would be as large as possible given the information available on him/her; if all patients in the population were to receive treatment according to d^opt, the expected (average) outcome for the population would be as large as possible. Formalize this...

Optimal regime for a single decision. For simplicity, consider regimes involving a single decision with two treatment options (coded as 0 and 1), A = {0, 1}, and baseline information x. Treatment regime: a single rule d(x), with d : x → {0, 1} and d ∈ D, the class of all regimes.

Potential outcomes. For treatments 0 and 1 we can hypothesize potential outcomes: Y(1) = the outcome that would be achieved if a patient were to receive treatment 1; Y(0) similarly. E{Y(1)} is the expected outcome if all patients in the population were to receive 1; E{Y(0)} similarly. Potential outcome for a regime: for any d ∈ D, Y(d) = the potential outcome for a patient with baseline information X if s/he were to receive treatment in accordance with regime d, i.e., Y(d) = Y(1)d(X) + Y(0){1 - d(X)}. The potential outcome for regime d for a patient with information X is the potential outcome for the treatment option dictated by d.

Optimal treatment regime. Thus E{Y(d) | X = x} is the expected outcome for a patient with information x if s/he were to receive treatment according to regime d ∈ D, and E{Y(d)} = E[E{Y(d) | X}] is the expected (average) outcome for the population if all patients were to receive treatment according to regime d ∈ D. Optimal regime: d^opt is a regime in D such that E{Y(d) | X = x} ≤ E{Y(d^opt) | X = x} for all x and d ∈ D, and E{Y(d)} ≤ E{Y(d^opt)} for all d ∈ D.

Value of a treatment regime. Definition: E{Y(d)} is referred to as the value of regime d; write V(d) = E{Y(d)}. Thus d^opt is a regime in D that maximizes the value V(d) among all d ∈ D.

Optimal treatment regime. Simplest case: A = {0, 1}, with Y(d) = Y(1)d(X) + Y(0){1 - d(X)}. Thus the value V(d) of d is V(d) = E{Y(d)} = E[E{Y(d) | X}] = E[E{Y(1) | X}d(X) + E{Y(0) | X}{1 - d(X)}]. It follows that d^opt(x) = I[E{Y(1) | X = x} > E{Y(0) | X = x}].

Statistical problem. Goal: given data from a clinical trial or observational study, estimate an optimal regime d^opt satisfying this definition. Observed data (single decision, K = 1, A = {0, 1}): (X_i, A_i, Y_i), i = 1, ..., n, iid, where X = baseline information, A = treatment actually received, and Y = observed outcome. Required: the optimal regime is defined in terms of potential outcomes, so it must be expressed equivalently in terms of the observed data.

Assumptions. Consistency assumption: Y = Y(1)A + Y(0)(1 - A). No unmeasured confounders assumption: {Y(0), Y(1)} ⊥ A | X, i.e., X contains all information used to assign treatments. Automatically satisfied in a randomized trial; unverifiable but necessary in an observational study.

Under these assumptions: E{Y(1)} = E[E{Y(1) | X}] = E[E{Y(1) | X, A = 1}] = E{E(Y | X, A = 1)}, and similarly for E{Y(0)}. Value V(d) of d: by similar calculations, V(d) = E{Y(d)} = E[E{Y(d) | X}] = E[E{Y(1) | X}d(X) + E{Y(0) | X}{1 - d(X)}] = E[E(Y | X, A = 1)d(X) + E(Y | X, A = 0){1 - d(X)}].

In terms of observed data: V(d) = E[E(Y | X, A = 1)d(X) + E(Y | X, A = 0){1 - d(X)}] and d^opt(x) = I[E{Y | X = x, A = 1} > E{Y | X = x, A = 0}]. Result: we can express V(d) and d^opt in terms of the observed data. Q(x, a) = E(Y | X = x, A = a) is the regression of outcome on X and the treatment actually received, A; d^opt chooses the treatment option that makes this regression the largest: d^opt(x) = I{Q(x, 1) > Q(x, 0)}.

Estimating V(d) and d^opt. With Q(x, a) = E(Y | X = x, A = a), V(d) = E[Q(X, 1)d(X) + Q(X, 0){1 - d(X)}] and d^opt(x) = I{Q(x, 1) > Q(x, 0)}. To use these results: the regression Q(x, a) = E(Y | X = x, A = a) is not known. Posit a model Q(x, a; β) (e.g., linear or logistic regression), such as Q(x, a; β) = β_0 + β_1 x + β_2 a + β_3 ax, and estimate β by β̂ (e.g., least squares, ML).

Estimating V(d). Estimator for the value of d: for any d, V̂(d) = n^{-1} Σ_{i=1}^n [Q(X_i, 1; β̂)d(X_i) + Q(X_i, 0; β̂){1 - d(X_i)}].

Regression estimator for d^opt: d̂^opt_Q(x) = I{Q(x, 1; β̂) > Q(x, 0; β̂)}. Regression estimator for the value V(d^opt): V̂_Q(d̂^opt) = n^{-1} Σ_{i=1}^n [Q(X_i, 1; β̂)d̂^opt(X_i) + Q(X_i, 0; β̂){1 - d̂^opt(X_i)}]. Concern: Q(x, a; β) may be misspecified, so d̂^opt could be far from d^opt.
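To make the single-decision regression (Q-model) estimator concrete, here is a minimal sketch in Python. The simulated data, the covariates (age, WBC), and the linear working model are illustrative assumptions and not part of the talk; the DynTxRegime R package mentioned under Software implements these estimators properly.

```python
# Minimal sketch of the regression estimator of d^opt for a single decision.
# Everything below is simulated for illustration; variable names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(20, 80, n)
wbc = rng.lognormal(2.0, 0.5, n)
X = np.column_stack([age, np.log(wbc)])
A = rng.binomial(1, 0.5, n)                       # randomized treatment, pr(A = 1 | X) = 0.5
Y = 1 + 0.02 * age + A * (2 - 0.04 * age) + rng.normal(0, 1, n)

# Posit Q(x, a; beta) = beta0 + beta1'x + a * (beta2 + beta3'x) and fit by least squares
design = np.column_stack([np.ones(n), X, A, A[:, None] * X])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)

def Q(x, a, beta):
    """Fitted outcome regression Q(x, a; beta-hat) with treatment held fixed at a."""
    x = np.atleast_2d(x)
    a_col = np.full(x.shape[0], float(a))
    d = np.column_stack([np.ones(x.shape[0]), x, a_col, a_col[:, None] * x])
    return d @ beta

# Estimated regime d^opt_Q(x) = I{Q(x, 1; beta-hat) > Q(x, 0; beta-hat)} and its plug-in value
d_opt = (Q(X, 1, beta) > Q(X, 0, beta)).astype(int)
V_hat = np.mean(Q(X, 1, beta) * d_opt + Q(X, 0, beta) * (1 - d_opt))
print("Plug-in value of the estimated regime:", round(V_hat, 3))
```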

Alternative approach. Class of regimes: Q(X, A; β) defines a class of regimes d(x, β) = I{Q(x, 1; β) > Q(x, 0; β)}, indexed by β, that may or may not contain d^opt. E.g., if we posit Q(X, A; β) = β_0 + β_1 X_1 + β_2 X_2 + A(β_3 + β_4 X_1 + β_5 X_2), this implies regimes of the form d(x, β) = I(β_3 + β_4 x_1 + β_5 x_2 > 0).

Restricted class of regimes. This suggests considering directly a restricted class of regimes D_η: d_η(x) = d(x, η), indexed by η. It may be motivated by a regression model or based on cost, feasibility in practice, or interpretability; e.g., d(x, η) = I(x_1 < η_0, x_2 < η_1). D_η may or may not contain d^opt, but is still of interest. Optimal restricted regime: d^opt_η(x) = d(x, η^opt), where η^opt maximizes V(d_η) = E{Y(d_η)}. Estimate the optimal restricted regime by estimating η^opt.

Estimating an optimal restricted regime. Idea: maximize a good estimator for the value V(d_η). Missing data analogy: for fixed η, define C_η = I{A = d(X, η)} = Ad(X, η) + (1 - A){1 - d(X, η)}, so that C_η = 1 if the treatment received is consistent with having followed d_η and C_η = 0 otherwise. Only subjects with C_η = 1 have observed outcomes Y consistent with following d_η; for the rest, such outcomes are missing.

Estimating an optimal restricted regime. Propensity score: the propensity for treatment 1 is π(x) = pr(A = 1 | X = x). In a randomized trial π(x) is known; in an observational study, posit a model π(x; γ) (e.g., logistic regression) and obtain γ̂ using (A_i, X_i), i = 1, ..., n. Propensity of receiving treatment consistent with d_η: π_c(X; η) = pr(C_η = 1 | X) = E[Ad(X, η) + (1 - A){1 - d(X, η)} | X] = π(X)d(X, η) + {1 - π(X)}{1 - d(X, η)}; write π_c(x; η, γ̂) when the fitted model π(x; γ̂) is substituted.

Estimating an optimal restricted regime. Inverse probability weighted estimator for V(d_η): V̂_IPWE(d_η) = n^{-1} Σ_{i=1}^n C_{η,i} Y_i / π_c(X_i; η, γ̂). It is consistent for V(d_η) if π(x; γ) (and thus π_c(x; η, γ)) is correctly specified, but it uses data only from subjects with C_η = 1.
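A sketch of the IPW value estimator for a rule in a restricted class, reusing X, A, Y from the snippet above. The linear-index form of d(x; η) and the known randomization probability of 0.5 are illustrative assumptions.

```python
# IPW estimator of V(d_eta) for d(x; eta) = I(eta0 + eta1'x > 0); a sketch, not the talk's code.
import numpy as np

def d_eta(X, eta):
    """Restricted-class rule: treat (a = 1) when the linear index is positive."""
    return (eta[0] + X @ eta[1:] > 0).astype(int)

def V_ipwe(eta, X, A, Y, pi):
    d = d_eta(X, eta)
    C = A * d + (1 - A) * (1 - d)          # C_eta = 1 if received treatment is consistent with d_eta
    pi_c = pi * d + (1 - pi) * (1 - d)     # pi_c(X; eta) = pr(C_eta = 1 | X)
    return np.mean(C * Y / pi_c)

pi = np.full(len(Y), 0.5)                  # known propensity in a randomized trial
print(V_ipwe(np.array([1.0, -0.02, 0.0]), X, A, Y, pi))
```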

Estimating an optimal restricted regime. Doubly robust augmented inverse probability weighted estimator: V̂_AIPWE(d_η) = n^{-1} Σ_{i=1}^n [ C_{η,i} Y_i / π_c(X_i; η, γ̂) - {C_{η,i} - π_c(X_i; η, γ̂)} / π_c(X_i; η, γ̂) m(X_i; η, β̂) ], where m(x; η, β) models E{Y(d_η) | X = x} = Q(x, 1; β)d(x, η) + Q(x, 0; β){1 - d(x, η)} and Q(x, a; β) is a model for E(Y | X = x, A = a). It is consistent if either π(x; γ) or Q(x, a; β) is correct ("doubly robust") and attempts to gain efficiency by using data from all subjects.
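The corresponding AIPW value estimator augments the IPW term with the fitted outcome model; this sketch reuses d_eta, pi, beta, and Q() from the snippets above.

```python
# Doubly robust AIPW estimator of V(d_eta); a sketch under the same simulated setup.
import numpy as np

def V_aipwe(eta, X, A, Y, pi, beta):
    d = d_eta(X, eta)
    C = A * d + (1 - A) * (1 - d)
    pi_c = pi * d + (1 - pi) * (1 - d)
    # m(X; eta, beta) = Q(X, 1)d(X) + Q(X, 0){1 - d(X)}: modeled outcome under d_eta
    m = Q(X, 1, beta) * d + Q(X, 0, beta) * (1 - d)
    return np.mean(C * Y / pi_c - (C - pi_c) / pi_c * m)

print(V_aipwe(np.array([1.0, -0.02, 0.0]), X, A, Y, pi, beta))
```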

Value search estimators. Result: estimators η̂^opt are obtained by maximizing V̂_IPWE(d_η) or V̂_AIPWE(d_η) in η, giving the estimated optimal restricted regime d̂^opt_η(x) = d(x, η̂^opt). These objectives are non-smooth in η, so suitable optimization techniques are needed (see the sketch below). Estimators for V(d^opt_η) = E{Y(d^opt_η)}: V̂_IPWE(d̂^opt_{η,IPWE}) or V̂_AIPWE(d̂^opt_{η,AIPWE}). Semiparametric theory: AIPWE is more efficient than IPWE for estimating V(d_η) = E{Y(d_η)}, so estimating regimes based on AIPWE should be better. Zhang et al. (2012), Biometrics.
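Because the estimated value is a non-smooth function of η, derivative-based optimizers are unreliable; global methods (e.g., genetic algorithms) are typically used. The crude random search below is only a sketch of the idea, reusing V_aipwe and the simulated data from the previous snippets.

```python
# Value search over eta by naive random search (a stand-in for a proper global optimizer).
import numpy as np

rng = np.random.default_rng(1)
best_eta, best_val = None, -np.inf
for _ in range(5000):
    eta = rng.normal(size=3)
    eta /= np.linalg.norm(eta)             # scale is irrelevant for an indicator rule
    val = V_aipwe(eta, X, A, Y, pi, beta)
    if val > best_val:
        best_eta, best_val = eta, val
print("eta-hat-opt:", best_eta.round(3), " estimated value:", round(best_val, 3))
```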

Classification perspective. Algebra: maximizing V̂_IPWE(d_η) or V̂_AIPWE(d_η) in η can be cast as a weighted classification problem, with D_η determined by the choice of classifier, e.g., SVM or CART, so available algorithms and software can be exploited. Zhang et al. (2012), Stat; Zhao et al. (2012), JASA (IPWE only; outcome weighted learning, OWL).

Multiple decision points. At baseline: information x_1 (accrued information h_1 = x_1). Decision 1: two options {c_1, c_2}; rule 1: d_1(x_1), mapping x_1 → {c_1, c_2}. Between decisions 1 and 2: collect additional information x_2, including responder status; accrued information h_2 = {x_1, chemotherapy at decision 1, x_2}. Decision 2: four options {m_1, m_2, s_1, s_2}; rule 2: d_2(h_2), mapping h_2 → {m_1, m_2} (responder) or h_2 → {s_1, s_2} (nonresponder). Regime: d = (d_1, d_2).

Multiple decision points. For example: x_1 = {age, WBC, gender, %CD56 expression, ...}; a_1 = chemotherapy (c_1 or c_2); x_2 = {grade 3+ hematologic adverse event, post-induction ECOG status, WBC, platelets, ..., responder status}. Accrued information: h_1 = x_1, h_2 = (x_1, a_1, x_2). Rules d_1(h_1), d_2(h_2); regime d = {d_1(h_1), d_2(h_2)}.

Multiple decision points. In general: K decision points; baseline information x_1; intermediate information x_k between decisions k-1 and k, k = 2, ..., K; set of treatment options at decision k: a_k ∈ A_k. Accrued information: h_1 = x_1 and h_k = {x_1, a_1, x_2, a_2, ..., x_{k-1}, a_{k-1}, x_k}, k = 2, ..., K. Decision rules d_1(h_1), d_2(h_2), ..., d_K(h_K), with d_k : h_k → A_k. Dynamic treatment regime: d = (d_1, d_2, ..., d_K); D is the set of all possible K-decision regimes.

Optimal regime for multiple decisions. An optimal regime d^opt ∈ D should satisfy: if an individual patient with baseline information x_1 were to receive all K treatments according to d^opt, his/her expected outcome would be as large as possible; if all patients in the population were to receive all K treatments according to d^opt, the expected (average) outcome for the population would be as large as possible.

Potential outcomes. Potential outcomes for regime d = (d_1, ..., d_K) ∈ D: baseline information X_1 and potential outcomes X_2(d), ..., X_K(d), Y(d). Optimal regime: d^opt is a regime in D such that E{Y(d)} ≤ E{Y(d^opt)} for all d ∈ D, and E{Y(d) | X_1 = x_1} ≤ E{Y(d^opt) | X_1 = x_1} for all x_1 and d ∈ D. Excruciating details: Schulte et al. (2014), Statistical Science.

Statistical problem. Goal: given data from an appropriate clinical trial or observational study, estimate an optimal regime d^opt satisfying this definition. Required observed data for K decisions: (X_{1i}, A_{1i}, X_{2i}, A_{2i}, ..., X_{(K-1)i}, A_{(K-1)i}, X_{Ki}, A_{Ki}, Y_i), i = 1, ..., n, iid, where X_1 = baseline information; A_k, k = 1, ..., K, = treatment received at decision k; X_k, k = 2, ..., K, = intermediate information observed between decisions k-1 and k; H_k = (X_1, A_1, X_2, ..., A_{k-1}, X_k), k = 2, ..., K, = accrued information up to decision k; Y = observed outcome, ascertained after decision K, or possibly a function of X_2, ..., X_K.

Assumptions. Consistency assumption: X_k, k = 2, ..., K, and Y are equal to the potential outcomes under the treatments A_1, ..., A_K actually received. Critical assumption: sequential randomization, a generalization of no unmeasured confounders; the treatment received at each decision k = 1, ..., K is statistically independent of the potential outcomes that would be achieved under all treatment options at all decisions, given the accrued information H_k.

Data sources. Studies: longitudinal observational studies, or a Sequential Multiple Assignment Randomized Trial (SMART).

SMART. Leukemia example, with randomization at each decision point: patients are randomized to induction chemotherapy c_1 or c_2; responders are then randomized to maintenance m_1 or m_2, and nonresponders to salvage s_1 or s_2. Sequential randomization automatically holds.

Characterizing an optimal regime. For definiteness, take K = 2 and A_k = {0, 1}, k = 1, 2, with accrued information H_1 = X_1 and H_2 = (X_1, A_1, X_2). The optimal regime d^opt follows from backward induction, formally in terms of potential outcomes; the sequential randomization assumption allows equivalent expressions in terms of the observed data (X_1, A_1, X_2, A_2, Y) (as for a single decision and no unmeasured confounders).

Characterizing an optimal regime. Optimal regime d^opt by backward induction. Decision 2: Q_2(h_2, a_2) = E(Y | H_2 = h_2, A_2 = a_2), d^opt_2(h_2) = I{Q_2(h_2, 1) > Q_2(h_2, 0)}, Ỹ_2(h_2) = max{Q_2(h_2, 0), Q_2(h_2, 1)}. Decision 1: Q_1(h_1, a_1) = E{Ỹ_2(H_2) | H_1 = h_1, A_1 = a_1}, d^opt_1(h_1) = I{Q_1(h_1, 1) > Q_1(h_1, 0)}, Ỹ_1(h_1) = max{Q_1(h_1, 0), Q_1(h_1, 1)}. Then d^opt = (d^opt_1, d^opt_2), and the value of d^opt is V(d^opt) = E{Ỹ_1(H_1)}.

Q-learning (sequential regression). Estimation of d^opt. Decision 2: posit and fit a model Q_2(h_2, a_2; β_2) by regressing Y on H_2, A_2 (e.g., least squares) and estimate d̂^opt_{Q,2}(h_2) = I{Q_2(h_2, 1; β̂_2) > Q_2(h_2, 0; β̂_2)}; for each i, form the predicted value Ỹ_{2i} = Ỹ_2(H_{2i}; β̂_2) = max{Q_2(H_{2i}, 0; β̂_2), Q_2(H_{2i}, 1; β̂_2)}. Decision 1: posit and fit a model Q_1(h_1, a_1; β_1) by regressing Ỹ_2 on H_1, A_1 (e.g., least squares) and estimate d̂^opt_{Q,1}(h_1) = I{Q_1(h_1, 1; β̂_1) > Q_1(h_1, 0; β̂_1)}. Estimated regime: d̂^opt_Q = (d̂^opt_{Q,1}, d̂^opt_{Q,2}).
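A minimal Q-learning sketch for K = 2 with binary options at each decision, using linear working models fit by least squares. The generative model is invented for illustration (it mirrors the toy model at the end of these slides) and is not data from the talk.

```python
# Two-stage Q-learning sketch on simulated data; working models are linear in H_k and A_k * H_k.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X1 = rng.normal(size=n)
A1 = rng.binomial(1, 0.5, n)
X2 = 0.5 * X1 + rng.normal(size=n)                  # X2 | X1, A1 ~ N(0.5 * X1, 1)
A2 = rng.binomial(1, 0.5, n)
Y = X1 + A2 * (0.5 * A1 + X2) + rng.normal(size=n)  # so Q2(h2, a2) = x1 + a2 (0.5 a1 + x2)

def fit_ols(design, y):
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Decision 2: regress Y on H2 = (1, X1, A1, X2) and A2 * H2
H2 = np.column_stack([np.ones(n), X1, A1, X2])
b2 = fit_ols(np.column_stack([H2, A2[:, None] * H2]), Y)
Q2 = lambda a2: H2 @ b2[:4] + a2 * (H2 @ b2[4:])
d2_opt = (Q2(1) > Q2(0)).astype(int)                # estimated rule d^opt_{Q,2}
Y2_tilde = np.maximum(Q2(0), Q2(1))                 # predicted outcome under optimal decision 2

# Decision 1: regress Y2_tilde on H1 = (1, X1) and A1 * H1 (a working model only; see the last slide)
H1 = np.column_stack([np.ones(n), X1])
b1 = fit_ols(np.column_stack([H1, A1[:, None] * H1]), Y2_tilde)
Q1 = lambda a1: H1 @ b1[:2] + a1 * (H1 @ b1[2:])
d1_opt = (Q1(1) > Q1(0)).astype(int)                # estimated rule d^opt_{Q,1}

V_hat = np.mean(np.maximum(Q1(0), Q1(1)))           # plug-in estimate of V(d^opt)
print("Estimated value:", round(V_hat, 3))
```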

Other estimators. Issue for Q-learning: model misspecification. Inverse weighted methods: analogy to a dropout problem, with a restricted class D_η and a generalization of IPWE and AIPWE; Zhang et al. (2013), Biometrika. Subjects are included as long as their observed sequential treatments are consistent with following d_η. Zhao et al. (2015), JASA (IPWE only; simultaneous or backward outcome weighted learning, S/BOWL).

SMART. (The leukemia SMART schematic is shown again: randomization to c_1 or c_2, then to m_1 or m_2 for responders and s_1 or s_2 for nonresponders; sequential randomization automatically holds.)

SMART. Embedded regimes: this SMART embeds 8 simple regimes, for j, l, p = 1, 2, of the form "give c_j, followed by m_l if response, else s_p if nonresponse." In symbols: d_1(h_1) = c_j for all h_1; d_2(h_2) = m_l if response, s_p if nonresponse. These embedded regimes use no baseline information at decision 1 and response status only at decision 2. Randomization guarantees subjects whose actual treatments are consistent with following at least one of the 8 regimes.

SMART. Multi-purpose: compare the embedded regimes ("treatment sequences") and collect extensive, detailed information at baseline and intermediate to decisions 1 and 2 to inform estimation of an optimal regime. Experience: numerous SMARTs have been conducted in behavioral sciences research (https://methodology.psu.edu/ra/smart/projects), e.g., stepped care interventions; also in cancer (CALGB/Alliance), but not focused on regimes.

CALGB 8923 (1995). Schema: AML patients were randomized to follow-up chemotherapy + placebo or follow-up chemotherapy + GM-CSF; responders were then randomized to Intensification I or Intensification II, and nonresponders went to follow-up. But this was not done for the purpose of inference on regimes...

SMART design principles. Current paradigm: embed simple regimes and base the sample size on conventional comparisons, e.g., of decision 1 treatments. Alternatives: base the sample size on the embedded regimes (how?), or on comparing the most and least intensive embedded regimes. Open problem.

Optimizing behavioral cancer pain intervention. PI: Tamara Somers, Duke University.

Behavioral interventions to prevent HIV. PI: Brian Mustanski, Northwestern University.

Methodological challenges: high-dimensional information? Regression model selection? Black box vs. restricted class of regimes? Assessment of uncertainty (nonstandard asymptotic theory)? Censored survival outcomes? Multiple outcomes (e.g., efficacy and toxicity)? Design considerations for SMARTs? Real-time regimes (mHealth)? Statistical research is way ahead of what is actually being done in practice.

Software. IMPACT, the Innovative Methods Program for Advancing Clinical Trials: a joint venture of Duke, UNC-Chapel Hill, and NC State, supported by NCI Program Project P01 CA142538 (2010-2020); http://impact.unc.edu. R package DynTxRegime implements all this and more; also available on CRAN.

Shameless promotion. Coming in 2018: Tsiatis, A. A., Davidian, M., Laber, E. B., and Holloway, S. T. (2017). An Introduction to Dynamic Treatment Regimes: Statistical Methods for Precision Medicine. Chapman & Hall/CRC Press.

References.
Schulte, P. J., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2014). Q- and A-learning methods for estimating optimal dynamic treatment regimes. Statistical Science, 29, 640-661.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics, 68, 1010-1018.
Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E. B. (2012). Estimating optimal treatment regimes from a classification perspective. Stat, 1, 103-114.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika, 100, 681-694.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association, 110, 583-598.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107, 1106-1118.

Additional references.
Bai, X., Tsiatis, A. A., Lu, W., and Song, R. (2016). Optimal treatment regimes for survival endpoints using a locally-efficient doubly-robust estimator from a classification perspective. Lifetime Data Analysis, in press.
Davidian, M., Tsiatis, A. A., and Laber, E. B. (2016). Dynamic treatment regimes. In Cancer Clinical Trials: Current and Controversial Issues in Design and Analysis, George, S. L., Wang, X., and Pang, H. (eds). Boca Raton: Chapman & Hall/CRC Press, ch. 13, pp. 409-446.
Jiang, R., Lu, W., Song, R., and Davidian, M. (2016). On estimation of optimal treatment regimes for maximizing t-year survival probability. Journal of the Royal Statistical Society, Series B, in press.
Laber, E. B., Zhao, Y., Regh, T., Davidian, M., Tsiatis, A. A., Stanford, J., Zeng, D., Song, R., and Kosorok, M. R. (2016). Using pilot data to size a two-arm randomized trial to find a nearly optimal personalized treatment strategy. Statistics in Medicine, 35, 1245-1256.
Goldberg, Y. and Kosorok, M. R. (2012). Q-learning with censored data. Annals of Statistics, 40, 529-560.

Additional references.
Zhang, Y., Laber, E. B., Tsiatis, A. A., and Davidian, M. (2015). Using decision lists to construct interpretable and parsimonious treatment regimes. Biometrics, 71, 895-904.
Zhao, Y. Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015). Doubly robust learning for estimating individualized treatment with censored data. Biometrika, 102, 151-168.

Important philosophical point. Distinguish between the best treatment for a patient and the best treatment decision for a patient given the information available on the patient. The best treatment for a patient is 0 or 1 according to whether Y(0) > Y(1) or Y(0) < Y(1); we cannot hope to determine this, because we can never see all potential outcomes on a given patient. What we can hope to do is make the optimal decision given the information available, i.e., find d^opt that makes E{Y(d) | X = x} and V(d) = E{Y(d)} as large as possible.

Classification perspective. Generic classification situation: Z = class ("label"); here Z ∈ {0, 1} (binary). X = features (covariates). d is a classifier, d : x → {0, 1}. D is a family of classifiers, e.g., hyperplanes of the form I(η_0 + η_1 X_1 + X_2 > 0), or rectangular regions of the form I(X_1 < η_1) + I(X_1 ≥ η_1, X_2 < η_2).

Classification perspective. Generic classification problem: given a training set (X_i, Z_i), i = 1, ..., n, find the classifier d ∈ D that minimizes the classification error Σ_{i=1}^n {Z_i - d(X_i)}^2, or the weighted classification error Σ_{i=1}^n w_i {Z_i - d(X_i)}^2. Machine learning (supervised learning): support vector machines (hyperplanes); recursive partitioning (CART; rectangular regions).

Value search estimator, revisited. Recall: estimation of d_η in a restricted class D_η, where η^opt maximizes V(d_η) = E{Y(d_η)}. AIPWE: V̂_AIPWE(d_η) = n^{-1} Σ_{i=1}^n [ C_{η,i} Y_i / π_c(X_i; η, γ̂) - {C_{η,i} - π_c(X_i; η, γ̂)} / π_c(X_i; η, γ̂) m(X_i; η, β̂) ]. η̂^opt maximizes V̂_AIPWE(d_η), yielding the estimator d̂^opt_{η,AIPWE}(x).

Value search estimator, revisited. Algebra: apart from terms not involving d, maximizing V̂_AIPWE(d_η) in η is equivalent to minimizing the weighted classification error n^{-1} Σ_{i=1}^n |Ĉ(X_i)| [I{Ĉ(X_i) > 0} - d(X_i; η)]^2, where Ĉ(X_i) = {A_i Y_i / π(X_i; γ̂) - (A_i - π(X_i; γ̂)) / π(X_i; γ̂) Q(X_i, 1; β̂)} - {(1 - A_i) Y_i / (1 - π(X_i; γ̂)) + (A_i - π(X_i; γ̂)) / (1 - π(X_i; γ̂)) Q(X_i, 0; β̂)} and E{Ĉ(X_i) | X_i} = C(X_i) = Q(X_i, 1) - Q(X_i, 0); C(X) is the contrast function.

Classification formulation. Result: estimate η^opt, and thus obtain d̂^opt_{η,AIPWE}(x) = d(x, η̂^opt), by minimizing in η the quantity n^{-1} Σ_{i=1}^n |Ĉ(X_i)| [I{Ĉ(X_i) > 0} - d(X_i; η)]^2. This can be viewed as a weighted classification problem: label I{Ĉ(X_i) > 0}, classifier d(X_i; η), weight |Ĉ(X_i)|. Zhang et al. (2012), Stat; Zhao et al. (2012), JASA (IPWE only; outcome weighted learning, OWL).
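A sketch of this weighted-classification formulation for the single-decision case, reusing X, A, Y, pi, beta, and Q() from the earlier snippets. scikit-learn's DecisionTreeClassifier (CART) stands in for the generic classifier, so D_η here is the class of depth-2 rectangular-region rules; that choice is arbitrary.

```python
# Weighted classification: label I{C-hat > 0}, weight |C-hat|, classifier a shallow CART.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# AIPW contrast C-hat(X_i), as defined on the previous slide
C_hat = (A * Y / pi - (A - pi) / pi * Q(X, 1, beta)) \
      - ((1 - A) * Y / (1 - pi) + (A - pi) / (1 - pi) * Q(X, 0, beta))

labels = (C_hat > 0).astype(int)          # I{C-hat(X_i) > 0}
weights = np.abs(C_hat)                   # misclassification cost for subject i

clf = DecisionTreeClassifier(max_depth=2).fit(X, labels, sample_weight=weights)
d_opt_tree = clf.predict(X)               # estimated regime, restricted to rectangular regions
```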

More on Q-learning. Estimation of V(d^opt): for each i, form Ỹ_{1i} = Ỹ_1(H_{1i}; β̂_1) = max{Q_1(H_{1i}, 0; β̂_1), Q_1(H_{1i}, 1; β̂_1)}, and estimate the value V(d^opt) by V̂_Q(d^opt) = n^{-1} Σ_{i=1}^n Ỹ_{1i}.

More on Q-learning. Decision 2: positing and fitting Q_2(h_2, a_2; β_2) is a standard regression problem with data (H_{2i}, A_{2i}, Y_i), i = 1, ..., n. For continuous Y, linear models are often used, e.g., Q_2(h_2, a_2; β_2) = h̃_2^T β_21 + a_2 (h̃_2^T β_22), where h̃_2 = (1, h_2^T)^T; standard model selection methods, diagnostics, etc., apply. Under the linear model, d^opt_2(h_2) = I(h̃_2^T β_22 > 0) and Ỹ_2(H_2; β̂_2) = H̃_2^T β̂_21 + (H̃_2^T β̂_22) I(H̃_2^T β̂_22 > 0).

More on Q-learning. With Ỹ_2(H_2; β̂_2) = H̃_2^T β̂_21 + (H̃_2^T β̂_22) I(H̃_2^T β̂_22 > 0), Decision 1: it is common to assume a linear model for Q_1(H_1, A_1) = E{Ỹ_2(H_2; β̂_2) | H_1, A_1}, e.g., Q_1(h_1, a_1; β_1) = h̃_1^T β_11 + a_1 (h̃_1^T β_12), where h̃_1 = (1, h_1^T)^T, fitted with data (H_{1i}, A_{1i}, Ỹ_{2i}), i = 1, ..., n. This is clearly a nonstandard regression problem; more in a moment. Under the linear model (if it were correct), d^opt_1(h_1) = I(h̃_1^T β_12 > 0).

More on Q-learning. Decision 1: even if the Decision 2 linear model is correctly specified, a linear model at Decision 1 is almost certainly incorrect. Suppose that in truth Q_2(h_2, a_2) = x_1 + a_2(0.5 a_1 + x_2) and X_2 | X_1 = x_1, A_1 = a_1 ~ N(ξ x_1, σ^2). One may show that Q_1(h_1, a_1) = x_1 + (σ/√(2π)) exp{-(0.5 a_1 + ξ x_1)^2 / (2σ^2)} + (0.5 a_1 + ξ x_1) Φ{(0.5 a_1 + ξ x_1)/σ}, which is not linear in general.
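As a quick check of the closed form above, one can compare it with a Monte Carlo evaluation of E[max_{a_2} Q_2(H_2, a_2) | x_1, a_1]; the numerical values of x_1, a_1, ξ, and σ below are arbitrary, and this check is an illustration added here, not part of the slides.

```python
# Monte Carlo check of Q1(h1, a1) = x1 + (sigma/sqrt(2*pi)) exp{-mu^2/(2 sigma^2)} + mu * Phi(mu/sigma),
# where mu = 0.5*a1 + xi*x1 and X2 | X1 = x1, A1 = a1 ~ N(xi*x1, sigma^2).
import math
import numpy as np

x1, a1, xi, sigma = 1.2, 1, 0.8, 1.5
mu = 0.5 * a1 + xi * x1

def Phi(z):                                # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

closed_form = x1 + sigma / math.sqrt(2 * math.pi) * math.exp(-mu**2 / (2 * sigma**2)) \
                 + mu * Phi(mu / sigma)

rng = np.random.default_rng(3)
X2 = rng.normal(xi * x1, sigma, 1_000_000)
monte_carlo = x1 + np.mean(np.maximum(0.0, 0.5 * a1 + X2))   # E{max over a2 of Q2 | x1, a1}
print(round(closed_form, 4), round(monte_carlo, 4))          # the two should agree closely
```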