Implementing Precision Medicine: Optimal Treatment Regimes and SMARTs
Anastasios (Butch) Tsiatis and Marie Davidian
Department of Statistics, North Carolina State University
http://www4.stat.ncsu.edu/~davidian
1/65 Treatment Regimes and SMARTs
Precision medicine
Patient heterogeneity: demographic characteristics; physiological characteristics; medical history, concomitant conditions; genetic/genomic characteristics
Precision medicine: multiple treatment options, multiple decision points
A patient's characteristics are implicated in which treatment option s/he should receive... and when
Example: Acute leukemias
Two decision points:
Decision 1: Induction chemotherapy (C, options c1, c2)
Decision 2: Maintenance treatment for patients who respond (M, options m1, m2); salvage chemotherapy for those who don't (S, options s1, s2)
Clinical decision-making
How are these decisions made? Clinical judgment, practice guidelines
Synthesis of all information on a patient up to the point of a decision to determine the next treatment action from among the possible options
Goal: Make these decisions so as to lead to the best outcome for the patient
Goal: Formalize clinical decision-making and make it evidence-based
Treatment regime
Formalizing precision medicine: At any decision point, construct a rule that takes as input the accrued information on the patient to that point and outputs the next treatment from among the possible options
Decision 1: If age < 50 years and WBC < 10.0 x 10^3/µl, give chemotherapy c2; otherwise, give c1
(Dynamic) treatment regime(s): Aka adaptive treatment strategy
A set of such rules, each corresponding to a decision point
An algorithm for treating an individual patient
Single decision point
First: Focus on a single decision point
x = all available information on the patient
Set of treatment options, e.g., {c1, c2} = {0, 1}
Treatment regime: A single rule of the form d(x) such that d(x) = 0 or 1 depending on x
Example rules/regimes: x = {age, WBC, gender}
Rule involving cut-offs (rectangular region): d(x) = I(age < 50 and WBC < 10)
Rule involving a linear combination (hyperplane): d(x) = I(age + 8.7 log(WBC) - 60 > 0)
Best treatment regime?
Challenge: There is an infinitude of possible regimes d
D = class of all possible treatment regimes
Can we find the best set of rules, i.e., the best treatment regime in D?
How to define "best"?
Approach: Formalize "best" using ideas from causal inference
Clinical outcome
Outcome: There is a clinical outcome by which treatment benefit can be assessed
Survival time, CD4 count, indicator of no myocardial infarction within 30 days, ...
Convention: Larger outcomes are better
Defining "best"
An optimal regime d_opt should satisfy:
If an individual patient were to receive treatment according to d_opt, his/her expected outcome would be as large as possible given the information available on him/her
If all patients in the population were to receive treatment according to d_opt, the expected (average) outcome for the population would be as large as possible
Formalize this...
Optimal regime for a single decision
For simplicity: Consider regimes involving a single decision with two treatment options (coded as 0 and 1), A = {0, 1}
Baseline information x
Treatment regime: A single rule d(x), d : x -> {0, 1}, d in D, the class of all regimes
Potential outcomes
Treatments 0 and 1: We can hypothesize potential outcomes
Y(1) = outcome that would be achieved if a patient were to receive treatment 1; Y(0) similarly
E{Y(1)} is the expected outcome if all patients in the population were to receive 1; E{Y(0)} similarly
Potential outcome for a regime: For any d in D,
Y(d) = potential outcome for a patient with baseline information X if s/he were to receive treatment in accordance with regime d
Y(d) = Y(1)d(X) + Y(0){1 - d(X)}
The potential outcome for regime d for a patient with information X is the potential outcome for the treatment option dictated by d
Optimal treatment regime
Thus:
E{Y(d) | X = x} is the expected outcome for a patient with information x if s/he were to receive treatment according to regime d in D
E{Y(d)} = E[E{Y(d) | X}] is the expected (average) outcome for the population if all patients were to receive treatment according to regime d in D
Optimal regime: d_opt is a regime in D such that
E{Y(d) | X = x} <= E{Y(d_opt) | X = x} for all x and d in D
E{Y(d)} <= E{Y(d_opt)} for all d in D
Value of a treatment regime
Definition: E{Y(d)} is referred to as the value of regime d; write V(d) = E{Y(d)}
Thus, d_opt is a regime in D that maximizes the value V(d) among all d in D
Optimal treatment regime
Simplest case: A = {0, 1}, Y(d) = Y(1)d(X) + Y(0){1 - d(X)}
Thus, the value V(d) of d is
V(d) = E{Y(d)} = E[E{Y(d) | X}] = E[E{Y(1) | X}d(X) + E{Y(0) | X}{1 - d(X)}]
It follows that
d_opt(x) = I[E{Y(1) | X = x} > E{Y(0) | X = x}]
Statistical problem
Goal: Given data from a clinical trial or observational study, estimate an optimal regime d_opt satisfying this definition
Observed data: Single decision, K = 1, A = {0, 1}
(X_i, A_i, Y_i), i = 1, ..., n, iid
X = baseline information
A = treatment actually received
Y = observed outcome
Required: The optimal regime is defined in terms of potential outcomes; must express it equivalently in terms of the observed data
Assumptions
Consistency assumption: Y = Y(1)A + Y(0)(1 - A)
No unmeasured confounders assumption: Y(0), Y(1) independent of A given X
X contains all information used to assign treatments
Automatically satisfied in a randomized trial
Unverifiable but necessary in an observational study
Under these assumptions
E{Y(1)} = E[E{Y(1) | X}] = E[E{Y(1) | X, A = 1}] = E{E(Y | X, A = 1)}, and similarly for E{Y(0)}
Value V(d) of d: By similar calculations,
V(d) = E{Y(d)} = E[E{Y(d) | X}]
= E[E{Y(1) | X}d(X) + E{Y(0) | X}{1 - d(X)}]
= E[E(Y | X, A = 1)d(X) + E(Y | X, A = 0){1 - d(X)}]
In terms of observed data
V(d) = E[E(Y | X, A = 1)d(X) + E(Y | X, A = 0){1 - d(X)}]
d_opt(x) = I[E{Y | X = x, A = 1} > E{Y | X = x, A = 0}]
Result: Can express V(d) and d_opt in terms of the observed data
Q(x, a) = E(Y | X = x, A = a) is the regression of outcome on X and the treatment actually received, A
d_opt chooses the treatment option that makes this regression the largest:
d_opt(x) = I{Q(x, 1) > Q(x, 0)}
Estimating V(d) and d_opt
Q(x, a) = E(Y | X = x, A = a)
V(d) = E[Q(X, 1)d(X) + Q(X, 0){1 - d(X)}]
d_opt(x) = I{Q(x, 1) > Q(x, 0)}
To use these results: The regression Q(x, a) = E(Y | X = x, A = a) is not known
Posit a model Q(x, a; beta) (e.g., linear or logistic regression), e.g.,
Q(x, a; beta) = beta_0 + beta_1 x + beta_2 a + beta_3 ax
Estimate beta by beta_hat (e.g., least squares, ML)
Estimating V(d)
Estimator for the value of d: For any d,
V_hat(d) = n^{-1} sum_{i=1}^n [Q(X_i, 1; beta_hat)d(X_i) + Q(X_i, 0; beta_hat){1 - d(X_i)}]
Regression estimator for d_opt
Regression estimator for d_opt:
d_hat_Q_opt(x) = I{Q(x, 1; beta_hat) > Q(x, 0; beta_hat)}
Regression estimator for the value of d_opt, V(d_opt):
V_hat_Q(d_hat_opt) = n^{-1} sum_{i=1}^n [Q(X_i, 1; beta_hat)d_hat_opt(X_i) + Q(X_i, 0; beta_hat){1 - d_hat_opt(X_i)}]
Concern: Q(x, a; beta) may be misspecified, so d_hat_opt could be far from d_opt
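The single-decision regression (Q-) estimator above can be illustrated end to end. The following is a minimal sketch on simulated data; the data-generating model and all variable names are our own illustrative assumptions, not from the slides.

```python
# Sketch of the regression (Q-) estimator for a single decision,
# using simulated data and ordinary least squares via numpy.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, n)            # baseline covariate (illustrative)
A = rng.integers(0, 2, n)            # randomized treatment, pr = 1/2
# Assumed truth: treatment helps when x > 0, hurts when x < 0
Y = 1 + X + A * (2 * X) + rng.normal(0, 1, n)

# Posit Q(x, a; beta) = b0 + b1*x + b2*a + b3*a*x and fit by least squares
D = np.column_stack([np.ones(n), X, A, A * X])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)

def Q(x, a):
    return beta[0] + beta[1] * x + a * (beta[2] + beta[3] * x)

# Estimated optimal rule: treat (1) when Q(x, 1) > Q(x, 0)
d_opt = (Q(X, 1) > Q(X, 0)).astype(int)
# Plug-in (regression) estimator of the value V(d_opt)
V_hat = np.mean(Q(X, 1) * d_opt + Q(X, 0) * (1 - d_opt))
```

Under this generative model the true optimal rule is I(x > 0) with value 1.5, so the fitted rule and plug-in value should land near those targets.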
Alternative approach
Class of regimes: Q(X, A; beta) defines a class of regimes
d(x, beta) = I{Q(x, 1; beta) > Q(x, 0; beta)}, indexed by beta, that may or may not contain d_opt
E.g., if we posit
Q(X, A; beta) = beta_0 + beta_1 X_1 + beta_2 X_2 + A(beta_3 + beta_4 X_1 + beta_5 X_2)
this implies regimes of the form
d(x, beta) = I(beta_3 + beta_4 x_1 + beta_5 x_2 > 0)
Restricted class of regimes
Suggests: Consider directly a restricted class of regimes D_eta, d_eta(x) = d(x, eta), indexed by eta
May be motivated by a regression model or based on cost, feasibility in practice, interpretability; e.g.,
d(x, eta) = I(x_1 < eta_0, x_2 < eta_1)
D_eta may or may not contain d_opt, but is still of interest
Optimal restricted regime: d_eta_opt(x) = d(x, eta_opt), where eta_opt maximizes V(d_eta) = E{Y(d_eta)}
Estimate the optimal restricted regime by estimating eta_opt
Estimating an optimal restricted regime
Idea: Maximize a good estimator for the value V(d_eta)
Missing data analogy: For fixed eta, define
C_eta = I{A = d(X, eta)} = Ad(X, eta) + (1 - A){1 - d(X, eta)}
C_eta = 1 if the treatment received is consistent with having followed d_eta, and = 0 otherwise
Only subjects with C_eta = 1 have observed outcomes Y consistent with following d_eta
For the rest, such outcomes are missing
Estimating an optimal restricted regime
Propensity score: Propensity for treatment 1, pi(x) = pr(A = 1 | X = x)
Randomized trial: pi(x) is known
Observational study: Posit a model pi(x; gamma) (e.g., logistic regression) and obtain gamma_hat using (A_i, X_i), i = 1, ..., n
Propensity of receiving treatment consistent with d_eta:
pi_c(X; eta) = pr(C_eta = 1 | X) = E[Ad(X, eta) + (1 - A){1 - d(X, eta)} | X] = pi(X)d(X, eta) + {1 - pi(X)}{1 - d(X, eta)}
Write pi_c(x; eta, gamma_hat) for this with pi(x; gamma_hat) substituted
Estimating an optimal restricted regime
Inverse probability weighted estimator for V(d_eta):
V_hat_IPWE(d_eta) = n^{-1} sum_{i=1}^n C_{eta,i} Y_i / pi_c(X_i; eta, gamma_hat)
Consistent for V(d_eta) if pi(x; gamma) (thus pi_c(x; eta, gamma)) is correctly specified
But uses data only from subjects with C_eta = 1
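For intuition, the IPWE can be computed in a few lines. This sketch assumes a randomized trial with known pi(x) = 1/2 and a simple threshold regime d_eta(x) = I(x > eta); both are illustrative choices, not from the slides.

```python
# Sketch of the IPWE value estimator for a fixed regime d_eta in a
# randomized trial where the propensity pi(x) = 1/2 is known.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(-1, 1, n)
A = rng.integers(0, 2, n)                  # pr(A = 1 | X) = 1/2, known
Y = 1 + X + A * (2 * X) + rng.normal(0, 1, n)

def V_ipwe(eta):
    d = (X > eta).astype(int)              # regime in the restricted class
    C = (A == d).astype(int)               # consistency indicator C_eta
    pi_c = 0.5 * d + 0.5 * (1 - d)         # pr(C_eta = 1 | X); here always 1/2
    return np.mean(C * Y / pi_c)

V0 = V_ipwe(0.0)                           # estimated value of d(x) = I(x > 0)
```

Under this generative model the true value of I(x > 0) is 1.5, and only the roughly half of subjects with C_eta = 1 contribute, which is the inefficiency the AIPWE on the next slide addresses.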
Estimating an optimal restricted regime
Doubly robust augmented inverse probability weighted estimator:
V_hat_AIPWE(d_eta) = n^{-1} sum_{i=1}^n [ C_{eta,i} Y_i / pi_c(X_i; eta, gamma_hat) - {C_{eta,i} - pi_c(X_i; eta, gamma_hat)} / pi_c(X_i; eta, gamma_hat) * m(X_i; eta, beta_hat) ]
m(x; eta, beta) is a model for E{Y(d_eta) | X = x}: m(x; eta, beta) = Q(x, 1; beta)d(x, eta) + Q(x, 0; beta){1 - d(x, eta)}
Q(x, a; beta) is a model for E(Y | X = x, A = a)
Consistent if either pi(x; gamma) or Q(x, a; beta) is correct ("doubly robust")
Attempts to gain efficiency by using data from all subjects
Value search estimators
Result: Obtain eta_hat_opt by maximizing V_hat_IPWE(d_eta) or V_hat_AIPWE(d_eta) in eta
Estimated optimal restricted regime: d_hat_eta_opt(x) = d(x, eta_hat_opt)
These criteria are non-smooth in eta; need suitable optimization techniques
Estimators for V(d_eta_opt) = E{Y(d_eta_opt)}: V_hat_IPWE(d_hat_eta,IPWE_opt) or V_hat_AIPWE(d_hat_eta,AIPWE_opt)
Semiparametric theory: AIPWE is more efficient than IPWE for estimating V(d_eta) = E{Y(d_eta)}, so estimating regimes based on AIPWE should be better
Zhang et al. (2012), Biometrics
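Because the IPWE criterion is a step function of eta, a simple grid search suffices for a one-parameter restricted class. The sketch below (simulated data, illustrative class of threshold rules, not the authors' implementation) maximizes the IPWE over a grid:

```python
# Illustrative value-search estimation: maximize the IPWE value over a
# grid of thresholds eta for rules d(x, eta) = I(x > eta).
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.uniform(-1, 1, n)
A = rng.integers(0, 2, n)                        # randomized, pi = 1/2
Y = 1 + X + A * (2 * X) + rng.normal(0, 1, n)    # true optimum: treat iff x > 0

def V_ipwe(eta):
    d = (X > eta).astype(int)
    C = (A == d).astype(int)
    return np.mean(C * Y / 0.5)                  # pi_c = 1/2 for both arms here

# The criterion is piecewise constant in eta, so grid search is natural
grid = np.linspace(-1, 1, 201)
eta_hat = grid[np.argmax([V_ipwe(e) for e in grid])]
```

With the assumed model the value-maximizing threshold is eta = 0, so eta_hat should land nearby; the non-smoothness visible here is exactly why specialized optimization is needed for richer classes.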
Classification perspective
Algebra: Maximizing V_hat_IPWE(d_eta) or V_hat_AIPWE(d_eta) in eta can be cast as a weighted classification problem
D_eta determined by the choice of classifier, e.g., SVM, CART
Can exploit available algorithms and software
Zhang et al. (2012), Stat; Zhao et al. (2012), JASA (IPWE only, outcome weighted learning, OWL)
Multiple decision points
At baseline: Information x1 (accrued information h1 = x1)
Decision 1: Two options {c1, c2}; rule 1: d1(x1), x1 -> {c1, c2}
Between decisions 1 and 2: Collect additional information x2, including responder status
Accrued information h2 = {x1, chemotherapy at decision 1, x2}
Decision 2: Four options {m1, m2, s1, s2}; rule 2: d2(h2), h2 -> {m1, m2} (responder), h2 -> {s1, s2} (nonresponder)
Regime: d = (d1, d2)
Multiple decision points
For example:
x1 = {age, WBC, gender, %CD56 expression, ...}
a1 = chemotherapy (c1 or c2)
x2 = {grade 3+ hematologic adverse event, post-induction ECOG status, WBC, platelets, ..., responder status}
Accrued information h1 = x1, h2 = (x1, a1, x2)
Rules d1(h1), d2(h2); regime d = {d1(h1), d2(h2)}
Multiple decision points
In general: K decision points
Baseline information x1; intermediate information x_k collected between decisions k - 1 and k, k = 2, ..., K
Set of treatment options at decision k: a_k in A_k
Accrued information: h1 = x1; h_k = {x1, a1, x2, a2, ..., x_{k-1}, a_{k-1}, x_k}, k = 2, ..., K
Decision rules d1(h1), d2(h2), ..., d_K(h_K), with d_k : h_k -> A_k
Dynamic treatment regime: d = (d1, d2, ..., d_K)
D is the set of all possible K-decision regimes
Optimal regime for multiple decisions
An optimal regime d_opt in D should satisfy:
If an individual patient with baseline information x1 were to receive all K treatments according to d_opt, his/her expected outcome would be as large as possible
If all patients in the population were to receive all K treatments according to d_opt, the expected (average) outcome for the population would be as large as possible
Potential outcomes
Potential outcomes for regime d = (d1, ..., d_K) in D: Baseline information X1, potential outcomes X2(d), ..., X_K(d), Y(d)
Optimal regime: d_opt is a regime in D such that
E{Y(d)} <= E{Y(d_opt)} for all d in D
E{Y(d) | X1 = x1} <= E{Y(d_opt) | X1 = x1} for all x1 and d in D
Excruciating details: Schulte et al. (2014), Statistical Science
Statistical problem
Goal: Given data from an appropriate clinical trial or observational study, estimate an optimal regime d_opt satisfying this definition
Required observed data: K decisions
(X_{1i}, A_{1i}, X_{2i}, A_{2i}, ..., X_{(K-1)i}, A_{(K-1)i}, X_{Ki}, A_{Ki}, Y_i), i = 1, ..., n, iid
X1 = baseline information
A_k, k = 1, ..., K = treatment received at decision k
X_k, k = 2, ..., K = intermediate information observed between decisions k - 1 and k
H_k = (X1, A1, X2, ..., A_{k-1}, X_k), k = 2, ..., K = accrued information up to decision k
Y = observed outcome, ascertained after decision K, or possibly a function of X2, ..., X_K
Assumptions
Consistency assumption: X_k, k = 2, ..., K, and Y are equal to the potential outcomes under the treatments A1, ..., A_K actually received
Critical assumption: Sequential randomization, a generalization of no unmeasured confounders
Treatment received at each decision k = 1, ..., K is statistically independent of the potential outcomes that would be achieved under all treatment options at all decisions, given the accrued information H_k
Data sources
Studies: Longitudinal observational; Sequential, Multiple Assignment, Randomized Trial (SMART)
SMART
Leukemia example: Randomization at the marked points
[Diagram: cancer patients are randomized between induction C1 and C2; within each arm, responders are re-randomized between maintenance M1 and M2, and nonresponders between salvage S1 and S2]
Sequential randomization automatically holds
Characterizing an optimal regime
For definiteness: Take K = 2 and A_k = {0, 1}, k = 1, 2
Accrued information H1 = X1, H2 = (X1, A1, X2)
Optimal regime d_opt: Follows from backward induction
Formally defined in terms of potential outcomes
The sequential randomization assumption allows equivalent expressions in terms of the observed data (X1, A1, X2, A2, Y) (as for a single decision under no unmeasured confounders)
Characterizing an optimal regime
Optimal regime d_opt: Backward induction
Decision 2: Q2(h2, a2) = E(Y | H2 = h2, A2 = a2)
d2_opt(h2) = I{Q2(h2, 1) > Q2(h2, 0)}
Ytilde2(h2) = max{Q2(h2, 0), Q2(h2, 1)}
Decision 1: Q1(h1, a1) = E{Ytilde2(H2) | H1 = h1, A1 = a1}
d1_opt(h1) = I{Q1(h1, 1) > Q1(h1, 0)}
Ytilde1(h1) = max{Q1(h1, 0), Q1(h1, 1)}
d_opt = (d1_opt, d2_opt); the value of d_opt is V(d_opt) = E{Ytilde1(H1)}
Q-Learning (sequential regression)
Estimation of d_opt:
Decision 2: Posit and fit a model Q2(h2, a2; beta2) by regressing Y on H2, A2 (e.g., least squares) and estimate
d_hat_Q2_opt(h2) = I{Q2(h2, 1; beta2_hat) > Q2(h2, 0; beta2_hat)}
For each i, form the predicted value
Ytilde_{2i} = Ytilde2(H_{2i}; beta2_hat) = max{Q2(H_{2i}, 0; beta2_hat), Q2(H_{2i}, 1; beta2_hat)}
Decision 1: Posit and fit a model Q1(h1, a1; beta1) by regressing Ytilde2 on H1, A1 (e.g., least squares) and estimate
d_hat_Q1_opt(h1) = I{Q1(h1, 1; beta1_hat) > Q1(h1, 0; beta1_hat)}
Estimated regime: d_hat_Q_opt = (d_hat_Q1_opt, d_hat_Q2_opt)
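The two-stage recipe can be sketched with linear working models and least squares. Everything below, the generative model and the posited Q-models, is an illustrative assumption for demonstration, not the authors' code:

```python
# Sketch of two-decision Q-learning with linear working models on
# simulated data: fit stage 2, form the pseudo-outcome, fit stage 1.
import numpy as np

rng = np.random.default_rng(3)
n = 4000
X1 = rng.uniform(-1, 1, n)
A1 = rng.integers(0, 2, n)                         # randomized at decision 1
X2 = 0.3 * X1 + 0.2 * A1 + rng.normal(0, 1, n)     # intermediate information
A2 = rng.integers(0, 2, n)                         # randomized at decision 2
Y = X1 + 0.5 * A1 + A2 * X2 + rng.normal(0, 1, n)  # assumed outcome model

def lstsq(D, y):
    return np.linalg.lstsq(D, y, rcond=None)[0]

# Decision 2: regress Y on H2 = (X1, A1, X2) and A2 (with A2*X2 interaction)
D2 = np.column_stack([np.ones(n), X1, A1, X2, A2, A2 * X2])
b2 = lstsq(D2, Y)
Q2 = lambda a2: D2[:, :4] @ b2[:4] + a2 * (b2[4] + b2[5] * X2)
d2 = (Q2(1) > Q2(0)).astype(int)                   # estimated stage-2 rule
Ytil2 = np.maximum(Q2(0), Q2(1))                   # pseudo-outcome

# Decision 1: regress the pseudo-outcome on H1 = X1 and A1
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
b1 = lstsq(D1, Ytil2)
Q1 = lambda a1: b1[0] + b1[1] * X1 + a1 * (b1[2] + b1[3] * X1)
d1 = (Q1(1) > Q1(0)).astype(int)
V_hat = np.mean(np.maximum(Q1(0), Q1(1)))          # plug-in value of d_hat_opt
```

Here the assumed stage-2 contrast is A2*X2, so the fitted stage-2 rule should be close to "give A2 = 1 iff X2 > 0", and the stage-1 rule should favor A1 = 1 (direct benefit plus a favorable shift of X2). The stage-1 linear model is knowingly misspecified, echoing the caveat on the later "More on Q-Learning" slides.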
Other estimators
Issue for Q-learning: Model misspecification
Inverse weighted methods: Analogy to a dropout problem
Restricted class D_eta; generalization of IPWE, AIPWE: Zhang et al. (2013), Biometrika
Subjects are included as long as their observed sequential treatments are consistent with following d_eta
Zhao et al. (2015), JASA (IPWE only, simultaneous or backward outcome weighted learning, S/BOWL)
SMART
Leukemia example (recap): Randomization at the marked points
[Diagram as before: induction C1 vs. C2; responders re-randomized to M1 vs. M2, nonresponders to S1 vs. S2]
Sequential randomization automatically holds
SMART
Embedded regimes: This SMART embeds 8 simple regimes, for j, l, p = 1, 2, of the form
"Give c_j, followed by m_l if response, else s_p if nonresponse"
E.g., in symbols:
d1(h1) = c_j for all h1
d2(h2) = m_l if response, = s_p if nonresponse
These embedded regimes use no baseline information at decision 1 and response status only at decision 2
Randomization guarantees subjects whose actual treatments are consistent with following at least 1 of the 8 regimes
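Enumerating the 8 embedded regimes is straightforward; a small illustration using the c/m/s labels from the leukemia example (the dictionary structure is our own):

```python
# Enumerate the 2 x 2 x 2 = 8 embedded regimes of the leukemia SMART.
from itertools import product

regimes = [
    {"induction": c, "responder": m, "nonresponder": s}
    for c, m, s in product(["c1", "c2"], ["m1", "m2"], ["s1", "s2"])
]

def decision2(regime, responded):
    # Rule 2 uses response status only, as the embedded regimes do
    return regime["responder"] if responded else regime["nonresponder"]
```

Each element of `regimes` is one fixed strategy "give c_j, then m_l if response, else s_p", which is exactly the class the trial embeds.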
SMART
Multi-purpose:
Compare the embedded regimes ("treatment sequences")
Collect extensive, detailed information at baseline and intermediate to decisions 1 and 2 to inform estimation of an optimal regime
Experience: Numerous SMARTs conducted in behavioral sciences research, https://methodology.psu.edu/ra/smart/projects
Stepped care interventions
Also in cancer (CALGB/Alliance), but not focused on regimes
CALGB 8923 (1995)
[Diagram: AML patients randomized between follow-up chemo + placebo and follow-up chemo + GM-CSF; within each arm, responders re-randomized between Intensification I and Intensification II; nonresponders to follow-up]
But not for the purpose of inference on regimes...
SMART design principles
Current paradigm: Embed simple regimes
Base sample size on conventional comparisons, e.g., decision 1 treatments
Base sample size on embedded regimes (how?)
Base sample size on comparing the most and least intensive embedded regimes
Open problem
Optimizing behavioral cancer pain intervention
PI: Tamara Somers, Duke University
Behavioral interventions to prevent HIV
PI: Brian Mustanski, Northwestern University
Methodological challenges
High-dimensional information? Regression model selection?
Black box vs. restricted class of regimes?
Assessment of uncertainty: nonstandard asymptotic theory
Censored survival outcomes
Multiple outcomes (e.g., efficacy and toxicity)?
Design considerations for SMARTs?
Real-time regimes (mHealth), ...
Statistical research is way ahead of what is actually being done in practice
Software
IMPACT: Innovative Methods Program for Advancing Clinical Trials
A joint venture of Duke, UNC-Chapel Hill, and NC State, supported by NCI Program Project P01 CA142538 (2010-2020), http://impact.unc.edu
R package: DynTxRegime implements all this and more; also available on CRAN
Shameless promotion
Coming in 2018:
Tsiatis, A. A., Davidian, M., Laber, E. B., and Holloway, S. T. (2017). An Introduction to Dynamic Treatment Regimes: Statistical Methods for Precision Medicine. Chapman & Hall/CRC Press.
References
Schulte, P. J., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2014). Q- and A-learning methods for estimating optimal dynamic treatment regimes. Statistical Science, 29, 640-661.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics, 68, 1010-1018.
Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E. B. (2012). Estimating optimal treatment regimes from a classification perspective. Stat, 1, 103-114.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika, 100, 681-694.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association, 110, 583-598.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107, 1106-1118.
Additional references
Bai, X., Tsiatis, A. A., Lu, W., and Song, R. (2016). Optimal treatment regimes for survival endpoints using a locally-efficient doubly-robust estimator from a classification perspective. Lifetime Data Analysis, in press.
Davidian, M., Tsiatis, A. A., and Laber, E. B. (2016). Dynamic treatment regimes. In Cancer Clinical Trials: Current and Controversial Issues in Design and Analysis, George, S. L., Wang, X., and Pang, H. (eds). Boca Raton: Chapman & Hall/CRC Press, ch. 13, pp. 409-446.
Jiang, R., Lu, W., Song, R., and Davidian, M. (2016). On estimation of optimal treatment regimes for maximizing t-year survival probability. Journal of the Royal Statistical Society, Series B, in press.
Laber, E. B., Zhao, Y., Regh, T., Davidian, M., Tsiatis, A. A., Stanford, J., Zeng, D., Song, R., and Kosorok, M. R. (2016). Using pilot data to size a two-arm randomized trial to find a nearly optimal personalized treatment strategy. Statistics in Medicine, 35, 1245-1256.
Goldberg, Y. and Kosorok, M. R. (2012). Q-learning with censored data. Annals of Statistics, 40, 529-560.
Additional references
Zhang, Y., Laber, E. B., Tsiatis, A. A., and Davidian, M. (2015). Using decision lists to construct interpretable and parsimonious treatment regimes. Biometrics, 71, 895-904.
Zhao, Y. Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015). Doubly robust learning for estimating individualized treatment with censored data. Biometrika, 102, 151-168.
Important philosophical point
Distinguish between: the best treatment for a patient, and the best treatment decision for a patient given the information available on the patient
Best treatment for a patient: Is 0 or 1 according as Y(0) > Y(1) or Y(0) < Y(1); we cannot hope to determine this, because we can never see all potential outcomes on a given patient
Best treatment decision given the information available: We can hope to make the optimal decision given the information available, i.e., find d_opt that makes E{Y(d) | X = x} and V(d) = E{Y(d)} as large as possible
Classification perspective
Generic classification situation:
Z = class ("label"); here, Z = {0, 1} (binary)
X = features (covariates)
d is a classifier: d : x -> {0, 1}
D is a family of classifiers, e.g.,
Hyperplanes of the form I(eta_0 + eta_1 X_1 + X_2 > 0)
Rectangular regions of the form I(X_1 < eta_1) + I(X_1 >= eta_1, X_2 < eta_2)
Classification perspective
Generic classification problem:
Training set: (X_i, Z_i), i = 1, ..., n
Find the classifier d in D that minimizes the
Classification error: sum_{i=1}^n {Z_i - d(X_i)}^2
or Weighted classification error: sum_{i=1}^n w_i {Z_i - d(X_i)}^2
Machine learning (supervised learning):
Support vector machines: hyperplanes
Recursive partitioning (CART): rectangular regions
Value search estimator, revisited
Recall: Estimation of d_eta in a restricted class D_eta; eta_opt maximizes V(d_eta) = E{Y(d_eta)}
AIPWE:
V_hat_AIPWE(d_eta) = n^{-1} sum_{i=1}^n [ C_{eta,i} Y_i / pi_c(X_i; eta, gamma_hat) - {C_{eta,i} - pi_c(X_i; eta, gamma_hat)} / pi_c(X_i; eta, gamma_hat) * m(X_i; eta, beta_hat) ]
eta_hat_opt maximizes V_hat_AIPWE(d_eta), giving the estimator d_hat_eta,AIPWE_opt(x)
Value search estimator, revisited
Algebra: V_hat_AIPWE(d_eta) can be rewritten as
-n^{-1} sum_{i=1}^n |C_hat(X_i)| [I{C_hat(X_i) > 0} - d(X_i; eta)]^2 + terms not involving d
C_hat(X_i) = { A_i Y_i / pi(X_i; gamma_hat) - (A_i - pi(X_i; gamma_hat)) / pi(X_i; gamma_hat) * Q(X_i, 1; beta_hat) } - { (1 - A_i) Y_i / (1 - pi(X_i; gamma_hat)) + (A_i - pi(X_i; gamma_hat)) / (1 - pi(X_i; gamma_hat)) * Q(X_i, 0; beta_hat) }
E{C_hat(X_i) | X_i} = C(X_i) = Q(X_i, 1) - Q(X_i, 0); C(X) is the contrast function
Classification formulation
Result: Estimate eta_opt, and thus obtain d_hat_eta,AIPWE_opt(x) = d(x, eta_hat_opt), by minimizing in eta
n^{-1} sum_{i=1}^n |C_hat(X_i)| [I{C_hat(X_i) > 0} - d(X_i; eta)]^2
Can be viewed as a weighted classification problem:
Label: I{C_hat(X_i) > 0}
Classifier: d(X_i; eta)
Weight: |C_hat(X_i)|
Zhang et al. (2012), Stat; Zhao et al. (2012), JASA (IPWE only, outcome weighted learning, OWL)
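As a sketch of the weighted classification formulation, one can compute the contrast estimate C_hat, form labels and weights, and minimize the weighted error over a simple class of threshold rules standing in for SVM/CART. The data-generating model and the classifier class are illustrative assumptions:

```python
# Weighted-classification view: build C_hat, then pick the threshold rule
# d(x; eta) = I(x > eta) minimizing the weighted classification error.
import numpy as np

rng = np.random.default_rng(4)
n = 4000
X = rng.uniform(-1, 1, n)
A = rng.integers(0, 2, n)
Y = 1 + X + A * (2 * X) + rng.normal(0, 1, n)
pi = 0.5                                           # known randomization probability

# Outcome model Q(x, a; beta) fit by least squares, as before
D = np.column_stack([np.ones(n), X, A, A * X])
beta = np.linalg.lstsq(D, Y, rcond=None)[0]
Q = lambda x, a: beta[0] + beta[1] * x + a * (beta[2] + beta[3] * x)

# AIPWE contrast estimate; E{C_hat | X} = Q(X, 1) - Q(X, 0)
C_hat = (A * Y / pi - (A - pi) / pi * Q(X, 1)) - (
    (1 - A) * Y / (1 - pi) + (A - pi) / (1 - pi) * Q(X, 0))
label = (C_hat > 0).astype(int)                    # classification label
weight = np.abs(C_hat)                             # classification weight

def werr(eta):                                     # weighted classification error
    d = (X > eta).astype(int)
    return np.sum(weight * (label != d))

grid = np.linspace(-1, 1, 201)
eta_hat = grid[np.argmin([werr(e) for e in grid])]
```

Under the assumed model the true contrast is 2x, so the minimizing threshold should be near 0; swapping the grid-searched stump for an SVM or CART fit with these labels and weights gives the approach cited on the slide.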
More on Q-Learning
Estimation of V(d_opt): For each i, form
Ytilde_{1i} = Ytilde1(H_{1i}; beta1_hat) = max{Q1(H_{1i}, 0; beta1_hat), Q1(H_{1i}, 1; beta1_hat)}
Estimate the value V(d_opt) by
V_hat_Q(d_opt) = n^{-1} sum_{i=1}^n Ytilde_{1i}
More on Q-Learning
Decision 2: Positing/fitting Q2(H2, A2; beta2) is a standard regression problem
Data: (H_{2i}, A_{2i}, Y_i), i = 1, ..., n
For continuous Y, linear models are often used, e.g.,
Q2(h2, a2; beta2) = h2tilde' beta21 + a2 (h2tilde' beta22), h2tilde = (1, h2')'
Standard model selection methods, diagnostics, etc.
Under the linear model:
d2_opt(h2) = I(h2tilde' beta22 > 0)
Ytilde2(H2; beta2_hat) = H2tilde' beta21_hat + (H2tilde' beta22_hat) I(H2tilde' beta22_hat > 0)
More on Q-Learning
Ytilde2(H2; beta2_hat) = H2tilde' beta21_hat + (H2tilde' beta22_hat) I(H2tilde' beta22_hat > 0)
Decision 1: It is common to assume a linear model for Q1(H1, A1) = E{Ytilde2(H2; beta2_hat) | H1, A1}:
Q1(h1, a1; beta1) = h1tilde' beta11 + a1 (h1tilde' beta12), h1tilde = (1, h1')'
Data: (H_{1i}, A_{1i}, Ytilde_{2i}), i = 1, ..., n
This is clearly a nonstandard regression problem; more in a moment
Under the linear model (if it were correct):
d1_opt(h1) = I(h1tilde' beta12 > 0)
More on Q-Learning
Decision 1: Even if the Decision 2 linear model is correctly specified, a linear model at Decision 1 is almost certainly incorrect
Suppose in truth Q2(h2, a2) = x1 + a2 (0.5 a1 + x2) and X2 | X1 = x1, A1 = a1 ~ N(xi x1, sigma^2)
One may show that
Q1(h1, a1) = x1 + (sigma / sqrt(2 pi)) exp{-(0.5 a1 + xi x1)^2 / (2 sigma^2)} + (0.5 a1 + xi x1) Phi{(0.5 a1 + xi x1) / sigma}
Not linear in general
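The expression for Q1 follows from a standard identity for the mean of a censored normal; a brief derivation (our addition, consistent with the display above), with mu = 0.5 a1 + xi x1 and Z standard normal:

```latex
E\{\max(0,\,\mu + \sigma Z)\}
  = \int_{-\mu/\sigma}^{\infty} (\mu + \sigma z)\,\phi(z)\,dz
  = \mu\,\Phi(\mu/\sigma) + \sigma\bigl[-\phi(z)\bigr]_{-\mu/\sigma}^{\infty}
  = \mu\,\Phi(\mu/\sigma) + \frac{\sigma}{\sqrt{2\pi}}\,e^{-\mu^{2}/(2\sigma^{2})},
```

using the facts that the integral of phi over (-mu/sigma, infinity) is Phi(mu/sigma), that z phi(z) has antiderivative -phi(z), and that phi(-mu/sigma) = phi(mu/sigma). Since Ytilde2 = x1 + max(0, 0.5 a1 + X2), taking this expectation given (x1, a1) yields the Q1 display, which is nonlinear in (x1, a1).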