Set-valued dynamic treatment regimes for competing outcomes


1 Set-valued dynamic treatment regimes for competing outcomes Eric B. Laber Department of Statistics, North Carolina State University JSM, Montreal, QC, August 5, 2013

2 Acknowledgments Zhiqiang Tan Jamie Robins Dan Lizotte Brad Ferguson NCSU and UNC DTR Groups

3 Dynamic treatment regimes Motivation: treatment of chronic illness Ex. ADHD, HIV/AIDS, cancer, depression, schizophrenia, drug and alcohol addiction, bipolar disorder, etc. Treatments adapt to evolving health status of patient Requires balancing immediate and long-term outcomes Dynamic treatment regimes (DTRs) Operationalize decision process as sequence of decision rules One decision rule for each intervention time Intervention times often outcome-dependent Decision rule maps up-to-date patient history to a recommended treatment Maximize expectation of a single clinical outcome

4 Estimation of DTRs Many options for estimating DTRs from observational or randomized studies Q- and A-learning (Murphy 2003, 2005; Robins 2004) Regret regression (Henderson et al., 2010) Dynamical systems models (Rivera et al., 2007; Navarro-Barrientos et al., 2010, 2011) Policy search Augmented value maximization (Zhang 2012, 2013ab) Outcome weighted learning (Zhao et al., 2012, 2013ab) Marginal structural mean models (Orellana et al., 2010) All aim to optimize the mean of a single scalar outcome by which the goodness of competing DTRs is measured

5 Competing outcomes Specifying a single scalar outcome sometimes oversimplifies the goal of clinical decision making Need to balance several potentially competing outcomes, e.g., symptom relief and side-effect burden Patients may have heterogeneous and evolving preferences across competing outcomes In some populations, patients do not know or cannot communicate their preferences

6 Motivating example CATIE Schizophrenia study (Stroup et al., 2003) 18-month SMART study with two main phases Aim to compare typical and atypical antipsychotics Trade-off between efficacy and side-effect burden Drugs known to be efficacious are also known to have severe negative side-effects (e.g., olanzapine; Brier et al., 2005) Separate arms for efficacy and tolerability at the second stage Sequential treatment of severe schizophrenia Preferences vary widely across patients (Kinter, 2009) Ex. Joe may be willing to tolerate severe weight gain, lethargy, and reduced social functioning to gain symptom relief. Joan cannot function at work if she is lethargic or experiences clouded thinking and is thus willing to take a less effective treatment if it has low side-effect burden. Preference elicitation is difficult in this population Preferences may evolve over time (Strauss et al., 2011)

7 Composite outcomes A natural approach is to combine competing outcomes into a single composite outcome A single composite outcome for all patients (e.g., Wang et al., 2012) Elicited by a panel of experts Requires patient preference homogeneity (e.g., every patient is indifferent about trading one side-effect unit for three units of symptom relief) Requires that preferences do not change over time Alternatively, estimate the optimal DTR across all convex combinations of competing outcomes simultaneously (e.g., Lizotte et al., 2012) True preferences must be expressible as convex combinations Patient preferences cannot change over time

8 Setup: Two-stage binary treatments For simplicity, we consider the two-stage binary treatment setting; this is not essential (see Laber et al., 2013) Observe {(X_1i, A_1i, X_2i, A_2i, E_i, S_i)}, i = 1, ..., n, comprising n iid patient trajectories: X_t ∈ R^{p_t}: subject covariate vector prior to the t-th treatment A_t ∈ A_t = {-1, 1}: treatment received at time t E ∈ R: first outcome, coded so that higher is better S ∈ R: second outcome, coded so that higher is better Define H_1 = X_1 and H_2 = (X_1, A_1, X_2); let H_t denote the domain of H_t Focus on estimands

9 Setup: DTRs vs SVDTRs A DTR is a pair of functions π = (π_1, π_2) where π_t : H_t -> A_t Under DTR π, a patient presenting with H_t = h_t at time t is recommended treatment π_t(h_t) For a scalar summary Y = Y(E, S) of (E, S), the optimal DTR π^opt satisfies E_{π^opt} Y ≥ E_π Y for all π, where E_π denotes expectation under treatment assignment via π An SVDTR is a pair of functions π = (π_1, π_2) where π_t : H_t -> 2^{A_t} Under SVDTR π, a patient presenting with H_t = h_t at time t is offered the treatment choices π_t(h_t) ⊆ A_t Difficult to define optimality without assumptions about patient utility; we will define ideal decision rules View SVDTRs as a communication tool, typically paired with graphical displays (more on this later)

10 Review: dynamic programming for the optimal DTR When the underlying generative distribution is known, the optimal DTR π^opt can be found via dynamic programming:
1. Define Q_2Y(h_2, a_2) = E(Y | H_2 = h_2, A_2 = a_2)
2. Define Ỹ = max_{a_2} Q_2Y(H_2, a_2) = Q_2Y(H_2, π^opt_2(H_2))
3. Define Q_1Y(h_1, a_1) = E(Ỹ | H_1 = h_1, A_1 = a_1)
Then π^opt_t(h_t) = argmax_{a_t ∈ A_t} Q_tY(h_t, a_t) (Bellman, 1957). Q-learning mimics dynamic programming but uses regression models in place of the requisite conditional expectations. Note: the subscript Y denotes the outcome used to define the Q-function.
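The three steps above, with least-squares regression standing in for the conditional expectations, can be sketched as follows. The simulated data, linear working models, and coefficient values are illustrative assumptions, not taken from CATIE or the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical two-stage trial (illustrative only): covariates X1, X2,
# randomized treatments A1, A2 in {-1, 1}, and one scalar outcome Y.
X1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
X2 = 0.5 * X1 + rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
Y = X2 + 0.7 * A2 * X2 + 0.2 * A1 + rng.normal(size=n)

def lsfit(Z, y):
    # Ordinary least squares in place of the conditional expectations.
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# Step 1: working model for Q_2Y(h2, a2) = E(Y | H2 = h2, A2 = a2).
Z2 = np.column_stack([np.ones(n), X2, A2, A2 * X2])
b2 = lsfit(Z2, Y)
Q2 = lambda x2, a2: b2[0] + b2[1] * x2 + a2 * (b2[2] + b2[3] * x2)

# Step 2: pseudo-outcome Ytilde = max over a2 of the fitted Q-function.
Ytilde = np.maximum(Q2(X2, 1), Q2(X2, -1))

# Step 3: working model for Q_1Y(h1, a1) = E(Ytilde | H1 = h1, A1 = a1).
Z1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
b1 = lsfit(Z1, Ytilde)

# Estimated optimal rules: the sign of each stage's treatment contrast.
pi2_opt = lambda x2: np.sign(b2[2] + b2[3] * x2)
pi1_opt = lambda x1: np.sign(b1[2] + b1[3] * x1)
```

The backup step is what forces a single fixed second-stage rule, which is exactly the obstacle the set-valued construction must work around later.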

11 SVDTRs Incorporate clinically significant differences Δ_E, Δ_S > 0; a difference in E (in S) of less than Δ_E (Δ_S) is not considered clinically significant (e.g., Friedman et al., 2010). Define contrast functions
r_2E(h_2) = E(E | H_2 = h_2, A_2 = 1) - E(E | H_2 = h_2, A_2 = -1)
r_2S(h_2) = E(S | H_2 = h_2, A_2 = 1) - E(S | H_2 = h_2, A_2 = -1)
Ideal set-valued second-stage decision rule:
π^Ideal_2(h_2) =
  {sgn(r_2E(h_2))}, if |r_2E(h_2)| ≥ Δ_E and sgn(r_2E(h_2)) r_2S(h_2) > -Δ_S,
  {sgn(r_2S(h_2))}, if |r_2S(h_2)| ≥ Δ_S and sgn(r_2S(h_2)) r_2E(h_2) > -Δ_E,
  {-1, 1}, otherwise.
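As a concrete reading of the piecewise rule above, here is a minimal sketch of π^Ideal_2 as a function of the two contrast values; the convention sgn(0) = 1 is an assumption the slide does not specify:

```python
def ideal_stage2(r2E, r2S, dE, dS):
    """Ideal set-valued second-stage rule from the contrasts r_2E(h2) and
    r_2S(h2), with clinically significant differences dE = Delta_E and
    dS = Delta_S. Returns a subset of the treatment set {-1, 1}."""
    sgn = lambda v: 1 if v >= 0 else -1  # assumed convention: sgn(0) = 1
    if abs(r2E) >= dE and sgn(r2E) * r2S > -dS:
        return {sgn(r2E)}   # E singles out a treatment and S does not object
    if abs(r2S) >= dS and sgn(r2S) * r2E > -dE:
        return {sgn(r2S)}   # S singles out a treatment and E does not object
    return {-1, 1}          # genuine trade-off or no significant difference

win_on_E = ideal_stage2(2.0, 0.0, 1.0, 1.0)    # clear win on E -> {1}
tradeoff = ideal_stage2(2.0, -3.0, 1.0, 1.0)   # gain on E, loss on S -> {-1, 1}
```

The third branch is the defining feature: when the outcomes genuinely conflict, the rule declines to choose and offers both treatments.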

12 SVDTRs: schematic for π^Ideal_2 [Figure: the (r_2E(h_2), r_2S(h_2)) plane partitioned by the thresholds ±Δ_E and ±Δ_S into regions where π^Ideal_2(h_2) equals {1}, {-1}, or {-1, 1}.]

13 SVDTRs: defining the ideal first-stage rule Q-learning requires a fixed second-stage rule for the backup step. Problem: π^Ideal_2 is not a decision rule. Idea:
1. Define a set of sensible second-stage rules S(π^Ideal_2)
2. For each τ_2 ∈ S(π^Ideal_2), derive the ideal set-valued rule π^Ideal_1(·, τ_2) assuming τ_2 will be followed at the second stage
3. A patient presenting with H_1 = h_1 is offered the treatments π^Ideal_1(h_1) = ∪_{τ_2 ∈ S(π^Ideal_2)} π^Ideal_1(h_1, τ_2)

14 SVDTRs: defining a sensible set A decision rule τ_2 is compatible with a set-valued rule π_2 if τ_2(h_2) ∈ π_2(h_2) for all h_2 ∈ H_2. Let C(π_2) denote the set of all policies compatible with π_2. Minimally require S(π^Ideal_2) ⊆ C(π^Ideal_2). But C(π^Ideal_2) is large and contains many unrealistic rules. The contrast models suggest a functional form for decision rules:
F = {τ_2 : there exists ρ ∈ R^{p_2} such that τ_2(h_2) = sgn(h_22' ρ)}
Define the set of sensible second-stage rules as S(π^Ideal_2) = F ∩ C(π^Ideal_2).
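A sample-based version of the compatibility condition, together with the linear class F, might look like the sketch below; the rule parameterization and the finite-sample check are illustrative assumptions (the definition quantifies over all of H_2):

```python
import numpy as np

def linear_rule(rho):
    """A rule in the class F: tau_2(h2) = sgn(h2' rho), with sgn(0) taken as 1."""
    return lambda h2: 1 if np.dot(h2, rho) >= 0 else -1

def compatible(tau2, pi2, H2_sample):
    """Check tau_2(h2) in pi_2(h2) over a finite sample of histories;
    the slide's definition requires this for every h2 in H_2."""
    return all(tau2(h2) in pi2(h2) for h2 in H2_sample)
```

Intersecting F with C(π^Ideal_2) then amounts to searching over ρ for rules that pass this check everywhere, which is what motivates the mixed-integer-programming formulation mentioned later in the talk.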

15 SVDTRs: ideal set-valued first-stage rule For a fixed second-stage rule τ_2 define
Q_1E(h_1, a_1, τ_2) = E{Q_2E(H_2, τ_2(H_2)) | H_1 = h_1, A_1 = a_1}
r_1E(h_1, τ_2) = Q_1E(h_1, 1, τ_2) - Q_1E(h_1, -1, τ_2)
with Q_1S and r_1S defined analogously. Ideal set-valued first-stage rule under τ_2 at the second stage:
π^Ideal_1(h_1, τ_2) =
  {sgn(r_1E(h_1, τ_2))}, if |r_1E(h_1, τ_2)| ≥ Δ_E and sgn(r_1E(h_1, τ_2)) r_1S(h_1, τ_2) > -Δ_S,
  {sgn(r_1S(h_1, τ_2))}, if |r_1S(h_1, τ_2)| ≥ Δ_S and sgn(r_1S(h_1, τ_2)) r_1E(h_1, τ_2) > -Δ_E,
  {-1, 1}, otherwise.

16 SVDTRs: Defining π^Ideal_1 The rule π^Ideal_1(·, τ_2) assigns a single treatment if that treatment is expected to yield a clinically significant improvement in one or both outcomes while not causing a clinically significant loss in either outcome, assuming the clinician will follow τ_2 at the second decision point. Recall: the ideal decision rule at the first stage is π^Ideal_1(h_1) = ∪_{τ_2 ∈ S(π^Ideal_2)} π^Ideal_1(h_1, τ_2)

17 SVDTR estimation Linear models; H_t1, H_t2 summaries of H_t.
1. Regress E and S on (H_21, H_22 A_2) to obtain
Q̂_2E(H_2, A_2) = β̂_21E' H_21 + β̂_22E' H_22 A_2
Q̂_2S(H_2, A_2) = β̂_21S' H_21 + β̂_22S' H_22 A_2
2. Define
r̂_2E(h_2) = Q̂_2E(h_2, 1) - Q̂_2E(h_2, -1)
r̂_2S(h_2) = Q̂_2S(h_2, 1) - Q̂_2S(h_2, -1)
3. Use the plug-in estimator of π^Ideal_2 to obtain
π̂_2(h_2) =
  {sgn(r̂_2E(h_2))}, if |r̂_2E(h_2)| ≥ Δ_E and sgn(r̂_2E(h_2)) r̂_2S(h_2) > -Δ_S,
  {sgn(r̂_2S(h_2))}, if |r̂_2S(h_2)| ≥ Δ_S and sgn(r̂_2S(h_2)) r̂_2E(h_2) > -Δ_E,
  {-1, 1}, otherwise.
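Steps 1–3 can be sketched with ordinary least squares. The simulated data, the simplifying choice H_21 = H_22 = (1, X_2), and the threshold values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
dE, dS = 0.3, 0.3  # assumed clinically significant differences

# Hypothetical second-stage data; take H_21 = H_22 = (1, X2) for simplicity.
X2 = rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
E = A2 * X2 + rng.normal(size=n)        # efficacy: A2 = 1 helps when X2 > 0
S = -0.5 * A2 + rng.normal(size=n)      # side effects: A2 = 1 always hurts

H = np.column_stack([np.ones(n), X2])
Z = np.column_stack([H, H * A2[:, None]])  # main effects + A2 interactions

# Step 1: fitted linear Q-functions for each outcome.
bE = np.linalg.lstsq(Z, E, rcond=None)[0]
bS = np.linalg.lstsq(Z, S, rcond=None)[0]

# Step 2: estimated contrasts r_hat(h2) = Q_hat(h2, 1) - Q_hat(h2, -1),
# which equal twice the interaction part of the fitted model.
contrast = lambda b, h: 2.0 * np.dot(h, b[2:])

# Step 3: plug-in estimator of the ideal set-valued rule.
def pi2_hat(h):
    rE, rS = contrast(bE, h), contrast(bS, h)
    sgn = lambda v: 1 if v >= 0 else -1
    if abs(rE) >= dE and sgn(rE) * rS > -dS:
        return {sgn(rE)}
    if abs(rS) >= dS and sgn(rS) * rE > -dE:
        return {sgn(rS)}
    return {-1, 1}
```

Under this toy generative model, at h_2 = (1, -2) treatment -1 wins on both outcomes and π̂_2 is the singleton {-1}, while at h_2 = (1, 2) the efficacy gain from A_2 = 1 conflicts with its side-effect cost, so both treatments are offered.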

18 SVDTR estimation cont'd
4.* Construct the set S(π̂_2)
5. For each τ_2 in S(π̂_2), regress Q̂_2E(H_2, τ_2(H_2)) and Q̂_2S(H_2, τ_2(H_2)) on (H_11, H_12 A_1) to obtain
Q̂_1E(H_1, A_1; τ_2) = β̂_11E(τ_2)' H_11 + β̂_12E(τ_2)' H_12 A_1
Q̂_1S(H_1, A_1; τ_2) = β̂_11S(τ_2)' H_11 + β̂_12S(τ_2)' H_12 A_1
6. Define
r̂_1E(h_1, τ_2) = Q̂_1E(h_1, 1, τ_2) - Q̂_1E(h_1, -1, τ_2)
r̂_1S(h_1, τ_2) = Q̂_1S(h_1, 1, τ_2) - Q̂_1S(h_1, -1, τ_2)
7. Use the plug-in estimator of π^Ideal_1 to obtain π̂_1(h_1, τ_2) (similar to step 3)
8. Define π̂_1(h_1) = ∪_{τ_2 ∈ S(π̂_2)} π̂_1(h_1, τ_2)
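Steps 5–8 run a Q-learning backup once per candidate second-stage rule and union the resulting first-stage sets. Everything below is an illustrative assumption, in particular the two-element stand-in for S(π̂_2):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
dE = dS = 0.3  # assumed clinically significant differences

# Hypothetical two-stage data (illustrative only).
X1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
X2 = X1 + rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
E = A2 * X2 + 0.4 * A1 + rng.normal(size=n)
S = -A2 + rng.normal(size=n)

# Stage-2 fits (step 1 of the previous slide).
Z2 = np.column_stack([np.ones(n), X2, A2, A2 * X2])
bE = np.linalg.lstsq(Z2, E, rcond=None)[0]
bS = np.linalg.lstsq(Z2, S, rcond=None)[0]
Q2 = lambda b, x2, a2: b[0] + b[1] * x2 + a2 * (b[2] + b[3] * x2)

# Stand-in for the set S(pi2_hat): two fixed candidate rules tau_2.
S_rules = [lambda x2: np.sign(x2), lambda x2: -np.ones_like(x2)]

def pi1_hat(x1):
    """Union over tau_2 of the plug-in first-stage set-valued rule (step 8)."""
    out = set()
    Z1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
    for tau2 in S_rules:
        r = {}
        for lbl, b in (("E", bE), ("S", bS)):
            # Steps 5-6: regress the pseudo-outcome Q2(H2, tau2(H2)) on
            # stage-1 history, then form the stage-1 contrast at x1.
            b1 = np.linalg.lstsq(Z1, Q2(b, X2, tau2(X2)), rcond=None)[0]
            r[lbl] = 2.0 * (b1[2] + b1[3] * x1)
        # Step 7: plug-in ideal first-stage rule under this tau_2.
        sgn = lambda v: 1 if v >= 0 else -1
        if abs(r["E"]) >= dE and sgn(r["E"]) * r["S"] > -dS:
            out |= {sgn(r["E"])}
        elif abs(r["S"]) >= dS and sgn(r["S"]) * r["E"] > -dE:
            out |= {sgn(r["S"])}
        else:
            out |= {-1, 1}
    return out
```

The next slide notes that the real S(π̂_2) is uncountable, so in practice this loop runs over a provably sufficient finite subset rather than a hand-picked list like `S_rules`.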

19 SVDTR computation Constructing the set S(π̂_2) is a seemingly difficult enumeration problem. While S(π̂_2) is uncountably infinite, one can prove there exists a finite set S*(π̂_2) so that for all h_1:
∪_{τ_2 ∈ S*(π̂_2)} π̂_1(h_1, τ_2) = ∪_{τ_2 ∈ S(π̂_2)} π̂_1(h_1, τ_2)
We cast construction of S*(π̂_2) as a linear Mixed Integer Program (MIP), computed efficiently using CPLEX. With the CATIE data (n ≈ 1100, p ≈ 20) the runtime is less than 1 minute on a dual-core laptop (2.8 GHz, 8 GiB).

20 Communicating the SVDTR A patient presenting at the second stage with H_2 = h_2 is given:
1. π̂_2(h_2)
2. The estimates (Q̂_2E(h_2, 1), Q̂_2S(h_2, 1)) and (Q̂_2E(h_2, -1), Q̂_2S(h_2, -1))
A patient presenting at the first stage with H_1 = h_1 is given:
1. π̂_1(h_1)
2. A plot of Q̂_1E(h_1, a_1, τ_2) against Q̂_1S(h_1, a_1, τ_2) across τ_2 ∈ S(π̂_2), with separate plotting symbols/colors for a_1 = ±1

21 First-stage plot for the mean CATIE subject Estimated SVDTR using data from the CATIE study; outcomes E = PANSS and S = BMI (details in Laber et al., 2013). Plot constructed for a subject with h_1 = n^{-1} Σ_i H_1i. [Figure: estimated first-stage outcomes on the BMI and PANSS axes, with separate symbols for PERP and Not PERP.]

22 Discussion Introduced set-valued dynamic treatment regimes (SVDTRs) One approach to dealing with competing outcomes under: patient preference heterogeneity; difficult-to-elicit preferences; evolving patient preferences Presented the two-stage binary treatment case; the method extends to an arbitrary number of treatments and stages (see Laber et al., 2013) Suggests a new framework for decision problems with both a decision maker and a treatment screener. This framework generalizes MDPs and may facilitate minimax and other definitions of optimality for SVDTRs.

23 Thank you.


More information

Support Vector Machines

Support Vector Machines EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable

More information

Comparative effectiveness of dynamic treatment regimes

Comparative effectiveness of dynamic treatment regimes Comparative effectiveness of dynamic treatment regimes An application of the parametric g- formula Miguel Hernán Departments of Epidemiology and Biostatistics Harvard School of Public Health www.hsph.harvard.edu/causal

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision

More information

Potential Outcomes Model (POM)

Potential Outcomes Model (POM) Potential Outcomes Model (POM) Relationship Between Counterfactual States Causality Empirical Strategies in Labor Economics, Angrist Krueger (1999): The most challenging empirical questions in economics

More information

Coordinate Systems. Recall that a basis for a vector space, V, is a set of vectors in V that is linearly independent and spans V.

Coordinate Systems. Recall that a basis for a vector space, V, is a set of vectors in V that is linearly independent and spans V. These notes closely follow the presentation of the material given in David C. Lay s textbook Linear Algebra and its Applications (rd edition). These notes are intended primarily for in-class presentation

More information

The Lady Tasting Tea. How to deal with multiple testing. Need to explore many models. More Predictive Modeling

The Lady Tasting Tea. How to deal with multiple testing. Need to explore many models. More Predictive Modeling The Lady Tasting Tea More Predictive Modeling R. A. Fisher & the Lady B. Muriel Bristol claimed she prefers tea added to milk rather than milk added to tea Fisher was skeptical that she could distinguish

More information

Real Time Value Iteration and the State-Action Value Function

Real Time Value Iteration and the State-Action Value Function MS&E338 Reinforcement Learning Lecture 3-4/9/18 Real Time Value Iteration and the State-Action Value Function Lecturer: Ben Van Roy Scribe: Apoorva Sharma and Tong Mu 1 Review Last time we left off discussing

More information

Practicable Robust Markov Decision Processes

Practicable Robust Markov Decision Processes Practicable Robust Markov Decision Processes Huan Xu Department of Mechanical Engineering National University of Singapore Joint work with Shiau-Hong Lim (IBM), Shie Mannor (Techion), Ofir Mebel (Apple)

More information

arxiv: v1 [stat.ap] 17 Mar 2018

arxiv: v1 [stat.ap] 17 Mar 2018 Power Analysis in a SMART Design: Sample Size Estimation for Determining the Best Dynamic Treatment Regime arxiv:1804.04587v1 [stat.ap] 17 Mar 2018 William J. Artman Department of Biostatistics and Computational

More information

Evidence synthesis for a single randomized controlled trial and observational data in small populations

Evidence synthesis for a single randomized controlled trial and observational data in small populations Evidence synthesis for a single randomized controlled trial and observational data in small populations Steffen Unkel, Christian Röver and Tim Friede Department of Medical Statistics University Medical

More information

Weighted MCID: Estimation and Statistical Inference

Weighted MCID: Estimation and Statistical Inference Weighted MCID: Estimation and Statistical Inference Jiwei Zhao, PhD Department of Biostatistics SUNY Buffalo JSM 2018 Vancouver, Canada July 28 August 2, 2018 Joint Work with Zehua Zhou and Leslie Bisson

More information

BIOS 6649: Handout Exercise Solution

BIOS 6649: Handout Exercise Solution BIOS 6649: Handout Exercise Solution NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks. This assignment is based on weight-loss

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Sachin Patil Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee Outline Introduction to POMDPs Locally Optimal Solutions

More information

A Robust Method for Estimating Optimal Treatment Regimes

A Robust Method for Estimating Optimal Treatment Regimes Biometrics 63, 1 20 December 2007 DOI: 10.1111/j.1541-0420.2005.00454.x A Robust Method for Estimating Optimal Treatment Regimes Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian Department

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

High Dimensional Propensity Score Estimation via Covariate Balancing

High Dimensional Propensity Score Estimation via Covariate Balancing High Dimensional Propensity Score Estimation via Covariate Balancing Kosuke Imai Princeton University Talk at Columbia University May 13, 2017 Joint work with Yang Ning and Sida Peng Kosuke Imai (Princeton)

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information

ST 732, Midterm Solutions Spring 2019

ST 732, Midterm Solutions Spring 2019 ST 732, Midterm Solutions Spring 2019 Please sign the following pledge certifying that the work on this test is your own: I have neither given nor received aid on this test. Signature: Printed Name: There

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Magical Policy Search: Data Efficient Reinforcement Learning with Guarantees of Global Optimality

Magical Policy Search: Data Efficient Reinforcement Learning with Guarantees of Global Optimality Magical Policy Search Magical Policy Search: Data Efficient Reinforcement Learning with Guarantees of Global Optimality Philip S. Thomas Carnegie Mellon University Emma Brunskill Carnegie Mellon University

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014 Logistic Regression Advanced Methods for Data Analysis (36-402/36-608 Spring 204 Classification. Introduction to classification Classification, like regression, is a predictive task, but one in which the

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis. Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description

More information

Randomization-Based Inference With Complex Data Need Not Be Complex!

Randomization-Based Inference With Complex Data Need Not Be Complex! Randomization-Based Inference With Complex Data Need Not Be Complex! JITAIs JITAIs Susan Murphy 07.18.17 HeartSteps JITAI JITAIs Sequential Decision Making Use data to inform science and construct decision

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

Stochastic process for macro

Stochastic process for macro Stochastic process for macro Tianxiao Zheng SAIF 1. Stochastic process The state of a system {X t } evolves probabilistically in time. The joint probability distribution is given by Pr(X t1, t 1 ; X t2,

More information

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm You have 80 minutes. The exam is closed book, closed notes except a one-page crib sheet, basic calculators only.

More information

1 MDP Value Iteration Algorithm

1 MDP Value Iteration Algorithm CS 0. - Active Learning Problem Set Handed out: 4 Jan 009 Due: 9 Jan 009 MDP Value Iteration Algorithm. Implement the value iteration algorithm given in the lecture. That is, solve Bellman s equation using

More information

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial William R. Gillespie Pharsight Corporation Cary, North Carolina, USA PAGE 2003 Verona,

More information

2534 Lecture 4: Sequential Decisions and Markov Decision Processes

2534 Lecture 4: Sequential Decisions and Markov Decision Processes 2534 Lecture 4: Sequential Decisions and Markov Decision Processes Briefly: preference elicitation (last week s readings) Utility Elicitation as a Classification Problem. Chajewska, U., L. Getoor, J. Norman,Y.

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Introduction to Reinforcement Learning. CMPT 882 Mar. 18

Introduction to Reinforcement Learning. CMPT 882 Mar. 18 Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and

More information