Estimating Optimal Dynamic Treatment Regimes from Clustered Data


Estimating Optimal Dynamic Treatment Regimes from Clustered Data
Bibhas Chakraborty, Department of Biostatistics, Columbia University (bc2425@columbia.edu)
Society for Clinical Trials Annual Meetings, Boston, May 22, 2013

Acknowledgements
Joint work with: Guqian Du (student, Columbia Biostatistics), Ken Cheung (Columbia Biostatistics), and Myunghee Cho Paik (Seoul National University).
Supported by: NIH grant R01 NS072127-01A1 and the Calderone Research Prize for Junior Faculty (Mailman School of Public Health, Columbia University).

The Context
Stroke prevention through optimal dynamic behavioral intervention. Many behavioral interventions are delivered in group settings (e.g., family, clinic, classroom), which leads to clustered (outcome) data.
Overarching methodological questions:
- How do we best modify the SMART design framework to adapt to the group setting?
- How do we best modify existing estimation techniques to account for clustering?

A Motivating Study
A family-based behavioral intervention study in the Northern Manhattan area of NYC; still in the planning phase, so no data are available yet.
Ultimate goal: prevent a second stroke. Operative objective: reduce blood pressure over 12 months.
Interventions: behavioral modification through 3 educational components delivered over 12 months:
- Medication adherence (med)
- Nutrition modifications (nut)
- Physical activity reinforcement (phy)

Outline
1. Possible SMART Designs
2. Q-learning with Clustered Data
3. Simulation Study
4. Summary and End Notes

Design Considerations
Three possible behavioral interventions, delivered at the family level: medication adherence (med), nutrition modifications (nut), and physical activity reinforcement (phy).
Scientific question: What is the optimal sequence of these interventions? This calls for a SMART.
Statistical question: What are some of the SMART options?

Play the Winner, Switch from the Loser SMART
[Design schematic: timeline from baseline to 6 months to 12 months (primary outcome); randomization (R) among the med, nut, and phy components, with separate paths for responders and non-responders at 6 months.]

Randomize the Winner, Switch from the Loser SMART
[Design schematic: timeline from baseline to 6 months to 12 months (primary outcome); randomization (R) among the med, nut, and phy components, with both responders and non-responders re-randomized at 6 months along separate paths.]

Randomize All (Factorial) SMART
[Design schematic: timeline from baseline to 6 months to 12 months (primary outcome); randomization (R) among the med, nut, and phy components at baseline and re-randomization at 6 months regardless of response status.]

Design Considerations
In a family-based SMART:
- The label "Responder" means the patient in the family is a responder, irrespective of the response status of the other family members.
- Sample-size requirements will be higher due to intra-class correlation.
- The 3 design options vary in terms of the exploration vs. exploitation dilemma, a phrase from reinforcement learning (Sutton and Barto, 1998).
Which one is best? What is the right metric to judge that?

Outline
1. Possible SMART Designs
2. Q-learning with Clustered Data
3. Simulation Study
4. Summary and End Notes

Data Structure (without Clustering)
A single-patient trajectory is (O_1, A_1, O_2, A_2, O_3), where
- O_t: observation (pre-treatment) at the t-th stage
- A_t: treatment (action) at the t-th stage, A_t ∈ 𝒜_t
- H_t: history at the t-th stage, with H_1 = O_1 and H_2 = (O_1, A_1, O_2)
- Y: primary outcome (larger is better)
A DTR is a sequence of decision rules d ≡ (d_1, d_2) with d_t(h_t) ∈ 𝒜_t.

Q-learning
The intuition comes from dynamic programming (Bellman, 1957), which applies when the multivariate distribution of the data is known: move backward in time to take care of the delayed effects of treatment.
Define the quality-of-treatment (Q) functions:
Q_2(H_2, A_2) = E[Y_2 | H_2, A_2]
Q_1(H_1, A_1) = E[Y_1 + max_{a_2} Q_2(H_2, a_2) | H_1, A_1], where the maximized term captures the delayed effect.
Optimal DTR: d_t(h_t) = arg max_{a_t} Q_t(h_t, a_t), t = 1, 2.
When the true Q-functions are unknown, one must first estimate them from a data set {(O_1i, A_1i, O_2i, A_2i, O_3i)}, i = 1, ..., n, and then apply dynamic programming.
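Not from the slides, the toy calculation below illustrates the dynamic-programming intuition when the generative distribution is known. It assumes a made-up model with binary O1, A1, O2, A2, a known outcome mean, and no stage-1 outcome (Y_1 = 0): Q_2 is just the known conditional mean, Q_1 averages the maximized Q_2 over the distribution of O_2, and the optimal rules are read off by maximizing each Q-function.

```python
# Toy backward induction with a fully known (made-up) generative model.
import itertools

def p_o2(o2, o1, a1):
    """P(O2 = o2 | O1 = o1, A1 = a1) under the assumed model."""
    p = 0.3 + 0.4 * o1 * a1
    return p if o2 == 1 else 1 - p

def mean_y(o1, a1, o2, a2):
    """Assumed conditional mean E[Y | O1, A1, O2, A2]."""
    return 1 + o2 + a2 * (2 * o2 - 1) + 0.5 * a1 * o1

def Q2(o1, a1, o2, a2):
    # Stage 2: Q2(H2, A2) is the known conditional mean of the outcome
    return mean_y(o1, a1, o2, a2)

def Q1(o1, a1):
    # Stage 1: average max_{a2} Q2 over the distribution of O2 given (O1, A1)
    return sum(p_o2(o2, o1, a1) * max(Q2(o1, a1, o2, a2) for a2 in (0, 1))
               for o2 in (0, 1))

# Optimal DTR: act greedily with respect to each Q-function
for o1 in (0, 1):
    d1 = max((0, 1), key=lambda a1: Q1(o1, a1))
    print(f"O1={o1}: optimal A1 = {d1}, Q1 = {Q1(o1, d1):.3f}")
for o1, a1, o2 in itertools.product((0, 1), repeat=3):
    d2 = max((0, 1), key=lambda a2: Q2(o1, a1, o2, a2))
    print(f"(O1,A1,O2)=({o1},{a1},{o2}): optimal A2 = {d2}")
```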

Q-learning with Linear Regression
Posit linear regression models for the Q-functions: Q_t(H_t, A_t; β_t).
- At stage 2, obtain β̂_2 by least squares.
- Construct the stage-1 dependent variable: Ŷ_1i = Y_1i + max_{a_2} Q_2(H_2i, a_2; β̂_2), i = 1, ..., n.
- At stage 1, regress Ŷ_1 on suitable covariates to obtain β̂_1.
Estimated optimal DTR: d̂_t(h_t) = arg max_{a_t} Q_t(h_t, a_t; β̂_t), t = 1, 2.
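The following is a minimal sketch (not the authors' code) of this recipe on simulated, non-clustered data with treatments coded ±1. The generative model and working models are invented for illustration, and the stage-1 outcome Y_1 is taken to be zero, so the pseudo-outcome is simply the maximized stage-2 prediction.

```python
# Two-stage Q-learning with linear working models fit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated trajectories with randomized treatments A1, A2 in {-1, +1}
O1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
O2 = 0.5 * O1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
Y = O2 + A2 * (0.8 * O2 - 0.2) + 0.4 * A1 * O1 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for the working model X @ beta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 2: regress Y on H2 and its interactions with A2
X2 = np.column_stack([np.ones(n), O1, O2, A2, A2 * O2])
beta2 = ols(X2, Y)

# Stage-1 pseudo-outcome: Yhat1_i = max_{a2} Q2(H2_i, a2; beta2_hat)
def q2_pred(a2):
    return np.column_stack([np.ones(n), O1, O2, a2, a2 * O2]) @ beta2

Yhat1 = np.maximum(q2_pred(np.ones(n)), q2_pred(-np.ones(n)))

# Stage 1: regress the pseudo-outcome on H1 and its interaction with A1
X1 = np.column_stack([np.ones(n), O1, A1, A1 * O1])
beta1 = ols(X1, Yhat1)

# Estimated rules: choose the sign of the estimated treatment contrast
d2_hat = np.sign(beta2[3] + beta2[4] * O2)   # stage-2 rule, a function of O2
d1_hat = np.sign(beta1[2] + beta1[3] * O1)   # stage-1 rule, a function of O1
print("stage-2 coefficients:", np.round(beta2, 2))
print("stage-1 coefficients:", np.round(beta1, 2))
```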

Q-learning with Clustered Data
Data trajectory for the j-th member of the i-th family, 1 ≤ i ≤ N, 1 ≤ j ≤ n (WLOG, j = 1 indicates the patient and j > 1 a family member): (O_1ij, A_1i, O_2ij, A_2i, O_3ij).
Let H_tij denote the subject history and H_ti = (H_ti1, ..., H_tin) the corresponding family history.
Define the clustered Q-functions for subject j in family i:
Q_2j(H_2i, A_2i) = E[Y_ij | H_2i, A_2i]
Q_1j(H_1i, A_1i) = E[max_{a_2} Q_2j(H_2i, a_2) | H_1i, A_1i]
The Q-functions are defined by conditioning on the family history. As a result, the maximization accounts for information available from family members, and Q_tj and Q_tj' are correlated within a family.
Estimated optimal DTR: d̂_tj(h_ti) = arg max_{a_t} Q_tj(h_ti, a_t; β̂_t), t = 1, 2.

Q-learning with GEE
Define Y_i = (Y_i1, ..., Y_in)^T, with working covariance Σ_2i = σ^2 R for all i, where σ^2 is the common variance of the Y_ij's and R is a working correlation matrix (e.g., exchangeable).
For the i-th family, let (Q_t1(β_t), Q_t2(β_t), ..., Q_tn(β_t))^T = X_ti β_t, where X_ti is the design matrix consisting of H_ti, A_ti, and their interactions.
Next, define Y = (Y_1^T, ..., Y_N^T)^T, X_2 = (X_21, ..., X_2N)^T, and the block-diagonal Σ_2 = diag(Σ_21, ..., Σ_2N).
Using GEE, estimate β_2 by β̂_2 = (X_2^T Σ_2^{-1} X_2)^{-1} X_2^T Σ_2^{-1} Y.
Construct the stage-1 dependent variables Ŷ_1ij = max_{a_2} Q_2j(h_2i, a_2; β̂_2) and stack them to get Ŷ_1.
Specify a working covariance and define X_1 and Σ_1 as above; then estimate β_1 by β̂_1 = (X_1^T Σ_1^{-1} X_1)^{-1} X_1^T Σ_1^{-1} Ŷ_1.
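Below is a rough sketch of the weighted least-squares step above with an exchangeable working correlation, using made-up stage-2 data. It is an illustration under assumptions, not the authors' implementation: the correlation parameter ρ is fixed at 0.3 rather than estimated iteratively from residuals as a full GEE fit would do.

```python
# GEE-style weighted least squares with an exchangeable working correlation.
import numpy as np

def gee_wls(X_blocks, y_blocks, rho, sigma2=1.0):
    """Compute (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y with a block-diagonal
    working covariance Sigma_i = sigma2 * R(rho) per family, where R(rho) is
    the exchangeable correlation matrix."""
    p = X_blocks[0].shape[1]
    XtWX = np.zeros((p, p))
    XtWy = np.zeros(p)
    for Xi, yi in zip(X_blocks, y_blocks):
        m = len(yi)
        R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
        W = np.linalg.inv(sigma2 * R)      # working precision for this family
        XtWX += Xi.T @ W @ Xi
        XtWy += Xi.T @ W @ yi
    return np.linalg.solve(XtWX, XtWy)

# Toy stage-2 data: N families of size 3, family-level treatment A2 in {-1, +1},
# a subject-level covariate O2, and a shared family effect inducing clustering.
rng = np.random.default_rng(1)
N, n_fam = 200, 3
X_blocks, y_blocks = [], []
for _ in range(N):
    a2 = rng.choice([-1, 1])
    o2 = rng.normal(size=n_fam)
    u = rng.normal(scale=0.7)              # family random effect
    y = o2 + a2 * (0.5 * o2 - 0.1) + u + rng.normal(size=n_fam)
    X_blocks.append(np.column_stack(
        [np.ones(n_fam), o2, np.full(n_fam, a2), a2 * o2]))
    y_blocks.append(y)

beta2_hat = gee_wls(X_blocks, y_blocks, rho=0.3)
print("stage-2 GEE estimate:", np.round(beta2_hat, 2))
```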

Q-learning with GEE
The stage-2 estimator β̂_2 is consistent for β_2 even under incorrect specification of Σ_2, and its variance can be consistently estimated by the well-known sandwich formula.
Theory is unavailable at this point for the stage-1 estimator (due to the infamous non-regularity).
Practical question: Does incorporating GEE into the Q-learning framework improve the quality of estimation in terms of the Value of the estimated DTR (i.e., the mean outcome if the estimated DTR were used to assign treatment to the entire population)?

Outline
1. Possible SMART Designs
2. Q-learning with Clustered Data
3. Simulation Study
4. Summary and End Notes

Simulation Design
Time-varying variable of interest: (negative) systolic blood pressure (BP), measured at baseline, 6 months, and 12 months.
The generative model follows the effects observed in the Northern Manhattan Study (NOMAS), a longitudinal cohort study of stroke risk factors among Black, Hispanic, and White populations in the Northern Manhattan area (http://columbianomas.org/study.html).
BP is generated via a linear model involving:
- subject-level characteristics: baseline physical activity (ACT), education, diabetes [following NOMAS]
- a family-level characteristic: ethnicity, represented by two indicators, BLACK and HISP [following NOMAS]
- a family-level random effect: N(0, σ_F^2) [simulated SMART]
- treatment variables: NUT, PHY, HISP × NUT, ACT × PHY [simulated SMART]
- random error: N(0, σ_e^2) [following NOMAS]
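As a loose illustration only, the sketch below generates clustered outcomes with the same ingredients (subject-level covariates, family-level ethnicity indicators, a family random effect, treatment terms scaled by an effect size b, and random error). All coefficients and distributions are placeholders, not the NOMAS-based values, and treatment assignment is simplified to a single randomization rather than a SMART.

```python
# Placeholder generative model for clustered (family-level) outcomes.
import numpy as np

def simulate_family(b, sigma_F, sigma_e, n_fam=3, rng=None):
    """Negative systolic BP for one family (larger is better); coefficients are
    illustrative placeholders, not the NOMAS-based values."""
    rng = rng or np.random.default_rng()
    # Subject-level characteristics: physical activity, education, diabetes
    act = rng.normal(size=n_fam)
    educ = rng.normal(size=n_fam)
    diab = rng.binomial(1, 0.2, size=n_fam)
    # Family-level characteristic: ethnicity indicators (BLACK, HISP)
    black, hisp = rng.multinomial(1, [0.35, 0.35, 0.30])[:2]
    # Family-level treatments; a real SMART would assign these sequentially
    nut, phy = rng.choice([-1, 1]), rng.choice([-1, 1])
    # Family random effect and subject-level errors
    u = rng.normal(scale=sigma_F)
    e = rng.normal(scale=sigma_e, size=n_fam)
    return (-130.0 + 2.0 * act + 0.5 * educ - 1.0 * diab
            - 3.0 * black - 2.0 * hisp
            + b * (nut + phy + hisp * nut + act * phy)
            + u + e)

families = [simulate_family(b=0.5, sigma_F=1.0, sigma_e=1.0,
                            rng=np.random.default_rng(i)) for i in range(300)]
print("outcomes for the first family:", np.round(families[0], 1))
```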

Simulation Design
The sizes of the treatment effects are varied through a parameter b, set according to Cohen's (1988) small and medium effect sizes.
The three types of SMART designs above are employed to assign treatments.
Methods compared: Q-learning with patient-only data (QL-Patient), Q-learning with all subjects (QL-All), and Q-learning with GEE (QL-GEE).
Metric used to judge performance: the Value of the estimated DTR.
1000 simulated data sets, each consisting of 300 families with 3 members per family (1 patient per family).
Given a training data set (and hence an estimated DTR), its Value is computed using a separate Monte Carlo evaluation data set of 2000 patients.
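A minimal sketch of the Value computation follows: simulate an evaluation sample, assign every treatment according to the estimated rules, and average the outcome. The generative model reuses the toy model from the earlier Q-learning sketch and is not the NOMAS-based model of the actual simulations.

```python
# Monte Carlo approximation of the Value of a dynamic treatment regime.
import numpy as np

def value_of_dtr(d1, d2, n_eval=2000, rng=None):
    """Mean outcome when every treatment is assigned by the rules
    d1: O1 -> A1 and d2: (O1, A1, O2) -> A2, under the toy generative model."""
    rng = rng or np.random.default_rng(2)
    O1 = rng.normal(size=n_eval)
    A1 = d1(O1)                                     # follow the stage-1 rule
    O2 = 0.5 * O1 + 0.3 * A1 + rng.normal(size=n_eval)
    A2 = d2(O1, A1, O2)                             # follow the stage-2 rule
    Y = O2 + A2 * (0.8 * O2 - 0.2) + 0.4 * A1 * O1 + rng.normal(size=n_eval)
    return Y.mean()

# Example: evaluate a pair of linear-contrast rules (signs of the contrasts)
v = value_of_dtr(d1=lambda o1: np.sign(0.4 * o1),
                 d2=lambda o1, a1, o2: np.sign(0.8 * o2 - 0.2))
print("estimated Value:", round(v, 3))
```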

Results: Play the Winner, Switch from the Loser SMART
(Value of the estimated DTR under each method)

Example | Effect Size, b | σ_F/σ_e | QL-Patient | QL-All | QL-GEE
1       | Small          | 1       | 27.24      | 27.45  | 27.47
2       | Small          | 2       | 26.68      | 26.79  | 26.82
3       | Small          | 4       | 25.91      | 25.98  | 26.02
4       | Medium         | 1       | 27.96      | 27.97  | 28.00
5       | Medium         | 2       | 27.60      | 27.68  | 27.79
6       | Medium         | 4       | 25.80      | 26.06  | 26.16

Results: Randomize the Winner, Switch from the Loser SMART
(Value of the estimated DTR under each method)

Example | Effect Size, b | σ_F/σ_e | QL-Patient | QL-All | QL-GEE
1       | Small          | 1       | 27.26      | 27.30  | 27.42
2       | Small          | 2       | 26.55      | 26.68  | 26.73
3       | Small          | 4       | 25.93      | 25.96  | 26.04
4       | Medium         | 1       | 27.99      | 27.96  | 28.02
5       | Medium         | 2       | 27.72      | 27.80  | 27.82
6       | Medium         | 4       | 25.76      | 25.99  | 26.21

Results: Randomize All (Factorial) SMART
(Value of the estimated DTR under each method)

Example | Effect Size, b | σ_F/σ_e | QL-Patient | QL-All | QL-GEE
1       | Small          | 1       | 27.27      | 27.40  | 27.42
2       | Small          | 2       | 26.61      | 26.74  | 26.78
3       | Small          | 4       | 26.09      | 26.13  | 26.15
4       | Medium         | 1       | 27.98      | 27.97  | 28.00
5       | Medium         | 2       | 27.70      | 27.74  | 27.81
6       | Medium         | 4       | 25.81      | 26.00  | 26.18

Outline
1. Possible SMART Designs
2. Q-learning with Clustered Data
3. Simulation Study
4. Summary and End Notes

Summary and End Notes
All three versions of SMART perform comparably in terms of Value; one may be preferred over the others on ethical and/or logistical grounds.
We propose an extension of Q-learning that incorporates GEE (QL-GEE) for estimating optimal DTRs from family-based or community-based studies, where clustered data arise naturally.
In the simulations tried so far, QL-GEE performs only marginally better than standard Q-learning. The hope is to see bigger differences in performance in as-yet-unexplored settings, and perhaps at smaller sample sizes.
Formally constructing confidence intervals for the Value of the estimated DTRs is the natural next step; it is challenging, but early inroads have been made into that problem (a talk at the UPenn conference in April will be published in a special issue of Clinical Trials containing the conference proceedings).

It's all work-in-progress... Don't take me too seriously!!