Targeted Group Sequential Adaptive Designs

Similar documents
Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial

Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials

University of California, Berkeley

University of California, Berkeley

Adaptive Trial Designs

Targeted Maximum Likelihood Estimation in Safety Analysis

University of California, Berkeley

Targeted Learning for High-Dimensional Variable Importance

Causal Inference Basics

University of California, Berkeley

University of California, Berkeley

University of California, Berkeley

Data-adaptive inference of the optimal treatment rule and its mean reward. The masked bandit

University of California, Berkeley

University of California, Berkeley

Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

University of California, Berkeley

TARGETED SEQUENTIAL DESIGN FOR TARGETED LEARNING INFERENCE OF THE OPTIMAL TREATMENT RULE AND ITS MEAN REWARD

University of California, Berkeley

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Stratégies bayésiennes et fréquentistes dans un modèle de bandit

Adaptive designs to maximize power. trials with multiple treatments

Statistical Inference for Data Adaptive Target Parameters

The International Journal of Biostatistics

University of California, Berkeley

Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials

Deductive Derivation and Computerization of Semiparametric Efficient Estimation

Collaborative Targeted Maximum Likelihood Estimation. Susan Gruber

Machine Learning Theory (CS 6783)

University of California, Berkeley

A Bayesian Machine Learning Approach for Optimizing Dynamic Treatment Regimes

An Omnibus Nonparametric Test of Equality in. Distribution for Unknown Functions

University of California, Berkeley

University of California, Berkeley

Machine Learning Theory (CS 6783)

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian

Bandit models: a tutorial

A Sampling of IMPACT Research:

Some Properties of the Randomized Play the Winner Rule

Implementing Response-Adaptive Randomization in Multi-Armed Survival Trials

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

Online Learning and Sequential Decision Making

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

The information complexity of sequential resource allocation

University of California, Berkeley

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

Multi-armed bandit models: a tutorial

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Data-adaptive smoothing for optimal-rate estimation of possibly non-regular parameters

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits

Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Machine Learning Linear Classification. Prof. Matteo Matteucci

Reader Reaction to A Robust Method for Estimating Optimal Treatment Regimes by Zhang et al. (2012)

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Targeted Minimum Loss Based Estimation for Longitudinal Data. Paul H. Chaffee. A dissertation submitted in partial satisfaction of the

University of California, Berkeley

arxiv: v1 [math.st] 24 Mar 2016

Interactive Q-learning for Probabilities and Quantiles

Revisiting the Exploration-Exploitation Tradeoff in Bandit Models

University of California, Berkeley

The information complexity of best-arm identification

A simulation study for comparing testing statistics in response-adaptive randomization

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Introduction to Bayesian Learning. Machine Learning Fall 2018

Estimation Considerations in Contextual Bandits

ENHANCED PRECISION IN THE ANALYSIS OF RANDOMIZED TRIALS WITH ORDINAL OUTCOMES

Modification and Improvement of Empirical Likelihood for Missing Response Problem

The Multi-Armed Bandit Problem

Counterfactual Model for Learning Systems

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017

Targeted Learning with Daily EHR Data

Bayesian Contextual Multi-armed Bandits

Estimating the Effect of Vigorous Physical Activity on Mortality in the Elderly Based on Realistic Individualized Treatment and Intentionto-Treat

Optimal allocation to maximize power of two-sample tests for binary response

University of California, Berkeley

Selective Inference for Effect Modification

Combining multiple observational data sources to estimate causal eects

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

Targeted Learning. Sherri Rose. April 24, Associate Professor Department of Health Care Policy Harvard Medical School

Counterfactual Evaluation and Learning

Bandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University

Classification. Chapter Introduction. 6.2 The Bayes classifier

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

High Dimensional Propensity Score Estimation via Covariate Balancing

Empirical Bayes Moderation of Asymptotically Linear Parameters

Lecture 01: Introduction

Bandits : optimality in exponential families

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes

Linear Methods for Prediction

The Multi-Arm Bandit Framework

High-Throughput Sequencing Course

Empirical Risk Minimization

UNIVERSITY OF TORONTO Faculty of Arts and Science

Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods

Propensity Score Weighting with Multilevel Data

arxiv: v1 [stat.me] 15 May 2011

Transcription:

Targeted Group Sequential Adaptive Designs Mark van der Laan Department of Biostatistics, University of California, Berkeley School of Public Health Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 0 / 13

Contextual multiple-bandit problem in computer science Consider a sequence (W n, Y n (0), Y n (1)) n 1 of i.i.d. random variables with common probability distribution P F 0 : W n, nth context (possibly high-dimensional) Y n (0), nth reward under action a = 0 (in ]0, 1[) Y n (1), nth reward under action a = 1 (in ]0, 1[) We consider a design in which one sequentially, observe context W n carry out randomized action A n {0, 1} based on past observations and W n get the corresponding reward Y n = Y n (A n ) (other one not revealed), resulting in an ordered sequence of dependent observations O n = (W n, A n, Y n ). Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 1 / 13

Goal of experiment We want to estimate the optimal treatment allocation/action rule d 0 : d 0 (W ) = arg max a=0,1 E 0 {Y (a) W }, which optimizes EY d over all possible rules d. the mean reward under this optimal rule d 0 : Ψ(P F 0 ) = E 0{Y (d 0 (W ))}, and we want maximally narrow valid confidence intervals (primary) Statistical... minimize regret (secondary) 1 n ni=1 (Y i Y i (d n ))... bandits This general contextual multiple bandit problem has enormous range of applications: e.g., on-line marketing, recommender systems, randomized clinical trials. Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 2 / 13

Targeted Group Sequential Adaptive Designs We refer to such an adaptive design as a particular targeted adaptive group-sequential design (van der Laan, 2008). In general, such designs aim at each stage to optimize a particular data driven criterion over possible treatment allocation probabilities/rules, and then use it in next stage. In this case, the criterion of interest is an estimator of EY d based on past data, but, other examples are, for example, that the design aims to maximize the estimated information (i..e., minimize an estimator of the variance of efficient estimator) for a particular statistical target parameter. Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 3 / 13

Mean reward under the optimal dynamic rule Notation: Q(a, W ) = E P F (Y (a) W ) q(w ) = Q(1, W ) Q(0, W ) ( blip function ) Optimal rule d( Q): d Q (W ) = arg max Q(a, W ) = I{ q(w ) > 0}. a=0,1 Mean reward under optimal dynamic rule: { Ψ(P F ) = E P F Q(d( Q)(W ), W )} Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 4 / 13

Bibliography (non exhaustive!) Sequential designs Thompson (1933), Robbins (1952) specifically in the context of medical trials - Anscombe (1963), Colton (1963) - response-adaptive designs: Cornfield et al. (1969), Zelen (1969), many more since then Covariate-adjusted Response-Adaptive (CARA) designs Rosenberger et al. (2001), Bandyopadhyay and Biswas (2001), Zhang et al. (2007), Zhang and Hu (2009), Shao et al (2010)... typically study - convergence of design... in correctly specified parametric model Chambaz and van der Laan (2013), Zheng, Chambaz and van der Laan (2015) concern - convergence of design and asymptotic behavior of estimator... using (mis-specified) parametric model Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 5 / 13

Bibliography for estimation of optimal rule under iid sampling Zhao et al (2012), Chakraborty et al (2013), Goldberg et al (2014), Laber et al (2014), Zhao et al (2015) Luedtke and van der Laan (2015, 2016), based on TMLE Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 6 / 13

Sampling Strategy: Initialization Choose a sequence ( Q n ) n 1 of estimators of Q 0 a function G : [ 1, 1] ]0, 1[ such that, for some t, ξ > 0 small, - x > ξ implies G(x) = t if x < 0 and G(x) = 1 t if x > 0 - G(0) = 50% and G non-decreasing G( q) represents a smooth approximation of the deterministic rule W I{ q(w ) > 0} over [ 1, +1], bounded away from 0 and 1. For i = 1,..., n 0 (initial sample size), carry out action/treatment A i drawn from Bernoulli(50%), observe O i = (W i, A i, Y i = Y i (A i )) Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 7 / 13

Sampling Strategy: Sequentially learn and treat Conditional on O 1,..., O n, Then 1 estimate Q 0 with Q n yields 2 define g n+1 (W ) = G( q n (W )). Note: If q n (W ) > ξ, then g n+1 (W ) d( Q n )(W ) observe W n+1 { qn (W ) = Q n (1, W ) Q n (0, W ) d( Q n )(W ) = I{ q n (W ) > 0} sample A n+1 from Bernoulli(g n+1 (W n+1 )), carry out action/treatment A n+1 get reward Y n+1 = Y n+1 (A n+1 ) Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 8 / 13

Estimation of outcome regression and optimal rule: Super-learning At each n, we can use super-learning of Q 0 = E 0 (Y A, W ), and thereby q 0 and d 0 (W ) = I( q 0 (W ) > 0). In Luedtke, van der Laan (2014), we propose a super-learner of d 0 based on evaluating each candidate estimator ˆd on a cross-validated estimator of E 0 Yˆd, so that the super-learner optimizes the dissimilarity EY d EY d0. This super-learner can include candidate estimators ˆd based on an estimator of q 0, and estimators that directly estimate d 0 by reformulating it as a classification problem using standard machine learning algorithms for classification. Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 9 / 13

TMLE of Mean Reward under Optimal Rule Given the estimator d i based on first i observations of the optimal rule d 0, across all i = n 0,..., n, the TMLE is defined as follows: Use submodel Logit Q n,ɛ = Logit Q n + ɛh(g n, d n ), where H(g n )(A, W ) = I(A = d n (W ))/g n (d n (W ) W ). Fit ɛ with weighted logistic regression where the i-th observation receives weight g n (A i W i ) g i (A i W i ) I(A i = d n (W i )), where g i is the actual randomization used for subject i in the design. This results in the TMLE Q n = Q n,ɛn of Q 0 = E 0 (Y A, W ), and the TMLE of E 0 Y d0 is thus the resulting plug-in estimator: ψn = Ψ( Q n, Q W,n ) = 1 n Q n n(d n (W i ), W i ). Liver We Forum, can Mayalso 10, 2017 use a cross-validated Targeted Group Sequential TMLE, Adaptive Designs providing a finite sample10 / 13 i=1

Inference for the Mean Reward under Optimal Rule The TMLE solves the martingale efficient score equation 0 = n i=1 D (d n, Q n, g i, ψ n)(o i ), given by 0 = 1 n ( Q n n(d n (W i ), W i ) ψn) + I(A i = d n (W i ) (Y i g i=1 i (A i W i ) Q n(a i, W i )). Since n i=1 D (d, Q, g i, ψ 0 )(O i ) is a mean zero martingale sum for any (d, Q), ψ n can be analyzed through functional martingale central limit theorem. That is, under mild regularity conditions, asymptotically, ψ n EY d behaves as a zero-mean martingale sum 1 n n i=1 { Q(d(W i ), W i ) E 0 Y d + I(A } i = d(w i ) g i (A i W i ) (Y i Q(A i, W i )), where (the possibly misspecified limits) Q and d are the limits of Q n and d n. Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 11 / 13

Inference: Continued As a consequence of the MGCLT, n(ψn EY d ) d N(0, σ0 2 ), where the asymptotic variance σ0 2 is estimated consistently with σ 2 n = 1 n n i=1 { Q(d(W i ), W i ) E 0 Y d + I(A } i = d(w i ) 2 g i (A i W i ) (Y i Q(A i, W i )) Thus, an asymptotic 0.95-confidence interval for EY d (and for the data adaptive target parameter) EY dn is given by: ψ n ± 1.96σ n /n 1/2. If d n consistently estimates d 0, then this yields a confidence interval for E 0 Y d0. Either way, we obtain a valid confidence interval for the mean reward E 0 Y dn under the actual rule we learned. Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 12 / 13

Concluding Remarks The TMLE for group-sequential adaptive designs fully preserves the integrity of randomized trials: in other words, we obtain valid inference without any reliance on model assumptions. The current dominating literature on this topic relies on standard MLE for (misspecified) parametric models and is thus highly problematic. Sequential testing, enrichment designs, and adaptive estimation of sample size, are naturally added into these targeted group-sequential adaptive designs. Contrary to fixed designs, these designs are able to learn and adapt along the way, serving the patients. The theory also applies if the choice of algorithm is set at time i only based on O 1,..., O i 1. However, it is important that the design stabilizes/learns as sample size increases, so modifying the adaptation strategy during design will hurt finite sample approximation of normal limit distribution. Liver Forum, May 10, 2017 Targeted Group Sequential Adaptive Designs 13 / 13