Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial


Mark J. van der Laan
University of California, Berkeley, School of Public Health
laan@berkeley.edu mayaliv@berkeley.edu
stat.berkeley.edu/ laan/ works.bepress.com/maya petersen/ targetedlearningbook.com
Based on joint work with Maya Petersen, Antoine Chambaz, Laura Balzer
May 24, 2012
Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 1 / 16

General Adaptive Designs
Full data: $X = (X_1, \ldots, X_n)$, with $n$ i.i.d. full-data observations $X_i \sim P_{X,0} \in \mathcal{M}^F$.
Target quantity: $\psi_0^F = \Psi^F(P_{X,0})$ for some $\Psi^F : \mathcal{M}^F \to \mathbb{R}$.
Observed data: $O = (O_i = \Phi(X_i, A_i) : i = 1, \ldots, n) \sim P_{P_{X,0}, g_0}$ for some censored data structure $\Phi(X_i, A_i)$, where the conditional distribution $g_0$ of $A = (A_1, \ldots, A_n)$, given $X$, is known (or at least can be well estimated) and satisfies coarsening at random (CAR) with respect to $X$: $g_0(A \mid X) = h_0(O)$.
The likelihood of $O$ is given by
$$P_{P_{X,0}, g_0}(O) = \left[\prod_{i=1}^n P_{X,0}(C(O_i))\right] g_0(A \mid X),$$
where $C(O_i)$ is the coarsening of $X_i$ implied by $O_i$.

Under the positivity assumption, the full-data parameter is thus identified from the observed data distribution, giving us the statistical target parameter $\Psi : \mathcal{M} \to \mathbb{R}$, with $\mathcal{M} = \{P_{P_X, g_0} : P_X \in \mathcal{M}^F\}$ the observed data model. This defines the estimation problem.

Group Sequential Adaptive Randomized Trials
Suppose the individuals are ordered $i = 1, \ldots, n$, and the treatment/action assignment $A_i$ for subject $i$ can also depend on the data $O_1, \ldots, O_{i-1}$, beyond its usual allowed dependence on $O_i$:
$$g_i = P(A_i \mid X, A(i-1)) = h(A_i \mid O_1, \ldots, O_{i-1}, O_i).$$
The adaptive design $g_i$ can be chosen so that it learns/adapts (for $i \to \infty$) an optimal fixed design $g^{\mathrm{opt}}(A \mid X)$.
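As a rough illustration of a design that adapts to earlier observations, the sketch below randomizes each new subject with a probability computed only from the outcomes of previous subjects. The slides do not say which optimal design is targeted; the Neyman allocation $P(A = 1) = \sigma_1 / (\sigma_0 + \sigma_1)$, the absence of covariates, the bound `floor`, and the outcome model in `example_outcome` are all assumptions made here for illustration.

```python
import math
import random


def neyman_probability(y1, y0, floor=0.1):
    """Estimated Neyman allocation sd1 / (sd0 + sd1), kept inside [floor, 1 - floor]."""
    def sd(values):
        if len(values) < 2:
            return None
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

    s1, s0 = sd(y1), sd(y0)
    if s1 is None or s0 is None or s1 + s0 == 0:
        return 0.5  # too little information so far: balanced randomization
    return min(1 - floor, max(floor, s1 / (s0 + s1)))


def run_adaptive_trial(n, draw_outcome, seed=0):
    """Assign treatment sequentially; returns the randomization probabilities used."""
    rng = random.Random(seed)
    y1, y0, probs = [], [], []
    for _ in range(n):
        p = neyman_probability(y1, y0)  # depends only on earlier subjects' data
        a = 1 if rng.random() < p else 0
        y = draw_outcome(a, rng)
        (y1 if a == 1 else y0).append(y)
        probs.append(p)
    return probs


def example_outcome(a, rng):
    # Hypothetical outcome model: the treated arm is noisier (sd 3 versus sd 1).
    return rng.gauss(1.0, 3.0) if a == 1 else rng.gauss(0.0, 1.0)
```

Calling `run_adaptive_trial(3000, example_outcome)` returns the sequence of randomization probabilities; under the assumed outcome model these should drift toward 3 / (1 + 3) = 0.75 as data accumulate.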

In van der Laan (2008) and Chambaz, van der Laan (2010a,b), we develop targeted maximum likelihood estimators (TMLE) and statistical inference. The statistical inference is based on martingale CLTs and tail probability bounds for martingale sum processes. The limit distribution of the TMLE corresponds with the limit distribution of the TMLE under i.i.d. sampling from $P_{P_{X,0}, g}$ under a fixed unknown design $g$ identified by the limit of the adaptive design $g_i$.

Adaptive Randomization in Community RCT
Full data: Let $X_i = (W_i, Y_i^0, Y_i^1)$ be i.i.d. with $X_i \sim P_{X,0} \in \mathcal{M}^F$ for a nonparametric full-data model $\mathcal{M}^F$, and let $\psi_0^F = E_0\{Y^1 - Y^0\}$ be the additive causal effect of a binary treatment.
Observed data and CAR: Let $O_i = (W_i, A_i = (A_i^t, \Delta_i), Y_i = Y_i(A_i^t, \Delta_i) = \Delta_i Y_i^{A_i^t})$, where $A_i^t$ denotes treatment and $1 - \Delta_i$ denotes a missingness indicator. Let $g_0$ be the conditional distribution of $A$, given $X$, which is assumed to be a function of $\mathbf{W} = (W_1, \ldots, W_n)$ only.
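A small simulation may help fix the notation for the observed data structure. This is only a sketch: the uniform covariate, the outcome probabilities, the randomization probability 0.5 and the observation probability 0.9 are illustrative assumptions, not values from the talk. The counterfactuals are kept in the output solely so that the true effect (0.2 under these assumptions) can be checked.

```python
import random


def simulate_observed_data(n, seed=0):
    """Draw n units O_i = (W_i, A_i^t, Delta_i, Y_i) under hypothetical distributions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = rng.random()                                  # baseline covariate W_i
        y0 = 1 if rng.random() < 0.2 + 0.3 * w else 0     # counterfactual Y_i^0
        y1 = 1 if rng.random() < 0.4 + 0.3 * w else 0     # counterfactual Y_i^1
        a_t = 1 if rng.random() < 0.5 else 0              # treatment A_i^t
        delta = 1 if rng.random() < 0.9 else 0            # Delta_i = 1: outcome observed
        y = delta * (y1 if a_t == 1 else y0)              # Y_i = Delta_i * Y_i^{A_i^t}
        data.append({"W": w, "A_t": a_t, "Delta": delta, "Y": y,
                     "Y0": y0, "Y1": y1})                 # Y0, Y1: full data, for checks only
    return data
```

In a real trial only `W`, `A_t`, `Delta` and `Y` would be available; `Y0` and `Y1` are never observed together.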

Likelihood:
$$P(O_1, \ldots, O_n) = \left[\prod_{i=1}^n Q_W(W_i)\, Q_Y(Y_i \mid W_i, A_i)\right] g_0(A \mid \mathbf{W}),$$
where $Q_W$ and $Q_Y$ denote the common marginal distribution of $W$ and the common conditional distribution of $Y$, given $A, W$, respectively. We assume
$$g_0(A \mid \mathbf{W}) = \prod_{j=1}^J g_{0,j}\big((A_i : i \in C_j(\mathbf{W})) \mid \mathbf{W}\big), \qquad (1)$$
where $C_1(\mathbf{W}), \ldots, C_J(\mathbf{W})$ is a partitioning of the sample $\{1, \ldots, n\}$ into $J$ clusters deterministically implied by $\mathbf{W}$.
Statistical target: The causal quantity is identified by $\Psi(P_0) = E_{Q_{W,0}}\{\bar{Q}_0(1, W) - \bar{Q}_0(0, W)\}$, where $\bar{Q}_0(a^t, w) = E_0(Y_i \mid A_i^t = a^t, \Delta_i = 1, W_i = w)$ (which is constant in $i$).
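One design that satisfies a factorization of this kind is pair matching on the covariates followed by randomization within each pair. The sketch below assumes a scalar covariate and forms pairs by sorting, which is a hypothetical matching rule chosen for brevity; the slides only require that the clusters be a deterministic function of the covariates.

```python
import random


def pair_by_covariate(w):
    """Partition indices into pairs of neighbours in sorted order of w.

    The partition is a deterministic function of w. With an odd number of
    units, the unit with the largest w is left unpaired.
    """
    order = sorted(range(len(w)), key=lambda i: w[i])
    return [tuple(order[k:k + 2]) for k in range(0, len(order) - 1, 2)]


def randomize_within_pairs(pairs, n, seed=0):
    """Treat exactly one unit per pair, each with probability 1/2.

    Assignments are independent across pairs, so the joint design is a
    product over clusters. Unpaired units keep the value None.
    """
    rng = random.Random(seed)
    a = [None] * n
    for i, k in pairs:
        first_treated = rng.random() < 0.5
        a[i], a[k] = (1, 0) if first_treated else (0, 1)
    return a
```

For example, `pair_by_covariate([0.9, 0.1, 0.5, 0.4])` pairs unit 1 with unit 3 and unit 2 with unit 0, and `randomize_within_pairs` then treats exactly one unit in each pair.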

Canonical gradient/efficient influence curve: The statistical parameter $\Psi$ is pathwise differentiable and its canonical gradient at $P$ is given by
$$D(P)(O) = \frac{1}{n} \sum_{i=1}^n D(Q, \bar{g})(O_i),$$
where $D(Q, g)$ is the efficient influence curve at $P_{Q,g}$ for the individual-level data structure:
$$D(Q, \bar{g})(O_i) = D_W(Q)(W_i) + D_Y(Q, \bar{g})(O_i),$$
$$D_W(Q)(W_i) = \bar{Q}(1, W_i) - \bar{Q}(0, W_i) - \Psi(Q),$$
$$D_Y(Q, \bar{g})(O_i) = \frac{\Delta_i (2A_i^t - 1)}{\bar{g}(A_i \mid W_i)} \big(Y_i - \bar{Q}(A_i^t, W_i)\big).$$

The averaged design $\bar{g}$ is
$$\bar{g}(a \mid W_i) = \frac{1}{n} \sum_{j=1}^n g_j(a \mid W_j = w)\Big|_{w = W_i},$$
and $g_{0,j}$ is the conditional distribution of $A_j$, given $W_j$, defined by
$$g_{0,j}(a \mid W_j = w) = \int_{(w_l : l \neq j)} g_{0,j}\big(a \mid (w_l : l \neq j), W_j = w\big) \prod_{l \neq j} dQ_{W,0}(w_l). \qquad (2)$$
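The individual-level influence curve can be evaluated directly once an outcome regression and an averaged design are available. The sketch below is one way to do so; the dictionary keys and the signatures `q_bar(a_t, w)` and `g_bar(a_t, delta, w)` are naming assumptions introduced here, with `g_bar` returning the averaged probability of the pair (treatment, observation indicator) given the covariate.

```python
def eic_values(data, q_bar, g_bar, psi):
    """Evaluate D(Q, g_bar)(O_i) = D_W(Q)(W_i) + D_Y(Q, g_bar)(O_i) for each unit.

    data  : list of dicts with keys "W", "A_t", "Delta", "Y"
    q_bar : function (a_t, w) -> outcome regression value
    g_bar : function (a_t, delta, w) -> averaged design probability
    psi   : value of the target parameter Psi(Q)
    """
    values = []
    for o in data:
        # D_W: centered plug-in contrast at this unit's covariate value.
        d_w = q_bar(1, o["W"]) - q_bar(0, o["W"]) - psi
        # D_Y: weighted residual, zero whenever the outcome is missing (Delta = 0).
        weight = o["Delta"] * (2 * o["A_t"] - 1) / g_bar(o["A_t"], o["Delta"], o["W"])
        d_y = weight * (o["Y"] - q_bar(o["A_t"], o["W"]))
        values.append(d_w + d_y)
    return values
```

Units with a missing outcome contribute only through the covariate term, since the residual term is multiplied by the observation indicator.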

Double robustness: We have
$$E_0 D(\bar{Q}, \bar{g}, \psi) = \psi_0 - \psi \quad \text{if } \bar{Q} = \bar{Q}_0 \text{ or } \bar{g} = \bar{g}_0, \qquad (3)$$
assuming that for all $i$, $0 < \bar{g}_i(1 \mid W) < 1$ a.e.
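A short worked derivation may make (3) more concrete. It is a sketch under simplifying assumptions: $O = (W, A^t, \Delta, Y)$ is treated as a single draw from the individual-level distribution $P_{Q_0, \bar{g}_0}$, with $\Delta = 1$ indicating that the outcome is observed; $\bar{g}(a, 1 \mid W)$ is shorthand, introduced here, for the probability of $A^t = a$ and $\Delta = 1$ given $W$; and $E_0(Y \mid A^t = a, \Delta = 1, W) = \bar{Q}_0(a, W)$.

```latex
\begin{align*}
E_0 D(\bar{Q}, \bar{g}, \psi)
 &= E_0\{\bar{Q}(1,W) - \bar{Q}(0,W)\} - \psi
  + E_0\left\{\frac{\Delta (2A^t - 1)}{\bar{g}(A^t, \Delta \mid W)}
      \big(Y - \bar{Q}(A^t, W)\big)\right\} \\
 &= E_0\{\bar{Q}(1,W) - \bar{Q}(0,W)\} - \psi
  + E_0 \sum_{a \in \{0,1\}} (2a - 1)\,
      \frac{\bar{g}_0(a, 1 \mid W)}{\bar{g}(a, 1 \mid W)}\,
      (\bar{Q}_0 - \bar{Q})(a, W).
\end{align*}
```

If $\bar{Q} = \bar{Q}_0$, the last term vanishes and the first two terms equal $\psi_0 - \psi$. If $\bar{g} = \bar{g}_0$, the ratio equals one, the sum reduces to $(\bar{Q}_0 - \bar{Q})(1, W) - (\bar{Q}_0 - \bar{Q})(0, W)$, and adding it to the first term gives $E_0\{\bar{Q}_0(1, W) - \bar{Q}_0(0, W)\} - \psi = \psi_0 - \psi$. The positivity condition keeps the ratio well defined.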

Let $Q_{W,n}$ be the empirical distribution of $W_1, \ldots, W_n$, and $Q_0 = (Q_{W,0}, \bar{Q}_0)$.
Targeted MLE
Let $Y \in \{0, 1\}$ be binary, or continuous in $(0, 1)$. Let $\bar{Q}_n^0$ be an initial estimator of $\bar{Q}_0$, which can be based on the loss function
$$L_i(\bar{Q})(O_i) = -\Delta_i \big\{ Y_i \log \bar{Q}(W_i, A_i) + (1 - Y_i) \log(1 - \bar{Q}(W_i, A_i)) \big\}.$$
Cross-validation that treats the sample as i.i.d., based on this loss function, is appropriate.
Let $\bar{g}_n = \frac{1}{n} \sum_i g_{i,n}$ be an estimator of $\bar{g}_0$, which could be based on the loss $L(\bar{g})(\mathbf{W}, A) = -\sum_i \log \bar{g}(A_i \mid W_i)$. Cross-validation now has to treat the clusters $C_j(\mathbf{W})$ as the unit. If $g_0$ is known by design, one may/should use the plug-in estimator, plugging in the empirical distribution $Q_{W,n}$.
As least favorable submodel for fluctuating $\bar{Q}_n^0$ we use the logistic regression
$$\operatorname{Logit} \bar{Q}_n^0(\epsilon) = \operatorname{Logit} \bar{Q}_n^0 + \epsilon H_{\bar{g}_n}, \quad \text{where } H_{\bar{g}_n}(A, W) = (2A - 1) / \bar{g}_n(A \mid W).$$

The amount of fluctuation $\epsilon_n$ is estimated as
$$\epsilon_n = \arg\min_{\epsilon} \sum_{i=1}^n L_i(\bar{Q}_n^0(\epsilon))(O_i).$$
This provides the update $\bar{Q}_n^* = \bar{Q}_n^0(\epsilon_n)$. Let $Q_n^* = (Q_{W,n}, \bar{Q}_n^*)$. The TMLE of $\psi_0$ is the corresponding plug-in estimator
$$\Psi(Q_n^*) = \frac{1}{n} \sum_{i=1}^n \{\bar{Q}_n^*(1, W_i) - \bar{Q}_n^*(0, W_i)\}.$$
The TMLE solves
$$0 = D(\bar{Q}_n^*, \bar{g}_n, \psi_n^*) = \frac{1}{n} \sum_{i=1}^n D(\bar{Q}_n^*, \bar{g}_n, \psi_n^*)(O_i).$$
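The targeting step can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes the design is known, fits the fluctuation parameter by Newton's method on the logistic loss restricted to units with observed outcomes, and uses hypothetical names throughout (`simulate_trial`, `tmle_ate`, the dictionary keys). Here `g_bar(a_t, w)` stands for the probability of treatment `a_t` together with an observed outcome, given `w`, and `q_init` must return values strictly between 0 and 1.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate_trial(n, seed=0):
    """Hypothetical data: true additive effect 0.2, P(A_t=1)=0.5, P(Delta=1)=0.9."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = rng.random()
        a_t = 1 if rng.random() < 0.5 else 0
        delta = 1 if rng.random() < 0.9 else 0
        y = delta * (1 if rng.random() < 0.2 + 0.2 * a_t + 0.3 * w else 0)
        data.append({"W": w, "A_t": a_t, "Delta": delta, "Y": y})
    return data


def tmle_ate(data, q_init, g_bar, newton_steps=25):
    """Return (psi, eps, q_star): plug-in estimate, fluctuation, updated regression."""
    def clever(a_t, w):
        return (2 * a_t - 1) / g_bar(a_t, w)  # H(A, W) = (2A - 1) / g_bar(A | W)

    eps = 0.0
    for _ in range(newton_steps):
        score, info = 0.0, 0.0
        for o in data:
            if o["Delta"] == 0:
                continue  # the loss is multiplied by Delta_i
            h = clever(o["A_t"], o["W"])
            p = expit(logit(q_init(o["A_t"], o["W"])) + eps * h)
            score += h * (o["Y"] - p)
            info += h * h * p * (1.0 - p)
        if info == 0.0:
            break
        eps += score / info  # Newton step on the convex logistic loss

    def q_star(a_t, w):
        return expit(logit(q_init(a_t, w)) + eps * clever(a_t, w))

    psi = sum(q_star(1, o["W"]) - q_star(0, o["W"]) for o in data) / len(data)
    return psi, eps, q_star
```

For instance, `tmle_ate(simulate_trial(4000), lambda a, w: 0.5, lambda a, w: 0.45)` starts from a deliberately misspecified constant initial estimator and a known design (0.5 × 0.9 = 0.45). Because the design is correct, the estimate should still land near the simulated effect of 0.2, in line with the double robustness property.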

Asymptotics of TMLE for Community RCTs
Theorem: Suppose that for $j = 1, \ldots, J - 1$, $C_j(\mathbf{W})$ is a pair whose observations will be denoted by $(O_{1,j}, O_{2,j})$, and one cluster $C_J$ is defined by the units with missingness. Suppose $g_{0,i}(A_i \mid \mathbf{W}) = g_0(A_i \mid W_i)$, and it is known, so that $\bar{g}_0 = g_0$ is known as well. Let $\bar{Q}_n^*$ converge to some $\bar{Q}^*$.

The asymptotic variance $\sigma^2$ of the standardized TMLE $\sqrt{n}(\psi_n^* - \psi_0)$ can be represented as follows:
$$\sigma^2 = P_{Q_0, g_0}\{D(Q^*, g_0)\}^2 - \lim_{n} E_0 \frac{1}{J} \sum_{j=1}^J (\bar{Q}_0 - \bar{Q}^*)(1, W_{1j})\,(\bar{Q}_0 - \bar{Q}^*)(1, W_{2j}) - \lim_{n} E_0 \frac{1}{J} \sum_{j=1}^J (\bar{Q}_0 - \bar{Q}^*)(0, W_{1j})\,(\bar{Q}_0 - \bar{Q}^*)(0, W_{2j}).$$
Under the assumption that the covariance terms are positive, a conservative estimate of $\sigma^2$ is thus given by
$$\sigma_n^2 = \frac{1}{n} \sum_{i=1}^n \{D(Q_n^*, g_0)(O_i)\}^2.$$
Note: Consistent estimation of the true asymptotic variance relies on consistent estimation of $\bar{Q}_0$!
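Given the estimated influence curve values, the conservative variance estimate and a Wald-type interval follow directly. The sketch below assumes those values have already been computed and are passed in as a list; the function name and the default 95% normal quantile 1.96 are choices made here for illustration.

```python
import math


def conservative_ci(eic, psi, z=1.96):
    """Conservative variance estimate and Wald-type confidence interval.

    eic : list of influence curve values D(Q_n^*, g_0)(O_i), one per unit
    psi : point estimate of the target parameter
    z   : normal quantile (1.96 gives a nominal 95% interval)
    """
    n = len(eic)
    sigma2 = sum(d * d for d in eic) / n       # sigma_n^2 = (1/n) * sum of squared values
    half_width = z * math.sqrt(sigma2 / n)     # standard error is sqrt(sigma_n^2 / n)
    return sigma2, (psi - half_width, psi + half_width)
```

When the covariance terms in the variance formula are positive, this interval is wider than one based on the true asymptotic variance, so its coverage errs on the safe side.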

Concluding Remark: Semiparametric estimation and inference for sample size one problems is possible and represents an exciting and important area of research.

Further Materials
Targeted Learning Book: Springer Series in Statistics, van der Laan & Rose, targetedlearningbook.com
2012 JSM Short Course: Instructors Petersen, Rose & van der Laan