Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial Mark J. van der Laan 1 University of California, Berkeley School of Public Health laan@berkeley.edu mayaliv@berkeley.edu stat.berkeley.edu/ laan/ works.bepress.com/maya petersen/ targetedlearningbook.com Based on Joint Work with Maya Petersen, Antoine Chambaz, Laura Balzer May 24, 2012 Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 1 / 16
General Adaptive Designs Full data: X = (X 1,..., X n ) n i.i.d. full-data X i P X,0 M F. Target Quantity: ψ F 0 = ΨF (P X,0 ) for some Ψ F : M F IR. Observed data: O = (O i = Φ(X i, A i ) : i = 1,..., n) P PX,0,g 0, for some censored data structure Φ(X i, A i ), and, where conditional distribution g 0 of A = (A 1,..., A n ), given X, is known (or, at least, can be well estimated), and satisfies coarsening at random wr.t. X: g 0 (A X) = h 0 (O). The likelihood of O is given by: n P PX,0,g 0 (O) = P X,0 (C(O i ))g 0 (A X), i=1 where C(O i ) is coarsening of X i implied by O i. Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 2 / 16
Under positivity assumption, the full-data parameter is thus identified from observed data distribution, giving us the statistical target parameter Ψ : M IR, with M = {P PX,g 0 : P X M F } the observed data model. This defines the estimation problem. Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 3 / 16
Group Sequential Adaptive Randomized Trials Suppose the individuals are ordered i = 1,..., n, and the treatment/action assignment A i for subject i can also depend on the data O 1,..., O i 1, beyond its usual allowed dependence on O i : g i = P(A i X, A(i 1)) = h(a i O 1,..., O i 1, O i ). The adaptive design g i can be chosen so that it learns/adapts (for i ) an optimal fixed design g opt (A X ). Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 4 / 16
In van der Laan (2008), and Chambaz, van der Laan (2010a,b), we develop targeted maximum likelihood estimators (TMLE) and statistical inference. The statistical inference is based on Martingale CLT s and tail probability bounds for martingale sum processes. The limit distribution of the TMLE corresponds with the limit distribution of the TMLE under i.i.d. sampling P PX,0,g under a fixed unknown design g identified by the limit of the adaptive design g i. Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 5 / 16
Adaptive Randomization in Community RCT Full-data: Let X i = (W i, Yi 0, Yi 1 ) i.i.d. P X,0 M F for a nonparametric full-data model M F, and let ψ0 F = E 0{Y 1 Y 0 } be the additive causal effect of a binary treatment. Observed data and CAR: Let O i = (W i, A i = (A t i, i ), Y i = Y i (A t i, i ) = i Y At i i ), where A t i denotes treatment, and 1 i denotes a missingness indicator. Let g 0 be the conditional distribution of A, given X, which is assumed to only be a function of W = (W 1,..., W n ). Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 6 / 16
Likelihood: P(O 1,..., O n ) = n Q W (W i )Q Y (Y i W i, A i )g 0 (A W), i=1 where Q W and Q Y denote the common marginal distribution of W and the common conditional distribution of Y, given A, W, respectively. We assume J g 0 (A W) = g 0,j (A(j) : j C j (W) W), (1) j=1 where C 1 (W),..., C J (W) is a partitioning of the sample {1,..., n} into J clusters deterministically implied by W. Statistical target: The causal quantity is identified by Ψ(P 0 ) = E QW,0 { Q 0 (1, W ) Q 0 (0, W )}, where Q 0 (a t, w) = E 0 (Y i A i = a t, i = 1, W i = w) (which is constant in i). Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 7 / 16
Canonical gradient/efficient Influence Curve: The statistical parameter Ψ is pathwise differentiable and its canonical gradient at P is given by D(P)(O) = 1 n n D(Q, ḡ)(o i ), i=1 where D(Q, g) is the efficient influence curve at P Q,g for the individual level data structure: D(Q, ḡ)(o i ) = D W (Q)(W i) + D Y (Q, ḡ)(o i)}, and DW (Q)(W i) = Q(1, W i ) Q(0, W i ) Ψ(Q) DY (Q, ḡ)(o i) = i(2a t i 1) ḡ(a i W i ) (Y i Q(A t i, W i )), Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 8 / 16
ḡ(a W i ) = 1 n n g j (a W j = w), w=wi j=1 g 0,j is the conditional distribution of A j, given W j, defined by g 0,j (a W j = w) = g 0,j (a (w l : l j), W j = w) Q W,0 (w l ). (w l :l j) l j (2) Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 9 / 16
Double robustness: We have E 0 D( Q, ḡ, ψ) = ψ 0 ψ if Q = Q 0 or ḡ = ḡ 0, (3) assuming that for all i, 0 < ḡ i (1 W ) < 1 a.e. Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 10 / 16
Let Mark van Q der Laan be(uc theberkeley) empiricaladaptive distribution, Design in Community and QRCT 0 = (Q, Q May 0 ). 24, We 2012note11 / 16 Targeted MLE Let Y {0, 1} be binary or continuous in (0, 1). Let Q 0 n be an initial estimator of Q 0, which can be based on the loss-function L i ( Q)(O i ) = i { Yi log Q(W i, A i ) + (1 Y i ) log(1 Q(W i, A i )) }. Cross-validation treating sample as i.i.d. based on this loss function is appropriate. Let ḡ n = 1/n i g i,n be an estimator of ḡ 0, which could be based on loss L(ḡ)(W, A) = i log ḡ(a i W i ). Cross-validation now has to treat the clusters C j (W) as unit. If g 0 is known by design, one may/should use plug-in estimator, plugging in empirical distribution Q W,n. As least favorable submodel for fluctuating Q 0 n we use logistic regression Logit Q 0 n(ɛ) = Logit Q 0 n + ɛh ḡ n, where H ḡ n (A, W ) = (2A 1)/ḡ n (A W ).
The amount of fluctuation ɛ n is estimated as ɛ n = arg min ɛ n L i ( Q n(ɛ))(o 0 i ). i=1 This provides the update Q n = Q 0 n(ɛ n ). Let Q n = (Q W,n, Q n). The TMLE of ψ 0 is the corresponding plug-in estimator Ψ(Q n) = 1 n n { Q n(1, W i ) Q n(0, W i )}. i=1 The TMLE solves 0 = D( Q n, ḡ n, ψ n) = 1 n n D( Q n, ḡ n, ψn)(o i ). i=1 Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 12 / 16
Asymptotics of TMLE for Community RCT s Theorem: Suppose that for j = 1,..., J 1, C j (W) is a pair whose observations will be denoted with (O 1,j, O 2,j ), and one cluster C J is defined by the units with missingness. Suppose g 0,i (A i W) = g 0 (A i W i ), and it is known, so that ḡ 0 = g 0 is known as well. Let Q n converge to some Q. Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 13 / 16
The asymptotic variance σ 2 of the standardized TMLE n(ψ n ψ 0 ) can be represented as follows: σ 2 = P Q0,g 0 {D(Q, g 0 )} 2 lim n E 0 1 J J j=1 ( Q 0 Q )(1, W 1j )( Q 0 Q )(1, W 2j ) lim n E 0 1 J J j=1 ( Q 0 Q )(0, W 1j )( Q 0 Q )(0, W 2j ). Under the assumption that the covariance-terms are positive, a conservative estimate of σ 2 is thus given by: σ 2 n = 1 n n {D (Qn, g 0 )(O i )} 2. i=1 Note: Consistent estimation of true asymptotic variance relies on consistent estimation of Q 0! Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 14 / 16
Concluding Remark: Semiparametric estimation and inference for sample size one problems is possible and represents an exciting and important area of research. Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 15 / 16
Further Materials Targeted Learning Book Springer Series in Statistics van der Laan & Rose targetedlearningbook.com 2012 JSM Short Course Instructors: Petersen, Rose & van der Laan Mark van der Laan (UC Berkeley) Adaptive Design in Community RCT May 24, 2012 16 / 16