Submitted to Statistical Science

SUPPLEMENT TO "Q- AND A-LEARNING METHODS FOR ESTIMATING OPTIMAL DYNAMIC TREATMENT REGIMES"

Phillip J. Schulte, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian

Abstract. This supplemental article contains technical details related to material presented in the parent article. Section and equation numbers not preceded by "A" refer to those in the parent article.

Phillip J. Schulte is Biostatistician, Duke Clinical Research Institute, Durham, North Carolina 27710, USA (e-mail: phillip.schulte@duke.edu). Anastasios A. Tsiatis is Gertrude M. Cox Distinguished Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA (e-mail: tsiatis@ncsu.edu). Eric B. Laber is Assistant Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA (e-mail: eblaber@ncsu.edu). Marie Davidian is William Neal Reynolds Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA (e-mail: davidian@ncsu.edu).

A.1. DEMONSTRATION THAT (5)-(8) DEFINE AN OPTIMAL REGIME

For $k = 1, \ldots, K$ and any $d \in \mathcal{D}$, define the random variables $\alpha_k\{\bar S_k^*(\bar d_{k-1})\}$ such that

(A.1)   $\alpha_k\{\bar S_k^*(\bar d_{k-1})\}(\omega) = V_k^{(1)}(\bar s_k, \bar u_{k-1})$ for any $\omega \in \Omega$,

where $(\bar s_k, \bar u_{k-1})$ are defined by (3). Because $V_k^{(1)}$ as defined in (6) and (8) is a function of arguments $(\bar s_k, \bar a_{k-1}) \in \bar{\mathcal S}_k \times \bar{\mathcal A}_{k-1}$, whereas the random variable $\bar S_k^*(\bar d_{k-1})$ has realizations only in $\bar{\mathcal S}_k$, we find it convenient to introduce (A.1) to make this distinction clear. In words, $\alpha_k\{\bar S_k^*(\bar d_{k-1})\}$ is the expected outcome conditional on a patient in the population having followed regime $d$ through decision point $k-1$ and then following the optimal regime from that point on.

We now argue that $d^{(1)}$ is an optimal regime; i.e., $d^{(1)}$ satisfies (4). We first show that, for any $d \in \mathcal{D}$,

$E\{Y^*(d) \mid S_1 = s_1, S_2^*(d_1), \ldots, S_K^*(\bar d_{K-1})\} \le E\{Y^*(\bar d_{K-1}, d_K^{(1)}) \mid S_1 = s_1, S_2^*(d_1), \ldots, S_K^*(\bar d_{K-1})\}$
(A.2)   $= \alpha_K\{s_1, S_2^*(d_1), \ldots, S_K^*(\bar d_{K-1})\}$.

This follows because, for the set in $\Omega$ where $\{S_2^*(d_1) = s_2, \ldots, S_K^*(\bar d_{K-1}) = s_K\}$, the left- and right-hand sides of the first line of (A.2) are equal to

(A.3)   $E\{Y^*(d) \mid \bar S_K^*(\bar d_{K-1}) = \bar s_K\} = E\{Y^*(\bar u_{K-1}, u_K) \mid \bar S_K^*(\bar u_{K-1}) = \bar s_K\}$,
(A.4)   $E\{Y^*(\bar d_{K-1}, d_K^{(1)}) \mid \bar S_K^*(\bar d_{K-1}) = \bar s_K\} = E[Y^*\{\bar u_{K-1}, d_K^{(1)}(\bar s_K, \bar u_{K-1})\} \mid \bar S_K^*(\bar u_{K-1}) = \bar s_K]$,

respectively. By the definition of $d_K^{(1)}$ in (5), (A.4) is greater than or equal to (A.3), and, by the definition of $V_K^{(1)}$ in (6), (A.4) equals $V_K^{(1)}(\bar s_K, \bar u_{K-1})$. Because these results hold for sets $\{S_2^*(d_1) = s_2, \ldots, S_K^*(\bar d_{K-1}) = s_K\}$ for any $(s_2, \ldots, s_K)$ such that $(s_1, \ldots, s_K, u_1, \ldots, u_{K-1}) \in \Gamma_K$, and by the definition of $\alpha_K$ in (A.1), (A.2) holds. Taking conditional expectations given $S_1 = s_1$ yields

$E\{Y^*(d) \mid S_1 = s_1\} \le E\{Y^*(\bar d_{K-1}, d_K^{(1)}) \mid S_1 = s_1\}$
(A.5)   $= E[\alpha_K\{s_1, S_2^*(d_1), \ldots, S_K^*(\bar d_{K-1})\} \mid S_1 = s_1]$.

The equality in (A.5) holds for any $\bar d_{K-1} = (d_1, \ldots, d_{K-1})$ with $d \in \mathcal{D}$; hence it must hold for

$(d_1, \ldots, d_{K-2}, d_{K-1}^{(1)})$. Thus, we also have that

(A.6)   $E\{Y^*(\bar d_{K-2}, d_{K-1}^{(1)}, d_K^{(1)}) \mid S_1 = s_1\} = E[\alpha_K\{S_1, S_2^*(d_1), \ldots, S_{K-1}^*(\bar d_{K-2}), S_K^*(\bar d_{K-2}, d_{K-1}^{(1)})\} \mid S_1 = s_1]$.

Similarly, for any $k = 1, \ldots, K-1$, we can show that

$E[\alpha_{k+1}\{S_1, S_2^*(d_1), \ldots, S_{k+1}^*(\bar d_k)\} \mid S_1 = s_1, S_2^*(d_1), \ldots, S_k^*(\bar d_{k-1})]$
$\le E[\alpha_{k+1}\{S_1, S_2^*(d_1), \ldots, S_{k+1}^*(\bar d_{k-1}, d_k^{(1)})\} \mid S_1 = s_1, S_2^*(d_1), \ldots, S_k^*(\bar d_{k-1})] = \alpha_k\{s_1, S_2^*(d_1), \ldots, S_k^*(\bar d_{k-1})\}$,

which implies, for $k = 1, \ldots, K-1$,

$E[\alpha_{k+1}\{s_1, S_2^*(d_1), \ldots, S_{k+1}^*(\bar d_k)\} \mid S_1 = s_1] \le E[\alpha_{k+1}\{s_1, S_2^*(d_1), \ldots, S_{k+1}^*(\bar d_{k-1}, d_k^{(1)})\} \mid S_1 = s_1]$
(A.7)   $= E[\alpha_k\{s_1, S_2^*(d_1), \ldots, S_k^*(\bar d_{k-1})\} \mid S_1 = s_1]$.

Using (A.5) and (A.7) with $k = K-1$, we thus have

(A.8)   $E\{Y^*(d) \mid S_1 = s_1\} \le E\{Y^*(\bar d_{K-1}, d_K^{(1)}) \mid S_1 = s_1\} = E[\alpha_K\{s_1, S_2^*(d_1), \ldots, S_K^*(\bar d_{K-1})\} \mid S_1 = s_1]$
$\le E[\alpha_K\{s_1, S_2^*(d_1), \ldots, S_K^*(\bar d_{K-2}, d_{K-1}^{(1)})\} \mid S_1 = s_1] = E[\alpha_{K-1}\{s_1, S_2^*(d_1), \ldots, S_{K-1}^*(\bar d_{K-2})\} \mid S_1 = s_1]$.

Because of (A.6), the third term in (A.8) is equal to $E\{Y^*(\bar d_{K-2}, d_{K-1}^{(1)}, d_K^{(1)}) \mid S_1 = s_1\}$. Hence,

$E\{Y^*(d) \mid S_1 = s_1\} \le E\{Y^*(\bar d_{K-1}, d_K^{(1)}) \mid S_1 = s_1\} \le E\{Y^*(\bar d_{K-2}, d_{K-1}^{(1)}, d_K^{(1)}) \mid S_1 = s_1\}$
(A.9)   $= E[\alpha_{K-1}\{s_1, S_2^*(d_1), \ldots, S_{K-1}^*(\bar d_{K-2})\} \mid S_1 = s_1]$.

Again, because $\bar d_{K-2}$ is arbitrary, if we replace it by $(\bar d_{K-3}, d_{K-2}^{(1)})$, the equality in (A.9) implies

(A.10)   $E\{Y^*(\bar d_{K-3}, \underline{d}_{K-2}^{(1)}) \mid S_1 = s_1\} = E[\alpha_{K-1}\{s_1, S_2^*(d_1), \ldots, S_{K-1}^*(\bar d_{K-3}, d_{K-2}^{(1)})\} \mid S_1 = s_1]$,

where, for any $d$, $\underline{d}_k = (d_k, \ldots, d_K)$. Using (A.7) with $k = K-2$, (A.9), and (A.10), we obtain

$E\{Y^*(\bar d_{K-2}, \underline{d}_{K-1}^{(1)}) \mid S_1 = s_1\} = E[\alpha_{K-1}\{s_1, S_2^*(d_1), \ldots, S_{K-1}^*(\bar d_{K-2})\} \mid S_1 = s_1]$
$\le E[\alpha_{K-1}\{s_1, S_2^*(d_1), \ldots, S_{K-1}^*(\bar d_{K-3}, d_{K-2}^{(1)})\} \mid S_1 = s_1] = E\{Y^*(\bar d_{K-3}, \underline{d}_{K-2}^{(1)}) \mid S_1 = s_1\} = E[\alpha_{K-2}\{s_1, S_2^*(d_1), \ldots, S_{K-2}^*(\bar d_{K-3})\} \mid S_1 = s_1]$.

Continuing in this fashion, we may conclude that, for any $d \in \mathcal{D}$,

$E\{Y^*(d) \mid S_1 = s_1\} \le E\{Y^*(\bar d_{k-1}, \underline{d}_k^{(1)}) \mid S_1 = s_1\} \le E\{Y^*(d^{(1)}) \mid S_1 = s_1\}$,

showing that $d^{(1)}$ defined in (5) and (7) is an optimal regime satisfying (4).
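For concreteness, the telescoping argument specializes as follows when $K = 2$:

\[
E\{Y^*(d) \mid S_1 = s_1\} \le E\{Y^*(d_1, d_2^{(1)}) \mid S_1 = s_1\} = E[\alpha_2\{s_1, S_2^*(d_1)\} \mid S_1 = s_1]
\le E[\alpha_2\{s_1, S_2^*(d_1^{(1)})\} \mid S_1 = s_1] = \alpha_1(s_1) = E\{Y^*(d^{(1)}) \mid S_1 = s_1\},
\]

where the first inequality and equality are (A.5) with $K = 2$, the second inequality and equality are (A.7) with $k = 1$, and the final equality holds because $\alpha_1(s_1)$ is the value $V_1^{(1)}(s_1)$ of following the optimal regime from the first decision point on.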

A.2. DEMONSTRATION OF CORRESPONDENCE IN (19)

We make the consistency, sequential randomization, and positivity (15) assumptions given in Section 3; the latter states that, for any $(\bar s_k, \bar a_{k-1})$ for which $\mathrm{pr}(\bar S_k = \bar s_k, \bar A_{k-1} = \bar a_{k-1}) > 0$,

$\mathrm{pr}(A_k = a_k \mid \bar S_k = \bar s_k, \bar A_{k-1} = \bar a_{k-1}) > 0$ if $(\bar s_k, \bar a_{k-1}) \in \Gamma_k$ and $a_k \in \Psi_k(\bar s_k, \bar a_{k-1})$, $k = 1, \ldots, K$.

This ensures that the observed data contain information on the possible treatment options involved in the class of regimes under consideration. We wish to show (16)-(18), which we restate here for convenience as

(A.11)   $\mathrm{pr}(\bar S_k = \bar s_k, \bar A_k = \bar a_k) > 0$,
(A.12)   $\mathrm{pr}(S_{k+1} = s_{k+1} \mid \bar S_k = \bar s_k, \bar A_k = \bar a_k) = \mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_k = \bar s_k, \bar A_{k-1} = \bar a_{k-1}\}$
(A.13)   $= \mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_j = \bar s_j, \bar A_{j-1} = \bar a_{j-1}, S_{j+1}^*(\bar a_j) = s_{j+1}, \ldots, S_k^*(\bar a_{k-1}) = s_k\}$,

for any $(\bar s_k, \bar a_{k-1}) \in \Gamma_k$ and $a_k \in \Psi_k(\bar s_k, \bar a_{k-1})$, $k = 1, \ldots, K$, and for $j = 1, \ldots, k$, where we define (A.13) with $j = k$ to be the same as the expression on the right-hand side of (A.12) and take $S_{K+1} = Y$ and $S_{K+1}^*(\bar a_K) = Y^*(\bar a_K)$.

If we can show (A.11), then the quantities in (9)-(14) are well defined. If (A.12)-(A.13) also hold, then the conditional distributions of the observed data involved in the quantities in (9)-(14) are the same as the conditional distributions of the potential outcomes involved in (5)-(8), and (19) follows.

Assume for the moment that (A.11) is true. We now demonstrate that (A.12) and (A.13) must follow. For any fixed $k$, by the consistency assumption, the left-hand expression in (A.12) is equal to $\mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_k = \bar s_k, \bar A_{k-1} = \bar a_{k-1}, A_k = a_k\}$. It follows by the sequential randomization assumption, which implies $A_k \perp S_{k+1}^*(\bar a_k) \mid \bar S_k, \bar A_{k-1}$, that this is equal to the right-hand side of (A.12).

The equality in (A.13) follows by induction. First note that the right-hand side of (A.12) is equal to (A.13) with $j = k$. The equality of (A.12) and (A.13) for all $j = 1, \ldots, k$ can be deduced if we can show that (A.13) being true for a given $j$ implies that it is also true for $j - 1$. For a given $j = 2, \ldots, k$, by the consistency assumption, (A.13) is equal to

$\mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_{j-1} = \bar s_{j-1}, \bar A_{j-2} = \bar a_{j-2}, A_{j-1} = a_{j-1}, S_j^*(\bar a_{j-1}) = s_j, \ldots, S_k^*(\bar a_{k-1}) = s_k\}$.

By the sequential randomization assumption, $A_{j-1} \perp \{S_j^*(\bar a_{j-1}), \ldots, S_{k+1}^*(\bar a_k)\} \mid \bar S_{j-1}, \bar A_{j-2}$, this expression is equal to

$\mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_{j-1} = \bar s_{j-1}, \bar A_{j-2} = \bar a_{j-2}, S_j^*(\bar a_{j-1}) = s_j, \ldots, S_k^*(\bar a_{k-1}) = s_k\}$,

which is (A.13) for $j - 1$. Note, then, that this implies that the conditional densities in (A.13), which are $j$-dependent, are the same as those on the left-hand side of (A.12), which are not, and are also equal to $\mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_k^*(\bar a_{k-1}) = \bar s_k\}$; i.e., the left-hand side of (A.12) is completely in terms of the distribution of the observed data, whereas (A.13) with $j = 1$ is completely in terms of the distribution of the potential outcomes.

We now prove (A.11) by induction. Assume we have shown (A.11) for a fixed $k$; i.e., if $(\bar s_k, \bar a_{k-1}) \in \Gamma_k$ and $a_k \in \Psi_k(\bar s_k, \bar a_{k-1})$, then $\mathrm{pr}(\bar S_k = \bar s_k, \bar A_k = \bar a_k) > 0$. Then we must show that

(A.14)   $\mathrm{pr}(\bar S_{k+1} = \bar s_{k+1}, \bar A_{k+1} = \bar a_{k+1}) > 0$ if $(\bar s_{k+1}, \bar a_k) \in \Gamma_{k+1}$ and $a_{k+1} \in \Psi_{k+1}(\bar s_{k+1}, \bar a_k)$.

Note that

(A.15)   $\mathrm{pr}(\bar S_{k+1} = \bar s_{k+1}, \bar A_k = \bar a_k) = \mathrm{pr}(S_{k+1} = s_{k+1} \mid \bar S_k = \bar s_k, \bar A_k = \bar a_k)\, \mathrm{pr}(\bar S_k = \bar s_k, \bar A_k = \bar a_k)$.

By (A.15), (A.14) will be true if $\mathrm{pr}(S_{k+1} = s_{k+1} \mid \bar S_k = \bar s_k, \bar A_k = \bar a_k) > 0$. But by the arguments above,

$\mathrm{pr}(S_{k+1} = s_{k+1} \mid \bar S_k = \bar s_k, \bar A_k = \bar a_k) = \mathrm{pr}\{S_{k+1}^*(\bar a_k) = s_{k+1} \mid \bar S_k^*(\bar a_{k-1}) = \bar s_k\}$,

which is positive because $(\bar s_{k+1}, \bar a_k) \in \Gamma_{k+1}$. Next,

$\mathrm{pr}(\bar S_{k+1} = \bar s_{k+1}, \bar A_{k+1} = \bar a_{k+1}) = \mathrm{pr}(A_{k+1} = a_{k+1} \mid \bar S_{k+1} = \bar s_{k+1}, \bar A_k = \bar a_k)\, \mathrm{pr}(\bar S_{k+1} = \bar s_{k+1}, \bar A_k = \bar a_k)$.

However, because $a_{k+1} \in \Psi_{k+1}(\bar s_{k+1}, \bar a_k)$, by the positivity assumption, $\mathrm{pr}(A_{k+1} = a_{k+1} \mid \bar S_{k+1} = \bar s_{k+1}, \bar A_k = \bar a_k) > 0$. The proof is completed by noting that $\mathrm{pr}(S_1 = s_1, A_1 = a_1) = \mathrm{pr}(A_1 = a_1 \mid S_1 = s_1)\, \mathrm{pr}(S_1 = s_1)$. If $s_1 \in \Gamma_1$, $\mathrm{pr}(S_1 = s_1) > 0$, and $\mathrm{pr}(A_1 = a_1 \mid S_1 = s_1) > 0$ for $a_1 \in \Psi_1(s_1)$ by the positivity assumption.

To demonstrate (19), consider first the definitions of $d_K^{(1)}(\bar s_K, \bar a_{K-1})$ and $V_K^{(1)}(\bar s_K, \bar a_{K-1})$ given in (5) and (6). These quantities involve the conditional expectation of the potential outcome $Y^*(\bar a_K)$ given $\bar S_K^*(\bar a_{K-1})$, which by (A.12)-(A.13) is the same as the conditional expectation of $Y$ given $\{\bar S_K = \bar s_K, \bar A_K = \bar a_K\}$. Thus, $d_K^{(1)}(\bar s_K, \bar a_{K-1})$ and $V_K^{(1)}(\bar s_K, \bar a_{K-1})$ are the same as $d_K^{opt}(\bar s_K, \bar a_{K-1})$ and $V_K(\bar s_K, \bar a_{K-1})$ defined in (10) and (11). Next, from (7) and (8),

$d_{K-1}^{(1)}(\bar s_{K-1}, \bar a_{K-2}) = \arg\max_{a_{K-1} \in \Psi_{K-1}(\bar s_{K-1}, \bar a_{K-2})} E[V_K^{(1)}\{\bar s_{K-1}, S_K^*(\bar a_{K-2}, a_{K-1}), \bar a_{K-2}, a_{K-1}\} \mid \bar S_{K-1}^*(\bar a_{K-2}) = \bar s_{K-1}]$.

This involves the conditional expectation of $V_K^{(1)}$, a function of $\bar S_K^*(\bar a_{K-1})$, given $\bar S_{K-1}^*(\bar a_{K-2}) = \bar s_{K-1}$. Again, by (A.12)-(A.13), this is the same as the conditional expectation of the function $V_K^{(1)}$ of $\bar S_K$ given $\{\bar S_{K-1} = \bar s_{K-1}, \bar A_{K-1} = \bar a_{K-1}\}$. Because we have already shown that $V_K^{(1)}$ is the same as $V_K$, this implies that $d_{K-1}^{(1)}(\bar s_{K-1}, \bar a_{K-2})$ is given by

$\arg\max_{a_{K-1} \in \Psi_{K-1}(\bar s_{K-1}, \bar a_{K-2})} E\{V_K(\bar s_{K-1}, S_K, \bar a_{K-2}, a_{K-1}) \mid \bar S_{K-1} = \bar s_{K-1}, \bar A_{K-1} = (\bar a_{K-2}, a_{K-1})\}$,

which is the same as $d_{K-1}^{opt}(\bar s_{K-1}, \bar a_{K-2})$ given by (13) with $k = K-1$. The argument continues in a backward iterative fashion for $k = K-2, \ldots, 1$.

A.3. JUSTIFICATION FOR $\widetilde V_{ki}$ IN A-LEARNING

We wish to show that

(A.16)   $E\big(\, V_{k+1}(\bar S_{k+1}, \bar A_k) + C_k(\bar S_k, \bar A_{k-1})[\, I\{C_k(\bar S_k, \bar A_{k-1}) > 0\} - A_k\,] \,\big|\, \bar S_k, \bar A_{k-1} \big) = V_k(\bar S_k, \bar A_{k-1})$.

Defining $\Gamma(\bar S_{k+1}, \bar A_k) = V_{k+1}(\bar S_{k+1}, \bar A_k) + C_k(\bar S_k, \bar A_{k-1})[\, I\{C_k(\bar S_k, \bar A_{k-1}) > 0\} - A_k\,]$, we may write the left-hand side of (A.16) as

(A.17)   $E[\, E\{\Gamma(\bar S_{k+1}, \bar A_k) \mid \bar S_k, \bar A_k\} \mid \bar S_k, \bar A_{k-1}\,]$.

The inner expectation in (A.17) may be seen to be equal to

$E\{V_{k+1}(\bar S_{k+1}, \bar A_k) \mid \bar S_k, \bar A_k\} + C_k(\bar S_k, \bar A_{k-1})[\, I\{C_k(\bar S_k, \bar A_{k-1}) > 0\} - A_k\,] = Q_k(\bar S_k, \bar A_k) + C_k(\bar S_k, \bar A_{k-1})[\, I\{C_k(\bar S_k, \bar A_{k-1}) > 0\} - A_k\,]$.

Substituting $Q_k(\bar S_k, \bar A_k) = h_k(\bar S_k, \bar A_{k-1}) + A_k\, C_k(\bar S_k, \bar A_{k-1})$, where $h_k(\bar S_k, \bar A_{k-1}) = Q_k(\bar S_k, \bar A_{k-1}, 0)$, we obtain

$E\{\Gamma(\bar S_{k+1}, \bar A_k) \mid \bar S_k, \bar A_k\} = h_k(\bar S_k, \bar A_{k-1}) + C_k(\bar S_k, \bar A_{k-1})\, I\{C_k(\bar S_k, \bar A_{k-1}) > 0\} = V_k(\bar S_k, \bar A_{k-1})$.

Substituting this in (A.17) yields the result.

A.4. DEMONSTRATION OF EQUIVALENCE OF Q- AND A-LEARNING IN A SPECIAL CASE

We take $K = 1$ and let $\mathrm{pr}(A_1 = 1 \mid S_1 = s_1) = \pi$. Consider the A-learning estimating equations (31) with $k = 1$, and take $\lambda_1(s_1; \psi_1) = \partial C_1(s_1; \psi_1)/\partial \psi_1$. Then the equations become

$\sum_{i=1}^n \frac{\partial C_1(S_{1i}; \psi_1)}{\partial \psi_1}\, (A_{1i} - \pi)\, \{Y_i - A_{1i} C_1(S_{1i}; \psi_1) - h_1(S_{1i}; \beta_1)\} = 0$,
$\sum_{i=1}^n \frac{\partial h_1(S_{1i}; \beta_1)}{\partial \beta_1}\, \{Y_i - A_{1i} C_1(S_{1i}; \psi_1) - h_1(S_{1i}; \beta_1)\} = 0$.

Likewise, under these conditions, taking $Q_1(s_1, a_1; \xi_1) = a_1 C_1(s_1; \psi_1) + h_1(s_1; \beta_1)$, the Q-learning estimating equation is

$\sum_{i=1}^n \frac{\partial Q_1(S_{1i}, A_{1i}; \xi_1)}{\partial \xi_1}\, \{Y_i - A_{1i} C_1(S_{1i}; \psi_1) - h_1(S_{1i}; \beta_1)\} = 0$,

where, with $\xi_1 = (\psi_1^T, \beta_1^T)^T$,

$\frac{\partial Q_1(S_{1i}, A_{1i}; \xi_1)}{\partial \xi_1} = \begin{pmatrix} A_{1i}\, \partial C_1(S_{1i}; \psi_1)/\partial \psi_1 \\ \partial h_1(S_{1i}; \beta_1)/\partial \beta_1 \end{pmatrix}$.

Thus, note that, with $C_1(s_1; \psi_1)$ and $h_1(s_1; \beta_1)$ linear in functions of $S_1$, as long as the terms in $C_1(s_1; \psi_1)$ are contained among those in $h_1(s_1; \beta_1)$, the Q- and A-learning estimating equations are identical, as then

$\sum_{i=1}^n \frac{\partial C_1(S_{1i}; \psi_1)}{\partial \psi_1}\, \{Y_i - A_{1i} C_1(S_{1i}; \psi_1) - h_1(S_{1i}; \beta_1)\} = 0$.

For example, if $C_1(s_1; \psi_1) = \psi_{10} + s_1^T \psi_{11}$ and $h_1(s_1; \beta_1) = \beta_{10} + s_1^T \beta_{11}$, then note that

$\frac{\partial C_1(S_{1i}; \psi_1)}{\partial \psi_1} = \frac{\partial h_1(S_{1i}; \beta_1)}{\partial \beta_1} = \begin{pmatrix} 1 \\ S_{1i} \end{pmatrix}$,

and the result is immediate. See Chakraborty et al. (2010) for discussion of the case $K = 2$.

A.5. EXAMPLE OF INCOMPATIBILITY OF Q-FUNCTION MODELS

To show (33), noting $H_2 = (1, s_1, a_1, s_2)^T = (\tilde a_1^T, s_2)^T$, we have

$E\{V_2(s_1, S_2, a_1; \xi_2) \mid S_1 = s_1, A_1 = a_1\} = \tilde a_1^T \beta_{21} + \beta_{22}\, E(S_2 \mid S_1 = s_1, A_1 = a_1)$
$\quad + (\tilde a_1^T \psi_{21})\, E\{I(\tilde a_1^T \psi_{21} + S_2 \psi_{22} > 0) \mid S_1 = s_1, A_1 = a_1\} + \psi_{22}\, E\{S_2\, I(\tilde a_1^T \psi_{21} + S_2 \psi_{22} > 0) \mid S_1 = s_1, A_1 = a_1\}$.

Taking $\psi_{22} > 0$, we also have $I(\tilde a_1^T \psi_{21} + S_2 \psi_{22} > 0) = I(S_2 > -\tilde a_1^T \psi_{21}/\psi_{22})$, from which it follows that

$E\{I(\tilde a_1^T \psi_{21} + S_2 \psi_{22} > 0) \mid S_1 = s_1, A_1 = a_1\} = 1 - \Phi\{(-\tilde a_1^T \psi_{21}/\psi_{22} - \tilde a_1^T \gamma)/\sigma\} = 1 - \Phi(\eta)$

for $\eta = -\tilde a_1^T(\psi_{21}/\psi_{22} + \gamma)/\sigma$. Similarly,

$E\{S_2\, I(\tilde a_1^T \psi_{21} + S_2 \psi_{22} > 0) \mid S_1 = s_1, A_1 = a_1\} = E\{S_2\, I(S_2 > -\tilde a_1^T \psi_{21}/\psi_{22}) \mid S_1 = s_1, A_1 = a_1\}$.

It is straightforward to deduce that this is equal to

$\int_\eta^\infty (\sigma t + \tilde a_1^T \gamma)\, \phi(t)\, dt = \sigma \phi(\eta) + (\tilde a_1^T \gamma)\{1 - \Phi(\eta)\}$.

Using $E(S_2 \mid S_1 = s_1, A_1 = a_1) = \tilde a_1^T \gamma$ and combining yields (33).
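The truncated normal moment in the last display is elementary but easy to get wrong; the following short Python check confirms it numerically, with arbitrary illustrative values standing in for $\tilde a_1^T \gamma$, $\sigma$, and the threshold $-\tilde a_1^T \psi_{21}/\psi_{22}$.

import numpy as np
from scipy.stats import norm

mu, sigma, c = 0.7, 1.3, -0.4           # illustrative: mu = a1'gamma, threshold c = -a1'psi21/psi22
eta = (c - mu) / sigma                  # standardized threshold, as in the definition of eta above
closed_form = sigma * norm.pdf(eta) + mu * (1 - norm.cdf(eta))

rng = np.random.default_rng(1)
s2 = rng.normal(mu, sigma, size=10**7)  # S2 | S1, A1 ~ Normal(mu, sigma^2)
monte_carlo = np.mean(s2 * (s2 > c))    # E{S2 I(S2 > c)}
print(closed_form, monte_carlo)         # the two values agree to about three decimals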

A.6. CALCULATION OF $E\{H(\hat d^{\,opt})\}$ AND $R(\hat d^{\,opt})$

Calculation for $K = 1$. We consider the generative data model in Section 6.1 and treatment regimes of the form $d(s_1) = d_1(s_1) = I(\psi_{10} + \psi_{11} s_1 > 0)$ for arbitrary $\psi_{10}, \psi_{11}$. It is possible to derive analytically $H(d) = E\{Y^*(d)\}$ in this case. Under the generative data model,

$E\{Y^*(d)\} = E[E\{Y^*(d) \mid S_1\}] = E[E\{Y \mid S_1, A_1 = d_1(S_1)\}] = \beta_{10}^* + \beta_{11}^* E(S_1) + \beta_{12}^* E(S_1^2) + E\{I(\psi_{10} + \psi_{11} S_1 > 0)(\psi_{10}^* + \psi_{11}^* S_1)\}$,

and $S_1 \sim$ Normal$(0, 1)$. It is straightforward to deduce that $E\{I(\psi_{10} + \psi_{11} S_1 > 0)\} = \mathrm{pr}(S_1 > -\psi_{10}/\psi_{11})$ or $\mathrm{pr}(S_1 < -\psi_{10}/\psi_{11})$ as $\psi_{11} > 0$ or $\psi_{11} < 0$, which is readily obtained from the standard normal cdf. Likewise, $E\{S_1 I(\psi_{10} + \psi_{11} S_1 > 0)\} = E(S_1 \mid S_1 > -\psi_{10}/\psi_{11})\, \mathrm{pr}(S_1 > -\psi_{10}/\psi_{11})$ if $\psi_{11} > 0$ and $E\{S_1 I(\psi_{10} + \psi_{11} S_1 > 0)\} = E(S_1 \mid S_1 < -\psi_{10}/\psi_{11})\, \mathrm{pr}(S_1 < -\psi_{10}/\psi_{11})$ if $\psi_{11} < 0$, which are again easily calculated in a manner similar to that in Section A.5. Thus, $E\{Y^*(d^{opt})\}$ is obtained by substituting $\psi_{10}^*, \psi_{11}^*$ for $\psi_{10}, \psi_{11}$ in the resulting expression.

To approximate $E\{H(\hat d^{\,opt})\}$ and hence $R(\hat d^{\,opt})$ for $\hat d^{\,opt} = \hat d_Q^{\,opt}$ or $\hat d_A^{\,opt}$, we may use Monte Carlo simulation. Specifically, for the $b$th of $B$ Monte Carlo data sets, substitute the estimates $\hat\psi_{10,b}, \hat\psi_{11,b}$, say, defining $\hat d^{\,opt}$ for that data set in the expression for $E\{Y^*(d)\}$, and call the resulting quantity $U_b$. Then $E\{H(\hat d^{\,opt})\}$ is approximated by $B^{-1} \sum_{b=1}^B U_b$. Combining yields the approximation to $R(\hat d^{\,opt})$.

Calculation for $K = 2$. The developments are analogous to those above. We consider the generative data model in Section 6.2 and treatment regimes of the form $d = (d_1, d_2)$, where $d_1(s_1) = I(\psi_{10} + \psi_{11} s_1 > 0)$ and $d_2(s_1, s_2, a_1) = I(\psi_{20} + \psi_{21} a_1 + \psi_{22} s_2 > 0)$ for arbitrary $\psi_{10}, \psi_{11}, \psi_{20}, \psi_{21}, \psi_{22}$. Here,

$E\{Y^*(d)\} = E\big( E[\, E\{Y^*(d) \mid S_2^*(d_1), S_1\} \mid S_1 ]\big) = E\Big( E\big(\, E[\, Y \mid S_2, S_1, A_1 = d_1(S_1), A_2 = d_2\{S_2, S_1, d_1(S_1)\} ] \,\big|\, S_1, A_1 = d_1(S_1) \big) \Big)$.

Because $S_1$ is binary taking values in $\{0, 1\}$,

$E\{Y^*(d)\} = E\big(\, E[\, Y \mid S_2, S_1, A_1 = d_1(0), A_2 = d_2\{S_2, 0, d_1(0)\} ] \,\big|\, S_1 = 0, A_1 = d_1(0) \big)\, \mathrm{pr}(S_1 = 0)$
$\quad + E\big(\, E[\, Y \mid S_2, S_1, A_1 = d_1(1), A_2 = d_2\{S_2, 1, d_1(1)\} ] \,\big|\, S_1 = 1, A_1 = d_1(1) \big)\, \mathrm{pr}(S_1 = 1)$.

Under the generative model, writing $a_1 = I(\psi_{10} + \psi_{11} s_1 > 0)$ for brevity, these expectations are of the form

$E\big(\, E[\, Y \mid S_2, S_1, A_1 = d_1(s_1), A_2 = d_2\{S_2, s_1, d_1(s_1)\} ] \,\big|\, S_1 = s_1, A_1 = d_1(s_1) \big)$
$= \beta_{20}^* + \beta_{21}^* s_1 + \beta_{22}^* a_1 + \beta_{23}^* s_1 a_1 + \beta_{24}^* E\{S_2 \mid S_1 = s_1, A_1 = d_1(s_1)\} + \beta_{25}^* E\{S_2^2 \mid S_1 = s_1, A_1 = d_1(s_1)\}$
$\quad + (\psi_{20}^* + \psi_{21}^* a_1)\, E\{I(\psi_{20} + \psi_{21} a_1 + \psi_{22} S_2 > 0) \mid S_1 = s_1, A_1 = d_1(s_1)\} + \psi_{22}^*\, E\{S_2\, I(\psi_{20} + \psi_{21} a_1 + \psi_{22} S_2 > 0) \mid S_1 = s_1, A_1 = d_1(s_1)\}$,

for $s_1 = 0, 1$. In the generative data model, the conditional distribution of $S_2$ given $S_1, A_1$ is normal; accordingly, it is straightforward to calculate $E\{S_2 \mid S_1 = s_1, A_1 = d_1(s_1)\}$, $E\{S_2^2 \mid S_1 = s_1, A_1 = d_1(s_1)\}$, $E\{I(\psi_{20} + \psi_{21} a_1 + \psi_{22} S_2 > 0) \mid S_1 = s_1, A_1 = d_1(s_1)\}$, and $E\{S_2\, I(\psi_{20} + \psi_{21} a_1 + \psi_{22} S_2 > 0) \mid S_1 = s_1, A_1 = d_1(s_1)\}$ in a manner analogous to those for the case $K = 1$. Approximation of $E\{H(\hat d^{\,opt})\}$ and hence $R(\hat d^{\,opt})$ for $\hat d^{\,opt} = \hat d_Q^{\,opt}$ or $\hat d_A^{\,opt}$ may then be carried out as for the case $K = 1$.
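As a concrete illustration of the "Calculation for $K = 1$" above, the following Python sketch evaluates $H(d) = E\{Y^*(d)\}$ for a linear rule both in closed form and by Monte Carlo. The generative model follows the form described there, but all numerical parameter values are hypothetical, chosen only for illustration.

import numpy as np
from scipy.stats import norm

# Hypothetical true parameters of a K = 1 generative model of the stated form:
# S1 ~ N(0,1), Y | S1, A1 ~ N(b0 + b1*S1 + b2*S1^2 + A1*(t0 + t1*S1), 9).
b0, b1, b2 = 1.0, 1.0, -0.5
t0, t1 = 1.0, 0.5
psi0, psi1 = 0.8, 0.6                   # regime d(s1) = I(psi0 + psi1*s1 > 0) to be evaluated

def H_analytic(psi0, psi1):
    c = -psi0 / psi1                    # boundary of the treatment region (assumes psi1 != 0)
    if psi1 > 0:                        # treat when S1 > c
        p, m = 1 - norm.cdf(c), norm.pdf(c)    # pr(S1 > c), E{S1 I(S1 > c)}
    else:                               # treat when S1 < c
        p, m = norm.cdf(c), -norm.pdf(c)       # pr(S1 < c), E{S1 I(S1 < c)}
    # E(S1) = 0 and E(S1^2) = 1 for standard normal S1
    return b0 + b2 + t0 * p + t1 * m

def H_monte_carlo(psi0, psi1, B=10**6, seed=0):
    s1 = np.random.default_rng(seed).standard_normal(B)
    a1 = (psi0 + psi1 * s1 > 0)
    # average the conditional mean of Y; the Normal(0, 9) error has mean zero
    return np.mean(b0 + b1 * s1 + b2 * s1**2 + a1 * (t0 + t1 * s1))

print(H_analytic(psi0, psi1), H_monte_carlo(psi0, psi1))   # agree up to Monte Carlo error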

Calculation by simulation. When an analytical expression for $H(d) = E\{Y^*(d)\}$ for regimes of a certain form $d$ is not available, $H(d)$ for a fixed $d$ may be approximated by simulation using the g-computation algorithm of Robins (1986). We demonstrate for $K = 2$, so that $d = (d_1, d_2)$; the procedure for $K = 1$ is then immediate. For total number of simulations $B$, for each $b = 1, \ldots, B$, the steps are: (i) generate $s_{1b}$ from the true distribution of $S_1$; (ii) generate $s_{2b}$ from the true conditional distribution of $S_2$ given $S_1 = s_{1b}$ and $A_1 = d_1(s_{1b})$; (iii) evaluate the true $E(Y \mid \bar S_2 = \bar s_2, \bar A_2 = \bar a_2)$ at $\bar s_2 = \bar s_{2b} = (s_{1b}, s_{2b})$ and $\bar a_2 = [d_1(s_{1b}), d_2\{\bar s_{2b}, d_1(s_{1b})\}]$, and call the resulting value $U_b$; and (iv) estimate $H(d) = E\{Y^*(d)\}$ by $B^{-1} \sum_{b=1}^B U_b$ (a code sketch of these steps is given below, following Section A.7). When $d = \hat d_Q^{\,opt}$ or $\hat d_A^{\,opt}$, one would follow the above procedure for each Monte Carlo data set. In each of steps (i)-(iii), it is important to recognize that, while $\hat d_Q^{\,opt}$ and $\hat d_A^{\,opt}$ are determined by the estimated $\psi$, the distributions from which realizations are generated depend on the true $\beta^*$ and $\psi^*$. The values of $E\{H(\hat d_Q^{\,opt})\}$ and $E\{H(\hat d_A^{\,opt})\}$ may then be approximated by the average of the estimated $H(\hat d_Q^{\,opt})$ and $H(\hat d_A^{\,opt})$ across the Monte Carlo data sets, as before.

A.7. CREATING EQUIVALENTLY MISSPECIFIED PAIRS WHEN BOTH THE PROPENSITY MODEL AND Q-FUNCTION ARE MISSPECIFIED

Consider the $K = 2$ decision point scenario; the developments apply equally to the $K = 1$ setting. To identify pairs $(\beta_{25}^*, \phi_{25}^*)$ that are equivalently misspecified, for each of the combinations of $\beta_{25}^*$ and $\phi_{25}^*$ within a pre-specified grid, say $(\beta_{25}^*, \phi_{25}^*) \in [-1, 1] \times [-1, 1]$ with a step size of 0.05, we generate a large data set of size $n = 10{,}000$ from the generative data model in Section 6.2 with all other parameters fixed at their true values. This yields $41^2 = 1681$ combinations and hence 1681 such data sets. For each data set, the linear regression model for the response and the logistic model for the propensity of treatment assignment are then fitted, and the ratio of standard errors for $\hat\phi_{25}$ and $\hat\beta_{25}$, $SE(\hat\phi_{25})/SE(\hat\beta_{25})$, say, is obtained. We then fit to these values a polynomial model in $\phi_{25}^*$, $f(\phi_{25}^*)$, say, and select the polynomial degree yielding a sufficiently large adjusted $R^2$. Setting $\beta_{25}^* = \phi_{25}^*/f(\phi_{25}^*)$ then yields the result that the corresponding t-statistics will be approximately equal. These were re-checked in the course of running the simulations so that the t-statistics differed by less than some reasonable value, usually at most a 5 percent difference, as it cannot be guaranteed that they will be precisely the same.
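As a rough illustration of this calibration, the following Python sketch implements the same idea in a simplified one-decision analogue, with a single covariate and quadratic misspecification entering both working models. The generative model, parameter values, and helper names here are hypothetical stand-ins for the Section 6.2 setup, intended only to show the mechanics of the SE-ratio regression and the setting $\beta_2^* = \phi_2^*/f(\phi_2^*)$.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def se_ratio(beta2, phi2, n=10_000):
    # Simplified stand-in for the article's models: quadratic coefficients beta2 and
    # phi2 control misspecification of the outcome and propensity models, respectively.
    s = rng.standard_normal(n)
    quad = np.column_stack([np.ones(n), s, s**2])
    a = rng.binomial(1, 1 / (1 + np.exp(-(2.0 * s + phi2 * s**2))))
    y = 1 + s + beta2 * s**2 + a * (1 + 0.5 * s) + rng.normal(0, 3, n)
    ols = sm.OLS(y, np.column_stack([quad, a, a * s])).fit()   # response model
    logit = sm.Logit(a, quad).fit(disp=0)                      # propensity model
    return logit.bse[2] / ols.bse[2]     # SE(phi2_hat) / SE(beta2_hat)

grid = np.round(np.arange(-1, 1.0001, 0.05), 2)        # 41 values, as in the 41 x 41 grid
ratios = np.array([se_ratio(0.0, p) for p in grid])    # beta2 held at 0 here for brevity
f = np.poly1d(np.polyfit(grid, ratios, deg=3))         # degree would be chosen via adjusted R^2
beta2_equiv = grid / f(grid)                           # equivalently misspecified beta2 values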

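Returning to Section A.6, here is a minimal sketch of steps (i)-(iv) of the g-computation simulation; the distributions, conditional mean, and rules d1, d2 below are hypothetical stand-ins for the true generative model of Section 6.2 and a fixed regime $d$.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the true generative model and a fixed regime d = (d1, d2):
d1 = lambda s1: (0.8 + 0.6 * s1 > 0).astype(float)
d2 = lambda s1, s2, a1: (0.5 - 0.2 * a1 + 0.3 * s2 > 0).astype(float)
true_mean_Y = lambda s1, s2, a1, a2: 1 + s1 + 0.5 * s2 + a1 + a2 * (1 - 0.5 * s2)

B = 10**6
s1 = rng.standard_normal(B)            # (i)   s_1b from the true distribution of S1
a1 = d1(s1)                            #       A1 set by the regime being evaluated
s2 = rng.normal(0.5 * s1 + a1, 1.0)    # (ii)  s_2b from the true conditional of S2 given (S1, A1)
a2 = d2(s1, s2, a1)
U = true_mean_Y(s1, s2, a1, a2)        # (iii) true E(Y | S2bar, A2bar) at the generated history
print(U.mean())                        # (iv)  H(d) approximated by the average of the U_b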
A.8. DERIVATION OF $h_1(S_1; \beta_1^*)$ AND $C_1(S_1; \psi_1^*)$ IN THE TWO DECISION POINT SCENARIO

We seek to identify the true $h_1(s_1)$ and $C_1(s_1)$, where $S_1$ and $A_1$ are Bernoulli. With $h_1(s_1) = \beta_{10}^* + \beta_{11}^* s_1$ and $C_1(s_1) = \psi_{10}^* + \psi_{11}^* s_1$, it follows that the true Q-function at the first decision is $Q_1(s_1, a_1) = h_1(s_1) + a_1 C_1(s_1)$. We thus calculate $Q_1(s_1, a_1)$ under the generative model and equate terms to determine the form of $\beta_{10}^*$, $\beta_{11}^*$, $\psi_{10}^*$, and $\psi_{11}^*$. The true value function at the second decision is

$V_2(S_1, S_2, A_1) = h_2(S_1, S_2, A_1) + C_2(S_1, S_2, A_1)\, I\{C_2(S_1, S_2, A_1) > 0\}$.

Thus,

$Q_1(s_1, a_1) = E\{V_2(S_1, S_2, A_1) \mid S_1 = s_1, A_1 = a_1\}$
$= \beta_{20}^* + \beta_{21}^* s_1 + \beta_{22}^* a_1 + \beta_{23}^* s_1 a_1 + \beta_{24}^* E\{S_2 \mid S_1 = s_1, A_1 = a_1\} + \beta_{25}^* E\{S_2^2 \mid S_1 = s_1, A_1 = a_1\} + E[C_2(S_1, S_2, A_1)\, I\{C_2(S_1, S_2, A_1) > 0\} \mid S_1 = s_1, A_1 = a_1]$.

The conditional expectations in this expression may be calculated in a manner analogous to that in Section A.5 to obtain the form of $Q_1(s_1, a_1)$. It follows that $Q_1(0, 0) = \beta_{10}^*$, $Q_1(1, 0) = \beta_{10}^* + \beta_{11}^*$, $Q_1(0, 1) = \beta_{10}^* + \psi_{10}^*$, and $Q_1(1, 1) = \beta_{10}^* + \beta_{11}^* + \psi_{10}^* + \psi_{11}^*$, which may be solved to yield $\beta_{10}^* = Q_1(0, 0)$, $\beta_{11}^* = Q_1(1, 0) - Q_1(0, 0)$, $\psi_{10}^* = Q_1(0, 1) - Q_1(0, 0)$, and $\psi_{11}^* = Q_1(1, 1) - Q_1(1, 0) - Q_1(0, 1) + Q_1(0, 0)$.

A.9. ADDITIONAL SIMULATION RESULTS

In this section, we present additional summaries of results of the simulations reported in Sections 6.1 and 6.2. The quantities $R(\hat d_Q^{\,opt})$ and $R(\hat d_A^{\,opt})$ plotted in Figures 1-6 are the v-efficiencies of the estimated regimes produced by each method, as discussed at the beginning of Section 6 of the parent article. These are based on the averages $E\{H(\hat d_Q^{\,opt})\}$ and $E\{H(\hat d_A^{\,opt})\}$. Because the distributions of the $H(\hat d_Q^{\,opt})$ and $H(\hat d_A^{\,opt})$ may be left-skewed, it is instructive to also present both the distributions themselves and alternatives to $R(\hat d_Q^{\,opt})$ and $R(\hat d_A^{\,opt})$ based on the median. The left-hand panel of Figure A.1 is the same as the right-most panel of Figure 1 in the main paper, shown for convenience. For comparison, the right-hand panel shows both these distributions and the alternative measures of estimated regime efficiency based on medians rather than averages. Figures A.2 and A.3 show the same information corresponding to Figures 2 and 3 in the main paper. Likewise, Figures A.4-A.6 show the same information corresponding to the lower right-hand panels of Figures 4-6 in the main article. In nearly all cases, the plots using the median dominate those using the mean, reflecting the expected skewness.

Fig. A.1. One decision point, misspecified propensity model. Left panel is the same as the right-most panel of Figure 1 of the main article. Right panel shows the alternative measure of efficiency based on medians and their distributions. (Both panels plot percent of ideal response against $\phi_{12}^*$.)

Fig. A.2. One decision point, misspecified Q-function. Left panel is the same as the right-most panel of Figure 2 of the main article. Right panel shows the alternative measure of efficiency based on medians and their distributions. (Both panels plot percent of ideal response against $\beta_{12}^*$.)

Fig. A.3. One decision point, both models misspecified. Left panel is the same as the right-most panel of Figure 3 of the main article. Right panel shows the alternative measure of efficiency based on medians and their distributions. (Both panels plot percent of ideal response against $\phi_{12}^*$, corresponding to equivalent $\beta_{12}^*$.)

Fig. A.4. Two decision points, misspecified propensity model. Left panel is the same as the lower right-most panel of Figure 4 of the main article. Right panel shows the alternative measure of efficiency based on medians and their distributions. (Both panels plot percent of ideal response against $\phi_{25}^*$.)

Fig. A.5. Two decision points, misspecified Q-function. Left panel is the same as the lower right-most panel of Figure 5 of the main article. Right panel shows the alternative measure of efficiency based on medians and their distributions. (Both panels plot percent of ideal response against $\beta_{25}^*$.)

Fig. A.6. Two decision points, both models misspecified. Left panel is the same as the right-most panel of Figure 6 of the main article. Right panel shows the alternative measure of efficiency based on medians and their distributions. (Both panels plot percent of ideal response against $\phi_{25}^*$, corresponding to equivalent $\beta_{25}^*$.)

For each level of misspecification in the misspecified propensity score scenario for one decision point, Figure A.7 plots the average across all simulated data sets of the standard deviations of the estimated propensity scores for each simulated data set. The figure shows that, as the level of misspecification increases, the standard deviation decreases, consistent with the claim in the main article. The bottom panels of Figure A.7 show similar results for the two decision point misspecified propensity scenario. For the second decision point, shown in the bottom right panel of Figure A.7, where the propensity model is misspecified, the behavior is similar (the standard deviation decreases as the level of misspecification increases). For comparison, the bottom left panel of Figure A.7 shows the behavior of the estimated propensities at the first decision point, where the propensity model is correctly specified; as expected, there is no discernible pattern.

We also carried out simulations under the case of misspecified contrast function in a scenario similar to that in Section 6.1. Here, we used the class of generative models defined in (34), with the exception that

$Y \mid S_1 = s_1, A_1 = a_1 \sim \mathrm{Normal}\{\beta_{10}^* + \beta_{11}^* s_1 + \beta_{12}^* s_1^2 + a_1(\psi_{10}^* + \psi_{11}^* s_1 + \psi_{12}^* s_1^2),\, 9\}$,

so that $\psi^* = (\psi_{10}^*, \psi_{11}^*, \psi_{12}^*)^T$, and thus $d^{opt} = d_1^{opt}$, $d_1^{opt}(s_1) = I(\psi_{10}^* + \psi_{11}^* s_1 + \psi_{12}^* s_1^2 > 0)$. For A-learning, we assumed models $h_1(s_1; \beta_1) = \beta_{10} + \beta_{11} s_1$, $C_1(s_1; \psi_1) = \psi_{10} + \psi_{11} s_1$, and $\pi_1(s_1; \phi_1) = \mathrm{expit}(\phi_{10} + \phi_{11} s_1)$, and for Q-learning used $Q_1(s_1, a_1; \xi_1) = h_1(s_1; \beta_1) + a_1 C_1(s_1; \psi_1)$. To simplify notation and distinguish between different components of the Q-function, we refer to $h_1(s_1; \beta_1)$ as the h-function. These models involve correctly specified contrast functions only when $\psi_{12}^* = 0$. The h-function is correctly specified when $\beta_{12}^* = 0$, and the propensity model, $\pi_1(s_1; \phi_1)$, is correctly specified when $\phi_{12}^* = 0$. We studied the effects of misspecification of the contrast function by systematically varying $\beta_{12}^*$, $\phi_{12}^*$, and $\psi_{12}^*$ while keeping the others fixed, considering parameter settings of the form $\phi^* = (0, 2, \phi_{12}^*)^T$, $\beta^* = (1, 1, \beta_{12}^*)^T$, and $\psi^* = (1, 0.5, \psi_{12}^*)^T$. Three scenarios were considered (a data-generating sketch for the first follows the list):

1. misspecified contrast function alone ($\beta_{12}^* = \phi_{12}^* = 0$, and nonzero $\psi_{12}^*$);
2. misspecified contrast and h-function ($\phi_{12}^* = 0$, and nonzero $\beta_{12}^*$ and $\psi_{12}^*$);
3. misspecified contrast and propensity model ($\beta_{12}^* = 0$, and nonzero $\phi_{12}^*$ and $\psi_{12}^*$).
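To make the first scenario concrete, the following Python sketch generates data with a quadratic true contrast and fits the misspecified linear Q-learning working model by least squares; the value $\psi_{12}^* = 0.5$ and the sample size are illustrative choices, not values from the article.

import numpy as np

rng = np.random.default_rng(2)
n, psi12 = 5000, 0.5                       # illustrative sample size and contrast curvature
# Scenario 1: beta12* = phi12* = 0, so only the contrast is misspecified.
# True parameters per the stated settings: phi* = (0, 2, 0), beta* = (1, 1, 0), psi* = (1, 0.5, psi12).
s1 = rng.standard_normal(n)
p = 1 / (1 + np.exp(-(0 + 2 * s1)))        # expit(phi10* + phi11* s1); correctly specified propensity
a1 = rng.binomial(1, p)
y = 1 + s1 + a1 * (1 + 0.5 * s1 + psi12 * s1**2) + rng.normal(0, 3, n)

# Q-learning with the misspecified linear working model
# Q1(s1, a1; xi) = beta10 + beta11*s1 + a1*(psi10 + psi11*s1), fit by ordinary least squares.
X = np.column_stack([np.ones(n), s1, a1, a1 * s1])
beta10, beta11, psi10, psi11 = np.linalg.lstsq(X, y, rcond=None)[0]
d_hat = lambda s: (psi10 + psi11 * s > 0).astype(int)
# With psi12 = 0.5, the true optimal rule treats everyone (1 + 0.5 s + 0.5 s^2 > 0 for all s),
# whereas the fitted linear contrast forces d_hat(s) = 0 on a half line, capping v-efficiency below 1.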

Fig. A.7. Misspecified propensity model. Each symbol represents the average across all simulated data sets of the standard deviation of the estimated propensity scores for each data set. Top: one decision point (versus $\phi_{12}^*$). Lower left: two decision points, first stage. Lower right: two decision points, second stage (versus $\phi_{25}^*$).

Under scenarios 2 and 3, we identified $\beta_{12}^* = \beta_{12}^*(\psi_{12}^*)$ and $\psi_{12}^* = \psi_{12}^*(\phi_{12}^*)$, respectively, using the approach outlined in Section A.7, so that an analyst might be equally likely to detect either form of misspecification. Further, we observe that in the second and third scenarios, $\mathrm{sgn}(\beta_{12}^*) = \mathrm{sgn}(\psi_{12}^*)$ and $\mathrm{sgn}(\phi_{12}^*) = \mathrm{sgn}(\psi_{12}^*)$, respectively, where $\mathrm{sgn}(x) = I(x > 0)$.

Figure A.8 presents the v-efficiencies of estimators for $d^{opt}$ under Q- and A-learning for each scenario. Here, the true $d^{opt}$ is an indicator of whether the quadratic function $\psi_{10}^* + \psi_{11}^* s_1 + \psi_{12}^* s_1^2$ is positive. For the given parameterization, when $\psi_{12}^* > 0$ the true $d_1^{opt}(s_1) = 1$ for all $s_1$. However, estimators for $d^{opt}$ under Q- and A-learning assume a linear contrast function, so that $\hat d_{Q,1}^{\,opt}(s_1) = 0$ or $\hat d_{A,1}^{\,opt}(s_1) = 0$ for some $s_1$. This is illustrated in all three panels by $R(\hat d_Q^{\,opt}) < 1$ and $R(\hat d_A^{\,opt}) < 1$ for $\psi_{12}^* > 0$. The top panel of Figure A.8 indicates that, when only the contrast function is misspecified (scenario 1) and $\psi_{12}^* \le 0$, a small gain in v-efficiency is achieved by A-learning over Q-learning. Alternatively, for $\psi_{12}^* > 0$, Q-learning yields slightly better performance as $\psi_{12}^*$ increases.

For the second scenario, we considered misspecification of the contrast and h-function. The results are shown in the bottom left panel of Figure A.8. Similar to the prior scenario, we observe small gains in v-efficiency for A-learning over Q-learning when $\psi_{12}^* \le 0$ (and $\beta_{12}^* = \beta_{12}^*(\psi_{12}^*)$) and superior performance under Q-learning for some values of $\psi_{12}^* > 0$. Finally, the bottom right panel shows the results for the scenario of both misspecified contrast and misspecified propensity (scenario 3). Both Q- and A-learning yield estimators for $d^{opt}$ that exhibit poor v-efficiency for much of the range where $\phi_{12}^* < 0$ (and $\psi_{12}^* < 0$), while A-learning shows better v-efficiency relative to Q-learning when $\phi_{12}^* > 0$ (and $\psi_{12}^* > 0$). These results demonstrate that neither method need dominate the other in terms of performance as reflected by v-efficiency when the contrast function is misspecified.

A.10. DESIGN OF STAR*D

Figure A.9 presents a schematic of the STAR*D study design. At levels 2 and 3, patients/physicians expressed preference for switch or augment, and were then randomized to the options shown. Patients entering level 2A were randomized; those entering level 4 were not.

A.11. EXPLORATORY ANALYSIS AND DIAGNOSTICS

The development of the posited Q-functions used in the analysis of the STAR*D data in Section 7 of the main paper relied on input from the investigators. The information included in $S_1$ and the dependence of the models on QIDS slope were guided by discussions with A. John Rush, the principal investigator.

Fig. A.8. V-efficiencies $R(\hat d_Q^{\,opt})$ and $R(\hat d_A^{\,opt})$ for estimating the true $d^{opt}$ under misspecification of the contrast model, one decision point. Top: misspecified contrast function only (versus $\psi_{12}^*$). Lower left: misspecified contrast function and misspecified h-function (versus $\psi_{12}^*$, corresponding to equivalent $\beta_{12}^*$). Lower right: misspecified contrast function and misspecified propensity model (versus $\phi_{12}^*$, corresponding to equivalent $\psi_{12}^*$).

Fig. A.9. Schematic depiction of the STAR*D study. Level 1, initial treatment: CIT. Level 2, switch treatments: BUP, CT, SER, or VEN; augment treatments: CIT + (BUP, BUS, or CT). Level 2A (if received CT in Level 2): BUP or VEN. Level 3, switch treatments: MIRT or NTP; augment treatments: previous treatment + (Li or THY). Level 4: switch to TCP or MIRT + VEN. Follow-up occurs between levels.

We present diagnostic plots for the first and second stage regressions used in the STAR*D analysis. Figure A.10 displays standard regression diagnostics for the second stage regression models used in Q- and A-learning. The figure suggests that there are no major deviations from the assumed linear model and that there is no evidence of outliers or high-influence points. Figure A.11 displays regression diagnostics for the first stage regression models used in Q- and A-learning. The figure suggests that a linear model may be a satisfactory approximation, though it appears that the linear model overestimates the first stage Q-function for non-responders and underestimates the first stage Q-function for responders. Recall that, by design, responders have higher responses at the end of the first stage. Thus, responders tend to have higher fitted values and positive residuals. There do not appear to be any outliers or high-influence points.

ACKNOWLEDGMENTS

This work was supported by NIH grants R37 AI031789, R01 CA051962, R01 CA085848, P01 CA142538, and T32 HL079896.

Fig. A.10. Regression diagnostics for the second stage Q-function used in Q- and A-learning. Upper left: residuals vs. fitted values; the figure does not suggest deviation from an underlying linear model with homoscedastic variance, and there do not appear to be any outliers. Upper right: studentized residuals vs. fitted values; the plot likewise does not suggest deviation from an underlying linear model with homoscedastic variance. Lower left: Q-Q plot of the residuals; note that the sample quantiles form a step function because we have treated the discrete variable QIDS as continuous. Lower right: studentized residuals vs. leverage; there do not appear to be any high-influence points.

Fig. A.11. Regression diagnostics for the first stage Q-function used in Q- and A-learning. Responders and non-responders are depicted with different plotting symbols. Upper left: residuals vs. fitted values; there is a clear distinction between the responders and non-responders. Responders, by design, have higher first stage responses and thus tend to have higher fitted values and positive residuals. Upper right: studentized residuals vs. fitted values; again, the separation between responders and non-responders is clear. Lower left: Q-Q plot of the residuals; note that the sample quantiles form a step function because we have treated the discrete variable QIDS as continuous. Lower right: studentized residuals vs. leverage; there do not appear to be any high-influence points.

REFERENCES

Chakraborty, B., Murphy, S. A. and Strecher, V. (2010). Inference for non-regular parameters in optimal dynamic treatment regimes. Statistical Methods in Medical Research 19 317-343.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods: Application to control of the healthy worker survivor effect. Mathematical Modelling 7 1393-1512.


More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Structure learning in human causal induction

Structure learning in human causal induction Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

5.6 The Normal Distributions

5.6 The Normal Distributions STAT 41 Lecture Notes 13 5.6 The Normal Distributions Definition 5.6.1. A (continuous) random variable X has a normal distribution with mean µ R and variance < R if the p.d.f. of X is f(x µ, ) ( π ) 1/

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Stratification and Weighting Via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study

Stratification and Weighting Via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study Stratification and Weighting Via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study Jared K. Lunceford 1 and Marie Davidian 2 1 Merck Research Laboratories, RY34-A316,

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Lecture 2: Linear and Mixed Models

Lecture 2: Linear and Mixed Models Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =

More information

Authors and Affiliations: Nianbo Dong University of Missouri 14 Hill Hall, Columbia, MO Phone: (573)

Authors and Affiliations: Nianbo Dong University of Missouri 14 Hill Hall, Columbia, MO Phone: (573) Prognostic Propensity Scores: A Method Accounting for the Correlations of the Covariates with Both the Treatment and the Outcome Variables in Matching and Diagnostics Authors and Affiliations: Nianbo Dong

More information

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0 Introduction to Econometrics Midterm April 26, 2011 Name Student ID MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. (5,000 credit for each correct

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 175 A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome Eric Tchetgen Tchetgen

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

Putzer s Algorithm. Norman Lebovitz. September 8, 2016

Putzer s Algorithm. Norman Lebovitz. September 8, 2016 Putzer s Algorithm Norman Lebovitz September 8, 2016 1 Putzer s algorithm The differential equation dx = Ax, (1) dt where A is an n n matrix of constants, possesses the fundamental matrix solution exp(at),

More information

Semiparametric Regression and Machine Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Semiparametric Regression and Machine Learning Methods for Estimating Optimal Dynamic Treatment Regimes Semiparametric Regression and Machine Learning Methods for Estimating Optimal Dynamic Treatment Regimes by Yebin Tao A dissertation submitted in partial fulfillment of the requirements for the degree of

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Parameter Estimation for Random Differential Equation Models

Parameter Estimation for Random Differential Equation Models Parameter Estimation for Random Differential Equation Models H.T. Banks and Michele L. Joyner Center for Research in Scientific Computation North Carolina State University Raleigh, NC, United States and

More information