Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Size: px

Start display at page:

Download "Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall"

Brice Brown
5 years ago
Views:

1 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work with Thomas Ten Have (UPenn) & Susan Murphy (UMich) JSM Salt Lake City, Utah July 31, 2007

2 Contents 2 Contents 1 Warm-up: Why do we condition on pre-treatment variables S? 3 2 Time-Varying Effect Moderation 6 3 Robins Structural Nested Mean Model 8 4 Estimation in Time-Varying Setting 9 5 Bias-Variance Trade-off 16 6 Conclusions 23

3 1 Warm-up: Why do we condition on pre-treatment variables S? 3 1 Warm-up: Why do we condition on pre-treatment variables S?

4 1 Warm-up: Why do we condition on pre-treatment variables S? 4 We want the effect of A on Y. Why condition on (adjust for) pre-treatment variables S? 1. Confounding: S is correlated with both A and Y. In this case, S is known as a confounder of the effect of A on Y. 2. Precision: S may be a pre-treatment measure of Y, or any other variable highly correlated with Y. 3. Missing Data: The outcome Y is missing for some units, S and A predict missingness, and S is associated with Y. 4. Effect Heterogeneity: S may moderate, temper, or specify the effect of A on Y. In this case, S is known as a moderator of the effect of A on Y. Formalized in next slide.

5 1 Warm-up: Why do we condition on pre-treatment variables S? 5 Effect Moderation in One Time Point Definition: Fix a 0. Let µ(s, a) E[Y (a) Y (0) S = s]. If µ(s, a) is non-constant in s, then S is said to be a moderator of the effect of a on Y. Example: If the structural model for the conditional mean of Y (a) given S is E[Y (a) S = s] = β 0 + γ 1 s + β 1 a + β 2 sa, then µ(s, a) = β 1 a + β 2 sa. In this example, S is a moderator of the effect of a on Y if β 2 0.

6 2 Time-Varying Effect Moderation 6 2 Time-Varying Effect Moderation The focus of this paper is time-varying effect moderation. The data structure in the time-varying setting is {S 1, a 1, S 2 (a 1 ), a 2, Y (a 1, a 2 )}. Running Example, from the PROSPECT Study: (a 1, a 2 ) Time-varying treatment pattern; a t is binary (0,1) Y (a 1, a 2 ) Depression at the end of the study; continuous S 1 S 2 (a 1 ) Suicidal Ideation at baseline visit; continuous Suicidal Ideation at second visit; continuous

7 2 Time-Varying Effect Moderation 7 Formal Definition of Time-Varying Causal Effects Recall the data structure {S 1, a 1, S 2 (a 1 ), a 2, Y (a 1, a 2 )}. Conditional Intermediate Causal Effect at t = 1: µ 1 (s 1, a 1 ) E[Y (a 1, 0) Y (0, 0) S 1 = s 1 ] = a 1 H 1 β 1 (Linear Parameterization) say = a 1 (β 11 + β 12 s 1 ) (Sample Model) Conditional Intermediate Causal Effect at t = 2: µ 2 ( s 2, ā 2 ) E[Y (a 1, a 2 ) Y (a 1, 0) S 1 = s 1, S 2 (a 1 ) = s 2 ] = a 2 H 2 β 2 (Linear Parameterization) say = a 2 (β 21 + β 22 (s 1 + s 2 )/2) (Sample Model)

8 3 Robins Structural Nested Mean Model 8 3 Robins Structural Nested Mean Model Recall the data structure {S 1, a 1, S 2 (a 1 ), a 2, Y (a 1, a 2 )}. The SNMM for the conditional mean of Y (a 1, a 2 ) given S 2 (a 1 ) is: E [ Y (a 1, a 2 ) S ] 2 (a 1 ) = s 2 = µ0 + ɛ 1 (s 1 ) + µ 1 (s 1, a 1 ) + ɛ 2 ( s 2, a 1 ) + µ 2 ( s 2, ā 2 ), where ɛ 2 ( s 2, a 1 ) = E[Y (a 1, 0) S 2 (a 1 ) = s 2 ] E[Y (a 1, 0) S 1 = s 1 ], ɛ 1 (s 1 ) = E[Y (0, 0) S 1 = s 1 ] E[Y (0, 0)], µ 2 ( s 2, a 2, 0) = 0 and µ 1 (s 1, 0) = 0, E S2 S 1 [ɛ 2 ( s 2, a 1 ) S 1 = s 1 ] = 0, and E S1 [ɛ 1 (s 1 )] = 0.

9 4 Estimation in Time-Varying Setting 9 4 Estimation in Time-Varying Setting Recall that parametric models for our causal estimands µ 1 and µ 2 are based on the set of parameters β = (β 1, β 2 ). The dissertation considers two estimators for β: 1. Proposed 2-Stage Regression Estimator 2. Robins Semi-parametric G-Estimator *** We can use 2-Stage Estimator for starting guesses *** Both estimators rely on Robins Consistency and Sequential Ignorability assumptions. The 2-Stage Estimator relies more on modelling assumptions. We discuss these in turn, but first...

10 4 Estimation in Time-Varying Setting 10 What s wrong with the Traditional Estimator? An Example of The Traditional Estimator: Apply OLS with E(Y S 2 = s 2, Ā2 = ā 2 ) = β 0 + η 1 s 1 + a 1 (β 1 + β 2s 1 ) + η 2 s 2 + a 2 (β 3 + β 4(s 1 + s 2 )/2) - Possibly incorrectly specified nuisance functions. - Two problems arise when using the traditional regression estimator. - These problems occur even in the absence of time-varying confounders (that is, even under Sequential Ignorability).

11 4 Estimation in Time-Varying Setting 11 First problem with the Traditional Estimator Recall the data structure {S 1, a 1, S 2 (a 1 ), a 2, Y (a 1, a 2 )}. Wrong Effect Baseline 4-month Visit 8-month Visit a Set 1 a 2 = 0 S 1 S 2 (a 1 ) Y (a 1, 0) What about the effect transmitted through S 2 (a 1 )?

12 4 Estimation in Time-Varying Setting 12 Second problem with the Traditional Estimator Recall the data structure {S 1, a 1, S 2 (a 1 ), a 2, Y (a 1, a 2 )}. Spurious Effect Baseline 4-month Visit 8-month Visit V 0 S 1 a Set 1 a 2 = 0 S 2 (a 1 ) Y (a 1, 0) Berkson s paradox; Judea Pearl s backdoor criterion

13 4 Estimation in Time-Varying Setting 13 Parameterizing the Nuisance Functions So we must parameterize the nuisance functions correctly. Recall the constraints on the nuisance functions: ɛ 2 ( s 2, a 1 ) = E[Y (a 1, 0) S 2 (a 1 ) = s 2 ] E[Y (a 1, 0) S 1 = s 1 ], ɛ 1 (s 1 ) = E[Y (0, 0) S 1 = s 1 ] E[Y (0, 0)], E S2 S 1 [ɛ 2 ( s 2, a 1 ) S 1 = s 1 ] = 0, and E S1 [ɛ 1 (s 1 )] = 0. Example parameterizations for the nuisance functions: ɛ 1 (s 1 ) say ( = η 1,1 s1 E(S 1 ) ) ɛ 2 ( s 2, a 1 ) say = ( η 2,1 + η 2,2 s 1 )( s2 E(S 2 (a 1 ) S 1 = s 1 ) )

14 4 Estimation in Time-Varying Setting 14 Proposed 2-Stage Regression Estimator Recall that E [ Y (a 1, a 2 ) S 2 (a 1 ) = s 2 ] = µ2 ( s 2, ā 2 ; β 2 ) + ɛ 2 ( s 2, a 1 ; η 2, γ 2 ) + µ 1 (s 1, a 1 ; β 1 ) + ɛ 1 (s 1 ; η 1, γ 1 ) + µ We have models for the µ s: A 1 H 1 β 1 and A 2 H 2 β 2 ; Set aside 2. Model m 1 (γ 1 ) = E(S 1 ), estimate γ 1 with GLM; model m 2 (s 1, a 1 ; γ 2 ) = E(S 2 (a 1 ) S 1 = s 1 ), estimate γ 2 with GLM 3. Construct residuals ˆδ 1 = s 1 ˆm 1 (ˆγ 1 ) and ˆδ 2 = s 2 ˆm 2 (s 1, a 1 ; ˆγ 2 ) 4. Construct models for ɛ s: G 1ˆδ1 η 1 = G 1 η 1 and G 2ˆδ2 η 2 = G 2 η 2 5. Obtain ˆβ and ˆη using OLS of Y [1, G 1, A 1H 1, G 2, A 2H 2 ]

15 4 Estimation in Time-Varying Setting 15 Robins Semi-parametric G-Estimator Robins G-Estimator is the solution to these estimating equations: { ( 0 = P n Y A 2 H 2 β 2 b 2 ( S ) ( 2, A 1 ) A 2 p 2 ( S ) 2, A 1 ) 0 H 2 } ( ) ( ) + Y A 2 H 2 β 2 A 1 H 1 β 1 b 1 (S 1 ) A 1 p 1 (S 1 ) H 1 (H 1 ) [ ] H 2 A 2 S 1, A 1 = 1 (H 1 ) = E E b 2 ( S 2, A 1 ) = E [ Y A 2 H 2 β 2 S ] 2, A 1 p 2 ( S 2, A 1 ) = P r [ A 2 = 1 S ] 2, A 1 b 1 (S 1 ) = E [Y A 2 H 2 β 2 A 1 H 1 β 1 S 1 ] p 1 (S 1 ) = P r [A 1 = 1 S 2 ] [ ] H 2 A 2 S 1, A 1 = 0

16 5 Bias-Variance Trade-off 16 5 Bias-Variance Trade-off This discussion assumes true models for the causal effects, the µ t s: Robins G-Estimator is unbiased if either p t or b t are correctly specified. So-called double-robustness property. Robins G-Estimator is semi-parametric efficient if p t, b t, and are all correctly specified. 2-Stage Regression Estimator is unbiased only if the nuisance functions are correctly specified. 2-Stage Regression Estimator with correctly specified nuisance is more efficient than G-Estimator But what happens as we mis-specify the nuisance functions?

17 5 Bias-Variance Trade-off 17 A Simulation Study of the Bias-Variance Trade-off 1. Generate Y ( S 3, Ā3) SNMM; use n = Move away from true model fit in a smooth and principled way 3. Fit 2-Stage Estimator and Robins G-Estimator 4. Repeat N = 1000 times 5. Compare bias and variance of the 2-Stage Estimator versus the G-Estimator as you move away from true fit

18 5 Bias-Variance Trade-off 18 Mis-specifying ɛ t s using S = S N(1, sd = ν) Exploration by Multiplicative Error Method true Mean Squared Difference (MSD) Scaled Root Mean Squared Difference (SRMSD) υ

19 5 Bias-Variance Trade-off 19 Results Note two different fits for Robins G-Estimator: 1. One using bad starting guesses (i.e., setting b t = 0) 2. The other using the results of the 2-Stage Estimator as starting guesses

20 5 Bias-Variance Trade-off RMSD Mean υ a1 a1:s1 2 Stage G Estimator: b_t = 0 G Estimator: 2 Stg Guess a2 a2:s2 a3 a3:s υ RMSD

21 5 Bias-Variance Trade-off RMSD Standard Deviation υ a1 a1:s1 a2 a2:s2 2 Stage G Estimator: b_t = 0 G Estimator: 2 Stg Guess a3 a3:s υ RMSD

22 5 Bias-Variance Trade-off RMSD Relative MSE: G Estimator / 2 Stage υ a1 a1:s1 a2 a2:s2 G Estimator: b_t = 0 G Estimator: 2 Stg Guess a3 a3:s υ RMSD

23 6 Conclusions 23 6 Conclusions 1. The bias-variance trade-off is real and important. 2. The G-Estimator relies on good starting guesses. 3. With bad b t guesses, the G-Estimator never dominates in terms of MSE. 4. With b t guesses from 2-Stage Estimator, the G-Estimator starts to dominate when amount of mis-specifiction is between small and medium. 5. Standard errors are difficult for both estimators. 6. The 2-Stage Estimator is easy to carry out, intuitive, and can stand alone as its own estimator.

24 6 Conclusions 24 Thank you! More Questions?

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work