Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall


Structural Nested Mean Models for Assessing Time-Varying Effect Moderation
Daniel Almirall
Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical
Joint work with Thomas Ten Have (UPenn) & Susan Murphy (UMich)
NC State University, Dept. of Statistics
September 27, 2007

Contents
1 Warm-up: Why condition on pre-treatment variables?
2 Time-Varying Effect Moderation
3 Robins' Structural Nested Mean Model
4 Estimation in Time-Varying Setting
5 Bias-Variance Trade-off
6 Conclusions
7 Extra Slides

1 Warm-up: Why condition on pre-treatment variables?
...in a study of the effects of a treatment A (e.g., an intervention to help reduce depression) on a response Y (e.g., depression)?

We want the effect of A on Y. Why condition on (adjust for) pre-treatment variables S?
1. Confounding: S is correlated with both A and Y. In this case, S is known as a confounder of the effect of A on Y.
2. Precision: S may be a pre-treatment measure of Y, or any other variable highly correlated with Y.
3. Missing Data: The outcome Y is missing for some units, S and A predict missingness, and S is associated with Y.
4. Effect Heterogeneity: S may moderate, temper, or specify the effect of A on Y. In this case, S is known as a moderator of the effect of A on Y. Formalized in the next slide.

Effect Moderation in One Time Point
Definition: Fix $a \neq 0$. Let $\mu(s, a) \equiv E[Y(a) - Y(0) \mid S = s]$. If $\mu(s, a)$ is non-constant in $s$, then S is said to be a moderator of the effect of a on Y.
Example: If the structural model for the conditional mean of $Y(a)$ given $S$ is $E[Y(a) \mid S = s] = \beta_0 + \gamma_1 s + \beta_1 a + \beta_2 s a$, then $\mu(s, a) = \beta_1 a + \beta_2 s a$. In this example, S is a moderator of the effect of a on Y if $\beta_2 \neq 0$.
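As a rough illustration of the one-time-point example, the following Python sketch simulates a study with a randomized binary treatment and fits the interaction model by ordinary least squares. The coefficient values, sample size, and function names are assumptions chosen for illustration; none of them come from the talk.

```python
import numpy as np

def simulate_one_time_point(n, beta0=1.0, gamma1=0.5, beta1=-1.0, beta2=0.8, seed=0):
    """Draw (S, A, Y) with E[Y(a) | S=s] = beta0 + gamma1*s + beta1*a + beta2*s*a."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=n)                          # pre-treatment moderator S
    a = rng.integers(0, 2, size=n).astype(float)    # randomized binary treatment A
    y = beta0 + gamma1 * s + beta1 * a + beta2 * s * a + rng.normal(size=n)
    return s, a, y

def fit_moderation_model(s, a, y):
    """OLS of Y on [1, S, A, S*A]; returns estimates of (beta0, gamma1, beta1, beta2)."""
    X = np.column_stack([np.ones_like(s), s, a, s * a])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mu(s, a, beta1, beta2):
    """Conditional causal effect mu(s, a) = beta1*a + beta2*s*a."""
    return beta1 * a + beta2 * s * a

s, a, y = simulate_one_time_point(n=20000)
b0_hat, g1_hat, b1_hat, b2_hat = fit_moderation_model(s, a, y)
# mu(s, 1) changes with s whenever beta2 != 0, i.e. S moderates the effect of a.
print(mu(np.array([-1.0, 0.0, 1.0]), 1.0, b1_hat, b2_hat))
```

Because treatment is randomized in this toy setup, the regression coefficients on `a` and `s * a` estimate $\beta_1$ and $\beta_2$ directly; with observational data the same fit would also rely on a no-unmeasured-confounding assumption.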

Relevance of Effect Moderation
Two main implications:
- Theoretical Implication: Understanding the heterogeneity of the effects of causes enhances our understanding of various/competing scientific theories, and it may suggest new scientific (etiologic) hypotheses to be tested.
- Practical Implication: Identifying types, or subgroups, of individuals for which treatment is not effective may suggest altering the treatment to suit the needs of that particular type of individual.

2 Time-Varying Effect Moderation
The focus of this paper is time-varying effect moderation. The data structure in the time-varying setting is
$\{S_1, a_1, S_2(a_1), a_2, Y(a_1, a_2)\} = \{\bar a_2, \bar S_2(a_1), Y(\bar a_2)\}$.
Running Example, from the PROSPECT Study:
- $(a_1, a_2)$: Time-varying treatment pattern; $a_t$ is binary (0,1)
- $Y(a_1, a_2)$: Depression at the end of the study; continuous
- $S_1$: Suicidal Ideation at baseline visit; continuous
- $S_2(a_1)$: Suicidal Ideation at second visit; continuous

Formal Definition of Time-Varying Causal Effects
Recall the data structure $\{S_1, a_1, S_2(a_1), a_2, Y(a_1, a_2)\}$.
Conditional Intermediate Causal Effect at t = 1:
$\mu_1(s_1, a_1) \equiv E[Y(a_1, 0) - Y(0, 0) \mid S_1 = s_1]$
Conditional Intermediate Causal Effect at t = 2:
$\mu_2(\bar s_2, \bar a_2) \equiv E[Y(a_1, a_2) - Y(a_1, 0) \mid S_1 = s_1, S_2(a_1) = s_2]$

As a Decomposition of the Marginal Causal Effect
Recall the data structure $\{S_1, a_1, S_2(a_1), a_2, Y(a_1, a_2)\}$.
Consider the following arithmetic decomposition of the causal effect of $(a_1, a_2)$ on Y, using the covariates $\bar S_2(a_1)$:
$$E[Y(a_1, a_2) - Y(0, 0)] = E\big[E[Y(a_1, a_2) - Y(a_1, 0) \mid \bar S_2(a_1)]\big] + E\big[E[Y(a_1, 0) - Y(0, 0) \mid S_1]\big].$$
The inner expectations represent the conditional intermediate causal effects $\mu_2$ and $\mu_1$, respectively.
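The decomposition follows from adding and subtracting $Y(a_1, 0)$ and then applying iterated expectations. The steps are spelled out here as a worked derivation that uses only the quantities defined on the slides:

```latex
\begin{align*}
Y(a_1,a_2) - Y(0,0)
  &= \{Y(a_1,a_2) - Y(a_1,0)\} + \{Y(a_1,0) - Y(0,0)\} \\
E[Y(a_1,a_2) - Y(0,0)]
  &= E[Y(a_1,a_2) - Y(a_1,0)] + E[Y(a_1,0) - Y(0,0)] \\
  &= E\big[E[Y(a_1,a_2) - Y(a_1,0) \mid \bar S_2(a_1)]\big]
   + E\big[E[Y(a_1,0) - Y(0,0) \mid S_1]\big] \\
  &= E\big[\mu_2(\bar S_2(a_1), \bar a_2)\big] + E\big[\mu_1(S_1, a_1)\big]
\end{align*}
```

The third line conditions each difference on the history available just before the corresponding treatment decision, which is what makes $\mu_1$ and $\mu_2$ the natural building blocks.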

3 Robins' Structural Nested Mean Model
Recall the data structure $\{S_1, a_1, S_2(a_1), a_2, Y(a_1, a_2)\}$.
The SNMM for the conditional mean of $Y(a_1, a_2)$ given $\bar S_2(a_1)$ is:
$$\begin{aligned}
E[Y(a_1, a_2) \mid \bar S_2(a_1)] = {} & E[Y(0, 0)] + \{E[Y(0, 0) \mid S_1] - E[Y(0, 0)]\} \\
& + \{E[Y(a_1, 0) - Y(0, 0) \mid S_1]\} \\
& + \{E[Y(a_1, 0) \mid \bar S_2(a_1)] - E[Y(a_1, 0) \mid S_1]\} \\
& + \{E[Y(a_1, a_2) - Y(a_1, 0) \mid \bar S_2(a_1)]\}
\end{aligned}$$

The Causal versus Nuisance Functionals
Recall the data structure $\{S_1, a_1, S_2(a_1), a_2, Y(a_1, a_2)\}$.
The decomposition can be written more succinctly as:
$$E[Y(a_1, a_2) \mid \bar S_2(a_1) = \bar s_2] = \mu_0 + \epsilon_1(s_1) + \mu_1(s_1, a_1) + \epsilon_2(\bar s_2, a_1) + \mu_2(\bar s_2, \bar a_2),$$
where $\mu_0 = E[Y(0, 0)]$,
$\epsilon_2(\bar s_2, a_1) = E[Y(a_1, 0) \mid \bar S_2(a_1) = \bar s_2] - E[Y(a_1, 0) \mid S_1 = s_1]$,
$\epsilon_1(s_1) = E[Y(0, 0) \mid S_1 = s_1] - E[Y(0, 0)]$,
$\mu_2(\bar s_2, a_1, 0) = 0$ and $\mu_1(s_1, 0) = 0$,
$E_{S_2 \mid S_1}[\epsilon_2(\bar s_2, a_1) \mid S_1 = s_1] = 0$, and $E_{S_1}[\epsilon_1(s_1)] = 0$.

Parametrizing the Intermediate Causal Effects
Recall the data structure $\{S_1, a_1, S_2(a_1), a_2, Y(a_1, a_2)\}$.
Conditional Intermediate Causal Effect at t = 1:
$\mu_1(s_1, a_1) \equiv E[Y(a_1, 0) - Y(0, 0) \mid S_1 = s_1]$
$= a_1 H_1 \beta_1$ (linear parameterization)
$= a_1 (\beta_{11} + \beta_{12} f_1(s_1))$, say (sample)
Conditional Intermediate Causal Effect at t = 2:
$\mu_2(\bar s_2, \bar a_2) \equiv E[Y(a_1, a_2) - Y(a_1, 0) \mid S_1 = s_1, S_2(a_1) = s_2]$
$= a_2 H_2 \beta_2$ (linear parameterization)
$= a_2 (\beta_{21} + \beta_{22} f_2(s_1, s_2) + \beta_{23} a_1)$, say (sample)
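To make the linear parameterization concrete, here is a small Python sketch of the sample forms of $\mu_1$ and $\mu_2$. The choices $f_1(s_1) = s_1$ and $f_2(s_1, s_2) = (s_1 + s_2)/2$, the function names, and the numeric coefficients are assumptions made for illustration.

```python
import numpy as np

def h1(s1):
    """H_1 = [1, f_1(s_1)], with the assumed choice f_1(s_1) = s_1."""
    return np.array([1.0, s1])

def h2(s1, s2, a1):
    """H_2 = [1, f_2(s_1, s_2), a_1], with the assumed choice f_2 = (s_1 + s_2) / 2."""
    return np.array([1.0, (s1 + s2) / 2.0, a1])

def mu1(s1, a1, beta1):
    """mu_1(s_1, a_1) = a_1 * H_1 beta_1; zero whenever a_1 = 0."""
    return a1 * (h1(s1) @ np.asarray(beta1, dtype=float))

def mu2(s1, s2, a1, a2, beta2):
    """mu_2(s_1, s_2, a_1, a_2) = a_2 * H_2 beta_2; zero whenever a_2 = 0."""
    return a2 * (h2(s1, s2, a1) @ np.asarray(beta2, dtype=float))

# Hypothetical coefficient values, for illustration only.
print(mu1(2.0, 1, beta1=[-1.0, 0.5]))
print(mu2(2.0, 4.0, 1, 1, beta2=[-2.0, 0.5, 0.25]))
```

Multiplying by $a_t$ builds in the constraints $\mu_1(s_1, 0) = 0$ and $\mu_2(\bar s_2, a_1, 0) = 0$ automatically.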

4 Estimation in Time-Varying Setting
Assume:
- True models for the $\mu_t$'s
- Consistency Assumptions
- Sequential Ignorability (given $\bar S_2(a_1)$) Assumption
In the sequel, we consider three estimators for $\beta$:
1. Traditional (Naive) Regression Estimator
2. Proposed 2-Stage Regression Estimator
3. Robins' Semi-parametric G-Estimator (two flavors of the Semi-parametric G-Estimator)

What's wrong with the Traditional Estimator?
An Example of the Traditional Estimator: Apply OLS with
$$E(Y \mid \bar S_2 = \bar s_2, \bar A_2 = \bar a_2) = \beta_0 + \eta_1 s_1 + a_1(\beta_1 + \beta_2 s_1) + \eta_2 s_2 + a_2(\beta_3 + \beta_4 (s_1 + s_2)/2)$$
- Possibly incorrectly specified nuisance functions.
- Two problems arise when using the traditional regression estimator.
- These problems occur even in the absence of time-varying confounders (that is, even under Sequential Randomization).

First problem with the Traditional Estimator
$$E(Y \mid \bar S_2 = \bar s_2, \bar A_2 = \bar a_2) = \beta_0 + \eta_1 s_1 + a_1(\beta_1 + \beta_2 s_1) + \eta_2 s_2 + a_2(\beta_3 + \beta_4 (s_1 + s_2)/2)$$
Wrong Effect
[Diagram: timeline over Baseline, 4-month Visit, and 8-month Visit with nodes $S_1$, $a_1$, $S_2(a_1)$, and $Y(a_1, 0)$; set $a_2 = 0$.]
What about the effect transmitted through $S_2(a_1)$?
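The "wrong effect" problem can be seen in a small simulation. The sketch below is an illustration under assumed values, not the talk's own example: treatment at time 1 lowers $S_2$, $S_2$ raises $Y$, both treatments are randomized, and the traditional regression is simplified to main effects only.

```python
import numpy as np

def simulate(n, seed=0):
    """Sequentially randomized toy data in which A1 affects Y partly through S2."""
    rng = np.random.default_rng(seed)
    s1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n).astype(float)
    s2 = 0.5 * s1 - 1.0 * a1 + rng.normal(size=n)    # treatment at time 1 lowers S2
    a2 = rng.integers(0, 2, size=n).astype(float)
    y = 1.0 + s1 + 0.5 * s2 - 1.0 * a1 - 0.5 * a2 + rng.normal(size=n)
    return s1, a1, s2, a2, y

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

s1, a1, s2, a2, y = simulate(50000)
ones = np.ones_like(s1)

# Traditional regression: adjust for S1 and S2 directly; index 2 is the A1 coefficient.
naive_a1 = ols(np.column_stack([ones, s1, a1, s2, a2]), y)[2]

# In this toy model mu_1(s1, 1) = direct effect (-1.0) + effect through S2 (0.5 * -1.0).
true_mu1 = -1.0 + 0.5 * (-1.0)
print("naive A1 coefficient:", naive_a1, "true mu_1(s1, 1):", true_mu1)
```

In this setup the coefficient on `a1` recovers only the part of the effect that does not pass through $S_2(a_1)$, so it misses the portion of $\mu_1$ transmitted through the intermediate covariate.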

Second problem with the Traditional Estimator
$$E(Y \mid \bar S_2 = \bar s_2, \bar A_2 = \bar a_2) = \beta_0 + \eta_1 s_1 + a_1(\beta_1 + \beta_2 s_1) + \eta_2 s_2 + a_2(\beta_3 + \beta_4 (s_1 + s_2)/2)$$
Spurious Effect
[Diagram: timeline over Baseline, 4-month Visit, and 8-month Visit with nodes $V_0$, $S_1$, $a_1$, $S_2(a_1)$, and $Y(a_1, 0)$; set $a_2 = 0$.]
Berkson's paradox; Judea Pearl's backdoor criterion
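The "spurious effect" problem can also be sketched numerically. The simulation below assumes that $V_0$ is an unmeasured common cause of $S_2$ and $Y$ and that $a_1$ has no effect on $Y$ at all; these are assumptions made here for illustration, as are all numeric values.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 50000
v0 = rng.normal(size=n)                           # assumed unmeasured cause of S2 and Y
s1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n).astype(float)     # randomized treatment at time 1
s2 = 0.5 * s1 - 1.0 * a1 + v0 + rng.normal(size=n)
y = 1.0 + s1 + v0 + rng.normal(size=n)            # a1 does not enter Y in any way

ones = np.ones(n)
# Conditioning on S2 (a common effect of a1 and V0) induces an a1-Y association.
adjusted_a1 = ols(np.column_stack([ones, s1, a1, s2]), y)[2]
# Leaving S2 out of the regression keeps the a1 coefficient near zero.
unadjusted_a1 = ols(np.column_stack([ones, s1, a1]), y)[2]
print("with S2:", adjusted_a1, "without S2:", unadjusted_a1)
```

Here the true effect of $a_1$ is zero, yet adjusting for $S_2$ produces a clearly non-zero coefficient, which is the collider-type bias that the Berkson's paradox reference points to.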

Proposed 2-Stage Estimator Approach
Recall the nuisance function constraints: $E_{S_2 \mid S_1}[\epsilon_2(\bar s_2, a_1) \mid S_1 = s_1] = 0$, and $E_{S_1}[\epsilon_1(s_1)] = 0$.
General steps for the 2-Stage Approach:
1. Model the distributions $f_1(S_1; \gamma_1)$ and $f_2(S_2 \mid S_1, A_1; \gamma_2)$
2. Using $\eta$ for added flexibility, construct nuisance models $\epsilon_1(s_1; \gamma_1, \eta_1)$ and $\epsilon_2(\bar s_2, a_1; \gamma_2, \eta_2)$ that satisfy the constraints
3. Then solve for $\beta$ in the following OLS estimating equations:
$$0 = P_n \left\{ \Big( Y - \mu_0 - \sum_{t=1}^{2} A_t H_t \beta_t - \sum_{t=1}^{2} \epsilon_t(\bar s_t, \bar a_{t-1}; \gamma_t, \eta_t) \Big) \begin{pmatrix} A_1 H_1 \\ A_2 H_2 \end{pmatrix} \right\}$$

Semi-parametric G-Estimator Approach
The G-Estimator is the solution to these estimating equations:
$$0 = P_n \left\{ \big(Y - A_2 H_2 \beta_2 - b_2(\bar S_2, A_1)\big)\big(A_2 - p_2(\bar S_2, A_1)\big) \begin{pmatrix} 0 \\ H_2 \end{pmatrix} + \big(Y - A_2 H_2 \beta_2 - A_1 H_1 \beta_1 - b_1(S_1)\big)\big(A_1 - p_1(S_1)\big) \begin{pmatrix} H_1 \\ \Delta(H_1) \end{pmatrix} \right\}$$
where
$\Delta(H_1) = E[H_2 A_2 \mid S_1, A_1 = 1] - E[H_2 A_2 \mid S_1, A_1 = 0]$,
$b_2(\bar S_2, A_1) = E[Y - A_2 H_2 \beta_2 \mid \bar S_2, A_1]$,
$p_2(\bar S_2, A_1) = \Pr[A_2 = 1 \mid \bar S_2, A_1]$,
$b_1(S_1) = E[Y - A_2 H_2 \beta_2 - A_1 H_1 \beta_1 \mid S_1]$,
$p_1(S_1) = \Pr[A_1 = 1 \mid S_1]$.
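A simplified version of the G-Estimator can be sketched in Python. This is not the efficient estimator on the slide: it assumes known randomization probabilities $p_1 = p_2 = 0.5$, takes $b_t = 0$, drops the extra term attached to $H_1$ in the second estimating function, and solves the two stages sequentially (time 2 first, then time 1). The data-generating model, its coefficients, and all function names are hypothetical.

```python
import numpy as np

def simulate(n, seed=0):
    """Toy SNMM data with true beta_1 = (-1.5, 0.3) and beta_2 = (-0.5, 0.2)."""
    rng = np.random.default_rng(seed)
    s1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n).astype(float)     # P(A1 = 1) = 0.5
    s2 = 0.5 * s1 - 1.0 * a1 + rng.normal(size=n)
    a2 = rng.integers(0, 2, size=n).astype(float)     # P(A2 = 1) = 0.5
    y = (1.0 + s1 + 0.5 * s2
         + a1 * (-1.0 + 0.3 * s1)
         + a2 * (-0.5 + 0.2 * (s1 + s2) / 2)
         + rng.normal(size=n))
    return s1, a1, s2, a2, y

def solve_stage(H, a, p, outcome):
    """Solve sum_i (a_i - p) H_i^T (outcome_i - a_i H_i beta) = 0 for beta."""
    w = a - p
    lhs = (H * (w * a)[:, None]).T @ H
    rhs = (H * w[:, None]).T @ outcome
    return np.linalg.solve(lhs, rhs)

def g_estimate(s1, a1, s2, a2, y, p1=0.5, p2=0.5):
    """Sequential g-estimation with b_t = 0 and known treatment probabilities."""
    H1 = np.column_stack([np.ones_like(s1), s1])
    H2 = np.column_stack([np.ones_like(s1), (s1 + s2) / 2])
    beta2 = solve_stage(H2, a2, p2, y)
    y_minus_mu2 = y - a2 * (H2 @ beta2)               # remove the time-2 effect
    beta1 = solve_stage(H1, a1, p1, y_minus_mu2)
    return beta1, beta2

beta1_hat, beta2_hat = g_estimate(*simulate(100000))
print(beta1_hat, beta2_hat)
```

In this toy model the time-1 intercept is $-1.5$ rather than $-1.0$ because $\mu_1$ includes the part of the effect of $a_1$ that is transmitted through $S_2(a_1)$.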

5 Bias-Variance Trade-off
This discussion assumes true models for the causal effects, the $\mu_t$'s:
- Robins' G-Estimator is unbiased if either $p_t$ or $b_t$ is correctly specified (the so-called double-robustness property).
- Robins' G-Estimator is semi-parametric efficient if $p_t$, $b_t$, and $\Delta$ (the contrast in $E[H_2 A_2 \mid S_1, A_1]$ between $A_1 = 1$ and $A_1 = 0$) are all correctly specified.
- The 2-Stage Regression Estimator is unbiased only if the nuisance functions, the $\epsilon_t$'s, are correctly specified.
- The 2-Stage Regression Estimator with correctly specified nuisance functions is more efficient than the G-Estimator.
What happens as we mis-specify the nuisance functions, the $\epsilon_t$'s?

A Simulation Study of the Bias-Variance Trade-off
1. Generate $Y(\bar S_3, \bar A_3)$ using a SNMM; use n = 300, K = 3
2. Fit the 2-Stage Estimator and two versions of the G-Estimator:
   - Linear Version of 2-Stage Estimator
   - Robins' G-Estimator with $b_t = 0$
   - Robins' G-Estimator using 2-Stage Guesses for $b_t$
3. Repeat N = 1000 times
4. Repeat for different $\epsilon_t$-mis-specified 2-Stage fits
5. Based on MSE of estimates of $\beta$, compare the bias-variance trade-off of the 2-Stage Estimator versus the G-Estimator as you move away from the true fit
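The simulation design can be sketched in Python in a reduced form. The sketch below is only a skeleton under stated assumptions: it uses K = 2 instead of K = 3, a hypothetical data-generating model, a correctly specified 2-Stage fit only (step 4 is not covered), and a simplified sequential G-Estimator with $b_t = 0$ and known randomization probabilities of 0.5.

```python
import numpy as np

TRUE_BETA = np.array([-1.5, 0.3, -0.5, 0.2])   # (beta10, beta11, beta20, beta21)

def simulate(n, rng):
    s1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n).astype(float)
    s2 = 0.5 * s1 - 1.0 * a1 + rng.normal(size=n)
    a2 = rng.integers(0, 2, size=n).astype(float)
    y = (1.0 + s1 + 0.5 * s2 + a1 * (-1.0 + 0.3 * s1)
         + a2 * (-0.5 + 0.2 * (s1 + s2) / 2) + rng.normal(size=n))
    return s1, a1, s2, a2, y

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage(s1, a1, s2, a2, y):
    """Linear 2-Stage fit: residualize S1 and S2, then one OLS for all parameters."""
    ones = np.ones_like(s1)
    Z2 = np.column_stack([ones, s1, a1])
    d1 = s1 - s1.mean()
    d2 = s2 - Z2 @ ols(Z2, s2)
    X = np.column_stack([ones, d1, a1, a1 * s1, d2, a2, a2 * (s1 + s2) / 2])
    coef = ols(X, y)
    return np.concatenate([coef[2:4], coef[5:7]])

def solve_stage(H, a, outcome, p=0.5):
    w = a - p
    return np.linalg.solve((H * (w * a)[:, None]).T @ H, (H * w[:, None]).T @ outcome)

def g_estimate_b0(s1, a1, s2, a2, y):
    """Sequential G-Estimator with b_t = 0 and known p_t = 0.5."""
    H1 = np.column_stack([np.ones_like(s1), s1])
    H2 = np.column_stack([np.ones_like(s1), (s1 + s2) / 2])
    beta2 = solve_stage(H2, a2, y)
    beta1 = solve_stage(H1, a1, y - a2 * (H2 @ beta2))
    return np.concatenate([beta1, beta2])

def relative_mse(n=300, n_rep=1000, seed=0):
    """MSE of the G-Estimator divided by MSE of the 2-Stage Estimator, per parameter."""
    rng = np.random.default_rng(seed)
    err_2s = np.empty((n_rep, 4))
    err_g = np.empty((n_rep, 4))
    for r in range(n_rep):
        data = simulate(n, rng)
        err_2s[r] = two_stage(*data) - TRUE_BETA
        err_g[r] = g_estimate_b0(*data) - TRUE_BETA
    return (err_g ** 2).mean(axis=0) / (err_2s ** 2).mean(axis=0)

rel_mse = relative_mse()
print(rel_mse)
```

Extending this skeleton toward the design on the slide would mean adding a third time point, the G-Estimator variant that uses 2-Stage guesses for $b_t$, and a family of deliberately mis-specified nuisance models.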

Specifying/Measuring Mis-specification of $\epsilon_t$
How do we measure the distance of the fitted $\epsilon_t(\nu)$ from the true nuisance function $\epsilon_t^{TRUE}$?
$$\mathrm{SRMSD}(\nu) = \sqrt{\frac{\sum_{t=1}^{K} E\big(\epsilon_t^{TRUE} - \epsilon_t(\nu)\big)^2}{\mathrm{Var}(Y)}}$$
Note that this has an effect-size-like interpretation. Thus, SRMSD = 0.5 is about a moderate amount of mis-specification.
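A possible way to compute this distance from simulated values is sketched below. It reads SRMSD as a root of summed mean squared differences scaled by Var(Y); the exact placement of the square root is an assumption, as are the function name and the array layout (one row per time point).

```python
import numpy as np

def srmsd(eps_true, eps_fit, y):
    """sqrt( sum_t mean((eps_true_t - eps_fit_t)^2) / Var(Y) ); rows index t = 1..K."""
    eps_true = np.asarray(eps_true, dtype=float)
    eps_fit = np.asarray(eps_fit, dtype=float)
    # Monte Carlo version of E(eps_true_t - eps_fit_t)^2, one value per time point t.
    msd_per_t = ((eps_true - eps_fit) ** 2).mean(axis=1)
    return float(np.sqrt(msd_per_t.sum() / np.var(y)))

# Toy inputs: K = 2 time points, 4 units, fitted nuisance off by 1 everywhere.
eps_true = np.zeros((2, 4))
eps_fit = np.ones((2, 4))
y = np.array([0.0, 2.0, 0.0, 2.0])
print(srmsd(eps_true, eps_fit, y))
```

Dividing by Var(Y) puts the discrepancy on the scale of the outcome's variability, which is what gives the measure its effect-size-like reading.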

Simulation Experiment Results
[Figure: relative MSE (G-Estimator to 2-Stage Estimator) plotted against SRMSD, one panel per parameter ($\beta_{10}$, $\beta_{11}$, $\beta_{20}$, $\beta_{21}$, $\beta_{30}$, $\beta_{31}$); curves for the G-Estimator with b = 0 and the G-Estimator with b = 2-Stage guess.]

6 Conclusions
1. Offered a way to think about time-varying effect moderation.
2. The bias-variance trade-off is real and important.
3. The G-Estimator starts to dominate (in terms of MSE) when the amount of $\epsilon_t$-mis-specification is moderate.
4. The 2-Stage Estimator provides a principled method by which to construct high-quality $b_t$ guesses for the G-Estimator.
5. The 2-Stage Estimator is intuitive, and can stand alone as its own estimator.
6. The linear implementation of the 2-Stage Estimator is easy to carry out, but standard errors are difficult. (Bootstrap SEs?)
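One way the bootstrap question in the last item could be explored is a nonparametric bootstrap that redoes both stages on every resample, so that the uncertainty from estimating the $\gamma$'s is carried through. The sketch below is only an illustration: the data-generating model, K = 2, the function names, and the number of resamples are all assumptions.

```python
import numpy as np

def simulate(n, rng):
    s1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n).astype(float)
    s2 = 0.5 * s1 - 1.0 * a1 + rng.normal(size=n)
    a2 = rng.integers(0, 2, size=n).astype(float)
    y = (1.0 + s1 + 0.5 * s2 + a1 * (-1.0 + 0.3 * s1)
         + a2 * (-0.5 + 0.2 * (s1 + s2) / 2) + rng.normal(size=n))
    return s1, a1, s2, a2, y

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_beta(s1, a1, s2, a2, y):
    """Linear 2-Stage fit; returns (beta10, beta11, beta20, beta21)."""
    ones = np.ones_like(s1)
    Z2 = np.column_stack([ones, s1, a1])
    d1 = s1 - s1.mean()                    # stage 1: residual of S1
    d2 = s2 - Z2 @ ols(Z2, s2)             # stage 1: residual of S2 given (S1, A1)
    X = np.column_stack([ones, d1, a1, a1 * s1, d2, a2, a2 * (s1 + s2) / 2])
    coef = ols(X, y)                       # stage 2: one OLS for all parameters
    return np.concatenate([coef[2:4], coef[5:7]])

def bootstrap_se(data, n_boot=200, seed=1):
    """Resample units with replacement and refit both stages each time."""
    rng = np.random.default_rng(seed)
    n = len(data[0])
    draws = np.empty((n_boot, 4))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = two_stage_beta(*(x[idx] for x in data))
    return draws.std(axis=0, ddof=1)

data = simulate(2000, np.random.default_rng(0))
se = bootstrap_se(data)
print(two_stage_beta(*data), se)
```

Refitting the first-stage models inside the loop is the key design choice here; reusing the original residuals would ignore part of the sampling variability.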

Main References
Robins JM (1994). Correcting for Non-compliance in Randomized Trials Using Structural Nested Mean Models. Communications in Statistics, Theory and Methods, 23.
Almirall D, Ten Have T, Murphy SA. Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Submitted.

Thank you! More Questions?

7 Extra Slides

Linear Implementation of the 2-Stage Approach

Linear Parametrization of the Nuisance Functions
So we must parameterize the nuisance functions correctly. Recall the constraints on the nuisance functions:
$\epsilon_2(\bar s_2, a_1) = E[Y(a_1, 0) \mid \bar S_2(a_1) = \bar s_2] - E[Y(a_1, 0) \mid S_1 = s_1]$,
$\epsilon_1(s_1) = E[Y(0, 0) \mid S_1 = s_1] - E[Y(0, 0)]$,
$E_{S_2 \mid S_1}[\epsilon_2(\bar s_2, a_1) \mid S_1 = s_1] = 0$, and $E_{S_1}[\epsilon_1(s_1)] = 0$.
Example parameterizations for the nuisance functions:
$\epsilon_1(s_1) = \eta_{1,1}\big(s_1 - E(S_1)\big)$, say
$\epsilon_2(\bar s_2, a_1) = (\eta_{2,1} + \eta_{2,2} s_1)\big(s_2 - E(S_2(a_1) \mid S_1 = s_1)\big)$, say

Steps for Linear Implementation
Recall that
$$E[Y(a_1, a_2) \mid \bar S_2(a_1) = \bar s_2] = \mu_2(\bar s_2, \bar a_2; \beta_2) + \epsilon_2(\bar s_2, a_1; \eta_2, \gamma_2) + \mu_1(s_1, a_1; \beta_1) + \epsilon_1(s_1; \eta_1, \gamma_1) + \mu_0$$
1. We have models for the $\mu$'s: $A_1 H_1 \beta_1$ and $A_2 H_2 \beta_2$; set aside
2. Model $m_1(\gamma_1) = E(S_1)$, estimate $\gamma_1$ with GLM; model $m_2(s_1, a_1; \gamma_2) = E(S_2(a_1) \mid S_1 = s_1)$, estimate $\gamma_2$ with GLM
3. Construct residuals $\hat\delta_1 = s_1 - \hat m_1(\hat\gamma_1)$ and $\hat\delta_2 = s_2 - \hat m_2(s_1, a_1; \hat\gamma_2)$
4. Construct models for the $\epsilon$'s: $G_1 \hat\delta_1 \eta_1 = G_1^{*} \eta_1$ and $G_2 \hat\delta_2 \eta_2 = G_2^{*} \eta_2$, where $G_t^{*} = G_t \hat\delta_t$
5. Obtain $\hat\beta$ and $\hat\eta$ using OLS of $Y$ on $[1, G_1^{*}, A_1 H_1, G_2^{*}, A_2 H_2]$
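The five steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the data come from a hypothetical sequentially randomized model with known true values, both first-stage models are fit by ordinary least squares (a Gaussian GLM with identity link), and the simplest choice $G_1 = G_2 = 1$ is used for the nuisance models.

```python
import numpy as np

def simulate(n, seed=0):
    """Toy SNMM data: true beta_1 = (-1.5, 0.3), beta_2 = (-0.5, 0.2), eta_2 = 0.5."""
    rng = np.random.default_rng(seed)
    s1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n).astype(float)
    s2 = 0.5 * s1 - 1.0 * a1 + rng.normal(size=n)
    a2 = rng.integers(0, 2, size=n).astype(float)
    y = (1.0 + s1 + 0.5 * s2
         + a1 * (-1.0 + 0.3 * s1)
         + a2 * (-0.5 + 0.2 * (s1 + s2) / 2)
         + rng.normal(size=n))
    return s1, a1, s2, a2, y

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage(s1, a1, s2, a2, y):
    ones = np.ones_like(s1)
    # Step 1: causal models A1*H1*beta1 and A2*H2*beta2, with
    # H1 = [1, s1] and H2 = [1, (s1 + s2)/2] (assumed forms).
    # Step 2: model m1 = E(S1) and m2 = E(S2 | S1, A1).
    m1_hat = s1.mean()
    Z2 = np.column_stack([ones, s1, a1])
    m2_hat = Z2 @ ols(Z2, s2)
    # Step 3: residuals.
    d1 = s1 - m1_hat
    d2 = s2 - m2_hat
    # Step 4: nuisance models eta1*d1 and eta2*d2 (G1 = G2 = 1).
    # Step 5: a single OLS fit for mu0, eta, and beta.
    X = np.column_stack([ones, d1, a1, a1 * s1, d2, a2, a2 * (s1 + s2) / 2])
    coef = ols(X, y)
    return {"mu0": coef[0], "eta1": coef[1], "beta1": coef[2:4],
            "eta2": coef[4], "beta2": coef[5:7]}

fit = two_stage(*simulate(50000))
print(fit["beta1"], fit["beta2"])
```

Because the residuals average to zero given the past, the nuisance terms built from them satisfy the constraints on $\epsilon_1$ and $\epsilon_2$ by construction, which is what lets the coefficients on $A_1 H_1$ and $A_2 H_2$ be read as the causal parameters.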

An Illustrative Data Analysis

The Data
- n = 277 geriatric primary care patients from the PROSPECT Study
- K = 3: visits to clinic at baseline, 4, 8, and 12 months
- Data structure is $\{S_1, A_1, S_2, A_2, S_3, A_3, Y\}$
- $S_t$ = suicidal ideation, $A_t$ = adherence, $Y$ = 12-month depression
- Treatment is defined as ever meeting with a health specialist
- Monotonic treatment pattern: (0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)
Assumptions:
- Consistency/SUTVA
- Sequential Ignorability given $\bar S_3$ (very likely violated)
- Modeling assumptions

Models Used in the Illustrative Analysis
Causal effects: (expect $\beta_{t0} < 0$ and $\beta_{t1} > 0$)
1. $\mu_1(S_1, a_1) = a_1(\beta_{10} + \beta_{11} S_1)$,
2. $\mu_2(\bar S_2, \bar a_2) = a_2(\beta_{20} + \beta_{21}(S_1 + S_2)/2)$, and
3. $\mu_3(\bar S_3, \bar a_3) = a_3(\beta_{30} + \beta_{31}(S_1 + S_2 + S_3)/3)$.
Nuisance functions:
1. $\epsilon_1(S_1) = \eta_{1,1}(S_1 - \gamma_{1,1})$
2. $\epsilon_2(\bar S_2, A_1) = \eta_{2,1}\big(S_2 - (\gamma_{2,1} + \gamma_{2,2} S_1 + \gamma_{2,3} A_1 + \gamma_{2,4} S_1 A_1 + \gamma_{2,5} S_1^2)\big)$
3. $\epsilon_3(\bar S_3, \bar A_2) = (\eta_{3,1} + \eta_{3,2} S_2)\big(S_3 - (\gamma_{3,1} + \gamma_{3,2} S_1 + \gamma_{3,3} A_1 + \gamma_{3,4} S_1 A_1 + \gamma_{3,5} S_2 + \gamma_{3,6} S_1^2 + \gamma_{3,7} S_2^2)\big)$.

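Results
[Table: estimate $\hat\beta$, standard error $\widehat{SE}(\hat\beta)$, and $\Pr(>|z|)$ for each parameter under the 2-Stage Estimator and Robins' G-Estimator. Rows: the intercept; the nuisance parameters ($\eta$) for $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$; and the causal parameters ($\beta$) for $\mu_1$, $\mu_2$, and $\mu_3$. Reported p-values below 0.01 appear for the intercept and for both $\mu_3$ parameters.]
G-Estimation does not produce estimates for the nuisance functions.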
