Mediation Analysis for Health Disparities Research

1 Mediation Analysis for Health Disparities Research
Ashley I Naimi, PhD
Oct 27
www.ashleyisaacnaimi.com
ashleynaimi@pitt.edu

3 Orientation
24 Numbered Equations
Slides at: www.ashleyisaacnaimi.com/slides
Manuscript at: www.ashleyisaacnaimi.com/papers and www.aje.oxfordjournals.org

4 Outline
Sections: Background, CDE, CDM, Data, Analysis & Results, Implications, Conclusions
1 Background
2 Controlled Direct Effects (CDE)
3 Counterfactual Disparity Measures (CDM)
4 Analysis & Results
6 Implications
7 Conclusions

7 Cause-Effect Relations in Epidemiology
Often articulated informally: What is the effect of smoking on cardiovascular disease risk, irrespective of smoking's effect on body weight?
To answer this question: data → computer → mathematical operations → number (the answer)
Computer calculations are based on rigorously defined mathematical objects.
English-language sentences are often ambiguous.
Robins JM (1987) Comput Math Applic

8 Cause-Effect Relations in Epidemiology
Often articulated informally: What is the effect of smoking on cardiovascular disease risk, irrespective of smoking's effect on body weight?
Causal inference is about reducing this ambiguity.

14 Cause-Effect Relations in Social Epidemiology
Ambiguous causal effects are more problematic when studying social determinants of health:
- Educational level
- Income & Wealth
- Neighborhood
- Occupational Status
- Socioeconomic Position
- Race/Ethnicity
What is the effect of race on infant mortality irrespective of race's effect on breastfeeding?
What is the effect of race? What is race?

18 Cause-Effect Relations in Social Epidemiology
If everyone were non-Hispanic Black versus if everyone were non-Hispanic White
"Such counterfactual statements generally do not strike us as particularly sensible" (VanderWeele (2015) Explanation in Causal Inference)
This does not imply that race is not fundamentally causal.
Self-reported race is not counterfactually causal.

19 Cause-Effect Relations in Social Epidemiology
One solution: treat this as a mediation analysis problem
- Race quantifies the disparity (non-causal exposure)
- Separate causal variables explain the disparity (causal mediator)
Examples are numerous.

21 Cause-Effect Relations in Social Epidemiology
- Does serum potassium explain the racial disparity in incident diabetes risk? Chatterjee et al (2011)
- Does cancer stage at diagnosis explain the socioeconomic disparity in mortality? Ibfelt et al (2013)
- Does tobacco consumption explain the neighborhood disparity in lung cancer incidence? Hystad et al (2013)
In all instances, the question is: what would the disparity be if M were set to some specific level?

23 Remainder of This Talk
A review/explanation of six methods:
- Difference & Product Methods
- Inverse Probability Weighted MSMs
- Structural Transformation Method
- G Estimation of a SNMM
- Targeted Minimum Loss-Based Estimation (TMLE)
An illustration of the major challenge that arises
An explanation of double-robustness
Technical details (manuscript)
Example data (manuscript)
Annotated SAS code (manuscript)

24 Controlled Direct Effects Counterfactual Disparity Measures

26 Controlled Direct Effect
Questions about mediation are often answered by quantifying controlled direct effects:
$\mathrm{CDE}(m) = E[Y(x, m) - Y(x^{*}, m)]$ (1)
This contrasts the Y that would be observed if X were set to x and M were set to m versus what would be observed if X were set to x* and M were set to m.

28 Controlled Direct Effect
[Figure: causal diagrams (a) and (b) relating X, M, and Y, with confounders C_XY and C_MY; diagram (b) also includes U]
Assumptions:
1 No uncontrolled X-Y confounding
2 No uncontrolled M-Y confounding
3 No M-Y confounders affected by X
4 No X-M interaction

29 Controlled Direct Effect
Assumptions:
1 No uncontrolled X-Y confounding
2 No uncontrolled M-Y confounding
3 No M-Y confounders affected by X
4 No X-M interaction
[Table: assumptions required by each method; methods: Difference, Generalized Product, IPW MSM, Structural Transformation, G Estimation of SNMM, TMLE]

33 Counterfactual Disparity Measures
In all instances, the question is: what would the magnitude of the disparity be if M were set to some specific level?
These are questions about counterfactual disparity measures:
$\mathrm{CDM}(m) = E[Y(m) \mid X = x] - E[Y(m) \mid X = x^{*}]$ (2)
That is, the Y that would be observed if M were set to m, among those with X = x versus those with X = x*.
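To make equation (2) concrete, each of its two terms can be tied to observed data by standardization. The following is only a sketch under stated assumptions that are not spelled out on this slide: C denotes a discrete set of M-Y confounders, there is no uncontrolled M-Y confounding given X and C, consistency holds, and every level of M occurs within each stratum of X and C.

```latex
% Sketch: identification of one term of the CDM under the assumptions named above.
% C is an illustrative symbol for the M-Y confounders.
\begin{align*}
E[Y(m) \mid X = x]
  &= \sum_{c} E[Y(m) \mid X = x, C = c]\, P(C = c \mid X = x) \\
  &= \sum_{c} E[Y \mid X = x, M = m, C = c]\, P(C = c \mid X = x)
\end{align*}
```

The second line averages the stratum-specific outcome means over the distribution of C within X = x rather than holding C fixed, which is why the CDM can differ from a regression coefficient that conditions on C.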

34 Counterfactual Disparity Measure
[Figure: causal diagrams (a) and (b) relating X, M, and Y, with confounders C_XY and C_MY; diagram (b) also includes U]
Assumptions:
1 No uncontrolled X-Y confounding
2 No uncontrolled M-Y confounding
3 No M-Y confounders associated with X
4 No X-M interaction

35 Counterfactual Disparity Measures
Assumptions:
1 No uncontrolled X-Y confounding
2 No uncontrolled M-Y confounding
3 No M-Y confounders affected by X
4 No X-M interaction
[Table: assumptions required by each method; methods: Difference, Generalized Product, IPW MSM, Structural Transformation, G Estimation of SNMM, TMLE]

36 Data

37 Penn Moms Study
We estimated the magnitude of the racial disparity in infant mortality that would remain if every woman breastfed her infant prior to discharge from the place of birth.
Data: 900,726 live-born singleton births from Pennsylvania, 2003 to 2011
X: Self-reported non-Hispanic Black (X = 1) versus non-Hispanic White (X = 0)
M: Breastfeeding prior to discharge (yes = 0, 1 otherwise)
Y: Infant mortality
C_XY: Empty set
C_MY: 17 variables

38 Penn Moms Study
C_MY:
- year of birth
- urbanicity
- maternal education
- paternal education
- marital status
- WIC status
- birthweight (kg)
- gestational age at birth (wks)
- kg × wks interaction
- 5-min Apgar
- parity
- pre-pregnancy smoking
- gestational smoking
- 1st prenatal visit week
- total prenatal visits
- maternal age
- paternal age
Continuous C_MY: restricted quadratic splines
Categorical C_MY: disjoint indicator coding

39 Analysis & Results

40 Results
Total disparity: 336 (95% CI: 278, 393)
[Table: CDM, 95% confidence interval, and proportion explained (%) for each method: Difference, Generalized Product, IPW MSMs, Structural Transformation, G Estimation, TMLE]
All results expressed on the risk difference scale per 1,000 live births.
Proportion explained = [336 - CDM(m = 0)] / 336

41 The Difference Method
Fit a model for race (X) and infant mortality (Y), adjusted for C_XY and C_MY:
$E(Y \mid X, C_{XY}, C_{MY}) = \alpha_0 + \alpha_1 X + \alpha_2 C_{XY} + \alpha_3 C_{MY}$ (3)
Add breastfeeding status (M):
$E(Y \mid X, M, C_{XY}, C_{MY}) = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 C_{XY} + \beta_4 C_{MY}$ (4)
$\mathrm{CDM}(m = 0) = \beta_1$
Proportion explained $= (\alpha_1 - \beta_1)/\alpha_1$
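As a rough illustration of equations (3) and (4), the sketch below simulates data and fits both models by ordinary least squares with NumPy. Everything in it is an assumption made for illustration: the simulated variables and coefficient values, the helper `ols`, and the simplification that C_XY is empty and C_MY is a single binary covariate unrelated to X with no X-M interaction. It is not the manuscript's SAS code.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Hypothetical simulated data (not the Penn Moms data)
rng = np.random.default_rng(0)
n = 100_000
x = rng.binomial(1, 0.5, n)                    # disparity variable X
c = rng.binomial(1, 0.4, n)                    # C_MY, here independent of X
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)   # mediator M
y = 1.0 + 1.0 * x + 2.0 * m + 1.5 * c + rng.normal(0, 1, n)

alpha = ols(y, x, c)       # equation (3): mediator left out
beta = ols(y, x, m, c)     # equation (4): mediator added

cdm_0 = beta[1]                                          # CDM(m = 0) = beta_1
proportion_explained = (alpha[1] - beta[1]) / alpha[1]   # (alpha_1 - beta_1) / alpha_1
print(round(cdm_0, 2), round(proportion_explained, 2))
```

In this simulation the assumptions behind the difference method hold by construction, so the coefficient on X in the second model should sit close to the simulated value of 1.0.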

43 The Generalized Product Method
Fit a linear regression model as:
$E(Y \mid X, M, C_{XY}, C_{MY}) = \gamma_0 + \gamma_1 X + \gamma_2 M + \gamma_3 XM + \gamma_4 C_{XY} + \gamma_5 C_{MY}$ (5)
$\mathrm{CDM}(m = 0) = \gamma_1$
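Equation (5) can be sketched in the same way. The example below is a minimal illustration with simulated data, assuming C_XY is empty and C_MY is a single binary covariate unrelated to X; the helper `ols` and all coefficient values are hypothetical.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Hypothetical simulated data with an X-M interaction
rng = np.random.default_rng(1)
n = 100_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.4, n)                    # C_MY, independent of X here
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)
y = 1.0 + 1.0 * x + 2.0 * m + 0.5 * x * m + 1.5 * c + rng.normal(0, 1, n)

# Equation (5): gamma = (gamma_0, gamma_1, gamma_2, gamma_3, gamma_5)
gamma = ols(y, x, m, x * m, c)

cdm_0 = gamma[1]               # CDM(m = 0) = gamma_1
cdm_1 = gamma[1] + gamma[3]    # under this model, the contrast at m = 1
print(round(cdm_0, 2), round(cdm_1, 2))
```

Including the product term lets the disparity differ by mediator level, which is what distinguishes this model from equation (4).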

46 Counterfactual Disparity Measures
[Figure: causal diagram (b) relating X, M, and Y, with C_XY, C_MY, and U]
Assumptions:
1 No uncontrolled X-Y confounding
2 No uncontrolled M-Y confounding
3 No M-Y confounders associated with X
4 No X-M interaction
Social determinants are associated with a myriad of downstream variables.
Results from difference and product methods can't be trusted.
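The problem with assumption 3 can be seen in a small simulation. The sketch below is illustrative only: the data-generating values are made up, C_XY is taken to be empty, and C_MY is a single binary covariate whose distribution differs by X. Because the data are simulated, the outcome each unit would have with M set to 0 is known and the target can be computed directly.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 200_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.3 + 0.4 * x)             # C_MY is associated with X
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)
y_m0 = 1.0 + 1.0 * x + 1.5 * c + rng.normal(0, 1, n)   # outcome if M were set to 0
y = y_m0 + 2.0 * m + 0.5 * x * m                        # observed outcome

# Target: CDM(m = 0) from the simulated counterfactual outcomes
true_cdm_0 = y_m0[x == 1].mean() - y_m0[x == 0].mean()

# Generalized product method: the coefficient on X holds C_MY fixed,
# so it drops the part of the disparity that runs through C_MY
gamma = ols(y, x, m, x * m, c)
product_cdm_0 = gamma[1]

print(round(true_cdm_0, 2), round(product_cdm_0, 2))
```

With these made-up values the regression coefficient lands well below the simulated target, even though the outcome model itself is correctly specified.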

49 Inverse Probability Weighting
Model X and M to obtain:
$sw = \dfrac{f_X(X)}{f_X(X \mid C_{XY})} \times \dfrac{f_M(M)}{f_M(M \mid X, C_{XY}, C_{MY})}$ (6)
where
$\dfrac{f_M(M)}{f_M(M \mid X, C_{XY}, C_{MY})} = \begin{cases} \dfrac{P(M=1)}{P(M=1 \mid X, C_{XY}, C_{MY})}, & \text{if } M = 1 \\ \dfrac{P(M=0)}{P(M=0 \mid X, C_{XY}, C_{MY})}, & \text{if } M = 0 \end{cases}$ (7)
Fit a weighted regression model:
$E(Y \mid X, M) = \theta_0 + \theta_1 X + \theta_2 M + \theta_3 XM$ (8)
$\mathrm{CDM}(m = 0) = \theta_1$
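A minimal sketch of equations (6) to (8) follows, using simulated data in which C_MY is associated with X. Several pieces are assumptions for illustration: C_XY is empty (so the exposure factor of the weight equals 1), C_MY is one binary covariate, and the mediator probabilities come from stratum-specific proportions via the hypothetical helper `stratum_prob` rather than from a fitted logistic model.

```python
import numpy as np

def stratum_prob(event, *strata):
    """Proportion with event = 1 within each combination of binary strata."""
    key = np.zeros(len(event), dtype=int)
    for s in strata:
        key = 2 * key + s
    return (np.bincount(key, weights=event) / np.bincount(key))[key]

rng = np.random.default_rng(3)
n = 200_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.3 + 0.4 * x)             # C_MY associated with X
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)
y = 1.0 + 1.0 * x + 2.0 * m + 0.5 * x * m + 1.5 * c + rng.normal(0, 1, n)

# Equation (7): stabilized mediator weight
p_m1 = stratum_prob(m, x, c)                   # P(M = 1 | X, C_MY)
p_m1_marginal = m.mean()                       # P(M = 1)
sw = np.where(m == 1, p_m1_marginal / p_m1, (1 - p_m1_marginal) / (1 - p_m1))
# Equation (6): with C_XY empty, f_X(X) / f_X(X | C_XY) = 1, so sw is unchanged

# Equation (8): weighted least squares via rows scaled by sqrt(weight)
design = np.column_stack([np.ones(n), x, m, x * m])
root_w = np.sqrt(sw)
theta = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)[0]

cdm_0 = theta[1]                               # CDM(m = 0) = theta_1
print(round(cdm_0, 2))
```

Note that C_MY enters only through the weights and never through the outcome model, so the weighted regression does not condition on it.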

50 Inverse Probability Weighting
[Figure: causal diagrams (b) and (c) relating X, M, and Y, with C_XY, C_MY, and U]

54 Structural Transformation
Model Y as:
$E(Y \mid X, M, C_{XY}, C_{MY}) = \alpha_0 + \alpha_1 X + \alpha_2 M + \alpha_3 XM + \alpha_4 C_{XY} + \alpha_5 C_{MY}$ (9)
Create the transformed outcome:
$\tilde{Y} = Y - \hat{\alpha}_2 M - \hat{\alpha}_3 XM$ (10)
Regress the transformed outcome against X:
$E(\tilde{Y} \mid X, C_{XY}) = \beta_0 + \beta_1 X + \beta_2 C_{XY}$ (11)
$\mathrm{CDM}(m = 0) = \beta_1$
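The three steps in equations (9) to (11) can be sketched as two least-squares fits with a subtraction in between. The example is a hedged illustration on simulated data: C_XY is assumed empty, C_MY is a single binary covariate associated with X, and the helper `ols` and all coefficient values are hypothetical.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 200_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.3 + 0.4 * x)             # C_MY associated with X
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)
y = 1.0 + 1.0 * x + 2.0 * m + 0.5 * x * m + 1.5 * c + rng.normal(0, 1, n)

# Equation (9): outcome model including M, XM, and C_MY
alpha = ols(y, x, m, x * m, c)

# Equation (10): remove the estimated mediator contribution from Y
y_tilde = y - alpha[2] * m - alpha[3] * x * m

# Equation (11): regress the transformed outcome on X only (C_XY is empty)
beta = ols(y_tilde, x)

cdm_0 = beta[1]                                # CDM(m = 0) = beta_1
print(round(cdm_0, 2))
```

C_MY is used in the first model to estimate the mediator coefficients but is left out of the second, which is how the approach avoids conditioning on a variable associated with X.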

55 Structural Transformation Method
[Figure: causal diagrams (b) and (c); in (c) the outcome Y is replaced by the transformed outcome $Y - \hat{\alpha}_2 M - \hat{\alpha}_3 XM$]

60 Doubly Robust G Estimation
Step 1: Mediator residuals:
$\hat{r}(M) = M - \{1 + \exp[-\hat{\delta}_0 - \hat{\delta}_1 X - \hat{\delta}_2 C_{XY} - \hat{\delta}_3 C_{MY}]\}^{-1}$ (12)
Step 2: Model the outcome:
$E[Y \mid X, M, C_{XY}, C_{MY}] = \beta_0 + \gamma_2 \hat{r}(M) + \gamma_3 X \hat{r}(M) + \beta_1 X + \beta_2 C_{XY} + \beta_3 C_{MY}$ (13)
Step 3: Transform the outcome:
$\tilde{Y} = Y - \hat{\gamma}_2 M - \hat{\gamma}_3 XM$ (14)

62 Doubly Robust G Estimation
Step 4: Exposure residuals:
$\hat{r}(X) = X - \{1 + \exp[-\hat{\xi}_0 - \hat{\xi}_1 C_{XY}]\}^{-1}$ (15)
Step 5: Model the transformed outcome:
$E(\tilde{Y} \mid X, C_{XY}) = \theta_0 + \theta_1 \hat{r}(X) + \theta_2 C_{XY}$ (16)
$\mathrm{CDM}(m = 0) = \theta_1$
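Steps 1 through 5 can be sketched compactly on simulated data. This is an illustration under loudly stated assumptions, not the manuscript's implementation: C_XY is empty, C_MY is one binary covariate associated with X, the helpers `ols` and `stratum_prob` are hypothetical, and the logistic models in equations (12) and (15) are swapped for stratum-specific proportions (which coincide with a saturated logistic fit for binary covariates).

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def stratum_prob(event, *strata):
    """Proportion with event = 1 within each combination of binary strata."""
    key = np.zeros(len(event), dtype=int)
    for s in strata:
        key = 2 * key + s
    return (np.bincount(key, weights=event) / np.bincount(key))[key]

rng = np.random.default_rng(5)
n = 200_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.3 + 0.4 * x)             # C_MY associated with X
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)
y = 1.0 + 1.0 * x + 2.0 * m + 0.5 * x * m + 1.5 * c + rng.normal(0, 1, n)

# Step 1, equation (12): mediator residuals
r_m = m - stratum_prob(m, x, c)

# Step 2, equation (13): outcome model using the residuals
fit13 = ols(y, r_m, x * r_m, x, c)
gamma2, gamma3 = fit13[1], fit13[2]

# Step 3, equation (14): transformed outcome
y_tilde = y - gamma2 * m - gamma3 * x * m

# Step 4, equation (15): exposure residuals (C_XY empty, so subtract the mean)
r_x = x - x.mean()

# Step 5, equation (16): regress the transformed outcome on the exposure residuals
theta = ols(y_tilde, r_x)
cdm_0 = theta[1]                               # CDM(m = 0) = theta_1
print(round(gamma2, 2), round(gamma3, 2), round(cdm_0, 2))
```

As in the structural transformation sketch, C_MY is used only to estimate the mediator coefficients and is absent from the final regression.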

67 Double Robustness
G estimation adjusts for M-Y confounding with two models:
$\hat{r}(M) = M - \{1 + \exp[-\hat{\delta}_0 - \hat{\delta}_1 X - \hat{\delta}_2 C_{XY} - \hat{\delta}_3 C_{MY}]\}^{-1}$ (12)
$E[Y \mid X, M, C_{XY}, C_{MY}] = \beta_0 + \gamma_2 \hat{r}(M) + \gamma_3 X \hat{r}(M) + \beta_1 X + \beta_2 C_{XY} + \beta_3 C_{MY}$ (13)
If X, C_XY, C_MY are removed from (12) but not (13), $(\gamma_2, \gamma_3)$ will still be correct.
If X, C_XY, C_MY are removed from (13) but not (12), $(\gamma_2, \gamma_3)$ will still be correct.
If X, C_XY, C_MY are removed from (12) and (13), $(\gamma_2, \gamma_3)$ will be confounded.
The first two results are a consequence of double-robustness.
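The three scenarios on this slide can be checked numerically. The sketch below is illustrative and rests on assumptions not in the slides: simulated data with made-up coefficients, C_XY empty, one binary C_MY, stratum-specific proportions standing in for the logistic model in (12), and the hypothetical helpers `ols` and `stratum_prob`. The simulated values of the two mediator coefficients are 2.0 and 0.5.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def stratum_prob(event, *strata):
    """Proportion with event = 1 within each combination of binary strata."""
    key = np.zeros(len(event), dtype=int)
    for s in strata:
        key = 2 * key + s
    return (np.bincount(key, weights=event) / np.bincount(key))[key]

rng = np.random.default_rng(6)
n = 200_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.3 + 0.4 * x)
m = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * c)
y = 1.0 + 1.0 * x + 2.0 * m + 0.5 * x * m + 1.5 * c + rng.normal(0, 1, n)

r_full = m - stratum_prob(m, x, c)   # (12) with X and C_MY
r_empty = m - m.mean()               # (12) with X and C_MY removed

# Covariates removed from (12) but kept in (13)
gamma_a = ols(y, r_empty, x * r_empty, x, c)[1:3]
# Covariates removed from (13) but kept in (12)
gamma_b = ols(y, r_full, x * r_full)[1:3]
# Covariates removed from both models
gamma_c = ols(y, r_empty, x * r_empty)[1:3]

print(np.round(gamma_a, 2), np.round(gamma_b, 2), np.round(gamma_c, 2))
```

The first two fits should recover the simulated mediator coefficients, while the third should not, which mirrors the three statements above.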

71 TMLE
Start by fitting three regression models:
$\mathrm{logit}\{E[X \mid C_{XY}]\} = \theta_0 + \theta_1 C_{XY}$ (17)
$\mathrm{logit}\{E[M \mid X, C_{MY}, C_{XY}]\} = \beta_0 + \beta_1 X + \beta_2 C_{MY} + \beta_3 C_{XY}$ (18)
$\mathrm{logit}\{E[Y \mid X, M, C_{MY}, C_{XY}]\} = \gamma_0 + \gamma_1 X + \gamma_2 M + \gamma_3 XM + \gamma_4 C_{MY} + \gamma_5 C_{XY}$ (19)
Obtain predictions under X = 1 and M = 0 for each model.
Denote these predictions from the outcome model $Q_2^{1,0}$.

73 TMLE
Create a clever covariate from the M and X models:
If X = 1 and M = 0,
$cc_2 = \dfrac{1}{P(M = 0 \mid X = 1, C_{XY}, C_{MY})\, P(X = 1 \mid C_{XY})}$ (20)
Otherwise $cc_2 = 0$.
This clever covariate is simply an inverse probability weight (IPW).

76 TMLE
Using $Q_2^{1,0}$ as an offset and $cc_2$ as a variable, estimate a fluctuation parameter ($\epsilon_2$) from a no-intercept logit model:
$\mathrm{logit}\{E[Y \mid X = 1, M = 0, C_{MY}, C_{XY}]\} = \epsilon_2\, cc_2 + \mathrm{logit}[\hat{Q}_2^{1,0}]$ (21)
$\epsilon_2$ captures the degree of residual confounding in the $Q_2^{1,0}$.
If these predictions are unbiased, $\hat{\epsilon}_2 = 0$.
Generate updated predictions from this model, denoted $Q_2^{1,0}$.

79 TMLE
Regress the updated predictions $Q_2^{1,0}$ against X and C_XY using logistic regression:
$\mathrm{logit}\{E[Q_2^{1,0} \mid X, C_{XY}]\} = \beta_0 + \beta_1 X + \beta_2 C_{XY}$ (22)
Obtain predicted values under X = 1, which we denote $Q_1^{1,0}$.
Estimate a second fluctuation parameter:
$\mathrm{logit}\{E[Q_2^{1,0} \mid X = 1, C_{XY}]\} = \epsilon_1\, cc_1 + \mathrm{logit}[Q_1^{1,0}]$ (23)
where, for those with X = 1, $cc_1 = \dfrac{1}{P[X = 1 \mid C_{XY}]}$ (0 otherwise).
Obtain predictions from this model, denoted $Q_1^{1,0}$.

81 TMLE
Take the sample average of these predictions: $\frac{1}{N} \sum_i \hat{Q}_1^{1,0}$
This average can be interpreted as an estimate of $E[Y(m = 0) \mid X = 1]$.
Repeat the entire process after replacing every instance of X = 1 with X = 0 to obtain $\hat{Q}_1^{0,0}$.
The CDM(m = 0) can then be estimated as:
$\dfrac{1}{N} \sum_i \left( \hat{Q}_1^{1,0} - \hat{Q}_1^{0,0} \right)$ (24)
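The sequence in equations (17) to (24) can be sketched end to end for a binary outcome. This is a simplified illustration under stated assumptions, not the manuscript's code: the data are simulated with made-up coefficients, C_XY is empty (so model (17) reduces to a sample proportion), C_MY is one binary covariate, and `expit`, `logit`, `fit_logit`, and `tmle_mean` are hypothetical helpers written here, with `fit_logit` a plain Newton-Raphson routine.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logit(design, y, offset=None, iters=30):
    """Newton-Raphson logistic fit; y may be binary or a probability."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = expit(design @ beta + offset)
        hess = (design * (p * (1 - p))[:, None]).T @ design
        beta = beta + np.linalg.solve(hess, design.T @ (y - p))
    return beta

def tmle_mean(x, m, c, y, x_level):
    """Estimate E[Y(m = 0) | X = x_level], assuming C_XY is empty."""
    n = len(y)
    one = np.ones(n)
    xl = np.full(n, float(x_level))
    p_x = np.mean(x == x_level)                                   # (17)
    b_m = fit_logit(np.column_stack([one, x, c]), m)              # (18)
    b_y = fit_logit(np.column_stack([one, x, m, x * m, c]), y)    # (19)
    # Predictions with X set to x_level and M set to 0
    p_m0 = 1.0 - expit(np.column_stack([one, xl, c]) @ b_m)
    q2 = expit(np.column_stack([one, xl, 0 * one, 0 * one, c]) @ b_y)
    # Clever covariate (20) and first fluctuation (21)
    h2 = 1.0 / (p_m0 * p_x)
    cc2 = np.where((x == x_level) & (m == 0), h2, 0.0)
    eps2 = fit_logit(cc2[:, None], y, offset=logit(q2))[0]
    q2_star = expit(logit(q2) + eps2 * h2)
    # Second stage (22)-(23)
    b_q = fit_logit(np.column_stack([one, x]), q2_star)
    q1 = expit(np.column_stack([one, xl]) @ b_q)
    cc1 = np.where(x == x_level, 1.0 / p_x, 0.0)
    eps1 = fit_logit(cc1[:, None], q2_star, offset=logit(q1))[0]
    q1_star = expit(logit(q1) + eps1 / p_x)
    return q1_star.mean()

rng = np.random.default_rng(7)
n = 100_000
x = rng.binomial(1, 0.5, n)
c = rng.binomial(1, 0.3 + 0.4 * x)                  # C_MY associated with X
m = rng.binomial(1, expit(-1.0 + 1.0 * x + 1.0 * c))
y = rng.binomial(1, expit(-2.0 + 0.5 * x + 1.0 * m + 0.3 * x * m + 0.8 * c))

# Equation (24): difference of the two targeted means
cdm_0 = tmle_mean(x, m, c, y, 1) - tmle_mean(x, m, c, y, 0)
print(round(cdm_0, 3))
```

Because C_XY is empty in this sketch, the second-stage regression is saturated in X and the second fluctuation parameter is expected to be essentially zero; with a non-empty C_XY that step would do real work.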

82 TMLE
All predictions (the Q's and the denominator of cc) come from logistic models.
Can instead use any appropriate data mining / machine learning algorithm (e.g., random forest, generalized boosted model, lasso).
Doing so would minimize the assumptions implicit in the logistic model.
We can also use the SuperLearner to combine desired methods into a single algorithm.

86 Implications
Don't use standard regression to quantify the disparity explained by another variable!
Any approach that simultaneously includes X and C_MY in the model will yield misleading results.
Triangulating the CDM (or CDE) is best practice.
When singly-robust results differ, double-robust methods should be used.

87 Concluding Remarks
Numerous studies seek to quantify the CDM, but common methods don't work.
When studying health disparities, potential outcomes can be used to clarify (i) what we want to estimate, and (ii) how to estimate it.
Causal inference (via potential outcomes) can be used to improve analyses in social epidemiology.
