Mediation Analysis for Health Disparities Research

1 Mediation Analysis for Health Disparities Research
Ashley I Naimi, PhD
Oct 27
www.ashleyisaacnaimi.com
ashleynaimi@pitt.edu

3 Orientation
24 numbered equations
Slides at: www.ashleyisaacnaimi.com/slides
Manuscript at: www.ashleyisaacnaimi.com/papers and www.aje.oxfordjournals.org

4 Outline
1. Background
2. Controlled Direct Effects (CDE)
3. Counterfactual Disparity Measures (CDM)
4. Data
5. Analysis & Results
6. Implications
7. Conclusions

7 Cause-Effect Relations in Epidemiology
Often articulated informally: What is the effect of smoking on cardiovascular disease risk, irrespective of smoking's effect on body weight?
To answer this question: data → computer → mathematical operations → number (the answer).
Computer calculations are based on rigorously defined mathematical objects.
English language sentences are often ambiguous.
Robins JM (1987) Comput Math Applic

8 Cause-Effect Relations in Epidemiology
Causal inference is about reducing this ambiguity.

14 Cause-Effect Relations in Social Epidemiology
Ambiguous causal effects are more problematic when studying social determinants of health: educational level; income and wealth; neighborhood; occupational status; socioeconomic position; race/ethnicity.
What is the effect of race on infant mortality, irrespective of race's effect on breastfeeding?
What is the effect of race? What is race?

18 Cause-Effect Relations in Social Epidemiology
If everyone were non-Hispanic Black versus if everyone were non-Hispanic White.
Such counterfactual statements generally do not strike us as particularly sensible. VanderWeele (2015) Explanation in Causal Inference
This does not imply that race is not fundamentally causal.
Self-reported race is not counterfactually causal.

19 Cause-Effect Relations in Social Epidemiology
One solution: treat it as a mediation analysis problem.
Race quantifies the disparity (non-causal exposure).
Separate causal variables explain the disparity (causal mediator).
Examples are numerous.

21 Cause-Effect Relations in Social Epidemiology
Does serum potassium explain the racial disparity in incident diabetes risk? Chatterjee et al. (2011)
Does cancer stage at diagnosis explain the socioeconomic disparity in mortality? Ibfelt et al. (2013)
Does tobacco consumption explain the neighborhood disparity in lung cancer incidence? Hystad et al. (2013)
In all instances, the question is: what would the disparity be if M were set to some specific level?

23 Remainder of This Talk
A review/explanation of six methods: Difference and Product Methods; Inverse Probability Weighted Marginal Structural Models (MSMs); Structural Transformation Method; G Estimation of a Structural Nested Mean Model (SNMM); Targeted Minimum Loss-Based Estimation (TMLE).
An illustration of the major challenge that arises.
An explanation of double-robustness.
In the manuscript: technical details, example data, and annotated SAS code.

24 Controlled Direct Effects Counterfactual Disparity Measures

25 Controlled Direct Effect
Questions about mediation are often answered by quantifying controlled direct effects:
\[ \mathrm{CDE}(m) = E[\,Y(x, m) - Y(x^{*}, m)\,] \tag{1} \]
Here \(Y(x, m)\) is the Y that would be observed if X were set to \(x\) and M were set to \(m\), versus \(Y(x^{*}, m)\), what would be observed if X were set to \(x^{*}\) and M were set to \(m\).

28 Controlled Direct Effect
[Figure: causal diagrams (a) and (b) with nodes X, M, Y, C_XY and C_MY; diagram (b) also includes U]
Assumptions:
1. No uncontrolled X-Y confounding
2. No uncontrolled M-Y confounding
3. No M-Y confounders affected by X
4. No X-M interaction

29 Controlled Direct Effect
1. No uncontrolled X-Y confounding
2. No uncontrolled M-Y confounding
3. No M-Y confounders affected by X
4. No X-M interaction
[Table: assumptions 1-4 by method (Difference, Generalized Product, IPW MSM, Structural Transformation, G Estimation of SNMM, TMLE); cell entries missing]

30 Counterfactual Disparity Measure
In all instances, the question is: what would the magnitude of the disparity be if M were set to some specific level?
They are questions about counterfactual disparity measures:
\[ \mathrm{CDM}(m) = E[\,Y(m) \mid X = x\,] - E[\,Y(m) \mid X = x^{*}\,] \tag{2} \]
Here \(Y(m)\) is the Y that would be observed if M were set to \(m\), averaged among those with \(X = x\) versus those with \(X = x^{*}\).
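To make the definition of CDM(m) concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: the variable `L` (a stand-in for a single binary C_MY), the coefficients, and the sample size are hypothetical and are not taken from the Penn Moms Study. Because the data are simulated, the potential outcome Y(m = 0) can be generated directly and the CDM compared with the observed disparity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (not from the talk).
X = rng.binomial(1, 0.5, n)                    # group indicator (non-causal exposure)
L = rng.binomial(1, 0.3 + 0.4 * X)             # M-Y confounder associated with X
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)   # mediator
noise = rng.normal(size=n)


def outcome(m):
    """Potential outcome Y(m): what Y would be if M were set to m."""
    return 1.0 * X + 2.0 * m + 1.5 * L + noise


Y = outcome(M)       # observed outcome (M left at its observed value)
Y_m0 = outcome(0)    # outcome if everyone had M set to 0

observed_disparity = Y[X == 1].mean() - Y[X == 0].mean()
cdm_m0 = Y_m0[X == 1].mean() - Y_m0[X == 0].mean()

print(f"observed disparity: {observed_disparity:.2f}")
print(f"CDM(m = 0):         {cdm_m0:.2f}")
```

Under these assumed coefficients the observed disparity is 2.44 and CDM(m = 0) is 1.6 in expectation, so the sample values should land close to those numbers. In real data Y(m = 0) is not observed for everyone, which is why the estimation methods in the rest of the talk are needed.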

34 Counterfactual Disparity Measure
[Figure: causal diagrams (a) and (b) with nodes X, M, Y, C_XY and C_MY; diagram (b) also includes U]
Assumptions:
1. No uncontrolled X-Y confounding
2. No uncontrolled M-Y confounding
3. No M-Y confounders associated with X
4. No X-M interaction

35 Counterfactual Disparity Measures
1. No uncontrolled X-Y confounding
2. No uncontrolled M-Y confounding
3. No M-Y confounders affected by X
4. No X-M interaction
[Table: assumptions 1-4 by method (Difference, Generalized Product, IPW MSM, Structural Transformation, G Estimation of SNMM, TMLE); cell entries missing]

36 Data

37 Penn Moms Study
We estimated the magnitude of the racial disparity in infant mortality that would remain if every woman breastfed her infant prior to discharge from the place of birth.
Data: 900,726 live-born singleton births from Pennsylvania, 2003 to 2011
X: self-reported (SR) non-Hispanic Black (X = 1) versus non-Hispanic White (X = 0)
M: breastfeeding prior to discharge (yes = 0, 1 otherwise)
Y: infant mortality
C_XY: empty set
C_MY: 17 variables

38 Penn Moms Study
C_MY: year of birth; urbanicity; maternal education; paternal education; marital status; WIC status; birthweight (kg); gestational age at birth (wks); kg × wks interaction; 5-min Apgar; parity; pre-pregnancy smoking; gestational smoking; week of 1st prenatal visit; total prenatal visits; maternal age; paternal age
Continuous C_MY: restricted quadratic splines
Categorical C_MY: disjoint indicator coding

39 Analysis & Results

40 Results
Total disparity: 3.36 (95% CI: 2.78, 3.93)
[Table: CDM, 95% confidence interval, and proportion explained (%) for each method (Difference, Generalized Product, IPW MSMs, Structural Transformation, G Estimation, TMLE); values missing]
All results are expressed on the risk difference scale per 1,000 live births.
Proportion explained = [3.36 - CDM(m = 0)] / 3.36

41 The Difference Method
Fit a model for race (X) and infant mortality (Y), adjusted for C_XY and C_MY:
\[ E(Y \mid X, C_{XY}, C_{MY}) = \alpha_0 + \alpha_1 X + \alpha_2 C_{XY} + \alpha_3 C_{MY} \tag{3} \]
Add breastfeeding status (M):
\[ E(Y \mid X, M, C_{XY}, C_{MY}) = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 C_{XY} + \beta_4 C_{MY} \tag{4} \]
\( \mathrm{CDM}(m = 0) = \beta_1 \)
Proportion explained \( = (\alpha_1 - \beta_1)/\alpha_1 \)
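The two regressions of the difference method can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the talk's SAS analysis: the data are simulated, `L` is a hypothetical single binary C_MY that is associated with X, C_XY is empty, and the helper `ols` is defined here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical simulated data (not the Penn Moms data).
X = rng.binomial(1, 0.5, n)                    # group indicator
L = rng.binomial(1, 0.3 + 0.4 * X)             # C_MY, associated with X
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)   # mediator
Y = 1.0 * X + 2.0 * M + 1.5 * L + rng.normal(size=n)


def ols(y, *cols):
    """Least-squares coefficients; the intercept comes first."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]


alpha = ols(Y, X, L)      # model (3): Y on X and C_MY
beta = ols(Y, X, M, L)    # model (4): model (3) plus M

cdm_difference = beta[1]                                  # beta_1
proportion_explained = (alpha[1] - beta[1]) / alpha[1]    # (alpha_1 - beta_1) / alpha_1

print(f"alpha_1 = {alpha[1]:.2f}, beta_1 = {beta[1]:.2f}")
print(f"proportion explained = {proportion_explained:.2f}")
```

In this assumed setup the true CDM(m = 0) is 1.6 by construction, while β1 targets 1.0, because conditioning on `L` removes the part of the disparity that runs through `L`. The sketch is therefore also a small example of how the difference method can miss when a C_MY variable is associated with X.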

43 The Generalized Product Method
Fit a linear regression model as:
\[ E(Y \mid X, M, C_{XY}, C_{MY}) = \gamma_0 + \gamma_1 X + \gamma_2 M + \gamma_3 XM + \gamma_4 C_{XY} + \gamma_5 C_{MY} \tag{5} \]
\( \mathrm{CDM}(m = 0) = \gamma_1 \)

46 Counterfactual Disparity Measures
[Figure: causal diagram (b) with nodes X, M, Y, C_XY, C_MY and U]
Social determinants are associated with a myriad of downstream variables.
Results from difference and product methods can't be trusted.
Assumptions:
1. No uncontrolled X-Y confounding
2. No uncontrolled M-Y confounding
3. No M-Y confounders associated with X
4. No X-M interaction

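49 Inverse Probability Weighting
Model X and M to obtain:
\[ sw = \frac{f_X(X)}{f_X(X \mid C_{XY})} \times \frac{f_M(M)}{f_M(M \mid X, C_{XY}, C_{MY})} \tag{6} \]
where
\[ \frac{f_M(M)}{f_M(M \mid X, C_{XY}, C_{MY})} = \begin{cases} \dfrac{P(M=1)}{P(M=1 \mid X, C_{XY}, C_{MY})}, & \text{if } M = 1 \\[1ex] \dfrac{P(M=0)}{P(M=0 \mid X, C_{XY}, C_{MY})}, & \text{if } M = 0 \end{cases} \tag{7} \]
Fit a weighted regression model:
\[ E(Y \mid X, M) = \theta_0 + \theta_1 X + \theta_2 M + \theta_3 XM \tag{8} \]
\( \mathrm{CDM}(m = 0) = \theta_1 \)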
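A minimal Python sketch of the stabilized weights and the weighted regression follows. The data are simulated and every name and coefficient is hypothetical. Two simplifications are assumed: C_XY is empty, so the X factor of the weight equals 1, and P(M = 1 | X, L) is estimated by cell means (a saturated stand-in for the logistic model one would normally fit when C_MY has many variables).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical simulated data; L plays the role of a single binary C_MY.
X = rng.binomial(1, 0.5, n)
L = rng.binomial(1, 0.3 + 0.4 * X)
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)
Y = 1.0 * X + 2.0 * M + 1.5 * L + rng.normal(size=n)

# Denominator of the mediator factor: P(M = 1 | X, L), estimated by cell means.
p_m_given = np.empty(n)
for x in (0, 1):
    for l in (0, 1):
        cell = (X == x) & (L == l)
        p_m_given[cell] = M[cell].mean()

# Numerator: marginal P(M = 1).
p_m = M.mean()

# Stabilized weight; the X factor is 1 because C_XY is empty in this toy example.
sw = np.where(M == 1, p_m / p_m_given, (1 - p_m) / (1 - p_m_given))

# Weighted least squares for the model with X, M and XM,
# implemented by scaling rows with the square root of the weights.
design = np.column_stack([np.ones(n), X, M, X * M])
root_w = np.sqrt(sw)
theta = np.linalg.lstsq(design * root_w[:, None], Y * root_w, rcond=None)[0]

cdm_ipw = theta[1]   # theta_1
print(f"IPW estimate of CDM(m = 0): {cdm_ipw:.2f}")
```

With these assumed coefficients the true CDM(m = 0) is 1.6, and the weighted estimate should be close to it because `L` is handled through the weights and not through the outcome model.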

50 Inverse Probability Weighting
[Figure: causal diagrams (b) and (c) with nodes X, M, Y, C_XY, C_MY and U]

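54 Structural Transformation
Model Y as:
\[ E(Y \mid X, M, C_{XY}, C_{MY}) = \alpha_0 + \alpha_1 X + \alpha_2 M + \alpha_3 XM + \alpha_4 C_{XY} + \alpha_5 C_{MY} \tag{9} \]
Create the transformed outcome:
\[ \tilde{Y} = Y - \hat{\alpha}_2 M - \hat{\alpha}_3 XM \tag{10} \]
Regress the transformed outcome against X:
\[ E(\tilde{Y} \mid X, C_{XY}) = \beta_0 + \beta_1 X + \beta_2 C_{XY} \tag{11} \]
\( \mathrm{CDM}(m = 0) = \beta_1 \)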
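The three steps of the structural transformation method can be sketched in Python as below. This is an illustration on simulated data, with hypothetical names and coefficients, a single binary `L` standing in for C_MY, and an empty C_XY (so the final regression has X as its only covariate).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical simulated data (not the Penn Moms data).
X = rng.binomial(1, 0.5, n)
L = rng.binomial(1, 0.3 + 0.4 * X)             # C_MY, associated with X
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)
Y = 1.0 * X + 2.0 * M + 1.5 * L + rng.normal(size=n)


def ols(y, *cols):
    """Least-squares coefficients; the intercept comes first."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]


# Step 1: outcome model with X, M, XM and C_MY.
a = ols(Y, X, M, X * M, L)
alpha2_hat, alpha3_hat = a[2], a[3]

# Step 2: remove the estimated mediator effect from the outcome.
Y_tilde = Y - alpha2_hat * M - alpha3_hat * X * M

# Step 3: regress the transformed outcome on X only (C_XY is empty here).
b = ols(Y_tilde, X)
cdm_structural = b[1]

print(f"structural transformation estimate of CDM(m = 0): {cdm_structural:.2f}")
```

The design point is that `L` appears only in step 1, where it is needed to estimate the effect of M, and is left out of step 3, so the part of the disparity that runs through `L` is retained. Under the assumed coefficients the target value is 1.6.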

55 Structural Transformation Method
[Figure: causal diagram (b) with nodes X, M, Y, C_XY, C_MY and U; diagram (c) with the outcome replaced by the transformed outcome \( Y - \hat{\alpha}_2 M - \hat{\alpha}_3 XM \)]

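60 Doubly Robust G Estimation
Step 1: Mediator residuals:
\[ \hat{r}(M) = M - \left\{ 1 + \exp[-\hat{\delta}_0 - \hat{\delta}_1 X - \hat{\delta}_2 C_{XY} - \hat{\delta}_3 C_{MY}] \right\}^{-1} \tag{12} \]
Step 2: Model the outcome:
\[ E[Y \mid X, M, C_{XY}, C_{MY}] = \beta_0 + \gamma_2 \hat{r}(M) + \gamma_3 X \hat{r}(M) + \beta_1 X + \beta_2 C_{XY} + \beta_3 C_{MY} \tag{13} \]
Step 3: Transform the outcome:
\[ \tilde{Y} = Y - \hat{\gamma}_2 M - \hat{\gamma}_3 XM \tag{14} \]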

62 Doubly Robust G Estimation
Step 4: Exposure residuals:
\[ \hat{r}(X) = X - \left\{ 1 + \exp[-\hat{\xi}_0 - \hat{\xi}_1 C_{XY}] \right\}^{-1} \tag{15} \]
Step 5: Model the transformed outcome:
\[ E(\tilde{Y} \mid X, C_{XY}) = \theta_0 + \theta_1 \hat{r}(X) + \theta_2 C_{XY} \tag{16} \]
\( \mathrm{CDM}(m = 0) = \theta_1 \)
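Steps 1 through 5 can be sketched in Python as follows. The sketch rests on assumptions that are not part of the talk: the data are simulated with hypothetical coefficients, `L` is a single binary stand-in for C_MY, C_XY is empty (so the exposure model reduces to the sample proportion of X), and the fitted mediator probabilities come from cell means, which are used here as a saturated substitute for the logistic mediator model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical simulated data.
X = rng.binomial(1, 0.5, n)
L = rng.binomial(1, 0.3 + 0.4 * X)
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)
Y = 1.0 * X + 2.0 * M + 1.5 * L + rng.normal(size=n)


def ols(y, *cols):
    """Least-squares coefficients; the intercept comes first."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]


# Step 1: mediator residuals r(M) = M - P(M = 1 | X, L), with cell-mean probabilities.
p_m_given = np.empty(n)
for x in (0, 1):
    for l in (0, 1):
        cell = (X == x) & (L == l)
        p_m_given[cell] = M[cell].mean()
r_m = M - p_m_given

# Step 2: outcome model with r(M), X * r(M), X and C_MY.
coef = ols(Y, r_m, X * r_m, X, L)
gamma2_hat, gamma3_hat = coef[1], coef[2]

# Step 3: transformed outcome.
Y_tilde = Y - gamma2_hat * M - gamma3_hat * X * M

# Step 4: exposure residuals; with empty C_XY the model is just P(X = 1).
r_x = X - X.mean()

# Step 5: regress the transformed outcome on the exposure residual.
theta = ols(Y_tilde, r_x)
cdm_gest = theta[1]

print(f"g-estimation estimate of CDM(m = 0): {cdm_gest:.2f}")
```

Under the assumed data-generating process the target value is 1.6, and (γ2, γ3) should be close to (2, 0).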

67 Double Robustness
I adjust for M-Y confounding with two models:
\[ \hat{r}(M) = M - \left\{ 1 + \exp[-\hat{\delta}_0 - \hat{\delta}_1 X - \hat{\delta}_2 C_{XY} - \hat{\delta}_3 C_{MY}] \right\}^{-1} \tag{12} \]
\[ E[Y \mid X, M, C_{XY}, C_{MY}] = \beta_0 + \gamma_2 \hat{r}(M) + \gamma_3 X \hat{r}(M) + \beta_1 X + \beta_2 C_{XY} + \beta_3 C_{MY} \tag{13} \]
If X, C_XY, C_MY are removed from (12) but not (13), \( (\gamma_2, \gamma_3) \) will still be correct.
If X, C_XY, C_MY are removed from (13) but not (12), \( (\gamma_2, \gamma_3) \) will still be correct.
If X, C_XY, C_MY are removed from (12) and (13), \( (\gamma_2, \gamma_3) \) will be confounded.
This is a consequence of double-robustness.
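The three scenarios above can be checked numerically with a small Python simulation. As with any such sketch, the setup is assumed and not taken from the talk: the data are simulated with a true mediator coefficient of 2 and no X-M interaction, `L` is a hypothetical binary C_MY, C_XY is empty, and cell means replace the logistic mediator model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical simulated data; true (gamma_2, gamma_3) = (2, 0).
X = rng.binomial(1, 0.5, n)
L = rng.binomial(1, 0.3 + 0.4 * X)
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)
Y = 1.0 * X + 2.0 * M + 1.5 * L + rng.normal(size=n)


def ols(y, *cols):
    """Least-squares coefficients; the intercept comes first."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]


# Mediator model that uses X and L (cell means as a saturated stand-in).
p_m_given = np.empty(n)
for x in (0, 1):
    for l in (0, 1):
        cell = (X == x) & (L == l)
        p_m_given[cell] = M[cell].mean()

r_full = M - p_m_given    # residual when the mediator model keeps X and L
r_null = M - M.mean()     # residual when X and L are removed from the mediator model

# Covariates removed from the mediator model only.
g_mediator_stripped = ols(Y, r_null, X * r_null, X, L)
# Covariates removed from the outcome model only.
g_outcome_stripped = ols(Y, r_full, X * r_full)
# Covariates removed from both models.
g_both_stripped = ols(Y, r_null, X * r_null)

for label, g in [
    ("mediator model stripped", g_mediator_stripped),
    ("outcome model stripped", g_outcome_stripped),
    ("both stripped", g_both_stripped),
]:
    print(f"{label:>24}: gamma_2 = {g[1]:.2f}, gamma_3 = {g[2]:.2f}")
```

In this assumed setup the first two fits should recover a γ2 near 2, while the third converges to roughly 3.05, which is the confounded value implied by the chosen coefficients.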

71 TMLE
Start by fitting three regression models:
\[ \mathrm{logit}\{E[X \mid C_{XY}]\} = \theta_0 + \theta_1 C_{XY} \tag{17} \]
\[ \mathrm{logit}\{E[M \mid X, C_{MY}, C_{XY}]\} = \beta_0 + \beta_1 X + \beta_2 C_{MY} + \beta_3 C_{XY} \tag{18} \]
\[ \mathrm{logit}\{E[Y \mid X, M, C_{MY}, C_{XY}]\} = \gamma_0 + \gamma_1 X + \gamma_2 M + \gamma_3 XM + \gamma_4 C_{MY} + \gamma_5 C_{XY} \tag{19} \]
Obtain predictions under X = 1 and M = 0 for each model.
Denote these predictions from the outcome model \( Q_2^{1,0} \).

73 TMLE
Create a clever covariate from the M and X models. If X = 1 and M = 0,
\[ cc_2 = \frac{1}{P(M = 0 \mid X = 1, C_{XY}, C_{MY})\, P(X = 1 \mid C_{XY})} \tag{20} \]
Otherwise \( cc_2 = 0 \).
This clever covariate is simply an inverse probability weight.

76 TMLE
Using \( Q_2^{1,0} \) as an offset and \( cc_2 \) as a variable, estimate a fluctuation parameter (\( \epsilon_2 \)) from a no-intercept logit model:
\[ \mathrm{logit}\{E[Y \mid X = 1, M = 0, C_{MY}, C_{XY}]\} = \epsilon_2\, cc_2 + \mathrm{logit}[\hat{Q}_2^{1,0}] \tag{21} \]
\( \epsilon_2 \) captures the degree of residual confounding in the \( Q_2^{1,0} \). If these predictions are unbiased, \( \hat{\epsilon}_2 = 0 \).
Generate updated predictions from this model, denoted \( Q_2^{1,0} \).

79 TMLE
Regress the updated predictions \( Q_2^{1,0} \) against X and C_XY using logistic regression:
\[ \mathrm{logit}\{E[Q_2^{1,0} \mid X, C_{XY}]\} = \beta_0 + \beta_1 X + \beta_2 C_{XY} \tag{22} \]
Obtain predicted values under X = 1, which we denote \( Q_1^{1,0} \).
Estimate a second fluctuation parameter:
\[ \mathrm{logit}\{E[Q_2^{1,0} \mid X = 1, C_{XY}]\} = \epsilon_1\, cc_1 + \mathrm{logit}[Q_1^{1,0}] \tag{23} \]
where, for those with X = 1, \( cc_1 = 1 / P[X = 1 \mid C_{XY}] \) (0 otherwise).
Obtain predictions from this model, denoted \( Q_1^{1,0} \).

81 TMLE
Take the sample average of these predictions: \( \frac{1}{N} \sum_i \hat{Q}_1^{1,0} \)
This average can be interpreted as an estimate of \( E[Y(m = 0) \mid X = 1] \).
Repeat the entire process after replacing every instance of X = 1 with X = 0 to obtain \( \hat{Q}_1^{0,0} \).
The CDM(m = 0) can then be estimated as:
\[ \frac{1}{N} \sum_i \left( \hat{Q}_1^{1,0} - \hat{Q}_1^{0,0} \right) \tag{24} \]
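The targeting step can be illustrated with a simplified Python sketch. It is not the talk's implementation and makes several loud assumptions: the data are simulated with hypothetical coefficients; `L` is a single binary stand-in for C_MY; C_XY is empty, so the second regression and second fluctuation reduce to averaging the updated predictions among those with X = x; the mediator probabilities are cell means, not logistic fits; and the initial outcome prediction deliberately ignores `L`, so that the fluctuation parameter has something to correct. The helper names `tmle_mean` and `gformula_mean` are invented for this example.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical simulated data with a binary outcome.
X = rng.binomial(1, 0.5, n)
L = rng.binomial(1, 0.3 + 0.4 * X)
M = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * L)
Y = rng.binomial(1, expit(-2.0 + 0.5 * X + 1.0 * M + 1.0 * L))


def tmle_mean(x):
    """Targeted estimate of E[Y(m = 0) | X = x] when C_XY is empty."""
    grp = X == x
    tgt = grp & (M == 0)
    # Crude initial prediction under X = x, M = 0 that ignores L on purpose.
    q0 = np.full(n, Y[tgt].mean())
    # P(M = 0 | X = x, L) by cell means, evaluated for every row's L.
    p_m0 = np.empty(n)
    for l in (0, 1):
        p_m0[L == l] = 1.0 - M[grp & (L == l)].mean()
    # Clever covariate under X = x, M = 0.
    cc = 1.0 / (p_m0 * grp.mean())
    # Fluctuation parameter: no-intercept logistic fit with offset logit(q0),
    # solved by Newton's method on the rows with X = x and M = 0.
    eps = 0.0
    for _ in range(50):
        p = expit(logit(q0[tgt]) + eps * cc[tgt])
        score = np.sum(cc[tgt] * (Y[tgt] - p))
        info = np.sum(cc[tgt] ** 2 * p * (1.0 - p))
        eps += score / info
    q_star = expit(logit(q0) + eps * cc)   # updated predictions
    return q_star[grp].mean(), eps


def gformula_mean(x):
    """Nonparametric check: average of cell means of Y over L given X = x."""
    grp = X == x
    return sum(
        Y[grp & (M == 0) & (L == l)].mean() * np.mean(L[grp] == l)
        for l in (0, 1)
    )


psi1, eps1 = tmle_mean(1)
psi0, eps0 = tmle_mean(0)
cdm_tmle = psi1 - psi0
print(f"epsilon (X = 1): {eps1:.4f}, epsilon (X = 0): {eps0:.4f}")
print(f"TMLE estimate of CDM(m = 0): {cdm_tmle:.3f}")
```

Because the clever covariate here is built from saturated cell means, the targeted estimate coincides with the nonparametric cell-mean calculation in `gformula_mean`, even though the initial outcome prediction was misspecified. Under the assumed coefficients the true CDM(m = 0) on the risk scale is about 0.155.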

82 TMLE
All predictions (the Q's and the denominator of cc) come from logistic models.
We can instead use any appropriate data mining / machine learning algorithm (e.g., random forest, generalized boosted model, lasso).
Doing so would minimize assumptions implicit in the logistic model.
We can also use the SuperLearner to combine desired methods into a single algorithm.

86 Implications
Don't use standard regression to quantify the disparity explained by another variable!
Any approach that simultaneously includes X and C_MY in the model will yield misleading results.
Triangulating the CDM (or CDE) is best practice.
When singly-robust results differ, double-robust methods should be used.

87 Concluding Remarks
Numerous studies seek to quantify the CDM, but common methods don't work.
When studying health disparities, potential outcomes can be used to clarify (i) what we want to estimate, and (ii) how to estimate it.
Causal inference (via potential outcomes) can be used to improve analyses in social epidemiology.


More information

Targeted Maximum Likelihood Estimation in Safety Analysis

Targeted Maximum Likelihood Estimation in Safety Analysis Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35

More information

Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics

Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics Prof. Carolina Caetano 1 Notation and language Recall the notation that we discussed in the previous classes. We call the outcome

More information

Confounding, mediation and colliding

Confounding, mediation and colliding Confounding, mediation and colliding What types of shared covariates does the sibling comparison design control for? Arvid Sjölander and Johan Zetterqvist Causal effects and confounding A common aim of

More information

A Unification of Mediation and Interaction. A 4-Way Decomposition. Tyler J. VanderWeele

A Unification of Mediation and Interaction. A 4-Way Decomposition. Tyler J. VanderWeele Original Article A Unification of Mediation and Interaction A 4-Way Decomposition Tyler J. VanderWeele Abstract: The overall effect of an exposure on an outcome, in the presence of a mediator with which

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Help! Statistics! Mediation Analysis

Help! Statistics! Mediation Analysis Help! Statistics! Lunch time lectures Help! Statistics! Mediation Analysis What? Frequently used statistical methods and questions in a manageable timeframe for all researchers at the UMCG. No knowledge

More information

Comparison of Three Approaches to Causal Mediation Analysis. Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh

Comparison of Three Approaches to Causal Mediation Analysis. Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh Comparison of Three Approaches to Causal Mediation Analysis Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh Introduction Mediation defined using the potential outcomes framework natural effects

More information

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS 776 1 15 Outcome regressions and propensity scores Outcome Regression and Propensity Scores ( 15) Outline 15.1 Outcome regression 15.2 Propensity

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback University of South Carolina Scholar Commons Theses and Dissertations 2017 Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback Yanan Zhang University of South Carolina Follow

More information

Propensity Score Methods, Models and Adjustment

Propensity Score Methods, Models and Adjustment Propensity Score Methods, Models and Adjustment Dr David A. Stephens Department of Mathematics & Statistics McGill University Montreal, QC, Canada. d.stephens@math.mcgill.ca www.math.mcgill.ca/dstephens/siscr2016/

More information

Causal Inference. Prediction and causation are very different. Typical questions are:

Causal Inference. Prediction and causation are very different. Typical questions are: Causal Inference Prediction and causation are very different. Typical questions are: Prediction: Predict Y after observing X = x Causation: Predict Y after setting X = x. Causation involves predicting

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 An Introduction to Causal Mediation Analysis Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 1 Causality In the applications of statistics, many central questions

More information

Technical Track Session I: Causal Inference

Technical Track Session I: Causal Inference Impact Evaluation Technical Track Session I: Causal Inference Human Development Human Network Development Network Middle East and North Africa Region World Bank Institute Spanish Impact Evaluation Fund

More information

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX Well, it depends on where you're born: A practical application of geographically weighted regression to the study of infant mortality in the U.S. P. Johnelle Sparks and Corey S. Sparks 1 Introduction Infant

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

Combining Difference-in-difference and Matching for Panel Data Analysis

Combining Difference-in-difference and Matching for Panel Data Analysis Combining Difference-in-difference and Matching for Panel Data Analysis Weihua An Departments of Sociology and Statistics Indiana University July 28, 2016 1 / 15 Research Interests Network Analysis Social

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE ORIGINAL ARTICLE Marginal Structural Models as a Tool for Standardization Tosiya Sato and Yutaka Matsuyama Abstract: In this article, we show the general relation between standardization methods and marginal

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 260 Collaborative Targeted Maximum Likelihood For Time To Event Data Ori M. Stitelman Mark

More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information

Rencontres de l Hôtel Dieu Paris,12-13 mai C. Padilla, B Lalloue, D Zmirou-Navier, S Deguen

Rencontres de l Hôtel Dieu Paris,12-13 mai C. Padilla, B Lalloue, D Zmirou-Navier, S Deguen Association of Proximity to Polluting Industries, Deprivation and Infant Mortality - A spatial analysis using census data Lille metropolitan Area France 1,2,4 1,4 1,2,3,4 1,2 C. Padilla, B Lalloue, D Zmirou-Navier,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 250 A Machine-Learning Algorithm for Estimating and Ranking the Impact of Environmental Risk

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 176 A Simple Regression-based Approach to Account for Survival Bias in Birth Outcomes Research Eric J. Tchetgen

More information

Propensity Score Analysis with Hierarchical Data

Propensity Score Analysis with Hierarchical Data Propensity Score Analysis with Hierarchical Data Fan Li Alan Zaslavsky Mary Beth Landrum Department of Health Care Policy Harvard Medical School May 19, 2008 Introduction Population-based observational

More information

Measurement Error in Spatial Modeling of Environmental Exposures

Measurement Error in Spatial Modeling of Environmental Exposures Measurement Error in Spatial Modeling of Environmental Exposures Chris Paciorek, Alexandros Gryparis, and Brent Coull August 9, 2005 Department of Biostatistics Harvard School of Public Health www.biostat.harvard.edu/~paciorek

More information

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007) Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random

More information

eappendix: Description of mgformula SAS macro for parametric mediational g-formula

eappendix: Description of mgformula SAS macro for parametric mediational g-formula eappendix: Description of mgformula SAS macro for parametric mediational g-formula The implementation of causal mediation analysis with time-varying exposures, mediators, and confounders Introduction The

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

THE DESIGN (VERSUS THE ANALYSIS) OF EVALUATIONS FROM OBSERVATIONAL STUDIES: PARALLELS WITH THE DESIGN OF RANDOMIZED EXPERIMENTS DONALD B.

THE DESIGN (VERSUS THE ANALYSIS) OF EVALUATIONS FROM OBSERVATIONAL STUDIES: PARALLELS WITH THE DESIGN OF RANDOMIZED EXPERIMENTS DONALD B. THE DESIGN (VERSUS THE ANALYSIS) OF EVALUATIONS FROM OBSERVATIONAL STUDIES: PARALLELS WITH THE DESIGN OF RANDOMIZED EXPERIMENTS DONALD B. RUBIN My perspective on inference for causal effects: In randomized

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Discussion of Papers on the Extensions of Propensity Score

Discussion of Papers on the Extensions of Propensity Score Discussion of Papers on the Extensions of Propensity Score Kosuke Imai Princeton University August 3, 2010 Kosuke Imai (Princeton) Generalized Propensity Score 2010 JSM (Vancouver) 1 / 11 The Theme and

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure arxiv:1706.02675v2 [stat.me] 2 Apr 2018 Laura B. Balzer, Wenjing Zheng,

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors. EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight

More information

Sensitivity analysis and distributional assumptions

Sensitivity analysis and distributional assumptions Sensitivity analysis and distributional assumptions Tyler J. VanderWeele Department of Health Studies, University of Chicago 5841 South Maryland Avenue, MC 2007, Chicago, IL 60637, USA vanderweele@uchicago.edu

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 1 Odds ratios for retrospective studies 2 Odds ratios approximating the

More information

STATISTICS Relationships between variables: Correlation

STATISTICS Relationships between variables: Correlation STATISTICS 16 Relationships between variables: Correlation The gentleman pictured above is Sir Francis Galton. Galton invented the statistical concept of correlation and the use of the regression line.

More information

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS 776 1 14 G-Estimation ( G-Estimation of Structural Nested Models 14) Outline 14.1 The causal question revisited 14.2 Exchangeability revisited

More information

A Reliable Constrained Method for Identity Link Poisson Regression

A Reliable Constrained Method for Identity Link Poisson Regression A Reliable Constrained Method for Identity Link Poisson Regression Ian Marschner Macquarie University, Sydney Australasian Region of the International Biometrics Society, Taupo, NZ, Dec 2009. 1 / 16 Identity

More information

Observational Studies 4 (2018) Submitted 12/17; Published 6/18

Observational Studies 4 (2018) Submitted 12/17; Published 6/18 Observational Studies 4 (2018) 193-216 Submitted 12/17; Published 6/18 Comparing logistic and log-binomial models for causal mediation analyses of binary mediators and rare binary outcomes: evidence to

More information

Author's response to reviews

Author's response to reviews Author's response to reviews Title: Diverse risks of incident cardiovascular disease and all-cause mortality in men and women with low cash margins living alone: cohort data from 60-year-olds Authors:

More information

Selective Inference for Effect Modification

Selective Inference for Effect Modification Inference for Modification (Joint work with Dylan Small and Ashkan Ertefaie) Department of Statistics, University of Pennsylvania May 24, ACIC 2017 Manuscript and slides are available at http://www-stat.wharton.upenn.edu/~qyzhao/.

More information

Spatial Disparities in the Distribution of Parks and Green Spaces in the United States

Spatial Disparities in the Distribution of Parks and Green Spaces in the United States March 11 th, 2012 Active Living Research Conference Spatial Disparities in the Distribution of Parks and Green Spaces in the United States Ming Wen, Ph.D., University of Utah Xingyou Zhang, Ph.D., CDC

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Part IV Statistics in Epidemiology

Part IV Statistics in Epidemiology Part IV Statistics in Epidemiology There are many good statistical textbooks on the market, and we refer readers to some of these textbooks when they need statistical techniques to analyze data or to interpret

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

Causal mediation analysis: Multiple mediators

Causal mediation analysis: Multiple mediators Causal mediation analysis: ultiple mediators Trang Quynh guyen Seminar on Statistical ethods for ental Health Research Johns Hopkins Bloomberg School of Public Health 330.805.01 term 4 session 4 - ay 5,

More information

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary Correlation Patrick Breheny November 15 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 21 Introduction Descriptive statistics Generally speaking, scientific questions often

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

where x and ȳ are the sample means of x 1,, x n

where x and ȳ are the sample means of x 1,, x n y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect,

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect, 13 Appendix 13.1 Causal effects with continuous mediator and continuous outcome Consider the model of Section 3, y i = β 0 + β 1 m i + β 2 x i + β 3 x i m i + β 4 c i + ɛ 1i, (49) m i = γ 0 + γ 1 x i +

More information

Multiple Regression: Chapter 13. July 24, 2015

Multiple Regression: Chapter 13. July 24, 2015 Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)

More information

Learning Representations for Counterfactual Inference. Fredrik Johansson 1, Uri Shalit 2, David Sontag 2

Learning Representations for Counterfactual Inference. Fredrik Johansson 1, Uri Shalit 2, David Sontag 2 Learning Representations for Counterfactual Inference Fredrik Johansson 1, Uri Shalit 2, David Sontag 2 1 2 Counterfactual inference Patient Anna comes in with hypertension. She is 50 years old, Asian

More information

A comparison of 5 software implementations of mediation analysis

A comparison of 5 software implementations of mediation analysis Faculty of Health Sciences A comparison of 5 software implementations of mediation analysis Liis Starkopf, Thomas A. Gerds, Theis Lange Section of Biostatistics, University of Copenhagen Illustrative example

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Sampling bias in logistic models

Sampling bias in logistic models Sampling bias in logistic models Department of Statistics University of Chicago University of Wisconsin Oct 24, 2007 www.stat.uchicago.edu/~pmcc/reports/bias.pdf Outline Conventional regression models

More information

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk Causal Inference in Observational Studies with Non-Binary reatments Statistics Section, Imperial College London Joint work with Shandong Zhao and Kosuke Imai Cass Business School, October 2013 Outline

More information

STAT 4385 Topic 03: Simple Linear Regression

STAT 4385 Topic 03: Simple Linear Regression STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis

More information

Causality theory for policy uses of epidemiological measures

Causality theory for policy uses of epidemiological measures Chapter 6.2 Causality theory for policy uses of epidemiological measures Sander Greenland This paper provides an introduction to measures of causal effects and focuses on underlying conceptual models,

More information