Confounding and effect modification: Mantel-Haenszel estimation, testing effect homogeneity. Dankmar Böhning

Dankmar Böhning, Southampton Statistical Sciences Research Institute, University of Southampton, UK. Advanced Statistical Methods in Epidemiology, February 4-6,

Overview
1. Cohort Studies with Similar Observation Time
2. Cohort Studies with Individual, Different Observation Time
3. Case-Control Studies: Unmatched Situation
4. Case-Control Studies: Matched Situation

1. Cohort Studies with Similar Observation Time

Situation in the population:

                Case      Non-Case
Exposed         $p_1$     $1 - p_1$
Nonexposed      $p_0$     $1 - p_0$

Interest is in the risk ratio $RR = p_1/p_0$.

Situation in the sample:

                Case      Non-Case        At Risk
Exposed         $Y_1$     $n_1 - Y_1$     $n_1$
Nonexposed      $Y_0$     $n_0 - Y_0$     $n_0$

Interest is in estimating $RR = p_1/p_0$; each risk is replaced by its sample proportion:
$$\widehat{RR} = \frac{Y_1/n_1}{Y_0/n_0}$$

Example: Radiation Exposure and Cancer Occurrence

                Case    Non-Case    At Risk
Exposed         52      2820        2872
Nonexposed      6       5043        5049

$$\widehat{RR} = \frac{52/2872}{6/5049} = \frac{0.018106}{0.001188} = 15.24$$

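Tests and Confidence Intervals

Estimated variance of $\log \widehat{RR}$:
$$\widehat{Var}(\log \widehat{RR}) = 1/Y_1 - 1/n_1 + 1/Y_0 - 1/n_0$$
Estimated standard error of $\log \widehat{RR}$:
$$\widehat{SE}(\log \widehat{RR}) = \sqrt{1/Y_1 - 1/n_1 + 1/Y_0 - 1/n_0}$$
For the above example:
$$\widehat{Var}(\log \widehat{RR}) = 1/52 - 1/2872 + 1/6 - 1/5049 = 0.1854, \qquad \widehat{SE}(\log \widehat{RR}) = \sqrt{0.1854} = 0.4305$$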

$H_0$: $RR = 1$, or equivalently $\log(RR) = 0$; $H_1$: $H_0$ is false.

Statistic used for testing:
$$Z = \log(\widehat{RR}) / \widehat{SE}(\log \widehat{RR})$$
$Z$ is approximately standard normally distributed if $H_0$ is true. Test at significance level 5%: reject $H_0$ if $|Z| > 1.96$; do not reject $H_0$ if $|Z| \le 1.96$.

For the example: $Z = \log(15.24)/0.4305 = 2.7239/0.4305 = 6.33$, so $H_0$ is rejected.

Confidence Interval

The 95% CI covers the true $\log(RR)$ with 95% confidence:
$$\log(\widehat{RR}) \pm 1.96\, \widehat{SE}(\log \widehat{RR})$$
For the example: $\log(15.24) \pm 1.96 \times 0.4305 = 2.7239 \pm 0.8438 = (1.8801, 3.5677)$, and back on the relative-risk scale: $(\exp(1.8801), \exp(3.5677)) = (6.55, 35.43)$.
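The estimate, standard error, Wald test and confidence interval above can be reproduced with a few lines of Python. This is a minimal sketch that takes plain cell counts as input; the function name `risk_ratio_inference` is hypothetical and not part of Stata or any Python library.

```python
import math


def risk_ratio_inference(y1, n1, y0, n0, z_crit=1.96):
    """Risk ratio for a cohort with common follow-up, with log-scale Wald test and CI.

    y1, n1: cases and persons at risk among the exposed
    y0, n0: cases and persons at risk among the nonexposed
    """
    rr = (y1 / n1) / (y0 / n0)
    # Estimated variance of log(RR^): 1/Y1 - 1/n1 + 1/Y0 - 1/n0
    se = math.sqrt(1 / y1 - 1 / n1 + 1 / y0 - 1 / n0)
    z = math.log(rr) / se  # approximately N(0, 1) under H0: RR = 1
    lower = math.exp(math.log(rr) - z_crit * se)
    upper = math.exp(math.log(rr) + z_crit * se)
    return rr, se, z, (lower, upper)


# Radiation exposure and cancer occurrence example
rr, se, z, (lower, upper) = risk_ratio_inference(52, 2872, 6, 5049)
print(f"RR = {rr:.2f}, SE = {se:.4f}, Z = {z:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log scale and then exponentiated, which is why it is not symmetric around 15.24.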

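In STATA: the output gives a table of Cases, Noncases and Total for the Exposed, Unexposed and Total columns, with the risk in each group; point estimates with 95% confidence intervals for the risk difference, the risk ratio, the attributable fraction among the exposed (Attr. frac. ex) and the attributable fraction in the population (Attr. frac. pop); and a chi2(1) test with Pr>chi2.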

Potential Confounding and Stratification with Respect to the Confounder

Situation: a stratified table giving, for each stratum and for the total, the exposed cases and non-cases, the non-exposed cases and non-cases, and the RR. Explanation?

A more realistic example: Drinking Coffee and CHD. The table gives cases and non-cases among the exposed (coffee drinkers) and the non-exposed, and the RR, for the strata smoker and non-smoker and in total.

How to diagnose confounding? Stratify!

Situation:

Stratum   Exposed Case   Exposed Non-Case          Non-Exposed Case   Non-Exposed Non-Case      RR
1         $Y_1^{(1)}$    $n_1^{(1)} - Y_1^{(1)}$   $Y_0^{(1)}$        $n_0^{(1)} - Y_0^{(1)}$   $RR^{(1)}$
2         $Y_1^{(2)}$    $n_1^{(2)} - Y_1^{(2)}$   $Y_0^{(2)}$        $n_0^{(2)} - Y_0^{(2)}$   $RR^{(2)}$
...
k         $Y_1^{(k)}$    $n_1^{(k)} - Y_1^{(k)}$   $Y_0^{(k)}$        $n_0^{(k)} - Y_0^{(k)}$   $RR^{(k)}$
Total     $Y_1$          $n_1 - Y_1$               $Y_0$              $n_0 - Y_0$               $RR$

How should the RR be estimated? Use a weighted average of the stratum-specific estimates:
$$\widehat{RR} = \frac{w_1 \widehat{RR}^{(1)} + \cdots + w_k \widehat{RR}^{(k)}}{w_1 + \cdots + w_k}$$
Which weights?

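Mantel-Haenszel Approach
$$\widehat{RR}_{MH} = \frac{Y_1^{(1)} n_0^{(1)}/n^{(1)} + \cdots + Y_1^{(k)} n_0^{(k)}/n^{(k)}}{Y_0^{(1)} n_1^{(1)}/n^{(1)} + \cdots + Y_0^{(k)} n_1^{(k)}/n^{(k)}}$$
with $n^{(i)} = n_0^{(i)} + n_1^{(i)}$. Mantel-Haenszel weight: $w_i = Y_0^{(i)} n_1^{(i)}/n^{(i)}$. With these weights $w_i \widehat{RR}^{(i)} = Y_1^{(i)} n_0^{(i)}/n^{(i)}$, so that
$$\frac{w_1 \widehat{RR}^{(1)} + \cdots + w_k \widehat{RR}^{(k)}}{w_1 + \cdots + w_k} = \widehat{RR}_{MH}$$
Good properties: the MH estimator remains reliable even when the individual strata are small.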
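A minimal Python sketch of the Mantel-Haenszel risk ratio, which also checks numerically that it equals the weighted average of the stratum-specific RRs with weights $w_i = Y_0^{(i)} n_1^{(i)}/n^{(i)}$. The two strata are made-up counts chosen only for illustration, and the function names are hypothetical.

```python
def mh_risk_ratio(strata):
    """Mantel-Haenszel risk ratio.

    strata: list of (y1, n1, y0, n0) tuples, one per stratum, where y = cases
    and n = persons at risk, for the exposed (1) and nonexposed (0) groups.
    """
    num = sum(y1 * n0 / (n1 + n0) for y1, n1, y0, n0 in strata)
    den = sum(y0 * n1 / (n1 + n0) for y1, n1, y0, n0 in strata)
    return num / den


def weighted_average_rr(strata):
    """Same quantity as a weighted average of stratum RRs with w_i = Y0 * n1 / n."""
    weights = [y0 * n1 / (n1 + n0) for y1, n1, y0, n0 in strata]
    rrs = [(y1 / n1) / (y0 / n0) for y1, n1, y0, n0 in strata]
    return sum(w * r for w, r in zip(weights, rrs)) / sum(weights)


# Hypothetical strata (made-up counts); stratum RRs are 4.5 and 1.875
strata = [(30, 200, 10, 300), (15, 400, 20, 1000)]
print(round(mh_risk_ratio(strata), 4), round(weighted_average_rr(strata), 4))
```

Both functions return the same value, which lies between the two stratum-specific RRs.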

Illustration of the MH weights: for each stratum, the table lists exposed and non-exposed cases and non-cases together with the Mantel-Haenszel weight $w_i = Y_0^{(i)} n_1^{(i)}/n^{(i)}$.

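In STATA: the data are entered with the variables Stratum, Case, Exposure and obs (the number of observations per cell). The output gives, for each stratum, the RR with 95% confidence interval and M-H weight; the crude and the M-H combined RR; and the test of homogeneity (M-H) with chi2(1) and Pr>chi2.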

Illustration: Coffee-CHD data (variables Case, Exposure, Smoking and frequency).

Output by smoking stratum: RR with 95% confidence interval and M-H weight; crude and M-H combined RR; test of homogeneity (M-H) with chi2(1) and Pr>chi2.

Inflation, Masking and Effect Modification
Inflation (confounding): the crude RR is larger (in absolute value) than the stratified RR.
Masking (confounding): the crude RR is smaller (in absolute value) than the stratified RR.
Effect modification: the crude RR lies between the stratum-specific RRs.

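How can these situations be diagnosed? Use a heterogeneity (homogeneity) test:
Homogeneity hypothesis $H_0$: $RR^{(1)} = RR^{(2)} = \cdots = RR^{(k)}$; $H_1$: $H_0$ is false.
Test statistic:
$$\chi^2_{(k-1)} = \sum_{i=1}^{k} \frac{\left(\log \widehat{RR}^{(i)} - \log \widehat{RR}_{MH}\right)^2}{\widehat{Var}(\log \widehat{RR}^{(i)})}$$
which is approximately chi-square distributed with $k-1$ degrees of freedom under $H_0$.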
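The homogeneity statistic can be sketched in Python as follows. This sketch assumes cumulative-incidence strata, so each stratum variance is $1/Y_1 - 1/n_1 + 1/Y_0 - 1/n_0$ as in the single-table case. The upper-tail chi-square probability uses the standard closed form for integer degrees of freedom so that only the standard library is needed. The strata and the function names `chi2_sf` and `rr_homogeneity_test` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Pr(chi-square with integer df >= 1 exceeds x), standard library only."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j < df/2} h^j / j!
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc(sqrt(h)) plus a finite series in half-integer powers of h
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series


def rr_homogeneity_test(strata):
    """Chi-square test of H0: RR(1) = ... = RR(k), centred at the MH estimate.

    strata: list of (y1, n1, y0, n0) tuples (cases and persons at risk).
    """
    num = sum(y1 * n0 / (n1 + n0) for y1, n1, y0, n0 in strata)
    den = sum(y0 * n1 / (n1 + n0) for y1, n1, y0, n0 in strata)
    log_mh = math.log(num / den)
    stat = 0.0
    for y1, n1, y0, n0 in strata:
        log_rr = math.log((y1 / n1) / (y0 / n0))
        var = 1 / y1 - 1 / n1 + 1 / y0 - 1 / n0  # estimated Var(log RR^(i))
        stat += (log_rr - log_mh) ** 2 / var
    df = len(strata) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical strata (made-up counts)
stat, df, p = rr_homogeneity_test([(30, 200, 10, 300), (15, 400, 20, 1000)])
print(f"chi2({df}) = {stat:.2f}, p = {p:.3f}")
```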

Illustration of the Heterogeneity Test for CHD-Coffee: for each stratum (smoke, non-smoke), the exposed and non-exposed cases and non-cases and the stratum's contribution to $\chi^2$, with the total $\chi^2$ in the last row.

Stata output: test of homogeneity (M-H) with chi2(1) and Pr>chi2; RR by smoking stratum with 95% confidence interval and M-H weight; crude and M-H combined RR.

2. Cohort Studies with Individual, Different Observation Time

Situation:

                Event Risk    Person-Time    At Risk
Exposed         $p_1$         $T_1$          $n_1$
Nonexposed      $p_0$         $T_0$          $n_0$

Definition: person-time is the total time that the $n$ persons spend at risk during the study period.

Interest is in $RR = p_1/p_0$.

Situation:

                Events    Person-Time    At Risk
Exposed         $Y_1$     $T_1$          $n_1$
Nonexposed      $Y_0$     $T_0$          $n_0$

$$\widehat{RR} = \frac{Y_1/T_1}{Y_0/T_0}$$
$Y/T$ is also called the incidence density (ID).

Example: Smoking Exposure and CHD Occurrence. The table gives events, person-time and ID (events per 10,000 person-years) for the exposed and the nonexposed.
$$\widehat{RR} = \frac{206/T_1}{28/5710} = 1.47$$

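Tests and Confidence Intervals

Estimated variance of $\log \widehat{RR} = \log(\widehat{ID}_1/\widehat{ID}_0)$:
$$\widehat{Var}(\log \widehat{RR}) = 1/Y_1 + 1/Y_0$$
Estimated standard error of $\log \widehat{RR}$:
$$\widehat{SE}(\log \widehat{RR}) = \sqrt{1/Y_1 + 1/Y_0}$$
For the above example:
$$\widehat{Var}(\log \widehat{RR}) = 1/206 + 1/28 = 0.04057, \qquad \widehat{SE}(\log \widehat{RR}) = \sqrt{0.04057} = 0.2014$$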

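Testing

$H_0$: $RR = 1$, or equivalently $\log(RR) = 0$; $H_1$: $H_0$ is false.

Statistic used for testing:
$$Z = \log(\widehat{RR}) / \widehat{SE}(\log \widehat{RR})$$
$Z$ is approximately standard normally distributed if $H_0$ is true. Test at significance level 5%: reject $H_0$ if $|Z| > 1.96$; do not reject $H_0$ if $|Z| \le 1.96$.

For the example: $Z = \log(1.47)/0.2014 = 0.3853/0.2014 = 1.91$, so $H_0$ is not rejected at the 5% level.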

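Confidence Interval

The 95% CI covers the true $\log(RR)$ with 95% confidence:
$$\log(\widehat{RR}) \pm 1.96\, \widehat{SE}(\log \widehat{RR})$$
For the example: $\log(1.47) \pm 1.96 \times 0.2014 = (-0.0098, 0.7798)$, and back on the relative-risk scale: $(\exp(-0.0098), \exp(0.7798)) = (0.99, 2.18)$. The interval contains 1, in agreement with the test.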
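For person-time data the same Wald machinery applies, but with $\widehat{Var}(\log \widehat{RR}) = 1/Y_1 + 1/Y_0$. A minimal Python sketch follows. The exposed person-time $T_1$ of the example is not legible, so the call below assumes a hypothetical value $T_1 = 28{,}577$, chosen only so that the rate ratio is about 1.47. The standard error does not depend on this choice. The function name is also hypothetical.

```python
import math


def rate_ratio_inference(y1, t1, y0, t0, z_crit=1.96):
    """Incidence density (rate) ratio with a log-scale Wald test and CI.

    y = number of events, t = person-time, for exposed (1) and nonexposed (0).
    """
    rr = (y1 / t1) / (y0 / t0)
    se = math.sqrt(1 / y1 + 1 / y0)  # SE of log(RR^) depends only on the event counts
    z = math.log(rr) / se
    ci = (math.exp(math.log(rr) - z_crit * se), math.exp(math.log(rr) + z_crit * se))
    return rr, se, z, ci


# 206 and 28 events; T0 = 5710 from the example, T1 = 28577 is a hypothetical value
rr, se, z, ci = rate_ratio_inference(206, 28577, 28, 5710)
print(f"RR = {rr:.2f}, SE = {se:.4f}, Z = {z:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```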

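In STATA: the output gives a table of Cases, Person-time and Incidence Rate for Exposed, Unexposed and Total; point estimates with 95% confidence intervals for the incidence rate difference, the incidence rate ratio (exact), the attributable fraction among the exposed (exact) and the attributable fraction in the population; and the exact and mid-p tail probabilities Pr(k>=206) (one-sided) and 2*Pr(k>=206) (two-sided).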

Stratification with Respect to a Potential Confounder

Example: energy intake (as a surrogate measure of physical inactivity) and ischaemic heart disease. Exposed: < 2750 kcal; non-exposed: ≥ 2750 kcal. The table gives cases and person-time (P-Time) for the exposed and the non-exposed, and the RR, for each stratum and in total.

Situation:

Stratum   Exposed Cases   Exposed P-Time   Non-Exposed Cases   Non-Exposed P-Time   RR
1         $Y_1^{(1)}$     $T_1^{(1)}$      $Y_0^{(1)}$         $T_0^{(1)}$          $RR^{(1)}$
2         $Y_1^{(2)}$     $T_1^{(2)}$      $Y_0^{(2)}$         $T_0^{(2)}$          $RR^{(2)}$
...
k         $Y_1^{(k)}$     $T_1^{(k)}$      $Y_0^{(k)}$         $T_0^{(k)}$          $RR^{(k)}$
Total     $Y_1$           $T_1$            $Y_0$               $T_0$                $RR$

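How should the RR be estimated? Use a weighted average of the stratum-specific estimates:
$$\widehat{RR} = \frac{w_1 \widehat{RR}^{(1)} + \cdots + w_k \widehat{RR}^{(k)}}{w_1 + \cdots + w_k}$$
Which weights? Mantel-Haenszel approach:
$$\widehat{RR}_{MH} = \frac{Y_1^{(1)} T_0^{(1)}/T^{(1)} + \cdots + Y_1^{(k)} T_0^{(k)}/T^{(k)}}{Y_0^{(1)} T_1^{(1)}/T^{(1)} + \cdots + Y_0^{(k)} T_1^{(k)}/T^{(k)}}$$
with $T^{(i)} = T_0^{(i)} + T_1^{(i)}$. Mantel-Haenszel weight: $w_i = Y_0^{(i)} T_1^{(i)}/T^{(i)}$, for which
$$\frac{w_1 \widehat{RR}^{(1)} + \cdots + w_k \widehat{RR}^{(k)}}{w_1 + \cdots + w_k} = \widehat{RR}_{MH}$$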
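The person-time version differs from the cumulative-incidence one only in replacing persons at risk by person-time. This is a minimal Python sketch with made-up strata. It checks the identity between the MH estimator and the weighted average with $w_i = Y_0^{(i)} T_1^{(i)}/T^{(i)}$. The function name is hypothetical.

```python
def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio for person-time data.

    strata: list of (y1, t1, y0, t0) tuples: events and person-time for the
    exposed (1) and non-exposed (0) groups in each stratum.
    """
    num = sum(y1 * t0 / (t1 + t0) for y1, t1, y0, t0 in strata)
    den = sum(y0 * t1 / (t1 + t0) for y1, t1, y0, t0 in strata)
    return num / den


# Hypothetical strata (made-up numbers); stratum rate ratios are 3.0 and 1.2
strata = [(12, 1000.0, 8, 2000.0), (30, 5000.0, 20, 4000.0)]
weights = [y0 * t1 / (t1 + t0) for y1, t1, y0, t0 in strata]  # w_i = Y0 * T1 / T
rates = [(y1 / t1) / (y0 / t0) for y1, t1, y0, t0 in strata]
weighted = sum(w * r for w, r in zip(weights, rates)) / sum(weights)
print(round(mh_rate_ratio(strata), 4), round(weighted, 4))
```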

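In STATA: the data contain the variables Stratum, Exposure, number of events and person-time (abbreviated in the output as number~e and Person~e). The output gives the stratum-specific IRR with exact 95% confidence interval and M-H weight; the crude IRR (exact); the M-H combined IRR; and the test of homogeneity (M-H): chi2(2) = 1.57, with Pr>chi2.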

3. Case-Control Studies: Unmatched Situation

Situation:

                Cases        Controls
Exposed         $q_1$        $q_0$
Nonexposed      $1 - q_1$    $1 - q_0$

Interest is in $RR = p_1/p_0$, which cannot be estimated from a case-control sample, and not in $RR_e = q_1/q_0$.

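Illustration with a Hypothetical Population

                Bladder-Ca    Healthy      Total
Smoking         500           199,500      200,000
Non-smoke       500           799,500      800,000
Total           1,000         999,000      1,000,000

$$RR = p_1/p_0 = \frac{500/200{,}000}{500/800{,}000} = 4 \ne 2.504 = \frac{500/1{,}000}{199{,}500/999{,}000} = q_1/q_0 = RR_e$$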

However, consider the (disease) odds ratio, defined as
$$OR = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}$$
with $\Pr(D|E) = p_1$, $\Pr(D|NE) = p_0$, $\Pr(E|D) = q_1$, $\Pr(E|ND) = q_0$ and $p = \Pr(D)$.

Using Bayes' theorem:
$$p_1 = \Pr(D|E) = \frac{\Pr(E|D)\Pr(D)}{\Pr(E|D)\Pr(D) + \Pr(E|ND)\Pr(ND)} = \frac{q_1 p}{q_1 p + q_0(1-p)}$$
$$p_0 = \Pr(D|NE) = \frac{\Pr(NE|D)\Pr(D)}{\Pr(NE|D)\Pr(D) + \Pr(NE|ND)\Pr(ND)} = \frac{(1-q_1)p}{(1-q_1)p + (1-q_0)(1-p)}$$
Hence $p_1/(1-p_1) = q_1 p/[q_0(1-p)]$ and $p_0/(1-p_0) = (1-q_1)p/[(1-q_0)(1-p)]$, and it follows that
$$OR = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \frac{q_1/q_0}{(1-q_1)/(1-q_0)} = \frac{q_1/(1-q_1)}{q_0/(1-q_0)} = OR_e$$
Disease odds ratio = exposure odds ratio.

Illustration with the Hypothetical Population (bladder cancer and smoking, as in the previous illustration):
$$OR = \frac{500/199{,}500}{500/799{,}500} = \frac{500/500}{199{,}500/799{,}500} = OR_e = 4.008$$
Also, if disease occurrence is low (low prevalence), $OR \approx RR$ (here 4.008 versus 4).
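A quick numerical check of the identity $OR = OR_e$, using the hypothetical bladder-cancer population from the illustration. This sketch also shows that the exposure risk ratio $RR_e$ does not recover the disease risk ratio. Variable names are illustrative.

```python
# Hypothetical population (bladder cancer by smoking) from the illustration
cases_smoke, healthy_smoke = 500, 199_500
cases_non, healthy_non = 500, 799_500

p1 = cases_smoke / (cases_smoke + healthy_smoke)    # Pr(D | E)
p0 = cases_non / (cases_non + healthy_non)          # Pr(D | NE)
q1 = cases_smoke / (cases_smoke + cases_non)        # Pr(E | D)
q0 = healthy_smoke / (healthy_smoke + healthy_non)  # Pr(E | ND)

rr, rr_e = p1 / p0, q1 / q0
or_disease = (p1 / (1 - p1)) / (p0 / (1 - p0))
or_exposure = (q1 / (1 - q1)) / (q0 / (1 - q0))
print(f"RR = {rr:.3f}, RR_e = {rr_e:.3f}, OR = {or_disease:.3f}, OR_e = {or_exposure:.3f}")
```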

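Estimation of OR

Situation:

                Cases          Controls
Exposed         $X_1$          $X_0$
Nonexposed      $m_1 - X_1$    $m_0 - X_0$
Total           $m_1$          $m_0$

With $\hat q_1 = X_1/m_1$ and $\hat q_0 = X_0/m_0$:
$$\widehat{OR} = \frac{\hat q_1/(1-\hat q_1)}{\hat q_0/(1-\hat q_0)} = \frac{X_1/(m_1 - X_1)}{X_0/(m_0 - X_0)} = \frac{X_1(m_0 - X_0)}{X_0(m_1 - X_1)}$$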

Example: Sun Exposure and Lip Cancer Occurrence in a Population of … Year Old Men (table of exposed and nonexposed cases and controls)
$$\widehat{OR} = \frac{66 \times 15}{14 \times 27} = \frac{990}{378} = 2.619$$

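Tests and Confidence Intervals

Estimated variance of $\log \widehat{OR}$:
$$\widehat{Var}(\log \widehat{OR}) = \frac{1}{X_1} + \frac{1}{m_1 - X_1} + \frac{1}{X_0} + \frac{1}{m_0 - X_0}$$
Estimated standard error of $\log \widehat{OR}$:
$$\widehat{SE}(\log \widehat{OR}) = \sqrt{\frac{1}{X_1} + \frac{1}{m_1 - X_1} + \frac{1}{X_0} + \frac{1}{m_0 - X_0}}$$
For the above example:
$$\widehat{Var}(\log \widehat{OR}) = 1/66 + 1/27 + 1/14 + 1/15 = 0.1903, \qquad \widehat{SE}(\log \widehat{OR}) = \sqrt{0.1903} = 0.4362$$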

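Testing

$H_0$: $OR = 1$, or equivalently $\log(OR) = 0$; $H_1$: $H_0$ is false.

Statistic used for testing:
$$Z = \log(\widehat{OR}) / \widehat{SE}(\log \widehat{OR})$$
$Z$ is approximately standard normally distributed if $H_0$ is true. Test at significance level 5%: reject $H_0$ if $|Z| > 1.96$; do not reject $H_0$ if $|Z| \le 1.96$.

For the example: $Z = \log(2.619)/0.4362 = 0.9628/0.4362 = 2.21$, so $H_0$ is rejected.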

Confidence Interval

The 95% CI covers the true $\log(OR)$ with 95% confidence:
$$\log(\widehat{OR}) \pm 1.96\, \widehat{SE}(\log \widehat{OR})$$
For the example: $\log(2.619) \pm 1.96 \times 0.4362 = (0.1078, 1.8177)$, and back on the odds-ratio scale: $(\exp(0.1078), \exp(1.8177)) = (1.11, 6.16)$.
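A minimal Python sketch of the odds ratio with the log-scale (Woolf-type) standard error and CI used above. The cell counts 66, 27, 14 and 15 come from the variance computation. Assigning 66 of 93 cases and 14 of 29 controls to the exposed is an assumption about the table layout, but the other reading of those four counts gives the same OR and SE. The function name is hypothetical.

```python
import math


def odds_ratio_woolf(x1, m1, x0, m0, z_crit=1.96):
    """Odds ratio from an unmatched case-control table with log-scale Wald CI.

    x1 of m1 cases are exposed; x0 of m0 controls are exposed.
    """
    or_hat = (x1 * (m0 - x0)) / (x0 * (m1 - x1))
    se = math.sqrt(1 / x1 + 1 / (m1 - x1) + 1 / x0 + 1 / (m0 - x0))
    z = math.log(or_hat) / se
    ci = (math.exp(math.log(or_hat) - z_crit * se), math.exp(math.log(or_hat) + z_crit * se))
    return or_hat, se, z, ci


# Assumed layout of the lip-cancer counts: 66 of 93 cases and 14 of 29 controls exposed
or_hat, se, z, ci = odds_ratio_woolf(66, 93, 14, 29)
print(f"OR = {or_hat:.3f}, SE = {se:.4f}, Z = {z:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```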

In STATA: the output gives a table of Exposed, Unexposed and Total counts and the proportion exposed for Cases, Controls and Total; point estimates with 95% confidence intervals (Woolf) for the odds ratio, the attributable fraction among the exposed and the attributable fraction in the population; and chi2(1) = 4.22 with Pr>chi2.

Exercise: A case-control study investigates whether keeping a pet bird is a risk factor. Cases: 98 bird owners, 141 non-owners; controls: 101 bird owners, 328 non-owners.

Potential Confounding and Stratification with Respect to the Confounder

Situation: lip cancer and sun exposure, with smoking as a third factor (diagram).

Lip Cancer and Sun Exposure with Smoking as Potential Confounder: the table gives exposed and non-exposed cases and controls and the OR for the strata smoke and non-smoke and in total. Explanation?

How to diagnose confounding? Stratify!

Situation:

Stratum   Cases Exposed   Cases Non-Exp.            Controls Exposed   Controls Non-Exp.         OR
1         $X_1^{(1)}$     $m_1^{(1)} - X_1^{(1)}$   $X_0^{(1)}$        $m_0^{(1)} - X_0^{(1)}$   $OR^{(1)}$
2         $X_1^{(2)}$     $m_1^{(2)} - X_1^{(2)}$   $X_0^{(2)}$        $m_0^{(2)} - X_0^{(2)}$   $OR^{(2)}$
...
k         $X_1^{(k)}$     $m_1^{(k)} - X_1^{(k)}$   $X_0^{(k)}$        $m_0^{(k)} - X_0^{(k)}$   $OR^{(k)}$
Total     $X_1$           $m_1 - X_1$               $X_0$              $m_0 - X_0$               $OR$

How should the OR based on stratification be estimated?

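Use a weighted average of the stratum-specific estimates:
$$\widehat{OR} = \frac{w_1 \widehat{OR}^{(1)} + \cdots + w_k \widehat{OR}^{(k)}}{w_1 + \cdots + w_k}$$
Which weights? Mantel-Haenszel weight: $w_i = X_0^{(i)}(m_1^{(i)} - X_1^{(i)})/m^{(i)}$, with $m^{(i)} = m_0^{(i)} + m_1^{(i)}$.

Mantel-Haenszel approach:
$$\widehat{OR}_{MH} = \frac{X_1^{(1)}(m_0^{(1)} - X_0^{(1)})/m^{(1)} + \cdots + X_1^{(k)}(m_0^{(k)} - X_0^{(k)})/m^{(k)}}{X_0^{(1)}(m_1^{(1)} - X_1^{(1)})/m^{(1)} + \cdots + X_0^{(k)}(m_1^{(k)} - X_1^{(k)})/m^{(k)}}$$
Since $w_i \widehat{OR}^{(i)} = X_1^{(i)}(m_0^{(i)} - X_0^{(i)})/m^{(i)}$,
$$\frac{w_1 \widehat{OR}^{(1)} + \cdots + w_k \widehat{OR}^{(k)}}{w_1 + \cdots + w_k} = \widehat{OR}_{MH}$$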
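A minimal Python sketch of the Mantel-Haenszel odds ratio for stratified, unmatched case-control data. It checks the weighted-average identity with $w_i = X_0^{(i)}(m_1^{(i)} - X_1^{(i)})/m^{(i)}$. The strata are made-up counts and the function name is hypothetical.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel odds ratio for stratified (unmatched) case-control data.

    strata: list of (x1, m1, x0, m0) tuples: x1 exposed among m1 cases,
    x0 exposed among m0 controls.
    """
    num = sum(x1 * (m0 - x0) / (m1 + m0) for x1, m1, x0, m0 in strata)
    den = sum(x0 * (m1 - x1) / (m1 + m0) for x1, m1, x0, m0 in strata)
    return num / den


# Hypothetical strata (made-up counts); stratum ORs are 3.0 and 2.0
strata = [(40, 60, 20, 50), (10, 30, 8, 40)]
weights = [x0 * (m1 - x1) / (m1 + m0) for x1, m1, x0, m0 in strata]  # MH weights w_i
ors = [x1 * (m0 - x0) / (x0 * (m1 - x1)) for x1, m1, x0, m0 in strata]
weighted = sum(w * o for w, o in zip(weights, ors)) / sum(weights)
print(round(mh_odds_ratio(strata), 4), round(weighted, 4))
```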

Illustration of the MH weights: cases and controls (exposed, non-exposed) by stratum, with the weight $w_i$: smoke, $\ldots \times 24/91$; non-smoke, $\ldots \times 3/31$.

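In STATA: the data contain the variables Case, Exposure, Smoke and Pop, and the analysis is run with

cc Case Exposure [freq=pop], by(smoke)

The output gives, by smoking stratum, the OR with 95% confidence interval and M-H weight.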

It continues with exact confidence intervals for the stratum-specific and crude ORs; the M-H combined OR; the test of homogeneity (M-H): chi2(1) = 0.01, with Pr>chi2; and the test that the combined OR = 1: Mantel-Haenszel chi2(1) = 6.96, with Pr>chi2. Note that [freq=pop] is optional, i.e., the same analysis can be run on raw (individual-level) data.

Inflation, Masking and Effect Modification
Inflation (confounding): the crude OR is larger (in absolute value) than the stratified OR.
Masking (confounding): the crude OR is smaller (in absolute value) than the stratified OR.
Effect modification: the crude OR lies between the stratum-specific ORs.

How can these situations be diagnosed? Use a heterogeneity (homogeneity) test:
Homogeneity hypothesis $H_0$: $OR^{(1)} = OR^{(2)} = \cdots = OR^{(k)}$; $H_1$: $H_0$ is false.
$$\chi^2_{(k-1)} = \sum_{i=1}^{k} \frac{\left(\log \widehat{OR}^{(i)} - \log \widehat{OR}_{MH}\right)^2}{\widehat{Var}(\log \widehat{OR}^{(i)})}$$
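The homogeneity statistic for odds ratios can be sketched in Python using the stratum variance $1/X_1 + 1/(m_1 - X_1) + 1/X_0 + 1/(m_0 - X_0)$. The strata are made-up counts, and the function name is hypothetical. The p-value line uses the closed form $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which is valid only for two strata (one degree of freedom).

```python
import math


def or_homogeneity_test(strata):
    """Chi-square statistic for H0: OR(1) = ... = OR(k), centred at the MH OR.

    strata: list of (x1, m1, x0, m0) tuples; every cell count must be positive.
    Returns the statistic and its degrees of freedom k - 1.
    """
    num = sum(x1 * (m0 - x0) / (m1 + m0) for x1, m1, x0, m0 in strata)
    den = sum(x0 * (m1 - x1) / (m1 + m0) for x1, m1, x0, m0 in strata)
    log_mh = math.log(num / den)
    stat = 0.0
    for x1, m1, x0, m0 in strata:
        log_or = math.log(x1 * (m0 - x0) / (x0 * (m1 - x1)))
        var = 1 / x1 + 1 / (m1 - x1) + 1 / x0 + 1 / (m0 - x0)  # variance of log OR^(i)
        stat += (log_or - log_mh) ** 2 / var
    return stat, len(strata) - 1


stat, df = or_homogeneity_test([(40, 60, 20, 50), (10, 30, 8, 40)])
p = math.erfc(math.sqrt(stat / 2))  # upper chi-square tail, valid for df == 1 only
print(f"chi2({df}) = {stat:.3f}, p = {p:.3f}")
```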

Illustration of the Heterogeneity Test for Lip Cancer and Sun Exposure: for each stratum (smoke, non-smoke), the exposed and non-exposed cases and controls and the stratum's contribution to $\chi^2$, with the total in the last row.

The same analysis from raw data: variables D (disease), E (exposure), stratum and freq.

Output by stratum: OR with exact 95% confidence interval and M-H weight; crude OR (exact); M-H combined OR; test of homogeneity (M-H): chi2(1) = 0.01, with Pr>chi2; test that the combined OR = 1: Mantel-Haenszel chi2(1) = 6.96, with Pr>chi2.

4. Case-Control Studies: Matched Situation

Whenever a case is sampled, a comparable control is sampled, comparable with respect to the matching criteria. Examples of matching criteria are age, gender and socio-economic status (SES). Matched-pairs sampling is more elaborate: to be effective, controls are often sampled in two stages. In the first stage, controls are sampled as in the unmatched case. In the second stage, strata are built from this sample of controls according to the matching criteria, and the matched controls are sampled from these strata. Result: the data consist of pairs (case, control).

Because of the design of the matched case-control study, the data are no longer two independent samples from the diseased and the healthy population. Rather, they are one independent sample from the diseased population and a sample from the healthy population, stratified by the matching variables as realized for the cases:

Case 1 (40 yrs, man)        Control 1 (40 yrs, man)
Case 2 (33 yrs, woman)      Control 2 (33 yrs, woman)

Hence a stratified analysis is most appropriate, with each pair defining a stratum. What is the principal structure of a pair?

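Four Situations

Each pair forms a 2×2 table (exposure by case/control status) with one case and one control:
a) case exposed, control exposed: exposed row case 1, control 1; non-exposed row case 0, control 0
b) case exposed, control non-exposed: exposed row case 1, control 0; non-exposed row case 0, control 1
c) case non-exposed, control exposed: exposed row case 0, control 1; non-exposed row case 1, control 0
d) case non-exposed, control non-exposed: exposed row case 0, control 0; non-exposed row case 1, control 1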

How many pairs of each type? Four frequencies:
a pairs of type a) (case exposed, control exposed)
b pairs of type b) (case exposed, control non-exposed)
c pairs of type c) (case non-exposed, control exposed)
d pairs of type d) (case non-exposed, control non-exposed)

Treating each pair as a stratum with $m^{(i)} = 2$ and applying
$$\widehat{OR}_{MH} = \frac{X_1^{(1)}(m_0^{(1)} - X_0^{(1)})/m^{(1)} + \cdots + X_1^{(k)}(m_0^{(k)} - X_0^{(k)})/m^{(k)}}{X_0^{(1)}(m_1^{(1)} - X_1^{(1)})/m^{(1)} + \cdots + X_0^{(k)}(m_1^{(k)} - X_1^{(k)})/m^{(k)}}$$
gives
$$\widehat{OR}_{MH} = \frac{a(1 \cdot 0)/2 + b(1 \cdot 1)/2 + c(0 \cdot 0)/2 + d(0 \cdot 1)/2}{a(1 \cdot 0)/2 + b(0 \cdot 0)/2 + c(1 \cdot 1)/2 + d(0 \cdot 1)/2} = \frac{b}{c} = \frac{\#\text{ pairs with case exposed and control unexposed}}{\#\text{ pairs with case unexposed and control exposed}}$$
In a matched case-control study, the Mantel-Haenszel odds ratio is estimated by the ratio of the number of pairs with case exposed and control unexposed to the number of pairs with case unexposed and control exposed.
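The reduction to $b/c$ can be checked by expanding the pair counts into one stratum per pair and applying the general MH formula. In this Python sketch each pair is a stratum with one case ($m_1 = 1$) and one control ($m_0 = 1$). The counts $a = 132$ and $d = 6$ are derived from the proportions exposed among cases (.945) and controls (.685) reported in the Stata output for the Reye-syndrome example, together with $b = 57$ and $c = 5$. They do not affect the result. Names are hypothetical.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel OR; strata are (x1, m1, x0, m0) tuples."""
    num = sum(x1 * (m0 - x0) / (m1 + m0) for x1, m1, x0, m0 in strata)
    den = sum(x0 * (m1 - x1) / (m1 + m0) for x1, m1, x0, m0 in strata)
    return num / den


# One stratum per pair: (exposed cases, cases, exposed controls, controls)
PAIR_TYPES = {
    "a": (1, 1, 1, 1),  # case exposed, control exposed
    "b": (1, 1, 0, 1),  # case exposed, control unexposed
    "c": (0, 1, 1, 1),  # case unexposed, control exposed
    "d": (0, 1, 0, 1),  # case unexposed, control unexposed
}


def pairs_to_strata(counts):
    """Expand pair-type frequencies into a list of single-pair strata."""
    strata = []
    for kind, n in counts.items():
        strata.extend([PAIR_TYPES[kind]] * n)
    return strata


counts = {"a": 132, "b": 57, "c": 5, "d": 6}
or_mh = mh_odds_ratio(pairs_to_strata(counts))
print(or_mh, counts["b"] / counts["c"])
```

Concordant pairs (types a and d) contribute zero to both numerator and denominator, which is why only the discordant pairs matter.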

Typical presentation of paired studies:

                   Control exposed    Control unexposed    Total
Case exposed       a                  b                    a+b
Case unexposed     c                  d                    c+d
Total              a+c                b+d

$$\widehat{OR}\ (\text{conventional, unadjusted}) = \frac{(a+b)(b+d)}{(a+c)(c+d)}, \qquad \widehat{OR}_{MH} = b/c \ (\text{ratio of discordant pairs})$$

Example: Reye Syndrome and Aspirin Intake

                   Control exposed    Control unexposed    Total
Case exposed       132                57                   189
Case unexposed     5                  6                    11
Total              137                63                   200

$$\widehat{OR}\ (\text{conventional, unadjusted}) = \frac{(a+b)(b+d)}{(a+c)(c+d)} = \frac{189 \times 63}{137 \times 11} = 7.90, \qquad \widehat{OR}_{MH} = b/c = 57/5 = 11.4$$
Clearly, only the discordant pairs are required for inference. Therefore, inference is done conditionally on the discordant pairs.

What is the probability that a pair is of type (case E, control NE), given that it is discordant?
$$\pi = \Pr(\text{case E, control NE} \mid \text{pair discordant}) = \frac{\Pr(\text{case E, control NE})}{\Pr(\text{case E, control NE}) + \Pr(\text{case NE, control E})} = \frac{q_1(1-q_0)}{q_1(1-q_0) + (1-q_1)q_0}$$
Dividing numerator and denominator by $(1-q_1)q_0$:
$$\pi = \frac{q_1(1-q_0)/[(1-q_1)q_0]}{q_1(1-q_0)/[(1-q_1)q_0] + 1} = \frac{OR}{OR + 1}$$

How can π be estimated?
$$\hat\pi = \frac{\text{number of pairs with case E, control NE}}{\text{number of all discordant pairs}} = \frac{b}{b+c}$$
Now $\pi = OR/(OR+1)$, or equivalently $OR = \pi/(1-\pi)$. How can OR be estimated?
$$\widehat{OR} = \frac{\hat\pi}{1 - \hat\pi} = \frac{b/(b+c)}{1 - b/(b+c)} = \frac{b}{c}$$
which corresponds to the Mantel-Haenszel estimate used before.

Testing and CI Estimation

$H_0$: $OR = 1$, or equivalently $\pi = OR/(OR+1) = 1/2$; $H_1$: $H_0$ is false.

Since $\hat\pi$ is a proportion estimator, its standard error is
$$SE(\hat\pi) = \sqrt{\pi(1-\pi)/m},$$
which under the null hypothesis ($\pi = 1/2$) equals $\tfrac{1}{2}\sqrt{1/m}$, where $m = b + c$ is the number of discordant pairs.

Test statistic:
$$Z = \frac{\hat\pi - 1/2}{\tfrac{1}{2}\sqrt{1/m}} = \sqrt{b+c}\left(\frac{2b}{b+c} - 1\right) = \frac{b-c}{\sqrt{b+c}}$$
and $\chi^2 = Z^2 = (b-c)^2/(b+c)$ is McNemar's chi-square test statistic. In the example: $\chi^2 = (57-5)^2/62 = 2704/62 = 43.61$.
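McNemar's statistic needs only the two discordant counts. In this minimal Python sketch, the p-value is the two-sided normal tail of $Z$, which equals the upper tail of $\chi^2_1$. No continuity correction is applied, matching the formula above. The function name is hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) from the discordant pair counts."""
    z = (b - c) / math.sqrt(b + c)
    chi2 = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p = Pr(chi2_1 > chi2)
    return chi2, p_value


chi2, p_value = mcnemar(57, 5)  # Reye syndrome and aspirin example
print(f"chi2(1) = {chi2:.2f}, p = {p_value:.2e}")
```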

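Confidence Interval (again using π)
$$\hat\pi \pm 1.96\,\widehat{SE}(\hat\pi) = \hat\pi \pm 1.96\sqrt{\hat\pi(1-\hat\pi)/m}$$
To obtain odds ratios, use the transformation $OR = \pi/(1-\pi)$:
$$\frac{\hat\pi \pm 1.96\sqrt{\hat\pi(1-\hat\pi)/m}}{1 - \left(\hat\pi \pm 1.96\sqrt{\hat\pi(1-\hat\pi)/m}\right)}$$
This provides a 95% CI for the odds ratio.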

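In the example, $\hat\pi = 57/62 = 0.9194$ and $\hat\pi \pm 1.96\sqrt{\hat\pi(1-\hat\pi)/m} = 0.9194 \pm 0.0678 = (0.8516, 0.9871)$, leading to the 95% CI for the odds ratio:
$$\left[\hat\pi_L/(1-\hat\pi_L),\ \hat\pi_U/(1-\hat\pi_U)\right] = [5.7375,\ 76.72]$$
where $\hat\pi_L$ and $\hat\pi_U$ are the lower and upper limits for $\pi$.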
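This CI can be reproduced with the following minimal Python sketch. The limits for $\pi$ are computed unrounded and then transformed with $OR = \pi/(1-\pi)$. The sketch assumes both limits fall strictly between 0 and 1, which holds in this example but not for every pair of counts. The function name is hypothetical.

```python
import math


def matched_or_ci(b, c, z_crit=1.96):
    """Matched-pairs OR b/c with a CI obtained by transforming a Wald CI for pi."""
    m = b + c  # number of discordant pairs
    pi_hat = b / m
    half_width = z_crit * math.sqrt(pi_hat * (1 - pi_hat) / m)
    lo, hi = pi_hat - half_width, pi_hat + half_width
    # OR = pi / (1 - pi); assumes 0 < lo and hi < 1
    return b / c, (lo / (1 - lo), hi / (1 - hi))


or_hat, (ci_low, ci_high) = matched_or_ci(57, 5)
print(f"OR = {or_hat:.1f}, 95% CI = ({ci_low:.4f}, {ci_high:.2f})")
```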

In Stata: the output gives a table of cases by controls (Exposed, Unexposed, Total); McNemar's chi2(1) with Prob > chi2; the exact McNemar significance probability; the proportion with factor among cases (.945) and controls (.685); and point estimates with 95% confidence intervals for the difference, ratio, relative difference and odds ratio (exact).
