2 Describing Contingency Tables



I. Probability structure of a 2-way contingency table

I.1 Contingency Tables

X, Y: categorical variables. Y is usually random (except in a case-control study) and is the response; X can be random or fixed and usually acts like a covariate. X has $I$ levels and Y has $J$ levels.

A contingency table:

                               Y
              1          2       ...      j       ...      J
        1  $n_{11}$   $n_{12}$   ...   $n_{1j}$   ...   $n_{1J}$
  X     2  $n_{21}$   $n_{22}$   ...   $n_{2j}$   ...   $n_{2J}$
       ...
        I  $n_{I1}$   $n_{I2}$   ...   $n_{Ij}$   ...   $n_{IJ}$

When both X and Y are random, the above contingency table is governed by the probability table (structure)

                    Y
          1   ...   j   ...   J
       1
  X    i        $\pi_{ij}$
       I

where $\pi_{ij} = P[X = i, Y = j]$, and we would like to use the contingency table to make inference on this structure.

For example,

  Y = severity of car accidents (Fatal, Non-fatal)
  X = seat belt use (Yes, No)

Assume X and Y have the following probability structure:

                     Y
              Fatal       Non-fatal
  X   Yes   $\pi_{11}$   $\pi_{12}$
      No    $\pi_{21}$   $\pi_{22}$

Then from $n$ car accident records we get

                     Y
              Fatal       Non-fatal
  X   Yes   $n_{11}$     $n_{12}$
      No    $n_{21}$     $n_{22}$

and
\[ (n_{11}, n_{12}, n_{21}, n_{22})^T \sim \text{multinomial}\big(n,\ \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})^T\big). \]

Marginal distributions:

Marginal distribution of Y = Severity of Car Accidents:

              Fatal       Non-fatal
  Y         $\pi_{+1}$   $\pi_{+2}$

where $\pi_{+1} = \pi_{11} + \pi_{21}$, $\pi_{+2} = \pi_{12} + \pi_{22}$.

Marginal distribution of X = Seat Belt Use (meaningful only if X is random):

              Yes         No
  X         $\pi_{1+}$   $\pi_{2+}$

where $\pi_{1+} = \pi_{11} + \pi_{12}$, $\pi_{2+} = \pi_{21} + \pi_{22}$.

Conditional distribution of Y given X = i:

Given X = i (X is at the i-th level), the distribution of Y is governed by
\[ \pi_{j|i} = P[Y = j \mid X = i] = \frac{P[X = i, Y = j]}{P[X = i]} = \frac{\pi_{ij}}{\pi_{i+}}. \]

For the car accident example:

Given X = 1 (i.e. Seat Belt Use = Yes), the conditional distribution of Y:

  Y    Fatal:      $\pi_{1|1} = \dfrac{\pi_{11}}{\pi_{11} + \pi_{12}}$
       Non-fatal:  $\pi_{2|1} = \dfrac{\pi_{12}}{\pi_{11} + \pi_{12}}$

Given X = 2 (i.e. Seat Belt Use = No), the conditional distribution of Y:

  Y    Fatal:      $\pi_{1|2} = \dfrac{\pi_{21}}{\pi_{21} + \pi_{22}}$
       Non-fatal:  $\pi_{2|2} = \dfrac{\pi_{22}}{\pi_{21} + \pi_{22}}$
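The marginal and conditional distributions defined above can be computed mechanically from a joint probability table. The following is a minimal sketch; the joint probabilities in `pi` are hypothetical values chosen only for illustration and do not come from any data set in these notes.

```python
# Hypothetical joint probabilities pi[i][j] = P[X = i, Y = j]
# (illustrative values only, not taken from the notes).
pi = [
    [0.02, 0.58],  # X = Yes (seat belt): (Fatal, Non-fatal)
    [0.05, 0.35],  # X = No
]

# Marginal distribution of X: pi_{i+} = sum over j of pi_{ij}
row_marg = [sum(row) for row in pi]

# Marginal distribution of Y: pi_{+j} = sum over i of pi_{ij}
col_marg = [sum(col) for col in zip(*pi)]

# Conditional distribution of Y given X = i: pi_{j|i} = pi_{ij} / pi_{i+}
cond_y_given_x = [[p / row_marg[i] for p in row] for i, row in enumerate(pi)]

for label, dist in zip(("Yes", "No"), cond_y_given_x):
    print(f"P[Y | X = {label}] = {dist}")
```

Each row of `cond_y_given_x` sums to 1, as a conditional distribution must.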

Main goal of analyzing contingency tables: examine the relationship (association) between X and Y.

If both X and Y are random: Q = Are X and Y independent?
\[ P[X = i, Y = j] = P[X = i]\,P[Y = j] \iff \pi_{ij} = \pi_{i+}\pi_{+j} \]

If X is fixed (such as in a clinical trial) and Y is random: Q = Are the distributions of Y the same across all levels of X? (Homogeneity.)

Note: When X, Y are random, X and Y are independent $\Leftrightarrow$ the conditional distributions of Y are the same across all levels of X (basically the same as homogeneity).

I.2 Type of Sampling: Poisson, multinomial, product-multinomial

Let us use the car accident example for illustration:

                               Y (Severity)
                            Fatal      Non-fatal
  X (Seat Belt Use)  Yes  $n_{11}$    $n_{12}$
                     No   $n_{21}$    $n_{22}$

Poisson sampling: If we collect all car accident data for this January in Wake county, then $n_{11}, n_{12}, n_{21}, n_{22}$ can be viewed as independent Poisson random variables.

Multinomial sampling: If we sample $n$ car accident records of last December in Wake county, then $n$ is fixed:

                               Y (Severity)
                            Fatal      Non-fatal
  X (Seat Belt Use)  Yes  $n_{11}$    $n_{12}$
                     No   $n_{21}$    $n_{22}$
                                                   $n$ (fixed)

and $(n_{11}, n_{12}, n_{21}, n_{22})^T$ has a multinomial distribution.

Note: With Poisson sampling, $(n_{11}, n_{12}, n_{21}, n_{22})^T \mid n$ has a multinomial distribution.

Poisson sampling $\Rightarrow$ Multinomial sampling
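The note that Poisson sampling conditioned on the total gives multinomial sampling follows from a short calculation. The sketch below assumes the four counts are independent Poisson with means $\mu_{ij}$; the symbols $\mu_{ij}$ and $\mu$ are introduced here for the derivation. It uses the fact that a sum of independent Poisson variables is Poisson with the summed mean.

```latex
\begin{aligned}
P\big[n_{11}, n_{12}, n_{21}, n_{22} \,\big|\, n\big]
  &= \frac{\prod_{i,j} e^{-\mu_{ij}}\, \mu_{ij}^{\,n_{ij}} / n_{ij}!}
          {e^{-\mu}\, \mu^{n} / n!},
  \qquad \mu = \sum_{i,j} \mu_{ij}, \quad n = \sum_{i,j} n_{ij} \\
  &= \frac{n!}{\prod_{i,j} n_{ij}!}\, \prod_{i,j} \left(\frac{\mu_{ij}}{\mu}\right)^{n_{ij}},
\end{aligned}
```

which is the multinomial probability with cell probabilities $\pi_{ij} = \mu_{ij}/\mu$.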

Product-multinomial sampling on X: Identify $n_{1+}$ seat-belt users and $n_{2+}$ non-users, let them drive for some time (say a month), then collect car accident data (not ethical). This scheme is used more in clinical trials, prospective studies, etc.

                               Y (Severity)
                            Fatal      Non-fatal
  X (Seat Belt Use)  Yes  $n_{11}$    $n_{12}$     $n_{1+}$ (fixed)
                     No   $n_{21}$    $n_{22}$     $n_{2+}$ (fixed)

Note: With multinomial sampling,
\[ (n_{11}, n_{12}, n_{21}, n_{22})^T \sim \text{multinomial}\big(n,\ \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})^T\big) \]
\[ \Rightarrow\ (n_{11}, n_{12})^T \mid n_{1+} \sim \text{multinomial}\big(n_{1+}, (\pi_{1|1}, \pi_{2|1})^T\big), \text{ independent of } (n_{21}, n_{22})^T \mid n_{2+} \sim \text{multinomial}\big(n_{2+}, (\pi_{1|2}, \pi_{2|2})^T\big). \]

Multinomial sampling $\Rightarrow$ Product-multinomial sampling on X.

Clinical trial example (Aspirin use on MI, p. 38): physicians were randomized and asked to take Aspirin or placebo every other day for 5 years:

                          MI Status
             Fatal Attack   Nonfatal Attack   No Attack
  Placebo         18              171           10,845
  Aspirin          5               99           10,933

$n_{1+} = 11{,}034$ and $n_{2+} = 11{,}037$ are fixed. This is an example of product-multinomial sampling on X.

Product-multinomial sampling on Y (used in case-control studies): Get $n_{+1}$ random fatal car accidents and $n_{+2}$ random non-fatal car accidents.

                               Y (Severity)
                            Fatal               Non-fatal
  X (Seat Belt Use)  Yes  $n_{11}$             $n_{12}$
                     No   $n_{21}$             $n_{22}$
                          $n_{+1}$ (fixed)     $n_{+2}$ (fixed)

Multinomial sampling $\Rightarrow$ Product-multinomial sampling on Y.

I.3 Diagnostic Tests: sensitivity, specificity and others

A special $2 \times 2$ table in diagnostic tests:

                                    Y (Test Result)
                                   Yes         No
  X (True Disease Status)  Yes   $n_{11}$   $n_{12}$
                           No    $n_{21}$   $n_{22}$

With multinomial sampling or product-multinomial sampling on X, we can study

  Sensitivity = $P[Y = \text{Yes} \mid X = \text{Yes}]$ (estimated by $n_{11}/n_{1+}$)
  Specificity = $P[Y = \text{No} \mid X = \text{No}]$ (estimated by $n_{22}/n_{2+}$)

Note: Manufacturers usually provide these.

With multinomial sampling or product-multinomial sampling on Y, we can study

  Positive Predictive Value = $P[X = \text{Yes} \mid Y = \text{Yes}]$ (estimated by $n_{11}/n_{+1}$)
  Negative Predictive Value = $P[X = \text{No} \mid Y = \text{No}]$ (estimated by $n_{22}/n_{+2}$)

Note: Patients are more interested in these.

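Example: Mammogram plus radiologist's review for breast cancer has the following estimated precision (p. 39):

                             Y (Test Result)
                             Yes      No
  X (Breast Cancer)  Yes     ...      ...
                     No      ...      ...

Note: Unfortunately, with this table alone we cannot determine positive or negative predictive values, because the rows are conditional distributions of the test result given disease status and carry no information about how common the disease is.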
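To see why sensitivity and specificity alone do not determine the predictive values, one can apply Bayes' theorem, which requires the prevalence $P[X = \text{Yes}]$ as an extra input. The sketch below is illustrative only: the function name `predictive_values` and the numeric inputs are assumptions made for the example, not values from the mammogram table.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem.

    sensitivity = P[Y = Yes | X = Yes]
    specificity = P[Y = No  | X = No]
    prevalence  = P[X = Yes]  (not available from the test table itself)
    """
    # P[Y = Yes] by the law of total probability
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_pos
    npv = specificity * (1 - prevalence) / (1 - p_pos)
    return ppv, npv


# Same hypothetical test, two hypothetical prevalences:
for prev in (0.01, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

The same test gives a very different positive predictive value for a rare disease than for a common one, which is why the predictive values need either the prevalence or a sampling scheme that does not fix the X margin.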

II. Measuring association in $2 \times 2$ tables

Assume X and Y have 2 levels (1, 2). If both X and Y are random, we have

                 Y
              1            2
  X   1   $\pi_{11}$   $\pi_{12}$
      2   $\pi_{21}$   $\pi_{22}$

Define
\[ \pi_1 = P[Y = 1 \mid X = 1] = \pi_{1|1} = \frac{\pi_{11}}{\pi_{1+}}, \qquad \pi_2 = P[Y = 1 \mid X = 2] = \pi_{1|2} = \frac{\pi_{21}}{\pi_{2+}}. \]

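If X is fixed, then $\pi_1 = P[Y = 1]$ for X = 1 and $\pi_2 = P[Y = 1]$ for X = 2:

                 Y
              1           2
  X   1   $\pi_1$    $1 - \pi_1$
      2   $\pi_2$    $1 - \pi_2$

1. Risk difference: $\varphi = \pi_1 - \pi_2$

  $\varphi = 0 \Leftrightarrow \pi_1 = \pi_2 \Leftrightarrow$ Independence, if X is random
  $\varphi = 0 \Leftrightarrow \pi_1 = \pi_2 \Leftrightarrow$ Homogeneity, if X is fixed

  $\varphi > 0$: positive association
  $\varphi = 0$: no association
  $\varphi < 0$: negative association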

Given data from multinomial sampling or product-multinomial sampling on X

                 Y
              1           2
  X   1   $n_{11}$   $n_{12}$
      2   $n_{21}$   $n_{22}$

$\varphi$ can be estimated by (MLE of $\varphi$)
\[ \hat\varphi = \hat\pi_1 - \hat\pi_2 = \frac{n_{11}}{n_{1+}} - \frac{n_{21}}{n_{2+}}. \]

Note: The above estimate is not good for data from case-control studies.

2. Relative risk

Depending on the magnitude of $\pi_1$ and $\pi_2$, the same $\varphi$ may have different meanings. For example:

  Case 1: $\pi_1 = 0.410$, $\pi_2 = 0.401$ $\Rightarrow$ $\varphi = 0.009$
  Case 2: $\pi_1 = 0.010$, $\pi_2 = 0.001$ $\Rightarrow$ $\varphi = 0.009$

For case 2 (rare event), the relative risk may be more meaningful:
\[ \psi = \frac{\pi_1}{\pi_2}. \]

  $\psi > 1$: positive association $\Leftrightarrow \varphi > 0$
  $\psi = 1$: no association $\Leftrightarrow \varphi = 0$
  $\psi < 1$: negative association $\Leftrightarrow \varphi < 0$

Given data from multinomial sampling or product-multinomial sampling on X

                 Y
              1           2
  X   1   $n_{11}$   $n_{12}$
      2   $n_{21}$   $n_{22}$

$\psi$ can be estimated by
\[ \hat\psi = \frac{\hat\pi_1}{\hat\pi_2} = \frac{n_{11}/n_{1+}}{n_{21}/n_{2+}}. \]

Note: The above $\hat\psi$ is not good for data from case-control studies.

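3. Odds ratio

With $\pi$ a success probability, the odds are
\[ \omega = \frac{\pi}{1 - \pi} \iff \pi = \frac{\omega}{1 + \omega}. \]
The odds ratio is
\[ \theta = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \psi\,\frac{1 - \pi_2}{1 - \pi_1}. \]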

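Properties of $\theta$

(a) $\theta$ and $\psi$ (and $\varphi$) have the same direction:

  $\theta > 1 \Leftrightarrow \psi > 1 \Leftrightarrow \varphi > 0 \Leftrightarrow \pi_1 > \pi_2$
  $\theta = 1 \Leftrightarrow \psi = 1 \Leftrightarrow \varphi = 0 \Leftrightarrow \pi_1 = \pi_2$
  $\theta < 1 \Leftrightarrow \psi < 1 \Leftrightarrow \varphi < 0 \Leftrightarrow \pi_1 < \pi_2$

(b) For a rare event, $\pi_1 \approx 0$, $\pi_2 \approx 0$, so
\[ \theta = \psi\,\frac{1 - \pi_2}{1 - \pi_1} \approx \psi, \]
independent of the type of study.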

(c) When X is random,
\[ \theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}. \]

(d) Estimate of $\theta$ given data from multinomial sampling or product-multinomial sampling on X

                 Y
              1           2
  X   1   $n_{11}$   $n_{12}$
      2   $n_{21}$   $n_{22}$

\[ \hat\theta = \frac{\hat\pi_1/(1 - \hat\pi_1)}{\hat\pi_2/(1 - \hat\pi_2)} = \frac{(n_{11}/n_{1+})/(1 - n_{11}/n_{1+})}{(n_{21}/n_{2+})/(1 - n_{21}/n_{2+})} = \frac{n_{11}n_{22}}{n_{12}n_{21}}. \]
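The three estimates $\hat\varphi$, $\hat\psi$ and $\hat\theta$ can be computed together from a $2 \times 2$ table of counts. The sketch below uses the aspirin trial with fatal and nonfatal attacks combined into "attack"; the attack counts 189 and 104 are obtained as row total minus "no attack" count, assuming the no-attack cells read 10,845 and 10,933. The function name `two_by_two_measures` is introduced here for illustration.

```python
def two_by_two_measures(n11, n12, n21, n22):
    """Risk difference, relative risk and odds ratio for a 2x2 table.

    Rows are the levels of X, columns the levels of Y; the row totals
    must be meaningful (multinomial or product-multinomial sampling on X).
    """
    p1 = n11 / (n11 + n12)  # estimate of pi_1 = P[Y = 1 | X = 1]
    p2 = n21 / (n21 + n22)  # estimate of pi_2 = P[Y = 1 | X = 2]
    return {
        "risk_difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (n11 * n22) / (n12 * n21),
    }


# Row 1 = Placebo, row 2 = Aspirin; column 1 = MI attack, column 2 = no attack.
# 189 = 11034 - 10845 and 104 = 11037 - 10933 (assumed reading of the table).
m = two_by_two_measures(189, 10845, 104, 10933)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because MI is rare in both groups, the odds ratio and the relative risk come out close to each other, in line with property (b).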

(e) Odds ratio for case-control studies

                              Y
                       Case                Control
  X   Exposure       $n_{11}$             $n_{12}$
      Non-exposure   $n_{21}$             $n_{22}$
                     $n_{+1}$ (fixed)     $n_{+2}$ (fixed)

Assume the underlying structure

                              Y
                       Case          Control
  X   Exposure       $\pi_{11}$     $\pi_{12}$
      Non-exposure   $\pi_{21}$     $\pi_{22}$

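Denote $A = I(Y = \text{case})$, $B = I(X = \text{exposure})$. Then
\[ n_{11} \sim \text{Bin}(n_{+1}, P[B \mid A]), \text{ independent of } n_{12} \sim \text{Bin}(n_{+2}, P[B \mid \bar A]). \]
We can estimate $P[B \mid A]$ and $P[B \mid \bar A]$ (the probabilities of being exposed for cases and controls). If $P[B \mid A] > P[B \mid \bar A]$, it may be reasonable to think that the exposed group is more likely to have the disease. Let us define
\[ \theta^{*} = \frac{P[B \mid A]/(1 - P[B \mid A])}{P[B \mid \bar A]/(1 - P[B \mid \bar A])}, \]
which can be estimated by
\[ \hat\theta^{*} = \frac{\frac{n_{11}}{n_{+1}} \big/ \left(1 - \frac{n_{11}}{n_{+1}}\right)}{\frac{n_{12}}{n_{+2}} \big/ \left(1 - \frac{n_{12}}{n_{+2}}\right)} = \frac{n_{11}n_{22}}{n_{12}n_{21}}, \]
suggesting $\theta^{*} = \theta$. YES!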

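Proof:
\[
\begin{aligned}
\theta^{*} &= \frac{P[B \mid A]/(1 - P[B \mid A])}{P[B \mid \bar A]/(1 - P[B \mid \bar A])}
 = \frac{P[B \mid A]/P[\bar B \mid A]}{P[B \mid \bar A]/P[\bar B \mid \bar A]} \\
 &= \frac{\dfrac{P[AB]}{P[A]} \Big/ \dfrac{P[A\bar B]}{P[A]}}{\dfrac{P[\bar A B]}{P[\bar A]} \Big/ \dfrac{P[\bar A \bar B]}{P[\bar A]}}
 = \frac{P[AB]/P[A\bar B]}{P[\bar A B]/P[\bar A \bar B]} \\
 &= \frac{P[AB]\,P[\bar A \bar B]}{P[\bar A B]\,P[A \bar B]}
 = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = \theta.
\end{aligned}
\]
$\theta$ and $\hat\theta$ are invariant to the sampling scheme!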
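The identity in the proof can be checked numerically: the odds ratio built from the conditional distributions of Y given X, the one built from the conditional distributions of X given Y, and the cross-product ratio should all agree. The joint probabilities below are hypothetical and serve only as a sanity check of the algebra.

```python
import math

# Hypothetical joint probabilities: rows X (exposed, not exposed),
# columns Y (case, control). Illustrative values only.
pi = [[0.03, 0.27],
      [0.02, 0.68]]

# Cross-product ratio pi11*pi22 / (pi12*pi21)
cross_product = pi[0][0] * pi[1][1] / (pi[0][1] * pi[1][0])

# Prospective view: odds of being a case within each exposure group
p1 = pi[0][0] / (pi[0][0] + pi[0][1])  # P[case | exposed]
p2 = pi[1][0] / (pi[1][0] + pi[1][1])  # P[case | not exposed]
theta = (p1 / (1 - p1)) / (p2 / (1 - p2))

# Retrospective (case-control) view: odds of exposure within cases/controls
q1 = pi[0][0] / (pi[0][0] + pi[1][0])  # P[exposed | case]
q2 = pi[0][1] / (pi[0][1] + pi[1][1])  # P[exposed | control]
theta_star = (q1 / (1 - q1)) / (q2 / (1 - q2))

print(cross_product, theta, theta_star)
```

All three printed values coincide up to floating-point rounding, whereas the relative risk computed from `q1` and `q2` would differ from the one computed from `p1` and `p2`.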

(f) A unifying formula for $\theta$ and $\hat\theta$: For any data obtained from Poisson sampling, multinomial sampling, or product-multinomial sampling on X or Y

                 Y
              1           2
  X   1   $n_{11}$   $n_{12}$
      2   $n_{21}$   $n_{22}$

let $\mu_{ij} = E(n_{ij})$ under the actual sampling scheme. Then
\[ \theta = \frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}, \qquad \hat\theta = \frac{n_{11}n_{22}}{n_{12}n_{21}}. \]

Example (Table 2.5, p. 42):

                      Lung Cancer
                  Case            Control
  Smoking  Yes    688               650
           No      21                59
                  709 (fixed)       709 (fixed)

\[ \hat\theta = \frac{688 \times 59}{650 \times 21} = 3.0. \]

Lung cancer is rare $\Rightarrow \hat\psi \approx \hat\theta = 3.0$. Smokers are about 3 times as likely to have lung cancer as non-smokers.

III. Partial tables, partial association, $2 \times 2 \times K$ tables

X, Y: 2 categorical variables with 2 levels. The X, Y association may not reflect a causal relation. We need to adjust for a 3rd variable Z, a confounding variable (related to both X and Y).

For example,

  X = second hand smoking
  Y = lung cancer
  Z = age, which may be related to both X and Y

                                 Lung Cancer
                                Yes           No
  Second Hand Smoking  Yes   $\pi_{11}$   $\pi_{12}$
                       No    $\pi_{21}$   $\pi_{22}$

III.1 Simpson's paradox

Example: Death penalty, Table 2.6 (p. 48). Data from Florida, X = defendant's race (W, B), Y = death penalty (Yes, No).

                 Y (Death Penalty)
                  Yes      No
  X (Race)  W      53      430
            B      15      176

  Death penalty rate for W: $\hat\pi_1 = 53/483 = 0.11$
  Death penalty rate for B: $\hat\pi_2 = 15/191 = 0.079$

\[ \hat\psi = 1.39, \qquad \hat\theta = \frac{53 \times 176}{430 \times 15} = 1.45. \]

White defendants are about 40% more likely to receive a death penalty than black defendants.

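Maybe the race of victims (Z) affects the XY association?

When Z = White, the XY table is

                 Y (Death Penalty)
                  Yes      No
  X (Race)  W      53      414     $\hat\pi_1 = 11.3\%$
            B      11       37     $\hat\pi_2 = 22.9\%$

When Z = Black, the XY table is

                 Y (Death Penalty)
                  Yes      No
  X (Race)  W       0       16     $\hat\pi_1 = 0\%$
            B       4      139     $\hat\pi_2 = 2.8\%$

Within each level of Z the death penalty rate is higher for black defendants, the reverse of the marginal association. This phenomenon is called Simpson's paradox.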

Reasons causing Simpson's paradox: Z is related to both X and Y.

1. There are more white victims than black victims.
2. Given Z = white, defendants are about 90% likely to be white (X = white).
3. Given Z = black, defendants are only about 10% likely to be white (X = white).
4. More white defendants received the death penalty.
5. When Z = white, X is more likely to be white.

III.2 Conditional Associations

Conditional odds-ratio

X, Y: 2 levels, random; Z: K levels, random or fixed. At Z = k, the observed and expected tables for XY are

                 Y                                  Y
              1            2                     1              2
  X   1   $n_{11k}$   $n_{12k}$      X   1   $\mu_{11k}$   $\mu_{12k}$
      2   $n_{21k}$   $n_{22k}$          2   $\mu_{21k}$   $\mu_{22k}$

Then

  $\theta_{XY(k)} = \dfrac{\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}$: conditional odds-ratio of XY at Z = k

  $\hat\theta_{XY(k)} = \dfrac{n_{11k}n_{22k}}{n_{12k}n_{21k}}$: estimated conditional odds-ratio

  $\theta_{XY} = \dfrac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}$: marginal odds-ratio of XY

  $\hat\theta_{XY} = \dfrac{n_{11+}n_{22+}}{n_{12+}n_{21+}}$: estimated marginal odds-ratio of XY

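For the death penalty example,

  $\hat\theta_{XY} = 1.45$

  $\hat\theta_{XY(1)} = \dfrac{53 \times 37}{414 \times 11} = 0.43$

  $\hat\theta_{XY(2)} = \dfrac{0 \times 139}{16 \times 4} = 0$

  $\hat\theta_{XY(2)} = \dfrac{0.5 \times 139.5}{16.5 \times 4.5} = 0.94$ (after adding 0.5 to each cell because of the zero count)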
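The marginal and conditional odds ratios for the death penalty example can be reproduced with a few lines of code. The cell counts below are an assumption: they are the counts of the textbook table as best recalled, and they reproduce every rate and odds ratio that survives on the slides (11.3%, 22.9%, 0%, 2.8%, 1.45, 0.43, 0.94). The helper `odds_ratio` and its `add` argument are names introduced for this sketch.

```python
def odds_ratio(table, add=0.0):
    """Sample odds ratio of a 2x2 table; `add` is added to every cell
    (e.g. add=0.5 to cope with a zero count)."""
    (a, b), (c, d) = table
    return ((a + add) * (d + add)) / ((b + add) * (c + add))


# Rows: defendant's race (W, B); columns: death penalty (Yes, No).
# Assumed counts, consistent with the percentages quoted in the notes.
white_victims = [[53, 414], [11, 37]]   # Z = White
black_victims = [[0, 16], [4, 139]]     # Z = Black

# Marginal XY table: add the two partial tables cell by cell.
marginal = [[w + b for w, b in zip(row_w, row_b)]
            for row_w, row_b in zip(white_victims, black_victims)]

print("marginal OR        :", round(odds_ratio(marginal), 2))
print("conditional OR, Z=W:", round(odds_ratio(white_victims), 2))
print("conditional OR, Z=B:", round(odds_ratio(black_victims), 2))
print("Z=B with 0.5 added :", round(odds_ratio(black_victims, add=0.5), 2))
```

The marginal odds ratio is above 1 while both conditional odds ratios are below 1, which is the reversal that defines Simpson's paradox.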

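Conditional independence vs. marginal independence

1. X, Y conditionally independent given Z $\Leftrightarrow \theta_{XY(k)} = 1$ for all $k$.

2. X, Y marginally independent $\Leftrightarrow \theta_{XY} = 1$.

3. If X, Y, Z are all random, let $\pi_{ijk} = P[X = i, Y = j, Z = k]$, $i = 1, 2$, $j = 1, 2$, $k = 1, 2, \ldots, K$. Then

\[
\begin{aligned}
X, Y \text{ conditionally independent given } Z
 &\iff P[X = i, Y = j \mid Z = k] = P[X = i \mid Z = k]\,P[Y = j \mid Z = k] \\
 &\iff \frac{\pi_{ijk}}{\pi_{++k}} = \frac{\pi_{i+k}}{\pi_{++k}} \cdot \frac{\pi_{+jk}}{\pi_{++k}} \\
 &\iff \pi_{ijk} = \frac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}} \\
 &\iff \theta_{XY(k)} = 1.
\end{aligned}
\]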

4. Example: Conditional independence $\nRightarrow$ marginal independence.

        Z = 1                        Z = 2
              Y                            Y
            S     F                      S     F
  X   A    18    12            X   A     2     8
      B    12     8                B     8    32

  $\theta_{XY(1)} = 1$: A = B        $\theta_{XY(2)} = 1$: A = B

Marginally,

              Y
            S     F
  X   A    20    20
      B    20    40

  $\theta_{XY} = 2$: A > B

5. Example: Marginal independence $\nRightarrow$ conditional independence.

        Z = 1                        Z = 2
              Y                            Y
            S     F                      S     F
  X   A     4     1            X   A     6     9
      B     9     6                B     1     4

  $\theta_{XY(1)} = 8/3$             $\theta_{XY(2)} = 8/3$

Marginally,

              Y
            S     F
  X   A    10    10
      B    10    10

  $\theta_{XY} = 1$: A = B
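Both examples can be verified directly by computing the odds ratio in each stratum and in the collapsed table. In the sketch below, the first row of the Z = 1 table in Example 4 (18, 12) is an assumption: it is the unique pair consistent with $\theta_{XY(1)} = 1$ and a marginal odds ratio of 2 given the other cells. The helper names are introduced here for illustration.

```python
from fractions import Fraction


def odds_ratio(table):
    """Exact cross-product ratio of a 2x2 table of counts."""
    (a, b), (c, d) = table
    return Fraction(a * d, b * c)


def collapse(t1, t2):
    """Marginal XY table obtained by summing two partial tables over Z."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(t1, t2)]


# Example 4: conditional independence without marginal independence.
ex4_z1 = [[18, 12], [12, 8]]   # first row reconstructed (see lead-in)
ex4_z2 = [[2, 8], [8, 32]]

# Example 5: marginal independence without conditional independence.
ex5_z1 = [[4, 1], [9, 6]]
ex5_z2 = [[6, 9], [1, 4]]

for name, z1, z2 in (("Example 4", ex4_z1, ex4_z2),
                     ("Example 5", ex5_z1, ex5_z2)):
    print(name,
          "conditional:", odds_ratio(z1), odds_ratio(z2),
          "marginal:", odds_ratio(collapse(z1, z2)))
```

Using `Fraction` keeps the odds ratios exact, so equality with 1 or 8/3 can be checked without rounding concerns.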

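Homogeneous association (in terms of $\theta$) $\Leftrightarrow$ no interaction $\Leftrightarrow$
\[ \theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(K)}. \]
This does not imply $\theta_{XY} = \theta_{XY(k)}$.

When the $\theta_{XY(k)}$ are not all the same, Z is called an effect modifier (there is interaction).

If $Y \perp Z \mid X$ or $X \perp Z \mid Y$, then
\[ \theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(K)} = \theta_{XY}. \]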

IV. Extension to $I \times J$ tables (X, Y are nominal or ordinal)

IV.1 Local odds-ratios

Suppose we have

                      Y
                j                j + 1
  X   i       $\pi_{ij}$       $\pi_{i,j+1}$
      i + 1   $\pi_{i+1,j}$    $\pi_{i+1,j+1}$

\[ \theta_{ij} = \frac{\pi_{ij}\,\pi_{i+1,j+1}}{\pi_{i,j+1}\,\pi_{i+1,j}}, \qquad i = 1, 2, \ldots, I - 1,\ j = 1, 2, \ldots, J - 1. \]

$\theta_{ij}$: conditional odds-ratio, conditional on an observation falling in one of those 4 cells.

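We can also use X = I and Y = J as the reference levels:

                  Y
              j             J
  X   i   $\pi_{ij}$    $\pi_{iJ}$
      I   $\pi_{Ij}$    $\pi_{IJ}$

\[ \alpha_{ij} = \frac{\pi_{ij}\,\pi_{IJ}}{\pi_{iJ}\,\pi_{Ij}}, \qquad i = 1, 2, \ldots, I - 1,\ j = 1, 2, \ldots, J - 1. \]

$\alpha_{ij}$: conditional odds-ratio, conditional on an observation falling in one of those 4 cells.

\[ X \perp Y \iff \theta_{ij} = 1 \text{ for all } i, j \iff \alpha_{ij} = 1 \text{ for all } i, j. \]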
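Both families of odds ratios are simple to compute from an $I \times J$ table of probabilities (or counts). The sketch below checks the last statement on a table built to satisfy independence; the marginal probabilities and the function names are hypothetical choices made for illustration.

```python
import math


def local_odds_ratios(p):
    """theta_ij from adjacent rows i, i+1 and adjacent columns j, j+1."""
    I, J = len(p), len(p[0])
    return [[p[i][j] * p[i + 1][j + 1] / (p[i][j + 1] * p[i + 1][j])
             for j in range(J - 1)] for i in range(I - 1)]


def reference_odds_ratios(p):
    """alpha_ij using the last row I and last column J as reference."""
    I, J = len(p), len(p[0])
    return [[p[i][j] * p[I - 1][J - 1] / (p[i][J - 1] * p[I - 1][j])
             for j in range(J - 1)] for i in range(I - 1)]


# Hypothetical marginals; under independence pi_ij = pi_{i+} * pi_{+j}.
row_marg = [0.2, 0.3, 0.5]
col_marg = [0.1, 0.4, 0.5]
independent = [[r * c for c in col_marg] for r in row_marg]

print(local_odds_ratios(independent))
print(reference_odds_ratios(independent))
```

An $I \times J$ table has $(I-1)(J-1)$ odds ratios in either family, and each family determines the other.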

IV.2 Summary Measure of Association (more useful when X, Y are nominal)

$R^2$ in the regression setting: In linear regression $Y = \beta_0 + X\beta + \epsilon$, we define
\[ R^2 = 1 - \frac{\text{var}(\epsilon)}{\text{var}(Y)} = \frac{\text{var}(Y) - \text{var}(Y \mid X)}{\text{var}(Y)} = \text{proportion of variation in } Y \text{ explained by } X. \]

When $\text{var}(Y \mid X)$ depends on X, we can use
\[ R^2 = \frac{\text{var}(Y) - E[\text{var}(Y \mid X)]}{\text{var}(Y)}. \]

We need to define $V(Y)$ for a nominal categorical random variable.

$V(Y)$ for a nominal categorical r.v. Y:

  Y         1            2          ...        J
  Prob   $\pi_{+1}$   $\pi_{+2}$    ...    $\pi_{+J}$

Since the numbers $1, 2, \ldots, J$ for the levels of Y are meaningless, $V(Y)$ should be defined in such a way that

1. $V(Y) \ge 0$;
2. $V(Y) = 0 \Leftrightarrow \pi_{+j} = 1$ for some $j$;
3. $V(Y)$ is invariant to the labels $1, 2, \ldots, J$; it should depend only on $\pi_{+1}, \ldots, \pi_{+J}$;
4. $V(Y)$ should take its maximum at the uniform distribution, i.e. when $\pi_{+1} = \cdots = \pi_{+J} = 1/J$.

The following definition satisfies all of the above:
\[ V(Y) = -\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j}, \]
where we define $0 \log 0 = 0$.

At X = i, we can define $V(Y \mid X = i)$ using the conditional distribution of $Y \mid X = i$:
\[ V(Y \mid X = i) = -\sum_{j=1}^{J} \pi_{j|i} \log \pi_{j|i}, \]
where $\pi_{j|i} = \pi_{ij}/\pi_{i+}$.

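By the definition of an expectation,
\[
\begin{aligned}
E\{V(Y \mid X)\} &= \sum_{i=1}^{I} V(Y \mid X = i)\,\pi_{i+} \\
 &= -\sum_{i=1}^{I} \pi_{i+} \sum_{j=1}^{J} \frac{\pi_{ij}}{\pi_{i+}} \log \frac{\pi_{ij}}{\pi_{i+}} \\
 &= -\sum_{i=1}^{I} \sum_{j=1}^{J} \pi_{ij} \log(\pi_{ij}/\pi_{i+}).
\end{aligned}
\]

Note: Even though X and Y are (nominal) categorical random variables, $V(Y \mid X)$ is a real-valued function of X, so we can use the classical definition of an expectation.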

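Uncertainty coefficient:
\[
\begin{aligned}
U &= \frac{V(Y) - E[V(Y \mid X)]}{V(Y)} \\
 &= \frac{-\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j} + \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log(\pi_{ij}/\pi_{i+})}{-\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j}} \\
 &= -\frac{\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j}}.
\end{aligned}
\]

Properties of U:

1. $0 \le U \le 1$.

Proof: Obviously, $U \le 1$.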

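\[
\begin{aligned}
U \ge 0 &\iff -\frac{\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j}} \ge 0 \\
 &\iff \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \frac{\pi_{ij}}{\pi_{i+}\pi_{+j}} \ge 0 \\
 &\iff \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}} \le 0.
\end{aligned}
\]

Define a discrete r.v. Z such that
\[ P\left[Z = z_{ij} = \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}}\right] = \pi_{ij}. \]
Then
\[ \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}} = E(\log Z). \]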

[Figure: plot of $y = \log(z)$ against $z$]

Since $\log(z)$ is concave, we have
\[ \log Z \le \log E(Z) + \frac{1}{E(Z)}\{Z - E(Z)\}. \]
Taking expectations on both sides,
\[ E(\log Z) \le \log E(Z) + 0 \iff E(\log Z) \le \log E(Z). \]
But
\[ E(Z) = \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} z_{ij} = \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{i+}\pi_{+j} = 1 \ \Rightarrow\ E(\log Z) \le 0. \]

2. $U = 0$ iff $X \perp Y$.

Proof:
\[
U = 0 \iff E(\log Z) = 0 \iff Z \text{ is a constant} \iff z_{ij} = E(Z) = 1 \iff \frac{\pi_{ij}}{\pi_{i+}\pi_{+j}} = 1 \iff \pi_{ij} = \pi_{i+}\pi_{+j} \iff X \perp Y.
\]

3. $U = 1$ iff Y is uniquely determined by X (perfect association).

Proof:
\[
U = 1 \iff E\{V(Y \mid X)\} = 0 \iff V(Y \mid X) = 0 \iff \text{at } X = i \text{, all observations of } Y \text{ fall in one cell} \iff Y \text{ is uniquely determined by } X.
\]

The above U is denoted by $U_{C|R}$. Similarly, we can define
\[ U_{R|C} = \frac{V(X) - E[V(X \mid Y)]}{V(X)} = -\frac{\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{\sum_{i=1}^{I} \pi_{i+} \log \pi_{i+}}. \]
$U_{R|C}$ has the same properties as $U_{C|R}$.

A symmetric uncertainty coefficient:
\[ U_{sym} = \frac{2\{V(X) - E[V(X \mid Y)]\}}{V(X) + V(Y)} = \frac{2\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{V(X) + V(Y)}. \]
$U_{sym}$ has the same properties as $U_{C|R}$ and $U_{R|C}$.
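The three uncertainty coefficients share the numerator $\sum_i \sum_j \pi_{ij} \log\{\pi_{ij}/(\pi_{i+}\pi_{+j})\}$ and differ only in the denominator. The sketch below computes them from a joint probability table and checks properties 2 and 3 on two hypothetical tables; the function and variable names are introduced here for illustration.

```python
import math


def entropy(probs):
    """V(.) = -sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_coefficients(pi):
    """Return U_{C|R}, U_{R|C} and U_sym for a joint probability table pi."""
    row = [sum(r) for r in pi]          # pi_{i+}
    col = [sum(c) for c in zip(*pi)]    # pi_{+j}
    # Shared numerator: sum of pi_ij * log(pi_ij / (pi_i+ * pi_+j))
    shared = sum(p * math.log(p / (row[i] * col[j]))
                 for i, r in enumerate(pi) for j, p in enumerate(r) if p > 0)
    v_x, v_y = entropy(row), entropy(col)
    return {"C|R": shared / v_y, "R|C": shared / v_x,
            "sym": 2 * shared / (v_x + v_y)}


# Hypothetical tables: independence, and Y fully determined by X.
independent = [[0.2 * c for c in (0.3, 0.7)], [0.8 * c for c in (0.3, 0.7)]]
perfect = [[0.5, 0.0], [0.0, 0.5]]

print(uncertainty_coefficients(independent))
print(uncertainty_coefficients(perfect))
```

The function divides by $V(X)$ and $V(Y)$, so it assumes neither margin is degenerate (no margin puts probability 1 on a single level).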

V. Association of 2 ordinal categorical variables

V.1 Gamma ($\gamma$) for $I \times J$ tables

X, Y both random and ordinal. We can still use U, but we want to account for the ordinal scale.

                    Y
          1   ...   j   ...   J
       1
  X    i        $\pi_{ij}$
       I

Definition of concordant/discordant pairs: 2 observations (or 2 subjects) $P_1 = (X_1, Y_1)$ and $P_2 = (X_2, Y_2)$ are said to be

1. concordant if $X_1 > X_2$ and $Y_1 > Y_2$, or $X_1 < X_2$ and $Y_1 < Y_2$;
2. discordant if $X_1 > X_2$ and $Y_1 < Y_2$, or $X_1 < X_2$ and $Y_1 > Y_2$.

Concordance probability:
\[
\begin{aligned}
\Pi_c &= P[P_1 \text{ and } P_2 \text{ are concordant}] \\
 &= P[(X_1 > X_2, Y_1 > Y_2) \cup (X_1 < X_2, Y_1 < Y_2)] \\
 &= 2P[X_1 > X_2, Y_1 > Y_2] \\
 &= 2E\{I(X_1 > X_2, Y_1 > Y_2)\} \\
 &= 2E\{E[I(X_1 > X_2, Y_1 > Y_2) \mid X_2, Y_2]\} \\
 &= 2\sum_{i=1}^{I}\sum_{j=1}^{J} P[X_1 > X_2 = i, Y_1 > Y_2 = j \mid X_2 = i, Y_2 = j]\,\pi_{ij} \\
 &= 2\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \sum_{h > i}\sum_{k > j} \pi_{hk}.
\end{aligned}
\]

Discordance probability:
\[ \Pi_d = 2\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \sum_{h > i}\sum_{k < j} \pi_{hk}. \]

Gamma ($\gamma$) association measure:
\[ \gamma = \frac{\Pi_c - \Pi_d}{\Pi_c + \Pi_d}. \]

Properties of $\gamma$:

1. $-1 \le \gamma \le 1$.
2. $\gamma > 0$: positive association; $\gamma < 0$: negative association.
3. $X \perp Y \Rightarrow \gamma = 0$ (the converse is not true in general).

Given data $\{n_{ij}\}$ from multinomial sampling (a random sample), $\gamma$ can be estimated in 2 ways:

1. Use $\hat\pi_{ij} = n_{ij}/n$ to estimate $\Pi_c$ and $\Pi_d$, then
\[ \hat\gamma = \frac{\hat\Pi_c - \hat\Pi_d}{\hat\Pi_c + \hat\Pi_d}. \]
2. Use
\[ \hat\gamma = \frac{C - D}{C + D}, \]
where C = # of concordant pairs and D = # of discordant pairs.

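3. An example (Table 2.8 on p. 57) of job satisfaction from the 2006 General Social Survey:

                     Job Satisfaction (Y)
  Age (X)      1 (NS)    2 (FS)    3 (VS)
  < 30           34        53        88
  30-50          80       174       304
  > 50           29        75       172

\[ C = 34(174 + 304 + 75 + 172) + 53(304 + 172) + \cdots = 99{,}566 \]
\[ D = 53(80 + 29) + 88(80 + 174 + 29 + 75) + \cdots = 73{,}943 \]
\[ \hat\gamma = \frac{99{,}566 - 73{,}943}{99{,}566 + 73{,}943} = 0.148 \]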
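Counting concordant and discordant pairs by hand is error-prone, so a short script is a useful check. The counts below are an assumption: they are the job satisfaction table as best recalled from the textbook, chosen to agree with the cells that survive in these notes (34, 53, 88, 304, 29, 75, 172) and with the sample size 1009. The function name `concordant_discordant` is introduced here for illustration.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an I x J table
    whose rows and columns are both in increasing order."""
    n_rows, n_cols = len(table), len(table[0])
    C = D = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for h in range(i + 1, n_rows):      # strictly higher on X
                for k in range(n_cols):
                    if k > j:                   # also higher on Y
                        C += table[i][j] * table[h][k]
                    elif k < j:                 # lower on Y
                        D += table[i][j] * table[h][k]
    return C, D


# Rows: age (<30, 30-50, >50); columns: job satisfaction (NS, FS, VS).
# Assumed counts (see lead-in).
job_sat = [[34, 53, 88],
           [80, 174, 304],
           [29, 75, 172]]

C, D = concordant_discordant(job_sat)
gamma_hat = (C - D) / (C + D)
print(C, D, round(gamma_hat, 3))
```

Pairs tied on X or on Y contribute to neither C nor D, which is why $\hat\gamma$ uses $C + D$ rather than the total number of pairs in the denominator.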

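SAS program for the job satisfaction data:

    options ls=80 ps=200 nodate nonumber;

    /* One record per cell: age group, satisfaction level, cell count. */
    /* The trailing @@ lets several records share one data line.       */
    data table2_8;
      input age $ jobsat $ count @@;
      datalines;
    <30 NS 34 <30 FS 53 <30 VS 88
    30-50 NS 80 30-50 FS 174 30-50 VS 304
    >50 NS 29 >50 FS 75 >50 VS 172
    ;

    title "Table 2.8 of Agresti (2013) using original categories";

    /* order=data keeps the category order of the input;    */
    /* weight count treats each record as `count` subjects. */
    proc freq data=table2_8 order=data;
      weight count;
      tables age*jobsat / all norow nocol nopercent;
    run;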

SAS output from the previous SAS program:

    Table 2.8 of Agresti (2013) using original categories

    The FREQ Procedure

    Table of age by jobsat

    age        jobsat
    Frequency    NS     FS     VS    Total
    <30          34     53     88     175
    30-50        80    174    304     558
    >50          29     75    172     276
    Total       143    302    564    1009

    Statistics for Table of age by jobsat

    Statistic                      DF    Value    Prob
    Chi-Square
    Likelihood Ratio Chi-Square
    Mantel-Haenszel Chi-Square
    Phi Coefficient
    Contingency Coefficient
    Cramer's V

    Statistic                            Value    ASE
    Gamma
    Kendall's Tau-b
    Stuart's Tau-c
    Somers' D C|R
    Somers' D R|C
    Pearson Correlation
    Spearman Correlation
    Lambda Asymmetric C|R
    Lambda Asymmetric R|C
    Lambda Symmetric
    Uncertainty Coefficient C|R
    Uncertainty Coefficient R|C
    Uncertainty Coefficient Symmetric

    Sample Size = 1009

V.2 Comparison of 2 ordinal categorical variables: $2 \times J$ table, Y ordinal

                            Y
  X (group)      1            2        ...       J
      1      $\pi_{11}$   $\pi_{12}$   ...   $\pi_{1J}$
      2      $\pi_{21}$   $\pi_{22}$   ...   $\pi_{2J}$

Define $Y_1$ as the r.v. $Y \mid X = 1$ and $Y_2$ as the r.v. $Y \mid X = 2$. Then
\[ \Delta = P(Y_1 > Y_2) - P(Y_2 > Y_1). \]

$Y_1$ and $Y_2$ have distributions:

  Levels        1             2         ...        J
  $Y_1$    $\pi_{1|1}$   $\pi_{2|1}$    ...   $\pi_{J|1}$
  $Y_2$    $\pi_{1|2}$   $\pi_{2|2}$    ...   $\pi_{J|2}$

Then
\[ \Delta = \sum_{j > k} \pi_{j|1}\pi_{k|2} - \sum_{j < k} \pi_{j|1}\pi_{k|2}. \]

Given data from multinomial sampling or product-multinomial sampling on X (group):

                          Y
  X (group)     1           2        ...      J
      1      $n_{11}$   $n_{12}$     ...   $n_{1J}$     $n_1$
      2      $n_{21}$   $n_{22}$     ...   $n_{2J}$     $n_2$

1. Estimate the $\pi_{j|1}$'s by $\hat\pi_{j|1} = n_{1j}/n_1$ and the $\pi_{j|2}$'s by $\hat\pi_{j|2} = n_{2j}/n_2$, then estimate $\Delta$ by
\[ \hat\Delta = \frac{\sum_{j > k} n_{1j}n_{2k} - \sum_{j < k} n_{1j}n_{2k}}{n_1 n_2}. \]

2. If we define group 1 > group 2, then $C = \sum_{j > k} n_{1j}n_{2k}$, $D = \sum_{j < k} n_{1j}n_{2k}$ and
\[ \hat\Delta = \frac{C - D}{n_1 n_2}. \]

3. An example (Table 2.9, p. 59) of shoulder tip pain after surgery:

                      Pain Scores
  Treatments     1      2      3      4      5
  Active        19      2      1      0      0
  Control        7      3      4      3      2

Estimated cumulative distributions:

                      Pain Scores
  Treatments     1       2       3       4       5
  Active       0.864   0.955   1.000   1.000   1.000
  Control      0.368   0.526   0.737   0.895   1.000

The pain scores in the two groups are stochastically ordered; the active group tended to have lower scores.

\[ C = 7(2 + 1) + 3(1) = 24. \]
\[ D = 19(3 + 4 + 3 + 2) + 2(4 + 3 + 2) + 1(3 + 2) = 251. \]
\[ \hat\Delta = \frac{24 - 251}{22 \times 19} = -0.543. \]
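The calculation of C, D and $\hat\Delta$ can be scripted for any $2 \times J$ table. In the sketch below the two rows of counts are an assumption: they are the only non-negative integer counts consistent with the sums $C = 24$ and $D = 251$ and the partial terms shown above, and they are not copied from a surviving table. Variable names are introduced here for illustration.

```python
# Counts by pain score 1..5 (assumed, reconstructed from the C and D sums).
active = [19, 2, 1, 0, 0]    # group 1
control = [7, 3, 4, 3, 2]    # group 2

# C: pairs where the active-group score exceeds the control-group score.
C = sum(a * c
        for j, a in enumerate(active)
        for k, c in enumerate(control) if j > k)

# D: pairs where the active-group score is below the control-group score.
D = sum(a * c
        for j, a in enumerate(active)
        for k, c in enumerate(control) if j < k)

n1, n2 = sum(active), sum(control)
delta_hat = (C - D) / (n1 * n2)
print(C, D, round(delta_hat, 3))
```

A negative $\hat\Delta$ means a randomly chosen active-group patient is more likely to have the lower pain score than a randomly chosen control patient, matching the stochastic ordering noted for this example.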
