2 Describing Contingency Tables


I. Probability structure of a 2-way contingency table

I.1 Contingency Tables

X, Y: categorical variables. Y is usually random (except in a case-control study) and plays the role of the response; X can be random or fixed and usually acts like a covariate. X has I levels and Y has J levels.

A contingency table:

                        Y
           1     2    ...    j    ...    J
      1   n_11  n_12  ...  n_1j   ...  n_1J
  X   2   n_21  n_22  ...  n_2j   ...  n_2J
      :
      I   n_I1  n_I2  ...  n_Ij   ...  n_IJ

When both X and Y are random, the above contingency table is governed by the probability table (structure)

                   Y
           1   ...   j   ...   J
      1
      :
  X   i            π_ij
      :
      I

where $\pi_{ij} = P[X = i, Y = j]$, and we would like to use the contingency table to make inference on this structure.

For example,
  Y = severity of car accidents (Fatal, Non-fatal)
  X = seat belt use (Yes, No)

Assume X and Y have the following probability structure:

                   Y
             Fatal   Non-fatal
  X   Yes    π_11     π_12
      No     π_21     π_22

Then from n car accident records we obtain

                   Y
             Fatal   Non-fatal
  X   Yes    n_11     n_12
      No     n_21     n_22

and $(n_{11}, n_{12}, n_{21}, n_{22})^T \sim \text{multinomial}(n, \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})^T)$.

Marginal distributions:

Marginal distribution of Y = severity of car accidents:

        Fatal   Non-fatal
  Y     π_+1     π_+2

where $\pi_{+1} = \pi_{11} + \pi_{21}$ and $\pi_{+2} = \pi_{12} + \pi_{22}$.

Marginal distribution of X = seat belt use (meaningful only if X is random):

        Yes     No
  X     π_1+    π_2+

where $\pi_{1+} = \pi_{11} + \pi_{12}$ and $\pi_{2+} = \pi_{21} + \pi_{22}$.

Conditional distribution of Y given X = i:

Given X = i (X is at the i-th level), the distribution of Y is governed by
$$\pi_{j|i} = P[Y = j \mid X = i] = \frac{P[X = i, Y = j]}{P[X = i]} = \frac{\pi_{ij}}{\pi_{i+}}.$$

For the car accident example:

Given X = 1 (i.e. seat belt use = Yes), the conditional distribution of Y is
$$\pi_{1|1} = \frac{\pi_{11}}{\pi_{11} + \pi_{12}} \text{ (Fatal)}, \qquad \pi_{2|1} = \frac{\pi_{12}}{\pi_{11} + \pi_{12}} \text{ (Non-fatal)}.$$

Given X = 2 (i.e. seat belt use = No), the conditional distribution of Y is
$$\pi_{1|2} = \frac{\pi_{21}}{\pi_{21} + \pi_{22}} \text{ (Fatal)}, \qquad \pi_{2|2} = \frac{\pi_{22}}{\pi_{21} + \pi_{22}} \text{ (Non-fatal)}.$$
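The marginal and conditional distributions above are simple row and column operations on the joint probability table. The following is a minimal sketch; the joint probabilities are hypothetical values chosen only for illustration, and the function names are not from the slides.

```python
# Hypothetical joint probabilities pi[i][j] = P[X = i, Y = j] (illustrative values only).
pi = [
    [0.02, 0.58],  # X = Yes (seat belt used): Fatal, Non-fatal
    [0.05, 0.35],  # X = No
]

def row_marginals(p):
    """pi_{i+}: marginal distribution of X."""
    return [sum(row) for row in p]

def col_marginals(p):
    """pi_{+j}: marginal distribution of Y."""
    return [sum(col) for col in zip(*p)]

def conditional_y_given_x(p):
    """pi_{j|i} = pi_{ij} / pi_{i+}, one row per level of X."""
    return [[cell / sum(row) for cell in row] for row in p]

pi_x = row_marginals(pi)
pi_y = col_marginals(pi)
cond = conditional_y_given_x(pi)
print("marginal of X:", pi_x)
print("marginal of Y:", pi_y)
print("conditional of Y given X:", cond)
```

Each row of `cond` sums to 1, and comparing the rows shows whether the distribution of Y changes with the level of X.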

Main goal of analyzing contingency tables: examine the relationship (association) between X and Y.

If both X and Y are random:
  Question: are X and Y independent?
  $P[X = i, Y = j] = P[X = i]\,P[Y = j] \iff \pi_{ij} = \pi_{i+}\pi_{+j}$

If X is fixed (such as in a clinical trial) and Y is random:
  Question: are the distributions of Y the same across all levels of X? (Homogeneity.)

Note: When X and Y are both random, X and Y are independent $\iff$ the conditional distributions of Y are the same across all levels of X (basically the same as homogeneity).

I.2 Types of sampling: Poisson, multinomial, product-multinomial

Let us use the car accident example for illustration:

                            Y (Severity)
                           Fatal   Non-fatal
  X (Seat Belt Use)  Yes   n_11     n_12
                     No    n_21     n_22

Poisson sampling: If we collect car accident data for this January in Wake County, then $n_{11}, n_{12}, n_{21}, n_{22}$ can be viewed as independent Poisson random variables.

Multinomial sampling: If we sample n car accident records from last December in Wake County, then n is fixed:

                            Y (Severity)
                           Fatal   Non-fatal
  X (Seat Belt Use)  Yes   n_11     n_12
                     No    n_21     n_22
                                              n (fixed)

and $(n_{11}, n_{12}, n_{21}, n_{22})^T$ has a multinomial distribution.

Note: With Poisson sampling, $(n_{11}, n_{12}, n_{21}, n_{22})^T \mid n$ has a multinomial distribution.

Poisson sampling $\Longrightarrow$ multinomial sampling (by conditioning on n).
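The step from Poisson to multinomial sampling can be written out as a short derivation. This is a sketch under the assumption that the cell counts are independent with $n_{ij} \sim \text{Poisson}(\mu_{ij})$; the symbols $\mu_{ij}$ and $\mu = \sum_{i,j}\mu_{ij}$ are introduced here for the derivation.

```latex
% Independent Poisson counts: the total n = \sum_{i,j} n_{ij} is Poisson(\mu).
\begin{aligned}
P\bigl[\{n_{ij}\} \,\big|\, n\bigr]
  &= \frac{\prod_{i,j} e^{-\mu_{ij}} \mu_{ij}^{n_{ij}} / n_{ij}!}
          {e^{-\mu} \mu^{n} / n!} \\
  &= \frac{n!}{\prod_{i,j} n_{ij}!} \prod_{i,j} \left(\frac{\mu_{ij}}{\mu}\right)^{n_{ij}},
\end{aligned}
% which is the multinomial(n, \pi) probability with \pi_{ij} = \mu_{ij} / \mu.
```

The exponential factors cancel because $\sum_{i,j}\mu_{ij} = \mu$, and $\mu^{n} = \prod_{i,j}\mu^{n_{ij}}$ because $\sum_{i,j} n_{ij} = n$.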

Product-multinomial sampling on X: Identify $n_{1+}$ seat-belt users and $n_{2+}$ non-seat-belt users, let them drive for some time (say a month), then collect car accident data (not ethical). This design is used more in clinical trials, prospective studies, etc.

                            Y (Severity)
                           Fatal   Non-fatal
  X (Seat Belt Use)  Yes   n_11     n_12      n_1+ (fixed)
                     No    n_21     n_22      n_2+ (fixed)

Note: With multinomial sampling,
$$(n_{11}, n_{12}, n_{21}, n_{22})^T \sim \text{multinomial}(n, \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})^T)$$
$$\Longrightarrow \quad (n_{11}, n_{12})^T \mid n_{1+} \sim \text{multinomial}(n_{1+}, (\pi_{1|1}, \pi_{2|1})^T),$$
independent of
$$(n_{21}, n_{22})^T \mid n_{2+} \sim \text{multinomial}(n_{2+}, (\pi_{1|2}, \pi_{2|2})^T).$$

Multinomial sampling $\Longrightarrow$ product-multinomial sampling on X.

Clinical trial example (aspirin use and MI, p. 38): Physicians were randomized and asked to take aspirin or placebo every other day for 5 years:

                          MI Status
             Fatal Attack   Nonfatal Attack   No Attack
  Placebo         18              171           10,845
  Aspirin          5               99           10,933

$n_{1+} = 11{,}034$ and $n_{2+} = 11{,}037$ are fixed. This is an example of product-multinomial sampling on X.

Product-multinomial sampling on Y (used in case-control studies): Get $n_{+1}$ random fatal car accidents and $n_{+2}$ random non-fatal car accidents.

                            Y (Severity)
                           Fatal          Non-fatal
  X (Seat Belt Use)  Yes   n_11            n_12
                     No    n_21            n_22
                           n_+1 (fixed)    n_+2 (fixed)

Multinomial sampling $\Longrightarrow$ product-multinomial sampling on Y.

I.3 Diagnostic tests: sensitivity, specificity and others

A special 2 × 2 table arises in diagnostic tests:

                                 Y (Test Result)
                                  Yes     No
  X (True Disease Status)  Yes    n_11    n_12
                           No     n_21    n_22

With multinomial sampling or product-multinomial sampling on X, we can study

  Sensitivity = $P[Y = \text{Yes} \mid X = \text{Yes}]$ (estimated by $n_{11}/n_{1+}$)
  Specificity = $P[Y = \text{No} \mid X = \text{No}]$ (estimated by $n_{22}/n_{2+}$)

Note: Manufacturers usually provide these.

With multinomial sampling or product-multinomial sampling on Y, we can study

  Positive predictive value = $P[X = \text{Yes} \mid Y = \text{Yes}]$ (estimated by $n_{11}/n_{+1}$)
  Negative predictive value = $P[X = \text{No} \mid Y = \text{No}]$ (estimated by $n_{22}/n_{+2}$)

Note: Patients are more interested in these.
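The four estimates differ only in which margin they divide by. Below is a small sketch with hypothetical counts, assumed to come from multinomial sampling so that all four quantities are estimable; the function name and the counts are illustrative, not from the slides.

```python
def diagnostic_measures(n11, n12, n21, n22):
    """Rows: true disease status (Yes, No); columns: test result (Yes, No)."""
    return {
        "sensitivity": n11 / (n11 + n12),  # n11 / n1+
        "specificity": n22 / (n21 + n22),  # n22 / n2+
        "ppv": n11 / (n11 + n21),          # n11 / n+1
        "npv": n22 / (n12 + n22),          # n22 / n+2
    }

# Hypothetical counts: 100 diseased and 900 non-diseased subjects.
m = diagnostic_measures(n11=90, n12=10, n21=45, n22=855)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the test looks accurate by sensitivity and specificity, yet the positive predictive value is only about two thirds because the disease is uncommon in the sample.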

Example: Mammogram plus radiologist's review for breast cancer has the following estimated precision (p. 39):

                           Y (Test Result)
                            Yes     No
  X (Breast Cancer)  Yes
                     No

Note: Unfortunately, with this table we cannot determine positive or negative predictive values.

II. Measuring association in 2 × 2 tables

Assume X and Y have 2 levels (1, 2). If both X and Y are random, we have

               Y
            1       2
  X   1    π_11    π_12
      2    π_21    π_22

Define
$$\pi_1 = P[Y = 1 \mid X = 1] = \pi_{1|1} = \frac{\pi_{11}}{\pi_{1+}}, \qquad \pi_2 = P[Y = 1 \mid X = 2] = \pi_{1|2} = \frac{\pi_{21}}{\pi_{2+}}.$$

If X is fixed, then $\pi_1 = P[Y = 1]$ for X = 1 and $\pi_2 = P[Y = 1]$ for X = 2:

               Y
            1       2
  X   1    π_1    1 − π_1
      2    π_2    1 − π_2

1. Risk difference: $\phi = \pi_1 - \pi_2$

  $\phi = 0 \iff \pi_1 = \pi_2 \iff$ independence, if X is random
  $\phi = 0 \iff \pi_1 = \pi_2 \iff$ homogeneity, if X is fixed

  $\phi > 0$: positive association
  $\phi = 0$: no association
  $\phi < 0$: negative association

Given data from multinomial sampling or product-multinomial sampling on X,

               Y
            1       2
  X   1    n_11    n_12
      2    n_21    n_22

$\phi$ can be estimated by (the MLE of $\phi$)
$$\hat\phi = \hat\pi_1 - \hat\pi_2 = \frac{n_{11}}{n_{1+}} - \frac{n_{21}}{n_{2+}}.$$

Note: The above estimate is not good for data from case-control studies.

2. Relative risk

Depending on the magnitude of $\pi_1$ and $\pi_2$, the same $\phi$ may have different meanings. For example, compare Case 1 with $\pi_1 = 0.410$ and Case 2 with $\pi_1 = 0.010$, where $\pi_2$ is such that $\phi$ is the same in both cases. For Case 2 (a rare event), the relative risk may be more meaningful:
$$\psi = \frac{\pi_1}{\pi_2}.$$

  $\psi > 1$: positive association ($\phi > 0$)
  $\psi = 1$: no association ($\phi = 0$)
  $\psi < 1$: negative association ($\phi < 0$)

Given data from multinomial sampling or product-multinomial sampling on X,

               Y
            1       2
  X   1    n_11    n_12
      2    n_21    n_22

$\psi$ can be estimated by
$$\hat\psi = \frac{\hat\pi_1}{\hat\pi_2} = \frac{n_{11}/n_{1+}}{n_{21}/n_{2+}}.$$

Note: The above $\hat\psi$ is not good for data from case-control studies.

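3. Odds ratio

For a success probability $\pi$, the odds are $\omega = \dfrac{\pi}{1 - \pi}$, and conversely $\pi = \dfrac{\omega}{1 + \omega}$.

$$\theta = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \psi\,\frac{1 - \pi_2}{1 - \pi_1}.$$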

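Properties of $\theta$

(a) $\theta$ and $\psi$ (and $\phi$) have the same direction:

  $\theta > 1 \iff \psi > 1 \iff \phi > 0 \iff \pi_1 > \pi_2$
  $\theta = 1 \iff \psi = 1 \iff \phi = 0 \iff \pi_1 = \pi_2$
  $\theta < 1 \iff \psi < 1 \iff \phi < 0 \iff \pi_1 < \pi_2$

(b) For a rare event, $\pi_1 \approx 0$ and $\pi_2 \approx 0$, so
$$\theta = \psi\,\frac{1 - \pi_2}{1 - \pi_1} \approx \psi,$$
independent of the type of study.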

(c) When X is random,
$$\theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}.$$

(d) Estimate of $\theta$ given data from multinomial sampling or product-multinomial sampling on X:

               Y
            1       2
  X   1    n_11    n_12
      2    n_21    n_22

$$\hat\theta = \frac{\hat\pi_1/(1 - \hat\pi_1)}{\hat\pi_2/(1 - \hat\pi_2)} = \frac{(n_{11}/n_{1+})/(1 - n_{11}/n_{1+})}{(n_{21}/n_{2+})/(1 - n_{21}/n_{2+})} = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$
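The three estimates $\hat\phi$, $\hat\psi$ and $\hat\theta$ can be computed side by side from one 2 × 2 table. The sketch below uses the aspirin trial with fatal and nonfatal attacks pooled into a single "MI" column, so the counts 189 = 11,034 − 10,845 and 104 = 11,037 − 10,933 are derived from the row totals; the pooling and the function name are assumptions made for illustration.

```python
def two_by_two_measures(n11, n12, n21, n22):
    """Risk difference, relative risk and odds ratio for rows X = 1, 2 and columns Y = 1, 2."""
    p1 = n11 / (n11 + n12)  # estimate of pi_1 = P[Y = 1 | X = 1]
    p2 = n21 / (n21 + n22)  # estimate of pi_2 = P[Y = 1 | X = 2]
    return {
        "risk_difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (n11 * n22) / (n12 * n21),
    }

# Row 1 = placebo, row 2 = aspirin; column 1 = MI (fatal or nonfatal), column 2 = no MI.
est = two_by_two_measures(n11=189, n12=10845, n21=104, n22=10933)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

Because MI is rare in both groups, the odds ratio and the relative risk come out close to each other, as property (b) suggests.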

(e) Odds ratio for case-control studies

                              Y
                       Case           Control
  X   Exposure         n_11            n_12
      Non-exposure     n_21            n_22
                       n_+1 (fixed)    n_+2 (fixed)

Assume the underlying structure

                              Y
                       Case    Control
  X   Exposure         π_11     π_12
      Non-exposure     π_21     π_22

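Denote A = I(Y = case), B = I(X = exposure). Then
$$n_{11} \sim \text{Bin}(n_{+1}, P[B \mid A]), \quad \text{independent of} \quad n_{12} \sim \text{Bin}(n_{+2}, P[B \mid \bar A]).$$

We can estimate $P[B \mid A]$ and $P[B \mid \bar A]$ (the probabilities of being exposed for cases and for controls). If $P[B \mid A] > P[B \mid \bar A]$, it may be reasonable to think that the exposed group is more likely to have the disease. Let us define
$$\theta^* = \frac{P[B \mid A]/(1 - P[B \mid A])}{P[B \mid \bar A]/(1 - P[B \mid \bar A])},$$
which can be estimated by
$$\hat\theta^* = \frac{\frac{n_{11}}{n_{+1}}\big/\left(1 - \frac{n_{11}}{n_{+1}}\right)}{\frac{n_{12}}{n_{+2}}\big/\left(1 - \frac{n_{12}}{n_{+2}}\right)} = \frac{n_{11}n_{22}}{n_{12}n_{21}},$$
suggesting $\theta^* = \theta$. YES!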

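Proof:
$$\begin{aligned}
\theta^* &= \frac{P[B \mid A]/(1 - P[B \mid A])}{P[B \mid \bar A]/(1 - P[B \mid \bar A])}
 = \frac{P[B \mid A]/P[\bar B \mid A]}{P[B \mid \bar A]/P[\bar B \mid \bar A]} \\
&= \frac{\dfrac{P[AB]}{P[A]} \Big/ \dfrac{P[A\bar B]}{P[A]}}{\dfrac{P[\bar A B]}{P[\bar A]} \Big/ \dfrac{P[\bar A \bar B]}{P[\bar A]}}
 = \frac{P[AB]/P[A\bar B]}{P[\bar A B]/P[\bar A \bar B]} \\
&= \frac{P[AB]\,P[\bar A \bar B]}{P[\bar A B]\,P[A \bar B]}
 = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = \theta.
\end{aligned}$$

$\theta$ and $\hat\theta$ are invariant to sampling!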
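The invariance can also be checked numerically. The sketch below starts from a hypothetical joint distribution (the probabilities are made up for illustration), computes the odds ratio directly from the cells, and then recomputes it from the exposure probabilities among cases and controls, which are the quantities a case-control study can estimate.

```python
# Hypothetical joint probabilities: rows X (exposed, not exposed), columns Y (case, control).
pi = [
    [0.03, 0.27],
    [0.02, 0.68],
]

# Odds ratio from the joint cells: pi11 * pi22 / (pi12 * pi21).
theta = (pi[0][0] * pi[1][1]) / (pi[0][1] * pi[1][0])

# Exposure probabilities conditional on case status, as in a case-control design.
p_case = pi[0][0] + pi[1][0]          # P[A]
p_control = pi[0][1] + pi[1][1]       # P[not A]
p_b_given_a = pi[0][0] / p_case       # P[B | A]
p_b_given_not_a = pi[0][1] / p_control  # P[B | not A]

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

theta_star = odds(p_b_given_a) / odds(p_b_given_not_a)
print(theta, theta_star)
```

Both numbers agree up to floating-point rounding, which mirrors the algebra in the proof.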

(f) A unifying formula for $\theta$ and $\hat\theta$: For any data obtained from Poisson sampling, multinomial sampling, or product-multinomial sampling on X or Y,

               Y
            1       2
  X   1    n_11    n_12
      2    n_21    n_22

let $\mu_{ij} = E(n_{ij})$ under the actual sampling scheme. Then
$$\theta = \frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}, \qquad \hat\theta = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$

Example (Table 2.5, p. 42):

                     Lung Cancer
                  Case          Control
  Smoking  Yes    688            650
           No      21             59
                  709 (fixed)    709 (fixed)

$$\hat\theta = \frac{688 \times 59}{650 \times 21} = 3.0.$$

Lung cancer is rare $\Longrightarrow \hat\psi \approx \hat\theta = 3.0$. Smokers are about 3 times as likely to have lung cancer as non-smokers.

III. Partial tables, partial association, 2 × 2 × K tables

X, Y: 2 categorical variables with 2 levels. The X, Y association may not reflect a causal relation. We need to adjust for a 3rd variable Z, a confounding variable (related to both X and Y).

For example,
  X = second-hand smoking
  Y = lung cancer
  Z = age, which may be related to both X and Y

                                 Lung Cancer
                                 Yes     No
  Second-Hand Smoking   Yes     π_11    π_12
                        No      π_21    π_22

III.1 Simpson's paradox

Example: Death penalty, Table 2.6 (p. 48). Data from Florida, X = defendant's race (W, B), Y = death penalty (Yes, No).

                 Y (Death Penalty)
                  Yes     No
  X (Race)   W     53     430
             B     15     176

Death penalty rate for W: $\hat\pi_1 = 53/483 = 0.11$
Death penalty rate for B: $\hat\pi_2 = 15/191 = 0.079$

$$\hat\psi = 1.39, \qquad \hat\theta = \frac{53 \times 176}{430 \times 15} = 1.45.$$

White defendants are about 40% more likely to receive a death penalty than black defendants.

Maybe the race of the victims (Z) affects the XY association?

When Z = White, the XY table is

                 Y (Death Penalty)
                  Yes     No
  X (Race)   W     53     414     $\hat\pi_1$ = 11.3%
             B     11      37     $\hat\pi_2$ = 22.9%

When Z = Black, the XY table is

                 Y (Death Penalty)
                  Yes     No
  X (Race)   W      0      16     $\hat\pi_1$ = 0%
             B      4     139     $\hat\pi_2$ = 2.8%

Within each level of Z the direction of the association is reversed. This phenomenon is called Simpson's paradox.

Reasons causing Simpson's paradox: Z is related to both X and Y.

1. There are more white victims than black victims.
2. Given Z = white, defendants are about 90% likely to be X = white.
3. Given Z = black, defendants are only about 10% likely to be X = white.
4. More white defendants received the death penalty.
5. When Z = white, X is more likely to be white.

III.2 Conditional associations

Conditional odds ratio

X, Y: 2 levels, random; Z: K levels, random or fixed. At Z = k, the observed and expected tables for XY are

  Observed:                       Expected:
               Y                               Y
            1        2                      1        2
  X   1    n_11k    n_12k         X   1    µ_11k    µ_12k
      2    n_21k    n_22k             2    µ_21k    µ_22k

Then

  Conditional odds ratio of XY at Z = k:   $\theta_{XY(k)} = \dfrac{\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}$

  Estimated conditional odds ratio:        $\hat\theta_{XY(k)} = \dfrac{n_{11k}n_{22k}}{n_{12k}n_{21k}}$

  Marginal odds ratio of XY:               $\theta_{XY} = \dfrac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}$

  Estimated marginal odds ratio of XY:     $\hat\theta_{XY} = \dfrac{n_{11+}n_{22+}}{n_{12+}n_{21+}}$

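For the death penalty example,

  $\hat\theta_{XY} = 1.45$

  $\hat\theta_{XY(1)} = \dfrac{53 \times 37}{414 \times 11} = 0.43$

  $\hat\theta_{XY(2)} = \dfrac{0 \times 139}{16 \times 4} = 0$

  $\hat\theta_{XY(2)} = \dfrac{0.5 \times 139.5}{16.5 \times 4.5} = 0.94$ (after adding 0.5 to each cell to avoid the zero count)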
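The reversal between the marginal and conditional odds ratios can be reproduced in a few lines. This is a sketch, assuming the cell counts of the death penalty table are 53/414 and 11/37 for white victims and 0/16 and 4/139 for black victims (these reproduce the rates and odds ratios quoted on the slides); the function name and the `add` argument are illustrative choices.

```python
def odds_ratio(table, add=0.0):
    """Sample odds ratio n11*n22 / (n12*n21); `add` is an optional constant added to every cell."""
    (a, b), (c, d) = table
    return ((a + add) * (d + add)) / ((b + add) * (c + add))

# Assumed counts. Rows: defendant W, B; columns: death penalty Yes, No.
partial = {
    "white victims": [[53, 414], [11, 37]],
    "black victims": [[0, 16], [4, 139]],
}

# Collapse over the victim's race to obtain the marginal XY table.
marginal = [[sum(t[i][j] for t in partial.values()) for j in range(2)] for i in range(2)]

print("marginal table:", marginal)
print("marginal OR:", round(odds_ratio(marginal), 2))
for name, table in partial.items():
    print(name, "OR:", round(odds_ratio(table), 2),
          "OR with 0.5 added:", round(odds_ratio(table, add=0.5), 2))
```

The marginal odds ratio is above 1 while both conditional odds ratios are below 1, which is the pattern described as Simpson's paradox.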

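Conditional independence vs. marginal independence

1. X, Y conditionally independent given Z $\iff \theta_{XY(k)} = 1$ for all k.

2. X, Y marginally independent $\iff \theta_{XY} = 1$.

3. If X, Y, Z are all random, let $\pi_{ijk} = P[X = i, Y = j, Z = k]$, $i = 1, 2$, $j = 1, 2$, $k = 1, 2, \ldots, K$. Then

  X, Y conditionally independent given Z
  $\iff P[X = i, Y = j \mid Z = k] = P[X = i \mid Z = k]\,P[Y = j \mid Z = k]$
  $\iff \dfrac{\pi_{ijk}}{\pi_{++k}} = \dfrac{\pi_{i+k}}{\pi_{++k}} \cdot \dfrac{\pi_{+jk}}{\pi_{++k}}$
  $\iff \pi_{ijk} = \dfrac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}}$
  $\iff \theta_{XY(k)} = 1$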

4. Example: Conditional independence $\not\Longrightarrow$ marginal independence.

  Z = 1:                          Z = 2:
              Y                               Y
            S     F                         S     F
  X   A    18    12               X   A     2     8
      B    12     8                   B     8    32

  $\theta_{XY(1)} = 1$ (A = B)              $\theta_{XY(2)} = 1$ (A = B)

Marginally,

              Y
            S     F
  X   A    20    20
      B    20    40

  $\theta_{XY} = 2$ (A > B)

5. Example: Marginal independence $\not\Longrightarrow$ conditional independence.

  Z = 1:                          Z = 2:
              Y                               Y
            S     F                         S     F
  X   A     4     1               X   A     6     9
      B     9     6                   B     1     4

  $\theta_{XY(1)} = 8/3$                    $\theta_{XY(2)} = 8/3$

Marginally,

              Y
            S     F
  X   A    10    10
      B    10    10

  $\theta_{XY} = 1$ (A = B)
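Both examples can be verified exactly by collapsing the partial tables over Z. The sketch below uses exact fractions; the first row of the Z = 1 table in example 4 is taken to be (18, 12), the values implied by $\theta_{XY(1)} = 1$ and $\theta_{XY} = 2$, and should be read as an assumption. The helper names are illustrative.

```python
from fractions import Fraction

def odds_ratio(t):
    """Exact odds ratio n11*n22 / (n12*n21) of a 2 x 2 table."""
    return Fraction(t[0][0] * t[1][1], t[0][1] * t[1][0])

def collapse(tables):
    """Sum the partial tables over Z to get the marginal XY table."""
    return [[sum(t[i][j] for t in tables) for j in range(2)] for i in range(2)]

# Partial tables at Z = 1 and Z = 2; rows A, B; columns S, F.
example4 = [[[18, 12], [12, 8]], [[2, 8], [8, 32]]]   # A row at Z = 1 is an assumed reconstruction
example5 = [[[4, 1], [9, 6]], [[6, 9], [1, 4]]]

for name, tables in (("example 4", example4), ("example 5", example5)):
    conditional = [odds_ratio(t) for t in tables]
    marginal = odds_ratio(collapse(tables))
    print(name, "conditional:", conditional, "marginal:", marginal)
```

Example 4 has conditional odds ratios of 1 but a marginal odds ratio of 2; example 5 has conditional odds ratios of 8/3 but a marginal odds ratio of 1.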

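Homogeneous association (in terms of $\theta$), i.e. no interaction:
$$\theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(K)} \quad \not\Longrightarrow \quad \theta_{XY} = \theta_{XY(k)}.$$

When the $\theta_{XY(k)}$ are not all the same, Z is called an effect modifier (there is interaction).

If $Y \perp Z \mid X$ or $X \perp Z \mid Y$, then
$$\theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(K)} = \theta_{XY}.$$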

IV. Extension to I × J tables (X, Y are nominal or ordinal)

IV.1 Local odds ratios

Suppose we have

                  Y
              j            j + 1
  X   i       π_ij         π_i,j+1
      i + 1   π_i+1,j      π_i+1,j+1

$$\theta_{ij} = \frac{\pi_{ij}\,\pi_{i+1,j+1}}{\pi_{i,j+1}\,\pi_{i+1,j}}, \qquad i = 1, 2, \ldots, I - 1, \quad j = 1, 2, \ldots, J - 1.$$

$\theta_{ij}$: the odds ratio conditional on an observation falling in one of those 4 cells.

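We can also use X = I and Y = J as the reference levels:

                  Y
              j        J
  X   i      π_ij     π_iJ
      I      π_Ij     π_IJ

$$\alpha_{ij} = \frac{\pi_{ij}\,\pi_{IJ}}{\pi_{iJ}\,\pi_{Ij}}, \qquad i = 1, 2, \ldots, I - 1, \quad j = 1, 2, \ldots, J - 1.$$

$\alpha_{ij}$: the odds ratio conditional on an observation falling in one of those 4 cells.

$X \perp Y \iff \theta_{ij} = 1$ for all i, j $\iff \alpha_{ij} = 1$ for all i, j.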

IV.2 Summary measure of association (more useful when X, Y are nominal)

$R^2$ in the regression setting: In the linear regression $Y = \beta_0 + X\beta + \epsilon$, we define
$$R^2 = 1 - \frac{\text{var}(\epsilon)}{\text{var}(Y)} = \frac{\text{var}(Y) - \text{var}(Y \mid X)}{\text{var}(Y)} = \text{proportion of variation in } Y \text{ explained by } X.$$

When $\text{var}(Y \mid X)$ depends on X, we can use
$$R^2 = \frac{\text{var}(Y) - E[\text{var}(Y \mid X)]}{\text{var}(Y)}.$$

We need to define V(Y) for a nominal categorical random variable.

V(Y) for a nominal categorical random variable Y:

  Y       1      2     ...    J
  Prob   π_+1   π_+2   ...   π_+J

Since the numbers 1, 2, ..., J for the levels of Y are meaningless, V(Y) should be defined in such a way that

1. $V(Y) \ge 0$;
2. $V(Y) = 0 \iff \pi_{+j} = 1$ for some j;
3. V(Y) is invariant to the labels 1, 2, ..., J; it should depend only on $\pi_{+1}, \ldots, \pi_{+J}$;
4. V(Y) takes its maximum at the uniform distribution, i.e. when $\pi_{+1} = \cdots = \pi_{+J} = 1/J$.

The following definition satisfies all of the above:
$$V(Y) = -\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j},$$
where we define $0 \log 0 = 0$.

At X = i, we can define $V(Y \mid X = i)$ using the conditional distribution of $Y \mid X = i$:
$$V(Y \mid X = i) = -\sum_{j=1}^{J} \pi_{j|i} \log \pi_{j|i}, \qquad \text{where } \pi_{j|i} = \frac{\pi_{ij}}{\pi_{i+}}.$$

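By the definition of an expectation,
$$\begin{aligned}
E\{V(Y \mid X)\} &= \sum_{i=1}^{I} V(Y \mid X = i)\,\pi_{i+} \\
&= -\sum_{i=1}^{I} \pi_{i+} \sum_{j=1}^{J} \frac{\pi_{ij}}{\pi_{i+}} \log \frac{\pi_{ij}}{\pi_{i+}} \\
&= -\sum_{i=1}^{I} \sum_{j=1}^{J} \pi_{ij} \log(\pi_{ij}/\pi_{i+}).
\end{aligned}$$

Note: Even though X and Y are (nominal) categorical random variables, $V(Y \mid X)$ is a real-valued function of X, so we can use the classical definition of an expectation.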

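Uncertainty coefficient:
$$\begin{aligned}
U &= \frac{V(Y) - E[V(Y \mid X)]}{V(Y)} \\
&= \frac{-\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j} + \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log(\pi_{ij}/\pi_{i+})}{-\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j}} \\
&= -\frac{\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{\sum_{j=1}^{J} \pi_{+j} \log \pi_{+j}}.
\end{aligned}$$

Properties of U:

1. $0 \le U \le 1$.

Proof: Obviously, $U \le 1$.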

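$$U \ge 0 \iff \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \frac{\pi_{ij}}{\pi_{i+}\pi_{+j}} \ge 0 \iff \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}} \le 0.$$

Define a discrete random variable Z such that
$$P\left[Z = z_{ij} = \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}}\right] = \pi_{ij}.$$

Then
$$\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}} = E(\log Z).$$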

[Figure: plot of y = log(z) against z]

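Since log(z) is concave, we have
$$\log Z \le \log E(Z) + \frac{1}{E(Z)}\{Z - E(Z)\}.$$

Taking expectations on both sides,
$$E(\log Z) \le \log E(Z) + 0 \quad \Longrightarrow \quad E(\log Z) \le \log E(Z).$$

But
$$E(Z) = \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} z_{ij} = \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{i+}\pi_{+j} = 1 \quad \Longrightarrow \quad E(\log Z) \le 0.$$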

2. U = 0 iff $X \perp Y$.

Proof:
$$U = 0 \iff E(\log Z) = 0 \iff Z \text{ is a constant} \iff z_{ij} = E(Z) = 1 \iff \frac{\pi_{i+}\pi_{+j}}{\pi_{ij}} = 1 \iff \pi_{ij} = \pi_{i+}\pi_{+j} \iff X \perp Y.$$

3. U = 1 iff Y is uniquely determined by X (perfect association).

Proof:
U = 1 $\iff E\{V(Y \mid X)\} = 0 \iff V(Y \mid X) = 0 \iff$ at X = i, all observations of Y fall in one cell $\iff$ Y is uniquely determined by X.

The above U is denoted by $U(C|R)$. Similarly, we can define
$$U(R|C) = \frac{V(X) - E[V(X \mid Y)]}{V(X)} = -\frac{\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{\sum_{i=1}^{I} \pi_{i+} \log \pi_{i+}}.$$

$U(R|C)$ has the same properties as $U(C|R)$.

A symmetric uncertainty coefficient:
$$U_{sym} = \frac{2\{V(X) - E[V(X \mid Y)]\}}{V(X) + V(Y)} = \frac{2\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \log \dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{V(X) + V(Y)}.$$

$U_{sym}$ has the same properties as $U(C|R)$ and $U(R|C)$.
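The three uncertainty coefficients share the same numerator, $\sum_i\sum_j \pi_{ij}\log\{\pi_{ij}/(\pi_{i+}\pi_{+j})\}$, and differ only in the normalizing entropy. A minimal sketch follows; the probability tables are hypothetical and the function names are illustrative. Cells with zero probability are skipped, following the convention $0 \log 0 = 0$.

```python
import math

def entropy(probs):
    """V(.) = -sum p log p, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_coefficients(pi):
    """Return U(C|R), U(R|C) and U_sym for a table of joint probabilities pi[i][j]."""
    row = [sum(r) for r in pi]          # pi_{i+}
    col = [sum(c) for c in zip(*pi)]    # pi_{+j}
    info = sum(
        p * math.log(p / (row[i] * col[j]))
        for i, r in enumerate(pi)
        for j, p in enumerate(r)
        if p > 0
    )
    v_x, v_y = entropy(row), entropy(col)
    return {"U(C|R)": info / v_y, "U(R|C)": info / v_x, "U_sym": 2 * info / (v_x + v_y)}

independent = [[0.12, 0.28], [0.18, 0.42]]   # pi_ij = pi_i+ * pi_+j
perfect = [[0.5, 0.0], [0.0, 0.5]]           # Y determined by X
partial = [[0.3, 0.2], [0.1, 0.4]]           # something in between

for name, table in (("independent", independent), ("perfect", perfect), ("partial", partial)):
    print(name, uncertainty_coefficients(table))
```

The independent table gives values of essentially 0, the perfectly associated table gives 1, and the intermediate table falls strictly between, in line with properties 1 to 3.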

V. Association of 2 ordinal categorical variables

V.1 Gamma ($\gamma$) for I × J tables

X, Y both random and ordinal. We can still use U, but we want to account for the ordinal scale.

                   Y
           1   ...   j   ...   J
      1
      :
  X   i            π_ij
      :
      I

Definition of concordant/discordant pairs: 2 observations (or 2 subjects) $P_1 = (X_1, Y_1)$ and $P_2 = (X_2, Y_2)$ are said to be

1. concordant if $X_1 > X_2$ and $Y_1 > Y_2$, or $X_1 < X_2$ and $Y_1 < Y_2$;
2. discordant if $X_1 > X_2$ and $Y_1 < Y_2$, or $X_1 < X_2$ and $Y_1 > Y_2$.

Concordance probability:
$$\begin{aligned}
\Pi_c &= P[P_1 \text{ and } P_2 \text{ are concordant}] \\
&= P[(X_1 > X_2, Y_1 > Y_2) \cup (X_1 < X_2, Y_1 < Y_2)] \\
&= 2P[(X_1 > X_2, Y_1 > Y_2)] \\
&= 2E\{I(X_1 > X_2, Y_1 > Y_2)\} \\
&= 2E\{E[I(X_1 > X_2, Y_1 > Y_2) \mid X_2, Y_2]\} \\
&= 2\sum_{i=1}^{I}\sum_{j=1}^{J} P[X_1 > i, Y_1 > j \mid X_2 = i, Y_2 = j]\,\pi_{ij} \\
&= 2\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \sum_{h > i}\sum_{k > j} \pi_{hk}.
\end{aligned}$$

Discordance probability:
$$\Pi_d = 2\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} \sum_{h > i}\sum_{k < j} \pi_{hk}.$$

Gamma ($\gamma$) association measure:
$$\gamma = \frac{\Pi_c - \Pi_d}{\Pi_c + \Pi_d}.$$

Properties of $\gamma$:

1. $-1 \le \gamma \le 1$.
2. $\gamma > 0$: positive association; $\gamma < 0$: negative association.
3. $X \perp Y \Longrightarrow \gamma = 0$ (the converse is not true in general).

Given data $\{n_{ij}\}$ from multinomial sampling (a random sample), $\gamma$ can be estimated in 2 ways:

1. Use $\hat\pi_{ij} = n_{ij}/n$ to estimate $\Pi_c$ and $\Pi_d$, then
$$\hat\gamma = \frac{\hat\Pi_c - \hat\Pi_d}{\hat\Pi_c + \hat\Pi_d}.$$

2. Use
$$\hat\gamma = \frac{C - D}{C + D},$$
where C = number of concordant pairs and D = number of discordant pairs.

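3. An example (Table 2.8 on p. 57) of job satisfaction from the 2006 General Social Survey:

                 Job Satisfaction (Y)
  Age (X)     1 (NS)    2 (FS)    3 (VS)
  < 30          34        53        88
  30-50         80       174       304
  > 50          29        75       172

  C = 34(174 + 304 + 75 + 172) + 53(304 + 172) + 80(75 + 172) + 174(172) = 99,566
  D = 53(80 + 29) + 88(80 + 174 + 29 + 75) + 174(29) + 304(29 + 75) = 73,943

$$\hat\gamma = \frac{99{,}566 - 73{,}943}{99{,}566 + 73{,}943} = 0.148.$$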
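Counting concordant and discordant pairs by hand is error-prone, so a direct loop over cells is a useful check. This is a sketch, assuming the job satisfaction counts are 34, 53, 88 / 80, 174, 304 / 29, 75, 172 (they sum to the sample size of 1009 reported in the SAS output); the function name is illustrative.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an I x J table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a strictly higher row.
            for h in range(i + 1, n_rows):
                for k in range(n_cols):
                    if k > j:
                        c += table[i][j] * table[h][k]   # higher on both X and Y
                    elif k < j:
                        d += table[i][j] * table[h][k]   # higher on X, lower on Y
    return c, d

# Assumed counts. Rows: age <30, 30-50, >50; columns: NS, FS, VS.
job = [[34, 53, 88], [80, 174, 304], [29, 75, 172]]
C, D = concordant_discordant(job)
gamma_hat = (C - D) / (C + D)
print(C, D, round(gamma_hat, 3))
```

Pairs tied on X or on Y contribute to neither count, which is why $C + D$ is smaller than the total number of pairs.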

SAS program for the job satisfaction data:

  options ls=80 ps=200 nodate nonumber;

  data table2_8;
    input age $ jobsat $ count;
    datalines;
  <30   NS  34
  <30   FS  53
  <30   VS  88
  30-50 NS  80
  30-50 FS 174
  30-50 VS 304
  >50   NS  29
  >50   FS  75
  >50   VS 172
  ;

  title "Table 2.8 of Agresti (2013) using original categories";
  proc freq data=table2_8 order=data;
    weight count;
    tables age*jobsat / all norow nocol nopercent;
  run;

SAS output from the previous SAS program:

  Table 2.8 of Agresti (2013) using original categories

  The FREQ Procedure

  Table of age by jobsat
  (cell frequencies by age and jobsat, columns NS, FS, VS, with row and column totals)

  Statistics for Table of age by jobsat

  Statistic                      DF    Value    Prob
  Chi-Square
  Likelihood Ratio Chi-Square
  Mantel-Haenszel Chi-Square
  Phi Coefficient
  Contingency Coefficient
  Cramer's V

  Statistic                             Value    ASE
  Gamma
  Kendall's Tau-b
  Stuart's Tau-c
  Somers' D C|R
  Somers' D R|C
  Pearson Correlation
  Spearman Correlation
  Lambda Asymmetric C|R
  Lambda Asymmetric R|C
  Lambda Symmetric
  Uncertainty Coefficient C|R
  Uncertainty Coefficient R|C
  Uncertainty Coefficient Symmetric

  Sample Size = 1009

V.2 Comparison of 2 ordinal categorical variables: 2 × J table, Y ordinal

                        Y
  X (group)    1      2     ...    J
      1       π_11   π_12   ...   π_1J
      2       π_21   π_22   ...   π_2J

Define $Y_1$ as the random variable $Y \mid X = 1$ and $Y_2$ as the random variable $Y \mid X = 2$. Then
$$\Delta = P(Y_1 > Y_2) - P(Y_2 > Y_1).$$

$Y_1$ and $Y_2$ have distributions:

  Levels     1        2       ...     J
  Y_1      π_1|1    π_2|1     ...   π_J|1
  Y_2      π_1|2    π_2|2     ...   π_J|2

Then
$$\Delta = \sum_{j > k} \pi_{j|1}\pi_{k|2} - \sum_{j < k} \pi_{j|1}\pi_{k|2}.$$

Given data from multinomial sampling or product-multinomial sampling on X (group):

                        Y
  X (group)    1      2     ...    J      Total
      1       n_11   n_12   ...   n_1J     n_1
      2       n_21   n_22   ...   n_2J     n_2

1. Estimate the $\pi_{j|1}$'s by $\hat\pi_{j|1} = n_{1j}/n_1$ and the $\pi_{j|2}$'s by $\hat\pi_{j|2} = n_{2j}/n_2$, then estimate $\Delta$ by
$$\hat\Delta = \frac{\sum_{j > k} n_{1j}n_{2k} - \sum_{j < k} n_{1j}n_{2k}}{n_1 n_2}.$$

2. If we define group 1 > group 2, then $C = \sum_{j > k} n_{1j}n_{2k}$, $D = \sum_{j < k} n_{1j}n_{2k}$, and
$$\hat\Delta = \frac{C - D}{n_1 n_2}.$$

3. An example (Table 2.9, p. 59) of shoulder tip pain after surgery:

                      Pain Scores
  Treatments     1     2     3     4     5
  Active        19     2     1     0     0
  Control        7     3     4     3     2

Estimated cumulative distributions:

                      Pain Scores
  Treatments     1       2       3       4       5
  Active       0.864   0.955   1.000   1.000   1.000
  Control      0.368   0.526   0.737   0.895   1.000

The pain scores in the two groups are stochastically ordered; the active group tended to have lower scores.

  C = 7(2 + 1) + 3(1) = 24.
  D = 19(3 + 4 + 3 + 2) + 2(4 + 3 + 2) + 1(3 + 2) = 251.

$$\hat\Delta = \frac{24 - 251}{22 \times 19} = -0.543.$$
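The same C and D can be obtained with two sums over pairs of score levels. This is a sketch, assuming the pain score counts are 19, 2, 1, 0, 0 for the active group and 7, 3, 4, 3, 2 for the control group (these reproduce C = 24 and D = 251); the symbol name `delta_hat` for the estimate of $P(Y_1 > Y_2) - P(Y_2 > Y_1)$ is an illustrative choice.

```python
def delta_estimate(group1, group2):
    """Return (C, D, delta_hat) for two rows of counts over the same ordered levels."""
    # C: pairs where the group-1 score is higher than the group-2 score.
    c = sum(a * b for j, a in enumerate(group1) for k, b in enumerate(group2) if j > k)
    # D: pairs where the group-1 score is lower than the group-2 score.
    d = sum(a * b for j, a in enumerate(group1) for k, b in enumerate(group2) if j < k)
    return c, d, (c - d) / (sum(group1) * sum(group2))

# Assumed counts for pain scores 1..5.
active = [19, 2, 1, 0, 0]
control = [7, 3, 4, 3, 2]

C, D, delta_hat = delta_estimate(active, control)
print(C, D, round(delta_hat, 3))
```

A negative value means that a randomly chosen patient from the active group is more likely to have the lower pain score of the two.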


UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information
