Cohen's Kappa and Log-linear Models

HRP 261, 03/03/03, 10-11 am

1. Cohen's Kappa
Actual agreement = sum of the proportions found on the diagonal:
$$\sum_i \pi_{ii}$$
Cohen: compare the actual agreement with the chance agreement (which depends on the marginals):
$$\sum_i \pi_{ii} - \sum_i \pi_{i+}\pi_{+i}$$
Normalize by its maximum possible value:
$$\kappa = \frac{\sum_i \pi_{ii} - \sum_i \pi_{i+}\pi_{+i}}{1 - \sum_i \pi_{i+}\pi_{+i}}$$

Example: rater agreement

                            Rating by supervisor 2
Rating by supervisor 1      Authoritarian   Democratic   Permissive   Totals
Authoritarian                     17             4             8          29
Democratic                         5            12             0          17
Permissive                        10             3            13          26
Totals                            32            19            21          72

Null hypothesis: Kappa = 0 (no agreement beyond chance).
Example: student teacher ratings
$$\hat{\kappa} = \frac{\frac{17 + 12 + 13}{72} - \frac{29 \cdot 32 + 17 \cdot 19 + 26 \cdot 21}{72 \cdot 72}}{1 - \frac{29 \cdot 32 + 17 \cdot 19 + 26 \cdot 21}{72 \cdot 72}} = \frac{.583 - .347}{1 - .347} = .362$$
95% CI = .182 to .542
Interpretation: the raters achieved 36.2% of the maximum possible improvement over the agreement expected by chance alone.
**See class handout for the formula for the asymptotic (large-sample) variance of Kappa.
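The kappa arithmetic above can be reproduced directly from the table. This is a minimal stdlib Python sketch of the point estimate only; the helper name `cohens_kappa` is hypothetical, and the confidence interval (which needs the asymptotic variance from the class handout) is not computed here.

```python
# Rater-agreement table from the example: rows = supervisor 1, columns = supervisor 2,
# categories in the order Authoritarian, Democratic, Permissive.
TABLE = [
    [17, 4, 8],
    [5, 12, 0],
    [10, 3, 13],
]


def cohens_kappa(table):
    """Cohen's kappa (point estimate) from a square table of counts."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_p = [sum(row) / n for row in table]                               # pi_{i+}
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]   # pi_{+j}
    observed = sum(table[i][i] for i in range(k)) / n                    # sum of pi_ii
    chance = sum(row_p[i] * col_p[i] for i in range(k))                  # sum of pi_{i+} pi_{+i}
    return (observed - chance) / (1 - chance), observed, chance


kappa, observed, chance = cohens_kappa(TABLE)
print(round(observed, 3), round(chance, 3), round(kappa, 3))  # 0.583 0.347 0.362
```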

2. Log-Linear Models for Multi-way Contingency Tables
1. GLM for Poisson-distributed data with log link (see Agresti chapter 4).
2. Recall: $\log\mu = \alpha + \beta x \Rightarrow \mu = e^{\alpha}(e^{\beta})^{x}$. A one-unit increase in X has a multiplicative impact of $e^{\beta}$ on $\mu$.
3. General idea: predict the expected frequency (count) in each cell by a product of effects: main effects and interactions.
4. (Take logs to linearize.)
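A quick numerical check of the multiplicative interpretation: with a log link, moving x up by one unit multiplies the expected count by e^β wherever you start. The coefficients below are made up purely for illustration and do not come from any model in these notes.

```python
import math


def poisson_mean(alpha, beta, x):
    """Expected count under the log link: log(mu) = alpha + beta * x."""
    return math.exp(alpha + beta * x)


alpha, beta = 1.0, 0.3  # illustrative values only
ratios = [poisson_mean(alpha, beta, x + 1) / poisson_mean(alpha, beta, x) for x in range(4)]
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))  # every ratio equals e^0.3, about 1.3499
```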

Log-linear vs. logistic
1. The cell counts are assumed to follow a Poisson distribution, not a binomial.
2. The link function is the log, not the logit.
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y.

Log-linear vs. logistic
The variables investigated by log-linear models are all treated as response variables. Therefore, log-linear models only demonstrate association between variables (like a chi-square test or a correlation coefficient). If clear explanatory and response variables exist, logistic regression should be used instead. Also, if the variables are continuous and cannot be broken down into discrete categories, logistic regression is preferable.

Example: 3-way contingency table

                                   Heart Disease
Body Weight        Sex         Yes      No     Total
Not overweight     Male         15       5       20
                   Female       40      60      100
                   Total        55      65      120
Overweight         Male         20      10       30
                   Female       10      40       50
                   Total        30      50       80

Source: Angela Jeansonne

In-class exercise: Analyze these data using methods we have already learned. Is gender related to heart disease, and is this effect modified or confounded by weight? What is the relationship between overweight and gender (controlled for heart disease), and between overweight and heart disease (controlled for gender)?
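One way to start the exercise with earlier tools is to compare the stratum-specific odds ratios for gender and heart disease with the crude (collapsed) OR and the Mantel-Haenszel summary OR. This Python sketch only does that arithmetic (no homogeneity test, no confidence intervals); the dictionary layout and helper names are hypothetical.

```python
# For each weight stratum: (a, b, c, d) =
# (male & HD, male & no HD, female & HD, female & no HD).
STRATA = {
    "not overweight": (15, 5, 40, 60),
    "overweight": (20, 10, 10, 40),
}


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel(strata):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


for name, cells in STRATA.items():
    print(name, round(odds_ratio(*cells), 2))          # 4.5 and 8.0
crude = tuple(sum(col) for col in zip(*STRATA.values()))  # collapse over weight
print("crude", round(odds_ratio(*crude), 2))            # 35*100/(15*50) = 4.67
print("M-H", round(mantel_haenszel(STRATA.values()), 2))  # 6.0
```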

Example: sex, weight, and heart disease
Model 1 (main effects only):
$$\log(\text{counts}) = \alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}}$$
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis / dist=poisson link=log pred;
run;
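PROC GENMOD fits Model 1 by maximum likelihood. As a rough cross-check outside SAS, here is a stdlib-only Python sketch that fits the same Poisson log-linear model by iteratively reweighted least squares (IRLS), the standard GLM fitting scheme for a log link; it is not SAS's implementation, and `solve`, `weighted_crossprod`, and `fit_poisson_loglinear` are hypothetical helpers written for this example. With these counts it should agree with the Model 1 estimates and standard errors reported for the independence model.

```python
import math

# (overweight, male, heart disease, count) for the 8 cells of the 3-way table.
CELLS = [
    (0, 1, 1, 15), (0, 1, 0, 5), (0, 0, 1, 40), (0, 0, 0, 60),
    (1, 1, 1, 20), (1, 1, 0, 10), (1, 0, 1, 10), (1, 0, 0, 40),
]
NAMES = ["Intercept", "Overweight", "IsMale", "HeartDis"]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_crossprod(X, w):
    """X' W X for a diagonal weight matrix W = diag(w)."""
    p = len(X[0])
    return [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
            for j in range(p)]


def fit_poisson_loglinear(cells, iterations=25):
    """Poisson GLM with log link (main effects only), fitted by IRLS."""
    X = [[1.0, w, g, d] for w, g, d, _ in cells]
    y = [float(c[3]) for c in cells]
    mu = y[:]  # start from the observed counts (all positive here)
    for _ in range(iterations):
        # Working response z = eta + (y - mu) / mu; working weights are mu.
        z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
        xtwz = [sum(m * row[j] * zi for m, row, zi in zip(mu, X, z)) for j in range(len(X[0]))]
        beta = solve(weighted_crossprod(X, mu), xtwz)
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    # Model-based standard errors: sqrt of the diagonal of (X' W X)^{-1} at the fit.
    info = weighted_crossprod(X, mu)
    p = len(beta)
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se, mu


beta, se, fitted = fit_poisson_loglinear(CELLS)
for name, b, s in zip(NAMES, beta, se):
    print(f"{name:<10} {b:8.4f} {s:7.4f}")
```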

Independence model: goodness of fit
df = cells - parameters in model = 8 - 4 = 4;  $\chi^2_4 = 36.5$

Cells (weight/sex/disease)     Observed    Predicted
light/male/disease                15          12.75
light/male/no disease              5          17.25
light/female/disease              40          38.25
light/female/no disease           60          51.75
heavy/male/disease                20           8.5
heavy/male/no disease             10          11.5
heavy/female/disease              10          25.5
heavy/female/no disease           40          34.5

Suggests the independence model is a poor fit!

Independence model: parameters (Model 1)

Parameter    DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept     1     3.9464        0.1170          3.7171     4.1758           1137.17      <.0001
Overweight    1    -0.4055        0.1443         -0.6884    -0.1226              7.89      0.0050
IsMale        1    -1.0986        0.1633         -1.4187    -0.7786             45.26      <.0001
HeartDis      1    -0.3023        0.1430         -0.5826    -0.0219              4.47      0.0346

$$\log(\text{counts}) = 3.95 - .41(\text{overweight}) - 1.1(\text{male}) - .30(\text{heart disease})$$

Interpretation of parameters, Model 1:
$$\log(\text{counts}) = 3.95 - .41(\text{overweight}) - 1.1(\text{male}) - .30(\text{heart disease})$$
$e^{-.41}$ = the (marginal) odds of being overweight = .66 (80/120)
$e^{-1.1}$ = the odds of being male = .33 (50/150)
$e^{-.30}$ = the odds of having disease = .74 (85/115)
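Because the independence model only uses the one-way margins, its predicted counts and parameters can be reproduced without any iterative fitting: each fitted count is N · p(weight) · p(sex) · p(disease), and each main-effect estimate is a log marginal odds. A small Python sketch under that closed-form assumption (dictionary layout and helper name are hypothetical):

```python
import math
from itertools import product

# Observed counts keyed by (weight, sex, disease), as in the 3-way table.
OBS = {
    ("light", "male", "disease"): 15, ("light", "male", "no disease"): 5,
    ("light", "female", "disease"): 40, ("light", "female", "no disease"): 60,
    ("heavy", "male", "disease"): 20, ("heavy", "male", "no disease"): 10,
    ("heavy", "female", "disease"): 10, ("heavy", "female", "no disease"): 40,
}
N = sum(OBS.values())


def one_way_margin(axis):
    """Totals of OBS for each level of one variable (0 = weight, 1 = sex, 2 = disease)."""
    totals = {}
    for cell, count in OBS.items():
        totals[cell[axis]] = totals.get(cell[axis], 0) + count
    return totals


weight, sex, disease = (one_way_margin(axis) for axis in range(3))

# Mutual independence: fitted count = N * p(weight) * p(sex) * p(disease).
expected = {(w, s, d): weight[w] * sex[s] * disease[d] / N**2
            for w, s, d in product(weight, sex, disease)}
df = len(OBS) - 4  # 8 cells minus 4 parameters (intercept + 3 main effects)

# Main effects are log marginal odds (reference levels: light, female, no disease).
log_odds = {
    "Overweight": math.log(weight["heavy"] / weight["light"]),
    "IsMale": math.log(sex["male"] / sex["female"]),
    "HeartDis": math.log(disease["disease"] / disease["no disease"]),
}
intercept = math.log(expected[("light", "female", "no disease")])

for cell, observed in OBS.items():
    print("/".join(cell), observed, expected[cell])
print(df, {k: round(v, 4) for k, v in log_odds.items()}, round(intercept, 4))
```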

Model with interaction
Model 2 (main effects + some interactions): This model corresponds to the case when heart disease and overweight are conditionally independent (conditioned on gender).
$$\log(\text{counts}) = \alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}} + \beta_{\text{male}\cdot\text{over}}$$
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis ismale*heartdis ismale*overweight / dist=poisson link=log pred;
run;

Analysis of Parameter Estimates, Model 2

Parameter           DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept            1     4.1997        0.1155          3.9734     4.4260           1322.81      <.0001
Overweight           1    -0.6931        0.1732         -1.0326    -0.3537             16.02      <.0001
IsMale               1    -2.4079        0.3317         -3.0580    -1.7579             52.71      <.0001
HeartDis             1    -0.6931        0.1732         -1.0326    -0.3537             16.02      <.0001
IsMale*HeartDis      1     1.5404        0.3539          0.8468     2.2341             18.95      <.0001
Overweight*IsMale    1     1.0986        0.3367          0.4388     1.7584             10.65      0.0011

$$\log(\text{counts}) = 4.19 - .69(\text{overweight}) - 2.4(\text{male}) - .69(\text{heart disease}) + 1.54(\text{male and heart disease}) + 1.1(\text{overweight and male})$$

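Interpretation of parameters, Model 2:
$$\log(\text{counts}) = 4.19 - .69(\text{overweight}) - 2.4(\text{male}) - .69(\text{heart disease}) + 1.54(\text{male and heart disease}) + 1.1(\text{overweight and male})$$
For the sex (1 = male, 2 = female) by heart disease (1 = yes, 2 = no) table within one weight stratum:
$$\log(OR) = \log\left(\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right) = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$
Overweight:
$$\log(OR_{\text{over}}) = (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}} + \beta_{\text{male}\cdot\text{over}}) + (\alpha + \beta_{\text{over}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{male}\cdot\text{over}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{HD}}) = \beta_{\text{male}\cdot\text{HD}}$$
Not overweight:
$$\log(OR_{\text{not over}}) = (\alpha + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}}) + \alpha - (\alpha + \beta_{\text{male}}) - (\alpha + \beta_{\text{HD}}) = \beta_{\text{male}\cdot\text{HD}}$$
$$OR = e^{\beta_{\text{male}\cdot\text{HD}}} = e^{1.54} = 4.66$$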

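OR estimate from predicted counts (Model 2), $\chi^2_2 = 6.3$

Cells (weight/sex/disease)     Observed    Predicted
light/male/disease                15          14
light/male/no disease              5           6
light/female/disease              40          33.3
light/female/no disease           60          66.6
heavy/male/disease                20          21
heavy/male/no disease             10           9
heavy/female/disease              10          16.6
heavy/female/no disease           40          33.3

$$OR_{\text{male-HD}}(\text{light}) = \frac{14 \times 66.6}{6 \times 33.3} \approx 4.66 \qquad OR_{\text{male-HD}}(\text{heavy}) = \frac{21 \times 33.3}{9 \times 16.6} \approx 4.66$$
The OR for male sex and heart disease is not confounded by weight.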

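Male and overweight, Model 2:
$$\log(\text{counts}) = 4.19 - .69(\text{overweight}) - 2.4(\text{male}) - .69(\text{heart disease}) + 1.54(\text{male and heart disease}) + 1.1(\text{overweight and male})$$
For the sex (1 = male, 2 = female) by weight (1 = overweight, 2 = not overweight) table within one heart-disease stratum:
$$\log(OR) = \log\left(\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right) = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$
No heart disease:
$$\log(OR_{\text{no HD}}) = (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{male}\cdot\text{over}}) + \alpha - (\alpha + \beta_{\text{male}}) - (\alpha + \beta_{\text{over}}) = \beta_{\text{male}\cdot\text{over}}$$
Heart disease:
$$\log(OR_{\text{HD}}) = (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}} + \beta_{\text{male}\cdot\text{over}}) + (\alpha + \beta_{\text{HD}}) - (\alpha + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{HD}}) = \beta_{\text{male}\cdot\text{over}}$$
$$OR = e^{\beta_{\text{male}\cdot\text{over}}} = e^{1.1} = 3.0$$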
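The cancellations in the two derivations above can be checked numerically by plugging the Model 2 estimates into the linear predictor for each cell and forming the log OR within every stratum. In this Python sketch the dictionary `B` and the helpers `log_mu` and `log_or` are hypothetical names, and the estimates are the rounded values from the GENMOD output.

```python
import math

# Model 2 estimates from the GENMOD output (0/1 coding, rounded to 4 places).
B = {"int": 4.1997, "over": -0.6931, "male": -2.4079, "hd": -0.6931,
     "male_hd": 1.5404, "over_male": 1.0986}


def log_mu(over, male, hd):
    """Model 2 linear predictor (log expected count) for one cell."""
    return (B["int"] + B["over"] * over + B["male"] * male + B["hd"] * hd
            + B["male_hd"] * male * hd + B["over_male"] * over * male)


def log_or(cell):
    """log(mu11 * mu22 / (mu12 * mu21)); cell(r, c) gives log mu, with 1 = first level, 0 = second."""
    return cell(1, 1) + cell(0, 0) - cell(1, 0) - cell(0, 1)


# Male x heart disease within each weight stratum; male x overweight within each disease stratum.
male_hd = [log_or(lambda m, h: log_mu(w, m, h)) for w in (0, 1)]
male_over = [log_or(lambda m, o: log_mu(o, m, h)) for h in (0, 1)]
print([round(math.exp(v), 2) for v in male_hd], [round(math.exp(v), 2) for v in male_over])
```

Every stratum returns the same interaction coefficient, which is the point of the derivation: the remaining terms cancel.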

OR estimate from predicted counts (same Model 2 predicted counts as for male sex and heart disease):
$$OR_{\text{male-over}}(\text{HD}) = \frac{21 \times 33.3}{14 \times 16.6} \approx 3.00 \qquad OR_{\text{male-over}}(\text{no HD}) = \frac{9 \times 66.6}{6 \times 33.3} = 3.00$$
The OR for male sex and overweight is not confounded by heart disease.

Interpretation: Model 2
Overweight and heart disease are independent when you condition on gender (predicted counts):

Men              Heart disease: Yes     No
Overweight                       21      9
Normal                           14      6
$$OR = \frac{21 \times 6}{14 \times 9} = 1.0$$

Women            Heart disease: Yes     No
Overweight                       16.6   33.3
Normal                           33.3   66.6
$$OR = \frac{16.6 \times 66.6}{33.3 \times 33.3} = 1.0$$
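Because Model 2 says weight and heart disease are conditionally independent given sex, its predicted counts have a closed form, n(weight, sex) · n(sex, disease) / n(sex), so no iterative fitting is needed. This Python sketch (helper names hypothetical) rebuilds the predicted counts from the observed two-way margins and checks the conditional ORs of 1.0 along with the male-heart disease OR of 4.66 within each weight stratum.

```python
# Observed counts keyed by (w, g, d) = (overweight, male, heart disease), coded 0/1.
OBS = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}
AXIS = {"w": 0, "g": 1, "d": 2}


def total(**fixed):
    """Observed total over cells whose codes match the keywords, e.g. total(g=1, d=0)."""
    return sum(n for cell, n in OBS.items()
               if all(cell[AXIS[k]] == v for k, v in fixed.items()))


# Weight and disease conditionally independent given sex:
# fitted(w, g, d) = n(w, g) * n(g, d) / n(g).
fit = {(w, g, d): total(w=w, g=g) * total(g=g, d=d) / total(g=g) for (w, g, d) in OBS}


def odds_ratio(m11, m22, m12, m21):
    return m11 * m22 / (m12 * m21)


# Overweight x heart disease within each sex (should be 1 under this model).
within_sex = {g: odds_ratio(fit[1, g, 1], fit[0, g, 0], fit[1, g, 0], fit[0, g, 1]) for g in (0, 1)}
# Male x heart disease within each weight stratum (same value in both).
male_hd = {w: odds_ratio(fit[w, 1, 1], fit[w, 0, 0], fit[w, 1, 0], fit[w, 0, 1]) for w in (0, 1)}
print({c: round(v, 2) for c, v in fit.items()})
print({g: round(v, 3) for g, v in within_sex.items()}, {w: round(v, 3) for w, v in male_hd.items()})
```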

Model 3: only male sex and heart disease are related
Model 3 (main effects + single interaction): This model corresponds to the case when heart disease and overweight, and gender and overweight, are conditionally independent.
$$\log(\text{counts}) = \alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}}$$
Output, Model 3:
$$\log(\text{counts}) = 4.09 - .41(\text{overweight}) - 1.9(\text{male}) - .69(\text{heart disease}) + 1.54(\text{male and heart disease})$$

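OR: male and CHD, Model 3:
$$\log(\text{counts}) = 4.09 - .41(\text{overweight}) - 1.9(\text{male}) - .69(\text{heart disease}) + 1.54(\text{male and heart disease})$$
$$\log(OR) = \log\left(\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right) = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$
Overweight:
$$\log(OR_{\text{over}}) = (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}}) + (\alpha + \beta_{\text{over}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{male}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{HD}}) = \beta_{\text{male}\cdot\text{HD}}$$
Not overweight:
$$\log(OR_{\text{not over}}) = (\alpha + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}}) + \alpha - (\alpha + \beta_{\text{male}}) - (\alpha + \beta_{\text{HD}}) = \beta_{\text{male}\cdot\text{HD}}$$
$$OR = e^{\beta_{\text{male}\cdot\text{HD}}} = e^{1.54} = 4.66$$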

Model 3: only male sex and heart disease are related (predicted counts)

Cells (weight/sex/disease)     Observed    Predicted
light/male/disease                15          21
light/male/no disease              5           9
light/female/disease              40          30
light/female/no disease           60          60
heavy/male/disease                20          14
heavy/male/no disease             10           6
heavy/female/disease              10          20
heavy/female/no disease           40          40

Collapses to (summing over weight):

            Male    Female
CHD          35       50
No CHD       15      100
$$OR = \frac{35 \times 100}{50 \times 15} = 4.66$$

And heart disease and overweight are independent, regardless of gender:

            Overweight    Light
CHD             34          51
No CHD          46          69
$$OR = \frac{34 \times 69}{46 \times 51} = 1.00$$

And overweight and gender are independent, regardless of disease:

            Overweight    Light
Male            20          30
Female          60          90
$$OR = \frac{20 \times 90}{60 \times 30} = 1.00$$
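Model 3 also has a closed form: weight is independent of the (sex, disease) combination, so each predicted count is n(weight) · n(sex, disease) / N. This Python sketch (helper names hypothetical) rebuilds the Model 3 predicted counts and then collapses them over the third variable to reproduce the three 2 × 2 odds ratios shown above.

```python
# Observed counts keyed by (w, g, d) = (overweight, male, heart disease), coded 0/1.
OBS = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}
N = sum(OBS.values())
AXIS = {"w": 0, "g": 1, "d": 2}


def total(table, **fixed):
    """Sum of `table` over cells whose codes match the keywords, e.g. total(t, w=1)."""
    return sum(n for cell, n in table.items()
               if all(cell[AXIS[k]] == v for k, v in fixed.items()))


# Model 3: weight jointly independent of (sex, disease): fitted = n(w) * n(g, d) / N.
fit = {(w, g, d): total(OBS, w=w) * total(OBS, g=g, d=d) / N for (w, g, d) in OBS}


def collapsed_or(table, row, col):
    """OR for the 2x2 table obtained by summing `table` over the third variable."""
    def n(r, c):
        return total(table, **{row: r, col: c})
    return n(1, 1) * n(0, 0) / (n(1, 0) * n(0, 1))


print(round(collapsed_or(fit, "g", "d"), 2))  # male x CHD
print(round(collapsed_or(fit, "w", "d"), 2))  # overweight x CHD
print(round(collapsed_or(fit, "w", "g"), 2))  # overweight x male
```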

M4: All pairwise interactions
Model 4 (main effects + all pairwise interactions): No pair of variables is conditionally independent.
$$\log(\text{counts}) = \alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}} + \beta_{\text{male}\cdot\text{over}} + \beta_{\text{HD}\cdot\text{over}}$$
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis ismale*heartdis ismale*overweight Overweight*HeartDis / dist=poisson link=log pred;
run;

Analysis of Parameter Estimates, Model 4

Parameter             DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept              1     4.1103        0.1263          3.8627     4.3579           1058.30      <.0001
Overweight             1    -0.4458        0.1978         -0.8336    -0.0581              5.08      0.0242
IsMale                 1    -2.7153        0.3877         -3.4753    -1.9554             49.04      <.0001
HeartDis               1    -0.4458        0.1978         -0.8336    -0.0581              5.08      0.0242
IsMale*HeartDis        1     1.8213        0.3871          1.0627     2.5799             22.14      <.0001
Overweight*IsMale      1     1.4456        0.3797          0.7013     2.1899             14.49      0.0001
Overweight*HeartDis    1    -0.8239        0.3431         -1.4963    -0.1515              5.77      0.0163

$$\log(\text{counts}) = 4.11 - .45(\text{overweight}) - 2.7(\text{male}) - .45(\text{heart disease}) + 1.8(\text{male and heart disease}) + 1.4(\text{overweight and male}) - .82(\text{overweight and heart disease})$$

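OR: male and CHD, Model 4:
$$\log(\text{counts}) = 4.11 - .45(\text{overweight}) - 2.7(\text{male}) - .45(\text{heart disease}) + 1.8(\text{male and heart disease}) + 1.4(\text{overweight and male}) - .82(\text{overweight and heart disease})$$
$$\log(OR) = \log\left(\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right) = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$
Overweight:
$$\log(OR_{\text{over}}) = (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}} + \beta_{\text{male}\cdot\text{over}} + \beta_{\text{HD}\cdot\text{over}}) + (\alpha + \beta_{\text{over}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{male}\cdot\text{over}}) - (\alpha + \beta_{\text{over}} + \beta_{\text{HD}} + \beta_{\text{HD}\cdot\text{over}}) = \beta_{\text{male}\cdot\text{HD}}$$
Not overweight:
$$\log(OR_{\text{not over}}) = (\alpha + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}}) + \alpha - (\alpha + \beta_{\text{male}}) - (\alpha + \beta_{\text{HD}}) = \beta_{\text{male}\cdot\text{HD}}$$
$$OR = e^{\beta_{\text{male}\cdot\text{HD}}} = e^{1.8} \approx 6.0$$
Corresponds to the M-H summary OR, stratified by weight.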

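OR: CHD and overweight, Model 4:
$$\log(\text{counts}) = 4.11 - .45(\text{overweight}) - 2.7(\text{male}) - .45(\text{heart disease}) + 1.8(\text{male and heart disease}) + 1.4(\text{overweight and male}) - .82(\text{overweight and heart disease})$$
$$OR = e^{\beta_{\text{HD}\cdot\text{over}}} = e^{-.82} = .44$$
Corresponds to the M-H summary OR, stratified by gender.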

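OR: male and overweight, Model 4:
$$\log(\text{counts}) = 4.11 - .45(\text{overweight}) - 2.7(\text{male}) - .45(\text{heart disease}) + 1.8(\text{male and heart disease}) + 1.4(\text{overweight and male}) - .82(\text{overweight and heart disease})$$
$$OR = e^{\beta_{\text{male}\cdot\text{over}}} = e^{1.4} = 4.1$$
Corresponds to the M-H summary OR, stratified by heart disease.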
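To see how closely the Model 4 interaction terms track the Mantel-Haenszel summaries mentioned above, the Python sketch below computes the M-H OR for each pair, stratified by the third variable, and prints it next to the exponentiated GENMOD estimate. The two come out close but not identical (for example 6.00 vs. about 6.18 for male and CHD), since the M-H summary and the log-linear MLE are different estimators of a common OR. The helper name `mantel_haenszel` and the axis coding are hypothetical.

```python
import math

# Observed counts keyed by (overweight, male, heart disease), coded 0/1.
OBS = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}


def mantel_haenszel(row_axis, col_axis, strat_axis):
    """M-H summary OR for row x column (level 1 vs 0 on each), stratified by the third axis."""
    num = den = 0.0
    for s in (0, 1):
        def n(r, c):
            key = [0, 0, 0]
            key[row_axis], key[col_axis], key[strat_axis] = r, c, s
            return OBS[tuple(key)]
        a, b, c, d = n(1, 1), n(1, 0), n(0, 1), n(0, 0)
        total = a + b + c + d
        num += a * d / total
        den += b * c / total
    return num / den


# Axes: 0 = overweight, 1 = male, 2 = heart disease; betas from the Model 4 output.
comparisons = {
    "male x CHD, by weight": ((1, 2, 0), 1.8213),
    "CHD x overweight, by sex": ((2, 0, 1), -0.8239),
    "male x overweight, by CHD": ((1, 0, 2), 1.4456),
}
results = {}
for label, (axes, beta) in comparisons.items():
    results[label] = mantel_haenszel(*axes)
    print(f"{label:<27} M-H = {results[label]:.3f}  exp(beta) = {math.exp(beta):.3f}")
```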

OR estimate from predicted counts (Model 4), $\chi^2_1 = .571$

Cells (weight/sex/disease)     Observed    Predicted
light/male/disease                15          16
light/male/no disease              5           4
light/female/disease              40          39
light/female/no disease           60          61
heavy/male/disease                20          19
heavy/male/no disease             10          11
heavy/female/disease              10          11
heavy/female/no disease           40          39
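Model 4 has no closed-form fitted values, but for a 2 × 2 × 2 table it can be fitted by iterative proportional fitting (IPF), which repeatedly rescales the fitted table to match each observed two-way margin and converges to the same maximum-likelihood fit as the Poisson GLM. This is a swapped-in technique for illustration, not what PROC GENMOD does internally; the helper names are hypothetical. The sketch recovers the predicted counts above, the three interaction terms from the Model 4 output, and the Pearson goodness-of-fit statistic on 1 df (8 cells, 7 parameters).

```python
import math
from itertools import product

# Observed counts keyed by (overweight, male, heart disease), coded 0/1.
OBS = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}
CELLS = list(product((0, 1), repeat=3))


def margin(table, axes):
    """Two-way margin of a 2x2x2 table: sum over the axis not listed in `axes`."""
    out = {}
    for cell in CELLS:
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + table[cell]
    return out


def ipf_all_pairwise(obs, tol=1e-10, max_sweeps=10_000):
    """Fit the all-two-way-interactions (no three-way term) model by IPF."""
    fit = {cell: 1.0 for cell in CELLS}
    for _ in range(max_sweeps):
        previous = dict(fit)
        for axes in ((0, 1), (0, 2), (1, 2)):
            target, current = margin(obs, axes), margin(fit, axes)
            for cell in CELLS:
                key = tuple(cell[a] for a in axes)
                fit[cell] *= target[key] / current[key]
        if max(abs(fit[c] - previous[c]) for c in CELLS) < tol:
            break
    return fit


def log_or(fit, cell11, cell22, cell12, cell21):
    return math.log(fit[cell11] * fit[cell22] / (fit[cell12] * fit[cell21]))


fit = ipf_all_pairwise(OBS)
male_chd = log_or(fit, (0, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1))   # within not overweight
over_chd = log_or(fit, (1, 0, 1), (0, 0, 0), (1, 0, 0), (0, 0, 1))   # within female
male_over = log_or(fit, (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 0))  # within no CHD
pearson = sum((OBS[c] - fit[c]) ** 2 / fit[c] for c in CELLS)

for c in CELLS:
    print(c, OBS[c], round(fit[c], 2))
print(round(male_chd, 4), round(over_chd, 4), round(male_over, 4), round(pearson, 3))
```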

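The saturated model
Model 5 (saturated):
$$\log(\text{counts}) = \alpha + \beta_{\text{over}} + \beta_{\text{male}} + \beta_{\text{HD}} + \beta_{\text{male}\cdot\text{HD}} + \beta_{\text{male}\cdot\text{over}} + \beta_{\text{HD}\cdot\text{over}} + \beta_{\text{male}\cdot\text{HD}\cdot\text{over}}$$
Perfect fit: no degrees of freedom (8 cells, 8 parameters).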
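Because the saturated model reproduces the observed counts exactly, its three-way term can be read straight off the data. With 0/1 coding, the male-heart disease log OR is β_male·HD in the normal-weight stratum and β_male·HD + β_male·HD·over in the overweight stratum, so the three-way term is the difference of the two observed log ORs. A short Python check (the helper name is hypothetical):

```python
import math

# Observed counts keyed by (overweight, male, heart disease), coded 0/1.
OBS = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}


def log_or_male_chd(w):
    """Observed log OR for male vs female and CHD vs none within weight stratum w."""
    return math.log(OBS[w, 1, 1] * OBS[w, 0, 0] / (OBS[w, 1, 0] * OBS[w, 0, 1]))


three_way = log_or_male_chd(1) - log_or_male_chd(0)  # = log(8.0 / 4.5)
n_params = 8  # alpha, 3 main effects, 3 two-way terms, 1 three-way term
print(round(math.exp(log_or_male_chd(0)), 2), round(math.exp(log_or_male_chd(1)), 2),
      round(three_way, 4), len(OBS) - n_params)  # 4.5 8.0 0.5754 0
```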