Cohen's Kappa and Log-linear Models

HRP 261, 03/03/03, 10-11 am

1. Cohen's Kappa

Actual agreement = sum of the proportions found on the diagonal: $\sum_i \pi_{ii}$

Cohen: compare the actual agreement with the chance agreement (which depends on the marginals): $\sum_i \pi_{ii} - \sum_i \pi_{i+}\pi_{+i}$

Normalize by its maximum possible value:

$$\kappa = \frac{\sum_i \pi_{ii} - \sum_i \pi_{i+}\pi_{+i}}{1 - \sum_i \pi_{i+}\pi_{+i}}$$

Example: rater agreement

                          Rating by supervisor 2
Rating by supervisor 1    Authoritarian   Democratic   Permissive   Totals
Authoritarian                  17             4            8          29
Democratic                      5            12            0          17
Permissive                     10             3           13          26
Totals                         32            19           21          72

Example: student teacher ratings

Null hypothesis: Kappa = 0 (no agreement beyond chance)

$$\hat{\kappa} = \frac{\frac{17 + 12 + 13}{72} - \left(\frac{29 \cdot 32}{72 \cdot 72} + \frac{17 \cdot 19}{72 \cdot 72} + \frac{26 \cdot 21}{72 \cdot 72}\right)}{1 - \left(\frac{29 \cdot 32}{72 \cdot 72} + \frac{17 \cdot 19}{72 \cdot 72} + \frac{26 \cdot 21}{72 \cdot 72}\right)} = \frac{.583 - .347}{1 - .347} = .362$$

95% CI = .182 to .542

Interpretation: achieved 36.2% of the maximum possible improvement over that expected by chance alone.

**See class handout for the formula for the asymptotic (large sample) variance of Kappa.
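The kappa arithmetic above can be checked with a few lines of code. This is a minimal sketch in Python (the slides themselves use SAS); the function name `cohen_kappa` is made up for illustration, and the confidence interval is not reproduced because the variance formula is only given in the class handout.

```python
# Rater-agreement table from the example.
# Rows = rating by supervisor 1, columns = rating by supervisor 2.
table = [
    [17, 4, 8],    # Authoritarian
    [5, 12, 0],    # Democratic
    [10, 3, 13],   # Permissive
]

def cohen_kappa(counts):
    """Cohen's kappa for a square table of counts (hypothetical helper)."""
    k = len(counts)
    n = sum(sum(row) for row in counts)
    # Observed agreement: proportion on the diagonal.
    observed = sum(counts[i][i] for i in range(k)) / n
    # Chance agreement: sum of products of matching row and column marginals.
    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - chance) / (1 - chance)

kappa = cohen_kappa(table)
print(round(kappa, 3))
```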

2. Log-Linear Models for Multi-way Contingency Tables

1. GLM for Poisson-distributed data with log link (see Agresti chapter 4).
2. Recall: $\log \mu = \alpha + \beta x \;\Rightarrow\; \mu = e^{\alpha}(e^{\beta})^{x}$. A one-unit increase in X has a multiplicative impact of $e^{\beta}$ on $\mu$.
3. General idea: predict the expected frequency (count) in each cell by a product of effects: main effects and interactions.
4. (Take logs to linearize.)

Log-linear vs. logistic

1. The expected distribution of the categorical variables is Poisson, not binomial.
2. The link function is the log, not the logit.
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y.

Log-linear vs. logistic

The variables investigated by log-linear models are all treated as response variables. Therefore, log-linear models only demonstrate association between variables (like a chi-square test or a correlation coefficient). If clear explanatory and response variables exist, logistic regression should be used instead. Also, if the variables are continuous and cannot be broken down into discrete categories, logistic regression is preferable.

Example: 3-way contingency table

                              Heart Disease
Body Weight       Sex         Yes     No     Total
Not overweight    Male         15      5       20
                  Female       40     60      100
                  Total        55     65      120
Overweight        Male         20     10       30
                  Female       10     40       50
                  Total        30     50       80

Source: Angela Jeansonne

In-class exercise: Analyze these data using methods we have already learned. Is gender related to heart disease, and is this effect modified or confounded by weight? What's the relationship between overweight and gender (controlled for heart disease) and between overweight and heart disease (controlled for gender)?

Example: sex, weight, and heart disease

Model 1 (main effects only):

$$\log(\text{counts}) = \alpha + \beta_{\text{overweight}} + \beta_{\text{isMale}} + \beta_{\text{HeartDisease}}$$

proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis / dist=poisson link=log pred;
run;

Independence model: goodness-of-fit

Cells                        Observed    Pred
light/male/disease              15       12.75
light/male/no disease            5       17.25
light/female/disease            40       38.25
light/female/no disease         60       51.75
heavy/male/disease              20        8.5
heavy/male/no disease           10       11.5
heavy/female/disease            10       25.5
heavy/female/no disease         40       34.5

df = cells - parameters in model = 8 - 4 = 4

$\chi^2_4 = 34.5$

Suggests the independence model is a poor fit!
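Under the main-effects (mutual independence) model, each predicted count is the grand total times the three marginal proportions. The following Python sketch (not the SAS used in the slides; the helper name `marginal_total` is made up for illustration) reproduces the Pred column from the observed counts alone.

```python
# Observed counts from the example, keyed by (weight, sex, disease).
observed = {
    ("light", "male", "disease"): 15,
    ("light", "male", "no disease"): 5,
    ("light", "female", "disease"): 40,
    ("light", "female", "no disease"): 60,
    ("heavy", "male", "disease"): 20,
    ("heavy", "male", "no disease"): 10,
    ("heavy", "female", "disease"): 10,
    ("heavy", "female", "no disease"): 40,
}
n = sum(observed.values())  # 200 subjects in total

def marginal_total(position, level):
    """Total count for one level of one variable (0=weight, 1=sex, 2=disease)."""
    return sum(count for key, count in observed.items() if key[position] == level)

# Independence: expected count = n * P(weight) * P(sex) * P(disease).
predicted = {
    (w, s, d): marginal_total(0, w) * marginal_total(1, s) * marginal_total(2, d) / n**2
    for (w, s, d) in observed
}

for cell, value in predicted.items():
    print("/".join(cell), observed[cell], round(value, 2))
```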

Independence model: parameters

                         Standard     Wald 95%              Chi-
Parameter    DF Estimate   Error   Confidence Limits       Square   Pr > ChiSq
Intercept     1   3.9464   0.1170   3.7171    4.1758      1137.17     <.0001
Overweight    1  -0.4055   0.1443  -0.6884   -0.1226         7.89     0.0050
IsMale        1  -1.0986   0.1633  -1.4187   -0.7786        45.26     <.0001
HeartDis      1  -0.3023   0.1430  -0.5826   -0.0219         4.47     0.0346

Model 1: Log(counts) = 3.95 - .41 (overweight) - 1.1 (male) - .30 (heart disease)

Interpretation of Parameters

Model 1: Log(counts) = 3.95 - .41 (overweight) - 1.1 (male) - .30 (heart disease)

$e^{-.41}$ = the (marginal) odds of being overweight = .67 = 80/120
$e^{-1.1}$ = the odds of being male = .33 = 50/150
$e^{-.30}$ = the odds of having disease = .74 = 85/115
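The link between the Model 1 coefficients and the marginal odds can be verified numerically. This is a small Python sketch (the slides use SAS); the estimates are copied from the parameter table and the dictionary names are illustrative only.

```python
import math

# Main-effect estimates from the independence-model output.
coefficients = {"Overweight": -0.4055, "IsMale": -1.0986, "HeartDis": -0.3023}

# Marginal odds computed directly from the table totals (n = 200).
marginal_odds = {
    "Overweight": 80 / 120,   # overweight vs. not overweight
    "IsMale": 50 / 150,       # male vs. female
    "HeartDis": 85 / 115,     # disease vs. no disease
}

# Exponentiating each coefficient should recover the matching marginal odds.
implied_odds = {name: math.exp(beta) for name, beta in coefficients.items()}

for name in coefficients:
    print(name, round(implied_odds[name], 3), round(marginal_odds[name], 3))
```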

Model with Interaction

Model 2 (main effects + some interactions): This model corresponds to the case when heart disease and overweight are conditionally independent (conditioned on gender).

$$\log(\text{counts}) = \alpha + \beta_{\text{overweight}} + \beta_{\text{isMale}} + \beta_{\text{HeartDisease}} + \beta_{\text{isMale*HeartDisease}} + \beta_{\text{isMale*overweight}}$$

proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis ismale*heartdis ismale*overweight / dist=poisson link=log pred;
run;

Analysis Of Parameter Estimates, Model 2

                                 Standard     Wald 95%              Chi-
Parameter            DF Estimate   Error   Confidence Limits       Square   Pr > ChiSq
Intercept             1   4.1997   0.1155   3.9734    4.4260      1322.81     <.0001
Overweight            1  -0.6931   0.1732  -1.0326   -0.3537        16.02     <.0001
IsMale                1  -2.4079   0.3317  -3.0580   -1.7579        52.71     <.0001
HeartDis              1  -0.6931   0.1732  -1.0326   -0.3537        16.02     <.0001
IsMale*HeartDis       1   1.5404   0.3539   0.8468    2.2341        18.95     <.0001
Overweight*IsMale     1   1.0986   0.3367   0.4388    1.7584        10.65     0.0011

Log(counts) = 4.20 - .69 (overweight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

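Interpretation of Parameters, Model 2

Log(counts) = 4.20 - .69 (overweight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

OR for male and heart disease within weight stratum k, with $\mu_{11}$ = male/disease, $\mu_{12}$ = male/no disease, $\mu_{21}$ = female/disease, $\mu_{22}$ = female/no disease, and O, M, H standing for overweight, isMale, HeartDisease:

$$\log(OR) = \log\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}} = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$

k = overweight:

$$(\alpha + \beta_O + \beta_M + \beta_H + \beta_{MH} + \beta_{MO}) + (\alpha + \beta_O) - (\alpha + \beta_O + \beta_M + \beta_{MO}) - (\alpha + \beta_O + \beta_H) = \beta_{MH}$$

k = not overweight:

$$(\alpha + \beta_M + \beta_H + \beta_{MH}) + (\alpha) - (\alpha + \beta_M) - (\alpha + \beta_H) = \beta_{MH}$$

$$OR = e^{\beta_{MH}} = e^{1.54} = 4.67$$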

OR estimate from predicted counts

Cells                        Observed    Pred
light/male/disease              15       14
light/male/no disease            5        6
light/female/disease            40       33.3
light/female/no disease         60       66.7
heavy/male/disease              20       21
heavy/male/no disease           10        9
heavy/female/disease            10       16.7
heavy/female/no disease         40       33.3

$\chi^2_2 = 6.3$

OR(k = light) = (14 * 66.7) / (6 * 33.3) = 4.67
OR(k = heavy) = (21 * 33.3) / (9 * 16.7) = 4.67

The male-heart disease OR is not confounded by weight.

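Male and Overweight

Model 2: Log(counts) = 4.20 - .69 (overweight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

OR for male and overweight within disease stratum k, with $\mu_{11}$ = overweight/male, $\mu_{12}$ = overweight/female, $\mu_{21}$ = light/male, $\mu_{22}$ = light/female, and O, M, H standing for overweight, isMale, HeartDisease:

$$\log(OR) = \log\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}} = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$

k = no heart disease:

$$(\alpha + \beta_O + \beta_M + \beta_{MO}) + (\alpha) - (\alpha + \beta_O) - (\alpha + \beta_M) = \beta_{MO}$$

k = heart disease:

$$(\alpha + \beta_O + \beta_M + \beta_H + \beta_{MH} + \beta_{MO}) + (\alpha + \beta_H) - (\alpha + \beta_O + \beta_H) - (\alpha + \beta_M + \beta_H + \beta_{MH}) = \beta_{MO}$$

$$OR = e^{\beta_{MO}} = e^{1.1} = 3.0$$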

OR estimate from predicted counts

Cells                        Observed    Pred
light/male/disease              15       14
light/male/no disease            5        6
light/female/disease            40       33.3
light/female/no disease         60       66.7
heavy/male/disease              20       21
heavy/male/no disease           10        9
heavy/female/disease            10       16.7
heavy/female/no disease         40       33.3

OR(k = heart disease) = (21 * 33.3) / (14 * 16.7) = 3.00
OR(k = no heart disease) = (9 * 66.7) / (6 * 33.3) = 3.00

The male-overweight OR is not confounded by heart disease.

Interpretation: Model 2

Overweight and heart disease are independent when you condition on gender.

                    Heart Disease
Men                 Yes      No
Overweight           21       9
Normal               14       6
OR = (21 * 6) / (14 * 9) = 1.0

Women               Yes      No
Overweight          16.7    33.3
Normal              33.3    66.7
OR = (16.7 * 66.7) / (33.3 * 33.3) = 1.0
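For a model in which two variables are conditionally independent given a third, the fitted counts have a closed form: within each sex, multiply the sex-by-weight total by the sex-by-disease total and divide by the sex total. The Python sketch below (not the SAS fit from the slides; `total` and `odds_ratio` are illustrative helper names) uses that shortcut to reproduce the Model 2 predicted counts and the three odds ratios discussed above.

```python
# Observed counts from the example, keyed by (weight, sex, disease).
observed = {
    ("light", "male", "disease"): 15,
    ("light", "male", "no disease"): 5,
    ("light", "female", "disease"): 40,
    ("light", "female", "no disease"): 60,
    ("heavy", "male", "disease"): 20,
    ("heavy", "male", "no disease"): 10,
    ("heavy", "female", "disease"): 10,
    ("heavy", "female", "no disease"): 40,
}

def total(weight=None, sex=None, disease=None):
    """Sum observed counts over every cell matching the fixed levels."""
    return sum(
        count for (w, s, d), count in observed.items()
        if weight in (None, w) and sex in (None, s) and disease in (None, d)
    )

# Weight and disease conditionally independent given sex:
# fitted = n(sex, weight) * n(sex, disease) / n(sex).
fitted = {
    (w, s, d): total(weight=w, sex=s) * total(sex=s, disease=d) / total(sex=s)
    for (w, s, d) in observed
}

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Male vs. disease, within each weight stratum (about 4.67 in both).
or_male_chd = {
    w: odds_ratio(fitted[w, "male", "disease"], fitted[w, "male", "no disease"],
                  fitted[w, "female", "disease"], fitted[w, "female", "no disease"])
    for w in ("light", "heavy")
}
# Overweight vs. male, within each disease stratum (3.0 in both).
or_weight_male = {
    d: odds_ratio(fitted["heavy", "male", d], fitted["heavy", "female", d],
                  fitted["light", "male", d], fitted["light", "female", d])
    for d in ("disease", "no disease")
}
# Overweight vs. disease, within each sex (1.0: conditional independence).
or_weight_chd = {
    s: odds_ratio(fitted["heavy", s, "disease"], fitted["heavy", s, "no disease"],
                  fitted["light", s, "disease"], fitted["light", s, "no disease"])
    for s in ("male", "female")
}
print(or_male_chd, or_weight_male, or_weight_chd)
```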

Model 3: only male and heart disease are related

Model 3 (main effects + single interaction): This model corresponds to the case when heart disease and overweight, and gender and overweight, are conditionally independent.

$$\log(\text{counts}) = \alpha + \beta_{\text{overweight}} + \beta_{\text{isMale}} + \beta_{\text{HeartDisease}} + \beta_{\text{isMale*HeartDisease}}$$

Output, Model 3:

Log(counts) = 4.09 - .41 (overweight) - 1.9 (male) - .69 (heart disease) + 1.54 (if male and heartdis)

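OR: Male and CHD

Model 3: Log(counts) = 4.09 - .41 (overweight) - 1.9 (male) - .69 (heart disease) + 1.54 (if male and heartdis)

With $\mu_{11}$ = male/CHD, $\mu_{12}$ = male/no CHD, $\mu_{21}$ = female/CHD, $\mu_{22}$ = female/no CHD within weight stratum k, and O, M, H standing for overweight, isMale, HeartDisease:

$$\log(OR) = \log\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}} = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$

k = not overweight:

$$(\alpha + \beta_M + \beta_H + \beta_{MH}) + (\alpha) - (\alpha + \beta_M) - (\alpha + \beta_H) = \beta_{MH}$$

k = overweight:

$$(\alpha + \beta_O + \beta_M + \beta_H + \beta_{MH}) + (\alpha + \beta_O) - (\alpha + \beta_O + \beta_M) - (\alpha + \beta_O + \beta_H) = \beta_{MH}$$

$$OR = e^{\beta_{MH}} = e^{1.54} = 4.67$$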

Model 3: only male and heart disease are related

Cells                        Observed    Pred
light/male/disease              15       21
light/male/no disease            5        9
light/female/disease            40       30
light/female/no disease         60       60
heavy/male/disease              20       14
heavy/male/no disease           10        6
heavy/female/disease            10       20
heavy/female/no disease         40       40

Collapses to

            Male    Female
CHD          35       50
No CHD       15      100
OR = (35 * 100) / (50 * 15) = 4.67

And heart disease and overweight are independent, regardless of gender

            Overweight    Light
CHD             34          51
No CHD          46          69
OR = (34 * 69) / (46 * 51) = 1.00

And overweight and gender are independent, regardless of disease

            Overweight    Light
Male            20          30
Female          60          90
OR = (20 * 90) / (60 * 30) = 1.00
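When weight is independent of the (sex, disease) pair, the fitted count in each cell is the weight total times the sex-by-disease total, divided by n. The Python sketch below (not the SAS used in the slides; `collapse` and `or_2x2` are illustrative helper names) rebuilds the Model 3 predicted counts that way and then collapses them into the three 2x2 tables shown above.

```python
# Observed counts from the example, keyed by (weight, sex, disease).
observed = {
    ("light", "male", "disease"): 15,
    ("light", "male", "no disease"): 5,
    ("light", "female", "disease"): 40,
    ("light", "female", "no disease"): 60,
    ("heavy", "male", "disease"): 20,
    ("heavy", "male", "no disease"): 10,
    ("heavy", "female", "disease"): 10,
    ("heavy", "female", "no disease"): 40,
}
n = sum(observed.values())

def collapse(table, keep):
    """Sum a 3-way table over the variables not listed in `keep`."""
    out = {}
    for key, value in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + value
    return out

# Model 3: weight independent of (sex, disease) jointly.
weight_totals = collapse(observed, [0])
sex_disease_totals = collapse(observed, [1, 2])
fitted = {
    (w, s, d): weight_totals[(w,)] * sex_disease_totals[(s, d)] / n
    for (w, s, d) in observed
}

def or_2x2(t, row1, row2, col1, col2):
    """Odds ratio of a collapsed 2x2 table keyed by (row level, column level)."""
    return (t[(row1, col1)] * t[(row2, col2)]) / (t[(row1, col2)] * t[(row2, col1)])

or_sex_chd = or_2x2(collapse(fitted, [1, 2]), "male", "female", "disease", "no disease")
or_weight_chd = or_2x2(collapse(fitted, [0, 2]), "heavy", "light", "disease", "no disease")
or_weight_sex = or_2x2(collapse(fitted, [0, 1]), "heavy", "light", "male", "female")
print(round(or_sex_chd, 2), round(or_weight_chd, 2), round(or_weight_sex, 2))
```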

M4: All pair-wise interactions

Model 4 (main effects + all pairwise interactions): No pair of variables is conditionally independent.

$$\log(\text{counts}) = \alpha + \beta_{\text{overweight}} + \beta_{\text{isMale}} + \beta_{\text{HeartDisease}} + \beta_{\text{isMale*HeartDisease}} + \beta_{\text{isMale*overweight}} + \beta_{\text{HeartDis*overweight}}$$

proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis ismale*heartdis ismale*overweight Overweight*HeartDis / dist=poisson link=log pred;
run;

Analysis Of Parameter Estimates, Model 4

                                   Standard     Wald 95%              Chi-
Parameter              DF Estimate   Error   Confidence Limits       Square   Pr > ChiSq
Intercept               1   4.1103   0.1263   3.8627    4.3579      1058.30     <.0001
Overweight              1  -0.4458   0.1978  -0.8336   -0.0581         5.08     0.0242
IsMale                  1  -2.7153   0.3877  -3.4753   -1.9554        49.04     <.0001
HeartDis                1  -0.4458   0.1978  -0.8336   -0.0581         5.08     0.0242
IsMale*HeartDis         1   1.8213   0.3871   1.0627    2.5799        22.14     <.0001
Overweight*IsMale       1   1.4456   0.3797   0.7013    2.1899        14.49     0.0001
Overweight*HeartDis     1  -0.8239   0.3431  -1.4963   -0.1515         5.77     0.0163

Log(counts) = 4.11 - .45 (overweight) - 2.7 (male) - .45 (heart disease) + 1.82 (if male and heartdis) + 1.45 (if overweight and male) - .82 (if overweight and heartdis)

OR: Male and CHD

Model 4: Log(counts) = 4.11 - .45 (overweight) - 2.7 (male) - .45 (heart disease) + 1.82 (if male and heartdis) + 1.45 (if overweight and male) - .82 (if overweight and heartdis)

With $\mu_{11}$ = male/CHD, $\mu_{12}$ = male/no CHD, $\mu_{21}$ = female/CHD, $\mu_{22}$ = female/no CHD within weight stratum k, and O, M, H standing for overweight, isMale, HeartDisease:

$$\log(OR) = \log\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}} = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21}$$

k = overweight:

$$(\alpha + \beta_O + \beta_M + \beta_H + \beta_{MH} + \beta_{MO} + \beta_{OH}) + (\alpha + \beta_O) - (\alpha + \beta_O + \beta_M + \beta_{MO}) - (\alpha + \beta_O + \beta_H + \beta_{OH}) = \beta_{MH}$$

k = not overweight:

$$(\alpha + \beta_M + \beta_H + \beta_{MH}) + (\alpha) - (\alpha + \beta_M) - (\alpha + \beta_H) = \beta_{MH}$$

$$OR = e^{\beta_{MH}} = e^{1.82} = 6.2$$

Corresponds to the M-H summary OR, stratified by overweight.

OR: CHD and overweight

$$OR = e^{\beta_{OH}} = e^{-.82} = .44$$

Corresponds to the M-H summary OR, stratified by gender.

OR: Male and overweight

$$OR = e^{\beta_{MO}} = e^{1.45} = 4.2$$

Corresponds to the M-H summary OR, stratified by heart disease.

Model 4: observed and predicted counts

Cells                        Observed    Pred
light/male/disease              15       16
light/male/no disease            5        4
light/female/disease            40       39
light/female/no disease         60       61
heavy/male/disease              20       19
heavy/male/no disease           10       11
heavy/female/disease            10       11
heavy/female/no disease         40       39

$\chi^2_1 = .571$
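Model 4 has no closed-form fitted values, but its predicted counts can be approximated without SAS by iterative proportional fitting (IPF), which rescales the table until all three two-way margins match the observed ones. The Python sketch below is an illustrative alternative to the `proc genmod` fit, not the method used in the slides; `margin` and `ipf` are made-up helper names.

```python
# Observed counts from the example, keyed by (weight, sex, disease).
observed = {
    ("light", "male", "disease"): 15,
    ("light", "male", "no disease"): 5,
    ("light", "female", "disease"): 40,
    ("light", "female", "no disease"): 60,
    ("heavy", "male", "disease"): 20,
    ("heavy", "male", "no disease"): 10,
    ("heavy", "female", "disease"): 10,
    ("heavy", "female", "no disease"): 40,
}

def margin(table, axes):
    """Sum the table over every variable not listed in `axes`."""
    out = {}
    for key, value in table.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + value
    return out

def ipf(table, margins, iterations=500):
    """Fit the log-linear model whose sufficient margins are `margins`."""
    fitted = {key: 1.0 for key in table}
    targets = [margin(table, axes) for axes in margins]
    for _ in range(iterations):
        for axes, target in zip(margins, targets):
            current = margin(fitted, axes)
            for key in fitted:
                sub = tuple(key[a] for a in axes)
                fitted[key] *= target[sub] / current[sub]
    return fitted

# All pairwise interactions: match weight*sex, weight*disease, sex*disease.
fitted = ipf(observed, margins=[(0, 1), (0, 2), (1, 2)])

def male_chd_or(w):
    """Male vs. disease odds ratio within weight stratum w."""
    return (fitted[w, "male", "disease"] * fitted[w, "female", "no disease"]) / (
        fitted[w, "male", "no disease"] * fitted[w, "female", "disease"])

or_light, or_heavy = male_chd_or("light"), male_chd_or("heavy")
for cell, value in fitted.items():
    print("/".join(cell), observed[cell], round(value, 1))
print(round(or_light, 2), round(or_heavy, 2))
```

Because the model has no three-way term, the male-disease odds ratio comes out the same in both weight strata, close to the exponentiated IsMale*HeartDis estimate.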

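The saturated model

Model 5 (saturated):

$$\log(\text{counts}) = \alpha + \beta_{\text{overweight}} + \beta_{\text{isMale}} + \beta_{\text{HeartDisease}} + \beta_{\text{isMale*HeartDisease}} + \beta_{\text{isMale*overweight}} + \beta_{\text{HeartDis*overweight}} + \beta_{\text{isMale*HeartDisease*overweight}}$$

Perfect fit; no degrees of freedom.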