Solution to Tutorial 7


ST3241 Categorical Data Analysis I, Semester II, 2012-2013

1. (a) We first fit the independence model

$$\log \mu_{ij} = \lambda + \lambda^X_i + \lambda^Y_j, \quad i = 1, 2, \; j = 1, 2.$$

The parameter estimates are $\hat\lambda = 9.2956$, $\hat\lambda^X_1 = -0.0003$, $\hat\lambda^X_2 = 0.0$, $\hat\lambda^Y_1 = -4.3085$ and $\hat\lambda^Y_2 = 0.0$. The likelihood-ratio goodness-of-fit statistic (deviance) for the fitted model is $G^2 = 25.3720$ with d.f. = 1 (p-value = $4.727301 \times 10^{-7}$). Thus, the independence model is not a good fit. In this case, $\hat\lambda^Y_1 - \hat\lambda^Y_2 = -4.3085$, which is the estimated log odds of myocardial infarction. In other words, under this model the estimated odds of myocardial infarction are the same for the placebo group and the aspirin group, namely $\exp(-4.3085) = 0.01345$.

(b) For the saturated model, the parameter estimates are: $\hat\lambda = 9.2995$, $\hat\lambda^X_1 = -0.0081$, $\hat\lambda^X_2 = 0.0$, $\hat\lambda^Y_1 = -4.6552$, $\hat\lambda^Y_2 = 0.0$, $\hat\lambda^{XY}_{11} = 0.6054$, $\hat\lambda^{XY}_{12} = \hat\lambda^{XY}_{21} = \hat\lambda^{XY}_{22} = 0.0$. The estimated log odds ratio between myocardial infarction and aspirin use is

$$\hat\lambda^{XY}_{11} + \hat\lambda^{XY}_{22} - \hat\lambda^{XY}_{12} - \hat\lambda^{XY}_{21} = 0.6054.$$

Therefore, the estimated odds ratio between myocardial infarction and aspirin use is

$$\exp(0.6054) = 1.832 = \frac{189 \times 10933}{104 \times 10845},$$

which is the sample odds ratio.

2. (a) The parameter estimates for the fitted (DV, DP, PV) model are: $\hat\lambda = 4.9358$, $\hat\lambda^D_1 = -2.1746$, $\hat\lambda^D_2 = 0$, $\hat\lambda^V_1 = -1.3298$, $\hat\lambda^V_2 = 0$, $\hat\lambda^P_1 = -3.5961$, $\hat\lambda^P_2 = 0.0$, and $\hat\lambda^{DV}_{11} = 4.5950$, $\hat\lambda^{PV}_{11} = 2.4044$, $\hat\lambda^{DP}_{11} = -0.8678$.

(b) The fitted values are:

Victim's Race   Defendant's Race   Death Penalty: Yes   Death Penalty: No
White           White                   52.82                414.18
White           Black                   11.18                 36.82
Black           White                    0.18                 15.82
Black           Black                    3.82                139.18
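As a consistency check between parts (a) and (b) of Problem 2, the parameter estimates can be recovered from the fitted values, assuming the baseline constraints used throughout (every parameter involving level 2 is set to zero, with level 1 = White or Yes and level 2 = Black or No). The sketch below is only an illustration: the dictionary layout and the helper name `log_mu` are hypothetical, and because the fitted values are rounded to two decimals the results agree with part (a) only to about two or three decimal places.

```python
import math

# Fitted values of the (DV, DP, PV) model, copied from the table in part (b).
# Keys are (defendant's race, victim's race, death penalty).
fitted = {
    ("white", "white", "yes"): 52.82, ("white", "white", "no"): 414.18,
    ("black", "white", "yes"): 11.18, ("black", "white", "no"): 36.82,
    ("white", "black", "yes"): 0.18,  ("white", "black", "no"): 15.82,
    ("black", "black", "yes"): 3.82,  ("black", "black", "no"): 139.18,
}


def log_mu(d, v, p):
    """Log of the fitted count for one cell (hypothetical helper)."""
    return math.log(fitted[(d, v, p)])


# Baseline cell: black defendant, black victim, no death penalty.
base = log_mu("black", "black", "no")

estimates = {
    "lambda": base,
    # Main effects: change one factor to level 1, keep the others at baseline.
    "D1": log_mu("white", "black", "no") - base,
    "V1": log_mu("black", "white", "no") - base,
    "P1": log_mu("black", "black", "yes") - base,
    # Two-factor terms: log odds ratios with the third factor held fixed.
    "DV11": (log_mu("white", "white", "no") - log_mu("white", "black", "no")
             - log_mu("black", "white", "no") + base),
    "PV11": (log_mu("black", "white", "yes") - log_mu("black", "white", "no")
             - log_mu("black", "black", "yes") + base),
    # With no three-factor term, either victim level gives the same D-P log
    # odds ratio; the white-victim table has larger counts, so less rounding noise.
    "DP11": (log_mu("white", "white", "yes") - log_mu("white", "white", "no")
             - log_mu("black", "white", "yes") + log_mu("black", "white", "no")),
}

for name, value in estimates.items():
    print(f"{name:6s} {value: .4f}")
```

The D-P term is the one that matters for the interpretation that follows: its exponential is the common conditional odds ratio between defendant's race and the death penalty verdict.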

From the fitted values, the estimated odds ratio between D and P is 0.42 when V = White and 0.42 when V = Black. Since this is a homogeneous association model, the odds ratio is the same at each level of V. This odds ratio can also be obtained as

$$\exp\{\hat\lambda^{DP}_{11} + \hat\lambda^{DP}_{22} - \hat\lambda^{DP}_{12} - \hat\lambda^{DP}_{21}\} = \exp(-0.8678) = 0.42.$$

The common odds ratio is smaller than 1, indicating that the odds of receiving the death penalty are lower for a white defendant than for a black defendant, controlling for the victim's race.

(c) The fitted marginal table between death penalty verdict and defendant's race is

Defendant's Race   Death Penalty: Yes   Death Penalty: No
White                    53                   430
Black                    15                   176

Note that the marginal table from the fitted values is the same as the one obtained from the sample data. The marginal odds ratio is therefore $(53 \times 176)/(430 \times 15) = 1.446$. It is greater than 1, indicating that the odds of receiving the death penalty are higher for white defendants than for black defendants if we ignore the victim's race. Thus, the association in the marginal table is in the opposite direction to the conditional association controlling for the victim's race. This illustrates Simpson's paradox.

(d) The goodness-of-fit test statistic for this homogeneous association model is $G^2 = 0.3798$ with d.f. = 1 (p-value = 0.5377). Thus, we can conclude that the homogeneous association model is a good fit.

(e) The parameter estimates of the fitted (DV, PV) model are: $\hat\lambda = 4.9374$, $\hat\lambda^V_1 = -1.1989$, $\hat\lambda^V_2 = 0.0$, $\hat\lambda^D_1 = -2.1903$, $\hat\lambda^D_2 = 0.0$, $\hat\lambda^P_1 = -3.6571$, $\hat\lambda^P_2 = 0.0$, and $\hat\lambda^{DV}_{11} = 4.4654$, $\hat\lambda^{PV}_{11} = 1.7045$.

This is the conditional independence model between D and P controlling for V. The estimated conditional log odds ratio between D and V, controlling for P, is $\hat\lambda^{DV}_{11} = 4.4654$, and the estimated conditional log odds ratio between P and V, controlling for D, is $\hat\lambda^{PV}_{11} = 1.7045$. The likelihood-ratio goodness-of-fit statistic for this model is $G^2 = 5.3940$ with d.f. = 2 (p-value = 0.06741). Since the p-value is greater than 0.05, we can conclude that this model is also a good fit. The following table shows the adjusted residuals obtained from this model fit:

Victim's Race   Defendant's Race   Death Penalty: Yes   Death Penalty: No
White           White                   -2.313                2.313
White           Black                    2.313               -2.313
Black           White                   -0.678                0.678
Black           Black                    0.678               -0.678
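The reversal between the conditional and marginal associations in parts (b) and (c) can be reproduced directly from the fitted values of the (DV, DP, PV) model. The sketch below is a minimal illustration, assuming the fitted values from part (b); the nested-dictionary layout and the helper name `odds_ratio` are hypothetical.

```python
# Fitted values of the (DV, DP, PV) model from part (b):
# fitted[victim's race][defendant's race] = (death penalty yes, death penalty no)
fitted = {
    "white": {"white": (52.82, 414.18), "black": (11.18, 36.82)},
    "black": {"white": (0.18, 15.82), "black": (3.82, 139.18)},
}


def odds_ratio(table):
    """Odds of a death sentence for white relative to black defendants."""
    (white_yes, white_no) = table["white"]
    (black_yes, black_no) = table["black"]
    return (white_yes * black_no) / (white_no * black_yes)


# Conditional odds ratios: one partial table per level of victim's race.
conditional = {victim: odds_ratio(table) for victim, table in fitted.items()}

# Marginal table: collapse over victim's race by summing the partial tables.
marginal = {
    defendant: tuple(
        sum(fitted[victim][defendant][k] for victim in fitted) for k in (0, 1)
    )
    for defendant in ("white", "black")
}
marginal_or = odds_ratio(marginal)

print({v: round(value, 2) for v, value in conditional.items()})
print({d: tuple(round(x, 2) for x in row) for d, row in marginal.items()})
print(round(marginal_or, 3))
```

Both conditional odds ratios fall below 1 while the collapsed table gives an odds ratio above 1, which is the pattern described as Simpson's paradox in part (c). The black-victim odds ratio is slightly off 0.42 before rounding because the small fitted count 0.18 is itself rounded.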

The adjusted residuals in the partial table for white victims are somewhat larger in magnitude than those in the partial table for black victims. This suggests that the conditional independence model may not be a very good fit.

(f) To test for D-P partial association, we need to test the null hypothesis

$$H_0: \lambda^{DP}_{11} = \lambda^{DP}_{12} = \lambda^{DP}_{21} = \lambda^{DP}_{22} = 0.$$

The likelihood-ratio test statistic for this purpose is

$$G^2[(DV, PV) \mid (DV, PV, DP)] = G^2(DV, PV) - G^2(DV, PV, DP) = 5.3940 - 0.3798 = 5.0142$$

with d.f. = 2 - 1 = 1. It has a p-value of 0.0251, which is smaller than 0.05, so the null hypothesis can be rejected. That is, the partial association between D and P is significant.

3. Since $\{\mu_{ij} = n\pi_{ij}\}$ satisfies the independence model in an $I \times J$ table, $\pi_{ij} = \pi_{i+}\pi_{+j}$. Therefore,

$$\log \mu_{ij} = \log n + \log \pi_{i+} + \log \pi_{+j}.$$

The loglinear independence model can be written as

$$\log \mu_{ij} = \lambda + \lambda^X_i + \lambda^Y_j.$$

(a) From the first model, we can write

$$\log \mu_{ia} - \log \mu_{ib} = (\log n + \log \pi_{i+} + \log \pi_{+a}) - (\log n + \log \pi_{i+} + \log \pi_{+b}) = \log \pi_{+a} - \log \pi_{+b} = \log(\pi_{+a}/\pi_{+b}).$$

Now from the second model,

$$\log \mu_{ia} - \log \mu_{ib} = (\lambda + \lambda^X_i + \lambda^Y_a) - (\lambda + \lambda^X_i + \lambda^Y_b) = \lambda^Y_a - \lambda^Y_b.$$

Thus, $\lambda^Y_a - \lambda^Y_b = \log(\pi_{+a}/\pi_{+b})$.

(b) All $\lambda^Y_j = 0$ implies that $\log(\pi_{+a}/\pi_{+b}) = 0$, or $\pi_{+a} = \pi_{+b}$, for all $a$ and $b$. Let all $\pi_{+j} = \pi$. Then from the relation

$$\sum_{j=1}^{J} \pi_{+j} = 1$$

we have $\pi = 1/J$. That is, $\pi_{+j} = 1/J$ for all $j$.

4. (a) We first fit the model (AD, DG) of conditional independence between gender and admission, controlling for department:

$$\log \mu_{ijk} = \lambda + \lambda^A_i + \lambda^G_j + \lambda^D_k + \lambda^{AD}_{ik} + \lambda^{GD}_{jk}.$$

The fitted values are given in the following table:

              Whether admitted, male     Whether admitted, female
Department    Yes         No             Yes         No
1             531.43      293.57          69.57       38.43
2             354.19      205.81          15.81        9.19
3             114.00      211.00         208.00      385.00
4             141.63      275.37         127.37      247.63
5              48.08      142.92          98.92      294.08
6              24.03      348.97          21.97      319.03

The deviance for the fitted model is 21.7355 with 6 degrees of freedom. Thus, it has a p-value of 0.0014, indicating that the model is a poor fit.

Next we fit the homogeneous association model (AD, AG, DG):

$$\log \mu_{ijk} = \lambda + \lambda^A_i + \lambda^G_j + \lambda^D_k + \lambda^{AG}_{ij} + \lambda^{AD}_{ik} + \lambda^{GD}_{jk}.$$

The fitted values are given in the following table:

              Whether admitted, male     Whether admitted, female
Department    Yes         No             Yes         No
1             529.27      295.73          71.73       36.27
2             353.64      206.36          16.36        8.64
3             109.25      215.75         212.75      380.25
4             137.21      279.79         131.79      243.21
5              45.68      145.32         101.32      291.68
6              22.96      350.04          23.04      317.96

The deviance for the fitted model is 20.2043 with 5 degrees of freedom. Thus, it has a p-value of 0.0011, indicating that the homogeneous association model is also a poor fit.

(b) We compute the adjusted residuals for the model (AD, DG) and report them in the following table:

              Whether admitted, male     Whether admitted, female
Department    Yes         No             Yes         No
1             -4.153       4.153          4.153      -4.153
2             -0.504       0.504          0.504      -0.504
3              0.868      -0.868         -0.868       0.868
4             -0.546       0.546          0.546      -0.546
5              1.000      -1.000         -1.000       1.000
6             -0.620       0.620          0.620      -0.620

The residuals are large for Department 1 but small for the other departments, so Department 1 is most likely an outlier.

(c) We again fit the model (AD, DG) after deleting the data for Department 1. We report the fitted values in the following table:

              Whether admitted, male     Whether admitted, female
Department    Yes         No             Yes         No
2             354.19      205.81          15.81        9.19
3             114.00      211.00         208.00      385.00
4             141.63      275.37         127.37      247.63
5              48.08      142.92          98.92      294.08
6              24.03      348.97          21.97      319.03

The deviance for the fitted model is 2.6815 with 5 degrees of freedom. Thus, it has a p-value of 0.7489, indicating that the model is a very good fit. We can conclude that, within Departments 2 to 6, there is no evidence of an association between gender and admission.
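The p-values quoted for the deviances in Problems 1, 2 and 4 can be reproduced without statistical software, since the chi-square upper-tail probability has a closed form for integer degrees of freedom. The sketch below is an illustration under that assumption: the function name `chi2_sf` and the dictionary labels are hypothetical, and in practice a library routine (for example `pchisq` in R) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction sum,
    # 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1)).
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total


# (G^2, d.f.) pairs as reported in the solutions.
reported = {
    "1a independence": (25.3720, 1),
    "2d (DV, DP, PV)": (0.3798, 1),
    "2e (DV, PV)": (5.3940, 2),
    "2f DP partial association": (5.3940 - 0.3798, 2 - 1),
    "4a (AD, DG)": (21.7355, 6),
    "4a (AD, AG, DG)": (20.2043, 5),
    "4c (AD, DG) without Dept 1": (2.6815, 5),
}

p_values = {label: chi2_sf(g2, df) for label, (g2, df) in reported.items()}

for label, p in p_values.items():
    print(f"{label:28s} p = {p:.4g}")
```

Deleting Department 1 removes one degree of freedom and almost all of the deviance from the (AD, DG) fit, which is why the p-value moves from about 0.001 to about 0.75.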