Loglinear Models

Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These models specify how a cell count is related to the levels of the categorical variables that define that cell, and this specification reflects the association and interaction structure among the variables. We begin with the simplest case, that of two-way tables.

Loglinear Models for Two-way Tables

Consider an $I \times J$ table that cross-classifies $n$ subjects on two categorical variables. Here the cell counts $n_{ij}$ follow a multinomial distribution with $IJ$ categories. The probabilities $\pi_{ij}$ for this multinomial form the joint distribution of the two categorical variables. The variables are statistically independent when $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$. The expected frequencies $m_{ij} = n\pi_{ij}$ then simplify to

$$m_{ij} = n\,\pi_{i+}\,\pi_{+j} \quad \text{for all } i \text{ and } j.$$

We construct loglinear models using $m_{ij}$ instead of $\pi_{ij}$ so that they also apply to the Poisson sampling model. On a logarithmic scale, independence has an additive form:

$$\log m_{ij} = \log n + \log \pi_{i+} + \log \pi_{+j}.$$

Denoting the row variable by $X$ and the column variable by $Y$, we can rewrite this expression as

$$\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y,$$

where

$$\lambda_i^X = \log \pi_{i+} - \frac{1}{I}\sum_{h=1}^{I} \log \pi_{h+}, \qquad \lambda_j^Y = \log \pi_{+j} - \frac{1}{J}\sum_{h=1}^{J} \log \pi_{+h},$$

$$\lambda = \log n + \frac{1}{I}\sum_{h=1}^{I} \log \pi_{h+} + \frac{1}{J}\sum_{h=1}^{J} \log \pi_{+h}.$$

This model is called the loglinear model of independence for two-way contingency tables.

NOTE:
1. The two-way ANOVA main-effects model is $E(Y_{ijk}) = \mu + \alpha_i + \beta_j$, where $\alpha_i = \mu_{i\cdot} - \mu$ and $\beta_j = \mu_{\cdot j} - \mu$.
2. The parameters $\lambda_i^X$ and $\lambda_j^Y$ satisfy $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0$. Zero-sum constraints like this are often used in experimental design to make model parameters identifiable. (Other parameter definitions are possible.)

In the above model, the log expected frequency for cell $(i,j)$ is an additive function of a row effect $\lambda_i^X$ and a column effect $\lambda_j^Y$. The parameter $\lambda_i^X$ represents the effect of classification in row $i$ of variable $X$: the larger $\lambda_i^X$, the larger each expected frequency in row $i$ of the table. When $\lambda_h^X = \lambda_l^X$, each expected frequency in row $h$ equals the corresponding expected frequency in row $l$. Similarly, the parameter $\lambda_j^Y$ represents the effect of classification in column $j$ of variable $Y$. The null hypothesis that this loglinear model holds is equivalent to the null hypothesis that $X$ and $Y$ are independent. The fitted values that satisfy the model are then

$$\hat m_{ij} = \frac{n_{i+}\,n_{+j}}{n},$$

which are the same as the estimated expected frequencies for the test of independence we saw earlier. Chi-square tests of independence using $X^2$ and $G^2$ are therefore also goodness-of-fit tests of this loglinear model.

Once we have fitted the model, we turn to interpreting it. To understand the parameters of the independence model, consider a table with only 2 columns, i.e. an $I \times 2$ table. In terms of the joint distribution, the log odds of being in column 1 instead of column 2 for the $i$th row is

$$\log\frac{\pi_{i1}}{\pi_{i2}} = \log\frac{m_{i1}}{m_{i2}} = \log m_{i1} - \log m_{i2} = (\lambda + \lambda_i^X + \lambda_1^Y) - (\lambda + \lambda_i^X + \lambda_2^Y) = \lambda_1^Y - \lambda_2^Y = 2\lambda_1^Y,$$

using the fact that $\lambda_1^Y + \lambda_2^Y = 0$. Hence, in every row, the odds of response in column 1 instead of column 2 equal $e^{2\lambda_1^Y}$, which does not depend on $i$. This implies that the probability of classification in a particular column is the same for all rows.

Note: For the special case of a $2 \times 2$ table, the log odds ratio is

$$\log\theta = \log\frac{m_{11}m_{22}}{m_{12}m_{21}} = \log m_{11} + \log m_{22} - \log m_{12} - \log m_{21},$$

and under our model

$$\log\theta = (\lambda + \lambda_1^X + \lambda_1^Y) + (\lambda + \lambda_2^X + \lambda_2^Y) - (\lambda + \lambda_1^X + \lambda_2^Y) - (\lambda + \lambda_2^X + \lambda_1^Y) = 0,$$

so that $\theta = 1$ under the null hypothesis that the model holds. This agrees with the familiar result that the odds ratio equals 1 under independence.
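As a numerical check on the parameterization and its interpretation, here is a minimal Python sketch (standard library only). The function name `independence_loglinear` and the 3×2 counts are hypothetical, made up for illustration. The sketch plugs sample proportions into the definitions of $\lambda$, $\lambda_i^X$ and $\lambda_j^Y$. It then confirms that $\exp(\lambda + \lambda_i^X + \lambda_j^Y)$ reproduces $n_{i+}n_{+j}/n$ and that the fitted log odds of column 1 versus column 2 equal $2\lambda_1^Y$ in every row.

```python
import math


def independence_loglinear(table):
    """Fit the loglinear model of independence to an I x J table of counts.

    Returns (lam, lam_x, lam_y, fitted) in the zero-sum parameterization,
    with pi_{i+} and pi_{+j} replaced by the sample proportions.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]            # n_{i+}
    col_tot = [sum(col) for col in zip(*table)]      # n_{+j}
    log_pr = [math.log(r / n) for r in row_tot]      # log p_{i+}
    log_pc = [math.log(c / n) for c in col_tot]      # log p_{+j}
    mean_r = sum(log_pr) / len(log_pr)
    mean_c = sum(log_pc) / len(log_pc)
    lam = math.log(n) + mean_r + mean_c
    lam_x = [v - mean_r for v in log_pr]
    lam_y = [v - mean_c for v in log_pc]
    fitted = [[r * c / n for c in col_tot] for r in row_tot]  # n_{i+} n_{+j} / n
    return lam, lam_x, lam_y, fitted


# Hypothetical 3 x 2 table of counts, made up for illustration only.
table = [[30, 20], [45, 55], [25, 10]]
lam, lam_x, lam_y, fitted = independence_loglinear(table)

# The zero-sum constraints hold (up to floating-point error).
assert abs(sum(lam_x)) < 1e-12 and abs(sum(lam_y)) < 1e-12

for i, row in enumerate(fitted):
    # exp(lambda + lambda_i^X + lambda_j^Y) reproduces n_{i+} n_{+j} / n ...
    for j, m in enumerate(row):
        assert math.isclose(math.exp(lam + lam_x[i] + lam_y[j]), m)
    # ... and the fitted log odds (column 1 vs. column 2) is 2 * lambda_1^Y in every row.
    log_odds = math.log(row[0] / row[1])
    print(f"row {i + 1}: log odds = {log_odds:.6f}, 2*lambda_1^Y = {2 * lam_y[0]:.6f}")
```

With column totals 100 and 85, every row prints the same value, $\log(100/85) \approx 0.162519$, as the derivation predicts.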

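Example: The following $2 \times 2$ table classifies $n = 3566$ individuals according to smoking status and sleep problems:

| Smoking status | Sleep problems: Yes | Sleep problems: No | Total |
|---|---|---|---|
| Yes | $n_{11} = 346$ | $n_{12} = 1198$ | $n_{1+} = 1544$ |
| No | $n_{21} = 320$ | $n_{22} = 1702$ | $n_{2+} = 2022$ |
| Total | $n_{+1} = 666$ | $n_{+2} = 2900$ | $n = 3566$ |

Are the variables Smoking Status and Sleep Problems independent? We approach this question by fitting the loglinear model of independence, $\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, to these data. Under the null hypothesis of independence, the MLEs of $\pi_{i+}$ and $\pi_{+j}$ are

$$\hat\pi_{i+} = \frac{n_{i+}}{n} = p_{i+}, \qquad \hat\pi_{+j} = \frac{n_{+j}}{n} = p_{+j}.$$

We therefore estimate $\lambda$, $\lambda_i^X$ and $\lambda_j^Y$ by substituting these proportions into their definitions:

$$\hat\lambda_i^X = \log p_{i+} - \frac{1}{I}\sum_{h=1}^{I}\log p_{h+}, \qquad \hat\lambda_j^Y = \log p_{+j} - \frac{1}{J}\sum_{h=1}^{J}\log p_{+h},$$

$$\hat\lambda = \log n + \frac{1}{I}\sum_{h=1}^{I}\log p_{h+} + \frac{1}{J}\sum_{h=1}^{J}\log p_{+h}.$$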
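These estimator formulas are easy to evaluate directly, which is a useful check on hand arithmetic. A minimal Python sketch for the smoking data (standard library only; variable names are our own) computes $\hat\lambda$, $\hat\lambda_i^X$, $\hat\lambda_j^Y$, the fitted log means $\log\hat m_{ij} = \hat\lambda + \hat\lambda_i^X + \hat\lambda_j^Y$ and the fitted counts.

```python
import math

# Smoking data: rows = smoking status (Yes, No), columns = sleep problems (Yes, No).
counts = [[346, 1198], [320, 1702]]
n = sum(map(sum, counts))
row_tot = [sum(r) for r in counts]          # n_{i+}: 1544, 2022
col_tot = [sum(c) for c in zip(*counts)]    # n_{+j}: 666, 2900

log_p_row = [math.log(t / n) for t in row_tot]   # log p_{i+}
log_p_col = [math.log(t / n) for t in col_tot]   # log p_{+j}
mean_row = sum(log_p_row) / len(log_p_row)
mean_col = sum(log_p_col) / len(log_p_col)

lam_hat = math.log(n) + mean_row + mean_col
lam_x = [v - mean_row for v in log_p_row]
lam_y = [v - mean_col for v in log_p_col]

# Fitted log means and fitted counts under independence.
log_m = [[lam_hat + lx + ly for ly in lam_y] for lx in lam_x]
m_hat = [[math.exp(v) for v in row] for row in log_m]

print(f"lambda   = {lam_hat:.4f}")                    # 6.5347
print("lambda_X =", [round(v, 4) for v in lam_x])     # [-0.1349, 0.1349]
print("lambda_Y =", [round(v, 4) for v in lam_y])     # [-0.7356, 0.7356]
print("log m    =", [[round(v, 4) for v in r] for r in log_m])
print("m hat    =", [[round(v, 2) for v in r] for r in m_hat])
```

Because the fitted counts are $\exp(\log n + \log p_{i+} + \log p_{+j})$, they coincide with $n_{i+}n_{+j}/n$ computed directly from the margins.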

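These formulas yield

$$\hat\lambda_1^X = -0.1349, \quad \hat\lambda_2^X = 0.1349, \quad \hat\lambda_1^Y = -0.7356, \quad \hat\lambda_2^Y = 0.7356, \quad \hat\lambda = 6.5347.$$

Using these estimates, we obtain the fitted log expected frequencies under the model of independence, $\log\hat m_{ij} = \hat\lambda + \hat\lambda_i^X + \hat\lambda_j^Y$. For example, $\log\hat m_{11} = 6.5347 - 0.1349 - 0.7356 = 5.6642$. Computed from the unrounded estimates, the four values are

$$\log\hat m_{11} = 5.6642, \quad \log\hat m_{12} = 7.1354, \quad \log\hat m_{21} = 5.9339, \quad \log\hat m_{22} = 7.4051.$$

Thus

$$\hat m_{11} = 288.36, \quad \hat m_{12} = 1255.64, \quad \hat m_{21} = 377.64, \quad \hat m_{22} = 1644.36,$$

which agree with $n_{i+}n_{+j}/n$ (e.g. $1544 \times 666 / 3566 = 288.36$). This results in Pearson's statistic

$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(n_{ij} - \hat m_{ij})^2}{\hat m_{ij}} = 24.98$$

and the likelihood ratio test statistic

$$G^2 = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\log\frac{n_{ij}}{\hat m_{ij}} = 24.79.$$

The degrees of freedom associated with both statistics for the loglinear model of independence equal the number of cells in the table minus the number of independent parameters in the model. Thus, for an $I \times J$ table,

$$df = IJ - \left[1 + (I-1) + (J-1)\right] = (I-1)(J-1).$$

Our $X^2$ and $G^2$ each have $df = 1$, with associated p-values of approximately 0 (well below 0.0001). There is therefore strong evidence that the loglinear model of independence is not appropriate for these data; that is, smoking status and sleep problems are not independent.

SAS code:

```sas
data smoking;
  input smoke $ slpprob $ count;
  cards;
Yes Yes 346
No Yes 320
Yes No 1198
No No 1702
;
run;

/* Loglinear model of independence; PRED=FREQ prints the fitted counts */
proc catmod order=data;
  model smoke*slpprob=_response_ / covb pred=freq;
  loglin smoke slpprob;
  weight count;
run;
```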
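The test statistics, degrees of freedom and p-values in this example can also be reproduced with a short Python sketch (standard library only; names are our own). The closed-form tail probability $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ used here holds only for $df = 1$, since a $\chi^2_1$ variable is the square of a standard normal. Larger tables would need a general chi-square survival function, such as `scipy.stats.chi2.sf`, which this sketch does not use.

```python
import math

# Observed counts: smoking status (Yes, No) x sleep problems (Yes, No).
counts = [[346, 1198], [320, 1702]]
n = sum(map(sum, counts))
row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]
I, J = len(row_tot), len(col_tot)

# Fitted values under independence: m_ij = n_{i+} n_{+j} / n.
m_hat = [[r * c / n for c in col_tot] for r in row_tot]

# Pearson X^2 and likelihood-ratio G^2 (all observed counts are positive here).
x2 = sum((o - e) ** 2 / e
         for orow, erow in zip(counts, m_hat) for o, e in zip(orow, erow))
g2 = 2 * sum(o * math.log(o / e)
             for orow, erow in zip(counts, m_hat) for o, e in zip(orow, erow))

# df = IJ cells minus 1 + (I-1) + (J-1) free parameters = (I-1)(J-1).
df = I * J - (1 + (I - 1) + (J - 1))

# Upper-tail chi-square probability, valid for df = 1 only: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_x2 = math.erfc(math.sqrt(x2 / 2))
p_g2 = math.erfc(math.sqrt(g2 / 2))

print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}")  # X2 = 24.98, G2 = 24.79, df = 1
print(f"p-values: {p_x2:.1e} (X2), {p_g2:.1e} (G2)")
```

Both p-values are of order $10^{-7}$, consistent with the conclusion that the independence model does not fit.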