Multinomial Logistic Regression Models


Stat 544, Lecture 19

Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r > 2 categories. (Note: The word "polychotomous" is sometimes used, but this word does not exist!) When analyzing a polytomous response, it's important to note whether the response is ordinal (consisting of ordered categories) or nominal (consisting of unordered categories). Some types of models are appropriate only for ordinal responses; other models may be used whether the response is ordinal or nominal. If the response is ordinal, we do not necessarily have to take the ordering into account, but it often helps if we do. Using the natural ordering can lead to a simpler, more parsimonious model and increase power to detect relationships with other variables.

If the response variable is polytomous and all the potential predictors are discrete as well, we could describe the multiway contingency table by a loglinear model. But fitting a loglinear model has two disadvantages:

It has many more parameters, and many of them are not of interest. The loglinear model describes the joint distribution of all the variables, whereas the logistic model describes only the conditional distribution of the response given the predictors.

The loglinear model is more complicated to interpret. In the loglinear model, the effect of a predictor X on the response Y is described by the XY association. In a logit model, however, the effect of X on Y is a main effect.

If you are analyzing a set of categorical variables, and one of them is clearly a response while the others are predictors, I recommend that you use logistic rather than loglinear models.

Grouped versus ungrouped. Consider a medical study to investigate the long-term effects of radiation exposure on mortality. The response variable is

Y = 1 if alive,
    2 if dead from a cause other than cancer,
    3 if dead from cancer other than leukemia,
    4 if dead from leukemia.

The main predictor of interest is level of exposure (low, medium, high). The data could arrive in ungrouped form, with one record per subject:

low  4
med  1
med  2
high 1
...

Or it could arrive in grouped form:

Exposure   Y=1   Y=2   Y=3   Y=4
low         22     7     5     0
medium      18     6     7     3
high        14    12     9     9

In ungrouped form, the response occupies a single column of the dataset, but in grouped form the response occupies r columns. Most computer programs for polytomous logistic regression can handle grouped or ungrouped data. Whether the data are grouped or ungrouped, we will imagine the response to be multinomial. That is, the response for row i, y_i = (y_{i1}, y_{i2}, ..., y_{ir})^T, is assumed to have a multinomial distribution with index n_i = Σ_{j=1}^r y_{ij} and parameter π_i = (π_{i1}, π_{i2}, ..., π_{ir})^T. If the data are grouped, then n_i is the total number of trials in the ith row of the dataset, and y_{ij} is the number of trials in which outcome j occurred. If the data are ungrouped, then y_i has a 1 in the position corresponding to the outcome that occurred and 0's elsewhere, and n_i = 1. Note, however, that if the data are ungrouped, we do not have to actually create a dataset with columns of 0's and 1's; a single column containing the response level 1, 2, ..., r is sufficient.
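As a small illustration of the two layouts, here is a sketch in Python (rather than SAS) that turns the four ungrouped mortality records shown earlier into grouped counts and into 0/1 multinomial indicator vectors:

```python
# Hypothetical ungrouped records (exposure, outcome code 1..4),
# matching the four example lines in the text.
records = [("low", 4), ("med", 1), ("med", 2), ("high", 1)]

# Grouped form: one row per exposure level, r columns of outcome counts.
r = 4
grouped = {}
for exposure, y in records:
    row = grouped.setdefault(exposure, [0] * r)
    row[y - 1] += 1

def to_indicator(y, r):
    """Return the 0/1 multinomial vector (n_i = 1) for one ungrouped response."""
    vec = [0] * r
    vec[y - 1] = 1
    return vec

print(grouped["med"])      # counts over the r outcomes for the two 'med' records
print(to_indicator(4, 4))  # the 'low 4' record as an indicator vector
```

Real datasets would of course carry covariates alongside the counts; the point is only that both layouts encode the same multinomial information.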

Describing polytomous responses by a sequence of binary models. In some cases, it makes sense to factor the response into a sequence of binary choices and model them with a sequence of ordinary logistic models. For example, consider the study of the effects of radiation exposure on mortality. The four-level response can be modeled in three stages:

Stage 1: Alive versus Dead (the full population)
Stage 2: Non-cancer versus Cancer (among the dead)
Stage 3: Other cancer versus Leukemia (among cancer deaths)

The stage 1 model, which is fit to all subjects, describes the log-odds of death. The stage 2 model, which is fit only to the subjects that die, describes the log-odds of death due to cancer versus death from other causes. The stage 3 model, which is fit only to the subjects who die of cancer, describes the log-odds of death due to leukemia versus death due to other cancers. Because the multinomial distribution can be factored into a sequence of conditional binomials, we can fit these three logistic models separately. The overall likelihood function factors into three independent likelihoods. This approach is attractive when the response can be naturally arranged as a sequence of binary choices. But in situations where arranging such a sequence is unnatural, we should probably fit a single multinomial model to the entire response.
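The factorization can be checked numerically. The Python sketch below (with made-up probabilities) verifies that the probability of a leukemia death equals the product of the three conditional binary probabilities from the stages above:

```python
# Hypothetical category probabilities: (alive, non-cancer death,
# other-cancer death, leukemia death). Any valid probability vector would do.
pi = (0.70, 0.12, 0.10, 0.08)

p_dead = pi[1] + pi[2] + pi[3]                   # stage 1: dead vs. alive
p_cancer_given_dead = (pi[2] + pi[3]) / p_dead   # stage 2: cancer vs. non-cancer
p_leuk_given_cancer = pi[3] / (pi[2] + pi[3])    # stage 3: leukemia vs. other cancer

# Multiplying the stages back together recovers the leukemia probability:
reconstructed = p_dead * p_cancer_given_dead * p_leuk_given_cancer
print(abs(reconstructed - pi[3]) < 1e-12)
```

The same telescoping holds for every category, which is why the three binary likelihoods multiply to the multinomial likelihood.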

Baseline-category logit model. Suppose that y_i = (y_{i1}, y_{i2}, ..., y_{ir})^T has a multinomial distribution with index n_i = Σ_{j=1}^r y_{ij} and parameter π_i = (π_{i1}, π_{i2}, ..., π_{ir})^T. When the response categories 1, 2, ..., r are unordered, the most popular way to relate π_i to covariates is through a set of r − 1 baseline-category logits. Taking j* as the baseline category, the model is

log( π_{ij} / π_{ij*} ) = x_i^T β_j,   j ≠ j*.

If x_i has length p, then this model has (r − 1) p free parameters, which we can arrange as a matrix or a vector. For example, if the last category is the baseline (j* = r), the coefficients are

β = [ β_1, β_2, ..., β_{r−1} ]

or

vec(β) = ( β_1^T, β_2^T, ..., β_{r−1}^T )^T.

Comments on this model:

The kth element of β_j can be interpreted as the increase in log-odds of falling into category j versus the baseline category j* resulting from a one-unit increase in the kth covariate, holding the other covariates constant.

Removing the kth covariate from the model is equivalent to simultaneously setting r − 1 coefficients to zero, one in each logit equation.

Any of the categories can be chosen to be the baseline. The model will fit equally well, achieving the same likelihood and producing the same fitted values. Only the values and interpretation of the coefficients will change.
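The baseline-invariance claim is easy to verify numerically. The Python sketch below (with made-up linear predictors η_j = x_i^T β_j) converts baseline-category logits to probabilities and shows that switching the baseline from category 4 to category 1 leaves the fitted probabilities unchanged:

```python
import math

def probs_from_logits(eta):
    """Category probabilities from baseline-category logits.
    eta[j] = x^T beta_j for the non-baseline categories; the baseline's
    linear predictor is implicitly 0 and its weight goes last."""
    w = [math.exp(e) for e in eta] + [1.0]
    s = sum(w)
    return [v / s for v in w]

# Hypothetical linear predictors for categories 1..3, with category 4 as baseline.
eta_base4 = [0.5, -0.2, 1.1]
p = probs_from_logits(eta_base4)          # ordering: categories 1, 2, 3, 4

# Re-express the same model with category 1 as baseline:
# the new logits are eta_j - eta_1 (the old baseline gets -eta_1).
eta_base1 = [eta_base4[1] - eta_base4[0],
             eta_base4[2] - eta_base4[0],
             0.0 - eta_base4[0]]
q = probs_from_logits(eta_base1)          # ordering: categories 2, 3, 4, 1

# Same fitted probabilities, merely relabeled:
print(all(abs(a - b) < 1e-12 for a, b in zip(p[1:] + [p[0]], q)))
```

Only the coefficients change under relabeling; the implied probabilities, and hence the likelihood, do not.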

To calculate π_i from β, the back-transformation is

π_{ij} = exp(x_i^T β_j) / ( 1 + Σ_{k ≠ j*} exp(x_i^T β_k) )

for the non-baseline categories j ≠ j*, and the baseline-category probability is

π_{ij*} = 1 / ( 1 + Σ_{k ≠ j*} exp(x_i^T β_k) ).

Model fitting. This model is not difficult to fit by Newton-Raphson or Fisher scoring. PROC LOGISTIC can do it.

Goodness of fit. If the estimated expected counts μ̂_{ij} = n_i π̂_{ij} are large enough, we can test the fit of our model versus a saturated model that estimates π_i independently for i = 1, ..., N. The deviance for comparing this model to a saturated one is

G^2 = 2 Σ_{i=1}^N Σ_{j=1}^r y_{ij} log( y_{ij} / μ̂_{ij} ).

The saturated model has N(r − 1) free parameters and the current model has p(r − 1), where p is the

length of x_i, so the degrees of freedom are df = (N − p)(r − 1). The corresponding Pearson statistic is

X^2 = Σ_{i=1}^N Σ_{j=1}^r r_{ij}^2,   where   r_{ij} = ( y_{ij} − μ̂_{ij} ) / sqrt( μ̂_{ij} )

is the Pearson residual. If the model is true, both are approximately distributed as χ^2_{df} provided that no more than 20% of the μ̂_{ij}'s are below 5.0, and none are below 1.0. In practice this is often not satisfied, so there may be no way to assess the overall fit of the model. However, we may still apply a χ^2 approximation to ΔG^2 and ΔX^2 to compare nested models, provided that (N − p)(r − 1) is large relative to Δdf.

Overdispersion. Overdispersion means that the actual covariance matrix of y_i exceeds that specified by the multinomial

model,

V(y_i) = n_i [ Diag(π_i) − π_i π_i^T ].

It is reasonable to think that overdispersion is present if the data are grouped (the n_i's are greater than 1), x_i already contains all covariates worth considering, and the overall X^2 is substantially larger than its degrees of freedom (N − p)(r − 1). In this situation, it may be worthwhile to introduce a scale parameter σ^2, so that

V(y_i) = n_i σ^2 [ Diag(π_i) − π_i π_i^T ].

The usual estimate for σ^2 is

σ̂^2 = X^2 / ( (N − p)(r − 1) ),

which is approximately unbiased if (N − p)(r − 1) is large. Introducing a scale parameter does not alter the estimate of β (which then becomes a quasilikelihood estimate), but it does alter our

estimate of the variability of β̂. If we estimate a scale parameter, we should

multiply the estimated ML covariance matrix for β̂ by σ̂^2 (SAS does this automatically);

divide the usual Pearson residuals by σ̂; and

divide the usual X^2, G^2, ΔX^2 and ΔG^2 statistics by σ̂^2 (SAS reports these as scaled statistics).

These adjustments will have little practical effect unless the estimated scale parameter is substantially greater than 1.0 (say, 1.2 or higher).

Example. The table below, reported by Delany and Moore (1987), comes from a study of the primary food choices of alligators in four Florida lakes. Researchers classified the stomach contents of 219 captured alligators into five categories: Fish (the most common primary food choice), Invertebrate (snails, insects, crayfish, etc.), Reptile (turtles, alligators), Bird, and Other (amphibians, plants, household pets, stones, and other debris). Let's describe these data by a baseline-category model, with Primary Food Choice as the outcome and Lake, Sex, and Size as covariates.
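Before turning to the data, the scale adjustment just described can be sketched numerically in Python, using the Pearson statistic that the alligator fit later in these notes turns out to produce (X^2 = 52.5643 on (N − p)(r − 1) = 40 df); the standard errors below are a few of the ML SEs from that fit:

```python
import math

# Pearson X^2 and its df, as reported for the alligator model in these notes.
X2 = 52.5643
N, p, r = 16, 6, 5                  # profiles, length of x_i, response categories
df = (N - p) * (r - 1)              # (N - p)(r - 1) = 40
sigma2_hat = X2 / df                # estimated scale parameter

# Scale-adjusted standard errors: multiply each ML SE by sigma-hat.
ml_se = [0.7739, 0.6116, 0.4782]    # a few ML standard errors from the fit
adj_se = [se * math.sqrt(sigma2_hat) for se in ml_se]

print(round(sigma2_hat, 4))         # matches SAS's Pearson Value/DF of 1.3141
```

Here σ̂^2 is only slightly above the 1.2 rule of thumb, so the adjusted SEs are about 15% larger than the ML SEs, a modest change.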

                          Primary Food Choice
Lake      Sex  Size    Fish  Inv.  Rept.  Bird  Other
Hancock    M   small      7     1      0     0      5
               large      4     0      0     1      2
           F   small     16     3      2     2      3
               large      3     0      1     2      3
Oklawaha   M   small      2     2      0     0      1
               large     13     7      6     0      0
           F   small      3     9      1     0      2
               large      0     1      0     1      0
Trafford   M   small      3     7      1     0      1
               large      8     6      6     3      5
           F   small      2     4      1     1      4
               large      0     1      0     0      0
George     M   small     13    10      0     2      2
               large      9     0      0     1      2
           F   small      3     9      1     0      1
               large      8     1      0     0      1

Because the usual primary food choice of alligators appears to be fish, we'll use fish as the baseline category; the four logit equations will then describe the log-odds that alligators select other primary food types instead of fish.

Entering the data. When the data are grouped, as

they are in this example, SAS expects the response categories 1, 2, ..., r to appear in a single column of the dataset, with another column containing the frequency or count. That is, the data should look like this:

Hancock male small fish     7
Hancock male small invert   1
Hancock male small reptile  0
Hancock male small bird     0
Hancock male small other    5
Hancock male large fish     4
Hancock male large invert   0
Hancock male large reptile  0
Hancock male large bird     1
Hancock male large other    2
--lines omitted--
George female large bird    0
George female large other   1

The lines that have a frequency of zero are not actually used in the modeling, because they contribute nothing to the loglikelihood. You can include them if you want to, but it's not necessary.

Specifying the model. In the model statement, you need to tell SAS about the existence of a count or frequency variable; otherwise SAS will assume that the data are ungrouped, with each line representing a single alligator. You also need to specify which of the categories is the baseline. The link function is glogit, for generalized logit. To get fit statistics, include the options aggregate and scale=none.

options nocenter nodate nonumber linesize=72;
data gator;
   input lake $ sex $ size $ food $ count;
   cards;
Hancock male small fish 7
Hancock male small invert 1
Hancock male small reptile 0
Hancock male small bird 0
-- lines omitted --
George female large other 1
;
proc logistic data=gator;
   freq count;
   class lake size sex / order=data param=ref ref=first;
   model food(ref='fish') = lake size sex
         / link=glogit aggregate scale=none;
run;

Here is the output pertaining to the goodness of fit.

                 Response Profile

   Ordered                    Total
     Value    food        Frequency
         1    bird               13
         2    fish               94
         3    invert             61
         4    other              32
         5    reptile            19

Logits modeled use food='fish' as the reference category.

NOTE: 24 observations having zero frequencies or weights were excluded since they do not contribute to the analysis.

   Deviance and Pearson Goodness-of-Fit Statistics

   Criterion   DF     Value   Value/DF   Pr > ChiSq
   Deviance    40   50.2637     1.2566       0.1282
   Pearson     40   52.5643     1.3141       0.0881

   Number of unique profiles: 16

There are N = 16 profiles (unique combinations of lake, sex and size) in this dataset. The saturated model, which fits a separate multinomial distribution to each profile, has 16 × 4 = 64 free parameters. The current model has an intercept, three lake coefficients, one sex coefficient and one size coefficient for each of the four logit equations, for a total of 24 parameters. Therefore, the overall fit statistics have 64 − 24 = 40 degrees of freedom.
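The degrees-of-freedom bookkeeping in that paragraph can be written out directly (Python, just arithmetic):

```python
N, r = 16, 5                 # profiles and response categories
p = 1 + 3 + 1 + 1            # intercept + lake + size + sex coefficients per equation
saturated = N * (r - 1)      # free parameters in the saturated model
current = p * (r - 1)        # free parameters in the current model
print(saturated - current)   # df for the Deviance and Pearson statistics
```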

Output pertaining to the significance of covariates:

   Testing Global Null Hypothesis: BETA=0

   Test               Chi-Square   DF   Pr > ChiSq
   Likelihood Ratio      66.4974   20       <.0001
   Score                 59.4616   20       <.0001
   Wald                  51.2336   20       0.0001

   Type III Analysis of Effects

                      Wald
   Effect   DF   Chi-Square   Pr > ChiSq
   lake     12      36.2293       0.0003
   size      4      15.8873       0.0032
   sex       4       2.1850       0.7018

The first section (global null hypothesis) tests the fit of the current model against a null or intercept-only model. The null model has four parameters (one for each logit equation). Therefore the comparison has 24 − 4 = 20 degrees of freedom. This test is highly significant, indicating that at least one of the covariates has an effect on food choice. The next section (Type III analysis of effects) shows the change in fit resulting from discarding any one of the covariates lake, sex or size while keeping the others in the model. For example, consider the test for lake. Discarding lake is equivalent to setting three coefficients to zero in each of the four logit

equations; so the test for lake has 3 × 4 = 12 degrees of freedom. Judging from these tests, we see that lake has an effect on food choice; size has an effect on food choice; and sex does not have a discernible effect. This suggests that we should probably remove sex from the model. We also may want to look for interactions between lake and size. Here are the estimated coefficients:

        Analysis of Maximum Likelihood Estimates

                                       Standard        Wald
   Parameter       food     DF Estimate    Error  Chi-Square  Pr > ChiSq
   Intercept       bird      1  -2.4633   0.7739     10.1310      0.0015
   Intercept       invert    1  -2.0744   0.6116     11.5025      0.0007
   Intercept       other     1  -0.9167   0.4782      3.6755      0.0552
   Intercept       reptile   1  -2.9141   0.8856     10.8275      0.0010
   lake Oklawaha   bird      1  -1.1256   1.1924      0.8912      0.3452
   lake Oklawaha   invert    1   2.6937   0.6692     16.2000      <.0001
   lake Oklawaha   other     1  -0.7405   0.7422      0.9956      0.3184
   lake Oklawaha   reptile   1   1.4008   0.8105      2.9872      0.0839
   lake Trafford   bird      1   0.6617   0.8461      0.6117      0.4341
   lake Trafford   invert    1   2.9363   0.6874     18.2469      <.0001
   lake Trafford   other     1   0.7912   0.5879      1.8109      0.1784
   lake Trafford   reptile   1   1.9316   0.8253      5.4775      0.0193
   lake George     bird      1  -0.5753   0.7952      0.5233      0.4694
   lake George     invert    1   1.7805   0.6232      8.1623      0.0043
   lake George     other     1  -0.7666   0.5686      1.8179      0.1776
   lake George     reptile   1  -1.1287   1.1925      0.8959      0.3439
   size large      bird      1   0.7302   0.6523      1.2533      0.2629
   size large      invert    1  -1.3363   0.4112     10.5606      0.0012

   size large      other     1  -0.2906   0.4599      0.3992      0.5275
   size large      reptile   1   0.5570   0.6466      0.7421      0.3890
   sex female      bird      1   0.6064   0.6888      0.7750      0.3787
   sex female      invert    1   0.4630   0.3955      1.3701      0.2418
   sex female      other     1   0.2526   0.4663      0.2933      0.5881
   sex female      reptile   1   0.6275   0.6852      0.8387      0.3598

                Odds Ratio Estimates

                                           Point      95% Wald
   Effect                     food      Estimate  Confidence Limits
   lake Oklawaha vs Hancock   bird         0.324    0.031     3.358
   lake Oklawaha vs Hancock   invert      14.786    3.983    54.893
   lake Oklawaha vs Hancock   other        0.477    0.111     2.042
   lake Oklawaha vs Hancock   reptile      4.058    0.829    19.872
   lake Trafford vs Hancock   bird         1.938    0.369    10.176
   lake Trafford vs Hancock   invert      18.846    4.899    72.500
   lake Trafford vs Hancock   other        2.206    0.697     6.983
   lake Trafford vs Hancock   reptile      6.900    1.369    34.784
   lake George vs Hancock     bird         0.563    0.118     2.673
   lake George vs Hancock     invert       5.933    1.749    20.125
   lake George vs Hancock     other        0.465    0.152     1.416
   lake George vs Hancock     reptile      0.323    0.031     3.349
   size large vs small        bird         2.076    0.578     7.454
   size large vs small        invert       0.263    0.117     0.588
   size large vs small        other        0.748    0.304     1.842
   size large vs small        reptile      1.745    0.492     6.198
   sex female vs male         bird         1.834    0.475     7.075
   sex female vs male         invert       1.589    0.732     3.449
   sex female vs male         other        1.287    0.516     3.211
   sex female vs male         reptile      1.873    0.489     7.175

How do we interpret them? Recall that there are four logit equations to predict the log-odds of birds versus fish, invertebrates versus fish, other versus fish, and

reptiles versus fish. The intercepts give the estimated log-odds for the reference group lake=Hancock, size=small, sex=male. For example, the estimated log-odds of birds versus fish in this group is −2.4633; the estimated log-odds of invertebrates versus fish is −2.0744; and so on. The lake effect is characterized by three dummy coefficients in each of the four logit equations. The estimated coefficient for the Lake Oklawaha dummy in the bird-versus-fish equation is −1.1256. This means that alligators in Lake Oklawaha are less likely to choose birds over fish than their colleagues in Lake Hancock are. In other words, birds appear to be a less common food choice, relative to fish, in Lake Oklawaha than in Lake Hancock. The estimated odds ratio of exp(−1.1256) = 0.32 is the same for alligators of all sexes and sizes, because this is a model with main effects but no interactions.
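The odds-ratio table is just the exponentiated coefficient table. A quick Python check against the Oklawaha bird-versus-fish row (estimate −1.1256, SE 1.1924):

```python
import math

beta = -1.1256                    # lake Oklawaha, bird-vs-fish coefficient
se = 1.1924                       # its standard error

or_hat = math.exp(beta)           # point estimate of the odds ratio
lo = math.exp(beta - 1.96 * se)   # 95% Wald confidence limits
hi = math.exp(beta + 1.96 * se)

print(round(or_hat, 3))           # 0.324, as in the Odds Ratio Estimates table
```

The limits come out near 0.031 and 3.36, matching the printed interval up to rounding of the displayed coefficient.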