ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses


Models For Nominal Responses Y is nominal with J categories. Let {π_1, ..., π_J} denote the response probabilities, with π_1 + ... + π_J = 1. If we have n independent observations based on these probabilities, the probability distribution for the numbers of outcomes that occur in each of the J categories is called the multinomial distribution. Multicategory (or polychotomous) logit models simultaneously refer to all pairs of categories: they describe the odds of response in one category rather than another. Once the model specifies logits for a certain set of (J - 1) pairs of categories, the rest are redundant. 2
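As a quick numeric sketch of the multinomial distribution described above (Python is used here only for illustration; the notes themselves use SAS and R, and all counts and probabilities below are hypothetical):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly these category counts in n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(p ** c for c, p in zip(counts, probs))

# J = 3 categories, n = 4 independent trials, hypothetical probabilities:
print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]))
```

Summing the pmf over all count vectors with the same total n returns 1, as it must for a probability distribution.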

Baseline Category Logits Logit models for nominal responses pair each response category with a baseline category. The choice of baseline category is arbitrary. If the last category (J) is the baseline, the baseline-category logits are log(π_j / π_J), j = 1, ..., J - 1. Given that the response falls in category j or J, this is the log odds that the response is j. For J = 3, for instance, the logit model uses log(π_1/π_3) and log(π_2/π_3). 3

Baseline Category Logit Models The logit model using the baseline-category logits with a predictor x has form log(π_j / π_J) = α_j + β_j x, j = 1, ..., J - 1. Parameters in the (J - 1) equations determine parameters for logits using all other pairs of response categories. For instance, for an arbitrary pair of categories a and b, log(π_a / π_b) = log((π_a/π_J) / (π_b/π_J)) = log(π_a/π_J) - log(π_b/π_J) = (α_a + β_a x) - (α_b + β_b x) = (α_a - α_b) + (β_a - β_b)x. 4
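The identity above, that any pair logit is the difference of two baseline-category logits, can be checked numerically (a sketch with hypothetical probabilities, not from the notes):

```python
from math import log

# Hypothetical response probabilities for J = 4 categories; baseline = last category.
pi = [0.4, 0.3, 0.2, 0.1]

def baseline_logit(j):          # log(pi_j / pi_J), 0-based index j
    return log(pi[j] / pi[-1])

def pair_logit(a, b):           # log(pi_a / pi_b)
    return log(pi[a] / pi[b])

# log(pi_a/pi_b) equals the difference of the two baseline-category logits:
a, b = 0, 1
print(pair_logit(a, b), baseline_logit(a) - baseline_logit(b))
```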

Notes The logit equation for categories a and b has intercept parameter (α_a - α_b) and slope parameter (β_a - β_b). For optimal efficiency, one should fit the J - 1 logit equations simultaneously. Estimates of the model parameters then have smaller standard errors than the estimates obtained by fitting the equations separately. With simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is the baseline. 5

Alligator Food Choice Example The data are taken from a study by the Florida Game and Fresh Water Fish Commission of factors influencing the primary food choice of alligators. For 59 alligators sampled in Lake George, Florida, the data show the alligator length (in meters) and the primary food type, in volume, found in the alligator's stomach. Primary food type has three categories: Fish, Invertebrate, and Other. 6

Reading The Data data gator; input length choice $ @@; datalines; 1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F 1.45 I 1.45 O 1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I 1.63 I 1.65 O 1.65 I 1.65 F 1.65 F 1.68 F 1.70 I 1.73 O 1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F 1.88 I 1.93 I 1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F 2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I 2.79 F 2.84 F 3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F 3.68 O 3.71 F 3.89 F ; run; 7

Fitting A Baseline-Category Logit Model proc logistic data=gator descending ; model choice (REFERENCE="O") = length / link=glogit scale=none aggregate; output out = prob PREDPROBS=I; run; 8

Partial Output
Model Information
Data Set                    WORK.GATOR
Response Variable           choice
Number of Response Levels   3
Model                       generalized logit
Optimization Technique      Fisher's scoring

Number of Observations Read 59
Number of Observations Used 59

Response Profile
Ordered                 Total
Value     choice    Frequency
1         O                 8
2         I                20
3         F                31

Logits modeled use choice='O' as the reference category. 9

Partial Output Deviance and Pearson Goodness-of-Fit Statistics Criterion Value DF Value/DF Pr > ChiSq Deviance 75.1140 86 0.8734 0.7929 Pearson 80.1879 86 0.9324 0.6563 Number of unique profiles: 45 Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 119.142 106.341 SC 123.297 114.651-2 Log L 115.142 98.341 10

Partial Output Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 16.8006 2 0.0002 Score 12.5702 2 0.0019 Wald 8.9360 2 0.0115 Type 3 Analysis of Effects Wald Effect DF Chi-Square Pr > ChiSq length 2 8.9360 0.0115 11

Partial Output Analysis of Maximum Likelihood Estimates Standard Wald Parameter choice DF Estimate Error Chi-Square Pr > ChiSq Intercept I 1 5.6974 1.7938 10.0881 0.0015 Intercept F 1 1.6177 1.3073 1.5314 0.2159 length I 1-2.4654 0.8997 7.5101 0.0061 length F 1-0.1101 0.5171 0.0453 0.8314 Odds Ratio Estimates Point 95% Wald Effect choice Estimate Confidence Limits length I 0.085 0.015 0.496 length F 0.896 0.325 2.468 12

Example: We applied the baseline-category logit model with J = 3. Y = primary food choice is the response and X = length of alligator is the predictor. From the parameter estimates, log(π̂_1/π̂_3) = 1.618 - 0.110x, log(π̂_2/π̂_3) = 5.697 - 2.465x. The estimated log odds that the response is fish rather than invertebrate equals log(π̂_1/π̂_2) = (1.618 - 5.697) + [-0.110 - (-2.465)]x = -4.080 + 2.355x. 13

Notes For each logit, one interprets the estimates just as in ordinary binary logistic regression models, conditional on the event that the response outcome was one of those two categories. For instance, given that the primary food type is fish or invertebrate, the estimated probability that it is fish increases in length x according to a logistic curve. For alligators of length x + 1 meters, the estimated odds that the primary food type is fish rather than invertebrate equal exp(2.355) = 10.5 times the estimated odds for alligators of length x meters. 14
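The fish-versus-invertebrate contrast above can be checked directly from the fitted estimates in the output (a Python sketch; the small discrepancy in the intercept comes from rounding the printed estimates):

```python
from math import exp

# Fitted baseline-category estimates (baseline = "Other") from the SAS output:
a_F, b_F = 1.618, -0.110    # log(pi_F / pi_O) = a_F + b_F * x
a_I, b_I = 5.697, -2.465    # log(pi_I / pi_O) = a_I + b_I * x

# Implied logit for fish vs invertebrate: difference of the two equations.
intercept = a_F - a_I       # about -4.08 (slides report -4.080 from unrounded estimates)
slope = b_F - b_I           # 2.355
print(round(intercept, 3), round(slope, 3))

# Per-meter multiplicative effect on the odds of fish rather than invertebrate:
print(round(exp(slope), 1))
```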

Effect of The Predictor To test the hypothesis that primary food choice is independent of alligator length, we test H_0: β_j = 0 for j = 1, 2 in the model. The LR test statistic is twice the difference in maximized log likelihoods between this model and the simpler one in which the response is independent of length. The test statistic equals 16.8 with df = 2, giving a p-value of 0.0002 and strong evidence of a length effect. 15

Estimating Response Probabilities One can express the multicategory logit model directly in terms of the response probabilities, as π_j = exp(α_j + β_j x) / Σ_{h=1}^{J} exp(α_h + β_h x), j = 1, ..., J. The denominator is the same for each probability, and the numerators for the various j sum to the denominator. The parameters equal zero in the above equation for whichever category is the baseline in the logit expressions. 16

Example: Alligator Data The estimated probabilities of the outcomes equal
π̂_1 = exp(1.62 - 0.11x) / D,
π̂_2 = exp(5.70 - 2.47x) / D,
π̂_3 = 1 / D,
where D = 1 + exp(1.62 - 0.11x) + exp(5.70 - 2.47x). 17
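The three estimated probabilities above can be evaluated at any length and always sum to 1 (a Python sketch of the formula; the length values chosen are arbitrary):

```python
from math import exp

def gator_probs(x):
    """Estimated (pi_Fish, pi_Invertebrate, pi_Other) at length x, baseline = Other."""
    num_F = exp(1.618 - 0.110 * x)
    num_I = exp(5.697 - 2.465 * x)
    denom = 1.0 + num_F + num_I
    return num_F / denom, num_I / denom, 1.0 / denom

p = gator_probs(2.0)        # a 2-meter alligator
print([round(v, 3) for v in p])
print(sum(p))               # the three probabilities sum to 1
```

Short alligators are estimated to favor invertebrates, long ones fish, matching the negative length slope for the invertebrate logit.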

[Figure: estimated response probabilities plotted against alligator length] 18

Example: Belief in Afterlife Belief in Afterlife Race Gender Yes Undecided No White Female 371 49 74 Male 250 45 71 Black Female 64 9 15 Male 25 5 13 19

Example: Y = belief in life after death (Yes, Undecided, No), X_1 = gender, X_2 = race. Use dummy variables as predictors, with x_1 = 1 for females and 0 for males, and x_2 = 1 for whites and 0 for blacks. Using no as the baseline category for belief in life after death, we form the model log(π_j / π_3) = α_j + β_j^G x_1 + β_j^R x_2, j = 1, 2, where the G and R superscripts identify the gender and race parameters. 20

Notes The model assumes a lack of interaction between gender and race. The effect parameters represent log odds ratios with the baseline category: β_1^G is the conditional log odds ratio between gender and response categories 1 and 3, given race, and β_2^G is the conditional log odds ratio between gender and response categories 2 and 3, given race. 21

SAS Codes: Enter The Data data afterlife; input race gender belief count @@; datalines; 1 1 1 371 1 1 2 49 1 1 3 74 1 0 1 250 1 0 2 45 1 0 3 71 0 1 1 64 0 1 2 9 0 1 3 15 0 0 1 25 0 0 2 5 0 0 3 13 ; run; 22

Fit The Model proc logistic data = afterlife descending; weight count; model belief (reference="3") = race gender /link=glogit scale = none aggregate; output out = prob PREDPROBS=I; run; 23

Partial Output Response Profile Ordered Total Total Value belief Frequency Weight 1 3 4 173.00000 2 2 4 108.00000 3 1 4 710.00000 Logits modeled use belief=3 as the reference category. Model Convergence Status Convergence criterion (GCONV=1E-8) satisfied. 24


Partial Output Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 8.7437 4 0.0678 Score 8.8498 4 0.0650 Wald 8.7818 4 0.0668 Type 3 Analysis of Effects Wald Effect DF Chi-Square Pr > ChiSq race 2 2.0824 0.3530 gender 2 7.2074 0.0272 26

Partial Output Analysis of Maximum Likelihood Estimates Standard Wald Parameter belief DF Estimate Error Chi-Square Pr >Chi-Square Intercept 2 1-0.7582 0.3614 4.4031 0.0359 Intercept 1 1 0.8828 0.2426 13.2390 0.0003 race 2 1 0.2712 0.3541 0.5863 0.4438 race 1 1 0.3420 0.2370 2.0814 0.1491 gender 2 1 0.1051 0.2465 0.1817 0.6699 gender 1 1 0.4186 0.1713 5.9737 0.0145 Odds Ratio Estimates Point 95 % Wald Effect belief Estimate Confidence Limits race 2 1.311 0.655 2.625 race 1 1.408 0.885 2.240 gender 2 1.111 0.685 1.801 gender 1 1.520 1.086 2.126 27

Example: Belief in Afterlife Belief in Afterlife Race Gender Yes Undecided No White Female 371 49 74 (372.8) (49.2) (72.1) Male 250 45 71 (248.2) (44.8) (72.9) Black Female 64 9 15 (62.2) (8.8) (16.9) Male 25 5 13 (26.8) (5.2) (11.1) 28

Example The goodness-of-fit statistics are G² = 0.8 and χ² = 0.9. The sample has two non-redundant logits at each of four gender-race combinations, for a total of eight logits. The model under consideration has six parameters. Thus, residual df = 8 - 6 = 2. The model fits well. 29

Discussion On The Example From the parameter estimates, the estimated odds of a yes rather than a no response for females are exp(0.419) = 1.5 times those for males, controlling for race; for whites, they are exp(0.342) = 1.4 times those for blacks, controlling for gender. To test the effect of gender, we test H_0: β_j^G = 0 for j = 1, 2. The LR test compares G² = 0.8 (df = 2) to G² = 8.0 (df = 4) obtained by dropping gender from the model. The difference of 7.2, based on df = 2, has a p-value of 0.03 and shows evidence of a gender effect. By contrast, the effect of race is not significant. 30
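For df = 2 the chi-squared upper-tail probability has the closed form exp(-x/2), so the LR p-value and odds ratios quoted above can be checked without any statistics library (a sketch, not from the notes):

```python
from math import exp

# Drop-in-deviance test for gender: G2 = 8.0 (df 4) minus G2 = 0.8 (df 2).
lr_stat = 8.0 - 0.8
p_value = exp(-lr_stat / 2)     # chi-square(2) survival function at lr_stat
print(round(lr_stat, 1), round(p_value, 3))

# Estimated odds ratios quoted in the notes:
print(round(exp(0.419), 1))     # gender (yes vs no), females vs males
print(round(exp(0.342), 1))     # race (yes vs no), whites vs blacks
```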

Example: Belief in Afterlife Belief in Afterlife Race Gender Yes Undecided No White Female 0.76 0.10 0.15 Male 0.68 0.12 0.20 Black Female 0.71 0.10 0.19 Male 0.62 0.12 0.26 31

Connection With Loglinear Models When all explanatory variables are categorical, logit models have corresponding loglinear models. The model fitted to the previous example, assumes main effects of gender (G) and race (R) on belief (B) in afterlife, with no interaction. It corresponds to the loglinear model (GR, BG, BR) of homogeneous association. The simpler logit model that deletes the race effect on belief corresponds to loglinear model (GR, BG). 32

ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Ordinal Responses 33

Models For Ordinal Responses When response categories are ordered, logits can directly incorporate the ordering, yielding models with simpler interpretations. Define the j-th cumulative probability, the probability that the response Y falls in category j or below, as P(Y ≤ j) = π_1 + ... + π_j, j = 1, ..., J. The cumulative probabilities reflect the ordering, with P(Y ≤ 1) ≤ P(Y ≤ 2) ≤ ... ≤ P(Y ≤ J) = 1. Models for cumulative probabilities do not use the final one, P(Y ≤ J), since it equals 1. 34

Cumulative Logits The logits of the first J - 1 cumulative probabilities are logit[P(Y ≤ j)] = log(P(Y ≤ j) / (1 - P(Y ≤ j))) = log((π_1 + ... + π_j) / (π_{j+1} + ... + π_J)), j = 1, ..., J - 1. These are called cumulative logits. A model for the j-th cumulative logit looks like an ordinary logit model for a binary response in which categories 1 to j combine to form a single category, and categories j + 1 to J form a second category. 35
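Computing the cumulative logits for a hypothetical probability vector makes the monotone ordering visible (a Python sketch; the probabilities are ours, not from the notes):

```python
from math import log

# Hypothetical category probabilities for J = 4 ordered categories:
pi = [0.1, 0.2, 0.3, 0.4]

def cum_logit(j):
    """logit[P(Y <= j)] = log((pi_1+...+pi_j) / (pi_{j+1}+...+pi_J)), 1-based j."""
    lower = sum(pi[:j])
    return log(lower / (1.0 - lower))

# The cumulative logits are nondecreasing in j, mirroring the ordering of P(Y <= j):
print([round(cum_logit(j), 3) for j in (1, 2, 3)])
```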

Proportional Odds Model For a predictor X, the model logit[P(Y ≤ j)] = α_j + βx, j = 1, ..., J - 1, has parameter β describing the effect of X on the log odds of response in category j or below. This model assumes an identical effect of X for all J - 1 cumulative logits. When this model fits well, it requires a single parameter rather than J - 1 parameters to describe the effect of X. 36

Interpretations This model refers to odds ratios for the collapsed response scale, for any fixed j. For two values x_1 and x_2 of X, the odds ratio uses the cumulative probabilities and their complements. We have log[ (P(Y ≤ j | X = x_2)/P(Y > j | X = x_2)) / (P(Y ≤ j | X = x_1)/P(Y > j | X = x_1)) ] = β(x_2 - x_1). Since the log odds ratio is proportional to the distance between the x values, with the same proportionality constant β for every j, the model is called a proportional odds model. 37

Comments For x_2 - x_1 = 1, the odds of response in or below any given category multiply by e^β for each unit increase in X. When the model holds with β = 0, X and Y are statistically independent. Explanatory variables in cumulative logit models can be continuous, categorical, or of both types. The ML fitting process uses an iterative algorithm simultaneously for all j. 38
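The proportional odds property, the same log cumulative odds ratio β(x_2 - x_1) at every cutpoint j, can be verified numerically (a sketch with hypothetical parameter values, not from the notes):

```python
from math import log, exp

# Proportional odds model with hypothetical parameters (J = 4, so 3 cutpoints):
alphas = [-1.0, 0.5, 2.0]
beta = 0.7

def cum_prob(j, x):
    """P(Y <= j | X = x) under logit[P(Y <= j)] = alpha_j + beta*x, 0-based j."""
    eta = alphas[j] + beta * x
    return exp(eta) / (1.0 + exp(eta))

def log_odds_ratio(j, x1, x2):
    def odds(x):
        p = cum_prob(j, x)
        return p / (1.0 - p)
    return log(odds(x2) / odds(x1))

# Same value beta*(x2 - x1) for every cutpoint j:
x1, x2 = 1.0, 3.5
print([round(log_odds_ratio(j, x1, x2), 6) for j in range(3)])
print(beta * (x2 - x1))
```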

Example: Political Ideology Political Ideology Party Very Slightly Slightly Very Affiliation Liberal Liberal Moderate Conservative Conservative Total Democratic 80 81 171 41 55 428 Republican 30 46 148 84 99 407 39

Example: Fit A Proportional Odds Model Political ideology uses a five-point ordinal scale, ranging from very liberal to very conservative. Let X be a dummy variable for political party, with X = 1 for Democrats and X = 0 for Republicans. 40

SAS Codes: Read The Data data ideology; input party ideology count @@; datalines; 1 1 80 1 2 81 1 3 171 1 4 41 1 5 55 0 1 30 0 2 46 0 3 148 0 4 84 0 5 99 ; 41

SAS Codes: Fit The Model proc logistic data = ideology order=data descending; class party /param = ref; freq count; model ideology = party /link=clogit scale=none; output out = prob PREDPROBS=I; run; 42

Partial Output: Response Profile Response Profile Ordered Total Value ideology Frequency 1 5 154 2 4 125 3 3 319 4 2 127 5 1 110 Probabilities modeled are cumulated over the lower Ordered Values. 43

Partial Output: Fit Statistics Score Test for the Proportional Odds Assumption Chi-Square DF Pr > ChiSq 3.9106 3 0.2713 Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 2541.630 2484.985 SC 2560.540 2508.622-2 Log L 2533.630 2474.985 44

Output: Testing For Effects Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr>ChiSq Likelihood Ratio 58.6451 1 <.0001 Score 57.2448 1 <.0001 Wald 57.0182 1 <.0001 Type 3 Analysis of Effects Wald Effect DF Chi-Square Pr> ChiSq party 1 57.0182 <.0001 45

Partial Output: Analysis of Maximum Likelihood Estimates
                            Standard    Wald
Parameter      DF Estimate  Error       Chi-Square  Pr > ChiSq
Intercept 5    1  -2.0440   0.1188      295.9293    <.0001
Intercept 4    1  -1.2116   0.1031      138.0265    <.0001
Intercept 3    1   0.5000   0.0943       28.1405    <.0001
Intercept 2    1   1.4945   0.1134      173.6781    <.0001
party     0    1   0.9745   0.1291       57.0182    <.0001 46

SAS Codes: Fit The Model
proc freq data = prob noprint;
weight count;
tables party*ip_1*ip_2*ip_3*ip_4*ip_5 /list nocum nopercent out=test;
run;
data table8_6;
set test;
array p(5) ip_1-ip_5;
array pcount(5);
do i = 1 to 5;
pcount(i) = count*p(i);
end;
drop ip_1-ip_5 i percent;
run;
proc print data = table8_6 noobs;
run; 47

Output party COUNT pcount1 pcount2 pcount3 pcount4 pcount5 0 407 31.7714 44.0346 151.708 75.5005 103.985 1 428 78.4308 83.1523 168.226 49.1170 49.074 48

Example: Political Ideology Political Ideology Party Very Slightly Slightly Very Affiliation Liberal Liberal Moderate Conservative Conservative Total Democratic 80 81 171 41 55 428 (78.4) (83.2) (168.2) (49.1) (49.1) Republican 30 46 148 84 99 407 (31.8) (44.0) (151.7) (75.5) (104.0) 49

Example: Discussion The ML fit of the proportional odds model has estimated effect β̂ = 0.975 (ASE = 0.129). For any fixed j, the estimated odds that a Democrat's response is in the liberal direction rather than the conservative direction equal exp(0.975) = 2.65 times the estimated odds for Republicans. A 95% CI for this odds ratio equals exp(0.975 ± 1.96 × 0.129) = (2.1, 3.4). A fairly substantial association exists, with Democrats tending to be more liberal than Republicans. 50

Example: Predicted Probabilities The cumulative probabilities equal P(Y ≤ j | X = x) = exp(α_j + βx) / (1 + exp(α_j + βx)). The first estimated cumulative probability for Democrats (x = 1) equals exp[-2.469 + 0.975(1)] / (1 + exp[-2.469 + 0.975(1)]) = 0.18. The estimated probability of the j-th category can be obtained as P(Y ≤ j | X = x) - P(Y ≤ j - 1 | X = x). 51
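The 0.18 value above follows directly from the inverse-logit formula (a quick Python check using the estimates quoted in the slide):

```python
from math import exp

def cum_prob(alpha, beta, x):
    """P(Y <= j | X = x) = inverse logit of alpha_j + beta*x."""
    eta = alpha + beta * x
    return exp(eta) / (1.0 + exp(eta))

# First cumulative probability for Democrats (x = 1):
p1 = cum_prob(-2.469, 0.975, 1)
print(round(p1, 2))

# A category probability is a difference of adjacent cumulative probabilities:
# P(Y = j | x) = P(Y <= j | x) - P(Y <= j-1 | x)
```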

Ordinal Tests Of Independence The LR statistic for an ordinal test of independence (H_0: β = 0) is the difference in deviance (G²) values between the independence model and the proportional odds model. In this example, the difference in G² values of 62.3 - 3.7 = 58.6, based on df = 4 - 3 = 1, gives extremely strong evidence of an association (p-value < 0.0001). When the model fits well, this test is more powerful than the test of independence discussed in the context of two-way tables based on df = (I - 1)(J - 1), since it focuses on a restricted alternative and has only one degree of freedom. 52

Adjacent Categories Logits The adjacent-categories logits are logit[P(Y = j | Y = j or j + 1)] = log(π_j / π_{j+1}), j = 1, ..., J - 1. For J = 3, the logits are log(π_1/π_2) and log(π_2/π_3). These logits are a basic set equivalent to the baseline-category logits. The connections are log(π_j/π_J) = log(π_j/π_{j+1}) + log(π_{j+1}/π_{j+2}) + ... + log(π_{J-1}/π_J) and log(π_j/π_{j+1}) = log(π_j/π_J) - log(π_{j+1}/π_J), j = 1, ..., J - 1. 53
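The telescoping connection above, a baseline-category logit as a sum of adjacent-categories logits, checks out numerically (a sketch with hypothetical probabilities, not from the notes):

```python
from math import log

# Hypothetical probabilities for J = 4 ordered categories:
pi = [0.15, 0.25, 0.35, 0.25]
J = len(pi)

def adj_logit(j):               # log(pi_j / pi_{j+1}), 0-based j
    return log(pi[j] / pi[j + 1])

def baseline_logit(j):          # log(pi_j / pi_J)
    return log(pi[j] / pi[J - 1])

# Baseline logit = sum of the adjacent-categories logits from j up to J-1:
j = 0
print(baseline_logit(j), sum(adj_logit(k) for k in range(j, J - 1)))
```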

Adjacent Categories Logits A model using these logits with a predictor x has form log(π_j / π_{j+1}) = α_j + β_j x, j = 1, ..., J - 1. These logits, like the baseline-category logits, determine logits for all pairs of response categories. A simpler version of the above model, log(π_j / π_{j+1}) = α_j + βx, j = 1, ..., J - 1, has identical effects for each pair of adjacent categories. 54

Comments For this model, the effect of X on the odds of making the lower instead of the higher response is the same for all pairs of adjacent categories. This model and proportional odds model use a single parameter, rather than J 1 parameters, for the effect of X. When the model holds, independence is equivalent to β = 0. 55

Comments The simpler adjacent-categories logit model implies that the coefficient of x for the logit based on arbitrary response categories a and b (with a > b) equals β(a - b). The effect depends on the distance between categories, so this model recognizes the ordering of the response scale. 56

Political Ideology Example: SAS Codes
proc catmod data = ideology;
weight count;
response alogits;
model ideology = _response_ party;
run;
quit; 57

Partial Output Analysis of Weighted Least Squares Estimates Standard Chi- Parameter Estimate Error Square Pr > ChiSq Intercept 0.0954 0.0321 8.84 0.0029 RESPONSE 1 0.1256 0.1169 1.15 0.2827 2 0.8597 0.1096 61.50 <.0001 3-1.0274 0.1110 85.69 <.0001 party 0 0.2159 0.0298 52.63 <.0001 You need to multiply the coefficient of party by 2, since party has two categories and PROC CATMOD assumes that the sum of the parameters is zero. Similarly, multiply the estimate of standard error by 2. 58

Alternative Method Using ML
proc catmod data = ideology;
weight count;
population party;
model ideology = ( 1 1 1 1 0,
                   0 1 1 1 0,
                   0 0 1 1 0,
                   0 0 0 1 0,
                   1 1 1 1 4,
                   0 1 1 1 3,
                   0 0 1 1 2,
                   0 0 0 1 1 )
      (1='Group1/2', 2='Group2/3', 3='Group3/4', 4='Group4/5', 5='party') /ml;
run;
quit; 59

Partial Output Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq Group1/2 1 9.88 0.0017 Group2/3 1 109.50 <.0001 Group3/4 1 45.21 <.0001 Group4/5 1 9.17 0.0025 party 1 52.59 <.0001 Likelihood Ratio 3 5.52 0.1372 60

Partial Output
Analysis of Maximum Likelihood Estimates
                      Standard    Chi-
Parameter  Estimate   Error       Square    Pr > ChiSq
Model  1   -0.4389    0.1396        9.88    0.0017
       2   -1.1724    0.1120      109.50    <.0001
       3    0.7323    0.1089       45.21    <.0001
       4   -0.3676    0.1214        9.17    0.0025
       5    0.4349    0.0600       52.59    <.0001 61

How To Specify The Design Matrix Our model is log(π_j / π_{j+1}) = α_j + βx, j = 1, ..., J - 1. We can rewrite it as log(π_j / π_J) = α_j + α_{j+1} + ... + α_{J-1} + (J - j)βx, j = 1, ..., J - 1. 62

The Design Matrix Hence, the model is:
[ log(π_1/π_5) ]   [ 1 1 1 1 4x ] [ α_1 ]
[ log(π_2/π_5) ] = [ 0 1 1 1 3x ] [ α_2 ]
[ log(π_3/π_5) ]   [ 0 0 1 1 2x ] [ α_3 ]
[ log(π_4/π_5) ]   [ 0 0 0 1  x ] [ α_4 ]
                                  [ β   ] 63

Discussion The ML estimate of the party affiliation effect is 0.435. The estimated odds that a Democrat's ideology classification is in category j instead of j + 1 are exp(β̂) = 1.54 times the estimated odds for Republicans. The estimated odds that a Democrat's ideology is very liberal instead of very conservative are exp[0.435(5 - 1)] = 5.7 times those for Republicans. Democrats tend to be much more liberal than Republicans. 64
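Both odds ratios above follow from the single estimate 0.435 and the distance-between-categories rule β(a - b) (a quick Python check):

```python
from math import exp

beta_hat = 0.435    # ML estimate of the party affiliation effect

# Odds ratio for one pair of adjacent categories:
print(round(exp(beta_hat), 2))

# Very liberal vs very conservative are 5 - 1 = 4 categories apart,
# so the log odds ratio is beta_hat * 4:
print(round(exp(beta_hat * 4), 1))
```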

Example: Goodness-of-fit Here G² = 5.5 with df = 3 and p-value = 0.1372, a reasonably good fit. The special case of the model with β = 0 specifies independence of ideology and party affiliation. The G² of that model is simply the G² statistic for testing independence, which equals 62.3 with df = 4. The model with party affiliation fits much better than the independence model. 65

Some R Codes
party<-c(1,0)
v.lib<-c(80,30)
s.lib<-c(81,46)
mod<-c(171,148)
s.cons<-c(41,84)
v.cons<-c(55,99)
library(VGAM)
ideo.fit<-vglm(cbind(v.lib,s.lib,mod,s.cons,v.cons)~party,
  family=acat(link="loge", parallel=T))
summary(ideo.fit) 66

Some R Codes Pearson Residuals: log(p[y=2]/p[y=1]) log(p[y=3]/p[y=2]) log(p[y=4]/p[y=3]) log(p[y=5]/p[y=4]) 1 0.025343-0.054143-1.03393 1.4812 2-0.019599 0.081417 0.91658-1.1516 Coefficients: Value Std. Error t value (Intercept):1 0.43891 0.139615 3.1437 (Intercept):2 1.17242 0.112042 10.4641 (Intercept):3-0.73225 0.108899-6.7242 (Intercept):4 0.36762 0.121376 3.0288 party -0.43486 0.059965-7.2520 67

Some R Codes Number of linear predictors: 4 Names of linear predictors: log(p[y=2]/p[y=1]), log(p[y=3]/p[y=2]), log(p[y=4]/p[y=3]), log(p[y=5]/p[y=4]) Dispersion Parameter for acat family: 1 Residual Deviance: 5.52384 on 3 degrees of freedom Log-likelihood: -1238.411 on 3 degrees of freedom Number of Iterations: 4 68

Connection With Loglinear Models Consider the linear-by-linear association model: log μ_ij = λ + λ_i^X + λ_j^Y + βu_i v_j. If we take the column scores to be v_j = j, the adjacent-categories logits within row i are
log(π_j/π_{j+1}) = log(μ_{i,j}/μ_{i,j+1}) = log μ_{i,j} - log μ_{i,j+1}
= (λ + λ_i^X + λ_j^Y + βu_i v_j) - (λ + λ_i^X + λ_{j+1}^Y + βu_i v_{j+1})
= (λ_j^Y - λ_{j+1}^Y) + βu_i(v_j - v_{j+1}) = (λ_j^Y - λ_{j+1}^Y) - βu_i. 69

Continuation-Ratio Logits Continuation-ratio logits are defined as log(π_j / (π_{j+1} + ... + π_J)), j = 1, ..., J - 1, or as log(π_{j+1} / (π_1 + ... + π_j)), j = 1, ..., J - 1. They refer to a binary response that contrasts each category with a grouping of categories from lower (or higher) levels of the response scale. The continuation-ratio logit model form is useful when a sequential mechanism, such as survival through various age periods, determines the response outcome. 70

Interpretations Define ω_j = P(Y = j | Y ≥ j). With a predictor X, ω_j(x) = π_j(x) / (π_j(x) + ... + π_J(x)), j = 1, ..., J - 1. The continuation-ratio logits are ordinary logits of these conditional probabilities, namely log[ω_j(x)/(1 - ω_j(x))]. 71
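The equivalence above, a continuation-ratio logit as the ordinary logit of the conditional probability ω_j, can be checked numerically (a sketch with hypothetical probabilities, not from the notes):

```python
from math import log

# Hypothetical probabilities for J = 3 categories:
pi = [0.2, 0.3, 0.5]

def omega(j):                   # P(Y = j | Y >= j), 0-based j
    return pi[j] / sum(pi[j:])

def cr_logit(j):                # log(pi_j / (pi_{j+1} + ... + pi_J))
    return log(pi[j] / sum(pi[j + 1:]))

# cr_logit(j) equals the ordinary logit of omega(j):
j = 0
print(cr_logit(j), log(omega(j) / (1 - omega(j))))
```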

Example: Developmental Toxicity Study
                                        Response
Concentration (mg/kg per day)   Dead   Malformation   Normal
0 (Controls)                      15              1      281
62.5                              17              0      225
125                               22              7      283
250                               38             59      202
500                              144            132        9 72

Some R Codes
library(VGAM)
conc<-c(0,62.5,125,250,500)
dead<-c(15,17,22,38,144)
malf<-c(1,0,7,59,132)
normal<-c(281,225,283,202,9)
toxic.fit<-vglm(cbind(dead,malf,normal)~conc, family=cratio())
summary(toxic.fit) 73

Output
Call: vglm(formula = cbind(dead, malf, normal) ~ conc, family = cratio())
Pearson Residuals:
  logit(P[Y>1|Y>=1])  logit(P[Y>2|Y>=2])
1 -1.19018            -0.062996
2 -1.05956             1.479412
3  0.58619             0.445589
4  1.59620            -0.879435
5 -0.62856             0.857503
Coefficients:
               Value      Std. Error  t value
(Intercept):1  3.2479337  0.15766019  20.601 74

Output
(Intercept):2  5.7018965  0.33062798  17.246
conc:1        -0.0063891  0.00043476 -14.695
conc:2        -0.0173747  0.00121260 -14.328
Number of linear predictors: 2
Names of linear predictors: logit(P[Y>1|Y>=1]), logit(P[Y>2|Y>=2])
Dispersion Parameter for cratio family: 1
Residual Deviance: 11.83839 on 6 degrees of freedom
Log-likelihood: -730.3872 on 6 degrees of freedom
Number of Iterations: 4 75

Notes R fits the continuation-ratio logit models log((π_2 + π_3)/π_1) = α_1 + β_1 x, log(π_3/π_2) = α_2 + β_2 x. If we need the usual continuation-ratio logit models log(π_1/(π_2 + π_3)) = α_1 + β_1 x, log(π_2/π_3) = α_2 + β_2 x, we should take the negatives of all the coefficients. 76

SAS Codes: Indirect Way
data toxic1;
input conc dead alive;
total=dead+alive;
datalines;
0 15 282
62.5 17 225
125 22 290
250 38 261
500 144 141
;
run; 77

SAS Codes proc genmod data=toxic1 order=data; model dead/total = conc / d=bin link=logit; run; quit; 78

Partial Output Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 5.7775 1.9258 Scaled Deviance 3 5.7775 1.9258 Pearson Chi-Square 3 5.8257 1.9419 Scaled Pearson X2 3 5.8257 1.9419 Log Likelihood -514.7686 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr> ChiSq Intercept 1-3.2479 0.1577-3.5569-2.9389 424.39 <.0001 conc 1 0.0064 0.0004 0.0055 0.0072 215.96 <.0001 79

SAS Codes data toxic2; input conc malform normal; total=malform+normal; datalines; 0 1 281 62.5 0 225 125 7 283 250 59 202 500 132 9 ; run; proc genmod data=toxic2 order=data; model malform/total = conc / d=bin link=logit; run; quit; 80

Partial Output Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 6.0609 2.0203 Scaled Deviance 3 6.0609 2.0203 Pearson Chi-Square 3 3.9331 1.3110 Scaled Pearson X2 3 3.9331 1.3110 Log Likelihood -215.6185 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr> ChiSq Intercept 1-5.7019 0.3322-6.3531-5.0507 294.52 <.0001 conc 1 0.0174 0.0012 0.0150 0.0198 200.42 <.0001 81

Notes The two models here are ordinary logistic regression models: in the first fit the response contrasts column 1 with columns 2 and 3 combined, and in the second fit it contrasts column 2 with column 3. When models for different continuation-ratio logits have separate parameters, as in this example, separate fitting of the models for the different logits gives the same results as simultaneous fitting. The sum of the separate G² statistics is an overall goodness-of-fit statistic pertaining to the simultaneous fitting of the models. For this example, the G² values are 5.78 for the first logit and 6.06 for the second, each based on df = 3. We summarize the fit by their sum, G² = 11.84, based on df = 6. 82
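The overall fit statistic and a fitted probability from the first logit can be reproduced from the printed SAS estimates (a Python sketch; the concentration value chosen is from the data table):

```python
from math import exp

# Separate-fit deviances from the two SAS runs:
G2_first, df_first = 5.7775, 3
G2_second, df_second = 6.0609, 3
print(round(G2_first + G2_second, 2), df_first + df_second)

# Fitted P(dead) at conc = 500 from the first logit
# (SAS models "dead" as the event, so the conc slope is +0.0064):
eta = -3.2479 + 0.0064 * 500
p_dead = exp(eta) / (1 + exp(eta))
print(round(p_dead, 3))
```

The fitted value is close to the observed proportion 144/285 at that dose.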