ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses


Norah Lester
5 years ago


2 Models For Nominal Responses

Y is nominal with J categories. Let $\{\pi_1, \ldots, \pi_J\}$ denote the response probabilities, with $\pi_1 + \cdots + \pi_J = 1$.

If we have n independent observations based on these probabilities, the probability distribution for the number of outcomes of each of the J types is the multinomial distribution.

Multicategory (or polychotomous) logit models simultaneously refer to all pairs of categories. They describe the odds of response in one category rather than another.

Once the model specifies logits for a certain $J - 1$ pairs of categories, the rest are redundant.

3 Baseline Category Logits

Logit models for nominal responses pair each response category with a baseline category. The choice of baseline category is arbitrary.

If the last category (J) is the baseline, the baseline-category logits are

$$\log\left(\frac{\pi_j}{\pi_J}\right), \quad j = 1, \ldots, J - 1.$$

Given that the response falls in category j or J, this is the log odds that the response is j. For J = 3, for instance, the logit model uses $\log(\pi_1/\pi_3)$ and $\log(\pi_2/\pi_3)$.

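4 Baseline Category Logit Models

The logit model using the baseline-category logits with a predictor x has the form

$$\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j x, \quad j = 1, \ldots, J - 1.$$

Parameters in the $J - 1$ equations determine parameters for logits using all other pairs of response categories. For instance, for an arbitrary pair of categories a and b,

$$\begin{aligned}
\log\left(\frac{\pi_a}{\pi_b}\right)
&= \log\left(\frac{\pi_a/\pi_J}{\pi_b/\pi_J}\right)
 = \log\left(\frac{\pi_a}{\pi_J}\right) - \log\left(\frac{\pi_b}{\pi_J}\right) \\
&= (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) \\
&= (\alpha_a - \alpha_b) + (\beta_a - \beta_b) x.
\end{aligned}$$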
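The identity for an arbitrary pair of categories can be checked numerically. The sketch below is only an illustration: the parameter values are hypothetical (they are not estimates from any data set in these slides), and the function names are made up for this example. Category 3 plays the role of the baseline, so its parameters are zero.

```python
# Hypothetical baseline-category parameters for J = 3 (category 3 is the baseline).
alpha = {1: 0.5, 2: -1.0, 3: 0.0}
beta = {1: 0.2, 2: 0.8, 3: 0.0}


def baseline_logit(j, x):
    """log(pi_j / pi_J) = alpha_j + beta_j * x."""
    return alpha[j] + beta[j] * x


def pair_logit(a, b, x):
    """log(pi_a / pi_b) = (alpha_a - alpha_b) + (beta_a - beta_b) * x."""
    return (alpha[a] - alpha[b]) + (beta[a] - beta[b]) * x


x = 2.0
logit_12 = pair_logit(1, 2, x)
difference = baseline_logit(1, x) - baseline_logit(2, x)
print(logit_12, difference)  # the two values agree up to rounding error
```

Any pair of categories can be compared this way without refitting the model with a different baseline.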

5 Notes

The logit equation for categories a and b has intercept parameter $(\alpha_a - \alpha_b)$ and slope parameter $(\beta_a - \beta_b)$.

For optimal efficiency, one should fit the $J - 1$ logit equations simultaneously. Estimates of the model parameters then have smaller standard errors than the estimates obtained by fitting the equations separately.

For simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is the baseline.

6 Alligator Food Choice Example

The data are taken from a study by the Florida Game and Fresh Water Fish Commission of factors influencing the primary food choice of alligators.

For 59 alligators sampled in Lake George, Florida, the data show the alligator length (in meters) and the primary food type, in volume, found in the alligator's stomach.

Primary food type has three categories: Fish, Invertebrate, and Other.

7 Reading The Data

    data gator;
      /* choice is a character variable ($); @@ reads several records per line */
      input length choice $ @@;
      datalines;
    1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F 1.45 I 1.45 O
    1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I 1.63 I 1.65 O 1.65 I 1.65 F
    1.65 F 1.68 F 1.70 I 1.73 O 1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F
    1.88 I 1.93 I 1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F
    2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I 2.79 F 2.84 F
    3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F 3.68 O 3.71 F 3.89 F
    ;
    run;

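8 Fitting A Baseline-Category Logit Model

    proc logistic data=gator descending;
      /* generalized logit link with "Other" (O) as the baseline category */
      model choice (REFERENCE="O") = length / link=glogit scale=none aggregate;
      /* PREDPROBS=I writes the individual predicted probabilities to PROB */
      output out = prob PREDPROBS=I;
    run;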

9 Partial Output

Model Information

    Data Set                     WORK.GATOR
    Response Variable            choice
    Number of Response Levels    3
    Model                        generalized logit
    Optimization Technique       Fisher's scoring

    Number of Observations Read  59
    Number of Observations Used  59

Response Profile

    Ordered               Total
    Value      choice     Frequency
    1          O           8
    2          I          20
    3          F          31

Logits modeled use choice='O' as the reference category.

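10 Partial Output

Deviance and Pearson Goodness-of-Fit Statistics (columns: Criterion, Value, DF, Value/DF, Pr > ChiSq; rows: Deviance, Pearson). The numeric values are not legible in this copy.

Number of unique profiles: 45

Model Fit Statistics (columns: Criterion, Intercept Only, Intercept and Covariates; rows: AIC, SC, -2 Log L). The numeric values are not legible in this copy.

11 Partial Output

Testing Global Null Hypothesis: BETA=0 (columns: Test, Chi-Square, DF, Pr > ChiSq; rows: Likelihood Ratio, Score, Wald). The numeric values are not legible in this copy.

Type 3 Analysis of Effects (columns: Effect, DF, Wald Chi-Square, Pr > ChiSq; row: length). The numeric values are not legible in this copy.

12 Partial Output

Analysis of Maximum Likelihood Estimates (columns: Parameter, choice, DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq; rows: Intercept for I, Intercept for F, length for I, length for F). The numeric values are not legible in this copy.

Odds Ratio Estimates (columns: Effect, choice, Point Estimate, 95% Wald Confidence Limits; rows: length for I, length for F). The numeric values are not legible in this copy.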

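13 Example

We applied the baseline-category logit model with J = 3. The response Y is primary food choice (1 = fish, 2 = invertebrate, 3 = other, the baseline) and the predictor X is the length of the alligator.

From the parameter estimates, writing $\hat{\alpha}_1$ and $\hat{\alpha}_2$ for the two intercept estimates reported in the SAS output,

$$\log(\hat{\pi}_1/\hat{\pi}_3) = \hat{\alpha}_1 - 0.110x, \qquad \log(\hat{\pi}_2/\hat{\pi}_3) = \hat{\alpha}_2 - 2.465x.$$

The estimated log odds that the response is fish rather than invertebrate equals

$$\log(\hat{\pi}_1/\hat{\pi}_2) = (\hat{\alpha}_1 - \hat{\alpha}_2) + [-0.110 - (-2.465)]x = (\hat{\alpha}_1 - \hat{\alpha}_2) + 2.355x.$$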

14 Notes

For each logit, one interprets the estimates just as in ordinary binary logistic regression models, conditional on the event that the response outcome was one of those two categories.

For instance, given that the primary food type is fish or invertebrate, the estimated probability that it is fish increases in length x according to a logistic curve.

For alligators of length x + 1 meters, the estimated odds that primary food type is fish rather than invertebrate equal exp(2.355) = 10.5 times the estimated odds for alligators of length x meters.
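The odds multiplier quoted above is a one-line calculation. The short sketch below reproduces it from the slope 2.355 given in the slides; the half-meter comparison is an extra illustration added here, not a figure from the slides.

```python
import math

# Slope of the fish-versus-invertebrate logit, taken from the slides.
slope_fish_vs_invertebrate = 2.355

# Multiplicative change in the odds for a 1 meter increase in length.
odds_multiplier_per_meter = math.exp(slope_fish_vs_invertebrate)

# The same slope applied to a 0.5 meter increase (illustrative only).
odds_multiplier_half_meter = math.exp(slope_fish_vs_invertebrate * 0.5)

print(round(odds_multiplier_per_meter, 1))
print(round(odds_multiplier_half_meter, 2))
```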

15 Effect of The Predictor

To test the hypothesis that primary food choice is independent of alligator length, we test $H_0: \beta_j = 0$ for j = 1, 2 in the model.

The LR test takes twice the difference in maximized log likelihoods between this model and the simpler one having response independent of length.

The test statistic equals 16.8 with df = 2, giving a p-value of about 0.0002 and strong evidence of a length effect.
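For a chi-squared reference distribution with df = 2, the upper-tail probability has the closed form exp(-x/2), so the p-value of this LR test can be checked without statistical tables. The sketch below uses only the statistic 16.8 from the slides; the variable names are illustrative.

```python
import math

lr_statistic = 16.8   # LR statistic from the slides
df = 2                # two slope parameters are set to zero under H0

# Upper-tail probability of a chi-squared variable; this closed form
# holds only for df = 2.
p_value = math.exp(-lr_statistic / 2.0)

print(f"{p_value:.5f}")
```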

16 Estimating Response Probabilities

One can express the multicategory logit model directly in terms of the response probabilities, as

$$\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)}, \quad j = 1, \ldots, J.$$

The denominator is the same for each probability. The numerators for the various j sum to the denominator.

The parameters equal zero in the above equation for whichever category is the baseline in the logit expressions.
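A minimal sketch of this formula follows, assuming the last category is the baseline so that its intercept and slope are fixed at zero. The parameter values are hypothetical and the function name `response_probabilities` is invented for this illustration.

```python
import math


def response_probabilities(alphas, betas, x):
    """Return [pi_1, ..., pi_J] for a baseline-category logit model.

    alphas and betas hold the J - 1 non-baseline parameters; the baseline
    (last) category contributes exp(0) = 1 to the denominator.
    """
    linear = [a + b * x for a, b in zip(alphas, betas)] + [0.0]
    numerators = [math.exp(v) for v in linear]
    denominator = sum(numerators)
    return [n / denominator for n in numerators]


# Hypothetical parameters for J = 3, evaluated at x = 2.
probs = response_probabilities([0.5, -1.0], [0.2, 0.8], x=2.0)
print(probs, sum(probs))
```

Taking the log of the ratio of any probability to the last one recovers the corresponding baseline-category logit, which is a quick way to confirm that the two forms of the model agree.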

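17 Example: Alligator Data

With $\hat{\alpha}_1$ and $\hat{\alpha}_2$ denoting the intercept estimates reported in the SAS output, the estimated probabilities of the outcomes equal

$$\hat{\pi}_1 = \frac{\exp(\hat{\alpha}_1 - 0.110x)}{1 + \exp(\hat{\alpha}_1 - 0.110x) + \exp(\hat{\alpha}_2 - 2.465x)},$$

$$\hat{\pi}_2 = \frac{\exp(\hat{\alpha}_2 - 2.465x)}{1 + \exp(\hat{\alpha}_1 - 0.110x) + \exp(\hat{\alpha}_2 - 2.465x)},$$

$$\hat{\pi}_3 = \frac{1}{1 + \exp(\hat{\alpha}_1 - 0.110x) + \exp(\hat{\alpha}_2 - 2.465x)}.$$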


19 Example: Belief in Afterlife

Cross-classification of belief in afterlife (Yes, Undecided, No) by race (White, Black) and gender (Female, Male). The cell counts are not legible in this copy.

20 Example

Y = belief in life after death (Yes, Undecided, No), $X_1$ = gender, $X_2$ = race.

Use dummy variables as predictors, with $x_1 = 1$ for females and 0 for males, and $x_2 = 1$ for whites and 0 for blacks.

Using "no" as the baseline category for belief in life after death, we form the model

$$\log\left(\frac{\pi_j}{\pi_3}\right) = \alpha_j + \beta_j^G x_1 + \beta_j^R x_2, \quad j = 1, 2,$$

where the G and R superscripts identify the gender and race parameters.

21 Notes

The model assumes a lack of interaction between gender and race.

The effect parameters represent log odds ratios with the baseline category.

$\beta_1^G$ is the conditional log odds ratio between gender and response categories 1 and 3, given race.

$\beta_2^G$ is the conditional log odds ratio between gender and response categories 2 and 3, given race.

22 SAS Codes: Enter The Data

    data afterlife;
      input race gender belief count;
      datalines;
    /* the data lines are not legible in this copy */
    ;
    run;

23 Fit The Model

    proc logistic data = afterlife descending;
      weight count;
      /* generalized logit link with belief = 3 ("no") as the baseline */
      model belief (reference="3") = race gender / link=glogit scale=none aggregate;
      output out = prob PREDPROBS=I;
    run;

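24 Partial Output

Response Profile (columns: Ordered Value, belief, Total Frequency, Total Weight). The numeric values are not legible in this copy.

Logits modeled use belief=3 as the reference category.

Model Convergence Status: Convergence criterion (GCONV=1E-8) satisfied.

25 Partial Output

Deviance and Pearson Goodness-of-Fit Statistics (columns: Criterion, Value, DF, Value/DF, Pr > ChiSq; rows: Deviance, Pearson). The numeric values are not legible in this copy.

Number of unique profiles: 45

Model Fit Statistics (columns: Criterion, Intercept Only, Intercept and Covariates; rows: AIC, SC, -2 Log L). The numeric values are not legible in this copy.

26 Partial Output

Testing Global Null Hypothesis: BETA=0 (columns: Test, Chi-Square, DF, Pr > ChiSq; rows: Likelihood Ratio, Score, Wald). The numeric values are not legible in this copy.

Type 3 Analysis of Effects (columns: Effect, DF, Wald Chi-Square, Pr > ChiSq; rows: race, gender). The numeric values are not legible in this copy.

27 Partial Output

Analysis of Maximum Likelihood Estimates (columns: Parameter, belief, DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq; rows: two intercepts, two race parameters, two gender parameters). The numeric values are not legible in this copy.

Odds Ratio Estimates (columns: Effect, belief, Point Estimate, 95% Wald Confidence Limits; rows: two for race, two for gender). The numeric values are not legible in this copy.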

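28 Example: Belief in Afterlife

Fitted values from the model are shown in parentheses; the observed counts are not legible in this copy.

                          Belief in Afterlife
    Race     Gender     Yes        Undecided    No
    White    Female     (372.8)    (49.2)       (72.1)
    White    Male       (248.2)    (44.8)       (72.9)
    Black    Female     (62.2)     (8.8)        (16.9)
    Black    Male       (26.8)     (5.2)        (11.1)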

29 Example

The goodness-of-fit statistics are $G^2 = 0.8$ and $X^2 = 0.9$.

The sample has two non-redundant logits at each of four gender-race combinations, for a total of eight logits.

The model under consideration has six parameters. Thus, residual df = 8 - 6 = 2.

The model fits well.

30 Discussion On The Example

From the parameter estimates, the estimated odds of a "yes" rather than a "no" response for females are exp(0.419) = 1.5 times those for males, controlling for race. For whites, they are exp(0.342) = 1.4 times those for blacks, controlling for gender.

To test the effect of gender, we test $H_0: \beta_j^G = 0$ for j = 1, 2.

The LR test compares $G^2 = 0.8$ (df = 2) to $G^2 = 8.0$ (df = 4) obtained by dropping gender from the model.

The difference of 7.2, based on df = 4 - 2 = 2, has a p-value of 0.03 and shows evidence of a gender effect.

By contrast, the effect of race is not significant.
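The numbers in this discussion can be reproduced with a few lines of arithmetic. The sketch below takes the estimates 0.419 and 0.342 and the two deviances from the slides; the p-value uses the closed form exp(-x/2), which is valid only because the comparison has df = 2.

```python
import math

# Conditional odds ratios for a "yes" versus "no" response.
odds_ratio_gender = math.exp(0.419)   # females versus males, given race
odds_ratio_race = math.exp(0.342)     # whites versus blacks, given gender

# LR test for the gender effect: deviance without gender minus deviance with it.
g2_without_gender, df_without_gender = 8.0, 4
g2_with_gender, df_with_gender = 0.8, 2
lr_gender = g2_without_gender - g2_with_gender
df_gender = df_without_gender - df_with_gender

# Chi-squared upper-tail probability; this closed form needs df = 2.
p_gender = math.exp(-lr_gender / 2.0)

print(round(odds_ratio_gender, 1), round(odds_ratio_race, 1))
print(round(lr_gender, 1), df_gender, round(p_gender, 2))
```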


32 Connection With Loglinear Models

When all explanatory variables are categorical, logit models have corresponding loglinear models.

The model fitted to the previous example assumes main effects of gender (G) and race (R) on belief (B) in afterlife, with no interaction.

It corresponds to the loglinear model (GR, BG, BR) of homogeneous association.

The simpler logit model that deletes the race effect on belief corresponds to the loglinear model (GR, BG).

33 ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Ordinal Responses

34 Models For Ordinal Responses

When response categories are ordered, logits can directly incorporate the ordering. We can have models with simpler interpretations.

Define the j-th cumulative probability, the probability that the response Y falls in category j or below, as

$$P(Y \le j) = \pi_1 + \cdots + \pi_j, \quad j = 1, \ldots, J.$$

The cumulative probabilities reflect the ordering, with

$$P(Y \le 1) \le P(Y \le 2) \le \cdots \le P(Y \le J) = 1.$$

Models for cumulative probabilities do not use the final one, $P(Y \le J)$, since it equals 1.

35 Cumulative Logits

The logits of the first $J - 1$ cumulative probabilities are

$$\text{logit}[P(Y \le j)] = \log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \log\left(\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}\right), \quad j = 1, \ldots, J - 1.$$

These are called cumulative logits.

A model for the j-th cumulative logit looks like an ordinary logit model for a binary response in which categories 1 to j combine to form a single category, and categories j + 1 to J form a second category.
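The definition can be illustrated by computing the cumulative logits directly from a vector of category probabilities. The probabilities below are hypothetical and the function name is invented for this sketch.

```python
import math


def cumulative_logits(probs):
    """Return logit[P(Y <= j)] for j = 1, ..., J - 1."""
    logits = []
    cumulative = 0.0
    for p in probs[:-1]:          # the J-th cumulative probability is 1, so skip it
        cumulative += p
        logits.append(math.log(cumulative / (1.0 - cumulative)))
    return logits


# Hypothetical probabilities for an ordinal response with J = 4 categories.
probs = [0.1, 0.2, 0.4, 0.3]
logits = cumulative_logits(probs)
print(logits)
```

Because the cumulative probabilities are nondecreasing in j, the cumulative logits are nondecreasing as well.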

36 Proportional Odds Model

For a predictor X, the model

$$\text{logit}[P(Y \le j)] = \alpha_j + \beta x, \quad j = 1, \ldots, J - 1,$$

has parameter $\beta$ describing the effect of X on the log odds of response in category j or below.

This model assumes an identical effect of X for all $J - 1$ cumulative logits.

When this model fits well, it requires a single parameter rather than $J - 1$ parameters to describe the effect of X.

37 Interpretations

This model refers to odds ratios for the collapsed response scale, for any fixed j.

For two values $x_1$ and $x_2$ of X, the odds ratio utilizes the cumulative probabilities and their complements. We have

$$\log\left[\frac{P(Y \le j \mid X = x_2)/P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)/P(Y > j \mid X = x_1)}\right] = \beta(x_2 - x_1).$$

Since the log odds ratio is proportional to the distance between the x values, with the same proportionality constant $\beta$ for any j, it is called a proportional odds model.

38 Comments

For $x_2 - x_1 = 1$, the odds of response below any given category multiply by $e^{\beta}$ for each unit increase in X.

When the model holds with $\beta = 0$, X and Y are statistically independent.

Explanatory variables in cumulative logit models can be continuous, categorical, or of both types.

The ML fitting process uses an iterative algorithm simultaneously for all j.

39 Example: Political Ideology

                                  Political Ideology
    Party          Very       Slightly               Slightly        Very
    Affiliation    Liberal    Liberal     Moderate   Conservative    Conservative    Total
    Democratic     80         81          171        41              55              428
    Republican     30         46          148        84              99              407

40 Example: Fit A Proportional Odds Model

Political ideology uses a five-point ordinal scale, ranging from very liberal to very conservative.

Let X be a dummy variable for political party, with X = 1 for Democrats and X = 0 for Republicans.

41 SAS Codes: Read The Data

    data ideology;
      input party ideology count;
      datalines;
    /* the data lines are not legible in this copy */
    ;

42 SAS Codes: Fit The Model

    proc logistic data = ideology order=data descending;
      class party / param = ref;
      freq count;
      /* cumulative logit link gives the proportional odds model */
      model ideology = party / link=clogit scale=none;
      output out = prob PREDPROBS=I;
    run;

43 Partial Output: Response Profile

Response Profile (columns: Ordered Value, ideology, Total Frequency). The numeric values are not legible in this copy.

Probabilities modeled are cumulated over the lower Ordered Values.

44 Partial Output: Fit Statistics

Score Test for the Proportional Odds Assumption (columns: Chi-Square, DF, Pr > ChiSq). The numeric values are not legible in this copy.

Model Fit Statistics (columns: Criterion, Intercept Only, Intercept and Covariates; rows: AIC, SC, -2 Log L). The numeric values are not legible in this copy.

45 Output: Testing For Effects

Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio                        <.0001
    Score                                   <.0001
    Wald                                    <.0001

Type 3 Analysis of Effects

    Effect    DF    Wald Chi-Square    Pr > ChiSq
    party                              <.0001

The chi-square and DF entries are not legible in this copy.

46 Partial Output: Parameter Estimates

Analysis of Maximum Likelihood Estimates

    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept                                                           <.0001
    Intercept                                                           <.0001
    Intercept                                                           <.0001
    Intercept                                                           <.0001
    party                                                               <.0001

The estimates, standard errors, and chi-square values are not legible in this copy.

47 SAS Codes: Fitted Counts

    proc freq data = prob noprint;
      weight count;
      /* one row per party with its total count and predicted probabilities */
      tables party*ip_1*ip_2*ip_3*ip_4*ip_5 / list nocum nopercent out = test;
    run;

    data table8_6;
      set test;
      array p(5) ip_1-ip_5;
      array pcount(5);
      do i = 1 to 5;
        /* fitted count = party total times predicted probability */
        pcount(i) = count*p(i);
      end;
      drop ip_1-ip_5 i percent;
    run;

    proc print data = table8_6 noobs;
    run;

48 Output

Printed columns: party, COUNT, pcount1, pcount2, pcount3, pcount4, pcount5. The numeric values are not legible in this copy.

49 Example: Political Ideology

Observed counts, with fitted values from the proportional odds model in parentheses:

                                  Political Ideology
    Party          Very          Slightly                    Slightly        Very
    Affiliation    Liberal       Liberal       Moderate      Conservative    Conservative
    Democratic     80 (78.4)     81 (83.2)     171 (168.2)   41 (49.1)       55 (49.1)
    Republican     30 (31.8)     46 (44.0)     148 (151.7)   84 (75.5)       99 (104.0)

50 Example: Discussion

The ML fit of the proportional odds model has estimated effect $\hat{\beta} = 0.975$ (ASE = 0.129).

For any fixed j, the estimated odds that a Democrat's response is in the liberal direction rather than the conservative direction equal exp(0.975) = 2.65 times the estimated odds for Republicans.

A 95% CI for this odds ratio equals $\exp(0.975 \pm 1.96 \times 0.129) = (2.1, 3.4)$.

A fairly substantial association exists, with Democrats tending to be more liberal than Republicans.
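The odds ratio and its Wald interval follow from the estimate and its standard error. The sketch below uses the values 0.975 and 0.129 from the slides and assumes the usual normal quantile 1.96 for a 95% interval.

```python
import math

beta_hat = 0.975   # estimated party effect from the slides
ase = 0.129        # its asymptotic standard error
z = 1.96           # standard normal quantile for a 95% interval (assumed)

odds_ratio = math.exp(beta_hat)
ci_lower = math.exp(beta_hat - z * ase)
ci_upper = math.exp(beta_hat + z * ase)

print(round(odds_ratio, 2), round(ci_lower, 1), round(ci_upper, 1))
```

The interval is computed on the log odds scale and then exponentiated, which is why it is not symmetric around 2.65.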

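51 Example: Predicted Probabilities

The cumulative probabilities equal

$$P(Y \le j \mid X = x) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)}.$$

With $\hat{\alpha}_1$ denoting the first intercept estimate in the SAS output, the first estimated cumulative probability for Democrats (x = 1) equals

$$\frac{\exp[\hat{\alpha}_1 + 0.975(1)]}{1 + \exp[\hat{\alpha}_1 + 0.975(1)]} = 0.18.$$

The estimated probability of the j-th category can be obtained as

$$P(Y \le j \mid X = x) - P(Y \le j - 1 \mid X = x).$$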
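The differencing step can be sketched as follows. The slope 0.975 is the estimate quoted in the slides, but the four intercepts are hypothetical placeholders chosen only to be increasing; they are not the fitted values. The function names are invented for this illustration.

```python
import math


def expit(value):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-value))


def category_probabilities(alphas, beta, x):
    """Category probabilities implied by logit[P(Y <= j)] = alpha_j + beta * x."""
    cumulative = [expit(a + beta * x) for a in alphas] + [1.0]  # P(Y <= J) = 1
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


alphas = [-2.0, -1.0, 0.5, 1.5]   # hypothetical, increasing intercepts
beta = 0.975                      # party effect from the slides

democrat = category_probabilities(alphas, beta, x=1)
republican = category_probabilities(alphas, beta, x=0)
print(democrat)
print(republican)
```

With a positive slope, the x = 1 group places more probability on the lowest category than the x = 0 group, which matches the direction of the party effect described in the slides.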

52 Ordinal Tests Of Independence

The LR statistic for an ordinal test of independence ($H_0: \beta = 0$) is the difference in deviance ($G^2$) values for the independence model and the proportional odds model.

In this example, the difference in $G^2$ values of 62.3 - 3.7 = 58.6, based on df = 4 - 3 = 1, gives extremely strong evidence of an association (p-value < 0.0001).

When the model fits well, this ordinal test is more powerful than the test of independence discussed in the context of two-way tables, which is based on df = (I - 1)(J - 1), since it focuses on a restricted alternative and has only one degree of freedom.

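53 Adjacent Categories Logits

The adjacent-categories logits are

$$\text{logit}[P(Y = j \mid Y = j \text{ or } j + 1)] = \log\left(\frac{\pi_j}{\pi_{j+1}}\right), \quad j = 1, \ldots, J - 1.$$

For J = 3, the logits are $\log(\pi_1/\pi_2)$ and $\log(\pi_2/\pi_3)$.

These logits are a basic set equivalent to the baseline-category logits. The connections are

$$\log\left(\frac{\pi_j}{\pi_J}\right) = \log\left(\frac{\pi_j}{\pi_{j+1}}\right) + \log\left(\frac{\pi_{j+1}}{\pi_{j+2}}\right) + \cdots + \log\left(\frac{\pi_{J-1}}{\pi_J}\right)$$

and

$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \log\left(\frac{\pi_j}{\pi_J}\right) - \log\left(\frac{\pi_{j+1}}{\pi_J}\right), \quad j = 1, \ldots, J - 1.$$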

54 Adjacent Categories Logits

A model using these logits with a predictor x has the form

$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta_j x, \quad j = 1, \ldots, J - 1.$$

These logits, like the baseline-category logits, determine logits for all pairs of response categories.

A simpler version of the above model,

$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta x, \quad j = 1, \ldots, J - 1,$$

has identical effects for each pair of adjacent categories.

55 Comments

For this model, the effect of X on the odds of making the lower instead of the higher response is the same for all pairs of adjacent categories.

This model and the proportional odds model use a single parameter, rather than $J - 1$ parameters, for the effect of X.

When the model holds, independence is equivalent to $\beta = 0$.

56 Comments

The simpler adjacent-categories logit model implies that the coefficient of x for the logit $\log(\pi_b/\pi_a)$ based on arbitrary response categories a and b (with a > b) equals $\beta(a - b)$, because that logit is the sum of the $a - b$ adjacent-categories logits between b and a.

The effect depends on the distance between categories, so this model recognizes the ordering of the response scale.

57 Political Ideology Example: SAS Codes

    proc catmod data = ideology;
      weight count;
      response alogits;                     /* adjacent-categories logits */
      model ideology = _response_ party;
    run;
    quit;

58 Partial Output

Analysis of Weighted Least Squares Estimates (columns: Parameter, Estimate, Standard Error, Chi-Square, Pr > ChiSq; rows: Intercept, the _RESPONSE_ parameters, party). Only some of the p-values (<.0001, including the one for party) are legible in this copy.

You need to multiply the coefficient of party by 2, since party has two categories and PROC CATMOD assumes that the sum of the parameters is zero. Similarly, multiply the estimate of the standard error by 2.

59 Alternative Method Using ML

    proc catmod data = ideology;
      weight count;
      population party;
      /* the entries of the eight design-matrix rows are not legible in this copy */
      model ideology = ( , , , , , , , )
            (1='Group1/2', 2='Group2/3', 3='Group3/4', 4='Group4/5', 5='party') / ml;
    run;
    quit;

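60 Partial Output

Maximum Likelihood Analysis of Variance

    Source              DF    Chi-Square    Pr > ChiSq
    Group1/2
    Group2/3                                <.0001
    Group3/4                                <.0001
    Group4/5
    party                                   <.0001
    Likelihood Ratio

The remaining entries are not legible in this copy.

61 Partial Output

Analysis of Maximum Likelihood Estimates (columns: Parameter, Estimate, Standard Error, Chi-Square, Pr > ChiSq). Only three p-values (<.0001) are legible in this copy.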

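62 How To Specify The Design Matrix

Our model is

$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta x, \quad j = 1, \ldots, J - 1.$$

Summing the adjacent-categories logits from j to J - 1, we can rewrite it as

$$\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \alpha_{j+1} + \cdots + \alpha_{J-1} + (J - j)\beta x, \quad j = 1, \ldots, J - 1.$$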

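63 The Design Matrix

Hence, with J = 5, so that $\log(\pi_j/\pi_5) = \alpha_j + \cdots + \alpha_4 + (5 - j)\beta x$, the model is

$$\begin{pmatrix} \log(\pi_1/\pi_5) \\ \log(\pi_2/\pi_5) \\ \log(\pi_3/\pi_5) \\ \log(\pi_4/\pi_5) \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 & 4x \\
0 & 1 & 1 & 1 & 3x \\
0 & 0 & 1 & 1 & 2x \\
0 & 0 & 0 & 1 & x
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \beta \end{pmatrix}.$$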
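The rows of this design matrix follow a simple pattern: the baseline-category logit for category j is the sum of the adjacent-categories logits from j to J - 1, so it picks up the intercepts $\alpha_j, \ldots, \alpha_{J-1}$ and the slope $\beta$ multiplied by (J - j)x. The sketch below builds those rows for a given J and x; the function name `design_rows` is invented for this illustration, and how the rows are laid out for each population in PROC CATMOD is not shown here.

```python
def design_rows(num_categories, x):
    """Rows of the design matrix for log(pi_j / pi_J), j = 1, ..., J - 1.

    Each row has J - 1 intercept columns followed by one column for beta.
    """
    rows = []
    for j in range(1, num_categories):
        # alpha_k enters the j-th logit whenever k >= j.
        intercept_part = [1 if k >= j else 0 for k in range(1, num_categories)]
        rows.append(intercept_part + [(num_categories - j) * x])
    return rows


for row in design_rows(5, x=1):
    print(row)
```

Setting x = 0 gives the rows for the group coded 0, which contain only the intercept pattern.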

64 Discussion

The ML estimate of the party affiliation effect is $\hat{\beta} = 0.435$.

The estimated odds that a Democrat's ideology classification is in category j instead of j + 1 are $\exp(\hat{\beta}) = 1.54$ times the estimated odds for Republicans.

The estimated odds that a Democrat's ideology is very liberal instead of very conservative are exp[0.435(5 - 1)] = 5.7 times those for Republicans.

Democrats tend to be much more liberal than Republicans.
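Both odds ratios come from the same slope, scaled by the distance between the two categories being compared. The sketch below uses the value 0.435 that appears in the slides; the helper name is invented for this illustration.

```python
import math

beta_hat = 0.435   # adjacent-categories effect of party, from the slides


def odds_ratio_between(lower_category, higher_category):
    """Odds ratio for the lower versus the higher category: exp(beta * distance)."""
    return math.exp(beta_hat * (higher_category - lower_category))


adjacent = odds_ratio_between(1, 2)   # any pair of adjacent categories
extremes = odds_ratio_between(1, 5)   # very liberal versus very conservative

print(round(adjacent, 2), round(extremes, 1))
```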

65 Example: Goodness-of-fit

Here $G^2 = 5.5$ with df = 3 and p-value of about 0.14, a reasonably good fit.

The special case of the model with $\beta = 0$ specifies independence of ideology and party affiliation.

The $G^2$ of that model is simply the $G^2$ statistic for testing independence, which equals $G^2 = 62.3$ with df = 4.

The model with party affiliation fits much better than the independence model.
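The upper-tail probability for the deviance can be computed with the standard library alone. The sketch below relies on the closed form of the chi-squared survival function for df = 3; the function name and the comparison of the two deviances are additions made here for illustration, using only the values 5.5 and 62.3 from the slides.

```python
import math


def chi2_sf_df3(x):
    """P(X > x) for a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


g2_model, df_model = 5.5, 3                  # adjacent-categories model with party
g2_independence, df_independence = 62.3, 4   # special case beta = 0

p_fit = chi2_sf_df3(g2_model)

# Reduction in deviance from adding the single party parameter.
lr_party = g2_independence - g2_model
df_party = df_independence - df_model

print(round(p_fit, 2), round(lr_party, 1), df_party)
```

A drop of this size in deviance for one extra parameter is what the slide means by "fits much better".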

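66 Some R Codes

    library(VGAM)                  # the package name is case-sensitive

    party  <- c(1, 0)              # 1 = Democrat, 0 = Republican
    v.lib  <- c(80, 30)
    s.lib  <- c(81, 46)
    mod    <- c(171, 148)
    s.cons <- c(41, 84)
    v.cons <- c(55, 99)

    # Adjacent-categories logit model with a common party effect (parallel = TRUE).
    # link = "loge" is the name used by older VGAM releases; newer releases may
    # require link = "loglink" instead.
    ideo.fit <- vglm(cbind(v.lib, s.lib, mod, s.cons, v.cons) ~ party,
                     family = acat(link = "loge", parallel = TRUE))
    summary(ideo.fit)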

67 Some R Codes

Pearson Residuals (columns: log(P[Y=2]/P[Y=1]), log(P[Y=3]/P[Y=2]), log(P[Y=4]/P[Y=3]), log(P[Y=5]/P[Y=4])). The numeric values are not legible in this copy.

Coefficients (columns: Value, Std. Error, t value; rows: (Intercept):1, (Intercept):2, (Intercept):3, (Intercept):4, party). The numeric values are not legible in this copy.

68 Some R Codes

Number of linear predictors: 4

Names of linear predictors: log(P[Y=2]/P[Y=1]), log(P[Y=3]/P[Y=2]), log(P[Y=4]/P[Y=3]), log(P[Y=5]/P[Y=4])

Dispersion Parameter for acat family: 1

Residual Deviance and Log-likelihood are each reported on 3 degrees of freedom; their values are not legible in this copy.

Number of Iterations: 4

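69 Connection With Loglinear Models

Consider the linear-by-linear association model

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j.$$

If we take the column scores to be $v_j = j$, the adjacent-categories logits within row i are

$$\begin{aligned}
\log\left(\frac{\pi_j}{\pi_{j+1}}\right)
&= \log\left(\frac{\mu_{i,j}}{\mu_{i,j+1}}\right) = \log \mu_{i,j} - \log \mu_{i,j+1} \\
&= (\lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j) - (\lambda + \lambda_i^X + \lambda_{j+1}^Y + \beta u_i v_{j+1}) \\
&= (\lambda_j^Y - \lambda_{j+1}^Y) + \beta u_i (v_j - v_{j+1}) \\
&= (\lambda_j^Y - \lambda_{j+1}^Y) - \beta u_i.
\end{aligned}$$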

70 Continuation-Ratio Logits

Continuation-ratio logits are defined as

$$\log\left(\frac{\pi_j}{\pi_{j+1} + \cdots + \pi_J}\right), \quad j = 1, \ldots, J - 1,$$

or as

$$\log\left(\frac{\pi_{j+1}}{\pi_1 + \cdots + \pi_j}\right), \quad j = 1, \ldots, J - 1.$$

They refer to a binary response that contrasts each category with a grouping of categories from lower (higher) levels of the response scale.

The continuation-ratio logit model form is useful when a sequential mechanism, such as survival through various age periods, determines the response outcome.

71 Interpretations

Define $\omega_j = P(Y = j \mid Y \ge j)$. With a predictor X,

$$\omega_j(x) = \frac{\pi_j(x)}{\pi_j(x) + \cdots + \pi_J(x)}, \quad j = 1, \ldots, J - 1.$$

The continuation-ratio logits are ordinary logits of these conditional probabilities, namely

$$\log\left[\frac{\omega_j(x)}{1 - \omega_j(x)}\right].$$
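The equivalence between the logit of the conditional probability $\omega_j$ and the ratio form $\log[\pi_j/(\pi_{j+1} + \cdots + \pi_J)]$ can be verified numerically. The probabilities below are hypothetical and the function name is invented for this sketch.

```python
import math


def continuation_ratio_logits(probs):
    """Return logit(omega_j) with omega_j = P(Y = j | Y >= j), j = 1, ..., J - 1."""
    logits = []
    for j in range(len(probs) - 1):
        at_or_above = sum(probs[j:])      # P(Y >= j)
        omega = probs[j] / at_or_above    # P(Y = j | Y >= j)
        logits.append(math.log(omega / (1.0 - omega)))
    return logits


# Hypothetical probabilities for a three-category ordinal response.
probs = [0.2, 0.3, 0.5]
cr_logits = continuation_ratio_logits(probs)
print(cr_logits)
```

For three categories the two logits are log[π1/(π2 + π3)] and log(π2/π3), the same pair that appears in the toxicity example.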

72 Example: Developmental Toxicity Study

    Concentration                    Response
    (mg/kg per day)     Dead     Malformation     Normal
    0 (Controls)        15       1                281
    62.5                17       0                225
    125                 22       7                283
    250                 38       59               202
    500                 144      132              9

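73 Some R Codes

    library(VGAM)                  # the package name is case-sensitive

    conc   <- c(0, 62.5, 125, 250, 500)
    dead   <- c(15, 17, 22, 38, 144)
    malf   <- c(1, 0, 7, 59, 132)
    normal <- c(281, 225, 283, 202, 9)

    # Continuation-ratio logit model with separate effects for each logit
    toxic.fit <- vglm(cbind(dead, malf, normal) ~ conc, family = cratio())
    summary(toxic.fit)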

74 Output

Call: vglm(formula = cbind(dead, malf, normal) ~ conc, family = cratio())

Pearson Residuals (columns: logit(P[Y>1|Y>=1]), logit(P[Y>2|Y>=2])). The numeric values are not legible in this copy.

Coefficients (columns: Value, Std. Error, t value; rows: (Intercept):1, (Intercept):2, conc:1, conc:2). The numeric values are not legible in this copy.

75 Output

Number of linear predictors: 2

Names of linear predictors: logit(P[Y>1|Y>=1]), logit(P[Y>2|Y>=2])

Dispersion Parameter for cratio family: 1

Residual Deviance and Log-likelihood are each reported on 6 degrees of freedom; their values are not legible in this copy.

Number of Iterations: 4

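76 Notes

R fits the continuation-ratio logit models

$$\log\left(\frac{\pi_2 + \pi_3}{\pi_1}\right) = \alpha_1 + \beta_1 x, \qquad \log\left(\frac{\pi_3}{\pi_2}\right) = \alpha_2 + \beta_2 x.$$

If we need the usual continuation-ratio logit models

$$\log\left(\frac{\pi_1}{\pi_2 + \pi_3}\right) = \alpha_1 + \beta_1 x, \qquad \log\left(\frac{\pi_2}{\pi_3}\right) = \alpha_2 + \beta_2 x,$$

we should take the negative of all the coefficients.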

77 SAS Codes: Indirect Way

    data toxic1;
      input conc dead alive;
      total = dead + alive;
      datalines;
    /* the data lines are not legible in this copy */
    ;
    run;

78 SAS Codes

    proc genmod data=toxic1 order=data;
      /* first continuation-ratio logit: dead versus alive */
      model dead/total = conc / d=bin link=logit;
    run;
    quit;

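79 Partial Output

Criteria For Assessing Goodness Of Fit (columns: Criterion, DF, Value, Value/DF; rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood). The numeric values are not legible in this copy.

Algorithm converged.

Analysis Of Parameter Estimates (columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq; rows: Intercept, conc). Both p-values are <.0001; the other values are not legible in this copy.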

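80 SAS Codes

    data toxic2;
      input conc malform normal;
      total = malform + normal;
      datalines;
    /* the data lines are not legible in this copy */
    ;
    run;

    proc genmod data=toxic2 order=data;
      /* second continuation-ratio logit: malformed versus normal, given alive */
      model malform/total = conc / d=bin link=logit;
    run;
    quit;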

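81 Partial Output

Criteria For Assessing Goodness Of Fit (columns: Criterion, DF, Value, Value/DF; rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood). The numeric values are not legible in this copy.

Algorithm converged.

Analysis Of Parameter Estimates (columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq; rows: Intercept, conc). Both p-values are <.0001; the other values are not legible in this copy.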

82 Notes

The two models here are ordinary logistic regression models in which the responses are column 1 versus columns 2 and 3 combined for one fit, and column 2 versus column 3 for the second fit.

When models for different continuation-ratio logits have separate parameters, as in this example, separate fitting of models for different logits gives the same results as simultaneous fitting.

The sum of the separate $G^2$ statistics is an overall goodness-of-fit statistic pertaining to the simultaneous fitting of the models.

For this example, the $G^2$ values are 5.78 for the first logit and 6.06 for the second, each based on df = 3. We summarize the fit by their sum, $G^2 = 11.84$, based on df = 6.
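The pooling of the two deviances can be written out explicitly. The sketch below adds the two $G^2$ values and their degrees of freedom as described above. The p-value at the end is an extra step that the slides do not report; it uses the closed-form chi-squared tail for an even number of degrees of freedom, and the function name is invented for this illustration.

```python
import math

g2_first, g2_second = 5.78, 6.06   # deviances of the two separate fits
df_each = 3

g2_total = g2_first + g2_second
df_total = 2 * df_each


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Not reported in the slides; shown only to illustrate how the summary could be used.
p_total = chi2_sf_even_df(g2_total, df_total)

print(round(g2_total, 2), df_total, round(p_total, 3))
```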
