ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses


Logit Models For Nominal Responses

Models For Nominal Responses

$Y$ is nominal with $J$ categories. Let $\{\pi_1, \ldots, \pi_J\}$ denote the response probabilities, with $\pi_1 + \cdots + \pi_J = 1$. If we have $n$ independent observations based on these probabilities, the probability distribution for the numbers of outcomes of each of the $J$ types is called the multinomial distribution. Multicategory (or polychotomous) logit models simultaneously refer to all pairs of categories. They describe the odds of response in one category rather than another. Once the model specifies logits for a certain $J - 1$ pairs of categories, the rest are redundant.
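
To make the multinomial distribution concrete, here is a minimal Python sketch of the probability $n!/(n_1! \cdots n_J!)\,\pi_1^{n_1} \cdots \pi_J^{n_J}$. The response probabilities are hypothetical and chosen only for illustration. Summing the probability over every possible outcome for $n = 4$ trials should return 1.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(N_1 = n_1, ..., N_J = n_J) for n = sum(counts) independent trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical response probabilities for J = 3 categories (they sum to 1).
probs = [0.5, 0.3, 0.2]
n = 4
# Sum over every (n_1, n_2, n_3) with n_1 + n_2 + n_3 = n.
total = sum(multinomial_pmf(c, probs)
            for c in product(range(n + 1), repeat=3) if sum(c) == n)
print(round(total, 10))
```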

Baseline Category Logits

Logit models for nominal responses pair each response category with a baseline category. The choice of baseline category is arbitrary. If the last category ($J$) is the baseline, the baseline-category logits are
$$\log\left(\frac{\pi_j}{\pi_J}\right), \quad j = 1, \ldots, J - 1.$$
Given that the response falls in category $j$ or $J$, this is the log odds that the response is $j$. For $J = 3$, for instance, the logit model uses $\log(\pi_1/\pi_3)$ and $\log(\pi_2/\pi_3)$.

Baseline Category Logit Models

The logit model using the baseline-category logits with a predictor $x$ has the form
$$\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j x, \quad j = 1, \ldots, J - 1.$$
Parameters in the $J - 1$ equations determine parameters for logits using all other pairs of response categories. For instance, for an arbitrary pair of categories $a$ and $b$,
$$\log\left(\frac{\pi_a}{\pi_b}\right) = \log\left(\frac{\pi_a/\pi_J}{\pi_b/\pi_J}\right) = \log\left(\frac{\pi_a}{\pi_J}\right) - \log\left(\frac{\pi_b}{\pi_J}\right) = (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)x.$$

Notes

The logit equation for categories $a$ and $b$ has intercept parameter $(\alpha_a - \alpha_b)$ and slope parameter $(\beta_a - \beta_b)$. For optimal efficiency, one should fit the $J - 1$ logit equations simultaneously. The parameter estimates then have smaller standard errors than estimates obtained by fitting the equations separately. With simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is the baseline.

Alligator Food Choice Example

The data are taken from a study by the Florida Game and Fresh Water Fish Commission of factors influencing the primary food choice of alligators. For 59 alligators sampled in Lake George, Florida, the data give the alligator length (in meters) and the primary food type, by volume, found in the alligator's stomach. Primary food type has three categories: Fish, Invertebrate, and Other.

Reading The Data

data gator;
  input length choice $ @@;
  datalines;
1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F 1.45 I 1.45 O
1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I 1.63 I 1.65 O 1.65 I 1.65 F
1.65 F 1.68 F 1.70 I 1.73 O 1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F
1.88 I 1.93 I 1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F
2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I 2.79 F 2.84 F
3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F 3.68 O 3.71 F 3.89 F
;
run;

Fitting A Baseline-Category Logit Model

proc logistic data=gator descending;
  model choice (REFERENCE="O") = length / link=glogit scale=none aggregate;
  output out = prob PREDPROBS=I;
run;
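
As a cross-check on what link=glogit fits, here is a minimal Python sketch. It uses only the standard library, the helper names are hypothetical, and it is not PROC LOGISTIC's implementation. It tallies the response profile of the 59 data lines above. It then maximizes the baseline-category logit likelihood, with O as the baseline, by Newton-Raphson. For this model the observed and expected information coincide, so Newton-Raphson takes the same steps as the Fisher scoring reported by SAS.

```python
import math
from collections import Counter

# The 59 (length, choice) pairs from the gator data step.
RAW = """
1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F 1.45 I 1.45 O
1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I 1.63 I 1.65 O 1.65 I 1.65 F
1.65 F 1.68 F 1.70 I 1.73 O 1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F
1.88 I 1.93 I 1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F
2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I 2.79 F 2.84 F
3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F 3.68 O 3.71 F 3.89 F
"""
tokens = RAW.split()
data = [(float(tokens[i]), tokens[i + 1]) for i in range(0, len(tokens), 2)]
CATS = ("F", "I")  # non-baseline categories; "O" (Other) is the baseline


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def probs(theta, x):
    """[pi_F, pi_I, pi_O] at length x; theta = [alpha_F, beta_F, alpha_I, beta_I]."""
    e_f = math.exp(theta[0] + theta[1] * x)
    e_i = math.exp(theta[2] + theta[3] * x)
    d = 1.0 + e_f + e_i
    return [e_f / d, e_i / d, 1.0 / d]


def loglik(theta):
    return sum(math.log(probs(theta, x)["FIO".index(y)]) for x, y in data)


def fit(iters=30):
    """Newton-Raphson on the multinomial log likelihood, starting from theta = 0."""
    theta = [0.0] * 4
    for _ in range(iters):
        grad = [0.0] * 4
        info = [[0.0] * 4 for _ in range(4)]
        for x, y in data:
            z, p = (1.0, x), probs(theta, x)
            for k in range(2):
                resid = float(y == CATS[k]) - p[k]
                for a in range(2):
                    grad[2 * k + a] += resid * z[a]
                    for l in range(2):
                        w = p[k] * (float(k == l) - p[l])  # covariance of indicators
                        for b in range(2):
                            info[2 * k + a][2 * l + b] += w * z[a] * z[b]
        step = solve(info, grad)
        theta = [t + s for t, s in zip(theta, step)]
    return theta


counts = Counter(y for _, y in data)
theta = fit()
n = len(data)
ll_null = sum(c * math.log(c / n) for c in counts.values())  # response independent of length
lr = 2 * (loglik(theta) - ll_null)

print("response profile:", dict(counts), "| unique lengths:", len({x for x, _ in data}))
print("alpha_F, beta_F, alpha_I, beta_I =", [round(t, 3) for t in theta])
print("LR statistic for length:", round(lr, 1))
```

The printed counts can be compared with the Response Profile (O = 8, I = 20, F = 31) and with "Number of unique profiles: 45" in the partial output. The estimates and LR statistic can be compared with the SAS results discussed below.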

Partial Output

Model Information: Data Set WORK.GATOR; Response Variable choice; Number of Response Levels 3; Model generalized logit; Optimization Technique Fisher's scoring.
Number of Observations Read: 59. Number of Observations Used: 59.

Response Profile
Ordered Value   choice   Total Frequency
1               O        8
2               I        20
3               F        31

Logits modeled use choice='O' as the reference category.

Partial Output

Deviance and Pearson Goodness-of-Fit Statistics (Criterion, Value, DF, Value/DF, Pr > ChiSq for Deviance and Pearson). Number of unique profiles: 45.
Model Fit Statistics (AIC, SC and -2 Log L, for Intercept Only and for Intercept and Covariates).

Partial Output

Testing Global Null Hypothesis: BETA=0 (Chi-Square, DF, Pr > ChiSq for the Likelihood Ratio, Score and Wald tests).
Type 3 Analysis of Effects (DF, Wald Chi-Square, Pr > ChiSq for length).

Partial Output

Analysis of Maximum Likelihood Estimates (DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq) for Intercept and length, for choice = I and choice = F.
Odds Ratio Estimates (point estimate and 95% Wald confidence limits) for length, for choice = I and choice = F.

Example

We applied the baseline-category logit model with $J = 3$. The response is $Y$ = primary food choice (1 = fish, 2 = invertebrate, 3 = other), and the predictor is $X$ = length of alligator. From the parameter estimates,
$$\log(\hat\pi_1/\hat\pi_3) = 1.618 - 0.110x, \qquad \log(\hat\pi_2/\hat\pi_3) = 5.697 - 2.465x.$$
The estimated log odds that the response is fish rather than invertebrate equals
$$\log(\hat\pi_1/\hat\pi_2) = (1.618 - 5.697) + [-0.110 - (-2.465)]x = -4.079 + 2.355x.$$
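
Each pairwise logit is a difference of two baseline-category logits, so the fish-versus-invertebrate equation can be computed mechanically. The small Python sketch below takes as input the fitted equations for this example, $\log(\hat\pi_1/\hat\pi_3) = 1.618 - 0.110x$ and $\log(\hat\pi_2/\hat\pi_3) = 5.697 - 2.465x$. It subtracts them and exponentiates the resulting slope.

```python
import math

# Fitted baseline-category logits (baseline = other), stored as (intercept, slope).
fish_vs_other = (1.618, -0.110)
invert_vs_other = (5.697, -2.465)

# log(pi1/pi2) = log(pi1/pi3) - log(pi2/pi3): subtract intercepts and slopes.
intercept = fish_vs_other[0] - invert_vs_other[0]
slope = fish_vs_other[1] - invert_vs_other[1]
odds_multiplier = math.exp(slope)  # change in fish-vs-invertebrate odds per extra meter

print(f"log(pi1/pi2) = {intercept:.3f} + {slope:.3f}x; exp(slope) = {odds_multiplier:.1f}")
```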

Notes

For each logit, one interprets the estimates just as in ordinary binary logistic regression models, conditional on the event that the response outcome was one of those two categories. For instance, given that the primary food type is fish or invertebrate, the estimated probability that it is fish increases in length $x$ according to a logistic curve. For alligators of length $x + 1$ meters, the estimated odds that the primary food type is fish rather than invertebrate equal $\exp(2.355) = 10.5$ times the estimated odds for alligators of length $x$ meters.

Effect Of The Predictor

To test the hypothesis that primary food choice is independent of alligator length, we test $H_0: \beta_j = 0$ for $j = 1, 2$ in the model. The LR test statistic is twice the difference in maximized log likelihoods between this model and the simpler one in which the response is independent of length. The test statistic equals 16.8 with df = 2, giving a p-value < 0.001 and strong evidence of a length effect.
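
For df = 2 the chi-squared upper-tail probability has the closed form $\exp(-x/2)$, so the p-value of this LR test can be checked in a couple of lines. The Python sketch below handles only the 2 df case.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 df, which is exp(-x / 2)."""
    return math.exp(-x / 2)


lr_stat = 16.8  # LR statistic for H0: beta_1 = beta_2 = 0
p_value = chi2_sf_df2(lr_stat)
print(f"p-value = {p_value:.1e}")
```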

Estimating Response Probabilities

One can express the multicategory logit model directly in terms of the response probabilities, as
$$\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)}, \quad j = 1, \ldots, J.$$
The denominator is the same for each probability, and the numerators for the various $j$ sum to the denominator. The parameters equal zero in this equation for whichever category is the baseline in the logit expressions.

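Example: Alligator Data

The estimated probabilities of the outcomes equal
$$\hat\pi_1 = \frac{\exp(1.618 - 0.110x)}{1 + \exp(1.618 - 0.110x) + \exp(5.697 - 2.465x)},$$
$$\hat\pi_2 = \frac{\exp(5.697 - 2.465x)}{1 + \exp(1.618 - 0.110x) + \exp(5.697 - 2.465x)},$$
$$\hat\pi_3 = \frac{1}{1 + \exp(1.618 - 0.110x) + \exp(5.697 - 2.465x)}.$$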
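
A minimal Python sketch of these fitted probabilities follows. It assumes the estimates used for this example: fish $1.618 - 0.110x$ and invertebrate $5.697 - 2.465x$, both against the baseline other, whose parameters are zero. The lengths evaluated are arbitrary. The function is simply a softmax of the linear predictors $\alpha_j + \beta_j x$.

```python
import math

# Baseline-category estimates (alpha_j, beta_j); the baseline "other" has zeros.
PARAMS = {"fish": (1.618, -0.110), "invertebrate": (5.697, -2.465), "other": (0.0, 0.0)}


def food_probs(length):
    """Estimated response probabilities at a given length (meters)."""
    numer = {k: math.exp(a + b * length) for k, (a, b) in PARAMS.items()}
    denom = sum(numer.values())  # equals 1 + exp(fish logit) + exp(invertebrate logit)
    return {k: v / denom for k, v in numer.items()}


for x in (1.5, 2.5, 3.5):
    print(x, {k: round(v, 3) for k, v in food_probs(x).items()})
```

Because the baseline's parameters are zero, its numerator is $\exp(0) = 1$, which reproduces the "1 +" in each denominator above.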


Example: Belief in Afterlife

                     Belief in Afterlife
Race    Gender    Yes    Undecided    No
White   Female
        Male
Black   Female
        Male

Example

$Y$ = belief in life after death (Yes, Undecided, No), $X_1$ = gender, $X_2$ = race. Use dummy variables as predictors, with $x_1 = 1$ for females and 0 for males, and $x_2 = 1$ for whites and 0 for blacks. Using "no" as the baseline category for belief in life after death, we form the model
$$\log\left(\frac{\pi_j}{\pi_3}\right) = \alpha_j + \beta_j^G x_1 + \beta_j^R x_2, \quad j = 1, 2,$$
where the $G$ and $R$ superscripts identify the gender and race parameters.

Notes

The model assumes a lack of interaction between gender and race in their effects on belief. The effect parameters represent log odds ratios with the baseline category. For instance, $\beta_1^G$ is the conditional log odds ratio between gender and response categories 1 and 3 (yes and no), given race. Likewise, $\beta_2^G$ is the conditional log odds ratio between gender and response categories 2 and 3 (undecided and no), given race.

SAS Codes: Enter The Data

data afterlife;
  input race gender belief count;
  datalines;
;
run;

Fit The Model

proc logistic data = afterlife descending;
  weight count;
  model belief (reference="3") = race gender / link=glogit scale = none aggregate;
  output out = prob PREDPROBS=I;
run;

Partial Output

Response Profile (Ordered Value, belief, Total Frequency, Total Weight).
Logits modeled use belief=3 as the reference category.
Model Convergence Status: Convergence criterion (GCONV=1E-8) satisfied.

Partial Output

Deviance and Pearson Goodness-of-Fit Statistics (Criterion, Value, DF, Value/DF, Pr > ChiSq for Deviance and Pearson). Number of unique profiles: 4 (the four race-gender combinations).
Model Fit Statistics (AIC, SC and -2 Log L, for Intercept Only and for Intercept and Covariates).

Partial Output

Testing Global Null Hypothesis: BETA=0 (Chi-Square, DF, Pr > ChiSq for the Likelihood Ratio, Score and Wald tests).
Type 3 Analysis of Effects (DF, Wald Chi-Square, Pr > ChiSq for race and gender).

Partial Output

Analysis of Maximum Likelihood Estimates (belief, DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq) for Intercept, race and gender, each for belief = 1 and belief = 2.
Odds Ratio Estimates (point estimate and 95% Wald confidence limits) for race and gender, each for belief = 1 and belief = 2.

Example: Belief in Afterlife (fitted values from the model in parentheses)

                     Belief in Afterlife
Race    Gender    Yes        Undecided    No
White   Female    (372.8)    (49.2)       (72.1)
        Male      (248.2)    (44.8)       (72.9)
Black   Female    (62.2)     (8.8)        (16.9)
        Male      (26.8)     (5.2)        (11.1)

Example

The goodness-of-fit statistics are $G^2 = 0.8$ and $\chi^2 = 0.9$. The sample has two non-redundant logits at each of the four gender-race combinations, for a total of eight logits. The model under consideration has six parameters. Thus, residual df = $8 - 6 = 2$. The model fits well.

Discussion On The Example

From the parameter estimates, the estimated odds of a yes rather than a no response for females are $\exp(0.419) = 1.5$ times those for males, controlling for race. For whites, they are $\exp(0.342) = 1.4$ times those for blacks, controlling for gender. To test the effect of gender, we test $H_0: \beta_j^G = 0$ for $j = 1, 2$. The LR test compares $G^2 = 0.8$ (df = 2) to $G^2 = 8.0$ (df = 4), obtained by dropping gender from the model. The difference of 7.2, based on df = 2, has a p-value of 0.03 and shows evidence of a gender effect. By contrast, the effect of race is not significant.
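
The odds ratios and the gender LR test above are simple arithmetic on the reported numbers. The Python sketch below takes as input only the estimates 0.419 and 0.342 and the two $G^2$ values quoted in the text. Its p-value formula $\exp(-x/2)$ is valid only because the difference has 2 df.

```python
import math

# Yes-vs-no log odds ratios reported for this model (baseline category "no").
beta_gender = 0.419  # females vs males, controlling for race
beta_race = 0.342    # whites vs blacks, controlling for gender
or_gender, or_race = math.exp(beta_gender), math.exp(beta_race)

# LR test of H0: beta_1^G = beta_2^G = 0 from the two reported G^2 values.
g2_without_gender, df_without = 8.0, 4
g2_full, df_full = 0.8, 2
lr = g2_without_gender - g2_full
df = df_without - df_full
p_value = math.exp(-lr / 2)  # chi-squared upper tail; this closed form holds for df = 2

print(round(or_gender, 2), round(or_race, 2), round(lr, 1), df, round(p_value, 3))
```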


Connection With Loglinear Models

When all explanatory variables are categorical, logit models have corresponding loglinear models. The model fitted in the previous example assumes main effects of gender (G) and race (R) on belief (B) in an afterlife, with no interaction. It corresponds to the loglinear model (GR, BG, BR) of homogeneous association. The simpler logit model that deletes the race effect on belief corresponds to the loglinear model (GR, BG).

Logit Models For Ordinal Responses

Models For Ordinal Responses

When response categories are ordered, logits can directly incorporate the ordering. This gives models with simpler interpretations. Define the $j$-th cumulative probability, the probability that the response $Y$ falls in category $j$ or below, as
$$P(Y \le j) = \pi_1 + \cdots + \pi_j, \quad j = 1, \ldots, J.$$
The cumulative probabilities reflect the ordering, with $P(Y \le 1) \le P(Y \le 2) \le \cdots \le P(Y \le J) = 1$. Models for cumulative probabilities do not use the final one, $P(Y \le J)$, since it necessarily equals 1.

Cumulative Logits

The logits of the first $J - 1$ cumulative probabilities are
$$\operatorname{logit}[P(Y \le j)] = \log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \log\left(\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}\right), \quad j = 1, \ldots, J - 1.$$
These are called cumulative logits. A model for the $j$-th cumulative logit looks like an ordinary logit model for a binary response in which categories 1 to $j$ combine to form a single category, and categories $j + 1$ to $J$ form a second category.

Proportional Odds Model

For a predictor $X$, the model
$$\operatorname{logit}[P(Y \le j)] = \alpha_j + \beta x, \quad j = 1, \ldots, J - 1,$$
has parameter $\beta$ describing the effect of $X$ on the log odds of response in category $j$ or below. This model assumes an identical effect of $X$ for all $J - 1$ cumulative logits. When this model fits well, it requires a single parameter rather than $J - 1$ parameters to describe the effect of $X$.

Interpretations

This model refers to odds ratios for the collapsed response scale, for any fixed $j$. For two values $x_1$ and $x_2$ of $X$, the odds ratio uses the cumulative probabilities and their complements:
$$\log\left[\frac{P(Y \le j \mid X = x_2)/P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)/P(Y > j \mid X = x_1)}\right] = \beta(x_2 - x_1).$$
Since the log odds ratio is proportional to the distance between the $x$ values, with the same proportionality constant $\beta$ for every $j$, the model is called a proportional odds model.
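
The identity above can be checked numerically. The Python sketch below uses a hypothetical proportional odds model with $J = 4$; the cutpoints and slope are invented for illustration. It computes the log cumulative odds ratio between $x_2$ and $x_1$ at each cutpoint and obtains category probabilities as differences of cumulative probabilities.

```python
import math

# Hypothetical proportional odds model for J = 4 categories:
# increasing cutpoints alpha_1 < alpha_2 < alpha_3 and one common slope beta.
ALPHA = [-1.0, 0.5, 2.0]
BETA = 0.8


def cum_probs(x):
    """P(Y <= j | x), j = 1..J-1, from logit P(Y <= j) = alpha_j + beta * x."""
    return [1 / (1 + math.exp(-(a + BETA * x))) for a in ALPHA]


def cat_probs(x):
    """Category probabilities as successive differences of the cumulative probabilities."""
    c = cum_probs(x) + [1.0]
    return [c[0]] + [c[j] - c[j - 1] for j in range(1, len(c))]


x1, x2 = 0.0, 2.5
log_ors = []
for g1, g2 in zip(cum_probs(x1), cum_probs(x2)):
    # cumulative odds P(Y <= j) / P(Y > j) at x2, relative to x1
    log_ors.append(math.log((g2 / (1 - g2)) / (g1 / (1 - g1))))

print([round(v, 6) for v in log_ors], BETA * (x2 - x1))
print([round(p, 3) for p in cat_probs(1.0)])
```

Every entry of `log_ors` equals $\beta(x_2 - x_1)$ regardless of $j$, which is the "proportional odds" property. Increasing cutpoints keep every category probability positive.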

Comments

For $x_2 - x_1 = 1$, the odds of response at or below any given category multiply by $e^{\beta}$ for each unit increase in $X$. When the model holds with $\beta = 0$, $X$ and $Y$ are statistically independent. Explanatory variables in cumulative logit models can be continuous, categorical, or of both types. The ML fitting process uses an iterative algorithm simultaneously for all $j$.

Example: Political Ideology

                                      Political Ideology
Party          Very      Slightly                Slightly       Very
Affiliation    Liberal   Liberal    Moderate     Conservative   Conservative   Total
Democratic     80        81         171          41             55             428
Republican     30        46         148          84             99             407

Example: Fit A Proportional Odds Model

Political ideology uses a five-point ordinal scale, ranging from very liberal to very conservative. Let $X$ be a dummy variable for political party, with $X = 1$ for Democrats and $X = 0$ for Republicans.

SAS Codes: Read The Data

data ideology;
  input party ideology count;
  datalines;
;
run;

SAS Codes: Fit The Model

proc logistic data = ideology order=data descending;
  class party / param = ref;
  freq count;
  model ideology = party / link=clogit scale=none;
  output out = prob PREDPROBS=I;
run;

Partial Output: Response Profile

Response Profile (Ordered Value, ideology, Total Frequency).
Probabilities modeled are cumulated over the lower Ordered Values.

Partial Output: Fit Statistics

Score Test for the Proportional Odds Assumption (Chi-Square, DF, Pr > ChiSq).
Model Fit Statistics (AIC, SC and -2 Log L, for Intercept Only and for Intercept and Covariates).

Output: Testing For Effects

Testing Global Null Hypothesis: BETA=0: the Likelihood Ratio, Score and Wald tests all have Pr > ChiSq < .0001.
Type 3 Analysis of Effects: party has Wald Pr > ChiSq < .0001.

Partial Output: Maximum Likelihood Estimates

Analysis of Maximum Likelihood Estimates (DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq): the four intercepts and party all have Pr > ChiSq < .0001.

SAS Codes: Compute The Fitted Counts

proc freq data = prob noprint;
  weight count;
  tables party*ip_1*ip_2*ip_3*ip_4*ip_5 / list nocum nopercent out=test;
run;

data table8_6;
  set test;
  array p(5) ip_1-ip_5;
  array pcount(5);
  do i = 1 to 5;
    pcount(i) = count*p(i);
  end;
  drop ip_1-ip_5 i percent;
run;

proc print data = table8_6 noobs;
run;

Output

Columns: party, COUNT, pcount1, pcount2, pcount3, pcount4, pcount5 (one row per party).

Example: Political Ideology (fitted values from the proportional odds model in parentheses)

Party        Very Liberal   Slightly Liberal   Moderate      Slightly Conservative   Very Conservative   Total
Democratic   80 (78.4)      81 (83.2)          171 (168.2)   41 (49.1)               55 (49.1)           428
Republican   30 (31.8)      46 (44.0)          148 (151.7)   84 (75.5)               99 (104.0)          407

Example: Discussion

The ML fit of the proportional odds model has estimated effect $\hat\beta = 0.975$ (ASE = 0.129). For any fixed $j$, the estimated odds that a Democrat's response is in the liberal direction rather than the conservative direction equal $\exp(0.975) = 2.65$ times the estimated odds for Republicans. A 95% CI for this odds ratio equals $\exp(0.975 \pm 1.96 \times 0.129) = (2.1, 3.4)$. A fairly substantial association exists, with Democrats tending to be more liberal than Republicans.
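
The confidence interval is a Wald interval for $\beta$ whose endpoints are then exponentiated. A minimal Python sketch using only the estimate 0.975 and its ASE 0.129:

```python
import math

beta_hat, ase = 0.975, 0.129  # estimate and its asymptotic standard error
z = 1.96                      # 97.5th percentile of N(0, 1), for a 95% Wald interval
odds_ratio = math.exp(beta_hat)
lower, upper = math.exp(beta_hat - z * ase), math.exp(beta_hat + z * ase)
print(round(odds_ratio, 2), (round(lower, 1), round(upper, 1)))
```

Exponentiating the endpoints, rather than building a symmetric interval on the odds-ratio scale, makes the interval asymmetric around 2.65.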

Example: Predicted Probabilities

The cumulative probabilities equal
$$P(Y \le j \mid X = x) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)}.$$
The first estimated cumulative probability for Democrats ($x = 1$) equals
$$\frac{\exp[-2.469 + 0.975(1)]}{1 + \exp[-2.469 + 0.975(1)]} = 0.18.$$
The estimated probability of the $j$-th category can be obtained as $P(Y \le j \mid X = x) - P(Y \le j - 1 \mid X = x)$, with $P(Y \le 0 \mid X = x) = 0$.
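
The proportional odds structure can be seen directly in the fitted counts from the political ideology table. At every cutpoint, the cumulative logit for Democrats minus that for Republicans should equal $\hat\beta = 0.975$. The Python sketch below uses only those fitted values. Because they are rounded to one decimal, the gaps agree only approximately.

```python
import math

# Fitted counts (very liberal, ..., very conservative) from the political ideology table.
fitted = {
    "Democratic": [78.4, 83.2, 168.2, 49.1, 49.1],
    "Republican": [31.8, 44.0, 151.7, 75.5, 104.0],
}


def cumulative_logits(counts):
    """logit P(Y <= j) for j = 1..J-1, computed from one row of counts."""
    total, running, logits = sum(counts), 0.0, []
    for c in counts[:-1]:
        running += c
        logits.append(math.log(running / (total - running)))
    return logits


dem = cumulative_logits(fitted["Democratic"])
rep = cumulative_logits(fitted["Republican"])
gaps = [d - r for d, r in zip(dem, rep)]  # should all be close to beta-hat = 0.975
first_cum_dem = fitted["Democratic"][0] / sum(fitted["Democratic"])

print([round(g, 3) for g in gaps], round(first_cum_dem, 2))
```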

Ordinal Tests Of Independence

The LR statistic for an ordinal test of independence ($H_0: \beta = 0$) is the difference in deviance ($G^2$) values for the independence model and the proportional odds model. In this example, the difference in $G^2$ values, $62.3 - 3.7 = 58.6$, based on df = $4 - 3 = 1$, gives extremely strong evidence of an association (p-value < 0.0001). When the model fits well, this test is more powerful than the test of independence discussed for two-way tables, which is based on df = $(I - 1)(J - 1)$. It gains power because it focuses on a restricted alternative and has only one degree of freedom.
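
The two tests can be compared numerically. The Python sketch below computes the p-values of the 1 df ordinal statistic (58.6) and of the general 4 df independence statistic (62.3, the value reported for this table in the goodness-of-fit discussion). It uses the closed-form chi-squared tails for df = 1 and df = 4 only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1 and df = 4 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 4")


ordinal = chi2_sf(58.6, 1)  # ordinal test: drop in G^2, df = 4 - 3 = 1
general = chi2_sf(62.3, 4)  # general test of independence, df = (2 - 1)(5 - 1) = 4
print(f"ordinal p = {ordinal:.1e}, general p = {general:.1e}")
```

Both p-values are tiny here. The power advantage of the 1 df test matters most when the association is weaker, because its statistic is not spread over extra degrees of freedom.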

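Adjacent Categories Logits

The adjacent-categories logits are
$$\operatorname{logit}[P(Y = j \mid Y = j \text{ or } j + 1)] = \log\left(\frac{\pi_j}{\pi_{j+1}}\right), \quad j = 1, \ldots, J - 1.$$
For $J = 3$, the logits are $\log(\pi_1/\pi_2)$ and $\log(\pi_2/\pi_3)$. These logits are a basic set equivalent to the baseline-category logits. The connections are
$$\log\left(\frac{\pi_j}{\pi_J}\right) = \log\left(\frac{\pi_j}{\pi_{j+1}}\right) + \log\left(\frac{\pi_{j+1}}{\pi_{j+2}}\right) + \cdots + \log\left(\frac{\pi_{J-1}}{\pi_J}\right)$$
and
$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \log\left(\frac{\pi_j}{\pi_J}\right) - \log\left(\frac{\pi_{j+1}}{\pi_J}\right), \quad j = 1, \ldots, J - 1.$$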
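
Both connections can be verified numerically. The Python sketch below uses an arbitrary, hypothetical probability vector with $J = 5$. It rebuilds the baseline-category logits from the adjacent-categories logits, and vice versa.

```python
import math

pi = [0.10, 0.25, 0.30, 0.20, 0.15]  # hypothetical probabilities, J = 5
J = len(pi)

adjacent = [math.log(pi[j] / pi[j + 1]) for j in range(J - 1)]  # log(pi_j / pi_{j+1})
baseline = [math.log(pi[j] / pi[J - 1]) for j in range(J - 1)]  # log(pi_j / pi_J)

# Baseline logit j is the sum of adjacent logits j, j+1, ..., J-1.
from_adjacent = [sum(adjacent[j:]) for j in range(J - 1)]
# Adjacent logit j is baseline logit j minus baseline logit j+1 (the latter is 0 for j+1 = J).
padded = baseline + [0.0]
from_baseline = [padded[j] - padded[j + 1] for j in range(J - 1)]

print(all(map(math.isclose, baseline, from_adjacent)),
      all(map(math.isclose, adjacent, from_baseline)))
```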

Adjacent Categories Logits

A model using these logits with a predictor $x$ has the form
$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta_j x, \quad j = 1, \ldots, J - 1.$$
These logits, like the baseline-category logits, determine logits for all pairs of response categories. A simpler version of this model,
$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta x, \quad j = 1, \ldots, J - 1,$$
has identical effects for each pair of adjacent categories.

Comments

For this model, the effect of $X$ on the odds of making the lower instead of the higher response is the same for every pair of adjacent categories. This model and the proportional odds model both use a single parameter, rather than $J - 1$ parameters, for the effect of $X$. When the model holds, independence is equivalent to $\beta = 0$.

Comments

The simpler adjacent-categories logit model implies the following for arbitrary response categories $a$ and $b$ with $a > b$: the coefficient of $x$ in the logit $\log(\pi_b/\pi_a)$ equals $\beta(a - b)$. The effect depends on the distance between the categories, so this model recognizes the ordering of the response scale.

Political Ideology Example: SAS Codes

proc catmod data = ideology;
  weight count;
  response alogits;
  model ideology = _response_ party;
run;
quit;

Partial Output

Analysis of Weighted Least Squares Estimates (Estimate, Standard Error, Chi-Square, Pr > ChiSq) for Intercept, the RESPONSE terms, and party. Party has Pr > ChiSq < .0001.

You need to multiply the coefficient of party by 2, since party has two categories and PROC CATMOD assumes that the parameters sum to zero. The two party effects are therefore $+\hat\beta$ and $-\hat\beta$, and the Democrat-versus-Republican difference is $2\hat\beta$. Similarly, multiply the estimated standard error by 2.

Alternative Method Using ML

proc catmod data = ideology;
  weight count;
  population party;
  model ideology = ( , , , , , , , )
        (1='Group1/2', 2='Group2/3', 3='Group3/4', 4='Group4/5', 5='party') / ml;
run;
quit;

Partial Output

Maximum Likelihood Analysis of Variance (Source, DF, Chi-Square, Pr > ChiSq) for Group1/2, Group2/3, Group3/4, Group4/5, party, and the Likelihood Ratio. Group2/3, Group3/4 and party have Pr > ChiSq < .0001.

Partial Output

Analysis of Maximum Likelihood Estimates (Estimate, Standard Error, Chi-Square, Pr > ChiSq) for the Model parameters.

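How To Specify The Design Matrix

Our model is
$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta x, \quad j = 1, \ldots, J - 1.$$
Since $\log(\pi_j/\pi_J)$ is the sum of the adjacent-categories logits $j, j + 1, \ldots, J - 1$, we can rewrite it as
$$\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \alpha_{j+1} + \cdots + \alpha_{J-1} + (J - j)\beta x, \quad j = 1, \ldots, J - 1.$$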

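The Design Matrix

Hence, with $J = 5$, the model is
$$\begin{pmatrix} \log(\pi_1/\pi_5) \\ \log(\pi_2/\pi_5) \\ \log(\pi_3/\pi_5) \\ \log(\pi_4/\pi_5) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & 4x \\ 0 & 1 & 1 & 1 & 3x \\ 0 & 0 & 1 & 1 & 2x \\ 0 & 0 & 0 & 1 & x \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \beta \end{pmatrix}$$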
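
The rows of this matrix follow a simple pattern. The Python sketch below generates them and checks that the matrix product equals the sum of adjacent-categories logits. The helper name `design_row` and the $\alpha$ values are hypothetical; $\beta = 0.435$ is the ML estimate reported for this example.

```python
import math

J = 5


def design_row(j, x):
    """Design-matrix row for log(pi_j / pi_J), j = 1..J-1: indicators of
    alpha_j, ..., alpha_{J-1}, followed by the coefficient (J - j) * x of beta."""
    return [1 if k >= j else 0 for k in range(1, J)] + [(J - j) * x]


alpha = [0.3, -0.2, 0.8, 0.1]  # hypothetical alpha_1..alpha_4
beta = 0.435                   # party effect estimate from this example
theta = alpha + [beta]

for x in (0, 1):  # x = 0: Republicans, x = 1: Democrats
    for j in range(1, J):
        via_matrix = sum(d * t for d, t in zip(design_row(j, x), theta))
        via_adjacent = sum(alpha[k - 1] + beta * x for k in range(j, J))  # sum of adjacent logits
        assert math.isclose(via_matrix, via_adjacent, abs_tol=1e-12)
    print(f"x = {x}:", [design_row(j, x) for j in range(1, J)])
```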

Discussion

The ML estimate of the party affiliation effect is $\hat\beta = 0.435$. The estimated odds that a Democrat's ideology classification is in category $j$ instead of $j + 1$ are $\exp(\hat\beta) = 1.54$ times the estimated odds for Republicans. The estimated odds that a Democrat's ideology is very liberal instead of very conservative are $\exp[0.435(5 - 1)] = 5.7$ times those for Republicans. Democrats tend to be much more liberal than Republicans.

Example: Goodness-of-fit

Here $G^2 = 5.5$ with df = 3 and p-value = 0.14, a reasonably good fit. The special case of the model with $\beta = 0$ specifies independence of ideology and party affiliation. The $G^2$ of that model is simply the $G^2$ statistic for testing independence, which equals 62.3 with df = 4. The model with party affiliation fits much better than the independence model.
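
The odds ratios from the discussion and the goodness-of-fit p-value can be reproduced from the reported numbers alone. The Python sketch below uses $\hat\beta = 0.435$ and $G^2 = 5.5$ on 3 df. Its chi-squared tail uses the closed form for df = 3 only.

```python
import math

beta_hat = 0.435  # ML estimate of the party effect in the adjacent-categories model
or_adjacent = math.exp(beta_hat)            # category j vs j + 1, Democrats vs Republicans
or_extremes = math.exp(beta_hat * (5 - 1))  # very liberal (1) vs very conservative (5)


def chi2_sf_df3(x):
    """Chi-squared upper tail with 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_fit = chi2_sf_df3(5.5)  # goodness of fit: G^2 = 5.5 on df = 3
print(round(or_adjacent, 2), round(or_extremes, 1), round(p_fit, 2))
```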

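Some R Codes

party <- c(1, 0)
v.lib <- c(80, 30)
s.lib <- c(81, 46)
mod <- c(171, 148)
s.cons <- c(41, 84)
v.cons <- c(55, 99)
library(VGAM)
# acat() uses a log link on the adjacent-category ratios by default;
# parallel = TRUE gives one common party effect for all adjacent logits.
ideo.fit <- vglm(cbind(v.lib, s.lib, mod, s.cons, v.cons) ~ party,
                 family = acat(parallel = TRUE))
summary(ideo.fit)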

Some R Codes

Pearson Residuals: one column for each linear predictor, log(P[Y=2]/P[Y=1]), log(P[Y=3]/P[Y=2]), log(P[Y=4]/P[Y=3]), log(P[Y=5]/P[Y=4]).
Coefficients (Value, Std. Error, t value): (Intercept):1, (Intercept):2, (Intercept):3, (Intercept):4, party.

Some R Codes

Number of linear predictors: 4
Names of linear predictors: log(P[Y=2]/P[Y=1]), log(P[Y=3]/P[Y=2]), log(P[Y=4]/P[Y=3]), log(P[Y=5]/P[Y=4])
Dispersion Parameter for acat family: 1
Residual deviance and log-likelihood, each on 3 degrees of freedom
Number of Iterations: 4

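Connection With Loglinear Models

Consider the linear-by-linear association model
$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j.$$
If we take the column scores to be $v_j = j$, the adjacent-category logits within row $i$ are
$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \log\left(\frac{\mu_{i,j}}{\mu_{i,j+1}}\right) = \log \mu_{i,j} - \log \mu_{i,j+1} = (\lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j) - (\lambda + \lambda_i^X + \lambda_{j+1}^Y + \beta u_i v_{j+1}) = (\lambda_j^Y - \lambda_{j+1}^Y) + \beta u_i (v_j - v_{j+1}) = (\lambda_j^Y - \lambda_{j+1}^Y) - \beta u_i.$$
This is the adjacent-categories logit model with intercepts $\alpha_j = \lambda_j^Y - \lambda_{j+1}^Y$ and an effect $-\beta u_i$ that is the same for every $j$.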
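
A numerical check of this derivation is below. It is a Python sketch on a hypothetical $2 \times 5$ table; every $\lambda$, score and $\beta$ value is invented for illustration. Within each row, it compares $\log \mu_{i,j} - \log \mu_{i,j+1}$ with $(\lambda_j^Y - \lambda_{j+1}^Y) - \beta u_i$.

```python
import math

# Hypothetical linear-by-linear association model for a 2 x 5 table.
LAM = 1.0
LAM_X = [0.2, -0.1]                # row main effects
LAM_Y = [0.5, 0.9, 1.4, 0.6, 0.3]  # column main effects
U = [1.0, 0.0]                     # row scores u_i
V = [1, 2, 3, 4, 5]                # column scores v_j = j
BETA = 0.4


def log_mu(i, j):
    """log expected count in row i, column j (0-based indices)."""
    return LAM + LAM_X[i] + LAM_Y[j] + BETA * U[i] * V[j]


def row_matches(i):
    """Compare log(mu_ij / mu_i,j+1) with (lambda^Y_j - lambda^Y_{j+1}) - beta * u_i."""
    direct = [log_mu(i, j) - log_mu(i, j + 1) for j in range(4)]
    formula = [LAM_Y[j] - LAM_Y[j + 1] - BETA * U[i] for j in range(4)]
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(direct, formula))


print([row_matches(i) for i in range(2)])
```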

Continuation-Ratio Logits

Continuation-ratio logits are defined as
$$\log\left(\frac{\pi_j}{\pi_{j+1} + \cdots + \pi_J}\right), \quad j = 1, \ldots, J - 1,$$
or as
$$\log\left(\frac{\pi_{j+1}}{\pi_1 + \cdots + \pi_j}\right), \quad j = 1, \ldots, J - 1.$$
They refer to a binary response that contrasts each category with a grouping of categories from higher (respectively, lower) levels of the response scale. The continuation-ratio logit model form is useful when a sequential mechanism, such as survival through various age periods, determines the response outcome.

Interpretations

Define $\omega_j = P(Y = j \mid Y \ge j)$. With a predictor $X$,
$$\omega_j(x) = \frac{\pi_j(x)}{\pi_j(x) + \cdots + \pi_J(x)}, \quad j = 1, \ldots, J - 1.$$
The continuation-ratio logits of the first form are ordinary logits of these conditional probabilities, namely $\log[\omega_j(x)/(1 - \omega_j(x))]$.
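
The conditional probabilities $\omega_j$ and their sample logits can be computed directly from grouped counts. The Python sketch below uses the developmental toxicity counts listed in the R code later in this section. The function name `logit_or_none` is hypothetical. A sample logit is reported as None when a count is zero and the logit is infinite.

```python
import math

# Developmental toxicity counts (dead, malformed, normal) by concentration.
conc = [0, 62.5, 125, 250, 500]
dead = [15, 17, 22, 38, 144]
malf = [1, 0, 7, 59, 132]
normal = [281, 225, 283, 202, 9]


def logit_or_none(p):
    """Sample logit log(p / (1 - p)); None when p is 0 or 1 (the logit is infinite)."""
    return None if p in (0.0, 1.0) else math.log(p / (1 - p))


rows = []
for c, d, m, k in zip(conc, dead, malf, normal):
    omega1 = d / (d + m + k)  # P(Y = 1 | Y >= 1): dead among all
    omega2 = m / (m + k)      # P(Y = 2 | Y >= 2): malformed among the live ones
    rows.append((c, omega1, omega2, logit_or_none(omega1), logit_or_none(omega2)))

for r in rows:
    print(r[0], *(None if v is None else round(v, 3) for v in r[1:]))
```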

Example: Developmental Toxicity Study

Concentration                    Response
(mg/kg per day)     Dead   Malformation   Normal
0 (Controls)        15     1              281
62.5                17     0              225
125                 22     7              283
250                 38     59             202
500                 144    132            9

Some R Codes

library(VGAM)
conc <- c(0, 62.5, 125, 250, 500)
dead <- c(15, 17, 22, 38, 144)
malf <- c(1, 0, 7, 59, 132)
normal <- c(281, 225, 283, 202, 9)
toxic.fit <- vglm(cbind(dead, malf, normal) ~ conc, family = cratio())
summary(toxic.fit)

Output

Call: vglm(formula = cbind(dead, malf, normal) ~ conc, family = cratio())
Pearson Residuals: columns logit(P[Y>1|Y>=1]) and logit(P[Y>2|Y>=2]).
Coefficients (Value, Std. Error, t value): (Intercept):1

Output

Coefficients, continued: (Intercept):2, conc:1, conc:2
Number of linear predictors: 2
Names of linear predictors: logit(P[Y>1|Y>=1]), logit(P[Y>2|Y>=2])
Dispersion Parameter for cratio family: 1
Residual deviance and log-likelihood, each on 6 degrees of freedom
Number of Iterations: 4

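Notes

R fits the continuation-ratio logit models
$$\log\left(\frac{\pi_2 + \pi_3}{\pi_1}\right) = \alpha_1 + \beta_1 x, \qquad \log\left(\frac{\pi_3}{\pi_2}\right) = \alpha_2 + \beta_2 x.$$
If we need the usual continuation-ratio logit models
$$\log\left(\frac{\pi_1}{\pi_2 + \pi_3}\right) = \alpha_1 + \beta_1 x, \qquad \log\left(\frac{\pi_2}{\pi_3}\right) = \alpha_2 + \beta_2 x,$$
we should take the negatives of all the coefficients.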

SAS Codes: Indirect Way

data toxic1;
  input conc dead alive;
  total = dead + alive;
  datalines;
0 15 282
62.5 17 225
125 22 290
250 38 261
500 144 141
;
run;

SAS Codes

proc genmod data=toxic1 order=data;
  model dead/total = conc / d=bin link=logit;
run;
quit;

Partial Output

Criteria For Assessing Goodness Of Fit (DF, Value, Value/DF for Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2 and Log Likelihood).
Algorithm converged.
Analysis Of Parameter Estimates (DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq) for Intercept and conc, both with Pr > ChiSq < .0001.

SAS Codes

data toxic2;
  input conc malform normal;
  total = malform + normal;
  datalines;
0 1 281
62.5 0 225
125 7 283
250 59 202
500 132 9
;
run;

proc genmod data=toxic2 order=data;
  model malform/total = conc / d=bin link=logit;
run;
quit;

Partial Output

Criteria For Assessing Goodness Of Fit (DF, Value, Value/DF for Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2 and Log Likelihood).
Algorithm converged.
Analysis Of Parameter Estimates (DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq) for Intercept and conc, both with Pr > ChiSq < .0001.

Notes

The two models here are ordinary logistic regression models. In the first fit the responses are column 1 versus columns 2 and 3 combined; in the second fit they are column 2 versus column 3. When models for different continuation-ratio logits have separate parameters, as in this example, separate fitting of models for different logits gives the same results as simultaneous fitting. The sum of the separate $G^2$ statistics is an overall goodness-of-fit statistic pertaining to the simultaneous fitting of the models. For this example, the $G^2$ values are 5.78 for the first logit and 6.06 for the second, each based on df = 3. We summarize the fit by their sum, $G^2 = 11.84$, based on df = 6.
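
Separate fitting makes the continuation-ratio model easy to reproduce with any binary logistic regression routine. The Python sketch below (standard library only; the helper names are hypothetical) fits each logit by Newton-Raphson on the toxicity counts from this section and sums the two binomial deviances. It is an illustration of the idea, not the GENMOD or VGAM implementation.

```python
import math

# Developmental toxicity counts, as in the data steps above.
conc = [0, 62.5, 125, 250, 500]
dead = [15, 17, 22, 38, 144]
malf = [1, 0, 7, 59, 132]
normal = [281, 225, 283, 202, 9]


def fit_logistic(x, y, n, iters=50):
    """Newton-Raphson ML fit of logit(p_i) = a + b * x_i with y_i ~ Binomial(n_i, p_i)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1 / (1 + math.exp(-(a + b * xi)))
            w = ni * p * (1 - p)       # binomial variance = Fisher weight
            g0 += yi - ni * p          # score for a
            g1 += (yi - ni * p) * xi   # score for b
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # 2 x 2 Newton step
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def deviance(x, y, n, a, b):
    """Binomial deviance G^2 = 2 * sum obs * log(obs / fitted), with 0 * log(0) = 0."""
    g2 = 0.0
    for xi, yi, ni in zip(x, y, n):
        mu = ni / (1 + math.exp(-(a + b * xi)))
        for obs, fit in ((yi, mu), (ni - yi, ni - mu)):
            if obs > 0:
                g2 += 2 * obs * math.log(obs / fit)
    return g2


alive = [m + k for m, k in zip(malf, normal)]
all_mice = [d + al for d, al in zip(dead, alive)]

# Logit 1: dead vs alive (columns 2 + 3). Logit 2: malformed vs normal among the alive.
fit1 = fit_logistic(conc, dead, all_mice)
fit2 = fit_logistic(conc, malf, alive)
g2_1 = deviance(conc, dead, all_mice, *fit1)
g2_2 = deviance(conc, malf, alive, *fit2)

print("logit 1 (a, b):", [round(v, 4) for v in fit1], "G2 =", round(g2_1, 2))
print("logit 2 (a, b):", [round(v, 4) for v in fit2], "G2 =", round(g2_2, 2))
print("overall G2 =", round(g2_1 + g2_2, 2), "on", 2 * (len(conc) - 2), "df")
```

Each fit has 5 binomial observations and 2 parameters, hence 3 df apiece and 6 df for the sum.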


More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence ST3241 Categorical Data Analysis I Two-way Contingency Tables Odds Ratio and Tests of Independence 1 Inference For Odds Ratio (p. 24) For small to moderate sample size, the distribution of sample odds

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

Logistic Regression for Ordinal Responses

Logistic Regression for Ordinal Responses Logistic Regression for Ordinal Responses Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Common models for ordinal

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Chapter 14 Logistic and Poisson Regressions

Chapter 14 Logistic and Poisson Regressions STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

4 Multicategory Logistic Regression

4 Multicategory Logistic Regression 4 Multicategory Logistic Regression 4.1 Baseline Model for nominal response Response variable Y has J > 2 categories, i = 1,, J π 1,..., π J are the probabilities that observations fall into the categories

More information

Homework 1 Solutions

Homework 1 Solutions 36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Short Course Introduction to Categorical Data Analysis

Short Course Introduction to Categorical Data Analysis Short Course Introduction to Categorical Data Analysis Alan Agresti Distinguished Professor Emeritus University of Florida, USA Presented for ESALQ/USP, Piracicaba Brazil March 8-10, 2016 c Alan Agresti,

More information

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

The material for categorical data follows Agresti closely.

The material for categorical data follows Agresti closely. Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical

More information

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ;

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; SAS Analysis Examples Replication C8 * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; libname ncsr "P:\ASDA 2\Data sets\ncsr\" ; data c8_ncsr ; set ncsr.ncsr_sub_13nov2015

More information

Matched Pair Data. Stat 557 Heike Hofmann

Matched Pair Data. Stat 557 Heike Hofmann Matched Pair Data Stat 557 Heike Hofmann Outline Marginal Homogeneity - review Binary Response with covariates Ordinal response Symmetric Models Subject-specific vs Marginal Model conditional logistic

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

ECONOMETRICS II TERM PAPER. Multinomial Logit Models

ECONOMETRICS II TERM PAPER. Multinomial Logit Models ECONOMETRICS II TERM PAPER Multinomial Logit Models Instructor : Dr. Subrata Sarkar 19.04.2013 Submitted by Group 7 members: Akshita Jain Ramyani Mukhopadhyay Sridevi Tolety Trishita Bhattacharjee 1 Acknowledgement:

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Chapter 20: Logistic regression for binary response variables

Chapter 20: Logistic regression for binary response variables Chapter 20: Logistic regression for binary response variables In 1846, the Donner and Reed families left Illinois for California by covered wagon (87 people, 20 wagons). They attempted a new and untried

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

Ron Heck, Fall Week 3: Notes Building a Two-Level Model Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Chapter 11: Models for Matched Pairs

Chapter 11: Models for Matched Pairs : Models for Matched Pairs Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Chapter 11: Analysis of matched pairs

Chapter 11: Analysis of matched pairs Chapter 11: Analysis of matched pairs Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 42 Chapter 11: Models for Matched Pairs Example: Prime

More information

Solution to Tutorial 7

Solution to Tutorial 7 1. (a) We first fit the independence model ST3241 Categorical Data Analysis I Semester II, 2012-2013 Solution to Tutorial 7 log µ ij = λ + λ X i + λ Y j, i = 1, 2, j = 1, 2. The parameter estimates are

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

MSUG conference June 9, 2016

MSUG conference June 9, 2016 Weight of Evidence Coded Variables for Binary and Ordinal Logistic Regression Bruce Lund Magnify Analytic Solutions, Division of Marketing Associates MSUG conference June 9, 2016 V12 web 1 Topics for this

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine. Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present

More information

Models for Ordinal Response Data

Models for Ordinal Response Data Models for Ordinal Response Data Robin High Department of Biostatistics Center for Public Health University of Nebraska Medical Center Omaha, Nebraska Recommendations Analyze numerical data with a statistical

More information

Solutions for Examination Categorical Data Analysis, March 21, 2013

Solutions for Examination Categorical Data Analysis, March 21, 2013 STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

Lecture 8: Summary Measures

Lecture 8: Summary Measures Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis

More information

Three-Way Contingency Tables

Three-Way Contingency Tables Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep

More information

Homework 10 - Solution

Homework 10 - Solution STAT 526 - Spring 2011 Homework 10 - Solution Olga Vitek Each part of the problems 5 points 1. Faraway Ch. 4 problem 1 (page 93) : The dataset parstum contains cross-classified data on marijuana usage

More information