Chapter 5: Logistic Regression-I


Dipankar Bandyopadhyay
Department of Biostatistics, Virginia Commonwealth University
BIOS 625: Categorical Data & GLM
[Acknowledgements to Tim Hanson and Haitao Chu]

5.1 Model Interpretation
5.1.1 Model Interpretation

The logistic regression model is

$$Y_i \sim \mathrm{bin}(n_i, \pi_i), \qquad \pi_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1})}.$$

- $x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$ is a $p$-dimensional vector of explanatory variables, including a placeholder for the intercept.
- $\beta = (\beta_0, \ldots, \beta_{p-1})'$ is the $p$-dimensional vector of regression coefficients; these are the unknown population parameters.
- $\eta_i = x_i'\beta$ is called the linear predictor.

The model has many, many uses, including credit scoring, genetics, disease monitoring, etc. It also has many generalizations: ordinal data, complex random-effects models, discrete choice models, etc.

Let's start with simple logistic regression:

$$Y_i \sim \mathrm{bin}\!\left(n_i,\ \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}\right).$$

An odds ratio: let's look at how the odds of success change when we increase $x$ by one unit:

$$\frac{\pi(x+1)/[1 - \pi(x+1)]}{\pi(x)/[1 - \pi(x)]}
= \frac{\left[\frac{e^{\alpha+\beta x+\beta}}{1+e^{\alpha+\beta x+\beta}}\right] \Big/ \left[\frac{1}{1+e^{\alpha+\beta x+\beta}}\right]}
       {\left[\frac{e^{\alpha+\beta x}}{1+e^{\alpha+\beta x}}\right] \Big/ \left[\frac{1}{1+e^{\alpha+\beta x}}\right]}
= \frac{e^{\alpha+\beta x+\beta}}{e^{\alpha+\beta x}} = e^{\beta}.$$

When we increase $x$ by one unit, the odds of the event occurring increase by a factor of $e^\beta$, regardless of the value of $x$.
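For a concrete number, the horseshoe crab fit a few slides ahead has $\hat\beta = 0.4972$, so each one-unit increase in width multiplies the estimated odds of a satellite by

$$e^{0.4972} \approx 1.644,$$

which matches the odds ratio SAS reports for that model.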

So $e^\beta$ is an odds ratio. We also have

$$\frac{\partial \pi(x)}{\partial x} = \beta\, \pi(x)[1 - \pi(x)].$$

This tells us approximately how much $\pi(x)$ changes when $x$ increases by one unit. Note that $\pi(x)$ changes most rapidly when $\pi(x)$ is near 0.5 and more slowly when $\pi(x)$ is near zero or one. Unlike the odds ratio, this change depends on $x$.
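As a quick numeric check: at $\pi(x) = 0.5$ the slope is $\beta(0.5)(0.5) = 0.25\beta$, while at $\pi(x) = 0.9$ (or 0.1) it is only $\beta(0.9)(0.1) = 0.09\beta$, so a unit increase in $x$ moves the probability almost three times as much in the middle of its range as near the extremes.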

5.1.3 Horseshoe Crab Data

Let $Y_i = 1$ if a female crab has one or more satellites, and $Y_i = 0$ if not. Then

$$\pi(x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

is the probability of a female having one or more males besides her nest-mate around, as a function of her carapace width $x$.

Analysis of Maximum Likelihood Estimates

                          Standard    Wald
Parameter  DF  Estimate   Error       Chi-Square   Pr > ChiSq
Intercept   1  -12.3508   2.6287      22.0749      <.0001
width       1    0.4972   0.1017      23.8872      <.0001

Odds Ratio Estimates

           Point       95% Wald
Effect     Estimate    Confidence Limits
width      1.644       1.347    2.007

We estimate the probability of a satellite as

$$\hat\pi(x) = \frac{e^{-12.35 + 0.50x}}{1 + e^{-12.35 + 0.50x}}.$$

The odds of having a satellite increase by a factor of between 1.3 and 2.0 for every cm increase in carapace width. The coefficient table houses the estimates $\hat\beta_j$, the standard errors $se(\hat\beta_j)$, and the Wald statistic $z_j^2 = \{\hat\beta_j / se(\hat\beta_j)\}^2$ with its p-value for testing $H_0: \beta_j = 0$. What do we conclude here?
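As a sanity check on the fitted curve, at the average carapace width of about 26.3 cm the estimated probability of a satellite is

$$\hat\pi(26.3) = \frac{e^{-12.3508 + 0.4972(26.3)}}{1 + e^{-12.3508 + 0.4972(26.3)}} = \frac{e^{0.726}}{1 + e^{0.726}} \approx 0.674.$$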

5.1.2 Looking at data

With a single predictor $x$, we can plot $p_i = y_i/n_i$ versus $x_i$. This approach works well when $n_i \gg 1$. The plot should look like a lazy "s".

Alternatively, the sample logits $\log\{p_i/(1-p_i)\} = \log\{y_i/(n_i - y_i)\}$ versus $x_i$ should be approximately straight. If some categories have all successes or all failures, an ad hoc adjustment is $\log\{(y_i + 0.5)/(n_i - y_i + 0.5)\}$.

When many $n_i$ are small, you can group the data yourself into, say, 10-20 like categories and plot them. For the horseshoe crab data, let's use the categories defined in Chapter 4. A new variable w is created that is the midpoint of the width categories:

data crab1;
  input color spine width satell weight;
  weight = weight/1000;
  color = color - 1;
  y = 0; n = 1;
  if satell > 0 then y = 1;
  w = 22.75;
  if width > 23.25 then w = 23.75;
  if width > 24.25 then w = 24.75;
  if width > 25.25 then w = 25.75;
  if width > 26.25 then w = 26.75;
  if width > 27.25 then w = 27.75;
  if width > 28.25 then w = 28.75;
  if width > 29.25 then w = 29.75;
run;
proc sort data=crab1; by w;
proc means data=crab1 noprint; by w;
  var y n;
  output out=crabs2 sum=sumy sumn;
data crabs3; set crabs2;
  p = sumy/sumn;
  logit = log((sumy + 0.5)/(sumn - sumy + 0.5));
proc gplot;
  plot p*w;
  plot logit*w;
run;

[Figure: Sample $p$ and logit($p$) versus carapace width. Is it a lazy "s" or straight?]

5.1.4 Retrospective sampling & logistic regression

In case-control studies, the number of cases and the number of controls are set ahead of time. It is not possible to estimate the probability of being a case in the general population from these types of data, but just as with a $2 \times 2$ table, we can still estimate an odds ratio $e^\beta$.

Let $Z$ indicate whether a subject is sampled (1 = yes, 0 = no). Let $P_1 = P(Z = 1 \mid Y = 1)$ be the probability that a case is sampled and let $P_0 = P(Z = 1 \mid Y = 0)$ be the probability that a control is sampled. In a simple random sample, sampling does not depend on case status, so $P_1 = P_0$; in a case-control study they differ by design.

Assume the logistic regression model

$$\pi(x) = P(Y = 1 \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}.$$

Assume that the probability of choosing a case is independent of $x$, $P(Z = 1 \mid Y = 1, x) = P(Z = 1 \mid Y = 1)$, and the same for a control, $P(Z = 1 \mid Y = 0, x) = P(Z = 1 \mid Y = 0)$. This is the case, for instance, when a fixed number of cases and controls are sampled retrospectively, regardless of their $x$ values. Bayes' rule gives

$$P(Y = 1 \mid Z = 1, x) = \frac{P_1 \pi(x)}{P_1 \pi(x) + P_0 (1 - \pi(x))} = \frac{e^{\alpha^* + \beta x}}{1 + e^{\alpha^* + \beta x}},$$

where $\alpha^* = \alpha + \log(P_1/P_0)$. The parameter $\beta$ has the same interpretation in terms of odds ratios as with simple random sampling.
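To fill in the middle step: divide the numerator and denominator by $P_0(1 - \pi(x))$ and use $\pi(x)/(1 - \pi(x)) = e^{\alpha + \beta x}$, so

$$\frac{P_1 \pi(x)}{P_1 \pi(x) + P_0 (1 - \pi(x))}
= \frac{(P_1/P_0)\, e^{\alpha + \beta x}}{(P_1/P_0)\, e^{\alpha + \beta x} + 1}
= \frac{e^{\alpha + \log(P_1/P_0) + \beta x}}{1 + e^{\alpha + \log(P_1/P_0) + \beta x}}.$$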

Comments:
- This is very powerful, and another reason why logistic regression is so widely used.
- Other links (e.g., identity, probit) do not have this property.
- Matched case-control studies require more thought; see Section 11.2.5.

5.2 Inferences for Logistic Regression
5.2.1 Inference about Model Parameters and Probabilities

Consider the full model

$$\mathrm{logit}\{\pi(x)\} = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1} = x'\beta.$$

Most types of inferences are functions of $\beta$, say $g(\beta)$. Some examples:
- $g(\beta) = \beta_j$, the $j$th regression coefficient.
- $g(\beta) = e^{\beta_j}$, the $j$th odds ratio.
- $g(\beta) = e^{x'\beta}/(1 + e^{x'\beta})$, the probability $\pi(x)$.

If $\hat\beta$ is the MLE of $\beta$, then $g(\hat\beta)$ is the MLE of $g(\beta)$. This provides an estimate. The delta method is an all-purpose method for obtaining a standard error for $g(\hat\beta)$.

We know $\hat\beta \,\dot\sim\, N_p(\beta, \widehat{\mathrm{cov}}(\hat\beta))$. Let $g(\beta)$ be a function from $\mathbb{R}^p$ to $\mathbb{R}$. Taylor's theorem implies, as long as the MLE $\hat\beta$ is reasonably close to the true value $\beta$, that

$$g(\beta) \approx g(\hat\beta) + [Dg(\hat\beta)]'(\beta - \hat\beta),$$

where $[Dg(\beta)]$ is the vector of first partial derivatives

$$Dg(\beta) = \left( \frac{\partial g(\beta)}{\partial \beta_1}, \frac{\partial g(\beta)}{\partial \beta_2}, \ldots, \frac{\partial g(\beta)}{\partial \beta_p} \right)'.$$

Then

$$(\hat\beta - \beta) \,\dot\sim\, N_p(0, \widehat{\mathrm{cov}}(\hat\beta))$$

implies

$$[Dg(\hat\beta)]'(\hat\beta - \beta) \,\dot\sim\, N\big(0,\ [Dg(\hat\beta)]'\, \widehat{\mathrm{cov}}(\hat\beta)\, [Dg(\hat\beta)]\big),$$

and finally

$$g(\hat\beta) \,\dot\sim\, N\big(g(\beta),\ [Dg(\hat\beta)]'\, \widehat{\mathrm{cov}}(\hat\beta)\, [Dg(\hat\beta)]\big).$$

So

$$\mathrm{se}\{g(\hat\beta)\} = \sqrt{[Dg(\hat\beta)]'\, \widehat{\mathrm{cov}}(\hat\beta)\, [Dg(\hat\beta)]}.$$

This can be used to get confidence intervals for probabilities, etc.
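As an illustration, here is a minimal PROC IML sketch of the delta-method standard error for $g(\beta) = \pi(x)$ in the crab fit, for which $Dg(\beta) = \pi(x)[1 - \pi(x)]\,x$. The diagonal of the covariance matrix comes from the standard errors reported earlier; the off-diagonal entry is a hypothetical value for illustration, since the earlier output does not show it.

proc iml;
beta = {-12.3508, 0.4972};       /* MLEs from the width-only crab fit */
/* cov(beta-hat): diagonal = (2.6287**2, 0.1017**2) from the reported */
/* standard errors; the off-diagonal -0.2666 is hypothetical          */
V = {  6.9101 -0.2666,
      -0.2666  0.0103 };
x = {1, 26.3};                          /* predictor vector, width 26.3 cm */
pi = exp(x`*beta) / (1 + exp(x`*beta)); /* g(beta-hat) = pi-hat(x) */
grad = pi # (1 - pi) # x;               /* Dg(beta-hat) */
se = sqrt(grad` * V * grad);            /* delta-method standard error */
print pi se;
quit;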

proc logistic data=crabs1 descending;
  model y = width;
  output out=crabs2 pred=p lower=l upper=u;
proc sort data=crabs2; by width;
proc gplot data=crabs2;
  title 'Estimated probabilities with pointwise 95% CIs';
  symbol1 i=join color=black;
  symbol2 i=join color=red line=3;
  symbol3 i=join color=black;
  axis1 label=(' ');
  plot (l p u)*width / overlay vaxis=axis1;
run;

[Figure: Fitted probability of a satellite as a function of width, with pointwise 95% CIs.]

5.2.3 & 5.2.4 Goodness of fit

The deviance GOF statistic is defined to be

$$D = 2\sum_{i=1}^{s} \left\{ y_i \log\!\left(\frac{y_i}{n_i \hat\pi_i}\right) + (n_i - y_i)\log\!\left(\frac{n_i - y_i}{n_i - n_i \hat\pi_i}\right) \right\},$$

where the $\hat\pi_i = \dfrac{e^{x_i'\hat\beta}}{1 + e^{x_i'\hat\beta}}$ give the fitted values $n_i \hat\pi_i$.

Pearson's GOF statistic is

$$X^2 = \sum_{i=1}^{s} \frac{(y_i - n_i \hat\pi_i)^2}{n_i \hat\pi_i (1 - \hat\pi_i)}.$$

Both statistics are approximately $\chi^2_{s-p}$ in large samples, assuming that the number of trials $n = \sum_{i=1}^{s} n_i$ increases in such a way that each $n_i$ increases.
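As a quick numeric illustration of one term in $D$: a group with $y_i = 2$ successes in $n_i = 3$ trials and fitted $\hat\pi_i = 0.6$ contributes

$$2\left\{ 2\log\frac{2}{1.8} + 1 \cdot \log\frac{1}{1.2} \right\} = 2(0.2107 - 0.1823) \approx 0.057$$

to the deviance; summing such terms over all $s$ groups gives $D$.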

5.2.5 Group your data

Binomial data are often recorded as individual (Bernoulli) records:

  i   y_i   n_i   x_i
  1    0     1     9
  2    0     1    14
  3    1     1    14
  4    0     1    17
  5    1     1    17
  6    1     1    17
  7    1     1    20

Grouping the data yields an identical model:

  i   y_i   n_i   x_i
  1    0     1     9
  2    1     2    14
  3    2     3    17
  4    1     1    20
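Either form can be fit with the events/trials syntax in PROC LOGISTIC. A minimal sketch, with hypothetical dataset and variable names matching the grouped table above:

data grouped;
  input y n x;           /* events, trials, predictor */
datalines;
0 1 9
1 2 14
2 3 17
1 1 20
;
run;
proc logistic data=grouped;
  model y/n = x;         /* events/trials syntax */
run;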

- $\hat\beta$, $se(\hat\beta_j)$, and $L(\hat\beta)$ don't care whether the data are grouped.
- The quality of residuals and GOF statistics depends on how the data are grouped: $D$ and Pearson's $X^2$ will change!
- In PROC LOGISTIC, type AGGREGATE and SCALE=NONE after the MODEL statement to get $D$ and $X^2$ based on grouped data. (This option does not compute residuals based on the grouped data.)
- You can aggregate over all variables or a subset, e.g. AGGREGATE=(width).

The Hosmer and Lemeshow test statistic orders observations $(x_i, Y_i)$ by fitted probabilities $\hat\pi(x_i)$ from smallest to largest and divides them into (typically) $g = 10$ groups of roughly the same size. A Pearson test statistic is computed from these $g$ groups. The statistic would have a $\chi^2_{g-p}$ distribution if each group had exactly the same predictor $x$ for all observations (but the observations in a group do not have the same predictor $x$, and they do not share a common success probability). In general, the null distribution is approximately $\chi^2_{g-2}$ when the number of distinct patterns of covariate values equals the sample size (see text). This is termed a near-replicate GOF test (Hosmer and Lemeshow, 1980). The LACKFIT option in PROC LOGISTIC gives this statistic.

One can also test logit$\{\pi(x)\} = \beta_0 + \beta_1 x$ versus the more general model logit$\{\pi(x)\} = \beta_0 + \beta_1 x + \beta_2 x^2$ via $H_0: \beta_2 = 0$.
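The output on the next slide comes from a call of this form (a sketch; the dataset name crabs1 follows the earlier slides):

proc logistic data=crabs1 descending;
  model y = width / aggregate scale=none lackfit;
run;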

Raw (Bernoulli) data with aggregate scale=none lackfit:

Deviance and Pearson Goodness-of-Fit Statistics

Criterion   Value     DF   Value/DF   Pr > ChiSq
Deviance    69.7260   64   1.0895     0.2911
Pearson     55.1779   64   0.8622     0.7761

Number of unique profiles: 66

Partition for the Hosmer and Lemeshow Test

                    y = 1                 y = 0
Group   Total   Observed  Expected   Observed  Expected
  1      19        5        5.39       14       13.61
  2      18        8        7.62       10       10.38
  3      17       11        8.62        6        8.38
  4      17        8        9.92        9        7.08
  5      16       11       10.10        5        5.90
  6      18       11       12.30        7        5.70
  7      16       12       12.06        4        3.94
  8      16       12       12.90        4        3.10
  9      16       13       13.69        3        2.31
 10      20       20       18.41        0        1.59

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square   DF   Pr > ChiSq
5.2465        8     0.7309

Comments:
- There are 66 distinct widths $\{x_i\}$ among the $N = 173$ crabs. For $\chi^2_{66-2}$ to hold, we must keep sampling crabs that have only one of the 66 fixed widths! Does that make sense here?
- The Hosmer and Lemeshow test gives a p-value of 0.73 based on $g = 10$ groups. Are the assumptions behind this p-value met?
- None of the GOF tests have assumptions that are met in practice for continuous predictors. Are they still useful?
- The raw statistics do not tell you where lack of fit occurs. Deviance and Pearson residuals do tell you this (later). Also, the partition table provided by the H-L test tells you which groups are ill-fit, should you reject $H_0$: the logistic model holds.
- GOF tests are meant to detect gross deviations from model assumptions. No model ever truly fits data except hypothetically.

5.3 Categorical Predictors
5.3.1 Categorical predictors

Let's say we wish to include a variable X, a categorical variable that takes on values $x \in \{1, 2, \ldots, I\}$. We need to allow each level of $X$ to affect $\pi(x)$ differently. This is accomplished by the use of dummy variables, typically in one of two ways.

Define $z_1, z_2, \ldots, z_{I-1}$ as follows:

$$z_j = \begin{cases} \phantom{-}1 & X = j \\ -1 & X \neq j \end{cases}$$

This is the default in PROC LOGISTIC with a CLASS X statement. Say $I = 3$; then the model is

$$\mathrm{logit}\,\pi(x) = \beta_0 + \beta_1 z_1 + \beta_2 z_2,$$

which gives

logit $\pi(x) = \beta_0 + \beta_1 - \beta_2$ when $X = 1$
logit $\pi(x) = \beta_0 - \beta_1 + \beta_2$ when $X = 2$
logit $\pi(x) = \beta_0 - \beta_1 - \beta_2$ when $X = 3$

An alternative method uses zero/one dummies instead:

$$z_j = \begin{cases} 1 & X = j \\ 0 & X \neq j \end{cases}$$

This is the default in PROC GENMOD with a CLASS X statement; it can also be obtained in PROC LOGISTIC with the PARAM=REF option. This sets class $X = I$ as the baseline. Say $I = 3$; then the model is

$$\mathrm{logit}\,\pi(x) = \beta_0 + \beta_1 z_1 + \beta_2 z_2,$$

which gives

logit $\pi(x) = \beta_0 + \beta_1$ when $X = 1$
logit $\pi(x) = \beta_0 + \beta_2$ when $X = 2$
logit $\pi(x) = \beta_0$ when $X = 3$

I prefer the latter method because it's easier for me to think about. You can choose a different baseline category with REF=FIRST next to the variable name in the CLASS statement.

Table 3.7 (p. 89):

                           Drinks per day
Malformation      0       <1      1-2     3-5     ≥6
Absent         17,066   14,464    788     126      37
Present            48       38      5       1       1

data mal;
  input cons present absent score @@;
  total = present + absent;
datalines;
1 48 17066 0  2 38 14464 0.5  3 5 788 1.5  4 1 126 4.0  5 1 37 7.0
;
run;
proc logistic data=mal;
  class cons / param=ref ref=last;
  model present/total = cons;
run;

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square   DF   Pr > ChiSq
Likelihood Ratio     6.2020       4     0.1846
Score               12.0821       4     0.0168
Wald                 9.2811       4     0.0544

Type 3 Analysis of Effects

Effect   DF   Wald Chi-Square   Pr > ChiSq
cons      4       9.2811          0.0544

Analysis of Maximum Likelihood Estimates

                          Standard    Wald
Parameter  DF  Estimate   Error       Chi-Square   Pr > ChiSq
Intercept   1   -3.6109   1.0134      12.6956      0.0004
cons 1      1   -2.2627   1.0237       4.8858      0.0271
cons 2      1   -2.3309   1.0264       5.1577      0.0231
cons 3      1   -1.4491   1.1083       1.7097      0.1910
cons 4      1   -1.2251   1.4264       0.7377      0.3904

Odds Ratio Estimates

                Point       95% Wald
Effect          Estimate    Confidence Limits
cons 1 vs 5     0.104       0.014    0.774
cons 2 vs 5     0.097       0.013    0.727
cons 3 vs 5     0.235       0.027    2.061
cons 4 vs 5     0.294       0.018    4.810

The model is

$$\mathrm{logit}\,\pi(X) = \beta_0 + \beta_1 I\{X = 1\} + \beta_2 I\{X = 2\} + \beta_3 I\{X = 3\} + \beta_4 I\{X = 4\},$$

where $X$ denotes alcohol consumption, $X = 1, 2, 3, 4, 5$.

- Type 3 analyses test whether all dummy variables associated with a categorical predictor are simultaneously zero, here $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$. If we accept this, then the categorical predictor is not needed in the model.
- PROC LOGISTIC gives estimates and CIs for $e^{\beta_j}$ for $j = 1, 2, 3, 4$. Here, these are interpreted as the odds of developing malformation when $X = 1, 2, 3$, or 4 versus the odds when $X = 5$.
- We are not as interested in the individual Wald tests $H_0: \beta_j = 0$ for a categorical predictor. Why is that? Because they only compare a level $X = 1, 2, 3, 4$ to the baseline $X = 5$, not to each other.
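The reported odds ratios follow directly from the coefficient table; for example,

$$e^{\hat\beta_1} = e^{-2.2627} \approx 0.104,$$

the estimated odds of malformation at $X = 1$ relative to $X = 5$, matching the "cons 1 vs 5" entry above.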

The "Testing Global Null Hypothesis: BETA=0" table gives three tests that no predictor is needed: $H_0: \mathrm{logit}\{\pi(x)\} = \beta_0$ versus $H_1: \mathrm{logit}\{\pi(x)\} = x'\beta$. Anything wrong here?

1) The p-values are 0.18, 0.02, and 0.05 from the LR, score, and Wald tests, respectively.
2) The p-values using the exact conditional distributions of $X^2$ and $G^2$ are 0.03 and 0.13, providing mixed signals.

Because Table 3.7 has a mixture of very small, moderate, and extremely large counts, the null distributions of $X^2$ and $G^2$ may not be close to chi-squared even though $n = 32{,}574$. In any case, these statistics ignore the ordinality of alcohol consumption.

Note that the Wald test for $H_0: \beta = 0$ is the same as the Type 3 test that consumption is not important. Why is that?

Let $Y = 1$ denote malformation for a randomly sampled individual. To get an odds ratio for malformation from increasing from, say, $X = 2$ to $X = 4$, note that

$$\frac{P(Y = 1 \mid X = 2)/P(Y = 0 \mid X = 2)}{P(Y = 1 \mid X = 4)/P(Y = 0 \mid X = 4)} = e^{\beta_2 - \beta_4}.$$

This is estimated with the CONTRAST command.
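A minimal sketch of that CONTRAST statement (the label and coefficient vector are my own; with PARAM=REF coding the four design variables correspond to cons = 1, 2, 3, 4, so $\beta_2 - \beta_4$ uses coefficients 0 1 0 -1, and ESTIMATE=EXP reports the result on the odds-ratio scale):

proc logistic data=mal;
  class cons / param=ref ref=last;
  model present/total = cons;
  /* odds of malformation at cons=2 vs cons=4: exp(beta2 - beta4) */
  contrast '2 vs 4' cons 0 1 0 -1 / estimate=exp;
run;

From the estimates above, this gives roughly $e^{-2.3309 + 1.2251} = e^{-1.1058} \approx 0.33$.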