Chapter 14 Logistic and Poisson Regressions

STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang

Logistic Regression Background

In many situations, the response variable has only two possible outcomes:
  Diseased (Y = 1) vs not diseased (Y = 0)
  Employed (Y = 1) vs unemployed (Y = 0)
Response is binary or dichotomous
Can model response using the Bernoulli distribution:
  Pr(Y_i = 1) = π_i, Pr(Y_i = 0) = 1 − π_i, E(Y_i) = π_i
Goal: to link π_i with covariates X_i 14-1

Standard Linear Model?

Consider simple linear model (single predictor X):
  Y_i = β_0 + β_1 X_i + ε_i, so π_i = E[Y_i] = β_0 + β_1 X_i
Assumption violations:
  Non-normal error terms:
    When Y_i = 0 : ε_i = −β_0 − β_1 X_i
    When Y_i = 1 : ε_i = 1 − β_0 − β_1 X_i
  Non-constant variance:
    Var(Y_i) = π_i(1 − π_i) = (β_0 + β_1 X_i)(1 − β_0 − β_1 X_i)
  Response function constraint:
    0 ≤ π_i ≤ 1 14-2

Logistic Response Function

Consider a sigmoidal response function
  π_i = E[Y_i] = exp(β_0 + β_1 X_i) / (1 + exp(β_0 + β_1 X_i))
              = {1 + exp(−β_0 − β_1 X_i)}^(−1)
Example of a nonlinear model 14-3

Properties

Monotonic increasing/decreasing function
Restricts 0 ≤ π_i = E[Y_i] ≤ 1
Can be linearized through transformation:
  log( π_i / (1 − π_i) ) = β_0 + β_1 X_i
Known as the logit transformation
Thus, use logit link to relate π_i with X_i
Other links possible (e.g., probit, complementary log-log, described on p 559-563) 14-4

Estimation

Given the distribution of Y_i, we can formulate the likelihood function
  log(L) = log ∏_{i=1}^n π_i^{Y_i} (1 − π_i)^{1−Y_i}
         = Σ Y_i log(π_i) + Σ (1 − Y_i) log(1 − π_i)
         = Σ Y_i log( π_i / (1 − π_i) ) + Σ log(1 − π_i)
         = Σ Y_i (β_0 + β_1 X_i) − Σ log(1 + exp(β_0 + β_1 X_i))
MLEs do not have closed forms
SAS performs iteratively reweighted least squares (IRWLS)
Given b_0 and b_1, can calculate ˆπ_i 14-5
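The iterative fit can be sketched in plain Python for the single-predictor case. This is an illustrative Newton-Raphson loop (which coincides with IRWLS for the canonical logit link), not the exact SAS implementation, and the data are made up:

```python
import math

def fit_simple_logistic(xs, ys, iters=25):
    """Newton-Raphson (= IRWLS for the logit link) for logit(pi_i) = b0 + b1*x_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # current pi_i
            w = p * (1.0 - p)                            # IRWLS weight
            g0 += y - p                                  # score for b0
            g1 += (y - p) * x                            # score for b1
            h00 += w; h01 += w * x; h11 += w * x * x     # information matrix
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det                # solve 2x2 Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# toy data: six observations with a mixed 0/1 pattern
b0, b1 = fit_simple_logistic([1, 2, 3, 4, 5, 6], [0, 0, 1, 0, 1, 1])
```

At convergence both score equations are (numerically) zero, which is the defining property of the MLE.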

Interpretation

ˆπ_i is the estimated probability of individual i having response Y_i = 1
b_1 is no longer the slope of a linear relationship but rather the slope of the logit relationship:
  logit(ˆπ(x_i + 1)) − logit(ˆπ(x_i)) = b_1
Logit transformation is also the log of the odds
Thus, exp(b_1) becomes the odds ratio
Popular summary in epidemiologic studies 14-6
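A quick numeric check of this interpretation; the coefficients below are made-up illustrative values, not estimates from the chapter's data:

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# hypothetical fitted coefficients (illustrative only)
b0, b1 = 4.8, -0.125

x = 40.0
# difference in fitted logits for a one-unit increase in x equals b1
slope = logit(inv_logit(b0 + b1 * (x + 1))) - logit(inv_logit(b0 + b1 * x))
# exp(b1) is the multiplicative change in the odds per unit of x
odds_ratio = math.exp(b1)
```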

Example (Page 625)

Board of directors is interested in the effect of a dues increase
Randomly surveyed n = 30 members
X_i is the dues increase proposed to member i
X_i varied between $30 and $50
Y_i is whether membership would continue 14-7

data a1;
  infile 'u:\.www\datasets525\ch14pr07.txt';
  input norenew increase;
  renew=1-norenew;
proc print data=a1; var renew increase;
run; quit;

Obs renew increase
  1  1  30
  2  0  30
  3  1  30
  4  1  31
  5  1  32
  6  1  33
  7  0  34
  8  1  35
  9  1  35
...
 22  1  45
 23  0  45
 24  0  45
 25  1  46
 26  0  47
 27  0  48
 28  1  49
 29  0  50
 30  0  50 14-8

/*------ Scatterplot ------*/
goptions colors=(none);
symbol1 v=circle i=sm70;
proc gplot data=a1;
  plot renew*increase;
run; quit; 14-9

/* DESCENDING: SAS will predict the event of 1, otherwise of 0 */
proc logistic data=a1 descending;
  model renew = increase;
  output out=a2 p=pred;
run; quit;

Model Information
Model: binary logit
Optimization Technique: Fisher's scoring

Response Profile
Ordered Value  renew  Total Frequency
1              1      14
2              0      16
Probability modeled is renew=1.

Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        43.455          41.465
SC         44.857          44.267
-2 Log L   41.455          37.465 14-10

Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  3.9906      1   0.0458
Score             3.8265      1   0.0504
Wald              3.5104      1   0.0610

Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  1   4.8075    2.6558          3.2769           0.0703
increase   1   -0.1251   0.0668          3.5104           0.0610

Odds Ratio Estimates
Effect    Point Estimate  95% Wald Confidence Limits
increase  0.882           0.774  1.006

Association of Predicted Probabilities and Observed Responses
Percent Concordant  68.3   Somers' D  0.402
Percent Discordant  28.1   Gamma      0.417
Percent Tied         3.6   Tau-a      0.207
Pairs               224    c          0.701 14-11

/*------ Scatterplot/Fit ------*/
symbol1 v=circle i=none;
symbol2 v=star i=sm30;
proc gplot data=a2;
  plot renew*increase pred*increase / overlay;
run; quit; 14-12

Multiple Logistic Regression

Easy extension using matrix formulation
Similar model-building strategies (SELECTION= option of the MODEL statement):
  Stepwise: SELECTION=STEPWISE
  Forward: SELECTION=FORWARD
  Backward: SELECTION=BACKWARD
  Best subset: SELECTION=SCORE

With SELECTION=SCORE, PROC LOGISTIC uses the branch-and-bound algorithm of Furnival and Wilson (1974) to find a specified number of models (e.g., BEST=1) with the highest score (chi-square) statistic for each possible model size. The score test statistic tests whether all the coefficients are zero. 14-13

Example (Page 573)

Investigating epidemic outbreak of a disease spread by mosquitoes
Randomly sampled individuals within two sectors of a city
Assessed whether each individual had symptoms of the disease and obtained other info:
  X_i1 is age
  X_i2 is an indicator of middle class
  X_i3 is an indicator of lower class
  X_i4 is the sector
  Y_i is whether they had symptoms 14-14

data a3;
  infile 'u:\.www\datasets525\apc3.dat';
  input case age socioecon sector Y jnk;
  X2=0; if socioecon=2 then X2=1;
  X3=0; if socioecon=3 then X3=1;
  drop jnk;
proc logistic data=a3 descending;
  model Y = age X2 X3 sector / selection=score best=1;
run; quit;

Regression Models Selected by Score Criterion
Number of Variables  Score Chi-Square  Variables Included in Model
1                    14.8687           sector
2                    24.6315           age sector
3                    24.9862           age X3 sector
4                    24.9977           age X2 X3 sector

Which model should be selected?
X2 and X3 should be included/excluded together!
Note: selection=score does not support class variables! 14-15

proc logistic data=a3 descending;
  class socioecon;
  model Y = age socioecon sector / selection=stepwise;
run; quit;

Summary of Stepwise Selection
Step  Effect Entered  DF  Number In  Score Chi-Square  Pr > ChiSq
1     sector          1   1          14.8687           0.0001
2     age             1   2          10.2014           0.0014

proc logistic data=a3 descending;
  model Y = age X2 X3 sector;
  test x2=0, x3=0;
run; quit;

Linear Hypotheses Testing Results
Label   Wald Chi-Square  DF  Pr > ChiSq
Test 1  0.4195           2   0.8108 14-16

/*------ Remove X2 & X3 ------*/
proc logistic data=a3 descending;
  model Y = age sector;
run; quit;

Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  24.6901     2   <.0001
Score             24.6315     2   <.0001
Wald              21.6714     2   <.0001

Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  1   -3.3413   0.5921          31.8471          <.0001
age        1   0.0268    0.00865         9.6082           0.0019
sector     1   1.1817    0.3370          12.2981          0.0005

Odds Ratio Estimates
Effect  Point Estimate  95% Wald Confidence Limits
age     1.027           1.010  1.045
sector  3.260           1.684  6.310 14-17

PROC GENMOD fits a generalized linear model by ML. The available distributions: normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, multinomial.

proc genmod data=a3 descending;
  model Y = age X2 X3 sector / link=logit noscale dist=bin;
  contrast 'socioeconomic' X2 1, X3 1;
  contrast 'sector' sector 1;
run; quit;

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept  1   -3.5376   0.6892          -4.8884  -2.1867            26.35       <.0001
age        1   0.0270    0.0087          0.0100   0.0440             9.68        0.0019
X2         1   0.0446    0.4325          -0.8031  0.8923             0.01        0.9179
X3         1   0.2534    0.4056          -0.5414  1.0483             0.39        0.5320
sector     1   1.2436    0.3523          0.5532   1.9341             12.46       0.0004
Scale      0   1.0000    0.0000          1.0000   1.0000

Contrast Results
Contrast       DF  Chi-Square  Pr > ChiSq  Type
socioeconomic  2   0.42        0.8109      LR
sector         1   13.00       0.0003      LR 14-18

Hypothesis Testing

Hypothesis testing and formulation of test statistics is done using a likelihood ratio test, which is similar to the general linear test (i.e., comparing the full and reduced models).
The deviance of a fitted model is −2 times the difference between the log-likelihood of the fitted model and that of a saturated model, one with a parameter (π_i) for each observation Y_i (i.e., all degrees of freedom are used, so the residuals are zero). For ungrouped binary data the saturated log-likelihood is zero, so for logistic regression
  DEV(X_1, X_2, ..., X_{p−1}) = −2 log(L(b_0, b_1, ..., b_{p−1})) 14-19

Can use deviance to compare models
Models must be hierarchical (Full/Reduced):
  DEV(X_q, ..., X_{p−1} | X_0, ..., X_{q−1}) = DEV(X_0, ..., X_{q−1}) − DEV(X_0, ..., X_{p−1})
Partial deviance is approximately χ² with p − q df
Must compute by hand or with PROC GENMOD
For example, testing H_0: Socioeconomic = 0
Model Fit Statistics from PROC LOGISTIC:
  Output of model Y = age X2 X3 sector;  -2 Log L 211.220
  Output of model Y = age sector;        -2 Log L 211.639
DEV(Socioecon | Age, Sector) = 211.639 − 211.220 = 0.419, p-value = 0.8109 14-20
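The partial-deviance computation above can be reproduced directly from the two reported -2 Log L values. For 2 df the chi-square upper-tail probability has the closed form exp(−x/2), so no tables are needed. A minimal sketch:

```python
import math

# -2 Log L values reported by PROC LOGISTIC for the full and reduced models
dev_reduced = 211.639   # model Y = age sector
dev_full = 211.220      # model Y = age X2 X3 sector

partial_dev = dev_reduced - dev_full      # 0.419, on p - q = 2 df
# chi-square survival function is exp(-x/2) when df = 2
p_value = math.exp(-partial_dev / 2.0)    # ~0.811, matching the slide
```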

Can also use Wald's test for H_0: Lβ = 0
Described on page 578
Test is similar to a squared z test:
  S = (L ˆβ)′ (L ˆΣ L′)^{−1} (L ˆβ)
Under H_0, S ~ χ²_r where r is the rank of L
Available in PROC GENMOD and PROC LOGISTIC using the ESTIMATE statement. 14-21

Diagnostics

Goodness of fit: overall measure of fit
Consider both replicated and unreplicated binary data
Replicated (c unique classes of predictors):
  a. Deviance goodness of fit: compare deviance to χ²_{c−p}
  b. Pearson goodness of fit: compare Σ (O_jk − E_jk)² / E_jk to χ²_{c−p}
Unreplicated:
  a. Hosmer-Lemeshow goodness of fit: group observations into classes (usually around 10) according to fitted logit values. Assess overall fit to each class using a Pearson goodness-of-fit approach. (p 589-590) 14-22

Hosmer-Lemeshow Goodness of Fit Test

Divide observations into 10 groups of roughly equal size based on percentiles of the estimated probabilities
Expected # of 1's in a group is Σ ˆπ_i
Expected # of 0's is n_j − Σ ˆπ_i
Compare expected with observed through χ² = Σ (O_ij − E_ij)² / E_ij

Example (Page 625)
/* LACKFIT: Hosmer and Lemeshow goodness-of-fit test */
proc logistic data=a1 descending;
  model renew = increase / lackfit;
  output out=a2 p=pred;
proc print;
run; quit; 14-23

Obs  norenew  increase  renew  _LEVEL_  pred
  1  1        50        0      1        0.19056
  2  1        50        0      1        0.19056
  3  0        49        1      1        0.21060
  4  1        48        0      1        0.23214
  5  1        47        0      1        0.25518
  6  0        46        1      1        0.27967
  7  0        45        1      1        0.30555
  8  1        45        0      1        0.30555
  9  1        45        0      1        0.30555
 10  1        44        0      1        0.33272
 11  1        43        0      1        0.36104
 12  1        42        0      1        0.39037
 13  0        41        1      1        0.42051
 14  0        40        1      1        0.45125
 15  1        40        0      1        0.45125
 16  1        40        0      1        0.45125
 17  1        39        0      1        0.48237
 18  0        38        1      1        0.51363
 19  0        37        1      1        0.54478
 24  1        34        0      1        0.63526
 25  0        33        1      1        0.66372
 26  0        32        1      1        0.69104
 27  0        31        1      1        0.71709
 28  0        30        1      1        0.74177
 29  1        30        0      1        0.74177
 30  0        30        1      1        0.74177 14-24

In this example
  E_11 = (.19056 + .19056 + .21060) = 0.59, so E_10 = 3 − 0.59 = 2.41
  E_21 = (.23214 + .25518 + .27967) = 0.77, so E_20 = 3 − 0.77 = 2.23

Partition for the Hosmer and Lemeshow Test
                renew = 1            renew = 0
Group  Total  Observed  Expected  Observed  Expected
1      3      1         0.59      2         2.41
2      3      1         0.77      2         2.23
3      3      1         0.92      2         2.08
4      3      0         1.08      3         1.92
5      4      2         1.77      2         2.23
6      3      2         1.54      1         1.46
7      4      2         2.39      2         1.61
8      3      2         1.99      1         1.01
9      4      3         2.94      1         1.06

Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square  DF  Pr > ChiSq
2.6526      7   0.9152 14-25
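The statistic can be recomputed from the printed partition table. Because the expected counts below are the rounded values shown in the SAS output, the result matches the reported 2.6526 only approximately:

```python
# Observed and (rounded) expected counts from the Hosmer-Lemeshow partition table
obs1 = [1, 1, 1, 0, 2, 2, 2, 2, 3]
exp1 = [0.59, 0.77, 0.92, 1.08, 1.77, 1.54, 2.39, 1.99, 2.94]
obs0 = [2, 2, 2, 3, 2, 1, 2, 1, 1]
exp0 = [2.41, 2.23, 2.08, 1.92, 2.23, 1.46, 1.61, 1.01, 1.06]

# chi-square = sum over all cells of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs1 + obs0, exp1 + exp0))
```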

Measures of Agreement

Want to measure the association of the pairs (Y_i, ˆπ_i), i = 1, 2, ..., N
Desire a close (positive) association in some sense
Compare predicted probabilities of each pair (Y_i = 1, Y_j = 0):
  Concordant if ˆπ_i > ˆπ_j (# concordant = n_c)
  Discordant if ˆπ_i < ˆπ_j (# discordant = n_d)
  Tied if ˆπ_i = ˆπ_j
Consider all pairs with distinct responses
In this example t = 16 × 14 = 224 14-26

Measures of agreement

Somers' D = (n_c − n_d)/t: ranges from −1 (all pairs disagree) to 1 (all pairs agree)
Goodman-Kruskal Gamma = (n_c − n_d)/(n_c + n_d): ranges from −1.0 (no association) to 1.0 (perfect association). Does not penalize for ties and is generally greater than Somers' D
Kendall's Tau-a = (n_c − n_d)/(N(N − 1)/2): modification of Somers' D that considers all possible paired observations. Usually much smaller than Somers' D
c = (n_c + (t − n_c − n_d)/2)/t: equivalent to the well-known ROC area, ranging from 0.5 (randomly predicting the response) to 1 (perfectly discriminating the response)

Association of Predicted Probabilities and Observed Responses
Percent Concordant  68.3   Somers' D  0.402
Percent Discordant  28.1   Gamma      0.417
Percent Tied         3.6   Tau-a      0.207
Pairs               224    c          0.701 14-27
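The four measures can be computed directly from their definitions; a small sketch with a toy response/probability vector (not the chapter's data):

```python
def agreement_measures(ys, ps):
    """Somers' D, gamma, tau-a, and c from all (y=1, y=0) pairs."""
    ones = [p for y, p in zip(ys, ps) if y == 1]
    zeros = [p for y, p in zip(ys, ps) if y == 0]
    nc = sum(1 for p1 in ones for p0 in zeros if p1 > p0)   # concordant
    nd = sum(1 for p1 in ones for p0 in zeros if p1 < p0)   # discordant
    t = len(ones) * len(zeros)     # pairs with distinct responses
    n = len(ys)
    return {
        "somers_d": (nc - nd) / t,
        "gamma": (nc - nd) / (nc + nd),
        "tau_a": (nc - nd) / (n * (n - 1) / 2),
        "c": (nc + (t - nc - nd) / 2) / t,   # ties count half
    }

# toy example: 2 ones, 2 zeros -> t = 4 pairs, 3 concordant, 1 discordant
m = agreement_measures([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
```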

Diagnostics

Residual analysis: the distribution of residuals under a correct model is unknown, so the common residual plot is uninformative.
Pearson residual is the ordinary residual divided by the estimated standard deviation of Y_i:
  r_i^P = (Y_i − ˆπ_i) / sqrt( ˆπ_i (1 − ˆπ_i) )
The sum of the squared Pearson residuals equals the Pearson X²
Studentized Pearson residual is similar but divided by its standard error so it has unit variance
Deviance residual is the signed square root of the contribution to the model deviance:
  Σ (r_i^D)² = DEV(X_0, ..., X_{p−1})
Sign matches the sign of Y_i − ˆπ_i 14-28
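The two residual definitions, sketched for a single binary observation:

```python
import math

def pearson_resid(y, p):
    """(Y - pi_hat) / sqrt(pi_hat * (1 - pi_hat))"""
    return (y - p) / math.sqrt(p * (1.0 - p))

def deviance_resid(y, p):
    """Signed square root of this observation's contribution to the deviance."""
    d2 = -2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return math.copysign(math.sqrt(d2), y - p)   # sign of Y - pi_hat
```

By construction, the squared deviance residuals sum to the model deviance −2 log L.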

Residual analysis

Can plot residuals by predicted value: a flat lowess smooth in this plot suggests the model is correct
Can generate a probability plot with a simulated envelope
DFFITS, DFBETAS: SAS provides the INFLUENCE and IPLOTS options

Example (Page 625)
ods html;
ods graphics on;
proc logistic data=a1 descending;
  model renew = increase / iplots influence lackfit;
ods graphics off;
ods html close;
run; quit; 14-29

Regression Diagnostics Pearson Residual Deviance Residual Covariates Case (1 unit = 0.24) (1 unit = 0.22) increase Value -8-4 0 2 4 6 8 Value -8-4 0 2 4 6 8 1 30.0000 0.5900 * 0.7729 * 2 30.0000-1.6948 * -1.6455 * 3 30.0000 0.5900 * 0.7729 * 4 31.0000 0.6281 * 0.8155 * 5 32.0000 0.6686 * 0.8597 * 6 33.0000 0.7118 * 0.9054 * 7 34.0000-1.3197 * -1.4203 * 8 35.0000 0.8066 * 1.0012 * 9 35.0000 0.8066 * 1.0012 * 10 35.0000-1.2397 * -1.3645 * 11 36.0000-1.1646 * -1.3092 * 12 37.0000 0.9141 * 1.1021 * 13 38.0000 0.9731 * 1.1543 * 14 39.0000-0.9653 * -1.1476 * 15 40.0000 1.1028 * 1.2615 * 16 40.0000-0.9068 * -1.0955 * 17 40.0000-0.9068 * -1.0955 * 18 41.0000 1.1739 * 1.3163 * 19 42.0000-0.8002 * -0.9949 * 20 43.0000-0.7517 * -0.9465 * 21 44.0000-0.7061 * -0.8995 * 22 45.0000 1.5076 * 1.5399 * 23 45.0000-0.6633 * -0.8540 * 24 45.0000-0.6633 * -0.8540 * 25 46.0000 1.6049 * 1.5963 * 26 47.0000-0.5853 * -0.7676 * 27 48.0000-0.5498 * -0.7268 * 28 49.0000 1.9361 * 1.7651 * 29 50.0000-0.4852 * -0.6502 * 30 50.0000-0.4852 * -0.6502 * 14-30

Hat Matrix Diagonal Intercept Case (1 unit = 6.E-03) DfBeta (1 unit = 0.07) Value 0 2 4 6 8 12 16 Value -8-4 0 2 4 6 8 1 0.1040 * 0.1945 * 2 0.1040 * -0.5587 * 3 0.1040 * 0.1945 * 4 0.0941 * 0.1902 * 5 0.0841 * 0.1831 * 6 0.0743 * 0.1732 * 7 0.0651 * -0.2792 * 8 0.0568 * 0.1441 * 9 0.0568 * 0.1441 * 10 0.0568 * -0.2215 * 11 0.0497 * -0.1689 * 12 0.0442 * 0.1013 * 13 0.0404 * 0.0744 * 14 0.0385 * -0.0405 * 15 0.0385 * 0.00837 * 16 0.0385 * -0.00688 * 17 0.0385 * -0.00688 * 18 0.0404 * -0.0310 * 19 0.0440 * 0.0479 * 20 0.0491 * 0.0696 * 21 0.0555 * 0.0879 * 22 0.0628 * -0.2338 * 23 0.0628 * 0.1029 * 24 0.0628 * 0.1029 * 25 0.0707 * -0.2957 * 26 0.0788 * 0.1240 * 27 0.0869 * 0.1306 * 28 0.0946 * -0.5053 * 29 0.1017 * 0.1370 * 30 0.1017 * 0.1370 * 14-31

Influence Plots 14-32

Residual Plots 14-33

Ordinal Logistic Regression

Have more than two possible outcomes on an ordered scale (e.g., Likert scale), say J outcomes
Often called the proportional odds model:
  log( Pr(Y_i ≤ j) / (1 − Pr(Y_i ≤ j)) ) = β_0j + β_1 X_i1 + ... + β_{p−1} X_{i,p−1}, j = 1, 2, ..., J − 1
Shared values of β_1, ..., β_{p−1}
J − 1 intercepts: β_01, β_02, ..., β_{0,J−1}
Proportional odds (irrelevant to the values of the predictors):
  [ Pr(Y_i ≤ j | X_i) / (1 − Pr(Y_i ≤ j | X_i)) ] / [ Pr(Y_i ≤ k | X_i) / (1 − Pr(Y_i ≤ k | X_i)) ] = e^{β_0j − β_0k} 14-34
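Given fitted values, category probabilities follow by differencing the cumulative logits. The sketch below plugs in the estimates reported for the dose-symptoms example later in the chapter (intercepts 4.1734 and 4.9372, slope −3.5207 on log10(dose)):

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def symptom_probs(dose, a_none=4.1734, a_mild=4.9372, b=-3.5207):
    """Category probabilities from the fitted proportional-odds model;
    cumulative logits over the lower ordered values, with x = log10(dose)."""
    x = math.log10(dose)
    c_none = inv_logit(a_none + b * x)   # P(Y <= None)
    c_mild = inv_logit(a_mild + b * x)   # P(Y <= Mild)
    return {"None": c_none,
            "Mild": c_mild - c_none,
            "Severe": 1.0 - c_mild}

probs = symptom_probs(20)
```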

Dose-Symptoms Example

data dsymp;
  input dose symptoms $ n @@;
  ldose = log10(dose);
cards;
10 None 33  10 Mild 7   10 Severe 10
20 None 17  20 Mild 13  20 Severe 17
30 None 14  30 Mild 3   30 Severe 28
40 None 9   40 Mild 8   40 Severe 32
;
/* ORDER: specifies sorting order for classification variables */
/* ORDER=DATA: order of appearance in the input data set */
proc logistic data=dsymp order=data;
  class symptoms;
  model symptoms = ldose;
  weight n;
run; quit;

Response Profile
Ordered Value  symptoms  Total Frequency  Total Weight
1              None      4                73.000000
2              Mild      4                31.000000
3              Severe    4                87.000000
Probs modeled are cumulated over the lower Ordered Values. 14-35

Score Test for the Proportional Odds Assumption
Chi-Square  DF  Pr > ChiSq
0.1674      1   0.6825

Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  31.3281     1   <.0001
Score             29.6576     1   <.0001
Wald              28.5177     1   <.0001

Analysis of Maximum Likelihood Estimates
Parameter       DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept None  1   4.1734    0.8862          22.1759          <.0001
Intercept Mild  1   4.9372    0.9083          29.5473          <.0001
ldose           1   -3.5207   0.6593          28.5177          <.0001

Odds Ratio Estimates
Effect  Point Estimate  95% Wald Confidence Limits
ldose   0.030           0.008  0.108 14-36

Polytomous Logistic Regression

For polytomous or multicategory responses, consider J response categories
Select one as the baseline or reference category:
  log( P(Y_i = j) / P(Y_i = J) ) = β_0j + β_1j X_i1 + ... + β_{p−1,j} X_{i,p−1},
i.e.,
  log( π_ij / π_iJ ) = X_i′ β_j, j = 1, ..., J − 1
Results in J − 1 parameter vectors, β_1, ..., β_{J−1} 14-37

Dose-Symptoms Example

/* CATMOD: performs categorical data modeling */
/* DIRECT: lists numeric predictors */
proc catmod data=dsymp order=data;
  direct ldose;
  weight n;
  model symptoms = ldose;
run; quit;

Maximum Likelihood Analysis of Variance
Source            DF  Chi-Square  Pr > ChiSq
Intercept         2   24.83       <.0001
ldose             2   26.47       <.0001
Likelihood Ratio  4   6.91        0.1410

Analysis of Maximum Likelihood Estimates
Parameter  Function Number  Estimate  Standard Error  Chi-Square  Pr > ChiSq
Intercept  1                5.3897    1.0998          24.02       <.0001
           2                2.3076    1.3681          2.84        0.0917
ldose      1                -4.1489   0.8065          26.46       <.0001
           2                -2.4128   0.9902          5.94        0.0148 14-38

Overdispersion

Binomial/Bernoulli model: E(Y_i) = π_i, Var(Y_i) = π_i(1 − π_i)
Beta-binomial model: E(Y_i) = π_i, Var(Y_i) = σ² π_i(1 − π_i)
Mean unaffected, but Cov(ˆβ) ≈ σ² (X′WX)^{−1}
Can show E(χ²/(N − p)) ≈ σ² 14-39
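A sketch of the quasi-likelihood adjustment using the Pearson chi-square from the dues-example GENMOD output (slide 14-40). Note this estimates the scale from the data, whereas that example fixes SCALE=2:

```python
import math

# Pearson X^2 = 30.1028 on N - p = 28 df (dues example, slide 14-40)
pearson_x2, df = 30.1028, 28

sigma2 = pearson_x2 / df        # matches the Value/DF column: 1.0751
sigma = math.sqrt(sigma2)

# each model-based SE is multiplied by sigma; 0.0668 is the reported SE
# of the increase coefficient
se_increase = 0.0668
se_adjusted = sigma * se_increase
```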

Example (Page 625)

proc genmod data=a1 descending;
  model renew = increase / dist=binomial noscale link=logit;
run; quit;

Criteria For Assessing Goodness Of Fit
Criterion           DF  Value     Value/DF
Deviance            28  37.4648   1.3380
Scaled Deviance     28  37.4648   1.3380
Pearson Chi-Square  28  30.1028   1.0751
Scaled Pearson X2   28  30.1028   1.0751
Log Likelihood          -18.7324

proc genmod data=a1 descending;
  model renew = increase / dist=binomial scale=2 link=logit;
run; quit;

Criteria For Assessing Goodness Of Fit
Criterion           DF  Value     Value/DF
Deviance            28  37.4648   1.3380
Scaled Deviance     28  9.3662    0.3345
Pearson Chi-Square  28  30.1028   1.0751
Scaled Pearson X2   28  7.5257    0.2688
Log Likelihood          -4.6831 14-40

proc genmod data=a1 descending;
  model renew = increase / dist=binomial noscale link=logit;
run; quit;

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Chi-Square  Pr > ChiSq
Intercept  1   4.8075    2.6558          3.28        0.0703
increase   1   -0.1251   0.0668          3.51        0.0610
Scale      0   1.0000    0.0000

proc genmod data=a1 descending;
  model renew = increase / dist=binomial scale=2 link=logit;
run; quit;

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Chi-Square  Pr > ChiSq
Intercept  1   4.8075    5.3115          0.82        0.3654
increase   1   -0.1251   0.1335          0.88        0.3489
Scale      0   2.0000    0.0000 14-41

INGOTS Example

Taken from Cox and Snell (1989, p. 10-11). The data consist of the number, r, of ingots not ready for rolling, out of n tested, for a number of combinations of heating time (heat) and soaking time (soak).

data ingots;
  input heat soak r n @@;
cards;
 7 1.0 0 10  14 1.0 0 31  27 1.0 1 56  51 1.0 3 13
 7 1.7 0 17  14 1.7 0 43  27 1.7 4 44  51 1.7 0 1
 7 2.2 0 7   14 2.2 2 33  27 2.2 0 21  51 2.2 0 1
 7 2.8 0 12  14 2.8 0 31  27 2.8 1 22  51 4.0 0 1
 7 4.0 0 9   14 4.0 0 19  27 4.0 1 16
;
14-42

proc logistic data=ingots;
  model r/n = heat soak;
run; quit;

Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        108.988         101.346
SC         112.947         113.221
-2 Log L   106.988         95.346

Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  11.6428     2   0.0030
Score             15.1091     2   0.0005
Wald              13.0315     2   0.0015

Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  1   -5.5592   1.1197          24.6503          <.0001
heat       1   0.0820    0.0237          11.9454          0.0005
soak       1   0.0568    0.3312          0.0294           0.8639 14-43

proc genmod data=ingots;
  model r/n = heat soak / dist=binomial link=logit scale=p;
run; quit;

Criteria For Assessing Goodness Of Fit
Criterion           DF  Value    Value/DF
Deviance            16  13.7526  0.8595
Scaled Deviance     16  16.2476  1.0155
Pearson Chi-Square  16  13.5431  0.8464
Scaled Pearson X2   16  16.0000  1.0000
Log Likelihood          -56.3214

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept  1   -5.5592   1.0301          -7.5782  -3.5401            29.12       <.0001
heat       1   0.0820    0.0218          0.0392   0.1248             14.11       0.0002
soak       1   0.0568    0.3047          -0.5405  0.6540             0.03        0.8522
Scale      0   0.9200    0.0000          0.9200   0.9200

NOTE: The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF. 14-44

proc genmod data=ingots;
  model r/n = heat soak / dist=binomial link=logit scale=d;
run; quit;

Criteria For Assessing Goodness Of Fit
Criterion           DF  Value    Value/DF
Deviance            16  13.7526  0.8595
Scaled Deviance     16  16.0000  1.0000
Pearson Chi-Square  16  13.5431  0.8464
Scaled Pearson X2   16  15.7562  0.9848
Log Likelihood          -55.4632

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept  1   -5.5592   1.0381          -7.5938  -3.5246            28.68       <.0001
heat       1   0.0820    0.0220          0.0389   0.1252             13.90       0.0002
soak       1   0.0568    0.3071          -0.5451  0.6586             0.03        0.8533
Scale      0   0.9271    0.0000          0.9271   0.9271

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF. 14-45

Poisson Regression

Like logistic regression, this is a nonlinear regression model for discrete outcomes
Used when the response variable is a count, for example:
  Number of acute asthma attacks in a week
  Number of visits to the mall during December
Useful when a large count is a rare event
Otherwise, a standard linear model with normal errors may be appropriate, applied to a transformation of the Poisson counts 14-46

Regression Model

Standard linear/nonlinear regression form: Y_i = E(Y_i) + ε_i
Generalized linear model: exponential-family distribution for Y and a link between the mean and covariates
Common link g(·) such that µ_i = g^{−1}(X_i′β):
  µ_i = X_i′β (LINK=IDENTITY)
  µ_i = exp(X_i′β) (LINK=LOG)
  µ_i = (X_i′β)² (LINK=POWER(0.5))
All must result in µ_i nonnegative
Model is such that the Y_i's are independent Poisson random variables with expected values µ_i = g^{−1}(X_i′β) 14-47

Estimation

Given the distribution of Y_i, we can formulate the likelihood function
  log(L) = Σ Y_i log{µ(X_i, β)} − Σ µ(X_i, β) + c
MLEs do not have closed forms
Iteratively reweighted least squares or other numerical search procedures used to solve for β 14-48
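As with logistic regression, the score equations can be solved by Newton-Raphson; an illustrative single-predictor sketch with the log link (made-up data, not the SAS algorithm verbatim):

```python
import math

def fit_simple_poisson(xs, ys, iters=25):
    """Newton-Raphson for the log link: log(mu_i) = b0 + b1*x_i.
    Assumes a positive mean count (for the starting value)."""
    b0 = math.log(sum(ys) / len(ys))   # intercept-only starting fit
    b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu                              # score for b0
            g1 += (y - mu) * x                        # score for b1
            h00 += mu; h01 += mu * x; h11 += mu * x * x   # information
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det             # 2x2 Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# toy counts doubling with x, so the MLE is b0 = 0, b1 = log 2 exactly
b0, b1 = fit_simple_poisson([0, 1, 2, 3], [1, 2, 4, 8])
```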

Hypothesis Testing

Hypothesis testing and formulation of test statistics is done in a similar manner to logistic regression.
The deviance of a fitted model is −2 times the difference between the log-likelihood of the fitted model and that of a saturated model with a parameter (µ_i) for each observation Y_i (i.e., all degrees of freedom are used, so the residuals are zero). For Poisson regression
  DEV(X_1, X_2, ..., X_{p−1}) = 2 Σ [ Y_i log(Y_i / ˆµ_i) − (Y_i − ˆµ_i) ] 14-49
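The Poisson deviance formula, with the usual convention 0·log 0 = 0 (so an observation with Y_i = 0 contributes 2ˆµ_i):

```python
import math

def poisson_deviance(ys, mus):
    """DEV = 2 * sum[ y*log(y/mu) - (y - mu) ], with 0*log(0) = 0."""
    dev = 0.0
    for y, mu in zip(ys, mus):
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - (y - mu))
    return dev
```

A saturated fit (ˆµ_i = Y_i) gives zero deviance, as the definition requires.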

Can use deviance to compare models
Models must be hierarchical (Full/Reduced):
  DEV(X_q, ..., X_{p−1} | X_0, ..., X_{q−1}) = DEV(X_0, ..., X_{q−1}) − DEV(X_0, ..., X_{p−1})
Partial deviance is approximately χ² with p − q df
Must compute by hand or with PROC GENMOD

Prediction
Mean response for predictors X
Probability of a specific count (e.g., Y = 0) 14-50
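For prediction, the fitted mean ˆµ = g^{−1}(X′b) also gives the estimated probability of any specific count through the Poisson pmf (the mean below is hypothetical):

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 2.5                      # hypothetical fitted mean
p_zero = poisson_pmf(0, mu)   # estimated probability of a zero count: exp(-mu)
```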

Example (Page 621)

Lumber company interested in the relationship between the number of customers from a census tract and
  X_1: number of housing units
  X_2: average income
  X_3: average housing unit age
  X_4: distance to nearest competitor
  X_5: distance to store
Will use LINK=LOG: log(µ_i) = X_i′β

data ppoi1;
  infile 'U:\.www\datasets525\CH14TA08.DAT';
  input ncust x1 x2 x3 x4 x5;
proc genmod;
  model ncust = x1 x2 x3 x4 x5 / link=log dist=poi type3;
run; quit; 14-51

Criteria For Assessing Goodness Of Fit
Criterion           DF   Value     Value/DF
Deviance            104  114.9854  1.1056
Scaled Deviance     104  114.9854  1.1056
Pearson Chi-Square  104  101.8808  0.9796
Scaled Pearson X2   104  101.8808  0.9796
Log Likelihood           1898.0224

Analysis Of Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept  1   2.9424    0.2072          2.5362   3.3486             201.57      <.0001
x1         1   0.0006    0.0001          0.0003   0.0009             18.17       <.0001
x2         1   -0.0000   0.0000          -0.0000  -0.0000            30.63       <.0001
x3         1   -0.0037   0.0018          -0.0072  -0.0002            4.37        0.0365
x4         1   0.1684    0.0258          0.1179   0.2189             42.70       <.0001
x5         1   -0.1288   0.0162          -0.1605  -0.0970            63.17       <.0001
Scale      0   1.0000    0.0000          1.0000   1.0000
NOTE: The scale parameter was held fixed.

LR Statistics For Type 3 Analysis
Source  DF  Chi-Square  Pr > ChiSq
x1      1   18.20       <.0001
x2      1   31.79       <.0001
x3      1   4.38        0.0364
x4      1   41.66       <.0001
x5      1   67.50       <.0001 14-52

Diagnostics

Goodness of fit
  Deviance goodness of fit: compare overall fit with χ²_{n−p}
Residual analysis
  The distribution of residuals under a correct model is unknown, so the common residual plot is uninformative.
  Deviance residual is the signed square root of the contribution to the model deviance:
    Σ (r_i^D)² = DEV(X_0, ..., X_{p−1})
  Sign depends on whether ˆµ_i > Y_i
  Can plot r_i^D by observation (index plot)
  Can generate a probability plot with a simulated envelope 14-53

Chapter Review

Logistic Regression: background, model, inference, diagnostics and remedies
Ordinal Logistic Regression
Polytomous Logistic Regression
Poisson Regression: background, model, inference 14-54