Chapter 14 Logistic and Poisson Regressions

Size: px

Start display at page:

Download "Chapter 14 Logistic and Poisson Regressions"

Abner Thornton
5 years ago
Views:

1 STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang

2 Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y = 1) vs Not diseased (Y = 0) Employed (Y = 1) vs Unemployed (Y = 0) Response is binary or dichotomous Can model response using Bernoulli distribution Pr(Y i = 1) = π i Pr(Y i = 0) = 1 π i E(Y i ) = π i Goal to link π i with covariates X i 14-1

3 Standard Linear Model? Consider simple linear model (single predictor X) Y i = β 0 +β 1 X i +ε i π i = E[Y i ] = β 0 +β 1 X i Assumption violations: Non-normal error terms When Y i = 0 : ε i = β 0 β 1 X i When Y i = 1 : ε i = 1 β 0 β 1 X i Non-constant variance Var(Y i ) = π i (1 π i ) = (β 0 +β 1 X i )(1 β 0 β 1 X i ) Response function constraints 0 π i

4 Logistic Response Function Consider a sigmoidal response function π i = E[Y i ] = exp(β 0 +β 1 X i ) 1+exp(β 0 +β 1 X i ) Example of a nonlinear model = {1+exp( β 0 β 1 X i )}

5 Properties Monotonic increasing/decreasing function Restricts 0 π i = E[Y i ] 1 Can be linearized through transformation log ( πi 1 π i ) = β 0 +β 1 X i Known as the logit transformation Thus, use logit link to relate π i with X i Other links possible (i.e., probit, complementary log-log described on p ) 14-4

6 Estimation Given the distribution of Y i, we can formulate the likelihood function log(l) = log n π Y i i=1 i (1 π i) 1 Y i = Y i log(π i )+ (1 Y i )log(1 π i ) π i = Y i log( )+ log(1 π i ) 1 π i = Y i (β 0 +β 1 X i ) log(1+exp(β 0 +β 1 X i )) MLEs do not have closed forms SAS performs iterative reweighted least squares (IRWLS) Given b 0 and b 1, can calculate ˆπ i 14-5

7 Interpretation ˆπ i is the estimated probability of individual i having response Y i = 1 b 1 is no longer the slope of a linear relationship but rather the slope of the logit relationship logit(ˆπ(x i +1)) logit(ˆπ(x i )) = b 1 Logit transformation is also the log of the odds Thus, exp(b 1 ) becomes the odds ratio Popular summary in epidemiologic studies 14-6

8 Example (Page 625) Board of directors are interested in the effect of a due increase Randomly surveyed n = 30 members X i is the due increase posited to member X i varied between $30 and $50 Y i is whether membership would continue 14-7

9 data a1; infile u:\.www\datasets525\ch14pr07.txt ; input norenew increase; renew=1-norenew; proc print data=a1; var renew increase; run; quit; Obs renew increase

10 /* Scatterplot */ goptions colors=( none ); symbol1 v=circle i=sm70; proc gplot data=a1; plot renew*increase; run; quit; 14-9

11 /* DESCENDING: SAS will predict the event of 1, otherwise of 0 */ proc logistic data=a1 descending; model renew = increase; output out=a2 p=pred; run; quit; Model Information Model Optimization Technique binary logit Fisher s scoring Response Profile Ordered Total Value renew Frequency Probability modeled is renew=1. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC SC Log L

12 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio Score Wald Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept increase Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits increase Association of Predicted Probabilities and Observed Responses Percent Concordant 68.3 Somers D Percent Discordant 28.1 Gamma Percent Tied 3.6 Tau-a Pairs 224 c

13 /* Scatterplot/Fit */ symbol1 v=circle i=none; symbol2 v=star i=sm30; proc gplot data=a2; plot renew*increase pred*increase /overlay; run; quit; 14-12

14 Multiple Logistic Regression Easily extension using matrix formulations Similar model building strategies (SAS option of MODEL) Stepwise: SELECTION=STEPWISE Forward: SELECTION=FORWARD Backward: SELECTION=BACKWARD Best Subset: SELECTION=SCORE SELECTION=SCORE PROC LOGISTIC uses the branch and bound algorithm of Furnival and Wilson (1974) to find a specified number of models (e.g., BEST=1) with the highest score (chi-square) test statistic for all possible model sizes. The score test statistic is to test whether all the coefficients are zero

15 Example (Page 573) Investigating epidemic outbreak of a disease spread by mosquitoes Randomly sampled individuals within two sectors of city Assessed whether individual had symptoms of disease and obtained other info X i1 is age X i2 indicator of middle class X i3 indicator of lower class X i4 is the sector Y i is whether they had symptoms 14-14

16 data a3; infile u:\.www\datasets525\apc3.dat ; input case age socioecon sector Y jnk; X2=0; if socioecon=2 then X2=1; X3=0; if socioecon=3 then X3=1; drop jnk; proc logistic data=a3 descending; model Y = age X2 X3 sector/selection=score best=1; run; quit; Regression Models Selected by Score Criterion Number of Score Variables Chi-Square Variables Included in Model sector age sector age X3 sector age X2 X3 sector Which model should be selected? X2 and X3 should be included/excluded together! Note: selection=score does not support class variables! 14-15

17 proc logistic data=a3 descending; class socioecon; model Y = age socioecon sector/selection=stepwise; run; quit; Summary of Stepwise Selection Effect Number Score Wald Step Entered Removed DF In Chi-Square Chi-Square Pr > ChiSq 1 sector age proc logistic data=a3 descending; model Y = age X2 X3 sector; test x2=0,x3=0; run; quit; Linear Hypotheses Testing Results Wald Label Chi-Square DF Pr > ChiSq Test

18 /* Remove X2 & X */ proc logistic data=a3 descending; model Y = age sector; run; quit; Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio <.0001 Score <.0001 Wald <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept <.0001 age sector Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits age sector

19 PROC GENMOD fits a generalized linear model by ML. The available distributions: normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, multinomial. proc genmod data=a3 descending; model Y = age X2 X3 sector / link=logit noscale dist=bin; contrast socioeconomic X2 1, X3 1; contrast sector sector 1; run; quit; Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 age X X sector Scale Contrast Results Contrast DF Chi-Square Pr > ChiSq Type socioeconomic LR sector LR 14-18

20 Hypothesis Testing Hypothesis testing and formulation of test statistics is done using a likelihood ratio test, which is similar to the general linear test (i.e., comparing the full and reduced models). The deviance of a fitted model is the difference between the log-likelihood of the fitted model and a model that has a parameter (π i ) for each observation Y i (i.e., use all of the degrees of freedom so that the residuals will be zero). For logistic regression DEV(X 1,X 2,...,X p 1 ) = 2log(L(b 0,b 1,...,b p 1 )) 14-19

21 Can use deviance to compare models Models must be hierarchical (Full/Reduced) DEV(X q,...,x p 1 X 0,...,X q 1 ) = DEV(X 0,...,X q 1 ) DEV(X 0,...,X p 1 ) Partial deviance approx χ 2 with p q df Must compute by hand or with PROC GENMOD For example: Testing H O : Socioeconomic=0 Model Fit Statistics from PROC LOGISTIC Output of model Y = age X2 X3 sector; -2 Log L Output of model Y = age sector; -2 Log L DEV(Socioecon Age,Sector) = = p-value=

22 Can also use Wald s Test for H 0 : l β = 0 Described on page 578 Test is similar to squared z test S = (L ˆβ) (L ΣL) 1 (L ˆβ) Under H 0, S χ 2 r where r is rank of L Available in PROC GENMOD and PROC LOGISTIC, using ESTIMATE statement

23 Diagnostics Goodness of fit : Overall measure of fit Consider both replicated and unreplicated binary data Replicated (c unique classes of predictors) a. Deviance goodness of fit : Compare deviance to χ 2 c p b. Pearson goodness of fit : Compare (O jk E jk ) 2 /E jk to χ 2 c p Unreplicated a. Hosemer-Lemeshow goodness of fit: Group observations into classes (usually around 10) according to fitted logit values. Assess overall fit to each class using a Pearson goodness of fit approach. (p ) 14-22

24 Hosemer-Lemeshow Goodness of Fit Test Divide obs up into 10 groups of equal size based on percentiles of the estimated probabilities Expected # of 1 s is ˆπ i Expected # of 0 s is n i ˆπ i Compare expected with observed through χ 2 = (O ij E ij ) 2 E ij Example (Page 625) /* LACKFIT: Hosmer and Lemeshow goodness-of-fit test */ proc logistic data=a1 descending; model renew = increase / lackfit; output out=a2 p=pred; proc print; run; quit; 14-23

25 Obs norenew increase renew _LEVEL_ pred

26 In this example E 11 = ( ) = 0.59 E 10 = 2.41 E 21 = ( ) = 0.77 E 20 = 2.23 Partition for the Hosmer and Lemeshow Test renew = 1 renew = 0 Group Total Observed Expected Observed Expected Hosmer and Lemeshow Goodness-of-Fit Test Chi-Square DF Pr > ChiSq

27 Measures of Agreement Want to measure the association of the pair (Y i,ˆπ i ), i = 1,2,,N Desired to have close (positive) association in some sense. Compare predicted probabilities of pair (Y i = 1,Y j = 0) Concordant if ˆπ i > ˆπ j (#concordant = n c ) Discordant if ˆπ i < ˆπ j (#discordant = n d ) Tie if ˆπ i = ˆπ j Consider all pairs of distinct responses In this example t = =

28 Measures of agreement Somers D = n c n d t : ranges from 1 (all pairs disagree) to 1 (all pairs agree); Goodman-Kruskal Gamma = n c n d n c +n d : range from -1.0 (no association) to 1.0 (perfect association). Not penalizing for ties and generally greater than Somer s D; Kendall s Tau-a = n c n d : modification of Somer s D N(N 1)/2 by considering all possible paired observations. Usually much smaller than Somer s D; c = n c+(t n c n d )/2 t : equivalent to the well known measure ROC, ranging from 0.5 (randomly predicting the response) to 1 (perfectly discriminating the response). Association of Predicted Probabilities and Observed Responses Percent Concordant 68.3 Somers D Percent Discordant 28.1 Gamma Percent Tied 3.6 Tau-a Pairs 224 c

29 Diagnostics Residual analysis Distribution of residuals under correct model is unknown and thus common residual plot uninformative. Pearson residual is the ordinary residual divided by the standard error of Y i. Sum of squared residuals equals Pearson X 2 r P i = Y i ˆπ i ˆπ i (1 ˆπ i ) Studentized Pearson residual is similar but divided by its standard error so they have unit variance Deviance residual is the signed square root of the contribution to the model deviance (r D i ) 2 = DEV(X 0,...,X p 1 ) Sign depends on ˆπ i >

30 Residual analysis Can plot residual by predicted value : A flat lowess smooth to this plot suggests the model is correct Can generate probability plot with envelope DFFITS, DFBETAS SAS contains influence and iplots options Example (Page 625)) ods html; ods graphics on; proc logistic data=a1 descending; model renew = increase / iplots influence lackfit; ods graphics off; ods html close; run; quit; 14-29

31 Regression Diagnostics Pearson Residual Deviance Residual Covariates Case (1 unit = 0.24) (1 unit = 0.22) increase Value Value * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 14-30

32 Hat Matrix Diagonal Intercept Case (1 unit = 6.E-03) DfBeta (1 unit = 0.07) Value Value * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 14-31

33 Influence Plots 14-32

34 Residual Plots 14-33

35 Ordinal Logistic Regression Have more than two possible outcomes on ordered scale (i.e., Likert scale), say J outcomes Often called proportional odds model ( ) Pr(Yi j) log = β 0j +β 1 X i1 + +β p 1 X i,p 1, 1 Pr(Y i j) j = 1,2,,J 1 Shared values of β 1,,β p 1 J 1 intercepts: β 01,β 02,,β 0,J 1 Proportional odds (irrelevant to the values of predictors) Pr(Y i j X i ) 1 Pr(Y i j X i ) / Pr(Y i k X i ) 1 Pr(Y i k X i ) = eβ 0j β 0k 14-34

36 Dose-Symptoms Example data dsymp; input dose symptoms $ ldose = log10(dose); cards; 10 None Mild 7 10 Severe None Mild Severe None Mild 3 30 Severe None 9 40 Mild 8 40 Severe 32 ; /* ORDER: specifies sorting order for classification variables */ /* ORDER=DATA: order of appearance in the input data set */ proc logistic data=dsymp order=data; class symptoms; model symptoms = ldose; weight n; run; quit; Response Profile Ordered Total Total Value symptoms Frequency Weight 1 None Mild Severe Probs modeled are cumulated over the lower Ordered Values

37 Score Test for the Proportional Odds Assumption Chi-Square DF Pr > ChiSq Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio <.0001 Score <.0001 Wald <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept None <.0001 Intercept Mild <.0001 ldose <.0001 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits ldose

38 Polytomous Logistic Regression For polytomous or multicategory, consider there are J response categories Select one as baseline or reference category, ( ) P(Yi = j) log = β 0j +β 1j X i1 + +β p 1,j X i,p 1, P(Y i = J) log ( ) πij π ij = X i β j, j = 1,,J 1; Results in J 1 parameter vectors, β 1,,β J

39 Dose-Symptoms Example /* CATMOD: performs categorical data modeling */ /* DIRECT: lists numeric predictors */ proc catmod data=dsymp order=data; direct ldose; weight n; model symptoms = ldose; run; quit; Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq Intercept <.0001 ldose <.0001 Likelihood Ratio Analysis of Maximum Likelihood Estimates Function Standard Chi- Parameter Number Estimate Error Square Pr > ChiSq Intercept < ldose <

40 Overdispersion Binomial/Bernoulli Model E(Y i ) = π i Var(Y i ) = π i (1 π i ) Beta-Binomial Model E(Y i ) = π i Var(Y i ) = σ 2 π i (1 π i ) Mean unaffected but Cov(ˆβ) σ 2 (X WX) 1 Can show E(χ 2 /(N p)) σ

41 Example (Page 625) proc genmod data=a1 descending; model renew = increase / dist=binomial noscale link=logit; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood proc genmod data=a1 descending; model renew = increase / dist=binomial scale=2 link=logit; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood

42 proc genmod data=a1 descending; model renew = increase / dist=binomial noscale link=logit; run; quit; Analysis Of Parameter Estimates proc genmod data=a1 descending; model renew = increase / dist=binomial scale=2 link=logit; run; quit; Analysis Of Parameter Estimates Standard Chi- Parameter DF Estimate Error Square Pr > ChiSq Intercept increase Scale Standard Chi- Parameter DF Estimate Error Square Pr > ChiSq Intercept increase Scale

43 INGOTS Example Taken from Cox and Snell (1989, p.10-11) The data consist of the number, r, of ingots not ready for rolling, out of n tested, for a number of combinations of heating time (heat) and soaking time (soak). data ingots; input heat soak r cards; ; 14-42

44 proc logistic data = ingots; model r/n = heat soak; run; quit; Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC SC Log L Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio Score Wald Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept <.0001 heat soak

45 proc genmod data = ingots; model r/n = heat soak / dist=binomial link=logit scale=p; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 heat soak Scale NOTE: The scale parameter was estimated by the square root of Pearson s Chi-Square/ 14-44

46 proc genmod data = ingots; model r/n = heat soak / dist=binomial link=logit scale=d; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 heat soak Scale NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF

47 Poisson Regression Like logistic regression, this is a nonlinear regression model for discrete outcomes Used when response variable is a count, for example, Number of acute asthma attacks in a week Number of visits to mall during December Useful when large count is a rare event Otherwise, standard linear model with normal errors appropriate Poisson counts use transformation 14-46

48 Regression Model Standard linear/nonlinear regression form: Y i = E(Y i )+ε i Generalized linear model: Exponential family distribution for Y and link between the mean and covariates Common link g( ) such that µ i = g 1 (X i β) µ i = X i β (LINK=IDENTITY) µ i = exp(x iβ) (LINK=LOG) µ i = (X i β)2 (LINK=POWER(0.5)) All must result in µ i nonnegative Model is such that the Y i s are independent Poisson random variables with expected values µ i = g 1 (X i β) 14-47

49 Estimation Given the distribution of Y i, we can formulate the likelihood function log(l) = Y i log{µ(x i,β)} µ(x i,β)+c MLEs do not have closed forms Iterative reweighted least squares or other numerical search procedures used to solve for β 14-48

50 Hypothesis Testing Hypothesis testing and formulation of test statistics is done in similar manner to logistic regression. The deviance of a fitted model is the difference between the log-likelihood of the fitted model and a model that has a parameter (µ i ) for each observation Y i (i.e., use all of the degrees of freedom so that the residuals will be zero). For Poisson regression Yi DEV(X 1,X 2,...,X p 1 ) = 2[ log ( ) Yi ˆµ i (Y i ˆµ i ) ] 14-49

51 Can use deviance to compare models Models must be hierarchical (Full/Reduced) DEV(X q,...,x p 1 X 0,...,X q 1 ) = DEV(X 0,...,X q 1 ) DEV(X 0,...,X p 1 ) Partial deviance approx χ 2 with p q df Must compute by hand or with PROC GENMOD Prediction Mean response for predictors X Probability of specific count (i.e., Y = 0) 14-50

52 Example (Page 621) Lumber company interested in the relationship between the number of customers from a census tract and X 1 : Number of housing units X 2 : Average income X 3 : Average housing unit age X 4 : Distance to nearest competitor X 5 : Distance to store Will use LINK=LOG log(µ i ) = X β data ppoi1; infile U:\.www\datasets525\CH14TA08.DAT ; input ncust x1 x2 x3 x4 x5; proc genmod; model ncust= x1 x2 x3 x4 x5 / link=log dist=poi type3; run; quit; 14-51

53 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Analysis Of Parameter Estimates NOTE: The scale parameter was held fixed. LR Statistics For Type 3 Analysis Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square P>ChiSq Intercept <.0001 x <.0001 x <.0001 x x <.0001 x <.0001 Scale Chi- Source DF Square Pr > ChiSq x <.0001 x <.0001 x x <.0001 x <

54 Diagnostics Goodness of fit Deviance goodness of fit : Compare overall fit with χ 2 n p Residual analysis Distribution of residuals under correct model is unknown and thus common residual plot uninformative. Deviance residual is the signed square root of the contribution to the model deviance (r D i ) 2 = DEV(X 0,...,X p 1 ) Sign depends on whether ˆµ i > Y i Can plot r D i by observation (index plot) Can generate probability plot with envelope 14-53

55 Chapter Review Logistic Regression Background Model Inference Diagnostics and remedies Ordinal Logistic Regression Polytomous Logistic Regression Poisson Regression Background Model Inference 14-54

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the