STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang
Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y = 1) vs Not diseased (Y = 0) Employed (Y = 1) vs Unemployed (Y = 0) Response is binary or dichotomous Can model response using Bernoulli distribution Pr(Y i = 1) = π i Pr(Y i = 0) = 1 π i E(Y i ) = π i Goal to link π i with covariates X i 14-1
Standard Linear Model? Consider simple linear model (single predictor X) Y i = β 0 +β 1 X i +ε i π i = E[Y i ] = β 0 +β 1 X i Assumption violations: Non-normal error terms When Y i = 0 : ε i = β 0 β 1 X i When Y i = 1 : ε i = 1 β 0 β 1 X i Non-constant variance Var(Y i ) = π i (1 π i ) = (β 0 +β 1 X i )(1 β 0 β 1 X i ) Response function constraints 0 π i 1 14-2
Logistic Response Function Consider a sigmoidal response function π i = E[Y i ] = exp(β 0 +β 1 X i ) 1+exp(β 0 +β 1 X i ) Example of a nonlinear model = {1+exp( β 0 β 1 X i )} 1 14-3
Properties Monotonic increasing/decreasing function Restricts 0 π i = E[Y i ] 1 Can be linearized through transformation log ( πi 1 π i ) = β 0 +β 1 X i Known as the logit transformation Thus, use logit link to relate π i with X i Other links possible (i.e., probit, complementary log-log described on p 559-563) 14-4
Estimation Given the distribution of Y i, we can formulate the likelihood function log(l) = log n π Y i i=1 i (1 π i) 1 Y i = Y i log(π i )+ (1 Y i )log(1 π i ) π i = Y i log( )+ log(1 π i ) 1 π i = Y i (β 0 +β 1 X i ) log(1+exp(β 0 +β 1 X i )) MLEs do not have closed forms SAS performs iterative reweighted least squares (IRWLS) Given b 0 and b 1, can calculate ˆπ i 14-5
Interpretation ˆπ i is the estimated probability of individual i having response Y i = 1 b 1 is no longer the slope of a linear relationship but rather the slope of the logit relationship logit(ˆπ(x i +1)) logit(ˆπ(x i )) = b 1 Logit transformation is also the log of the odds Thus, exp(b 1 ) becomes the odds ratio Popular summary in epidemiologic studies 14-6
Example (Page 625) Board of directors are interested in the effect of a due increase Randomly surveyed n = 30 members X i is the due increase posited to member X i varied between $30 and $50 Y i is whether membership would continue 14-7
data a1; infile u:\.www\datasets525\ch14pr07.txt ; input norenew increase; renew=1-norenew; proc print data=a1; var renew increase; run; quit; Obs renew increase 1 1 30 2 0 30 3 1 30 4 1 31 5 1 32 6 1 33 7 0 34 8 1 35 9 1 35... 22 1 45 23 0 45 24 0 45 25 1 46 26 0 47 27 0 48 28 1 49 29 0 50 30 0 50 14-8
/*------ Scatterplot ------*/ goptions colors=( none ); symbol1 v=circle i=sm70; proc gplot data=a1; plot renew*increase; run; quit; 14-9
/* DESCENDING: SAS will predict the event of 1, otherwise of 0 */ proc logistic data=a1 descending; model renew = increase; output out=a2 p=pred; run; quit; Model Information Model Optimization Technique binary logit Fisher s scoring Response Profile Ordered Total Value renew Frequency 1 1 14 2 0 16 Probability modeled is renew=1. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 43.455 41.465 SC 44.857 44.267-2 Log L 41.455 37.465 14-10
Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 3.9906 1 0.0458 Score 3.8265 1 0.0504 Wald 3.5104 1 0.0610 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 4.8075 2.6558 3.2769 0.0703 increase 1-0.1251 0.0668 3.5104 0.0610 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits increase 0.882 0.774 1.006 Association of Predicted Probabilities and Observed Responses Percent Concordant 68.3 Somers D 0.402 Percent Discordant 28.1 Gamma 0.417 Percent Tied 3.6 Tau-a 0.207 Pairs 224 c 0.701 14-11
/*------ Scatterplot/Fit ------*/ symbol1 v=circle i=none; symbol2 v=star i=sm30; proc gplot data=a2; plot renew*increase pred*increase /overlay; run; quit; 14-12
Multiple Logistic Regression Easily extension using matrix formulations Similar model building strategies (SAS option of MODEL) Stepwise: SELECTION=STEPWISE Forward: SELECTION=FORWARD Backward: SELECTION=BACKWARD Best Subset: SELECTION=SCORE SELECTION=SCORE PROC LOGISTIC uses the branch and bound algorithm of Furnival and Wilson (1974) to find a specified number of models (e.g., BEST=1) with the highest score (chi-square) test statistic for all possible model sizes. The score test statistic is to test whether all the coefficients are zero. 14-13
Example (Page 573) Investigating epidemic outbreak of a disease spread by mosquitoes Randomly sampled individuals within two sectors of city Assessed whether individual had symptoms of disease and obtained other info X i1 is age X i2 indicator of middle class X i3 indicator of lower class X i4 is the sector Y i is whether they had symptoms 14-14
data a3; infile u:\.www\datasets525\apc3.dat ; input case age socioecon sector Y jnk; X2=0; if socioecon=2 then X2=1; X3=0; if socioecon=3 then X3=1; drop jnk; proc logistic data=a3 descending; model Y = age X2 X3 sector/selection=score best=1; run; quit; Regression Models Selected by Score Criterion Number of Score Variables Chi-Square Variables Included in Model 1 14.8687 sector 2 24.6315 age sector 3 24.9862 age X3 sector 4 24.9977 age X2 X3 sector Which model should be selected? X2 and X3 should be included/excluded together! Note: selection=score does not support class variables! 14-15
proc logistic data=a3 descending; class socioecon; model Y = age socioecon sector/selection=stepwise; run; quit; Summary of Stepwise Selection Effect Number Score Wald Step Entered Removed DF In Chi-Square Chi-Square Pr > ChiSq 1 sector 1 1 14.8687 0.0001 2 age 1 2 10.2014 0.0014 proc logistic data=a3 descending; model Y = age X2 X3 sector; test x2=0,x3=0; run; quit; Linear Hypotheses Testing Results Wald Label Chi-Square DF Pr > ChiSq Test 1 0.4195 2 0.8108 14-16
/*------ Remove X2 & X3 ------*/ proc logistic data=a3 descending; model Y = age sector; run; quit; Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 24.6901 2 <.0001 Score 24.6315 2 <.0001 Wald 21.6714 2 <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1-3.3413 0.5921 31.8471 <.0001 age 1 0.0268 0.00865 9.6082 0.0019 sector 1 1.1817 0.3370 12.2981 0.0005 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits age 1.027 1.010 1.045 sector 3.260 1.684 6.310 14-17
PROC GENMOD fits a generalized linear model by ML. The available distributions: normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, multinomial. proc genmod data=a3 descending; model Y = age X2 X3 sector / link=logit noscale dist=bin; contrast socioeconomic X2 1, X3 1; contrast sector sector 1; run; quit; Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1-3.5376 0.6892-4.8884-2.1867 26.35 <.0001 age 1 0.0270 0.0087 0.0100 0.0440 9.68 0.0019 X2 1 0.0446 0.4325-0.8031 0.8923 0.01 0.9179 X3 1 0.2534 0.4056-0.5414 1.0483 0.39 0.5320 sector 1 1.2436 0.3523 0.5532 1.9341 12.46 0.0004 Scale 0 1.0000 0.0000 1.0000 1.0000 Contrast Results Contrast DF Chi-Square Pr > ChiSq Type socioeconomic 2 0.42 0.8109 LR sector 1 13.00 0.0003 LR 14-18
Hypothesis Testing Hypothesis testing and formulation of test statistics is done using a likelihood ratio test, which is similar to the general linear test (i.e., comparing the full and reduced models). The deviance of a fitted model is the difference between the log-likelihood of the fitted model and a model that has a parameter (π i ) for each observation Y i (i.e., use all of the degrees of freedom so that the residuals will be zero). For logistic regression DEV(X 1,X 2,...,X p 1 ) = 2log(L(b 0,b 1,...,b p 1 )) 14-19
Can use deviance to compare models Models must be hierarchical (Full/Reduced) DEV(X q,...,x p 1 X 0,...,X q 1 ) = DEV(X 0,...,X q 1 ) DEV(X 0,...,X p 1 ) Partial deviance approx χ 2 with p q df Must compute by hand or with PROC GENMOD For example: Testing H O : Socioeconomic=0 Model Fit Statistics from PROC LOGISTIC Output of model Y = age X2 X3 sector; -2 Log L 211.220 Output of model Y = age sector; -2 Log L 211.639 DEV(Socioecon Age,Sector) = 211.639-211.220 = 0.419 p-value=0.8109 14-20
Can also use Wald s Test for H 0 : l β = 0 Described on page 578 Test is similar to squared z test S = (L ˆβ) (L ΣL) 1 (L ˆβ) Under H 0, S χ 2 r where r is rank of L Available in PROC GENMOD and PROC LOGISTIC, using ESTIMATE statement. 14-21
Diagnostics Goodness of fit : Overall measure of fit Consider both replicated and unreplicated binary data Replicated (c unique classes of predictors) a. Deviance goodness of fit : Compare deviance to χ 2 c p b. Pearson goodness of fit : Compare (O jk E jk ) 2 /E jk to χ 2 c p Unreplicated a. Hosemer-Lemeshow goodness of fit: Group observations into classes (usually around 10) according to fitted logit values. Assess overall fit to each class using a Pearson goodness of fit approach. (p 589-590) 14-22
Hosemer-Lemeshow Goodness of Fit Test Divide obs up into 10 groups of equal size based on percentiles of the estimated probabilities Expected # of 1 s is ˆπ i Expected # of 0 s is n i ˆπ i Compare expected with observed through χ 2 = (O ij E ij ) 2 E ij Example (Page 625) /* LACKFIT: Hosmer and Lemeshow goodness-of-fit test */ proc logistic data=a1 descending; model renew = increase / lackfit; output out=a2 p=pred; proc print; run; quit; 14-23
Obs norenew increase renew _LEVEL_ pred 1 1 50 0 1 0.19056 2 1 50 0 1 0.19056 3 0 49 1 1 0.21060 4 1 48 0 1 0.23214 5 1 47 0 1 0.25518 6 0 46 1 1 0.27967 7 0 45 1 1 0.30555 8 1 45 0 1 0.30555 9 1 45 0 1 0.30555 10 1 44 0 1 0.33272 11 1 43 0 1 0.36104 12 1 42 0 1 0.39037 13 0 41 1 1 0.42051 14 0 40 1 1 0.45125 15 1 40 0 1 0.45125 16 1 40 0 1 0.45125 17 1 39 0 1 0.48237 18 0 38 1 1 0.51363 19 0 37 1 1 0.54478 24 1 34 0 1 0.63526 25 0 33 1 1 0.66372 26 0 32 1 1 0.69104 27 0 31 1 1 0.71709 28 0 30 1 1 0.74177 29 1 30 0 1 0.74177 30 0 30 1 1 0.74177 14-24
In this example E 11 = (.19056+.19056+.21060) = 0.59 E 10 = 2.41 E 21 = (.23214+.25518+.27967) = 0.77 E 20 = 2.23 Partition for the Hosmer and Lemeshow Test renew = 1 renew = 0 Group Total Observed Expected Observed Expected 1 3 1 0.59 2 2.41 2 3 1 0.77 2 2.23 3 3 1 0.92 2 2.08 4 3 0 1.08 3 1.92 5 4 2 1.77 2 2.23 6 3 2 1.54 1 1.46 7 4 2 2.39 2 1.61 8 3 2 1.99 1 1.01 9 4 3 2.94 1 1.06 Hosmer and Lemeshow Goodness-of-Fit Test Chi-Square DF Pr > ChiSq 2.6526 7 0.9152 14-25
Measures of Agreement Want to measure the association of the pair (Y i,ˆπ i ), i = 1,2,,N Desired to have close (positive) association in some sense. Compare predicted probabilities of pair (Y i = 1,Y j = 0) Concordant if ˆπ i > ˆπ j (#concordant = n c ) Discordant if ˆπ i < ˆπ j (#discordant = n d ) Tie if ˆπ i = ˆπ j Consider all pairs of distinct responses In this example t = 16 14 = 224 14-26
Measures of agreement Somers D = n c n d t : ranges from 1 (all pairs disagree) to 1 (all pairs agree); Goodman-Kruskal Gamma = n c n d n c +n d : range from -1.0 (no association) to 1.0 (perfect association). Not penalizing for ties and generally greater than Somer s D; Kendall s Tau-a = n c n d : modification of Somer s D N(N 1)/2 by considering all possible paired observations. Usually much smaller than Somer s D; c = n c+(t n c n d )/2 t : equivalent to the well known measure ROC, ranging from 0.5 (randomly predicting the response) to 1 (perfectly discriminating the response). Association of Predicted Probabilities and Observed Responses Percent Concordant 68.3 Somers D 0.402 Percent Discordant 28.1 Gamma 0.417 Percent Tied 3.6 Tau-a 0.207 Pairs 224 c 0.701 14-27
Diagnostics Residual analysis Distribution of residuals under correct model is unknown and thus common residual plot uninformative. Pearson residual is the ordinary residual divided by the standard error of Y i. Sum of squared residuals equals Pearson X 2 r P i = Y i ˆπ i ˆπ i (1 ˆπ i ) Studentized Pearson residual is similar but divided by its standard error so they have unit variance Deviance residual is the signed square root of the contribution to the model deviance (r D i ) 2 = DEV(X 0,...,X p 1 ) Sign depends on ˆπ i > 0.5 14-28
Residual analysis Can plot residual by predicted value : A flat lowess smooth to this plot suggests the model is correct Can generate probability plot with envelope DFFITS, DFBETAS SAS contains influence and iplots options Example (Page 625)) ods html; ods graphics on; proc logistic data=a1 descending; model renew = increase / iplots influence lackfit; ods graphics off; ods html close; run; quit; 14-29
Regression Diagnostics Pearson Residual Deviance Residual Covariates Case (1 unit = 0.24) (1 unit = 0.22) increase Value -8-4 0 2 4 6 8 Value -8-4 0 2 4 6 8 1 30.0000 0.5900 * 0.7729 * 2 30.0000-1.6948 * -1.6455 * 3 30.0000 0.5900 * 0.7729 * 4 31.0000 0.6281 * 0.8155 * 5 32.0000 0.6686 * 0.8597 * 6 33.0000 0.7118 * 0.9054 * 7 34.0000-1.3197 * -1.4203 * 8 35.0000 0.8066 * 1.0012 * 9 35.0000 0.8066 * 1.0012 * 10 35.0000-1.2397 * -1.3645 * 11 36.0000-1.1646 * -1.3092 * 12 37.0000 0.9141 * 1.1021 * 13 38.0000 0.9731 * 1.1543 * 14 39.0000-0.9653 * -1.1476 * 15 40.0000 1.1028 * 1.2615 * 16 40.0000-0.9068 * -1.0955 * 17 40.0000-0.9068 * -1.0955 * 18 41.0000 1.1739 * 1.3163 * 19 42.0000-0.8002 * -0.9949 * 20 43.0000-0.7517 * -0.9465 * 21 44.0000-0.7061 * -0.8995 * 22 45.0000 1.5076 * 1.5399 * 23 45.0000-0.6633 * -0.8540 * 24 45.0000-0.6633 * -0.8540 * 25 46.0000 1.6049 * 1.5963 * 26 47.0000-0.5853 * -0.7676 * 27 48.0000-0.5498 * -0.7268 * 28 49.0000 1.9361 * 1.7651 * 29 50.0000-0.4852 * -0.6502 * 30 50.0000-0.4852 * -0.6502 * 14-30
Hat Matrix Diagonal Intercept Case (1 unit = 6.E-03) DfBeta (1 unit = 0.07) Value 0 2 4 6 8 12 16 Value -8-4 0 2 4 6 8 1 0.1040 * 0.1945 * 2 0.1040 * -0.5587 * 3 0.1040 * 0.1945 * 4 0.0941 * 0.1902 * 5 0.0841 * 0.1831 * 6 0.0743 * 0.1732 * 7 0.0651 * -0.2792 * 8 0.0568 * 0.1441 * 9 0.0568 * 0.1441 * 10 0.0568 * -0.2215 * 11 0.0497 * -0.1689 * 12 0.0442 * 0.1013 * 13 0.0404 * 0.0744 * 14 0.0385 * -0.0405 * 15 0.0385 * 0.00837 * 16 0.0385 * -0.00688 * 17 0.0385 * -0.00688 * 18 0.0404 * -0.0310 * 19 0.0440 * 0.0479 * 20 0.0491 * 0.0696 * 21 0.0555 * 0.0879 * 22 0.0628 * -0.2338 * 23 0.0628 * 0.1029 * 24 0.0628 * 0.1029 * 25 0.0707 * -0.2957 * 26 0.0788 * 0.1240 * 27 0.0869 * 0.1306 * 28 0.0946 * -0.5053 * 29 0.1017 * 0.1370 * 30 0.1017 * 0.1370 * 14-31
Influence Plots 14-32
Residual Plots 14-33
Ordinal Logistic Regression Have more than two possible outcomes on ordered scale (i.e., Likert scale), say J outcomes Often called proportional odds model ( ) Pr(Yi j) log = β 0j +β 1 X i1 + +β p 1 X i,p 1, 1 Pr(Y i j) j = 1,2,,J 1 Shared values of β 1,,β p 1 J 1 intercepts: β 01,β 02,,β 0,J 1 Proportional odds (irrelevant to the values of predictors) Pr(Y i j X i ) 1 Pr(Y i j X i ) / Pr(Y i k X i ) 1 Pr(Y i k X i ) = eβ 0j β 0k 14-34
Dose-Symptoms Example data dsymp; input dose symptoms $ n @@; ldose = log10(dose); cards; 10 None 33 10 Mild 7 10 Severe 10 20 None 17 20 Mild 13 20 Severe 17 30 None 14 30 Mild 3 30 Severe 28 40 None 9 40 Mild 8 40 Severe 32 ; /* ORDER: specifies sorting order for classification variables */ /* ORDER=DATA: order of appearance in the input data set */ proc logistic data=dsymp order=data; class symptoms; model symptoms = ldose; weight n; run; quit; Response Profile Ordered Total Total Value symptoms Frequency Weight 1 None 4 73.000000 2 Mild 4 31.000000 3 Severe 4 87.000000 Probs modeled are cumulated over the lower Ordered Values. 14-35
Score Test for the Proportional Odds Assumption Chi-Square DF Pr > ChiSq 0.1674 1 0.6825 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 31.3281 1 <.0001 Score 29.6576 1 <.0001 Wald 28.5177 1 <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept None 1 4.1734 0.8862 22.1759 <.0001 Intercept Mild 1 4.9372 0.9083 29.5473 <.0001 ldose 1-3.5207 0.6593 28.5177 <.0001 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits ldose 0.030 0.008 0.108 14-36
Polytomous Logistic Regression For polytomous or multicategory, consider there are J response categories Select one as baseline or reference category, ( ) P(Yi = j) log = β 0j +β 1j X i1 + +β p 1,j X i,p 1, P(Y i = J) log ( ) πij π ij = X i β j, j = 1,,J 1; Results in J 1 parameter vectors, β 1,,β J 1 14-37
Dose-Symptoms Example /* CATMOD: performs categorical data modeling */ /* DIRECT: lists numeric predictors */ proc catmod data=dsymp order=data; direct ldose; weight n; model symptoms = ldose; run; quit; Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq Intercept 2 24.83 <.0001 ldose 2 26.47 <.0001 Likelihood Ratio 4 6.91 0.1410 Analysis of Maximum Likelihood Estimates Function Standard Chi- Parameter Number Estimate Error Square Pr > ChiSq Intercept 1 5.3897 1.0998 24.02 <.0001 2 2.3076 1.3681 2.84 0.0917 ldose 1-4.1489 0.8065 26.46 <.0001 2-2.4128 0.9902 5.94 0.0148 14-38
Overdispersion Binomial/Bernoulli Model E(Y i ) = π i Var(Y i ) = π i (1 π i ) Beta-Binomial Model E(Y i ) = π i Var(Y i ) = σ 2 π i (1 π i ) Mean unaffected but Cov(ˆβ) σ 2 (X WX) 1 Can show E(χ 2 /(N p)) σ 2 14-39
Example (Page 625) proc genmod data=a1 descending; model renew = increase / dist=binomial noscale link=logit; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 28 37.4648 1.3380 Scaled Deviance 28 37.4648 1.3380 Pearson Chi-Square 28 30.1028 1.0751 Scaled Pearson X2 28 30.1028 1.0751 Log Likelihood -18.7324 --------------------------------------------------------------- proc genmod data=a1 descending; model renew = increase / dist=binomial scale=2 link=logit; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 28 37.4648 1.3380 Scaled Deviance 28 9.3662 0.3345 Pearson Chi-Square 28 30.1028 1.0751 Scaled Pearson X2 28 7.5257 0.2688 Log Likelihood -4.6831 14-40
proc genmod data=a1 descending; model renew = increase / dist=binomial noscale link=logit; run; quit; Analysis Of Parameter Estimates --------------------------------------------------------------- proc genmod data=a1 descending; model renew = increase / dist=binomial scale=2 link=logit; run; quit; Analysis Of Parameter Estimates Standard Chi- Parameter DF Estimate Error Square Pr > ChiSq Intercept 1 4.8075 2.6558 3.28 0.0703 increase 1-0.1251 0.0668 3.51 0.0610 Scale 0 1.0000 0.0000 Standard Chi- Parameter DF Estimate Error Square Pr > ChiSq Intercept 1 4.8075 5.3115 0.82 0.3654 increase 1-0.1251 0.1335 0.88 0.3489 Scale 0 2.0000 0.0000 14-41
INGOTS Example Taken from Cox and Snell (1989, p.10-11) The data consist of the number, r, of ingots not ready for rolling, out of n tested, for a number of combinations of heating time (heat) and soaking time (soak). data ingots; input heat soak r n @@; cards; 7 1.0 0 10 14 1.0 0 31 27 1.0 1 56 51 1.0 3 13 7 1.7 0 17 14 1.7 0 43 27 1.7 4 44 51 1.7 0 1 7 2.2 0 7 14 2.2 2 33 27 2.2 0 21 51 2.2 0 1 7 2.8 0 12 14 2.8 0 31 27 2.8 1 22 51 4.0 0 1 7 4.0 0 9 14 4.0 0 19 27 4.0 1 16 ; 14-42
proc logistic data = ingots; model r/n = heat soak; run; quit; Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 108.988 101.346 SC 112.947 113.221-2 Log L 106.988 95.346 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 11.6428 2 0.0030 Score 15.1091 2 0.0005 Wald 13.0315 2 0.0015 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1-5.5592 1.1197 24.6503 <.0001 heat 1 0.0820 0.0237 11.9454 0.0005 soak 1 0.0568 0.3312 0.0294 0.8639 14-43
proc genmod data = ingots; model r/n = heat soak / dist=binomial link=logit scale=p; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 16 13.7526 0.8595 Scaled Deviance 16 16.2476 1.0155 Pearson Chi-Square 16 13.5431 0.8464 Scaled Pearson X2 16 16.0000 1.0000 Log Likelihood -56.3214 Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1-5.5592 1.0301-7.5782-3.5401 29.12 <.0001 heat 1 0.0820 0.0218 0.0392 0.1248 14.11 0.0002 soak 1 0.0568 0.3047-0.5405 0.6540 0.03 0.8522 Scale 0 0.9200 0.0000 0.9200 0.9200 NOTE: The scale parameter was estimated by the square root of Pearson s Chi-Square/ 14-44
proc genmod data = ingots; model r/n = heat soak / dist=binomial link=logit scale=d; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 16 13.7526 0.8595 Scaled Deviance 16 16.0000 1.0000 Pearson Chi-Square 16 13.5431 0.8464 Scaled Pearson X2 16 15.7562 0.9848 Log Likelihood -55.4632 Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1-5.5592 1.0381-7.5938-3.5246 28.68 <.0001 heat 1 0.0820 0.0220 0.0389 0.1252 13.90 0.0002 soak 1 0.0568 0.3071-0.5451 0.6586 0.03 0.8533 Scale 0 0.9271 0.0000 0.9271 0.9271 NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF. 14-45
Poisson Regression Like logistic regression, this is a nonlinear regression model for discrete outcomes Used when response variable is a count, for example, Number of acute asthma attacks in a week Number of visits to mall during December Useful when large count is a rare event Otherwise, standard linear model with normal errors appropriate Poisson counts use transformation 14-46
Regression Model Standard linear/nonlinear regression form: Y i = E(Y i )+ε i Generalized linear model: Exponential family distribution for Y and link between the mean and covariates Common link g( ) such that µ i = g 1 (X i β) µ i = X i β (LINK=IDENTITY) µ i = exp(x iβ) (LINK=LOG) µ i = (X i β)2 (LINK=POWER(0.5)) All must result in µ i nonnegative Model is such that the Y i s are independent Poisson random variables with expected values µ i = g 1 (X i β) 14-47
Estimation Given the distribution of Y i, we can formulate the likelihood function log(l) = Y i log{µ(x i,β)} µ(x i,β)+c MLEs do not have closed forms Iterative reweighted least squares or other numerical search procedures used to solve for β 14-48
Hypothesis Testing Hypothesis testing and formulation of test statistics is done in similar manner to logistic regression. The deviance of a fitted model is the difference between the log-likelihood of the fitted model and a model that has a parameter (µ i ) for each observation Y i (i.e., use all of the degrees of freedom so that the residuals will be zero). For Poisson regression Yi DEV(X 1,X 2,...,X p 1 ) = 2[ log ( ) Yi ˆµ i (Y i ˆµ i ) ] 14-49
Can use deviance to compare models Models must be hierarchical (Full/Reduced) DEV(X q,...,x p 1 X 0,...,X q 1 ) = DEV(X 0,...,X q 1 ) DEV(X 0,...,X p 1 ) Partial deviance approx χ 2 with p q df Must compute by hand or with PROC GENMOD Prediction Mean response for predictors X Probability of specific count (i.e., Y = 0) 14-50
Example (Page 621) Lumber company interested in the relationship between the number of customers from a census tract and X 1 : Number of housing units X 2 : Average income X 3 : Average housing unit age X 4 : Distance to nearest competitor X 5 : Distance to store Will use LINK=LOG log(µ i ) = X β data ppoi1; infile U:\.www\datasets525\CH14TA08.DAT ; input ncust x1 x2 x3 x4 x5; proc genmod; model ncust= x1 x2 x3 x4 x5 / link=log dist=poi type3; run; quit; 14-51
Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 104 114.9854 1.1056 Scaled Deviance 104 114.9854 1.1056 Pearson Chi-Square 104 101.8808 0.9796 Scaled Pearson X2 104 101.8808 0.9796 Log Likelihood 1898.0224 Analysis Of Parameter Estimates NOTE: The scale parameter was held fixed. LR Statistics For Type 3 Analysis Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square P>ChiSq Intercept 1 2.9424 0.2072 2.5362 3.3486 201.57 <.0001 x1 1 0.0006 0.0001 0.0003 0.0009 18.17 <.0001 x2 1-0.0000 0.0000-0.0000-0.0000 30.63 <.0001 x3 1-0.0037 0.0018-0.0072-0.0002 4.37 0.0365 x4 1 0.1684 0.0258 0.1179 0.2189 42.70 <.0001 x5 1-0.1288 0.0162-0.1605-0.0970 63.17 <.0001 Scale 0 1.0000 0.0000 1.0000 1.0000 Chi- Source DF Square Pr > ChiSq x1 1 18.20 <.0001 x2 1 31.79 <.0001 x3 1 4.38 0.0364 x4 1 41.66 <.0001 x5 1 67.50 <.0001 14-52
Diagnostics Goodness of fit Deviance goodness of fit : Compare overall fit with χ 2 n p Residual analysis Distribution of residuals under correct model is unknown and thus common residual plot uninformative. Deviance residual is the signed square root of the contribution to the model deviance (r D i ) 2 = DEV(X 0,...,X p 1 ) Sign depends on whether ˆµ i > Y i Can plot r D i by observation (index plot) Can generate probability plot with envelope 14-53
Chapter Review Logistic Regression Background Model Inference Diagnostics and remedies Ordinal Logistic Regression Polytomous Logistic Regression Poisson Regression Background Model Inference 14-54