Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models
許湘伶
Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li)
14.1 Regression Models with Binary Response Variable

- The response variable of interest has only two possible qualitative outcomes.
- Such outcomes can be represented by a binary indicator variable taking the values 0 and 1.
- A response variable of this kind is said to involve binary, or dichotomous, responses.
Meaning of Response Function when Outcome Variable is Binary
1. Consider the simple linear regression model:
   Y_i = β_0 + β_1 X_i + ε_i,  Y_i = 0, 1,  E{ε_i} = 0
   so that E{Y_i} = β_0 + β_1 X_i.
2. Consider Y_i to be a Bernoulli random variable:
   Y_i = 1 with probability P(Y_i = 1) = π_i
   Y_i = 0 with probability P(Y_i = 0) = 1 − π_i
   Then E{Y_i} = 1(π_i) + 0(1 − π_i) = π_i = P(Y_i = 1) = β_0 + β_1 X_i.
Meaning of Response Function when Outcome Variable is Binary (cont.)
The mean response E{Y_i} = β_0 + β_1 X_i is simply the probability that Y_i = 1 when the level of the predictor variable is X_i.
Special Problems when Response Variable is Binary
1. Nonnormal error terms: ε_i = Y_i − (β_0 + β_1 X_i), so
   when Y_i = 1: ε_i = 1 − β_0 − β_1 X_i
   when Y_i = 0: ε_i = −β_0 − β_1 X_i
2. Nonconstant error variance: with E{Y_i} = π_i,
   σ²{Y_i} = σ²{ε_i} = π_i(1 − π_i) = (E{Y_i})(1 − E{Y_i}),
   which is not constant since it depends on X_i through π_i (illustrated in the simulation sketch below).
3. Constraints on the response function: 0 ≤ E{Y} = π ≤ 1.
   Many response functions do not automatically possess this constraint.
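A minimal simulation sketch of point 2 (all parameter values are hypothetical, not from the text): for binary Y the error variance π_i(1 − π_i) changes with X_i, so the constant-variance assumption of ordinary linear regression fails.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 0.1, 0.02          # assumed linear response function
X = rng.uniform(5, 35, size=1000)
pi = beta0 + beta1 * X            # E{Y_i} = pi_i, here kept inside [0, 1]
Y = rng.binomial(1, pi)           # Bernoulli responses

# The variance of Y in slices of X tracks pi*(1 - pi), not a constant:
for lo in (5, 15, 25):
    mask = (X >= lo) & (X < lo + 10)
    print(f"X in [{lo},{lo+10}): var(Y) = {Y[mask].var():.3f}, "
          f"pi(1-pi) avg = {(pi[mask] * (1 - pi[mask])).mean():.3f}")
```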
Sigmoidal (S-shaped) Response Functions for Binary Response
Example: a health researcher studies the effect of a mother's use of alcohol.
- X: an index of the degree of alcohol use during pregnancy
- Y^c: the duration of her pregnancy (a continuous response)
- Simple linear regression model: Y_i^c = β_0^c + β_1^c X_i + ε_i^c, ε_i^c ~ N(0, σ_c²),
  for which the usual simple linear regression analysis applies.
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
Code each pregnancy as
Y_i = 1 if Y_i^c ≤ T weeks;  Y_i = 0 if Y_i^c > T weeks.
Then
P(Y_i = 1) = π_i = P(Y_i^c ≤ T)
           = P(β_0^c + β_1^c X_i + ε_i^c ≤ T)
           = P( ε_i^c/σ_c ≤ (T − β_0^c)/σ_c − (β_1^c/σ_c) X_i )
           = P(Z ≤ β_0* + β_1* X_i) = Φ(β_0* + β_1* X_i),
where β_0* = (T − β_0^c)/σ_c, β_1* = −β_1^c/σ_c, and Φ(z) = P(Z ≤ z) for Z ~ N(0, 1).
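A small numerical check of this derivation, with assumed latent-model values (β_0^c = 40, β_1^c = −1.5, σ_c = 2, T = 36; none of these come from the text): the simulated probability P(Y_i^c ≤ T) matches Φ(β_0* + β_1* X_i).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
b0c, b1c, sigma_c, T = 40.0, -1.5, 2.0, 36.0   # hypothetical latent-model values
x = 3.0

beta0_star = (T - b0c) / sigma_c                # beta_0* = (T - beta_0^c)/sigma_c
beta1_star = -b1c / sigma_c                     # beta_1* = -beta_1^c/sigma_c

Yc = b0c + b1c * x + sigma_c * rng.standard_normal(200_000)
print((Yc <= T).mean())                         # empirical P(Y_i = 1), ~0.599
print(norm.cdf(beta0_star + beta1_star * x))    # Phi(beta_0* + beta_1* x), ~0.599
```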
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
1. Probit mean response function:
   E{Y_i} = π_i = Φ(β_0* + β_1* X_i)
   Φ^{−1}(π_i) = π_i′ = β_0* + β_1* X_i
   Φ^{−1} is sometimes called the probit transformation; π_i′ = β_0* + β_1* X_i is the probit response function, or linear predictor.
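A two-line sketch of this pair, continuing the assumed values from the check above: scipy's norm.cdf gives Φ and norm.ppf gives Φ^{−1}, so the probit transformation of π_i recovers the linear predictor.

```python
from scipy.stats import norm

b0_star, b1_star, x = -2.0, 0.75, 3.0   # assumed values, as in the check above
pi = norm.cdf(b0_star + b1_star * x)    # probit mean response: Phi(linear predictor)
print(norm.ppf(pi))                     # probit transformation recovers 0.25
```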
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
Properties of the probit response function:
- It is bounded between 0 and 1.
- As β_1* increases (with β_0* = 0), the curve becomes more S-shaped, changing more rapidly in the center.
- Changing the sign of β_1* reverses the direction of the curve.
- Symmetry: recoding the responses as Y_i* = 1 − Y_i only changes the signs of the coefficients, since Φ(z) = 1 − Φ(−z):
  P(Y_i* = 1) = P(Y_i = 0) = 1 − Φ(β_0* + β_1* X_i) = Φ(−β_0* − β_1* X_i)
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
Logistic function:
- When standardized to mean 0 and variance 1, it is similar to the standard normal distribution, but with slightly heavier tails.
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
Let ε_L be a logistic random variable with µ = 0 and σ = π/√3:
f_L(ε_L) = exp(ε_L)/[1 + exp(ε_L)]²,  F_L(ε_L) = exp(ε_L)/[1 + exp(ε_L)]
If ε_i^c instead follows a logistic distribution with µ = 0 and standard deviation σ_c, then, since E{ε_i^c/σ_c} = 0 and σ{ε_i^c/σ_c} = 1:
P(Y_i = 1) = π_i = P( ε_i^c/σ_c ≤ β_0* + β_1* X_i )
           = P( (π/√3)(ε_i^c/σ_c) ≤ (π/√3)β_0* + (π/√3)β_1* X_i )
           = P(ε_L ≤ β_0 + β_1 X_i)
           = F_L(β_0 + β_1 X_i) = exp(β_0 + β_1 X_i)/[1 + exp(β_0 + β_1 X_i)],
where β_0 = (π/√3)β_0* and β_1 = (π/√3)β_1*.
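A quick illustrative comparison of the standard logistic CDF with the normal CDF matched to the same mean 0 and standard deviation π/√3: the logistic leaves slightly more probability in the tails.

```python
import numpy as np
from scipy.stats import logistic, norm

z = np.array([0.0, 1.0, 2.0, 4.0])
print(logistic.cdf(z))                        # F_L(z) = exp(z)/(1 + exp(z))
print(norm.cdf(z, scale=np.pi / np.sqrt(3)))  # normal with matching sd pi/sqrt(3)
# At z = 4 the logistic CDF is ~0.982 vs ~0.986 for the normal: heavier tails.
```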
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
2. Logistic mean response function:
   E{Y_i} = π_i = F_L(β_0 + β_1 X_i) = exp(β_0 + β_1 X_i)/[1 + exp(β_0 + β_1 X_i)] = [1 + exp(−β_0 − β_1 X_i)]^{−1}
   F_L^{−1}(π_i) = π_i′ = β_0 + β_1 X_i
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
F_L^{−1}(π_i) = β_0 + β_1 X_i = log_e[ π_i/(1 − π_i) ]:
- log_e[π_i/(1 − π_i)] is called the logit transformation of the probability π_i.
- π_i/(1 − π_i) is called the odds.
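A short sketch of the logit transformation and odds with assumed values β_0 = −2, β_1 = 0.1, X = 30: taking the log of the odds recovers the linear predictor.

```python
import numpy as np

beta0, beta1, x = -2.0, 0.1, 30.0        # assumed values
eta = beta0 + beta1 * x                  # linear predictor = 1.0
pi = np.exp(eta) / (1 + np.exp(eta))     # logistic mean response, ~0.731
odds = pi / (1 - pi)                     # the odds, exp(1) ~= 2.718
print(np.log(odds))                      # logit(pi) recovers 1.0
```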
Sigmoidal (S-shaped) Response Functions for Binary Response (cont.)
3. Complementary log-log response function (based on the extreme value, or Gumbel, probability distribution):
   E{Y_i} = π_i = 1 − exp(−exp(β_0^G + β_1^G X_i))
   π′ = log[−log(1 − π(X_i))] = β_0^G + β_1^G X_i
Simple Logistic Regression

The logistic response function is the most widely used because:
1. The regression parameters have relatively simple and useful interpretations.
2. Statistical software is widely available for the analysis of logistic regression models (see the fitting sketch below).
Estimation of the parameters:
- Maximum likelihood estimation (MLE) is used to estimate the parameters of the logistic response function.
- It utilizes the Bernoulli distribution of a binary random variable.
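As a minimal illustration of point 2, here is a fit with statsmodels on simulated data (the true values β_0 = −2, β_1 = 0.5 are assumptions for the simulation, not data from the text).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
pi = 1 / (1 + np.exp(-(-2.0 + 0.5 * x)))   # assumed true response function
y = rng.binomial(1, pi)

X = sm.add_constant(x)                     # prepend the intercept column
fit = sm.Logit(y, X).fit(disp=0)           # MLE via iterative numerical search
print(fit.params)                          # b_0, b_1 near the true -2.0, 0.5
```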
Simple Logistic Regression (cont.)
Simple logistic regression model:
- Y_i ~ Bernoulli(π_i), with E{Y_i} = π_i
- Y_i = E{Y_i} + ε_i, where the distribution of ε_i depends on the Bernoulli distribution of the response Y_i
- The Y_i are independent Bernoulli random variables with expected values
  E{Y_i} = π_i = exp(β_0 + β_1 X_i)/[1 + exp(β_0 + β_1 X_i)]
Simple Logistic Regression (cont.)
Likelihood function: each Y_i has P(Y_i = 1) = π_i and P(Y_i = 0) = 1 − π_i, so
f_i(Y_i) = π_i^{Y_i} (1 − π_i)^{1−Y_i},  Y_i = 0, 1;  i = 1, ..., n
The joint probability function:
g(Y_1, ..., Y_n) = ∏_{i=1}^n f_i(Y_i) = ∏_{i=1}^n π_i^{Y_i} (1 − π_i)^{1−Y_i}
Simple Logistic Regression (cont.)
Taking logarithms of the joint probability function:
ln g(Y_1, ..., Y_n) = Σ_{i=1}^n [ Y_i ln( π_i/(1 − π_i) ) + ln(1 − π_i) ]
Since 1 − π_i = [1 + exp(β_0 + β_1 X_i)]^{−1} and ln[π_i/(1 − π_i)] = β_0 + β_1 X_i, the log likelihood is
ln L(β_0, β_1) = Σ_{i=1}^n Y_i(β_0 + β_1 X_i) − Σ_{i=1}^n ln[1 + exp(β_0 + β_1 X_i)]
There is no closed-form solution for the values of β_0, β_1 that maximize ln L(β_0, β_1).
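A sketch of the numerical search, reusing the simulated x, y from the statsmodels example above: minimizing −ln L(β_0, β_1) with a general-purpose optimizer reproduces the MLE.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(beta, x, y):
    eta = beta[0] + beta[1] * x
    # ln L = sum Y_i (b0 + b1 X_i) - sum ln[1 + exp(b0 + b1 X_i)]
    return -(np.sum(y * eta) - np.sum(np.log1p(np.exp(eta))))

# x, y as simulated in the statsmodels sketch above
res = minimize(neg_log_lik, x0=np.zeros(2), args=(x, y))
b0, b1 = res.x
print(b0, b1)   # matches the statsmodels MLE to numerical tolerance
```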
Simple Logistic Regression (cont.)
Maximum likelihood estimation:
- Finding the MLEs b_0, b_1 requires computer-intensive numerical search procedures.
- Once b_0 and b_1 are found, the fitted response function and fitted logit are
  π̂_i = exp(b_0 + b_1 X_i)/[1 + exp(b_0 + b_1 X_i)],  π̂′ = ln[ π̂/(1 − π̂) ] = b_0 + b_1 X
Simple Logistic Regression (cont.)
1. Examine the appropriateness of the fitted response function.
2. If the fit is good, make a variety of inferences and predictions.
Simple Logistic Regression (cont.)
Interpretation of b_1:
- The estimated regression coefficient b_1 in the fitted logistic response function does not have the straightforward slope interpretation of a linear regression model.
- One interpretation of b_1: the estimated odds π̂/(1 − π̂) are multiplied by exp(b_1) for any unit increase in X.
- Let π̂′(X_j) = b_0 + b_1 X_j denote the logarithm of the estimated odds when X = X_j, written ln(odds_1).
Simple Logistic Regression (cont.)
Similarly, ln(odds_2) = π̂′(X_j + 1), so
b_1 = π̂′(X_j + 1) − π̂′(X_j) = ln(odds_2) − ln(odds_1) = ln(odds_2/odds_1)
The estimated odds ratio is
ÔR = odds_2/odds_1 = exp(b_1)
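A tiny numeric check with assumed fitted values b_0 = −2, b_1 = 0.5: the ratio of the estimated odds at X_j + 1 to those at X_j equals exp(b_1) regardless of X_j.

```python
import numpy as np

b0, b1 = -2.0, 0.5                   # assumed fitted coefficients
xj = 4.0
odds1 = np.exp(b0 + b1 * xj)         # odds at X_j (exp of the fitted logit)
odds2 = np.exp(b0 + b1 * (xj + 1))   # odds at X_j + 1
print(odds2 / odds1, np.exp(b1))     # both equal exp(b_1) ~= 1.649
```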
Deviance Goodness of Fit Test

- The deviance goodness of fit test is completely analogous to the F test for lack of fit for simple and multiple linear regression models.
- Assume there are c unique combinations of the predictors, X_1, ..., X_c, with n_j the number of repeat binary observations at X_j and Y_ij the ith binary response at X_j.
- The deviance goodness of fit test is a likelihood ratio test of the reduced model against the full model:
  Reduced model: E{Y_ij} = [1 + exp(−X_j′β)]^{−1}
  Full model: E{Y_ij} = π_j, j = 1, ..., c, where the π_j are parameters.
Deviance Goodness of Fit Test (cont.)
Let p_j = Y_j/n_j, j = 1, ..., c, be the full-model estimates, and let π̂_j be the reduced-model estimate of π_j.
The likelihood ratio test statistic, the deviance, is
G² = −2[ ln L(R) − ln L(F) ]
   = −2 Σ_{j=1}^c [ Y_j ln( π̂_j/p_j ) + (n_j − Y_j) ln( (1 − π̂_j)/(1 − p_j) ) ]
   = DEV(X_0, X_1, ..., X_{p−1})
Deviance Goodness of Fit Test (cont.)
Test the alternatives:
H_0: E{Y_ij} = [1 + exp(−X_j′β)]^{−1}
H_a: E{Y_ij} ≠ [1 + exp(−X_j′β)]^{−1}
Decision rule:
If DEV(X_0, X_1, ..., X_{p−1}) ≤ χ²(1 − α; c − p), conclude H_0.
If DEV(X_0, X_1, ..., X_{p−1}) > χ²(1 − α; c − p), conclude H_a.
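A sketch of the test computation on hypothetical grouped counts (the n_j, Y_j, and reduced-model fits π̂_j below are made up for illustration, not taken from the text):

```python
import numpy as np
from scipy.stats import chi2

n = np.array([20, 25, 30, 25, 20])                 # n_j replicates at each X_j
Y = np.array([3, 8, 15, 17, 16])                   # Y_j = number of 1s at X_j
p = Y / n                                          # full-model estimates p_j
pi_hat = np.array([0.16, 0.31, 0.50, 0.69, 0.80])  # reduced-model fits (assumed)

dev = -2 * np.sum(Y * np.log(pi_hat / p)
                  + (n - Y) * np.log((1 - pi_hat) / (1 - p)))
c, n_par = len(n), 2                               # c levels, p = 2 parameters
print(dev, chi2.ppf(0.95, df=c - n_par))           # conclude H_a if dev exceeds
```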
Logistic Regression Diagnostics

Various residuals:
- Ordinary residuals e_i:
  e_i = Y_i − π̂_i, i.e., e_i = 1 − π̂_i if Y_i = 1 and e_i = −π̂_i if Y_i = 0
  They are not normally distributed, and plots of e_i against Ŷ_i or X_j will be uninformative.
- Pearson residuals r_{Pi}:
  r_{Pi} = (Y_i − π̂_i)/√( π̂_i(1 − π̂_i) )
  These are related to the Pearson chi-square goodness of fit statistic.
Logistic Regression Diagnostics (cont.)
- Studentized Pearson residuals r_{SPi}:
  r_{SPi} = r_{Pi}/√(1 − h_{ii}),  H = Ŵ^{1/2} X (X′ŴX)^{−1} X′ Ŵ^{1/2}
  where Ŵ is the n × n diagonal matrix with elements π̂_i(1 − π̂_i).
- Deviance residuals dev_i:
  DEV(X_0, X_1, ..., X_{p−1}) = −2 Σ_{i=1}^n [ Y_i ln(π̂_i) + (1 − Y_i) ln(1 − π̂_i) ]
  dev_i = sign(Y_i − π̂_i) √( −2[ Y_i ln(π̂_i) + (1 − Y_i) ln(1 − π̂_i) ] )
  so that Σ_{i=1}^n dev_i² = DEV(X_0, X_1, ..., X_{p−1}).
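A sketch collecting these diagnostic formulas in one function; X, y, and pi_hat are assumed inputs (a design matrix with intercept column, binary responses, and fitted probabilities from some fit):

```python
import numpy as np

def logistic_residuals(X, y, pi_hat):
    """Ordinary, Pearson, studentized Pearson, and deviance residuals."""
    e = y - pi_hat                                   # ordinary residuals e_i
    r_p = e / np.sqrt(pi_hat * (1 - pi_hat))         # Pearson residuals
    W = np.diag(pi_hat * (1 - pi_hat))               # W-hat, diagonal
    Wh = np.sqrt(W)
    H = Wh @ X @ np.linalg.inv(X.T @ W @ X) @ X.T @ Wh
    r_sp = r_p / np.sqrt(1 - np.diag(H))             # studentized Pearson
    dev = np.sign(e) * np.sqrt(-2 * (y * np.log(pi_hat)
                                     + (1 - y) * np.log(1 - pi_hat)))
    return e, r_p, r_sp, dev  # note: np.sum(dev**2) equals the model deviance
```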