STAT 7030: Categorical Data Analysis


1 STAT 7030: Categorical Data Analysis
5. Logistic Regression
Peng Zeng
Department of Mathematics and Statistics, Auburn University
Fall 2012
Peng Zeng (Auburn University), STAT 7030 Lecture Notes, Fall 2012

2 Logistic Regression for Binary Response Data
Logistic regression models binary response variables, for which the outcome for each subject is a success or a failure.
Banks predict the probability that a person pays a bill on time using predictors such as the size of the bill, annual income, occupation, mortgage and debt obligations, the percentage of bills paid on time in the past, and other aspects of an applicant's credit history.
A company that relies on catalog sales may decide whether to send a catalog to a potential customer by modeling the probability of a sale as a function of indices of past buying behavior.

3 Outline
1 Logistic regression
Horseshoe crabs (one continuous variable)
Maternal alcohol consumption (one categorical predictor)
Checking model adequacy
Horseshoe crab, revisited
Neuralgia

Horseshoe crabs (one continuous variable)

4 Example: Horseshoe Crabs
The data come from a study of nesting horseshoe crabs. Each female horseshoe crab had a male crab resident in her nest; satellites are other male crabs residing nearby. Define the binary response
$$Y = \begin{cases} 1, & \text{if a female crab has at least one satellite},\\ 0, & \text{if a female crab has no satellite}. \end{cases}$$
We want to model this binary response on one continuous predictor, the width $x$ of the female crab:
$$\pi(x) = P(Y = 1 \mid X = x) = 1 - P(Y = 0 \mid X = x).$$

5 Nonlinear Relationship Between $\pi(x)$ and $x$
Binary data usually arise from a nonlinear relationship between $\pi(x)$ and $x$: a fixed change in $x$ often has less impact when $\pi(x)$ is near 0 or 1 than when $\pi(x)$ is near 0.5.
[Figure: $\pi(x)$ against $x$ for $\beta > 0$ and for $\beta < 0$.]
For example, consider the choice between buying a new or a used automobile. Let $\pi(x)$ denote the probability of selecting new when annual family income is $x$ dollars. An increase of 50,000 dollars in annual income would have less effect when $x = 1{,}000{,}000$ (for which $\pi(x)$ is near 1) than when $x = 50{,}000$.

6 Logistic Regression
Logistic regression assumes the nonlinear relationship
$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)},$$
which implies that $\pi(x)$ increases (or decreases) as an S-shaped function of $x$. This nonlinear model can be written as a linear model for a transformed response, called the logit model, in which the transformed response is the log odds:
$$\operatorname{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = \eta = \alpha + \beta x.$$
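A minimal Python sketch of the two equivalent forms of the model: the S-shaped probability curve and its inverse, the logit. The values of $\alpha$ and $\beta$ below are arbitrary illustrations, not estimates from these notes.

```python
import math


def pi_of_x(x: float, alpha: float, beta: float) -> float:
    """Logistic curve: P(Y = 1 | X = x) = exp(eta) / (1 + exp(eta))."""
    eta = alpha + beta * x                 # linear predictor on the logit scale
    return 1.0 / (1.0 + math.exp(-eta))    # algebraically equal to exp(eta)/(1+exp(eta))


def logit(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


alpha, beta = 1.0, 0.5  # illustrative values only
for x in (-4.0, 0.0, 4.0):
    p = pi_of_x(x, alpha, beta)
    # logit(p) recovers the linear predictor alpha + beta * x
    print(f"x = {x:5.1f}  pi = {p:.4f}  logit(pi) = {logit(p):.4f}")
```

For very negative $\eta$, `math.exp(-eta)` overflows; a more defensive version would branch on the sign of $\eta$.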

7 Influence of $\beta$
The sign of $\beta$ determines whether $\pi(x)$ is increasing ($\beta > 0$) or decreasing ($\beta < 0$) as $x$ increases. The rate of change increases as $|\beta|$ increases.
[Figure: panels of $\pi(x)$ against $x$ for $\alpha = 1.0$ and several values of $\beta$.]
As $\beta \to 0$, the curve flattens to a horizontal straight line. When $\beta = 0$, $Y$ is independent of $X$.

8 Interpret $\beta$ Using the Odds Ratio
The logit increases by $\beta$ for every one-unit increase in $x$:
$$\beta = \operatorname{logit}[\pi(x + 1)] - \operatorname{logit}[\pi(x)] = \log\frac{\text{odds at } x + 1}{\text{odds at } x}.$$
Equivalently, the odds are multiplied by $e^{\beta}$ for every one-unit increase in $x$:
$$e^{\beta} = \frac{\text{odds at } x + 1}{\text{odds at } x} = \frac{\pi(x + 1)/[1 - \pi(x + 1)]}{\pi(x)/[1 - \pi(x)]}.$$
Therefore $e^{\beta}$ is an odds ratio. Most of us do not think naturally on a logit or odds scale, so we also consider alternative interpretations.
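A quick numerical check, under arbitrary illustrative values of $\alpha$ and $\beta$, that the ratio of the odds at $x + 1$ to the odds at $x$ equals $e^{\beta}$ wherever $x$ sits on the curve.

```python
import math


def pi_of_x(x: float, alpha: float, beta: float) -> float:
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio_per_unit(x: float, alpha: float, beta: float) -> float:
    """Odds at x + 1 divided by odds at x."""
    return odds(pi_of_x(x + 1.0, alpha, beta)) / odds(pi_of_x(x, alpha, beta))


alpha, beta = -2.0, 0.7  # illustrative values only
for x in (-3.0, 0.0, 5.0):
    # the first number should match exp(beta) regardless of x
    print(x, round(odds_ratio_per_unit(x, alpha, beta), 6), round(math.exp(beta), 6))
```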

9 Interpret $\beta$ Using Probability
The function $\pi(x)$ is a curve, and its rate of change (first derivative)
$$\pi'(x) = \beta\,\pi(x)[1 - \pi(x)]$$
depends on the value of $x$. When $\pi(x) = 0.5$, the rate is $0.25\beta$; when $\pi(x) = 0.1$ (or 0.9), the rate is $0.09\beta$; as $\pi(x)$ approaches 0 or 1, the rate also approaches 0.
The steepest slope occurs at the $x$ for which $\pi(x) = 0.5$, namely $x = -\alpha/\beta$. This $x$ value is called the median effective level; it is the level at which each outcome has a 50% chance.
Near the $x$ where $\pi(x) = 0.5$, a change in $x$ of $1/\beta$ corresponds to a change in $\pi(x)$ of roughly $(1/\beta)(\beta/4) = 0.25$. Therefore $1/\beta$ approximates the distance between the $x$ values where $\pi(x) = 0.25$ or $0.75$ and where $\pi(x) = 0.5$.
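The derivative formula and the median effective level can be checked numerically. This sketch uses arbitrary illustrative values of $\alpha$ and $\beta$ and a central finite difference; it also shows that stepping $1/\beta$ past the median effective level moves $\pi(x)$ to about 0.73, close to the 0.75 suggested by the linear approximation.

```python
import math


def pi_of_x(x: float, alpha: float, beta: float) -> float:
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def slope(x: float, alpha: float, beta: float) -> float:
    """Analytic derivative d pi / dx = beta * pi(x) * (1 - pi(x))."""
    p = pi_of_x(x, alpha, beta)
    return beta * p * (1.0 - p)


alpha, beta = -3.0, 0.6          # illustrative values only
x_med = -alpha / beta            # median effective level, where pi(x) = 0.5
h = 1e-6                         # step for a central finite difference
numeric = (pi_of_x(x_med + h, alpha, beta) - pi_of_x(x_med - h, alpha, beta)) / (2 * h)

print(f"median effective level x = {x_med:.3f}")
print(f"slope: analytic {slope(x_med, alpha, beta):.6f}, "
      f"numeric {numeric:.6f}, beta/4 {beta / 4:.6f}")
# Moving 1/beta past x_med raises pi from 0.5 to about 0.73 (the linear rule says 0.75)
print(f"pi(x_med + 1/beta) = {pi_of_x(x_med + 1 / beta, alpha, beta):.4f}")
```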

10 [Figure: $\pi(x) = \exp(2x)/[1 + \exp(2x)]$ plotted against $x$.]

11 Scatter Plot for Horseshoe Crabs
Plotting $Y$ against $X$ is not informative because $Y$ is binary.
[Figure: presence of satellites (0/1) against width.]

12 Visualizing Data with Binary Responses
When there are $n_i$ observations at setting $i$ of $X$, plot the sample logit
$$\log\frac{p_i}{1 - p_i} = \log\frac{y_i}{n_i - y_i}$$
against $x_i$. If $y_i = 0$ or $y_i = n_i$, plot instead
$$\log\frac{y_i + 0.5}{n_i - y_i + 0.5}$$
against $x_i$. If the plot shows a linear pattern, logistic regression is appropriate. When $x$ is continuous and $n_i = 1$ (or very small), first group observations with nearby $x$ values into categories. If we use another link function $g(\cdot)$, we can plot $g(p_i)$ against $x_i$ to check whether the points lie around a straight line.
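A small sketch of the empirical-logit check for grouped data. It assumes the common convention of adding 0.5 to each count when $y_i = 0$ or $y_i = n_i$; the grouped counts are made up purely to show the computation and are not the crab data.

```python
import math


def empirical_logit(y: int, n: int, adjust: float = 0.5) -> float:
    """Sample logit log(y / (n - y)) for y successes out of n trials.

    When y is 0 or n the raw value is infinite, so 'adjust' (assumed 0.5)
    is added to both the success and the failure counts.
    """
    if 0 < y < n:
        return math.log(y / (n - y))
    return math.log((y + adjust) / (n - y + adjust))


# Hypothetical grouped data: (group midpoint x, successes y, trials n)
groups = [(1.0, 0, 8), (2.0, 3, 10), (3.0, 6, 10), (4.0, 9, 9)]
points = [(x, empirical_logit(y, n)) for x, y, n in groups]
for x, lg in points:
    print(f"x = {x:.1f}  empirical logit = {lg:.3f}")
# Points that fall roughly on a line support a logistic model; plot them with any tool.
```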

13 Plot for Horseshoe Crabs Data
A logistic regression is appropriate for the horseshoe crab data.
[Figure: grouped sample proportions $\pi(x)$ against width, and sample logits against width.]

14 SAS Code and Output
```sas
proc logistic data = SAS-Dataset;
  model yvar (event = 'level') = list-of-variables;
run;
```
The event= option tells SAS which level of the response is modeled, i.e., whose probability $\pi(x)$ refers to.

15 SAS Output
The SAS output shows $\hat\alpha = -12.351$, $SE(\hat\alpha) = 2.629$, $\hat\beta = 0.4972$, and $SE(\hat\beta) = 0.1017$. Therefore the fitted model is
$$\operatorname{logit}[\hat\pi(x)] = -12.351 + 0.497x.$$

16 Fitted Regression Function
The estimated probability at $x$ is
$$\hat\pi(x) = \frac{\exp(-12.351 + 0.497x)}{1 + \exp(-12.351 + 0.497x)}.$$
The estimated odds of having a satellite increase by 64.4% for each 1-cm increase in width ($e^{\hat\beta} = e^{0.4972} = 1.644$). At the minimum width $x = 21.0$, the estimated probability is
$$\hat\pi(21.0) = \frac{\exp[-12.351 + 0.497(21.0)]}{1 + \exp[-12.351 + 0.497(21.0)]} = 0.129.$$
At the maximum width $x = 33.5$, the estimated probability is $\hat\pi(33.5) = 0.987$.
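The arithmetic on this slide can be reproduced in a few lines of Python. As an assumption, the sketch back-solves the intercept from the example later in these notes that reports an estimated logit of 0.825 at $x = 26.5$, i.e. $\hat\alpha \approx 0.825 - 0.4972 \times 26.5 \approx -12.351$.

```python
import math

BETA_HAT = 0.4972
# Assumption: intercept back-solved from the estimated logit 0.825 at width 26.5
ALPHA_HAT = 0.825 - BETA_HAT * 26.5


def pi_hat(width_cm: float) -> float:
    """Fitted probability that a female crab of the given width has a satellite."""
    eta = ALPHA_HAT + BETA_HAT * width_cm
    return math.exp(eta) / (1.0 + math.exp(eta))


print(f"alpha_hat     = {ALPHA_HAT:.3f}")           # -12.351
print(f"exp(beta_hat) = {math.exp(BETA_HAT):.3f}")  # 1.644
print(f"pi_hat(21.0)  = {pi_hat(21.0):.3f}")        # 0.129
print(f"pi_hat(33.5)  = {pi_hat(33.5):.3f}")        # 0.987
```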

17 Interpretation Using Probability
At the sample mean width of 26.3 cm, $\hat\pi(26.3) = 0.674$. The estimated rate of change in the fitted probability at this point is
$$\hat\beta\,\hat\pi(x)[1 - \hat\pi(x)] = (0.497)(0.674)(0.326) = 0.11.$$
For female crabs near the mean width, the estimated probability of a satellite increases at the rate of 0.11 per 1-cm increase in width.
The estimated rate of change is greatest at the median effective level, where $\hat\pi(x) = 0.5$:
$$\log\frac{0.5}{0.5} = 0 = \hat\alpha + \hat\beta x \quad\Longrightarrow\quad x = -\hat\alpha/\hat\beta = 24.8.$$
There, the estimated probability increases at the rate of $(0.497)(0.50)(0.50) = 0.12$ per 1-cm increase in width.
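A sketch that reproduces the rate-of-change numbers on this slide. As before, the intercept is an assumption, back-solved from the estimated logit of 0.825 at width 26.5 reported later in these notes.

```python
import math

BETA_HAT = 0.4972
# Assumption: intercept back-solved from the estimated logit 0.825 at width 26.5
ALPHA_HAT = 0.825 - BETA_HAT * 26.5


def pi_hat(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-(ALPHA_HAT + BETA_HAT * x)))


def rate_of_change(x: float) -> float:
    """Slope of the fitted curve: beta_hat * pi_hat(x) * (1 - pi_hat(x))."""
    p = pi_hat(x)
    return BETA_HAT * p * (1.0 - p)


x_mean = 26.3
x_median_effective = -ALPHA_HAT / BETA_HAT  # where pi_hat = 0.5

print(f"pi_hat(26.3) = {pi_hat(x_mean):.3f}, slope = {rate_of_change(x_mean):.2f}")
print(f"median effective level = {x_median_effective:.1f}, "
      f"slope there = {rate_of_change(x_median_effective):.2f}")
```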

18 Comparing Different Variables
Suppose we want to compare the effects of width and weight. The fitted function with width (cm) as the predictor is $\operatorname{logit}[\hat\pi(x)] = -12.351 + 0.497x$; the fitted function with weight (kg) as the predictor has slope $\hat\beta = 1.815$. The estimated odds of having a satellite increase by 514.1% for each 1-kg increase in weight ($e^{1.815} = 6.141$).
Because the relationship between $\pi(x)$ and $x$ is nonlinear, and the two predictors are measured in different units, comparing only the estimates of $\beta$ may be misleading.

19 Comparing Estimated Probabilities
The estimated probabilities $\hat\pi(x)$ at the quartiles are useful for comparing the effects of predictors measured in different units.
The lower quartile, median, and upper quartile for width are 24.9, 26.1, and 27.7 cm, and $\hat\pi(x)$ at those values equals 0.51, 0.65, and 0.81; for example,
$$\hat\pi(26.1) = \frac{\exp[-12.351 + 0.497(26.1)]}{1 + \exp[-12.351 + 0.497(26.1)]} = 0.65.$$
The quartiles for weight are 2.00, 2.35, and 2.85 kg, and $\hat\pi(x)$ at those values equals 0.48, 0.64 (i.e., $\hat\pi(2.35) = 0.64$), and 0.81. Therefore the effect of weight is similar to that of width.
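The width quartile probabilities can be reproduced as below. The weight model's intercept is not legible in these notes, so this sketch covers width only, and the width intercept is again an assumption back-solved from the estimated logit 0.825 at $x = 26.5$.

```python
import math

BETA_HAT = 0.4972
# Assumption: intercept back-solved from the estimated logit 0.825 at width 26.5
ALPHA_HAT = 0.825 - BETA_HAT * 26.5


def pi_hat(width_cm: float) -> float:
    return 1.0 / (1.0 + math.exp(-(ALPHA_HAT + BETA_HAT * width_cm)))


width_quartiles = {"Q1": 24.9, "median": 26.1, "Q3": 27.7}
for label, w in width_quartiles.items():
    print(f"{label:>6}: width {w:.1f} cm -> pi_hat = {pi_hat(w):.2f}")
```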

20 Inference for Parameters
The SAS output gives the estimate of $\beta$ and its standard error. A Wald $(1 - \alpha)100\%$ confidence interval for $\beta$ is
$$\hat\beta \pm z_{\alpha/2}\,SE(\hat\beta).$$
To test $H_0: \beta = \beta_0$, the Wald test statistic is
$$z = (\hat\beta - \beta_0)/SE(\hat\beta).$$
Under $H_0$, $z$ approximately follows $N(0, 1)$. The p-value and critical value are calculated according to $H_a$.

21 Example of Wald Test
For the logistic regression with a single predictor, $\operatorname{logit}[\pi(x)] = \alpha + \beta x$, we want to know whether the probability $\pi(x)$ depends on $x$:
$$H_0: \beta = 0, \qquad H_a: \beta \neq 0.$$
The Wald test statistic is $z = \hat\beta/SE(\hat\beta)$, which under $H_0$ approximately follows $N(0, 1)$. The SAS output shows the Wald chi-squared statistic $z^2$ with p-value $P(\chi^2_1 \ge z^2)$. In the example, $z^2 = (0.4972/0.1017)^2 = 23.9$ and the p-value is $< 0.0001$, so $\beta$ is significantly different from 0.
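The Wald statistic and its two-sided p-value need only the standard library: since $z^2 \sim \chi^2_1$ under $H_0$, $P(\chi^2_1 \ge z^2) = P(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})$. A minimal sketch using the estimate and standard error from this slide:

```python
import math

beta_hat, se_beta = 0.4972, 0.1017   # estimate and standard error from the slide

z = beta_hat / se_beta
wald_chi2 = z ** 2
# Two-sided p-value: P(|Z| >= |z|) = P(chi^2_1 >= z^2) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2.0))

print(f"z = {z:.3f}, Wald chi-square = {wald_chi2:.2f}, p-value = {p_value:.1e}")
```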

22 Confidence Interval for Parameters
An approximate $(1 - \alpha)100\%$ confidence interval for $\beta$ is $\hat\beta \pm z_{\alpha/2}\,SE(\hat\beta)$. The Wald 95% confidence interval for $\beta$ is
$$0.4972 \pm 1.96(0.1017) = (0.2978,\ 0.6966).$$
The corresponding confidence interval for the multiplicative effect on the odds of a 1-cm increase in width is $(e^{0.2978}, e^{0.6966}) = (1.35, 2.01)$. We infer that a 1-cm increase in width corresponds to at least a 35% increase and at most about a doubling of the odds of a satellite.
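A sketch of the Wald interval and its exponentiated version, using `statistics.NormalDist` for the normal quantile. Because the standard error is rounded to four decimals here, the limits for $\beta$ may differ from the slide in the last digit.

```python
import math
from statistics import NormalDist

beta_hat, se_beta = 0.4972, 0.1017
z_975 = NormalDist().inv_cdf(0.975)    # about 1.96

lower = beta_hat - z_975 * se_beta
upper = beta_hat + z_975 * se_beta
print(f"95% Wald CI for beta:      ({lower:.4f}, {upper:.4f})")
# Exponentiating the endpoints gives a CI for the odds ratio per 1-cm increase
print(f"95% CI for the odds ratio: ({math.exp(lower):.2f}, {math.exp(upper):.2f})")
```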

23 Full Model and Reduced Model
The likelihood ratio test can be used to compare a reduced model with a full model.
Full model: the larger model, with more parameters, considered appropriate for the data.
Reduced model: the smaller model, with fewer parameters, obtained by imposing $H_0$.
For example, in the logistic regression $\operatorname{logit}(\pi) = \alpha + \beta x$, to test $H_0: \beta = 0$ the full and reduced models are
full model: logistic regression with $\operatorname{logit}(\pi) = \alpha + \beta x$;
reduced model: logistic regression with $\operatorname{logit}(\pi) = \alpha$.

24 Likelihood Ratio Test
Denote
$l_0$ = maximized log likelihood under $H_0$,
$l_1$ = maximized log likelihood under $H_a$.
Since the reduced model is a special case of the full model, $l_0 \le l_1$. The likelihood ratio test statistic is
$$G^2 = -2(l_0 - l_1).$$
Under $H_0$, $G^2$ approximately follows $\chi^2_d$, where $d$ is the difference between the numbers of parameters under $H_a$ and under $H_0$.

25 Some Comments
The relationship between the Wald test and the likelihood ratio test is similar to that between the t-test and the F-test in linear regression. When testing a single parameter, the Wald test and the likelihood ratio test are approximately equivalent, in the sense that they give similar p-values in large samples. However, $H_a$ of the Wald test can be either one-sided or two-sided, while $H_a$ of the likelihood ratio test is usually two-sided. The likelihood ratio test can also be used for more complicated testing problems.

26 Example of Likelihood Ratio Test
The likelihood ratio test of $H_0: \beta = 0$ compares the two models
$$H_0: \operatorname{logit}[\pi(x)] = \alpha, \qquad H_a: \operatorname{logit}[\pi(x)] = \alpha + \beta x.$$
The likelihood ratio test statistic is $G^2 = -2(l_0 - l_1)$, where $l_1$ is the maximized log likelihood for $\operatorname{logit}[\pi(x)] = \alpha + \beta x$ and $l_0$ is the maximized log likelihood for $\operatorname{logit}[\pi(x)] = \alpha$. Under $H_0$, $G^2$ approximately follows $\chi^2_1$. Here $G^2 = 31.3$, so we reject $H_0$: the probability that a female crab has at least one satellite depends on her width.
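With one degree of freedom, the p-value of $G^2$ can be computed from the standard normal, because a $\chi^2_1$ variable is the square of a standard normal. The two maximized log likelihoods are not legible in these notes, so the hypothetical helper `lr_statistic` is only exercised on toy values, and the p-value uses the reported $G^2 = 31.3$.

```python
import math


def lr_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """G^2 = -2 (l0 - l1), where l0 <= l1 are maximized log likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_sf_df1(g2: float) -> float:
    """P(chi^2_1 >= g2): a chi-square(1) variable is Z^2 with Z ~ N(0, 1)."""
    return math.erfc(math.sqrt(g2 / 2.0))


print(f"P(chi2_1 >= 31.3)   = {chi2_sf_df1(31.3):.1e}")
print(f"P(chi2_1 >= 3.8415) = {chi2_sf_df1(3.8415):.3f}")  # the usual 5% critical value
```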

27 Confidence Interval for Estimated Probabilities
The confidence interval for $\pi(x)$ is obtained by transforming the confidence interval for $\operatorname{logit}[\pi(x)]$. An approximate $(1 - \alpha)100\%$ confidence interval for $\operatorname{logit}[\pi(x)]$ is
$$(\hat\alpha + \hat\beta x) \pm z_{\alpha/2}\,SE(\hat\alpha + \hat\beta x),$$
where
$$SE(\hat\alpha + \hat\beta x) = \sqrt{\operatorname{var}(\hat\alpha) + x^2 \operatorname{var}(\hat\beta) + 2x\operatorname{cov}(\hat\alpha, \hat\beta)}.$$
Denote the confidence interval for $\operatorname{logit}[\pi(x)]$ by $(a, b)$. Then an approximate $(1 - \alpha)100\%$ confidence interval for $\pi(x)$ is
$$\left(\frac{\exp(a)}{1 + \exp(a)},\ \frac{\exp(b)}{1 + \exp(b)}\right).$$

28 Example
Consider a crab with width $x = 26.5$. The estimated logit is $\hat\alpha + (0.4972)(26.5) = 0.825$. From the SAS output (the estimated covariance matrix), $\operatorname{var}(\hat\alpha) = 6.910$, together with $\operatorname{var}(\hat\beta)$ and $\operatorname{cov}(\hat\alpha, \hat\beta)$, so that
$$\operatorname{var}(\hat\alpha + \hat\beta x) = 6.910 + (26.5)^2\operatorname{var}(\hat\beta) + 2(26.5)\operatorname{cov}(\hat\alpha, \hat\beta).$$
A 95% confidence interval for $\operatorname{logit}[\pi(26.5)]$ is
$$0.825 \pm 1.96\,SE(\hat\alpha + \hat\beta x) = (0.457, 1.193).$$
The estimated probability is $\hat\pi(26.5) = \exp(0.825)/[1 + \exp(0.825)] = 0.695$, and a 95% confidence interval for $\pi(26.5)$ is $(0.612, 0.768)$.
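A sketch of the transformation step. It maps the slide's logit-scale interval through the inverse logit; the upper limit prints as 0.767 rather than 0.768, presumably because the logit endpoints are rounded. The helper `se_linear_predictor` implements the SE formula from the previous slide but is only checked on toy inputs, because $\operatorname{var}(\hat\beta)$ and $\operatorname{cov}(\hat\alpha, \hat\beta)$ are not legible in these notes.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def se_linear_predictor(x: float, var_a: float, var_b: float, cov_ab: float) -> float:
    """SE(alpha_hat + beta_hat * x) from the estimated covariance matrix."""
    return math.sqrt(var_a + x ** 2 * var_b + 2.0 * x * cov_ab)


# Values from the slide: estimated logit and its 95% CI at width 26.5
logit_hat, a, b = 0.825, 0.457, 1.193
print(f"pi_hat(26.5) = {expit(logit_hat):.3f}")
print(f"95% CI for pi(26.5): ({expit(a):.3f}, {expit(b):.3f})")
# The half-width of the logit interval implies SE(alpha_hat + beta_hat * 26.5)
print(f"implied SE on the logit scale: {(b - a) / (2 * 1.96):.3f}")
```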

29 SAS Code for Estimated Probability
```sas
proc logistic data = SAS-Dataset;
  model response (event = 'level') = list-of-variables;
  output out = mydata prob = prob lower = lower upper = upper;
run;
```
This code creates a new dataset, mydata, with variables prob, lower, and upper, which give the estimated probabilities and their 95% confidence limits.

30 How about Ignoring the Model?
Six female crabs in the sample had $x = 26.5$, and four of them had satellites. If we ignore the model, a 95% confidence interval is
$$\frac{4}{6} \pm 1.96\sqrt{\frac{(4/6)(1 - 4/6)}{6}} = (0.29, 1.04),$$
which is much wider than the model-based interval (and even exceeds 1). When the logistic model truly holds, the model-based estimator of a probability is considerably better than the sample proportion: the model has only two parameters to estimate, whereas the saturated model has a separate parameter for every distinct value of $x$.
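The model-free interval on this slide is the usual Wald interval for a single proportion; a minimal sketch with the counts from the slide:

```python
import math

y, n = 4, 6                      # crabs with x = 26.5, and how many had satellites
p = y / n
half_width = 1.96 * math.sqrt(p * (1.0 - p) / n)
print(f"sample proportion {p:.3f}, 95% Wald CI ({p - half_width:.2f}, {p + half_width:.2f})")
# The upper limit exceeds 1, a known weakness of this interval for small n.
```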

Maternal alcohol consumption (one categorical predictor)

31 Example: Maternal Alcohol Consumption
The following table summarizes the results of a study of maternal alcohol consumption (average number of drinks per day) and the child's congenital malformations.
[Table: congenital malformation (present, absent) by alcohol consumption category.]
The response is binary and the predictor is a categorical variable.

32 Categorical Predictors
Categorical explanatory variables are also called factors. First consider a single factor $X$ with $I$ categories (levels).
$$\begin{array}{lccc} \text{alcohol consumption} & \text{success} & \text{failure} & \text{total}\\ \text{level } 1 & y_1 & n_1 - y_1 & n_1\\ \text{level } 2 & y_2 & n_2 - y_2 & n_2\\ \vdots & \vdots & \vdots & \vdots\\ \text{level } I & y_I & n_I - y_I & n_I \end{array}$$
In row $i$ of the $I \times 2$ table, $y_i$ is the number of outcomes in the first column (successes) out of $n_i$ trials. This setting is analogous to the one-way ANOVA model.

33 Saturated Model
A saturated model has a separate parameter for each observation (each distinct value of $x$), and it provides a perfect fit to the data. Heuristically, a saturated model assumes there is no relationship between the means at different values of $x$. We treat $y_i$ as binomial with parameter $\pi_i$ and let
$$\operatorname{logit}(\pi_i) = \log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i.$$
There are in total $I + 1$ unknown parameters $\{\alpha, \beta_1, \ldots, \beta_I\}$. The model is saturated because $\pi_i$ may differ for every level of $x$.

34 Comparing Probabilities
The higher $\beta_i$ is, the higher the probability of success $\pi_i$. The question of interest is whether the factor has any effect:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_I,$$
or equivalently
$$H_0: \pi_1 = \pi_2 = \cdots = \pi_I.$$

35 Constraint on Parameters
One parameter in $\{\alpha, \beta_i\}$ is redundant, and we resolve this by adding one constraint. A popular choice is to take one level of $x$ as the reference level and set $\beta_I = 0$:
$$\begin{aligned} \text{level } 1: &\quad \operatorname{logit}(\pi_1) = \alpha + \beta_1\\ &\quad\vdots\\ \text{level } I-1: &\quad \operatorname{logit}(\pi_{I-1}) = \alpha + \beta_{I-1}\\ \text{level } I: &\quad \operatorname{logit}(\pi_I) = \alpha \end{aligned}$$
Therefore $\alpha$ is the logit in row $I$, and $\beta_i$ is the log odds ratio comparing row $i$ with row $I$:
$$\alpha = \log\frac{\pi_I}{1 - \pi_I}, \qquad \beta_i = \operatorname{logit}(\pi_i) - \operatorname{logit}(\pi_I) = \log\frac{\pi_i/(1 - \pi_i)}{\pi_I/(1 - \pi_I)}.$$

36 Other Choices of Constraint
The constraint on the parameters and the coding scheme are not unique. However, for different constraints and coding schemes, the fitted logits $\{\hat\alpha + \hat\beta_i\}$ and the fitted probabilities $\{\hat\pi_i\}$ are the same, and the differences $\hat\beta_i - \hat\beta_j$ for any two levels of $X$ are identical and represent estimated log odds ratios.
When a factor has two levels, a common alternative constraint is $\sum_i \beta_i = \beta_1 + \beta_2 = 0$. In this case $\alpha$ is the average logit, and $\beta_i$ is the difference between the logit in row $i$ and the average logit. The corresponding regression function is
$$\operatorname{logit}(\pi_i) = \alpha + \beta x,$$
where $x = 1$ for one level and $x = -1$ for the other.

37 Estimates of Parameters
Let us focus on the constraint $\beta_I = 0$. The meaning of the parameters sheds light on how to estimate them. Population proportions are estimated by sample proportions,
$$\hat\pi_i = p_i = y_i/n_i,$$
and thus the estimates of $\alpha$ and $\beta_i$ are
$$\hat\alpha = \log\frac{p_I}{1 - p_I}, \qquad \hat\beta_i = \log\frac{p_i/(1 - p_i)}{p_I/(1 - p_I)}.$$
It can be verified that these estimates are the MLEs of $\alpha$ and $\beta_i$.
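These closed-form estimates are easy to compute directly. In this sketch the counts are hypothetical; only the last (reference) row is chosen so that its sample logit is $\log(1/37)$, matching the value of $\hat\alpha$ quoted on the next slide.

```python
import math


def saturated_estimates(successes, totals):
    """MLEs for the saturated model with the last level as reference (beta_I = 0).

    Requires 0 < y_i < n_i in every row so that all sample logits are finite.
    """
    logits = [math.log(y / (n - y)) for y, n in zip(successes, totals)]
    alpha_hat = logits[-1]                        # logit of the reference row
    beta_hat = [lg - alpha_hat for lg in logits]  # log odds ratios vs the reference
    return alpha_hat, beta_hat


# Hypothetical counts; only the last row mirrors the reference logit log(1/37)
alpha_hat, beta_hat = saturated_estimates([2, 5, 1], [10, 10, 38])
print(f"alpha_hat = {alpha_hat:.3f}")
print("beta_hat  =", [round(b, 3) for b in beta_hat])
```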

38 SAS Code and Results
```sas
proc logistic data = SAS-Dataset;
  class xvar (ref = 'level') / param = reference;
  model yvar / total = list-of-variables;
run;
```
[SAS output table: estimate, standard error, and 95% CI for $\alpha$ and each $\beta_i$.]
For example, $\hat\alpha = \log(1/37) = -3.611$ is the sample logit of the reference level, and $\hat\beta_1$ is the sample log odds ratio comparing level 1 with the reference level.

39 Equivalent Regression Model
With dummy variables, the model can be written as a regression model. A factor with $I$ levels needs $I - 1$ dummy variables. Define $x_1, \ldots, x_{I-1}$ by $x_i = 1$ for row $i$ and $x_i = 0$ otherwise ($i = 1, \ldots, I - 1$); thus $x_1 = \cdots = x_{I-1} = 0$ means the observation is in row $I$. The regression model is
$$\operatorname{logit}(\pi_i) = \alpha + \beta_1 x_1 + \cdots + \beta_{I-1} x_{I-1},$$
where $\alpha$ and $\beta_i$ have the same interpretations as under the constraint $\beta_I = 0$.

40 Results for Example
Consider the hypothesis
$$H_0: \beta_1 = \cdots = \beta_5.$$
$H_0$ means $\pi_1 = \cdots = \pi_5$, i.e., the response is independent of the predictor. Under $H_0$, the estimated probability $\hat\pi_0$ is the same for all levels of $x$:
$$\hat\pi_0 = \frac{\text{total number of malformations}}{\text{total sample size}}.$$
The Pearson chi-squared statistic is $X^2 = 12.1$ (p-value 0.02), and the likelihood ratio statistic is $G^2 = 6.2$ (p-value 0.19).

41 Test of Independence
[Table: observed and expected frequencies of congenital malformation (present, absent) by alcohol consumption.]
Each expected frequency is the row total multiplied by $\hat\pi_0$ (for "present") or by $1 - \hat\pi_0$ (for "absent"). Under $H_0$, both $X^2$ and $G^2$ approximately follow $\chi^2_4$.

42 Ordinal
Alcohol consumption is ordinal. Assume that the scores $\{0, 0.5, 1.5, 4.0, 7.0\}$ properly describe the distances between the levels of $X$, and consider the model
$$\operatorname{logit}(\pi_i) = \alpha + \beta x_i.$$
[SAS output: estimates and standard errors of $\alpha$ and $\beta$.]
The estimated slope is $\hat\beta = 0.317$, so the estimated multiplicative effect of a one-unit increase in daily alcohol consumption on the odds of malformation is $\exp(0.317) = 1.37$.

43 Inference for $\beta$
The test of independence corresponds to $H_0: \beta = 0$, tested with the Wald statistic $z = \hat\beta/SE(\hat\beta)$.
[Table: observed and fitted proportion malformed by alcohol consumption.]
The fitted proportions are close to the observed ones, so the model seems to fit well.

Checking model adequacy

44 Goodness-of-fit
A goodness-of-fit test checks whether a model adequately describes the variability in the data. If there is more than one subject at each setting of $x$, compare the model with the saturated model.
[Table: observed present/absent counts, fitted probability, and fitted present/absent counts by alcohol consumption.]
For example, the fitted count of "present" in a row is the row total times the fitted probability for that row (such as 0.0026), and the fitted count of "absent" is the row total times one minus that probability.

45 Two Chi-squared Statistics
First calculate the fitted frequency for each cell, then calculate the Pearson chi-squared statistic or the likelihood ratio statistic:
$$X^2 = \sum \frac{(\text{observed} - \text{fitted})^2}{\text{fitted}}, \qquad G^2 = 2\sum \text{observed}\,\log\frac{\text{observed}}{\text{fitted}}.$$
For the alcohol data, the Pearson chi-squared statistic for goodness of fit is $X^2 = 2.05$. Under $H_0$, both $X^2$ and $G^2$ approximately follow $\chi^2_3$ (5 levels minus 2 model parameters), and $\chi^2_{3,0.05} = 7.815$. Since $2.05 < 7.815$, there is no evidence of lack of fit.
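Both statistics are sums over all cells of the table (the "present" and "absent" columns alike). The alcohol counts are not legible in these notes, so this sketch uses a hypothetical two-row table whose fitted counts preserve the row totals.

```python
import math


def pearson_x2(observed, fitted):
    """Pearson statistic: sum over cells of (observed - fitted)^2 / fitted."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))


def deviance_g2(observed, fitted):
    """Likelihood ratio statistic: 2 * sum of observed * log(observed / fitted).

    Cells with observed == 0 contribute 0 (the limit of o * log(o / f) as o -> 0).
    """
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)


# Hypothetical two-row table flattened into cells: (present, absent) for each row
observed = [10, 40, 20, 30]
fitted = [15, 35, 15, 35]
print(f"X^2 = {pearson_x2(observed, fitted):.3f}, G^2 = {deviance_g2(observed, fitted):.3f}")
```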

46 Goodness-of-fit, More
If there is only one subject at each setting of $x$, comparing the model with the saturated model can be misleading because the chi-squared approximation does not hold (think of the linear regression model). Two alternatives:
Group the data and check the goodness of fit. If the test does not reject, we can feel more comfortable using the model for the original ungrouped data.
Compare the model with a more complicated one (for example, $\operatorname{logit}[\pi(x)] = \alpha + \beta_1 x + \beta_2 x^2$). If more complex models do not fit better (the test does not reject), this provides some assurance that the chosen model is reasonable.

47 Goodness-of-fit for Horseshoe Crabs
[Table: width categories with number of observations, observed yes/no counts, and fitted yes/no counts.]
For the $k$th width category,
$$\text{fitted yes} = \sum_{x_i \in \text{category } k} \hat\pi(x_i), \qquad \text{fitted no} = \sum_{x_i \in \text{category } k} [1 - \hat\pi(x_i)].$$
Here $X^2 = 5.3$ and $G^2 = 6.2$. Both approximately follow $\chi^2_6$, with p-values of about 0.5 and 0.4, respectively.
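For an even number of degrees of freedom the chi-squared tail probability has a closed form, $P(\chi^2_{2k} \ge x) = e^{-x/2}\sum_{j=0}^{k-1} (x/2)^j/j!$, so the p-values on this slide can be checked with the standard library alone. This is a sketch for even df only, not a general chi-squared routine.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df >= x) for a positive even df.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


for name, stat in (("X^2", 5.3), ("G^2", 6.2)):
    print(f"{name} = {stat}: p-value = {chi2_sf_even_df(stat, 6):.2f}")
```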

48 Compare to a More Complicated Model
Fit a quadratic logistic regression in width, $\operatorname{logit}[\hat\pi(x)] = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$. The Wald test for the quadratic term is not significant, and neither is the likelihood ratio statistic $G^2 = -2(l_0 - l_1)$ comparing the linear and quadratic models. There is no evidence to support adding a quadratic term; therefore a linear logistic model is adequate for width.

Horseshoe crab, revisited

49 Multiple Logistic Regression
When there are $p$ predictors $x_1, \ldots, x_p$, the logistic regression models $\pi(x) = P(Y = 1)$ by
$$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)},$$
or equivalently
$$\operatorname{logit}[\pi(x)] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.$$
The parameter $\beta_i$ refers to the effect of $x_i$ on the log odds that $Y = 1$, controlling for the other $x_j$, and $e^{\beta_i}$ is the multiplicative effect on the odds of a one-unit increase in $x_i$ at fixed levels of the other $x$'s.

50 Horseshoe Crab Data
Logistic regression can have a mixture of quantitative and qualitative predictors. In this example, we analyze the horseshoe crab data using both the female crab's shell width and color as predictors:
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \beta_3 c_3 + \beta_4 x,$$
where $\pi = P(Y = 1)$ is the probability that a female crab has a satellite, $x$ is width in centimeters, and
$c_1 = 1$ for medium-light color, and 0 otherwise;
$c_2 = 1$ for medium color, and 0 otherwise;
$c_3 = 1$ for medium-dark color, and 0 otherwise.
The crab color is dark when $c_1 = c_2 = c_3 = 0$.

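51 Fitted Regression Model
The fitted model is
$$\operatorname{logit}(\hat\pi) = \hat\beta_0 + \hat\beta_1 c_1 + \hat\beta_2 c_2 + \hat\beta_3 c_3 + \hat\beta_4 x.$$
It is informative to write the fitted model separately for each color:
$$\begin{aligned} \text{medium-light}: &\quad \operatorname{logit}(\hat\pi) = (\hat\beta_0 + \hat\beta_1) + \hat\beta_4 x\\ \text{medium}: &\quad \operatorname{logit}(\hat\pi) = (\hat\beta_0 + \hat\beta_2) + \hat\beta_4 x\\ \text{medium-dark}: &\quad \operatorname{logit}(\hat\pi) = (\hat\beta_0 + \hat\beta_3) + \hat\beta_4 x\\ \text{dark}: &\quad \operatorname{logit}(\hat\pi) = \hat\beta_0 + \hat\beta_4 x \end{aligned}$$
The exponentiated difference between two color parameter estimates is an odds ratio comparing those two colors. At any given width, the estimated odds that a medium-light crab has a satellite are $e^{\hat\beta_1} = 3.8$ times the estimated odds for a dark crab.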

52 Plot of Fitted Regression Curves
Any one curve equals any other curve shifted to the right or left. Because the curves are parallel in the horizontal direction, no two curves ever cross (there is no interaction between color and width).
[Figure: predicted probability against width; four curves, from left to right: medium ($c_2$), medium light ($c_1$), medium dark ($c_3$), dark.]

53 Check Model Adequacy
A more complicated model allowing a color × width interaction has three additional terms, the cross-products of width with the color dummy variables:
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \beta_3 c_3 + \beta_4 x + \beta_5 c_1 x + \beta_6 c_2 x + \beta_7 c_3 x.$$
This is equivalent to fitting a logistic regression on width separately for the crabs of each color:
$$\begin{aligned} \text{medium-light}: &\quad \operatorname{logit}(\pi) = (\beta_0 + \beta_1) + (\beta_4 + \beta_5)x\\ \text{medium}: &\quad \operatorname{logit}(\pi) = (\beta_0 + \beta_2) + (\beta_4 + \beta_6)x\\ \text{medium-dark}: &\quad \operatorname{logit}(\pi) = (\beta_0 + \beta_3) + (\beta_4 + \beta_7)x\\ \text{dark}: &\quad \operatorname{logit}(\pi) = \beta_0 + \beta_4 x \end{aligned}$$

54 Result of Testing
Consider the hypotheses
$$H_0: \beta_5 = \beta_6 = \beta_7 = 0, \qquad H_a: \text{not all of } \beta_5, \beta_6, \beta_7 \text{ are zero}.$$
The full model is the model with interaction terms, and the reduced model is the model without them. The likelihood ratio test statistic $G^2 = -2(l_0 - l_1)$ has $df = 3$ and is not significant. Therefore the model without interaction terms is adequate for the data.

55 Test Effect of Color
In the model without interaction terms,
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \beta_3 c_3 + \beta_4 x,$$
to test whether color contributes significantly to the model, consider
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0, \qquad H_a: \text{not all of } \beta_1, \beta_2, \beta_3 \text{ are zero}.$$
The reduced model is $\operatorname{logit}(\pi) = \beta_0 + \beta_4 x$. The likelihood ratio test statistic $G^2 = -2(l_0 - l_1)$ has $df = 3$. The test does not reject $H_0$: controlling for width, there is no significant evidence that the probability of a satellite depends on color.

56 More Consideration for Color
Color has ordered categories, from lightest to darkest, so we can assign scores $c$ to the color categories and fit
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 c + \beta_2 x.$$
With scores $c = \{1, 2, 3, 4\}$: $SE(\hat\beta_1) = 0.224$ and $SE(\hat\beta_2) = 0.104$. Compared with the model treating color as nominal, $G^2 = 1.7$, $df = 2$, and the p-value is 0.43.
With scores $c = \{1, 1, 1, 0\}$: $SE(\hat\beta_1) = 0.526$ and $SE(\hat\beta_2) = 0.104$; $G^2 = 0.5$, $df = 2$, and the p-value is 0.78.
A much larger sample is needed to determine which color scoring is more appropriate. It is advantageous to treat ordinal predictors in a quantitative manner when such models fit well.
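With 2 degrees of freedom the chi-squared distribution is exponential with mean 2, so $P(\chi^2_2 \ge G^2) = e^{-G^2/2}$ and the critical value at level $a$ is $-2\log a$. A minimal sketch applying this to the two score comparisons on this slide:

```python
import math


def chi2_sf_df2(g2: float) -> float:
    """P(chi^2_2 >= g2); with 2 df the chi-square is exponential, so this is exp(-g2/2)."""
    return math.exp(-g2 / 2.0)


def chi2_crit_df2(level: float) -> float:
    """Upper-level critical value of chi^2_2: solves exp(-c/2) = level."""
    return -2.0 * math.log(level)


for scores, g2 in (("{1, 2, 3, 4}", 1.7), ("{1, 1, 1, 0}", 0.5)):
    print(f"scores {scores}: G^2 = {g2}, p-value = {chi2_sf_df2(g2):.2f}")
print(f"critical value chi^2_(2, 0.05) = {chi2_crit_df2(0.05):.3f}")
```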

Neuralgia

57 Example: Neuralgia
Consider a study of the analgesic effects of treatments on elderly patients with neuralgia. Two test treatments (A and B) and a placebo (P) are compared. The response variable is whether the patient reported pain. Researchers recorded the age and gender of the patients and the duration of complaint before the treatment began. The data consist of 60 patients. The four predictor variables are
treatment: A, B, P
sex: F, M
age: continuous
duration: continuous

58 Logistic Model
Consider the logistic model
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5},$$
where $\pi$ is the probability of reporting pain and the predictor variables are
$x_1 = 1$ for treatment A, and 0 otherwise;
$x_2 = 1$ for treatment B, and 0 otherwise;
$x_3 = 1$ for female, and 0 for male;
$x_4$ = age;
$x_5$ = duration.
Note that $x_1$ and $x_2$ are the two dummy variables constructed for treatment, with placebo as the reference.

59 Model Fitting
Is the model significant?
$$H_0: \beta_1 = \cdots = \beta_5 = 0, \qquad H_a: \text{not all } \beta_i \text{ are zero}.$$
The likelihood ratio statistic $G^2$ has 5 degrees of freedom and p-value $< .0001$; the critical value is $\chi^2_{5,0.05} = 11.07$. We reject $H_0$.
[Table: SAS parameter estimates (Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq) for Intercept, Treatment A, Treatment B, Sex F, Age, Duration.]

60 Refined Model
The variable duration is not significant. After removing it, the logistic model becomes
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4},$$
and the fitted parameters are:
[Table: SAS parameter estimates (Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq) for Intercept, Treatment A, Treatment B, Sex F, Age.]

61 Odds Ratio in SAS Output
The SAS output of proc logistic displays odds ratio estimates and their confidence intervals for those variables that are not involved in any interaction terms. For a categorical variable (listed in a class statement), the odds ratio comparing each level with the last level is computed regardless of the coding scheme. For a continuous explanatory variable, the odds ratio corresponds to a one-unit increase in that variable.
[Figure: 2 × 2 success/failure tables comparing one level with the last level, and $x + 1$ with $x$.]

62 Interpretation of the Parameters
The odds of reporting pain for female patients are 16.1% of the odds for male patients; the 95% confidence interval is (0.034, 0.762).
The odds of reporting pain for patients treated with A are 4.2% of the odds for patients given the placebo; the 95% confidence interval is (0.006, 0.303).
The odds of reporting pain for patients treated with B are 2.4% of the odds for patients given the placebo; the 95% confidence interval is (0.003, 0.222).
The odds of reporting pain increase by 30.0% for each additional year of age; the 95% confidence interval is (1.080, 1.773).

63 Contrast Statement in SAS
The contrast statement enables flexible tests involving categorical variables. Suppose we want to compare the two treatments A and B:
$$\beta_A = \beta_B \iff \beta_A - \beta_B = 0.$$
The corresponding SAS code is
```sas
contrast 'A vs B' Treatment 1 -1;
```
Suppose we want to compare the treatments with the placebo:
$$(\beta_A + \beta_B)/2 = \beta_P \iff 0.5\beta_A + 0.5\beta_B - \beta_P = 0.$$
Because $\beta_P = 0$ under reference coding, the corresponding SAS code is
```sas
contrast 'AB vs P' Treatment 0.5 0.5;
```

64 Results
There is not much difference between treatments A and B: the test is not significant, and the 95% confidence interval for the odds ratio, whose lower limit is 0.2786, contains 1.
The two treatments are quite different from the placebo: the test is significant, and the lower limit of the 95% confidence interval for the odds ratio is 0.0047.
Notice that the syntax of the contrast statement in proc logistic differs from that in proc glm. If a different coding scheme is used (e.g., effect coding), the SAS code should be modified accordingly, because the constraint under effect coding is $\beta_P = -\beta_A - \beta_B$.
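To make the last remark concrete, here is a short derivation, assuming effect coding in which the placebo is the omitted last level, so that $\beta_P = -\beta_A - \beta_B$. Substituting into the treatments-versus-placebo contrast gives the coefficients on the A and B design columns, while the A-versus-B contrast is unchanged. The resulting coefficients (1.5 and 1.5) are our own derivation, not output shown in these notes.

```latex
\begin{aligned}
0.5\beta_A + 0.5\beta_B - \beta_P
  &= 0.5\beta_A + 0.5\beta_B - (-\beta_A - \beta_B) \\
  &= 1.5\beta_A + 1.5\beta_B, \\
\beta_A - \beta_B &= 1\cdot\beta_A + (-1)\cdot\beta_B .
\end{aligned}
```

So under this coding the "AB vs P" contrast would place 1.5 and 1.5 on the Treatment columns, and "A vs B" keeps 1 and −1.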

65 Model Failure Instead of Success
The logistic regression is
$$\operatorname{logit}(\pi) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4},$$
where $\pi$ is the probability of reporting pain. Let $\tilde\pi$ be the probability of reporting no pain. Then $\tilde\pi = 1 - \pi$ and
$$\operatorname{logit}(\tilde\pi) = \log\frac{\tilde\pi}{1 - \tilde\pi} = \log\frac{1 - \pi}{\pi} = -\log\frac{\pi}{1 - \pi} = -\operatorname{logit}(\pi).$$
Therefore the logistic regression for $\tilde\pi$ is
$$\operatorname{logit}(\tilde\pi) = -\beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3} - \beta_4 x_{i4}.$$
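A numerical check that modeling "no pain" instead of "pain" simply flips the sign of every coefficient. The coefficient values and the patient profile below are hypothetical, chosen only for illustration; they are not the fitted neuralgia estimates.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


# logit(1 - p) = -logit(p) for any p in (0, 1)
for p in (0.1, 0.35, 0.8):
    print(f"p = {p}: logit(1 - p) = {logit(1 - p):.4f}, -logit(p) = {-logit(p):.4f}")

# Hypothetical coefficients (intercept, A, B, female, age) for "pain", illustration only
coef_pain = [-15.0, -3.0, -3.5, -1.8, 0.26]
coef_no_pain = [-b for b in coef_pain]          # model for "no pain": every sign flips
x = [1.0, 1.0, 0.0, 1.0, 70.0]                  # hypothetical patient: treatment A, female, age 70

p_pain = expit(sum(b * v for b, v in zip(coef_pain, x)))
p_no_pain = expit(sum(b * v for b, v in zip(coef_no_pain, x)))
print(f"P(pain) = {p_pain:.4f}, P(no pain) = {p_no_pain:.4f}, sum = {p_pain + p_no_pain:.4f}")
```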

66 Comparing with a Larger Model
To assess model adequacy, we compare this model with a larger model that includes all the pairwise interactions. There are in total 14 predictors in the full model:
$$x_1, x_2, x_3, x_4, x_5, \quad x_1x_3, x_1x_4, x_1x_5, x_2x_3, x_2x_4, x_2x_5, x_3x_4, x_3x_5, x_4x_5.$$
($x_1x_2$ is not included because the two treatment dummies are never both 1, so their product is always 0.) The hypotheses are
$H_0$: model with $x_1, \ldots, x_5$; $H_a$: model with all 14 predictors.
The likelihood ratio test statistic $G^2 = -2(l_0 - l_1)$ has 9 degrees of freedom, and the p-value $P(\chi^2_9 \ge G^2)$ is not significant. We do not reject $H_0$, which means the model with $x_1, \ldots, x_5$ is adequate.
