12 Modelling Binomial Response Data

Size: px

Start display at page:

Download "12 Modelling Binomial Response Data"

Grant Hunter
5 years ago
Views:

1 c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual can be placed into one of exactly two categories. Some examples: A manufacturing process is either satisfactory or unsatisfactory. A baby is delivered at either full term or prematurely. A customer applying for a loan is either a good risk or a bad risk. Generally, we can think of an outcome as resulting in either success or failure Modelling Binary Response Probabilities We consider a general framework for modelling success probabilities. Suppose that we have G groups, with N i individuals in the i-th group. The value of the explanatory variables for those individuals in group i is x i = x i1, x i2,..., x ip. Let π i be the probability of success for an individual in the i-th group, and Y i be the number of successes in that group also thus N i Y i is the number of failures. Then assuming mutual independence between the individuals, it follows that Y i BinN i, π i,..., G. 1 Our goal is to estimate the {π i } and related quantities on the basis of the {x i } and the {y i }. To this end, next we motivate the use of appropriate link functions Transformations Since η i = β 1 x i1 + β 2 x i β p x ip, and π i 0, 1, then clearly π i cannot be regressed on the explanatory variables directly. In order to overcome this obstacle, we consider applying a transformation to π i, yielding a 1

2 new quantity on,, and then relating it to η i. Here are some possible transformations: Logistic i.e. ξ i = logitπ i πi ξ i = log 1 π i 2 Probit ξ i = Φ 1 π i 3 where Φ is the cumulative distribution function c.d.f. for the Standard Normal distribution, i.e. with π i = ξi φudu φu = 1 2π e u2 /2 Complementary log-log ξ i = log[ log1 π i ] Logistic Regression Suitability of the transformations 2-4 depends on the context and situation. However, there are a few reasons for considering the logistic transformation when it is appropriate: π log i can be interpreted as the logarithm of the odds in favour of success; 1 π i it is easy to compute; it is the canonical link for the Binomial error structure. This choice of transformation yields the framework which is known as logistic regression; more formally a are mutually independent. Y i BinN i, π i,,..., G b The explanatory variables provide a set of linear predictors η i = β 1 x i β p x ip,,..., G. 2

3 c: a and b are connected via πi gπ i = log 1 π i with gπ i = η i. The general expression for the deviance is: G D = 2 [y i θ i ˆθ i b θ i + bˆθ i ] 5 Let us derive the form of the deviance under the conditions specified above. calculations 1 it was found that πi φ = 1, a i φ = φ, θ i = log 1 π i bθ i = N i log1 + e θ i 1 = N i log = N 1 + e θ i log1 π i i and hence with π i = µ i /N i. µ i = b θ i = N ie θ i 1 + e θ i = N i π i In previous The maximum likelihood estimates for θ i and bθ i are given by ˆπi Niˆπ i ˆµi ˆθ i = log = log = log 1 ˆπ i N i N iˆπ i N i ˆµ i bˆθ i = N i log1 ˆπ i = N i log 1 ˆµ i Ni ˆµ i = N i log N i N i respectively. The corresponding values for the saturated model are πi yi /N i yi θ i = log = log = log 1 π i 1 y i /N i N i y i b θ i = N i log1 π i = N i log 1 y i Ni y i = N i log N i N i 1 Examples 5 Question 2. 3

4 noting that µ i = y i, and so π i = y i /N i. Plugging these expressions back into 5 and re-arranging terms yields D = 2 G [ y i log yi ] Ni y i + N i y i log ˆµ i N i ˆµ i 6 which has a χ 2 G p distribution approximately under the hypothesis that the model fits the data sufficiently well G > p; however, we should be cautious about the use of D, as the aforementioned distribution may be a poor approximation to the true distribution of D Ungrouped Binary Data To a limited extent, we can deal with data that are ungrouped: in effect, each individual forms its own group, so that G = n. Thus one can estimate parameters in the manner suggested in previous sections, with N i = 1,,..., G= n. However, it appears that, under logistic regression, D is not useful for assessing goodnessof-fit; this is because D can be reduced to an expression that involves fitted values, but not the observed values of the dependent variable, {y i } thus the comparison between fitted and observed values does not appear to exist. The case for this is argued below. Setting N i = 1 for all i, and G = n in 6 yields the expression D = 2 [y i log y i y i log π i + 1 y i log1 y i 1 y i log1 π i ]. Defining 0 0 = 1, then y i log y i and 1 y i log1 y i are both equal to 0 at the two possible values of y i which are 0 and 1. Hence, our expression for D reduces to Next we show that y i log D = 2 = 2 [y i log π i + 1 y i log1 π i ] π i 1 π i [ y i log πi 1 π i ] + log1 π i can be replaced by a term not involving y i. Since Y i Bin1, π i Bernoulliπ i, then the corresponding log-likelihood is given by lβ 1,..., β p = {y i logπ i + 1 y i log1 π i }. 4

5 It can be shown that j =1,..., p. l β j = { yi 1 y } i π i 1 π i x ij π i 1 π i = y i π i x ij Since the { l β j } are equal to 0 at β = β, then at β = β also. p j=1 = β j l β j = y i π i y i π i log πi p β j x ij j=1 = 0 1 π i Hence and thus D becomes D = 2 πi y i log = 1 π i πi π i log 1 π i { } πi π i log + log1 π i. 7 1 π i 12.6 Examples Dose-Response Model Bliss 1935 Recall the data showing the numbers of insects that have died after several hours exposure to a certain pesticide. Dose x i Number of Number log 10 CS 2 mgl 1 insects, n i killed, y i

6 Independence between observations, Binomial error structure, logit link, and linear predictor given by η i = β 1 + β 2 x i was assumed and the model fitted in S-PLUS; an abbreviated version of the output is presented below. > pesticide.glm <- glmprop ~ dose, family = binomial, weights = number, data = pesticide > summarypesticide.glm Call: glmformula = prop ~ dose, family = binomial, data = pesticide, weights = number Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Value Std. Error t value Intercept dose Dispersion Parameter for Binomial family taken to be 1 Null Deviance: on 7 degrees of freedom Residual Deviance: on 6 degrees of freedom Number of Fisher Scoring Iterations: 5 This has a scaled deviance of on 6 degrees of freedom: this corresponds to an approximate p-value of ; this is just about acceptable. Let us try to improve upon the fit by selecting other link functions. To use the probit link, we invoke one of the optional arguments in the glm command; > pesticide.glm <- glmprop ~ dose, family = binomiallink = probit, + weights = number, data = pesticide This time we obtain a scaled deviance of on 6 degrees of freedom: a slightly better fit. Finally, let us try the complementary log-log link: > pesticide.glm <- glmprop ~ dose, family = binomiallink = cloglog, weights = number, + data = pesticide > summarypesticide.glm Coefficients: Value Std. Error t value Intercept dose Dispersion Parameter for Binomial family taken to be 1 6

7 Null Deviance: on 7 degrees of freedom Residual Deviance: on 6 degrees of freedom This time the deviance is on 6 degrees of freedom: a serious improvement! So if there were nothing about the situation or experiment that would preclude the use of such a link, then this could be the model to go for. The estimated values of the parameters are β 1 = , β2 = From these, related quantities like fitted values, and deviances can be evaluated. Indeed, from 4, we have π i = 1 exp[ expξ i ] which implies that π i = 1 exp[ exp η i ] = 1 exp[ exp β 1 + β 2 x i ] The output below shows both an explicit calculation of the fitted values for the {π i } as per the formula above, and the values obtained using fitted > d <- coefpesticide.glm[1] + coefpesticide.glm[2] * dose > d [1] > my.fitted <- 1 - exp - expd my.fitted [1] > fittedpesticide.glm Challenger Mission Dalal et. al1989 Data on the 23 space shuttle flights that occurred before the Challenger mission in 1986 are given in the following table. For each of the 23 missions, data on the temperature, in o F, at the time of flight Temp., and whether at least one primary O-ring suffered thermal distress TD were recorded. Temp TD Temp TD Temp TD

8 TD can be thought of as being the observed probability of thermal distress. We fit a logistic regression model between TD and Temp. > td <- c0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1 > temp <- c66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70, + 81, 76, 79, 75, 76, 58 > shuttle <- data.frametd, temp > shuttle.glm <- glmtd ~ temp, family = binomial, data = shuttle > summaryshuttle.glm Call: glmformula = td ~ temp, family = binomial, data = shuttle Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Value Std. Error t value Intercept temp Dispersion Parameter for Binomial family taken to be 1 Null Deviance: on 22 degrees of freedom Residual Deviance: on 21 degrees of freedom Number of Fisher Scoring Iterations: 4 Correlation of Coefficients: Intercept temp > anovashuttle.glm Analysis of Deviance Table Binomial model Response: td Terms added sequentially first to last Df Deviance Resid. Df Resid. Dev NULL temp Notice that the weights argument has not been specified since we are dealing with ungrouped Binary response data. The residual deviance is on 21 degrees of freedom. However, this is not relevant as this is an example of ungrouped Binomial response data. Examination of the various residuals as discussed in Chapter 10 can provide an alternative route for assessing the goodness-of-fit of the model. Assuming, for the sake of argument, that the model does provide a good fit, then we see 8

9 that π = exp β 1 + β 2 x 1 + exp β 1 + β 2 x where β 1 = and β 2 = , where x is the temperature at the time of flight, and π is the corresponding estimated probability of thermal distress of at least one primary O-ring. Question: What is the predicted probability of thermal distress at 31 o F supposedly the temperature at the time of the Challenger flight? Setting x = 31 in 8 yields π = exp exp = i.e. it is predicted that thermal distress is almost certain at that temperature. Question: At which temperature is it estimated that there is a 50% chance that thermal distress occurs? Substituting for π = 0.5 into 8, we find that the corresponding x satisfies β 1 + β 2 x = 0 8 which implies that i.e. about 65 o F. x = β 1 β 2 = = Exercise 12.1 Try working through Example 6.1 in Krzanowski. 9

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will