Estimated Precision for Predictions from Generalized Linear Models in Sociological Research


Quality & Quantity 34, Kluwer Academic Publishers. Printed in the Netherlands.

TIM FUTING LIAO
Department of Sociology, University of Illinois, 326 Lincoln Hall, 702 S. Wright Street, Urbana, IL 61801, U.S.A.

Abstract. In this paper I present a general method for constructing confidence intervals for predictions from the generalized linear model in sociological research. I demonstrate that the method used for constructing confidence intervals for predictions in classical linear models is indeed a special case of the method for generalized linear models. I examine four such models (the binary logit, the binary probit, the ordinal logit, and the Poisson regression model) to construct confidence intervals for predicted values in the form of probability, odds, Z score, or event count. The estimated confidence interval for an event prediction, when applied judiciously, can give the researcher useful information and an estimated measure of precision for the prediction so that interpretation of estimates from the generalized linear model becomes easier.

Key words: generalized linear models, confidence intervals, predictions, social science methods, logit analysis, Poisson regression.

1. Introduction

Predicted values provide a useful way of interpreting statistical models analyzing response probabilities in sociological research. The usual logit analysis models event probability while the Poisson regression models event count. Alternatively, they can all be seen as members of the family of the generalized linear model (GLM) because of their similarities in properties such as linearity in the systematic component of the model (McCullagh and Nelder, 1989; Nelder and Wedderburn, 1972).
Despite the popularity of predicted probabilities from logit and probit models in empirical sociological research, little systematic discussion exists in the literature about their statistical precision (with the exceptions to be discussed later). In this paper, I discuss a simple method for constructing approximate confidence intervals for predictions from the GLM. The method is a natural application of the theory of the GLM based on some of its known properties. Although in theory one can mathematically solve for the variance of a prediction by treating the prediction as a function of the variances and covariances of the coefficient estimates and working through a Taylor series, the difficulty increases exponentially with the number of independent variables. For instance, we may want to construct a confidence interval for the predicted probabilities from a binary logit model conditional upon a certain combination of values of the explanatory variables. We can easily see that the variance of the predicted probability is a function of the estimated variances and covariances of the parameter estimates. Even with a relatively small number of X variables the mathematical solution becomes intractable.

I present an alternative way of constructing confidence intervals for predicted values in probability or other forms of the dependent variable. Because GLMs all share the characteristic of being linear in the systematic component, estimating the lower and upper bounds of the confidence interval of the linear predictor and then working through the link function gives us a feasible method of estimating confidence limits for a predicted value in probability, odds, Z score, or event count. Standard errors for displayed effects in the GLM are discussed by Fox (1987) and help lay the foundation for this paper. Similar confidence intervals for predicted logit and odds are also used by statisticians (Clogg et al., 1991; Haberman, 1978; Rubin and Schenker, 1987). After reviewing the GLM, I systematically discuss the general method for constructing confidence intervals for predictions in GLMs, and demonstrate with three examples applying the method to the binary logit, the binary probit, the ordinal logit, and the Poisson regression model. Such confidence intervals facilitate interpretation.

2. Generalized Linear Models

Following Dobson (1990), McCullagh and Nelder (1989), and Nelder and Wedderburn (1972), we have a GLM of $y_i$, where the $i$th observation is a realization of a random variable $Y_i$ whose expected value is $\mu_i$.
For the sake of convenience we drop the subscript $i$ and use matrix notation hereafter. The model is specified with its two components and a canonical link function:

(A) The random component: The components of $Y$ have independent and identical distributions from the exponential family with $E(Y) = \mu$ and variance $\sigma^2$ not assumed to be homogeneous, though assumed to vary through $\mu$ alone when heterogeneous.
(B) The systematic component: Covariates $X$ produce a linear predictor $\eta$ given by $\eta = X\beta$.
(C) The random and the systematic components are connected by a link function, $\eta = g(\mu)$, which can take many forms including the following:
(1) Linear: $\eta = \mu$.
(2) Logarithm: $\eta = \log \mu$ or $\mu = e^{\eta}$.
(3) Logit: $\eta = \log[\mu/(1-\mu)]$.
(4) Probit: $\eta = \Phi^{-1}(\mu)$.
(5) Complementary log-log: $\eta = \log\{-\log(1-\mu)\}$.
(6) Log-log: $\eta = -\log\{-\log(\mu)\}$.

The link functions (3) through (6) are applied to observed binomial distributions. By extension, we may specify a multinomial logit link function accordingly, based on observed multinomial distributions (data) and our understanding of the distribution (theory). Indeed, the multinomial distribution belongs to the family of exponential distributions (Barndorff-Nielsen, 1980). Alternatively, we may view this link function as an extension of the function (3) or a special case of the multivariate logit link function discussed by McCullagh and Nelder (1989: 220; see Note 1).

(7) Multinomial logit: $\eta_j = \log(\mu_j/\mu_J)$, where $j$ indicates the $j$th of the $1, \ldots, j, \ldots, J$ response categories.

The choice of a link function depends on the distribution of the data and our theory about the distribution. Specifically, the distribution of the random component in $Y$ (the part that cannot be systematically explained by X variables) determines the link function and the type of GLM, though some distributions may be appropriately modelled with more than one link function. In GLMs, the distribution may come from an exponential family, to which the normal, the binomial, the Poisson distribution, and several others belong (see McCullagh and Nelder, 1989). When the distribution is normal, the linear link function usually, though not always, applies. The link functions based on the binomial and the Poisson distribution are quite useful, because many variables in the social sciences follow these distributions. For example, we often have discrete or categorical dependent variables that call for a logit type of analysis. Sometimes we have event count data, and may use the Poisson regression model (or the negative binomial model for overdispersed data), which is based on the logarithm link function.
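The link functions (1) through (6) and their inverses can be written down directly, which may help when the predicted $\hat\eta$ has to be mapped back to $\hat\mu$. The following is a minimal Python sketch using only the standard library; the dictionary name `LINKS` and the short link names are illustrative choices, not notation from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a (hypothetical) link name to the pair (g, g_inverse),
# where eta = g(mu) and mu = g_inverse(eta).
LINKS = {
    "identity": (lambda mu: mu,
                 lambda eta: eta),
    "log":      (lambda mu: math.log(mu),
                 lambda eta: math.exp(eta)),
    "logit":    (lambda mu: math.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit":   (lambda mu: _STD_NORMAL.inv_cdf(mu),
                 lambda eta: _STD_NORMAL.cdf(eta)),
    "cloglog":  (lambda mu: math.log(-math.log(1.0 - mu)),
                 lambda eta: 1.0 - math.exp(-math.exp(eta))),
    "loglog":   (lambda mu: -math.log(-math.log(mu)),
                 lambda eta: math.exp(-math.exp(-eta))),
}

# Round trip for a probability of 0.3 under the logit link.
g, g_inv = LINKS["logit"]
eta = g(0.3)
mu_back = g_inv(eta)
```

Every inverse link here is monotone increasing in $\eta$, which is what allows confidence limits for $\eta$ to be carried over to $\mu$ limit by limit.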
We may use loglinear models for contingency tables to study event count data when all X variables are categorical.

3. Interval Prediction in the Generalized Linear Model

Using the link function, $\eta = g(\mu)$, we may express the predicted value as a function of the predicted linear predictor, $\hat\eta$, or $\hat\mu = g^{-1}(\hat\eta)$. Therefore, the interval prediction of $\mu$ in probability or other forms in the GLM can be equivalently and reasonably obtained by first constructing a confidence interval for $\hat\eta$. Essentially, we are interested in constructing a confidence interval for $\hat\eta_0$ given a set of certain values in $X_0$, where the subscript 0 indicates a set of given values. When the GLM is a logit model, the method for constructing a confidence interval for $\hat\eta_0$ described below is identical to that for predicting the lower and upper bounds of the logit (Clogg et al., 1991; Rubin and Schenker, 1987) and for predicting those bounds of the odds after exponentiation (Haberman, 1978). In general, the estimated asymptotic variance of $\hat\eta_0$ is a function of $X_0$ and the estimated asymptotic covariance of $\hat\beta$ (Fox, 1987). Because

$$E(\eta_0) = X_0 \beta \tag{1}$$

and asymptotically

$$E(\hat\beta) = \beta, \tag{2}$$

we obtain the error of the prediction,

$$e_0 = E(\eta_0) - \hat\eta_0 = -X_0(\hat\beta - \beta), \tag{3}$$

and its expectation,

$$E(e_0) = 0. \tag{4}$$

The asymptotic variance of $e_0$ then is

$$\mathrm{var}(e_0) = E\{[-X_0(\hat\beta - \beta)][-X_0(\hat\beta - \beta)]'\} = X_0 E[(\hat\beta - \beta)(\hat\beta - \beta)'] X_0' \approx X_0 [I(\beta)]^{-1} X_0' \tag{5}$$

because asymptotically

$$E[(\hat\beta - \beta)(\hat\beta - \beta)'] \approx [I(\beta)]^{-1}, \tag{6}$$

where $I(\beta)$ is the Fisher information matrix. The approximation of the inverse Fisher information matrix to the asymptotic variance of maximum likelihood estimates given by Equation (6) is widely known (see, e.g., McCullagh and Nelder, 1989: 119). Therefore, asymptotically,

$$e_0 \sim N\{0, X_0 [I(\beta)]^{-1} X_0'\} \quad \text{because} \quad (\hat\beta - \beta) \sim N\{0, [I(\beta)]^{-1}\}. \tag{7}$$

The result in (5) is general because the inverse of the Fisher information matrix gives the variance-covariance matrix of the maximum likelihood estimates regardless of the type of GLM. When the error distribution is normal and the link function is identity, we have the classical linear regression model. Because the link function here is identity, $\partial\mu/\partial\eta = 1$. Thus, the result from (5) simplifies to

$$\mathrm{var}(e_0 \mid X_0) = X_0 \hat\sigma^2 (X'X)^{-1} X_0', \tag{8}$$

since

$$I(\beta) = X' \frac{1}{\sigma^2} \left(\frac{\partial\mu}{\partial\eta}\right)^2 X. \tag{9}$$

This result in (8) is, of course, the same as the interval prediction of $Y$ discussed in standard texts on linear regression models (e.g., Johnston, 1984). Here the maximum likelihood and the least squares estimation give the same variance of the parameter estimates, hence the same variance of the prediction. There are two types of interval prediction in linear regression models, individual and mean prediction, the former having a wider confidence interval than the latter because it also carries the variance of the individual disturbance (see Gujarati, 1988; Johnston, 1984; Judge et al., 1982). Here I focus on mean prediction, since it is more widely used. The generalization of the results above to individual prediction may be useful for some researchers but is beyond the scope of this paper.

Using (5), we can directly construct the confidence interval for a mean prediction in a GLM based on the estimated variance-covariance matrix of the estimates. In general, we have

$$\mathrm{var}[E(\eta_0) - \hat\eta_0] \approx X_0 [I(\beta)]^{-1} X_0'. \tag{10}$$

This is just another way to express (5). The predicted upper and lower confidence limits of $\hat\eta_0$ can be plugged into the inverse of any link function because $\hat\mu = g^{-1}(\hat\eta)$ (Fox, 1987). The method given by Equation (10) is simple to apply because $[I(\beta)]^{-1}$, or the estimated asymptotic covariance of $\hat\beta$, is readily available in the output of most statistical software packages. For a classical linear model, the link function is identity; therefore the predicted $\eta_0$ equals the predicted $\mu$, which equals $X_0 \hat\beta$. For four GLMs with a nonlinear link function, I present the specification and the construction of confidence intervals in the following sections.

4. The Binary Logit and Probit Model

For a logit model (with the logit link function), the predicted logit and odds are $\hat\eta_0$ and $e^{\hat\eta_0}$, respectively. Thus, the $100(1-\alpha)$ percent confidence interval for the predicted logit (also see Clogg et al., 1991; Rubin and Schenker, 1987), given $X_0$, is defined by the predicted lower and upper bounds of $\eta$,

$$\hat\eta_0 - z_{\alpha/2}\sqrt{X_0 [I(\beta)]^{-1} X_0'} \le \eta_0 \le \hat\eta_0 + z_{\alpha/2}\sqrt{X_0 [I(\beta)]^{-1} X_0'}, \tag{11}$$

and the confidence limits for the predicted odds (also see Haberman, 1978; Hosmer and Lemeshow, 1989), given $X_0$, are simply the limits above exponentiated. Because $\hat\mu = g^{-1}(\hat\eta)$, the predicted probabilities are

$$P(Y_0 = 1 \mid X_0) = \frac{e^{\hat\eta_0}}{1 + e^{\hat\eta_0}}. \tag{12}$$

Thus, the $100(1-\alpha)$ percent confidence interval for the predicted probability, given $X_0$, is

$$\frac{e^{\hat\eta_0 - z_{\alpha/2}\sqrt{X_0 [I(\beta)]^{-1} X_0'}}}{1 + e^{\hat\eta_0 - z_{\alpha/2}\sqrt{X_0 [I(\beta)]^{-1} X_0'}}} \le P(Y_0 = 1 \mid X_0) \le \frac{e^{\hat\eta_0 + z_{\alpha/2}\sqrt{X_0 [I(\beta)]^{-1} X_0'}}}{1 + e^{\hat\eta_0 + z_{\alpha/2}\sqrt{X_0 [I(\beta)]^{-1} X_0'}}}, \tag{13}$$

where the inverse of the information matrix gives the maximum likelihood estimates of the variance of the parameter estimates. To simplify, we may replace what is inside the square-root sign with $\mathrm{var}(e_0)$ from (10).

A GLM with the probit link function makes the model probit, $\mathrm{Prob}(y = 1) = \Phi(X\beta)$. Similar to the logit case, the $100(1-\alpha)$ percent confidence interval for the predicted probability is defined as

$$\Phi[\hat\eta_0 - z_{\alpha/2}\,\mathrm{se}(e_0)] \le P(Y_0 = 1 \mid X_0) \le \Phi[\hat\eta_0 + z_{\alpha/2}\,\mathrm{se}(e_0)], \tag{14}$$

where $\mathrm{se}(e_0)$, the standard error of $e_0$, is the square root of the result from (10). Similar to the confidence interval for the logit or odds, without going through the probit link function we can have a confidence interval for predicted Z scores. Note that because of the nonlinear logit or probit link function, the interval for predicted probability may be asymmetric.

The following simple example illustrates the behavior of confidence intervals in binary logit and probit models. The cross-tabulated data presented in Table I are drawn from Morgan and Teachman (1988: Table 3). These data are about whether an adolescent aged had ever had sexual intercourse (yes/no) by the time of the survey. In addition to the number of yes and no responses in the two sexes and races, I also give the observed odds of having answered yes rather than no and the observed probability of answering yes in the rightmost two columns.
While someone may want to interpret these odds and probabilities for their own sake, our focus here is on interval predictions. Table II contains the binary logit and probit model estimates. Both independent variables are highly statistically significant, judging by the probability values of the significance test. We see that the estimates from either model will give similar estimated probabilities.
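The steps behind (10) through (14) can be sketched in a few lines of Python: form $\hat\eta_0$, take the quadratic form in the estimated covariance matrix of $\hat\beta$, and push the limits through the inverse link. This is a minimal sketch under stated assumptions, not the code used for the paper; the function names are illustrative, and the coefficient and covariance values are hypothetical numbers chosen for the example, not the estimates in Table II.

```python
import math
from statistics import NormalDist


def eta_interval(x0, beta_hat, cov_beta, level=0.95):
    """Return (lower, central, upper) limits for the linear predictor eta_0.

    x0       : covariate values (include 1.0 for the constant)
    beta_hat : coefficient estimates
    cov_beta : estimated covariance matrix of beta_hat (list of lists),
               i.e. the inverse information matrix reported by software
    """
    k = len(x0)
    eta0 = sum(x0[i] * beta_hat[i] for i in range(k))
    # Quadratic form X_0 V X_0', the estimated variance in Equation (10).
    var_e0 = sum(x0[i] * cov_beta[i][j] * x0[j]
                 for i in range(k) for j in range(k))
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * math.sqrt(var_e0)
    return eta0 - half_width, eta0, eta0 + half_width


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    return NormalDist().cdf(eta)


def prediction_interval(x0, beta_hat, cov_beta, inverse_link, level=0.95):
    """Map the limits of eta_0 through a monotone increasing inverse link."""
    return tuple(inverse_link(v)
                 for v in eta_interval(x0, beta_hat, cov_beta, level))


# Hypothetical numbers for illustration only (not the estimates in Table II).
beta_hat = [0.5, -1.0]
cov_beta = [[0.04, -0.01],
            [-0.01, 0.09]]
x0 = [1.0, 1.0]

logit_limits = eta_interval(x0, beta_hat, cov_beta)           # as in (11)
odds_limits = tuple(math.exp(v) for v in logit_limits)        # limits exponentiated
prob_limits = prediction_interval(x0, beta_hat, cov_beta, inv_logit)  # as in (13)
```

The interval for the logit is symmetric about $\hat\eta_0$ by construction, while the intervals for the odds and the probability generally are not, which is the asymmetry noted in the text. Passing `inv_probit` in place of `inv_logit` gives the probit limits in (14), provided the coefficients and covariance matrix come from a probit fit.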

Table I. Adolescents having ever had sexual intercourse by sex and race from the National Survey of Children
Columns: Race; Sex; Intercourse (Yes, No); Odds; Probability.
Rows: White (Male, Female, Subtotal); Black (Male, Female, Subtotal).
Source: Morgan and Teachman (1988, Table 3).

Table II. Logit model estimated for data in Table I
Columns: Variable; $\hat\beta$ (Logit, Probit); p; $\exp(\hat\beta)$.
Rows: White; Female; Constant; LR Statistic; df.
Values in parentheses (logit, probit): White (0.226), (0.144); Female (0.225), (0.131); Constant (0.226), (0.138). df: 2 (logit), 2 (probit).

Using (11), (13), and (14), I have calculated the upper and lower limits of the confidence intervals for the predicted odds, Z scores, and probabilities based on the two models. These confidence limits as well as the central predictions are presented in Table III. Because of the focus of the paper, I will not emphasize substantive interpretation of the predictions, which are interesting to examine in every example in the paper.

Table III. Predicted odds, Z scores, probabilities, and their 95% confidence intervals using the logit and probit estimates based on the sexual intercourse data
Columns: Race; Sex; Prediction; Logit estimate (Lower, Central, Upper); Probit estimate (Lower, Central, Upper).
Rows: White Male, White Female, Black Male, Black Female, each with two predictions (Odds or Z; Probability).

What do we learn from these confidence intervals? First, a statistically significant parameter estimate by conventional standards does not guarantee nonoverlapping confidence intervals. The confidence interval for males overlaps with that for females in all four situations in the table. This is true regardless of the form of the prediction (probability, odds, or Z score) because the probability is simply a function of its associated odds or Z score. Thus, I now focus my discussion on the predicted probabilities. For whites the overlap is trivial (<0.01 in probability), while it is nontrivial for blacks (about 0.05 in probability). This knowledge cannot be gained by examining the predicted probabilities alone, because the difference in predicted probability between the sexes among blacks (0.16) is actually greater than that between the sexes among whites (0.10). Second, parameters that are statistically more precise are associated with narrower confidence intervals. This is evidenced by the confidence interval within each sex for one race being quite a distance away from the same confidence interval for the other race, while the confidence intervals for the two sexes conditional on race are not that far apart. Finally, the logit and the probit model give similar estimates of the confidence limits in probability. The final remark should come as no surprise, and the similarities between the two models are well discussed in the literature.

Maximum likelihood estimation in the GLM is asymptotically normal. This property still applies in the interval predictions from the models. The sample size of most surveys, including the current example, is large enough for maintaining the property.

When the link function is multinomial logit, we may naturally extend the construction of confidence intervals to the multinomial logit model. I omit a multinomial example because of its similarity to the binomial logit. Suffice it to say that for both the two- and the multiple-outcome model with $J$ outcomes, $\sum_j P_j = 1$ for the probability.
The lower confidence limits for categories $1, 2, \ldots, J-1$ and the upper confidence limit for category $J$ sum to 1, and the upper confidence limits for categories $1, 2, \ldots, J-1$ and the lower confidence limit for category $J$ sum to 1. Because $\sum_j P_j = 1$, in multiple-outcome models the lower confidence limit of $\hat\eta$ gives the lower confidence limit of the predicted probability except for the last category, for which the upper confidence limit of $\hat\eta$ gives the lower confidence limit of the predicted probability, and vice versa.

5. The Ordinal Logit Model

This is a natural extension of the binary-response model (McCullagh and Nelder, 1989). The predicted probability from an ordinal logit model is

$$P(Y_0 = j \mid X_0) = F(\alpha_j - \hat\eta_0) - F(\alpha_{j-1} - \hat\eta_0), \tag{15}$$

where $\alpha_j$ are threshold values between response categories to be estimated with $\beta_j$. In order for all the probabilities to be positive, we must have $0 < \alpha_2 < \alpha_3 < \cdots < \alpha_{J-2}$. The first component of the right-hand side of (15) gives $\mathrm{Prob}(Y_0 \le j)$ while the second component gives $\mathrm{Prob}(Y_0 \le j-1)$. Equation (15) gives the general form for the probability that the response $Y$ falls into category $j$, and $F(\cdot)$ can usually be replaced with either the cumulative logistic distribution function or the cumulative normal distribution function. Here we focus on the ordinal logit model, thus using a cumulative logistic distribution function. The first threshold parameter, $\alpha_1$, is typically normalized to zero so that we have one less parameter to estimate, because the scale is arbitrary and can start from or finish with any value. The confidence limits for the predicted probabilities of $\mathrm{Prob}(Y_0 \le j)$ in the ordinal logit model are

$$L\{\alpha_j - [\hat\eta_0 \pm z_{\alpha/2}\,\mathrm{se}(e_0)]\}, \tag{16}$$

where $L(\cdot)$ substitutes for $F(\cdot)$ to indicate the cumulative logistic distribution function. Obviously, a simple subtraction or addition of the lower or upper confidence limits will not work for (15). Thus, confidence intervals for the cumulative probability in an ordinal logit model are the natural choice.

On the General Social Survey (GSS) during recent years, the respondent was asked a series of questions regarding gender attitudes. The example for the ordinal logit model is based on one such question in the 1989 GSS.
The respondent was asked if he or she strongly agrees, agrees, disagrees, or strongly disagrees that it is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family (Davis and Smith, 1991; see Note 2). An ordinal logit model was estimated using the SAS PROC LOGISTIC procedure, and the results are presented in Table IV (see Note 3). Most of the predictors have a strong influence on the attitude, especially age, sex, and education, which are highly significant. The overall model fit is very good, as indicated by the Likelihood Ratio statistic; the model also passes the statistical test for the proportional odds assumption (results not shown). Predicted probabilities and their confidence intervals, our focus again for this example, are estimated at the mean values of age, religion, and marital status for a white female by various levels of education (Table V). Here the probabilities and confidence intervals are estimated cumulative probabilities and their corresponding confidence intervals, obtained by using the first component on the right-hand side of (15) and (16), respectively.
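The limits in (16) for a cumulative probability can be sketched as below. This is a minimal illustration that follows (16) literally, so the threshold $\alpha_j$ is treated as a fixed number and only the uncertainty in $\hat\eta_0$ enters; the function name and the numerical inputs are hypothetical and are not taken from Table IV.

```python
import math
from statistics import NormalDist


def logistic_cdf(t):
    """Cumulative logistic distribution function L(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def cumulative_prob_interval(alpha_j, eta0, se_e0, level=0.95):
    """Return (lower, central, upper) limits for Prob(Y_0 <= j), as in (16)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    # A larger eta_0 lowers Prob(Y_0 <= j), so the upper limit of eta_0
    # yields the lower limit of the cumulative probability.
    lower = logistic_cdf(alpha_j - (eta0 + z * se_e0))
    central = logistic_cdf(alpha_j - eta0)
    upper = logistic_cdf(alpha_j - (eta0 - z * se_e0))
    return lower, central, upper


# Hypothetical threshold, linear predictor, and standard error.
limits = cumulative_prob_interval(alpha_j=1.2, eta0=0.4, se_e0=0.15)
```

Because $\hat\eta_0$ enters (16) with a negative sign, the limits swap roles relative to the binary case, which the comment in the function makes explicit.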

Table IV. Ordinal logit model estimates using the 1989 GSS gender attitudes data (N = 970)
Columns: Variable; $\hat\beta$; $\mathrm{se}(\hat\beta)$; p; $\exp(\hat\beta)$.
Rows: Age; White?; Female?; Catholic?; Married?; Education; Constant; Constant; LR Statistic; df (6).
Note: The dependent variable is coded 1 if the respondent (strongly) agrees, 2 if disagrees, and 3 if strongly disagrees with the statement that it is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family.
Source: Davis and Smith (1991).

Table V. Predicted probabilities and their 95% confidence intervals using ordinal logit estimates based on the 1989 GSS gender attitudes data
Columns: Education; Not disagree (Lower, Probability, Upper); Not strongly disagree (Lower, Probability, Upper).
Rows: 11 levels of completed education, beginning at 0 year.
Note: The cumulative probabilities and their confidence intervals are estimated at the mean values of age, religion, and marital status for a white female. The difference between the two columns of probabilities gives the probability for answering disagree, and 1 minus the column 2 probability gives that for answering strongly disagree.

The two cumulative predicted probabilities are those for answering strongly agree or agree (i.e., not disagree) and those for answering either of the first two categories or disagree (i.e., not strongly disagree). Several points are worth noting. First, for a white female with the sample mean values of age, Catholic proportion, and marital status, education tends to decrease the likelihood of agreeing with the question, or, put differently, increase the likelihood of disagreeing with traditionalism. This conclusion should come as no surprise. Statistically, the two schedules of predicted probabilities have quite narrow confidence intervals. At every level of education, the upper confidence limit for the probability of not disagreeing is at least 0.05 below the lower confidence limit for the probability of not strongly disagreeing. Moreover, the width of the confidence interval is a function of education: the width of the confidence interval for the probability of not disagreeing tends to decrease with education, while that for the probability of not strongly disagreeing tends to increase with education. This makes substantive sense, because educated respondents tend to be less likely to answer strongly agree or agree while less educated respondents tend to give an answer of strongly agree, agree, or even disagree. It also implies that the demarcation between the disagree and the strongly disagree category is less clear for educated people than for less educated people, whose answers tend to concentrate in the first three response categories. A gain in education makes much difference in answering this attitudinal question. Moving from the lower to the higher levels of education, the confidence intervals associated with 2 years, 8 years, 12 years, 16 years, and 20 years of completed education are entirely nonoverlapping for either of the two schedules of probabilities.

6. The Poisson Regression Model

Event counts, especially rare event counts, are assumed to be generated by a Poisson process, and can be described and modelled by a Poisson distribution. The basic model is the single-parameter Poisson probability mass function

$$f(y_i, \theta_i) = P(Y_i = y_i) = \frac{e^{-\theta_i}\,\theta_i^{y_i}}{y_i!}, \quad \text{for } y_i = 0, 1, 2, \ldots;\ \theta_i > 0, \tag{17}$$

where $\theta_i$ is the expected value of $Y_i$, $E(Y_i)$, with subscript $i$ indicating individual cases. Equation (17) defines the model in terms of probability. Note that this distribution can also generate the loglinear model when all X variables are categorical. We can express the Poisson model in the observed $y_i$ or its expectation $\theta_i$, thereby producing a common formulation of the Poisson regression,

$$\ln \theta_i = \eta_i. \tag{18}$$

This ensures that $\theta_i$ is always greater than 0 because $\theta_i = \exp(\eta_i)$. Equation (18) should remind us of the logarithm link function, $\eta = \log \mu$, which is based on the Poisson distribution.

Based on (17) and (18), the confidence limits for the predicted probabilities from the Poisson regression model, given $X_0$, are defined as

$$\frac{\exp\{-e^{\hat\eta_{0i} \pm z_{\alpha/2}\,\mathrm{se}(e_{0i})}\}\,[e^{\hat\eta_{0i} \pm z_{\alpha/2}\,\mathrm{se}(e_{0i})}]^{y_i}}{y_i!}, \quad \text{for } y_i = 0, 1, 2, \ldots;\ \theta_i > 0. \tag{19}$$

Practically, we only need to calculate the confidence limits as well as predicted probabilities for those response categories having a nontrivial observed frequency rather than up to infinity. As is the case with the confidence interval for predicted odds, the interval for predicted event counts requires rather simple calculations involving only exponentiating the lower and upper limits of $\hat\eta_0$.

Let us use the 1990 GSS data to illustrate the confidence intervals for predicted probabilities from the Poisson regression model. On the 1990 round of the GSS, the respondent was asked about the number of traumatic events (deaths, divorces, unemployments, and hospitalization/disabilities) happening to the respondent in the past year (Davis and Smith, 1991). These were relatively rare events because 65 percent of those who answered the question experienced no such event, 29.6 percent experienced one event, 4.3 percent experienced two events, and the remaining 1.1 percent experienced three events. This resembles a Poisson distribution (see Note 4). In addition, the dispersion parameter discussed by McCullagh and Nelder (1989) is estimated to be , with unity being the equality between the mean and the variance. The estimated dispersion parameter suggests that a negative binomial model is not necessary. A Poisson regression model with the covariates of race, marital status, education, residence in an SMSA, and professional status was estimated, and its parameter estimates are reported in Table VI (see Note 5). All covariates except residence in an SMSA are significant factors in explaining the tendency of experiencing traumatic events.
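The calculation in (19) amounts to exponentiating the limits of $\hat\eta_0$ to obtain limits for the mean event count, and then evaluating the Poisson probability in (17) at each of those limits. The sketch below is illustrative only: the function names are hypothetical, and the inputs are made-up numbers rather than the estimates in Table VI.

```python
import math
from statistics import NormalDist


def poisson_pmf(y, theta):
    """Poisson probability of exactly y events with mean theta, as in (17)."""
    return math.exp(-theta) * theta ** y / math.factorial(y)


def poisson_prediction_limits(eta0, se_e0, max_count=3, level=0.95):
    """Return limits for the mean count and the probabilities they imply.

    The keys 'lower' and 'upper' refer to the limits of the mean count
    theta = exp(eta_0); the probabilities are (17) evaluated at each limit.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    thetas = {
        "lower": math.exp(eta0 - z * se_e0),
        "central": math.exp(eta0),
        "upper": math.exp(eta0 + z * se_e0),
    }
    probs = {name: [poisson_pmf(y, theta) for y in range(max_count + 1)]
             for name, theta in thetas.items()}
    return thetas, probs


# Hypothetical linear predictor (a mean of 0.4 events) and standard error.
thetas, probs = poisson_prediction_limits(eta0=math.log(0.4), se_e0=0.1)
```

The probability of zero events falls as the mean count rises, so the lower limit of the mean count corresponds to the larger zero-event probability. This is why the limits for zero events pair with the opposite limits for the remaining categories when checking that probabilities sum approximately to 1.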
As one would expect, married respondents tend to lead a more secure life; so do people who reside in an SMSA, probably due to greater access to facilities and services. Being white and educational attainment both have positive effects, probably because of a particular traumatic event, divorce.

For examining confidence intervals for predictions, I use marital status and residence in an SMSA (also see Note 5). Therefore, predicted probabilities and their confidence intervals are estimated for a white respondent with 12 years of completed education and mean professional status at various levels of marital and residence status (Table VII). As suggested by the signs of the parameter estimates for marital status and residence in an SMSA, respondents who are not married or who do not reside in an SMSA are predicted to have a higher mean event count, a lower probability of zero events, and a higher probability of one or more events. As suggested by the relative size of standard errors, marital status produces narrower confidence intervals without any overlapping intervals between the married and not married people. On the other hand, SMSA residence is associated with overlapping confidence intervals for probability predictions, though not for event count predictions. Sometimes these overlaps can be quite large (e.g., those for zero event probability). The interval becomes narrower when the probability reduces to very small values for the higher event probabilities. These confidence intervals are quite informative because they give us a precision measure for the predictions, which enables us to better compare the effects of different variables. As in other probability models, corresponding lower and upper confidence limit estimates for probability sum to unity approximately (see notes to Table VII).

Table VI. Poisson regression model estimates using the 1990 GSS trauma data (N = 761)
Columns: Variable; $\hat\beta$; $\mathrm{se}(\hat\beta)$; p; $\exp(\hat\beta)$.
Rows: White?; Married?; Education; NonSMSA?; Professional?; Constant; LR Statistic; df (5).
Note: The dependent variable records the number of traumatic events (deaths, divorces, unemployments, and hospitalization/disabilities) happening to the respondent in the past year.
Source: Davis and Smith (1991).

7. Conclusion

I have discussed in this paper a general yet simple and practical method of constructing confidence intervals for predictions from the GLM. I have shown that the method used for making predictions for classical linear models is merely a special case of the general method for GLMs. I have examined four such models (the binary logit, the binary probit, the ordinal logit, and the Poisson regression model) to demonstrate the construction of the confidence intervals for predicted event probabilities, odds, Z scores, and event counts. As Liao (1994) demonstrated, these predicted values are important for interpreting GLMs. This paper shows that the confidence interval for an event prediction gives the researcher useful information and an estimated precision for the prediction so that interpretation of the GLM results becomes more informative.

Table VII. Predicted counts and probabilities and their 95% confidence intervals using Poisson regression estimates based on the 1990 GSS trauma data
Columns: Marital status; Residence; Mean event count (Lower, Mean, Upper); 0 Event (Lower, Prob, Upper); 1 Event (Lower, Prob, Upper); 2 Events (Lower, Prob, Upper); 3 Events (Lower, Prob, Upper).
Rows: Single (NonSMSA, SMSA); Married (NonSMSA, SMSA).
Note: The mean event count, the probabilities, and their confidence intervals are estimated at the mean of professional status for a white respondent with 12 years of education. All probabilities should approximately sum to 1 across the choice categories, and so should the lower confidence limit estimate for 0 event together with the upper confidence limit estimates for the remaining events, or the upper confidence limit estimate for 0 event together with the lower confidence limit estimates for the remaining events.

Acknowledgements

A version of the paper was presented at the Annual Meeting of the American Sociological Association, August 1993, Miami, Florida, U.S.A. Comments on a previous version of the paper by Kenneth Bollen, Clifford Clogg, Guang Guo, Ronald Rindfuss, and Yu Xie are gratefully acknowledged. The paper has also benefitted from the discussions in the Methodology Session of the meeting, and the discussions with Walter Davis.

Notes

1. For another specification of the relation between a polytomous and multivariate polytomous dependent variables, see Haberman (1979).
2. Because of the small number of respondents answering the first category, the first two categories are combined into a (strongly) agree category for the analysis.
3. Some statistical software, such as the procedure PROC LOGISTIC in SAS, estimates ordinal logit and probit models, but care needs to be taken when calculating probabilities because SAS uses instead the logit counterpart of (16) as given in McCullagh and Nelder (1989), $\log[\mathrm{Prob}(y \le j)/(1 - \mathrm{Prob}(y \le j))] = \alpha_j + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$, where $\alpha_j$ is a composite term of $\alpha_j$ in (16) and $\beta_1$, and $-$ is replaced with $+$. This formulation, of course, will not affect maximum likelihood estimation of coefficients, though users may be surprised to see that every coefficient bears a sign contrary to its substantive expectation.
4. The observed distribution of %, %, 4.336%, and 1.051% for 0, 1, 2, and 3 events, respectively, is very well reproduced with the predicted probability from the Poisson regression estimates when all covariates are set at their mean values. The predictions are %, %, 5.420%, and 0.728% for these event categories, showing that traumatic events closely follow a Poisson distribution.
5. This model is trimmed from a model with two more explanatory variables, age and sex, which both were highly insignificant (with a probability of the significance test greater than 0.67 and 0.48, respectively). Unfortunately, the GSS does not separately collect information on the individual traumatic events. Marital status is included in the model because of its importance in explaining mental and physical health, though it is by definition correlated with divorce, one of the five traumatic events. The exposure variable has month as its unit of analysis for the Poisson regression model.

References

Barndorff-Nielsen, O. E. (1980). Exponential families, Memoirs No. 5, Department of Theoretical Statistics, Institute of Mathematics, University of Aarhus.
Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B. & Weidman, L. (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association 86.
Davis, J. A. & Smith, T. W. (1991). General Social Surveys [MRDF]. Chicago: National Opinion Research Center [producer], Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor].
Dobson, A. J. (1990). An Introduction to Generalized Linear Models. London: Chapman and Hall.
Fox, J. (1987). Effect displays for generalized linear models. In: C. C. Clogg (ed.), Sociological Methodology. Washington: American Sociological Association.

Gujarati, D. N. (1988). Basic Econometrics, 2nd edn. New York: McGraw-Hill.
Haberman, S. J. (1978). Analysis of Qualitative Data, Volume 1. New York: Academic.
Haberman, S. J. (1979). Analysis of Qualitative Data, Volume 2. New York: Academic.
Hosmer, D. W., Jr & Lemeshow, S. (1989). Applied Logistic Regression. New York: Wiley.
Johnston, J. (1984). Econometric Methods, 3rd edn. New York: McGraw-Hill.
Judge, G. G., Hill, R. C., Griffiths, W., Lütkepohl, H. & Lee, T.-C. (1982). Introduction to the Theory and Practice of Econometrics. New York: Wiley.
Liao, T. F. (1994). Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models. Thousand Oaks, CA: Sage.
McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Morgan, S. P. & Teachman, J. D. (1988). Logistic regression: description, examples, and comparisons. Journal of Marriage and the Family 50:
Nelder, J. A. & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society A 135:
Rubin, D. B. & Schenker, N. (1987). Logit-based interval estimation for binomial data using the Jeffreys prior. In: C. C. Clogg (ed.), Sociological Methodology Washington: American Sociological Association, pp


Models for Heterogeneous Choices

Models for Heterogeneous Choices APPENDIX B Models for Heterogeneous Choices Heteroskedastic Choice Models In the empirical chapters of the printed book we are interested in testing two different types of propositions about the beliefs

More information

Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates

Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates WI-ATSA June 2-3, 2016 Overview Brief description of logistic

More information

Applied Generalized Linear Mixed Models: Continuous and Discrete Data

Applied Generalized Linear Mixed Models: Continuous and Discrete Data Carolyn J. Anderson Jay Verkuilen Timothy Johnson Applied Generalized Linear Mixed Models: Continuous and Discrete Data For the Social and Behavioral Sciences February 10, 2010 Springer Contents 1 Introduction...................................................

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Simple ways to interpret effects in modeling ordinal categorical data

Simple ways to interpret effects in modeling ordinal categorical data DOI: 10.1111/stan.12130 ORIGINAL ARTICLE Simple ways to interpret effects in modeling ordinal categorical data Alan Agresti 1 Claudia Tarantola 2 1 Department of Statistics, University of Florida, Gainesville,

More information

Mixed Models for Longitudinal Ordinal and Nominal Outcomes

Mixed Models for Longitudinal Ordinal and Nominal Outcomes Mixed Models for Longitudinal Ordinal and Nominal Outcomes Don Hedeker Department of Public Health Sciences Biological Sciences Division University of Chicago hedeker@uchicago.edu Hedeker, D. (2008). Multilevel

More information

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Generalized Linear Probability Models in HLM R. B. Taylor Department of Criminal Justice Temple University (c) 2000 by Ralph B.

Generalized Linear Probability Models in HLM R. B. Taylor Department of Criminal Justice Temple University (c) 2000 by Ralph B. Generalized Linear Probability Models in HLM R. B. Taylor Department of Criminal Justice Temple University (c) 2000 by Ralph B. Taylor fi=hlml15 The Problem Up to now we have been addressing multilevel

More information

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information