Estimated Precision for Predictions from Generalized Linear Models in Sociological Research


Quality & Quantity 34: 137-152, 2000. © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

TIM FUTING LIAO
Department of Sociology, University of Illinois, 326 Lincoln Hall, 702 S. Wright Street, Urbana, IL 61801, U.S.A.

Abstract. In this paper I present a general method for constructing confidence intervals for predictions from the generalized linear model in sociological research. I demonstrate that the method used for constructing confidence intervals for predictions in classical linear models is indeed a special case of the method for generalized linear models. I examine four such models (the binary logit, the binary probit, the ordinal logit, and the Poisson regression model) to construct confidence intervals for predicted values in the form of probability, odds, Z score, or event count. The estimated confidence interval for an event prediction, when applied judiciously, can give the researcher useful information and an estimated measure of precision for the prediction so that interpretation of estimates from the generalized linear model becomes easier.

Key words: generalized linear models, confidence intervals, predictions, social science methods, logit analysis, Poisson regression.

1. Introduction

Predicted values provide a useful way of interpreting statistical models analyzing response probabilities in sociological research. The usual logit analysis models event probability while the Poisson regression models event count. Alternatively, they can all be seen as members of the family of the generalized linear model (GLM) because of their similarities in properties such as linearity in the systematic component of the model (McCullagh and Nelder, 1989; Nelder and Wedderburn, 1972).
Despite the popularity of predicted probabilities from logit and probit models in empirical sociological research, little systematic discussion exists in the literature about their statistical precision (with the exceptions to be discussed later). In this paper, I discuss a simple method for constructing approximate confidence intervals for predictions from the GLM. The method is a natural application of the theory of the GLM based on some of its known properties. Although in theory one can mathematically solve for the variance of a prediction by treating the prediction as a function of the variances and covariances of the coefficient estimates and working through a Taylor series, the difficulty increases exponentially with the number of independent variables. For instance, we may want to construct a confidence interval for the predicted probabilities from a binary logit model conditional upon a certain combination of values of the explanatory variables. We can easily see that the variance of the predicted probability is a function of the estimated variances and covariances of the parameter estimates. Even with a relatively small number of X variables the mathematical solution becomes intractable.

I present an alternative way of constructing confidence intervals for predicted values in probability or other forms of the dependent variable. Because GLMs all share the characteristic of being linear in the systematic component, estimating the lower and upper bounds of the confidence interval of the linear predictor and then working through the link function gives us a feasible method of estimating confidence limits for a predicted value in probability, odds, Z score, or event count. Standard errors for displayed effects in the GLM are discussed by Fox (1987) and help lay the foundation for this paper. Similar confidence intervals for predicted logits and odds are also used by statisticians (Clogg et al., 1991; Haberman, 1978; Rubin and Schenker, 1987).

After reviewing the GLM, I systematically discuss the general method for constructing confidence intervals for predictions in GLMs, and demonstrate with three examples applying the method to the binary logit, the binary probit, the ordinal logit, and the Poisson regression model. Such confidence intervals facilitate interpretation.

2. Generalized Linear Models

Following Dobson (1990), McCullagh and Nelder (1989), and Nelder and Wedderburn (1972), we have a GLM of $y_i$, where the ith observation is a realization of a random variable $Y_i$ whose expected value is $\mu_i$.
For the sake of convenience we drop the subscript i and use matrix notation hereafter. The model is specified with its two components and a canonical link function:

(A) The random component: The components of $Y$ have independent and identical distributions from the exponential family with $E(Y) = \mu$ and variance $\sigma^2$ not assumed to be homogeneous, though assumed to vary through $\mu$ alone when heterogeneous.
(B) The systematic component: Covariates $X$ produce a linear predictor $\eta$ given by $\eta = X\beta$.
(C) The random and the systematic components are connected by a link function, $\eta = g(\mu)$, which can take many forms including the following:

(1) Linear: $\eta = \mu$.
(2) Logarithm: $\eta = \log \mu$, or $\mu = e^{\eta}$.
(3) Logit: $\eta = \log[\mu/(1 - \mu)]$.
(4) Probit: $\eta = \Phi^{-1}(\mu)$.
(5) Complementary log-log: $\eta = \log\{-\log(1 - \mu)\}$.
(6) Log-log: $\eta = \log\{-\log(\mu)\}$.

The link functions (3) through (6) are applied to observed binomial distributions. By extension, we may specify a multinomial logit link function accordingly, based on observed multinomial distributions (data) and our understanding of the distribution (theory). Indeed, the multinomial distribution belongs to the exponential family of distributions (Barndorff-Nielsen, 1980). Alternatively, we may view this link function as an extension of function (3) or a special case of the multivariate logit link function discussed by McCullagh and Nelder (1989: 220).¹

(7) Multinomial logit: $\eta_j = \log(\mu_j/\mu_J)$, where j indicates the jth of the $1, \ldots, j, \ldots, J$ response categories.

The choice of a link function depends on the distribution of the data and our theory about the distribution. Specifically, the distribution of the random component in $Y$ (the part that cannot be systematically explained by the X variables) determines the link function and the type of GLM, though some distributions may be appropriately modelled with more than one link function. In GLMs, the distribution may come from an exponential family, to which the normal, the binomial, the Poisson distribution, and several others belong (see McCullagh and Nelder, 1989). When the distribution is normal, the linear link function usually, though not always, applies. The link functions based on the binomial and the Poisson distribution are quite useful, because many variables in the social sciences follow these distributions. For example, we often have discrete or categorical dependent variables that call for a logit type of analysis. Sometimes we have event count data, and may use the Poisson regression model (or the negative binomial model for overdispersed data), which is based on the logarithm link function.
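The link functions listed above, together with their inverses, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the dictionary `LINKS` and the helper `round_trip` are hypothetical names introduced here, not part of the paper. The point is simply that each link $g$ has an inverse $g^{-1}$ that maps a linear predictor back to the scale of $\mu$, which is what the interval method relies on.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a link name to (g, g_inverse),
# with eta = g(mu) and mu = g_inverse(eta).
LINKS = {
    "linear": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}


def round_trip(name, mu):
    """Apply a link and then its inverse; the result should equal mu."""
    g, g_inverse = LINKS[name]
    return g_inverse(g(mu))


for name in LINKS:
    print(name, round_trip(name, 0.3))
```

Because every inverse link here is monotone, limits computed on the scale of $\eta$ carry over to limits on the scale of $\mu$.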
We may use loglinear models for contingency tables to study event count data when all X variables are categorical.

3. Interval Prediction in the Generalized Linear Model

Using the link function, $\eta = g(\mu)$, we may express the predicted value as a function of the predicted linear predictor, $\hat\eta$, or $\hat\mu = g^{-1}(\hat\eta)$. Therefore, the interval prediction of $\mu$ in probability or other forms in the GLM can be equivalently and reasonably obtained by first constructing a confidence interval for $\hat\eta$. Essentially, we are interested in constructing a confidence interval for $\hat\eta_0$ given a set of certain values in $X_0$, where the subscript 0 indicates a set of given values. When the GLM is a logit model, the method for constructing a confidence interval for $\hat\eta_0$ described below is identical to that for predicting the lower and upper bounds of the logit (Clogg et al., 1991; Rubin and Schenker, 1987) and for predicting those bounds of the odds after exponentiation (Haberman, 1978). In general, the estimated asymptotic variance of $\hat\eta_0$ is a function of $X_0$ and the estimated asymptotic covariance of $\hat\beta$ (Fox, 1987). Because

\[ E(\eta_0) = X_0 \beta \tag{1} \]

and asymptotically

\[ E(\hat\beta) = \beta, \tag{2} \]

we obtain the error of the prediction,

\[ e_0 = E(\eta_0) - \hat\eta_0 = -X_0(\hat\beta - \beta), \tag{3} \]

and its expectation,

\[ E(e_0) = 0. \tag{4} \]

The asymptotic variance of $e_0$ then is

\[ \mathrm{var}(e_0) = E\{[-X_0(\hat\beta - \beta)][-X_0(\hat\beta - \beta)]'\} = X_0 E[(\hat\beta - \beta)(\hat\beta - \beta)'] X_0' \approx X_0 [I(\beta)]^{-1} X_0' \tag{5} \]

because asymptotically

\[ E[(\hat\beta - \beta)(\hat\beta - \beta)'] \approx [I(\beta)]^{-1}, \tag{6} \]

where $I(\beta)$ is the Fisher information matrix. The approximation of the asymptotic variance of maximum likelihood estimates by the inverse Fisher information matrix, given by Equation (6), is widely known (see, e.g., McCullagh and Nelder, 1989: 119). Therefore, asymptotically,

\[ e_0 \sim N\{0, X_0 [I(\beta)]^{-1} X_0'\} \quad \text{because} \quad (\hat\beta - \beta) \sim N\{0, [I(\beta)]^{-1}\}. \tag{7} \]

The result in (5) is general because the inverse of the Fisher information matrix gives the variance-covariance matrix of the maximum likelihood estimates regardless of the type of GLM. When the error distribution is normal and the link function is identity, we have the classical linear regression model. Because the link function here is identity, $\partial\mu/\partial\eta = 1$. Thus, the result from (5) simplifies to

\[ \mathrm{var}(e_0 \mid X_0) = X_0 \hat\sigma^2 (X'X)^{-1} X_0' \tag{8} \]

since

\[ I(\beta) = X' \left[ \frac{1}{\sigma^2} \left( \frac{\partial\mu}{\partial\eta} \right)^2 \right] X. \tag{9} \]

With $\partial\mu/\partial\eta = 1$, (9) reduces to $I(\beta) = X'X/\sigma^2$, whose inverse is $\sigma^2 (X'X)^{-1}$. The result in (8) is, of course, the same as the interval prediction of Y discussed in standard texts on linear regression models (e.g., Johnston, 1984). Here maximum likelihood and least squares estimation give the same variance of the parameter estimates, hence the same variance of the prediction. There are two types of interval prediction in linear regression models, individual and mean prediction, the former having a wider confidence interval than the latter because it adds the variance of the individual disturbance to the variance of the estimated mean (see Gujarati, 1988; Johnston, 1984; Judge et al., 1982). Here I focus on mean prediction, since it is more widely used. The generalization of the results above to individual prediction may be useful for some researchers but is beyond the scope of this paper.

Using (5), we can directly construct the confidence interval for a mean prediction in a GLM based on the estimated variance-covariance matrix of the estimates. In general, we have

\[ \mathrm{var}[E(\eta_0) - \hat\eta_0] \approx X_0 [I(\beta)]^{-1} X_0'. \tag{10} \]

This is just another way to express (5). The predicted upper and lower confidence limits of $\hat\eta_0$ can be plugged into the inverse of any link function because $\hat\mu = g^{-1}(\hat\eta)$ (Fox, 1987). The method given by Equation (10) is simple to apply because $[I(\beta)]^{-1}$, the estimated asymptotic covariance of $\hat\beta$, is readily available in the output of most statistical software packages. For a classical linear model, the link function is identity; therefore the predicted $\eta_0$ equals the predicted $\mu$, which equals $X_0 \hat\beta$. For four GLMs with a nonlinear link function, I present the specification and the construction of confidence intervals in the following sections.

4. The Binary Logit and Probit Model

For a logit model (with the logit link function), the predicted logit and odds are $\hat\eta_0$ and $e^{\hat\eta_0}$, respectively.
Thus, the $100(1 - \alpha)$ percent confidence interval for the predicted logit (also see Clogg et al., 1991; Rubin and Schenker, 1987), given $X_0$, is defined by the predicted lower and upper bounds of $\eta$,

\[ \hat\eta_0 - z_{\alpha/2} \sqrt{X_0 [I(\beta)]^{-1} X_0'} \le \eta_0 \le \hat\eta_0 + z_{\alpha/2} \sqrt{X_0 [I(\beta)]^{-1} X_0'}, \tag{11} \]

and the limits for the predicted odds (also see Haberman, 1978; Hosmer and Lemeshow, 1989), given $X_0$, are simply the limits above exponentiated. Because $\hat\mu = g^{-1}(\hat\eta)$, the predicted probabilities are

\[ P(Y_0 = 1 \mid X_0) = \frac{e^{\hat\eta_0}}{1 + e^{\hat\eta_0}}. \tag{12} \]

Thus, the $100(1 - \alpha)$ percent confidence interval for the predicted probability, given $X_0$, is

\[ \frac{e^{\hat\eta_0 - z_{\alpha/2} \sqrt{X_0 [I(\beta)]^{-1} X_0'}}}{1 + e^{\hat\eta_0 - z_{\alpha/2} \sqrt{X_0 [I(\beta)]^{-1} X_0'}}} \le P(Y_0 = 1 \mid X_0) \le \frac{e^{\hat\eta_0 + z_{\alpha/2} \sqrt{X_0 [I(\beta)]^{-1} X_0'}}}{1 + e^{\hat\eta_0 + z_{\alpha/2} \sqrt{X_0 [I(\beta)]^{-1} X_0'}}}, \tag{13} \]

where the inverse of the information matrix gives the maximum likelihood estimates of the variances and covariances of the parameter estimates. To simplify, we may replace what is inside the square-root sign with $\mathrm{var}(e_0)$ from (10).

A GLM with the probit link function makes the model probit, $\mathrm{prob}(y = 1) = \Phi(X\beta)$. Similar to the logit case, the $100(1 - \alpha)$ percent confidence interval for the predicted probability is defined as

\[ \Phi[\hat\eta_0 - z_{\alpha/2}\, \mathrm{se}(e_0)] \le P(Y_0 = 1 \mid X_0) \le \Phi[\hat\eta_0 + z_{\alpha/2}\, \mathrm{se}(e_0)], \tag{14} \]

where $\mathrm{se}(e_0)$, the standard error of $e_0$, is the square root of the result from (10). Similar to the confidence interval for the logit or odds, without going through the inverse of the probit link function we have a confidence interval for predicted Z scores. Note that because of the nonlinear logit or probit link function, the interval for the predicted probability may be asymmetric.

The following simple example illustrates the behavior of confidence intervals in binary logit and probit models. The cross-tabulated data presented in Table I are drawn from Morgan and Teachman (1988: Table 3). These data are about whether an adolescent aged 15-16 had ever had sexual intercourse (yes/no) by the time of the survey. In addition to the number of yes and no responses by sex and race, I also give the observed odds of having answered yes rather than no and the observed probability of answering yes in the rightmost two columns.
Although these odds and probabilities may be of interest in their own right, our focus here is on interval predictions. Table II contains the binary logit and probit model estimates. Both independent variables are highly statistically significant, judging by the probability values of the significance tests. We see that the estimates from either model give similar estimated probabilities.
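The claim that the two models give similar predicted probabilities can be checked directly from the rounded coefficients in Table II. The sketch below is illustrative only: it assumes the White and Female coefficients are negative (as implied by the values of exp(β̂) below 1 in Table II and by the negative Z scores in Table III), and the names `LOGIT`, `PROBIT`, and `predicted_probabilities` are hypothetical. Because the coefficients are rounded to three decimals, the results agree with the central predictions in Table III only to about three decimal places.

```python
import math
from statistics import NormalDist

# Rounded coefficients from Table II; the negative signs are an assumption
# inferred from exp(beta) < 1 for White and Female.
LOGIT = {"constant": 0.192, "white": -1.314, "female": -0.648}
PROBIT = {"constant": 0.106, "white": -0.789, "female": -0.377}


def linear_predictor(coef, white, female):
    """eta_0 = X_0 * beta for one combination of the two dummies."""
    return coef["constant"] + coef["white"] * white + coef["female"] * female


def predicted_probabilities(white, female):
    """Central predicted probabilities under the logit and probit models."""
    eta_logit = linear_predictor(LOGIT, white, female)
    eta_probit = linear_predictor(PROBIT, white, female)
    p_logit = 1.0 / (1.0 + math.exp(-eta_logit))   # inverse logit link
    p_probit = NormalDist().cdf(eta_probit)        # inverse probit link
    return p_logit, p_probit


for race, white in (("White", 1), ("Black", 0)):
    for sex, female in (("Male", 0), ("Female", 1)):
        p_logit, p_probit = predicted_probabilities(white, female)
        print(f"{race:5s} {sex:6s} logit={p_logit:.3f} probit={p_probit:.3f}")
```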

Table I. Adolescents having ever had sexual intercourse by sex and race from the National Survey of Children

                     Intercourse
Race    Sex          Yes     No      Odds       Probability
White   Male          43    134     0.32090     0.24294
        Female        26    149     0.17450     0.14857
        Subtotal      69    283     0.24362     0.19602
Black   Male          29     23     1.26087     0.55769
        Female        22     36     0.61111     0.37931
        Subtotal      51     59     0.86441     0.46364

Source: Morgan and Teachman (1988, Table 3).

Table II. Logit model estimated for data in Table I

                     β̂
Variable        Logit      Probit      p              exp(β̂)
White          -1.314     -0.789      0.000/0.000     0.269
               (0.226)    (0.144)
Female         -0.648     -0.377      0.004/0.004     0.523
               (0.225)    (0.131)
Constant        0.192      0.106      0.365/0.584     1.212
               (0.226)    (0.138)
LR Statistic   37.459     37.379
df              2          2

Using (11), (13), and (14), I have calculated the upper and lower limits of the confidence intervals for the predicted odds, Z scores, and probabilities based on the two models. These confidence limits as well as the central predictions are presented in Table III. Because of the focus of the paper, I will not emphasize substantive interpretation of the predictions, which are interesting to examine in every example in the paper.

Table III. Predicted odds, Z scores, probabilities, and their 95% confidence intervals using the logit and probit estimates based on the sexual intercourse data

                                 Logit estimate                  Probit estimate
Race   Sex     Prediction    Lower    Central  Upper       Lower     Central   Upper
White  Male    Odds or Z     0.23706  0.32591  0.44808    -0.86788  -0.68318  -0.49848
               Probability   0.19163  0.24580  0.30943     0.19273   0.24725   0.30907
       Female  Odds or Z     0.11718  0.17051  0.24811    -1.26649  -1.06008  -0.85367
               Probability   0.10489  0.14567  0.19879     0.10267   0.14455   0.19664
Black  Male    Odds or Z     0.77844  1.21208  1.88729    -0.16551   0.10579   0.37709
               Probability   0.43771  0.54794  0.65365     0.43427   0.54213   0.64695
       Female  Odds or Z     0.41030  0.63414  0.98008    -0.53719  -0.27111  -0.00503
               Probability   0.29093  0.38806  0.49497     0.29557   0.39315   0.49799

What do we learn from these confidence intervals? First, a statistically significant parameter estimate by conventional standards does not guarantee nonoverlapping confidence intervals. The confidence interval for males overlaps with that for females in all four situations in the table. This is true regardless of the form of the prediction (probability, odds, or Z score) because the probability is simply a function of its associated odds or Z score. Thus, I now focus my discussion on the predicted probabilities. For whites the overlap is trivial (<0.01 in probability), while it is nontrivial for blacks (about 0.05 in probability). This knowledge cannot be gained by examining the predicted probabilities alone, because the difference in predicted probability between the sexes among blacks (0.16) is actually greater than that between the sexes among whites (0.10). Second, parameters that are statistically more precise are associated with narrower confidence intervals. This is evidenced by the confidence interval within each sex for one race being quite a distance away from the same confidence interval for the other race, while the confidence intervals for the two sexes conditional on race are not that far apart. Finally, the logit and the probit model give similar estimates of the confidence limits in probability. The final remark should come as no surprise, and the similarities between the two models are well discussed in the literature.

Maximum likelihood estimation in the GLM is asymptotically normal. This property still applies in the interval predictions from the models. The sample size of most surveys, including the current example, is large enough for maintaining the property.
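The logit columns of Table III can be approximately reproduced from the counts in Table I. The following is a rough sketch under stated assumptions, not the author's own computation: it fits the binary logit by Newton-Raphson on the four grouped cells, takes the information matrix at the fitted values, and applies (11) and (13) with $X_0 = (1, 1, 0)$ for a white male. The function names (`solve`, `score_and_info`, `fit_logit`, `probability_interval`) are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist

# Table I cells as (white, female, yes, no).
CELLS = [(1, 0, 43, 134), (1, 1, 26, 149), (0, 0, 29, 23), (0, 1, 22, 36)]


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_and_info(beta, cells):
    """Score vector and Fisher information of the grouped binary logit."""
    score = [0.0] * 3
    info = [[0.0] * 3 for _ in range(3)]
    for white, female, yes, no in cells:
        x = [1.0, float(white), float(female)]
        n = yes + no
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        for i in range(3):
            score[i] += x[i] * (yes - n * p)
            for j in range(3):
                info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
    return score, info


def fit_logit(cells, iterations=25):
    """Newton-Raphson fit; returns (beta, information at beta)."""
    beta = [0.0, 0.0, 0.0]  # constant, white, female
    for _ in range(iterations):
        score, info = score_and_info(beta, cells)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta, score_and_info(beta, cells)[1]


def probability_interval(beta, info, x0, level=0.95):
    """Lower, central, and upper predicted probability, as in Eq. (13)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    eta = sum(b * v for b, v in zip(beta, x0))
    # se(e_0) = sqrt(X_0 [I(beta)]^{-1} X_0'), computed without forming the inverse.
    se = math.sqrt(sum(a * v for a, v in zip(x0, solve(info, x0))))
    expit = lambda t: 1.0 / (1.0 + math.exp(-t))
    return expit(eta - z * se), expit(eta), expit(eta + z * se)


beta, info = fit_logit(CELLS)
print([round(b, 3) for b in beta])
print([round(p, 5) for p in probability_interval(beta, info, [1, 1, 0])])
```

Solving $I(\beta) v = X_0'$ rather than inverting $I(\beta)$ is a design choice of this sketch; with packaged software one would normally read the estimated covariance matrix of $\hat\beta$ from the output instead.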
When the link function is multinomial logit, we may naturally extend the construction of confidence intervals to the multinomial logit model. I omit a multinomial example because of its similarity to the binomial logit. Suffice it to say that for both the two- and the multiple-outcome model with J outcomes, $\sum_j P_j = 1$ for the probability. The lower confidence limits for the categories $1, 2, \ldots, J - 1$ and the upper confidence limit for the category J sum to 1, and the upper confidence limits for the categories $1, 2, \ldots, J - 1$ and the lower confidence limit for the category J sum to 1. Because $\sum_j P_j = 1$, in multiple-outcome models the lower confidence limit of $\hat\eta$ gives the lower confidence limit of the predicted probability except for the last category, for which the upper confidence limit of $\hat\eta$ gives the lower confidence limit of the predicted probability, and vice versa.

5. The Ordinal Logit Model

This is a natural extension of the binary-response model (McCullagh and Nelder, 1989). The predicted probability from an ordinal logit model is

\[ P(Y_0 = j \mid X_0) = F(\alpha_j - \hat\eta_0) - F(\alpha_{j-1} - \hat\eta_0), \tag{15} \]

where $\alpha_j$ are threshold values between response categories to be estimated with $\beta_j$. In order for all the probabilities to be positive, we must have $0 < \alpha_2 < \alpha_3 < \cdots < \alpha_{J-2}$. The first component of the right-hand side of (15) gives $\mathrm{Prob}(Y_0 \le j)$ while the second component gives $\mathrm{Prob}(Y_0 \le j - 1)$. Equation (15) gives the general form for the probability that the response Y falls into category j, and $F(\cdot)$ can usually be replaced with either the cumulative logistic distribution function or the cumulative normal distribution function. Here we focus on the ordinal logit model, thus using a cumulative logistic distribution function. The first threshold parameter, $\alpha_1$, is typically normalized to zero so that we have one less parameter to estimate, because the scale is arbitrary and can start from or finish with any value. The confidence limits for the predicted probabilities of $\mathrm{Prob}(Y_0 \le j)$ in the ordinal logit model are

\[ L\{\alpha_j - [\hat\eta_0 \pm z_{\alpha/2}\, \mathrm{se}(e_0)]\}, \tag{16} \]

where $L(\cdot)$ substitutes for $F(\cdot)$ to indicate the cumulative logistic distribution function. Obviously, a simple subtraction or addition of the lower or upper confidence limits will not work for (15). Thus, confidence intervals for the cumulative probability in an ordinal logit model are the natural choice.

In recent years, General Social Survey (GSS) respondents have been asked a series of questions regarding gender attitudes. The example for the ordinal logit model is based on one such question in the 1989 GSS.
The respondent was asked if he or she strongly agrees, agrees, disagrees, or strongly disagrees that "it is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family" (Davis and Smith, 1991).² An ordinal logit model was estimated using the SAS PROC LOGISTIC procedure, and the results are presented in Table IV.³ Most of the predictors have a strong influence on the attitude, especially age, sex, and education, which are highly significant. The overall model fit is very good, as indicated by the likelihood ratio statistic; the model also passes the statistical test for the proportional odds assumption (results not shown). Predicted probabilities and their confidence intervals, our focus again for this example, are estimated at the mean values of age, religion, and marital status for a white female at various levels of education (Table V). Here the probabilities and confidence intervals are estimated cumulative probabilities and their corresponding confidence intervals, obtained by using the first component of the right-hand side of (15) and (16), respectively.
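The limits in (16) can be sketched as follows. This is a minimal illustration with hypothetical inputs (the values of `alpha_j`, `eta_hat`, and `se` below are invented, not the Table IV estimates), and the function name `cumulative_interval` is an assumption. One detail worth making explicit: because $L(\alpha_j - \eta)$ decreases in $\eta$, the upper limit of the linear predictor yields the lower limit of the cumulative probability.

```python
import math
from statistics import NormalDist


def logistic_cdf(t):
    """Cumulative logistic distribution function L(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def cumulative_interval(alpha_j, eta_hat, se, level=0.95):
    """Return (lower, central, upper) for Prob(Y0 <= j) = L(alpha_j - eta)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # L(alpha_j - eta) decreases in eta, so the upper limit of eta
    # gives the lower limit of the cumulative probability.
    lower = logistic_cdf(alpha_j - (eta_hat + z * se))
    central = logistic_cdf(alpha_j - eta_hat)
    upper = logistic_cdf(alpha_j - (eta_hat - z * se))
    return lower, central, upper


# Hypothetical inputs, not taken from the paper.
lower, central, upper = cumulative_interval(alpha_j=0.0, eta_hat=-1.4, se=0.3)
print(round(lower, 5), round(central, 5), round(upper, 5))
```

The interval is symmetric on the logit scale but generally asymmetric on the probability scale, as with the binary models.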

Table IV. Ordinal logit model estimates using the 1989 GSS gender attitudes data (N = 970)

Variable         β̂         se(β)     p         exp(β̂)
Age              0.043     0.004     0.0001     1.044
White?          -0.411     0.189     0.0297     0.663
Female?         -0.684     0.134     0.0001     0.505
Catholic?        0.281     0.149     0.0593     1.324
Married?        -0.087     0.132     0.5113     0.917
Education       -0.170     0.023     0.0001     0.844
Constant 1       0.560     0.404     0.1664     1.751
Constant 2       2.860     0.416     0.0001    17.462
LR Statistic   219.575
df               6

Note: The dependent variable is coded 1 if the respondent (strongly) agrees, 2 if disagrees, and 3 if strongly disagrees with the statement that it is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family.
Source: Davis and Smith (1991).

Table V. Predicted probabilities and their 95% confidence intervals using ordinal logit estimates based on the 1989 GSS gender attitudes data

                  Not disagree                     Not strongly disagree
Education    Lower     Probability  Upper      Lower     Probability  Upper
0 year       0.68940   0.80198      0.88081    0.95514   0.97585      0.98716
2 years      0.63283   0.74260      0.82844    0.94295   0.96642      0.98044
4 years      0.57178   0.67267      0.75978    0.92755   0.95349      0.97044
6 years      0.50749   0.59414      0.67530    0.90810   0.93591      0.95572
8 years      0.44113   0.51047      0.57942    0.88342   0.91230      0.93456
10 years     0.37335   0.42622      0.48083    0.85168   0.88110      0.90533
12 years     0.30411   0.34604      0.39051    0.80969   0.84073      0.86754
14 years     0.23520   0.27375      0.31600    0.75263   0.78993      0.82293
16 years     0.17278   0.21167      0.25659    0.67705   0.72816      0.77388
18 years     0.12209   0.16056      0.20827    0.58483   0.65613      0.72103
20 years     0.08398   0.11991      0.16838    0.48286   0.57613      0.66427

Note: The cumulative probabilities and their confidence intervals are estimated at the mean values of age, religion, and marital status for a white female. The difference between the two columns of probabilities gives the probability for answering disagree, and 1 minus the column 2 probability gives that for answering strongly disagree.
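The note to Table V describes how category probabilities follow from the two cumulative columns. As a small worked illustration (the variable names are hypothetical), the central values for 0 years of education give the following. Only the central predictions are differenced here; as the text points out, the same subtraction applied to the confidence limits would not yield valid limits for (15).

```python
# Central cumulative probabilities from the first row of Table V
# (0 years of education).
p_not_disagree = 0.80198            # Prob(strongly agree or agree)
p_not_strongly_disagree = 0.97585   # Prob(strongly agree, agree, or disagree)

p_agree = p_not_disagree
p_disagree = p_not_strongly_disagree - p_not_disagree
p_strongly_disagree = 1.0 - p_not_strongly_disagree

print(round(p_agree, 5), round(p_disagree, 5), round(p_strongly_disagree, 5))
```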

The two cumulative predicted probabilities are those for answering strongly agree or agree (i.e., not disagree) and those for answering either of the first two categories or disagree (i.e., not strongly disagree). Several points are worth noting. First, for a white female with the sample mean values of age, religion (proportion Catholic), and marital status, education tends to decrease the likelihood of agreeing with the statement, or, put differently, to increase the likelihood of disagreeing with traditionalism. This conclusion should come as no surprise. Statistically, the two schedules of predicted probabilities have quite narrow confidence intervals. At every level of education, the upper confidence limit for the probability of not disagreeing is at least 0.05 below the lower confidence limit for the probability of not strongly disagreeing.

Moreover, the width of the confidence interval is a function of education: the width of the confidence interval for the probability of not disagreeing tends to decrease with education, while that for the probability of not strongly disagreeing tends to increase with education. This makes substantive sense, because educated respondents tend to be less likely to answer strongly agree or agree, while less educated respondents tend to give an answer of strongly agree, agree, or even disagree. It also implies that the demarcation between the disagree and the strongly disagree categories is less clear for educated people than for less educated people, whose answers tend to concentrate in the first three response categories. A gain in education makes much difference in answering this attitudinal question. Moving from the low to the high levels of education, the confidence intervals associated with 2 years, 8 years, 12 years, 16 years, and 20 years of completed education are entirely nonoverlapping for either of the two schedules of probabilities.

6. The Poisson Regression Model

Event counts, especially rare event counts, are assumed to be generated by a Poisson process, and can be described and modelled by a Poisson distribution. The basic model is the single-parameter Poisson probability mass function

$$f(y_i, \theta_i) = P(Y_i = y_i) = \frac{e^{-\theta_i}\,\theta_i^{y_i}}{y_i!}, \quad \text{for } y_i = 0, 1, 2, \ldots;\ \theta_i > 0, \tag{17}$$

where $\theta_i$ is the expected value of $Y_i$, $E(Y_i)$, with subscript $i$ indicating individual cases. Equation (17) defines the model in terms of probability. Note that this distribution can also generate the loglinear model when all X variables are categorical. We can express the Poisson model in terms of the observed $y_i$ or its expectation $\theta_i$, thereby producing a common formulation of the Poisson regression,

$$\ln \theta_i = \eta_i. \tag{18}$$

This ensures that $\theta_i$ is always greater than 0 because $\theta_i = \exp(\eta_i)$. Equation (18) should remind us of the logarithm link function, $\eta = \log \mu$, which is based on the Poisson distribution.
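As a small illustration of (17) and (18), the sketch below evaluates the Poisson probability function for a mean obtained by inverting the log link. The value of `eta` is hypothetical and chosen only for illustration; the function names are likewise not from the paper.

```python
import math

def poisson_pmf(y, theta):
    """P(Y = y) for a Poisson variable with mean theta > 0, as in (17)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return math.exp(-theta) * theta ** y / math.factorial(y)

def mean_from_linear_predictor(eta):
    """Invert the log link of (18): theta = exp(eta), which is always positive."""
    return math.exp(eta)

eta = -0.5  # hypothetical linear predictor, for illustration only
theta = mean_from_linear_predictor(eta)

# Probabilities of 0, 1, 2, and 3 events for this mean.
probs = [poisson_pmf(y, theta) for y in range(4)]
```

Any real value of the linear predictor maps to a valid positive mean, which is the practical reason for modelling $\ln \theta_i$ rather than $\theta_i$ itself.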

Based on (17) and (18), the confidence limits for the predicted probabilities from the Poisson regression model, given $X_0$, are defined as

$$\frac{e^{-e^{\hat{\eta}_{0i} \pm z_{\alpha/2}\,\mathrm{se}(\hat{\eta}_{0i})}} \left[e^{\hat{\eta}_{0i} \pm z_{\alpha/2}\,\mathrm{se}(\hat{\eta}_{0i})}\right]^{y_i}}{y_i!}, \quad \text{for } y_i = 0, 1, 2, \ldots;\ \theta_i > 0. \tag{19}$$

Practically, we only need to calculate the confidence limits and predicted probabilities for those response categories having a nontrivial observed frequency, rather than up to infinity. As is the case with the confidence interval for predicted odds, the interval for predicted event counts requires rather simple calculations involving only exponentiating the lower and upper limits of $\hat{\eta}_0$.

Let us use the 1990 GSS data to illustrate the confidence intervals for predicted probabilities from the Poisson regression model. In the 1990 round of the GSS, the respondent was asked about the number of traumatic events (deaths, divorces, unemployments, and hospitalization/disabilities) happening to the respondent in the past year (Davis and Smith 1991). These were relatively rare events: 65 percent of those who answered the question experienced no such event, 29.6 percent experienced one event, 4.3 percent experienced two events, and the remaining 1.1 percent experienced three events. This resembles a Poisson distribution (see Note 4). In addition, the dispersion parameter discussed by McCullagh and Nelder (1989) is estimated to be 0.88087, where unity indicates equality between the mean and the variance. The estimated dispersion parameter suggests that a negative binomial model is not necessary. A Poisson regression model with the covariates of race, marital status, education, residence in an SMSA, and professional status was estimated, and its parameter estimates are reported in Table VI (see Note 5). All covariates except professional status are significant factors in explaining the tendency of experiencing traumatic events.
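The limits in (19) can be sketched as follows. This is a minimal illustration, not the author's code: the function `predicted_limits` is hypothetical, and the inputs `eta_hat` and `se_eta` are back-calculated from the mean event count limits in the single, non-SMSA row of Table VII (the paper does not report the standard error of the linear predictor directly), so they are assumptions made for illustration.

```python
import math

def poisson_pmf(y, theta):
    """Poisson probability of y events given mean theta."""
    return math.exp(-theta) * theta ** y / math.factorial(y)

def predicted_limits(eta_hat, se_eta, y_values=(0, 1, 2, 3), z=1.96):
    """Plug the limits of the linear predictor into the Poisson probability
    function and return (lower, estimate, upper) for the mean and each y."""
    theta_lo = math.exp(eta_hat - z * se_eta)
    theta = math.exp(eta_hat)
    theta_hi = math.exp(eta_hat + z * se_eta)
    result = {"mean": (theta_lo, theta, theta_hi)}
    for y in y_values:
        a = poisson_pmf(y, theta_lo)
        b = poisson_pmf(y, theta_hi)
        # The two endpoint values are sorted into lower and upper. For y >= 1
        # they bracket the estimate only when the interval for theta does not
        # contain y, which holds here because all means are below 1.
        result[y] = (min(a, b), poisson_pmf(y, theta), max(a, b))
    return result

# Assumed inputs, back-calculated from Table VII (single, non-SMSA row):
# mean event count 0.71842 with limits 0.57480 and 0.89791.
eta_hat = math.log(0.71842)
se_eta = (math.log(0.89791) - math.log(0.57480)) / (2 * 1.96)

limits = predicted_limits(eta_hat, se_eta)
```

Note that the lower limit for the probability of zero events comes from the upper limit of the mean, since $e^{-\theta}$ decreases in $\theta$.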
As one would expect, married respondents tend to lead a more secure life; so do people who reside in an SMSA, probably due to greater access to facilities and services. Being white has a positive effect, probably because of a particular traumatic event, divorce, whereas educational attainment has a negative effect. For examining confidence intervals for predictions, I use marital status and residence in an SMSA (also see Note 5). Therefore, predicted probabilities and their confidence intervals are estimated for a white respondent with 12 years of completed education and mean professional status at various levels of marital and residence status (Table VII). As suggested by the signs of the parameter estimates for marital status and residence in an SMSA, respondents who are not married or who do not reside in an SMSA tend to have a higher predicted mean event count, a lower probability of zero events, and a higher probability of one or more events. As suggested by the relative size of standard errors, marital status produces narrower confidence intervals, with no overlap between the intervals for married and unmarried respondents. On the other hand, SMSA residence is associated with overlapping confidence intervals for both event count and probability predictions. Sometimes these overlaps can be quite large (e.g., those for zero event probability). The intervals become narrower as the probability shrinks to very small values for the higher event counts. These confidence intervals are quite informative because they give us a precision measure for the predictions, which enables us to better compare the effects of different variables. As in other probability models, corresponding lower and upper confidence limit estimates for probability sum to unity approximately (see notes to Table VII).

Table VI. Poisson regression model estimates using the 1990 GSS trauma data (N = 761)

| Variable | $\hat{\beta}$ | se($\hat{\beta}$) | p | exp($\hat{\beta}$) |
|---|---|---|---|---|
| White? | 0.380 | 0.192 | 0.0483 | 1.462 |
| Married? | -0.537 | 0.117 | 0.0001 | 0.584 |
| Education | -0.044 | 0.022 | 0.0471 | 0.957 |
| NonSMSA? | 0.270 | 0.123 | 0.0278 | 1.310 |
| Professional? | -0.209 | 0.174 | 0.2304 | 0.811 |
| Constant | -2.896 | 0.318 | 0.0001 | 0.055 |

LR statistic = 40.515, df = 5.

Note: The dependent variable records the number of traumatic events (deaths, divorces, unemployments, and hospitalization/disabilities) happening to the respondent in the past year. Source: Davis and Smith (1991).

7. Conclusion

I have discussed in this paper a general yet simple and practical method of constructing confidence intervals for predictions from the GLM. I have shown that the method used for making predictions for classical linear models is merely a special case of the general method for GLMs. I have examined four such models (the binary logit, the binary probit, the ordinal logit, and the Poisson regression model) to demonstrate the construction of the confidence intervals for predicted event probabilities, odds, Z scores, and event counts. As Liao (1994) demonstrated, these predicted values are important for interpreting GLMs.
This paper shows that the confidence interval for an event prediction gives the researcher useful information and an estimated precision for the prediction so that interpretation of the GLM results becomes more informative.

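Table VII. Predicted counts and probabilities and their 95% confidence intervals using Poisson regression estimates based on the 1990 GSS trauma data

| Marital status | Residence | Quantity | Lower | Estimate | Upper |
|---|---|---|---|---|---|
| Single | NonSMSA | Mean event count | 0.57480 | 0.71842 | 0.89791 |
| Single | NonSMSA | Prob. of 0 events | 0.40742 | 0.48752 | 0.56282 |
| Single | NonSMSA | Prob. of 1 event | 0.32351 | 0.35024 | 0.36583 |
| Single | NonSMSA | Prob. of 2 events | 0.09298 | 0.12581 | 0.16424 |
| Single | NonSMSA | Prob. of 3 events | 0.01781 | 0.03013 | 0.04916 |
| Single | SMSA | Mean event count | 0.45513 | 0.54859 | 0.66124 |
| Single | SMSA | Prob. of 0 events | 0.51621 | 0.57776 | 0.63467 |
| Single | SMSA | Prob. of 1 event | 0.28872 | 0.31696 | 0.34134 |
| Single | SMSA | Prob. of 2 events | 0.06570 | 0.08694 | 0.11285 |
| Single | SMSA | Prob. of 3 events | 0.00997 | 0.01590 | 0.02487 |
| Married | NonSMSA | Mean event count | 0.33395 | 0.41991 | 0.52800 |
| Married | NonSMSA | Prob. of 0 events | 0.58978 | 0.65710 | 0.71609 |
| Married | NonSMSA | Prob. of 1 event | 0.23914 | 0.27593 | 0.31141 |
| Married | NonSMSA | Prob. of 2 events | 0.03993 | 0.05793 | 0.08221 |
| Married | NonSMSA | Prob. of 3 events | 0.00444 | 0.00811 | 0.01447 |
| Married | SMSA | Mean event count | 0.26320 | 0.32065 | 0.39064 |
| Married | SMSA | Prob. of 0 events | 0.67662 | 0.72568 | 0.76859 |
| Married | SMSA | Prob. of 1 event | 0.20229 | 0.23269 | 0.26432 |
| Married | SMSA | Prob. of 2 events | 0.02662 | 0.03731 | 0.05163 |
| Married | SMSA | Prob. of 3 events | 0.00234 | 0.00399 | 0.00672 |

Note: The mean event count, the probabilities, and their confidence intervals are estimated at the mean of professional status for a white respondent with 12 years of education. All probabilities should approximately sum to 1 across the choice categories, and the lower confidence limit estimate for 0 event and the upper confidence limit estimates for the remaining events, or the upper confidence limit estimate for 0 event and the lower confidence limit estimates for the remaining events, should, too.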
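The sum-to-unity property described in the note to Table VII can be checked directly from the tabulated values. The sketch below copies the probability columns from Table VII; the dictionary layout and the function name `unity_checks` are illustrative choices, not part of the paper. The sums fall slightly short of 1 because the table stops at three events.

```python
# (lower, estimate, upper) for P(0), P(1), P(2), P(3), copied from Table VII.
table_vii = {
    ("Single", "NonSMSA"): [(0.40742, 0.48752, 0.56282), (0.32351, 0.35024, 0.36583),
                            (0.09298, 0.12581, 0.16424), (0.01781, 0.03013, 0.04916)],
    ("Single", "SMSA"): [(0.51621, 0.57776, 0.63467), (0.28872, 0.31696, 0.34134),
                         (0.06570, 0.08694, 0.11285), (0.00997, 0.01590, 0.02487)],
    ("Married", "NonSMSA"): [(0.58978, 0.65710, 0.71609), (0.23914, 0.27593, 0.31141),
                             (0.03993, 0.05793, 0.08221), (0.00444, 0.00811, 0.01447)],
    ("Married", "SMSA"): [(0.67662, 0.72568, 0.76859), (0.20229, 0.23269, 0.26432),
                          (0.02662, 0.03731, 0.05163), (0.00234, 0.00399, 0.00672)],
}

def unity_checks(rows):
    """Return the three sums described in the note to Table VII."""
    p0, rest = rows[0], rows[1:]
    point = sum(r[1] for r in rows)                   # point estimates
    low0_high_rest = p0[0] + sum(r[2] for r in rest)  # lower P(0) + upper others
    high0_low_rest = p0[2] + sum(r[0] for r in rest)  # upper P(0) + lower others
    return point, low0_high_rest, high0_low_rest

sums = {key: unity_checks(rows) for key, rows in table_vii.items()}
for key, values in sums.items():
    print(key, [round(v, 5) for v in values])
```

Pairing the lower limit for zero events with the upper limits for the other categories makes sense because both come from the same (upper) limit of the mean event count.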

Acknowledgements

A version of the paper was presented at the Annual Meeting of the American Sociological Association, 13–17 August 1993, Miami, Florida, U.S.A. Comments on a previous version of the paper by Kenneth Bollen, Clifford Clogg, Guang Guo, Ronald Rindfuss, and Yu Xie are gratefully acknowledged. The paper has also benefitted from the discussions in the Methodology Session of the meeting, and the discussions with Walter Davis.

Notes

1. For another specification of the relation between a polytomous and multivariate polytomous dependent variables, see Haberman (1979).

2. Because of the small number of respondents answering the first category, the first two categories are combined into a (strongly) agree category for the analysis.

3. Some statistical software, such as the procedure PROC LOGISTIC in SAS, estimates ordinal logit and probit models, but care needs to be taken when calculating probabilities because SAS uses, instead of the logit counterpart of (16) as given in McCullagh and Nelder (1989), $\log[\mathrm{Prob}(y \le j)/(1 - \mathrm{Prob}(y \le j))] = \alpha_j + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$, where $\alpha_j$ is a composite term of $\alpha_j$ in (16) and $\beta_1$, and $-$ is replaced with $+$. This formulation, of course, will not affect maximum likelihood estimation of coefficients, though users may be surprised to see that every coefficient bears a sign contrary to its substantive expectation.

4. The observed distribution of 65.046%, 29.566%, 4.336%, and 1.051% for 0, 1, 2, and 3 events, respectively, is very well reproduced by the predicted probabilities from the Poisson regression estimates when all covariates are set at their mean values. The predictions are 66.852%, 26.921%, 5.420%, and 0.728% for these event categories, showing that traumatic events closely follow a Poisson distribution.

5. This model is trimmed from a model with two more explanatory variables, age and sex, both of which were highly insignificant (with a probability of the significance test greater than 0.67 and 0.48, respectively). Unfortunately, the GSS does not separately collect information on the individual traumatic events. Marital status is included in the model because of its importance in explaining mental and physical health, though it is by definition correlated with divorce, one of the five traumatic events. The exposure variable has month as its unit of analysis for the Poisson regression model.

References

Barndorff-Nielsen, O. E. (1980). Exponential families, Memoirs No. 5, Department of Theoretical Statistics, Institute of Mathematics, University of Aarhus.

Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B. & Weidman, L. (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association 86: 68–75.

Davis, J. A. & Smith, T. W. (1991). General Social Surveys, 1972–1991, [MRDF]. Chicago: National Opinion Research Center [producer], Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor].

Dobson, A. J. (1990). An Introduction to Generalized Linear Models. London: Chapman and Hall.

Fox, J. (1987). Effect displays for generalized linear models. In: C. C. Clogg (ed.), Sociological Methodology 1987. Washington: American Sociological Association, pp. 343–361.

Gujarati, D. N. (1988). Basic Econometrics, 2nd edn. New York: McGraw-Hill.

Haberman, S. J. (1978). Analysis of Qualitative Data, Volume 1. New York: Academic.

Haberman, S. J. (1979). Analysis of Qualitative Data, Volume 2. New York: Academic.

Hosmer, D. W., Jr & Lemeshow, S. (1989). Applied Logistic Regression. New York: Wiley.

Johnston, J. (1984). Econometric Methods, 3rd edn. New York: McGraw-Hill.

Judge, G. G., Hill, R. C., Griffiths, W., Lütkepohl, H. & Lee, T.-C. (1982). Introduction to the Theory and Practice of Econometrics. New York: Wiley.

Liao, T. F. (1994). Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models. Thousand Oaks, CA: Sage.

McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd edn. London: Chapman and Hall.

Morgan, S. P. & Teachman, J. D. (1988). Logistic regression: description, examples, and comparisons. Journal of Marriage and the Family 50: 929–936.

Nelder, J. A. & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society A 135: 370–384.

Rubin, D. B. & Schenker, N. (1987). Logit-based interval estimation for binomial data using the Jeffreys prior. In: C. C. Clogg (ed.), Sociological Methodology 1987. Washington: American Sociological Association, pp. 131–144.