Generalized Linear Models: An Introduction

Size: px

Start display at page:

Download "Generalized Linear Models: An Introduction"

Ashlee Wood
5 years ago
Views:

1 Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June by John Fox

2 Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn, generalized linear models (GLMs) extend the range of application of linear statistical models by accommodating response variables with non-normal conditional distributions. Except for the error, the right-hand side of a generalized linear model is essentially the same as for a linear models. Generalized Linear Models: An Introduction 2 1. Goals To introduce the format and structure of generalized linear models To show how the familiar linear, logit, and probit models fit intotheglm framework. To introduce Poisson generalized linear models for count data. To describe diagnostics for generalized linear models. Generalized Linear Models: An Introduction 3 2. The Structure of Generalized Linear Models A generalized linear model consists of three components: 1. A random component, specifying the conditional distribution of the response variable, Y i, given the predictors. Traditionally, the random component is a univariate exponential family the normal (Gaussian), binomial, Poisson, gamma, or inverse-gaussian family of distributions but generalized linear models have been extended beyond the univariate exponential families. Generalized Linear Models: An Introduction 4 2. A linear function of the regressors, called the linear predictor, η i = α + β 1 X i1 + + β k X ik on which the expected value µ i of Y i depends. The X s may include quantitative explanatory, but they may also include transformations of explanatory variables, polynomial terms, contrasts generated from factors, interaction regressors, etc. 3. An invertible link function g(µ i )=η i, which transforms the expectation of the response to the linear predictor. The inverse of the link function is sometimes called the mean function: g 1 (η i )=µ i.

3 Generalized Linear Models: An Introduction 5 Standard link functions and their inverses are shown in the following table: Link η i = g(µ i ) µ i = g 1 (η i ) identity µ i η i log log e µ i e η i inverse inverse-square square-root µ 1 i η 1 i µ 2 i η 1/2 i µi η 2 i µ logit log i 1 e 1 µ i 1+e η i probit Φ 1 (µ i ) Φ(η i ) log-log log e [ log e (µ i )] exp[ exp( η i )] complementary log-log log e [ log e (1 µ i )] 1 exp[ exp(η i )] Generalized Linear Models: An Introduction 6 The logit, probit, log-log, and complementary-log-log links are for binomial data, where Y i represents the observed proportion and µ i the expected proportion of successes in n i binomial trials that is, µ i is the probability of a success. For the probit link, Φ is the standard-normal cumulative distribution function, and Φ 1 is the standard-normal quantile function. An important special case is binary data, where all of the binomial trials are 1, and therefore all of the observed proportions Y i are either 0 or 1. This is the case that we examined previously. Generalized Linear Models: An Introduction 7 For distributions in the exponential families, the conditional variance of Y is a function of the mean µ together with a dispersion parameter φ (as showninthetablebelow). For the binomial and Poisson distributions, the dispersion parameter is fixedto1. For the Gaussian distribution, the dispersion parameter is the usual error variance, which we previously symbolized by σ 2 ε (and which doesn t depend on µ). Generalized Linear Models: An Introduction 8 Family Canonical Link Range of Y i V (Y i η i ) Gaussian identity (, + ) φ 0, 1,..., n i µ binomial logit i (1 µ i ) n i n Poisson log 0, 1, 2,... µ i gamma inverse (0, ) φµ 2 i inverse-gaussian inverse-square (0, ) φµ 3 i

4 Generalized Linear Models: An Introduction 9 The canonical link for each familiy is not only the one most commonly used, but also arises naturally from the general formula for distributions in the exponential families. Other links may be more appropriate for the specific problem at hand One of the strengths of the GLM paradigm in contrast, for example, with transformation of the response variable in a linear model is the separation of the link function from the conditional distribution of the response. GLMs are typically fit to data by the method of maximum likelihood. Denote the maximum-likelihood estimates of the regression parameters as bα, β b 1,..., β b k. These imply an estimate of the mean of the response, bµ i = g 1 (bα + β b 1 x i1 + + β b k x ik ). Generalized Linear Models: An Introduction 10 The log-likelihood for the model, maximized over the regression coefficients, is nx log e L 0 = log e p(bµ i,φ; y i ) i=1 where p( ) is the probability or probability-density function corresponding to the family employed. A saturated model, which dedicates one parameter to each observation, and hence fits the data perfectly, has log-likelihood nx log e L 1 = log e p(y i,φ; y i ) i=1 Twice the difference between these log-likelihoods defines the residual deviance under the model, a generalization of the residual sum of squares for linear models: D(y; bµ) =2(log e L 1 log e L 0 ) Generalized Linear Models: An Introduction 11 Dividing the deviance by the estimated dispersion produces the scaled deviance: D(y; bµ)/ φ. b Likelihood-ratio tests can be formulated by taking differences in the deviance for nested models. Wald tests for individual coefficients are formulated using the estimated asymptotic standard errors of the coefficients. Some familiar examples: Combining the identity link with the Gaussian family produces the normal linear model. The maximum-likelihood estimates for this model are the ordinary least-squares estimates. Combining the logit link with the binomial family produces the logisticregression model (linear-logit model). Combining the probit link with the binomial family produces the linear probit model. Generalized Linear Models: An Introduction 12 Although the logit and probit links are familiar, the log-log and complementary log-log link for binomial data are not. These links are compared in the following graph. The log-log and complementary log-log link may be appropriate when the probability of the response as a function of the linear predictor approaches 0 and 1 asymmetrically. The log-log link can be applied via the complementary log-log link by exchanging the roles of 0 s and 1 s in the response Y.

5 Generalized Linear Models: An Introduction 13 µ i = g 1 (η i ) logit probit log log complementary log log Generalized Linear Models: An Introduction Poisson GLMs for Count Data Poisson generalized linear models arise in two common formally identical but substantively distinguishable contexts: 1. when the response variable in a regression model takes on non-negative integer values, such as a count; 2. to analyze associations among categorical variables in a contingency table of counts. The default link for the Poisson family is the log link η i Generalized Linear Models: An Introduction Poisson Regression I will employ data collected by Ornstein on interlocking director and top-executive positions among 248 major Canadian firms Ornstein performed a least-squares regression of the number of interlocks maintained by each firm on the firm s assets, and dummy variables for the firm s nation of control (Canada, US, UK, Other) and sector of operation (10 categories). Because the response variable is a count, a Poisson linear model might be preferable. The marginal distribution of number of interlocks shows many zero counts and an extreme positive skew: Generalized Linear Models: An Introduction 16 Frequency

6 Generalized Linear Models: An Introduction 17 FittingaPoissonGLMwithloglinktoOrnstein sdataproducesthe following results: Generalized Linear Models: An Introduction 18 Coefficient SE Constant Assets Nation of Control (baseline: Canada) Other United Kingdom United States Sector (baseline: Agriculture and Food) Banking Construction Finance Holding Company Manufacturing Merchandizing Mining Transportation Wood and Forest Products Generalized Linear Models: An Introduction 19 An analysis of deviance table for the model shows that all three explanatory variables have highly statistically significant effects: Source G 2 df p Assets Nation of Control Sector The deviance for the null model (with only a constant) is , and for the full model; thus the analog of the squared multiple correlation is R 2 = =.495 Generalized Linear Models: An Introduction 20 Effect displays for the terms in the Poisson-regression model: log e (a) Assets (billions of dollars) log e Canada Other U.S. U.K. Nation of Control (c) log e (b) WOD TRN FIN MIN HLD MER MAN AGR BNK CON Sector

7 Generalized Linear Models: An Introduction Loglinear Models for Contingency Tables Poisson GLMs may also be used to fit loglinear models to a contingency table of frequency counts, where the object is to model association among the variables in the table. The variables constituting the classifications of the table are treated as explanatory variables in the Poisson model, while the cell count plays the role of the response. We previously examined Campbell et al. s data on voter turnout in the 1956 U. S. presidential election We used a binomial logit model to analyze a three-way contingency table for turnout by perceived closeness of the election and intensity of partisan preference. The binomial logit model treats turnout as the response. An alternative is to construct a log-linear model for the expected cell count. Generalized Linear Models: An Introduction 22 This model looks very much like a three-way ANOVA model, where in place of the cell mean we have the log cell expected count: log µ ijk = µ + α 1(i) + α 2(j) + α 3(k) +α 12(ij) + α 13(ik) + α 23(jk) + α 123(ijk) Here, variable 1 is perceived closeness of the election; variable 2 is intensity of preference; and variable 3 is turnout. Although a term such as α 12(ij) looks like an interaction, it actually models the partial association between variables 1 and 2. The three-way term α 123(ijk) allows the association between any pair of variables to be different in different categories of the third variable; it thus represents an interaction in the usual sense of that concept. In fitting the log-linear model to data, we can use sigma-constraints on the parameters, much as we would for an ANOVA model. Generalized Linear Models: An Introduction 23 In the context of a three-way contingency table, the loglinear model above is a saturated model, because it has as many independent parameters (12) as there are cells in the table. I fit this model to the American Voter data, obtaining the following Type-II analysis of deviance table: Source G 2 df p Closeness Preference Turnout Closeness Preference Closeness Turnout Preference Turnout <.0001 Closeness Preference Turnout Generalized Linear Models: An Introduction 24 Notice that the likelihood-ratio test for the three-way term Closeness Preference Turnout is identical to the test for the Closeness Preference interaction in the logit model in which Turnout is the response variable. In general, as long as we fit the parameters for the associations among the explanatory variable (here Closeness Preference and, of course, its lower-order relatives, Closeness and Preference) and for the marginal distribution of the response (Turnout), the loglinear model for a contingency table is equivalent to a logit model. There is, therefore, no real advantage to using a loglinear model in this setting. Loglinear models, however, can be used to model association in other contexts.

8 Generalized Linear Models: An Introduction Over-Dispersed Binomial and Poisson Models The binomial and Poisson GLMs fix the dispersion parameter φ to 1. It is possible to fit versions of these models in which the dispersion is a free parameter, to be estimated along with the coefficients of the linear predictor The resulting error distribution is not an exponential family. The regression coefficients are unaffected by allowing dispersion different from 1, but the coefficient standard errors are multiplied by the square-root of φ. b Because the estimated dispersion typically exceeds 1, this inflates the standard errors That is, failing to account for over-dispersion produces misleadingly small standard errors. So-called over-dispersed binomial and Poisson models arise in several different circumstances. Generalized Linear Models: An Introduction 26 For example, in modeling proportions, it is possible that the probability of success µ varies for different individuals who share identical values of the predictors (this is called unmodeled heterogeneity ); or the individual successes and failures for a binomial observation are not independent, as required by the binomial distribution. Generalized Linear Models: An Introduction Diagnostics for GLMS Most regression diagnostics extend straightforwardly to generalized linear models. These extensions typically take advantage of the computation of maximum-likelihood estimates for generalized linear models by iterated weighted least squares (IWLS the procedure typically used to fit GLMs). Generalized Linear Models: An Introduction Outlier, Leverage, and Influence Diagnostics Hat-Values Hat-values for a generalized linear model can be taken directly from the final iteration of IWLS. They have the usual interpretation except that the hat-values in a GLM depend on Y as well as on the configuration of the X s.

9 Generalized Linear Models: An Introduction Residuals Several kinds of residuals can be defined for generalized linear models: Response residuals are simply the differences between the observed response and its estimated expected value: Y i bµ i. Working residuals are the residuals from the final WLS fit. These may be used to define partial residuals for component-plusresidual plots (see below). Pearson residuals are case-wise components of the Pearson goodness-of-fit statistic for the model: bφ 1/2 (Y i bµ i ) q bv (Y i η i ) where φ is the dispersion parameter for the model and V (Y i η i ) is the variance of the response given the linear predictor. Generalized Linear Models: An Introduction 30 Standardized Pearson residuals correct for the conditional response variation and for the leverage of the observations: Y i bµ R Pi = q i bv (Y i η i )(1 h i ). Deviance residuals, D i, are the square-roots of the case-wise components of the residual deviance, attaching the sign of Y i bµ i. Standardized deviance residuals are R Di = q bφ(1 h i ) Several different approximations to studentized residuals have been suggested. To calculate exact studentized residuals would require literally refitting the model deleting each observation in turn, and noting the decline in the deviance. D i Generalized Linear Models: An Introduction 31 Here is an approximation q due to Williams: Ei = (1 h i )RDi 2 + h irpi 2 where, once again, the sign is taken from Y i bµ i. A Bonferroni outlier test using the standard normal distribution may be based on the largest absolute studentized residual. Generalized Linear Models: An Introduction Influence Measures An approximation to Cook s distance influence measure is D i = R2 Pi bφ(k +1) h i 1 h i Note: Influence on coefficients = outlyingness leverage. Approximate values of DFBETA ij and DFBETAS ij (influence and standardized influence on each coefficient) may be obtained directly from the final iteration of the IWLS procedure. There are two largely similar extensions of added-variable plots to generalized linear models, one due to Wang and another to Cook and Weisberg.

10 Generalized Linear Models: An Introduction 33 The following graph shows hat-values, studentized residuals, and Cook s distances for the quasi-poisson model fit to Ornstein s interlocking directorate data: Generalized Linear Models: An Introduction 34 One observation number 1, the corporation with the largest assets stands out by combining a very large hat-value with the biggest absolute studentized residual. This point is not a statistically significant outlier Studentized Residuals Hat Values Generalized Linear Models: An Introduction 35 As shown in a DFBETA plot, observation 1 makes the coefficient of assets substantially smaller than it would otherwise be (recall that the coefficient for assets is ): DFBETA Assets In this case, the approximate DFBETA is quite accurate: If observation 1 is deleted, the assets coefficient increases to Index Generalized Linear Models: An Introduction Nonlinearity Diagnostics Component-plus-residual plots also extend straightforwardly to generalized linear models. Nonparametric smoothing of the resulting scatterplots can be important to interpretation, especially in models for binary responses, where the discreteness of the response makes the plots difficult to examine. Similar effects can occur for binomial and Poisson data. Component-plus-residual plots use the linearized model from the last step of the IWLS fit. For example, the partial residual for X j adds the working residual to B j X ij. The component-plus-residual plot graphs the partial residual against X j.

11 Generalized Linear Models: An Introduction 37 An illustrative component+residual plot, for assets in Ornstein s interlocking-directorate quasi-poisson regression: Generalized Linear Models: An Introduction 38 I therefore investigated transforming assets down the ladder of powers and roots, eventually arriving at the log transformation, the component+residual plot for which appears quite straight: Component + Residual Assets (billions of dollars) Component + Residual This plot is difficult to examine because of the large positive skew in assets, but it appears as if the assets slope is a good deal steeper at the left than at the right Log 10 Assets (billions of dollars) Generalized Linear Models: An Introduction 39 Note the relationship between the problems of influence and nonlinearity in this example: Observation 1 was influential in the original regression because its very large assets gave it high leverage, and because un-modelled nonlinearity put the observation below the erroneously linear fit for assets, pulling the regression surface towards it. Log-transforming assets fixes both of these problems. Generalized Linear Models: An Introduction 40 Alternative effect displays for assets in the transformed model: (a) (b) log e log e Assets (billions of dollars) Assets (billions of dollars)

11. Generalized Linear Models: An Introduction

Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and