Generalized Linear Models: An Introduction

Size: px
Start display at page:

Download "Generalized Linear Models: An Introduction"

Transcription

1 Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June by John Fox

2 Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn, generalized linear models (GLMs) extend the range of application of linear statistical models by accommodating response variables with non-normal conditional distributions. Except for the error, the right-hand side of a generalized linear model is essentially the same as for a linear models. Generalized Linear Models: An Introduction 2 1. Goals To introduce the format and structure of generalized linear models To show how the familiar linear, logit, and probit models fit intotheglm framework. To introduce Poisson generalized linear models for count data. To describe diagnostics for generalized linear models. Generalized Linear Models: An Introduction 3 2. The Structure of Generalized Linear Models A generalized linear model consists of three components: 1. A random component, specifying the conditional distribution of the response variable, Y i, given the predictors. Traditionally, the random component is a univariate exponential family the normal (Gaussian), binomial, Poisson, gamma, or inverse-gaussian family of distributions but generalized linear models have been extended beyond the univariate exponential families. Generalized Linear Models: An Introduction 4 2. A linear function of the regressors, called the linear predictor, η i = α + β 1 X i1 + + β k X ik on which the expected value µ i of Y i depends. The X s may include quantitative explanatory, but they may also include transformations of explanatory variables, polynomial terms, contrasts generated from factors, interaction regressors, etc. 3. An invertible link function g(µ i )=η i, which transforms the expectation of the response to the linear predictor. The inverse of the link function is sometimes called the mean function: g 1 (η i )=µ i.

3 Generalized Linear Models: An Introduction 5 Standard link functions and their inverses are shown in the following table: Link η i = g(µ i ) µ i = g 1 (η i ) identity µ i η i log log e µ i e η i inverse inverse-square square-root µ 1 i η 1 i µ 2 i η 1/2 i µi η 2 i µ logit log i 1 e 1 µ i 1+e η i probit Φ 1 (µ i ) Φ(η i ) log-log log e [ log e (µ i )] exp[ exp( η i )] complementary log-log log e [ log e (1 µ i )] 1 exp[ exp(η i )] Generalized Linear Models: An Introduction 6 The logit, probit, log-log, and complementary-log-log links are for binomial data, where Y i represents the observed proportion and µ i the expected proportion of successes in n i binomial trials that is, µ i is the probability of a success. For the probit link, Φ is the standard-normal cumulative distribution function, and Φ 1 is the standard-normal quantile function. An important special case is binary data, where all of the binomial trials are 1, and therefore all of the observed proportions Y i are either 0 or 1. This is the case that we examined previously. Generalized Linear Models: An Introduction 7 For distributions in the exponential families, the conditional variance of Y is a function of the mean µ together with a dispersion parameter φ (as showninthetablebelow). For the binomial and Poisson distributions, the dispersion parameter is fixedto1. For the Gaussian distribution, the dispersion parameter is the usual error variance, which we previously symbolized by σ 2 ε (and which doesn t depend on µ). Generalized Linear Models: An Introduction 8 Family Canonical Link Range of Y i V (Y i η i ) Gaussian identity (, + ) φ 0, 1,..., n i µ binomial logit i (1 µ i ) n i n Poisson log 0, 1, 2,... µ i gamma inverse (0, ) φµ 2 i inverse-gaussian inverse-square (0, ) φµ 3 i

4 Generalized Linear Models: An Introduction 9 The canonical link for each familiy is not only the one most commonly used, but also arises naturally from the general formula for distributions in the exponential families. Other links may be more appropriate for the specific problem at hand One of the strengths of the GLM paradigm in contrast, for example, with transformation of the response variable in a linear model is the separation of the link function from the conditional distribution of the response. GLMs are typically fit to data by the method of maximum likelihood. Denote the maximum-likelihood estimates of the regression parameters as bα, β b 1,..., β b k. These imply an estimate of the mean of the response, bµ i = g 1 (bα + β b 1 x i1 + + β b k x ik ). Generalized Linear Models: An Introduction 10 The log-likelihood for the model, maximized over the regression coefficients, is nx log e L 0 = log e p(bµ i,φ; y i ) i=1 where p( ) is the probability or probability-density function corresponding to the family employed. A saturated model, which dedicates one parameter to each observation, and hence fits the data perfectly, has log-likelihood nx log e L 1 = log e p(y i,φ; y i ) i=1 Twice the difference between these log-likelihoods defines the residual deviance under the model, a generalization of the residual sum of squares for linear models: D(y; bµ) =2(log e L 1 log e L 0 ) Generalized Linear Models: An Introduction 11 Dividing the deviance by the estimated dispersion produces the scaled deviance: D(y; bµ)/ φ. b Likelihood-ratio tests can be formulated by taking differences in the deviance for nested models. Wald tests for individual coefficients are formulated using the estimated asymptotic standard errors of the coefficients. Some familiar examples: Combining the identity link with the Gaussian family produces the normal linear model. The maximum-likelihood estimates for this model are the ordinary least-squares estimates. Combining the logit link with the binomial family produces the logisticregression model (linear-logit model). Combining the probit link with the binomial family produces the linear probit model. Generalized Linear Models: An Introduction 12 Although the logit and probit links are familiar, the log-log and complementary log-log link for binomial data are not. These links are compared in the following graph. The log-log and complementary log-log link may be appropriate when the probability of the response as a function of the linear predictor approaches 0 and 1 asymmetrically. The log-log link can be applied via the complementary log-log link by exchanging the roles of 0 s and 1 s in the response Y.

5 Generalized Linear Models: An Introduction 13 µ i = g 1 (η i ) logit probit log log complementary log log Generalized Linear Models: An Introduction Poisson GLMs for Count Data Poisson generalized linear models arise in two common formally identical but substantively distinguishable contexts: 1. when the response variable in a regression model takes on non-negative integer values, such as a count; 2. to analyze associations among categorical variables in a contingency table of counts. The default link for the Poisson family is the log link η i Generalized Linear Models: An Introduction Poisson Regression I will employ data collected by Ornstein on interlocking director and top-executive positions among 248 major Canadian firms Ornstein performed a least-squares regression of the number of interlocks maintained by each firm on the firm s assets, and dummy variables for the firm s nation of control (Canada, US, UK, Other) and sector of operation (10 categories). Because the response variable is a count, a Poisson linear model might be preferable. The marginal distribution of number of interlocks shows many zero counts and an extreme positive skew: Generalized Linear Models: An Introduction 16 Frequency

6 Generalized Linear Models: An Introduction 17 FittingaPoissonGLMwithloglinktoOrnstein sdataproducesthe following results: Generalized Linear Models: An Introduction 18 Coefficient SE Constant Assets Nation of Control (baseline: Canada) Other United Kingdom United States Sector (baseline: Agriculture and Food) Banking Construction Finance Holding Company Manufacturing Merchandizing Mining Transportation Wood and Forest Products Generalized Linear Models: An Introduction 19 An analysis of deviance table for the model shows that all three explanatory variables have highly statistically significant effects: Source G 2 df p Assets Nation of Control Sector The deviance for the null model (with only a constant) is , and for the full model; thus the analog of the squared multiple correlation is R 2 = =.495 Generalized Linear Models: An Introduction 20 Effect displays for the terms in the Poisson-regression model: log e (a) Assets (billions of dollars) log e Canada Other U.S. U.K. Nation of Control (c) log e (b) WOD TRN FIN MIN HLD MER MAN AGR BNK CON Sector

7 Generalized Linear Models: An Introduction Loglinear Models for Contingency Tables Poisson GLMs may also be used to fit loglinear models to a contingency table of frequency counts, where the object is to model association among the variables in the table. The variables constituting the classifications of the table are treated as explanatory variables in the Poisson model, while the cell count plays the role of the response. We previously examined Campbell et al. s data on voter turnout in the 1956 U. S. presidential election We used a binomial logit model to analyze a three-way contingency table for turnout by perceived closeness of the election and intensity of partisan preference. The binomial logit model treats turnout as the response. An alternative is to construct a log-linear model for the expected cell count. Generalized Linear Models: An Introduction 22 This model looks very much like a three-way ANOVA model, where in place of the cell mean we have the log cell expected count: log µ ijk = µ + α 1(i) + α 2(j) + α 3(k) +α 12(ij) + α 13(ik) + α 23(jk) + α 123(ijk) Here, variable 1 is perceived closeness of the election; variable 2 is intensity of preference; and variable 3 is turnout. Although a term such as α 12(ij) looks like an interaction, it actually models the partial association between variables 1 and 2. The three-way term α 123(ijk) allows the association between any pair of variables to be different in different categories of the third variable; it thus represents an interaction in the usual sense of that concept. In fitting the log-linear model to data, we can use sigma-constraints on the parameters, much as we would for an ANOVA model. Generalized Linear Models: An Introduction 23 In the context of a three-way contingency table, the loglinear model above is a saturated model, because it has as many independent parameters (12) as there are cells in the table. I fit this model to the American Voter data, obtaining the following Type-II analysis of deviance table: Source G 2 df p Closeness Preference Turnout Closeness Preference Closeness Turnout Preference Turnout <.0001 Closeness Preference Turnout Generalized Linear Models: An Introduction 24 Notice that the likelihood-ratio test for the three-way term Closeness Preference Turnout is identical to the test for the Closeness Preference interaction in the logit model in which Turnout is the response variable. In general, as long as we fit the parameters for the associations among the explanatory variable (here Closeness Preference and, of course, its lower-order relatives, Closeness and Preference) and for the marginal distribution of the response (Turnout), the loglinear model for a contingency table is equivalent to a logit model. There is, therefore, no real advantage to using a loglinear model in this setting. Loglinear models, however, can be used to model association in other contexts.

8 Generalized Linear Models: An Introduction Over-Dispersed Binomial and Poisson Models The binomial and Poisson GLMs fix the dispersion parameter φ to 1. It is possible to fit versions of these models in which the dispersion is a free parameter, to be estimated along with the coefficients of the linear predictor The resulting error distribution is not an exponential family. The regression coefficients are unaffected by allowing dispersion different from 1, but the coefficient standard errors are multiplied by the square-root of φ. b Because the estimated dispersion typically exceeds 1, this inflates the standard errors That is, failing to account for over-dispersion produces misleadingly small standard errors. So-called over-dispersed binomial and Poisson models arise in several different circumstances. Generalized Linear Models: An Introduction 26 For example, in modeling proportions, it is possible that the probability of success µ varies for different individuals who share identical values of the predictors (this is called unmodeled heterogeneity ); or the individual successes and failures for a binomial observation are not independent, as required by the binomial distribution. Generalized Linear Models: An Introduction Diagnostics for GLMS Most regression diagnostics extend straightforwardly to generalized linear models. These extensions typically take advantage of the computation of maximum-likelihood estimates for generalized linear models by iterated weighted least squares (IWLS the procedure typically used to fit GLMs). Generalized Linear Models: An Introduction Outlier, Leverage, and Influence Diagnostics Hat-Values Hat-values for a generalized linear model can be taken directly from the final iteration of IWLS. They have the usual interpretation except that the hat-values in a GLM depend on Y as well as on the configuration of the X s.

9 Generalized Linear Models: An Introduction Residuals Several kinds of residuals can be defined for generalized linear models: Response residuals are simply the differences between the observed response and its estimated expected value: Y i bµ i. Working residuals are the residuals from the final WLS fit. These may be used to define partial residuals for component-plusresidual plots (see below). Pearson residuals are case-wise components of the Pearson goodness-of-fit statistic for the model: bφ 1/2 (Y i bµ i ) q bv (Y i η i ) where φ is the dispersion parameter for the model and V (Y i η i ) is the variance of the response given the linear predictor. Generalized Linear Models: An Introduction 30 Standardized Pearson residuals correct for the conditional response variation and for the leverage of the observations: Y i bµ R Pi = q i bv (Y i η i )(1 h i ). Deviance residuals, D i, are the square-roots of the case-wise components of the residual deviance, attaching the sign of Y i bµ i. Standardized deviance residuals are R Di = q bφ(1 h i ) Several different approximations to studentized residuals have been suggested. To calculate exact studentized residuals would require literally refitting the model deleting each observation in turn, and noting the decline in the deviance. D i Generalized Linear Models: An Introduction 31 Here is an approximation q due to Williams: Ei = (1 h i )RDi 2 + h irpi 2 where, once again, the sign is taken from Y i bµ i. A Bonferroni outlier test using the standard normal distribution may be based on the largest absolute studentized residual. Generalized Linear Models: An Introduction Influence Measures An approximation to Cook s distance influence measure is D i = R2 Pi bφ(k +1) h i 1 h i Note: Influence on coefficients = outlyingness leverage. Approximate values of DFBETA ij and DFBETAS ij (influence and standardized influence on each coefficient) may be obtained directly from the final iteration of the IWLS procedure. There are two largely similar extensions of added-variable plots to generalized linear models, one due to Wang and another to Cook and Weisberg.

10 Generalized Linear Models: An Introduction 33 The following graph shows hat-values, studentized residuals, and Cook s distances for the quasi-poisson model fit to Ornstein s interlocking directorate data: Generalized Linear Models: An Introduction 34 One observation number 1, the corporation with the largest assets stands out by combining a very large hat-value with the biggest absolute studentized residual. This point is not a statistically significant outlier Studentized Residuals Hat Values Generalized Linear Models: An Introduction 35 As shown in a DFBETA plot, observation 1 makes the coefficient of assets substantially smaller than it would otherwise be (recall that the coefficient for assets is ): DFBETA Assets In this case, the approximate DFBETA is quite accurate: If observation 1 is deleted, the assets coefficient increases to Index Generalized Linear Models: An Introduction Nonlinearity Diagnostics Component-plus-residual plots also extend straightforwardly to generalized linear models. Nonparametric smoothing of the resulting scatterplots can be important to interpretation, especially in models for binary responses, where the discreteness of the response makes the plots difficult to examine. Similar effects can occur for binomial and Poisson data. Component-plus-residual plots use the linearized model from the last step of the IWLS fit. For example, the partial residual for X j adds the working residual to B j X ij. The component-plus-residual plot graphs the partial residual against X j.

11 Generalized Linear Models: An Introduction 37 An illustrative component+residual plot, for assets in Ornstein s interlocking-directorate quasi-poisson regression: Generalized Linear Models: An Introduction 38 I therefore investigated transforming assets down the ladder of powers and roots, eventually arriving at the log transformation, the component+residual plot for which appears quite straight: Component + Residual Assets (billions of dollars) Component + Residual This plot is difficult to examine because of the large positive skew in assets, but it appears as if the assets slope is a good deal steeper at the left than at the right Log 10 Assets (billions of dollars) Generalized Linear Models: An Introduction 39 Note the relationship between the problems of influence and nonlinearity in this example: Observation 1 was influential in the original regression because its very large assets gave it high leverage, and because un-modelled nonlinearity put the observation below the erroneously linear fit for assets, pulling the regression surface towards it. Log-transforming assets fixes both of these problems. Generalized Linear Models: An Introduction 40 Alternative effect displays for assets in the transformed model: (a) (b) log e log e Assets (billions of dollars) Assets (billions of dollars)

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociolog 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Generalized Linear Models: An Introduction 1 1. Introduction I A snthesis due to Nelder and Wedderburn, generalized linear

More information

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

For Bonnie and Jesse (again)

For Bonnie and Jesse (again) SECOND EDITION A P P L I E D R E G R E S S I O N A N A L Y S I S a n d G E N E R A L I Z E D L I N E A R M O D E L S For Bonnie and Jesse (again) SECOND EDITION A P P L I E D R E G R E S S I O N A N A

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

6. INFLUENTIAL CASES IN GENERALIZED LINEAR MODELS. The Generalized Linear Model

6. INFLUENTIAL CASES IN GENERALIZED LINEAR MODELS. The Generalized Linear Model 79 6. INFLUENTIAL CASES IN GENERALIZED LINEAR MODELS The generalized linear model (GLM) extends from the general linear model to accommodate dependent variables that are not normally distributed, including

More information

Non-Gaussian Response Variables

Non-Gaussian Response Variables Non-Gaussian Response Variables What is the Generalized Model Doing? The fixed effects are like the factors in a traditional analysis of variance or linear model The random effects are different A generalized

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

More Accurately Analyze Complex Relationships

More Accurately Analyze Complex Relationships SPSS Advanced Statistics 17.0 Specifications More Accurately Analyze Complex Relationships Make your analysis more accurate and reach more dependable conclusions with statistics designed to fit the inherent

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Introduction to the R Statistical Computing Environment

Introduction to the R Statistical Computing Environment Introduction to the R Statistical Computing Environment John Fox McMaster University ICPSR 2012 John Fox (McMaster University) Introduction to R ICPSR 2012 1 / 34 Outline Getting Started with R Statistical

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Stat 8053, Fall 2013: Poisson Regression (Faraway, 2.1 & 3)

Stat 8053, Fall 2013: Poisson Regression (Faraway, 2.1 & 3) Stat 8053, Fall 2013: Poisson Regression (Faraway, 2.1 & 3) The random component y x Po(λ(x)), so E(y x) = λ(x) = Var(y x). Linear predictor η(x) = x β Link function log(e(y x)) = log(λ(x)) = η(x), for

More information

Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Introduction to Generalized Linear Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Introduction (motivation

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

A Guide to Modern Econometric:

A Guide to Modern Econometric: A Guide to Modern Econometric: 4th edition Marno Verbeek Rotterdam School of Management, Erasmus University, Rotterdam B 379887 )WILEY A John Wiley & Sons, Ltd., Publication Contents Preface xiii 1 Introduction

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017 Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved.

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved. This file consists of Chapter 4 of MacAnova User s Guide by Gary W. Oehlert and Christopher Bingham, issued as Technical Report Number 617, School of Statistics, University of Minnesota, March 1997, describing

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March Presented by: Tanya D. Havlicek, ACAS, MAAA ANTITRUST Notice The Casualty Actuarial Society is committed

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

A Practitioner s Guide to Generalized Linear Models

A Practitioner s Guide to Generalized Linear Models A Practitioners Guide to Generalized Linear Models Background The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples. Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

More information

Linear Models, Problems

Linear Models, Problems Linear Models, Problems John Fox McMaster University Draft: Please do not quote without permission Revised January 2003 Copyright c 2002, 2003 by John Fox I. The Normal Linear Model: Structure and Assumptions

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

A. Motivation To motivate the analysis of variance framework, we consider the following example.

A. Motivation To motivate the analysis of variance framework, we consider the following example. 9.07 ntroduction to Statistics for Brain and Cognitive Sciences Emery N. Brown Lecture 14: Analysis of Variance. Objectives Understand analysis of variance as a special case of the linear model. Understand

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Assumptions of Linear Model Homoskedasticity Model variance No error in X variables Errors in variables No missing data Missing data model Normally distributed error Error in

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

Regression and Multivariate Data Analysis Summary

Regression and Multivariate Data Analysis Summary Regression and Multivariate Data Analysis Summary Rebecca Sela February 27, 2006 In regression, one models a response variable (y) using predictor variables (x 1,...x n ). Such a model can be used for

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Generalized linear models

Generalized linear models Generalized linear models Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Generalized linear models Boston College, Spring 2016 1 / 1 Introduction

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

SOS3003 Applied data analysis for social science Lecture note Erling Berge Department of sociology and political science NTNU.

SOS3003 Applied data analysis for social science Lecture note Erling Berge Department of sociology and political science NTNU. SOS3003 Applied data analysis for social science Lecture note 08-00 Erling Berge Department of sociology and political science NTNU Erling Berge 00 Literature Logistic regression II Hamilton Ch 7 p7-4

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Notes for week 4 (part 2)

Notes for week 4 (part 2) Notes for week 4 (part 2) Ben Bolker October 3, 2013 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share & remix noncommercially,

More information