Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models
Generalized Linear Models, part II
Henrik Madsen and Poul Thyregod (IMM-DTU)
Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Kgs. Lyngby
October 2010 (Chapman & Hall)

Today: the generalized linear model; the link function; (estimation); fitted values; residuals; the likelihood ratio test; over-dispersion.

The Generalized Linear Model

Definition (The generalized linear model): Assume that Y_1, Y_2, ..., Y_n are mutually independent and that their densities can be described by an exponential dispersion model with the same variance function V(µ). A generalized linear model for Y_1, Y_2, ..., Y_n describes an affine hypothesis for η_1, η_2, ..., η_n, where η_i = g(µ_i) is a transformation of the mean values µ_1, µ_2, ..., µ_n. The hypothesis is of the form

    H_0: η − η_0 ∈ L,

where L is a linear subspace of R^n of dimension k, and where η_0 denotes a vector of known offset values.

Dimension and design matrix

Definition (Dimension of the generalized linear model): The dimension k of the subspace L for the generalized linear model is called the dimension of the model.

Definition (Design matrix for the generalized linear model): Consider the linear subspace L = span{x_1, ..., x_k}, i.e. the subspace spanned by k vectors (k < n), such that the hypothesis can be written

    η − η_0 = Xβ,  β ∈ R^k,

where X has full rank. The n × k matrix X is called the design matrix. The i-th row of the design matrix is the model vector

    x_i = (x_{i1}, x_{i2}, ..., x_{ik})^T

for the i-th observation.

The link function

Definition (The link function): The link function g(·) describes the relation between the linear predictor η_i and the mean value parameter µ_i = E[Y_i]. The relation is

    η_i = g(µ_i).

The inverse mapping g⁻¹(·) thus expresses the mean value µ as a function of the linear predictor η:

    µ = g⁻¹(η),

that is, µ_i = g⁻¹(x_i^T β) = g⁻¹(Σ_j x_{ij} β_j).

Link functions

The most commonly used link functions η = g(µ) are:

    Name         η = g(µ)          µ = g⁻¹(η)
    identity     µ                 η
    logarithm    ln(µ)             exp(η)
    logit        ln(µ/(1 − µ))     exp(η)/[1 + exp(η)]
    reciprocal   1/µ               1/η
    power        µ^k               η^{1/k}
    square root  √µ                η²
    probit       Φ⁻¹(µ)            Φ(η)
    log-log      ln(−ln(µ))        exp(−exp(η))
    cloglog      ln(−ln(1 − µ))    1 − exp(−exp(η))

Table: Commonly used link functions.
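As a quick numerical check of the logit row of the table, here is a minimal sketch; the helper names `logit` and `inv_logit` are common conventions for illustration, not notation from the slides:

```python
import math

def logit(mu):
    """Link function: eta = g(mu) = ln(mu / (1 - mu)), for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link: mu = g^{-1}(eta) = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# The link and its inverse round-trip: g^{-1}(g(mu)) = mu
mu = 0.3
eta = logit(mu)
mu_back = inv_logit(eta)   # equals 0.3 up to floating-point error
```

The same round-trip property holds for every row of the table, since each µ = g⁻¹(η) column is by definition the inverse of the η = g(µ) column.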

The canonical link

The canonical link is the function that transforms the mean to the canonical location parameter of the exponential dispersion family, i.e. it is the function for which g(µ) = θ. The canonical link functions for the most widely considered densities are:

    Density           Link: η = g(µ)       Name
    Normal            η = µ                identity
    Poisson           η = ln(µ)            logarithm
    Binomial          η = ln[µ/(1 − µ)]    logit
    Gamma             η = 1/µ              reciprocal
    Inverse Gauss     η = 1/µ²             power (k = −2)

Table: Canonical link functions for some widely used densities.

Specification of a generalized linear model

a) Distribution / variance function: specification of the distribution or of the variance function V(µ).
b) Link function: specification of the link function g(·), which describes the function of the mean value that is modelled linearly by the explanatory variables.
c) Linear predictor: specification of the linear dependency g(µ_i) = η_i = x_i^T β.
d) Precision (optional): if needed, the precision is formulated as known individual weights, λ_i = w_i, as a common dispersion parameter, λ = 1/σ², or as a combination λ_i = w_i/σ².

Maximum likelihood estimation

Theorem (Estimation in generalized linear models): Consider the generalized linear model as defined on slide 3 for the observations Y_1, ..., Y_n, and assume that Y_1, ..., Y_n are mutually independent with densities that can be described by an exponential dispersion model with variance function V(·), dispersion parameter σ², and optionally the weights w_i. Assume that the linear predictor is parameterized with β corresponding to the design matrix X. Then the maximum likelihood estimate β̂ of β is found as the solution to

    [X(β)]^T i_µ(µ)(y − µ) = 0,

where X(β) denotes the local design matrix, µ = µ(β) with µ_i(β) = g⁻¹(x_i^T β) denotes the fitted mean values corresponding to the parameters β, and i_µ(µ) is the expected information with respect to µ.
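In practice the score equation is usually solved by iteratively reweighted least squares (IRLS). Below is a minimal sketch for a Poisson model with the canonical log link on made-up toy data; the function name `irls_poisson` and the data are illustrative assumptions, not from the slides:

```python
import numpy as np

def irls_poisson(X, y, n_iter=25):
    """Solve the GLM score equation for a Poisson model with log link
    by iteratively reweighted least squares (unweighted sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                 # inverse link: mu = g^{-1}(eta)
        W = np.diag(mu)                  # canonical link: W = diag(V(mu)) = diag(mu)
        z = eta + (y - mu) / mu          # working response
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 1])).astype(float)
beta_hat = irls_poisson(X, y)
# At the solution, the canonical-link score X^T (y - mu) is numerically zero
score = X.T @ (y - np.exp(X @ beta_hat))
```

Each IRLS step is a weighted least-squares fit of the working response z on X, which is one way of implementing the Fisher-scoring solution of the score equation above.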

Properties of the ML estimator

Theorem (Asymptotic distribution of the ML estimator): Under the hypothesis η = Xβ we have asymptotically

    β̂ − β ∼ N_k(0, σ²Σ),

where the dispersion matrix Σ for β̂ is given by

    Σ = [X^T W(β) X]⁻¹

with

    W(β) = diag{ w_i / ( [g′(µ_i)]² V(µ_i) ) }.

In the case of the canonical link, the weight matrix W(β) is

    W(β) = diag{ w_i V(µ_i) }.
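Under stated assumptions (a Poisson fit with canonical log link, unit weights, σ² = 1), the dispersion matrix and the resulting approximate standard errors can be sketched as follows; the design matrix and fitted means below are made-up illustrative numbers:

```python
import numpy as np

# Sigma = (X^T W(beta) X)^{-1}; for the canonical Poisson link
# W = diag(w_i V(mu_i)) = diag(mu_i) with unit weights.
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
mu = np.array([1.5, 2.2, 3.3, 4.9])   # hypothetical fitted means
W = np.diag(mu)
Sigma = np.linalg.inv(X.T @ W @ X)
se = np.sqrt(np.diag(Sigma))          # approximate standard errors of beta-hat (sigma^2 = 1)
```

The diagonal of Σ gives the approximate variances of the individual coefficients, which is what standard software reports as standard errors.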

Linear prediction for the generalized linear model

Definition (Linear prediction): The linear prediction η̂ is defined as the values η̂ = Xβ̂, with the linear prediction corresponding to the i-th observation

    η̂_i = Σ_{j=1}^k x_{ij} β̂_j = x_i^T β̂.

The linear predictions η̂ are approximately normally distributed with

    D[η̂] ≈ σ² XΣX^T,

where Σ is the dispersion matrix for β̂.

Fitted values for the generalized linear model

Definition (Fitted values): The fitted values are defined as the values µ̂ = µ(Xβ̂), where the i-th value is given as

    µ̂_i = g⁻¹(η̂_i),

with η̂_i the fitted value of the linear prediction. The fitted values µ̂ are approximately normally distributed with

    D[µ̂] ≈ σ² [∂µ/∂η]² XΣX^T,

where Σ is the dispersion matrix for β̂.

Residual deviance

Definition (Residual deviance): Consider the generalized linear model defined on slide 3. The residual deviance corresponding to this model is

    D(y; µ(β̂)) = Σ_{i=1}^n w_i d(y_i; µ̂_i),

with d(y_i; µ̂_i) denoting the unit deviance corresponding to the observation y_i and the fitted value µ̂_i, and where w_i denotes the weights (if present). If the model includes a dispersion parameter σ², the scaled residual deviance is

    D*(y; µ(β̂)) = D(y; µ(β̂)) / σ².

Residuals

Residuals represent the difference between the data and the model. In the classical GLM the residuals are r_i = y_i − µ̂_i; for generalized linear models these are called response residuals. Since the variance of the response is not constant for most generalized linear models, some modification is needed. We will look at deviance residuals and Pearson residuals.

Deviance residuals

Definition (Deviance residual): Consider the generalized linear model for the observations Y_1, ..., Y_n. The deviance residual for the i-th observation is defined as

    r_i^D = r^D(y_i; µ̂_i) = sign(y_i − µ̂_i) √( w_i d(y_i; µ̂_i) ),

where sign(x) denotes the sign function, sign(x) = 1 for x > 0 and sign(x) = −1 for x < 0, with w_i denoting the weight (if relevant), d(y; µ) the unit deviance, and µ̂_i the fitted value corresponding to the i-th observation.

Assessment of the deviance residuals is in good agreement with the likelihood approach, as the deviance residuals simply express differences in log-likelihood.

Pearson residuals

Definition (Pearson residual): Consider again the generalized linear model for the observations Y_1, ..., Y_n. The Pearson residuals are defined as the values

    r_i^P = r^P(y_i; µ̂_i) = (y_i − µ̂_i) / √( V(µ̂_i)/w_i ).

The Pearson residual is thus obtained by scaling the response residual with √(V̂ar[Y_i]); hence, the Pearson residual is the response residual normalized by the estimated standard deviation of the observation.
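For the Poisson case (unit deviance d(y; µ) = 2[y ln(y/µ) − (y − µ)], variance function V(µ) = µ, unit weights), both residual types can be sketched as follows; the observations and fitted values below are made-up toy numbers:

```python
import numpy as np

def poisson_unit_deviance(y, mu):
    """Poisson unit deviance d(y; mu) = 2*(y*ln(y/mu) - (y - mu)),
    with the convention y*ln(y/mu) = 0 when y = 0."""
    t = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * (t - (y - mu))

y = np.array([4.0, 0.0, 7.0])
mu = np.array([3.2, 1.1, 5.5])   # hypothetical fitted values

r_D = np.sign(y - mu) * np.sqrt(poisson_unit_deviance(y, mu))   # deviance residuals
r_P = (y - mu) / np.sqrt(mu)                                     # Pearson residuals, V(mu) = mu
```

The two residual types agree in sign by construction and are of similar magnitude when the fit is good; large systematic differences between them can indicate a skewed response distribution.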

Likelihood ratio tests

The approximate normal distribution of the ML estimator implies that many distributional results from classical GLM theory carry over to generalized linear models as approximate (asymptotic) results. An example is the likelihood ratio test. In the classical GLM case it was possible to derive the exact distribution of the likelihood ratio test statistic (the F-distribution). For generalized linear models this is not possible, and hence we shall use the asymptotic results for the logarithm of the likelihood ratio.

Likelihood ratio test

Theorem (Likelihood ratio test): Consider the generalized linear model. Assume that the model H_1: η ∈ L ⊂ R^n holds, with L of dimension k parameterized as η = X_1 β, and consider the hypothesis H_0: η ∈ L_0 ⊂ L, where η = X_0 α and dim(L_0) = m < k, against the alternative H_1: η ∈ L \ L_0. Then the likelihood ratio test for H_0 has the test statistic

    −2 log λ(y) = D(y; µ(α̂)) − D(y; µ(β̂)).

When H_0 is true, the test statistic will asymptotically follow a χ²(k − m) distribution. If the model includes a dispersion parameter σ², the deviance difference will asymptotically follow a σ²χ²(k − m) distribution.
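As a numerical illustration of the theorem, the deviance values and model dimensions below are made up; for k − m = 2 degrees of freedom the χ² survival function has the closed form exp(−x/2), so no statistics library is needed:

```python
import math

# Hypothetical residual deviances under H0 (m = 3 parameters)
# and under the richer model H1 (k = 5 parameters).
D0, D1 = 45.2, 31.7
k, m = 5, 3

lr = D0 - D1                    # -2 log lambda(y) = D(y; mu(alpha-hat)) - D(y; mu(beta-hat))
p_value = math.exp(-lr / 2.0)   # chi^2 survival function, valid only for df = k - m = 2
# A small p-value leads to rejecting H0 in favour of the richer model H1.
```

For other degrees of freedom one would use a χ² survival function from a statistics library instead of the df = 2 closed form.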

Test for model sufficiency

In analogy with classical GLMs, one often starts by formulating a rather comprehensive model and then reduces it by successive tests. In contrast to classical GLMs, we may, however, test the goodness of fit of the initial model. The test is a special case of the likelihood ratio test.

Test for model sufficiency

Consider the generalized linear model, and assume that the dispersion σ² = 1. Let H_full denote the full, or saturated, model, i.e. H_full: µ ∈ R^n, and consider the hypothesis H_0: η ∈ L ⊂ R^n of dimension k, with L parameterized as η = X_0 β. Then, as the residual deviance under H_full is 0, the test statistic is simply the residual deviance D(y; µ(β̂)). When H_0 is true, the test statistic is asymptotically distributed as χ²(n − k). The test rejects for large values of D(y; µ(β̂)).

Residual deviance measures goodness of fit

The residual deviance D(y; µ(β̂)) is a reasonable measure of the goodness of fit of a model H_0. When referring to a hypothesized model H_0, we shall sometimes use the symbol G²(H_0) to denote the residual deviance D(y; µ(β̂)). Using that convention, the partitioning of residual deviance may be formulated as

    G²(H_0 | H_1) = G²(H_0) − G²(H_1),

with G²(H_0 | H_1) interpreted as the goodness-of-fit test statistic for H_0 conditional on H_1 being true, and G²(H_0) and G²(H_1) denoting the unconditional goodness-of-fit statistics for H_0 and H_1, respectively.

Analysis of deviance table

The initial test for goodness of fit of the initial model is often presented in an analysis of deviance table, in analogy with the ANOVA table for classical GLMs. In the table, the goodness-of-fit test statistic corresponding to the initial model, G²(H_1) = D(y; µ(β̂)), is shown in the line labelled Error. The statistic should be compared to percentiles of the χ²(n − k) distribution. The table also shows the test statistic for H_null under the assumption that H_1 is true. This test investigates whether the model is necessary at all, i.e. whether at least some of the coefficients differ significantly from zero.

Analysis of deviance table

Note that in the case of a generalized linear model we can start the analysis by using the residual (error) deviance to test whether the model can be maintained at all. This is in contrast to classical GLMs, where the residual sum of squares around the initial model H_1 served to estimate σ², so there was no reference value to compare with the residual sum of squares. In generalized linear models the variance is a known function of the mean, and therefore in general there is no need to estimate a separate variance.

Analysis of deviance table

    Source            f      Deviance            Mean deviance               Goodness-of-fit interpretation
    Model H_null      k − 1  D(µ(β̂); µ̂_null)    D(µ(β̂); µ̂_null)/(k − 1)    G²(H_null | H_1)
    Residual (Error)  n − k  D(y; µ(β̂))          D(y; µ(β̂))/(n − k)          G²(H_1)
    Corrected total   n − 1  D(y; µ̂_null)                                     G²(H_null)

Table: Initial assessment of goodness of fit of a model H_0. H_null and µ̂_null refer to the minimal model, i.e. a model with all observations having the same mean value.

Overdispersion

It may happen that even when one has fitted a rather comprehensive model (i.e. a model with many parameters), the fit is not satisfactory, and the residual deviance D(y; µ(β̂)) is larger than can be explained by the χ²-distribution. An explanation for such a poor model fit could be an improper choice of linear predictor, or of link or response distribution. If the residuals exhibit a random pattern, and there are no other indications of misfit, then the explanation could be that the variance is larger than indicated by V(µ). We then say that the data are overdispersed.

Overdispersion

When data are overdispersed, a more appropriate model may be obtained by including a dispersion parameter σ² in the model, i.e. a distribution model with precision λ_i = w_i/σ², where σ² denotes the overdispersion, so that

    Var[Y_i] = σ² V(µ_i)/w_i.

As the dispersion parameter enters the score function only as a constant factor, it does not affect the estimation of the mean value parameters β. However, because of the larger error variance, the distribution of the test statistics is affected. If, for some reason, the parameter σ² were known beforehand, one would include this known value in the weights w_i. Most often, when it is found necessary to choose a model with overdispersion, σ² must be estimated from the data.

The dispersion parameter

For the normal distribution family, the dispersion parameter is simply the variance σ². In the case of the gamma distribution family, the shape parameter α acts as the dispersion parameter. Maximum likelihood estimation of the dispersion parameter is not too complicated for the normal and gamma distributions, but for other exponential dispersion families ML estimation of the dispersion parameter is trickier. The problem is that the dispersion parameter enters the likelihood function not only as a factor on the deviance but also in the normalizing factor a(y_i, w_i/σ²). An explicit expression for this factor as a function of σ² (as in the case of the normal and gamma distribution families) is needed in order to perform maximum likelihood estimation.

The dispersion parameter: approximate moment estimates

It is common practice to use the residual deviance D(y; µ(β̂)) as the basis for estimating σ², using the result that D(y; µ(β̂)) is approximately distributed as σ²χ²(n − k). It then follows that

    σ̂²_dev = D(y; µ(β̂)) / (n − k)

is asymptotically unbiased for σ². Alternatively, one may use the corresponding Pearson goodness-of-fit statistic

    X² = Σ_{i=1}^n w_i (y_i − µ̂_i)² / V(µ̂_i),

which likewise approximately follows a σ²χ²(n − k) distribution, and use the estimator

    σ̂²_Pears = X² / (n − k).
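A sketch of the Pearson-based moment estimate for a Poisson-type model (V(µ) = µ, unit weights); the observed counts, fitted means, and parameter count below are made-up toy numbers:

```python
import numpy as np

y = np.array([4.0, 0.0, 7.0, 3.0, 9.0, 2.0])    # hypothetical observed counts
mu = np.array([3.2, 1.1, 5.5, 2.8, 7.0, 2.4])   # hypothetical fitted means
w = np.ones_like(y)                              # unit weights
k = 2                                            # number of fitted parameters

X2 = np.sum(w * (y - mu) ** 2 / mu)              # Pearson statistic, V(mu) = mu
sigma2_pearson = X2 / (len(y) - k)               # approximate moment estimate of sigma^2
```

An estimate well above 1 would suggest overdispersion relative to the nominal Poisson variance; here the toy numbers give an estimate below 1, i.e. no evidence of overdispersion.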

Deviance table in the case of overdispersion

    Source            f      Deviance            Scaled deviance
    Model H_null      k − 1  D(µ(β̂); µ̂_null)    [D(µ(β̂); µ̂_null)/(k − 1)] / [D(y; µ(β̂))/(n − k)]
    Residual (Error)  n − k  D(y; µ(β̂))
    Corrected total   n − 1  D(y; µ̂_null)

Table: Example of a deviance table in the case of overdispersion. Note that the scaled deviance of the model equals the mean model deviance divided by the mean error deviance. In the case of overdispersion, the scaled deviance D*, i.e. the deviance divided by σ̂², is used in the tests instead of the crude deviance. For calculation of p-values etc., the asymptotic χ²-distribution of the scaled deviance is used.