Generalized Linear Models Introduction

Statistics 135, Autumn 2005
Copyright © 2005 by Mark E. Irwin

Generalized Linear Models

For many problems, standard linear regression approaches don't work. Sometimes transformations will help, but not always. Generalized linear models are an extension of linear models which allow for regression in more complex situations. Analyses that fall into the generalized linear models framework include:

- Logistic and probit regression ($y \mid X$ has a binomial distribution)
- Poisson regression ($y \mid X$ has a Poisson distribution)
- Log-linear models (contingency tables)

These settings feature non-linearity, multiplicative effects and errors, bounded responses, or discrete responses. In all of these examples, $E[Y \mid X] = \mu$ depends on a linear function of $X$, e.g. $X\beta$.

A generalized linear model involves the following four pieces.

1. Linear predictor: $\eta = X\beta$

2. Link function $g(\cdot)$: relates the linear predictor to the mean of the outcome variable,
$$g(\mu) = \eta = X\beta \qquad \mu = g^{-1}(\eta) = g^{-1}(X\beta)$$

3. Distribution: the distribution of the response variable $y$. These are usually members of the exponential family, which includes the normal, lognormal, Poisson, binomial, gamma, and hypergeometric distributions.

4. Dispersion parameter $\phi$: some distributions have an additional parameter describing the spread of the distribution. Its form usually depends on the relationship between the mean and the variance. For some distributions it is fixed (e.g. Poisson or binomial), while for others it is an additional parameter to be modelled and estimated (e.g. normal or gamma).

Normal linear regression is a special case of a generalized linear model where

1. Linear predictor: $\eta = X\beta$
2. Link function: $g(\mu) = \mu = X\beta$ (identity link)
3. Distribution: $y_i \mid x_i, \beta, \sigma^2 \sim N(x_i^T\beta, \sigma^2)$
4. Dispersion parameter: $\phi = \sigma^2$
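As a concrete check (a minimal sketch, not from the slides: it assumes the numpy and statsmodels packages and uses simulated data), fitting a Gaussian GLM with the identity link reproduces ordinary least squares:

```python
# Minimal sketch: a Gaussian GLM with the identity link is ordinary least
# squares. Assumes numpy and statsmodels; the data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
X = sm.add_constant(rng.normal(size=n))        # design matrix with intercept
beta = np.array([1.0, 2.0])                    # hypothetical true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)   # y_i | x_i ~ N(x_i' beta, sigma^2)

glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link is the default
ols_fit = sm.OLS(y, X).fit()
print(glm_fit.params, ols_fit.params)  # the coefficient estimates agree
print(glm_fit.scale)                   # dispersion phi: an estimate of sigma^2
```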

Poisson regression: The usual form of Poisson regression uses the log link,
$$\log \mu = X\beta \qquad \mu = \exp(X\beta)$$
Another form uses the identity link, $\mu = X\beta$. This is less common, as the mean of a Poisson random variable must be positive, which the identity link doesn't guarantee. The log link does guarantee it, which is one reason it is very popular; it also works well in many situations. As mentioned earlier, the dispersion parameter is fixed at $\phi = 1$, since $\mathrm{Var}(y) = \mu$ for the Poisson distribution.
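A sketch of both fits (simulated data; assumes numpy and statsmodels; the identity-link fit is only safe when $X\beta$ stays positive over the data, as discussed above):

```python
# Sketch: Poisson regression under the log link and, for comparison, the
# identity link. Assumes numpy and statsmodels; the data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
X = sm.add_constant(rng.uniform(0, 1, size=n))
mu = np.exp(X @ np.array([0.2, 1.5]))   # log mu = X beta  =>  mu = exp(X beta)
y = rng.poisson(mu)

log_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # log link is the default
print(log_fit.params)

# Identity link: can fail when X beta goes negative; here mu stays positive.
ident = sm.families.Poisson(link=sm.families.links.Identity())
ident_fit = sm.GLM(y, X, family=ident).fit()
print(ident_fit.params)
```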

Example: Geriatric Study to Reduce Falls

100 subjects were studied to investigate which of two treatments is better at reducing falls.

- $y$: number of falls during the 6 months of the study (self-reported)
- $x_1$: Treatment (0 = education only, 1 = education + aerobic exercise)
- $x_2$: Gender (0 = female, 1 = male)
- $x_3$: Balance Index (bigger is better)
- $x_4$: Strength Index (bigger is better)

The question of interest is how much the treatment reduces the number of falls, after adjusting for Gender, Balance Index, and Strength Index.

[Figure: scatterplots of number of falls (0 to 10) against Balance Index and against Strength Index, with separate symbols for the Education Only and Education + Exercise groups]

Logistic regression: This form is slightly different, as we work with the mean of the sample proportion $y_i / n_i$ instead of the mean of $y_i$. Logistic regression is based on
$$y_i \mid x_i \sim \text{Bin}(n_i, \mu_i)$$
where $\mu_i$ is a function of $x_i$. The link function is
$$g(\mu) = \log \frac{\mu}{1 - \mu}$$
i.e. the log odds. The inverse link function gives
$$\mu = g^{-1}(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta}}$$
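Filling in the algebra behind the inverse link (a step the slide skips): solving the link equation for $\mu$,
$$\log\frac{\mu}{1-\mu} = X\beta \;\Longrightarrow\; \frac{\mu}{1-\mu} = e^{X\beta} \;\Longrightarrow\; \mu = \frac{e^{X\beta}}{1 + e^{X\beta}}$$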

Thus the likelihood is
$$p(y \mid \beta) = \prod_{i=1}^n \binom{n_i}{y_i} \left( \frac{e^{X_i\beta}}{1 + e^{X_i\beta}} \right)^{y_i} \left( \frac{1}{1 + e^{X_i\beta}} \right)^{n_i - y_i}$$
The dispersion parameter is $\phi = 1$.
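A sketch of fitting this model (simulated grouped data; assumes numpy and statsmodels, which accepts binomial responses as (successes, failures) pairs so the $n_i$ enter the likelihood as above):

```python
# Sketch: logistic regression for grouped binomial data y_i ~ Bin(n_i, mu_i).
# Assumes numpy and statsmodels; the data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
m = 50
X = sm.add_constant(rng.normal(size=m))
eta = X @ np.array([-0.5, 1.0])           # hypothetical true coefficients
mu = np.exp(eta) / (1 + np.exp(eta))      # inverse logit link
n_i = rng.integers(5, 20, size=m)         # group sizes
y = rng.binomial(n_i, mu)                 # successes per group

resp = np.column_stack([y, n_i - y])      # (successes, failures) per group
fit = sm.GLM(resp, X, family=sm.families.Binomial()).fit()  # logit is the default link
print(fit.params)
```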

Link Functions

Changing the link function allows for different relationships between the response and predictor variables. The choice of link function $g(\cdot)$ should be made so that the relationship between the transformed mean and the predictor variables is linear. Note that transforming the mean via the link function is different from transforming the data. For example, consider the two models:

1. $\log y_i \mid X_i, \beta \sim N(X_i\beta, \sigma^2)$, or equivalently $y_i \mid X_i, \beta \sim \text{logN}(X_i\beta, \sigma^2)$, so that
$$E[y_i \mid X_i, \beta] = \exp\left( X_i\beta + \frac{\sigma^2}{2} \right) \quad \text{and} \quad \mathrm{Var}(y_i \mid X_i, \beta) = \exp(2X_i\beta + \sigma^2)\left(\exp(\sigma^2) - 1\right)$$

2. $y_i \mid X_i, \beta \sim N(\mu_i, \sigma^2)$ where $\log \mu_i = X_i\beta$, i.e. $\mu_i = \exp(X_i\beta)$ (normal model with log link).

The first model has a different mean, and its variability depends on $X$, whereas the variability in the second model does not depend on $X$ (see the simulation sketch below).

When choosing a link function, you often need to consider the plausible values of the mean of the distribution. For example, with binomial data the success probability must be in $[0, 1]$, whereas $X\beta$ can take values on $(-\infty, \infty)$. Thus you can get into trouble with binomial data with the model $\mu = X\beta$ (identity link).
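A quick simulation contrasting the two models above (illustrative only; assumes numpy, with hypothetical values for $\eta = X_i\beta$ and $\sigma$):

```python
# Sketch contrasting "transform the data" (model 1, lognormal) with
# "transform the mean" (model 2, normal with log link). Assumes numpy;
# eta and sigma are hypothetical values for illustration.
import numpy as np

rng = np.random.default_rng(4)
sigma = 0.5
for eta in [0.0, 2.0]:                    # two values of the linear predictor
    y1 = np.exp(rng.normal(eta, sigma, size=100_000))  # model 1: log y ~ N(eta, sigma^2)
    y2 = rng.normal(np.exp(eta), sigma, size=100_000)  # model 2: y ~ N(exp(eta), sigma^2)
    print(f"eta={eta}: model 1 var={y1.var():.2f}, model 2 var={y2.var():.2f}")
# Model 1's variance grows with eta; model 2's stays at sigma^2 = 0.25.
```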

Possible choices include:

- Logit link: $g(\mu) = \log \dfrac{\mu}{1 - \mu}$
- Probit link: $g(\mu) = \Phi^{-1}(\mu)$ (standard normal inverse CDF)
- Complementary log-log link: $g(\mu) = \log(-\log(1 - \mu))$

All of these happen to be quantile functions for different distributions.

Thus the inverse link functions are CDFs:

- Logit link: $g^{-1}(\eta) = \dfrac{e^\eta}{1 + e^\eta}$ (standard logistic)
- Probit link: $g^{-1}(\eta) = \Phi(\eta)$ ($N(0,1)$)
- Complementary log-log link: $g^{-1}(\eta) = 1 - e^{-e^\eta}$ (Gumbel minimum extreme value)
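This correspondence can be checked numerically. A sketch assuming numpy and scipy; scipy's `gumbel_l` is the minimum extreme value distribution, whose CDF is $1 - e^{-e^{x}}$ (the probit case holds by definition, since $g = \Phi^{-1}$):

```python
# Sketch: the inverse link functions are CDFs (and the links are the matching
# quantile functions). Assumes numpy and scipy; gumbel_l is scipy's minimum
# extreme value distribution, with CDF 1 - exp(-exp(x)).
import numpy as np
from scipy import stats
from scipy.special import expit, logit

eta = np.linspace(-3, 3, 13)
# logit link: inverse is the standard logistic CDF
print(np.allclose(expit(eta), stats.logistic.cdf(eta)))
# complementary log-log link: inverse is the Gumbel (minimum) CDF
print(np.allclose(1 - np.exp(-np.exp(eta)), stats.gumbel_l.cdf(eta)))
# and the links themselves are quantile functions: g(mu) = F^{-1}(mu)
mu = np.linspace(0.05, 0.95, 10)
print(np.allclose(logit(mu), stats.logistic.ppf(mu)))
# (the probit case is immediate, since g = Phi^{-1} by definition)
```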

Thus in this case any distribution defined on $(-\infty, \infty)$ could be the basis for a link function, but these are the popular ones. One other choice sometimes used is based on $t_\nu$ distributions, as they have some robustness properties.

Note that a link function doesn't have to map the range of the mean to $(-\infty, \infty)$. For example, the identity link ($g(\mu) = \mu$) can be used in Poisson regression and in binomial regression problems. In the binomial case, it can be reasonable if the success probabilities lie in the range $(0.2, 0.8)$.

Similarly, an inverse link function doesn't have to map $X\beta$ back to the whole range of the mean for a distribution. For example, the log link will only give positive means ($\mu = e^\eta$). This can be a useful model with normal data, even though in general a normal mean can take any value.

Common Link Functions

The following are common link function choices for different distributions.

Normal:
- Identity: $g(\mu) = \mu$
- Log: $g(\mu) = \log \mu$
- Inverse: $g(\mu) = 1/\mu$

Binomial:
- Logit: $g(\mu) = \log \dfrac{\mu}{1 - \mu}$
- Probit: $g(\mu) = \Phi^{-1}(\mu)$
- Complementary log-log: $g(\mu) = \log(-\log(1 - \mu))$
- Log: $g(\mu) = \log \mu$

Poisson:
- Log: $g(\mu) = \log \mu$
- Identity: $g(\mu) = \mu$
- Square root: $g(\mu) = \sqrt{\mu}$

Gamma:
- Inverse: $g(\mu) = 1/\mu$
- Log: $g(\mu) = \log \mu$
- Identity: $g(\mu) = \mu$

Inverse Gaussian (Inverse Normal):
- Inverse squared: $g(\mu) = 1/\mu^2$
- Inverse: $g(\mu) = 1/\mu$
- Log: $g(\mu) = \log \mu$
- Identity: $g(\mu) = \mu$

The first link function listed for each distribution is the canonical link, which is based on writing the density of each distribution in the exponential family form
$$p(y \mid \theta) = f(y)\, g(\theta) \exp\left( \phi(\theta)^T u(y) \right)$$
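For example (a worked step not shown on the slide), writing the Bernoulli mass function in this form shows why the logit is the canonical link for binomial data:
$$p(y \mid p) = p^y (1-p)^{1-y} = \exp\left( y \log\frac{p}{1-p} + \log(1-p) \right), \quad y = 0, 1,$$
so the term multiplying $u(y) = y$ is the natural parameter $\theta = \log\frac{p}{1-p}$, the logit of the mean.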

Dispersion Parameter

So far we have only discussed the mean function. However, we also need to consider the variability of the data as well. For any distribution, we can consider the variance to be a function of the mean ($V(\mu)$) and a dispersion parameter ($\phi$):
$$\mathrm{Var}(y) = \phi V(\mu)$$
The variance functions and dispersion parameters for the common distributions are:

- $N(\mu, \sigma^2)$: $V(\mu) = 1$, $\phi = \sigma^2$
- $\text{Pois}(\mu)$: $V(\mu) = \mu$, $\phi = 1$
- $\text{Bin}(n, \mu)$ (for the proportion $y/n$): $V(\mu) = \mu(1 - \mu)$, $\phi = 1/n$
- $\text{Gamma}(\alpha, \nu)$: $V(\mu) = \mu^2$, $\phi = 1/\nu$

Note that for the Gamma distribution, the form of these depends on how the distribution is parameterized. (McCullagh and Nelder have different formulas due to this.)
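A quick numerical check of $\mathrm{Var}(y) = \phi V(\mu)$ for the gamma case (a sketch assuming numpy, with the gamma parameterized by shape $\nu$ and scale $\mu/\nu$ so that its mean is $\mu$; the values of $\mu$ and $\nu$ are arbitrary):

```python
# Sketch: check Var(y) = phi * V(mu) = mu^2 / nu for the gamma distribution.
# Assumes numpy; the gamma is parameterized with shape nu and scale mu/nu,
# so its mean is mu. The values of mu and nu are arbitrary.
import numpy as np

rng = np.random.default_rng(5)
mu, nu = 3.0, 4.0
y = rng.gamma(shape=nu, scale=mu / nu, size=1_000_000)
print(y.mean(), mu)          # sample mean ~ 3.0
print(y.var(), mu**2 / nu)   # sample variance ~ 2.25 = phi * V(mu)
```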

So when building models, we need ways of dealing with the dispersion in the data. Exactly how you want to do this will depend on the problem.

Overdispersion

Often data will have more variability than might be expected. For example, consider Poisson-like data, and consider a subset of the data which has the same levels of the predictor variables (call it $y_1, y_2, \ldots, y_m$). If the data are Poisson, the sample variance should be approximately the sample mean:
$$s_m^2 = \frac{1}{m-1} \sum_{i=1}^m (y_i - \bar{y})^2 \approx \bar{y}$$
If $s_m^2 > \bar{y}$, this suggests that there is more variability than can be explained by the explanatory variables. This extra variability can be handled in a number of ways.

One approach is to add some additional variability to the means:
$$y_i \mid \mu_i \overset{ind}{\sim} \text{Pois}(\mu_i)$$
$$\mu_i \mid X_i, \beta, \sigma^2 \overset{ind}{\sim} N(X_i\beta, \sigma^2)$$
In this approach, every observation with the same level of the explanatory variables has a different mean, which leads to more variability in the $y$s:
$$\mathrm{Var}(y_i) = E[\mathrm{Var}(y_i \mid \mu_i)] + \mathrm{Var}(E[y_i \mid \mu_i]) = E[\mu_i] + \mathrm{Var}(\mu_i) = X_i\beta + \sigma^2 > X_i\beta = E[y_i]$$
Note that normally you would probably model $\log \mu_i \sim N(X_i\beta, \sigma^2)$, but the math is easier to show with the identity link instead of the log link.
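Simulating this hierarchy confirms the inflated variance (a sketch assuming numpy; the identity link is used as in the derivation above, with hypothetical values of $X\beta$ and $\sigma$ chosen so the normal draws are essentially always positive):

```python
# Sketch: overdispersion from adding variability to the Poisson means.
# Assumes numpy. Identity link as in the derivation; X beta and sigma are
# hypothetical, chosen so the normal draws are essentially always positive.
import numpy as np

rng = np.random.default_rng(6)
Xbeta, sigma = 10.0, 2.0
mu_i = rng.normal(Xbeta, sigma, size=100_000)  # mu_i ~ N(X beta, sigma^2)
y = rng.poisson(np.clip(mu_i, 0, None))        # y_i | mu_i ~ Pois(mu_i); clip guards rare mu_i < 0
print(y.mean())   # ~ X beta = 10
print(y.var())    # ~ X beta + sigma^2 = 14 > E[y]: overdispersed
```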

Maximum Likelihood Estimation

In the analysis of generalized linear models, the usual approach to determining parameter estimates is maximum likelihood. This is in contrast to linear regression, where the usual criterion for optimization is least squares; the two are the same if you are willing to assume that the $y_i \mid X_i$ are normally distributed.

Aside: Often in the regression situation, the only assumptions made are
$$E[Y_i \mid X_i] = X_i^T\beta \qquad \mathrm{Var}(Y_i \mid X_i) = \sigma^2$$
There are often no assumptions about the exact distribution of the $y$s. Without a distributional assumption, maximum likelihood approaches are not possible.

MLE Framework: Let's assume that $y_1, y_2, \ldots, y_n$ are conditionally independent given $X_1, X_2, \ldots, X_n$ and a parameter $\theta$ (which may be a vector). Let $y_i$ have density (if continuous) or mass function (if discrete) $f(y_i \mid \theta)$. Then the likelihood function is
$$L(\theta) = \prod_{i=1}^n f(y_i \mid \theta)$$
The maximum likelihood estimate (MLE) of $\theta$ is
$$\hat{\theta} = \arg\sup_\theta L(\theta)$$
i.e. the value of $\theta$ that maximizes the likelihood function. One way of thinking of the MLE is that it's the value of the parameter that is most consistent with the data.

Note that when determining MLEs, it is usually easier to work with the log likelihood function
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(y_i \mid \theta)$$
It has the same optimum, since log is an increasing function, and it is easier to work with, since derivatives of sums are usually much nicer than derivatives of products.

Example: Normal mean and variance. Assume $y_1, y_2, \ldots, y_n \overset{iid}{\sim} N(\mu, \sigma^2)$. Then

$$\ell(\mu, \sigma) = \sum_{i=1}^n \left( -\log\sqrt{2\pi} - \log\sigma - \frac{(y_i - \mu)^2}{2\sigma^2} \right) = c - n\log\sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2$$
To find the MLEs of $\mu$ and $\sigma$, differentiating and setting to 0 is usually the easiest approach. This gives
$$\frac{\partial \ell(\mu, \sigma)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu) = \frac{n(\bar{y} - \mu)}{\sigma^2}$$
$$\frac{\partial \ell(\mu, \sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (y_i - \mu)^2$$

These imply
$$\hat{\mu} = \bar{y} \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 = \frac{n-1}{n} s^2$$
MLEs have nice properties, assuming some regularity conditions. These include:

- Consistency (asymptotically unbiased)
- Minimum variance (asymptotically efficient)
- Asymptotic normality
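The closed-form answer can be checked numerically (a minimal sketch assuming numpy and scipy, maximizing $\ell(\mu, \sigma)$ by minimizing its negative; the constant $-n\log\sqrt{2\pi}$ is dropped since it doesn't affect the optimum):

```python
# Sketch: numerical maximization of the normal log-likelihood recovers
# mu_hat = ybar and sigma2_hat = ((n-1)/n) s^2. Assumes numpy and scipy.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
y = rng.normal(5.0, 2.0, size=200)   # simulated data with hypothetical mu, sigma

def negloglik(theta):
    mu, log_sigma = theta            # optimize over log sigma to keep sigma > 0
    sigma = np.exp(log_sigma)
    # negative log likelihood, dropping the additive constant n*log(sqrt(2 pi))
    return np.sum(np.log(sigma) + (y - mu) ** 2 / (2 * sigma**2))

res = minimize(negloglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
n = len(y)
print(mu_hat, y.mean())                           # agree
print(sigma_hat**2, (n - 1) / n * y.var(ddof=1))  # agree: the ((n-1)/n) s^2 formula
```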