Generalized Linear Models

Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary and categorical data

Today's class:
1. Examples of count and ordered categorical response data
2. Definition of the exponential family likelihood
3. Components of a generalized linear model (GLM)
4. Algorithm for obtaining maximum likelihood estimates

Cellular Differentiation Data (Piegorsch et al., 1988)

Interest is in the effect of two agents with immuno-activating ability that may induce cell differentiation. Response variable: the number of cells exhibiting differentiation markers after exposure. Scientific question: do the agents TNF (tumor necrosis factor) and IFN (interferon) stimulate cell differentiation independently, or is there a synergistic effect?

Cellular Differentiation Data

Number of cells     Dose of       Dose of
differentiating     TNF (U/ml)    IFN (U/ml)
 11                   0             0
 18                   0             4
 20                   0            20
 39                   0           100
 22                   1             0
 38                   1             4
 52                   1            20
 69                   1           100
 31                  10             0
 68                  10             4
 69                  10            20
128                  10           100
102                 100             0
171                 100             4
180                 100            20
193                 100           100

Scale of response variable: count

Comments

Potentially we could log-transform the count response and then fit a normal linear regression model. This is often done, but are there problems/pitfalls with this approach?

- For count data, the variability often increases with the mean, which is not characterized by typical normal linear models
- In addition, the data are discrete, so the normal likelihood is clearly inappropriate, and it may provide a particularly poor approximation when counts can be small
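The mean-variance point can be illustrated with a quick simulation. This Python sketch (illustrative only, using made-up means rather than the cell differentiation data) draws Poisson counts at several mean levels; the sample variance tracks the mean, so a constant-variance normal model is misspecified for such data.

```python
import math
import random
import statistics

random.seed(1)

def rpois(mu):
    """One Poisson(mu) draw by inverting the cdf (adequate for moderate mu)."""
    u, k = random.random(), 0
    p = math.exp(-mu)
    cdf = p
    while u > cdf and k < 1000:   # cap guards against floating-point tail loss
        k += 1
        p *= mu / k
        cdf += p
    return k

# For each mean level, the sample variance of the counts grows with the mean,
# unlike the constant variance assumed by a normal linear model.
for mu in [2.0, 10.0, 50.0]:
    draws = [rpois(mu) for _ in range(20000)]
    print(mu, round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```

For Poisson data the variance equals the mean exactly, which is the simplest instance of the variance-function idea formalized later in the lecture.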

Job Expectations for Psychology Students

A study on student perspectives asked psychology students at the University of Regensburg whether they expected to find adequate employment after getting their degree.

Response variable: ordered categorical 1-3 ranking:
1. Don't expect adequate employment
2. Not sure
3. Expect adequate employment immediately after the degree

Predictor: age in years

Grouped Job Expectations Data

Age in           Response
years        1      2      3
 19          1      2      0
 20          5     18      2
 21          6     19      2
 22          1      6      3
 23          2      7      3
 24          1      7      5
 25          0      0      3
 26          0      1      0
 27          0      2      1
 29          1      0      0
 30          0      0      2
 31          0      1      0
 34          0      1      0

Generalized Linear Model: Motivation

Normal linear regression models are not appropriate for these examples. We need:

- A more general regression framework that accounts for response data having a variety of measurement scales
- Methods for model fitting & inference in this framework
- Ideally, some of the simplifications of linear regression should carry over

Generalizations to more complex settings (correlated data, censored observations, etc.) will be necessary in many applications.

Generalized Linear Models: The Basics

In the general linear model, (y_i | x_i) ~ N(x_i'β, σ²), with

E(y_i | x_i) = x_i'β    (systematic component)
V(y_i | x_i) = σ²       (random component)

The generalized linear model generalizes both the random & systematic components. The focus is on distributions in the exponential family, which includes many useful special cases (normal, Poisson, gamma, binomial, etc.).

Likelihood Function: The Simple Exponential Family

Observations y_i are conditionally independent given x_i (i = 1, ..., n), and the conditional distribution of y_i | x_i belongs to a simple exponential family. Thus, the probability density function can be expressed as:

f(y_i; θ_i, φ) = exp{ [y_i θ_i - b(θ_i)] / a(φ) + c(y_i, φ) }.    (1)

Here θ_i and φ are parameters, and a(·), b(·) and c(·) are known functions. The θ_i and φ are location and scale parameters, respectively.
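As a concrete check of form (1), the Poisson(µ) pmf µ^y e^{-µ}/y! fits the exponential family with θ = log µ, b(θ) = e^θ, a(φ) = 1 and c(y, φ) = -log y!. A short Python sketch verifies the two forms agree numerically:

```python
import math

def poisson_expfam(y, mu):
    """Poisson pmf written in exponential family form (1):
    theta = log(mu), b(theta) = exp(theta), a(phi) = 1, c(y, phi) = -log(y!)."""
    theta = math.log(mu)
    return math.exp((y * theta - math.exp(theta)) / 1.0 - math.lgamma(y + 1))

def poisson_pmf(y, mu):
    """Poisson pmf in its usual form mu^y * exp(-mu) / y!."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

# the two expressions coincide for a range of counts
for y in range(6):
    assert abs(poisson_expfam(y, 3.5) - poisson_pmf(y, 3.5)) < 1e-12
```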

For example, for the normal distribution we have

f(y_i; θ_i, φ) = (2πσ²)^{-1/2} exp{-(y_i - µ)²/(2σ²)}
             = exp[ (y_i µ - µ²/2)/σ² - {y_i²/σ² + log(2πσ²)}/2 ],

so that θ_i = µ, φ = σ², a(φ) = φ, b(θ_i) = θ_i²/2 and c(y_i, φ) = -{y_i²/σ² + log(2πσ²)}/2. Thus, in this case θ_i is the mean and φ is the variance.

Let l(θ_i, φ; y_i) = log f(y_i; θ_i, φ) denote the log-likelihood function. We can derive the mean and variance for the general case using:

E(∂l/∂θ_i) = 0    and    E(∂²l/∂θ_i²) + E{(∂l/∂θ_i)²} = 0.

These relations are well-known properties of the likelihood function, obtained by differentiating the identity ∫ f(y_i; θ_i, φ) dy_i = 1 with respect to θ_i, holding the dispersion parameter φ fixed. The differentiation and the averaging occur at the same value of θ_i.

Note that l(θ_i; y_i) = {y_i θ_i - b(θ_i)}/a(φ) + c(y_i, φ). It follows that

∂l/∂θ_i = {y_i - b'(θ_i)}/a(φ)    and    ∂²l/∂θ_i² = -b''(θ_i)/a(φ).

Hence, from the first of the previous equalities, we have

0 = E(∂l/∂θ_i) = {E(y_i) - b'(θ_i)}/a(φ),

which implies that E(y_i) = b'(θ_i).

Similarly, we have

0 = E(∂²l/∂θ_i²) + E{(∂l/∂θ_i)²}
  = -b''(θ_i)/a(φ) + E[{y_i - b'(θ_i)}²/a(φ)²].

Multiplying through by a(φ)² and using E(y_i) = b'(θ_i),

0 = -b''(θ_i)a(φ) + E(y_i²) - 2E(y_i)b'(θ_i) + b'(θ_i)²
  = -b''(θ_i)a(φ) + E(y_i²) - E(y_i)²,

so that var(y_i) = b''(θ_i)a(φ).

For most commonly used exponential family distributions, a(φ) = φ/w_i, where φ is a dispersion parameter and w_i is a weight (typically equal to one). Hence, the mean and variance will typically follow the form:

µ_i = b'(θ_i)    and    σ_i² = b''(θ_i)φ.
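These identities can be checked numerically: differentiating the cumulant b(·) by finite differences should recover the known mean and variance of each family. A Python sketch (illustrative only), using the Poisson and Bernoulli cumulants:

```python
import math

def dd(b, theta, h=1e-5):
    """Central first and second finite differences of the cumulant b at theta,
    approximating b'(theta) (the mean) and b''(theta) (the variance function)."""
    d1 = (b(theta + h) - b(theta - h)) / (2 * h)
    d2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2
    return d1, d2

# Poisson: b(theta) = exp(theta); mean = variance = mu = exp(theta)
m, v = dd(math.exp, math.log(4.0))
assert abs(m - 4.0) < 1e-4 and abs(v - 4.0) < 1e-3

# Bernoulli: b(theta) = log(1 + e^theta); mean = p, variance = p(1 - p)
theta = math.log(0.3 / 0.7)    # logit of p = 0.3
m, v = dd(lambda t: math.log1p(math.exp(t)), theta)
assert abs(m - 0.3) < 1e-6 and abs(v - 0.21) < 1e-4
```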

Characteristics of common distributions in the exponential family

                           Normal        Poisson     Binomial               Gamma
Notation                   N(µ_i, σ²)    Poi(µ_i)    Bin(n_i, π_i)          G(µ_i, ν)
Range of y_i               (-∞, ∞)       [0, ∞)      [0, n_i]               (0, ∞)
Dispersion, φ              σ²            1           1/n_i                  ν^{-1}
Cumulant, b(θ_i)           θ_i²/2        exp(θ_i)    log(1 + e^{θ_i})       -log(-θ_i)
Mean function, µ(θ_i)      θ_i           exp(θ_i)    e^{θ_i}/(1 + e^{θ_i})  -1/θ_i
Canonical link, θ(µ_i)     identity      log         logit                  reciprocal
Variance function, V(µ_i)  1             µ           µ(1 - µ)               µ²

Definition of a GLM: Systematic Component, Link Functions

Instead of modeling the mean µ_i directly as a linear function of the predictors x_i, we introduce a one-to-one, continuously differentiable transformation g(·) and focus on

η_i = g(µ_i),

where g(·) is called the link function and η_i the linear predictor. We assume that the transformed mean follows a linear model, η_i = x_i'β. Since the link function is one-to-one and invertible, we have

µ_i = g^{-1}(η_i) = g^{-1}(x_i'β).

Note that we are transforming the expected value µ_i, not the raw data y_i. For classical linear models, the mean equals the linear predictor. In that case the identity link is reasonable, since both µ_i and η_i can take any value on the real line. This is not the case in general.

Link Functions for Poisson Data

For example, if y_i ~ Poi(µ_i), then µ_i must be > 0. In this case, a linear model for the mean is not reasonable, since for some values of x_i and β it would give µ_i ≤ 0. By using the model

η_i = log(µ_i) = x_i'β,

we are guaranteed to have µ_i > 0 for all β ∈ R^p and all values of x_i. In general, a link function for count data should map (0, ∞) → R (i.e., from the positive real numbers to the entire real line). The log link is a natural choice.

Link Functions for Binomial Data

For the binomial distribution, 0 < µ_i < 1 (the mean of y_i is n_i µ_i). Therefore, the link function should map (0, 1) → R. Standard choices:

1. logit: η_i = log{µ_i/(1 - µ_i)}
2. probit: η_i = Φ^{-1}(µ_i), where Φ(·) is the N(0, 1) cdf
3. complementary log-log: η_i = log{-log(1 - µ_i)}

Each of these choices is important in applications & will be considered in detail later in the course.
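A small Python sketch of the three links and their inverses (using the standard library's NormalDist for Φ and Φ^{-1}); each maps (0, 1) onto the real line and round-trips correctly:

```python
import math
from statistics import NormalDist

# logit link and its inverse (the logistic function)
logit       = lambda mu: math.log(mu / (1 - mu))
inv_logit   = lambda eta: 1 / (1 + math.exp(-eta))

# probit link: the standard normal quantile function and cdf
probit      = lambda mu: NormalDist().inv_cdf(mu)
inv_probit  = lambda eta: NormalDist().cdf(eta)

# complementary log-log link and its inverse
cloglog     = lambda mu: math.log(-math.log(1 - mu))
inv_cloglog = lambda eta: 1 - math.exp(-math.exp(eta))

# each pair round-trips across (0, 1)
for mu in [0.05, 0.3, 0.5, 0.9]:
    for g, ginv in [(logit, inv_logit), (probit, inv_probit), (cloglog, inv_cloglog)]:
        assert abs(ginv(g(mu)) - mu) < 1e-7
```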

Recall that the exponential family density has the following form:

f(y_i; θ_i, φ) = exp{ [y_i θ_i - b(θ_i)] / a(φ) + c(y_i, φ) },

where a(·), b(·) and c(·) are known functions. Specifying the GLM involves choosing a(·), b(·), c(·):

1. Specify a(·), c(·) to correspond to a particular distribution (e.g., binomial, Poisson)
2. Specify b(·) to correspond to a particular link function

Recall that the mean & variance are µ_i = b'(θ_i) and σ_i² = b''(θ_i)φ. Using b'(θ_i) = g^{-1}(x_i'β), we can express the density as f(y_i; x_i, β, φ), so that the conditional likelihood of y_i given x_i depends on the parameters β and φ. It would seem that a natural choice for b(·), and hence g(·), would correspond to θ_i = η_i = x_i'β, so that b'(·) is the inverse link.

Canonical Links and Sufficient Statistics

Each of the distributions we have considered has a special, canonical, link function for which there exists a sufficient statistic equal in dimension to β. Canonical links occur when θ_i = η_i = x_i'β, with θ_i the canonical parameter.

As a homework exercise (due next Tuesday), show whether or not the following distributions are in the exponential family and, if so, provide the canonical links: (i) Poisson; (ii) negative binomial; (iii) gamma; (iv) log-normal.

For canonical links, the sufficient statistic is X'y, with components Σ_i x_ij y_i, for j = 1, ..., p.

Although canonical links often have nice properties, selection of the link function should be based on prior expectation and model fit.

Example: Logistic Regression

Suppose y_i ~ Bin(1, p_i), for i = 1, ..., n, are independent 0/1 indicator variables of an adverse response (e.g., preterm birth), and let x_i denote a p × 1 vector of predictors for individual i (e.g., dose of DDE exposure, race, age, etc.). The likelihood is as follows:

f(y | β) = Π_{i=1}^n p_i^{y_i} (1 - p_i)^{1 - y_i}
        = Π_{i=1}^n {p_i/(1 - p_i)}^{y_i} (1 - p_i)
        = Π_{i=1}^n exp[ y_i log{p_i/(1 - p_i)} - log{1/(1 - p_i)} ]
        = exp[ Σ_{i=1}^n {y_i θ_i - log(1 + e^{θ_i})} ],

where θ_i = log{p_i/(1 - p_i)}.

Choosing the canonical link, θ_i = log{p_i/(1 - p_i)} = x_i'β, the likelihood has the following form:

f(y | β) = exp[ Σ_{i=1}^n {y_i x_i'β - log(1 + e^{x_i'β})} ].

This is logistic regression, which is widely used in epidemiology and other applications for modeling binary response data. In general, if f(y_i; θ_i, φ) is in the exponential family and θ_i = θ(η_i), with η_i = x_i'β, then the model is called a generalized linear model (GLM).
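As a sanity check, the exponential-family form of the Bernoulli log-likelihood, Σ{y_i η_i - log(1 + e^{η_i})}, should agree with the direct form Σ{y_i log p_i + (1 - y_i) log(1 - p_i)}. A Python sketch with made-up linear-predictor values:

```python
import math

def loglik_direct(y, eta):
    """Direct Bernoulli log-likelihood with p_i = inverse-logit(eta_i)."""
    p = [1 / (1 + math.exp(-e)) for e in eta]
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def loglik_expfam(y, eta):
    """Exponential-family form: sum_i { y_i eta_i - log(1 + e^{eta_i}) }."""
    return sum(yi * ei - math.log1p(math.exp(ei)) for yi, ei in zip(y, eta))

# hypothetical responses and linear-predictor values
y   = [1, 0, 1, 1, 0]
eta = [0.5, -1.2, 2.0, -0.3, 0.8]
assert abs(loglik_direct(y, eta) - loglik_expfam(y, eta)) < 1e-12
```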

Model Fitting

Choosing a GLM results in a likelihood function:

L(y; β, φ, x) = Π_{i=1}^n exp{ [y_i θ_i - b(θ_i)] / a(φ) + c(y_i, φ) },

where θ_i is a function of η_i = x_i'β. The maximum likelihood estimate is defined as

β̂ = arg max_β L(y; β, φ, x),

with φ initially assumed to be known.

Frequentist inferences for GLMs typically rely on β̂ and asymptotic approximations. In the normal linear model special case, the MLE corresponds to the least squares estimator. In general, there is no closed-form expression, so we need an algorithm to calculate β̂.

Maximum Likelihood Estimation of GLMs

All GLMs can be fit using the same algorithm, a form of iteratively re-weighted least squares (IWLS):

1. Given an initial value for β̂, calculate the estimated linear predictor η̂_i = x_i'β̂ and use it to obtain the fitted values µ̂_i = g^{-1}(η̂_i). Calculate the adjusted dependent variable

z_i = η̂_i + (y_i - µ̂_i)(dη_i/dµ_i)_0,

where the derivative is evaluated at µ̂_i.

2. Calculate the iterative weights

W_i^{-1} = (dη_i/dµ_i)²_0 V_i,

where V_i is the variance function evaluated at µ̂_i.

3. Regress z_i on x_i with weights W_i to give new estimates of β̂. Repeat until convergence.
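The steps above can be sketched in code. The following Python implementation (an illustrative sketch, not the course's reference code) runs IWLS for a logistic regression with an intercept and one covariate; for the logit link, dη/dµ = 1/{µ(1 - µ)} and V = µ(1 - µ), so the weights reduce to W_i = µ_i(1 - µ_i), and at convergence the score equations X'(y - µ̂) = 0 hold.

```python
import math

def inv_logit(e):
    return 1 / (1 + math.exp(-e))

def iwls_logistic(x, y, iters=25):
    """IWLS for logistic regression with intercept b0 and slope b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]               # linear predictor
        mu  = [inv_logit(e) for e in eta]              # fitted values
        w   = [m * (1 - m) for m in mu]                # weights W_i = mu(1 - mu)
        # adjusted dependent variable z_i = eta_i + (y_i - mu_i) * d(eta)/d(mu)
        z   = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        # weighted least squares of z on (1, x): solve the 2x2 normal equations
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        b0, b1 = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    return b0, b1

# small made-up data set (hypothetical): binary response vs. one covariate
x = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = iwls_logistic(x, y)

# at the MLE the score equations X'(y - mu) = 0 should hold
mu = [inv_logit(b0 + b1 * xi) for xi in x]
assert abs(sum(yi - mi for yi, mi in zip(y, mu))) < 1e-8
assert abs(sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))) < 1e-8
```

For the canonical (logit) link this iteration coincides with Newton-Raphson, which is why it converges so quickly in practice.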

Justification for the IWLS Procedure

Note that the log-likelihood can be expressed as

l = Σ_{i=1}^n {y_i θ_i - b(θ_i)}/a(φ) + c(y_i, φ).

To maximize this log-likelihood we need ∂l/∂β_j:

∂l/∂β_j = Σ_{i=1}^n (∂l_i/∂θ_i)(dθ_i/dµ_i)(dµ_i/dη_i)(∂η_i/∂β_j)
        = Σ_{i=1}^n [(y_i - µ_i)/a(φ)] (1/V_i)(dµ_i/dη_i) x_ij
        = Σ_{i=1}^n [W_i (y_i - µ_i)/a(φ)] (dη_i/dµ_i) x_ij,

since µ_i = b'(θ_i) and b''(θ_i) = V_i imply dµ_i/dθ_i = V_i. With constant dispersion (a(φ) = φ), the MLE equations for β_j are:

Σ_{i=1}^n W_i (y_i - µ_i)(dη_i/dµ_i) x_ij = 0.

Fisher's scoring method uses the gradient vector ∂l/∂β = u and minus the expected value of the Hessian matrix,

A = -E(∂²l/∂β_r∂β_s).

Given the current estimate b of β, choose the adjustment δb so that A δb = u. Excluding φ, the components of u are

u_r = Σ_{i=1}^n W_i (y_i - µ_i)(dη_i/dµ_i) x_ir,

so we have

A_rs = -E(∂u_r/∂β_s)
     = -E Σ_{i=1}^n [ {∂(y_i - µ_i)/∂β_s} W_i (dη_i/dµ_i) x_ir + (y_i - µ_i) ∂{W_i (dη_i/dµ_i) x_ir}/∂β_s ].

The expectation of the second term is 0, since E(y_i - µ_i) = 0, and the first term gives

A_rs = Σ_{i=1}^n W_i (dη_i/dµ_i) x_ir (∂µ_i/∂β_s) = Σ_{i=1}^n W_i (dη_i/dµ_i) x_ir (dµ_i/dη_i)(∂η_i/∂β_s) = Σ_{i=1}^n W_i x_ir x_is.

The new estimate b* = b + δb of β thus satisfies

A b* = A b + A δb = A b + u,

where

(A b)_r = Σ_s A_rs b_s = Σ_{i=1}^n W_i x_ir η_i.

Thus, the new estimate b* satisfies

(A b*)_r = Σ_{i=1}^n W_i x_ir {η_i + (y_i - µ_i) dη_i/dµ_i} = Σ_{i=1}^n W_i x_ir z_i.

These equations have the form of the linear weighted least squares equations with weights W_i and dependent variable z_i.

Some Comments

The IWLS procedure is simple to implement and converges rapidly in most cases. Procedures to calculate MLEs and implement frequentist inferences for GLMs are available in most software packages:

- In R or S-PLUS the glm() function can be used - try help(glm)
- In Matlab the glmfit() function can be used

Example: Smoking and Obesity

Let y_i = 1 if the child is obese and y_i = 0 otherwise, for i = 1, ..., n, and let x_i = (1, age_i, smoke_i, age_i × smoke_i)'. The Bernoulli likelihood is

L(y; β, x) = Π_{i=1}^n µ_i^{y_i} (1 - µ_i)^{1 - y_i},

where µ_i = Pr(y_i = 1 | x_i, β). Choosing the canonical link, µ_i = 1/{1 + exp(-x_i'β)}, results in a logistic regression model:

Pr(y_i = 1 | x_i, β) = exp(x_i'β) / {1 + exp(x_i'β)}.

Hence, the probability of obesity depends on age and smoking through a non-linear model.

With the 0/1 obesity outcome y and the predictors age and smoke in the data frame obese in R, we use

fit <- glm(y ~ age + smoke + age:smoke, family=binomial, data=obese)

to implement IWLS and fit the model. Note that the data are available on the web - try to replicate the results (note that children a year old or younger have been discarded). The command summary(fit) yields the results:

Coefficients:
                    Value  Std. Error    t value
(Intercept) -2.365173738  0.50112688 -4.7197104
age         -0.066204429  0.08957593 -0.7390873
smoke       -0.043079741  0.22375895 -0.1925275
age:smoke   -0.008448488  0.04010827 -0.2106420

Null Deviance: 1580.905 on 3874 degrees of freedom
Residual Deviance: 1574.663 on 3871 degrees of freedom
Number of Fisher Scoring Iterations: 6

Correlation of Coefficients:
          (Intercept)        age      smoke
age        -0.9382877
smoke      -0.9067235  0.8520241
age:smoke   0.8496495 -0.9062117 -0.9391875

Thus, the IWLS algorithm converged in 6 iterations to the MLE:

β̂ = (-2.365, -0.066, -0.043, -0.008)

For any value of the covariates we can calculate the probability of obesity. For example, for non-smokers the age curve can be plotted using:

beta <- fit$coef
## introduce grid spanning range of observed ages
x <- seq(min(obese$age), max(obese$age), length=100)
## calculate fitted probability of obesity for non-smokers
py <- 1/(1 + exp(-(beta[1] + beta[2]*x)))
plot(x, py, xlab="Age in years", ylab="Pr(obesity)")