STA216: Generalized Linear Models. Lecture 1. Review and Introduction


Let $y_1, \dots, y_n$ denote $n$ independent observations on a response. Treat $y_i$ as a realization of a random variable $Y_i$. In the general linear model, we assume that $Y_i \sim N(\mu_i, \sigma^2)$, and we further assume that the expected value $\mu_i$ is a linear function, $\mu_i = x_i'\beta$, where $x_i = (x_{i1}, \dots, x_{ip})'$ is a $p \times 1$ vector of predictors (covariates) and $\beta$ is a vector of unknown parameters (regression coefficients).

The generalized linear model generalizes both the random and systematic components.

Likelihood Function: The Exponential Family

We assume that observations come from a distribution in the exponential family with probability density function

$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\}. \quad (1)$$

Here $\theta_i$ and $\phi$ are parameters, and $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$ are known functions. The $\theta_i$ and $\phi$ are location and scale parameters, respectively.

For example, for the normal distribution we have

$$f(y_i; \theta_i, \phi) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{ -(y_i - \mu)^2/2\sigma^2 \} = \exp\left[ (y_i\mu - \mu^2/2)/\sigma^2 - \{ y_i^2/\sigma^2 + \log(2\pi\sigma^2) \}/2 \right],$$

so that $\theta_i = \mu$, $\phi = \sigma^2$, $a(\phi) = \phi$, $b(\theta_i) = \theta_i^2/2$, and $c(y_i, \phi) = -\frac{1}{2}\{ y_i^2/\sigma^2 + \log(2\pi\sigma^2) \}$. Thus, in this case $\theta_i$ is the mean and $\phi$ is the variance.

Let $\ell(\theta_i, \phi; y_i) = \log f(y_i; \theta_i, \phi)$ denote the log-likelihood function. We can derive the mean and variance for the general case using the identities

$$E\left( \frac{\partial \ell}{\partial \theta_i} \right) = 0 \quad \text{and} \quad E\left( \frac{\partial^2 \ell}{\partial \theta_i^2} \right) + E\left( \frac{\partial \ell}{\partial \theta_i} \right)^2 = 0.$$
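As a quick numerical sanity check of this factorization (a Python sketch, not part of the original notes; the test values are arbitrary), the standard form of the normal density and its exponential-family form can be compared directly:

```python
import math

def normal_pdf(y, mu, sigma2):
    # standard form of the N(mu, sigma^2) density
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def expfam_pdf(y, theta, phi):
    # exponential-family form with b(theta) = theta^2/2, a(phi) = phi,
    # c(y, phi) = -{y^2/phi + log(2*pi*phi)}/2
    b = theta ** 2 / 2
    c = -0.5 * (y ** 2 / phi + math.log(2 * math.pi * phi))
    return math.exp((y * theta - b) / phi + c)

# the two forms agree at arbitrary points, with theta = mu and phi = sigma^2
for y in (-1.3, 0.0, 2.7):
    assert abs(normal_pdf(y, 0.5, 2.0) - expfam_pdf(y, 0.5, 2.0)) < 1e-12
```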

Note that $\ell(\theta_i; y_i) = \{ y_i\theta_i - b(\theta_i) \}/a(\phi) + c(y_i, \phi)$. It follows that

$$\frac{\partial \ell}{\partial \theta_i} = \{ y_i - b'(\theta_i) \}/a(\phi) \quad \text{and} \quad \frac{\partial^2 \ell}{\partial \theta_i^2} = -b''(\theta_i)/a(\phi).$$

Hence, from the previous equalities, we have

$$0 = E\left( \frac{\partial \ell}{\partial \theta_i} \right) = \{ E(y_i) - b'(\theta_i) \}/a(\phi),$$

which implies that $E(y_i) = b'(\theta_i)$.

Similarly, we have

$$0 = E\left( \frac{\partial^2 \ell}{\partial \theta_i^2} \right) + E\left( \frac{\partial \ell}{\partial \theta_i} \right)^2 = -\frac{b''(\theta_i)}{a(\phi)} + E\left[ \frac{\{ y_i - b'(\theta_i) \}^2}{a(\phi)^2} \right].$$

Multiplying through by $a(\phi)^2$ gives

$$0 = -b''(\theta_i)a(\phi) + E(y_i^2) - 2E(y_i)b'(\theta_i) + b'(\theta_i)^2 = -b''(\theta_i)a(\phi) + E(y_i^2) - E(y_i)^2,$$

so that

$$\mathrm{var}(y_i) = b''(\theta_i)a(\phi).$$

For most commonly used exponential family distributions, $a(\phi) = \phi/w_i$, where $\phi$ is a dispersion parameter and $w_i$ is a weight (typically equal to one). Hence, the mean and variance will typically follow the form

$$\mu_i = b'(\theta_i) \quad \text{and} \quad \sigma_i^2 = b''(\theta_i)\phi.$$
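The identities $\mu_i = b'(\theta_i)$ and $\mathrm{var}(y_i) = b''(\theta_i)a(\phi)$ can be checked numerically in the Poisson case, where $b(\theta) = e^\theta$ and $a(\phi) = 1$, by computing the moments directly from the pmf (a Python sketch, not part of the notes; the value $\mu = 3$ is arbitrary):

```python
import math

# Poisson in exponential-family form: theta = log(mu), b(theta) = exp(theta), a(phi) = 1
mu = 3.0
theta = math.log(mu)
b1 = math.exp(theta)   # b'(theta)  -- should equal the mean
b2 = math.exp(theta)   # b''(theta) -- should equal the variance

# moments computed directly from the Poisson pmf (truncated sum; the tail
# beyond k = 60 is negligible for mu = 3)
pmf = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(60)]
mean = sum(k * p for k, p in enumerate(pmf))
second = sum(k * k * p for k, p in enumerate(pmf))
var = second - mean**2

assert abs(mean - b1) < 1e-9   # E(y) = b'(theta)
assert abs(var - b2) < 1e-9    # var(y) = b''(theta) * a(phi), with a(phi) = 1
```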

Characteristics of common distributions in the exponential family:

                              Normal        Poisson     Binomial              Gamma
  Notation                    N(µ_i, σ²)    Pois(µ_i)   Bin(n_i, π_i)         G(µ_i, ν)
  Range of y_i                (−∞, ∞)       [0, ∞)      [0, n_i]              (0, ∞)
  Dispersion, φ               σ²            1           1/n_i                 ν⁻¹
  Cumulant, b(θ_i)            θ_i²/2        exp(θ_i)    log(1 + e^{θ_i})      −log(−θ_i)
  Mean function, µ(θ_i)       θ_i           exp(θ_i)    e^{θ_i}/(1 + e^{θ_i}) −1/θ_i
  Canonical link, θ(µ_i)      identity      log         logit                 reciprocal
  Variance function, V(µ_i)   1             µ           µ(1 − µ)              µ²
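The cumulant, mean-function, and variance-function columns of the table are consistent with one another, which can be cross-checked by numerically differentiating each $b(\theta)$ (a Python sketch, not part of the notes; the test points are arbitrary):

```python
import math

# numerically check the table: b'(theta) = mu(theta) and b''(theta) = V(mu)
h = 1e-5
def d1(f, t): return (f(t + h) - f(t - h)) / (2 * h)
def d2(f, t): return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

cases = [
    # (cumulant b(theta), mean function mu(theta), variance function V(mu), test theta)
    (lambda t: t**2 / 2,                lambda t: t,                                lambda m: 1.0,         0.7),   # Normal
    (lambda t: math.exp(t),             lambda t: math.exp(t),                      lambda m: m,           0.4),   # Poisson
    (lambda t: math.log1p(math.exp(t)), lambda t: math.exp(t) / (1 + math.exp(t)), lambda m: m * (1 - m), -0.3),  # Binomial
    (lambda t: -math.log(-t),           lambda t: -1.0 / t,                         lambda m: m**2,        -1.5),  # Gamma
]
for b, mean_fn, V, t in cases:
    mu = mean_fn(t)
    assert abs(d1(b, t) - mu) < 1e-6      # b'(theta) = mu(theta)
    assert abs(d2(b, t) - V(mu)) < 1e-4   # b''(theta) = V(mu)
```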

Systematic Component, Link Functions

Instead of modeling the mean $\mu_i$ as a linear function of the predictors $x_i$, we introduce a one-to-one, continuously differentiable transformation $g(\cdot)$ and focus on

$$\eta_i = g(\mu_i),$$

where $g(\cdot)$ is called the link function and $\eta_i$ the linear predictor. We assume that the transformed mean follows a linear model, $\eta_i = x_i'\beta$. Since the link function is one-to-one and invertible, we have

$$\mu_i = g^{-1}(\eta_i) = g^{-1}(x_i'\beta).$$
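A minimal sketch of the link/inverse-link relationship, using the log link as an example (Python, not part of the notes; the coefficient and covariate values are hypothetical):

```python
import math

def g(mu):        # link function: eta = g(mu)
    return math.log(mu)

def g_inv(eta):   # inverse link: mu = g^{-1}(eta)
    return math.exp(eta)

beta = [0.2, -0.9]   # hypothetical coefficients
x = [1.0, 1.4]       # intercept plus one covariate value
eta = sum(b * v for b, v in zip(beta, x))   # linear predictor x'beta
mu = g_inv(eta)                             # mu = g^{-1}(x'beta)
assert abs(g(mu) - eta) < 1e-12   # g is one-to-one: round trip recovers eta
assert mu > 0                     # the inverse log link respects the mean's range
```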

Note that we are transforming the expected value, $\mu_i$, instead of the raw data, $y_i$. For classical linear models, the mean is the linear predictor. In this case, the identity link is reasonable since both $\mu_i$ and $\eta_i$ can take any value on the real line. This is not the case in general.

Link Functions for Poisson Data

For example, if $Y_i \sim \mathrm{Pois}(\mu_i)$ then $\mu_i$ must be $> 0$. In this case, a linear model for the mean is not reasonable, since for some values of $x_i$ we would have $\mu_i \le 0$. By using the model

$$\eta_i = \log(\mu_i) = x_i'\beta,$$

we are guaranteed to have $\mu_i > 0$ for all $\beta \in \mathbb{R}^p$ and all values of $x_i$. In general, a link function for count data should map the interval $(0, \infty)$ to $\mathbb{R}$ (i.e., from the positive real numbers to the entire real line). The log link is a natural choice.
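A small simulation makes the point concrete (Python sketch, not part of the notes; the coefficients are hypothetical and chosen so the identity link fails):

```python
import math, random

# Under the log link, mu = exp(x'beta) is positive for every beta and x,
# whereas the identity link can produce an invalid (negative) Poisson mean.
random.seed(1)
beta0, beta1 = -2.0, 1.5   # hypothetical coefficients
etas = [beta0 + beta1 * random.gauss(0, 1) for _ in range(1000)]
assert all(math.exp(eta) > 0 for eta in etas)   # log link: mu > 0 always
assert any(eta < 0 for eta in etas)             # identity link: mu <= 0 for some x
```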

Link Functions for Binomial Data

For the binomial distribution, $0 < \mu_i < 1$. Therefore, the link function should map from $(0, 1)$ to $\mathbb{R}$. Standard choices:

1. logit: $\eta_i = \log\{\mu_i/(1 - \mu_i)\}$.
2. probit: $\eta_i = \Phi^{-1}(\mu_i)$, where $\Phi(\cdot)$ is the $N(0, 1)$ cdf.
3. complementary log-log: $\eta_i = \log\{-\log(1 - \mu_i)\}$.

Each of these choices is important in applications and will be considered in detail later in the course.
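All three links are invertible maps from $(0, 1)$ onto the real line, which a round-trip check confirms (Python sketch, not part of the notes; `statistics.NormalDist` supplies $\Phi$ and $\Phi^{-1}$):

```python
import math
from statistics import NormalDist

# each entry: (link g, inverse link g^{-1})
links = {
    "logit":   (lambda m: math.log(m / (1 - m)),      lambda e: 1 / (1 + math.exp(-e))),
    "probit":  (NormalDist().inv_cdf,                 NormalDist().cdf),
    "cloglog": (lambda m: math.log(-math.log(1 - m)), lambda e: 1 - math.exp(-math.exp(e))),
}
for name, (g, g_inv) in links.items():
    for mu in (0.05, 0.5, 0.95):
        eta = g(mu)                         # eta can be any real number
        assert abs(g_inv(eta) - mu) < 1e-9  # each link is invertible on (0, 1)
```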

Canonical Links and Sufficient Statistics

Each of the distributions we have considered has a special, canonical, link function for which there exists a sufficient statistic equal in dimension to $\beta$. Canonical links occur when $\theta_i = \eta_i$, with $\theta_i$ the canonical parameter. As a homework exercise, please show for next class that the following distributions are in the exponential family and have the listed canonical links:

Normal:   $\eta_i = \mu_i$
Poisson:  $\eta_i = \log \mu_i$
Binomial: $\eta_i = \log\{\mu_i/(1 - \mu_i)\}$
Gamma:    $\eta_i = \mu_i^{-1}$

For the canonical links, the sufficient statistic is $X'y$, with components $\sum_i x_{ij} y_i$, for $j = 1, \dots, p$.
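The sufficiency of $X'y$ under the canonical link can be illustrated numerically: two response vectors with the same $X'y$ give identical logistic log-likelihoods at every $\beta$ (Python sketch, not part of the notes; the small design matrix is made up, with duplicated rows so that $y$ can be permuted without changing $X'y$):

```python
import math

X = [[1.0, 0.5], [1.0, 0.5], [1.0, -1.0], [1.0, -1.0]]
y1 = [1, 0, 0, 1]
y2 = [0, 1, 1, 0]   # swaps within duplicated rows: X'y is unchanged

def xty(X, y):
    # components sum_i x_ij * y_i, for j = 1, ..., p
    p = len(X[0])
    return [sum(X[i][j] * y[i] for i in range(len(y))) for j in range(p)]

def loglik(beta, X, y):
    # canonical-link (logistic) log-likelihood: sum_i {y_i x_i'beta - log(1 + e^{x_i'beta})}
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll

assert xty(X, y1) == xty(X, y2)
for beta in ([0.0, 0.0], [1.0, -2.0], [-0.5, 0.3]):
    assert abs(loglik(beta, X, y1) - loglik(beta, X, y2)) < 1e-12
```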

Although canonical links often have nice properties, selection of the link function should be based on prior expectation and model fit.

Example: Logistic Regression

Suppose $y_i \sim \mathrm{Bin}(1, p_i)$, for $i = 1, \dots, n$, are independent 0/1 indicator variables of an adverse response (e.g., preterm birth), and let $x_i$ denote a $p \times 1$ vector of predictors for individual $i$ (e.g., dose of DDE exposure, race, age, etc.). The likelihood is as follows:

$$f(y \mid \beta) = \prod_{i=1}^n p_i^{y_i}(1 - p_i)^{1 - y_i} = \prod_{i=1}^n \left( \frac{p_i}{1 - p_i} \right)^{y_i} (1 - p_i) = \prod_{i=1}^n \exp\left\{ y_i \log\left( \frac{p_i}{1 - p_i} \right) + \log(1 - p_i) \right\} = \exp\left[ \sum_{i=1}^n \{ y_i\theta_i - \log(1 + e^{\theta_i}) \} \right],$$

where $\theta_i = \log\{p_i/(1 - p_i)\}$.

Choosing the canonical link,

$$\theta_i = \log\left( \frac{p_i}{1 - p_i} \right) = x_i'\beta,$$

the likelihood has the following form:

$$f(y \mid \beta) = \exp\left[ \sum_{i=1}^n \{ y_i x_i'\beta - \log(1 + e^{x_i'\beta}) \} \right].$$

This is logistic regression, which is widely used in epidemiology and other applications for modeling binary response data. In general, if $f(y_i; \theta_i, \phi)$ is in the exponential family and $\theta_i = \theta(\eta_i)$, $\eta_i = x_i'\beta$, then the model is called a generalized linear model (GLM).
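The equality of the Bernoulli product likelihood and the exponential-family form above can be checked on simulated data (Python sketch, not part of the notes; the coefficients and data are arbitrary):

```python
import math, random

random.seed(2)
beta = [0.3, -0.7]   # arbitrary coefficients
X = [[1.0, random.gauss(0, 1)] for _ in range(20)]
y = [int(random.random() < 0.5) for _ in range(20)]

ll_bernoulli = 0.0   # sum of y*log(p) + (1-y)*log(1-p)
ll_expform = 0.0     # sum of y*theta - log(1 + e^theta), with theta = x'beta
for xi, yi in zip(X, y):
    eta = sum(b * v for b, v in zip(beta, xi))
    p = 1 / (1 + math.exp(-eta))
    ll_bernoulli += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    ll_expform += yi * eta - math.log1p(math.exp(eta))

assert abs(ll_bernoulli - ll_expform) < 1e-9
```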

Maximum Likelihood Estimation of GLMs

Unlike for the general linear model, there is in general no closed-form expression for the MLE of $\beta$ in a GLM. However, all GLMs can be fit using the same algorithm, a form of iteratively re-weighted least squares:

1. Given an initial value for $\hat\beta$, calculate the estimated linear predictor $\hat\eta_i = x_i'\hat\beta$ and use that to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Calculate the adjusted dependent variable,

$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\left( \frac{d\eta_i}{d\mu_i} \right)_0,$$

where the derivative is evaluated at $\hat\mu_i$.

2. Calculate the iterative weights,

$$W_i^{-1} = \left( \frac{d\eta_i}{d\mu_i} \right)_0^2 V_i,$$

where $V_i$ is the variance function evaluated at $\hat\mu_i$.

3. Regress $z_i$ on $x_i$ with weight $W_i$ to give new estimates of $\hat\beta$, and repeat until convergence.
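The three steps above can be sketched for logistic regression, where $d\eta_i/d\mu_i = 1/\{\mu_i(1-\mu_i)\}$ and $V_i = \mu_i(1-\mu_i)$, so $W_i = \mu_i(1-\mu_i)$. A minimal NumPy implementation (illustrative only, assuming NumPy is available; the course itself uses S-PLUS, and the simulated data are hypothetical):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Iteratively re-weighted least squares for logistic regression,
    following steps 1-3 above."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                      # step 1: linear predictor
        mu = 1 / (1 + np.exp(-eta))         # fitted values g^{-1}(eta)
        deta_dmu = 1 / (mu * (1 - mu))
        z = eta + (y - mu) * deta_dmu       # adjusted dependent variable
        W = mu * (1 - mu)                   # step 2: W^{-1} = (deta/dmu)^2 * V
        XtW = X.T * W                       # step 3: weighted least squares,
        beta = np.linalg.solve(XtW @ X, XtW @ z)  # solve (X'WX) beta = X'Wz
    return beta

# check rough recovery of known coefficients on simulated data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)
beta_hat = irls_logistic(X, y)
assert np.all(np.abs(beta_hat - true_beta) < 0.5)
```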

Justification for the IWLS Procedure

Note that the log-likelihood can be expressed as

$$\ell = \sum_{i=1}^n \left[ \{ y_i\theta_i - b(\theta_i) \}/a(\phi) + c(y_i, \phi) \right].$$

To maximize this log-likelihood we need $\partial\ell/\partial\beta_j$:

$$\frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^n \frac{\partial \ell_i}{\partial \theta_i}\,\frac{d\theta_i}{d\mu_i}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} = \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)}\,\frac{1}{V_i}\,\frac{d\mu_i}{d\eta_i}\,x_{ij} = \sum_{i=1}^n \frac{W_i (y_i - \mu_i)}{a(\phi)}\,\frac{d\eta_i}{d\mu_i}\,x_{ij},$$

since $\mu_i = b'(\theta_i)$ and $b''(\theta_i) = V_i$ imply $d\mu_i/d\theta_i = V_i$. With constant dispersion ($a(\phi) = \phi$), the MLE equations for $\beta_j$ become

$$\sum_{i=1}^n W_i (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ij} = 0.$$

Fisher's scoring method uses the gradient vector, $\partial\ell/\partial\beta = u$, and minus the expected value of the Hessian matrix,

$$A = -E\left( \frac{\partial^2 \ell}{\partial \beta_r \partial \beta_s} \right).$$

Given the current estimate $b$ of $\beta$, choose the adjustment $\delta b$ so that $A\,\delta b = u$. Excluding $\phi$, the components of $u$ are

$$u_r = \sum_{i=1}^n W_i (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ir},$$

so we have

$$A_{rs} = -E\left( \frac{\partial u_r}{\partial \beta_s} \right) = E \sum_{i=1}^n \left[ -(y_i - \mu_i)\,\frac{\partial}{\partial \beta_s}\left\{ W_i \frac{d\eta_i}{d\mu_i}\,x_{ir} \right\} + W_i \frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{\partial \mu_i}{\partial \beta_s} \right].$$

The expectation of the first term is 0, and the second term is

$$\sum_{i=1}^n W_i \frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{\partial \mu_i}{\partial \beta_s} = \sum_{i=1}^n W_i \frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial \eta_i}{\partial \beta_s} = \sum_{i=1}^n W_i\,x_{ir}\,x_{is}.$$

The new estimate $b^* = b + \delta b$ of $\beta$ thus satisfies

$$A b^* = A b + A\,\delta b = A b + u,$$

where

$$(A b)_r = \sum_s A_{rs} b_s = \sum_{i=1}^n W_i\,x_{ir}\,\eta_i.$$

Thus, the new estimate $b^*$ satisfies

$$(A b^*)_r = \sum_{i=1}^n W_i\,x_{ir}\,\{ \eta_i + (y_i - \mu_i)\,d\eta_i/d\mu_i \}.$$

These equations have the form of a weighted least squares equation with weight $W_i$ and dependent variable $z_i$.

Next Class

Topic: Frequentist inference for GLMs.

Have the homework exercise completed and written up, and complete the following exercise in S-PLUS:

1. Simulate $x_i \sim N(0, 1)$ and $y_i \sim N(-1 + 2x_i,\, 0.5)$, for $i = 1, \dots, 50$.

2. Fit the linear regression model $E(y_i \mid x_i) = \beta_1 + \beta_2 x_i$ using both the lm and glm functions; use help(lm) and help(glm) in S-PLUS for details on implementation.

3. Answer the questions: (a) What are the estimates? (b) Is there any difference in the output?
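For readers without S-PLUS, a rough Python/NumPy analogue of this exercise (a hypothetical translation, not the assigned version; 0.5 is read as the variance $\sigma^2$, matching the $N(\mu, \sigma^2)$ notation used in these notes). For a Gaussian GLM with identity link and constant variance, the IRLS step reduces to unweighted least squares, so the "lm"-style and "glm"-style fits coincide:

```python
import numpy as np

# simulate x_i ~ N(0, 1) and y_i ~ N(-1 + 2 x_i, 0.5), i = 1, ..., 50
rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = rng.normal(loc=-1 + 2 * x, scale=np.sqrt(0.5))   # 0.5 treated as variance
X = np.column_stack([np.ones(n), x])

# "lm": closed-form least squares
beta_lm = np.linalg.lstsq(X, y, rcond=None)[0]

# "glm" with identity link, constant variance: IRLS weights are constant,
# so the weighted least squares step is just the normal equations
beta_glm = np.linalg.solve(X.T @ X, X.T @ y)

assert np.allclose(beta_lm, beta_glm)   # the two fits agree, as in lm vs glm
```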