STA 216: GENERALIZED LINEAR MODELS

Lecture 1. Review and Introduction

Much of statistics is based on the assumption that random variables are continuous & normally distributed.

Normal linear regression models are widely used for relating predictors $x_i = (x_{i1}, \ldots, x_{ip})'$ for subject $i$ to a response $y_i$:

$$(y_i \mid x_i) \sim N(x_i'\beta, \sigma^2),$$

where $\beta$ are regression coefficients and $\sigma^2$ is the residual variance.

For example, we may be interested in investigating the genetic, demographic, and dietary factors (quantified in the $x_i$ vector) predictive of blood pressure ($y_i$).

In the above example, $y_1, \ldots, y_n$ were treated as independent normally distributed random variables with a linear regression for the mean.

Multivariate normal linear models are also very useful:

$$y_i = (y_{i1}, \ldots, y_{iq})' \sim N_q(\alpha x_i, \Sigma),$$

where $\alpha$ is a $q \times p$ matrix of regression coefficients & $\Sigma$ is a $q \times q$ covariance matrix.

In this course, we focus on generalizations of normal linear regression models to address a broad class of data structures. We start with the Generalized Linear Model.

Some Motivating Examples

Caesarian Birth Study (Fahrmeir and Tutz, 2001): Data on infection from births by Caesarian section.

                          Caesarian planned       Not planned
                          Infection               Infection
                          I     II    N           I     II    N
Antibiotics
    Risk-factors          0     1     17          4     7     87
    No risk-factors       0     0     2           0     0     0
No antibiotics
    Risk-factors          11    17    30          10    13    3
    No risk-factors       4     4     32          0     0     9

Response variable: Occurrence or non-occurrence of infection, with two types of infection (I, II)

Covariates:
1. Was the Caesarian planned (1=yes; 0=no)
2. Were risk factors present (1=yes; 0=no)
3. Were antibiotics given (1=yes; 0=no)

Scale of response: either binary (infection or not) or unordered categorical

Cellular Differentiation (Piegorsch, Weinberg & Margolin, 1988): Interest in the effect of two agents with immuno-activating ability that may induce cell differentiation.

Response variable: the number of cells that exhibited markers of differentiation after exposure.

Scientific interest: Do the agents TNF (tumor necrosis factor) and IFN (interferon) stimulate cell differentiation independently, or is there a synergistic effect?

Cellular Differentiation Data

Number of cells      Dose of        Dose of
differentiating      TNF (U/ml)     IFN (U/ml)
 11                    0              0
 18                    0              4
 20                    0             20
 39                    0            100
 22                    1              0
 38                    1              4
 52                    1             20
 69                    1            100
 31                   10              0
 68                   10              4
 69                   10             20
128                   10            100
102                  100              0
171                  100              4
180                  100             20
193                  100            100

Scale of response variable: count

Job expectations for psychology students: In a study of students' perspectives, psychology students at the University of Regensburg were asked whether they expected to find adequate employment after getting their degree.

Response variable: Ordered categorical, ranked 1-3:
1. Don't expect adequate employment
2. Not sure
3. Immediately after the degree

Predictor: Age in years

Grouped Job Expectations Data

Age in      Response
years       1     2     3
19          1     2     0
20          5     18    2
21          6     19    2
22          1     6     3
23          2     7     3
24          1     7     5
25          0     0     3
26          0     1     0
27          0     2     1
29          1     0     0
30          0     0     2
31          0     1     0
34          0     1     0

Generalized Linear Model: Motivation

Clearly, normal linear regression models are not appropriate for these examples.

We need a more general regression framework accounting for response data having a variety of measurement scales.

We also need methods for model fitting & inference in this framework.

Ideally, some simplifications of linear regression would carry over.

Generalizations to more complex settings (correlated data, censored observations, etc.) will be necessary in many applications.

Generalized Linear Models: The Basics

In the general linear model, $(y_i \mid x_i) \sim N(x_i'\beta, \sigma^2)$, with

$$E(y_i \mid x_i) = x_i'\beta \quad \text{(systematic component)}$$
$$V(y_i \mid x_i) = \sigma^2 \quad \text{(random component)}$$

The generalized linear model generalizes both the random & systematic components.

The focus is on distributions in the exponential family, which includes many useful special cases (normal, Poisson, gamma, binomial, etc.).

Likelihood Function: The Simple Exponential Family

Observations $y_i$ are conditionally independent given $x_i$ ($i = 1, \ldots, n$).

The conditional distribution of $y_i \mid x_i$ belongs to a simple exponential family.

Thus, the probability density function can be expressed as:

$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\}. \quad (1)$$

Here, $\theta_i, \phi$ are parameters and $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions.

The $\theta_i$ and $\phi$ are location and scale parameters, respectively.

For example, for the Normal distribution, we have

$$f(y_i; \theta_i, \phi) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{-(y_i - \mu)^2/2\sigma^2\} = \exp\left[(y_i\mu - \mu^2/2)/\sigma^2 - \{y_i^2/\sigma^2 + \log(2\pi\sigma^2)\}/2\right]$$

so that $\theta_i = \mu$, $\phi = \sigma^2$, $a(\phi) = \phi$, $b(\theta_i) = \theta_i^2/2$ and $c(y_i, \phi) = -\frac{1}{2}\{y_i^2/\sigma^2 + \log(2\pi\sigma^2)\}$.

Thus, in this case $\theta_i$ is the mean and $\phi$ is the variance.
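As a quick numerical check of this decomposition, the following sketch evaluates the normal density both directly and in the exponential family form (1), with $b(\theta) = \theta^2/2$, $a(\phi) = \phi$ and $c(y, \phi)$ as given above. The function names and the test values of $y$, $\mu$ and $\sigma^2$ are arbitrary choices made for illustration only.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Normal density written in its usual form."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def expfam_pdf(y, theta, phi):
    """Same density in exponential family form exp{(y*theta - b(theta))/a(phi) + c(y, phi)}."""
    b = theta ** 2 / 2                                        # cumulant function b(theta)
    c = -0.5 * (y ** 2 / phi + math.log(2 * math.pi * phi))   # c(y, phi)
    return math.exp((y * theta - b) / phi + c)                # a(phi) = phi

# Arbitrary illustrative values (not from the lecture).
y, mu, sigma2 = 1.3, 0.4, 2.5
direct = normal_pdf(y, mu, sigma2)
expfam = expfam_pdf(y, theta=mu, phi=sigma2)
print(direct, expfam)
```

The two printed values should agree up to floating-point rounding, which is what the identification $\theta_i = \mu$, $\phi = \sigma^2$ claims.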

Let $l(\theta_i, \phi; y_i) = \log f(y_i; \theta_i, \phi)$ denote the log-likelihood function.

We can derive the mean and variance for the general case using:

$$E\left(\frac{\partial l}{\partial \theta_i}\right) = 0 \quad \text{and} \quad E\left(\frac{\partial^2 l}{\partial \theta_i^2}\right) + E\left\{\left(\frac{\partial l}{\partial \theta_i}\right)^2\right\} = 0.$$

These relations are well-known properties of the likelihood function, obtained by differentiating the identity

$$\int f(y_i; \theta_i, \phi)\, dy_i \equiv 1$$

with respect to $\theta_i$, holding the dispersion parameter $\phi$ fixed.

In these relations, differentiation & averaging occur at the same value of $\theta_i$.

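Note that

$$l(\theta_i; y_i) = \{y_i\theta_i - b(\theta_i)\}/a(\phi) + c(y_i, \phi).$$

It follows that

$$\frac{\partial l}{\partial \theta_i} = \{y_i - b'(\theta_i)\}/a(\phi) \quad \text{and} \quad \frac{\partial^2 l}{\partial \theta_i^2} = -b''(\theta_i)/a(\phi).$$

Hence, from the previous equalities, we have

$$0 = E\left(\frac{\partial l}{\partial \theta_i}\right) = \{E(y_i) - b'(\theta_i)\}/a(\phi),$$

which implies that $E(y_i) = b'(\theta_i)$.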

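Similarly, we have

$$\begin{aligned}
0 &= E\left(\frac{\partial^2 l}{\partial \theta_i^2}\right) + E\left\{\left(\frac{\partial l}{\partial \theta_i}\right)^2\right\} \\
  &= -b''(\theta_i)/a(\phi) + E[\{y_i - b'(\theta_i)\}^2/a(\phi)^2].
\end{aligned}$$

Multiplying through by $a(\phi)^2$ and using $E(y_i) = b'(\theta_i)$ gives

$$\begin{aligned}
0 &= -b''(\theta_i)a(\phi) + E(y_i^2) - 2E(y_i)b'(\theta_i) + b'(\theta_i)^2 \\
  &= -b''(\theta_i)a(\phi) + E(y_i^2) - E(y_i)^2,
\end{aligned}$$

so that $\text{var}(y_i) = b''(\theta_i)a(\phi)$.

For most commonly used exponential family distributions, $a(\phi) = \phi/w_i$, where $\phi$ is a dispersion parameter and $w_i$ is a weight (typically equal to one).

Hence, the mean and variance will typically follow the form:

$$\mu_i = b'(\theta_i) \quad \text{and} \quad \sigma^2 = b''(\theta_i)\phi.$$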
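The results $E(y_i) = b'(\theta_i)$ and $\text{var}(y_i) = b''(\theta_i)a(\phi)$ can be checked numerically. The sketch below does this for the Poisson case, where $b(\theta) = \exp(\theta)$ and $a(\phi) = 1$: it differentiates $b$ by finite differences and compares the results with moments computed directly from the Poisson probabilities. The helper names, the step size, the truncation of the sum at 60 terms, and the choice $\mu = 3$ are all assumptions made for illustration.

```python
import math

def b(theta):
    """Poisson cumulant function b(theta) = exp(theta)."""
    return math.exp(theta)

def first_derivative(f, x, h=1e-3):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

mu = 3.0                      # illustrative Poisson mean
theta = math.log(mu)          # canonical parameter for the Poisson

# Moments computed directly from the Poisson probabilities (sum truncated at 60 terms).
pmf = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(60)]
mean_direct = sum(k * p for k, p in enumerate(pmf))
var_direct = sum((k - mean_direct) ** 2 * p for k, p in enumerate(pmf))

mean_from_b = first_derivative(b, theta)    # b'(theta)
var_from_b = second_derivative(b, theta)    # b''(theta) * a(phi), with a(phi) = 1

print(mean_direct, mean_from_b)
print(var_direct, var_from_b)
```

Both pairs should agree to within the accuracy of the finite differences, reflecting that the Poisson mean and variance are both equal to $\exp(\theta_i)$.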

Characteristics of common distributions in the exponential family

                             Normal          Poisson       Binomial               Gamma
Notation                     N(µ_i, σ^2)     Pois(µ_i)     Bin(n_i, π_i)          G(µ_i, ν)
Range of y_i                 (−∞, ∞)         [0, ∞)        [0, n_i]               (0, ∞)
Dispersion, φ                σ^2             1             1/n_i                  ν^{−1}
Cumulant, b(θ_i)             θ_i^2/2         exp(θ_i)      log(1 + e^{θ_i})       −log(−θ_i)
Mean function, µ(θ_i)        θ_i             exp(θ_i)      1/(1 + e^{−θ_i})       −1/θ_i
Canonical link, θ(µ_i)       identity        log           logit                  reciprocal
Variance function, V(µ_i)    1               µ             µ(1 − µ)               µ^2

Systematic Component, Link Functions

Instead of modeling the mean, $\mu_i$, as a linear function of predictors, $x_i$, we introduce a one-to-one continuously differentiable transformation $g(\cdot)$ and focus on

$$\eta_i = g(\mu_i),$$

where $g(\cdot)$ will be called the link function and $\eta_i$ the linear predictor.

We assume that the transformed mean follows a linear model, $\eta_i = x_i'\beta$.

Since the link function is invertible and one-to-one, we have

$$\mu_i = g^{-1}(\eta_i) = g^{-1}(x_i'\beta).$$

Note that we are transforming the expected value, $\mu_i$, instead of the raw data, $y_i$.

For classical linear models, the mean is the linear predictor. In this case, the identity link is reasonable since both $\mu_i$ and $\eta_i$ can take any value on the real line.

This is not the case in general.

Link Functions for Poisson Data

For example, if $Y_i \sim \text{Poi}(\mu_i)$ then $\mu_i$ must be $> 0$.

In this case, a linear model is not reasonable since for some values of $x_i$ it gives $\mu_i \le 0$.

By using the model

$$\eta_i = \log(\mu_i) = x_i'\beta,$$

we are guaranteed to have $\mu_i > 0$ for all $\beta \in \mathbb{R}^p$ and all values of $x_i$.

In general, a link function for count data should map the interval $(0, \infty) \to \mathbb{R}$ (i.e., from the positive real numbers to the entire real line).

The log link is a natural choice.

Link Functions for Binomial Data

For the binomial distribution, $0 < \mu_i < 1$. Therefore, the link function should map from $(0, 1) \to \mathbb{R}$.

Standard choices:
1. logit: $\eta_i = \log\{\mu_i/(1 - \mu_i)\}$.
2. probit: $\eta_i = \Phi^{-1}(\mu_i)$, where $\Phi(\cdot)$ is the $N(0, 1)$ cdf.
3. complementary log-log: $\eta_i = \log\{-\log(1 - \mu_i)\}$.

Each of these choices is important in applications & will be considered in detail later in the course.
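The three links listed above, together with their inverses, can be written down directly. The following is a minimal sketch using only the Python standard library (`statistics.NormalDist` supplies the standard normal cdf and its inverse); the function names are illustrative choices, not notation from the lecture.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

# Link functions g: (0, 1) -> R
def logit(mu):
    return math.log(mu / (1 - mu))

def probit(mu):
    return STD_NORMAL.inv_cdf(mu)

def cloglog(mu):
    return math.log(-math.log(1 - mu))

# Inverse links g^{-1}: R -> (0, 1)
def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def inv_probit(eta):
    return STD_NORMAL.cdf(eta)

def inv_cloglog(eta):
    return 1 - math.exp(-math.exp(eta))

# Each link maps a probability to the real line, and its inverse maps it back.
for mu in (0.1, 0.5, 0.9):
    print(mu, logit(mu), probit(mu), cloglog(mu))
```

The logit and probit links are symmetric about $\mu_i = 0.5$ (both give $\eta_i = 0$ there), whereas the complementary log-log link is not.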

Canonical Links and Sufficient Statistics

Each of the distributions we have considered has a special, canonical, link function for which there exists a sufficient statistic equal in dimension to $\beta$.

Canonical links occur when $\theta_i = \eta_i$, with $\theta_i$ the canonical parameter.

As a homework exercise, please show for next class that the following distributions are in the exponential family and have the listed canonical links:

Normal      $\eta_i = \mu_i$
Poisson     $\eta_i = \log \mu_i$
Binomial    $\eta_i = \log\{\mu_i/(1 - \mu_i)\}$
Gamma       $\eta_i = \mu_i^{-1}$

For the canonical links, the sufficient statistic is $X'y$, with components $\sum_i x_{ij} y_i$, for $j = 1, \ldots, p$.

Although canonical links often have nice properties, selection of the link function should be based on prior expectation and model fit.

Example: Logistic Regression

Suppose $y_i \sim \text{Bin}(1, p_i)$, for $i = 1, \ldots, n$, are independent 0/1 indicator variables of an adverse response (e.g., preterm birth) and let $x_i$ denote a $p \times 1$ vector of predictors for individual $i$ (e.g., dose of DDE exposure, race, age, etc.).

The likelihood is as follows:

$$\begin{aligned}
f(y \mid \beta) &= \prod_{i=1}^n p_i^{y_i}(1 - p_i)^{1 - y_i} = \prod_{i=1}^n \left(\frac{p_i}{1 - p_i}\right)^{y_i}(1 - p_i) \\
&= \prod_{i=1}^n \exp\left\{ y_i \log\left(\frac{p_i}{1 - p_i}\right) - \log\left(\frac{1}{1 - p_i}\right) \right\} \\
&= \exp\left[ \sum_{i=1}^n \{y_i\theta_i - \log(1 + e^{\theta_i})\} \right],
\end{aligned}$$

where $\theta_i = \log\{p_i/(1 - p_i)\}$.

Choosing the canonical link,

$$\theta_i = \log\left(\frac{p_i}{1 - p_i}\right) = x_i'\beta,$$

the likelihood has the following form:

$$f(y \mid \beta) = \exp\left[ \sum_{i=1}^n \{y_i x_i'\beta - \log(1 + e^{x_i'\beta})\} \right].$$

This is logistic regression, which is widely used in epidemiology and other applications for modeling of binary response data.

In general, if $f(y_i; \theta_i, \phi)$ is in the exponential family and $\theta_i = \theta(\eta_i)$, $\eta_i = x_i'\beta$, then the model is called a generalized linear model (GLM).
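To make the equivalence of the two forms concrete, the sketch below evaluates the logistic regression log-likelihood once in exponential family form, $\sum_i \{y_i x_i'\beta - \log(1 + e^{x_i'\beta})\}$, and once in the original Bernoulli form. The data, the coefficient values and the function names are made up purely for illustration.

```python
import math

def linear_predictor(xi, beta):
    """eta_i = x_i' beta."""
    return sum(xij * bj for xij, bj in zip(xi, beta))

def loglik_expfam(beta, X, y):
    """Log-likelihood in exponential family form with the canonical (logit) link."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(xi, beta)
        total += yi * eta - math.log1p(math.exp(eta))
    return total

def loglik_bernoulli(beta, X, y):
    """Log-likelihood in the original Bernoulli form, sum of y log p + (1 - y) log(1 - p)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-linear_predictor(xi, beta)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Hypothetical toy data: an intercept column and one predictor.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.0]]
y = [1, 0, 1, 0]
beta = [-0.3, 0.8]

ll_expfam = loglik_expfam(beta, X, y)
ll_bernoulli = loglik_bernoulli(beta, X, y)
print(ll_expfam, ll_bernoulli)
```

The two values should coincide up to rounding for any choice of $\beta$, since the two expressions are algebraic rearrangements of each other.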

Maximum Likelihood Estimation of GLMs

Unlike for the general linear model, there is in general no closed-form expression for the MLE of $\beta$ in GLMs.

However, all GLMs can be fit using the same algorithm, a form of iteratively re-weighted least squares (IWLS):

1. Given an initial value $\hat\beta$ for $\beta$, calculate the estimated linear predictor $\hat\eta_i = x_i'\hat\beta$ and use that to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Calculate the adjusted dependent variable,
$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\left(\frac{d\eta_i}{d\mu_i}\right)_0,$$
where the derivative is evaluated at $\hat\mu_i$.
2. Calculate the iterative weights
$$W_i^{-1} = \left(\frac{d\eta_i}{d\mu_i}\right)_0^2 V_i,$$
where $V_i$ is the variance function evaluated at $\hat\mu_i$.
3. Regress $z_i$ on $x_i$ with weight $W_i$ to give new estimates of $\beta$.

These steps are repeated, starting each time from the new estimates, until the estimates converge.
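The three steps above can be sketched in a few lines for one concrete case. The code below assumes a Poisson response with the log link, for which $d\eta_i/d\mu_i = 1/\mu_i$ and $V_i = \mu_i$, so that $W_i = \mu_i$. It is applied to the cellular differentiation data with an intercept and main effects for the TNF and IFN doses; that model, the intercept-only starting value, the convergence tolerance and the small linear solver are all illustrative assumptions rather than choices made in the lecture.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def iwls_poisson(X, y, max_iter=50, tol=1e-8):
    """IWLS for a Poisson GLM with log link; column 0 of X must be all ones."""
    p = len(X[0])
    # Assumed starting value: the intercept-only fit.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        # Step 1: linear predictor, fitted values, adjusted dependent variable.
        eta = [sum(xj * bj for xj, bj in zip(xi, beta)) for xi in X]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        # Step 2: weights; for the log link W_i = mu_i.
        w = mu
        # Step 3: weighted least squares, solving (X'WX) beta = X'Wz.
        xtwx = [[sum(wi * xi[r] * xi[s] for wi, xi in zip(w, X))
                 for s in range(p)] for r in range(p)]
        xtwz = [sum(wi * xi[r] * zi for wi, xi, zi in zip(w, X, z))
                for r in range(p)]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(nb - bj) for nb, bj in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Cellular differentiation data from the table earlier in the lecture.
counts = [11, 18, 20, 39, 22, 38, 52, 69, 31, 68, 69, 128, 102, 171, 180, 193]
tnf = [0] * 4 + [1] * 4 + [10] * 4 + [100] * 4
ifn = [0, 4, 20, 100] * 4
X = [[1.0, float(t), float(f)] for t, f in zip(tnf, ifn)]

beta_hat = iwls_poisson(X, counts)
mu_hat = [math.exp(sum(xj * bj for xj, bj in zip(xi, beta_hat))) for xi in X]
print(beta_hat)
```

Because the log link is canonical for the Poisson, the fitted values at convergence should satisfy $\sum_i x_{ij}(y_i - \hat\mu_i) = 0$ for each $j$; in particular the fitted counts sum to the observed total. Addressing the scientific question about synergy would require adding a TNF by IFN interaction column to `X`, which this sketch does not do.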

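Justification for the IWLS procedure

Note that the log-likelihood can be expressed as

$$l = \sum_{i=1}^n \left[ \{y_i\theta_i - b(\theta_i)\}/a(\phi) + c(y_i, \phi) \right].$$

To maximize this log-likelihood we need $\partial l/\partial \beta_j$:

$$\begin{aligned}
\frac{\partial l}{\partial \beta_j} &= \sum_{i=1}^n \frac{\partial l_i}{\partial \theta_i} \frac{d\theta_i}{d\mu_i} \frac{d\mu_i}{d\eta_i} \frac{\partial \eta_i}{\partial \beta_j} \\
&= \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)} \frac{1}{V_i} \frac{d\mu_i}{d\eta_i} x_{ij} \\
&= \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)} W_i \frac{d\eta_i}{d\mu_i} x_{ij},
\end{aligned}$$

since $\mu_i = b'(\theta_i)$ and $b''(\theta_i) = V_i$ implies $d\mu_i/d\theta_i = V_i$.

With constant dispersion ($a(\phi) = \phi$), the MLE equations for $\beta_j$ are:

$$\sum_{i=1}^n W_i (y_i - \mu_i) \frac{d\eta_i}{d\mu_i} x_{ij} = 0.$$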

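Fisher's scoring method uses the gradient vector, $\partial l/\partial \beta = u$, and minus the expected value of the Hessian matrix,

$$-E\left(\frac{\partial^2 l}{\partial \beta_r \partial \beta_s}\right) = A.$$

Given the current estimate $b$ of $\beta$, choose the adjustment $\delta b$ so that $A\,\delta b = u$.

Excluding $\phi$, the components of $u$ are

$$u_r = \sum_{i=1}^n W_i (y_i - \mu_i) \frac{d\eta_i}{d\mu_i} x_{ir},$$

so we have

$$A_{rs} = -E\left(\frac{\partial u_r}{\partial \beta_s}\right) = -E \sum_{i=1}^n \left[ (y_i - \mu_i) \frac{\partial}{\partial \beta_s}\left\{ W_i \frac{d\eta_i}{d\mu_i} x_{ir} \right\} + W_i \frac{d\eta_i}{d\mu_i} x_{ir} \frac{\partial}{\partial \beta_s}(y_i - \mu_i) \right].$$

The expectation of the first term is 0 and the second term is

$$\sum_{i=1}^n W_i \frac{d\eta_i}{d\mu_i} x_{ir} \frac{\partial \mu_i}{\partial \beta_s} = \sum_{i=1}^n W_i \frac{d\eta_i}{d\mu_i} x_{ir} \frac{d\mu_i}{d\eta_i} \frac{\partial \eta_i}{\partial \beta_s} = \sum_{i=1}^n W_i x_{ir} x_{is}.$$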

The new estimate $b^* = b + \delta b$ of $\beta$ thus satisfies

$$A b^* = A b + A\,\delta b = A b + u,$$

where

$$(A b)_r = \sum_s A_{rs} b_s = \sum_{i=1}^n W_i x_{ir} \eta_i.$$

Thus, the new estimate $b^*$ satisfies

$$(A b^*)_r = \sum_{i=1}^n W_i x_{ir} \{\eta_i + (y_i - \mu_i)\, d\eta_i/d\mu_i\}.$$

These equations have the form of linear weighted least squares equations with weight $W_i$ and dependent variable $z_i$.
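In matrix form the same update can be written compactly. The notation here is introduced only for this restatement: $X$ is the $n \times p$ matrix with rows $x_i'$, $W = \text{diag}(W_1, \ldots, W_n)$, and $z = (z_1, \ldots, z_n)'$ with $z_i = \eta_i + (y_i - \mu_i)\, d\eta_i/d\mu_i$, all evaluated at the current estimate $b$.

```latex
% From A_{rs} = \sum_i W_i x_{ir} x_{is} we have A = X' W X, and the right-hand side
% \sum_i W_i x_{ir} z_i is the r-th component of X' W z, so
\begin{aligned}
X' W X \, b^* &= X' W z, \\
b^* &= (X' W X)^{-1} X' W z ,
\end{aligned}
% assuming X' W X is invertible.
```

This is the usual weighted least squares estimator for regressing $z$ on $X$ with weights $W$, which is why each Fisher scoring step coincides with one pass of the IWLS procedure.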

Next Class

Topic: Frequentist inference for GLMs

Have the homework exercise completed and written up for next Tuesday.

Complete the following exercise:
1. Write down generalized linear models for the Caesarian data (grouping the two different infection types) and the cellular differentiation data.
2. Show the different components of the GLM, expressing the likelihood in exponential family form & using a canonical link function.
3. Fit the GLM using maximum likelihood and report the parameter estimates.