Generalized linear models



Søren Højsgaard
Department of Mathematical Sciences, Aalborg University, Denmark
October 29, 202

Contents

1 Densities for generalized linear models
1.1 Mean and variance
2 The components of a generalized linear model
2.1 Linear predictor and link function
2.2 Variance function
2.3 Canonical link
2.4 Example: Bernoulli distribution and canonical link
3 Estimation: Iteratively reweighted least squares
3.1 Saturated model, deviance and estimation of φ
3.2 Binomial distribution
4 Resources
5 EXERCISES: Generalized linear models

1 Densities for generalized linear models

Consider densities of the form

$$ f_Y(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\} \tag{1} $$

where $\phi$ is a fixed parameter not part of the model.

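1.1 Mean and variance

For random variables with densities of the form (1) we have

$$ E(Y) = b'(\theta), \qquad \mathrm{Var}(Y) = b''(\theta)\, a(\phi). \tag{2} $$

Notice that

$$ \frac{\partial}{\partial\theta} f_Y(y; \theta, \phi) = \frac{y - b'(\theta)}{a(\phi)}\, f_Y(y; \theta, \phi). $$

Assuming that the order of integration and differentiation can be interchanged, we get

$$ 0 = \frac{d}{d\theta} \int f_Y \, dy = \int \frac{\partial}{\partial\theta} f_Y \, dy = \int \frac{y - b'(\theta)}{a(\phi)}\, f_Y \, dy = \frac{\mu - b'(\theta)}{a(\phi)}, \tag{3} $$

where $\mu = E(Y)$, from which the first part of (2) follows. By differentiating twice we get

$$ 0 = \frac{d^2}{d\theta^2} \int f_Y \, dy = \int \left[ -\frac{b''(\theta)}{a(\phi)} + \left( \frac{y - b'(\theta)}{a(\phi)} \right)^2 \right] f_Y \, dy. \tag{4} $$

From this the second part of (2) follows since, using $b'(\theta) = \mu$,

$$ 0 = -\frac{b''(\theta)}{a(\phi)} + \frac{E\left[(Y - \mu)^2\right]}{a(\phi)^2} = -\frac{b''(\theta)}{a(\phi)} + \frac{\mathrm{Var}(Y)}{a(\phi)^2}. $$

2 The components of a generalized linear model

Consider the situation where $Y_1, \dots, Y_n$ are independent random variables and $Y_i$ has density $f_Y(y; \theta_i, \phi)$ of the form (1). In the following it is assumed that $a(\phi) = \phi / w$ where $w$ is a known weight.

2.1 Linear predictor and link function

To each $Y$ there is associated a vector $x$ of explanatory variables and a fixed number $\omega$ (an offset). The linear predictor is

$$ \eta = \eta(\beta) = \omega + x^\top \beta. $$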

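In generalized linear models (GLIMs) the linear predictor is related to the mean $\mu = E(Y)$ by the link function $g$, with $\eta = g(\mu)$, as follows:

$$ \mu = g^{-1}(\eta) = g^{-1}(\omega + x^\top \beta). $$

The density (1) is parameterized by the parameter $\theta$, and we therefore need to establish the connection between $\theta$ and $\mu$. From (2) we have that $E(Y) = \mu = b'(\theta)$. Let the inverse function of $b'$ be denoted by $H$. Then

$$ \theta = (b')^{-1}(\mu) = H(\mu) = H\left(g^{-1}(\omega + x^\top \beta)\right), $$

which establishes the link between the parameter $\theta$ and $x^\top \beta$.

2.2 Variance function

From (2) we also have that $\mathrm{Var}(Y) = b''(\theta)\, a(\phi)$. Since $\theta = H(\mu)$ we have

$$ \mathrm{Var}(Y) = b''(H(\mu))\, a(\phi). $$

The function $V(\mu) = b''(H(\mu))$ is called the variance function, as it expresses how the variance depends on the mean.

2.3 Canonical link

The connection between $\theta$ and the linear predictor $\eta$ is given by

$$ \theta = (b')^{-1}\left(g^{-1}(\eta)\right). $$

So if $g = (b')^{-1}$ (equivalently $g^{-1} = b'$) then $\theta = \eta$. In this case $g$ is said to be the canonical link function. Using the canonical link function ensures that the log likelihood is concave in $\beta$, so that a maximum, when it exists, is unique provided $X$ has full column rank.

2.4 Example: Bernoulli distribution and canonical link

Consider the Bernoulli distribution

$$ \Pr(Y = y) = p^y (1 - p)^{1 - y}, \qquad y \in \{0, 1\}. $$

Rewrite as

$$ \Pr(Y = y) = \exp\{ y \log p + (1 - y) \log(1 - p) \} = \exp\left\{ y \log\frac{p}{1 - p} + \log(1 - p) \right\} = \exp\{ y\theta - \log(1 + e^{\theta}) \}, $$

where we have introduced the logit as $\theta = g(p) = \log\frac{p}{1 - p}$. Notice that $p = \frac{e^{\theta}}{1 + e^{\theta}}$, so that $\log(1 - p) = -\log(1 + e^{\theta})$. Hence $b(\theta) = \log(1 + e^{\theta})$, $w = 1$ and $\phi = 1$. Now

$$ b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = p \qquad \text{and} \qquad b''(\theta) = \frac{e^{\theta}}{(1 + e^{\theta})^2} = p(1 - p). $$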
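The two identities for the Bernoulli case can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names and the finite-difference step size are illustrative choices, not part of the notes. It differentiates $b(\theta) = \log(1 + e^{\theta})$ numerically at $\theta = \operatorname{logit}(p)$ and compares the results with $p$ and $p(1 - p)$.

```python
import math

def b(theta):
    """Cumulant function of the Bernoulli family: b(theta) = log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def logit(p):
    """Canonical link for the Bernoulli family: theta = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def num_deriv(f, x, h=1e-4):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def num_deriv2(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

for p in (0.1, 0.5, 0.9):
    theta = logit(p)
    mean = num_deriv(b, theta)    # approximates b'(theta), should be close to p
    var = num_deriv2(b, theta)    # approximates b''(theta), should be close to p(1 - p)
    print(f"p={p:.1f}  b'={mean:.6f}  b''={var:.6f}  p(1-p)={p * (1 - p):.6f}")
```

The printed columns for $b'$ and $b''$ should agree with $p$ and $p(1 - p)$ up to the finite-difference error.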

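3 Estimation: Iteratively reweighted least squares

Consider the situation where $Y_1, \dots, Y_n$ are independent and the distribution of $Y_i$ has density of the form (1) with weight $w_i$. We have

$$ E(Y_i) = \mu_i, \qquad \mathrm{Var}(Y_i) = V(\mu_i)\, \phi / w_i, \qquad g(\mu_i) = \eta_i = \omega_i + x_i^\top \beta. $$

Then the log likelihood becomes

$$ l(\beta) = \sum_i \frac{w_i \{ y_i \theta_i - b(\theta_i) \}}{\phi} + \sum_i c(y_i, \phi), $$

where $\theta_i = H(\mu_i) = H(g^{-1}(\eta_i)) = H(g^{-1}(\omega_i + x_i^\top \beta))$.

Imagine a Taylor expansion of $g(Y_i)$ around $\mu_i$:

$$ g(Y_i) \approx g(\mu_i) + g'(\mu_i)(Y_i - \mu_i) = Z_i. $$

The key to the algorithm is to work with the adjusted dependent variables $z_i$. To actually calculate these we need an initial guess of $\mu_i$; more about this later. We have

$$ E(Z_i) = g(\mu_i) = \eta_i \qquad \text{and} \qquad \mathrm{Var}(Z_i) = g'(\mu_i)^2\, V(\mu_i)\, \phi / w_i. $$

In vector form we get

$$ E(Z) = \eta = \omega + X\beta, \qquad \mathrm{Var}(Z) = \phi\, \mathrm{diag}\left\{ g'(\mu_i)^2\, V(\mu_i) / w_i \right\} = \phi\, W(\mu)^{-1}. $$

The weighted least squares estimate of $\beta$ is

$$ \hat\beta = (X^\top W X)^{-1} X^\top W (z - \omega). $$

Notice: both $W$ and $z$ depend on $\beta$, so some iterative scheme is needed.

Initialization: Since $g(\mu_i) = \omega_i + x_i^\top \beta$ we must have $g(y_i) \approx \omega_i + x_i^\top \beta$. So we may start by applying the link function to the data and obtaining an initial estimate of $\beta$ as

$$ \beta^{0} = (X^\top X)^{-1} X^\top (g(y) - \omega). $$

Iteration: Given $\hat\beta^{m}$ from the $m$th iteration, with corresponding means $\mu^{m}$, we calculate

$$ z^{m+1} = \omega + X\beta^{m} + \mathrm{diag}\{ g'(\mu_i^{m}) \}\, (y - \mu^{m}) = \omega + X\beta^{m} + r, $$

where $r = \mathrm{diag}\{ g'(\mu_i^{m}) \}\, (y - \mu^{m})$.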

Update $W$ accordingly as $W = W(\mu^{m})$ and calculate

$$ \beta^{m+1} = (X^\top W X)^{-1} X^\top W (z^{m+1} - \omega). $$

Since $z^{m+1} - \omega = X\beta^{m} + r$, where $r = \mathrm{diag}\{ g'(\mu_i^{m}) \}\, (y - \mu^{m})$ is the vector of working residuals, notice that

$$ \beta^{m+1} = \beta^{m} + (X^\top W X)^{-1} X^\top W r. $$

This means that $\alpha = \beta^{m+1} - \beta^{m}$ is determined as the MLE in the model $r \sim N(X\alpha, W^{-1})$.

3.1 Saturated model, deviance and estimation of φ

Recall that, up to a term that does not depend on $\mu$, the log likelihood $l$ has the form

$$ l(\beta) = l(\mu, y) = l^{*}(\mu, y) / \phi. $$

The saturated model is obtained by making no restrictions on $\mu$, so in this case $\hat\mu = y$ and $l(y, y)$ is the log likelihood for the saturated model. The goodness of fit of a generalized linear model is investigated using the deviance:

$$ D(\hat\mu, y) = -2\{ l(\hat\mu, y) - l(y, y) \} = -2\{ l^{*}(\hat\mu, y) - l^{*}(y, y) \} / \phi = D^{*}(\hat\mu, y) / \phi. $$

Under certain assumptions on the covariates $X$, it can be shown that $D$ is asymptotically $\chi^2_{n-p}$ distributed, where $p$ is the rank of $X$. When $\phi$ is known, this result provides a goodness of fit test of the model. When $\phi$ is unknown, this result provides a natural estimate of $\phi$ as

$$ \hat\phi = \frac{D^{*}(\hat\mu, y)}{n - p}, $$

but in this case we do not have a goodness of fit test.

3.2 Binomial distribution

Suppose $Y_i \sim \mathrm{Bin}(n_i, \mu_i)$ where $g(\mu_i) = \eta_i = x_i^\top \beta$ for $i = 1, \dots, N$. There are different ways of putting the binomial distribution into the GLIM framework. First notice that $\phi = 1$ and $w_i = 1$, so these terms can be ignored. For later use, let

$$ X = [x_1 : \dots : x_N]^\top \qquad \text{and} \qquad v_i = \{ g'(\mu_i) \}^2 \mu_i (1 - \mu_i). $$

We may regard $Y_i$ as a sum of $n_i$ independent Bernoulli variables:

$$ Y_i = \sum_{j=1}^{n_i} u_{ij}, \qquad u_{ij} \sim \mathrm{Bern}(\mu_i). $$

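We may therefore fit the binomial distribution this way, but this is potentially very inefficient in terms of storage and computing time. An alternative approach is described below.

In the Bernoulli setup, the row vector of covariates $x_i^\top$ must be repeated $n_i$ times. Let

$$ \tilde X_i = 1_{n_i} x_i^\top, \qquad \tilde X = [\tilde X_1^\top : \dots : \tilde X_N^\top]^\top. $$

With $M = \sum_i n_i$, $\tilde X$ is an $M \times p$ matrix. Let also $\tilde V_i = v_i I_{n_i}$ and $\tilde V = \mathrm{diag}(\tilde V_1, \dots, \tilde V_N)$. Finally, let

$$ z_{ij} = g(\mu_i) + g'(\mu_i)(u_{ij} - \mu_i), \qquad \tilde z_i = (z_{i1}, \dots, z_{i n_i}), \qquad \tilde z = (\tilde z_1, \dots, \tilde z_N). $$

We omit the subscripts $n_i$ indicating the dimensions of $1$ and $I$. We then have

$$ \tilde X^\top \tilde V^{-1} = \left[ \tfrac{1}{v_1} x_1 1^\top : \dots : \tfrac{1}{v_N} x_N 1^\top \right], $$

$$ \tilde X^\top \tilde V^{-1} \tilde z = \tfrac{1}{v_1} x_1 1^\top \tilde z_1 + \dots + \tfrac{1}{v_N} x_N 1^\top \tilde z_N, $$

$$ \tilde X^\top \tilde V^{-1} \tilde X = \tfrac{n_1}{v_1} x_1 x_1^\top + \dots + \tfrac{n_N}{v_N} x_N x_N^\top. $$

Recall that $z_{ij} = g(\mu_i) + g'(\mu_i)(u_{ij} - \mu_i)$. Let

$$ \bar z_i = \frac{1}{n_i} \sum_j z_{ij} = \frac{1}{n_i} 1^\top \tilde z_i = g(\mu_i) + g'(\mu_i)\left( \frac{y_i}{n_i} - \mu_i \right) $$

and let $\bar z = (\bar z_1, \dots, \bar z_N)$. Then

$$ E(\bar z_i) = g(\mu_i), \qquad \mathrm{Var}(\bar z_i) = \frac{1}{n_i} \{ g'(\mu_i) \}^2 \mu_i (1 - \mu_i) = \frac{v_i}{n_i}. $$

Let $V = \mathrm{diag}(v_1 / n_1, \dots, v_N / n_N)$. Then

$$ X^\top V^{-1} = \left[ \tfrac{n_1}{v_1} x_1 : \dots : \tfrac{n_N}{v_N} x_N \right], $$

$$ X^\top V^{-1} \bar z = \tfrac{n_1}{v_1} x_1 \bar z_1 + \dots + \tfrac{n_N}{v_N} x_N \bar z_N, $$

$$ X^\top V^{-1} X = \tfrac{n_1}{v_1} x_1 x_1^\top + \dots + \tfrac{n_N}{v_N} x_N x_N^\top. $$

Notice that $1^\top \tilde z_i = \sum_j z_{ij} = n_i \bar z_i$. This means that

$$ \tilde X^\top \tilde V^{-1} \tilde z = X^\top V^{-1} \bar z, \qquad \tilde X^\top \tilde V^{-1} \tilde X = X^\top V^{-1} X. $$

So iterating using $(\tilde X, \tilde V)$ or $(X, V)$ produces the same result.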
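The equivalence of the grouped and the Bernoulli formulation can be illustrated with a small iteratively reweighted least squares routine. The following is a minimal sketch in Python (standard library only), assuming the logit link and no offset; the function names `solve` and `irls_logit`, the fixed number of iterations and the data are hypothetical choices made for illustration. It also starts from $\beta = 0$ rather than from $g(y)$, because the logit of a Bernoulli observation (0 or 1) is not finite.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def irls_logit(X, ybar, n, iters=25):
    """IRLS for a binomial model with logit link and no offset.

    X    : list of covariate rows x_i
    ybar : observed proportions y_i / n_i
    n    : numbers of trials n_i (all 1 for Bernoulli data)
    """
    p = len(X[0])
    beta = [0.0] * p                  # assumed start: mu_i = 0.5 for all i
    for _ in range(iters):
        XtWX = [[0.0] * p for _ in range(p)]
        XtWz = [0.0] * p
        for xi, yi, ni in zip(X, ybar, n):
            eta = sum(xa * bj for xa, bj in zip(xi, beta))
            mu = 1.0 / (1.0 + math.exp(-eta))
            gprime = 1.0 / (mu * (1.0 - mu))       # g'(mu) for the logit link
            z = eta + gprime * (yi - mu)           # adjusted dependent variable
            v = gprime ** 2 * mu * (1.0 - mu)      # v_i in the notation above
            w = ni / v                             # diagonal entry of V^{-1}
            for a in range(p):
                XtWz[a] += w * xi[a] * z
                for c in range(p):
                    XtWX[a][c] += w * xi[a] * xi[c]
        beta = solve(XtWX, XtWz)      # weighted least squares step
    return beta

# Hypothetical grouped data: x_i = (1, dose_i), n_i trials, y_i successes.
doses = [0.0, 1.0, 2.0, 3.0]
trials = [4, 5, 6, 5]
successes = [1, 2, 4, 4]

X = [[1.0, d] for d in doses]
beta_grouped = irls_logit(X, [y / m for y, m in zip(successes, trials)], trials)

# The same data expanded to one Bernoulli observation per trial.
X_long, u_long = [], []
for d, n_i, y_i in zip(doses, trials, successes):
    for j in range(n_i):
        X_long.append([1.0, d])
        u_long.append(1.0 if j < y_i else 0.0)
beta_bernoulli = irls_logit(X_long, u_long, [1] * len(u_long))

print(beta_grouped)
print(beta_bernoulli)   # agrees with beta_grouped up to rounding error
```

The grouped fit loops over 4 rows per iteration and the expanded fit over 20, which is the storage and computing cost referred to above. A fixed iteration count keeps the sketch short; a practical implementation would stop when the change in $\beta$ or in the deviance falls below a tolerance.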

4 Resources

The standard reference on generalized linear models is McCullagh and Nelder, Generalized Linear Models.

5 EXERCISES: Generalized linear models

The function model.matrix can be used for generating a model matrix X from a given formula and a dataset. So in the following we shall assume that we have a vector of responses y, a model matrix X and possibly also a weight vector w.

1. Implement your own function myglm which takes y, X, w as input. In doing so, you may find it helpful to use that the various families are implemented in R. See e.g. ?binomial. You may seek inspiration from the function glm.fit.

2. The result from your function could be of a certain class, and you could create interesting methods. For example, it would be nice to obtain the approximate covariance matrix of $\hat\beta$, which is $(X^\top W X)^{-1}$ (multiplied by $\phi$ when the dispersion parameter differs from 1).
