Statistics 203: Introduction to Regression and Analysis of Variance
Generalized Linear Models I
Jonathan Taylor
Today's class

- Poisson regression.
- Residuals for diagnostics.
- Exponential families.
- Fisher scoring.
Poisson regression

Response: $Y_i \sim \text{Poisson}(\lambda_i)$, independent.
Popular applications:
1. anything with count data;
2. contingency tables.
Link functions:
1. $g(x) = \log(x)$;
2. $g(x) = 1/x$.
Variance function: $V(\mu) = \mu$.
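As a numerical illustration (not from the slides), here is a minimal sketch of Poisson regression with the log link, fit by directly minimizing the negative log-likelihood with scipy; the design matrix and true coefficients are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data: mu_i = exp(x_i' beta) with made-up coefficients
rng = np.random.default_rng(0)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
Y = rng.poisson(np.exp(X @ beta_true))

# Negative Poisson log-likelihood (dropping the log Y! term,
# which does not depend on beta)
def negloglik(beta):
    eta = X @ beta
    return np.sum(np.exp(eta)) - np.sum(Y * eta)

beta_hat = minimize(negloglik, np.zeros(p), method="BFGS").x
```

With $n = 500$ observations, the estimate should land close to the coefficients used to simulate the data.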
Canonical link for Poisson

In logistic regression, we identified the logit as the canonical link because $g'(\mu) = 1/V(\mu)$. Here we have to solve $g'(\mu) = 1/\mu$. Therefore, in Poisson regression the canonical link is $g(\mu) = \log \mu$.
Contingency tables

An $n \times k$ contingency table is a table of counts categorized by two categorical variables $A$ and $B$:

            B(1)     ...   B(k)
A(1)        Y_11     ...   Y_1k     Y_{1.}
...         ...      ...   ...      ...
A(n)        Y_n1     ...   Y_nk     Y_{n.}
            Y_{.1}   ...   Y_{.k}   Y_{..}

where $Y_{ij} \sim \text{Poisson}(\lambda_{ij})$.

Think of categorizing a population of birds, say, by two variables: one the occurrence (or not) of West Nile virus, the other the type of bird. Question: is there more West Nile virus in one type of bird? Or, if we sample a bird at random, are the events {has West Nile virus} and {is a bird of type $j$} independent? This is equivalent to
$$H_0: E(Y_{ij}) = \lambda_{i,A}\,\lambda_{j,B}, \quad \text{i.e.} \quad \log E(Y_{ij}) = \eta_i + \gamma_j.$$
Test for independence

Under $H_0$, Pearson's $X^2$ is
$$X^2 = \sum_{i,j} \frac{(Y_{ij} - \hat{Y}_{ij})^2}{\hat{Y}_{ij}}, \qquad \hat{Y}_{ij} = \frac{Y_{i\cdot}\, Y_{\cdot j}}{Y_{\cdot\cdot}}.$$
Under $H_0$, if the $\lambda_{ij}$'s are not too small, $X^2$ is approximately $\chi^2_{nk-n-k+1}$.
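A quick numerical sketch of this test (the counts below are hypothetical): form the fitted values $\hat{Y}_{ij}$ from the margins, compute $X^2$, and cross-check against scipy's `chi2_contingency`.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x3 table: rows = virus status, columns = bird type
Y = np.array([[20, 35, 12],
              [50, 40, 43]])
row = Y.sum(axis=1, keepdims=True)   # Y_{i.}
col = Y.sum(axis=0, keepdims=True)   # Y_{.j}
fitted = row * col / Y.sum()         # Yhat_ij = Y_{i.} Y_{.j} / Y_{..}

X2 = ((Y - fitted) ** 2 / fitted).sum()
df = (Y.shape[0] - 1) * (Y.shape[1] - 1)   # = nk - n - k + 1
pval = chi2.sf(X2, df)

# Cross-check against scipy's implementation (no continuity correction)
stat, p, dof, expected = chi2_contingency(Y, correction=False)
```

The hand-computed statistic, degrees of freedom, and fitted table should match scipy's output exactly.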
Poisson deviance

It is easy to see that the Poisson deviance is
$$DEV(\mu; Y) = 2\sum_i \left[ Y_i \log\frac{Y_i}{\mu_i} - (Y_i - \mu_i) \right].$$
Because $\phi = 1$ in a Poisson model, under $H_0$
$$2\sum_{i,j} \left[ Y_{ij}\log\frac{Y_{ij}}{\hat{Y}_{ij}} - (Y_{ij}-\hat{Y}_{ij}) \right] \approx \chi^2_{nk-n-k+1}.$$
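Continuing the hypothetical table from the independence test, a short sketch of the deviance statistic; note that the linear term vanishes here because the fitted margins match the observed margins.

```python
import numpy as np

# Same hypothetical 2x3 table of counts
Y = np.array([[20, 35, 12],
              [50, 40, 43]], dtype=float)
fitted = Y.sum(axis=1, keepdims=True) * Y.sum(axis=0, keepdims=True) / Y.sum()

# Deviance: 2 * sum[ Y log(Y/Yhat) - (Y - Yhat) ]
G2 = 2 * np.sum(Y * np.log(Y / fitted) - (Y - fitted))
```

Since $\sum_{i,j}(Y_{ij} - \hat{Y}_{ij}) = 0$ for these fitted values, the deviance reduces to $2\sum Y_{ij}\log(Y_{ij}/\hat{Y}_{ij})$.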
Overdispersed Poisson

If the data truly are Poisson, $\phi = 1$; but not all count data are Poisson. Examples that throw off the Poisson-ness: clustering, i.e. counting disease occurrences within families and using many families. If the disease has a genetic component, then the total number of occurrences will look more like a mixture of Poissons. How do we accommodate this in inference (or test for it)? Estimate
$$\hat{\phi} = \frac{1}{n-p} \sum_i \frac{(Y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)} = \frac{X^2}{n-p}.$$
Residuals

Pearson's $X^2$ can be thought of as a sum of squared weighted residuals. Pearson residual:
$$r_{i,P} = \frac{Y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}.$$
Deviance in a GLM can be expressed as a sum of $n$ terms:
$$DEV(\hat{\mu}; Y) = \sum_i DEV(\hat{\mu}_i; Y_i).$$
This leads to deviance residuals
$$r_{i,DEV} = \text{sign}(Y_i - \hat{\mu}_i)\sqrt{DEV(\hat{\mu}_i; Y_i)};$$
these are the default ones in R. Once you have residuals, you can make all the standard diagnostic plots.
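A minimal sketch of both kinds of residuals for a Poisson fit; the observed counts and fitted means below are invented for the example, and the $Y\log(Y/\mu)$ term is taken as $0$ when $Y = 0$.

```python
import numpy as np

# Hypothetical observed counts and fitted means from some Poisson fit
Y = np.array([4.0, 0.0, 7.0, 2.0])
mu = np.array([3.1, 1.2, 5.5, 2.9])

# Pearson residuals: (Y - mu)/sqrt(V(mu)), with V(mu) = mu for the Poisson
r_pearson = (Y - mu) / np.sqrt(mu)

# Unit deviances: 2[Y log(Y/mu) - (Y - mu)], with the convention
# Y log(Y/mu) = 0 when Y = 0
with np.errstate(divide="ignore", invalid="ignore"):
    term = np.where(Y > 0, Y * np.log(Y / mu), 0.0)
dev_i = 2 * (term - (Y - mu))

# Deviance residuals: signed square roots of the unit deviances
r_dev = np.sign(Y - mu) * np.sqrt(dev_i)
```

Summing the squared Pearson residuals recovers Pearson's $X^2$ for this fit.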
Multivariate Newton-Raphson

Univariate root finding: suppose we want to find a root of $f: \mathbb{R} \to \mathbb{R}$. Start at some $x_0$ and iterate
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad n \geq 0.$$
Multivariate root finding: suppose we want to find roots of $f: \mathbb{R}^p \to \mathbb{R}^p$, i.e. the set $\{x \in \mathbb{R}^p : f(x) = 0\}$. Start at some $x_0$ and iterate
$$x_{n+1} = x_n - Jf(x_n)^{-1} f(x_n), \qquad n \geq 0,$$
where $Jf$ is the Jacobian of $f$. Convergence to a solution is guaranteed for starting points close enough to a solution, at least assuming that every solution $x^*$ has $\det(Jf(x^*)) \neq 0$.
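The multivariate iteration can be sketched in a few lines of numpy; the system below is a made-up example with a root at $(1, 1)$.

```python
import numpy as np

def newton_raphson(f, jac, x0, tol=1e-10, max_iter=50):
    """Find a root of f: R^p -> R^p via x <- x - Jf(x)^{-1} f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jac(x), f(x))  # solve instead of inverting Jf
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example system: x^2 + y^2 = 2 and x = y, with a root at (1, 1)
f = lambda v: np.array([v[0]**2 + v[1]**2 - 2, v[0] - v[1]])
jac = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -1.0]])
root = newton_raphson(f, jac, x0=[2.0, 0.5])
```

Solving the linear system at each step, rather than forming $Jf(x)^{-1}$ explicitly, is the standard numerically stable choice.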
Finding critical points

Multivariate critical point finding: suppose we want to find critical points of $g: \mathbb{R}^p \to \mathbb{R}$. Start at some $x_0$ and iterate
$$x_{n+1} = x_n - \nabla^2 g(x_n)^{-1}\, \nabla g(x_n), \qquad n \geq 0,$$
where $\nabla^2 g(x)$ is the Hessian of $g$. Convergence to a critical point is guaranteed (for close enough starting points) assuming $\det(\nabla^2 g(x^*)) \neq 0$ for all critical points $x^*$. If $g$ is convex: convergence to the global minimizer; otherwise we can converge to other critical points.
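The same iteration with gradient and Hessian in place of $f$ and $Jf$; the convex function below is invented for the example, with its unique minimizer at $(0, \log 2)$.

```python
import numpy as np

# Minimize the convex function g(x, y) = e^x + e^y - x - 2y
grad = lambda v: np.array([np.exp(v[0]) - 1, np.exp(v[1]) - 2])
hess = lambda v: np.diag([np.exp(v[0]), np.exp(v[1])])

# Newton iteration: x <- x - Hessian^{-1} gradient
x = np.array([1.0, 1.0])
for _ in range(50):
    step = np.linalg.solve(hess(x), grad(x))
    x = x - step
    if np.linalg.norm(step) < 1e-12:
        break
# Since g is convex, the critical point found is the global minimizer
```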
GLM: Fisher scoring

In the Poisson model,
$$\frac{\partial}{\partial\beta_j} DEV(\hat{\mu}; Y) = \sum_i \frac{\partial}{\partial\beta_j} DEV(\hat{\mu}_i; Y_i) = -2\sum_i \frac{Y_i - \hat{\mu}_i}{\hat{\mu}_i} \frac{\partial\hat{\mu}_i}{\partial\beta_j} = -2\sum_i \frac{Y_i - \hat{\mu}_i}{\hat{\mu}_i} \frac{1}{g'(\hat{\mu}_i)} X_{ij}.$$
GLM: Fisher scoring

$$\frac{\partial^2}{\partial\beta_k\,\partial\beta_j} DEV(\hat{\mu}; Y) = -2\sum_i \frac{\partial}{\partial\beta_k}\left[\frac{Y_i-\hat{\mu}_i}{\hat{\mu}_i}\,\frac{1}{g'(\hat{\mu}_i)}\, X_{ij}\right] = 2\sum_i \left[\frac{1}{\hat{\mu}_i}\,\frac{1}{g'(\hat{\mu}_i)^2}\, X_{ij}X_{ik} - (Y_i-\hat{\mu}_i)\,\frac{\partial}{\partial\beta_k}\!\left(\frac{1}{\hat{\mu}_i\, g'(\hat{\mu}_i)}\right) X_{ij}\right].$$
Therefore Newton-Raphson needs extra terms that are not in our algorithm! However, the terms involving $Y_i - \hat{\mu}_i$ have expectation zero, so
$$E_{\beta=\hat{\beta}}\left(\frac{\partial^2}{\partial\beta_k\,\partial\beta_j} DEV(\hat{\mu}; Y)\right) = 2\sum_i \frac{1}{\hat{\mu}_i}\,\frac{1}{g'(\hat{\mu}_i)^2}\, X_{ij}X_{ik}.$$
Fisher scoring with the canonical link

If $g'(\mu) = V(\mu)^{-1} = \mu^{-1}$, as in the Poisson case, then
$$\frac{\partial}{\partial\beta_j} DEV(\hat{\mu}; Y) = \sum_i \frac{\partial}{\partial\beta_j} DEV(\hat{\mu}_i; Y_i) = -2\sum_i \frac{Y_i-\hat{\mu}_i}{\hat{\mu}_i}\,\frac{\partial\hat{\mu}_i}{\partial\beta_j} = -2\sum_i (Y_i - \hat{\mu}_i)\, X_{ij},$$
and
$$\frac{\partial^2}{\partial\beta_k\,\partial\beta_j} DEV(\hat{\mu}; Y) = 2\sum_i \frac{1}{g'(\hat{\mu}_i)}\, X_{ik}X_{ij} = 2\sum_i V(\hat{\mu}_i)\, X_{ik}X_{ij}.$$
With the canonical link the second derivative no longer involves $Y$, so Newton-Raphson and Fisher scoring coincide.
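With the canonical link the derivatives above give the update $\beta \leftarrow \beta + (X^T W X)^{-1} X^T(Y - \mu)$ with $W = \text{diag}(V(\mu))$. A minimal numpy sketch on simulated data (the design and coefficients are invented for the example):

```python
import numpy as np

# Simulated Poisson data with log link and made-up coefficients
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -0.3])
Y = rng.poisson(np.exp(X @ beta_true))

# Fisher scoring with the canonical link:
# gradient of the deviance is -2 X'(Y - mu), expected Hessian is 2 X'WX
# with W = diag(V(mu)) = diag(mu), so the factors of 2 cancel in the step.
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    W = mu                      # V(mu) = mu for the Poisson
    step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (Y - mu))
    beta = beta + step
    if np.linalg.norm(step) < 1e-10:
        break
```

With $n = 400$ observations the scoring iterations should recover the simulating coefficients to within sampling error.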
Exponential families

We have seen logistic and Poisson regression: when else can we talk about GLMs? Suppose the density or mass function of $Y$ is
$$f(y|\theta,\phi) = \exp\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right),$$
so that
$$\ell(\theta,\phi|Y) = \frac{Y\theta - b(\theta)}{a(\phi)} + c(Y,\phi).$$
Using the facts
$$E_{(\theta,\phi)}\left(\frac{\partial\ell}{\partial\theta}\right) = 0, \qquad \text{Var}_{(\theta,\phi)}\left(\frac{\partial\ell}{\partial\theta}\right) = -E_{(\theta,\phi)}\left(\frac{\partial^2\ell}{\partial\theta^2}\right),$$
we see
$$E(Y) = b'(\theta), \qquad \text{Var}(Y) = b''(\theta)\, a(\phi).$$
Poisson: $b(\theta) = e^\theta$, $a(\phi) = 1$.
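For the Poisson, $b(\theta) = e^\theta$ gives $E(Y) = b'(\theta) = e^\theta$ and $\text{Var}(Y) = b''(\theta)\,a(\phi) = e^\theta$; a quick simulation check (the value of $\theta$ is arbitrary):

```python
import numpy as np

theta = 1.2
lam = np.exp(theta)   # b'(theta) = b''(theta) = e^theta for the Poisson

rng = np.random.default_rng(2)
Y = rng.poisson(lam, size=200_000)
# Sample mean and variance should both be close to e^theta
```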
Canonical link

The variance function is
$$V(\mu) = b''\!\left((b')^{-1}(\mu)\right).$$
The canonical link is $g = (b')^{-1}$. Why? Because
$$g'(\mu) = \frac{1}{b''\!\left((b')^{-1}(\mu)\right)} = \frac{1}{V(\mu)}.$$
Poisson: $g(x) = \log x$...