Generalized Linear Models
MIT 18.655
Dr. Kempthorne
Spring 2016
Outline

1 Generalized Linear Models
Generalized Linear Model

Data: $(y_i, x_i)$, $i = 1, \ldots, n$ where
  $y_i$: response variable
  $x_i = (x_{i,1}, \ldots, x_{i,p})^T$: $p$ explanatory variables

Linear predictor: for $\beta = (\beta_1, \ldots, \beta_p)^T \in \mathbb{R}^p$:
$$ x_i^T \beta = \sum_{j=1}^{p} x_{i,j}\,\beta_j $$

Probability Model: $\{y_i\}$ independent, canonical exponential family r.v.'s:
  Density: $p(y_i \mid \eta_i) = e^{\eta_i y_i - A(\eta_i)}\, h(y_i)$
  Mean Function: $\mu_i = E[Y_i] = A'(\eta_i)$
  Link Function $g(\cdot)$: $g(\mu_i) = x_i^T\beta$. With estimate $\hat\beta$: $x_i^T\hat\beta = g(\hat\mu_i)$.
  Canonical Link Function: $g(\mu_i) = \eta_i = [A']^{-1}(\mu_i) = x_i^T\beta$
Matrix Notation

$$ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} $$

$$ E[y] = \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}
= \begin{pmatrix} A'(\eta_1) \\ A'(\eta_2) \\ \vdots \\ A'(\eta_n) \end{pmatrix} = A'(\eta) $$

Examples:
  $y_i \sim \text{Bernoulli}(\theta_i)$: $\eta_i = \log\!\left(\frac{\theta_i}{1-\theta_i}\right)$
  $y_i \sim \text{Poisson}(\lambda_i)$: $\eta_i = \log(\lambda_i)$
  $y_i \sim \text{Gaussian}(\mu_i, 1)$: $\eta_i = \mu_i$
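The three canonical links above can be checked numerically. A minimal sketch in Python (the function names are illustrative, not from the lecture):

```python
import numpy as np

# For Bernoulli, A(eta) = log(1 + e^eta), so mu = A'(eta) is the logistic
# function and the canonical link g(mu) = log(mu / (1 - mu)) recovers eta.
def bernoulli_mean(eta):
    """mu = A'(eta) for the Bernoulli family (logistic function)."""
    return 1.0 / (1.0 + np.exp(-eta))

def bernoulli_link(mu):
    """Canonical (logit) link: g(mu) = log(mu / (1 - mu))."""
    return np.log(mu / (1.0 - mu))

# For Poisson, A(eta) = e^eta, so mu = e^eta and g(mu) = log(mu).
def poisson_mean(eta):
    return np.exp(eta)

def poisson_link(mu):
    return np.log(mu)
```

Each link inverts its mean function: `bernoulli_link(bernoulli_mean(0.7))` returns 0.7 up to floating-point error.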
Log-Likelihood Function for Generalized Linear Model

$$ \ell(\beta) = \sum_{i=1}^{n} \left( \eta_i y_i - A(\eta_i) + \log h(y_i) \right)
= \eta^T y - \mathbf{1}^T A(\eta) + \text{const}
= [X\beta]^T y - \mathbf{1}^T A(\eta) + \text{const} \quad \text{(for canonical link)} $$

Note: $T(y) = X^T y$ is sufficient for $\beta$ when $g(\cdot)$ is canonical.

Solve for $\{\beta_m,\ m = 1, 2, \ldots\}$ iteratively:
$$ 0 = \nabla\ell(\beta_{m+1}) \approx \nabla\ell(\beta_m) + \nabla^2\ell(\beta_m)\,(\beta_{m+1} - \beta_m) $$
$$ \implies \beta_{m+1} = \beta_m - [\nabla^2\ell(\beta_m)]^{-1}\,\nabla\ell(\beta_m) $$

Newton-Raphson coincides with the Fisher Scoring Algorithm here, since for the canonical link the observed and expected information are equal; $\beta_m \xrightarrow{P} \hat\beta$ (the MLE).
For Canonical Link

$$ \nabla\ell(\beta) = \frac{\partial}{\partial\beta}\left[\sum_{i=1}^{n}\left(\eta_i y_i - A(\eta_i) + \log h(y_i)\right)\right]
= \sum_{i=1}^{n} \frac{\partial\eta_i}{\partial\beta}\left(y_i - A'(\eta_i)\right) $$

Since $\eta_i = x_i^T\beta$, we have $\partial\eta_i/\partial\beta = x_i$, so
$$ \nabla\ell(\beta) = \sum_{i=1}^{n} (y_i - \mu_i)\,x_i = \sum_{i=1}^{n} x_i(y_i - \mu_i) = X^T(y - \mu) $$

For the Hessian,
$$ \nabla^2\ell(\beta) = \frac{\partial}{\partial\beta^T}\left[X^T(y - \mu)\right]
= -\sum_{i=1}^{n} A''(\eta_i)\, x_i x_i^T = -X^T W X $$
For Canonical Link

$$ \nabla\ell(\beta) = \frac{\partial}{\partial\beta}\left[\sum_{i=1}^{n}\left(\eta_i y_i - A(\eta_i) + \log h(y_i)\right)\right] = X^T(y - \mu) $$

$$ \nabla^2\ell(\beta) = -X^T W X $$

where $W = \text{Cov}(y)$ is diagonal with $W_{i,i} = A''(\eta_i) = \text{Var}[y_i]$.
Iteratively Re-weighted Least Squares Interpretation

Given $\beta_m$, and $\hat\mu(\beta_m) = A'(X\beta_m)$,
$$ \beta_{m+1} = \beta_m + [-\nabla^2\ell(\beta_m)]^{-1}\,\nabla\ell(\beta_m)
= \beta_m + [X^T W X]^{-1} X^T (y - \hat\mu(\beta_m)) $$

$\Delta = (\beta_{m+1} - \beta_m)$ is solved as the WLS regression of $y^* = [y - \hat\mu(\beta_m)]$ on $X^* = WX$ using $\Sigma = \text{Cov}(y^*) = W$.

The WLS estimate of $\Delta$ is given by:
$$ \hat\Delta = [X^{*T}\Sigma^{-1}X^*]^{-1} X^{*T}\Sigma^{-1} y^*
= [X^T W W^{-1} W X]^{-1} X^T W W^{-1}[y - \hat\mu(\beta_m)]
= [X^T W X]^{-1} X^T [y - \hat\mu(\beta_m)] $$
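The update above is easy to sketch for Bernoulli responses with the canonical logit link. An illustrative Python implementation (the function name and iteration count are my own choices, not from the lecture):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Fit a Bernoulli GLM with canonical (logit) link via the update
    beta_{m+1} = beta_m + (X^T W X)^{-1} X^T (y - mu_hat(beta_m))."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                    # linear predictor eta_i = x_i^T beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # mu_i = A'(eta_i)
        w = mu * (1.0 - mu)               # W_ii = A''(eta_i) = Var[y_i]
        # Newton / Fisher-scoring step (solve rather than invert X^T W X)
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    return beta
```

At convergence the score $X^T(y - \hat\mu)$ vanishes, which is exactly the WLS normal equation on the slide.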
Binomial Data: $Y_i \sim \text{Binomial}(m_i, \pi_i)$, $i = 1, \ldots, k$.

Log-Likelihood Function:
$$ \ell(\pi_1, \ldots, \pi_k) = \sum_{i=1}^{k} \left[ Y_i \log\!\left(\frac{\pi_i}{1-\pi_i}\right) + m_i \log(1 - \pi_i) \right] $$

Covariates: $\{x_i,\ i = 1, \ldots, k\}$
Logistic Regression Parameter: $\eta_i = \log[\pi_i/(1-\pi_i)] = x_i^T\beta$
$W = \text{Cov}(Y) = \text{diag}(m_i\,\pi_i(1-\pi_i))$, a $k \times k$ matrix
Deviance for Generalized Linear Models

Consider testing $H_0: \mu = \mu_0 = A'(X\beta_0)$ vs $H_{Alt}: \mu = \mu^*$ (general $n$-vector), e.g., $\mu^* = A'(X^*\beta^*)$ with $X^* = I_n$ and $\beta^* = \eta$.

Suppose $y$ is in the interior of the convex support of $\{y : p(y \mid \eta) > 0\}$. Then $\hat\eta = \eta(y) = [A']^{-1}(y)$ is the MLE under $H_{Alt}$.

The Likelihood Ratio Test Statistic of $H_0$ vs $H_{Alt}$:
$$ 2\log\lambda = 2\left[\ell(\eta(y)) - \ell(\eta(\mu_0))\right]
= 2\left( [\eta(y) - \eta(\mu_0)]^T y - \mathbf{1}^T\left[A(\eta(y)) - A(\eta(\mu_0))\right] \right) $$
$$ = \text{Deviance between } y \text{ and } \mu_0 $$
Deviance Formulas for Distributions

Gaussian: $\sum_i (y_i - \hat\mu_i)^2 / \sigma_0^2$
Poisson: $2\sum_i \left[ y_i \log(y_i/\hat\mu_i) - (y_i - \hat\mu_i) \right]$
Binomial: $2\sum_i \left[ y_i \log(y_i/\hat\mu_i) + (m_i - y_i)\log\!\left[(m_i - y_i)/(m_i - \hat\mu_i)\right] \right]$
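As a sanity check, the Poisson formula above can be coded directly. An illustrative sketch (the $0 \cdot \log 0 = 0$ convention handles zero counts):

```python
import numpy as np

def poisson_deviance(y, mu_hat):
    """Poisson deviance 2 * sum_i [ y_i log(y_i/mu_i) - (y_i - mu_i) ],
    where a term with y_i = 0 contributes 0 by the 0*log(0) = 0 convention."""
    # Inner where avoids evaluating log(0); outer where selects 0 for y_i = 0.
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu_hat), 0.0)
    return 2.0 * np.sum(term - (y - mu_hat))
```

The saturated fit $\hat\mu = y$ gives deviance 0, and the deviance grows as the fitted means move away from the data.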
Data: $(y_i, x_i)$, $i = 1, \ldots, n$ where
  $y_i$: a $q$-dimensional response vector
  $x_i = (x_{i,1}, \ldots, x_{i,p})^T$: $p$ explanatory variables

Probability Model: the conditional distribution of each $y_i$ given $x_i$ is of the form
$$ p(y \mid x; B, \phi) = f(y, \eta_1, \ldots, \eta_M, \phi) $$
for some known function $f(\cdot)$, where $B = [\beta_1\ \beta_2\ \cdots\ \beta_M]$ is a $p \times M$ matrix of unknown regression coefficients.

Linear Predictors: for $j = 1, \ldots, M$, the $j$th linear predictor is
$$ \eta_j = \eta_j(x) = \beta_j^T x = \sum_{k=1}^{p} B_{k,j}\, x_k, $$
where $x = (x_1, \ldots, x_p)^T$ with $x_1 = 1$ when there is an intercept.
Matrix Notation

$$ Y = \begin{pmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_n^T \end{pmatrix} \ (n \times q), \quad
X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}
= \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}, \quad
B = [\beta_1\ \beta_2\ \cdots\ \beta_M] $$

Link Function: for each observation $i$, $E[y_i] = \mu_i$ ($q \times 1$ vectors) and $g(\mu_i) = \eta_i = B^T x_i$ ($M \times 1$ vectors), where:
$$ \eta_i = \eta(x_i) = \begin{pmatrix} \eta_1(x_i) \\ \vdots \\ \eta_M(x_i) \end{pmatrix}
= \begin{pmatrix} \beta_1^T x_i \\ \vdots \\ \beta_M^T x_i \end{pmatrix} = B^T x_i $$
Multivariate Exponential Family Models

Density: $p(y_i \mid \eta_i) = e^{\eta_i^T y_i - A(\eta_i)}\, h(y_i)$
Mean Function: $\mu_i = E[y_i] = \nabla A(\eta_i)$
Link Function $g(\cdot)$: $g(\mu_i) = B^T x_i$
Canonical Link Function: $g(\mu_i) = \eta_i = [\nabla A]^{-1}(\mu_i) = B^T x_i$

Examples:
  $y_i \sim \text{Multinomial}(m_i, \pi_{i,1}, \ldots, \pi_{i,M+1})$ with $\pi_{i,j} \geq 0$, $j = 1, \ldots, M+1$, and $\sum_{j=1}^{M+1} \pi_{i,j} = 1$.
  $y_i \sim M\text{-variate Gaussian}(\mu_i, \Sigma_0)$ with $\mu_i \in \mathbb{R}^M$ and $\Sigma_0$ known ($M \times M$).

Case Study: Applying Generalized Linear Models.
Note: Reference Yee (2010) on the VGAM Package for R.
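For the multinomial example, the canonical link is the baseline-category logit. A small sketch (function names are illustrative; the last category is taken as the reference):

```python
import numpy as np

def multinomial_inverse_link(eta):
    """Map M linear predictors eta_j = log(pi_j / pi_{M+1}) to the
    M+1 category probabilities (softmax with reference category fixed at 0)."""
    z = np.concatenate([eta, [0.0]])   # reference category has eta = 0
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def multinomial_link(pi):
    """Canonical link: log-odds of each category against the last one."""
    return np.log(pi[:-1] / pi[-1])
```

The two functions invert each other, so the link is one-to-one between the $M$ predictors and the probability simplex.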
MIT OpenCourseWare http://ocw.mit.edu 18.655 Mathematical Statistics Spring 2016 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.