ML estimation: Random-intercepts logistic model

Susanna Reed

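Random-intercepts logistic model:

$$\log\left[\frac{p_{ij}}{1-p_{ij}}\right] = x_{ij}'\beta + \upsilon_i, \qquad \upsilon_i \sim N(0, \sigma_\upsilon^2)$$

Standardizing the random effect, $\theta_i = \upsilon_i/\sigma_\upsilon$, yields

$$\log\left[\frac{p_{ij}}{1-p_{ij}}\right] = x_{ij}'\beta + \sigma_\upsilon\theta_i, \qquad \theta_i \sim N(0, 1)$$

The conditional probability of a positive response is

$$p_{ij} = P(Y_{ij} = 1 \mid \theta_i) = \Psi(z_{ij})$$

where the standard logistic cdf is given as

$$\Psi(z_{ij}) = \frac{1}{1 + \exp(-z_{ij})}, \qquad z_{ij} = x_{ij}'\beta + \sigma_\upsilon\theta_i$$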
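A minimal Python/NumPy sketch of the conditional response probability $\Psi(z_{ij})$ follows. It uses a single hypothetical observation with an intercept and one covariate. The names `psi` and `cond_prob` are illustrative and do not come from the original program.

```python
import numpy as np

def psi(z):
    """Standard logistic cdf: Psi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cond_prob(x_ij, beta, sigma_u, theta_i):
    """P(Y_ij = 1 | theta_i) = Psi(z_ij), with z_ij = x_ij' beta + sigma_u * theta_i."""
    z_ij = np.dot(x_ij, beta) + sigma_u * theta_i
    return psi(z_ij)

# Hypothetical values: intercept plus one covariate
x_ij = np.array([1.0, 0.5])
beta = np.array([-0.5, 1.0])
for theta_i in (-1.0, 0.0, 1.0):   # subjects with low, average, and high propensity
    print(theta_i, cond_prob(x_ij, beta, sigma_u=1.2, theta_i=theta_i))
```

Here $\theta_i = 0$ gives $z_{ij} = 0$ and hence a probability of 0.5. Positive $\theta_i$ shifts the subject's probability upward by an amount scaled by $\sigma_\upsilon$.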

Observations within a subject are assumed independent given the random subject effect (conditional independence). Thus, we can multiply the conditional probabilities across the $n_i$ timepoints within a subject to yield the conditional probability for the $n_i \times 1$ response vector $Y_i$:

$$l(Y_i \mid \theta) = \prod_{j=1}^{n_i} \Psi(z_{ij})^{Y_{ij}}\,[1 - \Psi(z_{ij})]^{1 - Y_{ij}}$$

The marginal probability for $Y_i$ in the population of subjects is

$$h(Y_i) = \int_\theta l(Y_i \mid \theta)\, g(\theta)\, d\theta$$

where $g(\theta)$ represents the population distribution of the (standardized) random effects, namely $N(0, 1)$.
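The sketch below computes $l(Y_i \mid \theta)$ as the product over timepoints. It then approximates $h(Y_i)$ with a plain Riemann sum on a fine, evenly spaced grid. This is a deliberately crude stand-in used only to make the integral concrete; it is not the quadrature method described later. The grid limits, grid size, and function names are assumptions chosen for illustration.

```python
import numpy as np

def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_lik(y_i, X_i, beta, sigma_u, theta):
    """l(Y_i | theta): product of Bernoulli terms over the n_i timepoints,
    which is valid under conditional independence given theta."""
    p = psi(X_i @ beta + sigma_u * theta)
    return np.prod(p**y_i * (1.0 - p)**(1 - y_i))

def marginal_lik_grid(y_i, X_i, beta, sigma_u, lo=-8.0, hi=8.0, m=4001):
    """h(Y_i) = integral of l(Y_i | theta) g(theta) d theta with g = N(0, 1),
    approximated by a Riemann sum on an evenly spaced grid (illustration only)."""
    grid = np.linspace(lo, hi, m)
    g = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
    l = np.array([cond_lik(y_i, X_i, beta, sigma_u, t) for t in grid])
    return np.sum(l * g) * (grid[1] - grid[0])

# Hypothetical subject with n_i = 3 timepoints: intercept and time
y_i = np.array([1, 0, 1])
X_i = np.column_stack([np.ones(3), np.arange(3.0)])
print(marginal_lik_grid(y_i, X_i, beta=np.array([0.2, -0.3]), sigma_u=1.5))
```

With $\sigma_\upsilon = 0$ the conditional likelihood no longer depends on $\theta$, so $h(Y_i)$ collapses to $l(Y_i \mid 0)$. This is a quick sanity check on the integration.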

Comments on the marginal probability $h(Y_i)$:
- it is obtained by considering the conditional likelihood, which depends on the random effect, for all possible values of the random effect, thereby yielding an aggregated or marginal likelihood akin to a weighted average probability
- the values of $\theta$ modify the response function $z_{ij}$ and thereby modify the conditional likelihood $l(Y_i \mid \theta)$, which is weighted by the probability at that point in the distribution $g(\theta)$ as one goes over all values of $\theta$

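Marginal likelihood of the response patterns $Y_i$ from all subjects (since subjects are independent of each other):

$$L = \prod_{i=1}^{N} h(Y_i)$$

or, taking logs,

$$\log L = \sum_{i=1}^{N} \log h(Y_i)$$

Let $\eta$ represent either $\beta$ or $\sigma_\upsilon$. Then, taking derivatives,

$$\frac{\partial \log L}{\partial \eta} = \sum_{i=1}^{N} h^{-1}(Y_i)\, \frac{\partial h(Y_i)}{\partial \eta}$$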

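Express the derivative of the marginal likelihood $h(Y_i)$ in the following way:

$$\frac{\partial h(Y_i)}{\partial \eta} = \int_\theta \frac{\partial l(Y_i \mid \theta)}{\partial \eta}\, g(\theta)\, d\theta = \int_\theta \frac{\partial}{\partial \eta}\left[\prod_{j=1}^{n_i} \Psi(z_{ij})^{Y_{ij}}\,[1 - \Psi(z_{ij})]^{1 - Y_{ij}}\right] g(\theta)\, d\theta$$

$$= \int_\theta \frac{\partial}{\partial \eta} \exp\left[\log \prod_{j=1}^{n_i} \Psi(z_{ij})^{Y_{ij}}\,[1 - \Psi(z_{ij})]^{1 - Y_{ij}}\right] g(\theta)\, d\theta$$

$$= \int_\theta \frac{\partial}{\partial \eta} \exp\left(\sum_{j=1}^{n_i} \Big\{ Y_{ij} \log[\Psi(z_{ij})] + (1 - Y_{ij}) \log[1 - \Psi(z_{ij})] \Big\}\right) g(\theta)\, d\theta$$

Notice that the expression in the large parentheses is precisely the form of the log-likelihood in ordinary (fixed-effects) logistic regression. Remember also that $\partial \exp f(x) / \partial x = \exp f(x)\; \partial f(x) / \partial x$.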

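Denoting $l(Y_i \mid \theta)$ by $l_i$, we get

$$\frac{\partial h(Y_i)}{\partial \eta} = \int_\theta \left[\sum_{j=1}^{n_i} \left(\frac{Y_{ij}}{\Psi(z_{ij})}\, \Psi'(z_{ij}) + \frac{1 - Y_{ij}}{1 - \Psi(z_{ij})}\, \big(-\Psi'(z_{ij})\big)\right) \frac{\partial z_{ij}}{\partial \eta}\right] l_i\, g(\theta)\, d\theta$$

$$= \int_\theta \left[\sum_{j=1}^{n_i} \frac{Y_{ij} - \Psi(z_{ij})}{\Psi(z_{ij})\,[1 - \Psi(z_{ij})]}\, \Psi'(z_{ij})\, \frac{\partial z_{ij}}{\partial \eta}\right] l_i\, g(\theta)\, d\theta$$

and since $\Psi'(z_{ij})$ equals the pdf, which for the logistic is $\Psi(z_{ij})[1 - \Psi(z_{ij})]$,

$$\frac{\partial \log L}{\partial \eta} = \sum_{i=1}^{N} h^{-1}(Y_i) \int_\theta \left[\sum_{j=1}^{n_i} [Y_{ij} - \Psi(z_{ij})]\, \frac{\partial z_{ij}}{\partial \eta}\right] l_i\, g(\theta)\, d\theta$$

with $\partial z_{ij}/\partial \beta = x_{ij}$ and $\partial z_{ij}/\partial \sigma_\upsilon = \theta_i$.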
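The integrand above reduces to $\partial l_i/\partial\eta = l_i \sum_j [Y_{ij} - \Psi(z_{ij})]\,\partial z_{ij}/\partial\eta$ at a fixed $\theta$. The sketch below implements that expression and checks it against central finite differences. The data and the name `cond_lik_and_grad` are hypothetical. Integrating this gradient against $g(\theta)$ gives $\partial h(Y_i)/\partial\eta$.

```python
import numpy as np

def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_lik_and_grad(y_i, X_i, beta, sigma_u, theta):
    """Return l_i = l(Y_i | theta) and its gradient with respect to (beta, sigma_u).

    Uses d l_i / d eta = l_i * sum_j [Y_ij - Psi(z_ij)] * dz_ij / d eta,
    with dz_ij / d beta = x_ij and dz_ij / d sigma_u = theta.
    """
    p = psi(X_i @ beta + sigma_u * theta)
    l_i = np.prod(p**y_i * (1.0 - p)**(1 - y_i))
    resid = y_i - p                       # Y_ij - Psi(z_ij)
    d_beta = X_i.T @ resid                # sum_j resid_j * x_ij
    d_sigma = resid.sum() * theta         # sum_j resid_j * theta
    return l_i, l_i * np.append(d_beta, d_sigma)

# Compare with central finite differences (hypothetical data)
y_i = np.array([1, 0, 1])
X_i = np.column_stack([np.ones(3), np.arange(3.0)])
beta, sigma_u, theta, eps = np.array([0.2, -0.3]), 0.8, 0.7, 1e-6
l_i, grad = cond_lik_and_grad(y_i, X_i, beta, sigma_u, theta)
numeric = []
for k in range(3):
    e = np.zeros(3)
    e[k] = eps
    up = cond_lik_and_grad(y_i, X_i, beta + e[:2], sigma_u + e[2], theta)[0]
    dn = cond_lik_and_grad(y_i, X_i, beta - e[:2], sigma_u - e[2], theta)[0]
    numeric.append((up - dn) / (2 * eps))
print(grad, np.array(numeric))
```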

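Fisher's method of scoring

Provisional estimates for the vector of all parameters $\Theta$ on iteration $\iota$ are improved by

$$\Theta_{\iota+1} = \Theta_\iota - \left[E\left(\frac{\partial^2 \log L}{\partial \Theta_\iota\, \partial \Theta_\iota'}\right)\right]^{-1} \frac{\partial \log L}{\partial \Theta_\iota}$$

where the information matrix, or minus the expectation of the matrix of second derivatives, is approximated by

$$-E\left(\frac{\partial^2 \log L}{\partial \Theta_\iota\, \partial \Theta_\iota'}\right) \approx \sum_{i=1}^{N} h^{-2}(Y_i)\, \frac{\partial h(Y_i)}{\partial \Theta_\iota} \left(\frac{\partial h(Y_i)}{\partial \Theta_\iota}\right)'$$

- the right-hand side is sometimes called the "outer product of the gradients"
- in econometrics, this is often referred to as the BHHH method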
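Since $h^{-2}(Y_i)\,(\partial h/\partial\Theta)(\partial h/\partial\Theta)'$ is the outer product of subject $i$'s score $s_i = h^{-1}(Y_i)\,\partial h(Y_i)/\partial\Theta$, one scoring step needs only the per-subject score vectors. The sketch below applies this BHHH-style step to a deliberately simple toy problem: a logit-only Bernoulli model with no random effect. It should converge to the logit of the sample proportion. The function name and toy data are illustrative assumptions.

```python
import numpy as np

def bhhh_step(params, scores):
    """One scoring step with the outer-product-of-gradients information.

    scores: (N, P) array; row i is subject i's contribution to d log L / d params,
    i.e. h^{-1}(Y_i) * dh(Y_i) / d params.
    """
    grad = scores.sum(axis=0)              # d log L / d params
    info = scores.T @ scores               # sum_i s_i s_i', approximates the information
    step = np.linalg.solve(info, grad)     # [I]^{-1} d log L / d params
    return params + step, step

# Toy check: Bernoulli data, single logit parameter a, score_i = y_i - Psi(a)
y = np.array([1, 1, 0, 1, 0, 1, 1, 0], dtype=float)
a = np.array([0.0])
for _ in range(100):
    scores = (y - 1.0 / (1.0 + np.exp(-a[0])))[:, None]
    a, step = bhhh_step(a, scores)
    if np.max(np.abs(step)) < 1e-12:
        break
print(a)   # should approach log(5/3), the logit of 5/8
```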

Numerical quadrature for integration over θ

A method to numerically perform the integration

$$\int_\theta f(\theta)\, g(\theta)\, d\theta \approx \sum_{q=1}^{Q} f(B_q)\, A(B_q)$$

where
- $B_q$ ($q = 1, \ldots, Q$) are the quadrature nodes or points
- $A(B_q)$ ($q = 1, \ldots, Q$) are the weights (they sum to 1)

Notes:
- the more points you use, the more accurate the approximation, but the more time it takes
- for the standard normal distribution: Gauss-Hermite quadrature
- this approach does yield a likelihood value that can be used for LR tests
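A minimal sketch of $\sum_q f(B_q)A(B_q)$ follows, assuming NumPy's `hermegauss`. That function returns Gauss-Hermite points and weights for the weight function $\exp(-x^2/2)$; dividing its weights by $\sqrt{2\pi}$ gives standard-normal weights that sum to 1. The example integrand is the marginal probability $\int \Psi(a + \sigma\theta)\,g(\theta)\,d\theta$, with hypothetical values of $a$ and $\sigma$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normal_quadrature(Q):
    """Nodes B_q and weights A(B_q) (summing to 1) for integrating against N(0, 1)."""
    nodes, w = hermegauss(Q)
    return nodes, w / np.sqrt(2.0 * np.pi)

def expect_normal(f, Q):
    """Approximate the integral of f(theta) g(theta) by sum_q f(B_q) A(B_q)."""
    B, A = normal_quadrature(Q)
    return np.sum(f(B) * A)

# Marginal P(Y = 1) when z = a + sigma * theta (hypothetical a, sigma)
a, sigma = -1.0, 2.0
f = lambda t: 1.0 / (1.0 + np.exp(-(a + sigma * t)))
for Q in (3, 5, 10, 20):
    print(Q, expect_normal(f, Q))   # settles down as Q grows
```

Gauss-Hermite quadrature with $Q$ points is exact for polynomial integrands up to degree $2Q - 1$. For a smooth integrand like the logistic it converges quickly as $Q$ grows.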

Gauss-Hermite quadrature points and weights for Q = 3, 4, and 10 quadrature points (table values not preserved)
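Standard-normal Gauss-Hermite points and weights can be regenerated numerically. The sketch below assumes NumPy's `hermegauss`, rescaled so that the weights sum to 1. Its output may differ in display convention (ordering, rounding) from the table on the original slide. For $Q = 3$ the points are $-\sqrt{3}, 0, \sqrt{3}$ with weights $1/6, 2/3, 1/6$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gh_normal(Q):
    """Gauss-Hermite points and weights for a standard normal g(theta).

    hermegauss uses the weight exp(-x^2 / 2); dividing by sqrt(2 * pi)
    makes the weights sum to 1.
    """
    points, weights = hermegauss(Q)
    return points, weights / np.sqrt(2.0 * np.pi)

for Q in (3, 4, 10):
    B, A = gh_normal(Q)
    print(f"Number of Quad Points = {Q}")
    print("Quad Points  =", np.round(B, 4))
    print("Quad Weights =", np.round(A, 4))
```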

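Using the quadrature points and weights, the response model is

$$z_{ijq} = x_{ij}'\beta + \sigma_\upsilon B_q$$

and so the conditional likelihood is

$$l(Y_i \mid B_q) = \prod_{j=1}^{n_i} \Psi(z_{ijq})^{Y_{ij}}\,[1 - \Psi(z_{ijq})]^{1 - Y_{ij}}$$

yielding the approximated marginal likelihood

$$h(Y_i) \approx \sum_{q=1}^{Q} l(Y_i \mid B_q)\, A(B_q)$$

The first derivatives are then

$$\frac{\partial \log L}{\partial \eta} \approx \sum_{i=1}^{N} h^{-1}(Y_i) \sum_{q=1}^{Q} \left[\sum_{j=1}^{n_i} [Y_{ij} - \Psi(z_{ijq})]\, \frac{\partial z_{ijq}}{\partial \eta}\right] l(Y_i \mid B_q)\, A(B_q)$$

with $\partial z_{ijq}/\partial \beta = x_{ij}$ and $\partial z_{ijq}/\partial \sigma_\upsilon = B_q$.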
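For a single subject, the quadrature formulas above can be vectorized over the $Q$ points and $n_i$ timepoints, as in the NumPy sketch below. The function name and example data are hypothetical. Because the quadrature approximation is itself a smooth function of $(\beta, \sigma_\upsilon)$, its analytic gradient should match finite differences of $h(Y_i)$ computed with the same points and weights.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def subject_h_and_grad(y_i, X_i, beta, sigma_u, B, A):
    """Quadrature approximations for one subject:
    h(Y_i) ~ sum_q l(Y_i|B_q) A(B_q), and
    dh/d eta ~ sum_q [sum_j (Y_ij - Psi(z_ijq)) dz_ijq/d eta] l(Y_i|B_q) A(B_q).
    """
    p = psi(X_i @ beta + sigma_u * B[:, None])             # (Q, n_i): Psi(z_ijq)
    l = np.prod(p**y_i * (1.0 - p)**(1 - y_i), axis=1)     # l(Y_i | B_q), shape (Q,)
    lw = l * A                                             # l(Y_i | B_q) A(B_q)
    resid = y_i - p                                        # Y_ij - Psi(z_ijq)
    d_beta = (resid @ X_i).T @ lw                          # dz/d beta = x_ij
    d_sigma = np.sum(resid.sum(axis=1) * B * lw)           # dz/d sigma = B_q
    return lw.sum(), np.append(d_beta, d_sigma)

B, A = hermegauss(10)
A = A / np.sqrt(2.0 * np.pi)                               # standard-normal weights
y_i = np.array([1, 0, 1, 1])
X_i = np.column_stack([np.ones(4), np.arange(4.0)])
h, grad = subject_h_and_grad(y_i, X_i, np.array([0.1, 0.2]), 1.1, B, A)
print(h, grad)
```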

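Empirical Bayes estimates

$$\hat\theta_i = E(\theta_i \mid Y_i) = h^{-1}(Y_i) \int_\theta \theta_i\, l(Y_i \mid \theta)\, g(\theta)\, d\theta \approx h^{-1}(Y_i) \sum_{q=1}^{Q} B_q\, l(Y_i \mid B_q)\, A(B_q)$$

The variance of the empirical Bayes estimator is

$$V(\hat\theta_i \mid Y_i) = h^{-1}(Y_i) \int_\theta (\theta_i - \hat\theta_i)^2\, l(Y_i \mid \theta)\, g(\theta)\, d\theta \approx h^{-1}(Y_i) \sum_{q=1}^{Q} (B_q - \hat\theta_i)^2\, l(Y_i \mid B_q)\, A(B_q)$$

These are computed at convergence with one more round of quadrature, using the converged values of
- $h(Y_i)$, which vary by subject
- $l(Y_i \mid B_q)$, which vary by subject and quadrature point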
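A minimal sketch of the EB mean and variance for one subject follows. It assumes standard-normal Gauss-Hermite points from NumPy's `hermegauss` and a hypothetical default of $Q = 20$. With $\sigma_\upsilon = 0$ the data carry no information about $\theta_i$, so the EB estimate falls back to the prior: a mean of 0 and a variance of 1.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def eb_estimate(y_i, X_i, beta, sigma_u, Q=20):
    """Empirical Bayes mean and variance of theta_i given Y_i, by quadrature."""
    B, A = hermegauss(Q)
    A = A / np.sqrt(2.0 * np.pi)                      # standard-normal weights
    p = psi(X_i @ beta + sigma_u * B[:, None])        # (Q, n_i)
    l = np.prod(p**y_i * (1.0 - p)**(1 - y_i), axis=1)
    h = np.sum(l * A)                                 # h(Y_i)
    theta_hat = np.sum(B * l * A) / h                 # E(theta_i | Y_i)
    var = np.sum((B - theta_hat)**2 * l * A) / h      # V(theta_hat_i | Y_i)
    return theta_hat, var

X_i = np.column_stack([np.ones(4), np.arange(4.0)])
beta = np.array([0.0, 0.1])
print(eb_estimate(np.ones(4), X_i, beta, sigma_u=1.0))   # all-positive responses
```

A subject with all positive responses gets a positive $\hat\theta_i$. Its posterior variance is below the prior variance of 1, reflecting the information in that subject's data.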

Adaptive quadrature
- adapt the quadrature points and weights for each subject, and at each iteration, using the EB estimates of their location $\hat\theta_i$ and uncertainty $s_i^2 = V(\hat\theta_i \mid Y_i)$
- requires fewer points to obtain an accurate solution
- especially useful if the subject random effects are very spread out (i.e., the ICC is high)

Adapted quadrature points and weights, from the original points $B_q$ and weights $A_q$, where $\phi(\cdot)$ is the normal pdf:

$$B_{iq} = \hat\theta_i + s_i B_q$$

$$A_{iq} = \sqrt{2\pi}\; s_i\, \exp(B_q^2/2)\, \phi(B_{iq})\, A_q$$
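To see why adaptation helps, the sketch below applies the formulas for $B_{iq}$ and $A_{iq}$ to a toy integrand, $f(\theta) = \exp(4\theta - 8)$. For this integrand, $f(\theta)\,\phi(\theta) = \phi(\theta - 4)$, so the exact integral is 1 and the mass sits near $\theta = 4$. The centre $\hat\theta_i = 4$ and $s_i = 1$ are chosen by hand here; they are not computed EB estimates. NumPy's `hermegauss` is assumed for the base rule.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def std_normal_rule(Q):
    B, A = hermegauss(Q)
    return B, A / np.sqrt(2.0 * np.pi)       # weights A_q sum to 1

def adapt(theta_hat_i, s_i, B, A):
    """B_iq = theta_hat_i + s_i B_q;  A_iq = sqrt(2 pi) s_i exp(B_q^2 / 2) phi(B_iq) A_q."""
    B_i = theta_hat_i + s_i * B
    A_i = np.sqrt(2.0 * np.pi) * s_i * np.exp(0.5 * B**2) * normal_pdf(B_i) * A
    return B_i, A_i

# Toy integrand: f(theta) g(theta) = phi(theta - 4), so the exact integral is 1
f = lambda t: np.exp(4.0 * t - 8.0)
B, A = std_normal_rule(3)
plain = np.sum(f(B) * A)                      # non-adaptive, Q = 3
B_i, A_i = adapt(4.0, 1.0, B, A)
adaptive = np.sum(f(B_i) * A_i)               # adaptive, Q = 3
print(plain, adaptive)
```

With only three points, the non-adaptive rule badly misses the integral, while the adapted points recover it. This is the situation the slide describes for widely spread random effects.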

Multiple random effects
- the quadrature solution must integrate over each random-effect dimension ($r$ = number of random effects)
- $B_q = (B_{q1}, B_{q2}, \ldots, B_{qr})$ = $r$-dimensional quadrature point vector
- $A(B_q) = \prod_{h=1}^{r} A(B_{qh})$ = product of univariate weights
- curse of dimensionality: $Q^r$ total points, where $Q$ is the number of points per dimension (e.g., $Q = 10$ and $r = 3$ leads to evaluation at 1000 points)
- adaptive quadrature is especially useful here, since $Q$ can be lower
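The product-rule grid can be built directly, as in the sketch below. It assumes independent standard-normal random effects and NumPy's `hermegauss` for the univariate rule. With $Q = 10$ and $r = 3$ it produces the 1000 points mentioned above, and each point's weight is the product of its univariate weights.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def product_grid(Q, r):
    """r-dimensional quadrature grid for independent N(0, 1) random effects.

    Each point B_q = (B_q1, ..., B_qr); its weight is the product of the
    univariate weights A(B_qh). The grid has Q**r points.
    """
    b, a = hermegauss(Q)
    a = a / np.sqrt(2.0 * np.pi)                                   # univariate weights sum to 1
    points = np.array(list(itertools.product(b, repeat=r)))        # (Q**r, r)
    weights = np.array([np.prod(w) for w in itertools.product(a, repeat=r)])
    return points, weights

pts, wts = product_grid(10, 3)
print(pts.shape)   # (1000, 3)
```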

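Random-intercepts logistic regression

Program at website: hedeker/mixbin.sas.txt

1. Form the data matrices $Y$ and $X$:

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix}, \qquad X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}$$

2. Get starting values for $\beta$ and $\sigma_\upsilon$:
- fixed-effects estimates for $\beta$
- set $\sigma_\upsilon$ to some pre-assigned value, say based on an ICC guess:

$$\sigma_\upsilon = \sqrt{\frac{\mathrm{ICC} \cdot \pi^2/3}{1 - \mathrm{ICC}}}$$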

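3. Go over subjects, quadrature points, and repeated observations to obtain

$$h(Y_i) \approx \sum_{q=1}^{Q} l(Y_i \mid B_q)\, A(B_q)$$

$$\frac{\partial \log L}{\partial \eta} \approx \sum_{i=1}^{N} h^{-1}(Y_i)\, \frac{\partial h(Y_i)}{\partial \eta}$$

$$I(\eta) = -E\left(\frac{\partial^2 \log L}{\partial \eta\, \partial \eta'}\right) \approx \sum_{i=1}^{N} h^{-2}(Y_i)\, \frac{\partial h(Y_i)}{\partial \eta} \left(\frac{\partial h(Y_i)}{\partial \eta}\right)'$$

Repeat step 3 until all elements of $[I(\eta)]^{-1}\, \partial \log L / \partial \eta$ are smaller in absolute value than the convergence criterion.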
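The steps above can be sketched end to end in Python/NumPy. This is not a translation of the SAS program at the website. The sketch stacks the data for fixed-effects starting values, derives $\sigma_\upsilon$ from an ICC guess, and iterates Gauss-Hermite quadrature with BHHH scoring until the step falls below the criterion. Two additions are not on the slides and are flagged in the comments: step halving, which keeps $\log L$ from decreasing, and taking $|\sigma_\upsilon|$, which is harmless because the likelihood is symmetric in its sign. The data-generating values, $Q = 15$, and the tolerance are hypothetical choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def fixed_effects_logit(X, y, iters=25):
    """Step 2a: ordinary logistic regression (Newton-Raphson) for starting betas."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = psi(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def sigma_from_icc(icc):
    """Step 2b: sigma_u = sqrt(ICC * (pi^2 / 3) / (1 - ICC))."""
    return np.sqrt(icc * (np.pi**2 / 3.0) / (1.0 - icc))

def loglik_and_scores(params, ys, Xs, B, A):
    """Step 3: log L and per-subject scores h^{-1}(Y_i) dh(Y_i)/d eta."""
    beta, sigma = params[:-1], params[-1]
    logL, scores = 0.0, []
    for y_i, X_i in zip(ys, Xs):
        p = psi(X_i @ beta + sigma * B[:, None])                    # Psi(z_ijq), (Q, n_i)
        lw = np.prod(p**y_i * (1.0 - p)**(1.0 - y_i), axis=1) * A   # l(Y_i|B_q) A(B_q)
        h = lw.sum()
        resid = y_i - p
        d_beta = (resid @ X_i).T @ lw                               # dz/d beta = x_ij
        d_sigma = np.sum(resid.sum(axis=1) * B * lw)                # dz/d sigma = B_q
        logL += np.log(h)
        scores.append(np.append(d_beta, d_sigma) / h)
    return logL, np.array(scores)

def fit(ys, Xs, icc_guess=0.2, Q=15, tol=1e-6, max_iter=200):
    # Steps 1-2: stack subjects into Y and X for the fixed-effects start
    beta0 = fixed_effects_logit(np.vstack(Xs), np.concatenate(ys))
    params = np.append(beta0, sigma_from_icc(icc_guess))
    B, A = hermegauss(Q)
    A = A / np.sqrt(2.0 * np.pi)                                    # weights sum to 1
    logL, S = loglik_and_scores(params, ys, Xs, B, A)
    for _ in range(max_iter):
        step = np.linalg.solve(S.T @ S, S.sum(axis=0))              # [I(eta)]^{-1} dlogL/d eta
        if np.all(np.abs(step) < tol):
            return params, logL, True
        t = 1.0
        while True:                                                 # step halving (addition)
            cand = params + t * step
            cand[-1] = abs(cand[-1])                                # sign of sigma is irrelevant
            new_logL, new_S = loglik_and_scores(cand, ys, Xs, B, A)
            if new_logL >= logL or t < 1e-4:
                break
            t /= 2.0
        params, logL, S = cand, new_logL, new_S
    return params, logL, False

# Simulated example with hypothetical true values
rng = np.random.default_rng(2024)
beta_true, sigma_true, N, n = np.array([-0.5, 0.3]), 1.0, 300, 6
Xs, ys = [], []
for _ in range(N):
    X_i = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    u_i = sigma_true * rng.standard_normal()
    ys.append((rng.random(n) < psi(X_i @ beta_true + u_i)).astype(float))
    Xs.append(X_i)
est, logL, converged = fit(ys, Xs)
print(converged, est, logL)
```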

Other methods for integration over θ
- methods based on first- or second-order Taylor series expansions
  - marginal quasi-likelihood (MQL) involves expansion around the fixed part of the model
  - penalized or predictive quasi-likelihood (PQL) also includes the random part in its expansion
- both are available in SAS PROC GLIMMIX and MLwiN
- fast, but they don't yield a likelihood for LR tests
- can yield downwardly biased estimates in certain situations (if N and/or n is small, or the ICC is high), especially for MQL

Laplace approximation: Raudenbush et al. (2000)
- a combination of a fully multivariate Taylor series expansion and a Laplace approximation
- fast and computationally accurate
- yields a likelihood for LR tests
- available in HLM, though not for all models

Other methods
- Markov chain Monte Carlo (MCMC) Bayesian approach (in BUGS)
- maximum simulated likelihood (in some Stata programs), used in the econometrics, transportation, and political science literatures
