Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model

Size: px

Start display at page:

Download "Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model"

Penelope Scott
6 years ago
Views:

1 Outline for today Computation of the likelihood function for GLMMs asmus Waagepetersen Department of Mathematics Aalborg University Denmark Computation of likelihood function. Gaussian quadrature 2. Monte Carlo methods Newton-aphson EM-algorithm case study of non-linear mixed effects model April 30, 202 /27 2/27 Generalized linear mixed effects models Consider stochastic variable Y = (Y,...,Y n ) and random effects U. Two step formulation of GLMM: U N(0,Σ). Given realization u of U, Y i independent and each follows density f i (y u) with mean µ i = g (η i ) and linear predictor η = Xβ +Zu. I.e. conditional on U, Y i follows a generalized linear model. NB: GLMM specified in terms of marginal density of U and conditional density of Y given U. But the likelihood is the marginal density of f(y) which can be hard to evaluate! Likelihood for generalized linear mixed model For normal linear mixed models we could compute the marginal distribution of Y directly as a multivariate normal. That is, f(y) is a density of a multivariate normal distribution. For a generalized linear mixed model it is difficult to evaluate the integral: f(y) = f(y,u)du = f(y u)f(u)du m m since f(y u)f(u) is a very complex function. 3/27 4/27

2 Adaptive Gaussian Quadrature - one random effect Gauss-Hermite quadrature (numerical integration) is n f(x)dx w i f(x i ) where φ is the standard normal density and (x i,w i ),i =,n are certain arguments and weights which can be looked up in a table. i= Advantage f(y u)f(u) φ(u;µ LP,σLP 2 ) = f(y σ LPx +µ LP )f(σ LP x +µ LP ) x = (u µ LP )/σ LP close to constant (f(y)) hence adaptive G-H quadrature very accurate. We can replace with = whenever f is a polynomial of degree 2n or less. Adaptive Gauss-Hermite quadrature: f(y u)f(u) f(y u)f(u)du φ(u;µ LP,σLP 2 )φ(u;µ LP,σLP 2 )du = f(y σlp x +µ LP )f(σ LP x +µ LP ) σ LP dx (change of variable: x = (u µ LP )/σ LP ) 5/27 GH scheme with n = 5: x w (x s are roots of Hermite polynomial computed using ghq in library glmmml). (GH schmes for n = 5 and n = 0 available on web page) 6/27 Prediction of random effects for GLMM Computation of conditional expectations (prediction) Conditional mean E[U Y = y] = uf(u y)du is minimum mean square error predictor, i.e. E(U Û)2 E[U Y = y] = f(y) u f(y u)f(u) du = f(y) (σ LP x+µ LP ) f(y σ LPx +µ LP )f(σ LP x +µ LP ) σ LP dx is minimal with Û = H(Y) where H(y) = E[U Y = y] Note: Difficult to analytically evaluate E[U Y = y] = uf(y u)f(u)/f(y)du (σ LP x +µ LP ) f(y σ LPx +µ LP )f(σ LP x +µ LP ) σ LP behaves like a first order polynomial in x - hence GH still accurate. 7/27 8/27

3 Difficult cases for numerical integration - dimension m > Monte Carlo computation of likelihood for GLMM correlated random effects crossed random effects nested random effects Not possible to factorize likelihood into low-dimensional integrals Number of quadrature points k m where k is number of quadrature points for D hence numerical quadrature may not be feasible. Likelihood function is an expectation: L(θ) = f(y;θ) = f(y u;β)f(u;τ 2 )du = E τ 2f(y U;β) m Use Monte Carlo simulations to approximate expectation. NB: also applicable in high dimensions Alternatives: PQL and Laplace-approximation or Monte Carlo computation. 9/27 0/27 Importance sampling Consider Z f and suppose we wish to evaluate Eh(Z) where h(z) has large variance. Suppose we can find density g so that Then h(z)f(z) g(z) where Y g. const and h(z)f(z) > 0 g(z) > 0 h(z)f(z) Eh(Z) = g(z)dz = E h(y)f(y) g(z) g(y) Importance sampling for GLMM g( ) probability density on m. L(θ) = L(θ) L IS,h (θ) = M f(y u;β)f(u;τ 2 )du = E f(y V;β)f(V;τ2 ) g(v) f(y V l ;β)f(v l ;τ 2 ) g(v l ) f(y u;β)f(u;τ 2 ) g(u)du = g(u) where V g( ). where V l g( ),l =,...,M Note variance of h(y)f(y)/g(y) small so estimate Eh(Z) n has small Monte Carlo error. n i= h(y i )f(y i ) g(y i ) /27 Find h so Var f(y V;β)f(V;τ2 ) g(v) small. VarL IS,h (θ) < if f(y v;θ)f(v;θ)/g(v) bounded (i.e. use g( ) with heavy tails). 2/27

4 Simple Monte Carlo: g(u) = f(u;τ 2 ) L(θ) = M f(y u;β)f(u;τ 2 )du = E τ 2f(y U;β) L SMC (θ) = f(y U l ;β) where U l N(0,τ 2 ) independent Monte Carlo variance: Var(L SMC (θ)) = M Varf(y U ;β) Estimate Varf(y U ;β) using empirical variance estimate based on f(y U l ;β), l =,...,M: M (f(y U l ;β) L SMC (θ)) 2 Often Varf(y U ;β) is large so large M is needed. Increasing dimension leads to worse performance (useless in high dimensions) 3/27 4/27 Possibility: Consider fixed θ 0 : Possibility: Note f(y u,β)f(u;τ 2 ) = f(y;θ) = L(θ) = const f(u y;θ) Laplace: U Y = y N ( µ LP,σLP 2 ) Use g( ) density for N ( µ LP,σ 2 LP) or tν ( µlp,σ 2 LP) -distribution: so small variance. Simulation straightforward. f(y u,β)f(u;τ 2 ) g(u) const Note: Monte Carlo version of adaptive Gaussian quadrature. g(u) = f(u y,θ 0 ) = f(y u;θ 0 )f(u;θ 0 )/L(θ 0 ) Then f(y u;β)f(u;τ 2 ) f(y u;β)f(u;τ 2 ) L(θ) = g(u)du = L(θ 0 ) g(u) f(y u;β 0 )f(u;τ0 2)f(u y,θ 0)du = [ f(y U;β)f(U;τ 2 ) ] L(θ 0 )E θ0 f(y U;β 0 )f(u;τ0 2) Y = y L(θ) [ f(y U;β)f(U;τ 2 L(θ 0 ) = E ) ] θ 0 f(y U;β 0 )f(u;τ0 2) Y = y So we can estimate ratio L(θ)/L(θ 0 ) where L(θ 0 ) is unknown constant. This suffices for finding MLE: L(θ) arg maxl(θ) = argmax θ θ L(θ 0 ) M f(y U l ;β)f(u l ;τ 2 ) f(y U l ;β 0 )f(u l ;τ 2 0 ) 5/27 where U l f(u y;θ 0 ) 6/27

5 Simulation of random variables Direct methods exists for many standard distributions (normal, binomial, t, etc.: rnorm(), rbinom(), rt() etc.) Suppose f is a non-standard density but f(z) Kg(z) for some constant K and standard density g. Then we may apply rejection sampling:. Generate Y g and W unif[0,]. 2. If W f(y) Kg(Y) return Y (accept); otherwise go to (reject). Note probability of accept is /K. Proof of rejection sampling: compute P(Y y accept) = P(Y y W f(y) Kg(Y) ) If f is high-dimensional density it may be hard to find g with small K so rejection sampling mainly useful in small dimensions. MCMC is then useful alternative (later). 7/27 8/27 Prediction of U using conditional simulation Conditional simulation of U Y = y using rejection sampling Compute Monte Carlo estimate of E(U Y = y) using conditional simulations of U Y = y or importance sampling. E(U Y = y) = M U m, U m f(u y) m= We can also evaluate e.g. P(U i > c y) or P(U i > U l,l i Y) etc. Note f(y u;β 0 )f(u;τ 2 0)/f(y;θ 0 ) K t ν (u;µ LP,σ 2 LP ) for some constant K. ejection sampling:. Generate V t ν (µ LP,σLP 2 ) and W Unif(]0,[) 2. eturn V if W f(v y;θ 0 )/(K t(v;µ LP,σLP 2 )); otherwise go to. 9/27 20/27

6 Maximization of likelihood using Newton-aphson Let Then and V θ (y,u) = d dθ logf(y,u θ) u(θ) = d dθ logl(θ) = E θ[v θ (y,u) Y = y] j(θ) = d2 dθ T dθ logl(θ) = ( E θ [dv θ (y,u)/dθ T Y = y] Var θ [V θ (y,u) Y = y] ) Newton-aphson: θ l+ = θ l +j(θ l ) u(θ l ) All unknown expectations and variances can be estimated using the previous numerical integration or Monte Carlo methods! EM-algorithm Given current estimate θ l :. (E) compute Q(θ l,θ) = E θl [logf(y,u θ) Y = y] 2. (M) θ l+ = argmax θ Q(θ l,θ). For LNMM E-step can be computed explicitly (Jiang page 65) but seems pointless as likelihood is available in closed form. For GLMMs (E) step needs numerical integration or Monte Carlo. Convergence of EM-algorithm can be quite slow. Maximization of likelihood using Newton-aphson seems better alternative. 2/27 22/27 Overview of Jiang page 65-7 Case study: non-linear mixed effects model for cow mats data Compression vs. pressure for two brands of mats Page (Monte Carlo) EM for linear mixed model and threshold model Non-linear relation Page 68-69: Newton-ahpson and importance sampling. Page 70-7: Monte Carlo EM with adaptive importance sampling (t-distribution + Laplace approximation) compression y = mmf(x) = ab +cxd b +x d, andom variation between mats of same brand, small measurement noise pressure 23/27 24/27

7 Simulated data from the two models: Estimation of non-linear model with fixed effects: Fixed effects: residual standard error 0.72 With random effects: residual standard error 0.7 nlsfit=nls(nedtryk~mmf(tryk,a,b,c,d),start= c(a=0.,b=.670,c=80,d=0.6),data=mattressdata) Estimation of non-linear model with a, b, c as random effects: nlmerfit=nlmer(nedtryk~mmfnlmer(tryk,a,b,c,d)~(a matno)+ (b matno)+(c matno),mattressdata,start=c(a=0.04,b=.64,c=74,d=0.64)) compression compression pressure pressure 25/27 Std. err. for a,b,c are 0.64,0.05 and 0.4 andom effects model gives much better representation of variability in data. 26/27 NB: to assess influence of variability of different parameters we need to look at partial derivatives (sensitivities) wrt. these parameters. Exercises. Exercise 4.3 on page 229 in Jiang. 2. exercises on exercise-sheets exercises6.pdf and exercises7.pdf. 3. Check formulas for u(θ) and j(θ). How can these expression be computed using importance sampling? 27/27

Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model

Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model Outline for today Computation of the likelihood function for GLMMs asmus Waagepetersen Department of Mathematics Aalborg University Denmark likelihood for GLMM penalized quasi-likelihood estimation Laplace