Package HGLMMM for Hierarchical Generalized Linear Models

Marek Molas, Emmanuel Lesaffre
Erasmus MC, Erasmus Universiteit Rotterdam, The Netherlands
ERASMUSMC - Biostatistics, 20-04-2010

Outline
- General syntax guide
- Some underlying theoretical concepts
- Example analyses
- Comparison with existing methods
- Further developments

Examples
- Salamander data - crossed random effects
- Dialyzer data - longitudinal data
- Dialyzer data - correlated random effects
- Rats data - overdispersion modeling
- Cake data - AIC and model comparison

Hierarchical Generalized Linear Models
- Distribution of a response: exponential family density

$$ f(y;\theta,\phi) = \exp\left[\frac{y\theta - b(\theta)}{\phi} + c(y,\phi)\right] $$

- The mean of the distribution

$$ E[y] = b'(\theta) = \mu $$

- $\mu$ - the location of the distribution
- $\phi$ - the scale of the distribution, or overdispersion
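As a worked instance of the density above, the Poisson distribution can be written in exponential family form. This is a standard textbook identification, not taken from the slides, and it assumes the scale is fixed at $\phi = 1$:

```latex
f(y;\mu) = \frac{\mu^{y} e^{-\mu}}{y!}
         = \exp\left[\, y \log\mu - \mu - \log y! \,\right]
\quad\Longrightarrow\quad
\theta = \log\mu,\qquad
b(\theta) = e^{\theta},\qquad
\phi = 1,\qquad
c(y,\phi) = -\log y!
```

With these choices $b'(\theta) = e^{\theta} = \mu$, which agrees with $E[y] = b'(\theta)$.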

Hierarchical Generalized Linear Models
- The link function

$$ g(\mu) = \eta $$

- The linear predictor

$$ \eta = X\beta + Zv $$

- Fixed effects in the mean structure - $\beta$
- Random effects in the mean structure - $v$, assumed to originate from a distribution indexed by a dispersion parameter $\lambda_v$
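The linear predictor can be illustrated with a small base-R sketch. Everything in it is hypothetical: the toy data frame, the coefficient values `beta` and `v`, and the choice of a log link are assumptions made only to show how $X$, $Z$, $\eta$ and $\mu$ relate, and none of it uses the HGLMMM package.

```r
# Toy data: 6 observations from 3 subjects (all values are made up)
dat <- data.frame(x = c(0, 1, 0, 1, 0, 1),
                  subject = factor(c("a", "a", "b", "b", "c", "c")))

X <- model.matrix(~ x, data = dat)            # fixed-effects design: intercept and x
Z <- model.matrix(~ 0 + subject, data = dat)  # one indicator column per subject

beta <- c(0.5, 1.0)       # hypothetical fixed effects
v    <- c(-0.2, 0.0, 0.2) # hypothetical random intercepts, one per subject

eta <- as.vector(X %*% beta + Z %*% v)  # linear predictor eta = X beta + Z v
mu  <- exp(eta)                         # mean under an assumed log link
```

Each row of `Z` picks out the random intercept of the subject that the observation belongs to, so observations from the same subject share the same shift in `eta`.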

Functions currently in the package HGLMMM
- HGLMfit - fitting function
- HGLMLikeDeriv - displays derivatives of the fit
- HGLMLRTest - likelihood ratio test between two nested models
- BootstrapEnvelopeHGLM - creates bootstrap envelopes for deviance residuals
- summary.hglm - prints out a summary of the fit

HGLMfit syntax

    HGLMfit(DistResp = "Normal", DistRand = NULL, Link = NULL,
            LapFix = FALSE, ODEst = NULL, ODEstVal = 0,
            formulaMain, formulaOD, formulaRand, DataMain, DataRand,
            Offset = NULL, BinomialDen = NULL,
            StartBeta = NULL, StartVs = NULL, StartRGamma = NULL,
            INFO = TRUE, DEBUG = FALSE, na.action, contrasts = NULL,
            CONV = 1e-04)

HGLMfit syntax description
- DistResp - distribution of the response: "Normal", "Binomial", "Poisson" or "Gamma"
- DistRand - distribution of the random effects: a vector of distributions, of length equal to the number of random components, with entries taken from c("Beta","Gamma","IGamma","Normal")

HGLMfit syntax description
- Link - the link function for the response
  - Canonical links are available for Normal, Poisson and Binomial
  - The Gamma distribution has the Log or Inverse link available
- LapFix - whether $p_v(h)$ is used for the estimation of the fixed effects
  - If TRUE, an additional piece of code is used to estimate the fixed effects as in Noh and Lee (2007)
  - If FALSE, the hierarchical likelihood is used for estimation of the fixed and random parameters

HGLMfit syntax description
- ODEst - whether the overdispersion parameter should be fixed or estimated
  - If NULL, it is fixed for Poisson and Binomial and estimated for Normal and Gamma
  - If TRUE, the overdispersion structure is estimated
  - If FALSE, the overdispersion structure is held fixed
- formulaMain - the formula for the mean structure of the model
  - A formula with fixed and random components in the mean structure, as in lme4

HGLMfit syntax description
- formulaOD - the dispersion structure (residual/overdispersion)
  - A one-sided formula
- formulaRand - the dispersion structure of the random effects
  - A list of one-sided formulas; the number of list entries must equal the number of dispersion components
- DataMain - the main dataset, used for formulaMain and formulaOD
- DataRand - a list containing the data frames used for formulaRand

HGLMfit syntax description
- Offset - offset variable in Poisson regression, as in $\log(\mu / t)$
- BinomialDen - the denominator of the Binomial distribution; a vector of length equal to the number of observations
- StartBeta, StartVs, StartRGamma - starting values for the fixed parameters, the random effects and the dispersion parameters of the random effects, respectively
- ODEstVal - values for the overdispersion/residual dispersion structure

Class HGLM objects - result of HGLMfit estimation
- Results - contains the estimates
- Details - contains the designs
- NAMES - contains labels for printing the results
- CALL - contains the original call of the estimating function HGLMfit

Class HGLM objects - component Results
- Estimates of fixed and random effects in the mean structure
- Estimates of dispersion and (over)/residual dispersion parameters
- Gradient / Hessian / StdErrors of fixed, random and dispersion estimates
- Values of the h-likelihood, marginal likelihood (REML) and conditional likelihood

Class HGLM objects - component Details
- Deviance residuals and standardized deviance residuals, involving the proper hat matrix:
  - For the outcome (assumed distribution)
  - For the random effects (assumed distribution)
  - For the (over)/residual dispersion (gamma distribution)
  - For the dispersion components (gamma distribution)

Other functions description
- HGLMLRTest
  - Likelihood ratio test comparing two models; takes two arguments, both objects of class HGLM
- HGLMLikeDeriv
  - Gives gradients of the fixed effects in the mean structure and of the variance components
- BootstrapEnvelopeHGLM
  - Creates 95% bootstrap confidence envelopes for residual diagnostics

Examples - Dialyzer data
- Dialyzer dataset
  - Response is UFR
  - Covariate of interest is TMP
  - 3 centers involved, coded in a center variable
  - Random effect: dialyzer number
- Aim: determine the relationship between UFR and TMP, and determine whether this relationship differs across the three centers, which use different systems to manipulate the TMP

Examples - Dialyzer data
[Figure: UFR plotted against TMP, by center (Center 1, Center 2, Center 3)]

Examples - Dialyzer data
- Standard analysis via SAS PROC MIXED
- Random intercept model
- Random intercept and slope model - no correlation
- Random intercept and slope model - fixed correlation
- Search over a grid for the correlation value

Dialyzer data - random intercept model

    # Keep complete cases and standardize the response
    dialyzer1 <- dialyzer[complete.cases(dialyzer), ]
    dialyzer1$UFRSTD <- (dialyzer1$UFR - mean(dialyzer1$UFR)) / sd(dialyzer1$UFR)
    # Design for the random-effect dispersion: one row per dialyzer (41 dialyzers)
    DatasetRAEF <- data.frame(intercept = rep(1, 41))
    mod_dial1 <- HGLMfit(DistResp = "Normal", DistRand = c("Normal"),
        Link = "Identity", LapFix = FALSE, ODEst = TRUE, ODEstVal = 0,
        formulaMain = UFRSTD ~ TMP + as.factor(CENTER) + as.factor(CENTER):TMP +
            (1 | DIALYZER),
        formulaOD = ~ 1, formulaRand = list(one = ~ 1),
        DataMain = dialyzer1, DataRand = list(DatasetRAEF),
        Offset = NULL, BinomialDen = NULL,
        StartBeta = NULL, StartVs = NULL, StartRGamma = NULL,
        INFO = TRUE, DEBUG = FALSE, contrasts = NULL, CONV = 1e-04)
    summary(mod_dial1)

Dialyzer data - random intercept/slope model

    # Independent random intercept and random TMP slope per dialyzer
    mod_dial2 <- HGLMfit(DistResp = "Normal", DistRand = c("Normal", "Normal"),
        Link = "Identity", LapFix = FALSE, ODEst = TRUE, ODEstVal = 0,
        formulaMain = UFRSTD ~ TMP + as.factor(CENTER) + as.factor(CENTER):TMP +
            (1 | DIALYZER) + (TMP | DIALYZER),
        formulaOD = ~ 1, formulaRand = list(one = ~ 1, two = ~ 1),
        DataMain = dialyzer1, DataRand = list(DatasetRAEF, DatasetRAEF),
        Offset = NULL, BinomialDen = NULL,
        StartBeta = NULL, StartVs = NULL, StartRGamma = NULL,
        INFO = TRUE, DEBUG = FALSE, contrasts = NULL,
        CONV = 1e-04)  # slide text is cut off at CONV; default value assumed
    summary(mod_dial2)

Dialyzer data - known correlation parameter
- Assume the correlation between the random intercept and slope is known: ρ = -0.648
- Fit the model under independence, obtain estimates of the variances of the intercept and slope, and construct the variance-covariance matrix from the known correlation and the computed variances
- Compute the Cholesky decomposition of this matrix
- Change the design matrix of the random effects
- Fit the model, update the estimates of the variances and use them to construct a new covariance matrix with the known correlation
- Compute the Cholesky decomposition of the new matrix and refit the model after changing the design matrix again
- Stop the procedure when the variance components of the fit are close to 1
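The reason the design-matrix change works can be checked numerically in base R. The sketch below is illustrative only: the variances `s2` and the TMP values in `Z` are hypothetical, and only `rho` is the value used in the code on the slides. It shows that random effects with identity covariance on the transformed design imply the covariance $Z \Sigma Z^T$ on the original design.

```r
rho <- -0.648          # known correlation, as in the R code on the slides
s2  <- c(0.5, 0.02)    # hypothetical intercept and slope variances

# Covariance matrix built from the variances and the known correlation
off   <- rho * sqrt(s2[1] * s2[2])
Sigma <- matrix(c(s2[1], off, off, s2[2]), 2, 2)

R <- chol(Sigma)       # upper triangular factor, t(R) %*% R equals Sigma

# Hypothetical random-effects design: intercept column and four TMP values
Z    <- cbind(1, c(200, 250, 300, 350))
Znew <- Z %*% t(R)     # transformed design, as done on the slides

# Identity covariance on Znew corresponds to covariance Sigma on Z
implied  <- Znew %*% t(Znew)
expected <- Z %*% Sigma %*% t(Z)
```

This is also why the procedure stops when the refitted variance components are close to 1: at that point the transformed effects are uncorrelated with unit variance and `Sigma` already describes the original effects.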

Dialyzer data - known correlation parameter
- If the variances of the random intercept and slope are assumed equal, only one step is required
- If the correlation is unknown, a grid search could be done
- This implies many iterations in nested loops, which is inefficient
- A possible modification of the current code is to do this at every iteration

Dialyzer data - known correlation parameter

    # First pass: variances from the independence fit (mod_dial2);
    # the slide reads mod_dial3 here, which only exists after a previous pass
    temp1 <- as.vector(exp(mod_dial2$Results$Dispersion))
    rho <- -0.648
    Rcov <- matrix(c(temp1[1], rho * sqrt(temp1[1] * temp1[2]),
                     rho * sqrt(temp1[1] * temp1[2]), temp1[2]), 2, 2)
    tempchol <- chol(Rcov)
    # Transform the random-effects design with the Cholesky factor
    originalz <- cbind(rep(1, nrow(dialyzer1)), dialyzer1$TMP)
    modifiedz <- originalz %*% t(tempchol)
    dialyzer1$NEWINT <- modifiedz[, 1]
    dialyzer1$NEWTMP <- modifiedz[, 2]
    mod_dial3 <- HGLMfit(DistResp = "Normal", DistRand = c("Normal", "Normal"),
        Link = "Identity", LapFix = FALSE, ODEst = TRUE, ODEstVal = 0,
        formulaMain = UFRSTD ~ TMP + as.factor(CENTER) + as.factor(CENTER):TMP +
            (NEWINT | DIALYZER) + (NEWTMP | DIALYZER),
        formulaOD = ~ 1, formulaRand = list(one = ~ 1, two = ~ 1),
        DataMain = dialyzer1, DataRand = list(DatasetRAEF, DatasetRAEF),
        Offset = NULL, BinomialDen = NULL,
        StartBeta = NULL, StartVs = NULL, StartRGamma = NULL,
        INFO = TRUE, DEBUG = FALSE,
        contrasts = NULL, CONV = 1e-04)  # slide text is cut off here; defaults assumed

Dialyzer data - known correlation parameter

    # Update step: variances of the transformed effects from the latest fit
    temp2 <- as.vector(exp(mod_dial3$Results$Dispersion))
    rho <- -0.648
    # Covariance of the original random effects implied by the latest fit
    temp3 <- t(tempchol) %*% matrix(c(temp2[1], 0, 0, temp2[2]), 2, 2) %*% tempchol
    # Rebuild the covariance matrix with the known correlation
    Rcov1 <- matrix(c(temp3[1, 1], rho * sqrt(temp3[1, 1] * temp3[2, 2]),
                      rho * sqrt(temp3[1, 1] * temp3[2, 2]), temp3[2, 2]), 2, 2)
    tempchol <- chol(Rcov1)
    originalz <- cbind(rep(1, nrow(dialyzer1)), dialyzer1$TMP)
    modifiedz <- originalz %*% t(tempchol)
    dialyzer1$NEWINT <- modifiedz[, 1]
    dialyzer1$NEWTMP <- modifiedz[, 2]

Dialyzer data - known correlation parameter
Results

    ===== Fixed Coefficients - Mean Structure =====
                              Estimate Std. Error Z value Pr(>|Z|)
    (Intercept)             -2.7702081  0.0314168 -88.176  < 2e-16 ***
    TMP                      0.0093937  0.0001110  84.623  < 2e-16 ***
    as.factor(CENTER)2       0.0113560  0.0474683   0.239 0.810925
    as.factor(CENTER)3       0.0350136  0.0513306   0.682 0.495164
    TMP:as.factor(CENTER)2  -0.0006180  0.0001692  -3.653 0.000259 ***
    TMP:as.factor(CENTER)3  -0.0006929  0.0001813  -3.821 0.000133 ***
    ---
    ===== Overdispersion Parameters Estimated =====
                Estimate Std. Error Z value Pr(>|Z|)
    (Intercept)   -5.700      0.149  -38.27   <2e-16 ***
    ---

Dialyzer data - known correlation parameter

    ===== Dispersion Parameters Estimated =====
    Dispersion Component: DIALYZER
                  Estimate Std. Error   Z value Pr(>|Z|)
    (Intercept) -0.0001058  0.4002638 -0.000264        1
    Dispersion Component: DIALYZER
                Estimate Std. Error Z value Pr(>|Z|)
    (Intercept) -0.00104    0.25326  -0.004    0.997
    ===== Likelihood Functions Value =====
    H-likelihood       : 162.3979
    Marginal likelihood: 170.8414
    REML likelihood    : 138.1487
    C-likelihood       : 265.6102

Examples - Dialyzer data

    BootstrapEnvelopeHGLM(mod_dial_final, 19, 67523)

[Figure: normal Q-Q plot with bootstrap envelope; sample quantiles against theoretical quantiles]

Salamander data
- Dependent variable: success of salamander mating, $\text{Mate}_{ij}$
- 60 male salamanders (i = 1, ..., 60) and 60 female salamanders (j = 1, ..., 60)
- Two populations of salamanders: whiteside (W) and roughbutt (R)
- 360 observations
- Question: does the type of salamander influence the probability of a successful mating?
- The model:

$$ \log\left(\frac{\mu_{ij}}{1-\mu_{ij}}\right) = \text{Intercept} + \text{TypeF}_j + \text{TypeM}_i + \text{TypeF}_j \cdot \text{TypeM}_i + v_i + v_j $$

Crossed random effects
[Diagram: pairing of male and female salamanders (columns Male and Female); layout not recoverable]

Gaussian quadrature is infeasible
We will perform the following analyses:
- GLM ignoring correlation, in R glm()
- PQL analysis in SAS PROC GLIMMIX
- Mixed model using the Laplace approximation, in R lme4::lmer()
- HL(0,1) in the R HGLMMM package
- HL(1,1) in the R HGLMMM package
- HL(1,1) + estimation of overdispersion φ in the R HGLMMM package

Generalized linear model in SAS

    proc genmod data=sal descending;
      model mate=typefw typemw typefw*typemw / dist=binomial link=logit;
    run;

Generalized linear model in R

    glm(cbind(Mate, 1 - Mate) ~ TypeF + TypeM + TypeF*TypeM,
        family = binomial(link = logit), data = salamander)

PQL model in SAS

    proc glimmix data=sal method=rspl;
      class female male;
      model mate=typefw typemw typefw*typemw / dist=binomial link=logit s;
      random female male;
      random _residual_;  /* adds the overdispersion parameter (PQL OD) */
    run;

GLMM using the Laplace approximation in lme4::lmer

    library(lme4)
    # Current lme4 versions fit this model with glmer(); lmer() was used at the time of the talk
    lmer(Mate ~ TypeF + TypeM + TypeF*TypeM + (1 | Male) + (1 | Female),
         family = binomial(link = logit), data = salamander)

Hierarchical Generalized Linear Model - HL(0,1)

    library(HGLMMM)
    # Design for the random-effect dispersions: one row per animal (60 per sex)
    RSal <- data.frame(int = rep(1, 60))
    HGLMfit(DistResp = "Binomial", DistRand = c("Normal", "Normal"),
        Link = "Logit", LapFix = FALSE, ODEst = FALSE, ODEstVal = c(0),
        formulaMain = Mate ~ TypeF + TypeM + TypeF*TypeM + (1 | Female) + (1 | Male),
        formulaOD = ~ 1, formulaRand = list(one = ~ 1, two = ~ 1),
        DataMain = salamander, DataRand = list(RSal, RSal),
        Offset = NULL, BinomialDen = rep(1, 360), INFO = TRUE, DEBUG = FALSE)

Hierarchical Generalized Linear Model - HL(1,1) + overdispersion: set LapFix=TRUE and ODEst=TRUE

Salamander data - point estimates

               Intercept  TypeF  TypeM  TypeF*TypeM  Female  Male   Phi
    glm             0.69  -2.01  -0.47         2.48                 1
    PQL             0.79  -2.29  -0.54         2.82    0.72  0.63   1
    PQL OD          0.93  -2.73  -0.65         3.35    1.44  1.31   0.66
    lme             1.01  -2.9   -0.7          3.59    1.17  1.04   1
    HL(0,1)         0.83  -2.43  -0.58         2.99    1.12  0.97   1
    HL(1,1)         1.04  -3.01  -0.73         3.71    1.38  1.21   1
    HL(1,1)OD       1.11  -3.2   -0.78         3.94    1.73  1.52   0.89

- Whiteside female and roughbutt male have the lowest probability of success
- Pairs from the same population have similar probabilities of successful mating
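The two conclusions on this slide can be checked by pushing the point estimates through the inverse logit. The sketch below uses the glm row of the table and base R only. It assumes that TypeF = 1 codes a whiteside female and TypeM = 1 a whiteside male, as the SAS variable names typefw and typemw suggest; that coding is an assumption, not something the slides state.

```r
# glm point estimates from the table (random effects set to zero)
b <- c(intercept = 0.69, typeF = -2.01, typeM = -0.47, inter = 2.48)

# All four crosses; 1 is assumed to mean whiteside, 0 roughbutt
crosses <- expand.grid(typeF = 0:1, typeM = 0:1)

eta <- b[["intercept"]] +
  b[["typeF"]] * crosses$typeF +
  b[["typeM"]] * crosses$typeM +
  b[["inter"]] * crosses$typeF * crosses$typeM

crosses$prob <- plogis(eta)   # inverse logit gives the mating probability
print(crosses)
```

Under this coding the whiteside female with roughbutt male cross has a probability of roughly 0.21, while both same-population crosses come out at roughly 0.67.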

Salamander data - test statistics

               Intercept  TypeF  TypeM  TypeF*TypeM
    glm              3.1   -5.9   -1.5          5.4
    PQL              2.5   -5.3   -1.4          5.7
    PQL OD           2.5   -5.8   -1.5          7.2
    lme              2.7   -5.8   -1.6          6.6
    HL(0,1)          2.3   -5.1   -1.4          5.8
    HL(1,1)          2.6   -5.7   -1.5          6.5
    HL(1,1)OD        2.6   -5.9   -1.6          6.9

Rat data
- 30 rats
- 3 drugs
- 4 timepoints
- 120 observations
- White blood cell count and red blood cell count
- Response: number of cancer cell colonies
- Question: is there a difference between the drugs?

Rat data - models and diagnostics
- Poisson model
- Quasi-Poisson model
- Dispersion component depends on WBC
- Diagnostic plots

Poisson model

    # Subject-level covariates for the random-effect dispersion model
    Rrat <- data.frame(WBC = tapply(rat$WhiteBloodCells, rat$Subject, mean),
                       RBC = tapply(rat$RedBloodCells, rat$Subject, mean))
    modrat1 <- HGLMfit(DistResp = "Poisson", DistRand = c("Normal"), Link = "Log",
        LapFix = FALSE, ODEst = FALSE, ODEstVal = c(0),
        formulaMain = Y ~ WhiteBloodCells + RedBloodCells + as.factor(Drug) +
            (1 | Subject),
        formulaOD = ~ 1, formulaRand = list(one = ~ 1),
        DataMain = rat, DataRand = list(Rrat), INFO = TRUE, DEBUG = FALSE)

[Figure: normal Q-Q plot; sample quantiles against theoretical quantiles]

Quasi-Poisson Model

    # Overdispersion estimated (ODEst = TRUE); the random-effect dispersion depends on WBC
    HGLMfit(DistResp = "Poisson", DistRand = c("Normal"), Link = "Log",
        LapFix = FALSE, ODEst = TRUE, ODEstVal = c(0),
        formulaMain = Y ~ WhiteBloodCells + RedBloodCells + as.factor(Drug) +
            (1 | Subject),
        formulaOD = ~ 1, formulaRand = list(one = ~ WBC + I(WBC^2)),
        DataMain = rat, DataRand = list(Rrat), INFO = TRUE, DEBUG = FALSE)

Diagnostics for Rat Model Quasi-Poisson (y | v)
[Figure, four panels: deviance residuals against scaled fitted values; absolute deviance residuals against scaled fitted values; normal Q-Q plot (sample against theoretical quantiles); histogram of deviance residuals]

Rat data - estimates and p-values

                   Poisson         Quasi-Poisson    Quasi-Poisson    PQL
                Est.    p       Est.    p        Est.    p        Est.    p
    Intercept   3.301  <0.001   2.855  <0.001    2.709  <0.001    2.855  <0.001
    WBC        -0.052  <0.001  -0.019  <0.001   -0.014   0.006   -0.019  <0.001
    RBC         0.013   0.41    0.029  <0.001    0.028  <0.001    0.029  <0.001
    Drug=2      0.197   0.034   0.146   0.345    0.166   0.098    0.146   0.347
    Drug=3      0.186   0.055  -0.045   0.773    0.109   0.247   -0.046   0.773
    Phi         1               0.111            0.104            0.117
    Intercept  -3.667          -2.139            1.091           -2.21
    WBC                                         -0.599
    WBC^2                                        0.018

Cake data
- Dependent variable: breaking angle of cakes
- 270 cakes
- 3 recipes and 6 temperatures
- Cakes baked in batches of 18 (3 recipes * 6 temperatures)
- Random effect for batch and random effect for recipe within batch
- Question: what is the effect of the baking temperature and recipe on the breaking angle?
- The model:

$$ \eta_{ijk} = \text{intercept} + \text{recipe}_j + \text{temp}_k + \text{recipe}_j \cdot \text{temp}_k + v_i + v_{ij} $$

Models considered
- Breaking angle as a normal or gamma random variable
- Which distribution for the random effects?
- One or two random effects?
- Which mean structure: do we need an interaction?

Modeling strategy
- Start with a complex model
- Use AIC (marginal likelihood) to select the distribution of the response
- Use the h-likelihood to select the distribution of the random effects
- Use an LR test (REML) to test whether the variance component of a random effect is zero
- Use an LR test (marginal likelihood) to test for the interaction

Normal Model

    ===== Likelihood Functions Value =====
    H-likelihood       : -893.6902
    Marginal likelihood: -819.5366
    REML likelihood    : -797.6732
    C-likelihood       : -767.5713

Gamma Model

    ===== Likelihood Functions Value =====
    H-likelihood       : -676.3907
    Marginal likelihood: -808.0586
    REML likelihood    : -848.9244
    C-likelihood       : -754.2644

We proceed with the gamma model
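The choice of the gamma model can be reproduced as an AIC comparison on the marginal likelihood. The sketch below rests on two assumptions that the slides do not state: that the reported marginal likelihood values are log-likelihoods, and that both models have the same number of parameters. The count `k` is hypothetical and cancels in the difference.

```r
# Marginal likelihood values reported for the two models
loglik <- c(normal = -819.5366, gamma = -808.0586)

k <- 21   # hypothetical parameter count, assumed equal for both models

aic   <- -2 * loglik + 2 * k                 # AIC = -2 logL + 2k
delta <- aic[["normal"]] - aic[["gamma"]]    # positive values favour the gamma model
print(aic)
print(delta)
```

Because `k` is the same in both models, `delta` is simply twice the difference in marginal log-likelihood, so the ranking does not depend on the value chosen for `k`.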

Selection of random effects

    Effect 1   Effect 2   H-likelihood
    Normal     Normal           -676.4
    Normal     IGamma           -874.8
    IGamma     IGamma           -911.4
    Gamma      Gamma            -676.9
    Beta       Beta             -676.4

Let's keep Gaussian random effects

Do we need both random effects?

    > HGLMLRTest(modCake2,modCake7)
    H-likelihood of model 1 is higher
    Marginal likelihood comparison:
    LR test p-value: NA
    LR test statistics: 12.70101
    LR difference df: 0
    REML likelihood comparison:
    LR test p-value: 0.0005694908
    LR test statistics: 11.87315
    LR difference df: 1

We prefer the model with two random effects

Do we need an interaction in the mean structure?

    > HGLMLRTest(modCake2,modCake8)
    H-likelihood of model 1 is higher
    Marginal likelihood comparison:
    LR test p-value: 0.5034955
    LR test statistics: 9.304224
    LR difference df: 10
    REML likelihood comparison:
    LR test p-value: NA
    LR test statistics: 29.88638
    LR difference df: 0

We prefer the simpler model
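The p-values printed by HGLMLRTest appear to be upper-tail chi-square probabilities of the LR statistic. The base-R sketch below recomputes them from the statistics and degrees of freedom shown on the two cake-data test slides; that HGLMLRTest works exactly this way internally is an assumption.

```r
# REML comparison for the second random effect: statistic 11.87315 on 1 df
p_reml <- pchisq(11.87315, df = 1, lower.tail = FALSE)

# Marginal likelihood comparison for the interaction: statistic 9.304224 on 10 df
p_marg <- pchisq(9.304224, df = 10, lower.tail = FALSE)

print(c(REML = p_reml, marginal = p_marg))
```

The 10 degrees of freedom in the second test match the (3 - 1) * (6 - 1) interaction terms of recipe by temperature.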

Further developments
- Make the package more compatible with R style
- Add estimation of random effects with known correlation
- Implement non-canonical links: probit, cloglog
- Use package Matrix for large datasets
- Efficient computation of the matrix $T (T^T \Sigma_a^{-1} T)^{-1} T^T \Sigma_a^{-1}$
- Second order approximations

Currently known bugs
- ODEst=FALSE with Gaussian response does not work properly
- Full description of random effects in the summary function: intercept/subject/distribution
- Proper handling of missing values
- OTHER BUGS ARE WELCOME

Thank you for your attention
m.molas@erasmusmc.nl