The combined model: A tool for simulating correlated counts with overdispersion. George KALEMA Samuel IDDI Geert MOLENBERGHS


The combined model: A tool for simulating correlated counts with overdispersion
George KALEMA, Samuel IDDI, Geert MOLENBERGHS
Interuniversity Institute for Biostatistics and statistical Bioinformatics
George KALEMA (KU Leuven), International Hexa-Symposium (UHasselt), November 14-15, 2013

Motivation - Epilepsy data

1. Does treatment reduce the number of seizures, on average, over time?
2. Can we make statements about the correlation?
3. How about dispersion?

[Figures: average evolution of the treatment groups; mean-variance relationship]


Marginal models for count data

- The most common tool is GEE (Liang and Zeger, 1986), but there the correlation is treated as a nuisance.
- The negative-binomial model is also used, to account for overdispersion.
- We developed two models from which inference can be drawn on both the population-averaged parameters and the association.
- To test our models, we needed to generate data in a marginal-model context, so that we can calculate, e.g., bias.

Possibilities for generating count data

- Use, e.g., random effects to induce correlation.
  - Obtain consistent parameters for the calculation of, e.g., bias by fitting a classical Poisson regression to a very large dataset.
  - Not very reflective of the data-generating mechanism in context.
- Other methods in the literature. Limitations:
  - severe computational restrictions
  - difficulty achieving the target correlation
  - generated variables are required to be overdispersed
  - low correlations obtained
  - correlations constrained to be strictly positive, etc.
- The combined model (Molenberghs et al. 2007, 2010) as an alternative.

The Combined Model (CM)

- Introduced by Molenberghs et al. (2007, 2010).
- Combines the features of correlation/clustering and overdispersion in a single model.
- E.g., for Poisson data: Poisson-normal (GLMM) + Poisson-gamma (negative binomial) = Poisson-gamma-normal.

GLMM (a special case of CM) as data generator

The GLMM is given by

\[ Y_{ij} \sim \mathrm{Poi}(\lambda_{ij}), \qquad \ln(\lambda_{ij}) = x_{ij}'\beta + z_{ij}'b_i, \qquad b_i \sim N(0, D), \]

with marginals (Molenberghs et al. 2007, 2010) being

\[ \mu_{ij} = \exp\left(x_{ij}'\beta + 0.5\, z_{ij}' D z_{ij}\right) \quad \text{and} \quad \mathrm{var}(Y_i) = M_i + M_i \left(e^{Z_i D Z_i'} - J\right) M_i, \]

where $M_i = \mathrm{diag}(\mu_i)$, $J$ is a matrix of ones, and the exponential is taken elementwise.
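As a rough illustration of the marginal formulas for the GLMM, the sketch below evaluates the marginal mean vector and covariance matrix for one subject from given β and D. The function name `glmm_marginals` and the list-of-lists layout are assumptions made for this example; they are not part of the SAS macro.

```python
import math

def glmm_marginals(X, Z, beta, D):
    """Marginal mean and covariance implied by a Poisson-normal GLMM.

    X is n x p, Z is n x q, beta has length p, D is q x q (plain lists).
    Hypothetical helper for illustration only.
    """
    n, q = len(X), len(D)
    # zDz[j][k] = z_j' D z_k, the (j, k) element of Z D Z'
    zDz = [[sum(Z[j][a] * D[a][b] * Z[k][b]
                for a in range(q) for b in range(q))
            for k in range(n)] for j in range(n)]
    # mu_j = exp(x_j' beta + 0.5 z_j' D z_j)
    mu = [math.exp(sum(x * b for x, b in zip(X[j], beta)) + 0.5 * zDz[j][j])
          for j in range(n)]
    # var(Y) = M + M (exp(Z D Z') - J) M, exponential taken elementwise
    V = [[mu[j] * mu[k] * (math.exp(zDz[j][k]) - 1.0)
          + (mu[j] if j == k else 0.0)
          for k in range(n)] for j in range(n)]
    return mu, V
```

For a random-intercept model, Z is a single column of ones and D a 1 x 1 matrix, so every element of Z D Z' equals the random-intercept variance.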

GLMM algorithm for data generation

Given the desired marginal (log) mean $\tilde{X}_i \alpha$ and variance-covariance structure $V$:

1. Derive the necessary unknowns ($\beta$ and $D$) in the GLMM by comparing the desired marginals with the marginals from the GLMM.
2. Simulate $b_i$.
3. Compute $\ln(\lambda_{ij}) = x_{ij}'\beta + z_{ij}'b_i$.
4. Simulate $Y_{ij} \sim \mathrm{Poi}(\lambda_{ij})$.
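Steps 2 to 4 of the GLMM algorithm can be sketched as follows for the random-intercept case, assuming β and the random-intercept variance `d` have already been derived in step 1. The names `rpois` and `simulate_glmm` are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, chosen here only to keep the example dependency-free.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_glmm(x_rows, beta, d, n_subjects, seed=1):
    """Random-intercept Poisson-normal counts; one list of counts per subject."""
    rng = random.Random(seed)
    sd = math.sqrt(d)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, sd)                       # step 2: b_i ~ N(0, d)
        counts = []
        for x in x_rows:
            eta = sum(xi * bi for xi, bi in zip(x, beta)) + b
            lam = math.exp(eta)                      # step 3: ln(lambda_ij)
            counts.append(rpois(lam, rng))           # step 4: Y_ij ~ Poi(lambda_ij)
        data.append(counts)
    return data
```

With a large number of subjects, the empirical mean at each occasion should be close to exp(x'β + 0.5 d), which is the marginal mean of the random-intercept GLMM.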

GLMM: necessary unknowns, example

Consider compound symmetry:

- desired marginal mean: $\ln(\mu_i) = \tilde{X}_i \alpha$
- desired $V = M_i + \tau^2 J$ (compound symmetry structure)

\[ \beta = (X_i' X_i)^{-1} X_i' \left(\tilde{X}_i \alpha - 0.5\, Z_i D Z_i'\right) \]

\[ D = (Z_i' Z_i)^{-1} Z_i' \log\left(M_i^{-1} \tau^2 J M_i^{-1} + J\right) Z_i (Z_i' Z_i)^{-1} \]

Here the logarithm is taken elementwise, and $0.5\, Z_i D Z_i'$ in $\beta$ stands for the vector with elements $0.5\, z_{ij}' D z_{ij}$, as in the marginal mean.

For a general $V$, $\tau^2 J$ in $D$ above becomes $V - M_i$.
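For the compound-symmetry example, the derivation simplifies considerably under two assumptions made only for this sketch: a random intercept (Z_i is a column of ones) and the same design matrix for the desired and fitted mean, with the intercept as its first column. Then D reduces to the average of the elementwise log matrix, and only the intercept of β is shifted. The function name `derive_unknowns_cs` is hypothetical.

```python
import math

def derive_unknowns_cs(alpha, X, tau2):
    """Derive (beta, d) for a random-intercept GLMM with target V = M + tau2 * J.

    Assumes Z_i is a column of ones and X[:, 0] is the intercept column.
    """
    n = len(X)
    # Desired marginal means: ln(mu_j) = x_j' alpha
    mu = [math.exp(sum(x * a for x, a in zip(row, alpha))) for row in X]
    # With Z = 1_n, (Z'Z)^{-1} Z' A Z (Z'Z)^{-1} is the average of the entries
    # of A, where A = log(M^{-1} tau2 J M^{-1} + J) elementwise.
    d = sum(math.log(tau2 / (mu[j] * mu[k]) + 1.0)
            for j in range(n) for k in range(n)) / n ** 2
    # 0.5 * d is constant over occasions, so least squares shifts only the intercept.
    beta = [alpha[0] - 0.5 * d] + list(alpha[1:])
    return beta, d
```

When the desired mean is constant over occasions, this d reproduces the target covariance exactly; otherwise the averaging is a least-squares compromise.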

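GLMM, CM as data generators

The CM is given by

\[ Y_{ij} \sim \mathrm{Poi}(\lambda_{ij}), \qquad \lambda_{ij} = \theta_{ij} \exp\left(x_{ij}'\beta + z_{ij}'b_i\right), \qquad b_i \sim N(0, D), \qquad E(\theta_{ij}) = 1, \qquad \mathrm{var}(\theta_i) = \Sigma_i, \]

with marginals

\[ \mu_{ij} = \exp\left(x_{ij}'\beta + 0.5\, z_{ij}' D z_{ij}\right) \quad \text{and} \quad \mathrm{var}(Y_i) = M_i + M_i \left(P_i - J\right) M_i, \]

where

\[ P_i = e^{0.5 Z_i D Z_i'} \circ \left(\Sigma_i + J\right) \circ e^{0.5 Z_i D Z_i'}, \]

with the exponentials and the products $\circ$ taken elementwise.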

CM algorithm for data generation

Given the desired marginal (log) mean $\tilde{X}_i \alpha$ and variance-covariance structure $V$:

1. Derive the necessary unknowns ($\beta$, $D$ and $\Sigma_i$) in the CM, similar to the GLMM case.
2. Generate $\theta_i \sim \mathrm{MGamma}(\text{mean} = 1, \text{var} = \Sigma_i)$, or $\theta_i \sim \mathrm{MGamma}(\text{shape} = \Sigma_i^{-1}, \text{scale} = \Sigma_i)$.
3. Simulate $b_i$.
4. Compute $\lambda_{ij} = \theta_{ij} \exp\left(x_{ij}'\beta + z_{ij}'b_i\right)$.
5. Simulate $Y_{ij} \sim \mathrm{Poi}(\lambda_{ij})$.
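Steps 2 to 5 of the CM algorithm might look like the sketch below. It makes two simplifying assumptions that are not in the slides: the gamma effects are independent with a common variance `sigma2` (a diagonal Σ_i, so no multivariate gamma is needed), and the normal part is a random intercept with variance `d`. The names `rpois` and `simulate_cm` are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_cm(x_rows, beta, d, sigma2, n_subjects, seed=1):
    """Poisson-gamma-normal counts with independent gamma effects."""
    rng = random.Random(seed)
    sd = math.sqrt(d)
    shape, scale = 1.0 / sigma2, sigma2   # gamma with mean 1 and variance sigma2
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, sd)                        # step 3: b_i ~ N(0, d)
        counts = []
        for x in x_rows:
            theta = rng.gammavariate(shape, scale)    # step 2: theta_ij
            eta = sum(xi * bi for xi, bi in zip(x, beta)) + b
            lam = theta * math.exp(eta)               # step 4: lambda_ij
            counts.append(rpois(lam, rng))            # step 5: Y_ij ~ Poi(lambda_ij)
        data.append(counts)
    return data
```

Because E(θ_ij) = 1, the marginal mean is the same as in the GLMM, while the gamma effect adds overdispersion on top of the correlation induced by b_i.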

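CM: necessary unknowns, example

Consider the general case:

- desired marginal mean: $\ln(\mu_i) = \tilde{X}_i \alpha$
- desired variance-covariance structure: $V$ (unstructured)

\[ \beta = (X_i' X_i)^{-1} X_i' \left(\tilde{X}_i \alpha - 0.5\, Z_i D Z_i'\right) \]

\[ D = (Z_i' Z_i)^{-1} Z_i' \log\left(M_i^{-1} (V - M_i) M_i^{-1} + J\right) Z_i (Z_i' Z_i)^{-1} \]

\[ \Sigma_i = e^{-0.5 Z_i D Z_i'} \circ \left(M_i^{-1} (V - M_i) M_i^{-1} + J\right) \circ e^{-0.5 Z_i D Z_i'} - J \]

The logarithm, the exponentials and the products $\circ$ are taken elementwise, and $0.5\, Z_i D Z_i'$ in $\beta$ stands for the vector with elements $0.5\, z_{ij}' D z_{ij}$.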
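The expression for Σ_i is the elementwise inverse of the CM marginal covariance, which can be checked numerically with a round trip. The sketch below takes the means `mu` and the matrix Z D Z' (here `zDz`) as given; both function names are hypothetical and the numbers in the usage lines are made up for illustration.

```python
import math

def cm_covariance(mu, zDz, Sigma):
    """var(Y) = M + M (P - J) M with P = exp(ZDZ') * (Sigma + J), elementwise."""
    n = len(mu)
    return [[mu[j] * mu[k] * (math.exp(zDz[j][k]) * (Sigma[j][k] + 1.0) - 1.0)
             + (mu[j] if j == k else 0.0)
             for k in range(n)] for j in range(n)]

def derive_sigma(mu, zDz, V):
    """Sigma = exp(-ZDZ') * (M^{-1} (V - M) M^{-1} + J) - J, elementwise."""
    n = len(mu)
    Sigma = []
    for j in range(n):
        row = []
        for k in range(n):
            v_minus_m = V[j][k] - (mu[j] if j == k else 0.0)   # (V - M)[j][k]
            inner = v_minus_m / (mu[j] * mu[k]) + 1.0
            row.append(math.exp(-zDz[j][k]) * inner - 1.0)
        Sigma.append(row)
    return Sigma

# Round trip: start from a Sigma, build V, and recover the same Sigma.
mu = [2.0, 3.0]
zDz = [[0.1, 0.1], [0.1, 0.1]]          # random intercept with variance 0.1
Sigma = [[0.3, 0.1], [0.1, 0.2]]
V = cm_covariance(mu, zDz, Sigma)
Sigma_back = derive_sigma(mu, zDz, V)
```

Setting Σ_i to zero in `cm_covariance` gives back the GLMM covariance, in line with the GLMM being a special case of the CM.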

CM - Possible combinations

[Table: gamma random effects (Yes / No) crossed with normal random effects (Yes / No); the cell entries read "Correlated" and "Independent".]

Some results...

    %CorrPoisson(CovData=temp, id=id, OrderVar=time,
                 Xcov=trt time trt*time,
                 Alpha=1.521 0.237 0.254 0.345,
                 Class=trt, outdata=out, random=,
                 desiredvarcov=36 12 29,
                 GammaRandEff=2, NormalRandEff=2);

Derived unknowns

    Parameter   α       β       diff
    Intercept   1.521   1.521   0.0002203
    trt         0.237   0.237   -1.02E-14
    time        0.254   0.254   -6.88E-15
    trt*time    0.345   0.345   1.082E-14

    D = [ 0.0004406 ]

[Figure: plot of Y1 and Y2, by trt (0, 1)]

Some results...

Derived unknowns

    Parameter   α        β        diff
    Intercept   1.521    1.520    0.0014135
    trt         0.437    0.437    -3.5E-14
    time        -0.254   -0.255   0.0006473
    trt*time    0.145    0.145    6.939E-16

    D = [ 0.0040014  0.0000601 ]
        [ 0.0000601  0.0002349 ]

[Figure: plot of Y1, Y2, Y3 and Y4, by trt (0, 1)]

References

SAS macro available at http://ibiostat.be/software/overdispersion
E-mail: george.kalema@med.kuleuven.be

Molenberghs, G., Verbeke, G., and Demétrio, C. (2007). An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime Data Analysis 13, 513-531.

Molenberghs, G., Verbeke, G., Demétrio, C., and Vieira, A. (2010). A family of generalized linear models for repeated measures with normal and conjugate random effects. Statistical Science 25, 325-347.