The combined model: A tool for simulating correlated counts with overdispersion
George KALEMA, Samuel IDDI, Geert MOLENBERGHS
Interuniversity Institute for Biostatistics and statistical Bioinformatics
George KALEMA (KU Leuven), International Hexa-Symposium (UHasselt), November 14-15, 2013
Motivation - Epilepsy data
1. Does treatment reduce the number of seizures, on average, over time?
2. Can we make statements about the correlation?
3. How about dispersion?
[Figure: Average evolution of the treatment groups]
[Figure: Mean-variance relationship]
Marginal models for count data
- The most common tool is GEE (Liang and Zeger, 1986), but there the correlation is a nuisance
- The negative-binomial model is also used to account for overdispersion
- We developed two models that allow inference on both the population-averaged parameters and the association
- To test our models, we needed to generate data in a marginal-model context, so that we can calculate, e.g., bias
Possibilities for generating count data
- Use, e.g., random effects to induce correlation
  - Obtain consistent parameters for calculating, e.g., bias by fitting a classical Poisson regression to a very large dataset
  - Not very reflective of the data-generating mechanism in context
- Other methods in the literature. Limitations:
  - severe computational restrictions
  - difficulty achieving the target correlation
  - generated variables are required to be overdispersed
  - low correlations obtained
  - correlations constrained to be strictly positive, etc.
- Combined model (Molenberghs et al., 2007, 2010) as an alternative
The Combined Model (CM)
- Introduced by Molenberghs et al. (2007, 2010)
- Combines the features of correlation/clustering and overdispersion in a single model
- E.g., for Poisson data: Poisson-normal (GLMM) + Poisson-gamma (negative binomial) = Poisson-gamma-normal
GLMM (a special case of CM) as data generator
The GLMM is given by
\[ Y_{ij} \sim \mathrm{Poi}(\lambda_{ij}), \qquad \ln(\lambda_{ij}) = x_{ij}'\beta + z_{ij}'b_i, \qquad b_i \sim N(0, D), \]
with marginals (Molenberghs et al., 2007, 2010)
\[ \mu_{ij} = \exp\left(x_{ij}'\beta + 0.5\, z_{ij}' D z_{ij}\right) \]
and
\[ \mathrm{var}(Y_i) = M_i + M_i \left(e^{Z_i D Z_i'} - J\right) M_i, \]
where $M_i = \mathrm{diag}(\mu_i)$, $J$ is a matrix of ones, and the exponential is taken elementwise.
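The marginal moments above follow from the moment generating function of the normal distribution. The short derivation below is a sketch that assumes the counts are conditionally independent given $b_i$; it is added for illustration and is not part of the original slides.

```latex
% Sketch: assumes b_i ~ N(0, D) and conditional independence of the Y_ij given b_i.
\begin{align*}
\mu_{ij} &= E\{E(Y_{ij} \mid b_i)\}
          = e^{x_{ij}'\beta}\, E\!\left(e^{z_{ij}'b_i}\right)
          = \exp\!\left(x_{ij}'\beta + \tfrac{1}{2} z_{ij}' D z_{ij}\right), \\
E(Y_{ij} Y_{ik}) &= e^{x_{ij}'\beta + x_{ik}'\beta}\,
                    E\!\left(e^{(z_{ij} + z_{ik})'b_i}\right)
          = \mu_{ij}\mu_{ik}\, e^{z_{ij}' D z_{ik}}, \qquad j \neq k, \\
\operatorname{cov}(Y_{ij}, Y_{ik}) &= \mu_{ij}\mu_{ik}\left(e^{z_{ij}' D z_{ik}} - 1\right),
          \qquad j \neq k, \\
\operatorname{var}(Y_{ij}) &= \mu_{ij} + \mu_{ij}^2\left(e^{z_{ij}' D z_{ij}} - 1\right).
\end{align*}
```

Stacking these entries gives the matrix form $M_i + M_i(e^{Z_i D Z_i'} - J)M_i$ with an elementwise exponential.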
GLMM algorithm for data generation
Given the desired marginal (log) mean ($\tilde{X}_i \alpha$) and variance-covariance structure ($V$):
1. Derive the necessary unknowns ($\beta$ and $D$) in the GLMM by comparing the desired marginals with the marginals from the GLMM
2. Simulate $b_i$
3. Compute $\ln(\lambda_{ij}) = x_{ij}'\beta + z_{ij}'b_i$
4. Simulate $Y_{ij} \sim \mathrm{Poi}(\lambda_{ij})$
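The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS macro: it assumes a random intercept only ($z_{ij} = 1$, scalar $D$ called `d`), the same covariate rows for every subject, and a hand-written Poisson sampler. The names `sample_poisson` and `simulate_glmm_counts` are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_glmm_counts(x_rows, beta, d, n_subjects, seed=0):
    """Simulate counts from a random-intercept Poisson GLMM (assumed special case).

    x_rows: covariate rows x_ij, one per occasion, shared by all subjects.
    beta:   conditional fixed effects (step 1 is assumed already done).
    d:      random-intercept variance D.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b_i = rng.gauss(0.0, math.sqrt(d))              # step 2: b_i ~ N(0, D)
        y_i = []
        for x_ij in x_rows:
            log_lam = sum(x * b for x, b in zip(x_ij, beta)) + b_i   # step 3
            y_i.append(sample_poisson(math.exp(log_lam), rng))       # step 4
        data.append(y_i)
    return data


if __name__ == "__main__":
    alpha, d = [1.0, 0.2], 0.1
    beta = [alpha[0] - 0.5 * d, alpha[1]]   # step 1 for a random intercept
    counts = simulate_glmm_counts([[1.0, 0.0], [1.0, 1.0]], beta, d, 5000, seed=1)
    print(sum(y[0] for y in counts) / len(counts))   # should be near exp(1.0)
```

With a random intercept, step 1 only shifts the intercept by $0.5D$, which is why `beta` differs from `alpha` in its first entry alone.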
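GLMM: necessary unknowns, example
Consider compound symmetry:
- desired marginal mean: $\ln(\mu_i) = \tilde{X}_i \alpha$
- desired $V = M_i + \tau^2 J$ (compound symmetry structure)
\[ \beta = (X_i' X_i)^{-1} X_i' \left( \tilde{X}_i \alpha - 0.5\, \mathrm{diag}(Z_i D Z_i') \right) \]
\[ D = (Z_i' Z_i)^{-1} Z_i' \log\left( M_i^{-1} \tau^2 J M_i^{-1} + J \right) Z_i (Z_i' Z_i)^{-1} \]
Here the logarithm is taken elementwise, and $\mathrm{diag}(Z_i D Z_i')$ is the vector with entries $z_{ij}' D z_{ij}$, matching the term in the marginal mean.
For a general $V$, $\tau^2 J$ in $D$ above becomes $V - M_i$.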
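For a random intercept ($Z_i$ a column of ones) the formulas above simplify: $D$ becomes the average of all entries of the elementwise log matrix, and only the intercept of $\beta$ changes. The sketch below works through that special case. It assumes the first column of the design matrix is the intercept and that the marginal and conditional design matrices coincide; the function name is hypothetical.

```python
import math


def derive_random_intercept_unknowns(alpha, x_rows, tau2):
    """Derive (beta, d) for a random-intercept GLMM from marginal targets.

    Targets: ln(mu_j) = x_j' alpha and V = M + tau2 * J (compound symmetry).
    Assumes column 0 of x_rows is the intercept, so projecting 0.5 * d * 1
    onto the columns of X shifts the intercept only.
    """
    mu = [math.exp(sum(x * a for x, a in zip(row, alpha))) for row in x_rows]
    n = len(mu)
    # Elementwise log(M^{-1} tau2 J M^{-1} + J)
    log_terms = [
        [math.log(tau2 / (mu[j] * mu[k]) + 1.0) for k in range(n)]
        for j in range(n)
    ]
    # With Z = 1_n, (Z'Z)^{-1} Z' L Z (Z'Z)^{-1} is the mean of all entries of L
    d = sum(sum(row) for row in log_terms) / (n * n)
    beta = [alpha[0] - 0.5 * d] + list(alpha[1:])
    return beta, d


if __name__ == "__main__":
    beta, d = derive_random_intercept_unknowns([1.0], [[1.0], [1.0]], tau2=2.0)
    mu = math.exp(beta[0] + 0.5 * d)            # implied marginal mean
    print(mu, mu + mu * mu * (math.exp(d) - 1))  # implied mean and variance
```

When the mean is constant over occasions, the derived pair reproduces the target mean and the target variance $\mu + \tau^2$ exactly; when the mean varies, the averaged $D$ is a least-squares compromise.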
GLMM, CM as data generators
The CM is given by
\[ Y_{ij} \sim \mathrm{Poi}(\lambda_{ij}), \qquad \lambda_{ij} = \theta_{ij} \exp(x_{ij}'\beta + z_{ij}'b_i), \]
\[ b_i \sim N(0, D), \qquad E(\theta_{ij}) = 1, \qquad \mathrm{var}(\theta_i) = \Sigma_i, \]
with marginals
\[ \mu_{ij} = \exp\left(x_{ij}'\beta + 0.5\, z_{ij}' D z_{ij}\right) \]
and
\[ \mathrm{var}(Y_i) = M_i + M_i (P_i - J) M_i, \]
where
\[ P_i = e^{0.5 Z_i D Z_i'} \odot (\Sigma_i + J) \odot e^{0.5 Z_i D Z_i'}, \]
with exponentials and products ($\odot$) taken elementwise.
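The form of $P_i$ can be checked entry by entry. The derivation below is a sketch that assumes $\theta_i$ and $b_i$ are independent and writes $\sigma_{i,jk}$ for the $(j,k)$ element of $\Sigma_i$; both the independence assumption and the symbol $\sigma_{i,jk}$ are introduced here for illustration.

```latex
% Sketch: assumes theta_i independent of b_i, E(theta_ij) = 1, var(theta_i) = Sigma_i.
\begin{align*}
E(Y_{ij} Y_{ik})
  &= E(\theta_{ij}\theta_{ik})\, e^{x_{ij}'\beta + x_{ik}'\beta}\,
     E\!\left(e^{(z_{ij} + z_{ik})'b_i}\right)
   = (\sigma_{i,jk} + 1)\, \mu_{ij}\mu_{ik}\, e^{z_{ij}' D z_{ik}},
     \qquad j \neq k, \\
p_{i,jk}
  &= e^{\frac{1}{2} z_{ij}' D z_{ik}}\, (\sigma_{i,jk} + 1)\,
     e^{\frac{1}{2} z_{ik}' D z_{ij}}, \\
\operatorname{cov}(Y_{ij}, Y_{ik})
  &= \mu_{ij}\mu_{ik}\,(p_{i,jk} - 1), \qquad j \neq k, \\
\operatorname{var}(Y_{ij})
  &= \mu_{ij} + \mu_{ij}^2\,(p_{i,jj} - 1).
\end{align*}
```

Setting $\Sigma_i = 0$ recovers the GLMM covariance, and setting $D = 0$ leaves only the gamma-induced overdispersion.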
CM algorithm for data generation
Given the desired marginal (log) mean ($\tilde{X}_i \alpha$) and variance-covariance structure ($V$):
1. Derive the necessary unknowns ($\beta$, $D$ and $\Sigma_i$) in the CM, similar to the GLMM case
2. Generate $\theta_i \sim \mathrm{MGamma}(\text{mean} = 1, \text{var} = \Sigma_i)$, or $\theta_i \sim \mathrm{MGamma}(\text{shape} = 1/\Sigma_i, \text{scale} = \Sigma_i)$
3. Simulate $b_i$
4. Compute $\lambda_{ij} = \theta_{ij} \exp(x_{ij}'\beta + z_{ij}'b_i)$
5. Simulate $Y_{ij} \sim \mathrm{Poi}(\lambda_{ij})$
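Steps 2 to 5 can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: the gamma effects are drawn independently per occasion (a diagonal $\Sigma_i$ with common variance `sigma2`, rather than a multivariate gamma), the normal effect is a random intercept with variance `d`, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_cm_counts(x_rows, beta, d, sigma2, n_subjects, seed=0):
    """Simulate counts from a Poisson-gamma-normal model (assumed special case).

    Independent gamma effects with mean 1 and variance sigma2, plus a
    normal random intercept with variance d.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        # Step 2: theta_ij ~ Gamma(shape = 1/sigma2, scale = sigma2), mean 1
        theta_i = [rng.gammavariate(1.0 / sigma2, sigma2) for _ in x_rows]
        b_i = rng.gauss(0.0, math.sqrt(d))                    # step 3
        y_i = []
        for theta_ij, x_ij in zip(theta_i, x_rows):
            linear = sum(x * b for x, b in zip(x_ij, beta)) + b_i
            lam = theta_ij * math.exp(linear)                 # step 4
            y_i.append(sample_poisson(lam, rng))              # step 5
        data.append(y_i)
    return data


if __name__ == "__main__":
    d, sigma2 = 0.1, 0.5
    counts = simulate_cm_counts([[1.0], [1.0]], [1.0 - 0.5 * d], d, sigma2, 5000, seed=2)
    first = [y[0] for y in counts]
    mean = sum(first) / len(first)
    var = sum((v - mean) ** 2 for v in first) / (len(first) - 1)
    print(mean, var)   # the variance should clearly exceed the mean
```

Under these assumptions the marginal variance is $\mu + \mu^2\{(\sigma^2 + 1)e^{D} - 1\}$, so the simulated counts are overdispersed relative to a Poisson with the same mean.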
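CM: necessary unknowns, example
Consider the general case:
- desired marginal mean: $\ln(\mu_i) = \tilde{X}_i \alpha$
- desired variance-covariance structure is $V$ (unstructured)
\[ \beta = (X_i' X_i)^{-1} X_i' \left( \tilde{X}_i \alpha - 0.5\, \mathrm{diag}(Z_i D Z_i') \right) \]
\[ D = (Z_i' Z_i)^{-1} Z_i' \log\left( M_i^{-1} (V - M_i) M_i^{-1} + J \right) Z_i (Z_i' Z_i)^{-1} \]
\[ \Sigma_i = e^{-0.5 Z_i D Z_i'} \odot \left( M_i^{-1} (V - M_i) M_i^{-1} + J \right) \odot e^{-0.5 Z_i D Z_i'} - J \]
Logarithms, exponentials and products ($\odot$) are taken elementwise, and $\mathrm{diag}(Z_i D Z_i')$ is the vector with entries $z_{ij}' D z_{ij}$.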
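The expression for $\Sigma_i$ is the inverse of the map from $(D, \Sigma_i)$ to the marginal covariance, which a round trip makes concrete. The sketch below assumes a random intercept with a given variance `d` (so every entry of $Z_i D Z_i'$ equals `d`); the target matrix, means and function names are hypothetical illustration values, and the code does not check that the resulting $\Sigma_i$ is a valid covariance matrix.

```python
import math


def derive_sigma(v, mu, d):
    """Sigma_jk = exp(-d) * ((V_jk - [j == k] * mu_j) / (mu_j * mu_k) + 1) - 1."""
    n = len(mu)
    sigma = []
    for j in range(n):
        row = []
        for k in range(n):
            excess = v[j][k] - (mu[j] if j == k else 0.0)   # entry of V - M
            p_jk = excess / (mu[j] * mu[k]) + 1.0           # M^{-1}(V - M)M^{-1} + J
            row.append(math.exp(-d) * p_jk - 1.0)
        sigma.append(row)
    return sigma


def implied_covariance(sigma, mu, d):
    """V = M + M (P - J) M with P_jk = exp(d) * (Sigma_jk + 1)."""
    n = len(mu)
    v = []
    for j in range(n):
        row = []
        for k in range(n):
            p_jk = math.exp(d) * (sigma[j][k] + 1.0)
            entry = mu[j] * mu[k] * (p_jk - 1.0)
            if j == k:
                entry += mu[j]
            row.append(entry)
        v.append(row)
    return v


if __name__ == "__main__":
    target = [[10.0, 3.0], [3.0, 12.0]]   # hypothetical desired V
    mu = [4.0, 5.0]                       # hypothetical marginal means
    sigma = derive_sigma(target, mu, d=0.05)
    print(implied_covariance(sigma, mu, d=0.05))   # reproduces target up to rounding
```

In practice one would also verify that the derived $\Sigma_i$ is positive semi-definite before using it to generate gamma effects.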
CM - Possible combinations
- Gamma random effects: Yes (correlated or independent) or No
- Normal random effects: Yes (correlated or independent) or No
Some results...
%CorrPoisson(CovData=temp, id=id, OrderVar=time, Xcov=trt time trt*time, Alpha=1.521 0.237 0.254 0.345, Class=trt, outdata=out, random=, desiredvarcov=36 12 29, GammaRandEff=2, NormalRandEff=2);

Derived unknowns
Parameter    α        β        diff
Intercept    1.521    1.521    0.0002203
trt          0.237    0.237    -1.02E-14
time         0.254    0.254    -6.88E-15
trt*time     0.345    0.345    1.082E-14

D = [ 0.0004406 ]

[Figure: Y1 and Y2, by trt (0, 1)]
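The intercept row can be checked by hand. Assuming the empty `random=` argument means a random intercept only (so $z_{ij} = 1$ and $D$ is the $1 \times 1$ matrix shown), the marginal mean formula implies that the intercept shifts by exactly $0.5D$ while the other coefficients are unchanged:

```latex
% Sketch: assumes a random intercept only, so that alpha_0 = beta_0 + 0.5 D.
\begin{align*}
\alpha_0 - \beta_0 &= 0.5\, D = 0.5 \times 0.0004406 = 0.0002203 .
\end{align*}
```

This agrees with the reported diff for the intercept, and the remaining diffs are zero up to floating-point error.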
Some results...
Derived unknowns
Parameter    α         β         diff
Intercept    1.521     1.520     0.0014135
trt          0.437     0.437     -3.5E-14
time         -0.254    -0.255    0.0006473
trt*time     0.145     0.145     6.939E-16

D = [ 0.0040014  0.0000601 ]
    [ 0.0000601  0.0002349 ]

[Figure: Y1, Y2, Y3 and Y4, by trt (0, 1)]
References
SAS macro available at http://ibiostat.be/software/overdispersion
E-mail: george.kalema@med.kuleuven.be
Molenberghs, G., Verbeke, G., and Demétrio, C. (2007). An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime Data Analysis 13, 513-531.
Molenberghs, G., Verbeke, G., Demétrio, C., and Vieira, A. (2010). A family of generalized linear models for repeated measures with normal and conjugate random effects. Statistical Science 25, 325-347.