Faculty of Health Sciences. Correlated data. Count variables. Lene Theil Skovgaard & Julie Lyng Forman. December 6, 2016


Modeling count outcomes
Outline:
- The Poisson distribution for counts
- Poisson models, log-linear models
- Overdispersion
- Generalized linear mixed models
  - Population average models (PA)
  - Subject specific models (SS)
- Examples: Leprosy, Seizures (briefly)
2 / 76

Example: Counts of leprosy bacilli
Controlled clinical trial:
- 10 patients treated with placebo P
- 10 patients treated with antibiotic A
- 10 patients treated with antibiotic B
Recording of the number of bacilli at six sites of the body, i.e. a count variable:
- before treatment (baseline, time=0)
- several months after treatment (time=1)
Reference: Snedecor, G.W. and Cochran, W.G. (1967). Statistical Methods (6th edn). Iowa State University Press
3 / 76

Spaghettiplot - the leprosy example
Number of bacilli at baseline and follow-up:
4 / 76

Counts at endpoint Do we see a difference at the end of follow-up? We probably do not have normal distributions here, and we cannot use logarithms because of zero values, so with a small dataset... Poisson distribution 5 / 76

Binary data
Examples of binary outcomes:
- bacillus at a particular site of the body (1:yes / 0:no)
- smoking for a pupil in a school class (1:yes / 0:no)
- seizure on a single day (1:yes / 0:no)
A binary variable X has a Bernoulli distribution, meaning that
  P(X = 1) = p,  P(X = 0) = 1 - p
For such an outcome, the mean value is E(X) = p, and the variance is Var(X) = p(1 - p)
6 / 76

Binomial data
If we sum up n binary observations, Y = sum_{i=1}^n X_i = X_1 + ... + X_n, e.g.
- number of bacilli in total
- number of smokers in each school class
- number of seizures in a specific time interval
we get a Binomial distribution, Y ~ Bin(n, p), with
  P(Y = m) = (n choose m) p^m (1 - p)^(n - m)
and E(Y) = np, Var(Y) = np(1 - p)
7 / 76
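As a quick numerical check of these formulas, the point probabilities can be computed with the Python standard library alone (a sketch; the parameter values n = 10, p = 0.3 are arbitrary illustration values, not from the leprosy data):

```python
import math

def binom_pmf(m, n, p):
    """Point probability P(Y = m) for Y ~ Bin(n, p)."""
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

n, p = 10, 0.3
pmf = [binom_pmf(m, n, p) for m in range(n + 1)]

# The moments computed from the pmf match the closed-form expressions:
mean = sum(m * q for m, q in enumerate(pmf))             # np = 3.0
var = sum((m - mean)**2 * q for m, q in enumerate(pmf))  # np(1-p) = 2.1
```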

Examples of Binomial distributions n=10, 50; np=1, 2, 5 or 20 (mean value) 8 / 76

Approximations to the Binomial distribution
When n is large, the Binomial distribution is computationally intractable, so we use approximations:
- p moderate (not too close to 0 or 1) and np > 5: the Normal distribution
- p close to 0 (Law of rare events): the Poisson distribution, with point probabilities
  P(Y = m) = µ^m / m! · exp(-µ),  m = 0, 1, 2, ...
9 / 76
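The law of rare events can be illustrated numerically (a Python sketch, standard library only; n = 1000 and p = 0.002 are illustrative values chosen so that µ = np = 2):

```python
import math

def binom_pmf(m, n, p):
    # Binomial point probability
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

def poisson_pmf(m, mu):
    # Poisson point probability
    return mu**m / math.factorial(m) * math.exp(-mu)

n, p = 1000, 0.002   # large n, p close to 0
mu = n * p           # mu = 2
# For each m, the Binomial and Poisson point probabilities agree closely
diffs = [abs(binom_pmf(m, n, p) - poisson_pmf(m, mu)) for m in range(10)]
```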

Poisson distribution
Counts with no well-defined upper limit:
- the number of cancer cases in a specific community during a specific year
- the number of bacilli in total
- the number of seizures in a certain interval
When Y has a Poisson distribution, we have
- Mean value: E(Y) = µ = np
- Variance: Var(Y) = np
In a Poisson distribution, the mean and variance are equal. This fact is unfortunately often overlooked...
10 / 76

Poisson distribution Poisson distribution with mean value: µ =1,2,5 and 20 11 / 76

Models for non-normal data
Generalized linear models are just like multiple regression models, but on a scale that corresponds to the data:
- Normal (link=identity), mean values (almost) on the entire axis: traditional linear models
- Binomial (link=logit), mean values lie between 0 and 1: logistic regression (next lecture)
- Poisson (link=log), mean values are positive: log-linear models, Poisson regression
12 / 76

Generalized linear models, for count data
Outcome variable Y_i, following a Poisson distribution, with
- Mean value: E(Y_i) = µ_i
- Link function: log, the natural logarithm. On this scale, we assume linearity in the covariates, i.e.
  log(µ_i) = β_0 + β_1 x_i1 + ... + β_k x_ik (= X_i^T β)
  where x_i1, ..., x_ik denote the covariate values for individual i.
The log-link ensures that the mean value µ_i = E(Y_i) will always be positive
13 / 76

Comparing distributions of counts
Comparison of distributions from p. 5: Do we see a difference in bacilli counts at follow-up in the three groups?
Model: Y_i ~ Poisson(µ_i), log(µ_i) = β_t
where the subscript t denotes treatment, which can be either A, B or P.
This problem corresponds to a one-way ANOVA (in case of Normal distributions).
We are comparing groups on a logarithmic scale, so results will be ratios:
  β_A - β_P = log(µ_A) - log(µ_P), so exp(β_A - β_P) = µ_A / µ_P
14 / 76

Poisson analysis in SAS, endpoint
proc genmod data=leprosy_wide;
  class drug;
  model endpoint = drug / dist=poisson link=log type3;
  estimate 'Effect A minus P' drug 1 0 -1;
  estimate 'Effect B minus A' drug -1 1 0;
  estimate 'Antibiotic effect' drug 0.5 0.5 -1;
run;
Notes regarding the code:
- We use PROC GENMOD
- Options in the model-statement:
  - dist=poisson: The distribution is Poisson
  - link=log: Effects are modelled on a log-scale (natural log)
  - type3 asks for a test of equality of all 3 drugs (as an F-test in ANOVA-models)
- Estimate-statements explained on p. 18
15 / 76

Output, I
The GENMOD Procedure
Model Information
  Data Set            WORK.LEPROSY
  Distribution        Poisson
  Link Function       Log
  Dependent Variable  endpoint
  Number of Observations Used  30
Class Level Information
  Class  Levels  Values
  drug   3       A B P
Criteria For Assessing Goodness Of Fit
  Criterion                DF  Value     Value/DF
  Pearson Chi-Square       27  129.9144  4.8116
  Scaled Pearson X2        27  129.9144  4.8116
  AIC (smaller is better)      250.8420
Algorithm converged.
The values above 1 in the last column indicate a misfit, see later on (overdispersion, p. 20 ff)
16 / 76

Output, II
LR Statistics For Type 3 Analysis
  Source  DF  Chi-Square  Pr > ChiSq
  drug    2   35.06       <.0001
Analysis Of Maximum Likelihood Parameter Estimates
                           Standard  Wald 95% Conf      Wald
  Parameter  DF  Estimate  Error     Limits             Chi-Square  Pr>ChiSq
  Intercept  1    2.5096   0.0902     2.3329   2.6863   774.66      <.0001
  drug A     1   -0.8419   0.1643    -1.1639  -0.5198    26.25      <.0001
  drug B     1   -0.7013   0.1566    -1.0082  -0.3944    20.06      <.0001
  drug P     0    0.0000   0.0000     0.0000   0.0000     .         .
  Scale      0    1.0000   0.0000     1.0000   1.0000
NOTE: The scale parameter was held fixed.
Estimate of β_A - β_P = -0.8419. This is on log-scale and has to be back-transformed: exp(-0.8419) = 0.43, meaning that when treated with A, the level of bacilli at endpoint is only 43% of that in the placebo-treated group. But remember the misfit from the last page.
17 / 76
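The back-transformation of the drug A estimate and its confidence limits can be verified in a couple of lines (a Python sketch; the numbers are copied from the output above):

```python
import math

# drug A row of the parameter estimates, on the log scale
est, lower, upper = -0.8419, -1.1639, -0.5198

ratio = math.exp(est)                    # 0.4309: level under A is ~43% of placebo
ci = (math.exp(lower), math.exp(upper))  # (0.3123, 0.5946), matching p. 19
```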

Estimate-statements
In the output on the previous page, we have 4 estimates:
- Intercept = 2.5096: Level of the (alphabetically) last group, i.e. β_P
- 3 parameters in the drug-effect:
  1. drug A = -0.8419: The difference on log-scale between drug A and the Intercept (drug P), i.e. β_A - β_P
  2. drug B = -0.7013: As above, only for drug B
  3. drug P = 0.0000: The difference on log-scale between drug P and the Intercept (drug P), i.e. β_P - β_P = 0
Each Estimate-statement (p. 15) describes how we want to combine these estimates, e.g.
  estimate 'Effect A minus P' drug 1 0 -1;
says we want the combination 1·(β_A - β_P) + 0·(β_B - β_P) - 1·0 = β_A - β_P
18 / 76

Output, III
Output from Estimate statements (some columns deleted):
Contrast Estimate Results
                          Mean      Mean     Mean             Prob
  Obs  Label              Estimate  LowerCL  UpperCL  ChiSq   ChiSq
  1    Effect A minus P   0.4309    0.3123   0.5946   26.25   <.0001
  2    Effect B minus A   1.1509    0.7966   1.6630    0.56   0.4541
  3    Antibiotic effect  0.4623    0.3582   0.5966   35.13   <.0001
The back-transformed differences are ratios, given in the column Mean Estimate.
The active groups perform better at follow-up (ratio < 1).
19 / 76
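The ratios in the Mean Estimate column are just back-transformed linear combinations of the log-scale estimates from p. 17; a Python sketch of the arithmetic:

```python
import math

# log-scale differences from placebo (p. 17)
beta_A, beta_B = -0.8419, -0.7013

effect_A_vs_P = math.exp(beta_A)                       # 0.4309
effect_B_vs_A = math.exp(beta_B - beta_A)              # 1.1509
antibiotic_effect = math.exp(0.5*beta_A + 0.5*beta_B)  # 0.4623
```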

Counts at baseline and follow-up
The MEANS Procedure
  drug  N Obs  Variable  N    Median  Mean    Variance
  A     10     baseline  10    9.000   9.300  22.678
               endpoint  10    5.000   5.300  21.567
  B     10     baseline  10    8.000  10.000  27.556
               endpoint  10    3.500   6.100  37.878
  P     10     baseline  10   12.000  12.900  15.656
               endpoint  10   12.500  12.300  51.122
Note: The variance is obviously bigger than the average (overdispersion, detected as a misfit on p. 16)
20 / 76

Overdispersion
Overdispersion: The variance is larger than expected for a Poisson distribution. This may be caused by
- omitted covariates (isn't that always the case?)
- unrecognized clusters
- heterogeneity, e.g. a "zero"-group (non-susceptibles)
When overdispersion is disregarded:
- The standard errors are erroneously small
- The P-values are erroneously small
- We get too many type I errors
21 / 76

Handling of overdispersion
Two traditional solutions:
- Assuming that Var(Y) = φ E(Y) = φnp with some φ > 0 (most often > 1), although such a distribution does not actually exist...
- Including an extra random variation to account for the forgotten covariates, e.g.
  log(µ_i) = β_0 + β_1 x_i1 + ... + β_k x_ik + b_i
  with some assumption on the distribution of the b_i's, i.e. with exp(b_i) multiplied onto the mean value
22 / 76

Overdispersion parameter
The over-dispersion parameter φ can be estimated and multiplied onto the variance, yielding
- Larger standard errors
- Larger P-values
φ is estimated from either Pearson Chi-Square Value/DF or Deviance Chi-Square Value/DF, using the options scale=p or scale=d, and the square root of the estimate φ̂ is multiplied onto the standard errors.
23 / 76
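In the leprosy example the Pearson Value/DF was 4.8116 (p. 16); a quick sketch of how the scaled standard errors on p. 27 arise from it:

```python
import math

phi_hat = 4.8116              # Pearson chi-square / DF (p. 16)
scale = math.sqrt(phi_hat)    # 2.1935, the Scale row in the output on p. 27
se_plain = 0.1643             # SE for drug A in the plain Poisson fit (p. 17)
se_scaled = scale * se_plain  # 0.3604, the SE for drug A on p. 27
```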

Additional random variation
Possible models for b_i:
- b_i ~ N(0, ω_b²): leads to a complicated model, which changes the level of the mean, since E(exp(b_i)) = exp(ω_b²/2) > 1 (we shall return to this later)
- b_i ~ log-Gamma: leads to Y_i following a Negative Binomial distribution, in which
  E(Y_i) = µ_i,  Var(Y_i) = µ_i + θµ_i²,
  so the variance may now be larger than the mean
24 / 76
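The Negative Binomial variance formula connects directly to the distributions pictured on the next page: with mean µ = 10, the θ values below (chosen here to match; they are not stated in the source) reproduce the variances 30, 110 and 210.

```python
def negbin_variance(mu, theta):
    """Var(Y) = mu + theta * mu^2 for the Negative Binomial model."""
    return mu + theta * mu**2

mu = 10
# theta = 0 would recover the Poisson case (variance = mean = 10)
variances = [negbin_variance(mu, th) for th in (0.2, 1.0, 2.0)]  # [30, 110, 210]
```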

Negative binomial distributions, with mean 10 Poisson distribution, followed by 3 negative binomial distributions, with variance 30, 110 and 210. All distributions have mean 10 25 / 76

Overdispersion in PROC GENMOD
Overdispersion parameter:
proc genmod data=leprosy;
  class drug;
  model endpoint = drug / dist=poisson link=log type3 scale=pearson;
run;
Negative Binomial model:
proc genmod data=leprosy;
  class drug;
  model endpoint = drug / dist=negbin link=log type3;
run;
26 / 76

Overdispersion in PROC GENMOD, scale=pearson
Code shown p. 26
Analysis Of Maximum Likelihood Parameter Estimates
                           Standard  Wald 95% Confidence  Wald
  Parameter  DF  Estimate  Error     Limits               Chi-Square  Pr>ChiSq
  Intercept  1    2.5096   0.1978     2.1219   2.8973     161.00      <.0001
  drug A     1   -0.8419   0.3604    -1.5483  -0.1355       5.46      0.0195
  drug B     1   -0.7013   0.3435    -1.3746  -0.0280       4.17      0.0412
  drug P     0    0.0000   0.0000     0.0000   0.0000       .         .
  Scale      0    2.1935   0.0000     2.1935   2.1935
NOTE: The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.
LR Statistics For Type 3 Analysis
  Source  Num DF  Den DF  F Value  Pr > F  Chi-Square  Pr > ChiSq
  drug    2       27      3.64     0.0398  7.29        0.0262
27 / 76

Negative binomial analysis in GENMOD
Code shown p. 26
Analysis Of Maximum Likelihood Parameter Estimates
                            Standard  Wald 95%             Wald
  Parameter   DF  Estimate  Error     Confidence Limits    Chi-Square
  Intercept   1    2.5096   0.2729     1.9746   3.0446     84.54  <.0001
  drug A      1   -0.8419   0.3997    -1.6252  -0.0586      4.44  0.0352
  drug B      1   -0.7013   0.3966    -1.4785   0.0759      3.13  0.0770
  drug P      0    0.0000   0.0000     0.0000   0.0000      .     .
  Dispersion  1    0.6637   0.2255     0.3410   1.2916
NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
LR Statistics For Type 3 Analysis
  Source  DF  Chi-Square  Pr > ChiSq
  drug    2   4.94        0.0847
28 / 76

Results for mean(a, B) vs. P Endpoint comparisons Ratio (CI) P-value Poisson 0.4623 (0.3582, 0.5966) < 0.0001 - with overdispersion 0.4623 (0.2641, 0.8090) 0.0069 Negative Binomial 0.4623 (0.2368, 0.9025) 0.0238 Endpoints differ, but: Baseline comparisons Ratio (CI) P-value Poisson 0.7476 (0.5982, 0.9343) 0.0105 - with overdispersion 0.7476 (0.5397, 1.0355) 0.0801 Negative Binomial 0.7476 (0.5492, 1.0176) 0.0644 How do we account for baseline differences? 29 / 76

Spaghettiplot - the leprosy example Now again including both time points (0 and 1): 30 / 76

Average plot - the leprosy example Note: New scaling, different from p. 30 31 / 76

Possible purposes of the investigation
1. Evaluate the efficacy of antibiotics: red lines vs. green line
2. Compare the two drugs, A and B: solid vs. dotted red line
3. Quantify the effects of each of the two antibiotic drugs separately
Randomization: At baseline, all patients have the same expected count (mean value), but by chance, the placebo individuals have larger values than the two other groups.
32 / 76

Model reflections
This is just a before-after study... but:
- We are dealing with counts, so it is natural to consider a Poisson distribution, with log-link (natural log)
- Because it is a randomized study, the mean values at baseline should be identical for the three groups
- We are prepared to see 3 different changes over time - but some of these may be identical (this is actually the main scientific question)
- Baseline and follow-up measurements are correlated within individuals
33 / 76

Correlations within individual? It certainly seems so... 34 / 76

Model reflections, II
- Can't we just take logarithms? No, because we have zeroes
- Some other transformation then? Yes, square roots, or arcsine, but the interpretation would suffer a lot
- Could we just condition on the baseline value? Yes, we could do that... but it becomes more tricky when we have multiple time points
- Could we analyze differences? Or rather, ratios? Hmm...
- We could build a Constrained Model, forcing mean values to be equal at baseline.
35 / 76

ANCOVA, Poisson
The GENMOD Procedure
Criteria For Assessing Goodness Of Fit
  Criterion           DF  Value    Value/DF
  Pearson Chi-Square  26  71.4569  2.7483
  Scaled Pearson X2   26  26.0000  1.0000
Analysis Of Maximum Likelihood Parameter Estimates
                           Standard  Wald 95% Confidence  Chi-
  Parameter  DF  Estimate  Error     Limits               Square  Pr>ChiSq
  Intercept  1    0.8723   0.3696     0.1480   1.5967      5.57   0.0183
  drug A     1   -0.4664   0.2793    -1.0138   0.0810      2.79   0.0949
  drug B     1   -0.4458   0.2615    -0.9584   0.0668      2.91   0.0883
  drug P     0    0.0000   0.0000     0.0000   0.0000      .      .
  before     1    0.1186   0.0229     0.0738   0.1634     26.88   <.0001
Contrast Estimate Results
                          Mean      Mean     Mean            Prob
  Obs  Label              Estimate  LowerCL  UpperCL  ChiSq  ChiSq
  1    Effect A minus P   0.6273    0.3629   1.0843   2.79   0.0949
  2    Effect B minus A   1.0208    0.5530   1.8843   0.00   0.9475
  3    Antibiotic effect  0.6338    0.4111   0.9769   4.27   0.0388
We note ratios closer to 1 when we adjust for baseline (compare to p. 19).
36 / 76

Constrained model for baseline correction
Parametrization of mean values (on the log-scale):
  Treatment  Period     Mean (on log scale)
  P          Baseline   β_1
  P          Follow-up  β_1 + β_2
  A          Baseline   β_1
  A          Follow-up  β_1 + β_2 + β_3
  B          Baseline   β_1
  B          Follow-up  β_1 + β_2 + β_4
β_3 and β_4 denote the additional effects of A and B, respectively, when compared to placebo
37 / 76
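On the original count scale, this parametrization translates into follow-up/baseline ratios; written out:

```latex
\begin{align*}
\text{P:}\quad & \frac{\mu_{\text{follow-up}}}{\mu_{\text{baseline}}}
                 = \exp\!\big((\beta_1+\beta_2)-\beta_1\big) = \exp(\beta_2)\\
\text{A:}\quad & \exp(\beta_2+\beta_3), \qquad
\text{B:}\quad   \exp(\beta_2+\beta_4)
\end{align*}
```

so exp(β_3) and exp(β_4) are the ratios of the active-drug declines relative to the placebo decline.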

Generalized linear MIXED models
Outcome variable Y_ij, e.g. the j'th measurement time for individual i:
- Mean value: µ_ij
- Link function g: g(µ_ij) is assumed linear in the covariate vector X_ij.
Two kinds of models:
- Population average models (PA):
  g(µ_ij) = β_0 + β_1 x_ij1 + ... + β_k x_ijk = X_ij^T β
  and (Y_i1, Y_i2) are associated (correlated), with some (patterned) covariance (p. 42 ff)
- Subject-specific models (SS):
  g(µ_ij) = β_0 + β_1 x_ij1 + ... + β_k x_ijk + b_i
  b_i ~ N(0, ω_b²), random intercepts (levels); may be generalized to other random effects: slopes, ... (p. 56 ff)
38 / 76

The two model types
- Marginal models, or Population average (PA): Describe covariate effects on the population mean, e.g. the expected difference between the effects of two treatments. Corresponds to the repeated-statement.
- Mixed effects models, or Subject specific (SS): Describe covariate effects on specific individuals (or clusters), e.g. expected change over time, or differences between boys and girls in the same school class. Corresponds to the random-statement.
39 / 76

For traditional linear models (Normality) with identity link:
- Subject-specific model (SS) with random intercept/level is equal to Marginal model (PA) with compound symmetry covariance structure (type=cs)
- More generally: The interpretation of the parameters β does not depend on the way that we model the covariance/correlation (although the estimates may change somewhat depending on the assumed structure of the covariance)
40 / 76

For non-normal outcomes
The above is no longer true in general, due to the non-linearity of the link-function. For Poisson analyses this means:
- Including a random subject level (as in SS-models) will change the interpretation of the mean value, but not of the parameters denoting the effects of the covariates (e.g. group or time)
- Parameters allowed to vary between individuals will differ in interpretation as well as size
- SS-models will provide median-like levels (or rather, levels for median individuals), as opposed to average-like levels for PA-models
41 / 76

Marginal models = Population Average (PA)
A Multivariate Poisson distribution does not exist, so we only specify:
- Marginal mean, E(Y_ij | X_ij) = µ_ij, where log(µ_ij) = X_ij^T β, i.e. covariate effects as usual
- Distribution... Poisson (in a way), but...
- Marginal variance, φV(µ_ij) = φµ_ij (overdispersion)
- Some measure of association for Y's belonging to the same individual/unit, V_i = Cov(Y_i), called the working covariance matrix
42 / 76

Marginal models, technicalities
Since we do not actually have a model, we cannot use a maximum likelihood approach. This has implications for the handling of missing values (lecture 4). Instead, we use the so-called GEE: Generalized estimating equations, written in vector notation as
  sum_i D_i^T V_i^{-1} (Y_i - µ_i) = 0
where V_i is the (working) covariance matrix for Y_i, and D_i is the matrix of derivatives of the mean value µ_i with respect to β
43 / 76

Properties of the GEE estimation procedure
PRO's:
- It is robust, in the sense that it gives consistent estimates even if the working covariance matrix is misspecified, provided that you use the Sandwich covariance estimate (which is fortunately the default in PROC GENMOD)
- The Sandwich covariance estimate also takes care of possible overdispersion, as well as possible differences in variability over time
- For large sample sizes, the parameter estimates will be asymptotically Normal (i.e. we can construct confidence intervals as estimate ± 2 standard errors)
44 / 76

Properties of the GEE estimation procedure, II
CON's:
- It can only be used for balanced data
- It performs poorly in small datasets (anti-conservative, i.e. may give type I errors)
- If missing data are not missing completely at random (MCAR), the results will be flawed, and we have to do some (quite complicated) weighting in order to get consistent results (lecture 6)
45 / 76

Choosing a working covariance
PROC GENMOD offers several choices:
- Unstructured (type=un)
- Compound Symmetry (type=cs)
- Autoregressive (type=ar)
- Working independence (type=ind)
All choices will give consistent estimates, but choices closer to the true structure will be more efficient (narrower confidence intervals, i.e. more power).
46 / 76

Marginal model (PA) for leprosy
In order to restrict baseline mean values to be equal (see p. 37), we define the adjusted treatment variable:
  drugadj = drug;
  if time = 0 then drugadj = "P";
and the code (with unstructured working covariance) will then be:
proc genmod data=leprosy;
  class id drugadj;
  model bacilli = time drugadj*time / dist=poisson link=log type3;
  repeated subject=id / type=un corrw;
run;
47 / 76

Comments to code
- time indicates the change over time for the placebo group (because this is the reference group)
- drugadj*time: specifies additional time changes over and above the changes for placebo
- dist=poisson: specifies the link-function as log, and the variance as (proportional to) the mean
- link=log: may overrule the link-function from dist=poisson, if so needed
- repeated: specifies an unstructured (type=un) association between measurements on the same id (corrw requests printing of the working correlation matrix)
48 / 76

Output from marginal (PA) model
The GENMOD Procedure
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                            Standard  95% Confidence
  Parameter       Estimate  Error     Limits             Z       Pr > |Z|
  Intercept        2.3734   0.0801     2.2163   2.5304   29.62   <.0001
  time            -0.0138   0.1573    -0.3222   0.2946   -0.09   0.9300
  time*drugadj A  -0.5406   0.2186    -0.9690  -0.1122   -2.47   0.0134
  time*drugadj B  -0.4791   0.2279    -0.9257  -0.0325   -2.10   0.0355
  time*drugadj P   0.0000   0.0000     0.0000   0.0000    .      .
Score Statistics For Type 3 GEE Analysis
  Source        DF  Chi-Square  Pr > ChiSq
  time          1   13.90       0.0002
  time*drugadj  2    4.56       0.1024
49 / 76

Interpretations
There is a significant effect of antibiotics:
- Score test: 4.56 ~ χ²(2), P = 0.10
- Wald test: 6.99 ~ χ²(2), P = 0.03
The effect of placebo is estimated at exp(β̂_2) = exp(-0.0138) = 0.986, i.e. a decrease of 1.4%
The additional effect of drug A is estimated at exp(β̂_3) = exp(-0.5406) = 0.58, and the total effect at exp(β̂_2 + β̂_3) = exp(-0.5544) = 0.574, i.e. a decrease of 42.6%
50 / 76
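These back-transformations can be checked in a couple of lines (a Python sketch; the estimates are taken from the GEE output on p. 49):

```python
import math

beta_time = -0.0138  # change over time for placebo, log scale
beta_A = -0.5406     # additional change for drug A, log scale

placebo_ratio = math.exp(beta_time)           # 0.986, i.e. a 1.4% decrease
total_A_ratio = math.exp(beta_time + beta_A)  # 0.574, i.e. a 42.6% decrease
```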

Marginal model (PA) for leprosy, II
With additional estimate- and output-statements, the code becomes:
proc genmod data=leprosy;
  class id drugadj;
  model bacilli = time drugadj*time / dist=poisson link=log type3;
  repeated subject=id / type=un corrw;
  estimate 'change for A' time 1 drugadj*time 1 0 0;
  estimate 'change for B' time 1 drugadj*time 0 1 0;
  estimate 'change for P' time 1 drugadj*time 0 0 1;
  estimate 'additional change A vs. P' drugadj*time 1 0 -1;
  estimate 'additional change B vs. P' drugadj*time 0 1 -1;
  estimate 'additional change A vs. B' drugadj*time 1 -1 0;
  estimate 'additional change (A,B) vs. P' drugadj*time 0.5 0.5 -1;
  output out=pa pred=pred_pa xbeta=xbeta_pa;
run;
51 / 76

Comments to additional code
The estimate statements provide:
- Estimates of time effects for each drug separately
- Estimates of the additional time effect for each of the two active drugs, as compared to placebo
- Estimates of the difference in time effect between the two active drugs
- Estimates of the additional average time effect for the two active drugs, as compared to placebo
Output data set, with predicted values pred_pa, for illustration purposes (see p. 54)
52 / 76

Output from additional estimate statements from p. 51
                                      Mean      Mean     Mean             Prob
  Obs  Label                          Estimate  LowerCL  UpperCL  ChiSq   ChiSq
  1    change for A                   0.5744    0.4281   0.7707   13.67   0.0002
  2    change for B                   0.6109    0.4478   0.8333    9.68   0.0019
  3    change for P                   0.9863    0.7246   1.3425    0.01   0.9300
  4    additional change A vs. P      0.5824    0.3795   0.8939    6.12   0.0134
  5    additional change B vs. P      0.6194    0.3963   0.9681    4.42   0.0355
  6    additional change A vs. B      0.9403    0.6148   1.4381    0.08   0.7765
  7    additional change (A,B) vs. P  0.6006    0.4097   0.8805    6.82   0.0090
The two antibiotics are not significantly different: 0.08 ~ χ²(1), P = 0.78, although the estimated effect is a tiny bit larger for drug A (smaller ratio for the decline)
53 / 76

Predicted means from Population Average model (PA) Note the identical baselines Legends: A B... P 54 / 76

Comments to estimated time profiles, in comparison to the simple averages (p. 31):
- Treatment B starts off at a higher level
- Due to regression to the mean, we therefore expect this group to have the steepest decline
- Since the averages are actually close to parallel (so that B is not steeper than A), this leads us to conclude that B is not as effective as A, and therefore we see a difference in slope in the predicted means
- The same type of argument applies to P, which would decrease the most if it were equally effective
55 / 76

Subject Specific models (SS)
Variance component models, see p. 38-39:
- Observations: Y_ij, with mean value µ_ij, where
  log(µ_ij) = X_ij^T β + Z_ij^T b_i
- The b_i's denote the random effects, e.g. random levels (intercepts), random slopes etc.
- It is assumed that b_i ~ N(0, G) and are independent of the covariates X_i
- For any subject, the repeated measurements are conditionally independent, given the random effects, and follow a Poisson distribution
This is a proper multivariate model, in which the correlation between repeated measurements on the same subject is induced by the random effects
56 / 76

Interpretation of SS
Since this is a real model, we can use maximum likelihood (and handle MAR missing values), but:
- The effect of a covariate is interpreted as being for a fixed value of all other covariates, including for a fixed value of the individual, i.e. specific to this subject.
- For models with a log-link, however, the interpretation of covariate effects is still as usual, except for
  - The intercept (which gets more of a median-like interpretation and is therefore smaller than the mean interpretation from the PA-model)
  - Covariates that also enter as random effects, e.g. random slope = random effect of time (not here)
57 / 76

A very simple example of random slopes A population consisting of two individuals (number of bacilli). A random slope on log-scale means different ratios between Follow-up and Baseline:

Individual  Baseline  Follow-up  Ratio
1                 12          8  0.667
2                  8          7  0.875
Average           10        7.5  0.771

but for the population, the ratio is 7.5/10 = 0.75 ≠ 0.771. The average of individual ratios is not equal to the ratio of the averages. 58 / 76
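The arithmetic in the table above can be checked directly; this is plain arithmetic on the slide's own numbers, nothing assumed:

```python
# Numeric check of the slide's point: with a random slope on the log scale,
# the average of the individual ratios differs from the ratio of the
# population averages.
baseline = [12, 8]
followup = [8, 7]

ratios = [f / b for f, b in zip(followup, baseline)]        # 0.667 and 0.875
avg_of_ratios = sum(ratios) / len(ratios)                   # subject-specific view
ratio_of_avgs = (sum(followup) / 2) / (sum(baseline) / 2)   # population view

print(round(avg_of_ratios, 3), ratio_of_avgs)  # 0.771 0.75
```

This is exactly the SS-versus-PA distinction: the SS model describes the individual ratios (average 0.771), the PA model the ratio of the population means (0.75).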

PROs and CONs of SS PRO: It is an actual model, allowing likelihood inference MAR missing values can be handled correctly It may be used for unbalanced data sets CON: The interpretation is conditional upon individual random effects, and therefore not always in focus Higher risk of misspecification, due to assumptions that are difficult to check Computationally problematic when the number of random effects or the overall size of the data becomes large. 59 / 76

Mixed effects model (SS) We now assume random intercepts, b_i ∼ N(0, ω_b²), by specifying a random level for each individual (so here, G = ω_b²):

proc glimmix data=leprosy method=quad(qpoints=50);
  class id;
  model bacilli = time drugadj*time / d=poisson link=log type3;
  random intercept / subject=id type=vc g;
  /* optional statements added below */
  estimate "change for A" time 1 drugadj*time 1 0 0 / exp cl;
  estimate "change for B" time 1 drugadj*time 0 1 0 / exp cl;
  estimate "change for P" time 1 drugadj*time 0 0 1 / exp cl;
  estimate "additional change A vs. P" drugadj*time 1 0 -1;
  estimate "additional change B vs. P" drugadj*time 0 1 -1;
  estimate "additional change A vs. B" drugadj*time 1 -1 0;
  estimate "additional change (A,B) vs. P" drugadj*time 0.5 0.5 -1;
  output out=ss pred=pred pred(noblup)=predav pred(ilink)=predmu
                pred(ilink noblup)=predmuav;
run;

60 / 76

Comments to code for SS-model We use PROC GLIMMIX: method=quad(qpoints=50): performs maximum likelihood estimation by approximating the likelihood function by Gaussian quadrature. The more quadrature points, the better the accuracy. random: here we have only one random intercept, so type=... is unimportant g: prints the estimate of ω_b² (in GLIMMIX, the random-effects covariance matrix is generally denoted G) estimate statements as before (only now, we need the options exp and cl) output out=: saves predicted values in the data set ss (there are several different kinds, see p. 64) 61 / 76
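What method=quad is computing can be made concrete: for each subject, the Poisson likelihood is integrated over the random intercept. GLIMMIX uses adaptive Gauss-Hermite quadrature; the sketch below (all parameter and data values assumed, not from the fitted model) uses a plain grid approximation of the same integral, just to show what is being approximated.

```python
# Marginal likelihood contribution of one subject in the SS Poisson model:
#   integral over b of [ prod_j Poisson(y_j | exp(b0 + b1*t_j + b)) ] * N(b; 0, G) db
# approximated here by a simple rectangle rule (GLIMMIX uses quadrature).
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def marginal_lik(ys, times, beta0, beta1, G, n_grid=2001, width=6.0):
    sd = math.sqrt(G)
    step = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = -width * sd + i * step
        dens = math.exp(-b * b / (2 * G)) / math.sqrt(2 * math.pi * G)
        lik = 1.0
        for y, t in zip(ys, times):
            lik *= poisson_pmf(y, math.exp(beta0 + beta1 * t + b))
        total += dens * lik * step
    return total

# One hypothetical subject with counts 11 (baseline) and 6 (follow-up)
L = marginal_lik(ys=[11, 6], times=[0, 1], beta0=2.24, beta1=-0.6, G=0.28)
```

More quadrature points (qpoints=) correspond to a finer approximation of exactly this integral, which is why accuracy improves with more points.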

Output from SS-type analysis

Estimated G Matrix
Effect     Row    Col1
Intercept    1  0.2814

Covariance Parameter Estimates
Cov Parm   Subject  Estimate  Standard Error
Intercept  id         0.2814         0.09557

Solutions for Fixed Effects
Effect        drugadj  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept                2.2412          0.1148  29    19.53    <.0001
time                   0.003088          0.1235  27     0.03    0.9802
time*drugadj  A         -0.6055          0.2036  27    -2.97    0.0061
time*drugadj  B         -0.5228          0.1963  27    -2.66    0.0129
time*drugadj  P               0               .   .        .         .

Type III Tests of Fixed Effects
Effect        Num DF  Den DF  F Value  Pr > F
time               1      27    17.50  0.0003
time*drugadj       2      27     5.83  0.0079

Note: Somewhat steeper lines than for the PA-model... 62 / 76

Output from glimmix analysis, II Only some columns shown:

Label                           Exponentiated Estimate  ExpLower  ExpUpper  Pr > |t|
change for A                                    0.5475    0.3897    0.7692    0.0012
change for B                                    0.5947    0.4312    0.8202    0.0026
change for P                                    1.0031    0.7786    1.2924    0.9802
additional change A vs. P                       0.5458    0.3594    0.8288    0.0061
additional change B vs. P                       0.5929    0.3963    0.8869    0.0129
additional change A vs. B                       0.9206    0.5812    1.4583    0.7151
additional change (A,B) vs. P                   0.5688    0.4050    0.7990    0.0021

Note again: Some differences to the PA-analysis, but overall the same conclusion. 63 / 76

Output dataset from GLIMMIX analysis The data set ss, created on p. 60, contains 4 different kinds of predicted values: output out=ss pred=pred pred(noblup)=predav pred(ilink)=predmu pred(ilink noblup)=predmuav; Predictions on log-scale (not so interesting): Pred: individual predictions (pred=) PredAv: predictions, averaged over the population (pred(noblup)=) Predictions on the original scale (more interesting): PredMu: individual predictions (pred(ilink)=) PredMuAv: back-transformed average predictions (pred(ilink noblup)=) 64 / 76
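How the four predicted values relate can be sketched with a few assumed numbers (xb, b_i and omega2 below are made up for illustration): the individual prediction adds the subject's BLUP b_i to the fixed-effects part, the noblup version sets b_i = 0, and ilink back-transforms with exp. Note that exp(x'β) is not the population mean on the count scale; for a normal random intercept that mean is exp(x'β + ω_b²/2), which is the "median-like" intercept point from p. 57.

```python
# Relation between the four prediction types for a log-link SS model
# (all numbers assumed for illustration, not from the fitted model).
import math

xb = 2.24          # fixed-effects part x'beta
b_i = 0.4          # one subject's BLUP
omega2 = 0.28      # random-intercept variance omega_b^2

pred = xb + b_i                  # Pred:     individual, log scale
pred_av = xb                     # PredAv:   population average, log scale
pred_mu = math.exp(pred)         # PredMu:   individual, count scale
pred_mu_av = math.exp(pred_av)   # PredMuAv: back-transformed average

# Actual population mean on the count scale, E[exp(x'beta + b)]:
true_pop_mean = math.exp(xb + omega2 / 2)   # exceeds pred_mu_av
```

This is the same average-of-ratios vs. ratio-of-averages phenomenon as on p. 58: back-transforming the average log-prediction is not the same as averaging the back-transformed predictions.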

Individual predicted curves, SS pred(ilink)=predmu 65 / 76

Average individual predictions, SS Averages from p. 65 Legends: A B... P Note the resemblance to the average curves on p. 31 Here the curves are shrunk a bit closer together 66 / 76

Comparison of SS and PA PA left, SS right Legends: A B... P 67 / 76

Additional overdispersion in GLIMMIX Recall the assumption from the SS-model: for any subject, the repeated measurements are conditionally independent, given the random effects, and they follow a Poisson distribution. It is possible to add additional overdispersion to these conditional models, either by adding the line random _residual_; to the code from p. 60, or by using the Negative Binomial distribution instead of the Poisson distribution (dist=negbin). 68 / 76

Overview of results for Leprosy Decrease (A,B) vs. P

Model                          Ratio (CI)         P-value
No correlation                 0.46 (0.36, 0.60)  < 0.0001
No corr., overdispersion       0.46 (0.29, 0.74)  0.0014
No corr., Negative Binomial    0.46 (0.28, 0.75)  0.0020
PA, Poisson                    0.60 (0.41, 0.88)  0.0090
PA, Negative Binomial          0.58 (0.39, 0.86)  0.0073
SS, Poisson                    0.57 (0.41, 0.80)  0.0021
SS, Poisson, overdispersion    0.56 (0.38, 0.82)  0.0040
SS, Negative Binomial          0.57 (0.40, 0.81)  0.0026

69 / 76

Example: Epileptic seizures Controlled clinical trial with 58 epileptic patients: 28 treated with placebo 30 treated with progabide (active) Recording of the number of epileptic seizures during an 8-week interval before treatment and four 2-week intervals after treatment Reference: Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics. 70 / 76

Spaghetti plot - the epilepsy example Number of seizures per week: looks good for the patients, but the week 0 level is not comparable to the others: 8 weeks of data collection 71 / 76

Seizures per week (rates) Now, week 0 is comparable to the other weeks in mean, but not in variation (longer sampling time) 72 / 76

Seizure example: Mean value plot Legends: Progabide Placebo Not linear...but for illustration, we could assume two straight lines on log-scale... 73 / 76

Purpose of investigation 1. Investigate what happens over time: does the number of seizures decrease? 2. Compare the decrease for a patient treated with progabide to the decrease for a similar patient in the placebo group 3. Compare the decrease for a population treated with progabide to the decrease for a population treated with placebo Notation: T_ij denotes the time span corresponding to the number of seizures Y_ij, so T_ij is either 2 or 8 weeks 74 / 76

Model building Model (in principle, not reasonable here) for the number of seizures: Poisson outcome Random regression, i.e. linear effect of week, with individual intercepts and slopes Mean value proportional to the length of the period (8 or 2 weeks): log(8) and log(2) used as offsets This ensures that we model the ratio Y_ij / T_ij on log-scale, i.e.

log( E(Y_ij) / T_ij ) = α + β·time + γ·treat·time
⇔ log(E(Y_ij)) = α + β·time + γ·treat·time + log(T_ij)

75 / 76
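The role of the offset log(T_ij) can be checked numerically: with the same weekly rate, the expected count over 8 weeks is 4 times that over 2 weeks. A small sketch with assumed parameter values (alpha and beta below are made up, not fitted):

```python
# Offset in a log-link Poisson model:
#   E(Y) = exp(alpha + beta*time + log(T)) = T * exp(alpha + beta*time)
# so exp(alpha + beta*time) is the weekly rate and T scales it to a count.
import math

def expected_count(T, time, alpha, beta):
    return math.exp(alpha + beta * time + math.log(T))

alpha, beta = 1.0, -0.05   # assumed illustration values
e8 = expected_count(8, time=0, alpha=alpha, beta=beta)  # baseline, 8 weeks
e2 = expected_count(2, time=0, alpha=alpha, beta=beta)  # same rate, 2 weeks
print(round(e8 / e2, 6))  # 4.0
```

Because the offset enters with a fixed coefficient of 1, it rescales the mean without adding a parameter, which is exactly what "mean value proportional to the length of the period" requires.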

Random regression, SS model in glimmix Important: the model is not reasonable here (see figure on p. 74) and is only shown to hint at possible extensions...

proc glimmix data=seizures method=quad(qpoints=50);
  class id adjtreat visit;
  model seizures = weeks adjtreat*weeks / dist=poisson offset=lweeks link=log solution;
  random intercept weeks / subject=id type=un g;
  estimate "weekly decline treat=0" weeks 1 weeks*adjtreat 1 0;
  estimate "weekly decline treat=1" weeks 1 weeks*adjtreat 0 1;
  estimate "slope, active vs. placebo??" weeks*adjtreat -1 1 / exp cl;
  output out=ss pred=pred pred(noblup)=predav pred(ilink)=predmu
                pred(ilink noblup)=predmuav;
run;

Since time (weeks) here enters as a random effect, the interpretation of time effects has to be conditional on the specific subject. 76 / 76