Mixed Models for Longitudinal Ordinal and Nominal Outcomes

Don Hedeker
Department of Public Health Sciences, Biological Sciences Division
University of Chicago
hedeker@uchicago.edu

Hedeker, D. (2008). Multilevel models for ordinal and nominal variables. In J. de Leeuw & E. Meijer (Eds.), Handbook of Multilevel Analysis. Springer, New York.

Hedeker, D. & Gibbons, R.D. (2006). Longitudinal Data Analysis, chapters 10 & 11. Wiley.

Hedeker, D. (2014). Methods for multilevel ordinal data in prevention research. Prevention Science.

This work was supported by National Institute of Mental Health Contract N44MH32056.

Why analyze as ordinal?

Efficiency: Armstrong & Sloan (1989, Amer Jrn of Epid) and Strömberg (1996, Amer Jrn of Epid) report efficiency losses of between 49% and 87% when dichotomizing an ordinal outcome with five categories.

Bias: a continuous model can yield correlated residuals and regressors when used for ordinal outcomes, and it does not take into account the ceiling and floor effects of the ordinal outcome. This results in biased estimates of the regression coefficients and is most critical when the ordinal variable is highly skewed (see Bauer & Sterba, 2011, Psych Methods).

Logic: a continuous model can yield predicted values outside the range of the ordinal variable.

Ordinal Logistic Regression Model (aka Proportional Odds or Cumulative Logit Model) - McCullagh (1980)

$$\log\left[\frac{P(Y \le c)}{1 - P(Y \le c)}\right] = \gamma_c - x'\beta, \qquad c = 1, \ldots, C-1$$

for the $C$ categories of the ordinal outcome

- $x$ = vector of explanatory variables (plus the intercept)
- $\gamma_c$ = threshold parameters; they reflect the cumulative logits when $x = 0$ (for identification: $\gamma_1 = 0$ or $\beta_0 = 0$)
- a positive association between explanatory variable $x$ and ordinal outcome variable $Y$ is reflected by $\beta > 0$
- $x$ is assumed to have the same effect on each cumulative logit (proportional odds assumption)
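As a rough illustration of how the cumulative logit model turns thresholds and a linear predictor into category probabilities, here is a minimal Python sketch. The function names, the threshold values, and the coefficient are hypothetical and chosen only for illustration; they are not estimates from these slides.

```python
import math

def cumulative_probs(thresholds, xb):
    """P(Y <= c) for c = 1..C-1 under logit[P(Y <= c)] = gamma_c - x'beta."""
    return [1.0 / (1.0 + math.exp(-(g - xb))) for g in thresholds]

def category_probs(thresholds, xb):
    """P(Y = c) for c = 1..C, obtained by differencing cumulative probabilities."""
    cum = [0.0] + cumulative_probs(thresholds, xb) + [1.0]
    return [cum[c] - cum[c - 1] for c in range(1, len(cum))]

# Hypothetical values, for illustration only
gammas = [-1.0, 0.5, 2.0]   # C - 1 = 3 increasing thresholds, so C = 4 categories
beta = 0.8                  # one covariate with a positive effect

for x in (0.0, 1.0):
    probs = category_probs(gammas, beta * x)
    print("x =", x, [round(p, 3) for p in probs])
```

Because the linear predictor enters with a minus sign, raising x with a positive beta shifts probability toward the higher categories.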

Ordinal Response and Threshold Concept

Continuous $y_i$: an unobservable latent variable, related to the ordinal response $Y_i$ via the threshold concept

- threshold values $\gamma_1, \gamma_2, \ldots, \gamma_{C-1}$ (with $\gamma_0 = -\infty$ and $\gamma_C = \infty$)
- $C$ = number of ordered categories

Response occurs in category $c$, $Y_i = c$, if $\gamma_{c-1} < y_i < \gamma_c$
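The threshold rule can be sketched in a few lines of Python. The function name and the threshold values are hypothetical. A latent value that falls exactly on a threshold is assigned to the lower category here, which is an arbitrary choice since ties have probability zero for a continuous latent variable.

```python
from bisect import bisect_left

def to_category(y_latent, thresholds):
    """Map a latent value to a category 1..C given C-1 increasing thresholds."""
    # bisect_left counts how many thresholds lie strictly below y_latent
    return bisect_left(thresholds, y_latent) + 1

# Hypothetical thresholds gamma_1 < gamma_2 < gamma_3 (so C = 4)
gammas = [-1.0, 0.5, 2.0]
for y in (-3.0, 0.0, 1.0, 5.0):
    print(y, "->", to_category(y, gammas))
```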

The Threshold Concept in Practice

"How was your day?" (what is your level of satisfaction today?)

Satisfaction may be continuous, but we sometimes emit an ordinal response.

Model for Latent Continuous Responses

Consider the model with $p$ covariates for the latent response strength $y_i$ ($i = 1, 2, \ldots, N$):

$$y_i = x_i'\beta + \varepsilon_i$$

- probit: $\varepsilon_i \sim$ standard normal (mean = 0, variance = 1)
- logistic: $\varepsilon_i \sim$ standard logistic (mean = 0, variance = $\pi^2/3$)
- $\beta$ estimates from logistic regression are larger (in abs. value) than from probit regression by a factor of approximately $\sqrt{\pi^2/3} \approx 1.8$

The underlying latent variable is a useful way of thinking about the problem, but it is not an essential assumption of the model.

Mixed-effects ordinal logistic regression model (Hedeker & Gibbons, 1994, 1996)

- $i = 1, \ldots, N$ level-2 units (clusters or subjects)
- $j = 1, \ldots, n_i$ level-1 units (subjects or repeated observations)
- $c = 1, 2, \ldots, C$ response categories
- $Y_{ij}$ = ordinal response of level-2 unit $i$ and level-1 unit $j$

"How was your day?" (asked repeatedly each day for a week)

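Mixed-effects ordinal logistic regression model

$$\lambda_{ijc} = \log\left[\frac{P_{ijc}}{1 - P_{ijc}}\right] = \gamma_c - (x_{ij}'\beta + z_{ij}'\upsilon_i)$$

$$P_{ijc} = \Pr(Y_{ij} \le c \mid \upsilon;\ \gamma_c, \beta, \Sigma_\upsilon) = \frac{1}{1 + \exp(-\lambda_{ijc})}$$

$$p_{ijc} = \Pr(Y_{ij} = c \mid \upsilon;\ \gamma_c, \beta, \Sigma_\upsilon) = P_{ijc} - P_{ij,c-1}$$

- $C - 1$ strictly increasing model thresholds $\gamma_c$
- $x_{ij}$ = $p \times 1$ covariate vector
- $z_{ij}$ = $r \times 1$ design vector for random effects
- $\beta$ = $p \times 1$ fixed regression parameters
- $\upsilon_i$ = $r \times 1$ random effects for level-2 unit $i$, with $\upsilon_i \sim N(0, \Sigma_\upsilon)$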
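To make the distinction between conditional (subject-specific) and marginal probabilities concrete, the following Python sketch evaluates the cumulative probability given a random intercept and then averages it over the random-effect distribution. It assumes a random-intercept-only model and uses a plain midpoint rule for the integral, which is a simple stand-in and not the adaptive quadrature used by the software discussed later. All function names and numeric values are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional_cum_prob(gamma_c, xb, upsilon):
    """P(Y_ij <= c | upsilon_i) = logistic(gamma_c - (x'beta + upsilon_i))."""
    return logistic(gamma_c - (xb + upsilon))

def marginal_cum_prob(gamma_c, xb, sd_upsilon, n_points=2001, width=8.0):
    """Average the conditional probability over upsilon ~ N(0, sd_upsilon^2).

    Uses a midpoint rule on [-width*sd, width*sd]; random intercept only.
    """
    if sd_upsilon == 0.0:
        return logistic(gamma_c - xb)
    lo = -width * sd_upsilon
    step = 2.0 * width * sd_upsilon / n_points
    norm = 1.0 / (sd_upsilon * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(n_points):
        u = lo + (k + 0.5) * step
        density = norm * math.exp(-0.5 * (u / sd_upsilon) ** 2)
        total += conditional_cum_prob(gamma_c, xb, u) * density * step
    return total

# Hypothetical threshold, linear predictor, and random-intercept SD
print(round(conditional_cum_prob(1.0, 0.0, 0.0), 4))   # subject with upsilon = 0
print(round(marginal_cum_prob(1.0, 0.0, 1.5), 4))      # averaged over subjects
```

The marginal probability is pulled toward 0.5 relative to the probability for a subject with a random effect of zero, which is why subject-specific and population-averaged coefficients differ in scale.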

Model for Latent Continuous Responses

Model with $p$ covariates for the latent response strength $y_{ij}$:

$$y_{ij} = x_{ij}'\beta + \upsilon_{0i} + \varepsilon_{ij}$$

where $\upsilon_{0i} \sim N(0, \sigma^2_\upsilon)$, and assuming

- $\varepsilon_{ij} \sim$ standard normal (mean 0 and $\sigma^2 = 1$) leads to mixed-effects ordinal probit regression
- $\varepsilon_{ij} \sim$ standard logistic (mean 0 and $\sigma^2 = \pi^2/3$) leads to mixed-effects ordinal logistic regression

Underlying latent variable

- not an essential assumption of the model
- useful for obtaining the intra-class correlation ($r$) and the design effect ($d$)

$$r = \frac{\sigma^2_\upsilon}{\sigma^2_\upsilon + \sigma^2} \qquad\qquad d = \frac{\sigma^2_\upsilon + \sigma^2}{\sigma^2} = \frac{1}{1 - r}$$

The design effect is the ratio of the actual variance to the variance that would be obtained by simple random sampling (holding sample size constant).

Scaling of regression coefficients

Fixed-effects model: $\beta$ estimates from logistic regression are larger (in abs. value) than from probit regression by approximately

$$\sqrt{\frac{\pi^2/3}{1}} \approx 1.8$$

because

- $V(y) = \sigma^2 = \pi^2/3$ for logistic
- $V(y) = \sigma^2 = 1$ for probit

Mixed-effects model: $\beta$ estimates from the mixed-effects (random intercepts) model are larger (in abs. value) than from the fixed-effects model by approximately

$$\sqrt{d} = \sqrt{\frac{\sigma^2_\upsilon + \sigma^2}{\sigma^2}}$$

because

- $V(y) = \sigma^2_\upsilon + \sigma^2$ in the mixed-effects (random intercepts) model
- $V(y) = \sigma^2$ in the fixed-effects model
- the difference depends on the size of the random-effects variance $\sigma^2_\upsilon$
- the relationship is more complex for models with multiple random effects
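The intra-class correlation, the design effect, and the resulting scale factors are simple arithmetic on the variance components. The Python sketch below works through them; the random-intercept variance of 2.0 is a hypothetical value, and the function names are chosen only for illustration.

```python
import math

LOGISTIC_VAR = math.pi ** 2 / 3   # level-1 variance, standard logistic
PROBIT_VAR = 1.0                  # level-1 variance, standard normal

def icc(var_upsilon, var_level1):
    """Intra-class correlation r on the latent-variable scale."""
    return var_upsilon / (var_upsilon + var_level1)

def design_effect(var_upsilon, var_level1):
    """Design effect d = total variance / level-1 variance = 1 / (1 - r)."""
    return (var_upsilon + var_level1) / var_level1

# Logistic vs probit coefficient scale in the fixed-effects case
print(round(math.sqrt(LOGISTIC_VAR / PROBIT_VAR), 2))   # 1.81

# Hypothetical random-intercept variance in a logistic model
var_upsilon = 2.0
r = icc(var_upsilon, LOGISTIC_VAR)
d = design_effect(var_upsilon, LOGISTIC_VAR)
print(round(r, 3), round(d, 3), round(math.sqrt(d), 3))
```

The last printed value, the square root of d, is the approximate factor by which random-intercept estimates exceed their fixed-effects counterparts.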

Treatment-Related Change Across Time

Data from the NIMH Schizophrenia collaborative study on treatment-related changes in overall severity. IMPS item 79, Severity of Illness, was scored as:

1 = normal or borderline mentally ill
2 = mildly or moderately ill
3 = markedly ill
4 = severely or among the most extremely ill

The experimental design and corresponding sample sizes:

                        Sample size at Week
Group             0     1     2     3     4     5     6   completers
PLC  (n=108)    107   105     5    87     2     2    70      65%
DRUG (n=329)    327   321     9   287     9     7   265      81%

Drug = Chlorpromazine, Fluphenazine, or Thioridazine

Main question of interest: Was there differential improvement for the drug groups relative to the control group?

- Under SSI, Inc > SuperMix (English) or SuperMix (English) Student
- Under File click on Open Spreadsheet
- Open C:\SuperMixEn Examples\Workshop\Binary\SCHIZX1.ss3 (or C:\SuperMixEn Student Examples\Workshop\Binary\SCHIZX1.ss3)
- Select Imps79O column, then Edit > Set Missing Value
- Select File > Data-based Graphs > Bivariate

[Figure: Observed Logits across Time by Condition]

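Within-Subjects / Between-Subjects components

Within-subjects model - level 1 ($j = 1, \ldots, n_i$ obs)

$$\lambda_{ijc} = \gamma_c - \left[b_{0i} + b_{1i}\sqrt{\mathit{Week}_j}\right]$$

Between-subjects model - level 2 ($i = 1, \ldots, N$ subjects)

$$b_{0i} = \beta_0 + \beta_2\,\mathit{Grp}_i + \upsilon_{0i}$$

$$b_{1i} = \beta_1 + \beta_3\,\mathit{Grp}_i + \upsilon_{1i}$$

$$\upsilon_i \sim \mathrm{NID}(0, \Sigma_\upsilon)$$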

- Under File click on Open Existing Model Setup
- Open C:\SuperMixEn Examples\Workshop\Ordinal\schizo2.mum (or C:\SuperMixEn Student Examples\Workshop\Ordinal\schizo2.mum)
- Note that Dependent Variable Type is ordered
- Note the lack of TxDrug as an explanatory variable
- Make sure Optimization Method is set to adaptive quadrature
- Note: Cumulative Logit link function
- Category response indicators (IMPS79O1-IMPS79O4); results of fixed-effects model (to be ignored, or for comparison purposes)

[Figure: Model Fit of Observed Proportions]

SAS IML code: SCHZOFIT.SAS - computing marginal probabilities - ordinal model
(adapted from syntax at http://www.uic.edu/classes/bstt/bstt513/, Week 12)

    TITLE1 'NIMH Schizophrenia Data - Estimated Marginal Probabilities';
    PROC IML;

    /* Results from random intercept and trend model, */
    /* using Population Average Estimates             */

    /* Rows: sqrt(week) for weeks 0, 1, 3, 6.               */
    /* Columns: sqrt(week) and group-by-sqrt(week) product. */
    x0 = {0.00000 0,
          1.00000 0,
          1.73205 0,
          2.44949 0};
    x1 = {0.00000 0.00000,
          1.00000 1.00000,
          1.73205 1.73205,
          2.44949 2.44949};
    beta   = {-.8041, -.9018};
    thresh = {-4.5153, -1.726, .4765};

    /* Cumulative logits: threshold minus linear predictor */
    za0 = (thresh[1] - x0*beta);
    zb0 = (thresh[2] - x0*beta);
    zc0 = (thresh[3] - x0*beta);
    za1 = (thresh[1] - x1*beta);
    zb1 = (thresh[2] - x1*beta);
    zc1 = (thresh[3] - x1*beta);

    /* Cumulative probabilities via the inverse logit */
    grp0a = 1 / (1 + EXP(-za0));
    grp0b = 1 / (1 + EXP(-zb0));
    grp0c = 1 / (1 + EXP(-zc0));
    grp1a = 1 / (1 + EXP(-za1));
    grp1b = 1 / (1 + EXP(-zb1));
    grp1c = 1 / (1 + EXP(-zc1));

    /* Category probabilities are differences of cumulative probabilities */
    print 'Random intercept and trend model';
    print 'using Population Average Estimates';
    print 'marginal prob for group 0 - catg 1' grp0a [FORMAT=8.4];
    print 'marginal prob for group 0 - catg 2' (grp0b-grp0a) [FORMAT=8.4];
    print 'marginal prob for group 0 - catg 3' (grp0c-grp0b) [FORMAT=8.4];
    print 'marginal prob for group 0 - catg 4' (1-grp0c) [FORMAT=8.4];
    print 'marginal prob for group 1 - catg 1' grp1a [FORMAT=8.4];
    print 'marginal prob for group 1 - catg 2' (grp1b-grp1a) [FORMAT=8.4];
    print 'marginal prob for group 1 - catg 3' (grp1c-grp1b) [FORMAT=8.4];
    print 'marginal prob for group 1 - catg 4' (1-grp1c) [FORMAT=8.4];
    QUIT;
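For readers without SAS, the same calculation can be sketched in Python. This is an informal port of the SAS/IML program above, using the population-average estimates and thresholds it lists. The reading of the design rows as weeks 0, 1, 3, and 6 is inferred from the square-root values in the matrices, and the function name is hypothetical.

```python
import math

WEEKS = (0, 1, 3, 6)                  # inferred from the sqrt values 0, 1, 1.73205, 2.44949
BETA = (-0.8041, -0.9018)             # sqrt(week) and group-by-sqrt(week) effects
THRESH = (-4.5153, -1.726, 0.4765)    # three thresholds for four categories

def marginal_probs(week, group):
    """Estimated probabilities of categories 1..4 at a given week for group 0 or 1."""
    sw = math.sqrt(week)
    xb = BETA[0] * sw + BETA[1] * group * sw
    # Cumulative probabilities: inverse logit of (threshold - linear predictor)
    cum = [0.0] + [1.0 / (1.0 + math.exp(-(g - xb))) for g in THRESH] + [1.0]
    return [cum[c] - cum[c - 1] for c in range(1, len(cum))]

for group in (0, 1):
    for week in WEEKS:
        probs = marginal_probs(week, group)
        print("group", group, "week", week, [round(p, 4) for p in probs])
```

At week 0 the two groups share the same probabilities because both covariates are zero; the groups separate as the square root of week increases.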

Proportional and Non-proportional Odds

Proportional Odds model

$$\log\left[\frac{P(Y_{ij} \le c)}{1 - P(Y_{ij} \le c)}\right] = \gamma_c - \left[x_{ij}'\beta + z_{ij}'\upsilon_i\right]$$

with $\upsilon_i \sim N(0, \Sigma_\upsilon)$

- the relationship between the explanatory variables and the cumulative logits does not depend on $c$
- effects of $x$ variables DO NOT vary across the $C - 1$ cumulative logits

Non-Proportional/Partial Proportional Odds model

$$\log\left[\frac{P(Y_{ij} \le c)}{1 - P(Y_{ij} \le c)}\right] = \gamma_{0c} - \left[u_{ij}'\gamma_c + x_{ij}'\beta + z_{ij}'\upsilon_i\right]$$

- $u_{ij}$ = $h \times 1$ vector for the set of $h$ covariates for which proportional odds is not assumed
- effects of $u$ variables DO vary across the $C - 1$ cumulative logits
- more flexible model for ordinal response relations
- can be used to empirically test the proportional odds assumption

Proportional Odds Assumption: covariate effects are the same across all cumulative logits

                                Response
group                Absent     Mild     Severe     total
control                  27       46         27       100
  cumulative odds         27/73 = .37   73/27 = 2.7
  logit                       -1             1
treatment                38       44         18       100
  cumulative odds         38/62 = .61   82/18 = 4.6
  logit                      -.5            1.5

group difference = .5 for both cumulative logits

Non-Proportional Odds: covariate effects vary across the cumulative logits

                                Response
group                Absent     Mild     Severe     total
control                  27       46         27       100
  cumulative odds         27/73 = .37   73/27 = 2.7
  logit                       -1             1
treatment                28       60         12       100
  cumulative odds         28/72 = .39   88/12 = 7.3
  logit                     -.95             2

UNEQUAL group difference across cumulative logits
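The cumulative odds and logits in the two tables above can be reproduced directly from the cell counts. The short Python sketch below does this for the control group and both treatment groups; the function name is hypothetical.

```python
import math

def cumulative_logits(counts):
    """Logits of P(Y <= c) for c = 1..C-1, from category counts in order."""
    total = sum(counts)
    logits, running = [], 0
    for n in counts[:-1]:
        running += n
        logits.append(math.log(running / (total - running)))
    return logits

control = [27, 46, 27]          # Absent, Mild, Severe
treatment_po = [38, 44, 18]     # proportional odds example
treatment_npo = [28, 60, 12]    # non-proportional odds example

base = cumulative_logits(control)
for label, counts in (("proportional", treatment_po), ("non-proportional", treatment_npo)):
    diffs = [t - b for t, b in zip(cumulative_logits(counts), base)]
    print(label, [round(d, 2) for d in diffs])
```

In the first case the treatment-minus-control difference is close to .5 at both cut points; in the second it is near zero at the first cut point and near 1 at the second.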

- Open C:\SuperMixEn Examples\Workshop\Ordinal\schizo2np.mum (or C:\SuperMixEn Student Examples\Workshop\Ordinal\schizo2np.mum)
- Note that Dependent Variable Type is ordered
- Two explanatory variables: SqrtWeek and Tx*SWeek
- Explanatory Variable Interactions - both are selected

Proportional Odds Assumption Accepted: $\chi^2_4 = 3325.51 - 3324.16 = 1.35$

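Linear Transforms

Fixed part of model:

$$\lambda_c = \hat\gamma_{0c} - \left[\hat\beta_1\,\mathit{SqrtWeek} + \hat\beta_2\,\mathit{Tx{*}SWeek} + \hat\gamma_{1c}\,\mathit{SqrtWeek} + \hat\gamma_{2c}\,\mathit{Tx{*}SWeek}\right]$$

                              cumulative logit
variable      1 vs 2,3,4        1,2 vs 3,4                          1,2,3 vs 4
SqrtWeek      $\hat\beta_1$     $\hat\beta_1 + \hat\gamma_{12}$     $\hat\beta_1 + \hat\gamma_{13}$
Tx*SWeek      $\hat\beta_2$     $\hat\beta_2 + \hat\gamma_{22}$     $\hat\beta_2 + \hat\gamma_{23}$

$H_0: \beta_1 + \gamma_{12} = 0$; the SqrtWeek effect is 0 on the 2nd cumulative logit

$$z = \frac{\hat\beta_1 + \hat\gamma_{12}}{SE(\hat\beta_1 + \hat\gamma_{12})}$$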
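The standard error of a sum of two estimates needs their covariance as well as their variances, which is why software offers a linear-transform facility. A minimal Python sketch of the calculation follows. The function name is hypothetical, and every numeric input is made up for illustration, since the slides do not report the covariance between the two estimates.

```python
import math

def wald_z_for_sum(b1, g12, var_b1, var_g12, cov_b1_g12):
    """z test of H0: beta_1 + gamma_12 = 0, with a two-sided normal p-value."""
    # Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b)
    se = math.sqrt(var_b1 + var_g12 + 2.0 * cov_b1_g12)
    z = (b1 + g12) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, se, p

# Hypothetical estimates, variances, and covariance (not taken from the slides)
z, se, p = wald_z_for_sum(b1=-0.9, g12=-0.1, var_b1=0.04, var_g12=0.09, cov_b1_g12=-0.03)
print(round(z, 3), round(se, 3), round(p, 4))
```

Ignoring the covariance term would misstate the standard error whenever the two estimates are correlated, which they generally are when they multiply the same covariate.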

NIMH Schiz Study: Severity of Illness (N = 437)
Ordinal LR Estimates (se) - random intercept and trend model

                      Proportional     Non-Proportional Odds Model
                      Odds Model       1 vs 2,3,4    1,2 vs 3,4    1,2,3 vs 4
Time (sqrt week)        0.900            0.951         1.033         0.834
                       (0.190)          (0.331)       (0.224)       (0.206)
Drug by Time            1.674            1.683         1.603         1.683
                       (0.208)          (0.288)       (0.229)       (0.237)
-2 log L              3325.51          3324.16

Proportional Odds accepted ($\chi^2_4 = 3325.51 - 3324.16 = 1.35$)
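The likelihood-ratio test above compares the deviance difference to a chi-square distribution with 4 degrees of freedom. For even degrees of freedom the upper-tail probability has a closed form, so the p-value can be checked with a few lines of Python. The function name is hypothetical, and this sketch deliberately handles even degrees of freedom only.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability for even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

lr_stat = 3325.51 - 3324.16          # deviance difference from the table
p_value = chi2_upper_tail_even_df(lr_stat, 4)
print(round(lr_stat, 2), round(p_value, 3))
```

The p-value comes out at roughly 0.85, in line with the conclusion that the proportional odds assumption is not rejected.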

San Diego Homeless Research Project (Hough)

- 361 mentally ill subjects who were homeless or at very high risk of becoming homeless
- 2 conditions: HUD Section 8 rental certificates (yes/no)
- baseline and 6, 12, and 24 month follow-ups
- Categorical outcome: housing status
    streets / shelters (Y = 0)
    community / institutions (Y = 1)
    independent (Y = 2)

Question: Do Section 8 certificates influence housing status across time?

- Under SSI, Inc > SuperMix (English) or SuperMix (English) Student
- Under File click on Open Spreadsheet
- Open C:\SuperMixEn Examples\Workshop\Nominal\SDHOUSE.ss3 (or C:\SuperMixEn Student Examples\Workshop\Nominal\SDHOUSE.ss3)
- Select Housing column, then Edit > Set Missing Value
- Select File > Data-based Graphs > Bivariate
- Under File click on Open Existing Model Setup
- Open C:\SuperMixEn Examples\Workshop\Ordinal\SDO1.mum (or C:\SuperMixEn Student Examples\Workshop\Ordinal\SDO1.mum)
- Note that Dependent Variable Type is ordered
- All explanatory variables are indicator (dummy) variables

Housing status across time: 1289 observations within 361 subjects
Ordinal Mixed Regression Model estimates and standard errors (se)

                      Proportional Odds     Non-Proportional Odds
                                            Non-street (1)      Independent (2)     difference
term                  estimate    se        estimate    se      estimate    se      estimate    se
threshold 1              .220    .198          .322    .207
threshold 2             2.966    .230                             2.700    .298
t1 (6 month)            1.736    .235         2.298    .303       1.079    .343      -1.219    .408
t2 (12 month)           2.316    .247         3.346    .387       1.645    .340      -1.701    .467
t3 (24 month)           2.500    .253         2.822    .348       2.145    .337       -.676    .422
section 8 (yes=1)        .497    .277          .592    .294        .323    .394       -.269    .384
section 8 by t1         1.409    .341          .566    .467       2.024    .471       1.457    .581
section 8 by t2         1.173    .354         -.958    .506       2.017    .476       2.975    .600
section 8 by t3          .638    .349         -.366    .480       1.073    .464       1.440    .573
subject var             2.134    .354         2.128    .353
  (ICC ≈ .4)
-2 log L              2274.39               2222.25   ($\chi^2_7 = 52.14$)

bold indicates p < .05; italic indicates .05 < p < .10
(1) = independent + community vs street
(2) = independent vs community + street

- For Non-Proportional Odds model, under File click on Open Existing Model Setup
- Open C:\SuperMixEn Examples\Workshop\Ordinal\SDO2.mum (or C:\SuperMixEn Student Examples\Workshop\Ordinal\SDO2.mum)
- Note that Dependent Variable Type is ordered
- Note Explanatory Variable Interactions is set to 7

Mixed Multinomial Logistic Regression Model

$Y_{ij}$ = nominal response of level-2 unit $i$ and level-1 unit $j$

"Which member of The Polkaholics is your favorite?" (asked before, during, and after a show)

Mixed-effects Multinomial Logistic Regression Model

$$\log\left[\frac{p_{ijc}}{p_{ij1}}\right] = u_{ij}'\gamma_c + z_{ij}'\upsilon_{ic}, \qquad c = 2, 3, \ldots, C$$

- $C - 1$ contrasts to the reference cell ($c = 1$)
- regression effects $\gamma_c$ vary across contrasts
- random effects $\upsilon_{ic}$ vary across contrasts; they can be specified as independent or correlated

For example, with $C = 3$:

contrast     ordinal         nominal
c1           2 & 3 vs 1      2 vs 1
c2           3 vs 1 & 2      3 vs 1

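Model in terms of the category probabilities

$$p_{ijc} = \Pr(Y_{ij} = c \mid \upsilon_{ic}) = \frac{\exp(z_{ijc})}{1 + \sum_{h=2}^{C} \exp(z_{ijh})} \qquad \text{for } c = 2, 3, \ldots, C$$

$$p_{ij1} = \Pr(Y_{ij} = 1 \mid \upsilon_{ic}) = \frac{1}{1 + \sum_{h=2}^{C} \exp(z_{ijh})}$$

where the multinomial logit is $z_{ijc} = u_{ij}'\gamma_c + z_{ij}'\upsilon_{ic}$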
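The mapping from multinomial logits to category probabilities can be sketched in Python as follows. The function name and the logit values are hypothetical; in the model each logit would be the sum of the fixed part and the random part for that contrast.

```python
import math

def multinomial_probs(logits):
    """Category probabilities given logits z_c for c = 2..C.

    Category 1 is the reference cell, with an implicit logit of 0.
    Returns probabilities for categories 1..C in order.
    """
    exps = [math.exp(z) for z in logits]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]

# Hypothetical logits for categories 2 and 3 versus category 1
probs = multinomial_probs([0.4, -1.2])
print([round(p, 3) for p in probs])

# The log ratio of each category to the reference recovers its logit
print(round(math.log(probs[1] / probs[0]), 3), round(math.log(probs[2] / probs[0]), 3))
```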

- Under File click on Open Existing Model Setup
- Open C:\SuperMixEn Examples\Workshop\Nominal\sdhouse.mum (or C:\SuperMixEn Student Examples\Workshop\Nominal\sdhouse.mum)
- Note that Dependent Variable Type is nominal
- Can select first or last category as the reference cell
- Try independent random effects first
- Now allow the random effects to be correlated

LR test comparing models: $\chi^2_1 = 2218.72 - 2180.94 = 37.78$
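With one degree of freedom, the chi-square upper-tail probability equals a two-sided standard normal tail, so the p-value for this comparison can be checked with the complementary error function. The Python sketch below assumes the usual chi-square reference distribution for the added covariance parameter; the function name is hypothetical.

```python
import math

def chi2_upper_tail_df1(x):
    """Upper-tail chi-square probability with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

lr_stat = 2218.72 - 2180.94   # deviance difference between the two models
p_value = chi2_upper_tail_df1(lr_stat)
print(round(lr_stat, 2), p_value)
```

The resulting p-value is far below any conventional significance level, which supports allowing the random effects to be correlated.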

Summary

- Models for longitudinal ordinal and nominal data are as developed as models for continuous and dichotomous data
    Proportional odds models
    Non-proportional and partial proportional odds models
    Nominal models (with reference-cell contrasts)
    Grouped-time survival analysis models
- SuperMix can do it all, including 3-level models, also for counts
- full likelihood solution using adaptive quadrature