Models for Longitudinal Analysis of Binary Response Data for Identifying the Effects of Different Treatments on Insomnia

Similar documents
Repeated ordinal measurements: a generalised estimating equation approach

Generalized Linear Models for Non-Normal Data

STAT 705 Generalized linear mixed models

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling

An Overview of Methods in the Analysis of Dependent Ordered Categorical Data: Assumptions and Implications

PQL Estimation Biases in Generalized Linear Mixed Models

Generalized Models: Part 1

Longitudinal analysis of ordinal data

The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

GEE for Longitudinal Data - Chapter 8

Longitudinal Modeling with Logistic Regression

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation

Introduction to Generalized Models

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

Figure 36: Respiratory infection versus time for the first 49 children.

Growth models for categorical response variables: standard, latent-class, and hybrid approaches

Trends in Human Development Index of European Union

Semiparametric Generalized Linear Models

Introduction to mtm: An R Package for Marginalized Transition Models

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

,..., θ(2),..., θ(n)

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

Bayesian Multivariate Logistic Regression

Regression models for multivariate ordered responses via the Plackett distribution

A Sampling of IMPACT Research:

8 Nominal and Ordinal Logistic Regression

Investigating Models with Two or Three Categories

Generalized Linear Latent and Mixed Models with Composite Links and Exploded

Longitudinal Data Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

Chapter 1. Modeling Basics

Efficiency of generalized estimating equations for binary responses

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Transitional modeling of experimental longitudinal data with missing values

Sample Size and Power Considerations for Longitudinal Studies

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Lecture 10: Introduction to Logistic Regression

University of Southampton Research Repository eprints Soton

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Mixed Models for Longitudinal Ordinal and Nominal Outcomes

E(Y ij b i ) = f(x ijβ i ), (13.1) β i = A i β + B i b i. (13.2)

A measure for the reliability of a rating scale based on longitudinal clinical trial data Link Peer-reviewed author version

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Study Design: Sample Size Calculation & Power Analysis

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

STAT 7030: Categorical Data Analysis

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Assessing GEE Models with Longitudinal Ordinal Data by Global Odds Ratio

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

A User-Friendly Introduction to Link-Probit- Normal Models

Comparison of methods for repeated measures binary data with missing values. Farhood Mohammadi. A thesis submitted in partial fulfillment of the

Links Between Binary and Multi-Category Logit Item Response Models and Quasi-Symmetric Loglinear Models

A weighted simulation-based estimator for incomplete longitudinal data models

Latent-Variable Models for Longitudinal Data with Bivariate Ordinal Outcomes

STAT5044: Regression and Anova

Type I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version

Pseudo-score confidence intervals for parameters in discrete statistical models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Random Effects Models for Longitudinal Data

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Lecture 3.1 Basic Logistic LDA

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models

Generalized Multilevel Models for Non-Normal Outcomes

The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference

Introducing Generalized Linear Models: Logistic Regression

Recent Developments in Multilevel Modeling

A Correlated Binary Model for Ignorable Missing Data: Application to Rheumatoid Arthritis Clinical Data

Time-Invariant Predictors in Longitudinal Models

Analysing categorical data using logit models

Bayes methods for categorical data. April 25, 2017

SAS Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Longitudinal Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago

STAT 526 Advanced Statistical Methodology

Statistical methods for marginal inference from multivariate ordinal data Nooraee, Nazanin

T E C H N I C A L R E P O R T A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS. LITIERE S., ALONSO A., and G.

Generalized Linear Models

Correspondence Analysis of Longitudinal Data

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

PACKAGE LMest FOR LATENT MARKOV ANALYSIS

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

A class of latent marginal models for capture-recapture data with continuous covariates

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Journal of Statistical Software

Using Estimating Equations for Spatially Correlated A

Transcription:

Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3067-3082 Models for Longitudinal Analysis of Binary Response Data for Identifying the Effects of Different Treatments on Insomnia Z. Rezaei Ghahroodi Statistical Research and Training Center z rezaei@srtc.ac.ir M. Ganjali Department of Statistics Shahid Beheshti University Tehran, 1983963113, Iran m-ganjali@sbu.ac.ir I. Kazemi The University of Isfahan ikazemi@sci.ui.ac.ir Abstract In this paper some models are applied to analyze insomnia data. Insomnia is a sleep disorder in which the patient does not get enough or satisfactory sleep and investigating the use of hypnotic drug for its cure is so important. For studying the effect of drugs on insomnia, it is useful to observe the response of interest (cured or not cured) for each subject repeatedly at several times. So, a longitudinal study of repeated binary responses along with a treatment variable (with two levels hypnotic or placebo) are used for some individuals on two times. A very important feature of the data is that the correlation between the two longitudinal responses depends on the level of hypnotic drug. This is, firstly, shown by an exploratory data analysis. Then, as an option, Dale s bivariate model for analyzing longitudinal or repeated measurements of binary responses, considering the effect of treatment on the responses and the correlation structure, is used to find an overall population effect on the responses. As other options, transition and random effect models for analyzing these data are used to investigate, respectively, the reasons for

3068 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi the change of the responses and to find the subject-specific variations. How the interpretation of the results is different with the use of the Dale s model and how one may find out the effect of the drug on the correlation structure by a transition or random effect model are also discussed. Keywords: Generalized estimating equations, Insomnia, Longitudinal binary data, Marginal model, Random-effects models, Transition model 1 Introduction In many studies, for many individuals the multiple responses are taken in sequence on the same subject. All of the observations on the same subject will have the same between-subject errors and can be correlated within a subject. Two such data structures which can be problematic to analyse are repeated measures data and longitudinal data. In clinical trials and health-related applications, repeated binary response data commonly occur. For example, a physician might evaluate patients who suffer from loss of sleep or insomnia at baseline and at weekly intervals regarding whether a new hypnotic drug treatment has been successful or not. In such studies, the possibility of correlations between responses given by the same individual need to be taken into account in the data analysis. Various models can be used to handle such correlations. One approach is the use of marginal modelling, which allow for inferences about parameters averaged over the whole population (Liang et al., 1992; Molenberghs and Leasaffre, 1994). Another approach is making use of random effects modelling, which deliberately provide inferences about variability between respondents (Verbeke and Lesaffre, 1996; Diggle et al., 2002). Another appropriate approach to investigate the reasons for the change of the responses is the use of Markov (transition) models (Reuter et al., 2004; Chung et al., 2005). In this paper we shall shortly review the specification of these models and then apply them to an application where a hypnotic drug is used to cure insomnia. Maximum likelihood and generalized estimating equations approaches to the analysis of correlated longitudinal categorical data will be examined and available software for the estimation of parameters for all models will be discussed. The scientific focus in marginal modelling is devoted to two regression models: the first is useful for analysing marginal means and the second takes into account the correlation between responses of the same individual. This model considers the effect of some explanatory variables on the responses and marginal pairwise local odds ratio (Goodman, 1981) or conditional odds ratio (Bishop et al., 1975) for the correlation structure which depends on individuals. In order to analyze insomnia data set we also apply a transition model.

Models for longitudinal analysis 3069 We will discuss on how the correlation structure may be dependent on subject characteristics in transition model. Then, we introduce a random-coefficient model which takes into account the heterogeneity. In these models, maximum likelihood approach will be used to estimate the parameters. We specifically discuss that the addition of random effects to the linear predictor may substantially increases the usefulness of the random effect model, but one should be caution about interpreting the results. Addition of random effects may arise computational complexity since we must average over the random effects to obtain the likelihood function of the model. In binary random-effects models, the resulting integral does not have a closed form expression and thus numerical optimization methods needs to be employed. We will explain how existing software packages can help to fit these models for insomnia data, where measurements for each subject occur both at baseline and at follow-up. The paper is organized as follows. In Section 2, insomnia data are introduced and the initial exploratory data analysis is given. Section 3 discusses various possible models (marginal, random-effects and transition models) in some details and considers some estimation methods, including Maximum Likelihood (ML) and Generalized Estimating Equations (GEEs). For each case, the suitable software for the estimation of parameters will also be introduced. We further highlight the usefulness of transition model in the analysis of insomnia data. The data set is analyzed in Section 4. Some concluding remarks are given in Section 5. 2 Insomnia Insomnia is a sleep disorder in which the patient does not get enough, or satisfactory sleep. Insomnia can vary as to how long it lasts and as to how often it occurs. Insomnia can be short-term (called acute insomnia) or last a long time (called chronic insomnia). Acute insomnia can last from one night to a few weeks. Chronic insomnia is present when a person has insomnia at least 3 nights a week for 1 month or longer. It can be caused by many things and often occurs along with other health problems. Common causes of chronic insomnia are depression, chronic stress, and pain or discomfort at night. Acute insomnia may not require treatment. Treatments for chronic insomnia include first treating any underlying conditions or health problems that may be causing the insomnia. If insomnia still continues, the health care provider may suggest either behavioral therapy or medication. The data in Table 1 are extracted from results of a randomized, double-blind clinical trial comparing an active hypnotic drug with a placebo in 239 patients who have sleeplessness problems (Francom et al., 1989). The measure of interest is the patient s response to the question How quickly did you fall asleep

3070 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi after going to bed? In the main data the response is ordinal variable. Patients were asked this question again after a one week placebo washout period (baseline measurement) and also following a two-week treatment period. We regard treatment as an explanatory and independent variable which affect the responses. As an application for our purpose in this paper we categorize the response as a binary variable of less than 30 minutes and greater than 30 minutes. The reason for combining some levels of the initial ordered responses is that categorizing follow-up ordered response by treatment and initial ordered response leads us to have some cells with 0 counts, and some with counts less than 5. The use of asymptotic inferences, involved in using cumulative logit models for ordinal responses (vide McCullagh 1980), may be invalid for such data sets. Table 1 Time before falling to sleep obtained from the question, How quickly did you fall asleep?, in minutes (follow-up response, Y 2, by treatment and initial response, Y 1, observed and row percentages) Follow-up (Y 2 ) Treatment initial (Y 1 ) < 30 30 Total Active <30 27 5 32 84.4% 15.6% 100.0% 30 62 25 87 71.3% 28.7% 100.0% Placebo < 30 30 4 34 88.2% 11.8% 100.0% 30 30 56 86 34.9% 65.1% 100.0% Table 2 Sample marginal distribution of initial and follow-up responses, odds and odds ratio for two treatments Response category Response Treatment < 30 30 Odds Odds Ratio Initial Active 0.269 0.731 0.368 0.929 Placebo 0.284 0.716 0.396 Follow-up Active 0.748 0.252 2.97 2.97 Placebo 0.500 0.500 1 For our data in Table 1, looking at the marginal distribution of the initial and follow-up responses for the two treatments, we may conclude that, initially the two groups had similar distributions, but at the follow-up the patients with the active treatment tended to fall asleep more quickly. Table 2 displays sample marginal distributions for initial and follow-up responses for the two treatments. This table also gives the odds of sleeping less than 30 minutes and odds ratio (a ratio of the odds of sleeping less than 30 minutes for people who used active drug to the odds of sleeping less than 30 minutes for people who used placebo drug) for two treatments. The sample marginal probabilities in Table 2 show that at first time the proportion of patients who used the active drug and fell asleep slowly ( 30) is 0.731 and then this proportion decreased to 0.252 after drug use. Also the results of odds ratio in this Table show that in the initial time, response is nearly independent of the drug use. But in the follow up time, people who

Models for longitudinal analysis 3071 use active drug are much more likely to sleep less than 30 minutes than people who use placebo. Odds ratio and Somers d (Somers, 1962) can be used to measure association between each ordinal response and our binary explanatory variable (treatment). Somers d measure is the difference between the proportions of concordant and discordant pairs, out of the pairs that are untied on the binary explanatory variable and it takes values in [-1, 1]. For association between two responses the estimated odds ratio takes the value 5.576 (Somers d=0.305). So, there is a high correlation between two responses. Any analysis of these data should take into account this correlation. Also estimated odds ratio (Somers d) for the two responses for people who use active and placebo drug is 2.177 (Somers d=0.134) and 14 (Somers d =0.478), respectively. This shows that the correlation between responses changes with different level of treatment. Therefore, we have to consider the treatment effect on correlation structure as well as the treatment effect on the responses. 3 Models for the Analysis of Binary Longitudinal Data This section studies several approaches to the analysis of binary longitudinal data. We consider marginal, random-effects and transition models. The objective is to present the idea underlying each model as well as their forms for application on the insomnia study. 3.1 Marginal Model In marginal model, the regression of the response on explanatory variables is modelled separately from within-person correlation. Suppose that it is of interest to know the relation between the response vector and the explanatory variables at each time and how these relations change over time. For each observation, define two binary dependent variables, Y 1 and Y 2, each of which take the value of either 0 (Y t < 30) or 1 (Y t 30). In marginal modeling, the joint outcome (Y 1,Y 2 ) is modeled by using a marginal probability for each response variable, and correlation structure, by use of the odds ratio, parameterizes the relation between the two dependent variables. The systematic components model the marginal probabilities, π tj = Pr(Y t = j x t ) where j =0, 1 and t =1, 2, and the odds ratio, defined as ψ = π 12(00)π 12(11) π 12(01) π 12(10) where π (12)(j1 j 2 ) = Pr(Y 1 = j 1,Y 2 = j 2 ) and ψ describe the association between

3072 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi the two outcomes. Thus, for each observation we have π t1 = exp(x t β t), t =1, 2, 1 + exp(x tβ t ) where the x t for t =1, 2 are vector of explanatory variables. For the association we consider ψ = exp(x β) where x is a vector of explanatory variables and ψ specifies the dependence structure among the repeated measurements. This is necessary for a complete description of the distribution of observations and hence for a likelihood analysis. This model was first introduced by Dale (1986). It is well known that if this correlation structure is ignored, estimates of standard error of parameter estimates may be incorrectly obtained (Liang and Zeger, 1986). For Insomnia data we consider the model equation as, logit[pr(y it =1 α t,β t, treat i )] = α t + β t treat i, t =1, 2, ψ = exp(α + β treat i ), (1) where treat is an indicator variable for treatment (1: active, 0: placebo). There are two estimation approaches ML and GEE that will be discussed in the following subsections. 3.1.1 Maximum Likelihood (ML) Approach The likelihood function for the estimation of parameters in marginal models is given by Dale (1986). We may modify the Dale model in a manner that may able us to consider the effect of covariates on the correlation structure. We shall use function nlminb in software R to find the parameter estimates. Using the marginal model, the log-likelihood for our 2 time points is 1 1 L MAR = [n j1 j 2 0ln(π (12)(j1 j 2 )0)+n j1 j 2 1ln(π (12)(j1 j 2 )1)] j 1 =0 j 2 =0 where π (12)(j1 j 2 )k = Pr(Y 1 = j 1,Y 2 = j 2 treat i = k), for k =0, 1 and n (j1 j 2 )k for k =0, 1 are the number of individuals with Y 1 = j 1 and Y 2 = j 2 when the explanatory variable is treat i = k. As is seen, if we consider the effects of covariates on the correlation structure in marginal models, the maximization of the likelihood needs a highly developed program.

Models for longitudinal analysis 3073 3.1.2 Generalized Estimating Equations (GEEs) Approach GEEs provide an accomplished estimation approach with desirable statistical properties, provided that the mean structure is correctly specified, to analyze binary correlated data arising from repeated measurements. It was introduced by Liang and Zeger (1986) as an estimation method of dealing with correlated data when the data generating process follows an exponential family. In our application, repeated measurement provides a bivariate binary response (Y i1,y i2 ) for all i. GEEs specify a structure for means μ it = E(Y it ) and for variance function υ(μ it ), which describes how var(y it ) depends on μ it (McCullagh, 1983). The estimates are solutions of quasi-likelihood equations called generalized estimating equations. The GEEs approach utilizes an assumed covariance structure for Y i =(Y i1,y i2 ) specifying a variance function and a pairwise correlation pattern. The approach is also appealing for categorical data because of its computational simplicity compared to ML. The generalized estimating equations for the vector β have the form S(β) = n i=1 D i V 1 i (y i μ i )=0, where μ i is the vector of probabilities associated with Y i and D i = μ i β. Also V i is the working covariance matrix for Y i and n is the number of patients. The reader can find more details on the methodology of GEEs in Liang and Zeger (1986) and Zeger et al. (1988). This approach can be implemented in SAS software (the genmod procedure), in R (gee function in the gee package) or in STATA (the xtgee command). Let illustrate the GEEs approach for the insomnia data. We use this approach to fit the marginal model logit[pr(y i1 =1) treat i ] = α + β 1 treat i. logit[pr(y i2 =1) treat i ] = α + β 1 treat i + α 1 + β 2 treat i. In this model, α is an intercept at the first time and α 1 is the effect of time on logarithm of odds of response at the second time. By this definition α + α 1 is the intercept for the follow up response model. The effect of treatment at t = 1 is taken into account by β 1, whereas β 2 is the parameter which indicates the interaction between time and treatment. So, β 1 + β 2 indicates the effect of treatment for the follow up response. We shall use an exchangeable working correlation structure which is applicable for most longitudinal data. One shortcoming of using GEEs is that the working correlation matrix is usually assumed to have a structure independent of subjects. This ad-hoc assumption may be invalid when the between-subject variation crucially relies on some subject characteristics. Furthermore, the properties of GEEs is influenced by the model misspecification and this appears to be quite serious in

3074 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi the estimation of parameters (Liang and Zeger, 1986). 3.2 Random-Effects Models The basic idea underlying a random effects model is that there is a natural heterogeneity across individuals in their regression coefficients and that the heterogeneity can be handled by a probability distribution. Correlation among observations for the same individual arises from their sharing unobservable variables, u i. Now we illustrate a logistic random-effects model for our bivariate binary data. Suppose that we have a data set with two dependent binary responses for the i th individual, i.e. (Y i1,y i2 ). Observation Y it for the i th individual takes values 1 or 0 for t =1, 2. In general, a random-effects model can be written as follows, logit[pr(y it =1 x t,u i )] = α + x t β t + u i, t =1, 2, where the effects u i are independent and identically distributed as N(0,σ 2 1 ). The model for the insomnia data is logit[pr(y it =1 treat i, time,u i )] = α+β 1 treat i +α 1 time+β 2 treat i time+u i, (2) where time takes values 0 and 1 for the initial and follow-up time period, respectively. The ML approach can now be applied to derive the parameter estimates. The likelihood is given in Agresti and Natarajan (2001). Specifically, the joint probability density for the response vector Y i, conditional on u i, may be written as f (y i u i )= t p y it it (1 p it ) 1 y it, where, for t =1, 2, p it = Pr(Y it =1 treat i, time,u i ). It then follows that the marginal likelihood function (conditional on the covariates) for the fixed effects and the variance component has the form given by L RE = i f (y i u i ) φ ( u i 0,σ 2 1) dui, (3) where φ (u i 0,σ 2 1) is a normal probability density function with mean 0 ad variance σ 2 1 for u i. Under the random-effect model (2), the Y it, given u i, are conditionally independent over t. However, this does not necessarily imply that the intra-subject

Models for longitudinal analysis 3075 correlation coefficient, ρ, is zero. If we consider the latent variable model where a continues latent variable Yit underlies the observed Y it in (2), the variance of the random effect u i, σ1, 2 represents the between-subject variation, and the within-subject variation, π 2 /3, represents the variance of the logistic distribution, of the errors in model for Yit. Thus, correlation between Yi1 and Y i2, ρ, may be written as ρ = σ1/ 2 (σ1 2 + π 2 /3), which does not depend on subjects. As is seen, this approach may give misleading results when the correlation structure depends on covaraites. We now extend the logistic regression model by including random effects for the treatment in the structure for the linear predictor. This not only determines the structure of correlation between observations on the same subject, but also takes into account the heterogeneity among subjects, due to unobserved characteristics. To fit this random-coefficients model we assume u i = u 1i +u 2i treat i in Equations (2) and (3), where the effects u 1i and u 2i are independent and identically distributed over i and follow [ a bivariate ] normal distribution with σ 2 mean 0 and covariance matrix Σ = 1 σ 12 σ 12 σ2 2. With this model, the correlation between Yi1 and Yi2 is: where ρ treati = σ2 1 + σ2 2 (treat i) 2 +2cov(u 1i,u 2i )treat i σ 2 1 + σ 2 2(treat i ) 2 +2cov(u 1i,u 2i )treat i + π2 3 ρ treati =0 = ρ treati =1 = σ1 2, σ1 2 + π2 3 σ1 2 + σ2 2 +2cov(u 1i,u 2i ) σ 2 1 + σ 2 2 +2cov(u 1i,u 2i )+ π2 3 = σ2 1 + σ 2 2 +2σ 12 σ 2 1 + σ 2 2 +2σ 12 + π2 3 which shows that the association between responses depends on the covariates. Now we discuss alternative measures of association based on manifest variables or actual outcomes. Under the random-effect model (2) the marginal probability of a positive outcome and the joint probability of two positive outcomes, respectively, are: π i.t1 = Pr(Y it =1 treat i, time) = Pr(Y it =1 treat i, time,u i )φ(u i 0,σ1)du 2 i exp(α + β 1 treat i + α 1 time + β 2 treat i time + σ 1 u i ) = 1+exp(α + β 1 treat i + α 1 time + β 2 treat i time + σ 1 u i ) φ(u i 0, 1)du i t =1, 2 π i(12)(11) = Pr(Y i1 =1,Y i2 =1 treat i )

3076 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi = Pr(Y i1 =1 treat i,u i )Pr(Y i2 =1 treat i,u i )φ(u i 0,σ1 2 )du i = exp(α + β 1 treat i + σ 1 u i ) 1+exp(α + β 1 treat i )+σ 1 u i ) exp(α + α 1 +(β 1 + β 2 )treat i + σ 1 u i ) 1+exp(α + α 1 +(β 1 + β 2 )treat i + σ 1 u i ) φ(u i 0, 1)du i. [ ] σ 2 For the random-coefficients model with Σ = 1 0 0 σ2 2, the marginal probability of a positive outcome and the joint probability of two positive outcomes, respectively, are: π i.t1 = Pr(Y it =1 treat i, time) = Pr(Y it =1 treat i, time, u i )φ(u i 0, Σ)du i = exp(α + β 1 treat i + α 1 time + β 2 treat i time + σ 1 u i1 + σ 2 u i2 treat i ) 1+exp(α + β 1 treat i + α 1 time + β 2 treat i time + σ 1 u i1 + σ 2 u i2 treat i ) φ(u i1,u i2 0, I)du i1 du i2 t =1, 2 π i(12)(11) = Pr(Y i1 =1,Y i2 =1 treat i ) = Pr(Y i1 =1 treat i, u i )Pr(Y i2 =1 treat i, u i )φ(u i 0, Σ)du i = exp(α + β 1 treat i )+σ 1 u i1 + σ 2 u i2 treat i ) 1+exp(α + β 1 treat i )+σ 1 u i1 + σ 2 u i2 treat i ) exp(α + α 1 +(β 1 + β 2 )treat i + σ 1 u i1 + σ 2 u i2 treat i ) 1+exp(α + α 1 +(β 1 + β 2 )treat i + σ 1 u i1 + σ 2 u i2 treat i ) φ(u i1,u i2 0, I)du i1 du i2 (φ(u i 0, I)) is a bivariate normal density for the vector u i =(u 0i,u 1i ), with mean 0 and covariance matrix Σ (bivariate normal with mean 0 and Identity covariance matrix I). We are now in a position to compute any standard measure of dichotomous variables. For example, Pearson s correlation coefficient between two outcomes is: r treati = π i(12)(11) π i.11 π i.21 π i.11 (1 π i.11 ) π i.21 (1 π i.21 ) As noted in the classic text by Bishop et al. (1975, Chapter 11), all the standard measures of association for a two-by-two table are essentially functions of Pearson s correlation coefficient or functions of the odds ratio, which can be written as α treati = π i(12)(11)π i(12)(00) π i(12)(01) π i(12)(10),

Models for longitudinal analysis 3077 or the best-known measure depending on the odds ratio is Yule s Q, defined as Q treati = α treat i 1 α treati +1, where in the last two equations π i(12)(jk) = Pr(Y i1 = j, Y i2 = k treat i ) for j, k =0, 1. We can see that the calculation of the marginal and the joint probabilities are complex and there is not any command or procedure in existing software to compute them. These probabilities can be computed using Gaussian quadrature. The formula for the likelihood may be written as L RC = f (y i u i ) φ (u i 0, Σ) du i. (4) i In general, taking the first derivatives of the log-likelihood functions with respect to the parameters, will produce the ML estimates and the second derivatives will also produce their standard errors, provided that some regularity conditions are satisfied. The log-likelihood of (3) and (4), however, does not have a closed form solution and approximations to the likelihood or numerical integration techniques are required to maximize (3) and (4) with respect to the unknown parameters. The viability of approximation in the integral computation is crucial to obtain valid solutions of the likelihood function. It is shown that the adaptive Gaussian-quadrature approximation method for performing the maximum likelihood estimation has a relatively good performance among the other numerical methods (Pinheiro and Bates, 1995; Rabe-Hesketh et al., 2002). This procedure rescales the evaluation points so that the integrand is sampled in a more appropriate region and produced accurate estimates over our range of interest using fewer quadrature points. This method is implemented in many software packages, such as SAS (the nlmixed procedure) and STATA (the gllamm program) (Rabe-Hesketh et al., 2001). We will use STATA to analyze insomnia data. 3.3 Transition Model Under a transition model, correlation between Y i1 and Y i2 exists because the past value, Y i1, explicitly influences the present observation Y i2. For the data considered here the primary question of interest is the degree of change, or improvement, among treatments, or how transitions are made between consecutive time points. For such a scientific question, an appropriate approach is the use of Markov (transition) models.

3078 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi We consider a first order transition model for repeated binary responses. The general form of the model for two responses is logit [P (Y i1 =1 α, β)] logit [P (Y i2 =1 Y i1 = a; α a,β a )] = α + x 1β = α a + x 2β a where α, α 0 and α 1 are intercept parameters, β is the vector of parameters for explanatory variables at the first time and β a is a vector of parameters for the explanatory variables at the second time where a =0, 1. For an extensive discussion of transition models and the likelihood for finding ML estimates see Ganjali and Rezaei (2007)and Rezaei et al. (2009). This model also can be analyzed by existing software such as SAS (procedure: genmod) and SPSS (binary logistic) and R (glm function in stats package). The form of the model for insomnia data would be, logit[p (Y i1 =1 α, β, treat i )] = α + β treat i, logit[p (Y i2 =1 Y i1 = a; α a,β a, treat i )] = α a + β a treat i, a =0, 1. (5) By employing various models at the second period based on different values of response at the first period in (5), the correlation between responses depends on individuals. For this model one should estimate 6 parameters instead of 5 in random intercept model and 6 in random coefficient model. Therefore if there are different effects of explanatory variable on the second response for different values of the first response, one can conclude that the correlation structure between responses depends on individuals characteristics. 4 Model Fitting Results for Insomnia Data Table 3 gives the results of fitting marginal model with bivariate responses. In model (1) we may assume that α 0 and β 0 (MMCOR2), α 0 and β = 0 (MMCOR1), or ignoring the dependence structure (MMIND) among the repeated measurements. Results of fitting the marginal model MMIND for the initial response show that there is no significant effect of treatment (P-value= 0.803). Whereas, the effect of treatment on the follow-up response (P-value= 0.000)is significant. By comparing the estimation results of three marginal models, we see that the standard error of parameter estimates obtained by MMCOR2 are less than those obtained by MMCOR1 and MMIND. Further, results of model MMCOR2 show strong evidence of the treatment effect on the correlation between two responses. We may conclude that the correlation between responses changes with different level of treatment. This is e 2.639 =13.99 for people who use active drug and e 2.639 1.861 =2.177 for people who use placebo drug. This was already illustrated in our exploratory

Models for longitudinal analysis 3079 analysis in Section 2. The results from the GEEs, the random effects (RE), and the random coefficients (RC) models are shown in Table 4. The RC model was first fitted by plugging covariance between intercepts and slopes into the model. The estimate of this parameter was negative but insignificant. Thus, we refit the RC model without this covariance. Results confirm that there is no significant effect of treatment on the initial response but there is a significant effect of treatment on the follow-up response. Also estimate of the scale parameter in the random-effects model and correlation in the GEEs method show that there is a strong association between the two responses. In addition, the estimate for σ2 2 is evidence that the slope coefficient of treatment univariates across subjects, resulting that the two models are equivalent. Table 3 Parameter Estimates and Standard Error of estimates of marginal modeling parameters considering dependence between responses (MMCOR1 and MMCOR2) and assuming independence between responses (MMIND) for Insomnia data (bold numbers are significant at %5 level) MMCOR2 MMCOR1 MMIND Parameter Est. S.E. Est. S.E. Est. S.E. α 1 0.928 0.178 0.928 0.203 0.928 0.203 α 2 0.000 0.000 0.000 0.183 0.000 0.183 α 2.639 0.578 1.900 0.40 - - β 1 (T reatment : Active) 0.072 0.276 0.072 0.289 0.072 0.289 β 2 (T reatment : Active) -1.087 0.211-1.087 0.280-1.087 0.280 β(t reatment : Active) -1.861 0.792 - - - - Table 4 Parameter Estimates and Standard Error of GEEs, RE, and RC estimates for Insomnia data (bold numbers are significant at %5 level) GEEs RE RC Parameter Est. S.E. Est. S.E. Est. S.E. α 0.928 0.203 1.512 0.358 1.505 0.355 α 1 (Time) -0.928 0.197-1.521 0.392-1.522 0.392 β 1 (T reatment : Active) 0.072 0.289 0.051 0.451 0.062 0.448 β 2 (T reatment : Active) -1.159 0.338-1.739 0.545-1.743 0.545 σ1 2 - - 3.752 1.437 3.772 1.452 σ2 2 2.2E-17 6.8E-9 ρ 0.312 - - - Results of the transition model are given in Table 5. Now, we have more insight into the process of generating the data. When the initial response is less than 30 there is no significant effect from the drug. However, for an initial value of more than 30 there is a positive effect from the drug on the probability of the follow-up response. This means that the drug is less likely to be effective for patients with pre-sleeping intervals less than 30 minutes and so knowledge of the initial response may discourage the practitioner from prescribing an ineffectual treatment. In general for analyzing insomnia data we suggest the use of transition model which in our view can answer the question about change of treatment effect

3080 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi on the response. This model can also be implemented by existing software and it properly can take into account the dependence of correlation between responses on the treatment. Table 5 Parameter Estimates and Standard Error of using transition model for Insomnia data (bold numbers are significant at %5 level) Y 1 Y 2 initial< 30 Y 2 initial 30 Parameter Est. S.E. Est. S.E. Est. S.E. α 0.928 0.203 - - - - α 0 - - -2.015 0.532 - - α 1 - - - - 0.624 0.226 β(t reatment : Active) 0.072 0.289 - - - - β 0 (T reatment : Active) - - 0.329 0.721 - - β 1 (T reatment : Active) - - - - -1.532 0.328 5 Concluding Remarks The objective of presenting this paper was to illustrate the use of various models in the analysis of repeated binary outcomes, with emphasizing on the fact that the choice of model is dependent on the scientific question of interest and simplicity. An application of the models was provided through an example of health studies. The paper used user-friendly software packages, such as SAS, R or SPSS, to estimate model parameters for an insomnia patient s binary response to the question of How quickly did you fall asleep?. For these data, the subject-specific variation was modeled by including random effects in the linear predictor of the logistic regression model. Results show that there is a strong association between two responses and this correlation changes with different level of treatment. Furthermore, use of a transition model shows that the effectiveness of the drug on the follow-up response crucially depends on the initial response. In our view, results of fitting the transition model give more light on practitioners aim, as the drug is less likely to be effective for patients with pre-sleeping intervals of less than 30 minutes. However, the drug is more effective for patients with pre-sleeping intervals of more than 30 minutes. References [1] A. Agresti and R. Natarajan, Modeling clustered ordered categorical data: A survey. International Statistical Review, 69 (2001), 345-371. [2] Y. M. M. Bishop, S. E. Fienberg and P. W. Holland, Discrete Multivariate Analysis: Theory and Practice, Cambridge: MIT Press, 1975. [3] H. Chung, Y. Park and S. T. Lanza, Latent transition analysis with covariates: pubertal timing and substance use behavior in adolescent females, Statistics in Medicine, 24 (2005), 2895-2910.

Models for longitudinal analysis 3081 [4] J. R. Dale, Global cross-ratio models for bivariate, discrete, ordered responses, Biometrics, 42(1986), 909-917. [5] P.J. Diggle, P. Heagerty, K.Y. Liang and S.L. Zeger, Analysis of Longitudinal Data, Oxford: University Press, 2002. [6] S. F. Francom, C. Chung-Stein and J. R. Landis, A log-linear model for ordinal data to characterize differential change among treatments, Statistics in Medicine, 8(1989), 571-582. [7] M. Ganjali and Z. Rezaee, A transition model for analysis of repeated measure ordinal response data to identify the effects of different treatments, Drug Information Journal, 41(2007), 527-534. [8] L. A. Goodman, Association models and canonical correlation in the analysis of cross-classifications having ordered categories, Journal of the American Statistical Association, 76(1981), 320-334. [9] K.Y. Liang, S.L. Zeger and B.F. Qaqish, Multivariate regression analyzes for categorical data (with discussion). Journal of the Royal Statistical Society, Series B, 54(1992), 3-40. [10] K. Y. Liang and S. L. Zeger, Longitudinal data analysis using generalized linear models, Biometrika, 73(1986), 13-22. [11] P. McCullagh, Regression models for ordinal data, Journal of the Royal Statistical Society, Series B, 42(1980), 109-142. [12] P. McCullagh, Quasi-likelihood functions, Annual Statistics, 11(1983), 59-67. [13] G. Molenberghs and E. Leasaffre, Marginal modeling of correlated ordinal data using a multivariate plackett distribution, Journal of the American Statistical Association, 89(1994), 633-644. [14] J. C. Pinheiro and D. M. Bates, Approximations to the log-likelihood function in nonlinear mixed-effects model, Journal of Computational and Graphical Statistics, 4 (1995), 12-35. [15] S. Rabe-Hesketh, A. Pickles and A. Skrondal, GLLAMM: A class of models and a Stata program, Multilevel Modelling Newsletter, 13(2001), 17-23. [16] S. Rabe-Hesketh, A. Skrondal and A. Pickles, Reliable estimation of generalized linear mixed models using adaptive quadrature, The Stata Journal, 2(2002), 1-21. [17] Z. Rezaei Ghahroodi, M. Ganjali, and D. Berridge, A Transition Model for Ordinal Response Data with Random Dropout: An Application to the Fluvoxamine Data, Journal of Biopharmaceutical Statistics, 19 (2009), 658-671. [18] M. Reuter, J. Hennig, M. Buehner and M. Hueppe, Using latent mixed Markov models for the choice of the best pharmacological treatment, Statistics in Medicine, 23(2004), 1337-1349. [19] R. H. Somers, A New Asymmetric Measure of Association for Ordinal Variables, American Sociological Review, 27(1962), 799-811.

3082 Z. Rezaei Ghahroodi, M. Ganjali and I. Kazemi [20] G. Verbeke and E. Lesaffre, A linear mixed-effects model with heterogeneity in the random-effects population, Journal of the American Statistical Association, 91(1996), 217-221. [21] S. L. Zeger, K-Y. Liang and P. S. Albert, Models for longitudinal data: a generalized estimating equation approach, Biometrics, 44(1988), 1049-1060. Received: April, 2010