Latent-Variable Models for Longitudinal Data with Bivariate Ordinal Outcomes


David Todem (1), KyungMann Kim (2) and Emmanuel Lesaffre (3)

(1) Department of Statistics, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706, U.S.A.
(2) Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 600 Highland Ave., Madison, WI 53792, U.S.A.
(3) Biostatistical Centre, K.U.Leuven, Kapucijnenvoer 35, B-3000 Leuven, Belgium

todem@stat.wisc.edu

Summary. We use the concept of a latent variable to derive the joint distribution of bivariate ordinal outcomes, and then extend the model to allow for longitudinal data. Specifically, we relate the observed ordinal outcomes, through threshold values, to a bivariate latent variable, which is then modeled with a linear mixed model. Random effects terms tie together repeated observations from the same subject. The cross-sectional association between the two outcomes is modeled through the correlation coefficient of the bivariate latent variable, conditional on random effects. Assuming conditional independence given random effects, the marginal likelihood, under the missing at random assumption, is approximated using adaptive Gaussian quadrature for numerical integration. The model provides fixed effects parameters that are subject-specific, but retain the population-averaged interpretation when properly scaled. This is particularly well suited to situations in which population comparisons and individual-level contrasts are of equal importance. Data from a randomized intervention trial of a cardiovascular educational program, where the responses of interest are changes in hypertension and hypercholesterolemia status, illustrate the proposed model. Generalization from bivariate to multivariate models is also discussed.

Key words: Adaptive Gaussian quadrature; Bivariate ordinal outcome; Importance sampling; Latent variable; Maximum marginal likelihood; Random effects; Repeated measures; Threshold values.

1. Introduction

Multiple outcomes are often used as primary endpoints in longitudinal studies. For example, in growth curve studies not only the height of the child is measured, but also the weight. Often, length measurements are also taken of various parts of the body (e.g., arm, leg). Another example is the panel study, a sociological study in which multiple questions are posed to the subject to measure the outcomes of interest. The same is true in clinical trials, where the effect of a new treatment is usually evaluated from the analysis of a bivariate response vector comprising an efficacy and a safety parameter. Additionally, it is often the case that subjects' responses to treatment are classified according to an ordinal or graded scale, e.g., the visual analogue scale.

An example of bivariate ordinal data in time is provided by a longitudinal trial of a cardiovascular educational program in which the responses of interest are changes in hypertension and hypercholesterolemia status. Each of the 266 participants in the analysis sample was randomized either to an audio-dietary educational intervention group or to a non-intervention group (control group). At each of the two follow-up times, 4 months and 12 months, the two outcomes were derived from changes in blood pressure and serum cholesterol from baseline. These outcomes were re-coded to ordinal scales according to whether there was a positive change (1), no change (2) or a negative change (3), based on the criteria established by the NIH-sponsored expert panels (NCEP, 1993; JNC, 1993). Not only were the marginal effects of the education intervention on blood pressure and cholesterol of interest, but so were education effects on the association between these two outcomes (Ten Have and Morabia, 1999). It was also of interest to know whether these effects depended on time. The literature suggests that dietary interventions do not typically impact both blood pressure and cholesterol. To understand why this is the case, there is a need to investigate the effect of the intervention on the association between the two outcomes.

Statistical analysis of bivariate ordinal outcome data in time raises a number of challenging issues. It is well known, for example, that repeated measurements from the same subject over time necessitate the use of methods for correlated data. Multiplicity of the outcomes at any time point is another important issue. Most modeling approaches in the literature either deal with cross-sectional data or are restricted to longitudinal univariate outcomes. Limited work has been done on bivariate ordinal outcomes in time. Although separate models can be fitted to each outcome, such an approach fails to borrow strength across the outcome variables.
By exploiting the correlation structure with a multivariate model, efficiency and power could be greatly increased (O'Brien, 1984).

Several authors have developed maximum likelihood procedures for cross-sectional multivariate ordinal outcomes. Glonek and McCullagh (1995) and Molenberghs and Lesaffre (1994) approached the specification of the joint probability of multivariate observations through the first- and higher-order marginal parameters. One important advantage of this approach is that multivariate observations of unequal numbers of variables per subject may be analyzed quite naturally. Kim (1995) also considered maximum likelihood estimation for bivariate ordinal measures by using a constrained latent variable specific to ophthalmologic studies. Williamson and Kim (1996) further developed marginal mean regression techniques based on the Plackett (1965) and Dale (1986) global odds ratios as a measure of association. None of these models was meant for longitudinal outcomes, and therefore none could be used for multivariate ordered categorical data in time.

Ten Have and Morabia (1999) then extended the original Dale (1986) model to accommodate the time component in analyzing longitudinal bivariate binary outcomes. Their model uses the concept of global odds ratios to represent the cross-sectional association and random effects terms for the longitudinal association. They also included random effects in the global odds ratios, which can be seen as a means to construct a goodness-of-fit test for the basic starting model. Although this extended Dale model is certainly desirable in situations where the interest is in studying the within-subject evolution, it does not accommodate population-averaged comparisons, most likely because the logit link function does not have a simple representation of the marginal means of the outcomes. Specifically, the fixed effects parameters have a subject-specific interpretation and describe on average how a subject's probability of experiencing the event of interest depends on time (Zeger, Liang and Albert, 1988). This may not be relevant in studies where population comparisons and within-subject comparisons are of equal importance. Another drawback of the logit link is that it does not reduce to the usual logit model when each marginal outcome has only a single random effect and only one observation per level of this random effect (McCulloch, 1994).
Finally, this model was restricted to bivariate binary outcomes in time.

A number of authors have recently proposed regression techniques for longitudinal multinomial outcomes (Clayton, 1992; Gange, Linton, Scott, DeMets, and Klein, 1993; Miller, Davis, and Landis, 1994; Lipsitz, Kim and Zhao, 1994). These authors have predominantly adapted the generalized estimating equations (GEE) of Liang and Zeger (1986) to proportional odds models for clustered ordinal responses. However, in longitudinal studies, this method is not necessarily appropriate. Indeed, when the missing data mechanism is not completely at random, the standard GEE methods provide biased parameter estimates. Furthermore, the GEE-based approach, as a distribution-free methodology, does not lend itself to classical tools for model checking. Hence, the search for other alternatives continues.

Our approach is based on the concept of a latent variable that allows a full likelihood-based modeling of longitudinal bivariate ordinal responses with ignorable missing data. We generalize the work of Hedeker and Gibbons (1994) to allow for multiple outcomes and that of Lesaffre and Molenberghs (1991), Lesaffre and Kaufmann (1992), and Kim (1995) to allow for longitudinal outcomes. Specifically, the random effects terms are introduced in the model via the latent variable to model the induced longitudinal association or other heterogeneity among subjects. The cross-sectional association is modeled through the correlation coefficient of the underlying multivariate normal latent variable, conditional on random effects. Random effects can also be included in the correlation coefficient. Of course, this will complicate the model considerably. But the inclusion is justified on two grounds: (1) the cross-sectional association is of interest in itself, certainly in a model for bivariate responses; and (2) the inclusion of a random effects structure can be seen as a means to construct a goodness-of-fit test for the basic starting model.

Ochi and Prentice (1984) have also developed a class of latent-variable models to construct equi-correlated probit models. One limitation of their approach is that it was restricted to multivariate binary outcomes, although its extension to ordinal outcomes is straightforward. Furthermore, their model did not accommodate multiple random effects and could not model negative correlations between the marginal outcomes.
Finally, our approach has the advantage over the Ochi and Prentice model in that it does not require the mean to be constant within levels of the random effect. Latent-variable models have also been described in developmental toxicity studies where the interest was to specify the joint distribution of discrete and continuous outcomes (see e.g. Catalano and Ryan, 1992; Fitzmaurice and Laird, 1995). Our interest, however, is in bivariate ordinal outcomes.

In Section 2, we describe the mixed effects model for bivariate ordinal responses. For this we describe the concept of a latent variable in the bivariate setting, and then extend the model to accommodate longitudinal data. Section 3 describes the estimation procedure for the model parameters. In Section 4, the proposed model is applied to the example data set and compared to a population-averaged bivariate model assuming independence across time and to univariate probit random effects models. Finally, the usefulness and future extension of the model are discussed in Section 5.

2. A latent-variable model for bivariate ordinal data in time

2.1 Model formulation

Due to the lack of a natural distribution for ordinal outcomes, it is conceptually convenient to assume that the observed ordinal responses $Y = (Y_1, Y_2)^T$ are generated from an underlying latent variable $W = (W_1, W_2)^T$ with two sets of threshold values $a = (a_1, a_2, \ldots, a_{r-1})^T$ and $b = (b_1, b_2, \ldots, b_{s-1})^T$, where $r$ and $s$ represent the number of ordinal levels for the first and the second marginal outcome, respectively. Specifically, the univariate responses $Y_{1kt}$ and $Y_{2kt}$ for unit $k$ (e.g., a subject in the longitudinal setting or a cluster in the repeated-data setting) at time point $\tau_{kt}$ fall in category $i$ and $j$, respectively, if the first component $W_{1kt}$ of the latent response is at least $a_{i-1}$ but below $a_i$, and likewise for the second component $W_{2kt}$ with respect to $b_{j-1}$ and $b_j$. Hence, letting $a_0 = b_0 = -\infty$ and $a_r = b_s = +\infty$, the model formulation at the first stage is given by

\[
(Y_{1kt} = i,\; Y_{2kt} = j) \iff (a_{i-1} \le W_{1kt} < a_i,\; b_{j-1} \le W_{2kt} < b_j), \quad \forall\, i, j:\ 1 \le i \le r \text{ and } 1 \le j \le s. \tag{1}
\]

For the cumulative events, we get

\[
(Y_{1kt} \le i,\; Y_{2kt} \le j) \iff (W_{1kt} < a_i,\; W_{2kt} < b_j), \quad \forall\, i, j:\ 1 \le i \le r \text{ and } 1 \le j \le s. \tag{2}
\]

The threshold values must be monotonically increasing to reflect the ordinal nature of the observed outcomes. For a binary outcome, only one threshold value, representing the usual intercept, is needed. At the second stage, we consider a mixed effects regression model for the bivariate latent response as follows:

\[
\begin{cases}
W_{lkt} = x_{lkt}\beta_l + z_{lkt} d_{lk} + \varepsilon_{lkt}, & l = 1, 2, \\
d_k \sim N(0, D) \ \text{and}\ \varepsilon_{kt} \sim N(0, H_{kt}).
\end{cases} \tag{3}
\]

We further assume that $d_k = (d_{1k}^T, d_{2k}^T)^T$ and $\varepsilon_{kt} = (\varepsilon_{1kt}, \varepsilon_{2kt})^T$ are independent. Here $x_{1kt}$ and $x_{2kt}$ are the $1 \times q_1$ and $1 \times q_2$ design row vectors for the fixed effects with associated column vector slopes $\beta_1$ and $\beta_2$, respectively. Also, $z_{1kt}$ and $z_{2kt}$ are the $1 \times r_1$ and $1 \times r_2$ design row vectors for the random effects associated with the column vectors of unknown random effects $d_{1k}$ and $d_{2k}$, respectively. The vector $\varepsilon_{kt}$ is the residual vector with covariance matrix

\[
H_{kt} = \begin{bmatrix} \sigma_1^2 & \rho_{kt}\sigma_1\sigma_2 \\ \rho_{kt}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}.
\]

The parameters $\beta_1$ and $\beta_2$ are common to all subjects, while $d_{1k}$ and $d_{2k}$ are subject-specific. For subject $k$, $z_{1k}$ and $z_{2k}$ often contain fixed (time-independent) subject-specific covariates, but time-varying covariates are also possible, as indicated by Lesaffre, Todem and Verbeke (2000). To obtain a well-formulated model such as in Morrell, Pearson and Brant (1997), these matrices are restricted to satisfy the conditions $\mathrm{rank}(x_{lk}\ z_{lk}) = \mathrm{rank}(x_{lk})$, $l = 1, 2$, where $(x_{lk}\ z_{lk})$ represents the matrix obtained by stacking the matrices $x_{lk}$ and $z_{lk}$.

To complete the model formulation, the correlation coefficient $\rho_{kt}$ of the bivariate latent variable given $d_k$, $[W_{kt} \mid d_k]$, may depend on covariates, with a design row vector $x_{3kt}$ and corresponding slope vector $\beta_3$. This correlation coefficient is modeled using the Fisher transformation,

\[
\log\left(\frac{1 + \rho_{kt}}{1 - \rho_{kt}}\right) = x_{3kt}\beta_3,
\]

to ensure that $-1 < \rho_{kt} < 1$.

The parameters $\beta_l$ and $\sigma_l$, $l = 1, 2$, are not jointly identifiable, but the ratios $\beta_l/\sigma_l$, $l = 1, 2$, are estimable (Catalano and Ryan, 1992). Hence, the bivariate latent response, conditional on $d_k$, can be re-scaled by assuming that $\sigma_1 = \sigma_2 = 1$. Furthermore, following Gibbons and Bock (1987), it is useful to orthogonalize the random effects by letting $d_k = T\theta_k$, where $T$, a lower triangular matrix with positive diagonal elements, is the Cholesky factor of $D$, i.e., $TT^T = D$.

The re-parameterized model is then given by

\[
\begin{cases}
E(W_k \mid \theta_k) = x_k\beta + z_k T\theta_k, \\[2pt]
\log\left(\dfrac{1 + \rho_k}{1 - \rho_k}\right) = x_{3k}\beta_3, \\[2pt]
\theta_k \sim N(0, I_{r_1 + r_2}) \ \text{and}\ TT^T = D,
\end{cases} \tag{4}
\]

where

\[
x_k = \begin{bmatrix} x_{1k} & 0 \\ 0 & x_{2k} \end{bmatrix}, \qquad
z_k = \begin{bmatrix} z_{1k} & 0 \\ 0 & z_{2k} \end{bmatrix}, \qquad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix},
\]

and $I_{r_1 + r_2}$ is the identity matrix of order $r_1 + r_2$. The covariance matrix for subject $k$ is equal to $V_k = z_k D z_k^T + H_k$, where $H_k = \mathrm{Diag}(H_{k1}, H_{k2}, \ldots, H_{kn_k})$.
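To make the two-stage formulation concrete, the following is a minimal simulation sketch of model (4) for one subject. It assumes random intercepts only ($z_{1kt} = z_{2kt} = 1$) and $\sigma_1 = \sigma_2 = 1$; the function names `to_ordinal` and `simulate_subject`, and all numerical values in the usage lines, are hypothetical and are not taken from the paper.

```python
import numpy as np

def to_ordinal(w, cuts):
    """First stage (1): category i iff cuts[i-2] <= w < cuts[i-1], with -inf/+inf at the ends."""
    return np.searchsorted(cuts, w, side="right") + 1

def simulate_subject(rng, x1, x2, x3, beta1, beta2, beta3, T, a, b):
    """Draw one subject's bivariate ordinal responses under model (4).

    Assumption of this sketch: random intercepts only, sigma_1 = sigma_2 = 1.
    x1, x2, x3 are (n_k, q) design matrices; a and b are increasing thresholds.
    """
    n_k = x1.shape[0]
    theta = rng.standard_normal(2)      # theta_k ~ N(0, I_2)
    d = T @ theta                       # d_k = T theta_k, so Var(d_k) = T T^T = D
    rho = np.tanh((x3 @ beta3) / 2.0)   # inverse of log((1 + rho) / (1 - rho)) = x3 beta3
    # Residuals: bivariate normal with unit variances and correlation rho_kt.
    e1 = rng.standard_normal(n_k)
    e2 = rho * e1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_k)
    w1 = x1 @ beta1 + d[0] + e1         # latent response, first outcome
    w2 = x2 @ beta2 + d[1] + e2         # latent response, second outcome
    return to_ordinal(w1, a), to_ordinal(w2, b)

# Hypothetical usage: two visits, covariates (group, time, group x time).
rng = np.random.default_rng(0)
D = np.array([[0.5, 0.2], [0.2, 0.8]])
T = np.linalg.cholesky(D)               # lower triangular, positive diagonal
x = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
y1, y2 = simulate_subject(rng, x, x, x,
                          np.array([0.3, -0.2, 0.1]), np.array([0.1, 0.2, -0.3]),
                          np.array([0.5, 0.0, 0.0]), T,
                          np.array([-0.5, 0.5]), np.array([-0.3, 0.7]))
```

No intercept column appears in the design matrices because the thresholds play that role, as noted in the text for the binary case.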

For the sake of computing analytically the partial derivatives of the log-likelihood with respect to the Cholesky factor $T$ of $D$, the matrix $T$ and the corresponding vector of independent random effects $\theta_k$ are partitioned as

\[
T = \begin{bmatrix} T_{11} & 0 \\ T_{21} & T_{22} \end{bmatrix}, \qquad
\theta_k = \begin{bmatrix} \theta_{1k} \\ \theta_{2k} \end{bmatrix}.
\]

The model as written in (4) yields the following regression model for each marginal component with uncorrelated random effects:

\[
\begin{cases}
E(W_{1kt} \mid \theta_k) = x_{1kt}\beta_1 + z_{1kt} T_{11}\theta_{1k}, \\
E(W_{2kt} \mid \theta_k) = x_{2kt}\beta_2 + z_{2kt}(T_{21}\theta_{1k} + T_{22}\theta_{2k}).
\end{cases}
\]

2.2 Features of the model and parameter interpretation

The model as described above accounts for the cross-sectional, the longitudinal and the cross-sectional-by-longitudinal association as follows:

\[
\mathrm{Corr}(W_{1kt}, W_{2kt}) = \frac{\rho_{kt} + z_{1kt}\,\mathrm{Cov}(d_{1k}, d_{2k})\, z_{2kt}^T}{\sqrt{(1 + z_{1kt}\mathrm{Var}(d_{1k}) z_{1kt}^T)(1 + z_{2kt}\mathrm{Var}(d_{2k}) z_{2kt}^T)}},
\]

\[
\mathrm{Corr}(W_{lkt}, W_{lkt'}) = \frac{z_{lkt}\,\mathrm{Var}(d_{lk})\, z_{lkt'}^T}{\sqrt{(1 + z_{lkt}\mathrm{Var}(d_{lk}) z_{lkt}^T)(1 + z_{lkt'}\mathrm{Var}(d_{lk}) z_{lkt'}^T)}}, \quad l = 1, 2;\ t \ne t',
\]

\[
\mathrm{Corr}(W_{1kt}, W_{2kt'}) = \frac{z_{1kt}\,\mathrm{Cov}(d_{1k}, d_{2k})\, z_{2kt'}^T}{\sqrt{(1 + z_{1kt}\mathrm{Var}(d_{1k}) z_{1kt}^T)(1 + z_{2kt'}\mathrm{Var}(d_{2k}) z_{2kt'}^T)}}, \quad t \ne t'.
\]

When $z_{lkt}\mathrm{Var}(d_{lk})z_{lkt}^T$ does not vary with $t$, as in a random intercept model, the denominator of the second expression reduces to $1 + z_{lkt}\mathrm{Var}(d_{lk}) z_{lkt}^T$.

The correlation coefficient $\rho_{kt}$ accounts implicitly for the cross-sectional association between the two actual outcomes given the subject-specific random effects $d_k$. The multivariate nature of the (subject) random effects accounts for the longitudinal association through the variances of $d_{1k}$ and $d_{2k}$, and for the cross-sectional-by-longitudinal association through the covariance between them. The correlation structure of the model, captured through $\psi^{ll'}_{ktt'} = \mathrm{corr}(W_{lkt}, W_{l'kt'})$, $l, l' = 1, 2$; $t, t' = 1, 2, \ldots, n_k$, can be represented schematically by considering the two latent outcomes at two time points $t$ and $t'$, as shown in Figure 1.

[Figure 1 about here.]

On the latent scale, $E(W_{lkt} \mid \theta_k) = x_{lkt}\beta_l + z_{lkt}T\theta_k$ and $E(W_{lkt}) = x_{lkt}\beta_l$, $l = 1, 2$. Therefore, the fixed effects parameters have both the subject-specific and the marginal interpretations, as is typical in linear mixed models (see Diggle et al., 1994).
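The three correlations of Section 2.2 can be checked numerically by building the latent covariance matrix $V_k = z_k D z_k^T + H_k$ and normalizing it. The sketch below does this under the re-scaling $\sigma_1 = \sigma_2 = 1$; the function name `latent_correlations`, the interleaved ordering $(W_{1k1}, W_{2k1}, W_{1k2}, W_{2k2}, \ldots)$ and the numerical values are assumptions made for illustration.

```python
import numpy as np

def latent_correlations(z1, z2, D11, D12, D22, rho):
    """Correlation matrix of (W_1k1, W_2k1, ..., W_1kn, W_2kn) with sigma_1 = sigma_2 = 1.

    z1: (n, r1) and z2: (n, r2) random-effects design matrices over n time points;
    D11, D12, D22: blocks of D = Var(d_k); rho: length-n conditional correlations rho_kt.
    """
    n, r1 = z1.shape
    r2 = z2.shape[1]
    D = np.block([[D11, D12], [D12.T, D22]])
    Z = np.zeros((2 * n, r1 + r2))
    H = np.eye(2 * n)                       # block diagonal of the H_kt matrices
    for t in range(n):
        Z[2 * t, :r1] = z1[t]               # row for W_1kt
        Z[2 * t + 1, r1:] = z2[t]           # row for W_2kt
        H[2 * t, 2 * t + 1] = H[2 * t + 1, 2 * t] = rho[t]
    V = Z @ D @ Z.T + H                     # V_k = z_k D z_k^T + H_k
    s = np.sqrt(np.diag(V))
    return V / np.outer(s, s)

# Hypothetical random intercept example with two time points.
R = latent_correlations(np.ones((2, 1)), np.ones((2, 1)),
                        np.array([[0.5]]), np.array([[0.3]]), np.array([[0.8]]),
                        np.array([0.2, 0.4]))
```

In this example `R[0, 1]` is the cross-sectional correlation at the first time point, `R[0, 2]` and `R[1, 3]` are the longitudinal correlations for each outcome, and `R[0, 3]` is the cross-sectional-by-longitudinal correlation.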

This dual interpretation does not hold on the data scale. However, population-averaged parameters can be expressed as a multiple of the subject-specific parameters, and the two are therefore equivalent with respect to model testing and reduction. From equation (2), the cumulative distribution of $(Y_{1kt}, Y_{2kt})$ conditional on random effects is given as follows:

\[
\mathrm{pr}(Y_{1kt} \le i,\; Y_{2kt} \le j \mid d_k) = \Phi_{\rho_{kt}}\left(a_i - x_{1kt}\beta_1 - z_{1kt}d_{1k},\; b_j - x_{2kt}\beta_2 - z_{2kt}d_{2k}\right). \tag{5}
\]

Taking the expectation of the conditional probability in equation (5) with respect to the distribution of the random effects $d_k$, and using the results of the theorem given in Appendix A, we get

\[
\mathrm{pr}(Y_{1kt} \le i,\; Y_{2kt} \le j) = \Phi_{\rho_{kt}}\left(\frac{a_i - x_{1kt}\beta_1}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}},\; \frac{b_j - x_{2kt}\beta_2}{\sqrt{1 + z_{2kt}\mathrm{Var}(d_{2k})z_{2kt}^T}}\right), \tag{6}
\]

which gives for each marginal component

\[
\begin{cases}
\mathrm{pr}(Y_{1kt} \le i) = \Phi\left(\dfrac{a_i - x_{1kt}\beta_1}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}}\right), \\[8pt]
\mathrm{pr}(Y_{2kt} \le j) = \Phi\left(\dfrac{b_j - x_{2kt}\beta_2}{\sqrt{1 + z_{2kt}\mathrm{Var}(d_{2k})z_{2kt}^T}}\right),
\end{cases} \tag{7}
\]

where $\Phi(\cdot)$ and $\Phi_{\rho_{kt}}(\cdot)$ represent, respectively, the cumulative distribution function (CDF) of a univariate standard normal and of the bivariate standard normal with correlation coefficient $\rho_{kt}$. Hence, for the first marginal outcome, it is easily seen from equation (7) that the population-averaged parameters associated with $\beta_1$ and $a_i$ are smaller in magnitude, provided that $\mathrm{Var}(d_{1k})$ is positive and estimable, and are given respectively by

\[
\frac{\beta_1}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}} \quad \text{and} \quad \frac{a_i}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}}.
\]

2.3 Generalization of the method to multivariate ordinal outcomes in time

The proposed model for bivariate ordinal outcomes can be easily extended to the case where $m$ ordered categorical outcomes $Y_{kt} = (Y_{1kt}, Y_{2kt}, \ldots, Y_{mkt})^T$ are observed over time. One typically assumes the existence of an underlying multivariate normal response $W_{kt} = (W_{1kt}, W_{2kt}, \ldots, W_{mkt})^T$ which is related to the observed outcome through $m$ vectors of threshold values $a_1, a_2, \ldots, a_{m-1}$ and $a_m$, where $a_l = (a_{l1}, a_{l2}, \ldots, a_{l,s_l - 1})^T$, with $s_l$ being the number of ordered levels of the $l$th outcome.

Specifically, the model formulation at the first stage is given by

\[
(Y_{1kt} = i_1, \ldots, Y_{mkt} = i_m) \iff (a_{1,i_1 - 1} \le W_{1kt} < a_{1,i_1},\; \ldots,\; a_{m,i_m - 1} \le W_{mkt} < a_{m,i_m}),
\]

for all $i_1, i_2, \ldots, i_m$ in the set of possible outcomes.

At the second stage, we consider a mixed effects regression model for the latent response as follows:

\[
\begin{cases}
W_{lkt} = x_{lkt}\beta_l + z_{lkt}d_{lk} + \varepsilon_{lkt}, & l = 1, 2, \ldots, m, \\
d_k \sim N(0, D) \ \text{and}\ \varepsilon_{kt} \sim N(0, H_{kt}).
\end{cases}
\]

For a relatively small number, $m$, of cross-sectional outcomes, the correlation matrix $H_{kt}$ may be left unstructured. However, when $m$ is large, a structure could be given to $H_{kt}$, assuming for example that its off-diagonal elements are all equal. This structure corresponds to that of the equi-correlated probit model of Ochi and Prentice (1984). The optimization and the estimation can proceed from there.

3. Maximum marginal likelihood estimation

Given the above random effects regression model for the latent bivariate response $W$ and relations (1) and (3), the joint probability of the bivariate ordinal response conditional on $\theta_k$ is given by

\[
\begin{aligned}
\mathrm{pr}(Y_{1kt} = i,\; Y_{2kt} = j \mid \theta_k)
&= \mathrm{pr}(a_{i-1} \le W_{1kt} < a_i,\; b_{j-1} \le W_{2kt} < b_j \mid \theta_k) \\
&= \Phi_{\rho_{kt}}(a_i - e_{1kt},\, b_j - e_{2kt}) - \Phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\, b_j - e_{2kt}) \\
&\quad - \Phi_{\rho_{kt}}(a_i - e_{1kt},\, b_{j-1} - e_{2kt}) + \Phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\, b_{j-1} - e_{2kt}),
\end{aligned}
\]

where $e_{lkt} = E(W_{lkt} \mid \theta_k)$, $l = 1, 2$. Assuming that the bivariate responses of subject $k$ are conditionally independent given $\theta_k$, the joint probability $L(y_k \mid \theta_k)$ for the observed outcome matrix $y_k$ is equal to the product of the conditional probabilities of all time-point responses and is given by

\[
L(y_k \mid \theta_k) = \prod_{t=1}^{n_k} \prod_{i=1}^{r} \prod_{j=1}^{s} \left[\mathrm{pr}(Y_{1kt} = i,\; Y_{2kt} = j \mid \theta_k)\right]^{I(y_{1kt} = i)\, I(y_{2kt} = j)},
\]

where $I(\cdot)$ is the indicator variable. Then the marginal density of $Y_k$ in the population, i.e., the contribution of subject $k$ to the marginal likelihood, is the integral of $L(y_k \mid \theta_k)$ weighted by the joint density function of the transformed random effects terms, namely,

\[
L(y_k) = (2\pi)^{-(r_1 + r_2)/2} \int_{\mathbb{R}^{r_1 + r_2}} L(y_k \mid \theta_k) \exp(-\|\theta_k\|^2/2)\, d\theta_k,
\]

where $\mathbb{R}^{r_1 + r_2}$ is the $(r_1 + r_2)$-dimensional Euclidean space, with $\|\cdot\|$ being the Euclidean norm. The marginal likelihood for a sample of $N$ independent subjects is given by $L = \prod_{k=1}^{N} L(y_k)$.
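The rectangle formula for the conditional cell probability can be sketched in a few lines. The version below evaluates the bivariate normal CDF by one-dimensional numerical integration, which is a convenience of this example rather than the method used in the paper; the names `bvn_cdf`, `cell_prob` and `cond_likelihood` are hypothetical.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def bvn_cdf(h, k, rho):
    """Standard bivariate normal CDF Phi_rho(h, k) by one-dimensional integration."""
    if h == -np.inf or k == -np.inf:
        return 0.0
    s = np.sqrt(1.0 - rho**2)
    # Phi_rho(h, k) = int_{-inf}^{h} phi(u) * Phi((k - rho * u) / s) du
    val, _ = integrate.quad(lambda u: norm.pdf(u) * norm.cdf((k - rho * u) / s), -np.inf, h)
    return val

def cell_prob(i, j, a, b, e1, e2, rho):
    """pr(Y_1 = i, Y_2 = j | theta) for categories i, j >= 1, by the rectangle formula."""
    a_ext = np.concatenate(([-np.inf], a, [np.inf]))   # a_0 = -inf, a_r = +inf
    b_ext = np.concatenate(([-np.inf], b, [np.inf]))   # b_0 = -inf, b_s = +inf
    hi1, lo1 = a_ext[i] - e1, a_ext[i - 1] - e1
    hi2, lo2 = b_ext[j] - e2, b_ext[j - 1] - e2
    return (bvn_cdf(hi1, hi2, rho) - bvn_cdf(lo1, hi2, rho)
            - bvn_cdf(hi1, lo2, rho) + bvn_cdf(lo1, lo2, rho))

def cond_likelihood(y1, y2, a, b, e1, e2, rho):
    """L(y_k | theta_k): product over time points of the observed cell probabilities."""
    probs = [cell_prob(i, j, a, b, m1, m2, r) for i, j, m1, m2, r in zip(y1, y2, e1, e2, rho)]
    return float(np.prod(probs))
```

Here `e1` and `e2` stand for the conditional means $e_{1kt}$ and $e_{2kt}$ at each time point. Summing `cell_prob` over all $(i, j)$ should return one up to integration error, which is a quick sanity check on the thresholds.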
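Maximizing the log-likelihood, $\log L$, with respect to the vector of model parameters $\Theta = (\beta_1, \beta_2, \beta_3, a, b, \mathrm{vec}(T))$, with $\mathrm{vec}(T)$ being the vector of unique non-zero elements of $T$, yields the likelihood equation

\[
\frac{\partial \log L}{\partial \Theta} = \sum_{k=1}^{N} L^{-1}(y_k) \frac{\partial L(y_k)}{\partial \Theta} = 0,
\]

which will give the MLE, provided that $-\dfrac{\partial^2 \log L}{\partial \Theta\, \partial \Theta^T}$ is positive definite at the optimum solution. The key computational features rely then on evaluating

\[
\frac{\partial L(y_k)}{\partial \Theta} = \int_{\mathbb{R}^{r_1 + r_2}} \left[\sum_{t=1}^{n_k} \sum_{i=1}^{r} \sum_{j=1}^{s} I(y_{1kt} = i)\, I(y_{2kt} = j)\, p_{ktij}^{-1} \frac{\partial p_{ktij}}{\partial \Theta}\right] L(y_k \mid \theta_k)\, (2\pi)^{-(r_1 + r_2)/2} \exp(-\|\theta_k\|^2/2)\, d\theta_k,
\]

where $p_{ktij} = \mathrm{pr}(Y_{1kt} = i,\; Y_{2kt} = j \mid \theta_k)$. This leads to the computation of $\partial p_{ktij}/\partial \Theta$. For the first univariate outcome, the partial derivative of $p_{ktij}$ with respect to $\beta_1$ is given by

\[
\begin{aligned}
\frac{\partial p_{ktij}}{\partial \beta_1} = \Bigg[
&-\phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)
+ \phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) \\
&+ \phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)
- \phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)
\Bigg]\, x_{1kt}^T,
\end{aligned}
\]

where $\phi(\cdot)$ is the probability density function of the standard normal distribution. The partial derivative of $p_{ktij}$ with respect to a threshold value $a_{i'}$, $i' = 1, \ldots, r - 1$, is

\[
\begin{aligned}
\frac{\partial p_{ktij}}{\partial a_{i'}} =\;
&\delta_{i,i'}\,\phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)
- \delta_{i-1,i'}\,\phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) \\
&- \delta_{i,i'}\,\phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)
+ \delta_{i-1,i'}\,\phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right),
\end{aligned}
\]

where $\delta_{i,i'} = 1$ if $i = i'$ and 0 otherwise. Differentiating $p_{ktij}$ with respect to $\beta_2$ and the threshold values $b_{j'}$, $j' = 1, \ldots, s - 1$, can be done analogously. Differentiating $p_{ktij}$ with respect to $\beta_3$ through the chain rule gives

\[
\begin{aligned}
\frac{\partial p_{ktij}}{\partial \beta_3} = \big[
&\phi_{\rho_{kt}}(a_i - e_{1kt},\, b_j - e_{2kt}) - \phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\, b_j - e_{2kt}) \\
&- \phi_{\rho_{kt}}(a_i - e_{1kt},\, b_{j-1} - e_{2kt}) + \phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\, b_{j-1} - e_{2kt})
\big]\, \frac{2\, x_{3kt}^T\, e^{x_{3kt}\beta_3}}{(1 + e^{x_{3kt}\beta_3})^2},
\end{aligned}
\]

where $\phi_{\rho_{kt}}(\cdot,\cdot)$ is the bivariate standard normal density with correlation $\rho_{kt}$. Differentiating $p_{ktij}$ with respect to $\mathrm{vec}(T) = (\mathrm{vec}(T_{11}), \mathrm{vec}(T_{21}), \mathrm{vec}(T_{22}))$ requires the computation of $\partial p_{ktij}/\partial e_{lkt}$ (derived above); $\partial e_{lkt}/\partial \mathrm{vec}(T_{ll}) = (\theta_{lk} \otimes z_{lkt})\, J_{r_l}^T$, $l = 1, 2$; and $\partial e_{2kt}/\partial \mathrm{vec}(T_{21}) = \theta_{1k} \otimes z_{2kt}$, where $\otimes$ is the direct product operator.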
As noted by Hedeker and Gibbons (1994), $J_{r_l}^T$ is the transformation matrix of Magnus (1988), with dimension $r_l(r_l + 1)/2 \times r_l^2$, which eliminates the elements above the main diagonal.

The likelihood equations can be solved using a quasi-Newton approach where $\Theta_\gamma$, the parameter at step $\gamma$, is updated as follows:

\[
\Theta_{\gamma + 1} = \Theta_\gamma + I_e^{-1}(\Theta_\gamma; y)\, \frac{\partial \log L}{\partial \Theta_\gamma},
\]

where $I_e(\Theta_\gamma; y)$, an empirical and consistent estimator of the information matrix at step $\gamma$ (derived in Appendix B), is given by

\[
I_e(\Theta_\gamma; y) = \sum_{k=1}^{N} L^{-2}(y_k)\, \frac{\partial L(y_k)}{\partial \Theta_\gamma} \left(\frac{\partial L(y_k)}{\partial \Theta_\gamma}\right)^T - N\, S S^T, \tag{8}
\]

with $S = \dfrac{1}{N} \sum_{k=1}^{N} L^{-1}(y_k)\, \dfrac{\partial L(y_k)}{\partial \Theta_\gamma}$. At the optimum point, i.e., $\gamma = \gamma_{\max}$,

\[
I_e(\Theta_{\gamma_{\max}}; y) = \sum_{k=1}^{N} L^{-2}(y_k)\, \frac{\partial L(y_k)}{\partial \Theta_{\gamma_{\max}}} \left(\frac{\partial L(y_k)}{\partial \Theta_{\gamma_{\max}}}\right)^T
\]

can be inverted to get the asymptotic variance-covariance matrix of the model parameter estimates.

3.1 Adaptive Gaussian quadrature and computations

Gaussian quadrature rules are used to approximate integrals of functions with respect to a given kernel by a weighted average of the integrand evaluated at predetermined abscissas, as in Pinheiro and Bates (2000). This methodology relies on the concept of orthogonal functions, for which a high degree of accuracy is attained when the integrand is sufficiently smooth. The weights and abscissas used in Gaussian quadrature rules for the most common kernels, including the normal kernel, can be obtained from the tables of Abramowitz and Stegun (1964). If $Q$ univariate quadrature points are requested, then an $(r_1 + r_2)$-dimensional integral requires $Q^{r_1 + r_2}$ multivariate points $P_q^T = (P_{q1}, P_{q2}, \ldots, P_{q, r_1 + r_2})$, $q = 1, \ldots, Q^{r_1 + r_2}$, with associated weights given by the product of the corresponding univariate weights, $\Pi(P_q) = \prod_{h=1}^{r_1 + r_2} \Pi(P_{qh})$. As the number of random effects, $r_1 + r_2$, increases, the number of multidimensional quadrature points increases exponentially. However, several authors have reported that the number of points in each univariate dimension can be reduced for higher dimensional integrals without impairing the accuracy of the approximations.
For example, we found that as few as five points per dimension were sufficient to obtain adequate accuracy when random intercept models were fitted. Hence, the contribution $L(y_k)$ of subject $k$ to the likelihood can be approximated by

\[
L(y_k) \approx \pi^{-(r_1 + r_2)/2} \sum_{q=1}^{Q^{r_1 + r_2}} L(y_k \mid \sqrt{2}\, P_q)\, \Pi(P_q). \tag{9}
\]

Gaussian quadrature, as noted by Pinheiro and Bates (2000), can be viewed as a deterministic version of Monte Carlo integration in which random samples of $\theta_k$ are generated from the $N(0, I_{r_1 + r_2})$ distribution. In a pure Gaussian quadrature approach, the quadrature points and the corresponding weights are fixed beforehand, but in Monte Carlo, they are left to random choice. Because importance sampling tends to be much more efficient than simple Monte Carlo integration, we consider the equivalent of importance sampling in the Gaussian quadrature context, which Pinheiro and Bates (2000) term adaptive Gaussian quadrature. Here, the grid of abscissas in the scale of $\theta_k$ is centered around the conditional mode $\hat\theta_k$ rather than 0, as in (9). We recall that

\[
L(y_k) = (2\pi)^{-(r_1 + r_2)/2} \int_{\mathbb{R}^{r_1 + r_2}} \exp\left(\log L(y_k \mid \theta_k) - \|\theta_k\|^2/2\right) d\theta_k.
\]

For ease of notation, let

\[
H(y_k, \hat\theta_k) = -\left.\frac{\partial^2 h(y_k, \theta_k)}{\partial \theta_k\, \partial \theta_k^T}\right|_{\theta_k = \hat\theta_k},
\]

where $h(y_k, \theta_k) = \log L(y_k \mid \theta_k) - \|\theta_k\|^2/2$ and $\hat\theta_k = \arg\max_{\theta_k} h(y_k, \theta_k)$. In addition, we consider the scaling of $\theta_k$ using $H^{-1/2}(y_k, \hat\theta_k)$, obtained from the Cholesky decomposition of $H(y_k, \hat\theta_k)$, as follows: $\theta_k = \hat\theta_k + H^{-1/2}(y_k, \hat\theta_k)\, \theta_k^*$, where $\theta_k^*$ denotes the rescaled variable. The adaptive Gaussian quadrature is then given by

\[
L(y_k) \approx (2\pi)^{-(r_1 + r_2)/2}\, |H(y_k, \hat\theta_k)|^{-1/2} \sum_{q=1}^{Q^{r_1 + r_2}} \exp\left\{h\left(y_k,\, \hat\theta_k + H^{-1/2}(y_k, \hat\theta_k) P_q\right) + \|P_q\|^2\right\} \Pi(P_q),
\]

where the factor $|H(y_k, \hat\theta_k)|^{-1/2}$ is the Jacobian of the change of variables.

3.2 Goodness-of-fit

As the model is likelihood-based, likelihood-ratio test (LRT) statistics can be used to test for statistical significance of the fixed effects terms in the model. However, testing for the need for, or a reduction in the dimensionality of, the random effects results in a non-standard problem since the null set does not lie in the interior of the parameter space (Chernoff, 1954). An informal assessment of the model fit can be performed by comparing the observed proportions to the fitted probabilities using expressions (6) and (7), marginally and jointly for the two outcomes. Others have used this approach for monitoring random effects models (see e.g. Legler and Ryan, 1997; Hedeker and Gibbons, 1994; and Ten Have and Morabia, 1999).

4. Example

To illustrate the application of the proposed model to longitudinal bivariate ordinal outcomes, we examined data collected in the cardiovascular educational study. These data were previously analyzed by Ten Have and Morabia (1999) with outcomes re-coded to binary scales (negative vs. no change or positive changes). Modeling the data as binary outcomes results in a loss of information, although binary outcomes are more convenient to handle from a medical standpoint. As noted by these authors, the data set contains missing outcomes at the 4-month visit and at the 12-month visit. Specifically, out of 266 subjects, there were 208 subjects with reported outcomes at the 4-month visit and 243 subjects at the 12-month visit. Missingness was less severe at 12 months due to additional efforts by the study coordinators to contact study patients at 12 months. Preliminary analyses were performed to see if the missing visits were related to the unobserved outcomes. For example, analyses of the dropout indicator, at 4 months and at 12 months, using the blood pressure and cholesterol level at the previous visit as covariates, were performed. No association involving these covariates was found at a 5% significance level.
Therefore, the missingness in these data was considered to be ignorable. Under such conditions, maximum likelihood estimation yields consistent and unbiased estimates (Rubin, 1976).

From a statistical perspective, the effects under investigation, for each component of the model, are intervention (audio intervention being the reference group), time (12 months being the reference time), and intervention-time interaction. The intervention-time interaction effect corresponds to the difference in the intervention effect between the two follow-up times. The proposed latent-variable model with correlated random intercept terms, denoted as MEBO1, is first fitted to the data. A simpler version of MEBO1, denoted as MEBO2, which assumes independence between the random effects of each marginal component, is also considered. The aim here is to assess the robustness of the model estimates under MEBO1. A third model, referred to as IBO, is a naive bivariate model in that it omits the random effects parameters while assuming the same fixed effects structure as MEBO1. Results are also presented for two separate random intercept models (MEUO1 and MEUO2), one for each response, to assess the robustness of the marginal probit estimates when the outcomes are jointly analyzed. The results obtained from these last models were exactly the same as the ones provided by MIXOR (a FORTRAN program for modeling univariate ordinal outcomes in time, proposed by Hedeker and Gibbons, 1994). Each of the five models, estimated using GAUSS (Aptech, 1990) and displayed in Table 1, contains all these effects in each component.

Sensitivity of the estimates. As expected, the impact of including random effects in the bivariate probit model (MEBO1) is similar to the impact reported in the literature for the univariate ordinal mixed effects model. The estimates and standard errors under MEBO1 consistently exceed the corresponding population-averaged counterparts (IBO), reflecting the heterogeneity among subjects as noted by Zeger et al. (1988). We should note, however, that this trend is not observed for the estimates of the correlation coefficient $\rho_{kt}$. A theoretical explanation for this is yet to be found.
Table 1 also reveals that the marginal probit parameter estimates and standard errors under MEBO1 are very similar to the analogous estimates and standard errors under the univariate models MEUO1 and MEUO2. However, this is not necessarily an indication of orthogonality between the marginal probit and the correlation coefficient parameter estimates, for at least one reason: the marginal probit standard errors under MEBO1 are smaller than the analogous standard errors under MEUO1 and MEUO2, in contrast to what is expected for the bivariate probit model without random effects. Finally, the estimates under model MEBO2, which assumes independence between the random effects, are very close to those under model MEBO1. In fact, the observed value of the LRT statistic is less than the critical point based on a χ² distribution with 1 degree of freedom. This suggests that ignoring the cross-sectional by longitudinal association has little consequence in these data.

[Table 1 about here.]

Model interpretations. Under the MEBO1 model, the intervention-by-time interaction effect on the probit for cholesterol (standard error 0.2728) is not significant at the 5% level (p = 0.1571). After this estimate is rescaled to obtain the population-averaged parameter estimate, subjects in the audio-intervention arm, in spite of the lack of statistical significance, exhibit higher probabilities of a positive change or at least no change in cholesterol status than subjects in the non-intervention arm at 4 months, and this gap between the two intervention groups is greater at 12 months. Under the same model, the intervention-by-time interaction effect on the probit for blood pressure (standard error 0.2666) is significant at the 5% level (p = 0.025). Again after rescaling to the population-averaged scale, subjects in the audio-intervention arm exhibit higher probabilities of a positive change or at least no change in blood pressure status than subjects in the non-intervention arm at 12 months. At 4 months, however, participants in the non-intervention arm perform better. The intervention therefore becomes important when additional educational materials are introduced between 4 and 12 months.

[Table 2 about here.]

Goodness-of-fit. As an informal assessment of goodness-of-fit, Table 2 compares the observed cumulative proportions with the MEBO1-based fitted cumulative probabilities for each intervention by time combination. The observed and predicted marginal and joint proportions agree very well, which is to be expected given that all three components of the model include the intervention-by-time interaction.

5. Discussion

We have described latent-variable models for analyzing longitudinal bivariate ordinal outcomes which provide both person-specific and marginal covariate-response interpretations of the fixed effects model parameters. The model formulation has two stages. First, the ordinal response is related to a continuous latent variable via the threshold concept. Second, a classical multivariate mixed model for the latent response is formulated. Assuming conditional independence given the random effects terms, the marginal likelihood for the ordinal outcomes is approximated using an adaptive Gauss-Hermite quadrature approach to numerical integration.

An important feature of these models is that they allow irregularly spaced measurements across time, time-dependent and time-independent covariates, and ignorable missing data. Although the data set used to illustrate our methodology has no continuous covariate, our estimation approach can in principle accommodate them. For example, if the data set is highly unbalanced with respect to time, we can easily fit a random intercept and a random slope with respect to time to model each subject profile.

Although the proposed models are motivated by the two-stage modeling approach, it is the marginal models that are fitted to the data. Hence, inferences based on the marginal models do not explicitly assume the presence of random effects representing the natural heterogeneity among subjects. However, competing models with the same marginal fit yield the same maximized likelihood and equal fixed effects estimates.

An important question with respect to our approach is whether we actually believe in the threshold model and the unobserved latent variable, or whether we merely use it as a device to handle ordinal data. Several authors have reported that latent-variable models require large data sets in order to estimate all model parameters (see e.g., Garrett and Zeger, 2001). In many instances, it is unclear whether there is enough data to estimate the model parameters uniquely or with any precision.
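The quadrature step and the rescaling of subject-specific effects mentioned above can be illustrated in a simplified setting. The sketch below is not the authors' program: it assumes a single ordinal outcome with a normal random intercept of standard deviation `sigma`, and it uses plain (non-adaptive) Gauss-Hermite quadrature with a tabulated 10-point rule rather than the adaptive version used in the paper. The values of `threshold`, `beta` and `sigma` are hypothetical. Under these assumptions, the quadrature approximation of the marginal cumulative probability should agree with the closed form implied by the univariate case of Theorem 1 in Appendix A, in which the probit coefficient is divided by the square root of 1 + sigma².

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

# Tabulated 10-point Gauss-Hermite rule for the weight exp(-x^2):
# (node, weight) pairs for the positive nodes; the rule is symmetric about 0.
GH10_HALF = [
    (0.3429013272, 6.108626337e-01),
    (1.0366108298, 2.401386111e-01),
    (1.7566836493, 3.387439446e-02),
    (2.5327316742, 1.343645747e-03),
    (3.4361591188, 7.640432855e-06),
]


def marginal_prob_quadrature(eta, sigma):
    """Approximate E[Phi(eta + d)] for d ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    total = 0.0
    for node, weight in GH10_HALF:
        for x in (node, -node):
            # The substitution d = sqrt(2) * sigma * x maps the normal
            # density of d onto the weight function exp(-x^2).
            total += weight * PHI(eta + math.sqrt(2.0) * sigma * x)
    return total / math.sqrt(math.pi)


def marginal_prob_closed_form(eta, sigma):
    """Closed form: the latent variance becomes 1 + sigma^2 after integration."""
    return PHI(eta / math.sqrt(1.0 + sigma ** 2))


def population_averaged(beta, sigma):
    """Rescale a subject-specific probit coefficient to the marginal scale."""
    return beta / math.sqrt(1.0 + sigma ** 2)


# Hypothetical parameter values, chosen only for illustration.
threshold, beta, sigma = -0.5, 0.6, 0.8
for x in (0.0, 1.0):
    eta = threshold + beta * x
    print(x, marginal_prob_quadrature(eta, sigma), marginal_prob_closed_form(eta, sigma))
print("population-averaged coefficient:", population_averaged(beta, sigma))
```

In the bivariate model the integral runs over a vector of random effects and the integrand is a bivariate normal probability, so the same idea applies with a product rule over the nodes; adaptive quadrature additionally recentres and rescales the nodes for each subject.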
To deal with this issue, we fitted the proposed model using different parameter starting values to check whether the model is identifiable. This technique is admittedly empirical and heuristic, so there is a need to develop a formal procedure for monitoring latent-variable models. Garrett and Zeger (2001) have proposed a Bayesian approach in which the posterior distribution of each parameter is compared to the prior distribution. Other authors have also used this Bayesian approach, under the name of Rubin's posterior predictive check, for model monitoring when tools such as the LRT are not applicable. Typically, the limiting distribution of the LRT statistic is not valid for testing the need for random effects, or a reduction of their dimension, because the regularity condition that the null hypothesis lie in the interior of the parameter space is not met.

Though our models were built from the features of the data at hand, they apply generally to situations where multiple ordinal outcomes are recorded over time. However, in some situations, such as the presence of informative missing data, the proposed models need to be extended accordingly. The model could also be extended by assuming that the random effects terms follow a non-parametric distribution for which the support points and the frequencies have to be estimated from the data. Even the number of support points could be estimated. This will lead to larger standard errors for the parameters, as the normality assumptions bring additional information to the model formulation. All these issues will be the focus of future work.

Acknowledgements

The first author wishes to thank Dr Thomas Ten Have for permission to use data from the cardiovascular educational program study. Financial support for this work was provided by the University of Wisconsin-Madison through the Graduate School, Medical School and Comprehensive Cancer Center. The first author also acknowledges the financial support from the IBC/ENAR student award for the presentation of this work during the ENAR 2002 Spring meeting.

References

Aptech, S. (1990). GAUSS 2.1 User's manual. Kent, WA: Author.

Catalano, P. J. and Ryan, L. M. (1992). Bivariate latent variable models for clustered discrete and continuous outcomes. Journal of the American Statistical Association 87.

Chernoff, H. (1954). On the distribution of the likelihood ratio. Annals of Mathematical Statistics 25.

Dale, J. (1986). Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics 42.

Diggle, P., Liang, K. and Zeger, S. (1994). Analysis of longitudinal data. Clarendon Press, Oxford.

Fitzmaurice, G. M. and Laird, N. M. (1995). Regression models for a bivariate discrete and continuous outcome with clustering. Journal of the American Statistical Association 90.

Garrett, E. and Zeger, S. (2001). Assessing estimability of latent class models using a Bayesian estimation approach. In The impact of technology on biometrics. ENAR Spring Meeting.

Gibbons, R. and Bock, R. D. (1987). Trend in correlated proportions. Psychometrika 52.

Glonek, G. F. V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society Series B 57.

Hedeker, D. and Gibbons, R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics 50.

Kim, K. (1995). A bivariate cumulative probit regression model for ordered categorical data. Statistics in Medicine 14.

Kim, K. and Todem, D. (2000). A GAUSS program for fitting bivariate probit model with applications to ophthalmologic studies. Technical report, Department of Biostatistics, University of Wisconsin-Madison.

Legler, J. M. and Ryan, L. M. (1997). Latent variable model for teratogenesis using multiple binary outcomes. Journal of the American Statistical Association 92.

Lesaffre, E. and Kaufmann, H. (1992). Existence and uniqueness of the maximum likelihood estimator for a multivariate probit model. Journal of the American Statistical Association 87.

Lesaffre, E. and Molenberghs, G. (1991). Multivariate probit analysis: a neglected procedure in medical statistics. Statistics in Medicine 10.

Lesaffre, E., Todem, D. and Verbeke, G. (2000). Flexible modelling of the covariance matrix in a linear random effects model. Biometrical Journal 42.

Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73.

Lipsitz, S., Kim, K. and Zhao, L. (1994). Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine 13.

McCulloch, C. E. (1994). Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association 89.

Molenberghs, G. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association 89.

Morrell, C., Pearson, J. D. and Brant, L. J. (1997). Linear transformations of linear mixed effects models. The American Statistician 51.

O'Brien, P. C. (1984). Procedures for comparing multiple endpoints. Biometrics 40.

Ochi, Y. and Prentice, R. L. (1984). Likelihood inference in a correlated probit regression model. Biometrika 71.

Pinheiro, J. C. and Bates, D. M. (2000). Mixed effects models in S and S-plus. Springer, New York.

Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical Association 60.

Rubin, D. B. (1976). Inference and missing data. Biometrika 63.

Ten Have, T. and Morabia, A. (1999). Mixed effects models with bivariate and univariate association parameters for longitudinal bivariate binary response data. Biometrics 55.

Williamson, J. and Kim, K. (1996). A global odds ratio regression model for bivariate ordered categorical data from ophthalmologic studies. Statistics in Medicine 15.

Zeger, S. L., Liang, K.-Y. and Albert, P. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 44.

Appendix A

Computation of the marginal cumulative probabilities

In order to deduce the marginal probability distributions of $(Y_{1kt}, Y_{2kt})$, we let $\Phi_C(u)$ be the bivariate normal distribution function with argument $u$, mean vector 0 and covariance matrix $C$.

We also let $\phi_C(v, \mu)$ be the bivariate normal density function with argument $v$, mean vector $\mu$ and covariance matrix $C$.

Theorem 1. If

$$\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j \mid d_k) = \Phi_C(u + x_{kt}\beta + z_{kt} d_k) \tag{A.1}$$

where $u = (a_i, b_j)$ and $d_k$ is a random vector with mean $E(d_k)$ and covariance matrix $D$, then

$$\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j) = \Phi_{z_{kt} D z_{kt}^T + C}\big(u + x_{kt}\beta + z_{kt} E(d_k)\big) \tag{A.2}$$

Proof. Integrating out the random effects $d_k$ in (A.1), we find, after a variable transformation and a change of the order of integration,

$$\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j) = \int_{-\infty}^{a_i + x_{1kt}\beta_1} \int_{-\infty}^{b_j + x_{2kt}\beta_2} f(\omega)\, d\omega \tag{A.3}$$

where

$$f(\omega) = \int_{\mathbb{R}^{r_1 + r_2}} \phi_D(u, E(d_k))\, \phi_C(\omega, -z_{kt} u)\, du$$

with $\mathbb{R}^{r_1 + r_2}$ being the support space of the random effects $d_k$. Using standard results on normal distributions, one can show that

$$f(\omega) = \phi_{z_{kt} D z_{kt}^T + C}\big(\omega, -z_{kt} E(d_k)\big) \tag{A.4}$$

Inserting (A.4) in (A.3) yields (A.2), which concludes the proof.

Appendix B

Approximation to the observed information matrix

Assume the data $y = (y_1, y_2, \ldots, y_N)$ form a matrix of independent random vectors with a common probability density function $L(\Theta, y_k) \equiv L(y_k)$. The log-likelihood and the score function for $N$ independent subjects are given respectively by

$$\log L = \sum_{k=1}^{N} \log L(y_k) \quad \text{and} \quad S(\Theta; y) = \sum_{k=1}^{N} s(\Theta; y_k), \quad \text{where } s(\Theta; y_k) = \frac{\partial \log L(y_k)}{\partial \Theta}.$$

Hence the expected information matrix for the whole data is $I(\Theta) = N\, i(\Theta)$, with

$$i(\Theta) = E\big(s(\Theta; y_k)\, s^T(\Theta; y_k)\big) = \mathrm{Cov}\big(s(\Theta; y_k)\big)$$

being the information contained in a single observation. The empirical information in a single observation can be estimated by

$$\hat{i}(\Theta) = \frac{1}{N} \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k) - \frac{1}{N^2} S(\Theta; y)\, S^T(\Theta; y).$$

The empirical information matrix $I_e(\Theta; y)$ based on $N$ independent subjects and displayed in equation (8) is then given by

$$I_e(\Theta; y) = \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k) - \frac{1}{N} S(\Theta; y)\, S^T(\Theta; y),$$

which, for $\Theta = \hat{\Theta}$ (the MLE), reduces to

$$I_e(\hat{\Theta}; y) = \sum_{k=1}^{N} s(\hat{\Theta}; y_k)\, s^T(\hat{\Theta}; y_k)$$

since $S(\hat{\Theta}; y) = 0$. $I_e(\hat{\Theta}; y)$ is commonly used in practice to approximate the observed information matrix, which is difficult if not impossible to compute analytically; see for example Hedeker and Gibbons (1996) and Kim and Todem (2000). $I_e(\hat{\Theta}; y)/N$ is a consistent estimator of $i(\Theta)$ at the maximum likelihood estimate. The use of this estimator can also be justified in the following sense:

$$I(\Theta; y) = -\frac{\partial^2 \log L}{\partial \Theta\, \partial \Theta^T} = -\sum_{k=1}^{N} \frac{\partial^2 \log L(y_k)}{\partial \Theta\, \partial \Theta^T} = \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k) - \sum_{k=1}^{N} \frac{1}{L(y_k)} \frac{\partial^2 L(y_k)}{\partial \Theta\, \partial \Theta^T}$$

where $I(\Theta; y)$ is the observed information matrix, which is different from the expected version $I(\Theta)$ defined earlier. The second term on the right-hand side of the last equality has zero expectation. Hence,

$$I(\hat{\Theta}; y) \approx \sum_{k=1}^{N} s(\hat{\Theta}; y_k)\, s^T(\hat{\Theta}; y_k) = I_e(\hat{\Theta}; y)$$

where the accuracy of this approximation depends on how close $\hat{\Theta}$ is to $\Theta$. In the particular case of multinomial distributed data, Kim and Todem (2000) have shown that the second term on the right-hand side of the last equality above is zero, so that the approximation holds exactly.
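The multinomial case mentioned at the end of this appendix can be checked numerically. The sketch below is an illustration only, under assumptions that are not taken from the paper: each subject contributes one three-category observation, the free parameters are the first two category probabilities `p1` and `p2`, and the counts are hypothetical. In this setting the likelihood of one observation is linear in the parameters, so the sum of outer products of the individual scores should coincide with the observed information, and the resulting standard errors should match the familiar binomial-type formula.

```python
import math


def score(y, p1, p2):
    """Score of one observation y in {1, 2, 3} with respect to (p1, p2)."""
    p3 = 1.0 - p1 - p2
    if y == 1:
        return [1.0 / p1, 0.0]
    if y == 2:
        return [0.0, 1.0 / p2]
    return [-1.0 / p3, -1.0 / p3]


def neg_hessian(y, p1, p2):
    """Minus the second derivative matrix of log L(y) with respect to (p1, p2)."""
    p3 = 1.0 - p1 - p2
    if y == 1:
        return [[1.0 / p1 ** 2, 0.0], [0.0, 0.0]]
    if y == 2:
        return [[0.0, 0.0], [0.0, 1.0 / p2 ** 2]]
    c = 1.0 / p3 ** 2
    return [[c, c], [c, c]]


def empirical_information(data, p1, p2):
    """Sum over subjects of the outer product of the individual scores."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for y in data:
        s = score(y, p1, p2)
        for a in range(2):
            for b in range(2):
                info[a][b] += s[a] * s[b]
    return info


def observed_information(data, p1, p2):
    """Sum over subjects of minus the Hessian of the log-likelihood."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for y in data:
        h = neg_hessian(y, p1, p2)
        for a in range(2):
            for b in range(2):
                info[a][b] += h[a][b]
    return info


def standard_errors(info):
    """Square roots of the diagonal of the inverse of a 2 x 2 matrix."""
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return [math.sqrt(info[1][1] / det), math.sqrt(info[0][0] / det)]


# Hypothetical counts for the three categories.
data = [1] * 30 + [2] * 50 + [3] * 20
n = len(data)
p1_hat, p2_hat = data.count(1) / n, data.count(2) / n  # maximum likelihood estimates

I_e = empirical_information(data, p1_hat, p2_hat)
I_obs = observed_information(data, p1_hat, p2_hat)
print(I_e, I_obs, standard_errors(I_e))
```

For models with random effects the individual scores are not available in closed form and would themselves be obtained by numerical integration, but the accumulation of outer products proceeds in the same way.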

Figure 1. The correlation structure of the proposed model on the latent scale. [Diagram linking the latent variables W_1kt and W_2kt at two time points through the correlations ψ_11, ψ_12 and ψ_22.]

Table 1. Parameter estimates (standard errors) under the MEBO1, MEBO2, MEUO1, MEUO2 and IBO models fitted to the cardiovascular trial data

| Component | Parameter | MEBO1 | MEBO2 | MEUO1 | MEUO2 | IBO |
|---|---|---|---|---|---|---|
| CHOL | Thresh.1 | (0.2195) | (0.2187) | (0.2283) | | (0.1144) |
| | Thresh.2 - Thresh.1 | (0.2195) | (0.2186) | (0.2167) | | (0.0777) |
| | Interv. (audio) | (0.2252) | (0.2241) | (0.2309) | | (0.1534) |
| | Time (12 months) | (0.1796) | (0.1794) | (0.2059) | | (0.1883) |
| | Interv. by Time | (0.2728) | (0.2728) | (0.2713) | | (0.2735) |
| BP | Thresh.1 | (0.1481) | (0.1482) | | (0.1569) | (0.1089) |
| | Thresh.2 - Thresh.1 | (0.1234) | (0.1235) | | (0.1431) | (0.0661) |
| | Interv. (audio) | (0.2035) | (0.2031) | | (0.1943) | (0.1505) |
| | Time (12 months) | (0.1676) | (0.1674) | | (0.1657) | (0.1738) |
| | Interv. by Time | (0.2666) | (0.2641) | | (0.2368) | (0.2571) |
| CHOL-BP | Intercept | (0.4513) | (0.4330) | | | (0.2306) |
| | Interv. (audio) | (0.6213) | (0.6193) | | | (0.3251) |
| | Time (12 months) | (0.7528) | (0.7478) | | | (0.3953) |
| | Interv. by Time | (1.0037) | (1.0012) | | | (0.5341) |
| Random effects variance terms (Cholesky) | T11 | (0.1734) | (0.1720) | (0.1708) | | |
| | T21 | (0.1274) | | | | |
| | T22 | (0.1274) | (0.1275) | | (0.1422) | |
| log L | | | | | | |

MEBO1 and MEBO2: Mixed Effects Bivariate Ordinal model under correlated (T12 ≠ 0) and uncorrelated (T12 = 0) random effects respectively; MEUO1 and MEUO2: Mixed Effects Univariate Ordinal model for the first and second marginal respectively; IBO: Bivariate Ordinal model under independence.
CHOL and BP: probit parameters for the Cholesterol response and the Blood pressure response respectively; CHOL-BP: parameters (on the Fisher transformation scale) of the underlying correlation for the Cholesterol-Blood pressure response.
T11, T21 and T22: Cholesky decomposition of the random intercept variance terms (Var(d_1k) = T11², Cov(d_1k, d_2k) = T11 T21 and Var(d_2k) = T22² + T21²).

Table 2. Model-based (MEBO1) marginal and joint cumulative probabilities for changes in Cholesterol and Blood pressure status, with the corresponding observed cumulative proportions in parentheses

| Event | Intervention | Visit (months) | BP = 1 | BP ≤ 2 | Marginal(CHOL) |
|---|---|---|---|---|---|
| CHOL = 1 | No | | (0.0242) | (0.1048) | (0.1290) |
| | No | | (0.0099) | (0.0693) | (0.0990) |
| | Yes | | (0.0168) | (0.0756) | (0.1513) |
| | Yes | | (0.0467) | (0.1121) | (0.1682) |
| CHOL ≤ 2 | No | | (0.1371) | (0.4435) | (0.6210) |
| | No | | (0.1188) | (0.4059) | (0.6139) |
| | Yes | | (0.1261) | (0.4454) | (0.6891) |
| | Yes | | (0.1776) | (0.5794) | (0.7850) |
| Marginal(BP) | No | | (0.2177) | (0.7016) | |
| | No | | (0.1485) | (0.6535) | |
| | Yes | | (0.1765) | (0.6218) | |
| | Yes | | (0.2150) | (0.7383) | |

The changes in cholesterol status (CHOL) and blood pressure (BP) are three-level ordinal outcomes where 1 represents a positive change; 2 no change; and 3 a negative change.
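Because Table 2 reports cumulative quantities, the probability of an individual cell of the 3 x 3 cross-classification has to be recovered by differencing. The sketch below shows that arithmetic on the observed cumulative proportions listed first for the non-intervention arm; the function name `cells_from_cumulative` and the layout of the matrix `F_observed` are assumptions made for this illustration, with the marginal proportions placed in the last row and column and the bottom-right entry set to 1.

```python
def cells_from_cumulative(F):
    """Turn joint cumulative probabilities into cell probabilities.

    F[i][j] is pr(CHOL <= i + 1, BP <= j + 1); the last row and column hold
    the marginal cumulative probabilities and F[-1][-1] equals 1.
    """
    cells = []
    for i in range(len(F)):
        row = []
        for j in range(len(F[0])):
            up = F[i - 1][j] if i > 0 else 0.0
            left = F[i][j - 1] if j > 0 else 0.0
            corner = F[i - 1][j - 1] if i > 0 and j > 0 else 0.0
            # Inclusion-exclusion over the rectangle ending at (i, j).
            row.append(F[i][j] - up - left + corner)
        cells.append(row)
    return cells


# Observed cumulative proportions from the first non-intervention row
# of each block of Table 2.
F_observed = [
    [0.0242, 0.1048, 0.1290],  # CHOL = 1: BP = 1, BP <= 2, marginal
    [0.1371, 0.4435, 0.6210],  # CHOL <= 2
    [0.2177, 0.7016, 1.0000],  # marginal BP, and total
]

cells = cells_from_cumulative(F_observed)
for row in cells:
    print([round(p, 4) for p in row])
```

The same differencing applies to the model-based cumulative probabilities, so observed and fitted cell probabilities can be compared cell by cell as well as cumulatively.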
