Latent-Variable Models for Longitudinal Data with Bivariate Ordinal Outcomes


David Todem (1), KyungMann Kim (2) and Emmanuel Lesaffre (3)

(1) Department of Statistics, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706, U.S.A.
(2) Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 600 Highland Ave., Madison, WI 53792, U.S.A.
(3) Biostatistical Centre, K.U.Leuven, Kapucijnenvoer 35, B-3000 Leuven, Belgium

todem@stat.wisc.edu

Summary. We use the concept of a latent variable to derive the joint distribution of bivariate ordinal outcomes, and then extend the model to longitudinal data. Specifically, we relate the observed ordinal outcomes through threshold values to a bivariate latent variable, which is then modeled with a linear mixed model. Random effects terms tie together all repeated observations from the same subject. The cross-sectional association between the two outcomes is modeled through the correlation coefficient of the bivariate latent variable, conditional on the random effects. Assuming conditional independence given the random effects, the marginal likelihood, under the missing-at-random assumption, is approximated by numerical integration with adaptive Gaussian quadrature. The model provides fixed effects parameters that are subject-specific but retain a population-averaged interpretation when properly scaled. It is therefore particularly well suited to situations in which population comparisons and individual-level contrasts are of equal importance. Data from a randomized intervention trial of a cardiovascular educational program, where the responses of interest are changes in hypertension and hypercholesterolemia status, illustrate the proposed model. Generalization from bivariate to multivariate models is also discussed.

Key words: Adaptive Gaussian quadrature; Bivariate ordinal outcome; Importance sampling; Latent variable; Maximum marginal likelihood; Random effects; Repeated measures; Threshold values.

1. Introduction

Multiple outcomes are often used as primary endpoints in longitudinal studies. For example, growth curve studies measure not only the height of the child but also the weight. Often, length measurements of various parts of the body (e.g., arm, leg) are taken as well.
Another example is the panel study in sociology, in which multiple questions are posed to each subject to measure the outcomes of interest. The same is true in clinical trials, where the effect of a new treatment is usually evaluated from a bivariate response vector comprising an efficacy and a safety parameter. Additionally, subjects' responses to treatment are often classified according to an ordinal or graded scale, e.g.,

the visual analogue scale. An example of bivariate ordinal data in time is provided by a longitudinal trial of a cardiovascular educational program in which the responses of interest are changes in hypertension and hypercholesterolemia status. Each of the 266 participants in the analysis sample was randomized either to an audio-dietary educational intervention group or to a non-intervention (control) group. At each of the two follow-up times, 4 months and 12 months, the two outcomes were derived from changes in blood pressure and serum cholesterol from baseline. These outcomes were re-coded to ordinal scales according to whether there was a positive change (1), no change (2) or a negative change (3), based on the criteria established by the NIH-sponsored expert panels (NCEP, 1993; JNC, 1993). Not only were the marginal effects of the educational intervention on blood pressure and cholesterol of interest, but so were its effects on the association between these two outcomes (Ten Have and Morabia, 1999). It was also of interest to know whether these effects depended on time. The literature suggests that dietary interventions do not typically affect both blood pressure and cholesterol. To understand why, the effect of the intervention on the association between the two outcomes needs to be investigated.

Statistical analysis of bivariate ordinal outcome data in time raises a number of challenging issues. It is well known, for example, that repeated measurements from the same subject over time necessitate the use of methods for correlated data. Multiplicity of the outcomes at any time point is another important issue. Most modeling approaches in the literature either deal with cross-sectional data or are restricted to longitudinal univariate outcomes. Limited work has been done on bivariate ordinal outcomes in time. Although separate models can be fitted to each outcome, such an approach fails to borrow strength across the outcome variables.
By exploiting the correlation structure with a multivariate model, efficiency and power could be greatly increased (O'Brien, 1984). Several authors have developed maximum likelihood procedures for cross-sectional multivariate ordinal outcomes. Glonek and McCullagh (1995) and Molenberghs and Lesaffre (1994) approached the specification of the joint probability of multivariate observations through the first- and higher-order marginal parameters. One important advantage of this approach is that multivariate observations with unequal numbers of variables per subject may be analyzed quite naturally. Kim (1995) also considered maximum likelihood estimation for bivariate ordinal measures by using a constrained latent variable specific to ophthalmologic studies. Williamson and Kim (1996) further developed marginal mean regression techniques based on the Plackett (1965) and Dale (1986) global odds ratios as a measure of association. None of these models was meant for longitudinal outcomes, and therefore none could be used for multivariate ordered categorical data in time.

Ten Have and Morabia (1999) then extended the original Dale (1986) model to accommodate the time component in analyzing longitudinal bivariate binary outcomes. Their model uses global odds ratios to represent the cross-sectional association and random effects terms for the longitudinal association. They also included random effects in the global odds ratios, which can be seen as a means to construct a goodness-of-fit test for the basic starting model. Although this extended Dale model is certainly desirable when the interest is in studying the within-subject evolution, it does not accommodate population-averaged comparisons, because the logit link function does not yield a simple representation of the marginal means of the outcomes. Specifically, the fixed effects parameters have a subject-specific interpretation and describe on average how a subject's probability of experiencing the event of interest depends on time (Zeger, Liang and Albert, 1988). This may not be relevant in studies where population comparisons and within-subject comparisons are of equal importance. Another drawback of the logit link is that it does not reduce to the usual logit model when each marginal outcome has only a single random effect and only one observation per level of this random effect (McCulloch, 1994).
Finally, this model was restricted to bivariate binary outcomes in time. A number of authors have recently proposed regression techniques for longitudinal multinomial outcomes (Clayton, 1992; Gange, Linton, Scott, DeMets, and Klein, 1993; Miller, Davis, and Landis, 1994; Lipsitz, Kim and Zhao, 1994). These authors have predominantly applied the generalized estimating equations (GEE) of Liang and Zeger (1986) to proportional odds models for clustered ordinal responses. However, in longitudinal studies, this method is not necessarily

appropriate. Indeed, when the missing data mechanism is not completely at random, the standard GEE methods provide biased parameter estimates. Furthermore, the GEE-based approach, as a distribution-free methodology, does not lend itself to classical tools for model checking. Hence, the search for alternatives continues.

Our approach is based on the concept of a latent variable that allows full likelihood-based modeling of longitudinal bivariate ordinal responses with ignorable missing data. We generalize the work of Hedeker and Gibbons (1994) to allow for multiple outcomes, and that of Lesaffre and Molenberghs (1991), Lesaffre and Kaufmann (1992), and Kim (1995) to allow for longitudinal outcomes. Specifically, the random effects terms are introduced in the model via the latent variable to model the induced longitudinal association or other heterogeneity among subjects. The cross-sectional association is modeled through the correlation coefficient of the underlying multivariate normal latent variable, conditional on the random effects. Random effects can also be included in the correlation coefficient. This complicates the model considerably, but the inclusion is justified on two grounds: (1) the cross-sectional association is of interest in itself, certainly in a model for bivariate responses; and (2) the inclusion of a random effects structure can be seen as a means to construct a goodness-of-fit test for the basic starting model.

Ochi and Prentice (1984) have also developed a class of latent-variable models to construct equi-correlated probit models. One limitation of their approach is that it was restricted to multivariate binary outcomes, although its extension to ordinal outcomes is straightforward. Furthermore, their model did not accommodate multiple random effects and could not model negative correlations between the marginal outcomes.
Finally, our approach has the advantage over the Ochi and Prentice model that it does not require the mean to be constant within levels of the random effect. Latent-variable models have also been described in developmental toxicity studies, where the interest was to specify the joint distribution of discrete and continuous outcomes (see, e.g., Catalano and Ryan, 1992; Fitzmaurice and Laird, 1995). Our interest, however, is in bivariate ordinal outcomes.

In Section 2, we describe the mixed effects model for bivariate ordinal responses. We first describe the concept of a latent variable in the bivariate setting, and then extend the model

to accommodate longitudinal data. Section 3 describes the estimation procedure for the model parameters. In Section 4, the proposed model is applied to the example data set and compared to a population-averaged bivariate model assuming independence across time and to univariate probit random effects models. Finally, the usefulness and future extension of the model are discussed in Section 5.

2. A latent-variable model for bivariate ordinal data in time

2.1 Model formulation

Due to the lack of a natural distribution for ordinal outcomes, it is conceptually convenient to assume that the observed ordinal responses $Y = (Y_1, Y_2)^T$ are generated from an underlying latent variable $W = (W_1, W_2)^T$ with two sets of threshold values $a = (a_1, a_2, \ldots, a_{r-1})^T$ and $b = (b_1, b_2, \ldots, b_{s-1})^T$, where $r$ and $s$ represent the number of ordinal levels for the first and the second marginal outcome, respectively. Specifically, the univariate responses $Y_{1kt}$ and $Y_{2kt}$ for unit $k$ (a subject in the longitudinal setting, a cluster in the repeated-data setting) at time point $\tau_{kt}$ fall in categories $i$ and $j$, respectively, if the first component $W_{1kt}$ of the latent response lies between $a_{i-1}$ and $a_i$, and the second component $W_{2kt}$ lies between $b_{j-1}$ and $b_j$. Hence, letting $a_0 = b_0 = -\infty$ and $a_r = b_s = +\infty$, the model formulation at the first stage is given by

$$(Y_{1kt} = i,\ Y_{2kt} = j) \iff (a_{i-1} \le W_{1kt} < a_i,\ b_{j-1} \le W_{2kt} < b_j), \quad \forall\, i, j:\ 1 \le i \le r,\ 1 \le j \le s. \qquad (1)$$

For the cumulative events, we get

$$(Y_{1kt} \le i,\ Y_{2kt} \le j) \iff (W_{1kt} < a_i,\ W_{2kt} < b_j), \quad \forall\, i, j:\ 1 \le i \le r,\ 1 \le j \le s. \qquad (2)$$

The threshold values must be monotonically increasing to reflect the ordinal nature of the observed outcomes. For a binary outcome, only one threshold value, representing the usual intercept, is needed. At the second stage, we consider a mixed effects regression model for the bivariate latent response as follows:

$$W_{lkt} = x_{lkt}\beta_l + z_{lkt} d_{lk} + \varepsilon_{lkt},\quad l = 1, 2; \qquad d_k \sim N(0, D) \ \text{and}\ \varepsilon_{kt} \sim N(0, H_{kt}). \qquad (3)$$
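The two stages in equations (1) and (3) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes scalar random intercepts drawn independently for the two outcomes, unit residual variances, and made-up thresholds and linear predictors. The function names `to_category` and `simulate_subject` are hypothetical.

```python
import bisect
import random


def to_category(w, thresholds):
    """Map a latent value w to an ordinal category 1..len(thresholds)+1.

    Category i is returned when a_{i-1} <= w < a_i, with a_0 = -inf and
    a_r = +inf, matching equation (1).
    """
    return bisect.bisect_right(thresholds, w) + 1


def simulate_subject(x_beta1, x_beta2, a, b, sd1, sd2, rho, rng):
    """Simulate one subject's bivariate ordinal responses over time.

    x_beta1[t] and x_beta2[t] are the fixed-effect linear predictors at
    time t. The random intercepts d1 and d2 are drawn independently
    (an illustrative simplification); the residual pair has unit
    variances and correlation rho.
    """
    d1 = rng.gauss(0.0, sd1)
    d2 = rng.gauss(0.0, sd2)
    responses = []
    for m1, m2 in zip(x_beta1, x_beta2):
        e1 = rng.gauss(0.0, 1.0)
        # Construct e2 so that Var(e2) = 1 and Corr(e1, e2) = rho.
        e2 = rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        w1 = m1 + d1 + e1
        w2 = m2 + d2 + e2
        responses.append((to_category(w1, a), to_category(w2, b)))
    return responses


# Illustrative values only: two time points, three categories per outcome.
rng = random.Random(0)
y = simulate_subject([0.2, 0.5], [-0.1, 0.3], a=[-0.5, 0.5], b=[-0.4, 0.6],
                     sd1=0.8, sd2=0.6, rho=0.3, rng=rng)
print(y)
```

Each returned pair is one time point's observed categories; the same draw of the random intercepts is shared across time points, which is what induces the longitudinal association.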

We further assume that $d_k = (d_{1k}^T, d_{2k}^T)^T$ and $\varepsilon_{kt} = (\varepsilon_{1kt}, \varepsilon_{2kt})^T$ are independent. Here $x_{1kt}$ and $x_{2kt}$ are the $1 \times q_1$ and $1 \times q_2$ design row vectors for the fixed effects, with associated column vectors of slopes $\beta_1$ and $\beta_2$, respectively. Also, $z_{1kt}$ and $z_{2kt}$ are the $1 \times r_1$ and $1 \times r_2$ design row vectors for the random effects, associated with the column vectors of unknown random effects $d_{1k}$ and $d_{2k}$, respectively. The vector $\varepsilon_{kt}$ is the vector of residuals, with covariance matrix

$$H_{kt} = \begin{bmatrix} \sigma_1^2 & \rho_{kt}\sigma_1\sigma_2 \\ \rho_{kt}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}.$$

The parameters $\beta_1$ and $\beta_2$ are common to all subjects, while $d_{1k}$ and $d_{2k}$ are subject-specific. For subject $k$, $z_{1k}$ and $z_{2k}$ often contain fixed (time-independent) subject-specific covariates, but time-varying covariates are also possible, as indicated by Lesaffre, Todem and Verbeke (2000). To obtain a well-formulated model in the sense of Morrell, Pearson and Brant (1997), these matrices are restricted to satisfy the conditions $\mathrm{rank}(x_{lk}\ z_{lk}) = \mathrm{rank}(x_{lk})$, $l = 1, 2$, where $(x_{lk}\ z_{lk})$ represents the matrix obtained by stacking the matrices $x_{lk}$ and $z_{lk}$.

To complete the model formulation, the correlation coefficient $\rho_{kt}$ of the bivariate latent variable given $d_k$, $[W_{kt} \mid d_k]$, may depend on covariates, with a design row vector $x_{3kt}$ and corresponding slope vector $\beta_3$. This correlation coefficient is modeled using the Fisher transformation,

$$\log\left(\frac{1 + \rho_{kt}}{1 - \rho_{kt}}\right) = x_{3kt}\beta_3,$$

to ensure that $-1 < \rho_{kt} < 1$. The parameters $\beta_l$ and $\sigma_l$, $l = 1, 2$, are not jointly identifiable, but the ratios $\beta_l/\sigma_l$, $l = 1, 2$, are estimable (Catalano and Ryan, 1992). Hence, the bivariate latent response, conditional on $d_k$, can be re-scaled by assuming that $\sigma_1 = \sigma_2 = 1$. Furthermore, following Gibbons and Bock (1987), it is useful to orthogonalize the random effects by letting $d_k = T\theta_k$, where $T$, a lower triangular matrix with positive diagonal elements, is the Cholesky factor of $D$, i.e., $TT^T = D$.
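The link for the correlation coefficient can be inverted in closed form: solving $\log\{(1+\rho)/(1-\rho)\} = \eta$ gives $\rho = (e^{\eta} - 1)/(e^{\eta} + 1) = \tanh(\eta/2)$. A minimal sketch follows, with hypothetical function names, that maps a linear predictor $\eta = x_{3kt}\beta_3$ to $\rho_{kt}$ and back and evaluates the derivative $d\rho/d\eta = 2e^{\eta}/(1+e^{\eta})^2$.

```python
import math


def rho_from_eta(eta):
    """Invert log((1 + rho) / (1 - rho)) = eta, i.e. rho = tanh(eta / 2)."""
    return math.tanh(eta / 2.0)


def eta_from_rho(rho):
    """Transformation used in the model, defined for -1 < rho < 1."""
    return math.log((1.0 + rho) / (1.0 - rho))


def drho_deta(eta):
    """Derivative of rho with respect to eta: 2 e^eta / (1 + e^eta)^2."""
    return 2.0 * math.exp(eta) / (1.0 + math.exp(eta)) ** 2


# Any real-valued linear predictor yields a correlation strictly inside (-1, 1).
for eta in (-3.0, 0.0, 1.5):
    rho = rho_from_eta(eta)
    print(eta, round(rho, 4), round(eta_from_rho(rho), 4))
```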
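The re-parameterized model is then given by

$$E(W_k \mid \theta_k) = x_k\beta + z_k T\theta_k, \qquad \log\left(\frac{1 + \rho_k}{1 - \rho_k}\right) = x_{3k}\beta_3, \qquad \theta_k \sim N(0, I_{r_1 + r_2}) \ \text{and}\ TT^T = D, \qquad (4)$$

where

$$x_k = \begin{bmatrix} x_{1k} & 0 \\ 0 & x_{2k} \end{bmatrix}, \quad z_k = \begin{bmatrix} z_{1k} & 0 \\ 0 & z_{2k} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix},$$

and $I_{r_1 + r_2}$ is the identity matrix of order $r_1 + r_2$. The covariance matrix for subject $k$ is equal to $V_k = z_k D z_k^T + H_k$, where $H_k = \mathrm{Diag}(H_{k1}, H_{k2}, \ldots, H_{kn_k})$.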

To compute analytically the partial derivatives of the log-likelihood with respect to the Cholesky factor $T$ of $D$, the matrix $T$ and the corresponding vector of independent random effects $\theta_k$ are decomposed respectively as

$$T = \begin{bmatrix} T_{11} & 0 \\ T_{21} & T_{22} \end{bmatrix} \quad \text{and} \quad \theta_k = \begin{bmatrix} \theta_{1k} \\ \theta_{2k} \end{bmatrix}.$$

The model as written in (4) yields the following regression model for each marginal component with uncorrelated random effects:

$$E(W_{1kt} \mid \theta_k) = x_{1kt}\beta_1 + z_{1kt} T_{11}\theta_{1k}, \qquad E(W_{2kt} \mid \theta_k) = x_{2kt}\beta_2 + z_{2kt}(T_{21}\theta_{1k} + T_{22}\theta_{2k}).$$

2.2 Features of the model and parameter interpretation

The model as described above accounts for the cross-sectional, the longitudinal and the cross-sectional-by-longitudinal associations as follows:

$$\mathrm{Corr}(W_{1kt}, W_{2kt}) = \frac{\rho_{kt} + z_{1kt}\,\mathrm{Cov}(d_{1k}, d_{2k})\,z_{2kt}^T}{\sqrt{\left(1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T\right)\left(1 + z_{2kt}\mathrm{Var}(d_{2k})z_{2kt}^T\right)}},$$

$$\mathrm{Corr}(W_{lkt}, W_{lkt'}) = \frac{z_{lkt}\,\mathrm{Var}(d_{lk})\,z_{lkt'}^T}{\sqrt{\left(1 + z_{lkt}\mathrm{Var}(d_{lk})z_{lkt}^T\right)\left(1 + z_{lkt'}\mathrm{Var}(d_{lk})z_{lkt'}^T\right)}}, \quad l = 1, 2;\ t \ne t',$$

$$\mathrm{Corr}(W_{1kt}, W_{2kt'}) = \frac{z_{1kt}\,\mathrm{Cov}(d_{1k}, d_{2k})\,z_{2kt'}^T}{\sqrt{\left(1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T\right)\left(1 + z_{2kt'}\mathrm{Var}(d_{2k})z_{2kt'}^T\right)}}, \quad t \ne t'.$$

The correlation coefficient $\rho_{kt}$ accounts implicitly for the cross-sectional association between the two actual outcomes given the subject-specific random effects $d_k$. The multivariate nature of the (subject) random effects accounts for both the longitudinal association and the cross-sectional-by-longitudinal association, through the variances and the covariance, respectively, of the marginal components $d_{1k}$ and $d_{2k}$. The correlation structure of the model, captured through $\psi_{ll'ktt'} = \mathrm{Corr}(W_{lkt}, W_{l'kt'})$, $l, l' = 1, 2$; $t, t' = 1, 2, \ldots, n_k$, can be represented schematically by considering the two latent outcomes at two time points $t$ and $t'$, as shown in Figure 1.

[Figure 1 about here.]

On the latent scale, $E(W_{lkt} \mid \theta_k) = x_{lkt}\beta_l + z_{lkt}T\theta_k$ and $E(W_{lkt}) = x_{lkt}\beta_l$, $l = 1, 2$. Therefore, the fixed effects parameters have both subject-specific and marginal interpretations, as is typical in linear mixed models (see Diggle et al., 1994).
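For the special case of one random intercept per outcome ($z_{1kt} = z_{2kt} = 1$), the Cholesky factor and the latent-scale correlations above reduce to scalar arithmetic. The sketch below assumes that special case and uses made-up values for $D$ and $\rho_{kt}$; the function names are hypothetical.

```python
import math


def cholesky_2x2(var1, cov12, var2):
    """Lower-triangular factor (t11, t21, t22) with T T^T = D for a 2x2 D."""
    t11 = math.sqrt(var1)
    t21 = cov12 / t11
    t22 = math.sqrt(var2 - t21 ** 2)
    return t11, t21, t22


def latent_correlations(rho, t11, t21, t22):
    """Latent-scale correlations for scalar random intercepts (z = 1)."""
    var1 = t11 ** 2                # Var(d_1k)
    var2 = t21 ** 2 + t22 ** 2     # Var(d_2k)
    cov12 = t11 * t21              # Cov(d_1k, d_2k)
    scale = math.sqrt((1.0 + var1) * (1.0 + var2))
    return {
        "cross_sectional": (rho + cov12) / scale,
        "longitudinal_1": var1 / (1.0 + var1),
        "longitudinal_2": var2 / (1.0 + var2),
        "cross_by_longitudinal": cov12 / scale,
    }


# Illustrative values: D = [[1.0, 0.3], [0.3, 0.5]] and rho = 0.2.
t11, t21, t22 = cholesky_2x2(1.0, 0.3, 0.5)
corr = latent_correlations(0.2, t11, t21, t22)
print(corr)
```

Setting `t21 = 0` makes the two random intercepts independent, in which case the cross-sectional-by-longitudinal correlation vanishes while the longitudinal correlations are unchanged.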

This does not hold on the data scale. However, population-averaged parameters can be expressed as a factor of subject-specific parameters and are therefore equivalent with respect to model testing and reduction. From equation (2), the cumulative distribution of $(Y_{1kt}, Y_{2kt})$ conditional on the random effects is given by

$$\mathrm{pr}(Y_{1kt} \le i,\ Y_{2kt} \le j \mid d_k) = \Phi_{\rho_{kt}}\left(a_i - x_{1kt}\beta_1 - z_{1kt}d_{1k},\ b_j - x_{2kt}\beta_2 - z_{2kt}d_{2k}\right). \qquad (5)$$

Taking the expectation of the conditional probability in equation (5) with respect to the distribution of the random effects $d_k$ and using the results of the theorem given in Appendix A, we get

$$\mathrm{pr}(Y_{1kt} \le i,\ Y_{2kt} \le j) = \Phi_{\rho_{kt}}\left(\frac{a_i - x_{1kt}\beta_1}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}},\ \frac{b_j - x_{2kt}\beta_2}{\sqrt{1 + z_{2kt}\mathrm{Var}(d_{2k})z_{2kt}^T}}\right), \qquad (6)$$

which gives for each marginal component

$$\mathrm{pr}(Y_{1kt} \le i) = \Phi\left(\frac{a_i - x_{1kt}\beta_1}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}}\right), \qquad \mathrm{pr}(Y_{2kt} \le j) = \Phi\left(\frac{b_j - x_{2kt}\beta_2}{\sqrt{1 + z_{2kt}\mathrm{Var}(d_{2k})z_{2kt}^T}}\right), \qquad (7)$$

where $\Phi(\cdot)$ and $\Phi_{\rho_{kt}}(\cdot)$ represent, respectively, the cumulative distribution function (CDF) of a univariate standard normal and of the bivariate standard normal with correlation coefficient $\rho_{kt}$. Hence, for the first marginal outcome, it is easily seen from equation (7) that the population-averaged parameters associated with $\beta_1$ and $a_i$ are smaller in magnitude, provided that $\mathrm{Var}(d_{1k})$ is positive and estimable, and are given respectively by

$$\frac{\beta_1}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}} \quad \text{and} \quad \frac{a_i}{\sqrt{1 + z_{1kt}\mathrm{Var}(d_{1k})z_{1kt}^T}}.$$

2.3 Generalization of the method to multivariate ordinal outcomes in time

The proposed model for bivariate ordinal outcomes can be easily extended to the case where $m$ ordered categorical outcomes $Y_{kt} = (Y_{1kt}, Y_{2kt}, \ldots, Y_{mkt})^T$ are observed over time. One typically assumes the existence of an underlying multivariate normal response $W_{kt} = (W_{1kt}, W_{2kt}, \ldots, W_{mkt})^T$ which is related to the observed outcome through $m$ vectors of threshold values $a_1, a_2, \ldots, a_m$, where $a_l = (a_{l1}, a_{l2}, \ldots, a_{l,s_l - 1})^T$, with $s_l$ being the number of ordered levels of the $l$th outcome.
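As a numerical check of the marginal scaling in equation (7) of Section 2.2, the sketch below averages the conditional probability $\Phi(a_i - x\beta - d)$ over $d \sim N(0, \sigma_d^2)$ with a simple midpoint rule and compares it with the closed form $\Phi\{(a_i - x\beta)/\sqrt{1 + \sigma_d^2}\}$. It assumes a single scalar random intercept ($z = 1$) and made-up parameter values; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def marginal_prob_closed_form(a, xb, var_d):
    """pr(Y <= i) from equation (7) for a scalar random intercept (z = 1)."""
    return STD.cdf((a - xb) / math.sqrt(1.0 + var_d))


def marginal_prob_numeric(a, xb, var_d, n=4000, width=8.0):
    """Average Phi(a - xb - d) over d ~ N(0, var_d) with a midpoint rule."""
    sd = math.sqrt(var_d)
    prior = NormalDist(0.0, sd)
    h = 2.0 * width * sd / n          # step over [-width*sd, +width*sd]
    total = 0.0
    for k in range(n):
        d = -width * sd + (k + 0.5) * h
        total += STD.cdf(a - xb - d) * prior.pdf(d) * h
    return total


def population_averaged(coef, var_d):
    """Rescale a subject-specific coefficient to its population-averaged value."""
    return coef / math.sqrt(1.0 + var_d)


# Illustrative values only.
closed = marginal_prob_closed_form(0.5, 0.2, 0.64)
numeric = marginal_prob_numeric(0.5, 0.2, 0.64)
print(closed, numeric, population_averaged(0.4, 0.64))
```

The two probabilities agree to within the integration error, and the rescaled coefficient is smaller in magnitude than the subject-specific one whenever the random-effect variance is positive.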
Specifically, the model formulation at the first stage is given by

$$(Y_{1kt} = i_1, \ldots, Y_{mkt} = i_m) \iff (a_{1,i_1 - 1} \le W_{1kt} < a_{1,i_1},\ \ldots,\ a_{m,i_m - 1} \le W_{mkt} < a_{m,i_m}),$$

for all $i_1, i_2, \ldots, i_m$ in the set of possible outcomes.

At the second stage, we consider a mixed effects regression model for the latent response as follows:

$$W_{lkt} = x_{lkt}\beta_l + z_{lkt}d_{lk} + \varepsilon_{lkt},\quad l = 1, 2, \ldots, m; \qquad d_k \sim N(0, D) \ \text{and}\ \varepsilon_{kt} \sim N(0, H_{kt}).$$

For a relatively small number, $m$, of cross-sectional outcomes, the correlation matrix $H_{kt}$ may be left unstructured. However, when $m$ is large, a structure could be given to $H_{kt}$, assuming for example that its off-diagonal elements are all equal. This structure corresponds to that of the equi-correlated probit model of Ochi and Prentice (1984). The optimization and the estimation can proceed from there.

3. Maximum marginal likelihood estimation

Given the above random effects regression model for the latent bivariate response $W$ and relations (1) and (3), the joint probability of the bivariate ordinal response conditional on $\theta_k$ is given by

$$\mathrm{pr}(Y_{1kt} = i,\ Y_{2kt} = j \mid \theta_k) = \mathrm{pr}(a_{i-1} \le W_{1kt} < a_i,\ b_{j-1} \le W_{2kt} < b_j \mid \theta_k)$$
$$= \Phi_{\rho_{kt}}(a_i - e_{1kt},\ b_j - e_{2kt}) - \Phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\ b_j - e_{2kt}) - \Phi_{\rho_{kt}}(a_i - e_{1kt},\ b_{j-1} - e_{2kt}) + \Phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\ b_{j-1} - e_{2kt}),$$

where $e_{lkt} = E(W_{lkt} \mid \theta_k)$, $l = 1, 2$. Assuming that the bivariate responses of subject $k$ are conditionally independent given $\theta_k$, the joint probability $L(y_k \mid \theta_k)$ for the observed outcome matrix $y_k$ is equal to the product of the conditional probabilities of all time-point responses and is given by

$$L(y_k \mid \theta_k) = \prod_{t=1}^{n_k} \prod_{i=1}^{r} \prod_{j=1}^{s} \left[\mathrm{pr}(Y_{1kt} = i,\ Y_{2kt} = j \mid \theta_k)\right]^{I(y_{1kt} = i)\, I(y_{2kt} = j)},$$

where $I(\cdot)$ is the indicator variable. Then the marginal density of $Y_k$ in the population, i.e., the contribution of subject $k$ to the marginal likelihood, is the integral of $L(y_k \mid \theta_k)$ weighted by the joint density function of the transformed random effects terms, namely,

$$L(y_k) = (2\pi)^{-(r_1 + r_2)/2} \int_{\mathbb{R}^{r_1 + r_2}} L(y_k \mid \theta_k) \exp\left(-\|\theta_k\|^2/2\right) d\theta_k,$$

where $\mathbb{R}^{r_1 + r_2}$ is the $(r_1 + r_2)$-dimensional Euclidean space, with $\|\cdot\|$ being the Euclidean norm. The marginal likelihood for a sample of $N$ independent subjects is given by $L = \prod_{k=1}^{N} L(y_k)$.
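The conditional cell probability above is a rectangle probability under a bivariate standard normal. A minimal standard-library sketch follows; it is an illustration rather than the authors' GAUSS code. It evaluates $\Phi_\rho(a, b)$ through the one-dimensional integral $\int_{-\infty}^{a} \phi(u)\,\Phi\{(b - \rho u)/\sqrt{1 - \rho^2}\}\,du$ with a midpoint rule, replaces infinite thresholds by $\pm 8$ as a numerical stand-in, and then applies the four-term formula. The names `bvn_cdf` and `cell_prob` and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()
BIG = 8.0  # finite stand-in for infinite thresholds (an approximation)


def bvn_cdf(a, b, rho, n=4000):
    """Phi_rho(a, b) via a midpoint rule on the conditional-CDF integral."""
    a = max(min(a, BIG), -BIG)
    b = max(min(b, BIG), -BIG)
    if a <= -BIG or b <= -BIG:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a + BIG) / n
    total = 0.0
    for k in range(n):
        u = -BIG + (k + 0.5) * h
        total += STD.pdf(u) * STD.cdf((b - rho * u) / s)
    return total * h


def _limits(cuts, idx, mean):
    """Centered lower and upper limits for category idx (1-based)."""
    lower = cuts[idx - 2] - mean if idx >= 2 else -math.inf
    upper = cuts[idx - 1] - mean if idx <= len(cuts) else math.inf
    return lower, upper


def cell_prob(i, j, a, b, e1, e2, rho):
    """pr(Y1 = i, Y2 = j | theta) by the four-term rectangle formula.

    a and b hold the interior thresholds; e1 and e2 are the conditional
    means of the two latent responses given the random effects.
    """
    a_lo, a_hi = _limits(a, i, e1)
    b_lo, b_hi = _limits(b, j, e2)
    return (bvn_cdf(a_hi, b_hi, rho) - bvn_cdf(a_lo, b_hi, rho)
            - bvn_cdf(a_hi, b_lo, rho) + bvn_cdf(a_lo, b_lo, rho))


# Illustrative values: three categories per outcome.
cells = [[cell_prob(i, j, [-0.5, 0.5], [-0.4, 0.6], 0.2, -0.1, 0.3)
          for j in (1, 2, 3)] for i in (1, 2, 3)]
print(cells)
```

The nine cell probabilities sum to one up to integration error, which is a convenient sanity check on the threshold indexing.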
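Maximizing the log-likelihood, $\log L$, with respect to the vector of model parameters $\Theta = (\beta_1, \beta_2, \beta_3, a, b, \mathrm{vec}(T))$, with $\mathrm{vec}(T)$ being the vector of unique non-zero elements of $T$, yields the likelihood equation

$$\frac{\partial \log L}{\partial \Theta} = \sum_{k=1}^{N} L^{-1}(y_k)\,\frac{\partial L(y_k)}{\partial \Theta} = 0,$$

which gives the MLE, provided that $-\partial^2 \log L / \partial\Theta\,\partial\Theta^T$ is positive definite at the optimum solution. The key computational features then rely on evaluating

$$\frac{\partial L(y_k)}{\partial \Theta} = \int_{\mathbb{R}^{r_1 + r_2}} \sum_{t=1}^{n_k} \sum_{i=1}^{r} \sum_{j=1}^{s} I(y_{1kt} = i)\, I(y_{2kt} = j)\, p_{ktij}^{-1}\,\frac{\partial p_{ktij}}{\partial \Theta}\; L(y_k \mid \theta_k)\,(2\pi)^{-(r_1 + r_2)/2} \exp\left(-\|\theta_k\|^2/2\right) d\theta_k,$$

where $p_{ktij} = \mathrm{pr}(Y_{1kt} = i, Y_{2kt} = j \mid \theta_k)$. This leads to the computation of $\partial p_{ktij}/\partial\Theta$. For the first univariate outcome, the partial derivative of $p_{ktij}$ with respect to $\beta_1$ is given by

$$\frac{\partial p_{ktij}}{\partial \beta_1} = \Bigg[ -\phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) + \phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)$$
$$\qquad + \phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) - \phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) \Bigg]\, x_{1kt},$$

where $\phi(\cdot)$ is the probability density function of the standard normal distribution. The partial derivative of $p_{ktij}$ with respect to a threshold value $a_{i'}$, $i' = 1, \ldots, r - 1$, gives

$$\frac{\partial p_{ktij}}{\partial a_{i'}} = \delta_{i,i'}\,\phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) - \delta_{i-1,i'}\,\phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_j - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right)$$
$$\qquad - \delta_{i,i'}\,\phi(a_i - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right) + \delta_{i-1,i'}\,\phi(a_{i-1} - e_{1kt})\,\Phi\!\left(\frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}}\right),$$

where $\delta_{i,i'} = 1$ if $i = i'$ and 0 otherwise. Differentiating $p_{ktij}$ with respect to $\beta_2$ and the threshold values $b_{j'}$, $j' = 1, \ldots, s - 1$, can be done analogously. Differentiating $p_{ktij}$ with respect to $\beta_3$ through the chain rule gives

$$\frac{\partial p_{ktij}}{\partial \beta_3} = \left[\phi_{\rho_{kt}}(a_i - e_{1kt},\ b_j - e_{2kt}) - \phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\ b_j - e_{2kt}) - \phi_{\rho_{kt}}(a_i - e_{1kt},\ b_{j-1} - e_{2kt}) + \phi_{\rho_{kt}}(a_{i-1} - e_{1kt},\ b_{j-1} - e_{2kt})\right] \frac{2\,x_{3kt}\,e^{x_{3kt}\beta_3}}{\left(1 + e^{x_{3kt}\beta_3}\right)^2},$$

where $\phi_{\rho_{kt}}(\cdot,\cdot)$ is the density of the bivariate standard normal distribution with correlation coefficient $\rho_{kt}$. Differentiating $p_{ktij}$ with respect to $\mathrm{vec}(T) = (\mathrm{vec}(T_{11}), \mathrm{vec}(T_{21}), \mathrm{vec}(T_{22}))$ requires the computation of $\partial p_{ktij}/\partial e_{lkt}$ (derived above); $\partial e_{lkt}/\partial \mathrm{vec}(T_{ll}) = (\theta_{lk} \otimes z_{lkt})\,J_{r_l}^T$, $l = 1, 2$; and $\partial e_{2kt}/\partial \mathrm{vec}(T_{21}) = \theta_{1k} \otimes z_{2kt}$, where $\otimes$ is the direct product operator.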
As noted by Hedeker and Gibbons (1994), $J_{r_l}^T$ is the transformation matrix of Magnus (1988), with dimension $r_l(r_l + 1)/2 \times r_l^2$, which eliminates the elements above the main diagonal. The likelihood equations can be solved using a quasi-Newton approach where $\Theta_\gamma$, the parameter at step $\gamma$, is updated as follows:

$$\Theta_{\gamma + 1} = \Theta_\gamma + I_e^{-1}(\Theta_\gamma; y)\,\frac{\partial \log L}{\partial \Theta_\gamma},$$

where $I_e(\Theta_\gamma; y)$, an empirical and consistent estimator of the information matrix at step $\gamma$ (derived in Appendix B), is given by

$$I_e(\Theta_\gamma; y) = \sum_{k=1}^{N} L^{-2}(y_k)\,\frac{\partial L(y_k)}{\partial \Theta_\gamma}\left(\frac{\partial L(y_k)}{\partial \Theta_\gamma}\right)^T - N\,S\,S^T, \qquad (8)$$

with $S = \frac{1}{N}\sum_{k=1}^{N} L^{-1}(y_k)\,\partial L(y_k)/\partial \Theta_\gamma$. At the optimum point, i.e., $\gamma = \gamma_{\max}$,

$$I_e(\Theta_{\gamma_{\max}}; y) = \sum_{k=1}^{N} L^{-2}(y_k)\,\frac{\partial L(y_k)}{\partial \Theta_{\gamma_{\max}}}\left(\frac{\partial L(y_k)}{\partial \Theta_{\gamma_{\max}}}\right)^T$$

can be inverted to get the asymptotic variance-covariance matrix of the model parameter estimates.

3.1 Adaptive Gaussian quadrature and computations

Gaussian quadrature rules are used to approximate integrals of functions with respect to a given kernel by a weighted average of the integrand evaluated at predetermined abscissas, as in Pinheiro and Bates (2000). This methodology relies on the concept of orthogonal functions, for which a high degree of accuracy is attained when the integrand is sufficiently smooth. The weights and abscissas used in Gaussian quadrature rules for the most common kernels, including the normal kernel, can be obtained from the tables of Abramowitz and Stegun (1964). If $Q$ univariate quadrature points are requested, then an $(r_1 + r_2)$-dimensional integral requires $Q^{r_1 + r_2}$ multivariate points $P_q^T = (P_{q1}, P_{q2}, \ldots, P_{q,r_1 + r_2})$, $q = 1, \ldots, Q^{r_1 + r_2}$, with associated weights given by the product of the corresponding univariate weights, $\Pi(P_q) = \prod_{h=1}^{r_1 + r_2} \Pi(P_{qh})$. As the number of random effects, $r_1 + r_2$, increases, the number of multidimensional quadrature points increases exponentially. However, several authors have reported that the number of points in each univariate dimension can be reduced for higher dimensional integrals without impairing the accuracy of the approximations.
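The product grid described above can be sketched with a three-point Gauss-Hermite rule for the kernel $e^{-x^2}$, whose abscissas are $0, \pm\sqrt{3/2}$ with weights $2\sqrt{\pi}/3$ and $\sqrt{\pi}/6$. The code below is an illustration under those assumptions, with hypothetical function names; it builds the $Q^{d}$ multivariate points with product weights and uses them to approximate an expectation under $N(0, I_d)$ via the change of variable $\theta = \sqrt{2}\,P_q$.

```python
import itertools
import math

# Three-point Gauss-Hermite rule for the kernel exp(-x^2).
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0)


def product_grid(dim):
    """Return the Q**dim multivariate points with their product weights."""
    grid = []
    for idx in itertools.product(range(len(NODES)), repeat=dim):
        point = tuple(NODES[i] for i in idx)
        weight = math.prod(WEIGHTS[i] for i in idx)
        grid.append((point, weight))
    return grid


def normal_expectation(g, dim):
    """Approximate E[g(theta)] for theta ~ N(0, I_dim) on the product grid."""
    total = 0.0
    for point, weight in product_grid(dim):
        theta = [math.sqrt(2.0) * p for p in point]  # rescale to N(0, 1) scale
        total += g(theta) * weight
    return total / math.pi ** (dim / 2.0)


grid = product_grid(2)
print(len(grid))  # 3**2 points for two random effects
print(normal_expectation(lambda th: th[0] ** 2 + th[1] ** 2, 2))
```

With $Q$ points per dimension the grid has $Q^{d}$ entries, which is the exponential growth noted in the text; a three-point rule integrates polynomials up to degree five exactly in each dimension.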
For example, we found that as few as five points per dimension were sufficient to obtain adequate accuracy when random intercept models were fitted. Hence, the contribution $L(y_k)$ of subject $k$ to the likelihood can be approximated by

$$L(y_k) \approx \pi^{-(r_1 + r_2)/2} \sum_{q=1}^{Q^{r_1 + r_2}} L\left(y_k \mid \sqrt{2}\,P_q\right)\Pi(P_q). \qquad (9)$$

Gaussian quadrature, as noted by Pinheiro and Bates (2000), can be viewed as a deterministic version of Monte Carlo integration in which random samples of $\theta_k$ are generated from the $N(0, I_{r_1 + r_2})$ distribution. In a pure Gaussian quadrature approach, the quadrature points and the corresponding weights are fixed beforehand, whereas in Monte Carlo integration they are left to random choice. Because importance sampling tends to be much more efficient than simple Monte Carlo integration, we consider the equivalent of importance sampling in the Gaussian quadrature context, which Pinheiro and Bates (2000) term adaptive Gaussian quadrature. Here, the grid of abscissas on the scale of $\theta_k$ is centered around the conditional mode $\hat\theta_k$ rather than 0, as in (9). We recall that

$$L(y_k) = (2\pi)^{-(r_1 + r_2)/2} \int_{\mathbb{R}^{r_1 + r_2}} \exp\left(\log L(y_k \mid \theta_k) - \|\theta_k\|^2/2\right) d\theta_k.$$

For ease of notation, let

$$H(y_k, \hat\theta_k) = -\left.\frac{\partial^2 h(y_k, \theta_k)}{\partial\theta_k\,\partial\theta_k^T}\right|_{\theta_k = \hat\theta_k},$$

where $h(y_k, \theta_k) = \log L(y_k \mid \theta_k) - \|\theta_k\|^2/2$ and $\hat\theta_k = \arg\max_{\theta_k} h(y_k, \theta_k)$. In addition, we consider the scaling of $\theta_k$ using $H^{-1/2}(y_k, \hat\theta_k)$, the inverse of the Cholesky factor of $H(y_k, \hat\theta_k)$, as follows: $\theta_k = \hat\theta_k + H^{-1/2}(y_k, \hat\theta_k)\,\theta_k^*$. The adaptive Gaussian quadrature, written so that it reduces to (9) when $\hat\theta_k = 0$ and $H(y_k, \hat\theta_k)$ is the identity, is then given by

$$L(y_k) \approx \pi^{-(r_1 + r_2)/2}\,\left|H(y_k, \hat\theta_k)\right|^{-1/2} \sum_{q=1}^{Q^{r_1 + r_2}} \exp\left\{h\left(y_k,\ \hat\theta_k + \sqrt{2}\,H^{-1/2}(y_k, \hat\theta_k)\,P_q\right) + \|P_q\|^2\right\}\Pi(P_q).$$
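The centering and scaling step can be sketched for a single random effect. The code below is a minimal one-dimensional illustration, not the authors' implementation: it locates the mode of $h$ by Newton steps with finite-difference derivatives, assumes $h$ is concave near the mode, and uses the three-point Gauss-Hermite rule. The function name `adaptive_gh` and the toy Gaussian integrand, chosen because its integral is known in closed form, are assumptions for illustration.

```python
import math

# Three-point Gauss-Hermite rule for the kernel exp(-x^2).
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0)


def adaptive_gh(h, start=0.0, eps=1e-3, max_iter=50):
    """Approximate the integral of exp(h(theta)) over the real line.

    The grid is centered at the mode of h and scaled by the inverse square
    root of the negative curvature there (h is assumed concave at the mode).
    """
    theta = start
    for _ in range(max_iter):
        grad = (h(theta + eps) - h(theta - eps)) / (2.0 * eps)
        curv = (h(theta + eps) - 2.0 * h(theta) + h(theta - eps)) / eps ** 2
        step = grad / curv
        theta -= step                     # Newton step towards the mode
        if abs(step) < 1e-8:
            break
    curv = (h(theta + eps) - 2.0 * h(theta) + h(theta - eps)) / eps ** 2
    sigma = 1.0 / math.sqrt(-curv)
    total = sum(w * math.exp(h(theta + math.sqrt(2.0) * sigma * x) + x * x)
                for x, w in zip(NODES, WEIGHTS))
    return math.sqrt(2.0) * sigma * total


# Toy check: y | theta ~ N(theta, 1) and theta ~ N(0, 1), so y ~ N(0, 2).
y_obs = 0.7


def log_joint(theta):
    return -0.5 * (y_obs - theta) ** 2 - 0.5 * theta ** 2 - math.log(2.0 * math.pi)


approx = adaptive_gh(log_joint)
exact = math.exp(-y_obs ** 2 / 4.0) / math.sqrt(4.0 * math.pi)
print(approx, exact)
```

For the ordinal model, `log_joint` would be replaced by the log of the conditional likelihood $L(y_k \mid \theta_k)$ plus the log of the standard normal density of $\theta_k$; in that case the rule is only approximate and more points per dimension may be needed.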

3.2 Goodness-of-fit

As the model is likelihood-based, likelihood-ratio test (LRT) statistics can be used to test for statistical significance of the fixed effects terms in the model. However, testing for the need for random effects, or for a reduction in their dimensionality, is a non-standard problem because the null set does not lie in the interior of the parameter space (Chernoff, 1954). An informal assessment of the model fit can be performed by comparing the observed proportions to the fitted probabilities using the expressions in (6) and (7), marginally and jointly for the two outcomes. Others have used this approach for checking random effects models (see, e.g., Legler and Ryan, 1997; Hedeker and Gibbons, 1994; Ten Have and Morabia, 1999).

4. Example

To illustrate the application of the proposed model to longitudinal bivariate ordinal outcomes, we examined data collected in the cardiovascular educational study. These data were previously analyzed by Ten Have and Morabia (1999) with outcomes re-coded to binary scales (negative change vs. no change or positive change). Modeling the data as binary outcomes results in a loss of information, although binary outcomes are more convenient to handle from a medical standpoint. As noted by these authors, the data set contains missing outcomes at the 4-month visit and at the 12-month visit. Specifically, out of 266 subjects, there were 208 subjects with reported outcomes at the 4-month visit and 243 subjects at the 12-month visit. Missingness was less severe at 12 months because of additional efforts by the study coordinators to contact study patients at that time. Preliminary analyses were performed to see whether the missing visits were related to the unobserved outcomes. For example, the dropout indicators at 4 months and at 12 months were analyzed using the blood pressure and cholesterol level at the previous visit as covariates. No association involving these covariates was found at the 5% significance level.
Therefore, the missingness in these data was considered to be ignorable. Under such conditions, maximum likelihood estimation yields consistent and unbiased estimates (Rubin, 1976). From a statistical perspective, the effects under investigation, for each component of the model, are intervention (audio intervention being the reference group), time (12 months being the reference

time), and intervention-time interaction. The intervention-time interaction effect corresponds to the difference in the intervention effect between the two follow-up times. The proposed latent-variable model with correlated random intercept terms, denoted MEBO1, is first fitted to the data. A simpler version of MEBO1, denoted MEBO2, which assumes independence between the random effects of each marginal component, is also considered. The aim here is to assess the robustness of the model estimates under MEBO1. A third model, referred to as IBO, is a naive bivariate model in that it omits the random effects parameters while assuming the same fixed effects structure as MEBO1. Results are also presented for two separate random intercept models (MEUO1 and MEUO2), one for each response, to assess the robustness of the marginal probit estimates when the outcomes are jointly analyzed. The results obtained from these last two models were exactly the same as those provided by MIXOR (a FORTRAN program for modeling univariate ordinal outcomes in time, proposed by Hedeker and Gibbons, 1994). Estimates for each of the five models were obtained using GAUSS (Aptech, 1990) and are displayed in Table 1; every model contains all these effects in each component.

Sensitivity of the estimates. As expected, the impact of including random effects in the bivariate probit model (MEBO1) is similar to the impact reported in the literature for the univariate ordinal mixed effects model. The estimates and standard errors under MEBO1 consistently exceed the corresponding population-averaged counterparts (IBO), reflecting the heterogeneity among subjects, as noted by Zeger et al. (1988). We should note, however, that this trend is not observed for the estimates of the correlation coefficient $\rho_{kt}$. A theoretical explanation for this has yet to be found.
Table 1 also reveals that the marginal probit parameter estimates and standard errors under MEBO1 are very similar to the analogous parameter estimates and standard errors under the univariate MEUO1 and MEUO2. However, this is not necessarily an indication of orthogonality between the marginal probit and the correlation coefficient parameter estimates, for at least one reason: the marginal probit standard errors under MEBO1 are smaller than the analogous standard errors under MEUO1 and MEUO2, in contrast to what is expected for the bivariate probit model without random effects. Finally, the estimates of model MEBO2, which assumes independence between the random effects, are very close to those under model MEBO1. In fact, the observed value

16 of the LRT statistic is which is less than , the critical point based on a χ 2 distribution with 1 degree of freedom. This suggests that ignoring the cross-sectional by longitudinal association is not a big issue in these data. [Table 1 about here.] Model interpretations. Under the MEBO1 model, the intervention-time interaction effect estimate on the probit for cholesterol is (0.2728), which is not significant at 5% level (p=0.1571). By rescaling this estimate to get the population-averaged parameter estimates, subjects on the audio-intervention arm, in spite of the lack of statistical significance, exhibit higher probabilities of observing a positive or at least no change in the cholesterol status than subjects under the non-intervention arm at 4 months. And this gap between the two intervention groups is greater at 12 months. Under the same model, the intervention-time interaction effect estimate on the probit for blood pressure is (0.2666), which is significant at 5% level (p=0.025). Again by rescaling this estimate to get the population-averaged parameter estimates, subjects under the audio-intervention arm exhibit higher probabilities of observing a positive or at least no change in the blood pressure status than subjects under the non-intervention arm at 12 months. However, at 4 months, participants under the non-intervention arm perform better. Therefore the intervention becomes important when additional educational materials are introduced between 4 and 12 months. [Table 2 about here.] Goodness-of-fit. In an informal assessment of goodness-of-fit, Table 2 compares the observed cumulative proportions and MEBO1-based cumulative fitted probabilities for each intervention by time combination. A very good agreement between the observed and predicted marginal and joint proportions is indicated, which should be expected given that all three components of the model include the intervention-time interaction. 5. 
5. Discussion

We have described latent-variable models for analyzing longitudinal bivariate ordinal outcomes that provide both person-specific and marginal covariate-response interpretations of the fixed effects model parameters. The model formulation has two stages. First, the ordinal response is related to a continuous latent variable via the threshold concept. Second, a classical multivariate mixed model for the latent response is formulated. Assuming conditional independence given the random effects, the marginal likelihood for the ordinal outcomes is approximated using adaptive Gauss-Hermite quadrature for numerical integration.

An important feature of these models is that they allow irregularly spaced measurements across time, time-dependent and time-independent covariates, and ignorable missing data. Although the data set used to illustrate our methodology has no continuous covariate, our estimation approach can in principle accommodate one. For example, if the data set is highly unbalanced with respect to time, we can easily fit a random intercept and a random slope with respect to time to model each subject profile.

Although the proposed models are motivated by the two-stage modeling approach, it is the marginal models that are fitted to the data. Hence, inferences based on the marginal models do not explicitly assume the presence of random effects representing the natural heterogeneity among subjects. However, competing models with the same marginal fit have the same maximized likelihood and the same fixed effects estimates.

An important question with respect to our approach is whether we actually believe in the threshold model and the unobserved latent variable, or merely use it as a device to handle ordinal data. Several authors have reported that latent-variable models require large data sets in order to estimate all model parameters (see e.g., Garrett and Zeger, 2001). In many instances, it is unclear whether there is enough data to estimate the model parameters uniquely or with any precision.
To deal with this issue, we fitted the proposed model using different parameter starting values to check whether the model is identifiable. This technique is admittedly empirical and heuristic, so there is a need to develop a formal procedure for monitoring latent-variable models. Garrett and Zeger (2001) have proposed a Bayesian approach in which the posterior distribution of each parameter is compared with its prior distribution. Other authors have also used this Bayesian approach, under the name of Rubin's posterior predictive check, for model monitoring when tools such as the LRT are not applicable. Typically, the limiting distribution of the LRT statistic cannot be used to test the need for random effects or a reduction of their dimension, because the regularity condition that the null hypothesis lie in the interior of the parameter space is not met.

Though our models were built from the features of the data at hand, they apply generally to situations where multiple ordinal outcomes are recorded over time. However, in some situations, such as the presence of informative missing data, the proposed models need to be extended accordingly. The model could also be extended by assuming that the random effects follow a non-parametric distribution whose support points and frequencies have to be estimated from the data. Even the number of support points could be estimated. This will lead to larger standard errors for the parameters, since the normality assumptions bring additional information to the model formulation. All these issues will be the focus of future work.

Acknowledgements

The first author wishes to thank Dr Thomas Ten Have for permission to use data from the cardiovascular educational program study. Financial support for this work was provided by the University of Wisconsin-Madison through the Graduate School, Medical School and Comprehensive Cancer Center. The first author also acknowledges financial support from the IBC/ENAR student award for the presentation of this work at the ENAR 2002 Spring Meeting.

References

Aptech, S. (1990). GAUSS 2.1 User's manual. Kent, WA: Author.
Catalano, P. J. and Ryan, L. M. (1992). Bivariate latent variable models for clustered discrete and continuous outcomes. Journal of the American Statistical Association 87.
Chernoff, H. (1954). On the distribution of the likelihood ratio. Annals of Mathematical Statistics 25.

Dale, J. (1986). Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics 42.
Diggle, P., Liang, K. and Zeger, S. (1994). Analysis of longitudinal data. Clarendon Press, Oxford.
Fitzmaurice, G. M. and Laird, N. M. (1995). Regression models for a bivariate discrete and continuous outcome with clustering. Journal of the American Statistical Association 90.
Garrett, E. and Zeger, S. (2001). Assessing estimability of latent class models using a Bayesian estimation approach. In The impact of technology on biometrics. ENAR Spring Meeting.
Gibbons, R. and Bock, R. D. (1987). Trend in correlated proportions. Psychometrika 52.
Glonek, G. F. V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society Series B 57.
Hedeker, D. and Gibbons, R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics 50.
Kim, K. (1995). A bivariate cumulative probit regression model for ordered categorical data. Statistics in Medicine 14.
Kim, K. and Todem, D. (2000). A GAUSS program for fitting bivariate probit model with applications to ophthalmologic studies. Technical report, Department of Biostatistics, University of Wisconsin-Madison.
Legler, J. M. and Ryan, L. M. (1997). Latent variable model for teratogenesis using multiple binary outcomes. Journal of the American Statistical Association 92.
Lesaffre, E. and Kaufmann, H. (1992). Existence and uniqueness of the maximum likelihood estimator for a multivariate probit model. Journal of the American Statistical Association 87.
Lesaffre, E. and Molenberghs, G. (1991). Multivariate probit analysis: a neglected procedure in medical statistics. Statistics in Medicine 10.
Lesaffre, E., Todem, D. and Verbeke, G. (2000). Flexible modelling of the covariance matrix in a linear random effects model. Biometrical Journal 42.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73.
Lipsitz, S., Kim, K. and Zhao, L. (1994). Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine 13.
McCulloch, C. E. (1994). Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association 89.
Molenberghs, G. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association 89.
Morrell, C., Pearson, J. D. and Brant, L. J. (1997). Linear transformations of linear mixed effects models. The American Statistician 51.
O'Brien, P. C. (1984). Procedures for comparing multiple endpoints. Biometrics 40.
Ochi, Y. and Prentice, R. L. (1984). Likelihood inference in a correlated probit regression model. Biometrika 71.
Pinheiro, J. C. and Bates, D. M. (2000). Mixed effects models in S and S-PLUS. Springer, New York.
Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical Association 60.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63.
Ten Have, T. and Morabia, A. (1999). Mixed effects models with bivariate and univariate association parameters for longitudinal bivariate binary response data. Biometrics 55.
Williamson, J. and Kim, K. (1996). A global odds ratio regression model for bivariate ordered categorical data from ophthalmologic studies. Statistics in Medicine 15.
Zeger, S. L., Liang, K.-Y. and Albert, P. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 44.

Appendix A

Computation of the marginal cumulative probabilities

In order to deduce the marginal probability distributions of $(Y_{1kt}, Y_{2kt})$, we let $\Phi_C(u)$ be the bivariate normal distribution function with argument $u$, mean vector 0 and covariance matrix $C$.

We also let $\phi_C(v, \mu)$ be the bivariate normal density function with argument $v$, mean vector $\mu$ and covariance matrix $C$.

Theorem 1. If

$$\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j \mid d_k) = \Phi_C(u + x_{kt}\beta + z_{kt} d_k) \tag{A.1}$$

where $u = (a_i, b_j)$ and $d_k$ is a random vector with mean $E(d_k)$ and covariance matrix $D$, then

$$\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j) = \Phi_{z_{kt} D z_{kt}^T + C}(u + x_{kt}\beta + z_{kt} E(d_k)) \tag{A.2}$$

Proof. Integrating out the random effects $d_k$ in (A.1), we find, after a variable transformation and a change of the order of integration,

$$\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j) = \int_{-\infty}^{a_i + x_{1kt}\beta_1} \int_{-\infty}^{b_j + x_{2kt}\beta_2} f(\omega)\, d\omega \tag{A.3}$$

where

$$f(\omega) = \int_{R^{r_1 + r_2}} \phi_D(u, E(d_k))\, \phi_C(\omega, z_{kt} u)\, du$$

with $R^{r_1 + r_2}$ being the support space of the random effects $d_k$. Using standard results on normal distributions, one can show that

$$f(\omega) = \phi_{z_{kt} D z_{kt}^T + C}(\omega, z_{kt} E(d_k)) \tag{A.4}$$

Inserting (A.4) in (A.3) gives (A.2), which concludes the proof.

Appendix B

Approximation to the observed information matrix

Assume that the data $y = (y_1, y_2, \ldots, y_N)$ consist of independent random vectors with a common probability density function $L(\Theta, y_k) \equiv L(y_k)$. The log-likelihood and the score function for $N$ independent subjects are given respectively by

$$\log L = \sum_{k=1}^{N} \log L(y_k) \quad \text{and} \quad S(\Theta; y) = \sum_{k=1}^{N} s(\Theta; y_k), \quad \text{where } s(\Theta; y_k) = \frac{\partial \log L(y_k)}{\partial \Theta}.$$

Hence the expected information matrix for the whole data is $I(\Theta) = N\, i(\Theta)$, with

$$i(\Theta) = E\left( s(\Theta; y_k)\, s^T(\Theta; y_k) \right) = \mathrm{Cov}(s(\Theta; y_k))$$

being the information contained in a single observation. The empirical information in a single observation can be estimated by

$$\hat{\imath}(\Theta) = \frac{1}{N} \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k) - \frac{1}{N^2} S(\Theta; y)\, S^T(\Theta; y).$$

The empirical information matrix $I_e(\Theta; y)$ based on $N$ independent subjects and displayed in equation (8) is then given by

$$I_e(\Theta; y) = \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k) - \frac{1}{N} S(\Theta; y)\, S^T(\Theta; y)$$

which, for $\Theta = \hat\Theta$ (MLE), reduces to

$$I_e(\hat\Theta; y) = \sum_{k=1}^{N} s(\hat\Theta; y_k)\, s^T(\hat\Theta; y_k)$$

since $S(\hat\Theta; y) = 0$. $I_e(\hat\Theta; y)$ is commonly used in practice to approximate the observed information matrix, which is difficult if not impossible to compute analytically; see for example Hedeker and Gibbons (1996), Kim and Todem (2000). $I_e(\hat\Theta; y)/N$ is a consistent estimator of $i(\Theta)$ when evaluated at the maximum likelihood estimate. The use of this estimator can also be justified in the following sense:

$$I(\Theta; y) = -\frac{\partial^2 \log L}{\partial \Theta\, \partial \Theta^T} = -\sum_{k=1}^{N} \frac{\partial^2 \log L(y_k)}{\partial \Theta\, \partial \Theta^T} = \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k) - \sum_{k=1}^{N} \frac{1}{L(y_k)} \frac{\partial^2 L(y_k)}{\partial \Theta\, \partial \Theta^T}$$

where $I(\Theta; y)$ is the observed information matrix, which is different from the expected version $I(\Theta)$ defined earlier. The second term on the right-hand side of the last equality has zero expectation. Hence,

$$I(\hat\Theta; y) \approx \sum_{k=1}^{N} s(\hat\Theta; y_k)\, s^T(\hat\Theta; y_k) = I_e(\hat\Theta; y)$$

where the accuracy of this approximation depends on how close $\hat\Theta$ is to $\Theta$. In the particular case of multinomial distributed data, Kim and Todem (2000) have shown that the second term on the right-hand side above is zero, and so the approximation holds exactly.
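The empirical information matrix of Appendix B can be computed directly from per-subject score vectors. The sketch below is a minimal illustration in plain Python, not the authors' GAUSS implementation. The helper name `empirical_information` and the toy model (independent $N(\mu, 1)$ observations, whose score is $y_k - \mu$) are assumptions chosen so that the result is easy to check by hand.

```python
def empirical_information(scores):
    """I_e = sum_k s_k s_k^T - (1/N) S S^T, where S = sum_k s_k.

    `scores` is a list of N per-subject score vectors (lists of length p).
    """
    n = len(scores)
    p = len(scores[0])
    total = [sum(s[a] for s in scores) for a in range(p)]  # total score S(Theta; y)
    info = [[0.0] * p for _ in range(p)]
    for s in scores:
        for a in range(p):
            for b in range(p):
                info[a][b] += s[a] * s[b]  # outer product of the subject's score
    for a in range(p):
        for b in range(p):
            info[a][b] -= total[a] * total[b] / n  # centring term
    return info


# Toy model (an assumption for illustration): y_k ~ N(mu, 1), score s_k = y_k - mu.
y = [0.3, -1.2, 0.8, 1.9, -0.4, 0.6]
mu_hat = sum(y) / len(y)                  # MLE, so the total score is zero
scores = [[yk - mu_hat] for yk in y]
info = empirical_information(scores)
std_err = (1.0 / info[0][0]) ** 0.5       # square root of the inverse information

print(f"I_e = {info[0][0]:.4f}, standard error of mu_hat = {std_err:.4f}")
```

At the MLE the centring term vanishes and only the sum of outer products remains, as in the reduction shown above. Away from the MLE, the centring term keeps the estimate equal to the empirical covariance of the scores scaled by $N$.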

Figure 1. The correlation structure of the proposed model on the latent scale. [Diagram: the latent variables $W_{1kt}$ and $W_{2kt}$ at two time points, linked by the correlations $\psi_{11}$, $\psi_{12}$ and $\psi_{22}$.]

Table 1
Parameter estimates (standard errors) under the MEBO1, MEBO2, MEUO1, MEUO2 and IBO models fitted to the cardiovascular trial data

Component       Parameter             MEBO1     MEBO2     MEUO1     MEUO2     IBO
CHOL            Thresh.1              (0.2195)  (0.2187)  (0.2283)            (0.1144)
                Thresh.2 - Thresh.1   (0.2195)  (0.2186)  (0.2167)            (0.0777)
                Interv. (audio)       (0.2252)  (0.2241)  (0.2309)            (0.1534)
                Time (12 months)      (0.1796)  (0.1794)  (0.2059)            (0.1883)
                Interv. by Time       (0.2728)  (0.2728)  (0.2713)            (0.2735)
BP              Thresh.1              (0.1481)  (0.1482)            (0.1569)  (0.1089)
                Thresh.2 - Thresh.1   (0.1234)  (0.1235)            (0.1431)  (0.0661)
                Interv. (audio)       (0.2035)  (0.2031)            (0.1943)  (0.1505)
                Time (12 months)      (0.1676)  (0.1674)            (0.1657)  (0.1738)
                Interv. by Time       (0.2666)  (0.2641)            (0.2368)  (0.2571)
CHOL-BP         Intercept             (0.4513)  (0.4330)                      (0.2306)
                Interv. (audio)       (0.6213)  (0.6193)                      (0.3251)
                Time (12 months)      (0.7528)  (0.7478)                      (0.3953)
                Interv. by Time       (1.0037)  (1.0012)                      (0.5341)
Random effects  T11                   (0.1734)  (0.1720)  (0.1708)
variance terms  T21                   (0.1274)
(Cholesky)      T22                   (0.1274)  (0.1275)            (0.1422)
log L

MEBO1 and MEBO2: Mixed Effects Bivariate Ordinal model under correlated ($T_{21} \ne 0$) and uncorrelated ($T_{21} = 0$) random effects respectively; MEUO1 and MEUO2: Mixed Effects Univariate Ordinal model for the first and second marginal respectively; IBO: Bivariate Ordinal model under independence.
CHOL and BP: Probit parameters for Cholesterol response and Blood pressure response respectively; CHOL-BP: Parameters (on the Fisher transformation scale) of the underlying correlation for Cholesterol-Blood pressure response.
$T_{11}$, $T_{21}$ and $T_{22}$: Cholesky decomposition of the random intercept variance terms ($\mathrm{Var}(d_{1k}) = T_{11}^2$, $\mathrm{Cov}(d_{1k}, d_{2k}) = T_{11} T_{21}$ and $\mathrm{Var}(d_{2k}) = T_{22}^2 + T_{21}^2$).
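The footnotes of Table 1 describe two reparameterisations: the random-intercept covariance is expressed through its Cholesky terms, and the CHOL-BP correlation is modelled on the Fisher transformation scale. A minimal sketch of the back-transformations follows. The function names and input values are hypothetical (they are not Table 1 estimates), and the Fisher transformation is assumed to be $z = \operatorname{arctanh}(\rho)$; the paper may use a differently scaled variant.

```python
from math import sqrt, tanh


def random_effects_covariance(t11, t21, t22):
    """Return D = T T^T for the lower-triangular factor T = [[t11, 0], [t21, t22]]."""
    var1 = t11 ** 2                # Var(d_1k)
    cov12 = t11 * t21              # Cov(d_1k, d_2k)
    var2 = t22 ** 2 + t21 ** 2     # Var(d_2k)
    return [[var1, cov12], [cov12, var2]]


def correlation_from_fisher(z):
    """Invert the Fisher transformation z = arctanh(rho)."""
    return tanh(z)


# Hypothetical Cholesky terms and Fisher-scale predictor, for illustration only.
t11, t21, t22 = 0.9, 0.3, 0.7
D = random_effects_covariance(t11, t21, t22)
random_effects_corr = D[0][1] / sqrt(D[0][0] * D[1][1])
rho = correlation_from_fisher(0.5)

print(f"D = {D}")
print(f"random-effects correlation = {random_effects_corr:.4f}, latent correlation = {rho:.4f}")
```

Working with the Cholesky terms keeps $D$ positive semi-definite for any real values of the three parameters, and the Fisher scale keeps the correlation inside $(-1, 1)$, so both can be optimised without constraints. Setting `t21 = 0` gives the uncorrelated random effects of MEBO2.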

Table 2
Model-based (MEBO1) marginal and joint cumulative probabilities for changes in Cholesterol and Blood pressure status with the corresponding observed cumulative proportions in parentheses

Event          Intervention  Visit (months)  BP = 1    BP ≤ 2    Marginal (CHOL)
CHOL = 1       No            4               (0.0242)  (0.1048)  (0.1290)
               No            12              (0.0099)  (0.0693)  (0.0990)
               Yes           4               (0.0168)  (0.0756)  (0.1513)
               Yes           12              (0.0467)  (0.1121)  (0.1682)
CHOL ≤ 2       No            4               (0.1371)  (0.4435)  (0.6210)
               No            12              (0.1188)  (0.4059)  (0.6139)
               Yes           4               (0.1261)  (0.4454)  (0.6891)
               Yes           12              (0.1776)  (0.5794)  (0.7850)
Marginal (BP)  No            4               (0.2177)  (0.7016)
               No            12              (0.1485)  (0.6535)
               Yes           4               (0.1765)  (0.6218)
               Yes           12              (0.2150)  (0.7383)

The changes in cholesterol status (CHOL) and blood pressure (BP) are three-level ordinal outcomes where 1 represents a positive change; 2 no change; and 3 a negative change.
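The model-based entries of Table 2 are joint and marginal cumulative probabilities of a bivariate normal latent variable, as in (A.2). The sketch below shows how such a table could be filled in for one intervention by time combination. It is an illustration under stated assumptions, not the authors' code: the cut-points and correlation are hypothetical and assumed to be already on the standardised marginal scale, and the bivariate normal distribution function is approximated with a simple midpoint rule.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def bvn_cdf(h, k, rho, n=4000, lower=-10.0):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Uses the identity P = int_{-inf}^{h} phi(x) * Phi((k - rho*x) / sqrt(1 - rho^2)) dx,
    truncated at `lower` and evaluated with a midpoint rule.
    """
    if h <= lower:
        return 0.0
    step = (h - lower) / n
    scale = sqrt(1.0 - rho * rho)
    total = 0.0
    for m in range(n):
        x = lower + (m + 0.5) * step
        total += STD.pdf(x) * STD.cdf((k - rho * x) / scale) * step
    return total


# Hypothetical standardised cut-points and latent correlation (not fitted values).
chol_cuts = [-1.1, 0.3]   # CHOL = 1, CHOL <= 2
bp_cuts = [-0.8, 0.6]     # BP = 1, BP <= 2
rho = 0.3

joint = [[bvn_cdf(a, b, rho) for b in bp_cuts] for a in chol_cuts]
marginal_chol = [STD.cdf(a) for a in chol_cuts]
marginal_bp = [STD.cdf(b) for b in bp_cuts]

for row, margin in zip(joint, marginal_chol):
    print("  ".join(f"{p:.4f}" for p in row), f"  {margin:.4f}")
print("  ".join(f"{p:.4f}" for p in marginal_bp))
```

Each printed row mirrors a row of Table 2: two joint cumulative probabilities followed by the cholesterol margin, with the blood pressure margins on the last line.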


More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information
