Repeated ordinal measurements: a generalised estimating equation approach

David Clayton
MRC Biostatistics Unit, 5 Shaftesbury Road, Cambridge CB2 2BW
April 7, 1992

Abstract

Cumulative logit and related regression models for ordered categorical data may be expressed as generalised linear models for correlated binary responses. These may be fitted using the generalised estimating equation approach of Liang and Zeger (1986), which yields results nearly identical to maximum likelihood while offering further flexibility. The approach also generalises to deal with repeated ordinal measurements in the same subject, such as those commonly observed in medical cross-over experiments.

Keywords: Generalised estimating equations, ordinal data, repeated measures, cross-over trials.

1 Background: generalised estimating equations

In a generalised linear model (GLM), a response vector $y$, of length $N$, has expectation vector $\mu$ whose elements are related to those of a linear predictor $\eta$ by the link function $g(\cdot)$, so that $g(\mu_i) = \eta_i$. The linear predictor is given by the linear model $\eta = X\beta$, where $X$ is a matrix whose rows, $x_i$, are vectors of covariates for each observational unit. In the original formulation of GLMs, the responses are assumed to be independent with variances $\phi V(\mu_i)$, where $V(\cdot)$ is the variance function and $\phi$ the scale factor.

Estimation of the regression coefficients, $\beta$, is by solution of the estimating equations

$$X^T W e = 0 \qquad (1)$$

where $e$ is the vector of scaled residuals, $e_i = g'(\mu_i)(y_i - \mu_i)$, and $W$ is a diagonal matrix of weights such that $[W_{ii}]^{-1} = \phi V(\mu_i)\,[g'(\mu_i)]^2$. It is well known that these estimating equations lead to maximum likelihood estimates of $\beta$ when the distribution of responses is drawn from the exponential family. In other cases, they are referred to as maximum quasi-likelihood estimates. In either case, under the assumptions set out above, the asymptotic variance of the estimates is estimated by $(X^T W X)^{-1}$, evaluated with $\beta$ at its estimated value, $\hat\beta$.

More recently, it has been recognised that these estimating equations lead to consistent (if not fully efficient) estimates of $\beta$ even when the variance function $V(\cdot)$ is mis-specified. However, in these circumstances the above variance estimate is incorrect and it is necessary to use an alternative robust estimate of the form

$$(X^T W X)^{-1}\, S\, (X^T W X)^{-1}.$$

Here $S$ is the sum of squares and products (SSP) matrix for the individual contributions to the estimating equation. More precisely, if $x_i$ is the $i$th row of $X$, the contribution of subject $i$ to the estimating equations is given by $u_i = W_{ii} e_i x_i$, so that equation (1) may be rewritten as $\sum_i u_i = 0$, and then

$$S = \sum_i u_i u_i^T.$$

In the papers by Liang, Zeger and collaborators (for example, Liang and Zeger, 1986) this idea has been further developed. If the vector $y$, of length $NM$, represents $M$ repeated measurements on $N$ different individuals, then the model may be extended to allow for correlation between repeated measurements of the same individual with, say, $\mathrm{Corr}(y_{ij}, y_{ik}) = (\rho_{jk})_i$.
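As a rough illustration of equation (1) and of the two variance estimates, the following sketch fits a logistic GLM to simulated independent binary data by iterating on the estimating equations, and then forms both the naive estimate and the robust sandwich estimate. The simulated data, the seed, and the function names `fit_logistic` and `variance_estimates` are assumptions made purely for illustration; none of them comes from the paper.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Solve X^T W e = 0 for a logistic GLM by Fisher scoring (phi = 1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        v = mu * (1.0 - mu)            # binomial variance function V(mu)
        g_prime = 1.0 / v              # derivative of the logit link, g'(mu)
        w = 1.0 / (v * g_prime ** 2)   # diagonal of W: 1 / (V(mu) g'(mu)^2)
        e = g_prime * (y - mu)         # scaled residuals e_i
        xtwx = X.T @ (w[:, None] * X)
        beta = beta + np.linalg.solve(xtwx, X.T @ (w * e))
    return beta

def variance_estimates(X, y, beta):
    """Return the naive estimate, the robust sandwich estimate, and the u_i."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    v = mu * (1.0 - mu)
    g_prime = 1.0 / v
    w = 1.0 / (v * g_prime ** 2)
    e = g_prime * (y - mu)
    U = (w * e)[:, None] * X                      # row i holds u_i = W_ii e_i x_i
    bread = np.linalg.inv(X.T @ (w[:, None] * X)) # (X^T W X)^{-1}
    S = U.T @ U                                   # sum_i u_i u_i^T
    return bread, bread @ S @ bread, U

# Simulated data (illustrative assumption, not from the paper).
rng = np.random.default_rng(0)
N = 500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = (rng.random(N) < p_true).astype(float)

beta_hat = fit_logistic(X, y)
naive, robust, U = variance_estimates(X, y, beta_hat)
print("beta_hat:  ", beta_hat)
print("naive ASE: ", np.sqrt(np.diag(naive)))
print("robust ASE:", np.sqrt(np.diag(robust)))
```

Because the simulated responses really are independent Bernoulli variables, the two sets of standard errors should be broadly similar here; they would be expected to diverge when the assumed variance or correlation structure is wrong.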

Usually the correlation structure $(\rho_{jk})_i$ is assumed to be constant across subjects, so that the $i$ subscript may be dropped. Efficient estimation of $\beta$ may be achieved by solution of estimating equations of the same general form as discussed above, with the modification that the $W$ matrix should now be block-diagonal as a result of the correlation between repeated measures. Since it may be difficult in practice to specify the correct variance and correlation structures, Liang and Zeger recommend the use of a convenient working approximation to $\rho$, and of a robust estimate for the variance of $\hat\beta$. This is obtained in the same way as before, the matrix $S$ now representing the empirical SSP matrix for the $N$ contributions of individual subjects to the estimating equation, $u_i = \sum_j u_{ij}$.

2 Ordered categorical responses

Perhaps the most popular method for the analysis of ordered categorical data is that based upon the cumulative logit regression model. This was first proposed by Snell (1964) and further generalised by McCullagh (1980) to allow link functions other than the logit. McCullagh's description of the model was in terms of an underlying latent continuous response, stratified at unknown cutpoints. For a response with $C$ categories we need $C - 2$ parameters to represent these cutpoints (since the boundary between the first two categories can be taken as zero without loss of generality).

An alternative view of the class of models is that they hold that, over the $C - 1$ different ways of collapsing the response into a binary one, the quantal regression equations are unchanged, save in their intercepts. Any of the usual binary regression links (logit, probit, complementary log-log etc.) are available. With this view of the model, the extra $C - 2$ cutpoint parameters represent differences between the intercepts of the $C - 1$ binary regressions.
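The "collapsed binary regressions" view can be sketched numerically. The snippet below assumes the logit link and assumes each collapse is the event "response in category $j$ or lower"; the intercept values and the covariate contribution of 0.8 are hypothetical. It shows that a single covariate effect shifts every one of the $C - 1$ cumulative log odds by the same amount, while the intercepts alone distinguish the collapses.

```python
import numpy as np

def category_probabilities(theta, eta):
    """Category probabilities P(y = c), c = 1..C, under a cumulative logit model.

    theta : increasing intercepts of the C - 1 binary regressions (hypothetical)
    eta   : covariate contribution shared by every collapse
    """
    theta = np.asarray(theta, dtype=float)
    cumulative = 1.0 / (1.0 + np.exp(-(theta + eta)))   # P(y <= j), j = 1..C-1
    padded = np.concatenate(([0.0], cumulative, [1.0]))
    return np.diff(padded)                              # successive differences

def logit(p):
    return np.log(p / (1.0 - p))

theta = np.array([-1.0, 0.5, 2.0])                  # C = 4 categories
p_ref = category_probabilities(theta, 0.0)          # reference covariate value
p_shift = category_probabilities(theta, 0.8)        # covariate effect of 0.8

# Log odds of each collapse, recovered from the category probabilities.
shift = logit(np.cumsum(p_shift)[:-1]) - logit(np.cumsum(p_ref)[:-1])
print("category probabilities (reference):", p_ref)
print("category probabilities (shifted):  ", p_shift)
print("change in cumulative log odds:     ", shift)
```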
This latter view of the logit version of the model prompted Clayton (1974) to propose, for the two-sample problem, a modified version of the Mantel-Haenszel estimate of the common odds ratio in a stack of $2 \times 2$ contingency tables. The possible collapses of the ordinal response yield $C - 1$ such tables and these provide $C - 1$ correlated estimates of the common odds ratio. Clayton showed that, although the optimal weights for pooling these estimates are rather complicated, use of weights which are optimal under the null hypothesis provides a convenient practical method. This method was based on two main ideas:

1. the treatment of the ordinal response as $C - 1$ correlated binary responses, and
2. the use of weights which are locally optimal around the null.

The first (and, to a lesser extent, the second) of these ideas is carried through in the present proposal. Thus, if the ordered categorical response of the $i$th subject, $y_i$, is coded $1, \ldots, C$, then we may create an expanded vector of binary responses, $y$, of length $N(C - 1)$ and indexed by $i$ and $j$ so that, for $j = 1, \ldots, C - 1$,

$$y_{ij} = I(y_i \le j).$$

The Snell-McCullagh model relates the expectation of this vector, $\mu$, via a link function to the linear predictor vector, $\eta$, with elements also indexed by $i$ and $j$:

$$\eta_{ij} = \theta_j + x_i^T \beta.$$

This model may be fitted using generalised estimating equations. An expanded design matrix, $X$, is created by repeating each row of the original design matrix $C - 1$ times, corresponding to the $C - 1$ possible collapses, and appending $C - 2$ columns of dummy variables to allow for differences in intercepts of the $C - 1$ possible binary regressions. The binomial variance function correctly specifies the variances of the elements of $y$. The correlations of responses are simple functions of their expectations, $\mu$:

$$\mathrm{Corr}(y_{ij}, y_{ik}) = \frac{\mu_{i,\min(j,k)} - \mu_{ij}\,\mu_{ik}}{\sqrt{\mu_{ij}(1 - \mu_{ij})\,\mu_{ik}(1 - \mu_{ik})}}.$$

A working correlation matrix, constant for all $i$, is provided by the estimate under the null hypothesis of homogeneity of response. This is obtained by substituting the marginal cumulative proportions for $\mu_{ij}$, $j = 1, \ldots, C - 1$, in the above expression. Notice that, in contrast with the method of Clayton (1974), the weighting scheme uses estimates under the null hypothesis only for correlations between the elements of $y$; their variances are dealt with correctly. However, if software allowed, there would be no need even for this inaccuracy, which arises solely from a requirement to specify a common correlation structure across subjects.
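The expansion just described can be sketched as follows. This is a minimal illustration that assumes the binary indicators are cumulative ("category $j$ or lower"), that the covariate matrix is supplied without an intercept column, and that rows are ordered by subject and then by cutpoint. The function names `expand_ordinal` and `cumulative_corr` and the toy data are hypothetical.

```python
import numpy as np

def expand_ordinal(y, X, C):
    """Expand N ordinal responses (coded 1..C) into N*(C-1) binary responses.

    Returns the binary vector, the expanded design matrix with columns
    [intercept, C-2 cutpoint dummies, covariates], and the subject index.
    """
    N = len(y)
    cut = np.arange(1, C)                                  # cutpoints j = 1..C-1
    y_bin = (y[:, None] <= cut[None, :]).astype(float).ravel()
    subject = np.repeat(np.arange(N), C - 1)
    j_index = np.tile(cut, N)
    intercept = np.ones((N * (C - 1), 1))
    dummies = (j_index[:, None] == cut[None, 1:]).astype(float)  # j = 2..C-1
    X_rep = np.repeat(X, C - 1, axis=0)                    # each row C-1 times
    return y_bin, np.hstack([intercept, dummies, X_rep]), subject

def cumulative_corr(mu):
    """Correlation matrix of the C-1 cumulative indicators given increasing mu."""
    mu = np.asarray(mu, dtype=float)
    m = np.minimum.outer(mu, mu)        # mu_{min(j,k)}, because mu increases in j
    sd = np.sqrt(mu * (1.0 - mu))
    return (m - np.outer(mu, mu)) / np.outer(sd, sd)

# Toy data: four subjects, C = 4 categories, one binary covariate.
y = np.array([1, 3, 4, 2])
X = np.array([[0.0], [1.0], [1.0], [0.0]])
y_bin, X_exp, subject = expand_ordinal(y, X, C=4)
R = cumulative_corr([0.25, 0.5])
print(y_bin.reshape(4, 3))
print(X_exp[:3])
print(R)
```

Keeping the subject index alongside the expanded vector matters because the robust variance estimate sums the contributions $u_{ij}$ within each subject.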

Table 1: Time to falling asleep for 239 subjects. Rows: treatment (Active, Placebo); columns: time in minutes (< 20, 20-30, 30-60, > 60).

An example

The main purpose of this paper is to exploit the natural generalisation of this approach to deal with repeated ordinal measurements within the same subject. Before proceeding to this, however, a comparison with the method of maximum likelihood in the simpler case serves to demonstrate the efficiency of the method, and some practical advantages.

Table 1 reproduces a dataset which has been analysed elsewhere in the literature (Francom, Chuang and Landis, 1989; Agresti, 1989). The data concern time to falling asleep, coded into 4 ordered categories, for $N = 239$ subjects, half receiving active treatment and half placebo. Measurements were made pre-treatment and on a follow-up occasion after treatment, but for this first analysis only the follow-up data are shown.

For the GEE analysis, each subject contributes three binary response variables coding whether time to falling asleep was (a) $\le 20$ minutes, (b) $\le 30$ minutes, or (c) $\le 60$ minutes. The corresponding marginal proportions are [...], [...] and [...], so that the working correlation matrix is [...].

If treatment is coded into a vector $z$, with $z_i = 0$ indicating placebo and $z_i = 1$ indicating active treatment, the Snell-McCullagh model is

$$\eta_{ij} = \mu + \theta_j + \beta z_i$$

where the cutpoint parameters, $\theta_j$, are subject to a linear constraint such as the corner constraint $\theta_1 = 0$. Alternatively, in the syntax introduced by Wilkinson and Rogers (1973) and further developed in computer programs such as GLIM, this model can be written

1 + Cutpoint + Treatment.
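The numerical working correlation matrix has not survived in this copy of the text, but the recipe can be sketched from the cumulative counts that appear in the odds-ratio arithmetic given later in this example. The sketch below takes those counts at face value; reading the first count in each product as the active group is an inference from the sign convention, and it does not affect the pooled proportions. The result is an illustration of the recipe, not a restoration of the published matrix.

```python
import numpy as np

# Cumulative counts at the cutpoints 20, 30 and 60 minutes, read off the
# odds-ratio arithmetic in the text (group labels are an inference).
le_active = np.array([40, 89, 108])    # at or below the cutpoint
gt_active = np.array([79, 30, 11])     # above the cutpoint
le_placebo = np.array([31, 60, 95])
gt_placebo = np.array([89, 60, 25])

# Consistency check: cutpoint-specific odds ratios as computed in the text.
odds_ratios = (le_active * gt_placebo) / (le_placebo * gt_active)

# Marginal cumulative proportions, pooled over treatment groups.
totals = le_active + gt_active + le_placebo + gt_placebo
p = (le_active + le_placebo) / totals

# Working correlation under homogeneity:
# (p_min(j,k) - p_j p_k) / sqrt(p_j (1 - p_j) p_k (1 - p_k))
sd = np.sqrt(p * (1.0 - p))
working_corr = (np.minimum.outer(p, p) - np.outer(p, p)) / np.outer(sd, sd)

print("odds ratios:", np.round(odds_ratios, 2))
print("marginal cumulative proportions:", p)
print("working correlation matrix:")
print(working_corr)
```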

Using GEE with a logistic link and binomial variance function, the treatment effect is estimated as $\hat\beta$ = [...] with asymptotic standard error [...] (note that the positive coefficient indicates a shift to the left in the response distribution). Full maximum likelihood yielded $\hat\beta$ = [...] with an ASE of [...].

An unexpected benefit of the GEE approach is that the cutpoint parameters enter simply as terms in the linear model. The assumption of constancy of treatment effect across cutpoints may be tested by inclusion of a Cutpoint × Treatment interaction term in the model. A single degree of freedom test for trend of treatment effect across cutpoints can be carried out by fitting the model

$$\eta_{ij} = \mu + \theta_j + \beta z_i + \gamma j z_i.$$

In the present example, this yields $\hat\gamma$ = [...] with ASE [...]. There is, therefore, some suggestion of failure of the Snell-McCullagh model, with a tendency for the treatment effect to increase with shift of cutpoint to the right. This impression is supported by inspection of the odds ratios for the three cutpoints: cutting at 20 minutes gives an odds ratio of $(40 \times 89)/(31 \times 79) = 1.45$, cutting at 30 minutes gives an odds ratio of $(89 \times 60)/(60 \times 30) = 2.97$, and cutting at 60 minutes gives an odds ratio of $(108 \times 25)/(95 \times 11) = 2.58$.

The ability to include such interaction terms represents a genuine extension of the Snell-McCullagh approach. Although the representation of treatment effect with a single parameter requires us to assume no interaction between treatment and cutpoint, there is no such requirement for other explanatory variables of less direct interest. Thus, the proportional odds assumption may be maintained for the effect of interest (treatment), but relaxed for the effects of other powerful disturbing influences.

3 Repeated ordinal response data

The extension of the method to deal with repeated ordinal measurements in the same subject is natural.
Such repeated ordinal measurements occur frequently in cross-over trials (see, for example, Jones and Kenward, 1989), and in experiments which incorporate a pre-treatment baseline measurement. The analysis of such data by maximum likelihood is difficult. Incorporation of a random subject effect in the linear model leads to an intractable likelihood, as do other approaches to modelling the association structure.

By contrast, the GEE approach is straightforward. Each ordinal measurement contributes a block of derived binary response variables so that, if there are $R$ repeated measurements, the binary response vector is of length $NR(C - 1)$. Explanatory variables may be constant within a subject, in which case each value must be repeated $R(C - 1)$ times in the design matrix, or may vary from occasion to occasion, requiring each value to be repeated $C - 1$ times. The model will include effects for cutpoint, occasion, other explanatory variables, and (possibly) their interaction.

Two methods have been considered for calculating a working correlation matrix:

1. to calculate working correlations between binary responses representing different cutpoints of the same measurement as in Section 2, and to ignore all others, and
2. to estimate the correlation structure as a free $R(C - 1) \times R(C - 1)$ matrix.

The second suggestion requires estimation of the correlation structure and this is an active research area. In this paper the approach suggested by Liang and Zeger (1986) is used. In later work (Liang, Zeger and Qaqish, 1992) this was termed GEE1 to distinguish it from the (rather more efficient) approach of Prentice and Zhao (1991), which they termed GEE2.

An example

Table 2 shows the sleep data in more detail, including both pre-treatment and follow-up measurements. The extended analysis simultaneously models pre-treatment and follow-up responses by expanding each subject's responses into 6 binary indicators. If, as before, we index subjects by $i$ and cutpoints by $j$, and further index pre-treatment and follow-up responses by $t = 0$ and $1$ respectively, then a model for treatment effect is

$$\eta_{ijt} = \mu + \theta_j + \beta z_i + \gamma t + \delta z_i t$$

or, in the Wilkinson and Rogers syntax,

1 + Cutpoint + Treatment + Occasion + Treatment.Occasion

The parameter of interest in this model is the interaction parameter, $\delta$.
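The layout of the expanded response and design matrix for this model can be sketched as below. The sketch assumes $R = 2$ occasions coded $t = 0, 1$ (so that occasion enters as a single column), cumulative indicators of the form "category $j$ or lower", and rows ordered by subject, then occasion, then cutpoint. The function name `expand_repeated` and the toy data are hypothetical.

```python
import numpy as np

def expand_repeated(Y, z, C):
    """Expand an N x R array of ordinal codes (1..C) into N*R*(C-1) binaries.

    z is a subject-level treatment indicator, repeated R*(C-1) times.
    Design columns: intercept, C-2 cutpoint dummies, z, t, z*t.
    """
    N, R = Y.shape
    cut = np.arange(1, C)
    # Shape (N, R, C-1); ravel gives subject, then occasion, then cutpoint.
    y_bin = (Y[:, :, None] <= cut).astype(float).ravel()
    subject = np.repeat(np.arange(N), R * (C - 1))
    occasion = np.tile(np.repeat(np.arange(R), C - 1), N).astype(float)
    j_index = np.tile(cut, N * R)
    z_rep = np.repeat(z, R * (C - 1)).astype(float)
    dummies = (j_index[:, None] == cut[None, 1:]).astype(float)
    X = np.column_stack([np.ones(N * R * (C - 1)), dummies,
                         z_rep, occasion, z_rep * occasion])
    return y_bin, X, subject

# Toy data: two subjects, pre-treatment and follow-up, C = 4 categories.
Y = np.array([[2, 1],
              [4, 3]])
z = np.array([1, 0])
y_bin, X, subject = expand_repeated(Y, z, C=4)
print(y_bin.reshape(2, 2, 3))
print(X)
```

The last column of `X` carries the interaction, so its coefficient plays the role of $\delta$ once the model is fitted with subject-level clustering.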

Table 2: Time to falling asleep for 239 subjects, pre-treatment and at follow-up. Rows: treatment (Active, Placebo) by category at the initial occasion; columns: category at the follow-up occasion (time in minutes).

Our first working correlation structure is $\rho_1$ = [...]. The bottom right section of this matrix is the same as that used in Section 2 and the top left section is calculated in the same way from the marginal cumulative proportions for the pre-treatment measurement (0.1088, [...] and [...]). Correlations between pre-treatment and follow-up responses are ignored. In the second approach, the correlation matrix was estimated from the data, as $\rho_2$ = [...]. Note that these two matrices agree quite closely except for those elements set to zero in the former.
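The structure of the first working correlation matrix can be sketched as a block-diagonal arrangement: one block per occasion, built from that occasion's marginal cumulative proportions, with zeros linking different occasions. The proportions used below are hypothetical placeholders chosen only to make the sketch run; they are not the values from the paper, and the function names are likewise illustrative.

```python
import numpy as np

def cumulative_corr(mu):
    """Correlation matrix of cumulative indicators given increasing proportions."""
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(mu * (1.0 - mu))
    return (np.minimum.outer(mu, mu) - np.outer(mu, mu)) / np.outer(sd, sd)

def block_working_corr(props_by_occasion):
    """Block-diagonal working correlation: within-occasion blocks, zeros elsewhere."""
    blocks = [cumulative_corr(p) for p in props_by_occasion]
    size = sum(b.shape[0] for b in blocks)
    rho = np.zeros((size, size))
    start = 0
    for b in blocks:
        k = b.shape[0]
        rho[start:start + k, start:start + k] = b
        start += k
    return rho

# Hypothetical marginal cumulative proportions (NOT the paper's values).
pre_treatment = [0.10, 0.35, 0.70]
follow_up = [0.30, 0.60, 0.85]
rho1 = block_working_corr([pre_treatment, follow_up])
print(np.round(rho1, 3))
```

Setting the between-occasion blocks to zero leaves the point estimates consistent, but it is the reason the naive standard errors cannot be trusted for this structure while the robust ones still can.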

Table 3: Estimates of the Treatment × Occasion interaction parameter. Columns: Method, Estimate, naive ASE, robust ASE; rows: GEE($\rho_1$), GEE($\rho_2$), EWLS.

The estimates of the interaction parameter $\delta$ obtained from these two analyses are given in Table 3. Also shown is the estimate obtained by Agresti (1989), who fitted the same model to these data using empirically weighted least squares (EWLS). For the GEE analyses, two ASEs are given. The first ("naive") estimate is the square root of the appropriate diagonal element of $(X^T W X)^{-1}$ and requires that the working correlation matrix and the variance function are both correct. The second is the robust estimate, which allows for mis-specification of either or both of these. The variance function cannot fail to be correctly specified since, for any response $y$ taking on values 0 or 1, $\mathrm{Var}(y) = E(y)[1 - E(y)]$. It is therefore not surprising that in the second GEE analysis, which estimates the correlation structure from the data, the naive and robust ASEs agree closely. In the first analysis, the naive ASE is incorrect owing to the mis-specification of the working correlation matrix. However, the robust ASE is very close to that obtained in the second analysis. It would seem, therefore, that the loss of efficiency due to using an incorrect working correlation structure is negligible. Agresti's estimates using EWLS were only published to two decimal places, but seem to agree quite closely with the GEE analyses.

In no analysis was the treatment effect estimated more precisely than in our earlier analysis, which discarded the pre-treatment baseline measurement. This, of course, is not surprising if we consider the analogous analysis for measurements on a continuous interval scale. In that case, we only gain from using the baseline data if the between-subject component of variance exceeds the within-subject error variance. In that case, the correlation between pre-treatment and follow-up measurements would exceed [...].

4 Discussion

The generalised estimating equation method proposed by Liang and Zeger provides an invaluable new tool for the applied statistician. The treatment of ordinal response data described here demonstrates the flexibility of the method and its ability to unify seemingly unrelated problems. Now that software is becoming available, it is increasingly attractive to use this general technique in preference to more specialised (and limited) programs. This paper has shown that

1. the Snell-McCullagh model for ordinal response data may be treated as a special instance of marginal models for repeated binary responses,
2. the GEE method of estimation is nearly as efficient as full maximum likelihood,
3. the approach allows extension of the model to include interactions between cutpoints and explanatory variables, and to deal with repeated ordinal measurements.

Some problems remain. In particular the performance of the method for repeated measurements in small samples requires further investigation, particularly in view of its potential application in cross-over trials. In this context, the adequacy of the robust ASE requires further study. The alternative is to estimate the correlation structure and use the naive ASE, but estimation of a large number of correlations from a small sample is potentially hazardous. A further possibility is to model the correlation structure more parsimoniously in terms of the expected values and, perhaps, one further parameter expressing the strength of association between pre-treatment and follow-up measurements. It must be expected, however, that whatever approach turns out to be preferable, generalised estimating equation methods will prove better in small samples than the empirically weighted least squares approach, which is currently their main competitor.

Software

The computations described in this paper were carried out in S using the gee() function written by Vincent Carey and available on STATLIB.
The maximum likelihood analysis of the follow-up data was carried out using SAS PROC LOGISTIC. Agresti's (1989) analysis used SAS PROC CATMOD.

Acknowledgements

I am grateful to the associate editor and to the referees for their constructive criticism of an earlier version.

References

Agresti, A. (1989) A survey of models for repeated ordered categorical response data. Statistics in Medicine, 8.
Clayton, D.G. (1974) Some odds ratio statistics for the analysis of ordered categorical data. Biometrika, 61.
Francom, S.F., Chuang, C. and Landis, J.R. (1989) A log-linear model for ordinal data to characterize differential change among treatments. Statistics in Medicine, 8.
Jones, B. and Kenward, M.G. (1989) The Design and Analysis of Cross-over Trials. Chapman and Hall, London.
Liang, K.-Y. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73.
Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992) Multivariate regression analyses for categorical data (with discussion). Journal of the Royal Statistical Society, Series B, 54.
McCullagh, P. (1980) Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42.
Prentice, R.L. and Zhao, L.P. (1991) Estimating equations in means and covariances of multivariate discrete and continuous responses. Biometrics, 47.
Snell, E.J. (1964) A scaling procedure for ordered categorical data. Biometrics, 20.
Wilkinson, G.N. and Rogers, C.E. (1973) Symbolic description of factorial models for analysis of variance. Applied Statistics, 22.
