A measure of partial association for generalized estimating equations

Sundar Natarajan,1 Stuart Lipsitz,2 Michael Parzen3 and Stephen Lipshultz4
1 Department of Medicine, New York University School of Medicine, and the VA New York Harbor Healthcare System, US
2 Division of General Internal Medicine, Brigham and Women's Hospital, US
3 Goizueta Business School, Emory University, US
4 Department of Pediatrics, University of Miami School of Medicine, US

Abstract: In a regression setting, the partial correlation coefficient is often used as a measure of standardized partial association between the outcome $y$ and each of the covariates in $x = [x_1, \ldots, x_K]$. In a linear regression model estimated using ordinary least squares, with $y$ as the response, the estimated partial correlation coefficient between $y$ and $x_k$ can be shown to be a monotone function, denoted $f(z)$, of the Z-statistic for testing if the regression coefficient of $x_k$ is 0. When $y$ is non-normal and the data are clustered so that $y$ and $x$ are obtained from each member of a cluster, generalized estimating equations are often used to estimate the regression parameters of the model for $y$ given $x$. In this paper, when using generalized estimating equations, we propose using the above transformation $f(z)$ of the GEE Z-statistic as a measure of partial association. Further, we also propose a coefficient of determination to measure the strength of association between the outcome variable and all of the covariates. To illustrate the method, we use a longitudinal study of the binary outcome heart toxicity from chemotherapy in children with leukaemia or sarcoma.

Key words: coefficient of determination; longitudinal data; repeated measures

Received November 2004; revised January 2006; accepted March

1 Introduction

In a normal linear regression setting, investigators are often interested in measuring the partial association between the outcome and the $k$th covariate, controlling for the other covariates.
If one uses ordinary least squares (OLS) to estimate the regression parameters, the usual estimate of the partial correlation coefficient (Magee, 1990) is a monotone function, denoted $f(z)$, of the Z-statistic for testing if the $k$th regression coefficient is 0. In an increasing number of biomedical studies, the data are clustered so that an outcome and covariate vector are obtained from each member of a cluster, and generalized estimating equations (GEE) (Liang and Zeger, 1986; Prentice, 1988) are used to estimate the regression parameters. For such repeated measures and clustered data settings, investigators are still often interested in obtaining a measure of partial association between the outcome and a covariate for a given member of the cluster. In this paper, we propose using the above transformation $f(z)$ of the GEE Z-statistic as a measure of partial association.

Address for correspondence: Sundar Natarajan, 423 East 23rd Street, Room North, New York, NY. E-mail: sundar.natarajan@med.nyu.edu. © 2007 SAGE Publications.

Further, in repeated measures and clustered data settings, investigators are often interested in measuring the strength of association between the outcome and all of the covariates, as measured by a coefficient of determination. In normal linear regression, the coefficient of determination $R^2$ can be shown to be the same monotone function discussed earlier, $f(w)$, of the Wald statistic $W$ for testing if all parameters (except the intercept) are 0. In this paper, for the GEE approach, we propose a coefficient of determination that is the same function $f(w)$ of a GEE Wald statistic $W$ for testing that all regression coefficients (except the intercept) are 0.

To illustrate the method, we use a dataset from a longitudinal study (Lipshultz et al., 1995) to explore the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukaemia or osteogenic sarcoma in childhood. There are 115 patients in the study. The outcome measured over time is abnormal wall stress of the heart (yes, no); it is measured from the end of chemotherapy until the present time. The maximum follow-up time is 18.5 years from the end of treatment. Children were not measured at pre-specified times, so the observation times are unequally spaced. The covariates of interest are the cumulative dose of doxorubicin, sex, age at end of treatment and time since the end of treatment. Table 1 shows data from 10 of the 115 patients on file. We use GEE to estimate the logistic regression model for the probability of abnormal wall stress as a function of these covariates.
We are interested in a measure of partial association at a given time between wall stress and each of the covariates. Investigators are often interested in standardized measures of partial association in order to directly compare the degree to which each of the covariates helps explain the variance in the outcome variable. In particular, as seen in Table 1, the covariates are measured on different scales (dose is in milligrams; age and time are in years; and sex is dichotomous), so it is difficult to directly compare the magnitudes of the covariate effects; the investigators wanted a standardized measure of partial association with which to compare them. For example, in clinical practice, it would be important to quantify the degree to which separate variables explain the outcome; this would allow modification or refinement of therapeutic strategies.

In Section 2, we review the partial correlation coefficient when using OLS, and, in Section 3, we use the form of the partial correlation coefficient in OLS to define a measure of partial association for GEE. Section 4 discusses the coefficient of determination for OLS, and again uses the form of the coefficient of determination in OLS to define a coefficient of determination for GEE. Section 5 analyses the above longitudinal example; Section 6 presents an asymptotic study of the measures of partial association; and in Section 7, we perform simulations to explore the finite sample properties of our proposed coefficient of determination.

Table 1 Data from cardiotoxicity study. Columns: Patient; Time since treatment (years); Wall stress (1 = abnormal, 0 = normal); Dose (mg); Age at end of treatment (years); Sex (F or M).

2 Review of ordinary least squares

We have a random sample of $N$ independent individuals, in which the $i$th individual has response $Y_i$ and a $K \times 1$ covariate vector $x_i = [x_{i1}, \ldots, x_{iK}]'$. The response $Y_i$ is assumed to be normal with mean

$$E(Y_i \mid x_i) = \beta_1 x_{i1} + \cdots + \beta_K x_{iK}$$

and variance $\mathrm{Var}(Y_i \mid x_i) = \sigma^2$; if the model has an intercept, then $x_{i1} = 1$ for all $i$. We let $\hat\beta_k$ be the maximum likelihood estimator of $\beta_k$, $\hat Y_i = \hat\beta_1 x_{i1} + \cdots + \hat\beta_K x_{iK}$ be the maximum likelihood estimator of $E(Y_i \mid x_i)$, and

$$\hat\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \hat Y_i)^2}{N} \tag{2.1}$$

be the maximum likelihood estimator of $\sigma^2$. The variance estimate of $\hat\beta_k$, $\widehat{\mathrm{Var}}(\hat\beta_k)$, is the $k$th diagonal element of the $(X'X)^{-1}$ matrix times $\hat\sigma^2$, where the $i$th row of $X$ is $x_i'$.

Next, we discuss the partial correlation coefficient, and show how it can be written as a function of the usual Z-statistic for testing $H_0: \beta_k = 0$. Without loss of generality, suppose we are interested in the partial correlation coefficient between $Y_i$ and $x_{iK}$, say $\rho_K$, controlling for the other covariates. Neter et al. (1996) show that the square of the estimated partial correlation coefficient is

$$\hat\rho_K^2 = \frac{SSE(X_1, \ldots, X_{K-1}) - SSE(X_1, \ldots, X_K)}{SSE(X_1, \ldots, X_{K-1})} = 1 - \frac{SSE(X_1, \ldots, X_K)}{SSE(X_1, \ldots, X_{K-1})}, \tag{2.2}$$

where $SSE(X_1, \ldots, X_K)$ is the error sum of squares with all covariates in the model and $SSE(X_1, \ldots, X_{K-1})$ is the error sum of squares with all covariates except $x_{iK}$ in the model. Further, the F-statistic for testing $H_0: \beta_K = 0$ can be written as

$$F_K = \frac{SSE(X_1, \ldots, X_{K-1}) - SSE(X_1, \ldots, X_K)}{SSE(X_1, \ldots, X_K)/(N-K)},$$

or, equivalently,

$$1 + \frac{F_K}{N-K} = \frac{SSE(X_1, \ldots, X_{K-1})}{SSE(X_1, \ldots, X_K)}. \tag{2.3}$$

After substituting (2.3) into (2.2), we obtain

$$\hat\rho_K^2 = 1 - \frac{1}{1 + F_K/(N-K)} = \frac{F_K/(N-K)}{1 + F_K/(N-K)}. \tag{2.4}$$

Finally, note that we can also write $F_K$ as

$$F_K = (N-K)\,\frac{Z_K^2}{N}, \tag{2.5}$$

where

$$Z_k = \frac{\hat\beta_k}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_k)}} \tag{2.6}$$

is the usual Wald statistic for testing $H_0: \beta_k = 0$. Substituting (2.5) in (2.4) and taking the square root, we obtain

$$\hat\rho_k = \frac{Z_k/\sqrt{N}}{\sqrt{1 + Z_k^2/N}}. \tag{2.7}$$

In the following section, we use this form of the partial correlation coefficient to define a new measure of partial association for GEE.

3 Repeated measures and generalized estimating equations

Repeated measures studies arise often in biomedical research. In repeated measures studies, the basic sampling unit is a group or cluster of subjects; a measurement is made on each subject within the cluster. The observations within the cluster, one from each subject, constitute the repeated measurements on the cluster. In a developmental toxicology study, the cluster is a litter and the newborns are the subjects within the cluster; in an eye disease study, the cluster is the person and the two eyes are the subjects within the cluster. In the example shown in Table 1, the cluster is the person and the repeated measures are the binary cardiotoxic measurements over time.

Thus, instead of a univariate response, each individual contributes $n_i$ responses. Individual $i$ ($i = 1, \ldots, N$) has an $n_i \times 1$ response vector $Y_i = [Y_{i1}, \ldots, Y_{in_i}]'$, where the distribution of $Y_{ij}$ is a member of the exponential family, which includes normal, Bernoulli, Poisson and gamma random variables. There is also a $K \times 1$ covariate vector $x_{ij} = [x_{ij1}, \ldots, x_{ijK}]'$ associated with $Y_{ij}$, such that

$$\mu_{ij} = E(Y_{ij} \mid x_{ij}) = g(x_{ij}'\beta), \qquad \mathrm{Var}(Y_{ij} \mid x_{ij}) = h(\mu_{ij}), \tag{3.1}$$

for some functions $g(\cdot)$ and $h(\cdot)$. For example, when $Y_{ij}$ is binary, and logistic regression is used,

$$\mu_{ij} = \frac{e^{x_{ij}'\beta}}{1 + e^{x_{ij}'\beta}} \quad \text{and} \quad \mathrm{Var}(Y_{ij} \mid x_{ij}) = \mu_{ij}(1 - \mu_{ij}).$$
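As a numerical check of the OLS identity (2.7) from Section 2, the following sketch fits a linear model to simulated data and compares the squared partial correlation computed from the error sums of squares in (2.2) with the square of $f(Z_K)$. The data, the seed, the coefficient values and the helper names `f` and `ols_fit` are all assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def f(z, n):
    """Transformation in (2.7): maps a Z-statistic to the (-1, 1) scale."""
    return (z / np.sqrt(n)) / np.sqrt(1.0 + z**2 / n)

def ols_fit(X, y):
    """Return the OLS coefficients and the error sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

# Hypothetical simulated data: intercept plus two covariates (K = 3).
rng = np.random.default_rng(0)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

beta_hat, sse_full = ols_fit(X, y)
_, sse_reduced = ols_fit(X[:, :-1], y)      # model without the Kth covariate

# Squared partial correlation from the error sums of squares, as in (2.2).
rho2_sse = 1.0 - sse_full / sse_reduced

# Z-statistic (2.6), using the ML variance estimate (divisor N, as in (2.1)).
sigma2_ml = sse_full / N
var_beta_K = sigma2_ml * np.linalg.inv(X.T @ X)[-1, -1]
z_K = beta_hat[-1] / np.sqrt(var_beta_K)
rho_z = f(z_K, N)

print(rho2_sse, rho_z**2)                   # the two values should agree
```

The agreement depends on using the divisor $N$ in the variance estimate; with the usual unbiased divisor $N - K$ the two quantities would differ slightly.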

The GEE estimator of $\beta$, $\hat\beta$, is found by solving the estimating equations

$$u(\hat\beta) = \sum_{i=1}^{N} \hat D_i' \hat V_i^{-1} [Y_i - \mu_i(\hat\beta)] = 0, \tag{3.2}$$

where $\mu_i = [\mu_{i1}, \ldots, \mu_{in_i}]'$, $D_i = \partial\mu_i(\beta)/\partial\beta'$, and $V_i$ is the $n_i \times n_i$ working covariance matrix of $Y_i$. This working covariance matrix is specified through a working correlation matrix. In particular, the correlation structure of $Y_i$ is accounted for by $R_i(\alpha)$, an $n_i \times n_i$ working correlation matrix, which is fully specified by an $s \times 1$ vector of unknown parameters $\alpha$. In (3.2), $V_i = A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i$ is an $n_i \times n_i$ diagonal matrix with $\mathrm{Var}(Y_{ij} \mid x_{ij}) = h(\mu_{ij})$ as the $j$th diagonal entry. The estimate of $\beta$ is obtained by plugging a consistent estimator of $\alpha$ into (3.2) and solving for $\beta$ iteratively. At each iteration, both the estimate of $\alpha$ and the estimate of $\beta$ are updated. Liang and Zeger (1986) show that, under mild regularity conditions, $\hat\beta$ is a consistent estimator of $\beta$. A robust sandwich estimator (White, 1982) can be used to consistently estimate the covariance matrix of $\hat\beta$, and is particularly useful since the working correlation may be misspecified.

To test $H_0: \beta_k = 0$, one can use the GEE Wald Z-statistic

$$Z_k = \frac{\hat\beta_k}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_k)}}, \tag{3.3}$$

where $\widehat{\mathrm{Var}}(\hat\beta_k)$ is the robust variance estimate. Making the analogy to (2.7), we define a measure of partial association between $Y_{ij}$ and $x_{ijk}$ to be

$$\hat\rho_k = \frac{Z_k/\sqrt{N}}{\sqrt{1 + Z_k^2/N}}. \tag{3.4}$$

Since

$$\hat\rho_k^2 = \frac{Z_k^2/N}{1 + Z_k^2/N}$$

is always between 0 and 1, $\hat\rho_k$ will always be in the interval $[-1, 1]$.

However, we now propose a slight modification to $\hat\rho_k$ in (3.4) because, for ordinary logistic regression, the Wald statistic in (3.3) has been shown to have poor properties as $\hat\beta_k$ gets large (Hauck and Donner, 1977). In particular, Hauck and Donner (1977) showed that the Wald statistic is not a monotone increasing function of the parameter estimate $\hat\beta_k$ as the distance between $\hat\beta_k$ and the null value (0) increases. In fact, as the distance between $\hat\beta_k$ and 0 increases, the Wald statistic increases to a certain point and then decreases toward 0. The Wald statistic $Z_k^2$ behaves this way because, as $|\hat\beta_k| \to \infty$, $\widehat{\mathrm{Var}}(\hat\beta_k)$ increases at a faster rate than $\hat\beta_k^2$. Because of this property, Hauck and Donner (1977) recommend that the Wald statistic in (3.3) not be used with logistic regression. Thus, since $Z_k^2$ is not a monotone increasing function of the parameter estimate $\hat\beta_k$, $\hat\rho_k^2$ in (3.4) will not be either.

In many data analysis problems, an alternative to the Wald statistic that has greater power is the likelihood ratio statistic. Unfortunately, there is no likelihood in GEE, so a likelihood ratio statistic cannot be used to form a measure of partial association for GEE. Here, we propose the use of the Wald statistic with the variance of $\hat\beta_k$ estimated under the null $H_0: \beta_k = 0$. To obtain the variance estimate under the null, say $\widetilde{\mathrm{Var}}(\hat\beta_k)$, one replaces $\hat\beta$ in the GEE robust variance $\widehat{\mathrm{Var}}(\hat\beta_k)$ with the estimate $\tilde\beta$ under the null. Thus, our proposed Wald statistic for use in a measure of partial association is

$$\tilde Z_k = \frac{\hat\beta_k}{\sqrt{\widetilde{\mathrm{Var}}(\hat\beta_k)}}, \tag{3.5}$$

whose square, under the null, as $N \to \infty$, is approximately chi-square with 1 df. Then, we define the measure of partial association between $Y_{ij}$ and $x_{ijk}$ to be

$$\tilde\rho_k = \frac{\tilde Z_k/\sqrt{N}}{\sqrt{1 + \tilde Z_k^2/N}}. \tag{3.6}$$

The intuition is that $\tilde\rho_k$ transforms the test statistic $\tilde Z_k$ to a more intuitively appealing $(-1, 1)$ scale. In Section 6, we compare the asymptotic properties of (3.4) and (3.6) in a simple logistic regression setting without repeated measures to explore the properties of these two measures of partial association.

4 Extension to the coefficient of determination

Suppose we have a normal linear regression model as in Section 2, with the model $E(Y_i \mid x_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} = x_i'\beta$, and variance $\mathrm{Var}(Y_i \mid x_i) = \sigma^2$.
We let $\hat\beta$ be the maximum likelihood estimator of $\beta$, and $\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}$ be the estimated covariance matrix of $\hat\beta$, where $\hat\sigma^2$ is given by (2.1). To test $H_0: \beta_1 = \cdots = \beta_K = 0$, one can use the Wald statistic

$$Q = [C\hat\beta]' [C\,\widehat{\mathrm{Var}}(\hat\beta)\,C']^{-1} [C\hat\beta], \tag{4.1}$$

where $C$ is a $K \times (K+1)$ matrix with its first column having all 0s, and its last $K$ columns being the $K \times K$ identity matrix. Christensen (1996) shows that the coefficient of determination, $R^2$, equals

$$R^2 = \frac{Q/N}{1 + Q/N}. \tag{4.2}$$

For the repeated measures model $E(Y_{ij} \mid x_{ij}) = g(\beta_0 + \beta_1 x_{ij1} + \cdots + \beta_K x_{ijK})$, one can use a Wald test like (4.1) to test $H_0: \beta_1 = \cdots = \beta_K = 0$, and form an $R^2$ statistic like (4.2). However, because of the problems encountered using the Wald statistic discussed in the previous section, we propose the coefficient of determination

$$\tilde R^2 = \frac{\tilde Q/N}{1 + \tilde Q/N}, \tag{4.3}$$

where

$$\tilde Q = [C\hat\beta]' [C\,\widetilde{\mathrm{Var}}(\hat\beta)\,C']^{-1} [C\hat\beta] \tag{4.4}$$

is the Wald statistic with the GEE robust covariance matrix estimated under the null $H_0: \beta_1 = \cdots = \beta_K = 0$, and denoted by $\widetilde{\mathrm{Var}}(\hat\beta)$. Again, the intuition is that $\tilde R^2$ transforms the Wald test statistic $\tilde Q$ to a more intuitively appealing $(0, 1)$ scale. The properties of our proposed coefficient of determination in (4.3) are explored in Section 7.

Unlike linear regression using OLS, there is no guarantee that a model with additional covariates will have a larger $\tilde R^2$, although most of the time it will. If an additional covariate adds very little information, it is possible that $\tilde R^2$ could decrease very slightly; this possibility is explored in the simulations in Section 7. For likelihood-based methods, if one substituted a likelihood ratio statistic for the Wald statistic $Q$ in $R^2$, as proposed by Bohrnstedt and Knoke (1994), then a model with additional covariates would always have a larger value of $R^2$. Unfortunately, since there is no likelihood with GEE, this substitution cannot be used.

5 Example

Late cardiotoxic effects of the chemotherapy doxorubicin are increasingly a problem for patients who survive childhood cancer. Cardiotoxicity is often progressive and some patients have disabling symptoms. In our example (Lipshultz et al., 1995), the objective was to identify risk factors for late cardiotoxicity, as measured by abnormal wall stress of the heart over time: $Y_{ij}$ equals 1 if wall stress is abnormal and 0 if it is normal at time $j$. The abnormal wall stress was determined by examining echocardiograms from $N = 115$ children and adults who had received cumulative doses of 45 to 550 mg of doxorubicin per square metre of body-surface area for the treatment of acute lymphoblastic leukaemia in childhood. The covariates are the cumulative dose (ranging from 45 to 550 mg); age at end of treatment (ranging from 1.5 to 20 years); time since the end of chemotherapy to the given wall stress measurement (ranging from 0 to 15.5 years); and sex (1 = female, 0 = male). Table 1 shows data from 10 of the 115 patients on file.

If we let $\pi_{ij} = \mathrm{pr}(Y_{ij} = 1 \mid x_{ij})$ be the probability of abnormal wall stress at the $j$th measurement, then the logistic regression model of interest is

$$\mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_1\,\mathrm{sex}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{dose}_i + \beta_4\,\tau_{ij},$$

where $\tau_{ij}$ is a function that maps the $j$th measurement ($j = 1, \ldots, n_i$) on individual $i$ to the time since the end of chemotherapy. Since the observation times differ between individuals, we modelled the correlation as autoregressive 1 (AR1), i.e., for a subject seen at times $j$ and $l$,

$$\mathrm{Corr}(Y_{ij}, Y_{il} \mid x_{ij}, x_{il}) = \alpha^{|\tau_{ij} - \tau_{il}|},$$

where $0 < \alpha < 1$. When $j = l$, the correlation equals 1. When observations are far apart in time, that is, as $|\tau_{ij} - \tau_{il}| \to \infty$, $\alpha^{|\tau_{ij} - \tau_{il}|} \to 0$ and the observations are uncorrelated. However, to explore how the proposed measure of partial association is affected by different working correlation models, we also obtained GEE estimates under the naive assumption of independence, as well as under an exchangeable correlation structure, in which

$$\mathrm{Corr}(Y_{ij}, Y_{il} \mid x_{ij}, x_{il}) = \alpha \quad \text{for all } j \neq l.$$
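The two non-trivial working correlation structures described above can be written down directly. The following is a minimal sketch; the function names, the observation times and the value of $\alpha$ are hypothetical and chosen only to show the shape of the matrices.

```python
import numpy as np

def ar1_corr(times, alpha):
    """AR1 working correlation for unequally spaced times: alpha ** |t_j - t_l|."""
    t = np.asarray(times, dtype=float)
    return alpha ** np.abs(t[:, None] - t[None, :])

def exchangeable_corr(n, alpha):
    """Exchangeable working correlation: alpha off the diagonal, 1 on it."""
    return np.full((n, n), alpha) + (1.0 - alpha) * np.eye(n)

# Hypothetical subject seen 0.5, 2.0 and 7.5 years after the end of treatment.
times = [0.5, 2.0, 7.5]
R_ar1 = ar1_corr(times, alpha=0.6)
R_exc = exchangeable_corr(len(times), alpha=0.6)

print(R_ar1)
print(R_exc)
```

Under AR1 the correlation decays with the gap between visits, so the first and third visits are less correlated than the first and second; under the exchangeable structure every pair of visits shares the same correlation.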
Here, we calculate the measure of partial association and the coefficient of determination using the Wald statistic with robust variance estimated under the null. Table 2 gives GEE estimates for the three working correlation models, as well as the estimated measures of partial association, and the $\tilde R^2$ obtained by adding the covariates to the model sequentially in the order in which they appear in the table. Note that the measures of partial association and coefficients of determination are very similar for all working correlation models; further note that $\tilde R^2$ increases as more covariates are added to the model.

Table 2 Estimates from the cardiac study for three working correlation models. Columns: Variable; Cov model; Estimate; Standard error; z-value; p-value; Partial corr; $R^2$. Rows: Intercept, Dose, Female, Age at Trt, Time since end, each under the IND, EXC and AR1 working correlation models.

Using the estimates of $\beta$ from the AR1 working correlation model, dose is the most significant predictor, with a measure of partial association of about 0.43, meaning that an increase in dose increases the probability of an abnormal wall stress measurement. In this example, the clinical investigator was not sure whether to scale dose as milligrams (as it is scaled in the current dataset) or grams; since the Z-statistic does not depend on linear transformations of the covariate, our measure of partial association is the same on either scale. Female gender is the next most significant predictor, with a partial association of 0.17, meaning that females have a higher probability of abnormal wall stress. Age at the end of treatment is the next most significant predictor, with a partial association of 0.15, meaning that an older patient is more likely to have abnormal wall stress. Finally, there appears to be no significant partial association between the time since the end of treatment and wall stress.

The overall measure $\tilde R^2 = 0.23$ may indicate that if other covariates were available, we might be able to find a better fitting model. Note that some coefficients of determination are constrained to be less than 1 for binary outcome data (discussed in Section 7), so that a value of 0.23 might actually mean a good fit; however, in the simulations in Section 7, our proposed coefficient of determination does not appear to be constrained to be less than 1, so a value of 0.23 may indicate that the model is not a great fit.

Although this is just an example, the measures of partial association were very similar for all working correlation models.
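To make the calculation behind the partial association column concrete, here is a minimal sketch of $\tilde\rho_k$ in (3.6) for a logit link under the independence working correlation only, where the estimating equations (3.2) reduce to $\sum_i X_i'(Y_i - \mu_i) = 0$. It is not the authors' software: the function names are hypothetical, the data are simulated rather than taken from the cardiotoxicity study, and the AR1 and exchangeable fits are not covered.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logit_independence(X, y, n_iter=100, tol=1e-12):
    """Solve sum_i X_i'(Y_i - mu_i) = 0 by Newton-Raphson (independence GEE, logit link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def sandwich_cov(X, y, cluster, beta):
    """Robust covariance M^{-1} B M^{-1}, evaluated at whatever beta is supplied."""
    mu = expit(X @ beta)
    M = X.T @ (X * (mu * (1.0 - mu))[:, None])
    B = np.zeros_like(M)
    for c in np.unique(cluster):
        rows = cluster == c
        u = X[rows].T @ (y[rows] - mu[rows])    # score contribution of one cluster
        B += np.outer(u, u)
    M_inv = np.linalg.inv(M)
    return M_inv @ B @ M_inv

def partial_association_null(X, y, cluster, k):
    """rho-tilde_k in (3.6): Wald Z with the robust variance evaluated under beta_k = 0."""
    n_clusters = len(np.unique(cluster))
    beta_hat = fit_logit_independence(X, y)
    keep = [j for j in range(X.shape[1]) if j != k]
    beta_null = np.zeros(X.shape[1])            # estimate under H0, with beta_k fixed at 0
    beta_null[keep] = fit_logit_independence(X[:, keep], y)
    var_null = sandwich_cov(X, y, cluster, beta_null)[k, k]
    z = beta_hat[k] / np.sqrt(var_null)
    return (z / np.sqrt(n_clusters)) / np.sqrt(1.0 + z**2 / n_clusters)

# Hypothetical simulated data: 80 clusters of 4 binary responses each.
rng = np.random.default_rng(1)
n_clusters, n_per = 80, 4
cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters * n_per)
b = np.repeat(rng.normal(scale=0.5, size=n_clusters), n_per)  # shared effect within a cluster
y = rng.binomial(1, expit(-0.5 + 0.8 * x + b)).astype(float)
X = np.column_stack([np.ones_like(x), x])

rho = partial_association_null(X, y, cluster, k=1)

# Rescaling the covariate (for example, milligrams to grams) leaves the measure unchanged.
X_rescaled = X.copy()
X_rescaled[:, 1] *= 10.0
rho_rescaled = partial_association_null(X_rescaled, y, cluster, k=1)
print(rho, rho_rescaled)
```

The last lines illustrate the scale invariance noted for dose: multiplying a covariate by a constant changes the coefficient and its standard error by the same factor, so the Z-statistic and the measure of partial association are unaffected.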
However, for some datasets (especially with time-varying covariates), more complex working correlation models (e.g., AR1 versus independence) will often lead to higher efficiency than simpler working correlation models, which in turn will lead to larger Wald statistics, and thus larger values of the partial associations.

6 Asymptotic study of measures of partial association

To compare (3.6) with (3.4) in a simple data setting without repeated measures, we consider logistic regression for a $(2 \times 2)$ table. Suppose that the binary covariate is denoted $X_i$ and the binary outcome is $Y_i$. In the most general case, with both $X_i$ and $Y_i$ random, the joint probability that $(X_i = j)$ and $(Y_i = k)$ is denoted by $p_{jk} = \mathrm{pr}[(X_i = j), (Y_i = k)]$, for $j = 0, 1$ and $k = 0, 1$. For simple logistic regression of the Bernoulli outcome $Y_i$ versus the binary covariate $X_i$, the logistic regression coefficient of $X_i$, denoted by $\beta_1$, is the log odds ratio,

$$\beta_1 = \log(p_{11}) + \log(p_{00}) - \log(p_{10}) - \log(p_{01}).$$

The maximum likelihood estimate (MLE) of $\beta_1$, say $\hat\beta_1$, is calculated by replacing $p_{jk}$ in $\beta_1$ by the MLE $\hat p_{jk} = n_{jk}/N$, where $n_{jk}$ is the number of subjects with $(X_i = j)$ and $(Y_i = k)$, and $N$ is the total sample size. The variance of $\hat\beta_1$ under the alternative that the odds ratio does not equal 1 is

$$\mathrm{Var}(\hat\beta_1) = \frac{1}{N}\left(\frac{1}{p_{11}} + \frac{1}{p_{00}} + \frac{1}{p_{10}} + \frac{1}{p_{01}}\right).$$

The variance of the MLE under the null that the odds ratio equals 1 is

$$\mathrm{Var}_0(\hat\beta_1) = \frac{1}{N}\left(\frac{1}{p_{1+}p_{+1}} + \frac{1}{p_{0+}p_{+0}} + \frac{1}{p_{1+}p_{+0}} + \frac{1}{p_{0+}p_{+1}}\right),$$

where $p_{j+} = p_{j0} + p_{j1}$ and $p_{+k} = p_{0k} + p_{1k}$ are the marginal probabilities. Then, it can be easily shown that, as $N \to \infty$, $\hat\rho_1$ in (3.4) converges in probability to

$$\rho_{1a} = \frac{\beta_1 \big/ \sqrt{p_{11}^{-1} + p_{00}^{-1} + p_{10}^{-1} + p_{01}^{-1}}}{\sqrt{1 + \beta_1^2 \big/ \left[p_{11}^{-1} + p_{00}^{-1} + p_{10}^{-1} + p_{01}^{-1}\right]}},$$

and $\tilde\rho_1$ in (3.6) converges in probability to

$$\rho_{1o} = \frac{\beta_1 \big/ \sqrt{(p_{1+}p_{+1})^{-1} + (p_{0+}p_{+0})^{-1} + (p_{1+}p_{+0})^{-1} + (p_{0+}p_{+1})^{-1}}}{\sqrt{1 + \beta_1^2 \big/ \left[(p_{1+}p_{+1})^{-1} + (p_{0+}p_{+0})^{-1} + (p_{1+}p_{+0})^{-1} + (p_{0+}p_{+1})^{-1}\right]}}.$$
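The two limits can be evaluated for any set of cell probabilities. The sketch below codes $\rho_{1a}$ and $\rho_{1o}$ as given above; the function name is hypothetical, the fixed cells $p_{10} = 0.05$ and $p_{11} = 0.45$ follow the scenario the paper considers, and the particular grid of $p_{00}$ values is an assumption chosen to make the contrast visible.

```python
import math

def asymptotic_rhos(p00, p01, p10, p11):
    """Return (rho_1a, rho_1o): limits with the variance under the alternative and the null."""
    beta1 = math.log(p11) + math.log(p00) - math.log(p10) - math.log(p01)
    # N * Var(beta1-hat) under the alternative.
    v_alt = 1 / p11 + 1 / p00 + 1 / p10 + 1 / p01
    # N * Var_0(beta1-hat) under the null, built from the marginal probabilities.
    r0, r1 = p00 + p01, p10 + p11          # p_{0+}, p_{1+}
    c0, c1 = p00 + p10, p01 + p11          # p_{+0}, p_{+1}
    v_null = 1 / (r1 * c1) + 1 / (r0 * c0) + 1 / (r1 * c0) + 1 / (r0 * c1)

    def f(v):
        return (beta1 / math.sqrt(v)) / math.sqrt(1 + beta1**2 / v)

    return f(v_alt), f(v_null)

# Illustrative grid: p00 + p01 = 0.5 throughout, so beta1 grows as p01 shrinks.
for p00 in (0.05, 0.25, 0.45, 0.49, 0.4999):
    rho_a, rho_o = asymptotic_rhos(p00, 0.5 - p00, 0.05, 0.45)
    print(f"p00={p00:<7} rho_1a={rho_a:.3f} rho_1o={rho_o:.3f}")
```

As $p_{01}$ approaches 0 the term $1/p_{01}$ dominates the variance under the alternative, which is what pulls $\rho_{1a}$ back toward 0, while the variance under the null stays bounded and $\rho_{1o}$ keeps increasing.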

Figure 1 Asymptotic study for partial associations (vertical axis: Rho; horizontal axis: BETA1; methods: Wald null, Wald)

Then, to compare $\rho_{1a}$ and $\rho_{1o}$, we consider the scenario in which the cell probabilities $p_{10} = 0.05$ and $p_{11} = 0.45$, and $p_{00}$ varies from 0.01 to 0.49 in increments of 0.05, and correspondingly, $p_{01}$ varies from 0.49 to 0.01, giving log-odds ratios ($\beta_1$) that range from 0 to 7.7 and odds ratios that range upward from 1. A plot comparing $\rho_{1a}$ and $\rho_{1o}$ in this scenario is given in Figure 1. From Figure 1, we see that $\rho_{1o}$ (Wald $\rho$ with variance under the null) gives higher values than $\rho_{1a}$ (Wald $\rho$ with variance under the alternative) throughout the whole curve. The Wald $\rho$ with variance under the null appears to be converging to 1 as $\beta_1$ gets larger, whereas, as discussed in Hauck and Donner (1977), the usual Wald statistic in (3.3) has poor properties; it is not a monotone increasing function of the parameter estimate as the distance between the parameter estimate and the null value increases. This bears out in Figure 1, in which $\rho_{1a}$ increases slowly as $\beta_1$ increases from 0 to 4.5, and then decreases back toward 0 for $\beta_1 > 4.5$. This suggests that the Wald $\rho$ with variance under the alternative, $\hat\rho$, has poor properties, and should not be used. Even though we have only considered a non-repeated measures case here, we expect similar results to hold for GEE, since maximum likelihood for ordinary logistic regression is a special case of GEE.

7 Simulations for coefficients of determination

In this section we study the finite sample performance of our proposed coefficients of determination. In order to compare our proposed coefficients to a well-established coefficient of determination, for example, the pseudo-$R^2$ proposed by Cox and Snell (1989) based on the likelihood-ratio statistic, we perform simulations in which the true model is an ordinary logistic regression for independent univariate observations. In particular, we have performed simulations for ordinary logistic regression with univariate outcomes instead of GEE for repeated measures data so that our proposed coefficient of determination can be compared to the coefficient of determination based on the likelihood (Cox and Snell, 1989). The pseudo-$R^2$ proposed by Cox and Snell (1989) based on the likelihood-ratio statistic is defined as

$$R^2_{LR} = 1 - \left[\frac{L(0)}{L(\hat\beta)}\right]^{2/n},$$

where $L(\hat\beta)$ denotes the likelihood for the fitted model, and $L(0)$ denotes the likelihood for the model in which all regression coefficients (except the intercept) are 0. Below, we compare the size of the three coefficients of determination (Cox and Snell's likelihood ratio, Wald, and Wald with variance estimated under the null) as a logistic regression coefficient gets larger. We also explore the number of times the Wald coefficients of determination decrease as an unimportant covariate (in which the true logistic regression coefficient is 0) is added to the model.

We formulate the true Bernoulli distribution from which we simulate as

$$p(y_i \mid x_{i1}, x_{i2}, x_{i3}, \beta) = p_i^{y_i} (1 - p_i)^{1 - y_i}, \tag{7.1}$$

where the logit of the probability of success equals

$$\mathrm{logit}[\mathrm{pr}(Y_i = 1 \mid x_{i1}, x_{i2}, x_{i3}, \beta)] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}, \tag{7.2}$$

$i = 1, \ldots, n$, with $\beta_0$, $\beta_1$ and $\beta_2$ held fixed. In the simulations, $\beta_3$ is varied from 0 to 5.5, and, for simplicity, we fixed the total sample size at $n = 400$. For the covariate distributions, we let $X_{i1}$, $X_{i2}$ and $X_{i3}$ all be independent of each other, with $X_{i1}$ having a Bernoulli distribution with $\mathrm{pr}(X_{i1} = 1) = 0.5$, and $X_{i2}$ and $X_{i3}$ both having $N(0, 1)$ distributions.
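For a single simulated dataset, the three coefficients of determination compared in this section could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' simulation code: the coefficient values used to generate the data are hypothetical, the function names are invented for illustration, and the Wald statistics use the model-based covariance $(X'WX)^{-1}$, which is an assumption made here because the observations are independent.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Maximum likelihood by Newton-Raphson; returns the estimate and the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = expit(X @ beta)
    loglik = float(np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)))
    return beta, loglik

def model_cov(X, beta):
    """Model-based covariance (X'WX)^{-1}, evaluated at whatever beta is supplied."""
    mu = expit(X @ beta)
    return np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))

def wald_r2(beta_hat, cov, n):
    """(Q/n) / (1 + Q/n), with Q the Wald statistic for all non-intercept coefficients."""
    b, V = beta_hat[1:], cov[1:, 1:]
    q = float(b @ np.linalg.solve(V, b))
    return (q / n) / (1.0 + q / n)

def three_r2(X, y):
    n = len(y)
    beta_hat, ll_full = fit_logistic(X, y)
    beta0, ll_null = fit_logistic(X[:, :1], y)            # intercept-only fit
    beta_null = np.zeros_like(beta_hat)
    beta_null[0] = beta0[0]
    r2_lr = 1.0 - np.exp(2.0 * (ll_null - ll_full) / n)   # Cox and Snell
    r2_lr_max = 1.0 - np.exp(2.0 * ll_null / n)           # largest value r2_lr can take
    r2_wald = wald_r2(beta_hat, model_cov(X, beta_hat), n)
    r2_wald_null = wald_r2(beta_hat, model_cov(X, beta_null), n)
    return r2_lr, r2_lr_max, r2_wald, r2_wald_null

# Hypothetical data: n = 400, one Bernoulli(0.5) and two N(0, 1) covariates.
rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, size=n),
                     rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([0.0, 0.5, 0.5, 2.0])                # illustrative values only
y = rng.binomial(1, expit(X @ true_beta)).astype(float)

r2_lr, r2_lr_max, r2_wald, r2_wald_null = three_r2(X, y)
print(r2_lr, r2_lr_max, r2_wald, r2_wald_null)
```

Dividing `r2_lr` by `r2_lr_max` gives the rescaled likelihood-based coefficient of Nagelkerke (1991), which is the natural comparison for a Wald-based coefficient that is not bounded away from 1.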
We took one sample of 400 random vectors $(X_{i1}, X_{i2}, X_{i3})$, and fixed this covariate distribution for all simulations. We performed 12 sets of 1000 simulations, corresponding to varying $\beta_3$ from 0 to 5.5 in increments of 0.5. First, we want to compare the size of the coefficients of determination as $\beta_3$ increases. In particular, we calculated the three $R^2$ measures when fitting the full model (7.2) with covariates $X_{i1}$, $X_{i2}$ and $X_{i3}$. Figure 2 plots the average value of each $R^2$ over the 1000 simulations at each value of $\beta_3$.

Figure 2 Simulation results for coefficients of determination (vertical axis: R-Square; horizontal axis: BETA3; methods: Wald null, Likelihood, Wald)

The results from Figure 2 are similar to those in Figure 1. From Figure 2, we see that the Wald $R^2$ with variance under the null gives higher values than the other two. In fact, the Wald $R^2$ with variance under the null appears to be converging to 1 as $\beta_3$ gets larger. As discussed earlier, Hauck and Donner (1977) showed that the usual Wald statistic in (3.3) has poor properties; it is not a monotone increasing function of the parameter estimate as the distance between the parameter estimate and the null value increases. This bears out in Figure 2, in which the coefficient of determination based on the usual Wald statistic increases slowly as $\beta_3$ increases from 0 to 2.5, and then decreases back toward 0 for $\beta_3 > 2.5$.

The likelihood-ratio coefficient of determination (Cox and Snell, 1989) does not give values as high as the Wald $R^2$ with variance under the null; however, as discussed by Nagelkerke (1991), the likelihood-ratio $R^2$ can take on a maximum value of $1 - L(0)^{2/n}$, which is the reason it is always less than our proposed method. For the simulation with $\beta_3 = 5.5$, on average, the maximum value that the likelihood-ratio $R^2$ can take on is approximately 0.75. When we divide the average value of the likelihood-ratio $R^2$ at $\beta_3 = 5.5$ by 0.75, we get 0.82, which is much closer to 0.89, the value of the Wald $R^2$ with variance under the null. Thus, unlike the likelihood coefficient of determination proposed by Cox and Snell (1989), our proposed coefficient of determination does not appear to be constrained to be less than 1; our coefficient of determination appears to be more similar to the redefined likelihood coefficient of determination proposed by Nagelkerke (1991), which is not constrained to be less than 1. Note, as discussed in Section 4, in theory, under the weak assumption that $\beta_0$ is finite under the null $H_0: \beta_1 = \cdots = \beta_K = 0$, $\tilde R^2$ will converge to 1 if any of the $\hat\beta_k$'s converge to $\pm\infty$; the simulations in this section confirm that result.

In the simulations, we also calculated the three $R^2$ measures when fitting the reduced model with covariates $X_{i1}$ and $X_{i2}$ (that is, dropping $X_{i3}$ out of the model).

We did this in order to estimate the percentage of times the $R^2$ measures decrease when adding another covariate ($X_{i3}$) to the model. In particular, we calculated the number of times $R^2$ with $(X_{i1}, X_{i2}, X_{i3})$ in the model is less than $R^2$ with just $(X_{i1}, X_{i2})$ in the model. Of course, for the likelihood ratio, the number of times will be 0 since the likelihood always increases as more covariates are added to the model. When $\beta_3 = 0$, $R^2$ decreased in 14.2% of the simulations for the usual Wald, and in 7.6% for the Wald with variance under the null. However, for practical purposes, if $R^2$ was rounded to the third decimal place, the number of times $R^2$ decreased would be 0. For any simulations with $\beta_3 > 0$, $R^2$ never decreased, even without rounding. Since the likelihood ratio $R^2$ is not available for GEE, and our Wald $R^2$ with variance under the null compares favorably to the likelihood ratio $R^2$, we propose use of our Wald $R^2$ with variance under the null. Because of its poor properties, we suggest never using the coefficient of determination based on the usual Wald statistic.

8 Discussion

In this paper, we have proposed a measure of partial association for GEE that is an extension of what is used for linear regression. The set of partial associations may be especially useful when the covariates are measured on different scales. Most importantly, since the Z-statistic is invariant to linear transformations, and our measure of partial association is a monotone function of the Z-statistic, our measure of partial association is invariant to linear transformation. Thus, even if two investigators disagree about how to scale the covariates, the measure of partial association will be unaffected by the scale change. Although partial associations are useful, they are not without their problems (Greenland et al., 1986).
The partial associations do not provide more insight than is given in the estimates, estimated standard errors, Z-statistics, and p-values, but are a summary measure. We do not suggest that an investigator use the partial associations alone instead of the usual results that are published, but in conjunction with them.

We have also proposed an overall $R^2$ to measure the strength of association between the outcome variable and predictors. This proposed $R^2$ is a monotone function of the Wald statistic with variance under the null for testing if all regression coefficients (except the intercept) are 0. If an additional covariate is very nonsignificant, there is a small possibility that our proposed $R^2$ could decrease very slightly. However, in simulations, we found that, for all practical purposes, our proposed measure of $R^2$ never decreased. Further, unlike other proposed coefficients of determination, our proposed coefficient can take on values close to 1.

In our example, we found that the measures of partial association and coefficient of determination based on the Wald under the null were very similar for all working correlations. However, in datasets with time-varying covariates, more complex covariance models (e.g., AR1 versus independence) will often lead to higher efficiency than simpler covariance models, which in turn will lead to larger Wald statistics, and thus larger values of the partial associations and coefficient of determination.

Acknowledgements

We are grateful for the support provided by grants AI 60373, GM 29745, HL 69800, CA 74015, CA 70101, CA 68484, HL from the National Institutes of Health (USA), and RCD from the Department of Veterans Affairs (USA).

References

Bohrnstedt G and Knoke D (1994) Statistics for Social Data Analysis (Third Edition). Itasca, IL: FE Peacock Publishers, Inc.
Christensen R (1996) Plane Answers to Complex Questions. The Theory of Linear Models (Second Edition). New York, NY: Springer-Verlag.
Cox DR and Snell EJ (1989) Analysis of binary data. London: Chapman and Hall.
Greenland S, Schlesselman JJ and Criqui MH (1986) The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology, 123.
Hauck WW and Donner A (1977) Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72.
Liang KY and Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73.
Lipshultz SE, Lipsitz SR, Mone SM, Goorin AM, Sallan SE and Sanders SP (1995) Female sex and higher drug dose as risk factors for late cardiotoxic effects of doxorubicin therapy for childhood cancer. New England Journal of Medicine, 332.
Magee L (1990) R2 measures based on Wald and likelihood ratio joint significance tests. The American Statistician, 44.
Nagelkerke NJD (1991) A note on a general definition of the coefficient of determination. Biometrika, 78.
Neter J, Kutner MH, Nachtsheim CJ and Wasserman W (1996) Applied Linear Statistical Models (Fourth Edition). Boston, MA: McGraw-Hill.
Prentice RL (1988) Correlated binary regression with covariates specific to each binary observation. Biometrics, 44.
White H (1982) Maximum likelihood estimation under mis-specified models. Econometrica, 50, 1-26.
