

A note on the sensitivity to assumptions of a generalized linear mixed model
Author(s): D. R. Cox and M. Y. Wong
Source: Biometrika, Vol. 97, No. 1 (March 2010)
Published by: Oxford University Press on behalf of Biometrika Trust

Biometrika (2010), 97, pp
© 2010 Biometrika Trust. Printed in Great Britain
doi: /biomet/asp083

Miscellanea

A note on the sensitivity to assumptions of a generalized linear mixed model

By D. R. COX
Nuffield College, Oxford OX1 1NF, U.K.
david.cox@nuffield.ox.ac.uk

and M. Y. WONG
Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
mamywong@ust.hk

Summary

A simple case of Poisson regression is used to study the potential gain in efficiency from using a mixed model representation. Possible systematic errors arising from misspecification of the random terms in the model are examined. It is shown in particular that for a special but realistic problem, appreciable bias may arise from misspecification of a random component.

Some key words: Conditional likelihood; Multilevel model; Poisson model; Recovery of information; Stratification.

1. Introduction

Generalized linear mixed models are widely used, especially in the analysis of observational studies. See, for example, Breslow & Clayton (1993) and Snijders & Bosker (1999, Ch. 14). In the present paper we discuss briefly some general issues connected with such models and then examine in more detail the sensitivity to assumptions of a particular but representative special case. We also assess the nominal improvement in precision that the mixed model formulation produces.

We consider for illustration a simple special situation in which observations are grouped into centres, blocks or strata. Some parameters, called structural parameters, have a common interpretation across strata and are of intrinsic interest, even if their value changes from stratum to stratum. Others are stratum-specific and are nuisance parameters for the current purpose. A random effect formulation may be used for one or both types of parameter. First we briefly discuss some of the general issues involved, but the detailed analysis in the paper is restricted to the second type of parameter.

2. Structural parameters

A typical example of a structural parameter is a treatment contrast in an observational or experimental study replicated in a broadly similar and comparable form in a number of centres. If there is inexplicable variation between centres in the treatment effect, as judged by a significant treatment by centre interaction, it is common, although not entirely uncontroversial, to add a centre-specific random component to the treatment effect. For a general discussion, see Cox & Solomon (2000, § 4.5). The effect of this as compared with an assumption of constant treatment effect is, for unbalanced data, both movement of the point estimate of the treatment effect towards an unweighted average of the individual effects and increase in the notional standard error. A realistic estimation of the magnitude of the additional component of variance requires that the number of centres is appreciable. The parameter defining the treatment effect refers to an average over an ensemble of repetitions of the stochastic system generating the interaction; it is probably rarely plausible to regard the centres as a random sample from a meaningful target population of centres, leading to an interpretation that, if available, would be more tangible.

We now suppose that interest focuses on the structural parameter.

3. Formulation of a special case

In order to study a specific situation, we suppose that observations are available in pairs $(y_{j0}, y_{j1})$ for $j = 1, \ldots, m$ represented by independent Poisson distributed random variables with means $(r_{j0}\alpha_j e^{-\theta}, r_{j1}\alpha_j e^{\theta})$, where $\theta$ is the structural parameter and $\alpha_j$ is stratum-specific. Here $r_{j0}$ and $r_{j1}$ are known constants specifying, for example, the sizes of risk sets. In this formulation the number, $m$, of strata may be quite large.

This model is representative of, in particular, epidemiological studies of mortality in unexposed and exposed individuals stratified by, for example, age and other features. The model has been analyzed by de Stavola & Cox (2008) with the stratum parameters fixed, their objective being to examine the effects of unnecessary stratification. An essentially similar but more complicated version would allow the estimation of, say, a logistic regression equation within each stratum, some of the parameters of which are stratum-specific.

4. Some general considerations

If interest is in $\theta$, the presence of a large number of nuisance parameters is a danger signal of possible loss of sensitivity or even of inconsistency. There are several ways to proceed.

The first is to replace the $\alpha_j$ by a much smaller number of parameters. For example, if the strata are defined by age, a polynomial or other relatively simple function of age might be used.
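To make the model of Section 3 concrete, the following sketch simulates the paired Poisson counts for given stratum effects. It is an illustration only and not code from the paper: the function names `poisson_draw` and `simulate_pairs` are hypothetical, and the Poisson sampler is Knuth's multiplication method, which is adequate only for the small to moderate means considered here.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method.

    Suitable for small or moderate means only (exp(-mean) must not underflow).
    """
    if mean <= 0.0:
        return 0
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_pairs(r0, r1, alpha, theta, seed=0):
    """Simulate (y_j0, y_j1) with means (r_j0*alpha_j*e^-theta, r_j1*alpha_j*e^theta).

    r0, r1 and alpha are sequences of equal length, one entry per stratum.
    """
    rng = random.Random(seed)
    pairs = []
    for rj0, rj1, aj in zip(r0, r1, alpha):
        y0 = poisson_draw(rng, rj0 * aj * math.exp(-theta))
        y1 = poisson_draw(rng, rj1 * aj * math.exp(theta))
        pairs.append((y0, y1))
    return pairs
```

For example, `simulate_pairs([2.0] * 40, [4.0] * 40, [1.0] * 40, theta=0.0)` produces 40 strata with constant stratum effect; passing a varying `alpha` sequence mimics genuine stratum-to-stratum variation.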
The second, and in a sense the extreme opposite, approach is to regard the $\alpha_j$ as totally arbitrary and to eliminate them from the likelihood by appropriate conditioning, possible in the present example. This imposes the very strong requirement that the resulting analysis should have specified properties whatever the $\alpha_j$ may be, no matter how extreme their values or how bizarre their configuration.

Another possibility, again treating the $\alpha_j$ as arbitrary, is to calculate the profile likelihood for the parameter $\theta$ of interest, that is, to find for each fixed $\theta$ the maximum likelihood estimates of $\alpha_j$ and the corresponding maximized loglikelihood function. This may lead to inconsistent estimates of $\theta$ unless there is substantial information internally from each stratum about the corresponding $\alpha_j$. In our example this would require the individual Poisson-distributed counts to be large. We do not consider this approach further.

There are now three versions that involve treating the $\alpha_j$ as random variables, typically but not necessarily values for different $j$ being independent. The simplest, and the one studied in more detail here, is to regard the $\alpha_j$ as independent and identically distributed random variables with a parametrically specified distribution, in particular in the present context a gamma distribution. A corresponding normal-theory issue concerns the recovery of inter-block information in unbalanced designs (Yates, 1940). Here the $\alpha_j$ are block effects assumed independently and identically normally distributed, an assumption given some support by randomization of treatments to blocks. Such justification is, of course, not available in observational studies. The objective here is to improve the estimation of $\theta$, especially in situations in which in fact the stratification is largely ineffective and the lack of balance sacrifices information.

Next we may regard the $\alpha_j$ as independent and identically distributed random variables with an arbitrary distribution.
Marginal maximum likelihood is now possible. For a general discussion of consistency of estimation in this formulation, see Kiefer & Wolfowitz (1956) and, for the implications for estimation in matched pair binary data, Neuhaus et al. (1994). The emphasis in this work is on showing the consistency of the resulting estimate and that indeed in some situations it is the same as that from the conditional approach, thus implying that no recovery of inter-stratum information has been achieved.

Finally, it is formally possible to treat the $\alpha_j$ as having an arbitrary joint distribution or as independent with each component having its own arbitrary distribution. It seems clear on general grounds that this cannot be distinguished from treating the $\alpha_j$ as arbitrary unknown constants.

The h-likelihood approach (Lee et al., 2006) studies predominantly inference about the stratum-specific parameters.

Here we examine the parametric random effects formulation and its implications. A general point is that estimation of the dispersion of the random component of $\alpha_j$ is somewhat akin to estimating an upper variance component from $m - 1$ degrees of freedom. This has very high estimation error if $m$ is small.

5. Some likelihoods

The conditional loglikelihood contribution for $\theta$ from stratum $j$ when $\alpha_j$ is fixed and after conditioning on $t_j = y_{j0} + y_{j1}$ is

$$\theta d_j - t_j \log r_j(\theta), \tag{1}$$

where $d_j = y_{j1} - y_{j0}$, $r_j(\theta) = r_{j0}e^{-\theta} + r_{j1}e^{\theta}$. The information about $\theta$ is obtained from (1) as

$$i_c = \sum_{j=1}^{m} 4\alpha_j r_{j0} r_{j1}/r_j(\theta), \tag{2}$$

whereas, if it is correctly assumed that $\alpha_j = \alpha$ for all $j$, the pooled information conditionally on $\sum_{j=1}^{m} t_j$ is

$$i_p = 4\alpha \Big(\sum_j r_{j0}\Big)\Big(\sum_j r_{j1}\Big) \Big/ \sum_j r_j(\theta). \tag{3}$$

Thus, in the analysis regarding the $\alpha_j$ as unknown constants, the analogue of the efficiency factor of an incomplete block design is the ratio of (2) to (3) when $\alpha_j = \alpha$, namely

$$\frac{\big\{\sum_j r_{j0} r_{j1}/r_j(\theta)\big\}\sum_j r_j(\theta)}{\big(\sum_j r_{j0}\big)\big(\sum_j r_{j1}\big)}.$$

The behaviour of this is explored in a slightly different notation by de Stavola & Cox (2008). This quantity is not the efficiency of the conditional stratum-specific analysis that depends on the variation among the $\alpha_j$, but rather the loss of information in that analysis when in fact the stratification is nugatory.

Now consider the likelihood when the $\alpha_j$ are independent and identically distributed with the gamma density $\xi(\xi\alpha)^{\eta-1}\exp(-\xi\alpha)/\Gamma(\eta)$ for $\alpha > 0$. It is convenient to write $\nu = \xi/\eta$ for the reciprocal of the mean. Then the contribution of the $j$th stratum to the loglikelihood is

$$\log\{\Gamma(t_j + \eta)/\Gamma(\eta)\} + \theta d_j + \eta\log\eta + \eta\log\nu - (t_j + \eta)\log r_j(\theta, \nu\eta), \tag{4}$$

where $r_j(\theta, \nu\eta) = r_j(\theta) + \nu\eta$. It follows on differentiating twice with respect to the parameters and taking expectations over $t_j$, $d_j$ that the parameter $\eta$ is estimated orthogonally to $\theta$, $\nu$ and hence errors in estimating $\eta$ can be disregarded in the subsequent calculations.
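The steps behind (2) and (4) are short enough to write out. The following is a sketch of the standard argument rather than a transcription from the paper; it assumes the gamma density has rate $\nu\eta$ and shape $\eta$, so that its mean is $1/\nu$, and it uses $r_j''(\theta) = r_j(\theta)$.

```latex
% Sketch: the binomial conditioning behind (1)-(2), then the gamma integral behind (4).
% Assumption: the gamma density has shape \eta and rate \nu\eta (mean 1/\nu).
\begin{align*}
y_{j1} \mid t_j &\sim \mathrm{Bin}\{t_j,\; r_{j1}e^{\theta}/r_j(\theta)\}, \\
-\frac{\partial^2}{\partial\theta^2}\{\theta d_j - t_j\log r_j(\theta)\}
  &= t_j\,\frac{r_j(\theta)^2 - r_j'(\theta)^2}{r_j(\theta)^2}
   = t_j\,\frac{4 r_{j0} r_{j1}}{r_j(\theta)^2}, \\
E(t_j) = \alpha_j r_j(\theta)
  \;&\Longrightarrow\; i_c = \sum_{j=1}^{m} \frac{4\alpha_j r_{j0} r_{j1}}{r_j(\theta)}, \\
\int_0^\infty \alpha^{t_j} e^{-\alpha r_j(\theta)}\,
  \frac{(\nu\eta)^{\eta}\alpha^{\eta-1}e^{-\nu\eta\alpha}}{\Gamma(\eta)}\,d\alpha
  &= \frac{\Gamma(t_j+\eta)}{\Gamma(\eta)}\,
     \frac{(\nu\eta)^{\eta}}{\{r_j(\theta)+\nu\eta\}^{t_j+\eta}} .
\end{align*}
```

Taking logarithms of the last line and adding the factor $\theta d_j$ from the Poisson likelihood gives the stratum contribution (4), up to terms not involving the parameters.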
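This corresponds in the analysis of incomplete block designs to insensitivity to the weights by which between- and within-block estimates are combined. If $l_m$ denotes the mixed model loglikelihood obtained by adding (4) over $j$, we have that

$$i_{\theta\theta} = E(-\partial^2 l_m/\partial\theta^2) = \sum_{j=1}^{m} \{4 r_{j0} r_{j1} + \nu\eta\, r_j(\theta)\}/\{\nu\, r_j(\theta, \nu\eta)\},$$

$$i_{\theta\nu} = E(-\partial^2 l_m/\partial\theta\,\partial\nu) = (-\eta/\nu)\sum_{j=1}^{m} r_j'(\theta)/r_j(\theta, \nu\eta),$$

$$i_{\nu\nu} = E(-\partial^2 l_m/\partial\nu^2) = (\eta/\nu^2)\sum_{j=1}^{m} r_j(\theta)/r_j(\theta, \nu\eta).$$

Here $r_j'(\theta)$ is the derivative of $r_j(\theta)$ with respect to $\theta$.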
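The information quantities above are simple sums, so their behaviour can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, and `info_adjusted` computes the information about $\theta$ adjusted for estimating $\nu$, taken here to be $i_{\theta\theta} - i_{\theta\nu}^2/i_{\nu\nu}$.

```python
import math


def r_theta(rj0, rj1, theta):
    """r_j(theta) = r_j0 * e^-theta + r_j1 * e^theta."""
    return rj0 * math.exp(-theta) + rj1 * math.exp(theta)


def r_prime(rj0, rj1, theta):
    """Derivative of r_j(theta) with respect to theta."""
    return -rj0 * math.exp(-theta) + rj1 * math.exp(theta)


def info_conditional(r0, r1, theta, alpha=1.0):
    """Equation (2) with a common stratum effect alpha."""
    return sum(4.0 * alpha * a * b / r_theta(a, b, theta) for a, b in zip(r0, r1))


def info_pooled(r0, r1, theta, alpha=1.0):
    """Equation (3): information when the strata are pooled."""
    total = sum(r_theta(a, b, theta) for a, b in zip(r0, r1))
    return 4.0 * alpha * sum(r0) * sum(r1) / total


def info_adjusted(r0, r1, theta, nu, eta):
    """Information about theta under the gamma model, adjusted for nu."""
    i_tt = i_tn = i_nn = 0.0
    for a, b in zip(r0, r1):
        r = r_theta(a, b, theta)
        r_mixed = r + nu * eta  # r_j(theta, nu * eta)
        i_tt += (4.0 * a * b + nu * eta * r) / (nu * r_mixed)
        i_tn += -(eta / nu) * r_prime(a, b, theta) / r_mixed
        i_nn += (eta / nu ** 2) * r / r_mixed
    return i_tt - i_tn ** 2 / i_nn
```

With $\alpha = 1/\nu$, evaluating `info_adjusted` at a very small `eta` returns a value close to `info_conditional`, and at a very large `eta` a value close to `info_pooled`, which gives a quick numerical check on the two limiting cases.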

Table 1. Ratio of $i_c/i_{\theta\theta\cdot\nu}$ for $c_{00} = c_{11} = 1/2$, $a_{j0} = -a_{j1}$ and different values of $r_0$, $r_1$ and $\eta$ (columns $r_0 = 2$, $r_0 = 10$, $r_0 = 50$; rows by $1/\surd\eta$ and $r_1/r_0$)

Thus, the information about $\theta$ adjusting for the estimation of $\nu$ is $i_{\theta\theta\cdot\nu} = i_{\theta\theta} - i_{\theta\nu}^2/i_{\nu\nu}$. The information measures given by (2) and by (3) are recovered when $\eta = 0$ co[...] stratum effects and when $\eta$ is very large corresponding to constant stratum eff[ects].

6. Some numerical results

For numerical comparisons we may, by transformation of the $r_{jt}$, for $t = 0, 1$, take $\theta = 0$ and $\nu = 1$. Primary interest lies in how, as $\eta$ increases, the info[rmation] $i_{\theta\theta\cdot\nu}$ [...] The ratio of $i_c/i_{\theta\theta\cdot\nu}$ is expressed as a function of $\eta$ and of quantities describing the pattern of variation between strata in the $r_{jt}$, defined as follows. We write

$$r_{jt} = r_t(1 + a_{jt}), \quad [...]$$

Here $c_{00}$ and $c_{11}$ specify proport[...] [corre]sponding covariance. Numerical wo[rk] [...] between two values $\pm c_{tt}$. If $a_{j1} = a_{j0}$, the numbers at risk r[...] $i_c = i_{\theta\theta\cdot\nu}$ and there is no informati[on] [...] has $a_{j1} = -a_{j0}$ and some numerical r[esults] [...] from the random-effects assumption [...] $1/\surd\eta$, is small and the two risk g[roups] [...] In these and subsequent calculation[s] [...] behaviour near 0 = 00, we define [...]

7. Sensitivity to assu[mptions]

A central assumption in the above [...] in a gamma distribution. Failure of [...] likelihood estimate $\hat\theta$ or to a wrong c[...] on the former property by studyin[g] [...] are small biases in the gradient of th[e] [...] then the formal maximum likelihoo[d] [...]

Table 2. Bias on $\theta \times 10^3$ for $\gamma_0 = \gamma_1 = 1$, $a_{j0} = 0$ for all $j$, $a_{j1}$ divided equally between $\pm c_{11}$ and different values of $r_0$, $r_1$ and $\eta$

1/√η    r1/r0    r0 = 2         r0 = 10        r0 = 50
[...]   [...]    [...](4.4)     4.2(4.0)       2.5(3.4)
[...]   5        4.5(8.0)       3.1(4.2)       1.3(2.2)
[...]   [...]    [...](8.4)     2.4(3.1)       0.8(1.7)
[...]   [...]    [...](17.4)    11.2(10.0)     4.0(4.0)
[...]   [...]    [...](15.6)    5.9(6.8)       1.6(2.2)
[...]   [...]    [...](16.0)    3.7(4.5)       0.9(1.5)
[...]   [...]    [...](64.1)    22.2(17.4)     5.1(4.8)
[...]   5        33.4(35.4)     8.2(9.5)       1.7(2.2)
[...]   [...]    [...](23.0)    4.5(5.5)       0.9(1.5)

The values in parentheses are bias calculated from 1000 simulations.

To a first approximation we use the information matrix, $I$, calculated under the gamma model. Both $I$ and $(g_\theta, g_\nu)$ are proportional to the number of strata, implying that if (5) is nonzero, the corresponding estimate is inconsistent.

The expected values of the gradient components are best first calculated stratum by stratum for fixed values of the $\alpha_j$ and with $\theta = 0$. In fact [...] where $A_j$, $B_j$ do not depend on $\alpha_j$. It follows that if $E(\alpha_j)$ is constant, the maximum likelihood estimating equations derived from the gamma density are unbiased and, under very general conditions to this order, a consistent estimate of $\theta$ is obtained regardless of, for example, the distributional form of the $\alpha_j$. This is not the case, however, if $E(\alpha_j)$ depends on $(r_{j0}, r_{j1})$. To study this, suppose that

$$E(\alpha_j) = \nu^{-1}(1 + \gamma_1 a_{j1} + \gamma_0 a_{j0}). \tag{6}$$

Then, provided that the $a_{jt}$ are small and we take $\theta = 0$, we have that

$$r_1(2r_0 + \nu\eta)(\gamma_1 c_{11}^2 + \gamma_0 c_{10}) - r_0(2r_1 + \nu\eta)(\gamma_1 c_{01} + \gamma_0 c_{00}^2) \quad [...] \quad g_\nu \; [...] \tag{7}$$

As a function of $\eta$, the bias component $g_\theta$ is greatest when $\eta$ is [...] information to be recovered from the between-stratum variation.

Table 2 shows the approximate bias in the estimate of $\theta$ derived fr[om] [...] been assumed that $\gamma_0 = \gamma_1 = 1$ and that either $c_{00} = 0$ or $c_{11} = c_{00}$ [...] In the symmetrical case, $r_1 = r_0$, $c_{11} = c_{00}$, $\gamma_0 = \gamma_1$, $\theta = 0$, we [...] means that $\hat\theta$ is consistent since $i_{\theta\nu} = 0$. For $a_{j0} = -a_{j1}$, both $g_\theta$ and [...] set to be zero and different values of $c_{11}$ were considered, with 1000 [...] the simulation, 40 pairs of Poisson-distributed random variables [...] where $\theta = 0$ and the $\alpha_j$ were obtained from (6). There is very goo[d] [...] the simulations, shown in parentheses, and the corresponding theor[etical] [...] the various approximations involved in the latter. The systematic err[or] [...] appreciable, that is when there is appreciable information in the betw[een] [...] the error note that the ratio of the rates in the two groups is $e^{2\theta}$, so th[at] [...] 0.06 in $\theta$ is equivalent to a 13% error in the ratio of rates.
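The claim that the gamma-model estimating equations are unbiased when $E(\alpha_j)$ is constant can be checked directly from (4). The following is a sketch of that calculation, not a transcription of the paper's displayed equations: it uses $E(d_j \mid \alpha_j) = \alpha_j r_j'(\theta)$ and $E(t_j \mid \alpha_j) = \alpha_j r_j(\theta)$, and in the last line it assumes $\sum_j a_{jt} = 0$ and introduces the hypothetical shorthand $s_{st} = m^{-1}\sum_j a_{js}a_{jt}$ for the second moments of the $a_{jt}$.

```latex
% Sketch: expected score of the stratum contribution l_j in (4) for fixed \alpha_j.
% Assumptions: \sum_j a_{jt} = 0, and s_{st} = m^{-1}\sum_j a_{js} a_{jt} is
% hypothetical shorthand introduced only for this illustration.
\begin{align*}
E\!\left(\frac{\partial l_j}{\partial\theta}\,\Big|\,\alpha_j\right)
  &= \alpha_j r_j'(\theta)
     - \frac{\{\alpha_j r_j(\theta)+\eta\}\,r_j'(\theta)}{r_j(\theta,\nu\eta)}
   = \frac{\eta\, r_j'(\theta)}{r_j(\theta,\nu\eta)}\,(\nu\alpha_j - 1), \\
E\!\left(\frac{\partial l_j}{\partial\nu}\,\Big|\,\alpha_j\right)
  &= \frac{\eta}{\nu} - \frac{\eta\{\alpha_j r_j(\theta)+\eta\}}{r_j(\theta,\nu\eta)}
   = -\frac{\eta\, r_j(\theta)}{\nu\, r_j(\theta,\nu\eta)}\,(\nu\alpha_j - 1), \\
\sum_{j=1}^{m} \frac{\eta\, r_j'(0)}{r_j(0,\nu\eta)}\,(\gamma_1 a_{j1} + \gamma_0 a_{j0})
  &\approx \frac{m\eta\,\{ r_1(2r_0+\nu\eta)(\gamma_1 s_{11} + \gamma_0 s_{10})
     - r_0(2r_1+\nu\eta)(\gamma_1 s_{01} + \gamma_0 s_{00}) \}}{(r_0 + r_1 + \nu\eta)^2} .
\end{align*}
```

Both expected scores are linear in $\alpha_j$ and vanish on averaging when $E(\alpha_j) = 1/\nu$ for every stratum. Under (6), $\nu E(\alpha_j) - 1 = \gamma_1 a_{j1} + \gamma_0 a_{j0}$, and the third line, obtained by expanding $r_j'(0)/r_j(0,\nu\eta)$ to first order in the $a_{jt}$ with $r_{jt} = r_t(1 + a_{jt})$, has the same structure as the surviving part of (7).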

8. Discussion

We now comment briefly on the implications of these results, both for the specific situation analyzed and also more broadly. We suppose that there is a parameter of interest, here $\theta$, and a considerable number of individual parameters, here $\alpha_j$. There are typically two extreme estimates of $\theta$, one assuming the $\alpha_j$ totally arbitrary and the other assuming the $\alpha_j$ constant. If the assumptions of the random effects model are correct, in effect a combination of the two is obtained. Table 1 shows that the gains in efficiency over the first analysis are often quite small, implying, because the estimates themselves are asymptotically efficient under appropriate assumptions, that the estimates themselves are nearly equal. In fact the random effects estimate will rarely differ appreciably from the more appropriate of the two simpler estimates given above. If, however, the assumptions of the random effects formulation are violated in that the $\alpha_j$ are related to relevant features of the strata, then the estimate of $\theta$ is inconsistent, in extreme cases appreciably so.

This suggests the desirability of an initial check both of the need for the random effects model and of its appropriateness. First, is it reasonable to treat the structural parameter, $\theta$, as a constant? To address this, a conditional maximum likelihood estimate of $\theta$ can be obtained for each stratum, together with a standard error, and the resulting estimate checked for homogeneity. A more sensitive procedure is to plot the estimates against stratum-specific explanatory variables. If these consist only of the $r_{jt}$ then $r_{j1}$ and $r_{j0}$ may be dichotomized to form groups of strata as a basis for the comparison.

If $\theta$ is reasonably treated as a constant, estimates of the separate stratum parameters $\{\alpha_j\}$ may be formed.
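The stratum-by-stratum check described above can be sketched as follows. This is one possible implementation under stated assumptions, not a procedure specified in the paper: conditionally on $t_j$, $y_{j1}$ is binomial with odds $(r_{j1}/r_{j0})e^{2\theta}$, which gives a closed-form estimate per stratum; the standard error is the usual large-sample approximation; and the weighted sum of squares used as a homogeneity statistic (to be referred to a chi-squared distribution) is an assumed choice. The function names are hypothetical.

```python
import math


def stratum_estimates(pairs, r0, r1):
    """Conditional ML estimate of theta and approximate standard error per stratum.

    Strata with a zero count are skipped, since the estimate is then infinite.
    """
    estimates = []
    for (y0, y1), rj0, rj1 in zip(pairs, r0, r1):
        if y0 == 0 or y1 == 0:
            continue
        theta_hat = 0.5 * math.log((y1 * rj0) / (y0 * rj1))
        std_err = 0.5 * math.sqrt(1.0 / y0 + 1.0 / y1)
        estimates.append((theta_hat, std_err))
    return estimates


def homogeneity_statistic(estimates):
    """Inverse-variance weighted mean, weighted sum of squares and its degrees of freedom."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * t for w, (t, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, (t, _) in zip(weights, estimates))
    return pooled, q, len(estimates) - 1
```

A value of the statistic that is large relative to its degrees of freedom would suggest that $\theta$ varies between strata; plotting the individual estimates against stratum-specific explanatory variables remains the more sensitive check.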
For this there is a complementary conditional analysis for $\alpha$ regarding each stratum as in principle having a separate value of $\theta$, the likelihood contribution involving a modified Bessel function. A more practicable approach is possible if $\theta$ is assumed constant and the number of strata is large. Then we may estimate each $\alpha_j$ from the full stratum-specific likelihood replacing $\theta$ by its overall maximum likelihood estimate, $\hat\theta$, thus leading to the estimate $\hat\alpha_j = t_j/r_j(\hat\theta)$. If it seems that a random-effects representation of them is both potentially useful and appropriate, then the more complex mixed model may be fitted. Appropriateness means in particular that the variation in the $\alpha_j$ appears random. Comparison of the estimates obtained from the formal model with those obtained by more elementary procedures is desirable.

Acknowledgement

We thank the editor and the referees for very helpful comments. M. Y. Wong was funded by the Hong Kong Earmarked Research Grant.

References

Breslow, N. E. & Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. J. Am. Statist. Assoc. 88.
Cox, D. R. & Solomon, P. J. (2000). Components of Variance. Boca Raton, FL: Chapman and Hall.
De Stavola, B. & Cox, D. R. (2008). On the consequences of overstratification. Biometrika 95.
Kiefer, J. & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many nuisance parameters. Ann. Math. Statist. 27.
Lee, Y., Nelder, J. A. & Pawitan, Y. (2006). Generalized Linear Models with Random Effects. Boca Raton, FL: Chapman and Hall.
Neuhaus, J. M., Kalbfleisch, J. D. & Hauck, W. W. (1994). Conditions for consistent estimation in mixed-effects models for binary matched-pair data. Can. J. Statist. 22.
Snijders, T. & Bosker, R. (1999). Multilevel Analysis. New York: Sage.
Yates, F. (1940). The recovery of inter-block information in balanced incomplete block designs. Ann. Eugen. 10.

[Received February Revised August 2009]


Biometrika Trust Robust Regression via Discriminant Analysis Author(s): A. C. Atkinson and D. R. Cox Source: Biometrika, Vol. 64, No. 1 (Apr., 1977), pp. 15-19 Published by: Oxford University Press on