A litter-based approach to risk assessment in developmental toxicity studies via a power family of completely monotone functions

Anthony Y. C. Kuk
National University of Singapore, Singapore

Summary. A new class of distributions for exchangeable binary data is proposed that originates from modelling the joint success probabilities of all orders by a power family of completely monotone functions. The proposed distribution allows flexible modelling of the dose-response relationship for both the marginal response probability and the pairwise odds ratio and is especially well suited to a litter-based approach to risk assessment. Specifically, the risk of at least one adverse response within a litter takes on a simple form under the proposed distribution and reduces further to a generalised linear model if a complementary log-log link function is used. Existing distributions such as the beta-binomial or folded-logistic tend to assign too much probability to zero, leading to underestimation of the risk that at least one foetus is affected and overestimation of the safe dose. The proposed distribution does not suffer from this problem. With the aid of symbolic differentiation, the proposed distribution can be fitted easily and quickly via the method of scoring. The usefulness of the proposed class of distributions and its superiority over existing distributions are demonstrated in a series of examples involving developmental toxicology and teratology data.

Keywords: Complementary log-log link; Completely monotone function; Dose-response function; Developmental toxicity data; Exchangeability; Intra-cluster correlation; Method of scoring; Odds ratio; Risk assessment; Symbolic differentiation

1. Introduction and survey of literature

By design, many scientific experiments involve the collection of data on clusters of subjects such as households or litters of animals. A major driving force behind the development of methods for analysing clustered binary data is the desire of regulatory agencies to protect the public from exposure to potentially harmful substances. In a typical developmental toxicity study, pregnant laboratory animals are randomly assigned to receive a toxin at varying dose levels during the period of major organogenesis. These animals are sacrificed prior to term and the uterus is removed and examined for resorptions, foetal deaths and foetal malformations, resulting in clustered binary or multinomial data. The aim of such a study is to assess the relationship between exposure to the toxic substance and the incidence of developmental problems. Another important task is risk assessment and the determination of an acceptable low-risk or safe dose level (Crump, 1984; Chen and Kodell, 1989; Ryan, 1992).

One major consideration that must be taken into account when analysing data from such experiments is the tendency for littermates to behave more similarly than non-littermates. Failure to account for the litter effect and the over-dispersion it induces will lead to estimates with overstated precision. A common way to account for the litter effect and over-dispersion is to assume that the intra-litter correlation is induced by a random effect shared by all the foetuses within the same litter. This random effect can be looked upon as the combined effect of all factors, both genetic and environmental, that are shared by the littermates. Given this litter-specific random effect, the outcomes of the littermates are assumed to be conditionally independent. The use of a beta distribution to model the random effects results in the famous beta-binomial distribution (Williams, 1975; Haseman and Kupper,

1979) that until recently has dominated much of the statistical literature of teratology and developmental toxicology. Other extra-binomial models that have been proposed include the logistic-normal-binomial model (Williams, 1982) and the probit-normal-binomial model (Ochi and Prentice, 1984), but they are not as widely used as the beta-binomial distribution in the analysis of litter data.

The beta-binomial model, however, has its limitations. To begin with, the shape of a beta-binomial distribution is not flexible enough and is often U-shaped, J-shaped or reverse J-shaped (George and Bowman, 1995) rather than unimodal with mode near the expected value $\mu = np$. Thus it could happen that nearly all the probability mass is concentrated at the two ends 0 and n, whereas values near the supposedly expected value are highly improbable. Another disadvantage of the beta-binomial model is the non-robustness of its estimates to misspecification of the correlation structure (Kupper et al., 1986; Williams, 1988). Liang and Hanfelt (1994) advocate the use of the quasi-likelihood method to obtain robust estimates of the response probabilities that are insensitive to misspecification of the correlation structure. Bowman et al. (1995) modelled both the mean responses and intra-litter correlations as functions of dose levels and used a generalised estimating equation (GEE) approach to obtain parameter estimates. A drawback of the GEE approach is that it can be inefficient for estimating the correlation structure (Moore, 1986), which could be of interest in its own right. Furthermore, the GEE approach typically models only the first two moments and as such cannot provide estimates of quantities that depend on higher order moments (Bowman and George, 1995; George and Kodell, 1996). In particular, it cannot estimate the probability that at least one littermate

is adversely affected (Geys et al., 1999), which is of interest in a litter-based approach to quantitative risk assessment.

Rather than inducing a positive intra-litter correlation indirectly via a shared random effect, Kupper and Haseman (1978) and Altham (1978) proposed a correlated binomial distribution with additive interactions, but this distribution is not widely used. Altham also proposed a multiplicative generalization of the binomial distribution. Connolly and Liang (1988) proposed a class of conditional logistic models for clustered binary data that includes the multiplicative binomial as well as Rosner's (1984) model as special cases. This class of models is defined in terms of the conditional probability of each unit, given the number of positive responses in the remaining units of the cluster. Parameter estimation for this class of models is hampered by the need to evaluate a normalising constant for every combination of cluster size and parameter values. Connolly and Liang (1988) resorted to a working likelihood approach. Geys et al. (1999) proposed an exponential family of conditional models for multivariate clustered binary data. Normalising constants are even more difficult to compute in the multivariate case, and they resort to a pseudo-likelihood estimation approach.

George and Bowman (1995) proposed a new modelling approach centred on exchangeability, which they argued is a reasonable assumption for litter data in developmental toxicity experiments. This approach is based on the fact that the distribution for a set of n exchangeable binary variables $X_1, \dots, X_n$ is uniquely determined by $\lambda_k = P(X_1 = \dots = X_k = 1)$, $k = 1, \dots, n$. George and Bowman set $\lambda_k = F(k; \beta)$, where F is the folded-logistic function.

A limitation of the folded-logistic model is that there are no additional parameters to model the correlation structure (Molenberghs et al., 1998). In section 2, we propose extensions of the folded-logistic model that allow more flexibility in the value of the intra-litter correlation. In section 3, we propose a new distribution for exchangeable binary data based on a power family of completely monotone functions. The proposed distribution allows flexible modelling of the dose-response relationship for both the marginal response probability and the pairwise odds ratio. Even though the likelihood function is very complex, the score function as well as the expected information matrix can be obtained readily with the aid of symbolic differentiation. Furthermore, if the foetal response probability follows a generalised linear model with a complementary log-log link function, then the probability of at least one foetal response within a litter will follow the same model but with an additional linear term in log(litter size). This is an attractive property that makes the proposed distribution particularly useful in a litter-based approach to quantitative risk assessment. We also suggest a method for finding the lower confidence limit of the benchmark dose that corresponds to a given level of excess risk.

2. The folded-logistic model and its extensions

We begin by introducing the so-called mean parameters (Ekholm et al., 1995)

$$\lambda_k = E(X_1 \cdots X_k) = P(X_1 = \dots = X_k = 1).$$

Without loss of generality, suppose we are interested in the joint probability

$$P(X_1 = \dots = X_r = 1, X_{r+1} = \dots = X_n = 0) = E\{X_1 \cdots X_r (1 - X_{r+1}) \cdots (1 - X_n)\}.$$

Expanding the product on the right-hand side before taking expectations, we get

$$P(X_1 = \dots = X_r = 1, X_{r+1} = \dots = X_n = 0) = \sum_{k=0}^{n-r} (-1)^k \binom{n-r}{k} \lambda_{r+k},$$

with $\lambda_0$ defined as 1. Let $R = X_1 + \dots + X_n$ be the number of positive responses in a cluster of size n. It follows from exchangeability that

$$P(R = r) = \binom{n}{r} P(X_1 = \dots = X_r = 1, X_{r+1} = \dots = X_n = 0) = \binom{n}{r} \sum_{k=0}^{n-r} (-1)^k \binom{n-r}{k} \lambda_{r+k}.$$ (1)

George and Bowman (1995) let $\lambda_k = F(k; \beta)$ for some response function F and then used (1) to deduce the joint distribution of $X_1, \dots, X_n$. However, care must be taken in modelling the $\lambda_k$ to ensure that the summation (1) results in a legitimate probability between zero and one. A sufficient condition (George and Bowman, 1995) is that $(-1)^k F^{(k)}(x) \ge 0$ for all positive integers k, where $F^{(k)}(x)$ is the kth derivative of $F(x; \beta)$ with respect to x. Such a function is said to be completely monotone (Feller, 1971, p. 224). George and Bowman (1995) used one particular completely monotone function, the folded-logistic function, to define

$$\lambda_k = \frac{2}{1 + (k + 1)^{\beta}}$$

with dose-dependent $\beta = \beta_0 + \beta_1 d$. It follows that the marginal response probability is

$$p = P(X_1 = 1) = \lambda_1 = \frac{2}{1 + 2^{\beta}}.$$ (2)

By the same token, $P(X_1 = 1, X_2 = 1) = \lambda_2 = 2/(1 + 3^{\beta})$, and the intra-litter correlation is given by

$$\rho = \frac{\lambda_2 - \lambda_1^2}{\lambda_1 (1 - \lambda_1)}.$$ (3)
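As a minimal sketch of how equations (1) to (3) can be evaluated numerically, the Python fragment below computes the distribution of R from a set of mean parameters and applies it to the folded-logistic model. The function names and the illustrative values (litter size 5, marginal probability 0.58) are assumptions made for this example and do not come from the original analysis.

```python
from math import comb, log2

def pmf_from_mean_parameters(n, lam):
    """Return [P(R = r) for r = 0..n] from the mean parameters via equation (1).

    `lam(k)` must return lambda_k = P(X_1 = ... = X_k = 1), with lam(0) == 1.
    """
    return [
        comb(n, r) * sum((-1) ** k * comb(n - r, k) * lam(r + k)
                         for k in range(n - r + 1))
        for r in range(n + 1)
    ]

def folded_logistic(beta):
    """Mean parameters lambda_k = 2 / (1 + (k + 1)**beta) of the folded-logistic model."""
    return lambda k: 2.0 / (1.0 + (k + 1.0) ** beta)

def marginal_and_correlation(lam):
    """Marginal probability, equation (2), and intra-litter correlation, equation (3)."""
    lam1, lam2 = lam(1), lam(2)
    return lam1, (lam2 - lam1 ** 2) / (lam1 * (1.0 - lam1))

# Illustrative choice: pick beta so that p = 0.58, i.e. 2**beta = 2/p - 1.
beta = log2(2.0 / 0.58 - 1.0)
lam = folded_logistic(beta)
p, rho = marginal_and_correlation(lam)
pmf = pmf_from_mean_parameters(5, lam)
print(f"p = {p:.3f}, rho = {rho:.3f}, sum of pmf = {sum(pmf):.6f}")
```

Because the folded-logistic model has a single parameter, fixing p in this way also fixes the correlation returned by `marginal_and_correlation`.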

Figure 1 is a plot of the intra-litter correlation versus the marginal probability under the folded-logistic model. We can see that the intra-litter correlation is fixed automatically once the response probability is given, which is clearly unrealistic and restrictive. Table 1 shows the result of fitting the folded-logistic model to Skellam's (1948) Brassica data on the number of pairs of bivalents showing association. It can be seen that the folded-logistic model fits the data poorly, with a likelihood value even smaller than that of the pure binomial fit. The reason for the poor fit is clear. The probability of association is around 0.58 and existing fits based on the beta-binomial or the correlated binomial distributions all suggest that the pairwise correlation is less than 0.1, whereas under the folded-logistic model, the correlation is forced to be around 0.2 when $p = 0.58$.

To relax the constraint on the values of the intra-litter correlation under the folded-logistic model, we make use of the following result. If $X_1, \dots, X_n$ and $Y_1, \dots, Y_n$ are two independent sets of exchangeable binary random variables with non-negative correlation and mean parameters denoted by $\xi_k$ and $\eta_k$ respectively, then $Z_1 = X_1 Y_1, \dots, Z_n = X_n Y_n$ will also be exchangeable, with mean parameters

$$\lambda_k = E(Z_1 \cdots Z_k) = E(X_1 \cdots X_k) E(Y_1 \cdots Y_k) = \xi_k \eta_k.$$ (4)

Two extreme cases are worth mentioning. In the first case, $Y_1, \dots, Y_n$ are independent and identically distributed. For this case, it is no surprise that the pairwise correlation among $Z_1 = X_1 Y_1, \dots, Z_n = X_n Y_n$ is less than the pairwise correlation of $X_1, \dots, X_n$. More specifically, it can be shown that $\mathrm{corr}(X_1 Y_1, X_2 Y_2) = w_I \, \mathrm{corr}(X_1, X_2) < \mathrm{corr}(X_1, X_2)$, where

$$w_I = \frac{\mathrm{var}(X_1) E(Y_1)^2}{\mathrm{var}(X_1) E(Y_1)^2 + \mathrm{var}(X_1) \mathrm{var}(Y_1) + \mathrm{var}(Y_1) E(X_1)^2}.$$

For the other extreme case, $Y_1 = \dots = Y_n = Y$ are totally dependent, and we have $\mathrm{corr}(X_1 Y_1, X_2 Y_2) = w_D \, \mathrm{corr}(X_1, X_2) + (1 - w_D) > \mathrm{corr}(X_1, X_2)$, where

$$w_D = \frac{\mathrm{var}(X_1) E(Y)}{\mathrm{var}(X_1) E(Y) + \mathrm{var}(Y) E(X_1)^2}.$$

Our first extension of the folded-logistic model has $\lambda_k$ given by

$$\lambda_k = \frac{2}{1 + (k + 1)^{\alpha\beta}} \left( \frac{1 + 2^{\alpha\beta}}{1 + 2^{\beta}} \right)^{k},$$ (5)

with parameters $\beta > 0$ and $0 \le \alpha \le 1$. This is a special case of (4) with $X_1, \dots, X_n$ following a folded-logistic model with parameter $\alpha\beta$ and $Y_1, \dots, Y_n$ independent and identically distributed according to a Bernoulli distribution with parameter $\pi = (1 + 2^{\alpha\beta})/(1 + 2^{\beta})$. It follows that the substitution of (5) into (1) will yield a bona fide distribution with pairwise correlation smaller than or equal to that dictated by the folded-logistic model. Note that (5) is defined in such a way that the marginal response probability $p = \lambda_1$ still follows the folded-logistic form (2). When we fit this extended family to Skellam's data, the result is much improved but still slightly worse than the beta-binomial fit; see Table 1. It should be noted that the pairwise correlation is no longer constrained to a value near 0.2 but is estimated to be 0.079 instead. To achieve pairwise correlation higher than that allowed by the folded-logistic model, we consider (4) with dependent, in fact identical, $Y_i$ to get

$$\lambda_k = \frac{2}{1 + (k + 1)^{\alpha\beta}} \cdot \frac{1 + 2^{\alpha\beta}}{1 + 2^{\beta}}.$$ (6)

A three-parameter family that contains (5) and (6) as special cases is given by

$$\lambda_k = \frac{2}{1 + (k + 1)^{\alpha\beta}} \left( \frac{1 + 2^{\alpha\beta}}{1 + 2^{\beta}} \right)^{k^{\gamma}},$$ (7)

with $0 \le \gamma \le 1$. Note that (7) reduces to (5) if $\gamma = 1$ and to (6) if $\gamma = 0$.

3. The power family of distributions

3.1. Definition

Instead of using the three-parameter family (7), we propose next a more manageable two-parameter family of distributions that can be parametrised in terms of the marginal response probability and the intra-litter association, thereby allowing dose-response modelling for both. This is achieved by applying the power transformation directly to the foetus response probability p, which results in

$$\lambda_k = P(X_1 = \dots = X_k = 1) = p^{k^{\gamma}},$$ (8)

with $0 \le p, \gamma \le 1$. Note that $p = \lambda_1 = P(X_1 = 1)$ is indeed the foetus response probability. A proof that $p^{x^{\gamma}}$ is a completely monotone function in x for $0 < p < 1$, $0 < \gamma < 1$ is given in Appendix A. Hence the substitution of (8) into (1) will result in a bona fide probability distribution

$$P(R = r) = \binom{n}{r} \sum_{k=0}^{n-r} (-1)^k \binom{n-r}{k} p^{(r+k)^{\gamma}}.$$ (9)

Alternatively, we can use the same type of model for $X' = 1 - X$ (i.e., swap 1's with 0's) with $p' = P(X' = 1) = P(X = 0) = 1 - p = q$ to get

$$\lambda'_k = P(X'_1 = \dots = X'_k = 1) = P(X_1 = \dots = X_k = 0) = q^{k^{\gamma}}.$$ (10)

The resulting probability distribution is

$$P(R = r) = P(R' = n - r) = \binom{n}{r} \sum_{k=0}^{r} (-1)^k \binom{r}{k} q^{(n-r+k)^{\gamma}}.$$ (11)

This alternative way of modelling by swapping 0's with 1's had also been noted by Ekholm et al. (1995). We call (9) and (11) the p-power and q-power family respectively. A referee has expressed concern over the lack of invariance to code reversal. We do not share this concern because we feel that quantitative risk assessment in developmental toxicology is an asymmetric problem in itself. For example, while $P(R \ge 1)$ is of interest because it is the probability that at least one littermate is affected, the corresponding probability $P(R' \ge 1) = P(R \le n - 1)$ after code reversal is of no obvious relevance.

The results of fitting the p-power and q-power distributions to Skellam's Brassica data can be found in Table 1. It is clear that the q-power distribution provides a better fit than the p-power family. In fact, the q-power family gives an almost perfect fit and is clearly the best among all the distributions listed in Table 1. We advocate the use of the q-power distribution as it fits real data better than the p-power distribution in all the examples that we have looked at. One way to understand why this is the case is to consider the shape of the probability mass function. From Figure 2, we can see that the q-power probability functions are much closer to bell-shaped than their p-power counterparts when the foetus response probability and intra-litter correlation are small, which is typically the case in toxicological experiments. In contrast, the p-power distribution tends to put too much probability mass at zero and hence will underestimate the probability $P(R \ge 1)$ that at least one littermate is affected.
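The two families can be evaluated directly from (9) and (11). The following Python sketch is offered only as an illustration: the function names and the parameter values (n = 10, p = 0.1, γ = 0.8) are assumptions chosen for the example, not values taken from any of the data sets discussed here.

```python
from math import comb

def p_power_pmf(n, p, gamma):
    """P(R = r), r = 0..n, under the p-power family, equation (9)."""
    lam = lambda k: 1.0 if k == 0 else p ** (k ** gamma)  # lambda_0 = 1
    return [
        comb(n, r) * sum((-1) ** k * comb(n - r, k) * lam(r + k)
                         for k in range(n - r + 1))
        for r in range(n + 1)
    ]

def q_power_pmf(n, p, gamma):
    """P(R = r), r = 0..n, under the q-power family, equation (11), with q = 1 - p."""
    q = 1.0 - p
    lam = lambda k: 1.0 if k == 0 else q ** (k ** gamma)  # lambda'_0 = 1
    return [
        comb(n, r) * sum((-1) ** k * comb(r, k) * lam(n - r + k)
                         for k in range(r + 1))
        for r in range(n + 1)
    ]

n, p, gamma = 10, 0.1, 0.8  # illustrative values only
for name, pmf in (("p-power", p_power_pmf(n, p, gamma)),
                  ("q-power", q_power_pmf(n, p, gamma))):
    mean = sum(r * pr for r, pr in enumerate(pmf))
    print(f"{name}: P(R=0) = {pmf[0]:.4f}, E(R) = {mean:.4f}")
```

Both functions share the same marginal probability p, so their means agree; with γ = 1 each reduces to the ordinary binomial distribution. Note that a common γ does not imply a common intra-litter correlation across the two families.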

It follows that the use of the p-power distribution will overestimate the safe dose in a litter-based approach to quantitative risk assessment. The q-power distribution does not have this problem. Furthermore, $P(R \ge 1)$ takes on a very appealing form under the q-power assumption, making it amenable to the theory of generalised linear models; see section 4.

3.2. Measures of association

We now interpret the meaning of the parameters. Obviously, $q = \lambda'_1 = P(X_1 = 0)$ and so $p = 1 - q = P(X_1 = 1)$ is the marginal probability that a foetus is affected. Since $P(X_1 = \dots = X_k = 0) = q^{k^{\gamma}}$, the probability that $X_1 = \dots = X_k = 0$ is the same as that of a sample of $k^{\gamma}$ independent Bernoulli observations, and in this sense we can interpret $k^{\gamma}$ as an effective sample size. It follows that $\gamma = 1$ corresponds to independence and $\gamma = 0$ corresponds to complete dependence. Substituting (10) into (3) yields the intra-litter correlation

$$\rho = \frac{q^{2^{\gamma}} - q^2}{q(1 - q)}.$$ (12)

Note that for $0 < q < 1$, $\rho = 0$ if and only if $\gamma = 1$, and so zero intra-litter correlation characterises independence. As it stands, model (11) allows positive intra-litter correlation ρ only. This is not a serious objection as most clustered data are positively correlated. Furthermore, it is well known that $\rho \ge -1/(n - 1)$ in general, and so there is not much room for negative correlation anyway if the cluster size is large. Nevertheless, there is still some capacity for generating negative correlation from (11). It was proven in Feller (1971, p. 225) that if we define $P(R = r)$ by (1), then $\sum_{r=0}^{n} P(R = r) = 1$, provided that $\lambda_0 = 1$. Thus (11)

defines a bona fide distribution even for $\gamma > 1$ as long as the resulting $P(R = r) \ge 0$ for $r = 0, \dots, n$. This leads to negative correlation in view of (12).

In practice, it is more meaningful to parametrise (11) in terms of a measure of association than to use γ. We can parametrise (11) in terms of the marginal probability q and the intra-litter correlation ρ by inverting (12) to obtain

$$\gamma = \frac{\log\left[ \log\{ q^2 + \rho q (1 - q) \} / \log(q) \right]}{\log 2}.$$ (13)

An alternative measure of association is the odds ratio between two responses within the same litter. For model (11), the odds ratio is given by

$$\psi = \frac{q^{2^{\gamma}} \left( 1 - 2q + q^{2^{\gamma}} \right)}{\left( q - q^{2^{\gamma}} \right)^2}.$$ (14)

To invert (14), we use the well known formula (Lipsitz et al., 1991) that expresses the joint probability

$$p_{00} = P(X_1 = 0, X_2 = 0) = \frac{1 + 2(\psi - 1) q - \sqrt{\{1 + 2(\psi - 1) q\}^2 - 4 q^2 \psi (\psi - 1)}}{2(\psi - 1)}$$ (15)

in terms of the marginal probability q and the odds ratio ψ. Since $p_{00} = q^{2^{\gamma}}$ by (10), it follows that

$$\gamma = \frac{\log( \log p_{00} / \log q )}{\log 2}$$ (16)

with $p_{00}$ given by (15), and as a result we are able to parametrise (11) in terms of q and the odds ratio ψ. For binary responses, it is now standard practice (Fleiss, 1986; Lipsitz et al., 1991) to use the log odds ratio as a measure of association rather than the intra-litter

correlation. The main reason for this choice is that there is no restriction on the possible values of the log odds ratio. Other desirable properties of the log odds ratio are given in the books by Bishop et al. (1975, Ch. 11) and Fleiss (1981, Ch. 5, 6). If the litters are all of size 2 (i.e., paired data), then the parameters q and $\phi = \log\psi$ are orthogonal. This is because if a litter is of size 2, then the unconditional odds ratio is the same as the conditional odds ratio, and the latter is known to be orthogonal to q (Fitzmaurice and Laird, 1993). Even though q and $\phi = \log\psi$ are no longer orthogonal for general litter size, we still expect inference regarding q to be less affected by assumptions about φ than by the specification of the intra-litter correlation structure. We now illustrate this point with the data set of Weil (1970) comprising a treatment and a control group.

3.3. Analysis of Weil's data

Williams (1975) fitted a separate beta-binomial distribution to each group. As noted by Liang and Hanfelt (1994), very different estimates for $\beta = \log\{ p_T (1 - p_C) / ( p_C (1 - p_T) ) \}$ are obtained if we assume a common intra-litter correlation for the control and treatment groups. The results of fitting the q-power family of distributions are given in Table 2. When we fit a separate q-power distribution to each group, we obtain $\hat\beta = -1.123$ (.448). As in the beta-binomial case, the results change quite a bit if the intra-litter correlations $\rho_T$ and $\rho_C$ are assumed to be equal. More specifically, we get a smaller estimate of $p_C$ (.246 instead of .258), a bigger estimate of $p_T$ (.12 instead of .102), and hence $\hat\beta = -0.873$ is of considerably smaller magnitude than before. The same phenomenon was observed by Williams (1988) for the beta-binomial fit. If we assume a common log odds ratio φ rather than a common intra-litter correlation, the estimate becomes

$\hat\beta = -1.237$ (.47), which is comparable to the estimate obtained earlier when no constraints are imposed. This is despite the fact that the likelihood ratio test rejects the hypothesis of equal odds ratio at the 5% level (test statistic of 4.00 on 1 degree of freedom). Thus for this data set, the maximum likelihood estimate of β seems to be quite robust to incorrect specification of the odds ratio.

3.4. Dose-response modelling

The power family of distributions (11) can be used in conjunction with any dose-response relationship. For modelling the marginal probabilities, we can use a generalised linear (in dose or log-dose) formulation with logit, probit or complementary log-log link (Ryan, 1992), a Weibull function (Chen and Kodell, 1989) or a folded-logistic function (George and Bowman, 1995). The complementary log-log link

$$\log\{ -\log(1 - p) \} = \log(-\log q) = \beta_0 + \beta_1 d$$ (17)

deserves special mention because this link function is preserved in the induced model for the probability of at least one foetal response. We return to this point in section 4.

3.5. Score function and information matrix

Since the score function and information matrix are obtained by summing over litters, it suffices to consider the contribution of a single litter, say of size n with r affected foetuses. The log-likelihood based on this litter alone is $\log\{P(R = r)\}$, where $P(R = r)$ is given by (11). Let $\nabla$ denote the differentiation operator with respect to the parameters of the model under consideration. We have

$$\nabla \log\{P(R = r)\} = \frac{\nabla P(R = r)}{P(R = r)} = \frac{\binom{n}{r} \sum_{k=0}^{r} (-1)^k \binom{r}{k} \nabla q^{(n-r+k)^{\gamma}}}{P(R = r)}$$ (18)

and the problem reduces to differentiating functions of the form $q^{k^{\gamma}}$ with respect to the model parameters. Rather than differentiating manually, which can be extremely tedious in view of (13), (15) and (16), we use symbolic differentiation by calling the Splus function deriv to obtain $\nabla q^{k^{\gamma}}$ and hence also (18). The expected information based on a single litter is

$$E\left[ \nabla \log\{P(R = r)\} \, \nabla^{T} \log\{P(R = r)\} \right] = \sum_{r=0}^{n} \frac{\nabla P(R = r) \, \nabla^{T} P(R = r)}{P(R = r)},$$ (19)

where $\nabla P(R = r)$ is given by the numerator of (18) and $P(R = r)$ by (11). Summing (18) and (19) over litters, we obtain the score function and the expected information based on the whole sample, and hence the maximum likelihood estimates can be obtained iteratively using the method of scoring. Standard errors for the maximum likelihood estimates can be obtained by inverting the expected information matrix. The suggested procedure can be implemented quite readily in practice. Given a tool for carrying out symbolic differentiation, it does not take much effort to program the computation of (18) and (19), which are the building blocks of the entire estimation procedure.

4. Probability of at least one affected littermate

4.1. Special role of the complementary log-log link

In a litter-based approach to risk assessment, a relevant quantity is the probability that at least one littermate is adversely affected. Lipsitz et al. (1995) were also interested in estimating such a union probability for longitudinal data. Their method is based on the Bahadur representation of the joint distribution and requires the estimation of correlations of all orders by the method of moments. To do this accurately obviously requires a large

number of litters. For the q-power family, the probability of at least one affected littermate takes on a particularly simple and appealing form,

$$P(R \ge 1) = 1 - P(R = 0) = 1 - P(X_1 = \dots = X_n = 0) = 1 - q^{n^{\gamma}},$$ (20)

which should be compared with the usual formula $1 - q^{n}$ valid under independence. In contrast, the corresponding probability under the p-power family (9) is

$$P(R \ge 1) = 1 - P(R = 0) = 1 - \sum_{k=0}^{n} (-1)^k \binom{n}{k} p^{k^{\gamma}},$$

which is considerably more complicated. It follows from (20) that

$$\log\left[ -\log\{ 1 - P(R \ge 1) \} \right] = \gamma \log n + \log(-\log q) = \gamma \log n + \beta_0 + \beta_1 d$$ (21)

if p satisfies (17). Thus if p follows a generalised linear model under a complementary log-log link function, then the same holds true for the probability $P(R \ge 1)$ but with an extra term $\gamma \log n$ appearing in (21). A simpler, though less efficient, way to estimate the model parameters is to dichotomise the data to 0 or 1 depending on whether $R = 0$ or $R \ge 1$. According to equation (21), the dichotomised data follow a binary regression model with complementary log-log link, and hence the parameters can be estimated using any statistical software that does binary regression.

4.2. The E2 data set

Consider now the E2 data set (Brooks et al., 1997) on the numbers of dead foetuses in litters of mice from untreated experimental animals. When a q-power distribution is fitted to this data set, the estimated parameters are $\hat\gamma = .8322$, $\hat q = 1 - \hat p = .8806$, or

$\hat\beta_0 = \log(-\log \hat q) = -2.0625$ in the complementary log-log scale. We shall show below that the q-power distribution leads to a better estimate of the probability that a litter has at least one dead foetus. Under our model, the probability that a litter of size n is affected is given by (20). The expected number of affected litters is thus

$$\sum_{n} m_n \left( 1 - q^{n^{\gamma}} \right),$$

where $m_n$ is the number of litters of size n in the E2 data set and the sum runs over the litter sizes in the data. Substituting $\hat q = .8806$, $\hat\gamma = .8322$ into the above formula, we obtain the number 36.8. This agrees well with the actual number of affected litters in the data set, which is 35 out of a total of 211 litters. In comparison, the beta-binomial fit predicted only 29.1 affected litters, which is quite an underestimate. The observation that the beta-binomial fit underestimates the number of affected litters is a recurrent theme in this paper. This has to do with the fact that the beta-binomial probability function is often U-shaped or reverse J-shaped even though the data may suggest otherwise. In essence, the presence of a positive intra-litter correlation has caused the beta-binomial distribution to drive most of its mass to zero or full count. For the E2 data set, the modal number of dead foetuses is 1 for litter sizes 9, 10, 12, 14, 16 and 17, but the fitted beta-binomial probability function is reverse J-shaped with mode at zero. As a result, the beta-binomial distribution overestimates the probability that none of the littermates is affected and hence underestimates the number of affected litters.

We can test the goodness of fit of the q-power distribution by comparing the full-data estimates with the estimates $\tilde\beta_0 = -2.7135$, $\tilde\gamma = 1.0795$ obtained from binary regression of the dichotomised data with complementary log-log link

$$\log\left[ -\log\{ 1 - P(R \ge 1) \} \right] = \log(-\log q) + \gamma \log n = \beta_0 + \gamma \log n.$$
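The quantities used in this comparison are simple to compute. The Python sketch below evaluates (20), checks the complementary log-log identity (21) in the dose-free case, and computes an expected number of affected litters. The litter-size counts in `hypothetical_counts` are invented purely for illustration and are not the E2 data; only the values .8806 and .8322 are taken from the text.

```python
import math

def prob_litter_affected(n, q, gamma):
    """P(R >= 1) for a litter of size n under the q-power family, equation (20)."""
    return 1.0 - q ** (n ** gamma)

def cloglog(u):
    """Complementary log-log transform log(-log(1 - u))."""
    return math.log(-math.log(1.0 - u))

def expected_affected_litters(litter_counts, q, gamma):
    """Sum of m_n * P(R >= 1 | n) over litter sizes; litter_counts maps n -> m_n."""
    return sum(m * prob_litter_affected(n, q, gamma)
               for n, m in litter_counts.items())

q_hat, gamma_hat = 0.8806, 0.8322          # estimates quoted in the text
beta0_hat = math.log(-math.log(q_hat))     # complementary log-log scale

# Identity (21) without a dose term: cloglog{P(R >= 1)} = gamma*log(n) + beta_0.
n = 12
lhs = cloglog(prob_litter_affected(n, q_hat, gamma_hat))
rhs = gamma_hat * math.log(n) + beta0_hat
print(f"beta0_hat = {beta0_hat:.4f}, lhs = {lhs:.6f}, rhs = {rhs:.6f}")

# Hypothetical litter-size counts, NOT the E2 data.
hypothetical_counts = {8: 20, 10: 50, 12: 40}
print(expected_affected_litters(hypothetical_counts, q_hat, gamma_hat))
```

Because (21) is linear in log n, fitting a complementary log-log binary regression of the indicator of an affected litter on log n yields estimates of $\beta_0$ and γ from the dichotomised data alone.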

Thus $\hat\beta_0 - \tilde\beta_0 = 0.651$ (.9547), $\hat\gamma - \tilde\gamma = -.2473$ (.4346), and the standardised differences $Z_\beta = .68$, $Z_\gamma = -.57$ are not sufficiently large to warrant rejection of the q-power model.

5. Analysis of the 2,4,5-T data

5.1. Model fitting and comparison

In a study conducted at the U.S. National Centre for Toxicological Research, pregnant mice from several strains were given daily doses of the herbicide 2,4,5-T from day 6 to day 14 of gestation. For each female mouse, the number of implantation sites, foetal deaths, resorptions and cleft palate malformations were recorded. Further details of this study can be found in Holson et al. (1991). In keeping with most published analyses of the data set, we consider only data obtained from the out-bred strain CD-1 and use a combined endpoint of death, resorption or malformation. For this strain, there were six dose groups corresponding to exposure levels of 0, 30, 45, 60, 75 and 90 mg/kg of 2,4,5-T. A listing of the data can be found in George and Bowman (1995). As noted by Dominici and Parmigiani (2001), this data set is quite hard to model due to the presence of zero inflation, n-inflation, over-dispersion and large kurtosis. Furthermore, the extent of departure from the binomial model varies significantly with dose.

We now give a new analysis of the 2,4,5-T data based on the q-power distributions and in doing so uncover a deficiency of existing fits from a litter-based point of view of risk assessment. We begin by fitting a separate q-power distribution to the control as well as each of the five dose groups. The log likelihood is -700.89, which is considerably larger than that of the beta-binomial fit (-736.22) and the folded-logistic fit (-785.76). Next, we attempt dose-response modelling to arrive at a more parsimonious description. For reasons mentioned in section 4.1, it seems natural to consider the

complementary log-log link of the response probability. A plot of $\log(-\log q)$ versus dose level is given by the unfilled circles in Figure 3. It can be seen that all the points lie around a straight line except for the point corresponding to the control group. This suggests that a reasonable model is

$$\log(-\log q) = \eta_0 I\{d = 0\} + \beta_0 I\{d > 0\} + \beta_1 d,$$ (22)

so that $\log(-\log q) = \eta_0$ for the control group rather than $\beta_0$. As for the measure of intra-litter association, we prefer to use the pairwise odds ratio ψ for reasons mentioned in section 3.2. Figure 3 suggests that the log odds ratio is approximately linear in the dose level, hence

$$\phi = \log\psi = \alpha_0 + \alpha_1 d.$$ (23)

We then fit q-power distributions to the six groups with q and φ related to dose according to (22) and (23). The Splus code for doing this, together with the 2,4,5-T data set, can be found in http://www.stat.nus.edu.sg/~u/. We obtain the parameter estimates $\hat\eta_0 = -2.572$ (.3), $\hat\beta_0 = -3.407$ (.82), $\hat\beta_1 = 0.0499$ (.0032), $\hat\alpha_0 = 0.000437$ (.0377) and $\hat\alpha_1 = 0.0352$ (.00254). With 7 fewer parameters than the separate-fit model, the log likelihood is only reduced slightly to -702.75 and the likelihood ratio test is clearly in favour of the reduced model. Table 3 displays the estimates of the response probabilities and intra-litter correlations obtained under assumptions (22) and (23).

The left-hand panel of Table 4 displays the expected number of affected foetuses for each dose group under the various models. Among the existing models, the folded-logistic fit is the best and the beta-binomial fit is the worst. The new model that we propose does better in 5 of the 6 dose groups and is only slightly worse than the folded-logistic model for the 45 mg/kg group.
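To make the parametrisation concrete, the following Python sketch maps a dose to $(q, \psi)$ through (22) and (23) and then to the power parameter γ through (15) and (16). The function names are assumptions made for this illustration, and the coefficient values are round illustrative numbers rather than the fitted estimates.

```python
import math

def q_from_dose(d, eta0, beta0, beta1):
    """Equation (22): log(-log q) = eta0 for controls, beta0 + beta1*d for d > 0."""
    lp = eta0 if d == 0 else beta0 + beta1 * d
    return math.exp(-math.exp(lp))

def psi_from_dose(d, alpha0, alpha1):
    """Equation (23): log odds ratio linear in dose."""
    return math.exp(alpha0 + alpha1 * d)

def p00_from_q_psi(q, psi):
    """Equation (15): P(X_1 = 0, X_2 = 0) from the margin q and odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:       # independence limit of (15)
        return q * q
    a = 1.0 + 2.0 * (psi - 1.0) * q
    return (a - math.sqrt(a * a - 4.0 * q * q * psi * (psi - 1.0))) / (2.0 * (psi - 1.0))

def gamma_from_q_psi(q, psi):
    """Equation (16): power parameter from q and psi."""
    return math.log(math.log(p00_from_q_psi(q, psi)) / math.log(q)) / math.log(2.0)

def odds_ratio(q, gamma):
    """Equation (14): pairwise odds ratio under the q-power family."""
    p00 = q ** (2.0 ** gamma)
    return p00 * (1.0 - 2.0 * q + p00) / (q - p00) ** 2

# Illustrative coefficients only (not the fitted values).
eta0, beta0, beta1, alpha0, alpha1 = -2.5, -3.4, 0.05, 0.0, 0.035
for d in (0, 30, 60, 90):
    q = q_from_dose(d, eta0, beta0, beta1)
    psi = psi_from_dose(d, alpha0, alpha1)
    print(f"d = {d:2d}: q = {q:.4f}, psi = {psi:.3f}, gamma = {gamma_from_q_psi(q, psi):.4f}")
```

The function `odds_ratio` is the inverse map of `gamma_from_q_psi` for fixed q, which gives a convenient consistency check on (14) to (16).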

Note in particular how the new model corrects for the severe over-prediction of the number of affected foetuses by all existing models at the 30 mg/kg group.

Faustman et al. (1994) and Geys et al. (1999) pointed out that it is important from a biological perspective to take into account the health of the entire litter. Under the so-called litter-based approach to quantitative risk assessment, a litter is said to be affected if at least one foetus is affected. In the right-hand panel of Table 4, we compare the number of affected litters predicted by the folded-logistic model and our model. We do not include the beta-binomial model since it has already been shown to be the worst fitting model. The GEE approach cannot estimate higher order probabilities and so is also excluded from our study. In contrast, the q-power family of distributions is extremely well suited to estimating the probability of at least one affected foetus, via equation (20). From Table 4, we can see how the folded-logistic model underestimates the number of affected litters for the control group and overestimates it in the dose groups.

A referee expressed the opinion that our improved fit may have more to do with the better modelling of the response probability by (22) than with the use of the q-power distribution. To see if this is the case, we consider also the p-power, the beta-binomial and the extended folded-logistic distribution (6) with the same foetus response probability p and intra-litter correlation ρ as that of the q-power fit we obtained earlier. Because the foetus response probabilities are matched, all four distributions perform identically in terms of predicting the number of affected foetuses. When it comes to estimating the number of affected litters, however, the differences in distributional assumptions begin to show because the probability that a litter is affected is a union probability that cannot be determined from the first two moments alone. The most noteworthy point of Table 5

is that all the distributions, except the q-power family, substantially underestimate the number of affected litters in the 30, 45 and 60mg/g dose groups. To understand why, we plot in Figure 4 the fitted probability functions for litter size 2 under the various distributional assumptions. It appears that the presence of positive intra-litter has caused all the distributions, with the exception of the q-power model, to inflate the probability mass at zero. Consequently, the probability that at least one foetus is adversely affected is underestimated by all except the q-power distribution. 5.2. Determination of safe dose Let r (d ) be a suitably chosen function that relates the ris of an adverse effect, such as death, resorption or malformation, to the exposure level of a toxic substance. In a foetusbased approach to ris assessment, r (d ) could be chosen as the marginal probability that a foetus is affected. In a litter-based approach, interest is focussed on P ( R ), the probability that at least one foetus is affected. Since P( R ) = P( R d, n) depends on the litter size n in addition to the exposure level d, it is customary to weight P( R d, n) according to the empirical relative frequency f (n) of the litter sizes across all dose groups. Thus r ( d) = n= f ( n) P( R d, n) is a suitable ris function if a litter-based approach is adopted. For the q-power class of models, P( R d, n) taes on the particularly simple form (20), with q related to dose by a dose-response function such as (7). In quantitative ris assessment (Geys et al., 999), one is interested in the excess ris over bacground, r * ( d) = r( d) r(0). 2
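The litter-based risk function and the excess risk can be sketched in a few lines. Everything numerical below is hypothetical and serves only to make the example runnable: the litter-size frequencies, the coefficients `B0` and `B1` of a complementary log-log dose-response curve for $q$, and a constant association parameter `GAMMA`. The sketch also assumes that the simple form (20) is $P(R \ge 1 \mid d, n) = 1 - q^{n^\gamma}$, which is our reading of the q-power family rather than something restated in this section.

```python
import math

# Hypothetical inputs, for illustration only (not estimates from the paper).
LITTER_SIZE_FREQ = {8: 0.2, 10: 0.3, 12: 0.5}  # f(n), sums to 1
B0, B1 = -2.5, 0.05                             # assumed log(-log q) = B0 + B1 * dose
GAMMA = 0.85                                    # assumed constant association parameter


def q_of_dose(dose):
    """Probability that a single foetus is unaffected at the given dose."""
    return math.exp(-math.exp(B0 + B1 * dose))


def p_litter_affected(q, gamma, n):
    """P(R >= 1 | n) under the assumed q-power form 1 - q**(n**gamma)."""
    return 1.0 - q ** (n ** gamma)


def litter_risk(dose):
    """r(d): P(R >= 1 | d, n) weighted by the litter-size frequencies f(n)."""
    q = q_of_dose(dose)
    return sum(f * p_litter_affected(q, GAMMA, n)
               for n, f in LITTER_SIZE_FREQ.items())


def excess_risk(dose):
    """r*(d) = r(d) - r(0)."""
    return litter_risk(dose) - litter_risk(0.0)


def dose_for_excess_risk(alpha, lo=0.0, hi=100.0, tol=1e-8):
    """Bisection for the dose at which r*(d) = alpha (r* increases when B1 > 0)."""
    if excess_risk(hi) < alpha:
        raise ValueError("alpha is not reached within the dose range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess_risk(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    d = dose_for_excess_risk(0.05)
    print(f"dose with 5% excess litter risk: {d:.3f}")
    print(f"check: r*(d) = {excess_risk(d):.4f}")
```

Bisection is used here only because it needs no external library; any one-dimensional root finder would do, provided the excess risk is monotone in dose over the search interval.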

Crump (1984) defined the benchmark dose, $\mathrm{BMD}_\alpha$, as the dose level that produces an excess risk of $\alpha$. Typical choices of $\alpha$ are .0001, .01, .05 and .1, depending on how large an excess risk is regarded as tolerable. A point estimate $\widehat{\mathrm{BMD}}_\alpha$ of the benchmark dose is obtained by solving the equation $\hat r^*(d) = \alpha$, where $\hat r^*(d)$ is the estimated excess risk function that results from replacing all parameters in $r^*(d)$ by their estimates. In the presence of sampling uncertainty, it is more meaningful to construct a 95% lower confidence interval for $\mathrm{BMD}_\alpha$ than to calculate just a point estimate. The conventional lower confidence limit based on asymptotic normality is given by

$$\widehat{\mathrm{BMD}}_\alpha - 1.645\sqrt{\widehat{\mathrm{var}}(\widehat{\mathrm{BMD}}_\alpha)},$$

where $\widehat{\mathrm{var}}(\widehat{\mathrm{BMD}}_\alpha)$ is the estimated variance of $\widehat{\mathrm{BMD}}_\alpha$ obtained using the delta method. A drawback of this approach is that it might yield unstable (Catalano et al., 1994) as well as negative estimates. Kimmel and Gaylor (1988) proposed an alternative way to obtain a lower confidence interval for $\mathrm{BMD}_\alpha$ via test inversion. To be specific, the confidence interval consists of all those dose levels $d$ such that the hypothesis $H_0: r^*(d) = \alpha$ is not rejected in favour of the one-sided alternative $H_a: r^*(d) < \alpha$ at level 0.05. A little algebra shows that the resulting 95% lower confidence interval for the benchmark dose $\mathrm{BMD}_\alpha$ consists of all those $d$ such that

$$\hat r_U(d) = \hat r^*(d) + 1.645\sqrt{\widehat{\mathrm{var}}(\hat r^*(d))} \ge \alpha \qquad (24)$$

and setting $\hat r_U(d) = \alpha$ leads to the so-called lower effective dose $\mathrm{LED}_\alpha$. Since $\hat r_U(d)$ is the 95% upper confidence limit for the excess risk $r^*(d)$, (24) tells us that a 95% lower confidence interval for the benchmark dose $\mathrm{BMD}_\alpha$ can be obtained by taking all those dose levels $d$ such that the 95% upper confidence interval for $r^*(d)$ contains $\alpha$. A graphical illustration of this is given in Kuk (2003), who also takes into account the possibility of failure to implant. Regan and Catalano (1999) consider the risk of malformation and low foetal weight simultaneously.

Table 6 shows the benchmark dose and lower effective dose estimated from the 2,4,5-T data using the folded-logistic as well as our q-power model. For the foetus-based approach, we use $\alpha = 0.01$. A larger value of $\alpha = 0.05$ is used under the litter-based approach since the background risk is higher by virtue of the fact that we are considering a union probability. Recall from our earlier discussion that $\log(-\log q)$ appears to be linear in dose in the range 30-90 mg/kg but the same straight line cannot be used to describe what happens at the control group. Without additional data, one cannot really ascertain the form of the dose-response curve in the range 0-30 mg/kg. For the purpose of calculating the benchmark dose, we assume that $\log(-\log q)$ is piecewise linear in $d$ with a changepoint at 30. The sample estimate of this dose-response curve is depicted by the dotted curve in Figure 3. This piecewise linear assumption will lead to conservative estimates of the benchmark dose if $\log(-\log q)$ actually falls below the straight line in the initial dose range 0-30 mg/kg. We observed earlier from Table 4 that the folded-logistic model underestimates the number of affected litters for the control group and overestimates it for the dose groups. This suggests that the folded-logistic model will overestimate the excess risk and hence result in a benchmark dose that is lower than necessary. Table 6 confirms this view, as the benchmark dose and lower effective dose determined by the folded-logistic fit are uniformly smaller than those obtained from the q-power model.

6. Conclusions

In conclusion, we have proposed in this paper a power family of distributions that allows flexible dose-response modelling of the response probability as well as of the association structure. Estimating the probability that at least one littermate is affected is particularly easy under the proposed model and this has applications in a litter-based approach to quantitative risk assessment. The q-power distribution is fairly easy to fit with the aid of symbolic differentiation and there is no need to use pseudo-likelihood or Monte Carlo approximation. It provides a better fit than existing distributions to several data sets that we have looked at. All these lead us to the belief that the q-power class of distributions is suitable for routine use and provides an additional option in the analysis of clustered binary data for statisticians working in toxicology, teratology and other disciplines.

Acknowledgements

The author would like to thank the editor and the referees for their helpful suggestions.

Appendix A: Proof that $F(x) = p^{x^\gamma}$ is a completely monotone function

Let $G(x) = -\log\{F(x)\} = -x^\gamma \log p$; we have

$$G^{(k)}(x) = -\gamma(\gamma - 1)\cdots(\gamma - k + 1)\, x^{\gamma - k} \log p.$$

Since $0 < p, \gamma < 1$, it is clear that $(-1)^{k-1} G^{(k)}(x) \ge 0$ for $x > 0$. Now $F(x) = e^{-G(x)}$ and so $F^{(1)}(x) = -F(x)\,G^{(1)}(x)$ is negative as required. It can be proven by induction that $F^{(k)}(x)$ is of the form

$$F^{(k)}(x) = -\sum_{r=0}^{k-1} a_r F^{(r)}(x)\, G^{(k-r)}(x),$$

where $a_r$ are some positive constants and $F^{(0)}(x) = F(x)$ by convention. Assuming $(-1)^r F^{(r)}(x) \ge 0$ is true for $r = 0, \ldots, k-1$, it follows that

$$(-1)^k F^{(k)}(x) = \sum_{r=0}^{k-1} a_r\, (-1)^r F^{(r)}(x)\, (-1)^{k-r-1} G^{(k-r)}(x) \ge 0$$

and hence $F(x)$ is completely monotone by mathematical induction.

References

Altham, P. M. E. (1978) Two generalizations of the binomial distribution. Appl. Statist., 27, 162-167.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975) Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass: MIT Press.
Bowman, D. and George, E. O. (1995) A saturated model for analyzing exchangeable binary data: Applications to clinical and developmental toxicity study. J. Am. Statist. Assoc., 90, 871-879.
Bowman, D., Chen, J. J. and George, E. O. (1995) Estimating variance functions in developmental toxicity studies. Biometrics, 51, 1523-1528.
Brooks, S. P., Morgan, B. J. T., Ridout, M. S. and Pack, S. E. (1997) Finite mixture models for proportions. Biometrics, 53, 1097-1115.
Catalano, P. J., Ryan, L. M. and Scharfstein, D. (1994) Modelling fetal death and malformation in developmental toxicity. Risk Analysis, 14, 611-619.
Chen, J. J. and Kodell, R. L. (1989) Quantitative risk assessment for teratological effects. J. Am. Statist. Assoc., 84, 966-971.
Connolly, M. A. and Liang, K. Y. (1988) Conditional logistic regression models for correlated binary data. Biometrika, 75, 501-506.
Crump, K. S. (1984) A new method for determining allowable daily intakes. Fundamental and Applied Toxicology, 4, 854-871.
Dominici, F. and Parmigiani, G. (2001) Bayesian semiparametric analysis of developmental toxicology data. Biometrics, 57, 150-157.

Ekholm, A., Smith, P. W. F. and McDonald, J. W. (1995) Marginal regression analysis of a multivariate binary response. Biometrika, 82, 847-854.
Faustman, E. M., Allen, B. C., Kavlock, R. J. and Kimmel, C. A. (1994) Dose response assessment for developmental toxicity. I. Characterization of database and determination of no observed adverse effect levels. Fundamental and Applied Toxicology, 23, 478-486.
Feller, W. (1971) An Introduction to Probability Theory and Its Applications, Volume II, 2nd ed. New York: Wiley.
Fitzmaurice, G. M. and Laird, N. M. (1993) A likelihood-based method for analysing longitudinal binary responses. Biometrika, 80, 141-151.
Fleiss, J. L. (1981) Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.
George, E. O. and Bowman, D. (1995) A full likelihood procedure for analysing exchangeable binary data. Biometrics, 51, 512-523.
George, E. O. and Kodell, R. L. (1996) Tests of independence, treatment heterogeneity, and dose-related trend with exchangeable binary data. J. Am. Statist. Assoc., 91, 1602-1610.
Geys, H., Molenberghs, G. and Ryan, L. (1999) Pseudolikelihood modeling of multivariate outcomes in developmental toxicology. J. Am. Statist. Assoc., 94, 734-745.
Haseman, J. K. and Kupper, L. L. (1979) Analysis of dichotomous response data from certain toxicological experiments. Biometrics, 35, 281-293.

Holston, J. F., Gaines, T. B., Nelson, C. J., LaBorde, J. B., Gaylor, D. W., Sheehan, D. M. and Young, J. F. (1991) Developmental toxicity of 2,4,5-trichlorophenoxyacetic acid I: Multireplicated dose response studies in four inbred strains and one outbred stock of mice. Fundamental and Applied Toxicology, 19, 286-297.
Kimmel, C. A. and Gaylor, D. W. (1988) Issues in qualitative and quantitative risk analysis for developmental toxicology. Risk Analysis, 8, 15-20.
Kuk, A. Y. C. (2003) A generalised estimating equation approach to modelling foetal response in developmental toxicity studies when number of implants is dose-dependent. Applied Statistics, 52, 51-61.
Kupper, L. L. and Haseman, J. K. (1978) The use of a correlated binomial model for the analysis of certain toxicological experiments. Biometrics, 34, 69-76.
Kupper, L. L., Portier, C., Hogan, M. D. and Yamamoto, E. (1986) The impact of litter effects on dose-response modeling in teratology. Biometrics, 42, 85-98.
Liang, K. Y. and Hanfelt, J. (1994) On the use of quasi-likelihood method in teratological experiments. Biometrics, 50, 872-880.
Lipsitz, S. R., Laird, N. M. and Harrington, D. P. (1991) Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika, 78, 153-160.
Lipsitz, S. R., Fitzmaurice, G. M., Sleeper, L. and Zhao, L. P. (1995) Estimation methods for the joint distribution of repeated binary observations. Biometrics, 51, 562-570.
Molenberghs, G., Declerck, L. and Aerts, M. (1998) Misspecifying the likelihood for clustered binary data. Comput. Statist. Data Anal., 26, 327-349.

Moore, D. F. (1986) Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika, 73, 583-588.
Ochi, Y. and Prentice, R. L. (1984) Likelihood inference in a correlated probit regression model. Biometrika, 71, 531-543.
Regan, M. M. and Catalano, P. J. (1999) Likelihood models for clustered binary and continuous outcomes: applications to developmental toxicology. Biometrics, 55, 760-768.
Rosner, B. (1984) Multivariate methods in ophthalmology with applications to other paired-data situations. Biometrics, 40, 1025-1035.
Ryan, L. (1992) Quantitative risk assessment for developmental toxicity. Biometrics, 48, 163-174.
Skellam, J. G. (1948) A probability distribution derived from the binomial distribution by regarding the probability of a success as variable between the sets of trials. J. R. Statist. Soc. B, 10, 257-261.
Weil, C. S. (1970) Selection of valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food and Cosmetics Toxicology, 8, 177-182.
Williams, D. A. (1975) The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics, 31, 949-952.
Williams, D. A. (1982) Extra-binomial variation in logistic-linear models. Appl. Statist., 31, 144-148.
Williams, D. A. (1988) Estimation bias using beta-binomial distribution in teratology. Biometrics, 44, 305-309.

Table 1. The fits of various distributions to Skellam's Brassica data

                             Number of associations
Distribution                 0       1        2        3        Pearson χ²   Log-lik
Observed                     32      103      122      80
Binomial                     24.86   103.24   142.93   65.96    8.10         -440.39
Beta-binomial                33.96   97.18    127.69   78.17    0.76         -436.81
Folded logistic              50.81   90.81    104.41   90.98    12.89        -443.33
Extended folded logistic     34.75   93.23    133.41   75.62    2.47         -437.67
p-power                      33.86   93.50    136.04   73.60    3.07         -437.97
q-power                      32.33   101.92   122.79   79.97    0.02         -436.44

The results for binomial, beta-binomial, additive and multiplicative correlated binomial distributions are reproduced from Altham (1978).

Table 2. Fitting Weil's data by the q-power family of distributions

Model          β̂        SE(β̂)   p̂_T    p̂_C    ρ̂_T    ρ̂_C    φ̂_T     φ̂_C     Log-lik
Separate fit   -1.123   .448    .102   .258   .289   .018   1.381   0.184   -54.75
Common ρ       -0.873   .473    .120   .246   .153   .153   0.763   1.103   -58.34
Common φ       -1.237   .470    .095   .266   .189   .102   0.903   0.903   -56.75

Table 3. Estimated response probabilities and intra-litter correlations for the 2,4,5-T data

                 GEE              Beta-binomial     Folded-logistic   q-power
Dose group       p       ρ        p       ρ         p       ρ         p       ρ
Control          .0431   .063     .0570   -.004     .0624   .0804     .0736   .0000
30 mg/kg         .1769   .2207    .2062   .2479     .1805   .1182     .1374   .1577
45 mg/kg         .3195   .2959    .3500   .363      .2965   .1433     .2682   .3358
60 mg/kg         .5063   .3676    .5274   .4682     .4680   .1842     .4829   .4840
75 mg/kg         .6913   .4352    .6982   .566      .6981   .2627     .7517   .5323
90 mg/kg         .8302   .4982    .8274   .6425     .9696   .3954     .9473   .4043
Log likelihood                    -736.22           -785.76           -702.75

The results for GEE, beta-binomial and folded-logistic models are reproduced from George and Bowman (1995).

Table 4. Estimated number of affected foetuses and litters for the 2,4,5-T data

              Number of affected foetuses                            Number of affected litters
Dose group    Observed   GEE      Beta-binom   F-logistic   q-power   Observed   F-logistic   q-power
Control       59         34.57    45.71        50.04        59.00     40         29.77        40.74
30 mg/kg      124        172.30   200.83       175.77       130.80    56         67.0         59.37
45 mg/kg      338        357.84   392.00       332.0        30.98     80         90.09        79.22
60 mg/kg      383        400.99   417.70       370.67       389.24    69         73.6         67.02
75 mg/kg      372        326.98   330.25       330.18       362.34    42         43.40        42.64
90 mg/kg      242        20.7     20.29        246.28       240.62    24         24.95        24.94

The results for GEE, beta-binomial and folded-logistic models are reproduced from George and Bowman (1995).

Table 5. Estimated number of affected litters for the 2,4,5-T data for different distributions with the same foetus response probabilities and intra-litter correlations

                                 Number of affected litters
Dose group    p       ρ         Observed   Beta-Bin   Extended FL   p-power   q-power
Control       .0736   .0000     40         40.74      40.74         40.74     40.74
30 mg/kg      .1374   .1577     56         50.87      51.59         47.20     59.37
45 mg/kg      .2682   .3358     80         68.44      6.4           61.59     79.22
60 mg/kg      .4829   .4840     69         61.96      55.37         57.09     67.02
75 mg/kg      .7517   .5323     42         41.59      39.58         39.85     42.64
90 mg/kg      .9473   .4043     24         24.9       24.87         24.7      24.94

Table 6. Determination of benchmark and lower effective dose in mg/kg based on the 2,4,5-T data

Approach        Quantity   Folded logistic   q-power
Foetus-based    BMD_.01    4.0               6.04
                LED_.01    3.78              4.72
Litter-based    BMD_.05    4.54              8.74
                LED_.05    4.46              5.48

Fig. 1. The relationship between intra-litter correlation and marginal response probability under the folded-logistic model

[Figure: six panels, one for each combination of p = .1, .2, .3 and rho = .1, .2, plotting probability against number of affected foetuses for the q-power and p-power distributions]
Fig. 2. A comparison of q-power versus p-power probability functions for litter size 15

[Figure: log(-log(1-p)) and log odds ratio plotted against dose]
Fig. 3. A plot of the estimated foetal response probabilities in complementary log-log scale (unfilled circles) and log odds ratio (filled circles) versus dose levels for the 2,4,5-T data: The circles are the results of fitting a separate q-power distribution to each dose group and the lines are obtained from dose-response models (22) and (23)

[Figure: three panels (dose = 30 mg/kg, p = .1374, rho = .1577; dose = 45 mg/kg, p = .2682, rho = .3358; dose = 60 mg/kg, p = .4829, rho = .4840) plotting probability against number of affected foetuses for the q-power, p-power, beta-binomial and extended folded-logistic distributions]
Fig. 4. A comparison of the q-power with other distributions that share the same foetus response probability p and intra-litter correlation ρ
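The kind of comparison shown in Fig. 4 can be illustrated numerically for the probability mass at zero. The sketch below matches a beta-binomial and a q-power distribution on the same foetus response probability and intra-litter correlation and then compares $P(R = 0)$ for a litter of size 12. It rests on two assumptions that are not restated in this section: that the q-power family has $P(R = 0 \mid n) = q^{n^\gamma}$, and that the beta-binomial is parameterised so that $\rho = 1/(a + b + 1)$. The inputs p = 0.1374 and rho = 0.1577 are our reading of the 30 mg/kg panel, and the function names are hypothetical.

```python
import math


def q_power_gamma(p, rho):
    """Exponent gamma that matches correlation rho (rho > 0) under the
    assumed q-power pair probability P(both unaffected) = q**(2**gamma)."""
    q = 1.0 - p
    p00 = q * q + rho * p * q  # P(two given littermates are both unaffected)
    return math.log(math.log(p00) / math.log(q), 2)


def p_zero_q_power(p, rho, n):
    """P(R = 0 | n) under the assumed q-power form q**(n**gamma)."""
    q = 1.0 - p
    return q ** (n ** q_power_gamma(p, rho))


def p_zero_beta_binomial(p, rho, n):
    """P(R = 0 | n) for a beta-binomial with mean p and correlation rho > 0."""
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    prob = 1.0
    for i in range(n):
        prob *= (b + i) / (a + b + i)
    return prob


p, rho, n = 0.1374, 0.1577, 12
print("q-power       P(R = 0):", round(p_zero_q_power(p, rho, n), 3))
print("beta-binomial P(R = 0):", round(p_zero_beta_binomial(p, rho, n), 3))
```

By construction the two distributions agree for litters of size 1 and 2, so any difference at size 12 comes from the higher order probabilities, which the first two moments do not determine.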