A litter-based approach to risk assessment in developmental toxicity studies via a power family of completely monotone functions


Anthony Y. C. Kuk
National University of Singapore, Singapore

Summary. A new class of distributions for exchangeable binary data is proposed that originates from modelling the joint success probabilities of all orders by a power family of completely monotone functions. The proposed distribution allows flexible modelling of the dose-response relationship for both the marginal response probability and the pairwise odds ratio and is especially well suited for a litter-based approach to risk assessment. Specifically, the risk of at least one adverse response within a litter takes on a simple form under the proposed distribution and can be reduced further to a generalised linear model if a complementary log-log link function is used. Existing distributions such as the beta-binomial or folded-logistic distributions tend to assign too much probability to zero, leading to underestimation of the risk that at least one foetus is affected and overestimation of the safe dose. The proposed distribution does not suffer from this problem. With the aid of symbolic differentiation, the proposed distribution can be fitted easily and quickly via the method of scoring. The usefulness of the proposed class of distributions and its superiority over existing distributions are demonstrated in a series of examples involving developmental toxicology and teratology data.

Keywords: Complementary log-log link; Completely monotone function; Dose-response function; Developmental toxicity data; Exchangeability; Intra-cluster correlation; Method of scoring; Odds ratio; Risk assessment; Symbolic differentiation

1. Introduction and survey of literature

By design, many scientific experiments involve the collection of data on clusters of subjects such as households or litters of animals. A major driving force behind the development of methods for analysing clustered binary data is the desire by regulatory agencies to protect the public from exposure to potentially harmful substances. In a typical developmental toxicity study, pregnant laboratory animals are randomly assigned to receive a toxin at varying dose levels during the period of major organogenesis. These animals are sacrificed prior to term and the uterus is removed and examined for resorptions, foetal deaths and foetal malformations, resulting in clustered binary or multinomial data. The aim of such a study is to assess the relationship between exposure to the toxic substance and the incidence of developmental problems. Another important task is risk assessment and the determination of an acceptable low-risk or safe dose level (Crump, 1984; Chen and Kodell, 1989; Ryan, 1992).

One major consideration that must be taken into account when analysing data from such experiments is the tendency for littermates to behave more similarly than non-littermates. Failure to account for the litter effect and the over-dispersion it induces will lead to estimates with overstated precision. A common way to account for litter effect and over-dispersion is to assume that the intra-litter correlation is induced by a random effect shared by all the foetuses within the same litter. This random effect can be looked upon as the combined effect of all factors, both genetic and environmental, that are shared by the littermates. Given this litter-specific random effect, the outcomes of the littermates are assumed to be conditionally independent. The use of a beta distribution to model the random effects results in the famous beta-binomial distribution (Williams, 1975; Haseman and Kupper,

1979) that until recently has dominated much of the statistical literature of teratology and developmental toxicology. Other extra-binomial models that have been proposed include the logistic-normal-binomial model (Williams, 1982) and the probit-normal-binomial model (Ochi and Prentice, 1984), but they are not as widely used as the beta-binomial distribution in the analysis of litter data.

The beta-binomial model, however, has its limitations. To begin with, the shape of a beta-binomial distribution is not flexible enough and is often U-shaped, J-shaped or reverse J-shaped (George and Bowman, 1995) rather than unimodal with mode near the expected value $\mu = np$. Thus it could happen that nearly all the probability mass is concentrated at the two ends 0 and $n$, whereas values near the supposedly expected value are highly improbable. Another disadvantage of the beta-binomial model is the non-robustness of its estimates to misspecification of the correlation structure (Kupper et al., 1986; Williams, 1988). Liang and Hanfelt (1994) advocate the use of quasi-likelihood methods to obtain robust estimates of the response probabilities that are insensitive to misspecification of the correlation structure. Bowman et al. (1995) modelled both the mean responses and intra-litter correlations as functions of dose levels and used a generalised estimating equation (GEE) approach to obtain parameter estimates. A drawback of the GEE approach is that it can be inefficient for estimating the correlation structure (Moore, 1986), which could be of interest in its own right. Furthermore, the GEE approach typically models only the first two moments and as such cannot provide estimates of quantities that depend on higher order moments (Bowman and George, 1995; George and Kodell, 1996). In particular, it cannot estimate the probability that at least one littermate

is adversely affected (Geys et al., 1999), which is of interest in a litter-based approach to quantitative risk assessment.

Rather than inducing a positive intra-litter correlation indirectly via a shared random effect, Kupper and Haseman (1978) and Altham (1978) proposed a correlated binomial distribution with additive interactions, but this distribution is not widely used. Altham also proposed a multiplicative generalization of the binomial distribution. Connolly and Liang (1988) proposed a class of conditional logistic models for clustered binary data that includes the multiplicative binomial as well as Rosner's (1984) model as special cases. This class of models is defined in terms of the conditional probability of each unit, given the number of positive responses in the remaining units of the cluster. Parameter estimation for this class of models is hampered by the need to evaluate a normalising constant for every combination of cluster size and parameter values. Connolly and Liang (1988) resorted to a working likelihood approach. Geys et al. (1999) proposed an exponential family of conditional models for multivariate clustered binary data. Needless to say, normalising constants are even more difficult to compute in the multivariate case and they resort to a pseudo-likelihood estimation approach.

George and Bowman (1995) proposed a new modelling approach centred on exchangeability, which they argued is a reasonable assumption for litter data in developmental toxicity experiments. This approach is based on the fact that the distribution for a set of $n$ exchangeable binary variables $X_1, \ldots, X_n$ is uniquely determined by $\lambda_k = P(X_1 = \cdots = X_k = 1)$, $k = 1, \ldots, n$. George and Bowman set $\lambda_k = F(k; \beta)$, where $F$ is the folded-logistic function.

A limitation of the folded-logistic model is that there are no additional parameters to model the correlation structure (Molenberghs et al., 1998). In section 2, we propose extensions of the folded-logistic model that allow more flexibility in the value of the intra-litter correlation. In section 3, we propose a new distribution for exchangeable binary data based on a power family of completely monotone functions. The proposed distribution allows flexible modelling of the dose-response relationship for both the marginal response probability and the pairwise odds ratio. Even though the likelihood function is very complex, the score function as well as the expected information matrix can be obtained readily with the aid of symbolic differentiation. Furthermore, if the foetal response probability follows a generalised linear model with a complementary log-log link function, then the probability of at least one foetal response within a litter will follow the same model but with an additional linear term in log(litter-size). This is an attractive property that makes the proposed distribution particularly useful in a litter-based approach to quantitative risk assessment. We also suggest a method for finding the lower confidence limit of the benchmark dose that corresponds to a given level of excess risk.

2. The folded-logistic model and its extensions

We begin by introducing the so-called mean parameters (Ekholm et al., 1995)

$$\lambda_k = E(X_1 \cdots X_k) = P(X_1 = \cdots = X_k = 1).$$

Without loss of generality, suppose we are interested in the joint probability

$$P(X_1 = \cdots = X_r = 1, X_{r+1} = \cdots = X_n = 0) = E\{X_1 \cdots X_r (1 - X_{r+1}) \cdots (1 - X_n)\}.$$

By expanding the product on the right-hand side before we take expectation, we get

$$P(X_1 = \cdots = X_r = 1, X_{r+1} = \cdots = X_n = 0) = \sum_{k=0}^{n-r} (-1)^k \binom{n-r}{k} \lambda_{r+k},$$

with $\lambda_0$ defined as 1. Let $R = X_1 + \cdots + X_n$ be the number of positive responses in a cluster of size $n$. It follows from exchangeability that

$$P(R = r) = \binom{n}{r} P(X_1 = \cdots = X_r = 1, X_{r+1} = \cdots = X_n = 0) = \binom{n}{r} \sum_{k=0}^{n-r} (-1)^k \binom{n-r}{k} \lambda_{r+k}. \tag{1}$$

George and Bowman (1995) let $\lambda_k = F(k; \beta)$ for some response function $F$ and then used (1) to deduce the joint distribution of $X_1, \ldots, X_n$. However, care must be taken in modelling the $\lambda_k$ to ensure that the summation (1) results in a legitimate probability between zero and one. A sufficient condition (George and Bowman, 1995) is that $(-1)^k F^{(k)}(x) \ge 0$ for all positive integers $k$, where $F^{(k)}(x)$ is the $k$th derivative of $F(x; \beta)$ with respect to $x$. Such a function is said to be completely monotone (Feller, 1971, p.224). George and Bowman (1995) used one particular completely monotone function, the folded-logistic function, to define

$$\lambda_k = \frac{2}{1 + (k+1)^{\beta}}$$

with dose-dependent $\beta = \beta_0 + \beta_1 d$. It follows that the marginal response probability is

$$p = P(X_1 = 1) = \lambda_1 = \frac{2}{1 + 2^{\beta}}. \tag{2}$$

By the same token, $P(X_1 = 1, X_2 = 1) = \lambda_2 = 2/(1 + 3^{\beta})$, and the intra-litter correlation is given by

$$\rho = \frac{\lambda_2 - \lambda_1^2}{\lambda_1 (1 - \lambda_1)}. \tag{3}$$
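As a minimal sketch of how equation (1) turns a sequence of mean parameters into a probability function, the Python fragment below evaluates (1) for the folded-logistic choice of $\lambda_k$ and computes the correlation (3). The function names, the value $\beta = 0.8$ and the litter size 8 are illustrative assumptions and do not come from the paper.

```python
from math import comb

def pmf_from_mean_params(lam, n):
    """Return [P(R = 0), ..., P(R = n)] from equation (1).

    lam(k) must give lambda_k = P(X_1 = ... = X_k = 1), with lam(0) == 1.
    """
    return [
        comb(n, r) * sum((-1) ** k * comb(n - r, k) * lam(r + k)
                         for k in range(n - r + 1))
        for r in range(n + 1)
    ]

def folded_logistic(beta):
    """Mean parameters lambda_k = 2 / (1 + (k + 1) ** beta); note lambda_0 = 1."""
    return lambda k: 2.0 / (1.0 + (k + 1.0) ** beta)

def intra_litter_corr(lam):
    """Equation (3): correlation implied by lambda_1 and lambda_2."""
    l1, l2 = lam(1), lam(2)
    return (l2 - l1 ** 2) / (l1 * (1.0 - l1))

beta = 0.8                       # illustrative value only
lam = folded_logistic(beta)
pmf = pmf_from_mean_params(lam, n=8)
print("marginal probability:", lam(1))
print("intra-litter correlation:", intra_litter_corr(lam))
print("P(R = r):", pmf)
```

Changing `beta` moves the marginal probability and the correlation together, which is the restriction of the one-parameter folded-logistic model.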

Figure 1 is a plot of the intra-litter correlation versus the marginal probability under the folded-logistic model. We can see that the intra-litter correlation is fixed automatically once the response probability is given, and this is clearly unrealistic and restrictive. Table 1 shows the result of fitting the folded-logistic model to Skellam's (1948) Brassica data on the number of pairs of bivalents showing association. It can be seen that the folded-logistic model fits the data poorly, with a likelihood value even smaller than that of the pure binomial fit. The reason for the poor fit is clear. The probability of association is around 0.58 and existing fits based on the beta-binomial or the correlated binomial distributions all suggest that the pairwise correlation is less than 0.1, whereas under the folded-logistic model, the correlation is forced to be around 0.2 when $p = 0.58$.

To relax the constraint on the values of the intra-litter correlation under the folded-logistic model, we make use of the following result. If $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are two independent sets of exchangeable binary random variables with non-negative correlation and mean parameters denoted by $\xi_k$'s and $\eta_k$'s respectively, then $Z_1 = X_1 Y_1, \ldots, Z_n = X_n Y_n$ will obviously be exchangeable also, with mean parameters

$$\lambda_k = E(Z_1 \cdots Z_k) = E(X_1 \cdots X_k)\, E(Y_1 \cdots Y_k) = \xi_k \eta_k. \tag{4}$$

Two extreme cases are worth mentioning. In the first case, $Y_1, \ldots, Y_n$ are independent and identically distributed. For this case, it is no surprise that the pairwise correlation among $Z_1 = X_1 Y_1, \ldots, Z_n = X_n Y_n$ is less than the pairwise correlation of $X_1, \ldots, X_n$. More specifically, it can be shown that $\mathrm{corr}(X_1 Y_1, X_2 Y_2) = w_I\, \mathrm{corr}(X_1, X_2) < \mathrm{corr}(X_1, X_2)$, where

$$w_I = \frac{\mathrm{var}(X)\, E(Y)^2}{\mathrm{var}(X)\, E(Y)^2 + \mathrm{var}(X)\, \mathrm{var}(Y) + \mathrm{var}(Y)\, E(X)^2}.$$

For the other extreme case, $Y_1 = \cdots = Y_n = Y$ are totally dependent, and we have $\mathrm{corr}(X_1 Y_1, X_2 Y_2) = w_D\, \mathrm{corr}(X_1, X_2) + (1 - w_D) > \mathrm{corr}(X_1, X_2)$, where

$$w_D = \frac{\mathrm{var}(X)\, E(Y)}{\mathrm{var}(X)\, E(Y) + \mathrm{var}(Y)\, E(X)^2}.$$

Our first extension of the folded-logistic model has $\lambda_k$'s given by

$$\lambda_k = \frac{2}{1 + (k+1)^{\alpha\beta}} \left( \frac{1 + 2^{\alpha\beta}}{1 + 2^{\beta}} \right)^{k}, \tag{5}$$

with parameters $\beta > 0$ and $0 \le \alpha \le 1$. This is a special case of (4) with $X_1, \ldots, X_n$ following a folded-logistic model with parameter $\alpha\beta$ and $Y_1, \ldots, Y_n$ independent and identically distributed according to a Bernoulli distribution with parameter $\pi = (1 + 2^{\alpha\beta})/(1 + 2^{\beta})$. It follows that the substitution of (5) into (1) will yield a bona fide distribution with pairwise correlation smaller than or equal to that dictated by the folded-logistic model. Note that (5) is defined in such a way that the marginal response probability $p = \lambda_1$ still follows the folded-logistic form (2). When we fit this extended family to Skellam's data, the result is much improved but still slightly worse than the beta-binomial fit, see Table 1. It should be noted that the pairwise correlation is no longer constrained to a value near 0.2 but is estimated from the data instead.

To achieve pairwise correlation higher than that allowed by the folded-logistic model, we consider (4) with dependent, in fact, identical $Y_i$'s to get

$$\lambda_k = \frac{2}{1 + (k+1)^{\alpha\beta}} \cdot \frac{1 + 2^{\alpha\beta}}{1 + 2^{\beta}}. \tag{6}$$

A three-parameter family that contains (5) and (6) as special cases is given by

$$\lambda_k = \frac{2}{1 + (k+1)^{\alpha\beta}} \left( \frac{1 + 2^{\alpha\beta}}{1 + 2^{\beta}} \right)^{k^{\gamma}}, \tag{7}$$

with $0 \le \gamma \le 1$. Note that (7) reduces to (5) if $\gamma = 1$ and to (6) if $\gamma = 0$.

3. The power family of distributions

3.1. Definition

Instead of using the three-parameter family (7), we propose next a more manageable two-parameter family of distributions that can be parametrised in terms of the marginal response probability and the intra-litter association, thereby allowing dose-response modelling for both. This is achieved by applying the power transformation directly to the foetus response probability $p$ to result in

$$\lambda_k = P(X_1 = \cdots = X_k = 1) = p^{k^{\gamma}}, \tag{8}$$

with $0 \le p \le 1$, $0 \le \gamma \le 1$. Note that $p = \lambda_1 = P(X_1 = 1)$ is indeed the foetus response probability. A proof that $p^{x^{\gamma}}$ is a completely monotone function in $x$ for $0 < p < 1$, $0 < \gamma < 1$ is given in Appendix A. Hence the substitution of (8) into (1) will result in a bona fide probability distribution

$$P(R = r) = \binom{n}{r} \sum_{k=0}^{n-r} (-1)^k \binom{n-r}{k} p^{(r+k)^{\gamma}}. \tag{9}$$

Alternatively, we can use the same type of model for $X' = 1 - X$ (i.e., swap 1's with 0's) with $p' = P(X' = 1) = P(X = 0) = 1 - p = q$ to get

$$\lambda'_k = P(X'_1 = \cdots = X'_k = 1) = P(X_1 = \cdots = X_k = 0) = q^{k^{\gamma}}. \tag{10}$$
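The two constructions (8) to (10) can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's implementation: the function names are hypothetical, and the probability function for the second construction is obtained simply by applying (9) to the number of unaffected foetuses and reversing the result.

```python
from math import comb

def p_power_pmf(p, gamma, n):
    """Equation (9): P(R = r) for r = 0..n with lambda_k = p ** (k ** gamma)."""
    def lam(k):
        # lambda_0 is defined as 1, also when gamma = 0
        return 1.0 if k == 0 else p ** (k ** gamma)
    return [
        comb(n, r) * sum((-1) ** k * comb(n - r, k) * lam(r + k)
                         for k in range(n - r + 1))
        for r in range(n + 1)
    ]

def q_power_pmf(q, gamma, n):
    """Model (10): the power form is placed on the zeros, X' = 1 - X.

    p_power_pmf(q, ...) is the distribution of the number of zeros n - R,
    so reversing it gives the distribution of R.
    """
    return p_power_pmf(q, gamma, n)[::-1]

# Illustrative values only: marginal response probability 0.1, gamma = 0.8
for r, pr in enumerate(q_power_pmf(q=0.9, gamma=0.8, n=10)):
    print(r, round(pr, 4))
```

With `gamma = 1` both functions return the ordinary binomial probabilities, and with `gamma = 0` all mass sits on 0 and n, which gives a quick sanity check on the construction.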

The resulting probability distribution is

$$P(R = r) = P(R' = n - r) = \binom{n}{r} \sum_{k=0}^{r} (-1)^k \binom{r}{k} q^{(n-r+k)^{\gamma}}. \tag{11}$$

This alternative way of modelling by swapping 0's with 1's was also noted by Ekholm et al. (1995). We call (9) and (11) the $p$-power and $q$-power family respectively. A referee has expressed concern over the lack of invariance to code reversal. We do not share the same concern because we feel that quantitative risk assessment in developmental toxicology is an asymmetric problem in itself. For example, while $P(R \ge 1)$ is of interest because it is the probability that at least one littermate is affected, the corresponding probability $P(R' \ge 1) = P(R \le n - 1)$ after code reversal is of no obvious relevance.

The results of fitting the p-power and q-power distributions to Skellam's Brassica data can be found in Table 1. It is clear that the q-power distribution provides a better fit than the p-power family. In fact, the q-power family gives an almost perfect fit and is clearly the best among all the distributions listed in Table 1. We advocate the use of the q-power distribution as it fits real data better than the p-power distribution in all the examples that we have looked at. One way to understand why this is the case is to consider the shape of the probability mass function. From Figure 2, we can see that the q-power probability functions are much closer to bell-shaped than their p-power counterparts when the foetus response probability and intra-litter correlation are small, which is typically the case in toxicological experiments. In contrast, the p-power distribution tends to put too much probability mass at zero and hence will underestimate the probability $P(R \ge 1)$ that at least one littermate is affected.
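The claim about excess mass at zero can be checked numerically. The sketch below, with hypothetical function names and made-up values (response probability 0.1, pairwise correlation 0.1, litter size 10), matches both families on the marginal probability and on the correlation (3) and then compares $P(R = 0)$. The power parameter for each family is found by solving (3) with $\lambda_1 = m$ and $\lambda_2 = m^{2^{\gamma}}$, where $m$ is $p$ or $q$.

```python
from math import comb, log

def gamma_for_corr(m, rho):
    """Power parameter such that lambda_k = m ** (k ** gamma) has correlation rho.

    From (3): m ** (2 ** gamma) = m**2 + rho * m * (1 - m).
    """
    return log(log(m * m + rho * m * (1.0 - m)) / log(m)) / log(2.0)

def prob_zero_p_power(p, gamma, n):
    """P(R = 0) from (9) with r = 0; the k = 0 term is lambda_0 = 1."""
    return 1.0 + sum((-1) ** k * comb(n, k) * p ** (k ** gamma)
                     for k in range(1, n + 1))

def prob_zero_q_power(q, gamma, n):
    """P(R = 0) = P(X_1 = ... = X_n = 0) = q ** (n ** gamma) from (10)."""
    return q ** (n ** gamma)

# Illustrative values, not taken from any data set in the paper
p, rho, n = 0.1, 0.1, 10
zero_p = prob_zero_p_power(p, gamma_for_corr(p, rho), n)
zero_q = prob_zero_q_power(1.0 - p, gamma_for_corr(1.0 - p, rho), n)
print("P(R = 0), p-power:", zero_p)
print("P(R = 0), q-power:", zero_q)
print("P(R >= 1), p-power vs q-power:", 1.0 - zero_p, 1.0 - zero_q)
```

For these inputs the p-power family places more mass at zero than the q-power family even though the first two moments agree, so it reports a smaller probability of an affected litter.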

It follows that the use of the p-power distribution will overestimate the safe dose in a litter-based approach to quantitative risk assessment. The q-power distribution does not have this problem. Furthermore, $P(R \ge 1)$ takes on a very appealing form under the q-power assumption, making it amenable to the theory of generalised linear models, see section 4.1.

3.2. Measures of association

We now interpret the meaning of the parameters. Obviously, $q = \lambda'_1 = P(X_1 = 0)$ and so $p = 1 - q = P(X_1 = 1)$ is the marginal probability that a foetus is affected. Since $P(X_1 = \cdots = X_k = 0) = q^{k^{\gamma}}$, the probability that $X_1 = \cdots = X_k = 0$ is the same as that of a sample of $k^{\gamma}$ independent Bernoulli observations and in this sense we can interpret $k^{\gamma}$ as an effective sample size. It follows that $\gamma = 1$ corresponds to independence and $\gamma = 0$ corresponds to complete dependence. Substituting (10) into (3) yields the intra-litter correlation

$$\rho = \frac{q^{2^{\gamma}} - q^2}{q(1 - q)}. \tag{12}$$

Note that for $0 < q < 1$, $\rho = 0$ if and only if $\gamma = 1$ and so zero intra-litter correlation characterises independence. As it stands, model (11) allows positive intra-litter correlation $\rho$ only. This is not a serious objection as most clustered data are positively correlated. Furthermore, it is well known that $\rho \ge -1/(n-1)$ in general and so there is not much room for negative correlation anyway if the cluster size is large. Nevertheless, there is still some capacity for generating negative correlation from (11). It was proven in Feller (1971, p.225) that if we define $P(R = r)$ by (1), then $\sum_{r=0}^{n} P(R = r) = 1$, provided that $\lambda_0 = 1$. Thus (11)

defines a bona fide distribution even for $\gamma > 1$ as long as the resulting $P(R = r) \ge 0$ for $r = 0, \ldots, n$. This leads to negative correlation in view of (12).

In practice, it is more meaningful to parametrise (11) in terms of a measure of association than to use $\gamma$. We can parametrise (11) in terms of the marginal probability $q$ and the intra-litter correlation $\rho$ by inverting (12) to obtain

$$\gamma = \frac{\log\left[ \log\{q^2 + \rho q (1 - q)\} / \log(q) \right]}{\log 2}. \tag{13}$$

An alternative measure of association is the odds ratio between two responses within the same litter. For model (11), the odds ratio is given by

$$\psi = \frac{q^{2^{\gamma}} \left( 1 - 2q + q^{2^{\gamma}} \right)}{\left( q - q^{2^{\gamma}} \right)^2}. \tag{14}$$

To invert (14), we use the well-known formula (Lipsitz et al., 1991) that expresses the joint probability

$$p_{00} = P(X_1 = 0, X_2 = 0) = \frac{1 + 2(\psi - 1)q - \sqrt{\{1 + 2(\psi - 1)q\}^2 - 4 q^2 \psi (\psi - 1)}}{2(\psi - 1)} \tag{15}$$

in terms of the marginal probability $q$ and the odds ratio $\psi$. Since $p_{00} = q^{2^{\gamma}}$ by (10), it follows that

$$\gamma = \frac{\log\left( \log p_{00} / \log q \right)}{\log 2} \tag{16}$$

with $p_{00}$ given by (15), and as a result we are able to parametrise (11) in terms of $q$ and the odds ratio $\psi$. For binary responses, it is now standard practice (Fleiss, 1986; Lipsitz et al., 1991) to use the log odds ratio as a measure of association rather than the intra-litter

correlation. The main reason for this choice is that there is no restriction on the possible values of the log odds ratio. Other desirable properties of the log odds ratio are given in the books by Bishop et al. (1975) and Fleiss (1981, Ch. 5, 6). If the litters are all of size 2 (i.e., paired data), then the parameters $q$ and $\varphi = \log\psi$ are orthogonal. This is because if a litter is of size 2, then the unconditional odds ratio is the same as the conditional odds ratio and the latter is known to be orthogonal to $q$ (Fitzmaurice and Laird, 1993). Even though $q$ and $\varphi = \log\psi$ are no longer orthogonal for general litter size, we still expect inference regarding $q$ to be less affected by assumptions about $\varphi$ than by the specification of the intra-litter correlation structure. We now illustrate this point with the data set of Weil (1970) comprising a treatment and a control group.

3.3. Analysis of Weil's data

Williams (1975) fitted a separate beta-binomial distribution to each group. As noted by Liang and Hanfelt (1994), very different estimates for $\beta = \log\{p_T (1 - p_C) / (p_C (1 - p_T))\}$ are obtained if we assume a common intra-litter correlation for the control and treatment groups. The results of fitting the q-power family of distributions are given in Table 2. When we fit a separate q-power distribution to each group, we obtain $\hat\beta$ = . 23 (.448). As in the beta-binomial case, the results change quite a bit if the intra-litter correlations $\rho_T$ and $\rho_C$ are assumed to be equal. More specifically, we get a smaller estimate of $p_C$ (.246 instead of .258), a bigger estimate of $p_T$ (.2 instead of .02), and hence $\hat\beta$ is of considerably smaller magnitude than before. The same phenomenon was observed by Williams (1988) for the beta-binomial fit. If we assume a common log odds ratio $\varphi$ rather than a common intra-litter correlation, the estimate becomes

$\hat\beta$ = .237 (.47), which is comparable to the estimate obtained earlier when no constraints are imposed. This is despite the fact that the likelihood ratio test rejects the hypothesis of equal odds ratio at the 5% level (test statistic of 4.00 on 1 degree of freedom). Thus for this data set, the maximum likelihood estimate of $\beta$ seems to be quite robust to incorrect specification of the odds ratio.

3.4. Dose-response modelling

The power family of distributions (11) can be used in conjunction with any dose-response relationship. For modelling the marginal probabilities, we can use a generalised linear (in dose or log-dose) formulation with logit, probit or the complementary log-log link (Ryan, 1992), a Weibull function (Chen and Kodell, 1989) or a folded-logistic function (George and Bowman, 1995). The complementary log-log link

$$\log\{-\log(1 - p)\} = \log(-\log q) = \beta_0 + \beta_1 d \tag{17}$$

deserves special mention because this link function is preserved in the induced model for the probability of at least one foetal response. We elaborate on this point in section 4.1.

3.5. Score function and information matrix

Since the score function and information matrix are obtained by summing over litters, it suffices to consider the contribution of a single litter, say of size $n$ with $r$ affected foetuses. The log-likelihood based on this litter alone is $\log\{P(R = r)\}$, where $P(R = r)$ is given by (11). Let $\nabla$ denote the differentiation operator with respect to the parameters of the model under consideration. We have

$$\nabla \log\{P(R = r)\} = \frac{\nabla P(R = r)}{P(R = r)} = \frac{\binom{n}{r} \sum_{k=0}^{r} (-1)^k \binom{r}{k} \nabla q^{(n-r+k)^{\gamma}}}{P(R = r)} \tag{18}$$

and the problem reduces to differentiating functions of the form $q^{k^{\gamma}}$ with respect to the model parameters. Rather than differentiating manually, which can be extremely tedious in view of (13), (15) and (16), we use symbolic differentiation by calling the Splus function deriv to obtain $\nabla q^{k^{\gamma}}$ and hence also (18). The expected information based on a single litter is

$$E\left[ \nabla \log\{P(R = r)\}\, \nabla^{T} \log\{P(R = r)\} \right] = \sum_{r=0}^{n} \frac{\nabla P(R = r)\, \nabla^{T} P(R = r)}{P(R = r)}, \tag{19}$$

where $\nabla P(R = r)$ is given by the numerator of (18) and $P(R = r)$ by (11). Summing (18) and (19) over litters, we obtain the score function and the expected information based on the whole sample, and hence the maximum likelihood estimates can be obtained iteratively using the method of scoring. Standard errors for the maximum likelihood estimates can be obtained by inverting the expected information matrix. The suggested procedure can be implemented quite readily in practice. Given a tool for carrying out symbolic differentiation, it does not take too much effort to program the computation of (18) and (19), which are the building blocks of the entire estimation procedure.

4. Probability of at least one affected littermate

4.1. Special role of the complementary log-log link

In a litter-based approach to risk assessment, a relevant quantity is the probability that at least one littermate is adversely affected. Lipsitz et al. (1995) were also interested in estimating such union probabilities for longitudinal data. Their method is based on the Bahadur representation of the joint distribution and requires the estimation of correlations of all orders by the method of moments. To do this accurately obviously requires a large

number of litters. For the $q$-power family, the probability of at least one affected littermate takes on a particularly simple and appealing form,

$$P(R \ge 1) = 1 - P(R = 0) = 1 - P(X_1 = \cdots = X_n = 0) = 1 - q^{n^{\gamma}}, \tag{20}$$

which should be compared with the usual formula $1 - q^{n}$ valid under independence. In contrast, the corresponding probability under the $p$-power family (9) is

$$P(R \ge 1) = 1 - P(R = 0) = 1 - \sum_{k=0}^{n} (-1)^k \binom{n}{k} p^{k^{\gamma}},$$

which is considerably more complicated. It follows from (20) that

$$\log\left[ -\log\{1 - P(R \ge 1)\} \right] = \gamma \log n + \log(-\log q) = \gamma \log n + \beta_0 + \beta_1 d \tag{21}$$

if $p$ satisfies (17). Thus if $p$ follows a generalised linear model under a complementary log-log link function, then the same holds true for the probability $P(R \ge 1)$ but with an extra term $\gamma \log n$ appearing in (21). A simpler, though less efficient, way to estimate the model parameters is to dichotomise the data to 0 or 1 depending on whether $R = 0$ or $R \ge 1$. According to equation (21), the dichotomised data follow a binary regression model with complementary log-log link and hence the parameters can be estimated using any statistical software that does binary regression.

4.2. The E2 data set

Consider now the E2 data set (Brooks et al., 1997) on the numbers of dead foetuses in litters of mice from untreated experimental animals. When a q-power distribution is fitted to this data set, the estimated parameters are $\hat\gamma$ = .8322, $\hat q = 1 - \hat p$ = .8806, or

$\hat\beta_0 = \log(-\log \hat q)$ in the complementary log-log scale. We shall show below that the $q$-power distribution leads to a better estimate of the probability that a litter has at least one dead foetus. Under our model, the probability that a litter of size $n$ is affected is given by (20). The expected number of affected litters is thus

$$\sum_{n} m_n \left( 1 - q^{n^{\gamma}} \right),$$

where $m_n$ is the number of litters of size $n$ in the E2 data set and the sum runs over the litter sizes present in the data. Substituting $\hat q$ = .8806, $\hat\gamma$ = .8322 into the above formula, we obtain an expected number that agrees well with the actual number of affected litters in the data set, which is 35 out of a total of 2 litters. In comparison, the beta-binomial fit predicted 29. affected litters only, which is quite an underestimate.

The observation that the beta-binomial fit underestimates the number of affected litters is a recurrent theme in this paper. This has to do with the fact that the beta-binomial probability function is often U-shaped or reverse J-shaped even though the data may suggest otherwise. In essence, the presence of a positive intra-litter correlation has caused the beta-binomial distribution to drive most of its mass to zero or full count. For the E2 data set, the modal number of dead foetuses is 1 for litter sizes 9, 10, 12, 14, 16 and 17, but the fitted beta-binomial probability function is reverse J-shaped with mode at zero. As a result, the beta-binomial distribution over-estimates the probability that none of the littermates is affected and hence underestimates the number of affected litters.

We can test the goodness of fit of the q-power distribution by comparing the full data estimates with the estimates $\tilde\beta_0$, $\tilde\gamma$ obtained from binary regression of the dichotomised data with complementary log-log link

$$\log\left[ -\log\{P(R = 0)\} \right] = \log(-\log q) + \gamma \log n = \beta_0 + \gamma \log n.$$
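The link-preservation property in (20) and (21) and the expected-count formula above are easy to check numerically. The sketch below uses the values of q and the power parameter quoted in the text for the E2 fit; the litter-size counts are made up for illustration and are NOT the E2 data, and all function names are hypothetical.

```python
from math import log

def cloglog(u):
    """Complementary log-log transform log(-log(1 - u))."""
    return log(-log(1.0 - u))

def prob_affected(q, gamma, n):
    """Equation (20): P(R >= 1) for a litter of size n under the q-power model."""
    return 1.0 - q ** (n ** gamma)

def expected_affected_litters(q, gamma, litter_counts):
    """Sum of m_n * P(R >= 1 | n); litter_counts maps litter size n to m_n."""
    return sum(m * prob_affected(q, gamma, n) for n, m in litter_counts.items())

q_hat, gamma_hat = 0.8806, 0.8322       # values quoted in the text
counts = {8: 5, 10: 12, 12: 9}          # hypothetical m_n, not the E2 data

for n in counts:
    lhs = cloglog(prob_affected(q_hat, gamma_hat, n))
    rhs = gamma_hat * log(n) + log(-log(q_hat))   # right-hand side of (21)
    print(n, lhs, rhs)

print("expected affected litters:",
      expected_affected_litters(q_hat, gamma_hat, counts))
```

The two printed columns coincide for every litter size, which is why the dichotomised data can be fitted by ordinary binary regression with a complementary log-log link and log(litter size) as a covariate.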

Thus $\hat\beta_0 - \tilde\beta_0$ = (.9547), $\hat\gamma - \tilde\gamma$ = (.4346) and the standardised differences $Z_\beta$ = .68, $Z_\gamma$ = . 57 are not sufficiently large to warrant rejection of the q-power model.

5. Analysis of the 2,4,5-T data

5.1. Model fitting and comparison

In a study conducted at the U.S. National Centre for Toxicological Research, pregnant mice from several strains were given daily doses of the herbicide 2,4,5-T from day 6 to day 14 of gestation. For each female mouse, the number of implantation sites, foetal deaths, resorptions and cleft palate malformations were recorded. Further details of this study can be found in Holson et al. (1991). In keeping with most published analyses of the data set, we consider only data obtained from the out-bred strain CD-1 and use a combined endpoint of death, resorption or malformation. For this strain, there were six dose groups corresponding to exposure levels of 0, 30, 45, 60, 75 and 90 mg/kg of 2,4,5-T. A listing of the data can be found in George and Bowman (1995). As noted by Dominici and Parmigiani (2001), this data set is quite hard to model due to the presence of zero inflation, n-inflation, over-dispersion and large kurtosis. Furthermore, the extent of departure from the binomial model varies significantly with dose. We now give a new analysis of the 2,4,5-T data based on the q-power distributions and in doing so uncover a deficiency of existing fits from the point of view of a litter-based approach to risk assessment.

We begin by fitting a separate q-power distribution to the control as well as each of the five dose groups. The log likelihood is considerably larger than that of the beta-binomial fit and the folded-logistic fit. Next, we attempt dose-response modelling to arrive at a more parsimonious description. For reasons mentioned in section 4.1, it seems natural to consider the

complementary log-log link of the response probability. A plot of $\log(-\log q)$ versus dose level is given by the unfilled circles in Figure 3. It can be seen that all the points lie around a straight line except for the point corresponding to the control group. This suggests that a reasonable model is

$$\log(-\log q) = \eta_0 I\{d = 0\} + \beta_0 I\{d > 0\} + \beta_1 d, \tag{22}$$

so that $\log(-\log q) = \eta_0$ for the control group rather than $\beta_0$. As for the measure of intra-litter association, we prefer to use the pairwise odds ratio $\psi$ for reasons mentioned in section 3.2. Figure 3 suggests that the log odds ratio is approximately linear in the dose level, hence

$$\varphi = \log\psi = \alpha_0 + \alpha_1 d. \tag{23}$$

We then fit q-power distributions to the six groups with $q$ and $\varphi$ related to dose according to (22) and (23). The Splus code for doing this is available together with the 2,4,5-T data set. We obtain the parameter estimates $\hat\eta_0$ = (.3), $\hat\beta_0$ = (.82), $\hat\beta_1$ = (.0032), $\hat\alpha_0$ = (.0377) and $\hat\alpha_1$ = (.00254). With 7 parameters fewer than the separate-fit model, the log likelihood is only reduced slightly and the likelihood ratio test is clearly in favour of the reduced model. Table 3 displays the estimates of the response probabilities and intra-litter correlations obtained under assumptions (22) and (23).

The left-hand panel of Table 4 displays the expected number of affected foetuses for each dose group under the various models. It can be seen that, among the existing models, the folded-logistic fit is the best and the beta-binomial fit is the worst. The new model that we propose does better in 5 of the 6 dose groups and is only slightly worse than the folded-logistic model for the 45 mg/kg group.
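To make the chain from dose to distribution concrete, the sketch below maps a dose to $q$ via (22) and to $\psi$ via (23), and then recovers the power parameter through (15) and (16). The coefficient values are hypothetical placeholders chosen only so that the code runs; they are not the estimates reported in the paper, and the function names are likewise assumptions.

```python
from math import exp, log, sqrt

def q_from_dose(d, eta0, beta0, beta1):
    """Model (22): log(-log q) equals eta0 for controls, beta0 + beta1*d otherwise."""
    lin = eta0 if d == 0 else beta0 + beta1 * d
    return exp(-exp(lin))

def psi_from_dose(d, alpha0, alpha1):
    """Model (23): log odds ratio linear in dose."""
    return exp(alpha0 + alpha1 * d)

def p00_from_odds_ratio(q, psi):
    """Equation (15); reduces to q**2 under independence (psi = 1)."""
    if abs(psi - 1.0) < 1e-12:
        return q * q
    a = 1.0 + 2.0 * (psi - 1.0) * q
    return (a - sqrt(a * a - 4.0 * q * q * psi * (psi - 1.0))) / (2.0 * (psi - 1.0))

def gamma_from_odds_ratio(q, psi):
    """Equation (16): power parameter implied by q and the pairwise odds ratio."""
    return log(log(p00_from_odds_ratio(q, psi)) / log(q)) / log(2.0)

# Hypothetical coefficients, for illustration only
coef = dict(eta0=-2.0, beta0=-2.5, beta1=0.03)
assoc = dict(alpha0=0.5, alpha1=0.01)

for d in (0, 30, 45, 60, 75, 90):          # dose levels in mg/kg from the text
    q = q_from_dose(d, **coef)
    psi = psi_from_dose(d, **assoc)
    print(d, round(q, 4), round(psi, 4), round(gamma_from_odds_ratio(q, psi), 4))
```

Each row gives the pair of values needed to evaluate the q-power probabilities (11) for a litter in that dose group.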

Note in particular how it corrects for the severe over-prediction of the number of affected foetuses by all existing models at the 30 mg/kg group.

Faustman et al. (1994) and Geys et al. (1999) pointed out that it is important from a biological perspective to take into account the health of the entire litter. Under the so-called litter-based approach to quantitative risk assessment, a litter is said to be affected if at least one foetus is affected. In the right-hand panel of Table 4, we compare the number of affected litters predicted by the folded-logistic model and our model. We do not include the beta-binomial model since it has already been demonstrated to be the worst fitting model. The GEE approach cannot estimate higher order probabilities and so is also excluded from our study. In contrast, the q-power family of distributions is extremely well suited to estimating the probability of at least one affected foetus, via equation (20). From Table 4, we can see how the folded-logistic model underestimates the number of affected litters for the control group and overestimates it in the dose groups.

A referee expressed the opinion that our improved fit may have more to do with the better modelling of the response probability by (22) than the use of the q-power distribution. To see if this is the case, we consider also the p-power, the beta-binomial and the extended folded-logistic distribution (6) with the same foetus response probability $p$ and intra-litter correlation $\rho$ as that of the q-power fit we obtained earlier. Because the foetus response probabilities are matched, all four distributions perform identically in terms of predicting the number of affected foetuses. When it comes to estimating the number of affected litters, however, the differences in distributional assumptions begin to show because the probability that a litter is affected is a union probability that cannot be determined from the first two moments alone. The most noteworthy point of Table 5

21 is that all the distributions, except the q-power family, substantially underestimate the number of affected litters in the 30, 45 and 60mg/g dose groups. To understand why, we plot in Figure 4 the fitted probability functions for litter size 2 under the various distributional assumptions. It appears that the presence of positive intra-litter has caused all the distributions, with the exception of the q-power model, to inflate the probability mass at zero. Consequently, the probability that at least one foetus is adversely affected is underestimated by all except the q-power distribution Determination of safe dose Let r (d ) be a suitably chosen function that relates the ris of an adverse effect, such as death, resorption or malformation, to the exposure level of a toxic substance. In a foetusbased approach to ris assessment, r (d ) could be chosen as the marginal probability that a foetus is affected. In a litter-based approach, interest is focussed on P ( R ), the probability that at least one foetus is affected. Since P( R ) = P( R d, n) depends on the litter size n in addition to the exposure level d, it is customary to weight P( R d, n) according to the empirical relative frequency f (n) of the litter sizes across all dose groups. Thus r ( d) = n= f ( n) P( R d, n) is a suitable ris function if a litter-based approach is adopted. For the q-power class of models, P( R d, n) taes on the particularly simple form (20), with q related to dose by a dose-response function such as (7). In quantitative ris assessment (Geys et al., 999), one is interested in the excess ris over bacground, r * ( d) = r( d) r(0). 2
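The litter-based risk function and the excess risk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation. Equation (20) is not reproduced in this section, so the sketch assumes that the probability of at least one affected foetus in a litter of size $n$ is $1 - q^{n^\gamma}$ with $0 < \gamma < 1$. The dose-response function, the litter-size frequencies, the value of $\gamma$ and all function names are hypothetical, and $\gamma$ is held fixed across doses for simplicity, whereas the paper lets the association vary with dose. The last function uses bisection to find the dose at which the excess risk equals a target level $\alpha$, which is the benchmark dose calculation discussed in the following paragraph.

```python
import math

def p_litter_affected(q, n, gamma):
    """Assumed form of (20): P(at least one affected | litter size n) = 1 - q**(n**gamma)."""
    return 1.0 - q ** (n ** gamma)

def litter_risk(d, q_of_dose, gamma, size_freq):
    """r(d): probability that a litter is affected, averaged over litter sizes with weights f(n)."""
    q = q_of_dose(d)
    return sum(f * p_litter_affected(q, n, gamma) for n, f in size_freq.items())

def excess_risk(d, q_of_dose, gamma, size_freq):
    """r*(d) = r(d) - r(0), the excess risk over background."""
    return (litter_risk(d, q_of_dose, gamma, size_freq)
            - litter_risk(0.0, q_of_dose, gamma, size_freq))

def dose_at_excess_risk(alpha, q_of_dose, gamma, size_freq, d_max=100.0, tol=1e-8):
    """Bisection for the dose in [0, d_max] where r*(d) = alpha, assuming r* is increasing."""
    if excess_risk(d_max, q_of_dose, gamma, size_freq) < alpha:
        raise ValueError("excess risk does not reach alpha on [0, d_max]")
    lo, hi = 0.0, d_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess_risk(mid, q_of_dose, gamma, size_freq) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs, for illustration only.
def q_of_dose(d):
    # Complementary log-log linear in dose: log(-log q) = -4 + 0.05*d.
    return math.exp(-math.exp(-4.0 + 0.05 * d))

size_freq = {10: 0.2, 12: 0.5, 14: 0.3}  # hypothetical f(n); weights sum to 1
gamma = 0.6                               # hypothetical power parameter in (0, 1)

for d in (0.0, 15.0, 30.0):
    r = litter_risk(d, q_of_dose, gamma, size_freq)
    r_star = excess_risk(d, q_of_dose, gamma, size_freq)
    print(f"dose={d:5.1f}  r(d)={r:.4f}  r*(d)={r_star:.4f}")

print("dose with excess risk 0.05:", dose_at_excess_risk(0.05, q_of_dose, gamma, size_freq))
```

Under the assumed form, a litter of size 1 gives $1 - q$, the marginal response probability, which is a quick sanity check on the expression.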

Crump (1984) defined the benchmark dose, $\mathrm{BMD}_\alpha$, as the dose level that produces an excess risk of $\alpha$. Typical choices of $\alpha$ are .0001, .01, .05 and .1, depending on how large an excess risk is regarded as tolerable. A point estimate $\widehat{\mathrm{BMD}}_\alpha$ of the benchmark dose is obtained by solving the equation $\hat r^*(d) = \alpha$, where $\hat r^*(d)$ is the estimated excess risk function that results from replacing all parameters in $r^*(d)$ by estimates. In the presence of sampling uncertainty, it is more meaningful to construct a 95% lower confidence interval for $\mathrm{BMD}_\alpha$ than to calculate just a point estimate. The conventional lower confidence limit based on asymptotic normality is given by

$$\widehat{\mathrm{BMD}}_\alpha - 1.645\sqrt{\widehat{\mathrm{var}}(\widehat{\mathrm{BMD}}_\alpha)},$$

where $\widehat{\mathrm{var}}(\widehat{\mathrm{BMD}}_\alpha)$ is the estimated variance of $\widehat{\mathrm{BMD}}_\alpha$ obtained using the delta method. A drawback of this approach is that it might yield unstable (Catalano et al., 1994) as well as negative estimates. Kimmel and Gaylor (1988) proposed an alternative way to obtain a lower confidence interval for $\mathrm{BMD}_\alpha$ via test inversion. To be specific, the confidence interval consists of all those dose levels $d$ such that the hypothesis $H: r^*(d) = \alpha$ is not rejected in favour of the one-sided alternative $H_a: r^*(d) < \alpha$ at level .05. A little algebra shows that the resulting 95% lower confidence interval for the benchmark dose $\mathrm{BMD}_\alpha$ consists of all those $d$ such that

$$\hat r_U(d) = \hat r^*(d) + 1.645\sqrt{\widehat{\mathrm{var}}(\hat r^*(d))} \ge \alpha, \qquad (24)$$

and setting $\hat r_U(d) = \alpha$ leads to the so-called lower effective dose $\mathrm{LED}_\alpha$. Since $\hat r_U(d)$ is the 95% upper confidence limit for the excess risk $r^*(d)$, (24) tells us that a 95% lower confidence interval for the benchmark dose $\mathrm{BMD}_\alpha$ can be obtained by taking all those dose levels $d$ such that the 95% upper confidence interval for $r^*(d)$ contains $\alpha$. A graphical illustration of this is given in Kuk (2003), who also takes into account the possibility of failure to implant. Regan and Catalano (1999) consider the risk of malformation and low foetal weight simultaneously.

Table 6 shows the benchmark dose and lower effective dose estimated from the 2,4,5-T data using the folded-logistic as well as our q-power model. For the foetus-based approach, we use $\alpha$ = [...]. A larger value of $\alpha$ = [...] is used under the litter-based approach since the background risk is higher by virtue of the fact that we are considering a union probability. Recall from our earlier discussion that $\log(-\log q)$ appears to be linear in dose in the range [...] mg/kg but the same straight line cannot be used to describe what happens at the control group. Without additional data, one cannot really ascertain the form of the dose-response curve in the range [...]. For the purpose of calculating the benchmark dose, we assume that $\log(-\log q)$ is piecewise linear in $d$ with a changepoint at 30. The sample estimate of this dose-response curve is depicted by the dotted curve in Figure 3. This piecewise linear assumption will lead to conservative estimates of the benchmark dose if $\log(-\log q)$ actually falls below the straight line in the initial dose range 0-30 mg/kg. We observed earlier from Table 4 that the folded-logistic model underestimates the number of affected litters for the control group and overestimates it for the dose groups. This suggests that the folded-logistic model will overestimate the excess risk and hence result in a benchmark dose that is lower than necessary. Table 6 confirms this view, as the benchmark dose and lower effective dose determined by the folded-logistic fit are uniformly smaller than those obtained from the q-power model.

6. Conclusions

In conclusion, we have proposed in this paper a power family of distributions that allows flexible dose-response modelling of the response probability as well as the association structure. Estimating the probability that at least one littermate is affected is particularly easy under the proposed model, and this has applications in a litter-based approach to quantitative risk assessment. The q-power distribution is fairly easy to fit with the aid of symbolic differentiation, and there is no need to use pseudo-likelihood or Monte Carlo approximation. It provides a better fit than existing distributions to several data sets that we have looked at. All these lead us to believe that the q-power class of distributions is suitable for routine use and provides an additional option in the analysis of clustered binary data for statisticians working in toxicology, teratology and other disciplines.

Acknowledgements

The author would like to thank the editor and the referees for their helpful suggestions.

Appendix A: Proof that $F(x) = p^{x^\gamma}$ is a completely monotone function

Let $G(x) = -\log\{F(x)\} = -x^\gamma \log p$. We have

$$G^{(k)}(x) = -\gamma(\gamma - 1)\cdots(\gamma - k + 1)\, x^{\gamma - k} \log p.$$

Since $0 < p, \gamma < 1$, it is clear that $(-1)^{k-1} G^{(k)}(x) \ge 0$ for $x > 0$. Now $F(x) = e^{-G(x)}$ and so $F^{(1)}(x) = -F(x)\, G^{(1)}(x)$ is negative as required. It can be proven by induction that $F^{(k)}(x)$ is of the form

$$F^{(k)}(x) = -\sum_{r=0}^{k-1} a_r F^{(r)}(x)\, G^{(k-r)}(x),$$

where $a_r$ are some positive constants and $F^{(0)}(x) = F(x)$ by convention. Assuming $(-1)^r F^{(r)}(x) \ge 0$ is true for $r = 0, \ldots, k-1$, it follows that

$$(-1)^k F^{(k)}(x) = \sum_{r=0}^{k-1} a_r\, (-1)^r F^{(r)}(x)\, (-1)^{k-r-1} G^{(k-r)}(x) \ge 0$$

and hence $F(x)$ is completely monotone by mathematical induction.

References

Altham, P. M. E. (1978) Two generalizations of the binomial distribution. Appl. Statist., 27.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975) Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass: MIT Press.
Bowman, D. and George, E. O. (1995) A saturated model for analyzing exchangeable binary data: Applications to clinical and developmental toxicity study. J. Am. Statist. Assoc., 90.
Bowman, D., Chen, J. J. and George, E. O. (1995) Estimating variance functions in developmental toxicity studies. Biometrics, 51.
Brooks, S. P., Morgan, B. J. T., Ridout, M. S. and Pack, S. E. (1997) Finite mixture models for proportions. Biometrics, 53.
Catalano, P. J., Ryan, L. M. and Scharfstein, D. (1994) Modelling fetal death and malformation in developmental toxicity. Risk Analysis, 14.
Chen, J. J. and Kodell, R. L. (1989) Quantitative risk assessment for teratological effects. J. Am. Statist. Assoc., 84.
Connolly, M. A. and Liang, K. Y. (1988) Conditional logistic regression models for correlated binary data. Biometrika, 75.
Crump, K. S. (1984) A new method for determining allowable daily intakes. Fundamental and Applied Toxicology, 4.
Dominici, F. and Parmigiani, G. (2001) Bayesian semiparametric analysis of developmental toxicology data. Biometrics, 57.

Ekholm, A., Smith, P. W. F. and McDonald, J. W. (1995) Marginal regression analysis of a multivariate binary response. Biometrika, 82.
Faustman, E. M., Allen, B. C., Kavlock, R. J. and Kimmel, C. A. (1994) Dose response assessment for developmental toxicity. I. Characterization of database and determination of no observed adverse effect levels. Fundamental and Applied Toxicology, 23.
Feller, W. (1971) An Introduction to Probability Theory and Its Applications, Volume II, 2nd ed. New York: Wiley.
Fitzmaurice, G. M. and Laird, N. M. (1993) A likelihood-based method for analysing longitudinal binary responses. Biometrika, 80, 141-151.
Fleiss, J. L. (1981) Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.
George, E. O. and Bowman, D. (1995) A full likelihood procedure for analysing exchangeable binary data. Biometrics, 51.
George, E. O. and Kodell, R. L. (1996) Tests of independence, treatment heterogeneity, and dose-related trend with exchangeable binary data. J. Am. Statist. Assoc., 91.
Geys, H., Molenberghs, G. and Ryan, L. (1999) Pseudolikelihood modeling of multivariate outcomes in developmental toxicology. J. Am. Statist. Assoc., 94.
Haseman, J. K. and Kupper, L. L. (1979) Analysis of dichotomous response data from certain toxicological experiments. Biometrics, 35.

Holston, J. F., Gaines, T. B., Nelson, C. J., LaBorde, J. B., Gaylor, D. W., Sheehan, D. M. and Young, J. F. (1991) Developmental toxicity of 2,4,5-trichlorophenoxyacetic acid I: Multireplicated dose response studies in four inbred strains and one outbred stock of mice. Fundamental and Applied Toxicology, 19.
Kimmel, C. A. and Gaylor, D. W. (1988) Issues in qualitative and quantitative risk analysis for developmental toxicology. Risk Analysis, 8.
Kuk, A. Y. C. (2003) A generalised estimating equation approach to modelling foetal response in developmental toxicity studies when number of implants is dose-dependent. Applied Statistics, 52, 51-61.
Kupper, L. L. and Haseman, J. K. (1978) The use of a correlated binomial model for the analysis of certain toxicological experiments. Biometrics, 35.
Kupper, L. L., Portier, C., Hogan, M. D. and Yamamoto, E. (1986) The impact of litter effects on dose-response modeling in teratology. Biometrics, 42.
Liang, K. Y. and Hanfelt, J. (1994) On the use of quasi-likelihood method in teratological experiments. Biometrics, 50.
Lipsitz, S. R., Laird, N. M. and Harrington, D. P. (1991) Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika, 78.
Lipsitz, S. R., Fitzmaurice, G. M., Sleeper, L. and Zhao, L. P. (1995) Estimation methods for the joint distribution of repeated binary observations. Biometrics, 51.
Molenberghs, G., Declerck, L. and Aerts, M. (1998) Misspecifying the likelihood for clustered binary data. Comput. Statist. Data Anal., 26.

Moore, D. F. (1986) Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika, 73.
Ochi, Y. and Prentice, R. L. (1984) Likelihood inference in a correlated probit regression model. Biometrika, 71.
Regan, M. M. and Catalano, P. J. (1999) Likelihood models for clustered binary and continuous outcomes: applications to developmental toxicology. Biometrics, 55.
Rosner, B. (1984) Multivariate methods in ophthalmology with applications to other paired-data situations. Biometrics, 40.
Ryan, L. (1992) Quantitative risk assessment for developmental toxicity. Biometrics, 48.
Skellam, J. G. (1948) A probability distribution derived from the binomial distribution by regarding the probability of a success as variable between the sets of trials. J. R. Statist. Soc. B, 10.
Weil, C. S. (1970) Selection of valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food and Cosmetics Toxicology, 8.
Williams, D. A. (1975) The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics, 31.
Williams, D. A. (1982) Extra-binomial variation in logistic-linear models. Appl. Statist., 31.
Williams, D. A. (1988) Estimation bias using beta-binomial distribution in teratology. Biometrics, 44.

Table 1. The fits of various distributions to Skellam's Brassica data
Columns: Number of associations; Pearson χ²; Log-likelihood
Rows: Observed; Binomial; Beta-binomial; Folded logistic; Extended folded logistic; p-power; q-power
[numerical entries missing]
The results for binomial, beta-binomial, additive and multiplicative correlated binomial distributions are reproduced from Altham (1978).

Table 2. Fitting Weil's data by the q-power family of distributions
Columns: $\hat\beta$; SE($\hat\beta$); $\hat p_T$; $\hat p_C$; $\hat\rho_T$; $\hat\rho_C$; $\hat\phi_T$; $\hat\phi_C$; Log-likelihood
Rows: Separate fit; Common ρ; Common φ
[numerical entries missing]

Table 3. Estimated response probabilities and intra-litter correlations for the 2,4,5-T data
Columns: Dose group; p and ρ under each of GEE, Beta-binomial, Folded-logistic and q-power
Rows: Control; five dose groups (mg/kg); Log likelihood
[numerical entries missing]
The results for GEE, beta-binomial and folded-logistic models are reproduced from George and Bowman (1995).

Table 4. Estimated number of affected foetuses and litters for the 2,4,5-T data
Columns: Dose group; Number of affected foetuses (Observed, GEE, Beta-binomial, Folded-logistic, q-power); Number of affected litters (Observed, Folded-logistic, q-power)
Rows: Control; five dose groups (mg/kg)
[numerical entries missing]
The results for GEE, beta-binomial and folded-logistic models are reproduced from George and Bowman (1995).

Table 5. Estimated number of affected litters for the 2,4,5-T data for different distributions with the same foetus response probabilities and intra-litter correlations
Columns: Dose group; p; ρ; Number of affected litters (Observed, Beta-binomial, Extended folded-logistic, p-power, q-power)
Rows: Control; five dose groups (mg/kg)
[numerical entries missing]

Table 6. Determination of benchmark and lower effective dose in mg/kg based on the 2,4,5-T data
Columns: Foetus-based approach (Folded logistic, q-power); Litter-based approach (Folded logistic, q-power)
Entries: BMD and LED
[numerical entries missing]

Fig. 1. The relationship between intra-litter correlation and marginal response probability under the folded-logistic model

[Figure: six panels plotting probability against number of affected foetuses, for p = .1, .2 and .3, each at two values of rho; legend: q-power, p-power]
Fig. 2. A comparison of q-power versus p-power probability functions for litter size 5

[Figure: log(-log(1-p)) and log odds ratio plotted against dose]
Fig. 3. A plot of the estimated foetal response probabilities on the complementary log-log scale (unfilled circles) and log odds ratios (filled circles) versus dose level for the 2,4,5-T data. The circles are the results of fitting a separate q-power distribution to each dose group and the lines are obtained from dose-response models (22) and (23).

[Figure: three panels plotting probability against number of affected foetuses: Dose = 30 mg/kg (p = .374, rho = .577); Dose = 45 mg/kg (p = .2682, rho = .3358); Dose = 60 mg/kg (p = .4829, rho = [...]); legend: q-power, p-power, beta-binomial, extended folded-logistic]
Fig. 4. A comparison of the q-power with other distributions that share the same foetus response probability p and intra-litter correlation ρ


More information

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score Interpret Standard Deviation Outlier Rule Linear Transformations Describe the Distribution OR Compare the Distributions SOCS Using Normalcdf and Invnorm (Calculator Tips) Interpret a z score What is an

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

Pseudo-score confidence intervals for parameters in discrete statistical models

Pseudo-score confidence intervals for parameters in discrete statistical models Biometrika Advance Access published January 14, 2010 Biometrika (2009), pp. 1 8 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asp074 Pseudo-score confidence intervals for parameters

More information

Developmental Toxicity Studies

Developmental Toxicity Studies A Bayesian Nonparametric Modeling Framework for Developmental Toxicity Studies Kassandra Fronczyk and Athanasios Kottas Abstract: We develop a Bayesian nonparametric mixture modeling framework for replicated

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping : Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University. Panel GLMs Department of Political Science and Government Aarhus University May 12, 2015 1 Review of Panel Data 2 Model Types 3 Review and Looking Forward 1 Review of Panel Data 2 Model Types 3 Review

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Generalized, Linear, and Mixed Models

Generalized, Linear, and Mixed Models Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New

More information

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary Patrick Breheny October 13 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction Introduction What s wrong with z-tests? So far we ve (thoroughly!) discussed how to carry out hypothesis

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection Model comparison Patrick Breheny March 28 Patrick Breheny BST 760: Advanced Regression 1/25 Wells in Bangladesh In this lecture and the next, we will consider a data set involving modeling the decisions

More information

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their

More information

Zero-Inflated Models in Statistical Process Control

Zero-Inflated Models in Statistical Process Control Chapter 6 Zero-Inflated Models in Statistical Process Control 6.0 Introduction In statistical process control Poisson distribution and binomial distribution play important role. There are situations wherein

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

BAYESIAN ANALYSIS OF DOSE-RESPONSE CALIBRATION CURVES

BAYESIAN ANALYSIS OF DOSE-RESPONSE CALIBRATION CURVES Libraries Annual Conference on Applied Statistics in Agriculture 2005-17th Annual Conference Proceedings BAYESIAN ANALYSIS OF DOSE-RESPONSE CALIBRATION CURVES William J. Price Bahman Shafii Follow this

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

Session 3 The proportional odds model and the Mann-Whitney test

Session 3 The proportional odds model and the Mann-Whitney test Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT Rachid el Halimi and Jordi Ocaña Departament d Estadística

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression. Ryan Godwin. ECON University of Manitoba Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

More information

GMM Estimation of a Maximum Entropy Distribution with Interval Data

GMM Estimation of a Maximum Entropy Distribution with Interval Data GMM Estimation of a Maximum Entropy Distribution with Interval Data Ximing Wu and Jeffrey M. Perloff January, 2005 Abstract We develop a GMM estimator for the distribution of a variable where summary statistics

More information

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. STAT 302 Introduction to Probability Learning Outcomes Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. Chapter 1: Combinatorial Analysis Demonstrate the ability to solve combinatorial

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, ) Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, 302-308) Consider data in which multiple outcomes are collected for

More information

Chapter 8 Heteroskedasticity

Chapter 8 Heteroskedasticity Chapter 8 Walter R. Paczkowski Rutgers University Page 1 Chapter Contents 8.1 The Nature of 8. Detecting 8.3 -Consistent Standard Errors 8.4 Generalized Least Squares: Known Form of Variance 8.5 Generalized

More information

A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All?

A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All? A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All? Authors: Sudhir Paul Department of Mathematics and Statistics, University of Windsor, Ontario, Canada (smjp@uwindsor.ca)

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information