Multivariate clustered data analysis in developmental toxicity studies


Statistica Neerlandica (2001) Vol. 55, nr. 3, pp. 319–345

G. Molenberghs and H. Geys
Biostatistics, Center for Statistics, Limburgs Universitair Centrum, Universitaire Campus, B3590 Diepenbeek, Belgium

In this paper we review statistical methods for analyzing developmental toxicity data. Such data raise a number of challenges. Models that try to accommodate the complex data generating mechanism of a developmental toxicity study should take into account the litter effect and the number of viable fetuses, malformation indicators, weight and clustering, as a function of exposure. Further, the size of the litter may be related to outcomes among live fetuses. Scientific interest may be in inference about the dose effect, on implications of model misspecification, on assessment of model fit, and on the calculation of derived quantities such as safe limits, etc. We describe the relative merits of conditional, marginal and random-effects models for multivariate clustered binary data and present joint models for both continuous and discrete data.

Key Words and Phrases: dose-response models, generalized estimating equations, conditional model, marginal model, maximum likelihood, pseudo-likelihood.

1 Introduction

Lately, society has been increasingly concerned about problems related to fertility and pregnancy, birth defects, and developmental abnormalities. Consequently, regulatory agencies such as the U.S. Environmental Protection Agency (EPA) and the Food and Drug Administration (FDA) have given increased priority to protection against drugs, harmful chemical compounds and other environmental hazards. As epidemiological evidence of adverse effects on fetal development may not be available for specific chemical compounds present in the environment, laboratory experiments in small mammalian species provide an alternative source of evidence essential for identifying potential developmental toxicants.
For ethical reasons, animal studies afford a greater level of control than epidemiological studies. Moreover, they can be conducted in advance of human exposure. Laboratory studies have been used for several decades, but methods for extrapolating the results to humans are still being developed and refined. Since these laboratory studies involve considerable amounts of time and money, as well as huge numbers of animals, it is essential that the most appropriate and efficient statistical models are used (WILLIAMS and RYAN, 1996).

geert.molenberghs@luc.ac.be, helena.geys@luc.ac.be. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.

Three standard procedures (Segments I, II and III) have been established to assess specific types of effects. Segment I or fertility studies are designed to assess male and female fertility and general reproductive ability. Such studies are typically conducted in one species of animals and involve exposing males for 60 days and females for 14 days prior to mating. Segment II studies are also referred to as teratology studies, since historically the primary goal was to study malformations. Segment III tests are focused on effects later in gestation and involve exposing pregnant animals from the 15th day of gestation through lactation. In this paper, we concentrate on Segment II studies.

A Segment II experiment involves exposing timed-pregnant animals (rats, mice and occasionally rabbits, for which the time of fertilization can be calculated) during major organogenesis (days 6 to 15 for mice and rats) and structural development. Administration of the exposure is generally by the clinical or environmental routes most relevant for human exposure. The design consists of a control group and 3 or 4 dose groups, each with 20 to 30 pregnant dams. The dams are sacrificed just prior to normal delivery, at which time the uterus is removed and thoroughly examined.

An interesting aspect of Segment II designs is the hierarchical structure of the developmental outcomes. Figure 1 illustrates the data structure. An implant may be resorbed at different stages during gestation (very early death that is detectable at the time of maternal sacrifice as a small mark on the uterine wall). If the implant survives being resorbed, the developing fetus is at risk of fetal death. Adding the number of resorptions and fetal deaths yields the number of non-viable fetuses. If the fetus survives the entire gestation period, growth reduction such as low birth weight may occur.
The fetus may also exhibit one or more types of malformation. These are commonly classified into three broad categories: (i) external malformations are those visible by naked eye, for instance missing limbs; (ii) skeletal malformations might include missing or malformed bones; and (iii) visceral malformations affect internal organs such as the heart, the brain, the lungs etc. Each specific malformation is typically recorded as a dichotomous variable (present or absent). Adding the number of resorptions, the number of fetal deaths and the number of viable fetuses yields the total number of implantations. Since exposure to the test agent takes place after implantation, the number of implants, a random variable, is not expected to be dose-related. While such a hierarchical approach naturally deals with a structure which could also be considered to be incomplete, classical missing data methods are very uncommon in this field.

Fig. 1. Data structure of developmental toxicity studies.

The analysis of developmental toxicity data as described above raises a number of challenges (MOLENBERGHS et al., 1998; ZHU and FUNG, 1996), summarized below.

Because of genetic similarity and the same treatment conditions, offspring of the same mother behave more alike than those of different mothers. This has been termed litter effect. As a result, responses on different fetuses within a cluster are likely to be correlated, inducing extra variation in the data relative to that associated with the common binomial or multinomial distribution. This extra variation must be taken into account in statistical analyses (CHEN and KODELL, 1989; KUPPER et al., 1986).

Since deleterious events can occur at several points in development, an interesting aspect lies in the staging or hierarchy of possible adverse fetal outcomes (WILLIAMS and RYAN, 1996). Ultimately, a model should take into account this hierarchical structure in the data: (i) a toxic insult early in gestation may result in a resorbed fetus; (ii) thereafter an implant is at risk of fetal death; (iii) fetuses that survived the entire gestation period are threatened by low birth weight and/or several types of malformation.

While some attempts have been made for the joint analysis of prenatal death and malformation (CHEN et al., 1991; RYAN, 1992), the analysis of developmental toxicity data has usually been conducted on the number of viable fetuses alone. An appropriate statistical model should then account for possible correlations among the different fetal endpoints.

As the number of viable fetuses can sometimes affect the chance of an adverse effect (in a large litter a larger number of animals have to compete for the same maternal resources and therefore the probability of malformation may be larger), a model should also be flexible enough to allow litter size to affect response probabilities.

Finally, one may have to deal with outcomes of a mixed continuous (low birth weight)/discrete (malformation indicator) nature.

A unique type of developmental toxicity study was originally developed by BROWN and FABRO (1981) to assess the impact of heat stress on embryonic development. Subsequent adaptations by KIMMEL et al. (1994) allow the investigation of effects related to both temperature and duration of exposure. These heatshock experiments are described in detail in Section 2.3.

2 Motivating studies

In this section we introduce two developmental toxicity studies conducted by the Research Triangle Institute under contract to the National Toxicology Program (NTP). The studies concerned the effects, in mice and rats respectively, of di(2-ethylhexyl)-phthalate (DEHP) (TYL et al., 1988) and ethylene glycol (EG) (PRICE et al., 1987). Further, we also describe the heatshock studies introduced by BROWN and FABRO (1981).

2.1 DEHP study in mice

The use of phthalic acid esters as plasticizers for numerous plastic devices is widespread. The most commonly used ester is di(2-ethylhexyl)-phthalate (DEHP), which may constitute as much as 40% by weight of the finished products, in order to provide them with a desirable flexibility and clarity. It has been well documented that small quantities of phthalic acid esters may leak out of plastic containers in the presence of food, milk, blood, or various solvents. Owing to their ubiquitous distribution and presence in human and animal tissues, the possible toxic effects of the phthalic acid esters have been the subject of considerable concern. In particular, the developmental toxicity study described by TYL et al.
(1988) has attracted much interest in the toxicity of DEHP. The doses selected for the study were 0, 0.025, 0.05, 0.10 and 0.15% DEHP with 25, 26, 26, 17 and 9 timed-pregnant mice assigned to each of these dose groups, respectively. Females were observed daily during treatment, but no maternal deaths or distinctive clinical signs were observed. The dams were sacrificed slightly prior to normal delivery and the status of uterine implantation sites recorded. A total of 1082 live fetuses were dissected from the uterus, anaesthetised, and examined for external, visceral and skeletal malformations.

Table 1 shows, for each dose group, the number of pregnant dams, the mean litter size for live fetuses and the rate of malformation (number of live fetuses affected × 100 / number of live fetuses) for three different classes: external malformations, visceral malformations and skeletal malformations. The table suggests clear dose-related trends in the malformation rates. The average litter size (number of viable animals) decreases with increased levels of exposure to DEHP, a finding that is attributable to the dose-related increase in fetal deaths.

Table 1. Summary data from a DEHP experiment in mice. Columns: Dose; Dams; Litter size (mean); Malformations (%): Ext., Visc., Skel.

2.2 EG study in rats

PRICE et al. (1985) describe a developmental toxicity experiment investigating the effect of EG in rats. The doses selected for the present teratology study were 0, 1.25, 2.50 and 5.0 g/kg/day. A total of 1368 live rat fetuses were examined for low birth weight (continuous) or defects (binary). This joint occurrence of continuous and binary outcomes will provide additional challenges in model development.

Table 2 summarizes the malformation and fetal weight data from this experiment. The data show clear dose-related trends for both outcomes. The rate of malformation increases with dose, ranging from 1.3% in the control group to 68.6% in the highest dose group. The mean fetal weight decreases monotonically with increasing dose, ranging from 3.40 g in the control group to 2.48 g in the highest dose group. The fetal weight variances, however, do not change monotonically with dose. In the lower dose groups, the variances remain approximately constant. However, in the highest dose group, the fetal weight variance is elevated. Further, it can be observed that simple Pearson correlation coefficients ($\rho$) between weight and malformation tend to strengthen with increasing doses. As doses increase, the correlation becomes more negative, because the probability of malformation is increasing and fetal weight is decreasing. This is illustrated in Figure 2, which shows the observed malformation rates for all clusters (litters), the averaged malformation rates for each dose group (with 95% confidence limits), the average weight outcomes for all clusters and the average weight outcomes (with 95% confidence limits) for each dose group.

Table 2. Summary data from an EG experiment in rats. Columns: Dose (g/kg/day); Dams; Litter size (mean); Malf. (Nr., %); Weight (Mean, SD); Pearson corr. ($\rho$).

Fig. 2. EG (rats) study. Top panel: observed malformation rates for all clusters (the size of the symbol reflects the multiplicity of the rate) and observed malformation rates for all dose groups (×). Bottom panel: average weights for all clusters and average weights for all dose groups (×).

2.3 Heatshock studies

Heatshock studies have been described by BROWN and FABRO (1981) and KIMMEL et al. (1994). In these studies, embryos are removed from the uterus of the maternal dam and cultured in vitro. Next, each embryo is exposed to a short period of heat stress by placing the culture vial into a warm water bath, involving an increase over body temperature of 4 to 5°C for a duration of 5 to 45 minutes. The embryos are examined 24 hours later for impaired or accelerated development.

This type of developmental test system has several advantages over the standard Segment II design. First of all, the exposure is directly administered to the embryo, so that controversial issues regarding the unknown relationship between the exposure level to the maternal dam and that which is actually received by the embryo need not be taken into account. Secondly, the exposure pattern can be easily controlled, since target temperature levels in the warm water baths can be achieved within 2 minutes. Further, information regarding the effects of exposure is quickly obtained, in contrast to the Segment II study, which requires 8 to 12 days after exposure to assess impact. And finally, this animal test system provides a convenient mechanism for examining the joint effects of both duration of exposure and exposure levels.

The studies collect measurements on 13 morphological variables. We will focus our attention on three of these (olfactory system (OLF), connected with the sense of smell; optic system (OPT), related to vision; and midbrain (MBN), the middle of the three primary divisions of the brain) and assess the effects of both duration and level of exposure on each morphological endpoint, coded as affected (1) versus normal (0).
The study design for the set of experiments conducted by KIMMEL et al. (1994) is shown in Table 3, which indicates the number of embryos cultured in each temperature-duration combination. A total of 375 embryos, arising from 71 initial dams, survived the heat exposure and were used for analysis.

Table 3. Heatshock studies: number of (viable) embryos exposed to each combination of duration and temperature. Rows: Temperature; columns: Duration of Exposure; with row and column totals.

The distribution of cluster sizes, ranging between 2 and 11, is given in Table 4. The mean cluster size is 5. Since only surviving fetuses were included, cluster sizes are smaller than those observed in most other developmental toxicity studies and do not reflect the true original litter size.

3 Accounting for litter effects

Failure to account for the clustering in the data can lead to serious underestimation of the variances of dose effect parameters and, hence, inflated test statistics. The need for methods that appropriately account for the heterogeneity among litters, especially with regard to binary outcomes, has long been recognized. HASEMAN and KUPPER (1979) provide an excellent survey of likelihood generalizations of standard distributions to account for clustering. At a more complicated level, HOUWING-DUISTERMAAT and VAN HOUWELINGEN (1998) incorporate a familial association structure in logistic regression models.

In general, models for multivariate correlated binary data can be grouped into the following different classes: conditionally specified models, marginal models, or cluster-specific models (DIGGLE, LIANG and ZEGER, 1994). The answer to the question 'Which model family is to be preferred?' depends principally on the research question(s) to be answered. In conditionally specified models the probability of a positive response for one member of the cluster is modelled conditionally on other outcomes for the same cluster, while marginal models relate the covariates directly to the marginal probabilities. Cluster-specific (CS) models differ from the two previous models by the inclusion of parameters which are specific to the cluster. What method is used to fit the model depends, to a large extent, on the assumptions the investigator is willing to make. If one is willing to specify fully the joint probabilities, maximum likelihood methods can be adopted.
Yet, if only a partial description in terms of marginal or conditional probabilities is given, one has to rely on non-likelihood methods such as generalized estimating equations or pseudo-likelihood methods.

Table 4. Heatshock studies: distribution of cluster sizes. Rows: cluster size $n_i$; number of clusters of size $n_i$.

3.1 Conditional modelling

Owing to the popularity of marginal (especially generalized estimating equations) and random-effects models for correlated binary data, conditional models have received relatively little attention, especially in the context of multivariate clustered data. DIGGLE, LIANG and ZEGER (1994, pp. 147–148) criticized the conditional approach because the interpretation of the dose effect on the risk of one outcome is conditional on the responses of other outcomes for the same individual, outcomes of other individuals and the litter size. MOLENBERGHS and RYAN (1999), henceforth abbreviated as MR, discuss the advantages of conditional models and how, with appropriate care, the disadvantages can be overcome. They constructed the joint distribution for clustered multivariate binary outcomes, based on a multivariate exponential family model (COX, 1972). Defining $z_{ij}$ as the number of individuals from cluster $i$ positive on outcome $j$, and $z_{ijj'}$ as the number positive on both outcomes $j$ and $j'$, the model is expressed as:

$$
f_{Y_i}(y_i; \Theta_i) = \exp\left\{ \sum_{j=1}^{M} \theta_{ij} z_{ij}^{(1)} + \sum_{j=1}^{M} \delta_{ij} z_{ij}^{(2)} + \sum_{j < j'} \omega_{ijj'} z_{ijj'}^{(3)} + \sum_{j < j'} \gamma_{ijj'} z_{ijj'}^{(4)} - A(\Theta_i) \right\}, \tag{1}
$$

where $z_{ij}^{(1)} = z_{ij}$, $z_{ij}^{(2)} = -z_{ij}(n_i - z_{ij})$, $z_{ijj'}^{(3)} = 2 z_{ijj'} - z_{ij} - z_{ij'}$, and $z_{ijj'}^{(4)} = -z_{ij}(n_i - z_{ij'}) - z_{ij'}(n_i - z_{ij}) - z_{ijj'}^{(3)}$. $A(\Theta_i)$ is the normalizing constant, resulting from summing the previous model over all $2^{M n_i}$ possible outcomes.

The advantage of this model is the flexibility with which both main effects and associations can be modelled, and the absence of constraints on the parameter space, which eases interpretability. Further, the fact that the probability model depends explicitly and implicitly on the cluster size is seen as an advantage since it is in line with the observation that litter size itself depends on the level of exposure.

The flexibility of the MR model partly relies on the exponential family framework. However, the classical advantages of exponential families can be lost, especially in multivariate settings, where the normalizing constant poses excessive computational requirements. Several suggestions have been made to overcome this problem, such as Monte Carlo integration (TANNER, 1991).
For example, GEYER and THOMPSON (1992) use Markov chain Monte Carlo simulations to construct a Monte Carlo approximation to an analytically intractable likelihood. ARNOLD and STRAUSS (1991) and ARNOLD, CASTILLO and SARABIA (1992) propose the use of a so-called pseudo-likelihood (PL). Pseudo-likelihood (or pseudo-maximum-likelihood) methods are alternatives to maximum likelihood estimation that retain the methodology and properties while trying to eliminate some of the difficulties such as strong distributional assumptions or intensive computations. The idea is that a parametric family of models is specified, to which likelihood methodology is applied; the method is denoted 'pseudo', as there is no assumption that this family is the true distribution generating the data.

GEYS, MOLENBERGHS and RYAN (1997, 1999) implemented a pseudo-likelihood method for the MR model that replaces the joint distribution of the responses, a multivariate exponential-family model, by a product of conditional densities that do not necessarily multiply to the joint distribution. In this approach, the normalizing constant cancels, thus greatly simplifying computations, especially when litter sizes are large and variable (since the normalizing constant depends on litter size).

Let $(y_1, \ldots, y_N)$ be a set of $M$-dimensional observation vectors. Define $S$ as the set of all $2^M - 1$ vectors of length $M$, consisting solely of zeros and ones, with each vector having at least one nonzero entry. Denote by $y_k^{(s)}$ the subvector of $y_k$ corresponding to the components of $s \in S$ that are nonzero. The associated joint density is written as $f_s(y_k^{(s)}; \Theta_k)$. Specify a set $\delta = \{\delta_s \mid s \in S\}$ of real numbers, with at least one nonzero component, and define the log pseudo-likelihood as:

$$
pl = \sum_{i=1}^{N} \sum_{s \in S} \delta_s \ln f_s(y_i^{(s)}; \Theta_i), \tag{2}
$$

where some (though not all) of the $\delta_s$ may be negative, but $\exp(pl)$ corresponds to a product of marginal and conditional densities. ARNOLD and STRAUSS (1991) established consistency and asymptotic normality.

For clustered, multivariate binary data a convenient PL function is found by replacing the joint density (1) by the product of $M n_i$ univariate conditional densities describing outcome $j$ for the $k$th individual in a cluster, given all other outcomes in the cluster:

$$
PL^{(1)} = \prod_{i=1}^{N} \prod_{j=1}^{M} \prod_{k=1}^{n_i} f(y_{ijk} \mid y_{ij'k'},\ j' \neq j \text{ or } k' \neq k;\ \Theta_i). \tag{3}
$$

Equation (3) is but one definition of the PL function. GEYS, MOLENBERGHS and RYAN (1997) considered an alternative specification and further showed that only a very small efficiency loss was paid for enormous computational gains of pseudo-likelihood over maximum likelihood. Moreover, they proposed score and likelihood ratio tests for the pseudo-likelihood framework. They are easy to calculate, exhibit a very satisfactory behaviour and provide the necessary tools for model selection. AERTS and CLAESKENS (1997) showed how bootstrap approximations can be used as interesting alternatives to the classical asymptotic chi-squared distributions of these test statistics.
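The contrast between full likelihood and pseudo-likelihood can be sketched for the simplest case of a single outcome ($M = 1$) with exchangeable fetuses, where the joint density of a litter's response vector reduces to $\exp\{\theta z - \delta z(n - z) - A\}$. The following Python sketch is illustrative only and is not the authors' implementation: the function names and the numerical values are assumptions. It evaluates the normalizing constant by direct summation and the pseudo-likelihood as a sum of full conditional log-densities, in which the normalizing constant does not appear.

```python
import math


def log_norm_const(theta, delta, n):
    """log A: log of the sum over z of C(n, z) * exp(theta*z - delta*z*(n - z))."""
    terms = [
        math.log(math.comb(n, z)) + theta * z - delta * z * (n - z)
        for z in range(n + 1)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def loglik_litter(theta, delta, n, z):
    """Full log-likelihood of one litter's 0/1 response vector with z successes."""
    return theta * z - delta * z * (n - z) - log_norm_const(theta, delta, n)


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logpl_litter(theta, delta, n, z):
    """Log pseudo-likelihood: sum of the n univariate full-conditional log-densities."""
    total = 0.0
    if z > 0:  # each of the z successes, given the other n - 1 outcomes
        total += z * math.log(expit(theta - delta * (n - 2 * z + 1)))
    if z < n:  # each of the n - z failures, given the other n - 1 outcomes
        total += (n - z) * math.log(1.0 - expit(theta - delta * (n - 2 * z - 1)))
    return total


# Hypothetical litter: 12 fetuses, 3 malformed; parameter values are made up.
theta, delta, n, z = -2.0, 0.2, 12, 3
print(loglik_litter(theta, delta, n, z), logpl_litter(theta, delta, n, z))
```

The conditional log-odds used in `logpl_litter` follow from taking the ratio of the joint density at $z$ and $z - 1$ successes; with $\delta = 0$ (independence) the two functions return the same value.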
As an illustration, Table 5 shows the maximum likelihood and pseudo-likelihood estimates of the DEHP study for the univariate conditional model, applied to each of the outcomes (external, visceral and skeletal malformation) as well as to a collapsed outcome, defined to be 1 if any malformation occurred and 0 otherwise. The natural parameters were modelled as follows: $\theta_i = \beta_0 + \beta_d d_i$, where $d_i$ is the dose level applied to the $i$th cluster, and $\delta_i = \beta_a$, i.e. a constant association model. ML and PL parameter estimates agree fairly closely. In all cases, we find a highly significant dose trend and a significant clustering parameter. Since the pseudo-likelihood still reflects the underlying likelihood, it can be useful for dose-response modelling.

Table 5. Maximum likelihood (model-based standard errors; empirically corrected standard errors) and pseudo-likelihood (standard errors) estimates for the conditional model

Method  Par.       External            Visceral            Skeletal            Collapsed
ML      $\beta_0$  … (0.58;0.52)       -2.39 (0.50;0.52)   -2.79 (0.58;0.77)   -2.04 (0.35;0.42)
        $\beta_d$  3.07 (0.65;0.62)    2.45 (0.55;0.60)    2.91 (0.63;0.82)    2.98 (0.51;0.66)
        $\beta_a$  0.18 (0.04;0.04)    0.18 (0.04;0.04)    0.17 (0.04;0.05)    0.16 (0.03;0.03)
PL      $\beta_0$  … (0.53)            -2.30 (0.50)        -2.41 (0.73)        -1.80 (0.35)
        $\beta_d$  3.24 (0.60)         2.55 (0.53)         2.52 (0.81)         2.95 (0.56)
        $\beta_a$  0.18 (0.04)         0.20 (0.04)         0.21 (0.05)         0.20 (0.03)

3.2 Marginal modelling

In marginal models, the parameters characterize the marginal probabilities of a subset of the outcomes, without conditioning on the other outcomes. BAHADUR (1961) proposed a marginal model, accounting for the association via marginal correlations. This model has also been studied by COX (1972), KUPPER and HASEMAN (1978) and ALTHAM (1978). Assuming exchangeability, in the sense that each fetus within a litter has the same malformation probability, and in addition setting all the three- and higher-way correlations equal to zero, Bahadur's representation can be simplified to give the following marginal distribution of $Z_i$, the number of malformations in cluster $i$:

$$
f(z_i \mid \pi_i, \rho_i) = \binom{n_i}{z_i} \pi_i^{z_i} (1 - \pi_i)^{n_i - z_i} \times \left[ 1 + \rho_i \left\{ \binom{z_i}{2} \frac{1 - \pi_i}{\pi_i} - z_i (n_i - z_i) + \binom{n_i - z_i}{2} \frac{\pi_i}{1 - \pi_i} \right\} \right],
$$

where $\pi_i$ denotes the malformation probability in the $i$th cluster, $\rho_i$ the pairwise correlation between any two malformation outcomes and $n_i$ the litter size. A drawback of this approach is the fact that the correlation $\rho_i$ is highly constrained when the higher order correlations have been removed. Even when higher order parameters are included, the parameter space of marginal parameters and correlations often has a very peculiar shape. BAHADUR (1961) discusses restrictions on the parameter space in the case of a second order approximation.
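The constraint on the correlation can be made concrete numerically. The sketch below, a minimal illustration with hypothetical function names and made-up parameter values, evaluates the second order exchangeable Bahadur probabilities and finds the range of $\rho$ for which all of them are non-negative. For a litter of size 12 and $\pi = 0.5$ the admissible range is $[-1/66, 1/6]$.

```python
import math


def pair_term(z, n, pi):
    """Sum over pairs j < k of e_j * e_k, with e_j the standardized residual of y_j."""
    return (
        math.comb(z, 2) * (1 - pi) / pi
        - z * (n - z)
        + math.comb(n - z, 2) * pi / (1 - pi)
    )


def bahadur_pmf(z, n, pi, rho):
    """Second order exchangeable Bahadur 'probability' of z successes out of n."""
    binom = math.comb(n, z) * pi**z * (1 - pi) ** (n - z)
    return binom * (1 + rho * pair_term(z, n, pi))


def rho_bounds(n, pi):
    """Range of rho (for n >= 2) that keeps every probability non-negative."""
    g = [pair_term(z, n, pi) for z in range(n + 1)]
    return -1 / max(g), -1 / min(g)


lower, upper = rho_bounds(12, 0.5)
print(lower, upper)                   # admissible range for n = 12, pi = 0.5
print(bahadur_pmf(6, 12, 0.5, 0.2))   # negative: rho = 0.2 is outside the range
```

The values always sum to one whatever $\rho$ is, so a violation of the bounds shows up only as negative "probabilities", which is why the restriction is easy to overlook when fitting the model.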
From these, it can be deduced that the lower bound approaches zero as the cluster size increases. However, it is important to note that the upper bound for $\rho_i$ is constrained as well. Indeed, even though it is one for clusters of size two, the upper bound is in the range $[1/(n_i - 1),\ 2/(n_i - 1)]$ for larger clusters. Taking a (realistic) cluster of size 12, the upper bound is in the range (0.09; 0.18). KUPPER and HASEMAN (1978) present numerical values for the constraints on $\rho_i$ for choices of $\pi_i$ and $n_i$. Restrictions for a specific version where a third order association parameter is included as well have been studied by PRENTICE (1988). A more general situation is discussed in DECLERCK, AERTS and MOLENBERGHS (1998). They have shown that the range of second order associations is markedly enlarged in a four-way Bahadur model. But fitting higher order Bahadur models is difficult, due to the increasingly complicated nature of the restrictions on the parameter space.

MOLENBERGHS and LESAFFRE (1994) and LANG and AGRESTI (1994) have proposed models which parameterize the association in terms of marginal odds ratios. DALE (1986) defined the bivariate global odds ratio model, based on a bivariate Plackett distribution (PLACKETT, 1965). MOLENBERGHS and LESAFFRE (1994) extended this model to multivariate ordinal outcomes. They generalize the bivariate Plackett distribution in order to establish the multivariate cell probabilities. Their method involves solving polynomials of high degree and computing their derivatives. LANG and AGRESTI (1994) exploit the equivalence between direct modelling and imposing restrictions on the multinomial probabilities, using undetermined Lagrange multipliers. Alternatively, the cell probabilities can be fitted using a Newton iteration scheme, as suggested by GLONEK and MCCULLAGH (1995).

However, even though a variety of flexible models exist, maximum likelihood can be unattractive due to excessive computational requirements, especially when high dimensional vectors of correlated data arise. As a consequence, alternative methods have been in demand. LIANG and ZEGER (1986) proposed so-called generalized estimating equations (GEE1), which require only the correct specification of the univariate marginal distributions provided one is willing to adopt 'working' assumptions about the association structure. They estimate the parameters associated with the expected value of an individual's vector of binary responses and phrase the working assumptions about the association between pairs of outcomes in terms of marginal correlations. PRENTICE (1988) extended their results to allow joint estimation of probabilities and pairwise correlations.
LIPSITZ, LAIRD and HARRINGTON (1991) modified the estimating equations of PRENTICE (1988) to allow modelling of the association through marginal odds ratios rather than marginal correlations. When adopting GEE1 one does not use information on the association structure to estimate the main effect parameters. As a result, it can be shown that GEE1 yields consistent main effect estimators, even when the association structure is misspecified. However, severe misspecification may seriously affect the efficiency of the GEE1 estimators. In addition, GEE1 should be avoided when some scientific interest is placed on the association parameters. A second order extension of these estimating equations (GEE2) that includes the marginal pairwise association as well has been studied by LIANG, ZEGER and QAQISH (1992). They note that GEE2 is nearly fully efficient, though bias may occur in the estimation of the main effect parameters when the association structure is misspecified.

The results of applying the maximum likelihood and GEE2 methods for the Bahadur model to the DEHP data are given in Table 6. They are not directly comparable with the parameters of the conditional model. The intercepts correspond to a low baseline malformation rate. The dose parameters show a significant dose trend in all cases.

Table 6. Parameter estimates (standard errors) for the Bahadur model

Method  Par.       External       Visceral       Skeletal       Collapsed
ML      $\beta_0$  … (0.39)       -4.42 (0.33)   -4.67 (0.39)   -3.83 (0.27)
        $\beta_d$  5.15 (0.56)    4.38 (0.49)    4.68 (0.56)    5.38 (0.47)
        $\beta_a$  0.11 (0.03)    0.11 (0.02)    0.13 (0.03)    0.12 (0.03)
GEE2    $\beta_0$  … (0.37)       -4.49 (0.36)   -5.23 (0.40)   -5.23 (0.40)
        $\beta_d$  5.29 (0.55)    4.52 (0.59)    5.35 (0.60)    5.35 (0.60)
        $\beta_a$  0.15 (0.05)    0.15 (0.06)    0.18 (0.02)    0.18 (0.02)

LE CESSIE and VAN HOUWELINGEN (1994) suggested approximating the true likelihood by means of a pseudo-likelihood function that is easier to evaluate and to maximize. They replace the likelihood contribution $f(y_{i1}, \ldots, y_{in_i})$ by the product of all pairwise contributions $f(y_{ij}, y_{ik})$ $(1 \le j < k \le n_i)$. Grouping the outcomes for subject $i$ into a vector $Y_i$, the contribution of the $i$th cluster to the log pseudo-likelihood is $pl_i = \sum_{j < k} \ln f(y_{ij}, y_{ik})$ if it contains more than one observation, and an ordinary logistic regression contribution otherwise. For binary data, and taking the exchangeability assumption into account, the log pseudo-likelihood contribution $pl_i$ can be formulated as:

$$
pl_i = \binom{z_i}{2} \ln \pi_{i11} + \binom{n_i - z_i}{2} \ln(1 - 2\pi_{i10} + \pi_{i11}) + z_i (n_i - z_i) \ln(\pi_{i10} - \pi_{i11}), \tag{4}
$$

where $\pi_{i11}$ denotes the bivariate probability of observing two successes and $\pi_{i10}$ is the marginal probability of observing one success. A non-equivalent specification of the pseudo-likelihood contribution for the $i$th cluster is $pl_i^* = pl_i/(n_i - 1)$. The factor $1/(n_i - 1)$ corrects for the fact that each response $Y_{ij}$ occurs $n_i - 1$ times in the $i$th contribution to the PL, and it ensures that the PL reduces to full likelihood under independence.

GEYS, MOLENBERGHS and LIPSITZ (1998) explore the connection between these pseudo-likelihoods and generalized estimating equations for marginally specified odds ratio models. They show under which conditions both PL approaches coincide and study the general differences. The relative merits of both methods in terms of computational ease and relative efficiency are assessed.
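As a rough illustration of equation (4), the following sketch computes the pairwise log pseudo-likelihood contribution of one litter under a marginal odds ratio parameterization. It assumes, for illustration only, that the bivariate success probability $\pi_{i11}$ is obtained from the marginal probability and a common odds ratio $\psi$ through the bivariate Plackett distribution; the function names and arguments are hypothetical.

```python
import math


def plackett_pi11(pi, psi):
    """P(both successes) for two binary outcomes with equal margin pi and odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:
        return pi * pi  # independence
    s = 1.0 + 2.0 * pi * (psi - 1.0)
    disc = s * s - 4.0 * psi * (psi - 1.0) * pi * pi
    return (s - math.sqrt(disc)) / (2.0 * (psi - 1.0))


def pairwise_logpl(z, n, pi, psi, scaled=False):
    """Equation (4) for a litter with z successes out of n; scaled=True gives pl*."""
    if n == 1:  # single observation: ordinary logistic regression contribution
        return z * math.log(pi) + (1 - z) * math.log(1.0 - pi)
    p11 = plackett_pi11(pi, psi)
    pl = (
        math.comb(z, 2) * math.log(p11)
        + math.comb(n - z, 2) * math.log(1.0 - 2.0 * pi + p11)
        + z * (n - z) * math.log(pi - p11)
    )
    return pl / (n - 1) if scaled else pl


# Hypothetical litter: 8 fetuses, 3 malformed, marginal rate 0.25, odds ratio 2.
print(pairwise_logpl(3, 8, 0.25, 2.0), pairwise_logpl(3, 8, 0.25, 2.0, scaled=True))
```

With $\psi = 1$ the scaled version equals $z \ln \pi + (n - z) \ln(1 - \pi)$, the independence log-likelihood of the response vector, which is the property the factor $1/(n_i - 1)$ is meant to secure.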
Table 7 shows the parameter estimates obtained by fitting a marginal odds ratio model to the DEHP data (collapsed outcome only), using pseudo-likelihood as well as GEE methods. The estimates obtained by either the pseudo-likelihood or the generalized estimating equations approach are comparable. Because main interest is focused on the dose effect, $pl^*$ was used rather than $pl$. Dose effects and association parameters are always significant, except for the GEE1 association estimates. The GEE1 standard error for $\beta_a$ is much larger than for its PL and GEE2 counterparts; the GEE2 standard error is the smallest.

Table 7. Pseudo-likelihood, GEE2 and GEE1 estimates (standard errors) for the marginal odds ratio model (collapsed outcome)

| Method | $\beta_0$ | $\beta_d$ | $\beta_a$ |
|---|---|---|---|
| PL | 3.98 (0.30) | 5.57 (0.61) | 1.11 (0.27) |
| GEE2 | (0.25) | 5.06 (0.51) | 0.97 (0.23) |
| GEE1 | (0.31) | 5.79 (0.62) | 0.41 (0.34) |

3.3 Marginalized random-effects model

In random-effects models, the intracluster correlation is assumed to arise from natural heterogeneity in the parameters across litters. SKELLAM (1948) assumes the random success probability $P_i$ in cluster $i$ to follow a beta distribution with mean $\pi_i$ and, given $P_i$, the outcomes within the $i$th cluster follow a binomial distribution. This results in the beta-binomial model with marginal distribution

$$
f(z_i \mid \pi_i, \rho) = \binom{n_i}{z_i} \frac{B\bigl(\pi_i(\rho^{-1} - 1) + z_i,\ (1 - \pi_i)(\rho^{-1} - 1) + (n_i - z_i)\bigr)}{B\bigl(\pi_i(\rho^{-1} - 1),\ (1 - \pi_i)(\rho^{-1} - 1)\bigr)} \tag{5}
$$

where $B(\cdot, \cdot)$ denotes the beta function. Note that the beta-binomial model and the Bahadur model have the same first and second order moments and hence they both feature the intraclass correlation coefficient $\rho$ as a measure of association. Table 8 gives parameter estimates for the beta-binomial model, applied to the DEHP study. Bahadur (ML and GEE2) and beta-binomial parameters have the same interpretation. The intercept $\beta_0$ and dose effect $\beta_d$ parameters have similar numerical values but the situation is slightly different for $\beta_a$. The beta-binomial estimate for $\beta_a$ is typically about double the corresponding Bahadur maximum likelihood estimate. This is due to range restrictions on $\beta_a$ in the Bahadur model.

AERTS, DECLERCK, and MOLENBERGHS (1997) and MOLENBERGHS, DECLERCK, and AERTS (1998) compared the conditional model, the Bahadur model, and the beta-binomial model for parameter estimation, hypothesis testing, and safe dose determination.
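As a small numerical illustration of the beta-binomial marginal distribution given above, the sketch below evaluates it with log-gamma functions from the Python standard library. The function names and the example values of the mean and intraclass correlation are assumptions chosen for illustration only; they are not estimates from the DEHP study.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(z, n, pi, rho):
    """P(Z = z) for a cluster of size n with mean pi and intraclass correlation rho."""
    if not (0 < pi < 1 and 0 < rho < 1):
        raise ValueError("pi and rho must lie strictly between 0 and 1")
    s = 1.0 / rho - 1.0              # the factor (rho^{-1} - 1)
    a, b = pi * s, (1.0 - pi) * s    # parameters of the beta mixing distribution
    log_p = (math.log(math.comb(n, z))
             + log_beta(a + z, b + n - z)
             - log_beta(a, b))
    return math.exp(log_p)

# Hypothetical litter of 10 fetuses, malformation probability 0.3, rho = 0.2.
probs = [beta_binomial_pmf(z, 10, 0.3, 0.2) for z in range(11)]
```

For these inputs the probabilities sum to one, the mean is n·π, and the variance is n·π·(1 − π)·[1 + (n − 1)·ρ], which shows how ρ inflates the binomial variance.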
Both studies concluded that the conditional model is computationally faster and more stable while the beta-binomial model has readily interpretable parameters. In both cases, the likelihood ratio test for no dose effect has satisfactory behaviour. The Bahadur model is hard to use, both from the computational viewpoint and because of parameter space restrictions (DECLERCK, AERTS, and MOLENBERGHS, 1998).

Table 8. Maximum likelihood estimates (standard errors) for the beta-binomial model

| Par. | External | Visceral | Skeletal | Collapsed |
|---|---|---|---|---|
| $\beta_0$ | (0.42) | 4.38 (0.36) | 4.88 (0.44) | 3.83 (0.31) |
| $\beta_d$ | 5.20 (0.59) | 4.42 (0.54) | 4.92 (0.63) | 5.59 (0.56) |
| $\beta_a$ | 0.21 (0.09) | 0.22 (0.09) | 0.27 (0.11) | 0.32 (0.10) |

3.4 Cluster-specific modelling

Population-averaged (PA) models are commonly used in standard teratology studies with only cluster-level covariates. The effects of individual-level exposures can also be estimated with such models, but their interpretation is then based on the overall population. In contrast, in cluster-specific (CS) models, explicit regression adjustments are made for cluster effects and hence all parameters are interpreted as within-cluster effects. Clearly, the choice between the two modelling approaches depends primarily on the scientific question that needs to be answered. If interest lies in overall effects of exposure on response, population-averaged models are most appropriate. If interest lies in within-cluster comparisons, cluster-specific approaches are most appropriate. Population-averaged models do not explicitly control for cluster effects and therefore within-litter differences may be confounded by within-litter variation due to unmeasured genetic and environmental factors (TEN HAVE and HARTZEL, 1995).

Within the class of cluster-specific models, one can study a mixed-effect logistic model as an alternative way of accounting for intra-litter heterogeneity, as well as a conditional likelihood method. In the mixed-effect logistic procedure, cluster effects are accommodated by assuming that they are realizations of a random variable and integrating over their distribution. With conditional likelihood, one conditions on the sufficient statistics for the cluster-specific effects (TEN HAVE, LANDIS and WEAVER, 1995; CONAWAY, 1989). One should however bear in mind that it is not always appropriate to compare the results obtained with both approaches.
NEUHAUS and KALBFLEISCH (1998) show that conditional likelihood methods estimate purely within-cluster covariate effects, whereas mixture model approaches estimate a weighted average of between- and within-cluster covariate effects. Therefore, in practice, mixed effect logistic models and conditional logistic models may estimate different types of effects and are then not comparable. Only in the case where within- and between-cluster covariate effects are the same do both approaches yield identical estimates, with improved efficiency of the mixed effects approach over the conditional logistic approach.

Let us first consider the mixed-effects logistic models, where the intercept terms $b_i$ are allowed to vary from cluster to cluster, according to a normal distribution:

$$
\operatorname{logit} P(Y_{ik} = 1 \mid b_i, x_{ik}) = x_{ik}\beta + b_i. \tag{6}
$$

In this formulation, $x_{ik}$ denotes the $k$th row of the design matrix $X_i$. The regression parameters ($\beta$) in this CS mixed-effects logistic model measure the change in the conditional logit of the probability of response with a unit increase in the corresponding covariates for individuals at the same random-effects level (e.g. within a cluster with only individual-level covariates). The association between littermates is induced by the random intercept. Because cluster sizes for developmental toxicology studies are relatively small, more complex random-effect structures can seldom be addressed from a practical perspective. GEYS, MOLENBERGHS and WILLIAMS (2001a) used a direct maximum likelihood method using numerical integration, such as implemented in, for example, the MIXOR software package (HEDEKER and GIBBONS, 1993).

Table 9 shows the parameter estimates, with p-values, for the mixed-effects logistic model (MIXLOG) and the compound symmetry model (CSYM), the latter belonging to the class of PA models. Notice the observed `shrinkage effect', which is in agreement with findings of NEUHAUS and JEWELL (1993). One exception is formed by the OPT outcome, for which the correlation parameter was estimated negative. For all outcomes there is evidence of a significant effect of the cumulative exposure ($dt$) and a significant effect of duration of exposure at temperatures above normal body temperature ($t$). Furthermore, the parameter estimate for $t$ is negative, indicating that shorter durations of the same cumulative exposure cause more developmental damage than longer ones.

Table 9 also shows the conditional logistic regression (CONDLOG) parameter estimates. All cluster level effects are conditioned out. Therefore we cannot obtain parameter estimates for the intercepts. Clearly, there is a large discrepancy between the MIXLOG and CONDLOG parameter estimates, especially for the OPT and OLF responses. NEUHAUS and KALBFLEISCH (1998) note that a covariate has both a between-cluster component, which may be summarized in terms of $\bar{x}_i$, the cluster mean, and a within-cluster component $x_{ik} - \bar{x}_i$. The CONDLOG approach estimates the pure within-cluster covariate effect of $x_{ik} - \bar{x}_i$. However, the MIXLOG approach estimates the effect of $x_{ik}$. Therefore, the results of CONDLOG and MIXLOG are comparable only under the assumption of common between- and within-cluster covariate effects, in which case the MIXLOG approach is more efficient.
Otherwise, both procedures yield discrepant results and comparison of standard errors or statistical significance is not relevant.

Table 9. Heatshock study: parameter estimates (standard errors; p-values) for the mixed effects logistic (MIXLOG), compound symmetry (CSYM) and conditional logistic (CONDLOG) models

| Outcome | Par. | MIXLOG | CSYM | CONDLOG |
|---|---|---|---|---|
| MBN | $\beta_0$ | (0.23; 0.00) | 1.82 (0.21; 0.00) | |
| | $\beta_t$ | 4.23 (1.52; 0.01) | 3.97 (1.66; 0.02) | 4.64 (2.55; 0.07) |
| | $\beta_{dt}$ | 6.38 (1.60; 0.00) | 5.99 (1.69; 0.00) | 6.84 (2.63; 0.01) |
| OPT | $\beta_0$ | (0.29; 0.00) | 2.47 (0.24; 0.00) | |
| | $\beta_t$ | 3.68 (1.47; 0.04) | 3.73 (1.67; 0.03) | 1.46 (3.04; 0.63) |
| | $\beta_{dt}$ | 5.60 (1.36; 0.00) | 5.65 (1.66; 0.00) | 3.96 (3.01; 0.19) |
| OLF | $\beta_0$ | (0.32; 0.00) | 1.56 (0.22; 0.00) | |
| | $\beta_t$ | 5.70 (1.93; 0.01) | 4.71 (1.74; 0.01) | 3.40 (2.96; 0.25) |
| | $\beta_{dt}$ | 8.06 (1.95; 0.00) | 6.55 (1.77; 0.00) | 6.30 (3.04; 0.04) |
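The between- and within-cluster decomposition noted by NEUHAUS and KALBFLEISCH (1998) is easy to compute before model fitting. The following minimal sketch, with a hypothetical helper `split_between_within` and made-up covariate values, splits each individual-level covariate into its cluster mean and its deviation from that mean; the two components could then enter a mixed-effects logistic model with separate coefficients.

```python
def split_between_within(clusters):
    """Split covariate values into between- and within-cluster components.

    clusters -- list of lists; clusters[i][k] is the covariate x_ik of
                individual k in cluster i (illustrative data layout).
    Returns one (cluster_mean, deviations) pair per cluster, where
    deviations[k] = x_ik - cluster_mean.
    """
    result = []
    for values in clusters:
        cluster_mean = sum(values) / len(values)
        deviations = [x - cluster_mean for x in values]
        result.append((cluster_mean, deviations))
    return result

# Made-up exposure values for two clusters of sizes 3 and 1.
components = split_between_within([[1.0, 2.0, 3.0], [4.0]])
```

When a covariate is constant within clusters, as with cluster-level dosing, all deviations are zero and no within-cluster effect can be estimated, which is why the distinction matters mainly for individual-level exposures.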

For the heatshock studies the assumption of common between- and within-cluster covariate effects was satisfied for the MBN response. That explains the similarity in MIXLOG and CONDLOG parameter estimates for that response and the increased efficiency of MIXLOG as opposed to the CONDLOG method. Where we found strong significant effects for $c$ and $h$ by the MIXLOG and PA approaches, we now observe a reduction in statistical significance of the CONDLOG estimates. This is in agreement with the results of NEUHAUS and LESPERANCE (1996), summarized in Section 3. The cumulative exposure and duration of exposure at `positive increases' of temperature are highly correlated (correlation coefficient = 97%) and moreover the cluster sizes in the heatshock study are relatively small (mean cluster size is 5).

In contrast, for OPT and OLF, the above assumption was not satisfied, explaining the large discrepancy between MIXLOG and CONDLOG estimates. A comparison of standard errors or statistical significance is thus not appropriate here, unless we fit a mixed effects logistic model with separate parameters for the between- and within-cluster covariate component. The within-cluster covariate effect estimates thus obtained for OPT are (s.e. = 2.647) and 1.255 (s.e. = 2.779) for cumulative exposure $c$ and high temperature $h$ respectively. Similarly, we found (s.e. = 3.044) and 2.839 (s.e. = 2.968) for the within-cluster covariate effects of $c$ and $h$ on OLF. Clearly, these estimates are again similar to the CONDLOG estimates, but not more efficient.

4 Risk assessment

Risk assessment can be defined as (ROBERTS and ABERNATHY, 1996) `the use of available information to evaluate and estimate exposure to a substance and its consequent adverse health effects'. The ultimate goal in the risk assessment process is to determine a safe level of exposure.
Traditionally, quantitative risk assessment in developmental toxicology has been based on the NOAEL, or No Observable Adverse Effect Level, which is the dose immediately below that deemed statistically or biologically significant when compared with controls. The NOAEL, however, has been criticized for its poor statistical properties (see for example, WILLIAMS and RYAN, 1996), so that attention has turned to more formal dose-response models.

The standard approach requires the specification of an adverse event, along with $r(d)$ representing the probability that this event occurs at dose level $d$. For developmental toxicity studies where offspring are clustered within litters, there are several ways to define the concept of an adverse effect. First, one can state that an adverse effect has occurred if a particular offspring is abnormal (fetus based). Alternatively, one might conclude that an adverse effect has occurred if at least one offspring from the litter is affected (litter based). Based on this probability, a common measure for the excess risk over background is defined as

$$
r^*(d) = r(d) - r(0) \tag{7}
$$

or as

$$
r^*(d) = \frac{r(d) - r(0)}{1 - r(0)}, \tag{8}
$$

where definition (8) puts greater weight on outcomes with large background risks. The benchmark dose ($\mathrm{BMD}_q$) is then defined as the dose satisfying $r^*(d) = q$, where $q$ corresponds to the pre-specified level of increased response and is typically specified as 0.01, 1, 5 or 10%.

In practice, calculation of the BMD follows several steps. After choosing and fitting an appropriate dose-response model, the excess risk function is solved for the dose, $d$, that yields $r^*(d) = q$. Since the dose-response curve is estimated from data and has inherent variability, the BMD is itself only an estimate of the true dose that would result in this level of excess risk. The final step therefore consists of acknowledging this sampling uncertainty for the model on which the $\mathrm{BMD}_q$ is based, by replacing the $\mathrm{BMD}_q$ by its lower confidence limit (WILLIAMS and RYAN, 1996). Several approaches have been proposed. The conventional approach was to use a Wald based method:

$$
\widehat{\mathrm{BMDL}}_q = \widehat{\mathrm{BMD}}_q - 1.645 \sqrt{\widehat{\operatorname{var}}(\widehat{\mathrm{BMD}}_q)}.
$$

However, it turned out that this approach suffers from severe drawbacks: it may yield negative lower limits, can yield unstable estimates, etc. (CRUMP and HOWE, 1983; KREWSKI and VAN RYZIN, 1981; CATALANO, RYAN and SCHARFSTEIN, 1994). Alternatively, an upper limit for the risk function can be computed, and thus the dose that corresponds to a $q$% increased response above background is determined from this upper limit curve by solving:

$$
\hat{r}^*(d) + 1.645 \sqrt{\widehat{\operatorname{var}}(\hat{r}^*(d))} = q,
$$

where the variance of the estimated increased risk function $\hat{r}^*(d)$ is estimated as:

$$
\widehat{\operatorname{var}}(\hat{r}^*(d)) = \left.\left(\frac{\partial r^*(d)}{\partial \beta}\right)^{T} \widehat{\operatorname{cov}}(\hat{\beta}) \left(\frac{\partial r^*(d)}{\partial \beta}\right)\right|_{\beta = \hat{\beta}}
$$

and where $\widehat{\operatorname{cov}}(\hat{\beta})$ is the estimated covariance matrix of $\hat{\beta}$. The resulting dose level is referred to as the lower effective dose ($\mathrm{LED}_q$) (KIMMEL and GAYLOR, 1988). CRUMP and HOWE (1983) recommend using the asymptotic distribution of the likelihood ratio (if available).
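The point-estimation step of the BMD calculation can be sketched as follows. The example assumes, purely for illustration, a fetus-based logistic dose-response curve with hypothetical parameters `beta0` and `betad` (with `betad > 0`) and a dose scale on which the search range ends at `d_max`; none of these values are estimates from the paper. The excess risk is definition (8) by default, and the equation r*(d) = q is solved by bisection.

```python
import math

def risk(d, beta0, betad):
    """Assumed logistic dose-response curve r(d)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + betad * d)))

def excess_risk(d, beta0, betad, relative=True):
    """Excess risk over background: definition (8) if relative, else (7)."""
    r0 = risk(0.0, beta0, betad)
    rd = risk(d, beta0, betad)
    return (rd - r0) / (1.0 - r0) if relative else rd - r0

def benchmark_dose(q, beta0, betad, relative=True, d_max=1.0, tol=1e-10):
    """Solve excess_risk(d) = q on [0, d_max] by bisection (needs betad > 0)."""
    if excess_risk(d_max, beta0, betad, relative) < q:
        raise ValueError("excess risk q is not reached on [0, d_max]")
    lo, hi = 0.0, d_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess_risk(mid, beta0, betad, relative) < q:
            lo = mid     # risk still too small: move right
        else:
            hi = mid     # risk large enough: move left
    return 0.5 * (lo + hi)

# Illustrative values only: q = 1% with beta0 = -4, betad = 5.
bmd_01 = benchmark_dose(0.01, -4.0, 5.0)
```

Replacing the point estimate by a lower confidence limit would additionally require the estimated covariance matrix of the parameters or the likelihood itself, which this sketch does not attempt.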
According to the likelihood ratio method, an approximate $100(1 - \alpha)$% lower limit for the BMD, denoted by BMD(1), corresponding to an excess risk of $q$ is defined as

$$
\min\{d(\beta) : r^*(d; \beta) = q \text{ over all } \beta \text{ such that } 2(l(\hat{\beta}) - l(\beta)) \le \chi^2_p(1 - \alpha)\},
$$

where $l$ denotes the log-likelihood and $p$ is the number of model parameters. A second approach, denoted BMD(2), is based on the profile likelihood method (MORGAN 1992, Section 2.7.3). First, construct a profile likelihood based confidence interval for the dose effect parameter $\beta_d$. Secondly, transform this interval into an interval for $d$ and check that the transformation is monotonic. AERTS, DECLERCK and MOLENBERGHS (1997) compare the different lower limits for the BMD and show that, in general, BMD(1) yields lower results than BMD(2). Furthermore, they note that for conditionally specified models, the transformation is not monotonic, and hence the BMD(2) should not be applied to such models. In Table 10 BMD(1) and BMD(2) are applied to the DEHP data. In general, VSD(1) yields lower results than VSD(2), and the values obtained with the conditional model are somewhat higher than for both other models.

A variation on this theme, suggested by many authors (CHEN and KODELL, 1989; RYAN, 1992), first determines a lower confidence limit, e.g. corresponding to an excess risk of 1%, and then linearly extrapolates it to a BMD. The main advantage quoted for this procedure is that the determination of a BMD is less model dependent.

5 Goodness-of-fit for likelihood based models with clustered binary data

In order to evaluate how effective models are in describing the outcome variable, we need to assess the quality of their fit. LE CESSIE and VAN HOUWELINGEN (1995) considered a goodness-of-fit test for generalized linear models with canonical link function and known dispersion parameter, based on the score test for extra variation in a random effects model. LIPSITZ, FITZMAURICE and MOLENBERGHS (1996) note that for the special case of a binary response, several methods for assessing the goodness-of-fit of binary logistic regression models have been proposed. All these methods are based on the notion of partitioning the covariate space into groups or regions. TSIATIS (1980) proposed a goodness-of-fit statistic for the logistic regression model for a given partition of the covariate space, but he did not provide a method for partitioning the covariate space into suitable regions.
HOSMER and LEMESHOW (1989) proposed the partition of subjects into groups or regions on the basis of the percentiles of the predicted probabilities from the fitted logistic regression model.

Table 10. Effective doses and lower confidence limits for DEHP study. Entirely model based computation. All quantities shown should be divided by $10^4$

| Model | Statistic | External | Visceral | Skeletal | Collapsed |
|---|---|---|---|---|---|
| Bahadur | ED | | | | |
| | VSD(1) | | | | |
| | VSD(2) | | | | |
| BB | ED | | | | |
| | VSD(1) | | | | |
| | VSD(2) | | | | |
| Cond. | ED | | | | |
| | VSD(1) | | | | |

To construct a goodness-of-fit measure for clustered binary data, we adapted the methods proposed by HOSMER and LEMESHOW (1989) and TSIATIS (1980). Following these authors, groups are constructed according to deciles of the predicted malformation probabilities in each temperature-duration combination. Given this partition, the goodness-of-fit statistic is formulated by defining $G - 1$ group indicators (in our example, $G = 10$):

$$
I^g_{ik} = \begin{cases} 1 & \text{if } \hat{\pi}_{ik} \text{ is in region } g \ (g = 1, \ldots, G - 1), \\ 0 & \text{otherwise,} \end{cases}
$$

where $\hat{\pi}_{ik}$ is the estimated malformation probability of the $k$th individual within the $i$th cluster, calculated from the model that takes into account the clustering between the individuals. For example, in the context of the heatshock studies, the following model could be considered:

$$
\ln\left(\frac{\pi_{ik}}{1 - \pi_{ik}}\right) = \beta_0 + \beta_t t_{ik} + \beta_{dt}\, dt_{ik} + \sum_{g=1}^{G-1} I^g_{ik} \gamma_g.
$$

The association is modelled similarly as in the model for which the goodness-of-fit is assessed. If the mean structure in the original model is correctly specified, then $\gamma_1 = \cdots = \gamma_{G-1} = 0$. MOORE and SPRUILL (1975) note that, even though $I^g_{ik}$ is based on random quantities $\hat{\pi}_{ik}$, the partition can be treated asymptotically as if it were based on the true $\pi_{ik}$. To test the goodness-of-fit of the model, one can use either a likelihood ratio, Wald or score statistic to test $H_0: \gamma_1 = \cdots = \gamma_{G-1} = 0$. For large samples, each of these statistics has approximately a $\chi^2$ distribution with $G - 1$ degrees of freedom, if the model under the null hypothesis is correctly specified. GEYS, MOLENBERGHS and WILLIAMS (2001a) suggest the use of the likelihood ratio statistic, since it is simple to calculate and is fairly powerful. For the large-sample approximation to be adequate, all estimated expected frequencies should typically be greater than 1 and at least 80% should be greater than 5. Otherwise, one can collapse some frequencies, reducing the number of groups $G$ (LIPSITZ, FITZMAURICE and MOLENBERGHS, 1996). HOSMER and LEMESHOW (1989) noted that $G = 6$ should be a minimum, since a test statistic calculated from fewer than six groups will usually have low power.
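The grouping step described above can be sketched as follows. The helper `group_indicators` is hypothetical: it ranks the predicted probabilities, cuts the ranking into G groups of nearly equal size, and returns the G − 1 indicator columns that would be added to the linear predictor. As a simplification, the grouping is applied to one pool of predictions rather than separately within each temperature-duration combination, and ties are split by rank.

```python
def group_indicators(pred, G=10):
    """Quantile groups and G - 1 group indicators from predicted probabilities.

    pred -- list of predicted probabilities, one per individual
    G    -- number of groups (G = 10 gives deciles)
    Returns (group, indicators) where group[i] is in 0..G-1 and
    indicators[i][g] = 1 if individual i falls in group g, for g = 0..G-2;
    the last group is the reference and has all indicators equal to 0.
    """
    n = len(pred)
    order = sorted(range(n), key=lambda i: pred[i])   # indices by increasing prediction
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * G // n                      # equal-sized rank groups
    indicators = [[1 if group[i] == g else 0 for g in range(G - 1)]
                  for i in range(n)]
    return group, indicators

# Made-up predictions for 20 individuals, all distinct.
pred = [(7 * i % 20) / 20 + 0.01 for i in range(20)]
group, indicators = group_indicators(pred, G=10)
```

The indicator columns would then be appended to the design matrix of the clustered model, and the hypothesis that all their coefficients are zero tested with a likelihood ratio, Wald or score statistic on G − 1 degrees of freedom.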
Note that in the goodness-of-fit assessment described above, correlation is essentially treated as a nuisance parameter and interest is focused on the relationship between the covariates and the probability of response. Recent work uncovered deficiencies of the goodness-of-fit tests based on the ones proposed by HOSMER and LEMESHOW (HOSMER, HOSMER, LEMESHOW, LE CESSIE, 1997). Decisions on model fit may depend more on choice of cutpoints than on lack-of-fit and their test statistic may have relatively low power with small sample sizes. Developing improved goodness-of-fit test statistics for likelihood based models for clustered binary data is a topic of further research.

6 Joint modelling of continuous and discrete outcomes

Developmental toxicity studies may seek to determine the effects of dose on fetal weight (continuous) and malformation incidence (binary) simultaneously, as both


More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

On the multivariate probit model for exchangeable binary data. with covariates. 1 Introduction. Catalina Stefanescu 1 and Bruce W. Turnbull 2.

On the multivariate probit model for exchangeable binary data. with covariates. 1 Introduction. Catalina Stefanescu 1 and Bruce W. Turnbull 2. On the multivariate probit model for exchangeable binary data with covariates Catalina Stefanescu 1 and Bruce W. Turnbull 2 1 London Business School, Regent s Park, London NW1 4SA, UK 2 School of Operations

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Dose-response modeling with bivariate binary data under model uncertainty

Dose-response modeling with bivariate binary data under model uncertainty Dose-response modeling with bivariate binary data under model uncertainty Bernhard Klingenberg 1 1 Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267 and Institute of Statistics,

More information

MARGINAL MODELLING OF MULTIVARIATE CATEGORICAL DATA

MARGINAL MODELLING OF MULTIVARIATE CATEGORICAL DATA STATISTICS IN MEDICINE Statist. Med. 18, 2237} 2255 (1999) MARGINAL MODELLING OF MULTIVARIATE CATEGORICAL DATA GEERT MOLENBERGHS* AND EMMANUEL LESAFFRE Biostatistics, Limburgs Universitair Centrum, B3590

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analysis of Longitudinal Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Auckland 8 Session One Outline Examples of longitudinal data Scientific motivation Opportunities

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels STATISTICS IN MEDICINE Statist. Med. 2005; 24:1357 1369 Published online 26 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2009 Prediction of ordinal outcomes when the

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

R E S E A R C H T R I A N G L E P A R K, N O R T H C A R O L I N A

R E S E A R C H T R I A N G L E P A R K, N O R T H C A R O L I N A R E S E A R C H T R I A N G L E P A R K, N O R T H C A R O L I N A Simultaneous Modeling of Multiple Outcomes over Time: A Hierarchical Modeling Approach Abhik Das1, Charlotte Gard1, Henrietta Bada2 and

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Strati cation in Multivariate Modeling

Strati cation in Multivariate Modeling Strati cation in Multivariate Modeling Tihomir Asparouhov Muthen & Muthen Mplus Web Notes: No. 9 Version 2, December 16, 2004 1 The author is thankful to Bengt Muthen for his guidance, to Linda Muthen

More information

Random Effects Models for Longitudinal Data

Random Effects Models for Longitudinal Data Chapter 2 Random Effects Models for Longitudinal Data Geert Verbeke, Geert Molenberghs, and Dimitris Rizopoulos Abstract Mixed models have become very popular for the analysis of longitudinal data, partly

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Obnoxious lateness humor

Obnoxious lateness humor Obnoxious lateness humor 1 Using Bayesian Model Averaging For Addressing Model Uncertainty in Environmental Risk Assessment Louise Ryan and Melissa Whitney Department of Biostatistics Harvard School of

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Exact unconditional tests for a 2 2 matched-pairs design

Exact unconditional tests for a 2 2 matched-pairs design Statistical Methods in Medical Research 2003; 12: 91^108 Exact unconditional tests for a 2 2 matched-pairs design RL Berger Statistics Department, North Carolina State University, Raleigh, NC, USA and

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All?

A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All? A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All? Authors: Sudhir Paul Department of Mathematics and Statistics, University of Windsor, Ontario, Canada (smjp@uwindsor.ca)

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION

GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION STATISTICS IN MEDICINE GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION NICHOLAS J. HORTON*, JUDITH D. BEBCHUK, CHERYL L. JONES, STUART R. LIPSITZ, PAUL J. CATALANO, GWENDOLYN

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Sampling bias in logistic models

Sampling bias in logistic models Sampling bias in logistic models Department of Statistics University of Chicago University of Wisconsin Oct 24, 2007 www.stat.uchicago.edu/~pmcc/reports/bias.pdf Outline Conventional regression models

More information

A measure of partial association for generalized estimating equations

A measure of partial association for generalized estimating equations A measure of partial association for generalized estimating equations Sundar Natarajan, 1 Stuart Lipsitz, 2 Michael Parzen 3 and Stephen Lipshultz 4 1 Department of Medicine, New York University School

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Within-individual dependence in self-controlled case series models for recurrent events

Within-individual dependence in self-controlled case series models for recurrent events Within-individual dependence in self-controlled case series models for recurrent events C. Paddy Farrington and Mounia N. Hocine Department of Mathematics and Statistics, The Open University, Milton Keynes

More information

Lecture 1 Introduction to Multi-level Models

Lecture 1 Introduction to Multi-level Models Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course

More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

The Ef ciency of Simple and Countermatched Nested Case-control Sampling

The Ef ciency of Simple and Countermatched Nested Case-control Sampling Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA Vol 26: 493±509, 1999 The Ef ciency of Simple and Countermatched Nested Case-control

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

More information

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure Ciarleglio and Arendt Trials (2017) 18:83 DOI 10.1186/s13063-017-1791-0 METHODOLOGY Open Access Sample size determination for a binary response in a superiority clinical trial using a hybrid classical

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Likelihood ratio testing for zero variance components in linear mixed models

Likelihood ratio testing for zero variance components in linear mixed models Likelihood ratio testing for zero variance components in linear mixed models Sonja Greven 1,3, Ciprian Crainiceanu 2, Annette Peters 3 and Helmut Küchenhoff 1 1 Department of Statistics, LMU Munich University,

More information

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes Achmad Efendi 1, Geert Molenberghs 2,1, Edmund Njagi 1, Paul Dendale 3 1 I-BioStat, Katholieke Universiteit

More information

When is a copula constant? A test for changing relationships

When is a copula constant? A test for changing relationships When is a copula constant? A test for changing relationships Fabio Busetti and Andrew Harvey Bank of Italy and University of Cambridge November 2007 usetti and Harvey (Bank of Italy and University of Cambridge)

More information

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia

More information