Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci


Abstract. When censored time-to-event data are used to map quantitative trait loci (QTL), the existence of nonsusceptible subjects entails extra challenges. If the heterogeneous susceptibility is ignored or inappropriately handled, we may either fail to detect the responsible genetic factors or find spuriously significant locations. In this article, an interval mapping method based on parametric mixture cure models is proposed, which takes nonsusceptible subjects into consideration. The proposed model can be used to detect the QTL that are responsible for differential susceptibility and/or time-to-event trait distribution. In particular, we propose a likelihood based testing procedure with genome-wide significance levels calculated using a resampling method. The performance of the proposed method and the importance of considering the heterogeneous susceptibility are demonstrated by simulation studies and an application to survival data from an experiment on mice infected with Listeria monocytogenes.

Keywords: EM algorithm; Parametric proportional hazards model; QTL mapping; Time-to-event data.

1. Introduction

Mapping genes that underlie complex traits is of great interest (Lander and Schork, 1994; Glazier et al., 2002). In standard interval mapping of quantitative trait loci (QTL), the trait distribution is often modeled as a mixture of two (or more) normal components corresponding to two (or more) different genotypes at the putative QTL (Lander and Botstein, 1989; Zeng, 1993). Time-to-event data as quantitative trait values (e.g. age-at-onset of cancer, time-to-recurrence of tumor) have been used to identify various disease genes (Claus et al., 1990; Carter et al., 1992; Miki et al., 1994; Boyartchuk et al., 2001; Symons et al., 2002; among others). To study time-to-event traits, however, classical survival models may be more natural than the mixture of normals.
Indeed, the standard interval mapping methods for normally distributed and fully observed quantitative traits have been extended successfully to time-to-event traits subject to random censoring (e.g. Li and Thompson, 1997; Diao et al., 2004; Diao and Lin, 2005, among others).

A further challenge that has not received adequate attention when studying time-to-event traits is the issue of latent heterogeneous susceptibilities. In a population consisting of susceptible and nonsusceptible individuals, all susceptible subjects would eventually experience the event of interest in the absence of censoring, while the nonsusceptible ones can be regarded as cured, i.e. not at risk of developing the particular event. For example, Boyartchuk et al. (2001) considered a data set consisting of the survival times of 116 female intercross mice after infection with Listeria monocytogenes. About 30% of the mice survived beyond 240 hours, the end of the study, and might be considered as cured or nonsusceptible from a biological point of view. In addition to good scientific evidence for the existence of a nonsusceptible subpopulation, data with heterogeneous susceptibility usually have heavy censoring at the end of the study: the corresponding Kaplan-Meier curve has a long non-zero tail, or the histogram shows a spike at the right end (Broman, 2003). When analyzing such genetic data, failure to account for the latent heterogeneous susceptibility might result in significant power loss in detecting the responsible genetic factor and/or lead to spurious significant results (Farewell, 1982; Hodge and Elston, 1994; Hodge et al., 2001). Therefore, statistical methods that incorporate the mixed susceptibilities into the modeling and analysis are needed.

For the aforementioned mouse survival data, Broman (2003) proposed a two-part model: a normal distribution for log survival times of observed failure mice and a point mass at the end of the study for observed survivors. This two-part model may be useful to detect QTL when there is one common administrative censoring time, as is the case for this particular data set.
Under general random censorship, such a two-part separation of subjects may not always be reasonable. Therefore, it is of great practical interest to develop general statistical methods applicable to QTL mapping for randomly censored time-to-event traits from a population of latent heterogeneous susceptibility.

In this paper, we propose a parametric mixture cure model for QTL mapping when the primary trait is a randomly censored time-to-event trait from a population of mixed susceptibility. In the context of survival analysis, cure models have been developed to handle heterogeneous susceptibilities and are applicable under general random censoring (Kuk and Chen, 1992; Sy and Taylor, 2000; Peng and Dear, 2000; Lu and Ying, 2004, among others). Cure models may also be viewed as a special case of the competing risks models with unobserved cured events (Fine, 1999). To adapt these cure models for mapping QTL, we need to overcome a few challenges. In particular, we must account for the missing covariates due to the fact that the genotypes of the putative QTL are unknown. Furthermore, we need to identify appropriate genome-wide critical values for the proposed test statistics at certain nominal levels. The interval mapping tests are usually carried out at multiple locations along the chromosomes and the test statistics are typically not independent. Thus obtaining appropriate genome-wide critical values is crucial in the context of genome-wide QTL mapping.

The paper is organized as follows. In Section 2, we introduce the parametric mixture cure model and propose an EM-based likelihood ratio test (LRT). When the log-normal model is used for event times of susceptible subjects, the proposed cure model generalizes the two-part model of Broman (2003) to allow for latent susceptible status under general random right censoring. When the parametric proportional hazards model is used for the time-to-event trait of susceptible subjects, the proposed cure model extends the parametric proportional hazards (PPH) model of Diao et al. (2004) to deal with heterogeneous susceptibility. Since the proposed cure model characterizes the QTL effects on susceptibility and/or the survival distribution of susceptible subjects, it can be used to test such effects separately or simultaneously. It can also be used to account for potential effects of other risk factors by incorporating covariates into the regression model in a natural way. The issue of genome-wide significance level is also addressed in Section 2. Recently, Diao et al. (2004), Zou et al. (2004), and Lin (2005) introduced an efficient resampling method for assessing the genome-wide significance level. The resampling method is computationally less intensive and applicable to many complex genetic models (Zou et al., 2004). Therefore, we adopt this resampling method to obtain the genome-wide thresholds for the proposed likelihood ratio tests at certain nominal levels. In Section 3, the performance of our proposed methods and the importance of considering the heterogeneous susceptibility are demonstrated by simulation studies and an application to survival times of intercrossed mice following infection with Listeria monocytogenes (Boyartchuk et al., 2001; Broman, 2003). Some concluding remarks are given in Section 4.

2. Methods

In this section, we first propose the general parametric mixture cure model. We then formulate the likelihood based genome-wide tests and discuss the determination of genome-wide thresholds.

2.1. Notation and Models

Consider a sample of $n$ individuals with mixed susceptibility. Let $T_i$ denote the potential time-to-event trait for individual $i$, $i = 1, \ldots, n$. In addition, let $T_i^* < \infty$ denote the failure time of a susceptible subject, and let $\eta_i$ take value 1 or 0, indicating whether the $i$th subject is susceptible or not. Thus $T_i$ has the following decomposition:

$$T_i = \eta_i T_i^* + (1 - \eta_i)\,\infty, \tag{1}$$

where the product of 0 and $\infty$ is defined to be 0. The observation on the trait value of the $i$th individual consists of two components: the observed event time $Y_i = \min(T_i, C_i)$ and the censoring indicator $\delta_i = I(T_i \le C_i)$, where $C_i$ denotes a random censoring time and is assumed to be noninformative (Kalbfleisch and Prentice, 2002). Note that $\delta_i = 1$ implies $\eta_i = 1$, but $\eta_i$ is unobservable when $\delta_i = 0$. Thus, the susceptibility statuses are uncertain for censored subjects.

Suppose we have data on trait values and a set of genetic markers. Let $M_i$ denote the multiple marker genotype information of the $i$th subject. We consider a putative QTL with two alleles $Q$ and $q$ and denote its unknown QTL genotype by $G_i$. Here $G_i$ is coded as a dummy variable for all possible combinations of genotypes. For example, in an F2 intercross design, $G_i$ can be recorded using a two dimensional vector with three possible values, $(1, 0)$, $(0, 1)$ and $(0, 0)$, according to the genotypes $QQ$, $Qq$ and $qq$, respectively. In addition, let $Z_i$ denote other observed covariates of interest, such as environmental exposures, which are assumed to be independent of $G_i$.

For a susceptible subject, i.e. $\eta_i = 1$, its failure time is assumed to follow a parametric distribution,

$$T_i^* \sim f(t \mid Z_i, G_i; \beta, \beta_g, \theta),$$

where the parameter $\beta_g$ depicts the effects of the QTL on the time-to-event trait distribution for susceptible subjects, $\beta$ indicates the corresponding effects of covariates and $\theta$ is the inherent distribution parameter of the parametric distribution $f$. More specifically, a linear regression model for QTL and covariate effects leads to a simple model $f(t \mid Z_i, G_i; \beta' Z_i + \beta_g' G_i, \theta)$. For the binary outcome of the susceptibility indicator $\eta_i$, it is natural to consider a logistic regression model

$$\mathrm{pr}(\eta_i = 1 \mid G_i, Z_i) = \frac{\exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)}, \tag{2}$$

where $\tilde{Z}_i = (1, Z_i')'$ so that $\gamma$ contains an intercept term, and the regression parameter $\gamma_g$ represents the QTL effects on susceptibility.

Note that, at any putative QTL location other than a marker, the genotype information $G_i$ is unknown. A natural idea is to treat $G_i$ as missing data, which can be handled by an EM algorithm. The conditional probability of $G_i$ given the marker information $M_i$ at a specific location $d$ is denoted by $\mathrm{pr}(G_i \mid M_i; d)$. Under the assumption of no crossover interference and no genotyping errors, $\mathrm{pr}(G_i \mid M_i; d)$ is determined by the two flanking markers and the position of the QTL in the interval. For many experimental cross studies, explicit formulas for $\mathrm{pr}(G_i \mid M_i; d)$ are available in several books and papers, for example in Lynch and Walsh (1998, p. 435, equation (15.2)).

The complete data consist of $n$ independent copies of $C = \{Y, \delta, \eta, G, M, Z\}$, while the observed data consist of $n$ independent copies of $O = \{Y, \delta, M, Z\}$. Let $\mu = (\gamma, \gamma_g, \beta, \beta_g, \theta)$. The likelihood function of the complete data is constructed as follows:

$$L_C(\mu; d) = \prod_{i=1}^{n} \left\{ \frac{\exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)} \right\}^{\eta_i} \left\{ \frac{1}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)} \right\}^{1 - \eta_i} \times \prod_{i=1}^{n} \left[ \{f(Y_i \mid Z_i, G_i; \beta, \beta_g, \theta)\}^{\delta_i} \{1 - F(Y_i \mid Z_i, G_i; \beta, \beta_g, \theta)\}^{1 - \delta_i} \right]^{\eta_i} \times \prod_{i=1}^{n} \mathrm{pr}(G_i \mid M_i; d), \tag{3}$$

where $F(t \mid Z_i, G_i; \beta, \beta_g, \theta) = \int_0^t f(s \mid Z_i, G_i; \beta, \beta_g, \theta)\,ds$ denotes the cumulative distribution function of the survival time for susceptible subjects.
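As an illustration of how the conditional genotype probabilities pr(G_i | M_i; d) might be computed for an F2 intercross, the following is a minimal sketch and not the authors' implementation. It assumes the Haldane map function (consistent with no crossover interference), fully observed flanking markers, and the standard three-state F2 transition probabilities; the function names `haldane_r`, `f2_transition` and `qtl_genotype_probs` are hypothetical.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def f2_transition(r):
    """3x3 transition probabilities between F2 genotypes (QQ, Qq, qq) at two
    loci separated by recombination fraction r."""
    s = 1.0 - r
    return [
        [s * s, 2.0 * r * s, r * r],
        [r * s, s * s + r * r, r * s],
        [r * r, 2.0 * r * s, s * s],
    ]


def qtl_genotype_probs(left, right, d_left_cm, d_right_cm):
    """pr(G = j | flanking markers) for j = 0, 1, 2 (QQ, Qq, qq).

    left, right: observed genotypes (0, 1, 2) at the two flanking markers.
    d_left_cm, d_right_cm: distances from the putative QTL to each marker.
    """
    t_left = f2_transition(haldane_r(d_left_cm))
    t_right = f2_transition(haldane_r(d_right_cm))
    weights = [t_left[left][j] * t_right[j][right] for j in range(3)]
    total = sum(weights)
    return [w / total for w in weights]
```

For a putative QTL 4 cM from a left marker with genotype QQ and 6 cM from a right marker with genotype Qq, `qtl_genotype_probs(0, 1, 4.0, 6.0)` returns the three probabilities that enter the likelihood as weights on the candidate genotypes.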
It is straightforward to verify that, using the complete data likelihood (3), the observed data likelihood is a mixture of several components corresponding to different genotypes and susceptibilities:

$$L(\mu; d) = \prod_{i=1}^{n} \left\{ \sum_{j=1}^{K} p_i(j) \left[ \{f(Y_i \mid Z_i, G_j; \beta, \beta_g, \theta)\}^{\delta_i} \{1 - F(Y_i \mid Z_i, G_j; \beta, \beta_g, \theta)\}^{1 - \delta_i} \frac{\exp(\gamma' \tilde{Z}_i + \gamma_g' G_j)}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_j)} + \frac{1 - \delta_i}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_j)} \right] \right\}, \tag{4}$$

where $p_i(j) = \mathrm{pr}(G_i = G_j \mid M_i; d)$, $K$ denotes the number of possible genotypes of the putative QTL, and $\{G_j\}$ denote the coded values of the genotypes. For example, for an F2 intercross population, we may use $\{G_1, G_2, G_3\} = \{(1, 0), (0, 1), (0, 0)\}$.

Standard interval mapping methods (Lander and Botstein, 1989; Zeng, 1993) scan for QTL along the chromosome at a specified step size, e.g. 1 or 2 centimorgans (cM), using a likelihood ratio test (LRT). In all our numerical studies, we evaluate the LRT with a step size of 1 cM. To construct such a profile of the LRT over the regions of the chromosome, the maximum likelihood estimates (MLE) $\hat{\mu}$ under the alternative model and the restricted MLE $\tilde{\mu}$ under the null hypothesis need to be calculated at each given position $d$.

2.2. Hypotheses and LRT

Under the proposed parametric mixture cure model, the QTL has two types of effects on the trait distribution: $\gamma_g$ is the long-term effect on susceptibility and $\beta_g$ is the short-term effect on survival of the susceptible subjects. Therefore, the proposed mixture cure model can be used to test the following hypotheses:

No overall QTL effects, $H_0: \gamma_g = 0$ and $\beta_g = 0$ vs. $H_1: \gamma_g \ne 0$ or $\beta_g \ne 0$;
No QTL effects on susceptibility, $H_{0\gamma}: \gamma_g = 0$ vs. $H_{1\gamma}: \gamma_g \ne 0$;
No QTL effects on the survival of susceptible subjects, $H_{0\beta}: \beta_g = 0$ vs. $H_{1\beta}: \beta_g \ne 0$.

To test the above hypotheses, the LRT statistic $LR(d) = 2 \ln\{L(\hat{\mu}; d)/L(\tilde{\mu}; d)\}$ is calculated at each location $d$. Under the null hypothesis $H_0$, the restricted MLE $\tilde{\mu}$ does not depend on the testing location $d$, so $\tilde{\mu}$ needs to be calculated only once for each data set. But $\hat{\mu}$, $LR(d)$, and $\tilde{\mu}$ under the other null hypotheses do depend on the location $d$, since $\mathrm{pr}(G_i \mid M_i; d)$ varies with $d$. We employ the EM algorithm (Dempster et al., 1977) to obtain the parameter estimates.
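To make the observed data likelihood (4) and the role of the EM algorithm concrete, the sketch below evaluates the log of (4) for an F2 population and, for a censored subject, the posterior probability of being susceptible obtained by Bayes' rule. It is a simplified illustration under stated assumptions, not the authors' code: there are no covariates $Z_i$ (so $\gamma$ reduces to an intercept `gamma0`), the susceptible subjects follow a proportional hazards model with a two-parameter Weibull baseline $\lambda_0(t) = \theta_1 \theta_2 t^{\theta_2 - 1}$, observed times are strictly positive, and all function names are hypothetical.

```python
import math

GENOTYPES = [(1, 0), (0, 1), (0, 0)]  # QQ, Qq, qq, coded as in the text


def _terms(y, g, gamma0, gamma_g, beta_g, theta1, theta2):
    """Return (pi, hazard, survival) for time y at one candidate genotype g."""
    lin = gamma0 + sum(a * b for a, b in zip(gamma_g, g))
    pi = 1.0 / (1.0 + math.exp(-lin))               # pr(susceptible | g)
    rr = math.exp(sum(a * b for a, b in zip(beta_g, g)))
    hazard = theta1 * theta2 * y ** (theta2 - 1.0) * rr
    survival = math.exp(-theta1 * y ** theta2 * rr)  # susceptible survival
    return pi, hazard, survival


def obs_loglik(y, delta, p, *params):
    """Log of likelihood (4); p[i][j] = pr(G_i = G_j | M_i; d)."""
    total = 0.0
    for yi, di, p_row in zip(y, delta, p):
        mix = 0.0
        for g, pij in zip(GENOTYPES, p_row):
            pi, hazard, survival = _terms(yi, g, *params)
            if di == 1:
                mix += pij * hazard * survival * pi
            else:
                mix += pij * (survival * pi + (1.0 - pi))
        total += math.log(mix)
    return total


def posterior_susceptible(y, p_row, *params):
    """E(eta_i | subject censored at y), summing over candidate genotypes."""
    num = den = 0.0
    for g, pij in zip(GENOTYPES, p_row):
        pi, _, survival = _terms(y, g, *params)
        num += pij * survival * pi
        den += pij * (survival * pi + (1.0 - pi))
    return num / den


def lod_from_logliks(loglik_alt, loglik_null):
    """log10 likelihood ratio, i.e. LR(d) / (2 ln 10)."""
    return (loglik_alt - loglik_null) / math.log(10.0)
```

The positional parameters are passed in the order `gamma0, gamma_g, beta_g, theta1, theta2`. Maximizing `obs_loglik` under the null and alternative models at a location $d$ and passing the two maxima to `lod_from_logliks` gives one point of a likelihood ratio profile; the maximization itself (by EM or otherwise) is not shown.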
In the EM algorithm, we need to calculate the conditional expectation of $l_C(\mu; d) = \log L_C(\mu; d)$ in (3) with respect to the unobserved quantities $\{\eta_i, G_i\}$ given the current estimated parameter values and the observed data $O_i = \{Y_i, \delta_i, M_i, Z_i\}$. For example, consider the parametric proportional hazards mixture cure model in which the hazard function for the survival time of a susceptible subject is specified as

$$\lambda(t \mid G_i, Z_i) = \lambda_0(t; \theta) \exp(\beta' Z_i + \beta_g' G_i). \tag{5}$$

Denote the cumulative baseline hazard function by $\Lambda_0(t; \theta) = \int_0^t \lambda_0(s; \theta)\,ds$. Then, substituting the density and distribution functions in (3) by their counterparts specified by the proportional hazards model (5), simple algebraic manipulation yields

$$L_C(\mu; d) = \prod_{i=1}^{n} \left\{ \frac{\exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)} \right\}^{\eta_i} \left\{ \frac{1}{1 + \exp(\gamma' \tilde{Z}_i + \gamma_g' G_i)} \right\}^{1 - \eta_i} \times \prod_{i=1}^{n} \left\{ \lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_i) \right\}^{\delta_i \eta_i} e^{-\eta_i \Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_i)} \times \prod_{i=1}^{n} \mathrm{pr}(G_i \mid M_i; d). \tag{6}$$

Note that the conditional expectation of the complete data log-likelihood (6) can be written as a function of conditional expectations of $\{\eta_i, G_i, \eta_i G_i\}$. Thus, in each E-step of the EM iteration, it suffices to compute the conditional expectations of these quantities given the current parameter values and the observed data. In the $k$th step, the corresponding conditional expectations, denoted by $E(\eta_i \mid O_i, \mu^{(k)})$, $E\{I(G_i = G_j) \mid O_i, \mu^{(k)}\}$ and $E\{\eta_i I(G_i = G_j) \mid O_i, \mu^{(k)}\}$, can be derived explicitly. To simplify the notation, the superscript $(k)$ of the parameters is suppressed in the following formulas for these conditional moments:

$$E(\eta_i \mid O_i, \mu^{(k)}) = \begin{cases} 1, & \delta_i = 1, \\ D_{i0}^{-1} \sum_{j=1}^{K} e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j)\, p_i(j), & \delta_i = 0, \end{cases}$$

$$E\{I(G_i = G_j) \mid O_i, \mu^{(k)}\} = \begin{cases} D_{i1}^{-1}\, e^{\beta_g' G_j} e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j)\, p_i(j), & \delta_i = 1, \\ D_{i0}^{-1} \left[ e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j) + \{1 - \pi_i(G_j)\} \right] p_i(j), & \delta_i = 0, \end{cases}$$

$$E\{\eta_i I(G_i = G_j) \mid O_i, \mu^{(k)}\} = \begin{cases} D_{i1}^{-1}\, e^{\beta_g' G_j} e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j)\, p_i(j), & \delta_i = 1, \\ D_{i0}^{-1}\, e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j)\, p_i(j), & \delta_i = 0, \end{cases}$$

where $p_i(j) = \mathrm{pr}(G_i = G_j \mid M_i)$, $\pi_i(G_j) = \mathrm{pr}(\eta_i = 1 \mid G_i = G_j, \tilde{Z}_i)$ as defined in equation (2), and

$$D_{i1} = \sum_{j=1}^{K} e^{\beta_g' G_j} e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j)\, p_i(j), \qquad D_{i0} = \sum_{j=1}^{K} \left\{ e^{-\Lambda_0(Y_i; \theta) \exp(\beta' Z_i + \beta_g' G_j)} \pi_i(G_j) + 1 - \pi_i(G_j) \right\} p_i(j).$$

In the M-step, we obtain the arguments that maximize the expected log-likelihood. The EM algorithm then iterates until it converges.

The LOD score is defined as $\log_{10}\{L(\hat{\mu}; d)/L(\tilde{\mu}; d)\} = LR(d)/(2 \ln 10)$. Evaluation of the LOD score at each location yields a LOD profile over the chromosome. The location with the largest LOD score can be used as an estimate of the QTL location, provided that this largest value exceeds the threshold of a certain significance level.

2.3. Genome-wide Threshold

Assessing the genome-wide significance level is challenging when the QTL is searched over the whole genome, because the tests are performed at multiple locations and the test statistics are not independent. The point-wise significance level based on the $\chi^2$ approximation without a multiplicity correction is no longer appropriate, and the Bonferroni correction becomes too conservative when the number of tests is large. Recently, Diao et al. (2004), Zou et al. (2004), and Lin (2005) proposed a novel numerical method for searching the genome-wide threshold using a resampling approach. The method is computationally feasible and is applicable to many genetic models. Zou et al. (2004) and Lin (2005) gave detailed discussions of the performance of this resampling method and comparisons with other competing methods (e.g. Rebai et al., 1995; Dupuis and Siegmund, 1999).

We employ the resampling method to assess the genome-wide significance level. Using the well known asymptotic equivalence between the likelihood ratio test and the score test (Cox and Hinkley, 1974), the resampling approach computes the empirical threshold for the LRT by generating a large number of randomly perturbed score test statistics.
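For orientation, the two naive thresholds mentioned above (point-wise $\chi^2$ and Bonferroni) can be computed in LOD units as in the following sketch. It is illustrative only and rests on assumptions not stated in the text: the joint null $H_0$ in an F2 design is taken to have 4 degrees of freedom (two components each for $\gamma_g$ and $\beta_g$), the usual $\chi^2$ approximation is assumed to hold, and the number of tests (100) is a hypothetical value. The closed-form tail probability used here is valid for even degrees of freedom only.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable with even df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def chi2_quantile_even_df(alpha, df):
    """Value x with upper tail probability alpha, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lod_threshold(alpha, df, n_tests=1):
    """Bonferroni-adjusted chi-square threshold converted to the LOD scale."""
    return chi2_quantile_even_df(alpha / n_tests, df) / (2.0 * math.log(10.0))


pointwise = lod_threshold(0.05, df=4)                # no multiplicity correction
bonferroni = lod_threshold(0.05, df=4, n_tests=100)  # hypothetical 100 tests
```

The point-wise value is too liberal for a genome scan and the Bonferroni value ignores the correlation between neighboring locations; a resampling threshold would be expected to fall between the two.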
At each location $d$, the score function is a sum of independent and identically distributed (i.i.d.) terms with mean zero, so it is convenient to perturb each term with an independent standard Gaussian random variable, as discussed in Lin (2005). More specifically, let

$$U(\mu; d) = \sum_{i=1}^{n} U_i(\mu; d) \doteq \sum_{i=1}^{n} \partial l_i(\mu; d)/\partial \mu \tag{7}$$

denote the score function at a location $d$. To test different null hypotheses, the score functions corresponding to the parameters to be tested are used to construct the test statistic. For example, to test $H_0: \gamma_g = 0$ and $\beta_g = 0$, the corresponding score functions are

$$U_{\gamma_g \beta_g}(\mu; d) \doteq \begin{pmatrix} \sum_{i=1}^{n} \partial l_i(\mu; d)/\partial \gamma_g \\ \sum_{i=1}^{n} \partial l_i(\mu; d)/\partial \beta_g \end{pmatrix}.$$

The score test statistic is defined as

$$W(d) = \frac{1}{n}\, \hat{U}_{\gamma_g \beta_g}^{T}(\tilde{\mu}; d)\, \hat{V}^{-1}(d)\, \hat{U}_{\gamma_g \beta_g}(\tilde{\mu}; d), \tag{8}$$

where $\hat{U}_{\gamma_g \beta_g}$ is a consistent estimator of $U_{\gamma_g \beta_g}$ under the null hypothesis; its formulation is derived in the Appendix. The estimated covariance matrix $\hat{V}(d)$ can be obtained as $n^{-1} \sum_{i=1}^{n} \hat{U}_{\gamma_g \beta_g, i}(d)\, \hat{U}_{\gamma_g \beta_g, i}^{T}(d)$. To approximate the distribution of $W(d)$, we generate a large number, say $R = 10000$, of

$$\tilde{W}(d) = \frac{1}{n}\, \tilde{U}_{\gamma_g \beta_g}^{T}(\tilde{\mu}; d)\, \hat{V}^{-1}(d)\, \tilde{U}_{\gamma_g \beta_g}(\tilde{\mu}; d) \tag{9}$$

using randomly perturbed score functions $\tilde{U}_{\gamma_g \beta_g}(d) = \sum_{i=1}^{n} \hat{U}_{\gamma_g \beta_g, i}(d)\, X_i$, $X_i \sim N(0, 1)$, while fixing the observed data $(Y, \delta, M, Z)$. From each set of $(X_1, \ldots, X_n)$, we can calculate $\sup_d \tilde{W}(d)$. The threshold for the genome-wide significance level $\alpha$ is then determined by the $100(1 - \alpha)$th percentile of the $R$ simulated values of $\sup_d \tilde{W}(d)$. It is clear that the resampling method only involves generating standard normal random variables and some straightforward calculations.

Remark: In the presence of missing marker genotypes, the conditional probability of the putative QTL genotype can be calculated based on the two closest observed flanking markers of the specific location. The proposed approach can easily deal with missing and/or dominant marker situations, since only the conditional probabilities $\{p_i(j)\}$ need modified formulas (Jiang and Zeng, 1997; Zou et al., 2004). We have considered both types of complications in the following real data application.
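The resampling scheme can be sketched in a few lines. To keep the example free of matrix algebra, the sketch below assumes a scalar tested parameter, so that the quadratic form reduces to $(\sum_i \hat{U}_i)^2 / \sum_i \hat{U}_i^2$; the vector case replaces the division by a matrix inverse. The input `scores[d][i]`, the per-subject score contribution at location $d$ evaluated at the null fit, is assumed to be available, and the function name and default arguments are hypothetical.

```python
import math
import random


def resampling_threshold(scores, alpha=0.05, n_resamples=1000, seed=0):
    """Genome-wide threshold for sup_d W(d) by Gaussian perturbation.

    scores[d][i]: scalar score contribution of subject i at location d.
    Returns the 100(1 - alpha)th percentile of the simulated suprema.
    """
    rng = random.Random(seed)
    n = len(scores[0])
    # sum_i U_i(d)^2 equals n * V_hat(d) in the scalar case
    denom = [sum(u * u for u in row) for row in scores]
    sup_values = []
    for _ in range(n_resamples):
        # one set of multipliers X_1..X_n is shared by all locations,
        # which preserves the dependence between neighboring tests
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sup_w = max(
            sum(u * xi for u, xi in zip(row, x)) ** 2 / dn
            for row, dn in zip(scores, denom)
        )
        sup_values.append(sup_w)
    sup_values.sort()
    index = min(n_resamples - 1, math.ceil((1.0 - alpha) * n_resamples) - 1)
    return sup_values[index]
```

Because the observed data stay fixed and only the multipliers are redrawn, no model refitting is needed inside the loop, which is what makes the approach cheaper than a permutation test.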

3. Numerical Results

In this section we first demonstrate the proposed mixture cure models with an application to a real data set. Then we report numerical results of our simulations conducted to assess the performance of the proposed methods under various settings motivated by the real data example.

3.1. Real Data Example

To illustrate our methods, we considered the data from the study on the survival of 116 female mice from an intercross experiment between the BALB/cByJ and C57BL/6ByJ strains after infection with Listeria monocytogenes (Boyartchuk et al., 2001; Broman, 2003). The mice were genotyped at 133 markers over 20 chromosomes, including 2 on the X chromosome. In this specific data set, the only censoring occurred at the end of the study. From the biological point of view, the mice surviving more than 240 hours may have recovered from the infection. We employed our proposed mixture cure model to detect the QTL effects on several related chromosomes. More specifically, we consider the parametric proportional hazards mixture cure model (6) and use the two-parameter Weibull hazard function, $\lambda_0(t; \theta) = \theta_1 \theta_2 t^{\theta_2 - 1}$. For the intercrossed mice data, the genotypes $G_i$ are recorded using a two dimensional vector with three possible values, $(1, 0)$, $(0, 1)$ and $(0, 0)$, according to the genotypes $QQ$, $Qq$ and $qq$, respectively. The EM algorithm and the LRTs were carried out as described in the last section. The threshold at the 5% genome-wide significance level obtained from the resampling approach is [value missing in the source]. The corresponding estimates of regression parameters at the locations with the largest LOD scores for each chromosome are presented in Table 1. Recall that the regression parameters were estimated with the putative QTL location fixed, for the purpose of constructing the likelihood ratio test statistics at each $d$.
(INSERT TABLE 1 HERE)

We also carried out LOD score analysis using the PPH model on the same set of data to gain insight into the importance of taking into account possible heterogeneous susceptibility. The profiles of both the LOD($\gamma_g$, $\beta_g$) for the cure model and the LOD($\beta_g$) for the PPH model for testing $H_0$ are shown in the top plot of Figure 1. These two methods show some discrepancies, especially in chromosomes 1 and 5, which will be discussed further below.

(INSERT FIGURE 1 HERE)

To test more specific effects of the QTL, we proceed with testing hypotheses $H_{0\gamma}$ and $H_{0\beta}$ using the proposed mixture cure model. The corresponding LOD($d$) profiles are shown in the middle and bottom plots of Figure 1. The plots show that only in chromosome 13 do all three peaks of the LOD scores of the cure models exceed the genome-wide thresholds. This indicates that the QTL in chromosome 13 has significant joint and separate effects on susceptibility and survival times of susceptible mice. Additional interesting patterns were found in chromosomes 1 and 5. In chromosome 1, there is no significant effect on susceptibility ($H_{0\gamma}: \gamma_g = 0$, Figure 1: middle plot), but there is a significant effect on the survival distribution among the susceptible mice ($H_{0\beta}: \beta_g = 0$, Figure 1: bottom plot). On the other hand, the QTL in chromosome 5 significantly affects only the susceptibility of mice (Figure 1: middle plot) but not the survival distribution of the susceptible mice (Figure 1: bottom plot).

Next we examine chromosomes 1 and 5 more closely to properly interpret the differences between our results and the findings using the PPH method. To be specific, we first examine the survival distribution based on marker D1M355, the marker closest to the estimated QTL position (81 cM) on chromosome 1. Various survival distribution plots are presented in Figure 2.

(INSERT FIGURE 2 HERE)

The Kaplan-Meier plots of censored survival times grouped by marker D1M355 genotypes are displayed on the upper left; the Kaplan-Meier plots for the observed failure times only are displayed on the upper right.
The lower two plots are the estimated survival distributions for the corresponding upper plots using our proposed mixture cure model and the resulting estimates of the parameters in Table 1. The upper left Kaplan-Meier plot shows that the three survival curves approach nearly the same level at the end of the study, which indicates that the cure proportions of the three groups are close. This tail similarity obscures the vertical differences among the survival curves, while such differences are much more obvious when considering only the survival distribution for susceptible subjects, as shown in the upper and lower right plots. Thus the PPH model, which essentially assumes no cured fraction and looks for differences in the upper left plot, does not yield significance. But the proposed parametric mixture cure model, which considers both the upper left and right plots, yields significance on chromosome 1. These findings are consistent with the results reported in Broman (2003) and Diao et al. (2004).

Additionally, based on our testing results, chromosome 5 seems to be a typical case where the QTL only affects susceptibility. To see this, Figure 3 displays four survival distribution plots on marker D5M357, presented in the same layout as described above for Figure 2. The two plots on the left show that the cure fractions of the three groups are very different. The estimated cure fractions are 0.64, 0.29 and 0.03 for the three genotypes AA, Aa, and aa, respectively. But the two plots on the right show that the survival distributions of susceptible subjects are very similar among these three groups. Because the tails are well separated, the overall survival curves can still be distinguished from each other even without considering cure effects. Hence the QTL effect in chromosome 5 can be detected using both the cure model and the PPH model.

(INSERT FIGURE 3 HERE)

Model Diagnostic

The proportional hazards mixture cure model with a two-parameter Weibull baseline hazard function is used in the analysis of the Listeria data. Based on Figures 2 and 3, the estimated survival curves (lower plots) seem very similar to the observed Kaplan-Meier curves (upper plots). Model diagnostics and goodness-of-fit analysis are critical in practical applications of parametric models.
In this subsection, we provide some formal examination of the goodness of fit of the proposed parametric mixture cure model to the Listeria data. The Listeria data have no other covariates besides the genotype groups. We first assess the Weibull baseline hazard assumption for the survival distribution of the susceptible subjects by examining each marker genotype group separately. More specifically, at a specific marker and for each genotype group, the overall survival distribution $S(t)$ was estimated by the nonparametric Kaplan-Meier estimate, denoted by $\hat{S}(t)$. Assuming that a cure proportion exists, the overall survival distribution function can be written as $S(t) = p S_s(t) + (1 - p)$, where $p$ stands for the susceptible probability and $S_s(t)$ denotes the survival distribution for the susceptible subjects in this genotype group. Therefore, we can represent $S_s(t)$ as $\{S(t) - (1 - p)\}/p$. A consistent nonparametric estimate of $S_s(t)$ is thus $\hat{S}_s(t) = \{\hat{S}(t) - \hat{S}(\infty)\}/\{1 - \hat{S}(\infty)\}$. To check the assumption of a shared Weibull hazard for the survival distribution of susceptible subjects, we plot $\log\{-\log \hat{S}_s(t)\}$ against $\log t$, which should be linear under a Weibull distribution. The plots are approximately linear, supporting the Weibull baseline assumption. The plots for markers D1M355 and D5M357 are presented in Figure 4. We note that there appear to be some deviations from the straight line at the beginning of the study. One possible reason is that the animals were not immediately at risk of death after infection.

(INSERT FIGURE 4 HERE)

We next examine the overall fit of the proposed mixture cure model to the Listeria data graphically and numerically. The population survival distribution $S(t)$ was estimated nonparametrically using the Kaplan-Meier estimate $\hat{S}(t)$. Under the proposed parametric mixture cure model, the survival distribution $S(t; \mu)$ is estimated by fixing the parameters at the estimated values $\hat{\mu}$ (from Table 1). We plot $\hat{S}(t)$ against $S(t; \hat{\mu})$, and the resulting P-P plot can be used to assess the closeness to the diagonal line. The results for markers D1M355 and D5M357 are shown in Figure 5. Two-sample Kolmogorov-Smirnov tests were employed to evaluate the goodness of fit between $\hat{S}(t)$ and $S(t; \hat{\mu})$ at these two markers and yielded p-values of 0.34 and 0.18, respectively.
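The graphical Weibull check described above can be reproduced along the following lines. This is a minimal sketch under stated assumptions rather than the analysis code used for the Listeria data: the last Kaplan-Meier value is taken as the estimate of $\hat{S}(\infty)$, at least one event is assumed to be observed, and the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    times: observed times; events: 1 for an observed failure, 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def susceptible_loglog(times, events):
    """Points (log t, log(-log S_s(t))) for the Weibull linearity check."""
    curve = kaplan_meier(times, events)
    plateau = curve[-1][1]  # KM tail, used as the estimate of S(infinity)
    points = []
    for t, s in curve:
        s_sus = (s - plateau) / (1.0 - plateau)
        if 0.0 < s_sus < 1.0:  # the transform is undefined at 0 and 1
            points.append((math.log(t), math.log(-math.log(s_sus))))
    return points
```

Applying `susceptible_loglog` within each genotype group and plotting the returned points gives one curve per group; approximate linearity supports the Weibull form, and roughly parallel lines across groups are what a shared shape parameter would produce.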
Therefore, these results and plots indicate that there are no serious violations in using the proposed proportional hazards mixture cure model with the Weibull baseline hazard function.

(INSERT FIGURE 5 HERE)

3.2. Simulations

A series of simulation studies were conducted to assess the performance of the proposed cure model under practical settings. For illustration, we also present the results from the PPH model assuming no cured fraction. The survival times were generated from an F2 population of mixed susceptibility that mimics the settings of the real data example presented above. No covariates other than genetic factors are included. To be specific, we consider the parametric proportional hazards mixture cure model with the following specifications:

$$\mathrm{pr}(\eta_i = 1 \mid G_i) = \frac{\exp(\gamma_0 + \gamma_g' G_i)}{1 + \exp(\gamma_0 + \gamma_g' G_i)}, \qquad \lambda(t \mid G_i, \eta_i = 1) = \lambda_0(t; \theta)\, e^{\beta_g' G_i},$$

where $G_i$ is defined for the F2 population as before and $\lambda_0(t; \theta)$ is the Weibull baseline hazard function. The independent censoring times $C_i$ are generated from the uniform distribution $U(0, 10)$. Under this general random censoring, it is no longer appropriate to simply attribute all censored individuals to the category of cured. One chromosome of total length 100 cM is simulated, and the markers are generated from a Markov chain at evenly spaced positions with distances of 10 and 20 cM. Under the alternative hypotheses, the QTL is placed at 35 cM with different combinations of the long-term and short-term effects. Under each scenario, we perform 1000 runs with sample size $n = 300$. In each run, we resample [number missing in the source] times.

To examine the performance of the LOD score tests using the cure model and the PPH model, the null hypothesis $H_0: \gamma_g = 0$ and $\beta_g = 0$ is tested under various proportions of nonsusceptible subjects. The genome-wide threshold is obtained by the resampling method described in the preceding section, and the estimated threshold values are presented in Table 2.

(INSERT TABLE 2 HERE)

The empirical type I errors and powers of the two methods using the cure model and the PPH model are summarized in Table 3.
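The data-generating mechanism of the simulation can be sketched as follows. This is an illustrative sketch only: it draws the QTL genotype directly with F2 frequencies 1:2:1 and omits the marker simulation, the Weibull parameter values passed by the caller are hypothetical because the values used in the study are not given in the text, and the function names are not from the paper.

```python
import math
import random

GENOTYPES = [(1, 0), (0, 1), (0, 0)]  # QQ, Qq, qq


def simulate_subject(rng, g, gamma0, gamma_g, beta_g, theta1, theta2, c_max=10.0):
    """Draw (Y, delta, eta) for one subject with QTL genotype g."""
    lin = gamma0 + sum(a * b for a, b in zip(gamma_g, g))
    susceptible = rng.random() < 1.0 / (1.0 + math.exp(-lin))
    censor = rng.uniform(0.0, c_max)
    if susceptible:
        # invert S(t) = exp(-theta1 * t^theta2 * exp(beta_g' g))
        rr = math.exp(sum(a * b for a, b in zip(beta_g, g)))
        u = 1.0 - rng.random()  # uniform on (0, 1]
        event_time = (-math.log(u) / (theta1 * rr)) ** (1.0 / theta2)
    else:
        event_time = math.inf  # cured subjects never fail
    y = min(event_time, censor)
    delta = 1 if event_time <= censor else 0
    return y, delta, int(susceptible)


def simulate_f2(n, seed, **params):
    """Return a list of (genotype, Y, delta, eta) for n F2 subjects."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = rng.choices(GENOTYPES, weights=[0.25, 0.5, 0.25])[0]
        data.append((g,) + simulate_subject(rng, g, **params))
    return data
```

For example, `simulate_f2(300, seed=1, gamma0=0.5, gamma_g=(0.0, 0.0), beta_g=(0.0, 0.0), theta1=1.0, theta2=2.0)` generates one null data set; the returned `eta` is kept only for checking and would be unobserved in an analysis. Censored subjects here are a mixture of cured and susceptible individuals, which is why they cannot all be treated as cured.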
The simulation results indicate that the proposed cure model produces reasonable type I errors and good power to detect the QTL effects under all types of alternative models that we simulated. On the other hand, the PPH method inflates the type I errors and may lose power to detect the QTL under some cure model alternatives. More specifically, under the null hypothesis of no overall QTL effects ($\gamma_g = \beta_g = 0$), the cure fraction is 38% ($\gamma_0 = 0.5$), and it is clear that the PPH method yields inflated type I errors. This demonstrates that, without taking into account the existence of nonsusceptible individuals, the PPH method tends to find a spurious QTL.

(INSERT TABLE 3 HERE)

The powers are calculated under three different alternatives, which mimic the different situations presented in the real data. For simplicity, we directly compare the powers of our proposed cure method with the PPH method without adjustment for type I error inflation. However, since the PPH method produces inflated type I errors, adjustments are needed if it is applied in practice in such a context.

For the first alternative (Table 2), we consider the case $\beta_g = (0.5, 0.5)$ and $\gamma_g = 0$, which mimics the situation of chromosome 1 (Figure 2), i.e. the QTL only affects the survival distribution of susceptible subjects but not the susceptible probability. With 6 markers, at the 5% nominal level, the proposed cure method has 85% power to detect the QTL, but the PPH method attains less than 1/3 of the power of the cure method. Similar results are observed with the 10-marker setting and other nominal levels. This agrees with the performance of the PPH model on chromosome 1 in the analysis of the real example.

To mimic the situation in chromosome 5 (Figure 3), we consider the second alternative $\beta_g = 0$ and $\gamma_g = (0.75, 0.75)$. Both the cure method and the PPH method retain good power to detect the QTL in this situation. The PPH method has a slightly higher power than the cure model method, but note that the PPH method is not adjusted for its inflated type I errors here. Such observations provide some evidence in support of the results of the real data analysis in chromosome 5, where the PPH method has larger LOD scores than the cure model method, but both methods can find the QTL.
To mimic chromosome 13 in the real data, we consider the third alternative, in which the QTL has joint effects: $\beta_g = (0.5, 0.5)$ and $\gamma_g = (0.25, 0.25)$. It is evident from Table 3 that the proposed cure model method has almost twice the power of the PPH method to detect the presence of the QTL in this case.

We conclude this simulation section with a few comments. During the search for genome-wide thresholds, for each scenario, we performed 1000 runs with sample size $n = 300$. We thereby obtained a large number of $\{W(d), d\}$ as defined in (7) directly from data drawn under each null hypothesis, which enables us to evaluate the empirical distribution of the genome-wide LOD score thresholds at the given significance levels. For the LOD scores based on the mixture cure model, the estimated thresholds (Table 2) obtained using the resampling approach (using $\{W(d), d\}$ in (8)) were close to the corresponding percentiles of the empirical thresholds from the $\{W(d), d\}$. This agreement indicates that our method maintains proper type I errors at the given sample sizes.

In addition, we performed simulations to study the situation in which the underlying population is actually homogeneous, i.e. there are no non-susceptible subjects and a standard survival model holds. In such situations, the LOD score based on the traditional survival model provides a valid test. Under the mixture cure model, the estimates become questionable because the long-term cure effects $\gamma_g$ are not identifiable. However, for testing hypotheses about QTL using the LRT (LOD score), the cure model approach remains valid for homogeneous populations (Hodge and Elston, 1994; Liu and Shao, 2003) in the sense of maintaining correct significance levels, though it often has a small to moderate loss of power (results not shown) compared with the LOD scores based on the correct homogeneous model.

4. Discussion

In this article, we propose a mixture cure model for interval mapping of QTL using a time-to-event trait from a population of mixed susceptibility. The method is applicable when the time-to-event trait is subject to random censoring. It provides a natural tool for detecting QTL that affect susceptibility and/or the survival distribution of the susceptible population. Genome-wide significance levels for the LRT can be obtained using the resampling method. Goodness-of-fit of the parametric mixture cure model is also discussed.
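The idea behind a resampling threshold can be illustrated with a generic multiplier scheme. The sketch below is an assumption-laden illustration, not the exact statistics (7) and (8) of the article: it takes per-subject score contributions at each test position as given, perturbs them with standard normal multipliers shared across positions, and reads the threshold off the resampled maxima. The function names and the input layout `scores[d][i]` are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (m[c][k] - sum(m[c][j] * x[j] for j in range(c + 1, k))) / m[c][c]
    return x

def quad_form(s, v):
    """Return s^T v^{-1} s."""
    return sum(si * xi for si, xi in zip(s, solve(v, s)))

def resampled_lod_threshold(scores, alpha=0.05, n_resamples=1000, seed=0):
    """Genome-wide threshold on the LOD scale from multiplier resampling.

    scores[d][i] is the (assumed precomputed) score vector of subject i
    at position d, evaluated under the null hypothesis.
    """
    rng = random.Random(seed)
    n, k = len(scores[0]), len(scores[0][0])
    # Empirical variance of the summed score at each position.
    variances = [[[sum(u[a] * u[b] for u in pos) for b in range(k)]
                  for a in range(k)] for pos in scores]
    maxima = []
    for _ in range(n_resamples):
        # One multiplier per subject, shared across positions, so that the
        # correlation between neighbouring positions is preserved.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w_max = 0.0
        for pos, v_d in zip(scores, variances):
            s = [sum(g_i * u[a] for g_i, u in zip(g, pos)) for a in range(k)]
            w_max = max(w_max, quad_form(s, v_d))
        maxima.append(w_max)
    maxima.sort()
    idx = min(len(maxima) - 1, max(0, math.ceil((1.0 - alpha) * len(maxima)) - 1))
    return maxima[idx] / (2.0 * math.log(10.0))  # chi-square scale to LOD scale

# Toy input: 5 positions, 100 subjects, 2 score components (purely synthetic).
toy_rng = random.Random(42)
toy_scores = [[[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(100)]
              for _ in range(5)]
lod_5 = resampled_lod_threshold(toy_scores, alpha=0.05, n_resamples=500)
lod_1 = resampled_lod_threshold(toy_scores, alpha=0.01, n_resamples=500)
print(lod_5, lod_1)
```

Only the cheap perturbation step is repeated, so the model does not have to be refitted for each resample. This is the practical appeal of score-based resampling over a permutation test.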
The proposed method can be generalized to composite QTL models along the lines of Zeng (1993, 1994), as discussed in Diao et al. (2004). More recently, Diao and Lin (2005) developed a semiparametric proportional hazards model for mapping QTL using time-to-event traits, in which a resampling method was also implemented through the efficient scores to obtain genome-wide thresholds. A similar strategy may be extended to the semiparametric proportional hazards cure model, which seems to be a worthwhile topic for further research. The challenge is that the semiparametric efficient scores for the regression parameters are much more complicated in the mixture cure model, which requires further investigation.

We have also demonstrated through simulation studies that, if the underlying population is really a mixture of susceptible and nonsusceptible subjects, the LOD score based on the proposed mixture cure models can be used to test for QTL effects. On the other hand, as indicated by our simulation results, methods that ignore the latent heterogeneous susceptibility, such as the simple PPH model, may fail to detect the true QTL and may also produce spurious QTL in the presence of heterogeneous susceptibility. Thus the proposed mixture cure models are useful for mapping QTL based on time-to-event data whenever biological reasons and/or a long enough follow-up time in the study indicate the existence of a latent non-susceptible sub-population.

Appendix

Appendix A. Score functions of the observed likelihood in the case of the proportional hazards mixture cure model for an F2 intercross family.

In this appendix, we present the formulas for the score functions based on the observed likelihood for the parametric mixture cure model. The scores are required in the resampling stage and are defined as the partial derivatives of the logarithm of the observed likelihood in (4) with respect to the parameters,

\[
U(\mu; d) = \frac{\partial l(\mu; d)}{\partial \mu}
= \left( \frac{\partial l(\mu; d)}{\partial \gamma},\ \frac{\partial l(\mu; d)}{\partial \gamma_g},\ \frac{\partial l(\mu; d)}{\partial \beta},\ \frac{\partial l(\mu; d)}{\partial \beta_g},\ \frac{\partial l(\mu; d)}{\partial \theta} \right),
\]

where

\[
\frac{\partial l(\mu; d)}{\partial \gamma} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j)\, \frac{e^{\gamma^T Z_i + \gamma_g^T G_j} \left( A_i - 1 + \delta_i \right)}{\left( 1 + e^{\gamma^T Z_i + \gamma_g^T G_j} \right)^2} \right] Z_i ;
\]

\[
\frac{\partial l(\mu; d)}{\partial \gamma_g} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j)\, \frac{e^{\gamma^T Z_i + \gamma_g^T G_j} \left( A_i - 1 + \delta_i \right)}{\left( 1 + e^{\gamma^T Z_i + \gamma_g^T G_j} \right)^2} \right] G_j ;
\]

\[
\frac{\partial l(\mu; d)}{\partial \beta} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j)\, \frac{e^{\gamma^T Z_i + \gamma_g^T G_j} A_i \left\{ \delta_i - \Lambda_0(Y_i)\, e^{\beta^T Z_i + \beta_g^T G_j} \right\}}{1 + e^{\gamma^T Z_i + \gamma_g^T G_j}} \right] Z_i ;
\]

\[
\frac{\partial l(\mu; d)}{\partial \beta_g} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j)\, \frac{e^{\gamma^T Z_i + \gamma_g^T G_j} A_i \left\{ \delta_i - \Lambda_0(Y_i)\, e^{\beta^T Z_i + \beta_g^T G_j} \right\}}{1 + e^{\gamma^T Z_i + \gamma_g^T G_j}} \right] G_j ;
\]

\[
\frac{\partial l(\mu; d)}{\partial \theta} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j)\, A_i \left\{ \delta_i \frac{\partial \log \lambda_0(Y_i)}{\partial \theta} - \frac{\partial \Lambda_0(Y_i)}{\partial \theta}\, e^{\beta^T Z_i + \beta_g^T G_j} \right\} \frac{e^{\gamma^T Z_i + \gamma_g^T G_j}}{1 + e^{\gamma^T Z_i + \gamma_g^T G_j}} \right],
\]

where

\[
A_i = \left( \lambda_0(Y_i)\, e^{\beta^T Z_i + \beta_g^T G_j} \right)^{\delta_i} \exp\left( -\Lambda_0(Y_i)\, e^{\beta^T Z_i + \beta_g^T G_j} \right)
\]

and

\[
D_i = \sum_{j=1,2,3} \left[ p_i(j) \left\{ \frac{A_i\, e^{\gamma^T Z_i + \gamma_g^T G_j}}{1 + e^{\gamma^T Z_i + \gamma_g^T G_j}} + \frac{1 - \delta_i}{1 + e^{\gamma^T Z_i + \gamma_g^T G_j}} \right\} \right].
\]

Note that $A_i$ depends on the genotype index $j$ through $G_j$. The above score functions are employed to approximate the genome-wide threshold using the resampling method of Diao et al. (2004), with $\mu = \tilde{\mu}$ evaluated at the restricted MLE under the null hypothesis.

Appendix B. The observed information matrix.

Let $I(\mu)$ denote the observed information matrix, which consists of the second derivatives of the observed log-likelihood with respect to the parameters. Direct calculation of the second derivatives can be complicated; hence we prefer to use the formula of Louis (1982) to calculate it from the complete likelihood, i.e.

\[
I(\mu; d) = E_\mu\left( I_C(\mu; d) \mid O \right) - E_\mu\left( U_C(\mu; d)\, U_C^T(\mu; d) \mid O \right) + E_\mu\left( U_C(\mu; d) \mid O \right) E_\mu^T\left( U_C(\mu; d) \mid O \right).
\]

We divide the above observed information matrix into blocks corresponding to the different parameters (only the upper triangle of the symmetric matrix is shown):

\[
I(\mu; d) =
\begin{pmatrix}
I_{\gamma}(\mu; d) & I_{\gamma\gamma_g} & I_{\gamma\beta} & I_{\gamma\beta_g} & I_{\gamma\theta} \\
 & I_{\gamma_g}(\mu; d) & I_{\gamma_g\beta} & I_{\gamma_g\beta_g} & I_{\gamma_g\theta} \\
 & & I_{\beta}(\mu; d) & I_{\beta\beta_g} & I_{\beta\theta} \\
 & & & I_{\beta_g}(\mu; d) & I_{\beta_g\theta} \\
 & & & & I_{\theta}(\mu; d)
\end{pmatrix}.
\]

To test the null hypothesis $H_0$, the observed information matrix is evaluated at the restricted MLE $\tilde{\mu}$ under $H_0$. Then the likelihood ratio test statistics constructed at each location $d$ are asymptotically equivalent to the following estimated score test statistics at the same location:

\[
\hat{U}_{\gamma_g \beta_g}(\tilde{\mu}; d) =
\begin{pmatrix} U_{\gamma_g}(\tilde{\mu}; d) \\ U_{\beta_g}(\tilde{\mu}; d) \end{pmatrix}
-
\begin{pmatrix}
I_{\gamma_g\gamma} & I_{\gamma_g\beta} & I_{\gamma_g\theta} \\
I_{\beta_g\gamma} & I_{\beta_g\beta} & I_{\beta_g\theta}
\end{pmatrix}
\begin{pmatrix}
I_{\gamma} & I_{\gamma\beta} & I_{\gamma\theta} \\
I_{\gamma\beta}^T & I_{\beta} & I_{\beta\theta} \\
I_{\gamma\theta}^T & I_{\beta\theta}^T & I_{\theta}
\end{pmatrix}^{-1}
\begin{pmatrix} U_{\gamma}(\tilde{\mu}; d) \\ U_{\beta}(\tilde{\mu}; d) \\ U_{\theta}(\tilde{\mu}; d) \end{pmatrix}.
\]

References

V. L. Boyartchuk, K. W. Broman, R. E. Mosher et al., Multigenic control of Listeria monocytogenes susceptibility in mice, Nat. Genet., vol. 27.
K. W. Broman, Mapping quantitative trait loci in the case of a spike in the phenotype distribution, Genetics, vol. 163.
B. S. Carter, T. H. Beaty, G. D. Steinberg, B. Childs and P. C. Walsh, Mendelian inheritance of familial prostate cancer, Proc. Natl. Acad. Sci. USA, vol. 89.
E. B. Claus, N. J. Risch and W. D. Thompson, Using age of onset to distinguish between subforms of breast cancer, Annals of Human Genetics, vol. 54.
D. R. Cox, Regression models and life tables (with discussion), J. R. Stat. Soc. B, vol. 34.
D. R. Cox and D. V. Hinkley, Theoretical Statistics. Chapman & Hall, London.
A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B, vol. 39, pp. 1-38.
G. Diao, D. Y. Lin and F. Zou, Mapping quantitative trait loci with censored observations, Genetics, vol. 168.
G. Diao and D. Y. Lin, Semiparametric methods for mapping quantitative trait loci with censored data, Biometrics, vol. 61, 2005.

J. Dupuis and D. Siegmund, Statistical methods for mapping quantitative trait loci from a dense set of markers, Genetics, vol. 151.
V. T. Farewell, The use of mixture models for the analysis of survival data with long-term survivors, Biometrics, vol. 38.
J. P. Fine, Analysing competing risks data with transformation models, J. R. Stat. Soc. B, vol. 61.
A. M. Glazier, J. H. Nadeau and T. J. Aitman, Finding genes that underlie complex traits, Science, vol. 298.
S. E. Hodge and R. C. Elston, Lods, Wrods, and Mods: the interpretation of Lod scores calculated under different models, Genet. Epidemiol., vol. 11.
S. E. Hodge, V. Vieland and D. A. Greenberg, HLODs remain powerful tools for detection of linkage in the presence of genetic heterogeneity, Am. J. Hum. Genet., vol. 70.
C. J. Jiang and Z-B. Zeng, Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines, Genetica, vol. 101.
J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, Ed. 2. Wiley, NJ.
A. Y. C. Kuk and C. H. Chen, A mixture model combining logistic regression with proportional hazards regression, Biometrika, vol. 79.
E. S. Lander and D. Botstein, Mapping mendelian factors underlying quantitative traits using RFLP linkage maps, Genetics, vol. 121.
E. S. Lander and N. J. Schork, Genetic dissection of complex traits, Science, vol. 265.
H. Li and E. A. Thompson, Semiparametric estimation of major gene and random familial effects for age of onset, Biometrics, vol. 53.
D. Y. Lin, An efficient Monte Carlo approach to assessing statistical significance in genomic studies, Bioinformatics, vol. 21.
X. Liu and Y. Shao, Asymptotics of likelihood ratio test under loss of identifiability, Ann. Statist., vol. 31.
T. A. Louis, Finding the observed information matrix when using the EM algorithm, J. R. Stat. Soc. B, vol. 44.
W. Lu and Z. Ying, On semiparametric transformation cure models, Biometrika, vol. 91.
M. Lynch and B. Walsh, Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Y. Miki, J. Swensen, D. Shattuck-Eidens, P. A. Futreal, et al., A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1, Science, vol. 266.
Y. Peng and K. B. G. Dear, A nonparametric mixture model for cure rate estimation, Biometrics, vol. 56.
A. Rebai, B. Goffinet and B. Mangin, Comparing power of different methods for QTL detection, Biometrics, vol. 51.
J. P. Sy and J. M. G. Taylor, Estimation in a Cox proportional hazards cure model, Biometrics, vol. 56, 2000.
R. C. Symons, M. J. Daly, J. Fridlyand, T. P. Speed, W. D. Cook, et al., Multiple genetic loci modify susceptibility to plasmacytoma-related morbidity in Eµ-v-abl transgenic mice, Proc. Natl. Acad. Sci. USA, vol. 99.
Z. B. Zeng, Theoretical basis for separation of multiple linked gene effects in mapping QTL, Proc. Natl. Acad. Sci. USA, vol. 90.
Z. B. Zeng, Precision mapping of quantitative trait loci, Genetics, vol. 136.
F. Zou, J. P. Fine, J. Hu and D. Y. Lin, An efficient resampling method for assessing genome-wide statistical significance in mapping quantitative trait loci, Genetics, vol. 168.
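The genetic components of the score functions in Appendix A can be cross-checked numerically against finite differences of the observed log-likelihood $l(\mu; d) = \sum_i \log D_i$. The sketch below makes several simplifying assumptions that are not taken from the article: the covariate $Z_i$ is reduced to an intercept $\gamma_0$ in the susceptibility part and dropped from the hazard part, the baseline is Weibull with $\Lambda_0(t) = \text{rate} \cdot t^{\text{shape}}$, the genotype coding `G` is a hypothetical additive/dominance coding, and the five observations are made up. The factor $(A_i - 1 + \delta_i)$ is the form obtained by differentiating $D_i$ directly.

```python
import math

# Hypothetical additive/dominance coding G_j for the F2 genotypes AA, Aa, aa.
G = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def terms(par, y, delta, g):
    """Return e^eta, Lambda0(y) e^xi and A for one subject and one genotype."""
    e_eta = math.exp(par["gamma0"] + dot(par["gamma_g"], g))
    e_xi = math.exp(dot(par["beta_g"], g))
    lam0 = par["shape"] * par["rate"] * y ** (par["shape"] - 1.0)  # Weibull hazard
    cum = par["rate"] * y ** par["shape"] * e_xi                   # Lambda0(y) e^xi
    a = (lam0 * e_xi) ** delta * math.exp(-cum)
    return e_eta, cum, a

def loglik(par, data):
    """Observed log-likelihood: sum over subjects of log D_i."""
    total = 0.0
    for y, delta, p in data:
        d_i = 0.0
        for p_j, g in zip(p, G):
            e_eta, _, a = terms(par, y, delta, g)
            d_i += p_j * (a * e_eta + 1 - delta) / (1.0 + e_eta)
        total += math.log(d_i)
    return total

def genetic_scores(par, data):
    """Analytic scores with respect to gamma_g and beta_g."""
    u_gamma, u_beta = [0.0, 0.0], [0.0, 0.0]
    for y, delta, p in data:
        d_i, num_g, num_b = 0.0, [0.0, 0.0], [0.0, 0.0]
        for p_j, g in zip(p, G):
            e_eta, cum, a = terms(par, y, delta, g)
            d_i += p_j * (a * e_eta + 1 - delta) / (1.0 + e_eta)
            w_g = p_j * e_eta * (a - 1 + delta) / (1.0 + e_eta) ** 2
            w_b = p_j * e_eta * a * (delta - cum) / (1.0 + e_eta)
            for k in range(2):
                num_g[k] += w_g * g[k]
                num_b[k] += w_b * g[k]
        for k in range(2):
            u_gamma[k] += num_g[k] / d_i
            u_beta[k] += num_b[k] / d_i
    return u_gamma, u_beta

def numeric_score(par, data, name, k, h=1e-6):
    """Central finite difference of loglik in component k of par[name]."""
    up, down = dict(par), dict(par)
    up[name], down[name] = list(par[name]), list(par[name])
    up[name][k] += h
    down[name][k] -= h
    return (loglik(up, data) - loglik(down, data)) / (2.0 * h)

# Made-up data: (time, event indicator, genotype probabilities p_i(j)).
data = [(0.5, 1, (0.8, 0.15, 0.05)), (1.2, 1, (0.1, 0.7, 0.2)),
        (3.0, 0, (0.25, 0.5, 0.25)), (2.0, 0, (0.05, 0.15, 0.8)),
        (0.9, 1, (0.0, 1.0, 0.0))]
par = {"gamma0": 0.5, "gamma_g": [0.25, 0.25], "beta_g": [0.5, 0.5],
       "shape": 1.5, "rate": 0.8}

u_gamma, u_beta = genetic_scores(par, data)
gaps = [abs(u_gamma[k] - numeric_score(par, data, "gamma_g", k)) for k in range(2)]
gaps += [abs(u_beta[k] - numeric_score(par, data, "beta_g", k)) for k in range(2)]
print(max(gaps))  # largest discrepancy between analytic and numerical scores
```

A check of this kind is cheap to run before the scores are used in a resampling step, since a sign error in any one term would show up as a discrepancy far above the finite-difference error.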

Table 1: Estimated QTL positions and effects from the Listeria data

Chromosome   Pos(cM)   LOD   $\gamma_0$   $\gamma_{g1}$   $\gamma_{g2}$   $\beta_{g1}$   $\beta_{g2}$

Table 2: LOD score thresholds at significance level $\alpha$ obtained from resampling (with the sample standard errors in parentheses)

                     Cure model                      PPH model
No. markers    α = 5%         α = 1%          α = 5%         α = 1%

$H_0$: $\gamma_{g1} = 0$, $\gamma_{g2} = 0$ and $\beta_{g1} = 0$, $\beta_{g2} = 0$
6                   (0.03)    3.81 (0.06)     2.13 (0.03)    2.90 (0.052)
10                  (0.03)    3.95 (0.06)     2.26 (0.03)    3.04 (0.05)

$H_{1a}$: $\gamma_{g1} = 0$, $\gamma_{g2} = 0$ and $\beta_{g1} = 0.5$, $\beta_{g2} = 0.5$
6                   (0.03)    3.82 (0.06)     2.12 (0.03)    2.89 (0.05)
10                  (0.03)    3.97 (0.05)     2.26 (0.03)    3.03 (0.05)

$H_{2a}$: $\gamma_{g1} = 0.75$, $\gamma_{g2} = 0.75$ and $\beta_{g1} = 0.0$, $\beta_{g2} = 0.0$
6                   (0.03)    3.80 (0.05)     2.13 (0.03)    2.90 (0.05)
10                  (0.03)    3.95 (0.06)     2.27 (0.03)    3.04 (0.05)

$H_{3a}$: $\gamma_{g1} = 0.25$, $\gamma_{g2} = 0.25$ and $\beta_{g1} = 0.5$, $\beta_{g2} = 0.5$
6                   (0.03)    3.83 (0.06)     2.12 (0.03)    2.89 (0.05)
10                  (0.03)    3.97 (0.06)     2.26 (0.03)    3.04 (0.05)

Table 3: Simulated type I errors and powers (in %) using the setup of Table 2

                         $H_0$           $H_{1a}$         $H_{2a}$         $H_{3a}$
No. markers   Model   α = 5%   1%     α = 5%   1%     α = 5%   1%     α = 5%   1%
6             Cure
              PPH
10            Cure
              PPH

[Figure 1: three panels showing LOD score against position (cM) on chromosomes 1, 5, 6, 13 and 15. Panel legends: Cure LOD and PPH LOD (top); Cure LOD of Susceptibility (middle); Cure LOD of Survival (bottom).]

Figure 1: Top: Testing $H_0$ of no overall QTL effects, the LOD scores from two QTL mapping methods, the cure mixture model and the PPH survival model, on the Listeria data. The threshold (dotted horizontal line) is the 5% genome-wide significance level based on the resampling method using the cure mixture model. Middle: Testing $H_{0\gamma}$ of no QTL effects on susceptibility, the LOD score profile from the cure mixture model and the threshold (dotted horizontal line) based on the resampling method. Bottom: Testing $H_{0\beta}$ of no QTL effects on the survival distribution of the susceptible population, the LOD score profile from the cure mixture model and the threshold based on the resampling method.

[Figure 2: four panels of survival curves by genotype (AA, Aa, aa) against time in hours. Panel titles: Survival Probability; Survival in Susceptible Subjects; Estimated Overall Survival Probability; Estimated Survival in Susceptible Subjects.]

Figure 2: Marker D1M355 on chromosome 1. Upper left: Kaplan-Meier curves of the survival times of all mice after infection with Listeria. Upper right: Kaplan-Meier curves using only the survival times of the mice with observed death. Lower left: estimated overall survival distribution using the mixture cure model. Lower right: estimated survival distribution of the susceptible population using the mixture cure model.

[Figure 3: four panels of survival curves by genotype (AA, Aa, aa) against time in hours. Panel titles: Survival Probability; Survival in Susceptible Subjects; Estimated Overall Survival Probability; Estimated Survival in Susceptible Subjects.]

Figure 3: Marker D5M357 on chromosome 5. Upper left: Kaplan-Meier curves of the survival times of all mice after infection with Listeria. Upper right: Kaplan-Meier curves using only the survival times of the mice with observed death. Lower left: estimated overall survival distribution using the mixture cure model. Lower right: estimated survival distribution of the susceptible population using the mixture cure model.

[Figure 4: two rows of three panels (AA group, Aa group, aa group), each showing $\log(-\log(S(t)))$ against $\log t$.]

Figure 4: For each genotype group at a marker, $\log(-\log \hat{S}_s(t))$ was plotted against $\log(t)$ and a straight line was fitted by the least squares method. Top row: Marker D1M355; Bottom row: Marker D5M

[Figure 5: P-P plot with the proposed parametric estimate of the survival distribution on the vertical axis and the Kaplan-Meier estimate on the horizontal axis. Panel labels: Chr1:D1M355 pv=0.34; Chr5:D5M357 pv=]

Figure 5: P-P plots of the nonparametric Kaplan-Meier estimate of the survival distribution against the parametric estimates from the proportional hazards mixture cure model with a Weibull baseline function.


More information

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 Lecture 3: Basic Statistical Tools Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 1 Basic probability Events are possible outcomes from some random process e.g., a genotype is AA, a phenotype

More information

β j = coefficient of x j in the model; β = ( β1, β2,

β j = coefficient of x j in the model; β = ( β1, β2, Regression Modeling of Survival Time Data Why regression models? Groups similar except for the treatment under study use the nonparametric methods discussed earlier. Groups differ in variables (covariates)

More information

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes by Se Hee Kim A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial

More information

Constrained estimation for binary and survival data

Constrained estimation for binary and survival data Constrained estimation for binary and survival data Jeremy M. G. Taylor Yong Seok Park John D. Kalbfleisch Biostatistics, University of Michigan May, 2010 () Constrained estimation May, 2010 1 / 43 Outline

More information

Tied survival times; estimation of survival probabilities

Tied survival times; estimation of survival probabilities Tied survival times; estimation of survival probabilities Patrick Breheny November 5 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Tied survival times Introduction Breslow approximation

More information

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees: MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

QUANTITATIVE trait analysis has many applica- several crosses, since the correlations may not be the

QUANTITATIVE trait analysis has many applica- several crosses, since the correlations may not be the Copyright 001 by the Genetics Society of America Statistical Issues in the Analysis of Quantitative Traits in Combined Crosses Fei Zou, Brian S. Yandell and Jason P. Fine Department of Statistics, University

More information

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Analysis of competing risks data and simulation of data following predened subdistribution hazards Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Sample size determination for logistic regression: A simulation study

Sample size determination for logistic regression: A simulation study Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This

More information

Combining dependent tests for linkage or association across multiple phenotypic traits

Combining dependent tests for linkage or association across multiple phenotypic traits Biostatistics (2003), 4, 2,pp. 223 229 Printed in Great Britain Combining dependent tests for linkage or association across multiple phenotypic traits XIN XU Program for Population Genetics, Harvard School

More information

The universal validity of the possible triangle constraint for Affected-Sib-Pairs

The universal validity of the possible triangle constraint for Affected-Sib-Pairs The Canadian Journal of Statistics Vol. 31, No.?, 2003, Pages???-??? La revue canadienne de statistique The universal validity of the possible triangle constraint for Affected-Sib-Pairs Zeny Z. Feng, Jiahua

More information

Duration Analysis. Joan Llull

Duration Analysis. Joan Llull Duration Analysis Joan Llull Panel Data and Duration Models Barcelona GSE joan.llull [at] movebarcelona [dot] eu Introduction Duration Analysis 2 Duration analysis Duration data: how long has an individual

More information

QUANTITATIVE trait analysis plays an important observed. These assumptions are likely to be false when

QUANTITATIVE trait analysis plays an important observed. These assumptions are likely to be false when Copyright 2004 y the Genetics Society of America DOI: 10.1534/genetics.103.023903 Mapping Quantitative Trait Loci With Censored Oservations Guoqing Diao, D. Y. Lin 1 and Fei Zou Department of Biostatistics,

More information

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University Survival Analysis: Weeks 2-3 Lu Tian and Richard Olshen Stanford University 2 Kaplan-Meier(KM) Estimator Nonparametric estimation of the survival function S(t) = pr(t > t) The nonparametric estimation

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION INVESTIGATION A New Simple Method for Improving QTL Mapping Under Selective Genotyping Hsin-I Lee,* Hsiang-An Ho,* and Chen-Hung Kao*,,1 *Institute of Statistical Science, Academia Sinica, Taipei 11529,

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Meei Pyng Ng 1 and Ray Watson 1

Meei Pyng Ng 1 and Ray Watson 1 Aust N Z J Stat 444), 2002, 467 478 DEALING WITH TIES IN FAILURE TIME DATA Meei Pyng Ng 1 and Ray Watson 1 University of Melbourne Summary In dealing with ties in failure time data the mechanism by which

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data Columbia International Publishing Journal of Advanced Computing (2013) 1: 43-58 doi:107726/jac20131004 Research Article Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis STAT 6350 Analysis of Lifetime Data Failure-time Regression Analysis Explanatory Variables for Failure Times Usually explanatory variables explain/predict why some units fail quickly and some units survive

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

Proportional hazards regression

Proportional hazards regression Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease

On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease Yuehua Cui 1 and Dong-Yun Kim 2 1 Department of Statistics and Probability, Michigan State University,

More information

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution Communications for Statistical Applications and Methods 2016, Vol. 23, No. 6, 517 529 http://dx.doi.org/10.5351/csam.2016.23.6.517 Print ISSN 2287-7843 / Online ISSN 2383-4757 A comparison of inverse transform

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky

COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky Summary Empirical likelihood ratio method (Thomas and Grunkmier 975, Owen 988,

More information

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis PROPERTIES OF ESTIMATORS FOR RELATIVE RISKS FROM NESTED CASE-CONTROL STUDIES WITH MULTIPLE OUTCOMES (COMPETING RISKS) by NATHALIE C. STØER THESIS for the degree of MASTER OF SCIENCE Modelling and Data

More information