Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci
- Gillian Logan
- 5 years ago
Abstract. When censored time-to-event data are used to map quantitative trait loci (QTL), the existence of nonsusceptible subjects poses extra challenges. If heterogeneous susceptibility is ignored or handled inappropriately, we may either fail to detect the responsible genetic factors or find spuriously significant locations. In this article, we propose an interval mapping method based on parametric mixture cure models that takes nonsusceptible subjects into consideration. The proposed model can be used to detect QTL that are responsible for differential susceptibility and/or differences in the time-to-event trait distribution. In particular, we propose a likelihood-based testing procedure with genome-wide significance levels calculated using a resampling method. The performance of the proposed method and the importance of accounting for heterogeneous susceptibility are demonstrated by simulation studies and by an application to survival data from an experiment on mice infected with Listeria monocytogenes.

Keywords: EM algorithm; Parametric proportional hazards model; QTL mapping; Time-to-event data.

1. Introduction

Mapping genes that underlie complex traits is of great interest (Lander and Schork, 1994; Glazier et al., 2002). In standard interval mapping of quantitative trait loci (QTL), the trait distribution is often modeled as a mixture of two (or more) normal components corresponding to two (or more) different genotypes at the putative QTL (Lander and Botstein, 1989; Zeng, 1993). Time-to-event data used as quantitative trait values (e.g. age at onset of cancer, time to recurrence of a tumor) have been used to identify various disease genes (Claus et al., 1990; Carter et al., 1992; Miki et al., 1994; Boyartchuk et al., 2001; Symons et al., 2002; among others). To study time-to-event traits, however, classical survival models may be more natural than a mixture of normals.
Indeed, the standard interval mapping methods for normally distributed and fully observed quantitative traits have been extended successfully to time-to-event traits subject to random censoring (e.g. Li and Thompson, 1997; Diao et al., 2004; Diao and Lin, 2005, among others). A further challenge that has not received adequate attention in the study of time-to-event traits is latent heterogeneity in susceptibility. In a population consisting of susceptible and nonsusceptible individuals, all susceptible subjects would eventually experience the event of interest in the absence of censoring, while the nonsusceptible ones can be regarded as cured, i.e. not at risk of the particular event. For example, Boyartchuk et al. (2001) considered a data set consisting of the survival times of 116 female intercross mice after infection with Listeria monocytogenes. About 30% of the mice survived beyond 240 hours to the end of the study and might be considered cured, or nonsusceptible, from a biological point of view. Besides scientific evidence for the existence of a nonsusceptible subpopulation, data with heterogeneous susceptibility usually show heavy censoring at the end of the study: the corresponding Kaplan-Meier curve has a long non-zero tail, or the histogram shows a spike at the right end (Broman, 2003). When analyzing such genetic data, failure to account for latent heterogeneous susceptibility may result in a substantial loss of power to detect the responsible genetic factors and/or lead to spuriously significant results (Farewell, 1982; Hodge and Elston, 1994; Hodge et al., 2001). Statistical methods that incorporate mixed susceptibility into the modeling and analysis are therefore needed. For the mouse survival data mentioned above, Broman (2003) proposed a two-part model: a normal distribution for the log survival times of mice with observed deaths and a point mass at the end of the study for the observed survivors. This two-part model may be useful for detecting QTL when there is a single common administrative censoring time, as is the case for this particular data set.
Under general random censoring, however, such a two-part separation of subjects may not be reasonable. It is therefore of great practical interest to develop general statistical methods for QTL mapping of randomly censored time-to-event traits in populations with latent heterogeneous susceptibility. In this paper, we propose a parametric mixture cure model for QTL mapping when the primary trait is a randomly censored time-to-event outcome from a population of mixed susceptibility. In survival analysis, cure models have been developed to handle heterogeneous susceptibility and are applicable under general
random censoring (Kuk and Chen, 1992; Sy and Taylor, 2000; Peng and Dear, 2000; Lu and Ying, 2004, among others). Cure models may also be viewed as a special case of competing risks models in which the cure event is unobserved (Fine, 1999). To adapt these cure models to QTL mapping, several challenges must be overcome. In particular, we must account for missing covariates, because the genotypes at a putative QTL are unknown. Furthermore, we need appropriate genome-wide critical values for the proposed test statistics at given nominal levels: interval mapping tests are usually carried out at multiple locations along the chromosomes, and the test statistics are typically not independent, so obtaining appropriate genome-wide critical values is crucial in genome-wide QTL mapping. The paper is organized as follows. In Section 2, we introduce the parametric mixture cure model and propose an EM-based likelihood ratio test (LRT). When the log-normal model is used for the event times of susceptible subjects, the proposed cure model generalizes the two-part model of Broman (2003) to allow for latent susceptibility status under general random right censoring. When the parametric proportional hazards model is used for the time-to-event trait of susceptible subjects, the proposed cure model extends the parametric proportional hazards (PPH) model of Diao et al. (2004) to handle heterogeneous susceptibility. Because the proposed cure model characterizes QTL effects on susceptibility and/or on the survival distribution of susceptible subjects, it can be used to test these effects separately or simultaneously. It can also account for potential effects of other risk factors by incorporating covariates into the regression model in a natural way. The issue of genome-wide significance levels is also addressed in Section 2. Recently, Diao et al. (2004), Zou et al.
(2004), and Lin (2005) introduced an efficient resampling method for assessing genome-wide significance levels. The resampling method is computationally less intensive and is applicable to many complex genetic models (Zou et al., 2004). We therefore adopt this resampling method to obtain genome-wide thresholds for the proposed likelihood ratio tests at given nominal levels. In Section 3, the performance of the proposed methods and the importance of accounting for heterogeneous susceptibility are demonstrated by simulation studies and by an application to survival times of intercross mice following infection with Listeria
monocytogenes (Boyartchuk et al., 2001; Broman, 2003). Some concluding remarks are given in Section 4.

2. Methods

In this section, we first propose the general parametric mixture cure model. We then formulate the likelihood-based genome-wide tests and discuss the determination of genome-wide thresholds.

2.1. Notation and Models

Consider a sample of $n$ individuals with mixed susceptibility. Let $T_i$ denote the potential time-to-event trait for individual $i$, $i = 1, \ldots, n$. In addition, $T_i^* < \infty$ denotes the failure time of a susceptible subject, and $\eta_i$ takes the value 1 or 0 according to whether the $i$th subject is susceptible or not. Thus $T_i$ has the decomposition

$$T_i = \eta_i T_i^* + (1 - \eta_i)\cdot\infty, \qquad (1)$$

where $0 \cdot \infty$ is defined to be 0. The observation on the trait of the $i$th individual consists of two components: the observed event time $Y_i = \min(T_i, C_i)$ and the censoring indicator $\delta_i = I(T_i \le C_i)$, where $C_i$ denotes a random censoring time that is assumed to be noninformative (Kalbfleisch and Prentice, 2002). Note that $\delta_i = 1$ implies $\eta_i = 1$, but $\eta_i$ is unobservable when $\delta_i = 0$; the susceptibility status of censored subjects is therefore uncertain.

Suppose we have data on trait values and a set of genetic markers. Let $M_i$ denote the multiple-marker genotype information of the $i$th subject. We consider a putative QTL with two alleles $Q$ and $q$ and denote its unknown genotype by $G_i$, coded as a dummy vector over all possible genotypes. For example, in an $F_2$ intercross design, $G_i$ can be recorded as a two-dimensional vector with three possible values, $(1, 0)$, $(0, 1)$ and $(0, 0)$, corresponding to the genotypes $QQ$, $Qq$ and $qq$, respectively. In addition, let $Z_i$ denote other observed covariates of interest, such as environmental exposures, which are assumed to be independent of $G_i$. For a susceptible subject, i.e.
$\eta_i = 1$, the failure time $T_i^*$ is assumed to follow a parametric distribution with density $f(t \mid Z_i, G_i; \beta, \beta_g, \theta)$, where the parameter $\beta_g$ depicts the effects of the QTL on the
time-to-event trait distribution of susceptible subjects, $\beta$ represents the corresponding covariate effects, and $\theta$ is the remaining parameter of the parametric distribution $f$. More specifically, a linear regression model for the QTL and covariate effects leads to the simple model $f(t \mid Z_i, G_i; \beta, \beta_g, \theta) = f(t; \beta^\top Z_i + \beta_g^\top G_i, \theta)$. For the binary susceptibility indicator $\eta_i$, it is natural to consider the logistic regression model

$$\Pr(\eta_i = 1 \mid G_i, Z_i) = \frac{\exp(\gamma^\top \tilde{Z}_i + \gamma_g^\top G_i)}{1 + \exp(\gamma^\top \tilde{Z}_i + \gamma_g^\top G_i)}, \qquad (2)$$

where $\tilde{Z}_i = (1, Z_i^\top)^\top$, so that $\gamma$ contains an intercept term, and the regression parameter $\gamma_g$ represents the QTL effects on susceptibility.

At any putative QTL location other than a marker, the genotype $G_i$ is unknown. A natural idea is to treat $G_i$ as missing data, which can be handled by an EM algorithm. The conditional probability of $G_i$ given the marker information $M_i$ at a specific location $d$ is denoted by $\Pr(G_i \mid M_i; d)$. Under the assumptions of no crossover interference and no genotyping errors, $\Pr(G_i \mid M_i; d)$ is determined by the two flanking markers and the position of the QTL within the interval. For many experimental cross designs, explicit formulas for $\Pr(G_i \mid M_i; d)$ are available in books and papers, for example Lynch and Walsh (1998, p. 435, equation (15.2)).

The complete data consist of $n$ independent copies of $\mathcal{C} = \{Y, \delta, \eta, G, M, Z\}$, while the observed data consist of $n$ independent copies of $\mathcal{O} = \{Y, \delta, M, Z\}$. Let $\mu = (\gamma, \gamma_g, \beta, \beta_g, \theta)$. The complete-data likelihood function is

$$L_C(\mu; d) = \prod_{i=1}^n \left\{\frac{\exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_i)}{1 + \exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_i)}\right\}^{\eta_i} \left\{\frac{1}{1 + \exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_i)}\right\}^{1-\eta_i} \times \prod_{i=1}^n \Big[\{f(Y_i \mid Z_i, G_i; \beta, \beta_g, \theta)\}^{\delta_i}\{1 - F(Y_i \mid Z_i, G_i; \beta, \beta_g, \theta)\}^{1-\delta_i}\Big]^{\eta_i} \times \prod_{i=1}^n \Pr(G_i \mid M_i; d), \qquad (3)$$

where $F(t \mid Z_i, G_i; \beta, \beta_g, \theta) = \int_0^t f(s \mid Z_i, G_i; \beta, \beta_g, \theta)\,ds$ denotes the cumulative distribution function of the survival time of susceptible subjects.
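For an $F_2$ intercross, the conditional genotype probabilities $\Pr(G_i \mid M_i; d)$ entering (3) can be computed directly under the same assumptions (no interference, no genotyping error). The Python sketch below is our illustration, not the authors' code. It assumes Haldane's map function for converting centimorgans to recombination fractions and uses the fact that unphased $F_2$ genotypes form a Markov chain along the chromosome. The function names are hypothetical, and genotypes are indexed 0, 1, 2 for $QQ$, $Qq$, $qq$ to match the coding $G_1, G_2, G_3$.

```python
import numpy as np


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - np.exp(-2.0 * d_cm / 100.0))


def f2_transition(r):
    """3x3 transition matrix between unphased F2 genotypes (QQ, Qq, qq)
    across an interval with recombination fraction r (each row sums to 1)."""
    s = 1.0 - r
    return np.array([
        [s * s, 2 * r * s, r * r],
        [r * s, s * s + r * r, r * s],
        [r * r, 2 * r * s, s * s],
    ])


def qtl_genotype_probs(left, right, d_left, d_right):
    """Pr(G = G_j | flanking marker genotypes) for j = QQ, Qq, qq.

    left/right: genotype index (0=QQ, 1=Qq, 2=qq) at the flanking markers;
    d_left/d_right: distances (cM) from each flanking marker to the QTL.
    """
    prior = np.array([0.25, 0.5, 0.25])  # F2 genotype frequencies
    t_left = f2_transition(haldane_r(d_left))
    t_right = f2_transition(haldane_r(d_right))
    # Joint Pr(M_L) Pr(G | M_L) Pr(M_R | G); the Markov property makes
    # M_R conditionally independent of M_L given the QTL genotype G.
    w = prior[left] * t_left[left, :] * t_right[:, right]
    return w / w.sum()
```

For a QTL midway through a 10 cM interval with both flanking markers $QQ$, `qtl_genotype_probs(0, 0, 5.0, 5.0)` puts about 0.995 of the probability on $QQ$; placing the QTL on the left marker (`d_left=0`) returns that marker's genotype with probability 1.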
It is straightforward to verify that, using the complete-data likelihood (3), the observed-data likelihood is a mixture of several components corresponding to different
genotypes and susceptibilities:

$$L(\mu; d) = \prod_{i=1}^n \sum_{j=1}^K p_i(j)\left[\{f(Y_i \mid Z_i, G_j; \beta, \beta_g, \theta)\}^{\delta_i}\{1 - F(Y_i \mid Z_i, G_j; \beta, \beta_g, \theta)\}^{1-\delta_i}\frac{\exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_j)}{1 + \exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_j)} + \frac{1 - \delta_i}{1 + \exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_j)}\right], \qquad (4)$$

where $p_i(j) = \Pr(G_i = G_j \mid M_i; d)$, $K$ denotes the number of possible genotypes at the putative QTL, and $\{G_j\}$ denote the coded values of these genotypes. For example, for an $F_2$ intercross population we may use $G_1 = (1, 0)$, $G_2 = (0, 1)$ and $G_3 = (0, 0)$. Standard interval mapping methods (Lander and Botstein, 1989; Zeng, 1993) scan the chromosome for QTL at a specified step size, e.g. 1 or 2 centimorgans (cM), using a likelihood ratio test (LRT). In all our numerical studies, we evaluate the LRT at 1 cM steps. To construct such an LRT profile over a chromosomal region, the maximum likelihood estimate (MLE) $\hat\mu$ under the alternative model and the restricted MLE $\tilde\mu$ under the null hypothesis need to be calculated at each position $d$.

2.2. Hypotheses and LRT

Under the proposed parametric mixture cure model, the QTL has two types of effects on the trait distribution: $\gamma_g$ is the long-term effect on susceptibility and $\beta_g$ is the short-term effect on the survival of susceptible subjects. The proposed mixture cure model can therefore be used to test the following hypotheses:

No overall QTL effect, $H_0: \gamma_g = 0$ and $\beta_g = 0$ vs. $H_1: \gamma_g \ne 0$ or $\beta_g \ne 0$;
No QTL effect on susceptibility, $H_{0\gamma}: \gamma_g = 0$ vs. $H_{1\gamma}: \gamma_g \ne 0$;
No QTL effect on the survival of susceptible subjects, $H_{0\beta}: \beta_g = 0$ vs. $H_{1\beta}: \beta_g \ne 0$.

To test these hypotheses, the LRT statistic $LR(d) = 2\ln\{L(\hat\mu; d)/L(\tilde\mu; d)\}$ is calculated at each location $d$. Under the null hypothesis $H_0$, the restricted MLE $\tilde\mu$ does not depend on the testing location $d$, so it needs to be calculated only once per data set. However, $\hat\mu$, $LR(d)$, and $\tilde\mu$ under the other null hypotheses do depend on $d$, because $\Pr(G_i \mid M_i; d)$ varies with $d$. We employ the EM algorithm (Dempster et al., 1977) to obtain the parameter estimates.
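To make (4) concrete, the sketch below evaluates each subject's mixture components for the Weibull proportional hazards version of the model used later in the paper, assuming no covariates besides the QTL genotype. Normalizing the components also yields the posterior quantities needed in the E-step described next, and the last helper converts log-likelihoods into a LOD score. This is an illustrative Python sketch under these assumptions, not the authors' implementation; the function names and the parameter tuple layout are hypothetical.

```python
import numpy as np

# Dummy coding of the F2 genotypes QQ, Qq, qq (G_1, G_2, G_3), as in the text.
G = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])


def subject_components(y, delta, p, gamma0, gamma_g, beta_g, theta1, theta2):
    """Per-genotype terms of (4) for one subject under a Weibull PH cure model
    with no covariates. Returns (susceptible, cured) arrays of length K = 3."""
    pi = 1.0 / (1.0 + np.exp(-(gamma0 + G @ gamma_g)))  # Pr(eta = 1 | G_j), eq. (2)
    risk = np.exp(G @ beta_g)                            # exp(beta_g' G_j)
    surv = np.exp(-theta1 * y**theta2 * risk)            # 1 - F(y | G_j)
    if delta == 1:
        dens = theta1 * theta2 * y**(theta2 - 1.0) * risk * surv  # f(y | G_j)
        return p * pi * dens, np.zeros_like(p)
    return p * pi * surv, p * (1.0 - pi)


def loglik_and_estep(data, params):
    """Observed-data log-likelihood log L(mu; d) plus, for each subject,
    E(eta | O), Pr(G = G_j | O) and E{eta I(G = G_j) | O}."""
    ll, e_eta, post_g, post_eta_g = 0.0, [], [], []
    for y, delta, p in data:
        sus, cured = subject_components(y, delta, p, *params)
        total = sus.sum() + cured.sum()  # one factor L_i(mu; d) of (4)
        ll += np.log(total)
        e_eta.append(sus.sum() / total)
        post_g.append((sus + cured) / total)
        post_eta_g.append(sus / total)
    return ll, np.array(e_eta), np.array(post_g), np.array(post_eta_g)


def lod(ll_alt, ll_null):
    """LOD(d) = log10{L(mu_hat; d) / L(mu_tilde; d)} = LR(d) / (2 ln 10)."""
    return (ll_alt - ll_null) / np.log(10.0)
```

Here `data` is a list of `(y, delta, p)` triples, with `p` a NumPy array holding $(p_i(1), p_i(2), p_i(3))$, and `params = (gamma0, gamma_g, beta_g, theta1, theta2)`. An M-step would then maximize the expected complete-data log-likelihood using these posterior weights.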
In the EM algorithm, we need to calculate the conditional expectation of
$l_C(\mu; d) = \log L_C(\mu; d)$ in (3) with respect to the unobserved quantities $\{\eta_i, G_i\}$, given the current parameter estimates and the observed data $\mathcal{O}_i = \{Y_i, \delta_i, M_i, Z_i\}$. For example, consider the parametric proportional hazards mixture cure model in which the hazard function for the survival time of a susceptible subject is specified as

$$\lambda(t \mid G_i, Z_i) = \lambda_0(t; \theta)\exp(\beta^\top Z_i + \beta_g^\top G_i). \qquad (5)$$

Denote the cumulative baseline hazard function by $\Lambda_0(t; \theta) = \int_0^t \lambda_0(s; \theta)\,ds$. Substituting the density and distribution functions in (3) by their counterparts under the proportional hazards model (5), simple algebra yields

$$L_C(\mu; d) = \prod_{i=1}^n \left\{\frac{\exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_i)}{1 + \exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_i)}\right\}^{\eta_i}\left\{\frac{1}{1 + \exp(\gamma^\top\tilde Z_i + \gamma_g^\top G_i)}\right\}^{1-\eta_i} \times \prod_{i=1}^n \{\lambda_0(Y_i; \theta)\exp(\beta^\top Z_i + \beta_g^\top G_i)\}^{\delta_i\eta_i}\exp\{-\eta_i\Lambda_0(Y_i; \theta)\exp(\beta^\top Z_i + \beta_g^\top G_i)\} \times \prod_{i=1}^n \Pr(G_i \mid M_i; d). \qquad (6)$$

The conditional expectation of the complete-data log-likelihood from (6) can be written as a function of the conditional expectations of $\eta_i$, $I(G_i = G_j)$ and $\eta_i I(G_i = G_j)$, $j = 1, \ldots, K$. Thus, in each E-step, it suffices to compute the conditional expectations of these quantities given the current parameter values and the observed data. In the $k$th iteration, these conditional expectations, denoted by $E(\eta_i \mid \mathcal{O}_i, \mu^{(k)})$, $E\{I(G_i = G_j) \mid \mathcal{O}_i, \mu^{(k)}\}$ and $E\{\eta_i I(G_i = G_j) \mid \mathcal{O}_i, \mu^{(k)}\}$, can be derived explicitly. To simplify the notation, the superscript $(k)$ on the parameters is suppressed in the following formulas:

$$E(\eta_i \mid \mathcal{O}_i, \mu^{(k)}) = \begin{cases} 1, & \delta_i = 1,\\ D_{i0}^{-1}\sum_{j=1}^K e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j)p_i(j), & \delta_i = 0,\end{cases}$$

$$E\{I(G_i = G_j) \mid \mathcal{O}_i, \mu^{(k)}\} = \begin{cases} D_{i1}^{-1} e^{\beta_g^\top G_j} e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j)p_i(j), & \delta_i = 1,\\ D_{i0}^{-1}\big[e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j) + \{1 - \pi_i(G_j)\}\big]p_i(j), & \delta_i = 0,\end{cases}$$

$$E\{\eta_i I(G_i = G_j) \mid \mathcal{O}_i, \mu^{(k)}\} = \begin{cases} D_{i1}^{-1} e^{\beta_g^\top G_j} e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j)p_i(j), & \delta_i = 1,\\ D_{i0}^{-1} e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j)p_i(j), & \delta_i = 0,\end{cases}$$
where $p_i(j) = \Pr(G_i = G_j \mid M_i; d)$, $\pi_i(G_j) = \Pr(\eta_i = 1 \mid G_i = G_j, Z_i)$ as defined in (2), and

$$D_{i1} = \sum_{j=1}^K e^{\beta_g^\top G_j} e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j)p_i(j), \qquad D_{i0} = \sum_{j=1}^K \big[e^{-\Lambda_0(Y_i;\theta)\exp(\beta^\top Z_i + \beta_g^\top G_j)}\pi_i(G_j) + 1 - \pi_i(G_j)\big]p_i(j).$$

In the M-step, we update the parameters by maximizing the expected complete-data log-likelihood. The E- and M-steps are iterated until convergence.

The LOD score is defined as $\mathrm{LOD}(d) = \log_{10}\{L(\hat\mu; d)/L(\tilde\mu; d)\} = LR(d)/(2\ln 10)$. Evaluating the LOD score at each location yields a LOD profile along the chromosome. The location with the largest LOD score can be used as an estimate of the QTL location, provided that this largest value exceeds the threshold for a given significance level.

2.3. Genome-wide Threshold

Assessing genome-wide significance is challenging when QTL are searched for over the whole genome, because the tests are performed at multiple locations and the test statistics are not independent. Pointwise significance levels based on the $\chi^2$ approximation without a multiplicity correction are therefore inappropriate, while the Bonferroni correction becomes too conservative when the number of tests is large. Recently, Diao et al. (2004), Zou et al. (2004), and Lin (2005) proposed a novel numerical method for obtaining genome-wide thresholds using a resampling approach. The method is computationally feasible and applicable to many genetic models. Zou et al. (2004) and Lin (2005) discussed the performance of this resampling method in detail and compared it with competing methods (e.g. Rebai et al., 1995; Dupuis and Siegmund, 1999). We employ this resampling method to assess genome-wide significance. Using the well-known asymptotic equivalence between the likelihood ratio test and the score test (Cox and Hinkley, 1974), the resampling approach computes an empirical threshold for the LRT by generating a large number of randomly perturbed score test statistics.
At each location $d$, the score test statistic is based on a sum of independent and identically distributed (i.i.d.) terms with mean zero, so it is convenient to perturb
each term with an independent standard Gaussian random variable, as discussed in Lin (2005). More specifically, let

$$U(\mu; d) = \sum_{i=1}^n U_i(\mu; d) = \sum_{i=1}^n \partial l_i(\mu; d)/\partial\mu \qquad (7)$$

denote the score function at location $d$, where $l_i(\mu; d)$ is the log-likelihood contribution of subject $i$. To test a given null hypothesis, the components of the score function corresponding to the parameters being tested are used to construct the test statistic. For example, to test $H_0: \gamma_g = 0$ and $\beta_g = 0$, the corresponding score function is

$$U_{\gamma_g\beta_g}(\mu; d) = \begin{pmatrix}\sum_{i=1}^n \partial l_i(\mu; d)/\partial\gamma_g\\ \sum_{i=1}^n \partial l_i(\mu; d)/\partial\beta_g\end{pmatrix}.$$

The score test statistic is defined as

$$W(d) = n^{-1}\hat U_{\gamma_g\beta_g}^\top(\tilde\mu; d)\hat V^{-1}(d)\hat U_{\gamma_g\beta_g}(\tilde\mu; d), \qquad (8)$$

where $\hat U_{\gamma_g\beta_g}$ is a consistent estimator of $U_{\gamma_g\beta_g}$ under the null hypothesis, whose formulation is derived in the Appendix. The covariance matrix can be estimated by $\hat V(d) = n^{-1}\sum_{i=1}^n \hat U_{\gamma_g\beta_g,i}(d)\hat U_{\gamma_g\beta_g,i}^\top(d)$. To approximate the distribution of $W(d)$, we generate a large number, say $R = 10000$, of

$$W^*(d) = n^{-1}U_{\gamma_g\beta_g}^{*\top}(\tilde\mu; d)\hat V^{-1}(d)U^*_{\gamma_g\beta_g}(\tilde\mu; d) \qquad (9)$$

using the randomly perturbed score functions $U^*_{\gamma_g\beta_g}(d) = \sum_{i=1}^n \hat U_{\gamma_g\beta_g,i}(d)X_i$, $X_i \sim N(0, 1)$, while holding the observed data $(Y, \delta, M, Z)$ fixed. From each realization of $(X_1, \ldots, X_n)$, we calculate $\sup_d W^*(d)$. The threshold for genome-wide significance level $\alpha$ is then the $100(1-\alpha)$th percentile of the $R$ simulated values of $\sup_d W^*(d)$. The resampling method involves only generating standard normal random variables and some straightforward calculations.

Remark: In the presence of missing marker genotypes, the conditional probabilities of the putative QTL genotypes can be calculated from the two closest flanking markers with observed genotypes. The proposed approach easily handles missing and/or dominant markers, since only the conditional probabilities $\{p_i(j)\}$ require modified formulas (Jiang and Zeng, 1997; Zou et al., 2004). We have considered both types of complications in the following real data application.
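The resampling step can be sketched as follows, assuming the per-subject score contributions $\hat U_{\gamma_g\beta_g,i}(\tilde\mu; d)$ have already been computed on a grid of locations (their derivation, given in the Appendix, is not reproduced here). The function name and array layout are hypothetical. The sketch perturbs the fixed contributions with standard normal multipliers, forms $W^*(d)$ as in (9), and takes the $100(1-\alpha)$th percentile of $\sup_d W^*(d)$.

```python
import numpy as np


def resampling_threshold(U, alpha=0.05, R=10000, seed=0):
    """Genome-wide threshold for sup_d W(d) by Gaussian perturbation.

    U: array of shape (D, n, q) holding per-subject score contributions
       U_i(mu_tilde; d) for the q tested parameters at D grid locations.
    Returns the 100(1 - alpha)th percentile of R draws of sup_d W*(d).
    """
    rng = np.random.default_rng(seed)
    D, n, q = U.shape
    # V_hat(d) = n^{-1} sum_i U_i(d) U_i(d)^T, inverted once per location.
    V_inv = np.linalg.inv(np.einsum("dni,dnj->dij", U, U) / n)
    sup_w = np.empty(R)
    for r in range(R):
        X = rng.standard_normal(n)             # X_i ~ N(0, 1); data held fixed
        U_star = np.einsum("n,dnq->dq", X, U)  # sum_i U_i(d) X_i at every d
        W_star = np.einsum("dq,dqk,dk->d", U_star, V_inv, U_star) / n
        sup_w[r] = W_star.max()                # sup over the location grid
    return np.quantile(sup_w, 1.0 - alpha)
```

Because $W(d)$ is asymptotically equivalent to $LR(d)$, a threshold $t$ on this scale corresponds to a LOD threshold of $t/(2\ln 10)$. With a single location and a single tested parameter, $W^*(d)$ is exactly $\chi^2_1$ distributed given the data, so the returned value should be close to 3.84 at $\alpha = 0.05$; this gives a simple sanity check.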
3. Numerical Results

In this section, we first illustrate the proposed mixture cure models with an application to a real data set. We then report the results of simulations conducted to assess the performance of the proposed methods under various settings motivated by the real data example.

3.1. Real Data Example

To illustrate our methods, we consider data from a study of the survival of 116 female mice from an intercross between the BALB/cByJ and C57BL/6ByJ strains after infection with Listeria monocytogenes (Boyartchuk et al., 2001; Broman, 2003). The mice were genotyped at 133 markers on 20 chromosomes, including 2 on the X chromosome. In this data set, the only censoring occurred at the end of the study. From a biological point of view, the mice surviving more than 240 hours may have recovered from the infection. We applied the proposed mixture cure model to detect QTL effects on several chromosomes. More specifically, we consider the parametric proportional hazards mixture cure model (6) with the two-parameter Weibull baseline hazard function $\lambda_0(t; \theta) = \theta_1\theta_2 t^{\theta_2 - 1}$. For the intercross mice data, the genotypes $G_i$ are recorded as a two-dimensional vector with three possible values, $(1, 0)$, $(0, 1)$ and $(0, 0)$, corresponding to the genotypes $QQ$, $Qq$ and $qq$, respectively. The EM algorithm and LR tests were carried out as described in Section 2. The threshold at the 5% genome-wide significance level obtained from the resampling approach is … The corresponding estimates of the regression parameters at the locations with the largest LOD scores on each chromosome are presented in Table 1. Recall that the regression parameters were estimated with the putative QTL location held fixed, for the purpose of constructing the likelihood ratio test statistic at each $d$.
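Under this Weibull baseline, the quantities entering (4) to (6) have closed forms. The short derivation below is our own illustration, following directly from (2), (5) and the Weibull hazard with no covariates other than the genotype; it also shows why plotting $\log\{-\log S\}$ against $\log t$ is a natural check of the Weibull assumption in the model diagnostics later in this section.

$$\Lambda_0(t;\theta) = \int_0^t \theta_1\theta_2 s^{\theta_2-1}\,ds = \theta_1 t^{\theta_2}, \qquad S(t \mid G_j, \eta = 1) = \exp\{-\theta_1 t^{\theta_2} e^{\beta_g^\top G_j}\},$$

$$\log\{-\log S(t \mid G_j, \eta = 1)\} = \log\theta_1 + \beta_g^\top G_j + \theta_2 \log t,$$

$$S(t \mid G_j) = \pi(G_j)\exp\{-\theta_1 t^{\theta_2} e^{\beta_g^\top G_j}\} + 1 - \pi(G_j) \;\to\; 1 - \pi(G_j) \quad (t \to \infty),$$

where $\pi(G_j) = \exp(\gamma_0 + \gamma_g^\top G_j)/\{1 + \exp(\gamma_0 + \gamma_g^\top G_j)\}$. Within each genotype group, the log-log curve of the susceptible survival function is therefore a straight line in $\log t$ with slope $\theta_2$, and the overall survival curve levels off at the cure fraction $1 - \pi(G_j)$.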
(INSERT TABLE 1 HERE)

We also carried out a LOD score analysis of the same data using the PPH model, to gain insight into the importance of accounting for possible heterogeneous susceptibility. The profiles of LOD($\gamma_g$, $\beta_g$) for the cure model and LOD($\beta_g$)
for the PPH model for testing $H_0$ are shown in the top plot of Figure 1. The two methods show some discrepancies, especially on chromosomes 1 and 5, which are discussed further below.

(INSERT FIGURE 1 HERE)

To test more specific QTL effects, we proceed to test the hypotheses $H_{0\gamma}$ and $H_{0\beta}$ using the proposed mixture cure model. The corresponding LOD profiles are shown in the middle and bottom plots of Figure 1. The plots show that only on chromosome 13 do all three LOD peaks from the cure model exceed the genome-wide thresholds. This indicates that the QTL on chromosome 13 has significant joint and separate effects on susceptibility and on the survival times of susceptible mice. Other interesting patterns were found on chromosomes 1 and 5. On chromosome 1, there is no significant effect on susceptibility ($H_{0\gamma}: \gamma_g = 0$, Figure 1: middle plot), but there is a significant effect on the survival distribution of susceptible mice ($H_{0\beta}: \beta_g = 0$, Figure 1: bottom plot). In contrast, the QTL on chromosome 5 significantly affects only the susceptibility of the mice (Figure 1: middle plot) and not the survival distribution of susceptible mice (Figure 1: bottom plot). We next examine chromosomes 1 and 5 more closely to interpret the differences between our results and the findings of the PPH method. We first examine the survival distributions by genotype at marker D1M355, the marker closest to the estimated QTL position (81 cM) on chromosome 1. Various survival distribution plots are presented in Figure 2.

(INSERT FIGURE 2 HERE)

The Kaplan-Meier curves of the censored survival times grouped by D1M355 genotype are displayed in the upper left panel; the Kaplan-Meier curves for the observed failure times only are displayed in the upper right panel.
The lower two plots show the corresponding survival distributions estimated under the proposed mixture cure model using the parameter estimates in Table 1. The upper left Kaplan-Meier plot shows that the three survival curves approach nearly the same level at the end of the study, indicating that the cure proportions of the three groups are
close. This similarity in the tails obscures the vertical differences among the survival curves, whereas such differences are much more obvious when only the survival distributions of susceptible subjects are considered, as shown in the upper and lower right plots. Thus the PPH model, which essentially assumes no cured fraction and looks for differences in the upper left plot, does not yield significance, whereas the proposed parametric mixture cure model, which considers both the upper left and upper right plots, yields significance on chromosome 1. These findings are consistent with the results reported in Broman (2003) and Diao et al. (2004).

Based on our testing results, chromosome 5 appears to be a typical case in which the QTL affects only susceptibility. To see this, Figure 3 displays four survival distribution plots for marker D5M357, in the same layout as Figure 2. The two plots on the left show that the cure fractions of the three groups are very different: the estimated cure fractions are 0.64, 0.29 and 0.03 for the genotypes AA, Aa and aa, respectively. However, the two plots on the right show that the survival distributions of susceptible subjects are very similar among the three groups. Because the tails are well separated, the overall survival curves can be distinguished from one another even without modeling a cure fraction. Hence, the QTL effect on chromosome 5 can be detected by both the cure model and the PPH model.

(INSERT FIGURE 3 HERE)

Model Diagnostic

The proportional hazards mixture cure model with a two-parameter Weibull baseline hazard function is used in the analysis of the listeria data. In Figures 2 and 3, the estimated survival curves (lower plots) look very similar to the observed Kaplan-Meier curves (upper plots). Model diagnostics and goodness-of-fit analysis are critical in practical applications of parametric models.
In this subsection, we formally examine the goodness of fit of the proposed parametric mixture cure model to the listeria data, which have no covariates other than the genotype groups. We first assess the Weibull baseline hazard assumption for the survival distribution of susceptible subjects by examining each marker genotype group separately. More specifically,
at a specific marker and for each genotype group, the overall survival function $S(t)$ was estimated by the nonparametric Kaplan-Meier estimator, denoted by $\hat S(t)$. Assuming a cured fraction, the overall survival function can be written as $S(t) = pS_s(t) + (1 - p)$, where $p$ is the probability of being susceptible and $S_s(t)$ denotes the survival function of susceptible subjects in this genotype group. Hence $S_s(t) = \{S(t) - (1 - p)\}/p$, and since $1 - p = S(\infty)$, a consistent nonparametric estimate of $S_s(t)$ is $\hat S_s(t) = \{\hat S(t) - \hat S(\infty)\}/\{1 - \hat S(\infty)\}$. To check the Weibull assumption for the survival distribution of susceptible subjects, we plot $\log\{-\log \hat S_s(t)\}$ against $\log t$, which should be linear under a Weibull distribution. The plots for markers D1M355 and D5M357 are presented in Figure 4; they are approximately linear, supporting the Weibull baseline assumption. There appear to be some deviations from a straight line at the beginning of the study, possibly because the animals were not at risk of death immediately after infection.

(INSERT FIGURE 4 HERE)

We next examine the overall fit of the proposed mixture cure model to the listeria data graphically and numerically. The population survival function $S^*(t)$ was estimated nonparametrically by the Kaplan-Meier estimate $\hat S^*(t)$. Under the proposed parametric mixture cure model, the survival function $S^*(t; \mu)$ is estimated by fixing the parameters at their estimated values $\hat\mu$ (from Table 1). We plot $\hat S^*(t)$ against $S^*(t; \hat\mu)$; the resulting P-P plot should lie close to the diagonal line if the model fits well. The results for markers D1M355 and D5M357 are shown in Figure 5. Two-sample Kolmogorov-Smirnov tests comparing $\hat S^*(t)$ and $S^*(t; \hat\mu)$ at these two markers yielded p-values of 0.34 and 0.18, respectively.
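The Weibull check above can be sketched in a few lines. This Python illustration is not the authors' code: it computes a Kaplan-Meier estimate from scratch, takes $\hat S(\infty)$ to be the Kaplan-Meier value at the last observed failure time (an assumption; other plateau estimates are possible), and returns the points of the $\log\{-\log \hat S_s(t)\}$ versus $\log t$ plot. Under a Weibull model with $S_s(t) = \exp(-\theta_1 t^{\theta_2})$, these points lie near a line with slope $\theta_2$ and intercept $\log\theta_1$. Function names are hypothetical.

```python
import numpy as np


def kaplan_meier(y, delta):
    """Kaplan-Meier estimate at the distinct failure times; subjects censored
    at a failure time are counted as still at risk at that time."""
    y, delta = np.asarray(y, float), np.asarray(delta, int)
    times = np.unique(y[delta == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(y >= t)
        events = np.sum((y == t) & (delta == 1))
        s *= 1.0 - events / at_risk
        surv.append(s)
    return times, np.array(surv)


def weibull_check(y, delta):
    """Points (log t, log{-log S_s_hat(t)}) with
    S_s_hat = (S_hat - S_hat(inf)) / (1 - S_hat(inf))."""
    times, s = kaplan_meier(y, delta)
    s_inf = s[-1]                      # plateau: estimated cure fraction 1 - p
    s_sus = (s - s_inf) / (1.0 - s_inf)
    keep = (s_sus > 0) & (s_sus < 1)   # log{-log} is undefined at 0 and 1
    return np.log(times[keep]), np.log(-np.log(s_sus[keep]))
```

For one genotype group, `np.polyfit(x, z, 1)` on the returned points gives rough estimates of the slope and intercept; restricting the fit to the middle range of $\hat S_s(t)$ avoids the noisy extremes.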
Taken together, these results and plots indicate no serious violation of the assumptions of the proposed proportional hazards mixture cure model with a Weibull baseline hazard function.

(INSERT FIGURE 5 HERE)

3.2. Simulations
A series of simulation studies was conducted to assess the performance of the proposed cure model under practical settings. For comparison, we also present results from the PPH model, which assumes no cured fraction. Survival times were generated for an $F_2$ population of mixed susceptibility, mimicking the settings of the real data example above. No covariates other than genetic factors are included. Specifically, we consider the parametric proportional hazards mixture cure model

$$\Pr(\eta_i = 1 \mid G_i) = \frac{\exp(\gamma_0 + \gamma_g^\top G_i)}{1 + \exp(\gamma_0 + \gamma_g^\top G_i)}, \qquad \lambda(t \mid G_i, \eta_i = 1) = \lambda_0(t; \theta)e^{\beta_g^\top G_i},$$

where $G_i$ is coded for the $F_2$ population as before and $\lambda_0(t; \theta)$ is the Weibull baseline hazard function. The independent censoring times $C_i$ are generated from the uniform distribution $U(0, 10)$. Under this general random censoring, it is no longer appropriate simply to classify all censored individuals as cured. A single chromosome of total length 100 cM is simulated, and the marker genotypes are generated from a Markov chain at evenly spaced positions, with spacings of 10 cM and 20 cM. Under the alternative hypotheses, the QTL is placed at 35 cM with different combinations of long-term and short-term effects. Under each scenario, we perform 1000 runs with sample size $n = 300$. In each run, we resample … times. To examine the performance of the LOD score tests based on the cure model and the PPH model, the null hypothesis $H_0: \gamma_g = 0$ and $\beta_g = 0$ is tested under various proportions of nonsusceptible subjects. The genome-wide thresholds are obtained by the resampling method described in Section 2, and the estimated threshold values are presented in Table 2.

(INSERT TABLE 2 HERE)

The empirical type I error rates and powers of the cure model and PPH model methods are summarized in Table 3.
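A data-generating sketch for this design is given below. It is our illustration, not the authors' simulation code: the Weibull parameters used in the paper are not stated here, so `theta` is an assumed placeholder; the map function is assumed to be Haldane's; and the function names are hypothetical. Marker and QTL genotypes are drawn from the $F_2$ Markov chain along the chromosome, susceptibility from the logistic model, susceptible failure times by inverting the Weibull proportional hazards survival function, and censoring times from $U(0, 10)$.

```python
import numpy as np

# Dummy coding of the F2 genotypes QQ, Qq, qq (index 0, 1, 2), as in the text.
G_CODES = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])


def f2_transition(r):
    """Transition matrix between unphased F2 genotypes for recombination fraction r."""
    s = 1.0 - r
    return np.array([[s * s, 2 * r * s, r * r],
                     [r * s, s * s + r * r, r * s],
                     [r * r, 2 * r * s, s * s]])


def simulate_f2_cure(n, marker_pos, qtl_pos, gamma0, gamma_g, beta_g,
                     theta=(1.0, 1.0), c_max=10.0, seed=0):
    """Simulate (y, delta, marker genotypes, QTL genotype, eta) under the Weibull
    PH mixture cure model. Positions in cM; theta is an assumed placeholder."""
    rng = np.random.default_rng(seed)
    pos = np.sort(np.append(np.asarray(marker_pos, float), qtl_pos))
    geno = np.empty((n, pos.size), dtype=int)
    geno[:, 0] = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
    for k in range(1, pos.size):
        r = 0.5 * (1.0 - np.exp(-2.0 * (pos[k] - pos[k - 1]) / 100.0))  # Haldane
        cum = np.cumsum(f2_transition(r)[geno[:, k - 1]], axis=1)
        u = rng.random(n)[:, None]
        geno[:, k] = np.minimum((u > cum).sum(axis=1), 2)  # inverse-CDF draw
    q = np.searchsorted(pos, qtl_pos)                       # column of the QTL
    Gq = G_CODES[geno[:, q]]
    pi = 1.0 / (1.0 + np.exp(-(gamma0 + Gq @ np.asarray(gamma_g, float))))
    eta = rng.random(n) < pi                                # susceptible indicator
    theta1, theta2 = theta
    # Invert S(t) = exp{-theta1 t^theta2 exp(beta_g' G)} using an Exp(1) draw.
    rate = theta1 * np.exp(Gq @ np.asarray(beta_g, float))
    t_star = (rng.exponential(size=n) / rate) ** (1.0 / theta2)
    T = np.where(eta, t_star, np.inf)                       # cured never fail
    C = rng.uniform(0.0, c_max, size=n)
    y = np.minimum(T, C)
    delta = (T <= C).astype(int)
    markers = np.delete(geno, q, axis=1)
    return y, delta, markers, geno[:, q], eta
```

For instance, markers every 20 cM from 0 to 100 cM (six markers), a QTL at 35 cM, and `gamma0=0.5` with zero QTL effects reproduce the null setting above, in which the expected cure fraction is $1/(1 + e^{0.5}) \approx 0.38$. The sketch assumes the QTL does not sit exactly on a marker.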
The simulation results indicate that the proposed cure model method produces reasonable type I error rates and good power to detect QTL effects under all the alternative models that we simulated. In contrast, the PPH method inflates the type I error rate and may lose power to detect the QTL under some cure model alternatives. More specifically, under the null hypothesis of no
overall QTL effect ($\gamma_g = \beta_g = 0$) with a cure fraction of 38% ($\gamma_0 = 0.5$), the PPH method clearly yields inflated type I error rates. This demonstrates that, without accounting for nonsusceptible individuals, the PPH method tends to find spurious QTL.

(INSERT TABLE 3 HERE)

The powers are calculated under three different alternatives, which mimic the different situations observed in the real data. For simplicity, we directly compare the power of the proposed cure method with that of the PPH method without adjusting for the type I error inflation of the latter; since the PPH method produces inflated type I errors, such an adjustment would be needed in practice. For the first alternative (Table 2), we consider $\beta_g = (0.5, 0.5)$ and $\gamma_g = 0$, which mimics the situation on chromosome 1 (Figure 2), i.e. the QTL affects only the survival distribution of susceptible subjects and not the probability of susceptibility. With 6 markers, at the 5% nominal level, the proposed cure method has 85% power to detect the QTL, whereas the PPH method attains less than one third of that power. Similar results are observed in the 10-marker setting and at other nominal levels, which agrees with the performance of the PPH model on chromosome 1 in the real data analysis. To mimic the situation on chromosome 5 (Figure 3), we consider the second alternative, $\beta_g = 0$ and $\gamma_g = (0.75, 0.75)$. Both the cure method and the PPH method retain good power to detect the QTL in this situation. The PPH method has slightly higher power than the cure model method, but note that the PPH method is not adjusted for its inflated type I error here. These observations provide some support for the results of the real data analysis on chromosome 5, where the PPH method has larger LOD scores than the cure model method but both methods detect the QTL.
To mimic chromosome 13 in the real data, we consider a third alternative in which the QTL has joint effects: β_g = (0.5, 0.5) and γ_g = (0.25, 0.25). Table 3 shows that in this case the proposed cure model method has almost twice the power of the PPH method to detect the QTL.

We conclude this simulation section with a few comments. In the search for genome-wide thresholds, we performed 1000 runs with sample size n = 300 for each scenario. We thereby obtained a large number of realizations of {W(d), d} as defined in (7), computed directly from data drawn under each null hypothesis, which enabled us to evaluate the empirical distribution of the genome-wide LOD score thresholds at the given significance levels. For the LOD scores based on the mixture cure model, the thresholds estimated by the resampling approach based on (8) (Table 2) were close to the corresponding percentiles of this empirical distribution. This indicates that our method maintains proper type I errors at the given sample sizes.

In addition, we performed simulations for the situation in which the underlying population is actually homogeneous, i.e., there are no non-susceptible subjects. In that situation, the LOD score based on the traditional survival model provides a valid test. Under the mixture cure model, the estimates become questionable because the long-term cure effects γ_g are not identifiable. However, for testing hypotheses about QTL using the LRT (LOD score), the cure model approach remains valid for homogeneous populations in the sense of maintaining correct significance levels (Hodge and Elston, 1994; Liu and Shao, 2003), though it often has a small to moderate loss of power (results not shown) compared with the LOD scores based on the correct homogeneous model.

4. Discussion

In this article, we propose a mixture cure model for interval mapping of QTL using a time-to-event trait from a population of mixed susceptibility. The method is applicable when the time-to-event trait is subject to random censoring. It provides a natural tool for detecting QTL that affect susceptibility and/or the survival distribution of the susceptible population. Genome-wide significance levels for the LRT can be obtained using the resampling method. Goodness-of-fit of the parametric mixture cure model is also discussed.
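To make the likelihood behind these LOD scores concrete, the sketch below codes the observed-data log-likelihood implied by the score formulas in Appendix A, l = Σ_i log D_i with D_i = Σ_j p_i(j) L_ij, and checks the analytic score ∂l/∂γ_g against a central finite difference. It is a minimal illustration under stated assumptions, not the authors' implementation: the genotype coding G, the Weibull parameterization λ0(y) = (shape/scale)(y/scale)^(shape−1), the intercept-only susceptibility model (Z_i = 1, no other covariates), the helper names `weibull_baseline`, `components`, `loglik` and `score_gamma_g`, and the simulated data are all hypothetical.

```python
import numpy as np

# Hypothetical F2 genotype coding (additive, dominance) for AA, Aa, aa;
# the coding of G_j used in the paper is not reproduced here.
G = np.array([[1.0, -0.5], [0.0, 0.5], [-1.0, -0.5]])


def weibull_baseline(y, shape, scale):
    """Assumed Weibull baseline: returns (lambda0(y), Lambda0(y))."""
    cum = (y / scale) ** shape
    haz = (shape / scale) * (y / scale) ** (shape - 1.0)
    return haz, cum


def components(params, y, delta, shape, scale):
    """Per-genotype eta_j and per-subject, per-genotype A_ij and L_ij (intercept-only Z_i)."""
    gamma0, gamma_g, beta_g = params[0], params[1:3], params[3:5]
    eta = gamma0 + G @ gamma_g   # logit of susceptibility for each genotype
    zeta = G @ beta_g            # log relative hazard among susceptibles
    haz, cum = weibull_baseline(y, shape, scale)
    rel = np.exp(zeta)
    # A_ij = (lambda0(Y_i) e^{zeta_j})^{delta_i} exp(-Lambda0(Y_i) e^{zeta_j})
    A = (haz[:, None] * rel) ** delta[:, None] * np.exp(-cum[:, None] * rel)
    # Likelihood contribution given genotype j: (e^eta A + 1 - delta) / (1 + e^eta)
    L = (np.exp(eta) * A + (1.0 - delta)[:, None]) / (1.0 + np.exp(eta))
    return eta, A, L


def loglik(params, y, delta, p, shape, scale):
    """Observed log-likelihood sum_i log D_i, with D_i = sum_j p_i(j) L_ij."""
    _, _, L = components(params, y, delta, shape, scale)
    return np.log((p * L).sum(axis=1)).sum()


def score_gamma_g(params, y, delta, p, shape, scale):
    """Analytic dl/dgamma_g in the form of the Appendix A score."""
    eta, A, L = components(params, y, delta, shape, scale)
    D = (p * L).sum(axis=1)
    dL_deta = np.exp(eta) * (A + delta[:, None] - 1.0) / (1.0 + np.exp(eta)) ** 2
    return ((p * dL_deta / D[:, None]) @ G).sum(axis=0)


# Simulated toy data (all values hypothetical).
rng = np.random.default_rng(0)
n = 50
y = rng.uniform(1.0, 100.0, n)                     # observed times Y_i
delta = (rng.uniform(size=n) < 0.6).astype(float)  # event indicators delta_i
p = rng.dirichlet(np.ones(3), size=n)              # genotype probabilities p_i(j)
params = np.array([0.5, 0.3, -0.2, 0.4, 0.1])      # (gamma0, gamma_g, beta_g)
shape, scale = 1.5, 50.0

eps = 1e-5
numeric = np.array([
    (loglik(params + eps * e, y, delta, p, shape, scale)
     - loglik(params - eps * e, y, delta, p, shape, scale)) / (2 * eps)
    for e in np.eye(5)[1:3]  # perturb gamma_g1 and gamma_g2 only
])
analytic = score_gamma_g(params, y, delta, p, shape, scale)
print(np.allclose(numeric, analytic, rtol=1e-5, atol=1e-6))
```

Perturbing entries 3 and 4 of `params` and comparing with an analytic ∂l/∂β_g would check that block of the score in the same way.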
The proposed method can be generalized to composite QTL models along the lines of Zeng (1993, 1994), as discussed in Diao et al. (2004). More recently, Diao and Lin (2005) developed a semiparametric proportional hazards model for mapping QTL using time-to-event traits, in which the resampling method was implemented through the efficient scores to obtain genome-wide thresholds. A similar strategy may be extended to the semiparametric proportional hazards cure model, which seems a worthwhile topic for further research. The challenge is that the semiparametric efficient scores for the regression parameters are much more complicated in the mixture cure model and require further investigation.

We have also demonstrated through simulation studies that, if the underlying population is truly a mixture of susceptible and non-susceptible subjects, the LOD score based on the proposed mixture cure model can be used to test for QTL effects. On the other hand, as our simulation results indicate, methods that ignore the latent heterogeneous susceptibility, such as the simple PPH model, may fail to detect the true QTL and may also produce spurious QTL in the presence of heterogeneous susceptibility. Thus the proposed mixture cure model is useful for mapping QTL based on time-to-event data whenever biological reasons and/or a sufficiently long follow-up time in the study indicate the existence of a latent non-susceptible subpopulation.

Appendix

Appendix A. Score functions of the observed likelihood for the proportional hazards mixture cure model in an F2 intercross family

In this appendix, we present the score functions based on the observed likelihood for the parametric mixture cure model. The scores are required in the resampling stage and are defined as the partial derivatives of the observed log-likelihood in (4) with respect to the parameters. For subject i with genotype j at location d, write

$$\eta_{ij} = \gamma^{T} Z_i + \gamma_g^{T} G_j, \qquad \zeta_{ij} = \beta^{T} Z_i + \beta_g^{T} G_j.$$

Then

$$U(\mu; d) = \frac{\partial l(\mu; d)}{\partial \mu} = \left( \frac{\partial l(\mu; d)}{\partial \gamma}, \frac{\partial l(\mu; d)}{\partial \gamma_g}, \frac{\partial l(\mu; d)}{\partial \beta}, \frac{\partial l(\mu; d)}{\partial \beta_g}, \frac{\partial l(\mu; d)}{\partial \theta} \right),$$

where

$$\frac{\partial l(\mu; d)}{\partial \gamma} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j) \frac{e^{\eta_{ij}} (A_{ij} + \delta_i - 1)}{(1 + e^{\eta_{ij}})^2} \right] Z_i,$$

$$\frac{\partial l(\mu; d)}{\partial \gamma_g} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j) \frac{e^{\eta_{ij}} (A_{ij} + \delta_i - 1)}{(1 + e^{\eta_{ij}})^2} \right] G_j,$$

$$\frac{\partial l(\mu; d)}{\partial \beta} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j) \frac{e^{\eta_{ij}} A_{ij} \{\delta_i - \Lambda_0(Y_i) e^{\zeta_{ij}}\}}{1 + e^{\eta_{ij}}} \right] Z_i,$$

$$\frac{\partial l(\mu; d)}{\partial \beta_g} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j) \frac{e^{\eta_{ij}} A_{ij} \{\delta_i - \Lambda_0(Y_i) e^{\zeta_{ij}}\}}{1 + e^{\eta_{ij}}} \right] G_j,$$

$$\frac{\partial l(\mu; d)}{\partial \theta} = \sum_{i=1}^{n} \frac{1}{D_i} \sum_{j=1,2,3} \left[ p_i(j) A_{ij} \left\{ \delta_i \frac{\partial \log \lambda_0(Y_i)}{\partial \theta} - e^{\zeta_{ij}} \frac{\partial \Lambda_0(Y_i)}{\partial \theta} \right\} \frac{e^{\eta_{ij}}}{1 + e^{\eta_{ij}}} \right],$$

with

$$A_{ij} = \{\lambda_0(Y_i) e^{\zeta_{ij}}\}^{\delta_i} \exp\{-\Lambda_0(Y_i) e^{\zeta_{ij}}\}, \qquad D_i = \sum_{j=1,2,3} p_i(j) \left[ \frac{A_{ij} e^{\eta_{ij}}}{1 + e^{\eta_{ij}}} + \frac{1 - \delta_i}{1 + e^{\eta_{ij}}} \right].$$

The above score functions are used to approximate the genome-wide threshold with the resampling method of Diao et al. (2004), evaluated at µ = µ̃, the restricted MLE under the null hypothesis.

Appendix B. The observed information matrix

Let I(µ) denote the observed information matrix, which consists of the negative second derivatives of the observed log-likelihood with respect to the parameters. Direct calculation of the second derivatives can be complicated, so we use the formula of Louis (1982) to compute it from the complete-data likelihood:

$$I(\mu; d) = E_\mu\{I_C(\mu; d) \mid O\} - E_\mu\{U_C(\mu; d) U_C^{T}(\mu; d) \mid O\} + E_\mu\{U_C(\mu; d) \mid O\} E_\mu^{T}\{U_C(\mu; d) \mid O\}.$$

We partition the observed information matrix into blocks corresponding to the different parameters (the matrix is symmetric, so only the upper triangle is shown):

$$I(\mu; d) = \begin{pmatrix} I_{\gamma}(\mu; d) & I_{\gamma\gamma_g} & I_{\gamma\beta} & I_{\gamma\beta_g} & I_{\gamma\theta} \\ & I_{\gamma_g}(\mu; d) & I_{\gamma_g\beta} & I_{\gamma_g\beta_g} & I_{\gamma_g\theta} \\ & & I_{\beta}(\mu; d) & I_{\beta\beta_g} & I_{\beta\theta} \\ & & & I_{\beta_g}(\mu; d) & I_{\beta_g\theta} \\ & & & & I_{\theta}(\mu; d) \end{pmatrix}.$$

To test the null hypothesis H0, the observed information matrix is evaluated at the restricted MLE µ̃ under H0. The likelihood ratio test statistic constructed at each location d is then asymptotically equivalent to the score test statistic at the same location based on the estimated score

$$\hat{U}_{\gamma_g\beta_g}(\tilde{\mu}; d) = \begin{pmatrix} U_{\gamma_g}(\tilde{\mu}; d) \\ U_{\beta_g}(\tilde{\mu}; d) \end{pmatrix} - \begin{pmatrix} I_{\gamma_g\gamma} & I_{\gamma_g\beta} & I_{\gamma_g\theta} \\ I_{\beta_g\gamma} & I_{\beta_g\beta} & I_{\beta_g\theta} \end{pmatrix} \begin{pmatrix} I_{\gamma} & I_{\gamma\beta} & I_{\gamma\theta} \\ I_{\beta\gamma} & I_{\beta} & I_{\beta\theta} \\ I_{\theta\gamma} & I_{\theta\beta} & I_{\theta} \end{pmatrix}^{-1} \begin{pmatrix} U_{\gamma}(\tilde{\mu}; d) \\ U_{\beta}(\tilde{\mu}; d) \\ U_{\theta}(\tilde{\mu}; d) \end{pmatrix}.$$

References

V. L. Boyartchuk, K. W. Broman, R. E. Mosher et al., Multigenic control of Listeria monocytogenes susceptibility in mice, Nat. Genet., vol. 27.
K. W. Broman, Mapping quantitative trait loci in the case of a spike in the phenotype distribution, Genetics, vol. 163.
B. S. Carter, T. H. Beaty, G. D. Steinberg, B. Childs and P. C. Walsh, Mendelian inheritance of familial prostate cancer, Proc. Natl. Acad. Sci. USA, vol. 89.
E. B. Claus, N. J. Risch and W. D. Thompson, Using age of onset to distinguish between subforms of breast cancer, Annals of Human Genetics, vol. 54.
D. R. Cox, Regression models and life tables (with discussion), J. R. Stat. Soc. B, vol. 34.
D. R. Cox and D. V. Hinkley, Theoretical Statistics, Chapman & Hall, London.
A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B, vol. 39, pp. 1-38.
G. Diao, D. Y. Lin and F. Zou, Mapping quantitative trait loci with censored observations, Genetics, vol. 168.
G. Diao and D. Y. Lin, Semiparametric methods for mapping quantitative trait loci with censored data, Biometrics, vol. 61 (2005).
J. Dupuis and D. Siegmund, Statistical methods for mapping quantitative trait loci from a dense set of markers, Genetics, vol. 151.
V. T. Farewell, The use of mixture models for the analysis of survival data with long-term survivors, Biometrics, vol. 38.
J. P. Fine, Analysing competing risks data with transformation models, J. R. Stat. Soc. B, vol. 61.
A. M. Glazier, J. H. Nadeau and T. J. Aitman, Finding genes that underlie complex traits, Science, vol. 298.
S. E. Hodge and R. C. Elston, Lods, Wrods, and Mods: the interpretation of Lod scores calculated under different models, Genet. Epidemiol., vol. 11.
S. E. Hodge, V. Vieland and D. A. Greenberg, HLODs remain powerful tools for detection of linkage in the presence of genetic heterogeneity, Am. J. Hum. Genet., vol. 70.
C. J. Jiang and Z-B. Zeng, Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines, Genetica, vol. 101.
J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, 2nd ed., Wiley, NJ.
A. Y. C. Kuk and C. H. Chen, A mixture model combining logistic regression with proportional hazards regression, Biometrika, vol. 79.
E. S. Lander and D. Botstein, Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps, Genetics, vol. 121.
E. S. Lander and N. J. Schork, Genetic dissection of complex traits, Science, vol. 265.
H. Li and E. A. Thompson, Semiparametric estimation of major gene and random familial effects for age of onset, Biometrics, vol. 53.
D. Y. Lin, An efficient Monte Carlo approach to assessing statistical significance in genomic studies, Bioinformatics, vol. 21.
X. Liu and Y. Shao, Asymptotics of likelihood ratio test under loss of identifiability, Ann. Statist., vol. 31.
T. A. Louis, Finding the observed information matrix when using the EM algorithm, J. R. Stat. Soc. B, vol. 44.
W. Lu and Z. Ying, On semiparametric transformation cure models, Biometrika, vol. 91.
M. Lynch and B. Walsh, Genetics and Analysis of Quantitative Traits, Sinauer, Sunderland, MA.
Y. Miki, J. Swensen, D. Shattuck-Eidens, P. A. Futreal et al., A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1, Science, vol. 266.
Y. Peng and K. B. G. Dear, A nonparametric mixture model for cure rate estimation, Biometrics, vol. 56.
A. Rebai, B. Goffinet and B. Mangin, Comparing power of different methods for QTL detection, Biometrics, vol. 51.
J. P. Sy and J. M. G. Taylor, Estimation in a Cox proportional hazards cure model, Biometrics, vol. 56 (2000).
R. C. Symons, M. J. Daly, J. Fridlyand, T. P. Speed, W. D. Cook et al., Multiple genetic loci modify susceptibility to plasmacytoma-related morbidity in Eµ-v-abl transgenic mice, Proc. Natl. Acad. Sci. USA, vol. 99.
Z. B. Zeng, Theoretical basis for separation of multiple linked gene effects in mapping QTL, Proc. Natl. Acad. Sci. USA, vol. 90.
Z. B. Zeng, Precision mapping of quantitative trait loci, Genetics, vol. 136.
F. Zou, J. P. Fine, J. Hu and D. Y. Lin, An efficient resampling method for assessing genome-wide statistical significance in mapping quantitative trait loci, Genetics, vol. 168.
Table 1: Estimated QTL positions and effects from the Listeria data. Columns: chromosome, position (cM), LOD, γ_0, γ_g1, γ_g2, β_g1, β_g2.

Table 2: LOD score thresholds at significance level α obtained from resampling (sample standard errors in parentheses; n/a: value missing)

                    Cure model                     PPH model
No. markers    α = 5%        α = 1%           α = 5%        α = 1%
H0: γ_g1 = γ_g2 = 0 and β_g1 = β_g2 = 0
6              n/a (0.03)    3.81 (0.06)      2.13 (0.03)   2.90 (0.052)
10             n/a (0.03)    3.95 (0.06)      2.26 (0.03)   3.04 (0.05)
H1a: γ_g1 = γ_g2 = 0 and β_g1 = β_g2 = 0.5
6              n/a (0.03)    3.82 (0.06)      2.12 (0.03)   2.89 (0.05)
10             n/a (0.03)    3.97 (0.05)      2.26 (0.03)   3.03 (0.05)
H2a: γ_g1 = γ_g2 = 0.75 and β_g1 = β_g2 = 0
6              n/a (0.03)    3.80 (0.05)      2.13 (0.03)   2.90 (0.05)
10             n/a (0.03)    3.95 (0.06)      2.27 (0.03)   3.04 (0.05)
H3a: γ_g1 = γ_g2 = 0.25 and β_g1 = β_g2 = 0.5
6              n/a (0.03)    3.83 (0.06)      2.12 (0.03)   2.89 (0.05)
10             n/a (0.03)    3.97 (0.06)      2.26 (0.03)   3.04 (0.05)
Table 3: Simulated type I errors and powers (in %) using the setup of Table 2. Columns: H0, H1a, H2a and H3a, each at α = 5% and 1%; rows: cure model and PPH model with 6 and with 10 markers.
Figure 1: LOD score profiles (LOD score against position in cM) on chromosomes 1, 5, 6, 13 and 15. Top: testing H0 of no overall QTL effects; LOD scores from two QTL mapping methods, the cure mixture model (Cure LOD) and the PPH survival model (PPH LOD), on the Listeria data. The threshold (dotted horizontal line) is the 5% genome-wide significance level based on the resampling method using the cure mixture model. Middle: testing H0γ of no QTL effects on susceptibility; the LOD score profile from the cure mixture model (Cure LOD of susceptibility) and the threshold (dotted horizontal line) based on the resampling method. Bottom: testing H0β of no QTL effects on the survival distribution of the susceptible population; the LOD score profile from the cure mixture model (Cure LOD of survival) and the threshold based on the resampling method.
Figure 2: Marker D1M355 on chromosome 1; survival probability against time (hours) for genotype groups AA, Aa and aa. Upper left: Kaplan-Meier curves of the survival times of all mice after infection with Listeria. Upper right: Kaplan-Meier curves using only the survival times of mice with observed death (survival in susceptible subjects). Lower left: estimated overall survival distribution under the mixture cure model. Lower right: estimated survival distribution of the susceptible population under the mixture cure model.
Figure 3: Marker D5M357 on chromosome 5; survival probability against time (hours) for genotype groups AA, Aa and aa. Upper left: Kaplan-Meier curves of the survival times of all mice after infection with Listeria. Upper right: Kaplan-Meier curves using only the survival times of mice with observed death (survival in susceptible subjects). Lower left: estimated overall survival distribution under the mixture cure model. Lower right: estimated survival distribution of the susceptible population under the mixture cure model.
Figure 4: For each genotype group (AA, Aa, aa) at a marker, log(−log Ŝ_s(t)) was plotted against log(t) and a straight line was fitted by least squares. Top row: marker D1M355; bottom row: marker D5M
Figure 5: P-P plots of the nonparametric Kaplan-Meier estimate of the survival distribution against the parametric estimates from the proportional hazards mixture cure model with a Weibull baseline function (axes: Kaplan-Meier estimate; proposed parametric estimated survival distribution). Panels: Chr1:D1M355 (pv = 0.34) and Chr5:D5M357.
More informationAnalysis of competing risks data and simulation of data following predened subdistribution hazards
Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationLongitudinal + Reliability = Joint Modeling
Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,
More informationSample size determination for logistic regression: A simulation study
Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This
More informationCombining dependent tests for linkage or association across multiple phenotypic traits
Biostatistics (2003), 4, 2,pp. 223 229 Printed in Great Britain Combining dependent tests for linkage or association across multiple phenotypic traits XIN XU Program for Population Genetics, Harvard School
More informationThe universal validity of the possible triangle constraint for Affected-Sib-Pairs
The Canadian Journal of Statistics Vol. 31, No.?, 2003, Pages???-??? La revue canadienne de statistique The universal validity of the possible triangle constraint for Affected-Sib-Pairs Zeny Z. Feng, Jiahua
More informationDuration Analysis. Joan Llull
Duration Analysis Joan Llull Panel Data and Duration Models Barcelona GSE joan.llull [at] movebarcelona [dot] eu Introduction Duration Analysis 2 Duration analysis Duration data: how long has an individual
More informationQUANTITATIVE trait analysis plays an important observed. These assumptions are likely to be false when
Copyright 2004 y the Genetics Society of America DOI: 10.1534/genetics.103.023903 Mapping Quantitative Trait Loci With Censored Oservations Guoqing Diao, D. Y. Lin 1 and Fei Zou Department of Biostatistics,
More informationAnalysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates
Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationStat 642, Lecture notes for 04/12/05 96
Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationSurvival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University
Survival Analysis: Weeks 2-3 Lu Tian and Richard Olshen Stanford University 2 Kaplan-Meier(KM) Estimator Nonparametric estimation of the survival function S(t) = pr(t > t) The nonparametric estimation
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More informationTHE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION
INVESTIGATION A New Simple Method for Improving QTL Mapping Under Selective Genotyping Hsin-I Lee,* Hsiang-An Ho,* and Chen-Hung Kao*,,1 *Institute of Statistical Science, Academia Sinica, Taipei 11529,
More information1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics
1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
More informationMeei Pyng Ng 1 and Ray Watson 1
Aust N Z J Stat 444), 2002, 467 478 DEALING WITH TIES IN FAILURE TIME DATA Meei Pyng Ng 1 and Ray Watson 1 University of Melbourne Summary In dealing with ties in failure time data the mechanism by which
More informationAffected Sibling Pairs. Biostatistics 666
Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD
More informationParametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data
Columbia International Publishing Journal of Advanced Computing (2013) 1: 43-58 doi:107726/jac20131004 Research Article Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored
More informationCox s proportional hazards model and Cox s partial likelihood
Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.
More informationSTAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis
STAT 6350 Analysis of Lifetime Data Failure-time Regression Analysis Explanatory Variables for Failure Times Usually explanatory variables explain/predict why some units fail quickly and some units survive
More informationCausal Model Selection Hypothesis Tests in Systems Genetics
1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;
More informationProportional hazards regression
Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression
More informationInferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data
Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone
More informationOn the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease
On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease Yuehua Cui 1 and Dong-Yun Kim 2 1 Department of Statistics and Probability, Michigan State University,
More informationA comparison of inverse transform and composition methods of data simulation from the Lindley distribution
Communications for Statistical Applications and Methods 2016, Vol. 23, No. 6, 517 529 http://dx.doi.org/10.5351/csam.2016.23.6.517 Print ISSN 2287-7843 / Online ISSN 2383-4757 A comparison of inverse transform
More informationPENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA
PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationCOMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky
COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky Summary Empirical likelihood ratio method (Thomas and Grunkmier 975, Owen 988,
More informationTHESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis
PROPERTIES OF ESTIMATORS FOR RELATIVE RISKS FROM NESTED CASE-CONTROL STUDIES WITH MULTIPLE OUTCOMES (COMPETING RISKS) by NATHALIE C. STØER THESIS for the degree of MASTER OF SCIENCE Modelling and Data
More information