QUANTITATIVE trait analysis plays an important observed. These assumptions are likely to be false when

Size: px
Start display at page:

Download "QUANTITATIVE trait analysis plays an important observed. These assumptions are likely to be false when"

Transcription

1 Copyright 2004 y the Genetics Society of America DOI: /genetics Mapping Quantitative Trait Loci With Censored Oservations Guoqing Diao, D. Y. Lin 1 and Fei Zou Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina Manuscript received Octoer 31, 2003 Accepted for pulication June 2, 2004 ABSTRACT The existing statistical methods for mapping quantitative trait loci (QTL) assume that the phenotype follows a normal distriution and is fully oserved. These assumptions may not e satisfied when the phenotype pertains to the survival time or failure time, which has a skewed distriution and is usually suject to censoring due to random loss of follow-up or limited duration of the experiment. In this article, we propose an interval-mapping approach for censored failure time phenotypes. We formulate the effects of QTL on the failure time through parametric proportional hazards models and develop efficient likelihood-ased inference procedures. In addition, we show how to assess genome-wide statistical significance. The performance of the proposed methods is evaluated through extensive simulation studies. An application to a mouse cross is provided. QUANTITATIVE trait analysis plays an important oserved. These assumptions are likely to e false when role in the understanding of genetic variations the phenotype pertains to the survival time or failure in plants and animals. Mapping quantitative trait loci time. The Weiull distriution and other skewed distri- (QTL) can lead to improvements in economic traits, utions with long right tails are more appropriate than such as yield and quality in crop plants and milk producis the normal distriution. Furthermore, the failure time tion in cows. QTL mapping in animals can also provide often suject to censoring so that the trait value is valuale insights into the genetic etiologies of complex known only to e eyond the censoring time. An exam- human diseases (Hilert et al. 1991; Jaco et al. 1991; ple of failure time in plant experiments is the flowering Shepel et al. 1998). time, which may e censored due to limited duration Much of the modern statistical methodology for QTL of the experiment; see Ferreira et al. (1995). In animal mapping in experimental crosses originates from the studies, the failure times of interest include time to seminal work of Lander and Botstein (1989). The tumor and time to death (i.e., survival time), which Lander-Botstein interval-mapping method postulates may e suject to censoring ecause of limited study that QTL occur at a series of positions within a set of duration or death due to unrelated causes. One particu- adjacent marker intervals and that the trait value dein lar example is a mice cross presented y Broman (2003), pends on the QTL genotype through a linear regression which the trait of interest is time to death after a model. The distance etween each pair of genetic markstill acterial infection and in which 30% of the mice are ers is assumed known. The method steps through the alive at the end of the study period. Symons et al. genome in specified increments, say every 1 or 2 cm, (2002) presented another interesting study, in which and calculates the likelihood-ratio statistic for testing the phenotype is the time until terminal illness due to no QTL present at each position. The position with the tumor for E-v-al transgenic mice. largest value of the likelihood-ratio statistic is declared The incompleteness of the trait values presents major to e the candidate QTL location provided that the challenges in the application of the interval-mapping value exceeds a certain threshold level. It is widely recogwhich approach. Broman (2003) considered a cure model in nized that the interval-mapping method has higher power the mice that are alive at the end of the study and requires fewer progenies than the single-marker are regarded as cured and in which the survival times analysis (Lander and Botstein 1989; Haley and Knott among the deaths follow a log-normal distriution. This 1992; Zeng 1994). Doerge et al. (1997) descried in is a specialized model, which can deal only with the greater detail this method and various extensions. situations in which the potential censoring times are Most of the existing QTL-mapping methods require equal among all study sujects. Symons et al. (2002) that the phenotype e normally distriuted and fully utilized a variant of the expectation-maximization (EM) algorithm (Lipsitz and Irahim 1998) to map QTL with censored oservations. This method is computationally intensive and its properties have not een inves- 1 Corresponding author: Department of Biostatistics, University of North Carolina, 3101E McGavran-Greenerg Hall, CB 7420, Chapel tigated. Hill, NC lin@ios.unc.edu In this article, we provide a simple and rigorous exten- Genetics 168: (Novemer 2004)

2 1690 G. Diao, D. Lin and F. Zou sion of the interval-mapping approach of Lander and interference and no genotyping errors, these proailities Botstein (1989) to censored quantitative traits. Specifically, are determined y the genotypes of the two flanking we formulate the effects of QTL on the failure markers and the location of the QTL; see Equation 15.2 time through proportional hazards models (Kalfleish of Lynch and Walsh (1998). and Prentice 2002, Sect. 2.3). We then develop effi- We assume that censoring is noninformative (Kalfleish cient likelihood-ased methods for locating QTL and and Prentice 2002, p. 195) and that the censorcient estimating the effects of QTL. In addition, we show how ing time is independent of the failure time and QTL to assess genome-wide statistical significance y extending genotype. The likelihood for the vector of parameters the analytical results of Lander and Botstein (1989) ased on the complete data (Y i, i, G i, M i )(i1,..., and Dupuis and Siegmund (1999) and y developing n) is proportional to an accurate and efficient Monte Carlo procedure. We conduct extensive simulation studies to evaluate the n i (Yi g)e Y 0 i (t g)dt i,g I(G ig), (2) performance of the proposed methods. Finally, we provide an application to the mice data of Broman (2003). i1 g1,0,1 while the likelihood ased on the oserved data (Y i, i, M i )(i 1,...,n) is proportional to METHODS Interval mapping: In this section, we develop an interval-mapping method for potentially censored failure- time traits in an F 2 intercross population y modeling a single QTL. Expanding the results to other crosses is straightforward. Consider n progenies from an intercross etween two inred strains. Let T i denote the quantitative trait for the ith suject, which pertains to a failure time that can potentially e censored and thus incompletely oserved. Let C i e the censoring time for the ith suject. The oservation on the trait value of the ith suject consists of two components: Y i min(t i, C i ) and i I(T i C i ), where I() is the indicator function for event. The failure time T i is fully oserved only when it is uncensored, i.e., i 1. Suppose that we have data on a set of genetic markers with a known genetic map. Let M i denote the multipoint marker genotype data for the ith suject. We consider a putative QTL locus d in the genome with two possile alleles q and Q from the two inred parents and define G i 1, 0, or 1 according to whether the ith suject has genotype qq, Qq, orqq, respectively, at the QTL. We specify a proportional hazards model for the effects of the QTL genotype on the failure time such that, conditional on the QTL genotype G i, the hazard function of T i takes the form (t G i ) 0 (t)e 1 G i 2 (1 G i ), i 1,...,n, (1) n i1 g1,0,1 i (Yi g)e 0 Y i (t g)dt i,g. (3) To otain the maximum-likelihood estimator (MLE) of, we may maximize the oserved-data likelihood (3) directly. An alternative approach is to apply the EM algorithm (Dempster et al. 1977) to (2). The expected value of the complete-data log-likelihood given the o- served data can e shown to e, up to a constant, where p i,g () and n i1g1,0,1 p i,g () i ln (Y i g) Y i p i,g() v1,0,1 p i,v(), 0 (t g)dt, (4) g 1, 0, 1; i 1,...,n, p i,g() i,g exp i ( 1 g 2 (1 g )) Y i 0 (t g)dt. In the E-step, we evaluate p i,g () at the current estimate of. The M-step can proceed in a similar manner to the case of complete data since expression (4), with in p i,g () fixed, takes the same form as the complete- data log-likelihood. We egin the EM algorithm y as- signing an initial value to and iterate until conver- gence. The initial value for is set to 0 and that of to some value in the parameter space of. The resulting MLE is denoted y ˆ. See appendix a for further detail. We test the null hypothesis of no QTL effects, i.e., H 0 : 0, y the likelihood-ratio statistic where 1 and 2 pertain to the additive and dominant effects of the QTL, and 0 ( ) is an unknown aseline hazard function (Kalfleish and Prentice 2002, Sect. 2.3). In this article, we assume a parametric model for LR 2ln L(ˆ) L( ), 0. In particular, we consider a Weiull hazard function 0 (t) 1 2 t 21, 1 0, 2 0(Kalfleish and Pren- where L( ) is the oserved-data likelihood, and (0, tice 2002, p. 33). ) with eing the restricted MLE of under H 0. The Write (, ), where ( 1, 2 ) and ( 1, LOD score is LR/(2 ln 10). Under H 0, LR is asymptotically 2 2 ). At each locus, we may calculate i,g Pr(G i g M i ) -distriuted with 2 d.f. (appendix ). Note that (g 1, 0, 1; i 1,...,n), which are the conditional p i,g (), ˆ, L(ˆ), LR, and LOD all depend on the locus proailities of the QTL genotypes given the oserved d through the dependence of i,g on d. In the sequel, marker data. Under the assumptions of no crossover we include d in the expressions to emphasize their de-

3 QTL Mapping With Censored Data 1691 pendence on d if amiguity arises. Note also that and W(d) n 1 U T 1 ( ; d)vˆ (d)u ( ; d), L( ) do not depend on d. Thus, as in the case of stanwhere Vˆ (d) n 1 i1û n i (d)û T i (d) is a consistent estimadard interval mapping, the likelihood under H 0 is calcutor of the limiting covariance matrix V(d). It can e lated once while the likelihood under the alternative is shown that W(d) is asymptotically equivalent to LR(d) evaluated at each location in the genome to produce a (Cox and Hinkley 1974, Sect. 9.3). LOD curve for each chromosome. The position with In general, the limiting distriution of sup d W(d) is the largest value of the LOD score is declared to e the not analytically tractale. We use a resampling approach QTL location provided that the value exceeds a certain similar to that of Lin et al. (1993) to approximate the threshold level. We show how to determine the threshnull distriutions of n 1/2 U ( ; d) and sup d W(d). From old level in the following section. appendix, it is easy to see that n 1/2 U ( ; d) converges Thresholds: When searching the entire chromosome to a zero-mean Gaussian process with covariance funcor whole genome for QTL, one should select a threshold tion (d 1, d 2 ) that is the limit of n 1 i1ũ n i (; d 1 )Ũ T i (; level for the LOD score such that the proaility (under d 2 )at(d 1, d 2 ) and that (d 1, d 2 ) can e consistently the null hypothesis) that LOD or some other test statistic estimated y n 1 i1û n i (d 1 )Û T i (d 2 ). Define Û(d) i1 exceeds this level anywhere in the genome equals the n Û i (d)z i, where Z i (i 1,...,n) are independent desired false-positive rate. The pointwise significance standard normal random variales that are independent level ased on the 2 -approximation is inadequate eof the oserved data. Conditional on the oserved data, cause of the multiple tests while the Bonferroni correcn Û(d) is a Gaussian process with mean 0 and covarition is too conservative ecause of the dependence of ance function n 1 i1û n i (d 1 )Û T i (d 2 )at(d 1, d 2 ), which conthe test statistics at different points in the genome. In verges to (d 1, d 2 ). Thus, the conditional distriution appendix c, we show that the likelihood-ratio statistic of n 1/2 Û(d) given the oserved data converges to the LR(d) can e partitioned into the sum of the squares limiting distriution of n 1/2 U ( ; d). As a result, the distriof two asymptotically independent Ornstein-Uhleneck ution of W(d) can e approximated y that of processes. This result is analogous to those of Lander and Botstein (1989) and Dupuis and Siegmund (1999) Ŵ(d) n 1 Û T 1 (d)vˆ (d)û(d). and implies that the analytical approximations of thresholds for normal traits can e applied to the case of To approximate the distriution of sup d W(d), we gen- censored failure time oservations. These analytical renumer of times while holding the oserved data fixed; erate the normal random sample (Z 1,...,Z n ) a large sults assume that the markers are dense or equally spaced with no missing data and thus may not work well for each sample, we calculate Ŵ(d) and sup d Ŵ(d). The in practice. Using results of Davies (1977, 1987), Reai 100(1 )th percentile of the simulated sup d Ŵ(d) is et al. (1994) provided approximate thresholds in acklevel of. This resampling approach is computationally the threshold value for the genome-wide significance cross (BC) and F 2, which are applicale in the intermedimuch more efficient than the use of permutation (Churate map density case. The calculations are formidale, even for F 2, and do not accommodate missing marker chill and Doerge 1994) and other simulation methods data. ecause it involves only simulation of normal random To overcome the limitations of the analytical approxilated data sets. variales and does not entail repeated analysis of simumations, we propose a novel resampling approach to determining the thresholds for genome-wide statistical significance. This approach allows aritrary distriutions of the markers as well as aritrary test positions. It SIMULATIONS also accommodates missing marker data and dominant To investigate the operating characteristics of the proposed markers. methods in practical situations, we performed For evaluating the correlations of the test statistics extensive simulation studies. We generated the failure among different locations, it is more convenient to work times from the Weiull distriutions with aseline haz- with the score statistic than the likelihood-ratio statistic. ard function 0 (t) 1 2 t 21, where and 2 Let U ( ; d) e the score function (ased on the o- 2. We reparameterized 1 and 2 according to k served-data likelihood) for at location d, which can e k (k 1, 2) so as to ensure that the estimates of 1 e approximated y the sum of n independent zero- and 2 are positive. The censoring times were generated mean random vectors i1ũ n i ( 0 ; d), where 0 (0, 0 ) from the uniform (0, ) distriution, where was chosen is the true parameter value under H 0 ; see appendix. to yield 30% censored oservations. Assuming no Thus, n 1/2 U ( ; d) is asymptotically zero-mean normal crossover interference, we generated the marker data with covariance matrix V(d) that is the limit of n 1 from the Markov chain. The interval-mapping step size i1ũ n i ( 0 ; d)ũ T i ( 0 ; d). We replace the unknown quantities was set at 1 cm. in Ũ i ( 0 ; d) y their sample estimators to yield We considered a chromosome with a total length of Û i (d) shown in appendix. The score statistic for testing 100 cm. Genetic maps with different numers of equally H 0 : 0 against H 1 : 0 takes the form spaced markers were simulated. Under H 1, one QTL

4 1692 G. Diao, D. Lin and F. Zou TABLE 1 Summary statistics for the estimator of the additive QTL effect at the true QTL location H 0 : 1 0, 2 0 H 1 : , No. of markers Mean a SE SEE c CP d Mean a SE SEE c CP d a Mean is the mean of the parameter estimator. SE is the standard error of the parameter estimator. c SEE is the mean of the standard error estimator. d CP is the coverage proaility of the 95% confidence interval. located at 35 cm was simulated with and We generated 10,000 replicates of 300 oservations from an F 2 population. We evaluated the finite-sample of unevenly spaced markers, we placed m markers at the following locations, properties of the MLEs of the QTL effects at the true LOC j 50(j 1)/(m 1), j 1,...,[m/2], 100(j 1)/(m 1), otherwise, QTL location. The results for the estimator of the additive QTL effect are summarized in Tale 1. The pro- where LOC j is the jth marker location and [m/2] is the posed estimator appears to e virtually uniased. The largest integer that is less than or equal to m/2. In these standard error estimator reflects accurately the true vari- settings, the first half of the markers is denser than ation. The confidence intervals have proper coverage the second half of the markers. We generated 10,000 proailities. We otained similar results for the estima- replicates of 300 oservations from an F 2 population. tor of the dominant QTL effect (data not shown). We The dense-map and sparse-map approximations were also examined the performance of the proposed inter- otained from Equation C1 in appendix c. The threshval-mapping methods for locating the QTL and estimat- olds for the resampling method were ased on 10,000 ing the QTL effects at the location with the maximum normal samples. The results are summarized in Tales LOD. The results are summarized in Tale 2. There is 3 and 4. little ias for the estimator of the QTL location or the The thresholds ased on the resampling method are estimators of the QTL effects. The mapping is more close to the empirical values, whether the data are generprecise for denser marker maps. ated under H 0 or H 1 ; consequently, the LR tests ased We conducted additional simulation studies to evalu- on these thresholds have proper type I error and power. ate the performance of the analytical and resampling This is true of any genetic map, with or without missing methods for determining genome-wide statistical sig- marker genotypes and dominant markers. The densenificance. We generated oth equally and unequally map approximations are too conservative and thus respaced markers. We also simulated data with missing sult in power loss, while the sparse-map approximations marker genotypes and dominant markers, which are tend to e too lieral. We also assessed the approximamore comparale with real data. We considered one tions y Reai et al. (1994), which turn out to e conservative chromosome with a total length of 100 cm. For the cases when the genetic map is dense. For example, in TABLE 2 Sampling means of the estimators for the QTL location and for the QTL effects at the estimated QTL location in the simulation studies H 0 : 1 0, 2 0 H 1 : , QTL effects QTL effects No. of QTL QTL markers location (cm) 1 2 location (cm) (34.2) (0.141) (0.235) 36.6 (11.3) (0.107) (0.173) (32.8) (0.143) (0.233) 35.8 (10.9) (0.101) (0.162) (31.8) (0.148) (0.251) 35.7 (9.3) (0.098) (0.154) (31.4) (0.150) (0.255) 35.5 (8.9) (0.100) (0.152) Standard errors are shown in parentheses.

5 QTL Mapping With Censored Data 1693 TABLE 3 Analytical and resampling-ased thresholds at the targeted genome-wide significance level of Resampling c Analytical Empirical d H 0 H 1 Dense-map Sparse-map No. of Marker markers pattern a 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% (0.16) (0.25) 9.81 (0.16) (0.25) (0.17) (0.25) (0.16) (0.25) (0.19) (0.27) (0.18) (0.27) (0.16) (0.25) 9.64 (0.17) (0.25) (0.17) (0.25) (0.17) (0.26) (0.19) (0.27) (0.18) (0.26) a Under pattern 1, markers are evenly spaced with no missing marker genotypes or dominant markers; under pattern 2, markers are unevenly spaced with 20% missing marker genotypes and 5% dominant markers. Percentiles of the test statistic ased on 10,000 simulated data sets. c Average thresholds from 10,000 simulated data sets. The values in parentheses are the standard errors of the thresholds. d QTL is located at 35 cm with and the case of 51 markers with 0.05, the sizes are 2.66 and 2.26% for marker patterns 1 and 2, respectively. APPLICATION In this approach, the censored oservations are treated as the true failure times and an average rank is assigned to those oservations. In the two-part approach, Broman considered a cure model in which the mice that are alive at the end of the study are regarded as cured while the survival times among the deaths follow a log-normal distriution. We applied the proposed methods to these data, as- suming a Weiull aseline hazard, and the results are shown in Tale 5 and Figures 1 and 2. The threshold for the LOD score at the 5% genome-wide significance level ased on the resampling approach is 3.43, which is close to 3.27, the threshold otained y permutation for the NP approach of Broman (2003). Our results are fairly consistent with those of Broman (2003). We detect almost the same QTL on chromosomes 5, 13, and 15 except that we detect an additional QTL on chromosome 6 rather than on chromosome 1. The QTL To illustrate our methods, we consider the mice data previously analyzed y Broman (2003). A total of 116 female mice from an intercross etween the BALB/cByJ and C57BL/6ByJ strains were genotyped at 133 markers, including 2 on the X chromosome. The phenotype of interest is the time to death following infection with Listeria monocytogenes. Approximately 30% of the survival times are censored. Broman (2003) proposed a nonparametric (NP) ap- proach and a two-part model. The NP approach is an extension of the Kruskal-Wallis statistic (Lehmann 1975, Sect. 5.2) y assigning a prior weight ( i,g ) to the rank of the ith oservation for each QTL genotype group g. TABLE 4 Sizes/powers (%) according to the analytical and resampling-ased thresholds Analytical Resampling Dense-map Sparse-map No. of Marker H 0 H 1 H 0 H 1 H 0 H 1 markers pattern a 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% a Under pattern 1, markers are evenly spaced with no missing marker genotypes or dominant markers; under pattern 2, markers are unevenly spaced with 20% missing marker genotypes and 5% dominant markers. QTL is located at 35 cm with and

6 1694 G. Diao, D. Lin and F. Zou TABLE 5 Estimates of the QTL positions and QTL effects along with the maximum LOD scores for the data on survival time following infection with Listeria monocytogenes in 116 intercross mice Proposed method Nonparametric Two-part model Chromosome Pos (cm) LOD 1 2 Pos (cm) LOD Pos (cm) LOD Pos, position. on chromosome 5 appears to have a strong additive locations ecause there are two more free parameters in effect and the hazard ratio of the survival time with the two-part model than in the other two methods. This genotype QQ vs. qq is Genotypes qq and Qq at will decrease the power to detect QTL since a larger the QTL on chromosome 6 seem to have similar effects. threshold (i.e., 4.91) is required. To evaluate different The QTL on chromosome 13 appears to have oth methods on a common scale, we converted the LOD additive and dominant effects. The QTL on chromosome curves to the estimated pointwise P-values. Figure 2 dis- 15 appears to have a strong dominant effect. At plays the values of log 10 P for chromosomes 1, 5, 6, 13, most detected QTL locations, our LOD scores are larger and 15. Comparisons with the nonparametric method than those of Broman s NP approach. This suggests that and two-part model reveal that the proposed method our approach may e more efficient in detecting QTL. yields more significant results on the aove chromo- Figure 1 shows the LOD curves from the three methods: somes except chromosome 1. Incidentally, the resampling proposed method, nonparametric method, and two-part method is 100 times faster than the permutation model. The LOD scores from the two-part model are method in this application. larger than those of the other two methods at some QTL To get some ideas aout the adequacy of the Weiull Figure 1. The LOD scores from three QTL mapping methods for the data on survival time following infection with Listeria monocytogenes in 116 intercross mice. The threshold pertains to the 5% genome-wide significance level under the resampling method.

7 QTL Mapping With Censored Data 1695 Figure 2. Plot of the log 10 P-values for three QTL mapping methods for the data on survival time following infection with Listeria monocytogenes in 116 intercross mice. The P-values for the proposed method are ased on 100,000 normal samples. In the region etween 26 and 30 cm on chromosome 5, the P-values are 10 5 and thus are not displayed. The P-values for the NP and two-part models are ased on 11,000 permutation replicates. distriution in descriing the Listeria data, we fitted oth the semiparametric model and the Weiull model at marker D5M357, which is close to the peak of the LOD score on chromosome 5. The estimated QTL effects from the two models are very similar. It would e worthwhile to develop formal goodness-of-fit methods for assessing the adequacy of the parametric survival model at the true QTL location. EXTENSIONS In this section, we extend the single-qtl model to multiple QTL. The approach of interval mapping (IM) considers one putative QTL at a time. The QTL located elsewhere on the genome can have interfering effects, so that the estimators for the locations and effects of QTL may e iased and the power of detecting QTL may e compromised (Lander and Botstein 1989; Haley and Knott 1992; Zeng 1994). Boer et al. (2002) showed that the IM method fails to detect three inter- acting QTL with no main effects through simulation studies. A variety of approaches have een proposed for mapping multiple QTL. These methods can increase the power to detect QTL and reduce iases in the estimators of the QTL effects and locations. In this section, we consider mainly composite-interval mapping (CIM; Jansen 1993; Zeng 1993, 1994) and multiple-interval mapping (MIM; Kao et al. 1999) for censored traits. Composite-interval mapping: The idea of CIM is to comine IM with multiple regression analysis in mapping QTL y conditioning on markers outside a region of interest to account for the effects of other QTL. To extend the original CIM model to censored traits, we consider the following proportional hazards model, (t G i ) 0 (t)exp 1 G i 2 (1 G i ) ( 1k M ik 2k (1 M ik )) kj,j1, (5) where j and j 1 conform to two flanking markers ordering the putative QTL, and M ik is the indicator variale for the marker genotype, which takes values 1, 0, and 1 for genotypes aa, Aa, and AA, respectively. We may further enhance the model y considering the interaction effects etween the putative QTL and controlling markers. Replacing (t G i ) in (2) and (3) with (5), we otain the complete-data and oserved-data likelihood functions, respectively. As in the case of standard interval mapping, we can maximize the oserved-data likelihood directly or apply the EM algorithm to otain the MLEs. We can test H 0 : 0 at any position in the genome. The CIM approach requires that the sample size e large relative to the numer of markers included in the model. In practice, the sample size is generally not very large. Thus, Zeng (1994) suggested including in the model only those markers that are more or less evenly

8 1696 G. Diao, D. Lin and F. Zou spaced in the genome or those preidentified markers plete-data partial-likelihood score function given the that explain most of the genetic variation in the genome. oserved data. The solution to this conditional expecta- This suggestion also applies to our setting. tion is not a maximum partial-likelihood estimator, so Multiple-interval mapping: The MIM approach pro- that the method is not statistically efficient. The Monte posed y Kao et al. (1999) uses multiple marker intervals Carlo simulation is time-consuming. There is no formal simultaneously to fit multiple putative QTL directly in justification for the method of Lipsitz and Irahim the model. Consider K QTL, Q 1,...,Q K, located at d 1, (1998) or that of Symons et al. (2002). Our method is...,d K in the genome. There are 3 K possile QTL computationally much simpler than Monte Carlo simugenotypes. Some of the K QTL may exhiit epistasis. lation. It is ased on the maximum-likelihood estimator We formulate the effects of the K QTL on the failure and is thus statistically efficient. Furthermore, we have time through a proportional hazards model, such that, estalished the theoretical properties of our method conditional on the joint genotype G i (G 1i,...,G Ki ), and assessed its empirical performance through simulathe hazard function of T i takes the form tion studies. The method of Symons et al. (2002) has the advantage that the aseline hazard function is unspecified. (t G i ) 0 (t)exp K j x ij K jk (x T ij B jk x ik ), (6) j1 jk The proposed resampling approach to determining genome-wide significance is applicale to aritrary gewhere x ij (G ij,1 G ij ) T, jk is an indicator variale netic maps and accommodates missing marker data and for epistasis etween Q j and Q k, and j and B jk pertain dominant markers. For the CIM and MIM models, no to the main effects and epistatic effects, respectively. analytical thresholds are availale and the permutation The variale jk indicates, y the values 1 vs. 0, whether method is extremely time-consuming. The proposed or not Q j and Q k interact. Given the marker data M i for resampling approach can e applied to the CIM and the ith suject and assuming no crossover interference, MIM models since all the relevant formulas have een we may calculate i,g (g 1,...,3 K ), the conditional presented for an aritrary likelihood. For the MIM proailities of the 3 K possile genotypes of the K QTL. method, the resampling approach produces appro- The complete-data likelihood takes the same form as priate thresholds for testing each putative QTL given Equation 2 except that the summation of g is now over the others, with or without adjustment for the fact that (1,...,3 K ). To otain the MLEs and LOD scores, we multiple QTL are tested simultaneously. can again apply the EM algorithm. We have written a computer program in C to imple- Since the true numer and locations of the QTL are ment the proposed method. This program is availale unknown, model selection is a critical issue in the MIM from the authors upon request. approach. Kao et al. (1999) suggested stepwise and chunkwise selection with the likelihood-ratio test statistic as a selection criterion to identify QTL, to separate linked QTL, and to analyze epistasis etween QTL. Broman LITERATURE CITED and Speed (2002) developed a modified Bayesian Boer, M. P., C. J. Braak and R. C. Jansen, 2002 A penalized likeli- information criterion (Schwarz 1978) for model selecone-dimensional hood method for mapping epistatic quantitative trait loci with genome searches. Genetics 162: tion. When many QTL are included in the MIM model, Broman, K. W., 2003 Mapping quantitative trait loci in the case of the computation can e very intensive. Sen and Chur- a spike in the phenotype distriution. Genetics 163: chill (2001) reduced the computation urden of MIM Broman, K. W., and T. P. Speed, 2002 A model selection approach for the identification of quantitative trait loci in experimental y replacing the EM algorithm with a Monte Carlo algocrosses (with discussion). J. R. Stat. Soc. B 64: rithm. All these strategies can e applied to our setting. Churchill, G. A., and R. W. Doerge, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: Cox, D. R., and D. V. Hinkley, 1974 Theoretical Statistics. Chapman & DISCUSSION Hall, London. Davies, R. B., 1977 Hypothesis testing when a nuisance parameter We have descried our methods in the context of an is present only under the alternative. Biometrika 64: Davies, R. B., 1987 Hypothesis testing when a nuisance parameter F 2 population. All the proposed methods can e easily is present only under the alternative. Biometrika 74: generalized to other crosses such as BC, F 3, or even more Dempster, A. P., N. M. Laird and D. B. Ruin, 1977 Maximum complicated crosses such as comined crosses (Zou et likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39: al. 2001). We may also extend our methods to accommo- Doerge, R. W., Z-B. Zeng and B. S. Weir, 1997 Statistical issues in date covariates, such as lock factors in an agriculture the search for genes affecting quantitative traits in experimental field trial or cage numer in a mouse cross. populations. Stat. Sci. 12: Symons et al. (2002) formulated the effects of the Dupuis, J., and D. Siegmund, 1999 Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151: QTL on the failure time through the semiparametric proportional hazards model. They utilized a variant of Ferreira, M. E., J. Satagopan, B. S. Yandell, P. H. Williams and the EM algorithm developed y Lipsitz and Irahim T. C. Osorn, 1995 Mapping loci controlling vernalization re- quirement and flower time in Brassica napus. Theor. Appl. Genet. (1998), in which Monte Carlo simulation is used to 90: approximate the conditional expectation of the com- Haley, C. S., and S. A. Knott, 1992 A simple regression method

9 QTL Mapping With Censored Data 1697 for mapping quantitative trait loci in line crosses using flanking Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative markers. Heredity 69: Traits. Sinauer, Sunderland, MA. Hilert, P., K. Lindpaintner, J. S. Beckmann, T. Serikawa, F. Sou- Reai, A., B. Gofinet and B. Mangin, 1994 Approximate thresholds rier et al., 1991 Chromosomal mapping of two genetic loci of interval mapping test for QTL detection. Genetics 138: 235 associated with lood-pressure regulation in hereditary hyperten sive rats. Nature 353: Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. Jaco, H. J., K. Lindpaintner, S. E. Lincoln, K. Kusumi, R. K. Bunker 6: et al., 1991 Genetic mapping of a gene causing hypertension in Sen, S., and G. A. Churchill, 2001 A statistical framework of quanti- the stroke-prone spontaneously hypertensive rat. Cell 67: 213 tative trait mapping. Genetics 159: Shepel, L. A., H. Lan, J. D. Haag, G. M. Brasic, M. E. Gheen et al., Jansen, R. C., 1993 Interval mapping of multiple quantitative trait 1998 Genetic identification of multiple loci that control reast loci. Genetics 135: cancer susceptiility. Genetics 149: Kalfleish, J. D., and R. L. Prentice, 2002 The Statistical Analysis Siegmund, D., 1985 Sequential Analysis: Tests and Confidence Intervals. of Failure Time Data, Ed. 2. Wiley, Hooken, NJ. Springer-Verlag, New York. Kao, C. H., Z-B. Zeng and R. D. Teasdale, 1999 Multiple interval Symons, R. C., M. J. Daly, J. Fridlyand, T. P. Speed, W. D. Cook mapping for quantitative trait loci. Genetics 152: et al., 2002 Multiple genetic loci modify susceptiility to plas- Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors macytoma-related moridity in E-v-al transgenic mice. Proc. underlying quantitative traits using RFLP linkage maps. Genetics Natl. Acad. Sci. USA 99: : Zeng, Z-B., 1993 Theoretical asis of precision mapping of quantita- Lehmann, E. L., 1975 Nonparametrics: Statistical Methods Based on tive trait loci. Proc. Natl. Acad. Sci. USA 90: Ranks. Holden-Day, San Francisco. Zeng, Z-B., 1994 Precision mapping of quantitative traits loci. Genet- Lin, D. Y., L. J. Wei and Z. Ying, 1993 Checking the Cox model ics 136: with cumulative sums of martingale-ased residuals. Biometrika Zou, F., B. S. Yandell and J. P. Fine, 2001 Statistical issues in the 80: analysis of quantitative traits in comined crosses. Genetics 158: Lipsitz, S. R., and J. G. Irahim, 1998 Estimating equations with incomplete categorical covariates in the Cox model. Biometrics 54: Communicating editor: G. Churchill APPENDIX A: COMPUTATIONS OF MLE AND COVARIANCE MATRIX In this section, we present the formulas for the M-step of the EM algorithm under the Weiull model. In addition, we provide the oserved information matrix as well as a consistent estimator of the covariance matrix of MLE ˆ. In the M-step of the (k 1)th iteration, the first and second derivatives of the expected value of the log-likelihood (k) given the oserved data and current estimate ˆ are given y U (k1) () n p i,g (ˆ (k) )U i,g (), (A1) where I (k1) () n i1 g1,0,1 i1 g1,0,1 ig i,gg U i,g () i Yi i,g i (1 e 2 log Yi ) i,g e, 2 log p i,g (ˆ (k) )I i,g (), i,ggg T i,gg i,ge 2 log Yi g I i,g () i,g g T i,g i,g e 2 log Yi i,g e 2 log Yi g T i,g e 2 log Yi e 2 log Yi { i i,g (1 e 2 log Yi )}, g (g, 1 g ) T, and i,g Y i 0 (t g)dt, which is the cumulative hazard function conditional on the QTL genotype g. Then, we can apply the Newton-Raphson algorithm to update the current estimate with the new maximizer ˆ (k1). The oserved information matrix is given y I() n p i,g ()I i,g () n p i,g (){U i,g ()} 2 n p i,g ()U i,g () 2, i1 g1,0,1 i1 g1,0,1 i1 g1,0,1 where for a column vector a, a 2 denotes the matrix aa T. Thus, a consistent estimator of the covariance matrix of ˆ is given y the inverse of the oserved information matrix evaluated at ˆ, i.e., Cov(ˆ) I 1 (ˆ). (A2) APPENDIX B: ASYMPTOTIC PROPERTIES OF SCORE AND LIKELIHOOD-RATIO STATISTICS In this section, we show that LR(d) is asymptotically 2 2-distriuted under H 0 and provide the necessary ingredients for deriving the thresholds. Let U(; d) and I(; d) e the oserved-data score function and information matrix at location d with the following partitions to conform with the partition (, ) of,

10 1698 G. Diao, D. Lin and F. Zou U(; d) U (; d) U (; d), and I(; d) I (; d) I (; d) I (; d) I (; d). Under H 0, n 1 I( ) converges to (d). Denote (d) 1 (d) (d) (d) (d). It can e shown that LR(d) n 1 U T ( ; d) (d)u ( ; d) asymptotically (Cox and Hinkley 1974, Sect. 9.3). Through Taylor series expansions, n 1/2 U ( ; d) n 1/2 i1ũ n i ( 0 ; d) asymptotically, where Ũ i ( 0 ; d) U,i ( 0 ; d) (d) 1 (d)u,i ( 0 ; d) and 0 (0, 0 ). The replacements of the unknown quantities in Ũ i ( 0 ; d) with their sample estimators yield Û i (d) U,i ( ; d) I ( ; d)i 1 ( ; d)u,i ( ; d). Let z(d) ( (d)) 1/2 (n 1/2 U ( 0 ; d) (d) 1 (d)n 1/2 U ( 0 ; d)). Then z(d) converges to a normal distriution with mean 0 and an identity 2 2 covariance matrix. Thus, LR(d) is asymptotically distriuted as 2 2 under H 0. APPENDIX C: ANALYTICAL APPROXIMATIONS OF THRESHOLDS We show in this appendix that, for infinitely dense markers, the null distriution of LR(d) can e approximated y an Ornstein-Uhleneck process. Under H 0, (d) does not depend on d. Let d 1 and d 2 denote two points on the chromosome, and r e the recomination fraction corresponding to the genetic distance d 1 d 2. Under the assumption of no crossover interference, it is easy to show that the correlation etween g(d 1 ) and g(d 2 ) is given y Corr(g(d 1 ), g(d 2 )) 1 2r r provided that r is small. Since U,i (; d) ( i 0 (Y i ))g i (d) under H 0, we have Corr(z(d 1 ), z(d 2 )) Corr(g(d 1 ), g(d 2 )) 1 2r r. The aove result implies that z 1 (d) and z 2 (d), the first and second components of z(d), are approximately independent Ornstein-Uhleneck processes with means zero and variances 1 2r and 1 4r, respectively. By the arguments of Dupuis and Siegmund (1999), the tail distriution of sup d LR(d) under H 0 satisfies P(sup LR(d) a 2 ) 1 exp{(c 3va 2 L)exp(a 2 /2)}, (C1) d where is the average marker distance (in morgans), C is the numer of chromosomes, L is the total length of the genome (in morgans), and v v(a(6) 1/2 ), the definition of which can e found in Siegmund (1985). When 0, the aove formula reduces to that of Lander and Botstein (1989). These results imply that all the analytical thresholds for the normal trait can e applied to our case.

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations,

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations, Copyright 2004 by the Genetics Society of America DOI: 10.1534/genetics.104.031427 An Efficient Resampling Method for Assessing Genome-Wide Statistical Significance in Mapping Quantitative Trait Loci Fei

More information

Gene mapping in model organisms

Gene mapping in model organisms Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures]

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Backcross P 1 P 2 P 1 F 1 BC 4

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com

More information

Multiple QTL mapping

Multiple QTL mapping Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power

More information

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci Abstract. When censored time-to-event data are used to map quantitative trait loci (QTL), the existence of nonsusceptible

More information

Statistical issues in QTL mapping in mice

Statistical issues in QTL mapping in mice Statistical issues in QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Outline Overview of QTL mapping The X chromosome Mapping

More information

Use of hidden Markov models for QTL mapping

Use of hidden Markov models for QTL mapping Use of hidden Markov models for QTL mapping Karl W Broman Department of Biostatistics, Johns Hopkins University December 5, 2006 An important aspect of the QTL mapping problem is the treatment of missing

More information

QUANTITATIVE trait analysis has many applica- several crosses, since the correlations may not be the

QUANTITATIVE trait analysis has many applica- several crosses, since the correlations may not be the Copyright 001 by the Genetics Society of America Statistical Issues in the Analysis of Quantitative Traits in Combined Crosses Fei Zou, Brian S. Yandell and Jason P. Fine Department of Statistics, University

More information

Mapping multiple QTL in experimental crosses

Mapping multiple QTL in experimental crosses Human vs mouse Mapping multiple QTL in experimental crosses Karl W Broman Department of Biostatistics & Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman www.daviddeen.com

More information

Mapping multiple QTL in experimental crosses

Mapping multiple QTL in experimental crosses Mapping multiple QTL in experimental crosses Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures]

More information

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Prediction of the Confidence Interval of Quantitative Trait Loci Location Behavior Genetics, Vol. 34, No. 4, July 2004 ( 2004) Prediction of the Confidence Interval of Quantitative Trait Loci Location Peter M. Visscher 1,3 and Mike E. Goddard 2 Received 4 Sept. 2003 Final 28

More information

Multiple interval mapping for ordinal traits

Multiple interval mapping for ordinal traits Genetics: Published Articles Ahead of Print, published on April 3, 2006 as 10.1534/genetics.105.054619 Multiple interval mapping for ordinal traits Jian Li,,1, Shengchu Wang and Zhao-Bang Zeng,, Bioinformatics

More information

Binary trait mapping in experimental crosses with selective genotyping

Binary trait mapping in experimental crosses with selective genotyping Genetics: Published Articles Ahead of Print, published on May 4, 2009 as 10.1534/genetics.108.098913 Binary trait mapping in experimental crosses with selective genotyping Ani Manichaikul,1 and Karl W.

More information

Methods for QTL analysis

Methods for QTL analysis Methods for QTL analysis Julius van der Werf METHODS FOR QTL ANALYSIS... 44 SINGLE VERSUS MULTIPLE MARKERS... 45 DETERMINING ASSOCIATIONS BETWEEN GENETIC MARKERS AND QTL WITH TWO MARKERS... 45 INTERVAL

More information

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

QTL Model Search. Brian S. Yandell, UW-Madison January 2017 QTL Model Search Brian S. Yandell, UW-Madison January 2017 evolution of QTL models original ideas focused on rare & costly markers models & methods refined as technology advanced single marker regression

More information

The Admixture Model in Linkage Analysis

The Admixture Model in Linkage Analysis The Admixture Model in Linkage Analysis Jie Peng D. Siegmund Department of Statistics, Stanford University, Stanford, CA 94305 SUMMARY We study an appropriate version of the score statistic to test the

More information

QTL model selection: key players

QTL model selection: key players Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:

More information

Overview. Background

Overview. Background Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Hierarchical Generalized Linear Models for Multiple QTL Mapping Genetics: Published Articles Ahead of Print, published on January 1, 009 as 10.1534/genetics.108.099556 Hierarchical Generalized Linear Models for Multiple QTL Mapping Nengun Yi 1,* and Samprit Baneree

More information

QTL Mapping I: Overview and using Inbred Lines

QTL Mapping I: Overview and using Inbred Lines QTL Mapping I: Overview and using Inbred Lines Key idea: Looking for marker-trait associations in collections of relatives If (say) the mean trait value for marker genotype MM is statisically different

More information

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org R/qtl workshop (part 2) Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Example Sugiyama et al. Genomics 71:70-77, 2001 250 male

More information

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines Lecture 8 QTL Mapping 1: Overview and Using Inbred Lines Bruce Walsh. jbwalsh@u.arizona.edu. University of Arizona. Notes from a short course taught Jan-Feb 2012 at University of Uppsala While the machinery

More information

A new simple method for improving QTL mapping under selective genotyping

A new simple method for improving QTL mapping under selective genotyping Genetics: Early Online, published on September 22, 2014 as 10.1534/genetics.114.168385 A new simple method for improving QTL mapping under selective genotyping Hsin-I Lee a, Hsiang-An Ho a and Chen-Hung

More information

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets Yu-Ling Chang A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

Locating multiple interacting quantitative trait. loci using rank-based model selection

Locating multiple interacting quantitative trait. loci using rank-based model selection Genetics: Published Articles Ahead of Print, published on May 16, 2007 as 10.1534/genetics.106.068031 Locating multiple interacting quantitative trait loci using rank-based model selection, 1 Ma lgorzata

More information

MANY agriculturally and biomedically important

MANY agriculturally and biomedically important Copyright Ó 007 by the Genetics Society of America DOI: 10.1534/genetics.106.059808 Mapping Temporally Varying Quantitative Trait Loci in Time-to-Failure Experiments Frank Johannes 1 Center for Developmental

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

THE basic principle of using genetic markers to study genetic marker map throughout the genome, IM can

THE basic principle of using genetic markers to study genetic marker map throughout the genome, IM can Copyright 1999 by the Genetics Society of America Multiple Interval Mapping for Quantitative Trait Loci Chen-Hung Kao,* Zhao-Bang Zeng and Robert D. Teasdale *Institute of Statistical Science, Academia

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits. Chen-Hung Kao 1

Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits. Chen-Hung Kao 1 Copyright 2004 by the Genetics Society of America DOI: 10.1534/genetics.103.021642 Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits Chen-Hung Kao 1 Institute of Statistical

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI by ROBY JOEHANES B.S., Universitas Pelita Harapan, Indonesia, 1999 M.S., Kansas State University, 2002 A REPORT submitted in partial

More information

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked Copyright 2001 by the Genetics Society of America A Statistical Framework for Quantitative Trait Mapping Śaunak Sen and Gary A. Churchill The Jackson Laboratory, Bar Harbor, Maine 04609 Manuscript received

More information

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population Genet. Sel. Evol. 36 (2004) 415 433 415 c INRA, EDP Sciences, 2004 DOI: 10.1051/gse:2004009 Original article Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred

More information

On the mapping of quantitative trait loci at marker and non-marker locations

On the mapping of quantitative trait loci at marker and non-marker locations Genet. Res., Camb. (2002), 79, pp. 97 106. With 3 figures. 2002 Cambridge University Press DOI: 10.1017 S0016672301005420 Printed in the United Kingdom 97 On the mapping of quantitative trait loci at marker

More information

Inferring Genetic Architecture of Complex Biological Processes

Inferring Genetic Architecture of Complex Biological Processes Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen

More information

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen A Statistical Framework for Expression Trait Loci (ETL) Mapping Meng Chen Prelim Paper in partial fulfillment of the requirements for the Ph.D. program in the Department of Statistics University of Wisconsin-Madison

More information

Published online: 10 Apr 2012.

Published online: 10 Apr 2012. This article was downloaded by: Columbia University] On: 23 March 215, At: 12:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

MOST complex traits are influenced by interacting

MOST complex traits are influenced by interacting Copyright Ó 2009 by the Genetics Society of America DOI: 10.1534/genetics.108.099556 Hierarchical Generalized Linear Models for Multiple Quantitative Trait Locus Mapping Nengjun Yi*,1 and Samprit Banerjee

More information

QTL model selection: key players

QTL model selection: key players QTL Model Selection. Bayesian strategy. Markov chain sampling 3. sampling genetic architectures 4. criteria for model selection Model Selection Seattle SISG: Yandell 0 QTL model selection: key players

More information

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION INVESTIGATION A New Simple Method for Improving QTL Mapping Under Selective Genotyping Hsin-I Lee,* Hsiang-An Ho,* and Chen-Hung Kao*,,1 *Institute of Statistical Science, Academia Sinica, Taipei 11529,

More information

On Computation of P-values in Parametric Linkage Analysis

On Computation of P-values in Parametric Linkage Analysis On Computation of P-values in Parametric Linkage Analysis Azra Kurbašić Centre for Mathematical Sciences Mathematical Statistics Lund University p.1/22 Parametric (v. Nonparametric) Analysis The genetic

More information

Selecting explanatory variables with the modified version of Bayesian Information Criterion

Selecting explanatory variables with the modified version of Bayesian Information Criterion Selecting explanatory variables with the modified version of Bayesian Information Criterion Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland in cooperation with J.K.Ghosh,

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

I Have the Power in QTL linkage: single and multilocus analysis

I Have the Power in QTL linkage: single and multilocus analysis I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department

More information

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model Luis Manuel Santana Gallego 100 Appendix 3 Clock Skew Model Xiaohong Jiang and Susumu Horiguchi [JIA-01] 1. Introduction The evolution of VLSI chips toward larger die sizes and faster clock speeds makes

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis Glenn Heller Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center,

More information

Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci

Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci John Stephen Yap 1, Jianqing Fan 2 and Rongling Wu 1 1 Department of Statistics, University

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississippi University, MS38677 K-sample location test, Koziol-Green

More information

Lecture 11: Multiple trait models for QTL analysis

Lecture 11: Multiple trait models for QTL analysis Lecture 11: Multiple trait models for QTL analysis Julius van der Werf Multiple trait mapping of QTL...99 Increased power of QTL detection...99 Testing for linked QTL vs pleiotropic QTL...100 Multiple

More information

QTL mapping of Poisson traits: a simulation study

QTL mapping of Poisson traits: a simulation study AO Ribeiro et al. Crop Breeding and Applied Biotechnology 5:310-317, 2005 Brazilian Society of Plant Breeding. Printed in Brazil QTL mapping of Poisson traits: a simulation study Alex de Oliveira Ribeiro

More information

Two-Stage Improved Group Plans for Burr Type XII Distributions

Two-Stage Improved Group Plans for Burr Type XII Distributions American Journal of Mathematics and Statistics 212, 2(3): 33-39 DOI: 1.5923/j.ajms.21223.4 Two-Stage Improved Group Plans for Burr Type XII Distriutions Muhammad Aslam 1,*, Y. L. Lio 2, Muhammad Azam 1,

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Constructing Confidence Intervals for QTL Location. B. Mangin, B. Goffinet and A. Rebai

Constructing Confidence Intervals for QTL Location. B. Mangin, B. Goffinet and A. Rebai Copyright 0 1994 by the Genetics Society of America Constructing Confidence Intervals for QTL Location B. Mangin, B. Goffinet and A. Rebai Znstitut National de la Recherche Agronomique, Station de Biomitrie

More information

Mapping QTL to a phylogenetic tree

Mapping QTL to a phylogenetic tree Mapping QTL to a phylogenetic tree Karl W Broman Department of Biostatistics & Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman Human vs mouse www.daviddeen.com 3 Intercross

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

Model Selection for Multiple QTL

Model Selection for Multiple QTL Model Selection for Multiple TL 1. reality of multiple TL 3-8. selecting a class of TL models 9-15 3. comparing TL models 16-4 TL model selection criteria issues of detecting epistasis 4. simulations and

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Lecture 9 GxE Mixed Models. Lucia Gutierrez Tucson Winter Institute

Lecture 9 GxE Mixed Models. Lucia Gutierrez Tucson Winter Institute Lecture 9 GxE Mixed Models Lucia Gutierrez Tucson Winter Institute 1 Genotypic Means GENOTYPIC MEANS: y ik = G i E GE i ε ik The environment includes non-genetic factors that affect the phenotype, and

More information

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data Jaeun Choi A dissertation sumitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees: MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation

More information

Estimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty

Estimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty Estimation of Parameters in Random Effect Models with Incidence Matrix Uncertainty Xia Shen 1,2 and Lars Rönnegård 2,3 1 The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden; 2 School

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

ANIMAL models and their corresponding genomes parameter estimation, because the sample size for the

ANIMAL models and their corresponding genomes parameter estimation, because the sample size for the Copyright 2005 by the Genetics Society of America DOI: 10.1534/genetics.103.019752 An Expectation-Maximization Likelihood-Ratio Test for Handling Missing Data: Application in Experimental Crosses Tianhua

More information

Lecture 6. QTL Mapping

Lecture 6. QTL Mapping Lecture 6 QTL Mapping Bruce Walsh. Aug 2003. Nordic Summer Course MAPPING USING INBRED LINE CROSSES We start by considering crosses between inbred lines. The analysis of such crosses illustrates many of

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Interval Estimation for Parameters of a Bivariate Time Varying Covariate Model

Interval Estimation for Parameters of a Bivariate Time Varying Covariate Model Pertanika J. Sci. & Technol. 17 (2): 313 323 (2009) ISSN: 0128-7680 Universiti Putra Malaysia Press Interval Estimation for Parameters of a Bivariate Time Varying Covariate Model Jayanthi Arasan Department

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Power and sample size calculations for designing rare variant sequencing association studies.

Power and sample size calculations for designing rare variant sequencing association studies. Power and sample size calculations for designing rare variant sequencing association studies. Seunggeun Lee 1, Michael C. Wu 2, Tianxi Cai 1, Yun Li 2,3, Michael Boehnke 4 and Xihong Lin 1 1 Department

More information

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI By DÁMARIS SANTANA MORANT A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR

More information

Review Article Statistical Methods for Mapping Multiple QTL

Review Article Statistical Methods for Mapping Multiple QTL Hindawi Publishing Corporation International Journal of Plant Genomics Volume 2008, Article ID 286561, 8 pages doi:10.1155/2008/286561 Review Article Statistical Methods for Mapping Multiple QTL Wei Zou

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC)

Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC) Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC) Authors: Alexei C. Ionan, Mei-Yin C. Polley, Lisa M. McShane, Kevin K. Doin Section Page

More information

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where STAT 331 Accelerated Failure Time Models Previously, we have focused on multiplicative intensity models, where h t z) = h 0 t) g z). These can also be expressed as H t z) = H 0 t) g z) or S t z) = e Ht

More information

Joint Analysis of Multiple Gene Expression Traits to Map Expression Quantitative Trait Loci. Jessica Mendes Maia

Joint Analysis of Multiple Gene Expression Traits to Map Expression Quantitative Trait Loci. Jessica Mendes Maia ABSTRACT MAIA, JESSICA M. Joint Analysis of Multiple Gene Expression Traits to Map Expression Quantitative Trait Loci. (Under the direction of Professor Zhao-Bang Zeng). The goal of this dissertation is

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

The genomes of recombinant inbred lines

The genomes of recombinant inbred lines The genomes of recombinant inbred lines Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman C57BL/6 2 1 Recombinant inbred lines (by sibling mating)

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which

More information

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6. Linkage Mapping Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6. Genetic maps The relative positions of genes on a chromosome can

More information

Causal Graphical Models in Systems Genetics

Causal Graphical Models in Systems Genetics 1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation

More information

1. Define the following terms (1 point each): alternative hypothesis

1. Define the following terms (1 point each): alternative hypothesis 1 1. Define the following terms (1 point each): alternative hypothesis One of three hypotheses indicating that the parameter is not zero; one states the parameter is not equal to zero, one states the parameter

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information