A Robust Identity-by-Descent Procedure Using Affected Sib Pairs: Multipoint Mapping for Complex Diseases
Original Paper
Hum Hered 2001;51:64–78
Received: May 1, 1999. Revision received: September 10, 1999. Accepted: October 6, 1999

Kung-Yee Liang (a), Yen-Feng Chiu (a), Terri H. Beaty (b)
Departments of (a) Biostatistics and (b) Epidemiology, School of Hygiene and Public Health, Johns Hopkins University, Baltimore, Md., USA

Key Words
Affected sib pairs · Generalized estimating equations · Identity by descent · Multipoint · Robustness · Sample size and power

Abstract
Multipoint linkage analysis is a powerful tool to localize susceptibility genes for complex diseases. However, the conventional lod score method relies critically on the correct specification of the mode of inheritance for accurate estimation of gene position. On the other hand, allele-sharing methods, as currently practiced, are designed to test the null hypothesis of no linkage rather than to estimate the location of the susceptibility gene(s). In this paper, we propose an identity-by-descent (IBD)-based procedure to estimate the location of an unobserved susceptibility gene within a chromosomal region framed by multiple markers. Here we deal with the practical situation where some of the markers might not be fully informative. Rather, the IBD statistic at an arbitrary locus within the region is imputed using the multipoint marker information. The method is robust in that no assumption about the genetic mechanism is required other than that the region contains no more than one susceptibility gene. In particular, this approach builds upon a simple representation for the expected IBD at any arbitrary locus within the region using data from affected sib pairs. With this representation, one can carry out a parametric inference procedure to locate an unobserved susceptibility gene. In addition, we derive a sample size formula for the number of affected sib pairs needed to detect linkage with multiple markers.
Throughout, the proposed method is illustrated through simulated data. We have implemented this method, including exploratory and formal model-fitting procedures to locate susceptibility genes plus sample size and power calculations, in a program, GENEFINDER, which will be made available shortly.

Copyright © 2000 S. Karger AG, Basel

Correspondence: Dr. Kung-Yee Liang, Department of Biostatistics, School of Hygiene and Public Health, Johns Hopkins University, Baltimore, MD 21205 (USA)

Introduction
Likelihood-based linkage analysis and allele-sharing methods remain the two most commonly used tools to test whether markers with known chromosomal locations are linked to unobservable genes controlling susceptibility to a complex disease. Recent advances in molecular biology have generated dense maps of polymorphic markers which can be used individually or in multipoint linkage analysis (where multiple markers are considered simultaneously) to identify and map susceptibility genes for complex disorders. Logical connections between these two approaches were drawn [e.g., Lander and Kruglyak, 1995; Whittemore, 1996; Kruglyak et al., 1996] and, consequently, a statistical package, GENEHUNTER, was made available for unified multipoint analyses of qualitative traits [Kruglyak et al., 1996].

When there is prior evidence that the region contains a susceptibility gene, it is intuitive that the additional information provided by simultaneously considering multiple genetic markers should yield greater power to pinpoint the unobserved disease locus. However, the parametric lod score approach requires the specification of the mode of inheritance, and it is well known that conclusions regarding the localization of susceptibility genes drawn from this approach are sensitive to model misspecification. On the other hand, nonparametric allele-sharing methods were designed to test the null hypothesis that marker alleles shared by pairs (or larger sets) of relatives are independent of any putative disease gene; see Hauser and Boehnke [1998] for an excellent review of the methods. For individual markers, this does not provide specific information about map location or genetic distance. Even for multipoint analysis, these allele-sharing methods remain essentially a test of the null hypothesis of no linkage. The multipoint approach toward testing this null hypothesis does, however, create the temptation to conclude that the map location giving the strongest evidence against $H_0$ represents the most likely site for the susceptibility locus. A commonly raised question has been whether or not the map location corresponding to the maximum nonparametric linkage (NPL) score from GENEHUNTER provides direct evidence for the location of the disease gene. The conventional wisdom is that the magnitude of this test statistic depends, among other factors, heavily on sample size.
In the context of allele-sharing methods, the notion of sample size has to do not only with the number of pedigrees (or affected individuals), but also with the informativeness of the individual genetic marker. Thus, a more polymorphic marker may give rise to a larger NPL test statistic value even though it is further away from the disease locus than closer, less informative markers.

In this paper, we propose a method to estimate the location of an unobserved susceptibility gene when there is preliminary evidence that the chromosomal region framed by multiple markers includes a disease gene. The method is based upon the familiar identity-by-descent (IBD) statistic and hence avoids the need to specify the mode of inheritance, which the lod score method requires. Furthermore, the proposed method focuses on estimating the location of the disease gene rather than on testing the null hypothesis. As such, we capitalize upon the extra information provided by multiple markers compared to a single marker.

The paper has the following organization. First, we study the robustness of these IBD statistics for multipoint analysis. Here, a simple representation relating the expected IBD statistic from a single marker to its distance from the disease locus is derived. The robustness reflects the fact that this expression is the same regardless of the true mode of inheritance. On the other hand, by closely examining the key coefficient in this expression, one obtains insight as to why information for locating an unobserved disease gene is reduced in the presence of oligogenic inheritance and/or genetic heterogeneity. Second, motivated by the simple expression noted above, we propose a sequence of IBD-based statistics to approximate the expected IBD when not all markers are fully informative. This approach, which is exploratory in nature, provides a way to identify, by inspection, the interval formed by the flanking markers in the chromosomal region.
Third, a more formal inferential procedure is introduced to estimate the location of the disease gene. In so doing, we draw upon the analogy of this approach to longitudinal data analysis. Fourth, based on our proposed method, we outline how one may compute the sample size needed for multipoint linkage analysis. Here the sample size refers to the number of independent pairs of affected sibs needed to achieve the prespecified statistical power.

Robustness of IBD Statistics
Consider a chromosomal region R of length T centimorgans (cM) which contains no more than one unobserved susceptibility gene at some location $\tau$. For simplicity, we assume for now that the affected sib pair design has been adopted, in which M markers at loci $0 \le t_1 < \dots < t_M \le T$ were genotyped for each individual; see figure 1. Extension to multiple affected relatives (other than full sibs) will be discussed later. Define $S(t)$ as the number of alleles (0, 1 or 2) shared IBD by an affected sib pair at any arbitrary locus $t$, $0 \le t \le T$. The following proposition is crucial for the subsequent development.

Fig. 1. Hypothetical locations of M observed markers and an unobserved susceptibility gene in a chromosomal region of T cM.

Proposition 1. Under the conventional assumptions of random mating, linkage equilibrium and generalized single ascertainment [Hodge and Vieland, 1996], the expected number of alleles shared IBD, $S(t)$, has the form

$$\mu(t) = E(S(t) \mid \Phi) = 1 + (2\Psi_{t,\tau} - 1)\,(E(S(\tau) \mid \Phi) - 1), \tag{1}$$

where $\Phi$ denotes the event that both siblings are affected (a sampling criterion) and the relationship between $t$ and the location $\tau$ of the true susceptibility gene is captured by

$$\Psi_{t,\tau} = \Psi(\theta_{t,\tau}) = \theta_{t,\tau}^2 + (1 - \theta_{t,\tau})^2, \tag{2}$$

with $\theta_{t,\tau}$ being the recombination fraction between the marker at $t$ and the unobserved disease gene at $\tau$.

The proof of Proposition 1 is given in Appendix 1. Obviously, when the region R is unlinked to the postulated disease gene, $\theta_{t,\tau} = 1/2$, in which case $\mu(t) = 1$, as would be expected. Note that the expression in (1), i.e. $E(S(t) \mid \Phi)$ being linear in $\Psi_{t,\tau}$, holds regardless of the mode of inheritance for the disease. Furthermore:

Remark 1. The expected value of $S(t)$ is strictly decreasing in $|t - \tau|$, the genetic distance between loci $t$ and $\tau$, and attains its maximum value of $E(S(\tau) \mid \Phi)$ at $t = \tau$.

An important statistical implication of this observation is that if $S(t)$ is available for all $t \in [0, T]$, then one can examine the plot of $\bar{S}(t)$, the sample average of $S(t)$, against $t$. The value $\hat{t}$ at which $\bar{S}$ reaches the peak of the plot would provide a consistent estimate of $\tau$, the location of the disease locus [e.g. Huber, 1967]. Here consistency refers to the ideal situation where the number of available affected sib pairs is sufficiently large.

The coefficient in (1), that is

$$C = E(S(\tau) \mid \Phi) - 1, \tag{3}$$

does depend on the underlying genetic mechanism. In particular:

Model 1 (Single Locus). For a single-locus model in which the disease gene resides within the region R, one has

$$C_{SL} = (\lambda_M - 1)/(4\lambda_S), \tag{4}$$

where $\lambda_M$ and $\lambda_S$ are the risk ratios for the MZ twin and a sibling of an affected individual, respectively [Risch, 1990a]; see also Suarez et al. [1978] for a different expression.

Model 2 (Single Locus with Heterogeneity). In the presence of heterogeneity as characterized by the admixture model of Smith [1963], one has

$$C_{SLH} = \alpha\, C_{SL}, \tag{5}$$

where $\alpha$ is the proportion of linked families, i.e. $0 < \alpha < 1$.

Model 3 (Two-Locus Additive Model). When two unlinked susceptibility loci are involved and act additively, formulas (30) and (31) of Risch [1990b] give

$$C_{TLA} = \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1M} - 1}{4\lambda_S}, \tag{6}$$

where $K$ is the population prevalence, $K_1$ is the prevalence summand for the first locus and $\lambda_{1M}$ is the risk ratio for an MZ twin attributed to locus 1 (the locus linked to the chromosomal region under consideration). This two-locus model can be thought of as a mapping exercise for one locus when a second unlinked locus exists.

Model 4 (Two-Locus Multiplicative Model). When the two unlinked loci operate multiplicatively, formulas (7) and (8) of Risch [1990b] give

$$C_{TLM} = (\lambda_{1M} - 1)/(4\lambda_{1S}). \tag{7}$$

Intuitively, one's ability to estimate $\tau$, in light of Remark 1, depends on the magnitude of $C$, which lies between zero and one. As shown in figure 2, the smaller the $C$, the flatter the plot of $\mu(t)$, the expected IBD statistic, against $t$, which makes it more difficult to distinguish $\mu(t)$ at $\tau$ from its value at adjacent $t$ values. Here we have used Haldane's [1919] mapping function relating $\theta$ to the genetic distance between two loci, i.e.

$$\theta_{t,\tau} = \left(1 - e^{-0.02\,|t - \tau|}\right)/2. \tag{8}$$
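To make the shape of $\mu(t)$ concrete, the following is a minimal Python sketch of equations (1), (2), (4) and (8). The function names (`haldane_theta`, `psi`, `c_single_locus`, `expected_ibd`) are illustrative choices and not part of GENEFINDER; distances are assumed to be in cM.

```python
import math


def haldane_theta(d_cm):
    """Recombination fraction for a map distance of d_cm cM, as in eq. (8)."""
    return (1.0 - math.exp(-0.02 * abs(d_cm))) / 2.0


def psi(theta):
    """Psi(theta) = theta^2 + (1 - theta)^2, as in eq. (2)."""
    return theta ** 2 + (1.0 - theta) ** 2


def c_single_locus(lambda_m, lambda_s):
    """Coefficient C for the single-locus model, eq. (4)."""
    return (lambda_m - 1.0) / (4.0 * lambda_s)


def expected_ibd(t, tau, c):
    """mu(t) = 1 + (2 * Psi - 1) * C, as in eq. (1)."""
    return 1.0 + (2.0 * psi(haldane_theta(t - tau)) - 1.0) * c


if __name__ == "__main__":
    tau = 35.0  # hypothetical disease locus position in cM
    for c in (0.1, 0.25, 0.5):
        curve = [round(expected_ibd(t, tau, c), 3) for t in (0, 20, 35, 50, 100)]
        print("C =", c, curve)
```

Under Haldane's map, $2\Psi - 1 = (1 - 2\theta)^2 = e^{-0.04|t - \tau|}$, so the curve decays exponentially away from $\tau$ and its height above 1 is scaled by $C$, which is why smaller $C$ gives a flatter plot.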
Fig. 2. Plot of $\mu(t)$, the expected IBD statistic at locus $t$, versus $t$. The location of the susceptibility gene is $\tau$ = 35 cM.

This observation is consistent with the conventional wisdom that heterogeneity would in general reduce the power to detect linkage; see (4) and (5). Furthermore, considering single-locus models as a special case of the two-locus model, one can show (see Appendix 2) from (4) and (6) that

$$0 < \alpha^* = \frac{C_{TLA}}{C_{SL}} = \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1S}}{\lambda_S} < 1. \tag{9}$$

Thus, the power for locating an unobserved susceptibility gene is compromised (as reflected by $\alpha^*$) if the true mechanism is a two-locus additive model. The fact that $C_{TLA}$ can be re-expressed as $\alpha^* C_{SL}$ also suggests that the reduction in power due to the presence of the second additive locus is equivalent to the impact of heterogeneity of magnitude $\alpha^*$. We note that similar observations have been made for likelihood-based linkage analysis [e.g., Goldin, 1992; Vieland et al., 1992; Schork et al., 1993].

Exploratory Analysis for Locating $\tau$
With the expression in (1), i.e. $\mu(t) = 1 + (2\Psi_{t,\tau} - 1) \cdot C$, and Remark 1, a natural question is how one can approximate $\mu(t)$ from the data at hand. Here the investigators observe, for each of $n$ sib pairs, $Y_i = (Y_i(t_1), \dots, Y_i(t_j), \dots, Y_i(t_M))$, where $Y_i(t_j)$ represents the marker information at locus $t_j$, $j = 1, \dots, M$, for the $i$th pedigree. In addition, we denote by $\Phi_i$ the affected status of the sib pair (both are affected in this case) along with the parents' genotypes, if available, $i = 1, \dots, n$. If all individuals were typed for a marker at locus $t$ which is highly polymorphic, IBD sharing $S_i(t)$ can be counted directly. In this case, one can simply estimate $\mu(t)$ by

$$\bar{S}(t) = \sum_{i=1}^{n} S_i(t)/n,$$

where $S_i(t)$ is the number of alleles shared IBD at locus $t$ by the $i$th sib pair. However, only marker loci $t_1 < t_2 < \dots < t_M$ are available and some markers may be less than completely informative about IBD sharing.

In this situation, one needs to consider all possible IBD configurations at locus $t$ consistent with the observed marker information $Y_i$ to make inference about $S_i(t)$. To this end, we propose the use of the following statistic to estimate $\mu(t)$ by imputing $S_i(t)$ given $Y_i$, namely

$$S^*(t) = \sum_{i=1}^{n} S_i(t \mid Y_i)/n, \tag{10}$$

where

$$S_i(t \mid Y_i) = \sum_{l=0}^{2} l \cdot \Pr(S_i(t) = l \mid Y_i) \tag{11}$$

and

$$\Pr(S_i(t) = l \mid Y_i) = \sum_{l_1=0}^{2} \cdots \sum_{l_M=0}^{2} \left\{ \Pr(S_i(t) = l \mid S_i(t_1) = l_1, \dots, S_i(t_M) = l_M) \cdot \Pr(S_i(t_1) = l_1, \dots, S_i(t_M) = l_M \mid Y_i) \right\}. \tag{12}$$
Fig. 3. Plot of $\mu^*(t)$, the expected value of $S^*(t)$ at locus $t$, versus $t$. The flanking markers are at 30 and 50 cM. a $\tau$ = 35 cM. b $\tau$ = 40 cM. c $\tau$ = 45 cM.

Fig. 4. Plot of $\mu^*(t)$, the expected value of $S^*(t)$ at locus $t$. The flanking markers are at 30 and 40 cM; $\tau$ = 35 cM.
The computation of (12) through inheritance vectors has been discussed in detail in Whittemore [1996] and Kruglyak et al. [1996] and has been implemented in the GENEHUNTER program. We note that the first term in (12) involves the recombination fractions among markers at loci $t_1, \dots, t_M$ and $t$, whereas the second term involves the population allele frequencies of the markers at $t_j$, $j = 1, \dots, M$, which are assumed to be known. In the special case that $t = t_j$ for a particular $j$ and that the marker at $t_j$ is fully polymorphic, $S_i(t \mid Y_i) = S_i(t)$, in which case $S^*(t) = \bar{S}(t)$. It is straightforward to see that $S^*(t)$ is an unbiased estimate of

$$\mu^*(t) = E(S(t \mid Y) \mid \Phi). \tag{13}$$

The following proposition examines the connection between $\mu^*(t)$ and $\mu(t)$.

Proposition 2. Assuming, without loss of generality, that $\tau$ is flanked by $t_l$ and $t_{l+1}$ for a particular $l$, $l = 1, \dots, M - 1$, then we have

(i) when $t \le t_l$ or $t \ge t_{l+1}$, i.e. $t$ is outside the interval formed by the flanking markers,

$$\mu^*(t) = \mu(t) = 1 + (2\Psi_{t,\tau} - 1) \cdot C,$$

(ii) when $t_l < t < \tau < t_{l+1}$, i.e. $t$ is within the same interval as $\tau$ and is to the left of $\tau$,

$$\mu^*(t) = 1 + C\,(2\Psi_{t,\tau} - 1) \left\{ 1 - \frac{4\,\Psi_{t_l,t}(1 - \Psi_{t_l,t})\,\Psi_{\tau,t_{l+1}}(1 - \Psi_{\tau,t_{l+1}})}{\Psi_{t_l,t_{l+1}}(1 - \Psi_{t_l,t_{l+1}})} \right\}, \tag{14}$$

and (iii) when $t_l < \tau < t < t_{l+1}$,

$$\mu^*(t) = 1 + C\,(2\Psi_{t,\tau} - 1) \left\{ 1 - \frac{4\,\Psi_{t_l,\tau}(1 - \Psi_{t_l,\tau})\,\Psi_{t,t_{l+1}}(1 - \Psi_{t,t_{l+1}})}{\Psi_{t_l,t_{l+1}}(1 - \Psi_{t_l,t_{l+1}})} \right\}, \tag{15}$$

where $\Psi$ is defined in (2).

A sketch of the proof is given in Appendix 3. This proposition shows that our proposed statistic, $S^*(t)$, $0 \le t \le T$, provides, irrespective of the true mode of inheritance, unbiased estimation of $\mu(t)$ for loci outside of the flanking interval $(t_l, t_{l+1})$. For any arbitrary point $t$ in the same interval as $\tau$, $S^*(t)$ tends to underestimate $\mu(t)$, as the last term in (14) and (15) is less than 1; see Appendix 3. Figure 3 presents plots of $\mu^*(t)$ against $t$ for selected $C$ values. Here we assumed $\tau$ is flanked by two markers at 30 and 50 cM.
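The piecewise form in Proposition 2 can be sketched in a few lines of Python. This is only an illustration under the assumptions already stated (Haldane's map, fully informative flanking markers); the names `psi_dist`, `mu` and `mu_star` and the argument layout are hypothetical.

```python
import math


def psi_dist(d_cm):
    """Psi for two loci d_cm cM apart, combining eqs. (2) and (8)."""
    theta = (1.0 - math.exp(-0.02 * abs(d_cm))) / 2.0
    return theta ** 2 + (1.0 - theta) ** 2


def mu(t, tau, c):
    """Expected IBD at t when S(t) is observed directly, eq. (1)."""
    return 1.0 + (2.0 * psi_dist(t - tau) - 1.0) * c


def mu_star(t, tau, c, left, right):
    """Expected imputed IBD at t; left < tau < right are the flanking markers."""
    if t <= left or t >= right:
        return mu(t, tau, c)  # case (i): no attenuation outside the interval

    def q(a, b):
        p = psi_dist(a - b)
        return p * (1.0 - p)

    if t < tau:   # case (ii), eq. (14)
        shrink = 1.0 - 4.0 * q(left, t) * q(tau, right) / q(left, right)
    else:         # case (iii), eq. (15); both cases agree at t == tau
        shrink = 1.0 - 4.0 * q(left, tau) * q(t, right) / q(left, right)
    return 1.0 + c * (2.0 * psi_dist(t - tau) - 1.0) * shrink


if __name__ == "__main__":
    for t in range(25, 56, 5):
        print(t, round(mu(t, 40.0, 0.5), 4), round(mu_star(t, 40.0, 0.5, 30.0, 50.0), 4))
```

Printing both curves side by side shows that they coincide at and outside the flanking markers, while inside the interval the imputed curve dips below $\mu(t)$.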
For loci within the interval (30, 50), rather than climbing from both ends to a peak at $t = \tau$, $\mu^*(t)$ forms a volcano. This volcano would not be symmetric unless $\tau$ were in the middle of the interval, i.e. $\tau$ = 40 cM in this case; see figure 3. Figure 4 demonstrates the benefit of having a denser marker map. Here $\tau$ at 35 cM is flanked by markers at 30 and 40 cM, resulting in a smaller spread of the volcano rim.

Armed with the characteristics of $\mu^*(t)$ stated above, the plot of $S^*(t)$ versus $t$ would be useful in locating, by inspection, the flanking interval formed by adjacent markers around a disease locus at $\tau$. To illustrate, we simulated fully informative marker data from a single chromosome of length 100 cM for samples of $n$ = 50, 100 and 200 affected sib pairs. We assumed a single-locus model with incomplete penetrance of 0.9 for genotype Dd and a phenocopy rate of 0.1 for genotype dd. No specification of the penetrance of DD is needed in this simple example, as we further assumed that all siblings were offspring of a Dd × dd mating. This simple model was considered in MacLean et al. [1993], Hodge and Elston [1994] and Liang et al. [1996]. Figures 5a, b give plots of $S^*(t)$ versus $t$ over a 100-cM region with M = 10 and 20 equally spaced markers, respectively. In all cases, the estimated curves, $S^*(t)$, resemble the theoretical ones, $\mu^*(t)$, and the true location of the susceptibility gene ($\tau$ = 45 cM when M = 10 and 47.5 cM when M = 20) was demarcated well by the corresponding flanking markers, whose $S^*(t)$ values are higher than those at all other locations. The plots also demonstrate the benefit of having denser markers and larger numbers of affected sib pairs, which produce smoother curves.

Modelling Approach for Locating $\tau$
The approach suggested earlier is exploratory in nature, as it provides a means to visually identify the map interval that may contain $\tau$.
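The exploratory use of marker-wise mean IBD can be mimicked with a small simulation. The sketch below is not the Dd × dd design used for figure 5: as a simplifying assumption it draws $S(\tau)$ directly from a supplied distribution `g = (g0, g1, g2)` for affected pairs and then lets the paternal and maternal sharing indicators evolve independently along the chromosome, each switching with probability $1 - \Psi$ between adjacent loci (Haldane's map, fully informative markers). All names are hypothetical.

```python
import math
import random


def switch_prob(d_cm):
    """Chance that one parental sharing indicator flips over d_cm cM: 1 - Psi."""
    theta = (1.0 - math.exp(-0.02 * abs(d_cm))) / 2.0
    return 2.0 * theta * (1.0 - theta)


def simulate_mean_ibd(n_pairs, markers, tau, g, rng):
    """Return the mean IBD count at each marker over n_pairs simulated pairs."""
    totals = [0] * len(markers)
    order = sorted(range(len(markers)), key=lambda j: markers[j])
    left = [j for j in reversed(order) if markers[j] <= tau]  # nearest to tau first
    right = [j for j in order if markers[j] > tau]            # nearest to tau first
    for _ in range(n_pairs):
        s_tau = rng.choices((0, 1, 2), weights=g)[0]
        if s_tau == 1:
            start = rng.choice(((1, 0), (0, 1)))  # which parental allele is shared
        else:
            start = (s_tau // 2, s_tau // 2)
        for side in (left, right):
            state = list(start)
            pos = tau
            for j in side:  # walk away from tau one marker at a time
                p = switch_prob(markers[j] - pos)
                for k in (0, 1):
                    if rng.random() < p:
                        state[k] = 1 - state[k]
                totals[j] += state[0] + state[1]
                pos = markers[j]
    return [tot / n_pairs for tot in totals]


if __name__ == "__main__":
    markers = [5.0 + 10.0 * j for j in range(10)]
    g = (0.125, 0.5, 0.375)  # single locus, lambda_S = lambda_O = 2, so C = 0.25
    means = simulate_mean_ibd(200, markers, 47.0, g, random.Random(2001))
    for t, m in zip(markers, means):
        print(t, round(m, 3))
```

With a few hundred pairs the printed means are noisy but tend to be largest near the markers flanking the assumed $\tau$; with many more pairs they settle close to $1 + C e^{-0.04|t - \tau|}$, which is equation (1) under Haldane's map.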
With multiple markers (M ≥ 2) typed and the fact that $\mu(t)$ can be characterized parametrically by two parameters, $\tau$ and $C$, a more formal statistical inference for $\tau$ may be warranted.

Remark 2. It is worth reiterating the interpretation of $\tau$ and $C$ before proceeding further. Here $\tau$ is the location of an unobserved susceptibility gene. No assumption has been made as to whether more than one locus is involved in the disease process. The parameter $C$, as defined in (3), is one less than the expected number of alleles shared IBD at locus $\tau$ by an affected sib pair. While estimable, as will be seen below, the magnitude of the estimated $C$, regardless of its precision, would not necessarily reveal a single true genetic mechanism. For example, even if a one-locus model is correct, one cannot rule out the possibility of linkage heterogeneity, as $\alpha$, the proportion of linked families, and $C_{SL}$ are totally confounded with each other; see (5).

Fig. 5. Plots of simulated $S^*(t)$ versus $t$ for $n$ = 50, 100 and 200, respectively. a 10 equally spaced markers and $\tau$ = 45 cM. b 20 equally spaced markers and $\tau$ = 47.5 cM.

To this end, we propose the use of $S_i(t_j \mid Y_i)$, $j = 1, \dots, M$, $i = 1, \dots, n$, as the basis for inference on $\delta = (\tau, C)$. The primary reason for utilizing the proposed statistics $S_i(t \mid Y_i)$ at loci $t_1, \dots, t_M$ only is that, according to Proposition 2,

$$E(S_i(t_j \mid Y_i) \mid \Phi_i) = \mu(t_j). \tag{16}$$

This property, i.e. $S_i(t_j \mid Y_i)$ being unbiased for $\mu(t_j)$, is crucial for the subsequent development. On the other hand, without knowing which of the intervals formed by the $t_j$'s covers $\tau$, one cannot be sure, according to Proposition 2, whether $S_i(t \mid Y_i)$ is unbiased for $\mu(t)$ at an arbitrary locus $t$ (where no marker data are available). We propose to estimate $\delta = (\tau, C)$ by solving the following estimating equations for $\delta$:

$$\sum_{i=1}^{n} \left(\frac{\partial \mu(\delta)}{\partial \delta}\right)' \mathrm{Cov}^{-1}(S_i(Y_i) \mid \Phi_i)\,(S_i(Y_i) - \mu(\delta)) = 0, \tag{17}$$

where $S_i(Y_i) = (S_i(t_1 \mid Y_i), \dots, S_i(t_M \mid Y_i))'$ and $\mu(\delta) = (\mu(t_1; \delta), \dots, \mu(t_M; \delta))'$, both of which are M × 1 vectors; the symbol $'$ denotes the transpose of a matrix of arbitrary dimension. Here we have stressed the dependence of $\mu(t_j)$ on $\delta \equiv (\tau, C)$ by re-expressing it as $\mu(t_j; \delta)$. This approach was developed in the context of longitudinal data analysis, where it is known as the generalized estimating equations (GEE) method [Liang and Zeger, 1986]; there $n$ represents the number of individuals and M is the number of repeated observations, here $S_i(t_j \mid Y_i)$, $j = 1, \dots, M$, at occasions $t_1, t_2, \dots, t_M$. This method has the desirable property that the derived estimates of $\tau$ and $C$ and their estimated precision are valid so long as (16) holds, which is the case as suggested by Proposition 2.

One minor modification is needed when employing this method (or any method requiring differentiability with respect to the parameters): strictly speaking, $\mu(t)$ in (1) is not differentiable with respect to $\tau$; see the Haldane function in (8). This can be fixed by replacing $|t - \tau|$ in (8) by

$$\begin{cases} |t - \tau| & \text{if } |t - \tau| \ge \varepsilon, \\ \dfrac{1}{2\varepsilon}(t - \tau)^2 + \dfrac{1}{2}\,\varepsilon & \text{if } |t - \tau| < \varepsilon, \end{cases} \tag{18}$$

where $\varepsilon$ is some prespecified positive number. Such a modification is commonly used in robust regression analysis as a means to reduce the impact of potential outliers [e.g. Huber, 1964]. Figure 6 contrasts $\mu(t)$ with the curve obtained when (18) with $\varepsilon$ = 1, instead of $|t - \tau|$, is employed in computing $\Psi(\theta_{t,\tau})$. As expected, both curves are identical except at loci $t$ within $\varepsilon$ cM of $\tau$, the true location. The difference appears to be negligible and, more importantly, the new curve peaks at $\tau$ as well.

Fig. 6. Plots of $\mu(t)$ and approximated $\mu(t)$ based on equation (18), respectively, versus $t$. Here $C$ = 0.5 and $\tau$ = 35 cM.

Table 1. GEE estimates and their standard errors of $\tau$ and $C$ for the six simulated data sets in figure 5. Columns: number of markers, M; number of pedigrees, $n$; true location of $\tau$, cM; estimate ± s.e. of $\tau$; estimate ± s.e. of $C$.

We have applied this GEE method to the six simulated data sets presented in figure 5; the estimates of $\tau$ and $C$ and the corresponding standard error (s.e.) estimates are given in table 1. In all 6 cases considered, the proposed method provides reliable estimates of $\tau$, the true (but unobserved) location of the susceptibility gene. The s.e. estimates of $\tau$ strongly suggest the benefit, as expected, of having a greater sample size and denser markers. For instance, the variance estimate of $\tau$ falls from 22.76 [= $(4.77)^2$] for $n$ = 50 to 5.76 [= $(2.40)^2$] for $n$ = 200 with M = 10 markers, a 74% reduction in uncertainty. Meanwhile, a 39.2% reduction in uncertainty is achieved if the number of equally spaced markers increases from 10 to 20 where $n$ = 50. As a side remark, we have also applied the same GEE method with $\varepsilon$ values of 0.5 and 0.1.
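The smoothing in (18) and its effect on differentiability can be checked numerically. The following Python sketch assumes Haldane's map, under which $2\Psi - 1 = e^{-0.04 \cdot \text{distance}}$; the names `smooth_abs`, `mu_smooth` and `dmu_dtau` are illustrative and do not come from GENEFINDER.

```python
import math


def smooth_abs(d, eps=1.0):
    """Eq. (18): |d| when |d| >= eps, otherwise d^2 / (2 * eps) + eps / 2."""
    if abs(d) >= eps:
        return abs(d)
    return d * d / (2.0 * eps) + eps / 2.0


def mu_smooth(t, tau, c, eps=1.0):
    """mu(t) with |t - tau| replaced by its smoothed version.

    Under Haldane's map, 2 * Psi - 1 = (1 - 2 * theta)^2 = exp(-0.04 * distance).
    """
    return 1.0 + c * math.exp(-0.04 * smooth_abs(t - tau, eps))


def dmu_dtau(t, tau, c, eps=1.0):
    """Analytic derivative of mu_smooth with respect to tau."""
    d = t - tau
    slope = d / eps if abs(d) < eps else math.copysign(1.0, d)
    return 0.04 * c * math.exp(-0.04 * smooth_abs(d, eps)) * slope


if __name__ == "__main__":
    for t in (33.0, 34.5, 35.0, 35.5, 37.0):
        print(t, round(mu_smooth(t, 35.0, 0.5), 5), round(dmu_dtau(t, 35.0, 0.5), 5))
```

The derivative passes continuously through zero at $t = \tau$, which is the property a gradient-based solver for (17) would rely on; outside the $\varepsilon$ window the smoothed and original curves are identical.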
Results are virtually identical and therefore are not reported here. Finally, figure 7 gives plots of $S^*(t)$ and the fitted $\mu(t)$, along with the estimated $\tau$ and $C$ values, versus $t$ for two simulated data sets. These plots present graphical evidence of the complementary information provided by the exploratory and confirmatory approaches.

Fig. 7. Plots of simulated $S^*(t)$ and fitted $\mu(t)$ versus $t$. a $n$ = 100, M = 10 and $\tau$ = 45 cM. b $n$ = 200, M = 20 and $\tau$ = 47.5 cM.

Power to Detect Linkage
Another use of formula (1) in the GEE approach is that the sample size (in terms of the number of affected sib pairs $n$) needed to detect linkage in multipoint analysis can be readily computed. Under the null hypothesis of no linkage between the region and the susceptibility gene, i.e. $H_0$: $\tau = \infty$, the estimating function in (17) reduces to

$$L = \sum_{i=1}^{n} \mathbf{1}'\, \mathrm{Cov}^{-1}(S_i(Y_i) \mid \Phi_i; H_0)\,(S_i(Y_i) - \mathbf{1}) = \sum_{i=1}^{n} L_i, \tag{19}$$

where $\mathbf{1} = (1, \dots, 1)'$, an M × 1 matrix. This statistic has the feature of equating $S_i(t_j \mid Y_i)$, which is observable, to its expected value (1) under $H_0$. This suggests that $L$, which combines IBD information across markers and pedigrees, can serve as the basis for testing $H_0$. Specifically, one may test $H_0$ by referring

$$L^* = \frac{L}{\left\{\sum_{i=1}^{n} \mathbf{1}'\, \mathrm{Cov}^{-1}(S_i(Y_i) \mid \Phi_i; H_0)\, \mathbf{1}\right\}^{1/2}} \tag{20}$$

to the standard normal distribution in a one-sided test. Straightforward derivation gives the following sample size formula with type I error $\gamma$ and type II error $\beta$:

$$n = \frac{\left(\sqrt{v_0}\, z_\gamma + \sqrt{v_1}\, z_\beta\right)^2}{\left(\mathbf{1}'\, \mathrm{Cov}^{-1}(S_1(Y_1) \mid \Phi; H_0)\,(\mu(\delta) - \mathbf{1})\right)^2}, \tag{21}$$

where $z_\gamma$ denotes the $(1 - \gamma)$th quantile of the standard normal distribution and

$$v_0 = \mathrm{Var}(L_1; H_0) = \mathbf{1}'\, \mathrm{Cov}^{-1}(S_1(Y_1) \mid \Phi; H_0)\, \mathbf{1},$$

$$v_1 = \mathrm{Var}(L_1; H_A) = \mathbf{1}'\, \mathrm{Cov}^{-1}(S_1(Y_1) \mid \Phi; H_0)\, \mathrm{Cov}(S_1(Y_1) \mid \Phi; H_A)\, \mathrm{Cov}^{-1}(S_1(Y_1) \mid \Phi; H_0)\, \mathbf{1}.$$
Explicit expressions for $\mathrm{Cov}(S_1(Y_1) \mid \Phi; H_0)$ and $\mathrm{Cov}(S_1(Y_1) \mid \Phi; H_A)$, both of which are M × M matrices, are given in Appendix 4. For simplicity, we have assumed, as in Risch [1990b], complete polymorphism for all M markers so that $S(t_j \mid Y) \equiv S(t_j)$.

Remark 3. The following three pieces of information are needed in order to employ the sample size formula shown in (21):
(i) the number of markers, M, along with their relative locations, i.e. $t_1, t_2, \dots, t_M$, in the chromosomal region,
(ii) the postulated location of the targeted susceptibility gene, i.e. $\tau$, which is within the region R formed by the M markers,
(iii) the postulated genetic mechanism via

$$E(S(\tau) \mid \Phi) = 1 + g_2 - g_0, \qquad \mathrm{var}(S(\tau) \mid \Phi) = g_2 + g_0 - (g_2 - g_0)^2,$$

where $g_l = \Pr(S(\tau) = l \mid \Phi)$ for $l$ = 0, 1, 2.

Remark 4. By (1), $\mu(t_j) - 1 = C\,(2\Psi_{t_j,\tau} - 1)$ for each $j = 1, \dots, M$, so the denominator of (21) is proportional to $C^2$, where $C$ is defined in (3). In light of the comments made after (3), it is our speculation that it is the magnitude of $E(S(\tau) \mid \Phi)$, rather than that of $\mathrm{var}(S(\tau) \mid \Phi)$, which appears in the numerator of (21), that will have the greater impact on the final sample size in any situation. In particular:

Model 1 (single locus)

$$E(S(\tau) \mid \Phi) = 1 + \frac{\lambda_M - 1}{4\lambda_S} \equiv E, \qquad \mathrm{var}(S(\tau) \mid \Phi) = \frac{\lambda_M + 1}{4\lambda_S} - \frac{(\lambda_M - 1)^2}{16\lambda_S^2} \equiv V.$$

Model 2 (single locus with heterogeneity)

$$E(S(\tau) \mid \Phi) = 1 + \alpha\,\frac{\lambda_M - 1}{4\lambda_S}, \qquad \mathrm{var}(S(\tau) \mid \Phi) = \alpha\left(V + (1 - \alpha)(E - 1)^2 - 1/2\right) + 1/2.$$

Model 3 (two-locus additive model)

$$E(S(\tau) \mid \Phi) = 1 + \frac{1}{4\lambda_S}\left(\frac{K_1}{K}\right)^2 (\lambda_{1M} - 1),$$

$$\mathrm{var}(S(\tau) \mid \Phi) = -\frac{(\lambda_{1M} - 1)^2}{16\lambda_S^2}\left(\frac{K_1}{K}\right)^4 + \frac{\lambda_{1M} - 2\lambda_{1S} + 1}{4\lambda_S}\left(\frac{K_1}{K}\right)^2 + \frac{1}{2}.$$

Model 4 (two-locus multiplicative model)

$$E(S(\tau) \mid \Phi) = 1 + \frac{\lambda_{1M} - 1}{4\lambda_{1S}}, \qquad \mathrm{var}(S(\tau) \mid \Phi) = \frac{\lambda_{1M} + 1}{4\lambda_{1S}} - \frac{(\lambda_{1M} - 1)^2}{16\lambda_{1S}^2}.$$

For the most part, the above expressions are derived from formulas provided in Risch [1990b].

Figure 8 shows plots of the sample sizes needed, in log scale, versus $\tau$, the true location of the susceptibility gene. Here the type I and II errors are taken as 0.0001 and 0.2, respectively, the former corresponding approximately to a lod of 3.
A single-locus model with $\lambda_S = \lambda_O$ [Risch, 1990b] was assumed so that

$$E(S(\tau) \mid \Phi) = 1 + \frac{\lambda_S - 1}{2\lambda_S}, \qquad \mathrm{Var}(S(\tau) \mid \Phi) = \frac{1}{2} - \frac{(\lambda_S - 1)^2}{4\lambda_S^2}.$$

We consider three cases:
Case I: M = 1 with $t_1$ = 45 cM;
Case II: M = 2 with $t_1$ = 45 cM and $t_2$ = 55 cM;
Case III: M = 4 with $t_1$ = 35 cM, $t_2$ = 45 cM, $t_3$ = 55 cM and $t_4$ = 65 cM.

Several remarks are worth noting. First, whether having more markers would help to reduce the sample size necessary to detect linkage depends heavily on two factors: (i) whether $\tau$ is within the region spanned by the markers and (ii) whether one of the observed markers is adjacent to $\tau$. For example, when $\tau$ = 46 cM and $\lambda_S$ = 2, fewer affected sib pairs are required in Case I (where a single marker is at $t$ = 45 cM), which requires $n$ = 176, than in Case II ($n$ = 199) and Case III ($n$ = 65). However, when $\tau$ = 55 cM, one needs 36 pairs to detect linkage to a single marker at $t$ = 45 cM ($\theta \approx 0.09$) as opposed to $n$ = 196 and 6 for Cases II and III, respectively. Second, one important advantage of having multiple markers for mapping the susceptibility gene is that the sample size is remarkably stable over the range spanned by the markers so long as $\tau$ is within that region; see the numbers quoted above for Cases II and III. This is to be contrasted with the single-marker situation (Case I), in which the logarithm of the sample size increases approximately linearly in $|t - \tau|$. Given that one is never certain about the exact location of $\tau$, the multiple-marker approach provides a conservative yet more robust approach for detecting linkage. Third, while the advantage of having multiple (or more) markers for detecting linkage (hypothesis testing) is not overwhelming, its advantage for more precisely locating a susceptibility gene (mapping) is rather convincing, as demonstrated in the previous sections.
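For a single fully informative marker (M = 1), formula (21) collapses to a one-sample calculation, which the Python sketch below works through. Two points are assumptions rather than statements from the text: the null variance of $S(t)$ is taken as 1/2 (IBD probabilities 1/4, 1/2, 1/4), and the variance under $H_A$ is computed at the marker from the Haseman and Elston transition probabilities instead of the Appendix 4 expressions, which are not reproduced here. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def marker_ibd_dist(g_tau, d_cm):
    """Distribution of S at a marker d_cm cM from tau, given g_tau = (g0, g1, g2)."""
    theta = (1.0 - math.exp(-0.02 * abs(d_cm))) / 2.0
    psi = theta ** 2 + (1.0 - theta) ** 2
    rows = (  # Pr(S(t) = k | S(tau) = l), rows indexed by l, columns by k
        (psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2),
        (psi * (1 - psi), psi ** 2 + (1 - psi) ** 2, psi * (1 - psi)),
        ((1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2),
    )
    return tuple(sum(g_tau[l] * rows[l][k] for l in range(3)) for k in range(3))


def n_pairs_single_marker(lambda_s, d_cm, type1=1e-4, type2=0.2):
    """Eq. (21) with M = 1 for a single-locus model with lambda_S = lambda_O."""
    g_tau = (1 / (4 * lambda_s), 0.5, (2 * lambda_s - 1) / (4 * lambda_s))
    p = marker_ibd_dist(g_tau, d_cm)
    mean = p[1] + 2 * p[2]
    var_alt = p[1] + 4 * p[2] - mean ** 2
    z_g = NormalDist().inv_cdf(1 - type1)
    z_b = NormalDist().inv_cdf(1 - type2)
    # With M = 1 and Cov(H0) = 1/2: v0 = 2, v1 = 4 * var_alt, denominator = 4 * (mean - 1)^2
    return (math.sqrt(0.5) * z_g + math.sqrt(var_alt) * z_b) ** 2 / (mean - 1) ** 2


if __name__ == "__main__":
    for d in (1.0, 5.0, 10.0):
        print(d, round(n_pairs_single_marker(2.0, d), 1))
```

With $\lambda_S$ = 2 and a marker 1 cM from $\tau$, this sketch gives a value close to the 176 pairs quoted for Case I; the required number grows quickly as the marker moves away from $\tau$.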
Thus one should not discount the importance of having multiple and dense markers when estimating the location of $\tau$ is just as critical as, if not more critical than, simply testing the hypothesis of linkage to the region. Fourth, the vertical scales of the three figures match well with the intuition that smaller sample sizes are needed for larger $\lambda_S$ [e.g. Risch, 1990b]. One striking result is the similarity of the patterns shown in figures 8a–c for different $\lambda_S$ values. To further explore this observation, express $n$ as a function of $\lambda_S$, i.e. $n(\lambda_S)$. Figure 9 shows plots of $n(\lambda_S = 2)/n(\lambda_S)$ versus $\lambda_S$, $2 \le \lambda_S \le 10$, for the three cases with $\tau$ = 47 cM; results are very similar for other choices of $\tau$. It suggests that the ratios of sample sizes for $\lambda_S$ = 2 versus other $\lambda_S$ values are very similar regardless of the number of markers and the true $\tau$ value. Furthermore, this ratio is well approximated by

$$\frac{C^2(\lambda_S)}{C^2(\lambda_S = 2)} = \frac{\left((\lambda_S - 1)/(2\lambda_S)\right)^2}{(1/4)^2} = \frac{4(\lambda_S - 1)^2}{\lambda_S^2},$$

where we have expressed $C = E(S(\tau) \mid \Phi) - 1$, which appears in the denominator of the sample size formula (21), as a function of $\lambda_S$ for single-locus models. Thus the quantity $C = E(S(\tau) \mid \Phi) - 1$ is not only critical for determining the minimum sample size, as reflected by (21), but also provides a meaningful approximation for contrasting sample sizes with different $\lambda_S$ values; see Remark 4.

Fig. 8. Plots of sample size, in log scale, versus $\tau$. Case I: M = 1 and $t_1$ = 45 cM; Case II: M = 2 and $t_1$ = 45 cM and $t_2$ = 55 cM; Case III: M = 4 and $t_1$ = 35 cM, $t_2$ = 45 cM, $t_3$ = 55 cM and $t_4$ = 65 cM. a $\lambda_S$ = 2. b $\lambda_S$ = 5. c $\lambda_S$ = 10.

Fig. 9. Plots of the sample size ratio for a given $\lambda_S$ versus $\lambda_S$ = 2. Curves shown: Case I, Case II, Case III and $4(\lambda_S - 1)^2/\lambda_S^2$.

Discussion
In this paper, we propose an IBD-based method to locate an unobserved susceptibility gene when data from multiple markers are available. The main novelty of the proposed work lies in the representation seen in (1), which has the following feature: regardless of the true genetic mechanism, the expected IBD for an affected sib pair at any arbitrary locus $t$ is linear in $\Psi_{t,\tau}$, which measures the distance between it and the true susceptibility locus $\tau$, so long as the region formed by the markers contains no more than one susceptibility gene. Based upon this representation, we developed both an exploratory and a formal model-fitting procedure to locate a susceptibility gene within the chromosomal region of interest. Also presented is the sample size formula (in terms of the number of affected sib pairs) for detecting linkage with multiple markers.
Extension of the proposed method to the situation in which some pedigrees may possess three or more affected siblings is straightforward, as one may replace $S_i(t \mid Y_i)$ in (10) by

$$\sum_{l=1}^{m_i} S_{il}(t \mid Y_i)/m_i, \tag{22}$$

where $m_i$ is the number of affected sib pairs in the $i$th pedigree. For designs containing affected relative pairs other than siblings, an expression similar to (1) can be established as well. For example, it is easy to show that for a grandparent-grandchild affected pair, denoted as $\Phi^*$, one has

$$E(S(t) \mid \Phi^*) = \tfrac{1}{2} + (1 - 2\theta_{t,\tau})\left(E(S(\tau) \mid \Phi^*) - \tfrac{1}{2}\right) = \tfrac{1}{2} + (1 - 2\theta_{t,\tau}) \cdot D.$$

Thus the GEE method can be applied to estimate $\tau$, $C$ and $D$ in situations where both affected sib pairs and grandparent-grandchild pairs were sampled. Questions as to whether other scoring functions such as $S_{\text{all}}$ [Whittemore and Halpern, 1994] may be more efficient in detecting linkage than (22) in more complicated situations, including multiple affected relatives, are still under investigation. It is worth noting that in order to compute these imputed IBD statistics, one needs to assume knowledge of the ordering and distances of the multiple markers and of their allele frequencies.

The proposed work should not be viewed as a competitor to existing methods such as the lod score and NPL methods as implemented in the GENEHUNTER program. Rather, our method implicitly assumes that there is some preliminary evidence of linkage within the chromosomal region. Our main goal is to estimate the map position of a single unobserved susceptibility locus while providing a conventional confidence interval for its map position. The assumption of evidence of linkage can be validated by testing the null hypothesis of no linkage using either the methods noted above or, for example, the test statistics considered in Kruglyak et al. [1996], Kong and Cox [1997] and Teng and Siegmund [1998].
Thus, we view the proposed method as a supplement to the existing methods, with the ultimate goal of locating a susceptibility gene in a robust fashion. Obviously, this approach depends on the mapping function used, and we have only considered Haldane's mapping function here. Further work is needed to explore the impact of this assumption and of conditions such as variable levels of independence across the region of interest, possible gender differences in recombination fractions, and imprinting.

Finally, the proposed work, including the exploratory plots, the GEE method to estimate $\tau$, and the sample size and power calculations, has been implemented in a FORTRAN program, GENEFINDER. This program will be made available through the web site when it is properly documented and tested.

Acknowledgments

This work is supported by NIH grant GM. The authors are grateful to Paul Rathouz and Steve Self for helpful discussions and to Chiung-Yu Huang for computing assistance.
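Appendix 1: Proof of Proposition 1

Define $f_l = \Pr(\Phi \mid S(\tau) = l)/\Pr(\Phi)$, $l = 0, 1, 2$, where $\Phi$ is the event that both siblings are affected. Under the assumption that $R$ contains no more than one susceptibility gene, one has for $k = 0, 1, 2$

$$\Pr(S(t) = k \mid \Phi) = \sum_{l=0}^{2} f_l \cdot \Pr(S(t) = k, S(\tau) = l),$$

where the joint distribution of $S(t)$ and $S(\tau)$ has been derived by Haseman and Elston [1972] as a simple function of $\Psi_{t,\tau} = \theta_{t,\tau}^2 + (1 - \theta_{t,\tau})^2$, with $\theta_{t,\tau}$ the recombination fraction between $t$ and $\tau$. Consequently, we have

$$\mu(t) = E(S(t) \mid \Phi) = \tfrac{1}{2}\left\{ f_2 \Psi_{t,\tau} + f_1 + f_0 (1 - \Psi_{t,\tau}) \right\}.$$

But it is straightforward to show that

$$\Pr(S(\tau) = 2 \mid \Phi) = f_2 \cdot \tfrac{1}{4}, \quad \Pr(S(\tau) = 1 \mid \Phi) = f_1 \cdot \tfrac{1}{2}, \quad \Pr(S(\tau) = 0 \mid \Phi) = f_0 \cdot \tfrac{1}{4}.$$

Thus, $\mu(t)$ can be expressed as

$$\begin{aligned}
\mu(t) &= 2\Psi_{t,\tau} \Pr(S(\tau) = 2 \mid \Phi) + \Pr(S(\tau) = 1 \mid \Phi) + 2(1 - \Psi_{t,\tau}) \Pr(S(\tau) = 0 \mid \Phi) \\
&= (2\Psi_{t,\tau} - 1)\left\{ 2\Pr(S(\tau) = 2 \mid \Phi) + \Pr(S(\tau) = 1 \mid \Phi) \right\} + 2(1 - \Psi_{t,\tau}) \\
&= 1 + (2\Psi_{t,\tau} - 1)(E(S(\tau) \mid \Phi) - 1).
\end{aligned}$$

Appendix 2: The Inequality in (9)

Using formulas (5) and (14) of Risch [1990a] repeatedly, we have

$$\begin{aligned}
(6) &= \frac{(K_1/K)^2 (\lambda_{1M} - 1)}{4\lambda_S} \\
&= \frac{(K_1/K)^2 (\lambda_{1M} - 1)}{(K_1/K)^2 \{(\lambda_{1M} - 1) + 2(\lambda_{1O} - 1)\} + (K_2/K)^2 \{(\lambda_{2M} - 1) + 2(\lambda_{2O} - 1)\} + 4} \\
&< \frac{(K_1/K)^2 (\lambda_{1M} - 1)}{(K_1/K)^2 (4\lambda_{1S} - 4) + 4} \\
&< \frac{(K_1/K)^2 (\lambda_{1M} - 1)}{(K_1/K)^2 (4\lambda_{1S} - 4) + 4 (K_1/K)^2} = \frac{\lambda_{1M} - 1}{4\lambda_{1S}} = (4).
\end{aligned}$$

Here we have adopted the notation that the single locus assumed in Model 1 corresponds to locus 1 in Model 3, the two-locus additive model. Consequently,

$$0 < \frac{(6)}{(4)} = \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1S}}{\lambda_S} < 1.$$

Appendix 3: A Sketch of the Proof of Proposition 2

According to (11), it is easy to see that

$$\mu^*(t) = E(S(t \mid Y) \mid \Phi) = 1 + E(\Pr(S(t) = 2 \mid Y) \mid \Phi) - E(\Pr(S(t) = 0 \mid Y) \mid \Phi).$$

For notational simplicity, we assume here that $\Pr(S(t) \mid Y)$ follows the Markov chain of order 1 (MC1) assumption, i.e., the distribution of $S(t)$ depends only on the information provided by the flanking markers. For locus $t$, let $t_j$ and $t_{j+1}$ denote the marker loci that flank $t$. Without loss of generality, we further assume that $Y(t_j) = S(t_j)$ and $Y(t_{j+1}) = S(t_{j+1})$, i.e. markers at loci $t_j$ and $t_{j+1}$ are fully polymorphic. With these assumptions, $\mu^*(t)$ can be reexpressed as

$$\begin{aligned}
\mu^*(t) &= 1 + \sum_{a=0}^{2} \sum_{b=0}^{2} \left[ \Pr(S(t) = 2 \mid S(t_j) = a, S(t_{j+1}) = b) - \Pr(S(t) = 0 \mid S(t_j) = a, S(t_{j+1}) = b) \right] \cdot \Pr(S(t_j) = a, S(t_{j+1}) = b \mid \Phi) \\
&= 1 + \sum_{a=0}^{2} \sum_{b=0}^{2} \left[ \Pr(S(t) = 2 \mid a, b) - \Pr(S(t) = 0 \mid a, b) \right] \cdot \sum_{l=0}^{2} f_l \Pr(S(\tau) = l, S(t_j) = a, S(t_{j+1}) = b) \\
&= 1 + \sum_{l=0}^{2} f_l \sum_{a=0}^{2} \sum_{b=0}^{2} \left[ \Pr(S(t) = 2 \mid a, b) - \Pr(S(t) = 0 \mid a, b) \right] \cdot \Pr(S(\tau) = l, S(t_j) = a, S(t_{j+1}) = b) \\
&= 1 + \sum_{l=0}^{2} f_l B_l, \qquad \text{(A1)}
\end{aligned}$$

where $f_l$ is defined in Appendix 1 as $\Pr(\Phi \mid S(\tau) = l)/\Pr(\Phi)$ and we have used $\Pr(S(t) = 2 \mid a, b)$ to denote $\Pr(S(t) = 2 \mid S(t_j) = a, S(t_{j+1}) = b)$ for simplicity. We consider three exhaustive and mutually exclusive situations.

Situation I: $\tau$ is outside of $(t_j, t_{j+1})$. This is equivalent to (i) in Proposition 2. We consider only the case that $\tau$ is to the right of $(t_j, t_{j+1})$, i.e. $t_j \le t \le t_{j+1} < \tau$, as the results apply to the other case as well. Applying the MC1 assumption repeatedly, we have

$$\begin{aligned}
B_2 &= \sum_a \sum_b \big\{ \Pr(S(\tau) = 2 \mid S(t_{j+1}) = b, S(t) = 2, S(t_j) = a) \cdot \Pr(S(t_{j+1}) = b \mid S(t) = 2, S(t_j) = a) \cdot \Pr(S(t) = 2, S(t_j) = a) \\
&\qquad\quad - \Pr(S(\tau) = 2 \mid S(t_{j+1}) = b, S(t) = 0, S(t_j) = a) \cdot \Pr(S(t_{j+1}) = b \mid S(t) = 0, S(t_j) = a) \cdot \Pr(S(t) = 0, S(t_j) = a) \big\} \\
&= \sum_a \sum_b \big\{ \Pr(S(\tau) = 2, S(t_{j+1}) = b, S(t) = 2, S(t_j) = a) - \Pr(S(\tau) = 2, S(t_{j+1}) = b, S(t) = 0, S(t_j) = a) \big\} \\
&= \Pr(S(\tau) = 2, S(t) = 2) - \Pr(S(\tau) = 2, S(t) = 0) \\
&= \tfrac{1}{4} \Psi_{t,\tau}^2 - \tfrac{1}{4} (1 - \Psi_{t,\tau})^2 = \tfrac{1}{4} (2\Psi_{t,\tau} - 1).
\end{aligned}$$

Similarly, one can show that

$$B_1 = \Pr(S(\tau) = 1, S(t) = 2) - \Pr(S(\tau) = 1, S(t) = 0) = \tfrac{1}{2} \Psi_{t,\tau}(1 - \Psi_{t,\tau}) - \tfrac{1}{2} \Psi_{t,\tau}(1 - \Psi_{t,\tau}) = 0$$

and

$$B_0 = \Pr(S(\tau) = 0, S(t) = 2) - \Pr(S(\tau) = 0, S(t) = 0) = \tfrac{1}{4} (1 - \Psi_{t,\tau})^2 - \tfrac{1}{4} \Psi_{t,\tau}^2 = \tfrac{1}{4} (1 - 2\Psi_{t,\tau}).$$

Thus, one has

$$\begin{aligned}
\mu^*(t) &= 1 + f_2 \cdot \tfrac{1}{4} (2\Psi_{t,\tau} - 1) + f_1 \cdot 0 + f_0 \cdot \tfrac{1}{4} (1 - 2\Psi_{t,\tau}) \\
&= 1 + (2\Psi_{t,\tau} - 1)\left( \Pr(S(\tau) = 2 \mid \Phi) - \Pr(S(\tau) = 0 \mid \Phi) \right) \\
&= 1 + (2\Psi_{t,\tau} - 1)(E(S(\tau) \mid \Phi) - 1) = \mu(t).
\end{aligned}$$

Situation II: $t_j \le t \le \tau \le t_{j+1}$. This situation corresponds to (ii) of Proposition 2. Before proceeding, we state the following lemma, which will prove to be useful.

Lemma: For any three loci at $t_1 < t_2 < t_3$, $2\Psi_{t_1,t_3} - 1 = (2\Psi_{t_1,t_2} - 1)(2\Psi_{t_2,t_3} - 1)$.

Proof. From (10), one has

$$2\Psi_{t_1,t_3} - 1 = (1 - 2\theta_{t_1,t_3})^2 = e^{-0.04 |t_1 - t_3|} = e^{-0.04 (|t_1 - t_2| + |t_2 - t_3|)} = (2\Psi_{t_1,t_2} - 1)(2\Psi_{t_2,t_3} - 1).$$

Long and tedious algebraic manipulations give

$$B_2 = -B_0 = \tfrac{1}{4} \cdot (2\Psi_{t,\tau} - 1) \left[ 1 - 4 \Psi_{t_j,t}(1 - \Psi_{t_j,t}) \, \frac{\Psi_{\tau,t_{j+1}}(1 - \Psi_{\tau,t_{j+1}})}{\Psi_{t_j,t_{j+1}}(1 - \Psi_{t_j,t_{j+1}})} \right] \qquad \text{(A2)}$$

and $B_1 = 0$. Denoting the last term in (A2) as $1 - F$, we have from (A1)

$$\mu^*(t) = 1 + \tfrac{1}{4}(f_2 - f_0)(2\Psi_{t,\tau} - 1)(1 - F) = 1 + (E(S(\tau) \mid \Phi) - 1)(2\Psi_{t,\tau} - 1)(1 - F).$$

To show that $0 \le 1 - F \le 1$, note that $\Psi(1 - \Psi) = [1 - (2\Psi - 1)^2]/4$ and $0 \le (2\Psi - 1)^2 \le 1$. Thus, by the Lemma,

$$\frac{\Psi_{\tau,t_{j+1}}(1 - \Psi_{\tau,t_{j+1}})}{\Psi_{t_j,t_{j+1}}(1 - \Psi_{t_j,t_{j+1}})} = \frac{1 - (2\Psi_{\tau,t_{j+1}} - 1)^2}{1 - (2\Psi_{t_j,t_{j+1}} - 1)^2} = \frac{1 - (2\Psi_{\tau,t_{j+1}} - 1)^2}{1 - (2\Psi_{t_j,\tau} - 1)^2 (2\Psi_{\tau,t_{j+1}} - 1)^2} \le 1.$$

Furthermore, $0 \le \Psi(1 - \Psi) \le 1/4$. As a result, $0 \le F < 1$ and this implies that $0 \le 1 - F \le 1$.

Situation III: $t_j \le \tau \le t \le t_{j+1}$. This situation corresponds to (iii) of Proposition 2. The proof is similar to that in situation II and is therefore omitted.

Appendix 4: Expressions for $\mathrm{Cov}(S(Y) \mid \Phi; H_0)$ and $\mathrm{Cov}(S(Y) \mid \Phi; H_A)$

As noted in the text, the $j$th component of $S(Y) = (S(t_1 \mid Y), \ldots, S(t_M \mid Y))$ reduces to $S(t_j)$, i.e. $S(t_j \mid Y) = S(t_j)$, if the markers are fully polymorphic, as we assume here. Under $H_A$, i.e. $\tau$ is within the region $R$,

$$\mathrm{Var}(S(t_j) \mid \Phi; H_A) = (2\Psi_{t_j,\tau} - 1)^2 \left( \mathrm{Var}(S(\tau) \mid \Phi) - \tfrac{1}{2} \right) + \tfrac{1}{2}, \quad j = 1, \ldots, M,$$

and $\mathrm{Cov}(S(t_j), S(t_l) \mid \Phi; H_A)$, $j < l = 1, \ldots, M$, equals

$$\begin{cases}
(2\Psi_{t_j,t_l} - 1) \mathrm{Var}(S(\tau) \mid \Phi) & \text{if } \tau \in [t_j, t_l], \\
(2\Psi_{t_j,\tau} - 1)(2\Psi_{t_l,\tau} - 1) \left( \mathrm{Var}(S(\tau) \mid \Phi) - \tfrac{1}{2} \right) + \tfrac{1}{2} (2\Psi_{t_j,t_l} - 1) & \text{if } \tau \notin [t_j, t_l].
\end{cases}$$

Under the null hypothesis $H_0$ of no linkage, $\mathrm{Var}(S(\tau) \mid \Phi) = 1/2$ and consequently

$$\mathrm{Cov}(S(t_j), S(t_l) \mid \Phi; H_0) = (2\Psi_{t_j,t_l} - 1)/2, \quad j < l = 1, \ldots, M.$$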
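As a numerical companion to the Lemma of Appendix 3 and the covariance expressions of Appendix 4, the following is a minimal sketch. It assumes fully polymorphic markers, positions in cM, and Haldane's mapping function, under which $(1 - 2\theta)^2 = e^{-0.04 d}$ for two loci $d$ cM apart. The names `rho` and `cov_matrix` are hypothetical, and the code follows the expressions as they read in this appendix rather than the GENEFINDER program.

```python
import math


def rho(d_cm: float) -> float:
    """(1 - 2*theta)^2 under Haldane's map function, for a distance in cM."""
    return math.exp(-0.04 * abs(d_cm))


def cov_matrix(markers_cm, tau_cm=None, var_tau=0.5):
    """Covariance of IBD scores at fully polymorphic markers.

    tau_cm=None gives the null-hypothesis matrix. Otherwise var_tau is the
    variance of S(tau) among affected sib pairs at the susceptibility
    locus tau_cm (an input the caller must supply).
    """
    m = len(markers_cm)
    cov = [[0.0] * m for _ in range(m)]
    for j, tj in enumerate(markers_cm):
        for l, tl in enumerate(markers_cm):
            if tau_cm is None:
                # Null hypothesis: no linkage anywhere in the region.
                cov[j][l] = rho(tj - tl) / 2.0
                continue
            lo, hi = min(tj, tl), max(tj, tl)
            if j != l and lo <= tau_cm <= hi:
                # tau lies between the two markers.
                cov[j][l] = rho(tj - tl) * var_tau
            else:
                # tau lies outside the interval; j == l gives the variance.
                cov[j][l] = (rho(tj - tau_cm) * rho(tl - tau_cm) * (var_tau - 0.5)
                             + rho(tj - tl) / 2.0)
    return cov


if __name__ == "__main__":
    # Lemma: the factor over (t1, t3) is the product over (t1, t2), (t2, t3).
    print(abs(rho(25.0) - rho(10.0) * rho(15.0)) < 1e-12)
    for row in cov_matrix([0.0, 10.0, 20.0], tau_cm=10.0, var_tau=0.6):
        print([round(x, 4) for x in row])
```

Setting `var_tau` to 1/2 under the alternative reproduces the null matrix, which gives a quick consistency check on the two sets of expressions.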
References

Goldin LR: Detection of linkage under heterogeneity: Comparison of two-locus vs. admixture models. Genet Epidemiol 1992;9:
Haldane JBS: The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet 1919;8:
Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 1972;2:3–19.
Hauser ER, Boehnke M: Genetic linkage analysis of complex genetic traits by using affected sibling pairs. Biometrics 1998;54:
Hodge SE, Elston R: Lods, wrods, and mods: The interpretation of lod scores calculated under different models. Genet Epidemiol 1994;11:
Hodge SE, Vieland VJ: The essence of single ascertainment. Genetics 1996;144:
Huber PJ: Robust estimation of a location parameter. Ann Math Statist 1964;35:
Huber PJ: The behavior of maximum likelihood estimators under non-standard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, vol 1, pp
Kong A, Cox NJ: Allele-sharing models: Lod scores and accurate linkage tests. Am J Hum Genet 1997;61:
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: A unified multipoint approach. Am J Hum Genet 1996;58:
Lander ES, Kruglyak L: Genetic dissection of complex traits: Guidelines for interpreting and reporting linkage results. Nat Genet 1995;11:
Liang KY, Rathouz PJ, Beaty TH: Determining linkage and mode of inheritance: Mod scores and other methods. Genet Epidemiol 1996;13:
Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13–22.
MacLean CJ, Bishop DT, Sherman SL, Diehl SR: Distribution of lod scores under uncertain mode of inheritance. Am J Hum Genet 1993;52:
Risch N: Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 1990a;46:222–228.
Risch N: Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet 1990b;46:229–241.
Schork NJ, Boehnke M, Terwilliger JD, Ott J: Two-trait-locus linkage analysis: A powerful strategy for mapping complex genetic traits. Am J Hum Genet 1993;53:
Smith CAB: Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet 1963;27:
Suarez BK, Rice J, Reich T: The generalized sib pair IBD distribution: Its use in the detection of linkage. Ann Hum Genet 1978;42:
Teng J, Siegmund D: Multipoint linkage analysis using affected relative pairs and partially informative markers. Biometrics 1998;54:
Vieland VJ, Hodge SE, Greenberg DA: Adequacy of single-locus approximations for linkage analysis of oligogenic traits. Genet Epidemiol 1992;9:
Whittemore AS: Genome scanning for linkage: An overview. Am J Hum Genet 1996;59:
Whittemore AS, Halpern J: A class of tests for linkage using affected pedigree members. Biometrics 1994;50:
Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies
More informationSimple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com
12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and
More informationChapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)
12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²
More informationQuantitative Genetics
Bruce Walsh, University of Arizona, Tucson, Arizona, USA Almost any trait that can be defined shows variation, both within and between populations. Quantitative genetics is concerned with the analysis
More informationOPTIMALITY AND STABILITY OF SYMMETRIC EVOLUTIONARY GAMES WITH APPLICATIONS IN GENETIC SELECTION. (Communicated by Yang Kuang)
MATHEMATICAL BIOSCIENCES doi:10.3934/mbe.2015.12.503 AND ENGINEERING Volume 12, Number 3, June 2015 pp. 503 523 OPTIMALITY AND STABILITY OF SYMMETRIC EVOLUTIONARY GAMES WITH APPLICATIONS IN GENETIC SELECTION
More informationMeasurements and Data Analysis
Measurements and Data Analysis 1 Introduction The central point in experimental physical science is the measurement of physical quantities. Experience has shown that all measurements, no matter how carefully
More informationConfirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models
Confirmatory Factor Analysis: Model comparison, respecification, and more Psychology 588: Covariance structure and factor models Model comparison 2 Essentially all goodness of fit indices are descriptive,
More informationLecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013
Lecture 9 Short-Term Selection Response: Breeder s equation Bruce Walsh lecture notes Synbreed course version 3 July 2013 1 Response to Selection Selection can change the distribution of phenotypes, and
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationLinkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA
Linkage analysis and QTL mapping in autotetraploid species Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA Collaborators John Bradshaw Zewei Luo Iain Milne Jim McNicol Data and
More informationAnalysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington
Analysis of Longitudinal Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Auckland 8 Session One Outline Examples of longitudinal data Scientific motivation Opportunities
More informationChapter 7: Simple linear regression
The absolute movement of the ground and buildings during an earthquake is small even in major earthquakes. The damage that a building suffers depends not upon its displacement, but upon the acceleration.
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population
More informationMantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC
Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)
More information