A Robust Identity-by-Descent Procedure Using Affected Sib Pairs: Multipoint Mapping for Complex Diseases

Original Paper
Hum Hered 2001;51:64–78
Received: May 1, 1999
Revision received: September 10, 1999
Accepted: October 6, 1999

Kung-Yee Liang (a), Yen-Feng Chiu (a), Terri H. Beaty (b)
Departments of (a) Biostatistics and (b) Epidemiology, School of Hygiene and Public Health, Johns Hopkins University, Baltimore, Md., USA

Key Words
Affected sib pairs · Generalized estimating equations · Identity by descent · Multipoint · Robustness · Sample size and power

Abstract
Multipoint linkage analysis is a powerful tool for localizing susceptibility genes for complex diseases. However, the conventional lod score method relies critically on correct specification of the mode of inheritance for accurate estimation of gene position. On the other hand, allele-sharing methods, as currently practiced, are designed to test the null hypothesis of no linkage rather than to estimate the location of the susceptibility gene(s). In this paper, we propose an identity-by-descent (IBD)-based procedure to estimate the location of an unobserved susceptibility gene within a chromosomal region framed by multiple markers. We deal with the practical situation in which some of the markers might not be fully informative. The IBD statistic at an arbitrary locus within the region is therefore imputed using the multipoint marker information. The method is robust in that it requires no assumption about the genetic mechanism other than that the region contains no more than one susceptibility gene. In particular, the approach builds upon a simple representation of the expected IBD at any locus within the region, using data from affected sib pairs. With this representation, one can carry out a parametric inference procedure to locate an unobserved susceptibility gene. In addition, we derive a sample size formula for the number of affected sib pairs needed to detect linkage with multiple markers.
Throughout, the proposed method is illustrated with simulated data. We have implemented the method in a program, GENEFINDER, which will be made available shortly. The program includes exploratory and formal model-fitting procedures to locate susceptibility genes, together with sample size and power calculations.

Copyright © 2000 S. Karger AG, Basel
Correspondence: Dr. Kung-Yee Liang, Department of Biostatistics, School of Hygiene and Public Health, Johns Hopkins University, Baltimore, MD 21205 (USA)

Introduction

Likelihood-based linkage analysis and allele-sharing methods remain the two most commonly used tools for testing whether markers with known chromosomal locations are linked to unobservable genes controlling susceptibility to a complex disease. Recent advances in molecular biology have generated dense maps of polymorphic markers. These can be used individually, or in multipoint linkage analysis (where multiple markers are considered simultaneously), to identify and map susceptibility genes for complex disorders. Logical connections between the two approaches have been drawn [e.g., Lander and Kruglyak, 1995; Whittemore, 1996; Kruglyak et al., 1996]. Consequently, a statistical package, GENEHUNTER, was made available for unified multipoint analyses of qualitative traits [Kruglyak et al., 1996].

When there is prior evidence that the region contains a susceptibility gene, it is intuitive that considering multiple genetic markers simultaneously should yield greater power to pinpoint the unobserved disease locus. However, the parametric lod score approach requires specification of the mode of inheritance. It is well known that conclusions about the location of susceptibility genes drawn from this approach are sensitive to model misspecification. Nonparametric allele-sharing methods, on the other hand, were designed to test the null hypothesis that marker alleles shared by pairs (or larger sets) of relatives are independent of any putative disease gene; see Hauser and Boehnke [1998] for an excellent review of these methods. For individual markers, such a test provides no specific information about map location or genetic distance. Even in multipoint analysis, allele-sharing methods remain essentially tests of the null hypothesis of no linkage. The multipoint approach to testing this null hypothesis does, however, create a temptation: to conclude that the map location giving the strongest evidence against $H_0$ is the most likely site of the susceptibility locus. A commonly raised question has been whether the map location corresponding to the maximum nonparametric linkage (NPL) score from GENEHUNTER provides direct evidence for the location of the disease gene. The conventional wisdom is that the magnitude of this test statistic depends heavily, among other factors, on sample size.
In the context of allele-sharing methods, sample size concerns not only the number of pedigrees (or affected individuals) but also the informativeness of the individual genetic markers. Thus, a more polymorphic marker may give rise to a larger NPL statistic than closer but less informative markers, even though it is farther from the disease locus.

In this paper, we propose a method to estimate the location of an unobserved susceptibility gene when there is preliminary evidence that the chromosomal region framed by multiple markers includes a disease gene. The method is based on the familiar identity-by-descent (IBD) statistic. Hence, unlike the lod score method, it does not require specification of the mode of inheritance. Furthermore, the method focuses on estimating the location of the disease gene rather than on testing the null hypothesis. In doing so, it capitalizes on the extra information that multiple markers provide compared with a single marker.

The paper is organized as follows. First, we study the robustness of IBD statistics for multipoint analysis. Here we derive a simple representation relating the expected IBD statistic at a single locus to its distance from the disease locus. The robustness reflects the fact that this expression has the same form regardless of the true mode of inheritance. At the same time, close examination of the key coefficient in this expression shows why information for locating an unobserved disease gene is reduced in the presence of oligogenic inheritance and/or genetic heterogeneity. Second, motivated by this expression, we propose a sequence of IBD-based statistics that approximate the expected IBD when not all markers are fully informative. This approach, which is exploratory in nature, provides a way to identify by inspection the interval, formed by flanking markers, that contains the susceptibility gene.
Third, we introduce a more formal inferential procedure to estimate the location of the disease gene, drawing on the analogy between this problem and longitudinal data analysis. Fourth, based on the proposed method, we outline how to compute the sample size needed for multipoint linkage analysis. Here sample size refers to the number of independent affected sib pairs needed to achieve a prespecified statistical power.

Robustness of IBD Statistics

Consider a chromosomal region R of length T centimorgans that contains no more than one unobserved susceptibility gene, at some location $\tau$. For simplicity, we assume for now that the affected sib pair design has been adopted, with M markers at loci $0 \le t_1 < \dots < t_M \le T$ genotyped for each individual; see figure 1. Extension to multiple affected relatives (other than full sibs) is discussed later. Define $S(t)$ as the number of alleles (0, 1 or 2) shared IBD by an affected sib pair at an arbitrary locus $t$, $0 \le t \le T$. The following proposition is crucial for the subsequent development.

Proposition 1. Assume the conventional conditions of random mating, linkage equilibrium and generalized single ascertainment [Hodge and Vieland, 1996]. Then the expected number of alleles shared IBD, $S(t)$, has the form

$$\mu(t) = E\{S(t)\mid\mathcal{A}\} = 1 + (2\psi_{t,\tau} - 1)\{E(S(\tau)\mid\mathcal{A}) - 1\}, \qquad (1)$$

where $\mathcal{A}$ denotes the event that both siblings are affected (the sampling criterion). The quantity $\psi_{t,\tau}$ is a function of the map distance between $t$ and the location $\tau$ of the true susceptibility gene:

$$\psi_{t,\tau} = \psi(\theta_{t,\tau}) = \theta_{t,\tau}^2 + (1 - \theta_{t,\tau})^2, \qquad (2)$$

with $\theta_{t,\tau}$ the recombination fraction between locus $t$ and the unobserved disease gene at $\tau$.

Fig. 1. Hypothetical locations of M observed markers and an unobserved susceptibility gene in a chromosomal region of T cM.

The proof of Proposition 1 is given in Appendix 1. When the region R is unlinked to the postulated disease gene, $\theta_{t,\tau} = 1/2$. Then $\psi_{t,\tau} = 1/2$ and $\mu(t) = 1$, as expected. Note that expression (1), i.e. $E(S(t)\mid\mathcal{A})$ being linear in $\psi_{t,\tau}$, holds regardless of the mode of inheritance of the disease. Furthermore:

Remark 1. Provided $E(S(\tau)\mid\mathcal{A}) > 1$, the expected value of $S(t)$ is strictly decreasing in $|t - \tau|$, the genetic distance between loci $t$ and $\tau$. It attains its maximum value, $E(S(\tau)\mid\mathcal{A})$, at $t = \tau$.

An important statistical implication follows. If $S(t)$ were available for all $t \in [0, T]$, one could plot $\bar S(t)$, the average of $S(t)$ over the sampled pairs, against $t$. The value $\hat t$ at which this plot peaks would provide a consistent estimate of $\tau$, the location of the disease locus [e.g. Huber, 1967]. Here consistency refers to the ideal situation in which the number of available affected sib pairs is sufficiently large.

The coefficient in (1), that is,

$$C = E\{S(\tau)\mid\mathcal{A}\} - 1, \qquad (3)$$

does depend on the underlying genetic mechanism. In particular:

Model 1 (Single Locus). For a single-locus model in which the disease gene resides within region R,

$$C_{SL} = \frac{\lambda_M - 1}{4\lambda_S}, \qquad (4)$$

where $\lambda_M$ and $\lambda_S$ are the risk ratios for an MZ twin and for a sibling of an affected individual, respectively [Risch, 1990a]; see also Suarez et al. [1978] for a different expression.

Model 2 (Single Locus with Heterogeneity). In the presence of heterogeneity, as characterized by the admixture model of Smith [1963],

$$C_{SLH} = \alpha\, C_{SL}, \qquad (5)$$

where $\alpha$ is the proportion of linked families, i.e. $0 < \alpha < 1$.

Model 3 (Two-Locus Additive Model). When two unlinked susceptibility loci act additively, formulas (30) and (31) of Risch [1990b] give

$$C_{TLA} = \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1M} - 1}{4\lambda_S}, \qquad (6)$$

where $K$ is the population prevalence and $K_1$ is the prevalence summand for the first locus. $\lambda_{1M}$ is the risk ratio for an MZ twin attributable to locus 1, the locus linked to the chromosomal region under consideration. This two-locus model can be thought of as mapping one locus when a second, unlinked locus exists.

Model 4 (Two-Locus Multiplicative Model). When the two unlinked loci act multiplicatively, formulas (7) and (8) of Risch [1990b] give

$$C_{TLM} = \frac{\lambda_{1M} - 1}{4\lambda_{1S}}. \qquad (7)$$

Intuitively, in light of Remark 1, one's ability to estimate $\tau$ depends on the magnitude of $C$, which lies between zero and one. As figure 2 shows, the smaller $C$ is, the flatter the plot of $\mu(t)$, the expected IBD statistic, against $t$. This makes it harder to distinguish $\mu(\tau)$ from $\mu(t)$ at adjacent $t$ values. Here we have used Haldane's [1919] mapping function, which relates $\theta$ to the genetic distance in cM between two loci:

$$\theta_{t,\tau} = \tfrac{1}{2}\left(1 - e^{-0.02\,|t - \tau|}\right). \qquad (8)$$
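Equations (1), (2), (4) and (8) are easy to evaluate numerically. The Python sketch below is our own illustration, not GENEFINDER code. It computes $\mu(t)$ for a single-locus model and checks (1) against a direct calculation in which each parent's IBD-sharing indicator keeps its value at $\tau$ with probability $\psi_{t,\tau}$. The function names and the risk ratios $\lambda_M = 3$, $\lambda_S = 2$ (so $\lambda_O = 2$) are assumptions chosen for illustration. The IBD probabilities at $\tau$ use Risch's single-locus expressions $z_2 = \lambda_M/(4\lambda_S)$ and $z_0 = 1/(4\lambda_S)$.

```python
import math

def haldane_theta(d_cm):
    """Recombination fraction for a map distance in cM, Haldane's function (8)."""
    return 0.5 * (1.0 - math.exp(-0.02 * abs(d_cm)))

def psi(theta):
    """Probability that one parent's sharing indicator is unchanged, equation (2)."""
    return theta ** 2 + (1.0 - theta) ** 2

def c_single_locus(lam_m, lam_s):
    """C = E(S(tau)|A) - 1 under a single-locus model, equation (4)."""
    return (lam_m - 1.0) / (4.0 * lam_s)

def mu(t, tau, c):
    """Expected IBD sharing at locus t for an affected sib pair, equation (1)."""
    return 1.0 + (2.0 * psi(haldane_theta(t - tau)) - 1.0) * c

def mu_by_mixture(t, tau, g):
    """E(S(t)|A) from g = (g0, g1, g2), the IBD distribution at tau given A.

    Each parent's sharing indicator at t equals its value at tau with
    probability psi, independently for the two parents."""
    p = psi(haldane_theta(t - tau))
    q = 1.0 - p
    g0, g1, g2 = g
    pr2 = g2 * p * p + g1 * p * q + g0 * q * q
    pr0 = g2 * q * q + g1 * p * q + g0 * p * p
    return 1.0 + pr2 - pr0  # E(S) = Pr(S=1) + 2 Pr(S=2) = 1 + Pr(S=2) - Pr(S=0)

# Illustrative single-locus values (assumed): lambda_M = 3, lambda_S = 2.
lam_m, lam_s = 3.0, 2.0
g2, g0 = lam_m / (4.0 * lam_s), 1.0 / (4.0 * lam_s)
g = (g0, 1.0 - g0 - g2, g2)
c = c_single_locus(lam_m, lam_s)
tau = 35.0
for t in (0.0, 20.0, 35.0, 50.0, 100.0):
    print(f"t={t:5.1f}  mu={mu(t, tau, c):.4f}  mixture={mu_by_mixture(t, tau, g):.4f}")
```

In the mixture, $E(S(t)\mid\mathcal{A}) - 1 = (g_2 - g_0)(2\psi_{t,\tau} - 1)$ for any $(g_0, g_1, g_2)$. The two columns therefore agree whatever IBD distribution is assumed at $\tau$, which is the point of Proposition 1.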

Fig. 2. Plot of $\mu(t)$, the expected IBD statistic at locus $t$, versus $t$. The susceptibility gene is located at $\tau = 35$ cM.

This observation is consistent with the conventional wisdom that heterogeneity generally reduces the power to detect linkage; see (4) and (5). Furthermore, treat single-locus models as a special case of the two-locus model, and let $C_{SL}$ denote the value of (4) for locus 1 acting alone, i.e. $(\lambda_{1M} - 1)/(4\lambda_{1S})$. One can then show from (4) and (6) (see Appendix 2) that

$$0 < \alpha^* = \frac{C_{TLA}}{C_{SL}} = \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1S}}{\lambda_S} < 1. \qquad (9)$$

Thus, if the true mechanism is a two-locus additive model, the power to locate an unobserved susceptibility gene is compromised, as reflected by $\alpha^*$. Moreover, $C_{TLA}$ can be re-expressed as $\alpha^* C_{SL}$. The loss of power due to a second additive locus is therefore equivalent to the effect of heterogeneity of magnitude $\alpha^*$. Similar observations have been made for likelihood-based linkage analysis [e.g., Goldin, 1992; Vieland et al., 1992; Schork et al., 1993].

Exploratory Analysis for Locating $\tau$

Given expression (1), i.e. $\mu(t) = 1 + (2\psi_{t,\tau} - 1)\,C$, and Remark 1, a natural question is how to approximate $\mu(t)$ from the data at hand. For each of $n$ sib pairs, the investigators observe $Y_i = (Y_i(t_1), \dots, Y_i(t_j), \dots, Y_i(t_M))'$. Here $Y_i(t_j)$ represents the marker information at locus $t_j$, $j = 1, \dots, M$, for the $i$th pedigree. In addition, $\mathcal{A}_i$ denotes the affected status of the $i$th sib pair (both affected in this case) together with the parents' genotypes, if available, $i = 1, \dots, n$. Suppose all individuals were typed for a highly polymorphic marker at locus $t$, so that IBD sharing could be counted directly. Then $S_i(t)$ would be observed, and one could estimate $\mu(t)$ simply by

$$\bar S(t) = \frac{1}{n}\sum_{i=1}^{n} S_i(t),$$

where $S_i(t)$ is the number of alleles shared IBD at locus $t$ by the $i$th sib pair. In practice, however, only the marker loci $t_1 < t_2 < \dots < t_M$ are available, and some markers may be less than completely informative about IBD sharing.
In this situation, inference about $S_i(t)$ must consider all IBD configurations at locus $t$ that are consistent with the observed marker information $Y_i$. To this end, we propose to estimate $\mu(t)$ by imputing $S_i(t)$ given $Y_i$, namely

$$S^*(t) = \frac{1}{n}\sum_{i=1}^{n} S_i(t\mid Y_i), \qquad (10)$$

where

$$S_i(t\mid Y_i) = \sum_{l=0}^{2} l\,\Pr(S_i(t) = l\mid Y_i) \qquad (11)$$

and

$$\Pr(S_i(t) = l\mid Y_i) = \sum_{l_1=0}^{2}\cdots\sum_{l_M=0}^{2} \Pr(S_i(t) = l\mid S_i(t_1) = l_1, \dots, S_i(t_M) = l_M)\,\Pr(S_i(t_1) = l_1, \dots, S_i(t_M) = l_M\mid Y_i). \qquad (12)$$
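The following is a minimal sketch of (10) to (12) under the MC1 assumption of Appendix 3, in which only the two flanking markers matter. It does not use the inheritance-vector algorithm of GENEHUNTER. Instead, it enumerates both parents' sharing indicators at $t_l$, $t$ and $t_{l+1}$ directly, which is feasible for sib pairs. In practice, the posterior probabilities $\Pr(S_i(t_l) = a, S_i(t_{l+1}) = b\mid Y_i)$ would come from the marker genotypes and allele frequencies. Here they are hypothetical inputs, and all function names are our own.

```python
import itertools
import math

def haldane_theta(d_cm):
    """Recombination fraction for a map distance in cM, Haldane's function (8)."""
    return 0.5 * (1.0 - math.exp(-0.02 * abs(d_cm)))

def psi(theta):
    """Probability that one parent's sharing indicator is unchanged, equation (2)."""
    return theta ** 2 + (1.0 - theta) ** 2

def cond_ibd_given_flanks(t, t_left, t_right):
    """Pr(S(t)=l | S(t_left)=a, S(t_right)=b) as cond[(a, b)][l], for t_left <= t <= t_right.

    Enumerates both parents' 0/1 sharing indicators at t_left, t and t_right.
    Ignoring phenotypes, each indicator is 0 or 1 with probability 1/2 at t_left
    and keeps its value between two loci with probability psi."""
    p1 = psi(haldane_theta(t - t_left))
    p2 = psi(haldane_theta(t_right - t))
    joint = {}
    for bits in itertools.product((0, 1), repeat=6):
        w = 1.0
        for x_l, x_t, x_r in (bits[:3], bits[3:]):  # paternal, then maternal indicators
            w *= 0.5 * (p1 if x_l == x_t else 1.0 - p1) * (p2 if x_t == x_r else 1.0 - p2)
        key = (bits[0] + bits[3], bits[1] + bits[4], bits[2] + bits[5])  # (a, l, b)
        joint[key] = joint.get(key, 0.0) + w
    cond = {}
    for a in range(3):
        for b in range(3):
            total = sum(joint.get((a, l, b), 0.0) for l in range(3))
            cond[(a, b)] = [joint.get((a, l, b), 0.0) / total for l in range(3)]
    return cond

def imputed_ibd(t, t_left, t_right, post):
    """S_i(t | Y_i) of (11), with (12) restricted to the flanking markers.

    post maps (a, b) to Pr(S_i(t_left)=a, S_i(t_right)=b | Y_i)."""
    cond = cond_ibd_given_flanks(t, t_left, t_right)
    return sum(w * sum(l * pr for l, pr in enumerate(cond[ab])) for ab, w in post.items())

def s_star(t, t_left, t_right, posteriors):
    """S*(t) of (10): the average imputed sharing over the n pairs."""
    return sum(imputed_ibd(t, t_left, t_right, p) for p in posteriors) / len(posteriors)

# Hypothetical posterior marker-IBD probabilities for three pairs, markers at 30 and 50 cM.
posteriors = [{(2, 2): 1.0}, {(1, 1): 0.5, (2, 1): 0.5}, {(0, 1): 1.0}]
for t in (30.0, 35.0, 40.0, 45.0, 50.0):
    print(t, round(s_star(t, 30.0, 50.0, posteriors), 4))
```

At a marker locus the imputation returns the marker's own IBD count. Between markers it shrinks toward the value implied by both flanks, which is the behavior exploited in Proposition 2.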

Fig. 3. Plot of $\mu^*(t)$, the expected value of $S^*(t)$ at locus $t$, versus $t$. The flanking markers are at 30 and 50 cM. a $\tau = 35$ cM. b $\tau = 40$ cM. c $\tau = 45$ cM.

Fig. 4. Plot of $\mu^*(t)$, the expected value of $S^*(t)$ at locus $t$, versus $t$. The flanking markers are at 30 and 40 cM; $\tau = 35$ cM.

The computation of (12) through inheritance vectors has been discussed in detail by Whittemore [1996] and Kruglyak et al. [1996] and has been implemented in the GENEHUNTER program. The first factor in (12) involves the recombination fractions among the markers at loci $t_1, \dots, t_M$ and $t$. The second factor involves the population allele frequencies of the markers at $t_j$, $j = 1, \dots, M$, which are assumed to be known. In the special case that $t = t_j$ for some $j$ and the marker at $t_j$ is fully polymorphic, $S_i(t\mid Y_i) = S_i(t)$, and then $S^*(t) = \bar S(t)$. It is straightforward to see that $S^*(t)$ is an unbiased estimate of

$$\mu^*(t) = E\{S(t\mid Y)\mid\mathcal{A}\}. \qquad (13)$$

The following proposition examines the connection between $\mu^*(t)$ and $\mu(t)$.

Proposition 2. Assume, without loss of generality, that $\tau$ is flanked by $t_l$ and $t_{l+1}$ for a particular $l$, $l = 1, \dots, M - 1$. Then:
(i) when $t \le t_l$ or $t \ge t_{l+1}$, i.e. $t$ lies outside the interval formed by the flanking markers, $\mu^*(t) = \mu(t) = 1 + (2\psi_{t,\tau} - 1)\,C$;
(ii) when $t_l < t < \tau < t_{l+1}$, i.e. $t$ lies in the same interval as $\tau$ and to its left,

$$\mu^*(t) = 1 + C(2\psi_{t,\tau} - 1)\left[1 - \frac{4\,\psi_{t_l,t}(1 - \psi_{t_l,t})\,\psi_{\tau,t_{l+1}}(1 - \psi_{\tau,t_{l+1}})}{\psi_{t_l,t_{l+1}}(1 - \psi_{t_l,t_{l+1}})}\right]; \qquad (14)$$

(iii) when $t_l < \tau < t < t_{l+1}$,

$$\mu^*(t) = 1 + C(2\psi_{t,\tau} - 1)\left[1 - \frac{4\,\psi_{t_l,\tau}(1 - \psi_{t_l,\tau})\,\psi_{t,t_{l+1}}(1 - \psi_{t,t_{l+1}})}{\psi_{t_l,t_{l+1}}(1 - \psi_{t_l,t_{l+1}})}\right], \qquad (15)$$

where $\psi$ is defined in (2).

A sketch of the proof is given in Appendix 3. Proposition 2 shows that, irrespective of the true mode of inheritance, the proposed statistic $S^*(t)$, $0 \le t \le T$, estimates $\mu(t)$ without bias for loci outside the flanking interval $(t_l, t_{l+1})$. For any point $t$ in the same interval as $\tau$, $S^*(t)$ tends to underestimate $\mu(t)$, because the bracketed term in (14) and (15) is less than 1; see Appendix 3. Figure 3 plots $\mu^*(t)$ against $t$ for selected values of $C$, with $\tau$ flanked by two markers at 30 and 50 cM.
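Proposition 2 can be checked numerically. The sketch below evaluates (14) and (15) with fully informative flanking markers and compares them with an independent route that is our own derivation, not the paper's. Code one parent's sharing indicator as $\pm 1$. By the symmetry of the process, its conditional mean given the two flanking indicators is then a linear combination of them. The regression weights are set by the correlations $2\psi - 1 = (1 - 2\theta)^2$. Averaging that combination under $\mathcal{A}$ gives $\mu^*(t)$. The sketch assumes Haldane's map (8). The values $C = 0.5$ and markers at 30 and 50 cM mirror figure 3, and the function names are hypothetical.

```python
import math

def rho(d_cm):
    """2*psi - 1 = (1 - 2*theta)^2 under Haldane's function (8); equals exp(-0.04*|d|)."""
    theta = 0.5 * (1.0 - math.exp(-0.02 * abs(d_cm)))
    return (1.0 - 2.0 * theta) ** 2

def h(d_cm):
    """The factor 4*psi*(1 - psi) appearing in (14) and (15), with psi from (2)."""
    p = 0.5 * (1.0 + rho(d_cm))
    return 4.0 * p * (1.0 - p)

def mu(t, tau, c):
    """Equation (1)."""
    return 1.0 + c * rho(t - tau)

def mu_star(t, tau, c, t_l, t_r):
    """Proposition 2 with tau in (t_l, t_r) and fully informative flanking markers."""
    if t <= t_l or t >= t_r:                                   # case (i)
        return mu(t, tau, c)
    if t < tau:                                                # case (ii), equation (14)
        shrink = h(t - t_l) * h(t_r - tau) / h(t_r - t_l)
    else:                                                      # case (iii), equation (15)
        shrink = h(tau - t_l) * h(t_r - t) / h(t_r - t_l)
    return 1.0 + c * rho(t - tau) * (1.0 - shrink)

def mu_star_regression(t, tau, c, t_l, t_r):
    """Independent route for t_l <= t <= t_r (our derivation, not the paper's).

    With +/-1 coding, E[Z(t) | Z(t_l), Z(t_r)] = a*Z(t_l) + b*Z(t_r), and
    E[Z(s) | A] is proportional to rho(s - tau), so mu*(t) - 1 = c*(a*rho_l + b*rho_r)."""
    r_lt, r_tr, r_lr = rho(t - t_l), rho(t_r - t), rho(t_r - t_l)
    a = (r_lt - r_tr * r_lr) / (1.0 - r_lr ** 2)
    b = (r_tr - r_lt * r_lr) / (1.0 - r_lr ** 2)
    return 1.0 + c * (a * rho(t_l - tau) + b * rho(t_r - tau))

c, tau, t_l, t_r = 0.5, 35.0, 30.0, 50.0
for t in range(25, 60, 5):
    print(t, round(mu(t, tau, c), 3), round(mu_star(t, tau, c, t_l, t_r), 3))
```

The printed columns show the "volcano". Outside (30, 50) the two curves coincide. Inside, $\mu^*(t)$ falls below $\mu(t)$, and at $t = \tau$ it is lower than at the left flanking marker.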
For loci within the interval (30, 50), $\mu^*(t)$ does not climb from both ends to a peak at $t = \tau$; instead, a volcano is created. The volcano is not symmetric unless $\tau$ lies at the middle of the interval, i.e. $\tau = 40$ cM in this case; see figure 3. Figure 4 demonstrates the benefit of a denser marker map. There, $\tau = 35$ cM is flanked by markers at 30 and 40 cM, which gives a smaller spread of the volcano rim. Given these characteristics of $\mu^*(t)$, a plot of $S^*(t)$ versus $t$ is useful for locating by inspection the flanking interval, formed by adjacent markers, around a disease locus at $\tau$.

To illustrate, we simulated fully informative marker data from a single chromosome of length 100 cM for samples of $n$ = 50, 100 and 200 affected sib pairs. We assumed a single-locus model with an incomplete penetrance of 0.9 for genotype Dd and a phenocopy rate of 0.1 for genotype dd. No penetrance for DD needs to be specified in this simple example, because we further assumed that all siblings were offspring of a Dd × dd mating. This simple model was considered by MacLean et al. [1993], Hodge and Elston [1994] and Liang et al. [1996]. Figures 5a and b plot $S^*(t)$ versus $t$ over a 100-cM region with M = 10 and 20 equally spaced markers, respectively. In all cases, the estimated curves $S^*(t)$ resemble the theoretical ones, $\mu^*(t)$. The true location of the susceptibility gene ($\tau$ = 45 cM when M = 10 and 47.5 cM when M = 20) is well demarcated by the corresponding flanking markers, whose $S^*(t)$ values exceed those at all other locations. The plots also show the benefit of denser markers and larger numbers of affected sib pairs, which produce smoother curves.

Modelling Approach for Locating $\tau$

The approach suggested above is exploratory in nature, as it provides a means of visually identifying the map interval that may contain $\tau$.
With multiple markers ($M \ge 2$) typed, and since $\mu(t)$ can be characterized parametrically by two parameters, $\tau$ and $C$, a more formal statistical inference for $\tau$ may be warranted.

Remark 2. It is worth reiterating the interpretation of $\tau$ and $C$ before proceeding further. Here $\tau$ is the location of an unobserved susceptibility gene. No assumption has been made as to whether more than one locus is involved in the disease process. The parameter $C$, as defined in (3), is one less than the expected number of alleles shared IBD at locus $\tau$ by an affected sib pair. $C$ is estimable, as will be seen below. However, the magnitude of its estimate, regardless of precision, does not necessarily reveal a single true genetic mechanism. For example, even if a one-locus model is correct, linkage heterogeneity cannot be ruled out: $\alpha$, the proportion of linked families, and $C_{SL}$ are totally confounded with each other; see (5).

Fig. 5. Plots of simulated $S^*(t)$ versus $t$ for $n$ = 50, 100 and 200, respectively. a 10 equally spaced markers and $\tau$ = 45 cM. b 20 equally spaced markers and $\tau$ = 47.5 cM.

To this end, we propose to use $S_i(t_j\mid Y_i)$, $j = 1, \dots, M$, $i = 1, \dots, n$, as the basis for inference on $(\tau, C)$. We use the statistics $S_i(t\mid Y_i)$ at loci $t_1, \dots, t_M$ only because, by Proposition 2,

$$E\{S_i(t_j\mid Y_i)\mid\mathcal{A}_i\} = \mu(t_j). \qquad (16)$$

This property, i.e. that $S_i(t_j\mid Y_i)$ is unbiased for $\mu(t_j)$, is crucial for the subsequent development. By contrast, consider an arbitrary locus $t$ where no marker data are available. Without knowing which interval formed by the $t_j$'s covers $\tau$, Proposition 2 gives no assurance that $S_i(t\mid Y_i)$ is unbiased for $\mu(t)$.

We propose to estimate $(\tau, C)$ by solving the following estimating equations:

$$\sum_{i=1}^{n}\left(\frac{\partial \boldsymbol{\mu}(\tau, C)}{\partial(\tau, C)}\right)'\operatorname{Cov}^{-1}\{\mathbf{S}_i(Y_i)\mid\mathcal{A}_i\}\,\{\mathbf{S}_i(Y_i) - \boldsymbol{\mu}(\tau, C)\} = \mathbf{0}, \qquad (17)$$

where $\mathbf{S}_i(Y_i) = (S_i(t_1\mid Y_i), \dots, S_i(t_M\mid Y_i))'$ and $\boldsymbol{\mu}(\tau, C) = (\mu(t_1; \tau, C), \dots, \mu(t_M; \tau, C))'$ are both $M \times 1$ vectors. The prime denotes the transpose of a matrix of arbitrary dimension. We write $\mu(t_j)$ as $\mu(t_j; \tau, C)$ to stress its dependence on $(\tau, C)$. This approach was developed for longitudinal data analysis as the generalized estimating equations (GEE) method [Liang and Zeger, 1986]. There, $n$ is the number of individuals and $M$ the number of repeated observations, here $S_i(t_j\mid Y_i)$, $j = 1, \dots, M$, at occasions $t_1, t_2, \dots, t_M$. The estimates of $\tau$ and $C$, and their estimated precision, remain valid as long as (16) holds, which Proposition 2 guarantees.

One minor modification is needed when employing this method, or any method that requires differentiability with respect to the parameters. Strictly speaking, $\mu(t)$ in (1) is not differentiable with respect to $\tau$; see the Haldane function in (8). This can be fixed by replacing $|t - \tau|$ in (8) by

$$\begin{cases} |t - \tau| & \text{if } |t - \tau| \ge \delta,\\[4pt] \dfrac{(t - \tau)^2}{2\delta} + \dfrac{\delta}{2} & \text{if } |t - \tau| < \delta, \end{cases} \qquad (18)$$

where $\delta$ is some prespecified positive number. Such a modification is commonly used in robust regression analysis to reduce the impact of potential outliers [e.g. Huber, 1964]. Figure 6 contrasts $\mu(t)$ with the curve obtained by using (18) with $\delta = 1$, instead of $|t - \tau|$, to compute $\theta_{t,\tau}$. As expected, the two curves are identical except at loci within $\delta$ cM of $\tau$, the true location. The difference appears negligible and, more importantly, the new curve also peaks at $\tau$.

Fig. 6. Plots of $\mu(t)$ and of the approximation to $\mu(t)$ based on equation (18), versus $t$. Here $C = 0.5$ and $\tau = 35$ cM.

We applied this GEE method to the six simulated data sets presented in figure 5. Table 1 gives the estimates of $\tau$ and $C$ and the corresponding standard error (s.e.) estimates. In all six cases, the method provides reliable estimates of $\tau$, the true (but unobserved) location of the susceptibility gene. As expected, the s.e. estimates of $\tau$ show the benefit of larger samples and denser markers. For instance, with M = 10 markers the variance estimate of $\tau$ falls from 22.76 [= (4.77)²] for $n$ = 50 to 5.76 [= (2.40)²] for $n$ = 200, a 74% reduction in uncertainty. Increasing the number of equally spaced markers from 10 to 20 with $n$ = 50 reduces the uncertainty by 39.2%.

Table 1. GEE estimates (± s.e.) of $\tau$ and $C$ for the six simulated data sets in figure 5. Columns: number of markers, M; number of pedigrees, $n$; true location of $\tau$ (cM); estimate ± s.e. of $\tau$; estimate ± s.e. of $C$.

As a side remark, we also applied the same GEE method with $\delta$ values of 0.5 and 0.1.
Results are virtually identical and are therefore not reported here. Finally, figure 7 plots $S^*(t)$ and the fitted $\mu(t)$, based on the estimated $\tau$ and $C$, against $t$ for two simulated data sets. These plots illustrate how the exploratory and confirmatory approaches complement each other.

Fig. 7. Plots of simulated $S^*(t)$ and fitted $\mu(t)$ versus $t$. a $n$ = 100, M = 10 and $\tau$ = 45 cM. b $n$ = 200, M = 20 and $\tau$ = 47.5 cM.

Power to Detect Linkage

Formula (1) has a further use in the GEE approach: it makes it easy to compute the sample size, in terms of the number of affected sib pairs $n$, needed to detect linkage in multipoint analysis. Consider the null hypothesis of no linkage between the region and the susceptibility gene, i.e. $H_0: \tau = \infty$ (so that $\psi_{t,\tau} = 1/2$ for every $t$ in R). Under $H_0$, the estimating function in (17) reduces to

$$L = \sum_{i=1}^{n}\mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_i(Y_i)\mid\mathcal{A}; H_0\}\{\mathbf{S}_i(Y_i) - \mathbf{1}\} = \sum_{i=1}^{n} L_i, \qquad (19)$$

where $\mathbf{1} = (1, \dots, 1)'$ is an $M \times 1$ vector. This statistic compares the observable $S_i(t_j\mid Y_i)$ with its expected value under $H_0$, which is 1. This suggests that $L$, which combines IBD information across markers and pedigrees, can serve as the basis for testing $H_0$. Specifically, one may test $H_0$ by referring

$$L^* = \frac{L}{\left[\sum_{i=1}^{n}\mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_i(Y_i)\mid\mathcal{A}_i; H_0\}\,\mathbf{1}\right]^{1/2}} \qquad (20)$$

to the standard normal distribution in a one-sided test. A straightforward derivation gives the following sample size formula for type I error $\alpha$ and type II error $\beta$:

$$n = \frac{\left(v_0^{1/2} z_\alpha + v_1^{1/2} z_\beta\right)^2}{\left[\mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}\{\boldsymbol{\mu}(\tau, C) - \mathbf{1}\}\right]^2}, \qquad (21)$$

where $z_\alpha$ denotes the $(1 - \alpha)$th quantile of the standard normal distribution and

$$v_0 = \operatorname{Var}(L_1; H_0) = \mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}\,\mathbf{1},$$

$$v_1 = \operatorname{Var}(L_1; H_A) = \mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}\operatorname{Cov}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_A\}\operatorname{Cov}^{-1}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}\,\mathbf{1}.$$
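Once the two covariance matrices are available, formula (21) is straightforward to evaluate. The sketch below assumes fully polymorphic markers, Haldane's map and the single-locus model with $\lambda_S = \lambda_O$ used for figure 8. Under $H_0$ it uses $\operatorname{Cov}\{S(t_j), S(t_k)\} = (2\psi_{t_j,t_k} - 1)/2$. Under $H_A$ it uses a covariance we derived ourselves from the same per-parent sharing-indicator model; this stands in for, and may differ in form from, the expressions of Appendix 4. The one-sided type I error of $10^{-4}$ is likewise our assumption for "approximately a lod of 3". The function names are hypothetical.

```python
import math
from statistics import NormalDist

def rho(d_cm):
    """2*psi - 1 = (1 - 2*theta)^2 under Haldane's function (8)."""
    theta = 0.5 * (1.0 - math.exp(-0.02 * abs(d_cm)))
    return (1.0 - 2.0 * theta) ** 2

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small M only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sample_size(markers, tau, lam_s, alpha=1e-4, beta=0.2):
    """Approximate n from (21) for fully polymorphic markers (a sketch).

    Assumes the single-locus model with lambda_S = lambda_O, a one-sided alpha
    of 1e-4 (our stand-in for a lod of 3) and Haldane's map. The H_A covariance
    is our own derivation, not the expressions of Appendix 4."""
    c = (lam_s - 1.0) / (2.0 * lam_s)                       # C = E(S(tau)|A) - 1
    v_tau = 0.5 - (lam_s - 1.0) ** 2 / (4.0 * lam_s ** 2)   # Var(S(tau)|A)
    m = len(markers)
    r_tau = [rho(t - tau) for t in markers]
    cov0 = [[0.5 * rho(s - t) for t in markers] for s in markers]  # Cov under H0
    cov_a = [[0.0] * m for _ in range(m)]                           # Cov under H_A
    for j, s in enumerate(markers):
        for k, t in enumerate(markers):
            same_side = (s - tau) * (t - tau) >= 0
            within = 0.5 * (rho(s - t) - r_tau[j] * r_tau[k]) if same_side else 0.0
            cov_a[j][k] = within + r_tau[j] * r_tau[k] * v_tau
    w = solve(cov0, [1.0] * m)                             # Cov0^{-1} 1 (Cov0 is symmetric)
    delta = sum(wj * c * rj for wj, rj in zip(w, r_tau))   # 1' Cov0^{-1} (mu - 1)
    v0 = sum(w)
    v1 = sum(w[j] * cov_a[j][k] * w[k] for j in range(m) for k in range(m))
    z = NormalDist().inv_cdf
    return (z(1.0 - alpha) * math.sqrt(v0) + z(1.0 - beta) * math.sqrt(v1)) ** 2 / delta ** 2

cases = {"I": [45.0], "II": [45.0, 55.0], "III": [35.0, 45.0, 55.0, 65.0]}
for name, markers in cases.items():
    print(name, [math.ceil(sample_size(markers, tau, 2.0)) for tau in (46.0, 50.0, 55.0)])
```

Consider $\lambda_S = 2$ and $\tau = 46$ cM, with markers at 45 cM, at 45 and 55 cM, or at 35, 45, 55 and 65 cM (the configurations considered below). For these, the sketch gives roughly 176, 198 and 265 pairs, respectively. The hand-written solver keeps the example dependency-free; with NumPy, `numpy.linalg.solve` would be the natural replacement.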

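Explicit expressions for $\operatorname{Cov}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}$ and $\operatorname{Cov}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_A\}$, both $M \times M$ matrices, are given in Appendix 4. For simplicity, we have assumed, as in Risch [1990b], complete polymorphism for all M markers, so that $S(t_j\mid Y) \equiv S(t_j)$.

Remark 3. The sample size formula (21) requires three pieces of information:
(i) the number of markers, M, and their locations $t_1, t_2, \dots, t_M$ in the chromosomal region;
(ii) the postulated location $\tau$ of the targeted susceptibility gene, within the region R formed by the M markers;
(iii) the postulated genetic mechanism, via

$$E\{S(\tau)\mid\mathcal{A}\} = 1 + g_2 - g_0, \qquad \operatorname{Var}\{S(\tau)\mid\mathcal{A}\} = g_2 + g_0 - (g_2 - g_0)^2,$$

where $g_l = \Pr(S(\tau) = l\mid\mathcal{A})$, $l = 0, 1, 2$.

Remark 4. By (1), $\mu(t_j) - 1 = C(2\psi_{t_j,\tau} - 1)$. The bracketed term in the denominator of (21) can therefore be re-expressed as

$$\mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}\{\boldsymbol{\mu}(\tau, C) - \mathbf{1}\} = C\,\mathbf{1}'\operatorname{Cov}^{-1}\{\mathbf{S}_1(Y_1)\mid\mathcal{A}; H_0\}\,(2\psi_{t_1,\tau} - 1, \dots, 2\psi_{t_M,\tau} - 1)',$$

where $C$ is defined in (3). In light of the comments made after (3), we speculate that the magnitude of $E(S(\tau)\mid\mathcal{A})$ will have a greater impact on the final sample size than that of $\operatorname{Var}(S(\tau)\mid\mathcal{A})$, which enters the numerator of (21). In particular:

Model 1 (single locus):

$$E\{S(\tau)\mid\mathcal{A}\} = 1 + \frac{\lambda_M - 1}{4\lambda_S} \equiv E, \qquad \operatorname{Var}\{S(\tau)\mid\mathcal{A}\} = -\frac{(\lambda_M - 1)^2}{16\lambda_S^2} + \frac{\lambda_M - 2\lambda_S + 1}{4\lambda_S} + \frac{1}{2} \equiv V.$$

Model 2 (single locus with heterogeneity):

$$E\{S(\tau)\mid\mathcal{A}\} = 1 + \alpha\,\frac{\lambda_M - 1}{4\lambda_S}, \qquad \operatorname{Var}\{S(\tau)\mid\mathcal{A}\} = \alpha\left\{V + (1 - \alpha)(E - 1)^2 - \tfrac{1}{2}\right\} + \tfrac{1}{2}.$$

Model 3 (two-locus additive model):

$$E\{S(\tau)\mid\mathcal{A}\} = 1 + \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1M} - 1}{4\lambda_S}, \qquad \operatorname{Var}\{S(\tau)\mid\mathcal{A}\} = -\left(\frac{K_1}{K}\right)^4 \frac{(\lambda_{1M} - 1)^2}{16\lambda_S^2} + \left(\frac{K_1}{K}\right)^2 \frac{\lambda_{1M} - 2\lambda_{1S} + 1}{4\lambda_S} + \frac{1}{2}.$$

Model 4 (two-locus multiplicative model):

$$E\{S(\tau)\mid\mathcal{A}\} = 1 + \frac{\lambda_{1M} - 1}{4\lambda_{1S}}, \qquad \operatorname{Var}\{S(\tau)\mid\mathcal{A}\} = -\frac{(\lambda_{1M} - 1)^2}{16\lambda_{1S}^2} + \frac{\lambda_{1M} - 2\lambda_{1S} + 1}{4\lambda_{1S}} + \frac{1}{2}.$$

For the most part, these expressions are derived from formulas provided in Risch [1990b].

Figure 8 plots the required sample size, on a log scale, against $\tau$, the true location of the susceptibility gene. Here the type II error is taken as 0.2, and the type I error is set at a level corresponding approximately to a lod score of 3.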
A single-locus model with $\lambda_S = \lambda_O$ [Risch, 1990b] was assumed, so that

$$E\{S(\tau)\mid\mathcal{A}\} = 1 + \frac{\lambda_S - 1}{2\lambda_S}, \qquad \operatorname{Var}\{S(\tau)\mid\mathcal{A}\} = \frac{1}{2} - \frac{(\lambda_S - 1)^2}{4\lambda_S^2}.$$

We consider three cases:
Case I: M = 1 with $t_1$ = 45 cM;
Case II: M = 2 with $t_1$ = 45 cM and $t_2$ = 55 cM;
Case III: M = 4 with $t_1$ = 35 cM, $t_2$ = 45 cM, $t_3$ = 55 cM and $t_4$ = 65 cM.

Several remarks are worth noting. First, whether more markers reduce the sample size needed to detect linkage depends heavily on two factors: (i) whether $\tau$ lies within the region spanned by the markers and (ii) whether one of the observed markers is adjacent to $\tau$. For example, when $\tau$ = 46 cM and $\lambda_S$ = 2, Case I (a single marker at $t$ = 45 cM) requires fewer affected sib pairs, $n$ = 176, than Case II ($n$ = 199) or Case III ($n$ = 265). However, when $\tau$ = 55 cM, one needs 36 pairs to detect linkage to a single marker at $t$ = 45 cM ($\theta \approx 0.09$), as opposed to $n$ = 196 and 262 for Cases II and III, respectively. Second, an important advantage of multiple markers for mapping the susceptibility gene is that the sample size is remarkably stable over the range spanned by the markers, as long as $\tau$ lies within that region; see the numbers quoted above for Cases II and III. This contrasts with the single-marker situation (Case I), in which the logarithm of the sample size increases approximately linearly in $|t - \tau|$. Since one is never certain of the exact location of $\tau$, the multiple-marker approach provides a conservative yet more robust way to detect linkage. Third, the advantage of two or more markers for detecting linkage (hypothesis testing) is not overwhelming. Their advantage for locating a susceptibility gene more precisely (mapping), however, is rather convincing, as demonstrated in the previous sections.
Thus one should not discount the importance of multiple, dense markers, since estimating the location of $\tau$ is just as critical as simply testing the hypothesis of linkage to the region, if not more so. Fourth, the vertical scales of the three panels of figure 8 agree with the intuition that smaller samples are needed for larger $\lambda_S$ [e.g. Risch, 1990b]. One striking result is the similarity of the patterns in figures 8a–c across $\lambda_S$ values. To explore this observation further, express $n$ as a function of $\lambda_S$, i.e. $n(\lambda_S)$. Figure 9 plots $n(\lambda_S = 2)/n(\lambda_S)$ versus $\lambda_S$, $2 \le \lambda_S \le 10$, for the three cases with $\tau$ = 47 cM; results are very similar for other choices of $\tau$. Hence the ratio of the sample size for $\lambda_S = 2$ to that for other $\lambda_S$ values is very similar, regardless of the number of markers and the true value of $\tau$. Furthermore, this ratio is well approximated by

$$\frac{C^2(\lambda_S)}{C^2(\lambda_S = 2)} = \frac{\{(\lambda_S - 1)/(2\lambda_S)\}^2}{1/16} = \frac{4(\lambda_S - 1)^2}{\lambda_S^2},$$

where we have written $C = E(S(\tau)\mid\mathcal{A}) - 1$, which appears in the denominator of the sample size formula (21), as a function of $\lambda_S$ for single-locus models. Thus the quantity $C$ is not only critical for determining the minimum sample size, as reflected by (21). It also provides a meaningful approximation for comparing sample sizes across $\lambda_S$ values; see Remark 4.

Fig. 8. Plots of sample size, on a log scale, versus $\tau$. Case I: M = 1 and $t_1$ = 45 cM; Case II: M = 2, $t_1$ = 45 cM and $t_2$ = 55 cM; Case III: M = 4, $t_1$ = 35 cM, $t_2$ = 45 cM, $t_3$ = 55 cM and $t_4$ = 65 cM. a $\lambda_S$ = 2. b $\lambda_S$ = 5. c $\lambda_S$ = 10.

Fig. 9. Plots of the sample size ratio $n(\lambda_S = 2)/n(\lambda_S)$ versus $\lambda_S$ for Cases I, II and III, together with the approximation $4(\lambda_S - 1)^2/\lambda_S^2$.

Discussion

In this paper, we propose an IBD-based method to locate an unobserved susceptibility gene when data from multiple markers are available. The main novelty of the work lies in the representation (1). Regardless of the true genetic mechanism, the expected IBD for an affected sib pair at any locus $t$ is linear in $\psi_{t,\tau}$, a function of the distance between $t$ and the true susceptibility locus $\tau$. This holds as long as the region formed by the markers contains no more than one susceptibility gene. Based on this representation, we developed both an exploratory and a formal model-fitting procedure to locate a susceptibility gene within the chromosomal region of interest. We also presented a formula for the sample size, in terms of the number of affected sib pairs, needed to detect linkage with multiple markers.
The method extends straightforwardly to pedigrees with three or more affected siblings: one may replace $S_i(t\mid Y_i)$ in (10) by

$$\frac{1}{m_i}\sum_{l=1}^{m_i} S_{il}(t\mid Y_i), \qquad (22)$$

where $m_i$ is the number of affected sib pairs in the $i$th pedigree. For designs containing affected relative pairs other than siblings, an expression similar to (1) can be established as well. For example, consider an affected grandparent–grandchild pair, and denote the corresponding sampling event by $\mathcal{A}^*$. It is easy to show that

$$E\{S(t)\mid\mathcal{A}^*\} = \tfrac{1}{2} + (1 - 2\theta_{t,\tau})\left\{E(S(\tau)\mid\mathcal{A}^*) - \tfrac{1}{2}\right\} = \tfrac{1}{2} + (1 - 2\theta_{t,\tau})\,D.$$

Thus the GEE method can be applied to estimate $\tau$, $C$ and $D$ when both affected sib pairs and grandparent–grandchild pairs have been sampled. Whether other scoring functions, such as $S_{all}$ [Whittemore and Halpern, 1994], detect linkage more efficiently than (22) in more complicated situations, including multiple affected relatives, is still under investigation. Note that computing the imputed IBD statistics requires assuming known marker order, inter-marker distances and allele frequencies. The proposed work should not be viewed as a competitor to existing methods such as the lod score and NPL methods implemented in the GENEHUNTER program. Rather, our method implicitly assumes some preliminary evidence of linkage within the chromosomal region. Our main goal is to estimate the map position of a single unobserved susceptibility locus, together with a conventional confidence interval for that position. The assumption of linkage can be validated by testing the null hypothesis of no linkage. This can be done with the methods noted above or, for example, with the test statistics considered by Kruglyak et al. [1996], Kong and Cox [1997] and Teng and Siegmund [1998].
Thus, we view the proposed method as a supplement to existing methods, with the ultimate goal of locating a susceptibility gene in a robust fashion. Obviously, this approach depends on the mapping function used, and we have considered only Haldane's mapping function here. Further work is needed to explore the impact of this assumption and of conditions such as variable levels of independence across the region of interest, possible gender differences in recombination fractions, and imprinting. Finally, the proposed work, including the exploratory plots, the GEE method to estimate $\tau$, and the sample size and power calculations, has been implemented in a FORTRAN program, GENEFINDER. This program will be made available through the web site when it is properly documented and tested.

Acknowledgments

This work is supported by NIH grant GM. The authors are grateful to Paul Rathouz and Steve Self for helpful discussions and to Chiung-Yu Huang for computing assistance.

Appendix 1: Proof of Proposition 1

Define $f_l = \Pr(A \mid S(\tau) = l)/\Pr(A)$, $l = 0, 1, 2$, where $A$ is the event that both siblings are affected. Under the assumption that $R$ contains no more than one susceptibility gene, one has for $k = 0, 1, 2$

$$\Pr(S(t) = k \mid A) = \sum_{l=0}^{2} f_l \Pr(S(t) = k, S(\tau) = l),$$

where the joint distribution of $S(t)$ and $S(\tau)$ has been derived by Haseman and Elston [1972] as a simple function of $\psi_{t,\tau}$. Consequently, we have

$$\mu(t) = E(S(t)\mid A) = \tfrac12 f_2\psi_{t,\tau} + \tfrac12 f_1 + \tfrac12 f_0(1 - \psi_{t,\tau}).$$

But it is straightforward to show that

$$\Pr(S(\tau)=2\mid A) = f_2\cdot\tfrac14, \qquad \Pr(S(\tau)=1\mid A) = f_1\cdot\tfrac12, \qquad \Pr(S(\tau)=0\mid A) = f_0\cdot\tfrac14.$$

Thus, $\mu(t)$ can be expressed as

$$
\begin{aligned}
\mu(t) &= 2\psi_{t,\tau}\Pr(S(\tau)=2\mid A) + \Pr(S(\tau)=1\mid A) + 2(1-\psi_{t,\tau})\Pr(S(\tau)=0\mid A)\\
&= (2\psi_{t,\tau}-1)\Pr(S(\tau)=2\mid A) + \sum_{l=0}^{2}\Pr(S(\tau)=l\mid A) + (1-2\psi_{t,\tau})\Pr(S(\tau)=0\mid A)\\
&= 1 + (2\psi_{t,\tau}-1)\{E(S(\tau)\mid A) - 1\},
\end{aligned}
$$

since $E(S(\tau)\mid A) - 1 = \Pr(S(\tau)=2\mid A) - \Pr(S(\tau)=0\mid A)$.

Appendix 2: The Inequality in (9)

Using formulas (5) and (14) of Risch [1990a] repeatedly, we have

$$
\begin{aligned}
(6) &= \frac{K_1^2(\lambda_{1M}-1)}{4K^2\lambda_S}
= \frac{K_1^2(\lambda_{1M}-1)}{K_1^2\{(\lambda_{1M}-1)+2(\lambda_{1O}-1)\} + K_2^2\{(\lambda_{2M}-1)+2(\lambda_{2O}-1)\} + 4K^2}\\
&< \frac{(K_1/K)^2(\lambda_{1M}-1)}{(K_1/K)^2(4\lambda_{1S}-4) + 4}
< \frac{\lambda_{1M}-1}{4\lambda_{1S}-4+4} = \frac{\lambda_{1M}-1}{4\lambda_{1S}} = (4).
\end{aligned}
$$

The first inequality follows by dividing through by $K^2$, using $(\lambda_{1M}-1)+2(\lambda_{1O}-1) = 4\lambda_{1S}-4$, and dropping the positive locus-2 term from the denominator; the second holds because $x \mapsto x(\lambda_{1M}-1)/\{x(4\lambda_{1S}-4)+4\}$ is increasing in $x$ and $(K_1/K)^2 < 1$. Here we have adopted the notation that the single locus assumed in Model 1 corresponds to locus 1 in Model 3, the two-locus additive model. Consequently,

$$0 < \Delta^* = \frac{(6)}{(4)} = \frac{K_1^2\lambda_{1S}}{K^2\lambda_S} < 1.$$

Appendix 3: A Sketch of the Proof of Proposition 2

According to (11), it is easy to see that

$$\mu^*(t) = E(S(t\mid Y)\mid A) = 1 + E\{\Pr(S(t)=2\mid Y)\mid A\} - E\{\Pr(S(t)=0\mid Y)\mid A\}.$$

For notational simplicity, we assume here that $\Pr(S(t)\mid Y)$ follows the Markov chain of order 1 (MC1) assumption, i.e., the distribution of $S(t)$ depends only on the information provided by the flanking markers. For locus $t$, let $t_j$ and $t_{j+1}$ denote the marker loci that flank $t$. Without loss of generality, we further assume that $Y(t_j) = S(t_j)$ and $Y(t_{j+1}) = S(t_{j+1})$, i.e. the markers at loci $t_j$ and $t_{j+1}$ are fully polymorphic. With these assumptions, $\mu^*(t)$ can be re-expressed as

$$
\begin{aligned}
\mu^*(t) &= 1 + \sum_{a=0}^{2}\sum_{b=0}^{2}\{\Pr(S(t)=2\mid a,b) - \Pr(S(t)=0\mid a,b)\}\Pr(S(t_j)=a, S(t_{j+1})=b\mid A)\\
&= 1 + \sum_{l=0}^{2} f_l\sum_{a=0}^{2}\sum_{b=0}^{2}\{\Pr(S(t)=2\mid a,b) - \Pr(S(t)=0\mid a,b)\}\Pr(S(\tau)=l, S(t_j)=a, S(t_{j+1})=b)\\
&= 1 + \sum_{l=0}^{2} f_l B_l, \qquad \text{(A1)}
\end{aligned}
$$

where $f_l$ is defined in Appendix 1 as $\Pr(A\mid S(\tau)=l)/\Pr(A)$, $B_l$ denotes the inner double sum over $a$ and $b$, and we have used $\Pr(S(t)=k\mid a,b)$ to denote $\Pr(S(t)=k\mid S(t_j)=a, S(t_{j+1})=b)$ for simplicity. We consider three exhaustive and mutually exclusive situations.

Situation I: $\tau$ is outside of $(t_j, t_{j+1})$. This is equivalent to (i) in Proposition 2. We consider only the case in which $\tau$ is to the right of $(t_j, t_{j+1})$, i.e. $t_j \le t \le t_{j+1} < \tau$, since by symmetry the results apply to the other case as well. Applying the MC1 assumption repeatedly, we have

$$
\begin{aligned}
B_2 &= \sum_a\sum_b \Pr(S(\tau)=2\mid S(t_{j+1})=b, S(t)=2, S(t_j)=a)\Pr(S(t_{j+1})=b\mid S(t)=2, S(t_j)=a)\Pr(S(t)=2, S(t_j)=a)\\
&\quad - \sum_a\sum_b \Pr(S(\tau)=2\mid S(t_{j+1})=b, S(t)=0, S(t_j)=a)\Pr(S(t_{j+1})=b\mid S(t)=0, S(t_j)=a)\Pr(S(t)=0, S(t_j)=a)\\
&= \sum_a\sum_b \Pr(S(\tau)=2, S(t_{j+1})=b, S(t)=2, S(t_j)=a) - \sum_a\sum_b \Pr(S(\tau)=2, S(t_{j+1})=b, S(t)=0, S(t_j)=a)\\
&= \Pr(S(\tau)=2, S(t)=2) - \Pr(S(\tau)=2, S(t)=0)\\
&= \tfrac14\psi_{t,\tau}^2 - \tfrac14(1-\psi_{t,\tau})^2 = \tfrac14(2\psi_{t,\tau}-1).
\end{aligned}
$$

Similarly, one can show that

$$B_1 = \Pr(S(\tau)=1, S(t)=2) - \Pr(S(\tau)=1, S(t)=0) = \tfrac12\psi_{t,\tau}(1-\psi_{t,\tau}) - \tfrac12\psi_{t,\tau}(1-\psi_{t,\tau}) = 0$$

and

$$B_0 = \Pr(S(\tau)=0, S(t)=2) - \Pr(S(\tau)=0, S(t)=0) = \tfrac14(1-\psi_{t,\tau})^2 - \tfrac14\psi_{t,\tau}^2 = \tfrac14(1-2\psi_{t,\tau}).$$

Thus, one has

$$
\begin{aligned}
\mu^*(t) &= 1 + f_2\cdot\tfrac14(2\psi_{t,\tau}-1) + f_1\cdot 0 + f_0\cdot\tfrac14(1-2\psi_{t,\tau})\\
&= 1 + (2\psi_{t,\tau}-1)\{\Pr(S(\tau)=2\mid A) - \Pr(S(\tau)=0\mid A)\}\\
&= 1 + (2\psi_{t,\tau}-1)\{E(S(\tau)\mid A) - 1\} = \mu(t).
\end{aligned}
$$

Situation II: $t_j \le t \le \tau \le t_{j+1}$. This situation corresponds to (ii) of Proposition 2. Before proceeding, we state the following lemma, which will prove useful.

Lemma: For any three loci $t_1 < t_2 < t_3$, $2\psi_{t_1,t_3} - 1 = (2\psi_{t_1,t_2} - 1)(2\psi_{t_2,t_3} - 1)$.

Proof. From (10), one has

$$2\psi_{t_1,t_3} - 1 = (1 - 2\theta_{t_1,t_3})^2 = e^{-0.04|t_1 - t_3|} = e^{-0.04(|t_1 - t_2| + |t_2 - t_3|)} = (2\psi_{t_1,t_2} - 1)(2\psi_{t_2,t_3} - 1).$$

Long and tedious algebraic manipulations give

$$B_2 = -B_0 = \tfrac14(2\psi_{t,\tau}-1)\left\{1 - \frac{4\psi_{t_j,t}(1-\psi_{t_j,t})\,\psi_{\tau,t_{j+1}}(1-\psi_{\tau,t_{j+1}})}{\psi_{t_j,t_{j+1}}(1-\psi_{t_j,t_{j+1}})}\right\} \qquad \text{(A2)}$$

and $B_1 = 0$. Denoting the last term in (A2), the factor in braces, as $1 - F$, we have from (A1)

$$\mu^*(t) = 1 + \tfrac14(f_2 - f_0)(2\psi_{t,\tau}-1)(1-F) = 1 + \{E(S(\tau)\mid A) - 1\}(2\psi_{t,\tau}-1)(1-F).$$

To show that $0 \le 1 - F \le 1$, note that $\psi(1-\psi) = \{1 - (2\psi-1)^2\}/4$ and $0 \le 2\psi - 1 \le 1$. Thus, by the lemma,

$$\frac{\psi_{\tau,t_{j+1}}(1-\psi_{\tau,t_{j+1}})}{\psi_{t_j,t_{j+1}}(1-\psi_{t_j,t_{j+1}})} = \frac{1 - (2\psi_{\tau,t_{j+1}}-1)^2}{1 - (2\psi_{t_j,t_{j+1}}-1)^2} = \frac{1 - (2\psi_{\tau,t_{j+1}}-1)^2}{1 - (2\psi_{t_j,\tau}-1)^2(2\psi_{\tau,t_{j+1}}-1)^2} \le 1,$$

since $(2\psi_{t_j,\tau}-1)^2 \le 1$. Furthermore, $0 \le \psi(1-\psi) \le 1/4$. As a result, $0 \le F < 1$, and this implies that $0 \le 1 - F \le 1$.

Situation III: $t_j \le \tau \le t \le t_{j+1}$. This situation corresponds to (iii) of Proposition 2. The proof is similar to that in Situation II and is therefore omitted.

Appendix 4: Expressions for $\mathrm{Cov}(S(Y)\mid A; H_0)$ and $\mathrm{Cov}(S(Y)\mid A; H_A)$

As noted in the text, the $j$th component of $S(Y) = (S(t_1\mid Y), \ldots, S(t_M\mid Y))'$ reduces to $S(t_j)$, i.e. $S(t_j\mid Y) = S(t_j)$, if the markers are fully polymorphic, as we assume here. Under $H_A$, i.e. $\tau$ is within the region $R$,

$$\mathrm{Var}(S(t_j)\mid A; H_A) = (2\psi_{t_j,\tau}-1)^2\{\mathrm{Var}(S(\tau)\mid A) - 1/2\} + 1/2, \qquad j = 1, \ldots, M,$$

and $\mathrm{Cov}(S(t_j), S(t_l)\mid A; H_A)$, $j \ne l = 1, \ldots, M$, equals

$$
\begin{cases}
(2\psi_{t_j,t_l}-1)\,\mathrm{Var}(S(\tau)\mid A) & \text{if } \tau \in [t_j, t_l],\\
(2\psi_{t_j,\tau}-1)(2\psi_{t_l,\tau}-1)\{\mathrm{Var}(S(\tau)\mid A) - 1/2\} + \tfrac12(2\psi_{t_j,t_l}-1) & \text{if } \tau \notin [t_j, t_l].
\end{cases}
$$

Under the null hypothesis of no linkage, $\mathrm{Var}(S(\tau)\mid A) = 1/2$ and consequently

$$\mathrm{Cov}(S(t_j), S(t_l)\mid A; H_0) = (2\psi_{t_j,t_l}-1)/2, \qquad j \ne l = 1, \ldots, M.$$
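The Appendix 4 expressions translate directly into a covariance matrix of the kind a power or sample-size calculation would use. The sketch below assumes fully polymorphic markers and Haldane's map, with positions in cM. The parameter `var_tau` stands for $\mathrm{Var}(S(\tau)\mid A)$ and must be supplied; setting it to 1/2 reproduces the null covariance. Function and parameter names are illustrative.

```python
import math


def corr_haldane(d_cm):
    # 2*psi - 1 = exp(-0.04*|d|) under Haldane's map, d in cM
    return math.exp(-0.04 * abs(d_cm))


def ibd_cov_matrix(positions, tau, var_tau):
    """Cov(S(t_j), S(t_l) | A) for fully polymorphic markers, as in Appendix 4.

    var_tau is Var(S(tau) | A); var_tau = 0.5 gives the covariance under H_0.
    """
    m = len(positions)
    cov = [[0.0] * m for _ in range(m)]
    for j in range(m):
        rj = corr_haldane(positions[j] - tau)
        for l in range(m):
            rl = corr_haldane(positions[l] - tau)
            rjl = corr_haldane(positions[j] - positions[l])
            if j == l:
                cov[j][l] = rj ** 2 * (var_tau - 0.5) + 0.5
            elif min(positions[j], positions[l]) <= tau <= max(positions[j], positions[l]):
                cov[j][l] = rjl * var_tau  # tau lies between the two markers
            else:
                cov[j][l] = rj * rl * (var_tau - 0.5) + 0.5 * rjl
    return cov


if __name__ == "__main__":
    markers = [0.0, 20.0, 40.0, 60.0]  # hypothetical map
    for row in ibd_cov_matrix(markers, tau=35.0, var_tau=0.6):
        print(["%.4f" % v for v in row])
```

When $\tau$ lies between $t_j$ and $t_l$, the lemma of Appendix 3 gives $(2\psi_{t_j,\tau}-1)(2\psi_{t_l,\tau}-1) = 2\psi_{t_j,t_l}-1$, so the two off-diagonal branches agree there; the code keeps them separate only to mirror the text.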

References

Goldin LR: Detection of linkage under heterogeneity: comparison of two-locus vs. admixture models. Genet Epidemiol 1992;9:
Haldane JBS: The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet 1919;8:
Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 1972;2:3–19.
Hauser ER, Boehnke M: Genetic linkage analysis of complex genetic traits by using affected sibling pairs. Biometrics 1998;54:
Hodge SE, Elston R: Lods, wrods, and mods: the interpretation of lod scores calculated under different models. Genet Epidemiol 1994;11:
Hodge SE, Vieland VJ: The essence of single ascertainment. Genetics 1996;144:
Huber PJ: Robust estimation of a location parameter. Ann Math Statist 1964;35:
Huber PJ: The behavior of maximum likelihood estimators under non-standard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, vol 1.
Kong A, Cox NJ: Allele-sharing models: lod scores and accurate linkage tests. Am J Hum Genet 1997;61:
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 1996;58:
Lander ES, Kruglyak L: Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 1995;11:
Liang KY, Rathouz PJ, Beaty TH: Determining linkage and mode of inheritance: mod scores and other methods. Genet Epidemiol 1996;13:
Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13–22.
MacLean CJ, Bishop DT, Sherman SL, Diehl SR: Distribution of lod scores under uncertain mode of inheritance. Am J Hum Genet 1993;52:
Risch N: Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 1990a;46:222–228.
Risch N: Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet 1990b;46:229–241.
Schork NJ, Boehnke M, Terwilliger JD, Ott J: Two-trait-locus linkage analysis: a powerful strategy for mapping complex genetic traits. Am J Hum Genet 1993;53:
Smith CAB: Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet 1963;27:
Suarez BK, Rice J, Reich T: The generalized sib pair IBD distribution: its use in the detection of linkage. Ann Hum Genet 1978;42:
Teng J, Siegmund D: Multipoint linkage analysis using affected relative pairs and partially informative markers. Biometrics 1998;54:
Vieland VJ, Hodge SE, Greenberg DA: Adequacy of single-locus approximations for linkage analysis of oligogenic traits. Genet Epidemiol 1992;9:
Whittemore AS: Genome scanning for linkage: an overview. Am J Hum Genet 1996;59:
Whittemore AS, Halpern J: A class of tests for linkage using affected pedigree members. Biometrics 1994;50:


More information

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6. Linkage Mapping Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6. Genetic maps The relative positions of genes on a chromosome can

More information

QTL Mapping I: Overview and using Inbred Lines

QTL Mapping I: Overview and using Inbred Lines QTL Mapping I: Overview and using Inbred Lines Key idea: Looking for marker-trait associations in collections of relatives If (say) the mean trait value for marker genotype MM is statisically different

More information

Univariate Linkage in Mx. Boulder, TC 18, March 2005 Posthuma, Maes, Neale

Univariate Linkage in Mx. Boulder, TC 18, March 2005 Posthuma, Maes, Neale Univariate Linkage in Mx Boulder, TC 18, March 2005 Posthuma, Maes, Neale VC analysis of Linkage Incorporating IBD Coefficients Covariance might differ according to sharing at a particular locus. Sharing

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 4, Issue 1 2005 Article 11 Combined Association and Linkage Analysis for General Pedigrees and Genetic Models Ola Hössjer University of

More information

CINQA Workshop Probability Math 105 Silvia Heubach Department of Mathematics, CSULA Thursday, September 6, 2012

CINQA Workshop Probability Math 105 Silvia Heubach Department of Mathematics, CSULA Thursday, September 6, 2012 CINQA Workshop Probability Math 105 Silvia Heubach Department of Mathematics, CSULA Thursday, September 6, 2012 Silvia Heubach/CINQA 2012 Workshop Objectives To familiarize biology faculty with one of

More information

Gene mapping, linkage analysis and computational challenges. Konstantin Strauch

Gene mapping, linkage analysis and computational challenges. Konstantin Strauch Gene mapping, linkage analysis an computational challenges Konstantin Strauch Institute for Meical Biometry, Informatics, an Epiemiology (IMBIE) University of Bonn E-mail: strauch@uni-bonn.e Genetics an

More information

Methods for QTL analysis

Methods for QTL analysis Methods for QTL analysis Julius van der Werf METHODS FOR QTL ANALYSIS... 44 SINGLE VERSUS MULTIPLE MARKERS... 45 DETERMINING ASSOCIATIONS BETWEEN GENETIC MARKERS AND QTL WITH TWO MARKERS... 45 INTERVAL

More information

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 175 A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome Eric Tchetgen Tchetgen

More information

... x. Variance NORMAL DISTRIBUTIONS OF PHENOTYPES. Mice. Fruit Flies CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE

... x. Variance NORMAL DISTRIBUTIONS OF PHENOTYPES. Mice. Fruit Flies CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE NORMAL DISTRIBUTIONS OF PHENOTYPES Mice Fruit Flies In:Introduction to Quantitative Genetics Falconer & Mackay 1996 CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE Mean and variance are two quantities

More information

Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies

Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National

More information

Powerful Regression-Based Quantitative-Trait Linkage Analysis of General Pedigrees

Powerful Regression-Based Quantitative-Trait Linkage Analysis of General Pedigrees Am. J. Hum. Genet. 71:38 53, 00 Powerful Regression-Based Quantitative-Trait Linkage Analysis of General Pedigrees Pak C. Sham, 1 Shaun Purcell, 1 Stacey S. Cherny, 1, and Gonçalo R. Abecasis 3 1 Institute

More information

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Lecture 6: Introduction to Quantitative genetics Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Quantitative Genetics The analysis of traits whose variation is determined by both a

More information

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation Ann. Hum. Genet., Lond. (1975), 39, 141 Printed in Great Britain 141 A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation BY CHARLES F. SING AND EDWARD D.

More information

Linkage and Linkage Disequilibrium

Linkage and Linkage Disequilibrium Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) 12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²

More information

Quantitative Genetics

Quantitative Genetics Bruce Walsh, University of Arizona, Tucson, Arizona, USA Almost any trait that can be defined shows variation, both within and between populations. Quantitative genetics is concerned with the analysis

More information

OPTIMALITY AND STABILITY OF SYMMETRIC EVOLUTIONARY GAMES WITH APPLICATIONS IN GENETIC SELECTION. (Communicated by Yang Kuang)

OPTIMALITY AND STABILITY OF SYMMETRIC EVOLUTIONARY GAMES WITH APPLICATIONS IN GENETIC SELECTION. (Communicated by Yang Kuang) MATHEMATICAL BIOSCIENCES doi:10.3934/mbe.2015.12.503 AND ENGINEERING Volume 12, Number 3, June 2015 pp. 503 523 OPTIMALITY AND STABILITY OF SYMMETRIC EVOLUTIONARY GAMES WITH APPLICATIONS IN GENETIC SELECTION

More information

Measurements and Data Analysis

Measurements and Data Analysis Measurements and Data Analysis 1 Introduction The central point in experimental physical science is the measurement of physical quantities. Experience has shown that all measurements, no matter how carefully

More information

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models Confirmatory Factor Analysis: Model comparison, respecification, and more Psychology 588: Covariance structure and factor models Model comparison 2 Essentially all goodness of fit indices are descriptive,

More information

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013 Lecture 9 Short-Term Selection Response: Breeder s equation Bruce Walsh lecture notes Synbreed course version 3 July 2013 1 Response to Selection Selection can change the distribution of phenotypes, and

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA Linkage analysis and QTL mapping in autotetraploid species Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA Collaborators John Bradshaw Zewei Luo Iain Milne Jim McNicol Data and

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analysis of Longitudinal Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Auckland 8 Session One Outline Examples of longitudinal data Scientific motivation Opportunities

More information

Chapter 7: Simple linear regression

Chapter 7: Simple linear regression The absolute movement of the ground and buildings during an earthquake is small even in major earthquakes. The damage that a building suffers depends not upon its displacement, but upon the acceleration.

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information