Using the genomic relationship matrix to predict the accuracy of genomic selection


J. Anim. Breed. Genet. ISSN

ORIGINAL ARTICLE

M.E. Goddard 1,2, B.J. Hayes 2 & T.H.E. Meuwissen 3

1 Department of Agriculture and Food Systems, University of Melbourne, Melbourne, Vic., Australia
2 Biosciences Research Division, Victorian Department of Primary Industries, Bundoora, Vic., Australia
3 Norwegian University of Life Sciences, Ås, Norway

Keywords
Genomic selection; relationship matrix.

Correspondence
M. Goddard, Biosciences Research Division, Victorian Department of Agriculture, 1 Park Drive, Bundoora, Vic. 3083, Australia. Tel: ; Fax: ; mike.goddard@dpi.vic.gov.au

Received: 6 February 2011; accepted: 18 August 2011

Summary

Estimated breeding values (EBVs) using data from genetic markers can be predicted using a genomic relationship matrix, derived from animals' genotypes, and best linear unbiased prediction. However, if the accuracy of the EBVs is calculated in the usual manner (from the inverse element of the coefficient matrix), it is likely to be overestimated owing to sampling errors in elements of the genomic relationship matrix. We show here that the correct accuracy can be obtained by regressing the relationship matrix towards the pedigree relationship matrix so that it is an unbiased estimate of the relationships at the QTL controlling the trait. This method shows how the accuracy increases as the number of markers used increases because the regression coefficient (of genomic relationship towards pedigree relationship) increases. We also present a deterministic method for predicting the accuracy of such genomic EBVs before data on individual animals are collected. This method estimates the proportion of genetic variance explained by the markers, which is equal to the regression coefficient described above, and the accuracy with which marker effects are estimated.
The latter depends on the variance in relationship between pairs of animals, which equals the mean linkage disequilibrium over all pairs of loci. The theory was validated using simulated data and data on fat concentration in the milk of Holstein cattle.

© 2011 Blackwell Verlag GmbH J. Anim. Breed. Genet. 128 (2011) doi: /j x

Introduction

The matrix of relationships among a group of individuals can be used to predict their breeding values, to manage inbreeding and in genetic conservation. This relationship matrix can be calculated from the pedigree, but it is also possible to calculate the relationship matrix from genotypes at genetic markers such as single-nucleotide polymorphisms (SNPs). Elements of the genomic relationship matrix are estimates of the realized proportion of the genome that two individuals share, whereas the pedigree-derived relationship matrix is the expectation of this proportion. This genomic relationship matrix can be used in genomic selection to estimate breeding values.

Genomic selection refers to the use of a large number of genetic markers, such as SNPs, covering the whole genome to predict the genetic value of individuals (Meuwissen et al. 2001). The individuals might be people whose genetic risk of developing a complex disease is being predicted, or they might be domestic animals or plants in which estimates of

their breeding value will be used to select parents to breed the next generation. In cattle, the availability of high-throughput, high-density genotyping with SNP chips has led to the widespread adoption of genomic selection in dairy cattle breeding programmes, where it is predicted to double the rate of genetic improvement (Schaeffer 2006; Dalton 2009). Traditionally, livestock have been selected on the basis of estimated breeding values (EBVs) calculated from data on phenotype and pedigree using a statistical technique called best linear unbiased prediction (BLUP) (Henderson 1984). A desirable feature of this method is that the accuracy of the EBVs could be calculated as part of the statistical analysis. This is not the case with many methods used for genomic selection. Currently, the most trusted method for assessing the accuracy of genomic EBVs is an empirical test in which a sample of animals have genomic EBVs calculated, and then additional phenotypic data are collected to assess how accurately the EBVs predict these new data. This is time-consuming, fails to predict the accuracy of individual EBVs and is wasteful in that the new data are used only to estimate accuracy and not to improve the prediction of breeding value. Other cross-validation techniques can also be used but they are also time-consuming and do not yield the accuracy of the final prediction, or individual accuracies. In practice, some authors have used the inverse of the mixed model or BLUP equations (e.g. VanRaden 2008; Hayes et al. 2009a,b) but, as we show in this paper, this can overestimate the accuracy. It would be very useful to be able to predict the accuracy of EBVs calculated using genomic selection in two situations.
Firstly, after the data have been collected and are being analysed, it would be useful to calculate accuracies of EBVs as part of the statistical analysis, as is done for traditional EBVs, including for individuals without their own phenotypes. In this situation, we are interested in calculating the accuracy of the EBVs of individual animals. Secondly, when planning a selection programme using genomic selection, it would be useful to be able to predict the accuracy of alternative designs so that the best one could be implemented. In this situation, we wish to predict the accuracy of EBVs of classes of animals, which we might then use in deterministic simulations of alternative breeding programmes. This paper presents methods for both of these situations. After the data have been collected and are being analysed, it should be possible to predict the accuracy from the properties of the statistical method (as in VanRaden 2008 and Harris & Johnson 2010). However, this requires that the statistical model matches the true situation. For instance, if the model makes assumptions about the distribution of the effects of genes affecting the trait (QTL), this should match the real distribution. If we assume that there are a very large number of QTL whose effects follow a normal distribution with constant variance, the analysis (called BLUP by Meuwissen et al. 2001) is robust to departures from this assumption and the accuracy is little affected even if the distribution of QTL effects does not follow a normal distribution. However, empirical tests of the accuracy derived from the inverse of the BLUP equations often find that it is overestimated by this method, dramatically so if it is used to predict breeding values in one breed based on data from another breed (Hayes et al. 2009a). An anomaly of the method is that it does not predict increasing accuracy as the number of markers is increased and this explains why it overestimates accuracy as shown below. 
In this paper, we describe how to calculate the accuracy of genomic EBVs after the data have been collected, taking into account the number of markers used.

The accuracy of genomic EBVs expected before data are collected has been considered by Goddard (2009) for unrelated animals and by Hayes et al. (2009b) for simple family structures such as groups of full-sibs and half-sibs. Their deterministic method treats the genome as if it were a series of small chromosomal segments, each of which is inherited independently. Here, we treat chromosomes as continuous and show that a similar prediction results.

The objectives of this paper are twofold: (i) to derive a method for calculating the accuracy of EBVs calculated using a known genomic relationship matrix; and (ii) to derive a method to predict this accuracy before the data on individual animals are collected. In the Materials and Methods section, we first develop the theory to predict accuracy and then describe the simulation and real data in which it is tested. We only consider the genomic selection method called BLUP by Meuwissen et al. (2001).

Materials and methods

Theory

Calculation of accuracy after data are collected

Consider a group of T animals whose breeding values are controlled by Q QTL. At the jth QTL, the genotypes (00, 01, 11) have frequency $(1 - p_j)^2$, $2p_j$

$(1 - p_j)$ and $p_j^2$, respectively, and are described for the ith animal by $w_{ij} = -2p_j$, $1 - 2p_j$ and $2 - 2p_j$, respectively, for the three genotypes. This coding causes the mean of $w_{ij}$ over animals to be zero. Let

$$y = \text{fixed effects} + g + e$$

$$g = Wu$$

where y is a T × 1 vector of phenotypic values, g is a T × 1 vector of breeding values, u is a Q × 1 vector of effects of QTL assumed $N(0, I\sigma_u^2)$, W is a T × Q matrix describing the genotype of each animal i at each QTL j by $w_{ij}$, e is a T × 1 vector of environmental effects, and

$$V(g) = WW'\sigma_u^2$$

$$V(e) = I\sigma_e^2$$

$$V(y) = WW'\sigma_u^2 + I\sigma_e^2$$

$$h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$$

Thus, ignoring fixed effects, the model

$$y = Wu + e$$

is equivalent to the conventional model

$$y = g + e$$

if

$$V(g) = G\sigma_g^2 = WW'\sigma_u^2$$

That is, the genomic model is equivalent to a conventional animal model with the relationship matrix calculated from the QTL genotypes. This equivalence has been pointed out before but usually in terms of marker genotypes and effects rather than QTL genotypes and effects (e.g. Nejati-Javaremi et al. 1997; Fernando 1998; Villanueva et al. 2005; Habier et al. 2007; Goddard 2009; Strandén & Garrick 2009; VanRaden et al. 2009). In practice, we will use the markers to estimate G, but this formulation makes it clear that it is the relationship matrix based on the QTL that we need to estimate.

Let G = A + D where A is the relationship matrix based on pedigree and D represents deviations from A owing to the observed segregation of alleles at the QTL. In a conventional BLUP, we use A instead of G. This still leads to unbiased estimates of g and to appropriate estimates of accuracy because $E(G \mid A) = A$. That is, A is an unbiased estimate of G in the same way that BLUP EBVs are an unbiased estimate of true breeding value. We need an estimate of G based on the marker genotypes that also has this property.
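The genotype coding and the equivalence $V(g) = G\sigma_g^2 = WW'\sigma_u^2$ above can be illustrated with a minimal NumPy sketch. The function names and the simulated allele counts are hypothetical, and scaling by $\sum_j 2p_j(1 - p_j)$ assumes Hardy-Weinberg proportions at every QTL, so that $\sigma_g^2 = \sigma_u^2 \sum_j 2p_j(1 - p_j)$; it is one way to make the coding concrete, not the authors' implementation.

```python
import numpy as np

def center_genotypes(counts, p):
    """Code allele counts (0, 1, 2) as w_ij = count_ij - 2 * p_j."""
    return np.asarray(counts, dtype=float) - 2.0 * np.asarray(p, dtype=float)

def qtl_relationship(W, p):
    """G = WW' / sum_j 2 p_j (1 - p_j), assuming Hardy-Weinberg proportions."""
    scale = np.sum(2.0 * p * (1.0 - p))
    return W @ W.T / scale

# Hypothetical example data: T animals, Q QTL.
rng = np.random.default_rng(0)
T, Q = 200, 30
p_true = rng.uniform(0.1, 0.9, size=Q)
counts = rng.binomial(2, p_true, size=(T, Q))

p = counts.mean(axis=0) / 2.0      # sample allele frequencies
W = center_genotypes(counts, p)    # column means are zero by construction
G = qtl_relationship(W, p)         # T x T relationship matrix at the QTL
```

With sample allele frequencies the column means of W are exactly zero, and the average diagonal element of G is close to 1 for unrelated, non-inbred animals.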
Let $G_m = W_m W_m'/M^*$, where $W_m$ is a matrix defined in the same way as W above but recording the genotypes at markers instead of QTL and $M^* = \sum_j 2p_j(1 - p_j)$. Then

$$G_m = A + D + E$$

where the errors (E) occur because the markers used are a sample of genomic positions and so $G_m$ includes some sampling errors. $G_m$ can be improved by taking a weighted average of the relationship estimated from each marker where the weights are the inverse of the prediction error variance (PEV) (Powell et al. 2010; Yang et al. 2010). This can be achieved by defining $x_{ij}$ as $w_{ij}/\sqrt{2p_j(1 - p_j)}$ and calculating $G_m$ as

$$G_m = XX'/M$$

where M is the number of markers. However, $G_m$ is still not unbiased in the sense we require, i.e. $E(G \mid G_m)$ is not $G_m$. Instead, we use

$$\hat{G} = A + b(G_m - A) \tag{1}$$

where b is the regression of elements of $G - A$ on elements of $G_m - A$:

$$b = V(D) / [V(D) + V(E)] \tag{2}$$

V(D) is the variance of $D_{ij}$ across all possible pairs of relationship, i.e. it is a scalar, and V(E) is similarly defined. To indicate that V(D) is a variance across the elements of D, D is not written in bold type.

The V(D) could be predicted from theory but it may be better to estimate it from the data. Assuming QTL have the same properties as the markers, we can assess how well the markers predict the relationship based on QTL by how well they predict the relationship based on other markers. If we randomly split the markers into two non-overlapping sets and calculate $G_{m1}$ and $G_{m2}$ for the two sets, then $c = \mathrm{cov}(G_{m1} - A, G_{m2} - A)$ estimates V(D) and $V(G_{m1} - G_{m2})$ estimates 2V(E), where V(E) is the variance of the errors in $G_{m1}$ (or $G_{m2}$). [We exclude the diagonal elements of G and A from these calculations because they have

slightly different properties to the non-diagonal elements and it is the latter that are important.] Yang et al. (2010) show that V(E) = 1/M, where M is the number of markers. Then, b can be estimated as

$$b = 1 - \frac{1/M}{c + 1/M} = 1 - \frac{1}{cM + 1} \tag{2a}$$

The regression coefficient b may vary among subsets of the data. For instance, relationships between animals from different breeds may have a small V(D) and hence a smaller value of b than relationships within a breed. It would be desirable to calculate b separately for categories of relationship that differ greatly in V(D), but we will not pursue that possibility in this paper.

If the QTL differ in a systematic way from the markers, then V(E) will be greater than expected from the finite number of markers. An alternative derivation of $\hat{G}$ shows how to estimate b in this case. Let

$$y = q + a + e$$

where a is a vector of polygenic effects not captured by the markers, $N(0, A\sigma_a^2)$, q = Xm is a vector of breeding values explained by markers, m is a vector of standardized marker effects, $N(0, I\sigma_m^2)$, and

$$V(q) = G_m\sigma_q^2 \quad \text{with} \quad G_m = XX'/M$$

$$g = q + a$$

$$\sigma_g^2 = \sigma_q^2 + \sigma_a^2$$

$$V(g) = G\sigma_g^2 = G_m\sigma_q^2 + A\sigma_a^2$$

The variance components in this linear model can be estimated by REML, for example, and then $b = \sigma_q^2/\sigma_g^2$. Then

$$\hat{G}\sigma_g^2 = G_m\sigma_q^2 + A\sigma_a^2 = [A + b(G_m - A)]\sigma_g^2 \tag{1a}$$

so $\hat{G}$ is the same as defined above in (1), but this time with the regression coefficient estimated from the phenotype data rather than predicted from the marker data. We can now use $\hat{G}$ in place of A in the normal mixed model equations (MME) and calculate EBVs for individual animals and the accuracy of these EBVs in the usual manner.

Prediction of accuracy before data are collected

In practice, the methodology described above for calculating EBVs and their accuracy from genetic markers would be implemented using Henderson's MME. If we ignore fixed effects, we can use the equivalent selection index approach.
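The split-half estimate of b in equation (2a) and the regression of $G_m$ towards A in equation (1) can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, allele counts are coded 0/1/2, every marker is assumed to be polymorphic (otherwise the standardization divides by zero), and M in equation (2a) is taken to be the total number of markers used in the final $G_m$.

```python
import numpy as np

def genomic_relationship(counts):
    """G_m = XX'/M with x_ij = (count_ij - 2 p_j) / sqrt(2 p_j (1 - p_j))."""
    counts = np.asarray(counts, dtype=float)
    p = counts.mean(axis=0) / 2.0
    X = (counts - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return X @ X.T / counts.shape[1]

def off_diagonal(mat):
    """Upper-triangle elements only; diagonals are excluded as in the text."""
    rows, cols = np.triu_indices(mat.shape[0], k=1)
    return mat[rows, cols]

def b_from_c(c, M):
    """Equation (2a): b = 1 - 1 / (c M + 1), where c estimates V(D)."""
    return 1.0 - 1.0 / (c * M + 1.0)

def split_half_b(counts, A, rng):
    """Estimate b from two random, non-overlapping halves of the markers."""
    counts = np.asarray(counts, dtype=float)
    M = counts.shape[1]
    order = rng.permutation(M)
    first, second = order[: M // 2], order[M // 2:]
    d1 = off_diagonal(genomic_relationship(counts[:, first]) - A)
    d2 = off_diagonal(genomic_relationship(counts[:, second]) - A)
    c = np.cov(d1, d2)[0, 1]   # cov(G_m1 - A, G_m2 - A) estimates V(D)
    return b_from_c(c, M)

def regress_towards_pedigree(G_m, A, b):
    """Equation (1): G_hat = A + b (G_m - A)."""
    return A + b * (G_m - A)
```

In populations with little real variation in relationship, the estimated covariance c can be close to zero or even negative, in which case the estimate of b is unreliable and should be inspected before use.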
We assume that the training data consist of T unrelated animals (A = I) with phenotypes and marker genotypes. We wish to predict the accuracy of the EBVs for additional, unrelated test animals with marker genotypes but without phenotypes. We will use the model y = g + e and g = q + a introduced above. As $V(a) = A\sigma_a^2 = I\sigma_a^2$, the only information about the EBV of the test animals comes from marker information, which allows us to estimate q. Therefore, the reliability (accuracy squared) of the EBV is

$$R^2 = \frac{V(\hat{q})}{V(g)} = \frac{V(\hat{q})}{V(q)} \cdot \frac{V(q)}{V(g)} = \frac{V(\hat{q})}{V(q)}\, b \tag{3}$$

Therefore, we have to predict two quantities: the proportion of the genetic variance explained by the markers (b) and the accuracy with which the combined marker effects (q) are estimated (Dekkers 2007; Goddard 2009).

b = V(q)/V(g). We will only consider the case where the QTL are not systematically different from the markers and therefore this can be predicted from the properties of the markers, which in turn can be predicted from the theory of linkage disequilibrium. Equation (2) shows b depends on the true variation in relationship between pairs of animals, V(D), compared to the sampling error caused by a finite number of markers, V(E). Appendix 1 shows that V(D) = mean value of $r^2$ over all pairs of loci, where $r^2$ is a standard measure of LD (Hill & Robertson 1968). The appendix derives the expected value of this mean, $V(D) = 1/M_e$, where

$$M_e = \frac{2 N_e L k}{\log(N_e L)} \tag{4}$$

where $N_e$ is the effective population size, L is the average length of a chromosome in Morgan, k is the number of chromosomes and $M_e$ is called the effective number of chromosome segments segregating in the population (Goddard 2009; Hayes et al. 2009b).
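Equation (4) is simple enough to evaluate directly. The short sketch below uses a hypothetical function name and assumes that log denotes the natural logarithm, an assumption that reproduces the value of about 290 quoted in the Results for $N_e$ = 1000 and a single chromosome of 1 Morgan.

```python
import math

def effective_segments(n_e, chrom_length_morgan, n_chrom):
    """Equation (4): M_e = 2 N_e L k / log(N_e L), natural log assumed."""
    return (2.0 * n_e * chrom_length_morgan * n_chrom
            / math.log(n_e * chrom_length_morgan))

# Simulation setting used in this paper: N_e = 1000, L = 1 Morgan, k = 1.
m_e = effective_segments(1000, 1.0, 1)
```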
As V(E) = 1/M,

$$b = \frac{1/M_e}{1/M_e + 1/M} = \frac{M}{M_e + M} \tag{2b}$$

$V(\hat{q})/V(q)$. The selection index equation for the EBV of an animal without phenotype is

$$\hat{q} = g_2' V^{-1} y$$

where $g_2$ is the vector of covariances between the target animal and the training animals, and $V = G_m\sigma_q^2 + I(\sigma_e^2 + \sigma_a^2)$, y is the vector of phenotypic

values of animals in the training set, and

$$V(\hat{q}) = g_2' V^{-1} g_2$$

A convenient way to calculate $g_2' V^{-1} g_2$ for many different animals is to expand the matrix V to include the target animal:

$$V^* = \begin{bmatrix} \sigma_q^2 + \sigma_e^2 + \sigma_a^2 & g_2' \\ g_2 & V \end{bmatrix}$$

Then

$$g_2' V^{-1} g_2 = V^*_{11} - 1/V^{*11} \tag{5}$$

where $V^*_{11}$ is the first diagonal element of $V^*$ and $V^{*11}$ is the first diagonal element of the inverse of $V^*$. By treating each animal in turn as the target animal, it is possible to calculate the accuracy of the EBVs for T + 1 possible target animals (i.e. each animal in turn is regarded as the target animal that does not have a phenotype recorded).

In Appendix 2, we describe a heuristic approximation

$$[V^{*11}]^{-1} \approx \sigma_e^2 + \sigma_a^2 + \sigma_q^2/(1 + \theta)$$

where $\theta = T b h^2 / M_e$, so that

$$V(\hat{q})/V(q) = \theta/(1 + \theta) \tag{6}$$

This formula (6) is very similar to those of Goddard (2009) and Daetwyler et al. (2008), who derived it by considering the accuracy of estimating a single marker effect. It does not account for the reduction in the error variance when all markers are fitted simultaneously. Daetwyler et al. (2008) proposed the following correction: if the reliability (accuracy squared) calculated by (3) is called $R^2_{w/o}$, then the proposed estimate of the reliability ($R^2_D$) is

$$R^2_D = R^2_{w/o}\left(1 + \frac{R^4_{w/o}\, h^2}{2\theta}\right) \tag{7}$$

Simulation

Populations

Twenty replicate populations were simulated as described in Meuwissen and Goddard (2010). Briefly, Fisher-Wright's idealized populations were simulated with $N_e$ = 1000 for generations in order to achieve a mutation-drift balance and linkage disequilibrium between the created SNPs. $N_e$ was assumed high here, which makes the probability that the training set contains close relatives of the predicted individual quite small, i.e. the predicted accuracies can be interpreted as accuracies of unrelated individuals. The genome consisted of one chromosome of 1 Morgan.
Mutations were simulated according to the infinite sites mutation model (Kimura 1969) at a rate of $10^{-8}$ per base-pair per meiosis. The recombination rate was also $10^{-8}$ per base-pair per meiosis. This resulted on average in segregating mutations (SNPs). At random 1000 SNPs were sampled (without replacement) to enter SNP-SET1, 1000 other SNPs were sampled for SNP-SET2, and 30 SNPs were sampled to act as QTL (QTL-SET). QTL effects were sampled from the Normal distribution. Thus, SNP-SET1, SNP-SET2 and QTL-SET are disjoint sets, and there are no systematic differences between the SNPs that enter each of the sets.

After the generations, the 20 populations were simulated for 10 more generations following the same population model, but now the sampled pedigree was recorded and used to set up a relationship matrix A of the last generation of animals. Both SNP-SET1 and SNP-SET2 were used to set up genomic relationship matrices $G_{m1}$ and $G_{m2}$, respectively, calculated as $XX'/M$ with X as defined above. This is very similar to Yang et al. (2010), except that Yang et al. calculated the diagonals of the $G_{mx}$ matrices in a different manner in order to reduce their sampling error. The latter however resulted in negative eigenvalues of the $G_{mx}$ matrix, whereas $XX'$ is positive semi-definite. Although Yang et al. (2010) found that such negative eigenvalues did not cause any problems, we have been cautious and used $XX'$.

Prediction of accuracy using $G_m$

When using the GBLUP approach, the genomic relationship matrix, $G_m$, can be used to predict the accuracy of individual genomic estimated breeding values (GEBV) following Henderson's (1984) mixed model theory. Equivalently, this prediction can be based on selection index theory. We will use selection index theory here and assume (arbitrarily) that genetic variance is 1, in which case the variance of the selection index equals its reliability.
We assume a phenotyped and genotyped training population of size T and use selection index theory to calculate the accuracy of the EBV for an additional animal that has marker genotypes but no phenotype. As above, we do this by expanding the matrix V = V(y) to $V^*$ by including the animal whose EBV reliability is required as the first animal. Then, the reliability calculated from the selection index, or equivalent MME, is $(V^*_{11} - 1/V^{*11})$. By treating each animal in turn as the test animal, whose EBV reliability is required, we can calculate the accuracy for all T + 1 animals with only one matrix inversion. By defining $\sigma_a^2$ as zero, we can calculate the accuracy predicted by the MME when $G_m$ is used instead of $\hat{G}$.
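The single-inversion calculation described above can be sketched in NumPy as follows. The function name is hypothetical, and the sketch assumes unrelated animals (A = I) and a total genetic variance $\sigma_q^2 + \sigma_a^2$ of 1, so that the variance of the index equals the reliability. It relies on the standard partitioned-inverse identity: for any animal i, $g_2' V^{-1} g_2 = V^*_{ii} - 1/V^{*ii}$, where $V^{*ii}$ is the ith diagonal element of the inverse of $V^*$.

```python
import numpy as np

def reliabilities_one_inversion(G, sigma2_q, sigma2_a, sigma2_e):
    """Reliability of the marker-based EBV for each of the T + 1 animals,
    treating each animal in turn as the one without a phenotype."""
    n = G.shape[0]
    # V* = G sigma_q^2 + I (sigma_e^2 + sigma_a^2), built for all animals.
    V_star = G * sigma2_q + np.eye(n) * (sigma2_e + sigma2_a)
    V_star_inv = np.linalg.inv(V_star)        # the only matrix inversion
    # V(q_hat)_i = V*_ii - 1 / V*^{ii} for every animal i.
    var_q_hat = np.diag(V_star) - 1.0 / np.diag(V_star_inv)
    return var_q_hat / (sigma2_q + sigma2_a)  # divide by genetic variance
```

Setting `sigma2_a` to zero and passing $G_m$ gives the reliabilities implied by the MME when $G_m$ is used directly; passing the variance components that correspond to $\hat{G}$ gives the regressed version.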

Estimation of true accuracy

True genetic values of all animals were calculated as

$$TBV_i = \sum_{j=1}^{30} w_{ij} u_j$$

where summation is over the QTL in QTL-SET, $w_{ij}$ is the genotype of QTL j with 0, 1 and 2 denoting 00, 10 and 11, respectively, and $u_j$ is the normally distributed effect of the QTL. Within each data set, the variance of the $TBV_i$ was standardized to 1. To obtain phenotypic records, an environmental effect was added to the TBV, which was sampled from $N(0, \sigma_e^2)$, where $\sigma_e^2$ was 4, 1 or to obtain heritabilities of 0.2, 0.5 and 0.9, respectively. To obtain estimates of breeding values, the phenotypic data were analysed by the GBLUP model:

$$y = \mu + Zg + e$$

where g is the genetic value of 500 training and 500 evaluation animals; Z is an incidence matrix indicating which animals have records; and $V(g) = G_{m1}$ or $\hat{G}$. Estimation of variance components pertaining to g and e and estimation of $\hat{g}$ were by ASREML (Gilmour et al. 2002). True accuracy of the GEBV was calculated as the correlation between the $\hat{g}$ of the evaluation animals and their TBV.

Real data

The data set consisted of 1200 Australian Holstein bulls. The phenotype used for each bull was the mean of his daughters' fat percentage (fat%) in their milk. We chose this phenotype because it is known to be affected by a gene of large effect (diglyceride acyltransferase or DGAT) and we wished to test the theory under these conditions. To obtain this phenotype, we de-regressed the Australian Breeding Values (ABVs) to remove the contribution from relatives other than daughters (e.g. Pryce et al. 2010) while retaining the correction for non-genetic effects such as herd. All bulls with de-regressed EBVs had at least 80 daughters.

The bulls were genotyped using the Illumina Bovine50K array, which includes single-nucleotide polymorphism (SNP) markers (Matukumalli et al. 2009). The following criteria and checks were applied to the bulls' genotypes.
Mendelian consistency checks revealed a small number of sons who were discordant with their sires at many (>1000) SNPs or sires with many discordant sons. These animals (17) were removed from the data set. We omitted bulls if they had more than 20% of missing genotypes. In total, 1181 bulls passed these criteria. Criteria for selecting SNPs were <5% pedigree discordants (e.g. cases where a sire was homozygous for one allele and progeny were homozygous for the other allele), 90% call rate, minor allele frequencies (MAF) >2%, Hardy-Weinberg p > All of these criteria were met by SNPs. A small number of these were not assigned to any chromosome on Bovine Genome Build 4.0 and were omitted from the final data set, as were SNPs on the X chromosome. Parentage checking was then performed again, and any genotypes incompatible with pedigree were set to missing. To impute missing genotypes, the SNPs were ordered by chromosome position. All SNPs that could not be mapped or were on the X chromosome were excluded from the final data set, leaving SNPs. To impute missing genotypes, the genotype calls and missing genotype information were submitted to fastPHASE chromosome by chromosome (Scheet & Stephens 2006). The genotypes were taken as those filled in by fastPHASE.

The matrix $G_m$ among the 1181 bulls was constructed as $G_m = XX'/M$. The matrix A was constructed from the pedigree of the bulls, which had ancestors back to Then, $\hat{G}$ was calculated as described above in equation (1a), as

$$\hat{G} = A + \hat{b}(G_m - A)$$

where $\hat{\sigma}_q^2$ and $\hat{\sigma}_g^2$ were estimated using ASREML (Gilmour et al. 2002) and $\hat{b} = \hat{\sigma}_q^2/\hat{\sigma}_g^2$.

The discovery data set consisted of bulls progeny tested before 2004 (n = 756). The bulls in the validation data set were progeny tested during or after 2004 (n = 400). GEBV for the validation set bulls were predicted using the normal BLUP equations with the A matrix replaced by $\hat{G}$. Only phenotypes of the reference set bulls were used in this prediction.
Note that $\hat{b}$ was also derived using phenotypes from only the reference set bulls.

The accuracy of the GEBV was calculated in two ways. The $\hat{G}$ realized accuracy was the correlation of the GEBV for the validation set bulls with their phenotypes divided by the accuracy of the phenotypes (0.9, from ADHIS). The $\hat{G}$ theoretical accuracy was calculated from the diagonal elements of the inverse of the coefficient matrix in the usual way. To assess the value of correcting the G matrix for the proportion of variance explained by the markers, we also calculated the realized and theoretical accuracies with $G_m$ in place of $\hat{G}$. Calculations were performed with markers used to construct the genomic relationship matrices.
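The realized accuracy described above is a correlation rescaled by the accuracy of the phenotype. A minimal sketch, with a hypothetical function name and the phenotype accuracy of 0.9 taken from the text as the default:

```python
import numpy as np

def realized_accuracy(gebv, phenotype, phenotype_accuracy=0.9):
    """Correlation of GEBV with validation phenotypes, divided by the
    accuracy with which those phenotypes measure true breeding value."""
    r = np.corrcoef(gebv, phenotype)[0, 1]
    return r / phenotype_accuracy
```

Because the divisor is less than 1, a sample correlation close to the phenotype accuracy can produce a realized accuracy at or slightly above 1; such values reflect sampling error rather than a genuine accuracy greater than 1.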

Results

Simulated data after the data are collected

Table 1 summarizes the properties of the relationship matrix ($G_m$) calculated from the simulated marker data. Two separate sets of markers (1 and 2) are used so that the PEV in calculating elements of $G_m$ and the true variance among elements of $G_m$ can be assessed. In the simulated data, the PEV of elements of $G_m$ is (Table 1). This is as predicted from our theory that PEV = 1/M. As expected, the observed variance of $G_m$ (0.0045) is the sum of the true variance ( estimated as the $\mathrm{cov}(G_{m1}, G_{m2})$) and the PEV. Therefore, the regression coefficient b in the calculation of $\hat{G}$ in equation (2) is 0.78= Appendix 1 derives the result that the true $V(G_m)$ is equal to the mean LD $r^2$ over all pairs of markers. To confirm this, we calculated the mean $r^2$ over the pairs of SNPs and found it to be

Table 2 compares the accuracy predicted using genomic relationship matrices in conventional MME with the true accuracy of the EBVs, which we can calculate from the correlation between EBV and true BV, which is known because this is simulated data. When $G_m$ is used in traditional MME to calculate EBVs and the accuracies of those EBVs are calculated from the MME in the normal manner, the accuracies of the EBVs are overestimated (Table 2). However, when $\hat{G}$ is used in the MME, the accuracies of the EBVs agree with the true accuracies calculated as the correlation of true BV with EBV, at least to within the standard error.

Simulated data before the data are collected

The aim in this section is to test the method of predicting the accuracy of genomic EBVs using information about the population (effective population size $N_e$), the genome length (L), the trait (heritability $h^2$) and the size of the discovery population (T).
The prediction method presented in the Methods section is in two parts: the proportion of variance explained by the markers [b = V(q)/V(g)] and the accuracy in estimating the marker effects [$V(\hat{q})/V(q)$]. We consider these two separately and then the combined prediction. We compare the predicted accuracies against the accuracy calculated using the MME because we have shown above that that predicts the true accuracy.

V(q)/V(g)

For $N_e$ = 1000, L = 1 equation (4) gives the effective number of chromosome segments as $M_e = 2N_e L/\log(N_e L) = 290$. Therefore, we predict that the V(D) = $1/M_e$ = close to the observed figure of (Table 1). Therefore, b = 0.78 from equation 2b.

$V(\hat{q})/V(q)$

Using this value of $M_e$ in formulae 6, we predicted the accuracy of EBVs and compared that to the value expected from the MME using $G_m$ (Table 3). [In calculating $\theta$, we set b = 1 because use of $G_m$ in the MME implies that the markers explain all the variance]. Table 3 also contains the predicted accuracy before and after the correction of Daetwyler et al. (2008) using equation (7). Predicted accuracy increases with T and $h^2$ and is approximately in agreement with the accuracy from the MME. As expected, the correction of Daetwyler et al. (2008) has little effect when reliabilities are low but a more marked effect as accuracies approach 1.0 where this correction improves the agreement with the accuracies calculated from the MME equations.

Table 1 Properties of the estimated genomic relationships (1)
Rows: $\mathrm{Cov}(G_{m1} - A, G_{m2} - A)$ (2); PEV (3); $V(G_{m1} - A)$ (2)
1 Excluding self-relationships.
2 The A, $G_{m1}$ and $G_{m2}$ matrices are calculated as in the main text.
3 Prediction error variance calculated as $V(G_{m1} - G_{m2})/2$.

Table 2 Comparison of predicted accuracy using genomic relationship matrices and true accuracy from correlation between true BV and EBV (1)
Columns: $h^2$; predicted accuracy ($G_m$; $\hat{G}$; $V(\hat{q})/V(g)$); true accuracy
1 Number of training records is T = 500.
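The deterministic prediction used in this section chains equations (4), (2b), (6) and (3). The sketch below is one hedged way to write that chain: the function names are hypothetical, log is assumed to be the natural logarithm, and the Daetwyler et al. (2008) correction of equation (7) is deliberately left out. Passing `b=1.0` mimics the Table 3 setting, where $G_m$ is assumed to explain all the genetic variance.

```python
import math

def marker_variance_fraction(n_e, L, k, M):
    """Equations (4) and (2b): b = M / (M_e + M)."""
    m_e = 2.0 * n_e * L * k / math.log(n_e * L)
    return M / (m_e + M)

def predicted_reliability(n_e, L, k, M, T, h2, b=None):
    """Equations (3) and (6): R^2 = b * theta / (1 + theta),
    with theta = T b h^2 / M_e (no Daetwyler correction)."""
    m_e = 2.0 * n_e * L * k / math.log(n_e * L)
    if b is None:
        b = M / (m_e + M)
    theta = T * b * h2 / m_e
    return b * theta / (1.0 + theta)

# Simulation setting from the text: N_e = 1000, L = 1, k = 1, M = 1000.
b_sim = marker_variance_fraction(1000, 1.0, 1, 1000)
r2_sim = predicted_reliability(1000, 1.0, 1, 1000, T=500, h2=0.5)
accuracy_sim = math.sqrt(r2_sim)
```

Under these assumptions `b_sim` comes out close to the value of 0.78 reported in the text. The predicted reliability rises with T and $h^2$ but cannot exceed b, which is why increasing the number of markers M matters.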
The results in Table 3 ignore the errors in the $G_m$ matrix both in predicting reliabilities and in calculating them in the MME. Table 2 shows that if the $G_m$

matrix is regressed to $\hat{G}$, MME correctly predicts the accuracy obtained by simulation. In Table 4, we combine the two aspects of predicting the accuracy by using equation (3) where b = Note that b is also needed in the formula for $\theta$. Table 4 shows that the predicted accuracies using equations (3) and (7) agree reasonably well with the accuracies calculated from the MME. The accuracies in Table 4 are lower than the comparable figure in Table 3 because in Table 3, we assumed that $G_m$ correctly described the genetic relationship matrix, while in Table 4, we used $\hat{G}$ that incorporates a residual polygenic variance not explained by the markers.

In summary, the simulation study shows that the proportion of genetic variance explained by the markers (b) can be predicted by equation 2b, the accuracy of estimating marker effects can be predicted by equation (6), these two estimates can be combined using equations (3) and (7) to predict the accuracy of EBVs, and these accuracies agree with those obtained from the MME and with the true accuracies.

Table 3 Predicted accuracy of EBVs before data are collected and from the mixed model equations (MME) after data collection using relationship matrix $G_{m1}$
Columns: $h^2$; T (2); Predicted (1): $R_{w/o}$; MME
1 Prediction with and without the Daetwyler et al. (2008) correction.
2 T = the number of animals in the training population.

Table 4 Reliabilities (accuracy squared) predicted and from mixed model equations (MME) using $\hat{G}$
Columns: $h^2$; T; $R_D$ predicted (Daetwyler); MME

Real data

Figure 1 shows the accuracy calculated from the MME and realized for milk fat% in Holstein bulls. When $G_m$ is used in the MME, the calculated accuracy is greater than that actually realized but when $\hat{G}$ is used, the calculated accuracy is close to that realized. As the number of markers increases, the accuracy predicted by $\hat{G}$ and realized increases, whereas the accuracy calculated using $G_m$ decreases.
This occurs because $V(G_m)$ decreases as the number of markers is increased. In other words, when the number of markers is small, $G_m$ is subject to large sampling errors, which cause the calculated accuracy to appear high but the real accuracy is reduced. As the number of markers increases, the sampling errors become small and $G_m$ and $\hat{G}$ converge.

Although BLUP methods of genomic selection assume normally distributed QTL effects, they are not very sensitive to departures from this assumption. This is illustrated here by the use of fat concentration in milk, a trait that is known to be influenced by a QTL with a large effect (DGAT). Despite the existence of this QTL, the theory correctly predicts the accuracy of EBVs calculated using BLUP methods.

Discussion

Equivalent models

The equivalence of a model based on marker effects and a conventional animal model with the relationship matrix estimated from the markers has been pointed out by several authors (Nejati-Javaremi et al. 1997; Fernando 1998; Villanueva et al. 2005; Habier et al. 2007; Goddard 2009; Strandén & Garrick 2009; VanRaden et al. 2009). The equality of the mean LD and the variance of relationships (shown in Appendix 1) is another aspect of this same equivalence. Both LD and relationships are caused by the inheritance, without recombination, of segments of chromosomes from a common ancestor. If the genome comprised an infinite number of loci all inherited independently (i.e. no linkage), there would be no LD or variation in relationship except that caused by variation in pedigree relationship. Linkage causes

9 M. E. Goddard et al. Predict the accuracy of genomic selection 0.9 Fat (%) Accuracy Gm_theoretical Gm_realised Ghat_theoretical Ghat_realised Number of SNPs Figure 1 The accuracy of estimated breeding values for milk fat concentration in Holstein bulls. points close together on a chromosome to have the same coalescence tree. As a consequence of this, there is a correlation between the relationship at one locus and that at neighbouring loci. This, in turn, causes variation in relationship in excess of that caused by variation in pedigree relationship. In the absence of LD, markers would not predict the genotypes at QTL, and the relationship at markers would not predict the relationship at QTL. If the relationship between all pairs of individuals were the same, then all individuals with no phenotype would receive the same EBV. This emphasizes the importance of variation in genomic relationship in driving the accuracy of genomic selection. Thus, it is meaningless to ask whether genomic selection works because it utilizes relationships over and above those due to pedigree or because it utilizes LD: the two explanations are equivalent. The equivalence between a model based on QTL effects and a conventional animal model would be invalidated if LD between QTL systematically increased or decreased total genetic variance. For instance, the Bulmer effect occurs when selection results in negative covariance between QTL because chromosomes tend to carry a mix of positive and negative QTL alleles. In this case, the total genetic variance will be less than that expected from the sum of the QTL variances (V(g) =WW r 2 u ). However, we usually define the genetic variance in a base population where there is assumed to be no Bulmer effect, and this genetic variance will agree with that calculated from the sum of the QTL variances. 
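The equivalence between a marker-effect model and an animal model with a marker-derived relationship matrix can be checked numerically. The following is a minimal sketch in plain Python, not the authors' code. It assumes no fixed effects, one phenotype per animal, a known variance ratio `lam` (residual variance divided by the variance of a single marker effect), and a relationship matrix scaled as WW'/m for m markers. The simulated data and all function names (`snp_blup`, `gblup`, `solve` and the small matrix helpers) are hypothetical.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def add_diag(a, value):
    return [[x + (value if i == j else 0.0) for j, x in enumerate(row)]
            for i, row in enumerate(a)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def snp_blup(W, y, lam):
    """Estimate marker effects by ridge regression, then sum them per animal."""
    Wt = transpose(W)
    u_hat = solve(add_diag(matmul(Wt, W), lam), matvec(Wt, y))
    return matvec(W, u_hat)


def gblup(W, y, lam):
    """Animal model with G = WW'/m (assumed scaling, so the ratio is lam/m)."""
    m = len(W[0])
    G = [[x / m for x in row] for row in matmul(W, transpose(W))]
    alpha = solve(add_diag(G, lam / m), y)
    return matvec(G, alpha)


# Hypothetical toy data: 6 animals, 10 markers, centred genotype codes.
random.seed(1)
freqs = [random.uniform(0.1, 0.9) for _ in range(10)]
W = [[sum(random.random() < p for _ in range(2)) - 2 * p for p in freqs]
     for _ in range(6)]
y = [random.gauss(0, 1) for _ in range(6)]

ebv_markers = snp_blup(W, y, lam=2.0)
ebv_animal = gblup(W, y, lam=2.0)
max_diff = max(abs(a - b) for a, b in zip(ebv_markers, ebv_animal))
print(max_diff)
```

Under these assumptions the two sets of breeding values agree up to rounding error, which is the algebraic identity W(W'W + λI)⁻¹W' = WW'(WW' + λI)⁻¹ behind the equivalence.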
If the effective population size ($N_e$) is large, common ancestors tend to be in the distant past and so recombination will have broken up chromosomes into many small pieces that coalesce independently. Consequently, as $N_e$ increases, the variation in relationship decreases because the relationship between two individuals is an average over many independent chromosome segments. The derivation in Appendix 1 shows that the relationship is effectively an average over $M_e$ segments, where $M_e = 2N_eLk/\log(N_eL)$ [formula (4)].

In this paper, we point out another equivalence: that between a model using an unbiased estimate of the relationship matrix at the QTL ($\hat{G}$) and that using a residual polygenic effect as well as a random effect described by a relationship matrix at the markers ($G_m$). This equivalence explains why the number of markers used is important. If the number of markers ($M$) is too small, $G_m$ estimates $G$ too imprecisely. The extent to which the markers track relationships at the QTL depends on $M/(M + M_e)$. This formula also describes the extent of LD between markers because $M/(M + M_e) = 1/(2 + 4N_ec/\log(N_eL))$, where $c$, the average distance between markers, is $Lk/M$. This is almost the same as the expectation of LD between neighbouring markers, $r^2 = 1/(2 + 4N_ec)$. The difference between the two formulae ($\log(N_eL)$) can be thought of as due to the LD between all other markers and a target marker, not just the nearest marker.

Knowledge of the variation in relationship led us to an approximation for the inverse of the matrix $V$, where $V = V(y)$, and this in turn led to an approximation for the accuracy of EBVs calculated from marker genotypes. This approximation, derived from a consideration of variation in relationships, is the same as that derived by Goddard (2009) and Daetwyler et al. (2008) from a consideration of the accuracy of estimating the effect of a single marker or, more correctly, an effective chromosome segment. Because the formulae are the same, it was easy for us to include the correction of Daetwyler et al. (2008), which accounts for the increased accuracy in estimating the effect of one marker when all other markers have been fitted and have hence reduced the residual variance.

The accuracy of genomic selection depends on the proportion of the genetic variance explained by the SNPs and the accuracy with which the SNP effects are estimated (Dekkers 2007; Goddard 2009). These two components of the accuracy are also used in this paper. The proportion of the genetic variance explained by the markers is $b = V(q)/V(g)$, and the accuracy of estimating marker effects is $V(\hat{q})/V(q)$.

The proportion of the genetic variance explained by the markers

To estimate $\hat{G}$ from $G_m$, we regress $G_m$ back towards $A$, and the regression coefficient ($b$) is the proportion of genetic variance explained by the markers. VanRaden (2008) proposed the same regression equation, but without a thorough derivation of its coefficient $b$. If QTL are not systematically different to markers, $b = M/(M + M_e)$, as shown in Table 4. However, if the QTL are systematically different to the markers, $b$ must be estimated from data on phenotypes. Research on human height (Yang et al. 2010) found that only half the genetic variance was explained by the SNPs owing to imperfect LD between the SNPs and the QTL. Of the remaining half, 10% was because of the finite number of SNPs used ( ) and 40% was because of systematic differences between QTL and SNPs. For instance, the QTL could have lower MAF than the SNPs. Within a breed of cattle such as Holsteins, recent $N_e$ is very small (100) compared with that in humans.
Consequently, the variation in relationships in cattle is large or, equivalently, LD is extensive, and so far fewer SNPs are necessary to explain most of the variance in relationship or, equivalently, most of the variation in QTL. Figure 1 shows that the accuracy of EBVs has reached an asymptote by markers but in humans this has not occurred even after markers. While within a breed (of cattle) the accuracy may reach an asymptote after markers, for between-breed prediction, the accuracy is likely to reach an asymptote at a much higher number of markers. For example, Hayes et al. (2009a) found that the theoretical accuracy in a combined Holstein-Jersey population substantially overpredicted the actual accuracy when using $G_m$, even with markers. This is because between breeds, the variation in relationship will be very small so that large numbers of markers are required to predict these relationships accurately (and capture the limited LD that exists between breeds). So for multibreed prediction of GEBV, it is important to use $\hat{G}$ rather than $G_m$ when calculating reliabilities and to calculate the regression coefficient $b$ separately for within-breed and between-breed relationships. This parallels the finding of De Roos et al. (2008) that the phase of LD was not conserved between breeds when markers were used. In other words, SNPs are not enough to accurately detect relationships between cattle from different breeds such as Holstein and Jersey.

Unified approaches to utilize phenotypic, full pedigree, and genomic information for genetic evaluation, for both genotyped and un-genotyped individuals, have been proposed, which use a single relationship matrix in the BLUP equations (Aguilar et al. 2010). This matrix has both $G$ matrix and $A$ matrix components (e.g. sub-matrices based on relationships derived from genotypes and sub-matrices derived from pedigree relationships).
The $\hat{G}$ matrix proposed here would be the most suitable for describing (genomic) relationships among genotyped individuals in such an approach because, if $G_m$ was used, the accuracies might be overestimated. However, the relationships in $G$ and $A$ must be expressed to the same base before the matrices are combined (Meuwissen et al. 2011).

Further developments

Further developments of the methods presented here are desirable. For instance, how would the accuracy of genomic selection be predicted when there are pedigree relationships among the animals as well as relationships estimated from the markers? By analogy with the method used to combine the reliabilities from other sources of data, we suggest that $\theta = R^2/(1 - R^2)$ should be additive when independent sources of data are combined, but we have not investigated this suggestion in the present paper.

Conclusions

When the BLUP method of genomic selection is used, the results of this paper can be summarized as a series of recommendations for calculating the accuracy of genomic EBVs:

After the data have been collected: Fit an animal model but with the relationship matrix calculated as $\hat{G}$. The regression coefficients ($b$) would ideally be calculated by estimating the variance components associated with SNPs and with the residual polygenic variance. However, if QTL are assumed to have properties similar to SNPs, we can estimate $b$ as $M/(M + M_e)$. $M_e$ in turn can be estimated from observed variation in relationships [$V(G - A)$ minus the PEV] or from LD (mean $r^2$) or from $N_eL$ using equation (4). If $\hat{G}$ is used in the MME, the predicted accuracy of GEBV will also capture linkage and family information that is present among the reference population and selection candidates.

Before the data have been collected: Calculate the proportion of genetic variance explained by the SNPs ($b$) as above. Calculate the accuracy of estimating SNP effects as $\theta/(\theta + 1)$, where $\theta = Tbh^2/M_e$. Calculate the reliability (accuracy squared) as $R^2_{w/o} = b\,\theta/(\theta + 1)$. Apply the Daetwyler correction to $R^2_{w/o}$ to obtain $R^2_D$.

References

Aguilar I., Misztal I., Johnson D.L., Legarra A., Tsuruta S., Lawlor T.J. (2010) A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci., 93,
Daetwyler H.D., Villanueva B., Woolliams J.A., Weedon M.N. (2008) Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE, 3, e3395. doi: /journal.pone PMID:
Dalton R. (2009) No bull: genes for better milk. Nature, 457, 369.
De Roos A.P.W., Hayes B.J., Spelman R., Goddard M.E. (2008) Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics, 179,
Dekkers J.C. (2007) Prediction of response to marker-assisted and genomic selection using selection index theory. J. Anim. Breed. Genet., 124,
Fernando R.L. (1998) Some true aspects of finite locus models. In: Proceedings of the 6th World Congress of Genetics Applied to Livestock Production, January University of New England, Armidale, Australia, 26, pp
Gilmour A.R., Gogel B.J., Cullis B.R., Welham S.J., Thompson R. ASReml User Guide Release 1.0. VSN International Ltd., Hemel Hempstead, UK.
Goddard M.E. (2009) Genomic selection: prediction of accuracy and maximisation of long term response. Genetica, 136,
Habier D., Fernando R.L., Dekkers J.C. (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics, 177,
Harris B.L., Johnson D.L. (2010) Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation. J. Dairy Sci., 93,
Hayes B.J., Bowman P.J., Chamberlain A.C., Verbyla K., Goddard M.E. (2009a) Accuracy of genomic breeding values in multi-breed populations. Genet. Sel. Evol., 41, 51.
Hayes B.J., Visscher P.M., Goddard M.E. (2009b) Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res., 91,
Henderson C.R. (1984) Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Ontario.
Hill W.G., Robertson A. (1968) Linkage disequilibrium in finite populations. Theor. Appl. Genet., 38,
Kimura M. (1969) The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics, 61,
Matukumalli L.K., Lawley C.T., Schnabel R.D., Taylor J.F., Allan M.F., Heaton M.P., O'Connell J., Moore S.S., Smith T.P., Sonstegard T.S., Van Tassell C.P. (2009) Development and characterization of a high density SNP genotyping assay for cattle. PLoS ONE, 4, e5350.
Meuwissen T.H.E., Hayes B.J., Goddard M.E. (2001) Prediction of total genetic value using genome wide dense marker maps. Genetics, 157,
Meuwissen T.H.E., Luan T., Woolliams J.A. (2011) The unified approach to the use of genomic and pedigree information in genomic evaluations revisited. J. Anim. Breed. Genet., 128,
Nejati-Javaremi A., Smith C., Gibson J. (1997) Effect of total allelic relationship on accuracy of evaluation and response to selection. J. Anim. Sci., 75,
Powell J.E., Visscher P.M., Goddard M.E. (2010) Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet., 11,
Pryce J.E., Bolormaa S., Chamberlain A.J., Bowman P.J., Savin K., Goddard M.E., Hayes B.J. (2010) A validated genome-wide association study in two dairy cattle breeds for milk production and fertility traits using variable length haplotypes. J. Dairy Sci., 93,
Schaeffer L.R. (2006) Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet., 123,
Scheet P., Stephens M.A. (2006) A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet., 78,
Strandén I., Garrick D.J. (2009) Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J. Dairy Sci., 92,
Sved J.A. (1971) Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol., 2,
Tenesa A., Navarro P., Hayes B.J., Duffy D.L., Clarke G.M., Goddard M.E., Visscher P.M. (2007) Recent human effective population size estimated from linkage disequilibrium. Genome Res., 17,
VanRaden P.M. (2008) Efficient methods to compute genomic predictions. J. Dairy Sci., 91,
VanRaden P.M., Van Tassell C.P., Wiggans G.R., Sonstegard T.S., Schnabel R.D. et al. (2009) Invited Review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci., 92,
Villanueva B., Pong-Wong R., Fernandez J., Toro M.A. (2005) Benefits from marker-assisted selection under an additive polygenic genetic model. J. Anim. Sci., 83,
Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.F., Heath A.C., Martin N.G., Montgomery G.W., Goddard M.E., Visscher P.M. (2010) Missing heritability of human height explained by genomic relationships. Nat. Genet., 42,

Appendix 1

The properties of the genetic covariance matrix $WW'$

We use the same model as in the main text, that is $g = Wu$, and assume $u \sim N(0, I)$, so that $V(g) = WW'$ and the $ij$ element of this matrix is the covariance of breeding values between animals $i$ and $j$. The elements of $W$ ($w_{ik}$) describe the genotype of animal $i$ at marker $k$. Then the $i$th diagonal element of $WW'$ is $w_i'w_i$, with

$$E(w_i'w_i) = \sum_j 2p_j(1 - p_j) = \sigma_g^2,$$

and the off-diagonal element $ij$ of $WW'$ is $w_i'w_j = \sum_k w_{ik}w_{jk}$, with $E(w_i'w_j) = 0$ because the animals are unrelated. Its variance is

$$
\begin{aligned}
V(w_i'w_j) &= E\Big[\Big(\sum_k w_{ik}w_{jk}\Big)\Big(\sum_k w_{ik}w_{jk}\Big)\Big] \\
&= E\Big[\sum_k\sum_l (w_{ik}w_{jk})(w_{il}w_{jl})\Big] \\
&= \sum_k\sum_l E\big[(w_{ik}w_{jk})(w_{il}w_{jl})\big] \\
&= \sum_k\sum_l E(w_{ik}w_{il})\,E(w_{jk}w_{jl}) \\
&= \sum_k\sum_l \mathrm{Cov}(w_k, w_l)^2 \\
&= \sum_k\sum_l r_{kl}^2\,2p_k(1 - p_k)\,2p_l(1 - p_l) \\
&= \sum_k\sum_l 2p_k(1 - p_k)\,2p_l(1 - p_l)\,\sum_k\sum_l r_{kl}^2/[Q(1 - Q)] \\
&= \sigma_g^4\,\mathrm{mean}(r^2),
\end{aligned}
$$

where $Q$ is the number of QTL and $r^2$ is the usual measure of LD. Thus, $G = WW'/\sigma_g^2$ is a relationship matrix with diagonal elements averaging 1.
The off-diagonal elements of $G$ have mean 0, and their variance is the mean of the $r^2$ measure of linkage disequilibrium over all pairs of loci. If we consider the QTL to be $M$ unlinked loci, then $r^2_{kk} = 1$ and $r^2_{kl} = 0$, so the mean of $r^2$ is $1/M$. However, if we assume that QTL are spread all along the chromosome, we can evaluate the mean by integrating: $\mathrm{mean}(r^2) = [\iint r^2_{kl}\,dk\,dl]/L^2$, where the limits of integration are 0 and $L$, the length of the chromosome. Assuming $E(r^2) = 1/(2 + 4Nc)$ (Tenesa et al. 2007), where $N$ is the effective population size and $c$ is the distance between loci in Morgans, then

$$
\begin{aligned}
\mathrm{mean}(r^2) &= \Big[\iint r^2_{kl}\,dk\,dl\Big]/L^2 \\
&= \iint \frac{1}{2 + 4N|l - k|}\,dk\,dl\,/L^2 \\
&= \big[(2 + 4NL)\log(2 + 4NL) - 4NL - 4NL\log 2 - 2\log 2\big]/(8N^2L^2) \\
&\approx \log(NL)/(2NL) \quad \text{for large } NL.
\end{aligned}
$$

If the genome is made up of $k$ chromosomes, each of length $L$, then $\mathrm{mean}(r^2) \approx \log(NL)/(2NLk)$ for large $NL$. Thus, the number of effective QTL ($M$) is $2NLk/\log(NL)$ even if the number of actual QTL is infinite, because linkage generates LD, which is a correlation between loci. If one prefers to use $E(r^2) = 1/(1 + 4Nc)$ (Sved 1971), because the LD is driven by inbreeding without new mutation, then $\mathrm{mean}(r^2) \approx \log(2NL)/(2NLk)$ for large $NL$.

Appendix 2

Heuristic approximation for $V^{*-1}$

From the main text, $V = G_m\sigma_q^2 + I(\sigma_e^2 + \sigma_a^2)$, where $G_m = XX'/M$. $G_m$ can be written as $I + D$, where $E(D) = 0$ and $V(D) = 1/M_e$. Then $(I + D)^{-1} = I - D + D^2 \dots$ I + D
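The integral in Appendix 1 can be checked numerically. The sketch below is an illustration rather than part of the paper: it compares the closed form with a midpoint-rule evaluation and with the large-NL approximation. It assumes the integrand depends on the absolute distance |l - k|, and the values N = 100 and L = 1 Morgan, like the function names, are hypothetical choices.

```python
import math


def mean_r2_closed_form(n, length):
    """Closed form for mean r^2 on one chromosome of `length` Morgans."""
    nl = n * length
    numerator = ((2 + 4 * nl) * math.log(2 + 4 * nl)
                 - 4 * nl - 4 * nl * math.log(2) - 2 * math.log(2))
    return numerator / (8 * nl ** 2)


def mean_r2_numeric(n, length, steps=20000):
    """Midpoint-rule evaluation of the double integral of 1/(2 + 4N|l - k|).

    For an integrand f that depends only on d = |l - k|, the double integral
    over the square [0, L] x [0, L] equals 2 * integral_0^L (L - d) f(d) dd.
    """
    h = length / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * h
        total += (length - d) / (2 + 4 * n * d)
    return 2 * total * h / length ** 2


def mean_r2_large_nl(n, length, n_chrom=1):
    """Large-NL approximation, log(NL) / (2 N L k)."""
    return math.log(n * length) / (2 * n * length * n_chrom)


n_e, chrom_len = 100, 1.0  # hypothetical values
exact = mean_r2_closed_form(n_e, chrom_len)
numeric = mean_r2_numeric(n_e, chrom_len)
approx = mean_r2_large_nl(n_e, chrom_len)
print(exact, numeric, approx)
```

With these inputs the closed form and the numerical integral agree closely, while the large-NL approximation is a few percent higher; the gap narrows as NL grows.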

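The recommendations in the Conclusions for predicting reliability before data are collected can be sketched as a short calculation. This is a hedged illustration, not the authors' implementation: it assumes QTL behave like SNPs so that b = M/(M + M_e), writes the accuracy parameter as `theta`, and stops at the reliability without the Daetwyler correction, which is not reproduced here. All input values (N_e, chromosome length, chromosome number, marker count, training size, heritability) and function names are hypothetical.

```python
import math


def effective_segments(n_e, chrom_len, n_chrom):
    """M_e = 2 N_e L k / log(N_e L), formula (4) of the paper."""
    return 2 * n_e * chrom_len * n_chrom / math.log(n_e * chrom_len)


def proportion_explained(n_markers, m_e):
    """b = M / (M + M_e), assuming QTL are not systematically unlike SNPs."""
    return n_markers / (n_markers + m_e)


def reliability_without_correction(n_train, h2, b, m_e):
    """R^2 (w/o) = b * theta / (theta + 1), with theta = T b h^2 / M_e."""
    theta = n_train * b * h2 / m_e
    return b * theta / (theta + 1)


# Hypothetical scenario: N_e = 100, 30 chromosomes of 1 Morgan each,
# 50 000 markers, 2000 training animals, heritability 0.8.
m_e = effective_segments(n_e=100, chrom_len=1.0, n_chrom=30)
b = proportion_explained(n_markers=50_000, m_e=m_e)
r2 = reliability_without_correction(n_train=2000, h2=0.8, b=b, m_e=m_e)
r2_larger = reliability_without_correction(n_train=8000, h2=0.8, b=b, m_e=m_e)
print(m_e, b, r2, math.sqrt(r2), r2_larger)
```

Because the reliability is bounded above by b, increasing the training population raises it only towards the proportion of genetic variance that the markers explain.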
Genotyping strategy and reference population

Genotyping strategy and reference population GS cattle workshop Genotyping strategy and reference population Effect of size of reference group (Esa Mäntysaari, MTT) Effect of adding females to the reference population (Minna Koivula, MTT) Value of

More information

GBLUP and G matrices 1

GBLUP and G matrices 1 GBLUP and G matrices 1 GBLUP from SNP-BLUP We have defined breeding values as sum of SNP effects:! = #$ To refer breeding values to an average value of 0, we adopt the centered coding for genotypes described

More information

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013 Lecture 28: BLUP and Genomic Selection Bruce Walsh lecture notes Synbreed course version 11 July 2013 1 BLUP Selection The idea behind BLUP selection is very straightforward: An appropriate mixed-model

More information

FOR animals and plants, many genomic analyses with SNP

FOR animals and plants, many genomic analyses with SNP INVESTIGATION Inexpensive Computation of the Inverse of the Genomic Relationship Matrix in Populations with Small Effective Population Size Ignacy Misztal 1 Animal and Dairy Science, University of Georgia,

More information

Best unbiased linear Prediction: Sire and Animal models

Best unbiased linear Prediction: Sire and Animal models Best unbiased linear Prediction: Sire and Animal models Raphael Mrode Training in quantitative genetics and genomics 3 th May to th June 26 ILRI, Nairobi Partner Logo Partner Logo BLUP The MME of provided

More information

Large scale genomic prediction using singular value decomposition of the genotype matrix

Large scale genomic prediction using singular value decomposition of the genotype matrix https://doi.org/0.86/s27-08-0373-2 Genetics Selection Evolution RESEARCH ARTICLE Open Access Large scale genomic prediction using singular value decomposition of the genotype matrix Jørgen Ødegård *, Ulf

More information

A relationship matrix including full pedigree and genomic information

A relationship matrix including full pedigree and genomic information J Dairy Sci 9 :4656 4663 doi: 103168/jds009-061 American Dairy Science Association, 009 A relationship matrix including full pedigree and genomic information A Legarra,* 1 I Aguilar, and I Misztal * INRA,

More information

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Prediction of the Confidence Interval of Quantitative Trait Loci Location Behavior Genetics, Vol. 34, No. 4, July 2004 ( 2004) Prediction of the Confidence Interval of Quantitative Trait Loci Location Peter M. Visscher 1,3 and Mike E. Goddard 2 Received 4 Sept. 2003 Final 28

More information

Genomic prediction using haplotypes in New Zealand dairy cattle

Genomic prediction using haplotypes in New Zealand dairy cattle Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2016 Genomic prediction using haplotypes in New Zealand dairy cattle Melanie Kate Hayr Iowa State University

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013

Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013 Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 013 1 Estimation of Var(A) and Breeding Values in General Pedigrees The classic

More information

Limited dimensionality of genomic information and effective population size

Limited dimensionality of genomic information and effective population size Limited dimensionality of genomic information and effective population size Ivan Pocrnić 1, D.A.L. Lourenco 1, Y. Masuda 1, A. Legarra 2 & I. Misztal 1 1 University of Georgia, USA 2 INRA, France WCGALP,

More information

REDUCED ANIMAL MODEL FOR MARKER ASSISTED SELECTION USING BEST LINEAR UNBIASED PREDICTION. R.J.C.Cantet 1 and C.Smith

REDUCED ANIMAL MODEL FOR MARKER ASSISTED SELECTION USING BEST LINEAR UNBIASED PREDICTION. R.J.C.Cantet 1 and C.Smith REDUCED ANIMAL MODEL FOR MARKER ASSISTED SELECTION USING BEST LINEAR UNBIASED PREDICTION R.J.C.Cantet 1 and C.Smith Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science,

More information

Animal Model. 2. The association of alleles from the two parents is assumed to be at random.

Animal Model. 2. The association of alleles from the two parents is assumed to be at random. Animal Model 1 Introduction In animal genetics, measurements are taken on individual animals, and thus, the model of analysis should include the animal additive genetic effect. The remaining items in the

More information

3. Properties of the relationship matrix

3. Properties of the relationship matrix 3. Properties of the relationship matrix 3.1 Partitioning of the relationship matrix The additive relationship matrix, A, can be written as the product of a lower triangular matrix, T, a diagonal matrix,

More information

Mixture model equations for marker-assisted genetic evaluation

Mixture model equations for marker-assisted genetic evaluation J. Anim. Breed. Genet. ISSN 931-2668 ORIGINAL ARTILE Mixture model equations for marker-assisted genetic evaluation Department of Statistics, North arolina State University, Raleigh, N, USA orrespondence

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

A reduced animal model with elimination of quantitative trait loci equations for marker-assisted selection

A reduced animal model with elimination of quantitative trait loci equations for marker-assisted selection Original article A reduced animal model with elimination of quantitative trait loci equations for marker-assisted selection S Saito H Iwaisaki 1 Graduate School of Science and Technology; 2 Department

More information

Accounting for read depth in the analysis of genotyping-by-sequencing data

Accounting for read depth in the analysis of genotyping-by-sequencing data Accounting for read depth in the analysis of genotyping-by-sequencing data Ken Dodds, John McEwan, Timothy Bilton, Rudi Brauning, Rayna Anderson, Tracey Van Stijn, Theodor Kristjánsson, Shannon Clarke

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Crosses. Computation APY Sherman-Woodbury «hybrid» model. Unknown parent groups Need to modify H to include them (Misztal et al., 2013) Metafounders

Crosses. Computation APY Sherman-Woodbury «hybrid» model. Unknown parent groups Need to modify H to include them (Misztal et al., 2013) Metafounders Details in ssgblup Details in SSGBLUP Storage Inbreeding G is not invertible («blending») G might not explain all genetic variance («blending») Compatibility of G and A22 Assumption p(u 2 )=N(0,G) If there

More information

Prediction of breeding values with additive animal models for crosses from 2 populations

Prediction of breeding values with additive animal models for crosses from 2 populations Original article Prediction of breeding values with additive animal models for crosses from 2 populations RJC Cantet RL Fernando 2 1 Universidad de Buenos Aires, Departamento de Zootecnia, Facultad de

More information

Prediction of IBD based on population history for fine gene mapping

Prediction of IBD based on population history for fine gene mapping Genet. Sel. Evol. 38 (2006) 231 252 231 c INRA, EDP Sciences, 2006 DOI: 10.1051/gse:2006001 Original article Prediction of IBD based on population history for fine gene mapping Jules HERNÁNDEZ-SÁNCHEZ,ChrisS.HALEY,

More information

Models with multiple random effects: Repeated Measures and Maternal effects

Models with multiple random effects: Repeated Measures and Maternal effects Models with multiple random effects: Repeated Measures and Maternal effects 1 Often there are several vectors of random effects Repeatability models Multiple measures Common family effects Cleaning up

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

population when only records from later

population when only records from later Original article Estimation of heritability in the base population when only records from later generations are available L Gomez-Raya LR Schaeffer EB Burnside University of Guelph, Centre for Genetic

More information

Lecture 11: Multiple trait models for QTL analysis

Lecture 11: Multiple trait models for QTL analysis Lecture 11: Multiple trait models for QTL analysis Julius van der Werf Multiple trait mapping of QTL...99 Increased power of QTL detection...99 Testing for linked QTL vs pleiotropic QTL...100 Multiple

More information

Prediction of Future Milk Yield with Random Regression Model Using Test-day Records in Holstein Cows

Prediction of Future Milk Yield with Random Regression Model Using Test-day Records in Holstein Cows 9 ` Asian-Aust. J. Anim. Sci. Vol. 19, No. 7 : 9-921 July 26 www.ajas.info Prediction of Future Milk Yield with Random Regression Model Using Test-day Records in Holstein Cows Byoungho Park and Deukhwan

More information

Should genetic groups be fitted in BLUP evaluation? Practical answer for the French AI beef sire evaluation

Should genetic groups be fitted in BLUP evaluation? Practical answer for the French AI beef sire evaluation Genet. Sel. Evol. 36 (2004) 325 345 325 c INRA, EDP Sciences, 2004 DOI: 10.1051/gse:2004004 Original article Should genetic groups be fitted in BLUP evaluation? Practical answer for the French AI beef

More information

IN a recent article formulas for computing probabilities

IN a recent article formulas for computing probabilities Copyright Ó 2007 by the Genetics Society of America DOI: 10.1534/genetics.107.074344 Prediction of Multilocus Identity-by-Descent William G. Hill 1 and Jules Hernández-Sánchez Institute of Evolutionary

More information

Multiple random effects. Often there are several vectors of random effects. Covariance structure

Multiple random effects. Often there are several vectors of random effects. Covariance structure Models with multiple random effects: Repeated Measures and Maternal effects Bruce Walsh lecture notes SISG -Mixed Model Course version 8 June 01 Multiple random effects y = X! + Za + Wu + e y is a n x

More information

Extension of single-step ssgblup to many genotyped individuals. Ignacy Misztal University of Georgia

Extension of single-step ssgblup to many genotyped individuals. Ignacy Misztal University of Georgia Extension of single-step ssgblup to many genotyped individuals Ignacy Misztal University of Georgia Genomic selection and single-step H -1 =A -1 + 0 0 0 G -1-1 A 22 Aguilar et al., 2010 Christensen and

More information

Single and multitrait estimates of breeding values for survival using sire and animal models

Single and multitrait estimates of breeding values for survival using sire and animal models Animal Science 00, 75: 15-4 1357-798/0/11300015$0 00 00 British Society of Animal Science Single and multitrait estimates of breeding values for survival using sire and animal models T. H. E. Meuwissen

More information

Bases for Genomic Prediction

Bases for Genomic Prediction Bases for Genomic Prediction Andres Legarra Daniela A.L. Lourenco Zulma G. Vitezica 2018-07-15 1 Contents 1 Foreword by AL (it only engages him) 5 2 Main notation 6 3 A little bit of history 6 4 Quick

More information

RESTRICTED M A X I M U M LIKELIHOOD TO E S T I M A T E GENETIC P A R A M E T E R S - IN PRACTICE

RESTRICTED M A X I M U M LIKELIHOOD TO E S T I M A T E GENETIC P A R A M E T E R S - IN PRACTICE RESTRICTED M A X I M U M LIKELIHOOD TO E S T I M A T E GENETIC P A R A M E T E R S - IN PRACTICE K. M e y e r Institute of Animal Genetics, Edinburgh University, W e s t M a i n s Road, Edinburgh EH9 3JN,

More information
