THE ABILITY TO PREDICT COMPLEX TRAITS from marker data


Published November 2011

ORIGINAL RESEARCH

Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP

Jeffrey B. Endelman*

Abstract

Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.

Published in The Plant Genome 4. Published Nov. 2011. doi: […]/plantgenome[…]. Crop Science Society of America, 5585 Guilford Rd., Madison, WI USA. An open-access publication. All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Dep. of Crop and Soil Sciences, Washington State Univ., State Route 536, Mount Vernon, WA. Received 6 May 2011. *Corresponding author (j.endelman@gmail.com).

Abbreviations: $\theta_{\mathrm{REML}}$, restricted maximum likelihood solution for θ; BLR, Bayesian Linear Regression; BLUP, best linear unbiased prediction; EXP, exponential model; GAUSS, Gaussian model; GEBV, genomic-estimated breeding value; LL, log-likelihood; ML, maximum likelihood; REML, restricted maximum likelihood; RR, ridge regression; $r_{\mathrm{pred}}$, cross-validation accuracy; $r_{\mathrm{train}}$, training population accuracy; SNP, single nucleotide polymorphism.

THE PLANT GENOME, NOVEMBER 2011, VOL. 4, NO. 3

THE ABILITY TO PREDICT COMPLEX TRAITS from marker data is becoming increasingly important in plant breeding (Bernardo, 2008). The earliest attempts, now over 20 years old, involved first identifying significant markers and then combining them in a multiple regression model (Lande and Thompson, 1990). The focus over the last decade has been on genomic selection methods, in which all markers are included in the prediction model (Bernardo and Yu, 2007; Heffner et al., 2009; Jannink et al., 2010). One of the first methods proposed for genomic selection was ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) in the context of mixed models (Whittaker et al., 2000; Meuwissen et al., 2001). The basic RR-BLUP model is

$$y = WGu + \varepsilon, \qquad u \sim N(0, I\sigma_u^2), \qquad \text{[1]}$$

where $u$ is a vector of marker effects, $G$ is the genotype matrix (e.g., {aa, Aa, AA} = {−1, 0, 1} for biallelic single nucleotide polymorphisms (SNPs) under an additive model), and $W$ is the design matrix relating lines to observations ($y$). The BLUP solution for the marker effects can be written as either $\hat{u} = Z'(ZZ' + \lambda I)^{-1} y$ or $\hat{u} = (Z'Z + \lambda I)^{-1} Z'y$, where $Z = WG$ and the ridge parameter $\lambda = \sigma_e^2 / \sigma_u^2$ is the ratio between the residual and marker variances (Searle et al., 2006). Compared with ordinary regression, for which the number of markers cannot exceed the number of observations, RR has no such limit and also has improved numerical stability when markers are highly correlated (Hoerl and Kennard, 2000).

There is a close connection between marker-based RR-BLUP (Eq. [1]) and kinship-BLUP, in which the performance of breeding lines is predicted based on their kinship to other germplasm (Bernardo, 1994; Piepho et al., 2008). The basic kinship-BLUP model is

$$y = Wg + \varepsilon, \qquad g \sim N(0, K\sigma_g^2), \qquad \text{[2]}$$

where $g$ is a vector of genotypic values. In pedigree-based prediction of breeding values, $K$ is the additive relationship matrix $A$ derived from the coefficients of coancestry (Bernardo, 2010). These coefficients reflect the average behavior of alleles undergoing Mendelian segregation, but the actual segregation can be captured with the marker-based relationship matrix

$$K_{RR} = GG'. \qquad \text{[3]}$$

Equation [3] has the property that, for random populations, its expected value is proportional to $A$ plus a constant (Habier et al., 2007); for this reason it has been called the realized (additive) relationship matrix. Another key property of $K_{RR}$ is that the genomic-estimated breeding values (GEBVs) it produces ($\hat{g}$ in Eq. [2]) are equivalent to those from the marker-based RR-BLUP approach ($G\hat{u}$ in Eq. [1]) (Hayes et al., 2009).

When using genomic selection to advance lines as varieties, it is not just the breeding (additive) value but the full genotypic value that is of interest (Piepho et al., 2008). Rather than modeling epistatic interactions directly, which is challenging because of the combinatorial complexity, an alternative approach is to capture them through an appropriate kernel function (Gianola and van Kaam, 2008; Piepho, 2009; de los Campos et al., 2010). The realized relationship model (Eq. [3]) is in fact a kernel in genotype space and can be written as $K_{ij} = \langle G_i, G_j \rangle$, where the angle brackets denote the inner (or dot) product between genotypes $i$ and $j$. In geometry the inner product measures the similarity of two vectors, so with the additive relationship model the genetic covariance between lines is proportional to their similarity in genotype space.
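The equivalences stated above (the two forms of the ridge solution, and marker-based versus kinship-based GEBVs with $K_{RR} = GG'$) can be checked numerically. The following is a minimal base-R sketch under several assumptions that are not part of the article: the genotypes and phenotypes are simulated, $W$ is the identity (one observation per line), the phenotype is centered instead of fitting an intercept, and the ridge parameter `lambda` is fixed by hand rather than estimated by ML or REML.

```r
set.seed(1)
n <- 20                      # number of lines (hypothetical)
m <- 50                      # number of markers (hypothetical)
G <- matrix(sample(c(-1, 1), n * m, replace = TRUE), nrow = n)  # inbred lines coded -1/+1
y <- rnorm(n)
y <- y - mean(y)             # centering stands in for the intercept
lambda <- 2                  # assumed variance ratio sigma_e^2 / sigma_u^2

# Two algebraically equivalent forms of the marker-effect BLUP (Z = G here)
u_dual   <- crossprod(G, solve(tcrossprod(G) + lambda * diag(n), y))   # G'(GG' + lambda I)^-1 y
u_primal <- solve(crossprod(G) + lambda * diag(m), crossprod(G, y))    # (G'G + lambda I)^-1 G'y

# Kinship-based BLUP with the realized relationship matrix K = GG'
K <- tcrossprod(G)
g_kinship <- K %*% solve(K + lambda * diag(n), y)
g_marker  <- G %*% u_dual    # GEBV from the marker effects

print(all.equal(u_dual, u_primal))
print(all.equal(g_marker, g_kinship))
```

The first form inverts an n × n matrix and the second an m × m matrix, so the first is the cheaper choice when markers outnumber lines.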
This geometric formulation enables use of the so-called kernel trick in machine learning, which involves replacing the inner product in the original (genotype) space with an inner product in a more complex feature space, technically called a reproducing kernel Hilbert space (Schölkopf and Smola, 2002):

$$K_{ij} = K(G_i, G_j) = \langle \Phi(G_i), \Phi(G_j) \rangle. \qquad \text{[4]}$$

Equation [4] means that the kernel function $K$, which takes the two genotypes as arguments and returns a single number, equals the inner product between the genotypes in a feature space defined by $\Phi$. Although one can construct kernels by first specifying $\Phi$ and then applying Eq. [4], this is unnecessary as the feature space is guaranteed to exist for any positive semidefinite kernel (Schölkopf and Smola, 2002). To calculate BLUPs that include nonadditive effects, it is sufficient to solve Eq. [2] with $K$ based on an appropriate kernel function (Gianola and van Kaam, 2008).

The objective of the present research was to develop an R package for genomic prediction based on a maximum likelihood (ML) or restricted maximum likelihood (REML) approach to ridge regression (RR) and other kernels. The result is rrBLUP (available at […] [verified 1 Nov. 2011]), which uses a fast spectral algorithm for mixed models with a single variance component besides the residual error (Kang et al., 2008). After demonstrating features of the software, the accuracy of its prediction methods is compared by cross-validation using structured populations of wheat (Triticum aestivum L.) (Crossa et al., 2010) and maize (Zea mays L.) (Yu et al., 2006).

MATERIALS AND METHODS

The wheat population consisted of 599 inbred lines genotyped at 1279 Diversity Array Technology (DArT) markers and was downloaded as part of the Bayesian Linear Regression (BLR) package for R, version 1.2 (Pérez et al., 2010). Single nucleotide polymorphism markers and phenotypic data for maize ear height, ear diameter, and male flowering time were downloaded from the TASSEL website (Bradbury et al., 2007).
For each of the ten maize chromosomes, the diploid marker data were phased and missing alleles imputed using the software BEAGLE, version […] (Browning and Browning, 2007). After removing monomorphic markers, 953 remained. The population size was 79 inbred lines, but due to missing phenotypic data only 76 lines were available for flowering time and 49 for ear diameter.

For each of the 179,101 unique crosses between the 599 wheat lines, the expected mean and standard deviation (SD) for the GEBV of the recombinant inbred progeny were calculated based on the predicted marker effects in environment 1. In the absence of a linkage map, markers were assumed to segregate independently, which is clearly an approximation. (With a linkage map the SD could be simulated more realistically.) If $p_k^+$ and $p_k^-$ denote the frequency of the +1 and −1 alleles, respectively, at locus $k$ in the parents, then the mean GEBV of the inbred progeny is

$$E[\hat{g}_i] = E[G_i]\,\hat{u} = \sum_k (p_k^+ - p_k^-)\,\hat{u}_k,$$

and the variance (neglecting uncertainty in the marker effects) is

$$\mathrm{Var}[\hat{g}_i] = \mathrm{Var}[G_i \hat{u}] = \sum_k \left( E[G_{ik}^2] - E[G_{ik}]^2 \right) \hat{u}_k^2 = \sum_k \left[ 1 - (p_k^+ - p_k^-)^2 \right] \hat{u}_k^2.$$

Bayesian LASSO predictions were made with the BLR package for R, version 1.2, and hyperparameters were chosen based on the guidelines of Pérez et al. (2010). For the prior distribution of the residual variance, the degrees of freedom was $df_\varepsilon = 3$ and the scale was $S_\varepsilon = (\mathrm{Var}[y]/2)(2 + df_\varepsilon)$, where Var[y] is the variance of the training data. The prior distribution for the LASSO shrinkage parameter had mode $\left( 2 \sum_k \overline{G_k^2} \right)^{1/2}$, where $\overline{G_k^2}$ is the average over the training data and the sum is over markers. The rate and shape hyperparameters were $10^{-5}$ and 0.5, respectively. A total of 10,000 iterations was used, with a burn-in period of 2000 iterations.

Statistical analysis of the cross-validation results was conducted with SAS PROC GLM (SAS Institute, 1994), with partition and method as fixed effects. The REGWQ option was used to control the strong familywise error rate (the probability of false discovery) at 0.05.

RESULTS AND DISCUSSION

Marker vs. Kinship-Based Prediction

At the core of the rrBLUP package is the function mixed.solve, which solves any mixed model of the form

$$y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, K\sigma_u^2), \qquad \text{[5]}$$

where $X$ is a full-rank design matrix for the fixed effects $\beta$, $Z$ is the design matrix for the random effects $u$, $K$ is a positive semidefinite matrix, and the residuals are normal with constant variance. Variance components are estimated by either ML or REML (default) using the spectral decomposition algorithm of Kang et al. (2008). The R function returns the variance components, the maximized log-likelihood (LL), the ML estimate for $\beta$, and the BLUP solution for $u$.

It was stated in the introduction that when the realized relationship matrix $GG'$ is used, the marker-based (Eq. [1]) and kinship-based (Eq. [2]) formulations of the prediction problem give equivalent GEBV. This can be verified numerically using mixed.solve and a set of 599 wheat lines from the BLR package for R (Pérez et al., 2010). The BLR variable Y contains the two-year average grain yield in four environments (standardized to zero mean and unit variance), and the genotype matrix is coded as {0,1} in the variable X.
To be consistent with the notation in this article, the genotypes were recoded as {−1, 1} in G:

    library(rrBLUP)  #load rrBLUP
    library(BLR)     #load BLR
    data(wheat)      #load wheat data
    G <- 2*X - 1     #recode genotypes
    y <- Y[,1]       #yields from E1
    #marker-based
    ans1 <- mixed.solve(y=y, Z=G)
    #kinship-based
    K <- tcrossprod(G)  #K = GG'
    I <- diag(599)
    ans2 <- mixed.solve(y=y, Z=I, K=K)
    #Compare GEBV
    cor(G %*% ans1$u, ans2$u)  #equals 1

In the first call to mixed.solve the design matrix equals the genotype matrix, so the random effects are the marker effects. In this case K is an identity matrix, which the software assumes because no K variable is provided. When no design matrix for fixed effects is provided, as in this example, an intercept term is automatically included. In the second call to mixed.solve, an identity matrix is used for Z and the realized relationship matrix $GG'$ is used for K. In this case the random effects are the breeding values, which in the last line of code are compared with the GEBV from the marker-based model. As shown in the comments, the correlation is exactly 1. Each of the two calls to mixed.solve took five seconds on a laptop computer with two gigabytes of memory, running R 2.13.1 (R Development Core Team, 2011).

Although the two approaches are equivalent for calculating GEBV, some analyses depend on knowing the marker effects. For example, when different lines are evaluated in different environments, even though a whole genotype × environment analysis is not possible, one can still study marker × environment interactions (Crossa et al., 2010). Another application is to design crosses in a breeding program (Bernardo et al., 2006; Zhong and Jannink, 2007). The expected mean for the progeny can be calculated as the mean of the parental GEBV, but the marker effects are needed to compute the variance of the population, which is important for genetic gain.

To illustrate, each circle in Fig. 1 shows the expected mean (μ) and standard deviation (σ) for the GEBV of recombinant inbred lines from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the upper right corner of the figure are crosses between lines with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected. For a given selection intensity $i$, the mean of the selected population is $\mu_s = \mu + i\sigma$, which Zhong and Jannink (2007) called the superior progeny value. The superior progeny values for the crosses in Fig. 1 were calculated for selection intensities ranging from 1.4 (20% selected) to 2.7 (1% selected). The top nine crosses were conserved across this range and are listed in Table 1, with lines identified by their GEBV rank. Exactly one of the top two highest-GEBV lines was found in every pair, but the 1 × 2 cross does not appear because the two lines share 96% of their alleles and have an expected SD of […].

Table 1. Top nine wheat crosses based on superior progeny value (SPV) in environment 1. Columns: Cross; Kinship; SPV 20%; SPV 1%; Mean GEBV; SD GEBV. Cross: line identifier equals the GEBV rank. Kinship: fraction of shared alleles (identity by state). GEBV, genomic-estimated breeding value.

Figure 1. Analysis of line crosses. Each circle is the expected mean and standard deviation (SD) for the genomic-estimated breeding values (GEBVs) of the recombinant inbred progeny from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the top right of the figure are crosses between parents with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected.

Kernels with Epistatic Effects

At present there are two kernels other than RR in the rrBLUP package. One is the Gaussian model (GAUSS):

$$K_{ij} = \exp\left[ -(D_{ij}/\theta)^2 \right], \qquad \text{[6]}$$

where

$$D_{ij} = \left[ \frac{1}{4M} \sum_{k=1}^{M} (G_{ik} - G_{jk})^2 \right]^{1/2} \qquad \text{[7]}$$

is the Euclidean distance between genotypes $i$ and $j$, normalized to the interval [0,1]. The parameter θ is a scale parameter that influences how quickly the genetic covariance decays with distance. The other kernel is the exponential model (EXP): $K_{ij} = \exp(-D_{ij}/\theta)$.

These kernels are available through the rrBLUP function kinship.BLUP, which was designed to predict the genotypic values of one population based on the genotypes and phenotypes of a second, training population. To illustrate its use, consider again the 599 wheat lines from the BLR package, which have been randomly partitioned into 10 sets for use in 10-fold cross-validation (Pérez et al., 2010). The variable sets contains the partition number for each line. To predict the genotypic values of set 1 using the other nine sets as the training population, the R code is

    train <- which(sets!=1)
    pred <- which(sets==1)
    ans.rr <- kinship.BLUP(y=y[train],
        G.train=G[train,], G.pred=G[pred,])
    ans.gauss <- kinship.BLUP(y=y[train],
        G.train=G[train,], G.pred=G[pred,],
        K.method="GAUSS")
    #accuracy with RR
    cor(ans.rr$g.pred, y[pred])
    #accuracy with GAUSS
    cor(ans.gauss$g.pred, y[pred])

In the first call to kinship.BLUP the kernel method is not specified, so by default the realized relationship model is used. The last two lines of code calculate the correlation ($r_{\hat{g}y}$) between the predicted genotypic value and observed phenotype for the prediction population, which measures the cross-validation accuracy of the prediction method.

Table 2 shows the accuracies of the two methods for all 10 sets in environments 1 and 2. The results demonstrate that the performance of GAUSS compared to RR depends on both the structure of the population and the phenotype. For 9 out of 10 sets in environment 1, the accuracy with GAUSS was higher than RR. The largest gap was for set 5, where the accuracy with RR was 0.34 vs. […] with GAUSS. Across the 10 sets the mean accuracy with GAUSS was 0.58 vs. 0.51 for RR (p = […] by paired t-test). By contrast, in environment 2 there was no significant difference between the prediction methods (p = 0.[…]).

To better understand these differences, Fig. 2 shows the log-likelihood (LL) (solid circles), training population accuracy ($r_{\mathrm{train}}$) (dashed line), and cross-validation accuracy ($r_{\mathrm{pred}}$) (open circles) as a function of the scale parameter θ (see Eq. [6]). The rrBLUP package uses REML (or ML) to identify the optimal scale parameter, and because the genotype distances have been normalized to the unit interval (Eq. [7]), this is also the essential range for θ. The two panels in Fig. 2 correspond to sets 5 and 6 in environment 1, which showed contrasting results in the RR vs. GAUSS comparison: for set 5 the accuracy with GAUSS was higher and vice versa for set 6 (see Table 2). In both cases the REML solution for θ ($\theta_{\mathrm{REML}}$) was similar and $r_{\mathrm{train}}$ approached 1 as θ decreased to zero. The crucial difference lies in $r_{\mathrm{pred}}$. For set 5, $r_{\mathrm{pred}}$ exhibited an interior maximum near $\theta_{\mathrm{REML}}$, while for set 6 $r_{\mathrm{pred}}$ was maximized at θ = 1 and declined steadily as θ decreased.

The significance of this observation for understanding Table 2 is that GAUSS behaves like RR when θ is large relative to $D$. This follows from the Taylor series expansion, $K = 1 - (D/\theta)^2 + \tfrac{1}{2}(D/\theta)^4 - \cdots$, and the fact that $[D^2]$ is equivalent to the additive model $GG'$ for inbred lines (Piepho, 2009).
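The kernel definitions in Eq. [6] and [7] and the large-θ argument can be made concrete with a short base-R sketch. It is illustrative only and does not reproduce the rrBLUP internals: the genotype matrix is simulated, and the values of `theta` and `theta_big` are arbitrary choices. For lines coded −1/+1, expanding the square in Eq. [7] gives $D_{ij}^2 = (1 - [GG']_{ij}/M)/2$, which is the sense in which $[D^2]$ carries the same information as the additive model.

```r
set.seed(7)
n <- 8                        # lines (hypothetical)
M <- 100                      # markers (hypothetical)
G <- matrix(sample(c(-1, 1), n * M, replace = TRUE), nrow = n)

# Eq. [7]: Euclidean distance between rows of G, scaled to [0, 1]
D <- unname(as.matrix(dist(G))) / sqrt(4 * M)

theta <- 0.5                  # arbitrary scale parameter for illustration
K_gauss <- exp(-(D / theta)^2)   # Eq. [6], Gaussian kernel
K_exp   <- exp(-D / theta)       # exponential kernel

# For -1/+1 coding, squared distance is a linear function of GG'
D2_from_GGt <- (1 - tcrossprod(G) / M) / 2
print(all.equal(D^2, D2_from_GGt))

# For large theta the Gaussian kernel is close to 1 - (D/theta)^2,
# i.e. a constant plus a term that is linear in GG'
theta_big <- 10
K_big    <- exp(-(D / theta_big)^2)
K_taylor <- 1 - (D / theta_big)^2
print(max(abs(K_big - K_taylor)))
```

The constant term of the expansion is confounded with the intercept in the mixed model, which is why only the part that is linear in $GG'$ matters for prediction when θ is large.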
As θ decreases, the epistatic interactions in the higher order terms (e.g., $D^4$) become more important. When $r_{\mathrm{pred}}$ has an interior maximum near $\theta_{\mathrm{REML}}$, as in set 5, GAUSS will have higher accuracy than RR. When $r_{\mathrm{pred}}$ increases monotonically with θ, GAUSS will not have higher accuracy than RR; whether GAUSS is lower or equivalent depends on the shape of the LL profile. In the case of set 6, the LL profile peaked at $\theta_{\mathrm{REML}}$ = 0.4, so RR had higher accuracy. For most sets in environment 2, both LL and $r_{\mathrm{pred}}$ increased monotonically with θ (not shown), so GAUSS and RR were equivalent.

These phenomena are relevant to the question of whether GAUSS is prone to overfitting, which Piepho (2009) and Heslot et al. (2012) have raised as a concern. In both studies the residual error with GAUSS was much smaller than with RR, or equivalently the accuracy for the training population was nearly 1. This was also observed with the BLR wheat data, as shown by the dashed line in Fig. 2. To constitute overfitting, however, there must be a tradeoff between higher accuracy for the training set and lower accuracy for the validation set (Dietrich, 1995). The results in Heslot et al. (2012) and the present study show that such a tradeoff is rare provided the scale parameter is chosen properly. Overfitting was observed for set 6 in environment 1, but more typically $r_{\mathrm{pred}}$ was either the same or higher with GAUSS compared to RR (see Table 2).

To investigate the matter further, a different data set (79 maize lines genotyped at 953 SNP markers) was analyzed with the rrBLUP package. The cross-validation accuracies for maize flowering time, ear height, and ear diameter are shown alongside the results for wheat grain yield in Table 3. For wheat grain yield, the accuracy with GAUSS was 6 to 7 percentage points higher than RR in every environment but environment 2 (similar to Crossa et al. [2010]). For all three maize traits there was no significant difference between GAUSS and RR, which provides additional evidence that overfitting (i.e., a loss in cross-validation accuracy) is not common with GAUSS. The results also suggest that most (perhaps all) of the genetic variation was additive for the maize traits.

Table 3 includes the cross-validation results with EXP, which was equivalent to GAUSS for all seven traits. Piepho (2009) also found little difference between these two models in his analysis of maize grain yield. Like GAUSS, EXP captures nonadditive effects but the structure of its feature space is different. For the limited plant breeding data analyzed thus far with the two methods, this difference appears to be of little consequence.

Table 2. Cross-validation accuracies ($r_{\hat{g}y}$) for wheat grain yield. Columns: Set; Environment 1 (RR, GAUSS); Environment 2 (RR, GAUSS). Set: prediction set; the other nine sets were used for training. ** Means significantly different at the 0.01 probability level in Environment 1. RR, ridge regression. GAUSS, Gaussian model.

Figure 2. Performance of the Gaussian model (GAUSS). The figure depicts the effect of the Gaussian scale parameter (θ in Eq. [6]) on the restricted log-likelihood (LL), the training population accuracy ($r_{\mathrm{train}}$), and the cross-validation accuracy ($r_{\mathrm{pred}}$) when predicting sets 5 or 6 in environment 1. For set 5 the restricted maximum likelihood solution for θ ($\theta_{\mathrm{REML}}$) = 0.5, and for set 6 $\theta_{\mathrm{REML}}$ = 0.4. In both cases $r_{\mathrm{train}}$ approached 1 as θ → 0, but the trends for $r_{\mathrm{pred}}$ were different. For set 5 $r_{\mathrm{pred}}$ exhibited an interior maximum near $\theta_{\mathrm{REML}}$, while for set 6 $r_{\mathrm{pred}}$ increased monotonically with θ. Because GAUSS is approximately ridge regression (RR) when θ is large, the contrasting behavior in this figure illustrates why GAUSS had higher $r_{\mathrm{pred}}$ than RR for set 5 but vice versa for set 6 (see Table 2).
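A profile of the kind summarized in Figure 2 can be sketched in base R. This sketch departs from the article in several labeled ways: the data are simulated rather than the BLR wheat set, the train/prediction split is arbitrary, and the variance ratio `lambda` is held fixed instead of being re-estimated by REML at each θ, so it illustrates only the mechanics of computing $r_{\mathrm{train}}$ and $r_{\mathrm{pred}}$ over a grid of θ, not the reported results.

```r
set.seed(42)
n <- 60                                   # lines (hypothetical)
M <- 200                                  # markers (hypothetical)
G <- matrix(sample(c(-1, 1), n * M, replace = TRUE), nrow = n)
y <- as.vector(G %*% rnorm(M, sd = 0.1) + rnorm(n))   # simulated additive trait plus noise

train <- 1:45                             # arbitrary training set
pred  <- 46:60                            # arbitrary prediction set
D <- as.matrix(dist(G)) / sqrt(4 * M)     # Eq. [7]
lambda <- 1                               # assumed fixed variance ratio (not REML)
theta_grid <- c(0.1, 0.25, 0.5, 1, 2)

profile <- t(sapply(theta_grid, function(theta) {
  K  <- exp(-(D / theta)^2)               # Gaussian kernel, Eq. [6]
  mu <- mean(y[train])                    # intercept estimated by the training mean
  alpha <- solve(K[train, train] + lambda * diag(length(train)), y[train] - mu)
  g_train <- as.vector(K[train, train] %*% alpha)   # fitted genotypic values
  g_pred  <- as.vector(K[pred, train] %*% alpha)    # predicted genotypic values
  c(theta = theta,
    r_train = cor(g_train, y[train]),
    r_pred  = cor(g_pred, y[pred]))
}))
print(profile)
```

At the smallest θ the kernel matrix is numerically an identity matrix, so the fitted values are proportional to the centered phenotypes and the training accuracy is essentially 1, which mirrors the behavior of $r_{\mathrm{train}}$ described in the caption.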

Table 3. Tenfold cross-validation accuracy ($r_{\hat{g}y}$) for maize and wheat traits.

Method   Wheat     Wheat     Wheat     Wheat     Maize            Maize        Maize
         yield 1   yield 2   yield 3   yield 4   flowering time   ear height   ear diameter
GAUSS    0.58 a    0.49 a    0.45 a    0.54 a    0.73 a           0.51 a       0.53 ab
EXP      0.57 a    0.49 a    0.45 a    0.54 a    0.73 a           0.54 a       0.54 a
RR       0.51 b    0.48 a    0.38 b    0.48 b    0.73 a           0.51 a       0.5 b
BL       0.51 b    0.48 a    0.38 b    0.47 b    0.73 a           0.5 a        0.53 ab

GAUSS, Gaussian model; EXP, exponential model; RR, ridge regression; BL, Bayesian LASSO. Within each trait, accuracies with the same letter were not significantly different at the 0.05 probability level.

For the sake of comparison, Table 3 also shows the accuracy of the additive Bayesian LASSO model, which was equivalent to RR for all seven traits.

CONCLUSIONS

The objective of this research was to create software that makes ridge regression and other kernel methods accessible to plant breeders interested in genomic selection. At the core of the rrBLUP package is the function mixed.solve, which can be used to solve both the marker-based and kinship-based versions of the genomic prediction problem. The function kinship.BLUP provides a more intuitive interface for kinship-based prediction and includes several genetic models, including an additive relationship matrix and the nonadditive Gaussian kernel.

Acknowledgments

The author thanks Jean-Luc Jannink for his mentoring and helpful comments on the manuscript.

References

Bernardo, R. 1994. Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 34:0 5.
Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48:
Bernardo, R. 2010. Quantitative traits in plant breeding. Stemma Press, Woodbury, MN.
Bernardo, R., L. Moreau, and A. Charcosset. 2006. Number and fitness of selected individuals in marker-assisted and phenotypic recurrent selection. Crop Sci. 46:
Bernardo, R., and J. Yu. 2007. Prospects for genomewide selection for quantitative traits in maize. Crop Sci. 47:
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and E.S. Buckler. 2007. TASSEL: Software for association mapping of complex traits in diverse samples. Available at maizegenetics.net/tassel (verified 1 Nov. 2011). Bioinformatics 3:
Browning, S.R., and B.L. Browning. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81:
de los Campos, G., D. Gianola, G.J.M. Rosa, K.A. Weigel, and J. Crossa. 2010. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet. Res. Camb. 9:
Crossa, J., G. de los Campos, P. Pérez, D. Gianola, J. Burgueño, J.L. Araus, D. Makumbi, R.P. Singh, S. Dreisigacker, J. Yan, V. Arief, M. Banziger, and H.-J. Braun. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:
Dietrich, T. 1995. Overfitting and undercomputing in machine learning. ACM Comput. Surv. 7:
Gianola, D., and J.B. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178:
Habier, D., R.L. Fernando, and J.C.M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:
Hayes, B.J., P.M. Visscher, and M.E. Goddard. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. Camb. 91:
Heffner, E.L., M.E. Sorrells, and J.-L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1 1.
Heslot, N., H.-P. Yang, M.E. Sorrells, and J.-L. Jannink. 2012. Genomic selection in plant breeding: A comparison of models. Crop Sci. 5:
Hoerl, A.E., and R.W. Kennard. 2000. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 4:
Jannink, J.-L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Brief. Funct. Genomic 9:
Kang, H.M., N.A. Zaitlen, C.M. Wade, A. Kirby, D. Heckerman, M.J. Daly, and E. Eskin. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:
Lande, R., and R. Thompson. 1990. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 14:
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:
Pérez, P., G. de los Campos, J. Crossa, and D. Gianola. 2010. Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian Linear Regression package in R. Plant Gen. 3:
Piepho, H.P. 2009. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49:
Piepho, H.P., J. Möhring, A.E. Melchinger, and A. Büchse. 2008. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161:09 8.
R Development Core Team. 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
SAS Institute. 1994. SAS 9. for Windows. SAS Institute, Cary, NC.
Schölkopf, B., and A.J. Smola. 2002. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge, MA.
Searle, S.R., G. Casella, and C.E. McCulloch. 2006. Variance components. John Wiley & Sons, Hoboken, NJ.
Whittaker, J.C., R. Thompson, and M.C. Denham. 2000. Marker-assisted selection using ridge regression. Genet. Res. Camb. 75:49 5.
Yu, J., G. Pressoir, W.H. Briggs, I.V. Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38:
Zhong, S., and J.-L. Jannink. 2007. Using quantitative trait loci results to discriminate among crosses on the basis of their progeny mean and variance. Genetics 177:

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL)

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL) GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL) Paulino Pérez 1 José Crossa 2 1 ColPos-México 2 CIMMyT-México September, 2014. SLU,Sweden GENOMIC SELECTION WORKSHOP:Hands on Practical Sessions

More information

Prediction of genetic Values using Neural Networks

Prediction of genetic Values using Neural Networks Prediction of genetic Values using Neural Networks Paulino Perez 1 Daniel Gianola 2 Jose Crossa 1 1 CIMMyT-Mexico 2 University of Wisconsin, Madison. September, 2014 SLU,Sweden Prediction of genetic Values

More information

Recent advances in statistical methods for DNA-based prediction of complex traits

Recent advances in statistical methods for DNA-based prediction of complex traits Recent advances in statistical methods for DNA-based prediction of complex traits Mintu Nath Biomathematics & Statistics Scotland, Edinburgh 1 Outline Background Population genetics Animal model Methodology

More information

Supplementary Information

Supplementary Information Supplementary Information 1 Supplementary Figures (a) Statistical power (p = 2.6 10 8 ) (b) Statistical power (p = 4.0 10 6 ) Supplementary Figure 1: Statistical power comparison between GEMMA (red) and

More information

Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in open-pollinated populations

Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in open-pollinated populations Scientia Agricola http://dx.doi.org/0.590/003-906-04-0383 Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in open-pollinated populations 43 José Marcelo Soriano

More information

arxiv: v1 [stat.me] 10 Jun 2018

arxiv: v1 [stat.me] 10 Jun 2018 Lost in translation: On the impact of data coding on penalized regression with interactions arxiv:1806.03729v1 [stat.me] 10 Jun 2018 Johannes W R Martini 1,2 Francisco Rosales 3 Ngoc-Thuy Ha 2 Thomas Kneib

More information

Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013

Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013 Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 013 1 Estimation of Var(A) and Breeding Values in General Pedigrees The classic

More information

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013 Lecture 28: BLUP and Genomic Selection Bruce Walsh lecture notes Synbreed course version 11 July 2013 1 BLUP Selection The idea behind BLUP selection is very straightforward: An appropriate mixed-model

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

Package BLR. February 19, 2015

Version 1.4. Date 2014-12-03. Title: Bayesian Linear Regression. Author: Gustavo de los Campos, Paulino Perez Rodriguez. Maintainer: Paulino Perez Rodriguez

Lecture 8 Genomic Selection

Guilherme J. M. Rosa, University of Wisconsin-Madison. Mixed Models in Quantitative Genetics, SISG, Seattle, 18-20 September 2018. OUTLINE: Marker Assisted Selection, Genomic Selection

Package BGGE. August 10, 2018

Title: Bayesian Genomic Linear Models Applied to GE Genome Selection. Version 0.6.5. Date 2018-08-10. Description: Application of genome prediction for a continuous variable, focused

Genomewide Selection in Oil Palm: Increasing Selection Gain per Unit Time and Cost with Small Populations

C.K. Wong, R. Bernardo. ABSTRACT: Oil palm (Elaeis guineensis Jacq.) requires 19 years per cycle

BAYESIAN GENOMIC PREDICTION WITH GENOTYPE ENVIRONMENT INTERACTION KERNEL MODELS. Universidad de Quintana Roo, Chetumal, Quintana Roo, México.

G3: Genes Genomes Genetics Early Online, published on October 28, 2016 as doi:10.1534/g3.116.035584. Jaime Cuevas, José

Package LBLGXE. July 20, 2015

Type: Package. Title: Bayesian Lasso for detecting Rare (or Common) Haplotype Association and their interactions with Environmental Covariates. Version 1.2. Date 2015-07-09

GWAS IV: Bayesian linear (variance component) models

Dr. Oliver Stegle, Christoph Lippert, Prof. Dr. Karsten Borgwardt. Max-Planck-Institutes Tübingen, Germany. Tübingen, Summer 2011

Lecture WS Evolutionary Genetics Part I 1

Quantitative genetics. Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

Package bwgr. October 5, 2018

Type: Package. Title: Bayesian Whole-Genome Regression. Version 1.5.6. Date 2018-10-05. Author: Alencar Xavier, William Muir, Shizhong Xu, Katy Rainey. Maintainer: Alencar Xavier

Principles of QTL Mapping. M.Imtiaz

Introduction; Definitions of terminology; Reasons for QTL mapping; Principles of QTL mapping; Requirements for QTL mapping; Demonstration with experimental data; Merit of QTL

GBLUP and G matrices 1

GBLUP from SNP-BLUP. We have defined breeding values as sum of SNP effects: $\mathbf{u} = \mathbf{Z}\mathbf{a}$. To refer breeding values to an average value of 0, we adopt the centered coding for genotypes described

Lecture 9 Multi-Trait Models, Binary and Count Traits

Guilherme J. M. Rosa, University of Wisconsin-Madison. Mixed Models in Quantitative Genetics, SISG, Seattle, 18-20 September 2018. OUTLINE: Multiple-trait

Quantitative genetics theory for genomic selection and efficiency of genotypic value prediction in open-pollinated populations

Scientia Agricola, http://dx.doi.org/0.590/678-99x-05-0479. José Marcelo Soriano

Genotype-Environment Effects Analysis Using Bayesian Networks

…, Alison Bentley and Ian Mackay. Affiliations: Department of Statistics (scutari@stats.ox.ac.uk); National Institute for Agricultural Botany (NIAB)

Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models

GENOMIC SELECTION. Jaime Cuevas, José Crossa, Osval A. Montesinos-López, Juan Burgueño, Paulino Pérez-Rodríguez, and

Estimation of Parameters in Random Effect Models with Incidence Matrix Uncertainty

Xia Shen and Lars Rönnegård. The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden; School

Software for genome-wide association studies having multivariate responses: Introducing MAGWAS

Chad C. Brown and Alison A. Motsinger-Reif. Department of Statistics; Bioinformatics Research Center

Mixed-Models. version 30 October 2011

Mixed models estimate a vector β of fixed effects and one (or more) vectors u of random effects. Both fixed and random effects models always include a vector

Selection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction

Sergio Pérez-Elizalde, Jaime Cuevas, Paulino Pérez-Rodríguez, and José Crossa. One of the most

Partitioning Genetic Variance

PSYC 510 (09/17/03). Here, mathematical models are developed for the computation of different types of genetic variance. Several substantive

New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype)

Kelly Swarts, PAG Allele Mining, 1/11/2014. Imputation is the projection

TASK 6.3 Modelling and data analysis support

Wheat and barley Legacy for Breeding Improvement, FP7 European Project. Task 6.3: How can statistical models contribute to pre-breeding? Daniela Bustos-Korts

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Timothy Thornton and Katie Kerr. Introduction to Quantitative

Large scale genomic prediction using singular value decomposition of the genotype matrix

Genetics Selection Evolution, RESEARCH ARTICLE, Open Access. https://doi.org/0.86/s27-08-0373-2. Jørgen Ødegård, Ulf

Mixed-Model Estimation of genetic variances. Bruce Walsh lecture notes Uppsala EQG 2012 course version 28 Jan 2012

Estimation of Var(A) and Breeding Values in General Pedigrees. The above designs (ANOVA, P-O

Supplementary Materials

A. Prior Densities Used in the BGLR R-Package. In this section we describe the prior distributions assigned to the location parameters, (β_j, u_l), entering in the linear predictor

Priors in whole-genome regression: the Bayesian alphabet returns

Genetics: Early Online, published on May, 3 as.534/genetics.3.5753. Daniel Gianola, Department of Animal Sciences, Department of Biostatistics

SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA

D. Pokrajac, Center for Information Science and Technology, Temple University, Philadelphia, Pennsylvania. A. Lazarevic, Computer

H = σ²_G / σ²_P heredity determined by genotype. degree of genetic determination. Nature vs. Nurture.

HCS825 Lecture 5, Spring 2002. Heritability. Last class we discussed heritability in the broad sense (H) and narrow sense heritability (h²). Heritability is a term that refers to the degree to which a

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Behavior Genetics, Vol. 34, No. 4, July 2004. Peter M. Visscher and Mike E. Goddard. Received 4 Sept. 2003, Final 28

Package brnn. January 26, 2016

Version 0.6. Date 2016-01-26. Title: Bayesian Regularization for Feed-Forward Neural Networks. Author: Paulino Perez Rodriguez, Daniel Gianola. Maintainer: Paulino Perez Rodriguez

Eiji Yamamoto, Hiroyoshi Iwata, Takanari Tanabata, Ritsuko Mizobuchi, Jun-ichi Yonemaru, Toshio Yamamoto and Masahiro Yano

Yamamoto et al. BMC Genetics 2014, 15:50. METHODOLOGY ARTICLE, Open Access. Effect of advanced intercrossing on genome structure and on the power to detect linked quantitative trait loci in a multi-parent

In animal and plant breeding, phenotypic selection indices

Published December 30 2015. RESEARCH. Statistical Sampling Properties of the Coefficients of Three Phenotypic Selection Indices. J. Jesus Cerón-Rojas, José Crossa, Jaime Sahagún-Castellanos. ABSTRACT: The aim

Selection Methods in Plant Breeding

2nd Edition, by Izak Bos, University of Wageningen, The Netherlands, and Peter Caligari, University of Talca, Chile. A C.I.P. Catalogue

Genotyping strategy and reference population

GS cattle workshop. Effect of size of reference group (Esa Mäntysaari, MTT). Effect of adding females to the reference population (Minna Koivula, MTT). Value of

MOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES

Patricia E Klein, Mandy Yan, Ellen Young, Jeekin Lau, Stella Kang, Natalie Patterson, Natalie Anderson and David Byrne. Department of Horticultural Sciences,

Introduction to QTL mapping in model organisms

Karl W Broman, Department of Biostatistics, Johns Hopkins University. kbroman@jhsph.edu, www.biostat.jhsph.edu/~kbroman. Outline: Experiments and data; Models; ANOVA

SUPPLEMENTARY INFORMATION

doi:10.1038/nature25973. Power Simulations. We performed extensive power simulations to demonstrate that the analyses carried out in our study are well powered. Our simulations indicate very high power for

Relevance Vector Machines for Earthquake Response Spectra

American Transactions on Engineering & Applied Sciences, http://tuengr.com/ateas

(Genome-wide) association analysis

Key concepts. Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

3. Properties of the relationship matrix

3.1 Partitioning of the relationship matrix. The additive relationship matrix, A, can be written as the product of a lower triangular matrix, T, a diagonal matrix,

Maize Genetics Cooperation Newsletter Vol 91 2017, Derkach

RELATIONSHIP BETWEEN MAIZE LANCASTER INBRED LINES ACCORDING TO SNP-ANALYSIS. Derkach K. V., Satarova T. M., Dzubetsky B. V., Borysova V. V., Cherchel

Washington Grain Commission Wheat and Barley Research Annual Progress Reports and Final Reports

PROJECT #: 30109-5345. Progress report year: 3 of 3. Title: Evaluation And Selection For Cold Tolerance In

Lecture 9. QTL Mapping 2: Outbred Populations

Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark. The major difference between QTL analysis using inbred-line crosses vs. outbred

Table of Contents. Multivariate methods. Introduction II. Introduction I

Introduction. Antti Penttilä, Department of Physics, University of Helsinki. Exactum summer school, 04. Construction of multinormal distribution. Test of multinormality with. Interpretation

Association studies and regression

CM226: Machine Learning for Bioinformatics, Fall 2016. Sriram Sankararaman. Acknowledgments: Fei Sha, Ameet Talwalkar. Administration

A. Motivation To motivate the analysis of variance framework, we consider the following example.

9.07 Introduction to Statistics for Brain and Cognitive Sciences, Emery N. Brown. Lecture 14: Analysis of Variance. Objectives: Understand analysis of variance as a special case of the linear model. Understand

C. M. Hernandez, J. Crossa, A. Castillo

THE AREA UNDER THE FUNCTION: AN INDEX FOR SELECTING DESIRABLE GENOTYPES. Universidad de Colima, Mexico. International Maize and Wheat Improvement Center

Genetics (patterns of inheritance)

MENDELIAN GENETICS: branch of biology that studies how genetic characteristics are inherited. Gregor Mendel, an Augustinian monk (1822-1884), was the first who systematically studied

1 Mixed effect models and longitudinal data analysis

Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

Package MACAU2. April 8, 2017

Type: Package. Title: MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.10. Date 2017-03-31. Author: Shiquan Sun, Jiaqiang Zhu, Xiang Zhou. Maintainer: Shiquan Sun

A mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding

Professur Pflanzenzüchtung. Jens Léon, 4 November 2014, Oulu Workshop

Evolution of quantitative traits

Introduction. Let's stop and review quickly where we've come and where we're going. We started our survey of quantitative genetics by pointing out that our objective was

EPISTASIS has long been recognized as an important component

GENETICS, GENOMIC SELECTION. Modeling Epistasis in Genomic Selection. Yong Jiang and Jochen C. Reif, Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben,

Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition

Mark Girolami, Department of Computing Science, University of Glasgow, girolami@dcs.gla.ac.uk. 1 Introduction

How robust are the predictions of the W-F Model?

As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

Genotype Imputation. Biostatistics 666

Previously: Hidden Markov Models for Relative Pairs; Linkage analysis using affected sibling pairs; Estimation of pairwise relationships; Identity-by-Descent; Relatives

An indirect approach to the extensive calculation of relationship coefficients

Genet. Sel. Evol. 34 (2002) 409-421. INRA, EDP Sciences, 2002. DOI: 10.1051/gse:2002015. Original article. Jean-Jacques COLLEAU

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to

SUPPLEMENTARY MATERIALS, B. BIVARIATE PEDIGREE-BASED ASSOCIATION ANALYSIS. Introduction. We propose here a statistical method of bivariate genetic analysis, designed to evaluate contribution

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

* Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington. Mingon Kang, Ph.D., Computer Science, Kennesaw State University. Problems

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

OEB 242 Exam Practice Problems Answer Key. First, recall

Quantitative Genetics I: Traits controlled by many loci

So far in our discussions, we have focused on understanding how selection works on a small number of loci (1 or 2). However in many cases, evolutionary

Computational Approaches to Statistical Genetics

GWAS I: Concepts and Probability Theory. Christoph Lippert, Dr. Oliver Stegle, Prof. Dr. Karsten Borgwardt. Max-Planck-Institutes Tübingen, Germany. Tübingen

Gaussian Mixture Model

Case Study: Document Retrieval. MAP EM, Latent Dirichlet Allocation, Gibbs Sampling. Machine Learning/Statistics for Big Data, CSE599C/STAT59, University of Washington. Emily Fox, February 5th,

BGLR: A Statistical Package for Whole-Genome Regression

Paulino Pérez Rodríguez, Socio Economía Estadística e Informática, Colegio de Postgraduados, México, perpdgo@colpos.mx. Gustavo de los Campos, Department

Regression Model In The Analysis Of Micro Array Data-Gene Expression Detection

Regression Model In The Analysis Of Micro Array Data-Gene Expression Detection Jamal Fathima.J.I 1 and P.Venkatesan 1. Research Scholar -Department of statistics National Institute For Research In Tuberculosis, Indian Council For Medical Research,Chennai,India,.Department of statistics

More information

Calculation of IBD probabilities

David Evans and Stacey Cherny, University of Oxford, Wellcome Trust Centre for Human Genetics. This Session: IBD vs IBS; Why is IBD important?; Calculating IBD probabilities

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall 2016. Sriram Sankararaman. Acknowledgments: Fei Sha, Ameet Talwalkar

Previous two lectures: Linear and logistic

Wheat Genetics and Molecular Genetics: Past and Future. Graham Moore

1960s onwards: Wheat traits genetically dissected. Chromosome pairing and exchange (Ph1); Height (Rht); Vernalisation (Vrn1); Photoperiodism

Prediction and Validation of Three Cross Hybrids in Maize (Zea mays L.)

International Journal of Current Microbiology and Applied Sciences, ISSN: 2319-7706, Volume 7 Number 01 (2018). Journal homepage: http://www.ijcmas.com. Original Research Article. https://doi.org/10.20546/ijcmas.2018.701.183

Linear Models for the Prediction of Animal Breeding Values

R.A. Mrode, PhD, Animal Data Centre, Fox Talbot House, Greenways Business Park, Bellinger Close, Chippenham, Wilts, UK. CAB INTERNATIONAL. Preface

R-squared for Bayesian regression models

Andrew Gelman, Ben Goodrich, Jonah Gabry, Imad Ali. 8 Nov 2017. Abstract: The usual definition of R² (variance of the predicted values divided by the variance of the

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

BTRY 4830/6830; PBSB.5201.01. Lecture 16: Population structure and logistic regression I. Jason Mezey, jgm45@cornell.edu. April 11, 2017 (T) 8:40-9:55. Announcements

Quantitative Trait Variation

Variation in phenotype. In addition to understanding genetic variation within at-risk systems, phenotype variation is also important. Reproductive fitness traits related to

VARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP)

V.K. Bhatia, I.A.S.R.I., Library Avenue, New Delhi-110012, vkbhatia@iasri.res.in. Introduction. Variance components are commonly used

Pooling multiple imputations when the sample happens to be the population.

Gerko Vink and Stef van Buuren. arXiv:1409.8542v1 [math.ST] 30 Sep 2014. Department of Methodology and Statistics, Utrecht

2. Map genetic distance between markers

Chapter 5. Linkage Analysis. Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,

Heinrich Grausgruber, Department of Crop Sciences, Division of Plant Breeding, Konrad-Lorenz-Str. 24, 3430 Tulln

957.321. Sources: Nespolo (2003); Le Rouzic et al. (2007). Zuchtmethodik & Quantitative Genetik (Breeding Methodology & Quantitative Genetics)

Multiple QTL mapping

Karl W Broman, Department of Biostatistics, Johns Hopkins University. www.biostat.jhsph.edu/~kbroman [Teaching, Miscellaneous lectures]. Why? Reduce residual variation = increased power

Introduction to QTL mapping in model organisms

Human vs mouse. Karl W Broman, Department of Biostatistics, Johns Hopkins University. www.biostat.jhsph.edu/~kbroman [Teaching, Miscellaneous lectures]. www.daviddeen.com

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Motivating example 1: Interested in factors related to the life expectancy (50 US states, 1969-71). Per

Power and sample size calculations for designing rare variant sequencing association studies.

Seunggeun Lee, Michael C. Wu, Tianxi Cai, Yun Li, Michael Boehnke and Xihong Lin. Department

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

Niharika Gauraha and Swapan Parui, Indian Statistical Institute. Abstract. We consider the problem of

Genetic Analysis for Heterotic Traits in Bread Wheat (Triticum aestivum L.) Using Six Parameters Model

International Journal of Current Microbiology and Applied Sciences, ISSN: 2319-7706, Volume 7 Number 06 (2018). Journal homepage: http://www.ijcmas.com. Original Research Article. https://doi.org/10.20546/ijcmas.2018.706.029

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

BTRY 4830/6830; PBSB.5201.01. Lecture 20: Epistasis and Alternative Tests in GWAS. Jason Mezey, jgm45@cornell.edu. April 16, 2016 (Th) 8:40-9:55. Announcements: None. Summary

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA. BU-1211-M, May 1993. ABSTRACT: The basic problem of variance components

Machine Learning - MT 2016. 4 & 5. Basis Expansion, Regularization, Validation

Varun Kanade, University of Oxford, October 19 & 24, 2016. Outline: Basis function expansion to capture non-linear relationships

Ch 11.Introduction to Genetics.Biology.Landis

Section 11-1 The Work of Gregor Mendel (pages 263-266). This section describes how Gregor Mendel studied the inheritance of traits in garden peas and what his conclusions were. Introduction (page 263)

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop 2008. GARRETT FITZMAURICE, Harvard University. LINEAR MIXED EFFECTS MODELS. Motivating Example: Influence of Menarche on Changes in Body Fat. Prospective

Linear Regression (1/1/17)

STA613/CBB540: Statistical methods in computational biology. Lecturer: Barbara Engelhardt. Scribe: Ethan Hada. 1. Linear regression. 1.1. Linear regression basics. Linear regression

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is

$L(y) = (2\pi)^{-.5N} |V|^{-.5} \exp\{-.5 (y - Xb)' V^{-1} (y - Xb)\}$. The log of the likelihood, say $L_1$, is $L_1 = -0.5[N \ln(2\pi) + \ln|V| + (y - Xb)'$