THE ABILITY TO PREDICT COMPLEX TRAITS from marker data


Published November, 2011

ORIGINAL RESEARCH

Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP

Jeffrey B. Endelman*

Abstract

Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.

Published in The Plant Genome 4: Published Nov doi: /plantgenome
Crop Science Society of America, 5585 Guilford Rd., Madison, WI USA. An open-access publication. All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Dep. of Crop and Soil Sciences, Washington State Univ., State Route 536, Mount Vernon, WA. Received 6 May 2011. *Corresponding author (j.endelman@gmail.com).

Abbreviations: $\theta_{REML}$, restricted maximum likelihood solution for θ; BLR, Bayesian Linear Regression; BLUP, best linear unbiased prediction; EXP, exponential model; GAUSS, Gaussian model; GEBV, genomic-estimated breeding value; LL, log-likelihood; ML, maximum likelihood; REML, restricted maximum likelihood; RR, ridge regression; $r_{pred}$, cross-validation accuracy; $r_{train}$, training population accuracy; SNP, single nucleotide polymorphism.

The Plant Genome, November 2011, Vol. 4, No. 3

THE ABILITY TO PREDICT COMPLEX TRAITS from marker data is becoming increasingly important in plant breeding (Bernardo, 2008). The earliest attempts, now over 20 years old, involved first identifying significant markers and then combining them in a multiple regression model (Lande and Thompson, 1990). The focus over the last decade has been on genomic selection methods, in which all markers are included in the prediction model (Bernardo and Yu, 2007; Heffner et al., 2009; Jannink et al., 2010). One of the first methods proposed for genomic selection was ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) in the context of mixed models (Whittaker et al., 2000; Meuwissen et al., 2001). The basic RR-BLUP model is

$$y = WGu + \varepsilon, \qquad u \sim N(0, I\sigma_u^2), \tag{1}$$

where $u$ is a vector of marker effects, $G$ is the genotype matrix (e.g., {aa, Aa, AA} = {−1, 0, 1} for biallelic single nucleotide polymorphisms (SNPs) under an additive model), and $W$ is the design matrix relating lines to observations ($y$). The BLUP solution for the marker effects can be written as either $\hat{u} = Z'(ZZ' + \lambda I)^{-1} y$ or $\hat{u} = (Z'Z + \lambda I)^{-1} Z'y$, where $Z = WG$ and the ridge parameter $\lambda = \sigma_e^2/\sigma_u^2$ is the ratio between the residual and marker variances (Searle et al., 2006). Compared with ordinary regression, for which the number of markers cannot exceed the number of observations, RR has no such limit and also has improved numerical stability when markers are highly correlated (Hoerl and Kennard, 2000).

There is a close connection between marker-based RR-BLUP (Eq. [1]) and kinship-BLUP, in which the performance of breeding lines is predicted based on their kinship to other germplasm (Bernardo, 1994; Piepho et al., 2008). The basic kinship-BLUP model is

$$y = Wg + \varepsilon, \qquad g \sim N(0, K\sigma_g^2), \tag{2}$$

where $g$ is a vector of genotypic values. In pedigree-based prediction of breeding values, $K$ is the additive relationship matrix $A$ derived from the coefficients of coancestry (Bernardo, 2010). These coefficients reflect the average behavior of alleles undergoing Mendelian segregation, but the actual segregation can be captured with the marker-based relationship matrix

$$K_{RR} = GG'. \tag{3}$$

Equation [3] has the property that, for random populations, its expected value is proportional to $A$ plus a constant (Habier et al., 2007); for this reason it has been called the realized (additive) relationship matrix. Another key property of $K_{RR}$ is that the genomic-estimated breeding values (GEBVs) it produces ($\hat{g}$ in Eq. [2]) are equivalent to those from the marker-based RR-BLUP approach ($G\hat{u}$ in Eq. [1]) (Hayes et al., 2009).

When using genomic selection to advance lines as varieties, it is not just the breeding (additive) value but the full genotypic value that is of interest (Piepho et al., 2008). Rather than modeling epistatic interactions directly, which is challenging because of the combinatorial complexity, an alternative approach is to capture them through an appropriate kernel function (Gianola and van Kaam, 2008; Piepho, 2009; de los Campos et al., 2010). The realized relationship model (Eq. [3]) is in fact a kernel in genotype space and can be written as $K_{ij} = \langle G_i, G_j \rangle$, where the angle brackets denote the inner (or dot) product between genotypes $i$ and $j$. In geometry the inner product measures the similarity of two vectors, so with the additive relationship model the genetic covariance between lines is proportional to their similarity in genotype space.
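The two algebraic claims in the preceding paragraphs, that the two forms of the ridge solution agree and that kinship-BLUP with $K_{RR} = GG'$ returns the same GEBV as the marker-based model, can be checked numerically. The following is a minimal base-R sketch on simulated genotypes (not the wheat or maize data). It assumes one observation per line (so $W = I$), a centered phenotype with no intercept, and a fixed, arbitrarily chosen ridge parameter `lambda`; in the rrBLUP package the variance ratio is estimated rather than fixed.

```r
# Toy check of the RR-BLUP / kinship-BLUP equivalence (simulated data).
set.seed(1)
n <- 30                                    # number of lines
m <- 100                                   # number of markers
G <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
y <- as.vector(G %*% rnorm(m, sd = 0.1) + rnorm(n))
y <- y - mean(y)                           # centered, so no intercept is needed
lambda <- 5                                # ASSUMPTION: fixed ridge parameter

Z <- G                                     # W = I, hence Z = WG = G

# Form 1: u = Z'(ZZ' + lambda I)^{-1} y   (solves an n x n system)
u1 <- crossprod(Z, solve(tcrossprod(Z) + lambda * diag(n), y))

# Form 2: u = (Z'Z + lambda I)^{-1} Z'y   (solves an m x m system)
u2 <- solve(crossprod(Z) + lambda * diag(m), crossprod(Z, y))

# Kinship-based GEBV with K = GG': g = K (K + lambda I)^{-1} y
K <- tcrossprod(G)
g_kin <- K %*% solve(K + lambda * diag(n), y)

# Marker-based GEBV: G u
g_marker <- G %*% u1

max(abs(u1 - u2))            # difference between the two ridge forms
max(abs(g_marker - g_kin))   # difference between the two GEBV vectors
```

Both differences are at the level of floating-point error. Form 1 is the cheaper choice when markers outnumber lines, which is the usual situation in genomic selection.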
This geometric formulation enables use of the so-called kernel trick in machine learning, which involves replacing the inner product in the original (genotype) space with an inner product in a more complex feature space, technically called a reproducing kernel Hilbert space (Schölkopf and Smola, 2002):

$$K_{ij} = K(G_i, G_j) = \langle \Phi(G_i), \Phi(G_j) \rangle. \tag{4}$$

Equation [4] means that the kernel function $K$, which takes the two genotypes as arguments and returns a single number, equals the inner product between the genotypes in a feature space defined by $\Phi$. Although one can construct kernels by first specifying $\Phi$ and then applying Eq. [4], this is unnecessary as the feature space is guaranteed to exist for any positive semidefinite kernel (Schölkopf and Smola, 2002). To calculate BLUPs that include nonadditive effects, it is sufficient to solve Eq. [2] with $K$ based on an appropriate kernel function (Gianola and van Kaam, 2008).

The objective of the present research was to develop an R package for genomic prediction based on a maximum likelihood (ML) or restricted maximum likelihood (REML) approach to ridge regression (RR) and other kernels. The result is rrBLUP (available at [verified 1 Nov. 2011]), which uses a fast spectral algorithm for mixed models with a single variance component besides the residual error (Kang et al., 2008). After demonstrating features of the software, the accuracy of its prediction methods is compared by cross-validation using structured populations of wheat (Triticum aestivum L.) (Crossa et al., 2010) and maize (Zea mays L.) (Yu et al., 2006).

MATERIALS AND METHODS

The wheat population consisted of 599 inbred lines genotyped at 1279 Diversity Array Technology (DArT) markers and was downloaded as part of the Bayesian Linear Regression (BLR) package for R, version 1. (Pérez et al., 2010). Single nucleotide polymorphism markers and phenotypic data for maize ear height, ear diameter, and male flowering time were downloaded from the TASSEL website (Bradbury et al., 2007).
For each of the ten maize chromosomes, the diploid marker data were phased and missing alleles imputed using the software BEAGLE, version (Browning and Browning, 2007). After removing monomorphic markers, 953 remained. The population size was 79 inbred lines, but due to missing phenotypic data only 76 lines were available for flowering time and 49 for ear diameter.

For each of the 179,101 unique crosses between the 599 wheat lines, the expected mean and standard deviation (SD) for the GEBV of the recombinant inbred progeny were calculated based on the predicted marker effects in environment 1. In the absence of a linkage map, markers were assumed to segregate independently, which is clearly an approximation. (With a linkage map the SD could be simulated more realistically.) If $p_k^+$ and $p_k^-$ denote the frequency of the +1 and −1 alleles, respectively, at locus $k$ in the parents, then the mean GEBV of the inbred progeny is

$$E[\hat{g}_i] = \sum_k E[G_{ik}]\,\hat{u}_k = \sum_k (p_k^+ - p_k^-)\,\hat{u}_k,$$

and the variance (neglecting uncertainty in the marker effects) is

$$\mathrm{Var}[\hat{g}_i] = \sum_k \mathrm{Var}[G_{ik}]\,\hat{u}_k^2 = \sum_k \left(E[G_{ik}^2] - E[G_{ik}]^2\right)\hat{u}_k^2 = \sum_k \left[1 - (p_k^+ - p_k^-)^2\right]\hat{u}_k^2.$$

Bayesian LASSO predictions were made with the BLR package for R, version 1., and hyperparameters were chosen based on the guidelines of Pérez et al. (2010). For the prior distribution of the residual variance, the degrees of freedom was $df_\varepsilon = 3$ and the scale was $S_\varepsilon = (\mathrm{Var}[y]/2)(2 + df_\varepsilon)$, where $\mathrm{Var}[y]$ is the variance of the training data. The prior distribution for the LASSO shrinkage parameter had mode $\left(2\sum_k \overline{G_k^2}\right)^{1/2}$, where $\overline{G_k^2}$ is the average over the training data and the sum is over markers. The rate and shape hyperparameters were $10^{-5}$ and 0.5, respectively. A total of 10,000 iterations was used, with a burn-in period of 2000 iterations.

Statistical analysis of the cross-validation results was conducted with SAS PROC GLM (SAS Institute, 1994), with partition and method as fixed effects. The REGWQ option was used to control the strong familywise error rate (the probability of false discovery) at 0.05.

RESULTS AND DISCUSSION

Marker vs. Kinship-Based Prediction

At the core of the rrBLUP package is the function mixed.solve, which solves any mixed model of the form

$$y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, K\sigma_u^2), \tag{5}$$

where $X$ is a full-rank design matrix for the fixed effects $\beta$, $Z$ is the design matrix for the random effects $u$, $K$ is a positive semidefinite matrix, and the residuals are normal with constant variance. Variance components are estimated by either ML or REML (default) using the spectral decomposition algorithm of Kang et al. (2008). The R function returns the variance components, the maximized log-likelihood (LL), the ML estimate for $\beta$, and the BLUP solution for $u$.

It was stated in the introduction that when the realized relationship matrix $GG'$ is used, the marker-based (Eq. [1]) and kinship-based (Eq. [2]) formulations of the prediction problem give equivalent GEBV. This can be verified numerically using mixed.solve and a set of 599 wheat lines from the BLR package for R (Pérez et al., 2010). The BLR variable Y contains the two-year average grain yield in four environments (standardized to zero mean and unit variance), and the genotype matrix is coded as {0,1} in the variable X.
To be consistent with the notation in this article, the genotypes were recoded as {−1,1} in G:

    library(rrBLUP)                      # load rrBLUP
    library(BLR)                         # load BLR
    data(wheat)                          # load wheat data
    G <- 2*X - 1                         # recode genotypes from {0,1} to {-1,1}
    y <- Y[,1]                           # yields from E1
    # marker-based
    ans1 <- mixed.solve(y=y, Z=G)
    # kinship-based
    K <- tcrossprod(G)                   # K = GG'
    I <- diag(599)
    ans2 <- mixed.solve(y=y, Z=I, K=K)
    # compare GEBV
    cor(G %*% ans1$u, ans2$u)            # equals 1

In the first call to mixed.solve the design matrix equals the genotype matrix, so the random effects are the marker effects. In this case K is an identity matrix, which the software assumes because no K variable is provided. When no design matrix for fixed effects is provided, as in this example, an intercept term is automatically included. In the second call to mixed.solve, an identity matrix is used for Z and the realized relationship matrix GG' is used for K. In this case the random effects are the breeding values, which in the last line of code are compared with the GEBV from the marker-based model. As shown in the comments, the correlation is exactly 1. Each of the two calls to mixed.solve took five seconds on a laptop computer with two gigabytes of memory, running R 2.13.1 (R Development Core Team, 2011).

Although the two approaches are equivalent for calculating GEBV, some analyses depend on knowing the marker effects. For example, when different lines are evaluated in different environments, even though a whole genotype × environment analysis is not possible, one can still study marker × environment interactions (Crossa et al., 2010). Another application is to design crosses in a breeding program (Bernardo et al., 2006; Zhong and Jannink, 2007). The expected mean for the progeny can be calculated as the mean of the parental GEBV, but the marker effects are needed to compute the variance of the population, which is important for genetic gain. To illustrate, each circle in Fig.
1 shows the expected mean (μ) and standard deviation (σ) for the GEBV of recombinant inbred lines from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the upper right corner of the figure are crosses between lines with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected. For a given selection intensity $i$, the mean of the selected population is $\mu_s = \mu + i\sigma$, which Zhong and Jannink (2007) called the superior progeny value. The superior progeny values for the crosses in Fig. 1 were calculated for selection intensities ranging from 1.4 (20% selected) to 2.7 (1% selected). The top nine crosses were conserved across this range and are listed in Table 1, with lines identified by their GEBV rank. Exactly one of the top two highest-GEBV lines was found in every pair, but the 1 × 2 cross does not appear because the two lines share 96% of their alleles and have an expected SD of

Table 1. Top nine wheat crosses based on superior progeny value (SPV) in environment 1.
Columns: Cross, Kinship, SPV 20%, SPV 1%, Mean GEBV, SD GEBV
1 4 0.57.61.54 1.971 0.07
1 5 0.57.60.5 1.970 0.07
1 3 0.69.56.
Line identifier equals the GEBV rank. Kinship is the fraction of shared alleles (identity by state). GEBV, genomic-estimated breeding value.

Figure 1. Analysis of line crosses. Each circle is the expected mean and standard deviation (SD) for the genomic-estimated breeding values (GEBVs) of the recombinant inbred progeny from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the top right of the figure are crosses between parents with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected.

Kernels with Epistatic Effects

At present there are two kernels other than RR in the rrBLUP package. One is the Gaussian model (GAUSS):

$$K_{ij} = \exp\left[-(D_{ij}/\theta)^2\right], \tag{6}$$

where

$$D_{ij} = \left[\frac{1}{4M}\sum_{k=1}^{M}(G_{ik} - G_{jk})^2\right]^{1/2} \tag{7}$$

is the Euclidean distance between genotypes $i$ and $j$, normalized to the interval [0,1]. The parameter θ is a scale parameter that influences how quickly the genetic covariance decays with distance. The other kernel is the exponential model (EXP): $K_{ij} = \exp(-D_{ij}/\theta)$.

These kernels are available through the rrBLUP function kinship.BLUP, which was designed to predict the genotypic values of one population based on the genotypes and phenotypes of a second, training population. To illustrate its use, consider again the 599 wheat lines from the BLR package, which have been randomly partitioned into 10 sets for use in 10-fold cross-validation (Pérez et al., 2010). The variable sets contains the partition number for each line. To predict the genotypic values of set 1 using the other nine sets as the training population, the R code is

    train <- which(sets != 1)            # training lines: sets 2 to 10
    pred <- which(sets == 1)             # prediction lines: set 1
    ans.RR <- kinship.BLUP(y=y[train], G.train=G[train,], G.pred=G[pred,])
    ans.GAUSS <- kinship.BLUP(y=y[train], G.train=G[train,], G.pred=G[pred,],
                              K.method="GAUSS")
    # accuracy with RR
    cor(ans.RR$g.pred, y[pred])
    # accuracy with GAUSS
    cor(ans.GAUSS$g.pred, y[pred])

In the first call to kinship.BLUP the kernel method is not specified, so by default the realized relationship model is used. The last two lines of code calculate the correlation ($r_{\hat{g}y}$) between the predicted genotypic value and observed phenotype for the prediction population, which measures the cross-validation accuracy of the prediction method.

Table 2 shows the accuracies of the two methods for all 10 sets in environments 1 and 2. The results demonstrate that the performance of GAUSS compared to RR depends on both the structure of the population and the phenotype. For 9 out of 10 sets in environment 1, the accuracy with GAUSS was higher than RR. The largest gap was for set 5, where the accuracy with RR was 0.34 vs with GAUSS. Across the 10 sets the mean accuracy with GAUSS was 0.58 vs. 0.51 for RR (p = by paired t-test). By contrast, in environment 2 there was no significant difference between the prediction methods (p = 0.).

To better understand these differences, Fig. 2 shows the log-likelihood (LL) (solid circles), training population accuracy ($r_{train}$) (dashed line), and cross-validation accuracy ($r_{pred}$) (open circles) as a function of the scale parameter θ (see Eq. [6]). The rrBLUP package uses REML (or ML) to identify the optimal scale parameter, and because the genotype distances have been normalized to the unit interval (Eq. [7]), this is also the essential range for θ. The two panels in Fig. 2 correspond to sets 5 and 6 in environment 1, which showed contrasting results in the RR vs. GAUSS comparison: for set 5 the accuracy with GAUSS was higher and vice versa for set 6 (see Table 2). In both cases the REML solution for θ ($\theta_{REML}$) was similar and $r_{train}$ approached 1 as θ decreased to zero. The crucial difference lies in $r_{pred}$. For set 5 $r_{pred}$ exhibited an interior maximum near $\theta_{REML}$, while for set 6 $r_{pred}$ was maximized at θ = 1 and declined steadily as θ decreased.

The significance of this observation for understanding Table 2 is that GAUSS behaves like RR when θ is large relative to D. This follows from the Taylor series expansion, $K = 1 - (D/\theta)^2 + \tfrac{1}{2}(D/\theta)^4 - \cdots$, and the fact that $[D^2]$ is equivalent to the additive model $GG'$ for inbred lines (Piepho, 2009).
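The relationship between the Gaussian kernel and the additive model can be made concrete with a small numerical sketch. The base-R code below uses simulated {−1,1} genotypes (not the wheat data) and hypothetical helper names (`gauss_kernel`, `exp_kernel`); it is not the package's implementation. It builds the normalized distance of Eq. [7] and the kernel of Eq. [6], then checks two facts: for {−1,1} coding, $D_{ij}^2 = 1/2 - (GG')_{ij}/(2M)$, so the squared distances are a linear function of the additive kernel; and for large θ the first-order Taylor term dominates.

```r
# Gaussian and exponential kernels on simulated inbred genotypes.
set.seed(2)
n <- 20                                   # number of lines
m <- 200                                  # number of markers (M in Eq. [7])
G <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)

# Eq. [7]: Euclidean distance between rows, scaled by sqrt(4M) so 0 <= D <= 1
D <- as.matrix(dist(G)) / sqrt(4 * m)

# Hypothetical helper names, written out for illustration only
gauss_kernel <- function(D, theta) exp(-(D / theta)^2)   # Eq. [6]
exp_kernel   <- function(D, theta) exp(-D / theta)       # EXP model

# Identity for {-1,1} coding: D^2 = 1/2 - GG'/(2M)
A <- tcrossprod(G)
id_err <- max(abs(D^2 - (0.5 - A / (2 * m))))

# Large theta: 1 - K is close to (D/theta)^2, i.e. linear in GG'
theta <- 10
K <- gauss_kernel(D, theta)
approx_err <- max(abs((1 - K) - (D / theta)^2))

range(D)      # distances fall in the unit interval
id_err        # floating-point error only
approx_err    # bounded by the next Taylor term, (D/theta)^4 / 2
```

Because adding a constant to a kernel and rescaling it are absorbed by the intercept and the variance component, the large-θ Gaussian kernel carries essentially the same information as $GG'$, which is why GAUSS approaches RR in that limit.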
As θ decreases, the epistatic interactions in the higher order terms (e.g., $D^4$) become more important. When $r_{pred}$ has an interior maximum near $\theta_{REML}$, as in set 5, GAUSS will have higher accuracy than RR. When $r_{pred}$ increases monotonically with θ, GAUSS will not have higher accuracy than RR; whether GAUSS is lower or equivalent depends on the shape of the LL profile. In the case of set 6, the LL profile peaked at $\theta_{REML}$ = 0.4, so RR had higher accuracy. For most sets in environment 2, both LL and $r_{pred}$ increased monotonically with θ (not shown), so GAUSS and RR were equivalent.

These phenomena are relevant to the question of whether GAUSS is prone to overfitting, which Piepho (2009) and Heslot et al. (2012) have raised as a concern. In both studies the residual error with GAUSS was much smaller than with RR, or equivalently the accuracy for the training population was nearly 1. This was also observed with the BLR wheat data, as shown by the dashed line in Fig. 2. To constitute overfitting, however, there must be a tradeoff between higher accuracy for the training set and lower accuracy for the validation set (Dietrich, 1995). The results in Heslot et al. (2012) and the present study show that such a tradeoff is rare provided the scale parameter is chosen properly. Overfitting was observed for set 6 in environment 1, but more typically $r_{pred}$ was either the same or higher with GAUSS compared to RR (see Table 2).

To investigate the matter further, a different data set (79 maize lines genotyped at 953 SNP markers) was analyzed with the rrBLUP package. The cross-validation accuracies for maize flowering time, ear height, and ear diameter are shown alongside the results for wheat grain yield in Table 3. For wheat grain yield, the accuracy with GAUSS was 6 to 7 percentage points higher than RR in every environment but environment 2 (similar to Crossa et al. [2010]). For all three maize traits there was no significant difference between GAUSS and RR, which provides additional evidence that overfitting (i.e., a loss in cross-validation accuracy) is not common with GAUSS. The results also suggest that most (perhaps all) of the genetic variation was additive for the maize traits.

Table 3 includes the cross-validation results with EXP, which was equivalent to GAUSS for all seven traits. Piepho (2009) also found little difference between these two models in his analysis of maize grain yield. Like GAUSS, EXP captures nonadditive effects but the structure of its feature space is different. For the limited plant breeding data analyzed thus far with the two methods, this difference appears to be of little consequence.

Table 2. Cross-validation accuracies ($r_{\hat{g}y}$) for wheat grain yield.
Columns: Set; Environment 1 (RR, GAUSS); Environment 2 (RR, GAUSS). Final row: Mean.
** Means significantly different at the 0.01 probability level in Environment 1.
Set: prediction set; the other nine sets were used for training. RR, ridge regression. GAUSS, Gaussian model.

Figure 2. Performance of the Gaussian model (GAUSS). The figure depicts the effect of the Gaussian scale parameter (θ in Eq. [6]) on the restricted log-likelihood (LL), the training population accuracy ($r_{train}$), and the cross-validation accuracy ($r_{pred}$) when predicting sets 5 or 6 in environment 1. For set 5 the restricted maximum likelihood solution for θ ($\theta_{REML}$) = 0.5, and for set 6 $\theta_{REML}$ = 0.4. In both cases $r_{train}$ approached 1 as θ → 0, but the trends for $r_{pred}$ were different. For set 5 $r_{pred}$ exhibited an interior maximum near $\theta_{REML}$, while for set 6 $r_{pred}$ increased monotonically with θ. Because GAUSS is approximately ridge regression (RR) when θ is large, the contrasting behavior in this figure illustrates why GAUSS had higher $r_{pred}$ than RR for set 5 but vice versa for set 6 (see Table 2).
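The limiting behavior of $r_{train}$ noted in the Fig. 2 caption is a property of the kernel itself rather than of the wheat data, which is why a high training accuracy alone does not establish overfitting. The base-R sketch below illustrates this on simulated data. It is an assumption-laden toy, not a reproduction of Fig. 2: the variance ratio `lambda` is held fixed instead of being re-estimated by REML at each θ, and the function name `r_train` is hypothetical. As θ → 0 the off-diagonal elements of K vanish, K approaches the identity, and the BLUP becomes $y/(1+\lambda)$, which is perfectly correlated with y.

```r
# Training accuracy of a Gaussian-kernel BLUP as a function of theta
# (simulated data; lambda is fixed for illustration).
set.seed(3)
n <- 40                                   # number of training lines
m <- 200                                  # number of markers
G <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
y <- as.vector(scale(G %*% rnorm(m, sd = 0.1) + rnorm(n)))

D <- as.matrix(dist(G)) / sqrt(4 * m)     # normalized distance, Eq. [7]
lambda <- 1                               # ASSUMPTION: fixed variance ratio

# Hypothetical helper: correlation between fitted values and phenotypes
r_train <- function(theta) {
  K <- exp(-(D / theta)^2)                # Gaussian kernel, Eq. [6]
  g_hat <- K %*% solve(K + lambda * diag(n), y)
  cor(as.vector(g_hat), y)
}

thetas <- c(0.01, 0.1, 0.5, 1)
acc <- sapply(thetas, r_train)
data.frame(theta = thetas, r_train = acc)
```

The value at the smallest θ is essentially 1 regardless of how much genetic signal the data contain, so the informative quantity is the cross-validation accuracy $r_{pred}$, as in the comparison of sets 5 and 6.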

Table 3. Tenfold cross-validation accuracy ($r_{\hat{g}y}$) for maize and wheat traits.

Method   Wheat yield 1   Wheat yield 2   Wheat yield 3   Wheat yield 4   Maize flowering time   Maize ear height   Maize ear diameter
GAUSS    0.58 a          0.49 a          0.45 a          0.54 a          0.73 a                 0.51 a             0.53 ab
EXP      0.57 a          0.49 a          0.45 a          0.54 a          0.73 a                 0.54 a             0.54 a
RR       0.51 b          0.48 a          0.38 b          0.48 b          0.73 a                 0.51 a             0.5 b
BL       0.51 b          0.48 a          0.38 b          0.47 b          0.73 a                 0.5 a              0.53 ab

GAUSS, Gaussian model; EXP, exponential model; RR, ridge regression; BL, Bayesian LASSO. Within each trait, accuracies with the same letter were not significantly different at the 0.05 probability level.

For the sake of comparison, Table 3 also shows the accuracy of the additive Bayesian LASSO model, which was equivalent to RR for all seven traits.

CONCLUSIONS

The objective of this research was to create software that makes ridge regression and other kernel methods accessible to plant breeders interested in genomic selection. At the core of the rrBLUP package is the function mixed.solve, which can be used to solve both the marker-based and kinship-based versions of the genomic prediction problem. The function kinship.BLUP provides a more intuitive interface for kinship-based prediction and includes several genetic models, including an additive relationship matrix and the nonadditive Gaussian kernel.

Acknowledgments

The author thanks Jean-Luc Jannink for his mentoring and helpful comments on the manuscript.

References

Bernardo, R. 1994. Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 34:0 5.
Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48:
Bernardo, R. 2010. Quantitative traits in plant breeding. Stemma Press, Woodbury, MN.
Bernardo, R., L. Moreau, and A. Charcosset. 2006. Number and fitness of selected individuals in marker-assisted and phenotypic recurrent selection. Crop Sci. 46:
Bernardo, R., and J. Yu. 2007. Prospects for genomewide selection for quantitative traits in maize. Crop Sci. 47:
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and E.S. Buckler. 2007. TASSEL: Software for association mapping of complex traits in diverse samples. Available at maizegenetics.net/tassel (verified 1 Nov. 2011). Bioinformatics 3:
Browning, S.R., and B.L. Browning. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81:
de los Campos, G., D. Gianola, G.J.M. Rosa, K.A. Weigel, and J. Crossa. 2010. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet. Res. Camb. 9:
Crossa, J., G. de los Campos, P. Pérez, D. Gianola, J. Burgueño, J.L. Araus, D. Makumbi, R.P. Singh, S. Dreisigacker, J. Yan, V. Arief, M. Banziger, and H.-J. Braun. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:
Dietrich, T. 1995. Overfitting and undercomputing in machine learning. ACM Comput. Surv. 7:
Gianola, D., and J.B. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178:
Habier, D., R.L. Fernando, and J.C.M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:
Hayes, B.J., P.M. Visscher, and M.E. Goddard. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. Camb. 91:
Heffner, E.L., M.E. Sorrells, and J.-L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1 1.
Heslot, N., H.-P. Yang, M.E. Sorrells, and J.-L. Jannink. 2012. Genomic selection in plant breeding: A comparison of models. Crop Sci. 5:
Hoerl, A.E., and R.W. Kennard. 2000. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 4:
Jannink, J.-L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Brief. Funct. Genomic 9:
Kang, H.M., N.A. Zaitlen, C.M. Wade, A. Kirby, D. Heckerman, M.J. Daly, and E. Eskin. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:
Lande, R., and R. Thompson. 1990. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 14:
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:
Pérez, P., G. de los Campos, J. Crossa, and D. Gianola. 2010. Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian Linear Regression package in R. Plant Gen. 3:
Piepho, H.P. 2009. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49:
Piepho, H.P., J. Möhring, A.E. Melchinger, and A. Büchse. 2008. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161:09 8.
R Development Core Team. 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
SAS Institute. 1994. SAS 9. for Windows. SAS Institute, Cary, NC.
Schölkopf, B., and A.J. Smola. 2002. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge, MA.
Searle, S.R., G. Casella, and C.E. McCulloch. 2006. Variance components. John Wiley & Sons, Hoboken, NJ.
Whittaker, J.C., R. Thompson, and M.C. Denham. 2000. Marker-assisted selection using ridge regression. Genet. Res. Camb. 75:49 5.
Yu, J., G. Pressoir, W.H. Briggs, I.V. Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38:
Zhong, S., and J.-L. Jannink. 2007. Using quantitative trait loci results to discriminate among crosses on the basis of their progeny mean and variance. Genetics 177:
