THE ABILITY TO PREDICT COMPLEX TRAITS from marker data
Published November, 2011

ORIGINAL RESEARCH

Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP

Jeffrey B. Endelman*

Abstract

Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.

Published in The Plant Genome 4: Published Nov doi: /plantgenome
Crop Science Society of America, 5585 Guilford Rd., Madison, WI USA
An open-access publication

All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Dep. of Crop and Soil Sciences, Washington State Univ., State Route 536, Mount Vernon, WA. Received 6 May 2011. *Corresponding author (j.endelman@gmail.com).

Abbreviations: $\theta_{REML}$, restricted maximum likelihood solution for θ; BLR, Bayesian Linear Regression; BLUP, best linear unbiased prediction; EXP, exponential model; GAUSS, Gaussian model; GEBV, genomic-estimated breeding value; LL, log-likelihood; ML, maximum likelihood; REML, restricted maximum likelihood; RR, ridge regression; $r_{pred}$, cross-validation accuracy; $r_{train}$, training population accuracy; SNP, single nucleotide polymorphism.

THE PLANT GENOME, NOVEMBER 2011, VOL. 4, NO. 3

THE ABILITY TO PREDICT COMPLEX TRAITS from marker data is becoming increasingly important in plant breeding (Bernardo, 2008). The earliest attempts, now over 20 years old, involved first identifying significant markers and then combining them in a multiple regression model (Lande and Thompson, 1990). The focus over the last decade has been on genomic selection methods, in which all markers are included in the prediction model (Bernardo and Yu, 2007; Heffner et al., 2009; Jannink et al., 2010).

One of the first methods proposed for genomic selection was ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) in the context of mixed models (Whittaker et al., 2000; Meuwissen et al., 2001). The basic RR-BLUP model is

$y = WGu + \varepsilon, \quad u \sim N(0, I\sigma_u^2)$, [1]

where u is a vector of marker effects, G is the genotype matrix (e.g., {aa, Aa, AA} = {−1, 0, 1} for biallelic single nucleotide polymorphisms (SNPs) under an additive model), and W is the design matrix relating lines to observations (y). The BLUP solution for the marker effects can be written as either $\hat{u} = Z'(ZZ' + \lambda I)^{-1} y$ or $\hat{u} = (Z'Z + \lambda I)^{-1} Z'y$, where Z = WG and the ridge parameter $\lambda = \sigma_e^2/\sigma_u^2$ is the ratio between the residual and marker variances (Searle et al., 2006). Compared with ordinary regression, for which the number of markers cannot exceed the number of observations, RR has no such limit and also has improved numerical stability when markers are highly correlated (Hoerl and Kennard, 2000).

There is a close connection between marker-based RR-BLUP (Eq. [1]) and kinship-BLUP, in which the performance of breeding lines is predicted based on their kinship to other germplasm (Bernardo, 1994; Piepho et al., 2008). The basic kinship-BLUP model is

$y = Wg + \varepsilon, \quad g \sim N(0, K\sigma_g^2)$, [2]

where g is a vector of genotypic values. In pedigree-based prediction of breeding values, K is the additive relationship matrix A derived from the coefficients of coancestry (Bernardo, 2010). These coefficients reflect the average behavior of alleles undergoing Mendelian segregation, but the actual segregation can be captured with the marker-based relationship matrix

$K_{RR} = GG'$. [3]

Equation [3] has the property that, for random populations, its expected value is proportional to A plus a constant (Habier et al., 2007); for this reason it has been called the realized (additive) relationship matrix. Another key property of $K_{RR}$ is that the genomic-estimated breeding values (GEBVs) it produces ($\hat{g}$ in Eq. [2]) are equivalent to those from the marker-based RR-BLUP approach ($G\hat{u}$ in Eq. [1]) (Hayes et al., 2009).

When using genomic selection to advance lines as varieties, it is not just the breeding (additive) value but the full genotypic value that is of interest (Piepho et al., 2008). Rather than modeling epistatic interactions directly, which is challenging because of the combinatorial complexity, an alternative approach is to capture them through an appropriate kernel function (Gianola and van Kaam, 2008; Piepho, 2009; de los Campos et al., 2010). The realized relationship model (Eq. [3]) is in fact a kernel in genotype space and can be written as $K_{ij} = \langle G_i, G_j \rangle$, where the angle brackets denote the inner (or dot) product between genotypes i and j. In geometry the inner product measures the similarity of two vectors, so with the additive relationship model the genetic covariance between lines is proportional to their similarity in genotype space.
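The two expressions for $\hat{u}$ given above are algebraically identical, because $(Z'Z + \lambda I)Z' = Z'(ZZ' + \lambda I)$. The first form solves a system whose size is the number of observations, the second a system whose size is the number of markers. As a hedged illustration only (simulated genotypes, no fixed effects, an arbitrary fixed λ rather than an estimated one, and hypothetical variable names; this is not code from rrBLUP), the identity can be checked in base R:

```r
# Simulated data: n lines, m markers coded {-1, 1}; all values are illustrative.
set.seed(1)
n <- 20
m <- 50
Z <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
y <- rnorm(n)
lambda <- 2.5  # assumed known here; in practice it is estimated (e.g., by REML)

# Form 1: u = Z'(ZZ' + lambda I)^{-1} y, an n x n system
u_n_by_n <- t(Z) %*% solve(Z %*% t(Z) + lambda * diag(n), y)

# Form 2: u = (Z'Z + lambda I)^{-1} Z'y, an m x m system
u_m_by_m <- solve(t(Z) %*% Z + lambda * diag(m), t(Z) %*% y)

# Largest absolute difference between the two solutions
max_diff <- max(abs(u_n_by_n - u_m_by_m))
print(max_diff)
```

When markers far outnumber lines, as is typical in genomic selection, the first form is the cheaper one to solve.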
This geometric formulation enables use of the so-called kernel trick in machine learning, which involves replacing the inner product in the original (genotype) space with an inner product in a more complex feature space, technically called a reproducing kernel Hilbert space (Schölkopf and Smola, 2002):

$K_{ij} = K(G_i, G_j) = \langle \Phi(G_i), \Phi(G_j) \rangle$. [4]

Equation [4] means that the kernel function K, which takes the two genotypes as arguments and returns a single number, equals the inner product between the genotypes in a feature space defined by Φ. Although one can construct kernels by first specifying Φ and then applying Eq. [4], this is unnecessary as the feature space is guaranteed to exist for any positive semidefinite kernel (Schölkopf and Smola, 2002). To calculate BLUPs that include nonadditive effects, it is sufficient to solve Eq. [2] with K based on an appropriate kernel function (Gianola and van Kaam, 2008).

The objective of the present research was to develop an R package for genomic prediction based on a maximum likelihood (ML) or restricted maximum likelihood (REML) approach to ridge regression (RR) and other kernels. The result is rrBLUP (available at [verified 1 Nov. 2011]), which uses a fast spectral algorithm for mixed models with a single variance component besides the residual error (Kang et al., 2008). After demonstrating features of the software, the accuracy of its prediction methods is compared by cross-validation using structured populations of wheat (Triticum aestivum L.) (Crossa et al., 2010) and maize (Zea mays L.) (Yu et al., 2006).

MATERIALS AND METHODS

The wheat population consisted of 599 inbred lines genotyped at 1279 Diversity Array Technology (DArT) markers and was downloaded as part of the Bayesian Linear Regression (BLR) package for R, version 1.2 (Pérez et al., 2010). Single nucleotide polymorphism markers and phenotypic data for maize ear height, ear diameter, and male flowering time were downloaded from the TASSEL website (Bradbury et al., 2007).
For each of the ten maize chromosomes, the diploid marker data were phased and missing alleles imputed using the software BEAGLE, version (Browning and Browning, 2007). After removing monomorphic markers, 953 remained. The population size was 79 inbred lines, but due to missing phenotypic data only 76 lines were available for flowering time and 49 for ear diameter.

For each of the 179,101 unique crosses between the 599 wheat lines, the expected mean and standard deviation (SD) for the GEBV of the recombinant inbred progeny were calculated based on the predicted marker effects in environment 1. In the absence of a linkage map, markers were assumed to segregate independently, which is clearly an approximation. (With a linkage map the SD could be simulated more realistically.) If $p_k^+$ and $p_k^-$ denote the frequency of the +1 and −1 alleles, respectively, at locus k in the parents, then the mean GEBV of the inbred progeny is

$E[\hat{g}_i] = E[G_i]\hat{u} = \sum_k (p_k^+ - p_k^-)\hat{u}_k$,

and the variance (neglecting uncertainty in the marker effects) is

$\mathrm{Var}[\hat{g}_i] = \mathrm{Var}[G_i\hat{u}] = \sum_k \left(E[G_{ik}^2] - E[G_{ik}]^2\right)\hat{u}_k^2 = \sum_k \left[1 - (p_k^+ - p_k^-)^2\right]\hat{u}_k^2$.

Bayesian LASSO predictions were made with the BLR package for R, version 1.2, and hyperparameters were chosen based on the guidelines of Pérez et al. (2010). For the prior distribution of the residual variance, the degrees of freedom was $df_\varepsilon = 3$ and the scale was $S_\varepsilon = (\mathrm{Var}[y]/2)(2 + df_\varepsilon)$, where Var[y] is the variance of the training data. The prior distribution for the LASSO shrinkage parameter had mode $\left(2\sum_k \overline{G_k^2}\right)^{1/2}$, where $\overline{G_k^2}$ is the average over the training data and the sum is over markers. The rate and shape hyperparameters were $10^{-5}$ and 0.5, respectively. A total of 10,000 iterations was used, with a burn-in period of 2000 iterations.

Statistical analysis of the cross-validation results was conducted with SAS PROC GLM (SAS Institute, 1994), with partition and method as fixed effects. The REGWQ option was used to control the strong familywise error rate (the probability of false discovery) at 0.05.

RESULTS AND DISCUSSION

Marker vs. Kinship-Based Prediction

At the core of the rrBLUP package is the function mixed.solve, which solves any mixed model of the form

$y = X\beta + Zu + \varepsilon, \quad u \sim N(0, K\sigma_u^2)$, [5]

where X is a full-rank design matrix for the fixed effects β, Z is the design matrix for the random effects u, K is a positive semidefinite matrix, and the residuals are normal with constant variance. Variance components are estimated by either ML or REML (default) using the spectral decomposition algorithm of Kang et al. (2008). The R function returns the variance components, the maximized log-likelihood (LL), the ML estimate for β, and the BLUP solution for u.

It was stated in the introduction that when the realized relationship matrix GG′ is used, the marker-based (Eq. [1]) and kinship-based (Eq. [2]) formulations of the prediction problem give equivalent GEBV. This can be verified numerically using mixed.solve and a set of 599 wheat lines from the BLR package for R (Pérez et al., 2010). The BLR variable Y contains the two-year average grain yield in four environments (standardized to zero mean and unit variance), and the genotype matrix is coded as {0,1} in the variable X.
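Before turning to the wheat data, the structure of Eq. [5] can be illustrated on simulated data. The sketch below is not the spectral REML algorithm that mixed.solve uses: it assumes the variance ratio $\lambda = \sigma_e^2/\sigma_u^2$ is already known, and the function name `blup_known_lambda` and all simulated values are hypothetical. Under that assumption, $\hat{\beta}$ is the generalized least squares estimate and the BLUP is $\hat{u} = KZ'H^{-1}(y - X\hat{\beta})$ with $H = ZKZ' + \lambda I$.

```r
# BLUP for y = X beta + Z u + e, u ~ N(0, K sigma_u^2), e ~ N(0, I sigma_e^2),
# assuming lambda = sigma_e^2 / sigma_u^2 is known (illustrative helper, not rrBLUP).
blup_known_lambda <- function(y, Z, K, lambda, X = matrix(1, length(y), 1)) {
  H <- Z %*% K %*% t(Z) + lambda * diag(length(y))          # Var(y) / sigma_u^2
  Hinv <- solve(H)
  beta <- solve(t(X) %*% Hinv %*% X, t(X) %*% Hinv %*% y)   # GLS estimate of beta
  u <- K %*% t(Z) %*% Hinv %*% (y - X %*% beta)             # BLUP of u
  list(beta = drop(beta), u = drop(u))
}

# Simulated inbred lines coded {-1, 1}; the intercept is the only fixed effect.
set.seed(42)
n <- 30
m <- 80
G <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
y <- 5 + drop(G %*% rnorm(m, sd = 0.1)) + rnorm(n, sd = 0.5)
lambda <- 25  # 0.5^2 / 0.1^2, the ratio used in the simulation

# Marker-based: Z = G, K = I, so u holds marker effects.
marker <- blup_known_lambda(y, Z = G, K = diag(m), lambda = lambda)
# Kinship-based: Z = I, K = GG', so u holds breeding values.
kinship <- blup_known_lambda(y, Z = diag(n), K = tcrossprod(G), lambda = lambda)

gebv_marker <- drop(G %*% marker$u)
gebv_kinship <- kinship$u
print(max(abs(gebv_marker - gebv_kinship)))
```

Both parameterizations share the same H, which is why the two sets of GEBV agree to numerical precision for any fixed λ.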
To be consistent with the notation in this article, the genotypes were recoded as {−1,1} in G:

    library(rrBLUP)  #load rrBLUP
    library(BLR)     #load BLR
    data(wheat)      #load wheat data
    G <- 2*X-1       #recode genotypes
    y <- Y[,1]       #yields from E1
    #marker-based
    ans1 <- mixed.solve(y=y,Z=G)
    #kinship-based
    K <- tcrossprod(G)  #K = GG'
    I <- diag(599)
    ans2 <- mixed.solve(y=y,Z=I,K=K)
    #Compare GEBV
    cor(G%*%ans1$u, ans2$u)  #equals 1

In the first call to mixed.solve the design matrix equals the genotype matrix, so the random effects are the marker effects. In this case K is an identity matrix, which the software assumes because no K variable is provided. When no design matrix for fixed effects is provided, as in this example, an intercept term is automatically included. In the second call to mixed.solve, an identity matrix is used for Z and the realized relationship matrix GG′ is used for K. In this case the random effects are the breeding values, which in the last line of code are compared with the GEBV from the marker-based model. As shown in the comments, the correlation is exactly 1. Each of the two calls to mixed.solve took five seconds on a laptop computer with two gigabytes of memory, running R 2.13.1 (R Development Core Team, 2011).

Although the two approaches are equivalent for calculating GEBV, some analyses depend on knowing the marker effects. For example, when different lines are evaluated in different environments, even though a whole genotype × environment analysis is not possible, one can still study marker × environment interactions (Crossa et al., 2010). Another application is to design crosses in a breeding program (Bernardo et al., 2006; Zhong and Jannink, 2007). The expected mean for the progeny can be calculated as the mean of the parental GEBV, but the marker effects are needed to compute the variance of the population, which is important for genetic gain. To illustrate, each circle in Fig.
1 shows the expected mean (μ) and standard deviation (σ) for the GEBV of recombinant inbred lines from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the upper right corner of the figure are crosses between lines with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected. For a given selection intensity i, the mean of the selected population is $\mu_s = \mu + i\sigma$, which Zhong and Jannink (2007) called the superior progeny value. The superior progeny values for the crosses in Fig. 1 were calculated for selection intensities ranging from 1.4 (20% selected) to 2.7 (1% selected). The top nine crosses were conserved across this range and are listed in Table 1, with lines identified by their GEBV rank. Exactly one of the top two highest-GEBV lines was found in every pair, but the 1 × 2 cross does not appear because the two lines share 96% of their alleles and have an expected SD of …

Kernels with Epistatic Effects

At present there are two kernels other than RR in the rrBLUP package. One is the Gaussian model (GAUSS):

$K_{ij} = \exp\left[-(D_{ij}/\theta)^2\right]$, [6]

where

$D_{ij} = \left[\frac{1}{4M}\sum_{k=1}^{M}(G_{ik} - G_{jk})^2\right]^{1/2}$ [7]

is the Euclidean distance between genotypes i and j, normalized to the interval [0,1]. The parameter θ is a scale parameter that influences how quickly the genetic
covariance decays with distance. The other kernel is the exponential model (EXP): $K_{ij} = \exp(-D_{ij}/\theta)$.

Table 1. Top nine wheat crosses based on superior progeny value (SPV) in environment 1.
Columns: Cross; Kinship; SPV 20%; SPV 1%; Mean GEBV; SD GEBV.
Cross: line identifier equals the GEBV rank. Kinship: fraction of shared alleles (identity by state). GEBV, genomic-estimated breeding value.

Figure 1. Analysis of line crosses. Each circle is the expected mean and standard deviation (SD) for the genomic-estimated breeding values (GEBVs) of the recombinant inbred progeny from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the top right of the figure are crosses between parents with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected.

These kernels are available through the rrBLUP function kinship.BLUP, which was designed to predict the genotypic values of one population based on the genotypes and phenotypes of a second, training population. To illustrate its use, consider again the 599 wheat lines from the BLR package, which have been randomly partitioned into 10 sets for use in 10-fold cross-validation (Pérez et al., 2010). The variable sets contains the partition number for each line. To predict the genotypic values of set 1 using the other nine sets as the training population, the R code is

    train <- which(sets!=1)
    pred <- which(sets==1)
    ans.RR <- kinship.BLUP(y=y[train],
      G.train=G[train,],G.pred=G[pred,])
    ans.GAUSS <- kinship.BLUP(y=y[train],
      G.train=G[train,],G.pred=G[pred,],
      K.method="GAUSS")
    #accuracy with RR
    cor(ans.RR$g.pred,y[pred])
    #accuracy with GAUSS
    cor(ans.GAUSS$g.pred,y[pred])

In the first call to kinship.BLUP the kernel method is not specified, so by default the realized relationship model is used. The last two lines of code calculate the correlation ($r_{\hat{g}y}$) between the predicted genotypic value and observed phenotype for the prediction population, which measures the cross-validation accuracy of the prediction method.

Table 2 shows the accuracies of the two methods for all 10 sets in environments 1 and 2. The results demonstrate that the performance of GAUSS compared to RR depends on both the structure of the population and the phenotype. For 9 out of 10 sets in environment 1, the accuracy with GAUSS was higher than RR. The largest gap was for set 5, where the accuracy with RR was 0.34 vs. … with GAUSS. Across the 10 sets the mean accuracy with GAUSS was 0.58 vs. 0.51 for RR (p = … by paired t-test). By contrast, in environment 2 there was no significant difference between the prediction methods (p = 0.2).

To better understand these differences, Fig. 2 shows the log-likelihood (LL) (solid circles), training population accuracy ($r_{train}$) (dashed line), and cross-validation accuracy ($r_{pred}$) (open circles) as a function of the scale parameter θ (see Eq. [6]). The rrBLUP package uses REML (or ML) to identify the optimal scale parameter, and because the genotype distances have been normalized to the unit interval (Eq. [7]), this is also the essential range for θ. The two panels in Fig. 2 correspond to sets 5 and 6 in environment 1, which showed contrasting results in the RR vs. GAUSS comparison: for set 5 the accuracy with GAUSS was higher and vice versa for set 6 (see Table 2). In both cases the REML solution for θ ($\theta_{REML}$) was similar and $r_{train}$ approached 1 as θ decreased to zero. The crucial difference lies in $r_{pred}$. For set 5 $r_{pred}$ exhibited an interior maximum near $\theta_{REML}$, while for set 6 $r_{pred}$ was maximized at θ = 1 and declined steadily as θ decreased.

The significance of this observation for understanding Table 2 is that GAUSS behaves like RR when θ is large relative to D. This follows from the Taylor series expansion, $K_{ij} = 1 - (D_{ij}/\theta)^2 + \tfrac{1}{2}(D_{ij}/\theta)^4 + \cdots$, and the fact that $[-D_{ij}^2]$ is equivalent to the additive model GG′ for inbred lines (Piepho, 2009). As θ decreases, the epistatic interactions in the higher order terms (e.g., $D_{ij}^4$) become more important. When $r_{pred}$ has an interior maximum near $\theta_{REML}$, as in set 5, GAUSS will have higher accuracy than RR. When $r_{pred}$ increases monotonically with θ,
GAUSS will not have higher accuracy than RR; whether GAUSS is lower or equivalent depends on the shape of the LL profile. In the case of set 6, the LL profile peaked at $\theta_{REML}$ = 0.4, so RR had higher accuracy. For most sets in environment 2, both LL and $r_{pred}$ increased monotonically with θ (not shown), so GAUSS and RR were equivalent.

These phenomena are relevant to the question of whether GAUSS is prone to overfitting, which Piepho (2009) and Heslot et al. (2012) have raised as a concern. In both studies the residual error with GAUSS was much smaller than with RR, or equivalently the accuracy for the training population was nearly 1. This was also observed with the BLR wheat data, as shown by the dashed line in Fig. 2. To constitute overfitting, however, there must be a tradeoff between higher accuracy for the training set and lower accuracy for the validation set (Dietterich, 1995). The results in Heslot et al. (2012) and the present study show that such a tradeoff is rare provided the scale parameter is chosen properly. Overfitting was observed for set 6 in environment 1, but more typically $r_{pred}$ was either the same or higher with GAUSS compared to RR (see Table 2).

To investigate the matter further, a different data set (79 maize lines genotyped at 953 SNP markers) was analyzed with the rrBLUP package. The cross-validation accuracies for maize flowering time, ear height, and ear diameter are shown alongside the results for wheat grain yield in Table 3. For wheat grain yield, the accuracy with GAUSS was 6 to 7 percentage points higher than RR in every environment but environment 2 (similar to Crossa et al. [2010]). For all three maize traits there was no significant difference between GAUSS and RR, which provides additional evidence that overfitting (i.e., a loss in cross-validation accuracy) is not common with GAUSS. The results also suggest that most (perhaps all) of the genetic variation was additive for the maize traits.

Table 3 includes the cross-validation results with EXP, which was equivalent to GAUSS for all seven traits. Piepho (2009) also found little difference between these two models in his analysis of maize grain yield. Like GAUSS, EXP captures nonadditive effects but the structure of its feature space is different. For the limited plant breeding data analyzed thus far with the two methods, this difference appears to be of little consequence.

Table 2. Cross-validation accuracies ($r_{\hat{g}y}$) for wheat grain yield.
Columns: Set; Environment 1 (RR, GAUSS); Environment 2 (RR, GAUSS). Final row: Mean.
** Means significantly different at the 0.01 probability level in Environment 1. Set: prediction set; the other nine sets were used for training. RR, ridge regression. GAUSS, Gaussian model.

Figure 2. Performance of the Gaussian model (GAUSS). The figure depicts the effect of the Gaussian scale parameter (θ in Eq. [6]) on the restricted log-likelihood (LL), the training population accuracy ($r_{train}$), and the cross-validation accuracy ($r_{pred}$) when predicting sets 5 or 6 in environment 1. For set 5 the restricted maximum likelihood solution for θ ($\theta_{REML}$) = 0.5, and for set 6 $\theta_{REML}$ = 0.4. In both cases $r_{train}$ approached 1 as θ → 0, but the trends for $r_{pred}$ were different. For set 5 $r_{pred}$ exhibited an interior maximum near $\theta_{REML}$, while for set 6 $r_{pred}$ increased monotonically with θ. Because GAUSS is approximately ridge regression (RR) when θ is large, the contrasting behavior in this figure illustrates why GAUSS had higher $r_{pred}$ than RR for set 5 but vice versa for set 6 (see Table 2).
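The distance in Eq. [7], the Gaussian kernel in Eq. [6], and the exponential kernel can be sketched in base R as follows. This is an illustrative reimplementation on simulated inbred genotypes coded {−1, 1}, not the code inside rrBLUP; the function name `kernel_matrices` and the value of θ are hypothetical. The last lines check numerically that, for this coding, $D_{ij}^2 = \tfrac{1}{2}\left[1 - (GG')_{ij}/M\right]$, which is the reason GAUSS approaches the additive model when θ is large.

```r
# Normalized Euclidean distance (Eq. [7]) and the two kernels built from it.
# Assumes G is a lines x markers matrix coded {-1, 1} (inbred lines).
kernel_matrices <- function(G, theta) {
  M <- ncol(G)
  D <- as.matrix(dist(G)) / (2 * sqrt(M))   # sqrt(sum of squared differences / 4M)
  list(D = D,
       gauss = exp(-(D / theta)^2),         # Eq. [6]
       expo = exp(-D / theta))              # exponential model
}

set.seed(7)
G <- matrix(sample(c(-1, 1), 15 * 200, replace = TRUE), 15, 200)
k <- kernel_matrices(G, theta = 0.5)

# Distances fall in the unit interval
print(range(k$D))

# For {-1, 1} coding, D^2 is a linear function of the additive kernel GG'/M
additive <- tcrossprod(G) / ncol(G)
d2_gap <- max(abs(k$D^2 - (1 - additive) / 2))
print(d2_gap)
```

A matrix such as `k$gauss` could then be supplied as K in Eq. [5], with θ chosen by maximizing the (restricted) likelihood over the unit interval as described in the text.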
Table 3. Tenfold cross-validation accuracy ($r_{\hat{g}y}$) for maize and wheat traits.

    Method  Wheat    Wheat    Wheat    Wheat    Maize      Maize    Maize
            yield 1  yield 2  yield 3  yield 4  flowering  ear      ear
                                                time       height   diameter
    GAUSS   0.58 a   0.49 a   0.45 a   0.54 a   0.73 a     0.51 a   0.53 ab
    EXP     0.57 a   0.49 a   0.45 a   0.54 a   0.73 a     0.54 a   0.54 a
    RR      0.51 b   0.48 a   0.38 b   0.48 b   0.73 a     0.51 a   0.52 b
    BL      0.51 b   0.48 a   0.38 b   0.47 b   0.73 a     0.52 a   0.53 ab

GAUSS, Gaussian model; EXP, exponential model; RR, ridge regression; BL, Bayesian LASSO. Within each trait, accuracies with the same letter were not significantly different at the 0.05 probability level.

For the sake of comparison, Table 3 also shows the accuracy of the additive Bayesian LASSO model, which was equivalent to RR for all seven traits.

CONCLUSIONS

The objective of this research was to create software that makes ridge regression and other kernel methods accessible to plant breeders interested in genomic selection. At the core of the rrBLUP package is the function mixed.solve, which can be used to solve both the marker-based and kinship-based versions of the genomic prediction problem. The function kinship.BLUP provides a more intuitive interface for kinship-based prediction and includes several genetic models, including an additive relationship matrix and the nonadditive Gaussian kernel.

Acknowledgments

The author thanks Jean-Luc Jannink for his mentoring and helpful comments on the manuscript.

References

Bernardo, R. 1994. Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 34:20–25.
Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48.
Bernardo, R. 2010. Quantitative traits in plant breeding. Stemma Press, Woodbury, MN.
Bernardo, R., L. Moreau, and A. Charcosset. 2006. Number and fitness of selected individuals in marker-assisted and phenotypic recurrent selection. Crop Sci. 46.
Bernardo, R., and J. Yu. 2007. Prospects for genomewide selection for quantitative traits in maize. Crop Sci. 47.
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and E.S. Buckler. 2007. TASSEL: Software for association mapping of complex traits in diverse samples. Available at maizegenetics.net/tassel (verified 1 Nov. 2011). Bioinformatics 23.
Browning, S.R., and B.L. Browning. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81.
de los Campos, G., D. Gianola, G.J.M. Rosa, K.A. Weigel, and J. Crossa. 2010. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet. Res. Camb. 92.
Crossa, J., G. de los Campos, P. Pérez, D. Gianola, J. Burgueño, J.L. Araus, D. Makumbi, R.P. Singh, S. Dreisigacker, J. Yan, V. Arief, M. Banziger, and H.-J. Braun. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186.
Dietterich, T. 1995. Overfitting and undercomputing in machine learning. ACM Comput. Surv. 27.
Gianola, D., and J.B. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178.
Habier, D., R.L. Fernando, and J.C.M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177.
Hayes, B.J., P.M. Visscher, and M.E. Goddard. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. Camb. 91.
Heffner, E.L., M.E. Sorrells, and J.-L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1–12.
Heslot, N., H.-P. Yang, M.E. Sorrells, and J.-L. Jannink. 2012. Genomic selection in plant breeding: A comparison of models. Crop Sci. 52.
Hoerl, A.E., and R.W. Kennard. 2000. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 42.
Jannink, J.-L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Brief. Funct. Genomic 9.
Kang, H.M., N.A. Zaitlen, C.M. Wade, A. Kirby, D. Heckerman, M.J. Daly, and E. Eskin. 2008. Efficient control of population structure in model organism association mapping. Genetics 178.
Lande, R., and R. Thompson. 1990. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124.
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157.
Pérez, P., G. de los Campos, J. Crossa, and D. Gianola. 2010. Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian Linear Regression package in R. Plant Gen. 3.
Piepho, H.P. 2009. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49.
Piepho, H.P., J. Möhring, A.E. Melchinger, and A. Büchse. 2008. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161:209–228.
R Development Core Team. 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
SAS Institute. 1994. SAS 9.2 for Windows. SAS Institute, Cary, NC.
Schölkopf, B., and A.J. Smola. 2002. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge, MA.
Searle, S.R., G. Casella, and C.E. McCulloch. 2006. Variance components. John Wiley & Sons, Hoboken, NJ.
Whittaker, J.C., R. Thompson, and M.C. Denham. 2000. Marker-assisted selection using ridge regression. Genet. Res. Camb. 75:249–252.
Yu, J., G. Pressoir, W.H. Briggs, I.V. Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38.
Zhong, S., and J.-L. Jannink. 2007. Using quantitative trait loci results to discriminate among crosses on the basis of their progeny mean and variance. Genetics 177.
More informationQuantitative genetics theory for genomic selection and efficiency of genotypic value prediction in open-pollinated populations
4 Scientia Agricola http://dx.doi.org/0.590/678-99x-05-0479 Quantitative genetics theory for genomic selection and efficiency of genotypic value prediction in open-pollinated populations José Marcelo Soriano
More informationGenotype-Environment Effects Analysis Using Bayesian Networks
Genotype-Environment Effects Analysis Using Bayesian Networks 1, Alison Bentley 2 and Ian Mackay 2 1 scutari@stats.ox.ac.uk Department of Statistics 2 National Institute for Agricultural Botany (NIAB)
More informationBayesian Genomic Prediction with Genotype 3 Environment Interaction Kernel Models
GENOMIC SELECTION Bayesian Genomic Prediction with Genotype 3 Environment Interaction Kernel Models Jaime Cuevas,* José Crossa,,1 Osval A. Montesinos-López, Juan Burgueño, Paulino Pérez-Rodríguez, and
More informationEstimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty
Estimation of Parameters in Random Effect Models with Incidence Matrix Uncertainty Xia Shen 1,2 and Lars Rönnegård 2,3 1 The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden; 2 School
More informationSoftware for genome-wide association studies having multivariate responses: Introducing MAGWAS
Software for genome-wide association studies having multivariate responses: Introducing MAGWAS Chad C. Brown 1 and Alison A. Motsinger-Reif 1,2 1 Department of Statistics, 2 Bioinformatics Research Center
More informationMixed-Models. version 30 October 2011
Mixed-Models version 30 October 2011 Mixed models Mixed models estimate a vector! of fixed effects and one (or more) vectors u of random effects Both fixed and random effects models always include a vector
More informationSelection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction
Selection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction Sergio Pérez- Elizalde, Jaime Cuevas, Paulino Pérez- Rodríguez,and José Crossa One of the most
More informationPartitioning Genetic Variance
PSYC 510: Partitioning Genetic Variance (09/17/03) 1 Partitioning Genetic Variance Here, mathematical models are developed for the computation of different types of genetic variance. Several substantive
More informationNew imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype)
New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype) Kelly Swarts PAG Allele Mining 1/11/2014 Imputation is the projection
More informationTASK 6.3 Modelling and data analysis support
Wheat and barley Legacy for Breeding Improvement TASK 6.3 Modelling and data analysis support FP7 European Project Task 6.3: How can statistical models contribute to pre-breeding? Daniela Bustos-Korts
More informationAssociation Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5
Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative
More informationLarge scale genomic prediction using singular value decomposition of the genotype matrix
https://doi.org/0.86/s27-08-0373-2 Genetics Selection Evolution RESEARCH ARTICLE Open Access Large scale genomic prediction using singular value decomposition of the genotype matrix Jørgen Ødegård *, Ulf
More informationMixed-Model Estimation of genetic variances. Bruce Walsh lecture notes Uppsala EQG 2012 course version 28 Jan 2012
Mixed-Model Estimation of genetic variances Bruce Walsh lecture notes Uppsala EQG 01 course version 8 Jan 01 Estimation of Var(A) and Breeding Values in General Pedigrees The above designs (ANOVA, P-O
More informationSupplementary Materials
Supplementary Materials A Prior Densities Used in the BGLR R-Package In this section we describe the prior distributions assigned to the location parameters, (β j, u l ), entering in the linear predictor
More informationPriors in whole-genome regression: the Bayesian alphabet returns
Genetics: Early Online, published on May, 3 as.534/genetics.3.5753 Priors in whole-genome regression: the Bayesian alphabet returns 3 Daniel Gianola Department of Animal Sciences, Department of Biostatistics
More informationSPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA
SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA D. Pokrajac Center for Information Science and Technology Temple University Philadelphia, Pennsylvania A. Lazarevic Computer
More informationH = σ 2 G / σ 2 P heredity determined by genotype. degree of genetic determination. Nature vs. Nurture.
HCS825 Lecture 5, Spring 2002 Heritability Last class we discussed heritability in the broad sense (H) and narrow sense heritability (h 2 ). Heritability is a term that refers to the degree to which a
More informationPrediction of the Confidence Interval of Quantitative Trait Loci Location
Behavior Genetics, Vol. 34, No. 4, July 2004 ( 2004) Prediction of the Confidence Interval of Quantitative Trait Loci Location Peter M. Visscher 1,3 and Mike E. Goddard 2 Received 4 Sept. 2003 Final 28
More informationPackage brnn. R topics documented: January 26, Version 0.6 Date
Version 0.6 Date 2016-01-26 Package brnn January 26, 2016 Title Bayesian Regularization for Feed-Forward Neural Networks Author Paulino Perez Rodriguez, Daniel Gianola Maintainer Paulino Perez Rodriguez
More informationEiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6
Yamamoto et al. BMC Genetics 2014, 15:50 METHODOLOGY ARTICLE Open Access Effect of advanced intercrossing on genome structure and on the power to detect linked quantitative trait loci in a multi-parent
More informationIn animal and plant breeding, phenotypic selection indices
Published December 30 2015 RESEARCH Statistical Sampling Properties of the Coefficients of Three Phenotypic Selection Indices J. Jesus Cerón-Rojas José Crossa* Jaime Sahagún-Castellanos ABSTRACT The aim
More informationSelection Methods in Plant Breeding
Selection Methods in Plant Breeding Selection Methods in Plant Breeding 2nd Edition by Izak Bos University of Wageningen, The Netherlands and Peter Caligari University of Talca, Chile A C.I.P. Catalogue
More informationGenotyping strategy and reference population
GS cattle workshop Genotyping strategy and reference population Effect of size of reference group (Esa Mäntysaari, MTT) Effect of adding females to the reference population (Minna Koivula, MTT) Value of
More informationMOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES
MOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES Patricia E Klein, Mandy Yan, Ellen Young, Jeekin Lau, Stella Kang, Natalie Patterson, Natalie Anderson and David Byrne Department of Horticultural Sciences,
More informationIntroduction to QTL mapping in model organisms
Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature25973 Power Simulations We performed extensive power simulations to demonstrate that the analyses carried out in our study are well powered. Our simulations indicate very high power for
More informationRelevance Vector Machines for Earthquake Response Spectra
2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas
More information(Genome-wide) association analysis
(Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by
More information3. Properties of the relationship matrix
3. Properties of the relationship matrix 3.1 Partitioning of the relationship matrix The additive relationship matrix, A, can be written as the product of a lower triangular matrix, T, a diagonal matrix,
More informationMaize Genetics Cooperation Newsletter Vol Derkach 1
Maize Genetics Cooperation Newsletter Vol 91 2017 Derkach 1 RELATIONSHIP BETWEEN MAIZE LANCASTER INBRED LINES ACCORDING TO SNP-ANALYSIS Derkach K. V., Satarova T. M., Dzubetsky B. V., Borysova V. V., Cherchel
More informationWashington Grain Commission Wheat and Barley Research Annual Progress Reports and Final Reports
Washington Grain Commission Wheat and Barley Research Annual Progress Reports and Final Reports PROJECT #: 30109-5345 Progress report year: 3 of 3 Title: Evaluation And Selection For Cold Tolerance In
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationTable of Contents. Multivariate methods. Introduction II. Introduction I
Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationA. Motivation To motivate the analysis of variance framework, we consider the following example.
9.07 ntroduction to Statistics for Brain and Cognitive Sciences Emery N. Brown Lecture 14: Analysis of Variance. Objectives Understand analysis of variance as a special case of the linear model. Understand
More informationc. M. Hernandez, J. Crossa, A. castillo
THE AREA UNDER THE FUNCTION: AN INDEX FOR SELECTING DESIRABLE GENOTYPES 8 9 c. M. Hernandez, J. Crossa, A. castillo 0 8 9 0 Universidad de Colima, Mexico. International Maize and Wheat Improvement Center
More informationGenetics (patterns of inheritance)
MENDELIAN GENETICS branch of biology that studies how genetic characteristics are inherited MENDELIAN GENETICS Gregory Mendel, an Augustinian monk (1822-1884), was the first who systematically studied
More information1 Mixed effect models and longitudinal data analysis
1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between
More informationPackage MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.
Package MACAU2 April 8, 2017 Type Package Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data Version 1.10 Date 2017-03-31 Author Shiquan Sun, Jiaqiang Zhu, Xiang Zhou Maintainer Shiquan Sun
More informationA mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding
Professur Pflanzenzüchtung Professur Pflanzenzüchtung A mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding Jens Léon 4. November 2014, Oulu Workshop
More informationEvolution of quantitative traits
Evolution of quantitative traits Introduction Let s stop and review quickly where we ve come and where we re going We started our survey of quantitative genetics by pointing out that our objective was
More informationEPISTASIS has long been recognized as an important component
GENETICS GENOMIC SELECTION Modeling Epistasis in Genomic Selection Yong Jiang and Jochen C. Reif 1 Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben,
More informationBayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition
Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition Mar Girolami 1 Department of Computing Science University of Glasgow girolami@dcs.gla.ac.u 1 Introduction
More informationHow robust are the predictions of the W-F Model?
How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population
More informationGenotype Imputation. Biostatistics 666
Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives
More informationAn indirect approach to the extensive calculation of relationship coefficients
Genet. Sel. Evol. 34 (2002) 409 421 409 INRA, EDP Sciences, 2002 DOI: 10.1051/gse:2002015 Original article An indirect approach to the extensive calculation of relationship coefficients Jean-Jacques COLLEAU
More informationDNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to
1 1 1 1 1 1 1 1 0 SUPPLEMENTARY MATERIALS, B. BIVARIATE PEDIGREE-BASED ASSOCIATION ANALYSIS Introduction We propose here a statistical method of bivariate genetic analysis, designed to evaluate contribution
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More informationQ1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.
OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall
More informationQuantitative Genetics I: Traits controlled my many loci. Quantitative Genetics: Traits controlled my many loci
Quantitative Genetics: Traits controlled my many loci So far in our discussions, we have focused on understanding how selection works on a small number of loci (1 or 2). However in many cases, evolutionary
More informationComputational Approaches to Statistical Genetics
Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen
More informationGaussian Mixture Model
Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,
More informationBGLR: A Statistical Package for Whole-Genome Regression
BGLR: A Statistical Package for Whole-Genome Regression Paulino Pérez Rodríguez Socio Economía Estadística e Informática, Colegio de Postgraduados, México perpdgo@colpos.mx Gustavo de los Campos Department
More informationRegression Model In The Analysis Of Micro Array Data-Gene Expression Detection
Jamal Fathima.J.I 1 and P.Venkatesan 1. Research Scholar -Department of statistics National Institute For Research In Tuberculosis, Indian Council For Medical Research,Chennai,India,.Department of statistics
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationMultiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar
Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic
More informationWheat Genetics and Molecular Genetics: Past and Future. Graham Moore
Wheat Genetics and Molecular Genetics: Past and Future Graham Moore 1960s onwards Wheat traits genetically dissected Chromosome pairing and exchange (Ph1) Height (Rht) Vernalisation (Vrn1) Photoperiodism
More informationPrediction and Validation of Three Cross Hybrids in Maize (Zea mays L.)
International Journal of Current Microbiology and Applied Sciences ISSN: 2319-7706 Volume 7 Number 01 (2018) Journal homepage: http://www.ijcmas.com Original Research Article https://doi.org/10.20546/ijcmas.2018.701.183
More informationLinear Models for the Prediction of Animal Breeding Values
Linear Models for the Prediction of Animal Breeding Values R.A. Mrode, PhD Animal Data Centre Fox Talbot House Greenways Business Park Bellinger Close Chippenham Wilts, UK CAB INTERNATIONAL Preface ix
More informationR-squared for Bayesian regression models
R-squared for Bayesian regression models Andrew Gelman Ben Goodrich Jonah Gabry Imad Ali 8 Nov 2017 Abstract The usual definition of R 2 (variance of the predicted values divided by the variance of the
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April
More informationQuantitative Trait Variation
Quantitative Trait Variation 1 Variation in phenotype In addition to understanding genetic variation within at-risk systems, phenotype variation is also important. reproductive fitness traits related to
More informationVARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP)
VARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP) V.K. Bhatia I.A.S.R.I., Library Avenue, New Delhi- 11 0012 vkbhatia@iasri.res.in Introduction Variance components are commonly used
More informationPooling multiple imputations when the sample happens to be the population.
Pooling multiple imputations when the sample happens to be the population. Gerko Vink 1,2, and Stef van Buuren 1,3 arxiv:1409.8542v1 [math.st] 30 Sep 2014 1 Department of Methodology and Statistics, Utrecht
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More informationHeinrich Grausgruber Department of Crop Sciences Division of Plant Breeding Konrad-Lorenz-Str Tulln
957.321 Sources: Nespolo (2003); Le Rouzic et al. (2007) Heinrich Grausgruber Department of Crop Sciences Division of Plant Breeding Konrad-Lorenz-Str. 24 3430 Tulln Zuchtmethodik & Quantitative Genetik
More informationMultiple QTL mapping
Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power
More informationIntroduction to QTL mapping in model organisms
Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More informationPower and sample size calculations for designing rare variant sequencing association studies.
Power and sample size calculations for designing rare variant sequencing association studies. Seunggeun Lee 1, Michael C. Wu 2, Tianxi Cai 1, Yun Li 2,3, Michael Boehnke 4 and Xihong Lin 1 1 Department
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationGenetic Analysis for Heterotic Traits in Bread Wheat (Triticum aestivum L.) Using Six Parameters Model
International Journal of Current Microbiology and Applied Sciences ISSN: 2319-7706 Volume 7 Number 06 (2018) Journal homepage: http://www.ijcmas.com Original Research Article https://doi.org/10.20546/ijcmas.2018.706.029
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary
More informationCharles E. McCulloch Biometrics Unit and Statistics Center Cornell University
A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationCh 11.Introduction to Genetics.Biology.Landis
Nom Section 11 1 The Work of Gregor Mendel (pages 263 266) This section describes how Gregor Mendel studied the inheritance of traits in garden peas and what his conclusions were. Introduction (page 263)
More informationBiostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE
Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationLikelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is
Likelihood Methods 1 Likelihood Functions The multivariate normal distribution likelihood function is The log of the likelihood, say L 1 is Ly = π.5n V.5 exp.5y Xb V 1 y Xb. L 1 = 0.5[N lnπ + ln V +y Xb
More information