GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL)


Paulino Pérez (1), José Crossa (2)
(1) ColPos-México; (2) CIMMyT-México
September, SLU, Sweden

Contents

1 General comments
2 LASSO
3 Application examples
4 Extension of BL to include infinitesimal effect

General comments

The linear regression model is given by

$$y_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + e_i, \qquad (1)$$

where $e_i \sim N(0, \sigma^2_e)$, $i = 1, \ldots, n$.

The key idea is to obtain estimates of $\beta$ and then use them to obtain GEBVs. $\hat{\beta}$ can be obtained using penalized regression methods, for example ridge regression (G-BLUP). We now review another penalized regression method, the LASSO (Least Absolute Shrinkage and Selection Operator).
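As a minimal sketch of the two-step idea above (estimate $\beta$, then form GEBVs), the following base R code fits a ridge regression to simulated markers. The simulated data, the fixed value of `lambda` and all object names are assumptions made for illustration; they are not part of the workshop material.

```r
# Simulated example: n lines, p markers coded 0/1 (all values are illustrative)
set.seed(1)
n <- 50; p <- 200
X <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
beta_true <- rnorm(p, sd = 0.1)
y <- as.vector(10 + X %*% beta_true + rnorm(n))

# Centre the markers so that the intercept is estimated by mean(y)
Xc <- scale(X, center = TRUE, scale = FALSE)
mu_hat <- mean(y)

# Ridge solution: beta_hat = (Xc'Xc + lambda I)^{-1} Xc'(y - mu_hat)
lambda <- 50  # assumed value; in practice chosen by cross-validation or estimated
beta_hat <- solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, y - mu_hat))

# GEBVs are the marker-based genetic values; predictions add the intercept back
gebv <- as.vector(Xc %*% beta_hat)
y_hat <- mu_hat + gebv
```

Because the markers are centred, the GEBVs average to zero and the intercept carries the overall mean.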

LASSO

In the LASSO, estimates of $\beta$ are obtained by minimizing the penalized residual sum of squares:

$$\min_{\beta}\left\{ \Big(y - \sum_{j=1}^{p} X_j\beta_j\Big)'\Big(y - \sum_{j=1}^{p} X_j\beta_j\Big) + \lambda \sum_{j=1}^{p} |\beta_j| \right\}, \qquad (2)$$

where $\lambda \geq 0$ is a regularization parameter that controls the trade-off between goodness of fit (measured by the residual sum of squares) and model complexity (measured by $\sum_j |\beta_j|$).

Notes:
1 The value of $\lambda$ can be chosen by cross-validation.
2 Some of the entries in $\hat{\beta}$ are exactly 0, so the LASSO can also be used as a variable selection method.
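To make the variable-selection property concrete, here is a small base R sketch that minimizes the objective in (2) by cyclic coordinate descent with soft-thresholding. Coordinate descent is one common way to solve the LASSO and is swapped in here for illustration; the function names `soft` and `lasso_cd`, the simulated data and the value of `lambda` are all assumptions, not the workshop's own code.

```r
# Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for min_b ||y - X b||^2 + lambda * sum(|b_j|)
# (y and the columns of X are assumed to be centred, so no intercept is needed)
lasso_cd <- function(X, y, lambda, n_sweeps = 200) {
  p <- ncol(X)
  b <- rep(0, p)
  xx <- colSums(X^2)
  r <- y                      # current residual y - X b
  for (s in seq_len(n_sweeps)) {
    for (j in seq_len(p)) {
      r <- r + X[, j] * b[j]  # residual with marker j left out
      b[j] <- soft(sum(X[, j] * r), lambda / 2) / xx[j]
      r <- r - X[, j] * b[j]  # residual with the updated effect of marker j
    }
  }
  b
}

# Simulated data with n < p and only three non-null markers
set.seed(2)
n <- 40; p <- 100
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
b_true <- c(2, -2, 1.5, rep(0, p - 3))
y <- as.vector(X %*% b_true + rnorm(n))
y <- y - mean(y)

b_hat <- lasso_cd(X, y, lambda = 40)
sum(b_hat != 0)   # number of markers kept in the model
```

The threshold is `lambda / 2` because the sum of squares in (2) is not divided by 2; larger values of `lambda` set more entries of `b_hat` to exactly zero.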

LASSO (continued)

Problems with LASSO:
1 At most n entries in $\beta$ can be different from 0. This is problematic in GS, where usually n << p (curse of dimensionality).
2 It can be difficult to select the value of $\lambda$.
3 It is difficult to obtain estimates of $\sigma^2_e$.
4 It is difficult to obtain confidence intervals for $\beta_j$, $j = 1, \ldots, p$.

Alternatives: Bayesian estimation methods...

Bayesian LASSO

Density function

In ridge regression, $p(\beta_j \mid \sigma^2_\beta) = N(\beta_j \mid 0, \sigma^2_\beta)$, $j = 1, \ldots, p$.

In LASSO, $p(\beta_j \mid \sigma^2_e, \lambda) = DE(\beta_j \mid 0, \lambda/\sigma_e)$.

[Figure 1: Prior in BL and in BRR]
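The contrast between the two priors can be reproduced with a few lines of base R. This is only a sketch: the helper `dde`, the values of `lambda` and `sigma_e`, and the choice to match the normal variance to the double-exponential variance (2 / rate^2) are assumptions made for illustration.

```r
# Double-exponential (Laplace) density with location 0 and rate `rate`
dde <- function(b, rate) (rate / 2) * exp(-rate * abs(b))

lambda <- 2; sigma_e <- 1            # assumed illustrative values
rate <- lambda / sigma_e
sigma2_b <- 2 / rate^2               # normal variance matched to the DE variance

b <- seq(-4, 4, length.out = 401)
dens_bl  <- dde(b, rate)                              # BL prior
dens_brr <- dnorm(b, mean = 0, sd = sqrt(sigma2_b))   # BRR prior

plot(b, dens_bl, type = "l", lty = 1,
     xlab = expression(beta[j]), ylab = "Density")
lines(b, dens_brr, lty = 2)
legend("topright", legend = c("BL (double exponential)", "BRR (normal)"),
       lty = 1:2)
```

With equal variances, the double-exponential prior puts more mass both near zero and in the tails than the normal prior, so small effects are shrunk more strongly and large effects less strongly.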

Joint posterior distribution of model unknowns

The joint posterior distribution of the model unknowns is given by

$$p(\beta, \sigma^2_e, \mu \mid \text{data}) \propto \prod_{i=1}^{n} N\Big(y_i \,\Big|\, \mu + \sum_{j=1}^{p} x_{ij}\beta_j, \sigma^2_e\Big) \prod_{j=1}^{p} p(\beta_j \mid \omega)\; p(\sigma^2_e)\; p(\mu)\; p(\lambda^2), \qquad (3)$$

where $p(\mu) \propto 1$, $p(\sigma^2_e) = \chi^{-2}(\sigma^2_e \mid df, S)$ and $p(\lambda^2) = Gamma(\lambda^2 \mid rate, shape)$.

This model can be fitted using MCMC methods; for more detail see Park and Casella (2008) and de los Campos et al. (2009).

The model is implemented in the package BLR.
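One reason MCMC is convenient here is the scale-mixture representation of the double-exponential prior described by Park and Casella (2008). The block below restates that identity in the notation of these slides; the auxiliary variances $\tau^2_j$ are introduced only for this illustration and do not appear elsewhere in the deck.

```latex
\begin{aligned}
\beta_j \mid \sigma^2_e, \tau^2_j &\sim N(0,\; \sigma^2_e \tau^2_j), \\
\tau^2_j \mid \lambda &\sim \text{Exponential}\big(\text{rate} = \lambda^2 / 2\big), \\
\Rightarrow\quad p(\beta_j \mid \sigma^2_e, \lambda)
  &= \int_0^\infty N(\beta_j \mid 0, \sigma^2_e \tau^2_j)\,
     \frac{\lambda^2}{2} e^{-\lambda^2 \tau^2_j / 2}\, d\tau^2_j
   = \frac{\lambda}{2\sigma_e} \exp\!\Big(-\frac{\lambda |\beta_j|}{\sigma_e}\Big).
\end{aligned}
```

Conditional on the $\tau^2_j$, the model is a normal linear model with marker-specific variances, which is what makes Gibbs sampling straightforward.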


Application examples

Example 1: Barley dataset

This example comes from Yi and Xu (2008).
Doubled haploid (DH) population with n = 145 lines, each line tested in 25 environments.
The response variable is grain yield.
We have p = 127 molecular markers covering 7 chromosomes.
The BL model was fitted using the BLR package in R with B = 20,000 iterations, burn-in = 10,000 and thin = 10.

[Figure 2: Point estimates of $\beta_j$ by marker j (B = 20,000, burn-in = 10,000, a = b = 0.1)]

[Figure 3: Posterior distributions of selected marker effects, including $\beta_2$ and $\beta_{13}$]

[Figure 4: Posterior distributions of selected marker effects, including $\beta_{95}$]


Example 2: Wheat dataset (CIMMyT)

Data for n = 599 wheat lines evaluated in 4 environments, from the CIMMyT wheat improvement program.
The dataset includes p = 1279 molecular markers ($x_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, p$), coded as 0/1.
The pedigree information is also available.

Let's load the dataset in R:
1 Load R
2 Install the BGLR package (if not yet installed)
3 Load the package
4 Load the data
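The four steps above might look as follows in an R session. This is a sketch rather than the workshop's own script: the CRAN mirror URL is an assumption, and the object names `wheat.X`, `wheat.Y` and `wheat.A` are the ones used in the code later in these slides.

```r
# Step 2: install BGLR from CRAN if it is not yet installed
if (!requireNamespace("BGLR", quietly = TRUE)) {
  install.packages("BGLR", repos = "https://cloud.r-project.org")
}

# Steps 3 and 4: load the package and the wheat data shipped with it
library(BGLR)
data(wheat)

# Quick checks against the dimensions quoted above
dim(wheat.X)   # markers: lines x markers
dim(wheat.Y)   # grain yield: lines x environments
dim(wheat.A)   # pedigree-based relationship matrix: lines x lines
```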

Example 2: Wheat dataset (CIMMyT), continued

Let's assume that we want to predict grain yield in environment 1 using the Bayesian LASSO. We do not know the values of $\sigma^2_e$ and $\lambda$, so we estimate them from the data. We will use the function BGLR. The R code below fits the BL model using a Bayesian approach with non-informative priors for $\sigma^2_e$ and $\lambda$:

    rm(list=ls())
    library(BGLR)
    data(wheat)                 # loads wheat.Y, wheat.X, wheat.A
    Y=wheat.Y
    X=wheat.X
    y=Y[,1]                     # grain yield, environment 1
    setwd("/tmp/")              # BGLR writes its output files here
    #Linear predictor
    ETA=list(list(X=X,model="BL"))
    fmL<-BGLR(y=y,ETA=ETA,nIter=10000,
              burnIn=5000,thin=10)
    plot(fmL$yHat,Y[,1])        # predicted vs observed

Example 2: Wheat dataset (CIMMyT), continued

[Figure: observed (Y[, 1]) vs predicted (fmL$yHat) grain yield]

The figure shows observed vs predicted grain yield. Predictions $\hat{y} = \hat{\mu} + X\hat{\beta}$ and estimates of $\sigma^2_e$ and $\lambda$ can be obtained easily in R:

    > fmL$yHat
    > fmL$varE
    > fmL$ETA[[1]]$lambda

[Figure: predicted marker effects, Bayesian LASSO vs Bayesian Ridge Regression; predicted genetic values, Bayesian LASSO vs Bayesian Ridge Regression]

Example 2: Wheat dataset (CIMMyT), continued

The GEBVs can be obtained easily in R:

    #GEBVs
    #option 1
    X%*%fmL$ETA[[1]]$b
    #option 2
    fmL$yHat-fmL$mu

Exercise: Let's assume that we want to predict grain yield for some wheat lines for which we have only genotypic information. Write the R code for fitting a BL model.
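One possible approach to the exercise, offered only as a sketch, is to set the phenotypes of the unphenotyped lines to NA so that BGLR returns predictions for them. The random 20% split, the object names `tst`, `yNA` and `fmCV`, the shortened chain and the use of `tempdir()` for output files are all assumptions made for illustration.

```r
library(BGLR)
data(wheat)
X <- wheat.X
y <- wheat.Y[, 1]

# Assumption: a random 20% of the lines play the role of lines without phenotypes
set.seed(123)
tst <- sample(seq_along(y), size = round(0.2 * length(y)))
yNA <- y
yNA[tst] <- NA            # entries of y that are NA are predicted by BGLR

# Linear predictor and a short chain (kept short here only to run quickly)
ETA <- list(list(X = X, model = "BL"))
fmCV <- BGLR(y = yNA, ETA = ETA, nIter = 6000, burnIn = 1000, thin = 10,
             saveAt = file.path(tempdir(), "bl_cv_"), verbose = FALSE)

# Predictions for the lines whose phenotypes were hidden
pred <- fmCV$yHat[tst]
cor(pred, y[tst])         # predictive correlation in the hold-out lines
```

Because the true phenotypes of the hold-out lines are known in this dataset, the correlation gives a rough check of predictive ability; with genuinely unphenotyped lines only `pred` would be available.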

Extension of BL to include infinitesimal effect

de los Campos et al. (2009) extended the basic BL model to include an infinitesimal effect, that is:

$$y_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + u_i + e_i, \qquad (4)$$

where $u \sim N(0, \sigma^2_u A)$ and A is the relationship matrix derived from the pedigree. The model can be fitted using Bayesian methods.
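To illustrate what $u \sim N(0, \sigma^2_u A)$ means, the base R sketch below draws infinitesimal effects for a toy relationship matrix and checks their empirical covariance. The matrix `A`, the value of `sigma2_u` and the number of draws are assumed values chosen for illustration only.

```r
# Toy additive relationship matrix for 4 lines (assumed values):
# lines 1-2 and lines 3-4 are pairs of full sibs from unrelated, non-inbred parents
A <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0), nrow = 4, byrow = TRUE)
sigma2_u <- 2

# If z ~ N(0, I) and A = L L', then u = sqrt(sigma2_u) * L z ~ N(0, sigma2_u * A)
L <- t(chol(A))
set.seed(4)
U <- replicate(20000, as.vector(sqrt(sigma2_u) * L %*% rnorm(4)))

# Empirical covariance of the draws; it should be close to sigma2_u * A
round(cov(t(U)), 2)
```

Related lines receive correlated values of u, which is how the pedigree contributes information beyond the markers.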

Example 3: Including an infinitesimal effect

In this example we continue with the analysis of the wheat dataset and include an infinitesimal effect in the model.

    rm(list=ls())
    setwd("/tmp")
    library(BGLR)
    data(wheat)     #Loads the wheat dataset
    X=wheat.X
    A=wheat.A       #Pedigree-based relationship matrix
    Y=wheat.Y
    y=Y[,1]
    #Linear predictor: markers (BL) plus infinitesimal effect (RKHS with K=A)
    ETA=list(list(X=X,model="BL"),
             list(K=A,model="RKHS"))
    ### Runs the Gibbs sampler
    fm<-BGLR(y=y,ETA=ETA,
             nIter=30000,burnIn=5000,thin=10)

[Figure 5: Posterior distributions (density and trace by iteration) of $\sigma^2_e$ and $\sigma^2_u$]

[Figure 6: Marker effects ($\beta_j$ by marker)]

Narrow-sense heritability calculated according to Yi and Xu (2008):

$$h^2_j = \frac{V_j \hat{\beta}^2_j}{V_y},$$

where $V_y$ is the phenotypic variance and $V_j$ is the sample variance of $x_{ij}$, $i = 1, \ldots, n$.

[Figure 7: Heritability ($h^2_j$ by marker)]
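The per-marker heritability formula above can be computed in a few lines of base R. The sketch below uses simulated markers and uses simulated effects in place of the posterior means $\hat{\beta}_j$; all values and object names are assumptions for illustration.

```r
# Simulated stand-ins for the marker matrix, estimated effects and phenotype
set.seed(3)
n <- 100; p <- 20
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
beta_hat <- rnorm(p, sd = 0.2)      # stands in for posterior means of marker effects
y <- as.vector(X %*% beta_hat + rnorm(n))

V_j <- apply(X, 2, var)             # sample variance of x_ij over lines
V_y <- var(y)                       # phenotypic variance
h2_j <- V_j * beta_hat^2 / V_y      # share of V_y attributed to marker j

plot(seq_len(p), h2_j, type = "h",
     xlab = "Marker", ylab = expression(h[j]^2))
```

With a fitted BGLR model, `beta_hat` would be replaced by the estimated marker effects (`fm$ETA[[1]]$b` in the code of Example 3) and `y` by the observed phenotype.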

[Figure 8: Observed vs predicted values (phenotype vs predicted genetic value)]

Questions?

References

Park, T. and Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103.

Yi, N. and Xu, S. (2008). Bayesian Lasso for Quantitative Trait Loci Mapping. Genetics, 179.

de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. (2009). Predicting Quantitative Traits with Regression Models for Dense Molecular Markers and Pedigree. Genetics, 182.

Pérez-Rodríguez, P., G. de los Campos, J. Crossa and D. Gianola. (2010). Genomic-enabled prediction based on molecular markers and pedigree using the BLR package in R. The Plant Genome, 3(2).


More information

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013 Lecture 28: BLUP and Genomic Selection Bruce Walsh lecture notes Synbreed course version 11 July 2013 1 BLUP Selection The idea behind BLUP selection is very straightforward: An appropriate mixed-model

More information

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Hierarchical Generalized Linear Models for Multiple QTL Mapping Genetics: Published Articles Ahead of Print, published on January 1, 009 as 10.1534/genetics.108.099556 Hierarchical Generalized Linear Models for Multiple QTL Mapping Nengun Yi 1,* and Samprit Baneree

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Overview. Background

Overview. Background Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

IEOR165 Discussion Week 5

IEOR165 Discussion Week 5 IEOR165 Discussion Week 5 Sheng Liu University of California, Berkeley Feb 19, 2016 Outline 1 1st Homework 2 Revisit Maximum A Posterior 3 Regularization IEOR165 Discussion Sheng Liu 2 About 1st Homework

More information

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines Lecture 8 QTL Mapping 1: Overview and Using Inbred Lines Bruce Walsh. jbwalsh@u.arizona.edu. University of Arizona. Notes from a short course taught Jan-Feb 2012 at University of Uppsala While the machinery

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

Hierarchical Modeling for Spatial Data

Hierarchical Modeling for Spatial Data Bayesian Spatial Modelling Spatial model specifications: P(y X, θ). Prior specifications: P(θ). Posterior inference of model parameters: P(θ y). Predictions at new locations: P(y 0 y). Model comparisons.

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

Variance Component Models for Quantitative Traits. Biostatistics 666

Variance Component Models for Quantitative Traits. Biostatistics 666 Variance Component Models for Quantitative Traits Biostatistics 666 Today Analysis of quantitative traits Modeling covariance for pairs of individuals estimating heritability Extending the model beyond

More information

Shrinkage Methods: Ridge and Lasso

Shrinkage Methods: Ridge and Lasso Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Limited dimensionality of genomic information and effective population size

Limited dimensionality of genomic information and effective population size Limited dimensionality of genomic information and effective population size Ivan Pocrnić 1, D.A.L. Lourenco 1, Y. Masuda 1, A. Legarra 2 & I. Misztal 1 1 University of Georgia, USA 2 INRA, France WCGALP,

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

Regularization Path Algorithms for Detecting Gene Interactions

Regularization Path Algorithms for Detecting Gene Interactions Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Chapter 6 October 18, 2016 Chapter 6 October 18, 2016 1 / 80 1 Subset selection 2 Shrinkage methods 3 Dimension reduction methods (using derived inputs) 4 High

More information

Package brnn. R topics documented: January 26, Version 0.6 Date

Package brnn. R topics documented: January 26, Version 0.6 Date Version 0.6 Date 2016-01-26 Package brnn January 26, 2016 Title Bayesian Regularization for Feed-Forward Neural Networks Author Paulino Perez Rodriguez, Daniel Gianola Maintainer Paulino Perez Rodriguez

More information

Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping

Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping Huang et al. BMC Genetics 2013, 14:5 METHODOLOGY ARTICLE Open Access Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping Anhui Huang 1, Shizhong Xu 2 and Xiaodong Cai 1*

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),

More information

The Pennsylvania State University The Graduate School THE BAYESIAN LASSO, BAYESIAN SCAD AND BAYESIAN GROUP LASSO WITH APPLICATIONS TO GENOME-WIDE

The Pennsylvania State University The Graduate School THE BAYESIAN LASSO, BAYESIAN SCAD AND BAYESIAN GROUP LASSO WITH APPLICATIONS TO GENOME-WIDE The Pennsylvania State University The Graduate School THE BAYESIAN LASSO, BAYESIAN SCAD AND BAYESIAN GROUP LASSO WITH APPLICATIONS TO GENOME-WIDE ASSOCIATION STUDIES A Dissertation in Statistics by Jiahan

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes

More information