GENOME-ENABLED PREDICTION: RIDGE REGRESSION, GENOMIC BLUP, AND ROBUST METHODS (TMAP AND LMAP)


2 RIDGE REGRESSION THE ESTIMATOR (HOERL AND KENNARD, 1970) WAS DERIVED USING A CONSTRAINED MINIMIZATION ARGUMENT:

β̂_ridge = (X'X + ωI)⁻¹ X'y

where ω is the reciprocal of the Lagrange multiplier (the regularization parameter in machine learning).

Expected value (the estimator is biased):
E(β̂_ridge) = (X'X + ωI)⁻¹ X'X β

Covariance matrix:
Var(β̂_ridge) = σ²_e (X'X + ωI)⁻¹ X'X (X'X + ωI)⁻¹
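A minimal R sketch of the estimator above (the simulated toy data and the value of ω are assumptions for illustration, not from the slides):

## Ridge estimator on simulated data: shrinkage relative to OLS
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
y <- X %*% beta + rnorm(n)
omega <- 5
# (X'X + omega*I)^{-1} X'y
bhat_ridge <- solve(crossprod(X) + omega * diag(p), crossprod(X, y))
bhat_ols <- solve(crossprod(X), crossprod(X, y))  # OLS for comparison
cbind(ols = bhat_ols, ridge = bhat_ridge)         # ridge coefficients are shrunken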

3 NOTE: THE INTERCEPT IS NOT REGULARIZED

4 WAYS IN WHICH RIDGE REGRESSION CAN BE INTERPRETED
- As a shrunken estimator of regressions or marker effects (ridge)
- As a predictor of random effects (BLUP) [THIS IS A CRYPTIC INTERPRETATION BECAUSE WE WISH TO LEARN GENE EFFECTS: these do not vary at random, but over a conceptual distribution]
- As the mean of a conditional posterior distribution (Bayes)
- As the maximum of a penalized likelihood under the L2 norm (PMLE)
In all cases we need ω OR the variance ratio (and the individual variances for interval inference). The PMLE view is illustrated in the sketch below.
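A small numerical check of the PMLE interpretation: minimizing the L2-penalized residual sum of squares reproduces the closed-form ridge solution (toy data and the value of ω are assumptions):

## Penalized-likelihood (L2) optimum equals the closed-form ridge estimator
set.seed(2)
n <- 80; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% rnorm(p) + rnorm(n)
omega <- 3
pls <- function(b) sum((y - X %*% b)^2) + omega * sum(b^2)  # penalized SS
b_opt <- optim(rep(0, p), pls, method = "BFGS")$par
b_closed <- drop(solve(crossprod(X) + omega * diag(p), crossprod(X, y)))
max(abs(b_opt - b_closed))  # ~0: same estimator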

5 BLUP


8 Bayes IN THE BAYESIAN INTERPRETATION, β HAS A TRUE, FIXED VALUE. THE RANDOMNESS REPRESENTS UNCERTAINTY PRIOR AND POSTERIOR TO OBSERVING DATA

9 RIDGE REGRESSION EXAMPLE 1

10 RIDGE REGRESSION EXAMPLE 2
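The worked examples announced on slides 9-10 did not survive extraction. As a stand-in, here is a small illustrative sketch (all data simulated, not the original examples) of how the ridge solution changes as ω grows:

## Ridge coefficient paths over a grid of omega values
set.seed(3)
n <- 60; p <- 20
X <- matrix(rbinom(n * p, 2, 0.3), n, p)     # toy 0/1/2 genotype codes
X <- scale(X, center = TRUE, scale = FALSE)
y <- X %*% c(rnorm(5), rep(0, p - 5)) + rnorm(n)
omegas <- c(0.1, 1, 10, 100)
B <- sapply(omegas, function(w) solve(crossprod(X) + w * diag(p), crossprod(X, y)))
matplot(log10(omegas), t(B), type = "l",
        xlab = "log10(omega)", ylab = "coefficient")  # all paths shrink toward 0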


14 ASSIGNING A VALUE TO ω: GENERALIZED CROSS-VALIDATION; VARIANCE COMPONENTS; OR BRUTE FORCE (a grid search over candidate values).

15 GCV: MOTIVATION The residual sum of squares will be called T(λ). With H(λ) = X(X'X + λI)⁻¹X' the "hat" matrix of the ridge fit, generalized cross-validation chooses λ to minimize

GCV(λ) = n T(λ) / [tr(I − H(λ))]²
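A sketch of this criterion in R, reusing the X and y simulated in the previous sketch (the grid of λ values is an arbitrary assumption):

## Generalized cross-validation over a lambda grid
gcv <- function(lam, X, y) {
  n <- nrow(X); p <- ncol(X)
  H <- X %*% solve(crossprod(X) + lam * diag(p), t(X))  # hat matrix
  Tlam <- sum((y - H %*% y)^2)                          # residual SS, T(lambda)
  n * Tlam / (n - sum(diag(H)))^2                       # tr(I - H) = n - tr(H)
}
lams <- 10^seq(-2, 3, length.out = 50)
scores <- sapply(lams, gcv, X = X, y = y)
lams[which.min(scores)]   # GCV-optimal lambda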

16 SAME TRAIT IN 4 DIFFERENT ENVIRONMENTS: FOUR DIFFERENT BEHAVIORS!

17 GENOMIC BLUP


19 RECALL Genotypes (the random variable W denotes the genotype at a locus).

Coding 1: W_aa = -1, W_Aa = 0, W_AA = 1. Under Hardy-Weinberg,
E_HW(W) = p² − q² = p − q
Var_HW(W) = E(W²) − [E(W)]² = (p² + q²) − (p − q)² = 2pq

Coding 2: W_aa = 0, W_Aa = 1, W_AA = 2. Under Hardy-Weinberg,
E_HW(W) = 2pq + 2p² = 2p
Var_HW(W) = (2pq + 4p²) − 4p² = 2pq

Coding does not affect the variance of genotypes, but the mean shifts (p − q = 2p − 1 versus 2p). Deviations from the means are invariant to this type of coding (checked numerically below):

Genotype | W − E(W), Coding 1   | W − E(W), Coding 2
aa       | −1 − (p − q) = −2p   | 0 − 2p = −2p
Aa       |  0 − (p − q) = q − p | 1 − 2p = q − p
AA       |  1 − (p − q) = 2q    | 2 − 2p = 2q
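A quick R check of the claims above, for an arbitrary allele frequency (the value p = 0.3 is an assumption):

## Both codings: equal variance, identical centered deviations
p <- 0.3; q <- 1 - p
freqs <- c(aa = q^2, Aa = 2 * p * q, AA = p^2)  # HW genotype frequencies
w1 <- c(-1, 0, 1); w2 <- c(0, 1, 2)
m1 <- sum(freqs * w1); m2 <- sum(freqs * w2)
v1 <- sum(freqs * w1^2) - m1^2
v2 <- sum(freqs * w2^2) - m2^2
c(v1, v2, 2 * p * q)        # all three equal
rbind(w1 - m1, w2 - m2)     # identical rows: deviations invariant to coding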

20 A LOOK AT VAN RADEN'S GENOMIC RELATIONSHIP MATRIX

X_{ind,marker} =
[ x_11 ... x_1p
  x_21 ... x_2p
  ...
  x_n1 ... x_np ]   (n individuals, p markers)

XX' =
[ Σ_j x²_1j      Σ_j x_1j x_2j  ...  Σ_j x_1j x_nj
  Σ_j x_2j x_1j  Σ_j x²_2j      ...  Σ_j x_2j x_nj
  ...
  Σ_j x_nj x_1j  ...                 Σ_j x²_nj ]

21 In Van Raden's G-matrix, with genotypes coded 0/1/2 and allele frequency p_j at marker j:

E(x_ij) = 2p_j
Var(x_ij) = E(x²_ij) − [E(x_ij)]² = 2 p_j q_j
Cov(x_1j, x_2j) = a_12 · 2 p_j q_j
E(x_1j x_2j) = Cov(x_1j, x_2j) + E(x_1j) E(x_2j) = a_12 · 2 p_j q_j + 4 p²_j

The scaling factor is Σ_j 2 p_j q_j: if all elements of G(VR) are divided by this factor, then the scale is consistent with A. Or, if the x's are centered, the expected off-diagonal (after division by Σ_j 2 p_j q_j) is the additive relationship.

Note: LD does not enter into this form of genomic relationship matrix.

22 UNDER HARDY-WEINBERG AND IDEALIZED CONDITIONS

E(XX') = 2 Σ_j p_j q_j ·
[ 1          a_12  ...  a_1n
             1     ...  a_2n
  symmetric        ...  a_{n,n-1}
                        1 ]

that is, each off-diagonal element is a_{ii'} · 2 Σ_j p_j q_j (additive relationships) and each diagonal element is 2 Σ_j p_j q_j.

23 Likewise, if the x's are centered:

E{[X − E(X)][X − E(X)]'} / (2 Σ_j p_j q_j) =
[ 1          a_12  ...  a_1n
             1     ...  a_2n
  symmetric        ...  a_{n,n-1}
                        1 ]
= A

A = n × n matrix of additive relationships.

24 Then, the genomic relationship matrix

G = [X − E(X)][X − E(X)]' / (2 Σ_j p_j q_j) = X_c X_c' / V_{M,HW}

is the realization of a process. If this process is the HW process, then its expectation is

E{[X − E(X)][X − E(X)]'} / (2 Σ_j p_j q_j) = A.

For example: a parent and its offspring are expected to have a relationship of 0.5, but in a given realization it could be larger or smaller.

25 MANY G-MATRICES (each may provide a different variance component decomposition). Examples:

G_VR = X_cent X_cent' / (2 Σ_j p_j (1 − p_j))
G_ST = (1/p) X_std X_std', where x_std,ij = (x_ij − x̄_j) / √Var(x_ij)
G_0.5 = ½ (G_VR + G_ST), followed by some scaling?
G_W = (1/p) X_std W X_std', where W uses LD information?
G_Blend = α G_ST + (1 − α) A, after some re-scaling of the matrices
G_VR scaled into (0, 2) with a min-max function,
  y = y_min + (x − x_min)(y_max − y_min)/(x_max − x_min), so that
  g_ij,scaled = 2 (g_ij,VR − min g_VR) / (max g_VR − min g_VR)

Two of these are sketched in code below.
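A sketch constructing the first two of these (simulated 0/1/2 genotypes; the sample size, marker count, and frequency range are arbitrary assumptions, and fixed markers are assumed absent):

## G_VR and G_ST on simulated genotypes
set.seed(4)
n <- 50; m <- 200
pfreq <- runif(m, 0.1, 0.9)
M <- sapply(pfreq, function(pj) rbinom(n, 2, pj))   # n x m genotype matrix
phat <- colMeans(M) / 2                             # estimated allele frequencies
Xc <- sweep(M, 2, 2 * phat)                         # centered genotypes
Gvr <- tcrossprod(Xc) / sum(2 * phat * (1 - phat))  # Van Raden G
Xstd <- sweep(Xc, 2, sqrt(2 * phat * (1 - phat)), "/")
Gst <- tcrossprod(Xstd) / m                         # standardized-marker G
c(mean(diag(Gvr)), mean(diag(Gst)))                 # both average near 1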

26 FOCUS: GWAS FOCUS: HERITABILITY FOCUS: KINSHIP

27 MARKERS ARE NOT QTL: a disconnect

28 QTLs in LD with markers vs. QTLs in LE with markers

29 [Figure: distribution of the off-diagonals of genomic correlations among 500 individuals. Markers are OBSERVABLE; QTLs are UNOBSERVABLE.]

30 BACK TO BASIC SETTING: linear regression on markers

31 BLUP

β̂ = Cov(β, y') [Var(y)]⁻¹ [y − E(y)]

BRUTE FORCE 1: β̂ = σ²_β X' (XX' σ²_β + I σ²_e)⁻¹ y
BRUTE FORCE 2: β̂ = X' (XX' + I σ²_e/σ²_β)⁻¹ y

BRUTE FORCE: invert n × n and then map onto the p × 1 vector of marker effects. MME: invert p × p. NO COMPELLING REASON FOR MME HERE (p is much larger than n). See the sketch below.
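A numerical check of the two routes (toy data; λ = σ²_e/σ²_β fixed arbitrarily), using the identity X'(XX' + λI_n)⁻¹ = (X'X + λI_p)⁻¹X':

## n x n "brute force" route vs p x p (MME/ridge) route
set.seed(5)
n <- 40; p <- 300
X <- matrix(rnorm(n * p), n, p)
y <- X %*% rnorm(p, 0, 0.1) + rnorm(n)
lambda <- 2
b_nxn <- t(X) %*% solve(tcrossprod(X) + lambda * diag(n), y)     # invert n x n
b_pxp <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y)) # invert p x p
max(abs(b_nxn - b_pxp))   # ~0: identical marker-effect BLUPs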

32 MARKED GENOTYPIC VALUE: ĝ = X β̂ = XX' (XX' + I σ²_e/σ²_β)⁻¹ y. SAME RESULT: BOTH ARE n × n COMPUTATIONS.

33 Estimate marker effects from genomic BLUP? Use standard BLUP theory under normality!

β̂ = E(β | y, variance components) = E_g[ E(β | g) | y ]   "ITERATED EXPECTATIONS"

Step 1: E(β | g) under normality, with g = Xβ, E(β) = 0, Var(β) = I σ²_β:
E(β | Xβ) = E(β) + Cov(β, β'X') [Var(Xβ)]⁻¹ [Xβ − E(Xβ)]
          = σ²_β X' (XX' σ²_β)⁻¹ Xβ
so E(β | g) = X'(XX')⁻¹ g.

Step 2: condition on the data:
β̂ = E(β | y) = X'(XX')⁻¹ E(g | y) = X'(XX')⁻¹ ĝ

with ĝ = E(g | y, variance components) = XX' [XX' + I σ²_e/σ²_β]⁻¹ y = G [G + I σ²_e/(σ²_β V_{M,HW})]⁻¹ y   [REMEMBER THIS]

34 BRUTE FORCE DEFINITION: BLUP is a conditional expectation under normality

β̂ = E(β | y, variance components) = Cov(β, y')[Var(y)]⁻¹ y
   = σ²_β X' (XX' σ²_β + I σ²_e)⁻¹ y
   = X' (XX' + I σ²_e/σ²_β)⁻¹ y   [REMEMBER?]

CAN GO BACK AND FORTH BETWEEN GENOMIC BLUP AND RIDGE REGRESSION ESTIMATES OF MARKER EFFECTS:

β̂ = X'(XX')⁻¹ ĝ   and   ĝ = X β̂
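Continuing the toy objects from the previous sketch (X, b_nxn, with p > n), a check that this mapping is exact when XX' has full rank n:

## Back and forth: marker effects -> genomic values -> marker effects
ghat_toy <- X %*% b_nxn                            # ghat = X betahat
b_back <- t(X) %*% solve(tcrossprod(X), ghat_toy)  # betahat = X'(XX')^{-1} ghat
max(abs(b_back - b_nxn))                           # ~0: interchangeable representations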

35 BACK TO GENOMIC BLUP. When should a specific representation of GBLUP be used? Suppose p < n. Then G has at most rank = p and the inverse of G does not exist.

rm(list = ls(all = TRUE))

### LOAD PACKAGES
library(MASS)
library(BGLR)
library(lattice)
library(Matrix)
set.seed(123)   # the seed value on the slide was lost in extraction; any fixed value works

### LOAD DATA (wheat data shipped with BGLR)
data(wheat)
Y <- wheat.Y
X <- wheat.X
y <- Y[, 1]
n <- length(y)
X <- X[, 1:50]   # keep p = 50 markers, so p < n

freq <- numeric(ncol(X))
for (j in 1:ncol(X)) {
  freq[j] <- mean(X[, j])
}
### Markers are binary, so the variance of marker codes is p(1-p)
### per locus instead of 2p(1-p)
varHW <- sum(freq * (1 - freq))
varHW

X <- scale(X, center = TRUE, scale = FALSE)

### GVR = genomic relationship a la Van Raden (2008)
GVR <- X %*% t(X) / varHW

par(mfrow = c(2, 1))
vecGVR <- as.vector(GVR)
hist(GVR, main = "Distribution of elements of GVR", xlab = "GVR values")
diagGVR <- diag(GVR)
plot(diagGVR, ylab = "diagonal values", main = "Diagonal values of GVR")
par(mfrow = c(1, 1))
summary(vecGVR)
summary(diagGVR)

[Figure: histogram of the elements of GVR and plot of its diagonal values.]

ISSUE HERE: SCALE DIFFERS FROM THAT OF A!

36 Calculation of GBLUP (once one has arrived at some G):

ĝ = E(g | y) = Cov(g, y') [Var(y)]⁻¹ [y − E(y)]
  = G σ²_g [G σ²_g + I σ²_e]⁻¹ y
  = G [G + I σ²_e/σ²_g]⁻¹ y          (no inverse of G required)
  = [I + G⁻¹ σ²_e/σ²_g]⁻¹ y          (requires G⁻¹)

Prediction error variance:
Var(g | y) = Var(g − ĝ) = G σ²_g − G σ²_g [G σ²_g + I σ²_e]⁻¹ G σ²_g
           = [I − G (G + I σ²_e/σ²_g)⁻¹] G σ²_g

37 #### Does GVR have an inverse in this case? No: rank(GVR) is at most p = 50 < n.
> GVRinv <- chol2inv(chol(GVR))
Error in chol2inv(chol(GVR)) : error in evaluating the argument 'x' in
  selecting a method for function 'chol2inv': Error in chol.default(GVR) :
  the leading minor of order 38 is not positive definite
> rankMatrix(GVR)
Warning in rankMatrix(GVR) : rankMatrix(<large sparse Matrix>, method = 'tolNorm2')
  coerces to dense matrix. Probably should rather use method = 'qrLINPACK' !?
[1] 50
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] e-11

MUST USE THE STRONG ARM FOR CALCULATING GBLUP:

ĝ = G_VR [G_VR + I σ²_e/σ²_g]⁻¹ y
Var(ĝ − g) = [I − G_VR (G_VR + I σ²_e/σ²_g)⁻¹] G_VR σ²_g

38 Suppose σ²_g = 0.30 and σ²_e = 0.70.

#### Compute GBLUP using the strong-arm method.
#### varg = 0.30, vare = 0.70, lambda = vare/varg
varg <- 0.30
vare <- 0.70
lambda <- vare / varg
Vstar <- GVR + lambda * diag(n)
Vstarinv <- chol2inv(chol(Vstar))
ghat <- GVR %*% Vstarinv %*% y
plot(ghat, ylab = "GBLUP", main = "Genomic BLUP (Van Raden G) varg=0.30 vare=0.70")

### Compute the prediction error variance-covariance matrix
PEVMAT <- varg * (diag(n) - GVR %*% Vstarinv) %*% GVR

### CALCULATE MODEL-DERIVED RELIABILITIES: REL_i = 1 - PEV_i/varg
RELS <- (varg * diag(n) - PEVMAT) / varg
RELGBLUPS <- diag(RELS)
plot(RELGBLUPS, ylab = "REL", main = "Reliabilities of G-BLUP (Van Raden G)")

[Figure: GBLUP values and reliabilities for the Van Raden G.] No evidence of overfit.

39 ##### Impact of the G-matrix on GBLUP
##### Assume the same variance decomposition
##### Scale GVR to be in (0, 2)
GVscaled <- matrix(nrow = nrow(X), ncol = nrow(X))
VRmin <- min(GVR)
VRmax <- max(GVR)
for (i in 1:nrow(X)) {
  for (j in 1:nrow(X)) {
    GVscaled[i, j] <- 2 * (GVR[i, j] - VRmin) / (VRmax - VRmin)
  }
}

#### How does it compare with A?
A <- wheat.A
par(mfrow = c(2, 1))
hist(A, main = "Histogram of A scaled")
hist(GVscaled, main = "Histogram of GVR scaled in (0,2)")
par(mfrow = c(1, 1))
cor(as.vector(A), as.vector(GVscaled))

[Figure: histograms of A and of GVR scaled into (0, 2).]

40 ##### BLUP with A and with the scaled G (assume the same variance decomposition)
varg <- 0.30
vare <- 0.70
lambda <- vare / varg

VstarGVS <- GVscaled + lambda * diag(n)
VstarinvGVS <- chol2inv(chol(VstarGVS))
ghatGVS <- GVscaled %*% VstarinvGVS %*% y

VstarA <- A + lambda * diag(n)
VstarinvA <- chol2inv(chol(VstarA))
ghatA <- A %*% VstarinvA %*% y

par(mfrow = c(3, 1))
plot(ghatA, ghatGVS, main = "BLUP A vs GBLUP GVS")
plot(ghatA, ghat, main = "BLUP A vs GBLUP GVR")
plot(ghat, ghatGVS, main = "GBLUP GVR vs GBLUP GVS")
par(mfrow = c(1, 1))

cor(ghatA, ghat)
cor(ghatA, ghatGVS)
cor(ghat, ghatGVS)

mseA <- sum((y - ghatA)^2) / n
msehat <- sum((y - ghat)^2) / n
msehatGVS <- sum((y - ghatGVS)^2) / n
mseA; msehat; msehatGVS

[Figure: pairwise scatter plots of ghatA, ghat, and ghatGVS.]

BLUP(A) FITS BETTER (but may predict worse).

41 GENERALIZED CV IN GBLUP (zero-means model, wheat data)
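The code for this slide was lost in extraction; below is a hedged sketch of generalized cross-validation for the GBLUP smoother on the wheat data (zero-means model), reusing GVR and y from slide 35. The λ grid is an arbitrary assumption.

## GCV for GBLUP: the smoother is H(lambda) = G (G + lambda I)^{-1}
gcv_gblup <- function(lam, G, y) {
  n <- length(y)
  H <- G %*% chol2inv(chol(G + lam * diag(n)))
  n * sum((y - H %*% y)^2) / (n - sum(diag(H)))^2
}
lams <- seq(0.1, 10, length.out = 50)
scores <- sapply(lams, gcv_gblup, G = GVR, y = y)
lams[which.min(scores)]   # GCV choice of lambda = vare/varg
plot(lams, scores, type = "l", xlab = "lambda", ylab = "GCV")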

